| 1 | +# Connecting CrateDB Data to an LLM with LlamaIndex and Azure OpenAI |
| 2 | + |
| 3 | +This folder contains the code for [this tutorial](https://community.cratedb.com/t/how-to-connect-your-cratedb-data-to-llm-with-llamaindex-and-azure-openai/1612) on the CrateDB community forum. Read the tutorial for instructions on setting up the components you need on Azure, then use this README to set up CrateDB and the Python code. |
| 4 | + |
| 5 | +The code has been tested with: |
| 6 | + |
| 7 | +* Python 3.12.2 |
| 8 | +* macOS Sequoia 15.0.1 |
| 9 | +* CrateDB 5.8.3 running in CrateDB Cloud on AWS Europe (Ireland) |
| 10 | + |
| 11 | +## Database Setup |
| 12 | + |
| 13 | +You will need a CrateDB Cloud database: sign up [here](https://console.cratedb.cloud/) and use the free "CRFREE" tier. |
| 14 | + |
| 15 | +Make a note of the hostname, username and password for your database. You'll need those when configuring the environment file later. |
| 16 | + |
| 17 | +Create a table in CrateDB: |
| 18 | + |
| 19 | +```sql |
| 20 | +CREATE TABLE IF NOT EXISTS time_series_data ( |
| 21 | + timestamp TIMESTAMP, |
| 22 | + value DOUBLE, |
| 23 | + location STRING, |
| 24 | + sensor_id INT |
| 25 | +); |
| 26 | +``` |
| 27 | + |
| 28 | +Add some sample data: |
| 29 | + |
| 30 | +```sql |
| 31 | +INSERT INTO time_series_data (timestamp, value, location, sensor_id) |
| 32 | +VALUES |
| 33 | + ('2023-09-14T00:00:00', 10.5, 'Sensor A', 1), |
| 34 | + ('2023-09-14T01:00:00', 15.2, 'Sensor A', 1), |
| 35 | + ('2023-09-14T02:00:00', 18.9, 'Sensor A', 1), |
| 36 | + ('2023-09-14T03:00:00', 12.7, 'Sensor B', 2), |
| 37 | + ('2023-09-14T04:00:00', 17.3, 'Sensor B', 2), |
| 38 | + ('2023-09-14T05:00:00', 20.1, 'Sensor B', 2), |
| 39 | + ('2023-09-14T06:00:00', 22.5, 'Sensor A', 1), |
| 40 | + ('2023-09-14T07:00:00', 18.3, 'Sensor A', 1), |
| 41 | + ('2023-09-14T08:00:00', 16.8, 'Sensor A', 1), |
| 42 | + ('2023-09-14T09:00:00', 14.6, 'Sensor B', 2), |
| 43 | + ('2023-09-14T10:00:00', 13.2, 'Sensor B', 2), |
| 44 | + ('2023-09-14T11:00:00', 11.7, 'Sensor B', 2); |
| 45 | +``` |
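To sanity-check the sample data before involving the LLM, you can compute the per-sensor average locally. The sketch below is a stand-in only: it loads the same rows into an in-memory SQLite database (not CrateDB) and runs the same kind of aggregate query that appears in the expected output later in this README. The helper name `average_for_sensor` is hypothetical, and SQLite treats the column types more loosely than CrateDB does.

```python
import sqlite3

# Same sample rows as the INSERT statement above.
ROWS = [
    ("2023-09-14T00:00:00", 10.5, "Sensor A", 1),
    ("2023-09-14T01:00:00", 15.2, "Sensor A", 1),
    ("2023-09-14T02:00:00", 18.9, "Sensor A", 1),
    ("2023-09-14T03:00:00", 12.7, "Sensor B", 2),
    ("2023-09-14T04:00:00", 17.3, "Sensor B", 2),
    ("2023-09-14T05:00:00", 20.1, "Sensor B", 2),
    ("2023-09-14T06:00:00", 22.5, "Sensor A", 1),
    ("2023-09-14T07:00:00", 18.3, "Sensor A", 1),
    ("2023-09-14T08:00:00", 16.8, "Sensor A", 1),
    ("2023-09-14T09:00:00", 14.6, "Sensor B", 2),
    ("2023-09-14T10:00:00", 13.2, "Sensor B", 2),
    ("2023-09-14T11:00:00", 11.7, "Sensor B", 2),
]


def average_for_sensor(sensor_id: int) -> float:
    """Load the sample rows into in-memory SQLite and return AVG(value)."""
    conn = sqlite3.connect(":memory:")
    try:
        # SQLite accepts these type names, but only as loose type affinities.
        conn.execute(
            "CREATE TABLE time_series_data ("
            "timestamp TIMESTAMP, value DOUBLE, location STRING, sensor_id INT)"
        )
        conn.executemany(
            "INSERT INTO time_series_data (timestamp, value, location, sensor_id) "
            "VALUES (?, ?, ?, ?)",
            ROWS,
        )
        (avg,) = conn.execute(
            "SELECT AVG(value) FROM time_series_data WHERE sensor_id = ?",
            (sensor_id,),
        ).fetchone()
        return avg
    finally:
        conn.close()


if __name__ == "__main__":
    print(average_for_sensor(1))
```

The result for sensor 1 should be close to the 17.03 reported in the expected output; the last floating-point digits may differ between database engines.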
| 46 | + |
| 47 | +## Python Project Setup |
| 48 | + |
| 49 | +Create and activate a virtual environment: |
| 50 | + |
| 51 | +```bash |
| 52 | +python3 -m venv .venv |
| 53 | +source .venv/bin/activate |
| 54 | +``` |
| 55 | + |
| 56 | +Install the dependencies: |
| 57 | + |
| 58 | +```bash |
| 59 | +pip install -r requirements.txt |
| 60 | +``` |
| 61 | + |
| 62 | +## Configure Your Environment |
| 63 | + |
| 64 | +To configure your environment, copy the provided [`env.example`](./env.example) file to a new file named `.env`, then open it with a text editor. |
| 65 | + |
| 66 | +Set the values in the file as follows: |
| 67 | + |
| 68 | +``` |
| 69 | +OPENAI_API_KEY=<Your key from Azure> |
| 70 | +OPENAI_API_TYPE=azure |
| 71 | +OPENAI_AZURE_ENDPOINT=https://<Your endpoint from Azure e.g. myendpoint.openai.azure.com> |
| 72 | +OPENAI_AZURE_API_VERSION=2024-08-01-preview |
| 73 | +LLM_INSTANCE=<The name of your GPT-3.5 Turbo instance from Azure> |
| 74 | +EMBEDDING_MODEL_INSTANCE=<The name of your text-embedding-ada-002 instance from Azure> |
| 75 | +CRATEDB_URL="crate://<Database user name>:<Database password>@<Database host>:4200/?ssl=true" |
| 76 | +CRATEDB_TABLE_NAME=time_series_data |
| 77 | +``` |
| 78 | + |
| 79 | +Save your changes. |
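Two things commonly go wrong with an environment file like this: a key is left empty, or the database password contains characters such as `@`, `:` or `/` that break the connection URL. The sketch below shows one way to guard against both using only the standard library. The helper names `build_cratedb_url` and `missing_keys` and the host `example.cratedb.net` are hypothetical, and it assumes that credentials in the `crate://` URL need percent-encoding, as is usual for SQLAlchemy-style URLs.

```python
from urllib.parse import quote

# Keys listed in the environment file above.
REQUIRED_KEYS = [
    "OPENAI_API_KEY",
    "OPENAI_API_TYPE",
    "OPENAI_AZURE_ENDPOINT",
    "OPENAI_AZURE_API_VERSION",
    "LLM_INSTANCE",
    "EMBEDDING_MODEL_INSTANCE",
    "CRATEDB_URL",
    "CRATEDB_TABLE_NAME",
]


def build_cratedb_url(user: str, password: str, host: str, port: int = 4200) -> str:
    """Assemble the CRATEDB_URL value, percent-encoding the credentials."""
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return f"crate://{safe_user}:{safe_password}@{host}:{port}/?ssl=true"


def missing_keys(env) -> list:
    """Return the required keys that are absent or empty in a mapping."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    # Hypothetical credentials and host, for illustration only.
    print(build_cratedb_url("admin", "p@ss:w/rd", "example.cratedb.net"))
    print(missing_keys({"OPENAI_API_TYPE": "azure"}))
```

Once the `.env` file has been loaded into the process environment, calling `missing_keys(os.environ)` gives a quick list of anything still unset.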
| 80 | + |
| 81 | +## Run the Code |
| 82 | + |
| 83 | +Run the code: |
| 84 | + |
| 85 | +```bash |
| 86 | +python main.py |
| 87 | +``` |
| 88 | + |
| 89 | +Here's the expected output (the `Doc id` value will differ between runs): |
| 90 | + |
| 91 | +``` |
| 92 | +Creating SQLAlchemy engine... |
| 93 | +Connecting to CrateDB... |
| 94 | +Creating SQLDatabase instance... |
| 95 | +Creating QueryEngine... |
| 96 | +Running query... |
| 97 | +> Source (Doc id: b2b0afac-6fb6-4674-bc80-69941a8c10a5): [(17.033333333333335,)] |
| 98 | +Query was: What is the average value for sensor 1? |
| 99 | +Answer was: The average value for sensor 1 is 17.033333333333335. |
| 100 | +{ |
| 101 | + 'b2b0afac-6fb6-4674-bc80-69941a8c10a5': { |
| 102 | + 'sql_query': 'SELECT AVG(value) FROM time_series_data WHERE sensor_id = 1', |
| 103 | + 'result': [ |
| 104 | + (17.033333333333335,) |
| 105 | + ], |
| 106 | + 'col_keys': [ |
| 107 | + 'avg(value)' |
| 108 | + ] |
| 109 | + }, |
| 110 | + 'sql_query': 'SELECT AVG(value) FROM time_series_data WHERE sensor_id = 1', |
| 111 | + 'result': [ |
| 112 | + (17.033333333333335,) |
| 113 | + ], |
| 114 | + 'col_keys': [ |
| 115 | + 'avg(value)' |
| 116 | + ] |
| 117 | +} |
| 118 | +``` |
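The dictionary printed at the end of the expected output carries the generated SQL, the raw result rows and the column names, both under the document ID and at the top level. As a minimal sketch of how to read it, the example below copies that dictionary literally and pairs each column name with its value from the first result row. The `summarize` helper is hypothetical and assumes the top-level `sql_query`, `result` and `col_keys` entries are present, as they are in the output shown.

```python
# Dictionary copied from the expected output above.
metadata = {
    "b2b0afac-6fb6-4674-bc80-69941a8c10a5": {
        "sql_query": "SELECT AVG(value) FROM time_series_data WHERE sensor_id = 1",
        "result": [(17.033333333333335,)],
        "col_keys": ["avg(value)"],
    },
    "sql_query": "SELECT AVG(value) FROM time_series_data WHERE sensor_id = 1",
    "result": [(17.033333333333335,)],
    "col_keys": ["avg(value)"],
}


def summarize(meta: dict) -> tuple:
    """Return the generated SQL and a column-name-to-value map for the first row."""
    first_row = meta["result"][0]
    return meta["sql_query"], dict(zip(meta["col_keys"], first_row))


if __name__ == "__main__":
    sql, columns = summarize(metadata)
    print(sql)
    print(columns)
```

Checking the generated SQL this way is a cheap guard against an answer that reads well but was computed from the wrong query.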