|
1 | 1 | # OpenAI embeddings example application |
2 | 2 |
|
3 | | -## Overview |
| 3 | +This is a small example Node.js/Express application that demonstrates how to |
| 4 | +integrate Elastic and OpenAI. |
4 | 5 |
|
5 | | -Small example Node.js/Express.js application to demonstrate how to integrate Elastic and OpenAI. |
| 6 | +The application has two components: |
| 7 | +* [generate](generate_embeddings.js) |
| 8 | + * Generates embeddings for [sample_data](sample_data/medicare.json) and |
| 9 | + stores them in Elasticsearch. |
| 10 | +* [app](search_app.js) |
| 11 | + * Runs the web service which hosts the [web frontend](views) and the |
| 12 | + search API. |
| 13 | +* Both scripts use the [Elasticsearch](https://github.com/elastic/elasticsearch-js) and [OpenAI](https://github.com/openai/openai-node) JavaScript clients. |
6 | 14 |
|
7 | | -This folder includes two files: |
| 15 | + |
8 | 16 |
|
9 | | -- `generate_embeddings.js`: Processes a JSON file, generates text embeddings for each document in the file using OpenAI's API, and then stores the documents and their corresponding embeddings in an Elasticsearch index. |
10 | | -- `search_app.js`: A tiny Express.js web app that renders a search bar, generates embeddings for search queries, and performs semantic search using Elasticsearch's [kNN search](https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html). It retrieves the search results and returns a list of hits, ranked by relevance. |
| 17 | +## Download the Project |
11 | 18 |
|
12 | | -Both scripts use the [Elasticsearch](https://github.com/elastic/elasticsearch-js) and [OpenAI](https://github.com/openai/openai-node) JavaScript clients. |
| 19 | +Download the project from GitHub and extract the `openai-embeddings` folder. |
| 20 | + |
| 21 | +```bash |
| 22 | +curl https://codeload.github.com/elastic/elasticsearch-labs/tar.gz/main | \ |
| 23 | +tar -xz --strip=2 elasticsearch-labs-main/example-apps/openai-embeddings |
| 24 | +``` |
13 | 25 |
|
14 | | -## Requirements |
| 26 | +## Make your .env file |
15 | 27 |
|
16 | | -- Node.js 16+ |
| 28 | +Copy [env.example](env.example) to `.env` and fill in values noted inside. |
17 | 29 |
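A quick pre-flight check can catch missing settings before either script runs. The sketch below is illustrative only: it assumes the `.env` values have already been loaded into `process.env`, the names in `REQUIRED_VARS` are assumptions (the authoritative list is in `env.example`), and `missingEnvVars` is a hypothetical helper that is not part of the application.

```javascript
// Hypothetical list of required settings; check env.example for the real names.
const REQUIRED_VARS = ["OPENAI_API_KEY", "EMBEDDINGS_MODEL"];

// Return the names of required variables that are unset or blank.
function missingEnvVars(env, required = REQUIRED_VARS) {
  return required.filter((name) => !env[name] || env[name].trim() === "");
}

// Warn (rather than exit) so the check stays non-intrusive.
const missing = missingEnvVars(process.env);
if (missing.length > 0) {
  console.warn(`Missing .env values: ${missing.join(", ")}`);
}
```

Passing the environment in as an argument keeps the helper easy to test with a plain object.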
|
18 | | -## Setup |
| 30 | +## Installing and connecting to Elasticsearch |
19 | 31 |
|
20 | | -This section will walk you through the steps for setting up and using the application from scratch. |
21 | | -(Skip the first steps if you already have an Elastic deployment and OpenAI account/API key.) |
| 32 | +There are a number of ways to install Elasticsearch. Cloud is best for most |
| 33 | +use-cases. Visit the [Install Elasticsearch](https://www.elastic.co/search-labs/tutorials/install-elasticsearch) tutorial for more information. |
22 | 34 |
|
23 | | -### 1. Download the Project |
| 35 | +Once you have decided on your approach, edit your `.env` file accordingly. |
24 | 36 |
|
25 | | -Download the project from Github and extract the `openai-embeddings` folder. |
| 37 | +### Running your own Elastic Stack with Docker |
| 38 | + |
| 39 | +If you'd like to start Elastic locally, you can use the provided |
| 40 | +[docker-compose-elastic.yml](docker-compose-elastic.yml) file. This starts |
| 41 | +Elasticsearch, Kibana, and APM Server and only requires Docker installed. |
| 42 | + |
| 43 | +Use Docker Compose to run the Elastic Stack in the background: |
26 | 44 |
|
27 | 45 | ```bash |
28 | | -curl https://codeload.github.com/elastic/elasticsearch-labs/tar.gz/main | \ |
29 | | -tar -xz --strip=2 elasticsearch-labs-main/example-apps/openai-embeddings |
| 46 | +docker compose -f docker-compose-elastic.yml up --force-recreate -d |
30 | 47 | ``` |
31 | 48 |
|
32 | | -### 2. Create OpenAI account and API key |
| 49 | +Then, you can view Kibana at http://localhost:5601/app/home#/ |
33 | 50 |
|
34 | | -- Go to https://platform.openai.com/ and sign up |
35 | | -- Generate an API key and make note of it |
| 51 | +If asked for a username and password, use username: elastic and password: elastic. |
36 | 52 |
|
37 | | - |
| 53 | +Clean up when finished, like this: |
38 | 54 |
|
39 | | -### 3. Create Elastic Cloud account and credentials |
| 55 | +```bash |
| 56 | +docker compose -f docker-compose-elastic.yml down |
| 57 | +``` |
40 | 58 |
|
41 | | -- [Sign up](https://cloud.elastic.co/registration?onboarding_token=vectorsearch&utm_source=github&utm_content=elasticsearch-labs-samples) for a Elastic cloud account |
42 | | -- Make note of the master username/password shown to you during creation of the deployment |
43 | | -- Make note of the Elastic Cloud ID after the deployment |
| 59 | +## Running the App |
44 | 60 |
|
45 | | - |
| 61 | +There are two ways to run the app: via Docker or locally. Docker is the easier |
| 62 | +option, while running locally is better if you are making changes to the application. |
46 | 63 |
|
47 | | - |
| 64 | +### Run with docker |
48 | 65 |
|
49 | | -### 4. Install Node dependencies |
| 66 | +Docker Compose is the easiest way, as a single step will: |
| 67 | +* generate embeddings and store them in Elasticsearch |
| 68 | +* run the app, which listens on http://localhost:3000 |
50 | 69 |
|
51 | | -```sh |
52 | | -npm install |
| 70 | +**Double-check you have a `.env` file with all your variables set first!** |
| 71 | + |
| 72 | +```bash |
| 73 | +docker compose up --build --force-recreate |
53 | 74 | ``` |
54 | 75 |
|
55 | | -### 5. Set environment variables |
| 76 | +Clean up when finished, like this: |
56 | 77 |
|
57 | | -```sh |
58 | | -export ELASTIC_CLOUD_ID=<your Elastic cloud ID> |
59 | | -export ELASTIC_USERNAME=<your Elastic username> |
60 | | -export ELASTIC_PASSWORD=<your Elastic password> |
61 | | -export OPENAI_API_KEY=<your OpenAI API key> |
| 78 | +```bash |
| 79 | +docker compose down |
62 | 80 | ``` |
63 | 81 |
|
64 | | -### 6. Generate embeddings and index documents |
| 82 | +### Run locally |
65 | 83 |
|
66 | | -```sh |
67 | | -npm run generate |
| 84 | +First, set up a Node.js environment for the example like this: |
68 | 85 |
|
69 | | -Connecting to Elastic Cloud: my-openai-integration-test:dXMt(...) |
70 | | -(node:95956) ExperimentalWarning: stream/web is an experimental feature. This feature could change at any time |
71 | | -(Use `node --trace-warnings ...` to show where the warning was created) |
72 | | -Reading from file sample_data/medicare.json |
73 | | -Processing 12 documents... |
74 | | -Processing batch of 10 documents... |
75 | | -Calling OpenAI API for 10 embeddings with model text-embedding-ada-002 |
76 | | -Indexing 10 documents to index openai-integration... |
77 | | -Processing batch of 2 documents... |
78 | | -Calling OpenAI API for 2 embeddings with model text-embedding-ada-002 |
79 | | -Indexing 2 documents to index openai-integration... |
80 | | -Processing complete |
| 86 | +```bash |
| 87 | +nvm use --lts # or similar to set up Node.js v20 or later |
| 88 | +npm install |
81 | 89 | ``` |
82 | 90 |
|
83 | | -_**Note**: the example application uses the `text-embedding-ada-002` OpenAI model for generating the embeddings, which provides a 1536-dimensional vector output. See [this section](#using-a-different-openai-model) if you want to use a different model._ |
84 | | - |
85 | | -### 7. Launch web app |
| 91 | +**Double-check you have a `.env` file with all your variables set first!** |
86 | 92 |
|
87 | | -```sh |
88 | | -npm run app |
| 93 | +#### Run the generate command |
89 | 94 |
|
90 | | -Connecting to Elastic Cloud: my-openai-integration-test:dXMt(...) |
91 | | -(node:96017) ExperimentalWarning: stream/web is an experimental feature. This feature could change at any time |
92 | | -(Use `node --trace-warnings ...` to show where the warning was created) |
93 | | -Express app listening on port 3000 |
| 95 | +First, ingest the data into Elasticsearch: |
| 96 | +```bash |
| 97 | +npm run generate |
94 | 98 | ``` |
95 | 99 |
|
96 | | -### 8. Run semantic search in the web app |
97 | | - |
98 | | -- Open http://localhost:3000 in your browser |
99 | | -- Enter a search query and press Search |
| 100 | +#### Run the app |
100 | 101 |
|
101 | | - |
| 102 | +Now, run the app, which listens on http://localhost:3000 |
| 103 | +```bash |
| 104 | +npm run app |
| 105 | +``` |
102 | 106 |
|
103 | | -## Customize configuration |
| 107 | +## Advanced |
104 | 108 |
|
105 | | -Here are some tips for modifying the code for your use case. For example, you might want to use your own sample data. |
| 109 | +Here are some tips for modifying the code for your use case. For example, you |
| 110 | +might want to use your own sample data. |
106 | 111 |
|
107 | 112 | ### Using a different source file or document mapping |
108 | 113 |
|
109 | 114 | - Ensure your file contains the documents in JSON format |
110 | | -- Modify the document mappings and fields in the `.js` files and in `views/search.hbs` |
111 | | -- Modify the initialization of `FILE` in `utils.js` |
| 115 | +- Modify the document mappings and fields in the `.js` files and in [views/search.hbs](views/search.hbs) |
| 116 | +- Modify the initialization of `FILE` in [utils.js](utils.js) |
112 | 117 |
|
113 | 118 | ### Using a different OpenAI model |
114 | 119 |
|
115 | | -- Modify the initialization of `MODEL` in `utils.js` |
116 | | -- Ensure that `embedding.dims` in your index mapping is the same number as the dimensions of the model's output |
| 120 | +- Modify `EMBEDDINGS_MODEL` in `.env` |
| 121 | +- Ensure that `embedding.dims` in your index mapping is the same number as the dimensions of the model's output. |
117 | 122 |
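To illustrate the `embedding.dims` requirement, here is a minimal sketch that derives the mapping from the model name. `MODEL_DIMS` and `buildEmbeddingMapping` are hypothetical names, not part of the application; the field name `embedding` follows the bullet above, and the `cosine` similarity is an assumption that may differ from the mapping the scripts actually create.

```javascript
// Default output dimensions of some OpenAI embedding models.
const MODEL_DIMS = {
  "text-embedding-ada-002": 1536,
  "text-embedding-3-small": 1536,
  "text-embedding-3-large": 3072,
};

// Build an index mapping whose dense_vector field matches the model's output size.
function buildEmbeddingMapping(model) {
  const dims = MODEL_DIMS[model];
  if (dims === undefined) {
    throw new Error(`Unknown model "${model}": add its dimensions to MODEL_DIMS`);
  }
  return {
    properties: {
      // "cosine" is an illustrative choice, not taken from the application.
      embedding: { type: "dense_vector", dims, index: true, similarity: "cosine" },
    },
  };
}
```

Failing fast on an unknown model avoids creating an index whose vector size silently disagrees with the embeddings being stored.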
|
118 | 123 | ### Using a different Elastic index |
119 | 124 |
|
120 | | -- Modify the initialization of `INDEX` in `utils.js` |
121 | | - |
122 | | -### Using a different method for authenticating with Elastic |
123 | | - |
124 | | -- Modify the initialization of `elasticsearchClient` in `utils.js` |
125 | | -- Refer to [this document](https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/client-connecting.html#authentication) about authentication schemes |
| 125 | +- Modify the initialization of `INDEX` in [utils.js](utils.js) |
126 | 126 |
|
127 | | -### Running on self-managed Elastic cluster |
| 127 | +### Using a different method to connect to Elastic |
128 | 128 |
|
129 | | -- Modify the initialization of `elasticsearchClient` in `utils.js` |
130 | | -- Refer to [this document](https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/client-connecting.html#connect-self-managed-new) about connecting to a self-managed cluster |
| 129 | +- Modify the initialization of `elasticsearchClient` in [utils.js](utils.js) |
| 130 | +- Refer to [this document](https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/client-connecting.html) for the available connection and authentication options |
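
As a rough sketch of what changing the connection method can involve, the helper below builds constructor options for the Elasticsearch JavaScript client from environment values. `buildClientOptions` is a hypothetical helper and every environment variable name in it is an assumption; the actual initialization in `utils.js` may look different.

```javascript
// Build constructor options for the Elasticsearch JavaScript client.
// All environment variable names here are illustrative assumptions.
function buildClientOptions(env) {
  // Prefer an API key when one is provided, otherwise fall back to basic auth.
  const auth = env.ELASTICSEARCH_API_KEY
    ? { apiKey: env.ELASTICSEARCH_API_KEY }
    : { username: env.ELASTICSEARCH_USER, password: env.ELASTICSEARCH_PASSWORD };

  // Elastic Cloud deployments are addressed by cloud ID.
  if (env.ELASTIC_CLOUD_ID) {
    return { cloud: { id: env.ELASTIC_CLOUD_ID }, auth };
  }
  // Self-managed or local clusters are addressed by URL.
  return { node: env.ELASTICSEARCH_URL || "http://localhost:9200", auth };
}

// Usage (requires the @elastic/elasticsearch package):
// const { Client } = require("@elastic/elasticsearch");
// const elasticsearchClient = new Client(buildClientOptions(process.env));
```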