Commit f3cf8a8

ashishtiwari1993 authored and miguelgrinberg committed
Adding vector search demo using Gemini Embeddings (elastic#155)
* Adding vector search demo using Gemini Embeddings
* Adding getpass() function to get credentials
* Added api_key authentication for Elasticsearch connection
* Added api_key authentication for Elasticsearch connection
1 parent 486bd74 commit f3cf8a8

1 file changed: 202 additions & 0 deletions
@@ -0,0 +1,202 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "c52e30d1-cb29-4e70-af4a-9c953fcb0f2e",
+   "metadata": {},
+   "source": [
+    "# Quickstart: Vector search using Gemini Embeddings and Elasticsearch\n",
+    "\n",
+    "This tutorial demonstrates how to use the [Gemini API](https://ai.google.dev/docs) to create [embeddings](https://ai.google.dev/docs/embeddings_guide) and store them in Elasticsearch. Elasticsearch then lets us run a k-nearest neighbor (kNN) vector search to find similar documents."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "88303061-f357-43d8-8b63-c4f79e9a1746",
+   "metadata": {},
+   "source": [
+    "## Setup\n",
+    "\n",
+    "* Elastic credentials - Create a [Cloud deployment](https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud) to get the Elastic credentials (`ELASTIC_CLOUD_ID`, `ELASTIC_API_KEY`).\n",
+    "\n",
+    "* `GOOGLE_API_KEY` - To use the Gemini API, you need an API key. Follow the [setup guide](https://ai.google.dev/tutorials/setup) to create a key with one click in Google AI Studio."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "76ca723c-6148-4682-a5ae-486e73cb2b94",
+   "metadata": {},
+   "source": [
+    "## Install packages"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ef1f1e52-f892-489f-8947-3e4698f5f5c3",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pip install -q -U google-generativeai elasticsearch"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3d86d3fa-4ca0-41b6-a4bc-81bacf26bf02",
+   "metadata": {},
+   "source": [
+    "## Import packages and credentials"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "bb62d8fb-6c34-44fd-bc94-18b644422ee8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import google.generativeai as genai\n",
+    "import google.ai.generativelanguage as glm\n",
+    "from elasticsearch import Elasticsearch, helpers\n",
+    "from getpass import getpass\n",
+    "\n",
+    "GOOGLE_API_KEY = getpass(\"Google API Key: \")\n",
+    "ELASTIC_API_KEY = getpass(\"Elastic API Key: \")\n",
+    "ELASTIC_CLOUD_ID = getpass(\"Elastic Cloud ID: \")\n",
+    "elastic_index_name = 'gemini-demo'"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8b22dc16-c0a0-48f0-979d-5d21c17bd264",
+   "metadata": {},
+   "source": [
+    "## Embedding generation\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ca56532d-7c82-4e2b-aecf-2173520d3696",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "genai.configure(api_key=GOOGLE_API_KEY)\n",
+    "\n",
+    "title = \"Climate in India\"\n",
+    "sample_text = (\"India generally experiences a hot summer from March to June, with temperatures often exceeding 40°C in central and northern regions. Monsoon season, from June to September, brings heavy rainfall, especially in the western coast and northeastern areas. Post-monsoon months, October and November, mark a transition with decreasing rainfall. Winter, from December to February, varies in temperature across the country, with colder conditions in the north and milder weather in the south. India's diverse climate is influenced by its geographical features, resulting in regional \")\n",
+    "\n",
+    "model = 'models/embedding-001'\n",
+    "embedding = genai.embed_content(model=model,\n",
+    "                                content=sample_text,\n",
+    "                                task_type=\"retrieval_document\",\n",
+    "                                title=title)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6239eda7-3bed-43dd-a6a8-a8369b907d5c",
+   "metadata": {},
+   "source": [
+    "## Connecting to Elasticsearch"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7cbade18-3049-46f1-8d3e-5b22d4aade5b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "es = Elasticsearch(\n",
+    "    cloud_id=ELASTIC_CLOUD_ID,\n",
+    "    api_key=ELASTIC_API_KEY\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "20d070c8-9e19-48a3-bc3b-5f22067eb63f",
+   "metadata": {},
+   "source": [
+    "## Indexing a document in Elasticsearch"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e02ca81e-7caa-4505-95c6-3c6be7843c8f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "doc = {\n",
+    "    'text': sample_text,\n",
+    "    'text_embedding': embedding['embedding']\n",
+    "}\n",
+    "\n",
+    "resp = es.index(index=elastic_index_name, document=doc)\n",
+    "\n",
+    "print(resp)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "afa0d371-afbf-4f98-9cd1-ee457839f323",
+   "metadata": {},
+   "source": [
+    "## Searching for documents with Elasticsearch"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d71eeacc-d0c8-4035-b052-a1c03300aec0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "q = \"How's weather in India?\"\n",
+    "\n",
+    "embedding = genai.embed_content(model=model,\n",
+    "                                content=q,\n",
+    "                                task_type=\"retrieval_query\")\n",
+    "\n",
+    "resp = es.search(\n",
+    "    index=elastic_index_name,\n",
+    "    knn={\n",
+    "        \"field\": \"text_embedding\",\n",
+    "        \"query_vector\": embedding['embedding'],\n",
+    "        \"k\": 10,\n",
+    "        \"num_candidates\": 100\n",
+    "    }\n",
+    ")\n",
+    "\n",
+    "\n",
+    "for result in resp['hits']['hits']:\n",
+    "    pretty_output = (f\"\\n\\nID: {result['_id']}\\n\\nText: {result['_source']['text']}\\n\\nEmbedding: {result['_source']['text_embedding']}\")\n",
+    "    print(pretty_output)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
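
Note on the index step: the notebook indexes the document without creating the index first, so the type of `text_embedding` is left to Elasticsearch's dynamic mapping. The sketch below shows one way the index could be created explicitly before indexing, and how the kNN clause used in the search cell could be built in one place. It is not part of the commit. The helper names (`build_index_mappings`, `build_knn_clause`, `create_index_if_missing`) are hypothetical, the `cosine` similarity is an assumption rather than something the notebook specifies, and an 8.x Python client is assumed. Only the field names and the `k` / `num_candidates` values come from the notebook.

```python
# Sketch only: helper names are hypothetical and not part of the commit.
# Assumes the elasticsearch 8.x Python client; "cosine" similarity is an assumption.


def build_index_mappings(dims, similarity="cosine"):
    """Return mappings declaring 'text_embedding' as an indexed dense_vector."""
    return {
        "properties": {
            "text": {"type": "text"},
            "text_embedding": {
                "type": "dense_vector",
                "dims": dims,  # must equal the length of the embedding vectors
                "index": True,
                "similarity": similarity,
            },
        }
    }


def build_knn_clause(query_vector, k=10, num_candidates=100):
    """Return the kNN clause used by the notebook's search cell."""
    if num_candidates < k:
        raise ValueError("num_candidates must be at least k")
    return {
        "field": "text_embedding",
        "query_vector": list(query_vector),
        "k": k,
        "num_candidates": num_candidates,
    }


def create_index_if_missing(es, index_name, dims):
    """Create the index with explicit mappings unless it already exists.

    `es` is expected to be an elasticsearch.Elasticsearch client.
    Returns True if the index was created, False if it was already there.
    """
    if es.indices.exists(index=index_name):
        return False
    es.indices.create(index=index_name, mappings=build_index_mappings(dims))
    return True


# Possible use inside the notebook (needs live credentials, so left as comments):
#   dims = len(embedding['embedding'])
#   create_index_if_missing(es, elastic_index_name, dims)
#   resp = es.search(index=elastic_index_name,
#                    knn=build_knn_clause(embedding['embedding']))
```

Taking `dims` from the length of a returned embedding avoids hard-coding a model-specific dimension.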
