| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "id": "c52e30d1-cb29-4e70-af4a-9c953fcb0f2e", |
| 6 | + "metadata": {}, |
| 7 | + "source": [ |
| 8 | + "# Quickstart: Vector search using Gemini Embeddings and Elasticsearch\n", |
| 9 | + "\n", |
| 10 | + "This tutorial demonstrates how to use the [Gemini API](https://ai.google.dev/docs) to create [embeddings](https://ai.google.dev/docs/embeddings_guide) and store them in Elasticsearch. Elasticsearch then lets us run a k-nearest neighbor (kNN) vector search to find similar documents." |
| 11 | + ] |
| 12 | + }, |
| 13 | + { |
| 14 | + "cell_type": "markdown", |
| 15 | + "id": "88303061-f357-43d8-8b63-c4f79e9a1746", |
| 16 | + "metadata": {}, |
| 17 | + "source": [ |
| 18 | + "## Setup\n", |
| 19 | + "\n", |
| 20 | + "* Elastic credentials - Create a [Cloud deployment](https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud) to get the Elastic credentials (`ELASTIC_CLOUD_ID`, `ELASTIC_API_KEY`).\n", |
| 21 | + "\n", |
| 22 | + "* `GOOGLE_API_KEY` - To use the Gemini API, you need an API key. Follow the [setup guide](https://ai.google.dev/tutorials/setup) to create a key with one click in Google AI Studio." |
| 23 | + ] |
| 24 | + }, |
| 25 | + { |
| 26 | + "cell_type": "markdown", |
| 27 | + "id": "76ca723c-6148-4682-a5ae-486e73cb2b94", |
| 28 | + "metadata": {}, |
| 29 | + "source": [ |
| 30 | + "## Install packages" |
| 31 | + ] |
| 32 | + }, |
| 33 | + { |
| 34 | + "cell_type": "code", |
| 35 | + "execution_count": null, |
| 36 | + "id": "ef1f1e52-f892-489f-8947-3e4698f5f5c3", |
| 37 | + "metadata": {}, |
| 38 | + "outputs": [], |
| 39 | + "source": [ |
| 40 | + "%pip install -q -U google-generativeai elasticsearch" |
| 41 | + ] |
| 42 | + }, |
| 43 | + { |
| 44 | + "cell_type": "markdown", |
| 45 | + "id": "3d86d3fa-4ca0-41b6-a4bc-81bacf26bf02", |
| 46 | + "metadata": {}, |
| 47 | + "source": [ |
| 48 | + "## Import packages and credentials" |
| 49 | + ] |
| 50 | + }, |
| 51 | + { |
| 52 | + "cell_type": "code", |
| 53 | + "execution_count": null, |
| 54 | + "id": "bb62d8fb-6c34-44fd-bc94-18b644422ee8", |
| 55 | + "metadata": {}, |
| 56 | + "outputs": [], |
| 57 | + "source": [ |
| 58 | + "import google.generativeai as genai\n", |
| 59 | + "import google.ai.generativelanguage as glm\n", |
| 60 | + "from elasticsearch import Elasticsearch, helpers\n", |
| 61 | + "from getpass import getpass\n", |
| 62 | + "\n", |
| 63 | + "GOOGLE_API_KEY = getpass(\"Google API Key: \")\n", |
| 64 | + "ELASTIC_API_KEY = getpass(\"Elastic API Key: \")\n", |
| 65 | + "ELASTIC_CLOUD_ID = getpass(\"Elastic Cloud ID: \")\n", |
| 66 | + "elastic_index_name = 'gemini-demo'" |
| 67 | + ] |
| 68 | + }, |
| 69 | + { |
| 70 | + "cell_type": "markdown", |
| 71 | + "id": "8b22dc16-c0a0-48f0-979d-5d21c17bd264", |
| 72 | + "metadata": {}, |
| 73 | + "source": [ |
| 74 | + "## Embedding generation\n", |
| 75 | + "\n" |
| 76 | + ] |
| 77 | + }, |
| 78 | + { |
| 79 | + "cell_type": "code", |
| 80 | + "execution_count": null, |
| 81 | + "id": "ca56532d-7c82-4e2b-aecf-2173520d3696", |
| 82 | + "metadata": {}, |
| 83 | + "outputs": [], |
| 84 | + "source": [ |
| 85 | + "genai.configure(api_key=GOOGLE_API_KEY)\n", |
| 86 | + "\n", |
| 87 | + "title = \"Climate in India\"\n", |
| 88 | + "sample_text = (\"India generally experiences a hot summer from March to June, with temperatures often exceeding 40°C in central and northern regions. Monsoon season, from June to September, brings heavy rainfall, especially in the western coast and northeastern areas. Post-monsoon months, October and November, mark a transition with decreasing rainfall. Winter, from December to February, varies in temperature across the country, with colder conditions in the north and milder weather in the south. India's diverse climate is influenced by its geographical features, resulting in regional \")\n", |
| 89 | + "\n", |
| 90 | + "model = 'models/embedding-001'\n", |
| 91 | + "embedding = genai.embed_content(model=model,\n", |
| 92 | + " content=sample_text,\n", |
| 93 | + " task_type=\"retrieval_document\",\n", |
| 94 | + " title=title)\n" |
| 95 | + ] |
| 96 | + }, |
| 97 | + { |
| 98 | + "cell_type": "markdown", |
| 99 | + "id": "6239eda7-3bed-43dd-a6a8-a8369b907d5c", |
| 100 | + "metadata": {}, |
| 101 | + "source": [ |
| 102 | + "## Connecting to Elasticsearch" |
| 103 | + ] |
| 104 | + }, |
| 105 | + { |
| 106 | + "cell_type": "code", |
| 107 | + "execution_count": null, |
| 108 | + "id": "7cbade18-3049-46f1-8d3e-5b22d4aade5b", |
| 109 | + "metadata": {}, |
| 110 | + "outputs": [], |
| 111 | + "source": [ |
| 112 | + "es = Elasticsearch(\n", |
| 113 | + "    cloud_id=ELASTIC_CLOUD_ID,\n", |
| 114 | + "    api_key=ELASTIC_API_KEY\n", |
| 115 | + ")" |
| 116 | + ] |
| 117 | + }, |
| 118 | + { |
| 119 | + "cell_type": "markdown", |
| 120 | + "id": "20d070c8-9e19-48a3-bc3b-5f22067eb63f", |
| 121 | + "metadata": {}, |
| 122 | + "source": [ |
| 123 | + "## Indexing a document with Elasticsearch" |
| 124 | + ] |
| 125 | + }, |
| 126 | + { |
| 127 | + "cell_type": "code", |
| 128 | + "execution_count": null, |
| 129 | + "id": "e02ca81e-7caa-4505-95c6-3c6be7843c8f", |
| 130 | + "metadata": {}, |
| 131 | + "outputs": [], |
| 132 | + "source": [ |
| 133 | + "doc = {\n", |
| 134 | + "    'text': sample_text,\n", |
| 135 | + "    'text_embedding': embedding['embedding']\n", |
| 136 | + "}\n", |
| 137 | + "\n", |
| 138 | + "resp = es.index(index=elastic_index_name, document=doc)\n", |
| 139 | + "\n", |
| 140 | + "print(resp)" |
| 141 | + ] |
| 142 | + }, |
| 143 | + { |
| 144 | + "cell_type": "markdown", |
| 145 | + "id": "afa0d371-afbf-4f98-9cd1-ee457839f323", |
| 146 | + "metadata": {}, |
| 147 | + "source": [ |
| 148 | + "## Searching for documents with Elasticsearch" |
| 149 | + ] |
| 150 | + }, |
| 151 | + { |
| 152 | + "cell_type": "code", |
| 153 | + "execution_count": null, |
| 154 | + "id": "d71eeacc-d0c8-4035-b052-a1c03300aec0", |
| 155 | + "metadata": {}, |
| 156 | + "outputs": [], |
| 157 | + "source": [ |
| 158 | + "q = \"How's weather in India?\"\n", |
| 159 | + "\n", |
| 160 | + "embedding = genai.embed_content(model=model,\n", |
| 161 | + " content=q,\n", |
| 162 | + " task_type=\"retrieval_query\")\n", |
| 163 | + "\n", |
| 164 | + "resp = es.search(\n", |
| 165 | + "    index=elastic_index_name,\n", |
| 166 | + "    knn={\n", |
| 167 | + "        \"field\": \"text_embedding\",\n", |
| 168 | + "        \"query_vector\": embedding['embedding'],\n", |
| 169 | + "        \"k\": 10,\n", |
| 170 | + "        \"num_candidates\": 100\n", |
| 171 | + "    }\n", |
| 172 | + ")\n", |
| 173 | + "\n", |
| 174 | + "\n", |
| 175 | + "for result in resp['hits']['hits']:\n", |
| 176 | + "    pretty_output = (f\"\\n\\nID: {result['_id']}\\n\\nText: {result['_source']['text']}\\n\\nEmbedding: {result['_source']['text_embedding']}\")\n", |
| 177 | + "    print(pretty_output)" |
| 178 | + ] |
| 179 | + } |
| 180 | + ], |
| 181 | + "metadata": { |
| 182 | + "kernelspec": { |
| 183 | + "display_name": "Python 3 (ipykernel)", |
| 184 | + "language": "python", |
| 185 | + "name": "python3" |
| 186 | + }, |
| 187 | + "language_info": { |
| 188 | + "codemirror_mode": { |
| 189 | + "name": "ipython", |
| 190 | + "version": 3 |
| 191 | + }, |
| 192 | + "file_extension": ".py", |
| 193 | + "mimetype": "text/x-python", |
| 194 | + "name": "python", |
| 195 | + "nbconvert_exporter": "python", |
| 196 | + "pygments_lexer": "ipython3", |
| 197 | + "version": "3.11.6" |
| 198 | + } |
| 199 | + }, |
| 200 | + "nbformat": 4, |
| 201 | + "nbformat_minor": 5 |
| 202 | +} |