Merged
11 changes: 9 additions & 2 deletions docs/modules/colbert/pages/index.adoc
@@ -21,10 +21,17 @@ To get started using ColBERT with RAGStack and Astra DB, see the xref:examples:c

The `colbert` module provides a vanilla implementation for ColBERT retrieval. It is not tied to any specific framework and can be used with any of the RAGStack packages.

To install the `ragstack-ai-colbert` package:

[source,bash]
----
pip install ragstack-ai-colbert
----

To use ColBERT with LangChain or LlamaIndex, install the corresponding package with the `colbert` extra:

* `pip install "ragstack-ai-langchain[colbert]"`
* `pip install "ragstack-ai-llamaindex[colbert]"`
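
To check which of these distributions are installed in the current environment, a short standard-library sketch like the following can help. It calls `importlib.metadata.version`; the helper name `installed_version` is hypothetical and not part of RAGStack.

[source,python]
----
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; not part of RAGStack.
    """
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Distribution names taken from the pip commands on this page.
for name in (
    "ragstack-ai-colbert",
    "ragstack-ai-langchain",
    "ragstack-ai-llamaindex",
):
    print(f"{name}: {installed_version(name) or 'not installed'}")
----

`version()` reports only the base distribution, so this check does not show whether the `colbert` extra was requested.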

== How is ColBERT different from RAG?

102 changes: 65 additions & 37 deletions docs/modules/examples/pages/colbert.adoc
@@ -15,48 +15,55 @@ Use ColBERT, Astra DB, and RAGStack to:

For more information, see xref:colbert:index.adoc[].

== Prerequisites

[NOTE]
====
To run ragstack-ai-colbert in a Windows environment, use https://learn.microsoft.com/en-us/windows/wsl/install[Windows Subsystem for Linux].
====


. Install dependencies:
+
[source,bash]
----
pip install ragstack-ai-colbert python-dotenv
----
+
. Create a `.env` file in your application directory with the following environment variables:
+
[source,bash]
----
ASTRA_DB_APPLICATION_TOKEN=AstraCS: ...
ASTRA_DB_ID=2eab82dc-9032-45ba-aeb0-a481b6f9458d
----
+
[NOTE]
====
In an Astra API endpoint like `https://2eab82dc-9032-45ba-aeb0-a481b6f9458d-us-east-1.apps.astra.datastax.com`, the `ASTRA_DB_ID` is `2eab82dc-9032-45ba-aeb0-a481b6f9458d`.
====
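
If you only have the API endpoint at hand, the database ID can be extracted from its hostname programmatically. The sketch below assumes the endpoint has the `https://<database-id>-<region>.apps.astra.datastax.com` shape shown in the note, with the ID being the leading UUID; the function name `database_id_from_endpoint` is hypothetical.

[source,python]
----
import uuid
from urllib.parse import urlparse


def database_id_from_endpoint(endpoint: str) -> str:
    """Return the database ID embedded in an Astra API endpoint.

    Hypothetical helper. Assumes the hostname begins with the 36-character
    database UUID, followed by "-<region>.apps.astra.datastax.com".
    """
    hostname = urlparse(endpoint).hostname or ""
    candidate = hostname[:36]  # canonical UUID strings are 36 characters long
    # uuid.UUID raises ValueError if the prefix is not a well-formed UUID.
    return str(uuid.UUID(candidate))


endpoint = "https://2eab82dc-9032-45ba-aeb0-a481b6f9458d-us-east-1.apps.astra.datastax.com"
print(database_id_from_endpoint(endpoint))  # prints 2eab82dc-9032-45ba-aeb0-a481b6f9458d
----

Validating the prefix with `uuid.UUID` makes a malformed or non-Astra endpoint fail loudly instead of silently producing a wrong ID.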

== Prepare data and create embeddings

. Import dependencies and load environment variables.
+
[source,python]
----
import os
import logging
import nest_asyncio
from dotenv import load_dotenv
from ragstack_colbert import CassandraDatabase, ColbertEmbeddingModel, ColbertVectorStore

load_dotenv()
----
+
. Set up the ColBERT and Astra configurations.
+
[source,python]
----
keyspace = "default_keyspace"
database_id = os.getenv("ASTRA_DB_ID")
astra_token = os.getenv("ASTRA_DB_APPLICATION_TOKEN")

database = CassandraDatabase.from_astra(
    astra_token=astra_token,
@@ -72,6 +79,24 @@ vector_store = ColbertVectorStore(
)
----
+
. Prepare documents for chunking.
+
[source,python]
----
arctic_botany_dict = {
"Introduction to Arctic Botany": "Arctic botany is the study of plant life in the Arctic, a region characterized by extreme cold, permafrost, and minimal sunlight for much of the year. Despite these harsh conditions, a diverse range of flora thrives here, adapted to survive with minimal water, low temperatures, and high light levels during the summer. This introduction aims to shed light on the resilience and adaptation of Arctic plants, setting the stage for a deeper dive into the unique botanical ecosystem of the Arctic.",
"Arctic Plant Adaptations": "Plants in the Arctic have developed unique adaptations to endure the extreme climate. Perennial growth, antifreeze proteins, and a short growth cycle are among the evolutionary solutions. These adaptations not only allow the plants to survive but also to reproduce in short summer months. Arctic plants often have small, dark leaves to absorb maximum sunlight, and some species grow in cushion or mat forms to resist cold winds. Understanding these adaptations provides insights into the resilience of Arctic flora.",
"The Tundra Biome": "The Arctic tundra is a vast, treeless biome where the subsoil is permanently frozen. Here, the vegetation is predominantly composed of dwarf shrubs, grasses, mosses, and lichens. The tundra supports a surprisingly rich biodiversity, adapted to its cold, dry, and windy conditions. The biome plays a crucial role in the Earth's climate system, acting as a carbon sink. However, it's sensitive to climate change, with thawing permafrost and shifting vegetation patterns.",
"Arctic Plant Biodiversity": "Despite the challenging environment, the Arctic boasts a significant variety of plant species, each adapted to its niche. From the colorful blooms of Arctic poppies to the hardy dwarf willows, these plants form a complex ecosystem. The biodiversity of Arctic flora is vital for local wildlife, providing food and habitat. This diversity also has implications for Arctic peoples, who depend on certain plant species for food, medicine, and materials.",
"Climate Change and Arctic Flora": "Climate change poses a significant threat to Arctic botany, with rising temperatures, melting permafrost, and changing precipitation patterns. These changes can lead to shifts in plant distribution, phenology, and the composition of the Arctic flora. Some species may thrive, while others could face extinction. This dynamic is critical to understanding future Arctic ecosystems and their global impact, including feedback loops that may exacerbate global warming.",
"Research and Conservation in the Arctic": "Research in Arctic botany is crucial for understanding the intricate balance of this ecosystem and the impacts of climate change. Scientists conduct studies on plant physiology, genetics, and ecosystem dynamics. Conservation efforts are focused on protecting the Arctic's unique biodiversity through protected areas, sustainable management practices, and international cooperation. These efforts aim to preserve the Arctic flora for future generations and maintain its role in the global climate system.",
"Traditional Knowledge and Arctic Botany": "Indigenous peoples of the Arctic have a deep connection with the land and its plant life. Traditional knowledge, passed down through generations, includes the uses of plants for nutrition, healing, and materials. This body of knowledge is invaluable for both conservation and understanding the ecological relationships in Arctic ecosystems. Integrating traditional knowledge with scientific research enriches our comprehension of Arctic botany and enhances conservation strategies.",
"Future Directions in Arctic Botanical Studies": "The future of Arctic botany lies in interdisciplinary research, combining traditional knowledge with modern scientific techniques. As the Arctic undergoes rapid changes, understanding the ecological, cultural, and climatic dimensions of Arctic flora becomes increasingly important. Future research will need to address the challenges of climate change, explore the potential for Arctic plants in biotechnology, and continue to conserve this unique biome. The resilience of Arctic flora offers lessons in adaptation and survival relevant to global challenges."
}

arctic_botany_texts = list(arctic_botany_dict.values())
----
+
. Connect to Astra and ingest embeddings.
+
[source,python]
@@ -94,40 +119,40 @@ Python::
+
[source,python]
----
nest_asyncio.apply()  # allow nested event loops, e.g. when running in a Jupyter notebook

logging.getLogger("cassandra").setLevel(logging.ERROR)  # workaround to suppress verbose Cassandra driver logs
retriever = vector_store.as_retriever()

answers = retriever.text_search("What's arctic botany", k=2)
for rank, (answer, score) in enumerate(answers):
    print(f"Rank: {rank} Score: {score} Text: {answer.text}\n")
----

Result::
+
[source,plain]
----
#> QueryTokenizer.tensorize(batch_text[0], batch_background[0], bsize) ==
#> Input: . What's arctic botany, True, None
#> Output IDs: torch.Size([9]), tensor([ 101, 1, 2054, 1005, 1055, 2396, 2594, 17018, 102])
#> Output Mask: torch.Size([9]), tensor([1, 1, 1, 1, 1, 1, 1, 1, 1])

Rank: 0 Score: 5.266004428267479 Text: Arctic botany is the study of plant life in the Arctic, a region characterized by extreme cold, permafrost, and minimal sunlight for much of the year. Despite these harsh conditions, a diverse range of flora thrives here, adapted to survive with minimal water, low temperatures, and high light levels during the summer. This introduction aims to shed light on the resilience and adaptation of Arctic plants, setting the stage for a deeper dive into the unique botanical ecosystem of the Arctic.

Rank: 1 Score: 5.266004309058189 Text: Arctic botany is the study of plant life in the Arctic, a region characterized by extreme cold, permafrost, and minimal sunlight for much of the year. Despite these harsh conditions, a diverse range of flora thrives here, adapted to survive with minimal water, low temperatures, and high light levels during the summer. This introduction aims to shed light on the resilience and adaptation of Arctic plants, setting the stage for a deeper dive into the unique botanical ecosystem of the Arctic.
----
======
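
ColBERT ranks passages by late interaction: the query and each passage are encoded as one embedding per token, and a passage's score is the sum, over query tokens, of each query token's best similarity to any passage token (the MaxSim operator described in the ColBERT paper). The pure-Python sketch below illustrates that scoring rule on made-up 2-dimensional vectors. It is not the `ragstack_colbert` implementation; real ColBERT token embeddings come from the model and have many more dimensions.

[source,python]
----
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a non-zero vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Late-interaction score: for each query token, take its best-matching
    document token similarity, then sum over all query tokens."""
    q = [normalize(t) for t in query_tokens]
    d = [normalize(t) for t in doc_tokens]
    return sum(max(dot(qt, dt) for dt in d) for qt in q)


# Toy 2-D "token embeddings", made up for illustration only.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # matches both query tokens well
doc_b = [[1.0, 0.0], [1.0, 0.1]]  # matches only the first query token well

ranked = sorted(
    [("doc_a", maxsim_score(query, doc_a)), ("doc_b", maxsim_score(query, doc_b))],
    key=lambda pair: pair[1],
    reverse=True,
)
for rank, (name, score) in enumerate(ranked):
    print(f"Rank: {rank} Score: {score:.3f} Doc: {name}")
# Rank: 0 Score: 2.000 Doc: doc_a
# Rank: 1 Score: 1.100 Doc: doc_b
----

Because every query token picks its own best match, a passage scores highly only when it covers many of the query's tokens, rather than being judged by a single pooled embedding.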

== Retrieve embeddings with the LangChain retriever

Alternatively, use the ColBERT extra with the `ragstack-ai-langchain` package to retrieve documents.

. Install the RAGStack LangChain package with the ColBERT extra.
+
[source,bash]
----
pip install "ragstack-ai-langchain[colbert]"
----
+
. Run the LangChain retriever against the indexed embeddings.
@@ -145,7 +170,10 @@ lc_vector_store = LangchainColbertVectorStore(
    embedding_model=embedding_model,
)

docs = lc_vector_store.similarity_search(
    "what kind fish lives shallow coral reefs atlantic, india ocean, "
    "red sea, gulf of mexico, pacific, and arctic ocean"
)
print(f"first answer: {docs[0].page_content}")
----

2 changes: 1 addition & 1 deletion docs/modules/examples/pages/llama-astra.adoc
@@ -20,7 +20,7 @@ DB Access Token] with Database Administrator permissions.
Install the following dependencies:
[source,bash]
----
pip install ragstack-ai python-dotenv
----
See the https://docs.datastax.com/en/ragstack/docs/prerequisites.html[Prerequisites] page for more details.
