@dminnear-rh
Contributor

@sauagarwa This PR contains a lot of changes: besides adding subcharts for deploying SQL Server locally and for using Azure SQL Server as a RAG DB provider, it also cleans up the pattern.

I'll summarize the key changes:

  • Added an Ansible playbook and updated the Makefile to allow creating GPU machinesets on Azure
  • Removed unused code (the minio chart, additional cluster groups)
  • Added charts for Azure SQL Server and local SQL Server, and updated the secrets to accommodate them
  • Updated the number of Elasticsearch nodes, since the cluster would report as unhealthy without an additional node for the document embeddings
  • Added Azure-specific overrides to allow installing on GPUs with only 16 GB of VRAM, which are more compatible with the NVIDIA operator
  • Enabled the TGI server chart: we can now have one model deployed by the inference service on vLLM and an additional model deployed by the TGI server, so the default AWS installation can use two models
  • Models are now configured in only one place (values-global.yaml); use the values global.model.vllm and global.model.tgis to set the model used by the inference server and the TGI server, respectively
  • Updated the RAG DB embedding image to the one built from https://github.com/validatedpatterns-sandbox/vector-embedder, which allows more flexibility in configuring the env, provides better logging, makes it easier to configure sources from repos or URLs in the values file, and ships updated versions of langchain and the DB providers
  • Renamed the llm-serving-service chart to vllm-inference-service to make it clearer that this chart uses vLLM, as opposed to the HF TGI server in the tgis-server chart
  • Loosened the versions of the installed operators to better support more versions of OpenShift
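
For illustration, the single-place model configuration might look roughly like the fragment below. This is a sketch, not the actual file: the nesting is inferred from the key paths global.model.vllm and global.model.tgis, the vLLM model ID is a hypothetical placeholder, and placing the embedding key under the same block is an assumption. The tgis and embedding values are the ones visible in the diff excerpt quoted in the review on this PR.

```yaml
# values-global.yaml (illustrative fragment; nesting assumed from the key paths)
global:
  model:
    # Model served by the vLLM inference service.
    # Hypothetical placeholder, not taken from the PR.
    vllm: "your-org/your-vllm-model"
    # Model served by the TGI server
    tgis: mistralai/Mistral-7B-Instruct-v0.3
    # Embedding model used by the RAG DB (placement under global.model is assumed)
    embedding: sentence-transformers/all-mpnet-base-v2
```

Keeping the model IDs in one file means switching models is a one-line change rather than an edit to each chart's values.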

With these changes, we are able to install the chart on ROSA 4.18 as well as the ARO 4.14 provided by our demo platform and everything comes up healthy and synced regardless of your RAG DB provider. This ensures CI will begin passing again (as it checks for out-of-sync and unhealthy applications in ArgoCD).

Please let me know if there's anything you want me to change, move into a separate PR, etc. and I'm happy to do so.

One other important thing: right now this PR uses quay.io/dminnear/gradio-tgi-multi-model-rag because the latest changes to the UI haven't been built and pushed to https://quay.io/repository/ecosystem-appeng/rag-llm-ui?tab=info. If we can get that image updated with the latest https://github.com/RHEcosystemAppEng/llm-on-openshift/tree/main/examples/ui/gradio/gradio-tgi-multi-model-rag-redis, I can revert to the proper image.

@dminnear-rh requested a review from sauagarwa on June 3, 2025, 16:32
tgis: mistralai/Mistral-7B-Instruct-v0.3
# Embedding model used by the RAG DB
embedding: sentence-transformers/all-mpnet-base-v2
tgisServer:
Collaborator

Are we deploying tgisServer?

@sauagarwa sauagarwa merged commit ab18e90 into validatedpatterns:main Jun 10, 2025
1 check passed