-
beir
A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
It's always worth checking out the MTEB leaderboard: https://huggingface.co/spaces/mteb/leaderboard
There are some good open models there that have longer context limits and fewer dimensions.
The benchmarks are just a guide. It's best to build a test dataset with your own data. This is a good example of that: https://github.com/beir-cellar/beir/wiki/Load-your-custom-da...
Another benefit of having your own test dataset is that it can grow as your data grows. And you can quickly test new models to see how they perform with YOUR data.
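For reference, a minimal sketch of what that evaluation can look like with beir, assuming your data is already in the BEIR layout from the wiki page above (corpus.jsonl, queries.jsonl, qrels/test.tsv); the model name is just an example:

    from beir.datasets.data_loader import GenericDataLoader
    from beir.retrieval import models
    from beir.retrieval.evaluation import EvaluateRetrieval
    from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

    # Load a custom dataset in the BEIR format (corpus.jsonl, queries.jsonl, qrels/test.tsv)
    corpus, queries, qrels = GenericDataLoader(data_folder="path/to/your_dataset").load(split="test")

    # Wrap any sentence-transformers embedding model and run exact dense retrieval
    model = DRES(models.SentenceBERT("sentence-transformers/all-MiniLM-L6-v2"), batch_size=64)
    retriever = EvaluateRetrieval(model, score_function="cos_sim")

    results = retriever.retrieve(corpus, queries)
    ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
    print(ndcg)  # nDCG@k on YOUR data; swap in candidate models from the MTEB leaderboard to compare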
-
Anyone who has recently worked on embedding model finetuning: any useful tools you'd recommend (both for dataset curation and the actual finetuning)? Any models you'd recommend as especially good for finetuning?
I'm interested in both full model finetunes and downstream matrix optimization as done in [1].
[1] https://github.com/openai/openai-cookbook/blob/main/examples...
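To clarify what I mean by the second part: my understanding of the approach in [1] is roughly the following, sketched in PyTorch rather than the cookbook's exact code. The base embedding model stays frozen and only a projection matrix is learned on top of precomputed embeddings; the dimensions, loss, and hyperparameters here are illustrative assumptions:

    import torch
    import torch.nn.functional as F

    dim_in, dim_out = 1536, 256  # e.g. project 1536-d embeddings down to 256 dims
    W = torch.randn(dim_in, dim_out, requires_grad=True)
    optimizer = torch.optim.Adam([W], lr=1e-3)

    def train_step(e1, e2, labels):
        """e1, e2: (batch, dim_in) precomputed embeddings; labels: 1.0 similar, -1.0 dissimilar."""
        sims = F.cosine_similarity(e1 @ W, e2 @ W)   # similarity in the projected space
        loss = F.mse_loss(sims, labels)              # push similar pairs together, dissimilar apart
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

At inference you multiply every stored embedding (and the query embedding) by the learned W before doing cosine search.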
-
directory-indexer
Directory Indexer MCP Server - A local MCP server for indexing your local directories into a knowledge base for your AI assistants.
I have been thinking about solving this problem. I think one of the reasons some AI assistants shine vs others is how well they can reduce the amount of context the LLM needs to work with using built-in tools. I think there's room to democratize these capabilities. One such capability is letting the LLM work directly with embeddings.
I wrote an MCP server, directory-indexer[1], for this (a self-hosted indexing MCP server). The goal is to index any directories you want your AI to know about and give it MCP tools to search through the embeddings. While agentic grep can be valuable, pre-processed embeddings have proven more useful for me when working with tons of files on similar topics (like customer cases or technical docs). One reason I really like it is that it democratizes my data and documents: I get consistent results across different AI assistants, instead of vastly different results depending on each assistant's built-in capabilities. Another is having access to your "knowledge" from any project you're on. Though since this is self-hosted, I use nomic-embed-text for the embeddings, which has been sufficient for most use cases.
[1] https://github.com/peteretelej/directory-indexer
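To give a concrete picture of the pattern (not the actual directory-indexer implementation, just a rough sketch of the core idea): embed local files with nomic-embed-text, here assumed to be served by a local Ollama instance, and answer queries by cosine similarity over the stored vectors. The paths, chunking, and endpoint are illustrative assumptions:

    import pathlib
    import numpy as np
    import requests

    OLLAMA_URL = "http://localhost:11434/api/embeddings"  # assumes a local Ollama instance

    def embed(text: str) -> np.ndarray:
        resp = requests.post(OLLAMA_URL, json={"model": "nomic-embed-text", "prompt": text})
        return np.array(resp.json()["embedding"])

    # Index: one embedding per file for brevity; a real indexer would chunk files first.
    docs = {p: embed(p.read_text(errors="ignore")[:2000])
            for p in pathlib.Path("docs/").rglob("*.md")}

    def search(query: str, k: int = 5):
        q = embed(query)
        scored = [(float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v))), str(p))
                  for p, v in docs.items()]
        return sorted(scored, reverse=True)[:k]

    print(search("customer escalation process"))

An MCP server then exposes something like that search() as a tool, so any assistant that speaks MCP gets the same results over the same index.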