Add task categories and prominent links to paper and code
#1
Opened by nielsr (HF Staff)
README.md CHANGED
@@ -1,11 +1,18 @@
 ---
-license: apache-2.0
 language:
 - en
+license: apache-2.0
 size_categories:
 - 1M<n<10M
+task_categories:
+- text-retrieval
+- feature-extraction
 ---
 
+# F2LLM Dataset
+
+[Paper](https://huggingface.co/papers/2510.02294) | [Code](https://github.com/codefuse-ai/CodeFuse-Embeddings/tree/main/F2LLM)
+
 The F2LLM dataset includes 6 million query-document-negative tuples curated solely from open-source, non-synthetic data, serving as a strong, budget-friendly baseline for training embedding models.
 
 ## Data Format