Add task categories and prominent links to paper and code

#1 opened by nielsr (HF Staff)
Files changed (1)
  1. README.md +8 -1
README.md CHANGED
@@ -1,11 +1,18 @@
 ---
-license: apache-2.0
 language:
 - en
+license: apache-2.0
 size_categories:
 - 1M<n<10M
+task_categories:
+- text-retrieval
+- feature-extraction
 ---
 
+# F2LLM Dataset
+
+[Paper](https://huggingface.co/papers/2510.02294) | [Code](https://github.com/codefuse-ai/CodeFuse-Embeddings/tree/main/F2LLM)
+
 The F2LLM dataset includes 6 million query-document-negative tuples curated solely from open-source, non-synthetic data, serving as a strong, budget-friendly baseline for training embedding models.
 
 ## Data Format
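The metadata part of this change is a small YAML front-matter block: `license` moves below `language`, and a new `task_categories` list (`text-retrieval`, `feature-extraction`) is added. As a rough illustration of what the updated block encodes, here is a minimal stdlib-only Python sketch that reads it back. `parse_front_matter` is a hypothetical helper written for this note, not a Hugging Face API. It assumes the flat layout used in this README (`key: value` scalars and `key:` followed by `- item` entries) and is not a general YAML parser; a real tool would use a YAML library.

```python
from __future__ import annotations


def parse_front_matter(readme: str) -> dict[str, str | list[str]]:
    """Hypothetical helper: read the flat YAML front matter of a README.

    Assumption: only `key: value` scalars and `key:` followed by `- item`
    lines appear, as in this dataset card. Not a general YAML parser.
    """
    lines = readme.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter block at the top of the file
    meta: dict[str, str | list[str]] = {}
    current_key = None  # key whose list items are currently being collected
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break  # closing delimiter of the front matter
        if not stripped:
            continue
        if stripped.startswith("- "):
            # List item: attach it to the most recent `key:` with no scalar value.
            target = meta.get(current_key)
            if isinstance(target, list):
                target.append(stripped[2:].strip())
            continue
        key, _, value = stripped.partition(":")
        key, value = key.strip(), value.strip()
        if value:
            meta[key] = value  # scalar, e.g. license: apache-2.0
            current_key = None
        else:
            meta[key] = []  # start of a list, e.g. task_categories:
            current_key = key
    return meta


# The front matter as it reads after this PR.
README = """---
language:
- en
license: apache-2.0
size_categories:
- 1M<n<10M
task_categories:
- text-retrieval
- feature-extraction
---

# F2LLM Dataset
"""

meta = parse_front_matter(README)
print(meta["task_categories"])  # ['text-retrieval', 'feature-extraction']
```

Reordering `license` does not change what the block means (YAML mappings are unordered); the functional additions are the `task_categories` entries and the paper/code links in the body.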