
Commit 684a424

committed: Update Text Representation (Embeddings).ipynb
1 parent 205578e commit 684a424

File tree

1 file changed: +30 −15 lines


Module 9 - GenAI (LLMs and Prompt Engineering)/1. Text Embeddings/Text Representation (Embeddings).ipynb

Lines changed: 30 additions & 15 deletions
@@ -6,19 +6,39 @@
    "metadata": {},
    "source": [
     "# **Text Representation aka Text Embeddings**\n",
-    "Text embeddings are a way to represent words or phrases as vectors in a high-dimensional space based on their contextual meaning within a corpus of text data. The idea is that if two phrases are similar, then the vectors that represent those phrases should be close together, and vice versa.\n",
     "\n",
     "### **What's Covered**\n",
-    "1. Introduction to Feature Extraction\n",
-    "2. Case Study - Identifying Relevant US Economy News Articles\n",
-    "3. Various Feature Representation Techniques\n",
+    "1. Why is NLP hard?\n",
+    "2. Introduction to Feature Extraction\n",
+    "3. Case Study - Identifying Relevant US Economy News Articles\n",
+    "4. Various Feature Representation Techniques\n",
     "    - Basic Vectorization Approaches\n",
     "    - Distributed Representation\n",
     "    - Universal Language Representation\n",
     "    - Handcrafted Features\n",
-    "4. What is Language Modeling?\n",
-    "5. Use Cases\n",
-    "6. Some Real-Time Applications"
+    "5. What is Language Modeling?\n",
+    "6. Use Cases\n",
+    "7. Some Real-Time Applications"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5cca8f07-899b-41da-9407-2bc380103b13",
+   "metadata": {},
+   "source": [
+    "## **Why is NLP hard?**\n",
+    "\n",
+    "1. Complexity of representation\n",
+    "2. Ambiguity in natural language\n",
+    "\n",
+    "**Note:** Ambiguity means uncertainty of meaning.\n",
+    "> Example: The car hit the pole while it was moving.\n",
+    "\n",
+    "**Note:** Complexity of representation covers constructs such as poems, sarcasm, etc.\n",
+    "> Example 1: This task is a piece of cake.\n",
+    "> Example 2: You have a football game tomorrow. Break a leg!\n",
+    "\n",
+    "**Important:** Raw data, a sequence of symbols, cannot be fed directly to most algorithms, since they expect numerical feature vectors of fixed size rather than raw text documents of variable length."
    ]
   },
   {
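The "Important" note added above, that variable-length text must become fixed-size numerical feature vectors, can be sketched with a minimal bag-of-words vectorizer. This is a pure-Python illustration; the `bag_of_words` helper name is ours, not from the notebook:

```python
from collections import Counter

def bag_of_words(docs):
    """Map variable-length texts to fixed-size count vectors.

    The vocabulary is built from all documents, so every document
    gets a vector of the same length (one entry per vocabulary word).
    """
    tokens_per_doc = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokens_per_doc for tok in toks})
    vectors = []
    for toks in tokens_per_doc:
        counts = Counter(toks)
        vectors.append([counts.get(word, 0) for word in vocab])
    return vocab, vectors

# Two documents of different lengths become vectors of the same size.
docs = ["the car hit the pole", "the pole was moving"]
vocab, vecs = bag_of_words(docs)
```

Real pipelines use richer tokenization and weighting (e.g. TF-IDF), but the core move is the same: a fixed vocabulary turns any document into a vector of one fixed length.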
@@ -27,6 +47,9 @@
    "metadata": {},
    "source": [
     "## **Introduction to Feature Extraction**\n",
+    "\n",
+    "Text embeddings are a way to represent words or phrases as vectors in a high-dimensional space based on their contextual meaning within a corpus of text data. **The idea is that if two phrases are similar, then the vectors that represent those phrases should be close together, and vice versa.**\n",
+    "\n",
     "1. Feature Extraction is an important step for any machine learning problem.\n",
     "2. No matter how good a modeling algorithm you use, if you feed in poor features, you will get poor results.\n",
     "3. **Remember:** \"Garbage in, garbage out.\"\n",
@@ -901,14 +924,6 @@
     "7. Machine Translation\n",
     "8. Question and Answering"
    ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "b64ff9bd-b3bf-4f6f-898f-9bbb40de13fe",
-   "metadata": {},
-   "outputs": [],
-   "source": []
   }
  ],
  "metadata": {
