|
6 | 6 | "metadata": {}, |
7 | 7 | "source": [ |
8 | 8 | "# **Text Representation aka Text Embeddings**\n", |
9 | | - "Text embeddings are a way to represent words or phrases as vectors in a high-dimensional space based on their contextual meaning within a corpus of text data. The idea is that if two phrases are similar then the vectors that represent those phrases should be close together and vice versa.\n", |
10 | 9 | "\n", |
11 | 10 | "### **What's Covered**\n", |
12 | | - "1. Introduction to Feature Extraction\n", |
13 | | - "2. Case Study - Identifying Relavant US Economy News Articles\n", |
14 | | - "3. Various Feature Representation Techniques\n", |
| 11 | + "1. Why is NLP hard?\n", |
| 12 | + "2. Introduction to Feature Extraction\n", |
| 13 | + "3. Case Study - Identifying Relevant US Economy News Articles\n",
| 14 | + "4. Various Feature Representation Techniques\n", |
15 | 15 | " - Basic Vectorization Approaches\n", |
16 | 16 | " - Distributed Representation\n", |
17 | 17 | " - Universal Language Representation\n", |
18 | 18 | " - Handcrafted Features\n", |
19 | | - "4. What is Language Modeling?\n", |
20 | | - "5. Use Cases\n", |
21 | | - "6. Some Real Time Applications" |
| 19 | + "5. What is Language Modeling?\n", |
| 20 | + "6. Use Cases\n", |
| 21 | + "7. Some Real-Time Applications"
| 22 | + ] |
| 23 | + }, |
| 24 | + { |
| 25 | + "cell_type": "markdown", |
| 26 | + "id": "5cca8f07-899b-41da-9407-2bc380103b13", |
| 27 | + "metadata": {}, |
| 28 | + "source": [ |
| 29 | + "## **Why is NLP hard?**\n", |
| 30 | + "\n", |
| 31 | + "1. Complexity of representation\n", |
| 32 | + "2. Ambiguity in Natural Language\n", |
| 33 | + "\n", |
| 34 | + "**Note:** Ambiguity means uncertainty of meaning.\n",
| 35 | + "> Example: The car hit the pole while it was moving.\n", |
| 36 | + "\n", |
| 37 | + "**Note:** Complexity of representation, e.g. poems, sarcasm, etc.\n",
| 38 | + "> Example 1: This task is a piece of cake. \n", |
| 39 | + "> Example 2: You have a football game tomorrow. Break a leg!\n", |
| 40 | + "\n", |
| 41 | + "**Important:** The raw data, a sequence of symbols, cannot be fed directly to the algorithms, as most of them expect numerical feature vectors of a fixed size rather than raw text documents of variable length."
22 | 42 | ] |
23 | 43 | }, |
24 | 44 | { |
|
27 | 47 | "metadata": {}, |
28 | 48 | "source": [ |
29 | 49 | "## **Introduction to Feature Extraction**\n", |
| 50 | + "\n", |
| 51 | + "Text embeddings are a way to represent words or phrases as vectors in a high-dimensional space based on their contextual meaning within a corpus of text data. **The idea is that if two phrases are similar then the vectors that represent those phrases should be close together and vice versa.**\n", |
| 52 | + "\n", |
30 | 53 | "1. Feature Extraction is an important step for any machine learning problem.\n", |
31 | 54 | "2. No matter how good a modeling algorithm you use, if you feed in poor features, you will get poor results.\n", |
32 | 55 | "3. **Remember:** \"Garbage in, garbage out.\"\n", |
|
901 | 924 | "7. Machine Translation\n", |
902 | 925 | "8. Question and Answering" |
903 | 926 | ] |
904 | | - }, |
905 | | - { |
906 | | - "cell_type": "code", |
907 | | - "execution_count": null, |
908 | | - "id": "b64ff9bd-b3bf-4f6f-898f-9bbb40de13fe", |
909 | | - "metadata": {}, |
910 | | - "outputs": [], |
911 | | - "source": [] |
912 | 927 | } |
913 | 928 | ], |
914 | 929 | "metadata": { |
|