|
6 | 6 | "metadata": {}, |
7 | 7 | "source": [ |
8 | 8 | "# **Text Representation aka Text Embeddings**\n", |
9 | | - "Text embeddings are a way to represent words or phrases as vectors in a high-dimensional space based on their contextual meaning within a corpus of text data. The idea is that if two phrases are similar then the vectors that represent those phrases should be close together and vice versa.\n", |
10 | 9 | "\n", |
11 | 10 | "### **What's Covered**\n", |
12 | | - "1. Introduction to Feature Extraction\n", |
13 | | - "2. Case Study - Identifying Relavant US Economy News Articles\n", |
14 | | - "3. Various Feature Representation Techniques\n", |
| 11 | + "1. Why is NLP hard?\n", |
| 12 | + "2. Introduction to Feature Extraction\n", |
| 13 | + "3. Case Study - Identifying Relevant US Economy News Articles\n",
| 14 | + "4. Various Feature Representation Techniques\n", |
15 | 15 | " - Basic Vectorization Approaches\n", |
16 | 16 | " - Distributed Representation\n", |
17 | 17 | " - Universal Language Representation\n", |
18 | 18 | " - Handcrafted Features\n", |
19 | | - "4. What is Language Modeling?\n", |
20 | | - "5. Use Cases\n", |
21 | | - "6. Some Real Time Applications" |
| 19 | + "5. What is Language Modeling?\n", |
| 20 | + "6. Use Cases\n", |
| 21 | + "7. Some Real-Time Applications"
| 22 | + ] |
| 23 | + }, |
| 24 | + { |
| 25 | + "cell_type": "markdown", |
| 26 | + "id": "5cca8f07-899b-41da-9407-2bc380103b13", |
| 27 | + "metadata": {}, |
| 28 | + "source": [ |
| 29 | + "## **Why is NLP hard?**\n", |
| 30 | + "\n", |
| 31 | + "1. Complexity of representation\n", |
| 32 | + "2. Ambiguity in Natural Language\n", |
| 33 | + "\n", |
| 34 | + "**Note:** Ambiguity means uncertainty of meaning.\n",
| 35 | + "> Example: The car hit the pole while it was moving.\n", |
| 36 | + "\n", |
| 37 | + "**Note:** Complexity of representation, e.g. poems, sarcasm, etc.\n",
| 38 | + "> Example 1: This task is a piece of cake. \n", |
| 39 | + "> Example 2: You have a football game tomorrow. Break a leg!\n", |
| 40 | + "\n", |
| 41 | + "**Important:** The raw data, a sequence of symbols, cannot be fed directly to the algorithms, as most of them expect numerical feature vectors of a fixed size rather than raw text documents of variable length."
22 | 42 | ] |
23 | 43 | }, |
24 | 44 | { |
|
27 | 47 | "metadata": {}, |
28 | 48 | "source": [ |
29 | 49 | "## **Introduction to Feature Extraction**\n", |
| 50 | + "\n", |
| 51 | + "Text embeddings are a way to represent words or phrases as vectors in a high-dimensional space based on their contextual meaning within a corpus of text data. **The idea is that if two phrases are similar then the vectors that represent those phrases should be close together and vice versa.**\n", |
| 52 | + "\n", |
30 | 53 | "1. Feature Extraction is an important step for any machine learning problem.\n", |
31 | 54 | "2. No matter how good a modeling algorithm you use, if you feed in poor features, you will get poor results.\n", |
32 | 55 | "3. **Remember:** \"Garbage in, garbage out.\"\n", |
|
901 | 924 | "7. Machine Translation\n", |
902 | 925 | "8. Question and Answering" |
903 | 926 | ] |
904 | | - }, |
905 | | - { |
906 | | - "cell_type": "code", |
907 | | - "execution_count": null, |
908 | | - "id": "b64ff9bd-b3bf-4f6f-898f-9bbb40de13fe", |
909 | | - "metadata": {}, |
910 | | - "outputs": [], |
911 | | - "source": [] |
912 | 927 | } |
913 | 928 | ], |
914 | 929 | "metadata": { |
|