|
880 | 880 | "## **Various Feature Representation Techniques**\n", |
881 | 881 | "\n", |
882 | 882 | "Feature representations for text are classified into four categories: \n", |
883 | | - "- **Basic Vectorization Approaches**\n", |
| 883 | + "1. **Basic Vectorization Approaches**\n", |
884 | 884 | " - Eg: One-Hot Encoding, Bag of Words, Bag of N-Grams, and TFIDF\n", |
885 | | - " - Drawbacks: They are discrete representations, vector representation is sparse and hig-dimensional, and they cannot handle OOV words.\n", |
886 | | - "- **Distributed Representations**\n", |
| 885 | + " - Drawbacks: They are discrete representations, vector representation is sparse and high-dimensional, and they cannot handle OOV words.\n", |
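| | + "\n", |
| | + "   A minimal sketch of these basic vectorizers, assuming scikit-learn is available (the toy corpus is purely illustrative):\n", |
| | + "   ```python\n", |
| | + "   from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer\n", |
| | + "\n", |
| | + "   corpus = ['the dog bit the man', 'the man bit the dog back']\n", |
| | + "\n", |
| | + "   # Bag of Words: each document becomes a sparse count vector over the whole vocabulary\n", |
| | + "   bow = CountVectorizer()\n", |
| | + "   X_bow = bow.fit_transform(corpus)\n", |
| | + "   print(bow.get_feature_names_out())\n", |
| | + "   print(X_bow.toarray())\n", |
| | + "\n", |
| | + "   # Bag of N-Grams: same idea, but the vocabulary also contains bigrams\n", |
| | + "   print(CountVectorizer(ngram_range=(1, 2)).fit(corpus).get_feature_names_out())\n", |
| | + "\n", |
| | + "   # TF-IDF: re-weights the raw counts by how informative each term is across documents\n", |
| | + "   print(TfidfVectorizer().fit_transform(corpus).toarray().round(2))\n", |
| | + "\n", |
| | + "   # OOV drawback: words that were not seen during fit are silently dropped at transform time\n", |
| | + "   print(bow.transform(['the cat sat on the mat']).toarray())\n", |
| | + "   ```\n", |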
| 886 | + "2. **Distributed Representations**\n", |
887 | 887 | " - Eg: Word Embeddings (Word2Vec, GloVe, fastText), Document Embeddings (Doc2Vec)\n", |
888 | 888 | " - Text embeddings are a way to represent words or phrases as vectors in a high-dimensional space based on their semantic meaning within a corpus of text data. The idea is that if two phrases are similar then the vectors that represent those phrases should be close together and vice versa.\n", |
889 | 889 | "   - Word2Vec Architectures for Training - Continuous Bag of Words (CBOW), SkipGram, and SkipGram with Negative Sampling (a short gensim sketch of these options appears below)\n", |
| 890 | + "     - Negative Sampling: instead of updating weights for every word in the vocabulary, the model learns to distinguish the true context word (a positive sample) from a few randomly drawn words (negative samples) - it's like teaching by showing both what's right and what's wrong.\n", |
| 891 | + "     - Word2Vec, in its original implementations (Skip-gram and CBOW), captures contextual information from both directions around a target word, but it still generates only one representation per word in the learned embedding space: each word's vector aggregates the contextual information from all occurrences of that word in the training corpus. So, while the model sees multiple contexts for each word during training, it ultimately produces a single embedding vector per word that summarises its semantic properties across those contexts.\n", |
890 | 892 | "   - Word2Vec doesn't have a good way of handling OOV words\n", |
891 | 893 | "   - **Handling the OOV words problem:** One way is to modify the training process by bringing in characters and other sub-word linguistic components such as morphological properties (e.g., prefixes, suffixes, word endings). **FastText** from Facebook follows this approach.\n", |
892 | 894 | "   - Doc2Vec: Based on the Paragraph Vector framework. The neural network used to learn Doc2Vec embeddings is very similar to the CBOW and SkipGram architectures of Word2Vec.\n", |
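| | + "\n", |
| | + "   A small sketch of the embedding models above using gensim (gensim and the toy sentences are assumptions, not something introduced earlier): `sg=1` selects SkipGram (`sg=0` is CBOW), `negative=5` turns on negative sampling, and FastText's character n-grams are what allow it to return a vector even for an OOV word:\n", |
| | + "   ```python\n", |
| | + "   from gensim.models import Word2Vec, FastText, Doc2Vec\n", |
| | + "   from gensim.models.doc2vec import TaggedDocument\n", |
| | + "\n", |
| | + "   sentences = [['i', 'went', 'to', 'the', 'bank', 'to', 'withdraw', 'money'],\n", |
| | + "                ['i', 'sat', 'on', 'the', 'river', 'bank'],\n", |
| | + "                ['the', 'bank', 'approved', 'the', 'loan']]\n", |
| | + "\n", |
| | + "   # Word2Vec: sg=1 -> SkipGram, sg=0 -> CBOW; negative=5 -> SkipGram with negative sampling\n", |
| | + "   w2v = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, negative=5)\n", |
| | + "   print(w2v.wv.most_similar('bank', topn=3))   # one fixed vector per word\n", |
| | + "   # w2v.wv['riverbank']                        # KeyError: plain Word2Vec cannot handle OOV words\n", |
| | + "\n", |
| | + "   # FastText: builds word vectors from character n-grams, so OOV words still get a vector\n", |
| | + "   ft = FastText(sentences, vector_size=50, window=2, min_count=1)\n", |
| | + "   print(ft.wv['riverbank'][:5])                # works although 'riverbank' never appeared in training\n", |
| | + "\n", |
| | + "   # Doc2Vec (Paragraph Vector): learns one vector per tagged document\n", |
| | + "   docs = [TaggedDocument(words=s, tags=[str(i)]) for i, s in enumerate(sentences)]\n", |
| | + "   d2v = Doc2Vec(docs, vector_size=50, min_count=1, epochs=20)\n", |
| | + "   print(d2v.infer_vector(['money', 'in', 'the', 'bank'])[:5])\n", |
| | + "   ```\n", |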
|
895 | 897 | " - To facilitate the learning of distributed representations for sequential data, a change in training architecture was necessary. This is where **Recurrent Neural Networks (RNNs)** come into play.\n", |
896 | 898 | " - RNNs learn distributed representations of sequential data by processing input sequences one token at a time. RNNs are capable of capturing long-term dependencies in sequential data, making them effective for tasks such as language modeling, machine translation, sentiment analysis, and text generation.\n", |
897 | 899 | " - Variants of RNNs, such as Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs), address the vanishing gradient problem and improve the ability of RNNs to capture long-range dependencies in text data. \n", |
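| | + "\n", |
| | + "   A minimal PyTorch sketch of this idea (PyTorch and the layer sizes here are assumptions for illustration): an embedding layer turns token ids into vectors, the LSTM consumes them one time step at a time, and its final hidden state acts as a learned representation of the whole sequence:\n", |
| | + "   ```python\n", |
| | + "   import torch\n", |
| | + "   import torch.nn as nn\n", |
| | + "\n", |
| | + "   vocab_size, embed_dim, hidden_dim = 1000, 64, 128\n", |
| | + "\n", |
| | + "   embedding = nn.Embedding(vocab_size, embed_dim)\n", |
| | + "   lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)\n", |
| | + "\n", |
| | + "   token_ids = torch.randint(0, vocab_size, (2, 5))   # a batch of 2 toy sequences, 5 tokens each\n", |
| | + "\n", |
| | + "   # tokens are processed left to right; h_n summarises each full sequence\n", |
| | + "   outputs, (h_n, c_n) = lstm(embedding(token_ids))\n", |
| | + "   print(outputs.shape)   # torch.Size([2, 5, 128]): one hidden state per time step\n", |
| | + "   print(h_n.shape)       # torch.Size([1, 2, 128]): final hidden state per sequence\n", |
| | + "   ```\n", |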
898 | | - "- **Universal Language Representaion**\n", |
| 900 | + "3. **Universal Language Representation**\n", |
899 | 901 | "   - Problems with the above approach:\n", |
900 | 902 | "     - One word gets one fixed representation. Eg: \"I went to the bank to withdraw money\" and \"I sat on the river bank\" both use the word \"bank\", but with very different meanings\n", |
901 | 903 | " - Handling long-term dependencies in extremely long sequences\n", |
902 | 904 | " - Computationally expensive to train\n", |
903 | 905 | "     - Slow to train due to sequential training\n", |
904 | 906 | " - Solutions:\n", |
905 | | - " - In 2017 - **[Attention is all you need](https://arxiv.org/pdf/1706.03762.pdf)** paper introduced by Google solves the \"Sequential training\" and \"Long-term dependencies\" problem of earlier architecture by removing the need of RNN cells completely.\n", |
| 907 | + "     - In 2017 - the **[Attention is all you need](https://arxiv.org/pdf/1706.03762.pdf)** paper from Google solves the \"Sequential training\" problem of the earlier architectures by removing the need for RNN cells completely, and the \"Long-term dependencies\" problem with the help of the Attention Mechanism.\n", |
906 | 908 | "     - In 2018 - researchers from the University of Washington came up with **[Contextual Word Representations](https://arxiv.org/pdf/1802.05365.pdf)**, which address the above problem of \"One word gets one fixed representation\".\n", |
907 | 909 | "     - **Remember:** More recently, Contextual Word Representations are learned by starting from the word embeddings we discussed earlier (like Word2Vec) and training on a **language modeling** task using complex neural architectures (like RNNs and Transformers). A short sketch of this contextual behaviour is shown below.\n", |
908 | 910 | "     - **Language Modeling:** It is the task of predicting the next likely word in a sequence of words. In its earliest form, it used n-gram frequencies to estimate the probability of the next word given a history of words (a toy bigram example is sketched below).\n", |
909 | 911 | "     - **Key Idea:** Learn embeddings on a generic task like language modeling over a massive corpus, and then fine-tune those learnings on task-specific data. This is also known as **transfer learning**.\n", |
910 | 912 | "     - **How to decide whether to train our own embeddings or use pre-trained embeddings?** A good rule of thumb is to compute the vocabulary overlap. If the overlap between the vocabulary of our custom domain and that of the pre-trained word embeddings is significant, pre-trained word embeddings tend to give good results (a quick overlap check is sketched below).\n", |
911 | 913 | "     - **One more important factor to consider while deploying models with an embeddings-based feature extraction approach:** Remember that learned or pre-trained embedding models have to be stored and loaded into memory while using these approaches. If the model itself is bulky, we need to factor this into our deployment needs.\n", |
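| | + "\n", |
| | + "   A short sketch of what contextual representations look like in practice, using the Hugging Face `transformers` library (an assumption here; `bert-base-uncased` is just a convenient pre-trained language model): the word *bank* gets a different vector in each of the two sentences from the example above:\n", |
| | + "   ```python\n", |
| | + "   import torch\n", |
| | + "   from transformers import AutoTokenizer, AutoModel\n", |
| | + "\n", |
| | + "   tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')\n", |
| | + "   model = AutoModel.from_pretrained('bert-base-uncased')\n", |
| | + "\n", |
| | + "   bank_vectors = []\n", |
| | + "   for text in ['I went to the bank to withdraw money', 'I sat on the river bank']:\n", |
| | + "       inputs = tokenizer(text, return_tensors='pt')\n", |
| | + "       with torch.no_grad():\n", |
| | + "           hidden = model(**inputs).last_hidden_state[0]   # (seq_len, 768)\n", |
| | + "       position = inputs['input_ids'][0].tolist().index(tokenizer.convert_tokens_to_ids('bank'))\n", |
| | + "       bank_vectors.append(hidden[position])\n", |
| | + "\n", |
| | + "   # unlike Word2Vec, the two vectors for 'bank' differ because their contexts differ\n", |
| | + "   print(torch.cosine_similarity(bank_vectors[0], bank_vectors[1], dim=0))\n", |
| | + "   ```\n", |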
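| | + "\n", |
| | + "   A toy illustration of the n-gram idea behind early language models (the corpus here is made up): estimate the probability of the next word from raw bigram frequencies:\n", |
| | + "   ```python\n", |
| | + "   from collections import Counter, defaultdict\n", |
| | + "\n", |
| | + "   tokens = 'i went to the bank to withdraw money from the bank'.split()\n", |
| | + "\n", |
| | + "   # count how often each word follows each previous word\n", |
| | + "   bigram_counts = defaultdict(Counter)\n", |
| | + "   for prev, nxt in zip(tokens, tokens[1:]):\n", |
| | + "       bigram_counts[prev][nxt] += 1\n", |
| | + "\n", |
| | + "   # P(next word | previous word = 'the') from raw frequencies\n", |
| | + "   total = sum(bigram_counts['the'].values())\n", |
| | + "   print({word: count / total for word, count in bigram_counts['the'].items()})   # {'bank': 1.0}\n", |
| | + "   ```\n", |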
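| | + "\n", |
| | + "   For the vocabulary-overlap rule of thumb above, a rough check could look like this (the tokenised corpus and pre-trained vocabulary are toy stand-ins; with gensim the latter would be something like `set(keyed_vectors.key_to_index)`):\n", |
| | + "   ```python\n", |
| | + "   # toy stand-ins: in practice these come from our corpus and a downloaded embedding model\n", |
| | + "   tokenized_corpus = [['bank', 'loan', 'emi', 'upi'], ['credit', 'card', 'bank', 'statement']]\n", |
| | + "   pretrained_vocab = {'bank', 'loan', 'credit', 'card', 'money', 'statement'}\n", |
| | + "\n", |
| | + "   corpus_vocab = {token for doc in tokenized_corpus for token in doc}\n", |
| | + "   overlap = len(corpus_vocab & pretrained_vocab) / len(corpus_vocab)\n", |
| | + "   print(f'vocabulary overlap: {overlap:.1%}')   # high overlap -> pre-trained embeddings likely help\n", |
| | + "   ```\n", |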
912 | | - "- **Handcrafted Features**\n", |
| 914 | + "4. **Handcrafted Features**\n", |
913 | 915 | "   - These features have to be designed manually, keeping in mind both the domain knowledge and the ML algorithms that will be used to train the NLP models.\n", |
914 | 916 | "   - Custom feature engineering is much harder to formulate than the other feature engineering schemes we've seen so far.\n", |
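| | + "\n", |
| | + "   A tiny sketch of what handcrafted features can look like (these particular features are purely illustrative; real ones come from domain knowledge and the downstream ML algorithm):\n", |
| | + "   ```python\n", |
| | + "   def handcrafted_features(text):\n", |
| | + "       # features designed by hand from domain knowledge rather than learned from data\n", |
| | + "       tokens = text.split()\n", |
| | + "       lowered = text.lower()\n", |
| | + "       return {\n", |
| | + "           'num_tokens': len(tokens),\n", |
| | + "           'num_digits': sum(ch.isdigit() for ch in text),\n", |
| | + "           'num_uppercase_words': sum(tok.isupper() for tok in tokens),\n", |
| | + "           'mentions_currency': ('inr' in lowered) or ('usd' in lowered) or ('$' in text),\n", |
| | + "       }\n", |
| | + "\n", |
| | + "   print(handcrafted_features('Transfer 5000 INR to my savings account ASAP'))\n", |
| | + "   ```"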
915 | 917 | ] |
|