|
590 | 590 | "5. **Combinations of autoregressive and autoencoding language models** are more versatile and flexible in generating text. Such combined (encoder-decoder) models have been shown to generate more diverse and creative text in different contexts than pure decoder-based autoregressive models, because the encoder lets them capture additional context. E.g., **T5**\n"
591 | 591 | ] |
592 | 592 | }, |
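To make the encoder-decoder combination concrete, here is a minimal sketch using T5 through the Hugging Face `transformers` library (an assumption: `transformers`, `sentencepiece`, and PyTorch are not set up elsewhere in this notebook and must be installed; the model weights download on first use):

```python
# Minimal sketch: T5 combines an autoencoding-style encoder with an
# autoregressive decoder, and frames every task as text-to-text.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# A task prefix tells T5 what to do; the encoder reads the whole input
# before the decoder generates the output token by token.
inputs = tokenizer("translate English to German: The house is wonderful.",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```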
593 | | - { |
594 | | - "cell_type": "markdown", |
595 | | - "id": "f16f5932-f1d2-4b7f-826e-ea31584fcb99", |
596 | | - "metadata": {}, |
597 | | - "source": [ |
598 | | - "## **Use Cases**\n", |
599 | | - "1. Text Classification\n", |
600 | | - "   - **Pipeline without learning** - Given a corpus of tweets where each tweet is labeled with its sentiment, negative or positive, we want to build a classification system that predicts the sentiment of an unseen tweet using only the text of the tweet. A simple solution is to create lists of positive and negative words in English, count the usage of positive versus negative words in the input tweet, and make a prediction based on those counts. Further enhancements to this approach may involve creating more sophisticated dictionaries with degrees of positive, negative, and neutral sentiment, or formulating specific heuristics (e.g., usage of certain smileys indicates positive sentiment) and using them to make predictions. This approach is called `lexicon-based sentiment analysis`.\n",
601 | | - " - ML/DL Approaches\n", |
602 | | - "2. Information Extraction\n", |
603 | | - "3. Chatbot\n", |
604 | | - "4. Topic Modeling\n", |
605 | | - "5. Text Generation\n", |
606 | | - "6. Text Summarization\n", |
607 | | - "7. Question and Answering" |
608 | | - ] |
609 | | - }, |
610 | 593 | { |
611 | 594 | "cell_type": "markdown", |
612 | 595 | "id": "b8dba930-1530-4a04-88be-db9c50617e50", |
|
633 | 616 | " - Trending topic detection\n", |
634 | 617 | " - Opinion Mining" |
635 | 618 | ] |
| 619 | + }, |
| 620 | + { |
| 621 | + "cell_type": "markdown", |
| 622 | + "id": "f16f5932-f1d2-4b7f-826e-ea31584fcb99", |
| 623 | + "metadata": {}, |
| 624 | + "source": [ |
| 625 | + "## **Use Cases**\n", |
| 626 | + "1. Text Classification\n", |
| 627 | + "   - Sentiment Analysis, Spam/Ham Detection, Fake News Detection, Adult Content Filtering, etc.\n",
| 628 | + "   - **Pipeline without learning** - Given a corpus of tweets where each tweet is labeled with its sentiment, negative or positive, we want to build a classification system that predicts the sentiment of an unseen tweet using only the text of the tweet. A simple solution is to create lists of positive and negative words in English, count the usage of positive versus negative words in the input tweet, and make a prediction based on those counts. Further enhancements to this approach may involve creating more sophisticated dictionaries with degrees of positive, negative, and neutral sentiment, or formulating specific heuristics (e.g., usage of certain smileys indicates positive sentiment) and using them to make predictions. This approach is called `lexicon-based sentiment analysis` (see the sketch after this cell).\n",
| 629 | + " - Pipeline with ML/DL Approaches\n", |
| 630 | + "2. Information Extraction\n", |
| 631 | + "   - Keyphrase Extraction: Extraction of a set of commonly used keywords or phrases from the text data.\n",
| 632 | + "   - Named Entity Recognition: The task of identifying the entities in a document. Entities are typically names of people, locations, and organizations; they can also be dates, products, names/numbers of laws or articles, etc.\n",
| 633 | + "     - Quick Note: POS tagging can improve NER performance.\n",
| 634 | + "     - NER is not a normal classification task. It is modeled as a sequence classification problem because, if you think about it, the entity prediction for the current word also depends on the context.\n",
| 635 | + "     - To illustrate the difference between a normal classifier and a sequence classifier, consider the following sentence: \"Washington is a rainy state.\" When a normal classifier sees this sentence and has to classify it word by word, it must decide whether Washington refers to a person or the state without looking at the surrounding words. Conditional Random Fields (CRFs) were popular classical training algorithms for sequence classifiers. Nowadays, RNNs, LSTMs, Transformers, etc. generate state-of-the-art results (see the NER sketch after this cell).\n",
| 636 | + "   - Named Entity Disambiguation (NED) and Linking (NEL), Relationship Extraction, etc.: use the Azure or Google APIs.\n",
| 637 | + "3. Search and Information Retrieval\n", |
| 638 | + "4. Chatbot\n", |
| 639 | + "5. Topic Modeling\n",
| 640 | + "6. Text Generation\n", |
| 641 | + "7. Text Summarization\n", |
| 642 | + "8. Machine Translation\n", |
| 643 | + "9. Question Answering"
| 644 | + ] |
| 645 | + }, |
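The lexicon-based sentiment pipeline described under Text Classification can be sketched in a few lines. The word lists below are tiny, hypothetical stand-ins; a real system would use a curated lexicon (e.g., VADER or SentiWordNet):

```python
# Hypothetical mini-lexicons for illustration only.
POSITIVE = {"good", "great", "love", "happy", "awesome", "nice"}
NEGATIVE = {"bad", "terrible", "hate", "sad", "awful", "worst"}

def lexicon_sentiment(tweet: str) -> str:
    """Predict sentiment from counts of positive vs. negative words."""
    words = [w.strip(".,!?") for w in tweet.lower().split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

print(lexicon_sentiment("I love this phone, the camera is awesome!"))  # positive
print(lexicon_sentiment("Worst service ever, I hate waiting."))        # negative
```

Heuristics such as the smiley rule mentioned above would slot in as extra checks before the count comparison.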
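For the NER point, here is a minimal sequence-labeling sketch with spaCy (an assumption: spaCy and its small English model must be installed via `pip install spacy` and `python -m spacy download en_core_web_sm`; the second sentence is an illustrative addition, and the exact labels depend on the pretrained model):

```python
import spacy

# Pretrained pipeline whose NER component labels whole entity spans.
nlp = spacy.load("en_core_web_sm")

# The same surface form "Washington" should resolve differently depending
# on the surrounding words -- the reason NER is a sequence problem.
for text in ["Washington is a rainy state.",
             "George Washington was the first US president."]:
    doc = nlp(text)
    print(text, "->", [(ent.text, ent.label_) for ent in doc.ents])
```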
| 646 | + { |
| 647 | + "cell_type": "code", |
| 648 | + "execution_count": null, |
| 649 | + "id": "b64ff9bd-b3bf-4f6f-898f-9bbb40de13fe", |
| 650 | + "metadata": {}, |
| 651 | + "outputs": [], |
| 652 | + "source": [] |
636 | 653 | } |
637 | 654 | ], |
638 | 655 | "metadata": { |
|