2. Individual NLP tasks have traditionally been solved by individual models created for each specific task. That is, until BERT!
3. Tasks - BERT can solve 11+ NLP tasks such as sentiment analysis, named entity recognition, etc.
4. Pretrained on:  
   **a.** English Wikipedia - at the time, 2.5 billion words  
   **b.** BookCorpus - 800 million words  
5. Training on a dataset this large takes a long time. BERT's training was made possible by the novel Transformer architecture and sped up by using TPUs (Tensor Processing Units - Google's custom circuits built specifically for large ML models). ~64 TPUs trained BERT over the course of 4 days.
6. BERT's tokenizer handles OOV tokens (out of vocabulary / previously unseen) by breaking them up into smaller chunks of known tokens (see the tokenizer sketch after this list).
7. Trained on two language-modeling-specific tasks:  
   **a.** **Masked Language Modeling (MLM), aka the autoencoding task** - helps BERT learn how tokens interact within a sentence (see the fill-mask sketch below).  
   **b.** **Next Sentence Prediction (NSP) task** - helps BERT understand how tokens interact with each other across sentences.  
204 | 205 | "<img style=\"float: right;\" width=\"300\" height=\"300\" src=\"data/images/bert_language_model_task.jpeg\">\n", |
205 | | - "7. BERT uses three layer of token embedding for a given piece of text: Token Embedding, Segment Embedding and Position Embedding.\n", |
206 | | - "8. BERT uses the encoder of transformer and ignores the decoder to become exceedingly good at processing/understanding massive amounts of text very quickly relative to other slower LLMs that focus on generating text one token at a time.\n", |
207 | | - "9. BERT itself doesn't classify text or summarize documents but it is often used as a pre-trained model for downstream NLP tasks. \n", |
| 206 | + "8. BERT uses three layer of token embedding for a given piece of text: Token Embedding, Segment Embedding and Position Embedding.\n", |
| 207 | + "9. BERT uses the encoder of transformer and ignores the decoder to become exceedingly good at processing/understanding massive amounts of text very quickly relative to other slower LLMs that focus on generating text one token at a time.\n", |
| 208 | + "10. BERT itself doesn't classify text or summarize documents but it is often used as a pre-trained model for downstream NLP tasks. \n", |
208 | 209 | "<img style=\"float: right;\" width=\"300\" height=\"300\" src=\"data/images/bert_classification.jpeg\">\n", |
209 | | - "10. 1 year later RoBERTa by Facebook AI shown to not require NSP task. It matched and even beat the original BERT model's performance in many areas.\n", |
210 | | - "11. Reference: [Click here to read more](https://huggingface.co/blog/bert-101)\n", |
211 | | - "12. BERT Implementation: [Click here to learn how to use BERT](https://colab.research.google.com/github/jalammar/jalammar.github.io/blob/master/notebooks/bert/A_Visual_Notebook_to_Using_BERT_for_the_First_Time.ipynb)\n", |
| 210 | + "11. 1 year later RoBERTa by Facebook AI shown to not require NSP task. It matched and even beat the original BERT model's performance in many areas.\n", |
| 211 | + "12. Reference: [Click here to read more](https://huggingface.co/blog/bert-101)\n", |
| 212 | + "13. BERT Implementation: [Click here to learn how to use BERT](https://colab.research.google.com/github/jalammar/jalammar.github.io/blob/master/notebooks/bert/A_Visual_Notebook_to_Using_BERT_for_the_First_Time.ipynb)\n", |
212 | 213 | "\n", |
213 | 214 | "#### **2. GPT (Generative Pre-Trained Transformer)**\n", |
214 | 215 | "\n", |