2. Individual NLP tasks have traditionally been solved by individual models created for each specific task. That is, until BERT!
3. Tasks - BERT can solve 11+ NLP tasks such as sentiment analysis, named entity recognition, etc.
4. Pretrained on:  
   **a.** English Wikipedia - at the time, 2.5 billion words  
   **b.** BookCorpus - 800 million words  
5. Training on a dataset this large takes a long time. BERT's training was made possible by the novel Transformer architecture and sped up by using TPUs (Tensor Processing Units - Google's custom circuits built specifically for large ML models). ~64 TPUs trained BERT over the course of 4 days.
6. BERT's tokenizer handles OOV tokens (out of vocabulary / previously unseen) by breaking them up into smaller chunks of known tokens (see the tokenizer sketch after this list).
7. Trained on two language-modeling-specific tasks:  
   **a.** **Masked Language Modeling (MLM), aka the autoencoding task** - helps BERT learn how tokens interact within a sentence (see the fill-mask sketch below).  
   **b.** **Next Sentence Prediction (NSP) task** - helps BERT understand how tokens interact with each other across sentences.  
204 | 205 | "<img style=\"float: right;\" width=\"300\" height=\"300\" src=\"data/images/bert_language_model_task.jpeg\">\n", |
205 | | - "7. BERT uses three layer of token embedding for a given piece of text: Token Embedding, Segment Embedding and Position Embedding.\n", |
206 | | - "8. BERT uses the encoder of transformer and ignores the decoder to become exceedingly good at processing/understanding massive amounts of text very quickly relative to other slower LLMs that focus on generating text one token at a time.\n", |
207 | | - "9. BERT itself doesn't classify text or summarize documents but it is often used as a pre-trained model for downstream NLP tasks. \n", |
| 206 | + "8. BERT uses three layer of token embedding for a given piece of text: Token Embedding, Segment Embedding and Position Embedding.\n", |
| 207 | + "9. BERT uses the encoder of transformer and ignores the decoder to become exceedingly good at processing/understanding massive amounts of text very quickly relative to other slower LLMs that focus on generating text one token at a time.\n", |
| 208 | + "10. BERT itself doesn't classify text or summarize documents but it is often used as a pre-trained model for downstream NLP tasks. \n", |
208 | 209 | "<img style=\"float: right;\" width=\"300\" height=\"300\" src=\"data/images/bert_classification.jpeg\">\n", |
209 | | - "10. 1 year later RoBERTa by Facebook AI shown to not require NSP task. It matched and even beat the original BERT model's performance in many areas.\n", |
210 | | - "11. Reference: [Click here to read more](https://huggingface.co/blog/bert-101)\n", |
211 | | - "12. BERT Implementation: [Click here to learn how to use BERT](https://colab.research.google.com/github/jalammar/jalammar.github.io/blob/master/notebooks/bert/A_Visual_Notebook_to_Using_BERT_for_the_First_Time.ipynb)\n", |
| 210 | + "11. 1 year later RoBERTa by Facebook AI shown to not require NSP task. It matched and even beat the original BERT model's performance in many areas.\n", |
| 211 | + "12. Reference: [Click here to read more](https://huggingface.co/blog/bert-101)\n", |
| 212 | + "13. BERT Implementation: [Click here to learn how to use BERT](https://colab.research.google.com/github/jalammar/jalammar.github.io/blob/master/notebooks/bert/A_Visual_Notebook_to_Using_BERT_for_the_First_Time.ipynb)\n", |
212 | 213 | "\n", |
213 | 214 | "#### **2. GPT (Generative Pre-Trained Transformer)**\n", |
214 | 215 | "\n", |