
Commit 319f4a5

Update transformers_llms_and_genai.ipynb

1 parent 06d188e · commit 319f4a5

File tree

1 file changed: +2 −2 lines changed

Module 9 - GenAI (LLMs and Prompt Engineering)/2. Intro to Transformers, LLMs and GenAI/transformers_llms_and_genai.ipynb

Lines changed: 2 additions & 2 deletions
@@ -61,8 +61,8 @@
  " - This architecture used to work well with smaller sentence.\n",
  " - **The Problem:** While it could handle variable-length input and output sequences, it used to rely on generating a single fixed-length context vector for the entire input sequence, which can lead to information loss, especially for longer sequences.\n",
  "3. **[(2015) Neural Machine Translation by Joint Learning to Align and Translate](https://arxiv.org/pdf/1409.0473.pdf)** paper introduced the concept of **Attention Mechanism** to solve the above problem.\n",
- " - Unlike traditional NMT models that encode the entire source sentence into a fixed-length context vector, the attention mechanism allows the model to focus on different parts of the source sentence dynamically while generating the translation.\n",
- " - Attention Mechanism also addressed the problem of learning alignment between input and output sequences, enables the model to weigh the importance of each word in the source sentence differently during translation. By dynamically adjusting the attention weights, the model can focus more on relevant words and ignore irrelevant ones, leading to more accurate translations.\n",
+ " - Unlike traditional NMT models that encode the entire source sentence into a fixed-length context vector, the **attention mechanism allows the model to focus on different parts of the source sentence dynamically** while generating the translation.\n",
+ " - Attention Mechanism also **addressed the problem of learning alignment between input and output sequences**, enables the model to weigh the importance of each word in the source sentence differently during translation. By dynamically adjusting the attention weights, the model can focus more on relevant words and ignore irrelevant ones, leading to more accurate translations. For eg: Think about the english to hindi translation for \"I work at Apple Inc\" vs \"I work at Apple Farm\". Where should I keep सेब vs एप्पल इंक ?\n",
  " - At each timestamp of the decoder, the dynamically calculated context vector indicates which timestamps of the encoder sequence are expected to have the most influence on the current decoding step of the decoder.\n",
  " - In simple terms, context vector will be the weighted sum of encoders hidden state. And these weights are called as **attention weights**.\n",
  " - The attention mechanism has improved, the quality of translation on long input sentences. But it was not able to solve a huge fundamental flaw i.e. sequential training.\n",
