1. Introduction
Transformers are a deep learning architecture introduced in 2017 by Vaswani et al.
in the paper “Attention Is All You Need”.
They revolutionized Natural Language Processing (NLP) by replacing recurrence
(RNNs/LSTMs) with self-attention.
Today, they are the foundation of models such as BERT, GPT, T5, and LLaMA, and are
also used in vision, speech, and multimodal tasks.
2. Key Idea
Instead of processing tokens sequentially (like RNNs), Transformers process them in
parallel.
The core mechanism is Attention, which lets the model decide which parts of the
input are most relevant to each token.
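To make the contrast concrete, here is a minimal NumPy sketch (the shapes and weight names are illustrative, not from the paper): an RNN must walk the sequence one step at a time, while a Transformer layer can transform every token with a single batched matrix multiply.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8
x = rng.normal(size=(seq_len, d_model))   # one token vector per row

# RNN-style: a sequential loop; each step depends on the previous hidden state
W_h = rng.normal(size=(d_model, d_model))
W_x = rng.normal(size=(d_model, d_model))
h = np.zeros(d_model)
for t in range(seq_len):                  # cannot be parallelized across t
    h = np.tanh(h @ W_h + x[t] @ W_x)

# Transformer-style: one matrix multiply touches all tokens at once
W = rng.normal(size=(d_model, d_model))
out = x @ W                               # (seq_len, d_model), fully parallel
```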
3. Transformer Architecture
3.1 Overall Structure
A Transformer has an Encoder–Decoder structure (like in seq2seq models), though
many modern models use only the encoder (e.g., BERT) or only the decoder (e.g.,
GPT).
Encoder: Processes the input sequence and produces contextual embeddings.
Decoder: Generates the output sequence, attending to the encoder outputs and to
previously generated tokens.
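As a quick illustration of the encoder-only vs. decoder-only split, the Hugging Face transformers library exposes both families through one interface (this assumes the library is installed and the checkpoints can be downloaded; it is not part of the original architecture):

```python
from transformers import AutoModel, AutoModelForCausalLM

encoder_only = AutoModel.from_pretrained("bert-base-uncased")  # BERT: encoder stack
decoder_only = AutoModelForCausalLM.from_pretrained("gpt2")    # GPT-2: decoder stack
```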
3.2 Components
Input Embeddings
Words/tokens are converted into vectors.
Position information is added using positional encoding, since without recurrence
the model has no inherent sense of token order (a sketch follows below).
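Here is a minimal NumPy sketch of the sinusoidal positional encoding used in the original paper; the function name and shapes are my own:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(same).

    Assumes d_model is even.
    """
    positions = np.arange(seq_len)[:, None]               # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]               # (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                           # even dimensions
    pe[:, 1::2] = np.cos(angles)                           # odd dimensions
    return pe

# Added to the token embeddings before the first layer:
# embeddings = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
```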
Self-Attention
Each token looks at other tokens to understand context.
Uses Query (Q), Key (K), and Value (V) matrices.
Attention(Q, K, V) = softmax(QKᵀ / √d_k) V, where d_k is the key dimension.
Captures long-range dependencies efficiently (see the sketch below).
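The following NumPy sketch implements the formula above directly; the weight matrices and token count are illustrative, and in self-attention Q, K, and V are all projections of the same token representations:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)          # (seq_len, seq_len) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                       # weighted average of the values

# Toy self-attention over 4 tokens with d_model = 8
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)  # (4, 8)
```

Each output row is a mixture of all value vectors, so a token can draw on context arbitrarily far away in one step, which is why long-range dependencies are captured efficiently.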