Transformers in Machine Learning
Last Updated : 27 Feb, 2025
A transformer is a neural network architecture used for machine learning tasks, particularly in natural language processing (NLP) and computer vision. It was introduced in 2017 by Vaswani et al. in the paper "Attention Is All You Need". This article explores the architecture, working and applications of transformers.
Need For Transformers Model in Machine Learning
The transformer architecture uses self-attention to process an entire sequence at once, rather than step by step as older models do, which helps it overcome the challenges seen in models like RNNs and LSTMs. Traditional models like RNNs (Recurrent Neural Networks) suffer from the vanishing gradient problem, which leads to long-term memory loss. RNNs process text sequentially, meaning they analyze words one at a time.
For example, in the sentence "XYZ went to France in 2019 when there were no cases of COVID and there he met the president of that country", the phrase "that country" refers to "France". An RNN, however, would struggle to link "that country" to "France" because it processes each word in sequence, losing context over long sentences. This limitation prevents RNNs from understanding the full meaning of the sentence.
While the memory cells added in LSTMs (Long Short-Term Memory networks) helped address the vanishing gradient issue, LSTMs still process words one by one. This sequential processing means they cannot analyze an entire sentence at once.
For instance, the word "point" has different meanings in these two sentences:
"The needle has a sharp point." (point = tip)
"It is not polite to point at people." (point = gesture)
Traditional models struggle with this context dependence, whereas the transformer, through its self-attention mechanism, processes the entire sentence in parallel, addressing these issues and making it significantly more effective at understanding context.
Architecture and Working of Transformers
1. Positional Encoding
Unlike RNNs, transformers lack an inherent understanding of word order since they process data in parallel. To solve this, positional encodings are added to the token embeddings, providing information about the position of each token within the sequence.
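As a minimal sketch, here is the sinusoidal scheme from the original paper written in NumPy (the function name and shapes are illustrative, not part of any fixed API):

import numpy as np

def positional_encoding(seq_len, d_model):
    # One row per position, one column per embedding dimension.
    positions = np.arange(seq_len)[:, np.newaxis]      # (seq_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]           # (1, d_model)
    angle_rates = 1.0 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                   # (seq_len, d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])              # even dimensions: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])              # odd dimensions: cosine
    return pe

# The encoding is simply added to the token embeddings:
# embeddings = token_embeddings + positional_encoding(seq_len, d_model)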
2. Position-wise Feed-Forward Networks
The feed-forward network consists of two linear transformations with a ReLU activation in between. It is applied independently to each position in the sequence.
Mathematically:
FFN(x) = max(0, xW₁ + b₁)W₂ + b₂
This transformation helps refine the encoded representation at each
position.
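In code this is just two matrix multiplications with a ReLU in between, applied to every position at once. A NumPy sketch (dimensions follow the paper's d_model = 512 and inner size d_ff = 2048, but are otherwise illustrative):

import numpy as np

def feed_forward(x, W1, b1, W2, b2):
    # x: (seq_len, d_model); W1: (d_model, d_ff); W2: (d_ff, d_model)
    hidden = np.maximum(0, x @ W1 + b1)   # ReLU(xW1 + b1)
    return hidden @ W2 + b2               # project back to d_model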
3. Attention Mechanism
The attention mechanism allows transformers to determine which
words in a sentence are most relevant to each other. This is done using
a scaled dot-product attention approach:
1. Each word in a sequence is mapped to three vectors:
Query (Q)
Key (K)
Value (V)
2. Attention scores are computed as:
Attention(Q, K, V) = softmax(QKᵀ / √dₖ) V
3. These scores determine how much attention each word should pay to
others.
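A minimal NumPy sketch of scaled dot-product attention (shapes are illustrative, and a real implementation would also handle batching and masking):

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d_k) -- each row is one token's vector.
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (seq_len, seq_len)
    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V                               # weighted sum of values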
Multi-Head Attention
Instead of using a single attention mechanism, transformers apply multi-head attention, where multiple attention layers run in parallel. This enables the model to capture different types of relationships within the input.
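Conceptually, each head attends over a different slice of the projected queries, keys and values, and the head outputs are concatenated and projected back. A NumPy sketch reusing the attention function above (the weight matrices Wq, Wk, Wv, Wo stand in for learned parameters):

import numpy as np

def multi_head_attention(x, num_heads, Wq, Wk, Wv, Wo):
    # x: (seq_len, d_model); Wq, Wk, Wv, Wo: (d_model, d_model)
    seq_len, d_model = x.shape
    d_k = d_model // num_heads
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    heads = []
    for h in range(num_heads):
        cols = slice(h * d_k, (h + 1) * d_k)         # this head's subspace
        heads.append(scaled_dot_product_attention(
            Q[:, cols], K[:, cols], V[:, cols]))
    return np.concatenate(heads, axis=-1) @ Wo       # concat heads, project back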
4. Encoder-Decoder Architecture
The encoder-decoder structure is key to transformer models. The encoder transforms the input sequence into a set of contextual representations, while the decoder generates the output sequence from them. Each encoder and decoder layer includes self-attention and feed-forward sublayers. In the decoder, an additional encoder-decoder attention layer is added to focus on relevant parts of the input.
For example, the French sentence "Je suis étudiant" is translated into "I am a student" in English.
The encoder consists of multiple layers (typically six). Each layer has two main components:
Self-Attention Mechanism – Helps the model understand word
relationships.
Feed-Forward Neural Network – Further transforms the
representation.
The decoder also consists of 6 layers, but with an additional encoder-
decoder attention mechanism. This allows the decoder to focus on
relevant parts of the input sentence while generating output.
For instance, in the sentence "The cat didn't chase the mouse, because it was not hungry", the word "it" refers to "cat". The self-attention mechanism helps the model correctly associate "it" with "cat", ensuring an accurate understanding of sentence structure.
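For a concrete feel of the stacked architecture, PyTorch ships a ready-made torch.nn.Transformer module whose defaults mirror the original paper (6 encoder layers, 6 decoder layers, d_model = 512, 8 heads). A minimal usage sketch, with random tensors standing in for embedded source and target sequences:

import torch
import torch.nn as nn

model = nn.Transformer(d_model=512, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6)

# Dummy embedded sequences, shaped (sequence_length, batch_size, d_model).
src = torch.rand(10, 1, 512)   # e.g. the embedded French sentence
tgt = torch.rand(8, 1, 512)    # e.g. the embedded English tokens so far

out = model(src, tgt)          # (8, 1, 512): one vector per target position
print(out.shape)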
Applications of Transformers
Some of the applications of transformers are:
1. NLP Tasks: Transformers are used for machine translation, text
summarization, named entity recognition and sentiment analysis.
2. Speech Recognition: They process audio signals to convert speech
into transcribed text.
3. Computer Vision: Transformers are applied to image classification,
object detection, and image generation.
4. Recommendation Systems: They provide personalized
recommendations based on user preferences.
5. Text and Music Generation: Transformers are used for generating
text (e.g., articles) and composing music.
Transformers have redefined deep learning across NLP, computer vision and beyond. With advancements like BERT, GPT and Vision Transformers (ViTs), they continue to push the boundaries of language understanding and multimodal learning.