Chapter 1 - Introduction

Uploaded by

om55500r
Copyright
© © All Rights Reserved

Chapter 1 - INTRODUCTION

INTRODUCTION TO LANGUAGE PROCESSING SYSTEM


A compiler translates code written in one language into another language without changing the
meaning of the program. A compiler is also expected to make the target code efficient in terms of
time and space. Typically, a compiler is software that converts a program written in a high-level
language (source language) into a low-level language (object/target/machine language).

• Cross compiler: a compiler that runs on machine 'A' and produces code for another machine 'B'. It can
create code for a platform other than the one on which the compiler itself is running.

• Source-to-source compiler (also called a transcompiler or transpiler): a compiler that translates source code
written in one programming language into source code in another programming language.

Compiler design principles provide an in-depth view of the translation and optimization process.
Compiler design covers lexical, syntax, and semantic analysis as the front end, and code generation and
optimization as the back end.

Language Processing System


A computer is a logical assembly of software and hardware. The hardware understands a language that is hard
for us to grasp, so we tend to write programs in a high-level language, which is much easier for us to
comprehend and remember. These programs then go through a series of transformations so that they can be
used by machines. This is where language processing systems come in handy.
1. High-Level Language: A program written in a high-level language (HLL) such as C may contain directives
such as #include or #define. High-level languages are closer to humans but far from machines. Lines beginning
with # are called preprocessor directives; they tell the preprocessor what to do.
2. Preprocessor: A preprocessor, generally considered part of the compiler, is a tool that produces input for
the compiler. It deals with macro processing, augmentation, file inclusion, language extension, etc.

Preprocessor
A preprocessor produces input for compilers. It may perform the following functions.
i) Macro processing: A preprocessor may allow a user to define macros that are shorthand for longer constructs.
ii) File inclusion: A preprocessor may include header files in the program text.
iii) Rational preprocessors: These preprocessors augment older languages with more modern flow-of-control
and data-structuring facilities.
iv) Language extensions: These preprocessors attempt to add capabilities to the language by what amounts to
built-in macros.
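
Macro processing can be illustrated with a minimal sketch. The function name preprocess and the sample program are assumptions made for illustration; the sketch handles only simple object-like #define macros and is not how a real C preprocessor is implemented.

```python
import re

def preprocess(source: str) -> str:
    """Expand object-like #define macros (toy stand-in for a real preprocessor)."""
    macros = {}
    output = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#define"):
            # "#define NAME replacement" -> remember the macro, emit nothing
            _, name, *body = stripped.split(maxsplit=2)
            macros[name] = body[0] if body else ""
            continue
        # Replace whole-word occurrences of every macro defined so far
        for name, body in macros.items():
            line = re.sub(rf"\b{re.escape(name)}\b", lambda _m, b=body: b, line)
        output.append(line)
    return "\n".join(output)

program = "#define MAX 100\nint limit = MAX;\nint MAXIMUM = MAX + 1;"
print(preprocess(program))
```

The word-boundary match means MAX is replaced while the longer name MAXIMUM is left untouched.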

3. Assembly Language: It is neither in binary form nor high-level. It is an intermediate form that
combines machine instructions with other useful data needed for execution.

4. Assembler
Programmers found it difficult to write or read programs in machine language. They began to use a mnemonic
(symbol) for each machine instruction, which they would subsequently translate into machine language. Such a
mnemonic machine language is now called an assembly language. An assembler translates assembly language
programs into machine code. The output of an assembler is called an object file, which contains a combination of
machine instructions and the data required to place these instructions in memory.

5. Interpreter
An interpreter, like a compiler, translates a high-level language into low-level machine language. The difference lies in
the way they read the source code or input. A compiler reads the whole source code at once, creates tokens, checks
semantics, generates intermediate code, translates the whole program, and may involve many passes. In contrast, an
interpreter reads a statement from the input, converts it to an intermediate code, executes it, and then takes the next
statement in sequence. If an error occurs, an interpreter stops execution and reports it, whereas a compiler reads
the whole program even if it encounters several errors.

6. Relocatable Machine Code: It can be loaded at any memory address and run. Addresses within the program
are expressed in such a way that the program still works when it is moved.

7. Linker: A linker is a program that links and merges various object files together to make an
executable file. These files might have been produced by separate assemblers.
8. Loader: The loader is part of the operating system and is responsible for loading executable files into memory and
executing them. It calculates the size of a program (instructions and data) and creates memory space for it. It
also initializes various registers to initiate execution.
The loader converts the relocatable code into absolute code and tries to run the program, resulting in a running
program or an error message (or sometimes both). In short, the linker combines a variety of object files into a
single executable file; the loader then loads that file into memory and executes it.
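
The step from relocatable to absolute code can be sketched as follows. The instruction format (opcode, operand, is_address) and the opcodes are hypothetical, chosen only to show how a load address is added to relocatable addresses; real object-file formats are far richer.

```python
def relocate(code, base):
    """Turn relocatable instructions into absolute ones by adding the load address."""
    absolute = []
    for opcode, operand, is_address in code:
        # Only operands flagged as addresses are adjusted; constants stay as-is
        absolute.append((opcode, operand + base if is_address else operand))
    return absolute

# Hypothetical program whose addresses are relative to its own start
relocatable = [("LOAD", 4, True), ("ADD", 1, False), ("STORE", 5, True)]
print(relocate(relocatable, 1000))
```

Loading the same relocatable code at a different base address only changes the operands flagged as addresses.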

9. Cross-compiler
A compiler that runs on platform (A) and is capable of generating executable code for platform (B) is called a
cross-compiler.

COMPARISON OF COMPILER AND INTERPRETER PROCESSES


BASIS FOR COMPARISON | COMPILER (e.g. C, C++) | INTERPRETER (e.g. Python, Perl)
Input | Takes an entire program at a time. | Takes a single line of code or instruction at a time.
Output | Generates intermediate object code. | Does not produce any intermediate object code.
Working mechanism | Compilation is done before execution. | Translation and execution take place simultaneously.
Speed | Comparatively faster | Slower
Memory | Memory requirement is higher due to the creation of object code. | Requires less memory as it does not create intermediate object code.
Errors | Displays all errors after compilation, all at the same time. | Displays the errors of each line one by one.
Error detection | Difficult | Comparatively easier
Pertaining programming languages | C, C++, C#, Scala and TypeScript use a compiler. | PHP, Perl, Python and Ruby use an interpreter.

COMPILER DESIGN ISSUES
The compilation process is a sequence of phases. Each phase takes its input from the previous phase, has its
own representation of the source program, and feeds its output to the next phase of the compiler. Let us look at the
phases of a compiler.

Lexical Analysis
The first phase of a compiler works as a text scanner. This phase scans the source code as a stream of characters and
converts it into meaningful lexemes. The lexical analyzer represents these lexemes in the form of tokens as:
<token-name, attribute-value>

A lexeme is a sequence of (alphanumeric) characters that makes up a token.

For example, in the C language, the variable declaration line


int value = 100;
contains the tokens:

int (keyword), value (identifier), = (operator), 100 (constant) and ; (symbol).
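
A scanner of this kind can be sketched with regular expressions. The token categories follow the example above; the keyword set, the pattern list TOKEN_SPEC and the function name tokenize are illustrative assumptions, not a complete C lexer.

```python
import re

KEYWORDS = {"int", "float", "char", "return"}  # small illustrative subset

# Order matters: earlier patterns win when several could match
TOKEN_SPEC = [
    ("constant",   r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("operator",   r"[=+\-*/]"),
    ("symbol",     r"[;,(){}]"),
    ("skip",       r"\s+"),
    ("error",      r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (token-name, lexeme) pairs for the input text."""
    tokens = []
    for match in MASTER.finditer(text):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "skip":
            continue  # whitespace separates lexemes but is not a token
        if kind == "error":
            raise ValueError(f"illegal character {lexeme!r}")
        if kind == "identifier" and lexeme in KEYWORDS:
            kind = "keyword"
        tokens.append((kind, lexeme))
    return tokens

print(tokenize("int value = 100;"))
```

Keywords are recognised by first matching them as identifiers and then looking them up in a table, a common design choice that keeps the patterns simple.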

Alphabets
An alphabet is any finite set of symbols. {0,1} is the binary alphabet, {0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F} is the
hexadecimal alphabet, and {a-z, A-Z} is the alphabet of English letters.

Strings
Any finite sequence of symbols from an alphabet is called a string. The length of a string is the total number of
symbols in it, e.g., the length of the string tutorialspoint is 14, denoted by |tutorialspoint| = 14. A string with no
symbols, i.e. a string of zero length, is known as the empty string and is denoted by ε (epsilon).

Special Symbols
A typical high-level language contains the following symbols:

Arithmetic symbols: Addition (+), Subtraction (-), Modulo (%), Multiplication (*), Division (/)
Punctuation: Comma (,), Semicolon (;), Dot (.), Arrow (->)
Assignment: =
Special assignment: +=, /=, *=, -=
Comparison: ==, !=, <, <=, >, >=
Preprocessor: #
Location specifier: &
Logical: &, &&, |, ||, !
Shift operator: >>, >>>, <<, <<<

Syntax Analysis: The second stage of translation is called syntax analysis or parsing. In this phase, expressions,
statements, declarations, etc. are identified by using the results of lexical analysis. Syntax analysis is aided by
techniques based on the formal grammar of the programming language.
• The parser converts the tokens produced by the lexical analyzer into a tree-like representation called a parse tree.
• A syntax tree is a compressed representation of the parse tree in which the operators appear as interior nodes
and the operands of an operator are the children of the node for that operator.
Input: Tokens for c = a + b * 5;
c, a, b (identifiers), = (assignment), +, * (operators), 5 (constant), ; (symbol)
Output: Syntax tree
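
One way to build such a syntax tree is a small recursive-descent parser. The sketch below assumes a tiny grammar (assignment, +, *, parentheses) and represents each interior node as a tuple (operator, left, right); the function name parse_assignment is hypothetical.

```python
import re

def parse_assignment(text):
    """Parse `identifier = expression ;` and return a nested-tuple syntax tree."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[=+*;()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(expected=None):
        # Consume the next token, optionally checking that it is the expected one
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"unexpected token {tok!r}")
        pos += 1
        return tok

    def factor():  # factor -> identifier | constant | ( expr )
        if peek() == "(":
            expect("(")
            node = expr()
            expect(")")
            return node
        tok = expect()
        if not re.fullmatch(r"\w+", tok):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok

    def term():  # term -> factor { * factor }
        node = factor()
        while peek() == "*":
            expect("*")
            node = ("*", node, factor())
        return node

    def expr():  # expr -> term { + term }
        node = term()
        while peek() == "+":
            expect("+")
            node = ("+", node, term())
        return node

    target = expect()
    expect("=")
    tree = ("=", target, expr())
    expect(";")
    return tree

print(parse_assignment("c = a + b * 5;"))
```

Because term is called from expr, the multiplication ends up deeper in the tree than the addition, which is how the grammar encodes operator precedence.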

Semantic analysis is the third phase of a compiler.

• It checks for semantic consistency.
• Type information is gathered and stored in the symbol table or in the syntax tree.
• It performs type checking.
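
Type checking with an implicit int-to-float conversion can be sketched as a walk over the syntax tree. The tuple tree format, the function name check and the assumption that a, b and c are declared as float are all illustrative; only the types int and float are handled.

```python
def check(node, types):
    """Return (annotated_tree, type) for an expression tree of nested tuples."""
    if isinstance(node, str):
        if node.isdigit():
            return node, "int"          # integer constant
        if node not in types:
            raise NameError(f"undeclared variable {node!r}")
        return node, types[node]        # type comes from the symbol table
    op, left, right = node
    left, left_type = check(left, types)
    right, right_type = check(right, types)
    if left_type != right_type:
        # Mixed int/float operands: widen the int side to float
        if left_type == "int":
            left = ("inttofloat", left)
        else:
            right = ("inttofloat", right)
        left_type = "float"
    return (op, left, right), left_type

symbols = {"a": "float", "b": "float", "c": "float"}  # assumed declarations
print(check(("+", "a", ("*", "b", "5")), symbols))
```

The inserted inttofloat node is the kind of conversion that later shows up as an explicit instruction in the intermediate code.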

Intermediate Code Generation: An intermediate representation of the final machine language code is produced.
This phase bridges the analysis and synthesis phases of translation.
The most commonly used form is three-address code.
a). Three address code
t1 = inttofloat(5)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3

Example: The three-address code for the expression a + b * c + d, applying the usual operator precedence (BODMAS), is:
T1 = b * c
T2 = a + T1
T3 = T2 + d
T1, T2 and T3 are temporary variables.
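
Three-address code like this can be produced by a post-order walk over the syntax tree, creating one temporary per operator. The sketch assumes the tree is given as nested tuples (operator, left, right); the function name three_address_code is hypothetical.

```python
def three_address_code(tree):
    """Emit three-address instructions for an expression tree of nested tuples."""
    code = []
    counter = 0

    def emit(node):
        nonlocal counter
        if isinstance(node, str):
            return node                 # a leaf is already a usable operand
        op, left, right = node
        left_name = emit(left)          # operands are evaluated first
        right_name = emit(right)
        counter += 1
        temp = f"T{counter}"
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    emit(tree)
    return code

# Syntax tree for a + b * c + d (b * c binds tighter, + is left-associative)
tree = ("+", ("+", "a", ("*", "b", "c")), "d")
print("\n".join(three_address_code(tree)))
```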

b). Postfix Notation Example:

The postfix representation of the expression (a - b) * (c + d) + (a - b) is: ab- cd+ * ab- +.
For simplicity, the parser will use the syntax-directed translation of infix expressions to postfix form. For example,
the postfix form of the expression 9-5+2 is 95-2+.
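
The conversion from infix to postfix can be sketched with an operator stack (the shunting-yard approach, swapped in here for brevity in place of a full syntax-directed translation scheme). The function name to_postfix and the precedence table are illustrative assumptions; only the ASCII operators + - * / and parentheses are handled.

```python
import re

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def to_postfix(expression):
    """Convert an infix expression to space-separated postfix form."""
    output, stack = [], []
    for tok in re.findall(r"\w+|[-+*/()]", expression):
        if tok in PRECEDENCE:
            # Pop operators of higher or equal precedence (left associativity)
            while stack and stack[-1] != "(" and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]:
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()  # discard the matching "("
        else:
            output.append(tok)  # operands go straight to the output
    while stack:
        output.append(stack.pop())
    return " ".join(output)

print(to_postfix("(a - b) * (c + d) + (a - b)"))
print(to_postfix("9-5+2"))
```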

c). Syntax Tree Example: In a syntax tree, the internal nodes are operators and the child nodes are operands.
Example: x = (a + b * c) / (a - b * c)

Code Optimization: This is an optional phase designed to improve the intermediate code so that the output runs
faster and takes less space.
Code Generation: The last phase of translation is code generation. A number of optimizations to reduce the length
of the machine language program are carried out during this phase. The output of the code generator is the machine
language program for the specified computer.

The code generation involves


o Allocation of register and memory.
o Generation of correct references.
o Generation of correct data types.
o Generation of missing code.
For the statement id1 = id2 + id3 * 5, the generated code might look like this:
LDF  R2, id3      Load data into a floating-point register
MULF R2, #5.0
LDF  R1, id2
ADDF R1, R2
STF  id1, R1      Store data from a floating-point register

Table Management (or) Book-keeping: The symbol table stores all the information about the identifiers
used in the program. It is an important data structure created and maintained by the compiler to
keep track of the semantics of variables, i.e. it stores scope and binding information about
names, and information about instances of various entities such as variable and function names, classes, objects, etc.
• It is a data structure containing a record for each identifier, with fields for the attributes of the identifier.
• It allows the compiler to find the record for each identifier quickly and to store or retrieve data from that record.
• Whenever an identifier is detected in any of the phases, it is stored in the symbol table.
1. It is built in the lexical and syntax analysis phases.
2. The information is collected by the analysis phases of the compiler and is used by the synthesis phases to
generate code.
3. It is used by the compiler to achieve compile-time efficiency.
4. It is used by the various phases of the compiler as follows:

1. Lexical Analysis: Creates new entries in the table, for example entries about tokens.
2. Syntax Analysis: Adds information regarding attribute type, scope, dimension, line of reference, use, etc. to the
table.
3. Semantic Analysis: Uses the information in the table to check semantics, i.e. to verify that expressions
and assignments are semantically correct (type checking), and updates the table accordingly.
4. Intermediate Code Generation: Refers to the symbol table to find out how much and what type of run-time storage is
allocated; the table also helps in adding temporary variable information.
5. Code Optimization: Uses information in the symbol table for machine-dependent optimization.
6. Target Code Generation: Generates code by using the address information of the identifiers in the table.

Symbol Table entries: Each entry in the symbol table is associated with attributes that support the compiler in
different phases.
Items stored in the symbol table:
• Variable names and constants
• Procedure and function names
• Literal constants and strings
• Compiler-generated temporaries
• Labels in source languages
Information used by the compiler from the symbol table:
• Data type and name
• Declaring procedures
• Offset in storage
• If a structure or record, a pointer to the structure table
• For parameters, whether parameter passing is by value or by reference
• Number and type of arguments passed to a function
• Base address

Operations of Symbol table – The basic operations defined on a symbol table include:

Example
int a, b; float c; char z, x;

Symbol name    Type     Address
a              int      1000
b              int      1002
c              float    1004
z              char     1008
x              char     1009

extern double test(double x);
double sample(int count)
{
    double sum = 0.0;
    for (int i = 1; i <= count; i++)
        sum += test((double) i);
    return sum;
}

Symbol name    Type                Scope
test           function, double    extern
x              double              function parameter
sample         function, double    global
count          int                 function parameter
sum            double              block local
i              int                 for-loop statement

Consider the following C++ function:

// Define a global function
int add(int a, int b)
{
    int sum = 0;
    sum = a + b;
    return sum;
}

Symbol table for the above code:

Name    Type        Scope
add     function    global
a       int         function parameter
b       int         function parameter
sum     int         local
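
A scoped symbol table of this kind can be sketched as a stack of dictionaries. The class name SymbolTable and the operation names insert, lookup, enter_scope and exit_scope are assumptions made for illustration; they are not taken from the text above.

```python
class SymbolTable:
    """Stack of scopes; each scope maps a name to its attribute record."""

    def __init__(self):
        self.scopes = [{}]              # index 0 is the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def insert(self, name, type_, scope_kind):
        current = self.scopes[-1]
        if name in current:
            raise KeyError(f"multiple declaration of {name!r} in one scope")
        current[name] = {"type": type_, "scope": scope_kind}

    def lookup(self, name):
        # Search from the innermost scope outwards
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None

table = SymbolTable()
table.insert("add", "function", "global")
table.enter_scope()                     # body of add()
table.insert("a", "int", "function parameter")
table.insert("b", "int", "function parameter")
table.insert("sum", "int", "local")
print(table.lookup("sum"))
table.exit_scope()
print(table.lookup("sum"))              # sum is no longer visible
```

Searching from the innermost scope outwards is what makes a local name hide a global one with the same spelling.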

Example of Compilation Process

Error detection and Recovery in Compiler
In this phase of compilation, all possible errors made by the user are detected and reported to the user in the form of
error messages. This process of locating errors and reporting them to the user is called the error handling process.
Functions of the error handler
• Detection
• Reporting
• Recovery

Classification of Errors

• Each phase can encounter errors. After detecting an error, a phase must handle the error so that compilation can
proceed.
i. In lexical analysis, errors occur in the separation of tokens.
ii. In syntax analysis, errors occur during construction of the syntax tree.
iii. In semantic analysis, errors may occur in the following cases:
(i) When the compiler detects constructs that have the right syntactic structure but no meaning.
(ii) During type conversion.
• In code optimization, errors occur when the result is affected by the optimization. In code generation, an
error is reported when code is missing, etc.
The figure illustrates the translation of source code through each phase, considering the statement
c = a + b * 5.

Lexical Errors
These include incorrect or misspelled identifier names, i.e. identifiers typed incorrectly (e.g. INT, integer).
Lexical phase errors
These errors are detected during the lexical analysis phase. Typical lexical errors are:
• Exceeding the permitted length of an identifier or numeric constant
• Appearance of illegal characters
• Unmatched string
Example 1: printf("Geeksforgeeks");$
This is a lexical error since an illegal character $ appears at the end of the statement.

Example 2: This is a comment */

This is a lexical error since the end of the comment is present but its beginning is not.
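
Detecting illegal characters and unmatched strings can be sketched as below. The pattern list, the function name lexical_errors and the error message wording are assumptions for illustration; the set of legal characters is deliberately small.

```python
import re

PATTERNS = [
    ("string",    r'"[^"\n]*"'),        # complete string literal
    ("badstring", r'"[^"\n]*'),         # opening quote with no closing quote
    ("word",      r"[A-Za-z_]\w*"),
    ("number",    r"\d+"),
    ("punct",     r"[()\[\]{};,=+\-*/<>]"),
    ("space",     r"\s+"),
    ("illegal",   r"."),                # anything else is not in the language
]
SCANNER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in PATTERNS))

def lexical_errors(line):
    """Return a list of lexical error messages found in one line of source."""
    errors = []
    for match in SCANNER.finditer(line):
        column = match.start() + 1
        if match.lastgroup == "illegal":
            errors.append(f"column {column}: illegal character {match.group()!r}")
        elif match.lastgroup == "badstring":
            errors.append(f"column {column}: unmatched string")
    return errors

print(lexical_errors('printf("Geeksforgeeks");$'))
print(lexical_errors('printf("Geeksforgeeks);'))
```

The scanner keeps going after the first problem, so several lexical errors on one line are all reported.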

Syntactical Errors
These include a missing semicolon or unbalanced parentheses. Syntactic errors are handled by the syntax analyzer
(parser).
When an error is detected, the parser must handle it so that the rest of the input can still be parsed. Errors
may be expected at various stages of compilation, but most errors are syntactic, so the
parser should be able to detect and report them.

Syntactic phase errors

These errors are detected during the syntax analysis phase. Typical syntax errors are:
• Errors in structure
• Missing operator
• Misspelled keywords
• Unbalanced parentheses
Example: swicth(ch)
{
.......
.......
}
The keyword switch is incorrectly written as swicth. Hence, an "Unidentified keyword/identifier" error occurs.

The goals of the error handler in a parser are:

• Report the presence of errors clearly and accurately.
• Recover from each error quickly enough to detect subsequent errors.
• Add minimal overhead to the processing of correct programs.

There are four common error-recovery strategies that can be implemented in the parser to deal with errors in
the code.
o Panic mode.
o Statement level.
o Error productions.
o Global correction.
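
Of these strategies, panic mode is the simplest to sketch: on an error the parser discards tokens until it reaches a synchronizing token and then resumes. The sketch assumes a toy language whose only statement is `identifier = number ;` and uses ; as the synchronizing token; the function name parse_statements is hypothetical.

```python
import re

def parse_statements(source):
    """Parse `identifier = number ;` statements with panic-mode recovery."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    shape = [r"[A-Za-z_]\w*", r"=", r"\d+", r";"]   # expected token patterns
    parsed, errors = [], []
    pos = 0
    while pos < len(tokens):
        stmt = tokens[pos:pos + len(shape)]
        if len(stmt) == len(shape) and all(
            re.fullmatch(pattern, tok) for pattern, tok in zip(shape, stmt)
        ):
            parsed.append((stmt[0], int(stmt[2])))
            pos += len(shape)
            continue
        errors.append(f"syntax error in statement starting at token {pos + 1}")
        # Panic mode: skip everything up to and including the next ';'
        while pos < len(tokens) and tokens[pos] != ";":
            pos += 1
        pos += 1
    return parsed, errors

print(parse_statements("x = 1; y = = 2; z = 3;"))
```

The faulty second statement is reported once and skipped, and the parser still recognises the statement after it.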

Semantical Errors
These errors are detected during the semantic analysis phase. They are often the result of an incompatible value
assignment. The semantic errors that the semantic analyzer is expected to recognize are:
• Incompatible types of operands
• Undeclared variables
• Type mismatch
• Reserved identifier misuse
• Multiple declaration of a variable in a scope
• Accessing an out-of-scope variable
• Actual arguments that do not match the formal ones

Example: int a[10], b;
.......
.......
a = b;
This generates a semantic error because the types of a and b are incompatible.
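
A few of these checks can be sketched over a simplified program representation. The input format (lists of declarations and of simple name-to-name assignments), the type strings and the function name semantic_errors are assumptions for illustration only.

```python
def semantic_errors(declarations, assignments):
    """Report multiple declarations, undeclared variables and type mismatches."""
    symbols, errors = {}, []
    for name, type_ in declarations:
        if name in symbols:
            errors.append(f"multiple declaration of {name}")
        else:
            symbols[name] = type_
    for target, source in assignments:
        missing = [n for n in (target, source) if n not in symbols]
        if missing:
            errors.extend(f"undeclared variable {n}" for n in missing)
        elif symbols[target] != symbols[source]:
            errors.append(
                f"type mismatch: {target} is {symbols[target]}, {source} is {symbols[source]}"
            )
    return errors

# int a[10], b;  a = b;  b = c;
declared = [("a", "int[10]"), ("b", "int")]
print(semantic_errors(declared, [("a", "b"), ("b", "c")]))
```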

Error Handling in Compiler Design
The tasks of the error handling process are to detect each error, report it to the user, and then devise and
apply a recovery strategy to handle the error. Throughout this process, the processing time of the program should
not increase noticeably. Errors appear as blank entries in the symbol table.

Types or Sources of Error: There are three types of error: run-time, logic, and compile-time errors.
1. A run-time error is an error that takes place during the execution of a program, and usually happens because of
adverse system parameters or invalid input data. A lack of sufficient memory to run an application or a memory
conflict with another program are examples of this.
2. Logic errors occur when executed code does not produce the expected result. Logic errors are best handled by
meticulous program debugging.

3. Compile-time errors arise at compile time, before execution of the program. A syntax error or a missing file
reference that prevents the program from compiling successfully is an example of this.

Classification of Compile-time errors:

1. Lexical: misspellings of identifiers, keywords or operators
2. Syntactical: a missing semicolon or unbalanced parentheses
3. Semantical: incompatible value assignment or type mismatches between operator and operand
4. Logical: unreachable code or an infinite loop, as in the following example:

for (n = 1; n > 0; n++)
{
    cout << n;
}
