Dbms Notes
Dbms Notes
The database is a collection of inter-related data which is used to retrieve, insert and delete the
data efficiently. It is also used to organize the data in the form of a table, schema, views, and
reports, etc.
For example: The college Database organizes the data about the admin, staff, students and
faculty etc.
Using the database, you can easily retrieve, insert, and delete the information.
o Data Definition: It is used for creation, modification, and removal of definition that defines the
organization of data in the database.
o Data Updation: It is used for the insertion, modification, and deletion of the actual data in the
database.
o Data Retrieval: It is used to retrieve the data from the database which can be used by
applications for various purposes.
o User Administration: It is used for registering and monitoring users, maintain data integrity,
enforcing data security, dealing with concurrency control, monitoring performance and
recovering information corrupted by unexpected failure.
Characteristics of DBMS
o It uses a digital repository established on a server to store and manage the information.
o It can provide a clear and logical view of the process that manipulates data.
o DBMS contains automatic backup and recovery procedures.
o It contains ACID properties which maintain data in a healthy state in case of failure.
o It can reduce the complex relationship between data.
o It is used to support manipulation and processing of data.
o It is used to provide security of data.
o It can view the database from different viewpoints according to the requirements of the user.
Advantages of DBMS
o Controls database redundancy: It can control data redundancy because it stores all the data in
one single database file and that recorded data is placed in the database.
o Data sharing: In DBMS, the authorized users of an organization can share the data among
multiple users.
o Easily Maintenance: It can be easily maintainable due to the centralized nature of the database
system.
o Reduce time: It reduces development time and maintenance need.
o Backup: It provides backup and recovery subsystems which create automatic backup of data
from hardware and software failures and restores the data if required.
o multiple user interface: It provides different types of user interfaces like graphical user
interfaces, application program interfaces
Disadvantages of DBMS
o Cost of Hardware and Software: It requires a high speed of data processor and large memory size
to run DBMS software.
o Size: It occupies a large space of disks and large memory to run them efficiently.
o Complexity: Database system creates additional complexity and requirements.
Higher impact of failure: Failure is highly impacted the database because in most
of the organization, all the data stored in a single database and if the database is
damaged due to electric failure or database corruption then the data may be lost
forever. Difference between File System and DBMS
Consider an example of a student's file system. The student file will contain information
regarding the student (i.e. roll no, student name, course etc.). Similarly, we have a subject file
that contains information about the subject and the result file which contains the information
regarding the result.
Some fields are duplicated in more than one file, which leads to data redundancy. So to
overcome this problem, we need to create a centralized system, i.e. DBMS approach.
DBMS:
A database approach is a well-organized collection of data that are related in a meaningful way
which can be accessed by different users but stored only once in a system. The various
operations performed by the DBMS system are: Insertion, deletion, selection, sorting etc.
In the above figure,
There are the following differences between DBMS and File systems:
Meaning DBMS is a collection of data. In DBMS, The file system is a collection of data. In th
the user is not required to write the system, the user has to write the procedure
procedures. for managing the database.
Sharing of data Due to the centralized approach, data Data is distributed in many files, and it ma
sharing is easy. be of different formats, so it isn't easy t
share data.
Data Abstraction DBMS gives an abstract view of data that The file system provides the detail of the dat
hides the details. representation and storage of data.
Security and DBMS provides a good protection It isn't easy to protect a file under the fil
Protection mechanism. system.
Recovery DBMS provides a crash recovery The file system doesn't have a cras
Mechanism mechanism, i.e., DBMS protects the user mechanism, i.e., if the system crashes whil
from system failure. entering some data, then the content of th
file will be lost.
Manipulation DBMS contains a wide variety of The file system can't efficiently store an
Techniques sophisticated techniques to store and retrieve the data.
retrieve the data.
Concurrency DBMS takes care of Concurrent access of In the File system, concurrent access ha
Problems data using some form of locking. many problems like redirecting the file whil
deleting some information or updating som
information.
Where to use Database approach used in large systems File system approach used in large system
which interrelate many files. which interrelate many files.
Cost The database system is expensive to design. The file system approach is cheaper t
design.
Data Redundancy Due to the centralization of the database, In this, the files and application programs ar
and Inconsistency the problems of data redundancy and created by different programmers so tha
inconsistency are controlled. there exists a lot of duplication of data whic
may lead to inconsistency.
Structure The database structure is complex to The file system approach has a simpl
design. structure.
Data In this system, Data Independence exists, In the File system approach, there exists n
Independence and it can be of two types. Data Independence.
o Logical Data Independence
o Physical Data Independence
Integrity Integrity Constraints are easy to apply. Integrity Constraints are difficult t
Constraints implement in file system.
Data Models In the database approach, 3 types of data In the file system approach, there is n
models exist: concept of data models exists.
o Hierarchal data models
o Network data models
o Relational data models
Flexibility Changes are often a necessity to the content The flexibility of the system is less a
of the data stored in any system, and these compared to the DBMS approach.
changes are more easily with a database
approach.
Data Models
Data Model is the modeling of the data description, data semantics, and consistency constraints
of the data. It provides the conceptual tools for describing the design of a database at each level
of data abstraction. Therefore, there are following four data models used for understanding the
structure of the database:
Relational Model
The most popular data model in DBMS is the Relational Model. It is more scientific a model
than others. This model is based on first-order predicate logic and defines a table as an n-ary
relation.
hierarchical model,
2. Network Model
The Network Model was formalized by the Database Task group in the 1960s. This
model is the generalization of the hierarchical model. This model can consist of
multiple parent segments and these segments are grouped as levels but there exists a
logical association between the segments belonging to any level. Mostly, there exists a
many-to-many logical association between any of the two segments.
Object oriented data model is based upon real world situations. These situations are represented
as objects, with different attributes. All these object have multiple relationships between them.
Elements of Object oriented data model
Objects
The real world entities and situations are represented as objects in the Object oriented database
model.
Attributes and Method
Every object has certain characteristics. These are represented using Attributes. The behaviour
of the objects is represented using Methods.
Class
Similar attributes and methods are grouped together using a class. An object can be called as an
instance of the class.
Inheritance
A new class can be derived from the original class. The derived class contains attributes and
methods of the original class as well as its own.
Example
An Example of the Object Oriented data model is −
Shape, Circle, Rectangle and Triangle are all objects in this model.
Circle has the attributes Center and Radius.
Rectangle has the attributes Length and Breath
Triangle has the attributes Base and Height.
The objects Circle, Rectangle and Triangle inherit from the object Shape.
Levels of abstraction
Data Abstraction is a process of hiding unwanted or irrelevant details from the end user. It
provides a different view and helps in achieving data independence which is used to enhance the
security of data.
The database systems consist of complicated data structures and relations. For users to access
the data easily, these complications are kept hidden, and only the relevant part of the database is
made accessible to the users through data abstraction.
Because they give programmers, Database Managers, and end users a consolidated view of the
data, Database Management Systems are crucial because they relieve applications and end users
of the need to comprehend the physical location of the data. Application Programme Interfaces
(APIs) manage internet requests and responses for particular sorts of data.
In marketing materials, the phrase "database as a service" (DBaaS) may be used to refer to both
relational and non-relational DBMS components that are given via the internet.
Users of DBMSs include application programmers, Database Administrators (DBAs), and end
users.
Database Administrators are typically the only people who work directly with a DBMS. Today,
end users read and write to databases using front-end interfaces made by programmers, while
programmers use cloud APIs to connect with DBMSs.
hree Parts that make up the Database System are:
o Query Processor
o Storage Manager
o Disk Storage
1. Query Processor
The query processing is handled by the query processor, as the name implies. It executes the
user's query, to put it simply. In this way, the query processor aids the database system in making
data access simple and easy. The query processor's primary duty is to successfully execute the
query. The Query Processor transforms (or interprets) the user's application program-provided
requests into instructions that a computer can understand.
Data Definition Language is what DDL stands for. As implied by the name, the DDL Interpreter
interprets DDL statements like those used in schema definitions (such as create, remove, etc.).
This interpretation yields a set of tables that include the meta-data (data of data) that is kept in
the data dictionary. Metadata may be stored in a data dictionary. In essence, it is a part of the disc
storage that will be covered in a later section of this article.
o DML Compiler:
Compiler for DML Data Manipulation Language is what DML stands for. In keeping with its
name, the DML Compiler converts DML statements like select, update, and delete into low-level
instructions or simply machine-readable object code, to enable execution. The optimization of
queries is another function of the DML compiler. Since a single question can typically be
translated into a number of evaluation plans. As a result, some optimization is needed to select
the evaluation plan with the lowest cost out of all the options. This process, known as query
optimization, is exclusively carried out by the DML compiler. Simply put, query optimization
determines the most effective technique to carry out a query.
Before the query evaluation, the embedded DML commands in the application program (such as
SELECT, FROM, etc., in SQL) must be pre-compiled into standard procedural calls (program
instructions that the host language can understand). Therefore, the DML statements which are
embedded in an application program must be converted into routine calls by the Embedded DML
Pre-compiler.
o Query Optimizer:
It starts by taking the evaluation plan for the question, runs it, and then returns the result. Simply
said, the query evaluation engine evaluates the SQL commands used to access the database's
contents before returning the result of the query. In a nutshell, it is in charge of analyzing the
queries and running the object code that the DML Compiler produces. Apache Drill, Presto, and
other Query Evaluation Engines are a few examples.
2. Storage Manager:
An application called Storage Manager acts as a conduit between the queries made and the data
kept in the database. Another name for it is Database Control System. By applying the
restrictions and running the DCL instructions, it keeps the database's consistency and integrity. It
is in charge of retrieving, storing, updating, and removing data from the database.
o Integrity Manager:
Whenever there is any change in the database, the Integrity manager will manage the integrity
constraints.
o Authorization Manager:
Authorization manager verifies the user that he is valid and authenticated for the specific query
or request.
o File Manager:
All the files and data structure of the database are managed by this component.
o Transaction Manager:
It is responsible for making the database consistent before and after the transactions. Concurrent
processes are generally controlled by this component.
o Buffer Manager:
The transfer of data between primary and main memory and managing the cache memory is done
by the buffer manager.
3. Disk Storage
A DBMS can use various kinds of Data Structures as a part of physical system implementation in
the form of disk storage.
o Data Dictionary:
It contains the metadata (data of data), which means each object of the database has some
information about its structure. So, it creates a repository which contains the details about the
structure of the database object.
o Data Files:
Database Designing
The next step involves designing the database considering the user-based requirements and
splitting them out into various models so that load or heavy dependencies on a single aspect are
not imposed. Therefore, there has been some model-centric approach and that's where logical
and physical models play a crucial role.
Physical Model - The physical model is concerned with the practices and implementations of the
logical model.
Logical Model - This stage is primarily concerned with developing a model based on the
proposed requirements. The entire model is designed on paper without any implementation or
adopting DBMS considerations.
Implementation
The last step covers the implementation methods and checking out the behavior that matches our
requirements. It is ensured with continuous integration testing of the database with different data
sets and conversion of data into machine understandable language. The manipulation of data is
primarily focused on these steps where queries are made to run and check if the application is
designed satisfactorily or not.
Data conversion and loading - This section is used to import and convert data from the old to
the new system.
Testing - This stage is concerned with error identification in the newly implemented system.
Testing is a crucial step because it checks the database directly and compares the requirement
specifications.
Logical
A logical data model generally describes the data in as many details as possible, without having
to be concerned about the physical implementations in the database. Features of logical data
model might include:
Also, one important factor after following the above approach is to critically examine the design
based on requirement gathering. If the above steps are strictly followed, there are chances of
creating a highly efficient database design that follows the native approach.
To understand these points, see the image below to get a clear picture.
If we compare the logical data model as shown in the figure above with some sample data in the
diagram, we can come up with facts that in a conceptual data model there are no presence of a
primary key whereas a logical data model has primary keys for all of its attributes. Also, logical
data model the cover relationship between different entities and carries room for foreign keys to
establish relationships among them.
Physical
A Physical data mode generally represents how the approach or concept of designing the
database. The main purpose of the physical data model is to show all the structures of the table
including the column name, column data type, constraints, keys(primary and foreign), and
the relationship among tables. The following are the features of a physical data model:
While designing a physical data model, the following points should be taken into consideration:
Comparing this physical data model with the logical with the previous logical model, we might conclude
the differences that in a physical database entity names are considered table names and attributes are
considered column names. Also, the data type of each column is defined in the physical model depending
on the actual database used.
ER (Entity Relationship) Diagram in DBMS
o ER model stands for an Entity-Relationship model. It is a high-level data model. This model is
used to define the data elements and relationship for a specified system.
o It develops a conceptual design for the database. It also develops a very simple and easy to design
view of data.
o In ER modeling, the database structure is portrayed as a diagram called an entity-relationship
diagram.
For example, Suppose we design a school database. In this database, the student will be an
entity with attributes like address, name, id, age, etc. The address can be another entity with
attributes like city, street name, pin code, etc and there will be a relationship between them.
Component of ER Diagram
1. Entity:
An entity may be any object, class, person or place. In the ER diagram, an entity can be
represented as rectangles.
An entity that depends on another entity called a weak entity. The weak entity doesn't contain
any key attribute of its own. The weak entity is represented by a double rectangle.
2. Attribute
The attribute is used to describe the property of an entity. Eclipse is used to represent an attribute.
For example, id, age, contact number, name, etc. can be attributes of a student.
a. Key Attribute
The key attribute is used to represent the main characteristics of an entity. It represents a primary
key. The key attribute is represented by an ellipse with the text underlined.
b. Composite Attribute
An attribute that composed of many other attributes is known as a composite attribute. The
composite attribute is represented by an ellipse, and those ellipses are connected with an ellipse.
c. Multivalued Attribute
An attribute can have more than one value. These attributes are known as a multivalued attribute.
The double oval is used to represent multivalued attribute.
For example, a student can have more than one phone number.
d. Derived Attribute
An attribute that can be derived from other attribute is known as a derived attribute. It can be
represented by a dashed ellipse.
For example, A person's age changes over time and can be derived from another attribute like
Date of birth.
3. Relationship
A relationship is used to describe the relation between entities. Diamond or rhombus is used to
represent the relationship.
Types of relationship are as follows:
a. One-to-One Relationship
When only one instance of an entity is associated with the relationship, then it is known as one to
one relationship.
For example, A female can marry to one male, and a male can marry to one female.
b. One-
to-many relationship
When only one instance of the entity on the left, and more than one instance of an entity on the
right associates with the relationship then this is known as a one-to-many relationship.
For example, Scientist can invent many inventions, but the invention is done by the only
specific scientist
c. Many-to-one relationship
When more than one instance of the entity on the left, and only one instance of an entity on the
right associates with the relationship then it is known as a many-to-one relationship.
For example, Student enrolls for only one course, but a course can have many students.
d. Many-to-many relationship
When more than one instance of the entity on the left, and more than one instance of an entity on
the right associates with the relationship then it is known as a many-to-many relationship.
For example, Employee can assign by many projects and project can have many employees.
Cardinality
Cardinality means how the entities are arranged to each other or what is the relationship structure
between entities in a relationship set. In a Database Management System, Cardinality represents
a number that denotes how many times an entity is participating with another entity in a
relationship set. The Cardinality of DBMS is a very important attribute in representing the
structure of a Database. In a table, the number of rows or tuples represents the Cardinality.
Cardinality Ratio
Cardinality ratio is also called Cardinality Mapping, which represents the mapping of one
entity set to another entity set in a relationship set. We generally take the example of a binary
relationship set where two entities are mapped to each other.
Cardinality is very important in the Database of various businesses. For example, if we want to
track the purchase history of each customer then we can use the one-to-many cardinality to find
the data of a specific customer. The Cardinality model can be used in Databases by Database
Managers for a variety of purposes, but corporations often use it to evaluate customer or
inventory data.
1. One to one
2. Many to one
3. One to many
4. Many to many
One to One
One to one cardinality is represented by a 1:1 symbol. In this, there is at most one relationship
from one entity to another entity. There are a lot of examples of one-to-one cardinality in real life
databases.
For example, one student can have only one student id, and one student id can belong to only
one student. So, the relationship mapping between student and student id will be one to one
cardinality mapping.
Another example is the relationship between the director of the school and the school because
one school can have a maximum of one director, and one director can belong to only one school.
One to one Cardinality is the subset of Many to one Cardinality. It can be represented by M:1.
For example, there are multiple patients in a hospital who are served by a single doctor, so the
relationship between patients and doctors can be represented by Many to one Cardinality.
For Example, in a hospital, there can be various compounders, so the relationship between the
hospital and compounders can be mapped through One-to-many Cardinality.
Many to Many Cardinalities:
In many, many cardinalities mapping, there can be one or more than one entity that can associate
with one or more than one entity of set 2. In the same way from the end of set 2, one or more
than one entity can make a relation with one or more than one entity of set 1.
It is represented by M: N or N: M.
One to one cardinality, One to many cardinalities, and Many to one cardinality is the subset of
the many to many cardinalities.
For Example, in a college, multiple students can work on a single project, and a single student
can also work on multiple projects. So, the relationship between the project and the student can
be represented by many to many cardinalities.
Entity Set: An Entity is an object of Entity Type and a set of all entities is called an
entity set. For Example, E1 is an entity having Entity Type Student and the set of all
students is called Entity Set. In ER diagram, Entity Type is represented as:
Entity Set
Entity-Relationship Set
A set of relationships of the same type is known as a relationship set. The following
relationship set depicts S1 as enrolled in C2, S2 as enrolled in C1, and S3 as registered
in C3.
Relationship Set
Unary Relationship
2. Binary Relationship: When there are TWO entities set participating in a relationship,
the relationship is called a binary relationship. For example, a Student is enrolled in a
Course.
Binary Relationship
3. n-ary Relationship: When there are n entities set participating in a relation, the
relationship is called an n-ary relationship.
Participation Constraint
Participation Constraint is applied to the entity participating in the relationship set.
1. Total Participation – Each entity in the entity set must participate in the relationship.
If each student must enroll in a course, the participation of students will be total. Total
participation is shown by a double line in the ER diagram.
2. Partial Participation – The entity in the entity set may or may NOT participate in
the relationship. If some courses are not enrolled by any of the students, the
participation in the course will be partial.
The diagram depicts the ‘Enrolled in’ relationship set with Student Entity set having
total participation and Course Entity set having partial participation.
Keys
o Keys play an important role in the relational database.
o It is used to uniquely identify any record or row of data from the table. It is also used to establish
and identify relationships between tables.
For example, ID is used as a key in the Student table because it is unique for each student. In the
PERSON table, passport_number, license_number, SSN are keys since they are unique for each
person.
Types of keys:
1. Primary key
o It is the first key used to identify one and only one instance of an entity uniquely. An entity can
contain multiple keys, as we saw in the PERSON table. The key which is most suitable from
those lists becomes a primary key.
o In the EMPLOYEE table, ID can be the primary key since it is unique for each employee. In the
EMPLOYEE table, we can even select License_Number and Passport_Number as primary keys
since they are also unique.
o For each entity, the primary key selection is based on requirements and developers.
2. Candidate key
o A candidate key is an attribute or set of attributes that can uniquely identify a tuple.
o Except for the primary key, the remaining attributes are considered a candidate key. The
candidate keys are as strong as the primary key.
For example: In the EMPLOYEE table, id is best suited for the primary key. The rest of the
attributes, like SSN, Passport_Number, License_Number, etc., are considered a candidate key.
3. Super Key
Super key is an attribute set that can uniquely identify a tuple. A super key is a superset of a
candidate key.
For example: In the above EMPLOYEE table, for(EMPLOEE_ID, EMPLOYEE_NAME), the
name of two employees can be the same, but their EMPLYEE_ID can't be the same. Hence, this
combination can also be a key.
PauseNext
Unmute
Duration 18:10
Loaded: 5.50%
Â
Fullscreen
4. Foreign key
o Foreign keys are the column of the table used to point to the primary key of another table.
o Every employee works in a specific department in a company, and employee and department are
two different entities. So we can't store the department's information in the employee table. That's
why we link these two tables through the primary key of one table.
o We add the primary key of the DEPARTMENT table, Department_Id, as a new attribute in the
EMPLOYEE table.
o In the EMPLOYEE table, Department_Id is the foreign key, and both the tables are related.
5. Alternate key
There may be one or more attributes or a combination of attributes that uniquely identify each
tuple in a relation. These attributes or combinations of the attributes are called the candidate keys.
One key is chosen as the primary key from these candidate keys, and the remaining candidate
key, if it exists, is termed the alternate key. In other words, the total number of the alternate
keys is the total number of candidate keys minus the primary key. The alternate key may or may
not exist. If there is only one candidate key in a relation, it does not have an alternate key.
For example, employee relation has two attributes, Employee_Id and PAN_No, that act as
candidate keys. In this relation, Employee_Id is chosen as the primary key, so the other candidate
key, PAN_No, acts as the Alternate key.
6. Composite key
Whenever a primary key consists of more than one attribute, it is known as a composite key.
This key is also known as Concatenated Key.
For example, in employee relations, we assume that an employee may be assigned multiple
roles, and an employee may work on multiple projects simultaneously. So the primary key will
be composed of all three attributes, namely Emp_ID, Emp_role, and Proj_ID in combination. So
these attributes act as a composite key since the primary key comprises more than one attribute.
7. Artificial key
The key created using arbitrarily assigned data are known as artificial keys. These keys are
created when a primary key is large and complex and has no relationship with many other
relations. The data values of the artificial keys are usually numbered in a serial order.
For example, the primary key, which is composed of Emp_ID, Emp_role, and Proj_ID, is large
in employee relations. So it would be better to add a new virtual attribute to identify each tuple in
the relation uniquely.
Generalization
o Generalization is like a bottom-up approach in which two or more entities of lower level combine
to form a higher level entity if they have some attributes in common.
o In generalization, an entity of a higher level can also combine with the entities of the lower level
to form a further higher level entity.
o Generalization is more like subclass and superclass system, but the only difference is the
approach. Generalization uses the bottom-up approach.
o In generalization, entities are combined to form a more generalized entity, i.e., subclasses are
combined to make a superclass.
For example, Faculty and Student entities can be generalized and create a higher level entity
Person.
Specialization
o Specialization is a top-down approach, and it is opposite to Generalization. In specialization, one
higher level entity can be broken down into two lower level entities.
o Specialization is used to identify the subset of an entity set that shares some distinguishing
characteristics.
o Normally, the superclass is defined first, the subclass and its related attributes are defined next,
and relationship set are then added.
Aggregation
In aggregation, the relation between two entities is treated as a single entity. In aggregation,
relationship with its corresponding entities is aggregated into a higher level entity.
For example: Center entity offers the Course entity act as a single entity in the relationship
which is in a relationship with another entity visitor. In the real world, if a visitor visits a
coaching center then he will never enquiry about the Course only or just about the Center instead
he will ask the enquiry about both
UNIT 2
Integrity Constraints
o Integrity constraints are a set of rules. It is used to maintain the quality of information.
o Integrity constraints ensure that the data insertion, updating, and other processes have to
be performed in such a way that data integrity is not affected.
o Thus, integrity constraint is used to guard against accidental damage to the database.