The Relational Data Model and Relational Database Constraints Ch5 (Navathe 4th edition)/Ch7 (Navathe 3rd edition)
The document outlines the relational data model concepts and constraints as introduced by Dr. E.F. Codd in 1970, highlighting the advantages of relational database management systems (RDBMS) over earlier data models. It explains key components such as relations, attributes, tuples, and various types of constraints including domain, key, and referential integrity constraints. Additionally, it discusses Codd's 12 rules that define the criteria for a fully relational database system.
The Relational Data Model and Relational Database Constraints Ch5 (Navathe 4th edition)/Ch7 (Navathe 3rd edition)
1.
1 The Relational DataModel and Relational Database Constraints Ch5 (Navathe 4th edition)/ Ch7 (Navathe 3rd edition) OUTLINE o Relational Model Concepts o Domains, Attributes, and Relations o Characteristics of Relations o Relational Model Constrains and Relational Database Schemas Introduction The principles of the relational model were first outlined by Dr. E. F. Codd in a June 1970 paper called "A Relational Model of Data for Large Shared Data Banks:' In this paper. Dr. Codd proposed the relational model for database systems. The more popular models used at that time were hierarchical and network, or even simple flat file data structures. Relational database management systems (RDBMS) soon became very popular, especially for their ease of use and flexibility in structure. In addition, a number of innovative vendors, such as Oracle, supplemented the RDBMS with a suite of powerful application development and user products, providing a total solution. Earlier we saw how to convert an unorganized text description of information requirements into a conceptual design, by the use of ER diagrams. The advantage of ER diagrams is that they force you to identify data requirements that are implicitly known, but not explicitly written down in the original description. Here we will see how to convert this ER into a logical design (this will be defined below) of a relational database. The logical model is also called a Relational Model. Relational Model Concepts The relational Model of Data is based on the concept of a Relation. A Relation is a mathematical concept based on the ideas of sets. The strength of the relational approach to data management comes from the formal foundation provided by the theory of relations. The model was first proposed by Dr. E.F. Codd of IBM Research in 1970. The relational model represents the database as a collection of relations. RELATION Informal Definition Each relation resembles a table of values. A relation may be thought of as a set of rows. A relation may alternately be thought of as a set of columns. Each row represents a fact that corresponds to a real-world entity or relationship. Each row has a value of an item or set of items that uniquely identifies that row in the table. Sometimes row-ids or sequential numbers are assigned to identify the rows in the table. Each column typically is called by its column name or column header or attribute name. Formal Definitions
2.
2 In the formalrelational model terminology The table is called a relation. A row is called a tuple. A column header is called an attribute. The data type describing the types of values that can appear in each column is represented by a domain of possible values. Example of STUDENT Relation(figure 5.1) Relational Schema: A relation schema R, denoted by R( A1, A2, A3, …, An ) is made up of a relation name R and a list of attributes A1, A2, A3, …, An . Attribute: It is the name of a column in a particular table (all data is stored in tables). Each attribute Ai must have a domain, dom(Ai ). Domain: It is a set of atomic values that an attribute can take and is denoted by dom(A). Atomic means that each value in the domain is indivisible as far as the relational model is concerned. Degree of a Relation: The degree (or arity) of a relation is the number of attributes n of its relation schema. An example of a relation schema for a relation of degree seven, which describes university students, is the following: STUDENT(Name, SSN, HomePhone, Address, OfficePhone, Age, GPA) Using the data type of each attribute, the definition is sometimes written as: STUDENT(Name: string, SSN: string, HomePhone: string, Address: string, OfficePhone:string, Age: integer, GPA: real) For this relation schema, STUDENT is the name of the relation, which has sevenattributes. Some domains for some of the attributes of the STUDENT relation: dom(Name) = Names;
3.
3 dom(SSN)=Social_security_numbers; It is alsopossible to refer to attributes of a relation schema by their position within the relation; thus, the second attribute of the STUDENT relation is SSN, whereas the fourth attribute is Address. Formally, Given R(A1, A2, .........., An) r(R) dom (A1) X dom (A2) X ....X dom(An) Relation state: A relation (or relation state) r of the relation schema R(A1, A2, .........., An) also denoted by r(R), is a set of tuples (rows) r(R) = {t1, t2, …, tn} where each ti is an n-tuple. ti = <v1, v2, …, vn> where each vj element-of dom(Aj). Let R(A1, A2) be a relation schema: Let dom(A1) = {0,1} Let dom(A2) = {a,b,c} Then: dom(A1) X dom(A2) is all possible combinations: {<0,a> , <0,b> , <0,c>, <1,a>, <1,b>, <1,c> } The relation state r(R) dom(A1) X dom(A2) A relation (or relation state) r of the relation schema R(A1, A2, .........., An) also denoted by r(R), is a set of n-tuples r = {tl , tz, ... , tm}. Each n-tuple t is an ordered list of n values t = <vI' VZ, ... , vn>, where each value Vi' 1 ::; i ::; n, is an element of dom(A) or is a special null value. The ith value in tuple t, which corresponds to the attribute Ai' is referred to as t[AJ (or t[i] if we use the positional notation). The terms relation intension for the schema R and relation extension for a relation state r(R) are also commonly used. Figure 5.1 shows an example of a STUDENT relation, which corresponds to the STUDENT schema just specified. Each tuple in the relation represents a particular student entity. We Let R(A1, A2) be a relation schema: Let dom(A1) = {0,1} Let dom(A2) = {a,b,c} Then: dom(A1) X dom(A2) is all possible combinations: {<0,a> , <0,b> , <0,c>, <1,a>, <1,b>, <1,c> } The relation state r(R) dom(A1) X dom(A2) FORMAL DEFINITION Formally, Given R(A1, A2, .........., An)
4.
4 Definition Summary Informal TermsFormal Terms Table Relation Column Header Attribute All possible Column Values Domain Row Tuple Table Definition Schema of a Relation Populated Table State of the Relation Characteristics Of Relations Ordering of tuples in a relation r(R): The tuples are not considered to be ordered, even though they appear to be in the tabular form. Ordering of attributes in a relation schema R (and of values within each tuple): We will consider the attributes in R(A1, A2, ..., An) and the values in t=<v1, v2, ..., vn> to be ordered . Values in a tuple: All values are considered atomic (indivisible). A special null value is used to represent values that are unknown or inapplicable to certain tuples. Same state as previous Figure (but with different order of tuples)
5.
5 Values andNULLs in tuples Each value in a tuple is atomic Flat relational model • Composite and multivalued attributes not allowed • First normal form assumption Multivalued attributes • Must be represented by separate relations Composite attributes • Represented only by simple component attributes in basic relational model NULL values Represent the values of attributes that may be unknown or may not apply to a tuple Meanings for NULL values • Value unknown • Value exists but is not available • Attribute does not apply to this tuple (also known as value undefined) Interpretation (meaning) of a relation Assertion • Each tuple in the relation is a fact or a particular instance of the assertion Predicate • Values in each tuple interpreted as values that satisfy predicate Relational keys There are two kinds of keys in relations. The first are identifying keys: the primary key is the main concept, while two other keys – super key and candidate key – are related concepts. The second kind is the foreign key. Identifying Keys Super Keys A super key is a set of one or more attributes whose values can be used to uniquely identify a tuple within a relation. A relation may have more than one super key, but it always has at least one: the set of all attributes that make up the relation. Candidate Keys A candidate key is a super key that is minimal; that is, there is no proper subset that is itself a superkey. A relation may have more than one candidate key, and the different candidate keys
6.
6 may have adifferent number of attributes. In other words, you should not interpret 'minimal' to mean the super key with the fewest attributes. A candidate key has two properties: (i) in each tuple of R, the values of K uniquely identify that tuple (uniqueness) (ii) no proper subset of K has the uniqueness property (irreducibility). Primary Key The primary key of a relation is a candidate key especially selected to be the key for the relation. In other words, it is a choice, and there can be only one candidate key designated to be the primary key. Relationship between identity keys The relationship between keys: Superkey ⊇ Candidate Key ⊇ Primary Key Foreign keys The attribute(s) within one relation that matches a candidate key of another relation. A relation may have several foreign keys, associated with different target relations. Relational Model Notations: A relation schema R of degree n is denoted by R(A1, A2, . . ., An). The letters Q, R, S denote relation names. The letters q, r, s denote relation states. The letters t, u, v denote tuples. In general, the name of a relation schema such as STUDENT also indicates the current set of tuples in that relation—the current relation state—whereas STUDENT(Name, SSN, . . .) refers only to the relation schema. An attribute A can be qualified with the relation name R to which it belongs by using the dot notation R.A—for example, STUDENT.Name or STUDENT.Age. This is because the same name may be used for two attributes in different relations. However, all attribute names in a particular relation must be distinct. An n-tuple t in a relation r(R) is denoted by t = <v1, v2, . . ., vn>, where vi is the value corresponding to attribute Ai. The following notation refers to component values of tuples: Both t[Ai] and t.Ai refer to the value vi in t for attribute Ai. Both t[Au, Aw, . . ., Az] and t.(Au, Aw, . . ., Az), where Au, Aw, . . ., Az is a list of attributes from R, refer to the subtuple of values <vu, vw, . . ., vz> from t corresponding to the attributes specified in the list. 5.2 RELATIONAL MODEL CONSTRAINTS AND RELATIONAL DATABASE SCHEMAS
7.
7 Constraints: Constraints areconditions that must hold on all valid relation instances. Restrictions on the actual values in a database state. Derived from the rules in the miniworld that the database represents. Constraints on databases can generally be divided into three main categories: Constraints that are inherent in the data model. We call these inherent model based or implicit constraints. Constraints that can be directly expressed in the schemas of the data model, typically by specifying them in the DDL. We call these schema-based constraints or explicit constraints. Constraints that cannot be directly expressed in the schemas of the data model, and hence must be expressed and enforced by the application programs. We call these application-based constraints or semantic constraints or business rule. 5.2.1 Domain constraints Domain constraints specify that within each tuple, the value of each attribute A must be an atomic value from the domain dom(A). The data types associated with domains typically include: Numeric data types for integers and real numbers Characters Booleans Fixed-length strings Variable-length strings Date, time, timestamp Money Other special data types These various data types are offered by SQL-99. 5.2.2. Key Constraints and Constraints on Null Values Superkey of R: A set of attributes, SK, of R such that no two tuples in any valid relational instance, r( R), will have the same value for SK. Therefore, for any two distinct tuples, t1 and t2 in r( R), t1[ SK] != t2[SK]. Key of R: A minimal superkey. That is, a superkey, K, of R such that the removal of ANY attribute from K will result in a set of attributes that are not a superkey. Example: The CAR relation schema: CAR(State, Reg#, SerialNo, Make, Model, Year) has two keys Key1 = {State, Reg#}, Key2 = {SerialNo}, which are also superkeys. {SerialNo, Make} is a superkey but not a key. If a relation has several candidate keys, one is chosen arbitrarily to be the primary key. The primary key attributes are underlined. 5.2.3 Relational Databases and Relational Database Schema
8.
8 Relational databaseschema S Set of relation schemas S = {R1, R2, ..., Rm} Set of integrity constraints IC Relational database state Set of relation states DB = {r1, r2, ..., rm} Each ri is a state of Ri and such that the ri relation states satisfy integrity constraints specified in IC Invalid state Does not obey all the integrity constraints Valid state Satisfies all the constraints in the defined set of integrity constraints IC In fig 5.5 a relational database schema COMPANY COMPANY={EMPLOYEE, DEPARTMENT, DEPT_LOCATIONS, PROJECT, WORKS_ON, DEPENDENT}.
10 5.2.4 Entity Integrity,Referential Integrity,and Foreign Keys Entity integrity constraint No primary key value can be NULL because the primary key value is used to distinguish each tuples in a relation. Key constraints and Entity integrity constraint are specified on individual relations. Referential integrity constraint A constraint involving two relations (the previous constraints involve a single relation). Used to specify a relationship among tuples in two relations: the referencing relation and the referenced relation. Tuples in the referencing relation R1 have attributes FK (called foreign key attributes) that reference the primary key attributes PK of the referenced relation R2. A tuple t1 in R1 is said to reference a tuple t2 in R2 if t1[FK] = t2[PK]. A referential integrity constraint can be displayed in a relational database schema as a directed arc from R1.FK to R2. Foreign key rules: The attributes in FK have the same domain(s) as the primary key attributes PK. The value in the foreign key column (or columns) FK of the the referencing relation R1 can be either: i) A value of an existing primary key value of the corresponding primary key PK in the referenced relation R2, or ii) A null. In case (ii), the FK in R1 should not be a part of its own primary key.
11.
11 Codd's 12 rules Dr.E. F. Codd's 12 rules for defining a fully relational database A relational database management system (RDBMS) is a database management system (DBMS) that is based on the relational model as introduced by E. F. Codd. Most popular commercial and open source databases currently in use are based on the relational model. A short definition of an RDBMS may be a DBMS in which data is stored in the form of tables and the relationship among the data is also stored in the form of tables. E.F. Codd, the famous mathematician has introduced 12 rules for the relational model for databases commonly known as Codd's rules. The rules mainly define what is required for a DBMS for it to be considered relational, i.e., an RDBMS. The rules and their description are as follows:- Rule 0: Foundation Rule A relational database management system should be capable of using its relational facilities (exclusively) to manage the database. Rule 1: Information Rule All information in the database is to be represented in one and only one way. This is achieved by values in column positions within rows of tables. Rule 2: Guaranteed Access Rule All data must be accessible with no ambiguity, that is, Each and every datum (atomic value) is guaranteed to be logically accessible by resorting to a combination of table name, primary key value and column name. Rule 3: Systematic treatment of null values Null values (distinct from empty character string or a string of blank characters and distinct from zero or any other number) are supported in the fully relational DBMS for representing missing information in a systematic way, independent of data type. Rule 4: Dynamic On-line Catalog Basedon the Relational Model The database description is represented at the logical level in the same way as ordinary data, so authorized users can apply the same relational language to its interrogation as they apply to regular data. The authorized users can access the database structure by using common language i.e. SQL. Rule 5: Comprehensive Data Sublanguage Rule A relational system may support several languages and various modes of terminal use. However, there must be at least one language whose statements are expressible, per some well-defined syntax, as character strings and whose ability to support all of the following is comprehensible:
12.
12 a. data definition b.view definition c. data manipulation (interactive and by program) d. integrity constraints e. authorization f. Transaction boundaries (begin, commit, and rollback). Rule 6: View Updating Rule All views that are theoretically updateable are also updateable by the system. Rule 7: High-level Insert, Update, and Delete The system is able to insert, update and delete operations fully. It can also perform the operations on multiple rows simultaneously. Rule 8: Physical Data Independence Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representation or access methods. Rule 9: Logical Data Independence Application programs and terminal activities remain logically unimpaired when information preserving changes of any kind that theoretically permit unimpairment are made to the base tables. Rule 10: Integrity Independence Integrity constraints specific to a particular relational database must be definable in the relational data sublanguage and storable in the catalog, not in the application programs. Rule 11: Distribution Independence The data manipulation sublanguage of a relational DBMS must enable application programs and terminal activities to remain logically unimpaired whether and whenever data are physically centralized or distributed. Rule 12: Nonsubversion Rule If a relational system has or supports a low-level (single-record-at-a-time) language, that low- level language cannot be used to subvert or bypass the integrity rules or constraints expressed in the higher-level (multiple-records-at-a-time) relational language.