![]() |
The Relational Data ModelIn 1970, E. F. Codd, a member of the IBM Research Laboratory in San Jose, California, published his now famous paper, 'A Relational Model for Large Shared Data Banks' [1] in which were defined a set of abstract principles for database management. A definition of the relational model has been attempted by C. J. Date [2]. During the early 1970's several experimental database management systems, based on the principles outlined in Codd's paper, were developed; commercial products became available in the late 1970's, and proliferated during the 1980's. Oracle, Ingres and DB2 are perhaps the most widely used products in large multi-user environments, while others such as Access, Foxpro and Paradox tend to dominate the PC market. We shall be illustrating the theory and principles of relation database systems with reference to the Oracle proprietary Relational Database Management Systems (RDBMS) and Structured Query Language (SQL), the most commonly used database language used to manipulate the structure of the database and data held within it 3.1. The relational model.The relational model consists of three components:
3.2. NormalisationTables with the characteristics specified in 3.1. are said to be in 1ST NORMAL FORM (1NF). Their successful use within a database makes further constraints desirable to avoid redundancy, and ensure that subsequent updating cannot cause inconsistency.
THE NON-KEY ATTRIBUTES ARE FUNCTIONALLY DEPENDENT ON THE KEY, THE WHOLE KEY, AND NOTHING BUT THE KEY!The process of normalisation involves first flattening out any tree or network structures to eliminate repeating groups, then splitting the data into separate tables where the above conditions apply. For instance a stock record table, stock_record, may have the columns: item_number description price pricecode location quantity (repeated for each location)and typical data: 123, 1" nails, 10p, 7, Birmingham, 30, Coventry, 25might be flattened to form the table: STOCK_RECORD (item,description,price,pricecode,location,quantity) 123 1" nails 10 7 Birmingham 30 123 1" nails 10 7 Coventry 25which could then be normalised into: STOCK_ITEM (item, description, pricecode) STOCK_LEVEL (item, location, quantity) PRICE_CODE (pricecode, price)Relationships are represented by matching attribute values in different tables - the occurrence of so-called foreign keys. The relational approach forces the designer to be very strict about the assignment of unique identifiers to table entries, since no other linking method is permitted. In theory, a fully normalised data base is desirable to avoid duplication of information and the consequent risk of inconsistency after updating. However there is a practical difficulty that normalisation tends to fragment the data, and separate items of information which need to be seen together. The designer may sometimes prefer to have a data base which is not strictly in normal form so as to improve performance in frequently-executed operations. A useful aspect of the relational model is that the data definition may itself is represented in tabular form; data descriptions may be created and examined using extensions to the relational query language. Subsets of the database are defined as views (virtual tables) produced by relational operations on the stored database. These may select particular rows or columns, or combine existing tables, to give any individual user easy access to an appropriate part of the data, or to enforce security constraints. View tables are not stored permanently, but recreated when required from the latest version of the base tables using the definition of the view. 3.2 Functional Dependencies -
Functional
dependencies (FDs) are used to specify formal
measures
of the "goodness" of relational designs -
FDs and keys are used to define normal
forms for relations -
FDs are constraints
that are derived from the meaning
and interrelationships
of the data attributes -
A set of attributes X functionally
determines
a set of attributes Y if the value of X determines
a unique value for Y -
X ->
Y holds if whenever two tuples have the same value for X, they must
have
he same value
for Y -
For any two tuples t1 and t2 in any
relation instance r(R):
If
t1[X]=t2[X], then
t1[Y]=t2[Y] -
X ->
Y in R specifies a constraint
on all relation instances r(R) -
Written as X ->
Y; can be displayed graphically on a relation schema as in
Figures.
( denoted by the
arrow: ®
). -
FDs are derived from the real-world constraints on the attributes Examples
of FD constraints: -
social security number determines employee name -
project number determines project name and location i.e.,
PNUMBER ->
{PNAME, PLOCATION} -
employee ssn and project number determines the hours per week
that the employee works
on the project -
An FD is a property of the attributes in the schema R -
The constraint must hold on every
relation instance
r(R) -
If K is a key of R, then K functionally determines all attributes
in R (since we never have two distinct tuples with t1[K]=t2[K]) 3.3 Existence DependenciesAn important class of constraints is the existence dependencies. Specifically, , if the existence of entity x depends on the existence of entity y, then x is said to be existence dependent on y. Operationally, if y is deleted, so is x. Entity y is said to be a dominant entity, and x is said to be a subordinate entity. As an illustration, consider the entity set loan and the entity set payment that keeps information about all the payments that were made in connection to particular loan. the payment entity set is described by the attributes payment-number, payment-date and payment-amount. We form a relationship set loan-payment between these two entity sets which is one-to-many from loan to payment. Every payment entity must be associated with a loan entity. If a loan entity is deleted, then all the payments associated with that loan must be deleted also. In contrast, payment entities can be deleted from the database without affecting any loan. The entity set loan, therefore is dominant and payment is subordinate. in the loan-payment relationship set.
3.4. Relational OperationsGiven this simple and restricted data structure, it is possible to define some very powerful relational operators which, from the users' point of view, act in parallel' on all entries in a table simultaneously, although their implementation may require conventional processing.Codd originally defined eight relational operators.
3.4.1. SELECTRESTRICTS the rows chosen from a table to those entries with specified attribute values.SELECT item FROM stock_level WHERE quantity > 100constructs a new, logical table - an unnamed relation - with one column per row (i.e. item) containing all rows from stock_level that satisfy the WHERE clause. 3.4.2. PROJECTSelects rows made up of a sub-set of columns from a table.PROJECT stock_item OVER item AND descriptionproduces a new logical table where each row contains only two columns - item and description. The new table will only contain distinct rows from stock_item; i.e. any duplicate rows so formed will be eliminated. 3.4.3. JOINAssociates entries from two tables on the basis of matching column values.JOIN stock_item WITH stock_level OVER itemIt is not necessary for there to be a one-to-one relationship between entries in two tables to be joined - entries which do not match anything will be eliminated from the result, and entries from one table which match several entries in the other will be duplicated the required number of times. The above definition is actually that of a NATURAL or EQUI-JOIN - i.e. a join in which the values of the matching columns are equal. It has become normal to extend join to include other comparison operators such as less than, greater than, etc. It is important to be clear about one's intentions here to obtain meaningful results. Join is obviously a very general operation, and the principal source of processing power in relational systems, but it is also costly in time and space. Because no ordering can be guaranteed, a join may require a comparison of every entry in one table with every entry in the other, and create large intermediate results. That is why users of large-scale data bases, while acknowledging the power and flexibility of the relational approach, were slow to adopt it instead of methods based on more efficient file processing techniques. 3.4.4. PRODUCT.Builds a relation from two specified relations consisting of all possible combinations of rows, one from each of the two relations.For example, consider two relations, A and B, consisting of rows: A: a B: d => A product B: a d b e a e c b d b e c d c e 3.4.5. UNION.Builds a relation consisting of all rows appearing in either or both of the two relations.For example, consider two relations, A and B, consisting of rows: A: a B: a => A union B: a b e b c c e 3.4.6. INTERSECT.Builds a relation consisting of all rows appearing in both of the two relations.For example, consider two relations, A and B, consisting of rows: A: a B: a => A intersect B: a b e c 3.4.7. DIFFERENCE.Builds a relation consisting of all rows appearing in the first and not in the second of the two relations.For example, consider two relations, A and B, consisting of rows: A: a B: a => A - B: b and B - A: e b e c c 3.4.8. DIVIDE.Takes two relations, one binary and one unary, and builds a relation consisting of all values of one column of the binary relation that match, in the other column, all values in the unary relation.A: a x B: x => A divide B: a a y y a z b x c yOf the relational operators 3.2.4. to 3.2.8.defined by Codd, the most important is DIVISION. For example, suppose table A contains a list of suppliers and commodities, table B a list of all commodities bought by a company. Dividing A by B produces a table listing suppliers who sell all commodities. 3.5. Relational languagesIn the last fifteen to twenty years, research and development on relational systems (much of it undertaken by IBM) has focused on two issues:
SUM (quantity IN stock-level OVER item) find total amount of each item across all locations. COUNT (stock-item OVER location) find the number of stock items held in each location.It should be possible to combine functions and operators in expressions of arbitrary complexity in a general-purpose relational language. There should be no need for explicit reference to intermediate results or programming control structures. A language with these properties is sometimes described as 'relationally complete'. References
|