free web hosting | free hosting | Web Hosting | Free Website Submission | shopping cart | php hosting
affordable web hosting | Pets | web page hosting | web hosting | website hosting | web hosting service | web hosting | best web hosting

The Relational Data Model

 

In 1970, E. F. Codd, a member of the IBM Research Laboratory in San Jose, California, published his now famous paper, 'A Relational Model for Large Shared Data Banks' [1] in which were defined a set of abstract principles for database management. A definition of the relational model has been attempted by C. J. Date [2].

During the early 1970's several experimental database management systems, based on the principles outlined in Codd's paper, were developed; commercial products became available in the late 1970's, and proliferated during the 1980's. Oracle, Ingres and DB2 are perhaps the most widely used products in large multi-user environments, while others such as Access, Foxpro and Paradox tend to dominate the PC market.

We shall be illustrating the theory and principles of relation database systems with reference to the Oracle proprietary Relational Database Management Systems (RDBMS) and Structured Query Language (SQL), the most commonly used database language used to manipulate the structure of the database and data held within it

3.1. The relational model.

The relational model consists of three components:
  1. A Structural component -- a set of TABLES (also called RELATIONS).
  2. MANIPULATIVE component consisting of a set of high-level operations which act upon and produce whole tables.
  3. A SET OF RULES for maintaining the INTEGRITY of the database.
The terminology associated with relational database theory originates from the branch of mathematics called set theory although there are widely used synonyms for these precise, mathematical terms.
  • Data structures are composed of two components which represent a model of the situation being considered. These are (i) ENTITY TYPES - i.e. data group types, and (ii) the RELATIONSHIPS between the entity types.
  • Entity types are represented by RELATIONS or BASE TABLES. These two terms are interchangeable - a RELATION is the mathematical term for a TABLE.
  • A base table is loosely defined as an un-ordered collection of zero, one or more TUPLES (ROWS) each of which consists of one or more un-ordered ATTRIBUTES (COLUMNS). All tuples are made up of exactly the same set of attributes.

    For the remainder of this discussion we shall use the more widely known terminology:

    • TABLE for RELATION
    • ROW for TUPLE
    • COLUMN for ATTRIBUTE
  • Each column is drawn from a DOMAIN, that is, a set of values from which the actual values are taken (e.g. a set of car model names). More than one column in a table may draw its values from the same domain.
  • A column entry in any row is SINGLE-VALUED, i.e. it contains exactly one item only (e.g. a surname). Repeating groups, i.e. columns which contain sets of values rather than a single value, not allowed.
  • Each row of a table is uniquely identified by a PRIMARY KEY composed of one or more columns. This implies that a table may not contain duplicate rows.
  • Note that, in general, a column, or group of columns, that uniquely identifies a row in a table is called a CANDIDATE KEY. There may be more than one candidate key for a particular table; one of these will be chosen as the primary key.
  • The ENTITY INTEGRITY RULE of the model states that no component of the primary key may contain a NULL value.
  • A column, or combination of columns, that matches the primary key of another table is called a FOREIGN KEY.
  • The REFERENTIAL INTEGRITY RULE of the model states that, for every foreign key value in a table there must be a corresponding primary key value in another table in the database.
  • Only two kinds of table may be defined in a SQL schema; BASE TABLES and VIEWS. These are called NAMED RELATIONS. Other tables, UNNAMED RELATIONS, may be derived from these by means of relational operations such as JOINS and PROJECTIONS.
  • All tables are LOGICAL ENTITIES. Of these only base tables physically exist in that there exist physically stored records, and possible physical access paths such as indexes, in one or more stored files, that directly support the table in physical storage. Although standard techniques such as HASHING, INDEXING, etc. will be used for implementation efficiency, the user of the data base should require no knowledge of previously defined access paths.
  • Views and the results of all operations on tables - unnamed relations - are tables that exist as LOGICAL DEFINITIONS, in terms of a view definition, or a [SELECT .. FROM .. WHERE .. ORDER BY] sequence.
  • The term UPDATE has two meanings:
    • as a SQL operation in its own right which causes one or more columns in a table to be altered; in this context it will always be shown in upper-case letters - UPDATE.
    • as a generic term used to include the SQL operations INSERT, DELETE and UPDATE; in this context it will always be shown in lower-case letters.

3.2. Normalisation

Tables with the characteristics specified in 3.1. are said to be in 1ST NORMAL FORM (1NF). Their successful use within a database makes further constraints desirable to avoid redundancy, and ensure that subsequent updating cannot cause inconsistency.
  • Since each row is different, one or more columns must be present to identify it uniquely, i.e. to act as the primary key. The definition of further normal forms relies on the primary key and on the notion of FUNCTIONAL DEPENDENCY. The functional dependency of attribute B on attribute A implies that for each value of A there is only one possible value of B.
  • In a 2ND NORMAL FORM (2NF) relation, all non-key attributes are functionally dependent on every attribute value making up the key, not on a subset of those values.
  • 3RD NORMAL FORM (3NF) implies, additionally, that all non-key attributes are dependent only on the key, i.e. there is no TRANSITIVE DEPENDENCY.
Thus, in a well-formed relation:
     THE NON-KEY ATTRIBUTES ARE FUNCTIONALLY DEPENDENT
     ON THE KEY, THE WHOLE KEY, AND NOTHING BUT THE KEY!
The process of normalisation involves first flattening out any tree or network structures to eliminate repeating groups, then splitting the data into separate tables where the above conditions apply. For instance a stock record table, stock_record, may have the columns:
     item_number
     description
     price
     pricecode
     location
     quantity  (repeated for each location)
and typical data:
     123,  1"  nails,  10p,  7,  Birmingham,  30,  Coventry,  25
might be flattened to form the table:
     STOCK_RECORD (item,description,price,pricecode,location,quantity)

     123   1" nails   10   7   Birmingham   30
     123   1" nails   10   7   Coventry     25
which could then be normalised into:
     STOCK_ITEM (item, description, pricecode)
     STOCK_LEVEL (item, location, quantity)
     PRICE_CODE (pricecode, price)
Relationships are represented by matching attribute values in different tables - the occurrence of so-called foreign keys. The relational approach forces the designer to be very strict about the assignment of unique identifiers to table entries, since no other linking method is permitted.

In theory, a fully normalised data base is desirable to avoid duplication of information and the consequent risk of inconsistency after updating. However there is a practical difficulty that normalisation tends to fragment the data, and separate items of information which need to be seen together. The designer may sometimes prefer to have a data base which is not strictly in normal form so as to improve performance in frequently-executed operations.

A useful aspect of the relational model is that the data definition may itself is represented in tabular form; data descriptions may be created and examined using extensions to the relational query language. Subsets of the database are defined as views (virtual tables) produced by relational operations on the stored database. These may select particular rows or columns, or combine existing tables, to give any individual user easy access to an appropriate part of the data, or to enforce security constraints. View tables are not stored permanently, but recreated when required from the latest version of the base tables using the definition of the view.

3.2 Functional Dependencies

 -  Functional dependencies (FDs) are used to specify formal measures  of the "goodness" of relational designs

 -  FDs and keys are used to define normal forms for relations

 -  FDs are constraints that are derived from the meaning  and interrelationships  of the data attributes

 -  A set of attributes X functionally determines  a set of attributes Y if the value of X  determines a unique value for Y

 -  X -> Y holds if whenever two tuples have the same value for X, they must have   he same value for Y

 -  For any two tuples t1 and t2 in any relation instance r(R):

   If  t1[X]=t2[X], then  t1[Y]=t2[Y]

 -  X -> Y in R specifies a constraint  on all relation instances r(R)

 -  Written as X -> Y; can be displayed graphically on a relation schema as in Figures.   ( denoted by the arrow: ® ).

 -  FDs are derived from the real-world constraints on the attributes

Examples of FD constraints:

-  social security number determines employee name i.e SSN -> ENAME

 -  project number determines project name and location  i.e., PNUMBER -> {PNAME, PLOCATION}

 -  employee ssn and project number determines the hours per week that the employee  works on the project   i.e., {SSN, PNUMBER} -> HOURS

 -  An FD is a property of the attributes in the schema R

 -  The constraint must hold on every relation instance  r(R)

 -  If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K]=t2[K])

3.3 Existence Dependencies

An important class of constraints is the existence dependencies. Specifically, , if the existence of entity x depends on the existence of entity y, then x is said to be existence dependent on y.  Operationally, if y is deleted, so is x. Entity y is said to be a dominant entity, and x is said to be a subordinate entity. 

As an illustration, consider the entity set loan and the entity set payment that keeps information about all the payments that were made in connection to  particular loan. the payment entity set is described by the attributes payment-number, payment-date and payment-amount. We form a relationship set loan-payment between these two entity sets which is one-to-many from loan to payment. Every payment entity must be associated with a loan entity. If a loan entity is deleted, then all the payments associated with that loan must be deleted also. In contrast, payment entities can be deleted from the database without affecting any loan. The entity set loan, therefore is dominant and payment is subordinate. in the loan-payment relationship set.

 

3.4. Relational Operations

Given this simple and restricted data structure, it is possible to define some very powerful relational operators which, from the users' point of view, act in parallel' on all entries in a table simultaneously, although their implementation may require conventional processing.

Codd originally defined eight relational operators.

  1. SELECT originally called RESTRICT
  2. PROJECT
  3. JOIN
  4. PRODUCT
  5. UNION
  6. INTERSECT
  7. DIFFERENCE
  8. DIVIDE
The most important of these are (1), (2), (3) and (8), which, together with some other aggregate functions, are powerful enough to answer a wide range of queries. The eight operators will be described as general procedures - i.e. not in the syntax of SQL or any other relational language. The important point is that they define the result required rather than the detailed process of obtaining it - what but not how.

3.4.1. SELECT

RESTRICTS the rows chosen from a table to those entries with specified attribute values.
     SELECT item
     FROM stock_level
     WHERE quantity > 100
constructs a new, logical table - an unnamed relation - with one column per row (i.e. item) containing all rows from stock_level that satisfy the WHERE clause.

3.4.2. PROJECT

Selects rows made up of a sub-set of columns from a table.
     PROJECT stock_item
     OVER item AND description
produces a new logical table where each row contains only two columns - item and description. The new table will only contain distinct rows from stock_item; i.e. any duplicate rows so formed will be eliminated.

3.4.3. JOIN

Associates entries from two tables on the basis of matching column values.
     JOIN stock_item
     WITH stock_level
     OVER item
It is not necessary for there to be a one-to-one relationship between entries in two tables to be joined - entries which do not match anything will be eliminated from the result, and entries from one table which match several entries in the other will be duplicated the required number of times.

The above definition is actually that of a NATURAL or EQUI-JOIN - i.e. a join in which the values of the matching columns are equal. It has become normal to extend join to include other comparison operators such as less than, greater than, etc. It is important to be clear about one's intentions here to obtain meaningful results. Join is obviously a very general operation, and the principal source of processing power in relational systems, but it is also costly in time and space. Because no ordering can be guaranteed, a join may require a comparison of every entry in one table with every entry in the other, and create large intermediate results. That is why users of large-scale data bases, while acknowledging the power and flexibility of the relational approach, were slow to adopt it instead of methods based on more efficient file processing techniques.

3.4.4. PRODUCT.

Builds a relation from two specified relations consisting of all possible combinations of rows, one from each of the two relations.

For example, consider two relations, A and B, consisting of rows:

     A: a     B: d     =>   A product B: a   d
        b        e                       a   e
        c                                b   d
                                         b   e
                                         c   d
                                         c   e

3.4.5. UNION.

Builds a relation consisting of all rows appearing in either or both of the two relations.

For example, consider two relations, A and B, consisting of rows:

     A: a     B: a     =>     A union B: a
        b        e                       b
        c                                c
                                         e

3.4.6. INTERSECT.

Builds a relation consisting of all rows appearing in both of the two relations.

For example, consider two relations, A and B, consisting of rows:

     A: a     B: a     =>     A intersect B: a
        b        e
        c

3.4.7. DIFFERENCE.

Builds a relation consisting of all rows appearing in the first and not in the second of the two relations.

For example, consider two relations, A and B, consisting of rows:

     A: a     B: a     =>  A - B: b   and   B - A: e
        b        e                c
        c

3.4.8. DIVIDE.

Takes two relations, one binary and one unary, and builds a relation consisting of all values of one column of the binary relation that match, in the other column, all values in the unary relation.
     A: a  x     B: x     =>     A divide B: a
        a  y        y
        a  z
        b  x
        c  y
Of the relational operators 3.2.4. to 3.2.8.defined by Codd, the most important is DIVISION. For example, suppose table A contains a list of suppliers and commodities, table B a list of all commodities bought by a company. Dividing A by B produces a table listing suppliers who sell all commodities.

3.5. Relational languages

In the last fifteen to twenty years, research and development on relational systems (much of it undertaken by IBM) has focused on two issues:
  • how to handle large data bases efficiently,
  • how to present non-technical users with easily-understood query languages based on the relational operations.
The purpose of a relational language is to provide a high-level interface allowing users (probably interactively) to interrogate and modify a data base within previously defined constraints. The general tendency has been to move away from the rather mathematical formalisms defined by Codd, towards query languages which are assumed to be easier for the general database user. The one currently most widely used is SQL, which covers relational operations, but has many extra features necessary for practical database application development. Most important of these are the relational functions, which act on single tables and produce aggregate results, also in table form. Those commonly provided are:
  • SUM total values for a particular attribute.
  • COUNT number of entries in a table.
  • AVERAGE SUM / COUNT
  • ORDER sort table entries on attribute values.
  • MAX or MIN select entry having the maximum or minimum value in the specified attribute.
These functions may be generalised to produce a separate result for each distinct value of one or more attributes:
     SUM (quantity IN stock-level OVER item)  find total amount of each
                                              item across all locations.

     COUNT (stock-item OVER location)         find the number of stock
                                              items held in each
                                              location.
It should be possible to combine functions and operators in expressions of arbitrary complexity in a general-purpose relational language. There should be no need for explicit reference to intermediate results or programming control structures. A language with these properties is sometimes described as 'relationally complete'.

References

  1. E.F.Codd 'A Relational Model for Large Shared Data Banks' CACM 13(6) June 1970
  2. C.J.Date An introduction to database systems, 6th edition Addison Wesley 1995