DATABASE MODELING & DESIGN - The Fundamental Principles

The basic component of a file in a file system is a data item, which is the smallest named
Unit of data. A group of related data items treated as a unit by an application is called a
Record. A file is a collection of records of a single type.
In a relation database, a data item is called an attribute, a record is called a row or tuple,
And a file is called a table. A database is a collection of interrelated stored data that
Serves the needs of multiple users within one or more organizations. A database
management system (DBMS) is a generalized software system for manipulating database.
A DBMS supports a logical view (schema,subschema); physical view (access methods,
Data clustering); data definition language; data manipulation language; and important
Utilities such as transaction management and concurrency control, data integrity, crash
Recovery and security. Relational DBMSs provide a much higher degree of data
Independence and also make database conversion and reorganization much easier.

Steps: Database Life Cycle
1. Requirements Analysis
2. Logical Design
a) ER Modeling
b) View Integration
c) Transformation of the ER Model to SQL Tables
d) Normalization of Tables
3. Usage Refinement
4. Data Distribution
5. Local Schema and Physical Design
6. Database Implementation, monitoring and modification.

Basic Objects: Entities, Relationships, Attributes

The basic ER model consists of three classes of objects: Entities, Relationships and
Attributes.
Entities
Entities are the principal data objects about which information is to be collected; they
Usually denote a person, place, thing , or event of informational interrest. A particular
Occurrence of an entity is called an entity instance or sometimes an entity occurrence.
Relationships
Relationships represent real-world associations among one or more entities, and as such
Have no physical or conceptual existence other than that which is inherited from their
Entity associations. A particular occurrence of a relationship is called a relationship
Instance or relationship occurrence. A role is defined as the function an entity plays
In a relationship.
Attributes
Attributes are characteristics of entities that provide descriptive detail about them. A
Particular occurrence of an attribute within an entity or relationship is called an
Attribute value. Attributes of an entity such as Employee may include emp_id,emp_
Name,emp_address,phone_no,fax_no,job_title , and so on.
Transformation Rules and SQL Constructs
The basic transformations can be described in terms of the three types of tables they
Produce:
--An Entity table with the same information content as the original entity
This transformation always occurs for entities with binary relationships that are
Many-to-many, one-to-many on the 'one' (parent) side, or one-to-one on one side;
Entities with binary recursive relationships that are many-to-many; and entities with
Any ternary or higher-degree relationship, or a generalization hierarchy.
--An Entity table with the embedded foreign key of the parent entity
This transformation always occurs for entities with binary relationships that are one-
To-many for the entity on the 'many'(child)side, for one-to-one relationships for one
Of the entities, and for each entity with a binary recursive relationship that is one-to-
One or one-to-many.
--A relationship table with the foreign keys of all the entities in the relationship
This transformation always occurs for relationships that are binary and many-to-
Many, relationships that are binary recursive and many-to-many, and all relationship
That are of ternary or higher degree. This is other most common way CASE tools
Handle relationships in the ER model.
Binary Relationships
A one-to-one binary relationship between two entities - When both entities are
Mandatory, each entity becomes a table and the key of either entity can appear in the
Other entity's table as a foreign key. One of the entities in an optional in the other
Should contain the foreign key of the other entity in its transformed table. When
both entities are optional, either entity can contain the embedded foreign key of the
Other entity, with nulls allowed in the foreign keys.
The one-to-many relationship can be shown as either mandatory or optional on the
'many' side, without affecting the transformation. On the 'one' side it may be either
mandatory or optional. In all the cases the foreign key must appear on the 'many'
side, which represents the child entity, with nulls allowed for foreign keys only in
the optional 'one' case. Foreign key constraints are set according to the specific
meaning of the relationship and may vary from one relationship to another.
The many-to-many relationship, as completely optional,requires a relationship table
with primary keys of both entities. The same transformation applies to either the
optional or mandatory case, including the fact the not null cause must appear for the
Foreign keys in both cases. Foreign key constraints on delete and update must
always be cascade because each entry in the SQL table depends on the current value
or existence of the referenced primary key.
Binary Recursive Relationships
A single entity with a one-to-one relationship implies some form of entity occurance
pairing, as indicated by the relationship name. This pairing may be completely
optional, completely mandatory, or neither. In all of these cases, the pairing entity
key appears as a foreign key in the resulting table. The two key attributes are taken
from the same domain but are given different names to designate their unique use.
The one-to-many relationship requires a foreign key in the entity table. The foreign
key constraints can vary with the particular relationship. The many-to-many relation
-ship uses a relationship table; it could also be defined as mandatory .
Multiple Relationships
Multiple relationships among n entities are always considered to be completely
independent. One-to-one or one-to-many binary or binary recursive relationships
that result in entity tables that are either equivalent or differ only in the addition
of a foreign key can simply be merged into a single entity table containing all
the foreign keys. Many-to-many or ternary relationships that result in
relationship tables tend to be unique and cannot be merged.

Transformation Steps
The list that follows summarizes the basic transformation steps from an ER
Diagram to SQL.
1) Transform each entity into a table containing the key and non-key attributes
of the entity.
2) Transform every many-to-many binary or binary recursive relationship into a
Relationship table with the keys of the entities and the attributes of the
relationship.
3 ) Transform every ternary or higher level n-ary relationship into a relationship
table.

NORMALIZATION

Fundamentals
Relational database tables, whether they are derived from ER models or from some
other design method, sometimes suffer from some rather serious problems in terms
of performance, integrity, and maintainability. For example, when the entire data-
base is defined as a single large table, it can result in a large amount of redundant
data and lengthy searches for just a small number of target rows. It can also result
in long and expensive updates, and deletions in particular can result in the
elimination of useful data as an unwanted side effect.
For example, products, salespersons, customers, and orders are all stored in a single
Table called 'sales'. In this table, certain product and customer information is stored
redundantly, wasting storage space. Also, updates such as changing the address of
the customer would require changing many rows. Finally, deleting the only out-
standing order by a valued customer also deletes the only copy of his address and
credit rating as a side effect. Such information may be difficult ( or sometimes
impossible) to recover.
If we had a method of breaking up such a large table into smaller tables so that these
types of problems would be eliminated, the database would be much more efficient
and reliable. Classes of relational database schemes or table definitions, called
normal forms, are commonly used to accomplish this goal. The creation of a
normal form database table is called normalization. It is accomplished by analyzing
the interdependencies among individual attributes associated with those tables and
taking projections (subsets of columns) of larger tables to form smaller ones.

First Normal Form
A Table is in first normal form (1NF) if and only if all columns contain only atomic
values; that is, there are no repeating groups (columns) within a row.
A repeating group occurs in a relational table when a multivalued attribute is allowed
to have more than one value represented within a single row. When this happen, rows
must either be defined as variable length or defined with enough attribute positions to
accommodate the maximum possible set of values.
The advantages of 1NF over unnormalized tables are its representational simplicity
and the ease with which one can develop a query language for it. The disadvantage is
the requirement of duplicate data.

Super keys, Candidate keys, and Primary keys
A table in 1NF often suffers from data duplication, update performance, and update
Integrity problems. A superkey is a set of one or more attributes which, taken
collectively, allows us to identify uniquely an entity or table. Any subset of the
attributes of a superkey that is also a superkey and not reducible to another superkey
is called a candidate key. A primary key is selected arbitrarily from the set of
candidate keys to be used in an index for that table.

Second Normal Form
A Table is in second normal form (2NF) if and only if it is in 1NF and every nonkey
attribute is fully dependent on the primary key. An attribute is fully dependent on the
primary key if it is on the right side of an FD for which the left side is either the
primary key itself or something that can be derived from the primary key using the
transitivity of FDs.
The property of one or more attributes that uniquely determine the value of one or
more other attributes is called functional dependence.
An example of a transitive FD in report(table) is the following:
report_no -> dept_no
dept_no -> dept_name
Therefore we can derive the FD ( report_no -> dept_name ), since dept_name is
Transitively dependent on report_no.

Third Normal Form
A Table is in third normal form (3NF) if and only if for every nontrivial functional
dependency X->A, where X and A are either simple or composite attributes, one of
two conditions must hold. Either attribute X is a superkey, or attribute A is a
member of a candidate key. If attribute A is a member of a candidate key, A is
called a prime attribute.
Note: A trivial FD is of the form YZ->Z.

Boyce-Codd Normal Form
Third normal form, which eliminates most of the anomalies known in databases
today, is the most common standard for normalization in commercial databases
and CASE tools. The few remaining anomalies can be eliminated by the Boyce-
Codd normal form and higher normal forms.
"A table R is in Boyce_Codd normal form (BCNF) if for every nontrivial FD X->A,
X is a superkey".
BCNF is a stronger form of normalization than 3NF because it eliminates the second
condition for 3NF , which allowed the right side of the FD to be a prime attribute.
Thus, every left side of an FD in a table must be a superkey. Every table that is
BCNF is also 3NF, 2NF and 1NF, by the previous definitions.
A loseless decomposition requires that when you decompose the table into two
smaller tables by projecting the original table over two overlapping subsets of the
scheme, the natural join of those subset tables must result in the original table
without any extra unwanted rows. The simplest way to avoid the delete anomaly
for this kind of situation is to create a separate table for each of the two assertions.
These two tables are partially redundant, enough so as to avoid the delete anomaly.
This decomposition is lossless ( trivially ) and preserves functional dependencies,
but it also degrades update performance due to redunancy, and necessitates
additional storage space. The trade-off is often worth it bacause the delete
anomaly is avoided.

Fourth and Fifth Normal Forms
Normal forms up to BCNF were defined solely on FDs, and for most database
practitioners, either 3NF or BCNF is a sufficient level of normalization. However,
there are in fact two more normal forms that are needed to eliminate the rest of the
Currently known anomalies. (constraints: Multivalued dependencies
and join dependencies).

In a multivalued dependency (MVD), X->>Y holds on table R with table schemeRS
If , whenever a valid instance of table R(X,Y,Z) contains a pair of rows that contain
duplicate values of X , then the instance also contains the pair of rows obtained by
interchanging the Y values in the original pair. This includes situations where only
pairs of rows exist. Note that X and Y may contain either single or composite
attributes.
An MVD X->>Y is trivial if Y is a subset of X, or if X union Y = RS . Finally, an
FD implies an MVD , which implies that a single row with a given value of X is also
An MVD , albeit a trivial form.

Fourth Normal Form
A Table R is in fourth normal form (4NF) if and only if it is in BCNF and, when-
ever there exists an MVD in R ( say X->>Y ), at least one of the following holds:
The MVD is trivial or X is a superkey for R.
The goal of fourth normal form is to eliminate nontrivial MVDs from a table by
projecting them onto separate smaller tables and thus eliminate the update anomalies
associated with the MVDs. Decomposition of a table into 4NF tables results in less
data redundancy.

Fifth Normal Form
A Table is in fifth normal form (5NF) if it cannot have a lossless decomposition by
the projection operation into any number of smaller tables.
A lossless decomposition of a table implied that it can be decomposed by two or
more projections, followed by a natural join of those projections (in any order)
that results in the original table, without any spurious or missing rows. The general
lossless decomposition constraint, involving any number of projections, is also known
as a join dependency (JD). In other words, a table is not in 5NF if it can be lossless
decomposed/joined via some n1 projections.
A lossless decomposition of two projections is equivalent to an MVD, and the table
may be in BCNF ( if the MVD is trivial) or 4NF ( if the MVD is trivial or the left
side is a superkey). A lossless decomposition of three or more projections is
equivalent to a JD, and the table is only in 4NF. Thus, an MVD is a special case of
A JD where the number of projections is two.
A table may have constraints that are FDs, MVDs, and JDs. An MVD is a special
case of a JD. In order to determine the level of normalization of the table, analyze
the FDs first to determine normalization through BCNF ; then analyze the MVDs to
determine which BCNF tables are also 4NF; and, finally, analyze the JDs to find
which 4NF tables are also 5NF.
A many-to-many-to-many ternary relationship is:
1. BCNF if it can be replaced by two binary relationships.
2. 4NF if it can only be replaced by three binary relationships, and
3. 5NF if it cannot be replaced in any way (and thus is a true ternary
relationship).