The basic component of a file in a file system is a data item, which is the smallest named Unit of data. A group of related data items treated as a unit by an application is called a Record. A file is a collection of records of a single type. In a relation database, a data item is called an attribute, a record is called a row or tuple, And a file is called a table. A database is a collection of interrelated stored data that Serves the needs of multiple users within one or more organizations. A database management system (DBMS) is a generalized software system for manipulating database. A DBMS supports a logical view (schema,subschema); physical view (access methods, Data clustering); data definition language; data manipulation language; and important Utilities such as transaction management and concurrency control, data integrity, crash Recovery and security. Relational DBMSs provide a much higher degree of data Independence and also make database conversion and reorganization much easier. Steps: Database Life Cycle 1. Requirements Analysis 2. Logical Design a) ER Modeling b) View Integration c) Transformation of the ER Model to SQL Tables d) Normalization of Tables 3. Usage Refinement 4. Data Distribution 5. Local Schema and Physical Design 6. Database Implementation, monitoring and modification. Basic Objects: Entities, Relationships, Attributes The basic ER model consists of three classes of objects: Entities, Relationships and Attributes. Entities Entities are the principal data objects about which information is to be collected; they Usually denote a person, place, thing , or event of informational interrest. A particular Occurrence of an entity is called an entity instance or sometimes an entity occurrence. Relationships Relationships represent real-world associations among one or more entities, and as such Have no physical or conceptual existence other than that which is inherited from their Entity associations. A particular occurrence of a relationship is called a relationship Instance or relationship occurrence. A role is defined as the function an entity plays In a relationship. Attributes Attributes are characteristics of entities that provide descriptive detail about them. A Particular occurrence of an attribute within an entity or relationship is called an Attribute value. Attributes of an entity such as Employee may include emp_id,emp_ Name,emp_address,phone_no,fax_no,job_title , and so on. Transformation Rules and SQL Constructs The basic transformations can be described in terms of the three types of tables they Produce: --An Entity table with the same information content as the original entity This transformation always occurs for entities with binary relationships that are Many-to-many, one-to-many on the 'one' (parent) side, or one-to-one on one side; Entities with binary recursive relationships that are many-to-many; and entities with Any ternary or higher-degree relationship, or a generalization hierarchy. --An Entity table with the embedded foreign key of the parent entity This transformation always occurs for entities with binary relationships that are one- To-many for the entity on the 'many'(child)side, for one-to-one relationships for one Of the entities, and for each entity with a binary recursive relationship that is one-to- One or one-to-many. --A relationship table with the foreign keys of all the entities in the relationship This transformation always occurs for relationships that are binary and many-to- Many, relationships that are binary recursive and many-to-many, and all relationship That are of ternary or higher degree. This is other most common way CASE tools Handle relationships in the ER model. Binary Relationships A one-to-one binary relationship between two entities - When both entities are Mandatory, each entity becomes a table and the key of either entity can appear in the Other entity's table as a foreign key. One of the entities in an optional in the other Should contain the foreign key of the other entity in its transformed table. When both entities are optional, either entity can contain the embedded foreign key of the Other entity, with nulls allowed in the foreign keys. The one-to-many relationship can be shown as either mandatory or optional on the 'many' side, without affecting the transformation. On the 'one' side it may be either mandatory or optional. In all the cases the foreign key must appear on the 'many' side, which represents the child entity, with nulls allowed for foreign keys only in the optional 'one' case. Foreign key constraints are set according to the specific meaning of the relationship and may vary from one relationship to another. The many-to-many relationship, as completely optional,requires a relationship table with primary keys of both entities. The same transformation applies to either the optional or mandatory case, including the fact the not null cause must appear for the Foreign keys in both cases. Foreign key constraints on delete and update must always be cascade because each entry in the SQL table depends on the current value or existence of the referenced primary key. Binary Recursive Relationships A single entity with a one-to-one relationship implies some form of entity occurance pairing, as indicated by the relationship name. This pairing may be completely optional, completely mandatory, or neither. In all of these cases, the pairing entity key appears as a foreign key in the resulting table. The two key attributes are taken from the same domain but are given different names to designate their unique use. The one-to-many relationship requires a foreign key in the entity table. The foreign key constraints can vary with the particular relationship. The many-to-many relation -ship uses a relationship table; it could also be defined as mandatory . Multiple Relationships Multiple relationships among n entities are always considered to be completely independent. One-to-one or one-to-many binary or binary recursive relationships that result in entity tables that are either equivalent or differ only in the addition of a foreign key can simply be merged into a single entity table containing all the foreign keys. Many-to-many or ternary relationships that result in relationship tables tend to be unique and cannot be merged. Transformation Steps The list that follows summarizes the basic transformation steps from an ER Diagram to SQL. 1) Transform each entity into a table containing the key and non-key attributes of the entity. 2) Transform every many-to-many binary or binary recursive relationship into a Relationship table with the keys of the entities and the attributes of the relationship. 3 ) Transform every ternary or higher level n-ary relationship into a relationship table. NORMALIZATION Fundamentals Relational database tables, whether they are derived from ER models or from some other design method, sometimes suffer from some rather serious problems in terms of performance, integrity, and maintainability. For example, when the entire data- base is defined as a single large table, it can result in a large amount of redundant data and lengthy searches for just a small number of target rows. It can also result in long and expensive updates, and deletions in particular can result in the elimination of useful data as an unwanted side effect. For example, products, salespersons, customers, and orders are all stored in a single Table called 'sales'. In this table, certain product and customer information is stored redundantly, wasting storage space. Also, updates such as changing the address of the customer would require changing many rows. Finally, deleting the only out- standing order by a valued customer also deletes the only copy of his address and credit rating as a side effect. Such information may be difficult ( or sometimes impossible) to recover. If we had a method of breaking up such a large table into smaller tables so that these types of problems would be eliminated, the database would be much more efficient and reliable. Classes of relational database schemes or table definitions, called normal forms, are commonly used to accomplish this goal. The creation of a normal form database table is called normalization. It is accomplished by analyzing the interdependencies among individual attributes associated with those tables and taking projections (subsets of columns) of larger tables to form smaller ones. First Normal Form A Table is in first normal form (1NF) if and only if all columns contain only atomic values; that is, there are no repeating groups (columns) within a row. A repeating group occurs in a relational table when a multivalued attribute is allowed to have more than one value represented within a single row. When this happen, rows must either be defined as variable length or defined with enough attribute positions to accommodate the maximum possible set of values. The advantages of 1NF over unnormalized tables are its representational simplicity and the ease with which one can develop a query language for it. The disadvantage is the requirement of duplicate data. Super keys, Candidate keys, and Primary keys A table in 1NF often suffers from data duplication, update performance, and update Integrity problems. A superkey is a set of one or more attributes which, taken collectively, allows us to identify uniquely an entity or table. Any subset of the attributes of a superkey that is also a superkey and not reducible to another superkey is called a candidate key. A primary key is selected arbitrarily from the set of candidate keys to be used in an index for that table. Second Normal Form A Table is in second normal form (2NF) if and only if it is in 1NF and every nonkey attribute is fully dependent on the primary key. An attribute is fully dependent on the primary key if it is on the right side of an FD for which the left side is either the primary key itself or something that can be derived from the primary key using the transitivity of FDs. The property of one or more attributes that uniquely determine the value of one or more other attributes is called functional dependence. An example of a transitive FD in report(table) is the following: report_no -> dept_no dept_no -> dept_name Therefore we can derive the FD ( report_no -> dept_name ), since dept_name is Transitively dependent on report_no. Third Normal Form A Table is in third normal form (3NF) if and only if for every nontrivial functional dependency X->A, where X and A are either simple or composite attributes, one of two conditions must hold. Either attribute X is a superkey, or attribute A is a member of a candidate key. If attribute A is a member of a candidate key, A is called a prime attribute. Note: A trivial FD is of the form YZ->Z. Boyce-Codd Normal Form Third normal form, which eliminates most of the anomalies known in databases today, is the most common standard for normalization in commercial databases and CASE tools. The few remaining anomalies can be eliminated by the Boyce- Codd normal form and higher normal forms. "A table R is in Boyce_Codd normal form (BCNF) if for every nontrivial FD X->A, X is a superkey". BCNF is a stronger form of normalization than 3NF because it eliminates the second condition for 3NF , which allowed the right side of the FD to be a prime attribute. Thus, every left side of an FD in a table must be a superkey. Every table that is BCNF is also 3NF, 2NF and 1NF, by the previous definitions. A loseless decomposition requires that when you decompose the table into two smaller tables by projecting the original table over two overlapping subsets of the scheme, the natural join of those subset tables must result in the original table without any extra unwanted rows. The simplest way to avoid the delete anomaly for this kind of situation is to create a separate table for each of the two assertions. These two tables are partially redundant, enough so as to avoid the delete anomaly. This decomposition is lossless ( trivially ) and preserves functional dependencies, but it also degrades update performance due to redunancy, and necessitates additional storage space. The trade-off is often worth it bacause the delete anomaly is avoided. Fourth and Fifth Normal Forms Normal forms up to BCNF were defined solely on FDs, and for most database practitioners, either 3NF or BCNF is a sufficient level of normalization. However, there are in fact two more normal forms that are needed to eliminate the rest of the Currently known anomalies. (constraints: Multivalued dependencies and join dependencies). In a multivalued dependency (MVD), X->>Y holds on table R with table schemeRS If , whenever a valid instance of table R(X,Y,Z) contains a pair of rows that contain duplicate values of X , then the instance also contains the pair of rows obtained by interchanging the Y values in the original pair. This includes situations where only pairs of rows exist. Note that X and Y may contain either single or composite attributes. An MVD X->>Y is trivial if Y is a subset of X, or if X union Y = RS . Finally, an FD implies an MVD , which implies that a single row with a given value of X is also An MVD , albeit a trivial form. Fourth Normal Form A Table R is in fourth normal form (4NF) if and only if it is in BCNF and, when- ever there exists an MVD in R ( say X->>Y ), at least one of the following holds: The MVD is trivial or X is a superkey for R. The goal of fourth normal form is to eliminate nontrivial MVDs from a table by projecting them onto separate smaller tables and thus eliminate the update anomalies associated with the MVDs. Decomposition of a table into 4NF tables results in less data redundancy. Fifth Normal Form A Table is in fifth normal form (5NF) if it cannot have a lossless decomposition by the projection operation into any number of smaller tables. A lossless decomposition of a table implied that it can be decomposed by two or more projections, followed by a natural join of those projections (in any order) that results in the original table, without any spurious or missing rows. The general lossless decomposition constraint, involving any number of projections, is also known as a join dependency (JD). In other words, a table is not in 5NF if it can be lossless decomposed/joined via some n1 projections. A lossless decomposition of two projections is equivalent to an MVD, and the table may be in BCNF ( if the MVD is trivial) or 4NF ( if the MVD is trivial or the left side is a superkey). A lossless decomposition of three or more projections is equivalent to a JD, and the table is only in 4NF. Thus, an MVD is a special case of A JD where the number of projections is two. A table may have constraints that are FDs, MVDs, and JDs. An MVD is a special case of a JD. In order to determine the level of normalization of the table, analyze the FDs first to determine normalization through BCNF ; then analyze the MVDs to determine which BCNF tables are also 4NF; and, finally, analyze the JDs to find which 4NF tables are also 5NF. A many-to-many-to-many ternary relationship is: 1. BCNF if it can be replaced by two binary relationships. 2. 4NF if it can only be replaced by three binary relationships, and 3. 5NF if it cannot be replaced in any way (and thus is a true ternary relationship).