What is a
Relational Database?
What is a
Database Management System (DBMS)?
Conceptual Entity vs. Physical Entity
Generalization (SuperType/SubType)
Transforming the Conceptual Data Model into Relations
Here are some examples of data.
The list above is a set of concrete values. They are not meaningful unless we know what they are values of. We can guess at some of the values: Mary Smith is likely the name of a person; 19 Oak is probably a street address; 75 could be just about anything – an age, degrees, a golf score, etc.
In order for these values to be meaningful, there must be a meaning attached to them. In effect we need data about the data, which is called metadata. See the table below.
METADATA |
DATA |
name of person |
Mary Smith |
age |
75 |
number of grandchildren |
2 |
telephone number |
9164537834 |
college degree |
BS |
street address |
19 Oak |
name of cat |
Jack |
color of eyes |
blue |
The metadata gives meaning to the data,
but questions still remain – whose age,
who has 2 grandchildren, etc. The
data need to be applied to some entity, in this case a particular
person – Mary Smith. The data above
refer to a single entity; a set of data values treated as a unit is
called a record.
Records refer to entities. An entity may be a person, a product, an event, a corporation, a disease, etc. In general, an entity is anything that can be described by a set of data values. Usually, a record will contain a data value(s) that uniquely identifies the entity – it is commonly called the primary key..
A group of similar records treated as a unit is called a file. In this example it would be a file of persons or person file. Person file may be too general; for example, employee file or customer file is more specific. Generally, the file name indicates the type of entity that is being referenced – employee file, payroll file, student file, inventory file, sales file, accounts payable file, diseases file, etc.
A group of files treated as a unit is called a database. A database acts as an integrated set of files which is well suited to perform the following:
Information is the response given to a question (query); it usually involves the processing of data.
For example:
QUERY:
List the names of all employees with a BS degree.
RESPONSE: Each record in the file is
examined and those records with a degree of BS are selected and the names
displayed.
NOTE: Only data that is used becomes information, data that is never used is wasted space.
When records are stored they require some type of space – paper, computer hard drive, etc. A piece of space is allocated for each unit of data referenced by the metadata of a given record – this piece of space is usually referred to as a field. It is not uncommon to refer to the metadata of a record as the record’s fields. For example, a record may have a name field, an address field, etc. Defining a file’s structure (design) usually means defining the fields of the file’s record - each record in a given file has the samee format. If a file were stored on paper, the fields may appear as column headings (see below). A column of space may be designated to contain name values and another column may be designated for address values.
NAME |
ADDRESS |
Mary Smith |
|
John Doe |
|
Emma Quinn |
|
In C++, a record’s fields are defined as follows:
struct recordType
{
string
name;
string
address;
};
NOTE: When defining fields (space) for computer storage, it is also necessary to indicate the data type of each field. The field names and the data type are metadata. The field is the smallest unit of meaning that is given a name.
There are a variety of database types – one type is the widely used relational database. A relational database represents files in the form of tables, thus the files of a relational database are sometimes called tables although relation is the technical term for a file in a relational database. A relation may be characterized as a specific type of table – all relations are tables, but not all tables are relations. A table becomes a relation when it meets the following criteria:
A DBMS is software that manages and controls metadata/data. It is the successor to the traditional file system.
Below are two examples of reading and displaying a file which contrast the traditional file system with a DBMS. Both methods will process the following file.
NAME |
JOB TITLE |
Smith |
programmer |
Doe |
electrician |
Quinn |
doctor |
Example 1: Tradtional file system using C++ to read and display a file.
#include<iostream>
#include<string>
#include <fstream>
using namespace std;
struct recType
{
string
name;
string
jobTitle;
};
void main()
{
recType myRec;
ifstream inFile("nameFile.txt");
while
(inFile>>myRec.name>>myRec.jobTitle)
{
cout<<myRec.name<<"
"<<myRec.jobTitle<<endl;
}
}
Example 2: A DBMS (Microsoft’s Access) using SQL (a DBMS language) to read and display a file.
select name,jobTitle
from nameFile;
Notice the absence of detail in the SQL “program.” The SQL program states WHAT is to be done and the DBMS supplies the details of HOW to perform the task. The C++ program must specify how to perform the task. The diagram below illustrates the difference.
TRADITIONAL FILE
SYSTEM
DBMS
The notable difference is that the user program (C++) in the traditional file system interfaces directly with the data; in the DBMS the user program (SQL) interfaces with the DBMS which in turn interfaces with the data. In the DBMS, the file structure is stored once, and as a SQL program references a file, the DBMS accesses the file’s structure. In other words, in a DBMS, a file’s structure is centrally located, while in the traditional file system the file structure is distributed across each program that uses the file – decentralized. Each C++ program that uses the nameFile must include the file’s structure; consequently, a change in the file structure requires a change in the program using that file – the traditional file system has no program/data independence as does the DBMS. If one hundred C++ programs use the nameFile, a change in the file’s structure would require changing and recompiling each of the one hundred C++ programs!
For example: Suppose a salary field were to be added to the nameFile Each C++ program using the nameFile would require adding the new field to its struct, even if the program did not use the salary field! On the other hand, the DBMS makes one change to the centrally located, single copy of the file’s structure. Each SQL program using the nameFile remains unaffected by the file structure change – this is program/data independence.
The design cycle consists of three main steps that produce the following:
I Conceptual Data Model;
II Logical Data Model;
III Physical Database.
A conceptual data model is a very general view (abstraction) of:
1. Entities;
2. Relationships between entities.
The conceptual data model is expressed in a graphical language, consisting mainly of blocks, ovals, and lines. The following is a basic lexicon:
= a reference to an entity class.
= entity attribute
= relationship
between entity classes.
* = many
1 = one
Example 3: A company wishes to develop a database involving its salespersons and its customers. The company’s Business Rules allow salespersons to have many customers, but each customer is assigned to only one salesperson.
Step I: create the conceptual data model,
SALESPERSON CUSTOMER
1 has
*
Think of the model above as an expression of the company’s business rules using a graphical language instead of English. The model above says the following: Any given saleperson has the possibility of many customers, and any given customer has the possibility of only one salesperson. The same statement can be summarized as follows: There is a “one to many” relationship between salesperson and customer.
Each entity can have one or more attributes, which describe the entity. It is often the case that an entity has an attribute(s) that is used to uniquely identify each member of the entity class – this attribute(s) is called the primary key. By definition, no two individual entities will have the same value for the primary key. For example, the salesperson ID is the primary key, which means that no two salespersons will have the same ID number. The customer acct# is the primary key for the customer entity. There are cases where it will require more than one attribute to serve as the primary key. It is customary to underline the primary key.
SUMMARY
A.) Each block
represents a class of entities – e.g., the SALESPERSON block represents the class of salesperson entities
– a member of this class is an individual salesperson such as Mary Smith.
B.) The following are called cardinalities.
1_________* means “one to many” relationship;
*_________* means “many to many” relationship;
1_________1 means “one to one” relationship.
C.) means attribute - e.g., the customer’s NAME, the salesperson’s ID, etc.
C.) A primary key is one or more attributes that serve to uniquely identify each member of a class of entities
Step II: Develop the logical
data model
In this example, the step will consist of transforming the conceptual data model into a relational model – the relational model is one of many types of logical data models.
Here is the process.
A.) Transform each block (entity) into a relation.
B.) Transform
each relationship into its relational
counterpart.
Instead of a relation, we will be using relational schema, which consists of the name of the relation followed by the attributes enclosed in parentheses.
Transform each block (entity) into a relational schema
Salesperson(salespersonId, salespersonName)
Customer(customerAcct#, customerName, customerAddress)
Transform the “one to many” relationship into its relational
counterpart
The way the relational model expresses relationships, such as “one to many”, is by the presence of a common attribute in each of the connected (related) relations. In this case, the primary key salespersonId will be added to the customer relation.
Customer(customerAcct#, salespersonId, customerName, customerAddress)
Rule for tranforming “one to many” relationship: add the primary key of the “one” relation to the “many” relation.
The salespersonId attribute in the customer relation is called the foreign key, indicating that this attribute refers to the primary key of another relation, and is used to relate (integrate) the two relations.
Step III: Implement the physical database.
Since the logical data model is relational, a relational DBMS must be chosen - in this case Microsoft’s Access DBMS is chosen.
Here is the process:
A.) Convert each relational schema into a table.
1. Define the table’s metadata( (column names ,data type)
2. Add the actual data values for each record of the table.
B.) Specify the relationships between tables.
salesperson |
|
salespersonId |
name |
001 |
smith |
002 |
jones |
003 |
adams |
Repeat the above for the customer
table.
NOTE: ∞ means many as does *
Example 4: A company wishes to develop a database involving the products it assembles and the types of parts that constitute each product. A product is made up of many different part types, and each part type may appear in different products.
Step I: Create the conceptual data model.
PRODUCT
* contains
*
The model above says that a single product contains
many parts, and a single part type is used in many products;
that is, there is a “ many to many” relationship
between product and part type.
The primary keys are ID for the product entity and PARTNUM
for the part type entity.
Step II: Develop the logical
data model.
Here is the process.
A.) Transform each block (entity) into a relation.
B.) Transform
each relationship into it’s relational counterpart.
Instead of a relation, we will be using relational schema, which consists of the name of the relation followed by the attributes enclosed in parentheses.
Transform each block (entity) into a relational schema
product(productId, productDescription)
part(partnum, partDescription, supplier)
Transform the “many to many” relationship into its relational
counterpart
A “many to many” relationship is translated by creating an additional relation that contains, at minimum, the primary key of each relation in the relationship.
product_part(productId, partnum)
Rule for tranforming many to many relationship: Create an additional relation that is a hybrid of the two relations in the many to many relationship. The hybrid relation should contain, at minimum, the primary key of each of the aforementioned relations.
Step III This step is left to the reader.
SQL (Structured Query Language) is a language designed to be used with relational databases. The language has two main parts, DDL (Data Definition Language) and DML (Data Manipulation Language). The DDL performs such tasks as defining tables, deleting tables, and altering the structure of tables. The DML performs such tasks as retrieving data and maintaining data (updating records, deleting records, inserting records). See the following chart for a more complete breakdown.
--Create
-- Delete
SELECTION QUERIES ACTION
QUERIES
-- Alter
--Select
--Update
--Insert
--Delete
The following is a brief introduction to DML,
specifically SELECTION QUERIES.
This category of SQL statement(s) is designed to retrieve information from a relational database. It consists of one basic statement (called the select command) and various clauses and parameters. Below is the basic form of the select command.
select ………
from………… .
Where…… …..
The select clause contains what is to be displayed or projected. The expressions in the select clause can be:
Example:
select sum(amount)
(Output)
Expr1000
250000
select sum(amount) as SalesTot
(Output)
SalesTot
250000
The from clause contains references to the source of the data. These sources could be:
The where clause contains conditional expressions and is required only when needed.
Example: where state=’CT’ or state=’RI’
Some of the conditionals are:
Example: Simple projection
of one field name.
Select custname
From customer
Example: Simple
projection of many field names.
Select custname,age
From customer
Example: Simple
projection of every field name in the table.
Select *
From customer
Example: Average
age of all customers.
Select avg(age)
From customer
Example:Total number of
customers.
Select count(*)
From customer
Example: Using an
alias.
Select count(*) as TotalCustomers
From customer
Example:Total sales of all
customers.
Select sum(sales)
From customer
Example:Using the where clause.
Select custname
From customer
Where sales > 10000
Example:List the names of
customers with sales over 10,000 and under the age of 36.
Select custname
From customer
Where sales > 10000 and age < 36
Example: Simple
join. List the names of each customer
and the name and age of the salesperson assigned to them – each customer is
assigned to only one salesperson.
Select custname,salespersonName,salespersonAge
From customer,salesperson
Where customer.salespersonId = salesperson.salespersonId
NOTE: The where clause
above serves to join the customer and salesperson tables on the basis of a
common field – salespersonId. As a result of the join, each salesperson record is joined (attached) to the
appropriate customer record.
Example:List the total
customer sales by state.
Select custState,sum(sales) as TotSales
From customer
Group by custState
NOTE: The group by clause
above groups all the
custState TotSales
CT 4780
MA 5432
NY 10908
RI 3500
(See the SQL Supplement for a more comprehensive and in-depth coverage of
SQL)
The Conceptual Data Model (CDM) is a diagram that depicts entities and their relationships with other entities. Each entity is further described by depicting its attributes. The CDM represents the first step in a top-down approach to the design and development of a database.
An attribute is viewed as a property of an entity, but sometimes it is difficult to distinguish an entity from an attribute.
Example 5:
ENTITY: employee
ATTRIBUTES: social security
number (ss#); name; skill
In example 5 above, skill is an attribute, but in another context skill is treated as an entity – see example 6 below.
Example 6:
* 1
In example 6, each employee has one skill, and each skill can be had by many employees - there is a one to many relationship from skill to employee.
Example 7 below shows another instance where skill is treated as an entity.
Example 7:
* *
In example 7, an employee can have many skills – there is a many-to-many relationship between employee and skill.
SUMMARY
Rule for determining attributes: Treat everything as an entity, and then look for entities that are:
1. In a functional relationship (on the one side) with another entity;
2. And do not have attributes;
3. And have no relationships with other entities.
If an entity meets all three criteria, then treat it as an attribute.
An aggregate entity is an entity constructed of other entities.
Example 8A: A company wishes to develop a CDM depicting the relationship between its various stores and products sold. Each store can sell different types of products, and each product type can be sold by different stores.
* *
Example 8B: Same as example 8A, except the company wishes to know the total quantity of products sold by each store; and the total quantity sold of each product.
STORE PRODUCT
*
*
Example 8C: Same as example 8A, except the company wishes to know the total quantity sold of a given product by a given store.
Example 8B shows the total quantity sold by each store – e.g. store 10 sold a total quantity of 1000 product units. The model does not show a detailed breakdown of the various products – e.g 300 units of product x, 200 units of product y, and 500 units of product z. The model in example 8B is too general, putting it another way, it has a coarse granularity. Example 8C calls for more detail or finer granularity. Example 8C will require an aggregate entity.
SALES * * PRODUCT STORE
*
NOTE
Model 8B cannot provide the detail (finer granularity) that 8C
provides, but 8C can provide the generality (coarser granularity) of 8B. Using 8C, store 10’s total quantity sold can
be determined by adding up the individual store 10 totals for products x, y,
and z. Fine granularity is more
flexible than coarse granularity.
The quantity is now the attribute of the aggregate entity (SALES) and represents the total quantity sold of a given product by a given store. The example 8C model has a finer granularity than the example 8B model. SALES is an aggregate entity because it is constructed of other entities – a nested entity. This model will lead to a database that can answer the query How much of product y did store 10 sell; but the granularity is not fine enough to answer the query How much of product y did store 10 sell on a given date? – see example 8D.
Example 8D: Same as example 8C, except the company wishes to know the total quantity sold of a given product by a given store on a given date. Note that a given store can sell the same product type on different dates.
* *
*
*
*
The quantuty attribute now represents the quantity sold of a given product by a given store on a given date. Until now, the relationships have been binary relations - relations between two entities. The relationship inside the aggregate entity above is a ternary relationship – the relationship between three entities. It shows that a given store sells a given product on many dates.
Example 8E: Same as example 8D, except
the ternary
relationship is transformed into a binary relationship –
a many-to-many between an aggregate
entity and a non-aggregate entity. Model
8D and 8E are semantically equivalent – they convey the same meaning.
* *
*
*
DATE
*
The 8E model above illustrates a more complex nesting, namely an aggregate entity nested inside an aggregate entity, but there are some who prefer to deal with binary relationships.
Exercise 1: Explain why the model below (call it 8F) cannot answer the same query that
8D and 8E can answer – How many units of
product x did store 10 sell on
SALES * *
* *
* *
Philosophers make a useful distinction called the type/token distinction. For example, how many words are listed below?
Cat
Cat
Cat
Someone says three and another says one. In a sense, both are correct – it is said that there is one type and three tokens (instances) of the type. In database design the same distinction is sometimes called the conceptual (type) / physical (token) distinction.
Example 9: In the model below, the product entity is a conceptual entity, because a given product is sold in many stores. This would be impossible if the product entity were a physical entity because a physical entity is singular and can be sold in only one store, thus in this case it is the same conceptual (type) product that can be sold in many stores.
STORE PRODUCT
* *
Example 10: The model below indicates that there is a always one to many cardinality between a conceptual entity and a physical entity.
(CONCEPTUAL) AUTO (PHYSICAL) AUTO
1
*
One of the most useful organizing activities of the mind is classification – the is a relationship. For example: a savings account is a bank account; a checking account is a bank account; a cat is an animal; gold is a metal. In the preceding examples, there is a supertype / subtype relationship: animal/cat; metal/gold; bank account/savings account, etc.
Example 11: The model below (the Ụ represents an “is a ´relationship) represents a supertype/subtype relationship among entities.
Ụ Ụ Ụ
CHECKING
ACCOUNT SAVINGS
ACCOUNT MORTGAGE
ACCOUNT
A weak entity is one whose existence depends on another entity (called the parent); that is, the cardinality from the weak (dependent) to the parent is a minimum of one and a maximum of one. The weak entity must contain the same primary key as the parent. In the example below, the weak entity contains the same primary key (invoice#) as the parent.
Example 12: The line item entity depends on the bill entity.
1 *
1. Explain why the model below
cannot answer the same query that 8D and 8E can answer – How many units of
product x did store 10 sell on March 26, 2003 and on March 26, 2004?
*
The ACME Corp. wishes to develop a conceptual data model based on the following business rules:
The model should, at minimum, answer the following queries:
The model should include the basic attributes of each object( entity).
The BARTON stockbroker firm wishes to develop a conceptual data model to track its clients and the stocks they OWN. Among the data for each stock are its symbol, the industry it is in, and its earnings. The firm requires the client’s name, address, and social security number. One client can hold many stocks, and one stock can be held by many clients. The firm needs to know the date the client purchased the stock, the number of shares purchased, and the purchase price per share.
A department has many employees, but an employee belongs to one department. The department holds periodic meetings with individual employees. The company wishes to record the date of the last meeting with each employee.
A department has many employees and an employee belongs to more than one department. The department holds periodic meetings with individual employees. The company wishes to record the date of the last meeting with each employee.
Many employees are assigned to a given project. A single employee may be assigned to more than one project. An employee can have many skills and a skill may be had by many employees. The company desires to know the set of skills an employee applies to a given project and the date assigned to a given project.
7.
A computer is comprised of many parts. A part can appear in many computers. The company needs to list the parts contained in their various computer models. Also, the company needs to know the quantity on hand for its computers and parts.
The company buys the same part from different manufacturers – for example, it buys hard drives from Western Digital, Seagate, Maxtor, or IBM. The company has been experiencing quality control problems and wishes to identify the manufacturer of a failed part.. Draw an enhanced model that will accommodate both A and B.
Many employees are assigned to a given project. A single employee may be assigned to more than one project. Even though many projects can be assigned to a given city, an employee may be assigned to only one project per city. Develop a model that reflects these company rules.
1. This model states the following: for a given combination of store and product there is only one date.
Models 8d and 8e state the following: for a given combination of store and product there are many dates.
*
* 1
* *
uses *
NOTE: Will the model answer the following query: What are the skills of a given employee? The answer is NO. The model indicates the skills used by various employees, but not any skills not being used at the time. The following model adds this enhancement.
has
7.
(Beyond scope?!!)
There are a variety of logical data models with the relational model being the most widely used variety and the object model variety catching on fast. The relational model will be discussed in this section. The relational model is based on the mathematical theory of relations where tables are called relations; rows are called tuples; fields are called attributes. A relation is a specific type of table that meets the following criteria:
A table with the following fields would not be a relation because of the repeating fields (skill).
WorkerId
WorkerName
skill1
skill2
skill3
One problem with repeating fields in this case is determining how many skill fields to allow. Suppose a worker has more than 3 skills!! Suppose a particular worker has one skill. If you define more skills than any worker could possibly have, then there will be many rows (workers) with wasted fields (space) – rows with many null values in the unused skill fields.
The solution to this problem is to divide the single table into two relations (see below).
worker(workerId, workerName)
workerSkills(workerId, skill)
Each time a single worker acquires a new skill, a new row is added to the workerSkills relation.
(workerSkills
Relation)
workerId skill
120 |
electrical |
230 |
plumbing |
120 |
roofing |
120 |
plumbing |
230 |
electrical |
Many larger companies deploy two types of databases:
1. A database for Information Retrieval – often called a Data Warehouse.
2. A database for daily Transactions - often called a Production Database
The topic of Anomalies and Normal Forms, in general, will apply to the Production Database. A production database, distinct from a Data Warehouse, undergoes frequent and voluminous maintenance. Maintenance is:
1. Adding new rows.
2. Deleting rows.
3. Modifying rows.
Anomaly will be defined as a rare and undesirable event. Anomalies can occur when a production database undergoes maintenance. There are the following types of anomalies:
1. Addition anomaly.
2. Deletion anomaly.
3. Modification anomaly.
The following relation will be used to illustrate the three anomalies.
student(studentId,
major, studentName, advisorId)
Business Rules:
studentId
major studentName advisorId
s100 |
acct |
Able |
f400 |
s100 |
math |
Able |
f500 |
s200 |
acct |
Baker |
f400 |
s300 |
hist |
Crane |
f600 |
s400 |
math |
Doe |
F500 |
s500 |
hist |
Evans |
f600 |
s500 |
math |
Evans |
f500 |
Addition Anomaly:
Suppose a new student (s600, Hayes) is to be added to the table, but the student has not yet chosen a major. In this case the student data (name and id) cannot be added to the table until a major is chosen. Why? A relation cannot contain a null value in the primary key. The primary key in this table is comprised of two fields – studentId and major.
Deletion Anomaly:
Suppose students s100 and s200 leave the school and all their rows are deleted from the relation. In this case more information will be deleted than was intended. It just so happens that these two students are the only students advised by f400, thus the information that f400 is an advisor for accounting majors is also deleted.
Modification Anomaly:
Suppose a new advisor (f900) replaces f500. In this case more than one row requires modification – every row with advisor f500 needs to be modified.
Maintaining a relation such as the above can lead to anomalies. This suggests that there is a design flaw in the table. Normalization is the process of designing relations that will not be prone to anomalies and other maintenance problems.
How does normalization relate to another design process conceptual data modeling? Normalization is bottom-up and conceptual data modeling is top-down. Top-down begins with a general view of the relationships between entities; bottom-up begins with a concrete view of fields and their relationships to other fields. Each design process leads to the construction of relations, but normalization can also be used to test tables created by the top-down process for design flaws. Why not just use normalization (bottom-up) because the result should be normalized relations that do not contain design flaws, whereas the top-down approach may produce relations with design flaws?! An advantage of the top-down approach is that it also generates a company-wide graphical overview of the data and the various relationships. The larger companies often use both approaches in a complementary manner.
The normalization process was created as the consequence of analyzing flawed tables and discovering the source of the flaw and thereby providing a remedy. The source of most anomalies is attributed to certain undesirable functional dependencies (FD’s). The normalization process will be viewed in this discussion as a set of constraints outlawing (forbidding) certain FD’s. Normalization produces relations that are in various degrees of normal form. The lower the degree, the less constraints imposed on the table and the more prone to flaws. Higher degrees impose increasingly more constraints until the relation is free from flaws. Each higher degree contains all the constraints of the preceding degrees plus more. What follows are the degrees of normal forms, beginning with the lowest and moving incrementally to the higher normal forms.
Before delving into the constraints imposed by the normal forms, it is necessary to understand the terms composite key, atomic key, subkey , nonkey, and the concept of functional dependency.
Composite Key: If a primary key consists of two or more fields, the primary key is called a composite key.
Subkey: If a primary key is a composite key, then any field or combination of fields within the composite key is called a subkey – the composite key as a whole is not considered a subkey.
Atomic Key: If a primary key consists of one field, the primary key is called an atomic key. An atomic key has no subkeys - a >composite key has subkeys.
Nonkey: Any field or combination of fields that is not part of the key.
Functional Dependency: In this context we are referring to the functional dependency that occurs between fields. A functional dependency occurs between field A and field B when a given value of A is associated with only a single value of B – it is said that A determines B, or B is dependent on A.
For example: In the student relation above, a single value of studentId (100) determines studentName (Able). Conversely, a single value of studentId does not determine major – studentId (100) can be mapped onto two values of major (acct, math). In the latter case, it is said that studentId multi-determines major, or major is not dependent on studentId.
What follows is the list of normal forms and their associated constraints.
1. First
Table must be a relation.
i. No repeating field.
ii. No duplicate rows.
2. Second
a. 1nf.
b. No subkey determines a nonkey.
Each nonkey
is determined by the whole key.
c. 1nf plus an atomic key = 2nf.
3. Third
a. 2nf.
b. No nonkey determines another nonkey.
Each nonkey
is determined by the whole key and nothing but the whole key.
c. 2nf + one nonkey = 3nf.
4. Boyce Codd Normal Form (bcnf)
a. 3nf.
b. No nonkey determines a subkey.
c. 3nf + atomic key = bcnf.
A relation in bcnf
is free from any anomalies caused by FD’s – all lower normal
forms are not free from anomalies.
The following notation will be used:
A à B means A determines B.
Aàà means A multi-determines B; implies A does not determine B.
A-/à means A does not determine B
Exercises: Identify
the highest normal form of the
following relations.
Answers
Decomposition is a technique for normalizing a relation. Normalizing a relation means raising a relation to a higher normal form, in our case raising a relation to bcnf. In most cases, decomposition achieves this task by decomposing the relation in question into two or more relations that are in bcnf.
Example: The
relation below is in 1nf because major (a subkey) determines advisor (a nonkey). Decompose the relation into bcnf.
NOTE: Decomposition
must not change the meaning of the original relation. The
various relations resulting from decomposition must collectively contain
the same information as the original relation. That is, the various relations, when
joined, must form the original relation.
.
student(id, major, advisor)
Decomposition
Process:
Steps:
r2(id, major)
As it stands, joining r1 and r2 will produce the original relation. On the other hand, if advisor were chosen as the foreign key for r2 in step 2 (r2(id, advisor)), then joining r1 and r2 would NOT reproduce the original relation, but rather a relation with erroneous data. WHY?
Exercises: Identify
the normal form and decompose the
following relations.
Answers
1. a) 2nf - nonkey (activity)
determines nonkey (fee).
b)
r1(activity, fee) r2(id,
activity)
2. a) 3nf – nonkey (advisor)
determines subkey (major).
b)
r1(advisor, major) r2(id,
advisor)
3. a) 1nf – subkey (supplier) determines nonkey (price).
Once a conceptual data model (CDM) has been approved, the next step is to transform it into a logical data model of choice – in this case the relational data model. What follows are examples of the various transformations.
Example 1: Transforming a One to One Relationship.
THE TRANSFORMATION PROCESS
MAJOR(majorId, facId, description).
Choosing a or b may be a toss of the coin or may be determined by a reason. Suppose the following: every faculty does not act as an advisor; each major has an advisor; and option b was chosen. If that were the case, then some faculty rows would contain a null value in the majorId field indicating that they are not advisors. Usually null values are to be avoided wherever possible – choosing option a avoids the null values in this case.
NOTE: .The CDM and the
relational model are two different languages (notations). The CDM expresses the relationship
between two entities as a line connecting the two
entities. The relational model
expresses the relationship between two relations by way of a common field
appearing in each relation. This is
done by adding the primary key field
of one relation to the other relation where the field is called the foreign
key. A foreign key should not be
shown in a CDM – it has no meaning
in the CDM language.
Example 2: Transforming a One to Many Relationship.
THE TRANSFORMATION PROCESS
Example 3: Transforming a Many to Many Relationship.
THE TRANSFORMATION PROCESS
Example 4: Transforming Aggregate Entities.
THE TRANSFORMATION PROCESS
Example 5: Transforming Ternary Relationships.
|
What if the ternary relationship above were inside an aggregate entity? Just add any aggregate entity attributes to the assignment relation.
The following relational database stores student exam scores.
student |
|
id |
name |
100 |
able |
200 |
baker |
grades |
|||
id |
class |
exam |
score |
100 |
c1 |
1 |
85 |
100 |
c1 |
2 |
88 |
100 |
c1 |
3 |
92 |
100 |
c2 |
1 |
87 |
200 |
c1 |
1 |
75 |
200 |
c1 |
2 |
85 |
200 |
c3 |
1 |
90 |
The following XML document stores the same information.
<?xml
version="1.0" ?>
- <student_grades>
- < <student>
< <id>100</id>
< <name>able</name>
- < <grades>
- < <grade>
<class>c1</class>
< <exam>1</exam>
< <score>85</score>
</grade>
- < <grade>
<class>c1</class>
< <exam>2</exam>
< <score>88</score>
</grade>
- < <grade>
<class>c1</class>
< <exam>3</exam>
< <score>92</score>
</grade>
<grade>
<class>c2</class>
< <exam>1</exam>
< <score>87</score>
</grade>
</grades>
</student>
- < <student>
< <id>200</id>
< <name>baker</name>
- < <grades>
- < <grade>
<class>c1</class>
< <exam>1</exam>
< <score>75</score>
</grade>
- < <grade>
<class>c1</class>
< <exam>2</exam>
< <score>85</score>
</grade>
<grade>
<class>c3</class>
< <exam>1</exam>
< <score>90</score>
</grade>
</grades>
</student>
</student_grades>
Note the differences:
XML 1. Data
stored in one document. 2. No foreign
key(s) are required. 3. The
structure is hierarchical. 4. Can
contain many levels. 5. Is a text
document - can be read by any text reader including notepad, and web browsers. 6.
Ideally suited for exchanging data on the web. RELATIONAL 1.
Requires two tables to store the data. 2. id is repeated in the scores table as a foreign
key serving as a means for joining the two tables. 3. The
structure is tabular – rows and columns. 4.
Limited to three levels: table; record; field. 5. Data
is stored in a proprietary binary format whose access is primarily limited
to a specific database system. 6. Not
suitable for exchanging data on the web.
Databases play a central role in today’s pervasive world of web applications.
A web application is an application that can be initiated by anyone on the web. The web user supplies the web browser with the web address (URL) of the application – for example www.google.com. The consumer of a web application will be called the client and the provider of the web application will be called the server. The server is often designed to host thousands of clients accessing the application at the same time from all regions of cyberspace. Consequently, the web application must be designed to embrace many different clients:
clients with various web browsers and versions;
clients with various computer hardware/software (platforms);
and more.
Below is a diagram depicting the interaction between clients and a server. Each client may engage in a series of request/response sequences. It should be noted that a single sever may host a variety of web applications at the same time.
WHAT ARE
THE MAIN COMPONENTS OF A WEB APPLICATION?
SERVER-SIDE
i. C#
ii. VB.NET (Visual Basic)
iii. C
iv. Others
CLIENT-SIDE
Web Browser such as Microsoft’s IE (Internet Explorer)
1.
Has the
facility to execute client-side scripts
written in:
a.
JavaScript
b.
VBScript
c.
Others
2.
Has the
facility to parse XML documents
3.
Exposes
(provides access) the DOM (Document Object Model)
HOW
DOES A SERVER-SIDE SCRIPT INTERACT WITH A DATABASE?
Using ASP.NET
A set of data access classes, called ADO.NET, is exposed (made available) to server-side scripts. This set contains many useful data access classes such as:
Here is a sequence of statements from a C# script that creates a dataset derived from the premier database.
string
strconn,strcmd1,strcmd2;
strconn="Provider=Microsoft.Jet.OLEDB.4.0;";
strconn+="data
Source=c:\\inetpub\\wwwroot\\premier.mdb";
strcmd1="select * from worker";
// connect & execute the
SQL command using DataAdapter’s connection object
& command object
OleDbDataAdapter da=new OleDbDataAdapter(strcmd1,strconn);
DataSet ds=new DataSet(); // create a dataset
object
da.Fill(ds,"worker"); // assign the worker table to the dataset (ds)
strcmd1="select * from assignment";
// execute the SQL command
using DataAdapter’s command object
da.SelectCommand.CommandText=strcmd1;
da.Fill(ds,"assignment"); // assign the assignment table to the dataset (ds)
strcmd1="select * from building";
// execute the SQL command
using DataAdapter’s command object
da.SelectCommand.CommandText=strcmd1; // execute the SQL command
da.Fill(ds,"building"); // assign the building table to the dataset (ds)
// finally, the DataSet, ds, is complete
MORE
ABOUT THE DATASET
This section will use the DataSet, ds, that was created above. The Dataset is a very useful data access object.
Here is an outline of the DataSet.
Assume:
ds = a DataSet object
dt = a DataTable object.
dr = a DataRow object
dc = a DataColumn object
1. Each Dataset has a collection of Tables.
ds.Tables
2. Each Table has a collection of Rows plus a collection of Columns.
a. dt.Rows
b. dt.Columns
3. Collections such as Tables, Rows, and Columns:
a. have a Count property.
i. ds.Tables.Count Indicates the number of tables
ii. dt.Rows.Count Indicates the number of rows
iii. dt.Columns.Count Indicates the number of columns
b. can be indexed to reference individual members of the collection – the index is zero-based
i. ds.Tables[2] refers to the 3rd table in the collection
ii. dt,Rows[0] refers to the 1st row in the collection
iii. dt.Rows[2][3] refers to the value stored in the 4th column of the 3rd row in the collection
4. Each Row is a list of column values and is indexed.
dr[3] refers to the 4th value in the row, which represents column 4’s value.
5. Each Column has a set of properties such as ColumnName. An individual column is not indexed.
dc.ColumnName
Here is a list of various Dataset related C# notations:
ds.Tables refers to the
collection of tables comprising ds
ds.Tables.Count to the number
(count) of tables in the dataset
ds.Tables[0] refers to to the first table assigned to ds
– the “worker” table
ds.Tables[“worker”]
also
refers to the “worker” table
ds.Tables[1] refers to the second table
assigned to ds – the “assignment” table, etc.
ds.Tables[“worker”].Rows refers to the “worker” table’s rows
DataTable dtw=ds.Tables["worker"]; creates a DataTable object for the
“worker” table
dtw.Rows refers to the
“worker” table’s rows – easier notation than
ds.Tables[“worker”].Rows
dtw.Rows.Count refers to the number
(count) of rows in the table
dtw.Rows[0] refers to the first
row of the table
dtw.Rows[0][0]
refers to
the value stored in the first column of the first row of the”worker”
table
dtw.Rows[0]["worker_id"] refers to the value stored in the first column (“worker_id”) of the first row of the table
DataRow dr=dtw.Rows[0]; creates a DataRow object of the first row of the “worker” table
dr[0] refers to the value
stored in the first column of the first row of the “worker” table
dtw.Columns refers to the
collection of columns in the “worker” table
dtw.Columns.Count dtw.Columns.Count refers to the
number (count)
of columns in the “work” table
dtw.Columns[0].ColumnName refers to the name of column 1
DataColumn dc=dtw.Columns[0]; creates a DataColumn object of the
first column of the “worker” table
dc.ColumnName refers to the name
of column 1 – same as dtw.Columns[0].ColumnName
Here is a nested for loop that outputs the column name and column value of each row of the worker table
for (int
row=0;row<dtw.Rows.Count;row++)
{
for
(int col=0;col<dtw.Columns.Count;col++)
{
Response.Write(dtw.Columns[col].ColumnName+" = ");
Response.Write(dtw.Rows[row][col]);
Response.Write("<br>");
}
Response.Write("**************************************<br>");
}
Here is the output of the first two rows:
WORKER_ID = 1235
WORKER_NAME = M. FARADAY
HRLY_RATE = 18.5
SKILL_TYPE = ELECTRIC
SUPV_ID = 1311
**************************************
WORKER_ID = 1311
WORKER_NAME = C. COULOMB
HRLY_RATE = 15.5
SKILL_TYPE = ELECTRIC
SUPV_ID = 1311
**************************************