Introduction
Historical ContextEdgar Frank Codd defined the foundations of the relational model in the late 1960s. In 1970, he published “A Relational Model of data for Large Shared Data Banks,” which is considered one of the most influential documents in computer science history.
Before Codd’s model, the only standardized theoretical model was Codasyl, which was widely used in the 1970s. Codd introduced the term “relation” (sometimes translated as interrelation) that groups data independently of its physical storage.
Core Objectives
Physical Independence
The way data is stored should not influence its logical manipulation. If physical storage changes, users shouldn’t even notice.
Logical Independence
Applications using the database should not require modification when database elements are modified.
Flexibility
The database easily offers different views depending on users and applications.
Uniformity
Logical structures always have a single conceptual form (tables).
Basic Terminology
Relations and Their Components
Relation (Table)
A mathematical construct representing a set of tuples
Visually represented as a two-dimensional table
Contains rows (tuples) and columns (attributes)
Has a unique name within the database schema
Attributes (Columns)
Named properties or characteristics of an entity
Each attribute has an associated domain
Must have unique names within a relation
Can be simple or composite
Types: Simple, Composite, Single-valued, Multi-valued, Derived
Tuples (Rows)
Represents a single instance or record in the relation
Must be unique within the relation
Order is not significant
Cannot contain repeating groups (1NF requirement)
Domain
Set of allowable values for an attribute
Defines the data type and constraints
Examples: integers, strings, dates
Can have additional restrictions (CHECK constraints)
Extended Concepts
Logical structure of the relation
Written as R(A₁, A₂, …, Aₙ)
R is relation name, A₁ to Aₙ are attributes
Represents metadata about the relation
Actual data in the relation at a given time
Collection of tuples currently stored
Changes frequently unlike schema
Must conform to all constraints
Number of attributes in a relation
Also called arity or width
Unary (1), Binary (2), Ternary (3), etc.
Remains constant unless schema changes
Number of tuples in a relation
Changes as data is modified
Important for query optimization
Used in relationship classifications
Key Concepts
Domain : Contains all possible values that a particular attribute can take
Degree : Indicates the size of a relation based on the number of columns
Cardinality : Number of tuples in a relation
Table Types
Persistent Tables
Base : Independent tables created by specifying their structure
Views : Tables that only store a query definition
Snapshots : Views that store both data and query
Temporary Tables
Automatically eliminated by the system. Can be any of the above types.
Keys
Important Keys
Candidate Key : Set of attributes that uniquely identify each tuple
Primary Key : Chosen candidate key as tuple identifier
Alternative Key : Any candidate key that isn’t primary
Foreign Key : Attributes related to another table’s primary key
Relational Model Constraints
Inherent Constraints
Domain Constraint : Any value in a tuple must belong to the attribute’s domain
Key Constraint : Each tuple in a relation must be unique
User Constraints
NULL Constraint
Specifies whether an attribute can contain null values
Uniqueness
Marks candidate keys that aren’t primary keys
Entity Integrity
No key attribute can have a null value
Referential Integrity
Foreign key values must match existing primary keys in referenced relations
Codd’s Rules
Rule ZeroFor a system to be called a relational database management system, it must use its relational capabilities exclusively to manage the database.
Information Rule : All information must be represented as data values in tables
Guaranteed Access : Every data value must be accessible through table name, primary key, and column name
Systematic NULL Treatment : NULL values must be supported systematically
Dynamic Catalog : Database description must be represented at the logical level like regular data
Comprehensive Language : Must support at least one relational language
View Updating : All theoretically updatable views must be updatable by the system
High-Level Operations : Insert, update, and delete operations must be supported at the set level
Physical Independence : Changes in physical storage or access methods don’t affect application logic
Logical Independence : Changes in base tables don’t affect application logic
Integrity Independence : Integrity constraints must be definable in the database language
Distribution Independence : Same operations work in both centralized and distributed databases
Nonsubversion : If a low-level interface exists, it cannot be used to bypass integrity rules
Basic Entity Types
Transformation Rules
Regular Entity Types
Create a relation for each strong entity
Primary key becomes the relation’s primary key
Include all simple attributes
For composite attributes, include only their components
Choose one alternative for derived attributes
emp_id: INTEGER PRIMARY KEY ,
Weak Entity Types
Create a relation for the weak entity
Include primary key of owner entity
Add partial key attributes
Combined key becomes primary key
dependent_name: VARCHAR ( 50 ),
relation_type: VARCHAR ( 20 ),
PRIMARY KEY (emp_id, dependent_name),
FOREIGN KEY (emp_id) REFERENCES EMPLOYEE
Simple Attributes
Directly converted to columns
Maintain their data types
Example: name VARCHAR(50)
Composite Attributes
Either decompose into simple attributes
Or create as single concatenated field
Example: address_street, address_city vs full_address
Multi-valued Attributes
Create new relation
Include primary key of original entity
Add attribute for the multi-valued field
Create composite primary key
Derived Attributes
Option 1: Store calculation formula
Option 2: Store computed value
Option 3: Create view or computed column
Relationship Mapping
One-to-One (1:1)
Option A: Foreign key in either relation
Option B: Merge into single relation if total participation
Add unique constraint on foreign key
emp_id: INTEGER PRIMARY KEY ,
FOREIGN KEY (dept_id) REFERENCES DEPARTMENT
One-to-Many (1:N)
Add foreign key in N-side relation
No additional relation needed
emp_id: INTEGER PRIMARY KEY ,
FOREIGN KEY (dept_id) REFERENCES DEPARTMENT
Many-to-Many (M:N)
Create new relation for relationship
Include primary keys of both entities
Add any relationship attributes
PRIMARY KEY (emp_id, project_id),
FOREIGN KEY (emp_id) REFERENCES EMPLOYEE,
FOREIGN KEY (project_id) REFERENCES PROJECT
Inheritance/Specialization
Single Relation: Merge all attributes
Multiple Relations: Separate tables with foreign keys
Union Relations: Mixed approach based on constraints
Aggregation
Transform underlying relationship first
Create relation for aggregated unit
Link through foreign keys
N-ary Relationships
Complex RelationshipsFor relationships involving more than two entities:
Create new relation for the relationship
Include primary keys of all participating entities
Add any relationship attributes
Primary key typically includes all foreign keys
Best Practices
Performance Considerations
Consider indexing strategy
Evaluate denormalization needs
Analyze query patterns
Integrity Rules
Implement all necessary constraints
Consider triggers for complex rules
Document assumptions and decisions
[Rest of original content remains the same…]
Normalization
Important ProcessNormalization is a process to verify that defined tables meet certain conditions. It aims to guarantee:
No redundancy exists
Coherent representation of entities and relationships
Logical problems are minimized
First Normal Form (1NF) : No repeating groups allowed
Second Normal Form (2NF) : Must be in 1NF and non-key attributes must fully depend on the primary key
Third Normal Form (3NF) : Must be in 2NF and no transitive dependencies allowed
Boyce-Codd Normal Form (BCNF) : Must be in 3NF and every determinant must be a candidate key
Advanced FormsThere are additional normal forms (4NF, 5NF, DKNF, 6NF) but they are mainly theoretical and rarely used in real-world applications.