📌 Basics of Databases
🔹 What is a Database?
A database is an organized collection of structured data stored electronically
in a computer system. Databases allow efficient data retrieval, insertion,
updating, and deletion.
A Database Management System (DBMS) is the technology for storing and
retrieving users' data efficiently while applying appropriate security
measures. This tutorial explains the basics of DBMS, such as its architecture,
data models, data schema, data independence, the E-R model, the relational
model, relational database design, and storage and file structure.
💡 Real-life analogy: A database is like a cupboard. Each shelf is a table, and items on that shelf
are data entries.
A modern DBMS has the following characteristics −
Real-world entity − A modern DBMS is more realistic and uses real-
world entities to design its architecture. It uses the behavior and
attributes too. For example, a school database may use students as an
entity and their age as an attribute.
Relation-based tables − DBMS allows entities and relations among
them to form tables. A user can understand the architecture of a
database just by looking at the table names.
Isolation of data and application − A database system is entirely
different from its data. The database is an active entity, whereas data is
passive: it is what the database works on and organizes. A DBMS
also stores metadata, which is data about data, to ease its own
processing.
Less redundancy − A DBMS follows the rules of normalization, which
split a relation when any of its attributes has redundant values.
Normalization is a mathematically grounded process that reduces
data redundancy.
Consistency − Consistency is a state in which every relation in a
database remains consistent. There are methods and techniques
that can detect an attempt to leave the database in an inconsistent
state. A DBMS can provide greater consistency than earlier forms of
data storage such as file-processing systems.
Query Language − A DBMS is equipped with a query language, which
makes it more efficient to retrieve and manipulate data. A user can
apply as many different filtering options as required to retrieve
a set of data. This was not possible with traditional file-processing
systems.
Types of DBMS:
1. Relational (SQL): Relational databases are the most widely used type
of database today. They store data in tables, with rows representing
records and columns representing attributes of the records. Pieces of
information in different tables can be related to one another because
every record has a unique identity, typically provided by a key.
E.g. : MySQL, PostgreSQL, Oracle
2. NoSQL: A NoSQL database (short for “non-SQL” or “non-relational”) provides a
mechanism for storing and retrieving data that does not rely on traditional table-
based relational models. Instead, it uses flexible data models like key-value
pairs, documents, column families, or graphs, making it ideal for handling
unstructured, semi-structured, and structured data.
NoSQL databases are known for their simplicity of
design, horizontal scalability (adding more servers for scaling),
and high availability. Unlike relational databases, their data
structures allow faster operations in certain use cases.
E.g.: MongoDB, Cassandra (for unstructured data)
🔹 Tables, Rows, and Columns
A relational database defines database relationships in the form of
tables.
A table is a collection of related data entries, and it consists of
columns and rows. (e.g.- Users)
A column holds specific information about every record in the table.
(e.g.- user_id, name)
A record (or row) is each individual entry that exists in a table.
🔸 Example: Users Table
CREATE TABLE Users (
user_id INT PRIMARY KEY,
name VARCHAR(50),
email VARCHAR(100)
);
🔹 Primary & Foreign Keys
Primary Key (PK): A primary key uniquely identifies each row in a table.
The primary key column must hold unique values and cannot contain NULL
values. It is either an existing table column or a column that is
specifically generated by the database according to a defined sequence
(e.g., user_id).
Example: STUD_NO, as well as STUD_PHONE, are candidate keys for
relation STUDENT but STUD_NO can be chosen as the primary key
(only one out of many candidate keys).
Foreign Key (FK): A foreign key is a column or group of columns in a relational
database table that provides a link between data in two tables. It is a column (or
columns) that references a column (most often the primary key) of another table.
Example: STUD_NO in STUDENT_COURSE is a foreign key to STUD_NO
in STUDENT relation.
🔸 Example: Orders Table with FK
CREATE TABLE Orders (
order_id INT PRIMARY KEY,
user_id INT,
order_date DATE,
FOREIGN KEY (user_id) REFERENCES Users(user_id)
);
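To see the foreign key at work, here is a minimal sketch that runs the two CREATE TABLE statements above through Python's built-in sqlite3 module. SQLite is only a stand-in here, the sample rows are made up for illustration, and SQLite enforces foreign keys only after PRAGMA foreign_keys = ON; other systems enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement

conn.execute("""CREATE TABLE Users (
    user_id INT PRIMARY KEY,
    name VARCHAR(50),
    email VARCHAR(100))""")
conn.execute("""CREATE TABLE Orders (
    order_id INT PRIMARY KEY,
    user_id INT,
    order_date DATE,
    FOREIGN KEY (user_id) REFERENCES Users(user_id))""")

# Hypothetical sample data: one user and one order that points at that user.
conn.execute("INSERT INTO Users VALUES (1, 'Asha', 'asha@example.com')")
conn.execute("INSERT INTO Orders VALUES (10, 1, '2024-01-15')")

# An order for a user that does not exist violates the foreign key.
try:
    conn.execute("INSERT INTO Orders VALUES (11, 99, '2024-01-16')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
print(fk_rejected, order_count)
```

The second insert is refused because no row in Users has user_id 99, so only the valid order remains.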
📌 SQL Commands
🔹 Basic CRUD Operations
SELECT Statement:
The SELECT statement is used to select data from a database.
The data returned is stored in a result table, called the result-set.
Syntax:
SELECT column1, column2, ...
FROM table_name;
Here, column1, column2, ... are the field names of the table
you want to select data from. If you want to select all the fields
available in the table, use the following syntax:
SELECT * FROM table_name;
SELECT DISTINCT Statement:
The SELECT DISTINCT statement is used to return only distinct
(different) values.
Inside a table, a column often contains many duplicate values;
and sometimes you only want to list the different (distinct)
values.
Syntax:
SELECT DISTINCT column1, column2, ...
FROM table_name;
INSERT INTO Statement
The INSERT INTO statement is used to insert new records in a
table.
INSERT INTO Syntax
It is possible to write the INSERT INTO statement in two ways:
1. Specify both the column names and the values to be
inserted:
INSERT INTO table_name (column1, column2, column3, ...)
VALUES (value1, value2, value3, ...);
2. If you are adding values for all the columns of the table, you do
not need to specify the column names in the SQL query.
However, make sure the order of the values is in the same
order as the columns in the table. Here, the INSERT
INTO syntax would be as follows:
INSERT INTO table_name
VALUES (value1, value2, value3, ...);
UPDATE Statement
The UPDATE statement is used to modify the existing records
in a table.
UPDATE Syntax:
UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;
Note: Be careful when updating records in a table! Notice
the WHERE clause in the UPDATE statement.
The WHERE clause specifies which record(s) should be
updated. If you omit the WHERE clause, all records in the table
will be updated!
DELETE Statement
The DELETE statement is used to delete existing records in a
table.
DELETE Syntax
DELETE FROM table_name WHERE condition;
Note: Be careful when deleting records in a table! Notice
the WHERE clause in the DELETE statement.
The WHERE clause specifies which record(s) should be deleted.
If you omit the WHERE clause, all records in the table will be
deleted!
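The four statements can be tried together. The sketch below is one possible walk-through using Python's built-in sqlite3 module and the Users table defined earlier; the names and e-mail addresses are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Users (user_id INT PRIMARY KEY, name VARCHAR(50), email VARCHAR(100))"
)

# INSERT: list the columns explicitly so the value order is unambiguous.
conn.executemany(
    "INSERT INTO Users (user_id, name, email) VALUES (?, ?, ?)",
    [
        (1, "Asha", "asha@example.com"),
        (2, "Ben", "ben@example.com"),
        (3, "Chen", "chen@example.com"),
    ],
)

# UPDATE: the WHERE clause limits the change to a single row.
conn.execute("UPDATE Users SET email = 'ben@example.org' WHERE user_id = 2")

# DELETE: again restricted by WHERE; without it every row would be removed.
conn.execute("DELETE FROM Users WHERE user_id = 3")

# SELECT: read back what is left.
rows = conn.execute(
    "SELECT user_id, name, email FROM Users ORDER BY user_id"
).fetchall()
print(rows)
```

Only users 1 and 2 remain, and user 2 carries the updated address.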
🔹 Filtering & Sorting
WHERE Clause:
The WHERE clause is used to filter records.
It is used to extract only those records that fulfill a specified
condition.
Syntax:
SELECT column1, column2, ...
FROM table_name
WHERE condition;
Text Fields vs. Numeric Fields
SQL requires single quotes around text values (most database
systems will also allow double quotes).
However, numeric fields should not be enclosed in quotes:
Example
SELECT * FROM Customers
WHERE CustomerID = 1;
The following operators can be used in the WHERE clause:
=, >, >=, <, <=, <> (not equal; in some versions of SQL this
operator may be written as !=), BETWEEN, LIKE, IN, etc.
AND, OR and NOT Operators:
The WHERE clause can be combined with AND, OR,
and NOT operators.
The AND and OR operators are used to filter records based on
more than one condition:
The AND operator displays a record if all the conditions
separated by AND are TRUE.
The OR operator displays a record if any of the conditions
separated by OR is TRUE.
The NOT operator displays a record if the condition(s) is NOT
TRUE.
AND Syntax:
SELECT column1, column2, ...
FROM table_name
WHERE condition1 AND condition2 AND condition3 ...;
OR Syntax:
SELECT column1, column2, ...
FROM table_name
WHERE condition1 OR condition2 OR condition3 ...;
NOT Syntax:
SELECT column1, column2, ...
FROM table_name
WHERE NOT condition;
ORDER BY Keyword:
The ORDER BY keyword is used to sort the result-set in
ascending or descending order.
The ORDER BY keyword sorts the records in ascending order by
default. To sort the records in descending order, use
the DESC keyword.
ORDER BY Syntax:
SELECT column1, column2, ...
FROM table_name
ORDER BY column1, column2, ... ASC|DESC;
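As a small illustration of combining these clauses, the sketch below filters with AND, OR and NOT and then sorts in descending order. It uses Python's built-in sqlite3 module; the Customers columns and rows shown here are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (CustomerID INT PRIMARY KEY, CustomerName TEXT, Country TEXT)"
)
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [(1, "Ana", "Mexico"), (2, "Bo", "Sweden"), (3, "Cy", "Mexico"), (4, "Di", "Germany")],
)

# Text values are quoted, numeric values are not.
query = """
    SELECT CustomerName
    FROM Customers
    WHERE (Country = 'Mexico' OR Country = 'Germany')
      AND NOT (CustomerID = 1)
    ORDER BY CustomerName DESC
"""
names = [row[0] for row in conn.execute(query)]
print(names)
```

Parentheses around the OR conditions keep the intent explicit, since AND binds more tightly than OR.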
GROUP BY:
The GROUP BY statement groups rows that have the same values into summary
rows, like "find the number of customers in each country".
The GROUP BY statement is often used with aggregate functions
(COUNT(), MAX(), MIN(), SUM(), AVG()) to group the result-set by one or
more columns.
GROUP BY Syntax
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
ORDER BY column_name(s);
SELECT country, COUNT(*) FROM Customers GROUP BY country;
HAVING:
The HAVING clause was added to SQL because the WHERE keyword cannot be used
with aggregate functions.
HAVING Syntax
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
HAVING condition
ORDER BY column_name(s);
SELECT country, COUNT(*) FROM Customers GROUP BY country HAVING COUNT(*) > 10;
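The difference between the two queries above can be checked on a small data set. This is a sketch using Python's built-in sqlite3 module; the rows are invented, and the HAVING threshold is lowered from 10 to 1 so that the tiny sample still produces output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INT PRIMARY KEY, country TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [(1, "Mexico"), (2, "Mexico"), (3, "Mexico"),
     (4, "Sweden"), (5, "Germany"), (6, "Germany")],
)

# GROUP BY collapses rows with the same country into one summary row.
per_country = conn.execute(
    "SELECT country, COUNT(*) FROM Customers GROUP BY country ORDER BY country"
).fetchall()

# HAVING filters the groups after aggregation (WHERE cannot do this).
large_groups = conn.execute(
    "SELECT country, COUNT(*) FROM Customers "
    "GROUP BY country HAVING COUNT(*) > 1 ORDER BY country"
).fetchall()

print(per_country)
print(large_groups)
```

Sweden appears in the first result but is dropped from the second, because its group has only one row.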
🔹 JOINs
A JOIN clause is used to combine rows from two or more tables,
based on a related column between them.
Consider an "Orders" table and a "Customers" table.
The "CustomerID" column in the "Orders" table refers to
the "CustomerID" in the "Customers" table, so the relationship
between the two tables is the "CustomerID" column.
Then, we can create the following SQL statement (that contains
an INNER JOIN), that selects records that have matching values in
both tables:
Example
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID;
Types of Joins:
1. Inner Join:
The INNER JOIN keyword selects records that have matching values in both
tables.
INNER JOIN Syntax
SELECT column_name(s)
FROM table1
INNER JOIN table2
ON table1.column_name = table2.column_name;
Note: The INNER JOIN keyword selects all rows from both
tables as long as there is a match between the columns.
If there are records in the "Orders" table that do not have
matches in "Customers", these orders will not be shown!
The following SQL statement selects all orders with customer
information:
SELECT Orders.OrderID, Customers.CustomerName
FROM Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID;
2. Left Join:
The LEFT JOIN keyword returns all records from the left table
(table1), and the matching records (if any) from the right table
(table2).
LEFT JOIN Syntax
SELECT column_name(s)
FROM table1
LEFT JOIN table2
ON table1.column_name = table2.column_name;
Note: The LEFT JOIN keyword returns all records from the left table
(Customers), even if there are no matches in the right table
(Orders).
The following SQL statement will select all customers, and any
orders they might have:
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID
ORDER BY Customers.CustomerName;
3. Right Join:
The RIGHT JOIN keyword returns all records from the right table
(table2), and the matching records (if any) from the left table
(table1).
RIGHT JOIN Syntax
SELECT column_name(s)
FROM table1
RIGHT JOIN table2
ON table1.column_name = table2.column_name;
The following SQL statement will return all employees, and any
orders they might have placed:
SELECT Orders.OrderID, Employees.LastName, Employees.FirstName
FROM Orders
RIGHT JOIN Employees ON Orders.EmployeeID = Employees.EmployeeID
ORDER BY Orders.OrderID;
Note: The RIGHT JOIN keyword returns all records from the right
table (Employees), even if there are no matches in the left table
(Orders).
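To compare the join types side by side, here is a minimal sketch using Python's built-in sqlite3 module. The column names CustomerName and OrderID and all rows are assumptions made up for the example. RIGHT JOIN is left out because older SQLite versions do not support it; swapping the table order in a LEFT JOIN gives the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INT PRIMARY KEY, CustomerName TEXT)")
conn.execute("CREATE TABLE Orders (OrderID INT PRIMARY KEY, CustomerID INT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?)", [(1, "Ana"), (2, "Bo"), (3, "Cy")])
# Order 103 points at a customer that does not exist; Bo has no orders.
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)", [(100, 1), (101, 1), (102, 3), (103, 9)]
)

# INNER JOIN: only orders with a matching customer.
inner_rows = conn.execute("""
    SELECT Orders.OrderID, Customers.CustomerName
    FROM Orders
    INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID
    ORDER BY Orders.OrderID
""").fetchall()

# LEFT JOIN: every customer, with NULL (None in Python) where no order matches.
left_rows = conn.execute("""
    SELECT Customers.CustomerName, Orders.OrderID
    FROM Customers
    LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID
    ORDER BY Customers.CustomerName, Orders.OrderID
""").fetchall()

print(inner_rows)
print(left_rows)
```

Order 103 is missing from the inner join because it has no matching customer, while Bo still appears in the left join with an empty order column.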
📌 Advanced SQL
🔹 Subqueries
In SQL, a subquery can be defined as a query embedded within another
query. It is often used in the WHERE, HAVING, or FROM clauses of a
statement. Subqueries are commonly used with SELECT, UPDATE, INSERT,
and DELETE statements to achieve complex filtering and data
manipulation. They are an essential tool when we need to perform
operations like:
Filtering: Getting specific records based on conditions derived from
another query.
Aggregating: Performing aggregate functions like SUM, COUNT,
or AVG based on subquery results.
Updating: Dynamically updating records based on values from other
tables.
Deleting: Deleting records from one table using criteria based on
another.
While there is no universal syntax for subqueries, they are commonly used
in SELECT statements as follows. This general syntax allows the outer query
to use the results of the inner subquery for filtering or other operations.
Syntax
SELECT column_name
FROM table_name
WHERE column_name expression operator
(SELECT column_name FROM table_name WHERE …);
Key Characteristics of Subqueries
1. Nested Structure: A subquery is executed within the context of an
outer query.
2. Parentheses: Subqueries must always be enclosed in parentheses ().
3. Comparison Operators: Subqueries can be used with operators
like =, >, <, IN, NOT IN, LIKE, etc.
4. Single-Row vs. Multi-Row Subqueries: Subqueries may return a
single value (e.g., a single row) or multiple values. Depending on the
result, different SQL constructs may be required.
Common SQL Clauses for Subqueries
Subqueries are frequently used in specific SQL clauses to achieve more
complex results. Here are the common clauses where subqueries are used:
1. WHERE Clause: Subqueries in the WHERE clause help filter data based
on the results of another query. For example, you can filter records based on
values returned by a subquery.
2. FROM Clause: Subqueries can be used in the FROM clause to treat the
result of the subquery as a derived table or temporary table that can be
joined with other tables.
3. HAVING Clause: Subqueries in the HAVING clause allow you to filter
aggregated data after performing group operations.
Types of Subqueries
1. Single-Row Subquery: Returns a single value (row). Useful with
comparison operators like =, >, <.
2. Multi-Row Subquery: Returns multiple values (rows). Useful with
operators like IN, ANY, ALL.
3. Correlated Subquery: Refers to columns from the outer query in the
subquery. Unlike regular subqueries, the subquery depends on the
outer query for its values.
4. Non-Correlated Subquery: Does not refer to the outer query and
can be executed independently.
Examples of Using SQL Subqueries
These examples showcase how subqueries can be used for various
operations like selecting, updating, deleting, or inserting data, providing
insights into their syntax and functionality. Through these examples, we
will understand the flexibility and importance of subqueries in
simplifying complex database tasks. Consider the following two tables:
1. DATABASE table
2. STUDENT table
Example 1: Fetching Data Using Subquery in WHERE Clause
This example demonstrates how a subquery retrieves the roll numbers
of students in section ‘A’, and the outer query uses those roll numbers to
fetch the corresponding details (name, location, and phone number) from
the DATABASE table. This enables filtering based on results from another
table.
Query:
SELECT NAME, LOCATION, PHONE_NUMBER
FROM DATABASE
WHERE ROLL_NO IN (
SELECT ROLL_NO FROM STUDENT WHERE SECTION='A'
);
Output
NAME  LOCATION    PHONE_NUMBER
Ravi  Salem       8989898989
Raj   Coimbatore  8877665544
Explanation: The inner query fetches the roll numbers of students in section
‘A’. The outer query uses those roll numbers to filter records from
the DATABASE table.
Example 2: Using Subquery with INSERT
In this example, a subquery is used to insert all records from
the Student2 table into the Student1 table. The SELECT statement inside
the INSERT INTO statement fetches all the data from Student2 and inserts it
into Student1.
Student1 Table
NAME  ROLL_NO  LOCATION    PHONE_NUMBER
Ram   101      chennai     9988773344
Raju  102      coimbatore  9090909090
Ravi  103      salem       8989898989
Student2 Table
NAME  ROLL_NO  LOCATION    PHONE_NUMBER
Raj   111      chennai     8787878787
Sai   112      mumbai      6565656565
Sri   113      coimbatore  7878787878
Query:
INSERT INTO Student1
SELECT * FROM Student2;
Output
NAME  ROLL_NO  LOCATION    PHONE_NUMBER
Ram   101      chennai     9988773344
Raju  102      coimbatore  9090909090
Ravi  103      salem       8989898989
Raj   111      chennai     8787878787
Sai   112      mumbai      6565656565
Sri   113      coimbatore  7878787878
Explanation: The SELECT statement inside the INSERT INTO query fetches
all records from Student2 and inserts them into Student1.
Example 3: Using Subquery with DELETE
Subqueries are often used in DELETE statements to remove rows from a
table based on criteria derived from another table. The subquery retrieves
roll numbers of students from Student1 where the location is ‘chennai’.
The outer query then deletes records from Student2 whose roll numbers
match those from the subquery. This allows for targeted deletion based on
data from another table.
Query:
DELETE FROM Student2
WHERE ROLL_NO IN (SELECT ROLL_NO
FROM Student1
WHERE LOCATION = 'chennai');
Output
NAME  ROLL_NO  LOCATION    PHONE_NUMBER
Sai   112      mumbai      6565656565
Sri   113      coimbatore  7878787878
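Explanation: The subquery retrieves roll numbers of students
from Student1 who are located in ‘chennai’. Because Example 2 copied the
rows of Student2 into Student1, the subquery returns roll numbers 101 and
111. The outer query deletes the matching record (roll number 111)
from Student2.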
Example 4: Using Subquery with UPDATE
The subquery retrieves the locations of ‘Raju’ and ‘Ravi’ from Student1. The
outer query then updates the NAME in Student2 to ‘geeks’ for all students
whose LOCATION matches any of the retrieved locations. This allows for
updating data in Student2 based on conditions from Student1.
Query:
UPDATE Student2
SET NAME='geeks'
WHERE LOCATION IN (SELECT LOCATION
FROM Student1
WHERE NAME IN ('Raju', 'Ravi'));
Output
NAME   ROLL_NO  LOCATION    PHONE_NUMBER
Sai    112      mumbai      6565656565
geeks  113      coimbatore  7878787878
Explanation: The inner query fetches the locations of ‘Raju’ and ‘Ravi’
from Student1. The outer query updates the name to ‘geeks’
in Student2 where the location matches those of ‘Raju’ or ‘Ravi’.
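Examples 2 to 4 build on one another, so it can help to run them in sequence. The sketch below does that with Python's built-in sqlite3 module, using the Student1 and Student2 rows shown above; storing PHONE_NUMBER as text is an assumption of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("Student1", "Student2"):
    conn.execute(
        f"CREATE TABLE {table} (NAME TEXT, ROLL_NO INT, LOCATION TEXT, PHONE_NUMBER TEXT)"
    )
conn.executemany("INSERT INTO Student1 VALUES (?, ?, ?, ?)", [
    ("Ram", 101, "chennai", "9988773344"),
    ("Raju", 102, "coimbatore", "9090909090"),
    ("Ravi", 103, "salem", "8989898989"),
])
conn.executemany("INSERT INTO Student2 VALUES (?, ?, ?, ?)", [
    ("Raj", 111, "chennai", "8787878787"),
    ("Sai", 112, "mumbai", "6565656565"),
    ("Sri", 113, "coimbatore", "7878787878"),
])

# Example 2: copy every row of Student2 into Student1.
conn.execute("INSERT INTO Student1 SELECT * FROM Student2")
student1_count = conn.execute("SELECT COUNT(*) FROM Student1").fetchone()[0]

# Example 3: the subquery now returns roll numbers 101 and 111.
conn.execute("""
    DELETE FROM Student2
    WHERE ROLL_NO IN (SELECT ROLL_NO FROM Student1 WHERE LOCATION = 'chennai')
""")

# Example 4: rename students whose location matches that of Raju or Ravi.
conn.execute("""
    UPDATE Student2
    SET NAME = 'geeks'
    WHERE LOCATION IN (SELECT LOCATION FROM Student1 WHERE NAME IN ('Raju', 'Ravi'))
""")

student2_rows = conn.execute(
    "SELECT NAME, ROLL_NO, LOCATION FROM Student2 ORDER BY ROLL_NO"
).fetchall()
print(student1_count, student2_rows)
```

Running the steps in this order reproduces the final Student2 output shown for Example 4.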
🔹 Views
What is a View in SQL?
A view in SQL is a saved SQL query that acts as a virtual table. Unlike regular
tables, views do not store data themselves. Instead, they dynamically
generate data by executing the SQL query defined in the view each time it is
accessed. It can fetch data from one or more tables and present it in
a customized format, allowing developers to:
Simplify Complex Queries: Encapsulate complex joins and
conditions into a single object.
Enhance Security: Restrict access to specific columns or rows.
Present Data Flexibly: Provide tailored data views for different
users.
We will be using these two SQL tables for examples.
StudentDetails
-- Create StudentDetails table
CREATE TABLE StudentDetails (
S_ID INT PRIMARY KEY,
NAME VARCHAR(255),
ADDRESS VARCHAR(255)
);
INSERT INTO StudentDetails (S_ID, NAME, ADDRESS)
VALUES
(1, 'Harsh', 'Kolkata'),
(2, 'Ashish', 'Durgapur'),
(3, 'Pratik', 'Delhi'),
(4, 'Dhanraj', 'Bihar'),
(5, 'Ram', 'Rajasthan');
StudentMarks
-- Create StudentMarks table
CREATE TABLE StudentMarks (
ID INT PRIMARY KEY,
NAME VARCHAR(255),
Marks INT,
Age INT
);
INSERT INTO StudentMarks (ID, NAME, Marks, Age)
VALUES
(1, 'Harsh', 90, 19),
(2, 'Suresh', 50, 20),
(3, 'Pratik', 80, 19),
(4, 'Dhanraj', 95, 21),
(5, 'Ram', 85, 18);
CREATE VIEWS in SQL
We can create a view using CREATE VIEW statement. A View can be
created from a single table or multiple tables.
Syntax:
CREATE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;
Key Terms:
view_name: Name for the View
table_name: Name of the table
condition: Condition to select rows
Example 1: Creating a Simple View from a Single Table
Let’s look at some examples of the CREATE VIEW statement to get a
better understanding of how to create views in SQL. In this example, we
create a view named DetailsView from the table StudentDetails.
Query:
CREATE VIEW DetailsView AS
SELECT NAME, ADDRESS
FROM StudentDetails
WHERE S_ID < 5;
Use the below query to retrieve the data from this view
SELECT * FROM DetailsView;
Here, we will create a view named StudentNames from the table
StudentDetails.
Query:
CREATE VIEW StudentNames AS
SELECT S_ID, NAME
FROM StudentDetails
ORDER BY NAME;
If we now query the view as,
SELECT * FROM StudentNames;
Example 2: Creating a View From Multiple Tables
In this example we will create a View MarksView that combines data from
both tables StudentDetails and StudentMarks. To create a View from
multiple tables we can simply include multiple tables in
the SELECT statement.
Query:
CREATE VIEW MarksView AS
SELECT StudentDetails.NAME, StudentDetails.ADDRESS,
StudentMarks.Marks
FROM StudentDetails, StudentMarks
WHERE StudentDetails.NAME = StudentMarks.NAME;
To display data of View MarksView:
SELECT * FROM MarksView;
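The single-table and multi-table views can be tried end to end. This is a sketch using Python's built-in sqlite3 module with the StudentDetails and StudentMarks rows given above. It assumes MarksView joins the two tables on NAME and selects the name, address and marks; SQLite is only a stand-in and its views are read-only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE StudentDetails (S_ID INT PRIMARY KEY, NAME VARCHAR(255), ADDRESS VARCHAR(255))"
)
conn.execute(
    "CREATE TABLE StudentMarks (ID INT PRIMARY KEY, NAME VARCHAR(255), Marks INT, Age INT)"
)
conn.executemany("INSERT INTO StudentDetails VALUES (?, ?, ?)", [
    (1, "Harsh", "Kolkata"), (2, "Ashish", "Durgapur"), (3, "Pratik", "Delhi"),
    (4, "Dhanraj", "Bihar"), (5, "Ram", "Rajasthan"),
])
conn.executemany("INSERT INTO StudentMarks VALUES (?, ?, ?, ?)", [
    (1, "Harsh", 90, 19), (2, "Suresh", 50, 20), (3, "Pratik", 80, 19),
    (4, "Dhanraj", 95, 21), (5, "Ram", 85, 18),
])

# View over a single table.
conn.execute(
    "CREATE VIEW DetailsView AS SELECT NAME, ADDRESS FROM StudentDetails WHERE S_ID < 5"
)
# View over two tables, matched on NAME (assumed join column).
conn.execute("""
    CREATE VIEW MarksView AS
    SELECT StudentDetails.NAME, StudentDetails.ADDRESS, StudentMarks.Marks
    FROM StudentDetails, StudentMarks
    WHERE StudentDetails.NAME = StudentMarks.NAME
""")

details_rows = conn.execute("SELECT * FROM DetailsView ORDER BY NAME").fetchall()
marks_rows = conn.execute("SELECT * FROM MarksView ORDER BY NAME").fetchall()

# A view stores no data: a change in the base table is visible immediately.
conn.execute("UPDATE StudentMarks SET Marks = 92 WHERE NAME = 'Harsh'")
harsh_marks = conn.execute(
    "SELECT Marks FROM MarksView WHERE NAME = 'Harsh'"
).fetchone()[0]

print(details_rows)
print(marks_rows)
print(harsh_marks)
```

Ashish and Suresh are absent from MarksView because neither name appears in both tables.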
Managing Views: Listing, Updating, and Deleting
1. Listing all Views in a Database
We can list all the views in a database using the SHOW FULL
TABLES statement (MySQL) or by querying information_schema.
USE database_name;
SHOW FULL TABLES WHERE table_type LIKE '%VIEW';
Using information_schema
SELECT table_name
FROM information_schema.views
WHERE table_schema = 'database_name';
OR
SELECT table_schema, table_name, view_definition
FROM information_schema.views
WHERE table_schema = 'database_name';
2. Deleting a View
SQL allows us to delete an existing view. We can drop a view using
the DROP VIEW statement:
DROP VIEW view_name;
Example: In this example, we are deleting the View MarksView.
DROP VIEW MarksView;
3. Updating a View Definition
If we want to update the existing data within the view, use
the UPDATE statement.
UPDATE view_name
SET column1 = value1, column2 = value2, ..., columnN = valueN
WHERE [condition];
If you want to change the view definition without affecting the data, use
the CREATE OR REPLACE VIEW statement:
CREATE OR REPLACE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;
Note: Not all views can be updated using the UPDATE statement.
Rules to Update Views in SQL:
Certain conditions need to be satisfied to update a view. If any of these
conditions are not met, the view cannot be updated.
1. The SELECT statement which is used to create the view should not
include a GROUP BY clause or an ORDER BY clause.
2. The SELECT statement should not have the DISTINCT keyword.
3. The view should include every NOT NULL column of the base table, so
that rows inserted through the view are valid.
4. The view should not be created using nested queries or complex
queries.
5. The view should be created from a single table. If the view is created
using multiple tables, we will generally not be allowed to update the view.
Advanced Techniques with Views
1. Updating the Fields of a View
We can use the CREATE OR REPLACE VIEW statement to add or replace
fields in a view. If we want to update the view MarksView and add the
field Age to this view from the StudentMarks table, we can do this by:
Example:
CREATE OR REPLACE VIEW MarksView AS
SELECT StudentDetails.NAME, StudentDetails.ADDRESS,
StudentMarks.Marks, StudentMarks.Age
FROM StudentDetails, StudentMarks
WHERE StudentDetails.NAME = StudentMarks.NAME;
If we fetch all the data from MarksView now as:
SELECT * FROM MarksView;
2. Inserting Data into Views
We can insert a row in a view in the same way as we do in a table, using
the INSERT INTO statement. In the example below, we insert a new row into
the view DetailsView, which we created above in the example of “creating
views from a single table“. Because DetailsView does not expose S_ID, the
primary key of StudentDetails, the insert succeeds only if the database can
supply that value itself (for example, through an auto-generated key).
Example:
INSERT INTO DetailsView(NAME, ADDRESS)
VALUES('Suresh', 'Gurgaon');
If we fetch all the data from DetailsView now as,
SELECT * FROM DetailsView;
3. Deleting a row from a View
Deleting rows from a view is as simple as deleting rows from a table. We
can use the DELETE statement to delete rows from a view. Deleting a row
from a view deletes the row from the underlying table, and the change is
then reflected in the view. In this example, we delete the row that we
added to the view DetailsView in the insert example above.
Example:
DELETE FROM DetailsView
WHERE NAME = 'Suresh';
If we fetch all the data from DetailsView now as,
SELECT * FROM DetailsView;
4. WITH CHECK OPTION Clause
The WITH CHECK OPTION clause in SQL is a very useful clause for views. It
applies to an updatable view. It is used to prevent data modification (using
INSERT or UPDATE) if the condition in the WHERE clause in the CREATE VIEW
statement is not satisfied.
If we have used the WITH CHECK OPTION clause in the CREATE VIEW
statement, and if the UPDATE or INSERT clause does not satisfy the
conditions then they will return an error. In the below example, we are
creating a View SampleView from the StudentDetails Table with a WITH
CHECK OPTION clause.
Example:
CREATE VIEW SampleView AS
SELECT S_ID, NAME
FROM StudentDetails
WHERE NAME IS NOT NULL
WITH CHECK OPTION;
In this view, if we now try to insert a new row with a NULL value in the NAME
column, the database returns an error because the view was created with the
condition that NAME is NOT NULL. For example, even though the view
is updatable, the query below is not valid for this view:
INSERT INTO SampleView(S_ID)
VALUES(6);
🔹 Indexing
What Are Indexes in SQL?
An index in SQL is a schema object that improves the speed of data
retrieval operations on a table. It works by creating a separate data
structure that provides pointers to the rows in a table, making it faster to
look up rows based on specific column values. Indexes act as a table of
contents for a database, allowing the server to locate data quickly and
efficiently, reducing disk I/O operations.
Benefits of Indexes:
Faster Queries: Speeds up SELECT and JOIN operations.
Lower Disk I/O: Reduces the load on your database by limiting the
amount of data scanned.
Better Performance on Large Tables: Essential when working with
millions of records.
Creating an Index
Creating an index allows us to define a quick access path to data. SQL
indexes can be applied to one or more columns and can be
either unique or non-unique. Unique indexes ensure that no duplicate
values are entered in the indexed columns, while non-unique
indexes simply speed up queries without enforcing uniqueness. You can
create:
Single-column indexes: For basic queries
Multi-column indexes: For queries using multiple filters
Unique indexes: To ensure data uniqueness
Syntax:
CREATE INDEX index_name
ON table_name (column_name);
Example
CREATE INDEX idx_product_id
ON Sales (product_id);
Explanation:
This creates an index named idx_product_id on the product_id column in
the Sales table, improving the speed of queries that filter or join based on
this column.
Multi – Column Indexes
If queries often use more than one column in conditions, we can create
a multi-column index for better performance.
Syntax:
CREATE INDEX index_name
ON table_name (column1, column2, ...);
Example
CREATE INDEX idx_product_quantity
ON Sales (product_id, quantity);
Explanation:
This index allows the database to quickly filter or join data based on
both product_id and quantity columns.
Unique Indexes
A unique index ensures that all values in the indexed column(s) are unique,
preventing duplicates. These are useful for maintaining the integrity of the
data, ensuring that no two rows have the same values in the indexed
columns.
Syntax:
CREATE UNIQUE INDEX index_name
ON table_name (column_name);
Example
CREATE UNIQUE INDEX idx_unique_employee_id
ON Employees (employee_id);
Explanation:
This index ensures that no two rows in the Employees table have the
same employee_id, which maintains data integrity and prevents duplicate
entries.
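Both effects can be observed directly. The sketch below uses Python's built-in sqlite3 module: it asks SQLite for its query plan to check that the single-column index is used, and then shows a unique index rejecting a duplicate. The table layouts beyond the indexed columns, the generated rows, and the use of EXPLAIN QUERY PLAN (a SQLite feature) are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical Sales table filled with generated rows.
conn.execute("CREATE TABLE Sales (sale_id INTEGER PRIMARY KEY, product_id INT, quantity INT)")
conn.executemany(
    "INSERT INTO Sales (product_id, quantity) VALUES (?, ?)",
    [(i % 50, i % 7) for i in range(1000)],
)
conn.execute("CREATE INDEX idx_product_id ON Sales (product_id)")

# The last column of each plan row is a human-readable description.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Sales WHERE product_id = 5"
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
uses_index = "idx_product_id" in plan_text

# A unique index turns a duplicate key into an error.
conn.execute("CREATE TABLE Employees (employee_id INT, name TEXT)")
conn.execute("CREATE UNIQUE INDEX idx_unique_employee_id ON Employees (employee_id)")
conn.execute("INSERT INTO Employees VALUES (1, 'Asha')")
try:
    conn.execute("INSERT INTO Employees VALUES (1, 'Ben')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(plan_text)
print(uses_index, duplicate_rejected)
```

Other database systems expose the plan through their own commands, such as EXPLAIN in MySQL and PostgreSQL.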
Removing an Index
If an index is no longer needed, it can be removed to improve write
performance or save storage space. As indexes can slow down
operations like INSERT, UPDATE, and DELETE due to the overhead of
maintaining them, dropping unnecessary indexes can improve overall
database efficiency. The DROP INDEX command is used for this purpose.
Syntax
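DROP INDEX index_name;
Explanation:
This command removes an index from the database schema. It does not
affect the underlying data in the table but may slow down future queries that
would have benefited from the index. The exact form varies by system:
MySQL and SQL Server, for example, also require the table name
(DROP INDEX index_name ON table_name).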
Altering an Index
If an index requires adjustments, such as reorganizing or rebuilding, it
can be altered without affecting the data. This is useful for optimizing
index performance as tables grow larger.
Syntax:
ALTER INDEX IndexName
ON TableName REBUILD;
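Explanation:
This command rebuilds the specified index, which can optimize query
performance by reorganizing its structure, especially in large tables.
The form shown is SQL Server syntax; other systems differ (Oracle uses
ALTER INDEX index_name REBUILD, and PostgreSQL uses REINDEX).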
Confirming and Viewing Indexes
We can view the indexes in a database to understand which ones are in
use and confirm their structure. In Oracle, the following query lists the
indexes owned by the current user:
Syntax:
SELECT * FROM USER_INDEXES;
Explanation:
This query retrieves the indexes in the current user's schema, showing their
names and the tables they belong to (the indexed columns are listed in
USER_IND_COLUMNS). We can use this information to audit or manage
existing indexes. Other systems provide their own catalog views or
commands for the same purpose.
Renaming an Index
In some cases, renaming an index might be necessary for clarity or
consistency. Standard SQL does not define a way to rename an index, but
many systems provide their own. In SQL Server, the sp_rename stored
procedure can be used:
Syntax:
EXEC sp_rename 'table_name.old_index_name', 'new_index_name', 'INDEX';
Explanation:
This command renames an existing index, which helps
maintain clarity in our database schema.
When Should Indexes Be Created?
Indexes can significantly improve query performance, but they should be
used judiciously. The following scenarios warrant creating indexes:
1. Wide Range of Values: Indexes are helpful when a column has a
wide range of values, such as product IDs or customer names, as they
speed up search operations.
2. Non-NULL Values: Columns that don’t contain many NULL values are
ideal candidates for indexing, as NULLs complicate the indexing
process.
3. Frequent Query Conditions: Indexes should be created on columns
frequently used in WHERE clauses or as part of a join condition.
When Should Indexes Be Avoided?
While indexes enhance performance, they may not always be beneficial,
especially in certain situations:
1. Small Tables: Indexes are not needed for small tables as queries will
likely perform well without them.
2. Infrequent Query Use: If a column is rarely used in queries, indexing
it will only add overhead.
3. Frequently Updated Columns: Avoid indexing columns that are
frequently updated, as the index will need to be updated with each
change, adding overhead.
Why Is SQL Indexing Important?
Indexing in SQL is a critical feature for optimizing query performance,
especially for large datasets. Here are some common scenarios where
indexing proves beneficial:
1. Large Data Tables: SQL queries on tables with millions of rows can
significantly slow down due to full table scans. Indexes provide a faster
alternative by allowing quick access to relevant rows.
2. Join Optimization: Indexes on columns used for joining tables (such
as foreign keys) improve the performance of complex joins.
3. Search Operations: Queries that search for specific values in a
column can be sped up with indexes, reducing the time required to
perform lookups.
However, it is essential to be mindful of the storage cost and
performance tradeoffs associated with indexes. Over-indexing can
lead to unnecessary overhead, while under-indexing may slow down
data retrieval.
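The difference between a full table scan and an index lookup can be observed directly. The sketch below uses Python's built-in sqlite3 module with SQLite as a stand-in DBMS; the Customers table, its data, and the index name idx_customers_name are made up for illustration, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO Customers (name, city) VALUES (?, ?)",
    [(f"customer{i}", f"city{i % 10}") for i in range(1000)],
)

QUERY = "SELECT * FROM Customers WHERE name = 'customer500'"


def query_plan(conn, sql):
    """Return the 'detail' text of every step SQLite plans for `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


plan_without_index = query_plan(conn, QUERY)   # planner has to scan the table
conn.execute("CREATE INDEX idx_customers_name ON Customers (name)")
plan_with_index = query_plan(conn, QUERY)      # planner can search the index

print(plan_without_index)
print(plan_with_index)
```

Other systems expose the same information through their own EXPLAIN variants, so the plan should be checked on the DBMS actually in use.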
🔹 Transactions
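A transaction groups several statements into a single unit of work. COMMIT
makes all of its changes permanent; ROLLBACK undoes them, so data integrity
is preserved even if one step fails.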
🔸 Example: Transfer money between accounts
BEGIN TRANSACTION;
UPDATE Accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE Accounts SET balance = balance + 100 WHERE account_id = 2;
COMMIT;
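The transfer above only shows the COMMIT path. A minimal sketch of the ROLLBACK path, using Python's built-in sqlite3 module, might look like this. The CHECK (balance >= 0) rule, the starting balances, and the helper names transfer and balances are assumptions added for the demonstration.

```python
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode, so the explicit
# BEGIN / COMMIT / ROLLBACK statements below control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE Accounts ("
    "account_id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO Accounts VALUES (1, 150), (2, 50)")


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; undo both updates if either fails."""
    try:
        conn.execute("BEGIN")
        conn.execute(
            "UPDATE Accounts SET balance = balance + ? WHERE account_id = ?",
            (amount, dst),
        )
        conn.execute(
            "UPDATE Accounts SET balance = balance - ? WHERE account_id = ?",
            (amount, src),
        )
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")  # the credit to dst is undone as well
        return False


def balances(conn):
    """Return {account_id: balance} for all accounts."""
    return dict(conn.execute("SELECT account_id, balance FROM Accounts"))


first = transfer(conn, 1, 2, 100)    # succeeds
after_first = balances(conn)
second = transfer(conn, 1, 2, 100)   # would overdraw account 1
after_second = balances(conn)
```

The second call credits account 2 before the debit fails, yet after the ROLLBACK neither change is visible: the two updates succeed or fail together.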
🔹 Data Constraints
1. NULL and NOT NULL:
What is a NULL Value?
A field with a NULL value is a field with no value.
If a field in a table is optional, it is possible to insert a new
record or update a record without adding a value to this field.
Then, the field will be saved with a NULL value.
Note: A NULL value is different from a zero value or a field that
contains spaces. A field with a NULL value is one that has been
left blank during record creation!
How to Test for NULL Values?
It is not possible to test for NULL values with comparison
operators, such as =, <, or <>.
We will have to use the IS NULL and IS NOT NULL operators
instead.
IS NULL Syntax:
SELECT column_names
FROM table_name
WHERE column_name IS NULL;
IS NOT NULL Syntax:
SELECT column_names
FROM table_name
WHERE column_name IS NOT NULL;
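The effect of comparing with = versus IS NULL can be checked with a small experiment. This is a sketch using Python's built-in sqlite3 module; the Persons rows and the Address column are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (ID INTEGER, LastName TEXT, Address TEXT)")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?)",
    [
        (1, "Hansen", "Timoteivn 10"),
        (2, "Svendson", None),   # Python None is stored as NULL
        (3, "Pettersen", ""),    # an empty string is NOT the same as NULL
    ],
)


def ids(sql):
    """Run a query and return the selected IDs as a list."""
    return [row[0] for row in conn.execute(sql)]


equals_null = ids("SELECT ID FROM Persons WHERE Address = NULL ORDER BY ID")
is_null = ids("SELECT ID FROM Persons WHERE Address IS NULL ORDER BY ID")
is_not_null = ids("SELECT ID FROM Persons WHERE Address IS NOT NULL ORDER BY ID")
```

Here equals_null is empty because a comparison with NULL is never true, is_null contains only ID 2, and the row with the empty string counts as NOT NULL.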
2. UNIQUE:
SQL UNIQUE Constraint
The UNIQUE constraint ensures that all values in a column are
different.
Both the UNIQUE and PRIMARY KEY constraints provide a guarantee
for uniqueness for a column or set of columns.
A PRIMARY KEY constraint automatically has a UNIQUE constraint.
However, you can have many UNIQUE constraints per table, but only
one PRIMARY KEY constraint per table.
SQL UNIQUE Constraint on CREATE TABLE
The following SQL creates a UNIQUE constraint on the "ID" column
when the "Persons" table is created:
SQL Server / Oracle / MS Access:
CREATE TABLE Persons (
ID int NOT NULL UNIQUE,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int
);
MySQL:
CREATE TABLE Persons (
ID int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int,
UNIQUE (ID)
);
To name a UNIQUE constraint, and to define a UNIQUE constraint on
multiple columns, use the following SQL syntax:
MySQL / SQL Server / Oracle / MS Access:
CREATE TABLE Persons (
ID int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int,
CONSTRAINT UC_Person UNIQUE (ID,LastName)
);
SQL UNIQUE Constraint on ALTER TABLE
To create a UNIQUE constraint on the "ID" column when the table is
already created, use the following SQL:
MySQL / SQL Server / Oracle / MS Access:
ALTER TABLE Persons
ADD UNIQUE (ID);
To name a UNIQUE constraint, and to define a UNIQUE constraint on
multiple columns, use the following SQL syntax:
MySQL / SQL Server / Oracle / MS Access:
ALTER TABLE Persons
ADD CONSTRAINT UC_Person UNIQUE (ID,LastName);
DROP a UNIQUE Constraint
To drop a UNIQUE constraint, use the following SQL:
MySQL:
ALTER TABLE Persons
DROP INDEX UC_Person;
SQL Server / Oracle / MS Access:
ALTER TABLE Persons
DROP CONSTRAINT UC_Person;
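To see a multi-column UNIQUE constraint at work, the named constraint UC_Person from above can be tried out in SQLite through Python's built-in sqlite3 module. This is only a sketch: the inserted rows and the try_insert helper are illustrative, and error reporting differs between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Persons (
        ID int NOT NULL,
        LastName varchar(255) NOT NULL,
        FirstName varchar(255),
        Age int,
        CONSTRAINT UC_Person UNIQUE (ID, LastName)
    )
    """
)


def try_insert(row):
    """Insert one row and report whether the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO Persons VALUES (?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


first = try_insert((1, "Hansen", "Ola", 30))
same_id_other_name = try_insert((1, "Svendson", "Tove", 23))
duplicate_pair = try_insert((1, "Hansen", "Kari", 40))
```

Only the third insert is rejected: the constraint covers the combination (ID, LastName), so a repeated ID alone is still allowed.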
3. CHECK:
SQL CHECK Constraint
The CHECK constraint is used to limit the value range that can be
placed in a column.
If you define a CHECK constraint on a column it will allow only
certain values for this column.
If you define a CHECK constraint on a table it can limit the values in
certain columns based on values in other columns in the row.
SQL CHECK on CREATE TABLE
The following SQL creates a CHECK constraint on the "Age" column
when the "Persons" table is created. The CHECK constraint ensures
that the age of a person must be 18, or older:
MySQL:
CREATE TABLE Persons (
ID int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int,
CHECK (Age>=18)
);
SQL Server / Oracle / MS Access:
CREATE TABLE Persons (
ID int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int CHECK (Age>=18)
);
To allow naming of a CHECK constraint, and for defining
a CHECK constraint on multiple columns, use the following SQL
syntax:
MySQL / SQL Server / Oracle / MS Access:
CREATE TABLE Persons (
ID int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int,
City varchar(255),
CONSTRAINT CHK_Person CHECK (Age>=18 AND City='Sandnes')
);
SQL CHECK on ALTER TABLE
To create a CHECK constraint on the "Age" column when the table is
already created, use the following SQL:
MySQL / SQL Server / Oracle / MS Access:
ALTER TABLE Persons
ADD CHECK (Age>=18);
To allow naming of a CHECK constraint, and for defining
a CHECK constraint on multiple columns, use the following SQL
syntax:
MySQL / SQL Server / Oracle / MS Access:
ALTER TABLE Persons
ADD CONSTRAINT CHK_PersonAge CHECK (Age>=18 AND City='Sandnes');
DROP a CHECK Constraint
To drop a CHECK constraint, use the following SQL:
SQL Server / Oracle / MS Access:
ALTER TABLE Persons
DROP CONSTRAINT CHK_PersonAge;
MySQL:
ALTER TABLE Persons
DROP CHECK CHK_PersonAge;
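A quick way to confirm what a CHECK constraint accepts is to try a few inserts. The sketch below runs the Age>=18 rule in SQLite via Python's built-in sqlite3 module; the sample rows and the try_insert helper are illustrative. It also shows one detail that is easy to miss: in SQLite, a NULL age does not violate the constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Persons (
        ID int NOT NULL,
        LastName varchar(255) NOT NULL,
        FirstName varchar(255),
        Age int,
        CONSTRAINT CHK_PersonAge CHECK (Age >= 18)
    )
    """
)


def try_insert(row):
    """Insert one row and report whether the CHECK constraint let it through."""
    try:
        conn.execute("INSERT INTO Persons VALUES (?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


adult = try_insert((1, "Hansen", "Ola", 30))
minor = try_insert((2, "Svendson", "Tove", 17))
unknown_age = try_insert((3, "Pettersen", "Kari", None))  # Age is NULL
```

If an age must always be present, combine the CHECK constraint with NOT NULL on the same column.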
📌 Database Design
🔹 Normalization
Normalization is an important process in database design that helps improve
the database’s efficiency, consistency, and accuracy. It makes it easier to
manage and maintain the data and ensures that the database is adaptable
to changing business needs.
Database normalization is the process of organizing the attributes of
the database to reduce or eliminate data redundancy (having the same
data but at different places).
Data redundancy unnecessarily increases the size of the database as
the same data is repeated in many places. Inconsistency problems also
arise during insert, delete, and update operations.
In the relational model, there exist standard methods to quantify how
efficient a database is. These methods are called normal forms, and
there are algorithms to convert a given database into normal forms.
Normalization generally involves splitting a table into multiple ones
which must be linked each time a query is made requiring data from
the split tables.
Why do we need Normalization?
The primary objective of normalizing relations is to eliminate the anomalies
listed below. These anomalies are a consequence of data redundancy; left in
place, they threaten data integrity and cause additional issues as the database
grows. Normalization consists of a set of procedures that assist you in
developing an effective database structure.
Insertion Anomalies: Insertion anomalies occur when a fact cannot be
recorded because other, unrelated data is not yet available. For example,
if a single table stores students together with the courses they take and
its primary key includes the student ID, a new course cannot be inserted
until at least one student enrolls in it.
Deletion anomalies: Deletion anomalies occur when deleting a
record unintentionally removes other information as well. For example,
if a single table holds both customer and order details, deleting a
customer's only order also deletes everything known about that customer.
Update anomalies: Update anomalies occur when the same fact is stored
in several records and only some of them are modified. For example,
if an employee's salary is repeated in several records, updating it in
one record but not in all related records could lead to incorrect
calculations and reporting.
Prerequisites for Understanding Database Normalization
In database normalization, we mainly put only tightly related
information together. To find the closeness, we need to find which
attributes are dependent on each other. To understand dependencies,
we need to learn the below concepts.
Keys are like unique identifiers in a table. For example, in a table of
students, the student ID is a key because it uniquely identifies each
student. Without keys, it would be hard to tell one record apart from
another, especially if some information (like names) is the same. Keys
ensure that data is not duplicated and that every record can be
uniquely accessed.
Functional dependency helps define the relationships between data in
a table. For example, if you know a student’s ID, you can find their
name, age, and class. This relationship shows how one piece of data
(like the student ID) determines other pieces of data in the same table.
Functional dependency helps us understand these rules and
connections, which are crucial for organizing data properly.
Once we figure out dependencies, we split tables to make sure that
only closely related data is together in a table. When we split tables,
we need to ensure that we do not lose information. For this, we need
to learn the below concepts.
Dependency Preserving Decomposition
Lossless Decomposition in DBMS
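Both ideas can be made concrete with attribute closure, the set of all attributes that a given set of attributes determines. The sketch below is an illustration, not a full design tool: it encodes the student example (student ID determines name, age, and class) with made-up column names and applies the standard test for a two-table split, which is lossless when the shared attributes determine all of one of the two tables.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """A split into r1 and r2 is lossless if their shared attributes
    determine all of r1 or all of r2."""
    shared_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= shared_closure or set(r2) <= shared_closure


# STUDENT_ID -> NAME, AGE, CLASS (column names are illustrative)
fds = [(("STUDENT_ID",), ("NAME", "AGE", "CLASS"))]

determined = closure({"STUDENT_ID"}, fds)
good_split = is_lossless(("STUDENT_ID", "NAME"), ("STUDENT_ID", "AGE", "CLASS"), fds)
bad_split = is_lossless(("STUDENT_ID", "NAME"), ("NAME", "AGE", "CLASS"), fds)
```

The second split shares only NAME, and since names can repeat, joining the two tables back together could pair a student with another student's age and class.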
Features of Database Normalization
Elimination of Data Redundancy: One of the main features of
normalization is to eliminate the data redundancy that can occur in a
database. Data redundancy refers to the repetition of data in different
parts of the database. Normalization helps in reducing or eliminating
this redundancy, which can improve the efficiency and consistency of
the database.
Ensuring Data Consistency: Normalization helps in ensuring that
the data in the database is consistent and accurate. By eliminating
redundancy, normalization helps in preventing inconsistencies and
contradictions that can arise due to different versions of the same
data.
Simplification of Data Management: Normalization simplifies the
process of managing data in a database. By breaking down a complex
data structure into simpler tables, normalization makes it easier to
manage the data, update it, and retrieve it.
Improved Database Design: Normalization helps in improving the
overall design of the database. By organizing the data in a structured
and systematic way, normalization makes it easier to design and
maintain the database. It also makes the database more flexible and
adaptable to changing business needs.
Avoiding Update Anomalies: Normalization helps in avoiding update
anomalies, which occur when the same fact is stored in several rows and
a change has to be applied to every copy. Normalization ensures that
each table describes only one kind of fact and that the relationships
between the tables are clearly defined, so that each fact is updated in
one place.
Standardization: Normalization helps in standardizing the data in the
database. By organizing the data into tables and defining relationships
between them, normalization helps in ensuring that the data is stored
in a consistent and uniform manner.
Normal Forms in DBMS
First Normal Form
A relation is in first normal form (1NF) if every attribute in it is a
single-valued (atomic) attribute. A relation that contains a composite or
multi-valued attribute violates the first normal form.
A table is in 1NF if:
There are only Single Valued Attributes.
Attribute Domain does not change.
There is a unique name for every Attribute/Column.
The order in which data is stored does not matter.
Rules for First Normal Form (1NF) in DBMS
To follow the First Normal Form (1NF) in a database, these simple rules
must be followed:
1. Every Column Should Have Single Values
Each column in a table must contain only one value in a cell. No cell
should hold multiple values. If a cell contains more than one value, the
table does not follow 1NF.
Example: A table with columns like [Writer 1], [Writer 2], and [Writer
3] for the same book ID is not in 1NF because it repeats the same type
of information (writers). Instead, all writers should be listed in separate
rows.
2. All Values in a Column Should Be of the Same Type
Each column must store the same type of data. You cannot mix
different types of information in the same column.
Example: If a column is meant for dates of birth (DOB), you cannot
use it to store names. Each type of information should have its own
column.
3. Every Column Must Have a Unique Name
Each column in the table must have a unique name. This avoids
confusion when retrieving, updating, or adding data.
Example: If two columns have the same name, the database system
may not know which one to use.
4. The Order of Data Doesn’t Matter
In 1NF, the order in which data is stored in a table doesn’t affect how
the table works. You can organize the rows in any way without
breaking the rules.
Example:
Consider the below COURSES Relation :
In the above table, Courses has a multi-valued attribute, so it is not in
1NF. The Below Table is in 1NF as there is no multi-valued attribute.
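The conversion to 1NF can be sketched in a few lines of Python. The rows, names, and column labels below are hypothetical sample data and are not taken from the COURSES table; the point is only that each value of the multi-valued attribute ends up in its own row.

```python
# Hypothetical data: each row carries a list of courses (a multi-valued attribute).
unnormalized = [
    {"STUD_NO": 1, "STUD_NAME": "Asha", "COURSE": ["DBMS", "Networks"]},
    {"STUD_NO": 2, "STUD_NAME": "Ravi", "COURSE": ["DBMS"]},
]


def to_1nf(rows, multi_valued):
    """Emit one row per value of the multi-valued column."""
    flat = []
    for row in rows:
        for value in row[multi_valued]:
            flat.append({**row, multi_valued: value})
    return flat


normalized = to_1nf(unnormalized, "COURSE")
for row in normalized:
    print(row)
```

After the conversion every cell holds a single value, at the cost of repeating STUD_NAME. Removing that repetition is the job of the higher normal forms.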
What is Second Normal Form (2NF)?
Second Normal Form (2NF) is based on the concept of fully functional
dependency. It is a way to organize a database table so that it
reduces redundancy and ensures data consistency. For a table to be in
2NF, it must first meet the requirements of First Normal Form (1NF),
meaning all columns should contain single, indivisible values without any
repeating groups. Additionally, the table should not have partial
dependencies.
The primary goal of Second Normal Form is to eliminate partial
dependencies. A partial dependency happens when a non-prime
attribute (an attribute not part of any candidate key) depends on only a part of
a composite candidate key, rather than on the entire key. Removing these
partial dependencies helps in reducing redundancy and preventing update
anomalies.
Example of Second Normal Form (2NF)
Consider a table storing information about students, courses, and their fees:
There are many courses having the same course fee. Here,
COURSE_FEE cannot alone decide the value of COURSE_NO or
STUD_NO.
COURSE_FEE together with STUD_NO cannot decide the value of
COURSE_NO.
COURSE_FEE together with COURSE_NO cannot decide the value of
STUD_NO.
The candidate key for this table is {STUD_NO, COURSE_NO} because
the combination of these two columns uniquely identifies each row in
the table.
COURSE_FEE is a non-prime attribute because it is not part of the
candidate key {STUD_NO, COURSE_NO}.
But, COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is dependent on
COURSE_NO, which is a proper subset of the candidate key.
Therefore, Non-prime attribute COURSE_FEE is dependent on a proper
subset of the candidate key, which is a partial dependency and so this
relation is not in 2NF.
In 2NF, we eliminate such dependencies by breaking the table into
two separate tables:
1. A table that links students and courses.
2. A table that stores course fees.
Now, each table is in 2NF:
The Course Table ensures that COURSE_FEE depends only
on COURSE_NO.
The Student-Course Table ensures there are no partial dependencies
because it only relates students to courses.
Now, the COURSE_FEE is no longer repeated in every row, and each table is
free from partial dependencies. This makes the database more
efficient and easier to maintain.
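The decomposition can be tried out on a small data set. The rows below are hypothetical sample values for (STUD_NO, COURSE_NO, COURSE_FEE). The sketch projects them onto the two tables and then joins the tables again to confirm that no information was lost.

```python
# Hypothetical rows: (STUD_NO, COURSE_NO, COURSE_FEE)
enrollment = [
    (1, "C1", 1000),
    (2, "C2", 1500),
    (1, "C4", 2000),
    (4, "C3", 1000),
    (4, "C1", 1000),
    (2, "C5", 2000),
]

# Table 1: which student takes which course.
student_course = sorted({(stud, course) for stud, course, _ in enrollment})
# Table 2: the fee of each course, stored exactly once per course.
course_fee = sorted({(course, fee) for _, course, fee in enrollment})

# Joining the two tables on COURSE_NO rebuilds the original rows.
fee_of = dict(course_fee)
rejoined = sorted((stud, course, fee_of[course]) for stud, course in student_course)

print(len(enrollment), "original rows;", len(course_fee), "fee rows")
```

In the original table the fee of C1 appears in two rows; after the split it is stored once, and a fee change touches a single row.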
Why is 2NF Important?
By ensuring that a database table adheres to Second Normal Form, we
achieve several key benefits:
1. Reduces Redundancy: In our example, we no longer store the same
course fee multiple times. Instead, we store it once in the Course Fee table
and reference it in the Student-Course table.
2. Minimizes Update Anomalies: With data being centralized in the right
tables, you’re less likely to run into problems when you update or delete
information. For example, if a course fee changes, you only need to update it
in one place.
3. Improves Data Integrity: By eliminating partial dependencies, 2NF
ensures that the database structure is logical, which in turn ensures that
data relationships are consistent.
4. Enhances Query Efficiency: Queries that need only one of the tables can be
more efficient, as the tables are smaller and more focused on specific data.
Queries that need data from both tables have to join them.
What is Partial Dependency?
A functional dependency X→Y, where X and Y are attribute sets of a
relation, is a partial dependency if some attribute A∈X can be
removed and the dependency still holds. For example, if X is a composite
candidate key (made of multiple columns) and we can remove one column from X
while X→Y still holds, then it is a partial dependency.
In a composite key (a key made of multiple attributes), a partial
dependency happens when one of the non-prime attributes depends only on
a part of the composite key. Here’s how to identify partial dependencies in
your database:
Look for functional dependencies where one attribute depends on a
part of the primary key, not the entire key.
If an attribute (like COURSE_FEE in our example) depends on just a
part of the key (COURSE_NO), it’s a partial dependency.
To remove partial dependencies, break the table into smaller tables
that store only relevant data together.
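The identification steps above can be automated with attribute closure. The sketch below tries every proper subset of the candidate key and reports the non-prime attributes it determines. The helper names closure and partial_dependencies are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(candidate_key, fds):
    """List (key_part, attribute) pairs where a proper subset of the key
    determines a non-prime attribute."""
    key = sorted(candidate_key)
    found = []
    for size in range(1, len(key)):
        for part in combinations(key, size):
            for attr in sorted(closure(part, fds) - set(key)):
                found.append((part, attr))
    return found


fds = [(("COURSE_NO",), ("COURSE_FEE",))]
partial = partial_dependencies({"STUD_NO", "COURSE_NO"}, fds)
print(partial)
```

This version treats only the attributes of the given key as prime, which is enough when the relation has a single candidate key, as in the course fee example.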
Third Normal Form (3NF)
A relation is in the third normal form if it is in the second normal form and
there is no transitive dependency for non-prime attributes. A relation is
in 3NF if at least one of the following conditions holds for every non-trivial
functional dependency X -> Y:
X is a super key.
Y is a prime attribute (each element of Y is part of some candidate
key).
In other words,
A relation is in Third Normal Form (3NF) if it is in First and Second Normal
Form and no non-primary-key attribute is transitively dependent on the
primary key.
Note:
If A->B and B->C are two FDs then A->C is called transitive dependency.
The normalization of 2NF relations to 3NF involves the removal of transitive
dependencies. If a transitive dependency exists, we remove the transitively
dependent attribute(s) from the relation by placing the attribute(s) in a new
relation along with a copy of the determinant. Consider the examples given
below.
Example : Consider the below Relation,
In the relation CANDIDATE given above:
Functional dependency set: {CAND_NO -> CAND_NAME, CAND_NO -> CAND_STATE,
CAND_STATE -> CAND_COUNTRY, CAND_NO -> CAND_AGE}
So, Candidate key here would be: {CAND_NO}
For the relation given here in the table, CAND_NO -> CAND_STATE and
CAND_STATE -> CAND_COUNTRY are actually true. Thus,
CAND_COUNTRY depends transitively on CAND_NO. This transitive
relation violates the rules of being in the 3NF. So, if we want to convert
it into the third normal form, then we have to decompose the relation
CANDIDATE (CAND_NO, CAND_NAME, CAND_STATE, CAND_COUNTRY,
CAND_AGE) as:
CANDIDATE (CAND_NO, CAND_NAME, CAND_STATE, CAND_AGE)
STATE_COUNTRY (STATE, COUNTRY).
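The 3NF condition (X is a super key, or Y is prime) can be checked mechanically. The sketch below applies it to the CANDIDATE relation before and after the decomposition. The helper names are illustrative, and the second relation is written with the CAND_ prefixed column names so that the same dependencies can be reused.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def violations_3nf(attrs, fds, candidate_keys):
    """Return the FDs X -> Y where X is not a super key and Y is not prime."""
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        new_attrs = set(rhs) - set(lhs)              # ignore the trivial part
        is_superkey = closure(lhs, fds) >= set(attrs)
        if new_attrs and not is_superkey and not new_attrs <= prime:
            bad.append((lhs, rhs))
    return bad


candidate = ("CAND_NO", "CAND_NAME", "CAND_STATE", "CAND_COUNTRY", "CAND_AGE")
candidate_fds = [
    (("CAND_NO",), ("CAND_NAME",)),
    (("CAND_NO",), ("CAND_STATE",)),
    (("CAND_STATE",), ("CAND_COUNTRY",)),
    (("CAND_NO",), ("CAND_AGE",)),
]
before = violations_3nf(candidate, candidate_fds, [("CAND_NO",)])

# After the split, each relation keeps only the FDs that fit inside it.
r1 = ("CAND_NO", "CAND_NAME", "CAND_STATE", "CAND_AGE")
r2 = ("CAND_STATE", "CAND_COUNTRY")
r1_fds = [fd for fd in candidate_fds if set(fd[0]) | set(fd[1]) <= set(r1)]
r2_fds = [fd for fd in candidate_fds if set(fd[0]) | set(fd[1]) <= set(r2)]
after = violations_3nf(r1, r1_fds, [("CAND_NO",)]) + violations_3nf(
    r2, r2_fds, [("CAND_STATE",)]
)
```

Before the split, the only violation is CAND_STATE -> CAND_COUNTRY. After it, neither relation has a violating dependency.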
Example 2: Consider Relation R(A, B, C, D, E)
A -> BC,
CD -> E,
B -> D,
E -> A
All possible candidate keys in the above relation are {A, E, CD, BC}. All
attributes on the right-hand sides of the functional dependencies are prime.
Therefore, the above relation is in 3NF.
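The candidate keys of Example 2 can be double-checked by brute force: try attribute sets from smallest to largest and keep those whose closure covers the whole relation. This sketch is fine for small relations but grows exponentially with the number of attributes. Each attribute is a single character, so plain strings are used as attribute sets.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Return all minimal attribute sets whose closure is the whole relation."""
    attrs = sorted(set(attrs))
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            if any(set(key) <= set(combo) for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) >= set(attrs):
                keys.append(combo)
    return keys


fds = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
keys = candidate_keys("ABCDE", fds)
prime_attributes = set().union(*keys)
print(["".join(key) for key in keys])
```

Every attribute of R belongs to at least one candidate key, so no dependency can have a non-prime right-hand side.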
Note:
Third Normal Form (3NF) is considered adequate for normal relational
database design because most 3NF tables are free of insertion,
update, and deletion anomalies. Moreover, a lossless, dependency-preserving
decomposition into 3NF always exists.
What is Transitive Dependency?
A transitive dependency occurs when a non-key attribute depends on
another non-key attribute rather than directly on the primary key. For
instance, consider a table with the attributes (A, B, C) where A is the primary
key and B and C are non-key attributes. If B determines C, then C is
transitively dependent on A through B. This can lead to data anomalies
and redundancy, which 3NF aims to eliminate by ensuring that all non-key
attributes depend only on the primary key.
Boyce-Codd Normal Form (BCNF)
Boyce-Codd Normal Form (BCNF) is a stricter version of Third Normal Form
(3NF). It requires that every non-trivial functional dependency has a superkey
on its left-hand side. This addresses the anomalies that 3NF can still allow
when a relation has overlapping candidate keys, and it removes the redundancy
caused by functional dependencies.
BCNF eliminates redundancy more effectively than 3NF by strictly requiring
that all functional dependencies originate from super-keys.
BCNF matters most in schema design where consistency and efficiency are
important, particularly when there are many candidate keys (as one often
finds with a delivery system).
Rules for BCNF
Rule 1: The table should be in the 3rd Normal Form.
Rule 2: X should be a super-key for every non-trivial functional dependency
(FD) X−>Y in a given relation.
Note: To test whether a relation is in BCNF, we identify all the determinants
and make sure that they are candidate keys.
To determine the highest normal form of a given relation R with functional
dependencies, the first step is to check whether the BCNF condition holds. If
R is found to be in BCNF, it can be safely deduced that the relation is also
in 3NF, 2NF, and 1NF. The 1NF has the least restrictive constraint – it only
requires a relation R to have atomic values in each tuple. The 2NF has a
slightly more restrictive constraint.
The 3NF has a more restrictive constraint than the first two normal forms but
is less restrictive than the BCNF. In this manner, the restriction increases as
we traverse down the hierarchy.
The following basic examples illustrate the properties of BCNF.
Example 1
Consider a relation R with attributes (student, teacher, subject).
FD: { (student, Teacher) -> subject, (student, subject) -> Teacher, (Teacher)
-> subject}
Candidate keys are (student, teacher) and (student, subject).
The above relation is in 3NF: the only FD whose left side is not a super key
is (Teacher) -> subject, and its right side, subject, is a prime attribute.
A relation R is in BCNF if for every non-trivial FD X->Y, X is a super key.
The above relation is not in BCNF, because in the FD (teacher -> subject),
teacher is not a key. This relation suffers from anomalies.
For example, if we delete the student Tahira, we will also lose the
information that her teacher teaches C. This issue occurs because the
teacher is a determinant but not a candidate key.
R is divided into two relations R1(Teacher, Subject) and R2(Student,
Teacher).
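The BCNF test for this example can be written as a short check: a non-trivial FD violates BCNF when the closure of its left side does not cover the whole relation. The helper names are illustrative, and the FD lists for R1 and R2 are the dependencies that fit inside each relation.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Return the non-trivial FDs whose left side is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and not closure(lhs, fds) >= set(attrs)
    ]


r = ("student", "teacher", "subject")
r_fds = [
    (("student", "teacher"), ("subject",)),
    (("student", "subject"), ("teacher",)),
    (("teacher",), ("subject",)),
]
violations_r = bcnf_violations(r, r_fds)

# After the split: R1(teacher, subject) keeps teacher -> subject,
# R2(student, teacher) has no non-trivial FD left.
violations_r1 = bcnf_violations(("teacher", "subject"), [(("teacher",), ("subject",))])
violations_r2 = bcnf_violations(("student", "teacher"), [])
```

Both R1 and R2 pass the test. The price is that (student, subject) -> teacher no longer fits inside a single relation, so it cannot be enforced by a key of either table.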
How to Satisfy BCNF?
To bring this table into BCNF, we have to decompose it into further tables.
Here is the full procedure through which we transform this table into BCNF.
Let us first divide the main table into two tables, Stu_Branch and
Stu_Course.
Stu_Branch Table
Stu_ID   Stu_Branch
101      Computer Science & Engineering
102      Electronics & Communication Engineering
Candidate Key for this table: Stu_ID.
Stu_Course Table
Stu_Course             Branch_Number   Stu_Course_No
DBMS                   B_001           201
Computer Networks      B_001           202
VLSI Technology        B_003           401
Mobile Communication   B_003           402
Candidate Key for this table: Stu_Course.
Stu_Enroll Table
Stu_ID   Stu_Course_No
101      201
101      202
102      401
102      402
Candidate Key for this table: {Stu_ID, Stu_Course_No}.
After decomposing into further tables, the design is in BCNF, because every
table satisfies the super key condition: in each functional dependency X−>Y,
X is a Super Key.
Example 3
Find the highest normal form of a relation R(A, B, C, D, E) with FD set as:
{ BC->D, AC->BE, B->E }
Explanation:
Step-1: As we can see, (AC)+ ={A, C, B, E, D} but none of its subsets
can determine all attributes of the relation, So AC will be the candidate
key. A or C can’t be derived from any other attribute of the relation, so
there will be only 1 candidate key {AC}.
Step-2: Prime attributes are those attributes that are part of candidate
key {A, C} in this example and others will be non-prime {B, D, E} in
this example.
Step-3: The relation R is in 1st normal form as a relational DBMS does
not allow multi-valued or composite attributes.
The relation is in 2nd normal form because BC->D is in 2nd normal form (BC
is not a proper subset of candidate key AC) and AC->BE is in 2nd normal
form (AC is candidate key) and B->E is in 2nd normal form (B is not a proper
subset of candidate key AC).
The relation is not in 3rd normal form because of BC->D (neither BC is a
super key nor D is a prime attribute) and B->E (neither B is a super key
nor E is a prime attribute). To satisfy 3rd normal form, either the LHS of
an FD should be a super key or the RHS should be a prime attribute. So the
highest normal form of the relation is the 2nd normal form.
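The three steps can be combined into one routine that reports the highest normal form of a relation. This is a brute-force sketch for small textbook relations. It assumes the relation is already in 1NF, checks only the dependencies that are listed, and uses single-character attribute names so that strings can serve as attribute sets.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Return all minimal attribute sets whose closure is the whole relation."""
    attrs = sorted(set(attrs))
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            if any(set(key) <= set(combo) for key in keys):
                continue
            if closure(combo, fds) >= set(attrs):
                keys.append(combo)
    return keys


def highest_normal_form(attrs, fds):
    """Return 'BCNF', '3NF', '2NF' or '1NF' (1NF itself is assumed)."""
    attrs = set(attrs)
    keys = candidate_keys(attrs, fds)
    prime = set().union(*keys)
    nontrivial = [(set(l), set(r) - set(l)) for l, r in fds if set(r) - set(l)]

    def is_superkey(lhs):
        return closure(lhs, fds) >= attrs

    if all(is_superkey(lhs) for lhs, _ in nontrivial):
        return "BCNF"
    if all(is_superkey(lhs) or rhs <= prime for lhs, rhs in nontrivial):
        return "3NF"
    for key in keys:  # 2NF: no non-prime attribute depends on part of a key
        for size in range(1, len(key)):
            for part in combinations(key, size):
                if (closure(part, fds) - set(part)) - prime:
                    return "1NF"
    return "2NF"


result = highest_normal_form("ABCDE", [("BC", "D"), ("AC", "BE"), ("B", "E")])
print(result)
```

For the relation above, the routine finds the single candidate key AC and stops at 2NF, in line with the manual analysis.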
Note: A prime attribute cannot be transitively dependent on a key in BCNF
relation.
Consider these functional dependencies of some relation R
AB ->C
C ->B
AB ->B
From the above functional dependencies, we get that the candidate keys of R are
AB and AC. A careful observation is required to conclude that the above
dependencies contain a transitive dependency, as the prime attribute B transitively
depends on the key AB through C. Now, the first and the third FD satisfy BCNF
as they both contain a candidate key (or simply KEY) on their left sides.
The second dependency, however, does not satisfy BCNF but definitely satisfies
3NF due to the presence of the prime attribute on the right side. So, the highest
normal form of R is 3NF as all three FDs satisfy the necessary conditions to
be in 3NF.
Example 4
Consider relation R(A, B, C)
A -> BC,
B -> A
A and B both are super keys so the above relation is in BCNF.
Note: A dependency-preserving BCNF decomposition may not always be possible;
however, a BCNF decomposition that satisfies the lossless join condition always
exists. For example, consider relation R (V, W, X, Y, Z), with functional dependencies:
V, W -> X
Y, Z -> X
W -> Y
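Depending on which violating dependency is split off first, a BCNF decomposition
of R can lose a dependency. For example, splitting on W -> Y first gives
(W, Y) and (V, W, X, Z), after which Y, Z -> X can no longer be checked
within a single table.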
Note: Redundancies are sometimes still present in a BCNF relation as it is
not always possible to eliminate them completely.
🔹 Entity-Relationship Model (ER Diagram)
The Entity Relationship Model is a model for identifying the entities (like student,
car or company) to be represented in the database and for representing
how those entities are related. The ER data model specifies an enterprise
schema that represents the overall logical structure of a database
graphically.
We typically follow the below steps for designing a database for an
application.
Gather the requirements (functional and data) by asking questions to
the database users.
Create a logical or conceptual design of the database. This is where ER
model plays a role. It is the most used graphical representation of the
conceptual design of a database.
After this, focus on Physical Database Design (like indexing) and
external design (like views)
Why Use ER Diagrams In DBMS
ER diagrams represent the E-R model in a database, making them easy
to convert into relations (tables).
ER diagrams model real-world objects, which makes them intuitive to
read and use.
ER diagrams require no technical knowledge of the underlying DBMS
used.
It gives a standard solution for visualizing the data logically.
Symbols Used in ER Model
ER Model is used to model the logical view of the system from a data
perspective which consists of these symbols:
Rectangles: Rectangles represent entities in the ER Model.
Ellipses: Ellipses represent attributes in the ER Model.
Diamond: Diamonds represent relationships among Entities.
Lines: Lines link attributes to entities and entity sets to
relationship types.
Double Ellipse: Double ellipses represent multi-valued Attributes.
Double Rectangle: Double rectangle represents a weak entity.
Symbols used in ER Diagram
Components of ER Diagram
ER Model consists of Entities, Attributes, and Relationships among Entities in
a Database System.
Components of ER Diagram
What is an Entity
An Entity may be an object with a physical existence: a particular person,
car, house, or employee or it may be an object with a conceptual existence –
a company, a job, or a university course.
What is an Entity Set
An entity refers to an individual object of an entity type, and the collection of
all entities of a particular type is called an entity set. For example, E1 is an
entity that belongs to the entity type “Student,” and the group of all students
forms the entity set. In the ER diagram below, the entity type is represented
as:
Entity Set
We can represent an entity set in an ER diagram, but not an individual
entity: an entity corresponds to a single row of a relation, whereas an ER
diagram is a graphical representation of the structure of the data.
Types of Entity
There are two types of entity:
1. Strong Entity
A Strong Entity is a type of entity that has a key Attribute. Strong Entity does
not depend on other Entity in the Schema. It has a primary key, that helps in
identifying it uniquely, and it is represented by a rectangle. These are called
Strong Entity Types.
2. Weak Entity
An Entity type has a key attribute that uniquely identifies each entity in the
entity set. But some entity types exist for which key attributes can’t be
defined. These are called Weak Entity types.
For Example, A company may store the information of dependents
(Parents, Children, Spouse) of an Employee. But the dependents can’t exist
without the employee. So dependent will be a Weak Entity Type and
Employee will be identifying entity type for dependent, which means it
is Strong Entity Type.
A weak entity type is represented by a double rectangle. The participation of
weak entity types is always total. The relationship between the weak entity
type and its identifying strong entity type is called identifying relationship
and it is represented by a double diamond.
Strong Entity and Weak Entity
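When the design is turned into tables, a weak entity is commonly mapped to a table whose primary key includes the key of its identifying entity. The sketch below shows this for Employee and Dependent in SQLite, using Python's built-in sqlite3 module. The column names, the sample rows, and the use of ON DELETE CASCADE are assumptions for illustration, not the only possible mapping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE Employee (emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE Dependent (
        emp_id INTEGER NOT NULL REFERENCES Employee(emp_id) ON DELETE CASCADE,
        dep_name TEXT NOT NULL,
        relation TEXT,
        PRIMARY KEY (emp_id, dep_name)  -- owner key + partial key
    )
    """
)
conn.execute("INSERT INTO Employee VALUES (1, 'Ravi')")
conn.executemany(
    "INSERT INTO Dependent VALUES (?, ?, ?)",
    [(1, "Meera", "Spouse"), (1, "Anil", "Child")],
)

# A dependent cannot exist without its employee.
try:
    conn.execute("INSERT INTO Dependent VALUES (99, 'Nobody', 'Child')")
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False

# Removing the employee removes the dependents as well.
conn.execute("DELETE FROM Employee WHERE emp_id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM Dependent").fetchone()[0]
```

The foreign key that is part of the primary key reflects the total participation of the weak entity: every Dependent row must point at an existing Employee.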
What are Attributes
Attributes are the properties that define the entity type. For example,
Roll_No, Name, DOB, Age, Address, and Mobile_No are the attributes that
define entity type Student. In ER diagram, the attribute is represented by an
oval.
Attribute
Types of Attributes
1. Key Attribute
The attribute which uniquely identifies each entity in the entity set is
called the key attribute. For example, Roll_No will be unique for each
student. In ER diagram, the key attribute is represented by an oval with
the attribute name underlined.
Key Attribute
2. Composite Attribute
An attribute composed of many other attributes is called a composite
attribute. For example, the Address attribute of the student Entity type
consists of Street, City, State, and Country. In ER diagram, the composite
attribute is represented by an oval connected to further ovals.
Composite Attribute
3. Multivalued Attribute
An attribute consisting of more than one value for a given entity. For
example, Phone_No (can be more than one for a given student). In ER
diagram, a multivalued attribute is represented by a double oval.
Multivalued Attribute
4. Derived Attribute
An attribute that can be derived from other attributes of the entity type is
known as a derived attribute. e.g.; Age (can be derived from DOB). In ER
diagram, the derived attribute is represented by a dashed oval.
Derived Attribute
The Complete Entity Type Student with its Attributes can be represented as:
Entity and Attributes
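The four attribute types can also be illustrated in code. The sketch below models the Student entity type as a Python dataclass. The field names and sample values are made up, and age_on is a hypothetical helper that computes the derived attribute Age from DOB for a given date.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Address:
    """Composite attribute: made up of several simpler attributes."""
    street: str
    city: str
    state: str
    country: str


@dataclass
class Student:
    roll_no: int                                     # key attribute
    name: str
    dob: date
    address: Address                                 # composite attribute
    phone_nos: list = field(default_factory=list)    # multivalued attribute

    def age_on(self, today):
        """Derived attribute: computed from dob instead of being stored."""
        had_birthday = (today.month, today.day) >= (self.dob.month, self.dob.day)
        return today.year - self.dob.year - (0 if had_birthday else 1)


student = Student(
    roll_no=1,
    name="Asha",
    dob=date(2000, 6, 15),
    address=Address("12 Main St", "Pune", "Maharashtra", "India"),
    phone_nos=["555-0100", "555-0101"],
)
age = student.age_on(date(2024, 1, 1))
```

Keeping Age as a computed value rather than a stored field avoids having to update it every year.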
Relationship Type and Relationship Set
A Relationship Type represents the association between entity types. For
example, ‘Enrolled in’ is a relationship type that exists between entity type
Student and Course. In ER diagram, the relationship type is represented by a
diamond and connecting the entities with lines.
Entity-Relationship Set
A set of relationships of the same type is known as a relationship set. The
following relationship set depicts S1 as enrolled in C2, S2 as enrolled in C1,
and S3 as enrolled in C3.
Relationship Set
Degree of a Relationship Set
The number of different entity sets participating in a relationship set is called
the degree of a relationship set.
1. Unary Relationship: When there is only ONE entity set participating in a
relationship, the relationship is called a unary relationship. For example, one
person is married to only one person.
Unary Relationship
2. Binary Relationship: When there are TWO entity sets participating in a
relationship, the relationship is called a binary relationship. For example, a
Student is enrolled in a Course.
Binary Relationship
3. Ternary Relationship: When there are three entity sets participating in
a relationship, the relationship is called a ternary relationship.
4. N-ary Relationship: When there are n entity sets participating in a
relationship, the relationship is called an n-ary relationship.
What is Cardinality
The maximum number of times an entity of an entity set participates in a
relationship set is known as cardinality. Cardinality can be of different types:
1. One-to-One: When each entity in each entity set can take part only once
in the relationship, the cardinality is one-to-one. Let us assume that a male
can marry one female and a female can marry one male. So the relationship
will be one-to-one.
One to One Cardinality
Using Sets, it can be represented as:
Set Representation of One-to-One
2. One-to-Many: When an entity in one entity set can be related to more than
one entity in the other entity set, while each entity in the other set is related
to only one entity in the first, the cardinality is one-to-many. Let us assume
that one surgery department can accommodate many doctors. So the cardinality
will be 1 to M. It means one department has many doctors.
One to Many Cardinality
Using sets, one-to-many cardinality can be represented as:
Set Representation of One-to-Many
3. Many-to-One: When entities in one entity set can take part only once in
the relationship set and entities in the other entity set can take part more
than once in the relationship set, the cardinality is many-to-one. Let us assume
that a student can take only one course but one course can be taken by many
students. So the cardinality will be n to 1. It means that for one course there
can be n students but for one student, there will be only one course.
Many to One Cardinality
Using Sets, it can be represented as:
Set Representation of Many-to-One
In this case, each student is taking only 1 course but 1 course has been
taken by many students.
4. Many-to-Many: When entities in all entity sets can take part more than
once in the relationship, the cardinality is many-to-many. Let us assume that a
student can take more than one course and one course can be taken by
many students. So the relationship will be many-to-many.
Many to Many Cardinality
Using Sets, it can be represented as:
Many-to-Many Set Representation
In this example, student S1 is enrolled in C1 and C3, and course C3 has S1,
S3, and S4 enrolled in it. So it is a many-to-many relationship.
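The four cardinality types can be told apart by counting how often each entity appears in a relationship set. The sketch below is a rough illustration under an important assumption: it classifies one observed set of pairs, whereas cardinality in an ER model is a design constraint on the maximum allowed. The function name and the sample pairs (other than the S1/C1/C3 enrollments described above) are made up for the example.

```python
from collections import Counter


def cardinality(pairs):
    """Classify a binary relationship set given as distinct (left, right) pairs."""
    left_counts = Counter(left for left, _ in pairs)
    right_counts = Counter(right for _, right in pairs)
    # True when some entity on that side takes part more than once.
    left_many = any(n > 1 for n in left_counts.values())
    right_many = any(n > 1 for n in right_counts.values())
    if left_many and right_many:
        return "many-to-many"
    if left_many:
        return "one-to-many"    # one left entity relates to many right entities
    if right_many:
        return "many-to-one"    # many left entities relate to one right entity
    return "one-to-one"


# (student, course) pairs consistent with the many-to-many example above.
enrolled_in = {("S1", "C1"), ("S1", "C3"), ("S3", "C3"), ("S4", "C3")}
# Hypothetical (department, doctor) pairs for the one-to-many case.
works_in = {("D1", "Doc1"), ("D1", "Doc2")}
```

Here the left side of each pair is the entity set named first, so (department, doctor) pairs come out as one-to-many and (student, course) pairs with a single course per student come out as many-to-one.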
Participation Constraint
A participation constraint specifies whether every entity in an entity set
must take part in the relationship set.
1. Total Participation: Each entity in the entity set must participate in the
relationship. If each student must enroll in a course, the participation of
students will be total. Total participation is shown by a double line in the ER
diagram.
2. Partial Participation: An entity in the entity set may or may NOT
participate in the relationship. If some courses have no students enrolled in
them, the participation of courses will be partial. Partial participation is
shown by a single line in the ER diagram.
The diagram depicts the ‘Enrolled in’ relationship set with Student Entity set
having total participation and Course Entity set having partial participation.
Total Participation and Partial Participation
Using sets, it can be represented as:
Set representation of Total Participation and Partial Participation
Every student in the Student Entity set participates in a relationship but
there exists a course C4 that is not taking part in the relationship.
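The difference between total and partial participation can be checked with a simple subset test. This is a sketch under assumptions: the participation function and the specific pairs are hypothetical, chosen only so that every student is enrolled and C4 is left out, as described above.

```python
def participation(entity_set, pairs, position):
    """Return 'total' if every entity appears at the given position of some pair."""
    participating = {pair[position] for pair in pairs}
    return "total" if entity_set <= participating else "partial"


students = {"S1", "S2", "S3"}
courses = {"C1", "C2", "C3", "C4"}

# Hypothetical 'Enrolled in' pairs: every student appears, C4 does not.
enrolled_in = {("S1", "C1"), ("S2", "C2"), ("S3", "C3")}

student_side = participation(students, enrolled_in, 0)   # students are at index 0
course_side = participation(courses, enrolled_in, 1)     # courses are at index 1
```

With this data the student side is total and the course side is partial, because C4 never appears in any pair.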
How to Draw an ER Diagram
The very first step is to identify all the entities.
Represent these entities as rectangles and label them accordingly.
The next step is to identify the relationship between them and
represent them accordingly using the Diamond shape. Ensure that
relationships are not directly connected to each other.
Attach attributes to the entities by using ovals. Each entity can have
multiple attributes (such as name, age, etc.), which are connected to
the respective entity.
Assign primary keys to each entity. These are unique identifiers that
help distinguish each instance of the entity. Represent them with
underlined attributes.
Remove any unnecessary or repetitive entities and relationships.
Review the diagram to make sure it is clear and effectively conveys the
relationships between the entities.
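Two of the steps above lend themselves to a mechanical check: every entity needs a primary key, and relationships should connect entities rather than other relationships. The following is a minimal sketch of such a check. The Entity and Relationship classes, the check_diagram function, and the attribute names are all assumptions made for illustration, not part of any ER tool.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Entity:
    name: str
    attributes: List[str]
    primary_key: str


@dataclass
class Relationship:
    name: str
    connects: Tuple[str, ...]   # names of the entities this relationship links


def check_diagram(entities, relationships):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    entity_names = {e.name for e in entities}
    for e in entities:
        if e.primary_key not in e.attributes:
            problems.append(f"{e.name}: primary key {e.primary_key!r} is not an attribute")
    for r in relationships:
        for target in r.connects:
            if target not in entity_names:
                problems.append(f"{r.name}: {target!r} is not an entity")
    return problems


# Hypothetical diagram: attribute names are illustrative only.
entities = [
    Entity("Student", ["Roll_No", "Name", "DOB"], "Roll_No"),
    Entity("Course", ["Course_ID", "Title"], "Course_ID"),
]
relationships = [Relationship("Enrolled in", ("Student", "Course"))]
```

Calling check_diagram(entities, relationships) on this sample returns an empty list. A relationship that pointed at "Enrolled in" instead of an entity would be reported as a problem.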