Introduction to Relational Database Systems
by Gregory Jorgensen, PDXperts LLC
If you collect and use any kind of data you probably have some kind of organizing system. Whether you use index cards, a filing cabinet, Excel spreadsheets, or some kind of database program, your system should let you add and change data, delete data, and retrieve data, and it should work faster and more efficiently than if you had to do it by hand.
When your record keeping needs outgrow a filing cabinet or bundle of index cards, a computerized system can help you manage the growing quantity of data, and let you use your data more effectively. Spreadsheet programs can manage small, simple databases. Desktop programs such as Microsoft Access and FileMaker can handle databases too big or too complicated for a spreadsheet. Larger databases, databases accessed by many people at once, and databases that feed client/server and web applications need real database servers.
The Relational Model
The relational model, invented by IBM researcher Ted Codd in 1970, wasnt turned into a commercial product until almost 1980. Since then database systems based on the relational model, called relational database management systems or RDBMSs, have come to dominate the database software market. Today few people know about any other kind of database management system.
People usually find relational databases alien and forbidding. The relational model derives from a mathematical conception of data and relationships; it doesnt model or imitate any organizing system people use in real life. If you sat down with a pile of index cards to organize your CD collection you wouldnt choose a relational model. If you sat down with a pile of index cards to organize all of the transactions a typical bank handles every day, youd give up on the index cards and look for a better solution. If you worked long and hard, youd invent the relational model.
I wont get into the theory behind the relational model, except to point out that no RDBMS implements the relational model completely. Although commercial RDBMSs have a lot in common, each system has quirks and non-standard extensions. You must understand relational theory to correctly design a database--just learning a particular RDBMS wont get you all the way there.
A good RDBMS and a well-designed relational database give you some important benefits:
- Data integrity and consistency maintained and/or enforced by the RDBMS.
- Redundant data eliminated or kept to a practical minimum.
- Data retrieved by unique keys.
- Relationships expressed through matching keys.
- Physical organization of data managed by RDBMS.
- Optimization of storage and database operation execution times.
- Concurrency: database users dont corrupt each others work.
- Scalable: can spread load across multiple CPUs or servers.
If you dont use an RDBMS you have to forego some of those benefits, or write code yourself.
Normalization
If you know anything about relational databases, you probably know about normalization. The process of normalization transforms data into forms that conform to the relational model. Normalized data enables the RDBMS to enforce integrity rules, guarantee consistency, and optimize database access. Learning how to normalize data takes significant time and practice. Data modelers spend a lot of time understanding the meaning of data so they can properly normalize it, but programmers frequently downplay normalization, or dismiss it outright as an academic problem. Most databases come from power users and programmers, not data modelers, and most databases suffer from un-normalized data, redundancy, integrity and performance problems. Un-normalized databases usually need a lot of application code to protect the database from corruption.
If you find yourself working with a programmer or database designer who dismisses normalization as academic or theoretical, beware. You wouldnt get into a plane with a pilot who dismissed physics as unnecessary theory, so dont trust your data to someone who doesnt understand database theory and normalization.
Transactions and the ACID properties
Large database systems that support many simultaneous users must insure that changes to the database cant corrupt the database or leave it in an inconsistent state. Most database systems can bundle multiple updates (adds, changes, or deletes) together into transactions. Transactions have four important features usually referred to as the ACID properties:
- Atomic: Transactions either completely succeed or completely fail. If the system crashes before the transaction completes the databases state does not change.
- Consistent: Transactions preserve database consistency. A transaction transforms the database from a consistent state to another consistent state.
- Isolated: A transactions updates dont interfere with other transactions, or other users of the database. Until a transaction completely succeeds the database system conceals the individual updates from other transactions.
- Durable: Once a transaction completes (commits), the updates survive in the database.
All of the commercial RDBMSs support transactions. MySQL, a popular open-source system, does not fully support transactions.
SQL
As commercial relational databases entered the market and RDBMSs proliferated, Structured Query Language (SQL) became the most popular language for querying and manipulating relational databases--so popular that people often use the terms SQL and relational database interchangeably. All of the important commercial and open source RDBMSs support most or all of the SQL standard.
As a computer language SQL has simple syntax and few keywords, because SQL merely expresses relational operations in an English-like syntax. But the relational operations can get complicated, and SQL soon gets in the way. If we used relational algebra and mathematical notation to query databases, we wouldnt have to live with SQLs shortcomings, but few people would ever learn how to use the simplest database.
Some sample SQL statements:
INSERT INTO Songs (song, artist)
VALUES ('I Feel Good', 'James Brown')
UPDATE Cupboard SET qty=2
WHERE itemname='Captain Crunch cereal'
SELECT year,color,price FROM Cars
WHERE make='Ford' AND model='Falcon'
You can use SQL without really understanding database design or relational theory. Casual database users can get by with a little SQL, but programmers and database designers should understand relational theory and the principles of database design. Unfortunately employers and recruiters often choose candidates with two years experience with, say, Oracle PL/SQL over candidates who have a deeper understanding of relational theory and database design.
Relational Database Software
On the desktop, Microsoft Access has enough relational qualities to work for small databases, but doesnt hold up well under load. Access doesnt support transactions, so it cant scale to handle many users or multiple servers. FileMaker doesnt support the relational model or transactions, though it can organize and manipulate small and medium-size databases without needing a lot of programming support.
Commercial RDBMSs:
- Oracle
- Microsoft SQL Server
- IBMs DB/2
- Sybase
All of the commercial products cost too much for desktop users or small businesses. Large companies almost always own one (or more) commercial RDBMS. All of the commercial RDBMSs conform to the relational model (more or less), and they all implement the ANSI/ISO SQL language standard (with some exceptions and lots of vendor-specific extensions).
Free/Open Source RDBMSs:
- MySQL
- PostgreSQL
MySQL backs a lot of web sites and seems the database of choice for web developers. MySQL has some big holes in its implementation of the relational model, though, and lacks the integrity and scalability features of the commercial databases. PostgreSQL doesnt have the same shortcomings as MySQL, but it also doesnt really compare to the commercial products. Both MySQL and PostgreSQL can handle large real databases and work fine for a wide range of applications. MySQL and PostgreSQL dont replace the commercial products (yet); they replace ad-hoc data storage systems such as flat files and hash tables.
Books
An Introduction to Database Systems, C. J. Date
The Database Relational Model, C. J. Date
Practical Issues in Database Management: A Reference for the Thinking Practitioner, Fabian Pascal
Database Design for Mere Mortals: A Hands-On Guide to Relational Database Design, Michael Hernandez
A Guide to SQL Standard, C. J. Date and Hugh Darwen
SQL Queries for Mere Mortals, Michael Hernandez and John L. Viescas
SQL: Visual QuickStart Guide, Chris Fehily
Web Sites
Free SQL course and lots more: www.sqlcourse.com
SQL Tutorial: www.w3schools.com/sql/default.asp
All about SQL: www.sql.org
WebMonkey database tutorials: hotwired.lycos.com/webmonkey/backend/databases/index.html
Yet another intro: www.databasejournal.com/sqletc/article.php/1469521
Database Debunkings: http://www.pgro.uk7.net/index.htm
Products
MySQL: www.mysql.com
PostgreSQL: www.postgresql.org
Oracle: www.oracle.com
Microsoft SQL Server: www.microsoft.com/sql/default.asp
Sybase: www.sybase.com
IBM DB/2: www-3.ibm.com/software/data/db2
