Databases

database theory

overview

tutorials

  • How to implement a hierarchial database design (originally supplied by Dave Boulden).

normalisation

Normalisation is the process of distributing data in such a manner that it is not generally repeated.

For instance, if you have a table called “Employees”, which stores employee data, and a table called “Departments”, which stores data for departments into which each employee falls, you would not, in a normalised structure store the Department name in each employee record.

If we had three Departments, called “Sales”, “Finance” and “Human Resources”, and each employee in the Employee table had to belong to one department, it would be a waste of disk space to store the word “Sales” etc. in each employee record to designate which department they belonged to. “But”, I hear you cry, “disk space is no longer the same issue it used to be in the early days when relational databases were first created!” Entirely correct, and although the original thinking around normalisation was centred around disk utilisation (for it was a most precious resource in them thar days) it quickly showed itself to be a logical methodology for maintaining a data state that was both compatible to human and computer processes.

Say, for example, in the Departments table, you have a department called “Sales”, and you have 1000 employees in the Employees table linked to that department. If you link each employee to the department name, and then decide to change the name of the department to “Super Sales”, you will have to go through 1000 records and change the name of the department in each one. But, if you have a department called “Sales”, that is linked to 1000 employee records through a department_id called “1”, you can easily change the department name to “Super Sales”, and all queries drawing back employee information for that department will automatically register the department name as “Super Sales”.

There are various levels of normalisation, with Third Normal Form being the one most database folks try to obtain, and anything beyond that a dream. However, normalisation has its own problems. If, for instance you have 10 million records in a system from which you need to obtain data trends (data mining) or statistical reports, normalisation becomes a burden, because the linking of various tables to other tables in any SQL query is a time-related and performance issue. In heavy reporting and data-mining applications it is the usual practice to ‘de-normalise’ the data; that is, putting the department name back into a cross-reference table that includes the employee data and department data, to obtain the benefit of speed within the query process.

If you’re confused, don’t worry, so is every other database ‘newbie’. Database modelling and design is an intuitive process. When you become skilled at it you just “know” what is the right thing to do. One common mistake of new database modellers is to normalise things to hell and gone just because the theory says they should.

Forget the theory. Use the Force.

designing a db

specific databases

MySQL

  • Open-souce, mostly-free database.
  • Fast and lightweight
  • Runs on !FreeBSD, Linux, Windows, and most other platforms
  • Standard module for PHP, ie very easy to use in shared hosting etc
  • See MyPHPAdmin, a standard but wondeful web interface.
  • Fairly good SQL-compliance but lacks features of heavier databases
  • (Optionally) supports transactions (despite what you may have heard).
  • Non ACID (still?)
  • See MySQL homepage

PostgreSQL

  • Open-source, BSD-style free database.
  • Fast and lightweight
  • Runs on Linux, Windows (cygwin for 7.x, 8 will be native), most other platforms
  • Standard extension for PHP, Java
  • See phpPgAdmin, a great web interface.
  • Has everything you could expect from a serious database
  • ACID
  • See PostgreSQL homepage

MS Access

  • Desktop database.
  • Widely used
  • Excellent RAD tools
  • Able to power small/low traffic websites

MSDE

  • Cut down version of MS SQL Server
  • Free
  • Most of the functionality of MS SQL Server
  • ACID
  • See MSDE homepage

MS SQL Server Express

  • Alpha release of MS SQL Server 2005 (ie Yukon)
  • Extends CLR into the DB (No more Stored Procs)
  • No limit on number of connections
  • No Go Live license available yet
  • Free download from Microsoft
  • See MS SQL Server Express homepage

MS SQL Server

  • Expensive for enterprise level
  • Enterprise class database
  • Comes with a suite of BI and management tools
  • Next generation
  • Only runs on Windows
  • ACID
  • See MS SQL Server homepage

Oracle XE

  • Free download from Oracle
  • Independent Software Vendors are able to distribute free of charge
  • Includes Application Development tool (HTMLDB)
  • Runs on Linux and Windows
  • See Oracle XE homepage

Oracle

  • Enterprise class database
  • Standard Edition or Enterprise Edition; both support clustering
  • Standard version similarly priced to MS SQL Server
  • Runs on more operating system than any other commercial database
  • ACID
  • Comes with it’s own programming language PL/SQL, which can be used for very powerful ‘in-database’ functionality.
  • Database downloads are free for development. You only need a licence when you go into a production environment.
  • See Oracle Downloads Page