This lesson is excerpted from the Expert Skills book
This lesson is excerpted form the book: Learn Excel 365 Expert Skills with The Smart Method: Fourth Edition: updated for the Jul 2020 Semi-Annual version 2002.
This lesson is contained in the session: Create Get & Transform Queries.
A set of tables that are related to each other are referred to as a relational database. Some of Get & Transform’s most useful features require an understanding of basic relational database theory.
What is a primary key?
A relational database consists of several tables, each containing data. A database table is conceptually no different to an Excel table except that each row must have a unique primary key.
Here’s an example of a table from a relational database:
The above example comes from a database table called Category. It is good relational database design practice to name a primary key as the table name plus the letters ID. This designer has followed best practice and called the primary key CategoryID.
The only important quality of a primary key is that all primary key values must be different (unique) within the primary key column. That’s because the primary key is used (by a relationship) to identify a single, row in a table. If there were two table rows with the same primary key, a relationship wouldn’t be able to correctly identify which single row was being referenced.
In the above example the primary key is a number but primary keys can consist of numbers, letters or both.
What is a foreign key?
Here is an example of two related tables from a relational database:
You can see how it is possible to determine that Aniseed Syrup is in the Condiments category:
- In the Product table, Aniseed Syrup has a CategoryID of 2.
- The item in the Category table with a primary key (CategoryID) of 2 is Condiments.
While CategoryID is the primary key column in the Category table, it is a foreign key column within the Product table.
You can think of the CategoryID values in the Product table as belonging to the Category table, making them foreign keys in the Product table.
You can probably now see the wisdom of the naming convention for primary keys. It makes it possible to spot which are the primary and foreign keys in a table at a glance.
- The column named with the table name plus ID is the primary key.
- Any other column name that is suffixed with ID (has ID at the end of the name) is a foreign key.
- Any column not suffixed with ID is a regular data field containing information.
Meaningful and meaningless primary keys
In the above example, the primary key is a meaningless number. The number 2 tells you nothing about any attribute of the Condiments category. It simply provides a way to find where the correct Condiments row is located within the table.
It is also possible to use a meaningful primary key (though a professional database designer would never do this). For example, you could argue that because category names are unique, it is fine to use the CategoryName column in the above table as the primary key.
If you decided to use the category name as the primary key the tables would look like this:
From a database design perspective, this is generally a bad idea
You’ll often find, however, that you need to create this type of relationship when creating relationships between tables that originate from different databases. In this case, no formal primary key/foreign key relationship will exist.
For example, you may need to join a table from your corporate database containing customer addresses with a table detailing sales tax rates by state that you’ve downloaded from the Internet. In this case you would need to create a primary key/foreign key relationship between the state columns in each table.
Additional information (shown in the sidebar of the Expert Skills book)
Virtually all of the world’s databases work like this
Dr Edgar F Codd (1923-2003) invented the relational database when working for IBM in the 1970’s.
The first relational database products came to market in the late 1970’s and were quickly adopted by big business.
So good and great was Codd’s design that nothing better has been developed in over thirty years.
The entire world of commerce is now powered by relational databases.
You’ll usually find a requirement to analyze data from a relational database in all but the very smallest enterprises.
Primary and foreign keys should have the same name
Any database designer worthy of the name will use the same name for the primary key column and the related foreign key column.
I’ve found that, in the real world of business, I often need to work with a badly designed database where this is not the case. This makes the data a lot more difficult to work with.
How meaningful primary keys almost stopped the Welsh from buying cars
Many years ago, I implemented a Europe-wide Business Intelligence solution for a very large automotive finance company.
I was not pleased (but unsurprised) to find that the database designer had used meaningful primary and foreign keys.
The designer had used a concatenation of the customer’s last name and date of birth as the primary key for the customer table.
The designer believed that the possibility of having two customers with the same last name and date of birth was extremely unlikely to ever happen.
In fact, it transpired that 13.5% of the Welsh population have a last name of Jones, meaning that it was certain to happen in Wales (and actually very likely to happen everywhere).
The database was, of course, re-designed to use meaningless primary keys.