Primary and Foreign keys are a very simple relational database concept. Until very recently Excel users didn’t have to worry about relationships because Excel didn’t support them.
This all changed when Microsoft added Power Pivot (an OLAP data modeling tool) and Get & Transform (a powerful Extract Transform and Load tool) to Excel. Suddenly Excel users had to understand relational database theory to make use of the new tools.
If you are an Excel users and want to understand how primary and foreign keys fit into the big picture of creating an OLAP solution using Power Pivot you should read my my Excel Power Pivot 2-minute overview article first. But if you are a database designer struggling with relational concepts (but with no interest in Excel) you’ll also get a lot out of this short article.
The article below is an unedited lesson from one of the 35 short focused lessons in my Excel Expert Skills book/e-book that enables students to master the Get & Transform (previously called Power Query) tool that is now included in all current Excel versions.
Leave me a comment if you find the article useful.
Virtually all of the world’s databases work like this
Dr Edgar F Codd (1923-2003) invented the relational database when working for IBM in the 1970’s.
The first relational database products came to market in the late 1970’s and were quickly adopted by big business.
So good and great was Codd’s design that nothing better has been developed in over thirty years.
The entire world of commerce is now powered by relational databases.
You’ll usually find a requirement to analyze data from a relational database in all but the very smallest enterprises.
Primary and foreign keys should have the same name
Any database designer worthy of the name will use the same name for the primary key column and the related foreign key column.
I’ve found that, in the real world of business, I often need to work with a badly designed database where this is not the case. This makes the data a lot more difficult to work with.
How meaningful primary keys almost stopped the Welsh from buying cars
Many years ago, I implemented a Europe-wide Business Intelligence solution for a very large automotive finance company.
I was not pleased (but unsurprised) to find that the database designer had used meaningful primary and foreign keys.
The designer had used a concatenation of the customer’s last name and date of birth as the primary key for the customer table.
The designer believed that the possibility of having two customers with the same last name and date of birth was extremely unlikely to ever happen.
In fact, it transpired that 13.5% of the Welsh population have a last name of Jones, meaning that it was certain to happen in Wales (and actually very likely to happen everywhere).
The database was, of course, re-designed to use meaningless primary keys.
Lesson 11-26: Understand primary and foreign keys
A set of tables that are related to each other are referred to as a relational database. Some of Get & Transform’s most useful features require an understanding of basic relational database theory.
What is a primary key?
A relational database consists of several tables, each containing data. A database table is conceptually no different to an Excel table except that each row must have a unique primary key.
Here’s an example of a table from a relational database:
The above example comes from a database table called Category. It is good relational database design practice to name a primary key as the table name plus the letters ID. This designer has followed best practice and called the primary key CategoryID.
The only important quality of a primary key is that all primary key values must be different (unique) within the primary key column. That’s because the primary key is used (by a relationship) to identify a single, row in a table. If there were two table rows with the same primary key, a relationship wouldn’t be able to correctly identify which single row was being referenced.
In the above example the primary key is a number but primary keys can consist of numbers, letters or both.
What is a foreign key?
Here is an example of two related tables from a relational database:
You can see how it is possible to determine that Aniseed Syrup is in the Condiments category:
- In the Product table, Aniseed Syrup has a CategoryID of 2.
- The item in the Category table with a primary key (CategoryID) of 2 is Condiments.
While CategoryID is the primary key column in the Category table, it is a foreign key column within the Product table.
You can think of the CategoryID values in the Product table as belonging to the Category table, making them foreign keys in the Product table.
You can probably now see the wisdom of the naming convention for primary keys. It makes it possible to spot which are the primary and foreign keys in a table at a glance.
- The column named with the table name plus ID is the primary key.
- Any other column name that is suffixed with ID (has ID at the end of the name) is a foreign key.
- Any column not suffixed with ID is a regular data field containing information.
Meaningful and meaningless primary keys
In the above example, the primary key is a meaningless number. The number 2 tells you nothing about any attribute of the Condiments category. It simply provides a way to find where the correct Condiments row is located within the table.
It is also possible to use a meaningful primary key (though a professional database designer would never do this). For example, you could argue that because category names are unique, it is fine to use the CategoryName column in the above table as the primary key.
If you decided to use the category name as the primary key the tables would look like this:
From a database design perspective, this is generally a bad idea
You’ll often find, however, that you need to create this type of relationship when creating relationships between tables that originate from different databases. In this case, no formal primary key/foreign key relationship will exist.
For example, you may need to join a table from your corporate database containing customer addresses with a table detailing sales tax rates by state that you’ve downloaded from the Internet. In this case you would need to create a primary key/foreign key relationship between the state columns in each table.
This lesson is excerpted from the above book.
This is the only up-to-date Excel book currently published and includes an entire session devoted to the new Dynamic Arrays features.
It is also the only book that will teach you absolutely every Excel skill including Power Pivot, OLAP and DAX.
Covered in Session 12 – Power Pivot, Data Modelling, OLAP and Business Intelligence
Covered in Lesson 12-20: Use the CUBEVALUE function to query a data model.
Covered in Session 13: An Introduction to DAX
Covered in Lesson 12-6: Use an OLAP pivot table to analysze data residing in a data model.
Covered in Lesson 11-33 Create a merged query using fuzzy logic.
Covered in Session 10: 3D Maps.
Covered in Lesson 11-7: Create and use a custom data type.
Covered in Lesson 9-2: Use Natural Language Queries.
Covered in Lesson 11-8: Use custom data types in formulas.
Covered in Lesson 9-1: Use automatic data analysis to create data insights.
Power Pivot causes great confusion. This simple 2-minute overview completely de-mystifies Power Pivot and its associated technologies.
A Power Pivot data model is an OLAP database. Business databases are OLTP databases. This article discusses how OLTP databases are structured.