The Smart Method Logo

Publishers of the world’s most comprehensive and up-to-date Excel tutorials

Picture of man holding key over his head

Understand Primary and Foreign Keys

This lesson is excerpted from the Expert Skills book

This lesson is excerpted form the book: Learn Excel 365 Expert Skills with The Smart Method: Fourth Edition: updated for the Jul 2020 Semi-Annual version 2002. 

This lesson is contained in the session: Create Get & Transform Queries.

A set of tables that are related to each other are referred to as a relational database. Some of Get & Transform’s most useful features require an understanding of basic relational database theory.

What is a primary key?

A relational database consists of several tables, each containing data. A database table is conceptually no different to an Excel table except that each row must have a unique primary key. 

Here’s an example of a table from a relational database:

The above example comes from a database table called Category. It is good relational database design practice to name a primary key as the table name plus the letters ID. This designer has followed best practice and called the primary key CategoryID.

The only important quality of a primary key is that all primary key values must be different (unique) within the primary key column. That’s because the primary key is used (by a relationship) to identify a single, row in a table. If there were two table rows with the same primary key, a relationship wouldn’t be able to correctly identify which single row was being referenced.

In the above example the primary key is a number but primary keys can consist of numbers, letters or both.

What is a foreign key?

Here is an example of two related tables from a relational database:

You can see how it is possible to determine that Aniseed Syrup is in the Condiments category:

  • In the Product table, Aniseed Syrup has a CategoryID of 2.
  • The item in the Category table with a primary key (CategoryID) of 2 is Condiments.

While CategoryID is the primary key column in the Category table, it is a foreign key column within the Product table.

You can think of the CategoryID values in the Product table as belonging to the Category table, making them foreign keys in the Product table.

You can probably now see the wisdom of the naming convention for primary keys. It makes it possible to spot which are the primary and foreign keys in a table at a glance.

  • The column named with the table name plus ID is the primary key.
  • Any other column name that is suffixed with ID (has ID at the end of the name) is a foreign key.
  • Any column not suffixed with ID is a regular data field containing information.

Meaningful and meaningless primary keys

In the above example, the primary key is a meaningless number. The number 2 tells you nothing about any attribute of the Condiments category. It simply provides a way to find where the correct Condiments row is located within the table.

It is also possible to use a meaningful primary key (though a professional database designer would never do this). For example, you could argue that because category names are unique, it is fine to use the CategoryName column in the above table as the primary key.

If you decided to use the category name as the primary key the tables would look like this:

From a database design perspective, this is generally a bad idea

You’ll often find, however, that you need to create this type of relationship when creating relationships between tables that originate from different databases. In this case, no formal primary key/foreign key relationship will exist.

For example, you may need to join a table from your corporate database containing customer addresses with a table detailing sales tax rates by state that you’ve downloaded from the Internet. In this case you would need to create a primary key/foreign key relationship between the state columns in each table.

Additional information (shown in the sidebar of the Expert Skills book)

Virtually all of the world’s databases work like this

Dr Edgar F Codd (1923-2003) invented the relational database when working for IBM in the 1970’s.

The first relational database products came to market in the late 1970’s and were quickly adopted by big business.

So good and great was Codd’s design that nothing better has been developed in over thirty years.

The entire world of commerce is now powered by relational databases.

You’ll usually find a requirement to analyze data from a relational database in all but the very smallest enterprises.

Primary and foreign keys should have the same name

Any database designer worthy of the name will use the same name for the primary key column and the related foreign key column.

I’ve found that, in the real world of business, I often need to work with a badly designed database where this is not the case. This makes the data a lot more difficult to work with.

How meaningful primary keys almost stopped the Welsh from buying cars

Many years ago, I implemented a Europe-wide Business Intelligence solution for a very large automotive finance company.

I was not pleased (but unsurprised) to find that the database designer had used meaningful primary and foreign keys.

The designer had used a concatenation of the customer’s last name and date of birth as the primary key for the customer table.
The designer believed that the possibility of having two customers with the same last name and date of birth was extremely unlikely to ever happen.

In fact, it transpired that 13.5% of the Welsh population have a last name of Jones, meaning that it was certain to happen in Wales (and actually very likely to happen everywhere).

The database was, of course, re-designed to use meaningless primary keys.

This lesson as shown in the Expert Skills e-book Kindle reader for PC

Lesson 11-26 Understand primary and foreign keys

Available for Excel 365, 2019 and 2016 as both an e-book and paper printed book.

Share this article

Share on facebook
Share on twitter
Share on linkedin
Share on stumbleupon
Share on email

Related Articles

Understand modern data analysis

Understand modern data analysis

This article describes a new way of working with Excel called “modern data analysis” enabled by the new Get & Transform and Power Pivot tools.

OLAP Star Schema

Understand OLAP database Design

A Power Pivot data model is an OLAP database. OLAP databases can be modeled as a Snowflake or Star schema. This article explains both designs.

OLTP database example

Understand OLTP database design

A Power Pivot data model is an OLAP database. Business databases are OLTP databases. This article discusses how OLTP databases are structured.

Leave a Reply

Your email address will not be published.

13 − 6 =