Publishers of the world’s most comprehensive and up-to-date Excel tutorials

Understand modern data analysis

This lesson is excerpted from the Expert Skills book

This lesson is excerpted from the book: Learn Excel 365 Expert Skills with The Smart Method: Fourth Edition: updated for the Jul 2020 Semi-Annual version 2002.

This lesson is contained in the session: Power Pivot, Data Modeling, OLAP and Business Intelligence.

You may need some existing knowledge to understand this article

Our Expert Skills book teaches Power Pivot, Get & Transform and other OLAP tools from first principles in a very structured way.  This is the introduction to this lesson in the Expert Skills book:

You now have an in-depth understanding of Get & Transform. You also have a limited understanding of relational database theory and Power Pivot. With this foundation knowledge you should now be able to understand how these tools and technologies work together to provide a new paradigm for data analysis using Excel that is often described as: Modern Data Analysis.

If you do not have any understanding of Get & Transform, relational database theory and Power Pivot, you may still get some value from this article, but you may have difficulty understanding some of the modern data analysis concepts described in the final section.

What does modern data analysis mean?

When the Get & Transform and Power Pivot tools were added to Excel (in the Excel 2019 perpetual licence version and the January 2019 Excel 365 semi-annual update) they enabled a new paradigm for data analysis.

The term: Modern Data Analysis is often used to describe the use of these new tools in their intended fashion. The term: Classic Data Analysis has been used to describe the “old way” of doing things.

The five steps in the modern data analysis workflow

Most data analysis tasks require five clearly defined steps: Get, Transform, Relate, Aggregate (summarize) and Visualize.

  1. Get the data. This means importing data (that may be located in a database, Excel worksheet, CSV file or other source).
  2. Transform the data. This might include removing unwanted columns, converting text fields to dates, rounding numeric values, filtering, sorting, renaming columns, adding calculated columns and other similar tasks.
  3. Relate the data. If data resides in several different tables, there needs to be a method to either define the relationships between them or to consolidate the data into a single table.
  4. Aggregate (summarize) the data. Values usually need to be aggregated (for example by creating grouped totals or averages) into a digestible form.
  5. Visualize the data. This may involve adding charts, pivot charts, conditional formatting, sparklines and other visualizations.
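The five steps above can be sketched in a few lines of Python. This is only an illustration of the workflow's shape, not of how Excel implements it: the CSV text, the column names and the function names are all hypothetical and do not come from the Expert Skills book.

```python
import csv
import io
from collections import defaultdict

# Hypothetical source data (illustration only, not from the book).
SALES_CSV = """order_id,product_id,quantity,unused_note
1,P1,2,rush
2,P2,1,
3,P1,5,gift
"""
PRODUCTS_CSV = """product_id,category,unit_price
P1,Bikes,99.999
P2,Helmets,25.5
"""

def get(text):
    """Step 1 (Get): import rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform_sales(rows):
    """Step 2 (Transform): drop an unwanted column, convert text to numbers."""
    return [
        {"order_id": int(r["order_id"]),
         "product_id": r["product_id"],
         "quantity": int(r["quantity"])}
        for r in rows
    ]

def transform_products(rows):
    """Step 2 (Transform): convert the price to a number and round it."""
    return [
        {"product_id": r["product_id"],
         "category": r["category"],
         "unit_price": round(float(r["unit_price"]), 2)}
        for r in rows
    ]

def relate(sales, products):
    """Step 3 (Relate): pair each sale with its product via the shared key."""
    by_id = {p["product_id"]: p for p in products}
    return [(s, by_id[s["product_id"]]) for s in sales]

def aggregate(related):
    """Step 4 (Aggregate): total revenue per category."""
    totals = defaultdict(float)
    for sale, product in related:
        totals[product["category"]] += sale["quantity"] * product["unit_price"]
    return {k: round(v, 2) for k, v in totals.items()}

def visualize(totals):
    """Step 5 (Visualize): a crude text bar chart, one '#' per 100 of revenue."""
    return [f"{cat:<8} {'#' * int(total // 100)} {total:.2f}"
            for cat, total in sorted(totals.items())]

sales = transform_sales(get(SALES_CSV))
products = transform_products(get(PRODUCTS_CSV))
totals = aggregate(relate(sales, products))
chart = visualize(totals)
print("\n".join(chart))
```

Keeping each step in its own function mirrors the way the workflow separates concerns: the source can change without touching the aggregation, and the aggregation can change without touching the import.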

How data analysis was done in the past (classic data analysis)

  1. Get the data. This was traditionally done by importing data into a collection of Excel tables or ranges.
  2. Transform the data. This might include removing unwanted columns from the worksheet, adding new calculated columns containing Excel functions and formulas, converting text fields to dates, rounding numeric values, renaming columns, formatting cells and other similar tasks.
  3. Relate the data. Traditional (non-OLAP) pivot tables can only summarize data from a single table, so data from several different tables would often have to be consolidated into a single table. This was traditionally done with VLOOKUP, SWITCH, INDEX, MATCH, IF and IFS functions.
  4. Aggregate the data. Excel contains many methods to summarize data. Popular methods include Automatic Subtotals, Data Tables, traditional (as opposed to OLAP) Pivot Tables and aggregation functions such as SUM, SUMIFS, AVERAGE, MAX, MIN, COUNT and COUNTIF.
  5. Visualize the data. This might involve summarizing data in a traditional pivot table or pivot chart, adding charts, conditional formatting, sparklines and other visualizations.
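To make the classic approach concrete, the following Python sketch imitates steps 3 and 4: lookup values are copied into every row to build one consolidated table, which is then summed by a criterion. The `vlookup` and `sumifs` helpers are hypothetical stand-ins written for this illustration; they only approximate the exact-match behavior of the Excel functions they are named after, and the sample data is invented.

```python
# Hypothetical worksheet ranges (illustration only).
orders = [
    {"order_id": 1, "product_id": "P1", "quantity": 2},
    {"order_id": 2, "product_id": "P2", "quantity": 1},
    {"order_id": 3, "product_id": "P1", "quantity": 5},
]
products = [
    {"product_id": "P1", "category": "Bikes", "unit_price": 100.0},
    {"product_id": "P2", "category": "Helmets", "unit_price": 25.5},
]

def vlookup(key, table, key_column, return_column):
    """Exact-match lookup: return the value from the first matching row."""
    for row in table:
        if row[key_column] == key:
            return row[return_column]
    raise KeyError(key)  # plays the role of Excel's #N/A error

def sumifs(table, sum_column, criteria_column, criterion):
    """Sum one column over the rows whose criteria column equals the criterion."""
    return sum(row[sum_column] for row in table
               if row[criteria_column] == criterion)

# Step 3 (classic): copy the looked-up values into every order row,
# producing a single consolidated table.
flat = []
for order in orders:
    price = vlookup(order["product_id"], products, "product_id", "unit_price")
    flat.append({
        **order,
        "category": vlookup(order["product_id"], products,
                            "product_id", "category"),
        "revenue": order["quantity"] * price,
    })

# Step 4 (classic): aggregate the consolidated table.
bikes_revenue = sumifs(flat, "revenue", "category", "Bikes")
print(bikes_revenue)
```

Note that every order row now carries its own copy of the category, and that each lookup scans the product table from the top. Both costs grow with the size of the data, which is part of what the modern approach avoids.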

How modern data analysis achieves faster and better results

  1. Get the data. This is done using the Get & Transform tool. Data can be imported from a huge range of data sources using this tool (including relational databases, Excel worksheets, CSV files and web pages). Get & Transform can work with very large data sets (often called Big Data) as it is not restricted to the million-row limit of an Excel worksheet (see sidebar).
  2. Transform the data. This is also done using the Get & Transform tool. Unlike transformations made directly in a worksheet, each transform action is recorded as a PQFL (Power Query Formula Language, also known as M) expression. Both the connection details and the PQFL steps are stored in a query. This means that the query can be re-run whenever the source data changes, avoiding repetitive work in the future.
  3. Relate the data. Power Pivot enables tables to be related in a data model using primary key/foreign key relationships. This provides huge flexibility and avoids the use of any Excel functions (such as VLOOKUP) to relate data.
  4. Aggregate (summarize) the data. While a traditional pivot table can only access data residing in a single table, data models can be analyzed with a new type of pivot table (called an OLAP pivot table). The OLAP pivot table can do just about anything that a regular pivot table can do but can also access data residing in multiple related tables. Power Pivot also enables DAX (Data Analysis Expressions) to be added to the data model. DAX enables calculated columns and calculated measures (aggregations) to be simply defined. The Expert Skills book has a whole session covering DAX in: Session Thirteen: An introduction to DAX.
  5. Visualize the data. OLAP Pivot Tables and OLAP Pivot Charts provide the primary method of visualizing data residing in data models. It is also possible to use any of Excel’s classic analysis and visualization features by using CUBEVALUE functions to extract data directly from a data model into Excel cells. You’ll learn more about the CUBEVALUE function later, in: Lesson 12-20: Use the CUBEVALUE function to query a data model.
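Two ideas from the list above lend themselves to a short Python sketch: a query that stores its transform steps so that it can be re-run, and an aggregation that follows a key relationship instead of consolidating tables. This is an analogy only. It is not PQFL or DAX, and `make_query`, `total_revenue` and the sample data are hypothetical names invented for the illustration.

```python
from collections import defaultdict

def make_query(source, steps):
    """A 'query' bundles a data source with an ordered list of transform steps."""
    def run():
        rows = source()
        for step in steps:
            rows = step(rows)
        return rows
    return run

# Hypothetical source: a list standing in for an external file or database.
raw_orders = [
    {"product_id": "P1", "quantity": "2", "note": "rush"},
    {"product_id": "P2", "quantity": "1", "note": ""},
]

def drop_note(rows):
    """Transform step: remove an unwanted column."""
    return [{k: v for k, v in r.items() if k != "note"} for r in rows]

def quantity_to_int(rows):
    """Transform step: convert a text column to whole numbers."""
    return [{**r, "quantity": int(r["quantity"])} for r in rows]

orders_query = make_query(lambda: raw_orders, [drop_note, quantity_to_int])

# The product table stays separate; orders refer to it by key only
# (product_id acts as the foreign key, the dict key as the primary key).
products = {
    "P1": {"category": "Bikes", "unit_price": 100.0},
    "P2": {"category": "Helmets", "unit_price": 25.5},
}

def total_revenue(orders, products):
    """A measure-like aggregation: defined once, evaluated per category."""
    totals = defaultdict(float)
    for order in orders:
        product = products[order["product_id"]]  # follow the relationship
        totals[product["category"]] += order["quantity"] * product["unit_price"]
    return dict(totals)

first = total_revenue(orders_query(), products)

# The source data changes; re-running the query repeats every stored step.
raw_orders.append({"product_id": "P1", "quantity": "5", "note": "gift"})
refreshed = total_revenue(orders_query(), products)
print(first, refreshed)
```

Nothing had to be redone by hand after the source changed: the stored steps were simply applied again, and no category or price was ever copied into the orders table.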

Additional information (shown in the sidebar of the Expert Skills book)

Power Pivot has been available for selected versions of Excel for a long time

Power Pivot was first released in May 2010 as part of the SQL Server 2008 R2 product line. It was also possible to install Power Pivot onto any version of Excel 2010 as a free add-in.

For Excel 2013 and Excel 2016 the full Power Pivot add-in was only available for Pro Plus versions of Excel.

In the standard version of Excel 2013 it became possible to create a relational data model and analyze the model using an OLAP pivot table or chart. Without access to Power Pivot it wasn’t possible to create DAX calculated fields or calculated measures or to display a schema. The Expert Skills book has a complete session devoted to DAX: Session Thirteen: An introduction to DAX.

Excel 2016 also added the Get & Transform tool (previously named Power Query) as an integral part of all Excel versions.

Excel 2019 (and the Jan 2019 semi-annual Excel 365 release) finally added Power Pivot to all Excel versions. It then became possible to realize the full modern data analysis vision using any Excel version.

Modern data analysis supports Big Data

One of the compelling reasons to move to modern data analysis is the increasing requirement to work with Big Data.

Excel worksheets theoretically support a little over a million rows of data (1,048,576 rows per worksheet), but in reality the useful limit is far lower.

Power Pivot tables can contain up to approximately two thousand million (two billion) rows, which is far more than almost any data analysis requirement is likely to need.
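The difference in scale is easier to appreciate with a little arithmetic. The sketch below uses the worksheet row count (2 to the power of 20) and treats the "two thousand million" figure quoted above as an approximation; the variable names are hypothetical.

```python
# Rows available in a single Excel worksheet: 2 ** 20 = 1,048,576.
WORKSHEET_ROWS = 2 ** 20

# Approximate capacity of a Power Pivot table, using the
# "two thousand million" figure from the text (an approximation).
POWER_PIVOT_ROWS = 2_000 * 1_000_000

# Ceiling division: how many completely filled worksheets would be
# needed to hold the same number of rows.
worksheets_needed = -(-POWER_PIVOT_ROWS // WORKSHEET_ROWS)

print(f"{WORKSHEET_ROWS:,} rows per worksheet")
print(f"about {worksheets_needed:,} full worksheets for {POWER_PIVOT_ROWS:,} rows")
```

In other words, one Power Pivot table can hold roughly as many rows as nineteen hundred completely filled worksheets.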

Modern data analysis is fast

Even when working with data tables containing many millions of rows most analysis actions will complete almost instantly.

Power Pivot uses Microsoft’s xVelocity in-memory analysis engine. This engine can scan billions of data rows per second and can produce reports in a tiny fraction of the time that would be needed by Excel.

Modern data analysis performs most of the work outside Excel

Notice that, in modern data analysis, only Step 5 involves using Excel.

All of the actions that are used to create the data model are performed by the Get & Transform and Power Pivot applications.

This lesson as shown in the Expert Skills e-book Kindle reader for PC

Available for Excel 365, 2019 and 2016 as both an e-book and paper printed book.
