Tableau Performance: Why Embedded Data Source is Faster than Published
April 16, 2017

It started with one of my colleagues who managed a Tableau case from our end. His client was constantly dissatisfied: her workbooks were significantly faster with embedded data sources than with published ones. We told her that this is normal, that this is just how things work, so better accept it and let it go. But no, she wanted an explanation. Well, here it is.

Spoiler: this post is about the Optimize Extract feature in Tableau and how it works internally.

So why is it slower?

There are many things that make published data sources slower than a single workbook with embedded data sources. First of all, published data sources rely on the data server process, which is definitely an additional layer in the processing: first you hit the VizQL server, then the request goes to the data server (which can be located on a separate server host), then to tdeserver. This additional step involves some serialization and deserialization work (although the data itself is binary), but it isn't supposed to be a huge overhead.

Open a Workbook with Published Extract Based Datasource


So no, it's not this additional proxy step that is responsible for the performance difference. Usually the reason is extract optimization, which works differently for published data sources and embedded data sources.

Optimize Extract

Have you ever wondered what that magic submenu option in the Extract menu is? “Optimize“. It must be something great, huh? Sometimes during extract generation you can see that it is optimizing. But what exactly happens behind the scenes? Let's see it in detail. The documentation says that it can do two things:

  • Materialize calculated fields
  • Create acceleration views for relevant quick filters

Now let's switch to practice mode and check how they work internally.

Materialized Calculated Fields

Let's start from scratch with a data connection pointing to a Tableau Server repository. Create a connection to the roles table and make an extracted data source from it:

[Screenshot: connecting to the roles table in the repository]

With columns like:

[Screenshot: columns of the roles table]

Now we should have our columns in our TDE. Let's see how the generated Tableau extract looks inside. First, check what schemas and tables we have:

As you see, we have the usual [SYS] schema where all the extract metadata is stored. This is interesting, but not for now. $TableauMetadata in the [Extract] schema is a key-value metadata store, and it usually holds the data source definition XML. This isn't that interesting either.

However, we also have the [Extract] table in the [Extract] schema, where we find our extracted table. That's something! Let's have a look into it:

We already see the first materialized column, Number of Records, with the value 1 in every row. This column wasn't in the Postgres repository; the extraction process materialized it for its own convenience.
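To see why a constant 1 column is convenient, note that COUNT(*) then becomes a plain SUM over the materialized column, and sums (unlike raw counts) can be re-aggregated at any level. A minimal sketch in Python, with made-up data:

```python
# Toy rows mimicking the extracted [Extract].[Extract] table; the data is made up.
rows = [
    {"name": "admin", "Number of Records": 1},
    {"name": "viewer", "Number of Records": 1},
    {"name": "publisher", "Number of Records": 1},
]

# COUNT(*) becomes SUM([Number of Records]): a sum of ones.
record_count = sum(r["Number of Records"] for r in rows)
print(record_count)  # 3
```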

Let's move forward and check what happens when we define a calculated field. Create a new calculated field called Public Name:


Public Name calculation – this will be a new materialized column in the TDE

And now optimize the extract. Optimization always occurs during extract refresh; however, if you just added a new calculation and want to materialize it immediately, you need to trigger the optimization manually.

Optimize Extract

After triggering the optimization we can sneak into the newly generated extract and observe a new column (scroll to the right):

We have a new column called Calculation_5834061538592075777 that holds exactly the calculation we just added. It was precalculated and materialized, so from now on there is no need to evaluate the branching calculation on the fly.

This pre-calculation is only supported for row-level calculations. Table calculations and level of detail calculations are not materialized/optimized. For FIXED LODs I see no reason why the values could not be materialized, but again, it's not implemented for any type of LOD.
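Conceptually, materialization trades storage for query time: the row-level expression is evaluated once at extract (or optimize) time and stored as an extra column, so later queries read the stored value instead of re-running the branch. A rough Python sketch of the idea (the field names and the helper are invented for illustration, not Tableau internals):

```python
def public_name(row):
    # Row-level branching calculation, analogous to an IF/ELSE calc in Tableau.
    return row["display_name"] if row["display_name"] else row["name"]

rows = [
    {"name": "tfoldi", "display_name": "Tamas Foldi"},
    {"name": "guest", "display_name": ""},
]

# "Optimize Extract": evaluate the calculation once per row, store the result
# as a new column (like Calculation_5834061538592075777 above).
for row in rows:
    row["Calculation_1"] = public_name(row)

# Later queries just read the materialized column -- no branching at query time.
print([row["Calculation_1"] for row in rows])  # ['Tamas Foldi', 'guest']
```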

Quick Filter Materialization

Let's move forward and see what the situation is with filter optimization. This is called acceleration views and is defined as:

When a workbook contains filters that are set to show only relevant values, computing the available values for that filter can be an expensive query. For these filters, Tableau must evaluate the other filters in the workbook first and then compute the relevant values based on their filter selections. To speed up these queries, a view can be created that computes the possible filter values and caches them for faster lookup later.

To check it, add a few quick filters to the workbook and click Optimize again.

[Screenshot: quick filters set to show only relevant values]

After another round of optimization we can check what changed in our extract file. As you can see, there is a new schema called [Cache]. Let's look into it.

As you see, the combinations of quick filter values are materialized into a single table. Using this View#0 table Tableau can quickly render the relevant quick filter values without querying the full data set. (Entry is a metadata table that defines the contents of each acceleration view table.)
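The acceleration view idea can be sketched in a few lines: instead of scanning the whole extract every time a "relevant values" filter needs its domain, the distinct combinations of the filtered fields are precomputed once and cached. A toy Python version, with invented field names:

```python
rows = [
    {"site": "Default", "role": "Viewer"},
    {"site": "Default", "role": "Publisher"},
    {"site": "Marketing", "role": "Viewer"},
    {"site": "Default", "role": "Viewer"},  # duplicate combination
]

# Optimize step: cache the distinct combinations of the quick filter fields,
# analogous to the View#0 table in the [Cache] schema.
acceleration_view = {(r["site"], r["role"]) for r in rows}

# Query step: the relevant values for "role" when site == "Default" come from
# the small cached view, not from a scan of the full data set.
relevant_roles = sorted({role for site, role in acceleration_view if site == "Default"})
print(relevant_roles)  # ['Publisher', 'Viewer']
```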

This is all fine. But what happens if we switch to a published data source?

Published Datasource

Now the fun part. After publishing the data source to the server, the extract preserves the [Cache] schema (the relevant quick filter caches). This is fine.

But after the first extract refresh on the Tableau Server side, the cache schema is gone.

It makes sense: if there is no associated workbook with quick filters, there is nothing to cache. Because at refresh time the server simply doesn't have quick filters.

On the other hand, every workbook using relevant-values filters will be slower with published data sources, simply because of the missing acceleration views.

Workbook level calculations

But this isn't the only drawback of published data sources. As you might know, there are two types of calculations: one is stored in the data source, and the other is stored in the workbook layer.


Data Source level calculations cannot be edited directly



Workbook level calculations can be edited directly

So we have two types of calculations. Data source level calculations live in the server-side extract with materialized columns. But newly defined workbook level calculations cannot be pre-optimized, as the server has no idea about them at extraction time.
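To make the contrast concrete, here is a minimal sketch (field names invented): the data source level calculation is already a stored column in the extract, while the workbook level calculation has to be evaluated row by row on every query, because the server never saw its definition.

```python
# Extract rows as the server stored them: the data source level calc
# ("Calculation_1") was materialized during extract refresh.
extract = [
    {"name": "tfoldi", "Calculation_1": "Tamas Foldi"},
    {"name": "guest", "Calculation_1": "guest"},
]

# Data source level calc: just read the stored column.
ds_values = [row["Calculation_1"] for row in extract]

# Workbook level calc (e.g. UPPER([name])): unknown to the server at
# extraction time, so it is computed on the fly for every query.
wb_values = [row["name"].upper() for row in extract]

print(ds_values)  # ['Tamas Foldi', 'guest']
print(wb_values)  # ['TFOLDI', 'GUEST']
```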

Bottom line: if your users have lots of workbook level calculations on top of published data sources, they'll be unhappy at some point. This is what happened with my colleague: his client used many advanced calculations on top of IT-supported published data sources. When she made a local copy it was fast, but when she switched back to the published data source it slowed down. No surprise here: this is expected.


To sum up, here is what to keep in mind:

  1. Proxying data source requests comes with a minor penalty. Plus, the data server is an additional process within the flow, with its own problems (like the SAP HANA issue).
  2. Tableau can cache quick filter values (acceleration views), but only for embedded data sources.
  3. Workbook level calculations never get materialized.

There are even more tricks to speed things up, but maybe next time.

Do you need more information or help to make your system significantly faster? Just drop me a line.



Tamás Földi
