It started with one of my colleagues who managed a Tableau case from our end. His client was constantly dissatisfied: her workbooks were significantly faster with embedded data sources than with published ones. We told her that this is normal, that this is just how life works, so better to accept it and let it go. But no, she wanted an explanation. Well, here it is.
Spoiler: this post is about the Optimize Extract feature in Tableau and how it works internally.
So why is it slower?
There are many things that make published data sources slower than a single workbook with embedded data sources. First of all, published data sources rely on the data server process, which is an additional layer in the processing chain: a request first hits the VizQL server, then goes to the data server (which can be located on a separate host), and finally to tdeserver. This extra hop involves some serialization and deserialization work (although the data itself stays in binary form), but it isn’t supposed to be a huge overhead.
So no, this additional proxy step is not what is responsible for the performance difference. Usually the reason is the Optimize Extract feature, which works differently for published and embedded data sources.
Optimize Extract
Have you ever wondered what that magic “Optimize” option in the Extract submenu does? It must be something great, huh? Sometimes during extract generation you can see that it is optimizing. But what exactly is happening behind the scenes? Let’s see it in detail. The documentation says it can do two things:
- Materialize calculated fields
- Create acceleration views for relevant quick filters
Now let’s switch to practice mode and check how these work internally.
Materialized Calculated Fields
Let’s start from scratch with a data connection pointing to a Tableau Server repository. Create a connection to the roles table and make an extracted data source from it:
With columns like:
Now we should have our columns in the TDE. Let’s see how the generated Tableau extract looks inside. First, check what schemas and tables we have:
tab.tcp://[::1]:27042> (database "/Users/tfoldi/Documents/My Tableau Repository/Datasources/roles (workgroup).tde")
No output
tab.tcp://[::1]:27042> (show schemas)
+------------+
|SCHEMAS_NAME|
+------------+
|Extract     |
|SYS         |
+------------+
tab.tcp://[::1]:27042> (show tables [Extract])
+----------------+
|TABLES_NAME     |
+----------------+
|$TableauMetadata|
|Extract         |
+----------------+
As you can see, we have the usual [SYS] schema, where all the extract metadata is stored. Interesting, but not for now. $TableauMetadata in the [Extract] schema is a metadata key-value store that usually holds the data source definition XML. Not that interesting either.
However, we also have the [Extract] table in the [Extract] schema, and that is where we find our extracted table. That’s something! Let’s have a look into it:
tab.tcp://[::1]:27042> (top (table [Extract].[Extract]) () 5)
+-----------------+--------------+--------------------+----------+-------------+-------------+-----------+--+-------------+--------------------+
|Number of Records|administrative|created_at          |creator_id|display_name |display_order|displayable|id|name         |updated_at          |
+-----------------+--------------+--------------------+----------+-------------+-------------+-----------+--+-------------+--------------------+
|1                |False         |2016-10-30 0:19:16.0|NULL      |Editor       |40           |True       |2 |editor       |2016-10-30 0:19:16.0|
|1                |False         |2016-10-30 0:19:16.0|NULL      |Interactor   |30           |True       |3 |interactor   |2016-10-30 0:19:16.0|
|1                |False         |2016-10-30 0:19:16.0|NULL      |Viewer       |10           |True       |1 |viewer       |2016-10-30 0:19:16.0|
|1                |True          |2016-10-30 0:19:16.0|NULL      |Administrator|50           |False      |4 |administrator|2016-10-30 0:19:16.0|
|1                |True          |2016-10-30 0:19:16.0|NULL      |Publisher    |45           |False      |8 |publisher    |2016-10-30 0:19:16.0|
+-----------------+--------------+--------------------+----------+-------------+-------------+-----------+--+-------------+--------------------+
We can already see the first materialized column, Number of Records, with the value 1 everywhere. This column wasn’t in the postgres repository; the extraction process materialized it for its own convenience.
Let’s move forward and check what happens when we define a calculated field. Create a new calculated field called Public Name:
Public Name calculation – This will be a new materialized column in the TDE
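The screenshot shows the formula; based on the materialized values we are about to see (displayable roles keep their display name, everything else becomes Unknown), it is something along these lines. Take this as a reconstruction, not necessarily the exact formula:

// Public Name: show the display name only for displayable roles
IF [displayable] THEN
    IFNULL([display_name], "Unknown")
ELSE
    "Unknown"
END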
Now let’s optimize the extract. Optimization always runs during an extract refresh; however, if you have just added a new calculation and want to materialize it immediately, you need to trigger the optimization manually.
After triggering the optimization we can sneak into the newly generated extract and observe a new column (scroll to the right):
tab.tcp://[::1]:27042> (top (table [Extract].[Extract]) () 5)
+-----------------+--------------+--------------------+----------+-------------+-------------+-----------+--+-------------+--------------------+-------------------------------+
|Number of Records|administrative|created_at          |creator_id|display_name |display_order|displayable|id|name         |updated_at          |Calculation_5834061538592075777|
+-----------------+--------------+--------------------+----------+-------------+-------------+-----------+--+-------------+--------------------+-------------------------------+
|1                |False         |2016-10-30 0:19:16.0|NULL      |Editor       |40           |True       |2 |editor       |2016-10-30 0:19:16.0|Editor                         |
|1                |False         |2016-10-30 0:19:16.0|NULL      |Interactor   |30           |True       |3 |interactor   |2016-10-30 0:19:16.0|Interactor                     |
|1                |False         |2016-10-30 0:19:16.0|NULL      |Viewer       |10           |True       |1 |viewer       |2016-10-30 0:19:16.0|Viewer                         |
|1                |True          |2016-10-30 0:19:16.0|NULL      |Administrator|50           |False      |4 |administrator|2016-10-30 0:19:16.0|Unknown                        |
|1                |True          |2016-10-30 0:19:16.0|NULL      |Publisher    |45           |False      |8 |publisher    |2016-10-30 0:19:16.0|Unknown                        |
+-----------------+--------------+--------------------+----------+-------------+-------------+-----------+--+-------------+--------------------+-------------------------------+
tab.tcp://[::1]:27042> (describe [Extract].[Extract])
+-------------------------------+-------------+-------------------------------+
|Column                         |Property     |Value                          |
+-------------------------------+-------------+-------------------------------+
[...]
|Calculation_5834061538592075777|name         |Calculation_5834061538592075777|
|Calculation_5834061538592075777|type         |CharString                     |
|Calculation_5834061538592075777|cardinality  |8                              |
|Calculation_5834061538592075777|collation    |en_US                          |
|Calculation_5834061538592075777|comparable   |True                           |
|Calculation_5834061538592075777|compression  |heap                           |
|Calculation_5834061538592075777|count-value  |10                             |
|Calculation_5834061538592075777|distinct     |True                           |
|Calculation_5834061538592075777|max-value    |'Viewer'                       |
|Calculation_5834061538592075777|min-value    |'Administrator'                |
|Calculation_5834061538592075777|not-null     |True                           |
|Calculation_5834061538592075777|precision    |255                            |
|Calculation_5834061538592075777|scale        |1                              |
|Calculation_5834061538592075777|size         |1020                           |
|Calculation_5834061538592075777|storagewidth |1                              |
+-------------------------------+-------------+-------------------------------+
We have a new column called Calculation_5834061538592075777 that holds exactly the values of the calculation we just added. It was precalculated and materialized, so from now on there is no need to evaluate the branching calculation on the fly.
This pre-calculation is only supported for row-level calculations. Table calculations and level of detail calculations are never materialized/optimized. For FIXED LODs I see no reason why the values could not be materialized as well, but as of now it is not implemented for any type of LOD.
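To make the distinction concrete, here is an illustrative pair of calculations against the same roles table (the field names come from the extract above; the formulas are made up for this example). The first one is row level and thus eligible for materialization; the second one is a FIXED LOD and always stays a query-time expression:

// Row level: depends only on the current row, can be materialized
UPPER([name])

// FIXED LOD: aggregates across rows, never materialized
{ FIXED [administrative] : COUNT([id]) }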
Quick Filter Materialization
Let’s move forward and see what the story is with filter optimization. This feature is called Acceleration Views and is defined as follows:
When a workbook contains filters that are set to show only relevant values, computing the available values for that filter can be an expensive query. For these filters, Tableau must evaluate the other filters in the workbook first and then compute the relevant values based on their filter selections. To speed up these queries, a view can be created that computes the possible filter values and caches them for faster lookup later.
To check it, add a few quick filters (set to show only relevant values) to the workbook and click Optimize again.
After another round of optimization we can check what changed in our extract file. There is a new schema called [Cache]. Let’s look into it.
tab.tcp://[::1]:27042> (database "/var/folders/8w/q0_7rc6n5w14x4cdx0vgw9280000gp/T/tableau-temp/TEMP_1z00ihn1oakevy1h1sp0611h5f8g.tde")
No output
tab.tcp://[::1]:27042> (show schemas)
+------------+
|SCHEMAS_NAME|
+------------+
|Cache       |
|Extract     |
|SYS         |
+------------+
tab.tcp://[::1]:27042> (show tables [Cache])
+-----------+
|TABLES_NAME|
+-----------+
|Entry      |
|View#0     |
+-----------+
tab.tcp://[::1]:27042> (table [Cache].[View#0])
+---------------------+-------------------------+--------------------------------+
|display_name_        |name_                    |Calculation_5834061538592075777_|
+---------------------+-------------------------+--------------------------------+
|Editor               |editor                   |Editor                          |
|Interactor           |interactor               |Interactor                      |
|Viewer               |viewer                   |Viewer                          |
|Administrator        |administrator            |Unknown                         |
|Publisher            |publisher                |Unknown                         |
|Project Leader       |project_leader           |Project Leader                  |
|Publisher            |project_default          |Publisher                       |
|Data Source Connector|ds_connector             |Data Source Connector           |
|Data Source Editor   |ds_editor                |Data Source Editor              |
|NULL                 |write_on_default_project |Unknown                         |
+---------------------+-------------------------+--------------------------------+
As you can see, the combinations of quick filter values are materialized into a single table. Using this View#0, Tableau can quickly render the relevant quick filter values without querying the full data set. (Entry is a metadata table that defines the contents of each acceleration view table.)
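If you want to dig further, the same shell commands work on the [Cache] schema as well. For instance, you can describe or sample the Entry table; I omit the output here, as its contents depend on the actual workbook and its filters:

tab.tcp://[::1]:27042> (describe [Cache].[Entry])
tab.tcp://[::1]:27042> (top (table [Cache].[Entry]) () 5)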
This is all fine. But what happens if we switch to a published data source?
Published Datasource
Now the fun part. After publishing the data source to the Server, the extract preserves the [Cache] schema (the relevant quick filter caches). So far so good.
tab.tcp://[::1]:27042> (database "/tmp/Data/Extracts/federated_1j59n1z0h01y4v1ae1c6b0.tde")
No output
tab.tcp://[::1]:27042> (show schemas)
+------------+
|SCHEMAS_NAME|
+------------+
|Cache       |
|Extract     |
|SYS         |
+------------+
But after the first extract refresh on the Tableau Server side, the [Cache] schema is gone.
tab.tcp://[::1]:27042> (database "/tmp/Data/Extracts/roles _workgroup_.tde")
No output
tab.tcp://[::1]:27042> (show schemas)
+------------+
|SCHEMAS_NAME|
+------------+
|Extract     |
|SYS         |
+------------+
It makes sense: a published data source has no associated workbook with quick filters, so there is nothing to cache. You cannot cache quick filters you do not have.
On the other hand, this means every workbook that uses relevant-values filters will be slower with published data sources, simply because of the missing acceleration views.
Workbook level calculations
But this isn’t the only drawback of published data sources. As you might know, there are two types of calculations: one is stored in the data source, the other in the workbook layer.
Data Source level calculations cannot be edited directly
Workbook level calculations can be edited directly
So we have two types of calculations. Data source level calculations live in the extract on the server and get materialized columns, as we saw above. But your newly defined workbook level calculations cannot be pre-optimized, since the server has no idea about them at extraction time.
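You can double-check this with the describe trick from earlier: after a server-side refresh, describing the extract table lists Calculation_* columns only for data source level calculations; a calculation defined at the workbook layer never shows up as a materialized column:

tab.tcp://[::1]:27042> (describe [Extract].[Extract])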
Bottom line: if your users build lots of workbook level calculations on top of published data sources, they will be unhappy at some point. This is what happened in my colleague’s case: his client used many advanced calculations on top of IT-supported published data sources. When she made a local copy it was fast, but when she switched back to the published data source it slowed down. No surprise here: this is expected.
Conclusion
To sum up, here is what to keep in mind:
- Proxying data source requests comes with a minor penalty. Plus, the data server is an additional process in the flow, with its own problems (like the SAP HANA issue).
- Tableau can cache relevant quick filter values (acceleration views), but only for embedded data sources.
- Workbook level calculations never get materialized.
There are even more tricks to speed things up, but maybe next time.
Do you need more information or help to make your system significantly faster? Just drop me a line.