FOR BI PROS Tableau
Tableau Filestore Consistency Checker – How Repository Maps to Filestore
August 17, 2017
0
, , , , ,

Somewhere around the end of July I was invited to show some magic from our consultancy portfolio in the Server Admin User Group’s virtual meetup. I choose (no, actually Mark choose) two topics: the first was automated query banding (practically server side search and replace to programmatically fill initial sqls for workbooks and data sources) while the second was this consistency checker tool. Why should you care about your repository consistency? Read more, I’ll explain.

First of all, if you prefer to entertained, especially in and audiovisual way then you can see it with your beautiful eyes from 32:07:

If not, then let me explain.

Why?

It’s simple: because that’s how IT works. That’s how computer system works. If you store half of the data in one location while the rest in an another it will be eventually inconsistent. Maybe something went wrong with the power cord or the host VM. Maybe Tableau’s online backup feature is not that consistent. Anyhow, it’s advised to check the state of the two repositories: is the metadata in postgres repository matches with the filestore contents? 

If not, users might face into issues that you could detect before them: non-opening workbooks referencing missing extracts. Or on the other hands you miss the opportunity to delete orphaned extract files to free up some precious tableau server storage.

Filestore introduction

The FileStore in Tableau Server is used to store large objects and datasets that would be -due to their size- inefficient to keep in the repository (PostgreSQL) database.

While its inner workings are quite complex, for our simple use-case we are only interested in the following properties:

  • stores the up-to-date extracts for the datasources and workbooks in <TABLEAU>/tabsvc/data/dataengine/extract
  • stores the historical versions of workbooks and datasources in <TABLEAU>/tabsvc/data/dataengine/revision prior version 10.3. From Tableau Server 10.3 Tableau stores reduced data (only the xml from packaged objects), thus storing prior versions is feasible using postgres large objects exclusively.

Our first job will be finding where these file locations are coming from.

Step 1. – find the extract file paths

Lets look at some extract paths:
c:\ProgramData\Tableau\Tableau Server\data\tabsvc\dataengine\extract\14\9E\{DE60E61D-5540-4E0B-8E41-2BE28A2CEE85}\dataengine_42038_846130138889.tde

Just pulling this path apart, the important bits are (in our own naming convention):

  • Tableau base path: c:\ProgramData\Tableau\Tableau Server\data
  • Filestore root: tabsvc\dataengine\extract
  • Distribution keys: 14\9E
  • GUID / Content Id : {DE60E61D-5540-4E0B-8E41-2BE28A2CEE85}
  • Extract filename : dataengine_42038_846130138889.tde

Where do these come from?

Tableau base path + filestore root

The first two parts should be trivial.
GUID + Extract filename

(we’ll get back to the distribution keys in a sec)

After doing a little grepping around in the PostgreSQL repository, we can find the corresponding column(s):

And with a little digging around the DB, we can come up with a query for this:

To find all datasource guids and filenames (extracts for datasources):

To find all extract guids and filenames (extracts for embeded datasources):

GUID and filename for Workbook and Datasource revisions

After doing the same greping as we did for finding the extract GUIDs, we find that the historical versions are even easier:

In our theory the GUID and the file name seems to be the same for datasource versions.

To find all workbook versions:

To find all datasource versions:

Where do these come from? (part 2.)

Distribution keys:

In our theory crafting, these are here to spread the files so we wont hit the OS max files/directory limit

They are generated by:

  • take the SHA-256  (with the default Java parameterization, the GNU command-line SHA256 will give different results)
    hash of the GUID  (or CONTENT ID  for revisions). Use online SHA256 for testing like www.xorbin.com/tools/sha256-hash-calculator
  • take the bits 0-8 and bits 12-20 of the hash and their hexadecimal representations are the two distribution keys

For example:

  • GUID is {DE60E61D-5540-4E0B-8E41-2BE28A2CEE85}
  • SHA-256 is 14.9E..........................  (dots are not important)
  • Distribution keys are 14  and 9E

We are not there yet

So everything is fine and dandy except the filenames in the repo differ from what we were getting from the file system for extracts (revisions seem to be just fine)
Well, there are some rules to this:

  1. Replace anything in the file name that is not an alphanumeric caracter OR A SPACE (!) with an underscore ( _ )
  2. truncate the resulting name to 32 characters
  3. append .tde  extension

And some legacy rules:

At times, the file may come from a legacy version of tableau, and in this case the postfix leaf  is added to the file name before step 2.

Tableau gave us a better explanation:

From: “XXXX” xyz@tableau.com

[…]

“leaf” was appended to data connection names for pre-10.0 data sources that were upgraded to federated connections to make sure the connection names were distinct from data source names. So the servers that have “leaf” files have probably been around for a while. There is nothing special about the postfix otherwise.

And even more legacy rules:

Some filenames we get from the repo (for example: dataengine.1drcocf045eval105hqv70juo5cs (copy) ) have strange filenames in the store:

...\extract\14\9E\{DE60E61D-5540-4E0B-8E41-2BE28A2CEE85}\dataengine_1drcocf045eval105hqv7_1.tde In our experience, datasources that were copied will have an underscore ( _ ) and an incremental number added AFTER STEP 2 (after the truncation).

In this case, the (copy)  postfix seems to be removed from the name BEFORE STEP 1.

Putting everything together in an application

We built an open-source version of the resulting tool to showcase this functionality. The tool is capable of checking the existence and proper location of the following entities:

  • current versions of extracts for data sources
  • current versions of extracts embeded into workbooks
  • historical versions of workbooks
  • historical versions of datasources

Source code is here: https://gitlab.com/brilliant-data/tableau-filestore-consistency-checker

Tamás Földi

Tamás Földi

Director of IT Development at Starschema
Decades of experience with data processing and state of the art programming. From nuclear bomb explosion simulation to distributed file systems. ethical hacking, real time stream processing practically I always had a great fun with those geeky ones and zeros.
Tamás Földi

Related items

/ You may check this items as well

TCOT

Pushing the Limits of Tableau Server – Practical Lessons from a Zen Master

Somewhere around the end of July I was invited to ...

Read more
Sales dashboard embedded into SAP GUI

Embed Tableau Dashboards in SAP ERP and S4/HANA

Somewhere around the end of July I was invited to ...

Read more
speed-1249610_640

Tableau Performance: Why Embedded Data Source is Faster than Published

Somewhere around the end of July I was invited to ...

Read more