Reading from Tableau Data Extracts – The complete story
May 7, 2015

Testing is everything. Without proper unit tests we cannot build anything; this is one of the most important aspects of any software development methodology. When I wrote a Clojure wrapper around the Tableau Data Extract API, I had difficulty writing good unit tests, simply because there is no easy automated way to validate the contents of TDE files (without Server). How should I check that all the data I wrote to an extract file actually matches the expected values? Well, using only Tableau tools, I cannot.

Luckily, I have a solution for this, like for everything.

The history

Back in 2012, when Tableau released the TDE API, it was write-only and Windows-only. Both limitations were cumbersome. If I want to write a robust ETL process, I need to know the last/max values of different columns. Also, most of my customers run their ETL applications on Unix systems. They demanded a solution from my team. We are consultants, we love money, and for demand we always have supply.
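To see why read access matters for ETL, consider an incremental load: before appending, the job needs the highest key already stored in the extract. Here is a minimal sketch in Python, assuming rows are plain dictionaries and that a hypothetical `order_id` column only ever grows. The function name and the sample data are made up for illustration and are not part of the TDE API.

```python
def rows_to_append(source_rows, extract_rows, key):
    """Return the source rows that are newer than anything already in the extract."""
    if not extract_rows:
        # Nothing loaded yet: everything is new.
        return list(source_rows)
    # The "last/max value" that a write-only API cannot tell us.
    high_water_mark = max(row[key] for row in extract_rows)
    return [row for row in source_rows if row[key] > high_water_mark]


# Hypothetical data: two rows already in the extract, three in the source.
existing = [{"order_id": 1, "amount": 10}, {"order_id": 2, "amount": 25}]
incoming = existing + [{"order_id": 3, "amount": 7}]

new_rows = rows_to_append(incoming, existing, key="order_id")
```

Without a way to read `existing` back from the extract, the job has to keep this state somewhere else or rebuild the whole file on every run.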

The first “native” version

I grew up in Eastern Europe in the era of socialism. We did not have fancy computers at home and, to be honest, we had no money for the latest cutting-edge applications. When we wanted to play with a game or an application, the first reflex was to start SoftICE or some other disassembler and start looking into the registers to catch something. This was decades ago and I am not a rioting teenager anymore, but I still remember the flow: how things need to be disassembled in order to build on top of them.

Back to the original topic. If you have looked into the DataExtract.log file, you have seen what is going on. When you work with tdeserver64 (the database server that handles extract-related operations), it listens on a named pipe or on network sockets. The named pipe is the default from Desktop or from the Extract API, while TCP sockets are used in server mode.

My first thought was to solve all the problems in one round: reverse engineer the network protocol and build a library on Linux that connects to the tdeserver process on the Windows machine. It was such a great idea that I even started to work on the first version after my New Year’s Eve hangover, on January first:

Figure (tde_without_api): creating a TDE database with a direct connection to tdeserver, worked out byte by byte.

First of all, it is not so easy to sniff data on localhost. Unlike on *nix boxes, the loopback device cannot be put into promiscuous mode. The same applies to named pipes: dumping the I/O calls of other processes is pretty inconvenient. The solution was Detours. Detours is an API rerouting library that helps you hook into applications and Windows API calls. It allows you to inject your DLLs into any application, so we can safely say that Detours is your best friend right after dogs.
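Once the traffic is captured, going through it byte by byte is much easier with a hex dump. The helper below is a generic sketch in Python, not the tooling used in this project, and the sample bytes are invented rather than taken from a real capture.

```python
def hexdump(data, width=16):
    """Format bytes as offset, hex columns and printable ASCII, one row per `width` bytes."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        # Replace non-printable bytes with a dot so the text column stays aligned.
        text_part = "".join(chr(byte) if 32 <= byte < 127 else "." for byte in chunk)
        # Pad the hex column so a short final row still lines up.
        lines.append(f"{offset:08x}  {hex_part.ljust(width * 3 - 1)}  {text_part}")
    return "\n".join(lines)


# Invented payload, only to show the layout of the output.
sample = b"TDE\x00\x01"
dump = hexdump(sample)
```

Printing `dump` for each intercepted read and write makes repeating headers and length fields much easier to spot.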

After I had everything together, I wrote the first version, in which the library was platform independent and used plain TCP socket calls. You were still required to run the server on Windows, but the client was platform independent.
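A client built on plain TCP sockets stays portable because it needs nothing beyond the socket API. The sketch below shows the general shape in Python. The framing (a 4-byte big-endian length followed by the payload) is purely an assumption for illustration: the actual tdeserver protocol is not described here and is not reproduced.

```python
import socket
import struct


def send_frame(sock, payload):
    """Send one message using a hypothetical 4-byte length prefix."""
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def recv_exact(sock, size):
    """Read exactly `size` bytes; TCP may deliver them in several pieces."""
    buffer = bytearray()
    while len(buffer) < size:
        chunk = sock.recv(size - len(buffer))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        buffer.extend(chunk)
    return bytes(buffer)


def recv_frame(sock):
    """Receive one length-prefixed message."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)


# Local demonstration with a connected socket pair instead of a real server.
client, server = socket.socketpair()
send_frame(client, b"hello")
reply = recv_frame(server)
client.close()
server.close()
```

In a real client, the socket pair would be replaced by `socket.create_connection((host, port))` pointing at the Windows machine. Both values depend on how the server was started.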

It worked, but I felt that it wasn’t the best solution. I was able to read and write most of the data types, but handling value arrays was relatively painful. So I moved forward.

Disassemble & Decompile

When you open IDA Pro, you start feeling the power of the dark side. Very addictive. It turned out that I had been wrong the whole time: it is much easier to link directly against the Tableau DLLs than to deconstruct the network communication. I made a nice command line tool to get the contents of a TDE file.


Export contents of TDE files to CSV

I was proud, so I dropped a mail to Tableau and asked whether I could publish this or not.

End of story

A few days later I got a friendly reminder citing the license:

You shall not (and shall not allow any third party to): (a) decompile, disassemble, or otherwise reverse engineer the Software or Media Elements or attempt to reconstruct or discover any source code, underlying ideas, algorithms, file formats or programming interfaces of the Software or Media Elements by any means whatsoever

The whole project was moved to the shelf until last week. Then I used these tools and libraries for automated testing of my TDE library. The tests passed, so it will go back to that shelf and will probably remain there forever.
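The kind of check such a test performs can be sketched in a few lines of Python. This assumes the extract has already been exported to CSV text with a header row. The helper names and the sample data are hypothetical, and no Tableau library is called.

```python
import csv
import io


def read_csv_rows(csv_text):
    """Parse CSV text with a header row into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def diff_rows(expected, actual):
    """Return human-readable mismatches between expected rows and exported rows."""
    problems = []
    if len(expected) != len(actual):
        problems.append(f"row count: expected {len(expected)}, got {len(actual)}")
    for index, (want, got) in enumerate(zip(expected, actual)):
        for column, value in want.items():
            # CSV carries only text, so compare against the string form.
            if got.get(column) != str(value):
                problems.append(
                    f"row {index}, column {column!r}: "
                    f"expected {value!r}, got {got.get(column)!r}"
                )
    return problems


# Hypothetical export of a two-row extract.
exported = "id,name\n1,apple\n2,pear\n"
expected = [{"id": 1, "name": "apple"}, {"id": 2, "name": "pear"}]
problems = diff_rows(expected, read_csv_rows(exported))
```

An empty `problems` list means the rows written to the extract came back unchanged. Comparing string forms is a simplification: dates and floating point values would need a more careful comparison.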

 

 


Do you have a crazy idea and are you looking for a solution? Share it with us!

Tamás Földi


Director of IT Development at Starschema
Decades of experience with data processing and state-of-the-art programming, from nuclear bomb explosion simulation to distributed file systems, ethical hacking, and real-time stream processing. Practically, I have always had great fun with those geeky ones and zeros.
  • Siraj Samsudeen

    Hi Tamas, is there any update on reading from a TDE file after the release of 10.1? I am in a project where reading the TDE file would be of great help.
