Introducing TableauFS: File System on Tableau Server Repository
May 16, 2015

We have so many APIs for Tableau Server, but to be honest even the simplest things cannot be achieved without an extensive amount of work. Let’s see what my colleagues and clients have demanded from me:

  1. Easy way to move workbooks and data sources across servers (large enterprise environment, between multi-tenant systems)
  2. Version control workbooks
  3. Search by workbook/data source contents, like where a particular connection or table is used in published workbooks
  4. Point in time recovery of workbooks, backup individual projects or workbooks (similar to version control)
  5. Mass-change workbooks and data sources (we have Tableau servers with 5,000-10,000 workbooks/data sources)

Let’s see how a file system would solve these issues:

  1. Move workbooks? Just copy file between servers like “scp dir1/file server2:dir1/”
  2. Version control? Use git with annex or bup
  3. Search contents? grep or zipgrep on files
  4. Run tools like TWB Auditor to understand your workbooks’ contents
  5. Point in time recovery? git or rsync contents to a snapshot aware file system
  6. Mass change? sed, ruby, python, etc., or a twbx editing tool like Power Tools, directly on the server files
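
As a rough illustration of the mass-change item, here is a minimal Python sketch that replaces a string in every plain XML workbook and data source under a mount point. It assumes the server is mounted read-write; the function name replace_in_workbooks and the /mnt/tableau path in the usage note are made up for this example and are not part of TableauFS.

```python
from pathlib import Path


def replace_in_workbooks(root, old, new, suffixes=(".twb", ".tds")):
    """Replace `old` with `new` in every plain XML file under `root`.

    Packaged (zip) files are skipped because their XML is compressed.
    Returns the list of paths that were actually rewritten.
    """
    old_b, new_b = old.encode("utf-8"), new.encode("utf-8")
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        data = path.read_bytes()          # bytes keep line endings untouched
        if old_b in data:
            path.write_bytes(data.replace(old_b, new_b))
            changed.append(path)
    return changed
```

A call such as replace_in_workbooks("/mnt/tableau", "old_db", "new_db") would then rewrite the matching files in place, so try it on a test server or a copy first.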

You see, this is why you desperately need a file system for your published data. Good news everyone, last week I wrote one…

…and I called it TableauFS (pragmatism over creativity). It’s a FUSE-based userspace file system driver built in pure ANSI C (for performance and fun) on top of Tableau’s repository server. It lets you mount Tableau servers, with all their data sources and workbooks, directly into the file system. File information and contents are retrieved on access without any local persistence or caching, so when you cat a file it goes to Tableau and retrieves the chunks one by one.

The file system connects directly to the PostgreSQL repository database, using the readonly credentials for read-only mode, or tblwgadmin or postgres for read-write access.

TableauFS in action – list project contents and read their files

Do you want it? Sure you do.

Installation

You need five packages in advance to compile it: fuse-devel, postgresql-devel, cmake, make and gcc. To work with workbooks or data sources larger than 2GB you need PostgreSQL 9.3 or later; with older versions the file size limit is 2GB.

You can clone the source from https://github.com/tfoldi/fuse-tableaufs.

To build the binaries, just type cmake . && make && make install and you will have everything installed. The executable is installed as /usr/local/bin/tableaufs, but you can also use mount directly.

Configuring the Tableau Server database

To exploit all features (including read-write mode) you need tblwgadmin or a similar user with superuser privileges, while for read-only access the readonly user is almost enough. Unfortunately, Tableau’s readonly user does not have select access on pg_largeobject (as Jonathan Macdonald discovered in this post), so you have to log on as tblwgadmin and grant select on pg_largeobject to the readonly user to get the full read-only experience. This will not harm your system (access is still read only), but it is unsupported.

Enable and grant select to readonly user


The steps are:

  1. Enable the readonly user with the tabadmin dbpass --username readonly <password> command, as documented here
  2. Check your pgsql admin password in the tabsvc.yml file. The default location is C:\ProgramData\Tableau\Tableau Server\config, but this can differ depending on where your ProgramData folder is. Please note that the ProgramData folder can be hidden.
  3. Go to the Tableau Server\9.0\pgsql\bin folder, issue the psql -h localhost -p 8060 -U tblwgadmin workgroup command and paste the password from tabsvc.yml
  4. Execute the grant select statement

Usage

You can mount it with the standard Unix mount command:

“ro” stands for read-only mode; the default mode is rw. You can also mount your server directly with the tableaufs command:

To unmount, simply:

Basic stuff

TableauFS maps the Tableau repository to the following directory structure:

You can enter each directory, list and stat files, and run find without any limitation. Packaged and non-packaged objects have different file extensions: twbx and tdsx are packaged, while twb and tds are plain XML files. You can read, grep, find, search and edit them just like regular files. Whatever you do is executed on Tableau Server; the FS does not cache or store blocks locally.
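
To make the packaged versus plain distinction concrete, here is a small Python sketch that walks a mounted tree and counts files by kind, based only on the file extension. The names classify and inventory are invented for this illustration, and any mount path you pass in is your own choice.

```python
from collections import Counter
from pathlib import Path

PACKAGED = {".twbx", ".tdsx"}   # zip archives holding XML plus extras
PLAIN_XML = {".twb", ".tds"}    # plain XML documents


def classify(path):
    """Return 'packaged', 'plain' or 'other' based on the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix in PACKAGED:
        return "packaged"
    if suffix in PLAIN_XML:
        return "plain"
    return "other"


def inventory(root):
    """Count the files under `root` by kind."""
    return Counter(classify(p) for p in Path(root).rglob("*") if p.is_file())
```

Because every stat and directory listing is served by the repository database, a walk like this over a large server issues many small queries, so expect it to take a while.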

Search in workbooks

We will explore this topic in detail in some of my forthcoming posts, but let me just note that you can easily search inside XML and zipped XML objects. In the example below I used zipgrep to list all data connections in a packaged workbook. No tabcmd get, no logging on to the Tableau web portal, no REST API. Just plain Unix commands:

Working with TableauFS
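
If zipgrep is not at hand, the same kind of search can be sketched in Python with the standard zipfile module. This is only an illustration, not what the screenshot shows: grep_packaged is a made-up helper name, and the pattern you search for depends on what your workbooks actually contain.

```python
import re
import zipfile


def grep_packaged(archive, pattern, member_suffixes=(".twb", ".tds")):
    """Return (member, line_number, line) for each matching XML line.

    Only members that look like workbook or data source XML are searched;
    extracts and images inside the package are ignored.
    """
    regex = re.compile(pattern)
    hits = []
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(member_suffixes):
                continue
            text = zf.read(name).decode("utf-8", errors="replace")
            for number, line in enumerate(text.splitlines(), 1):
                if regex.search(line):
                    hits.append((name, number, line.strip()))
    return hits
```

Calling grep_packaged on a twbx file with a pattern such as r"<connection\b" lists the lines that open a connection element, assuming the workbook XML uses that element name.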


Editing existing workbooks is also possible, just check out this thread: http://community.tableau.com/message/369406#369406

Version control & object based point in time recovery

One of the best things about a file system is that you can snapshot or version control its contents. You can expect an extensive post on how to automatically version control and back up all (or selected) Tableau objects, and how to view the differences between changes in a human-readable way using only open source tools. In the meantime, just to keep you entertained, here is an example of how to create a new git repository and add all of your Tableau workbooks and data sources to it:

Adding all Tableau Server workbooks to git repository using tableau fs


Looks nice? Wait until I show you my set of git extensions for managing zip-packaged objects in a git repo.
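
Until then, a human-readable diff between two versions of a plain XML workbook can be approximated with Python's standard difflib module. This is a stand-in and not the git-based tooling promised above; workbook_diff is a hypothetical helper, and the two paths could be, for example, a saved snapshot and the live file on the mount.

```python
import difflib
from pathlib import Path


def workbook_diff(old_path, new_path):
    """Return a unified diff (as a list of lines) between two XML files."""
    old_lines = Path(old_path).read_text(encoding="utf-8").splitlines()
    new_lines = Path(new_path).read_text(encoding="utf-8").splitlines()
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile=str(old_path),
            tofile=str(new_path),
            lineterm="",   # lines carry no newline, so add none to headers
        )
    )
```

Printing the result with print("\n".join(workbook_diff(a, b))) gives the familiar - and + prefixed lines. Packaged files would need to be unzipped first.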

Performance

I love speed and performance, especially when it matters and in a file system it definitely does. Everything is written in pure ANSI C, using only fuse and postgres client libraries.

On my laptop with a virtual Tableau Server, the IO throughput is between 15 and 35 MB/sec, which is definitely not bad for a network file system.
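
If you want a comparable number for your own setup, a sequential read can be timed with a few lines of Python. This is only a sketch and not necessarily how the figures above were measured; read_throughput is a made-up name and the 1 MB chunk size is an arbitrary choice.

```python
import time


def read_throughput(path, chunk_size=1024 * 1024):
    """Read `path` sequentially and return (bytes_read, megabytes_per_second)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as fh:   # no Python-level buffering
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:                       # empty read means end of file
                break
            total += len(chunk)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return total, total / (1024 * 1024) / elapsed
```

Pick a reasonably large workbook on the mount for a meaningful figure; very small files mostly measure query latency rather than throughput.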

Do you have a question or a good idea for making it better? Drop me a line or ping me on Twitter (@tfoldi).

Tamás Földi


Director of IT Development at Starschema
Decades of experience with data processing and state-of-the-art programming. From nuclear bomb explosion simulation to distributed file systems, ethical hacking and real-time stream processing, I have always had great fun with those geeky ones and zeros.
  • Matthew

    This is awesome! (and thanks for the shoutout to the Power Tools for Tableau!)

  • Guy Cuthbert

    Awesome work, and a great summary article; the data animators here at Atheon will be watching future developments with interest…

  • Keith Helfrich

    Hi Földi,

    Really looking forward to using this! And I’m a bit confused. I’ve just submitted an issue on github.

    Is it possible to connect to the Tableau Server on Windows from tableauFS running on OSX? If so, then I must have a bit of a pre-requisites problem, because the cmake command won’t finish successfully.

    Or do you rather suggest using Docker to do the complete installation on Windows?

    I hope the first option is possible. Maybe you could give a bit more instructions on how to start from scratch on OSX ?

    Thanks!

  • Shakil

    Hi Tamas,

    Greetings –

    First, thank you for all the great tools and knowledge that you share with the community – Much appreciated! We have a need to update the database name for over 5000 workbooks across 700+ sites. These workbooks use a live connection (with embedded credentials) and are published on the Tableau Server. I downloaded the TableauFS application code from GitHub and compiled and built it on Ubuntu and now able to mount all the sites and workbooks as a file system. Thank you for building such a wonderful tool! I am now able to use the search and replace functionality of Linux (Find/Sed) to replace the dbname across all the twb xml files. However, after the update, the workbooks lose the embedded credentials (credentials should be same as the old dbname) and Tableau prompts for a password when the workbook is loaded after the update. Do you have any thoughts on how to avoid this from happening so that the embedded credentials are preserved upon updates via TableauFS?

    Any ideas or thoughts would be greatly appreciated!

    Thanks!
    Shakil
