Introducing TableauFS: File System on Tableau Server Repository
May 16, 2015
Working with TableauFS

We have so many APIs for Tableau Server, but to be honest even the simplest things cannot be achieved without an extensive amount of work. Let’s see what my colleagues and clients have demanded from me:

  1. Easy way to move workbooks and data sources across servers (large enterprise environment, between multi-tenant systems)
  2. Version control workbooks
  3. Search by workbook/data source contents, like where a particular connection or table is used in published workbooks
  4. Point in time recovery of workbooks, backup individual projects or workbooks (similar to version control)
  5. Mass-change workbooks and data sources (we have Tableau servers with 5,000-10,000 workbooks/data sources)

Let’s see how a file system would solve these issues:

  1. Move workbooks? Just copy file between servers like “scp dir1/file server2:dir1/”
  2. Version control? Use git with annex or bup
  3. Search contents? grep or zipgrep on files
  4. Run tools like TWB Auditor to understand your workbooks’ contents
  5. Point in time recovery? git or rsync contents to a snapshot aware file system
  6. Mass change? sed, ruby, python, etc., or a twbx editing tool like powertools, directly on the server files

You see, this is why you desperately need a file system for your published data. Good news everyone, last week I wrote one…

…and I called it TableauFS (pragmatism over creativity). It’s a FUSE-based userspace file system driver built in pure ANSI C (for performance and fun) on top of Tableau’s repository server. It lets you mount Tableau servers, with all their data sources and workbooks, directly into the file system. File information and contents are retrieved on access without any local persistence or caching, so when you cat a file it goes to Tableau and retrieves the chunks one by one.

The file system connects directly to the PostgreSQL repository database, using the readonly credentials for read-only mode, or tblwgadmin or postgres for read-write access.

TableauFS in action – list project contents and read their files

Do you want it? Sure you do.

Installation

You need five packages in advance to compile it: fuse-devel, postgresql-devel, cmake, make and gcc. To work with workbooks/data sources larger than 2GB you need PostgreSQL version 9.3+; otherwise the file size limit is 2GB.

You can clone the source from https://github.com/tfoldi/fuse-tableaufs.

To build the binaries just type cmake . && make && make install and you will have everything installed. The executable is installed as /usr/local/bin/tableaufs, but you can also use mount directly.

 

Configuring the Tableau Server database

To exploit all features (including read-write mode) you need tblwgadmin or a similar user with superuser privileges, while for read-only access the readonly user is almost enough. Unfortunately, Tableau’s readonly user does not have select access on pg_largeobject (as Jonathan Macdonald discovered in this post), so you have to log on as tblwgadmin and issue:

  GRANT SELECT ON pg_largeobject TO readonly;

to get the full read-only experience. This will not harm your system (access is still read-only), but it is unsupported.

Enable and grant select to readonly user

The steps here:

  1. Enable the readonly user with the tabadmin dbpass --username readonly <password> command as documented here
  2. Look up your pgsql admin password in the tabsvc.yml file. The default location is C:\ProgramData\Tableau\Tableau Server\config, but this can differ depending on where your ProgramData folder is. Please note that the ProgramData folder can be hidden.
  3. Go to the Tableau Server\9.0\pgsql\bin folder, issue the psql -h localhost -p 8060 -U tblwgadmin workgroup command and paste the password from tabsvc.yml
  4. Execute the grant select statement

Usage

You can mount it with the standard Unix mount command:

“ro” stands for read-only mode; the default mode is rw. You can also mount your server directly with the tableaufs command:

To unmount, simply:

Basic stuff

TableauFS maps Tableau repository to the following directory structure:

You can go into each directory, list and stat files, and run find without any limitation. Packaged and non-packaged objects have different file extensions: twbx and tdsx are packaged, while twb and tds are plain XML files. You can read, grep, find, search and edit them just like regular files. Whatever you do is executed on Tableau Server; the FS does not cache or store blocks locally.

Search in workbooks

We will explore this topic in detail in some of my forthcoming posts, but let me just note that you can easily search inside XML and zipped XML objects. In the example below I used zipgrep to list all data connections from a packaged workbook. No tabcmd get, no logging on to the Tableau web portal, no REST API. Just plain Unix commands:

Working with TableauFS
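
The same kind of search can be scripted when you want it across a whole mounted tree rather than one file. Below is a minimal Python sketch, not part of TableauFS itself: the function names and the /mnt/tableau mount point are my own assumptions, and it assumes the mounted files can be opened and seeked like regular files so that the zipfile module can read them.

```python
import re
import zipfile
from pathlib import Path

# Extensions as described above: plain XML vs. zip-packaged objects.
PLAIN = {".twb", ".tds"}
PACKAGED = {".twbx", ".tdsx"}


def xml_documents(path):
    """Yield (member_name, text) for each workbook/data source XML in `path`."""
    suffix = path.suffix.lower()
    if suffix in PLAIN:
        yield path.name, path.read_text(encoding="utf-8", errors="replace")
    elif suffix in PACKAGED:
        with zipfile.ZipFile(path) as archive:
            for member in archive.namelist():
                # Only look at the XML inside the package, not at extracts/images.
                if Path(member).suffix.lower() in PLAIN:
                    text = archive.read(member).decode("utf-8", errors="replace")
                    yield member, text


def search_tree(root, pattern):
    """Return (file, member, line_number, line) for every matching line under `root`."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        for member, text in xml_documents(path):
            for number, line in enumerate(text.splitlines(), start=1):
                if regex.search(line):
                    hits.append((str(path), member, number, line.strip()))
    return hits


# Hypothetical usage; /mnt/tableau is an assumed mount point:
#   for hit in search_tree("/mnt/tableau", r"<connection\b"):
#       print(*hit, sep=":")
```

Every read still goes to the server, so a full-tree search on a server with thousands of workbooks will take a while.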

Editing existing workbooks is also possible, just check out this thread: http://community.tableau.com/message/369406#369406
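
To give an idea of what a scripted mass change could look like, here is a hedged Python sketch that replaces one string (say, an old database host name) in every non-packaged twb/tds file under a directory. The function name and its dry_run parameter are hypothetical, it assumes a read-write mount, and it works on raw bytes so that line endings are left untouched. Try it on a copy of your files first.

```python
from pathlib import Path


def replace_in_workbooks(root, old, new, dry_run=True):
    """Replace `old` with `new` in every .twb/.tds file under `root`.

    Returns {path: occurrence_count} for the files that contain `old`.
    With dry_run=True (the default) nothing is written.
    """
    old_bytes = old.encode("utf-8")
    new_bytes = new.encode("utf-8")
    changed = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in {".twb", ".tds"}:
            continue
        data = path.read_bytes()
        count = data.count(old_bytes)
        if count:
            changed[str(path)] = count
            if not dry_run:
                # Writing back requires the file system to be mounted read-write.
                path.write_bytes(data.replace(old_bytes, new_bytes))
    return changed
```

Running it once with the default dry_run=True shows which objects would be touched before anything is written back to the server.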

Version control & object based point in time recovery

One of the best things about a file system is that you can snapshot or version control its contents. You can expect an extensive post on how to automatically version control and back up all (or selected) Tableau objects, and how to view differences between changes in a human-readable way using only open source tools. In the meantime, just to keep you entertained, here is an example of how to create a new git repository and add all of your Tableau workbooks and data sources to it:

Adding all Tableau Server workbooks to git repository using tableau fs

Looks nice? Wait until I show you my set of git extensions for managing zip-packaged objects in a git repo.
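
If you only want to know which objects changed between two points in time, without git, a content-hash manifest is enough. The Python sketch below is an illustration of that idea rather than the git workflow described above; build_manifest and diff_manifests are hypothetical names of my own.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file's path (relative to `root`) to the SHA-256 of its content."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as handle:
                # Read in 1 MiB chunks so large packaged workbooks fit in memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def diff_manifests(before, after):
    """Return (added, removed, modified) as sorted lists of relative paths."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    modified = sorted(p for p in set(before) & set(after) if before[p] != after[p])
    return added, removed, modified
```

Build one manifest per run, keep the previous one around, and diff_manifests tells you which workbooks and data sources were added, removed or modified in between.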

Performance

I love speed and performance, especially when it matters and in a file system it definitely does. Everything is written in pure ANSI C, using only fuse and postgres client libraries.

On my laptop with a virtual Tableau Server the IO throughput is between 15 and 35 MB/sec, which is definitely not bad for a network file system.
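
If you want a rough number for your own setup, a sequential read with a timer is enough. This Python sketch is one simple way to measure it, not necessarily how the figures above were obtained; read_throughput is a hypothetical helper and the chunk size is an arbitrary choice.

```python
import time


def read_throughput(path, chunk_size=1 << 20):
    """Read `path` sequentially and return (bytes_read, megabytes_per_second)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very small files.
    rate = total / (1024 * 1024) / elapsed if elapsed > 0 else float("inf")
    return total, rate
```

Point it at a large packaged workbook on the mount; results will vary with network latency, the server's load and the size of the file.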

Do you have a question or a good idea for how to make it better? Drop me a line or ping me on Twitter (@tfoldi).

Tamás Földi
