Automatic, shadow-copy-like version control of Tableau Server objects (workbooks, data sources) is a crucial part of server administration. Most admins rely only on the Tableau backup tool, but it lacks features such as restoring an individual workbook and quick, incremental backups. That is why you are here: with this method you can easily create and automate incremental snapshots of your Tableau Server objects directly in the git version control system. This is the perfect solution when one of your users overwrites or deletes a workbook and then begs for a restore: you can go back in time and retrieve any previous version easily.
Ingredients
My solution relies on the following technologies:
- TableauFS: Existing open source version control systems work with files, so we need to make all Tableau repository objects available as opaque file system objects. This is exactly what TableauFS does.
- Git: Git is the de facto standard distributed version control system. It was written by a well-known guy (google it if you don’t know who).
- Git LFS: an open source Git extension for versioning large files
- crontab, Windows Task Scheduler or any other scheduler
The solution works on Linux, OSX and Windows.
Installation
First of all, install TableauFS as described here or here, then install git with your OS’s package manager (for example sudo yum install git ). The next step is to install Git LFS: download the binaries and execute the installer shell script:
    $ sudo yum install git
    [..]
    $ wget https://github.com/github/git-lfs/releases/download/v0.5.2/git-lfs-linux-amd64-0.5.2.tar.gz
    [..]
    $ tar xvzf git-lfs-linux-amd64-0.5.2.tar.gz
    git-lfs-0.5.2/
    git-lfs-0.5.2/git-lfs
    git-lfs-0.5.2/install.sh
    $ cd git-lfs-0.5.2/
    $ sudo sh ./install.sh
    git lfs initialized
Installation is done, let’s set it up.
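Before moving on, it can help to confirm that every tool is actually reachable from the shell the scheduler will use. The following is only a sketch: `require_tools` is a hypothetical helper name, and it assumes the binaries are called `git`, `git-lfs` and `tableaufs` on your system.

```shell
#!/bin/sh
# Report which of the given commands are missing from PATH.
# Prints one line per missing tool and returns non-zero if any is missing.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Tools used in this article; git-lfs is the binary behind "git lfs".
require_tools git git-lfs tableaufs || echo "install the missing tools first"
```

Running the same check from the scheduler's environment is useful because cron often has a shorter PATH than an interactive shell.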
Configuration
TableauFS
We should create a directory for our version control system, like ~/tableau-repos , then mount each of our servers in its own subdirectory, like ~/tableau-repos/dev . TableauFS is a (mostly) read-only file system, so any SCM-related information must be written to ~/tableau-repos only.
    $ mkdir ~/tableau-repos
    $ mkdir ~/tableau-repos/dev
    $ tableaufs -s -o "pghost=myserver,pgport=8060,pguser=readonly,pgpass=lolololo,ro" ~/tableau-repos/dev
    TableauFS v0.7.1 using FUSE API Version 29
    Connecting to readonly@myserver:8060
    $ cd ~/tableau-repos/dev && ls
    Default
Now we have the Tableau Server objects on our file system (not physically, just virtually), so we can move to the version control part.
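Because the objects exist only while TableauFS is mounted, it may be worth checking that the mount point really has content before pointing git at it. A minimal sketch, assuming an empty directory means "not mounted"; `mount_is_populated` is a hypothetical helper, not part of TableauFS.

```shell
#!/bin/sh
# Succeed only if the directory exists and contains at least one entry.
mount_is_populated() {
    dir=$1
    [ -d "$dir" ] || return 1
    # ls -A lists everything except "." and ".."; empty output means empty dir.
    [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

# Example: warn when the dev server mount from above is empty or missing.
mount_is_populated "$HOME/tableau-repos/dev" || echo "TableauFS does not seem to be mounted" >&2
```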
Git LFS
We can use the same old git init command to create an empty git repo as usual. LFS needs to know which files are binary; this is set with the lfs track command:
    ~/tableau-repos $ git init
    Initialized empty Git repository in /home/ec2-user/tableau-repos/.git/
    ~/tableau-repos $ git lfs track "*.twbx"
    Tracking *.twbx
    ~/tableau-repos $ git lfs track "*.tdsx"
    Tracking *.tdsx
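The track commands record their patterns in .gitattributes, so one way to double-check which files will go through LFS is to ask git for the `filter` attribute of a path. This is a sketch only; `lfs_filter_of` is a hypothetical helper built on the standard `git check-attr` command.

```shell
#!/bin/sh
# Print the value of the "filter" attribute git assigns to a path:
# "lfs" for LFS-tracked patterns, "unspecified" for everything else.
lfs_filter_of() {
    repo=$1
    path=$2
    git -C "$repo" check-attr filter -- "$path" | sed 's/.*: filter: //'
}

# Example (paths taken from the listing in this article):
#   lfs_filter_of ~/tableau-repos "dev/Default/Tableau Samples/Regional.twbx"
#   lfs_filter_of ~/tableau-repos "dev/Default/default/test.twb"
```

The path does not have to exist for the check to work, so patterns can be verified before the first import.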
Now we can proceed and import the existing objects and commit the initial release:
    ~/tableau-repos $ git add --all
    ~/tableau-repos $ git status
    On branch master

    Initial commit

    Changes to be committed:
      (use "git rm --cached <file>..." to unstage)

        new file: .gitattributes
        new file: dev/Default/Tableau Samples/Dashboard Parameter Example Many Dashboards2.twbx
        new file: dev/Default/Tableau Samples/Regional.twbx
        new file: dev/Default/Tableau Samples/Superstore.twbx
        new file: dev/Default/Tableau Samples/iphone5.twbx
        new file: dev/Default/Tableau Samples/test.twb
        new file: dev/Default/default/test.twb

    ~/tableau-repos $ git commit -a -m "Initial release"
    [master (root-commit) a0e5c10] Initial release
     7 files changed, 494 insertions(+)
     create mode 100644 .gitattributes
     create mode 100644 dev/Default/Tableau Samples/Dashboard Parameter Example Many Dashboards2.twbx
     create mode 100644 dev/Default/Tableau Samples/Regional.twbx
     create mode 100644 dev/Default/Tableau Samples/Superstore.twbx
     create mode 100644 dev/Default/Tableau Samples/iphone5.twbx
     create mode 100644 dev/Default/Tableau Samples/test.twb
     create mode 100644 dev/Default/default/test.twb
Very nice, we have just synchronized our git repo with the Tableau Server repository. If you have a few thousand workbooks, this can take a few minutes.
Scheduling the updates
The next time you execute git add and commit, only the changed files will be stored. You can safely schedule the following one-liner, which looks for added, deleted and updated files and commits them to the local repo:
    $ git add --all && git commit -a -m "auto versioning"
    On branch master
    nothing to commit, working directory clean
Our customers usually request hourly incremental backups of workbooks and data sources, which is perfectly feasible even for big deployments (5000+ workbooks).
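For unattended runs, the one-liner has two rough edges: git commit exits with a non-zero status when there is nothing to commit, and an unmounted TableauFS directory would be recorded as if every workbook had been deleted. The wrapper below is a sketch that guards against both. The function name `snapshot_repo`, the script path and the log path in the crontab comment are assumptions for illustration.

```shell
#!/bin/sh
# Stage everything in the repo and commit only when something changed.
# Returns 0 when a commit was made or there was nothing to do,
# and 1 when the mount directory is empty or missing.
snapshot_repo() {
    repo=$1
    mount=$2
    if [ -z "$(ls -A "$mount" 2>/dev/null)" ]; then
        echo "skipping snapshot: $mount is empty or missing" >&2
        return 1
    fi
    git -C "$repo" add --all || return 1
    # "git diff --cached --quiet" exits 0 when nothing is staged.
    if git -C "$repo" diff --cached --quiet; then
        echo "nothing to commit"
        return 0
    fi
    git -C "$repo" commit -q -m "auto versioning $(date -u +%Y-%m-%dT%H:%MZ)"
}

# Hypothetical hourly crontab entry, assuming this file is saved as
# $HOME/bin/tableau-snapshot.sh and ends with a call such as
#   snapshot_repo "$HOME/tableau-repos" "$HOME/tableau-repos/dev"
#
#   0 * * * * /bin/sh $HOME/bin/tableau-snapshot.sh >> $HOME/tableau-snapshot.log 2>&1
```

Putting the timestamp in the commit message is a design choice that makes it easier to tell a user which hourly snapshot they are getting back.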
Managing versions
Let’s take an example workbook called Dashboard Parameter Example Many Dashboards2 from the Default site, Tableau Samples project.
Let’s modify something inside the workbook, like changing the calculation of the DB Text field.
Save the changes. Now when we switch back to the console, the changes are immediately visible to git as well, and the same git add and commit as before records them.
Voilà, the changes were successfully committed to the repository. The git log shows that the changes are stored in the version control system:
    ~/tableau-repos $ git log -p dev/Default/Tableau\ Samples/Dashboard\ Parameter\ Example\ Many\ Dashboards2.twbx | egrep '(Date|index)'
    Date: Wed Jul 22 20:28:17 2015 +0000
    index da05385..783be8e 100644
    Date: Wed Jul 22 20:18:29 2015 +0000
    index 0000000..da05385
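Getting an older version back out can be sketched as follows. Since the working tree is the (mostly) read-only TableauFS mount, this example writes the old version to a separate file instead of checking it out in place. It assumes Git 2.11 or newer for `git cat-file --filters`; `restore_version` is a hypothetical helper name.

```shell
#!/bin/sh
# Write the version of "path" stored in commit "rev" to the file "out",
# leaving the working tree untouched.
restore_version() {
    repo=$1
    rev=$2
    path=$3
    out=$4
    # --filters applies the configured smudge filters, so an LFS-tracked
    # file comes back as the real workbook rather than as a pointer file.
    git -C "$repo" cat-file --filters "$rev:$path" > "$out"
}

# Example: fetch the previous version of a workbook into /tmp
#   restore_version ~/tableau-repos HEAD~1 \
#       "dev/Default/Tableau Samples/Regional.twbx" /tmp/Regional-old.twbx
```

The recovered file can then be opened in Tableau Desktop and republished, which keeps the restore itself under the administrator's control.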
Change git’s default diff behavior
Diffing two versions of a twbx file with vanilla git is not really useful. Git treats twbx as a binary file and shows only the binary difference. However, we can configure a custom diff driver to trick git. For twbx, let’s create a shell script which invokes unzip -c -a and extracts only the twb file from the archive. Register this script in our .gitconfig , then reference it from our repository’s .gitattributes .
    $ cat ~/twbxcat.sh
    unzip -c -a "$1" '*.twb'

    $ cat ~/.gitconfig
    [diff "twbx"]
        textconv = ~/twbxcat.sh

    ~/tableau-repos$ cat .gitattributes
    *.twbx filter=lfs diff=twbx merge=lfs -crlf
    *.tdsx filter=lfs diff=lfs merge=lfs -crlf
Now we can see the changes directly from git logs.
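The same setup can also be scripted, which is handy when several repositories need it. The sketch below writes the helper script, makes it executable (git runs the textconv command directly, so the execute bit and a shebang line matter) and registers it in the repository's own config rather than in ~/.gitconfig . `install_twbx_diff` is a hypothetical name, and the script path should not contain spaces.

```shell
#!/bin/sh
# Write the textconv helper and register it as the "twbx" diff driver
# for a single repository.
install_twbx_diff() {
    repo=$1
    script=$2
    cat > "$script" <<'EOF'
#!/bin/sh
# Print the workbook XML stored inside a packaged .twbx (a zip archive).
unzip -c -a "$1" '*.twb'
EOF
    chmod +x "$script"
    git -C "$repo" config diff.twbx.textconv "$script"
}

# Example:
#   install_twbx_diff ~/tableau-repos ~/twbxcat.sh
```

The .gitattributes line with diff=twbx is still needed, since it is what tells git to use this driver for twbx files.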
Was this hard? I don’t think so. If you follow these steps, your users will be protected from accidental deletes, and they can ask to go back to any version, even one that was modified a year ago.