Tableau Server on Linux – Part 2: Zookeeper and File Store/TDFS
August 4, 2015

The journey continues: in this session we are going to move the famous TDFS (Tableau Distributed File System), together with the Zookeeper service, to one of our favorite operating systems: Linux. The goal is the same: run each and every Tableau Server process on Linux without the need for Windows. And a short disclaimer: this is 100% unsupported by Tableau, and you need valid licenses for your Linux box, otherwise you are going to violate their EULA.

Previously on “Tableau Server on Linux – Part 1 – Data Engine”

And today we are not just going to install these two services on Linux. No, we'll do a lot more! We'll start transforming our single-node Tableau Server into a cluster without even touching the GUI.

The Basics

TDFS, or the File Store service as Tableau calls it, is installed along with the Data Engine and controls the storage of extracts. In highly available environments, the File Store ensures that extracts are synchronized to the other file store nodes, so they remain available if one file store node stops running. How does it work in practice? When you refresh a data source:

  • Backgrounder receives an extract refresh task
  • It fetches the data and passes it to the tdeserver process along with the extract's new unique name
  • tdeserver writes the new local TDE file
  • Backgrounder connects to the File Store service and reports the new file
  • File Store puts the file into TDFS
  • The TDFS implementation ensures that the file is replicated to all nodes. Node configurations are stored in Zookeeper under the /tdfs Zookeeper directory.

To use our tdeserver without copying files between Tableau Server and our Linux hosts, we need Zookeeper and TDFS, that's it. So, let's configure them.

Zookeeper

First of all, what is Zookeeper? According to Zookeeper's website, Zookeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. Tableau uses Zookeeper to store cluster information and to keep track of who is doing what and who is available. Most of this functionality is implemented in the Coordination Service. But enough theory, let's jump into practice!

Installing Zookeeper on Linux

Tableau Server 9.0 uses Zookeeper 3.4.6, which is the latest stable release at the time of writing. Zookeeper is written purely in Java, so the binaries should work on every platform where Java is supported. You can download this version from any Apache mirror.

Installation is done; the Zookeeper distribution is ready to serve from your zookeeper-3.4.6 folder. To have everything up and running we need a Tableau-compatible configuration. The configuration should look like:

A couple of things: server.1 should be our original Windows Tableau Server, while server.2 is the Linux one. The dataDir should point to Zookeeper's local data directory. This needs to be created with the mkdir ~/zookeper-data command. Also, you should create a file called myid inside dataDir to tell Zookeeper the local node's id:
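As a rough illustration of the two pieces described above, here is a minimal Python sketch that renders a zoo.cfg and writes the myid file. The key names (tickTime, initLimit, syncLimit, dataDir, clientPort, server.N) are standard Zookeeper settings, but every value below is an assumption: the ports and timing values are Zookeeper's conventional defaults, not Tableau's, so copy the real ones from the zoo.cfg on the Windows box. The helper names render_zoo_cfg and write_myid are made up for this example.

```python
import tempfile
from pathlib import Path


def render_zoo_cfg(data_dir, client_port, servers,
                   tick_time=2000, init_limit=10, sync_limit=5):
    """Return zoo.cfg text for a small ensemble.

    servers maps node id -> (host, quorum_port, election_port).
    All numeric defaults are placeholders, not Tableau's values.
    """
    lines = [
        f"tickTime={tick_time}",
        f"initLimit={init_limit}",
        f"syncLimit={sync_limit}",
        f"dataDir={data_dir}",
        f"clientPort={client_port}",
    ]
    # One server.N line per ensemble member, ordered by node id.
    for node_id in sorted(servers):
        host, quorum_port, election_port = servers[node_id]
        lines.append(f"server.{node_id}={host}:{quorum_port}:{election_port}")
    return "\n".join(lines) + "\n"


def write_myid(data_dir, node_id):
    """Create dataDir if needed and write the local node id into myid."""
    path = Path(data_dir) / "myid"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(f"{node_id}\n")
    return path


if __name__ == "__main__":
    # server.1 is the Windows node, server.2 the Linux node (IPs from this post).
    ensemble = {
        1: ("54.203.245.18", 2888, 3888),  # assumed ports
        2: ("54.212.254.40", 2888, 3888),  # assumed ports
    }
    with tempfile.TemporaryDirectory() as scratch:
        print(render_zoo_cfg(scratch, 2181, ensemble), end="")
        print(write_myid(scratch, 2).read_text(), end="")
```

The important constraint is that the server.N lines are identical on both nodes and that myid on each box matches its own N.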

Good. Now switch to the Windows box and add the server.2 line to Tableau Server's Zookeeper configuration, located at %PROGRAMDATA%\Tableau\Tableau Server\data\tabsvc\config\zookeeper\zoo.cfg. That's it. Restart Tableau Server, then start our own Linux Zookeeper instance with:

You can quickly check zookeeper.out to see that everything is okay.
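Besides reading the log, Zookeeper 3.4.x also answers the four-letter command ruok with imok on its client port when the server process is running. A minimal sketch of that check, using only the standard library, is below; the host and port you pass in are assumptions and must match the clientPort in your zoo.cfg, and the function names are invented for this example.

```python
import socket


def four_letter_word(host, port, command=b"ruok", timeout=5.0):
    """Send a Zookeeper four-letter command and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command)
        chunks = []
        # The server closes the connection after replying, so read until EOF.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")


def is_healthy(host, port):
    """True if the node answers 'imok'; False on any connection problem."""
    try:
        return four_letter_word(host, port) == "imok"
    except OSError:
        return False


if __name__ == "__main__":
    # Placeholder address: use your Linux node and its clientPort.
    print(is_healthy("127.0.0.1", 2181))
```

Note that imok only tells you the process is up; it says nothing about whether the node has actually joined the quorum with the Windows server.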

Validating Zookeeper

We have already built a Zookeeper cluster and joined it to our Tableau Server. Isn't it fantastic? Well, it is. But what's inside? Let's have a look:

Nice, it seems we can access everything locally from Linux. Or maybe not:

The tdfs folder is password protected. Time to authenticate ourselves. You can get the password from %PROGRAMDATA%\Tableau\Tableau Server\data\tabsvc\config\filestore.properties as usual:

Now, let's authenticate and retry reading from the /tdfs directory:

Everything as expected. Zookeeper: job done.
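A side note on the authentication step: Zookeeper's digest scheme takes user:password in plain text when you authenticate, but the ACL stored on the node holds the user name followed by the Base64-encoded SHA-1 of the whole user:password string. The sketch below computes that identifier so you can compare it with what the ACL of /tdfs shows. The fszkuser name comes from this post; the password is a placeholder, and zk_digest_id is a name made up for this example.

```python
import base64
import hashlib


def zk_digest_id(user, password):
    """Return the 'user:base64(sha1(user:password))' id used in digest ACLs."""
    raw = f"{user}:{password}".encode("utf-8")
    digest = base64.b64encode(hashlib.sha1(raw).digest()).decode("ascii")
    return f"{user}:{digest}"


if __name__ == "__main__":
    # Placeholder password: use the one from your own filestore configuration.
    print(zk_digest_id("fszkuser", "change-me"))
```

If the value printed for your real password does not match the ACL entry, the password you picked up is not the one protecting the node.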

TDFS / File Store

Getting the binaries

And now, something different. Until now we have dealt only with ready-to-use services. Now, let's move something really Tableau-specific. We should start by moving the Tableau Java packages (jars) to our Linux box. Here is what and how:

  • Create a new folder called tableau-apps. This is where the code will go.
  • Create a folder called tableau-apps/bin. Copy all jar files from Tableau Server's bin/ folder recursively. If you are doin' it right, you should have repo-jars and repo-migrate-jars subfolders with jar files in them as well. You do not need everything right now, but this is only part two, and we will move all services in the next few weeks, not just TDFS!
  • Create a new folder called tableau-apps/lib. Just like with bin, copy all jar files from Tableau Server's lib/ folder. Here you don't need recursion; the first level is enough.
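If you have the Windows bin/ and lib/ folders available on the Linux side (say, via a mounted share or an unpacked archive), the copy rules above can be sketched in a few lines of Python. The source paths in the example are hypothetical, and copy_jars is just a helper name for this illustration.

```python
import shutil
import tempfile
from pathlib import Path


def copy_jars(src, dst, recursive):
    """Copy *.jar files from src to dst, keeping the relative folder layout.

    recursive=True walks subfolders (the bin/ case),
    recursive=False only looks at the first level (the lib/ case).
    """
    src, dst = Path(src), Path(dst)
    pattern = "**/*.jar" if recursive else "*.jar"
    copied = []
    for jar in sorted(src.glob(pattern)):
        target = dst / jar.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(jar, target)
        copied.append(target)
    return copied


if __name__ == "__main__":
    # Hypothetical locations of the copied Windows folders.
    windows_bin = Path("/mnt/tableau/bin")
    windows_lib = Path("/mnt/tableau/lib")
    if windows_bin.is_dir() and windows_lib.is_dir():
        copy_jars(windows_bin, "tableau-apps/bin", recursive=True)
        copy_jars(windows_lib, "tableau-apps/lib", recursive=False)
```

Keeping the relative paths intact is what preserves the repo-jars and repo-migrate-jars subfolders under tableau-apps/bin.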

That’s it, binaries are done. How about configuration?

TDFS Configuration – On Linux

Create a new folder called filestore and create the following three files in it:

log4j.xml – to see what is going on:

connections.properties – this is required to know where to connect

And finally the filestore.properties:

The Windows server is still 54.203.245.18, while 54.212.254.40 is the Linux node. The filestore.root directory should point to our data engine directory (which was created in part 1). And don't forget to change the fszkuser user's password.

The Linux part is done; switch to Windows.

TDFS Configuration – On Windows

In addition to Zookeeper authentication, TDFS blocks all connections that aren't coming from worker nodes. Thus, we should add this node as a worker in the following files:

  • filestore.properties
  • connections.properties
  • connections.yaml
  • backgrounder.properties
  • clustercontroller.properties
  • dataengine/tdeserver_standalone0.yml

Practically you must:

  1. search for the localhost string and replace it with the external IP of the server in all of the files listed above
  2. change worker.hosts to worker.hosts=windows_ip,linux_ip in filestore.properties and tdeserver_standalone0.yml, because of whitelisting

You can find these files in %PROGRAMDATA%\Tableau\Tableau Server\data\tabsvc\config.
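Doing this by hand in six files is error-prone, so here is a minimal sketch of the two edits as plain text transformations. It assumes key=value lines as found in .properties files; the layout of tdeserver_standalone0.yml may differ, so check that file manually. The function names are invented for this example, and you should keep a backup copy of every file before overwriting it.

```python
import re


def replace_localhost(text, external_ip):
    """Replace the whole word 'localhost' with the server's external IP."""
    return re.sub(r"\blocalhost\b", lambda m: external_ip, text)


def set_worker_hosts(text, hosts):
    """Set worker.hosts to a comma-separated host list (properties style).

    An existing worker.hosts line is rewritten in place; if there is
    none, the setting is appended at the end of the text.
    """
    new_line = "worker.hosts=" + ",".join(hosts)
    # [^\r\n]* leaves the original line ending (LF or CRLF) untouched.
    pattern = re.compile(r"^worker\.hosts\s*=[^\r\n]*", re.MULTILINE)
    if pattern.search(text):
        return pattern.sub(lambda m: new_line, text)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + new_line + "\n"


if __name__ == "__main__":
    sample = "filestore.hostname=localhost\nworker.hosts=localhost\n"  # made-up content
    step1 = replace_localhost(sample, "54.203.245.18")
    print(set_worker_hosts(step1, ["54.203.245.18", "54.212.254.40"]), end="")
```

Read each file as text, run both functions over it, and write the result back once you have checked the diff.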

Start TDFS

Config done, let’s start TDFS:

Now, you should see some nice log messages in filestore/filestore.log:

If you are still with me, then you have just accomplished part 2: your TDFS and Zookeeper are running on your Linux node in cluster mode.

Try things out

A typical test case would be an extract refresh. After the refresh completes, we should see the generated TDE file on both Windows and Linux.

Refresh Extract on Server

Now in the backgrounder.log we can see that it was able to communicate with TDFS:

On Windows:

On Linux:

Hurray, our file was replicated successfully in our newly built cluster. This is the end, the happy end.
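If you want something stronger than comparing directory listings by eye, you can checksum the extracts on each node and compare the results. The sketch below is one way to do it, assuming the TDE files sit under the same relative paths below each node's file store root (an assumption, not something verified here). Run index_extracts on each box, ship one result to the other (as JSON, for instance), and feed both to compare_indexes. All function names are made up for this example.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large extracts do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_extracts(root):
    """Map each *.tde file's path (relative to root) to its SHA-256."""
    root = Path(root)
    return {
        tde.relative_to(root).as_posix(): sha256_of(tde)
        for tde in sorted(root.rglob("*.tde"))
    }


def compare_indexes(local, remote):
    """Report files missing on either side and files whose content differs."""
    shared = set(local) & set(remote)
    return {
        "only_local": sorted(set(local) - set(remote)),
        "only_remote": sorted(set(remote) - set(local)),
        "mismatched": sorted(k for k in shared if local[k] != remote[k]),
    }


if __name__ == "__main__":
    # Tiny self-contained demo with two scratch folders standing in for two nodes.
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        (Path(a) / "demo.tde").write_bytes(b"same bytes")
        (Path(b) / "demo.tde").write_bytes(b"same bytes")
        print(compare_indexes(index_extracts(a), index_extracts(b)))
```

Three empty lists mean both nodes hold the same extracts with identical content.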


If you have questions or comments, just let me know, and stay tuned to learn about more services running on Linux.

Tamás Földi

