Reconfiguring Tableau Server without Restart: Graceful Restart (part2)
June 5, 2016

Graceful restart: the promise of restarting core components like vizqlserver without impacting users. As of now there is no official support from Tableau Software for black magic like this, but you aren’t here for officially supported stuff. You are here for actual solutions. Assuming you know how Tableau Server’s configuration template engine works and are familiar with the basics of the Tableau Server gateway architecture, including its services’ sticky sessions, let’s jump into the details.

Graceful Tableau Server Restart aka Draining Mode

The key to a graceful restart (that is, restarting Tableau Server services without impacting production, for example without disconnecting established vizql sessions) is the load balancer’s draining mode. As a reminder, the Tableau Server gateway uses load balancer clusters to balance requests across services of the same kind. For example, it has balancer://vizportal-cluster  defined in httpd.conf  containing all vizql server backend application servers. When someone hits a URL starting with /vizql, the request is forwarded to one of the worker URLs. If the web session already has a vizql session ID, the Apache gateway picks the same worker for that session ID.

If you check Tableau Server’s LoadBalancer status in the balancer-manager, you will see something like this:

LoadBalancer status for /vizql URLs. All requests with this prefix will be forwarded to these named workers (vizqlserver tomcat processes)


You might remember that usual vizql session IDs look like:

25E9C16D1C1B4F83B0C79A1128A000AE-0:1 

The first part is a 128-bit unique identifier, then a dash, then the route identifier as seen in the balancer configuration. Yes, this is how you can tell which vizql session belongs to which vizqlserver worker.
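To make the format concrete, here is a minimal sketch that splits a session ID into its two parts. The names `VizqlSession` and `parse_vizql_session_id` are made up for this illustration; only the ID layout comes from the example above.

```python
from typing import NamedTuple


class VizqlSession(NamedTuple):
    session_uuid: str  # 32 hex characters = 128 bits
    route: str         # balancer route, e.g. "0:1"


def parse_vizql_session_id(session_id: str) -> VizqlSession:
    """Split '<32 hex chars>-<route>' into identifier and route."""
    uuid_part, sep, route = session_id.strip().partition("-")
    if not sep or len(uuid_part) != 32 or not route:
        raise ValueError(f"unexpected session id format: {session_id!r}")
    int(uuid_part, 16)  # raises ValueError if the first part is not hexadecimal
    return VizqlSession(uuid_part, route)


session = parse_vizql_session_id("25E9C16D1C1B4F83B0C79A1128A000AE-0:1")
print(session.route)  # the worker route this session is pinned to
```

The route is the piece you match against the balancer-manager table to find the worker that owns the session.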

Now the fun part.

Apache’s documentation defines drain mode as:

When worker is in drain mode it will only accept existing sticky sessions destined for itself and ignore all other requests.

Sounds like exactly what we need: the load balancer keeps sending requests for already established sessions (like vizql or data server) to their existing workers, while new sessions skip the workers in drain mode.
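The routing rule can be pictured with a toy model. This is not mod_proxy_balancer’s actual scheduling algorithm, just a simplified sketch of the drain semantics quoted above; `Worker` and `pick_worker` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Worker:
    route: str
    draining: bool = False
    disabled: bool = False


def pick_worker(workers: List[Worker],
                sticky_route: Optional[str] = None) -> Optional[Worker]:
    """Toy routing decision: sticky sessions first, then any open worker."""
    # An existing sticky session stays on its worker, even if it is draining.
    for w in workers:
        if w.route == sticky_route and not w.disabled:
            return w
    # A new session only goes to workers that are neither draining nor disabled.
    for w in workers:
        if not w.draining and not w.disabled:
            return w
    return None


workers = [Worker("0:0", draining=True), Worker("0:1")]
print(pick_worker(workers, sticky_route="0:0").route)  # existing session stays put
print(pick_worker(workers).route)                      # new session avoids 0:0
```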


Switching worker with route 0:0 to drain mode
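The same switch can be flipped from a script instead of the web form. The sketch below only builds the form body; it assumes an Apache 2.4 style balancer-manager, where the form fields `b`, `w`, `nonce`, `w_status_N` (drain) and `w_status_D` (disabled) are used. Verify the field names against your own balancer-manager page before relying on them. The balancer name, worker URL and nonce shown here are placeholders, and `drain_form` is a made-up helper name.

```python
from urllib.parse import urlencode


def drain_form(balancer: str, worker_url: str, nonce: str,
               drain: bool, disabled: bool = False) -> bytes:
    """Build the POST body that toggles drain/disabled for one worker."""
    fields = {
        "b": balancer,      # balancer name without the balancer:// prefix
        "w": worker_url,    # worker URL as listed in the balancer-manager
        "nonce": nonce,     # per-balancer nonce, read from the manager page
        "w_status_N": "1" if drain else "0",     # N = drain
        "w_status_D": "1" if disabled else "0",  # D = disabled
    }
    return urlencode(fields).encode("ascii")


# Placeholder values; take the real ones from your balancer-manager page.
body = drain_form("demo-cluster", "http://localhost:9100", "abc", drain=True)
print(body.decode("ascii"))
```

Sending it is then a plain HTTP POST to the balancer-manager URL, for example with `urllib.request.urlopen(manager_url, data=body)`, where `manager_url` depends on how the handler is exposed in your httpd.conf.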

Now that we can control who is doing what, we can define the process flow for restarting vizql servers:

  1. Change the configuration setting in workgroup.yml (like trusted hosts or log level) and issue tabadmin configure
  2. Pick the first worker/route from the LoadBalancer and put it into draining mode. This ensures that no new sessions are routed to this worker
  3. Wait until all sessions finish or time out (you can monitor this with JMX). Additionally, you can add a hard timeout for the restart, like 30-60 minutes
  4. Change the worker state from Draining to Disabled
  5. Send a terminate signal to vizqlserver. If you’re sophisticated, you can enable Tomcat’s shutdown port so you don’t have to kill the process in a barbarian way
  6. vizqlserver will restart automatically
  7. Switch the worker back on: Draining Mode = Off, Disabled = Off
  8. Go to the next vizql worker
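Steps 2 to 8 can be sketched as a small loop. This is an illustration under assumptions, not the code of any existing tool: `set_state`, `active_sessions` and `restart_worker` are hypothetical callables you would implement yourself (for example on top of the balancer-manager, JMX and a process signal), and `restart_worker` is expected to return only once the worker is back up.

```python
import time
from typing import Callable, Iterable


def graceful_restart(
    routes: Iterable[str],
    set_state: Callable[[str, bool, bool], None],  # (route, drain, disabled)
    active_sessions: Callable[[str], int],         # e.g. read via JMX
    restart_worker: Callable[[str], None],         # terminate, wait for respawn
    hard_timeout_s: float = 45 * 60,
    poll_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Restart workers one by one without cutting established sessions."""
    for route in routes:
        set_state(route, True, False)        # step 2: drain, no new sessions
        deadline = clock() + hard_timeout_s
        # Step 3: wait for sessions to finish, but never past the hard timeout.
        while active_sessions(route) > 0 and clock() < deadline:
            sleep(poll_s)
        set_state(route, False, True)        # step 4: Draining -> Disabled
        restart_worker(route)                # steps 5-6: terminate and respawn
        set_state(route, False, False)       # step 7: back in rotation
```

Passing the three operations in as callables keeps the flow itself easy to dry-run with fakes before pointing it at a production gateway.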

That’s it. The same flow applies to other services like data server, vizportal or saml-service.

How do we use it?

Here at Starschema we prefer to have our own server administrator tool chain. In addition to Palette Center and Insight, we have built dozens of tools to support complex tasks like this phased, graceful restart. One of our tools of choice is tabadmin-cli, a readline-based tabadmin shell built on top of tabadmin.jar. It makes things faster because you only need to ramp up the JVM/JRuby stack once and can reuse that JVM for all subsequent calls, plus you get convenient code completion. Graceful restart is also included in the tool and uses the same process flow as described above.

Magical, isn’t it? I have to confess, I just love this tool, and not only because it’s my own creation.

Summary

If Tableau Support or the Knowledge Base tells you that you have to restart your services to change log levels or add an IP to trusted hosts, just ignore it. No, you definitely don’t have to. You just need to know what needs to be restarted and how. Understanding the gateway’s load balancing and rewrite rules helps you perform the necessary steps while avoiding planned outages, so that your user base can see and understand data without any interruption.

Next steps

Graceful restart is nice, but frankly, who wants to restart services if you can change a running process’ memory? With advanced reverse engineering, disassembler and debugger tools you can change any running process’ behaviour. Need to change the log level? No need to restart the server, just change the memory address where it manages the log writes. Sounds scary? Just stay tuned, you’ll learn a lot about assembly, linking, symbol hooking and memory patching in part3!

 

Tamás Földi


Director of IT Development at Starschema
Decades of experience with data processing and state-of-the-art programming, from nuclear bomb explosion simulation to distributed file systems, ethical hacking and real-time stream processing. Practically, I always had great fun with those geeky ones and zeros.