FOR BI PROS LEARN Tableau
Tableau Live Data Connection to CSV over HTTP/REST
April 1, 2015
1
, , , , , , , , , ,

I ran into a request in Tableau Community Forum: how to connect a .csv file located on a remote web server thru http protocol. This is not really supported yet, and the to be released web connector will be extract only. But that’s why we are here, the brave community to getting things done. In this post I will show you an easy way how to deal with remote data providers in Tableau, allowing live connection to csv, json and xml formats.

Preparations

Depending on your distribution, Postgres may or may not be compiled with Python support. Lets check if our distribution has the python bindings available separately:

  $ apt-cache search postgres | grep python

Apt shows me a list of postgres-related packages. The ones I am interested in are:

postgresql-plpython-9.3 – PL/Python procedural language for PostgreSQL 9.3
postgresql-plpython3-9.3 – PL/Python 3 procedural language for PostgreSQL 9.3

I’ll be using the python 2 version for this tutorial, so lets install that one:

$ sudo apt-get install postgresql-plpython

Now lets create a new database to keep our exploration for ourselves. On my machine, I will call my db ‘pydl_test’.

$ sudo -u postgres createdb pydl_test

Then we need to install the python language itself into the database:

$ sudo createlang plpythonu pydl_test

Staring out

Lets create a new sql file with a skeleton function. I will call mine pgpydl.sql

And lets test if our python installation works by starting a psql shell and importing our sql file and running a small test select.

 

Downloading real data

We’ll use urllib2 from the Python standard library to download the contents. Its a basic library, but provides everything we’ll need.

Lets think for a moment about the requirements of our little function:

  • it should download the web page if possible
  • if not, then depending on the specific use-case:
    • it should throw an exception which cancels the current transaction
    • it should return some default value instead
  • if the download is possible, then depending on the specific use-case:
    • it should download the whole response or
    • we should be able to provide a limit on the amount of data downloaded

So we’ll need to parameterize both error handling and the (existance of and size of the) download limit.

Now we can test the function.

psql should now show our sample csv file.

Parsing the CSV

Create a return type

When we are parsing the CSV, we are returning complex types, so we need a type wrapper around the data returned from our parsing function.

Using the CSV parser

Parsing CSV becomes trivial at this point using pythons built-in CSV parser, the only thing we have to look out for is that we are returning many instances of this type. So we need set our return type as a SETOF site_users, and return either a list of lists or a generator. We’ll go with the latter.

Unlike the from_url() function, the parser must have a unique signature for each parsed datatype, so we need to copy it for each new type we want to parse, but thankfully we only need to change RETURNS SETOF site_user to RETURNS SETOF my_type.

Now lets test it:

And we should have a list of full names from China.

From Tableau

We just need to set the custom SQL to our new function. The results are there from the web site, without caching, immediately.

Custom SQL for Web Service Call

The Results in Tableau

This is a simple postgres workbook based on custom SQL – however the data is coming from web service

You can even add parameters to construct or modify the url, thus, you can pass parameters to your web service without modifying the postgres function.

Edge cases

There are some edge cases for these methods: when using a differenct csv file, the columns are separated by semicolons instead of colons, so we need to update our parsing function. But we must create the return type first:

We use Python CSV’s sniffer to figure out the CSV separator and format, and we need to add a new argument to tell the parser if it should skip the first line of the CSV data, where the labels usually are.

Also we encounter a small problem:

It seems that the web server is telling us that the content is ISO-8859-1, but it gives us something completely different. So we need to fix our downloader function first by modifying the try block. And on top of this, our local test server returns the response as application/octet-stream, which contains no charset data. We need to handle these cases. So the changed code of the function looks something like this:

Now we can use it. This time we want to get proper columns instead of the composite record type, so we’ll expand the sales_data to a real row of fields using the star:

The output of the function can be used directly from SQL tools or Tableau:

Results of the above query

NOTE: Currently Python has problems handling CSV files, and all input should be UTF-8 or printable ASCII to be safe. The CSV module documentation has more ways of dealing with different encodings.

Summary

Using database stored procedure as data gateways are not a new or unique idea, but sometimes people forget how easily to add non-database data thru database engines to end systems. This method could work with other database engines like sql server or oracle, I used postgres only because it is free and relatively easy to use.

With small modification you can parse json or xml files, flatten them and return their data to tableau – with only a few lines of code, as shown above.

The used SQLs can be downloaded from here.

Tamás Földi

Tamás Földi

Director of IT Development at Starschema
Decades of experience with data processing and state of the art programming. From nuclear bomb explosion simulation to distributed file systems. ethical hacking, real time stream processing practically I always had a great fun with those geeky ones and zeros.
Tamás Földi
  • Eric Norcross

    This is exactly what I’m looking for! Thanks very much for the detailed write up. It also got me wondering if the same could be accomplished using the PL/Perl extension; as some services, like Amazon RDS, don’t support the PL/Python package.

Related items

/ You may check this items as well

TCOT

Pushing the Limits of Tableau Server – Practical Lessons from a Zen Master

I ran into a request in Tableau Community Forum: h...

Read more
Sales dashboard embedded into SAP GUI

Embed Tableau Dashboards in SAP ERP and S4/HANA

I ran into a request in Tableau Community Forum: h...

Read more
speed-1249610_640

Tableau Performance: Why Embedded Data Source is Faster than Published

I ran into a request in Tableau Community Forum: h...

Read more