Tableau and some ETL
In the following post I’m going to show you how to create a simple dashboard in Tableau from a MySQL database table which contains data from an .csv file loaded into the database via Talend. It’s also an easy to understand and hopefully working introduction to a basic ETL process, so you can use it as a starting point for further developments.
For this dataset we could use flat file connection directly from Tableau. However, this is not the case with other public sources (like EUROSTAT) where you need unpivot or other data massage. My aim is to show the complete process from downloading the data until the first iterations on the visualization side.
The data set I choose is the consumer complaints issued by US citizens to different loan and mortgage companies. It can be downloaded from here.
Comma or not comma, that is the question
When you download the dataset it comes in the form of a .csv file, that you can load into Tableau (in most cases…). But there is an issue with the separation of the fields that requires a different approach. If you take a closer look at the data you will notice that the field separator is comma. The data set also contains fields that have comma(s) inline, and that doesn’t fits with the data consistency model required by Tableau. Tableau doesn’t have the functionality of the so called „text enclosure” which would mean that even if you have commas in your field and those are enclosed by quotes you can specify an enclosure character that the data miner/visualizer/loader software will recognize as a single field.
Using Talend to load the data to MySQL
So instead of loading the data directly to Tableau we will need to find a different method. Fortunately there is a software called „Talend” that gives the ability to create and run different ETL jobs either on your machine or in a remote server. We will use it’s capabilites to load the raw data from the .csv file to a MySQL database so we can connect to that from Tableau.
In order to use, you will need to download Talend Studio and a MySQL server (I used to use XAMPP which also gives you Apache and other useful features in a bundle).
The Talend job
Open Talend Open Studio and create a job, call it whatever you like. It will serve as the loading job of the .csv file to the MySQL database table.
A) Import the .csv file
1. Open the „Metadata” label on the left (click on the small arrow which appears when you hover the label) then create a „File delimited” type input by right clicking on it.
2. Give it a name
3. Select your file you want to use (or the .csv we just downloaded) and set the format to „Windows”
4.Set the parameters for loading as follows (pay attention to the field separator, the csv delimited selector and the „Text enclosure” settings. Also don’t forget to set the first row to column name.
5.Set the length of the columns (only string type) to 255 since Talend runs on the first 1000 rows when it checks the width of a specific column and that can cause some headache later on – so prevent it. Also set the „Complaint_ID” to key. As a last step, set the column type to “Date” instead of “String” at Date_received and Date_sent_to_company fields.
6.Hit „Finish” and grab a coffee.
B) Set up the job flow
We will use the already installed MySQL database engine to load the data into, so we will have to set up a connection from talend to that. When you are setting up connections or you want to start something based on the previous job’s success, right click on a component and select „On Subjob OK”.
1. Drag and drop the previously created file input from the left.
2. Drag and drop from the right side a „MySQL Output” component to the canvas. Set the related properties as follows (if you have a different table name update it to yours). Notice, that „Action on table” shows „Insert” which means any time you run the job it will insert the same data (so after a while you will have a huge table). Here you will need to right click on the csv file and select the main row as the connection type since we are going to load the data here.
3. If everything is fine, close the connection by dragging a „MySQL Close” component to the canvas and (!) right clicking on the csv component and selection „On Subjob OK” type of the connection. Since the csv file and the database loader constitute a job, the connection needs to be closed on the success of this job.
On localhost connection, go to the Advanced Settings and check in the Auto Commit option.
4. Click on save and then run the job. Possibly you will need to fix some minor error, but after that the data should appear in your MySQL database and we can now go to Tableau.
C)Connect to the database from Tableau
1. You can easily connect to many various datasources from Tableau, but for now select MySQL from the list.
2. You can fill in the required fields (host, user, password) and if you click „Connect” you should be able to select one of the available databases from the server. Select the one Talend created and you can select the table as well.
Optionaly – if there is no ODBC connector on your machine, you need to install it before – here you can find the aproppriete driver.
3. If you done it right, you should see something similar to this screen. On the left side there are the different dimensions and measures from the table with which you can start to work.
So from now on you are ready to make your own worksheets and dashboard. Here’s what I made, feel free to post a screenshot of yours in the comments. In a forthcoming post I’m going to show you how I created my dashboard.
Did you like this article? Follow us on Twitter!
Latest posts by Mihály Minkó (see all)
- Developing a simple dashboard in Tableau (part II.) - October 13, 2014
- Creating a simple dashboard in Tableau with Talend and MySQL - September 15, 2014