This Github Web Data Connector is a very basic connector designed to showcase some basic aspects of the Tableau Web Data Connector architecture. After you read some of the WDC techniques in this previous MongoDB post we can learn two more things: authentication and pagination. You can find the full sources in our Starschema Web Tableau Connector repository‘s GitHub folder.
Purpose of the connector
Enable us to get a list of commits from a repository and then create reports about the contributions, contributors and the dates the contributions came.
We’ll use the GitHub REST API to access these commits. The URL to access these is:
1 |
https://api.github.com/repos/<user>/<repo>/commits |
This returns us a JSON array of commits: sample commit list.
There are two aspects of this API endpoint we need to take into consideration:
- Non-authenticated requests are limited to 60 requests/hour, so we need to add basic authentication
- A single request only returns a limited number of commits, so we need pagination
So lets tackle those challenges one-by-one.
Getting started with the connector
We’ll be using the Starschema WDC Connector Base to write our connector since it simplifies some basic tasks (like creating a simple UI).
The construction of the UI and basic states of the connector basically mirror the structure of the MongoDB connector:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
wdc_base.make_tableau_connector steps: start: template: require './start.jade' run: template: require './run.jade' transitions: "start > run" : (data)-> _.extend data, wdc_base.fetch_inputs("#state-start") "enter run": (data)-> # save the password tableau.password = JSON.stringify user: data.auth_username password: data.auth_password # Remove the sensitive data from the connection_data delete data.auth_username delete data.auth_password wdc_base.set_connection_data(data) tableau.submit() |
The only real difference here is storing the username / password pair inside the encrypted tableau.password field as a JSON object to authenticate the requests in the callbacks and removing them from the connection data object so they wont be stored unencrypted.
Getting the metadata
Since github always returns the same data, we can keep the metadata static:
1 2 3 |
GITHUB_COLUMNS = names: ['author_name', 'committer_name', 'author', 'authored_at', 'committer', 'committed_at'] types: ['text', 'text','text', 'date', 'text', 'date'] |
And do the header callback using this static object:
1 2 |
columns: (connection_data)-> tableau.headersCallback( GITHUB_COLUMNS.names, GITHUB_COLUMNS.types ) |
Getting the commit data
Lets go over the rows() function that returns the actual commit data:
1 2 |
rows: ( connection_data, lastRecordToken)-> {user: auth_user, password: auth_password} = JSON.parse(tableau.password) |
First we deserialize the username and password from the encrypted storage so we can do authentication if necessary.
1 2 |
# the URL of the first page connectionUrl = github_commits_url(connection_data.username, connection_data.reponame) |
Then figure out the github URL for the commits from the connection data.
1 2 3 |
# if we are in a pagination loop, use the last record token to load the next page if lastRecordToken.length > 0 connectionUrl = lastRecordToken |
If we are not on the first page (the lastRecordToken isn’t empty) then the url of the next page is already stored in the lastRecordToken so we set the connection url to that.
Next we need to build up the parameters for our AJAX call:
1 2 3 4 5 6 7 8 9 10 11 |
xhr_params = url: connectionUrl, dataType: 'json', success: (data, textStatus, request)-> # [... Success handler / will be detailed later ...] error: (xhr, ajaxOptions, thrownError)-> # Add something to the log and return an empty set if there # was problem with the connection err = "Connection error: #{xhr.responseText} -- #{thrownError}" tableau.log err tableau.abortWithError "Cannot connect to the specified GitHub repository. -- #{err}" |
These parameters are always necessary, and they represent a very basic AJAX call to githubs api. Thankfully the GitHub API uses CORS headers so we wont have any trouble when it comes to cross-site script access.
After creating the parameter object, we may need to add the authentication header to it if we set up the connector for authentication:
1 2 3 |
# Add the auth handler if necessary if connection_data.do_auth xhr_params = apply_auth(xhr_params, auth_user, auth_password) |
The apply_auth helper simply adds an authorization header to our request:
1 2 3 4 5 6 |
make_base_auth = (user, password)-> "Basic #{btoa("#{user}:#{password}")}" apply_auth = (params, username, password)-> _.extend {}, params, beforeSend: (xhr)-> xhr.setRequestHeader('Authorization', make_base_auth(username, password)) |
After the AJAX parameter setup is complete, we just need to call it:
1 |
$.ajax xhr_params |
Now we have created and dispatched the request, so its time to look at what happens when we have received a reply (aka. the succes handler):
1 |
link_headers = parse_link_header( request.getResponseHeader('Link') ) |
For pagination, the response returned by GitHub contains a header named Link which contains the pagination information that we need to parse with our little helper:
1 2 3 4 5 6 7 8 9 |
# Parses the value of the Link header returned by GitHub parse_link_header = (link_header)-> return {} unless link_header o = {} match = LINK_REGEXP.exec link_header while match != null console.log match o[match[2]] = match[1] match = LINK_REGEXP.exec link_header |
Next we’ll check if we actually received the correct response. We do this by checking if the response is an actual array:
1 2 3 |
# Stop if no commits present unless _.isArray(data) tableau.abortWithError "GitHub returned an invalid response." |
Now we know that the data we received can be iterated, so lets just do that and collect the data we need from each commit:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
out = for commit_data in data # shorten names {committer: cc, author: ca} = commit_data.commit # return the data { author_name: ca.name committer_name: cc.name author: ca.email committer: cc.email authored_at: ca.date committed_at: cc.date } |
This simply turns our array of complicated commit objects into a simple flat object that we’ll return to Tableau.
All thats left for us is to check if we need to load more pages and call the tableau.dataCallback() function:
1 2 |
has_more = if link_headers.next then true else false tableau.dataCallback( out, link_headers.next, has_more) |
The Result
Some nice shots from the results like the connector page and the fetched data:
Questions and comments are welcome as always.
- Tableau Extensions Addons Introduction: Synchronized Scrollbars - December 2, 2019
- Tableau External Services API: Adding Haskell Expressions as Calculations - November 20, 2019
- Scaling out Tableau Extracts – Building a distributed, multi-node MPP Hyper Cluster - August 11, 2019