FOR BI PROS LEARN Talend
Vacuum Analyze with Talend Open Studio
March 23, 2015
0
, , , , , , , , , ,

The whole started with a requirement to vacuum analyze our production greenplum appliances. The problem was with the timing, we only had two hours to complete the activity. This means we would need to start with the table that was never vacuumed, then followup with the other tables ordered by vacuum date and stop process precisely after two hours. The following query will return the vacuum analyze SQLs in the correct order (works for all postgres clones), we just need to execute them:

It is possible to narrow down the tables/schemas using filters. So we have a SQL script that provides us the SQL queries we have to execute, and this is where we hit a problem. We can execute these SQL-s with Talend. We can even use multiple threads. But we want to have some automation to kill this after a given time.

My first solution looked like this:

talend_vacuum_job_overview_v1

Overview of the first version.

Looking at the results we can see that something is not okay. The execution time should be 2 hours: it was, execution should succeed, it failed. With a generated fail message, still it appears as a failure.

talend_vacuum_job_execution_v1

Failure even if the job was successful

This approach works, you can use this to automatically kill long running ETL sessions, and get notified by.Since we want to have this in production (where we get a notification from all the failed jobs) we need a way that won’t generate any failures in the logs, if we have a failure in the logs there must be a failure. I shared this approach because it is a generic solution. Its simple, and can be used anywhere.

As for the second version, I realised we have to get rid of the tDie component, also this approach works with Talend Open Studio

talend_vacuum_job_overview_v2

Overview of the second version

So where is the magic you may ask? Well it is hidden inside the tJavaRow component:

After a given time we start to skip in the loop. This way after the given time the second tFlowMeter won’t receive any new data.

talend_vacuum_job_detail_1_v2

Execution details

I configured it to stop after 1 minute, however it seems we had a few queries running, after those completed the job quickly iterated on the results without passing them. We can have a comparison with the original version, how it appears in the logs:

talend_vacuum_job_exeucution_v2

Talends Project level logging

For every project I configure project level logging, that is one of the best features talend offers, and its available in open studio as well.

You may ask how does the actual executor looks like:

talend_vacuum_job_detail_2_v2

Executor job

Since this job was created in Talend Enterprise we have the option to use Threads. The vacuums can be paralellized, using the Iterate links advanced option with a context variable. But stil connection creation will take some considerable time. So we should use a shared connection. Unfortunately we need a separate connection for each thread, since the PostgreSQL JDBC driver will serialize the parallel requests. We can have a unique connection name among the Threads:

The tJava contains only a System.out.println. Since this executor will be called thousands of times every week, project level logging will be turned on, but we still have the console logs to notify about errors, which shouldn’t happen.


Did you like this article? Follow us on Twitter!

Balazs Gunics

Balazs Gunics

ETL Specialist at Starschema Ltd
Ferrying data with Talend since 2012.
ETL Development is nice, but supporting Developers by implementing non-existing features and creating workarounds are even nicer.
There's nothing you can't solve with Talend.
Balazs Gunics

Latest posts by Balazs Gunics (see all)

Related items

/ You may check this items as well

Scaling Tableau Image

Scaling out Tableau Extracts – Building a distributed, multi-node MPP Hyper Cluster

The whole started with a requirement to vacuum ana...

Read more
Pasted image at 2018_01_09 04_59 PM

Python Experiments in Tableau 1. – Add live currency conversion to Tableau Dashboards using TabPy

The whole started with a requirement to vacuum ana...

Read more
Tableau Docker

HOWTO: Tableau Server Linux in Docker Container

The whole started with a requirement to vacuum ana...

Read more