Tuesday, June 3, 2014

How to optimize Informatica Aggregator Transformation

In Informatica Power Center, Aggregator transformation is one of the heavily used transformation and one of expensive transformation in terms of the performance, especially when you are having large data set for processing. Now question is how to optimize or tune the aggregator transformation. There are multiple ways to optimize aggregator transformation in Informatica. I have listed out some of them –
  • Group by simple columns: Aggregator transformation performs better when it’s grouped by simple columns. If possible for grouping use number field instead of string or date.
  • Use sorted input: It will reduce the aggregator cache which will improve performance.
  • Use incremental aggregation: it will only cache and process the changed data rather than entire set of data.

Despite of applying above tuning method you may not get the expected performance. Now let us try to see if we can avoid the aggregator transformation completely by using the analytical functions in Source Qualifier. This will be possible only if all the data elements in aggregator transformation are from same source database. This won’t work if source is flat file or if all the data elements are not from same source database. I have given examples for couple of analytical functions that can be used in Source Qualifier.
  • Let’s say you would like to get the average age of the employees reporting to the same manager at the same time you need employee name and their age as well. Without analytical function you would need to have two queries joined one would get the average age another would get then employee name and their age. By using analytical function you can get this in one query like this -
         SELECT MANAGER_NAME, EMP_NAME, AGE,
         AVG(AGE) OVER (PARTITION BY MANAGER_NAME) AS AVG_AGE
         FROM FACT_EMP;

MANAGER_NAME
EMP_NAME
AGE
AVG_AGE
Mike
John
30
40.5
Mike
David
34
40.5
Mike
Sam
50
40.5
Mike
Bob
48
40.5


  • Similarly if you like to get the summation of salary for each manager then you can write a query like this – 
         SELECT MANAGER_NAME, EMP_NAME, SALARY, 
         SUM(SALARY) OVER (PARTITION BY MANAGER_NAME) AS SAL_BY_MANAGER 
         FROM FACT_EMP; 

MANAGER_NAME
EMP_NAME
SALARY
SAL_BY_MANAGER
Mike
John
80000
315000
Mike
David
75000
315000
Mike
Sam
60000
315000
Mike
Bob
100000
315000
 
There are so many other analytical functions, like AVG, COUNT, DENSE_RANK, LAG, LEAD, MAX, MIN, RANK, ROW_NUMBER, SUM, etc.

For detailed explanation about analytical function and full list of analytical functions please refer to your database manual.

I would be glad to see your comment or feedback about this topic.



2 comments: