Ans: Informatica is a software development company that offers data integration products. It offers products for ETL, data masking, data quality, data replication, data virtualization, master data management, etc.
The Informatica PowerCenter ETL/data integration tool is the company's most widely used product; in common usage, "Informatica" refers to the Informatica PowerCenter tool for ETL.
Informatica PowerCenter is used for data integration. It can connect to and fetch data from heterogeneous sources and process that data.
For example, you can connect to both a SQL Server database and an Oracle database and integrate the data into a third system.
The latest version of Informatica PowerCenter available is 9.6.0. The different editions for the PowerCenter are:
The popular clients using Informatica Powercenter as a data integration tool are U.S Air Force, Allianz, Fannie Mae, ING, Samsung, etc. The popular tools available in the market in competition to Informatica are IBM Datastage, Oracle OWB, Microsoft SSIS and Ab Initio.
OR
Define Informatica?
Ans: Informatica is a tool that supports all the steps of the Extraction, Transformation and Load process. Nowadays Informatica is also used as an integration tool. Informatica is easy to use. It has a simple visual interface, like forms in Visual Basic. You just need to drag and drop different objects (known as transformations) and design the process flow for data extraction, transformation and load.
These process flow diagrams are known as mappings. Once a mapping is made, it can be scheduled to run as and when required. In the background, the Informatica server takes care of fetching data from the source, transforming it, and loading it to the target systems/databases.
Ans: Informatica comes into the picture wherever we have a data system available and want to perform certain operations on the data at the backend. These can be cleaning up the data, modifying the data based on a set of rules, or simply bulk loading data from one system to another.
Informatica offers a rich set of features such as row-level operations on data, integration of data from multiple structured, semi-structured or unstructured systems, and scheduling of data operations. It also maintains metadata, so information about the process and data operations is preserved.
Ans: Informatica has some advantages over other data integration systems. A couple of the advantages are:
Or
It is a GUI tool; coding in a graphical tool is generally faster than hand-coded scripting.
Can communicate with all major data sources (mainframe/RDBMS/flat files/XML/VSAM/SAP, etc.).
Can handle very large volumes of data effectively.
Mappings, extraction rules, cleansing rules, transformation rules, aggregation logic and loading rules are kept in separate objects in an ETL tool, so a change to one object has minimal impact on the others.
Reusability of the object (Transformation Rules)
Informatica has different “adapters” for extracting data from packaged ERP applications (such as SAP or PeopleSoft).
Availability of resource in the market.
Can be run in Windows and Unix environments.
Ans: Informatica has a wide range of application that covers areas such as:
Ans: Some basic Informatica programs are:
Ans: There are many development components in Informatica. However, these are the most widely used of them:
Ans: ETL tools are quite different from other tools. They are used for performing some actions such as:
Definition of
Sub Questions in Q8.
Ans:
Criteria | Informatica | DataStage |
GUI for development & monitoring | PowerCenter Designer, Repository Manager, Workflow Designer, Workflow Manager. | DataStage Designer, Job Sequence Designer and Director. |
Data integration solution | Step-by-step solution | Project based integration solution |
Data transformation | Good | Excellent |
Ans:
Ans:
Ans: A domain is the set of interlinked relationships and nodes that are managed from a single organizational point.
Ans: The repository server mainly guarantees repository reliability and consistency, while the PowerCenter server handles the execution of the various processes between the components of the server’s database repository.
Ans: It mainly depends on the number of ports required, but in general there can be any number of repositories.
Ans: The main advantage of partitioning a session is to improve the server’s processing and efficiency. Another advantage is that it runs the individual sequences within the session.
Ans: With the help of a Command task at the session level, we can create indexes after the load procedure.
Ans: A session is a set of instructions that tells the server how to move data from a source to a target.
Ans: We can have any number of sessions, but it is advisable to have fewer sessions in a batch because this makes migration easier.
Ans: A value that changes during the session’s execution is called a mapping variable, whereas a value that does not change during the session’s execution is called a mapping parameter.
Ans: The features of complex mapping are:
A large number of transformations
Tricky requirements
Compound business logic
Ans: With the help of the debugging option, we can identify whether a mapping is correct without connecting sessions.
Ans: Yes, we can use mapping parameters or variables in any other reusable transformation because it doesn’t have any mapplet.
Ans: If extra memory is needed, the Aggregator creates extra cache files to hold the transformation values. It also keeps the intermediate values that are in the local buffer memory.
Ans: A transformation that has access to an RDBMS is known as a Lookup transformation.
Ans: The dimensions that are used for playing diversified roles while remaining in the same database domain are known as role playing dimensions.
Ans: We can access repository reports by using metadata reporter. No need of using SQL or other transformation as it is a web app.
Ans: The types of metadata stored in the repository are target definitions, source definitions, mapplets, mappings and transformations.
Ans: Data can be transferred from one code page to another; provided that both code pages have the same character sets, no data loss occurs.
Ans: Only one mapping can be validated at a time, so mappings cannot be validated simultaneously.
Ans: It differs from the Expression transformation, which performs calculations row by row; here we can do aggregate calculations over groups of rows, such as averages, sums, etc.
Ans: It is used for performing non aggregated calculations. We can test conditional statements before output results move to the target tables.
Ans: A Filter transformation is a way of filtering rows in a mapping. All of its ports are input/output ports, and only the rows that match the filter condition can pass through it.
Ans: It joins two related heterogeneous sources located in different locations, whereas a Source Qualifier transformation can join data originating from a common source.
Ans: A Lookup transformation is used for looking up data in a relational table through a mapping. We can use multiple Lookup transformations in a mapping.
Ans: It is a multiple input group transformation that is used to combine data from different sources.
Ans: The incremental aggregation is done whenever a session is developed for a mapping aggregate.
Ans: A connected lookup takes its inputs directly from other transformations in the pipeline. An unconnected lookup doesn’t take inputs directly from other transformations, but it can be used in any transformation and can be called as a function using an LKP expression.
The differences are illustrated in the below table:
Connected Lookup | Unconnected Lookup |
Connected lookup participates in dataflow and receives input directly from the pipeline | Unconnected lookup receives input values from the result of a LKP: expression in another transformation |
Connected lookup can use both dynamic and static cache | Unconnected Lookup cache can NOT be dynamic |
Connected lookup can return more than one column value ( output port ) | Unconnected Lookup can return only one column value i.e. output port |
Connected lookup caches all lookup columns | Unconnected lookup caches only the lookup output ports in the lookup conditions and the return port |
Supports user-defined default values (i.e. value to return when lookup conditions are not satisfied) | Does not support user defined default values |
Ans: A mapplet is a reusable object created using the Mapplet Designer.
Ans: A reusable transformation can be used multiple times in mappings. It is stored as metadata, separate from the mappings that use it.
Ans: Whenever a row has to be updated or inserted based on some sequence, the Update Strategy is used. The condition must be specified beforehand so that the processed row is flagged as an update or an insert.
Ans: When the server encounters DD_REJECT in an Update Strategy transformation, it sends the row to the reject file.
Ans: It is a substitute for the natural primary key. It is a unique identifier for each row in the table.
Ans: To partition a session, one needs to configure the session to partition the source data and then install the Informatica server on a machine with multiple CPUs.
Ans: Error logs, bad files, workflow logs and session logs are the files created during session runs.
Ans: It is a set of instructions that tells the PowerCenter server how and when to move data from sources to targets.
Ans: This task allows one or more shell commands in UNIX, or DOS commands in Windows, to run during the workflow.
Ans: This task can be used anywhere in the workflow to run the shell commands.
Ans: A Command task can be called as the pre- or post-session shell command for a Session task. One can run it as a pre-session command, a post-session success command or a post-session failure command.
Ans: Predefined events are file-watch events. Such an event waits for a specific file to arrive at a specific location.
Ans: User-defined events are a flow of tasks in the workflow. Events can be created and then raised as the need arises.
Ans: The set of instructions that tells the server how to execute tasks is known as a workflow.
Ans: The different tools in workflow manager are:
Task Developer.
Worklet Designer.
Workflow Designer.
Ans: ‘CONTROL M’ is the third party tool for scheduling purpose other than workflow manager.
Ans: It is a process by which multi-dimensional analysis occurs.
Ans: Different types of OLAP are ROLAP, HOLAP and DOLAP.
Ans: A worklet is a group of workflow tasks. It can include timer, decision, command, event wait tasks, etc.
Ans: With the help of target designer we can create target definition.
Ans: In workflow monitor we can find throughput option.
Right click on session, then press on get run properties and under source/target statistics we can find this option.
Ans: It is specified on the basis of the source qualifiers in a mapping. If there are many source qualifiers attached to various targets, we can specify the order in which Informatica loads data into the targets.
Ans: Aggregator performance improves dramatically if records are sorted before passing to the aggregator and the “sorted input” option under aggregator properties is checked. The record set should be sorted on the columns that are used in the Group By operation. It is often a good idea to sort the record set at the database level, e.g. inside a source qualifier transformation, unless there is a chance that the already sorted records from the source qualifier can become unsorted again before reaching the aggregator.
Ans: Informatica lookups can be cached or un-cached (no cache). A cached lookup can be either static or dynamic. A static cache is not modified once it is built; it remains the same during the session run. A dynamic cache, on the other hand, is refreshed during the session run by inserting or updating records in the cache based on the incoming source data.
By default, the Informatica cache is a static cache. A lookup cache can also be classified as persistent or non-persistent, based on whether Informatica retains the cache after the session run completes or deletes it.
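The difference between a static and a dynamic cache can be pictured with a small Python sketch. This only illustrates the caching behavior described above and is not Informatica code; the function names and the dictionary-based cache are assumptions made for the example.

```python
def run_static(lookup_rows, source_rows):
    """Static cache: built once, never modified during the run."""
    cache = dict(lookup_rows)              # built at the start of the run
    # Each source key is only looked up; a miss returns None.
    results = [(key, cache.get(key)) for key, _ in source_rows]
    return results, cache


def run_dynamic(lookup_rows, source_rows):
    """Dynamic cache: rows are inserted or updated as source data arrives."""
    cache = dict(lookup_rows)
    actions = []
    for key, value in source_rows:
        if key not in cache:
            cache[key] = value             # new key: insert into the cache
            actions.append((key, "insert"))
        elif cache[key] != value:
            cache[key] = value             # changed value: update the cache
            actions.append((key, "update"))
        else:
            actions.append((key, "no change"))
    return actions, cache
```

With the static version the cache is the same before and after the run; with the dynamic version it reflects every incoming row.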
Ans: A target table can be updated without using ‘Update Strategy’. For this, we need to define the key of the target table at the Informatica level, and then connect the key and the field we want to update in the mapping target. At the session level, we should set the target property to “Update as Update” and check the “Update” check-box.
Let’s assume we have a target table “Customer” with the fields “Customer ID”, “Customer Name” and “Customer Address”. Suppose we want to update “Customer Address” without an Update Strategy. Then we have to define “Customer ID” as the primary key at the Informatica level and connect the Customer ID and Customer Address fields in the mapping. If the session properties are set correctly as described above, the mapping will update only the customer address field for all matching customer IDs.
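The effect of this configuration can be sketched in Python. This is a rough illustration only, assuming rows are plain dictionaries; the function and field names are hypothetical and do not correspond to any Informatica API.

```python
def update_as_update(target, incoming, key="customer_id",
                     fields=("customer_address",)):
    """Update only the connected fields of rows whose key already exists."""
    by_key = {row[key]: row for row in target}
    for row in incoming:
        match = by_key.get(row[key])
        if match is None:
            continue                       # no matching key: nothing is updated
        for field in fields:
            match[field] = row[field]      # only the connected fields change
    return target
```

Fields that are not connected, such as the customer name, keep their existing values.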
Ans: From an Informatica developer’s perspective, some of the new features in Informatica 9.x are as follows:
Lookup can now be configured as an active transformation; it can return multiple rows on a successful match
Now you can write SQL override on un-cached lookup also. Previously you could do it only on cached lookup
You can control the size of your session log. In a real-time environment you can control the session log file size or time
Database deadlock resilience feature – this will ensure that your session does not immediately fail if it encounters any database deadlock, it will now retry the operation again. You can configure number of retry attempts.
Ans: First up, Informatica is a data integration tool, while Teradata is an MPP database with some scripting (BTEQ) and fast data movement (mLoad, FastLoad, Parallel Transporter, etc.) capabilities.
Informatica over Teradata
Ans: The Informatica ETL tool is a market leader in data integration and data quality services. Informatica is a successful ETL and EAI tool with significant industry coverage. ETL stands for extract, transform, load. Data integration tools are different from other software platforms and languages.
They have no built-in feature for building a user interface where the end user can see the transformed data. The Informatica ETL tool, PowerCenter, can manage, integrate and migrate enterprise data.
Ans: Informatica PowerCenter is one of the enterprise data integration products developed by Informatica Corporation. It is an ETL tool used for extracting data from the source, transforming it, and loading it into the target.
The extraction part involves understanding, analyzing and cleaning the source data.
Transformation part involves cleaning of the data more precisely and modifying the data as per the business requirements.
The loading part involves assigning the dimensional keys and loading into the warehouse.
Ans: An expression transformation in Informatica is a common Powercenter mapping transformation. It is used to transform data passed through it one record at a time. The expression transformation is passive and connected. Within an expression, data can be manipulated, variables created, and output ports generated. We can write conditional statements within output ports or variables to help transform data according to our business requirements.
Ans: The problem comes with traditional programming languages where you need to connect to multiple sources and you have to handle errors. For this you have to write complex code. ETL tools provide a ready-made solution for this. You don’t need to worry about handling these things and can concentrate only on coding the requirement part.
Ans: An active transformation is the one that performs any of the following actions:
On the other hand a passive transformation is the one which does not change the number of rows that pass through it. Example: Expression transformation.
Ans: Following differences can be noted:
Router | Filter |
Router transformation divides the incoming records into multiple groups based on some condition. Such groups can be mutually inclusive (Different groups may contain same record) | Filter transformation restricts or blocks the incoming record set based on one given condition. |
Router transformation itself does not block any record. If a certain record does not match any of the routing conditions, the record is routed to default group | Filter transformation does not have a default group. If one record does not match filter condition, the record is blocked |
Router acts like a CASE.. WHEN statement in SQL (or a Switch().. Case statement in C) | Filter acts like a WHERE condition in SQL. |
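The contrast in the table can be sketched in Python. This is a simplified model, not Informatica code; the function names and the use of callables as conditions are assumptions for the example.

```python
def filter_rows(rows, condition):
    """Filter: a single condition; rows that fail it are dropped."""
    return [row for row in rows if condition(row)]


def route_rows(rows, groups):
    """Router: test every group condition; unmatched rows go to DEFAULT."""
    routed = {name: [] for name in groups}
    routed["DEFAULT"] = []
    for row in rows:
        matched = False
        for name, condition in groups.items():
            if condition(row):
                routed[name].append(row)   # a row may land in several groups
                matched = True
        if not matched:
            routed["DEFAULT"].append(row)  # the router itself drops nothing
    return routed
```

Note that the same row can appear in more than one router group, which a single filter cannot express.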
Ans:
Ans: This is because we can select the "distinct" option in the sorter property.
When the Sorter transformation is configured to treat output rows as distinct, it assigns all ports as part of the sort key. The Integration Service discards duplicate rows compared during the sort operation. The number of Input Rows will vary as compared with the Output rows and hence it is an Active transformation.
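Why the distinct option changes the row count can be seen in a short Python sketch. It assumes each row is a tuple whose columns can all be compared; the function name is hypothetical.

```python
def sorter_distinct(rows):
    """Sort on all columns, then drop rows equal to the previous row."""
    output = []
    for row in sorted(rows):               # every column is part of the sort key
        if not output or row != output[-1]:
            output.append(row)             # first occurrence of this row
    return output
```

Because duplicates end up next to each other after sorting, comparing each row with the previous one is enough to discard them.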
Ans: We can configure a Lookup transformation to cache the underlying lookup table. In case of static or read-only lookup cache the Integration Service caches the lookup table at the beginning of the session and does not update the lookup cache while it processes the Lookup transformation.
In case of dynamic lookup cache the Integration Service dynamically inserts or updates data in the lookup cache and passes the data to the target. The dynamic cache is synchronized with the target.
Ans: When we issue the STOP command on the executing session task, the Integration Service stops reading data from source. It continues processing, writing and committing the data to targets. If the Integration Service cannot finish processing and committing data, we can issue the abort command.
In contrast ABORT command has a timeout period of 60 seconds. If the Integration Service cannot finish processing and committing data within the timeout period, it kills the DTM process and terminates the session.
Scenario 1: Duplicate rows are present in relational database
Suppose we have Duplicate records in Source System and we want to load only the unique records in the Target System eliminating the duplicate rows. What will be the approach?
Assuming that the source system is a Relational Database, to eliminate duplicate records, we can check the Distinct option of the Source Qualifier of the source table and load the target accordingly.
But what if the source is a flat file? Then how can we remove the duplicates from flat file source?
Scenario 2: Deleting duplicate rows / selecting distinct rows for FLAT FILE sources
Here, since the source system is a flat file, you will not be able to select the distinct option in the source qualifier, as it is disabled for flat file sources. Hence the next approach may be to use a Sorter transformation and check the Distinct option. When we select the distinct option, all the columns will be selected as keys, in ascending order by default.
Deleting Duplicate Record Using Informatica Aggregator
Another way to handle duplicate records in a source batch run is to use an Aggregator transformation and check the Group By checkbox on the ports that contain the duplicated data. Here you have the flexibility to select the last or the first of the duplicate records.
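A rough Python sketch of this group-by approach, assuming rows are dictionaries and the group-by ports are given as a list of column names (both assumptions for illustration):

```python
def dedupe_by_group(rows, key_columns, keep="last"):
    """Keep one row per group-by key: the first or the last one seen."""
    groups = {}
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        if keep == "last" or key not in groups:
            groups[key] = row              # overwrite for "last", set once for "first"
    return list(groups.values())
```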
Scenario 1:
Suppose we have some serial numbers in a flat file source. We want to load the serial numbers in two target files one containing the EVEN serial numbers and the other file having the ODD ones.
Answer:
After the Source Qualifier place a Router Transformation. Create two Groups namely EVEN and ODD, with filter conditions as:
MOD(SERIAL_NO,2)=0 and MOD(SERIAL_NO,2)=1
... respectively. Then output the two groups into two flat file targets.
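The two router conditions behave like the following Python sketch, which assumes the serial numbers are integers (the function name is hypothetical):

```python
def split_even_odd(serial_numbers):
    """Split serial numbers into two target lists, as the two router groups do."""
    even = [n for n in serial_numbers if n % 2 == 0]   # MOD(SERIAL_NO,2)=0
    odd = [n for n in serial_numbers if n % 2 == 1]    # MOD(SERIAL_NO,2)=1
    return even, odd
```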
Scenario 1
Suppose in our Source Table we have data as given below:
Student Name | Maths | Life Science | Physical Science |
Sam | 100 | 70 | 80 |
John | 75 | 100 | 85 |
Tom | 80 | 100 | 85 |
We want to load our Target Table as:
Student Name | Subject Name | Marks |
Sam | Maths | 100 |
Sam | Life Science | 70 |
Sam | Physical Science | 80 |
John | Maths | 75 |
John | Life Science | 100 |
John | Physical Science | 85 |
Tom | Maths | 80 |
Tom | Life Science | 100 |
Tom | Physical Science | 85 |
Describe your approach.
Answer:
Here, to convert the columns to rows, we have to use the Normalizer transformation, followed by an Expression transformation to decode the column taken into consideration.
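The two steps can be sketched in Python using the sample data above. This is an illustration only: the position number stands in for the column identifier produced in the normalize step, and the function names are hypothetical.

```python
SUBJECTS = ["Maths", "Life Science", "Physical Science"]


def normalize(rows):
    """One output row per subject column, tagged with the column's position."""
    return [
        (row["Student Name"], position, row[subject])
        for row in rows
        for position, subject in enumerate(SUBJECTS, start=1)
    ]


def decode_subject(normalized):
    """Translate the column position back into a subject name."""
    return [
        (name, SUBJECTS[position - 1], marks)
        for name, position, marks in normalized
    ]
```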
Question
Name the transformations that convert one row into many rows, i.e. increase the input-to-output row count. Also, what is the name of the reverse transformation?
Answer:
The Normalizer and Router transformations are active transformations that can produce more output rows than input rows. The reverse, converting many rows into one, is done by the Aggregator transformation.
Scenario 2
Suppose we have a source table and we want to load three target tables based on source rows such that first row moves to first target table, second row in second target table, third row in third target table, fourth row again in first target table so on and so forth. Describe your approach.
Answer
We can clearly see that we need a Router transformation to route or filter source data to the three target tables. Now the question is what the filter conditions will be. First of all we need an Expression transformation that has all the source table columns along with another i/o port, say seq_num, which gets a sequence number for each source row from the NextVal port of a Sequence Generator with start value 0 and increment by 1. Now the filter conditions for the three router groups will be:
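One way to picture such conditions is with the remainder of seq_num divided by 3. The Python sketch below assumes seq_num counts 1, 2, 3, ... for successive rows; the function name is hypothetical and this is not Informatica expression syntax.

```python
def route_three_ways(rows):
    """Send rows to three targets in rotation, based on seq_num modulo 3."""
    first, second, third = [], [], []
    for seq_num, row in enumerate(rows, start=1):   # assumed to start at 1
        remainder = seq_num % 3
        if remainder == 1:
            first.append(row)      # rows 1, 4, 7, ...
        elif remainder == 2:
            second.append(row)     # rows 2, 5, 8, ...
        else:
            third.append(row)      # rows 3, 6, 9, ...
    return first, second, third
```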
Scenario
Suppose we have ten source flat files of the same structure. How can we load all the files into the target database in a single batch run using a single mapping?
Answer
After we create a mapping to load data into the target database from flat files, we move on to the session properties of the Source Qualifier. To load a set of source files, we need to create a file, say final.txt, containing the source flat file names (ten files in our case) and set the Source filetype option to Indirect. Next, point to this flat file final.txt, fully qualified, through the Source file directory and Source filename properties.
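The idea of an indirect file list can be sketched in Python. This is only an analogy: it assumes the list file holds one file name per line and that relative names are resolved against the list file's own directory, which is an assumption of the example rather than documented Informatica behavior.

```python
from pathlib import Path


def read_indirect(list_file):
    """Read every file named in the list file and concatenate their rows."""
    base = Path(list_file).parent
    rows = []
    for name in Path(list_file).read_text().splitlines():
        name = name.strip()
        if not name:
            continue                           # skip blank lines in the list
        rows.extend((base / name).read_text().splitlines())
    return rows
```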
Answer:
We will use a basic property of the Expression transformation: at any time we can access the previous row’s data as well as the data of the row currently being processed. All we need are simple Sorter, Expression and Filter transformations to achieve aggregation at the Informatica level.
Scenario
Suppose in our Source Table we have data as given below:
Student Name | Subject Name | Marks |
Sam | Maths | 100 |
Tom | Maths | 80 |
Sam | Physical Science | 80 |
John | Maths | 75 |
Sam | Life Science | 70 |
John | Life Science | 100 |
John | Physical Science | 85 |
Tom | Life Science | 100 |
Tom | Physical Science | 85 |
We want to load our Target Table as:
Student Name | Maths | Life Science | Physical Science |
Sam | 100 | 70 | 80 |
John | 75 | 100 | 85 |
Tom | 80 | 100 | 85 |
Describe your approach.
Answer
Here our scenario is to convert many rows into one row, and the transformation that will help us achieve this is the Aggregator.
Our Mapping will look like this:
We will sort the source data based on STUDENT_NAME ascending followed by SUBJECT ascending.
Now based on STUDENT_NAME in GROUP BY clause the following output subject columns are populated as
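The pivot itself can be sketched in Python. This is only an illustration, assuming the rows arrive as (student, subject, marks) tuples; the function name is hypothetical, and because a dictionary is used here the rows do not need to be sorted first.

```python
def pivot_marks(rows):
    """Group by student and turn each subject into its own column."""
    pivoted = {}
    for name, subject, marks in rows:
        # One entry per student; each subject becomes a key in that entry.
        pivoted.setdefault(name, {})[subject] = marks
    return pivoted
```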
Q29. Revisiting Source Qualifier Transformation
Ans:
A Source Qualifier is an Active and Connected Informatica transformation that reads the rows from a relational database or flat file source.
The transformation provides the Select Distinct property. When it is enabled, the Integration Service adds a SELECT DISTINCT clause to the default SQL query, which affects the number of rows returned by the database to the Integration Service; hence it is an Active transformation.
Ans: The Source Qualifier transformation displays the transformation datatypes. The transformation datatypes determine how the source database binds data when the Integration Service reads it. Now if we alter the datatypes in the Source Qualifier transformation, or the datatypes in the source definition and Source Qualifier transformation do not match, the Designer marks the mapping as invalid when we save it.
Ans: Whenever we add a custom SQL or SQL override query, it overrides the User-Defined Join, Source Filter, Number of Sorted Ports, and Select Distinct settings in the Source Qualifier transformation. Hence only the user-defined SQL query will be fired in the database and all the other options will be ignored.
Ans: Source Filter option is used basically to reduce the number of rows the Integration Service queries so as to improve performance.
Select Distinct option is used when we want the Integration Service to select unique values from a source, filtering out unnecessary data earlier in the data flow, which might improve performance.
The Number Of Sorted Ports option is used when we want the source data to be sorted for use in later transformations such as Aggregator or Joiner, which perform better when configured for sorted input.
Ans: A mismatch in, or a change to, the order of the selected columns relative to the connected transformation output ports may result in session failure.
Ans: We use source filter to reduce the number of source records. If we include the string WHERE in the source filter, the Integration Service fails the session.
Ans: While joining Source Data of heterogeneous sources as well as to join flat files we will use the Joiner transformation. Use the Joiner transformation when we need to join the following types of sources:
Ans: Sybase supports a maximum of 16 columns in an ORDER BY clause. So if the source is Sybase, do not sort more than 16 columns.
Ans: If we have multiple Source Qualifier transformations connected to multiple targets, we can designate the order in which the Integration Service loads data into the targets.
In the Mapping Designer, We need to configure the Target Load Plan based on the Source Qualifier transformations in a mapping to specify the required loading order.
Ans: In the Workflow Manager, we can Configure Constraint based load ordering for a session. The Integration Service orders the target load on a row-by-row basis. For every row generated by an active source, the Integration Service loads the corresponding transformed row first to the primary key table, then to the foreign key table.
Hence if we have one Source Qualifier transformation that provides data for multiple target tables having primary and foreign key relationships, we will go for Constraint based load ordering.
Q30. Revisiting Filter Transformation
Ans: A Filter transformation is an Active and Connected transformation that can filter rows in a mapping.
Only the rows that meet the Filter Condition pass through the Filter transformation to the next transformation in the pipeline. TRUE and FALSE are the implicit return values from any filter condition we set. If the filter condition evaluates to NULL, the row is assumed to be FALSE.
The numeric equivalent of FALSE is zero (0) and any non-zero value is the equivalent of TRUE.
As an ACTIVE transformation, the Filter transformation may change the number of rows passed through it. A filter condition returns TRUE or FALSE for each row that passes through the transformation, depending on whether a row meets the specified condition. Only rows that return TRUE pass through this transformation. Discarded rows do not appear in the session log or reject files.
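The rules for evaluating a filter condition can be summarized in a small Python sketch, where None stands in for NULL (an assumption of the example; the function name is hypothetical):

```python
def passes_filter(condition_value):
    """NULL (None) counts as FALSE; zero is FALSE; any other number is TRUE."""
    if condition_value is None:
        return False                   # a NULL condition drops the row
    return condition_value != 0        # non-zero is treated as TRUE
```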
Ans:
SQ Source Filter | Filter Transformation |
Source Qualifier transformation filters rows when read from a source. | Filter transformation filters rows from within a mapping |
Source Qualifier transformation can only filter rows from Relational Sources. | Filter transformation filters rows coming from any type of source system in the mapping level. |
Source Qualifier limits the row set extracted from a source. | Filter transformation limits the row set sent to a target. |
Source Qualifier reduces the number of rows used throughout the mapping and hence it provides better performance. | To maximize session performance, include the Filter transformation as close to the sources in the mapping as possible to filter out unwanted data early in the flow of data from sources to targets. |
The filter condition in the Source Qualifier transformation only uses standard SQL as it runs in the database. | Filter Transformation can define a condition using any statement or transformation function that returns either a TRUE or FALSE value. |
Q31. Revisiting Joiner Transformation
Q1. What is a Joiner Transformation and why it is an Active one?
Ans: A Joiner is an Active and Connected transformation used to join source data from the same source system or from two related heterogeneous sources residing in different locations or file systems.
The Joiner transformation joins sources with at least one matching column. The Joiner transformation uses a condition that matches one or more pairs of columns between the two sources.
The two input pipelines include a master pipeline and a detail pipeline or a master and a detail branch. The master pipeline ends at the Joiner transformation, while the detail pipeline continues to the target.
In the Joiner transformation, we must configure the transformation properties namely Join Condition, Join Type and Sorted Input option to improve Integration Service performance.
The join condition contains ports from both input sources that must match for the Integration Service to join two rows. Depending on the type of join selected, the Integration Service either adds the row to the result set or discards the row.
The Joiner transformation produces result sets based on the join type, condition, and input data sources. Hence it is an Active transformation.
Ans: The Joiner transformation accepts input from most transformations. However, following are the limitations:
Ans: During a session run, the Integration Service compares each row of the master source against the detail source. The master and detail sources need to be configured for optimal performance.
To improve performance for an Unsorted Joiner transformation, use the source with fewer rows as the master source. The fewer unique rows in the master, the fewer iterations of the join comparison occur, which speeds the join process.
When the Integration Service processes an unsorted Joiner transformation, it reads all master rows before it reads the detail rows. The Integration Service blocks the detail source while it caches rows from the master source. Once the Integration Service reads and caches all master rows, it unblocks the detail source and reads the detail rows.
To improve performance for a Sorted Joiner transformation, use the source with fewer duplicate key values as the master source.
When the Integration Service processes a sorted Joiner transformation, it blocks data based on the mapping configuration and it stores fewer rows in the cache, increasing performance.
Blocking logic is possible if master and detail input to the Joiner transformation originate from different sources. Otherwise, it does not use blocking logic. Instead, it stores more rows in the cache.
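The unsorted case, where all master rows are cached before the detail rows are read, can be sketched in Python as a normal (inner) join. This is a simplified model under the assumption that rows are dictionaries sharing one key column; the function name is hypothetical.

```python
from collections import defaultdict


def unsorted_join(master_rows, detail_rows, key):
    """Normal join: cache all master rows, then stream the detail rows."""
    cache = defaultdict(list)
    for row in master_rows:                    # master is read and cached first
        cache[row[key]].append(row)
    joined = []
    for detail in detail_rows:                 # detail rows are then read one by one
        for master in cache.get(detail[key], []):
            joined.append({**master, **detail})
    return joined
```

The cache grows with the number of master rows, which is why the smaller source is the better choice for the master.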
Ans: In SQL, a join is a relational operator that combines data from multiple tables into a single result set. The Joiner transformation is similar to an SQL join except that data can originate from different types of sources.
The Joiner transformation supports the following types of joins: Normal, Master Outer, Detail Outer and Full Outer.
Note: A normal or master outer join performs faster than a full outer or detail outer join.
Ans:
Ans: We can define one or more conditions based on equality between the specified master and detail sources. Both ports in a condition must have the same datatype.
If we need to use two ports with non-matching datatypes in the join condition, we must convert the datatypes so that they match. The Designer validates datatypes in a join condition.
Additional ports in the join condition increase the time necessary to join two sources.
The order of the ports in the join condition can impact the performance of the Joiner transformation. If we use multiple ports in the join condition, the Integration Service compares the ports in the order we specified.
Ans: The Joiner transformation does not match null values.
For example, if both EMP_ID1 and EMP_ID2 contain a row with a null value, the Integration Service does not consider them a match and does not join the two rows.
To join rows with null values, replace null input with default values in the Ports tab of the joiner, and then join on the default values.
Note: If a result set includes fields that do not contain data in either of the sources, the Joiner transformation populates the empty fields with null values. If we know that a field will return a NULL and we do not want to insert NULLs in the target, set a default value on the Ports tab for the corresponding port.
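The null-handling rule above can be illustrated with a minimal Python sketch, using `None` to stand for NULL. The helper names `join_on` and `with_default` and the default value `-1` are assumptions made for this example only.

```python
def join_on(master_rows, detail_rows, master_key, detail_key):
    """Equality join in which a null (None) key never matches anything."""
    out = []
    for d in detail_rows:
        for m in master_rows:
            if m[master_key] is None or d[detail_key] is None:
                continue  # null keys are not considered a match
            if m[master_key] == d[detail_key]:
                out.append({**m, **d})
    return out

def with_default(rows, key, default):
    """Replace null keys with a default value, similar to a port default."""
    return [{**r, key: default if r[key] is None else r[key]} for r in rows]

master = [{"EMP_ID1": 1, "NAME": "ANN"}, {"EMP_ID1": None, "NAME": "UNKNOWN"}]
detail = [{"EMP_ID2": 1, "SAL": 100}, {"EMP_ID2": None, "SAL": 50}]

# Without defaults the two null rows are not joined.
plain = join_on(master, detail, "EMP_ID1", "EMP_ID2")

# With a default of -1 on both sides the null rows now match each other.
patched = join_on(
    with_default(master, "EMP_ID1", -1),
    with_default(detail, "EMP_ID2", -1),
    "EMP_ID1",
    "EMP_ID2",
)
```

The default should be a value that cannot occur as a real key, otherwise null rows would join to genuine rows.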
Ans: If we have sorted both the master and detail pipelines in order of the ports, say ITEM_NO, ITEM_NAME and PRICE, we must ensure that:
Ans: The best option is to place the Joiner transformation directly after the sort origin to maintain sorted data. However, do not place any of the following transformations between the sort origin and the Joiner transformation:
Ans: Our Mapping will look like this:
To start with the mapping we need the following transformations:
After the Source Qualifier of the EMP table, place a Sorter Transformation. Sort based on the DEPTNO port.
Next we place a Sorted Aggregator Transformation. Here we will find out the AVERAGE SALARY for each (GROUP BY) DEPTNO.
When we perform this aggregation, we lose the data for individual employees.
To maintain employee data, we must pass a branch of the pipeline to the Aggregator Transformation and pass a branch with the same sorted source data to the Joiner transformation to maintain the original data.
When we join both branches of the pipeline, we join the aggregated data with the original data.
So next we need a Sorted Joiner Transformation to join the sorted aggregated data with the original data, based on DEPTNO. Here we take the aggregated pipeline as the Master and the original data flow as the Detail pipeline.
After that we need a Filter Transformation to filter out the employees having salary less than average salary for their department.
Filter Condition: SAL>=AVG_SAL
Lastly we have the Target table instance.
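The data flow of this mapping (Sorter, Sorted Aggregator, Sorted Joiner, Filter) can be mimicked in plain Python to check the logic. This is a rough sketch with made-up EMP rows; the port names EMPNO, DEPTNO, SAL and AVG_SAL follow the description above, everything else is an assumption.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical EMP rows.
emp = [
    {"EMPNO": 1, "DEPTNO": 10, "SAL": 1000},
    {"EMPNO": 2, "DEPTNO": 20, "SAL": 3000},
    {"EMPNO": 3, "DEPTNO": 10, "SAL": 2000},
    {"EMPNO": 4, "DEPTNO": 20, "SAL": 1000},
    {"EMPNO": 5, "DEPTNO": 10, "SAL": 3000},
]

# Sorter: sort on DEPTNO.
sorted_emp = sorted(emp, key=itemgetter("DEPTNO"))

# Sorted Aggregator: average salary per DEPTNO (groupby relies on sorted input).
avg_sal = {}
for deptno, rows in groupby(sorted_emp, key=itemgetter("DEPTNO")):
    salaries = [r["SAL"] for r in rows]
    avg_sal[deptno] = sum(salaries) / len(salaries)

# Sorted Joiner: aggregated branch as master, original sorted branch as detail.
joined = [{**r, "AVG_SAL": avg_sal[r["DEPTNO"]]} for r in sorted_emp]

# Filter: keep employees with SAL >= AVG_SAL.
result = [r for r in joined if r["SAL"] >= r["AVG_SAL"]]
```

With these rows both departments average 2000, so employees 3, 5 and 2 reach the target.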
Ans: A Sequence Generator transformation is a Passive and Connected transformation that generates numeric values. It is used to create unique primary key values, replace missing primary keys, or cycle through a sequential range of numbers. By default, this transformation contains only two output ports, CURRVAL and NEXTVAL. We cannot edit or delete these ports, nor can we add ports to this transformation. We can create approximately two billion unique numeric values, with the widest range being 1 to 2147483647.
Ans: Sequence Generator:
Properties | Description |
Start Value | Start value of the generated sequence that we want the Integration Service to use if we use the Cycle option. If we select Cycle, the Integration Service cycles back to this value when it reaches the end value. Default is 0. |
Increment By | Difference between two consecutive values from the NEXTVAL port. Default is 1. |
End Value | Maximum value generated by the Sequence Generator. After reaching this value the session will fail if the Sequence Generator is not configured to cycle. Default is 2147483647. |
Current Value | Current value of the sequence. Enter the value we want the Integration Service to use as the first value in the sequence. Default is 1. |
Cycle | If selected, when the Integration Service reaches the configured end value for the sequence, it wraps around and starts the cycle again, beginning with the configured Start Value. |
Number of Cached Values | Number of sequential values the Integration Service caches at a time. Default value for a standard Sequence Generator is 0. Default value for a reusable Sequence Generator is 1,000. |
Reset | Restarts the sequence at the current value each time a session runs. This option is disabled for reusable Sequence Generator transformations. |
Will the surrogate keys in both the target tables be the same? If not, how can we flow the same sequence values into both of them?
Ans: When we connect the NEXTVAL output port of the Sequence Generator directly to the surrogate key columns of the target tables, the Sequence number will not be the same.
A block of sequence numbers is sent to one target table's surrogate key column. The second target receives a block of sequence numbers from the Sequence Generator transformation only after the first target table has received its block.
Suppose we have 5 rows coming from the source, so the targets will have the sequence values TGT1 (1,2,3,4,5) and TGT2 (6,7,8,9,10), taking Start Value 0, Current Value 1 and Increment By 1.
Now suppose the requirement is that we need the same surrogate keys in both the targets.
Then the easiest way to handle the situation is to put an Expression Transformation in between the Sequence Generator and the Target tables. The SeqGen will pass unique values to the expression transformation, and then the rows are routed from the expression transformation to the targets.
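The difference between the two wirings can be sketched in Python. This is a simplified model under the assumption that each target draws its own block of numbers when NEXTVAL is connected directly; the function names `load_direct` and `load_via_expression` are hypothetical.

```python
from itertools import count

def load_direct(source_rows):
    """NEXTVAL wired straight to both targets: each target gets its own block."""
    seq = count(1)  # Current Value 1, Increment By 1
    tgt1 = [(next(seq), row) for row in source_rows]
    tgt2 = [(next(seq), row) for row in source_rows]
    return tgt1, tgt2

def load_via_expression(source_rows):
    """NEXTVAL passes through one Expression: one value per row, shared by both targets."""
    seq = count(1)
    tgt1, tgt2 = [], []
    for row in source_rows:
        key = next(seq)  # evaluated once per source row
        tgt1.append((key, row))
        tgt2.append((key, row))
    return tgt1, tgt2

rows = ["a", "b", "c", "d", "e"]
direct1, direct2 = load_direct(rows)
shared1, shared2 = load_via_expression(rows)
```

In the first case the keys are 1 to 5 and 6 to 10; in the second case both targets carry keys 1 to 5.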
Suppose the Current Value is 0 and End Value of Sequence generator is set to 80. What will happen?
Ans: End Value is the maximum value the Sequence Generator will generate. After it reaches the End value the session fails with the following error message:
TT_11009 Sequence Generator Transformation: Overflow error.
The session failure can be avoided if the Sequence Generator is configured to Cycle through the sequence, i.e. whenever the Integration Service reaches the configured End Value for the sequence, it wraps around and starts the cycle again, beginning with the configured Start Value.
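A small Python generator can model the End Value and Cycle behaviour described above. It is a sketch only: the parameter names mirror the property names in the table, while the generator `sequence` and the exception class `SequenceOverflow` are invented for the illustration.

```python
from itertools import islice

class SequenceOverflow(Exception):
    """Stands in for the session failure raised on overflow."""

def sequence(current_value, end_value, start_value=0, increment_by=1, cycle=False):
    value = current_value  # first value handed out is the Current Value
    while True:
        if value > end_value:
            if not cycle:
                raise SequenceOverflow(
                    "Sequence Generator Transformation: Overflow error."
                )
            value = start_value  # wrap around to the Start Value
        yield value
        value += increment_by

# Without Cycle: values up to the End Value, then a failure.
gen = sequence(current_value=78, end_value=80)
first_three = [next(gen) for _ in range(3)]
try:
    next(gen)
    overflowed = False
except SequenceOverflow:
    overflowed = True

# With Cycle: the sequence wraps to the Start Value instead of failing.
cycled = list(islice(sequence(78, 80, start_value=1, cycle=True), 5))
```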
Ans: When we convert a non-reusable Sequence Generator to a reusable one, we observe that the Number of Cached Values is set to 1000 by default and the Reset property is disabled.
When we try to set the Number of Cached Values property of a Reusable Sequence Generator to 0 in the Transformation Developer we encounter the following error message:
The number of cached values must be greater than zero for reusable sequence transformation.
Ans: An aggregator is an Active, Connected transformation which performs aggregate calculations like AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV, SUM and VARIANCE.
Ans: An Expression Transformation performs calculation on a row-by-row basis. An Aggregator Transformation performs calculations on groups.
Ans: Apart from aggregate expressions, the Informatica Aggregator also supports non-aggregate expressions and conditional clauses.
Ans: By default, the aggregator transformation treats null values as NULL in aggregate functions. But we can specify to treat null values in aggregate functions as NULL or zero.
Ans: We can enable the session option, Incremental Aggregation for a session that includes an Aggregator Transformation. When the Integration Service performs incremental aggregation, it actually passes changed source data through the mapping and uses the historical cache data to perform aggregate calculations incrementally.
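The idea of incremental aggregation can be sketched in Python as merging only the changed rows into totals saved from earlier runs. The function `incremental_sum` and the sample figures are hypothetical, and a SUM is used as the only aggregate for simplicity.

```python
def incremental_sum(historical_cache, changed_rows, group_by, value_port):
    """Update cached group totals using only the new or changed source rows."""
    cache = dict(historical_cache)  # totals saved from previous runs
    for row in changed_rows:
        key = row[group_by]
        cache[key] = cache.get(key, 0) + row[value_port]
    return cache

# Totals after yesterday's run (hypothetical).
history = {"STORE_A": 500, "STORE_B": 300}

# Only today's new rows pass through the mapping.
today = [
    {"STORE": "STORE_A", "AMOUNT": 50},
    {"STORE": "STORE_C", "AMOUNT": 20},
]
updated = incremental_sum(history, today, "STORE", "AMOUNT")
```

The full history is never re-read; existing groups are updated and new groups are added to the cache.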
Ans:
Ans: The Integration Service creates the index and data caches in memory to process the Aggregator transformation. If the Integration Service requires more space than is allocated for the index and data cache sizes in the transformation properties, it stores overflow values in cache files, i.e. it pages to disk. One way to increase session performance is to increase the index and data cache sizes in the transformation properties. However, when we check Sorted Input, the Integration Service uses memory to process the Aggregator transformation and does not use cache files.
Ans:
Ans:
Ans: If we do not group values, the Integration Service will return only the last row for the input rows.
Ans: The Integration Service produces one row for each group based on the group by ports. Columns that are neither part of the key nor an aggregate expression return the corresponding value of the last record received for the group. However, if we explicitly use the FIRST function, the Integration Service returns the value from the first row of the group. So the default behaviour is that of the LAST function.
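The "last row wins" behaviour for non-aggregated ports can be imitated in a few lines of Python. The function `aggregate`, the output port name TOTAL and the sample rows are assumptions made for this sketch.

```python
def aggregate(rows, group_by, sum_port):
    """One output row per group: SUM of sum_port, other ports taken from the last row."""
    out = {}
    for row in rows:
        key = row[group_by]
        total = out[key]["TOTAL"] + row[sum_port] if key in out else row[sum_port]
        # Non-aggregated ports are overwritten by each new row, so the last row wins.
        out[key] = {**row, "TOTAL": total}
    return list(out.values())

rows = [
    {"DEPTNO": 10, "ENAME": "ANN", "SAL": 100},
    {"DEPTNO": 20, "ENAME": "BOB", "SAL": 200},
    {"DEPTNO": 10, "ENAME": "CARL", "SAL": 300},
]
result = aggregate(rows, "DEPTNO", "SAL")
```

For department 10 the output row carries ENAME CARL, the last employee received for that group, together with the total of 400.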
Ans: Use conditional clauses in the aggregate expression to reduce the number of rows used in the aggregation. The conditional clause can be any clause that evaluates to TRUE or FALSE.
SUM( SALARY, JOB = 'CLERK' )
Use non-aggregate expressions in group by ports to modify or replace groups.
IIF( PRODUCT = 'Brown Bread', 'Bread', PRODUCT )
The expression can also include one aggregate function within another aggregate function, such as:
MAX( COUNT( PRODUCT ))
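For readers more at home in a general-purpose language, the three expressions above correspond roughly to the following Python. The sample rows are invented, and the reading of the nested aggregate as "largest per-group count" is this sketch's interpretation.

```python
from collections import Counter

employees = [
    {"JOB": "CLERK", "SALARY": 800},
    {"JOB": "MANAGER", "SALARY": 2450},
    {"JOB": "CLERK", "SALARY": 1100},
]

# Conditional clause: only rows where JOB = 'CLERK' contribute to the sum.
clerk_total = sum(e["SALARY"] for e in employees if e["JOB"] == "CLERK")

products = ["Brown Bread", "Bread", "Milk", "Brown Bread"]

def group_of(product):
    # Non-aggregate expression in the group by port: fold 'Brown Bread' into 'Bread'.
    return "Bread" if product == "Brown Bread" else product

# Count per (modified) group, then take the maximum of those counts.
counts = Counter(group_of(p) for p in products)
largest_group = max(counts.values())
```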
Q1. What is a Rank Transform?
Ans: Rank is an Active Connected Informatica transformation used to select a set of top or bottom values of data.
Ans: Like the Aggregator transformation, the Rank transformation lets us group information. The Rank Transform allows us to select a group of top or bottom values, not just one value as in case of Aggregator MAX, MIN functions.
Ans: The Rank port is an input/output port used to specify the column for which we want to rank the source values. By default Informatica creates an output port RANKINDEX for each Rank transformation. It stores the ranking position for each row in a group.
Ans: Rank transformation lets us group information. We can configure one of its input/output ports as a group by port. For each unique value in the group port, the transformation creates a group of rows falling within the rank definition (top or bottom, and a particular number in each rank).
Ans: If two rank values match, they receive the same value in the rank index and the transformation skips the next value.
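The tie rule can be shown with a short Python sketch. The function `rank_index` is a hypothetical helper that returns each value with its rank index.

```python
def rank_index(values, top=True):
    """Pair each value with its rank; ties share a rank and the next rank is skipped."""
    ordered = sorted(values, reverse=top)
    ranks = []
    for i, value in enumerate(ordered):
        if i > 0 and value == ordered[i - 1]:
            ranks.append(ranks[-1])  # same value, same rank index
        else:
            ranks.append(i + 1)      # position-based rank, so gaps appear after ties
    return list(zip(ordered, ranks))

ranked = rank_index([9000, 10000, 8000, 10000])
```

Here the two rows with 10000 both receive rank 1, and 9000 receives rank 3 rather than 2.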
Ans:
Ans: During a session, the Integration Service compares an input row with rows in the data cache. If the input row out-ranks a cached row, the Integration Service replaces the cached row with the input row. If we configure the Rank transformation to rank based on different groups, the Integration Service ranks incrementally for each group it finds. The Integration Service creates an index cache to store the group information and a data cache for the row data.
Ans: Rank transformation can return the strings at the top or the bottom of a session sort order. When the Integration Service runs in Unicode mode, it sorts character data in the session using the selected sort order associated with the Code Page of IS which may be French, German, etc. When the Integration Service runs in ASCII mode, it ignores this setting and uses a binary sort order to sort character data.
Ans: Sorter Transformation is an Active, Connected Informatica transformation used to sort data in ascending or descending order according to specified sort keys. The Sorter transformation contains only input/output ports.
Ans: When the Sorter transformation is configured to treat output rows as distinct, it assigns all ports as part of the sort key. The Integration Service discards duplicate rows compared during the sort operation. The number of Input Rows will vary as compared with the Output rows and hence it is an Active transformation.
Ans: The Case Sensitive property determines whether the Integration Service considers case when sorting data. When we enable the Case Sensitive property, the Integration Service sorts uppercase characters higher than lowercase characters.
Ans: We can configure the way the Sorter transformation treats null values. Enable the property Null Treated Low if we want to treat null values as lower than any other value when it performs the sort operation. Disable this option if we want the Integration Service to treat null values as higher than any other value.
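The effect of the Null Treated Low property on an ascending sort can be sketched as follows, with `None` standing for NULL. The function `sort_with_nulls` is a hypothetical helper.

```python
def sort_with_nulls(values, null_treated_low=True):
    """Ascending sort where nulls go first (treated low) or last (treated high)."""
    non_null = sorted(v for v in values if v is not None)
    nulls = [v for v in values if v is None]
    return nulls + non_null if null_treated_low else non_null + nulls

data = [3, None, 1, 2]
nulls_low = sort_with_nulls(data, null_treated_low=True)
nulls_high = sort_with_nulls(data, null_treated_low=False)
```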
Ans: The Integration Service passes all incoming data into the Sorter Cache before Sorter transformation performs the sort operation.
The Integration Service uses the Sorter Cache Size property to determine the maximum amount of memory it can allocate to perform the sort operation. If it cannot allocate enough memory, the Integration Service fails the session. For best performance, configure Sorter cache size with a value less than or equal to the amount of available physical RAM on the Integration Service machine.
If the amount of incoming data is greater than the amount of Sorter cache size, the Integration Service temporarily stores data in the Sorter transformation work directory. The Integration Service requires disk space of at least twice the amount of incoming data when storing data in the work directory.
Ans: The Union transformation is an Active, Connected, non-blocking, multiple input group transformation used to merge data from multiple pipelines or sources into one pipeline branch. Similar to the UNION ALL SQL statement, the Union transformation does not remove duplicate rows.
Ans:
Ans: Lookups are cached by default in Informatica. Lookup cache can be either non-persistent or persistent. The Integration Service saves or deletes lookup cache files after a successful session run based on whether the Lookup cache is checked as persistent or not.
Ans: Any Informatica transformation created in the Transformation Developer, or a non-reusable transformation promoted to reusable from the Mapping Designer, that can be used in multiple mappings is known as a Reusable Transformation. When we add a reusable transformation to a mapping, we actually add an instance of the transformation. Since the instance of a reusable transformation is a pointer to that transformation, when we change the transformation in the Transformation Developer, its instances reflect these changes.
A Mapplet is a reusable object created in the Mapplet Designer which contains a set of transformations and lets us reuse the transformation logic in multiple mappings. A Mapplet can contain as many transformations as we need. As with a reusable transformation, when we use a mapplet in a mapping, we use an instance of the mapplet, and any change made to the mapplet is inherited by all instances of the mapplet.
Ans: Normalizer, Cobol sources, XML sources, XML Source Qualifier transformations, Target definitions, Pre- and post- session Stored Procedures, Other Mapplets.
Ans:
PMERR_TRANS: Stores metadata about the source and transformation ports, such as name and datatype, when a transformation error occurs.
Ans: When we issue the STOP command on the executing session task, the Integration Service stops reading data from source. It continues processing, writing and committing the data to targets. If the Integration Service cannot finish processing and committing data, we can issue the abort command.
In contrast, the ABORT command has a timeout period of 60 seconds. If the Integration Service cannot finish processing and committing data within the timeout period, it kills the DTM process and terminates the session.
Ans: Yes, we can copy a session to a new folder or repository, provided the corresponding mapping is already there.
Ans: A Lookup is similar to an SQL LEFT OUTER JOIN.
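The comparison with a left outer join can be illustrated in Python: every input row is kept, and rows without a match get a null (`None`) in the returned port. The function `lookup` and the sample rows are hypothetical, and the sketch assumes at most one lookup row per key.

```python
def lookup(rows, lookup_rows, key, return_port):
    """Keep every input row; add the looked-up value, or None when nothing matches."""
    cache = {r[key]: r[return_port] for r in lookup_rows}  # assumes unique keys
    return [{**row, return_port: cache.get(row[key])} for row in rows]

employees = [
    {"ENAME": "ANN", "DEPTNO": 10},
    {"ENAME": "BOB", "DEPTNO": 30},
]
departments = [{"DEPTNO": 10, "DNAME": "SALES"}, {"DEPTNO": 20, "DNAME": "HR"}]

looked_up = lookup(employees, departments, "DEPTNO", "DNAME")
```

BOB has no matching department, yet the row still flows through with DNAME set to `None`, just as the left side of a left outer join is always preserved.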