Ans:
Criteria | Pentaho | Tableau |
Functionality | ETL, OLAP, & static Reports | Data analytics |
Availability | Open source | Proprietary |
Strengths | Data Integration | Interactive visualizations |
Ans: Revered as one of the most efficient and resourceful data integration tools (DI), Pentaho virtually supports all available data sources and allows scalable data clustering and data mining. It is a light-weight Business Intelligence suite executing Online Analytical Processing (OLAP) services, ETL functions, reports and dashboards creation and other data-analysis and visualization operations.
Ans:
Ans: A metadata model in Pentaho formulates the physical structure of your database into a logical business model. These mappings are stored in a central repository and allow developers and administrators to build business-logical DB tables that are cost effective and optimized. It further simplifies the working of business users allowing them to create formatted reports and dashboards ensuring security to data access.
All in all, metadata model provides an encapsulation around the physical definitions of your database and the logical representation and define relationships between them.
Ans: Pentaho Reporting Evaluation is a particular package of a subset of the Pentaho Reporting capabilities, designed for typical first-phase evaluation activities such as accessing sample data, creating and editing reports, and viewing and interacting with reports.
Ans: MDX is an acronym for ‘Multi-Dimensional Expressions,’ the standard query language introduced by Microsoft SQL OLAP Services. MDX is an imperative part of XML for analysis API, which has a different structure than SQL. A basic MDX query is:
SELECT {[Quantity].[Unit Sales], [Quantity].[Store Sales]} ON COLUMNS,
{[Product].members} ON ROWS
FROM [Sales]
WHERE [Time].[1999].[Q2]
Ans: While transformations refer to shifting and transforming rows from source system to target system, jobs perform high level operations like implementing transformations, file transfer via FTP, sending mails, etc.
Another significant difference is that the transformation allows parallel execution whereas jobs implement steps in order.
Ans: PDI supports joining of two tables form the same databse using a ‘Table Input’ method, performing the join in SQL only.
On the other hand, for joining two tables in different databases, users implement ‘Database Join’ step. However, in database join, each input row query executes on the target system from the main stream, resulting in lower performance as the number of queries implement on the B increases.
To avoid the above situation, there is yet another option to join rows form two different Table Input steps. You can use ‘Merge Join ‘step, using the SQL query having ‘ORDER BY’ clause. Remember, the rows must be perfectly sorted before implementing merge join.
Ans: Since PDI transformations support parallel execution of all the steps/operations, it is impossible to sequentialize transformations in Pentaho. Moreover, to make this happen, users need to change the core architecture, which will actually result in slow processing.
Ans: Pentaho Reporting evaluation is a complete package of its reporting abilities, activities and tools, specifically designed for first-phase evaluation like accessing the sample, generating and updating reports, viewing them and performing various interactions. This evaluation consists of Pentaho platform components, Report Designer and ad hoc interface for reporting used for local installation.
Ans: No, Pentaho doesn’t allow field duplication.
Ans: “Select Values” will rename a field as you select the original field also. The original field will have a duplicate name of the other field now.
Ans: You can either create a new transformation/job or close and reopen the ones already loaded in Spoon.
Ans: Dashboards are the collection of various information objects on single page including diagrams, tables and textual information. The Pentaho AJAX API is used to extract BI information while Pentaho Solution Repository contains the content definitions.
Ans: Transformation logic can be shared using subtransformations, which provides seamless loading and transformation of variables enhancing efficiency and productivity of the system. Subtransformations can be called and reconfigured when required.
Ans: Pentaho reporting enables businesses to create structured and informative reports to easily access, format and deliver meaningful and important information to clients and customers. They also help business users to analyze and track consumer’s behavior for the specific time and functionality, thereby directing them towards the right success path.
Ans: Pentaho Data Mining refers to the Weka Project, which consists of a detailed tool set for machine learning and data mining. Weka is open source software for extracting large sers of information about users, clients and businesses. It is built on Java programming.
Ans: No. Data Integration refers to passing of data from one type of systems to other within the same application. On the contrary, ETL is used to extract and access data from different sources. And transform it into other objects and tables.
Ans: It is just the construction of parent child relationships in a database. Hierarchy Flattening uses both horizontal and vertical formats, which enables easy and trouble-free identification of sub elements. It further allows users to understand and read the main hierarchy of BI and includes Parent column, Child Column, Parent attributes and Child attributes.
Ans: PRD is a graphic tool to execute report-editing functions and create simple and advanced reports and help users export them in PDF, Excel, HTML and CSV files. PRD consists of Java-based report engine offering data integration, portability and scalability. Thus, it can be embedded in Java web applications and also other application servers like Pentaho BAserver.
Ans: Transformations dialog box consists of two different tables: one of arguments and the other of variables. While arguments refer to command line specified during batch processing, PDI variables refer to objects that are set in a previous transformation/job in the OS.
Ans: Pentaho offers JNDI connection configuration for local DI to avoid continuous running of application server during the development and testing of transformations. Edit the properties in jdbc.propertiesfile located at…\data-integration-server\pentaho-solutions\system\simple-jndi.
Ans: Yes, Pentaho is a trademark.
Ans: Multidimensional Expressions (MDX) is a query language for OLAP databases, much like SQL is a query language for relational databases. It is also a calculation language, with syntax similar to spreadsheet formulas.
Ans: Finite ordered list of elements is called as tuple.
Ans: The Cube will contain the following data:
Ans: Transformations is moving and transforming rows from source to target.
Jobs are more about high level flow control.
Ans: If we want to join 2 tables from the same database, we can use a “Table Input” step and do the join in SQL itself.
If we want to join 2 tables that are not in the same database. We can use the the “Database Join”.
Ans: it is not possible as in PDI transformations all of the steps run in parallel. So we can’t sequentialize them.
Ans: We can Create a new conversion or close and re-open the ones we have loaded in Spoon.
Ans: BIT is not a standard SQL data type. It’s not even standard on MySQL as the meaning (core definition) changed from MySQL version 4 to 5.
Also a BIT uses 2 bytes on MySQL. That’s why in PDI we made the safe choice and went for a char(1) to store a boolean. There is a simple workaround available: change the data type with a Select Values step to “Integer” in the metadata tab. This converts it to 1 for “true” and 0 for “false”, just like MySQL expects.
Ans: This is not possible as in PDI transformations all the steps run in parallel. So we can’t sequentialize them. This would require architectural changes to PDI and sequential processing also result in very slow processing.
Ans: we can’t. if we have duplicate fieldnames. Before PDI v2.5.0 we were able to force duplicate fields, but also only the first value of the duplicate fields could ever be used.
Ans:
Ans:
Arguments:
Arguments are command line arguments that we would normally specify during batch processing.
variables:
Variables are environment or PDI variables that we would normally set in a previous transformation in a job.
Ans:
i)Suite Pentaho
ii)All build under Java platform
Ans: Pentaho Schema Workbench offers a graphical edge for designing OLAP cubes for Pentaho Analysis.
Ans: It is a visual, banded report writer. It has various features lilke using subreports, charts and graphs etc.
Ans: It is the technology which enables files to be transparently encrypted to secure personal data from attackers with physical access to the computer.
Ans: The metadata stored in the repository by associating information with individual objects in the repository.
Ans: Snapshots are read-only copies of a master table located on a remote node which can be periodically refreshed to reflect changes made to the master table.
Ans: Data staging is actually a group of procedures used to prepare source system data for loading a data warehouse.
Ans: Full Load means completely erasing the insides of one or more tables and filling with fresh data.
Incremental Load means applying ongoing changes to one or more tables based on a predefined schedule.
Ans: Dataflow from source to target is called as mapping.
Ans: It is a set of instruction which tell when and how to move data from respective source to target.
Ans: It is a set of instruction which tell the infomatica server how to execute the task.
Ans: It creates and configure the set of transformation.
Ans: A data warehouse is said to be a three-tier system where a middle system provides usable data in a secure way to end users. Both side of this middle system are the end users and the back-end data stores.
Ans: ODS is Operational Data Store which comes in between of data warehouse and staging area.
Ans: ETL Tool is used for extracting data from the legecy system and load it into specified database with some processing of cleansing data.
OLAP Tool is used for reporting process . Here data is available in multidimensional model hence we can write simple query to extract data from database.
Ans: XML is an extensiable markup language which defines a set of rule for encoding documents in both formats which is human readable and machine readable.
Ans: Informatica Powercenter 4.1, Informatica Powercenter 5.1, Powercenter Informatica 6.1.2, Informatica Powercenter 7.1.2, etc.
Ans: Abinitio,DataStage, Informatica, Cognos Decision Stream, etc
Ans: MDX is multi- dimensional expression which is a main query language implemented by the Mondrains.
Ans: It is a cube to view data where we can slice and dice the data. It have time dimension, locations and figures.
Ans: Several solutions exist:
Use a “Select Values” step renaming a field while selecting also the original one. The result will be that the original field will be duplicated to another name. It will look as follows:
This will duplicate fieldA to fieldB and fieldC.
Use a calculator step and use e.g. The NLV(A,B) operation as follows:
This will have the same effect as the first solution: 3 fields in the output which are copies of each other: fieldA, fieldB, and fieldC.
Use a JavaScript step to copy the field:
This will have the same effect as the previous solutions: 3 fields in the output which are copies of each other: fieldA, fieldB, and fieldC.
Ans: If you look in the PDI main directory you will see a sub-directory “simple-jndi”, which contains a file called “jdbc.properties”. You should change this file so that the JNDI information matches the one you use in your application server.
After that you set in the connection tab of Spoon the “Method of access” to JNDI, the “Connection type” to the type of database you’re using. And “Connection name” to the name of the JDNI datasource (as used in “jdbc.properties”).
Ans: The catch is to specifically restrict the file list to the files inside the compressed collection. Some examples:
You have a file with the following structure:
access.logs.tar.gz
access.log.1
access.log.2
access.log.3
To read each of these files in a File Input step:
File/Directory | Wildcard |
tar:gz:/path/to/access.logs.tar.gz!/access.logs.tar! | .+ |
Note: If you only wanted certain files in the tarball, you could certainly use a wildcard like access.log..* or something. .+ is the magic if you don’t want to specify the children filenames. .* will not work because it will include the folder (i.e. tar:gz:/path/to/access.logs.tar.gz!/access.logs.tar!/ )
You have a simpler file, fat-access.log.gz. You could use the Compression option of the File Input step to deal with this simple case, but if you wanted to use VFS instead, you would use the following specification:
File/Directory | Wildcard |
gz:file://c:/path/to/fat-access.log.gz! | .+ |
Finally, if you have a zip file with the following structure:
access.logs.zip/
a-root-access.log
subdirectory1/
subdirectory-access.log.1
subdirectory-access.log.2
subdirectory2/
subdirectory-access.log.1
subdirectory-access.log.2
You might want to access all the files, in which case you’d use:
File/Directory | Wildcard |
zip:file://c:/path/to/access.logs.zip! | a-root-access.log |
zip:file://c:/path/to/access.logs.zip!/subdirectory1 | subdirectory-access.log.* |
zip:file://c:/path/to/access.logs.zip!/subdirectory2 | subdirectory-access.log.* |
Note: For some reason, the .+ doesn’t work in the subdirectories, they still show the directory entries. :
Click Here to know more details about Pentaho BI.