Ans: ETL stands for Extract, Transform, Load, the process by which data is moved from a source system into the data warehouse. Data is extracted from an OLTP database, transformed to match the data warehouse schema, and loaded into the data warehouse database.
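The three stages can be sketched in Python with the standard-library `sqlite3` module. This is a minimal illustration under stated assumptions, not how any particular ETL tool works: the `orders` and `fact_orders` tables, their columns, and the cents-to-currency transformation are hypothetical, and two in-memory SQLite databases stand in for the OLTP source and the warehouse.

```python
import sqlite3


def extract(oltp: sqlite3.Connection) -> list:
    """Extract: read raw rows from the hypothetical OLTP 'orders' table."""
    return oltp.execute(
        "SELECT order_id, customer, amount_cents FROM orders ORDER BY order_id"
    ).fetchall()


def transform(rows: list) -> list:
    """Transform: clean customer names and convert cents to currency units
    so the rows match the hypothetical warehouse schema."""
    return [(oid, cust.strip().upper(), cents / 100) for oid, cust, cents in rows]


def load(dw: sqlite3.Connection, rows: list) -> None:
    """Load: write the transformed rows into the hypothetical warehouse fact table."""
    dw.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    dw.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    dw.commit()


def run_etl(oltp: sqlite3.Connection, dw: sqlite3.Connection) -> int:
    """Run extract -> transform -> load and return the number of rows loaded."""
    rows = transform(extract(oltp))
    load(dw, rows)
    return len(rows)


if __name__ == "__main__":
    oltp = sqlite3.connect(":memory:")  # stands in for the OLTP source
    oltp.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount_cents INTEGER)")
    oltp.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(1, " alice ", 1250), (2, "bob", 999)])
    dw = sqlite3.connect(":memory:")  # stands in for the data warehouse
    print(run_etl(oltp, dw))  # 2
    print(dw.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall())
    # [(1, 'ALICE', 12.5), (2, 'BOB', 9.99)]
```

Keeping each stage in its own function mirrors the E, T and L steps and makes each one testable in isolation.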
Ans: Data has become a critical part of all kinds of businesses and operations. Because data is so important to a successful business, poor performance or inaccurate processing can cost time and money. ETL testing is therefore designed to ensure that data processing is done in the expected way, so that the business/enterprise can benefit from it.
Ans:
Ans: Businesses use ETL systems to integrate data from multiple sources. These software systems are key to making sure your company processes its data efficiently, so that your business can run smoothly and without interruption.
Ans:
Ans: Cubes are data processing units made up of fact tables and dimensions from the data warehouse. They provide multidimensional analysis.
OLAP stands for Online Analytical Processing, and an OLAP cube stores large amounts of data in multidimensional form for reporting purposes. It consists of facts, called measures, categorized by dimensions.
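A cube can be thought of as measures pre-aggregated for every combination of dimension values. The sketch below assumes a tiny in-memory fact list with hypothetical `region` and `year` dimensions and a `sales` measure, and uses a hypothetical `ALL` marker to mean "rolled up over every value of this dimension". Real OLAP servers use far more sophisticated storage; this only illustrates rolling a measure up along dimensions.

```python
from collections import defaultdict
from itertools import product

ALL = "ALL"  # hypothetical marker: "rolled up over every value of this dimension"


def build_cube(facts: list, dims: list, measure: str) -> dict:
    """Sum `measure` for every combination of dimension values and ALL roll-ups."""
    cube = defaultdict(float)
    for row in facts:
        # For each dimension, a row counts toward its own value and toward ALL.
        choices = [(row[d], ALL) for d in dims]
        for cell in product(*choices):
            cube[cell] += row[measure]
    return dict(cube)


if __name__ == "__main__":
    facts = [
        {"region": "East", "year": 2023, "sales": 100.0},
        {"region": "East", "year": 2024, "sales": 50.0},
        {"region": "West", "year": 2024, "sales": 70.0},
    ]
    cube = build_cube(facts, ["region", "year"], "sales")
    print(cube[("East", ALL)])  # 150.0 (East, all years)
    print(cube[(ALL, 2024)])    # 120.0 (all regions, 2024)
    print(cube[(ALL, ALL)])     # 220.0 (grand total)
```

Answering "sales for East across all years" then becomes a lookup instead of a scan, which is the point of pre-aggregating.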
Ans: Requires in-depth knowledge of ETL tools and processes.
Needs to write SQL queries for the various given scenarios during the testing phase.
Test components of the ETL data warehouse.
Execute backend data-driven tests.
Create, design and execute test cases, test plans and test harnesses.
Identify problems and provide solutions for potential issues.
Approve requirements and design specifications.
Test data transfers and flat files.
Write SQL queries for scenarios such as the count test.
Be able to carry out different types of tests, such as primary key and default-value tests, and keep a check on the other functionality of the ETL process.
Perform quality checks.
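The count and primary key tests mentioned above are usually plain SQL. A minimal sketch with Python's `sqlite3`, assuming hypothetical source and target tables in the same database. Table and column names are interpolated into the SQL text because identifiers cannot be bound as parameters, so they must come from trusted test configuration, never from user input.

```python
import sqlite3


def count_test(conn: sqlite3.Connection, source: str, target: str) -> tuple:
    """Count test: source and target tables should hold the same number of rows.
    Table names are trusted identifiers (they cannot be bound as parameters)."""
    src = conn.execute(f"SELECT COUNT(*) FROM {source}").fetchone()[0]
    tgt = conn.execute(f"SELECT COUNT(*) FROM {target}").fetchone()[0]
    return src == tgt, src, tgt


def primary_key_test(conn: sqlite3.Connection, table: str, key: str) -> list:
    """Primary key test: return (key, occurrences) for NULL or duplicated keys."""
    return conn.execute(
        f"SELECT {key}, COUNT(*) FROM {table} "
        f"GROUP BY {key} HAVING {key} IS NULL OR COUNT(*) > 1"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src_customer (id INTEGER, name TEXT)")  # hypothetical
    conn.execute("CREATE TABLE tgt_customer (id INTEGER, name TEXT)")  # hypothetical
    conn.executemany("INSERT INTO src_customer VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
    # The target received customer 2 twice and lost customer 3.
    conn.executemany("INSERT INTO tgt_customer VALUES (?, ?)", [(1, "a"), (2, "b"), (2, "b")])
    print(count_test(conn, "src_customer", "tgt_customer"))  # (True, 3, 3)
    print(primary_key_test(conn, "tgt_customer", "id"))      # [(2, 2)]
```

The example shows why one test is not enough: the row counts match even though one customer was lost and another duplicated, and only the primary key test catches it.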
Ans:
Ans: An ETL tool is meant for extracting data from legacy systems, cleansing it, and loading it into a specified database.
e.g. Informatica, DataStage, etc.
An OLAP tool is meant for reporting. In OLAP, data is available in a multidimensional model, so you can write simple queries to extract data from the database.
e.g. BusinessObjects, Cognos, etc.
Ans: A fact table without measures is known as a factless fact table. It records the occurrence of events, which can then be counted. For example, it can record employee events, from which a figure such as the employee count in a company is derived.
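Because a factless fact table holds only dimension keys, its "measure" is derived at query time by counting rows. A minimal sketch with `sqlite3`; the `fact_employee_assignment` table and its date, employee and department keys are hypothetical.

```python
import sqlite3


def create_factless_table(conn: sqlite3.Connection) -> None:
    # Hypothetical factless fact table: only foreign keys to date, employee and
    # department dimensions, with no numeric measure column at all.
    conn.execute(
        "CREATE TABLE fact_employee_assignment ("
        "date_key INTEGER, employee_key INTEGER, department_key INTEGER)"
    )


def employee_count_by_department(conn: sqlite3.Connection, date_key: int) -> list:
    # The count is derived by counting event rows, not by summing a stored measure.
    return conn.execute(
        "SELECT department_key, COUNT(*) FROM fact_employee_assignment "
        "WHERE date_key = ? GROUP BY department_key ORDER BY department_key",
        (date_key,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_factless_table(conn)
    conn.executemany(
        "INSERT INTO fact_employee_assignment VALUES (?, ?, ?)",
        [(20240101, 1, 10), (20240101, 2, 10), (20240101, 3, 20)],
    )
    print(employee_count_by_department(conn, 20240101))  # [(10, 2), (20, 1)]
```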
Ans:
ODS: Operational Data Store.
ODS: Sits between the staging area and the data warehouse. Data in the ODS is kept at a low level of granularity.
Once data is populated in the ODS, aggregated data is loaded from the ODS into the EDW (Enterprise Data Warehouse).
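The ODS-to-EDW step can be sketched as an aggregating `INSERT ... SELECT`. This is a minimal illustration assuming hypothetical `ods_sales` and `edw_daily_sales` tables; a single in-memory SQLite database stands in for both the ODS and the EDW, which in practice are usually separate systems.

```python
import sqlite3


def load_edw_from_ods(conn: sqlite3.Connection) -> None:
    """Aggregate transaction-level ODS rows into a daily, per-store EDW table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS edw_daily_sales "
        "(sale_date TEXT, store_id INTEGER, total_amount REAL, txn_count INTEGER)"
    )
    # One EDW row per (day, store): a coarser grain than the per-transaction ODS rows.
    conn.execute(
        "INSERT INTO edw_daily_sales "
        "SELECT sale_date, store_id, SUM(amount), COUNT(*) "
        "FROM ods_sales GROUP BY sale_date, store_id"
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ods_sales (txn_id INTEGER, sale_date TEXT, store_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO ods_sales VALUES (?, ?, ?, ?)",
        [(1, "2024-01-01", 1, 10.0), (2, "2024-01-01", 1, 5.0), (3, "2024-01-01", 2, 7.5)],
    )
    load_edw_from_ods(conn)
    print(conn.execute("SELECT * FROM edw_daily_sales ORDER BY store_id").fetchall())
    # [('2024-01-01', 1, 15.0, 2), ('2024-01-01', 2, 7.5, 1)]
```

This is a one-shot full load: running it twice would insert the aggregates twice, so a real job would load incrementally or clear the target period first.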
Ans: Data staging is actually a collection of processes used to prepare source system data for loading a data warehouse. Staging includes the following steps:
Ans: Most data warehouses are considered to be three-tier systems, and this layering is essential to their structure. The first layer is where the data lands: the collection point where data from outside sources is compiled. The second layer, known as the ‘integration layer,’ is where the stored data is transformed to meet company needs. The third layer, called the ‘dimension layer,’ is where the transformed information is stored for internal use.
Ans: Data warehousing comes before the mining process. Warehousing is the act of gathering data from various external sources and organizing it in one specific location: the warehouse. Data mining is when that data is analyzed and used as information for making decisions.
Ans: Partitioning is when an area of data storage is subdivided to improve performance. Think of it as an organizational tool. If all your collected data sits in one large space without organization, the tools used to analyze it will have a harder time finding the information they need. Partitioning your warehouse creates an organizational structure that makes locating and analyzing data easier and faster.
Two types of partitioning are round-robin partitioning and hash partitioning. In round-robin partitioning, data is distributed evenly among all partitions, so each partition holds roughly the same number of rows. In hash partitioning, the server applies a hash function to the partition key to decide which partition each row goes to, so rows with the same key are grouped together.
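Both strategies can be sketched in a few lines of Python. This only shows how rows are assigned to partitions, not how any database engine implements partitioning. `zlib.crc32` is an arbitrary choice of hash function here, picked because Python's built-in `hash()` for strings is randomized per process, which would make the assignment unstable between runs.

```python
import zlib
from typing import Callable


def round_robin_partition(rows: list, n: int) -> list:
    """Deal rows out to n partitions in turn, so partition sizes differ by at most one."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def hash_partition(rows: list, n: int, key: Callable) -> list:
    """Send each row to the partition chosen by hashing its partition key,
    so all rows with the same key land in the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        h = zlib.crc32(str(key(row)).encode("utf-8"))  # stable across runs
        parts[h % n].append(row)
    return parts


if __name__ == "__main__":
    print([len(p) for p in round_robin_partition(list(range(10)), 3)])  # [4, 3, 3]
    orders = [("east", 1), ("west", 2), ("east", 3), ("north", 4)]
    for i, part in enumerate(hash_partition(orders, 2, key=lambda r: r[0])):
        print(i, part)
```

The trade-off is visible in the code: round-robin guarantees balanced sizes but scatters related rows, while hash partitioning keeps rows with the same key together but can be uneven when some keys are far more common than others.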
Ans: The ETL process allows a business/enterprise to collect important data from different source systems, validate/change it to fit its goals and models, and then store it in a data warehouse for analytics, forecasts and other kinds of reports for daily use. In a world of digital enterprise, it is a critical part of running an effective and efficient business.
Ans: ETL testing includes the following:
Verify whether the data is transformed correctly according to business requirements.
Verify that the projected data is loaded into the data warehouse without any truncation or data loss.
Make sure that the ETL application reports invalid data and replaces it with default values.
Make sure that data loads within the expected time frame, to confirm scalability and performance.
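Two of these checks can be sketched in plain Python: replacing invalid data with default values while reporting it, and detecting rows that were lost or truncated during the load. The validation rules, default values, and column names below are hypothetical; in practice the rules come from the business requirements, and the comparison usually runs as SQL against the source and target databases.

```python
def apply_defaults(rows: list, rules: dict) -> tuple:
    """Replace invalid values with defaults and report every replacement.
    `rules` maps column -> (is_valid, default_value); both are hypothetical."""
    cleaned, report = [], []
    for i, row in enumerate(rows):
        fixed = dict(row)
        for col, (is_valid, default) in rules.items():
            value = row.get(col)
            if not is_valid(value):
                fixed[col] = default
                report.append((i, col, value))  # row index, column, rejected value
        cleaned.append(fixed)
    return cleaned, report


def find_load_problems(source: list, target: list, key: str, col: str) -> list:
    """Compare source and target rows by key and flag rows that are missing,
    truncated (target value is a shorter prefix of the source), or otherwise altered."""
    target_by_key = {row[key]: row for row in target}
    problems = []
    for row in source:
        loaded = target_by_key.get(row[key])
        if loaded is None:
            problems.append((row[key], "missing"))
        elif loaded[col] != row[col]:
            src, tgt = str(row[col]), str(loaded[col])
            kind = "truncated" if len(tgt) < len(src) and src.startswith(tgt) else "altered"
            problems.append((row[key], kind))
    return problems


if __name__ == "__main__":
    rules = {
        "age": (lambda v: isinstance(v, int) and 0 <= v <= 130, -1),
        "country": (lambda v: bool(v), "UNKNOWN"),
    }
    rows = [{"id": 1, "age": 34, "country": "DE"}, {"id": 2, "age": -5, "country": ""}]
    cleaned, report = apply_defaults(rows, rules)
    print(report)  # [(1, 'age', -5), (1, 'country', '')]
    source = [{"id": 1, "name": "Alexandria"}, {"id": 2, "name": "Bo"}]
    target = [{"id": 1, "name": "Alexandr"}]
    print(find_load_problems(source, target, "id", "name"))  # [(1, 'truncated'), (2, 'missing')]
```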
Ans: Tracing level is the amount of data stored in the log files. Two tracing levels commonly compared are Normal and Verbose. Normal logs summary information, while Verbose logs details for each and every row.
Ans: The grain of a fact can be defined as the level of detail at which fact information is stored, for example one row per sales transaction or one row per store per day. It is also known as fact granularity.
Ans: Database testing involves different steps compared to data warehouse testing:
Ans: To verify that data transferred from one system to another follows the pattern/manner described by the business (requirements).
Ans:
Views:
Materialized View log:
Ans:
Round-Robin Partitioning:
Hash Partitioning:
Ans: Connected Lookup:
Unconnected Lookup:
Ans: The following are the steps to fine-tune mappings: