The purpose of Metadata Testing is to verify that the table definitions conform to the data model and application design specifications.

Data Type Check: Verify that the table and column data type definitions are as per the data model design specifications.
Data Length Check: Verify that the lengths of database columns are as per the data model design specifications.
Index / Constraint Check: Verify that proper constraints and indexes are defined on the database tables as per the design specifications.
Metadata Naming Standards Check: Verify that the names of database metadata such as tables, columns, and indexes are as per the naming standards.
Metadata Check Across Environments: Compare table and column metadata across environments to ensure that changes have been migrated appropriately.
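Many of these checks can be scripted against the database catalog. The query below is a minimal sketch that compares column data types and lengths between two environments; the schema names DEV and QA are assumptions, and it presumes a database that exposes INFORMATION_SCHEMA and supports FULL OUTER JOIN.

    -- Columns whose data type or length differs between the DEV and QA schemas,
    -- or which exist in only one of the two environments.
    SELECT COALESCE(d.TABLE_NAME, q.TABLE_NAME)   AS TABLE_NAME,
           COALESCE(d.COLUMN_NAME, q.COLUMN_NAME) AS COLUMN_NAME,
           d.DATA_TYPE                 AS DEV_TYPE,
           q.DATA_TYPE                 AS QA_TYPE,
           d.CHARACTER_MAXIMUM_LENGTH  AS DEV_LENGTH,
           q.CHARACTER_MAXIMUM_LENGTH  AS QA_LENGTH
    FROM (SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'DEV') d
    FULL OUTER JOIN
         (SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'QA') q
      ON  q.TABLE_NAME  = d.TABLE_NAME
      AND q.COLUMN_NAME = d.COLUMN_NAME
    WHERE d.COLUMN_NAME IS NULL
       OR q.COLUMN_NAME IS NULL
       OR d.DATA_TYPE <> q.DATA_TYPE
       OR COALESCE(d.CHARACTER_MAXIMUM_LENGTH, -1) <> COALESCE(q.CHARACTER_MAXIMUM_LENGTH, -1);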
Automate metadata testing with ETL Validator: ETL Validator comes with a Metadata Compare Wizard for automatically capturing and comparing table metadata.
The purpose of Data Completeness tests is to verify that all the expected data is loaded into the target from the source. Some of the tests that can be run are: compare and validate counts, aggregates (min, max, sum, avg), and actual data between the source and target.

Record Count Validation: Compare the record counts of the primary source table and the target table, and check for any rejected records.
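As a minimal sketch, counts and a few aggregates can be compared with a query like the one below; SRC.ORDERS, TGT.ORDERS, and the AMOUNT column are hypothetical names, and the query assumes both tables are reachable from one connection (otherwise run each half separately and compare the results).

    -- Record counts and simple aggregates, side by side for source and target.
    SELECT 'SOURCE' AS SIDE,
           COUNT(*)    AS ROW_COUNT,
           MIN(AMOUNT) AS MIN_AMOUNT,
           MAX(AMOUNT) AS MAX_AMOUNT,
           SUM(AMOUNT) AS SUM_AMOUNT
    FROM SRC.ORDERS
    UNION ALL
    SELECT 'TARGET', COUNT(*), MIN(AMOUNT), MAX(AMOUNT), SUM(AMOUNT)
    FROM TGT.ORDERS;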
Column Data Profile Validation: Column or attribute level data profiling is an effective tool to compare source and target data without actually comparing the entire data set. It is similar to comparing the checksum of your source and target data. These tests are essential when testing large amounts of data. Common data profile comparisons between the source and target include distinct value counts, null value counts, and aggregates such as min, max, sum, and avg for each column.
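A sketch of such a column profile is shown below; CUSTOMER and EMAIL are hypothetical table and column names, and LENGTH may be spelled LEN on some databases. The same query is run against the source and the target, and the single result row from each side is compared.

    -- Column-level profile: run against source and target, then compare the outputs.
    SELECT COUNT(*)              AS ROW_COUNT,
           COUNT(DISTINCT EMAIL) AS DISTINCT_EMAILS,
           SUM(CASE WHEN EMAIL IS NULL THEN 1 ELSE 0 END) AS NULL_EMAILS,
           MIN(LENGTH(EMAIL))    AS MIN_LENGTH,
           MAX(LENGTH(EMAIL))    AS MAX_LENGTH
    FROM CUSTOMER;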
Compare Entire Source and Target Data: Compare data (values) between the source (flat file or table) and the target, effectively validating 100% of the data. In regulated industries such as finance and pharmaceuticals, 100% data validation might be a compliance requirement. It is also a key requirement for data migration projects. However, performing 100% data validation is a challenge when large volumes of data are involved. This is where ETL testing tools such as ETL Validator can be used, because they have an inbuilt ELV (Extract, Load, Validate) engine capable of comparing large volumes of data.
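For volumes that fit comfortably in one database, a full compare can be sketched with set operators; SRC.ORDERS and TGT.ORDERS are hypothetical tables with identical column lists, and EXCEPT may be spelled MINUS depending on the database.

    -- Rows present in the source but not in the target; run the mirror query
    -- (target EXCEPT source) to catch extra rows on the target side.
    SELECT ORDER_ID, CUSTOMER_ID, ORDER_DATE, AMOUNT FROM SRC.ORDERS
    EXCEPT
    SELECT ORDER_ID, CUSTOMER_ID, ORDER_DATE, AMOUNT FROM TGT.ORDERS;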
Automate Data Completeness Testing using ETL Validator: ETL Validator comes with a Data Profile Test Case, Component Test Case, and Query Compare Test Case for automating the comparison of source and target data.
The purpose of Data Quality tests is to verify the accuracy of the data. Data profiling is used to identify data quality issues, and the ETL is designed to fix or handle these issues. However, source data keeps changing, and new data quality issues may be discovered even after the ETL is in production. Automating the data quality checks in the source and target systems is an important aspect of ETL execution and testing.

Duplicate Data Checks: Look for duplicate rows with the same unique key column or a unique combination of columns, as per business requirements.
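A minimal duplicate check can be expressed as a grouped count; TGT.CUSTOMER and CUSTOMER_ID are hypothetical names for the target table and its unique business key.

    -- Business keys that appear more than once in the target indicate duplicates.
    SELECT CUSTOMER_ID, COUNT(*) AS OCCURRENCES
    FROM TGT.CUSTOMER
    GROUP BY CUSTOMER_ID
    HAVING COUNT(*) > 1;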
Data Validation Rules: Many database fields can contain a range of values that cannot be enumerated. However, there are reasonable constraints or rules that can be applied to detect situations where the data is clearly wrong. Instances of fields containing values that violate the defined validation rules represent a quality gap that can impact ETL processing.
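As a sketch, a rule such as "order dates cannot be in the future and amounts must be non-negative" could be checked as below; TGT.ORDERS, ORDER_DATE, and AMOUNT are assumed names, and CURRENT_DATE may need the local equivalent on your database.

    -- Rows violating simple validation rules; any hits represent a quality gap.
    SELECT ORDER_ID, ORDER_DATE, AMOUNT
    FROM TGT.ORDERS
    WHERE ORDER_DATE > CURRENT_DATE
       OR AMOUNT < 0;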
Data Integrity Checks: This measurement addresses “keyed” relationships of entities within a domain. The goal of these checks is to identify orphan records in the child entity, i.e., records whose foreign key has no matching record in the parent entity.
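A sketch of an orphan check, assuming a hypothetical child table TGT.ORDERS whose CUSTOMER_ID should reference the parent table TGT.CUSTOMER:

    -- Child rows whose foreign key has no matching parent row (orphans).
    SELECT o.ORDER_ID, o.CUSTOMER_ID
    FROM TGT.ORDERS o
    LEFT JOIN TGT.CUSTOMER c ON c.CUSTOMER_ID = o.CUSTOMER_ID
    WHERE c.CUSTOMER_ID IS NULL;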
Automate data quality testing using ETL Validator: ETL Validator comes with a Data Rules Test Plan and a Foreign Key Test Plan for automating data quality testing.
Data is transformed during the ETL process so that it can be consumed by applications on the target system. Transformed data is generally important for the target systems, and hence it is important to test transformations. There are two approaches for testing transformations: white box testing and black box testing.

Transformation testing using the White Box approach: White box testing is a testing technique that examines the program structure and derives test data from the program logic / code. For transformation testing, this involves reviewing the transformation logic from the mapping design document and the ETL code to come up with test cases. The steps to be followed are listed below:
1. Review the source-to-target mapping design document and the ETL code to understand the transformation logic.
2. Reimplement the transformation logic on the source data, typically as queries.
3. Compare the results of the reimplemented transformation with the data in the target table.
The advantage of this approach is that the test can be rerun easily on a larger source data set. The disadvantage is that the tester has to reimplement the transformation logic.
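As a minimal sketch of the white box approach, suppose the mapping document specifies that FULL_NAME in the target is derived by concatenating FIRST_NAME and LAST_NAME from the source (all table and column names here are hypothetical, and the || concatenation operator may differ by database). The tester reimplements that rule in a query and diffs it against the loaded data:

    -- Reimplement the documented transformation on the source and compare with the target.
    SELECT s.CUSTOMER_ID,
           s.FIRST_NAME || ' ' || s.LAST_NAME AS EXPECTED_FULL_NAME,
           t.FULL_NAME                        AS ACTUAL_FULL_NAME
    FROM SRC.CUSTOMER s
    JOIN TGT.CUSTOMER_DIM t ON t.CUSTOMER_ID = s.CUSTOMER_ID
    WHERE t.FULL_NAME <> s.FIRST_NAME || ' ' || s.LAST_NAME;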
Transformation testing using the Black Box approach: Black box testing is a method of software testing that examines the functionality of an application without peering into its internal structures or workings. For transformation testing, this involves reviewing the transformation logic from the mapping design document and setting up the test data appropriately. The steps to be followed are listed below:
1. Review the source-to-target mapping design document to understand the transformation scenarios.
2. Set up test data in the source that covers each transformation scenario, and manually derive the expected values for the transformed data.
3. Execute the ETL and compare the actual target data with the expected values.
The advantage of this approach is that the transformation logic does not need to be reimplemented during testing. The disadvantage is that the tester needs to set up test data for each transformation scenario and come up with the expected values for the transformed data manually.
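A sketch of the final comparison step, assuming the manually derived expected rows are staged in a hypothetical QA.EXPECTED_CUSTOMER_DIM table with the same columns as the target:

    -- Expected rows (prepared manually from the test data) that the ETL did not produce.
    SELECT CUSTOMER_ID, FULL_NAME, STATUS FROM QA.EXPECTED_CUSTOMER_DIM
    EXCEPT
    SELECT CUSTOMER_ID, FULL_NAME, STATUS FROM TGT.CUSTOMER_DIM;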
Automate data transformation testing using ETL Validator: ETL Validator comes with a Component Test Case, which can be used to test transformations using either the white box or the black box approach.
The goal of ETL Regression testing is to verify that the ETL produces the same output for a given input before and after a change. Any differences need to be validated to confirm whether they are expected as per the change.

Changes to Metadata: Track changes to table metadata in the source and target environments. Often, changes to source and target system metadata are not communicated to the QA and development teams, resulting in ETL and application failures. This check is important from a regression testing standpoint.
Automated ETL Testing: Automating the ETL testing is key for regression testing of the ETL, particularly in an agile development environment. Organizing test cases into test plans (or test suites) and executing them automatically as and when needed can reduce the time and effort required for regression testing. Automating ETL testing can also eliminate human errors that occur during manual checks.

Regression testing by baselining target data: Often testers need to regression test an existing ETL mapping with a number of transformations. It may not be practical to perform end-to-end transformation testing in such cases given the time and resource constraints. From a pure regression testing standpoint, it might be sufficient to baseline the data in the target table or flat file and compare it with the actual result. Here are the steps:
1. Baseline a copy of the target table (or file) data before the ETL change.
2. Run the modified ETL on the same input data.
3. Compare the new target data with the baseline and confirm that any differences are expected as per the change.
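A minimal sketch of the compare step, assuming the pre-change data was copied into a hypothetical QA.ORDERS_BASELINE table keyed on ORDER_ID, on a database that supports FULL OUTER JOIN:

    -- Rows that were added, removed, or changed relative to the baselined copy.
    SELECT COALESCE(b.ORDER_ID, n.ORDER_ID) AS ORDER_ID,
           b.AMOUNT AS BASELINE_AMOUNT,
           n.AMOUNT AS NEW_AMOUNT
    FROM QA.ORDERS_BASELINE b
    FULL OUTER JOIN TGT.ORDERS n ON n.ORDER_ID = b.ORDER_ID
    WHERE b.ORDER_ID IS NULL              -- new row after the change
       OR n.ORDER_ID IS NULL              -- row missing after the change
       OR b.AMOUNT <> n.AMOUNT;           -- value changed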
Automate ETL regression testing using ETL Validator: ETL Validator comes with a Baseline and Compare Wizard, which can be used to generate test cases for automatically baselining your target table data and comparing it with the new data. Using this approach, any changes to the target data can be identified. ETL Validator also comes with a Metadata Compare Wizard that can be used to track changes to table metadata over a period of time. This helps ensure that the QA and development teams are aware of changes to table metadata in both source and target systems.

Reference Data Testing: Many database fields can only contain a limited set of enumerated values. Instances of fields containing values not found in the valid set represent a quality gap that can impact processing.

Verify that data conforms to reference data standards: Data model standards dictate that the values in certain columns should adhere to the values in a domain.
Compare domain values across environments: One of the challenges in maintaining reference data is verifying that all the reference data values from the development environment have been migrated properly to the test and production environments.
Track reference data changes: Baseline reference data and compare it with the latest reference data so that changes can be validated.
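As a sketch, a domain conformance check can compare column values against a reference table; TGT.CUSTOMER.STATUS_CODE and REF.STATUS_CODES are hypothetical names.

    -- Status codes used in the target that do not exist in the reference (domain) table.
    SELECT DISTINCT c.STATUS_CODE
    FROM TGT.CUSTOMER c
    LEFT JOIN REF.STATUS_CODES r ON r.STATUS_CODE = c.STATUS_CODE
    WHERE r.STATUS_CODE IS NULL
      AND c.STATUS_CODE IS NOT NULL;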
Automate reference data testing using ETL Validator: ETL Validator comes with a Baseline & Compare Wizard and a Data Rules test plan for automatically capturing and comparing reference data.
An ETL process is generally designed to run in either a full mode or an incremental mode. When running in full mode, the ETL process truncates the target tables and reloads all (or most) of the data from the source systems. An incremental ETL only loads the data that changed in the source system, using some kind of change capture mechanism to identify changes. Incremental ETL is essential for reducing ETL run times, and it is an often-used method for updating data on a regular basis. The purpose of incremental ETL testing is to verify that updates on the sources are getting loaded into the target system properly. While most of the data completeness and data transformation tests are also relevant for incremental ETL testing, there are a few additional tests. To start with, setting up test data for updates and inserts is key for testing incremental ETL.

Duplicate Data Checks: When a source record is updated, the incremental ETL should be able to look up the existing record in the target table and update it. If not, this can result in duplicates in the target table.
Compare Data Values: Verify that the changed data values in the source are reflected correctly in the target data. Typically, the records updated by an ETL process are stamped with a run ID or the date of the ETL run. This date can be used to identify the newly updated or inserted records in the target system. Alternatively, all the records that were updated in the last few days in the source and target can be compared, based on the incremental ETL run frequency.
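A sketch of such an incremental comparison, assuming a daily ETL run, a LAST_UPDATED_DATE column on the source, and hypothetical table and column names throughout (date arithmetic syntax varies by database):

    -- Source rows changed in the last day whose values do not match the target.
    SELECT s.CUSTOMER_ID, s.EMAIL AS SOURCE_EMAIL, t.EMAIL AS TARGET_EMAIL
    FROM SRC.CUSTOMER s
    LEFT JOIN TGT.CUSTOMER_DIM t ON t.CUSTOMER_ID = s.CUSTOMER_ID
    WHERE s.LAST_UPDATED_DATE >= CURRENT_DATE - 1
      AND (t.CUSTOMER_ID IS NULL OR t.EMAIL <> s.EMAIL);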
Data Denormalization Checks: Denormalization of data is quite common in a data warehouse environment. Source data is denormalized in the ETL so that report performance can be improved. However, the denormalized values can get stale if the ETL process is not designed to update them based on changes in the source data.
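A sketch of a staleness check for one denormalized column, assuming a hypothetical fact table TGT.ORDER_FACT that carries CUSTOMER_NAME copied from the customer source:

    -- Denormalized CUSTOMER_NAME values in the fact table that no longer match the source.
    SELECT f.ORDER_ID,
           f.CUSTOMER_NAME AS DENORMALIZED_NAME,
           s.CUSTOMER_NAME AS CURRENT_SOURCE_NAME
    FROM TGT.ORDER_FACT f
    JOIN SRC.CUSTOMER s ON s.CUSTOMER_ID = f.CUSTOMER_ID
    WHERE f.CUSTOMER_NAME <> s.CUSTOMER_NAME;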
Slowly Changing Dimension Checks: While there are different types of slowly changing dimensions (SCD), testing an SCD Type 2 dimension presents a unique challenge since there can be multiple records with the same natural key. A Type 2 SCD is designed to create a new record whenever there is a change to a defined set of columns. The latest record is tagged with a flag, and there are start date and end date columns to indicate the period of relevance for each record. Some of the tests specific to a Type 2 SCD are listed below:
1. Verify that there is exactly one record flagged as current for each natural key.
2. Verify that the start and end dates of records for the same natural key do not overlap and have no gaps.
3. Verify that a new record is created only when one of the designated columns changes.
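A sketch of the first check, assuming a hypothetical TGT.CUSTOMER_DIM with CUSTOMER_ID as the natural key and CURRENT_FLAG = 'Y' marking the active row:

    -- Natural keys that do not have exactly one current record in the Type 2 dimension.
    SELECT CUSTOMER_ID,
           SUM(CASE WHEN CURRENT_FLAG = 'Y' THEN 1 ELSE 0 END) AS CURRENT_ROWS
    FROM TGT.CUSTOMER_DIM
    GROUP BY CUSTOMER_ID
    HAVING SUM(CASE WHEN CURRENT_FLAG = 'Y' THEN 1 ELSE 0 END) <> 1;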
Automate incremental ETL testing using ETL Validator: ETL Validator comes with a benchmarking capability in the Component Test Case for automating incremental ETL testing. The benchmarking capability allows the user to automatically compare the latest data in the target table with a previous copy to identify the differences. These differences can then be compared with the source data changes for validation.

Once the data is transformed and loaded into the target by the ETL process, it is consumed by another application or process in the target system. For data warehouse projects, the consuming application is a BI tool such as OBIEE, Business Objects, Cognos, or SSRS. For a data migration project, data is extracted from a legacy application and loaded into a new application. In a data integration project, data is shared between two different applications, usually on a regular basis. The goal of ETL integration testing is to perform end-to-end testing of the data in the ETL process and the consuming application.

End-to-End Data Testing: Integration testing of the ETL process and the related applications involves the following steps:
1. Set up test data in the source system.
2. Execute the ETL process to load the test data into the target.
3. View or process the data in the target system using the consuming application.
4. Validate the data and the application functionality that uses the data.
Automate integrated ETL testing using ETL Validator: ETL Validator comes with a Component Test Case that supports comparing an OBIEE report (logical query) with the database queries from the source and target. Using the Component Test Case, the data in the OBIEE report can be compared with the data from the source and target databases, thus identifying issues in the ETL process as well as in the OBIEE report.

Performance of the ETL process is one of the key issues in any ETL project. Often, development environments do not have enough source data for performance testing of the ETL process. This could be because the project has just started and the source system only has a small amount of test data, or because production data contains PII that cannot be loaded into the test database without scrubbing. The ETL process can behave differently with different volumes of data.
What is the process for data validation testing? Database validation testing involves stored data and metadata validation. The testing is done based on requirements against the quality and performance of the data. Testers also look into the data objects, functionality, types, and lengths before making the data live and available to users.
What validates the integrity of data files? Validation checks such as:
Reconciliation checks (checking the number of lines, totals, etc.)
Field-specific checks (checking for presence and uniqueness of fields, formatting, numerical bounds, etc.)
Cross-field checks (checking consistency of values within a given time snapshot where there are dependencies)
What are the different types of data validation? Data type validation; range and constraint validation; code and cross-reference validation; structured validation; and consistency validation.

What is data verification and validation? Data verification makes sure that the data is accurate (it matches its source), while data validation makes sure that the data is correct (it meets the defined rules and constraints).