
This is Thirupathirao Pasupuleti. I previously worked as an ETL Tester at Syntags IT LLP, Hyderabad, and have 4.2 years of experience in ETL Testing and its tools. I have experience with Informatica PowerCenter as the ETL tool and SQL on the database side, and expertise in JIRA as the defect-tracking and test-management tool. Coming to my roles and responsibilities: we work with the Agile methodology in two-week sprints. Once sprint planning is done, I am assigned a few user stories; once I get the user stories, I prepare queries based on the mapping document. Once the code is deployed for testing, I prepare jobs and validate the ETL process, and I am involved in preparing, reviewing, and executing test cases as per the business requirements.
Project Architecture:
On-Premises Data: Data comes from different sources, sometimes as flat files, sometimes as database files, and is loaded on premises; Informatica Cloud picks the data files up into the staging area.
Staging Area: Data arrives from the different sources in different formats; in staging it is converted into a uniform format. Data cleansing and data merging are performed, and redundant data is removed, in the staging area.
ODS (Operational Data Store): Data is loaded from the staging area into the ODS by applying the business rules.
Project Details: UNITED NATURAL FOODS
United Natural Foods is a leading independent national distributor of natural foods, personal care items, and nutritional supplements such as baby care items and natural products in the United States.
In this project, an EDW (Enterprise Data Warehouse) was required for customer data and product sales analysis; I was involved in building the data warehouse covering all locations. Sales trend analysis was required so the business team could understand their customers' choices.
Roles and responsibilities in this project: understanding the business requirements, writing test scenarios, reviewing and executing test cases, preparing the test report, and attending daily meetings.
Validations we perform:
Record count validations, reconciliation checks, data length, data types, constraint checks, index checks, source data validation, data comparison checks, duplicate data validations, primary key/foreign key checks, null value checks.
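A minimal sketch of two of these checks (record count validation and duplicate data validation), using an in-memory SQLite database as a stand-in for the real source and target systems; the table and column names (src_orders, tgt_orders) are illustrative, not from the project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE src_orders (order_id INTEGER, amount REAL)")
cur.execute("CREATE TABLE tgt_orders (order_id INTEGER, amount REAL)")
cur.executemany("INSERT INTO src_orders VALUES (?, ?)",
                [(1, 10.0), (2, 20.0), (3, 30.0)])
cur.executemany("INSERT INTO tgt_orders VALUES (?, ?)",
                [(1, 10.0), (2, 20.0), (3, 30.0), (3, 30.0)])  # duplicated row

# Record count validation: source vs target
src_count = cur.execute("SELECT COUNT(*) FROM src_orders").fetchone()[0]
tgt_count = cur.execute("SELECT COUNT(*) FROM tgt_orders").fetchone()[0]
print("counts match:", src_count == tgt_count)

# Duplicate data validation: keys appearing more than once in the target
dups = cur.execute("""
    SELECT order_id, COUNT(*) FROM tgt_orders
    GROUP BY order_id HAVING COUNT(*) > 1
""").fetchall()
print("duplicate keys:", dups)
```

In practice the two queries run against the actual source and target connections, and a mismatch or a non-empty duplicate list is logged as a defect.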
Layer we work on: Source to target.
Priority is the order in which the developer should resolve a defect, whereas Severity is the degree of impact that a defect has on the operation of the product. Priority is categorized into three types: low, medium, and high, whereas Severity is commonly categorized into five types: critical, major, moderate, minor, and cosmetic. Severity is a factor used to identify how much a defect impairs product usage; there are many scales of severity.
Surrogate Key: a surrogate key is generated when a new record is inserted into a table. When a primary key is generated at runtime, it is called a surrogate key. A surrogate key is internally generated by the current system, is invisible to the user, and carries no business meaning; it is typically used as the primary key when no suitable natural key exists. For example, a sequential number can be a surrogate key.
A natural key is a single column or a combination of columns that has a business value and occurs naturally in the real world
(e.g. Social security number, International Standard Book Number…).
PRIMARY KEY: A PRIMARY KEY constraint uniquely identifies each record in a database table. All columns participating in a
primary key constraint must not contain NULL values
FACT TABLE: A fact table represents the metrics, measurements, or facts of a business process. Facts are stored in fact tables and are linked to a number of dimension tables via foreign keys.
Additive facts: facts that can be summed across all the dimensions associated with the fact table. Semi-additive facts: facts that can be summed across some dimensions but not all (e.g. an account balance over time). Non-additive facts: facts that cannot be summed across any dimension (e.g. ratios or percentages).
DIMENSION TABLE: Dimensions are descriptive data, identified by keys, and are organized in tables called dimension tables. Conformed dimension: a dimension table that can be shared by multiple fact tables. Junk dimension: a single dimension that groups together low-cardinality flags and indicators that do not fit in any other dimension.
SCD(Slowly Changing Dimension):SCD is a dimension that stores and manages both current and historical data over time in a
data warehouse. It is considered and implemented as one of the most critical ETL tasks in tracking the history of dimension
records.
SCD1: SCD1 the new data overwrites the existing data. Thus the existing data is lost as it is not stored anywhere else.
SCD2: A new dimension record is created with the changed data values, and this new record becomes the current record; the old record is kept as history.
SCD2 metadata – eff_start_date, eff_end_date, and is_current are designed to manage the state of the record. eff_start_date and eff_end_date contain the time interval when the record is effective, and is_current flags the active version; a timestamp column records the actual time when the customer record was generated.
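The SCD2 pattern above can be sketched with SQLite in memory; the table and column names (dim_customer, cust_id, city) are illustrative, while the metadata columns follow the ones named in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE dim_customer (
    cust_id INTEGER, city TEXT,
    eff_start_date TEXT, eff_end_date TEXT, is_current INTEGER)""")
cur.execute("INSERT INTO dim_customer VALUES "
            "(1, 'Hyderabad', '2020-01-01', '9999-12-31', 1)")

# Customer 1 moves to Chennai on 2023-06-01:
# 1) expire the current record by closing its effective interval
cur.execute("""UPDATE dim_customer
               SET eff_end_date = '2023-05-31', is_current = 0
               WHERE cust_id = 1 AND is_current = 1""")
# 2) insert a new record that becomes the current one
cur.execute("INSERT INTO dim_customer VALUES "
            "(1, 'Chennai', '2023-06-01', '9999-12-31', 1)")

rows = cur.execute("SELECT city, is_current FROM dim_customer "
                   "ORDER BY eff_start_date").fetchall()
print(rows)  # both the historical and the current record are kept
```

Note that, unlike SCD1, nothing is overwritten: the old city survives with is_current = 0, which is what makes history tracking possible.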
JOINS: A JOIN clause is used to combine rows from two or more tables (or SELECT statements) based on related columns between them.
Inner join: returns the records that have matching values in both tables.
Left join: returns all records from the left table and the matched records from the right table.
Right join: returns all records from the right table and the matched records from the left table.
Full outer join: returns all records when there is a match in either the left or the right table.
Natural join: a NATURAL JOIN is similar to an INNER JOIN, but we do not need to use the ON clause; we just specify the tables, and the join is made on the identically named columns.
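A quick demonstration of the inner vs. left join behaviour described above, on two small illustrative tables (emp, dept) in SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE emp (emp_id INTEGER, dept_id INTEGER)")
cur.execute("CREATE TABLE dept (dept_id INTEGER, dept_name TEXT)")
cur.executemany("INSERT INTO emp VALUES (?, ?)", [(1, 10), (2, 20), (3, None)])
cur.executemany("INSERT INTO dept VALUES (?, ?)", [(10, 'HR'), (30, 'IT')])

inner = cur.execute(
    "SELECT e.emp_id, d.dept_name FROM emp e "
    "INNER JOIN dept d ON e.dept_id = d.dept_id ORDER BY e.emp_id").fetchall()
left = cur.execute(
    "SELECT e.emp_id, d.dept_name FROM emp e "
    "LEFT JOIN dept d ON e.dept_id = d.dept_id ORDER BY e.emp_id").fetchall()
print("inner:", inner)  # only the matching row survives
print("left :", left)   # all emp rows; NULL where no dept matches
```

The right join is simply the mirror image (all dept rows), and a full outer join would return the union of both result sets.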
UNION & UNION ALL: the UNION operator combines the result sets of two or more SELECT statements; every SELECT statement within the UNION must have the same number of columns with the same data types. UNION does not return duplicate rows; UNION ALL returns all records, including duplicates.
CONSTRAINTS: UNIQUE, NOT NULL, CHECK, DEFAULT, INDEX, PK, Foreign Key
Staging Area: During the ETL process, a staging area is used as intermediate storage; it serves as a temporary area between the data sources and the data warehouse.
Normalization: Normalization is a database design technique implemented to reduce redundant/duplicate/repeated data in the database; normalization rules divide larger tables into smaller tables and link them using relationship keys.
OLTP: Online transaction processing captures, stores, and processes data from transactions in real time.
OLAP: Online analytical processing uses complex queries to analyze aggregated historical data coming from OLTP systems.
Smoke testing: Smoke testing is performed to ascertain that the critical functionalities of the program are working fine; it exercises the system broadly from end to end.
Sanity testing: Sanity testing is done, often at random, to verify that each functionality is working as expected after a minor change.
STAR SCHEMA: A star schema contains both dimension tables and fact tables. In a star schema, each fact table is surrounded by dimension tables. SNOWFLAKE SCHEMA: A snowflake schema contains dimension tables, fact tables, and sub-dimension tables; each dimension is normalized into sub-dimensions.
CASE: the CASE statement goes through conditions and returns a value when the first condition is met (like if-then-else).
RANK, DENSE_RANK, ROW_NUMBER: these assign a rank to each record in a result set. RANK skips ranks after tied values, DENSE_RANK assigns consecutive ranks without skipping, and ROW_NUMBER assigns a unique sequential number regardless of ties.
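The difference in how the three functions treat ties can be seen on a small illustrative salary table (emp) in SQLite, assuming a SQLite build with window-function support (3.25+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE emp (name TEXT, salary INTEGER)")
cur.executemany("INSERT INTO emp VALUES (?, ?)",
                [('a', 500), ('b', 400), ('c', 400), ('d', 300)])

rows = cur.execute("""
    SELECT name,
           RANK()       OVER (ORDER BY salary DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS drnk,
           ROW_NUMBER() OVER (ORDER BY salary DESC) AS rnum
    FROM emp
    ORDER BY salary DESC, name
""").fetchall()
print(rows)
# 'b' and 'c' tie at 400: RANK gives 2, 2 then jumps to 4;
# DENSE_RANK gives 2, 2 then 3; ROW_NUMBER stays unique (1..4).
```

For the tied rows, which of them receives the lower ROW_NUMBER is arbitrary unless a tiebreaker column is added to the OVER clause.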
VIEW: A view is a virtual table that acts like an actual table; the view's result set is not stored in the database, so it occupies no storage of its own.
MATERIALIZED VIEW: The results of the view expression are stored in the database system, so a materialized view does occupy storage.
SUB Query : A Subquery is a SQL query within another query. It is a subset of a Select statement whose return values are used in
filtering the conditions of the main query.
Correlated SUB Query: a correlated subquery is a subquery that uses values from the outer query in order to complete. Because
a correlated subquery requires the outer query to be executed first, the correlated subquery must run once for every row in the
outer query. It is also known as a synchronized subquery.
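A correlated subquery as described above, sketched in SQLite: the inner query references e1.dept_id from the outer query, so it is evaluated once per outer row. The table and data (employee with dept_id/salary) are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employee "
            "(emp_id INTEGER, dept_id INTEGER, salary INTEGER)")
cur.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                [(1, 10, 100), (2, 10, 200), (3, 20, 300), (4, 20, 300)])

# Employees earning more than their own department's average salary
rows = cur.execute("""
    SELECT emp_id FROM employee e1
    WHERE salary > (SELECT AVG(salary) FROM employee e2
                    WHERE e2.dept_id = e1.dept_id)
""").fetchall()
print(rows)  # dept 10 average is 150, so only emp 2 qualifies
```

A plain (non-correlated) subquery such as `SELECT AVG(salary) FROM employee` would run once for the whole statement instead.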

3rd HIGHEST SALARY: SELECT MIN(salary) FROM (SELECT TOP 3 salary FROM employee ORDER BY salary DESC) AS t; --- (2nd HIGHEST: SELECT MAX(salary) FROM employee WHERE salary NOT IN (SELECT MAX(salary) FROM employee);)
Nth HIGHEST: SELECT * FROM employee e1 WHERE (n-1) = (SELECT COUNT(DISTINCT e2.salary) FROM employee e2 WHERE e2.salary > e1.salary);
NEW TABLE CREATE: SELECT * INTO newtable FROM oldtable WHERE 1 = 0; (copies the structure without the data)
DUPLICATE FIND: SELECT empid, COUNT(*) FROM employee GROUP BY empid HAVING COUNT(*) > 1; DELETE DUPLICATE:
WITH cte AS (SELECT *, ROW_NUMBER() OVER (PARTITION BY empid ORDER BY empid) AS rn FROM employeetable) DELETE FROM cte WHERE rn > 1;
SUBSTRING: SELECT SUBSTRING(fullname, 1, CHARINDEX('_', fullname) - 1) AS firstname,
SUBSTRING(fullname, CHARINDEX('_', fullname) + 1, LEN(fullname)) AS lastname FROM employee;
Water Fall Model: The waterfall model is a classical model used in system development life cycle to create a system with a linear
and sequential approach. It is termed as waterfall because the model develops systematically from one phase to another in a
downward fashion.
Agile Model: The main difference is that Waterfall is a linear system of working that requires the team to complete each project
phase before moving on to the next one while Agile encourages the team to work simultaneously on different phases of the
project
Defect Life Cycle: New > Open or Reject > In Analysis > In Development > Ready to Test > In Test > Done or Reopen.
Testing Life Cycle: Requirement Phase > Analysis Phase > Test Preparation > Test Execution > Sign-off.
Regression testing: This testing is done to make sure that new code changes should not have side effects on the existing
functionalities. It ensures that the old code still works once the latest code changes are done.
Cast, Convert: change the data type from one format to another. SELECT CAST(25.65 AS varchar); SELECT CONVERT(int, 25.65);
Change Data Capture: Inserting new records, updating one or more fields of existing records, deleting records are the types of
changes which Change Data Capture processes must detect in the source system.
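One simple way a tester can cross-check those three change types is to compare a source snapshot against the target by primary key; the sketch below does this with illustrative src/tgt tables in SQLite (real CDC tools typically read transaction logs instead).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE src (id INTEGER, name TEXT)")
cur.execute("CREATE TABLE tgt (id INTEGER, name TEXT)")
cur.executemany("INSERT INTO src VALUES (?, ?)", [(1, 'a'), (2, 'B'), (4, 'd')])
cur.executemany("INSERT INTO tgt VALUES (?, ?)", [(1, 'a'), (2, 'b'), (3, 'c')])

# New in source but not yet in target -> inserts to apply
inserts = cur.execute(
    "SELECT id FROM src WHERE id NOT IN (SELECT id FROM tgt)").fetchall()
# Present in target but gone from source -> deletes to apply
deletes = cur.execute(
    "SELECT id FROM tgt WHERE id NOT IN (SELECT id FROM src)").fetchall()
# Same key, different non-key values -> updates to apply
updates = cur.execute(
    "SELECT s.id FROM src s JOIN tgt t ON s.id = t.id "
    "WHERE s.name <> t.name").fetchall()
print("inserts:", inserts, "deletes:", deletes, "updates:", updates)
```

The NOT IN comparisons assume the key column has no NULLs; with nullable keys, NOT EXISTS is the safer form.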
LIKE: SELECT FullName FROM EmployeeDetails WHERE FullName LIKE '__hn%';
CONCAT: SELECT CONCAT(EmpId, ManagerId) AS NewId FROM EmployeeDetails; -- returns the two values concatenated
TRIM: UPDATE EmployeeDetails SET FullName = LTRIM(RTRIM(FullName));
Case: SELECT CASE WHEN salary > 50000 THEN 'yes' ELSE 'no' END AS high_salary FROM employee;
Max Salary in Each Dept: SELECT dept_id, MAX(salary) as max_salary_per_dept FROM employee GROUP BY dept_id;
Functions: SELECT *, RANK() OVER(ORDER BY salary DESC) AS ranks, DENSE_RANK() OVER(ORDER BY salary DESC) AS dense_ranks, ROW_NUMBER() OVER(ORDER BY salary DESC) AS row_numbers FROM managers;
LEAD, LAG: SELECT sale_value, LAG(sale_value) OVER(ORDER BY sale_value) AS prev_value, LEAD(sale_value) OVER(ORDER BY sale_value) AS next_value FROM sales;
EVEN/ODD: SELECT E.EmpId, E.Project, E.Salary FROM (SELECT *, ROW_NUMBER() OVER(ORDER BY EmpId) AS RowNumber FROM EmployeeSalary) E WHERE E.RowNumber % 2 = 0; -- even rows (use % 2 = 1 for odd rows)
