
Data Quality Best Practices

Donna Burbank and Nigel Turner


Global Data Strategy, Ltd.
August 26th, 2021

Follow on Twitter @donnaburbank, @nigelturner8, @GlobalDataStrat
Twitter Event hashtag: #DAStrategies
Copyright Global Data Strategy, Ltd. 2021
Donna Burbank

• Recognized industry expert in information management with over 25 years of experience in data strategy, information management, data modeling, metadata management, and enterprise architecture
• Managing Director at Global Data Strategy, Ltd., an international information management consulting company that specializes in the alignment of business drivers with data-centric technology
• Worked with dozens of Fortune 500 companies worldwide in the Americas, Europe, Asia, and Africa and speaks regularly at industry conferences
• Excellence in Data Management Award from DAMA International
• Past President and Advisor to the DAMA Rocky Mountain chapter
• Co-author of several books on data management
• Regular contributor to industry publications
• She can be reached at [email protected]

Donna is based in Boulder, Colorado, US
Follow on Twitter @donnaburbank, @GlobalDataStrat
Nigel Turner

• Worked in Information Management (IM) and related areas for over 25 years. Experience has embraced Data Governance, Information Strategy, Data Quality, Master Data Management & Business Intelligence.
• Spent much of his career in British Telecommunications Group (BT) where he led a series of enterprise-wide IM & data governance initiatives.
• Has also been VP of Information Management Strategy at Harte Hanks Trillium Software, and Principal Consultant at FromHereOn and IPL.
• Nigel is very active in professional Data Management organizations and is an elected Data Management Association (DAMA) UK Committee member.
• He was the joint winner of DAMA International’s 2015 Community Award for the work he initiated and led in setting up a mentoring scheme in the UK where experienced DAMA professionals coach and support newer data management professionals.
• Nigel is based in Cardiff, Wales, UK.

Follow on Twitter @NigelTurner8


DATAVERSITY Data Architecture Strategies
This Year’s Lineup

• January Emerging Trends in Data Architecture – What’s the Next Big Thing?
• February Building a Data Strategy - Practical Steps for Aligning with Business Goals
• March Data Modeling Case Study – Business Data Modeling at Kiewit
• April Master Data Management – Aligning Data, Process, and Governance
• May Data Architecture, Solution Architecture, Platform Architecture – What’s the Difference?
• June Enterprise Architecture vs. Data Architecture
• July Best Practices in Metadata Management
• August Data Quality Best Practices (with guest Nigel Turner)
• September Data Modeling Techniques
• October Data Governance: Aligning Technical & Business Approaches
• December Data Architecture for Digital Transformation



What We’ll Cover Today

• Tackling data quality problems requires more than a series of tactical, one-off improvement projects.
• By their nature, many data quality problems extend across and often beyond an organization.
• Addressing these issues requires a holistic architectural approach combining people, process and technology.



Agenda

• Discuss how to deliver data quality improvements in the Baseline & Develop
phases of the A2E methodology
• Highlight the critical role of Business Rules in improving Data Quality
• Illustrate why getting Business Rules right is critical
• Outline how to use Business Rules to correct poor data quality and sustain
improved data quality



Data Quality is Part of a Wider Data Strategy
A Successful Data Strategy links Business Goals with Technology Solutions

• “Top-Down” alignment with business priorities
• Managing the people, process, policies & culture around data
• Leveraging & managing data for strategic advantage
• Coordinating & integrating disparate data sources
• “Bottom-Up” management & inventory of data sources



www.globaldatastrategy.com
Tackling Data Quality: the A2E approach

Cycle of Continuous Data Quality Improvement: Assess, Baseline, Converge, Develop, Evaluate

Step: Purpose
• Assess Business Usage: Understand what data exists and how it is used within the organization
• Baseline Data Sources: Baseline the current quality of the data and assess how well it is meeting business needs
• Converge on Business Critical Areas: Focus priorities to optimise early business benefits and set ‘fit for purpose’ quality targets to guide improvement activities
• Develop Improvements: Design & deploy improvement initiatives (encompassing people, process, and technology) and measure the impact against targets
• Evaluate Benefits & ROI: Regularly measure the data and continue to improve it so that it continues to meet current and future business needs



Data Quality Improvement: The Importance of Business Rules

“A Business Rule is a criterion used to guide day-to-day business activity, shape operational business judgments, or make operational business decisions.”
Ronald Ross, quoted in architectureandgovernance.com

• In a data context, business rules are used to define and enforce the standards that data must conform to
• Have a key role in assessing, baselining and improving data quality
• Can be used to:
  • Cleanse and enhance existing data
  • Become standards which new data must conform to
  • Guide data design in new developments
  • Enforce data standards in existing applications and platforms
  • Stop poor quality data being entered at source, e.g. via drop down lists, screen entry validation etc.



How Do You Classify Business Rules?

• Many different ways to classify business rules – can be very complex
• A simple classification is:

FORMAT BUSINESS RULES
Specify the format standards data should comply with. Include:
• Field length (fixed, variable etc.)
• Character format (e.g. Alphabetic, Numeric, Alphanumeric etc.)

CONTENT BUSINESS RULES
Specify the allowable content of records or fields. Include:
• Allowable values
• Whether mandatory or optional
• Relationships with other fields or records



Example Data Related Business Rules

FORMAT RULES

• A UK National Insurance Number must be in the format: aa nn nn nn a


• An employee must have a unique Employee ID in the format: aa nnnn
• Date of birth should be in North American format of MM/DD/YYYY
• A full US zip code must be in the format nnnnn-nnnn
• Internet router identifier must be in the format Aaa_Nan_Naa
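
Format rules like these map naturally onto pattern checks. The sketch below is one possible encoding, not the presenters' implementation. It assumes that "a" in the slide's notation means an uppercase letter and "n" a digit, and the names `FORMAT_RULES`, `matches_format` and `is_valid_us_date` are hypothetical. The router identifier rule is left out because its mixed-case notation is not defined in the slides.

```python
import re
from datetime import datetime

# Assumed reading of the slide notation: "a" = uppercase letter, "n" = digit.
FORMAT_RULES = {
    "uk_ni_number": re.compile(r"[A-Z]{2} \d{2} \d{2} \d{2} [A-Z]"),  # aa nn nn nn a
    "employee_id": re.compile(r"[A-Z]{2} \d{4}"),                     # aa nnnn
    "us_zip_plus4": re.compile(r"\d{5}-\d{4}"),                       # nnnnn-nnnn
}


def matches_format(rule_name: str, value: str) -> bool:
    """Return True when the whole value matches the named format rule."""
    return FORMAT_RULES[rule_name].fullmatch(value) is not None


def is_valid_us_date(value: str) -> bool:
    """Check MM/DD/YYYY layout and that the value is a real calendar date."""
    if not re.fullmatch(r"\d{2}/\d{2}/\d{4}", value):
        return False
    try:
        datetime.strptime(value, "%m/%d/%Y")
    except ValueError:
        return False
    return True
```

A pattern check only confirms the layout of a value. Uniqueness of the Employee ID, for example, needs a separate check across all records.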



Example Data Related Business Rules

CONTENT RULES
• Every Sales Representative must be assigned to one and only one Sales Region
• A valid email address must be entered by a customer to enable a customer’s order to be accepted
• Gender codes must have the valid value of Male, Female or Unknown
• A supplier must have at least one associated geographical address
• Product Price should be Product Unit Cost + 25%
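
Content rules can be expressed as small predicates over field values. The following is a minimal sketch under stated assumptions: the function names are hypothetical, prices are assumed to be held as `Decimal` values rounded to two decimal places, and the email rule is omitted because the slides do not define what counts as a valid address.

```python
from decimal import Decimal

VALID_GENDER_CODES = {"Male", "Female", "Unknown"}
MARKUP = Decimal("1.25")  # Product Unit Cost + 25%


def gender_code_is_valid(code: str) -> bool:
    """Allowable-values rule: the code must be one of the listed values."""
    return code in VALID_GENDER_CODES


def price_matches_cost(price: Decimal, unit_cost: Decimal) -> bool:
    """Cross-field rule: price equals unit cost plus 25%, rounded to cents (assumed)."""
    return price == (unit_cost * MARKUP).quantize(Decimal("0.01"))


def supplier_has_address(addresses: list) -> bool:
    """Mandatory-relationship rule: at least one associated address."""
    return len(addresses) >= 1


def rep_has_exactly_one_region(regions: list) -> bool:
    """Cardinality rule: assigned to one and only one Sales Region."""
    return len(set(regions)) == 1
```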



How Do You Identify Business Rules?
• Business rules can be discovered or derived from:
  • Data models (Business / Logical / Physical)
  • Business documentation (e.g. Process Descriptions, User Instructions)
  • IT Documentation (e.g. requirements specifications, system manuals)
  • Source code (e.g. ‘If A Then B’ statements)
  • Master and / or Reference Data Sources (e.g. currency codes, product master data)
  • Documented metadata (e.g. Business Glossaries, Data Dictionaries, Metadata Repositories)
  • Data profiling outputs
  • Talking to key stakeholders:
    • Data owners and data stewards (if in place)
    • Data producers and consumers
    • Other business and IT subject matter experts

VITAL IMPORTANCE OF STAKEHOLDER ENGAGEMENT:
• Business rules are frequently implicit (i.e. locked in people’s heads) and not formally documented
• Where business rules are documented, documentation is often out of date and not updated in line with system changes



Data Models Describe the Organization

• Relationships define the data-centric Business Rules of an organization
• You should be able to “read” a data model like a sentence
• The Entities / Concepts are the “nouns” – the boxes on a data model
• It’s often helpful to start by taking some text describing the organization (or transcripts from stakeholder interviews) and draw boxes around the nouns to find the core entities

BUSINESS RULES:
• An employee can work for more than one department.
• A customer can have more than one account.
• A department can contain more than one employee.

Entities shown in the diagram: Employee, Customer, Account, Department
Deriving Business Rules: Business Data Model
• Communication & definition of core data concepts & their definitions
• A business data model provides core definitions of key data objects.
• It also shows key relationships between data objects.
• Even a simple diagram such as the one on this slide can tell a powerful “story” … and uncover key business rules

BUSINESS RULE: An EMPLOYEE must be on the active payroll
BUSINESS RULE: A CUSTOMER is a current or former client who must have had an account active within the last 6 months
BUSINESS RULE: A COMPANY must contain 1 or more customers with an active account
When Business Rules Go Wrong or Go Missing

REAL LIFE DATA QUALITY HORROR STORIES 2021



Why Do Business Rules Matter? DQ ‘Short’comings
• Liam Thorp (32 years old, Liverpool resident) made headline news in the UK in Feb 2021
• Received a priority invite for a Covid-19 vaccination because he was medically classed as ‘morbidly obese’
• The reason – his local health board had recorded his height as 6.2 centimetres and not his real height of 6 feet 2 inches
• This made his Body Mass Index (BMI) 28,000. BMI is calculated as weight in kilograms divided by the square of height in metres
• A BMI of 40 and above is classed as ‘morbidly obese’
• Now corrected, and he was put back in his rightful place in the vaccine queue

“I can see the funny side of this story but also recognise there is an important issue for us to address”
Chair of the Liverpool Clinical Commissioning Group (leading the city’s vaccine roll out)

KEY PROBLEM - ABSENCE OF BUSINESS RULES TO SPECIFY:
• Minimum Height (Content)
• Maximum BMI (Content)

Image: Beatles statue, City of Liverpool
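
The two missing content rules can be sketched as plausibility checks that run before a BMI is acted on. The thresholds below (`MIN_HEIGHT_M`, `MAX_HEIGHT_M`, `MAX_BMI`) are hypothetical values chosen for illustration, as are the function names. A real health system would have clinicians set them.

```python
# Hypothetical plausibility limits; not taken from the source.
MIN_HEIGHT_M = 0.5
MAX_HEIGHT_M = 2.5
MAX_BMI = 100.0


def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2


def check_measurements(weight_kg: float, height_m: float) -> list:
    """Return a list of rule violations; an empty list means the values pass."""
    problems = []
    if not MIN_HEIGHT_M <= height_m <= MAX_HEIGHT_M:
        # A BMI derived from an implausible height is meaningless, so stop here.
        problems.append("height outside plausible range")
        return problems
    if bmi(weight_kg, height_m) > MAX_BMI:
        problems.append("BMI above plausible maximum")
    return problems
```

With an illustrative weight of 100 kg, a height entered as 0.062 m (6.2 cm) fails the height rule immediately, while 1.88 m (about 6 feet 2 inches) passes both checks.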
Why Do Business Rules Matter? ‘Miss’ing weight

• UK Air Accidents Investigation Branch (AAIB) report (April 2021) declared a ‘Serious Incident’ at Birmingham airport, UK
• Report highlighted that 3 flights to Europe in July 2020 had taken off with the weight of the plane load underestimated by an average 1,200kg
• This miscalculation could have caused a ‘serious incident’ on take off as it determines take off speed, thrust etc.
• Problem happened because all passengers with the title ‘Miss’ were automatically assumed by outsourced IT suppliers to be children and not adults
• A child’s standard estimated weight is 35kg; an adult 69kg
• The airline described it as ‘a simple flaw in its IT system’
• In reality, there was a serious problem with its business rules!
• The airline has now introduced manual validation of all passengers at check in to ensure adults titled ‘Miss’ are changed to ‘Ms’ on the passenger roster (?)

KEY PROBLEMS:
• Reliance on IT, and not the business, to specify the business rules
• Making cultural assumptions that were incorrect
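
To make the arithmetic concrete, the sketch below contrasts a title-based weight rule with an age-based one. It is an illustration only: the `Passenger` type, the `CHILD_MAX_AGE` cut-off and the sample passengers are hypothetical, and only the 35kg and 69kg standard weights come from the slide.

```python
from dataclasses import dataclass

CHILD_KG = 35       # standard child weight quoted on the slide
ADULT_KG = 69       # standard adult weight quoted on the slide
CHILD_MAX_AGE = 11  # hypothetical cut-off; the source does not state one


@dataclass
class Passenger:
    title: str
    age: int


def weight_by_title(p: Passenger) -> int:
    """Flawed rule: every 'Miss' is assumed to be a child."""
    return CHILD_KG if p.title == "Miss" else ADULT_KG


def weight_by_age(p: Passenger) -> int:
    """Alternative rule: classify by age, ignoring the title."""
    return CHILD_KG if p.age <= CHILD_MAX_AGE else ADULT_KG


# Made-up passengers for illustration.
passengers = [
    Passenger("Miss", 34),
    Passenger("Miss", 8),
    Passenger("Mr", 40),
    Passenger("Ms", 29),
]

flawed_total = sum(weight_by_title(p) for p in passengers)
age_based_total = sum(weight_by_age(p) for p in passengers)
shortfall = age_based_total - flawed_total  # 34 kg per adult wrongly treated as a child
```

Each adult misclassified as a child understates the load by 34kg, so an average shortfall of 1,200kg corresponds to roughly 35 such passengers per flight.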



Four Step Process: Using Business Rules for Data Quality Improvement

CYCLE OF CONTINUOUS DATA QUALITY IMPROVEMENT
• STEP 1: Profile data sources
• STEP 2: Agree priority DQ problems & design Business Rules
• STEP 3: Deploy Business Rules
• STEP 4: Monitor & report adherence to Business Rules



Step 1: Quantifying Data Problems - The Value of Data Profiling

• The benefits of data profiling include:
  • Checks conformance of the dataset with business rules
  • Enables fact-based discussion of the causes and impacts of data problems
  • Great starting point for Data Quality improvement workshops
  • Automatic generation of metadata
  • Supports both data quality focus & improvement and metadata capture
• Data profiling tools automate the process of assessing and reporting on the quality of data sources
• Data profiling can also be done via SQL, without purchasing a tool

Figure: Example partial Data Profiling report
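
As a rough illustration of profiling with plain SQL, the sketch below runs a few aggregate queries against an in-memory SQLite database using Python's standard `sqlite3` module. The `employee` table, its rows and the `profile_column` helper are made up for the example. They are not from the presentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_no TEXT, surname TEXT, gender TEXT)")
# Made-up sample rows for illustration.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("100001", "Adams", "Female"),
        ("100002", "Baker", "Male"),
        ("100002", "Baker", "Male"),
        (None, "Clark", "XXXX"),
    ],
)


def profile_column(conn, table, column):
    """Row count, blank count and distinct count for one column.

    Table and column names are interpolated into the SQL text, so this
    helper must only be called with trusted identifiers.
    """
    sql = f"""
        SELECT COUNT(*),
               SUM(CASE WHEN {column} IS NULL OR {column} = '' THEN 1 ELSE 0 END),
               COUNT(DISTINCT {column})
        FROM {table}
    """
    rows, blank, distinct = conn.execute(sql).fetchone()
    return {"rows": rows, "blank": blank, "distinct": distinct}


emp_no_profile = profile_column(conn, "employee", "employee_no")

# Value frequency distribution, useful for spotting invalid codes.
gender_values = conn.execute(
    "SELECT gender, COUNT(*) FROM employee GROUP BY gender ORDER BY gender"
).fetchall()
```

A distinct count lower than the number of non-blank rows hints at duplicates, and the frequency list exposes values such as `XXXX` that fall outside the allowable set.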



Step 1: An Alternative Approach to Quantifying Data Problems

Source: “Only 3% of Companies’ Data Meets Basic Quality Standards”, Tadhg Nagle, Thomas C. Redman & David Sammon, Harvard Business Review, September 11 2017
Step 1: Data Profiling & Potential Data Quality Problem Identification

EMPLOYEE NO  SURNAME   FIRST NAME  GENDER   DATE OF BIRTH  ROLE CODE
802540       Smith     Brian       Female   31/01/56       PM16
YN4176B      Gregg     (blank)     Male     07/09/80       9999
811609       Patel     Priya       XXXX     25/12/78       AL60
22298        Bothroyd  Bridget     Female   28/08/09       TBD
802540       Smith     Bryan       Male     31/01/56       PM10
855265       Hayes     Leslie      Female   00/00/00       AL76
(blank)      Taylor    Kevin       Unknown  12/30/69       US18

Note: Records extracted and anonymized from an actual HR database



Step 1: Data Profiling & Potential DQ Problem Identification

Figure: the same seven HR records, with a key marking each potential Data Quality problem and each potential duplicate record.

ANSWER: Total number of potential Data Quality problems is 13 or 19, depending on whether Smith is a duplicate record

Step 2: Business Review & Validation
• Data profiling findings should be reviewed by appropriate business & IT stakeholders
• If formal Data Governance is in place, this should ideally be led by the Data Stewards responsible for the specific data domains
• Aim to reach consensus on what the business impact is
• Ways of doing this:
  • Workshops and / or meetings (virtual or F2F)
  • By workflows, seeking views on the potential problem areas
• For priority areas, agree Business Rules which should be in place to drive and enforce data quality improvement
• Create and deploy Business Rules
  • Test rules first in case of unforeseen downstream impacts
  • Embed in appropriate operational systems or Data Quality Rules Engine (see later)



Step 3: Using Business Rules to steer and enforce Data Quality standards

Example potential format business rules:
• Employee No. must be in format nnnnnn. Blank Employee Numbers are allowed if new starter awaiting Emp. No. allocation
• First Name must not be blank
• Role code must be in format AAnn
• Date of Birth must be in format nn/nn/nn

Example potential content business rules:
• Gender should align with First Name derived from Common Names Reference file
• Allowable Genders are FEMALE, MALE, SELF-DETERMINED or UNKNOWN
• Date of Birth must be expressed as DD/MM/YY and in the range 01/01/1940 to 12/12/2005
• Employee No. should be unique. Only one Emp. No. should be allocated to any individual employee
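
A sketch of how most of these example rules might be applied to the seven HR records from Step 1 follows. Several things are assumptions: the two-digit year pivot, the placement of "Gregg" as a surname with a blank first name (the flattened table is ambiguous), the case-insensitive gender comparison, and all function names. The gender-versus-first-name rule is skipped because it needs the Common Names Reference file, which is not available here.

```python
import re
from collections import Counter
from datetime import date

ALLOWED_GENDERS = {"FEMALE", "MALE", "SELF-DETERMINED", "UNKNOWN"}
DOB_MIN, DOB_MAX = date(1940, 1, 1), date(2005, 12, 12)


def parse_dob(text):
    """Parse DD/MM/YY into a date, or return None if it is not a real date."""
    if not re.fullmatch(r"\d{2}/\d{2}/\d{2}", text):
        return None
    day, month, yy = (int(part) for part in text.split("/"))
    # Assumption: years 00-05 mean 2000-2005, everything else 19xx.
    year = 2000 + yy if yy <= 5 else 1900 + yy
    try:
        return date(year, month, day)
    except ValueError:
        return None


def check_record(rec):
    """Return the names of the rules this record breaks."""
    broken = []
    # Blank employee numbers are allowed (new starter awaiting allocation).
    if rec["employee_no"] and not re.fullmatch(r"\d{6}", rec["employee_no"]):
        broken.append("employee_no_format")
    if not rec["first_name"].strip():
        broken.append("first_name_blank")
    if not re.fullmatch(r"[A-Z]{2}\d{2}", rec["role_code"]):
        broken.append("role_code_format")
    if rec["gender"].upper() not in ALLOWED_GENDERS:
        broken.append("gender_not_allowed")
    dob = parse_dob(rec["dob"])
    if dob is None or not DOB_MIN <= dob <= DOB_MAX:
        broken.append("dob_invalid_or_out_of_range")
    return broken


def duplicate_employee_nos(records):
    """Uniqueness rule: employee numbers that appear more than once."""
    counts = Counter(r["employee_no"] for r in records if r["employee_no"])
    return sorted(no for no, n in counts.items() if n > 1)


FIELDS = ("employee_no", "surname", "first_name", "gender", "dob", "role_code")
ROWS = [
    ("802540", "Smith", "Brian", "Female", "31/01/56", "PM16"),
    ("YN4176B", "Gregg", "", "Male", "07/09/80", "9999"),
    ("811609", "Patel", "Priya", "XXXX", "25/12/78", "AL60"),
    ("22298", "Bothroyd", "Bridget", "Female", "28/08/09", "TBD"),
    ("802540", "Smith", "Bryan", "Male", "31/01/56", "PM10"),
    ("855265", "Hayes", "Leslie", "Female", "00/00/00", "AL76"),
    ("", "Taylor", "Kevin", "Unknown", "12/30/69", "US18"),
]
records = [dict(zip(FIELDS, row)) for row in ROWS]
findings = {index: check_record(rec) for index, rec in enumerate(records)}
duplicates = duplicate_employee_nos(records)
```

These automated checks will not reproduce the slide's count of potential problems, since judgements such as a first name that looks inconsistent with the recorded gender still need a reference file or a human reviewer.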



Step 3: Deploying Business Rules - Approaches

• Data Entry Guidelines, Business Glossary & Training
• Master & Reference Data Management
• Application Code (e.g. data input validation)
• Data Quality Tool: DQ Business Rules Engine



Step 3: Automating Data Quality Business Rules via a DQ Rules Engine
Diagram labels:
• DATA QUALITY RULES ENGINE, providing Real Time Data Validation and Batch Validation
• DATA INPUT
• SOURCE SYSTEMS, STAGING / ETL LAYER, DATA WAREHOUSE, DATA MARTS, REPORTING LAYER
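
The idea of one rule set serving both real-time and batch validation can be sketched as a very small engine. This is a toy illustration with hypothetical class, method and rule names. Commercial DQ rules engines offer far more (rule versioning, scheduling, reporting).

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]
Rule = Callable[[Record], bool]  # returns True when the record passes


class RulesEngine:
    def __init__(self) -> None:
        self._rules: Dict[str, Rule] = {}

    def register(self, name: str, rule: Rule) -> None:
        """Add a named rule; the same rule then serves both modes."""
        self._rules[name] = rule

    def failures(self, record: Record) -> List[str]:
        """Names of the rules that the record fails."""
        return [name for name, rule in self._rules.items() if not rule(record)]

    def accept_input(self, record: Record) -> bool:
        """Real-time use: decide at the point of entry whether to accept."""
        return not self.failures(record)

    def validate_batch(self, records: Iterable[Record]) -> Dict[int, List[str]]:
        """Batch use: failing rules keyed by row index, bad rows only."""
        report = {}
        for index, record in enumerate(records):
            failed = self.failures(record)
            if failed:
                report[index] = failed
        return report


engine = RulesEngine()
engine.register("first_name_not_blank", lambda r: bool(r.get("first_name", "").strip()))
engine.register(
    "gender_allowed",
    lambda r: r.get("gender", "").upper() in {"FEMALE", "MALE", "SELF-DETERMINED", "UNKNOWN"},
)

# Made-up staged rows for illustration.
staged = [
    {"first_name": "Priya", "gender": "XXXX"},
    {"first_name": "Kevin", "gender": "Unknown"},
]
batch_report = engine.validate_batch(staged)
```

Keeping the rules in one registry means a change agreed with the business is applied consistently at data entry and in the staging layer.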
Step 4: Monitor & Report Adherence

• Once Business Rules are implemented, they can be used to:
  • Check continued adherence of existing data
  • Enforce the rules on new data to prevent new problems
• Best monitored via Data Quality Dashboards
  • Provide regular reports on adherence of data to Business Rules
  • Set KPIs to drive continuous data improvement
  • Identify data quality trends
  • Highlight areas where corrective action is required
  • Indicate where / if Business Rules may need to be amended to meet changing business needs
• When reporting always try to relate data quality to business outcomes
  • Address the ‘so what’ objection
  • Put a financial or other benefit on continued data quality improvement

Figure: Data Quality Dashboard
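
An adherence KPI of the kind a dashboard might show can be computed from per-rule pass/fail outcomes. The sketch below is illustrative: the function name, the 95% target and the sample outcomes are all hypothetical.

```python
def adherence_report(results, target_pct=95.0):
    """Percentage of records passing each rule, flagged against a target.

    `results` maps a rule name to a list of booleans (True = record passed).
    """
    report = {}
    for rule, outcomes in results.items():
        pct = 100.0 * sum(outcomes) / len(outcomes) if outcomes else 0.0
        report[rule] = {
            "adherence_pct": round(pct, 1),
            "meets_target": pct >= target_pct,
        }
    return report


# Made-up outcomes for two rules over four records.
sample_results = {
    "first_name_not_blank": [True, False, True, True],
    "role_code_format": [True, True, True, True],
}
kpis = adherence_report(sample_results)
```

Storing each run's percentages with a date would allow the trend reporting mentioned above, and a rule that suddenly drops below target may signal that either the data or the rule itself needs review.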



Summary

• Business Rules are key to uncovering data quality problems and driving data quality improvement
• Business Rules can be explicit or implicit so have to be discovered and created in a variety of ways
• Follow the simple 4 Step process outlined to ensure you optimize the value of Business Rules in your data quality initiatives
• Remember that Business Rules are not set in stone and need to be monitored and amended in line with changing organizational needs and requirements
• With data quality the business always ultimately rules, so Business Rules provide the means to enable this



Who We Are: Business-Focused Data Strategy
Maximize the Organizational Value of Your Data Investment

In today’s business environment, showing rapid time to value for any technical investment is critical.

But technology and data can be complex. At Global Data Strategy, we help demystify technical complexity to help you:
• Demonstrate the ROI and business value of data to your management
• Build a data strategy at your pace to match your unique culture and organizational style.
• Create an actionable roadmap for “quick wins”, while building towards a long-term scalable architecture.

Global Data Strategy shares experience from some of the largest international organizations scaled to the pace of your unique team.

Global Data Strategy has worked with organizations globally in the following industries:
Finance · Retail · Social Services · Health Care · Education · Manufacturing · Government · Public Utilities · Construction · Media & Entertainment · Insurance … and more


Thoughts? Ideas?
Questions?
