01 - Data Sciences


Data Sciences

Data science describes a cross-functional field which uses scientific methods to extract knowledge
from data. Techniques from disciplines like mathematics and statistics, computational and
information science are utilised to generate hypotheses and draw conclusions based on various data

“Data science” is still at peak interest for worldwide searches. Companies like Uber and Amazon
have built entire business models using data science methodologies. The pharma and healthcare
sector has been using data science methods for more than 50 years to provide data-driven evidence,
but this industry has also seen new players, especially in consumer devices such as wearables to
measure various health-related parameters.

The pharmaceutical industry is adapting to these new technologies, too. FDA-approved or cleared
devices are already used mainly for exploratory purposes in clinical trials and create huge volumes of
data. We can create valuable insights when we connect these new data sources to clinical data and
apply data science methods like machine learning, deep learning or artificial intelligence.
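
As a rough illustration of connecting a new data source to clinical data, the sketch below averages wearable readings per subject and attaches them to clinical records. The subject IDs, field names and values are hypothetical and invented for this example only; real trial data would follow the study's own data standards.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical clinical records keyed by subject ID (illustrative values only).
clinical = {
    "S001": {"arm": "treatment", "hba1c_change": -0.8},
    "S002": {"arm": "placebo", "hba1c_change": -0.1},
}

# Hypothetical wearable readings as (subject ID, daily step count) pairs.
wearable = [
    ("S001", 8200), ("S001", 7900), ("S001", 9100),
    ("S002", 4300), ("S002", 5100),
    ("S003", 6000),  # no matching clinical record, so it is not linked
]


def link_wearable_to_clinical(clinical, wearable):
    """Average wearable readings per subject and attach them to clinical records."""
    steps = defaultdict(list)
    for subject_id, count in wearable:
        steps[subject_id].append(count)

    linked = {}
    for subject_id, record in clinical.items():
        readings = steps.get(subject_id, [])
        linked[subject_id] = {
            **record,
            "mean_daily_steps": mean(readings) if readings else None,
            "n_wearable_days": len(readings),
        }
    return linked


linked = link_wearable_to_clinical(clinical, wearable)
for subject_id, record in linked.items():
    print(subject_id, record)
```

Only subjects with a clinical record are kept, so unmatched device data is dropped rather than silently creating new subjects. That is one possible design choice, not a requirement.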

"Data Sciences" vs. "Clinical Data Science"

Data science in general is a broad field that encompasses the use of data to gain insights, make
predictions, and inform decision-making. Clinical data science is a specialized subfield within data
science that focuses on the use of data to advance healthcare and medicine.
Some key differences between clinical data science and general data science include:

1. Scope: Clinical data science focuses specifically on data related to healthcare and medicine,
whereas data science can be applied to a wide range of fields and industries.

2. Data sources: Clinical data science often involves working with data from electronic health
records, clinical trials, and other healthcare-specific sources. Several data privacy
regulations apply to these types of patient data sources. They are in place to
protect the privacy of patients and ensure that their personal health information is used
ethically and responsibly. In contrast, data science can involve working with data from a
variety of sources, including social media, financial data, and more, which are not as strongly
regulated as patient-related data sources.

3. Applications: Clinical data science can be used to improve patient care, inform public health
policy, and optimize healthcare delivery. Statistical analysis is used to evaluate the safety
and effectiveness of new treatments, to understand how they compare to existing options,
and to identify trends and patterns in patient outcomes. The planning, data management,
statistical analysis and validation need to follow specific requirements. In contrast,
data science can be applied to a wide range of problems and goals, including improving
business operations, predicting outcomes, and more. These areas are usually less regulated.

As mentioned above, there are several requirements for statistical analysis in clinical data science:

1. Planning: Statistical analysis should be planned in advance of the start of a clinical trial. This
includes defining the objectives of the study, identifying the primary and secondary
endpoints, and determining the sample size and statistical methods to be used.

2. Data management: Clinical trial data should be collected, managed, and analyzed in a
consistent and transparent manner. This includes ensuring the accuracy and completeness
of the data, as well as establishing processes for handling missing data and other potential
sources of bias.

3. Analysis: Statistical analysis should be conducted in a rigorous and unbiased manner, using
appropriate statistical methods and software. The results of the analysis should be reported
transparently, including any limitations or assumptions made.

4. Validation: The results of the statistical analysis should be validated by an independent
statistical review committee or other appropriate bodies.
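
To make the planning step more concrete, here is a minimal sketch of a sample size calculation for a two-arm trial comparing means. It assumes a two-sided test, equal allocation, a common standard deviation and the normal approximation. The function name, parameters and example values are illustrative and not taken from the source; a real trial would follow its statistical analysis plan and validated software.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(delta, sigma, alpha=0.05, power=0.80):
    """Subjects per arm for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_(1 - alpha/2) + z_power) * sigma / delta) ** 2
    delta: clinically relevant difference in means (assumed, must be non-zero)
    sigma: common standard deviation of the endpoint (assumed)
    """
    if delta == 0:
        raise ValueError("delta must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return ceil(n)  # round up to whole subjects


# Detect a difference of half a standard deviation with 80% power.
print(sample_size_per_arm(delta=0.5, sigma=1.0))  # 63
```

Raising the target power or shrinking the difference to detect increases the required sample size, which is why these quantities are fixed before the trial starts.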

Overall, clinical data science is a highly specialized subfield within data science that focuses on the
use of data to advance healthcare and medicine.

The Importance of Quantitative Analytics for the Clinical Data Scientist

A thorough understanding of biostatistical methods is of critical importance for
a clinical data scientist to ensure traceable and repeatable data insight. Despite all the buzz
around machine learning and artificial intelligence, statistics remains the foundation for all of it.

A clinical data scientist needs to have a deep understanding of the quantitative analytical
methods which are usually used to describe the efficacy and safety of new drugs under
development in a study population of patients. In the traditional manner, quantitative analytical
methods are used to plan and analyse randomised clinical trials to create evidence-based
decisions using scientifically sound methods.

All these statistical methods have an impact on the data collection and analysis.
Therefore, the clinical data scientist needs to understand the statistical methods and concepts to
effectively generate additional data insight. This is especially true for cases when
the clinical data scientist wants to combine data generated from randomised clinical trials with other
data sources, such as real-world evidence data.

1 - Quantitative Analytics

2 - Outlier Detection in Clinical Trials

3 - Safety Analytics

Advanced Analytics
New data sources, like data from wearables and devices or other real-world data sources, quickly
generate huge amounts of data points. Traditional quantitative analytical methods are
no longer suitable to draw conclusions in an efficient manner. Here, advanced analytical
methods should be used.
Advanced analytics is an important area of clinical data science. Classic biostatistical methods
are used to describe data and test statistical hypotheses in clinical trials. Advanced analytics on the
other hand uses high-level methods and tools to focus on predicting future trends, events
and behaviours. It also includes newer technologies such as machine learning and artificial
intelligence, semantic analysis, visualisations, and even neural networks.

With advanced analytics, the pharmaceutical industry gains richer insights from multiple data
sources to reveal hidden patterns and relationships in clinical data or also operational data produced
during the conduct of clinical trials. There are many examples for the implementation of advanced
analytics methods. The use of advanced analytics methods can accelerate the site selection
process and help detect fraud and misconduct in clinical trials. Machine learning methods could
make the medical coding of clinical trial data more efficient or support the interpretation of
complex medical images such as MRIs.
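
As a small illustration of screening for unusual sites, the sketch below flags sites whose mean measurement deviates strongly from the other sites, using a robust z-score based on the median and the median absolute deviation (MAD). This is a simple statistical screen rather than a machine learning method, and it is not a method described in the source. The site names, values and the threshold of 3.5 are assumptions made for the example.

```python
from statistics import mean, median


def flag_unusual_sites(site_values, threshold=3.5):
    """Return the IDs of sites whose mean has a robust z-score above `threshold`.

    Robust z-score = 0.6745 * (site mean - median of site means) / MAD
    """
    site_means = {site: mean(values) for site, values in site_values.items()}
    centre = median(site_means.values())
    mad = median(abs(m - centre) for m in site_means.values())
    if mad == 0:
        return []  # no spread between sites, so nothing can be ranked
    return sorted(
        site
        for site, m in site_means.items()
        if abs(0.6745 * (m - centre) / mad) > threshold
    )


# Hypothetical systolic blood pressure readings per site (illustrative only).
example_sites = {
    "site_a": [128, 135, 122, 140],
    "site_b": [131, 126, 138, 129],
    "site_c": [133, 127, 136, 124],
    "site_d": [120, 120, 120, 120],  # low and suspiciously uniform
    "site_e": [130, 134, 129, 135],
}

print(flag_unusual_sites(example_sites))
```

A flagged site is only a prompt for closer review. Unusual values can have legitimate causes, such as a different patient population.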

Therefore, a thorough understanding of advanced analytics methods is vital for the success of the
pharmaceutical industry to harness the power of data extensively to become more responsive,
competitive and innovative.

4 - Advanced Analytics - Machine Learning

5 - Advanced Analytics - Natural Language Processing

Data Visualisation
In a nutshell, data visualisation is the graphical representation of data or information. Visual
elements such as charts, graphs and maps are a few of the data visualisation tools that give users
an easy way of understanding the represented information.

The increased availability of data sources, and the huge amount of data in particular, have also
made data visualisation a hot topic for the pharmaceutical industry. Great data visualisation
packages are one of the reasons why the open-source programming languages R and
Python have gained a lot of traction in data science. There are also many data visualisation software
packages available, such as SAS/JMP, Spotfire or Tableau, which enable the user to process and
visually analyse large volumes of data.

Regardless of the sophistication of these data visualisation packages, they are still just tools. The
results from a tool are only as good as the knowledge and skill of the professional using the tool.
Learning the syntax of a programming language or how to use a given software package is fairly
straightforward. Obtaining the skill to develop effective data visualisations is less straightforward
and remains a challenge for a majority of programmers and statisticians.

6 - Data Visualisations

Data Storytelling
Data storytelling is the result of all other quantitative, advanced and visual analytical methods. It is
the next elusive step to extract the real value out of the data. Not an ounce of value can be created
unless insights are uncovered and translated into actions or decisions.

Many individuals with advanced degrees in computer science, mathematics or statistics struggle
with communicating their insights to others effectively – essentially, telling the story of their
numbers. If an insight isn’t understood and isn’t compelling, no one will act on it and no change will
occur.

It’s important for a clinical data scientist to understand how the different elements of
data, visualisation and narrative combine and work together in data storytelling.

7 - Data Storytelling
8 - Freytag's Pyramid and Clinical Data Stories

9 - Data Storytelling - Diabetes Study

Data Science Use Cases in the Pharmaceutical and Healthcare Industry

There are many use cases for clinical data science in the pharmaceutical and healthcare industry.
Some examples include:

1. Drug development: Clinical data science can be used to identify new drug targets, optimize
clinical trial designs, optimize the site selection process, conduct risk-based monitoring,
perform pattern recognition to detect fraud or misconduct, and evaluate the safety and
effectiveness of new treatments.

2. Patient care: Clinical data science can be used to identify patterns in patient care and
outcomes, inform clinical decision-making, and optimize treatment plans.

3. Public health: Clinical data science can be used to track and monitor the spread of diseases,
identify risk factors, and inform public health policy.

4. Population health: Clinical data science can be used to identify trends and patterns in
population health and inform efforts to improve the health of specific populations.

Besides these traditional use cases, new data sources from wearables or devices open a lot of new
opportunities for the pharmaceutical and healthcare industry. The development of digital biomarker
endpoints could lead to new additional insights from a population which has previously been
excluded from the patient data collection process. This can potentially produce a lot of new holistic
insights into the development of medical conditions and eventually completely change the drug
development process.
Overall, clinical data science has the potential to improve patient care, advance medical research,
and inform public health policy in the pharmaceutical and healthcare industry.

Reference Material
Recommended PHUSE educational material
Introductions into Data Science

• Data Sciences Project (Educating for the Future Working Group), PHUSE US Connect 2020,
Sascha Ahrweiler (Bayer), Aldir Medeiros Filho

• Data Science Applications and Scenarios, PHUSE US Connect 2018,
Giri Balasubramanian (PRA Health Sciences), Ponraj Thangarajan (PRA Health

• Adventures in Data Science, PHUSE EU Connect 2016, Kieran Martin (Roche)

• Managing the Change – Evolving from Statistical Programmers to Clinical Data Scientists,
PHUSE EU Connect 2014, Sascha Ahrweiler (Bayer)

Advanced Analytics

• Introduction into Machine Learning, Katja Glass (Bayer)

Data Visualisations

• Concepts and Strategies for Developing Effective Data Visualizations, PHUSE EU Connect
2016, Becky Bates (GCE Solutions)

• Building Interactive Web Apps - Experiences with R Shiny, PHUSE EU Connect
2019, Stephanie Fechtner (Bayer)

• Others from: https://lexjansen.com/cgi-bin/xsl_transform.php?x=sdv&c=phuse

Data Storytelling

• Data Storytelling: the essential data science skill everyone needs, Brent Dykes, Forbes, 2016

Recommended Readings
• Data Science for Business: What you need to know about data mining and data-analytic
thinking, Foster Provost, Tom Fawcett

• Essential Math for Data Science, Tirthajyoti Sarkar, Medium, 2018

• Essential Math for Data Science, Hadrien Jean, O’Reilly, September 2020

Recommended Videos
• Data Science Tutorial Videos – Simplilearn

• freeCodeCamp – https://www.youtube.com/channel/UC8butISFwT-Wl7EV0hUK0BQ
• Selected TED talks related to data science in healthcare:

• Big Data in Healthcare – https://www.youtube.com/watch?v=7t75CNC34vU

• Data driven healthcare: It’s personal – https://www.youtube.com/watch?v=Y3phCyMynos

• Big Data, Small World – https://www.youtube.com/watch?v=Zr02fMBfuRA

Recommended Websites
• https://towardsdatascience.com – Sharing concepts, ideas and codes through well-written
and informative articles using the Medium platform

• https://www.oreilly.com/radar/topics/ai-ml – Radar on artificial intelligence
and machine learning from O’Reilly. This is a collection of well-written articles, videos and

• https://www.kdnuggets.com – Leading award-winning blog on AI, analytics, big data,
data mining, data science, and machine learning, edited by Gregory Piatetsky-Shapiro
and Matthew Mayo

• https://blogs.oracle.com/datascience/ – Nicely written and genuinely useful Oracle
AI & data science blog for anyone who wants to analyse data

• https://blogs.sas.com/content/subconsciousmusings – The SAS Data Science Blog features
the perspectives of SAS data scientists, as they share the technical methods used to solve
many of the challenging problems facing organisations today.
