
INTRODUCTION TO DATA SCIENCE
OUTLINE
• Data, Big Data, and Challenges
• Data Science
• Introduction
• Why Data Science
• Data Scientists
• What do they do?
• Major/Concentration in Data Science
• What courses to take
DATA ALL AROUND
• Lots of data is being collected and warehoused
• Web data, e-commerce
• Financial transactions, bank/credit transactions
• Online trading and purchasing
• Social networks
HOW MUCH DATA DO WE HAVE?
• Google processes 20 PB a day (2008)
• Facebook has 60 TB of daily logs
• eBay has 6.5 PB of user data + 50 TB/day (5/2009)
• 1000 Genomes Project: 200 TB
• Cost of 1 TB of disk: $35
• Time to read a 1 TB disk: about 3 hrs (at 100 MB/s)
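A quick sanity check of the read-time figure above, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a constant sequential rate with no seek overhead. The second function applies the same arithmetic to a hypothetical case: how many such disks would need to read in parallel to make a single pass over 20 PB within one day.

```python
import math

# Assumption: decimal units, as disk vendors use them.
MB = 10**6
TB = 10**12
PB = 10**15
SECONDS_PER_DAY = 24 * 60 * 60


def read_time_hours(size_bytes: int, bytes_per_second: int) -> float:
    """Hours needed to scan size_bytes sequentially at a constant rate."""
    return size_bytes / bytes_per_second / 3600


def disks_needed_per_day(bytes_per_day: int, bytes_per_second: int) -> int:
    """Minimum number of disks reading in parallel to scan bytes_per_day in one day."""
    seconds_for_one_disk = bytes_per_day / bytes_per_second
    return math.ceil(seconds_for_one_disk / SECONDS_PER_DAY)


print(f"{read_time_hours(1 * TB, 100 * MB):.2f} h")  # 10,000 s -> 2.78 h
print(disks_needed_per_day(20 * PB, 100 * MB))       # 2315 disks in parallel
```

At single-disk speeds, one pass over data on this scale would take years, which is why it has to be spread across many disks and machines at once.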
BIG DATA
Big Data is any data that is expensive to manage and hard to extract value from
• Volume
• The size of the data
• Velocity
• The latency of data processing relative to the growing demand for interactivity
• Variety and Complexity
• The diversity of sources, formats, quality, and structures
TYPES OF DATA WE HAVE
• Relational Data (Tables/Transaction/Legacy Data)
• Text Data (Web)
• Semi-structured Data (XML)
• Graph Data
• Social Network, Semantic Web (RDF), …
• Streaming Data
• You can only afford to scan the data once
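With streaming data each item can be looked at only once, so any statistic has to be updated incrementally as items arrive. As one illustration (the slide does not name an algorithm), Welford's method keeps a running mean and variance in constant memory. `RunningStats` and `sensor_stream` are hypothetical names, and the stream is a short in-memory list standing in for a real feed.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class RunningStats:
    """One-pass (streaming) mean and variance using Welford's algorithm.

    Only three numbers are stored, so memory stays constant no matter
    how many items flow past.
    """
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance (n - 1 denominator); 0.0 until two items have been seen."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


def sensor_stream() -> Iterator[float]:
    """Hypothetical stream; in practice this could be a socket or a log tail."""
    yield from [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]


stats = RunningStats()
for reading in sensor_stream():  # each value is seen exactly once
    stats.update(reading)

print(stats.n, stats.mean, stats.variance)
```

Nothing is stored except `n`, `mean`, and `m2`, so the same loop works whether the stream has eight items or eight billion.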
WHAT TO DO WITH THESE DATA?
• Aggregation and Statistics
• Data warehousing and OLAP
• Indexing, Searching, and Querying
• Keyword-based search
• Pattern matching (XML/RDF)
• Knowledge discovery
• Data Mining
• Statistical Modeling
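Two of the tasks listed above can be sketched with only the Python standard library: aggregation as a SQL `GROUP BY` (the kind of summary that data warehousing and OLAP systems compute at much larger scale), and keyword-based search with a minimal inverted index. The table, rows, and documents are made-up illustration data, and the boolean AND search is a simplification; real search engines add tokenization, ranking, and compression.

```python
import sqlite3
from collections import defaultdict

# --- Aggregation and statistics: GROUP BY over a toy transactions table ---
# Table name, columns, and rows are hypothetical illustration data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (customer TEXT, category TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        ("alice", "books", 12.50),
        ("alice", "groceries", 40.00),
        ("bob", "books", 30.00),
        ("bob", "electronics", 199.99),
        ("carol", "groceries", 25.00),
    ],
)
totals = conn.execute(
    "SELECT category, COUNT(*), SUM(amount) FROM transactions "
    "GROUP BY category ORDER BY category"
).fetchall()
for category, count, total in totals:
    print(f"{category:12s} {count} purchases, {total:.2f} total")
conn.close()


# --- Keyword-based search: a minimal inverted index ---
def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the set of document ids containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents containing every query word (boolean AND)."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


pages = {
    "p1": "big data needs scalable storage",
    "p2": "data mining finds patterns in data",
    "p3": "streaming data is scanned once",
}
index = build_inverted_index(pages)
print(sorted(search(index, "data patterns")))  # ['p2']
```

The index is built once and then answers each query by intersecting small sets, instead of rescanning every document for every query.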
WHAT IS DATA SCIENCE?

• An area that manages, manipulates, extracts, and


interprets knowledge from tremendous amount of data
• Data science (DS) is a multidisciplinary field of study with
goal to address the challenges in big data
• Data science principles apply to all data – big and small

https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/
WHAT IS DATA SCIENCE?
• Theories and techniques from many fields and disciplines are used to investigate and analyze large amounts of data to help decision-makers in many industries, such as science, engineering, economics, politics, finance, and education
• Computer Science
• Pattern recognition, visualization, data warehousing, high-performance computing, databases, AI
• Mathematics
• Mathematical Modeling
• Statistics
• Statistical and stochastic modeling, probability
DATA SCIENCE
REAL-LIFE EXAMPLES
• Companies learn your secrets, shopping patterns, and preferences
• Data science and elections (2008, 2012)
• One million people installed the Obama Facebook app, which gave access to information about their “friends”
DATA SCIENTISTS

• Data scientists are the key to realizing the opportunities


presented by big data. They bring structure to it, find
compelling patterns in it, and advise executives on the
implications for products, processes, and decisions
• They find stories, extract knowledge. They are not reporters
WHAT DO DATA SCIENTISTS DO?
• National Security
• Cyber Security
• Business Analytics
• Engineering
• Healthcare
• And more…
CONCENTRATION IN DATA SCIENCE

• Mathematics and Applied Mathematics


• Applied Statistics/Data Analysis
• Solid Programming Skills (R, Python, Julia, SQL)
• Data Mining
• Data Base Storage and Management
• Machine Learning and discovery