
Data Analytics – Intro to Key Concepts

What is Data?

• Data is a collection of facts, such as numbers, words, measurements, observations, or just descriptions of things, kept for reference or analysis.

• E.g: Tanjore, Raja Raja Chola, Big Temple, 1010 AD

• Information: Data linked to objects or persons to state a fact.

• E.g: Raja Raja Chola built Big Temple at Tanjore in 1010 AD.

• Knowledge: Extraction, summarization, or interpretation of information.

• E.g: When we tell a story, we give the moral at the end, which may not appear verbatim in the story.

Types of Data

• Data can be in various forms:

• Structured: e.g. an identifier such as 21MBA0102, stored in a database and accessed via SQL (RDBMS)

• Unstructured: emails, WhatsApp messages, social media, … typically stored in NoSQL databases (e.g. MongoDB)

• Text : English, Multilingual

• Hypertext : Used in Web (connecting web pages)

• Image and Graphics – Pictorial representation

• Audio : Speech and Voice

• Video : Stored and Live

• Animation

• Sensor data : Analog form (can be converted to digital)

Amount of data generated every day

• 500 million tweets are sent

• 294 billion emails are sent

• 4 petabytes of data are created on Facebook

• 4 terabytes of data are created from each connected car

• 65 billion messages are sent on WhatsApp


• 5 billion searches are made

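• By 2025, it’s estimated that 463 exabytes of data will be created each day globally. This is often quoted as the equivalent of 212,765,957 DVDs per day, but at 4.7 GB per DVD that many discs hold only about 1 exabyte; 463 exabytes would fill roughly 98.5 billion DVDs.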

What is Data Analytics?

• Data analytics refers to the process and practice of analyzing data to answer questions,
extract insights, and identify trends.

• This is done using an array of tools, techniques, and frameworks depending on the
type of analysis being conducted. (Ref : Harvard Business School Online)

• The four major types of analytics include:

• Descriptive analytics, which looks at data to examine, understand, and describe something that’s already happened.

• Diagnostic analytics, which goes deeper than descriptive analytics by seeking to understand the why behind what happened.

• Predictive analytics, which relies on historical data, past trends, and assumptions to
answer questions about what will happen in the future.

• Prescriptive analytics, which aims to identify specific actions that one should take to
reach future targets or goals.

Business Analytics

• Applying data analytics tools and methodologies in a business setting is typically referred to as business analytics.

• Budgeting and forecasting: By assessing a company’s historical revenue, sales, and cost data alongside its goals for future growth, an analyst can identify the budget and investments needed.

• Risk management: By understanding the business risks that may occur, and their associated costs, an analyst can make cost-effective recommendations to help mitigate them.

• Marketing and sales: By understanding key metrics, such as the lead-to-customer conversion rate, a marketing analyst can identify the number of leads one must generate to fill the sales pipeline.

• Product development (or research and development): By understanding how customers have reacted to product features in the past, an analyst can help guide an organization’s product development, design, and user experience in the future.
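As a rough illustration of the marketing and sales point above: the number of leads needed is the target number of customers divided by the conversion rate. The figures and the function name `leads_needed` are hypothetical and chosen only to show the arithmetic.

```python
import math

def leads_needed(target_customers: int, conversion_rate: float) -> int:
    """Leads required to win `target_customers` at a given lead-to-customer rate."""
    if not 0 < conversion_rate <= 1:
        raise ValueError("conversion_rate must be a proportion in (0, 1]")
    # Round up: a fraction of a lead cannot be generated.
    return math.ceil(target_customers / conversion_rate)

# Hypothetical target: 50 new customers at a 4% conversion rate.
required = leads_needed(50, 0.04)
print(required)
```

With these assumed figures the pipeline needs 1,250 leads; halving the conversion rate doubles the leads required.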

What is Data Science?


• Data analytics is mainly focused on understanding datasets and gleaning insights
whereas data science is centered on building, cleaning, and organizing datasets.

• Data scientists create and leverage algorithms, statistical models, and their own
custom analyses to collect and shape raw data into something that can be easily understood.

• Data scientists perform some key functions:

• Data wrangling: Cleaning and organizing data so that it is ready to use.

• Statistical modeling: Running data through different models (such as regression, classification, and clustering) to identify relationships between variables and gain insight from the numbers.

• Programming: Writing computer programs and algorithms in a variety of languages (such as R, Python, and SQL) to analyze large datasets more efficiently than manual analysis allows.
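A minimal sketch of the first two functions above, using only the Python standard library. The records, field names, and helper functions are all hypothetical; real projects would more likely rely on a dedicated library.

```python
# Hypothetical raw records: ad spend and sales, with gaps and stray text.
raw = [
    {"ad_spend": "10", "sales": "25"},
    {"ad_spend": "20", "sales": "45"},
    {"ad_spend": "", "sales": "30"},      # missing value
    {"ad_spend": "30", "sales": "n/a"},   # unusable value
    {"ad_spend": "40", "sales": "85"},
]

def wrangle(records):
    """Data wrangling: keep only rows where both fields parse as numbers."""
    clean = []
    for row in records:
        try:
            clean.append((float(row["ad_spend"]), float(row["sales"])))
        except ValueError:
            continue  # drop rows that cannot be converted
    return clean

def fit_line(points):
    """Statistical modeling: ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

clean = wrangle(raw)
slope, intercept = fit_line(clean)
print(clean, slope, intercept)
```

Two of the five made-up rows are dropped during wrangling, and the remaining three happen to lie on a straight line, so the fitted slope and intercept describe them exactly.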

Skills needed for a Data Analyst

• Data Analysis can be used in any branch of Business

– Human Resource

– Finance

– Operations

– Sales / Marketing

• Domain Knowledge

– Go to the field, talk to customers, and find their pain points

• Analytical Thinking

– Problem solving using Tools

– Critical Thinking

• Applying the right tool for the given problem (often Excel will do)

• Communication

– Presentation skills (e.g. PowerPoint) are especially important

Ways of Looking at Data

• Time Series and cross-sectional data

• Time series data record developments over time; e.g. monthly ice-cream output.

• Cross-sectional data capture a situation at a single moment in time; e.g. the value of sales at various branches on one day.

• Scales of Measurement

• Nominal or categorical data identify classifications only; e.g. sex (male/female), departments (HR/Finance/Marketing), sales regions (North, South, East, West). No quantities are implied, so no maths can be done on them.

• Ordinal or Ranked data

• Categories can be sorted into a certain order (ascending / descending), but differences between ranks are not necessarily equal. e.g. customer feedback (bad / average / good / excellent). Ranks can be compared, but arithmetic on them is not meaningful.

• Interval scale data

• Measurable differences are identified, but the zero point is arbitrary. e.g. Is 20 deg Celsius twice as hot as 10 deg Celsius? Convert to Fahrenheit to see that it is not: the equivalents are 68 deg and 50 deg Fahrenheit. Temperature is measured on an interval scale with an arbitrary zero point (0 deg Celsius is 32 deg Fahrenheit), so differences can be compared but ratios cannot.

• Ratio scale data

• There is a true zero and measurements can be compared as ratios. e.g, if Ram, Gopi,
and Bala achieve a sales target of Rs. 250K, 500K and 1000K in a given month, then we can
claim that the achievement of Bala is twice that of Gopi and 4 times that of Ram – We can
divide and do the Maths.
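The interval and ratio scale examples above can be checked with a short sketch. The function name `c_to_f` is only illustrative; the figures are the ones used in the text.

```python
def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

# Interval scale: the ratio changes with the unit, so it carries no meaning.
ratio_celsius = 20 / 10
ratio_fahrenheit = c_to_f(20) / c_to_f(10)   # 68 / 50

# Ratio scale: sales have a true zero, so ratios are meaningful.
sales = {"Ram": 250_000, "Gopi": 500_000, "Bala": 1_000_000}
bala_vs_gopi = sales["Bala"] / sales["Gopi"]
bala_vs_ram = sales["Bala"] / sales["Ram"]

print(ratio_celsius, ratio_fahrenheit, bala_vs_gopi, bala_vs_ram)
```

The same two temperatures give a ratio of 2 in Celsius but 1.36 in Fahrenheit, while the sales ratios (2 and 4) would stay the same in any currency unit.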

• Continuity

• Some results make sense in only one type of number, e.g. we cannot assign 0.4 of a salesman to promote a product in a region. Here only whole numbers are permitted.

• Discrete values are counted in whole numbers (integers); e.g, the number of satisfied
customers

• Continuous variables do not increase in steps. Measurements such as heights and weights are continuous – Real (decimal) numbers are permitted.

• Fractions, percentages and proportions

• Monetary systems are based on 100 subdivisions. e.g, 100 paise is equal to 1 Rupee;
100 cents to the dollar.
• Amounts less than one big unit are fractions.

• 50 paise = 0.5 rupee or 50% of one Rupee.

• Fractions, proportions and percentages are all the same thing with different names.

• ¾ means divide 3 by 4 = 0.75 (proportion). If we multiply it by 100 we get the percentage (75%).
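The equivalence of fractions, proportions and percentages can be seen in a short sketch, here using Python's standard `fractions` module. The variable names are illustrative only.

```python
from fractions import Fraction

three_quarters = Fraction(3, 4)
proportion = float(three_quarters)   # 3 divided by 4
percentage = proportion * 100        # proportion times 100

# 50 paise as a share of one rupee (100 paise).
paise_share = Fraction(50, 100)      # reduces to 1/2

print(three_quarters, proportion, f"{percentage:.0f}%", paise_share)
```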

• Percentage increases and decreases

• A percentage increase followed by the same percentage decrease does not take us back to where we started. e.g. do not accept a 50% increase in salary for six months, to be followed by a 50% cut.

• e.g. Rs. 1000 increased by 50% is Rs. 1500.

• A 50% decrease on Rs. 1500 then leaves Rs. 750, not the original Rs. 1000.

• A frequent business problem is finding what a number was before it was increased by
a given percentage.

• Simply divide by (1+i), where i is the percentage increase expressed as a proportion.

• E.g, If an invoice is for Rs. 575 including 15% GST, the tax exclusive amount is
575/(1.15) = Rs. 500.
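Both percentage examples above can be reproduced with two small helper functions. The names `change_by` and `before_increase` are made up for this sketch; rates are passed as proportions.

```python
def change_by(amount: float, rate: float) -> float:
    """Apply a percentage change given as a proportion (0.5 = +50%, -0.5 = -50%)."""
    return amount * (1 + rate)

def before_increase(total: float, rate: float) -> float:
    """Recover the original amount from a total that already includes the increase."""
    return total / (1 + rate)

raised = change_by(1000, 0.5)       # Rs. 1000 raised by 50%
cut = change_by(raised, -0.5)       # then cut by 50%: not back to 1000
net = before_increase(575, 0.15)    # invoice amount before 15% GST

print(raised, cut, round(net, 2))
```

Note that the reverse calculation divides by (1 + i); subtracting 15% of Rs. 575 would give Rs. 488.75, which is not the tax-exclusive amount.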

• Fractions

• If anything is increased by an amount x/y, the increment is x/(x+y) of the new total.

• E.g, if Rs. 100 is increased by ½ to Rs. 150, the increment of Rs. 50 is 1/(1+2) = 1/3 of the new total.

• Rs. 100 increased by ¾ is Rs. 175; the Rs. 75 increment is 3/(3+4) = 3/7 of the new
total, namely, Rs. 175
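A quick check of the x/(x+y) rule above, using exact fractions so that no rounding creeps in. The helper name `increment_share` is hypothetical.

```python
from fractions import Fraction

def increment_share(increase: Fraction) -> Fraction:
    """Share of the new total made up by the increment, for an increase of x/y."""
    x, y = increase.numerator, increase.denominator
    return Fraction(x, x + y)

for increase in (Fraction(1, 2), Fraction(3, 4)):
    new_total = 100 * (1 + increase)
    increment = new_total - 100
    # Both routes give the same share: x/(x+y) and increment / new_total.
    assert increment_share(increase) == increment / new_total
    print(increase, new_total, increment, increment_share(increase))
```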

• Scientific notations (expressing small and big numbers)

• Individuals use small amounts of cash; corporates talk of huge amounts of cash.

• E.g, Thousand, Million, Billion, Trillion, etc.

• Scientific notation saves time when writing out very large and very small numbers.

• E.g, 1.25 x 10^6 is equal to 1,250,000 (written 12,50,000 in Indian digit grouping);

• 1.25 x 10^-6 is equal to 0.00000125
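Most programming languages accept scientific notation directly. A small Python sketch of the two numbers above follows; the format strings are standard Python, and the separators shown are Western-style thousands grouping, not Indian digit grouping.

```python
large = 1.25e6    # 1.25 x 10^6
small = 1.25e-6   # 1.25 x 10^-6

large_text = f"{large:,.0f}"      # thousands separators, no decimals
small_text = f"{small:.8f}"       # fixed-point with 8 decimal places
sci_text = f"{1250000:.2e}"       # back to scientific notation

print(large_text, small_text, sci_text)
```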

• Rounding
• Values ending in 4 or less are rounded down (1.24 becomes 1.2), amounts ending in 5
or more are rounded up (1.25 becomes 1.3).

• "Two times two equals four" can be wrong once numbers have been rounded:

• 1.5 and 2.4 both round to 2

• 1.5 x 1.5 is 2.25, which rounds to 2

• 2.4 x 2.4 is 5.76, which rounds to 6

• 1.45 rounds to 1.5, which rounds a second time to 2 despite the original being nearer
to 1.

• Note: Round only after multiplying or dividing, not before.
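The rounding examples above can be reproduced as follows. One caveat: Python's built-in `round()` works on binary floats and rounds exact halves to the nearest even digit, so this sketch uses the standard `decimal` module with `ROUND_HALF_UP` to match the rule stated in the text. The helper name `round_half_up` is made up for the example.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value: str, places: str = "1") -> Decimal:
    """Round a decimal string half up; places="1" is whole numbers, "0.1" one decimal."""
    return Decimal(value).quantize(Decimal(places), rounding=ROUND_HALF_UP)

# Rounding the inputs first hides very different products.
for raw in ("1.5", "2.4"):
    product = Decimal(raw) * Decimal(raw)
    print(raw, "rounds to", round_half_up(raw),
          "| its square rounds to", round_half_up(str(product)))

# Rounding twice drifts: 1.45 -> 1.5 -> 2, although 1.45 is nearer to 1.
once = round_half_up("1.45")
twice = round_half_up(str(round_half_up("1.45", "0.1")))
print(once, twice)
```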

• Relationship between Proportions and Growth

• E.g, the finance director has received annual 10% pay raises for the last 10 years. By how much has his salary increased? Not 100%, but about 160%.

• Think of proportionate increase. He earned 1.1 times the amount in the year before.

• In year one, he received the base amount (1.0) times 1.1 = 1.1.

• In year two, total growth was 1.1 x 1.1 = 1.21.

• In year three, 1.21 x 1.1 = 1.331, and so on up to 2.358 x 1.1 = 2.594 in the tenth year.

• Take away 1 and multiply by 100 to get a 159.4% increase (rounded to 160%).

• Powers – When the growth rate is always the same, multiply the proportion by itself a number of times.

• Here, 1.1 was multiplied by itself 10 times (in Maths, 1.1^10).

• A 2.0% monthly price rise of a commodity is equivalent to an annual rate of inflation of 26.8% (1.02^12 = 1.268), not 24%.

• If India’s GDP is 1.7% higher during the Jan-March quarter than during the Oct-Dec quarter, this is equivalent to an annual rate of increase of about 7% (1.017 x 1.017 x 1.017 x 1.017 = 1.070).
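The three growth examples above all use the same calculation: raise (1 + rate) to the number of periods and subtract 1. A minimal sketch, with the hypothetical helper name `total_growth`:

```python
def total_growth(rate: float, periods: int) -> float:
    """Cumulative growth, as a proportion, of a constant per-period rate."""
    return (1 + rate) ** periods - 1

salary = total_growth(0.10, 10)      # ten annual 10% raises
inflation = total_growth(0.02, 12)   # 2% a month, annualized
gdp = total_growth(0.017, 4)         # 1.7% a quarter, annualized

for label, value in (("salary", salary), ("inflation", inflation), ("GDP", gdp)):
    print(f"{label}: {value:.1%}")
```

Simply multiplying the rate by the number of periods (100%, 24%, 6.8%) understates the result each time, because it ignores growth on earlier growth.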

• Brackets – When the order of operation is important, we use brackets.

• 4 x (2 + 3 ) = 20 is different from (4 x 2) + 3 = 11.

• We may use more than one pair of brackets: [(4 x 2) + 3] x 6 = 66.

• Perform innermost ones first and then go to outer ones.
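The bracket examples above behave the same way in code. In Python only round brackets group arithmetic (square brackets build lists), so the nested example is written with two pairs of round brackets.

```python
a = 4 * (2 + 3)         # brackets first: 4 x 5
b = (4 * 2) + 3         # 8 + 3
c = ((4 * 2) + 3) * 6   # innermost first: 8, then 11, then x 6
d = 4 * 2 + 3           # no brackets: multiplication is done before addition

print(a, b, c, d)
```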


• In statistics, Roman letters are used for sample data (e.g. p = proportion from a sample).

• Greek letters indicate population data (e.g. π = proportion from a population).

• Logarithms

• A logarithm is another name for a power or exponent.

• Ten raised to the power of 3 is 10^3 = 10 x 10 x 10 = 1000.

• So 3 is the logarithm of 1000 to the base 10.

• Logs are used for flattening out growth rates.

• In Maths, multiplication and division of large numbers can be done by adding and subtracting their logarithms.

• Used in outlier analysis and visualization of data whose range (difference between max and min) is high.

• Used for converting nonlinear data into a linear form.
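A short sketch of the logarithm points above, using Python's standard `math` module. The two large numbers and the growth series are made-up values for illustration.

```python
import math

log_1000 = math.log10(1000)   # the power to which 10 is raised to give 1000

# Multiplying large numbers by adding their logs.
x, y = 2_500_000, 40_000
product_via_logs = 10 ** (math.log10(x) + math.log10(y))

# A 10% compound growth series is curved, but its logs rise in equal steps.
series = [100 * 1.1 ** year for year in range(5)]
log_steps = [math.log10(b) - math.log10(a) for a, b in zip(series, series[1:])]

print(log_1000, round(product_via_logs), [round(s, 4) for s in log_steps])
```

Because every step in the log series is the same size, plotting the logs of a constant-growth series gives a straight line, which is the sense in which logs flatten out growth rates.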
