
GLOBAL ACADEMY OF TECHNOLOGY

Department of Artificial Intelligence and Data Science


Affiliated to VTU, Accredited by NAAC with 'A' grade
RR Nagar, Bengaluru – 560 098

20ADS45: Foundations of Data Science


Question Bank for Module 3

Sl.No Question
MODULE 3
1 Define the following terminologies:
(i) Population (ii) Parameter (iii) Sample of population (iv) Statistic

2 List the two main methods of data collection. Highlight the pros and cons of each approach.

3 Discuss the following types of sampling:
(i) Probability Sampling (ii) Random Sampling (iii) Unequal Probability Sampling

4 Discuss the various measures of center.
Which measure of center is more sensitive to outliers?

5 Survey question: How many semester hours are you taking this semester?
Responses: 15, 12, 18, 12, 15, 15, 12, 18, 15, 16
Compute the various measures of center for the data above.
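A quick way to check the hand computation for the survey responses is Python's standard statistics module. This sketch computes the mean, median and mode only. Because there are 10 responses (an even count), the median is the average of the 5th and 6th sorted values.

```python
import statistics

# Survey responses: semester hours per student
hours = [15, 12, 18, 12, 15, 15, 12, 18, 15, 16]

mean = statistics.mean(hours)      # sum / count = 148 / 10
median = statistics.median(hours)  # average of the two middle sorted values (n is even)
mode = statistics.mode(hours)      # most frequent value (15 appears 4 times)

print(mean, median, mode)          # 14.8 15.0 15
```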

7 The annual earnings for employees of a certain restaurant are given below:
12 laborers earn $8,000 each.
10 laborers earn $9,000 each.
4 supervisors earn $11,000 each.
The owner/manager earns $240,000.
Of the three measures of central tendency, which will be the least accurate representation of "typical earnings"?
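One way to see the effect of the single large salary is to expand the earnings into one entry per employee (27 in total) and compute all three measures. In this sketch the mean comes out near $17,407. That figure is above what 26 of the 27 employees earn, while the median ($9,000) and mode ($8,000) stay close to the typical wage.

```python
import statistics

# One entry per employee: 12 + 10 + 4 + 1 = 27 people
earnings = [8000] * 12 + [9000] * 10 + [11000] * 4 + [240000]

mean = statistics.mean(earnings)      # pulled upward by the $240,000 outlier
median = statistics.median(earnings)  # 14th value in sorted order
mode = statistics.mode(earnings)      # most common salary

print(len(earnings), round(mean, 2), median, mode)  # 27 17407.41 9000 8000
```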

8 The table shows the relative frequency distribution of a population of retail store employees according to hourly wage.
Find: (i) Mean wage (ii) Median wage
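The wage table for this question is not reproduced in this copy. The sketch below therefore takes the wage values and their relative frequencies as inputs. It assumes the table lists individual wage values (not wage classes) in ascending order, and the function names are hypothetical. The mean is the sum of x·P(x). The median is the first wage at which the cumulative relative frequency reaches 0.5, or the midpoint of two wages if it lands exactly on 0.5.

```python
import math

def mean_from_relative_freq(values, rel_freqs):
    """Population mean of a discrete distribution: sum of x * P(x)."""
    return sum(x * p for x, p in zip(values, rel_freqs))

def median_from_relative_freq(values, rel_freqs):
    """Median from (value, relative frequency) pairs, values sorted ascending.

    Returns the first value where the cumulative relative frequency exceeds 0.5;
    if it equals 0.5 exactly, returns the midpoint of that value and the next one.
    """
    cumulative = 0.0
    for i, (x, p) in enumerate(zip(values, rel_freqs)):
        cumulative += p
        if math.isclose(cumulative, 0.5):
            return (x + values[i + 1]) / 2
        if cumulative > 0.5:
            return x
    raise ValueError("relative frequencies must sum to 1")
```

Usage: call mean_from_relative_freq(wages, rel_freqs) and median_from_relative_freq(wages, rel_freqs) with the two columns of the table.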

9 Discuss the salient features of the following measures of variation:
(i) Range (ii) Interquartile Range (iii) Variance (iv) Standard deviation

10 Midterm exam scores for a small advanced neuroanatomy class are provided below.
Scores represent the percent of items marked correct on the exam.
87, 99, 75, 87, 94, 75, 35, 88, 87, 93
Compute:
(i) Range (ii) Interquartile Range (iii) Variance (iv) Standard deviation
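The four measures for the midterm scores can be checked as follows. Two conventions need stating:

- Quartiles use the "median of each half" method. Library quartile functions that interpolate may give a slightly different IQR.
- The question does not say whether the class is a population or a sample, so both versions of variance and standard deviation are shown.

With this data the range is 64, the IQR is 93 − 75 = 18, and the mean is 82. The population variance is 297.2 (SD ≈ 17.24), and the sample variance is about 330.22 (SD ≈ 18.17).

```python
import math
import statistics

scores = [87, 99, 75, 87, 94, 75, 35, 88, 87, 93]

def iqr_median_of_halves(data):
    """IQR = Q3 - Q1, where Q1/Q3 are medians of the lower/upper halves.
    The middle value is excluded from both halves when n is odd."""
    s = sorted(data)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half + len(s) % 2:]
    return statistics.median(upper) - statistics.median(lower)

data_range = max(scores) - min(scores)
iqr = iqr_median_of_halves(scores)
pop_var = statistics.pvariance(scores)   # divides by n (class treated as the population)
samp_var = statistics.variance(scores)   # divides by n - 1 (class treated as a sample)
pop_sd = math.sqrt(pop_var)
samp_sd = math.sqrt(samp_var)

print(data_range, iqr)                              # 64 18
print(pop_var, round(pop_sd, 2))                    # 297.2 17.24
print(round(samp_var, 2), round(samp_sd, 2))        # 330.22 18.17
```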

11 You are shopping for toilet tissue. As you compare prices of various brands, some list the price per roll while others list the price per sheet. Determine which of the two pricing methods has less variability.

12 Briefly describe the statistic used as a “measure of relative standing”, with an example.

13. Problems based on z-score:
(i) The mean growth of the thickness of trees in a forest is found to be 0.5 cm/year with a standard deviation of 0.1 cm/year. What is the z-score corresponding to 1 cm/year?
(ii) A particular leg bone for dinosaur fossils has a mean length of 5 feet with a standard deviation of 3 inches. What is the z-score that corresponds to a length of 62 inches?
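Both z-score problems use z = (x − μ) / σ. The only trap is in part (ii), where the mean is given in feet and everything else in inches. This sketch converts 5 feet to 60 inches before applying the formula. The helper name z_score is chosen here for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# (i) tree growth, all values in cm/year
z_trees = z_score(1.0, 0.5, 0.1)

# (ii) leg bone: convert the 5-foot mean to 60 inches so all values share one unit
z_bone = z_score(62, 5 * 12, 3)

print(round(z_trees, 2), round(z_bone, 2))   # 5.0 0.67
```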

14. Define correlation coefficient, with respect to statistics. Describe the salient features of the correlation coefficient.
15. State the empirical rule, with an example.
16. (a) The lifespans of lizards in a particular zoo are normally distributed. The average lizard lives 3.1 years; the standard deviation is 0.6 years. Use the empirical rule (68−95−99.7%) to estimate the probability of a lizard living longer than 5 years.

(b) If the price per pound of USDA Choice Beef is normally distributed with a mean of $4.85/lb and a standard deviation of $0.35/lb, what is the estimated probability that a randomly chosen sample (from a randomly chosen market) will be between $5.20 and $5.55 per pound?
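The empirical-rule estimates can be sketched by converting each bound to a z-score and reading off the 68/95/99.7% bands. Each band is split in half because the normal curve is symmetric. In (a), 5 years is about 3.17 standard deviations above the mean. The rule therefore only bounds the answer: below about (100% − 99.7%)/2 = 0.15%. In (b), $5.20 and $5.55 are exactly 1 and 2 standard deviations above $4.85, giving (95% − 68%)/2 = 13.5%. The helper names below are illustrative, and they only handle whole-number z bounds from 0 to 3.

```python
# Empirical rule: share of a normal population within k standard deviations of the mean
WITHIN = {0: 0.0, 1: 0.68, 2: 0.95, 3: 0.997}

def prob_between(k_low, k_high):
    """Estimated P(k_low < z < k_high), whole numbers 0 <= k_low < k_high <= 3, one side of the mean."""
    return (WITHIN[k_high] - WITHIN[k_low]) / 2

def prob_above(k):
    """Estimated P(z > k) for whole-number k in 0..3."""
    return (1 - WITHIN[k]) / 2

# (a) lizards: z for 5 years is beyond 3, so P(z > 3) is an upper bound
z_lizard = (5 - 3.1) / 0.6
lizard_bound = prob_above(3)

# (b) beef: the bounds sit 1 and 2 standard deviations above the mean
z_low = (5.20 - 4.85) / 0.35
z_high = (5.55 - 4.85) / 0.35
beef_estimate = prob_between(round(z_low), round(z_high))

print(round(z_lizard, 2), round(lizard_bound, 4))
print(round(z_low), round(z_high), round(beef_estimate, 3))
```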

17. (a) A normally distributed data set has µ = 10 and σ = 2.5. What is the probability of randomly selecting a value greater than 17.5 from the set?

(b) A normally distributed data set has µ = 0.05 and σ = 0.01. What is the probability of randomly choosing a value between 0.05 and 0.07 from the set?

(c) A normally distributed data set has µ = 514 and an unknown standard deviation. What is the probability that a randomly selected value will be less than 514?
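The empirical rule gives quick answers here:

- (a) 17.5 is μ + 3σ, so the estimate is 0.15%.
- (b) 0.05 to 0.07 is μ to μ + 2σ, so the estimate is 47.5%.
- (c) Half of any normal distribution lies below its mean, so the answer is 50%.

As a cross-check, this sketch uses the exact normal CDF from Python's statistics.NormalDist (Python 3.8+). It gives about 0.13% for (a) and about 47.7% for (b). For (c) it tries a few arbitrary standard deviations to show that the answer does not depend on σ.

```python
from statistics import NormalDist

# (a) mu = 10, sigma = 2.5: 17.5 lies exactly 3 standard deviations above the mean
p_a = 1 - NormalDist(10, 2.5).cdf(17.5)

# (b) mu = 0.05, sigma = 0.01: from the mean up to 2 standard deviations above it
dist_b = NormalDist(0.05, 0.01)
p_b = dist_b.cdf(0.07) - dist_b.cdf(0.05)

# (c) mu = 514, sigma unknown: P(X < mean) is 0.5 for every sigma tried
p_c = [NormalDist(514, sigma).cdf(514) for sigma in (1, 10, 100)]

print(round(p_a, 4), round(p_b, 4), p_c)
```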

18. Give an overview of the following graphical displays of basic statistical descriptions, with an example of each:
(i) Boxplot (ii) Bar Chart (iii) Histogram (iv) Quantile Plot
(v) Q-Q Plot (vi) Scatter Plot
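As one small example, the points behind a quantile plot can be computed without any plotting library. Each sorted value x_(i) is paired with f_i = (i − 0.5)/N, roughly the fraction of the data at or below it. This is one common convention; others exist. The Question 10 midterm scores are reused here purely as sample data, and the function name is hypothetical.

```python
# Midterm scores reused from Question 10 as illustrative data
scores = [87, 99, 75, 87, 94, 75, 35, 88, 87, 93]

def quantile_plot_points(data):
    """Return (f_i, x_(i)) pairs for a quantile plot, with f_i = (i - 0.5) / N."""
    s = sorted(data)
    n = len(s)
    return [((i - 0.5) / n, x) for i, x in enumerate(s, start=1)]

points = quantile_plot_points(scores)
for f, x in points:
    print(f"{f:.2f}  {x}")   # f on the horizontal axis, score on the vertical axis
```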


19. Differentiate between ‘similarity’ and ‘dissimilarity’ of two objects. Also, list the properties that must be satisfied by ‘similarity’ and ‘dissimilarity’ measures, respectively.

20. Describe the different types of distance measures for numeric data, with the mathematical formula for each.
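For reference, the Manhattan, Euclidean and supremum distances are all special cases of the Minkowski distance. The block below writes it for objects i and j described by p numeric attributes, where x_{if} is the value of attribute f for object i (notation chosen here for illustration):

```latex
d(i, j) = \left( \sum_{f=1}^{p} \lvert x_{if} - x_{jf} \rvert^{h} \right)^{1/h}, \qquad h \ge 1

h = 1:\quad d(i, j) = \sum_{f=1}^{p} \lvert x_{if} - x_{jf} \rvert \quad \text{(Manhattan)}

h = 2:\quad d(i, j) = \sqrt{\sum_{f=1}^{p} \lvert x_{if} - x_{jf} \rvert^{2}} \quad \text{(Euclidean)}

h \to \infty:\quad d(i, j) = \max_{f} \lvert x_{if} - x_{jf} \rvert \quad \text{(supremum)}
```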
21. For the given collection of points, compute:
(i) Manhattan distance matrix
(ii) Euclidean distance matrix
(iii) Supremum distance matrix

Point  x  y
P1     1  2
P2     2  3
P3     3  2
P4     2  1
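The three matrices for the points P1 to P4 can be generated with a few lines of Python. For these four points every off-diagonal Manhattan distance happens to be 2. The Euclidean distances are either √2 ≈ 1.414 or 2, and the supremum distances are either 1 or 2. math.dist (Python 3.8+) computes the Euclidean distance; the other two metrics are written out directly.

```python
import math

# Points P1..P4 as (x, y)
points = {"P1": (1, 2), "P2": (2, 3), "P3": (3, 2), "P4": (2, 1)}

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))     # sum of absolute differences

def euclidean(a, b):
    return math.dist(a, b)                           # sqrt of sum of squared differences

def supremum(a, b):
    return max(abs(x - y) for x, y in zip(a, b))     # largest single-attribute difference

def distance_matrix(metric):
    names = list(points)
    return [[metric(points[r], points[c]) for c in names] for r in names]

for label, metric in [("Manhattan", manhattan), ("Euclidean", euclidean), ("Supremum", supremum)]:
    print(label)
    for row in distance_matrix(metric):
        print("  ".join(f"{d:6.3f}" for d in row))
```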

22 Compute the Simple Matching Coefficient (SMC), the Jaccard Coefficient (J) and the Hamming distance for the following pair of binary vectors:
p = 0101110001
q = 1100110111
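All three measures come from counting the four kinds of position-wise pairs:

- f11: both 1
- f00: both 0
- f10: p is 1, q is 0
- f01: p is 0, q is 1

For p and q these counts are 4, 2, 1 and 3. That gives:

- SMC = (4 + 2)/10 = 0.6
- J = 4/(4 + 1 + 3) = 0.5, since 0-0 matches are ignored
- Hamming distance = 1 + 3 = 4

The sketch below reproduces these counts.

```python
p = "0101110001"
q = "1100110111"

# Count position-wise matches and mismatches (True counts as 1 in sum)
f11 = sum(a == "1" and b == "1" for a, b in zip(p, q))
f00 = sum(a == "0" and b == "0" for a, b in zip(p, q))
f10 = sum(a == "1" and b == "0" for a, b in zip(p, q))
f01 = sum(a == "0" and b == "1" for a, b in zip(p, q))

smc = (f11 + f00) / (f11 + f00 + f10 + f01)   # all matches over all positions
jaccard = f11 / (f11 + f10 + f01)             # ignores 0-0 matches
hamming = f10 + f01                           # number of differing positions

print(f11, f00, f10, f01)    # 4 2 1 3
print(smc, jaccard, hamming) # 0.6 0.5 4
```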

23 Describe the concept of cosine similarity.
Compute the cosine similarity for the following pair of vectors:
X = (2, 5, 1, 0, 0, 3, 6, 1, 0, 4)
Y = (3, 3, 2, 1, 0, 3, 2, 5, 5, 2)
Also compute the Extended Jaccard Coefficient (Tanimoto coefficient) for the same data.
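Both measures need only the dot product X·Y and the squared lengths ‖X‖² and ‖Y‖². Here those are 57, 92 and 90. That gives:

- cos(X, Y) = 57/√(92 × 90) ≈ 0.6264
- Tanimoto = 57/(92 + 90 − 57) = 0.456

This sketch reproduces the arithmetic.

```python
import math

X = (2, 5, 1, 0, 0, 3, 6, 1, 0, 4)
Y = (3, 3, 2, 1, 0, 3, 2, 5, 5, 2)

dot = sum(x * y for x, y in zip(X, Y))   # X . Y
norm_x_sq = sum(x * x for x in X)        # ||X||^2
norm_y_sq = sum(y * y for y in Y)        # ||Y||^2

cosine = dot / (math.sqrt(norm_x_sq) * math.sqrt(norm_y_sq))
tanimoto = dot / (norm_x_sq + norm_y_sq - dot)   # extended Jaccard coefficient

print(dot, norm_x_sq, norm_y_sq)           # 57 92 90
print(round(cosine, 4), round(tanimoto, 4)) # 0.6264 0.456
```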

24. (i) Describe the concept of the correlation coefficient (Pearson’s correlation coefficient) between two objects x and y.
(ii) Compute the Pearson correlation coefficient for the following data, a table of 6 people with different ages and weights.

(iii)
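The age/weight table is not reproduced in this copy, so the sketch below defines Pearson's r as a function of two equal-length lists. The function name pearson is illustrative. It computes r as the sum of products of deviations from the means, divided by the square root of the product of the sums of squared deviations. The n or n − 1 factors in covariance and standard deviation cancel out, so neither appears. On Python 3.10+, statistics.correlation(x, y) returns the same Pearson coefficient.

```python
import math

def pearson(x, y):
    """Pearson's r: covariance of x and y divided by the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [xi - mean_x for xi in x]   # deviations from the mean of x
    dy = [yi - mean_y for yi in y]   # deviations from the mean of y
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den
```

Usage: pearson(ages, weights), with the two columns of the table as lists.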

25. (i) List the drawbacks associated with correlation.

(ii) List the issues associated with proximity calculation.
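A standard illustration of one drawback is that Pearson correlation measures only linear association. In the sketch below, y = x² is completely determined by x. Yet because x is symmetric about 0, the positive and negative products cancel and r is exactly 0. The small pearson helper is written inline so the block is self-contained.

```python
import math

def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y))
    return num / den

x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]   # perfect (but nonlinear) dependence on x

print(pearson(x, y))      # 0.0: no linear relationship detected
```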
