
Data Warehouse and Data Mining MCQ Questions

Name: Shivani Dattatraya Chatte

Roll No: 08

1) Which of the following refers to the problem of finding abstracted patterns (or structures) in
the unlabeled data?

a. Supervised learning
b. Unsupervised learning
c. Hybrid learning
d. Reinforcement learning

Answer: b

Explanation: Unsupervised learning is a type of machine learning that is generally used to find
hidden structures and patterns in unlabeled data.

2) Which one of the following refers to querying the unstructured textual data?

a. Information access
b. Information update
c. Information retrieval
d. Information manipulation

Answer: c

Explanation: Information retrieval refers to querying unstructured textual data. It can also be
understood as the activity of obtaining, from a huge source of information, the system resources
that are relevant to the information required.

3) Which of the following can be considered as the correct process of Data Mining?

a. Infrastructure, Exploration, Analysis, Interpretation, Exploitation


b. Exploration, Infrastructure, Analysis, Interpretation, Exploitation
c. Exploration, Infrastructure, Interpretation, Analysis, Exploitation
d. Exploration, Infrastructure, Analysis, Exploitation, Interpretation

Answer: a
Explanation: The process of data mining contains several sub-processes that run in a specific order.
The correct order is Infrastructure, Exploration, Analysis, Interpretation, and Exploitation.

4) Which of the following is an essential process in which the intelligent methods are applied to
extract data patterns?

a. Warehousing
b. Data Mining
c. Text Mining
d. Data Selection

Answer: b

Explanation: Data mining is a process in which several intelligent methods are used to extract
meaningful patterns from a huge collection (or set) of data.

5) What is KDD in data mining?

a. Knowledge Discovery Database


b. Knowledge Discovery Data
c. Knowledge Data definition
d. Knowledge data house

Answer: a

Explanation: KDD stands for Knowledge Discovery in Databases (given in option a as "Knowledge
Discovery Database"). It refers to the broad process of discovering knowledge in data and emphasizes
the high-level application of specific data mining techniques.

6) The adaptive system management refers to:

a. The science of making machines perform tasks that would require intelligence when performed
by humans.
b. A computational procedure that takes some values as input and produces some values as the
output.
c. It uses machine learning techniques, in which programs learn from their past experience and
adapt themself to new conditions or situations.
d. All of the above.

Answer: c
Explanation: Generally, adaptive system management refers to using machine learning techniques in
which programs learn from their past experience and adapt themselves to new conditions and
events.

7) For what purpose do analysis tools pre-compute summaries of huge amounts of data?

a. In order to maintain consistency


b. For authentication
c. For data access
d. To obtain the queries response

Answer: d

Explanation:

Analysis tools pre-compute summaries of huge amounts of data so that the response to a query can be
returned quickly when the query is fired. To understand it in more detail, consider the following
example:

Suppose that to get some information about something, you type a keyword into Google search. A search
engine can answer quickly because much of the underlying data has already been processed and
summarized in advance, rather than at the moment the keyword is entered.
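
As a rough illustration of pre-computing summaries, the sketch below builds one total per region
ahead of time and then answers queries from that summary rather than from the detail rows. The
sales records and the names `SALES`, `precompute_totals`, and `total_for` are hypothetical and exist
only for this example.

```python
from collections import defaultdict

# Hypothetical detail rows: (region, amount). Made-up data for illustration.
SALES = [("north", 120.0), ("south", 80.0), ("north", 30.0), ("east", 50.0)]


def precompute_totals(rows):
    """Scan the detail rows once and store one total per region."""
    totals = defaultdict(float)
    for region, amount in rows:
        totals[region] += amount
    return dict(totals)


# Built once, before any query arrives.
REGION_TOTALS = precompute_totals(SALES)


def total_for(region):
    """Answer a query from the summary instead of rescanning the detail rows."""
    return REGION_TOTALS.get(region, 0.0)


print(total_for("north"))  # 150.0
```

The query itself becomes a dictionary lookup; the cost of scanning the data is paid once, when the
summary is built.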

8) What are the functions of Data Mining?

a. Association and correlation analysis, classification


b. Prediction and characterization
c. Cluster analysis and Evolution analysis
d. All of the above

Answer: d

Explanation: In data mining, several functionalities are used for performing different types of
tasks. Common functionalities include cluster analysis, prediction, characterization, and evolution
analysis. Association and correlation analysis and classification are also important
functionalities of data mining.

9) In the following given diagram, which type of clustering is used?

a. Hierarchal
b. Naive Bayes
c. Partitional
d. None of the above
Answer: a

Explanation: In the above-given diagram, hierarchical clustering is used. Hierarchical clustering
groups data over a variety of scales by building a cluster tree. So the correct answer
is A.

10) Which of the following statements is correct about hierarchical clustering?

a. Hierarchical clustering is also known as HCA
b. The choice of an appropriate metric can influence the shape of the clusters
c. In general, the splits and merges are both determined in a greedy manner
d. All of the above

Answer: d

Explanation: All of the statements given in the above question are correct, so the correct answer
is D.

11) Which one of the following can be considered as the final output of the hierarchal type of
clustering?

a. A tree which displays how close things are to each other
b. Assignment of each point to clusters
c. Finalize estimation of cluster centroids
d. None of the above

Answer: a

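Explanation: The final output of hierarchical clustering is a tree (dendrogram) that shows how close
items are to each other. The agglomerative approach, for example, builds this tree by repeatedly
merging the closest clusters.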
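
A minimal sketch of the agglomerative (merging) idea, assuming 1-D points and single linkage; the
function name `agglomerate` and the sample points are made up for this example. At each step the two
closest clusters are merged greedily, and the recorded merges, read from first to last, describe the
cluster tree.

```python
def agglomerate(points):
    """Single-linkage agglomerative clustering on 1-D points.

    Returns the merges in the order they happen as
    (left_cluster, right_cluster, distance) tuples.
    """
    clusters = [(p,) for p in points]  # start with one cluster per point
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


for left, right, dist in agglomerate([1.0, 2.0, 9.0, 10.0, 25.0]):
    print(left, "+", right, "at distance", dist)
```

The early merges join nearby points, and the isolated point 25.0 is attached last, which is the
"how close things are to each other" information the tree conveys.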

12) Which one of the following statements about the K-means clustering is incorrect?

a. The goal of the k-means clustering is to partition (n) observation into (k) clusters
b. K-means clustering can be defined as the method of quantization
c. The nearest neighbor is the same as the K-means
d. All of the above

Answer: c

Explanation: K-means is a clustering method that partitions observations into k clusters, whereas
the nearest neighbor method works from a point's closest neighbors. The two are not the same.

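
To make the partitioning goal of k-means concrete, here is a small sketch on 1-D data. It assumes
fixed starting centroids so that the run is repeatable; the function name `kmeans_1d` and the sample
values are illustrative only.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Partition 1-D values into len(centroids) clusters (Lloyd's algorithm)."""
    centroids = list(centroids)
    groups = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        groups = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
            groups[nearest].append(v)
        # Update step: each centroid moves to the mean of its group.
        new_centroids = [
            sum(g) / len(g) if g else c for g, c in zip(groups, centroids)
        ]
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, groups


centers, clusters = kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], centroids=[0.0, 5.0])
print(centers)   # [2.0, 11.0]
print(clusters)  # [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
```

Each centroid ends up standing in for its group of values, which is the sense in which k-means can be
described as a method of quantization.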
13) Which of the following statements about hierarchal clustering is incorrect?

a. The hierarchal clustering can primarily be used for the aim of exploration
b. The hierarchal clustering should not be primarily used for the aim of exploration
c. Both A and B
d. None of the above

Answer: b

Explanation: Hierarchical clustering can be used primarily for exploration because it is a
deterministic clustering technique. Statement B denies this, so B is the incorrect statement.

14) Which one of the clustering techniques needs the merging approach?

a. Partitioned
b. Naïve Bayes
c. Hierarchical
d. Both A and C

Answer: c

Explanation: Hierarchical clustering is one of the most commonly used methods to analyze
social network data. In this type of clustering, multiple nodes are compared with each other on
the basis of their similarities, and larger groups are formed by merging the nodes or groups of
nodes that have similar characteristics.

15) The self-organizing maps can also be considered as the instance of _________ type of learning.

a. Supervised learning
b. Unsupervised learning
c. Missing data imputation
d. Both A & C

Answer: b

Explanation: The Self Organizing Map (SOM), or the Self Organizing Feature Map is a kind of Artificial
Neural Network which is trained through unsupervised learning.

16) The following statement can be considered as an example of _________

Suppose one wants to predict the number of newborns according to the size of the stork population by
performing supervised learning.
a. Structural equation modelling
b. Clustering
c. Regression
d. Classification

Answer: c

Explanation: Predicting a numeric value (the number of newborns) from an input (the size of the stork
population) using supervised learning is an example of regression. Therefore the correct answer is C.
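
As an illustration of regression, the sketch below fits a straight line by ordinary least squares
and uses it to predict a number of newborns from a stork count. The numbers are invented for the
example and the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up observations, for illustration only.
storks = [10, 20, 30, 40]     # feature: size of the stork population
newborns = [25, 45, 65, 85]   # outcome: number of newborns

slope, intercept = fit_line(storks, newborns)
predicted = slope * 50 + intercept  # prediction for a population of 50 storks
print(predicted)  # 105.0
```

Here the stork count plays the role of the feature and the number of newborns is the outcome that
the model predicts.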

17) In the example predicting the number of newborns, the final number of total newborns can
be considered as the _________

a. Features
b. Observation
c. Attribute
d. Outcome

Answer: d

Explanation: In the example of predicting the total number of newborns, the predicted result is the
outcome. Therefore, the total number of newborns is the outcome.

18) Which of the following statements is true about classification?

a. It is a measure of accuracy
b. It is a subdivision of a set
c. It is the task of assigning a classification
d. None of the above

Answer: b

Explanation: The term "classification" refers to the classification of the given data into certain
sub-classes or groups according to their similarities or on the basis of a specific given set of rules.

19) Which of the following statements is correct about data mining?

a. It can be referred to as the procedure of mining knowledge from data


b. Data mining can be defined as the procedure of extracting information from a set of the data
c. The procedure of data mining also involves several other processes like data cleaning, data
transformation, and data integration
d. All of the above
Answer: d

Explanation: The term data mining can be defined as the process of extracting information from the
massive collection of data. In other words, we can also say that data mining is the procedure of mining
useful knowledge from a huge set of data.

20) In data mining, how many categories of functions are included?

a. 5
b. 4
c. 2
d. 3

Answer: c

Explanation: There are only two categories of functions included in data mining: (1) Descriptive and
(2) Classification and Prediction. Therefore the correct answer is C.

21) Which of the following can be considered as the classification or mapping of a set or class
with some predefined group or classes?

a. Data set
b. Data Characterization
c. Data Sub Structure
d. Data Discrimination

Answer: d

Explanation: The discrimination refers to the mapping (or classification) of a class with some
predefined groups or classes. So the correct answer is D.

22) The analysis performed to uncover interesting statistical correlations between
associated attribute-value pairs is known as _______.

a. Mining of association
b. Mining of correlation
c. Mining of clusters
d. All of the above

Answer: b

Explanation: Mining of correlation refers to the additional analysis performed to uncover
interesting statistical correlations between associated attribute-value pairs.

23) Which one of the following can be defined as the data object which does not comply with
the general behavior (or the model of available data)?

a. Evaluation Analysis
b. Outlier Analysis
c. Classification
d. Prediction

Answer: b

Explanation: An outlier may be defined as a data object that does not comply with the general
behavior or with the model of the available data. Outlier analysis is the study of such objects.
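
One simple way to flag objects that do not follow the general behavior is a z-score rule. The sketch
below assumes a threshold of 2 standard deviations, which is a common but arbitrary choice; the
function name `find_outliers` and the sample readings are made up for this example.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical readings: most are close to 10, one is far away.
readings = [10, 11, 9, 10, 12, 10, 11, 50]
print(find_outliers(readings))  # [50]
```

Other definitions of "general behavior" (distance-based, density-based, model-based) lead to other
detection rules; this is only the most basic one.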

24) Which one of the following statements is correct about data cleaning?

a. It refers to the process of data cleaning


b. It refers to the transformation of wrong data into correct data
c. It refers to correcting inconsistent data
d. All of the above

Answer: d

Explanation: Data cleaning is a process applied to a data set to remove noise (noisy data) and to
correct inconsistent data. It also involves transformation, in which wrong data is transformed into
correct data. In other words, data cleaning is a pre-processing step in which the given set of data
is prepared for the data warehouse.
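
A small sketch of what such a cleaning step might look like. It assumes one possible policy: rows
with a missing age are dropped, inconsistent city spellings are mapped to a single form, and exact
duplicates are removed. The record layout, the `canonical` mapping, and the function name `clean`
are all hypothetical.

```python
def clean(records):
    """Normalize names and cities, drop rows with a missing age, remove duplicates."""
    # Hypothetical mapping of inconsistent spellings to one canonical value.
    canonical = {"nyc": "New York", "ny": "New York", "new york": "New York"}
    seen = set()
    cleaned = []
    for name, city, age in records:
        if age is None:
            continue  # incomplete row: dropped under this policy
        name = name.strip().title()
        city = canonical.get(city.strip().lower(), city.strip())
        row = (name, city, age)
        if row not in seen:  # deduplicate after normalization
            seen.add(row)
            cleaned.append(row)
    return cleaned


raw = [
    (" alice ", "NYC", 30),
    ("Alice", "new york", 30),   # same person, inconsistent formatting
    ("Bob", "Pune", None),       # missing age
    ("carol", "Pune", 41),
]
print(clean(raw))  # [('Alice', 'New York', 30), ('Carol', 'Pune', 41)]
```

Note that the two "Alice" rows only become duplicates after normalization, which is why
deduplication runs last.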

25) The classification of the data mining system involves:

a. Database technology
b. Information Science
c. Machine learning
d. All of the above

Answer: d

Explanation: Generally, the classification of a data mining system depends on the following criteria:
Database technology, machine learning, visualization, information science, and several other disciplines.

26) In order to integrate heterogeneous databases, how many types of approaches are there in
the data warehousing?
a. 3
b. 4
c. 5
d. 2

Answer: d

Explanation: In general, data warehousing consists of data integration, data cleaning, and data
consolidation. To integrate heterogeneous databases, there are two approaches: the
update-driven approach and the query-driven approach. So the correct answer is D.

27) Issues like the efficiency and scalability of data mining algorithms come under _______

a. Performance issues
b. Diverse data type issues
c. Mining methodology and user interaction
d. All of the above

Answer: a

Explanation: In order to extract information effectively from a huge collection of data in databases, the
data mining algorithm must be efficient and scalable. Therefore the correct answer is A.

28) Which of the following is the correct advantage of the Update-Driven Approach?

a. This approach provides high performance.


b. The data can be copied, processed, integrated, annotated, summarized and restructured in the
semantic data store in advance.
c. Both A and B
d. None of the above

Answer: c

Explanation: The statements given in both A and B are the advantage of the Update-Driven Approach
in Data Warehousing. So the correct answer is C.

29) Which of the following statements about the query tools is correct?

a. Tools developed to query the database


b. Attributes of a database table that can take only numerical values
c. Both A and B
d. None of the above
Answer: a

Explanation: The query tools are used to query the database. Or we can also say that these tools are
generally used to get only the necessary information from the entire database.

30) Which one of the following correctly defines the term cluster?

a. Group of similar objects that differ significantly from other objects


b. Symbolic representation of facts or ideas from which information can potentially be extracted
c. Operations on a database to transform or simplify data in order to prepare it for a machine-
learning algorithm
d. All of the above

Answer: a

Explanation: The term "cluster" refers to a set of similar objects or items that differ significantly
from the other available objects. In other words, clustering means forming groups of objects with
similar characteristics from all available objects. Therefore the correct answer is A.

31) Which one of the following refers to the binary attribute?

a. This takes only two values. In general, these values will be 0 and 1, and they can be coded as one
bit
b. The natural environment of a certain species
c. Systems that can be used without knowledge of internal operations
d. All of the above

Answer: a

Explanation: In general, a binary attribute takes only two values, 0 and 1, and these
values can be coded as one bit. So the correct answer is A.

32) Which of the following correctly refers to data selection?

a. A subject-oriented integrated time-variant non-volatile collection of data in support of
management
b. The actual discovery phase of a knowledge discovery process
c. The stage of selecting the right data for a KDD process
d. All of the above

Answer: c
Explanation: Data selection can be defined as the stage in which the right data is selected for a
knowledge discovery process (or KDD process). Therefore the correct answer is C.

33) Which one of the following correctly refers to the task of the classification?

a. A measure of the accuracy, of the classification of a concept that is given by a certain theory
b. The task of assigning a classification to a set of examples
c. A subdivision of a set of examples into a number of classes
d. None of the above

Answer: b

Explanation: The task of classification refers to assigning a classification (a class) to a set of
examples. Therefore the correct answer is B.

34) Which of the following correctly defines the term "Hybrid"?

a. Approach to the design of learning algorithms that is structured along the lines of the theory of
evolution.
b. Decision support systems that contain an information base filled with the knowledge of an
expert formulated in terms of if-then rules.
c. Combining different types of method or information
d. None of these

Answer: c

Explanation: The term "hybrid" refers to combining different methods or types of information into a
single whole that has the features of the combined parts.

35) Which of the following correctly defines the term "Discovery"?

a. It is hidden within a database and can only be recovered if one is given certain clues (an
example IS encrypted information).
b. An extremely complex molecule that occurs in human chromosomes and that carries genetic
information in the form of genes.
c. It is a kind of process of extracting implicit, previously unknown and potentially useful
information from data
d. None of the above

Answer: c
Explanation: The term "discovery" means finding something new that has not yet been discovered.
It can also be interpreted as a process of extracting implicit, previously unknown and potentially
useful information from data.

36) The Euclidean distance measure can also be defined as ___________

a. The process of finding a solution for a problem simply by enumerating all possible solutions
according to some predefined order and then testing them
b. The distance between two points as calculated using the Pythagoras theorem
c. A stage of the KDD process in which new data is added to the existing selection.
d. All of the above

Answer: b

Explanation: The Euclidean distance measure is the length of the line segment connecting two
points, in a plane or in three-dimensional space. It can also be defined as the distance between two
points as calculated using the Pythagoras theorem.
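
The Pythagoras-based definition can be sketched as a short function. The name `euclidean` is chosen
for this example; it works for points of any matching dimension.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two points of the same dimension."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


print(euclidean((0, 0), (3, 4)))        # 5.0 (the 3-4-5 right triangle)
print(euclidean((1, 2, 3), (4, 6, 3)))  # 5.0
```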

37) Which one of the following can be considered as the correct application of the data mining?

a. Fraud detection
b. Corporate Analysis & Risk management
c. Management and market analysis
d. All of the above

Answer: d

Explanation: Data mining is highly useful in a variety of areas such as fraud detection, corporate
analysis and risk management, and market analysis, so the correct option is D.

38) Which one of the following correctly refers to the class under study in data characterization?

a. Final class
b. Study class
c. Target class
d. Both A and C

Answer: c

Explanation: In data characterization, the class under study is generally called the target class;
it is the class whose data is being summarized.

39) Which of the following refers to the sequence of pattern that occurs frequently?

a. Frequent sub-sequence
b. Frequent sub-structure
c. Frequent sub-items
d. All of the above

Answer: a

Explanation: In data mining, the frequent sub-sequence refers to a certain sequence of patterns that
occurs frequently, for example, buying a camera followed by the memory card. So the correct answer
will be A.

40) Which one of the following refers to modeling the regularities or trends of objects whose
behavior changes over time?

a. Prediction
b. Evolution analysis
c. Classification
d. Both A and B

Answer: b

Explanation: In general, evolution analysis refers to modeling the regularities or trends of objects
whose behavior varies over time.

41) Issues like "handling relational and complex types of data" come under which of the
following categories?

a. Diverse Data Type


b. Mining methodology and user interaction Issues
c. Performance issues
d. All of the above

Answer: a

Explanation: A database can often contain multiple types of data, such as complex objects and
temporal data, so it is not realistic to expect one system to mine all kinds of data. Therefore this
type of issue comes under the category Diverse Data Type. So the correct answer is A.

42) Which of the following is also used as the first step in the knowledge discovery process?
a. Data selection
b. Data cleaning
c. Data transformation
d. Data integration

Answer: b

Explanation: Data cleaning is included as one of the first steps of the knowledge discovery process. So
the correct answer is B.

43) Which of the following refers to the steps of the knowledge discovery process, in which the
several data sources are combined?

a. Data selection
b. Data cleaning
c. Data transformation
d. Data integration

Answer: d

Explanation: The step "data integration" of the knowledge discovery process refers to combining
several data sources. Therefore the correct answer is D.

44) Which of the following can be considered as the drawback of the query-Driven approach in
data warehousing?

a. This approach is expensive for queries that require aggregations


b. This approach is inefficient and expensive for very frequent queries
c. This approach requires a very complex integration and filtering process
d. All of the above

Answer: d

Explanation: All statements given in the above question are drawbacks of the query-driven approach.
Therefore the correct answer is D.

45) Which of the following correctly refers to the term "Data Independence"?

a. It means that the programs are not dependent on the logical attributes
b. It refers to that data that is defined separately, not included in the program
c. It means that the programs are not dependent on the physical attributes of data
d. Both A and C
Answer: d

Explanation: The term "Data Independence" means that the programs depend neither on the
physical attributes of data nor on the logical attributes of data.

46) Which of the following is generally used by the E-R model to represent the weak entities?

a. Diamond
b. Doubly outlined rectangle
c. Dotted rectangle
d. Both B & C

Answer: b

Explanation: Generally, the double outline rectangle is used in the E-R model to represent the weak
entities.

47) Which one of the following refers to the Black Box?

a. It can be referred to as a system that can be used without knowledge of its internal
operations
b. It refers to the natural environment of a specific species
c. It takes only two values at most that are 0 and 1
d. All of the above

Answer: a

Explanation: A black box is a system that can be used without knowledge of its internal
operations. Therefore the correct answer is A.

48) Which one of the following issues must be considered before investing in data mining?

a. Compatibility
b. Functionality
c. Vendor consideration
d. All of the above

Answer: d

Explanation: The common but important issues like functionality and compatibility must always be
discussed before investing in data mining. Therefore the correct answer is D.

49) The term "DMQL" stands for _____


a. Data Marts Query Language
b. DBMiner Query Language
c. Data Mining Query Language
d. None of the above

Answer: c

Explanation: The term "DMQL" refers to the Data Mining Query Language. Therefore the correct
answer is C.

50) In certain cases where it is not clear what kind of pattern needs to be found, data mining
should _________:

a. Try to perform all possible tasks


b. Perform both predictive and descriptive task
c. It may allow interaction with the user so that he can guide the mining process
d. All of the above

Answer: c

Explanation: In some data mining operations it is not clear what kind of pattern needs to be found.
In such cases the user can guide the data mining process, because the user has a good sense of which
type of pattern is wanted. By setting up some rules, the user can exclude the discovery of patterns
that are not required and focus the process on the required pattern only. Therefore the correct
answer is C.

51) __________ contains information that gives users an easy-to-understand perspective of the
information stored in the data warehouse

a. Financial metadata
b. Operational metadata
c. Technical metadata
d. Business metadata

Answer: d

52) _________ is not associated with data cleaning process.

a. Deduplication
b. Domain consistency
c. Segmentation
d. Disambiguation

Answer: c
53) ___________ is a good alternative to the star schema.

a. Star schema.
b. Snowflake schema.
c. Fact constellation.
d. Star-snowflake schema.

Answer: c

54) Dimensionality refers to ___________

a. Cardinality of key values in a star schema


b. The data that describes the transactions in the fact table
c. The level of detail of data that is held in the fact table
d. The level of detail of data that is held in the dimension table

Answer: b

55) A star schema has what type of relationship from a dimension to the fact table?

a. Many-to-many
b. Many-to-one
c. One-to-one
d. One-to-many

Answer: d

56) __________ predicts future trends & behaviors, allowing business managers to make
knowledge-driven decisions

a. Meta data
b. Data mart
c. Data warehouse
d. Data Mining

Answer: d

57) Expansion for DSS in DW is ___________

a. Decisive Strategic System


b. Data Support System
c. Data Store System
d. Decision Support system
Answer: d

58) A data warehouse is described by which of the following?

a. Can be updated by end users


b. Contains only current data
c. Contains numerous naming conventions and formats
d. Organized around important subject areas

Answer: d

59) Data in a data warehouse___________

a. in a flat file format


b. can be normalised but often is not
c. must be in normalised form to at least 3NF
d. must be in normalised form to at least 2NF

Answer: b

60) The main organisational justification for implementing a data warehouse is to provide
___________

a. ETL from operation systems to strategic systems


b. Large scale transaction processing
c. Storing large volumes of data
d. Decision support

Answer: d

61) A data warehouse ___________

a. must import data from transactional systems whenever significant changes occur in the
transactional data
b. works on live transactional data to provide up to date and valid results
c. takes regular copies of transaction data
d. takes preprocessed transaction data and stores in a way that is optimised for analysis

Answer: d

62) Data warehouse contains ________data that is seldom found in the operational environment
a. informational
b. normalized
c. denormalized
d. summary

Answer: d

63) Which of the following statements about DW is true?

a. A data warehouse is necessary to all those organisations that are using relational OLTP
b. A data warehouse is useful to all organisations that currently use OLTP
c. A data warehouse is valuable to the organisations that need to keep an audit trail of their
activities
d. A data warehouse is valuable only if the organisation has an interest in analysing historical data

Answer: d

64) What is true of storing data in DW?

a. A DW automatically makes a copy of every transaction recorded in OLTP systems


b. Adding data for the sake of it may well degrade the effectiveness of DW
c. A DW is a relatively straightforward thing to set up
d. The more data a data warehouse has, the better it is

Answer: b

65) The extract process is _______

a. capturing a subset of the data contained in operational systems


b. capturing a subset of the data contained in various decision support systems
c. capturing all of the data contained in various decision support systems
d. capturing of the data contained in all operational systems

Answer: a

66) Which statement best describes a fact table?

a. A fact table describes the transactions stored in a DW


b. The fact table of a DW is the main store of descriptions of the transactions
c. A fact table describes the granularity of data in a DW
d. The fact table of a data warehouse is the main store of all of the recorded transactions over time

Answer: d
67) Fact tables are described by which of the following?

a. Partially normalized
b. Completely denormalized
c. Partially denormalized
d. Completely normalized

Answer: d

68) _______ are numeric measurements or values that represent a specific business aspect or
activity

a. Dimensions
b. Schemas
c. Facts
d. Tables

Answer: c

69) A fact is said to be fully additive if __________.

a. it is additive over at least one of the dimensions


b. Only numeric measures are used
c. All possible summaries are stored
d. it is additive over every dimension of its dimensionality

Answer: d

70) Granularity refers to ___________

a. The level of detail of the data stored in a data warehouse


b. The number of fact tables in a data warehouse
c. The number of dimensions in a data warehouse
d. The level of detail of the data descriptions held in a data warehouse

Answer: a

71) Data cubes can grow to n-number of dimensions, thus becoming _______

a. Hypercubes
b. Star Cubes
c. Dimensional Cubes
d. Solid cubes
Answer: a

72) _____describes the data contained in the data warehouse

a. Relational data
b. Operational data
c. Informational data
d. Meta data

Answer: d

73) An operational system is which of the following?

a. A system that is used to run the business in real time and is based on current data
b. A system that is used to run the business in real time and is based on historical data
c. A system that is used to support decision making and is based on historical data
d. A system that is used to support decision making and is based on current data

Answer: a

74) Treating incorrect or missing data is called as _______.

a. Preprocessing
b. Interpretation
c. Selection
d. Transformation

Answer: a

75) ______ makes a copy of a table and places it in a different location, to improve access time

a. Archive
b. Replication
c. Partitioning
d. Aggregation

Answer: b

76) When you ________ the data, you are aggregating the data to a higher level

a. Slice
b. Roll up
c. Accumulate
d. Drill down
Answer: b
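
As a rough sketch of a roll-up, the example below aggregates daily sales facts to the month level.
The rows in `DAILY` and the function name `roll_up_to_month` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical daily facts: (ISO date, city, units sold).
DAILY = [
    ("2024-01-03", "Pune", 5),
    ("2024-01-17", "Pune", 7),
    ("2024-02-02", "Pune", 4),
    ("2024-01-09", "Mumbai", 6),
]


def roll_up_to_month(rows):
    """Aggregate day-level rows to (month, city) totals."""
    totals = defaultdict(int)
    for date, city, units in rows:
        month = date[:7]  # "YYYY-MM": the next level up in the time hierarchy
        totals[(month, city)] += units
    return dict(totals)


monthly = roll_up_to_month(DAILY)
print(monthly[("2024-01", "Pune")])  # 12
```

Drilling down is the reverse direction: going from the monthly totals back to the daily rows.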

77) ______ is a measurement of the density of the data held in the data cube

a. Mass
b. Sparsity
c. Compactness
d. Concentration

Answer: b

78) A fact table in the centre with dimension tables directly linked to it

a. A star schema
b. A star flake schema
c. A snowflake schema
d. A constellation

Answer: a
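
A minimal sketch of a star schema, using plain dictionaries and tuples instead of database tables.
The table contents and the names `DIM_PRODUCT`, `DIM_STORE`, `FACT_SALES`, and `units_by_product`
are hypothetical.

```python
# Dimension tables: one row per key, holding descriptive attributes.
DIM_PRODUCT = {1: "Pen", 2: "Notebook"}
DIM_STORE = {10: "Pune", 20: "Mumbai"}

# Fact table in the centre: a foreign key into each dimension plus a numeric measure.
# Columns: (product_id, store_id, units_sold)
FACT_SALES = [
    (1, 10, 3),
    (2, 10, 1),
    (1, 20, 5),
]


def units_by_product(facts, products):
    """Join each fact row to the product dimension and total the measure."""
    totals = {}
    for product_id, _store_id, units in facts:
        name = products[product_id]  # direct link from fact row to dimension row
        totals[name] = totals.get(name, 0) + units
    return totals


print(units_by_product(FACT_SALES, DIM_PRODUCT))  # {'Pen': 8, 'Notebook': 1}
```

Each dimension row (for example product 1) can be referenced by many fact rows, which is the
one-to-many relationship from a dimension to the fact table.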

79) ______ is a data transformation.

a. Re-format
b. Selection
c. Projection
d. Comparison

Answer: a

80) _________ introduces the Management Data Warehouse (MDW) to SQL Server Management
Studio for streamlined performance troubleshooting.
a. SQL Server 2005
b. SQL Server 2008
c. SQL Server 2012
d. SQL Server 2014

Answer: b

Explanation: MDW is a set of components that enable a database developer or administrator to quickly
track down problems that could be causing performance degradation.

81) Point out the correct statement.


a. MDW consist of three components
b. SQL Server Express instances can be targets
c. Setting up the MDW is a one-step process
d. All of the mentioned

Answer: a

Explanation: MDW consists of three components: Data Collector, MDW database and MDW reports.

82) Which of the following mode allows for the collection and uploading of data to occur on
demand?
a. Non-cached mode
b. Cached mode
c. Mixed mode
d. All of the mentioned

Answer: a
Explanation: In non-cached mode, collection and upload are on the same schedule.

83) Which of the following scenario favours cached mode?


a. Continuous collection of data
b. Less frequent uploads
c. Data collection and uploading of jobs on different schedules
d. All of the mentioned

Answer: d
Explanation: Cached mode uses separate schedules for collection and upload.

84) Point out the wrong statement.


a. The Data Collection is performed primarily through SSIS packages that control the collection
frequency on the target
b. You should change the database name after creation
c. Do not change any of the job specifications for the data collection and upload jobs
d. None of the mentioned

Answer: b
Explanation: You should not change the database name after creation, because all of the jobs created to
manage the database collection refer to the database by the original name and will generate errors if
the name is changed.

85) Which of the following is the best Practice and Caveat for Management Data Warehouse?
a. Use a centralized server for the MDW database
b. The XML parameters for a single T-SQL collection item can have multiple <Query> elements
c. Use a distributed server for the MDW database
d. All of the mentioned

Answer: a
Explanation: Centralized server allows you to use a single point for viewing reports for multiple
instances.

86) ____________ stores information about how the management data warehouse reports should
group and aggregate performance counters.
a. core.snapshots_internal
b. core.supported_collector_types_internal
c. core.wait_categories
d. core.performance_counter_report_group_items

Answer: d
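Explanation: core.performance_counter_report_group_items stores information about how the
management data warehouse reports should group and aggregate performance counters. By contrast,
core.wait_categories contains the categories used to group wait types according to wait_type
characteristic.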

87) Which of the following table is used in the management data warehouse schema that is
required for the Server Activity?
a. snapshots.query_stat
b. snapshots.os_latch_stats
c. snapshots.active_sessions
d. all of the mentioned

Answer: b
Explanation: snapshots.os_latch_stats is a System level resource table.

88) Which of the following is syntax for sp_add_collector_type procedure?


a. core.sp_add_collector [ @collector_type_uid = ] 'collector_type_uid'
b. core.sp_add_collector_type [ @collector_type_uid = ].
c. core.sp_add_collector_type [ @collector_type_uid = ] 'collector_type_uid'
d. none of the mentioned

Answer: c
Explanation: core.sp_add_collector_type adds a new entry to the core.supported_collector_types view
in the management data warehouse database.

89) What does collector_type_uid stand for in the following code snippet?
core.sp_remove_collector_type [ @collector_type_uid = ] 'collector_type_uid'
a. uniqueidentifier
b. membership role
c. directory
d. none of the mentioned

Answer: a
Explanation: collector_type_uid is the GUID for the collector type.

90) What is true about data mining?


a. Data Mining is defined as the procedure of extracting information from huge sets of data
b. Data mining also involves other processes such as Data Cleaning, Data Integration, Data
Transformation
c. Data mining is the procedure of mining knowledge from data.
d. All of the above

Answer: d

Explanation: Data mining is defined as extracting information from huge sets of data. In other
words, data mining is the procedure of mining knowledge from data so that the extracted
information or knowledge can be used.

91) How many categories of functions are involved in Data Mining?


a. 2
b. 3
c. 4
d. 5

Answer: a

Explanation: There are two categories of functions involved in Data Mining: 1. Descriptive, 2.
Classification and Prediction.

92) The mapping or classification of a class with some predefined group or class is known as?
a. Data Characterization
b. Data Discrimination
c. Data Set
d. Data Sub Structure

Answer: b

Explanation: Data Discrimination: It refers to the mapping or classification of a class with
some predefined group or class.

93) The analysis performed to uncover interesting statistical correlations between associated-
attribute-value pairs is called?

a. Mining of Association
b. Mining of Clusters
c. Mining of Correlations
d. None of the above

Answer: c

Explanation: Mining of Correlations: It is a kind of additional analysis performed to uncover
interesting statistical correlations between associated-attribute-value pairs or between two
item sets, to analyze whether they have a positive, negative or no effect on each other.

94) __________ may be defined as the data objects that do not comply with the general behavior
or model of the data available.
a. Outlier Analysis
b. Evolution Analysis
c. Prediction
d. Classification

Answer: a

Explanation: Outlier Analysis : Outliers may be defined as the data objects that do not comply
with the general behavior or model of the data available.

95) "Efficiency and scalability of data mining algorithms" issues come under?


a. Mining Methodology and User Interaction Issues
b. Performance Issues
c. Diverse Data Types Issues
d. None of the above

Answer: b

Explanation: In order to effectively extract information from huge amounts of data in
databases, data mining algorithms must be efficient and scalable.

96) To integrate heterogeneous databases, how many approaches are there in Data
Warehousing?
a. 2
b. 3
c. 4
d. 5

Answer: a

Explanation: Data warehousing involves data cleaning, data integration, and data
consolidations. To integrate heterogeneous databases, we have the following two approaches :
Query Driven Approach, Update Driven Approach

97) Which of the following is a correct advantage of the Update-Driven Approach in Data
Warehousing?

a. This approach provides high performance.
b. The data can be copied, processed, integrated, annotated, summarized and restructured in the
semantic data store in advance.
c. Both A and B
d. None of the above

Answer: c

Explanation: Both A and B are advantages of the Update-Driven Approach in Data Warehousing.

98) What is the use of data cleaning?


a. to remove the noisy data
b. correct the inconsistencies in data
c. transformations to correct the wrong data.
d. All of the above

Answer: d

Explanation: Data cleaning is a technique that is applied to remove the noisy data and correct
the inconsistencies in data. Data cleaning involves transformations to correct the wrong data.
Data cleaning is performed as a data preprocessing step while preparing the data for a data
warehouse.

99) Data Mining System Classification consists of?


a. Database Technology
b. Machine Learning
c. Information Science
d. All of the above

Answer: d

Explanation: A data mining system can be classified according to the following criteria :
Database Technology, Statistics, Machine Learning, Information Science, Visualization, Other
Disciplines

100) The Apriori algorithm operates in ___ method


a. Bottom-up search method
b. Breadth-first search method
c. None of above
d. Both a & b

Answer: d
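
The level-wise behaviour behind this answer can be sketched in a few lines. This is only a minimal illustration, not a reference implementation: the function name `apriori`, the toy transactions and the absolute `min_support` count are all assumptions made for the example. Candidates of size k are built from frequent sets of size k-1 (bottom-up), and every candidate of one level is counted before the next level starts (breadth-first).

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for all frequent itemsets (toy sketch)."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k:
                # Prune step (downward closure): every (k-1)-subset must be frequent.
                if all(frozenset(sub) in frequent
                       for sub in combinations(union, k - 1)):
                    candidates.add(union)
        # Count all level-k candidates before moving on (breadth-first).
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result

# Hypothetical toy data, chosen only for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
]
frequent_itemsets = apriori(transactions, min_support=2)
```

With these four transactions every single item and every pair reaches the support count of 2, while the three-item set appears only once and is rejected at level 3.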

101) A bi-directional search takes advantage of ___ process

a. Bottom-up process
b. Top-down process
c. None
d. Both a & b
Answer: d

102) The Pincer-search has an advantage over the Apriori algorithm when the largest frequent item set is
long.
a. True
b. False
Answer: a

103) MFCS stands for


a. Maximum Frequent Candidate Set
b. Minimal Frequent Candidate Set
c. None of above

Answer: a

104) MFCS helps in pruning the candidate set


a. True
b. False

Answer: a

105) DIC algorithm stands for ___


a. Dynamic itemset counting algorithm
b. Dynamic itself counting algorithm
c. Dynamic item set countless algorithms
d. None of above

Answer: a

106) If the item set is in a dashed circle while completing a full pass it moves towards
a. Dashed circle
b. Dashed box
c. Solid Box
d. Solid circle

Answer: d

107) If the item set is in the dashed box then it moves into a solid box after completing a full pass
a. True
b. False
Answer: a

108) The dashed arrow indicates the movement of the item set
a. True
b. False

Answer: b

109) The vertical arrow indicates the movement of the item set after reaching the frequency threshold
a. True
b. False

Answer: a

110) Frequent set properties are:


a. Downward closure property
b. Upward closure property
c. A & B
d. None of these

Answer: c

111) The property that any subset of a frequent set is also a frequent set is called the


a. Downward closure property
b. Upward closure property
c. a and b

Answer: a

112) Periodic maintenance of a data mart means


a. Loading
b. Refreshing
c. Purging
d. All are true

Answer: d

113) The Fp-tree Growth algorithm was proposed by


a. Srikant
b. Agrawal
c. Han et al.
d. None of these

Answer: c

114) The main idea of the algorithm is to maintain a frequent pattern tree of the data set: an extended
prefix tree structure storing crucial and quantitative information about frequent sets.
a. Apriori Algorithm
b. Pincer Algorithm
c. FP-Tree Growth algorithm
d. All of these

Answer: c

115) Data warehousing and data mining technologies have extensive potential applications in
various central government sectors such as:
a. Agriculture
b. Rural Development
c. Health and Energy
d. All of the above

Answer: d

116) ODS Stands for


a. External operational data sources
b. operational data source
c. output data source
d. none of the above

Answer: a

117) Good performance can be achieved in a data mart environment by extensive use of
a. Indexes
b. creating profile records
c. volumes of data
d. all of the above

Answer: d

118) Features of Fp tree are


(i). It is dependent on the support threshold
(ii). It depends on the ordering of the items
(iii). It depends on the different values of trees
(iv). It depends on frequent itemsets with respect to given information
a. (i) & (ii)
b. (iii) & (iv)
c. (i) & (iii)
d. (ii) only
Answer: a

119) For a list T, we denote head_t as its first element and body_t as the remaining part of the list (the
portion of the list T after removal of head_t). Thus T is
a. {head} {body}
b. {head_t} {body_t}
c. {t_head}{t_body}
d. None of these

Answer: b

120) Partition Algorithm executes in


a. One phase
b. Two-Phase
c. Three phase
d. None of these

Answer: b

121) In the First Phase of the Partition Algorithm


a. Logically divides into a number of non-overlapping partitions
b. Logically divides into a number of overlapping Partitions
c. Not divides into partitions
d. Divides into non-logically and non-overlapping Partitions

Answer: a

122) Functions of the second phase of the partition algorithm are


a. Actual support of item sets are generated
b. Frequent item sets are identified
c. Both (a) & (b)
d. None of these

Answer: c
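
The two phases described in the preceding questions can be sketched as follows. This is a hedged illustration only: the helper names `local_frequent` and `partition_algorithm`, the brute-force local search, the `max_size` cap and the toy transactions are assumptions made for the example, and the minimum support is given as a fraction so that it can be scaled to each partition.

```python
from itertools import combinations
from math import ceil

def local_frequent(partition, min_fraction, max_size=3):
    """Brute-force the itemsets that are frequent inside one partition."""
    threshold = ceil(min_fraction * len(partition))
    items = sorted(set().union(*partition))
    found = set()
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            if sum(1 for t in partition if candidate <= t) >= threshold:
                found.add(candidate)
    return found

def partition_algorithm(transactions, min_fraction, n_partitions):
    transactions = [frozenset(t) for t in transactions]
    size = ceil(len(transactions) / n_partitions)
    # Phase 1: logically split the data into non-overlapping partitions and
    # merge the locally frequent itemsets into a global candidate set.
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]
    global_candidates = set()
    for part in parts:
        global_candidates |= local_frequent(part, min_fraction)
    # Phase 2: count the actual support of every global candidate over the
    # whole data set and keep the itemsets that are really frequent.
    threshold = ceil(min_fraction * len(transactions))
    support = {c: sum(1 for t in transactions if c <= t) for c in global_candidates}
    return {c: s for c, s in support.items() if s >= threshold}

# Hypothetical toy data, chosen only for illustration.
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b", "c"}]
result = partition_algorithm(transactions, min_fraction=0.5, n_partitions=2)
```

Here the itemset {a, b, c} is locally frequent in the second partition, so it enters the global candidate set, but phase 2 finds it in only one of four transactions and discards it.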

123) Partition algorithm is based on the


a. Size of the global Candidate set
b. Size of the local Candidate set
c. Size of frequent itemsets
d. No. Of item sets

Answer: a

124) The Pincer-search algorithm is based on the principle of
a. Bottom-up
b. Top-Down
c. Directional
d. Bi-Directional

Answer: d

125) Pincer-Search Method Algorithm contains


(i) Frequent item set in a bottom-up manner
(ii) Recovery procedure to recover candidates
(iii) List of maximal frequent item sets
(iv) Generate a number of partitions
a. (i) only
b. (i) & (iii) only
c. (i),(iii) & (iv)
d. (i),(ii)&(iii)

Answer: d

126) Is a full-breadth search, where no background knowledge of frequent itemsets is used for
pruning?
a. Level-cross filtering by single item
b. Level-by-level independent
c. Multi-level mining with uniform support
d. Multi-level mining with reduced support

Answer: b

127) Disadvantage of uniform support is


a. Items at lower levels of abstraction are unlikely to occur as frequently as those at higher levels.
b. If the minimum support threshold is set too high, it could miss several meaningful associations
c. Both (a) & (b)
d. None of these

Answer: c

128) The warehouse administrator is responsible for

a. Administration
b. maintenance
c. both a and b
d. none of the above

Answer: c

129) The Pincer-search has an advantage over the Apriori algorithm when the largest frequent itemset is
long
a. True
b. false

Answer: a

130) What are the common approaches to tree pruning?


a. Prepruning and Postpruning approach.
b. Prepruning.
c. Postpruning.
d. None of the above.

Answer: a

131) Tree pruning methods address the problem of ___


a. Overfitting the branches
b. Overfitting the data
c. a and b both
d. None of the above

Answer: b

132) What is the Full Form of MDL.


a. Maximum Description Length
b. Minimum Description Length
c. Mean Described Length
d. Minimum Described Length

Answer: b

133) State whether the following statements are True or False:

(i) The postpruning approach removes branches from a "fully grown" tree.
a. True
b. False

Answer: a

(ii) The "best pruned tree" is the one that maximizes the number of encoding bits.
a. True
b. False

Answer: b

134) Upon halting, the node becomes a ___


a. Heap
b. Subset
c. Leaf
d. Superset

Answer: c

135) Demographic and neural clustering are methods of clustering based on


a. data types
b. methodology of calculation
c. Inter record distance
d. all of the above

Answer: d

136) POS stands for


a. Peer of sale
b. Point of sale
c. part of the sale
d. none of the above

Answer: b

137) Classification and Prediction are two forms of


a. Data analysis
b. Decision Tree
c. A and B
d. None of these

Answer: a

138) Classification predicts


a. Categorical labels
b. Prediction models continuous-valued functions
c. A and B
d. None of these

Answer: c

139) Classification and Prediction have numerous applications:


a. Credit approval
b. Medical diagnosis
c. Performance prediction & selective marketing
d. All of these
Answer: d

140) The step in which the class label of each training sample is provided is known as
a. Unsupervised learning
b. Supervised learning
c. Training samples
d. Clustering

Answer: b

141) Decision tree is based on


a. Bottom-down technique
b. Top-down technique
c. Divide-and-conquer manner
d. Top-down recursive divide-and-conquer manner

Answer: d

142) Recursive Partitioning stops in Decision Tree when


a. All samples for a given node belong to the same class.
b. There are no remaining attributes on which samples may be further partitioned.
c. There are no samples for the branch test.
d. All the above.

Answer: d

143) To select the test attribute of each node in a decision tree we use
a. Entity Selection Measure
b. Data Selection Measure
c. Information Gain Measure
d. None of these

Answer: c

144) Test attribute for the current node in the decision tree is chosen on the basis of
a. Lowest entity gain
b. Highest data gain
c. Highest Information Gain
d. Lowest Attribute Gain

Answer: c
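
A small sketch of how the test attribute could be chosen by highest information gain is given below. The function names, the attribute names (`outlook`, `windy`, `play`) and the four training rows are hypothetical and serve only to illustrate the measure.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Expected information (in bits) needed to classify a sample."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

# Hypothetical training samples, chosen only for illustration.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
]
gains = {a: information_gain(rows, a, "play") for a in ("outlook", "windy")}
best = max(gains, key=gains.get)  # attribute with the highest information gain
```

In this toy data `outlook` separates the classes perfectly (gain of 1 bit), whereas `windy` leaves both branches evenly mixed (gain of 0), so `outlook` would be selected as the test attribute.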

145) Advantage of the Information-theoretic approach of the decision tree is


a. Minimizes the expected number of tests needed
b. Minimizes the number of Nodes
c. Maximizes the number of nodes
d. Maximizes the number of tests

Answer: a

146) Let si be the number of samples of S in class Ci, and let pi = si/s be the probability that a
sample belongs to class Ci. The expected information needed to classify a given sample is given by
a. I(s1, s2, ..., sm) = Σ log2(pi)
b. I(s1, s2, ..., sm) = −Σ pi log2(pi)
c. I(s1, s2, ..., sm) = Σ pi log2(x)
d. I(s1, s2, ..., sm) = Σ pi log2(pi)

Answer: b
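
As a worked check of the formula in option b, the following sketch computes the expected information from a list of class counts. The function name `expected_information` and the counts 9 and 5 are assumptions chosen for illustration.

```python
from math import log2

def expected_information(class_counts):
    """Return -sum(p_i * log2(p_i)) with p_i = s_i / s, in bits."""
    total = sum(class_counts)
    # Classes with zero samples contribute nothing (0 * log2(0) is taken as 0).
    return -sum((s / total) * log2(s / total) for s in class_counts if s > 0)

# Hypothetical example: 9 samples in one class and 5 in the other.
info = expected_information([9, 5])
```

For counts of 9 and 5 the result is about 0.940 bits. An even split such as 7 and 7 gives exactly 1 bit, and a pure node gives 0.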

147) Steps applied to the data in order to improve the accuracy, efficiency, and scalability are:-
a. Data cleaning
b. Relevance analysis
c. Data transformation
d. All of the above

Answer: d

148) The process used to remove or reduce noise and the treatment of missing values
a. Data cleaning
b. Relevance analysis
c. Data transformation
d. None of above

Answer: a

149) Relevance analysis may be performed on the data by removing any irrelevant attribute from the
process.
a. True
b. False

Answer: a

150) Classification and prediction method can be affected by:-


a. Accuracy & Speed
b. Robustness & Scalability
c. Interpretability
d. All of the above
Answer: d

151) In a decision tree internal node denotes a test on an attribute and Leaf nodes represent classes or
class distributions
a. True
b. false

Answer: a

152) ___ attempts to identify and remove branches, with the goal of improving accuracy
a. decision tree
b. tree pruning
c. both of them
d. none of above

Answer: b

153) To deal with larger data sets, a sampling-based method can be used, called ___
a. Clara
b. Dara
c. Pam
d. None

Answer: a

154) What is the Full Form of CLARA.


a. Clustering Large Applicant
b. Close Large Applicant
c. Clustering Large Applications
d. None of the above

Answer: c

155) What is the Full Form of CLARANS.


a. Clustering Large Applications Based Upon Randomized Search
b. Close Large Applicant Based Upon Role Search
c. Clustering Large Applicant Based Upon Randomized Search
d. None of the above

Answer: a

156) Which algorithm was proposed that combines the sampling technique with PAM?
a. CLARA
b. CLARANS
c. Both a and b
d. None of these.

Answer: b

157) Which are the two types of Hierarchical Clustering?


a. Agglomerative Hierarchical Clustering and Density Hierarchical Clustering
b. Agglomerative Hierarchical Clustering and Divisive Hierarchical Clustering
c. Divisive Hierarchical Clustering and Density Hierarchical Clustering
d. None of the above

Answer: b

158) Which of the following statements about clusters and clustering is true?
a. The process of grouping a set of physical or abstract objects into classes of similar objects is
called clustering.
b. A cluster of data objects can be treated collectively as one group in many applications
c. Cluster analysis is an important human activity.
d. All of the above

Answer: d

159) Cluster analysis tools based on


a. K-means
b. K-medoids
c. A and B
d. None of these

Answer: c

160) The S-Plus, SPSS and SAS software packages are used for


a. Data Mining
b. Classification
c. Clustering
d. Prediction

Answer: c

161) Unsupervised learning is an example of


a. Classification and prediction
b. Classification and Regression
c. clustering
d. Data Mining
Answer: c

162) The requirements of clustering in data mining include


a. Scalability
b. Ability to deal with different types of attributes
c. Ability to deal with noisy data
d. Discovery of clusters with arbitrary shape
e. Minimal requirement for domain knowledge to determine input parameters
f. Insensitivity to the order of input records
g. High dimensionality
h. Constraint-based clustering
(a). a, c, d, f
(b). g, h
(c). All of these
(d). None of these

Answer: c

163) Clustering methods can be classified into


a. Partitioning Methods
b. Hierarchical methods
c. Density-based methods
d. All of these

Answer: d

164) Hierarchical methods can be classified into


a. Agglomerative Approach
b. Divisive Approach
c. A and B
d. None of these

Answer: c

165) The agglomerative approach is also called the


a. Bottom-up Approach
b. Top-Down Approach
c. A and B
d. None of these

Answer: a

166) Top-Down Approach is


a. Agglomerative Approach
b. Divisive Approach
Answer: b

167) Drawback of Hierarchical Methods


a. Suffer from the fact that once a step is done, it can never be undone.
b. They cannot correct erroneous decisions.
c. Both a & b
d. None of these

Answer: c

168) Two approaches to improving the quality of hierarchical clustering:


a. Perform careful analysis of object “linkages” at each hierarchical partitioning, such as in CURE
and Chameleon
b. Integrate Hierarchical agglomeration and iterative relocation by first using a hierarchical
agglomerative algorithm and refining the result using an iterative relocation
c. Both a & b
d. None of these

Answer: c

169) Classical partitioning methods are


a. k-means and k-median
b. k-means and k-medoids
c. k-modes only
d. none of these

Answer: b

170) K-means technique is based on


a. Centroid Object
b. Reference object
c. Representative object
d. Partition Object

Answer: a

171) K-medoids technique is based on


a. Centroid Object
b. Representative object
c. Partition Object
d. None of these

Answer: b
172) The k-means and the k-modes methods can be integrated to cluster data with mixed numeric and
categorical values, resulting in
a. k-median method
b. k-partition method
c. k-prototypes method
d. k-medoids method

Answer: c

173) The squared-error criterion used in the k-means method is defined as
(p is a point in cluster Ci and mi is the mean of cluster Ci)

a. E = Σ(i=1..k) Σ(p∈Ci) |p − mi|
b. E = Σ(i=1..k) Σ(p∈Ci) |mi|²
c. E = Σ(i=1..k) Σ(p∈Ci) |p|²
d. E = Σ(i=1..k) Σ(p∈Ci) |p − mi|²

Answer: d
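
The criterion in option d can be evaluated directly for a given clustering. The sketch below is illustrative only: the function name `squared_error` and the two small clusters are assumptions, and points are plain tuples so that no external library is needed.

```python
def squared_error(clusters):
    """Sum over clusters of squared distances from each point to its cluster mean."""
    total = 0.0
    for points in clusters:
        dim = len(points[0])
        # m_i: the mean (centroid) of the cluster, computed per coordinate.
        mean = tuple(sum(p[d] for p in points) / len(points) for d in range(dim))
        # Add |p - m_i|^2 for every point p in the cluster.
        total += sum(sum((p[d] - mean[d]) ** 2 for d in range(dim)) for p in points)
    return total

# Hypothetical clustering with two clusters of two points each.
clusters = [[(1.0, 1.0), (3.0, 1.0)], [(10.0, 10.0), (10.0, 14.0)]]
E = squared_error(clusters)
```

The first cluster has mean (2, 1) and contributes 1 + 1 = 2. The second has mean (10, 12) and contributes 4 + 4 = 8, so E is 10.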

174) The Computational Complexity of the k-means method algorithm is


a. O(log x)
b. Θ(nkt)
c. O(nkt)
d. Θ(log x)

Answer: c

175) Which method is more robust: k-means or k-medoids?


a. The k means is more robust in the presence of noise
b. The k-medoids method is more robust in the presence of noise and outliers
c. The k-medoids method is more robust due to no. of partitions
d. The k means is more robust due to its less complexity

Answer: b
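
The effect of an outlier on the two kinds of cluster representative can be seen in a one-dimensional sketch. The helper names `mean` and `medoid` and the sample values are assumptions chosen for illustration; a full k-medoids algorithm such as PAM does considerably more than this.

```python
def mean(points):
    """Centroid used by k-means: the arithmetic mean of the points."""
    return sum(points) / len(points)

def medoid(points):
    """Representative used by k-medoids: the actual data point with the
    smallest total distance to all other points."""
    return min(points, key=lambda c: sum(abs(c - p) for p in points))

# Hypothetical cluster with one extreme outlier (100.0).
values = [1.0, 2.0, 3.0, 4.0, 100.0]
centroid = mean(values)
representative = medoid(values)
```

The single outlier drags the mean to 22.0, far from most of the data, while the medoid stays at 3.0.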

176) First k-medoids algorithm introduced is


a. Prototype Above Medoids
b. Partition Below Medoids
c. Prototype Around Medoids
d. Partitioning Around Medoids

Answer: d

177) PAM stands for


a. Prototype Above Medoids
b. Prototype Around Means
c. Partitioning Around Medoids
d. Partitioning Above Means
Answer: c

178) Which statements are true for k-means?


(i). It can apply only when the mean of the cluster is defined.
(ii). It is not suitable for discovering clusters with non-convex shapes
(iii). This method is relatively efficient in processing only small data.
a. (i) only
b. (i) & (ii) only
c. (iii) only
d. All the above

Answer: b

179) DBSCAN stands for:


a. Divisive Based Clustering Method
b. Density-Based Clustering Method
c. Both a & b
d. None of above

Answer: b

180) DBSCAN defines a cluster as a maximal set of density-connected points
a. True
b. False

Answer: a

181) For a non-negative value ε, Nε(Oi) = { Oj ∈ D | d(Oi, Oj) ≤ ε }


a. True
b. false

Answer: a

182) The ___ client is a desktop that relies on the server to which it is connected for the majority of its
computing power.
a. thin
b. none
c. thick
d. web server

Answer: a

183) An object O is said to be a core object if
a. |Nε(O)| ≥ MinPts
b. |N(O)| ≥ MaxPts
c. none of above
d. both a & b

Answer: a
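
The core-object condition can be checked with a short sketch. The function names `eps_neighborhood` and `is_core_object`, the sample points and the parameter values are assumptions for illustration. The neighbourhood here includes the object itself, and Euclidean distance is used.

```python
from math import dist

def eps_neighborhood(points, o, eps):
    """All points of the data set within distance eps of object o (o included)."""
    return [p for p in points if dist(o, p) <= eps]

def is_core_object(points, o, eps, min_pts):
    """o is a core object if its eps-neighbourhood holds at least min_pts points."""
    return len(eps_neighborhood(points, o, eps)) >= min_pts

# Hypothetical two-dimensional data set.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (1.2, 0.0), (5.0, 5.0)]
core = is_core_object(points, (0.0, 0.0), eps=1.0, min_pts=3)
border = is_core_object(points, (1.2, 0.0), eps=1.0, min_pts=3)
noise = is_core_object(points, (5.0, 5.0), eps=1.0, min_pts=3)
```

With ε = 1.0 and MinPts = 3, the point (0, 0) is a core object. The point (1.2, 0) is not, although it lies within ε of the core object (0.5, 0), which makes it a border object. The isolated point (5, 5) is neither core nor border.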

184) The density-reachability relation is transitive but not symmetric


a. True
b. False

Answer: a

185) Non-core objects are:-


a. border object
b. noise object
c. non-object
d. both a & b

Answer: d

186) DBSCAN algorithm can classify into:


a. classified
b. unclassified
c. noise
d. all of above

Answer: d

187) Unsupervised learning is an example of


a. Classification and prediction
b. Classification and Regression
c. clustering
d. Data Mining

Answer: c

188) Data can be classified as


a. reference data
b. transaction data and derived data
c. derived data
d. all of the above
Answer: d

189) Reference and transaction data originates from


a. operational system
b. Unnormalized data
c. data marts
d. all are true

Answer: a

190) Derived data is derived from


a. reference data
b. transaction data
c. reference and transaction data
d. none of the above

Answer: c

191) Unnormalized data, which is the basis for online analytical processing tools, is prepared
periodically but is directly based on detailed ___.
a. reference data
b. transaction data
c. reference and transaction data
d. none of the above

Answer: a

192) The data mart is loaded with data from a data warehouse by means of a ___
a. load program
b. process
c. project
d. all are valid

Answer: a

193) The chief considerations for a Load program are:


a. frequency and schedule
b. total or partial refreshment
c. customization and re-sequencing
d. all are true

Answer: d
194) Periodic maintenance of a data mart means
a. all are true
b. loading
c. refreshing
d. purging

Answer: a

195) Detailed-level data, summary-level data, preprocessed data and ad hoc data are data in a
a. data warehouse
b. data mart
c. both
d. none of the above

Answer: b

196) Data sources in the data warehouse are referred to as


a. External data source
b. Operational data source
c. External operational data source
d. none of the above

Answer: c

197) ___ tables help and enable the end-users of the data mart to relate the data to its expanded version.
a. data
b. reference
c. both a and b
d. none of the above

Answer: b
