Identify the unstructured data from the following Image

What kind of classification is our case study 'Spam Detection'? Binary

Which preprocessing technique is used to remove the most commonly used words? Stopword removal
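
As a minimal sketch of stopword removal, the snippet below filters tokens against a small hand-written list. The `STOPWORDS` set and the `remove_stopwords` helper are illustrative assumptions, not part of the quiz material; real projects usually take the list from an NLP library.

```python
# Toy stopword list (an assumption for illustration only).
STOPWORDS = {"the", "it", "is", "a", "an", "and", "to", "of", "in"}

def remove_stopwords(text):
    """Lowercase the text, split on whitespace, and drop tokens in STOPWORDS."""
    return [tok for tok in text.lower().split() if tok not in STOPWORDS]

tokens = remove_stopwords("The offer is free and it expires today")
print(tokens)  # ['offer', 'free', 'expires', 'today']
```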

Cross-validation is a technique used to evaluate a classifier by dividing the data set into a training set to train
the classifier and a testing set to test it. T
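
One common form of this idea is k-fold cross-validation, where the train/test division is repeated so that every sample is tested once. The sketch below only generates the index splits; `k_fold_indices` is a hypothetical helper written for this example, not a library function.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each sample lands in exactly one test fold."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(6, 3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

A classifier would be trained on each `train_idx` and scored on the matching `test_idx`, and the scores averaged.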

True Negative is when the predicted instance and the actual instance are both positive. F

True Positive is when the predicted instance and the actual instance are both not negative. T
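
To make the true/false positive/negative terms concrete, here is a small sketch that counts the four outcomes for a binary problem. The `confusion_counts` helper and the sample labels are made up for illustration.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP, FN by comparing each prediction with the actual label."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted positive, actually positive
        elif p != positive and a != positive:
            tn += 1          # predicted negative, actually negative
        elif p == positive:
            fp += 1          # predicted positive, actually negative
        else:
            fn += 1          # predicted negative, actually positive
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

actual    = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
```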

ITPE

Data Analysis -> PreProcessing -> Model Building -> Predict

A classifier that can compute using numeric as well as categorical values is Decision Tree Classifier

print(sentiment_analysis_data['label'].unique()) 10

Which of the given hyperparameter(s), when increased, may cause random forest to overfit the data?
Depth of Tree

Choose the correct sequence for classifier building from the following: Initialize -> Train -> Predict -> Evaluate
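
The Initialize -> Train -> Predict -> Evaluate sequence can be sketched with scikit-learn as follows. The tiny spam/ham texts and labels are invented for the example, and the choice of `CountVectorizer` with `DecisionTreeClassifier` is an assumption, not necessarily what the case study used.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: 1 = spam, 0 = not spam.
train_texts = ["win free prize now", "free cash offer", "meeting at noon", "lunch with team"]
train_labels = [1, 1, 0, 0]
test_texts = ["free prize offer", "team meeting"]
test_labels = [1, 0]

# Turn raw text into word-count features (fit on training data only).
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

clf = DecisionTreeClassifier(random_state=0)          # Initialize
clf.fit(X_train, train_labels)                        # Train
predictions = clf.predict(X_test)                     # Predict
accuracy = accuracy_score(test_labels, predictions)   # Evaluate
print(predictions, accuracy)
```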

Clustering is a supervised classification. False

Classification where each data point is mapped to more than one class is called Multi Label Classification

To view the first 3 rows of the dataset, which of the following commands are used?
sentiment_analysis_data.head(3)

Imagine you have just finished training a decision tree for spam classification and it is showing abnormally
bad performance on both your training and test sets. Assume that your implementation has no bugs.
What could be the reason for this problem? The decision tree is too shallow (it underfits the data).

Which NLP technique uses lexical knowledge base to obtain the correct base form of the words?
lemmatization

Which one of the following is not a classification technique? StratifiedShuffleSplit


Supervised learning differs from unsupervised learning in that supervised learning requires Labeled data

Model Tuning helps to increase the accuracy True

Identify the stop words from the following Both "the" and "it"

In a Term Document Matrix (TDM) each row represents document

TF-IDF is a feature extraction technique. T

Which of the following is not a performance evaluation measure? DecisionTree

Which of the following commands is used to view the dataset size, and what is the value returned?
sentiment_analysis_data.shape, (7086, 3)
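
For reference, a short pandas sketch of the inspection commands. In pandas, `shape` returns a `(rows, columns)` tuple, while `size` returns the total number of cells as a single integer. The four-row frame below is a made-up stand-in for the real 7086-row dataset.

```python
import pandas as pd

# Hypothetical miniature version of the dataset (the real one has 7086 rows).
sentiment_analysis_data = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "text": ["great movie", "awful plot", "loved it", "boring"],
    "label": [1, 0, 1, 0],
})

print(sentiment_analysis_data.head(3))  # first 3 rows
print(sentiment_analysis_data.shape)    # (4, 3)  -> (rows, columns)
print(sentiment_analysis_data.size)     # 12      -> rows * columns
```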

What is the purpose of lemmatization? To convert words to a proper base form

Lemmatization offers better precision than stemming. T

The fit(X, y) is used to Train the Classifier

What does the command sentiment_analysis_data['label'].value_counts() return? The total count of
elements in 'label' column
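
A small sketch of what these calls produce in pandas: `value_counts()` gives one count per distinct label (the counts add up to the number of non-null elements), and `unique()` lists the distinct labels in order of first appearance. The sample labels are invented for the example.

```python
import pandas as pd

# Hypothetical 'label' column: 1 = positive, 0 = negative.
labels = pd.Series([1, 0, 1, 1, 0], name="label")

counts = labels.value_counts()   # one row per distinct label, most frequent first
print(counts)
print(counts.sum())              # total number of non-null labels
print(labels.unique())           # distinct labels in order of first appearance
```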

Can we consider sentiment classification as a text classification problem? T

Inverse Document Frequency is used in term document matrix. F

Pruning is a technique associated with Decision Tree

email spam data is an example of Unstructured Data

Select pre-processing techniques from the options All

High classification accuracy always indicates a good classifier. F

Which type of cross validation is used for imbalanced dataset? Stratified Shuffle Split
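
A brief sketch of scikit-learn's `StratifiedShuffleSplit` on an imbalanced toy label vector. The data (8 negatives, 2 positives) and the parameter values are assumptions chosen so that each split keeps the 80/20 class ratio.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical imbalanced data: 8 samples of class 0, 2 samples of class 1.
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)

splitter = StratifiedShuffleSplit(n_splits=3, test_size=0.5, random_state=0)
splits = list(splitter.split(X, y))
for train_idx, test_idx in splits:
    # Each test set should contain 4 samples of class 0 and 1 of class 1.
    print("test labels:", y[test_idx])
```

A plain random split could easily put both minority samples on one side; stratification avoids that.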

Stemming and lemmatization give the same result. F
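
The difference can be illustrated with a deliberately simplified sketch. `naive_stem` is a crude suffix stripper (not the Porter algorithm), and `LEMMA_LOOKUP` is a hypothetical mini-lexicon standing in for a real lexical knowledge base such as WordNet; both are assumptions made for this example.

```python
def naive_stem(word):
    """Crude suffix stripping, for illustration only."""
    for suffix in ("ies", "ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

# Hypothetical mini-lexicon mapping inflected forms to dictionary base forms.
LEMMA_LOOKUP = {"studies": "study", "better": "good", "running": "run", "mice": "mouse"}

def toy_lemmatize(word):
    """Look the word up in the lexicon; fall back to the word itself."""
    return LEMMA_LOOKUP.get(word, word)

for w in ["studies", "running", "mice"]:
    print(w, "->", naive_stem(w), "|", toy_lemmatize(w))
```

The stemmer produces fragments such as "stud" and "runn", while the lexicon lookup returns real words, which is why the two techniques generally disagree.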

Which numerical statistics is used to identify the importance of a rare word in a document? tf-idf
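
As a worked sketch of why tf-idf highlights rare words, the code below uses one simple, unsmoothed variant: tf = count / document length and idf = ln(N / df). Libraries often use smoothed formulas, so exact numbers will differ. The three toy documents are invented for the example.

```python
import math

# Hypothetical tokenized corpus.
docs = [
    "free prize free offer".split(),
    "meeting offer today".split(),
    "team meeting today".split(),
]

def tf(term, doc):
    """Term frequency: share of the document's tokens equal to term."""
    return doc.count(term) / len(doc)

def idf(term, docs):
    """Inverse document frequency; assumes term occurs in at least one document."""
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df)

def tf_idf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

# "prize" appears in one document, "offer" in two, so "prize" scores higher.
print(tf_idf("prize", docs[0], docs))
print(tf_idf("offer", docs[0], docs))
```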