
Natural Language Processing

Rahul Sahai
December 2020
Natural language processing is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data.

NATURAL LANGUAGE PROCESSING


SENTIMENT ANALYSIS
• Suppose you have a very popular application with around a billion users, and you decide to add a new feature. How will you get feedback on it? How will you estimate how useful it is to your customers?

• Sentiment analysis using ML lets you gather and analyze customer feedback automatically. You can choose whether to ask every customer for feedback, or only those who have indicated they have an issue that needs fixing.

• You can also choose whether to gather it at the point of purchase, after delivery, or both. You can then join that data with your customers' purchase history to get a complete view of:
• how happy your customers are,
• how valuable they are to you, and
• why they are happy, areas you could improve, issues they have faced, and much more.

COMPUTATIONALLY IDENTIFYING AND
CATEGORIZING OPINIONS

Sentiment analysis is the process of computationally identifying and categorizing opinions expressed in a piece of text, especially to determine whether the writer's attitude towards a particular topic, product or feature is positive, negative or neutral. Sentiment analysis is also called:
• Opinion extraction
• Opinion mining
• Sentiment mining
• Subjectivity analysis
We now proactively express our emotional states on social media, which gives NLP algorithms more data to learn from.
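As a rough illustration of sorting text into positive, negative or neutral, here is a minimal lexicon-based sketch in Python (the deck itself works in R). The word list and the zero threshold are hypothetical choices for illustration only; practical systems use large curated lexicons or trained models.

```python
import re

# Hypothetical mini-lexicon for illustration only: +1 = positive, -1 = negative.
LEXICON = {
    "love": 1, "wonderful": 1, "comfortable": 1, "flattering": 1, "great": 1,
    "cheap": -1, "flaw": -1, "flaws": -1, "disappointed": -1, "poor": -1,
}


def classify(text: str, threshold: int = 0) -> str:
    """Label text as positive, negative or neutral from its summed word polarities."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(word, 0) for word in words)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


print(classify("Absolutely wonderful - silky and sexy and comfortable"))  # positive
```

Summing word polarities ignores context entirely, so "not wonderful" would still count as positive; handling that is exactly what valence shifters are for.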
GETTING DEEPER INTO
SENTIMENT ANALYSIS
USING COMPLEX
UNSUPERVISED
LEARNING ALGORITHMS

LET’S DIVE IN

SENTIMENTR ALGORITHM

• sentimentr attempts to take valence shifters (i.e., negators, amplifiers (intensifiers), de-amplifiers (downtoners), and adversative conjunctions) into account while calculating sentiment scores.
• So what are these valence shifters?
• A negator flips the sign of a polarized word (e.g., “I do not like it.”). An amplifier (intensifier) increases the impact of a polarized word (e.g., “I really like it.”).
• A de-amplifier (downtoner) reduces the impact of a polarized word (e.g., “I hardly like it.”). An adversative conjunction overrules the previous clause containing a polarized word (e.g., “I like it but it’s not worth it.”).
• Each sentence's sentiment score is computed from its polarized words, adjusted by the valence shifters around them.
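The valence-shifter rules above can be sketched in a few lines of Python. This is a simplified illustration, not sentimentr's actual algorithm: the lexicons, the weights (0.8 for amplifiers, -0.5 for de-amplifiers) and the four-word look-back window are hypothetical, and the adversative rule is reduced to "only the clause after the last but/however/although counts". sentimentr itself uses much larger dictionaries, looks at words on both sides of each polarized word, and normalizes by sentence length.

```python
import re

# Hypothetical toy lexicons and weights; sentimentr's are far larger and tuned.
POLARITY = {"like": 1.0, "love": 1.0, "worth": 1.0, "good": 1.0,
            "hate": -1.0, "bad": -1.0}
NEGATORS = {"not", "never", "no", "don't", "isn't"}
AMPLIFIERS = {"really": 0.8, "very": 0.8, "extremely": 0.8}
DEAMPLIFIERS = {"hardly": -0.5, "barely": -0.5, "slightly": -0.5}
ADVERSATIVES = {"but", "however", "although"}
WINDOW = 4  # number of words inspected before each polarized word


def sentence_score(sentence: str) -> float:
    """Sum polarized words, each adjusted by the valence shifters just before it."""
    words = re.findall(r"[a-z']+", sentence.lower())
    # Adversative conjunction: keep only the clause after the last one.
    for i in range(len(words) - 1, -1, -1):
        if words[i] in ADVERSATIVES:
            words = words[i + 1:]
            break
    total = 0.0
    for i, word in enumerate(words):
        if word not in POLARITY:
            continue
        context = words[max(0, i - WINDOW):i]
        negations = sum(w in NEGATORS for w in context)
        # Amplifiers raise the weight, de-amplifiers lower it.
        weight = 1.0 + sum(AMPLIFIERS.get(w, 0.0) + DEAMPLIFIERS.get(w, 0.0)
                           for w in context)
        # An odd number of negators flips the sign; an even number cancels out.
        sign = -1.0 if negations % 2 else 1.0
        total += sign * weight * POLARITY[word]
    return total
```

With these rules, “I do not like it.” scores -1.0, “I hardly like it.” scores 0.5, and “I like it but it’s not worth it.” scores -1.0 because only the clause after “but” is counted. In R, sentimentr's `sentiment()` function returns the package's own per-sentence scores.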
PARSING

Within computational linguistics the term is used to refer to the formal analysis by a computer of a sentence or other string of words into its constituents, resulting in a parse tree showing their syntactic relation to each other, which may also contain semantic and other information. Some parsing algorithms may generate a parse forest or list of parse trees for a syntactically ambiguous input.
The term is also used in psycholinguistics when describing language comprehension. In this context, parsing refers to the way that human beings analyze a sentence or phrase (in spoken language or text) “in terms of grammatical constituents, identifying the parts of speech, syntactic relations, etc.” This term is especially common when discussing what linguistic cues help speakers to interpret garden-path sentences.
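To make the idea of a parse forest concrete, here is a minimal CKY chart parser in Python. The grammar and lexicon are hypothetical toys in Chomsky normal form; instead of building trees, the chart counts how many distinct parse trees cover each span, so a syntactically ambiguous input yields a count above one.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),
    ("PP", "P", "NP"),
]
LEXICON = {"i": {"NP"}, "saw": {"V"}, "the": {"Det"}, "man": {"N"},
           "telescope": {"N"}, "with": {"P"}}


def count_parses(sentence: str, start: str = "S") -> int:
    """CKY chart parser that counts distinct parse trees (the size of the parse forest)."""
    words = sentence.lower().split()
    n = len(words)
    # chart[(i, j)][label] = number of ways label can derive words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for label in LEXICON.get(word, ()):
            chart[(i, i + 1)][label] += 1
    # Fill longer spans from shorter ones, trying every split point k.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for parent, left, right in BINARY_RULES:
                    ways = chart[(i, k)][left] * chart[(k, j)][right]
                    if ways:
                        chart[(i, j)][parent] += ways
    return chart[(0, n)][start]
```

For “I saw the man with the telescope” the count is 2: the prepositional phrase can attach to the verb phrase (the seeing was done with the telescope) or to the noun phrase (the man has the telescope).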
LET’S TAKE AN EXAMPLE: WOMEN’S E-COMMERCE REVIEWS
TAKING A LOOK AT THE WOMEN’S CLOTHING E-COMMERCE REVIEWS DATASET
https://www.kaggle.com/nicapotato/womens-ecommerce-clothing-reviews?select=Womens+Clothing+E-Commerce+Reviews.csv

THE DATASET (GLIMPSE)
| Age | Title | Review Text | Rating | Recommended IND | Positive Feedback Count | Division Name | Department Name | Class Name |
|---|---|---|---|---|---|---|---|---|
| 33 | | Absolutely wonderful - silky and sexy and comfortable | 4 | 1 | 0 | Initmates | Intimate | Intimates |
| 34 | | Love this dress! it's sooo pretty. i happened to find it in a store, and i'm glad i did bc i never would have ordered it online bc it's petite. i bought a petite and am 5'8". i love the length on me- hits just a little below the knee. would definitely be a true midi on someone who is truly petite. | 5 | 1 | 4 | General | Dresses | Dresses |
| 60 | Some major design flaws | I had such high hopes for this dress and really wanted it to work for me. i initially ordered the petite small (my usual size) but i found this to be outrageously small. so small in fact that i could not zip it up! i reordered it in petite medium, which was just ok. overall, the top half was comfortable and fit nicely, but the bottom half had a very tight under layer and several somewhat cheap (net) over layers. imo, a major design flaw was the net over layer sewn directly into the zipper - it c | 3 | 0 | 0 | General | Dresses | Dresses |
| 50 | My favorite buy! | I love, love, love this jumpsuit. it's fun, flirty, and fabulous! every time i wear it, i get nothing but great compliments! | 5 | 1 | 0 | General Petite | Bottoms | Pants |
| 47 | Flattering shirt | This shirt is very flattering to all due to the adjustable front tie. it is the perfect length to wear with leggings and it is sleeveless so it pairs well with any cardigan. love this shirt!!! | 5 | 1 | 6 | General | Tops | Blouses |
| 49 | Not for the very petite | I love tracy reese dresses, but this one is not for the very petite. i am just under 5 feet tall and usually wear a 0p in this brand. this dress was very pretty out of the package but its a lot of dress. the skirt is long and very full so it overwhelmed my small frame. not a stranger to alterations, shortening and narrowing the skirt would take away from the embellishment of the garment. i love the color and the idea of the style but it just did not work on me. i returned this dress. | 2 | 0 | 4 | General | Dresses | Dresses |
COMMENCING INITIAL ANALYSIS

• As we can see, Tops has the highest percentage of reviews and ratings in our dataset, followed by Dresses.
• The Jackets and Trend departments have received the fewest reviews.
• Next, we observe the distribution of ratings within each department.
WE ANALYZE THE % RATING OF EACH DEPARTMENT (BOTTOMS, DRESSES, INTIMATE, JACKETS, TOPS AND TREND)

We observe that Trend accounts for a very small share of the reviews (119/23486 = 0.51%), so I have decided to discard it.
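The department shares and per-department rating percentages can be sketched in plain Python. The column names `Department Name` and `Rating` match the dataset header shown in the glimpse; the 1% cut-off used to drop small departments is a hypothetical choice, since the deck only states that Trend (0.51%) was discarded.

```python
from collections import Counter, defaultdict


def department_shares(rows):
    """Percentage of all reviews that fall in each department."""
    counts = Counter(row["Department Name"] for row in rows)
    total = sum(counts.values())
    return {dept: 100.0 * n / total for dept, n in counts.items()}


def rating_distribution(rows, min_share=1.0):
    """Per-department percentage of each star rating, skipping small departments.

    min_share is a hypothetical cut-off in percent (Trend, at 0.51%, falls below 1%).
    """
    shares = department_shares(rows)
    by_dept = defaultdict(Counter)
    for row in rows:
        dept = row["Department Name"]
        if shares[dept] >= min_share:
            by_dept[dept][int(row["Rating"])] += 1
    return {
        dept: {star: 100.0 * n / sum(stars.values()) for star, n in sorted(stars.items())}
        for dept, stars in by_dept.items()
    }
```

The rows could come from `csv.DictReader` over the downloaded CSV, which yields one dict per review keyed by the header names.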
JACKETS AND BOTTOMS HAVE RECEIVED THE
HIGHEST RATINGS
• In every department, 5-star ratings dominate and make up the largest share.
• The department with the highest share of 5-star ratings is Jackets (61.14%), followed by Bottoms (59.83%) and Intimate (59.54%).
AGE 26-39 LEAVES THE GREATEST NUMBER OF REVIEWS
• Splitting reviews by age group:
• The 26-39 group leaves the greatest number of reviews, followed by the 40-55 group.
• We ignore the 66-99 group, since it contributes too few reviews to be meaningful.
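Bucketing reviewers by age can be sketched as follows. The 26-39, 40-55 and 66-99 bands come from the slide; the 18-25 and 56-65 bands are assumptions made to cover the remaining ages, and the function names are hypothetical.

```python
from collections import Counter

# 26-39, 40-55 and 66-99 are from the slide; 18-25 and 56-65 are assumed to fill the gaps.
AGE_GROUPS = [(18, 25), (26, 39), (40, 55), (56, 65), (66, 99)]


def age_group(age):
    """Return the label of the band containing age, or None if it is out of range."""
    for low, high in AGE_GROUPS:
        if low <= age <= high:
            return f"{low}-{high}"
    return None


def reviews_per_group(ages):
    """Count reviews per age band, most frequent first, ignoring out-of-range ages."""
    counts = Counter(age_group(age) for age in ages)
    counts.pop(None, None)
    return counts.most_common()
```

Feeding it the `Age` column gives the ranking the slide describes; a group such as 66-99 can then be dropped if its count is too small.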

NOTE: we are excluding Trend again since, as explained previously, we will not use it.
GTRENDSR: PERFORM AND DISPLAY GOOGLE TRENDS QUERIES FOR CATEGORIES OF DRESSES, JEANS, KNITS & PANTS
• We selected the 4 words that I judged to have the most records in the dataset: "dresses", "knits", "jeans" and "pants".
• By loading the gtrendsR library, we can study Google searches for these words.
• We observe that dresses and jeans have a high degree of interest, closely followed by pants.
• Knits, by contrast, has almost zero interest.
WE WILL MAKE SOME GRAPHS ACCORDING TO OUR INTERESTS (1)
WE WILL MAKE SOME GRAPHS ACCORDING TO OUR INTERESTS (2)
SENTIMENT ANALYSIS: MOST USED WORDS
SENTIMENT ANALYSIS: EMOTIONS BY PRODUCT RATING
MOST COMMON BIGRAMS (BY RATINGS)
• With the help of some Kaggle users and kernels, and this link: https://www.tidytextmining.com/tidytext.html#word-frequencies, I have tried to build a word cloud and to explore other aspects of text mining.
• First, we separate reviews without a title from those with a title, since the dataset contains both:
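Counting the most common bigrams per rating can be sketched in Python, loosely following the tidytext approach linked above: split each review into adjacent word pairs, then drop any pair that contains a stop word. The short stop-word list is a hypothetical stand-in for tidytext's much larger `stop_words` table.

```python
import re
from collections import Counter, defaultdict

# Hypothetical small stop-word list; tidytext's stop_words table is far larger.
STOP_WORDS = {"a", "an", "and", "the", "it", "is", "i", "this", "to",
              "of", "in", "on", "so", "with"}


def bigrams(text):
    """Adjacent word pairs, dropping any pair that contains a stop word."""
    words = re.findall(r"[a-z']+", text.lower())
    return [(a, b) for a, b in zip(words, words[1:])
            if a not in STOP_WORDS and b not in STOP_WORDS]


def top_bigrams_by_rating(reviews, n=3):
    """Map each star rating to its n most common bigrams.

    reviews: iterable of (rating, review_text) pairs.
    """
    counts = defaultdict(Counter)
    for rating, text in reviews:
        counts[rating].update(bigrams(text))
    return {rating: c.most_common(n) for rating, c in counts.items()}
```

Filtering after pairing, rather than removing stop words first, avoids creating pairs of words that were never adjacent in the review.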
NETWORK OF WORDS ASSOCIATED WITH POSITIVE (FIVE) AND NEGATIVE (ONE) RATINGS
(Figure panels: One, Five)
WORD CLOUDS: POSITIVE AND NEGATIVE SENTIMENTS
(Figure panels: Five, One)
LINEAR DISCRIMINANT ANALYSIS
(Figure panels: Five, One)
USING TEXT MINING
AND SENTIMENT
ANALYSIS FOR WORD
CLOUD

• Text mining and some plotting packages are not installed by default, so you have to install them manually. The relevant packages are:
• tm – the text mining package
• SnowballC – required for stemming
• ggplot2 – plotting capabilities
• wordcloud – which is self-explanatory
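The usual tm cleaning steps (lower-casing, removing punctuation, numbers and stop words, stripping extra whitespace), followed by counting term frequencies, which is what a word cloud is drawn from, can be sketched in Python. The stop-word list is a small hypothetical stand-in for tm's `stopwords("english")`, and stemming (the SnowballC step) is omitted for brevity.

```python
import re
import string
from collections import Counter

# Hypothetical small stop-word list standing in for tm's stopwords("english").
STOP_WORDS = {"a", "an", "and", "the", "it", "is", "i", "this", "to", "of",
              "in", "on", "so", "for", "but", "my", "me", "was", "with"}


def clean(text):
    """Lower-case, remove punctuation and digits, and collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\d+", "", text)
    return " ".join(text.split())


def term_frequencies(docs, top=10):
    """Most frequent non-stop-words across all documents, as (word, count) pairs."""
    counts = Counter()
    for doc in docs:
        counts.update(w for w in clean(doc).split() if w not in STOP_WORDS)
    return counts.most_common(top)
```

The resulting (word, count) pairs are the input a word-cloud renderer needs: each word is drawn with a size proportional to its count.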

TEXT MINING
PACKAGE (TM)

SENTIMENT ANALYSIS: WORD CLOUD
(Figure panels: Positive Sentiment, Negative Sentiment)
OPEN AI RELEASES GPT-2, A TEXT-
GENERATING AI SYSTEM

Text-Generating AI Systems, such as the GPT-2 system developed by Open AI and unveiled last week, may be more likely to evolve into human-like machines than traditional AI, says Open AI researcher James Kuffner. “If these systems can be trained to do certain tasks that are similar to humans, then we can expect to see a human-level intelligence emerge, not just in the short run but in the long run,” says Kuffner.
OPENAI'S GPT-2: "THE AI THAT WAS TOO
DANGEROUS TO RELEASE"
As has become the norm when there is a breakthrough in deep learning research. GPT-2 stands for
“Generative Pretrained Transformer 2”:

• “Generative” means the model was trained to predict (or “generate”) the next token in a sequence of
tokens in an unsupervised way. In other words, the model was thrown a whole lot of raw text data and
asked to figure out the statistical features of the text to create more text.
• “Pretrained” means OpenAI created a large and powerful language model that could later be fine-tuned for specific tasks like machine translation.
• “Transformer” means OpenAI used the transformer architecture, as opposed to an RNN, LSTM, GRU or any other 3/4 letter acronym you have in mind. I’m not going to discuss the transformer architecture in detail since there’s already another great article on the FloydHub blog that explains how it works.
• “2” means this isn’t the first time they’re trying this whole GPT thing out.
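The “generative” idea, predicting the next token from raw, unlabeled text, can be illustrated at toy scale with a bigram model in Python. This is deliberately not GPT-2: GPT-2 uses a transformer over subword tokens and learned probabilities, whereas this sketch just counts which word follows which and greedily picks the most frequent successor. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """Count, for every word, which words follow it in raw text (no labels needed)."""
    model = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for current, nxt in zip(tokens, tokens[1:]):
            model[current][nxt] += 1
    return model


def generate(model, start, max_tokens=5):
    """Greedy generation: repeatedly append the most frequent next token."""
    tokens = [start]
    for _ in range(max_tokens):
        followers = model.get(tokens[-1])
        if not followers:
            break  # no known successor, stop early
        tokens.append(followers.most_common(1)[0][0])
    return " ".join(tokens)


model = train_bigram_model(["the dress fits well", "the dress fits nicely", "the dress fits well"])
print(generate(model, "the"))  # the dress fits well
```

Greedy decoding tends to repeat itself on larger corpora; sampling from the successor counts instead of always taking the top one gives more varied text.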
AI IS NOT THE ENEMY (CREATED BY GPT2)
In many ways, AI is a natural extension of our ability to communicate. This can be seen in
two ways. The first is a direct implication of the second: the ability to read the environment.
The ability to recognize that another character is in a certain situation is a direct result of
having a basic understanding of language.
The second is a more abstract concept and one which can be applied to almost any medium:
a good writer can tell a good story even if they don’t understand how the story works.
But, if we’re going to ask whether AI is the enemy, it’s important to ask why we are in an
environment in which it exists. I think we’re on the cusp of something interesting here. For
the first time in human history, a new kind of technology has arisen that is creating real
value and excitement.
We’ve long seen the exponential growth of AI systems. We’ve been doing research on AI for
decades. But now, in the wake of the massive success of Google Translate and Siri, I think
we’re finally at a point where people are seeing the value.
We’re at the end of the year now and there’s still so much to accomplish. AI is not going to be
the end of our industry. We still have a ton of people that are working on AI and most of
these people are doing so in secret to avoid the legal ramifications of AI being misused.
“AI will help us.”
I see the big tech companies, the big social media companies, the big pharma companies,
the big finance companies and all these are using AI to get their products to market. They
are not using AI as a weapon. They are using it as a tool to help us make the product and get
the job done.
AI is not a new problem. We are just getting to the point where we are really using it.

---- End of GPT2 Output


… AND THERE ARE MORE…
THANK YOU


office103@globalvyz.onmicrosoft.com
