
UNIT IV

HEALTHCARE AND DEEP LEARNING

Introduction on Deep Learning – DFF network CNN- RNN for Sequences – Biomedical Image
and Signal Analysis – Natural Language Processing and Data Mining for Clinical Data – Mobile
Imaging and Analytics – Clinical Decision Support System.

Drawbacks of the Machine Learning


 Traditional ML algorithms are not useful when working with high-dimensional data, that is, where we have a large number of inputs and outputs. For example, in handwriting recognition we have a large amount of input, with different types of input associated with different styles of handwriting.
 The second major challenge is to tell the computer which features it should look for that play an important role in predicting the outcome, as well as achieving better accuracy while doing so. This process is referred to as feature extraction.
 Feeding raw data to the algorithm rarely works, which is why feature extraction is a critical part of the traditional machine learning workflow.
 Therefore, without feature extraction the challenge for the programmer increases, as the effectiveness of the algorithm depends heavily on how insightful the programmer is.
 Hence, it is very difficult to apply these machine learning models or algorithms to complex problems like object recognition, handwriting recognition, NLP (Natural Language Processing), etc.

What is deep learning?

Deep learning is a subset of machine learning that is essentially a neural network with three or more layers. These neural networks attempt to mimic the human brain—albeit far from matching its ability—allowing the system to “learn” from large amounts of data, cluster that data, and make predictions with remarkable accuracy.

Deep learning models are capable of learning to focus on the right features by themselves,
requiring little guidance from the programmer.
Basically, deep learning mimics the way our brain functions, i.e., it learns from experience. As you know, our brain is made up of billions of neurons that allow us to do amazing things. Even the brain of a one-year-old child can solve complex problems that are very difficult to solve even using supercomputers: recognizing the faces of its parents and different objects, discriminating between different voices and even recognizing a particular person based on his or her voice, drawing inferences from the facial gestures of other people, and many more.

How deep learning mimics the functionality of a brain? 

Deep learning uses the concept of artificial neurons that function in a manner similar to the biological neurons present in our brain. Therefore, we can say that deep learning is a subfield of machine learning concerned with algorithms inspired by the structure and function of the brain, called artificial neural networks. Now, let us take an example to understand it. Suppose we want to make a system that can recognize the faces of different people in an image. If we solve this as a typical machine learning problem, we will manually define facial features such as eyes, nose, ears, etc., and the system will then identify which features are more important for which person on its own.

Now, deep learning takes this one step further. Deep learning automatically finds the features that are important for classification through its deep neural networks, whereas in machine learning we had to define these features manually.

How Deep Learning works?

The inspiration for deep learning is the way the human brain filters information. Its main motive is to simulate human-like decision making. Neurons in the brain pass signals to perform actions. Similarly, artificial neurons connect in a neural network to perform tasks such as clustering, classification, or regression. The neural network sorts unlabeled data according to the similarities in the data. That is the idea behind a deep learning algorithm.

Neurons are grouped into three different types of layers:

a) Input layer

b) Hidden layer
c) Output layer

Input Layer

• It receives the input data from the observation. This information is broken into numbers and the bits of binary data that a computer can understand. Variables need to be either standardized or normalized to be within the same range.

Hidden Layer

• It performs mathematical computations on the input data. Deciding the number of hidden layers and the number of neurons in each layer is challenging. It applies non-linear processing units for feature extraction and transformation. Each following layer uses the output of the preceding layer as its input. This forms a hierarchy of concepts learned from the data: in the hierarchy, each level learns to transform the input data into a more and more abstract and composite representation.

• The “deep” in Deep Learning refers to having more than one hidden layer.
Output Layer:

The output layer returns the output data

Weight:

The connections between neurons are called weights, which are numerical values. The weights between neurons determine the learning ability of the neural network. During the learning of an artificial neural network, the weights between the neurons change. Initial weights are set randomly.

Transfer Function

The transfer function translates the input signals to output signals. Four types of transfer functions are commonly used: unit step (threshold), sigmoid, piecewise linear, and Gaussian.

Unit step (threshold)

The output is set at one of two levels, depending on whether the total input is greater than or less than some threshold value.

Sigmoid

The sigmoid function comes in two forms, logistic and tangential (tanh). The values of the logistic function range from 0 to 1, while the tangential function ranges from -1 to +1.

Piecewise Linear

The output is proportional to the total weighted input within a certain range; outside that range it saturates at the minimum or maximum output value.

Gaussian

Gaussian functions are bell-shaped curves that are continuous. The node output (high/low) is
interpreted in terms of class membership (1/0), depending on how close the net input is to a
chosen value of average.

Linear

Like a linear regression, a linear activation function transforms the weighted sum inputs of the
neuron to an output using a linear function.
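As a concrete illustration of these transfer functions, here is a minimal NumPy sketch; the threshold, saturation range, and Gaussian centre are illustrative assumptions, not values from the notes.

import numpy as np

def unit_step(x, threshold=0.0):
    # output is 1 if the total input exceeds the threshold, else 0
    return np.where(x > threshold, 1.0, 0.0)

def sigmoid(x):
    # logistic function: values range from 0 to 1
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # tangential (tanh) function: values range from -1 to +1
    return np.tanh(x)

def piecewise_linear(x, lo=-1.0, hi=1.0):
    # proportional to the input inside [lo, hi], saturated outside
    return np.clip(x, lo, hi)

def gaussian(x, mean=0.0, sigma=1.0):
    # bell-shaped curve; output is high when the input is close to the chosen average
    return np.exp(-((x - mean) ** 2) / (2 * sigma ** 2))

def linear(x):
    # identity: output equals the weighted sum of the inputs
    return x

x = np.linspace(-3, 3, 7)
print(unit_step(x), sigmoid(x).round(2), gaussian(x).round(2), sep="\n")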
Activation Function

The activation function decides whether a neuron should be activated or not by calculating the weighted sum of its inputs and adding a bias to it.

Activation function - Hidden layer i.e. layer 1 :-

z(1) = W(1)X + b(1)


a(1) = z(1)

Here, z(1) is the vectorized output of layer 1. W(1) is the vectorized set of weights assigned to the neurons of the hidden layer, i.e., w1, w2, w3 and w4. X is the vectorized input, i.e., i1 and i2. b(1) is the vectorized bias assigned to the neurons in the hidden layer, i.e., b1 and b2. a(1) is the vectorized form of a linear activation function.

Layer 2 i.e. output layer :-


• // Note : Input for layer

• // 2 is output from layer 1

• z(2) = W(2)a(1) + b(2)

• a(2) = z(2)

Calculation at Output layer:

• // Putting value of z(1) here


• z(2) = (W(2) * [W(1)X + b(1)]) + b(2)
• z(2) = [W(2) * W(1)] * X + [W(2)*b(1) + b(2)]
• Let,
• [W(2) * W(1)] = W
• [W(2)*b(1) + b(2)] = b
• Final output : z(2) = W*X + b
• which is again a linear function
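The collapse of stacked linear layers into a single linear function can be checked numerically. Below is a minimal NumPy sketch; the layer sizes and random weights are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 1))                                  # input features i1, i2
W1, b1 = rng.normal(size=(4, 2)), rng.normal(size=(4, 1))    # hidden layer: 4 neurons
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=(1, 1))    # output layer: 1 neuron

# two layers with linear activation a(z) = z
z1 = W1 @ X + b1
z2 = W2 @ z1 + b2

# the equivalent single linear layer
W = W2 @ W1
b = W2 @ b1 + b2
print(np.allclose(z2, W @ X + b))      # True: the stack is still a linear function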

Types of Deep Learning Networks


 Feedforward neural network
 Radial basis function neural networks
 Multi-layer perceptron
 Convolution neural network
 Recurrent neural network
 Modular neural network
 Sequence to sequence models
Feedforward neural network
This is the most basic type of neural network, where the flow of control starts at the input layer and moves towards the output layer. These networks have only a single layer, or only one hidden layer. Since the data moves in only one direction, there is no backpropagation in this network. In this network, the weighted sum of the inputs is fed forward from the input layer. These kinds of networks are used in facial recognition algorithms using computer vision.
Radial basis function neural networks
This kind of neural network generally has more than one layer, preferably two layers. In this kind of network, the relative distance from any point to the centre is calculated and the result is passed on to the next layer. Radial basis function networks are generally used in power restoration systems to restore power in the shortest span of time and avoid blackouts.
Multi-layer perceptron
This type of network has more than three layers and is used to classify data that is not linearly separable. These networks are fully connected, with every node connected to every node of the next layer. They are used extensively for speech recognition and other machine learning applications.

Convolution neural network (CNN)


A CNN is one of the variations of the multilayer perceptron. A CNN can contain more than one convolution layer, and because it contains convolution layers the network can be very deep with fewer parameters. CNNs are very effective for image recognition and for identifying different image patterns.
Recurrent neural network
An RNN is a type of neural network where the output of a particular neuron is fed back as an input to the same node. This feedback helps the network to predict the output. This kind of network maintains a small memory state, which is very useful for developing chatbots. Such networks are used in chatbot development and text-to-speech technologies.

Modular neural network


This kind of network is not a single network but a combination of multiple small neural networks.
All the sub-networks together make up a big neural network, and all of them work independently to achieve a common target. These networks are very helpful in breaking a large problem into small pieces and then solving them.
Sequence to sequence models
This type of network is generally a combination of two RNNs.
The network works by encoding and decoding; that is, it consists of an encoder, which is used to process the input, and a decoder, which processes the output.
Generally, this kind of network is used for text processing where the length of the input text is not the same as the length of the output text.

DFF network CNN


Deep feedforward networks
Deep feedforward networks, also often called feedforward neural networks, or multilayer
perceptrons (MLPs), are the quintessential deep learning models. The goal of a feedforward
network is to approximate some function f*.
[Only for reference: a function approximation problem asks us to select a function among a well-
defined class that closely matches a target function in a task-specific way.]
For example, for a classifier, y = f *(x) maps an input x to a category y. A feedforward network
defines a mapping y= f (x; θ) and learns the value of the parameters θ that result in the best
function approximation.
Flow of Information
These models are called feedforward because information flows through the
function being evaluated from x, through the intermediate computations used to
define f, and finally to the output y. There are no feedback connections in
which outputs of the model are fed back into itself. When feedforward neural
networks are extended to include feedback connections, they are called
recurrent neural networks.
Example: US Election
Importance of Feedforward Networks:
They form the basis for many commercial applications.
1. CNNs are a special kind of feedforward network; they are used for recognizing objects from photos.
2. They are a conceptual stepping stone to RNNs.
3. RNNs power many NLP applications.
Feedforward Neural Network Structures
Feedforward neural networks are called networks because they are typically represented by composing together many different functions. The model is associated with a directed acyclic graph describing how the functions are composed together. For example, we might have three functions f(1), f(2), and f(3) connected in a chain, to form f(x) = f(3)(f(2)(f(1)(x))). These chain structures are the most commonly used structures of neural networks. In this case, f(1) is called the first layer of the network, f(2) is called the second layer, and so on.
Definition of Depth
The overall length of the chain gives the depth of the model. For example, the composite function f(x) = f(3)(f(2)(f(1)(x))) has a depth of 3. It is from this terminology that the name “deep learning” arises. The final layer of a feedforward network, e.g. f(3), is called the output layer.
Training the Network
In network training we drive f(x) to match f*(x). The training data provides us with noisy, approximate examples of f*(x) evaluated at different training points. Each example x is accompanied by a label y ≈ f*(x). The training examples specify directly what the output layer must do at each point x: it must produce a value that is close to y.
Definition of Hidden Layer:
Hidden layers perform various types of mathematical computation on the input data and recognize the patterns that are part of it. The behaviour of the hidden layers is not directly specified by the data. The learning algorithm must decide how to use those layers to produce a value that is close to y. The training data does not say what the individual layers should do. Since the desired output for these layers is not shown, they are called hidden layers.
A net with depth 2: one hidden layer
Width of Model
Each hidden layer is typically vector-valued. Dimensionality of hidden layer vector is width of
the model.
Units of a model
Each element of the vector is viewed as a neuron. Instead of thinking of the layer as a vector-to-vector function, its elements are regarded as units acting in parallel. Each unit receives inputs from many other units and computes its own activation value.
Depth versus Width
Going deeper makes the network more expressive: it can capture variations of the data better and yields expressiveness more efficiently than width does. The tradeoff for more expressiveness is an increased tendency to overfit, so you will need more data or additional regularization. The network should be as deep as the training data allows, but you can only determine a suitable depth by experiment. Also, computation increases with the number of layers.
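To make the layer, depth, and width terminology concrete, here is a minimal NumPy sketch of a depth-2 feedforward network (one hidden layer); the sizes and the choice of ReLU as the hidden-layer activation are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)

def relu(z):
    return np.maximum(z, 0.0)

# f(x) = f2(f1(x)): input width 3, hidden width 5 (the width of the model), output width 2
W1, b1 = rng.normal(size=(5, 3)), np.zeros((5, 1))
W2, b2 = rng.normal(size=(2, 5)), np.zeros((2, 1))

def feedforward(x):
    h = relu(W1 @ x + b1)      # hidden layer: behaviour not specified directly by the data
    y = W2 @ h + b2            # output layer: trained to be close to the label y
    return y

x = rng.normal(size=(3, 1))
print(feedforward(x).ravel())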
Convolutional Neural Network:
In deep learning, a convolutional neural network (CNN/ConvNet) is a class of deep neural
networks most commonly applied to analyze visual imagery. Now when we think of a neural
network we think about matrix multiplications but that is not the case with ConvNet. It uses a
special technique called Convolution. Now in mathematics convolution is a mathematical
operation on two functions that produces a third function that expresses how the shape of
one is modified by the other.
➢ A convolutional neural network (CNN or ConvNet), is a network architecture for deep
learning which learns directly from data, eliminating the need for manual feature
extraction.
➢ CNNs are particularly useful for finding patterns in images to recognize objects, faces,
and scenes. They can also be quite effective for classifying non-image data such as audio,
time series, and signal data.
➢ Applications that call for object recognition and computer vision— such as self driving
vehicles and face-recognition applications — rely heavily on CNNs
Important Factors:
➢ CNNs eliminate the need for manual feature extraction—the features are learned directly
by the CNN.
➢ CNNs produce highly accurate recognition results.
➢ CNNs can be retrained for new recognition tasks, enabling you to build on pre-existing
networks.
➢ Deep learning workflow. Images are passed to the CNN, which automatically learns
features and classifies objects.
Applications:
• Medical Imaging: CNNs can examine thousands of pathology reports to visually detect
the presence or absence of cancer cells in images.
• Audio Processing: Keyword detection can be used in any device with a microphone to
detect when a certain word or phrase is spoken - (‘Hey Siri!’). CNNs can accurately learn
and detect the keyword while ignoring all other phrases regardless of the environment.
• Stop Sign Detection: Automated driving relies on CNNs to accurately detect the presence
of a sign or other object and make decisions based on the output.
• Synthetic Data Generation: Using Generative Adversarial Networks (GANs), new images
can be produced for use in deep learning applications including face recognition and
automated driving.
How CNNs Work:
• A convolutional neural network can have tens or hundreds of layers that each learn to
detect different features of an image.
• Filters are applied to each training image at different resolutions, and the output of each
convolved image is used as the input to the next layer.
• The filters can start as very simple features, such as brightness and edges, and increase in
complexity to features that uniquely define the object
Feature Learning, Layers, and Classification:
• Three of the most common layers are: convolution, activation or ReLU, and pooling.
Convolution :
• Convolution puts the input images through a set of convolutional filters, each of which
activates certain features from the images.
• Convolution is a specialized type of linear operation used for feature extraction, where a
small array of numbers, called a kernel, is applied across the input, which is an array of
numbers, called a tensor. An element-wise product between each element of the kernel
and the input tensor is calculated at each location of the tensor and summed to obtain the
output value in the corresponding position of the output tensor, called a feature map.
• This procedure is repeated applying multiple kernels to form an arbitrary number of
feature maps, which represent different characteristics of the input tensors; different
kernels can, thus, be considered as different feature extractors
• Two key hyperparameters that define the convolution operation are size and number of
kernels. The former is typically 3 × 3, but sometimes 5 × 5 or 7 × 7. The latter is
arbitrary, and determines the depth of output feature maps
• However, there are three hyperparameters which affect the volume size of the output that
need to be set before the training of the neural network begins. These include:
• 1. The number of filters affects the depth of the output. For example, three distinct filters
would yield three different feature maps, creating a depth of three. 
• 2. Stride is the distance, or number of pixels, that the kernel moves over the input matrix. While stride values of two or greater are rare, a larger stride yields a smaller output.
• 3. Zero-padding is usually used when the filters do not fit the input image. This sets all
elements that fall outside of the input matrix to zero, producing a larger or equally sized
output. There are three types of padding:
• Valid padding: This is also known as no padding. In this case, the last convolution is
dropped if dimensions do not align.
• Same padding: This padding ensures that the output layer has the same size as the input
layer
• Full padding: This type of padding increases the size of the output by adding zeros to the
border of the input.
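The convolution operation, stride, and zero-padding described above can be spelled out in a few lines. The following is a plain, unoptimized NumPy sketch for a single-channel input and a single kernel; the image size and the edge-detecting kernel are made up for illustration.

import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    """Element-wise product of the kernel and each receptive field, summed into a feature map."""
    if padding > 0:
        image = np.pad(image, padding, mode="constant")   # zero-padding
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)
edge_kernel = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]], dtype=float)  # simple vertical-edge filter

print(conv2d(image, edge_kernel).shape)               # (4, 4): valid padding
print(conv2d(image, edge_kernel, padding=1).shape)    # (6, 6): same padding
print(conv2d(image, edge_kernel, stride=2).shape)     # (2, 2): larger stride, smaller output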
Pooling layer:
• Pooling layers, also known as downsampling layers, conduct dimensionality reduction, reducing the number of parameters in the input.
• Similar to the convolutional layer, the pooling operation sweeps a filter across the entire
input, but the difference is that this filter does not have any weights.
• Instead, the kernel applies an aggregation function to the values within the receptive
field, populating the output array. There are two main types of pooling:
• Max pooling: As the filter moves across the input, it selects the pixel with the maximum
value to send to the output array. As an aside, this approach tends to be used more often
compared to average pooling.
• Average pooling: As the filter moves across the input, it calculates the average value
within the receptive field to send to the output array.
• While a lot of information is lost in the pooling layer, it also has a number of benefits to
the CNN. They help to reduce complexity, improve efficiency, and limit risk of
overfitting. 
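A matching NumPy sketch of the two pooling variants (a 2 × 2 window with stride 2; the sizes are illustrative):

import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    """Sweep a weight-less window over the input and aggregate each receptive field."""
    oh = (x.shape[0] - size) // stride + 1
    ow = (x.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    agg = np.max if mode == "max" else np.mean
    for i in range(oh):
        for j in range(ow):
            out[i, j] = agg(x[i * stride:i * stride + size, j * stride:j * stride + size])
    return out

feature_map = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(feature_map, mode="max"))       # keeps the maximum value in each 2x2 field
print(pool2d(feature_map, mode="average"))   # keeps the average value instead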
Fully-Connected Layer:
• The name of the fully-connected layer aptly describes itself. As mentioned earlier, the pixel values of the input image are not directly connected to the output layer in partially connected layers. However, in the fully-connected layer, each node in the output layer connects directly to a node in the previous layer.
• This layer performs the task of classification based on the features extracted through the
previous layers and their different filters. While convolutional and pooling layers tend to
use ReLu functions, FC layers usually leverage a softmax activation function to classify
inputs appropriately, producing a probability from 0 to 1.
• Rectified linear unit (ReLU) allows for faster and more effective training by mapping
negative values to zero and maintaining positive values. This is sometimes referred to
as activation, because only the activated features are carried forward into the next layer.
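Putting convolution, ReLU, pooling, and a fully-connected classifier together, a small CNN could be sketched in PyTorch as follows; the layer sizes, the 28 × 28 grayscale input, and the 10 output classes are assumptions for illustration, not a reference architecture from these notes.

import torch
import torch.nn as nn

# conv -> ReLU -> pool, twice, followed by a fully-connected classifier
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),   # 8 filters -> feature-map depth of 8
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 28x28 -> 14x14
    nn.Conv2d(8, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(16 * 7 * 7, 10),                   # fully-connected layer for 10 classes
)

x = torch.randn(1, 1, 28, 28)                    # one grayscale image
logits = model(x)
probs = torch.softmax(logits, dim=1)             # probabilities between 0 and 1
print(probs.shape)                               # torch.Size([1, 10])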

RNN for Sequences


Sequence models:
• Sequence models are machine learning models that input or output sequences of data. Sequential data includes text streams, audio clips, video clips, time-series data, etc. The Recurrent Neural Network (RNN) is a popular algorithm used in sequence models.
• Applications of Sequence Models:
Used for speech recognition, voice recognition, time series prediction, and natural
language processing.
Why does sequence matter?
• There are a lot of real-life scenarios, like image processing, voice recognition, and language translation, in which sequence matters. For example, if I write “are you how?”, will it make sense? No, because our brain is trained to process this sentence in sequence.
• That is because we have trained our brain with this sequenced information, and a change in its order would make it nonsense. Similarly, these tasks need a model that considers time; traditional models like SVM and logistic regression, and neural networks like FFNs, are not capable of doing these tasks. While talking about AI/ML, the primary conception of Artificial Intelligence is a machine that can engage with a human in a way similar to other humans.
• Artificial Intelligence here means the ability of a machine to convincingly engage in dialogue (what we will call an AI-based advanced chatbot); this will only be possible when computers are able to process time-dependent data in the same way the human mind does.
Recurrent Neural Networks:
• An RNN is multiple ANNs chained together so as to keep track of previous outputs, unlike a normal ANN. The output of the current timestep acts as an input to the next timestep. Predictions have to be made based on past inputs, so there is a need to memorize the previous inputs. Hence, an RNN has “hidden states” which act as a memory for all the information that has been computed.
• Here x̄ and ȳ represent the input and the output respectively, s represents the state, which summarizes the previous inputs, and Wx, Ws and Wy represent the weights for the input, hidden and output layers respectively.
RNN Folded Model

RNN unfolded model

• In an FFNN we obtain the input for the hidden layer by applying the activation function; for this, we only need the input vector and the weight matrix.

• RNNs also use activation functions, with only a small change:

• The hidden layer's input is calculated using the sum of the products of the input and state vectors with their respective weight matrices, i.e. s_t = f(Wx * x_t + Ws * s_(t-1)).
• The output is calculated in the same way in both FFNNs and RNNs, using the formula y_t = Wy * s_t.

• The unfolded architecture of an RNN can be altered as per the requirement: say, if you want to do a sentiment classification task, we can have multiple inputs and a single output,

• while in the case of language generation models we need to have multiple inputs and multiple outputs; RNNs can also be stacked together for some special use cases.
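A minimal NumPy sketch of the recurrence described above, reusing the names Wx, Ws, and Wy; the dimensions and the tanh nonlinearity are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
input_size, state_size, output_size = 4, 8, 3

Wx = rng.normal(size=(state_size, input_size))   # input-to-hidden weights
Ws = rng.normal(size=(state_size, state_size))   # hidden-to-hidden (state) weights
Wy = rng.normal(size=(output_size, state_size))  # hidden-to-output weights

def rnn_step(x_t, s_prev):
    # new state: activation of the weighted input plus the weighted previous state
    s_t = np.tanh(Wx @ x_t + Ws @ s_prev)
    # output computed from the current state, as in an FFNN
    y_t = Wy @ s_t
    return s_t, y_t

s = np.zeros(state_size)                 # initial hidden state ("memory")
sequence = rng.normal(size=(5, input_size))
for x_t in sequence:                     # unfolded over 5 timesteps
    s, y = rnn_step(x_t, s)
print(y.shape)                           # (3,)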
Types of Recurrent Neural Networks
There are four types of Recurrent Neural Networks:
➢ One to One
➢ One to Many
➢ Many to One
➢ Many to Many
One to One RNN
This type of neural network is known as the Vanilla Neural Network. It is used for general machine learning problems that have a single input and a single output.
One to Many RNN
This type of neural network has a single input and multiple outputs. An example of this is image captioning.

Many to One RNN


This RNN takes a sequence of inputs and generates a single output. Sentiment analysis is a good
example of this kind of network where a given sentence can be classified as expressing positive
or negative sentiments.
Many to Many RNN
This RNN takes a sequence of inputs and generates a sequence of outputs. Machine translation is
one of the examples.

Training Recurrent Neural Networks


• Training RNNs is considered to be difficult; in order to preserve long-range dependencies, training often runs into one of two problems:
• Exploding gradients (the gradients, and therefore the weights, grow too large, which destabilizes the model), or
• Vanishing gradients (the gradients become too small, so the weights stop updating and the model under-fits).
• The occurrence of these two problems depends on the activation function used in the hidden layer:
• with the sigmoid activation function the vanishing gradient problem is the usual concern, while with the rectified linear unit the exploding gradient problem makes more sense.
• For these problems, a concept called regularisation is used, which helps to tackle both vanishing and exploding gradients.
• RNNs can be easily trained using deep learning libraries like TensorFlow, PyTorch, Theano, etc. The only practical point is that GPUs are needed to train larger RNNs, since they are deeper networks; for smaller networks you can make use of online GPU-enabled notebooks like Google Colab, Kaggle Kernels, etc.
• As an extension to RNNs, LSTMs (Long Short-Term Memory networks) and BRNNs (Bidirectional Recurrent Neural Networks) were proposed.
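As a small illustration of the library support mentioned above, a many-to-one LSTM classifier (e.g., for sentiment classification) can be declared in a few lines of PyTorch; the vocabulary size, layer sizes, and two output classes are assumptions.

import torch
import torch.nn as nn

class SentimentLSTM(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=64, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)  # handles the recurrence
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        x = self.embed(token_ids)              # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(x)             # h_n: final hidden state of each sequence
        return self.fc(h_n[-1])                # many-to-one: one prediction per sequence

model = SentimentLSTM()
batch = torch.randint(0, 5000, (8, 20))        # 8 sequences of 20 token ids
print(model(batch).shape)                      # torch.Size([8, 2])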
Advantages of an RNN
• It can model non-linear temporal/sequential relationships.
• There is no need to specify lags to predict the next value, in comparison to an autoregressive process.
Disadvantages of an RNN
• Vanishing Gradient Problem
• Not suited for predicting long horizons
NATURAL LANGUAGE PROCESSING AND DATA MINING IN CLINICAL TEXT

NATURAL LANGUAGE PROCESSING:

Natural language processing (NLP) is the ability of a computer program to understand human
language as it is spoken and written -- referred to as natural language. It is a component of
artificial intelligence (AI).

WHY WE ARE USING NLP IN CLINICAL TEXT?

Electronic health records (EHR) of patients are major sources of clinical information that are critical to the improvement of health care processes. An automated approach for retrieving information from these records is highly challenging due to the complexity involved in converting clinical text that is available as free text into a structured format. Natural language processing (NLP) and data mining techniques are capable of processing a large volume of clinical text (textual patient reports) to automatically encode clinical information in a timely manner.

GENERAL WORKFLOW OF A NLP SYSTEM:

The input to an NLP system is the unstructured natural text that is extracted from the patient's medical record and sent to the report analyzer.

Report Analyzer:

Clinical text differs from biomedical text in its possible use of pseudo-tables, i.e., natural text formatted to appear as tables, medical abbreviations, and punctuation in addition to the natural language. The text is normally dictated and transcribed by a person or by speech recognition software and is usually available in free-text format. Some clinical texts are even available in image or graph format, which is unstructured.

As a result, NLP processing techniques are applied to convert the unstructured free text into a structured format.
The first and foremost task of the report analyzer is to preprocess the clinical input text by applying NLP methodologies. The major preprocessing tasks in clinical NLP include text segmentation, handling of text irregularities, domain-specific abbreviations, and missing punctuation.

Text Analyzer

The text analyzer is the most important module in clinical text processing; it extracts the clinical information from the free text and makes it compatible with database storage. The syntactic and semantic interpreter component of the text analyzer generates deeper structures, such as constituent or dependency trees, to capture the clinical information present in the text. Conversion rules or ML algorithms encode the clinical information from these deep tree structures. An advantage of the rule-based approach is that the predefined patterns are expert-curated and highly specific. The database handler and inference rules component generates a processed form of data from the database storage.

CORE COMPONENTS OF NLP


Due to the complex nature of the clinical text, the analysis is carried out in many phases such as
morphological analysis, lexical analysis, syntactic analysis, semantic analysis, and data encoding

MORPHOLOGICAL ANALYSIS:

It is a word-level analysis.

It contains four steps:

 Tokenization – extracts the words from a given text.

 Stop word removal – removes unwanted words such as punctuation, articles, etc.

 Stemming – the process of reducing a word to its base form. Example: the base form of "took" is "take", i.e., the word "took" is derived from "take".

 N-gram language model – a sequence of n contiguous words.

o Unigram: processes one word at a time.

o Bigram: processes two words at a time, and so on. From these we can estimate the probability of a word.
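A minimal Python sketch of these word-level steps using NLTK (this assumes NLTK and its 'punkt' and 'stopwords' resources are installed; the sample sentence is invented):

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.util import ngrams

text = "The patient was admitted with chest pain and took aspirin."

tokens = nltk.word_tokenize(text.lower())                 # tokenization
stop = set(stopwords.words("english")) | set(".,;:")
content = [t for t in tokens if t not in stop]            # stop word removal
stemmer = PorterStemmer()
stems = [stemmer.stem(t) for t in content]                # stemming, e.g. 'admitted' -> 'admit'
bigrams = list(ngrams(content, 2))                        # n-gram model with n = 2

print(content)   # ['patient', 'admitted', 'chest', 'pain', 'took', 'aspirin']
print(stems)     # ['patient', 'admit', 'chest', 'pain', 'took', 'aspirin']
print(bigrams)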

Core NLP Components

Research in NLP for the clinical domain makes computers understand free-form clinical text for automatic extraction of clinical information. The general aims of clinical NLP include the theoretical investigation of human language, exploring the details of language from a computer-implementation point of view, and more natural man-machine communication aimed at producing practical automated systems. Due to the complex nature of the clinical text, the analysis is carried out in many phases such as morphological analysis, lexical analysis, syntactic analysis, semantic analysis, and data encoding.
LEXICAL ANALYSIS:

The words or phrases in the text are mapped to the relevant linguistic information, such as syntactic information, i.e., noun, verb, adverb, etc., and semantic information, i.e., disease, procedure, body part, etc. Lexical analysis is achieved with a special dictionary called a lexicon, which provides the necessary rules and data for carrying out the linguistic mapping. The development and maintenance of a lexicon require extensive knowledge engineering effort. The National Library of Medicine (NLM) maintains the Specialist Lexicon with comprehensive syntactic information associated with both medical and English terms.
Semantic Analysis

Semantic analysis is used to check whether a sentence is meaningful or not. It finds the important tokens and their base words, and it finds the part of speech of each word (this is done in lexical analysis). It needs to check whether two words that come together in a sentence make sense. This is done by mapping the syntactic structure to objects in the domain.

It determines the words or phrases in the text that are clinically relevant, and extracts their
semantic relations. The natural language semantics consists of two major features:

 The representation of the meanings of a sentence, which can allow the possible
manipulations (particularly inference)

 Relating these representations to the part of the linguistic model that deals with the
structure (grammar or syntax).

The semantic analysis uses the semantic model of the domain, or ontology, to structure and encode the information from the clinical text. The semantic model is either frame-oriented or based on conceptual graphs. The generated structured output of the semantic analysis is subsequently used by other automated processes.
SYNTACTIC ANALYSIS:

The word “syntax” refers to the study of formal relationships between words in the text. The
grammatical knowledge and parsing techniques are the major key elements to perform syntactic
analysis. The context free grammar (CFG) is the most common grammar used for syntactic
analysis. CFG is also known by various other terms including phrase structure grammar (PSG)
and definite clause grammar (DCG). The syntactic analysis is done by using two basic parsing
techniques called top-down parsing and bottom-up parsing to assign POS tags (e.g., noun, verb,
adjective, etc.) to the sequence of tokens that form a sentence and to determine the structure of
the sentence through parsing tools.
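As a toy illustration of CFG-based syntactic analysis, the following NLTK sketch defines a tiny grammar and parses one invented clinical-style sentence; the grammar rules are assumptions for illustration only.

import nltk

grammar = nltk.CFG.fromstring("""
  S   -> NP VP
  NP  -> Det N | N
  VP  -> V NP
  Det -> 'the'
  N   -> 'patient' | 'aspirin'
  V   -> 'takes'
""")

parser = nltk.ChartParser(grammar)          # chart parser over the toy grammar
for tree in parser.parse("the patient takes aspirin".split()):
    print(tree)   # (S (NP (Det the) (N patient)) (VP (V takes) (NP (N aspirin))))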
DATA ENCODING:

The process of mining information from EHR requires coding of data that is achieved either
manually or by using NLP techniques to map free-text entries with an appropriate code. The
coded data is classified and standardized for storage and retrieval purposes in clinical research.
Manual coding is normally facilitated with search engines or pick-up list.

Data Mining in Healthcare

The use of data mining in healthcare is being adopted by organizations with a focus on
optimizing the efficiency and quality of their predictive analytics.

In the healthcare industry specifically, data mining can be used to decrease costs by increasing
efficiencies, improve patient quality of life, and perhaps most importantly, save the lives of more
patients.

DATA MINING IN CLINICAL TEXT:

Text mining in clinical domain is usually more difficult than general domains (e.g. newswire
reports and scientific literature) because of the high level of noise in both the corpus and training
data for machine learning (ML). Healthcare systems and specifically health record systems
contain both structured and unstructured information as text.

It is a subfield of biomedical NLP to determine classes of information found in clinical text that
are useful for basic biological scientists and clinicians for providing better health care.

More specifically, it is estimated that over 40% of the data in healthcare record systems contains
text, so-called clinical text, sometimes also called electronic patient record text.

Clinical text contains valuable information about symptoms, diagnoses, treatments, drug use and
adverse (drug) events for the patient that can be utilized to improve healthcare for other patients.

However, clinical text also contains sensitive information such as personal names, telephone
numbers and addresses of the patient and relatives. This information needs to be pseudonymized
before the clinical text can be utilized for secondary use.

Text mining and data mining techniques that uncover information on health, disease, and treatment response are supported by the electronically stored details of patients' health records. A significant chunk of the information in EHR and CDA documents is text, and extraction of such information by conventional data mining methods is not possible. The semi-structured and unstructured data in the clinical text, and even certain categories of test results such as echocardiograms and radiology reports, can be mined for information by utilizing both data mining and text mining techniques.

Information extraction

Information extraction (IE) is a specialized field of NLP for extracting predefined types of
information from the natural text. It is defined as the process of discovering and extracting
knowledge from the unstructured text.

IE differs from information retrieval (IR) that is meant to be for identifying and retrieving
relevant documents. In general, IR returns documents and IE returns information or facts.

A typical IE system for the clinical domain is a combination of components such as tokenizer,
sentence boundary detector, POS tagger, morphological analyzer, shallow parser, deep parser
(optional), gazetteer, named entity recognizer, discourse module, template extractor, and template
combiner.

A careful modeling of relevant attributes with templates is required for the performance of high
level components such as discourse module, template extractor, and template combiner. The high
level components always depend on the performance of the low level modules such as POS
tagger, named entity recognizer, etc.

IE for clinical domain is meant for the extraction of information present in the clinical text. The
Linguistic String Project–Medical Language Processor (LSP–MLP), and Medical Language
Extraction and Encoding system (MedLEE) are the commonly adopted systems to extract
UMLS concepts from clinical text.
Preprocessing

The primary source of information in the clinical domain is the clinical text written in natural
language. However, the rich contents of the clinical text are not immediately accessible by the
clinical application systems that require input in a more structured form. An initial module
adopted by various clinical NLP systems to extract information is the preliminary preprocessing
of the unstructured text to make it available for further processing. The most commonly used
preprocessing techniques in clinical NLP are spell checking, word sense disambiguation, POS
tagging, and shallow and deep parsing.

Spell Checking

The rate of misspelling in clinical text is reported to be much higher than in other types of text. In addition to traditional spell checkers, various research groups have come up with a variety of methods for spell checking in the clinical domain: UMLS-based spell-checking error correction tools and morpho-syntactic disambiguation tools.

Word Sense Disambiguation

The process of understanding the sense of the word in a specific context is termed as word sense
disambiguation. The supervised ML classifiers and the unsupervised approaches automatically
perform the word sense disambiguation for biomedical terms.

POS Tagging

An important preprocessing step adapted by most of the NLP systems is POS tagging that reads
the text and assigns the parts of speech tag to each word or token of the text. POS tagging is the
annotation of words in the text to their appropriate POS tags by considering the related and
adjacent words in a phrase, sentence, and paragraph. POS tagging is the first step in syntactic
analysis and finds its application in IR, IE, word sense disambiguation, etc. POS tags are a set of
word categories based on the role that words may play in the sentence in which they appear. The
most common set contains seven different tags: Article, Noun, Verb, Adjective, Preposition,
Number, and Proper Noun.
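A short NLTK sketch of POS tagging (this assumes the 'punkt' and 'averaged_perceptron_tagger' resources have been downloaded; the sentence is invented):

import nltk

sentence = "The patient denies chest pain after taking two tablets."
tokens = nltk.word_tokenize(sentence)
print(nltk.pos_tag(tokens))
# e.g. [('The', 'DT'), ('patient', 'NN'), ('denies', 'VBZ'), ('chest', 'NN'), ('pain', 'NN'), ...]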

Shallow and Deep Parsing

Parsing is the process of determining the complete syntactic structure of a sentence or a string of symbols in a language. A parser is a tool that converts an input sentence into an abstract syntax tree, such as a constituent tree or dependency tree, whose leaves correspond to the words of the given sentence and whose internal nodes represent grammatical tags such as noun, verb, noun phrase, verb phrase, etc. Most parsers apply ML approaches such as PCFGs (probabilistic context-free grammars), as in the Stanford lexical parser [50], or even maximum entropy and neural networks.

A few parsers even use lexical statistics by considering the words and their POS tags. Such parsers are well known for overfitting problems that require additional smoothing.
overfitting problem is to apply shallow parsing, which splits the text into nonoverlapping word
sequences or phrases, such that syntactically related words are grouped together. The word
phrase represents the predefined grammatical tags such as noun phrase, verb phrase,
prepositional phrase, adverb phrase, subordinated clause, adjective phrase, conjunction phrase,
and list marker. The benefits of shallow parsing are the speed and robustness of processing.
Parsing is generally useful as a preprocessing step in extracting information from the natural text.

Context-Based Extraction

The fundamental step for a clinical NLP system is the recognition of medical words and phrases
because these terms represent the concepts specific to the domain of study and make it possible
to understand the relations between the identified concepts. Even highly sophisticated systems of
clinical NLP include the initial processing of recognizing medical words and phrases prior to the
extraction of information of interest. While IE from the medical and clinical text can be carried
out in many ways, this section explains the five main modules of IE.

Concept Extraction

Extracting concepts (such as drugs, symptoms, and diagnoses) from clinical narratives
constitutes a basic enabling technology to unlock the knowledge within and support more
advanced reasoning applications such as diagnosis explanation, disease progression modeling,
and intelligent analysis of the effectiveness of treatment. The first and foremost module in
clinical NLP following the initial text preprocessing phase is the identification of the boundaries
of the medical terms/phrases and understanding the meaning by mapping the identified
term/phrase to a unique concept identifier in an appropriate ontology. The recognition of clinical
entities can be achieved by a dictionary-based method using the UMLS Metathesaurus, rule-
based approaches, statistical method, and hybrid approaches. The identification and extraction of
entities present in the clinical text largely depends on the understanding of the context. For
example, the recognition of diagnosis and treatment procedures in the clinical text requires the
recognition and understanding of the clinical condition as well as the determination of its
presence or absence. The contextual features related to clinical NLP are negation (absence of a
clinical condition), historicity (the condition had occurred in the recent past and might occur in
the future), and experiencer (the condition related to the patient). While many algorithms are
available for context identification and extraction, it is recommended to detect the degree of
certainty in the context.

Association Extraction

Clinical text is a rich source of information on patients' conditions and their treatments, with additional information on potential medication allergies, side effects, and even adverse effects.
Information contained in clinical records is of value for both clinical practice and research;
however, text mining from clinical records, particularly from narrative-style fields (such as
discharge summaries and progress reports), has proven to be an elusive target for clinical Natural
Language Processing (clinical NLP), due in part to the lack of availability of annotated corpora
specific to the task. Yet, the extraction of concepts (such as mentions of problems, treatments,
and tests) and the association between them from clinical narratives constitutes the basic
enabling technology that will unlock the knowledge contained in them and drive more advanced
reasoning applications such as diagnosis explanation, disease progression modeling, and
intelligent analysis of the effectiveness of treatment.

Negation

“Negation” is an important context that plays a critical role in extracting information from the
clinical text. Many NLP systems incorporate a separate module for negation analysis in text
preprocessing. However, the importance of negation identification has gained much of its interest
among the NLP research community in recent years. As a result, explicit negation detection
systems such as NegExpander, Negfinder, and a specific system for extracting SNOMED-CT
concepts as well as negation identification algorithms such as NegEx that uses regular expression
for identifying negation and a hybrid approach based on regular expressions and grammatical
parsing are developed by a few of the dedicated research community. While the NegExpander
program identifies the negation terms and then expands to the related concepts, Negfinder is a
more complex system that uses indexed concepts from UMLS and regular expressions along
with a parser using LALR (look-ahead left-recursive) grammar to identify the negations.

Extracting Codes

Extracting codes is a popular approach that uses NLP techniques to extract the codes mapped to
controlled sources from clinical text. The most common codes dealing with diagnoses are the
International Classification of Diseases (ICD) versions 9 and 10 codes. The ICD is designed to
promote international comparability in the collection, processing, classification and presentation
of mortality statistics.
Preprocessing of texts such as tokenisation and text segmentation.

Word processing such as :

Morphological processing: pre-processing techniques based on morphological operations for four different imaging modalities, namely MRI, CT, mammogram and ultrasound images, have been discussed. In the pre-processing step, after removal of noise, cleaning of images is done by dilating, eroding, opening and closing operations. The top-hat transform extracts small elements and details in the image. Even though morphology is a very old technique, it still finds application in all medical images in one way or another. Morphological processing can also be extended for use in medical image feature selection and segmentation.

•Lemmatisation: Lemmatisation (or lemmatization) in linguistics is the process of grouping


together the inflected forms of a word so they can be analysed as a single item, identified by the
word's lemma.

•Stemming: Stemming is a natural language processing technique that lowers inflection in words
to their root forms, hence aiding in the preprocessing of text, words, and documents for text
normalization.

•Compound splitting: Dealing with word compounding in statistical machine translation (SMT)
is essential to mitigate the sparse data problems that productive word generation causes. There
are several issues that need to be addressed: splitting compound words into their correct
components (i.e. disambiguating between split points), deciding whether to split a compound
word at all, and, if translating into a compounding language, merging components into a
compound word

• Abbreviation detection: Detection of abbreviations is also a major subproblem and task of sentence segmentation and tokenization processes in general, i.e., disambiguating sentence endings from punctuation attached to abbreviations. Statistical (NLP) methods have been applied to detect and extract them successfully, mostly in a (semi-)supervised manner.

Generally, the same building blocks used for regular texts can also be utilised for clinical text
processing. However, clinical texts contain more noise in the form of incomplete sentences,
misspelled words and non-standard abbreviations that can make the natural language processing
cumbersome.

Applications:

1) Healthcare Associated Infections (HAIs)

Healthcare associated infections are also called hospital associated infections or nosocomial infections. An important goal in defeating HAIs is to collect statistics by detecting and measuring the prevalence of HAIs, but also to predict and warn if a particular patient has a high risk of acquiring an HAI. HAIs can encompass, for example, pneumonia, urinary tract infection, sepsis or various wound infections, but also norovirus (winter vomiting disease). Two machine learning algorithms, Support Vector Machine (SVM) and Random Forest (RF) in the Weka toolkit, were applied to the annotated Stockholm EPR Detect-HAI Corpus.

2) Detection of Adverse Drug Events (ADEs)

Adverse drug events (ADEs) are a major public health problem, around 5% of all hospital
admissions in the world are due to ADEs

All drugs are poisonous in some sense but given in the correct amount they may cure a disease.

(a) Dose-related, for example giving toxic effect.

(b) Non-dose related, for example penicillin hypersensitivity.

(c) Dose-related and time-related, related to the cumulative dose.

(d) Time-related, becomes apparent some time after the use of the drug.

(e) Withdrawal, occurs after the withdrawal of the drug.

(f) Unexpected, often caused by drug interactions.

First of all, ICD-10 diagnosis codes related to adverse drug events that are assigned to the patient
records need to be studied.

Medical classification systems:

Medical terminologies, classification systems and available controlled vocabularies are used in
healthcare to report, administer, classify and explain diseases and treatment, including
medication.

Mobile Imaging and Analytics

Mobile imaging is the technique of creating visual representations of the interior of a body for
clinical analysis and medical intervention, as well as visual representation of the function of
some organs or tissues.

Introduction:

Mobile technology and smart devices, especially smartphones, allow new ways of easier imaging at the patient's bedside and offer the possibility of being turned into a diagnostic tool that can be used by professionals as well as lay people. Smartphones usually contain at least one high-resolution camera that can be used for image formation. However, careful consideration has to be given when dealing with cameras in general, and with non-scientific cameras specifically. Many parameters are usually reported in camera advertisements, but not all of them are useful. In particular, pixel resolution can be misleading, as the number of pixels itself is not a measure of quality. Quality is usually measured by the signal-to-noise ratio (SNR).

The SNR is defined as the power of the signal divided by the power of the noise, and is often expressed in decibels.
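A small NumPy sketch of this definition; the simulated "signal" and "dark frame" arrays are stand-ins, not real sensor data.

import numpy as np

signal = np.random.poisson(lam=200, size=(64, 64)).astype(float)   # stand-in for a well-lit capture
dark = np.random.normal(loc=0.0, scale=5.0, size=(64, 64))         # stand-in for a noise-only dark frame

signal_power = np.mean(signal ** 2)
noise_power = np.mean(dark ** 2)

snr = signal_power / noise_power
snr_db = 10 * np.log10(snr)            # SNR expressed in decibels
print(f"SNR = {snr:.1f} ({snr_db:.1f} dB)")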

Noise can be introduced in several steps of the image acquisition.

• Shot noise, which is dependent on the quality of the sensor and the discretization of
different number of photons. This noise mostly occurs when only a few photons hit the
sensor.

• Transfer noise, which is introduced by connectivity in the sensor. This is usually static
for all images and can be reduced using background subtraction with an image acquired
in complete darkness.

In the case of a camera, the signal is the amount of light captured by the sensor; image noise is reduced as more photons become available. The most important parameter for the quality of an optical system is therefore the amount of light accumulated on each pixel. This parameter is determined by the physical size of a pixel (or the chip size in relation to the number of pixels), as a larger pixel acquires more light, and by the diameter of the entry lens, which regulates the amount of light. The size of the entry lens is usually given as the f-stop k (written as 1:k or f/k), the ratio of the distance from the sensor to the entry lens to the diameter of the entry lens; the lower, the better. Most modern smartphones have optical parameters similar to regular consumer cameras, while being built at a far smaller scale.

First integrations of these cameras into clinical routine and research have already shown manifold applications for mobile technology in medicine. One example is the usage of the smartphone camera to take pictures of test strips for automatic analysis.

Another example is the use of smartphone cameras to document necrotic skin lesions caused by the rare disease calciphylaxis in a multicenter clinical registry. Here, special care must be taken when dealing with multiple different smartphones or lighting conditions due to different efficiencies in capturing colors.

A color reference has to be used to calibrate the camera colors in a later step. To control
illumination, zoom, and distance, the German company FotoFinder has developed an integrated
lens system that is easily attached to and powered by an iPhone transforming it into a
dermatoscope.
Beside the integrated camera, additional image formation methods can also be used on smart devices, either by incorporating special sensors (like ultrasound or ECG) or by connecting them, wired or wirelessly, to more powerful imaging machines such as micro nuclear magnetic resonance (micro-NMR) devices for bedside diagnostics.

Data Visualization

The task of transforming an acquired image dataset into a perceptible form is called visualization. This is rather simple for most 2D methods like digital photographs, but it is more involved for 3D volumes, in particular if voxels are annotated with several features or monitored over time (3D+t). In general, all data is displayed by transforming it into a colored 2D representation. Hence, we need to consider the output devices as well as the definition and value ranges of the initial data.

Visualization Basics

The human eye is capable of detecting light between 390 and 700 nm in wavelength. Images that are recorded and displayed within this so-called visible spectrum show the data in “true color.” But because many modalities like X-ray, ultraviolet, or infrared imaging capture wavelengths outside the visible spectrum, a modification of the recorded data has to be performed. The resulting image (e.g., a grayscale image for X-ray) is displayed in “false color.” A special case of this is so-called “pseudo color,” which means that the color of an image has been artificially modified to enhance certain features. Here, a single-channel image and a so-called color map are used to convert each value of the single channel into a corresponding color.

As an example, the Doppler signal contains information on direction of movement for each
position. This movement can be either positive (towards the detector), negative (away from the
detector), or zero (no movement). To superimpose this information to morphologic image data (B
mode), a different color scheme is applied. The zero level would be encoded in black, negative
values in blue, and positive values in red. Larger absolute value of the signal results in brighter
color.
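A small NumPy sketch of this Doppler pseudo-color scheme; the scaling and the toy velocity field are assumptions.

import numpy as np

def doppler_to_rgb(velocity, v_max=1.0):
    """Map a signed Doppler velocity map to an RGB pseudo-color image:
    zero -> black, positive -> red, negative -> blue,
    with brightness proportional to the absolute velocity."""
    v = np.clip(velocity / v_max, -1.0, 1.0)
    rgb = np.zeros(v.shape + (3,))
    rgb[..., 0] = np.where(v > 0, v, 0.0)    # red channel: flow towards the detector
    rgb[..., 2] = np.where(v < 0, -v, 0.0)   # blue channel: flow away from the detector
    return (rgb * 255).astype(np.uint8)

# toy velocity field: left half moves away from, right half towards the detector
vel = np.linspace(-1.0, 1.0, 256)[None, :].repeat(128, axis=0)
img = doppler_to_rgb(vel)
print(img.shape, img.dtype)   # (128, 256, 3) uint8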

Output Devices

All data is displayed on a computer screen, where colors are mixed from three basic channels: red, green, and blue (RGB). This results in a cubic color space. Setting all three colors to the same value creates different shades of gray. Each color is usually scaled from 0 (dark) to 255 (bright). This equals a bit depth of 8, meaning that 8 bits in memory are allocated for each color channel, yielding in total 256^3 ≈ 16.7 million possible values. Higher bit-depth color or gray values are also possible but rarely used, as they are not well supported by computer screens and file formats.

However, in some cases a higher contrast or distribution of color or gray values is needed, e.g.,
for diagnostics in radiology. Therefore, computer screens in diagnostic radiology support higher
bit depth (e.g., grayscale bit depth of 10), and have a better contrast (e.g., 1400:1 compared to
1000:1 regular) and brightness (e.g., 400 cd/m2 brightness compared to 200 cd/m2 regular) than
regular computer screens.

Printers differ from screens in that the background color of a screen (no color turned on) is black,
while the background color of a printout (paper) is white. Thus, higher values in color for screens
result in brighter colors, while higher amounts of color from a printer result in darker colors.
Therefore, printers usually use the cyan, magenta, yellow, and black (CMYK) color space to compensate for the non-black background. Black is used as a key ingredient when mixing the colors to minimize the amount of fluid on the paper.

Mobile Visualization

Recently, visualization and display technology has been dominated by trends in mobile computing. For example, prior to the introduction of the first Retina display with the iPhone 4 in 2010, almost all computer and smartphone displays had a pixel density of about 70–100 pixels per inch (ppi). Increases in resolution were mostly achieved through larger monitor screens.

However, the introduction of the retina display increased the pixel density above 300 ppi,
improving perceived contrast and also outperforming radiology displays in many other aspects
(e.g., iPhone 4 brightness: 500 cd/m2). Thereby, these new types of screens show great potential
for radiologists.

Additionally, modern smartphones and tablet computers provide a large amount of processing power (e.g., a 64-bit dual core at 1.3 GHz in the iPhone 5s) that can be used for image visualization. Almost all 2D and surface-rendering visualization techniques can be employed in real time. Real time means that the result is delivered fast enough to make an impact on the current situation or, in terms of visualization of data, that no delay between action (e.g., zooming) and result (the zoomed image) is perceived. Usually, this requires 15 to 20 frames per second (fps). [The frame rate is the speed at which these images are shown.]

Volume rendering

Volume rendering is a data visualization technique that creates a three-dimensional
representation of volumetric data. CT and MRI data are frequently visualized with volume
rendering in addition to other reconstructions and slices. This technique can also be applied to
tomosynthesis data.

Volume rendering is computationally expensive; for example, a CT angiography dataset can
contain up to 6 GB of data in 512³ voxels acquired over time, all of which have to be held in
memory during visualization. Therefore, most smart devices are not capable of performing
volume rendering natively. Remote visualization has been successfully implemented to display
images, which have been rendered on a server, remotely on a tablet computer or smartphone.
This so-called streaming is performed by sending a video of a live view of an object from the
server to the client (tablet computer or smartphone).

For example, this can use H.264 video compression, which is standard in mobile communication.
In the other direction, the client captures touches, swipes, and other user interactions and sends
these to the server to update the live view. Streaming of video data has the benefit of allowing
the user to work on a mobile device while having the computational power of a workstation. The
drawback of this approach is the bandwidth needed to stream images in real time from the server
to the mobile device.
For example, a video with 30 frames per second (fps) and a resolution of 1920 by 1080 pixels
(Full HD/1080p) requires about 1 Mb/s of bandwidth. This is not possible through most current
wireless networks, such as 3G, which is limited to between 350 and 2000 kilobits per second
(kbit/s), depending on country and reception.
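To see why compression is essential here, a small back-of-the-envelope calculation in Python compares the uncompressed Full HD bit rate with the compressed figure quoted above; the 8-bit RGB assumption and the reading of "1 Mb/s" as one megabit per second are assumptions.

width, height, fps = 1920, 1080, 30
bits_per_pixel = 3 * 8                  # three 8-bit RGB channels (assumed)

raw_bps = width * height * bits_per_pixel * fps
print(raw_bps / 1e6)                    # ~1493 Mbit/s uncompressed

compressed_bps = 1e6                    # ~1 Mbit/s after H.264 (figure from the text,
                                        # read here as megabits; megabytes would be 8x)
print(round(raw_bps / compressed_bps))  # compression factor on the order of 1000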

Calibration

Calibration is important for distributed visualization on a range of different devices. It means
that the same image is displayed in exactly the same way on all devices, even if background
illumination differs between them. For this, an application has been developed that allows users
to calibrate their devices visually on their own. In this application, the user is guided through
eight steps, each showing a visual pattern; in each step, the user has to adjust a slider to change
the visibility of the pattern.

One concern that is often raised when visualizing biomedical images on mobile devices is the
appropriateness for diagnostics. For example, software that displays medical images might have
to undergo investigation by the Food and Drug Administration (FDA) or other local legal
authorities to be cleared for commercial marketing. Smartphones and tablet computers do not
necessarily meet the requirements to undergo these studies. Therefore, the appropriateness and
legitimacy of the chosen device should always be taken into account when considering the use of
a mobile device for the diagnosis or visualization of medical images.

Image Analysis

Image analysis is the task of extracting abstract information or semantics and knowledge from
the raw pixels of image and signal data.

This is the most challenging task in biomedical imaging, as it supports researchers and clinicians
in finding clues for disease or certain phenotypes (diagnostics), supports novices and experts in
performing procedures (therapy) and in following up on the outcome, and allows scientists to
gain knowledge from imaging data.

With the growing number of digital imaging devices, automated knowledge extraction becomes
more and more important. The new trend towards mobile and personalized health data
additionally drives the need for automation. For example, many applications for the smartphone-
based investigation of skin cancers already exist, but only a few are actually accurate. Pulse
frequency can be determined accurately and contactlessly by any smartphone simply by filming
the face and detecting the very slight periodic changes in skin color, which are usually not
noticed by humans.

Biomedical image analysis

A biomedical image analysis task can be split up into several substeps:

1. Preprocessing to remove background noise or enhance the image

2. Extraction of features to be used in later steps

3. Registration of several images

4. Segmentation (localization and delineation) of regions of interest (ROIs)

5. Classification of the image or segmented parts and measurements

Preprocessing and Filtering

Basically all images from biomedical imaging modalities, and especially those from smartphone
cameras, are noisy and contain artifacts. Therefore, preprocessing is required before the data can
be used for analysis. Additional preprocessing can also help to prepare the image for certain
analysis tasks, such as edge detection. Most preprocessing algorithms are low in computation
time and memory requirements and hence suitable for mobile devices.

Gaussian filter
A Gaussian filter is commonly used to remove noise and recording artifacts from an image by
blurring. The filter consists of a multidimensional Gaussian distribution that is convolved with
the image. During convolution, the center value is replaced with the accumulated weighted values
of its neighborhood according to the mask, so high-frequency noise in the image is reduced.
Convolving the local region with the Gaussian kernel assigns the highest weight to the center of
the local region (38.4624 in the worked example) and smaller weights to the remaining pixels as
the distance from the center increases; the weighted values are summed up and stored at the
current pixel location (intensity = 94.9269 in the example) of the image.
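A minimal sketch of Gaussian smoothing with SciPy's ndimage module; the synthetic test image and the sigma value are arbitrary illustrative choices, not from the source.

import numpy as np
from scipy import ndimage

# Synthetic noisy test image: a bright square on a dark background.
rng = np.random.default_rng(0)
image = np.zeros((64, 64))
image[24:40, 24:40] = 200.0
noisy = image + rng.normal(0.0, 20.0, image.shape)

# Convolve with a Gaussian kernel; sigma controls the amount of blurring.
smoothed = ndimage.gaussian_filter(noisy, sigma=1.5)

print(noisy.std(), smoothed.std())  # the smoothed image varies less, as noise is suppressed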
Median filter

The median filter is also used to reduce noise. For this filter, a sliding window with a fixed size
(e.g., 3 × 3 pixels) is moved across the image, and the center point of the window is replaced by
the median value within the window. For median computation, the image pixel values at the
current mask position (A to I) are sorted, and the center is replaced by the fifth value in the
sorted row. This removes outliers in an otherwise smooth area while maintaining the values of
the majority of the pixels.
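A corresponding sketch for the median filter, again with SciPy; the isolated bright outliers used as test data are an illustrative assumption.

import numpy as np
from scipy import ndimage

# Smooth test image corrupted by isolated bright outliers.
rng = np.random.default_rng(0)
image = np.full((64, 64), 100.0)
rows = rng.integers(0, 64, size=50)
cols = rng.integers(0, 64, size=50)
image[rows, cols] = 255.0

# 3x3 sliding window; each center pixel becomes the median of its neighborhood.
filtered = ndimage.median_filter(image, size=3)

print(image.max(), filtered.max())  # isolated outliers are largely removed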
Sobel filter

The Sobel filter is used to enhance edges in the image. For this, an antisymmetric mask is
convolved with the image. The mask shown in the example is sensitive to vertical edges, in
particular to vertical edges from black to white. Usually, this mask is rotated by 90° and the signs
are changed, ending up with a set of eight different masks. All eight masks are applied
individually and, for instance, the maximum response is used as a replacement for the center
pixel to obtain an edge map.
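A sketch of the vertical-edge Sobel mask described above, applied via convolution with SciPy; the synthetic step-edge image is an assumption, and a full compass implementation would apply all eight rotated and sign-flipped masks and keep the maximum response.

import numpy as np
from scipy import ndimage

# Synthetic image with a vertical step edge: dark left half, bright right half.
image = np.zeros((64, 64))
image[:, 32:] = 255.0

# Mask sensitive to vertical (black-to-white, left-to-right) edges.
sobel_vertical = np.array([[-1.0, 0.0, 1.0],
                           [-2.0, 0.0, 2.0],
                           [-1.0, 0.0, 1.0]])

response = ndimage.convolve(image, sobel_vertical)

# The response magnitude is large along the edge and (near) zero elsewhere.
print(np.abs(response).max(), np.abs(response[:, :16]).max())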
Feature Extraction

Features are simplified descriptors of an image or of part of an image. They are used to compare
two images or to find similarities or shared objects between multiple images. Image features can
be either global (describing the image as a whole) or local (describing a part of the image of any
size).

A very basic global image feature is the image histogram. A histogram is the distribution of the
pixel/voxel values in the image: for each possible value, the number of occurrences in the image
is counted. This results in a very simplified representation, as information on the intensities is
maintained but all spatial information is lost. Global features, such as the shape of the histogram,
can be used, for instance, to distinguish between classes of images, e.g., hand and skull
radiographs.
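A minimal global histogram feature with NumPy; the 8-bit value range and the random test image are assumptions.

import numpy as np

# Toy 8-bit grayscale image.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(128, 128))

# Count how often each of the 256 possible values occurs.
hist, _ = np.histogram(image, bins=256, range=(0, 256))

# Normalizing the counts yields an empirical probability distribution.
prob = hist / hist.sum()
print(hist.shape, float(prob.sum()))  # (256,) 1.0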

Local features describe only a part of the image at a certain spatial position. Most are created in
two separate steps. The first is feature detection, in which points of interest (POIs) are localized.
The second is feature description: for each of the detected points, a description of this position
(possibly including some surrounding area) is created. Since images can be acquired under
different conditions, such as scale and rotation, a certain invariance against these changes is
needed for both detector and descriptor.

Scale Invariant Feature Transform (SIFT)

Recognizing objects in images is one of the most important problems in computer vision. A
common approach is to first extract the feature descriptions of the objects to be recognized from
reference images and store these descriptions in a database. When a new image arrives, its
feature descriptions are extracted and compared to the object descriptions in the database to see
if the image contains any object we are looking for. In real-life applications, the objects in the
images to be processed can differ from the reference images in many ways:

 Scale, i.e., the size of the object in the image

 Orientation

 Viewpoint

 Illumination

 Partially covered

Scale-invariant feature transform (SIFT) is an algorithm for extracting stable feature descriptions
of objects, called keypoints, that are robust to changes in scale, orientation, shear, position, and
illumination.
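A short sketch of SIFT keypoint extraction and matching with OpenCV (assuming OpenCV 4.4 or newer, where SIFT is part of the main package); the file names are placeholders.

import cv2

# "reference.png" and "query.png" are placeholder file names.
reference = cv2.imread("reference.png", cv2.IMREAD_GRAYSCALE)
query = cv2.imread("query.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()

# Detect keypoints and compute their 128-dimensional descriptors.
kp_ref, des_ref = sift.detectAndCompute(reference, None)
kp_qry, des_qry = sift.detectAndCompute(query, None)

# Brute-force matching plus Lowe's ratio test to keep distinctive matches.
matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = matcher.knnMatch(des_qry, des_ref, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

print(len(kp_ref), len(kp_qry), len(good))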
Refer to the textbook for the remaining content.
