Sinemn Pro
Submitted by
SINEKA M
(Register No:713522CA50)
PROJECT REPORT
Submitted to the
Of
in
NOV/DEC-2023
BONAFIDE CERTIFICATE
Certified that this project report titled “BREAST CANCER PREDICTION USING MACHINE LEARNING” is the bonafide work of Ms. SINEKA M, who carried out this project work under my supervision. Certified further that, to the best of my knowledge, the work
reported herein does not form part of any other project report or dissertation on the basis of
which a degree or award was conferred on an earlier occasion on this or any other candidate.
_______________________ _______________________
Submitted for the Viva-Voce examination held at SNS COLLEGE OF TECHNOLOGY on
_______________
_____________________ ________________________
I, SINEKA M, declare that the project work entitled “Breast Cancer Prediction Using Machine Learning” submitted to SNS COLLEGE OF TECHNOLOGY (AN AUTONOMOUS INSTITUTION), Coimbatore, in partial fulfilment of the requirements for the award of the Degree of Master of Computer Applications, is a record of original work done by me under the guidance of Dr. S. SUNDARARAJAN, Professor and Head of the Department of Computer Applications (PG), SNS COLLEGE OF TECHNOLOGY (AN AUTONOMOUS INSTITUTION), Coimbatore.
Date: SINEKA M
ACKNOWLEDGEMENT
First of all, I offer my humble thanks to God Almighty for being with me and guiding me the right way throughout my project work.
Finally, I thank my parents for their financial and moral support, and also my friends who helped me in completing my project work successfully.
ABSTRACT
For this project, we use a deep Convolutional Neural Network (CNN), an advanced type of neural network. The dataset we use contains real-life histopathological images of both diseased and normal cases. A CNN normally requires a large amount of data and days of training on high-end systems to give optimum results. To avoid that, we use a process called transfer learning, which makes use of a pretrained model (trained on a dataset of 1 million images). This helps in getting closer to the optimum model. Further training is then performed with our own dataset, which helps us create the optimum model for breast cancer detection.
TABLE OF CONTENTS
Page No.
Certificate
Declaration
Abstract
Acknowledgement
Table of contents
List of Figures
List of Tables
List of Abbreviations
1 INTRODUCTION
Company Profile 1
1.1 About Project
1.2 Problem Statement
1.3 Objective
1.4 Scope of the Project
1.5 Methodology
2 LITERATURE SURVEY 5
4.2 Feasibility Study 18
4.3 Use Case Diagram 19
4.4 Class Diagram 20
4.5 Sequence Diagram 21
5 IMPLEMENTATION 22
5.1 VGG16 Architecture 23
5.2 Convolution Layer 26
5.3 Filters 28
5.4 Pooling Layer 32
5.5 Feature Extraction 35
5.6 Random Forest 37
7 TESTING 45
LIST OF FIGURES
LIST OF TABLES
Agile Methodology 3
LIST OF ABBREVIATIONS
FC - Fully Connected
CHAPTER 1
INTRODUCTION
COMPANY PROFILE:
This innovative software redefines the urban narrative, creating pathways that align
seamlessly with the pulse of city life. Its impact extends beyond mere convenience,
fostering a landscape where environmental sustainability intertwines with the ease of
mobility. Through continuous evolution and adaptation, it paves the way for a tomorrow
where transportation is not just efficient but a catalyst for a better world, bridging gaps
and shaping a connected, thriving society.
ABOUT PROJECT:
Cancer is currently a deadly disease on the rise across the globe. Among the several existing types of cancer, breast cancer (BC) presents two very concerning characteristics: it is the most common cancer among women worldwide, and it presents a very high mortality rate when compared to other types of cancer.
Histopathological analysis remains the most widely used method for BC diagnosis, and most diagnosis continues to be done by pathologists through visual inspection of histological samples under the microscope. Automatic classification of histopathological images is therefore a research topic that can make BC diagnosis faster and less prone to errors.
The detection of breast cancer is done by the analysis of either mammography or
ultrasound imaging and regular check-ups. Pathologists check the microscopic
elements and tissue structure for detailed analysis.
The reported results clearly show that the latter can achieve higher recognition rates.
However, the development of such system requires longer training time, some tricks
like random patches to improve performance, and still a lot of expertise from the
developer to tweak the system.
Deep CNN diagnosis provides a second option for image diagnosis which can
improve the reliability of experts’ decision making. Advanced CNN technology has
achieved great success in natural image classification and it has been used widely in
bio-medical image processing. Digitized tissue histopathology has now become amenable to the application of computerized image analysis.
• Analysis is the process of finding the best solution to the problem. System analysis is the process by which we learn about the existing problems, define objectives and requirements, and evaluate the solutions.
• The chapter covers the implementation aspects of the project, giving details of
the programming language and development environment used. It also gives
an overview of the core modules of the project with their step by step flow.
• Result and conclusion chapter gives the result of the project and the snapshots
of the project, its related information including graphs and other information
of the output.
• Testing chapter gives an overview of the various types of testing incorporated
during the entire duration of the project.
• The summary presents the work carried out and contributions, if any, along with their utility and the scope for further work.
1.3 Objectives
1.4 Methodology
• Collecting information regarding patients and their microscopic biopsy images.
• Inferring patterns, identifying model parameters and deciding features of
interest.
• Building a record structure showing how patients and pathologists can access
data.
• Deploying the prototype for testing, collecting feedback from users,
maintenance and bug fixing.
Agile Methodology
1.5 Problem Statement
• Goal: To build a robust model using machine-learned features and
Convolutional Neural Networks for the detection of breast cancer from
histopathological images.
• The broad adoption of WSI and other forms of digital pathology has been facing obstacles such as the high cost of implementing and operating the technology, insufficient productivity for high-volume clinical routines, and intrinsic technology-related concerns.
• A deep CNN with transfer learning is used, which avoids the need for large training data and long training times by taking the output of a pretrained CNN as input.
CHAPTER 2
LITERATURE SURVEY
2.1 PAPER 1:
In this work we presented an investigation of the use of DeCAF features for
breast cancer recognition using the BreaKHis dataset. The large size of the BreaKHis
dataset has given us the opportunity to compare, on the same dataset, CNN trained
from scratch with (DeCAF) features repurposed from another CNN trained on natural
images, which often is not possible with medical image datasets since they are too
small. From the results we can observe that these features are a viable alternative for a
fast creation of image recognition systems using deep learning, and this system can
perform better than systems using visual feature descriptors. Compared with a CNN
trained from scratch, DeCAF features present comparable recognition rates. Note that
training a CNN specifically for the problem requires more complex and slower
training schemes.
This result is important for the design of future classification based systems in
computer-aided diagnosis, since it shows that deep learned features, even if obtained
with a CNN trained on other types of images, are valuable. With this study we make one
more step towards transfer learning for medical image analysis and CAD/CADx
systems, where CNN trained on ImageNet enable the detection of nodules in medical
images.
2.2 PAPER 2:
The present work is based on the classification of breast cancer using capsule
net architecture. From this work, it is clear that the performance of the conventional
architectures can be improved by data pre-processing and parameter tuning. The
results show that this method can be used as an automated tool to assist doctors in
disease diagnosis, which may lead to higher concentration in the treatment at early
stages rather than diagnosis and can increase the cancer survival rate.
In the proposed method, the histology images are fed as input to the capsule network architecture, which consists of an input layer, hidden layers and an output layer. In a fully connected network, every neuron in one layer is connected to every neuron in the next layer, which gives a probability distribution. The present work is based on classification of breast cancer using a capsule network, and performance can be improved by better pre-processing and parameter tuning.
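The fully connected layer's probability-distribution output mentioned above can be sketched in a few lines of NumPy (an illustrative toy, not the paper's actual capsule network; the weights and input here are made up):

```python
import numpy as np

def dense_softmax(x, W, b):
    """A fully connected layer followed by softmax: every input neuron
    connects to every output neuron, and the outputs sum to 1."""
    z = W @ x + b               # weighted sum for each output neuron
    e = np.exp(z - z.max())     # subtract max for numerical stability
    return e / e.sum()          # softmax: a probability distribution

x = np.array([0.5, -1.2, 3.0])          # toy feature vector
W = np.array([[0.2, -0.4, 0.9],         # arbitrary weights: 2 classes x 3 features
              [-0.3, 0.8, -0.1]])
b = np.zeros(2)
p = dense_softmax(x, W, b)
print(p, p.sum())                       # two class probabilities summing to 1
```

The softmax output is what lets the final layer be read as the probability of each class (e.g. benign vs. malignant).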
2.3 PAPER 3:
Title: “Breast cancer diagnosis from biopsy images with highly reliable random
subspace classifier ensembles”
The first ensemble consists of binary SVMs, which were trained in parallel, while the second
ensemble comprises MLPs. During classification, the cascade of classifier ensembles
received randomly sampled subsets of features following the Random Subspace
procedure. For both of the ensembles the rejection option was implemented by
relating the consensus degree from majority voting to a confidence measure and
abstaining to classify ambiguous samples if the consensus degree was lower than the
threshold.
Although the proposed system has shown promising results with respect to the
biopsy image classification task, there are still some aspects that need to be further
investigated. The benchmark images used in this work were cropped from the original
biopsy scans and only cover the important areas of the scans. However, it is often difficult to find Regions of Interest (ROIs) that contain the most important tissues in biopsy scans; more effort therefore needs to be put into detecting ROIs from biopsy images. In this paper, the parameters for the cascade system, such as ensemble size
and rejection threshold, were decided empirically; this may not have produced the
most satisfactory performance with respect to all application contexts. Therefore,
some self-adaptive rules or algorithms for automatically optimizing these parameters
would be desirable.
2.4 PAPER 4:
We have also proposed several strategies for training the CNN architectures, based on the extraction of patches obtained randomly or by a sliding-window mechanism, that allow dealing with the high resolution of these textured images without changing the CNN architectures designed for low-resolution images. Our experimental results on the BreaKHis dataset showed improved accuracy for the CNN when compared to traditional machine learning models trained on the same dataset with state-of-the-art texture descriptors. Future work can explore
different CNN architectures and the optimization of the hyperparameters. Also,
strategies to select representative patches in order to improve the accuracy can be
explored.
for low-resolution images. The goal is to preserve the original tissue structures and molecular composition, allowing them to be observed in a light microscope. CNNs have achieved success in image classification problems, including medical image analysis. A CNN consists of multiple trainable stages stacked on top of each other, followed by a supervised classifier, and operates on feature maps. A feature map is a 2D array storing a colour channel of the input image. The output consists of a set of arrays where each feature map represents a particular feature extracted at a location of the input.
2.5 PAPER 5:
In this study, four different clustering algorithms for nuclei segmentation were
compared. The methods were applied in a medical decision support system and were
tested in terms of the classification accuracy on real routine medical data acquired
from the Regional Hospital in Zielona Góra. The main question to be answered was
whether relatively simple but fast methods can be successfully applied in computer-aided diagnosis. The results, reaching approximately 96–100%, are very optimistic and prove the approach not only to be fast and relatively easy to implement but also to provide accurate medical information for a cytologist. More sophisticated
segmentation methods might be disadvantageous considering slight or no resulting
improvement along with the large increase in complexity and computational
requirements. However, the presented methods often fail to appropriately identify
overlapping nuclei. Additionally, if a given image is composed of very few nuclei,
then the clustering might give incorrect results. Given the above, it is important to
provide an adequate number of images (in the paper, the authors used 9 images per
patient) to achieve a good detection accuracy.
The clustering algorithm selection impact on the final classification result was
also verified. Although visual inspection showed some discrepancies in the quality of
the segmentation results, the classification accuracy has not confirmed these
differences.
In the future, this research is going to be directed toward a possible application
of the presented approach to virtual slides (VS). VS are images that have extremely
high resolution (9 gigapixels and more) and contain information on the whole slide.
Moreover, there are plans to improve the accuracy for the three-class problem, where
apart from benign and malignant cases, there is also a fibroadenoma case.
Fibroadenoma is a benign tumor that can have some properties that are similar to a
malignant tumor. However, such a task could require a more sophisticated
classification approach.
2.6 PAPER 6:
As future work, one direction is to improve the recognition accuracy of
DeCAF features using patches. Further investigation on the size of the patches, as
well as overlapping patches, can be beneficial to increase the accuracies obtained with
DeCAF features. Another investigation that can produce good results is the
combination of these features with other visual descriptors and task-specific CNNs, to
exploit the complementarity of these approaches. In addition, a better investigation on
feature and classifier selection could also improve performance.
2.7 PAPER 7:
In this work, we have proposed a general framework based on CNNs for learning breast cancer histopathology image features. The proposed framework is independent of microscopy magnification and faster than previous methods, as it requires only a single training run. The speed and magnification-independence properties are achieved without sacrificing state-of-the-art performance. Magnification-independent models are scalable: new training images from any magnification level can be utilized, and trained models can easily be tuned (fine-tuned) by introducing new samples.
For the future work, stain normalization, deeper architectures, and splitting the
network before the last fully-connected layer could be investigated. It would be
interesting to observe task-wise early stopping in multi-task architecture. More
importantly, additional data with increased number of patients should be introduced.
We believe CNNs are more promising in breast cancer histopathology image
classification than handcrafted features and the data is the key issue to obtain more
robust models.
CHAPTER 3
SOFTWARE REQUIREMENT SPECIFICATION
The SRS also functions as a blueprint for completing a project with as little cost
growth as possible. The SRS is often referred to as the "parent" document because all
subsequent project management documents, such as design specifications, statements
of work, software architecture specifications, testing and validation plans, and
documentation plans, are related to it. It is important to note that an SRS contains
functional and non-functional requirements only; it doesn't offer design suggestions,
possible solutions to technology or business issues, or any other information other
than what the development team understands the customer's system requirements to
be.
The SRS functions as a blueprint for completing a project. The goal of preparing the
SRS document is to:
• Facilitate communication between the customer, analyst, system developers and maintainers.
• Form a foundation for the design phase.
• Support system testing activities.
• Control the evolution of the system.
Non-functional requirements are the requirements which are not directly concerned
with the specific function delivered by the system. They specify the criteria that can
be used to judge the operation of a system rather than specific behaviours. They may
relate to emergent system properties. Non-functional requirements for this system are
specified as follows:
3.1 Hardware Requirement
• 20GB hard disk space for training
• Testing can be done on PC: 8GB RAM
• Intel i7 processor
• 10GB hard disk space
3.2 Software Requirement
Google Colab is a free cloud service that now supports free GPU access. It is a great tool not only for improving Python coding skills but also for developing deep learning applications using common libraries. Colaboratory, or “Colab” for short, allows you to write and execute Python in your browser, with zero configuration required, free access to GPUs and easy sharing.
CHAPTER 4
SYSTEM ANALYSIS AND DESIGN
4.1 Overview
Analysis is the process of finding the best solution to the problem. System analysis is the process by which we learn about the existing problems, define objectives and requirements, and evaluate the solutions. It is a way of thinking about the organization and the problems it involves, and a set of technologies that helps in solving these problems. The feasibility study plays an important role in system analysis, as it gives the target for design and development.
All systems are feasible when provided with unlimited resources and infinite time. Unfortunately, this condition does not prevail in the practical world. So, it is both necessary and prudent to evaluate the feasibility of the system at the earliest possible time. Months or years of effort, thousands of rupees and untold professional embarrassment can be averted if an ill-conceived system is recognized early in the definition phase. Feasibility and risk analysis are related in many ways: if project risk is great, the feasibility of producing quality software is reduced.
A system is beneficial only if it can be turned into an information system that meets the organization’s technical requirements. Simply stated, this test of feasibility asks whether the system will work when developed and installed, and whether there are any major barriers to implementation. Regarding these issues, technical analysis focuses on several points:
Changes to bring in the system: All changes should be in a positive direction; there will be an increased level of efficiency and better customer service.
Required skills: Platforms & tools used in this project are widely used. So, the
skilled manpower is readily available in the industry.
Acceptability: The structure of the system is kept feasible enough so that there
should not be any problem from the user’s point of view.
Summary
The main aim of this chapter is to find out whether the system is feasible or not. For this reason, different kinds of analysis, such as performance analysis and technical analysis, are performed.
A use case diagram is usually simple. It does not show the detail of the use
cases:
• It only summarizes some of the relationships between use cases, actors, and
systems.
• It does not show the order in which steps are performed to achieve the goals of
each use case.
• Pathologist
• Doctor
• Oncologist
Fig 4.3 Use-Case Diagram
• Pathologist: This is the user interface class, which has functions for invoking train and predict.
• Feature extraction: This class has functions for extracting the features from the dataset and making them suitable for Random Forest.
• As shown in the sequence diagram for the training flow, the Pathologist invokes train, which trains a Random Forest model based on the training dataset.
• As shown in the sequence diagram for prediction, the Random Forest model’s predict is invoked to determine whether the sample is Benign or Malignant.
Fig 4.5 Sequence diagram for training flow
CHAPTER 5
IMPLEMENTATION
• Careful planning.
• Investigation of the system and constraints.
• Design of methods to achieve the changeover.
• Evaluation of the changeover method.
• Correct decisions regarding selection of the platform
• Appropriate selection of the language for application development
5.1 VGG16 Architecture
Fig 5.1: VGG16 Structure
The input to the conv1 layer is a fixed-size 224 × 224 RGB image. The image is passed through a stack of convolutional (conv.) layers, where the filters have a very small receptive field: 3×3 (the smallest size that captures the notion of left/right, up/down, centre). One of the configurations also utilizes 1×1 convolution filters, which can be seen as a linear transformation of the input channels (followed by non-linearity). The convolution stride is fixed to 1 pixel; the spatial padding of the conv. layer input is such that the spatial resolution is preserved after convolution, i.e. the padding is 1 pixel for 3×3 conv. layers. Spatial pooling is carried out by five max-pooling layers, which follow some of the conv. layers (not all conv. layers are followed by max-pooling). Max-pooling is performed over a 2×2 pixel window, with stride 2.
Three Fully-Connected (FC) layers follow a stack of convolutional layers (which has
a different depth in different architectures): the first two have 4096 channels each, the
third performs 1000-way ILSVRC classification and thus contains 1000 channels (one
for each class). The final layer is the soft-max layer. The configuration of the fully
connected layers is the same in all networks.
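The spatial bookkeeping above can be checked with simple arithmetic: 3×3 convolutions with 1-pixel padding and stride 1 preserve the spatial size, so only the five 2×2, stride-2 max-pools shrink the 224×224 input (a sketch of the standard VGG16 configuration; the channel counts are the usual published ones):

```python
# VGG16 spatial sizes: each conv (3x3, pad 1, stride 1) preserves the size,
# so only the five 2x2, stride-2 max-pools halve it.
size = 224
sizes = [size]
for _ in range(5):          # five max-pooling stages
    size //= 2
    sizes.append(size)
print(sizes)                # [224, 112, 56, 28, 14, 7]

# Channels per block in the standard configuration: 64, 128, 256, 512, 512
channels = [64, 128, 256, 512, 512]

# The first FC layer therefore sees a flattened 7*7*512 vector
flat = sizes[-1] * sizes[-1] * channels[-1]
print(flat)                 # 25088 inputs feeding the first 4096-unit FC layer
```

This is why the final feature maps are 7×7×512 before the three fully connected layers.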
5.2 Convolution Layer
The convolutional layer is the core building block of a CNN. Convolutional layers are the layers where filters are applied to the original image, or to other feature maps in a deep CNN. This is where most of the user-specified parameters in the network are. Convolutional Neural Networks (CNN or ConvNet) are complex feed-forward neural networks, used for image classification and recognition because of their high accuracy.
Fig 5.2: Convolutional Layer
5.3 Filters
As shown in figures 5.3 and 5.4, the reading of the input matrix begins at the bottom right of the image. Next, the software selects a smaller matrix there, which is called a filter. The filter then performs the convolution, that is, it moves along the input image. The filter’s task is to multiply its values by the original pixel values. All these multiplications are summed up, and one number is obtained in the end. After passing the filter across all positions, a matrix is obtained that is smaller than the input matrix.
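The multiply-and-sum operation just described can be sketched in NumPy (a toy example with made-up values; real CNN layers also add a bias and handle multiple channels):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the filter over the image; at each position multiply
    element-wise and sum, producing one number per position.
    The output is smaller than the input ('valid' convolution)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(16.0).reshape(4, 4)   # toy 4x4 "image"
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])        # toy 2x2 filter
result = conv2d_valid(image, kernel)
print(result.shape)                     # (3, 3): smaller than the input
```

Sliding a 2×2 filter over a 4×4 input yields a 3×3 output, exactly the shrinkage described above.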
The network will consist of several convolutional layers mixed with non-linear and pooling layers. When the image passes through one convolution layer, the output of the first layer becomes the input for the second layer, and this happens with every further convolutional layer.
Fig 5.4: Shows filter used in convolutional layer
A non-linear layer is added after each convolution operation. It applies an activation function, which brings the non-linear property. Without this property, the network would not be powerful enough to model the class labels.
5.4 Pooling Layer
The pooling layer follows the non-linear layer. It works on the width and height of the image and performs a downsampling operation on them; as a result, the image volume is reduced. After a series of convolutional, non-linear and pooling layers comes the fully connected layer. This layer takes the output information from the convolutional layers, and the end of the network produces an N-dimensional vector.
Fig 5.5: Max pooling example
As shown in figure 5.5, consider max pooling with a 2×2 window and stride 2: the maximum value in the first window is 2; in the next window, the maximum value is 1; then the maximum value 3 is followed by 1.
By combining these maximum values, a new matrix is formed. The representation is reduced in size, which reduces the number of parameters and computation in the network.
The addition of a pooling layer after the convolutional layer is a common pattern used
for ordering layers within a convolutional neural network that may be repeated one or
more times in a given model.
The pooling layer operates upon each feature map separately to create a new set of the
same number of pooled feature maps.
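The 2×2, stride-2 max pooling described above can be sketched in NumPy (the input values here are invented so that the window maxima come out as the 2, 1, 3, 1 mentioned in the text):

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2: keep only the maximum of each
    non-overlapping 2x2 window, halving both spatial dimensions."""
    h, w = x.shape
    # element [i, a, j, b] of the reshaped array is x[2i+a, 2j+b],
    # so taking the max over axes (1, 3) is the max over each window
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 2, 1, 0],
              [0, 1, 0, 1],
              [3, 1, 1, 0],
              [2, 0, 0, 1]])
pooled = max_pool_2x2(x)
print(pooled)     # [[2 1]
                  #  [3 1]] -- the four window maxima
```

The 4×4 input becomes a 2×2 output, which is exactly the parameter reduction the text describes.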
Fig 5.6: Pooling Layer
Fig 5.7 Feature Extraction
As shown in figure 5.7, when an image is selected, the features are extracted from the image. The informative features are then selected, using which we can classify whether the tumour is benign or malignant.
5.6 Random Forest
Random forest, otherwise known as the random forest model, is a method for classification and other tasks. It operates by constructing decision trees and outputting the classification chosen by a majority of the individual trees. A random forest is a classification algorithm consisting of many decision trees. It uses bagging and feature randomness when building each individual tree to try to create an uncorrelated forest of trees whose prediction by committee is more accurate than that of any individual tree.
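The bagging-and-voting idea can be illustrated with scikit-learn's RandomForestClassifier on toy data (the random features below are only a stand-in for the extracted image features, not the project's real dataset):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Toy stand-in for extracted features: 100 samples x 8 features,
# labelled 0 (benign) / 1 (malignant) by a simple made-up rule
X = rng.normal(size=(100, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Each tree is trained on a bootstrap sample of the rows, considering a
# random subset of features at each split; prediction is by majority vote
clf = RandomForestClassifier(n_estimators=50, max_depth=5, random_state=0)
clf.fit(X, y)
preds = clf.predict(X)
print(clf.score(X, y))     # training accuracy of the ensemble
```

Bootstrap sampling plus feature randomness keeps the trees decorrelated, which is why the committee vote beats any single tree.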
Fig 5.8: Random Forest
CODE:
Using VGG16 for breast cancer detection
# Import the Python libraries that will be used later in the program
import os
import pickle
import numpy as np
import pandas as pd
from google.colab import drive
Instantiate VGG16
import tensorflow as tf
# Import the VGG16 network architecture, used for extracting features from the images
from tensorflow.keras.applications import VGG16

# Initialise the image dimensions as 50x50x3, with batches of 16 images
IMG_WIDTH = 50
IMG_HEIGHT = 50
IMG_DEPTH = 3
BATCH_SIZE = 16
conv_base = VGG16(weights='./vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5',
                  include_top=False,
                  input_shape=(IMG_HEIGHT, IMG_WIDTH, IMG_DEPTH))

print(tr.shape)      # shape of the train features

# Train the classifier on the extracted training-image features
# along with their corresponding labels
clf.fit(tr, y_train)
(8998, 512)
RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=5, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=5, n_jobs=None,
                       oob_score=False, random_state=None, verbose=0, warm_start=False)
# Reshape val_features into a 2D array and store it in val
val = val_features.reshape(val_features.shape[0], -1)

# Check the accuracy of the previous training, i.e. how accurately this
# model will predict whether a person has cancer or not
clf.score(val, y_val)

# Store all the unique labels in y as a list in labs
labs = list(set(y))
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation=tf.nn.relu),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(2, activation=tf.nn.softmax)
])

model.compile(optimizer=adm,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model with model.fit; with epochs=15 the model trains for
# 15 passes over the data and tries to reach the best prediction accuracy
model.fit(train_features, alt_num_labs_train, epochs=15)
model.evaluate(val_features, alt_num_labs_val)
import tensorflow as tf
from tensorflow.keras import models, layers, optimizers, callbacks

model = models.Sequential()
model.add(layers.Dense(512, activation='relu', input_dim=1 * 1 * 512))
model.add(layers.Dense(2, activation=tf.nn.softmax))

model.compile(loss='binary_crossentropy',
              optimizer=optimizers.RMSprop(lr=2e-4),
              metrics=['acc'])

# Reduce the learning rate when the validation loss stops improving
reduce_learning = callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.2, patience=2,
    verbose=1, mode='auto', epsilon=0.0001,
    cooldown=2, min_lr=0)

# Stop training early if the validation loss does not improve for 7 epochs
early_stopping = callbacks.EarlyStopping(
    monitor='val_loss', min_delta=0,
    patience=7, verbose=1, mode='auto')

callbacks = [reduce_learning, early_stopping]
callbacks=callbacks
)
import matplotlib.pyplot as plt

# Graphically represent the prediction loss during training and validation
plt.figure()
plt.legend()
plt.show()
CHAPTER 6
RESULTS
• The overall accuracy achieved for breast cancer diagnosis is 86%.
• On this dataset, the CNN obtained improved accuracy when compared to traditional learning techniques.
• Deep CNN diagnosis provides a second option for image diagnosis, which can improve the reliability of expert decision making.
• The following snapshots and graphs show the results or outputs obtained after step-by-step execution of each proposed protocol.
SCREENSHOTS
• This image is the most affected part/patch of the original scanned image
• It shows how the image size decreases in each layer, i.e., the input layer, Conv2D layer and max-pooling layer
Fig 6.4: Accuracy achieved
• The above figure shows the accuracy achieved for each epoch value
• For different epoch values, the accuracy achieved is different
• For epoch = 18, the maximum accuracy achieved is 94%
• The above graph is the accuracy graph obtained from the achieved accuracies
• As we can see, the training accuracy is at its peak, which shows the maximum result
Fig 6.7: Loss graph
Fig 6.8: Output shown for a specific image
• Based on the model prediction, it predicts whether the image is cancerous or not
• For an input image of size 50×50, the model predicts the diagnosis as Cancer
CHAPTER 7
TESTING
This chapter gives an overview of the various types of testing incorporated during the
entire duration of the project.
Under the System Testing technique, the entire system is tested as per the requirements. It is a black-box type of testing that is based on the overall requirement specifications and covers all the combined parts of a system.
7.5 Interface Testing
The objective of Interface Testing is to validate the interface as per the business requirements. The expected interface of the application is specified in the detailed design document and interface mock-up screens. It also checks whether the application correctly connects to the server.
Compatibility Testing checks whether the application is compatible with the specified
software and hardware requirements and functions efficiently as expected.
CHAPTER 8
8.1 CONCLUSION
The present work is based on the classification of breast cancer using capsule
net architecture. From this work, it is clear that the performance of the conventional
architectures can be improved by data pre-processing and parameter tuning.
The results show that this method can be used as an automated tool to assist
doctors in disease diagnosis, which may lead to higher concentration in the treatment
at early stages rather than diagnosis and can increase the cancer survival rate.
As it is difficult to detect breast cancer in its early stages, doctors can use a CNN as a second opinion when diagnosing patients. CNN provides better accuracy and quality compared to other methods used to classify breast cancer tumours as Benign or Malignant.
8.2 FUTURE ENHANCEMENT
• Stain normalization, deeper architectures, and splitting the network before the last fully connected layer could be investigated.
REFERENCES
• F. Spanhol, L. S. Oliveira, C. Petitjean, and L. Heutte, “A dataset for breast cancer histopathological image classification,” IEEE Transactions on Biomedical Engineering (TBME), vol. 63, pp. 1455–1462, 2015.
• A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei, “Large-scale video classification with convolutional neural networks,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1725–1732, 2014.