Survey On Crime Analysis and Prediction Using Machine Learning Techniques
Social media text analytics is the process of deriving information from text
sources. Text analysis can be applied to any text-based dataset, including
social media. Crime is a major problem facing society today, and
crime-related content increasingly appears on social media. Crime harms
both quality of life and economic growth. Machine learning can identify
crime patterns and predict crimes by analyzing historical data. However,
some crimes go unregistered or unsolved because of a lack of evidence, so
detecting crimes is still a challenging task. Social media can help detect
crime-related activities, because users sometimes post messages about
their surrounding environment. This project proposes a machine learning
approach that detects crimes and analyzes them by type. First, we fetch
text messages using predefined crime-related keywords. Then, after
preprocessing, we apply a support vector machine (SVM)-based filter to
eliminate noise. Next, a random forest is used for classification.
Finally, the system analyzes and categorizes the crime type.
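The two-stage design above (an SVM that filters out noise, then a random
forest that assigns a crime type) can be sketched with scikit-learn. This is
a minimal sketch under stated assumptions: the abstract does not say which
text features are used, so TF-IDF vectors are assumed here, and the
training posts, labels, and the helper names `build_models` and `analyse`
are hypothetical.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical stage-1 training data: 1 = genuine crime report, 0 = noise
# (posts that merely use a crime-related word).
FILTER_POSTS = [
    "man arrested after robbery at the city bank",
    "gunman shot a shop owner near the station",
    "thief stole my bike outside the library today",
    "watching a robbery movie tonight with friends",
    "this new song is a total steal at this price",
    "that football team got robbed by the referee",
]
FILTER_LABELS = [1, 1, 1, 0, 0, 0]

# Hypothetical stage-2 training data: crime type of genuine reports.
TYPE_POSTS = [
    "man arrested after robbery at the city bank",
    "armed robbery reported at the jewellery store",
    "gunman shot a shop owner near the station",
    "victim shot during a street fight downtown",
    "thief stole my bike outside the library today",
    "burglar stole a laptop from the apartment",
]
TYPE_LABELS = ["robbery", "robbery", "shooting", "shooting", "theft", "theft"]


def build_models():
    """Train the SVM noise filter and the random forest crime-type classifier."""
    noise_filter = make_pipeline(TfidfVectorizer(), LinearSVC(random_state=0))
    noise_filter.fit(FILTER_POSTS, FILTER_LABELS)
    type_classifier = make_pipeline(
        TfidfVectorizer(),
        RandomForestClassifier(n_estimators=100, random_state=0),
    )
    type_classifier.fit(TYPE_POSTS, TYPE_LABELS)
    return noise_filter, type_classifier


def analyse(posts, noise_filter, type_classifier):
    """Return (post, crime_type) pairs for posts the SVM keeps as crime-related."""
    kept = [p for p, label in zip(posts, noise_filter.predict(posts)) if label == 1]
    if not kept:
        return []
    return list(zip(kept, type_classifier.predict(kept)))


if __name__ == "__main__":
    filt, clf = build_models()
    for post, crime_type in analyse(FILTER_POSTS, filt, clf):
        print(crime_type, "<-", post)
```

Training each stage on its own labels lets the filter learn to reject posts
that only mention a crime word (for example, a film title), so the random
forest only sees posts that look like genuine reports.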
TABLE OF CONTENTS
ABSTRACT
ABBREVIATIONS
1 INTRODUCTION
1.3 DOMAIN INTRODUCTION
1.3.1 BASICS OF PYTHON
1.3.2 PYTHON FEATURES
2.4 HARDWARE REQUIREMENTS
3 METHODS AND ALGORITHMS USED
3.1 DATA COLLECTION
3.2 DATA CLEANING
3.3 DATA PREPROCESSING
3.4 FEATURE EXTRACTION
3.9.1 NLTK
3.9.2 TENSORFLOW
3.9.3 KERAS
3.10 SUPPORT VECTOR MACHINE
4 RESULTS
4.1 RESULTS
4.2 SCREENSHOTS OF OUTPUT
5 CONCLUSION AND FUTURE WORK
5.1 CONCLUSION
5.2 FUTURE SCOPE
REFERENCES
APPENDIX
A. SOURCE CODE
LIST OF FIGURES
3.12 BOOTSTRAPPING
ABBREVIATIONS
ML MACHINE LEARNING
CHAPTER 1
INTRODUCTION
Security is an essential aspect of life. Unless we are safe, our most
important needs cannot be met. Security is therefore a requirement that
helps us achieve our goals, collectively or individually. Crime is a
social problem that costs society dearly in many respects. The ability to
identify areas that are unsafe because of crime, and to identify the most
recent crimes in a particular location, has become a growing concern for
both local authorities and residents. At the same time, people living in
busy communities are always interested in improving safety and building
reliable relationships with their neighbors. The prevalence of crime is
one of the greatest challenges for societies around the world,
particularly in metropolitan areas. There is a large body of research on
crime in general, but few studies have examined crimes and criminal
behavior through social media. Therefore, this project aims to present a
prediction model (algorithm) that uses machine learning and data mining
concepts to predict crimes from factors in a social media dataset. Our
main data source is social media, and the main goal is to uncover the
hidden information in each data source and predict results.
● The main objective of the project is to predict and analyze future
crime rates. Based on this information, officials can take action to
reduce crime.
● Multiple linear regression is used to model the relationship between
each type of crime (dependent variable) and the year (independent
variable).
● The system converts crime information into a regression problem, so
that it can help detectives solve crimes faster.
● Crime analysis uses the available information to extract crime
patterns. Using multiple linear regression techniques, the frequency of
crime can be predicted from the territorial distribution of existing
data and from crime recognition.
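The regression in the objectives above can be sketched as a multiple linear
regression with scikit-learn's `LinearRegression`. This minimal sketch
assumes the yearly count of a crime type is the dependent variable and the
year plus one-hot crime-type indicators are the independent variables. The
category names, the helpers `encode` and `fit_crime_model`, and the
synthetic counts are hypothetical, not data from this project.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical crime categories; the first one is the baseline category.
CRIME_TYPES = ["theft", "robbery", "assault"]


def encode(years, crime_types):
    """Build the design matrix: a year column plus one-hot crime-type columns.

    The baseline category (theft) gets no column, which avoids perfect
    collinearity with the intercept that LinearRegression fits.
    """
    year_col = np.asarray(years, dtype=float).reshape(-1, 1)
    indicators = np.array(
        [[1.0 if t == c else 0.0 for c in CRIME_TYPES[1:]] for t in crime_types]
    )
    return np.hstack([year_col, indicators])


def fit_crime_model(years, crime_types, counts):
    """Fit count = b0 + b1*year + b2*[robbery] + b3*[assault] by least squares."""
    model = LinearRegression()
    model.fit(encode(years, crime_types), np.asarray(counts, dtype=float))
    return model


if __name__ == "__main__":
    # Synthetic counts generated from a known linear rule, for illustration only.
    years = [2015, 2016, 2017, 2018] * 3
    types = ["theft"] * 4 + ["robbery"] * 4 + ["assault"] * 4
    offset = {"theft": 0, "robbery": 20, "assault": 50}
    counts = [100 + 10 * (y - 2015) + offset[t] for y, t in zip(years, types)]
    model = fit_crime_model(years, types, counts)
    # Forecast 2020 counts for theft and robbery.
    print(model.predict(encode([2020, 2020], ["theft", "robbery"])))
```

Because the year enters as one shared column, this model fits a single
common trend with a separate offset per crime type; fitting one regression
per crime type would allow each type its own trend.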
Python has a simple syntax similar to English, which allows developers to
write programs in fewer lines than in some other programming languages.
Python runs on an interpreter, meaning that code can be executed as soon
as it is written.
CHAPTER 2
AIM AND SCOPE
● Libraries: OpenCV
● Input Devices: Keyboard, Mouse
● RAM: 2 GB
CHAPTER 3
METHODS AND ALGORITHMS USED
● The search for social media posts must be based on a set of keywords
that can be used to classify crime situations.
● Thus, in the first filter, we used the main crime-related keywords for
each crime category.
● Social media posts may also contain typos, unwanted content such as
URLs, and stop words.
● Thus, data obtained from social media is highly unstructured and
noisy.
● Preprocessing techniques generate clean tweet data for the next stage.
● First, we removed stop words such as "is", "the", "which", and "have".
These words do not convey any positive or negative meaning, so they can
be removed without affecting the meaning of the message.
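The keyword filter and the cleaning steps described above can be sketched
in plain Python. The keyword sets, the small stop-word list, and the
function names `matched_categories`, `preprocess`, and `collect` are
hypothetical; the report does not list its keywords, and a full stop-word
list (for example NLTK's English `stopwords` corpus, which must first be
downloaded) would normally replace the illustrative set used here. Typo
correction is not attempted.

```python
import re

# Hypothetical crime categories and keywords; the report does not list its own.
CRIME_KEYWORDS = {
    "robbery": {"robbery", "robbed", "mugged"},
    "theft": {"stole", "stolen", "theft"},
    "assault": {"assault", "attacked", "stabbed"},
}

# Small illustrative stop-word list that includes the examples from the text.
STOP_WORDS = {"is", "the", "which", "have", "a", "an", "and", "at",
              "in", "my", "was", "of", "to"}

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
TOKEN_PATTERN = re.compile(r"[a-z]+")


def matched_categories(post):
    """First filter: return the crime categories whose keywords occur in the post."""
    tokens = set(TOKEN_PATTERN.findall(post.lower()))
    return sorted(cat for cat, words in CRIME_KEYWORDS.items() if tokens & words)


def preprocess(post):
    """Remove URLs, lower-case, keep alphabetic tokens, and drop stop words."""
    text = URL_PATTERN.sub(" ", post.lower())
    return [tok for tok in TOKEN_PATTERN.findall(text) if tok not in STOP_WORDS]


def collect(posts):
    """Keep only keyword-matched posts and return (categories, clean tokens) pairs."""
    results = []
    for post in posts:
        categories = matched_categories(post)
        if categories:
            results.append((categories, preprocess(post)))
    return results


if __name__ == "__main__":
    sample = [
        "The bank at Main St was robbed!! https://t.co/abc123",
        "Lovely weather in the park today",
    ]
    print(collect(sample))
```

Applying the cheap keyword filter before preprocessing means that only
candidate crime posts pay the cost of cleaning and, later, classification.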