Classification and Popularity Assessment of English Songs Based On Audio Features
Abstract:- A large amount of new music emerges every year. How to categorize music properly for quick browsing and retrieval by users, and how to evaluate music popularity based on audio features, are important research topics. In this study, a decision tree model is used to classify music styles on a dataset consisting of audio features of 4802 songs from 2008-2017. The number of listens recorded in the dataset is then used as an indicator of song popularity. After comparing the training results of different machine learning algorithms on the dataset, the Gradient Boosting Regressor is chosen, and the relative importance of the different audio features for song popularity is calculated with this model.

Keywords:- Audio Features, Machine Learning, Classification.

I. INTRODUCTION

With the rapid development and popularity of the Internet and information technology, online music has become an essential form of entertainment in people's daily lives. Music websites and applications based on streaming technology have also become the primary channels through which people access music. At the same time, a large amount of new music emerges every year. Classifying this music appropriately, so that users can browse and retrieve it quickly, and evaluating from a song's features whether users will welcome it, have therefore become important research topics.

Music is more complex audio information than speech, containing various elements such as the human voice, musical instruments, natural sounds, and noise. In the early days of music information processing research, the focus was on music recognition and retrieval methods. It was not until the 1990s, with the rise of Internet technology, that music classification algorithms came to the forefront. In 1995, Matisyahu et al. proposed a method that preprocesses audio information using the Fourier transform and then classifies it using artificial neural networks [1]. In 1996, Wold et al. proposed using the mean, variance, and autocorrelation coefficients as features to classify audio signals with the KNN algorithm [2]. In 2002, Tzanetakis et al. used timbre, pitch, and rhythm as features to classify music with 61% classification accuracy [3]. In 2012, the Google Brain project used a large amount of computing resources to train a deep neural network (DNN), which achieved a significant breakthrough in speech recognition and image processing. In addition to the classical algorithms listed above, there are many practical music classification algorithms. These methods are based on extracting features that reflect the essential properties of music, designing high-performance classifiers, and optimizing the classification results.

Meanwhile, with the significant increase in the number of online music releases each year, predicting the popularity of music and recommending music on this basis has become an important area that affects the activity of music website users [4-5]. Listening to every piece of music without guidance undoubtedly adds a lot of unnecessary time cost for users. Since many users browse and enjoy music on electronic music platforms every day, the resulting massive record of what users browse, collect, and listen to is an essential guide to music trends and users' preferences [6].

This study is based on data for 17.7K English songs from 2008-2017 and the track metrics compiled by The Echo Nest, obtained from the Kaggle platform. The dataset provides several audio features, including acousticness, danceability, energy, instrumentalness, liveness, tempo, and valence, as well as label data such as the music classification and the number of listens. This study uses this dataset to train a music classifier with the Decision Tree algorithm. In addition, because the number of listens is the best-known indicator of a song's popularity, an evaluator of user preferences is constructed from the audio feature data with the number of listens as the target.

II. MUSIC STYLE CLASSIFICATION

In this study, the decision tree model is used to classify music styles on a dataset consisting of audio features of 4802 songs from 2008-2017. The decision tree is a supervised learning method: for a classification problem, it generalizes classification rules from the training dataset. The model has a tree-like structure that represents the process of classifying data based on its features. Its advantages are that the model is readable and that classification is faster than with other commonly used algorithms. Although infinitely many conditional probability models can be defined on a class partition of the feature space, training a decision tree should select a model that fits the training data well and also has good predictive power for unseen data.
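
The way a decision tree generalizes classification rules from labelled training data can be illustrated with a minimal sketch. It is not the implementation used in this study: it assumes a CART-style tree that splits on a single feature threshold chosen by Gini impurity, and the six (energy, acousticness) rows and the "rock"/"folk" labels are made-up toy values, not records from the dataset. The function names (`gini`, `best_split`, `build_tree`, `predict`) are hypothetical helpers introduced only for this example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature_index, threshold) that lowers weighted Gini most.

    Returns None when no split improves on the unsplit node.
    """
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively build a tree; leaves are majority class labels."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left],
                           depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right],
                            depth + 1, max_depth),
    }


def predict(tree, row):
    """Follow the learned rules from the root down to a leaf label."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Toy, made-up training data: (energy, acousticness) -> style label.
rows = [(0.90, 0.10), (0.80, 0.20), (0.85, 0.05),
        (0.20, 0.90), (0.30, 0.80), (0.25, 0.95)]
labels = ["rock", "rock", "rock", "folk", "folk", "folk"]

tree = build_tree(rows, labels)
prediction = predict(tree, (0.70, 0.30))
```

The `max_depth` limit is one simple way to trade fit on the training data against predictive power on unseen data; a practical study would more likely rely on an established library implementation and a held-out test set.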