
Accredited Ranking SINTA 2

Decree of the Director General of Higher Education, Research, and Technology, No. 158/E/KPT/2021
Validity period from Volume 5 Number 2 of 2021 to Volume 10 Number 1 of 2026

Published online on: http://jurnal.iaii.or.id


JURNAL RESTI

(Rekayasa Sistem dan Teknologi Informasi)


Vol. 6 No. x (2022) x - x ISSN Media Electronic: 2580-0760

Solution to Scalability and Sparsity Problems in Collaborative Filtering


using K-Means Clustering and WP-Rank

Abstract
Collaborative Filtering is a method used in recommendation systems. It works by analyzing rating data patterns, which are used to make predictions of user interest. The process begins with collecting and analyzing large amounts of information about the behavior, activities, and tendencies of users; the results of the analysis are used to predict what users will like based on their similarity to other users. In addition, Collaborative Filtering is able to produce recommendations of better quality than content-based and demographic-based recommendation systems. However, Collaborative Filtering still faces scalability and sparsity problems: the data keeps growing into big data, and much of the data is incomplete or contains many empty entries. Therefore, this study proposed a clustering- and ranking-based approach. The clustering algorithm used was K-Means, while the WP-Rank method was used for ranking. The experimental results showed that clustering made the running time faster, with an average execution time of 0.15 seconds. In addition, it improved the quality of recommendations, as indicated by an increase in the NDCG value at k=22, where the average NDCG was 0.82, so the recommendations produced were of higher quality and better matched user interests.
Keywords: collaborative filtering, scalability, sparsity, K-means, WP-Rank

1. Introduction

Recommendation systems are often used to solve problems by seeking relevant information from an available collection of information. They are generally applied to fields that have large amounts of data that continue to grow over time. The system processes user information and then provides recommendations according to the characteristics of the user, that is, according to their specialization. One of the methods used to provide recommendations according to user interest is collaborative filtering.

Collaborative filtering (CF) works by analyzing rating data patterns, which are then used to make predictions of user interest based on similarities with other users. CF has several advantages, including being easy to implement and able to filter all kinds of information or goods without having to analyze comments from users. In addition, CF generates high-quality recommendations. CF is also a complete and general method that has been implemented in various fields, such as GroupLens for retrieving articles from a large news database and Amazon for promoting recommendations, where recommendations are obtained from previous purchase history or from similar users. In addition, Ringo builds user profiles based on the ratings given to music albums [1]. Although CF is a popular method, it faces major problems, namely cold start, sparsity, and scalability. Cold start is the condition of a new user who has never given a rating to a product, so the direction of the user's interest is difficult to determine, or of a new item that has never received a rating from a user; if the direction of interest is unknown, it is difficult to give recommendations. Sparsity refers to data in sparse condition, where the data matrix is incomplete or contains many empty entries. If sparse data are found, the resulting similarity values will be small, both between users and between items, so the resulting recommendations are of lower quality. Scalability is a condition in which recommendation systems need to increase their computing power to offer timely

Accepted: xx-xx-2022 | Received in revised: xx-04-2022 | Published: xx-04-2022


Author1, Author2
Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) Vol. 6 No. x (2022)

recommendations. This occurs with large-scale data and requires substantial resources and reliable computing.

Several studies have been carried out to overcome these problems, such as that of Das, J. et al., who proposed a clustering-based collaborative filtering approach that partitions the data using CURE (Clustering Using Representatives). The cluster results are then processed with a collaborative filtering algorithm to produce recommendations for target users. This process is carried out per cluster, so it does not process the entire user-item database, and the time required becomes shorter. In addition to addressing the scalability problem, the clustering approach can address the sparsity problem by reducing the dimension of the rating matrix and reducing noisy data. It also significantly reduces running time while maintaining recommendation quality [2].

Wang, L. et al. proposed a diversified and scalable recommendation method (DR_LT) to overcome problems in neighborhood-based collaborative filtering (CF). These problems include the growing volume of user rating data, which makes the resulting recommendations less efficient, because the recommendation system analyzes all rating data when searching for similar users or similar items. In addition, neighborhood-based CF pays more attention to recommendation accuracy, while key indicators such as recommendation diversity (RD) are often ignored, which affects recommendation results and reduces user satisfaction. DR_LT utilizes locality-sensitive hashing and cover trees to optimize the recommendation list, making performance effective and producing item recommendations that are accurate, diverse, and able to solve scalability problems [3].

Zhao, Z. et al. proposed a recurrent neural network (RNN) approach to overcome the scalability problem, because it saves memory. It is also able to overcome the cold-start problem and to provide new users with the same quality of service as existing users. The first stage allocates an item table and uses a pair of embedding vectors to represent each item; in this way, several vectors can represent many items, reducing the memory used. In the second stage, the item table uses a similarity-based initialization method so that the item representations are better. This is followed by placing the appropriate item in the item table using the loss function and an adjustment method, which improves the performance of the recommendation system by speeding up training procedures [4].

Several studies have also been conducted to overcome the sparsity problem, including that of Ifada, N. et al., who combined ratings and the similarity of film genres to address sparsity, while Fuzzy C-Means was used to cluster movies to address scalability. This approach produces dense rating data that scales to high-dimensional data, and the resulting recommendations are of higher quality [5]. Furthermore, Andra, D. and Baizal, Z. proposed Principal Component Analysis (PCA) and K-Means clustering to overcome the sparsity problem: PCA reduces the data dimensions and improves the performance of K-Means clustering, while K-Means forms data clusters and reduces the amount of data processed. Using PCA and K-Means results in a lower RMSE value compared to other models [6]. Similarly, Ardimansyah, M. I. et al. proposed Matrix Factorization to fill in empty rating values and overcome sparse rating data [7].

In addition, a study to overcome sparsity was also carried out by Lestari, S. et al., who proposed a ranking-based approach, namely the NRF (Normalized Rating Frequency) method [8]. Lestari, S. et al. also proposed WP-Rank, which maximizes the use of ranking data to generate product weights; the experimental results show that the WP-Rank method is superior to the Borda method [9]. This was followed by the PoratRank method, which generates product rankings by optimizing rating data so that the aggregation results are product rankings recommended to users according to their interests, producing higher-quality recommendations [10].

Meanwhile, this study combines a clustering approach with a ranking-based approach to overcome the scalability and sparsity problems. The K-Means clustering algorithm is used to overcome the scalability problem, while WP-Rank is used to overcome the sparsity problem through an aggregation process, so as to produce higher-quality recommendations in accordance with user preferences.

2. Research Methods

This study addresses the scalability and sparsity problems in Collaborative Filtering using clustering and ranking-based approaches. The clustering algorithm used is K-Means, while the WP-Rank method is used for ranking. The stages of the study are shown in Figure 1.
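The data pre-processing stage (empty ratings set to 0, demographic attributes converted to numeric codes) can be sketched in Python as follows. The rows, movie list, and helper names below are hypothetical illustrations; the gender coding F→1, M→2 is inferred from the example tables:

```python
# Hypothetical raw user rows: (user_id, gender, age, occupation_id) plus sparse ratings.
raw_users = [(1, "M", 56, 16), (5, "F", 50, 9)]
raw_ratings = {(1, "movie1"): 5}           # pairs missing from this dict are empty ratings
movies = ["movie1", "movie2"]

gender_map = {"F": 1, "M": 2}              # assumption: F -> 1, M -> 2, as in the example tables

def preprocess(users, ratings, movies):
    """Build numeric feature rows: demographics + dense rating vector (empty rating -> 0)."""
    table = []
    for uid, gender, age, occ in users:
        row = [uid, age, gender_map[gender], occ]
        row += [ratings.get((uid, m), 0) for m in movies]   # empty rating becomes 0
        table.append(row)
    return table

print(preprocess(raw_users, raw_ratings, movies))
# -> [[1, 56, 2, 16, 5, 0], [5, 50, 1, 9, 0, 0]]
```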

DOI: https://doi.org/10.29207/resti.v6iX.xxx
Creative Commons Attribution 4.0 International License (CC BY 4.0)

[Figure 1: flowchart of the research stages — Start → Data Collection → Data Pre-Processing → K-Means → Evaluation of Cluster (Davies Bouldin Index) → WP-Rank → Evaluation (NDCG, Running Time) → End]

Figure 1. Research Stages

The initial stage of this study was collecting the dataset from movielens.org, namely MovieLens 100k, with 943 users and 1682 movies. The demographic information includes age, gender, occupation, and zip code. Example user demographic information is shown in Table 1, and the list of occupations is shown in Table 2.

Table 1. The Example of Information Data for Demography User

User Id  Gender  Age  Occupation  Zip Code
1        M       56   16          70072
2        M       25   15          55117
3        M       45   7           2460
4        M       25   20          55455
5        F       50   9           55117
6        M       35   1           6810
7        M       25   12          11413
8        M       25   17          61614
9        F       35   1           95370
10       F       25   1           4093

Table 2. List of Occupation

Id  Occupation                Id  Occupation
0   Other or Not Specified    11  Lawyer
1   Academic / Educator       12  Programmer
2   Artist                    13  Retired
3   Clerical / Admin          14  Sales / Marketing
4   College / Grad Student    15  Scientist
5   Customer Service          16  Self-Employed
6   Doctor / Health Care      17  Technician / Engineer
7   Executive / Managerial    18  Tradesman / Craftsman
8   Farmer                    19  Unemployed
9   Homemaker                 20  Writer
10  Student

After the data collection stage was complete, it was followed by data pre-processing. This was done by replacing empty rating data with the value 0 and converting the demographic data to numeric form: gender to 1 and 2, and occupation to 1-21, as shown in Table 3.

Table 3. The example of pre-processing data result

Id  Age  Gender  Occupation
1   24   2       20
2   53   1       14
3   23   2       21
4   24   2       20
5   33   1       14
6   42   2       7
7   57   2       1
8   36   2       1
9   29   2       19
10  53   2       10

The next step was clustering the data shown in Table 4 with the K-Means algorithm, with evaluation using the Davies Bouldin Index to determine the optimal number of clusters. The cluster results were then ranked using the WP-Rank method, producing recommendations in the form of movie rankings. The final step was evaluating the ranking quality using NDCG and the time needed for method execution (running time).

K-Means

K-Means is a simple algorithm with a fast process, so it is widely used in studies. The K-Means algorithm uses a partitioning scheme to group data into two or more clusters [11]. It works by grouping objects based on the cluster center point (centroid) closest to each object [6]. The goal is to group objects by maximizing the similarity of objects within one cluster and minimizing the similarity of objects between clusters. The measure of similarity within a cluster uses a distance function, so the similarity of an object is calculated from the shortest distance between the object and the centroid point. One of the methods used to calculate this distance is the Euclidean distance, which gives the shortest distance between two points. The stages of the K-Means algorithm are:

Table 4. The Example of clustered data using K-Means algorithm


id age gender occupation movie1 movie2 movie3 … movie1682
1 24 2 20 5 3 4 … 3
2 53 1 14 4 0 0 … 0
3 23 2 21 0 0 0 … 0
4 24 2 20 0 0 0 … 0
5 33 1 14 4 3 0 … 0
6 42 2 7 4 0 0 … 0
7 57 2 1 0 0 0 … 0
8 36 2 1 0 0 0 … 0
9 29 2 19 0 0 0 … 0
10 53 2 10 4 0 0 … 0
. . . . . . . … .
. . . . . . . … .
. . . . . . . … .
943 22 2 19 4 5 3 … 0
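As an illustration of the clustering step applied to rows like those in Table 4, here is a minimal pure-Python K-Means sketch. The rows are hypothetical (age, gender, occupation columns only), and for reproducibility it initializes centroids with the first k points, a deterministic variant of the random selection mentioned in the steps:

```python
def kmeans(data, k, iters=100):
    """Minimal K-Means: Euclidean distance for assignment, centroid update by
    cluster mean, stopping when assignments no longer change."""
    centroids = [list(p) for p in data[:k]]   # deterministic init: first k points
    assignment = [None] * len(data)
    for _ in range(iters):
        # Assign each point to the nearest centroid (squared Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centroids[j])))
            for p in data
        ]
        if new_assignment == assignment:       # stop once nothing moves
            break
        assignment = new_assignment
        # Recompute each centroid as the mean of its member rows.
        for j in range(k):
            members = [p for p, a in zip(data, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Hypothetical age/gender/occupation rows in the style of Table 3/4.
rows = [[24, 2, 20], [23, 2, 21], [53, 1, 14], [57, 2, 1], [22, 2, 19], [53, 2, 10]]
labels, _ = kmeans(rows, k=2)   # younger users end up in one cluster, older in the other
```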

1. Determine the number of clusters (k = …) to be formed.
2. Determine the centroids (the initial centroids can be chosen by selecting data randomly).
3. Calculate the distance from each data point to each centroid. This study used the Euclidean distance, Equation (1):

   D(x_i, c_j) = sqrt( Σ_{i=1}^{n} (x_i − c_j)^2 )                                  (1)

4. Group the data based on proximity to the centroid: the smaller the distance value, the closer the data point is to the cluster centroid.
5. Determine the new centroid by taking the average value of the data that are members of the cluster, using Equation (2):

   C_kj = ( Σ_{h=1}^{p} y_hj ) / p ,   y_hj ∈ cluster-k                             (2)

6. Repeat steps 2 to 5; the loop stops once the data positions no longer change.

WP-Rank

The working steps of the WP-Rank method, with U = {u_1, u_2, …, u_g, …, u_l} the set of users and P = {p_1, p_2, …, p_h, …, p_m} the set of products, are:

1) Count the number of equal ratings:

   S(u_g, p_h) = Σ_{k=1}^{n} SR( R(u_g, p_h), R(u_k, p_h) )                          (4)

   SR( R(u_g, p_h), R(u_k, p_h) ) = 1 if R(u_g, p_h) = R(u_k, p_h), and 0 otherwise  (5)

2) Determine the product points:

   P(u_g, p_h) = 1 + Σ_{k=1}^{m} PR(u_g, p_h, k)                                     (6)

   PR(u_g, p_h, k) = 1, if R(u_g, p_h) > R(u_g, p_k);
                     1, if R(u_g, p_h) = R(u_g, p_k) and S(u_g, p_h) > S(u_g, p_k);
                     1, if R(u_g, p_h) = R(u_g, p_k), S(u_g, p_h) = S(u_g, p_k), and h < k;
                     0, otherwise                                                    (7)

   The product point P(u_g, p_h) is obtained from the value 1 plus the points calculated with Equation (7).

3) Calculate the weight point:

   WP(u_g, p_h) = ( S(u_g, p_h) + R(u_g, p_h) ) · P(u_g, p_h)                        (8)

4) Calculate the Weight Point Rank (WP-Rank):

   WP-Rank(p_h) = Σ_{k=1}^{n} WP(u_k, p_h)                                           (9)

The ranking results from WP-Rank are then cut to the Top-K and recommended to users.
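The WP-Rank steps above can be sketched in Python as follows. The 3×3 rating matrix is hypothetical (0 denotes an empty rating, as produced by the pre-processing stage), and the equality-count and tie-break rules follow the reconstruction of Equations (4)-(9) given here:

```python
# Hypothetical 3-user x 3-item rating matrix; 0 = empty rating.
R = [
    [5, 3, 4],
    [4, 3, 0],
    [5, 3, 4],
]
n_users, n_items = len(R), len(R[0])

def S(g, h):
    """Eq. (4)-(5): how many users gave item h the same rating as user g."""
    return sum(1 for k in range(n_users) if R[k][h] == R[g][h])

def P(g, h):
    """Eq. (6)-(7): 1 plus one point for every item k that item h beats
    on rating, then on S, with the item index as the final tie-break."""
    points = 1
    for k in range(n_items):
        if R[g][h] > R[g][k]:
            points += 1
        elif R[g][h] == R[g][k]:
            if S(g, h) > S(g, k) or (S(g, h) == S(g, k) and h < k):
                points += 1
    return points

def WP(g, h):
    """Eq. (8): weight point of item h for user g."""
    return (S(g, h) + R[g][h]) * P(g, h)

# Eq. (9): aggregate weight points over users, then rank items (best first).
wp_rank = {h: sum(WP(g, h) for g in range(n_users)) for h in range(n_items)}
top_k = sorted(wp_rank, key=wp_rank.get, reverse=True)
```

On this toy matrix the heavily and consistently rated first item aggregates the largest weight, so it heads the Top-K list.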


Evaluation

a. Davies Bouldin Index
The Davies Bouldin Index (DBI) was used to measure cluster quality by maximizing the distance between clusters while minimizing the distance between points within a cluster. The DBI value indicates the quality of the cluster: the smaller the DBI value, the better the value of "k", and this is the optimality criterion for the number of clusters [12]. DBI is one of the methods used for internal evaluation of the clusters generated by a clustering algorithm; a smaller DBI value indicates that the number of clusters formed is better. DBI maximizes the distance from one cluster to another and minimizes the distance between points within a cluster. If the similarity of characteristics within each cluster is smaller, there are differences between clusters, so the maximum inter-cluster distance is obtained; if the intra-cluster distance is minimal, the level of similarity of cluster characteristics is high [13].

b. NDCG
Normalized Discounted Cumulative Gain (NDCG) was the method used to evaluate the quality of recommendations in the form of rankings [14][15][16]. NDCG measures the performance of the recommendation system by looking at the relevance value of each entity [17]. The ranking quality was evaluated using DCG, which evaluates the top products of the ranking results [18][8]. The NDCG equations were written as Equation (10) and Equation (11):

   DCG_p = Σ_{i=1}^{p} (2^{rel_i} − 1) / log2(i + 1)                (10)

   NDCG_p = DCG_p / IDCG_p                                          (11)

c. Running Time
Several studies have evaluated running time to assess an algorithm's performance. There are several ways to do this, namely by measuring real time or by calculating time complexity. When the algorithm is executed, the time taken is recorded, generally measured in seconds [19]. The environment and the complexity of the algorithm greatly affect the results of the running-time evaluation. This experiment used MATLAB R2014a software on Intel Core i7 hardware with a 1 TB hard disk and 8 GB of RAM; these specifications were then used to evaluate the running time of the WP-Rank method. In addition to measuring real time, running time was assessed by calculating the time complexity T(n) [15]. The algorithm is run with a number of computational steps according to the input size n so that T(n) is obtained. Using the time complexity of the algorithm, it is possible to determine the rate of increase in the time required as the input size n (the amount of data processed by the algorithm) grows.

3. Results and Discussions

This study used the MovieLens 100k data, which was clustered using the K-Means algorithm. The cluster results were then processed using the WP-Rank method to produce a product (movie) ranking as the basis for recommendations to the user, in the form of a list of movies according to their specialization.

Experiments were carried out by clustering the dataset based on demographic data, especially user age, using the K-Means algorithm. The number of clusters "k" ranged from 2 to 25 and was evaluated using the Davies Bouldin Index to find the most optimal number of clusters. The experiments used RapidMiner to build a model with K-Means clustering and the Davies Bouldin Index, as shown in Figure 2; the evaluation results are shown in Table 5.

Table 5. The Result of Davies Bouldin Index

k   Davies Bouldin Value   k   Davies Bouldin Value
2   2.980                  14  3.163
3   3.402                  15  3.061
4   3.466                  16  3.111
5   3.936                  17  2.379
6   3.291                  18  2.825
7   3.831                  19  2.728
8   3.133                  20  2.829
9   3.621                  21  2.921
10  3.795                  22  2.216
11  3.392                  23  2.262
12  3.193                  24  2.754
13  2.809                  25  2.646

Table 5 shows the results of the evaluation using the Davies Bouldin Index: the largest value was 3.936 at k=5, while the smallest value was 2.216 at k=22. A smaller Davies Bouldin Index value indicates a better (more optimal) number of clusters in the experiment, meaning that the best number of clusters was k=22.

Figure 2. Model Clustering K-Means and Performance Evaluation

Figure 2 shows the model for the clustering process using the K-Means algorithm and evaluation using the Davies Bouldin Index. The value of "k" was varied from k=2 to k=25 to determine the effect of the number of clusters on the Davies Bouldin Index results. The evaluation found a significant change in value for cluster counts 2, 3, and 4, where the average Davies Bouldin Index value was above 3; for cluster counts 5-25, the average value was above 2. The most optimal number of clusters in this experiment was 22, with the smallest value of 2.216.

Furthermore, this study evaluated the quality of the ranking generated by the WP-Rank method for the formed clusters using NDCG. The evaluation results are shown in Figure 3.

Figure 3. The Evaluation of NDCG on WP-Rank Implementation

Figure 3 shows the results of the NDCG evaluation of the ranking quality generated by the WP-Rank method, based on the number of clusters formed by the K-Means algorithm. In this experiment, "k" was set to 2-25 clusters, and samples were taken at k=2, k=5, k=10, k=15, k=20, k=22, and k=25. The experimental results showed an increase in the NDCG value from NDCG 1-10 at k=2 up to k=22, with average NDCG values of 0.63, 0.68, 0.78, 0.76, 0.81, and 0.82, respectively. Meanwhile, at k=25 there was a decrease in the average NDCG value, to 0.79. There was a significant difference at k=2 and k=5 compared to k=22, namely 0.19 and 0.14; for k=10, k=15, and k=20, the differences were 0.04, 0.06, and 0.01. The decrease at k=25 was not too significant, namely 0.02. The highest average NDCG was at k=22, because k=22 was the most optimal number of clusters according to the Davies Bouldin Index evaluation. This shows that the optimal number of clusters affects the quality of the recommendations, as indicated by a better NDCG value.

The next evaluation was running time, measuring the real time the process takes to execute the input data and produce a ranking. The results of the running-time evaluation are shown in Figure 4.
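The NDCG ranking-quality metric reported in Figure 3 can be sketched as follows, using the exponential gain form of DCG given in the evaluation section. The relevance scores below are hypothetical; IDCG is the DCG of the ideal (descending) ordering:

```python
import math

def dcg(rels):
    """DCG with the (2^rel - 1) / log2(i + 1) gain, positions i = 1..p."""
    return sum((2 ** rel - 1) / math.log2(i + 1) for i, rel in enumerate(rels, start=1))

def ndcg(rels):
    """NDCG: DCG of the produced ranking divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0

# Hypothetical relevance of a recommended movie list, read top to bottom.
print(round(ndcg([3, 2, 3, 0, 1]), 3))   # -> 0.957
```

A perfectly ordered list (relevance already descending) yields NDCG = 1.0, which is why values approaching 1 indicate higher-quality rankings.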


0.164, and 0.110 seconds, respectively. There was a significant decrease in the execution time required between k=2 and k=5, with a difference of 1.42 seconds. Between k=5 and k=10 there was also a decrease in execution time, namely 0.27 seconds, but for k=15, k=20, k=22, and k=25 the execution time was relatively stable, at 0.15 seconds on average. Based on this, it is concluded that the implementation of the K-Means clustering algorithm was able to overcome the scalability problem, which is one of the problems faced by Collaborative Filtering: the K-Means algorithm partitions the data so that the execution process is faster and produces recommendations with better quality.

Figure 4. The Result of Running Time Evaluation

4. Conclusion

This study implemented the K-Means algorithm and the WP-Rank method to overcome the scalability and sparsity problems. Based on the experiments, it is concluded that:

1. The scalability problem in Collaborative Filtering can be overcome by implementing the K-Means algorithm, namely by partitioning the data into several clusters. The experimental results showed that the optimal number of clusters was k=22, as indicated by the Davies Bouldin Index evaluation with the smallest value of 2.216. In addition, the running time was faster, with an average execution time of 0.15 seconds.
2. The combination of the clustering and ranking-based approaches of the WP-Rank method overcomes the sparsity problem, because clustering reduces the dimensions of the rating matrix and the empty data. Furthermore, the aggregation process of the WP-Rank method produces quality recommendations, as indicated by the average NDCG value of 0.82 at k=22.

Further study will compare K-Means with other clustering algorithms to determine their effect on the quality of recommendations.

Acknowledgment

This study was supported by research grants organized by the Directorate of Research, Technology, and Community Service (DRTPM), Master Thesis Research Scheme (PTM), with number 1367/LL2/PG/2022.

Reference

[1] K. C. Jena, S. Mishra, S. Sahoo, and B. K. Mishra, "Principles, Techniques and Evaluation of Recommendation Systems," in International Conference on Inventive Systems and Control (ICISC), 2017, pp. 1-6. https://doi.org/10.1109/ICISC.2017.8068649
[2] J. Das, M. Banerjee, and S. Majumder, "Scalable Recommendations using Clustering based Collaborative Filtering," in International Conference on Information Technology (ICIT), 2019, pp. 1-6. https://doi.org/10.1109/ICIT48102.2019.00056
[3] L. Wang, X. Zhang, T. Wang, S. Wan, and G. Srivastava, "Diversified and Scalable Service Recommendation With Accuracy Guarantee," IEEE Trans. Comput. Soc. Syst., pp. 1-12, 2020. https://doi.org/10.1109/TCSS.2020.3007812
[4] Z. Zhao, Y. Sheng, M. Zhu, and A. J. Wang, "A Memory-Efficient Approach to the Scalability of Recommender System With Hit Improvement," IEEE Access, vol. 6, pp. 67070-67081, 2018. https://doi.org/10.1109/ACCESS.2018.2878808
[5] N. Ifada, "Employing Sparsity Removal Approach and Fuzzy C-Means Clustering Technique on a Movie Recommendation System," in International Conference on Computer Engineering, Network and Intelligent Multimedia (CENIM), 2018, pp. 329-334. https://doi.org/10.1109/CENIM.2018.8711270
[6] D. Andra and A. Baizal, "E-commerce Recommender System Using PCA and K-Means Clustering," J. RESTI (Rekayasa Sist. dan Teknol. Inf.), vol. 6, no. 1, pp. 57-63, 2022. https://doi.org/10.29207/resti.v6i1.3782
[7] M. I. Ardimansyah, A. F. Huda, and Z. K. A. Baizal, "Preprocessing Matrix Factorization for Solving Data Sparsity on Memory-Based Collaborative Filtering," in 3rd International Conference on Science in Information Technology (ICSITech), 2017, pp. 521-525. https://doi.org/10.1109/ICSITech.2017.8257168
[8] S. Lestari, T. B. Adji, and A. E. Permanasari, "NRF: Normalized Rating Frequency for Collaborative Filtering," in 2018 International Conference on Applied Information Technology and Innovation (ICAITI), 2018, pp. 19-25. https://doi.org/10.1109/ICAITI.2018.8686743
[9] S. Lestari, T. B. Adji, and A. E. Permanasari, "WP-Rank: Rank Aggregation based Collaborative Filtering Method in Recommender System," Int. J. Eng. Technol., vol. 7, pp. 193-197, 2018. http://dx.doi.org/10.14419/ijet.v7i4.40.24431
[10] S. Lestari, R. Kurniawan, and D. Linda, "Porat Rank to Improve Performance Recommendation System," in Proceedings of the 1st International Conference on Electronics, Biomedical Engineering, and Health Informatics, Lecture Notes in Electrical Engineering 746, 2021, pp. 1-14. https://doi.org/10.1007/978-981-33-6926-9_1
[11] M. Jumarlis and Mirfan, "Detecting Diseases on Clove Leaves Using GLCM and Clustering K-Means," J. RESTI (Rekayasa Sist. dan Teknol. Informasi), vol. 6, no. 4, pp. 624-631, 2022. https://doi.org/10.29207/resti.v6i4.4033
[12] A. K. Singh, S. Mittal, Y. V. Srivastava, and P. Malhotra, "Clustering Evaluation by Davies-Bouldin Index (DBI) in Cereal Data using K-Means," in International Conference on Computing Methodologies and Communication (ICCMC), 2020, pp. 306-310. https://doi.org/10.1109/ICCMC48092.2020.ICCMC-00057
[13] H. Santoso and H. Magdalena, "Improved K-Means Algorithm on Home Industry Data Clustering in the Province of Bangka Belitung," in International Conference on Smart Technology and Applications (ICoSTA), 2020, pp. 1-6. https://doi.org/10.1109/ICoSTA48221.2020.1570598913
[14] D. Luo and N. J. Yuan, "Representation Learning with Pair-wise Constraints for Collaborative Ranking," in Tenth ACM International Conference on Web Search and Data Mining, 2017, pp. 567-575. https://doi.org/10.1145/3018661.3018720
[15] B. Shams and S. Haratizadeh, "IteRank: An iterative network-oriented approach to neighbor-based collaborative ranking," Knowledge-Based Syst., vol. 128, pp. 102-114, 2017. https://doi.org/10.1016/j.knosys.2017.05.002
[16] B. Shams and S. Haratizadeh, "Item-based collaborative ranking," Knowledge-Based Syst., vol. 152, pp. 172-185, 2018. https://doi.org/10.1016/j.knosys.2018.04.012
[17] N. Ifada, T. F. Rahman, and M. K. Sophan, "Comparing Collaborative Filtering and Hybrid based Approaches for Movie Recommendation," in Information Technology International Seminar (ITIS), 2020, pp. 219-223. https://doi.org/10.1109/ITIS50118.2020.9321014
[18] L. Niu, Y. Peng, and Y. Liu, "Deep Recommendation Model Combining Long- and Short-Term Interest Preferences," IEEE Access, vol. 9, pp. 166455-166464, 2021. https://doi.org/10.1109/ACCESS.2021.3135983
[19] J. Chen, H. Wang, and Z. Yan, "Evolutionary heterogeneous clustering for rating prediction based on user collaborative filtering," Swarm Evol. Comput., vol. 38, pp. 35-41, 2018. https://doi.org/10.1016/j.swevo.2017.05.008
