Bi Et Big Data
PII: S0957-4174(18)30309-9
DOI: 10.1016/j.eswa.2018.05.018
Reference: ESWA 11971
Please cite this article as: Ting-Peng Liang, Yu-Hsi Liu, Research Landscape of Business Intelligence
and Big Data Analytics: A Bibliometrics Study, Expert Systems With Applications (2018),
doi: 10.1016/j.eswa.2018.05.018
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service
to our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please
note that during the production process errors may be discovered which could affect the content, and
all legal disclaimers that apply to the journal pertain.
ACCEPTED MANUSCRIPT
Ting-Peng Liang1
Department of Information Management
National Sun Yat-Sen University
Kaohsiung, Taiwan
Email: [email protected]
Yu-Hsi Liu2
Institute of Economics
Academia Sinica
Taipei, Taiwan
Email: [email protected]
1 Corresponding author; National Chair Professor and Director of Electronic Commerce Research
Center, National Sun Yat-Sen University, No. 70, Lianhai Road, Gushan District, Kaohsiung City,
Taiwan 80444. [email protected]. 886-07-5252000 ext. 4781
2 Postdoctoral Fellow, Institute of Economics, Academia Sinica, No. 128, Section 2, Academia Rd,
Nangang District, Taipei City, Taiwan 11529. [email protected]
Abstract
publications from 1990 to 2017 in journals indexed in Science Citation Index
Expanded (SCIE), Social Science Citation Index (SSCI) and Arts & Humanities
Citation Index (AHCI). We map the time trend, disciplinary distribution,
high-frequency keywords to show emerging topics. The findings indicate that
computer science and management information systems are the two core disciplines that
drive research associated with Big Data and Business Intelligence. “Data Mining”,
“Social Media” and “Information System” are high-frequency keywords, but “Cloud
Computing”, “Data Warehouse” and “Knowledge Management” are more emphasized
after 2016.
1. Introduction
while the academic research related to Big Data and Business Intelligence has thrived.
The number of research papers is increasing very fast. Research topics range from
concepts and methodologies to applications and management. Hence, it is valuable to
provide an overview of the published research so that interested scholars can easily
grasp the research profile to date.
For this purpose, we conducted a bibliometric study to examine the academic
research output related to “Big Data” and “Business Intelligence” and analyzed
publication data obtained from Web of Science, which includes papers indexed in
Science Citation Index Expanded (SCIE), Social Science Citation Index (SSCI), Arts
& Humanities Citation Index (AHCI), and Emerging Sources Citation Index (ESCI).
The data period is from 1990 to December 31, 2017. Indexed publications with the
keywords “Big Data” or “Business Intelligence” in their title, abstract or subject were
retrieved and analyzed. Findings are then presented.
2. Research Background
Both “Big Data” (BD) and “Business Intelligence” (BI) are fast-growing keywords in
recent academic research. While “Big Data” has become popular only recently, “Business
Intelligence” was proposed much earlier. Luhn (1958) began to use the term
“Business Intelligence” to describe an automatic system that disseminates information
and supports the decision-making process. The concept was later assimilated into the area
of decision support and information systems. For instance, Vitt et al. (2002) defined
business intelligence as a multifaceted concept that includes three different
perspectives: making better decisions faster, converting data into information, and
using a rational approach to management (p. 13). They define a BI cycle that includes
four phases: analysis, insight, action, and performance measurement. Turban et al.
(2005) further expanded BI to cover data warehouse, data acquisition, data mining,
business analytics, and visualization.
The term “Big Data” was not mentioned until 2011. Berry (2011) first proposed the
significance of “Big Data” to management in an academic publication. At the same
time, McKinsey & Company (Manyika et al., 2011) argued that the technology
and platform of “Big Data” had become a vital factor in enhancing a firm’s
productivity and competitiveness. After these two seminal works, publications on
“Big Data” have grown exponentially.
data collection, storage and analytics, while BI focuses more on data analysis,
visualization and applications for business decision making. Previous research in
these areas has significant overlap. For example, Tanev et al. (2015) applied web
search and online data reduction techniques to assess the value of product-enabled
services. He et al. (2015) analyzed social media data to obtain competitive
intelligence. Griva et al. (2018) analyzed market basket data to segment customers. In
addition to marketing research, Moro et al. (2015) summarized applications of text
mining for business intelligence in banking. Sun et al. (2014) examined business
intelligence in real estate. Chung (2009) studied the effect of visualization in
business intelligence. Brichni et al. (2017) proposed a method to evaluate business
intelligence systems. The above sample papers show the diversity of BD and BI research in
recent years. They also indicate the value of providing a more comprehensive snapshot
of research related to BD and BI.
3. Research Methodology
In order to have a more comprehensive profile of BD and BI, we built our data set
citation patterns and the topic evolutions of the related academic outputs. CiteSpace
and VOSviewer were used to conduct the bibliometric study.
publication trend, knowledge base, citation pattern, author network, reader usage,
impact and importance of a subject or a paper.
How have the academic outputs related to “Big Data” and “Business Intelligence”
grown and evolved in the last decade?
How do research topics change and evolve in these academic outputs?
Which discipline drives the related research?
Who are the major contributors to these outputs? Which paper is the most
influential?
What are the most-cited references of these outputs?
Our search using “Big Data” and “Business Intelligence” as keywords produced a
database that includes 10,637 publications associated with “Big Data” and 1,168
publications associated with “Business Intelligence.” Among these documents, 141
publications contain both “Big Data” and “Business Intelligence.”
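
As a quick check on how these three counts relate, the following minimal sketch applies inclusion-exclusion to the figures reported above. It assumes that the 141 joint publications are counted in both the BD and the BI totals, which the text implies but does not state outright; the variable names are illustrative only.

```python
# Counts reported in the text.
big_data = 10_637        # publications matching "Big Data"
business_intel = 1_168   # publications matching "Business Intelligence"
both = 141               # publications matching both keywords

# Assumption: the 141 joint records are included in each total,
# so they are subtracted once to avoid double counting.
distinct_total = big_data + business_intel - both
bd_only = big_data - both
bi_only = business_intel - both

print(distinct_total, bd_only, bi_only)
```

Under that assumption the joint publications make up only a small share of either group, which is consistent with the separate treatment of “BD&BI” papers in the analysis.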
The first analysis concerns the publication trend. Figure 1 shows the time trend of “Big Data”
and of “Business Intelligence.” Fewer than 38 academic outputs on “Big Data” were
found until 2011. The number increased to 92 in 2012 and multiplied very quickly
afterward. In 2016 alone, the number of BD publications went up to 3,287.
In contrast to “Big Data”, the number of BI publications stayed relatively stable over
the years. Research on “Business Intelligence” started long before 2012 and
increased to 48 publications in 2008, much higher than that of BD. However, the number only
We also examine the 141 papers that include both “BD” and “BI” as keywords.
Figure 2 shows the time trend of BD&BI publications. The number of these
publications increased significantly to 32 in 2015 and continued to grow, but it is still
not comparable to that of BD papers. The reason behind the small number of publications
could be that although the applications of BI and BD usually overlap, most papers
choose to present their major orientation as either technical or managerial. Another
possibility is that big data is very much a buzzword that has also been used extensively in
commercial outlets, while business intelligence is more restricted to certain
[Chart: number of BD and BI papers by published year, 2008 to 2017]
Figure 1: Time Trend of BD and BI Research
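
A time trend such as the one in Figure 1 comes from counting records by publication year. The sketch below illustrates the idea under stated assumptions: the record layout (a dictionary with a `year` field) and the function name `yearly_counts` are hypothetical, and the sample records are invented for illustration, not taken from the study's data.

```python
from collections import Counter


def yearly_counts(records):
    """Return {year: number of records}, ordered by year."""
    counts = Counter(rec["year"] for rec in records)
    return dict(sorted(counts.items()))


# Hypothetical records for illustration only; not the study's data.
sample = [
    {"title": "A", "year": 2012},
    {"title": "B", "year": 2013},
    {"title": "C", "year": 2013},
    {"title": "D", "year": 2011},
]

trend = yearly_counts(sample)
print(trend)
```

Running the same count separately on the BD and BI record sets would give the two series that a chart of this kind compares.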
[Chart: The Time Trend of Related Research, “Big Data” & “Business Intelligence”; y-axis: the number of papers]
Table 1 presents the statistics of document types in the data set. Our data set includes
articles, editorial materials and book reviews. Among the 10,637
publications on “Big Data”, 77.52% are articles, 10.63% are editorial materials, and
6.58% are reviews. Among the 1144 publications on “Business Intelligence”, 89.97%
are articles. For “BD&BI” research, 85.1% are articles. In Table 1, the summation of
the percentages may exceed 100% and the record count may exceed the total number
Document type       Big Data            Business Intelligence   BD&BI
                    Count   Percent     Count   Percent         Count   Percent
Editorial material  1156    10.63%      35      2.95%           8       9.22%
Review              716     6.58%       35      2.95%           13      5.67%
Book review         98      0.90%       17      1.43%           -       -
Meeting abstract    309     2.84%       18      1.52%           -       -
Proceedings paper   194     1.78%       71      5.99%           3       2.13%
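
The remark that the percentages in Table 1 may sum to more than 100% can be illustrated with a small sketch. One plausible reading, stated here as an assumption because the explanation in the text is cut off, is that a single record can carry more than one document type. The record layout and the function name `type_shares` are hypothetical, and the sample is invented.

```python
from collections import Counter


def type_shares(records):
    """Percentage of records that carry each document type."""
    total = len(records)
    counts = Counter(t for rec in records for t in rec["types"])
    return {t: round(100 * n / total, 2) for t, n in counts.items()}


# Hypothetical records; record 2 carries two document types (assumption).
sample = [
    {"id": 1, "types": ["Article"]},
    {"id": 2, "types": ["Article", "Proceedings paper"]},
    {"id": 3, "types": ["Review"]},
    {"id": 4, "types": ["Article"]},
]

shares = type_shares(sample)
print(shares, sum(shares.values()))
```

Because record 2 is counted under two types, the type counts add up to five for four records and the shares add up to more than 100%.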
5. Major Keywords and Topics
Table 2 summarizes the high-frequency keywords of the “Big Data” and “Business
Although a few keywords such as “data mining”, “social media” and “management”
overlap, we see a significant discrepancy between these two groups of research.
more application-oriented.
Big Data keyword    Count   Business Intelligence keyword   Count
performance         262     information systems             44
social media        255     design                          40
privacy             240     olap                            39
internet            233     web                             38
surveillance        215     decision support system         37
data analytics      205     competitive intelligence        29
hadoop              186     information technology          28
prediction          181     business analytics              27
optimization        176     decision making                 26
internet of things  167     design science                  26
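
Keyword frequencies like those in Table 2, and the links drawn between keywords in a keyword map, can both be derived from the keyword lists attached to each paper. The sketch below is a simplified illustration, not the procedure used by CiteSpace or VOSviewer: it counts how often each keyword appears and how often two keywords appear in the same paper (plain co-occurrence). The function name `keyword_stats` and the sample lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_stats(papers):
    """Return (keyword frequency, keyword-pair co-occurrence) counters."""
    freq = Counter()
    pairs = Counter()
    for keywords in papers:
        # Normalize case and drop duplicates within a single paper.
        unique = sorted({k.strip().lower() for k in keywords})
        freq.update(unique)
        # Each unordered pair is stored in alphabetical order.
        pairs.update(combinations(unique, 2))
    return freq, pairs


# Hypothetical keyword lists, one per paper.
sample = [
    ["Big Data", "Hadoop", "Privacy"],
    ["big data", "privacy"],
    ["Business Intelligence", "OLAP", "Big Data"],
]

freq, pairs = keyword_stats(sample)
print(freq.most_common(2))
print(pairs[("big data", "privacy")])
```

Sorting each paper's keywords before pairing keeps every pair under a single key, so the pair counts can be read directly as link weights.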
determined by the cluster to which the keyword belongs. Lines between keywords
indicate the strongest co-citation links between keywords. “Big Data” is the center of
the cloud since it is the search key. Consistent with Table 2, “model”, “algorithm”,
belong to five different main areas. The red cluster is formed by healthcare
publications, and the green cluster belongs to the computer science area. “Business
Intelligence” is at the top front and belongs to the yellow cluster. The yellow cluster
Figure 4 Evolution of top keywords in “BD&BI” publications
Given the popularity of BD and BI, many special issues on these topics have been
published. The publication of special issues may be another way to observe research
evolution, as special issues show the interests of academic journals. Figure 3 shows the
journal special issues that we found. MIS Quarterly (2012) published the first Special
utilize text analysis, web analysis, network analysis in finance, bank, customer service
and other industries. Most other special issues were published in or after 2016. For
instance, IEEE Transactions on Cybernetics (2016) published a special issue on “Risk
Intelligence in Big Data Era”. This special issue highlights how to build an effective
data-oriented risk analytics system (Wu et al., 2016). Most recently, in December 2017,
Information Systems Frontiers (Huang et al., 2017) also published a special issue that
placed more emphasis on cloud-based issues such as cloud storage and cloud computing.
This trend toward cloud-based issues echoes the evolution of keywords over time.
Expert Systems with Applications publishes this special issue on big data analytics for
business intelligence in 2018. A special issue focused on the strategic value of big
data analytics will appear in the Journal of Management Information Systems in 2018.
MIS Quarterly 2016
networked business
Information Systems Frontiers 2017: Big Data Analytics and Business Intelligence in Industry
Journal of Knowledge Management 2017: Does Big Data Mean Big Knowledge? Knowledge
Systems
Intelligence
publications.
The keyword coded as "0" is "cardiovascular disease", the earliest subject, which appears
in papers published before 2013. It shows that medical issues are where
BD started in the social sciences. It also echoes the fact that health care service is the
most important field for BD applications. This is probably due to the availability of
healthcare data from insurance companies and hospitals.
Among the ten topics in Figure 5, the topic of “agenda setting” (#4)
may have come to an end. Research on agenda setting was documented as early as
2007 and became popular in 2009. In 2014, there were still many papers referring to
"agenda setting" (the circle size reflects the popularity of a publication), but almost none
was cited after 2015, which suggests that studies involving "agenda setting" in BD
have run their course.
Figure 5 Timeline View of the “Big Data” Cited Network
Figure 5 also shows that "business failure" (#2), “online user review" (#7) and
"supply chain" (#3) are areas that have attracted continuous attention. These three
long-lasting keywords are all important topics in the management field. “Business
failure” reached its peak citation period in 2012 and was still appearing in publications as late as
2016. "Supply chain" has been constantly discussed and cited in every time period
and is one of the few items still being investigated in 2017 and 2018, though the
number of relevant papers is smaller than for other subjects. “Online user review” has
fewer citations than “business failure” and “supply chain”, but was still cited and
discussed from the beginning to 2016.
7. Disciplinary Distribution and Major Journals
Another issue worth examining is the set of disciplines involved in BD and BI. We use
special issues published by research journals as our evidence. Table 4 summarizes the
academic fields of eight special issues on BD and BI. Three journals fall into the
Computer Science field, and the others are related to Information Science and
Management. This implies that Computer Science has been the core discipline that
drives the research on BD and BI, while information science and management are also
important disciplines.
Name of the Journal                        Publish Date  Academic Fields
MIS Quarterly                              2012, 2016    Computer science, Information systems, Management
IEEE Transactions on Cybernetics           2016          Computer science, Cybernetics, Artificial intelligence
Information Technology & Tourism           2016          Information Systems, Tourism
Journal of Marketing Management            2016          Management, Marketing
Information Systems Frontiers              2017          Computer science, Information systems, Theory & methods
Journal of Knowledge Management            2017          Information science & Library science, Management
Expert Systems with Applications           2018          Computer science, artificial intelligence, Engineering, Operations research & management science
Journal of Management Information Systems  2018          Information systems, Management
Table 5 presents, in descending order, the top 10 journals that published the highest number of BD and BI
papers. These two groups of journals overlap, but their top lists are quite different.
Whereas BI papers were mainly published in computer science, information systems and
management journals, BD papers were published in more diversified journals that emphasized
interdisciplinary applications. Three journals are specific to BD and claim to be
multidisciplinary. To sum up, journals that publish BI research are more related to
management fields, while journals that publish BD research are broader.
Table 5 Major Journals Which Published Most Big Data and Business
Intelligence Research

Big Data
Journal                     Counts  Associated Fields
IEEE Access                 124     Computer Science, Information Systems; Electrical & Electronic Engineering
Neurocomputing              75      Computer science-artificial intelligence
Agro FOOD Industry Hi Tech  69      Biotechnology & Applied Microbiology; Food science & technology
Journal of Supercomputing   64      Computer science-hardware & architecture, theory & methods; Engineering-electrical & electronic
Information Sciences        55      Computer science-information systems

Business Intelligence
Journal                                               Counts  Associated Fields
Expert Systems with Applications                      38      Computer science-artificial intelligence, Engineering, Operations research & Management science
... Systems                                           35      ... systems, Operations research & management science
Knowledge Based Systems                               14      Computer science-artificial intelligence
Industrial Management & Data Systems                  13      Computer science- Engineering-industrial
International Journal of Data Warehousing and Mining  13      Computer science-software engineering
Information Systems Frontiers                         12      Computer science-information systems
8. Major Authors and Influential Publications
Our dataset allows us to identify the most influential authors and most cited papers among
these 141 BD & BI publications. Table 6 lists the publications with the most citations
and the highest centrality in the academic networks. “Citations” is the frequency of being cited
in the whole data bank, while “Links” is the frequency of being linked among the 141
BD-BI publications. Both Citations and Links measure publication importance and
author influence.
Publication                                   Citations (in WOS Data Bank)  Links (Among the 141 Publications)
Chen et al. (2012)                            634                           50
Wang et al. (2015)                            162                           9
Tien, J. M. (2013)                            41                            4
Chang, Y. W., Hsu, P. Y., & Wu, Z. Y. (2015)  29                            3
Freire et al. (2016)                          46                            0
He et al. (2015)                              25                            2
Fuchs et al. (2014)                           24                            2
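
The distinction between the two columns of Table 6 can be sketched as follows. This is a minimal illustration under an assumption: “Links” is read here as the number of times a paper is cited by other papers inside the subset, which may differ from how the bibliometric tools define a link. The graph, the paper identifiers, and the function names are hypothetical.

```python
def global_citations(citations):
    """Count how many papers in the whole graph cite each paper."""
    totals = {}
    for cited_list in citations.values():
        for cited in set(cited_list):
            totals[cited] = totals.get(cited, 0) + 1
    return totals


def local_links(citations, subset):
    """Count, for each subset paper, the other subset papers that cite it."""
    subset = set(subset)
    links = {paper: 0 for paper in subset}
    for citing, cited_list in citations.items():
        if citing not in subset:
            continue  # citations from outside the subset are ignored
        for cited in set(cited_list):
            if cited in subset and cited != citing:
                links[cited] += 1
    return links


# Hypothetical citation graph: each key cites the papers in its list.
graph = {
    "P1": [],
    "P2": ["P1"],
    "P3": ["P1", "P2"],
    "X9": ["P1"],  # a paper outside the subset
}
subset = ["P1", "P2", "P3"]

cited_overall = global_citations(graph)
linked_within = local_links(graph, subset)
print(cited_overall["P1"], linked_within["P1"])
```

A paper can therefore be widely cited overall yet have few or no links within the subset, as the row for Freire et al. (2016) in Table 6 shows.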
Table 6 shows that, among these 141 BD & BI publications, Chen et al. (2012) is the
most influential paper, with 634 citations and 50 links. This is
because it is the position paper of the first special issue published in MIS Quarterly.
The second most influential publication is Gandomi et al. (2015), followed by Tien
(2013) and Chang et al. (2015). These are important literature and the knowledge
base for later BD & BI research.
(2014) are also significant nodes with high popularity. This supports the previous
argument that these three papers are important to BD&BI literature.
Given the profile indicated in the previous analysis, we are able to identify a few key
directions for future research. Figure 7 shows a general framework that divides
research topics into four dimensions: technology, applications, management, and
impact. Within each dimension, many possible topics need to be further explored.

The technology dimension, for instance, includes issues related to data collection,
storage, analytics, and integration infrastructure. For example, sentiment analysis
needs to collect and analyze textual data properly. Technologies for parsing the
collected textual data properly and for defining positive or negative emotion are also key
research issues.
Application issues are those associated with applying certain technology to a specific
domain. For instance, business applications are oriented toward profit making, while
medical applications may focus more on accuracy or calculation efficiency. Risks
involved in different application domains may be important too. For example, a
marketing plan derived from inaccurate segmentation of customers may cause
monetary loss, but a prescription based on a wrong diagnosis may result in loss of life,
which is totally unacceptable. Hence, researchers need to take concerns unique to
application domains into consideration when they conduct BD/BI research.
Management issues include factors that affect the adoption of BD/BI technology, the
cost-benefit assessment when the technology is to be adopted, security and privacy
issues involved in BD/BI, and organizational readiness (e.g., human resources) for
adopting BD/BI. A number of theories related to the adoption of information
technologies are available. They are helpful in investigating why BD/BI is adopted or
not adopted. Security and privacy issues are big concerns as well from the
management perspective.
The impact of BD/BI is another dimension that has not yet been thoroughly studied.
Most previous research focuses on the positive side of BD/BI in order to promote the
technology, but has yet to prove the value creation from BD/BI or the avoidance of
negative impact. Most existing case reports are based on anecdotal evidence. We need
more large-scale research to verify the value of BD/BI, both strategic and managerial.
We also need research to investigate the impact (both positive and negative) of
BD/BI on individual life, organizational operations, and social activities. For instance,
how might location or traffic data of users collected from mobile apps (e.g., Google
Maps) enhance the safety (or police beats) of a community with minimum invasion
of individual privacy?
This paper reports results from a bibliometric analysis of published academic papers
associated with “Big Data” and “Business Intelligence”. Using CiteSpace, VOSviewer
and descriptive statistics, we analyzed publication data from 1990 to 2017 in journals
indexed in Science Citation Index, Social Science Citation Index and Arts &
Humanities Citation Index. A total of 10,637 publications with “Big Data” as a keyword
and 1,168 publications with “Business Intelligence” as a keyword were
identified and analyzed. The time trend, disciplinary distribution, high-frequency
keywords and topic evolution of these academic outputs have been reported.
Several major findings emerge. First, although “Business Intelligence”
emerged long before “Big Data” and has grown steadily, its growth rate is below that
of BD publications, which have increased explosively since 2013. This reflects the huge
interest in BD research in the past five years.
Third, topics of interest also differ, as shown by the difference in popular keywords.
High-frequency keywords associated with BD research are related to algorithms and
computing, while those associated with BI research are more focused on management
and decision support. In the Big Data literature, the keyword “Business Intelligence” is
directly linked to “management”, “data analytics” and “predictive analytics”, which
shows the nature of BI research.
Fourth, a few papers have been well cited and have become the knowledge core of BD and BI
research. From the citations of the 141 papers with both BD and BI as keywords, we find
Chen et al. (2012) to be the most popular one. Tien (2013) and Fernández et al. (2014) are
also significant nodes. “Data Mining”, “Social Media” and “Information System” are
high-frequency keywords, while the keywords “Cloud Computing”, “Data Warehouse” and
“Knowledge Management” emerged in 2016 and 2017.
Finally, a few directions for future research have been proposed. Scholars interested in
BD and BI research may follow the framework shown in Figure 7 of this paper to
position their research. Journal editors may think of what research topics fit journal
Acknowledgements: This research was partially funded by grants to the first author
Intelligent E-Commerce, and Research Institute for Humanities and Social Sciences
of Ministry of Science and Technology.
References
1. Arnott, D., & Pervan, G. (2014). A critical analysis of decision support systems
research revisited: the rise of design science. Journal of Information Technology,
29(4), 269-293.
2. Brichni, M., et al. (2017). BI4BI: A continuous evaluation system for business
intelligence systems. Expert Systems with Applications, 76, 97-112.
3. Chang, Y. W., Hsu, P. Y., & Wu, Z. Y. (2015). Exploring managers' intention to use
business intelligence: the role of motivations. Behaviour & Information
Association for Information Systems, 1(1), 33-53.
8. Cukier, K., & Mayer-Schoenberger, V. (2013). The rise of big data: How it's
changing the way we think about the world. Foreign Aff., 92, 28.
9. Davenport, T. H. (2006). Competing on analytics. Harvard Business Review, 84(1),
98.
10. Davenport, T. H., & Harris, J. G. (2007). Competing on analytics: The new
science of winning. Harvard Business Press.
11. Dean, J., & Ghemawat, S. (2008). MapReduce: simplified data processing on
large clusters. Communications of the ACM, 51(1), 107-113.
12. Fernández, A., del Río, S., López, V., Bawakid, A., del Jesus, M. J., Benítez, J. M.,
& Herrera, F. (2014). Big Data with Cloud Computing: an insight on the
13. Freire, M., Serrano-Laguna, Á., Iglesias, B. M., Martínez-Ortiz, I., Moreno-Ger, P.,
& Fernández-Manjón, B. (2016). Game learning analytics: Learning analytics for
serious games. Learning, design, and technology, 1-29.
14. Fuchs, M., Höpken, W., & Lexhagen, M. (2014). Big data analytics for knowledge
generation in tourism destinations–A case from Sweden. Journal of destination
and customer relationship management. John Wiley & Sons.
22. Luhn, H. P. (1958). A business intelligence system. IBM Journal of Research and
Development, 2(4), 314-319.
23. Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers,
A. H. (2011). Big data: The next frontier for innovation, competition, and
productivity. McKinsey Global Institute.
24. Marine-Roig, E., & Clavé, S. A. (2015). Tourism analytics with massive
user-generated content: A case study of Barcelona. Journal of Destination
Marketing & Management, 4(3), 162-172.
25. McAfee, A., Brynjolfsson, E., & Davenport, T. H. (2012). Big data: the
management revolution. Harvard Business Review, 90(10), 60-68.
26. Sun, D., Du, Y., Xu, W., Zuo, M., Zhang, C., & Zhou, J. (2014). Combining Online
News Articles and Web Search to Predict the Fluctuation of Real Estate Market in
Big Data Context. Pacific Asia Journal of the Association for Information
Systems, 6(4), 19-37.
27. Tien, J. M. (2013). Big data: Unleashing information. Journal of Systems Science