
Accounting Horizons, American Accounting Association
Vol. 29, No. 2, 2015, pp. 423–429
DOI: 10.2308/acch-51068

Big Data Analytics in Financial Statement Audits
Min Cao, Roman Chychyla, and Trevor Stewart

SYNOPSIS: Big Data analytics is the process of inspecting, cleaning, transforming, and
modeling Big Data to discover and communicate useful information and patterns,
suggest conclusions, and support decision making. Big Data has been used for
advanced analytics in many domains but hardly, if at all, by auditors. This article
hypothesizes that Big Data analytics can improve the efficiency and effectiveness of
financial statement audits. We explain how Big Data analytics applied in other domains
might be applied in auditing. We also discuss the characteristics of Big Data analytics,
which set it apart from traditional auditing, and its implications for practical
implementation.
Keywords: Big Data; analytical methods; auditing.

INTRODUCTION

Big Data includes datasets that are too large and complex to manipulate or interrogate with
standard methods or tools. It is characterized by “three Vs”: volume, velocity, and variety
(McAfee and Brynjolfsson 2012). Volume refers to the sheer size of the dataset, velocity to
the speed of data generation, and variety to the multiplicity of data sources; the three Vs tend to be
interrelated.1 Traditional datasets utilized by auditors and academia, such as Compustat, CRSP, and
Audit Analytics, are not Big Data. Big Data is a relatively recent phenomenon, the product of a
technological environment in which almost anything can be recorded, measured, and captured
digitally, and thereby turned into data, a process that Mayer-Schönberger and Cukier (2013) refer
to as “datafication.” Datafication may track thousands of simultaneous events; be performed in real
time; involve a multiplicity of numbers, text, images, sound, and video; and require petabytes
(thousands of terabytes) of storage capacity.2 Examples of Big Data include more than 1 million
customer transactions every hour at Walmart, more than 50 billion photos on Facebook, and 200
gigabytes of astronomical data collected per night.3 Big Data has been used in marketing to target
potential customers, in political campaigning to study voter demographics, in sports to evaluate
teams and players, in national security to identify threats, in biology to study DNA, and in law
enforcement to identify crime suspects (Mayer-Schönberger and Cukier 2013).

Min Cao is an Assistant Professor at Rutgers, The State University of New Jersey, New Brunswick,
Roman Chychyla is a Visiting Assistant Professor at the University of Miami, and Trevor Stewart is
a retired Deloitte partner and a Senior Research Fellow at Rutgers, The State University of New
Jersey.

The authors gratefully acknowledge the advice, help, and comments received from many individuals including
Khrystyna Bochkay, Alexander Kogan, Miklos Vasarhelyi, and seminar participants at Rutgers, The State University of
New Jersey. We also thank the editors, Arnold M. Wright, Paul A. Griffin, and Brad M. Tuttle, as well as two
anonymous reviewers for their helpful and insightful comments.
Submitted: February 2015
Accepted: February 2015
Published Online: February 2015
Corresponding author: Trevor Stewart
Email: [email protected]

1 Some also refer to the four Vs of Big Data, the fourth being “veracity” (Zhang, Yang, and Appelbaum 2015).

Big Data analytics is the process of inspecting, cleaning, transforming, and modeling Big Data
to discover and communicate useful information and patterns, suggest conclusions, and support
decision making. For the purposes of this article, we assume that the auditor focuses on the
transactions, balances, and disclosures that underlie the financial statements and related
management assertions. In the auditing of financial statements in accordance with International
Standards on Auditing (ISAs), numerous potential opportunities arise for Big Data analytics. For
example, the following audit activities are likely to benefit from Big Data analytics:
• Identifying and assessing the risks associated with accepting or continuing an audit
  engagement, for example, the risks of bankruptcy or high-level management fraud.
• Identifying and assessing the risks of material misstatement of the financial statements due to
  fraud, and testing for fraud with regard to the assessed risks (ISA 240, IAASB 2014a).
• Identifying and assessing the risks of material misstatement through understanding the entity
  and its environment (ISA 315, IAASB 2014b). This includes performing preliminary
  analytical procedures, as well as evaluating the design and implementation of internal
  controls and testing their operating effectiveness.
• Performing substantive analytical procedures in response to the auditor’s assessment of the
  risks of material misstatement (ISA 520, IAASB 2014c).
• Performing analytical procedures near the end of the audit to assist the auditor in forming an
  overall conclusion about whether the financial statements are consistent with the auditor’s
  understanding of the entity (ISA 520, IAASB 2014c).
In this article, we hypothesize that a financial statement audit can potentially be improved by
analytical methods that use Big Data. In such audits, the data are transactions and balances that
usually reside in the entity’s ERP and data warehouse systems. These data are not Big Data per se
unless they are accumulated over a significant period of time or are complemented with additional
facts. Therefore, most Big Data opportunities discussed in this paper come from auxiliary data that,
after processing, may reveal matters of audit interest. The Big Data of potential audit interest
includes social media information, surveillance videos, and stock market transaction data.

EXAMPLES OF BIG DATA ANALYTICS


Because there are few, if any, current applications of Big Data analytics in external auditing
(we are aware of none), we describe examples from other disciplines and hypothesize how
similar applications could be implemented in external auditing.

2 See http://archive.wired.com/science/discoveries/magazine/16-07/pb_intro for a good illustration of how much a
petabyte is.
3 When the Sloan Digital Sky Survey (SDSS) began collecting astronomical data in 2000, it amassed more in its first
few weeks than all data collected in the history of astronomy.

Accounting Horizons
June 2015

First, the new availability of voluminous and informative sources of data has resulted in new
approaches to predicting stock market averages. For instance, Bollen, Mao, and Zeng (2011) measure
global public mood based on Twitter data and successfully use it to predict daily fluctuations of the
Dow Jones Industrial Average (DJIA). They use Google’s Profile of Mood States and the
academically developed OpinionFinder (Wilson et al. 2005) tools to generate daily time series of
public mood shifts based on nearly 10 million public tweets posted by approximately 2.7
million users. With these series, the authors predict shifts in the DJIA three to four days
ahead. In addition to social media, news articles are also known to predict movements of stock
prices (Chan 2003; Mittermayer 2004). It is conceivable that similar data sources can be used to
predict bankruptcy or assess the overall financial state of a firm. Such tools might be used to better
identify and evaluate engagement risk and thus reduce litigation risk.
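The idea of a signal leading an outcome by several days can be illustrated with a toy calculation. The sketch below is not the method of Bollen, Mao, and Zeng (2011); it simply computes a Pearson correlation between a daily mood score and an index change observed a given number of days later, and reports the lag with the strongest relationship. The two series are invented for illustration (the index series is deliberately constructed to echo the mood series three days later), and the function names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def lagged_correlation(signal, target, lag):
    """Correlate the signal on day t with the target on day t + lag."""
    if lag == 0:
        return pearson(signal, target)
    return pearson(signal[:-lag], target[lag:])

def best_lag(signal, target, max_lag):
    """Return the lag in 0..max_lag with the largest absolute correlation."""
    return max(range(max_lag + 1),
               key=lambda lag: abs(lagged_correlation(signal, target, lag)))

# Hypothetical daily mood scores (illustrative values, not real data).
MOOD = [0.2, -0.1, 0.4, 0.0, -0.3, 0.5, 0.1, -0.2, 0.3, -0.4, 0.2, 0.0]
# Toy index changes built to repeat the mood score three days later.
INDEX_CHANGE = [0.0, 0.0, 0.0] + MOOD[:-3]

print(best_lag(MOOD, INDEX_CHANGE, max_lag=5))
```

An auditor adapting this idea would substitute an engagement-risk outcome for the index series; a strong lagged correlation would be a prompt for further inquiry rather than evidence in itself.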
Second, demographic and weather data have been used to predict customers’ behavior.
OfficeMax, a large retailer of office supplies, uses LivePredict, a system built by online technology
provider Monetate, to personalize online landing pages based on customers’ demographics.4
Interestingly, this system tries to predict customers’ political views, and adjusts accordingly. The
system uses IP addresses to identify customers’ locations and U.S. census data to create
demographic profiles. In a weather-related application, Walmart analyzed its terabytes of
transactional data to determine that when hurricanes threatened, customers not only bought
additional flashlights but also increased their purchases of strawberry Pop-Tarts (a popular
breakfast snack) sevenfold.5 This and similar findings from Big Data analytics help Walmart to better manage
inventories. Geographical and demographic data have the potential to predict, with reasonable
accuracy, revenues and sales at individual business locations. The resulting estimates may be used as a
benchmark to assess reported sales amounts by location. In addition, peer-based metrics can be utilized
to draw attention to possibly problematic branches. Similar analytics may improve the audit process by focusing
resources on riskier parts of the business.
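The peer-based benchmark described above can be sketched in a few lines. This is a minimal illustration rather than a method taken from the cited sources: it assumes that sales per head of local population is a sensible peer metric, and the branch names, figures, the `flag_branches` function, and the 30 percent tolerance are all hypothetical.

```python
from statistics import median

def flag_branches(branches, tolerance=0.30):
    """Flag branches whose reported sales deviate from a peer benchmark.

    The benchmark for each branch is the median sales-per-capita of the
    other branches multiplied by the branch's own local population.
    """
    per_capita = {b["name"]: b["sales"] / b["population"] for b in branches}
    flagged = []
    for b in branches:
        # Exclude the branch itself so it cannot distort its own benchmark.
        peers = [v for name, v in per_capita.items() if name != b["name"]]
        expected = median(peers) * b["population"]
        deviation = (b["sales"] - expected) / expected
        if abs(deviation) > tolerance:
            flagged.append((b["name"], round(deviation, 2)))
    return flagged

# Hypothetical branch data: local population and reported annual sales.
BRANCHES = [
    {"name": "Branch A", "population": 100_000, "sales": 2_000_000},
    {"name": "Branch B", "population": 50_000, "sales": 1_050_000},
    {"name": "Branch C", "population": 80_000, "sales": 1_520_000},
    {"name": "Branch D", "population": 60_000, "sales": 2_100_000},
    {"name": "Branch E", "population": 120_000, "sales": 2_400_000},
]

print(flag_branches(BRANCHES))  # [('Branch D', 0.75)]
```

A flagged branch is only a candidate for additional audit attention; a real benchmark would likely draw on more explanatory variables than population alone.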
Third, Big Data analytics commonly involves combining several sources of data, some
structured and others unstructured, including numbers, text, images, sound, and video—the
processing of which requires a combination of different analytical methods from different
disciplines. An example is Ayata’s Prescriptive Analytics, which is used in oil and gas exploration
to predict optimal drilling sites based on data such as images from well logs, videos of fluid flows
from hydraulic fractures, sounds from drilling operations, text from drillers’ notes, and numbers
from production reports.6 Moffitt and Vasarhelyi (2013) discuss the challenge of integrating
different sources of Big Data, including news, audio and video streams, cell phone recordings, and
social media comments, and of using them for audit purposes. They propose using such data to
obtain new forms of evidence, confirm the existence of events, and validate reporting elements.
Fourth, the Los Angeles Police Department analyzes data from crime scenes, including time,
location, nature, and actors, in order to predict the most likely timing and location of crimes on a
given day and to deploy forces most effectively.7 The result has been a significant improvement in the
LAPD’s ability to forestall criminal activity and neutralize potential perpetrators such as gang
members in the predicted area. Similar analytics that rely on information about a firm’s past
activities or outcomes of past audits could be used by auditors to identify fraud risks and direct audit
effort aimed at fraud detection.

4 See http://www.forbes.com/sites/lydiadishman/2013/08/08/forget-ab-testing-office-max-uses-livepredict-to-segment-red-and-blue-voters/
5 See http://www.nytimes.com/2004/11/14/business/yourmoney/14wal.html
6 See http://www.wired.com/insights/2014/01/big-data-analytics-can-deliver-u-s-energy-independence/
7 See http://www.huffingtonpost.com/2012/07/01/predictive-policing-technology-los-angeles_n_1641276.html


Fifth, the SEC is investing in Big Data analytics applications to monitor market events, seek
out financial statement fraud, and identify audit failures. For example, the Market Information Data
Analytics System (MIDAS), rolled out by the SEC in January 2013, collects about one billion
records a day from the proprietary feeds of each of the 13 national equity exchanges, time-stamped
to the microsecond. The data are extremely voluminous and challenging to process correctly, and
they require specialized data expertise. In July 2013, the agency announced the formation of the
Financial Reporting and Audit Task Force to strengthen the effort to identify securities law violations relating
to the preparation of financial statements, issuer reporting and disclosure, and audit failures. The
task force uses an analytical Accounting Quality Model (AQM), better known in financial services
as “RoboCop,” to scan routine regulatory filings and flag high-risk activities warranting closer
inspection by SEC enforcement teams. At the same time, the SEC also announced the formation of
the Microcap Fraud Task Force to investigate fraud in the issuance, marketing, and trading of
microcap securities. The task force will monitor websites and social media because microcap
fraudsters frequently employ them to prey on unsophisticated investors.8 Similar analytics could be
used by auditors to identify fraudulent or high-risk activities by auditees.
Finally, we note that internal audit groups at some large companies are utilizing Big Data
within their organizations. For example, the internal audit team at BlueCross and BlueShield of
North Carolina uses Big Data analytics to identify duplicate insurance claims from millions of
claims each month.9 KPMG, Deloitte, and PwC all have publications on their websites explaining
how internal auditors can use data analytics to improve both efficiency and effectiveness. For
example, KPMG suggests that ‘‘With data analytics, organizations have the ability to review every
transaction—not just a sample—which enables a more efficient analysis on a greater scale’’ (KPMG
2013, 1). Many internal audit activities mirror those of external financial statement audits, and
similar Big Data analytics can therefore be applied in external audits.
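To make the duplicate-claims example concrete, the following sketch groups claims on a normalized key and reports every group that contains more than one claim. It is an illustration under stated assumptions: the field names, the choice of matching key, and the sample records are hypothetical and do not describe the system used at BlueCross and BlueShield of North Carolina.

```python
from collections import defaultdict

def find_duplicate_claims(claims):
    """Return lists of claim ids that share the same normalized key."""
    groups = defaultdict(list)
    for claim in claims:
        # Normalize free-text identifiers so that case and stray spaces
        # do not hide an otherwise identical claim.
        key = (
            claim["member_id"].strip().upper(),
            claim["provider_id"].strip().upper(),
            claim["service_date"],
            round(claim["amount"], 2),
        )
        groups[key].append(claim["claim_id"])
    return [ids for ids in groups.values() if len(ids) > 1]

# Hypothetical claims; C-1001 and C-1003 differ only in formatting.
CLAIMS = [
    {"claim_id": "C-1001", "member_id": "m-77", "provider_id": "P-9",
     "service_date": "2014-03-02", "amount": 150.00},
    {"claim_id": "C-1002", "member_id": "M-78", "provider_id": "P-9",
     "service_date": "2014-03-02", "amount": 150.00},
    {"claim_id": "C-1003", "member_id": " M-77 ", "provider_id": "p-9",
     "service_date": "2014-03-02", "amount": 150.00},
    {"claim_id": "C-1004", "member_id": "M-77", "provider_id": "P-9",
     "service_date": "2014-03-09", "amount": 150.00},
]

print(find_duplicate_claims(CLAIMS))  # [['C-1001', 'C-1003']]
```

Because every record passes through the dictionary once, the same approach scales to the full population of claims rather than a sample, which is the point of the KPMG quotation above.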

CHARACTERISTICS OF BIG DATA ANALYTICS


There are certain characteristics of Big Data analytics that are causing users to rethink how data
are used. First, it is increasingly possible to analyze all, or almost all, of the data rather than just a
small, carefully curated subset or sample. This can lead to models that are more robust than before.
For example, if an auditor wants to determine what characteristics of journal entries are indicators
of risk of error or fraud, then it is possible to analyze all the journal entries for however long records
have been kept and use this information to identify current journal entries that are truly unusual.
Whereas in the past one had to be very careful to eliminate polluted data, when all the data are
available a certain degree of messiness is acceptable.10
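As a minimal sketch of the journal-entry example, the code below scores current entries against the full history of amounts using the median and the median absolute deviation (MAD), statistics that are not thrown off by a modest number of polluted records. The 0.6745 scaling constant and the 3.5 cutoff follow a commonly cited rule of thumb for modified z-scores; the amounts, entry identifiers, and function names are hypothetical, and a real application would consider many more characteristics than the amount alone.

```python
from statistics import median

def modified_z_scores(history, candidates):
    """Score candidates against the history using median and MAD."""
    center = median(history)
    mad = median([abs(v - center) for v in history])
    if mad == 0:
        raise ValueError("MAD is zero; amounts are too uniform for this test")
    return [0.6745 * (c - center) / mad for c in candidates]

def unusual_entries(history, current, threshold=3.5):
    """Return ids of current entries whose amount is far from the history."""
    amounts = [entry["amount"] for entry in current]
    scores = modified_z_scores(history, amounts)
    return [entry["id"] for entry, z in zip(current, scores)
            if abs(z) > threshold]

# Hypothetical history of all past journal-entry amounts for one account.
HISTORY = [100, 120, 95, 110, 105, 130, 90, 115, 100, 125]
# Hypothetical entries from the current period.
CURRENT = [
    {"id": "JE-1", "amount": 112},
    {"id": "JE-2", "amount": 5000},
    {"id": "JE-3", "amount": 98},
]

print(unusual_entries(HISTORY, CURRENT))  # ['JE-2']
```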
A second shift in thinking is from causation to correlation. Instead of trying to understand the
fundamental causes of complex phenomena, it is increasingly possible to identify and make use of
correlations. For example, Mayer-Schönberger and Cukier (2013, 132) report that “researchers at
the University of Ontario Institute of Technology and IBM are working with a number of hospitals
on software to help doctors make better diagnostic decisions when caring for premature babies . . .
The software captures and processes patient data in real time, tracking 16 different data streams,
such as heart rate, respiration rate, temperature, blood pressure, and blood oxygen level, which
together amount to around 1,260 data points per second.” While these observations may allow
doctors to eventually understand fundamental causes, simply knowing that something is likely to
occur is arguably more important than understanding exactly why. It is not hard to imagine an
analogous auditing application in which restatements or other adverse events are correlated with
indicators culled from every public company filing and other information.

8 http://www.sec.gov/News/PressRelease/Detail/PressRelease/1365171624975#.U0X3m6HD_IU (last accessed
January 16, 2014).
9 See https://www.kpmg.com/US/en/IssuesAndInsights/ArticlesPublications/Documents/big-data-oceans.pdf.
10 We recognize that polluted data may be more of a problem in some applications than in others. For example, more
data dramatically help in the area of computational linguistics, even if data are messy (Weikum et al. 2012).
However, data quality may be more important than data size in movie-recommending systems (Pilászy and Tikk
2009).

The ability to use correlation models with vast amounts of high-velocity data, in order to
pinpoint transactions or events of audit interest, becomes significantly more useful when applied
continuously. Continuous auditing and monitoring systems are likely to become particularly
relevant in this Petabyte Age, transforming audit practice (Vasarhelyi and Halper 1991; Alles,
Brennan, Kogan, and Vasarhelyi 2006) where, for example, statistical relationships between
different business elements and processes may be monitored continuously to detect irregular events
(Kogan, Alles, Vasarhelyi, and Wu 2011).
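A heavily simplified sketch of such continuous monitoring follows. It assumes a linear relationship between two hypothetical process measures (daily shipments and daily billings), fits that relationship on a baseline period, and raises an alert when a new observation departs from it by more than three residual standard deviations. The data, function names, and threshold are assumptions for illustration; this is not the continuity-equation methodology of Kogan, Alles, Vasarhelyi, and Wu (2011).

```python
from statistics import mean, pstdev

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def monitor(baseline, stream, k=3.0):
    """Return 1-based positions in the stream that break the relationship."""
    xs, ys = zip(*baseline)
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in baseline]
    sigma = pstdev(residuals)
    alerts = []
    for day, (x, y) in enumerate(stream, start=1):
        residual = y - (intercept + slope * x)
        if abs(residual) > k * sigma:
            alerts.append(day)
    return alerts

# Hypothetical baseline: (units shipped, amount billed) per day.
BASELINE = [(10, 101), (12, 119), (11, 111), (15, 149),
            (13, 131), (14, 139), (9, 91), (16, 161)]
# Hypothetical new observations arriving one day at a time.
STREAM = [(12, 120), (14, 141), (13, 170), (10, 99)]

print(monitor(BASELINE, STREAM))  # [3]
```

In a production setting the baseline would be refreshed as new data arrive, and an alert would trigger follow-up by the audit team rather than an automatic conclusion.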

IMPLEMENTING BIG DATA ANALYTICS IN AUDITS


Implementing Big Data analytics is not a trivial endeavor. It requires individuals with expertise
in data analytics, as well as appropriate hardware and software resources. As a result, many
businesses outsource their Big Data applications to solutions providers such as Teradata, IBM, and
Wipro that offer specialist services. Similarly, the training required for Big Data analytics may go well
beyond the academic and professional preparation of a typical auditor. The auditing profession will
either have to hire new analytically trained professionals or, more likely, use the services of
third-party solutions providers for Big Data analytics. Relying on third-party solutions providers raises a
privacy concern, but this issue is not new: the profession already relies on third parties, such as
banks, when carrying out audits.
In identifying anomalies and exceptions for further audit investigation, current implementations
of analytical methods sometimes generate more false positives than can feasibly be
investigated by the audit team, and result in information overload (Debreceny, Gray, and Rahman
2003; Alles, Kogan, and Vasarhelyi 2008). One of the opportunities of Big Data analytics is the
possibility of dramatically reducing the number of false positives through more accurate
identification of true anomalies and exceptions together with better systems of prioritization (Issa
and Kogan 2013).
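One simple way to picture a system of prioritization is to score each exception and investigate only as many as the audit team has capacity for, starting with the highest scores. The sketch below is a hypothetical scoring-and-ranking illustration, not the model of Issa and Kogan (2013): the score (monetary amount multiplied by the number of independent tests that flagged the item), the field names, and the sample exceptions are all assumptions.

```python
def prioritize(exceptions, capacity):
    """Return the ids of the highest-scoring exceptions, up to capacity."""
    def score(exception):
        # An item flagged by several independent tests, and involving a
        # larger amount, is assumed to deserve attention first.
        return exception["amount"] * len(exception["rules"])

    ranked = sorted(exceptions, key=score, reverse=True)
    return [exception["id"] for exception in ranked[:capacity]]

# Hypothetical exceptions raised by three analytical tests (A, B, C).
EXCEPTIONS = [
    {"id": "X1", "amount": 500, "rules": ["A"]},
    {"id": "X2", "amount": 12_000, "rules": ["A", "B"]},
    {"id": "X3", "amount": 300, "rules": ["A", "B", "C"]},
    {"id": "X4", "amount": 9_000, "rules": ["B"]},
    {"id": "X5", "amount": 40, "rules": ["C"]},
]

print(prioritize(EXCEPTIONS, capacity=2))  # ['X2', 'X4']
```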
There are several issues that the auditing profession will need to deal with related to Big Data
analytics. First, making successful use of Big Data requires a paradigm shift. Instead of using some
data in small clean datasets and focusing on causation (plausible relationships in ISA terms), the
auditor using Big Data will tend to use “all” the data in large, relatively messy datasets, and will
focus more on correlation than causation. The degree to which this approach is implemented in an
audit will vary according to the stage of the audit: using messy data is more tolerable for planning
and risk assessment than for substantive procedures. For example, Big Data analytics can be
used to identify business patterns and trends, traditional audit analytics and computer-assisted audit
techniques can be used to conduct a more detailed analysis of potential issues, and conventional
auditing judgment can be used to determine the impact of findings on financial reporting. In
addition, messy data might not be appropriate for analytical procedures that are sensitive to noise.
Nevertheless, this thinking is somewhat foreign to the profession. It will certainly require significant
new guidance and education, and may even require auditing standards themselves to be modified.
Second, the volume of Big Data introduces significant computational challenges. Many
common analytical techniques used in auditing cannot feasibly be applied to Big Data. The solution is
either to use simple analytical techniques that require fewer computational resources, or to select
subsets of data that can be managed by more complex analytical tools. In the latter case, Big
Data itself is used to carefully select a subset that is more valuable for an audit. For example, there are methods
to select subsets of data that result in more accurate analytical models (e.g., see Settles 2009).
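One family of subset-selection methods surveyed by Settles (2009) is uncertainty sampling, in which the records a model is least sure about are sent for human review first. The sketch below assumes that a model has already assigned each transaction a probability of being problematic; the identifiers, scores, and the `select_for_review` helper are hypothetical and serve only to show the selection step.

```python
def select_for_review(scores, budget):
    """Pick the records whose predicted probability is closest to 0.5.

    scores maps a record id to a model's estimated probability that the
    record is problematic; budget is the number of records to review.
    """
    ranked = sorted(scores, key=lambda record_id: abs(scores[record_id] - 0.5))
    return ranked[:budget]

# Hypothetical model output for six transactions.
SCORES = {"T1": 0.02, "T2": 0.48, "T3": 0.95,
          "T4": 0.61, "T5": 0.50, "T6": 0.10}

print(select_for_review(SCORES, budget=3))  # ['T5', 'T2', 'T4']
```

The auditor's conclusions on the selected records would then be fed back to retrain the model, so that a small, carefully chosen subset does the work that a much larger random sample would otherwise require.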
Third, privacy is a potential concern when Big Data is used. Some analytics may require
clients’ nonpublic information beyond that usually released to auditors. Others would benefit from
information about previously conducted audits, perhaps of other clients. The usage of such sensitive
information in Big Data applications presents a challenge, although this concern is not specific to
auditing. For example, the European Union is scrutinizing Google over a raft of antitrust and
privacy concerns related to its use of Big Data (Mayer-Schönberger and Cukier 2013).
Finally, when “all” the data are processed through the auditor’s analytical systems and there is
a failure to identify fraud or error, there is a risk that the auditor will be second-guessed. It is always
easy for others who have the benefit of hindsight to identify indicators that the auditor missed and to
connect the dots, just as the U.S. intelligence community was castigated for not connecting, in
advance, the dots that would have led to the apprehension of the bombers of the 2013 Boston
Marathon. This is not an entirely new problem, but auditors have traditionally based their work on
samples, and it is accepted that there is a statistical risk that fraud or error will not be identified.
In addition, a change to Big Data analytics could identify fraud or error that was missed in the past. Again,
this is not a new problem, but it is an issue that auditors adopting Big Data analytics will likely have
to deal with.
Besides using Big Data analytics to perform audits, audit firms can potentially use it for
internal purposes. For example, since most audit working papers are electronic, there is an
opportunity for the firm to analyze audits across an entire portfolio in search of anomalies and
potential quality issues.

CONCLUDING REMARKS
Big Data is revolutionizing many fields at an increasing rate, and it seems only a matter of time
before the auditing profession adopts similar analytical methods. In this paper, we provide examples
of Big Data analytics and suggest analogous auditing applications. We briefly discuss certain
characteristics of Big Data analytics that are relevant to audit and identify some of the opportunities
and challenges of implementation.

REFERENCES
Alles, M., G. Brennan, A. Kogan, and M. Vasarhelyi. 2006. Continuous monitoring of business process
controls: A pilot implementation of a continuous auditing system at Siemens. International Journal
of Accounting Information Systems 7 (2): 137–161.
Alles, M., A. Kogan, and M. Vasarhelyi. 2008. Putting continuous auditing theory into practice: Lessons
from two pilot implementations. Journal of Information Systems 22 (2): 195–214.
Bollen, J., H. Mao, and X. Zeng. 2011. Twitter mood predicts the stock market. Journal of Computational
Science 2 (1): 1–8.
Chan, W. 2003. Stock price reaction to news and no news: Drift and reversal after headlines. Journal of
Financial Economics 70 (2): 223–260.
Debreceny, R., G. Gray, and A. Rahman. 2003. The determinants of Internet financial reporting. Journal of
Accounting and Public Policy 21 (4): 371–394.
International Auditing and Assurance Standards Board (IAASB). 2014a. The auditor’s responsibilities
relating to fraud in an audit of financial statements. ISA 240. In Handbook of International Quality
Control, Auditing, Review, Other Assurance, and Related Services Pronouncements, Vol. 1. New
York, NY: International Federation of Accountants. Available at: http://www.ifac.org
International Auditing and Assurance Standards Board (IAASB). 2014b. Identifying and assessing the risks
of material misstatement through understanding the entity and its environment. ISA 315. In
Handbook of International Quality Control, Auditing, Review, Other Assurance, and Related
Services Pronouncements, Vol. 1. New York, NY: International Federation of Accountants.
Available at: http://www.ifac.org
International Auditing and Assurance Standards Board (IAASB). 2014c. Analytical procedures. ISA 520. In
Handbook of International Quality Control, Auditing, Review, Other Assurance, and Related
Services Pronouncements, Vol. 1. New York, NY: International Federation of Accountants.
Available at: http://www.ifac.org
Issa, H., and A. Kogan. 2013. A Predictive Ordered Logistic Regression Model for Quality Review of
Control Risk Assessments. Working paper, Rutgers Accounting Research Center.
Kogan, A., M. Alles, M. Vasarhelyi, and J. Wu. 2011. Analytical Procedures for Continuous Data Level
Auditing: Continuity Equations. Working paper, Rutgers Accounting Research Center.
KPMG. 2013. Data Analytics for Internal Audit. Available at:
https://www.kpmg.com/CH/en/Library/Articles-Publications/Documents/Advisory/pub-20130412-data-analytics-internal-audit-en.pdf
Mayer-Schönberger, V., and K. Cukier. 2013. Big Data: A Revolution That Will Transform How We Live,
Work, and Think. Boston, MA: Eamon Dolan/Houghton Mifflin Harcourt.
McAfee, A., and E. Brynjolfsson. 2012. Big data: The management revolution. Harvard Business Review 90:
60–66.
Mittermayer, M. 2004. Forecasting Intraday Stock Price Trends with Text Mining Techniques. Proceedings
of the 37th Annual Hawaii International Conference on System Sciences, Hawaii, HI, January 5–8.
Moffitt, K., and M. Vasarhelyi. 2013. AIS in an age of Big Data. Journal of Information Systems 27 (2):
1–19.
Pilászy, I., and D. Tikk. 2009. Recommending new movies: Even a few ratings are more valuable than
metadata. In Proceedings of the Third ACM Conference on Recommender Systems, 93–100. New
York, NY: RecSys 2009.
Settles, B. 2009. Active Learning Literature Survey. Computer Sciences Technical Report 1648, University
of Wisconsin–Madison.
Vasarhelyi, M., and F. Halper. 1991. The continuous audit of online systems. Auditing: A Journal of
Practice & Theory 10 (1): 110–125.
Weikum, G., J. Hoffart, N. Nakashole, M. Spaniol, F. Suchanek, and M. Yosef. 2012. Big data methods for
computational linguistics. IEEE Data Engineering Bulletin 35 (3): 46–64.
Wilson, T., P. Hoffmann, S. Somasundaran, J. Kessler, J. Wiebe, Y. Choi, C. Cardie, E. Riloff, and S.
Patwardhan. 2005. OpinionFinder: A system for subjectivity analysis. In Proceedings of HLT/
EMNLP on Interactive Demonstrations, 34–35. Stroudsburg, PA: Association for Computational
Linguistics.
Zhang, J., X. Yang, and D. Appelbaum. 2015. Toward effective Big Data analysis in continuous auditing.
Accounting Horizons 29 (2).

Copyright of Accounting Horizons is the property of American Accounting Association and
its content may not be copied or emailed to multiple sites or posted to a listserv without the
copyright holder's express written permission. However, users may print, download, or email
articles for individual use.
