
Journal of the Association for Information Systems

Volume 22, Issue 1, Article 4

1-8-2021

Design Principles for Robust Fraud Detection: The Case of Stock Market Manipulations
Michael Siering
Goethe University Frankfurt, [email protected]

Jan Muntermann
University of Goettingen, [email protected]

Miha Grčar
Jožef Stefan Institute, [email protected]

Follow this and additional works at: https://aisel.aisnet.org/jais

Recommended Citation
Siering, Michael; Muntermann, Jan; and Grčar, Miha (2021) "Design Principles for Robust Fraud Detection: The Case of Stock Market Manipulations," Journal of the Association for Information Systems, 22(1), 156-178.
DOI: 10.17705/1jais.00657
Available at: https://aisel.aisnet.org/jais/vol22/iss1/4

This material is brought to you by the AIS Journals at AIS Electronic Library (AISeL). It has been accepted for
inclusion in Journal of the Association for Information Systems by an authorized administrator of AIS Electronic
Library (AISeL). For more information, please contact [email protected].
Journal of the Association for Information Systems (2021) 22(1), 156-178
doi: 10.17705/1jais.00657

RESEARCH ARTICLE

ISSN 1536-9323

Design Principles for Robust Fraud Detection: The Case of Stock Market Manipulations

Michael Siering1, Jan Muntermann2, Miha Grčar3

1 Goethe University Frankfurt, Germany, [email protected]
2 University of Goettingen, Germany, [email protected]
3 Jožef Stefan Institute, Slovenia, [email protected]

Abstract

We address the challenge of building an automated fraud detection system with robust classifiers
that mitigate countermeasures from fraudsters in the field of information-based securities fraud. Our
work involves developing design principles for robust fraud detection systems and presenting
corresponding design features. We adopt an instrumentalist perspective that relies on theory-based
linguistic features and ensemble learning concepts as justificatory knowledge for building robust
classifiers. We perform a naive evaluation that assesses the classifiers’ performance to identify
suspicious stock recommendations, and a robustness evaluation with a simulation that demonstrates
a response to fraudster countermeasures. The results indicate that the use of theory-based linguistic
features and ensemble learning can significantly increase the robustness of classifiers and contribute
to the effectiveness of robust fraud detection. We discuss implications for supervisory authorities,
industry, and individual users.

Keywords: Fraud Detection, Market Manipulation, Design Principles, Text Mining, Data Mining,
Instrumentalism, Ensemble Learning

Sandeep Purao was the accepting senior editor. This research article was submitted on February 22, 2016 and
underwent four revisions.
1 Introduction

Fraud detection systems (FDS) have gained importance in both business and societal contexts. For instance, FDS have been used to identify suspicious employee communications (Holton, 2009), fraudulent corporate disclosures (Ravisankar et al., 2011), and unauthorized financial transactions (Chen, Chen, & Lin, 2006). A common problem in the field of fraud detection is that fraudsters constantly adapt their behavior to avoid being detected by contemporary systems (Bolton & Hand, 2002). For instance, consider a text categorization system that uses certain keywords to determine whether a document is suspicious. If the keywords become known, fraudsters will refrain from using them and adapt the content of their messages (Webb, Chitti, & Pu, 2005). However, the robustness of fraud-detection efforts against these types of countermeasures, especially in terms of identifying fraudulent texts, has rarely been addressed thus far. We respond to this theoretical and practical research gap by conducting a multiyear design science research (DSR) project with a multinational project consortium to address the problem of information-based market manipulation.

In this type of market manipulation, fraudsters frequently attempt to manipulate stock prices by disseminating highly positive but false information through fraudulent websites, spam messages, and advertising campaigns on legitimate websites (SEC, 2012b). Fraudsters often follow a "buy low and spam high" strategy: They begin by purchasing a certain stock, then they recommend the stock to internet users
to increase demand for it, thereby raising the stock's price, and, finally, the fraudsters sell their stocks at a profit (Frieder & Zittrain, 2006). These types of so-called "pump and dump" schemes have become a serious problem and a number of spam campaigns have led to significant financial losses (FBI, 2011). Investors duped by such schemes risk losing significant portions of their investments after the spam campaign concludes, when prices typically fall below their original levels (Aggarwal & Wu, 2006). Moreover, the firms that issued the affected stocks suffer significant reputational loss (Hanke & Hauser, 2008). The research consortium that addressed this problem consisted of nine partners, including universities, financial institutions, and IT service providers from the finance and market surveillance domains. In addition, a financial market surveillance authority contributed within an advisory board.

Previous studies have proposed various methods of detecting fraudulent websites or messages (Abbasi et al., 2010; Caruana & Li, 2012). Financial fraud detection is an important field (Ngai et al., 2011), and scholars have repeatedly addressed the problem of identifying securities fraud in general (Fast et al., 2007). Nevertheless, the problem of information-based fraud in its various forms, such as the dissemination of fraudulent stock recommendations, remains underexplored, especially in terms of providing robust classifications. Specifically, a robust classifier is one that will "resist change without adapting its initial stable configuration" (Wieland & Marcus Wallenburg, 2012, p. 890).

To address the problem, this study develops an IT artifact that can act as a robust classifier by providing an assessment of whether a given document is suspected of being fraudulent. The artifact is based on new design principles and exhibits new design features that make these classifications robust against potential fraudster countermeasures. To develop the artifact, we follow the problem-solving design science research (DSR) paradigm (Hevner, March, & Park, 2004; Newell & Simon, 1972) with a constructive and proactive approach (Iivari, 2007; Iivari, 2015). More specifically, we followed the process model of Kuechler & Vaishnavi (2008) to formulate specific design principles and design features (at the mesolevel) to address the identified problem and the specific design requirements in the field of information-based fraud.

From a methodological perspective, our research illustrates how classifiers constructed on the basis of relevant kernel theories can support problem solving. Our work therefore differs significantly from traditional data mining research, which strictly follows the logic of induction, generating new knowledge by applying data mining methods to detect patterns within the existing data. In contrast, we adopt an instrumentalist perspective, which provides the "freedom to play around with different theories and different traditions of scientific knowledge production in a way that rival philosophies of science neglect" (Kilduff, Mehra, & Dunn, 2011, p. 1011). Specifically, we employ theories drawn from marketing and financial economics as kernel theories that inform our artifact construction (Gregor & Hevner, 2013). We demonstrate that our research approach, design principles, and design features are advantageous for problem solving and generate practicable outcomes. We conduct an empirical evaluation of the artifact's validity in the context of stock market manipulations and assess its robustness by simulating a fraudster taking countermeasures against our solution. The remainder of this paper is structured as follows: Section 2 presents the research background, Section 3 focuses on the research methodology applied and our artifact design, Section 4 outlines our artifact evaluation, Section 5 discusses the results, and Section 6 concludes the paper.

2 Research Background

2.1 Fraud Detection in Finance

Data mining techniques have been applied to address diverse types of fraud, especially in the financial context. Ngai et al. (2011) provide an overview of this field and the major categories of financial fraud: bank fraud, insurance fraud, securities and commodities fraud, and other finance-related fraud.

Regarding bank fraud, the extant research has focused primarily on credit card fraud (Chen et al., 2006), although insurance fraud and other finance-related fraud have been explored in diverse contexts, such as automotive insurance fraud (Caudill, Ayuso, & Guillén, 2005) and financial statement fraud (Glancy & Yadav, 2011; Ravisankar et al., 2011). By contrast, few studies have examined the process of detecting manipulations of securities and commodities markets (Ngai et al., 2011). Regarding securities fraud, three types of stock market manipulation schemes have been described in the literature: information-based, trade-based, and action-based manipulations (Allen & Gale, 1992). These schemes seek to manipulate stock prices through the release and spread of false information (information-based manipulation), the buying or selling of a stock (trade-based manipulation), or the execution of certain management activities (action-based manipulation). Scholars have extensively studied trade-based manipulation (Felixson & Pelli, 1999). The restrictions imposed upon managers who trade their own firms' stock have led to action-based manipulation becoming rare (Öğüt et al., 2009). Information-based manipulation has gained increasing attention in recent years because the internet has facilitated the spread of fraudulent stock recommendations to large audiences. The manipulators typically attempt to profit by purchasing
a stock at a low price, recommending it to other investors, and then selling the stock at a higher price (Siering et al., 2017). Research has demonstrated that trading volumes increase if stocks are advertised through fraudulent recommendations (Böhme & Holz, 2006). Furthermore, several studies have revealed that these fraudulent recommendations can generate increases in stock prices during the manipulation period. However, when no further recommendation messages are published, the prices of the manipulated stocks decrease rapidly to below their original levels (Aggarwal & Wu, 2006; Böhme & Holz, 2006; Hanke & Hauser, 2008). Even though the United States Securities and Exchange Commission (SEC) has taken countermeasures against these forms of manipulation (i.e., by releasing warnings, suspending trading, and prosecuting manipulators), manipulation campaigns can still be effective (Siering, 2019).

In general, the detection of stock market manipulation remains underexplored (Ngai et al., 2011). While the general characteristics of such manipulation schemes and potential system designs have been taken into account (Gregory & Muntermann, 2014; Siering et al., 2017), the use of unstructured data sources such as financial news or investment newsletters does not appear to have been analyzed. This is a critical gap because this type of textual data is a frequent source of malicious and misleading information in the context of information-based manipulations. Furthermore, the potential countermeasures that fraudsters may use to circumvent fraud-detection mechanisms also remain underexplored.

2.2 Theoretical Perspectives on the Robustness of Fraud Detection

2.2.1 Related Work from Machine Learning

Fraud-detection systems must satisfy the general requirement of being able to achieve good classification performance. However, the development of robust fraud-detection classifiers is a challenging task: If fraudsters are aware that their activities may be detected, they might implement appropriate countermeasures to evade the fraud-detection systems. A robust classifier is one that will "resist change without adapting its initial stable configuration" (Wieland & Marcus Wallenburg, 2012, p. 890). This consideration significantly complicates the classification task for these systems, making their challenge "quite different from traditional classification problems, as intelligent, malicious, and adaptive adversaries can manipulate their samples to mislead a classifier or a learning algorithm" (Biggio et al., 2011, p. 350). Different approaches have been explored to increase the robustness of classifiers against the countermeasures of potential attackers. Several studies suggest adaptations of classifiers during feature processing (Kolcz & Teo, 2009), but the potential use of linguistic features as textual representations has rarely been investigated.

Linguistic features are derived from an original feature set, such as a "bag of words" from a document (Djeraba, 2002). Such features have been successfully applied for author identification (Zheng et al., 2006) and speaker recognition tasks (Campbell et al., 2007) but only to increase classification performance, not to increase classifier robustness. Furthermore, the selection of linguistic features has typically been ad hoc, rather than based on theoretical insights drawn from kernel theories serving as "justificatory knowledge" (Gregor & Jones, 2007) to improve classification robustness.

A different category of studies seeks to increase the robustness of classifications by training collections of different classifiers and implementing various rules such as majority voting or classification averages to combine classification results (Biggio, Fumera, & Roli, 2010; Perols, Chari, & Agrawal, 2009). Although this research stream provides guidance for the development and combination of multiple classifiers that use the same input data, no study has yet attempted to construct classifiers guided by relevant kernel theory to achieve better robustness against potential countermeasures.

2.2.2 Related Work from Financial Economics and Marketing Research

In the following, we focus on related work from the field of financial economics and marketing research to explain the aspects that make stock recommendations effective. We incorporated this work into the development of our design features. In financial economics, it is assumed that information processing is the basis of investment decisions (Fama, 1970). Behavioral finance theory states that investment decisions can also be driven by irrational factors such as information presentation, including the sentiment expressed within a stock recommendation (de Bondt, 1998). Persuasive communication is also typically the focus of marketing research: Stock recommendations represent a form of advertising that is sent to internet users to influence their information processing and ultimately promote desired behavior—specifically, the purchase of a specific stock (Vakratsas & Ambler, 1999).

Marketing research has recognized the important role of advertisements' information content (Abernethy & Franke, 1996). Advertisements are often used by consumers to acquire product-related information, which is then incorporated into purchase decisions (Nelson, 1970). Moreover, if advertisements disregard customers' search for relevant product information, the advertisers' "non-informative advertising policy may
self-destruct" (Resnik & Stern, 1977, p. 53). In addition, in the financial context, the price-determination process for various instruments such as stocks is driven primarily by the information available to market participants (Fama, 1970). Therefore, the information content of advertisements is particularly important and should be considered by advertisers who promote financial products (Jones & Smythe, 2003).

Text readability encompasses the question of how easily a text can be read and the educational level required to understand its content (Bailin & Grafstein, 2001; Korfiatis, García-Bariocanal, & Sánchez-Alonso, 2012). It has been shown that readability is a prerequisite for advertising efficacy (Abruzzini, 1967). Thus, advertisers seek to increase the productive attention devoted to their advertisements by ensuring that they are easy to read (Clark, Kaminski, & Brown, 1990). The effect of text readability on investors' reactions has also been investigated in the financial context. In particular, the readability of corporate disclosures has been found to influence trading behavior, with investors demonstrating delayed reactions to corporate disclosures that are difficult to read (You & Zhang, 2009), and improved disclosure readability significantly affects small investor trading (Loughran & McDonald, 2010).

The important role of sentiment within advertisements and the effects of these emotions on consumers' moods and reactions have been the subject of various studies. Emotional advertising appears to increase consumers' attention to a product and bolster consumers' memories of product-related features (Chandy et al., 2001), and product-related emotional communications can intensify consumers' attitudes (Sonnier, McAlister, & Rutz, 2011). These arguments are supported in the financial context by behavioral finance theory. In particular, it is assumed that investors are influenced by the tone of discussions that involve certain financial instruments, and it has been shown that investors are influenced by sentiments expressed in newspapers, message boards, and even Twitter messages (Bollen & Huina, 2011; Das & Chen, 2007).

3 Research Methodology and Artifact Design

3.1 Design Science Research

We adopt the DSR paradigm, which is generally related to the development of IT artifacts (Hevner et al., 2004; March & Smith, 1995; Peffers et al., 2007). A key characteristic of this research paradigm is that DSR researchers search for satisficing (though not necessarily the best) problem solutions that meet the formulated problem requirements (Simon, 1996). Because DSR is focused on problem solving, problem analysis and appropriate domain knowledge are especially important for developing suitable problem solutions (Peffers et al., 2007). In this case, both gained insights and justificatory knowledge become integral parts of the developed problem solution (Simon, 1996).

The role of theory in DSR is twofold (Kuechler & Vaishnavi, 2008). First, so-called "kernel theories," which often originate from non-IS disciplines, may inform the search for a satisficing problem solution. We consider the work introduced in the previous section to be such kernel theory. Second, DSR seeks to make theoretical contributions by providing explicit prescriptions for "how to do something/solve a problem." Such prescriptive guidance is provided by design principles that represent "core principles and concepts to guide design" (Vaishnavi & Kuechler, 2015, p. 20), which can be applied for "use in the design and implementation of the IS product" (Hevner & Chatterjee, 2010, p. 49). We develop and present such design principles, which are mapped to design features at the instantiated level. Our design principles provide "a clear statement of truth that guides or constrains action" for the development of robust fraud-detection systems (Hevner & Chatterjee, 2010, p. 66) and can thus be considered to be essential design principles (Gregor, Müller, & Seidel, 2013). By offering a more effective solution to a well-known class of problem (fraud detection), our study belongs to improvement research: Here, new and better solutions are developed for known problems (Gregor & Hevner, 2013).

3.2 Research Process

Our DSR project follows the process model of Kuechler and Vaishnavi (2008), which provided guidance during our research process (see Figure 1). In the first step (awareness of the problem), the goal is to develop an understanding of the problem faced by stakeholders. After collecting, structuring, and condensing this information, the problem description and design requirements are formulated (see Section 3.3). These may be revised during the problem-solving process. The design requirements are addressed in the following step (suggestion), in which the initial ideas (tentative designs) for solving the problem are produced. New ideas may be brought forward deductively on the basis of a relevant kernel theory or abductively from other sources (e.g., similar cases; Kuechler & Vaishnavi, 2012) and are condensed in the form of design principles (see Section 3.4). However, while our approach to problem solving is inductive and data-driven, our logic of action is also characterized by truth-independent problem solving. Here, we consider theories to be "useful instruments in helping predict events and solve problems" (Kilduff et al., 2011, p. 302).
[Figure 1 depicts the process steps Awareness of Problem, Suggestion, Development, Evaluation, and Conclusion, the knowledge flows between them, and the corresponding outputs: problem description / design requirements, design principles / tentative design, design features / artifact, performance measures, and results.]

Figure 1. Employed Design Science Research Process Model Based on Kuechler & Vaishnavi (2008)

In the third step (development), the design principles are mapped to design features—the specific artifact capabilities that result from (for example) a chosen algorithm (Meth, Mueller, & Maedche, 2015). We present these design features in Section 3.5 in terms of an instantiated IT artifact—algorithm implementations that are evaluated in step four. Here, suitable measures are used to assess the performance of the IT artifact. The results may provide support for the previously coded design knowledge, as illustrated in Section 4 or, alternatively, may necessitate alterations during the previously taken steps. When the evaluation results provide support for the successful design of a satisficing problem solution, the codified design knowledge is finalized and presented in the context of future research in the final step (conclusion). The knowledge contribution is thereby made. In the following sections, we outline the steps taken to develop robust FDS.

3.3 Problem Description and Design Requirements

The phenomenon of information-based market manipulations (i.e., the spread of false information to affect stock prices) has existed for many years. As seen in the historical cases reported by the SEC (1959), information-based market manipulation used to be the exclusive preserve of privileged market participants such as broker-dealers, who capitalized on the fact that investors attentively listened to them. Today, the group of manipulators has grown and the way in which they use technology has changed significantly. Now, almost anyone can use the Internet to spread rumors throughout the world at nearly no cost. Thus, the problem of information-based market manipulation has become more urgent, while its detection and prevention have become more difficult (SEC, 2012b).

In our DSR project, this problem was explained by the participating domain experts. Our group of experts consisted of representatives of a market supervisory authority and an IT company that develops software for capital market surveillance. They reported that it is imperative to process the large and ever-growing universe of web documents to obtain knowledge of this type of market manipulation. Based on these insights, we derived design requirement DR1.

DR1: Process a large volume of unstructured data. To detect information-based securities fraud, FDS should support the processing of large collections of documents published on the internet.

Further interviews with domain experts showed that being able to easily access large collections of documents is not sufficient. Manually processing and assessing documents is not adequate because of the large number of documents available. Consequently, an automated assessment of documents is required. However, full automation in the field of market manipulation detection is not feasible. As a domain expert explained during an interview, it is ultimately up to the courts to decide whether to find a market participant guilty of market manipulation. Instead, FDS should direct its attention to cases in which documents are found to be suspicious and require further manual analysis. Against this background, design requirement DR2 was derived.

DR2: Provide automated identification of suspicious documents. The FDS should direct its attention to cases that merit further manual detailed exploration and provide an automated classification of documents (suspicious versus non-suspicious).

After the first steps within the research process (Section 3.2) were taken, we presented an initial tentative design (see sections below) to domain experts. While the initial reaction to design requirements DR1 and DR2 was positive, the domain experts sensed a problem with the suggested artifact that had not been clearly articulated.
Based on their experiences with other types of market manipulation, the experts intuited that market manipulators will adjust their behavior after becoming aware that corresponding FDS have been developed. Consequently, an FDS must provide reliable document classifications to address manipulators' adjustments of their writing style to prevent documents from being classified as suspicious. This feedback led us to derive a third and final design requirement, DR3.

DR3: Limit system vulnerability to fraudster countermeasures. The FDS should, without reconfiguration, provide reliable classifications of documents when the manipulator adjusts the writing style to mislead the system.

3.4 Design Principles of Robust Fraud Detection Systems

To address these design requirements, we developed several design principles that guided our artifact development. Following the requirements whereby a large volume of unstructured data must be processed (DR1) and document classifications should be conducted automatically (DR2), a related knowledge discovery process (Fayyad, Piatetsky-Shapiro, & Smyth, 1996) must extract patterns from existing documents. The most important aspect of this process is the development of a proper problem understanding. Based on that problem understanding, an appropriate feature set can be derived for data mining purposes. Therefore, it is essential to understand which features are well-suited for identifying suspicious documents.

Regarding "pump and dump" campaigns, we recognize that fraudsters will try to convince customers to buy a specific stock. Thus, we assume that the campaigns are formulated in a way that maximizes fraud effectiveness. Consequently, we infer that theories from financial economics and marketing seeking to explain information processing in financial markets and purchase decision-making behavior might be useful in the identification of relevant document characteristics. We therefore formulated our first design principle to focus on these kernel theories during the knowledge-discovery process.

DP1: Theory-guided knowledge discovery process: The FDS development process should be informed by kernel theories explaining fraud effectiveness.

Additionally, following DR1 and DR2, we inferred that the automated processing of stock recommendations and the classification of documents as either suspicious or non-suspicious are required. This finding is in line with earlier FDS from other domains, which has largely relied on automated solutions for data processing and automated classifications of cases via machine learning technologies (Ngai et al., 2011). Thus, following these design requirements as well as the literature stream outlined in the research background section, we formulated the second design principle, DP2.

DP2: Automation of document processing and classification: FDS should provide automated document processing and classification (suspicious vs. non-suspicious).

Finally, to fulfill the design requirement of limiting system vulnerability to fraudsters' countermeasures (DR3), we inferred that these countermeasures must be anticipated if the system is to be made more robust against them. This inference is particularly important because fraudsters have been shown to manipulate their deceptive content to mislead existing FDS (Biggio et al., 2011). This phenomenon has been observed in the field of spam detection, especially with regard to textual content (Goodman et al., 2007). Awareness of such potential countermeasures should thus help increase FDS robustness—specifically, the degree to which the classification process functions correctly in the presence of stressful environmental conditions (IEEE, 1990). Consequently, we formulated the third design principle.

DP3: Anticipation of fraudsters' countermeasures: FDS should provide reliable document classifications even when documents have been adapted to prevent correct FDS classifications.

3.5 Artifact Design Features

Based on our design principles, we developed the design features that guide our artifact development to realize a robust FDS classifier. The resulting classifier can be integrated within an FDS as the core component to provide such classifications. The design features thus resemble the specific artifact characteristics that are necessary to satisfy the design principles (Meth et al., 2015). The specific mapping between design principles and features is shown in Figure 2. We present two design features that are related to document transformation (DF1a, DF1b), one design feature used for automated document classification (DF2), and two design features used to increase classifier robustness (DF3a, DF3b). In the case of document transformation, we first focus on the classic "bag-of-words" model and then emphasize the theoretically derived linguistic features. The theory-guided knowledge discovery process plays a central role in determining the design features, as the linguistic features are used for document transformation, classification, and increased classifier robustness. The specific design features and their relationships to the design principles are outlined in the following sections.
[Figure 2 maps the design requirements (DR1: Process a Large Volume of Unstructured Data; DR2: Provide Automated Identification of Suspicious Documents; DR3: Limit System Vulnerability to Scammer Countermeasures) to the design principles (DP1: Theory-guided Knowledge Discovery Process; DP2: Automation of Document Processing and Classification; DP3: Anticipation of Scammer Countermeasures) and to the design features (DF1a: Document Transformation: BoW Model; DF1b: Document Transformation: Linguistic Features; DF2: Automated Document Classification: SVM-trained Classifier; DF3a: Classifier Robustness: Combination of Features; DF3b: Classifier Robustness: Ensemble Learning).]

Figure 2. Mapping of Design Requirements, Principles, and Features

3.5.1 Design Feature DF1a: Document Transformation with a Bag-of-Words Model

We implemented document transformation via a bag-of-words model as a basic design feature (DF1a) to address the design principle of the automation of document processing and classification (DP2; Russell, Norvig, & Davis, 2010). Because classical machine learning techniques cannot assess plain text, we first used several pre-processing steps for the text of the examined recommendations (Apté, Damerau, & Weiss, 1994; Wei & Dong, 2001). We decomposed each document into its individual words, regarding each word as a feature (i.e., a bag-of-words model; Russell et al., 2010). To increase computational efficiency and classification performance, we reduced the number of features by removing stop words and applied minimum and maximum thresholds for the number of documents in which each feature should occur (Groth, Siering, & Gomber, 2014). We also applied a stemmer (Porter, 1980). To avoid overly optimistic classification results, we filtered out stock symbols, firm names, publisher names, and disclaimers that are contained only in suspicious stock recommendations. The remaining features were used to construct a document-feature matrix for the training and evaluation of the models. The term frequency-inverse document frequency (TF-IDF) measure was used to calculate the corresponding weights (Hotho, Nürnberger, & Paaß, 2005).
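As a hedged illustration only: the following sketch shows how such a bag-of-words transformation could look when implemented with scikit-learn and NLTK. The paper does not prescribe these libraries, and the toy documents and the disabled document-frequency thresholds are assumptions rather than the study's configuration.

```python
# Illustrative sketch (not the authors' implementation) of design feature DF1a:
# stop-word removal, Porter stemming, and TF-IDF weighting of word features.
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer, ENGLISH_STOP_WORDS

stemmer = PorterStemmer()

def tokenize(text):
    # lowercase, split on whitespace, drop stop words, then apply Porter stemming
    return [stemmer.stem(tok) for tok in text.lower().split()
            if tok.isalpha() and tok not in ENGLISH_STOP_WORDS]

vectorizer = TfidfVectorizer(
    tokenizer=tokenize,
    lowercase=False,  # already lowercased in the tokenizer
    min_df=1,         # the study applied minimum/maximum document-frequency thresholds;
    max_df=1.0,       # exact values are not reproduced here
)

documents = [
    "This stock is set to soar, a huge profit opportunity for investors",
    "The analyst report maintains a neutral rating on the company",
]
tfidf_matrix = vectorizer.fit_transform(documents)  # document-feature matrix of TF-IDF weights
print(tfidf_matrix.shape)
```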
3.5.2 Design Feature DF1b: Document Transformation with Linguistic Features

We implemented another design feature (DF1b), document transformation with linguistic features, to address the design principles of a theory-guided knowledge discovery process (DP1) and to enable automated document processing (DP2). In line with our instrumentalist research perspective, we sought to discover the theory "that has the highest likelihood of solving [our] particular problem" (Kilduff et al., 2011, p. 303). Theoretical foundations from financial economics and marketing serve as justificatory knowledge for our artifact design, guiding us to take into account information content, readability, and sentiment as linguistic features.

Information content. To increase the advertising effect of their stock recommendations, fraudsters need to provide a significant amount of relevant information about that stock. Thus, we determined that the document information content in the context of DF1b had the capacity to facilitate the identification of suspicious stock recommendations. We measured information content by relying on the "entropy measure" (Shannon, 1951). Entropy is a widely used measure of information content and can also be applied to measure the information content and redundancy of text samples (Shannon, 1951). In this study, we used an adaptation of Shannon entropy (Shannon, 1948), which is provided by Equation (1) below. This metric is also extensively used in the field of machine learning (Han & Kamber, 2006):

Entropy = -\sum_{i=1}^{n} p_i \log(p_i)    (1)

In the above calculation of entropy, n denotes the number of words contained in a document, and p_i represents the probability that specific word i will occur. Here, high entropy values symbolize high information content (Martin & Rey, 2000; Teahan, 2000).
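To make Equation (1) concrete, here is a minimal sketch of a word-level entropy computation; the whitespace tokenization and the natural logarithm are simplifying assumptions, not necessarily the adaptation used by the authors.

```python
# Illustrative sketch: word-level Shannon entropy of a document (cf. Equation 1).
import math
from collections import Counter

def entropy(text):
    words = text.lower().split()
    counts = Counter(words)
    total = len(words)
    # p_i is the relative frequency of word i; higher entropy indicates higher information content
    return -sum((c / total) * math.log(c / total) for c in counts.values())

print(entropy("buy buy buy this stock now"))
print(entropy("the analyst issued a detailed and balanced report on the company"))
```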
Readability. Fraudsters seek to increase the demand for a stock; therefore, because readability increases advertising efficacy and investors' reactions, suspicious stock recommendations should be easy to understand. Consequently, we used document readability as another linguistic feature to identify suspicious stock recommendations. We measured text readability by calculating the automated readability index (ARI), the Fog Index (Fog), and the Flesch Reading Ease Score (Flesch), which are provided by Equations (2), (3), and (4), respectively (Hu, Bose, Koh, & Liu, 2012; Loughran & McDonald, 2010; Smith & Senter, 1967). The ARI, as calculated by Equation (2) below, has been used in the context of manipulation detection (Hu et al., 2012):

ARI = 0.5 \cdot \frac{words}{sentences} + 4.71 \cdot \frac{strokes}{words} - 21.43    (2)

Fog = 0.4 \cdot \left( \frac{words}{sentences} + 100 \cdot \frac{complex\ words}{words} \right)    (3)

Flesch = 206.835 - 1.015 \cdot \frac{words}{sentences} - 84.6 \cdot \frac{syllables}{words}    (4)

In the above equations, words, sentences, syllables, and strokes represent the total number of words, sentences, syllables, and strokes in the text, respectively. Complex words indicates the total number of words consisting of three or more syllables. Both ARI and Fog are intended to represent the grade level required to understand a text; thus, lower scores for these metrics indicate that a document is easier to read. By contrast, low Flesch scores indicate documents that are difficult to read (Loughran & McDonald, 2010).
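The sketch below illustrates Equations (2)-(4) under crude counting heuristics; the syllable, sentence, and "stroke" (character) counters are assumptions for demonstration purposes and not the established implementations the study relies on.

```python
# Illustrative sketch of the readability metrics in Equations (2)-(4).
import re

def count_syllables(word):
    # crude heuristic: count groups of vowels
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    strokes = sum(len(w) for w in words)                      # characters ("strokes")
    syllables = sum(count_syllables(w) for w in words)
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)

    ari = 0.5 * n_words / sentences + 4.71 * strokes / n_words - 21.43          # Eq. (2)
    fog = 0.4 * (n_words / sentences + 100.0 * complex_words / n_words)          # Eq. (3)
    flesch = 206.835 - 1.015 * n_words / sentences - 84.6 * syllables / n_words  # Eq. (4)
    return {"ARI": ari, "Fog": fog, "Flesch": flesch}

print(readability("This stock will soar. Buy it now. Easy gains for everyone."))
```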
Sentiment. It can be assumed that suspicious stock recommendations will have a very positive tone because fraudsters seek to increase the demand for and the stock price of the targeted stock. By contrast, stock recommendations published by professional journalists are not aimed simply at convincing readers to purchase particular stocks but should instead aim to provide an unbiased analysis. Against this background, we propose document sentiment as an appropriate linguistic feature for identifying suspicious documents. We examined the sentiments expressed in stock recommendations using an unsupervised, dictionary-based approach (Zhou & Chaovalit, 2008). We used the Harvard-IV-4 dictionary, which is commonly used in studies related to the current investigation (Hu et al., 2012; Tetlock, 2007; Tetlock, Saar-Tsechansky, & Macskassy, 2008). We counted the occurrences of positive and negative words using the categories defined by this dictionary, and also considered negations (Loughran & McDonald, 2011).

Next, we adapted several document-level sentiment metrics, as presented in Equations (5), (6), and (7) below (Hu et al., 2012; Tetlock et al., 2008; Zhang & Skiena, 2010):

Polarity = \frac{pos - neg}{pos + neg}    (5)

Positivity = \frac{pos}{n}    (6)

Negativity = \frac{neg}{n}    (7)

These metrics consider pos, which represents the number of positive words, and neg, which represents the number of negative words, both calculated as described above. In addition, n is defined as the total number of words. If a document contains neither positive nor negative words, the value of the above metrics is defined as zero. A positive polarity value indicates the predominance of positive words in a document; similarly, a negative value indicates the predominance of negative words. We also calculated the proportion of positive and negative words (relative to total words) in each document (positivity and negativity, respectively).
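A simplified sketch of the metrics in Equations (5)-(7) follows; the tiny word lists are placeholders for the Harvard-IV-4 categories, and negation handling is omitted for brevity.

```python
# Illustrative sketch of the document-level sentiment metrics in Equations (5)-(7).
POSITIVE = {"gain", "profit", "growth", "opportunity", "strong"}   # placeholder word lists
NEGATIVE = {"loss", "risk", "decline", "weak", "fraud"}

def sentiment_metrics(text):
    words = text.lower().split()
    n = len(words)
    pos = sum(1 for w in words if w in POSITIVE)
    neg = sum(1 for w in words if w in NEGATIVE)
    polarity = (pos - neg) / (pos + neg) if pos + neg else 0.0   # Eq. (5)
    positivity = pos / n if n else 0.0                           # Eq. (6)
    negativity = neg / n if n else 0.0                           # Eq. (7)
    return {"polarity": polarity, "positivity": positivity, "negativity": negativity}

print(sentiment_metrics("strong growth and huge profit opportunity with no risk"))
```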

3.5.3 Design Feature DF2: Automated Document Classification with an SVM-based Classifier

As a further design feature that addresses the design principle of the theory-guided knowledge discovery process (DP1) and the design principle of the automation of document processing and classification (DP2), we applied automated document classification using SVM-based classifiers that identify suspicious stock recommendations (DF2). We thus followed a supervised learning setup whereby we used suspicious and non-suspicious stock recommendations to train and evaluate several classifiers that should then be able to classify new recommendations. For this training, we used a Support Vector Machine (SVM) because it has been proven useful for analyzing both structured and unstructured data (Joachims, 1998; Kim, 2003; Tay & Cao, 2001). Based on this design feature, we built two fundamental classifiers.

Classifier A is based on a bag-of-words model and thus builds upon design feature DF1a. For this classifier, the text is pre-processed, and the words are used as features to represent the text. This approach reflects a classical text categorization task. Classifier B utilizes linguistic features to represent the stock recommendations and thus builds upon design feature DF1b. For this classifier, the different measures for information content, readability, and sentiment are used as input variables to determine whether a document is suspected to be a fraudulent stock recommendation.
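Assuming a scikit-learn linear SVM (the paper does not prescribe a specific SVM implementation), a minimal sketch of this supervised setup for Classifiers A and B might look as follows; the toy documents, labels, and linguistic feature values are hypothetical.

```python
# Illustrative sketch: training SVM-based classifiers on the two representations
# (Classifier A: bag-of-words TF-IDF; Classifier B: linguistic features).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

docs = [
    "This stock is about to explode, buy now for massive gains",
    "Huge profits guaranteed, do not miss this once in a lifetime chance",
    "The analyst maintains a hold rating after reviewing quarterly results",
    "Revenue grew modestly while margins remained under pressure",
]
labels = [1, 1, 0, 0]  # 1 = suspicious, 0 = non-suspicious (hypothetical labels)

# Classifier A: bag-of-words / TF-IDF representation (DF1a)
bow = TfidfVectorizer()
classifier_a = LinearSVC().fit(bow.fit_transform(docs), labels)

# Classifier B: linguistic features, e.g., [entropy, ARI, Flesch, Fog, polarity] (DF1b)
linguistic = np.array([
    [7.2, 13.9, 45.1, 15.9, 0.43],
    [7.3, 13.5, 46.0, 15.8, 0.45],
    [6.9, 15.7, 39.2, 17.1, 0.12],
    [6.8, 15.9, 38.9, 17.0, 0.10],
])
classifier_b = LinearSVC().fit(linguistic, labels)

new_doc = ["Act fast, this stock will deliver massive gains"]
print(classifier_a.decision_function(bow.transform(new_doc)))  # > 0 suggests "suspicious"
```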
3.5.4 Design Feature DF3a: Classifier Robustness with Combined Feature Sets

To implement design principle DP3—to consider the fraudster's countermeasures and to develop a classifier that is robust to these countermeasures—we implemented design feature DF3a by increasing classifier robustness with combined feature sets. We assumed that combining the bag-of-words model and linguistic features would improve classifier robustness, as avoiding being detected by classifiers that rely on two feature sets can be assumed to be more difficult than taking countermeasures against one feature set. Consequently, we addressed the theory-guided knowledge discovery process by focusing on the related feature set (DP1). Thus, we trained Classifier C, which builds upon both feature sets. This classifier incorporates the linguistic features for information content, readability, and sentiment and the features of the bag-of-words model.

3.5.5 Design Feature DF3b: Classifier Robustness Based on Ensemble Learning

In addition to directly combining the feature sets in a single classifier, ensemble learning can also increase the robustness of a fraud-detection approach (Dietterich, 1997). Therefore, we also implemented design feature DF3b by training and combining several classifiers to increase the robustness of the resulting classifier by anticipating the fraudsters' countermeasures (DP3). Given our focus on building robust classifiers, we constructed two additional classifiers based on an ensemble learning approach. To do this, we combined the outputs of Classifiers A and B and thus also considered the feature set resulting from the theory-guided knowledge discovery process (DP1). As a simple approach, Classifier D combines the outputs of Classifiers A and B as follows:

D(x) = \begin{cases} \text{suspicious}, & A(x) > 0 \lor B(x) > 0 \\ \text{non-suspicious}, & \text{otherwise} \end{cases}    (8)

Thus, document x is classified as suspicious by Classifier D if either Classifier A or Classifier B evaluates it as being suspicious (> 0); this technique represents the basic multiple-classifier approach proposed by Jorgensen, Zhou, and Inge (2008).
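Assuming the SVM decision values A(x) and B(x) are available, Equation (8) reduces to a simple OR rule, sketched below with hypothetical scores.

```python
# Illustrative sketch of Classifier D (Equation 8): a document is flagged as
# suspicious if either base classifier's decision value is positive.
def classifier_d(a_score, b_score):
    """a_score = A(x), b_score = B(x): signed distances from the two SVM hyperplanes."""
    return "suspicious" if a_score > 0 or b_score > 0 else "non-suspicious"

print(classifier_d(0.7, -0.2))   # suspicious (Classifier A fires)
print(classifier_d(-0.4, -0.1))  # non-suspicious (neither fires)
```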
Finally, we constructed Classifier E, which addresses the concern that a fraudster may adopt countermeasures that involve adjusting the message content. Classifier E combines the outputs of Classifiers A and B in a more complex manner. Because of the nature of the SVM classification, the vector space underlying a classifier is separated into two half-spaces by a hyperplane. Consequently, a document can lie on either the "suspicious" or "non-suspicious" side of the hyperplane. A hyperplane can be formally described as w · x_0 + b = 0, where x_0 is a point lying on the hyperplane, w is the weight vector (normal to the hyperplane), and b denotes the hyperplane bias (offset from the origin of the vector space). The parameters w and b are both determined by the SVM training algorithm in an attempt to separate the positive training examples (i.e., suspicious documents) from the negative ones (i.e., non-suspicious documents) by the widest possible margin with respect to the SVM optimization function.

Let us examine Classifier A more closely to explain the concept of document manipulation. In the case of Classifier A, each document is represented as a high-dimensional vector of TF-IDF weights, with each weight corresponding to one feature in the document. Given document vector x, Classifier A performs the following assessment to determine whether the document is suspected of being fraudulent:

A(x) = w \cdot x + b = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b

In this formulation, n is the size of the vocabulary (i.e., the number of different features in the document collection), x_i is the TF-IDF weight of the i-th feature (it is 0 if that particular feature is not present in the document), and w_i is the SVM weight that corresponds to the i-th feature. If A(x) is positive, the document lies on the positive side of the hyperplane and is considered to be suspicious; if it is negative, the document lies on the negative side of the hyperplane and is considered non-suspicious.
[Figure 3 depicts suspicious (+) and non-suspicious (−) documents in a two-dimensional space (Dimension 1, Dimension 2) separated by the SVM hyperplane, with a document manipulation m pushing a suspicious document across the hyperplane into the non-suspicious region.]

Figure 3. Pushing Documents from One Side (Suspicious) of the Hyperplane to the Other (Non-Suspicious)

To present a suspicious document as non-suspicious, the fraudster needs to replace words that indicate fraud with words that indicate trustworthiness (according to Classifier A). Technically, this requires replacing feature i with feature j so that w_i x_i > w_j x_j, which decreases the overall value of A(x). By performing such swaps, the fraudster "pushes" the document from the positive side of the hyperplane toward the negative side. Pushing a suspicious document far into non-suspicious territory is not feasible, as doing so requires a high degree of manipulation. An altered document is thus most likely to lie relatively close to the hyperplane on the non-suspicious (i.e., negative) side. In fact, such a document is expected to be found in subspace S_A, which is parameterized by thr ≥ 0 and defined by S_A(thr) = \{x;\ -thr < x \cdot w + b \le 0\} = \{x;\ -thr < A(x) \le 0\}. The basis for a robust classifier follows the intuition that, even for a relatively small value of the threshold thr, the manipulated documents are pushed from the suspicious space into S_A(thr). Figure 3 illustrates the described approach in a two-dimensional space.
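To make this pushing intuition concrete, the following sketch greedily swaps the most incriminating feature for a "trustworthy" one until A(x) turns negative. The vocabulary, weights, bias, and swap strategy are all hypothetical and simplified; this is not the simulation procedure used in the paper's robustness evaluation.

```python
# Illustrative sketch: pushing a suspicious document across the hyperplane of
# Classifier A by swapping words (features). All numbers are hypothetical.
import numpy as np

vocab = ["explode", "guaranteed", "massive", "quarterly", "dividend", "audited"]
w = np.array([1.2, 1.0, 0.8, -0.6, -0.7, -0.9])   # SVM weights: positive ~ suspicious words
b = -0.5                                          # hyperplane bias
x = np.array([0.5, 0.4, 0.3, 0.0, 0.0, 0.0])      # TF-IDF weights of a spam-like document

def A(x):
    return float(w @ x + b)

def push_document(x):
    """Greedily replace the most incriminating word with a 'trustworthy' one
    until A(x) <= 0, i.e., the document crosses to the non-suspicious side."""
    x = x.copy()
    while A(x) > 0:
        i = int(np.argmax(w * x))   # feature contributing most to suspiciousness
        j = int(np.argmin(w))       # most trustworthy replacement feature
        x[j] += x[i]                # swap word i for word j
        x[i] = 0.0
    return x

print("A(x) before manipulation:", A(x))                  # positive: suspicious
print("A(x) after manipulation: ", A(push_document(x)))   # pushed just below zero
```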
This reasoning implies that the documents that fall into S_A(thr) may have been altered. Therefore, for Classifier E, if a document falls outside of S_A(thr), the output of Classifier A is accepted. However, if the document falls into S_A(thr), Classifier B is instead employed to categorize the document. Changing single words in a document (i.e., the bag-of-words document representation) is straightforward, whereas changing linguistic features requires more effort and is not typically desirable for fraudsters because they want their recommendations to retain their advertising effects; thus, Classifier B is considered to be a superior approach for assessing potentially altered documents. This new design feature of robust classifiers, which is represented by Classifier E_thr, is defined as follows:

E_{thr}(x) = \begin{cases} \text{suspicious}, & A(x) > 0 \lor (x \in S_A(thr) \land B(x) > 0) \\ \text{non-suspicious}, & \text{otherwise} \end{cases}    (9)

This definition can be restated as follows:

E_{thr}(x) = \begin{cases} \text{suspicious}, & A(x) > 0 \lor (|A(x)| < thr \land B(x) > 0) \\ \text{non-suspicious}, & \text{otherwise} \end{cases}    (10)

The following conclusions hold for extreme conditions, when the boundary of S_A lies on the hyperplane and S_A thus effectively does not exist (E_0), and when S_A occupies the entire negative half-space (E_\infty):

E_{\infty}(x) = D(x) = \begin{cases} \text{suspicious}, & A(x) > 0 \lor B(x) > 0 \\ \text{non-suspicious}, & \text{otherwise} \end{cases}    (11)

E_0(x) = \begin{cases} \text{suspicious}, & A(x) > 0 \\ \text{non-suspicious}, & \text{otherwise} \end{cases}    (12)
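A compact sketch of the threshold-based rule in Equations (9) and (10) follows; the decision scores and the threshold value are hypothetical.

```python
# Illustrative sketch of Classifier E_thr (Equations 9-10): accept Classifier A's
# verdict unless the document falls into the band S_A(thr) just below the
# hyperplane, in which case Classifier B decides.
def classifier_e(a_score, b_score, thr=0.5):
    if a_score > 0:
        return "suspicious"                      # Classifier A already flags the document
    if -thr < a_score <= 0 and b_score > 0:
        return "suspicious"                      # possibly manipulated: defer to Classifier B
    return "non-suspicious"

print(classifier_e(a_score=0.8,  b_score=-0.3))  # suspicious (A fires)
print(classifier_e(a_score=-0.2, b_score=0.6))   # suspicious (in S_A(thr), B fires)
print(classifier_e(a_score=-1.4, b_score=0.6))   # non-suspicious (far from the hyperplane)
```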

4 Evaluation

The important considerations for conducting evaluations of IT artifacts in DSR include choice of evaluation criteria (the "what") and evaluation method (the "how"; Prat, Comyn-Wattiau, & Akoka, 2015). Our selections are guided by the design requirements. As evaluation criteria, we selected "validity," which suggests "that the artifact works correctly, i.e., correctly achieves its goal" (Prat et al., 2015, p. 265). This is, referring to our design requirements, closely related to robustness, "the ability of the artifact to handle invalid inputs or stressful environmental conditions" (Prat et al., 2015, p. 266). To gain insights into how well the classifier performed, we first examined how the classifier identified suspicious documents in normal circumstances (see Section 4.3). Next, to understand robustness, we evaluated how the classifier performed in the presence of countermeasures (see Section 4.4). The evaluation followed 10-fold cross-validation and a simulation-based evaluation setup that modeled the fraudsters' behavior on the basis of inputs by the domain experts.

In the following, we outline our evaluation hypotheses concentrating on the question of which classifiers are most suitable to address the design requirements. Thereafter, we outline the acquisition of the corpus of documents used to train and evaluate the classifiers. Finally, we outline our evaluation approach and the corresponding results.

4.1 Hypotheses

The ability to manage a large volume of unstructured data (DR1) and to support the automated identification of suspicious documents (DR2) are basic characteristics of all classifiers. Thus, we concentrate on the question of which implementation of our design features performs best in the provision of robust classifications (DR3) when formulating our evaluation hypotheses.

Fraudsters seek to evade classifiers by avoiding terms that identify suspicious contents and/or replacing such terms with words that are typically contained in non-suspicious messages (Biggio et al., 2010; Jorgensen et al., 2008). In the following, we define this behavior as an "attack" on the functioning of the classifier. If the classifiers subjected to these countermeasures are not retrained, the classification performance of the attacked classifiers will decrease significantly (Webb et al., 2005). However, in a scenario that involves stock recommendations intended to convince readers to buy the advertised stock, we assume that fraudsters seek to maintain their advertising efficiency. Thus, fraudsters have a vested interest in retaining the message features that influence advertising efficiency. Based on this reasoning, we formulate the following hypothesis for classifiers that provide automated document classifications (DF2), following DF1b and taking into account linguistic features relating to advertising efficiency (in contrast to classifiers that solely follow DF1a and thus rely solely on a bag-of-words model).

H1: When under attack, a classifier based on linguistic features outperforms a classifier based solely on a bag-of-words model.

In addition to taking linguistic features into account, classification performance can be increased by combining feature sets (DF3a) or by applying ensemble learning (DF3b). In the case of a combination of feature sets, it can be assumed that it is more difficult to manipulate classifiers that consider both bag-of-words and linguistic features than classifiers that consider bag-of-words or linguistic features alone.

In the case of ensemble learning, the individual decisions of different classifiers are combined to classify new examples (Dietterich, 1997). Ensembles can be more accurate if individual classifiers disagree (Dietterich, 1997; Hansen & Salamon, 1990) because "multiple learner systems try to exploit the local different behavior of the base learners to enhance the accuracy and the reliability of the overall inductive learning systems" (Valentini & Masulli, 2002, p. 4). Given the background of these general advantages in the case of different classification tasks, we hypothesize that these characteristics will continue to be advantageous if such classifiers, based on DF3a or DF3b, are attacked:

H2a: When under attack, a classifier that combines linguistic features and the bag-of-words model will outperform other classifier configurations based solely on linguistic features or a bag-of-words model.

H2b: When under attack, a classifier based on ensemble learning incorporating linguistic features and the bag-of-words model will outperform other classifier configurations based solely on linguistic features or a bag-of-words model.

4.2 Dataset Acquisition and Descriptive Statistics

4.2.1 Dataset Acquisition

Training and evaluating classifiers require documents that represent both document classes: documents suspected to be fraudulent stock recommendations and documents that contain reliable recommendations. The identification of appropriate documents was carefully conducted in cooperation with our domain experts; it also incorporated feedback from financial institutions and the financial supervisory authority. The SEC has published several criteria that provide the basis for identifying documents that represent stock
recommendations as suspicious and/or fraudulent.¹ We searched for stock recommendations that fulfilled these criteria and included small-cap stocks traded primarily in markets with little regulation, labeling these recommendations as suspicious. To acquire newsletters promoting stocks that matched these criteria, we used the newsletter.hotstocked.com archive. This internet service does not publish its own stock recommendations but aggregates diverse stock recommendations that are published either on the web or in investment newsletters.

The identification of reliable stock recommendations was also carefully conducted. Stock recommendations published on the internet that do not fulfill the SEC criteria for suspiciousness are not guaranteed to be reliable (they can be manipulative without triggering the conditions required for a suspicious or fraudulent designation (Aggarwal & Wu, 2006)). Thus, we only considered documents that were published in more reliable sources, specifically financial newspapers. We used analyst reports that contain stock recommendations published by Dow Jones Newswires. Based on feedback from our domain experts, we selected Dow Jones Newswires as an appropriate source for reliable documents because it is a major financial news provider that is well-regarded by financial professionals (Tetlock, 2007) and because its documents are created by many different authors. Thus, we downloaded the analyst reports published by Dow Jones Newswires and designated these reports as non-suspicious stock recommendations.

Following the above procedures, we acquired a total of 14,556 suspicious and 3,342 non-suspicious stock recommendations published between December 15, 2010 and February 10, 2012. We removed stock symbols, firm names, and publisher names from the documents to ensure the generalizability of the results. In the Discussion section below, we elaborate on the finding that our classification results remain robust when taking a second dataset into account.

We considered only the first suspicious recommendation that was published with regard to a specific stock to remove identical recommendations and to avoid overfitting. This restriction reduced the final number of suspicious stock recommendations used in this study to 896. In addition, a review of the non-suspicious documents obtained from Dow Jones Newswires reveals that some of these documents consisted only of tables that span a large number of stocks but do not include any analyses. Thus, we discarded these documents from the analysis, and a total of 2,088 documents were used to train our fraud-detection classifiers. Our results remain robust regardless of whether the complete or the reduced datasets for suspicious and non-suspicious recommendations were assessed.

All of the classifiers were trained with a biased cost function because of the unbalanced dataset (Witten, Frank, Hall, & Pal, 2016). Therefore, the error on suspicious examples was multiplied by the total number of non-suspicious examples divided by the total number of suspicious examples (2,088/896 = 2.33) during the training. We also trained the classifiers with a non-biased cost function; the recall of the suspicious documents was most heavily affected by this (it decreased), significantly affecting the overall classification performance, as shown by the F-measure.

1 In this context, the SEC warns investors against trading stocks that are recommended if it is unclear whether the recommender holds a position in the recommended stock, whether compensation was paid to the recommender (if the recommendation was an advertisement), or whether the recommended stock is a small, thinly traded company (SEC, 2012a).


Table 1. Descriptive Statistics for Linguistic Features and Results of Wilcoxon Rank-Sum Tests for the Equality of Medians (***/**/*: p < 1%/5%/10%)

Variable              Linguistic feature   Suspicious stock recommendations   Non-suspicious stock recommendations   p-value
                                           Mean      Median                   Mean      Median
Information content   Entropy              7.1826    7.2858                   6.8579    6.8833                       < 0.01***
Readability           ARI                  13.951    13.755                   15.739    15.561                       < 0.01***
                      Flesch               45.111    44.276                   39.240    39.475                       < 0.01***
                      Fog                  15.947    15.949                   17.072    16.971                       < 0.01***
Sentiment             Polarity             0.4322    0.4390                   0.1218    0.1261                       < 0.01***
                      Positivity           0.0861    0.0864                   0.0688    0.0686                       < 0.01***
                      Negativity           0.0344    0.0333                   0.0538    0.0525                       < 0.01***
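The group comparisons summarized in Table 1 can be illustrated with a minimal sketch. It assumes that the documents are already tokenized and uses a small placeholder word list in place of a sentiment lexicon; it is not the study's original implementation, only an outline of how a linguistic feature is computed per document and then compared across the two classes with a Wilcoxon rank-sum test.

```python
# Minimal sketch: compute a linguistic feature per document and compare the two
# classes with a Wilcoxon rank-sum test. POSITIVE_WORDS is a placeholder lexicon.
import math
from collections import Counter
from scipy.stats import ranksums

POSITIVE_WORDS = {"gain", "opportunity", "strong", "alert"}  # placeholder, not the study's lexicon

def shannon_entropy(tokens):
    """Shannon entropy (in bits) of the word distribution of one document."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def positivity(tokens):
    """Fraction of positive sentiment-bearing words in one document."""
    return sum(token in POSITIVE_WORDS for token in tokens) / len(tokens)

def compare_feature(feature_fn, suspicious_docs, non_suspicious_docs):
    """Rank-sum test for equal location of a feature across the two document classes."""
    suspicious_values = [feature_fn(doc) for doc in suspicious_docs]
    non_suspicious_values = [feature_fn(doc) for doc in non_suspicious_docs]
    statistic, p_value = ranksums(suspicious_values, non_suspicious_values)
    return statistic, p_value

# Toy usage with tokenized documents:
suspicious = [["strong", "gain", "alert", "buy"], ["opportunity", "gain", "now"]]
non_suspicious = [["quarterly", "results", "were", "mixed"], ["analysts", "cut", "estimates"]]
print(compare_feature(shannon_entropy, suspicious, non_suspicious))
print(compare_feature(positivity, suspicious, non_suspicious))
```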

4.3 Naive Evaluation

We evaluated the performance of the different classifiers and the general validity of the proposed problem solution utilizing k-fold stratified cross-validation (k = 10), which avoids overly optimistic results (Mitchell, 1997). We created a contingency table that contains the number of correctly and incorrectly classified examples. These results were classified as true positives (TP), true negatives (TN), false positives (FP), or false negatives (FN). On this basis, the performance metrics of accuracy, precision, recall, and F1 (Hotho et al., 2005; Kotsiantis, 2007; van Rijsbergen, 1979) were calculated through micro-averaging (Chau & Chen, 2008). We calculated precision, recall, and F1 for the "suspicious" and "non-suspicious" classes. The evaluation metrics are defined as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{13}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{14}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{15}$$

$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{16}$$

The results of the 10-fold cross-validation are presented in Table 2. This table presents the results for Classifier A, which accounts only for the bag-of-words model; for Classifier B, which accounts only for the linguistic features of information content, readability, and sentiment; for Classifier C, which utilizes both feature sets; and for Classifiers D and E, which are based on ensemble learning. For this classic evaluation, we selected thr = 0.5 for Classifier E, but other values between 0 and 1 produced similar results, as illustrated in the following section. If only the basic text-based features are taken into account (Classifier A), an accuracy of 99.67% is achieved. In addition, the precision, recall, and F1 scores are above 98% for both classes. These are excellent scores, although previous text mining studies have reported comparable results for related document classification tasks (Joachims, 1998; Webb et al., 2005).

Furthermore, Classifier B achieves a classification accuracy of 83.61%; thus, 83.61% of all cases are classified correctly through this approach. Misclassification costs (i.e., the consequences of classifying suspicious recommendations as non-suspicious and vice versa) are particularly important in fraud detection (Phua et al., 2010). Thus, the classification results for both classes should also be taken into account. In the case of Classifier B, significantly lower precision appears to be achieved for the suspicious class than for the non-suspicious class. However, the difference in recall between these two classes is less substantial: 86.84% of the suspicious recommendations are classified as suspicious, whereas 82.31% of the non-suspicious recommendations are classified as non-suspicious.

Classifier C, which incorporates the bag-of-words model and linguistic features, produces results that are comparable to, but slightly lower than, the results of Classifier A. Regarding the classifiers based on ensemble learning, Classifier D demonstrates an overall classification performance that appears to be between those of Classifiers A and B. Finally, Classifier E0.5 produces an overall classification performance that is comparable to the performance of Classifiers A and C. Thus, Classifiers A, C, and E0.5 achieve very good results in the identification of suspicious recommendations and produce slightly better overall performance than Classifiers B and D.


Table 2. SVM Classification Results (All Values Are Given as Percentages)

                  Accuracy   Class suspicious                 Class non-suspicious
                             Precision   Recall    F1         Precision   Recall    F1
Classifier A      99.67      98.99       99.89     99.44      99.95       99.57     99.76
Classifier B      83.61      67.72       86.84     76.10      93.54       82.31     87.57
Classifier C      98.79      97.33       98.61     97.97      99.43       98.86     99.14
Classifier D      87.50      70.54       100.00    82.73      100.00      82.17     90.21
Classifier E0.5   98.69      95.83       100.00    97.87      100.00      98.14     99.06
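As an illustration of this evaluation setup, the following sketch combines stratified 10-fold cross-validation with the metrics of Equations 13-16. The TF-IDF/linear-SVM pipeline shown here is an assumption made for the example and does not reproduce the exact feature sets of Classifiers A-E.

```python
# Sketch: stratified 10-fold cross-validation and the metrics of Equations 13-16.
# Assumes `texts` is a list of document strings and `labels` uses 1 = suspicious, 0 = non-suspicious.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.metrics import confusion_matrix

def evaluate(texts, labels):
    classifier = make_pipeline(TfidfVectorizer(), LinearSVC())
    folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
    predictions = cross_val_predict(classifier, texts, labels, cv=folds)

    # Contingency table with the suspicious class treated as the positive class.
    tn, fp, fn, tp = confusion_matrix(labels, predictions, labels=[0, 1]).ravel()
    accuracy = (tp + tn) / (tp + fp + tn + fn)            # Equation 13
    recall = tp / (tp + fn)                               # Equation 14
    precision = tp / (tp + fp)                            # Equation 15
    f1 = 2 * precision * recall / (precision + recall)    # Equation 16
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```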

Table 3. The 20 Most Important Features for Classifier C by SVM Weight

Rank   (Linguistic) feature   Weight    Rank   (Linguistic) feature   Weight
1      alert                  1.5032    11     fitch                  0.5315
2      sp                     1.3050    12     upgrade                0.5219
3      Polarity               0.8393    13     gbp                    0.5059
4      analyst                0.7964    14     bank                   0.4732
5      Entropy                0.6138    15     moodys                 0.4674
6      said                   0.6123    16     eur                    0.4530
7      technology             0.6097    17     chart                  0.4204
8      pick                   0.5844    18     read                   0.4177
9      mid                    0.5642    19     list                   0.3927
10     ratings                0.5400    20     Flesch                 0.3572
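A listing in the style of Table 3 can be derived directly from the weight vector of a trained linear SVM. The sketch below assumes a fitted scikit-learn TfidfVectorizer and LinearSVC in which the linguistic features have been appended as additional columns after the bag-of-words vocabulary; the feature names and their order are illustrative assumptions.

```python
# Sketch: rank features by the weight a trained linear SVM assigns to them.
# Assumes `vectorizer` (fitted TfidfVectorizer) and `svm` (fitted LinearSVC), with the
# linguistic features appended as extra columns in the order given below (an assumption).
import numpy as np

LINGUISTIC_FEATURES = ["Entropy", "ARI", "Flesch", "Fog", "Polarity", "Positivity", "Negativity"]

def top_features(vectorizer, svm, k=20):
    names = np.array(list(vectorizer.get_feature_names_out()) + LINGUISTIC_FEATURES)
    weights = svm.coef_.ravel()                  # one weight per feature for a binary task
    ranked = np.argsort(-np.abs(weights))[:k]    # largest absolute influence first
    return [(names[i], float(weights[i])) for i in ranked]
```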

4.4 Robustness Evaluation

To evaluate the robustness of the proposed classifiers, we first analyzed the relative importance of the linguistic features. Thereafter, we simulated an attack on the classifiers to evaluate how these performance figures change if the input documents are manipulated according to a document manipulation strategy described in the Appendix.

During the training process, SVM assigns certain weights to the features that it assesses. We used these assigned weights to evaluate the importance of individual features (i.e., individual words or linguistic features). In particular, weights with higher absolute values exert greater influence on the classification decision (Guyon et al., 2002). Table 3 reports the 20 most important features for Classifier C, sorted by weight. This table shows that linguistic features are of great importance. For Classifier C, polarity (i.e., sentiment) has the highest rank among the linguistic features, whereas entropy (i.e., information content) is ranked #5. Furthermore, Flesch (i.e., readability) is ranked #20 (out of the 9,990 features that are relevant in the model).

Furthermore, a number of features of the bag-of-words model are also among the 20 most important features for Classifier C. For example, many suspicious stock recommendations alert (#1) investors about stock picks (#8). From a fraudster's point of view, these words should be avoided in future stock recommendations to prevent detection by the classifiers. However, a fraudster would also need to alter the linguistic features of a message. As a consequence, we expect Classifier C to be more robust than Classifier A against manipulations because important linguistic features pose a dilemma for fraudsters—as marketing theory points out, the manipulation of a message's linguistic features to avoid identification by Classifier C would decrease the advertising effect of the fraudster's recommendations.

To further explore the robustness of the classifiers, we performed a simulation of a worst-case attack. First, we assumed that the fraudster has obtained or could fully replicate the feature weights of Classifier A and is thereby fully aware of the most relevant words that should be avoided; this assumption is much stricter than those of related attack simulation approaches that do not assume this type of insider knowledge (Jorgensen et al., 2008; Webb et al., 2005). Second, we assumed that the fraudster did not want to reduce the advertising effect of the document. As a result, the linguistic features (and thus also Model B) were expected to be relatively stable.


For each suspicious document, given the degree of manipulation m, the fraudster replaced the m% most important features that drive suspiciousness with suitable synonyms that are considered to be less suspicious (the detailed algorithm for document manipulation is presented in the Appendix).

To evaluate the robustness of the different classifiers, we assessed their classification performance while increasing the manipulation degree m (i.e., the percentage of words that are replaced by suitable synonyms). This assessment is graphically depicted in Figure 4. In accordance with H1, we see that, when under attack (i.e., if m is increased), the classifier based on linguistic features only (Classifier B) outperforms the classifier based on the bag-of-words model (Classifier A) with respect to accuracy. Although the accuracy of Classifier B is below that of Classifier A at m = 0, Classifier B outperformed Classifier A for values of m that are equal to or greater than 0.3. The same result was observed for the F1 measure, which combines the precision and recall factors, in the case of m ≥ 0.4. Thus, the results of this simulated attack support H1.

Furthermore, Classifier C appears to be more robust than a classifier that uses only the bag-of-words model (Classifier A) or only linguistic features (Classifier B), which supports H2a. With respect to the performance of the developed classifiers based on ensemble learning approaches, it can be concluded that Classifier D and the various Classifier E configurations (i.e., for several different thr values) exhibit by far the best robustness to the attacks, as demonstrated by the various performance measures. However, Classifier E outperformed Classifier D in most scenarios and performed reasonably well at m = 0.
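The curves in Figure 4 can be approximated with a loop of the following shape. Here, `manipulate(document, m)` is a hypothetical stand-in for the synonym-replacement procedure described in the Appendix, and `classifier` is any trained model exposing a `predict` method; the sketch reports the share of manipulated suspicious documents that are still flagged as suspicious (i.e., recall on the suspicious class).

```python
# Sketch: track how detection of suspicious documents degrades as the manipulation degree m grows.
# `manipulate(document, m)` is a hypothetical stand-in for the Appendix algorithm.
import numpy as np

def robustness_curve(classifier, suspicious_documents, manipulate,
                     degrees=(0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6)):
    curve = {}
    for m in degrees:
        attacked = [manipulate(document, m) for document in suspicious_documents]
        predictions = np.asarray(classifier.predict(attacked))
        # Share of attacked suspicious documents that are still classified as suspicious (label 1).
        curve[m] = float(np.mean(predictions == 1))
    return curve
```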
[Figure 4 comprises four panels (Accuracy, Precision, Recall, and F1 for the suspicious class, each per classifier), plotted against the degree of document manipulation m from 0.0 to 0.6, for Classifier A (bag-of-words model only), Classifier B (linguistic features only), Classifier C (combined feature set), Classifier D (simple ensemble), and Classifier E (advanced ensemble, thr = 0.1, 0.3, 0.5, 0.7, 0.9).]

Figure 4. The Robustness of Classifiers Against Countermeasures


In short, given a performance metric and a document manipulation level, Classifier E0.5 is either the outright best-performing classifier or performs similarly to the best-performing classifier (for instance, with regard to accuracy, the absolute difference between Classifier E0.5 and the best-performing classifier was always equal to or less than 1%).

These findings support H2b, which states that, when under attack, a classifier based on ensemble learning that accounts for both linguistic features and the bag-of-words model outperforms models based on either linguistic features or the bag-of-words model alone. However, the sensitivity of this classifier must be established by selecting an appropriate thr value. Classifier E0.5 is more robust to manipulations than Classifier C, which shows that the application of ensemble learning is more appropriate than the combination of feature sets at the classifier level.

5 Discussion

Our results show that the proposed design principles and features can be used to address the design requirements for robust fraud detection. We found that prior theories from marketing and financial economics provide a foundation (justificatory knowledge) for identifying suspicious stock recommendations. Notably, we found that such recommendations are easier to read, incorporate more positive sentiments, and provide greater information content (which supports advertising success). For the forecasting models, we confirmed the usefulness of theory-based linguistic features (see H1): a classifier based on just the linguistic features provides good results. The robustness evaluation confirmed the usefulness of theory-based linguistic features (see H2a, H2b) and demonstrates that an ensemble learning approach that uses linguistic features and bag-of-words models is appropriate for generating a robust fraud-detection classifier.

We acknowledge that our approach has limitations. First, our approach addressed two different types of documents that can be regarded as examples of suspicious and non-suspicious stock recommendations (relying on criteria published by the SEC to identify suspicious stock recommendations and on analyst reports published by Dow Jones Newswires to identify non-suspicious recommendations). An alternative approach would be to have domain experts assess the recommendations. This approach was criticized by our domain experts because one cannot be certain whether a recommendation that is labeled as suspicious actually aims to manipulate stock prices, as the assessors would not know the specific intentions of the publisher (supported by Bolton & Hand, 2002). As also argued by the involved market supervisory authority, any such assessment for training a classifier must follow documented criteria that can be disclosed.

Second, our predictions are based on stock recommendations for which publishers self-disclosed that they were paid to advertise the stocks in question. Thus, the study does not assess recommendations without this disclaimer. However, the inclusion of this statement is obligatory (Hu, McInish, & Zeng, 2009), and the SEC cannot prohibit the publication of fraudulent stock recommendations that include this statement, as doing so would obstruct "freedom of speech" (SEC, 2012a). Thus, we cannot claim that our study incorporates all possible types of suspicious stock recommendations, although it does include a significant subset of them. By excluding the disclaimers during training, we ensured that the classifiers could detect the remaining suspicious stock recommendations that did not contain disclaimers.

To rule out the possibility that the results of this study were driven by fundamental differences between the document sources used for training (e.g., a news agency such as Dow Jones Newswires might have guidelines for the composition of related documents) and the suspicious documents (which are published by various promoters), we reran our experiments using another source of non-suspicious documents (recommendations published in the Yahoo! Finance category "Investing Ideas & Strategies"). In this setting, the classification results remained robust. This allowed us to further establish robustness and overcome a major limitation of fraud-detection systems, namely that manipulators adapt to them after their characteristics have been published (Bolton & Hand, 2002).

6 Conclusion

In this study, we present a fraud-detection approach for identifying suspicious stock recommendations. To improve the robustness of this approach, we propose new design principles, design features, and different classifiers that utilize both a bag-of-words model and linguistic features derived from domain kernel theories.

We contribute theoretically and methodologically to the literature in several ways. Most importantly, we propose design principles and specific design features for robust fraud-detection systems that address the problem class of information-based market manipulations, and we demonstrate robustness evaluations based on attack simulations. Our approach (which includes bag-of-words models and theory-motivated linguistic features in combination with ensemble learning) significantly increases the robustness of fraud detection. Through our work, we demonstrate that the shift from foundationalism to instrumentalism in contemporary data mining research can contribute to problem solving. In this case, foundationalism seeks to progress toward truth by following inductive logic, whereas instrumentalism attempts to engage in problem solving, provides the flexibility to build an approach on the basis of relevant theories, and utilizes different reasoning principles, including both induction and deduction (Kilduff et al., 2011). To the best of our knowledge, our study is the first to investigate the problem of information-based fraud detection by analyzing and classifying stock recommendations.

The practical contributions of this study are threefold. First, the proposed fraud-detection classifiers can be included in a fraud detection system (FDS) to enhance the "information-based market manipulation detection capabilities" of firms and market surveillance authorities. In particular, existing detection schemes can be improved to clearly and correctly identify stock recommendations serving in pump-and-dump schemes. Additionally, the proposed fraud-detection classifiers could also be used to complement established FDS covering other manipulation scenarios (Gregory & Muntermann, 2014). Second, our findings may be relevant to security software developers who are addressing this problem domain, at least with respect to stock scam emails (Symantec, 2011). Our classifiers could be included in browser toolbars, which already generate warnings for phishing websites. Finally, the design principles and design features for improving classifier robustness and its evaluation could be applied to other fields or languages apart from English to investigate the robustness of text-based classifiers—for example, for opinion spam (Liu, 2012) in the social commerce context.


References

Abbasi, A., Zhang, Z., Zimbra, D., Chen, H., & Nunamaker, J. F. (2010). Detecting fake websites: The contribution of statistical learning theory. MIS Quarterly, 34(3), 435-461.

Abernethy, A. M., & Franke, G. R. (1996). The information content of advertising: A meta-analysis. Journal of Advertising, 25(2), 1-17.

Abruzzini, P. (1967). Measuring language difficulty in advertising copy. Journal of Marketing, 31(2), 22-26.

Aggarwal, R. K., & Wu, G. (2006). Stock market manipulations. The Journal of Business, 79(4), 1915-1953.

Allen, F., & Gale, D. (1992). Stock-price manipulation. The Review of Financial Studies, 5(3), 503-529.

Apté, C., Damerau, F., & Weiss, S. M. (1994). Automated learning of decision rules for text categorization. ACM Transactions on Information Systems, 12(3), 233-251.

Baccianella, S., Esuli, A., & Sebastiani, F. (2010). SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. Proceedings of the Seventh International Conference on Language Resources and Evaluation.

Bailin, A., & Grafstein, A. (2001). The linguistic assumptions underlying readability formulae: A critique. Language & Communication, 21(3), 285-301.

Biggio, B., Corona, I., Fumera, G., Giacinto, G., & Roli, F. (2011). Bagging classifiers for fighting poisoning attacks in adversarial classification tasks. Lecture Notes in Computer Science, 6713, 350-359.

Biggio, B., Fumera, G., & Roli, F. (2010). Multiple classifier systems under attack. Lecture Notes in Computer Science, 5997, 74-83.

Böhme, R., & Holz, T. (2006). The effect of stock spam on financial markets. 5th Workshop on the Economics of Information Security.

Bollen, J., & Huina, M. (2011). Twitter mood as a stock market predictor. Computers and Operations Research, 44(10), 91-94.

Bolton, R. J., & Hand, D. J. (2002). Statistical fraud detection: A review. Statistical Science, 17(3), 235-255.

Campbell, W. M., Campbell, J. P., Gleason, T. P., Reynolds, D. A., & Wade, S. (2007). Speaker verification using support vector machines and high-level features. IEEE Transactions on Audio, Speech, and Language Processing, 15(7), 2085-2094.

Caruana, G., & Li, M. (2012). A survey of emerging approaches to spam filtering. ACM Computing Surveys, 44(2), 1-27.

Caudill, S. B., Ayuso, M., & Guillén, M. (2005). Fraud detection using a multinomial logit model with missing information. The Journal of Risk and Insurance, 72(4), 539-550.

Chandy, R. K., Tellis, G. J., MacInnis, D. J., & Thaivanich, P. (2001). What to say when: Advertising appeals in evolving markets. Journal of Marketing Research, 38(4), 399-414.

Chau, M., & Chen, H. (2008). A machine learning approach to web page filtering using content and structure analysis. Decision Support Systems, 44(2), 482-494.

Chen, R., Chen, T., & Lin, C. J. (2006). A new binary support vector system for increasing detection rate of credit card fraud. International Journal of Pattern Recognition and Artificial Intelligence, 20(2), 227-239.

Clark, G. L., Kaminski, P. F., & Brown, G. (1990). The readability of advertisements and articles in trade journals. Industrial Marketing Management, 19(3), 251-260.

Das, S. R., & Chen, M. Y. (2007). Yahoo! for Amazon: Sentiment extraction from small talk on the web. Management Science, 53(9), 1375-1388.

de Bondt, W. F. M. (1998). A portrait of the individual investor. European Economic Review, 42(3-5), 831-844.

Dietterich, T. G. (1997). Machine-learning research: Four current directions. AI Magazine, 18(4), 97-136.

Djeraba, C. (2002). Content-based multimedia indexing and retrieval. IEEE MultiMedia, 9(2), 18-22.

Fama, E. F. (1970). Efficient capital markets: A review of theory and empirical work. The Journal of Finance, 25(2), 383-417.

Fast, A., Friedland, L., Maier, M., Taylor, B., Jensen, D., Goldberg, H. G., & Komoroske, J. (2007). Relational data pre-processing techniques for improved securities fraud detection. Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 941-949.

Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge discovery in databases. AI Magazine, 17(3), 37-54.

FBI. (2011). Financial crimes report to the public. http://www.fbi.gov/stats-services/publications/financial-crimes-report-2010-2011

Felixson, K., & Pelli, A. (1999). Day end returns: Stock price manipulation. Journal of Multinational Financial Management, 9(2), 95-127.

Fellbaum, C. (1998). WordNet: An electronic lexical database. MIT Press.

Frieder, L., & Zittrain, J. (2006). Spam works: Evidence from stock touts and corresponding market activity. Berkman Center Research.

Glancy, F. H., & Yadav, S. B. (2011). A computational model for financial reporting fraud detection: On quantitative methods for detection of financial fraud. Decision Support Systems, 50(3), 595-601.

Goodman, J., Cormack, G. V., & Heckerman, D. (2007). Spam and the ongoing battle for the inbox. Communications of the ACM, 50(2), 24-33.

Gregor, S., & Hevner, A. R. (2013). Positioning and presenting design science research for maximum impact. MIS Quarterly, 37(2), 337-355.

Gregor, S., & Jones, D. (2007). The anatomy of a design theory. Journal of the Association for Information Systems, 8(5), 312-335.

Gregor, S., Müller, O., & Seidel, S. (2013). Reflection, abstraction, and theorizing in design and development research. Proceedings of the 21st European Conference on Information Systems.

Gregory, R. W., & Muntermann, J. (2014). Research note—Heuristic theorizing: Proactively generating design theories. Information Systems Research, 25(3), 639-653.

Groth, S. S., Siering, M., & Gomber, P. (2014). How to enable automated trading engines to cope with news-related liquidity shocks? Extracting signals from unstructured data. Decision Support Systems, 62, 32-42.

Guyon, I., Weston, J., Barnhill, S., & Vapnik, V. (2002). Gene selection for cancer classification using support vector machines. Machine Learning, 46(1), 389-422.

Han, J., & Kamber, M. (2006). Data mining: Concepts and techniques (2nd ed.). Morgan Kaufmann.

Hanke, M., & Hauser, F. (2008). On the effects of stock spam e-mails. Journal of Financial Markets, 11(1), 57-83.

Hansen, L. K., & Salamon, P. (1990). Neural network ensembles. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12, 993-1001.

Hevner, A., & Chatterjee, S. (2010). Design research in information systems: Theory and practice. Springer.

Hevner, A. R., March, S. T., & Park, J. (2004). Design science in information systems research. MIS Quarterly, 28(1), 75-105.

Holton, C. (2009). Identifying disgruntled employee systems fraud risk through text mining: A simple solution for a multi-billion dollar problem: IT decisions in organizations. Decision Support Systems, 46(4), 853-864.

Hotho, A., Nürnberger, A., & Paaß, G. (2005). A brief survey of text mining. GLDV Journal for Computational Linguistics, 20(1), 19-62.

Hu, B., McInish, T., & Zeng, L. (2009). The CAN-SPAM Act of 2003 and stock spam emails. Financial Services Review, 18, 87-104.

Hu, N., Bose, I., Koh, N. S., & Liu, L. (2012). Manipulation of online reviews: An analysis of ratings, readability, and sentiments. Decision Support Systems, 52(3), 674-684.

IEEE. (1990). IEEE standard glossary of software engineering terminology (IEEE Std 610.12-1990). IEEE.

Iivari, J. (2007). A paradigmatic analysis of information systems as a design science. Scandinavian Journal of Information Systems, 19(2), 39-64.

Iivari, J. (2015). Distinguishing and contrasting two strategies for design science research. European Journal of Information Systems, 24(1), 107-115.

Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. Proceedings of the 10th European Conference on Machine Learning, 137-142.

Jones, M. A., & Smythe, T. (2003). The information content of mutual fund print advertising. The Journal of Consumer Affairs, 37(1), 22-41.

Jorgensen, Z., Zhou, Y., & Inge, M. (2008). A multiple instance learning strategy for combating good word attacks on spam filters. Journal of Machine Learning Research, 8, 1115-1146.

Kilduff, M., Mehra, A., & Dunn, M. B. (2011). From blue sky research to problem solving: A philosophy of science theory of new knowledge production. The Academy of Management Review, 36(2), 297-317.

Kim, K.-j. (2003). Financial time series forecasting using support vector machines. Neurocomputing, 55(1-2), 307-319.

Kolcz, A., & Teo, C. H. (2009). Feature weighting for improved classifier robustness. Proceedings of the Sixth Conference on Email and Anti-Spam.

Korfiatis, N., García-Bariocanal, E., & Sánchez-Alonso, S. (2012). Evaluating content quality and helpfulness of online product reviews: The interplay of review helpfulness vs. review content. Electronic Commerce Research and Applications, 11(3), 205-217.

Kotsiantis, S. B. (2007). Supervised machine learning: A review of classification techniques. Informatica, 31(3), 249-268.

Kuechler, B., & Vaishnavi, V. (2008). On theory development in design science research: Anatomy of a research project. European Journal of Information Systems, 17, 489-504.

Kuechler, W., & Vaishnavi, V. (2012). A framework for theory development in design science research: Multiple perspectives. Journal of the Association for Information Systems, 13(6), 395-423.

Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1), 1-167.

Loughran, T., & McDonald, B. (2010). Measuring readability in financial text (Working paper). University of Notre Dame.

Loughran, T., & McDonald, B. (2011). When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. The Journal of Finance, 66(1), 35-65.

March, S. T., & Smith, G. F. (1995). Design and natural science research on information technology. Decision Support Systems, 15, 251-266.

Martin, M. A., & Rey, J.-M. (2000). On the role of Shannon's entropy as a measure of heterogeneity. Geoderma, 98(1-2), 1-3.

Meth, H., Mueller, B., & Maedche, A. (2015). Designing a requirement mining system. Journal of the Association for Information Systems, 16(9), 799-837.

Mitchell, T. (1997). Machine learning. McGraw-Hill.

Nelson, P. (1970). Information and consumer behavior. Journal of Political Economy, 78(2), 311-329.

Newell, A., & Simon, H. A. (1972). Human problem solving (Vol. 9). Prentice-Hall.

Ngai, E. W. T., Hu, Y., Wong, Y. H., Chen, Y., & Sun, X. (2011). The application of data mining techniques in financial fraud detection: A classification framework and an academic review of literature. Decision Support Systems, 50(3), 559-569.

Öğüt, H., Mete Doğanay, M., & Aktaş, R. (2009). Detecting stock-price manipulation in an emerging market: The case of Turkey. Expert Systems with Applications, 36(9), 11944-11949.

Peffers, K., Tuunanen, T., Rothenberger, M., & Chatterjee, S. (2007). A design science research methodology for information systems research. Journal of Management Information Systems, 24(3), 45-77.

Perols, J., Chari, K., & Agrawal, M. (2009). Information market-based decision fusion. Management Science, 55(5), 827-842.

Phua, C., Lee, V., Smith, K., & Gayler, R. (2010). A comprehensive survey of data mining-based fraud detection research. Proceedings of the International Conference on Intelligent Computation Technology and Automation, 50-53.

Porter, M. F. (1980). An algorithm for suffix stripping. Program, 14(3), 211-218.

Prat, N., Comyn-Wattiau, I., & Akoka, J. (2015). A taxonomy of evaluation methods for information systems artifacts. Journal of Management Information Systems, 32(3), 229-267.

Ravisankar, P., Ravi, V., Raghava Rao, G., & Bose, I. (2011). Detection of financial statement fraud and feature selection using data mining techniques. Decision Support Systems, 50(2), 491-500.

Resnik, A., & Stern, B. L. (1977). An analysis of information content in television advertising. Journal of Marketing, 41(1), 50-53.

Russell, S. J., Norvig, P., & Davis, E. (2010). Artificial intelligence: A modern approach (3rd ed.). Prentice Hall.

SEC. (1959). A 25 year summary of the activities of the Securities and Exchange Commission. SEC.

SEC. (2012a). Internet fraud: Tips for checking out newsletters. http://www.sec.gov/investor/pubs/cyberfraud/newsletter.htm

SEC. (2012b). Investor alert: Social media and investing—Avoiding fraud. http://www.sec.gov/investor/alerts/socialmediaandfraud.pdf

Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3), 379-423.

Shannon, C. E. (1951). Prediction and entropy of printed English. Bell System Technical Journal, 30(1), 50-64.

Siering, M. (2019). The economics of stock touting during internet-based pump and dump campaigns. Information Systems Journal, 29(2), 456-483.

Siering, M., Clapham, B., Engel, O., & Gomber, P. (2017). A taxonomy of financial market manipulations: Establishing trust and market integrity in the financialized economy through automated fraud detection. Journal of Information Technology, 32(3), 251-269.

Simon, H. A. (1996). The sciences of the artificial (3rd ed.). MIT Press.

Smith, E. A., & Senter, R. J. (1967). Automated readability index. Aerospace Medical Research Laboratories.

Sonnier, G. P., McAlister, L., & Rutz, O. J. (2011). A dynamic model of the effect of online communications on firm sales. Marketing Science, 30(4), 702-716.

Symantec. (2011). Global debt crises news drives pump-and-dump stock scams. http://www.symantec.com/connect/blogs/global-debt-crises-news-drives-pump-and-dump-stock-scams

Tay, F. E. H., & Cao, L. (2001). Application of support vector machines in financial time series forecasting. Omega, 29(4), 309-317.

Teahan, W. J. (2000). Text classification and segmentation using minimum cross-entropy. Proceedings of the International Conference on Content-Based Multimedia Information Access, 943-961.

Tetlock, P. C. (2007). Giving content to investor sentiment: The role of media in the stock market. The Journal of Finance, 62(3), 1139-1168.

Tetlock, P. C., Saar-Tsechansky, M., & Macskassy, S. (2008). More than words: Quantifying language to measure firms' fundamentals. The Journal of Finance, 63(3), 1437-1467.

Vaishnavi, V. K., & Kuechler, W. (2015). Design science research methods and patterns: Innovating information and communication technology (2nd ed.). CRC Press.

Vakratsas, D., & Ambler, T. (1999). How advertising works: What do we really know? Journal of Marketing Research, 63(1), 26-43.

Valentini, G., & Masulli, F. (2002). Ensembles of learning machines. Lecture Notes in Computer Science, 2486, 3-20.

van Rijsbergen, C. J. (1979). Information retrieval (2nd ed.). Butterworths.

Webb, S., Chitti, S., & Pu, C. (2005). An experimental evaluation of spam filter performance and robustness against attack. Proceedings of the First International Conference on Collaborative Computing.

Wei, C.-P., & Dong, Y.-X. (2001). A mining-based category evolution approach to managing online document categories. Proceedings of the 34th Hawaii International Conference on System Sciences.

Wieland, A., & Marcus Wallenburg, C. (2012). Dealing with supply chain risks: Linking risk management practices and strategies to performance. International Journal of Physical Distribution & Logistics Management, 42(10), 887-905.

Witten, I. H., Frank, E., Hall, M. A., & Pal, C. J. (2016). Data mining: Practical machine learning tools and techniques (4th ed.). Morgan Kaufmann.

You, H., & Zhang, X. (2009). Financial reporting complexity and investor underreaction to 10-K information. Review of Accounting Studies, 14, 559-586.

Zhang, W., & Skiena, S. (2010). Trading strategies to exploit blog and news sentiment. Proceedings of the 4th International AAAI Conference on Weblogs and Social Media.

Zheng, R., Li, J., Chen, H., & Huang, Z. (2006). A framework for authorship identification of online messages: Writing-style features and classification techniques. Journal of the American Society for Information Science and Technology, 57(3), 378-393.

Zhou, L., & Chaovalit, P. (2008). Ontology-supported polarity mining. Journal of the American Society for Information Science and Technology, 59(1), 98-110.


Appendix: Algorithm for Document Manipulation


1. Extract all of the unique features from the document.

2. Use the SVM decision function to rank the features according to their contribution to classifying the document into the suspicious class. Because Classifier A is based on a linear kernel, this decision function takes the following form (Guyon et al., 2002):

$$d(\mathbf{x}) = \mathbf{x} \cdot \mathbf{w} + b = x_1 w_1 + x_2 w_2 + \dots + x_n w_n + b \tag{17}$$

In this equation, x is the TF-IDF vector, w is the SVM weight vector, and b is the hyperplane bias. As shown above, each of the components (the summands) contributes to the final value of d(x). The components $x_i w_i$ correspond to the features in the bag-of-words vocabulary. If the suspicious documents are labeled "1" and the non-suspicious documents are labeled "-1" in the training set, then a positive value for a particular component $x_i w_i$ would indicate that it contributes to classifying the document into the suspicious class, and the absolute value of this component would represent the degree to which the feature contributes to the final outcome. The fraudster therefore ranks the features in descending order (i.e., largest to smallest) according to their $x_i w_i$ values, such that the features that provide the greatest contributions to classifying the document into the suspicious class are at the top of this ranked list.

3. In the document, the fraudster locates the words corresponding to the topmost (100 ∙ m)% features from the list. The fraudster considers only features with positive $x_i w_i$ values (even if this means that fewer than (100 ∙ m)% features are considered). The fraudster replaces each of these words with a suitable synonym. The fraudster's lexical knowledge is modeled with two lexical resources: WordNet (Fellbaum, 1998) and SentiWordNet (Baccianella, Esuli, & Sebastiani, 2010). The fraudster modifies a word in the following manner:

a. The fraudster looks up the word's lemma in WordNet and retrieves all of its synonyms.

b. The fraudster uses SentiWordNet to determine the amount of positivity $p_i$ and the amount of negativity $n_i$ to assign to each synonym $s_i$. If the word to be replaced bears a positive sentiment ($p_i > n_i$), then only the words with $p_i > 0$ would be regarded as suitable replacements. Similarly, if the word to be replaced bears a negative sentiment ($n_i > p_i$), only the words with $n_i > 0$ are regarded as suitable replacements. The intuition behind this supposition is that the fraudster wishes to preserve the marketing effect of the document (see assumption 2).

c. The fraudster looks at the weight $w_i$ for each of the synonyms. If a synonym does not exist in the bag-of-words vocabulary, its weight $w_i$ equals 0. The synonyms are ranked in ascending order (i.e., smallest to largest) according to their weights. This ranking means that the synonyms with the most negative weights (i.e., the synonyms that contribute the most to classifying the document into the non-suspicious class) will be at the top of the list. The fraudster uses the topmost synonym from the list to replace the original word in the document.
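The following sketch condenses steps 2 and 3 of this algorithm under simplifying assumptions: the attacker's copy of Classifier A is represented by a dictionary of SVM weights over the bag-of-words vocabulary, the document's TF-IDF values are given as a dictionary, and synonym lists and (positivity, negativity) scores are passed in as plain dictionaries instead of being queried from WordNet and SentiWordNet. It illustrates the ranking-and-replacement logic only, not the original implementation.

```python
# Sketch of steps 2-3 of the document manipulation algorithm (simplified).
# `tfidf` maps the document's terms to x_i, `weights` maps vocabulary terms to w_i,
# `synonyms` and `sentiment` are stand-ins for WordNet and SentiWordNet lookups.
def manipulate(tokens, tfidf, weights, synonyms, sentiment, m):
    # Step 2: rank the document's unique features by their contribution x_i * w_i
    # to the suspicious class (largest contributions first).
    contribution = {t: tfidf.get(t, 0.0) * weights.get(t, 0.0) for t in set(tokens)}
    ranked = sorted(contribution, key=contribution.get, reverse=True)

    # Step 3: consider the topmost m-share of features, keeping only positive contributions.
    budget = int(m * len(ranked))
    to_replace = [t for t in ranked[:budget] if contribution[t] > 0]

    replacements = {}
    for word in to_replace:
        p, n = sentiment.get(word, (0.0, 0.0))
        candidates = []
        for candidate in synonyms.get(word, []):
            cp, cn = sentiment.get(candidate, (0.0, 0.0))
            # Step 3b: keep only synonyms that preserve the sentiment direction of the word
            # (the neutral case is handled permissively here, which is an assumption).
            if (p > n and cp > 0) or (n > p and cn > 0) or p == n:
                candidates.append(candidate)
        if candidates:
            # Step 3c: choose the synonym with the smallest (most negative) weight;
            # synonyms outside the vocabulary get weight 0.
            replacements[word] = min(candidates, key=lambda s: weights.get(s, 0.0))

    return [replacements.get(token, token) for token in tokens]
```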


About the Authors


Michael Siering is a postdoctoral research associate at Goethe University Frankfurt and works as a project manager
in the financial services industry. His research focuses on decision support systems in electronic markets, with a focus
on the analysis of user-generated content. His work has been published in outlets such as Journal of Management
Information Systems, Journal of the Association for Information Systems, Information Systems Journal, Journal of
Information Technology, and Decision Support Systems.
Jan Muntermann is a full professor of electronic finance and digital markets at the University of Goettingen,
Germany. His research interests include (big) data analytics and managerial decision support, digital business strategy
development and execution, and the conceptual and methodological foundations of theory development in information
systems research. He has published in journals such as Information Systems Research, Decision Support Systems, and
the European Journal of Information Systems.
Miha Grčar was a researcher in social media and news analytics at the Department of Knowledge Technologies at
Jožef Stefan Institute, Slovenia, for over 10 years. His area of expertise is a mix of data mining, text mining, stream
mining, machine learning, recommender systems, information retrieval, network analysis, and language technologies.
He is also a skilled software developer and a co-founder of Sowa Labs GmbH, a company recently acquired by Boerse
Stuttgart Group. Miha now works as the CTO and a managing director of Sowa Labs.

Copyright © 2021 by the Association for Information Systems. Permission to make digital or hard copies of all or part
of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for
profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for
components of this work owned by others than the Association for Information Systems must be honored. Abstracting
with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists requires prior
specific permission and/or fee. Request permission to publish from: AIS Administrative Office, P.O. Box 2712 Atlanta,
GA, 30301-2712 Attn: Reprints, or via email from [email protected].
