Natural Language Model For Automatic Identification of Intimate Partner Violence Reports From Twitter
Array 15 (2022) 100217
journal homepage: www.sciencedirect.com/journal/array
A R T I C L E I N F O

Keywords: Intimate partner violence; Domestic violence; Natural language processing; Machine learning; Social media

A B S T R A C T

Intimate partner violence (IPV) is a preventable public health problem that affects millions of people worldwide. Approximately one in four women are estimated to be or have been victims of severe violence at some point in their lives, irrespective of age, ethnicity, and economic status. Victims often report IPV experiences on social media, and automatic detection of such reports via machine learning may enable improved surveillance and targeted distribution of support and/or interventions for those in need. However, no artificial intelligence system for automatic detection currently exists, and we attempted to address this research gap. We collected posts from Twitter using a list of IPV-related keywords, manually reviewed subsets of retrieved posts, and prepared annotation guidelines to categorize tweets into IPV-report or non-IPV-report. We annotated 6,348 tweets in total, with an inter-annotator agreement (IAA) of 0.86 (Cohen’s kappa) among 1,834 double-annotated tweets. The class distribution in the annotated dataset was highly imbalanced, with only 668 posts (~11%) labeled as IPV-report. We then developed an effective natural language processing model to identify IPV-reporting tweets automatically. The developed model achieved classification F1-scores of 0.76 for the IPV-report class and 0.97 for the non-IPV-report class. We conducted post-classification analyses to determine the causes of system errors and to ensure that the system did not exhibit biases in its decision making, particularly with respect to race and gender. Our automatic model can be an essential component of a proactive social media-based intervention and support framework, while also aiding population-level surveillance and large-scale cohort studies.
* Corresponding author.
E-mail address: [email protected] (M.A. Al-Garadi).
https://doi.org/10.1016/j.array.2022.100217
Received 11 January 2022; Received in revised form 1 July 2022; Accepted 2 July 2022; Available online 20 July 2022
2590-0056/© 2022 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

Intimate partner violence (IPV) is a preventable public health problem that impacts millions of people worldwide [1]. IPV can be defined as physical or sexual assault, or both, of a spouse, partner, or cohabiting dating couple [2,3]. Approximately one in four women in the United States (US) is estimated to be or have been a victim of severe violence at some point in her life, irrespective of age, ethnicity, and economic status [1]. IPV victims suffer from physical or mental health problems in the short and long term, including injury, pain, sleep problems, depression, post-traumatic stress disorder (PTSD), and suicide [4]. Family members and children who live in or witness violence can also experience adverse health and social developmental issues [5]. Notably, children exposed to IPV are more likely to experience depression, anxiety, PTSD, dissociation, and anger [6,7]. Physical health outcomes include injuries, sexually transmitted diseases, back and limb problems, memory loss, dizziness, and gastrointestinal conditions. Female victims could experience undesirable reproductive outcomes, including miscarriages and gynecologic disorders. From an economic perspective, the approximate IPV lifetime cost is $103,767 per female victim and $23,414 per male victim; the population economic burden amounts to nearly $3.6 trillion (2014 US$) over victims’ lifetimes, 37% of which ($1.3 trillion) is paid by the government [8].

Emerging data show that since the outbreak of COVID-19, reports of IPV have increased worldwide [9]. In the US, IPV reports in the early phases of COVID-19 increased in many states and cities compared to prior years [10,11]. The current crisis, described as a “horrifying global surge in domestic violence” by the United Nations Chief [12], may be
attributed to mandatory lockdowns or movement restrictions to curb the spread of COVID-19 [12,13]; isolation of IPV victims with their abusers at home; and elevated tension between partners, compounded by security, health, and financial worries (e.g., socioeconomic instability and business closures) [12,14]. The Sustainable Development Goals—a collection of 17 interlinked global goals designed to be a “blueprint for achieving a better and more sustainable future for all”—focus on eliminating all forms of violence against women and girls [15,16].

Conventionally, IPV-related data have been collected through surveys and medical/police reports [17,18]. It has become challenging to obtain IPV-related data from these traditional sources during the COVID-19 pandemic because IPV victims, for example, were reluctant to see healthcare providers due to the fear of contracting the virus [17,18]. Social media websites (SMWs), such as Twitter and Reddit, can potentially get around the pandemic-induced obstacles by collecting data online. More than 4.5 billion individuals use SMWs globally, many of whom use them for extended periods [19]. SMWs have become a new communication tool for people to express their thoughts, emotions, and opinions and to discuss daily problems, regardless of geographical locality. During the COVID-19 pandemic and lockdown, SMWs have become many people’s primary channels for expressing and sharing information, emotions, and details of daily life with their friends and family members [20]. SMWs may have particular utility for IPV victims because victims tend to share their sensitive information more with friends (64%) and family members (49%) than with healthcare providers (26%), police (23%), and shelter advocates (20%) [21,22]. Also, SMWs have been shown in past research to be excellent sources for collecting live data precisely, at scale (i.e., a large number of observations), anonymously, unobtrusively, and at a low cost [23,24]. Thus, SMW data about IPV and other digital footprints of IPV victims’ behaviors can be analyzed in real time to model patterns of IPV and discover underlying risk factors at the population and individual levels [25]. Moreover, during the COVID-19 pandemic, the United Nations urged increased investment in online technology and civil society organizations to build harmless means for women to find help and support without informing their abusers, and accordingly reduce domestic violence [12]. SMWs, where users can share anonymous postings, may serve as more secure platforms than other means (e.g., phone calls, emails) for offering instrumental, informational, and social support to IPV victims. These advantages of SMWs may complement (not substitute for) conventional resources (e.g., hotlines, shelters) and strengthen efforts to prevent IPV and support IPV victims.

To the best of our knowledge, there has been no study employing social media analytics (e.g., natural language processing (NLP), machine learning) to identify IPV victims on SMWs for surveillance, prevention, and/or intervention. Therefore, this study aims to develop an automated, social media-based system to detect and categorize streaming IPV-related big data (into IPV and non-IPV cases) during the COVID-19 pandemic on Twitter via NLP and machine learning.

1.1. Significance

• We are also the first to develop an effective NLP pipeline involving supervised machine learning that can automatically extract and classify IPV-related tweets.
• We present a thorough analysis of the classification errors and of model performance at different training data sizes, as this information may be crucial for future research.
• We present an analysis of potential biases and of the trustworthiness of our model’s performance through content analysis and the approach of layered integrated gradients.
• We describe challenges faced during our analysis and lessons learned, which will help future researchers design similar NLP systems for social media-based data.

2. Methods

2.1. Data collection

We collected publicly available English posts (tweets) related to IPV from Twitter using the SMW’s public streaming application programming interface (API). The keywords we used are provided in Supplementary Table S1. We also used the Python library snscrape to collect data during the COVID-19 pandemic between January 1, 2020 and March 31, 2021.

2.2. Annotation guidelines

Four annotators encoded each tweet into one of two categories: personal (reported by victims themselves) IPV-report (or IPV) or non-IPV-report (or non-IPV). The categorization of each tweet was based on the definition of IPV. The Centers for Disease Control and Prevention defines IPV as physical violence, sexual violence, stalking, and psychological aggression (including multiple coercive tactics) by a current or former intimate partner (i.e., spouse, boyfriend or girlfriend, dating partner, or ongoing sexual partner) [27]. Supplementary Table S2 provides comprehensive details and examples of the various types of IPV. Two necessary factors determined whether a tweet was a self-report of IPV: (1) mention of an intimate partner as an abuser and (2) mention or description of any type of abuse (physical violence, sexual violence, stalking, or psychological aggression) or abusive tactics.

We conducted the annotation process iteratively over small datasets (~200 tweets). Guided by the definition of IPV and a domain expert (SK), the annotators discussed disagreements in the early annotations until consistent coding rules were reached. With the finalized annotation guidelines, the annotators encoded the final dataset (n = 6,348) used for training and testing our NLP model. The gold-standard dataset was developed once reliable levels of agreement between the annotators were achieved. A subset of the tweets was annotated twice or thrice for computing inter-annotator agreement (IAA). For double-annotated tweets, if the annotators disagreed on the class, the tweet was assessed by an independent annotator who resolved the disagreement. The final IAA was calculated using Cohen’s kappa [28].
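The date-bounded keyword collection described in Section 2.1 maps onto a snscrape-style search query. A minimal sketch of assembling such a query is below; the keywords are illustrative placeholders, not the study's actual list from Supplementary Table S1:

```python
# Sketch: build a date-bounded Twitter search query of the kind snscrape's
# TwitterSearchScraper accepts. The keywords are hypothetical examples.
def build_query(keywords, since, until, lang="en"):
    keyword_clause = " OR ".join(f'"{k}"' for k in keywords)
    return f"({keyword_clause}) lang:{lang} since:{since} until:{until}"

query = build_query(["domestic violence", "abusive partner"],
                    since="2020-01-01", until="2021-03-31")
```

With snscrape installed, such a query could be passed to `snscrape.modules.twitter.TwitterSearchScraper(query).get_items()`; that call is omitted here because it requires network access.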
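The IAA computation for the double-annotated subset (Section 2.2) compares observed agreement with chance agreement. A plain-Python sketch of unweighted Cohen's kappa, using toy labels rather than the study data:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # chance agreement: product of each annotator's marginal class proportions
    expected = sum((counts_a[c] / n) * (counts_b[c] / n)
                   for c in set(labels_a) | set(labels_b))
    return (observed - expected) / (1 - expected)

# toy example: 6 tweets labeled IPV (1) / non-IPV (0) by two annotators
kappa = cohens_kappa([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1])
```

Here observed agreement is 4/6 and chance agreement is 0.5, giving kappa = 1/3; the study's reported IAA of 0.86 indicates far stronger agreement.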
predefined tags (i.e., URL and <user>). We also lowercased all text.

Traditional machine learning models: For the traditional classifiers, we produced the 1,000 most frequent n-grams (adjacent series of words with n ranging from 1 to 3: unigrams (n = 1), bigrams (n = 2), and trigrams (n = 3)) [37]. The stemmed and lowercased tokens from the texts were used to generate the n-gram vectors.

Deep learning model: We converted each word into a corresponding word vector and then fed it into the BiLSTM classifier. For word-to-vector conversion, we used the Twitter GloVe word embeddings trained on 2 billion tweets and 27 billion tokens, with a 1.2 million-word vocabulary. We used uncased GloVe embeddings with 200-dimensional vectors [38].

Transformer-based models: We used BERT and RoBERTa. The preprocessed training tweets were fed into each model to fine-tune it for IPV classification. The hyper-parameters and technical details are presented in Supplementary Table S3.

Model training and validation: Our primary objective was to create a model for identifying IPV tweets from streaming Twitter data. Our primary metric for assessing classifier performance was the F1-score (harmonic mean of precision and recall) for the IPV class. We divided the annotated dataset into training (70%), development (10%), and test (20%) sets. We used the training set for training models, the development set for optimizing model hyperparameters, and the test set to evaluate and compare classifier performances.

2.4. Post-classification analyses

Learning curve analysis: We evaluated model performance at different sizes of training data (20%, 40%, 60%, 80%, 100%) to investigate the behavior of each model at a given training percentage and to assess whether growing the training dataset improved the models’ performance on the fixed test dataset.

Error analysis: To identify potential reasons for misclassifications, we studied the common patterns of errors made by our model by manually analyzing the contents of a subset of misclassified tweets.

Bias analysis: Generally, humans recognize the meanings of sentences by focusing on the most important words. Some words play more important roles than others in understanding a post’s meaning and deciding its class (e.g., IPV or non-IPV). We aimed to examine the words on which our best-performing model focuses and which ones carry more importance than others in the classification decisions. This process helps us ensure that the developed model has low bias and is trustworthy, and helps us understand the model’s errors. To achieve this objective, we used the approach of layered integrated gradients [39] implemented in Captum [40,41]. The approach is based on an attribution technique called integrated gradients [42], an interpretable algorithm that calculates word importance scores by approximating the integral of the gradients of the model’s outputs with respect to the inputs [42,43]. We analyzed a random sample of 5% of the posts from the test set in this manner.

3. Results

3.2. Classification results

The classifier performances on the test set are shown in Table 1. Accuracies and class-specific F1-scores are reported for each classifier. The confusion matrix for the best-performing classifier (i.e., RoBERTa) is shown in Fig. 2. As Table 1 illustrates, the BERT model also obtained competitive results. However, the traditional machine learning and deep learning models did not perform well, particularly on the primary evaluation metric (IPV-report F1-score), with no significant differences between these classifiers.

3.3. Learning curve

Fig. 3 shows the learning curve at different percentages of training data (20%, 40%, 60%, 80%, 100%) and the performance obtained on fixed test data. Overall, performance tends to improve with increasing training data, which is expected, particularly from 20% to 80%. With only 40% of the training dataset, the pre-trained models (i.e., BERT, RoBERTa) performed similarly to the traditional and deep learning models trained on 100% of the training dataset.

3.4. Error and model analyses

Given that the RoBERTa model (Table 1) produced the best performance compared with the other models, we used it for our NLP pipeline and investigated the errors made by this classifier. For the tweets annotated as non-IPV-report, few non-IPV examples were classified as IPV, as reflected by a very high F1-score for the non-IPV class (i.e., 0.97). Nevertheless, in a few cases, an error occurred when tweets contained users’ indirect IPV experiences (reporting IPV not experienced personally but by others). For instance, “my ex-neighbours rowed a lot. he was a big bloke, over day i heard him hit her; my heart beating out my chest, I knocked on their door. it stopped that fight. <hashtag> domestic violence.” Additionally, non-IPV examples were misclassified as IPV when tweets contained hypothetical scenarios not experienced by the authors in the real world, such as: “that is completely different issue. people not wanting to have children is different from divorce. i could have children with good intentions in mind, and still get divorced bc my husband turns abusive.” In contrast, for the tweets annotated as IPV, misclassifications as non-IPV often occurred when the author’s IPV was implied in the tweet but was not explicit. For instance, “I hate how abusers justify their abuse of their partner in their head. That was what my ex would do. the physical scars heal, but the mental ones remain.” In this example, the author did not state up front that her ex-partner harmed her body and mind. “my ex would do” did not contain any indicator of IPV; hence, the tweet was inaccurately considered to be non-IPV by our model.

3.5. Model behavior and bias analysis

We first checked whether our model’s classification outcomes were biased toward a specific gender or racial group. We made this our focus because many recent studies have shown that machine learning models
Fig. 1. The general framework describing the overall process for developing the NLP pipeline for classifying IPV-related tweets.
Fig. 3. Classifier performances at different training set sizes (learning curves). SVM = Support Vector Machine, DT = Decision tree, NN = Neural Network, BiLSTM =
Bi-Directional Long Short-Term Memory, BERT = Bidirectional Encoder Representations from Transformers, RoBERTa = Robustly Optimized BERT.
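The n-gram featurization used by the traditional classifiers (the 1,000 most frequent n-grams, n = 1 to 3) can be sketched in plain Python. The tiny corpus and vocabulary size below are illustrative, and the study's stemming step is omitted:

```python
from collections import Counter

def ngrams(tokens, n):
    """Adjacent n-word sequences from a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def all_ngrams(text, n_max=3):
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        grams.extend(ngrams(tokens, n))
    return grams

def build_vocab(corpus, k):
    """Keep the k most frequent uni-, bi-, and trigrams across the corpus."""
    counts = Counter()
    for text in corpus:
        counts.update(all_ngrams(text))
    return [gram for gram, _ in counts.most_common(k)]

def vectorize(text, vocab):
    """Count-based feature vector over the fixed n-gram vocabulary."""
    counts = Counter(all_ngrams(text))
    return [counts[gram] for gram in vocab]

vocab = build_vocab(["he hit her", "he hit me"], k=3)
vec = vectorize("he hit her", vocab)
```

In practice a library vectorizer (e.g., one configured with an n-gram range of 1 to 3 and a 1,000-feature cap) would serve the same purpose at scale.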
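The integrated-gradients attribution underlying the bias analysis (Section 2.4) integrates the model's gradient along a straight path from a baseline to the input. A one-dimensional toy sketch follows; this is not the layered variant applied to RoBERTa embeddings, and the quadratic function is purely illustrative:

```python
def integrated_gradient(grad_f, x, baseline=0.0, steps=1000):
    """Midpoint-rule approximation of
    (x - baseline) * integral_0^1 f'(baseline + a * (x - baseline)) da."""
    total = 0.0
    for k in range(steps):
        a = (k + 0.5) / steps
        total += grad_f(baseline + a * (x - baseline))
    return (x - baseline) * total / steps

# toy model f(x) = x**2, so f'(x) = 2x; attribute the output at x = 3
attr = integrated_gradient(lambda x: 2.0 * x, x=3.0)
# completeness axiom [42]: the attribution approximates f(x) - f(baseline)
```

In the word-level case, the same integral is taken over embedding dimensions per token, and the per-token sums give the importance scores visualized in Table 2.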
Table 2
Bias and model outcome analyses conducted using the layered integrated gradients approach. Highlighted text segments show where the model focused when making classification decisions. IPV = positive class; non-IPV = negative class.

Gender bias analysis
Example tweet: “I am a woman, it took me three years to leave this abusive boyfriend.” Annotated label: Positive (1). Predicted label: Positive (1). Observation: the label was correctly classified, and the classification outcome did not change when we changed the gender-related word in the example; the main important word for classifying the positive class (green highlights) is the same in both examples.
4. Discussion
The current strategy of automatic classification does not take into account neighboring posts in threads. In the future, we will explore strategies for incorporating thread-level information into the classification process.

The datasets used to pre-train the RoBERTa and BERT models mainly contain a corpus of books (800 million words) and English Wikipedia. The way people express themselves on SMWs differs from the texts in books [37]. Efforts have been made to pre-train BERT-like models on social media. However, previous research showed that the RoBERTa model based on traditional text still performs better than these models on several social media NLP tasks [51,52]. A potential direction may be combining traditional data with social media data to build a hybrid model trained on both a large text corpus and social media language.

5. Conclusion

Although IPV victims often reach out for support and intervention through social media channels such as Twitter, there has been little effort to use such platforms to address this public health problem. Posts about IPV are typically lost in the massive volume of data constantly posted on social media. Developing an effective, low-bias, and trustworthy model for classifying self-reported IPV on social media has significant practical applications. This study developed and evaluated an NLP pipeline to collect and classify posts from the Twitter platform to identify IPV-related tweets. Our NLP pipeline achieved performance comparable to humans and was shown to be largely unbiased with respect to gender- or race-related words. By identifying IPV victims on Twitter, our model will lay the groundwork to design and deliver evidence-based interventions and support to IPV victims, and may enable us to address the problem of IPV in close to real time via social media.

Authors’ contributions

MAA, SK, and AS designed the experiments. MAA, SK, YCY, YG, EW, SL, and AS conducted the data collection, analysis, and evaluations. MAA, SK, and AS wrote the manuscript. MAA, SK, EW, and AS helped interpret relevant findings and discussed and analyzed the results.

Funding statement

The study was funded by the Injury Prevention Research Center at Emory (IPRCE), Emory University.

Declaration of competing interest

The authors have declared no competing interest.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.array.2022.100217.

References

[1] Smith SG, et al. The National Intimate Partner and Sexual Violence Survey (NISVS): 2015 data brief – updated release. Atlanta, GA: National Center for Injury Prevention and Control, Centers for Disease Control and Prevention; 2018.
[2] Campbell JC. Health consequences of intimate partner violence. The Lancet 2002;359(9314):1331–6.
[3] Capaldi DM, Knoble NB, Shortt JW, Kim HK. A systematic review of risk factors for intimate partner violence. Partner Abuse 2012;3(2):231–80.
[4] Dillon G, Hussain R, Loxton D, Rahman S. Mental and physical health and intimate partner violence against women: a review of the literature. International Journal of Family Medicine 2013;2013.
[5] Karakurt G, Koç E, Çetinsaya EE, Ayluçtarhan Z, Bolen S. Meta-analysis and systematic review for the treatment of perpetrators of intimate partner violence. Neurosci Biobehav Rev 2019;105:220–30.
[6] Cummings EM, Davies PT. Maternal depression and child development. J Child Psychol Psychiatry 1994;35(1):73–122. https://acamh.onlinelibrary.wiley.com/doi/abs/10.1111/j.1469-7610.1994.tb01133.x.
[7] Johnson RM, et al. Adverse behavioral and emotional outcomes from child abuse and witnessed violence. Child Maltreat 2002;7(3):179–86.
[8] Peterson C, et al. Lifetime economic burden of intimate partner violence among U.S. adults. Am J Prev Med 2018;55(4):433–44. https://doi.org/10.1016/j.amepre.2018.04.049.
[9] Kim S, Sarker A, Sales JM. The use of social media to prevent and reduce intimate partner violence during COVID-19 and beyond. Partner Abuse 2021;12(4):512–8.
[10] Portland Police Bureau. Trends analysis: pre & post school closures – April 15, 2020. 2020. Accessed August 20, 2020. https://www.portlandoregon.gov/police/article/760237.
[11] Cuomo AM. Following spike in domestic violence during COVID-19 pandemic, Secretary to the Governor Melissa DeRosa & NYS Council on Women & Girls launch task force to find innovative solutions to crisis. Accessed August 20, 2020. https://www.governor.ny.gov/news/following-spike-domestic-violence-during-covid-19-pandemic-secretary-governor-melissa-derosa.
[12] United Nations. UN chief calls for domestic violence ‘ceasefire’ amid ‘horrifying global surge’. UN News; 2020.
[13] Boserup B, McKenney M, Elkbuli A. Alarming trends in US domestic violence during the COVID-19 pandemic. Am J Emerg Med 2020. https://doi.org/10.1016/j.ajem.2020.04.077.
[14] Gosangi B, et al. Exacerbation of physical intimate partner violence during COVID-19 lockdown. Radiology 2020. https://doi.org/10.1148/radiol.2020202866.
[15] Agüero JM. COVID-19 and the rise of intimate partner violence. World Development 2021;137:105217.
[16] The Spotlight Initiative. Ending violence against women and girls. Accessed November 16, 2021. https://www.un.org/sustainabledevelopment/ending-violence-against-women-and-girls/.
[17] Gosangi B, et al. Exacerbation of physical intimate partner violence during COVID-19 pandemic. Radiology 2021;298(1):E38–45.
[18] Moreira DN, Pinto da Costa M. The impact of the Covid-19 pandemic in the precipitation of intimate partner violence. Int J Law Psychiatry 2020;71:101606. https://doi.org/10.1016/j.ijlp.2020.101606.
[19] Kepios. Global social media statistics. Accessed June 26, 2022. https://datareportal.com/social-media-users.
[20] Koeze E, Popper N. The virus changed the way we internet. The New York Times; 2020.
[21] Glass N, Eden KB, Bloom T, Perrin N. Computerized aid improves safety decision process for survivors of intimate partner violence. J Interpers Violence 2010;25(11):1947–64. https://doi.org/10.1177/0886260509354508.
[22] Glass N, Eden KB, Bloom T, Perrin N. Computerized aid improves safety decision process for survivors of intimate partner violence. J Interpers Violence 2010;25(11):1947–64.
[23] Schwab-Reese LM, Hovdestad W, Tonmyr L, Fluke J. The potential use of social media and other internet-related data and communications for child maltreatment surveillance and epidemiological research: scoping review and recommendations. Child Abuse Negl 2018;85:187–201. https://doi.org/10.1016/j.chiabu.2018.01.014.
[24] Lin H, et al. User-level psychological stress detection from social media using deep neural network. In: Proceedings of the 22nd ACM International Conference on Multimedia; Orlando, Florida, USA; 2014. https://doi.org/10.1145/2647868.2654945.
[25] Merchant RM, Lurie N. Social media and emergency preparedness in response to novel coronavirus. JAMA 2020;323(20):2011–2. https://doi.org/10.1001/jama.2020.4469.
[26] El Morr C, Layal M. Effectiveness of ICT-based intimate partner violence interventions: a systematic review. BMC Public Health 2020;20(1):1–25.
[27] Breiding M, Basile KC, Smith SG, Black MC, Mahendra RR. Intimate partner violence surveillance: uniform definitions and recommended data elements, version 2.0. CDC report; 2015.
[28] Cohen J. Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull 1968;70(4):213–20. https://doi.org/10.1037/h0026256.
[29] Safavian SR, Landgrebe D. A survey of decision tree classifier methodology. IEEE Transactions on Systems, Man, and Cybernetics 1991;21(3):660–74.
[30] Chang CC, Lin CJ. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2011;2(3):1–27. https://doi.org/10.1145/1961189.1961199.
[31] Platt J. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in Large Margin Classifiers 1999;10(3):61–74.
[32] Wang S-C. Artificial neural network. In: Interdisciplinary computing in Java programming. Springer; 2003. p. 81–100.
[33] Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997;9(8):1735–80. https://doi.org/10.1162/neco.1997.9.8.1735.
[34] Schuster M, Paliwal KK. Bidirectional recurrent neural networks. IEEE Trans Signal Process 1997;45(11):2673–81.
[35] Devlin J, Chang M-W, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805; 2018.
[36] Liu Y, et al. RoBERTa: a robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692; 2019.
[37] Al-Garadi MA, et al. Text classification models for the automatic detection of nonmedical prescription medication use from social media. BMC Med Inform Decis Mak 2021;21(1):1–13.
[38] Pennington J, Socher R, Manning CD. GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP); 2014. p. 1532–43.
[39] Mudrakarta PK, Taly A, Sundararajan M, Dhamdhere K. Did the model understand the question? arXiv preprint arXiv:1805.05492; 2018.
[40] Kokhlikyan N, et al. Captum: a unified and generic model interpretability library for PyTorch. arXiv preprint arXiv:2009.07896; 2020.
[41] Pierse C. Transformers Interpret. 2021. https://github.com/cdpierse/transformers-interpret.
[42] Sundararajan M, Taly A, Yan Q. Axiomatic attribution for deep networks. In: International Conference on Machine Learning. PMLR; 2017. p. 3319–28.
[43] Hayati SA, Kang D, Ungar L. Does BERT learn as humans perceive? Understanding linguistic styles through lexica. arXiv preprint arXiv:2109.02738; 2021.
[44] Viera AJ, Garrett JM. Understanding interobserver agreement: the kappa statistic. Fam Med 2005;37(5):360–3.
[45] Lwowski B, Rios A. The risk of racial bias while tracking influenza-related content on social media using machine learning. J Am Med Inform Assoc 2021;28(4):839–49.
[46] DeBrusk C. The risk of machine-learning bias (and how to prevent it). MIT Sloan Management Review; 2018.
[47] Abburi H, Parikh P, Chhaya N, Varma V. Semi-supervised multi-task learning for multi-label fine-grained sexism classification. In: Proceedings of the 28th International Conference on Computational Linguistics; 2020. p. 5810–20.
[48] Anzovino M, Fersini E, Rosso P. Automatic identification and classification of misogynistic language on Twitter. In: International Conference on Applications of Natural Language to Information Systems. Springer; 2018. p. 57–64.
[49] Frenda S, Ghanem B, Montes-y-Gómez M, Rosso P. Online hate speech against women: automatic identification of misogyny and sexism on Twitter. J Intell Fuzzy Syst 2019;36(5):4743–52.
[50] Poletto F, Basile V, Sanguinetti M, Bosco C, Patti V. Resources and benchmark corpora for hate speech detection: a systematic review. Comput Humanit 2021;55(2):477–523.
[51] Guo Y, Dong X, Al-Garadi MA, Sarker A, Paris C, Aliod DM. Benchmarking of transformer-based pre-trained models on social media text classification datasets. In: Proceedings of the 18th Annual Workshop of the Australasian Language Technology Association; 2020. p. 86–91.
[52] Guo Y, Ge Y, Al-Garadi MA, Sarker A. Pre-trained transformer-based classification and span detection models for social media health applications. In: Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task; 2021. p. 52–7.