Showing 1–39 of 39 results for author: Goharian, N

Searching in archive cs.
  1. Lexically-Accelerated Dense Retrieval

    Authors: Hrishikesh Kulkarni, Sean MacAvaney, Nazli Goharian, Ophir Frieder

    Abstract: Retrieval approaches that score documents based on learned dense vectors (i.e., dense retrieval) rather than lexical signals (i.e., conventional retrieval) are increasingly popular. Their ability to identify related documents that do not necessarily contain the same terms as those appearing in the user's query (thereby improving recall) is one of their key advantages. However, to actually achieve…

    Submitted 31 July, 2023; originally announced July 2023.

    Comments: SIGIR 2023

  2. arXiv:2307.07586  [pdf, other]

    cs.CL

    QontSum: On Contrasting Salient Content for Query-focused Summarization

    Authors: Sajad Sotudeh, Nazli Goharian

    Abstract: Query-focused summarization (QFS) is a challenging task in natural language processing that generates summaries to address specific queries. The broader field of Generative Information Retrieval (Gen-IR) aims to revolutionize information extraction from vast document corpora through generative approaches, encompassing Generative Document Retrieval (GDR) and Grounded Answer Retrieval (GAR). This pa…

    Submitted 14 July, 2023; originally announced July 2023.

    Comments: 9 pages, Long paper accepted at Gen-IR@SIGIR23

  3. arXiv:2302.01342  [pdf, other]

    cs.CL

    Curriculum-Guided Abstractive Summarization

    Authors: Sajad Sotudeh, Hanieh Deilamsalehy, Franck Dernoncourt, Nazli Goharian

    Abstract: Recent Transformer-based summarization models have provided a promising approach to abstractive summarization. They go beyond sentence selection and extractive strategies to deal with more complicated tasks such as novel word generation and sentence paraphrasing. Nonetheless, these models have two shortcomings: (1) they often perform poorly in content selection, and (2) their training strategy is…

    Submitted 8 February, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

    Comments: 8 pages, Long paper. arXiv admin note: text overlap with arXiv:2302.00954

  4. arXiv:2302.00954  [pdf, other]

    cs.CL cs.AI

    Curriculum-guided Abstractive Summarization for Mental Health Online Posts

    Authors: Sajad Sotudeh, Nazli Goharian, Hanieh Deilamsalehy, Franck Dernoncourt

    Abstract: Automatically generating short summaries from users' online mental health posts could save counselors' reading time and reduce their fatigue so that they can provide timely responses to those seeking help for improving their mental state. Recent Transformers-based summarization models have presented a promising approach to abstractive summarization. They go beyond sentence selection and extractive…

    Submitted 2 February, 2023; originally announced February 2023.

    Comments: 4 pages, short paper, accepted to The 13th International Workshop on Health Text Mining and Information Analysis (LOUHI 2022)

  5. arXiv:2206.00856  [pdf, other]

    cs.CL

    MentSum: A Resource for Exploring Summarization of Mental Health Online Posts

    Authors: Sajad Sotudeh, Nazli Goharian, Zachary Young

    Abstract: Mental health remains a significant challenge of public health worldwide. With increasing popularity of online platforms, many use the platforms to share their mental health conditions, express their feelings, and seek help from the community and counselors. Some of these platforms, such as Reachout, are dedicated forums where the users register to seek help. Others such as Reddit provide subreddi…

    Submitted 1 June, 2022; originally announced June 2022.

    Comments: 8 pages, LREC 2022 Long Paper

  6. arXiv:2206.00847  [pdf, other]

    cs.CL

    TSTR: Too Short to Represent, Summarize with Details! Intro-Guided Extended Summary Generation

    Authors: Sajad Sotudeh, Nazli Goharian

    Abstract: Many scientific papers such as those in arXiv and PubMed data collections have abstracts with varying lengths of 50-1000 words and average length of approximately 200 words, where longer abstracts typically convey more information about the source paper. Up to recently, scientific summarization research has typically focused on generating short, abstract-like summaries following the existing datas…

    Submitted 1 June, 2022; originally announced June 2022.

    Comments: 9 pages, NAACL 2022 Long Paper

  7. arXiv:2110.01159  [pdf, other]

    cs.CL

    TLDR9+: A Large Scale Resource for Extreme Summarization of Social Media Posts

    Authors: Sajad Sotudeh, Hanieh Deilamsalehy, Franck Dernoncourt, Nazli Goharian

    Abstract: Recent models in developing summarization systems consist of millions of parameters and the model performance is highly dependent on the abundance of training data. While most existing summarization corpora contain data in the order of thousands to one million, generation of large-scale summarization datasets in order of couple of millions is yet to be explored. Practically, more data is better at…

    Submitted 5 October, 2021; v1 submitted 3 October, 2021; originally announced October 2021.

    Comments: Accepted to New Frontiers in Summarization Workshop (EMNLP 2021)

  8. Simplified Data Wrangling with ir_datasets

    Authors: Sean MacAvaney, Andrew Yates, Sergey Feldman, Doug Downey, Arman Cohan, Nazli Goharian

    Abstract: Managing the data for Information Retrieval (IR) experiments can be challenging. Dataset documentation is scattered across the Internet and once one obtains a copy of the data, there are numerous different data formats to work with. Even basic formats can have subtle dataset-specific nuances that need to be considered for proper use. To help mitigate these challenges, we introduce a new robust and…

    Submitted 10 May, 2021; v1 submitted 3 March, 2021; originally announced March 2021.

    Comments: SIGIR 2021 Resource

  9. arXiv:2103.01328  [pdf, other]

    cs.CL

    ToxCCIn: Toxic Content Classification with Interpretability

    Authors: Tong Xiang, Sean MacAvaney, Eugene Yang, Nazli Goharian

    Abstract: Despite the recent successes of transformer-based models in terms of effectiveness on a variety of tasks, their decisions often remain opaque to humans. Explanations are particularly important for tasks like offensive language or toxicity detection on social media because a manual appeal process is often in place to dispute automatically flagged content. In this work, we propose a technique to imp…

    Submitted 1 March, 2021; originally announced March 2021.

    Comments: Long paper accepted to WASSA2021@EACL

  10. arXiv:2012.14136  [pdf, other]

    cs.CL

    On Generating Extended Summaries of Long Documents

    Authors: Sajad Sotudeh, Arman Cohan, Nazli Goharian

    Abstract: Prior work in document summarization has mainly focused on generating short summaries of a document. While this type of summary helps get a high-level view of a given document, it is desirable in some cases to know more detailed information about its salient points that can't fit in a short summary. This is typically the case for longer documents such as a research paper, legal document, or a book…

    Submitted 28 December, 2020; originally announced December 2020.

    Comments: Accepted at SDU 2021

  11. ABNIRML: Analyzing the Behavior of Neural IR Models

    Authors: Sean MacAvaney, Sergey Feldman, Nazli Goharian, Doug Downey, Arman Cohan

    Abstract: Pretrained contextualized language models such as BERT and T5 have established a new state-of-the-art for ad-hoc search. However, it is not yet well-understood why these methods are so effective, what makes some variants more effective than others, and what pitfalls they may have. We present a new comprehensive framework for Analyzing the Behavior of Neural IR ModeLs (ABNIRML), which includes new…

    Submitted 20 July, 2023; v1 submitted 1 November, 2020; originally announced November 2020.

    Comments: TACL version

  12. arXiv:2010.05987  [pdf, other]

    cs.CL cs.IR

    SLEDGE-Z: A Zero-Shot Baseline for COVID-19 Literature Search

    Authors: Sean MacAvaney, Arman Cohan, Nazli Goharian

    Abstract: With worldwide concerns surrounding the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), there is a rapidly growing body of scientific literature on the virus. Clinicians, researchers, and policy-makers need to be able to search these articles effectively. In this work, we present a zero-shot ranking algorithm that adapts to COVID-related scientific literature. Our approach filters tr…

    Submitted 12 October, 2020; originally announced October 2020.

    Comments: EMNLP 2020. This article draws heavily from arXiv:2005.02365

  13. arXiv:2008.09703  [pdf, ps, other]

    cs.CL

    Team DoNotDistribute at SemEval-2020 Task 11: Features, Finetuning, and Data Augmentation in Neural Models for Propaganda Detection in News Articles

    Authors: Michael Kranzlein, Shabnam Behzad, Nazli Goharian

    Abstract: This paper presents our systems for SemEval 2020 Shared Task 11: Detection of Propaganda Techniques in News Articles. We participate in both the span identification and technique classification subtasks and report on experiments using different BERT-based models along with handcrafted features. Our models perform well above the baselines for both tasks, and we contribute ablation studies and discu…

    Submitted 21 August, 2020; originally announced August 2020.

  14. arXiv:2007.14477  [pdf, ps, other]

    cs.CL

    GUIR at SemEval-2020 Task 12: Domain-Tuned Contextualized Models for Offensive Language Detection

    Authors: Sajad Sotudeh, Tong Xiang, Hao-Ren Yao, Sean MacAvaney, Eugene Yang, Nazli Goharian, Ophir Frieder

    Abstract: Offensive language detection is an important and challenging task in natural language processing. We present our submissions to the OffensEval 2020 shared task, which includes three English sub-tasks: identifying the presence of offensive language (Sub-task A), identifying the presence of target in offensive language (Sub-task B), and identifying the categories of the target (Sub-task C). Our expe…

    Submitted 28 July, 2020; originally announced July 2020.

    Comments: SemEval 2020

  15. arXiv:2005.08805  [pdf, ps, other]

    cs.CL

    Interaction Matching for Long-Tail Multi-Label Classification

    Authors: Sean MacAvaney, Franck Dernoncourt, Walter Chang, Nazli Goharian, Ophir Frieder

    Abstract: We present an elegant and effective approach for addressing limitations in existing multi-label classification models by incorporating interaction matching, a concept shown to be useful for ad-hoc search result ranking. By performing soft n-gram interaction matching, we match labels with natural language descriptions (which are common to have in most multi-labeling tasks). Our approach can be used…

    Submitted 18 May, 2020; originally announced May 2020.

  16. arXiv:2005.02365  [pdf, other]

    cs.IR cs.CL

    SLEDGE: A Simple Yet Effective Baseline for COVID-19 Scientific Knowledge Search

    Authors: Sean MacAvaney, Arman Cohan, Nazli Goharian

    Abstract: With worldwide concerns surrounding the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), there is a rapidly growing body of literature on the virus. Clinicians, researchers, and policy-makers need a way to effectively search these articles. In this work, we present a search system called SLEDGE, which utilizes SciBERT to effectively re-rank articles. We train the model on a general-do…

    Submitted 3 August, 2020; v1 submitted 5 May, 2020; originally announced May 2020.

  17. arXiv:2005.00163  [pdf, other]

    cs.CL

    Attend to Medical Ontologies: Content Selection for Clinical Abstractive Summarization

    Authors: Sajad Sotudeh, Nazli Goharian, Ross W. Filice

    Abstract: Sequence-to-sequence (seq2seq) network is a well-established model for text summarization task. It can learn to produce readable content; however, it falls short in effectively identifying key regions of the source. In this paper, we approach the content selection problem for clinical abstractive summarization by augmenting salient ontological terms into the summarizer. Our experiments on two publ…

    Submitted 30 April, 2020; originally announced May 2020.

    Comments: Accepted to ACL 2020

  18. Training Curricula for Open Domain Answer Re-Ranking

    Authors: Sean MacAvaney, Franco Maria Nardini, Raffaele Perego, Nicola Tonellotto, Nazli Goharian, Ophir Frieder

    Abstract: In precision-oriented tasks like answer ranking, it is more important to rank many relevant answers highly than to retrieve all relevant answers. It follows that a good ranking strategy would be to learn how to identify the easiest correct answers first (i.e., assign a high ranking score to answers that have characteristics that usually indicate relevance, and a low ranking score to those with cha…

    Submitted 21 May, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

    Comments: Accepted at SIGIR 2020 (long)

  19. Efficient Document Re-Ranking for Transformers by Precomputing Term Representations

    Authors: Sean MacAvaney, Franco Maria Nardini, Raffaele Perego, Nicola Tonellotto, Nazli Goharian, Ophir Frieder

    Abstract: Deep pretrained transformer networks are effective at various ranking tasks, such as question answering and ad-hoc document ranking. However, their computational expenses deem them cost-prohibitive in practice. Our proposed approach, called PreTTR (Precomputing Transformer Term Representations), considerably reduces the query-time latency of deep transformer networks (up to a 42x speedup on web do…

    Submitted 26 May, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

    Comments: Accepted at SIGIR 2020 (long)

  20. Expansion via Prediction of Importance with Contextualization

    Authors: Sean MacAvaney, Franco Maria Nardini, Raffaele Perego, Nicola Tonellotto, Nazli Goharian, Ophir Frieder

    Abstract: The identification of relevance with little textual context is a primary challenge in passage retrieval. We address this problem with a representation-based ranking approach that: (1) explicitly models the importance of each term using a contextualized language model; (2) performs passage expansion by propagating the importance to similar terms; and (3) grounds the representations in the lexicon,…

    Submitted 20 May, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

    Comments: Accepted at SIGIR 2020 (short)

  21. Ranking Significant Discrepancies in Clinical Reports

    Authors: Sean MacAvaney, Arman Cohan, Nazli Goharian, Ross Filice

    Abstract: Medical errors are a major public health concern and a leading cause of death worldwide. Many healthcare centers and hospitals use reporting systems where medical practitioners write a preliminary medical report and the report is later reviewed, revised, and finalized by a more experienced physician. The revisions range from stylistic to corrections of critical errors or misinterpretations of the…

    Submitted 18 January, 2020; originally announced January 2020.

    Comments: ECIR 2020 (short)

  22. Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-shot Learning

    Authors: Sean MacAvaney, Luca Soldaini, Nazli Goharian

    Abstract: While billions of non-English speaking users rely on search engines every day, the problem of ad-hoc information retrieval is rarely studied for non-English languages. This is primarily due to a lack of data set that are suitable to train ranking algorithms. In this paper, we tackle the lack of data by leveraging pre-trained multilingual language models to transfer a retrieval system trained on En…

    Submitted 30 December, 2019; originally announced December 2019.

    Comments: ECIR 2020 (short)

  23. arXiv:1905.05818  [pdf, other]

    cs.CL cs.IR

    Ontology-Aware Clinical Abstractive Summarization

    Authors: Sean MacAvaney, Sajad Sotudeh, Arman Cohan, Nazli Goharian, Ish Talati, Ross W. Filice

    Abstract: Automatically generating accurate summaries from clinical reports could save a clinician's time, improve summary coverage, and reduce errors. We propose a sequence-to-sequence abstractive summarization model augmented with domain-specific ontological information to enhance content selection and summary generation. We apply our method to a dataset of radiology reports and show that it significantly…

    Submitted 14 May, 2019; originally announced May 2019.

    Comments: 4 pages; SIGIR 2019 Short Paper

  24. CEDR: Contextualized Embeddings for Document Ranking

    Authors: Sean MacAvaney, Andrew Yates, Arman Cohan, Nazli Goharian

    Abstract: Although considerable attention has been given to neural ranking architectures recently, far less attention has been paid to the term representations that are used as input to these models. In this work, we investigate how two pretrained contextualized language models (ELMo and BERT) can be utilized for ad-hoc document ranking. Through experiments on TREC benchmarks, we find that several existing…

    Submitted 19 August, 2019; v1 submitted 15 April, 2019; originally announced April 2019.

    Comments: Appeared in SIGIR 2019, 4 pages

  25. Overcoming low-utility facets for complex answer retrieval

    Authors: Sean MacAvaney, Andrew Yates, Arman Cohan, Luca Soldaini, Kai Hui, Nazli Goharian, Ophir Frieder

    Abstract: Many questions cannot be answered simply; their answers must include numerous nuanced details and additional context. Complex Answer Retrieval (CAR) is the retrieval of answers to such questions. In their simplest form, these questions are constructed from a topic entity (e.g., `cheese') and a facet (e.g., `health effects'). While topic matching has been thoroughly explored, we observe that some f…

    Submitted 21 November, 2018; originally announced November 2018.

    Comments: This is a pre-print of an article published in Information Retrieval Journal. The final authenticated version (including additional experimental results, analysis, etc.) is available online at: https://doi.org/10.1007/s10791-018-9343-0

    Journal ref: Information Retrieval Journal 2018

  26. arXiv:1806.07916  [pdf, other]

    cs.CL

    RSDD-Time: Temporal Annotation of Self-Reported Mental Health Diagnoses

    Authors: Sean MacAvaney, Bart Desmet, Arman Cohan, Luca Soldaini, Andrew Yates, Ayah Zirikly, Nazli Goharian

    Abstract: Self-reported diagnosis statements have been widely employed in studying language related to mental health in social media. However, existing research has largely ignored the temporality of mental health diagnoses. In this work, we introduce RSDD-Time: a new dataset of 598 manually annotated self-reported depression diagnosis posts from Reddit that include temporal information about the diagnosis.…

    Submitted 20 June, 2018; originally announced June 2018.

    Comments: 6 pages, accepted for publication at the CLPsych workshop at NAACL-HLT 2018

  27. arXiv:1806.05258  [pdf, other]

    cs.CL

    SMHD: A Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions

    Authors: Arman Cohan, Bart Desmet, Andrew Yates, Luca Soldaini, Sean MacAvaney, Nazli Goharian

    Abstract: Mental health is a significant and growing public health concern. As language usage can be leveraged to obtain crucial insights into mental health conditions, there is a need for large-scale, labeled, mental health-related datasets of users who have been diagnosed with one or more of such conditions. In this paper, we investigate the creation of high-precision patterns to identify self-reported di…

    Submitted 10 July, 2018; v1 submitted 13 June, 2018; originally announced June 2018.

    Comments: COLING 2018

  28. arXiv:1805.00791  [pdf, other]

    cs.IR

    Characterizing Question Facets for Complex Answer Retrieval

    Authors: Sean MacAvaney, Andrew Yates, Arman Cohan, Luca Soldaini, Kai Hui, Nazli Goharian, Ophir Frieder

    Abstract: Complex answer retrieval (CAR) is the process of retrieving answers to questions that have multifaceted or nuanced answers. In this work, we present two novel approaches for CAR based on the observation that question facets can vary in utility: from structural (facets that can apply to many similar topics, such as 'History') to topical (facets that are specific to the question's topic, such as the…

    Submitted 2 May, 2018; originally announced May 2018.

    Comments: 4 pages; SIGIR 2018 Short Paper

  29. arXiv:1804.07253  [pdf, other]

    cs.CL

    Helping or Hurting? Predicting Changes in Users' Risk of Self-Harm Through Online Community Interactions

    Authors: Luca Soldaini, Timothy Walsh, Arman Cohan, Julien Han, Nazli Goharian

    Abstract: In recent years, online communities have formed around suicide and self-harm prevention. While these communities offer support in moment of crisis, they can also normalize harmful behavior, discourage professional treatment, and instigate suicidal ideation. In this work, we focus on how interaction with others in such a community affects the mental state of users who are seeking support. We first…

    Submitted 19 April, 2018; originally announced April 2018.

    Comments: 10 pages, 4 figures, 5 tables, accepted for publication at the CLPsych workshop at NAACL-HLT 2018

  30. arXiv:1804.05685  [pdf, other]

    cs.CL

    A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents

    Authors: Arman Cohan, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Seokhwan Kim, Walter Chang, Nazli Goharian

    Abstract: Neural abstractive summarization models have led to promising results in summarizing relatively short documents. We propose the first model for abstractive summarization of single, longer-form documents (e.g., research papers). Our approach consists of a new hierarchical encoder that models the discourse structure of a document, and an attentive discourse-aware decoder to generate the summary. Emp…

    Submitted 22 May, 2018; v1 submitted 16 April, 2018; originally announced April 2018.

    Comments: NAACL HLT 2018

  31. arXiv:1804.05408  [pdf, other]

    cs.CL

    GU IRLAB at SemEval-2018 Task 7: Tree-LSTMs for Scientific Relation Classification

    Authors: Sean MacAvaney, Luca Soldaini, Arman Cohan, Nazli Goharian

    Abstract: SemEval 2018 Task 7 focuses on relation extraction and classification in scientific literature. In this work, we present our tree-based LSTM network for this shared task. Our approach placed 9th (of 28) for subtask 1.1 (relation classification), and 5th (of 20) for subtask 1.2 (relation classification with noisy entities). We also provide an ablation study of features included as input to the ne…

    Submitted 15 April, 2018; originally announced April 2018.

    Comments: 5 pages, Accepted to SemEval 2018

  32. arXiv:1709.01848  [pdf, other]

    cs.CL

    Depression and Self-Harm Risk Assessment in Online Forums

    Authors: Andrew Yates, Arman Cohan, Nazli Goharian

    Abstract: Users suffering from mental health conditions often turn to online resources for support, including specialized online support communities or general communities such as Twitter and Reddit. In this work, we present a neural framework for supporting and studying users in both types of communities. We propose methods for identifying posts in support communities that may indicate a risk of self-harm,…

    Submitted 6 September, 2017; originally announced September 2017.

    Comments: Expanded version of EMNLP17 paper. Added sections 6.1, 6.2, 6.4, FastText baseline, and CNN-R

  33. Identifying Harm Events in Clinical Care through Medical Narratives

    Authors: Arman Cohan, Allan Fong, Raj Ratwani, Nazli Goharian

    Abstract: Preventable medical errors are estimated to be among the leading causes of injury and death in the United States. To prevent such errors, healthcare systems have implemented patient safety and incident reporting systems. These systems enable clinicians to report unsafe conditions and cases where patients have been harmed due to errors in medical care. These reports are narratives in natural langua…

    Submitted 15 August, 2017; originally announced August 2017.

    Comments: ACM-BCB 2017

  34. Scientific document summarization via citation contextualization and scientific discourse

    Authors: Arman Cohan, Nazli Goharian

    Abstract: The rapid growth of scientific literature has made it difficult for the researchers to quickly learn about the developments in their respective fields. Scientific document summarization addresses this challenge by providing summaries of the important contributions of scientific papers. We present a framework for scientific summarization which takes advantage of the citations and the scientific dis…

    Submitted 11 June, 2017; originally announced June 2017.

    Comments: Preprint. The final publication is available at Springer via http://dx.doi.org/10.1007/s00799-017-0216-8, International Journal on Digital Libraries (IJDL) 2017

  35. Contextualizing Citations for Scientific Summarization using Word Embeddings and Domain Knowledge

    Authors: Arman Cohan, Nazli Goharian

    Abstract: Citation texts are sometimes not very informative or in some cases inaccurate by themselves; they need the appropriate context from the referenced paper to reflect its exact contributions. To address this problem, we propose an unsupervised model that uses distributed representation of words as well as domain knowledge to extract the appropriate context from the reference paper. Evaluation results…

    Submitted 22 May, 2017; originally announced May 2017.

    Comments: SIGIR 2017

  36. arXiv:1704.06619  [pdf, other]

    cs.CL cs.IR

    Scientific Article Summarization Using Citation-Context and Article's Discourse Structure

    Authors: Arman Cohan, Nazli Goharian

    Abstract: We propose a summarization approach for scientific articles which takes advantage of citation-context and the document discourse model. While citations have been previously used in generating scientific summaries, they lack the related context from the referenced article and therefore do not accurately reflect the article's content. Our method overcomes the problem of inconsistency between the cit…

    Submitted 21 April, 2017; originally announced April 2017.

    Comments: EMNLP 2015

  37. arXiv:1702.07092  [pdf, other]

    cs.CL cs.IR

    A Neural Attention Model for Categorizing Patient Safety Events

    Authors: Arman Cohan, Allan Fong, Nazli Goharian, Raj Ratwani

    Abstract: Medical errors are leading causes of death in the US and as such, prevention of these errors is paramount to promoting health care. Patient Safety Event reports are narratives describing potential adverse events to the patients and are important in identifying and preventing medical errors. We present a neural network architecture for identifying the type of safety events which is the first step i…

    Submitted 22 February, 2017; originally announced February 2017.

    Comments: ECIR 2017

  38. arXiv:1702.06875  [pdf, other]

    cs.CL cs.IR cs.SI

    Triaging Content Severity in Online Mental Health Forums

    Authors: Arman Cohan, Sydney Young, Andrew Yates, Nazli Goharian

    Abstract: Mental health forums are online communities where people express their issues and seek help from moderators and other users. In such forums, there are often posts with severe content indicating that the user is in acute distress and there is a risk of attempted self-harm. Moderators need to respond to these severe posts in a timely manner to prevent potential self-harm. However, the large volume o…

    Submitted 22 February, 2017; originally announced February 2017.

    Comments: Accepted for publication in Journal of the Association for Information Science and Technology (2017)

  39. arXiv:1604.00400  [pdf, other]

    cs.CL

    Revisiting Summarization Evaluation for Scientific Articles

    Authors: Arman Cohan, Nazli Goharian

    Abstract: Evaluation of text summarization approaches have been mostly based on metrics that measure similarities of system generated summaries with a set of human written gold-standard summaries. The most widely used metric in summarization evaluation has been the ROUGE family. ROUGE solely relies on lexical overlaps between the terms and phrases in the sentences; therefore, in cases of terminology variati…

    Submitted 1 April, 2016; originally announced April 2016.

    Comments: LREC 2016