-
Findings of the TSAR-2022 Shared Task on Multilingual Lexical Simplification
Authors:
Horacio Saggion,
Sanja Štajner,
Daniel Ferrés,
Kim Cheng Sheang,
Matthew Shardlow,
Kai North,
Marcos Zampieri
Abstract:
We report findings of the TSAR-2022 shared task on multilingual lexical simplification, organized as part of the Workshop on Text Simplification, Accessibility, and Readability TSAR-2022 held in conjunction with EMNLP 2022. The task called the Natural Language Processing research community to contribute with methods to advance the state of the art in multilingual lexical simplification for English…
▽ More
We report findings of the TSAR-2022 shared task on multilingual lexical simplification, organized as part of the Workshop on Text Simplification, Accessibility, and Readability TSAR-2022 held in conjunction with EMNLP 2022. The task called the Natural Language Processing research community to contribute with methods to advance the state of the art in multilingual lexical simplification for English, Portuguese, and Spanish. A total of 14 teams submitted the results of their lexical simplification systems for the provided test data. Results of the shared task indicate new benchmarks in Lexical Simplification with English lexical simplification quantitative results noticeably higher than those obtained for Spanish and (Brazilian) Portuguese.
△ Less
Submitted 6 February, 2023;
originally announced February 2023.
-
Lexical Simplification Benchmarks for English, Portuguese, and Spanish
Authors:
Sanja Stajner,
Daniel Ferres,
Matthew Shardlow,
Kai North,
Marcos Zampieri,
Horacio Saggion
Abstract:
Even in highly-developed countries, as many as 15-30\% of the population can only understand texts written using a basic vocabulary. Their understanding of everyday texts is limited, which prevents them from taking an active role in society and making informed decisions regarding healthcare, legal representation, or democratic choice. Lexical simplification is a natural language processing task th…
▽ More
Even in highly-developed countries, as many as 15-30\% of the population can only understand texts written using a basic vocabulary. Their understanding of everyday texts is limited, which prevents them from taking an active role in society and making informed decisions regarding healthcare, legal representation, or democratic choice. Lexical simplification is a natural language processing task that aims to make text understandable to everyone by replacing complex vocabulary and expressions with simpler ones, while preserving the original meaning. It has attracted considerable attention in the last 20 years, and fully automatic lexical simplification systems have been proposed for various languages. The main obstacle for the progress of the field is the absence of high-quality datasets for building and evaluating lexical simplification systems. We present a new benchmark dataset for lexical simplification in English, Spanish, and (Brazilian) Portuguese, and provide details about data selection and annotation procedures. This is the first dataset that offers a direct comparison of lexical simplification systems for three languages. To showcase the usability of the dataset, we adapt two state-of-the-art lexical simplification systems with differing architectures (neural vs.\ non-neural) to all three languages (English, Spanish, and Brazilian Portuguese) and evaluate their performances on our new dataset. For a fairer comparison, we use several evaluation measures which capture varied aspects of the systems' efficacy, and discuss their strengths and weaknesses. We find a state-of-the-art neural lexical simplification system outperforms a state-of-the-art non-neural lexical simplification system in all three languages. More importantly, we find that the state-of-the-art neural lexical simplification systems perform significantly better for English than for Spanish and Portuguese.
△ Less
Submitted 12 September, 2022;
originally announced September 2022.
-
GEMv2: Multilingual NLG Benchmarking in a Single Line of Code
Authors:
Sebastian Gehrmann,
Abhik Bhattacharjee,
Abinaya Mahendiran,
Alex Wang,
Alexandros Papangelis,
Aman Madaan,
Angelina McMillan-Major,
Anna Shvets,
Ashish Upadhyay,
Bingsheng Yao,
Bryan Wilie,
Chandra Bhagavatula,
Chaobin You,
Craig Thomson,
Cristina Garbacea,
Dakuo Wang,
Daniel Deutsch,
Deyi Xiong,
Di Jin,
Dimitra Gkatzia,
Dragomir Radev,
Elizabeth Clark,
Esin Durmus,
Faisal Ladhak,
Filip Ginter
, et al. (52 additional authors not shown)
Abstract:
Evaluation in machine learning is usually informed by past choices, for example which datasets or metrics to use. This standardization enables the comparison on equal footing using leaderboards, but the evaluation choices become sub-optimal as better alternatives arise. This problem is especially pertinent in natural language generation which requires ever-improving suites of datasets, metrics, an…
▽ More
Evaluation in machine learning is usually informed by past choices, for example which datasets or metrics to use. This standardization enables the comparison on equal footing using leaderboards, but the evaluation choices become sub-optimal as better alternatives arise. This problem is especially pertinent in natural language generation which requires ever-improving suites of datasets, metrics, and human evaluation to make definitive claims. To make following best model evaluation practices easier, we introduce GEMv2. The new version of the Generation, Evaluation, and Metrics Benchmark introduces a modular infrastructure for dataset, model, and metric developers to benefit from each others work. GEMv2 supports 40 documented datasets in 51 languages. Models for all datasets can be evaluated online and our interactive data card creation and rendering tools make it easier to add new datasets to the living benchmark.
△ Less
Submitted 24 June, 2022; v1 submitted 22 June, 2022;
originally announced June 2022.
-
Five Psycholinguistic Characteristics for Better Interaction with Users
Authors:
Sanja Štajner,
Seren Yenikent,
Marc Franco-Salvador
Abstract:
When two people pay attention to each other and are interested in what the other has to say or write, they almost instantly adapt their writing/speaking style to match the other. For a successful interaction with a user, chatbots and dialogue systems should be able to do the same. We propose a framework consisting of five psycholinguistic textual characteristics for better human-computer interacti…
▽ More
When two people pay attention to each other and are interested in what the other has to say or write, they almost instantly adapt their writing/speaking style to match the other. For a successful interaction with a user, chatbots and dialogue systems should be able to do the same. We propose a framework consisting of five psycholinguistic textual characteristics for better human-computer interaction. We describe the annotation processes used for collecting the data, and benchmark five binary classification tasks, experimenting with different training sizes and model architectures. The best architectures noticeably outperform several baselines and achieve macro-averaged F$_1$-scores between 72\% and 96\% depending on the language and the task. The proposed framework proved to be fairly easy to model for various languages even with small amount of manually annotated data if right architectures are used.
△ Less
Submitted 21 March, 2022; v1 submitted 17 December, 2020;
originally announced December 2020.
-
A Report on the Complex Word Identification Shared Task 2018
Authors:
Seid Muhie Yimam,
Chris Biemann,
Shervin Malmasi,
Gustavo H. Paetzold,
Lucia Specia,
Sanja Štajner,
Anaïs Tack,
Marcos Zampieri
Abstract:
We report the findings of the second Complex Word Identification (CWI) shared task organized as part of the BEA workshop co-located with NAACL-HLT'2018. The second CWI shared task featured multilingual and multi-genre datasets divided into four tracks: English monolingual, German monolingual, Spanish monolingual, and a multilingual track with a French test set, and two tasks: binary classification…
▽ More
We report the findings of the second Complex Word Identification (CWI) shared task organized as part of the BEA workshop co-located with NAACL-HLT'2018. The second CWI shared task featured multilingual and multi-genre datasets divided into four tracks: English monolingual, German monolingual, Spanish monolingual, and a multilingual track with a French test set, and two tasks: binary classification and probabilistic classification. A total of 12 teams submitted their results in different task/track combinations and 11 of them wrote system description papers that are referred to in this report and appear in the BEA workshop proceedings.
△ Less
Submitted 24 April, 2018;
originally announced April 2018.