Search | arXiv e-print repository

HyperCAN: Hypernetwork-Driven Deep Parameterized Constitutive Models for Metamaterials

Authors: Li Zheng, Dennis M. Kochmann, Siddhant Kumar

Abstract: We introduce HyperCAN, a machine learning framework that utilizes hypernetworks to construct adaptable constitutive artificial neural networks for a wide range of beam-based metamaterials exhibiting diverse mechanical behavior under finite deformations. HyperCAN integrates an input convex network that models the nonlinear stress-strain map of a truss lattice, while ensuring adherence to fundamenta… ▽ More We introduce HyperCAN, a machine learning framework that utilizes hypernetworks to construct adaptable constitutive artificial neural networks for a wide range of beam-based metamaterials exhibiting diverse mechanical behavior under finite deformations. HyperCAN integrates an input convex network that models the nonlinear stress-strain map of a truss lattice, while ensuring adherence to fundamental mechanics principles, along with a hypernetwork that dynamically adjusts the parameters of the convex network as a function of the lattice topology and geometry. This unified framework demonstrates robust generalization in predicting the mechanical behavior of previously unseen metamaterial designs and loading scenarios well beyond the training domain. We show how HyperCAN can be integrated into multiscale simulations to accurately capture the highly nonlinear responses of large-scale truss metamaterials, closely matching fully resolved simulations while significantly reducing computational costs. This offers new efficient opportunities for the multiscale design and optimization of truss metamaterials. △ Less

Submitted 12 August, 2024; originally announced August 2024.

arXiv:2408.05350 [pdf, other]

Enabling Quick, Accurate Crowdsourced Annotation for Elevation-Aware Flood Extent Mapping

Authors: Landon Dyken, Saugat Adhikari, Pravin Poudel, Steve Petruzza, Da Yan, Will Usher, Sidharth Kumar

Abstract: In order to assess damage and properly allocate relief efforts, mapping the extent of flood events is a necessary and important aspect of disaster management. In recent years, deep learning methods have evolved as an effective tool to quickly label high-resolution imagery and provide necessary flood extent mappings. These methods, though, require large amounts of annotated training data to create… ▽ More In order to assess damage and properly allocate relief efforts, mapping the extent of flood events is a necessary and important aspect of disaster management. In recent years, deep learning methods have evolved as an effective tool to quickly label high-resolution imagery and provide necessary flood extent mappings. These methods, though, require large amounts of annotated training data to create models that are accurate and robust to new flooded imagery. In this work, we provide FloodTrace, an application that enables effective crowdsourcing for flooded region annotation for machine learning training data, removing the requirement for annotation to be done solely by researchers. We accomplish this through two orthogonal methods within our application, informed by requirements from domain experts. First, we utilize elevation-guided annotation tools and 3D rendering to inform user annotation decisions with digital elevation model data, improving annotation accuracy. For this purpose, we provide a unique annotation method that uses topological data analysis to outperform the state-of-the-art elevation-guided annotation tool in efficiency. Second, we provide a framework for researchers to review aggregated crowdsourced annotations and correct inaccuracies using methods inspired by uncertainty visualization. We conducted a user study to confirm the application effectiveness in which 266 graduate students annotated high-resolution aerial imagery from Hurricane Matthew in North Carolina. Experimental results show the accuracy and efficiency benefits of our application apply even for untrained users. In addition, using our aggregation and correction framework, flood detection models trained on crowdsourced annotations were able to achieve performance equal to models trained on expert-labeled annotations, while requiring a fraction of the time on the part of the researcher. △ Less

Submitted 31 July, 2024; originally announced August 2024.

arXiv:2408.04490 [pdf, ps, other]

Symmetric Encryption Scheme Based on Quasigroup Using Chained Mode of Operation

Authors: Satish Kumar, Harshdeep Singh, Indivar Gupta, Ashok Ji Gupta

Abstract: In this paper, we propose a novel construction for a symmetric encryption scheme, referred as SEBQ which is based on the structure of quasigroup. We utilize concepts of chaining like mode of operation and present a block cipher with in-built properties. We prove that SEBQ shows resistance against chosen plaintext attack (CPA) and by applying unbalanced Feistel transformation [19], it achieves secu… ▽ More In this paper, we propose a novel construction for a symmetric encryption scheme, referred as SEBQ which is based on the structure of quasigroup. We utilize concepts of chaining like mode of operation and present a block cipher with in-built properties. We prove that SEBQ shows resistance against chosen plaintext attack (CPA) and by applying unbalanced Feistel transformation [19], it achieves security against chosen ciphertext attacks (CCA). Subsequently, we conduct an assessment of the randomness of the proposed scheme by running the NIST test suite and we analyze the impact of the initial vector, secret key and plaintext on ciphertext through an avalanche effect analysis. We also compare the results with existing schemes based on quasigroups [11,46]. Moreover, we analyze the computational complexity in terms of number of operations needed for encryption and decryption process. △ Less

Submitted 8 August, 2024; originally announced August 2024.

MSC Class: 20N05; 05B15; 94A60; 68W20

arXiv:2408.03907 [pdf, other]

Decoding Biases: Automated Methods and LLM Judges for Gender Bias Detection in Language Models

Authors: Shachi H Kumar, Saurav Sahay, Sahisnu Mazumder, Eda Okur, Ramesh Manuvinakurike, Nicole Beckage, Hsuan Su, Hung-yi Lee, Lama Nachman

Abstract: Large Language Models (LLMs) have excelled at language understanding and generating human-level text. However, even with supervised training and human alignment, these LLMs are susceptible to adversarial attacks where malicious users can prompt the model to generate undesirable text. LLMs also inherently encode potential biases that can cause various harmful effects during interactions. Bias evalu… ▽ More Large Language Models (LLMs) have excelled at language understanding and generating human-level text. However, even with supervised training and human alignment, these LLMs are susceptible to adversarial attacks where malicious users can prompt the model to generate undesirable text. LLMs also inherently encode potential biases that can cause various harmful effects during interactions. Bias evaluation metrics lack standards as well as consensus and existing methods often rely on human-generated templates and annotations which are expensive and labor intensive. In this work, we train models to automatically create adversarial prompts to elicit biased responses from target LLMs. We present LLM- based bias evaluation metrics and also analyze several existing automatic evaluation methods and metrics. We analyze the various nuances of model responses, identify the strengths and weaknesses of model families, and assess where evaluation methods fall short. We compare these metrics to human evaluation and validate that the LLM-as-a-Judge metric aligns with human judgement on bias in response generation. △ Less

Submitted 7 August, 2024; originally announced August 2024.

Comments: 6 pages paper content, 17 pages of appendix

arXiv:2408.03592 [pdf, other]

HistoSPACE: Histology-Inspired Spatial Transcriptome Prediction And Characterization Engine

Authors: Shivam Kumar, Samrat Chatterjee

Abstract: Spatial transcriptomics (ST) enables the visualization of gene expression within the context of tissue morphology. This emerging discipline has the potential to serve as a foundation for developing tools to design precision medicines. However, due to the higher costs and expertise required for such experiments, its translation into a regular clinical practice might be challenging. Despite the impl… ▽ More Spatial transcriptomics (ST) enables the visualization of gene expression within the context of tissue morphology. This emerging discipline has the potential to serve as a foundation for developing tools to design precision medicines. However, due to the higher costs and expertise required for such experiments, its translation into a regular clinical practice might be challenging. Despite the implementation of modern deep learning to enhance information obtained from histological images using AI, efforts have been constrained by limitations in the diversity of information. In this paper, we developed a model, HistoSPACE that explore the diversity of histological images available with ST data to extract molecular insights from tissue image. Our proposed study built an image encoder derived from universal image autoencoder. This image encoder was connected to convolution blocks to built the final model. It was further fine tuned with the help of ST-Data. This model is notably lightweight in compared to traditional histological models. Our developed model demonstrates significant efficiency compared to contemporary algorithms, revealing a correlation of 0.56 in leave-one-out cross-validation. Finally, its robustness was validated through an independent dataset, showing a well matched preditction with predefined disease pathology. △ Less

Submitted 7 August, 2024; originally announced August 2024.

arXiv:2408.02930 [pdf, other]

The Need for a Big World Simulator: A Scientific Challenge for Continual Learning

Authors: Saurabh Kumar, Hong Jun Jeon, Alex Lewandowski, Benjamin Van Roy

Abstract: The "small agent, big world" frame offers a conceptual view that motivates the need for continual learning. The idea is that a small agent operating in a much bigger world cannot store all information that the world has to offer. To perform well, the agent must be carefully designed to ingest, retain, and eject the right information. To enable the development of performant continual learning agent… ▽ More The "small agent, big world" frame offers a conceptual view that motivates the need for continual learning. The idea is that a small agent operating in a much bigger world cannot store all information that the world has to offer. To perform well, the agent must be carefully designed to ingest, retain, and eject the right information. To enable the development of performant continual learning agents, a number of synthetic environments have been proposed. However, these benchmarks suffer from limitations, including unnatural distribution shifts and a lack of fidelity to the "small agent, big world" framing. This paper aims to formalize two desiderata for the design of future simulated environments. These two criteria aim to reflect the objectives and complexity of continual learning in practical settings while enabling rapid prototyping of algorithms on a smaller scale. △ Less

Submitted 5 August, 2024; originally announced August 2024.

Comments: Accepted to the Finding the Frame Workshop at RLC 2024

arXiv:2408.02140 [pdf, other]

VidModEx: Interpretable and Efficient Black Box Model Extraction for High-Dimensional Spaces

Authors: Somnath Sendhil Kumar, Yuvaraj Govindarajulu, Pavan Kulkarni, Manojkumar Parmar

Abstract: In the domain of black-box model extraction, conventional methods reliant on soft labels or surrogate datasets struggle with scaling to high-dimensional input spaces and managing the complexity of an extensive array of interrelated classes. In this work, we present a novel approach that utilizes SHAP (SHapley Additive exPlanations) to enhance synthetic data generation. SHAP quantifies the individu… ▽ More In the domain of black-box model extraction, conventional methods reliant on soft labels or surrogate datasets struggle with scaling to high-dimensional input spaces and managing the complexity of an extensive array of interrelated classes. In this work, we present a novel approach that utilizes SHAP (SHapley Additive exPlanations) to enhance synthetic data generation. SHAP quantifies the individual contributions of each input feature towards the victim model's output, facilitating the optimization of an energy-based GAN towards a desirable output. This method significantly boosts performance, achieving a 16.45% increase in the accuracy of image classification models and extending to video classification models with an average improvement of 26.11% and a maximum of 33.36% on challenging datasets such as UCF11, UCF101, Kinetics 400, Kinetics 600, and Something-Something V2. We further demonstrate the effectiveness and practical utility of our method under various scenarios, including the availability of top-k prediction probabilities, top-k prediction labels, and top-1 labels. △ Less

Submitted 4 August, 2024; originally announced August 2024.

arXiv:2408.00312 [pdf, other]

doi 10.1145/3627673.3679592

Adversarial Text Rewriting for Text-aware Recommender Systems

Authors: Sejoon Oh, Gaurav Verma, Srijan Kumar

Abstract: Text-aware recommender systems incorporate rich textual features, such as titles and descriptions, to generate item recommendations for users. The use of textual features helps mitigate cold-start problems, and thus, such recommender systems have attracted increased attention. However, we argue that the dependency on item descriptions makes the recommender system vulnerable to manipulation by adve… ▽ More Text-aware recommender systems incorporate rich textual features, such as titles and descriptions, to generate item recommendations for users. The use of textual features helps mitigate cold-start problems, and thus, such recommender systems have attracted increased attention. However, we argue that the dependency on item descriptions makes the recommender system vulnerable to manipulation by adversarial sellers on e-commerce platforms. In this paper, we explore the possibility of such manipulation by proposing a new text rewriting framework to attack text-aware recommender systems. We show that the rewriting attack can be exploited by sellers to unfairly uprank their products, even though the adversarially rewritten descriptions are perceived as realistic by human evaluators. Methodologically, we investigate two different variations to carry out text rewriting attacks: (1) two-phase fine-tuning for greater attack performance, and (2) in-context learning for higher text rewriting quality. Experiments spanning 3 different datasets and 4 existing approaches demonstrate that recommender systems exhibit vulnerability against the proposed text rewriting attack. Our work adds to the existing literature around the robustness of recommender systems, while highlighting a new dimension of vulnerability in the age of large-scale automated text generation. △ Less

Submitted 1 August, 2024; originally announced August 2024.

Comments: Accepted for publication at: 33rd ACM International Conference on Information and Knowledge Management (CIKM 2024). Code and data at: https://github.com/sejoonoh/ATR

arXiv:2407.21783 [pdf, other]

The Llama 3 Herd of Models

Authors: Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang , et al. (510 additional authors not shown)

Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical… ▽ More Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development. △ Less

Submitted 15 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

arXiv:2407.21141 [pdf, other]

FL-DECO-BC: A Privacy-Preserving, Provably Secure, and Provenance-Preserving Federated Learning Framework with Decentralized Oracles on Blockchain for VANETs

Authors: Sathwik Narkedimilli, Rayachoti Arun Kumar, N. V. Saran Kumar, Ramapathruni Praneeth Reddy, Pavan Kumar C

Abstract: Vehicular Ad-Hoc Networks (VANETs) hold immense potential for improving traffic safety and efficiency. However, traditional centralized approaches for machine learning in VANETs raise concerns about data privacy and security. Federated Learning (FL) offers a solution that enables collaborative model training without sharing raw data. This paper proposes FL-DECO-BC as a novel privacy-preserving, pr… ▽ More Vehicular Ad-Hoc Networks (VANETs) hold immense potential for improving traffic safety and efficiency. However, traditional centralized approaches for machine learning in VANETs raise concerns about data privacy and security. Federated Learning (FL) offers a solution that enables collaborative model training without sharing raw data. This paper proposes FL-DECO-BC as a novel privacy-preserving, provably secure, and provenance-preserving federated learning framework specifically designed for VANETs. FL-DECO-BC leverages decentralized oracles on blockchain to securely access external data sources while ensuring data privacy through advanced techniques. The framework guarantees provable security through cryptographic primitives and formal verification methods. Furthermore, FL-DECO-BC incorporates a provenance-preserving design to track data origin and history, fostering trust and accountability. This combination of features empowers VANETs with secure and privacy-conscious machine-learning capabilities, paving the way for advanced traffic management and safety applications. △ Less

Submitted 30 July, 2024; originally announced July 2024.

arXiv:2407.21046 [pdf, other]

Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines

Authors: Yuchen Li, Alexandre Kirchmeyer, Aashay Mehta, Yilong Qin, Boris Dadachev, Kishore Papineni, Sanjiv Kumar, Andrej Risteski

Abstract: Autoregressive language models are the currently dominant paradigm for text generation, but they have some fundamental limitations that cannot be remedied by scale-for example inherently sequential and unidirectional generation. While alternate classes of models have been explored, we have limited mathematical understanding of their fundamental power and limitations. In this paper we focus on Gene… ▽ More Autoregressive language models are the currently dominant paradigm for text generation, but they have some fundamental limitations that cannot be remedied by scale-for example inherently sequential and unidirectional generation. While alternate classes of models have been explored, we have limited mathematical understanding of their fundamental power and limitations. In this paper we focus on Generative Masked Language Models (GMLMs), a non-autoregressive paradigm in which we train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model, These models empirically strike a promising speed-quality trade-off as each step can be typically parallelized by decoding the entire sequence in parallel. We develop a mathematical framework for analyzing and improving such models which sheds light on questions of sample complexity and inference speed and quality. Empirically, we adapt the T5 model for iteratively-refined parallel decoding, achieving 2-3x speedup in machine translation with minimal sacrifice in quality compared with autoregressive models. We run careful ablation experiments to give recommendations on key design choices, and make fine-grained observations on the common error modes in connection with our theory. Our mathematical analyses and empirical observations characterize both potentials and limitations of this approach, and can be applied to future works on improving understanding and performance of GMLMs. Our codes are released at https://github.com/google-research/google-research/tree/master/padir △ Less

Submitted 22 July, 2024; originally announced July 2024.

Comments: ICML 2024

arXiv:2407.19773 [pdf, other]

Unmasking unlearnable models: a classification challenge for biomedical images without visible cues

Authors: Shivam Kumar, Samrat Chatterjee

Abstract: Predicting traits from images lacking visual cues is challenging, as algorithms are designed to capture visually correlated ground truth. This problem is critical in biomedical sciences, and their solution can improve the efficacy of non-invasive methods. For example, a recent challenge of predicting MGMT methylation status from MRI images is critical for treatment decisions of glioma patients. Us… ▽ More Predicting traits from images lacking visual cues is challenging, as algorithms are designed to capture visually correlated ground truth. This problem is critical in biomedical sciences, and their solution can improve the efficacy of non-invasive methods. For example, a recent challenge of predicting MGMT methylation status from MRI images is critical for treatment decisions of glioma patients. Using less robust models poses a significant risk in these critical scenarios and underscores the urgency of addressing this issue. Despite numerous efforts, contemporary models exhibit suboptimal performance, and underlying reasons for this limitation remain elusive. In this study, we demystify the complexity of MGMT status prediction through a comprehensive exploration by performing benchmarks of existing models adjoining transfer learning. Their architectures were further dissected by observing gradient flow across layers. Additionally, a feature selection strategy was applied to improve model interpretability. Our finding highlighted that current models are unlearnable and may require new architectures to explore applications in the real world. We believe our study will draw immediate attention and catalyse advancements in predictive modelling with non-visible cues. △ Less

Submitted 29 July, 2024; originally announced July 2024.

arXiv:2407.17508 [pdf, other]

Artificial Intelligence Based Navigation in Quasi Structured Environment

Authors: Hariram Sampath Kumar, Archana Singh, Manish Kumar Ojha

Abstract: The proper planning of different types of public transportation such as metro, highway, waterways, and so on, can increase the efficiency, reduce the congestion and improve the safety of the country. There are certain challenges associated with route planning, such as high cost of implementation, need for adequate resource & infrastructure and resistance to change. The goal of this research is to… ▽ More The proper planning of different types of public transportation such as metro, highway, waterways, and so on, can increase the efficiency, reduce the congestion and improve the safety of the country. There are certain challenges associated with route planning, such as high cost of implementation, need for adequate resource & infrastructure and resistance to change. The goal of this research is to examine the working, applications, complexity factors, advantages & disadvantages of Floyd- Warshall, Bellman-Ford, Johnson, Ant Colony Optimization (ACO), Particle Swarm Optimization (PSO), & Grey Wolf Optimizer (GWO), to find the best choice for the above application. In this paper, comparative analysis of above-mentioned algorithms is presented. The Floyd-Warshall method and ACO algorithm are chosen based on the comparisons. Also, a combination of modified Floyd-Warshall with ACO algorithm is proposed. The proposed algorithm showed better results with less time complexity, when applied on randomly structured points within a boundary called quasi-structured points. In addition, this paper also discusses the future works of integrating Floyd-Warshall with ACO to develop a real-time model for overcoming above mentioned-challenges during transportation route planning. △ Less

Submitted 8 July, 2024; originally announced July 2024.

Comments: 10 pages, 8 figures

arXiv:2407.16905 [pdf, other]

Assessing the role of clinical summarization and patient chart review within communications, medical management, and diagnostics

Authors: Chanseo Lee, Kimon-Aristotelis Vogt, Sonu Kumar

Abstract: Effective summarization of unstructured patient data in electronic health records (EHRs) is crucial for accurate diagnosis and efficient patient care, yet clinicians often struggle with information overload and time constraints. This review dives into recent literature and case studies on both the significant impacts and outstanding issues of patient chart review on communications, diagnostics, an… ▽ More Effective summarization of unstructured patient data in electronic health records (EHRs) is crucial for accurate diagnosis and efficient patient care, yet clinicians often struggle with information overload and time constraints. This review dives into recent literature and case studies on both the significant impacts and outstanding issues of patient chart review on communications, diagnostics, and management. It also discusses recent efforts to integrate artificial intelligence (AI) into clinical summarization tasks, and its transformative impact on the clinician's potential, including but not limited to reductions of administrative burden and improved patient-centered care. △ Less

Submitted 24 June, 2024; originally announced July 2024.

arXiv:2407.15716 [pdf, other]

CrashEventLLM: Predicting System Crashes with Large Language Models

Authors: Priyanka Mudgal, Bijan Arbab, Swaathi Sampath Kumar

Abstract: As the dependence on computer systems expands across various domains, focusing on personal, industrial, and large-scale applications, there arises a compelling need to enhance their reliability to sustain business operations seamlessly and ensure optimal user satisfaction. System logs generated by these devices serve as valuable repositories of historical trends and past failures. The use of machi… ▽ More As the dependence on computer systems expands across various domains, focusing on personal, industrial, and large-scale applications, there arises a compelling need to enhance their reliability to sustain business operations seamlessly and ensure optimal user satisfaction. System logs generated by these devices serve as valuable repositories of historical trends and past failures. The use of machine learning techniques for failure prediction has become commonplace, enabling the extraction of insights from past data to anticipate future behavior patterns. Recently, large language models have demonstrated remarkable capabilities in tasks including summarization, reasoning, and event prediction. Therefore, in this paper, we endeavor to investigate the potential of large language models in predicting system failures, leveraging insights learned from past failure behavior to inform reasoning and decision-making processes effectively. Our approach involves leveraging data from the Intel Computing Improvement Program (ICIP) system crash logs to identify significant events and develop CrashEventLLM. This model, built upon a large language model framework, serves as our foundation for crash event prediction. Specifically, our model utilizes historical data to forecast future crash events, informed by expert annotations. Additionally, it goes beyond mere prediction, offering insights into potential causes for each crash event. This work provides the preliminary insights into prompt-based large language models for the log-based event prediction task. △ Less

Submitted 28 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

Comments: Accepted in ICITCOM'24. Copyrights will be with IEEE

arXiv:2407.15324 [pdf, other]

Cooperative Salvo Guidance over Leader-Follower Network with Free-Will Arbitrary Time Convergence

Authors: Rajib Shekhar Pal, Shashi Ranjan Kumar, Dwaipayan Mukherjee

Abstract: A cooperative salvo strategy is proposed in this paper which achieves consensus among the interceptors within a pre-defined arbitrary settling time. Considering non-linear engagement kinematics and a system lag to capture the effect of interceptor autopilot as present in realistic interception scenarios, the guidance schemes use the time-to-go estimates of the interceptors in order to achieve simu… ▽ More A cooperative salvo strategy is proposed in this paper which achieves consensus among the interceptors within a pre-defined arbitrary settling time. Considering non-linear engagement kinematics and a system lag to capture the effect of interceptor autopilot as present in realistic interception scenarios, the guidance schemes use the time-to-go estimates of the interceptors in order to achieve simultaneous interception of a stationary target at a pre-determined impact time. The guidance scheme ensures that consensus among the time-to-go estimates of the interceptors is achieved within a settling time whose upper bound can be pre-specified arbitrarily independent of the initial conditions or design parameters. The efficacy of the proposed guidance strategy is demonstrated using numerical simulations with varied conditions of initial position, velocities and heading angle errors of the interceptors as well as different desired impact times. △ Less

Submitted 21 July, 2024; originally announced July 2024.

arXiv:2407.15227 [pdf, other]

A Community-Centric Perspective for Characterizing and Detecting Anti-Asian Violence-Provoking Speech

Authors: Gaurav Verma, Rynaa Grover, Jiawei Zhou, Binny Mathew, Jordan Kraemer, Munmun De Choudhury, Srijan Kumar

Abstract: Violence-provoking speech -- speech that implicitly or explicitly promotes violence against the members of the targeted community, contributed to a massive surge in anti-Asian crimes during the pandemic. While previous works have characterized and built tools for detecting other forms of harmful speech, like fear speech and hate speech, our work takes a community-centric approach to studying anti-… ▽ More Violence-provoking speech -- speech that implicitly or explicitly promotes violence against the members of the targeted community, contributed to a massive surge in anti-Asian crimes during the pandemic. While previous works have characterized and built tools for detecting other forms of harmful speech, like fear speech and hate speech, our work takes a community-centric approach to studying anti-Asian violence-provoking speech. Using data from ~420k Twitter posts spanning a 3-year duration (January 1, 2020 to February 1, 2023), we develop a codebook to characterize anti-Asian violence-provoking speech and collect a community-crowdsourced dataset to facilitate its large-scale detection using state-of-the-art classifiers. We contrast the capabilities of natural language processing classifiers, ranging from BERT-based to LLM-based classifiers, in detecting violence-provoking speech with their capabilities to detect anti-Asian hateful speech. In contrast to prior work that has demonstrated the effectiveness of such classifiers in detecting hateful speech ($F_1 = 0.89$), our work shows that accurate and reliable detection of violence-provoking speech is a challenging task ($F_1 = 0.69$). We discuss the implications of our findings, particularly the need for proactive interventions to support Asian communities during public health crises. The resources related to the study are available at https://claws-lab.github.io/violence-provoking-speech/. △ Less

Submitted 21 July, 2024; originally announced July 2024.

Comments: Accepted to ACL 2024 Main

arXiv:2407.15022 [pdf]

Encouraging Responsible Use of Generative AI in Education: A Reward-Based Learning Approach

Authors: Aditi Singh, Abul Ehtesham, Saket Kumar, Gaurav Kumar Gupta, Tala Talaei Khoei

Abstract: This research introduces an innovative mathematical learning approach that integrates generative AI to cultivate a structured learning rather than quick solution. Our method combines chatbot capabilities and generative AI to offer interactive problem-solving exercises, enhancing learning through a stepby-step approach for varied problems, advocating for the responsible use of AI in education. Our… ▽ More This research introduces an innovative mathematical learning approach that integrates generative AI to cultivate a structured learning rather than quick solution. Our method combines chatbot capabilities and generative AI to offer interactive problem-solving exercises, enhancing learning through a stepby-step approach for varied problems, advocating for the responsible use of AI in education. Our approach emphasizes that immediate answers from ChatGPT can impede real learning. We introduce a reward-based system that requires students to solve mathematical problems effectively to receive the final answer. This encourages a progressive learning path from basic to complex problems, rewarding mastery with final solutions. The goal is to transition students from seeking quick fixes to engaging actively in a comprehensive learning experience. △ Less

Submitted 26 June, 2024; originally announced July 2024.

Comments: 9 pages, 4 figures

arXiv:2407.13833 [pdf, other]

Phi-3 Safety Post-Training: Aligning Language Models with a "Break-Fix" Cycle

Authors: Emman Haider, Daniel Perez-Becker, Thomas Portet, Piyush Madan, Amit Garg, David Majercak, Wen Wen, Dongwoo Kim, Ziyi Yang, Jianwen Zhang, Hiteshi Sharma, Blake Bullwinkel, Martin Pouliot, Amanda Minnich, Shiven Chawla, Solianna Herrera, Shahed Warreth, Maggie Engler, Gary Lopez, Nina Chikanov, Raja Sekhar Rao Dheekonda, Bolor-Erdene Jagdagdorj, Roman Lutz, Richard Lundeen, Tori Westerhoff , et al. (5 additional authors not shown)

Abstract: Recent innovations in language model training have demonstrated that it is possible to create highly performant models that are small enough to run on a smartphone. As these models are deployed in an increasing number of domains, it is critical to ensure that they are aligned with human preferences and safety considerations. In this report, we present our methodology for safety aligning the Phi-3… ▽ More Recent innovations in language model training have demonstrated that it is possible to create highly performant models that are small enough to run on a smartphone. As these models are deployed in an increasing number of domains, it is critical to ensure that they are aligned with human preferences and safety considerations. In this report, we present our methodology for safety aligning the Phi-3 series of language models. We utilized a "break-fix" cycle, performing multiple rounds of dataset curation, safety post-training, benchmarking, red teaming, and vulnerability identification to cover a variety of harm areas in both single and multi-turn scenarios. Our results indicate that this approach iteratively improved the performance of the Phi-3 models across a wide range of responsible AI benchmarks. △ Less

Submitted 18 July, 2024; originally announced July 2024.

arXiv:2407.13318 [pdf, other]

A new approach to delegate signing rights to proxy signers using isogeny-based cryptography

Authors: Kunal Dey, Somnath Kumar, Vikas Srivastava, Sumit Kumar Debnath

Abstract: E-governance is a two-way protocol through which one can use government services, share data and request information. It refers to the use of communication and information technologies to provide government services to public in an efficient and fast manner. In addition, any document submitted to the e-Government system must be authenticated by a government officer using a digital signature scheme… ▽ More E-governance is a two-way protocol through which one can use government services, share data and request information. It refers to the use of communication and information technologies to provide government services to public in an efficient and fast manner. In addition, any document submitted to the e-Government system must be authenticated by a government officer using a digital signature scheme. In the context of digital signatures, the proxy signature is an important cryptographic primitive that allows the original signer to delegate signing authority to another signer (proxy signer). The proxy signature has a number of important applications in the e-government system. There are now a large amount of proxy signature schemes. The security of most of them relies on the following hard problems: the discrete logarithm problem and the factorization of integers problem. However, a large-scale quantum computer can solve them in polynomial time due to Shor's algorithm. As a consequence, there is a need for a quantum computer-resistant proxy signature to secure e-governance system from quantum adversaries. In this work, we propose the first post-quantum isogeny based proxy signature scheme CSI-PS (commutative supersingular isogeny proxy signature). Our construction is proven to be uf-cma secure under the hardness of the group action inverse problem (GAIP) based on isogeny. △ Less

Submitted 18 July, 2024; originally announced July 2024.

arXiv:2407.12185 [pdf, other]

Satisficing Exploration for Deep Reinforcement Learning

Authors: Dilip Arumugam, Saurabh Kumar, Ramki Gummadi, Benjamin Van Roy

Abstract: A default assumption in the design of reinforcement-learning algorithms is that a decision-making agent always explores to learn optimal behavior. In sufficiently complex environments that approach the vastness and scale of the real world, however, attaining optimal performance may in fact be an entirely intractable endeavor and an agent may seldom find itself in a position to complete the requisi… ▽ More A default assumption in the design of reinforcement-learning algorithms is that a decision-making agent always explores to learn optimal behavior. In sufficiently complex environments that approach the vastness and scale of the real world, however, attaining optimal performance may in fact be an entirely intractable endeavor and an agent may seldom find itself in a position to complete the requisite exploration for identifying an optimal policy. Recent work has leveraged tools from information theory to design agents that deliberately forgo optimal solutions in favor of sufficiently-satisfying or satisficing solutions, obtained through lossy compression. Notably, such agents may employ fundamentally different exploratory decisions to learn satisficing behaviors more efficiently than optimal ones that are more data intensive. While supported by a rigorous corroborating theory, the underlying algorithm relies on model-based planning, drastically limiting the compatibility of these ideas with function approximation and high-dimensional observations. In this work, we remedy this issue by extending an agent that directly represents uncertainty over the optimal value function allowing it to both bypass the need for model-based planning and to learn satisficing policies. We provide simple yet illustrative experiments that demonstrate how our algorithm enables deep reinforcement-learning agents to achieve satisficing behaviors. In keeping with previous work on this setting for multi-armed bandits, we additionally find that our algorithm is capable of synthesizing optimal behaviors, when feasible, more efficiently than its non-information-theoretic counterpart. △ Less

Submitted 16 July, 2024; originally announced July 2024.

Comments: Accepted to the Finding the Frame Workshop at RLC 2024

arXiv:2407.12043 [pdf, other]

The Art of Saying No: Contextual Noncompliance in Language Models

Authors: Faeze Brahman, Sachin Kumar, Vidhisha Balachandran, Pradeep Dasigi, Valentina Pyatkin, Abhilasha Ravichander, Sarah Wiegreffe, Nouha Dziri, Khyathi Chandu, Jack Hessel, Yulia Tsvetkov, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi

Abstract: Chat-based language models are designed to be helpful, yet they should not comply with every user request. While most existing work primarily focuses on refusal of "unsafe" queries, we posit that the scope of noncompliance should be broadened. We introduce a comprehensive taxonomy of contextual noncompliance describing when and how models should not comply with user requests. Our taxonomy spans a… ▽ More Chat-based language models are designed to be helpful, yet they should not comply with every user request. While most existing work primarily focuses on refusal of "unsafe" queries, we posit that the scope of noncompliance should be broadened. We introduce a comprehensive taxonomy of contextual noncompliance describing when and how models should not comply with user requests. Our taxonomy spans a wide range of categories including incomplete, unsupported, indeterminate, and humanizing requests (in addition to unsafe requests). To test noncompliance capabilities of language models, we use this taxonomy to develop a new evaluation suite of 1000 noncompliance prompts. We find that most existing models show significantly high compliance rates in certain previously understudied categories with models like GPT-4 incorrectly complying with as many as 30% of requests. To address these gaps, we explore different training strategies using a synthetically-generated training set of requests and expected noncompliant responses. Our experiments demonstrate that while direct finetuning of instruction-tuned models can lead to both over-refusal and a decline in general capabilities, using parameter efficient methods like low rank adapters helps to strike a good balance between appropriate noncompliance and other capabilities. △ Less

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.10837 [pdf, other]

Trajectory Tracking for Unmanned Aerial Vehicles in 3D Spaces under Motion Constraints

Authors: Saurabh Kumar, Shashi Ranjan Kumar, Abhinav Sinha

Abstract: This article presents a three-dimensional nonlinear trajectory tracking control strategy for unmanned aerial vehicles (UAVs) in the presence of spatial constraints. As opposed to many existing control strategies, which do not consider spatial constraints, the proposed strategy considers spatial constraints on each degree of freedom movement of the UAV. Such consideration makes the design appealing… ▽ More This article presents a three-dimensional nonlinear trajectory tracking control strategy for unmanned aerial vehicles (UAVs) in the presence of spatial constraints. As opposed to many existing control strategies, which do not consider spatial constraints, the proposed strategy considers spatial constraints on each degree of freedom movement of the UAV. Such consideration makes the design appealing for many practical applications, such as pipeline inspection, boundary tracking, etc. The proposed design accounts for the limited information about the inertia matrix, thereby affirming its inherent robustness against unmodeled dynamics and other imperfections. We rigorously show that the UAV will converge to its desired path by maintaining bounded position, orientation, and linear and angular speeds. Finally, we demonstrate the effectiveness of the proposed strategy through various numerical simulations. △ Less

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.09466 [pdf, other]

TRAVERSE: Traffic-Responsive Autonomous Vehicle Experience & Rare-event Simulation for Enhanced safety

Authors: Sandeep Thalapanane, Sandip Sharan Senthil Kumar, Guru Nandhan Appiya Dilipkumar Peethambari, Sourang SriHari, Laura Zheng, Julio Poveda, Ming C. Lin

Abstract: Data for training learning-enabled self-driving cars in the physical world are typically collected in a safe, normal environment. Such data distribution often engenders a strong bias towards safe driving, making self-driving cars unprepared when encountering adversarial scenarios like unexpected accidents. Due to a dearth of such adverse data that is unrealistic for drivers to collect, autonomous… ▽ More Data for training learning-enabled self-driving cars in the physical world are typically collected in a safe, normal environment. Such data distribution often engenders a strong bias towards safe driving, making self-driving cars unprepared when encountering adversarial scenarios like unexpected accidents. Due to a dearth of such adverse data that is unrealistic for drivers to collect, autonomous vehicles can perform poorly when experiencing such rare events. This work addresses much-needed research by having participants drive a VR vehicle simulator going through simulated traffic with various types of accidental scenarios. It aims to understand human responses and behaviors in simulated accidents, contributing to our understanding of driving dynamics and safety. The simulation framework adopts a robust traffic simulation and is rendered using the Unity Game Engine. Furthermore, the simulation framework is built with portable, light-weight immersive driving simulator hardware, lowering the resource barrier for studies in autonomous driving research. Keywords: Rare Events, Traffic Simulation, Autonomous Driving, Virtual Reality, User Studies △ Less

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.08818 [pdf]

MAGNET: Improving the Multilingual Fairness of Language Models with Adaptive Gradient-Based Tokenization

Authors: Orevaoghene Ahia, Sachin Kumar, Hila Gonen, Valentin Hoffman, Tomasz Limisiewicz, Yulia Tsvetkov, Noah A. Smith

Abstract: In multilingual settings, non-Latin scripts and low-resource languages are usually disadvantaged in terms of language models' utility, efficiency, and cost. Specifically, previous studies have reported multiple modeling biases that the current tokenization algorithms introduce to non-Latin script languages, the main one being over-segmentation. In this work, we propose MAGNET; multilingual adaptiv… ▽ More In multilingual settings, non-Latin scripts and low-resource languages are usually disadvantaged in terms of language models' utility, efficiency, and cost. Specifically, previous studies have reported multiple modeling biases that the current tokenization algorithms introduce to non-Latin script languages, the main one being over-segmentation. In this work, we propose MAGNET; multilingual adaptive gradient-based tokenization to reduce over-segmentation via adaptive gradient-based subword tokenization. MAGNET learns to predict segment boundaries between byte tokens in a sequence via sub-modules within the model, which act as internal boundary predictors (tokenizers). Previous gradient-based tokenization methods aimed for uniform compression across sequences by integrating a single boundary predictor during training and optimizing it end-to-end through stochastic reparameterization alongside the next token prediction objective. However, this approach still results in over-segmentation for non-Latin script languages in multilingual settings. In contrast, MAGNET offers a customizable architecture where byte-level sequences are routed through language-script-specific predictors, each optimized for its respective language script. This modularity enforces equitable segmentation granularity across different language scripts compared to previous methods. Through extensive experiments, we demonstrate that in addition to reducing segmentation disparities, MAGNET also enables faster language modelling and improves downstream utility. △ Less

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2407.08726 [pdf, other]

Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data

Authors: Cherie Ho, Jiaye Zou, Omar Alama, Sai Mitheran Jagadesh Kumar, Benjamin Chiang, Taneesh Gupta, Chen Wang, Nikhil Keetha, Katia Sycara, Sebastian Scherer

Abstract: Top-down Bird's Eye View (BEV) maps are a popular representation for ground robot navigation due to their richness and flexibility for downstream tasks. While recent methods have shown promise for predicting BEV maps from First-Person View (FPV) images, their generalizability is limited to small regions captured by current autonomous vehicle-based datasets. In this context, we show that a more sca… ▽ More Top-down Bird's Eye View (BEV) maps are a popular representation for ground robot navigation due to their richness and flexibility for downstream tasks. While recent methods have shown promise for predicting BEV maps from First-Person View (FPV) images, their generalizability is limited to small regions captured by current autonomous vehicle-based datasets. In this context, we show that a more scalable approach towards generalizable map prediction can be enabled by using two large-scale crowd-sourced mapping platforms, Mapillary for FPV images and OpenStreetMap for BEV semantic maps. We introduce Map It Anywhere (MIA), a data engine that enables seamless curation and modeling of labeled map prediction data from existing open-source map platforms. Using our MIA data engine, we display the ease of automatically collecting a dataset of 1.2 million pairs of FPV images & BEV maps encompassing diverse geographies, landscapes, environmental factors, camera models & capture scenarios. We further train a simple camera model-agnostic model on this data for BEV map prediction. Extensive evaluations using established benchmarks and our dataset show that the data curated by MIA enables effective pretraining for generalizable BEV map prediction, with zero-shot performance far exceeding baselines trained on existing datasets by 35%. Our analysis highlights the promise of using large-scale public maps for developing & testing generalizable BEV perception, paving the way for more robust autonomous navigation. △ Less

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2407.08655 [pdf, other]

SPOCKMIP: Segmentation of Vessels in MRAs with Enhanced Continuity using Maximum Intensity Projection as Loss

Authors: Chethan Radhakrishna, Karthikesh Varma Chintalapati, Sri Chandana Hudukula Ram Kumar, Raviteja Sutrave, Hendrik Mattern, Oliver Speck, Andreas Nürnberger, Soumick Chatterjee

Abstract: Identification of vessel structures of different sizes in biomedical images is crucial in the diagnosis of many neurodegenerative diseases. However, the sparsity of good-quality annotations of such images makes the task of vessel segmentation challenging. Deep learning offers an efficient way to segment vessels of different sizes by learning their high-level feature representations and the spatial… ▽ More Identification of vessel structures of different sizes in biomedical images is crucial in the diagnosis of many neurodegenerative diseases. However, the sparsity of good-quality annotations of such images makes the task of vessel segmentation challenging. Deep learning offers an efficient way to segment vessels of different sizes by learning their high-level feature representations and the spatial continuity of such features across dimensions. Semi-supervised patch-based approaches have been effective in identifying small vessels of one to two voxels in diameter. This study focuses on improving the segmentation quality by considering the spatial correlation of the features using the Maximum Intensity Projection~(MIP) as an additional loss criterion. Two methods are proposed with the incorporation of MIPs of label segmentation on the single~(z-axis) and multiple perceivable axes of the 3D volume. The proposed MIP-based methods produce segmentations with improved vessel continuity, which is evident in visual examinations of ROIs. Patch-based training is improved by introducing an additional loss term, MIP loss, to penalise the predicted discontinuity of vessels. A training set of 14 volumes is selected from the StudyForrest dataset comprising of 18 7-Tesla 3D Time-of-Flight~(ToF) Magnetic Resonance Angiography (MRA) images. The generalisation performance of the method is evaluated using the other unseen volumes in the dataset. It is observed that the proposed method with multi-axes MIP loss produces better quality segmentations with a median Dice of $80.245 \pm 0.129$. Also, the method with single-axis MIP loss produces segmentations with a median Dice of $79.749 \pm 0.109$. Furthermore, a visual comparison of the ROIs in the predicted segmentation reveals a significant improvement in the continuity of the vessels when MIP loss is incorporated into training. △ Less

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2407.07786 [pdf, ps, other]

The Human Factor in AI Red Teaming: Perspectives from Social and Collaborative Computing

Authors: Alice Qian Zhang, Ryland Shaw, Jacy Reese Anthis, Ashlee Milton, Emily Tseng, Jina Suh, Lama Ahmad, Ram Shankar Siva Kumar, Julian Posada, Benjamin Shestakofsky, Sarah T. Roberts, Mary L. Gray

Abstract: Rapid progress in general-purpose AI has sparked significant interest in "red teaming," a practice of adversarial testing originating in military and cybersecurity applications. AI red teaming raises many questions about the human factor, such as how red teamers are selected, biases and blindspots in how tests are conducted, and harmful content's psychological effects on red teamers. A growing bod… ▽ More Rapid progress in general-purpose AI has sparked significant interest in "red teaming," a practice of adversarial testing originating in military and cybersecurity applications. AI red teaming raises many questions about the human factor, such as how red teamers are selected, biases and blindspots in how tests are conducted, and harmful content's psychological effects on red teamers. A growing body of HCI and CSCW literature examines related practices-including data labeling, content moderation, and algorithmic auditing. However, few, if any, have investigated red teaming itself. This workshop seeks to consider the conceptual and empirical challenges associated with this practice, often rendered opaque by non-disclosure agreements. Future studies may explore topics ranging from fairness to mental health and other areas of potential harm. We aim to facilitate a community of researchers and practitioners who can begin to meet these challenges with creativity, innovation, and thoughtful reflection. △ Less

Submitted 10 July, 2024; originally announced July 2024.

Comments: Workshop proposal accepted to CSCW 2024

arXiv:2407.07128 [pdf, other]

Modularity aided consistent attributed graph clustering via coarsening

Authors: Samarth Bhatia, Yukti Makhija, Manoj Kumar, Sandeep Kumar

Abstract: Graph clustering is an important unsupervised learning technique for partitioning graphs with attributes and detecting communities. However, current methods struggle to accurately capture true community structures and intra-cluster relations, be computationally efficient, and identify smaller communities. We address these challenges by integrating coarsening and modularity maximization, effectivel… ▽ More Graph clustering is an important unsupervised learning technique for partitioning graphs with attributes and detecting communities. However, current methods struggle to accurately capture true community structures and intra-cluster relations, be computationally efficient, and identify smaller communities. We address these challenges by integrating coarsening and modularity maximization, effectively leveraging both adjacency and node features to enhance clustering accuracy. We propose a loss function incorporating log-determinant, smoothness, and modularity components using a block majorization-minimization technique, resulting in superior clustering outcomes. The method is theoretically consistent under the Degree-Corrected Stochastic Block Model (DC-SBM), ensuring asymptotic error-free performance and complete label recovery. Our provably convergent and time-efficient algorithm seamlessly integrates with graph neural networks (GNNs) and variational graph autoencoders (VGAEs) to learn enhanced node features and deliver exceptional clustering performance. Extensive experiments on benchmark datasets demonstrate its superiority over existing state-of-the-art methods for both attributed and non-attributed graphs. △ Less

Submitted 9 July, 2024; originally announced July 2024.

Comments: The first two authors contributed equally to this work

arXiv:2407.06868 [pdf, other]

Energy Efficient Fair STAR-RIS for Mobile Users

Authors: Ashok S. Kumar, Nancy Nayak, Sheetal Kalyani, Himal A. Suraweera

Abstract: In this work, we propose a method to improve the energy efficiency and fairness of simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RIS) for mobile users, ensuring reduced power consumption while maintaining reliable communication. To achieve this, we introduce a new parameter known as the subsurface assignment variable, which determines the number of STAR-RIS e… ▽ More In this work, we propose a method to improve the energy efficiency and fairness of simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RIS) for mobile users, ensuring reduced power consumption while maintaining reliable communication. To achieve this, we introduce a new parameter known as the subsurface assignment variable, which determines the number of STAR-RIS elements allocated to each user. We then formulate a novel optimization problem by concurrently optimizing the phase shifts of the STAR-RIS and subsurface assignment variable. We leverage the deep reinforcement learning (DRL) technique to address this optimization problem. The DRL model predicts the phase shifts of the STAR-RIS and efficiently allocates elements of STAR-RIS to the users. Additionally, we incorporate a penalty term in the DRL model to facilitate intelligent deactivation of STAR-RIS elements when not in use to enhance energy efficiency. Through extensive experiments, we show that the proposed method can achieve fairly high and nearly equal data rates for all users in both the transmission and reflection spaces in an energy-efficient manner. △ Less

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.04444 [pdf, other]

TokenVerse: Unifying Speech and NLP Tasks via Transducer-based ASR

Authors: Shashi Kumar, Srikanth Madikeri, Juan Zuluaga-Gomez, Iuliia Nigmatulina, Esaú Villatoro-Tello, Sergio Burdisso, Petr Motlicek, Karthik Pandia, Aravind Ganapathiraju

Abstract: In traditional conversational intelligence from speech, a cascaded pipeline is used, involving tasks such as voice activity detection, diarization, transcription, and subsequent processing with different NLP models for tasks like semantic endpointing and named entity recognition (NER). Our paper introduces TokenVerse, a single Transducer-based model designed to handle multiple tasks. This is achie… ▽ More In traditional conversational intelligence from speech, a cascaded pipeline is used, involving tasks such as voice activity detection, diarization, transcription, and subsequent processing with different NLP models for tasks like semantic endpointing and named entity recognition (NER). Our paper introduces TokenVerse, a single Transducer-based model designed to handle multiple tasks. This is achieved by integrating task-specific tokens into the reference text during ASR model training, streamlining the inference and eliminating the need for separate NLP models. In addition to ASR, we conduct experiments on 3 different tasks: speaker change detection, endpointing, and NER. Our experiments on a public and a private dataset show that the proposed method improves ASR by up to 7.7% in relative WER while outperforming the cascaded pipeline approach in individual task performance. Additionally, we present task transfer learning to a new task within an existing TokenVerse. △ Less

Submitted 5 July, 2024; originally announced July 2024.

Comments: 5 pages, double column

arXiv:2407.04332 [pdf]

Energy Efficient Knapsack Optimization Using Probabilistic Memristor Crossbars

Authors: Jinzhan Li, Suhas Kumar, Su-in Yi

Abstract: Constrained optimization underlies crucial societal problems (for instance, stock trading and bandwidth allocation), but is often computationally hard (complexity grows exponentially with problem size). The big-data era urgently demands low-latency and low-energy optimization at the edge, which cannot be handled by digital processors due to their non-parallel von Neumann architecture. Recent effor… ▽ More Constrained optimization underlies crucial societal problems (for instance, stock trading and bandwidth allocation), but is often computationally hard (complexity grows exponentially with problem size). The big-data era urgently demands low-latency and low-energy optimization at the edge, which cannot be handled by digital processors due to their non-parallel von Neumann architecture. Recent efforts using massively parallel hardware (such as memristor crossbars and quantum processors) employing annealing algorithms, while promising, have handled relatively easy and stable problems with sparse or binary representations (such as the max-cut or traveling salesman problems).However, most real-world applications embody three features, which are encoded in the knapsack problem, and cannot be handled by annealing algorithms - dense and non-binary representations, with destabilizing self-feedback. Here we demonstrate a post-digital-hardware-friendly randomized competitive Ising-inspired (RaCI) algorithm performing knapsack optimization, experimentally implemented on a foundry-manufactured CMOS-integrated probabilistic analog memristor crossbar. Our solution outperforms digital and quantum approaches by over 4 orders of magnitude in energy efficiency. △ Less

Submitted 5 July, 2024; originally announced July 2024.

Comments: 16 pages, 8 figures

arXiv:2407.04087 [pdf, other]

Advanced Artificial Intelligence Strategy for Optimizing Urban Rail Network Design using Nature-Inspired Algorithms

Authors: Hariram Sampath Kumar, Archana Singh, Manish Kumar Ojha

Abstract: This study introduces an innovative methodology for the planning of metro network routes within the urban environment of Chennai, Tamil Nadu, India. A comparative analysis of the modified Ant Colony Optimization (ACO) method (previously developed) with recent breakthroughs in nature-inspired algorithms demonstrates the modified ACO's superiority over modern techniques. By utilizing the modified AC… ▽ More This study introduces an innovative methodology for the planning of metro network routes within the urban environment of Chennai, Tamil Nadu, India. A comparative analysis of the modified Ant Colony Optimization (ACO) method (previously developed) with recent breakthroughs in nature-inspired algorithms demonstrates the modified ACO's superiority over modern techniques. By utilizing the modified ACO algorithm, the most efficient routes connecting the origin and destination of the metro route are generated. Additionally, the model is applied to the existing metro network to highlight variations between the model's results and the current network. The Google Maps platform, integrated with Python, handles real-time data, including land utilization, Geographical Information Systems (GIS) data, census information, and points of interest. This processing enables the identification of stops within the city and along the chosen routes. The resulting metro network showcases substantial benefits compared to conventional route planning methods, with noteworthy enhancements in workforce productivity, decreased planning time, and cost-efficiency. This study significantly enhances the efficiency of urban transport systems, specifically in rapidly changing metropolitan settings such as chennai. △ Less

Submitted 4 July, 2024; originally announced July 2024.

Comments: 10 pages, 17 figures

arXiv:2407.04053 [pdf, other]

Edge AI: A Taxonomy, Systematic Review and Future Directions

Authors: Sukhpal Singh Gill, Muhammed Golec, Jianmin Hu, Minxian Xu, Junhui Du, Huaming Wu, Guneet Kaur Walia, Subramaniam Subramanian Murugesan, Babar Ali, Mohit Kumar, Kejiang Ye, Prabal Verma, Surendra Kumar, Felix Cuadrado, Steve Uhlig

Abstract: Edge Artificial Intelligence (AI) incorporates a network of interconnected systems and devices that receive, cache, process, and analyse data in close communication with the location where the data is captured with AI technology. Recent advancements in AI efficiency, the widespread use of Internet of Things (IoT) devices, and the emergence of edge computing have unlocked the enormous scope of Edge… ▽ More Edge Artificial Intelligence (AI) incorporates a network of interconnected systems and devices that receive, cache, process, and analyse data in close communication with the location where the data is captured with AI technology. Recent advancements in AI efficiency, the widespread use of Internet of Things (IoT) devices, and the emergence of edge computing have unlocked the enormous scope of Edge AI. The goal of Edge AI is to optimize data processing efficiency and velocity while ensuring data confidentiality and integrity. Despite being a relatively new field of research, spanning from 2014 to the present, it has shown significant and rapid development over the last five years. In this article, we present a systematic literature review for Edge AI to discuss the existing research, recent advancements, and future research directions. We created a collaborative edge AI learning system for cloud and edge computing analysis, including an in-depth study of the architectures that facilitate this mechanism. The taxonomy for Edge AI facilitates the classification and configuration of Edge AI systems while also examining its potential influence across many fields through compassing infrastructure, cloud computing, fog computing, services, use cases, ML and deep learning, and resource management. This study highlights the significance of Edge AI in processing real-time data at the edge of the network. Additionally, it emphasizes the research challenges encountered by Edge AI systems, including constraints on resources, vulnerabilities to security threats, and problems with scalability. Finally, this study highlights the potential future research directions that aim to address the current limitations of Edge AI by providing innovative solutions. △ Less

Submitted 4 July, 2024; originally announced July 2024.

Comments: Preprint Version, 18 Figures

arXiv:2407.03152 [pdf, other]

Stereo Risk: A Continuous Modeling Approach to Stereo Matching

Authors: Ce Liu, Suryansh Kumar, Shuhang Gu, Radu Timofte, Yao Yao, Luc Van Gool

Abstract: We introduce Stereo Risk, a new deep-learning approach to solve the classical stereo-matching problem in computer vision. As it is well-known that stereo matching boils down to a per-pixel disparity estimation problem, the popular state-of-the-art stereo-matching approaches widely rely on regressing the scene disparity values, yet via discretization of scene disparity values. Such discretization o… ▽ More We introduce Stereo Risk, a new deep-learning approach to solve the classical stereo-matching problem in computer vision. As it is well-known that stereo matching boils down to a per-pixel disparity estimation problem, the popular state-of-the-art stereo-matching approaches widely rely on regressing the scene disparity values, yet via discretization of scene disparity values. Such discretization often fails to capture the nuanced, continuous nature of scene depth. Stereo Risk departs from the conventional discretization approach by formulating the scene disparity as an optimal solution to a continuous risk minimization problem, hence the name "stereo risk". We demonstrate that $L^1$ minimization of the proposed continuous risk function enhances stereo-matching performance for deep networks, particularly for disparities with multi-modal probability distributions. Furthermore, to enable the end-to-end network training of the non-differentiable $L^1$ risk optimization, we exploited the implicit function theorem, ensuring a fully differentiable network. A comprehensive analysis demonstrates our method's theoretical soundness and superior performance over the state-of-the-art methods across various benchmark datasets, including KITTI 2012, KITTI 2015, ETH3D, SceneFlow, and Middlebury 2014. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: Accepted as an Oral Paper at ICML 2024. Draft info: 18 pages, 6 Figure, 16 Tables

arXiv:2407.02929 [pdf, other]

A Hybrid Reactive Routing Protocol for Decentralized UAV Networks

Authors: Shivam Garg, Alexander Ihler, Elizabeth Serena Bentley, Sunil Kumar

Abstract: Wireless networks consisting of low SWaP, FW-UAVs are used in many applications, such as monitoring, search and surveillance of inaccessible areas. A decentralized and autonomous approach ensures robustness to failures; the UAVs explore and sense within the area and forward their information, in a multihop manner, to nearby aerial gateway nodes. However, the unpredictable nature of the events, rel… ▽ More Wireless networks consisting of low SWaP, FW-UAVs are used in many applications, such as monitoring, search and surveillance of inaccessible areas. A decentralized and autonomous approach ensures robustness to failures; the UAVs explore and sense within the area and forward their information, in a multihop manner, to nearby aerial gateway nodes. However, the unpredictable nature of the events, relatively high speed of UAVs, and dynamic UAV trajectories cause the network topology to change significantly over time, resulting in frequent route breaks. A holistic routing approach is needed to support multiple traffic flows in these networks to provide mobility- and congestion-aware, high-quality routes when needed, with low control and computational overheads, using the information collected in a distributed manner. Existing routing schemes do not address all the mentioned issues. We present a hybrid reactive routing protocol for decentralized UAV networks. Our scheme searches routes on-demand, monitors a region around the selected route (the pipe), and proactively switches to an alternative route before the current route's quality degrades below a threshold. We empirically evaluate the impact of pipe width and node density on our ability to find alternate high-quality routes within the pipe and the overhead required to maintain the pipe. Compared to existing reactive routing schemes, our approach achieves higher throughput and reduces the number of route discoveries, overhead, and resulting flow interruptions at different traffic loads, node densities and speeds. Despite having limited network topology information, and low overhead and route computation complexity, our proposed scheme achieves superior throughput to proactive optimized link state routing scheme at different network and traffic settings. We also evaluate the relative performance of reactive and proactive routing schemes. △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.02662 [pdf, other]

Supporters and Skeptics: LLM-based Analysis of Engagement with Mental Health (Mis)Information Content on Video-sharing Platforms

Authors: Viet Cuong Nguyen, Mini Jain, Abhijat Chauhan, Heather Jaime Soled, Santiago Alvarez Lesmes, Zihang Li, Michael L. Birnbaum, Sunny X. Tang, Srijan Kumar, Munmun De Choudhury

Abstract: Over one in five adults in the US lives with a mental illness. In the face of a shortage of mental health professionals and offline resources, online short-form video content has grown to serve as a crucial conduit for disseminating mental health help and resources. However, the ease of content creation and access also contributes to the spread of misinformation, posing risks to accurate diagnosis… ▽ More Over one in five adults in the US lives with a mental illness. In the face of a shortage of mental health professionals and offline resources, online short-form video content has grown to serve as a crucial conduit for disseminating mental health help and resources. However, the ease of content creation and access also contributes to the spread of misinformation, posing risks to accurate diagnosis and treatment. Detecting and understanding engagement with such content is crucial to mitigating their harmful effects on public health. We perform the first quantitative study of the phenomenon using YouTube Shorts and Bitchute as the sites of study. We contribute MentalMisinfo, a novel labeled mental health misinformation (MHMisinfo) dataset of 739 videos (639 from Youtube and 100 from Bitchute) and 135372 comments in total, using an expert-driven annotation schema. We first found that few-shot in-context learning with large language models (LLMs) are effective in detecting MHMisinfo videos. Next, we discover distinct and potentially alarming linguistic patterns in how audiences engage with MHMisinfo videos through commentary on both video-sharing platforms. Across the two platforms, comments could exacerbate prevailing stigma with some groups showing heightened susceptibility to and alignment with MHMisinfo. We discuss technical and public health-driven adaptive solutions to tackling the "epidemic" of mental health misinformation online. △ Less

Submitted 2 July, 2024; originally announced July 2024.

Comments: 12 pages, in submission to ICWSM

arXiv:2406.18848 [pdf, other]

Temporally Multi-Scale Sparse Self-Attention for Physical Activity Data Imputation

Authors: Hui Wei, Maxwell A. Xu, Colin Samplawski, James M. Rehg, Santosh Kumar, Benjamin M. Marlin

Abstract: Wearable sensors enable health researchers to continuously collect data pertaining to the physiological state of individuals in real-world settings. However, such data can be subject to extensive missingness due to a complex combination of factors. In this work, we study the problem of imputation of missing step count data, one of the most ubiquitous forms of wearable sensor data. We construct a n… ▽ More Wearable sensors enable health researchers to continuously collect data pertaining to the physiological state of individuals in real-world settings. However, such data can be subject to extensive missingness due to a complex combination of factors. In this work, we study the problem of imputation of missing step count data, one of the most ubiquitous forms of wearable sensor data. We construct a novel and large scale data set consisting of a training set with over 3 million hourly step count observations and a test set with over 2.5 million hourly step count observations. We propose a domain knowledge-informed sparse self-attention model for this task that captures the temporal multi-scale nature of step-count data. We assess the performance of the model relative to baselines and conduct ablation studies to verify our specific model designs. △ Less

Submitted 26 June, 2024; originally announced June 2024.

Comments: Accepted by Conference on Health, Inference, and Learning (CHIL) 2024

arXiv:2406.18510 [pdf, other]

WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models

Authors: Liwei Jiang, Kavel Rao, Seungju Han, Allyson Ettinger, Faeze Brahman, Sachin Kumar, Niloofar Mireshghallah, Ximing Lu, Maarten Sap, Yejin Choi, Nouha Dziri

Abstract: We introduce WildTeaming, an automatic LLM safety red-teaming framework that mines in-the-wild user-chatbot interactions to discover 5.7K unique clusters of novel jailbreak tactics, and then composes multiple tactics for systematic exploration of novel jailbreaks. Compared to prior work that performed red-teaming via recruited human workers, gradient-based optimization, or iterative revision with… ▽ More We introduce WildTeaming, an automatic LLM safety red-teaming framework that mines in-the-wild user-chatbot interactions to discover 5.7K unique clusters of novel jailbreak tactics, and then composes multiple tactics for systematic exploration of novel jailbreaks. Compared to prior work that performed red-teaming via recruited human workers, gradient-based optimization, or iterative revision with LLMs, our work investigates jailbreaks from chatbot users who were not specifically instructed to break the system. WildTeaming reveals previously unidentified vulnerabilities of frontier LLMs, resulting in up to 4.6x more diverse and successful adversarial attacks compared to state-of-the-art jailbreak methods. While many datasets exist for jailbreak evaluation, very few open-source datasets exist for jailbreak training, as safety training data has been closed even when model weights are open. With WildTeaming we create WildJailbreak, a large-scale open-source synthetic safety dataset with 262K vanilla (direct request) and adversarial (complex jailbreak) prompt-response pairs. To mitigate exaggerated safety behaviors, WildJailbreak provides two contrastive types of queries: 1) harmful queries (vanilla & adversarial) and 2) benign queries that resemble harmful queries in form but contain no harm. As WildJailbreak considerably upgrades the quality and scale of existing safety resources, it uniquely enables us to examine the scaling effects of data and the interplay of data properties and model capabilities during safety training. Through extensive experiments, we identify the training properties that enable an ideal balance of safety behaviors: appropriate safeguarding without over-refusal, effective handling of vanilla and adversarial queries, and minimal, if any, decrease in general capabilities. All components of WildJailbeak contribute to achieving balanced safety behaviors of models. △ Less

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.17968 [pdf, other]

Efficient Document Ranking with Learnable Late Interactions

Authors: Ziwei Ji, Himanshu Jain, Andreas Veit, Sashank J. Reddi, Sadeep Jayasumana, Ankit Singh Rawat, Aditya Krishna Menon, Felix Yu, Sanjiv Kumar

Abstract: Cross-Encoder (CE) and Dual-Encoder (DE) models are two fundamental approaches for query-document relevance in information retrieval. To predict relevance, CE models use joint query-document embeddings, while DE models maintain factorized query and document embeddings; usually, the former has higher quality while the latter benefits from lower latency. Recently, late-interaction models have been p… ▽ More Cross-Encoder (CE) and Dual-Encoder (DE) models are two fundamental approaches for query-document relevance in information retrieval. To predict relevance, CE models use joint query-document embeddings, while DE models maintain factorized query and document embeddings; usually, the former has higher quality while the latter benefits from lower latency. Recently, late-interaction models have been proposed to realize more favorable latency-quality tradeoffs, by using a DE structure followed by a lightweight scorer based on query and document token embeddings. However, these lightweight scorers are often hand-crafted, and there is no understanding of their approximation power; further, such scorers require access to individual document token embeddings, which imposes an increased latency and storage burden. In this paper, we propose novel learnable late-interaction models (LITE) that resolve these issues. Theoretically, we prove that LITE is a universal approximator of continuous scoring functions, even for relatively small embedding dimension. Empirically, LITE outperforms previous late-interaction models such as ColBERT on both in-domain and zero-shot re-ranking tasks. For instance, experiments on MS MARCO passage re-ranking show that LITE not only yields a model with better generalization, but also lowers latency and requires 0.25x storage compared to ColBERT. △ Less

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.17963 [pdf, other]

Empowering Interdisciplinary Insights with Dynamic Graph Embedding Trajectories

Authors: Yiqiao Jin, Andrew Zhao, Yeon-Chang Lee, Meng Ye, Ajay Divakaran, Srijan Kumar

Abstract: We developed DyGETViz, a novel framework for effectively visualizing dynamic graphs (DGs) that are ubiquitous across diverse real-world systems. This framework leverages recent advancements in discrete-time dynamic graph (DTDG) models to adeptly handle the temporal dynamics inherent in dynamic graphs. DyGETViz effectively captures both micro- and macro-level structural shifts within these graphs,… ▽ More We developed DyGETViz, a novel framework for effectively visualizing dynamic graphs (DGs) that are ubiquitous across diverse real-world systems. This framework leverages recent advancements in discrete-time dynamic graph (DTDG) models to adeptly handle the temporal dynamics inherent in dynamic graphs. DyGETViz effectively captures both micro- and macro-level structural shifts within these graphs, offering a robust method for representing complex and massive dynamic graphs. The application of DyGETViz extends to a diverse array of domains, including ethology, epidemiology, finance, genetics, linguistics, communication studies, social studies, and international relations. Through its implementation, DyGETViz has revealed or confirmed various critical insights. These include the diversity of content sharing patterns and the degree of specialization within online communities, the chronological evolution of lexicons across decades, and the distinct trajectories exhibited by aging-related and non-related genes. Importantly, DyGETViz enhances the accessibility of scientific findings to non-domain experts by simplifying the complexities of dynamic graphs. Our framework is released as an open-source Python package for use across diverse disciplines. Our work not only addresses the ongoing challenges in visualizing and analyzing DTDG models but also establishes a foundational framework for future investigations into dynamic graph representation and analysis across various disciplines. △ Less

Submitted 28 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

Comments: 27 pages, 11 figures

arXiv:2406.17641 [pdf]

The experience of humans' and robots' mutual (im)politeness in enacted service scenarios: An empirical study

Authors: Victor Kaptelinin, Suna Bensch, Thomas Hellström, Patrik Björnfot, Shikhar Kumar

Abstract: The paper reports an empirical study of the effect of human treatment of a robot on the social perception of the robot's behavior. The study employed an enacted interaction between an anthropomorphic "waiter" robot and two customers. The robot and one of the customers (acted out by a researcher) were following four different interaction scripts, representing all combinations of mutual politeness a… ▽ More The paper reports an empirical study of the effect of human treatment of a robot on the social perception of the robot's behavior. The study employed an enacted interaction between an anthropomorphic "waiter" robot and two customers. The robot and one of the customers (acted out by a researcher) were following four different interaction scripts, representing all combinations of mutual politeness and impoliteness of the robot and the customer. The participants (N=24, within-subject design) were assigned the role of an "included observer", that is, a fellow customer who was present in the situation without being actively involved in the interactions. The participants assessed how they experienced the interaction scenarios by providing Likert scale scores and free-text responses. The results indicate that while impolite robots' behavior was generally assessed negatively, it was commonly perceived as more justifiable and fairer if the robot was treated impolitely by the human. Politeness reciprocity expectations in the context of the social perception of robots are discussed. △ Less

Submitted 25 June, 2024; originally announced June 2024.

Comments: 19 pages, 5 figures, 7 tables

arXiv:2406.13715 [pdf, other]

Converging Dimensions: Information Extraction and Summarization through Multisource, Multimodal, and Multilingual Fusion

Authors: Pranav Janjani, Mayank Palan, Sarvesh Shirude, Ninad Shegokar, Sunny Kumar, Faruk Kazi

Abstract: Recent advances in large language models (LLMs) have led to new summarization strategies, offering an extensive toolkit for extracting important information. However, these approaches are frequently limited by their reliance on isolated sources of data. The amount of information that can be gathered is limited and covers a smaller range of themes, which introduces the possibility of falsified cont… ▽ More Recent advances in large language models (LLMs) have led to new summarization strategies, offering an extensive toolkit for extracting important information. However, these approaches are frequently limited by their reliance on isolated sources of data. The amount of information that can be gathered is limited and covers a smaller range of themes, which introduces the possibility of falsified content and limited support for multilingual and multimodal data. The paper proposes a novel approach to summarization that tackles such challenges by utilizing the strength of multiple sources to deliver a more exhaustive and informative understanding of intricate topics. The research progresses beyond conventional, unimodal sources such as text documents and integrates a more diverse range of data, including YouTube playlists, pre-prints, and Wikipedia pages. The aforementioned varied sources are then converted into a unified textual representation, enabling a more holistic analysis. This multifaceted approach to summary generation empowers us to extract pertinent information from a wider array of sources. The primary tenet of this approach is to maximize information gain while minimizing information overlap and maintaining a high level of informativeness, which encourages the generation of highly coherent summaries. △ Less

Submitted 19 June, 2024; originally announced June 2024.

Comments: 11 pages, 3 figures

arXiv:2406.12405 [pdf]

On The Effective Rate and Error Rate Analysis over Fluctuating Nakagami-m Fading Channel

Authors: Manpreet Kaur, Puspraj Singh Chauhan, Sandeep Kumar, Pappu Kumar Verma

Abstract: This paper provides a detailed analysis of the important performance metrics like effective capacity and symbol error rate over fluctuating Nakagami-m fading channel. This distribution is obtained from the ratio of two random variables, following the Nakagami-m distribution and the uniform distribution. Our study derives exact analytical expressions for the EC and SER under different modulation sc… ▽ More This paper provides a detailed analysis of the important performance metrics like effective capacity and symbol error rate over fluctuating Nakagami-m fading channel. This distribution is obtained from the ratio of two random variables, following the Nakagami-m distribution and the uniform distribution. Our study derives exact analytical expressions for the EC and SER under different modulation schemes, considering the effect of channel parameters. Recognising the importance of additive Laplacian noise in today scenario, it has been considered for the error performance analysis of the system. This work may be utilised for the design and optimization of the systems operating in environments characterized by fluctuating Nakagami-m fading. △ Less

Submitted 18 June, 2024; originally announced June 2024.

Comments: 18 pages

arXiv:2406.12274 [pdf, other]

SafeInfer: Context Adaptive Decoding Time Safety Alignment for Large Language Models

Authors: Somnath Banerjee, Soham Tripathy, Sayan Layek, Shanu Kumar, Animesh Mukherjee, Rima Hazra

Abstract: Safety-aligned language models often exhibit fragile and imbalanced safety mechanisms, increasing the likelihood of generating unsafe content. In addition, incorporating new knowledge through editing techniques to language models can further compromise safety. To address these issues, we propose SafeInfer, a context-adaptive, decoding-time safety alignment strategy for generating safe responses to… ▽ More Safety-aligned language models often exhibit fragile and imbalanced safety mechanisms, increasing the likelihood of generating unsafe content. In addition, incorporating new knowledge through editing techniques to language models can further compromise safety. To address these issues, we propose SafeInfer, a context-adaptive, decoding-time safety alignment strategy for generating safe responses to user queries. SafeInfer comprises two phases: the safety amplification phase, which employs safe demonstration examples to adjust the model's hidden states and increase the likelihood of safer outputs, and the safety-guided decoding phase, which influences token selection based on safety-optimized distributions, ensuring the generated content complies with ethical guidelines. Further, we present HarmEval, a novel benchmark for extensive safety evaluations, designed to address potential misuse scenarios in accordance with the policies of leading AI tech giants. △ Less

Submitted 18 June, 2024; originally announced June 2024.

Comments: Under review

arXiv:2406.11768 [pdf, other]

GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities

Authors: Sreyan Ghosh, Sonal Kumar, Ashish Seth, Chandra Kiran Reddy Evuru, Utkarsh Tyagi, S Sakshi, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha

Abstract: Perceiving and understanding non-speech sounds and non-verbal speech is essential to making decisions that help us interact with our surroundings. In this paper, we propose GAMA, a novel General-purpose Large Audio-Language Model (LALM) with Advanced Audio Understanding and Complex Reasoning Abilities. We build GAMA by integrating an LLM with multiple types of audio representations, including feat… ▽ More Perceiving and understanding non-speech sounds and non-verbal speech is essential to making decisions that help us interact with our surroundings. In this paper, we propose GAMA, a novel General-purpose Large Audio-Language Model (LALM) with Advanced Audio Understanding and Complex Reasoning Abilities. We build GAMA by integrating an LLM with multiple types of audio representations, including features from a custom Audio Q-Former, a multi-layer aggregator that aggregates features from multiple layers of an audio encoder. We fine-tune GAMA on a large-scale audio-language dataset, which augments it with audio understanding capabilities. Next, we propose CompA-R (Instruction-Tuning for Complex Audio Reasoning), a synthetically generated instruction-tuning (IT) dataset with instructions that require the model to perform complex reasoning on the input audio. We instruction-tune GAMA with CompA-R to endow it with complex reasoning abilities, where we further add a soft prompt as input with high-level semantic evidence by leveraging event tags of the input audio. Finally, we also propose CompA-R-test, a human-labeled evaluation dataset for evaluating the capabilities of LALMs on open-ended audio question-answering that requires complex reasoning. Through automated and expert human evaluations, we show that GAMA outperforms all other LALMs in literature on diverse audio understanding tasks by margins of 1%-84%. Further, GAMA IT-ed on CompA-R proves to be superior in its complex reasoning and instruction following capabilities. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: Project Website: https://sreyan88.github.io/gamaaudio/

arXiv:2406.09443 [pdf, other]

Comparative Analysis of Personalized Voice Activity Detection Systems: Assessing Real-World Effectiveness

Authors: Satyam Kumar, Sai Srujana Buddi, Utkarsh Oggy Sarawgi, Vineet Garg, Shivesh Ranjan, Ognjen, Rudovic, Ahmed Hussen Abdelaziz, Saurabh Adya

Abstract: Voice activity detection (VAD) is a critical component in various applications such as speech recognition, speech enhancement, and hands-free communication systems. With the increasing demand for personalized and context-aware technologies, the need for effective personalized VAD systems has become paramount. In this paper, we present a comparative analysis of Personalized Voice Activity Detection… ▽ More Voice activity detection (VAD) is a critical component in various applications such as speech recognition, speech enhancement, and hands-free communication systems. With the increasing demand for personalized and context-aware technologies, the need for effective personalized VAD systems has become paramount. In this paper, we present a comparative analysis of Personalized Voice Activity Detection (PVAD) systems to assess their real-world effectiveness. We introduce a comprehensive approach to assess PVAD systems, incorporating various performance metrics such as frame-level and utterance-level error rates, detection latency and accuracy, alongside user-level analysis. Through extensive experimentation and evaluation, we provide a thorough understanding of the strengths and limitations of various PVAD variants. This paper advances the understanding of PVAD technology by offering insights into its efficacy and viability in practical applications using a comprehensive set of metrics. △ Less

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.09167 [pdf, other]

Vision Transformer Segmentation for Visual Bird Sound Denoising

Authors: Sahil Kumar, Jialu Li, Youshan Zhang

Abstract: Audio denoising, especially in the context of bird sounds, remains a challenging task due to persistent residual noise. Traditional and deep learning methods often struggle with artificial or low-frequency noise. In this work, we propose ViTVS, a novel approach that leverages the power of the vision transformer (ViT) architecture. ViTVS adeptly combines segmentation techniques to disentangle clean… ▽ More Audio denoising, especially in the context of bird sounds, remains a challenging task due to persistent residual noise. Traditional and deep learning methods often struggle with artificial or low-frequency noise. In this work, we propose ViTVS, a novel approach that leverages the power of the vision transformer (ViT) architecture. ViTVS adeptly combines segmentation techniques to disentangle clean audio from complex signal mixtures. Our key contributions encompass the development of ViTVS, introducing comprehensive, long-range, and multi-scale representations. These contributions directly tackle the limitations inherent in conventional approaches. Extensive experiments demonstrate that ViTVS outperforms state-of-the-art methods, positioning it as a benchmark solution for real-world bird sound denoising applications. Source code is available at: https://github.com/aiai-4/ViVTS. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: INTERSPEECH 2024

arXiv:2406.07910 [pdf, other]

Demonstration of Safe Electromagnetic Radiation Emitted by 5G Active Antenna Systems

Authors: Sumit Kumar, Chandan Kumar Sheemar, Abdelrahman Astro, Jorge Querol, Symeon Chatzinotas

Abstract: The careful planning and safe deployment of 5G technologies will bring enormous benefits to society and the economy. Higher frequency, beamforming, and small-cells are key technologies that will provide unmatched throughput and seamless connectivity to 5G users. Superficial knowledge of these technologies has raised concerns among the general public about the harmful effects of radiation. Several… ▽ More The careful planning and safe deployment of 5G technologies will bring enormous benefits to society and the economy. Higher frequency, beamforming, and small-cells are key technologies that will provide unmatched throughput and seamless connectivity to 5G users. Superficial knowledge of these technologies has raised concerns among the general public about the harmful effects of radiation. Several standardization bodies are active to put limits on the emissions which are based on a defined set of radiation measurement methodologies. However, due to the peculiarity of 5G such as dynamicity of the beams, network densification, Time Division Duplexing mode of operation, etc, using existing EMF measurement methods may provide inaccurate results. In this context, we discuss our experimental studies aimed towards the measurement of radiation caused by beam-based transmissions from a 5G base station equipped with an Active Antenna System(AAS). We elaborate on the shortcomings of current measurement methodologies and address several open questions. Next, we demonstrate that using user-specific downlink beamforming, not only better performance is achieved compared to non-beamformed downlink, but also the radiation in the vicinity of the intended user is significantly decreased. Further, we show that under weak reception conditions, an uplink transmission can cause significantly high radiation in the vicinity of the user equipment. We believe that our work will help in clearing several misleading concepts about the 5G EMF radiation effects. We conclude the work by providing guidelines to improve the methodology of EMF measurement by considering the spatiotemporal dynamicity of the 5G transmission. △ Less

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.06811 [pdf, other]

Learning Continually by Spectral Regularization

Authors: Alex Lewandowski, Saurabh Kumar, Dale Schuurmans, András György, Marlos C. Machado

Abstract: Loss of plasticity is a phenomenon where neural networks become more difficult to train during the course of learning. Continual learning algorithms seek to mitigate this effect by sustaining good predictive performance while maintaining network trainability. We develop new techniques for improving continual learning by first reconsidering how initialization can ensure trainability during early ph… ▽ More Loss of plasticity is a phenomenon where neural networks become more difficult to train during the course of learning. Continual learning algorithms seek to mitigate this effect by sustaining good predictive performance while maintaining network trainability. We develop new techniques for improving continual learning by first reconsidering how initialization can ensure trainability during early phases of learning. From this perspective, we derive new regularization strategies for continual learning that ensure beneficial initialization properties are better maintained throughout training. In particular, we investigate two new regularization techniques for continual learning: (i) Wasserstein regularization toward the initial weight distribution, which is less restrictive than regularizing toward initial weights; and (ii) regularizing weight matrix singular values, which directly ensures gradient diversity is maintained throughout training. We present an experimental analysis that shows these alternative regularizers can improve continual learning performance across a range of supervised learning tasks and model architectures. The alternative regularizers prove to be less sensitive to hyperparameters while demonstrating better training in individual tasks, sustaining trainability as new tasks arrive, and achieving better generalization performance. △ Less

Submitted 10 June, 2024; originally announced June 2024.

Showing 1–50 of 1,075 results for author: Kumar, S