Search | arXiv e-print repository

MRAC Track 1: 2nd Workshop on Multimodal, Generative and Responsible Affective Computing

Authors: Shreya Ghosh, Zhixi Cai, Abhinav Dhall, Dimitrios Kollias, Roland Goecke, Tom Gedeon

Abstract: With the rapid advancements in multimodal generative technology, Affective Computing research has provoked discussion about the potential consequences of AI systems equipped with emotional intelligence. Affective Computing involves the design, evaluation, and implementation of Emotion AI and related technologies aimed at improving people's lives. Designing a computational model in affective comput… ▽ More With the rapid advancements in multimodal generative technology, Affective Computing research has provoked discussion about the potential consequences of AI systems equipped with emotional intelligence. Affective Computing involves the design, evaluation, and implementation of Emotion AI and related technologies aimed at improving people's lives. Designing a computational model in affective computing requires vast amounts of multimodal data, including RGB images, video, audio, text, and physiological signals. Moreover, Affective Computing research is deeply engaged with ethical considerations at various stages-from training emotionally intelligent models on large-scale human data to deploying these models in specific applications. Fundamentally, the development of any AI system must prioritize its impact on humans, aiming to augment and enhance human abilities rather than replace them, while drawing inspiration from human intelligence in a safe and responsible manner. The MRAC 2024 Track 1 workshop seeks to extend these principles from controlled, small-scale lab environments to real-world, large-scale contexts, emphasizing responsible development. The workshop also aims to highlight the potential implications of generative technology, along with the ethical consequences of its use, to researchers and industry professionals. To the best of our knowledge, this is the first workshop series to comprehensively address the full spectrum of multimodal, generative affective computing from a responsible AI perspective, and this is the second iteration of this workshop. Webpage: https://react-ws.github.io/2024/ △ Less

Submitted 11 September, 2024; originally announced September 2024.

Comments: ACM MM Workshop 2024. Workshop webpage: https://react-ws.github.io/2024/

arXiv:2409.06991 [pdf, other]

1M-Deepfakes Detection Challenge

Authors: Zhixi Cai, Abhinav Dhall, Shreya Ghosh, Munawar Hayat, Dimitrios Kollias, Kalin Stefanov, Usman Tariq

Abstract: The detection and localization of deepfake content, particularly when small fake segments are seamlessly mixed with real videos, remains a significant challenge in the field of digital media security. Based on the recently released AV-Deepfake1M dataset, which contains more than 1 million manipulated videos across more than 2,000 subjects, we introduce the 1M-Deepfakes Detection Challenge. This ch… ▽ More The detection and localization of deepfake content, particularly when small fake segments are seamlessly mixed with real videos, remains a significant challenge in the field of digital media security. Based on the recently released AV-Deepfake1M dataset, which contains more than 1 million manipulated videos across more than 2,000 subjects, we introduce the 1M-Deepfakes Detection Challenge. This challenge is designed to engage the research community in developing advanced methods for detecting and localizing deepfake manipulations within the large-scale high-realistic audio-visual dataset. The participants can access the AV-Deepfake1M dataset and are required to submit their inference results for evaluation across the metrics for detection or localization tasks. The methodologies developed through the challenge will contribute to the development of next-generation deepfake detection and localization systems. Evaluation scripts, baseline models, and accompanying code will be available on https://github.com/ControlNet/AV-Deepfake1M. △ Less

Submitted 10 September, 2024; originally announced September 2024.

Comments: ACM MM 2024. Challenge webpage: https://deepfakes1m.github.io/

arXiv:2409.06821 [pdf, other]

Sam2Rad: A Segmentation Model for Medical Images with Learnable Prompts

Authors: Assefa Seyoum Wahd, Banafshe Felfeliyan, Yuyue Zhou, Shrimanti Ghosh, Adam McArthur, Jiechen Zhang, Jacob L. Jaremko, Abhilash Hareendranathan

Abstract: Foundation models like the segment anything model require high-quality manual prompts for medical image segmentation, which is time-consuming and requires expertise. SAM and its variants often fail to segment structures in ultrasound (US) images due to domain shift. We propose Sam2Rad, a prompt learning approach to adapt SAM and its variants for US bone segmentation without human prompts. It int… ▽ More Foundation models like the segment anything model require high-quality manual prompts for medical image segmentation, which is time-consuming and requires expertise. SAM and its variants often fail to segment structures in ultrasound (US) images due to domain shift. We propose Sam2Rad, a prompt learning approach to adapt SAM and its variants for US bone segmentation without human prompts. It introduces a prompt predictor network (PPN) with a cross-attention module to predict prompt embeddings from image encoder features. PPN outputs bounding box and mask prompts, and 256-dimensional embeddings for regions of interest. The framework allows optional manual prompting and can be trained end-to-end using parameter-efficient fine-tuning (PEFT). Sam2Rad was tested on 3 musculoskeletal US datasets: wrist (3822 images), rotator cuff (1605 images), and hip (4849 images). It improved performance across all datasets without manual prompts, increasing Dice scores by 2-7% for hip/wrist and up to 33% for shoulder data. Sam2Rad can be trained with as few as 10 labeled images and is compatible with any SAM architecture for automatic segmentation. △ Less

Submitted 10 September, 2024; originally announced September 2024.

arXiv:2409.06224 [pdf, other]

MIP-GAF: A MLLM-annotated Benchmark for Most Important Person Localization and Group Context Understanding

Authors: Surbhi Madan, Shreya Ghosh, Lownish Rai Sookha, M. A. Ganaie, Ramanathan Subramanian, Abhinav Dhall, Tom Gedeon

Abstract: Estimating the Most Important Person (MIP) in any social event setup is a challenging problem mainly due to contextual complexity and scarcity of labeled data. Moreover, the causality aspects of MIP estimation are quite subjective and diverse. To this end, we aim to address the problem by annotating a large-scale `in-the-wild' dataset for identifying human perceptions about the `Most Important Per… ▽ More Estimating the Most Important Person (MIP) in any social event setup is a challenging problem mainly due to contextual complexity and scarcity of labeled data. Moreover, the causality aspects of MIP estimation are quite subjective and diverse. To this end, we aim to address the problem by annotating a large-scale `in-the-wild' dataset for identifying human perceptions about the `Most Important Person (MIP)' in an image. The paper provides a thorough description of our proposed Multimodal Large Language Model (MLLM) based data annotation strategy, and a thorough data quality analysis. Further, we perform a comprehensive benchmarking of the proposed dataset utilizing state-of-the-art MIP localization methods, indicating a significant drop in performance compared to existing datasets. The performance drop shows that the existing MIP localization algorithms must be more robust with respect to `in-the-wild' situations. We believe the proposed dataset will play a vital role in building the next-generation social situation understanding methods. The code and data is available at https://github.com/surbhimadan92/MIP-GAF. △ Less

Submitted 10 September, 2024; originally announced September 2024.

Comments: Accepted for publication at WACV 2025

arXiv:2409.03916 [pdf, other]

A Survey on Signed Graph Embedding: Methods and Applications

Authors: Shrabani Ghosh

Abstract: A signed graph (SG) is a graph where edges carry sign information attached to it. The sign of a network can be positive, negative, or neutral. A signed network is ubiquitous in a real-world network like social networks, citation networks, and various technical networks. There are many network embedding models have been proposed and developed for signed networks for both homogeneous and heterogeneo… ▽ More A signed graph (SG) is a graph where edges carry sign information attached to it. The sign of a network can be positive, negative, or neutral. A signed network is ubiquitous in a real-world network like social networks, citation networks, and various technical networks. There are many network embedding models have been proposed and developed for signed networks for both homogeneous and heterogeneous types. SG embedding learns low-dimensional vector representations for nodes of a network, which helps to do many network analysis tasks such as link prediction, node classification, and community detection. In this survey, we perform a comprehensive study of SG embedding methods and applications. We introduce here the basic theories and methods of SGs and survey the current state of the art of signed graph embedding methods. In addition, we explore the applications of different types of SG embedding methods in real-world scenarios. As an application, we have explored the citation network to analyze authorship networks. We also provide source code and datasets to give future direction. Lastly, we explore the challenges of SG embedding and forecast various future research directions in this field. △ Less

Submitted 5 September, 2024; originally announced September 2024.

arXiv:2409.01628 [pdf, other]

CTG-KrEW: Generating Synthetic Structured Contextually Correlated Content by Conditional Tabular GAN with K-Means Clustering and Efficient Word Embedding

Authors: Riya Samanta, Bidyut Saha, Soumya K. Ghosh, Sajal K. Das

Abstract: Conditional Tabular Generative Adversarial Networks (CTGAN) and their various derivatives are attractive for their ability to efficiently and flexibly create synthetic tabular data, showcasing strong performance and adaptability. However, there are certain critical limitations to such models. The first is their inability to preserve the semantic integrity of contextually correlated words or phrase… ▽ More Conditional Tabular Generative Adversarial Networks (CTGAN) and their various derivatives are attractive for their ability to efficiently and flexibly create synthetic tabular data, showcasing strong performance and adaptability. However, there are certain critical limitations to such models. The first is their inability to preserve the semantic integrity of contextually correlated words or phrases. For instance, skillset in freelancer profiles is one such attribute where individual skills are semantically interconnected and indicative of specific domain interests or qualifications. The second challenge of traditional approaches is that, when applied to generate contextually correlated tabular content, besides generating semantically shallow content, they consume huge memory resources and CPU time during the training stage. To address these problems, we introduce a novel framework, CTGKrEW (Conditional Tabular GAN with KMeans Clustering and Word Embedding), which is adept at generating realistic synthetic tabular data where attributes are collections of semantically and contextually coherent words. CTGKrEW is trained and evaluated using a dataset from Upwork, a realworld freelancing platform. Comprehensive experiments were conducted to analyze the variability, contextual similarity, frequency distribution, and associativity of the generated data, along with testing the framework's system feasibility. CTGKrEW also takes around 99\% less CPU time and 33\% less memory footprints than the conventional approach. Furthermore, we developed KrEW, a web application to facilitate the generation of realistic data containing skill-related information. This application, available at https://riyasamanta.github.io/krew.html, is freely accessible to both the general public and the research community. △ Less

Submitted 3 September, 2024; originally announced September 2024.

arXiv:2409.01484 [pdf, other]

Watermarking of Quantum Circuits

Authors: Rupshali Roy, Swaroop Ghosh

Abstract: Quantum circuits constitute Intellectual Property (IP) of the quantum developers and users, which needs to be protected from theft by adversarial agents, e.g., the quantum cloud provider or a rogue adversary present in the cloud. This necessitates the exploration of low-overhead techniques applicable to near-term quantum devices, to trace the quantum circuits/algorithms\textquotesingle{} IP and th… ▽ More Quantum circuits constitute Intellectual Property (IP) of the quantum developers and users, which needs to be protected from theft by adversarial agents, e.g., the quantum cloud provider or a rogue adversary present in the cloud. This necessitates the exploration of low-overhead techniques applicable to near-term quantum devices, to trace the quantum circuits/algorithms\textquotesingle{} IP and their output. We present two such lightweight watermarking techniques to prove ownership in the event of an adversary cloning the circuit design. For the first technique, a rotation gate is placed on ancilla qubits combined with other gate(s) at the output of the circuit. For the second method, a set of random gates are inserted in the middle of the circuit followed by its inverse, separated from the circuit by a barrier. These models are combined and applied on benchmark circuits, and the circuit depth, 2-qubit gate count, probability of successful trials (PST), and probabilistic proof of authorship (PPA) are compared against the state-of-the-art. The PST is reduced by a minuscule 0.53\% against the non-watermarked benchmarks and is up to 22.69\% higher compared to existing techniques. The circuit depth has been reduced by up to 27.7\% as against the state-of-the-art. The PPA is astronomically smaller than existing watermarks. △ Less

Submitted 2 September, 2024; originally announced September 2024.

arXiv:2409.00093 [pdf, other]

Towards Sustainable Personalized On-Device Human Activity Recognition with TinyML and Cloud-Enabled Auto Deployment

Authors: Bidyut Saha, Riya Samanta, Soumya K Ghosh, Ram Babu Roy

Abstract: Human activity recognition (HAR) holds immense potential for transforming health and fitness monitoring, yet challenges persist in achieving personalized outcomes and sustainability for on-device continuous inferences. This work introduces a wrist-worn smart band designed to address these challenges through a novel combination of on-device TinyML-driven computing and cloud-enabled auto-deployment.… ▽ More Human activity recognition (HAR) holds immense potential for transforming health and fitness monitoring, yet challenges persist in achieving personalized outcomes and sustainability for on-device continuous inferences. This work introduces a wrist-worn smart band designed to address these challenges through a novel combination of on-device TinyML-driven computing and cloud-enabled auto-deployment. Leveraging inertial measurement unit (IMU) sensors and a customized 1D Convolutional Neural Network (CNN) for personalized HAR, users can tailor activity classes to their unique movement styles with minimal calibration. By utilising TinyML for local computations, the smart band reduces the necessity for constant data transmission and radio communication, which in turn lowers power consumption and reduces carbon footprint. This method also enhances the privacy and security of user data by limiting its transmission. Through transfer learning and fine-tuning on user-specific data, the system achieves a 37\% increase in accuracy over generalized models in personalized settings. Evaluation using three benchmark datasets, WISDM, PAMAP2, and the BandX demonstrates its effectiveness across various activity domains. Additionally, this work presents a cloud-supported framework for the automatic deployment of TinyML models to remote wearables, enabling seamless customization and on-device inference, even with limited target data. By combining personalized HAR with sustainable strategies for on-device continuous inferences, this system represents a promising step towards fostering healthier and more sustainable societies worldwide. △ Less

Submitted 26 August, 2024; originally announced September 2024.

arXiv:2409.00081 [pdf, other]

Examining Different Research Communities: Authorship Network

Authors: Shrabani Ghosh

Abstract: Google Scholar is one of the top search engines to access research articles across multiple disciplines for scholarly literature. Google scholar advance search option gives the privilege to extract articles based on phrases, publishers name, authors name, time duration etc. In this work, we collected Google Scholar data (2000-2021) for two different research domains in computer science: Data Minin… ▽ More Google Scholar is one of the top search engines to access research articles across multiple disciplines for scholarly literature. Google scholar advance search option gives the privilege to extract articles based on phrases, publishers name, authors name, time duration etc. In this work, we collected Google Scholar data (2000-2021) for two different research domains in computer science: Data Mining and Software Engineering. The scholar database resources are powerful for network analysis, data mining, and identify links between authors via authorship network. We examined coauthor-ship network for each domain and studied their network structure. Extensive experiments are performed to analyze publications trend and identifying influential authors and affiliated organizations for each domain. The network analysis shows that the networks features are distinct from one another and exhibit small communities within the influential authors of a particular domain. △ Less

Submitted 24 August, 2024; originally announced September 2024.

arXiv:2409.00051 [pdf, other]

OnDiscuss: An Epistemic Network Analysis Learning Analytics Visualization Tool for Evaluating Asynchronous Online Discussions

Authors: Yanye Luther, Marcia Moraes, Sudipto Ghosh, James Folkestad

Abstract: Asynchronous online discussions are common assignments in both hybrid and online courses to promote critical thinking and collaboration among students. However, the evaluation of these assignments can require considerable time and effort from instructors. We created OnDiscuss, a learning analytics visualization tool for instructors that utilizes text mining algorithms and Epistemic Network Analysi… ▽ More Asynchronous online discussions are common assignments in both hybrid and online courses to promote critical thinking and collaboration among students. However, the evaluation of these assignments can require considerable time and effort from instructors. We created OnDiscuss, a learning analytics visualization tool for instructors that utilizes text mining algorithms and Epistemic Network Analysis (ENA) to generate visualizations of student discussion data. Text mining is used to generate an initial codebook for the instructor as well as automatically code the data. This tool allows instructors to edit their codebook and then dynamically view the resulting ENA networks for the entire class and individual students. Through empirical investigation, we assess this tool's effectiveness to help instructors in analyzing asynchronous online discussion assignments. △ Less

Submitted 19 August, 2024; originally announced September 2024.

Comments: Accepted to the International Conference on Quantitative Ethnography 2024 in Philadelphia, Pennsylvania

arXiv:2408.17128 [pdf, ps, other]

Time varying channel estimation for RIS assisted network with outdated CSI: Looking beyond coherence time

Authors: Souvik Deb, Sasthi C. Ghosh

Abstract: The channel estimation (CE) overhead for unstructured multipath-rich channels increases linearly with the number of reflective elements of reconfigurable intelligent surface (RIS). This results in a significant portion of the channel coherence time being spent on CE, reducing data communication time. Furthermore, due to the mobility of the user equipment (UE) and the time consumed during CE, the e… ▽ More The channel estimation (CE) overhead for unstructured multipath-rich channels increases linearly with the number of reflective elements of reconfigurable intelligent surface (RIS). This results in a significant portion of the channel coherence time being spent on CE, reducing data communication time. Furthermore, due to the mobility of the user equipment (UE) and the time consumed during CE, the estimated channel state information (CSI) may become outdated during actual data communication. In recent studies, the timing for CE has been primarily determined based on the coherence time interval, which is dependent on the velocity of the UE. However, the effect of the current channel condition and pathloss of the UEs can also be utilized to control the duration between successive CE to reduce the overhead while still maintaining the quality of service. Furthermore, for muti-user systems, the appropriate coherence time intervals of different users may be different depending on their velocities. Therefore CE carried out ignoring the difference in coherence time of different UEs may result in the estimated CSI being detrimentally outdated for some users. In contrast, others may not have sufficient time for data communication. To this end, based on the throughput analysis on outdated CSI, an algorithm has been designed to dynamically predict the next time instant for CE after the current CSI acquisition. In the first step, optimal RIS phase shifts to maximise channel gain is computed. Based on this and the amount of degradation of SINR due to outdated CSI, transmit powers are allocated for the UEs and finally the next time instant for CE is predicted such that the aggregated throughput is maximized. Simulation results confirm that our proposed algorithm outperforms the coherence time-based strategies. △ Less

Submitted 30 August, 2024; originally announced August 2024.

arXiv:2408.17027 [pdf, other]

ConDense: Consistent 2D/3D Pre-training for Dense and Sparse Features from Multi-View Images

Authors: Xiaoshuai Zhang, Zhicheng Wang, Howard Zhou, Soham Ghosh, Danushen Gnanapragasam, Varun Jampani, Hao Su, Leonidas Guibas

Abstract: To advance the state of the art in the creation of 3D foundation models, this paper introduces the ConDense framework for 3D pre-training utilizing existing pre-trained 2D networks and large-scale multi-view datasets. We propose a novel 2D-3D joint training scheme to extract co-embedded 2D and 3D features in an end-to-end pipeline, where 2D-3D feature consistency is enforced through a volume rende… ▽ More To advance the state of the art in the creation of 3D foundation models, this paper introduces the ConDense framework for 3D pre-training utilizing existing pre-trained 2D networks and large-scale multi-view datasets. We propose a novel 2D-3D joint training scheme to extract co-embedded 2D and 3D features in an end-to-end pipeline, where 2D-3D feature consistency is enforced through a volume rendering NeRF-like ray marching process. Using dense per pixel features we are able to 1) directly distill the learned priors from 2D models to 3D models and create useful 3D backbones, 2) extract more consistent and less noisy 2D features, 3) formulate a consistent embedding space where 2D, 3D, and other modalities of data (e.g., natural language prompts) can be jointly queried. Furthermore, besides dense features, ConDense can be trained to extract sparse features (e.g., key points), also with 2D-3D consistency -- condensing 3D NeRF representations into compact sets of decorated key points. We demonstrate that our pre-trained model provides good initialization for various 3D tasks including 3D classification and segmentation, outperforming other 3D pre-training methods by a significant margin. It also enables, by exploiting our sparse features, additional useful downstream tasks, such as matching 2D images to 3D scenes, detecting duplicate 3D scenes, and querying a repository of 3D scenes through natural language -- all quite efficiently and without any per-scene fine-tuning. △ Less

Submitted 30 August, 2024; originally announced August 2024.

Comments: ECCV 2024

arXiv:2408.16929 [pdf, other]

AI-driven Reverse Engineering of QML Models

Authors: Archisman Ghosh, Swaroop Ghosh

Abstract: Quantum machine learning (QML) is a rapidly emerging area of research, driven by the capabilities of Noisy Intermediate-Scale Quantum (NISQ) devices. With the progress in the research of QML models, there is a rise in third-party quantum cloud services to cater to the increasing demand for resources. New security concerns surface, specifically regarding the protection of intellectual property (IP)… ▽ More Quantum machine learning (QML) is a rapidly emerging area of research, driven by the capabilities of Noisy Intermediate-Scale Quantum (NISQ) devices. With the progress in the research of QML models, there is a rise in third-party quantum cloud services to cater to the increasing demand for resources. New security concerns surface, specifically regarding the protection of intellectual property (IP) from untrustworthy service providers. One of the most pressing risks is the potential for reverse engineering (RE) by malicious actors who may steal proprietary quantum IPs such as trained parameters and QML architecture, modify them to remove additional watermarks or signatures and re-transpile them for other quantum hardware. Prior work presents a brute force approach to RE the QML parameters which takes exponential time overhead. In this paper, we introduce an autoencoder-based approach to extract the parameters from transpiled QML models deployed on untrusted third-party vendors. We experiment on multi-qubit classifiers and note that they can be reverse-engineered under restricted conditions with a mean error of order 10^-1. The amount of time taken to prepare the dataset and train the model to reverse engineer the QML circuit being of the order 10^3 seconds (which is 10^2x better than the previously reported value for 4-layered 4-qubit classifiers) makes the threat of RE highly potent, underscoring the need for continued development of effective defenses. △ Less

Submitted 29 August, 2024; originally announced August 2024.

Comments: 7 pages, 4 figures

arXiv:2408.16535 [pdf, other]

TinyTNAS: GPU-Free, Time-Bound, Hardware-Aware Neural Architecture Search for TinyML Time Series Classification

Authors: Bidyut Saha, Riya Samanta, Soumya K. Ghosh, Ram Babu Roy

Abstract: In this work, we present TinyTNAS, a novel hardware-aware multi-objective Neural Architecture Search (NAS) tool specifically designed for TinyML time series classification. Unlike traditional NAS methods that rely on GPU capabilities, TinyTNAS operates efficiently on CPUs, making it accessible for a broader range of applications. Users can define constraints on RAM, FLASH, and MAC operations to di… ▽ More In this work, we present TinyTNAS, a novel hardware-aware multi-objective Neural Architecture Search (NAS) tool specifically designed for TinyML time series classification. Unlike traditional NAS methods that rely on GPU capabilities, TinyTNAS operates efficiently on CPUs, making it accessible for a broader range of applications. Users can define constraints on RAM, FLASH, and MAC operations to discover optimal neural network architectures within these parameters. Additionally, the tool allows for time-bound searches, ensuring the best possible model is found within a user-specified duration. By experimenting with benchmark dataset UCI HAR, PAMAP2, WISDM, MIT BIH, and PTB Diagnostic ECG Databas TinyTNAS demonstrates state-of-the-art accuracy with significant reductions in RAM, FLASH, MAC usage, and latency. For example, on the UCI HAR dataset, TinyTNAS achieves a 12x reduction in RAM usage, a 144x reduction in MAC operations, and a 78x reduction in FLASH memory while maintaining superior accuracy and reducing latency by 149x. Similarly, on the PAMAP2 and WISDM datasets, it achieves a 6x reduction in RAM usage, a 40x reduction in MAC operations, an 83x reduction in FLASH, and a 67x reduction in latency, all while maintaining superior accuracy. Notably, the search process completes within 10 minutes in a CPU environment. These results highlight TinyTNAS's capability to optimize neural network architectures effectively for resource-constrained TinyML applications, ensuring both efficiency and high performance. The code for TinyTNAS is available at the GitHub repository and can be accessed at https://github.com/BidyutSaha/TinyTNAS.git. △ Less

Submitted 29 August, 2024; originally announced August 2024.

arXiv:2408.15605 [pdf, other]

ES-PTAM: Event-based Stereo Parallel Tracking and Mapping

Authors: Suman Ghosh, Valentina Cavinato, Guillermo Gallego

Abstract: Visual Odometry (VO) and SLAM are fundamental components for spatial perception in mobile robots. Despite enormous progress in the field, current VO/SLAM systems are limited by their sensors' capability. Event cameras are novel visual sensors that offer advantages to overcome the limitations of standard cameras, enabling robots to expand their operating range to challenging scenarios, such as high… ▽ More Visual Odometry (VO) and SLAM are fundamental components for spatial perception in mobile robots. Despite enormous progress in the field, current VO/SLAM systems are limited by their sensors' capability. Event cameras are novel visual sensors that offer advantages to overcome the limitations of standard cameras, enabling robots to expand their operating range to challenging scenarios, such as high-speed motion and high dynamic range illumination. We propose a novel event-based stereo VO system by combining two ideas: a correspondence-free mapping module that estimates depth by maximizing ray density fusion and a tracking module that estimates camera poses by maximizing edge-map alignment. We evaluate the system comprehensively on five real-world datasets, spanning a variety of camera types (manufacturers and spatial resolutions) and scenarios (driving, flying drone, hand-held, egocentric, etc). The quantitative and qualitative results demonstrate that our method outperforms the state of the art in majority of the test sequences by a margin, e.g., trajectory error reduction of 45% on RPG dataset, 61% on DSEC dataset, and 21% on TUM-VIE dataset. To benefit the community and foster research on event-based perception systems, we release the source code and results: https://github.com/tub-rip/ES-PTAM △ Less

Submitted 28 August, 2024; originally announced August 2024.

Comments: 17 pages, 7 figures, 4 tables, https://github.com/tub-rip/ES-PTAM

Journal ref: European Conference on Computer Vision (ECCV) Workshops, Milan, Italy 2024

arXiv:2408.15076 [pdf, other]

MiWaves Reinforcement Learning Algorithm

Authors: Susobhan Ghosh, Yongyi Guo, Pei-Yao Hung, Lara Coughlin, Erin Bonar, Inbal Nahum-Shani, Maureen Walton, Susan Murphy

Abstract: The escalating prevalence of cannabis use poses a significant public health challenge globally. In the U.S., cannabis use is more prevalent among emerging adults (EAs) (ages 18-25) than any other age group, with legalization in the multiple states contributing to a public perception that cannabis is less risky than in prior decades. To address this growing concern, we developed MiWaves, a reinforc… ▽ More The escalating prevalence of cannabis use poses a significant public health challenge globally. In the U.S., cannabis use is more prevalent among emerging adults (EAs) (ages 18-25) than any other age group, with legalization in the multiple states contributing to a public perception that cannabis is less risky than in prior decades. To address this growing concern, we developed MiWaves, a reinforcement learning (RL) algorithm designed to optimize the delivery of personalized intervention prompts to reduce cannabis use among EAs. MiWaves leverages domain expertise and prior data to tailor the likelihood of delivery of intervention messages. This paper presents a comprehensive overview of the algorithm's design, including key decisions and experimental outcomes. The finalized MiWaves RL algorithm was deployed in a clinical trial from March to May 2024. △ Less

Submitted 27 August, 2024; originally announced August 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2402.17739

arXiv:2408.14113 [pdf, other]

Fast Low Level Disk Encryption Using FPGAs

Authors: Debrup Chakraborty, Sebati Ghosh, Cuauhtemoc Mancillas-Lopez, Palash Sarkar

Abstract: A fixed length tweakable enciphering scheme (TES) is the appropriate cryptographic functionality for low level disk encryption. Research on TES over the last two decades have led to a number of proposals many of which have already been implemented using FPGAs. This paper considers the FPGA implementations of two more recent and promising TESs, namely AEZ and FAST. The relevant architectures are de… ▽ More A fixed length tweakable enciphering scheme (TES) is the appropriate cryptographic functionality for low level disk encryption. Research on TES over the last two decades have led to a number of proposals many of which have already been implemented using FPGAs. This paper considers the FPGA implementations of two more recent and promising TESs, namely AEZ and FAST. The relevant architectures are described and simulation results on the Xilinx Virtex 5 and Virtex 7 FPGAs are presented. For comparison, two IEEE standard schemes, XCB and EME2 are considered. The results indicate that FAST outperforms the other schemes making it a serious candidate for future incorporation by disk manufacturers and standardisation bodies. △ Less

Submitted 26 August, 2024; originally announced August 2024.

arXiv:2408.13720 [pdf, other]

A prototype-based model for set classification

Authors: Mohammad Mohammadi, Sreejita Ghosh

Abstract: Classification of sets of inputs (e.g., images and texts) is an active area of research within both computer vision (CV) and natural language processing (NLP). A common way to represent a set of vectors is to model them as linear subspaces. In this contribution, we present a prototype-based approach for learning on the manifold formed from such linear subspaces, the Grassmann manifold. Our propose… ▽ More Classification of sets of inputs (e.g., images and texts) is an active area of research within both computer vision (CV) and natural language processing (NLP). A common way to represent a set of vectors is to model them as linear subspaces. In this contribution, we present a prototype-based approach for learning on the manifold formed from such linear subspaces, the Grassmann manifold. Our proposed method learns a set of subspace prototypes capturing the representative characteristics of classes and a set of relevance factors automating the selection of the dimensionality of the subspaces. This leads to a transparent classifier model which presents the computed impact of each input vector on its decision. Through experiments on benchmark image and text datasets, we have demonstrated the efficiency of our proposed classifier, compared to the transformer-based models in terms of not only performance and explainability but also computational resource requirements. △ Less

Submitted 25 August, 2024; originally announced August 2024.

arXiv:2408.12021 [pdf, other]

R-STELLAR: A Resilient Synthesizable Signature Attenuation SCA Protection on AES-256 with built-in Attack-on-Countermeasure Detection

Authors: Archisman Ghosh, Dong-Hyun Seo, Debayan Das, Santosh Ghosh, Shreyas Sen

Abstract: Side channel attacks (SCAs) remain a significant threat to the security of cryptographic systems in modern embedded devices. Even mathematically secure cryptographic algorithms, when implemented in hardware, inadvertently leak information through physical side channel signatures such as power consumption, electromagnetic (EM) radiation, light emissions, and acoustic emanations. Exploiting these si… ▽ More Side channel attacks (SCAs) remain a significant threat to the security of cryptographic systems in modern embedded devices. Even mathematically secure cryptographic algorithms, when implemented in hardware, inadvertently leak information through physical side channel signatures such as power consumption, electromagnetic (EM) radiation, light emissions, and acoustic emanations. Exploiting these side channels significantly reduces the search space of the attacker. In recent years, physical countermeasures have significantly increased the minimum traces to disclosure (MTD) to 1 billion. Among them, signature attenuation is the first method to achieve this mark. Signature attenuation often relies on analog techniques, and digital signature attenuation reduces MTD to 20 million, requiring additional methods for high resilience. We focus on improving the digital signature attenuation by an order of magnitude (MTD 200M). Additionally, we explore possible attacks against signature attenuation countermeasure. We introduce a Voltage drop Linear region Biasing (VLB) attack technique that reduces the MTD to over 2000 times less than the previous threshold. This is the first known attack against a physical side-channel attack (SCA) countermeasure. We have implemented an attack detector with a response time of 0.8 milliseconds to detect such attacks, limiting SCA leakage window to sub-ms, which is insufficient for a successful attack. △ Less

Submitted 21 August, 2024; originally announced August 2024.

Comments: Extended from CICC. Now under revision at Journal of Solid-State Circuits

arXiv:2408.11510 [pdf, other]

Empowering Volunteer Crowdsourcing Services: A Serverless-assisted, Skill and Willingness Aware Task Assignment Approach for Amicable Volunteer Involvement

Authors: Riya Samanta, Biswajeet Sethi, Soumya K Ghosh

Abstract: Volunteer crowdsourcing (VCS) leverages citizen interaction to address challenges by utilizing individuals' knowledge and skills. Complex social tasks often require collaboration among volunteers with diverse skill sets, and their willingness to engage is crucial. Matching tasks with the most suitable volunteers remains a significant challenge. VCS platforms face unpredictable demands in terms of… ▽ More Volunteer crowdsourcing (VCS) leverages citizen interaction to address challenges by utilizing individuals' knowledge and skills. Complex social tasks often require collaboration among volunteers with diverse skill sets, and their willingness to engage is crucial. Matching tasks with the most suitable volunteers remains a significant challenge. VCS platforms face unpredictable demands in terms of tasks and volunteer requests, complicating the prediction of resource requirements for the volunteer-to-task assignment process. To address these challenges, we introduce the Skill and Willingness-Aware Volunteer Matching (SWAM) algorithm, which allocates volunteers to tasks based on skills, willingness, and task requirements. We also developed a serverless framework to deploy SWAM. Our method outperforms conventional solutions, achieving a 71% improvement in end-to-end latency efficiency. We achieved a 92% task completion ratio and reduced task waiting time by 56%, with an overall utility gain 30% higher than state-of-the-art baseline methods. This framework contributes to generating effective volunteer and task matches, supporting grassroots community coordination and fostering citizen involvement, ultimately contributing to social good. △ Less

Submitted 21 August, 2024; originally announced August 2024.

arXiv:2408.11498 [pdf, other]

Sustainable Volunteer Engagement: Ensuring Potential Retention and Skill Diversity for Balanced Workforce Composition in Crowdsourcing Paradigm

Authors: Riya Samanta, Soumya K Ghosh

Abstract: Crowdsourcing (CS) faces the challenge of managing complex, skill-demanding tasks, which requires effective task assignment and retention strategies to sustain a balanced workforce. This challenge has become more significant in Volunteer Crowdsourcing Services (VCS). This study introduces Workforce Composition Balance (WCB), a novel framework designed to maintain workforce diversity in VCS by dyna… ▽ More Crowdsourcing (CS) faces the challenge of managing complex, skill-demanding tasks, which requires effective task assignment and retention strategies to sustain a balanced workforce. This challenge has become more significant in Volunteer Crowdsourcing Services (VCS). This study introduces Workforce Composition Balance (WCB), a novel framework designed to maintain workforce diversity in VCS by dynamically adjusting retention decisions. The WCB framework integrates the Volunteer Retention and Value Enhancement (VRAVE) algorithm with advanced skill-based task assignment methods. It ensures efficient remuneration policy for both assigned and unassigned potential volunteers by incorporating their potential levels, participation dividends, and satisfaction scores. Comparative analysis with three state-of-the-art baselines on real dataset shows that our WCB framework achieves 1.4 times better volunteer satisfaction and a 20% higher task retention rate, with only a 12% increase in remuneration. The effectiveness of the proposed WCB approach is to enhance the volunteer engagement and their long-term retention, thus making it suitable for functioning of social good applications where a potential and skilled volunteer workforce is crucial for sustainable community services. △ Less

Submitted 31 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

arXiv:2408.09562 [pdf, other]

Security Concerns in Quantum Machine Learning as a Service

Authors: Satwik Kundu, Swaroop Ghosh

Abstract: Quantum machine learning (QML) is a category of algorithms that employ variational quantum circuits (VQCs) to tackle machine learning tasks. Recent discoveries have shown that QML models can effectively generalize from limited training data samples. This capability has sparked increased interest in deploying these models to address practical, real-world challenges, resulting in the emergence of Qu… ▽ More Quantum machine learning (QML) is a category of algorithms that employ variational quantum circuits (VQCs) to tackle machine learning tasks. Recent discoveries have shown that QML models can effectively generalize from limited training data samples. This capability has sparked increased interest in deploying these models to address practical, real-world challenges, resulting in the emergence of Quantum Machine Learning as a Service (QMLaaS). QMLaaS represents a hybrid model that utilizes both classical and quantum computing resources. Classical computers play a crucial role in this setup, handling initial pre-processing and subsequent post-processing of data to compensate for the current limitations of quantum hardware. Since this is a new area, very little work exists to paint the whole picture of QMLaaS in the context of known security threats in the domain of classical and quantum machine learning. This SoK paper is aimed to bridge this gap by outlining the complete QMLaaS workflow, which encompasses both the training and inference phases and highlighting significant security concerns involving untrusted classical or quantum providers. QML models contain several sensitive assets, such as the model architecture, training/testing data, encoding techniques, and trained parameters. Unauthorized access to these components could compromise the model's integrity and lead to intellectual property (IP) theft. We pinpoint the critical security issues that must be considered to pave the way for a secure QMLaaS deployment. △ Less

Submitted 18 August, 2024; originally announced August 2024.

Comments: 9 pages, 3 figures

arXiv:2408.07832 [pdf, other]

LADDER: Language Driven Slice Discovery and Error Rectification

Authors: Shantanu Ghosh, Rayan Syed, Chenyu Wang, Clare B. Poynton, Kayhan Batmanghelich

Abstract: Error slice discovery associates structured patterns with model errors. Existing methods discover error slices by clustering the error-prone samples with similar patterns or assigning discrete attributes to each sample for post-hoc analysis. While these methods aim for interpretability and easier mitigation through reweighting or rebalancing, they may not capture the full complexity of error patte… ▽ More Error slice discovery associates structured patterns with model errors. Existing methods discover error slices by clustering the error-prone samples with similar patterns or assigning discrete attributes to each sample for post-hoc analysis. While these methods aim for interpretability and easier mitigation through reweighting or rebalancing, they may not capture the full complexity of error patterns due to incomplete or missing attributes. Contrary to the existing approach, this paper utilizes the reasoning capabilities of the Large Language Model (LLM) to analyze complex error patterns and generate testable hypotheses. This paper proposes LADDER: Language Driven slice Discovery and Error Rectification. It first projects the model's representation into a language-aligned feature space (eg CLIP) to preserve semantics in the original model feature space. This ensures the accurate retrieval of sentences that highlight the model's errors. Next, the LLM utilizes the sentences and generates hypotheses to discover error slices. Finally, we mitigate the error by fine-tuning the classification head by creating a group-balanced dataset using the hypotheses. Our entire method does not require any attribute annotation, either explicitly or through external tagging models. We validate our method with \textbf{five} image classification datasets. The code is available (https://github.com/batmanlab/Ladder). △ Less

Submitted 4 September, 2024; v1 submitted 31 July, 2024; originally announced August 2024.

arXiv:2408.06002 [pdf, other]

Generative Design of Multimodal Soft Pneumatic Actuators

Authors: Saswath Ghosh, Sitikantha Roy

Abstract: The recent advancements in machine learning techniques have steered us towards the data-driven design of products. Motivated by this objective, the present study proposes an automated design methodology that employs data-driven methods to generate new designs of soft actuators. One of the bottlenecks in the data-driven automated design process is having publicly available data to train the model.… ▽ More The recent advancements in machine learning techniques have steered us towards the data-driven design of products. Motivated by this objective, the present study proposes an automated design methodology that employs data-driven methods to generate new designs of soft actuators. One of the bottlenecks in the data-driven automated design process is having publicly available data to train the model. Due to its unavailability, a synthetic data set of soft pneumatic network (Pneu-net) actuators has been created. The parametric design data set for the training of the generative model is created using data augmentation. Next, the Gaussian mixture model has been applied to generate novel parametric designs of Pneu-net actuators. The distance-based metric defines the novelty and diversity of the generated designs. In addition, it is noteworthy that the model has the potential to generate a multimodal Pneu-net actuator that could perform in-plane bending and out-of-plane twisting. Later, the novel design is passed through finite element analysis to evaluate the quality of the generated design. Moreover, the trajectory of each category of Pneu-net actuators evaluates the performance of the generated Pneu-net actuators and emphasizes the necessity of multimodal actuation. The proposed model could accelerate the design of new soft robots by selecting a soft actuator from the developed novel pool of soft actuators. △ Less

Submitted 12 August, 2024; originally announced August 2024.

arXiv:2408.01996 [pdf, other]

Configuring Safe Spiking Neural Controllers for Cyber-Physical Systems through Formal Verification

Authors: Arkaprava Gupta, Sumana Ghosh, Ansuman Banerjee, Swarup Kumar Mohalik

Abstract: Spiking Neural Networks (SNNs) are a subclass of neuromorphic models that have great potential to be used as controllers in Cyber-Physical Systems (CPSs) due to their energy efficiency. They can benefit from the prevalent approach of first training an Artificial Neural Network (ANN) and then translating to an SNN with subsequent hyperparameter tuning. The tuning is required to ensure that the resu… ▽ More Spiking Neural Networks (SNNs) are a subclass of neuromorphic models that have great potential to be used as controllers in Cyber-Physical Systems (CPSs) due to their energy efficiency. They can benefit from the prevalent approach of first training an Artificial Neural Network (ANN) and then translating to an SNN with subsequent hyperparameter tuning. The tuning is required to ensure that the resulting SNN is accurate with respect to the ANN in terms of metrics like Mean Squared Error (MSE). However, SNN controllers for safety-critical CPSs must also satisfy safety specifications, which are not guaranteed by the conversion approach. In this paper, we propose a solution which tunes the $temporal$ $window$ hyperparameter of the translated SNN to ensure both accuracy and compliance with the safe range specification that requires the SNN outputs to remain within a safe range. The core verification problem is modelled using mixed-integer linear programming (MILP) and is solved with Gurobi. When the controller fails to meet the range specification, we compute tight bounds on the SNN outputs as feedback for the CPS developer. To mitigate the high computational cost of verification, we integrate data-driven steps to minimize verification calls. Our approach provides designers with the confidence to safely integrate energy-efficient SNN controllers into modern CPSs. We demonstrate our approach with experimental results on five different benchmark neural controllers. △ Less

Submitted 4 August, 2024; originally announced August 2024.

Comments: This is the complete version of a paper with the same title that appeared at MEMOCODE 2024

arXiv:2408.01594 [pdf, other]

"I don't see myself represented here at all": User Experiences of Stable Diffusion Outputs Containing Representational Harms across Gender Identities and Nationalities

Authors: Sourojit Ghosh, Nina Lutz, Aylin Caliskan

Abstract: Though research into text-to-image generators (T2Is) such as Stable Diffusion has demonstrated their amplification of societal biases and potentials to cause harm, such research has primarily relied on computational methods instead of seeking information from real users who experience harm, which is a significant knowledge gap. In this paper, we conduct the largest human subjects study of Stable D… ▽ More Though research into text-to-image generators (T2Is) such as Stable Diffusion has demonstrated their amplification of societal biases and potentials to cause harm, such research has primarily relied on computational methods instead of seeking information from real users who experience harm, which is a significant knowledge gap. In this paper, we conduct the largest human subjects study of Stable Diffusion, with a combination of crowdsourced data from 133 crowdworkers and 14 semi-structured interviews across diverse countries and genders. Through a mixed-methods approach of intra-set cosine similarity hierarchies (i.e., comparing multiple Stable Diffusion outputs for the same prompt with each other to examine which result is 'closest' to the prompt) and qualitative thematic analysis, we first demonstrate a large disconnect between user expectations for Stable Diffusion outputs with those generated, evidenced by a set of Stable Diffusion renditions of `a Person' providing images far away from such expectations. We then extend this finding of general dissatisfaction into highlighting representational harms caused by Stable Diffusion upon our subjects, especially those with traditionally marginalized identities, subjecting them to incorrect and often dehumanizing stereotypes about their identities. We provide recommendations for a harm-aware approach to (re)design future versions of Stable Diffusion and other T2Is. △ Less

Submitted 2 August, 2024; originally announced August 2024.

Comments: Upcoming Publication, AIES 2024

arXiv:2408.01590 [pdf, other]

Interpretations, Representations, and Stereotypes of Caste within Text-to-Image Generators

Authors: Sourojit Ghosh

Abstract: The surge in the popularity of text-to-image generators (T2Is) has been matched by extensive research into ensuring fairness and equitable outcomes, with a focus on how they impact society. However, such work has typically focused on globally-experienced identities or centered Western contexts. In this paper, we address interpretations, representations, and stereotypes surrounding a tragically und… ▽ More The surge in the popularity of text-to-image generators (T2Is) has been matched by extensive research into ensuring fairness and equitable outcomes, with a focus on how they impact society. However, such work has typically focused on globally-experienced identities or centered Western contexts. In this paper, we address interpretations, representations, and stereotypes surrounding a tragically underexplored context in T2I research: caste. We examine how the T2I Stable Diffusion displays people of various castes, and what professions they are depicted as performing. Generating 100 images per prompt, we perform CLIP-cosine similarity comparisons with default depictions of an 'Indian person' by Stable Diffusion, and explore patterns of similarity. Our findings reveal how Stable Diffusion outputs perpetuate systems of 'castelessness', equating Indianness with high-castes and depicting caste-oppressed identities with markers of poverty. In particular, we note the stereotyping and representational harm towards the historically-marginalized Dalits, prominently depicted as living in rural areas and always at protests. Our findings underscore a need for a caste-aware approach towards T2I design, and we conclude with design recommendations. △ Less

Submitted 2 August, 2024; originally announced August 2024.

Comments: Upcoming Publication, AIES 2024

arXiv:2407.19561 [pdf, ps, other]

Anti-Concentration for the Unitary Haar Measure and Applications to Random Quantum Circuits

Authors: Bill Fefferman, Soumik Ghosh, Wei Zhan

Abstract: We prove a Carbery-Wright style anti-concentration inequality for the unitary Haar measure, by showing that the probability of a polynomial in the entries of a random unitary falling into an $\varepsilon$ range is at most a polynomial in $\varepsilon$. Using it, we show that the scrambling speed of a random quantum circuit is lower bounded: Namely, every input qubit has an influence that is at lea… ▽ More We prove a Carbery-Wright style anti-concentration inequality for the unitary Haar measure, by showing that the probability of a polynomial in the entries of a random unitary falling into an $\varepsilon$ range is at most a polynomial in $\varepsilon$. Using it, we show that the scrambling speed of a random quantum circuit is lower bounded: Namely, every input qubit has an influence that is at least exponentially small in depth, on any output qubit touched by its lightcone. We give three applications of this new scrambling speed lower bound that apply to random quantum circuits with Haar random gates: $\bullet$ An optimal $Ω(\log \varepsilon^{-1})$ depth lower bound for $\varepsilon$-approximate unitary designs; $\bullet$ A polynomial-time quantum algorithm that computes the depth of a bounded-depth circuit, given oracle access to the circuit; $\bullet$ A polynomial-time algorithm that learns log-depth circuits up to polynomially small diamond distance, given oracle access to the circuit. The first depth lower bound works against any architecture. The latter two algorithms apply to architectures defined over any geometric dimension, and can be generalized to a wide class of architectures with good lightcone properties. △ Less

Submitted 28 July, 2024; originally announced July 2024.

Comments: 31 pages

arXiv:2407.19099 [pdf, other]

Sponsored is the New Organic: Implications of Sponsored Results on Quality of Search Results in the Amazon Marketplace

Authors: Abhisek Dash, Saptarshi Ghosh, Animesh Mukherjee, Abhijnan Chakraborty, Krishna P. Gummadi

Abstract: Interleaving sponsored results (advertisements) amongst organic results on search engine result pages (SERP) has become a common practice across multiple digital platforms. Advertisements have catered to consumer satisfaction and fostered competition in digital public spaces; making them an appealing gateway for businesses to reach their consumers. However, especially in the context of digital mar… ▽ More Interleaving sponsored results (advertisements) amongst organic results on search engine result pages (SERP) has become a common practice across multiple digital platforms. Advertisements have catered to consumer satisfaction and fostered competition in digital public spaces; making them an appealing gateway for businesses to reach their consumers. However, especially in the context of digital marketplaces, due to the competitive nature of the sponsored results with the organic ones, multiple unwanted repercussions have surfaced affecting different stakeholders. From the consumers' perspective the sponsored ads/results may cause degradation of search quality and nudge consumers to potentially irrelevant and costlier products. The sponsored ads may also affect the level playing field of the competition in the marketplaces among sellers. To understand and unravel these potential concerns, we analyse the Amazon digital marketplace in four different countries by simulating 4,800 search operations. Our analyses over SERPs consisting 2M organic and 638K sponsored results show items with poor organic ranks (beyond 100th position) appear as sponsored results even before the top organic results on the first page of Amazon SERP. Moreover, we also observe that in majority of the cases, these top sponsored results are costlier and are of poorer quality than the top organic results. We believe these observations can motivate researchers for further deliberation to bring in more transparency and guard rails in the advertising practices followed in digital marketplaces. △ Less

Submitted 26 July, 2024; originally announced July 2024.

Comments: This work has been accepted as a full paper in AAAI/ACM conference on Artificial Intelligence, Ethics and Society (AIES) 2024

arXiv:2407.16661 [pdf, ps, other]

Regenerative Ulam-von Neumann Algorithm: An Innovative Markov chain Monte Carlo Method for Matrix Inversion

Authors: Soumyadip Ghosh, Lior Horesh, Vassilis Kalantzis, Yingdong Lu, Tomasz Nowicki

Abstract: This paper presents an extension of the classical Ulan-von Neumann Markov chain Monte-Carlo algorithm for the computation of the matrix inverse. The algorithm presented in this paper, termed as \emph{regenerative Ulam-von Neumann algorithm}, utilizes the regenerative structure of classical, non-truncated Neumann series defined by a non-singular matrix and produces an unbiased estimator of the matr… ▽ More This paper presents an extension of the classical Ulan-von Neumann Markov chain Monte-Carlo algorithm for the computation of the matrix inverse. The algorithm presented in this paper, termed as \emph{regenerative Ulam-von Neumann algorithm}, utilizes the regenerative structure of classical, non-truncated Neumann series defined by a non-singular matrix and produces an unbiased estimator of the matrix inverse. Furthermore, the accuracy of the proposed algorithm depends on a single parameter that controls the total number of Markov transitions simulated thus avoiding the challenge of balancing between the total number of Markov chain replications and its corresponding length as in the classical Ulam-von Neumann algorithm. To efficiently utilize the Markov chain transition samples in the calculation of the regenerative quantities, the proposed algorithm quantifies automatically the contribution of each Markov transition to all regenerative quantities by a carefully designed updating scheme that utilized three separate matrices containing the current weights, total weights, and regenerative cycle count, respectively. A probabilistic analysis of the performance of the algorithm, including the variance of the estimator, is provided. Finally, numerical experiments verify the qualitative effectiveness of the proposed scheme. △ Less

Submitted 16 August, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

MSC Class: 68Q25; 68R10; 65C05

arXiv:2407.15810 [pdf, other]

Breaking the Global North Stereotype: A Global South-centric Benchmark Dataset for Auditing and Mitigating Biases in Facial Recognition Systems

Authors: Siddharth D Jaiswal, Animesh Ganai, Abhisek Dash, Saptarshi Ghosh, Animesh Mukherjee

Abstract: Facial Recognition Systems (FRSs) are being developed and deployed globally at unprecedented rates. Most platforms are designed in a limited set of countries but deployed in worldwide, without adequate checkpoints. This is especially problematic for Global South countries which lack strong legislation to safeguard persons facing disparate performance of these systems. A combination of unavailabili… ▽ More Facial Recognition Systems (FRSs) are being developed and deployed globally at unprecedented rates. Most platforms are designed in a limited set of countries but deployed in worldwide, without adequate checkpoints. This is especially problematic for Global South countries which lack strong legislation to safeguard persons facing disparate performance of these systems. A combination of unavailability of datasets, lack of understanding of FRS functionality and low-resource bias mitigation measures accentuate the problem. In this work, we propose a new face dataset composed of 6,579 unique male and female sportspersons from eight countries around the world. More than 50% of the dataset comprises individuals from the Global South countries and is demographically diverse. To aid adversarial audits and robust model training, each image has four adversarial variants, totaling over 40,000 images. We also benchmark five popular FRSs, both commercial and open-source, for the task of gender prediction (and country prediction for one of the open-source models as an example of red-teaming). Experiments on industrial FRSs reveal accuracies ranging from 98.2%--38.1%, with a large disparity between males and females in the Global South (max difference of 38.5%). Biases are also observed in all FRSs between females of the Global North and South (max difference of ~50%). Grad-CAM analysis identifies the nose, forehead and mouth as the regions of interest on one of the open-source FRSs. Utilizing this insight, we design simple, low-resource bias mitigation solutions using few-shot and novel contrastive learning techniques significantly improving the accuracy with disparity between males and females reducing from 50% to 1.5% in one of the settings. In the red-teaming experiment with the open-source Deepface model, contrastive learning proves more effective than simple fine-tuning. △ Less

Submitted 26 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

Comments: This work has been accepted for publication at AAAI/ACM AIES 2024

arXiv:2407.14779 [pdf, other]

Do Generative AI Models Output Harm while Representing Non-Western Cultures: Evidence from A Community-Centered Approach

Authors: Sourojit Ghosh, Pranav Narayanan Venkit, Sanjana Gautam, Shomir Wilson, Aylin Caliskan

Abstract: Our research investigates the impact of Generative Artificial Intelligence (GAI) models, specifically text-to-image generators (T2Is), on the representation of non-Western cultures, with a focus on Indian contexts. Despite the transformative potential of T2Is in content creation, concerns have arisen regarding biases that may lead to misrepresentations and marginalizations. Through a community-cen… ▽ More Our research investigates the impact of Generative Artificial Intelligence (GAI) models, specifically text-to-image generators (T2Is), on the representation of non-Western cultures, with a focus on Indian contexts. Despite the transformative potential of T2Is in content creation, concerns have arisen regarding biases that may lead to misrepresentations and marginalizations. Through a community-centered approach and grounded theory analysis of 5 focus groups from diverse Indian subcultures, we explore how T2I outputs to English prompts depict Indian culture and its subcultures, uncovering novel representational harms such as exoticism and cultural misappropriation. These findings highlight the urgent need for inclusive and culturally sensitive T2I systems. We propose design guidelines informed by a sociotechnical perspective, aiming to address these issues and contribute to the development of more equitable and representative GAI technologies globally. Our work also underscores the necessity of adopting a community-centered approach to comprehend the sociotechnical dynamics of these models, complementing existing work in this space while identifying and addressing the potential negative repercussions and harms that may arise when these models are deployed on a global scale. △ Less

Submitted 3 August, 2024; v1 submitted 20 July, 2024; originally announced July 2024.

Comments: This is the pre-peer reviewed version, which has been accepted at the 7th AAAI ACM Conference on AI, Ethics, and Society, Oct. 21, 2024, California, USA

arXiv:2407.14650 [pdf, other]

Auditing the Grid-Based Placement of Private Label Products on E-commerce Search Result Pages

Authors: Siddharth D Jaiswal, Abhisek Dash, Nitika Shroff, Yashwanth Babu Vunnam, Saptarshi Ghosh, Animesh Mukherjee

Abstract: E-commerce platforms support the needs and livelihoods of their two most important stakeholders -- customers and producers/sellers. Multiple algorithmic systems, like ``search'' systems mediate the interactions between these stakeholders by connecting customers to producers with relevant items. Search results include (i) private label (PL) products that are manufactured/sold by the platform itself… ▽ More E-commerce platforms support the needs and livelihoods of their two most important stakeholders -- customers and producers/sellers. Multiple algorithmic systems, like ``search'' systems mediate the interactions between these stakeholders by connecting customers to producers with relevant items. Search results include (i) private label (PL) products that are manufactured/sold by the platform itself, as well as (ii) third-party products on advertised / sponsored and organic positions. In this paper, we systematically quantify the extent of PL product promotion on e-commerce search results for the two largest e-commerce platforms operating in India -- Amazon.in and Flipkart. By analyzing snapshots of search results across the two platforms, we discover high PL promotion on the initial result pages (~ 15% PLs are advertised on the first SERP of Amazon). Both platforms use different strategies to promote their PL products, such as placing more PLs on the advertised positions -- while Amazon places them on the first, middle, and last rows of the search results, Flipkart places them on the first two positions and the (entire) last column of the search results. We discover that these product placement strategies of both platforms conform with existing user attention strategies proposed in the literature. Finally, to supplement the findings from the collected data, we conduct a survey among 68 participants on Amazon Mechanical Turk. The click pattern from our survey shows that users strongly prefer to click on products placed at positions that correspond to the PL products on the search results of Amazon, but not so strongly on Flipkart. The click-through rate follows previously proposed theoretically grounded user attention distribution patterns in a two-dimensional layout. △ Less

Submitted 19 July, 2024; originally announced July 2024.

arXiv:2407.12869 [pdf, ps, other]

Bilingual Adaptation of Monolingual Foundation Models

Authors: Gurpreet Gosal, Yishi Xu, Gokul Ramakrishnan, Rituraj Joshi, Avraham Sheinin, Zhiming, Chen, Biswajit Mishra, Natalia Vassilieva, Joel Hestness, Neha Sengupta, Sunil Kumar Sahu, Bokang Jia, Onkar Pandit, Satheesh Katipomu, Samta Kamboj, Samujjwal Ghosh, Rahul Pal, Parvez Mullah, Soundar Doraiswamy, Mohamed El Karim Chami, Preslav Nakov

Abstract: We present an efficient method for adapting a monolingual Large Language Model (LLM) to another language, addressing challenges of catastrophic forgetting and tokenizer limitations. We focus this study on adapting Llama 2 to Arabic. Our two-stage approach begins with expanding the vocabulary and training only the embeddings matrix, followed by full model continual pre-training on a bilingual corpu… ▽ More We present an efficient method for adapting a monolingual Large Language Model (LLM) to another language, addressing challenges of catastrophic forgetting and tokenizer limitations. We focus this study on adapting Llama 2 to Arabic. Our two-stage approach begins with expanding the vocabulary and training only the embeddings matrix, followed by full model continual pre-training on a bilingual corpus. By continually pre-training on a mix of Arabic and English corpora, the model retains its proficiency in English while acquiring capabilities in Arabic. Our approach results in significant improvements in Arabic and slight enhancements in English, demonstrating cost-effective cross-lingual transfer. We perform ablations on embedding initialization techniques, data mix ratios, and learning rates and release a detailed training recipe. To demonstrate generalizability of this approach we also adapted Llama 3 8B to Arabic and Llama 2 13B to Hindi. △ Less

Submitted 25 July, 2024; v1 submitted 13 July, 2024; originally announced July 2024.

arXiv:2407.12848 [pdf, ps, other]

Applicability of Large Language Models and Generative Models for Legal Case Judgement Summarization

Authors: Aniket Deroy, Kripabandhu Ghosh, Saptarshi Ghosh

Abstract: Automatic summarization of legal case judgements, which are known to be long and complex, has traditionally been tried via extractive summarization models. In recent years, generative models including abstractive summarization models and Large language models (LLMs) have gained huge popularity. In this paper, we explore the applicability of such models for legal case judgement summarization. We ap… ▽ More Automatic summarization of legal case judgements, which are known to be long and complex, has traditionally been tried via extractive summarization models. In recent years, generative models including abstractive summarization models and Large language models (LLMs) have gained huge popularity. In this paper, we explore the applicability of such models for legal case judgement summarization. We applied various domain specific abstractive summarization models and general domain LLMs as well as extractive summarization models over two sets of legal case judgements from the United Kingdom (UK) Supreme Court and the Indian (IN) Supreme Court and evaluated the quality of the generated summaries. We also perform experiments on a third dataset of legal documents of a different type, Government reports from the United States (US). Results show that abstractive summarization models and LLMs generally perform better than the extractive methods as per traditional metrics for evaluating summary quality. However, detailed investigation shows the presence of inconsistencies and hallucinations in the outputs of the generative models, and we explore ways to reduce the hallucinations and inconsistencies in the summaries. Overall, the investigation suggests that further improvements are needed to enhance the reliability of abstractive models and LLMs for legal case judgement summarization. At present, a human-in-the-loop technique is more suitable for performing manual checks to identify inconsistencies in the generated summaries. △ Less

Submitted 20 July, 2024; v1 submitted 6 July, 2024; originally announced July 2024.

Comments: Accepted at Artificial Intelligence and Law, Springer, 2024

arXiv:2407.11268 [pdf, other]

Heterogenous Multi-Source Data Fusion Through Input Mapping and Latent Variable Gaussian Process

Authors: Yigitcan Comlek, Sandipp Krishnan Ravi, Piyush Pandita, Sayan Ghosh, Liping Wang, Wei Chen

Abstract: Artificial intelligence and machine learning frameworks have served as computationally efficient mapping between inputs and outputs for engineering problems. These mappings have enabled optimization and analysis routines that have warranted superior designs, ingenious material systems and optimized manufacturing processes. A common occurrence in such modeling endeavors is the existence of multiple… ▽ More Artificial intelligence and machine learning frameworks have served as computationally efficient mapping between inputs and outputs for engineering problems. These mappings have enabled optimization and analysis routines that have warranted superior designs, ingenious material systems and optimized manufacturing processes. A common occurrence in such modeling endeavors is the existence of multiple source of data, each differentiated by fidelity, operating conditions, experimental conditions, and more. Data fusion frameworks have opened the possibility of combining such differentiated sources into single unified models, enabling improved accuracy and knowledge transfer. However, these frameworks encounter limitations when the different sources are heterogeneous in nature, i.e., not sharing the same input parameter space. These heterogeneous input scenarios can occur when the domains differentiated by complexity, scale, and fidelity require different parametrizations. Towards addressing this void, a heterogeneous multi-source data fusion framework is proposed based on input mapping calibration (IMC) and latent variable Gaussian process (LVGP). In the first stage, the IMC algorithm is utilized to transform the heterogeneous input parameter spaces into a unified reference parameter space. In the second stage, a multi-source data fusion model enabled by LVGP is leveraged to build a single source-aware surrogate model on the transformed reference space. The proposed framework is demonstrated and analyzed on three engineering case studies (design of cantilever beam, design of ellipsoidal void and modeling properties of Ti6Al4V alloy). The results indicate that the proposed framework provides improved predictive accuracy over a single source model and transformed but source unaware model. △ Less

Submitted 15 July, 2024; originally announced July 2024.

Comments: 20 Pages,9 Figures, Data is available per request

arXiv:2407.07237 [pdf, other]

The Quantum Imitation Game: Reverse Engineering of Quantum Machine Learning Models

Authors: Archisman Ghosh, Swaroop Ghosh

Abstract: Quantum Machine Learning (QML) amalgamates quantum computing paradigms with machine learning models, providing significant prospects for solving complex problems. However, with the expansion of numerous third-party vendors in the Noisy Intermediate-Scale Quantum (NISQ) era of quantum computing, the security of QML models is of prime importance, particularly against reverse engineering, which could… ▽ More Quantum Machine Learning (QML) amalgamates quantum computing paradigms with machine learning models, providing significant prospects for solving complex problems. However, with the expansion of numerous third-party vendors in the Noisy Intermediate-Scale Quantum (NISQ) era of quantum computing, the security of QML models is of prime importance, particularly against reverse engineering, which could expose trained parameters and algorithms of the models. We assume the untrusted quantum cloud provider is an adversary having white-box access to the transpiled user-designed trained QML model during inference. Reverse engineering (RE) to extract the pre-transpiled QML circuit will enable re-transpilation and usage of the model for various hardware with completely different native gate sets and even different qubit technology. Such flexibility may not be obtained from the transpiled circuit which is tied to a particular hardware and qubit technology. The information about the number of parameters, and optimized values can allow further training of the QML model to alter the QML model, tamper with the watermark, and/or embed their own watermark or refine the model for other purposes. In this first effort to investigate the RE of QML circuits, we perform RE and compare the training accuracy of original and reverse-engineered Quantum Neural Networks (QNNs) of various sizes. We note that multi-qubit classifiers can be reverse-engineered under specific conditions with a mean error of order 1e-2 in a reasonable time. We also propose adding dummy fixed parametric gates in the QML models to increase the RE overhead for defense. For instance, adding 2 dummy qubits and 2 layers increases the overhead by ~1.76 times for a classifier with 2 qubits and 3 layers with a performance overhead of less than 9%. We note that RE is a very powerful attack model which warrants further efforts on defenses. △ Less

Submitted 15 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

Comments: 11 pages, 12 figures

arXiv:2407.06323 [pdf, ps, other]

When in Doubt, Cascade: Towards Building Efficient and Capable Guardrails

Authors: Manish Nagireddy, Inkit Padhi, Soumya Ghosh, Prasanna Sattigeri

Abstract: Large language models (LLMs) have convincing performance in a variety of downstream tasks. However, these systems are prone to generating undesirable outputs such as harmful and biased text. In order to remedy such generations, the development of guardrail (or detector) models has gained traction. Motivated by findings from developing a detector for social bias, we adopt the notion of a use-mentio… ▽ More Large language models (LLMs) have convincing performance in a variety of downstream tasks. However, these systems are prone to generating undesirable outputs such as harmful and biased text. In order to remedy such generations, the development of guardrail (or detector) models has gained traction. Motivated by findings from developing a detector for social bias, we adopt the notion of a use-mention distinction - which we identified as the primary source of under-performance in the preliminary versions of our social bias detector. Armed with this information, we describe a fully extensible and reproducible synthetic data generation pipeline which leverages taxonomy-driven instructions to create targeted and labeled data. Using this pipeline, we generate over 300K unique contrastive samples and provide extensive experiments to systematically evaluate performance on a suite of open source datasets. We show that our method achieves competitive performance with a fraction of the cost in compute and offers insight into iteratively developing efficient and capable guardrail models. Warning: This paper contains examples of text which are toxic, biased, and potentially harmful. △ Less

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05901 [pdf, other]

Intelligent Routing as a Service (iRaaS)

Authors: Saptarshi Ghosh, Konstantinos Antonakoglou, Ioannis Mavromatis, Kostas Katsaros

Abstract: The scope of the Sixth-Generation Self-Organized Networks (6G-SON) advances its predecessor's capability towards agility, flexibility, and adaptability. On-demand overlay networking technologies have shown a prominent maturity while coping with the rising complexity and scale of enterprise, service provider, and data centre networks. The Software-Defined Networking paradigm has recently offered Mo… ▽ More The scope of the Sixth-Generation Self-Organized Networks (6G-SON) advances its predecessor's capability towards agility, flexibility, and adaptability. On-demand overlay networking technologies have shown a prominent maturity while coping with the rising complexity and scale of enterprise, service provider, and data centre networks. The Software-Defined Networking paradigm has recently offered Model Driven Programmability, minimizing network management complexity through automation and orchestration. However, leveraging Machine Learning-driven network optimization, a.k.a. Knowledge-Defined Networking (KDN), has still been a domain of interest for the Network Softwarization research community. In this article, we propose Intelligent Routing as a Service (iRaaS) architecture as an application layer cognitive routing framework for KDNs. iRaaS offers routing logic customization (i.e., customizing metric function and path-finding algorithm) and provides an option to include heuristic parameters from trained models as a part of the metric calculation. iRaaS sits on the application plane above the knowledge plane in a KDN stack, thus providing platform- and vendor-agnostic coupling with existing network infrastructures. This article covers the scope of iRaaS by using reliability as a heuristic for standard path-discovery algorithms, e.g., Shortest Path First (SPF) and Diffusion Update algorithm (DUAL), along with the architectural specification. We validate our approach through a Proof-of-Concept deployment. △ Less

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05399 [pdf, other]

IL-TUR: Benchmark for Indian Legal Text Understanding and Reasoning

Authors: Abhinav Joshi, Shounak Paul, Akshat Sharma, Pawan Goyal, Saptarshi Ghosh, Ashutosh Modi

Abstract: Legal systems worldwide are inundated with exponential growth in cases and documents. There is an imminent need to develop NLP and ML techniques for automatically processing and understanding legal documents to streamline the legal system. However, evaluating and comparing various NLP models designed specifically for the legal domain is challenging. This paper addresses this challenge by proposing… ▽ More Legal systems worldwide are inundated with exponential growth in cases and documents. There is an imminent need to develop NLP and ML techniques for automatically processing and understanding legal documents to streamline the legal system. However, evaluating and comparing various NLP models designed specifically for the legal domain is challenging. This paper addresses this challenge by proposing IL-TUR: Benchmark for Indian Legal Text Understanding and Reasoning. IL-TUR contains monolingual (English, Hindi) and multi-lingual (9 Indian languages) domain-specific tasks that address different aspects of the legal system from the point of view of understanding and reasoning over Indian legal documents. We present baseline models (including LLM-based) for each task, outlining the gap between models and the ground truth. To foster further research in the legal domain, we create a leaderboard (available at: https://exploration-lab.github.io/IL-TUR/) where the research community can upload and compare legal text understanding systems. △ Less

Submitted 7 July, 2024; originally announced July 2024.

Comments: Accepted at ACL 2024 Main Conference; 40 Pages (9 Pages + References + Appendix)

arXiv:2407.03835 [pdf, other]

7th ABAW Competition: Multi-Task Learning and Compound Expression Recognition

Authors: Dimitrios Kollias, Stefanos Zafeiriou, Irene Kotsia, Abhinav Dhall, Shreya Ghosh, Chunchang Shao, Guanyu Hu

Abstract: This paper describes the 7th Affective Behavior Analysis in-the-wild (ABAW) Competition, which is part of the respective Workshop held in conjunction with ECCV 2024. The 7th ABAW Competition addresses novel challenges in understanding human expressions and behaviors, crucial for the development of human-centered technologies. The Competition comprises of two sub-challenges: i) Multi-Task Learning… ▽ More This paper describes the 7th Affective Behavior Analysis in-the-wild (ABAW) Competition, which is part of the respective Workshop held in conjunction with ECCV 2024. The 7th ABAW Competition addresses novel challenges in understanding human expressions and behaviors, crucial for the development of human-centered technologies. The Competition comprises of two sub-challenges: i) Multi-Task Learning (the goal is to learn at the same time, in a multi-task learning setting, to estimate two continuous affect dimensions, valence and arousal, to recognise between the mutually exclusive classes of the 7 basic expressions and 'other'), and to detect 12 Action Units); and ii) Compound Expression Recognition (the target is to recognise between the 7 mutually exclusive compound expression classes). s-Aff-Wild2, which is a static version of the A/V Aff-Wild2 database and contains annotations for valence-arousal, expressions and Action Units, is utilized for the purposes of the Multi-Task Learning Challenge; a part of C-EXPR-DB, which is an A/V in-the-wild database with compound expression annotations, is utilized for the purposes of the Compound Expression Recognition Challenge. In this paper, we introduce the two challenges, detailing their datasets and the protocols followed for each. We also outline the evaluation metrics, and highlight the baseline systems and their results. Additional information about the competition can be found at \url{https://affective-behavior-analysis-in-the-wild.github.io/7th}. △ Less

Submitted 8 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.02658 [pdf, other]

Efficient Exact Algorithms for Minimum Covering of Orthogonal Polygons with Squares

Authors: Anubhav Dhar, Subham Ghosh, Sudeshna Kolay

Abstract: The Orthogonal Polygon Covering with Squares (OPCS) problem takes as input an orthogonal polygon $P$ without holes with $n$ vertices, where vertices have integral coordinates. The aim is to find a minimum number of axis-parallel, possibly overlapping squares which lie completely inside $P$, such that their union covers the entire region inside $P$. Aupperle et. al~\cite{aupperle1988covering} provi… ▽ More The Orthogonal Polygon Covering with Squares (OPCS) problem takes as input an orthogonal polygon $P$ without holes with $n$ vertices, where vertices have integral coordinates. The aim is to find a minimum number of axis-parallel, possibly overlapping squares which lie completely inside $P$, such that their union covers the entire region inside $P$. Aupperle et. al~\cite{aupperle1988covering} provide an $\mathcal O(N^{1.5})$-time algorithm to solve OPCS for orthogonal polygons without holes, where $N$ is the number of integral lattice points lying in the interior or on the boundary of $P$. Designing algorithms for OPCS with a running time polynomial in $n$ (the number of vertices of $P$) was discussed as an open question in \cite{aupperle1988covering}, since $N$ can be exponentially larger than $n$. In this paper we design a polynomial-time exact algorithm for OPCS with a running time of $\mathcal O(n^{14})$. We also consider the following structural parameterized version of the problem. A knob in an orthogonal polygon is a polygon edge whose both endpoints are convex polygon vertices. Given an input orthogonal polygon with $n$ vertices and $k$ knobs, we design an algorithm for OPCS with running time $\mathcal O(n^2 + k^{14} \cdot n)$. In \cite{aupperle1988covering}, the Orthogonal Polygon with Holes Covering with Squares (OPCSH) problem is also studied where orthogonal polygon could have holes, and the objective is to find a minimum square covering of the input polygon. This is shown to be NP-complete. We think there is an error in the existing proof in \cite{aupperle1988covering}, where a reduction from Planar 3-CNF is shown. We fix this error in the proof with an alternate construction of one of the gadgets used in the reduction, hence completing the proof of NP-completeness of OPCSH. △ Less

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.02536 [pdf, other]

doi 10.4230/LIPIcs.GIScience.2023.3

Reducing False Discoveries in Statistically-Significant Regional-Colocation Mining: A Summary of Results

Authors: Subhankar Ghosh, Jayant Gupta, Arun Sharma, Shuai An, Shashi Shekhar

Abstract: Given a set \emph{S} of spatial feature types, its feature instances, a study area, and a neighbor relationship, the goal is to find pairs $<$a region ($r_{g}$), a subset \emph{C} of \emph{S}$>$ such that \emph{C} is a statistically significant regional-colocation pattern in $r_{g}$. This problem is important for applications in various domains including ecology, economics, and sociology. The prob… ▽ More Given a set \emph{S} of spatial feature types, its feature instances, a study area, and a neighbor relationship, the goal is to find pairs $<$a region ($r_{g}$), a subset \emph{C} of \emph{S}$>$ such that \emph{C} is a statistically significant regional-colocation pattern in $r_{g}$. This problem is important for applications in various domains including ecology, economics, and sociology. The problem is computationally challenging due to the exponential number of regional colocation patterns and candidate regions. Previously, we proposed a miner \cite{10.1145/3557989.3566158} that finds statistically significant regional colocation patterns. However, the numerous simultaneous statistical inferences raise the risk of false discoveries (also known as the multiple comparisons problem) and carry a high computational cost. We propose a novel algorithm, namely, multiple comparisons regional colocation miner (MultComp-RCM) which uses a Bonferroni correction. Theoretical analysis, experimental evaluation, and case study results show that the proposed method reduces both the false discovery rate and computational cost. △ Less

Submitted 1 July, 2024; originally announced July 2024.

ACM Class: E.m; F.2; E.1; H.3; I.5; J.0

arXiv:2407.01878 [pdf, other]

Compare without Despair: Reliable Preference Evaluation with Generation Separability

Authors: Sayan Ghosh, Tejas Srinivasan, Swabha Swayamdipta

Abstract: Human evaluation of generated language through pairwise preference judgments is pervasive. However, under common scenarios, such as when generations from a model pair are very similar, or when stochastic decoding results in large variations in generations, it results in inconsistent preference ratings. We address these challenges by introducing a meta-evaluation measure, separability, which estima… ▽ More Human evaluation of generated language through pairwise preference judgments is pervasive. However, under common scenarios, such as when generations from a model pair are very similar, or when stochastic decoding results in large variations in generations, it results in inconsistent preference ratings. We address these challenges by introducing a meta-evaluation measure, separability, which estimates how suitable a test instance is for pairwise preference evaluation. For a candidate test instance, separability samples multiple generations from a pair of models, and measures how distinguishable the two sets of generations are. Our experiments show that instances with high separability values yield more consistent preference ratings from both human- and auto-raters. Further, the distribution of separability allows insights into which test benchmarks are more valuable for comparing models. Finally, we incorporate separability into ELO ratings, accounting for how suitable each test instance might be for reliably ranking LLMs. Overall, separability has implications for consistent, efficient and robust preference evaluation of LLMs with both human- and auto-raters. △ Less

Submitted 8 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

Comments: Corrected description of reference in Related Work

arXiv:2407.01732 [pdf, other]

Investigating Nudges toward Related Sellers on E-commerce Marketplaces: A Case Study on Amazon

Authors: Abhisek Dash, Abhijnan Chakraborty, Saptarshi Ghosh, Animesh Mukherjee, Krishna P. Gummadi

Abstract: E-commerce marketplaces provide business opportunities to millions of sellers worldwide. Some of these sellers have special relationships with the marketplace by virtue of using their subsidiary services (e.g., fulfillment and/or shipping services provided by the marketplace) -- we refer to such sellers collectively as Related Sellers. When multiple sellers offer to sell the same product, the mark… ▽ More E-commerce marketplaces provide business opportunities to millions of sellers worldwide. Some of these sellers have special relationships with the marketplace by virtue of using their subsidiary services (e.g., fulfillment and/or shipping services provided by the marketplace) -- we refer to such sellers collectively as Related Sellers. When multiple sellers offer to sell the same product, the marketplace helps a customer in selecting an offer (by a seller) through (a) a default offer selection algorithm, (b) showing features about each of the offers and the corresponding sellers (price, seller performance metrics, seller's number of ratings etc.), and (c) finally evaluating the sellers along these features. In this paper, we perform an end-to-end investigation into how the above apparatus can nudge customers toward the Related Sellers on Amazon's four different marketplaces in India, USA, Germany and France. We find that given explicit choices, customers' preferred offers and algorithmically selected offers can be significantly different. We highlight that Amazon is adopting different performance metric evaluation policies for different sellers, potentially benefiting Related Sellers. For instance, such policies result in notable discrepancy between the actual performance metric and the presented performance metric of Related Sellers. We further observe that among the seller-centric features visible to customers, sellers' number of ratings influences their decisions the most, yet it may not reflect the true quality of service by the seller, rather reflecting the scale at which the seller operates, thereby implicitly steering customers toward larger Related Sellers. Moreover, when customers are shown the rectified metrics for the different sellers, their preference toward Related Sellers is almost halved. △ Less

Submitted 1 July, 2024; originally announced July 2024.

Comments: This work has been accepted for presentation at the ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW) 2024. It will appear in Proceedings of the ACM on Human-Computer Interaction

arXiv:2407.00317 [pdf, other]

Towards Statistically Significant Taxonomy Aware Co-location Pattern Detection

Authors: Subhankar Ghosh, Arun Sharma, Jayant Gupta, Shashi Shekhar

Abstract: Given a collection of Boolean spatial feature types, their instances, a neighborhood relation (e.g., proximity), and a hierarchical taxonomy of the feature types, the goal is to find the subsets of feature types or their parents whose spatial interaction is statistically significant. This problem is for taxonomy-reliant applications such as ecology (e.g., finding new symbiotic relationships across… ▽ More Given a collection of Boolean spatial feature types, their instances, a neighborhood relation (e.g., proximity), and a hierarchical taxonomy of the feature types, the goal is to find the subsets of feature types or their parents whose spatial interaction is statistically significant. This problem is for taxonomy-reliant applications such as ecology (e.g., finding new symbiotic relationships across the food chain), spatial pathology (e.g., immunotherapy for cancer), retail, etc. The problem is computationally challenging due to the exponential number of candidate co-location patterns generated by the taxonomy. Most approaches for co-location pattern detection overlook the hierarchical relationships among spatial features, and the statistical significance of the detected patterns is not always considered, leading to potential false discoveries. This paper introduces two methods for incorporating taxonomies and assessing the statistical significance of co-location patterns. The baseline approach iteratively checks the significance of co-locations between leaf nodes or their ancestors in the taxonomy. Using the Benjamini-Hochberg procedure, an advanced approach is proposed to control the false discovery rate. This approach effectively reduces the risk of false discoveries while maintaining the power to detect true co-location patterns. Experimental evaluation and case study results show the effectiveness of the approach. △ Less

Submitted 4 July, 2024; v1 submitted 29 June, 2024; originally announced July 2024.

Comments: Accepted in The 16th Conference on Spatial Information Theory (COSIT) 2024

ACM Class: E.m; H.3.3; I.5; J.4; J.4

arXiv:2406.17957 [pdf, other]

Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment

Authors: Paarth Neekhara, Shehzeen Hussain, Subhankar Ghosh, Jason Li, Rafael Valle, Rohan Badlani, Boris Ginsburg

Abstract: Large Language Model (LLM) based text-to-speech (TTS) systems have demonstrated remarkable capabilities in handling large speech datasets and generating natural speech for new speakers. However, LLM-based TTS models are not robust as the generated output can contain repeating words, missing words and mis-aligned speech (referred to as hallucinations or attention errors), especially when the text c… ▽ More Large Language Model (LLM) based text-to-speech (TTS) systems have demonstrated remarkable capabilities in handling large speech datasets and generating natural speech for new speakers. However, LLM-based TTS models are not robust as the generated output can contain repeating words, missing words and mis-aligned speech (referred to as hallucinations or attention errors), especially when the text contains multiple occurrences of the same token. We examine these challenges in an encoder-decoder transformer model and find that certain cross-attention heads in such models implicitly learn the text and speech alignment when trained for predicting speech tokens for a given text. To make the alignment more robust, we propose techniques utilizing CTC loss and attention priors that encourage monotonic cross-attention over the text tokens. Our guided attention training technique does not introduce any new learnable parameters and significantly improves robustness of LLM-based TTS models. △ Less

Submitted 25 June, 2024; originally announced June 2024.

Comments: Published as a conference paper at INTERSPEECH 2024

arXiv:2406.11768 [pdf, other]

GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities

Authors: Sreyan Ghosh, Sonal Kumar, Ashish Seth, Chandra Kiran Reddy Evuru, Utkarsh Tyagi, S Sakshi, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha

Abstract: Perceiving and understanding non-speech sounds and non-verbal speech is essential to making decisions that help us interact with our surroundings. In this paper, we propose GAMA, a novel General-purpose Large Audio-Language Model (LALM) with Advanced Audio Understanding and Complex Reasoning Abilities. We build GAMA by integrating an LLM with multiple types of audio representations, including feat… ▽ More Perceiving and understanding non-speech sounds and non-verbal speech is essential to making decisions that help us interact with our surroundings. In this paper, we propose GAMA, a novel General-purpose Large Audio-Language Model (LALM) with Advanced Audio Understanding and Complex Reasoning Abilities. We build GAMA by integrating an LLM with multiple types of audio representations, including features from a custom Audio Q-Former, a multi-layer aggregator that aggregates features from multiple layers of an audio encoder. We fine-tune GAMA on a large-scale audio-language dataset, which augments it with audio understanding capabilities. Next, we propose CompA-R (Instruction-Tuning for Complex Audio Reasoning), a synthetically generated instruction-tuning (IT) dataset with instructions that require the model to perform complex reasoning on the input audio. We instruction-tune GAMA with CompA-R to endow it with complex reasoning abilities, where we further add a soft prompt as input with high-level semantic evidence by leveraging event tags of the input audio. Finally, we also propose CompA-R-test, a human-labeled evaluation dataset for evaluating the capabilities of LALMs on open-ended audio question-answering that requires complex reasoning. Through automated and expert human evaluations, we show that GAMA outperforms all other LALMs in literature on diverse audio understanding tasks by margins of 1%-84%. Further, GAMA IT-ed on CompA-R proves to be superior in its complex reasoning and instruction following capabilities. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: Project Website: https://sreyan88.github.io/gamaaudio/

arXiv:2406.11704 [pdf, other]

Nemotron-4 340B Technical Report

Authors: Nvidia, :, Bo Adler, Niket Agarwal, Ashwath Aithal, Dong H. Anh, Pallab Bhattacharya, Annika Brundyn, Jared Casper, Bryan Catanzaro, Sharon Clay, Jonathan Cohen, Sirshak Das, Ayush Dattagupta, Olivier Delalleau, Leon Derczynski, Yi Dong, Daniel Egert, Ellie Evans, Aleksander Ficek, Denys Fridman, Shaona Ghosh, Boris Ginsburg, Igor Gitman, Tomasz Grzegorzek , et al. (58 additional authors not shown)

Abstract: We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distribution, modification, and use of the models and its outputs. These models perform competitively to open access models on a wide range of evaluation be… ▽ More We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distribution, modification, and use of the models and its outputs. These models perform competitively to open access models on a wide range of evaluation benchmarks, and were sized to fit on a single DGX H100 with 8 GPUs when deployed in FP8 precision. We believe that the community can benefit from these models in various research studies and commercial applications, especially for generating synthetic data to train smaller language models. Notably, over 98% of data used in our model alignment process is synthetically generated, showcasing the effectiveness of these models in generating synthetic data. To further support open research and facilitate model development, we are also open-sourcing the synthetic data generation pipeline used in our model alignment process. △ Less

Submitted 6 August, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.06818 [pdf, other]

Conformal Prediction for Class-wise Coverage via Augmented Label Rank Calibration

Authors: Yuanjie Shi, Subhankar Ghosh, Taha Belkhouja, Janardhan Rao Doppa, Yan Yan

Abstract: Conformal prediction (CP) is an emerging uncertainty quantification framework that allows us to construct a prediction set to cover the true label with a pre-specified marginal or conditional probability. Although the valid coverage guarantee has been extensively studied for classification problems, CP often produces large prediction sets which may not be practically useful. This issue is exacerba… ▽ More Conformal prediction (CP) is an emerging uncertainty quantification framework that allows us to construct a prediction set to cover the true label with a pre-specified marginal or conditional probability. Although the valid coverage guarantee has been extensively studied for classification problems, CP often produces large prediction sets which may not be practically useful. This issue is exacerbated for the setting of class-conditional coverage on imbalanced classification tasks. This paper proposes the Rank Calibrated Class-conditional CP (RC3P) algorithm to reduce the prediction set sizes to achieve class-conditional coverage, where the valid coverage holds for each class. In contrast to the standard class-conditional CP (CCP) method that uniformly thresholds the class-wise conformity score for each class, the augmented label rank calibration step allows RC3P to selectively iterate this class-wise thresholding subroutine only for a subset of classes whose class-wise top-k error is small. We prove that agnostic to the classifier and data distribution, RC3P achieves class-wise coverage. We also show that RC3P reduces the size of prediction sets compared to the CCP method. Comprehensive experiments on multiple real-world datasets demonstrate that RC3P achieves class-wise coverage and 26.25% reduction in prediction set sizes on average. △ Less

Submitted 10 June, 2024; originally announced June 2024.

Showing 1–50 of 738 results for author: Ghosh, S