Search | arXiv e-print repository

Value Improved Actor Critic Algorithms

Authors: Yaniv Oren, Moritz A. Zanger, Pascal R. van der Vaart, Matthijs T. J. Spaan, Wendelin Böhmer

Abstract: Many modern reinforcement learning algorithms build on the actor-critic (AC) framework: iterative improvement of a policy (the actor) using policy improvement operators and iterative approximation of the policy's value (the critic). In contrast, the popular value-based algorithm family employs improvement operators in the value update, to iteratively improve the value function directly. In this work, we propose a general extension to the AC framework that employs two separate improvement operators: one applied to the policy in the spirit of policy-based algorithms and one applied to the value in the spirit of value-based algorithms, which we dub Value-Improved AC (VI-AC). We design two practical VI-AC algorithms based on the popular online off-policy AC algorithms TD3 and DDPG. We evaluate VI-TD3 and VI-DDPG in the Mujoco benchmark and find that both improve upon or match the performance of their respective baselines in all environments tested.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2310.17623 [pdf, other]

Proving Test Set Contamination in Black Box Language Models

Authors: Yonatan Oren, Nicole Meister, Niladri Chatterji, Faisal Ladhak, Tatsunori B. Hashimoto

Abstract: Large language models are trained on vast amounts of internet data, prompting concerns and speculation that they have memorized public benchmarks. Going from speculation to proof of contamination is challenging, as the pretraining data used by proprietary models are often not publicly accessible. We show that it is possible to provide provable guarantees of test set contamination in language models without access to pretraining data or model weights. Our approach leverages the fact that when there is no data contamination, all orderings of an exchangeable benchmark should be equally likely. In contrast, the tendency for language models to memorize example order means that a contaminated language model will find certain canonical orderings to be much more likely than others. Our test flags potential contamination whenever the likelihood of a canonically ordered benchmark dataset is significantly higher than the likelihood after shuffling the examples. We demonstrate that our procedure is sensitive enough to reliably prove test set contamination in challenging situations, including models as small as 1.4 billion parameters, on small test sets of only 1000 examples, and datasets that appear only a few times in the pretraining corpus. Using our test, we audit five popular publicly accessible language models for test set contamination and find little evidence for pervasive contamination.

Submitted 23 November, 2023; v1 submitted 26 October, 2023; originally announced October 2023.
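
The ordering test summarized in the abstract above can be illustrated with a generic permutation test. This is only a minimal sketch, not the authors' exact procedure: the function name `permutation_p_value`, the `log_likelihood` callback, and the toy scorers are hypothetical stand-ins for a real language model's sequence log-likelihood.

```python
import random
from typing import Callable, Sequence


def permutation_p_value(
    examples: Sequence[int],
    log_likelihood: Callable[[Sequence[int]], float],
    num_shuffles: int = 99,
    seed: int = 0,
) -> float:
    """Estimate how unusual the canonical ordering is among random orderings.

    `log_likelihood` is a hypothetical callback that scores an ordered
    benchmark; for a real model it would be the model's log-probability.
    """
    rng = random.Random(seed)
    canonical_score = log_likelihood(examples)
    at_least_as_likely = 0
    for _ in range(num_shuffles):
        shuffled = list(examples)
        rng.shuffle(shuffled)
        if log_likelihood(shuffled) >= canonical_score:
            at_least_as_likely += 1
    # Standard permutation-test estimate; the +1 terms keep it strictly positive.
    return (1 + at_least_as_likely) / (1 + num_shuffles)


def memorizing_scorer(ordering: Sequence[int]) -> float:
    """Toy 'contaminated' scorer: rewards adjacent pairs kept in canonical order."""
    return float(sum(1 for a, b in zip(ordering, ordering[1:]) if b == a + 1))


def indifferent_scorer(ordering: Sequence[int]) -> float:
    """Toy 'clean' scorer: every ordering is equally likely."""
    return 0.0


benchmark = list(range(20))  # example IDs in their canonical order
p_contaminated = permutation_p_value(benchmark, memorizing_scorer)
p_clean = permutation_p_value(benchmark, indifferent_scorer)
```

A small p-value for the memorizing scorer and a p-value of 1 for the indifferent scorer mirror the intuition in the abstract: only a model that prefers the canonical ordering is flagged.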

arXiv:2210.13455 [pdf, other]

E-MCTS: Deep Exploration in Model-Based Reinforcement Learning by Planning with Epistemic Uncertainty

Authors: Yaniv Oren, Matthijs T. J. Spaan, Wendelin Böhmer

Abstract: One of the most well-studied and highly performing planning approaches used in Model-Based Reinforcement Learning (MBRL) is Monte-Carlo Tree Search (MCTS). Key challenges of MCTS-based MBRL methods remain dedicated deep exploration and reliability in the face of the unknown, and both challenges can be alleviated through principled epistemic uncertainty estimation in the predictions of MCTS. We present two main contributions: First, we develop methodology to propagate epistemic uncertainty in MCTS, enabling agents to estimate the epistemic uncertainty in their predictions. Second, we utilize the propagated uncertainty for a novel deep exploration algorithm by explicitly planning to explore. We incorporate our approach into variations of MCTS-based MBRL approaches with learned and provided dynamics models, and empirically show deep exploration through successful epistemic uncertainty estimation achieved by our approach. We compare to a non-planning-based deep-exploration baseline, and demonstrate that planning with epistemic MCTS significantly outperforms non-planning based exploration in the investigated deep exploration benchmark.

Submitted 30 August, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

Comments: Submitted to NeurIPS 2023, accepted to EWRL 2023

arXiv:2201.09956 [pdf, other]

doi 10.14722/ndss.2022.24093

DRAWNAPART: A Device Identification Technique based on Remote GPU Fingerprinting

Authors: Tomer Laor, Naif Mehanna, Antonin Durey, Vitaly Dyadyuk, Pierre Laperdrix, Clémentine Maurice, Yossi Oren, Romain Rouvoy, Walter Rudametkin, Yuval Yarom

Abstract: Browser fingerprinting aims to identify users or their devices, through scripts that execute in the users' browser and collect information on software or hardware characteristics. It is used to track users or as an additional means of identification to improve security. In this paper, we report on a new technique that can significantly extend the tracking time of fingerprint-based tracking methods. Our technique, which we call DrawnApart, is a new GPU fingerprinting technique that identifies a device based on the unique properties of its GPU stack. Specifically, we show that variations in speed among the multiple execution units that comprise a GPU can serve as a reliable and robust device signature, which can be collected using unprivileged JavaScript. We investigate the accuracy of DrawnApart under two scenarios. In the first scenario, our controlled experiments confirm that the technique is effective in distinguishing devices with similar hardware and software configurations, even when they are considered identical by current state-of-the-art fingerprinting algorithms. In the second scenario, we integrate a one-shot learning version of our technique into a state-of-the-art browser fingerprint tracking algorithm. We verify our technique through a large-scale experiment involving data collected from over 2,500 crowd-sourced devices over a period of several months and show it provides a boost of up to 67% to the median tracking duration, compared to the state-of-the-art method.
DrawnApart makes two contributions to the state of the art in browser fingerprinting. On the conceptual front, it is the first work that explores the manufacturing differences between identical GPUs and the first to exploit these differences in a privacy context. On the practical front, it demonstrates a robust technique for distinguishing between machines with identical hardware and software configurations.

Submitted 24 January, 2022; originally announced January 2022.

Comments: Network and Distributed System Security Symposium, Feb 2022, San Diego, United States

arXiv:2103.04952 [pdf, other]

Prime+Probe 1, JavaScript 0: Overcoming Browser-based Side-Channel Defenses

Authors: Anatoly Shusterman, Ayush Agarwal, Sioli O'Connell, Daniel Genkin, Yossi Oren, Yuval Yarom

Abstract: The "eternal war in cache" has reached browsers, with multiple cache-based side-channel attacks and countermeasures being suggested. A common approach for countermeasures is to disable or restrict JavaScript features deemed essential for carrying out attacks. To assess the effectiveness of this approach, in this work we seek to identify those JavaScript features which are essential for carrying out a cache-based attack. We develop a sequence of attacks with progressively decreasing dependency on JavaScript features, culminating in the first browser-based side-channel attack which is constructed entirely from Cascading Style Sheets (CSS) and HTML, and works even when script execution is completely blocked. We then show that avoiding JavaScript features makes our techniques architecturally agnostic, resulting in microarchitectural website fingerprinting attacks that work across hardware platforms including Intel Core, AMD Ryzen, Samsung Exynos, and Apple M1 architectures. As a final contribution, we evaluate our techniques in hardened browser environments including the Tor browser, Deter-Fox (Cao et al., CCS 2017), and Chrome Zero (Schwartz et al., NDSS 2018). We confirm that none of these approaches completely defend against our attacks. We further argue that the protections of Chrome Zero need to be more comprehensively applied, and that the performance and user experience of Chrome Zero will be severely degraded if this approach is taken.

Submitted 8 March, 2021; originally announced March 2021.

arXiv:1909.02060 [pdf, other]

Distributionally Robust Language Modeling

Authors: Yonatan Oren, Shiori Sagawa, Tatsunori B. Hashimoto, Percy Liang

Abstract: Language models are generally trained on data spanning a wide range of topics (e.g., news, reviews, fiction), but they might be applied to an a priori unknown target distribution (e.g., restaurant reviews). In this paper, we first show that training on text outside the test distribution can degrade test performance when using standard maximum likelihood (MLE) training. To remedy this without the knowledge of the test distribution, we propose an approach which trains a model that performs well over a wide range of potential test distributions. In particular, we derive a new distributionally robust optimization (DRO) procedure which minimizes the loss of the model over the worst-case mixture of topics with sufficient overlap with the training distribution. Our approach, called topic conditional value at risk (topic CVaR), obtains a 5.5 point perplexity reduction over MLE when the language models are trained on a mixture of Yelp reviews and news and tested only on reviews.

Submitted 4 September, 2019; originally announced September 2019.

Comments: Camera ready version for EMNLP
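
The "worst-case mixture of topics" in the abstract above can be made concrete with the standard conditional-value-at-risk form. The sketch below assumes, as an illustration rather than as the paper's stated definition, that "sufficient overlap" means each topic weight q_z is at most p_z / alpha for training topic proportions p_z and an overlap level alpha in (0, 1]. The function name and the example numbers are hypothetical.

```python
from typing import Sequence


def worst_case_topic_loss(
    topic_probs: Sequence[float],
    topic_losses: Sequence[float],
    alpha: float,
) -> float:
    """Maximize sum_z q_z * loss_z subject to q_z <= p_z / alpha and sum_z q_z = 1.

    Assumed constraint set (standard CVaR form); the maximizer greedily puts
    as much weight as allowed on the highest-loss topics first.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    # Visit topics from highest to lowest loss.
    order = sorted(range(len(topic_losses)), key=lambda z: topic_losses[z], reverse=True)
    remaining_mass = 1.0
    worst_loss = 0.0
    for z in order:
        mass = min(topic_probs[z] / alpha, remaining_mass)
        worst_loss += mass * topic_losses[z]
        remaining_mass -= mass
        if remaining_mass <= 0.0:
            break
    return worst_loss


# Two equally frequent topics with per-topic losses 1.0 and 3.0 (made-up numbers).
average_case = worst_case_topic_loss([0.5, 0.5], [1.0, 3.0], alpha=1.0)
robust_case = worst_case_topic_loss([0.5, 0.5], [1.0, 3.0], alpha=0.5)
```

With alpha = 1 the objective reduces to the ordinary average loss; smaller alpha lets the adversarial mixture concentrate on the hardest topics, which is the quantity a robust training procedure would minimize.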

arXiv:1908.02524 [pdf, other]

Cross-Router Covert Channels

Authors: Adar Ovadya, Rom Ogen, Yakov Mallah, Niv Gilboa, Yossi Oren

Abstract: Many organizations protect secure networked devices from non-secure networked devices by assigning each class of devices to a different logical network. These two logical networks, commonly called the host network and the guest network, use the same router hardware, which is designed to isolate the two networks in software. In this work we show that logical network isolation based on host and guest networks can be overcome by the use of cross-router covert channels. Using specially-crafted network traffic, these channels make it possible to leak data between the host network and the guest network, and vice versa, through the use of the router as a shared medium. We performed a survey of routers representing multiple vendors and price points, and discovered that all of the routers we surveyed are vulnerable to at least one class of covert channel. Our attack can succeed even if the attacker has very limited permissions on the infected device, and even an iframe hosting malicious JavaScript code can be used for this purpose. We provide several metrics for the effectiveness of such channels, based on their pervasiveness, rate and covertness, and discuss possible ways of identifying and preventing these leakages.

Submitted 7 August, 2019; originally announced August 2019.

Comments: Presented at WOOT 2019 - https://orenlab.sise.bgu.ac.il/p/CrossRouter

arXiv:1905.04691 [pdf, other]

Sensor Defense In-Software (SDI): Practical Software Based Detection of Spoofing Attacks on Position Sensor

Authors: Kevin Sam Tharayil, Benyamin Farshteindiker, Shaked Eyal, Nir Hasidim, Roy Hershkovitz, Shani Houri, Ilia Yoffe, Michal Oren, Yossi Oren

Abstract: Position sensors, such as the gyroscope, the magnetometer and the accelerometer, are found in a staggering variety of devices, from smartphones and UAVs to autonomous robots. Several works have shown how adversaries can mount spoofing attacks to remotely corrupt or even completely control the outputs of these sensors. With more and more critical applications relying on sensor readings to make important decisions, defending sensors from these attacks is of prime importance. In this work we present practical software based defenses against attacks on two common types of position sensors, specifically the gyroscope and the magnetometer. We first characterize the sensitivity of these sensors to acoustic and magnetic adversaries. Next, we present two software-only defenses: a machine learning based single sensor defense, and a sensor fusion defense which makes use of the mathematical relationship between the two sensors. We performed a detailed theoretical analysis of our defenses, and implemented them on a variety of smartphones, as well as on a resource-constrained IoT sensor node. Our defenses do not require any hardware or OS-level modifications, making it possible to use them with existing hardware. Moreover, they provide a high detection accuracy, a short detection time and a reasonable power consumption.

Submitted 12 May, 2019; originally announced May 2019.

ACM Class: B.8.1; K.6.5

arXiv:1812.01194 [pdf, other]

A Retrieve-and-Edit Framework for Predicting Structured Outputs

Authors: Tatsunori B. Hashimoto, Kelvin Guu, Yonatan Oren, Percy Liang

Abstract: For the task of generating complex outputs such as source code, editing existing outputs can be easier than generating complex outputs from scratch. With this motivation, we propose an approach that first retrieves a training example based on the input (e.g., natural language description) and then edits it to the desired output (e.g., code). Our contribution is a computationally efficient method for learning a retrieval model that embeds the input in a task-dependent way without relying on a hand-crafted metric or incurring the expense of jointly training the retriever with the editor. Our retrieve-and-edit framework can be applied on top of any base model. We show that on a new autocomplete task for GitHub Python code and the Hearthstone cards benchmark, retrieve-and-edit significantly boosts the performance of a vanilla sequence-to-sequence model on both tasks.

Submitted 3 December, 2018; originally announced December 2018.

Comments: To appear, NeurIPS 2018

arXiv:1811.07153 [pdf, other]

Robust Website Fingerprinting Through the Cache Occupancy Channel

Authors: Anatoly Shusterman, Lachlan Kang, Yarden Haskal, Yosef Meltser, Prateek Mittal, Yossi Oren, Yuval Yarom

Abstract: Website fingerprinting attacks, which use statistical analysis on network traffic to compromise user privacy, have been shown to be effective even if the traffic is sent over anonymity-preserving networks such as Tor. The classical attack model used to evaluate website fingerprinting attacks assumes an on-path adversary, who can observe all traffic traveling between the user's computer and the Tor network. In this work we investigate these attacks under a different attack model, in which the adversary is capable of running a small amount of unprivileged code on the target user's computer. Under this model, the attacker can mount cache side-channel attacks, which exploit the effects of contention on the CPU's cache, to identify the website being browsed. In an important special case of this attack model, a JavaScript attack is launched when the target user visits a website controlled by the attacker. The effectiveness of this attack scenario has never been systematically analyzed, especially in the open-world model which assumes that the user is visiting a mix of both sensitive and non-sensitive sites. In this work we show that cache website fingerprinting attacks in JavaScript are highly feasible, even when they are run from highly restrictive environments, such as the Tor Browser. Specifically, we use machine learning techniques to classify traces of cache activity. Unlike prior works, which try to identify cache conflicts, our work measures the overall occupancy of the last-level cache.
We show that our approach achieves high classification accuracy in both the open-world and the closed-world models. We further show that our techniques are resilient both to network-based defenses and to side-channel countermeasures introduced to modern browsers as a response to the Spectre attack.

Submitted 21 February, 2019; v1 submitted 17 November, 2018; originally announced November 2018.

arXiv:1805.04850 [pdf, other]

Shattered Trust: When Replacement Smartphone Components Attack

Authors: Omer Shwartz, Amir Cohen, Asaf Shabtai, Yossi Oren

Abstract: Phone touchscreens, and other similar hardware components such as orientation sensors, wireless charging controllers, and NFC readers, are often produced by third-party manufacturers and not by the phone vendors themselves. Third-party driver source code to support these components is integrated into the vendor's source code. In contrast to 'pluggable' drivers, such as USB or network drivers, the component driver's source code implicitly assumes that the component hardware is authentic and trustworthy. As a result of this trust, very few integrity checks are performed on the communications between the component and the device's main processor. In this paper, we call this trust into question, considering the fact that touchscreens are often shattered and then replaced with aftermarket components of questionable origin. We analyze the operation of a commonly used touchscreen controller. We construct two standalone attacks, based on malicious touchscreen hardware, that function as building blocks toward a full attack: a series of touch injection attacks that allow the touchscreen to impersonate the user and exfiltrate data, and a buffer overflow attack that lets the attacker execute privileged operations. Combining the two building blocks, we present and evaluate a series of end-to-end attacks that can severely compromise a stock Android phone with standard firmware. Our results make the case for a hardware-based physical countermeasure.

Submitted 13 May, 2018; originally announced May 2018.

Comments: Presented at WOOT '17, the 11th USENIX Workshop on Offensive Technologies (WOOT 17), 2017

arXiv:1709.08878 [pdf, other]

Generating Sentences by Editing Prototypes

Authors: Kelvin Guu, Tatsunori B. Hashimoto, Yonatan Oren, Percy Liang

Abstract: We propose a new generative model of sentences that first samples a prototype sentence from the training corpus and then edits it into a new sentence. Compared to traditional models that generate from scratch either left-to-right or by first sampling a latent sentence vector, our prototype-then-edit model improves perplexity on language modeling and generates higher quality outputs according to human evaluation. Furthermore, the model gives rise to a latent edit vector that captures interpretable semantics such as sentence similarity and sentence-level analogies.

Submitted 7 September, 2018; v1 submitted 26 September, 2017; originally announced September 2017.

Comments: 14 pages, Transactions of the Association for Computational Linguistics (TACL), 2018

arXiv:1502.07373 [pdf, other]

The Spy in the Sandbox -- Practical Cache Attacks in Javascript

Authors: Yossef Oren, Vasileios P. Kemerlis, Simha Sethumadhavan, Angelos D. Keromytis

Abstract: We present the first micro-architectural side-channel attack which runs entirely in the browser. In contrast to other works in this genre, this attack does not require the attacker to install any software on the victim's machine -- to facilitate the attack, the victim needs only to browse to an untrusted webpage with attacker-controlled content. This makes the attack model highly scalable and extremely relevant and practical to today's web, especially since most desktop browsers currently accessing the Internet are vulnerable to this attack. Our attack, which is an extension of the last-level cache attacks of Yarom et al., allows a remote adversary to recover information belonging to other processes, other users and even other virtual machines running on the same physical host as the victim web browser. We describe the fundamentals behind our attack, evaluate its performance using a high bandwidth covert channel and finally use it to construct a system-wide mouse/network activity logger. Defending against this attack is possible, but the required countermeasures can exact an impractical cost on other benign uses of the web browser and of the computer.

Submitted 1 March, 2015; v1 submitted 25 February, 2015; originally announced February 2015.

Showing 1–13 of 13 results for author: Oren, Y