API changes for Aggregator-side differential privacy #607

tholop · 2023-06-16T18:04:29Z

As discussed yesterday, this PR makes some API changes required to add DP on aggregated shares (aka Server-DP, following the terminology from DPrio (https://petsymposium.org/popets/2023/popets-2023-0086.php). Once we settle on an API we can tackle DP implementations for specific VDAFs, such as #578.

cc/ @cjpatton

src/vdaf.rs

src/dp.rs

src/vdaf.rs

src/dp.rs

src/vdaf.rs

src/dp.rs

divergentdave · 2023-06-20T21:59:35Z

src/vdaf/prio2.rs

@@ -21,7 +21,6 @@ use std::{
    convert::{TryFrom, TryInto},
    io::Cursor,
 };
-


nit: please back out the unrelated whitespace changes in this directory

cjpatton

nit: Block comments should break lines at 100 characters, for consistency.

Cargo.toml

src/dp.rs

src/vdaf.rs

MxmUrw · 2023-06-21T22:27:13Z

We have a remark regarding the name DifferentialPrivacyBudget: It seems that in the literature the term "privacy budget" is used in the context of iteration of privacy mechanisms (to track the privacy which was already spent), while simply "privacy parameter" is used to refer to the $(\epsilon,\delta)$ or $\rho$ parameters occurring in the definitions of the various notions of privacy, such as zCDP, etc. Compare with various papers such as the definition of CDP, the analysis of the discrete gaussian, the "deep learning with dp" paper.

Usually, the privacy budget is calculated by summing up the privacy parameters of each round, which makes the distinction somewhat vacuous - the privacy parameter is effectively just the privacy budget of a single differentially private mechanism. We would be happy to leave the code as is, but wanted to give a heads-up regarding this. Of course, I assume there might be communities where using the term "budget" is more customary?

MxmUrw · 2023-06-21T22:57:54Z

Thinking further about the discussed direction for this PR, we concluded the following:

It would be useful to separate the choice of noising mechanism (gaussian, laplace) from the choice of the parametrization, i.e., "notion of privacy" (e.g., epsilon-delta, zCDP)
It should be possible to implement dp-noising for VDAFs generically over a tuple of (noising mechanism, parameter), this would require a function new_with_param(param : Parameter, sensitivity : Rational) -> Distribution for each supported (distribution, parameter) pair.

As a sketch, this might look as follows:

// We need a trait which inherits from `rand::Distribution`
trait ParametrizedDistribution<DPBudget : DifferentialPrivacyBudget> : Distribution<BigInt>
{
    fn new_with_budget(budget: DPBudget, sensitivity: Rational) -> Self;
}

And the Aggregator trait would take it as a further parameter:

pub trait AggregatorWithNoise<DPBudget, PDistribution>: Aggregator
where
    DPBudget: DifferentialPrivacyBudget,
    PDistribution: ParametrizedDistribution<DPBudget>,
{
    /// Adds noise to an aggregate share such that the aggregate result
    /// is `DPBudget`-differentially private as long as one Aggregator is honest.
    fn add_noise_to_agg_share(
        &self,
        budget: &DPBudget,
        agg_param: &Self::AggregationParam,
        agg_share: &mut Self::AggregateShare,
        num_measurements: usize,
    ) -> Result<(), VdafError>;
}

Then implementations of add_noise_to_agg_share can use new_with_budget to create a distribution and sample from it.

tholop · 2023-06-22T14:56:24Z

I like this proposal @MxmUrw. One concern I have is that your sketch hardcodes the type of the distribution to be BigInt. That can be problematic if some VDAFs prefer distributions that sample in different sets, e.g. a truncated discrete Gaussian sampling directly a u128. So I added a generic T in my implementation. Let me know if that looks good!

Another note: DifferentialPrivacyDistribution will sample individual elements (because it inherits from Distribution), so the constructor only specifies DP guarantees for scalar-valued queries. For vector-valued queries, it will be up to add_noise_to_agg_share to compute the right sensitivity for the scalar distribution, and sample each element i.i.d. I don't think it will be hard (e.g. with Theorem 14 of https://arxiv.org/pdf/2004.00010.pdf) but I wanted to keep track of this.

Regarding the name: I tend to use "privacy budget" more often than "privacy parameters" (as do some practitioners like this survey, the Programming DP book or even theory papers like the RDP definition). Both terms seem pretty interchangeable. I just like the concrete feel of "budget" compared to generic "parameters" that could represent many things for the non-expert. I agree that "privacy parameters" might be more widespread though, so I don't have strong feelings.

Edit: here is how an implementation could look like

impl DifferentialPrivacyDistribution<ZeroConcentratedDifferentialPrivacyBudget, BigInt>
    for DiscreteGaussian
{
    /// Uses Theorem 4 from [[CKS20]]
    ///
    /// [CKS20]: https://arxiv.org/pdf/2004.00010.pdf
    fn from(
        privacy_budget: &ZeroConcentratedDifferentialPrivacyBudget,
        sensitivity: Rational,
    ) -> Self {
        let sigma = Rational {
            numerator: privacy_budget.epsilon.denominator * sensitivity.numerator,
            denominator: privacy_budget.epsilon.numerator * sensitivity.denominator,
        };
        Self { sigma }
    }
}

tholop · 2023-06-26T13:57:36Z

@MxmUrw @ooovi @divergentdave do you have more changes to suggest? If not, I can rebase and squash so you can merge.

MxmUrw · 2023-06-26T14:37:39Z

Looks good to us!

I think parametrizing over T is a good choice, and I do agree with the fact that the name "budget" gives a better idea of the purpose of this trait, so I am in favor of keeping it as is.

divergentdave · 2023-06-26T16:05:33Z

I would prefer that we take the correspondence from DP budgets to DP noise distributions out of the type system.

As a motivating thought experiment, if noise needs to be non-identically distributed for some future combination of DP budget and VDAF, then we could still write all the code needed to do so by ignoring DifferentialPrivacyDistribution::from() and instead constructing multiple distribution instances directly. Moreover, the code in AggregatorWithNoise impl blocks can always do this, so I think the DifferentialPrivacyDistribution trait adds unnecessary complexity at the type system level.

With the current trait definitions, we would need to specify AggregatorWithNoise's DPDistribution and T type parameters wherever it's used, because they cannot be inferred from the arguments or other type parameters. Additionally, I worry that DPDistribution: DifferentialPrivacyDistribution<DPBudget, T> would become another "infectious" trait bound that has to be copied to all generic code. (this has been an ergonomic issue for libprio in the past)

So long as our conception of the DifferentialPrivacyBudget trait encompasses a choice of noise distribution, I think we could eliminate DifferentialPrivacyDistribution, and leave it up to the impl AggregatorWithNoise<..., MyDPBudget> blocks to choose and parameterize the correct distributions.

MxmUrw · 2023-06-26T16:37:39Z

If I understand correctly, this means that it would become unpractical to implement VDAFs generically over the choice of distribution. That is, the VDAF would hard-code the choice of distribution in impl AggregatorWithNoise<..., MyDPBudget> for a given MyDPBudget. (I think coupling choice of distribution to choice of budget type would be confusing, because usually these choices are quite orthogonal, as far as I know - @tholop @ooovi ?)

OTOH, it may not be necessary for the AggregatorWithNoise trait to be as generic as proposed. For instance, we only ever intend to use the discrete gaussian for our fixedvec type. Are there even cases where the same VDAF is intended to work with multiple distributions?

EDIT: I am sympathetic with the rest of the argument, I think this is just a matter of figuring out the proper parametrization. I do agree that add_noise_to_agg_share could work just as well without the ::from() method. But I do think that choosing budget type and choosing distribution are two separate concerns. We can even combine it into the same trait, but then it should reflect this by being called sth along the lines of DPBudgetAndDistribution and we would then have for example EpsilonDeltaDiscreteGaussian and zCDPDiscreteGaussian...

tholop · 2023-06-26T18:39:23Z

Are there even cases where the same VDAF is intended to work with multiple distributions?

Yes. For instance, Prio3Count or Prio3Sum make sense with both Laplace and Gaussian noise, and the end user should be able to decide depending on their requirements (Pure DP, composition, type of error, etc). We could still hardcode the distribution, e.g. with Prio3CountGaussian and Prio3CountLaplace. That would leave room for a hypothetical future VDAF using non i.i.d noise, but I don't know if it's desirable according to @divergentdave:

So long as our conception of the DifferentialPrivacyBudget trait encompasses a choice of noise distribution, I think we could eliminate DifferentialPrivacyDistribution, and leave it up to the impl AggregatorWithNoise<..., MyDPBudget> blocks to choose and parameterize the correct distributions.

Let me try to summarize some desired properties (they might not be all required, actually 1 and 3 seem incompatible):

Pass something that encapsulates a noise distribution and the corresponding budget to AggregatorWithNoise
Don't pollute the trait system with too many trait bounds
Allow some VDAFs to combine different distributions
Keep some separation between the type of distribution (Gaussian/Laplace) and the budget (zCDP with epsilon=2, EpsilonDeltaDP with epsilon, delta = 1, 0.1)

And I think that we have two solutions:

@MxmUrw's proposal of DPBudgetAndDistribution satisfies 1, 2 and 4. I like that, it might actually make our life easier when we plug VDAF into DAP (e.g. we need codepoints to pass the type of distribution and the budget). Daphne has such a DpConfig type: https://github.com/cloudflare/daphne/blob/edda510353709496427e52f07ebbd29dd7302531/daphne/src/messages/taskprov.rs#L117.
I don't know any VDAF that would need 3 at the moment, but if we want to support that then I think that we should hardcode the distribution with something like Prio3CountGaussian (in which case we lose 1). This solution sounds pretty good to me too (we just have two types of distributions, so hardcoding them is not too complicated).

divergentdave · 2023-06-26T18:57:56Z

The DPBudgetAndDistribution/DpConfig approach sounds good to me. (one other candidate name: DifferentialPrivacyStrategy?)

Regarding non-i.i.d. noise, I was only considering sampling noise from multiple distributions of the same kind, i.e. two different discrete Gaussian distributions for a VDAF that computes both a sum and a sum of squares. (not combining discrete Gaussian and Laplace) Thus, a DPBudgetAndDistribution argument should be expressive enough for such cases.

tholop · 2023-06-26T20:18:07Z

I was only considering sampling noise from multiple distributions of the same kind, i.e. two different discrete Gaussian distributions for a VDAF that computes both a sum and a sum of squares.

Ah indeed, this would be a realistic VDAF, and it's unclear if we can build it with DifferentialPrivacyDistribution::from() by playing with the sensitivity parameter (for a fixed budget). Instead, we can optionally provide a conversion function similar to from directly in the relevant DPBudgetAndDistribution/DPConfig object. I added such an example in ZCdpDiscreteGaussian::create_distribution().

The new implementation should be extensible enough to support non i.i.d VDAFs or other changes, and does static dispatch. The drawback is that we might have to implement multiple combinations of accounting methods & distributions, as @MxmUrw pointed out. An alternative would be to pass a (distribution_type, budget) tuple and determine the compatibility at runtime, as suggested in #607 (comment).

divergentdave · 2023-06-26T20:20:50Z

src/dp.rs

+    ///
+    /// [CKS20]: https://arxiv.org/pdf/2004.00010.pdf
+    pub fn create_distribution(
+        budget: ZeroConcentratedDifferentialPrivacyBudget,


nit: this could take a reference instead

Suggested change

budget: ZeroConcentratedDifferentialPrivacyBudget,

budget: &ZeroConcentratedDifferentialPrivacyBudget,

tholop · 2023-06-29T17:28:55Z

@divergentdave @tgeoghegan is there anything else I can do before merging? Thank you!

src/dp.rs

tgeoghegan · 2023-06-29T21:14:32Z

src/dp.rs

+    numerator: u32,
+    denominator: u32,


If these aren't pub then there's no way to construct a Rational from outside the crate. Additionally, if there are any restrictions on what a valid Rational is (for instance, is it OK for numerator > denominator?) then we should provide a constructor that enforces that.

Indeed. I actually had a constructor but we removed it since it wasn't used. We don't have any constraint on the fields, so I'll just stick to public fields.

src/dp.rs

src/vdaf.rs

tgeoghegan

Looks good to me!

- AggregatorWithNoise trait with a method to add noise - DifferentialPrivacyStrategy trait to represent budgets and distributions - Partial example with DiscreteGaussian and ZeroConcentratedDifferentialPrivacy

cjpatton self-requested a review June 16, 2023 18:23

divergentdave reviewed Jun 16, 2023

View reviewed changes

src/vdaf.rs Show resolved Hide resolved

tholop marked this pull request as ready for review June 16, 2023 18:51

tholop requested a review from a team as a code owner June 16, 2023 18:51

cjpatton reviewed Jun 16, 2023

View reviewed changes

tgeoghegan reviewed Jun 16, 2023

View reviewed changes

divergentdave reviewed Jun 16, 2023

View reviewed changes

src/dp.rs Outdated Show resolved Hide resolved

ooovi reviewed Jun 17, 2023

View reviewed changes

src/dp.rs Outdated Show resolved Hide resolved

cjpatton reviewed Jun 19, 2023

View reviewed changes

src/dp.rs Outdated Show resolved Hide resolved

src/dp.rs Show resolved Hide resolved

src/dp.rs Show resolved Hide resolved

tholop requested a review from divergentdave June 20, 2023 21:57

divergentdave reviewed Jun 20, 2023

View reviewed changes

divergentdave approved these changes Jun 20, 2023

View reviewed changes

tholop force-pushed the server_dp_api branch from 70acd36 to 01787ea Compare June 20, 2023 22:47

cjpatton reviewed Jun 20, 2023

View reviewed changes

tholop force-pushed the server_dp_api branch 3 times, most recently from 8ba9d5e to f16ef97 Compare June 21, 2023 22:07

cjpatton approved these changes Jun 21, 2023

View reviewed changes

divergentdave mentioned this pull request Jun 22, 2023

IDPF: Replace CoinToss trait #614

Closed

divergentdave approved these changes Jun 26, 2023

View reviewed changes

tholop force-pushed the server_dp_api branch from fc47842 to f6b1b59 Compare June 26, 2023 20:50

tgeoghegan reviewed Jun 29, 2023

View reviewed changes

tgeoghegan approved these changes Jun 29, 2023

View reviewed changes

Add experimental API for differential privacy

c7c65d7

- AggregatorWithNoise trait with a method to add noise - DifferentialPrivacyStrategy trait to represent budgets and distributions - Partial example with DiscreteGaussian and ZeroConcentratedDifferentialPrivacy

tholop force-pushed the server_dp_api branch from 9419284 to c7c65d7 Compare June 29, 2023 22:44

divergentdave merged commit 9fd1874 into divviup:main Jun 30, 2023
6 checks passed

MxmUrw mentioned this pull request Jul 5, 2023

adding differential privacy to FixedPointBoundedL2VecSum #578

Merged

divergentdave mentioned this pull request Jul 13, 2023

Migrate from SeedStream to RngCore #644

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

API changes for Aggregator-side differential privacy #607

API changes for Aggregator-side differential privacy #607

tholop commented Jun 16, 2023

divergentdave Jun 20, 2023

cjpatton left a comment

MxmUrw commented Jun 21, 2023 •

edited

Loading

MxmUrw commented Jun 21, 2023 •

edited

Loading

tholop commented Jun 22, 2023 •

edited

Loading

tholop commented Jun 26, 2023

MxmUrw commented Jun 26, 2023

divergentdave commented Jun 26, 2023

MxmUrw commented Jun 26, 2023 •

edited

Loading

tholop commented Jun 26, 2023 •

edited

Loading

divergentdave commented Jun 26, 2023

tholop commented Jun 26, 2023

divergentdave Jun 26, 2023

tholop Jun 26, 2023

tholop commented Jun 29, 2023

tgeoghegan Jun 29, 2023

tholop Jun 29, 2023

tgeoghegan left a comment

	budget: ZeroConcentratedDifferentialPrivacyBudget,
	budget: &ZeroConcentratedDifferentialPrivacyBudget,

API changes for Aggregator-side differential privacy #607

API changes for Aggregator-side differential privacy #607

Conversation

tholop commented Jun 16, 2023

divergentdave Jun 20, 2023

Choose a reason for hiding this comment

cjpatton left a comment

Choose a reason for hiding this comment

MxmUrw commented Jun 21, 2023 • edited Loading

MxmUrw commented Jun 21, 2023 • edited Loading

tholop commented Jun 22, 2023 • edited Loading

tholop commented Jun 26, 2023

MxmUrw commented Jun 26, 2023

divergentdave commented Jun 26, 2023

MxmUrw commented Jun 26, 2023 • edited Loading

tholop commented Jun 26, 2023 • edited Loading

divergentdave commented Jun 26, 2023

tholop commented Jun 26, 2023

divergentdave Jun 26, 2023

Choose a reason for hiding this comment

tholop Jun 26, 2023

Choose a reason for hiding this comment

tholop commented Jun 29, 2023

tgeoghegan Jun 29, 2023

Choose a reason for hiding this comment

tholop Jun 29, 2023

Choose a reason for hiding this comment

tgeoghegan left a comment

Choose a reason for hiding this comment

MxmUrw commented Jun 21, 2023 •

edited

Loading

MxmUrw commented Jun 21, 2023 •

edited

Loading

tholop commented Jun 22, 2023 •

edited

Loading

MxmUrw commented Jun 26, 2023 •

edited

Loading

tholop commented Jun 26, 2023 •

edited

Loading