Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API changes for Aggregator-side differential privacy #607

Merged
merged 1 commit into from
Jun 30, 2023

Conversation

tholop
Copy link
Contributor

@tholop tholop commented Jun 16, 2023

As discussed yesterday, this PR makes some API changes required to add DP on aggregated shares (aka Server-DP, following the terminology from DPrio (https://petsymposium.org/popets/2023/popets-2023-0086.php). Once we settle on an API we can tackle DP implementations for specific VDAFs, such as #578.

cc/ @cjpatton

@cjpatton cjpatton self-requested a review June 16, 2023 18:23
@tholop tholop marked this pull request as ready for review June 16, 2023 18:51
@tholop tholop requested a review from a team as a code owner June 16, 2023 18:51
src/dp.rs Outdated Show resolved Hide resolved
src/dp.rs Outdated Show resolved Hide resolved
src/dp.rs Show resolved Hide resolved
src/dp.rs Outdated Show resolved Hide resolved
src/dp.rs Outdated Show resolved Hide resolved
src/dp.rs Outdated Show resolved Hide resolved
src/dp.rs Outdated Show resolved Hide resolved
src/dp.rs Outdated Show resolved Hide resolved
src/dp.rs Outdated Show resolved Hide resolved
src/vdaf.rs Outdated Show resolved Hide resolved
src/dp.rs Outdated Show resolved Hide resolved
src/dp.rs Outdated Show resolved Hide resolved
src/dp.rs Outdated Show resolved Hide resolved
src/dp.rs Outdated Show resolved Hide resolved
src/vdaf.rs Outdated Show resolved Hide resolved
src/vdaf.rs Outdated Show resolved Hide resolved
src/dp.rs Outdated Show resolved Hide resolved
src/dp.rs Outdated Show resolved Hide resolved
src/dp.rs Outdated Show resolved Hide resolved
src/dp.rs Outdated Show resolved Hide resolved
src/dp.rs Show resolved Hide resolved
src/dp.rs Show resolved Hide resolved
@@ -21,7 +21,6 @@ use std::{
convert::{TryFrom, TryInto},
io::Cursor,
};

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: please back out the unrelated whitespace changes in this directory

Copy link
Collaborator

@cjpatton cjpatton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • nit: Block comments should break lines at 100 characters, for consistency.

Cargo.toml Outdated Show resolved Hide resolved
src/dp.rs Outdated Show resolved Hide resolved
src/dp.rs Outdated Show resolved Hide resolved
src/dp.rs Outdated Show resolved Hide resolved
src/dp.rs Outdated Show resolved Hide resolved
src/dp.rs Show resolved Hide resolved
src/dp.rs Outdated Show resolved Hide resolved
src/dp.rs Outdated Show resolved Hide resolved
src/vdaf.rs Outdated Show resolved Hide resolved
src/vdaf.rs Outdated Show resolved Hide resolved
@tholop tholop force-pushed the server_dp_api branch 3 times, most recently from 8ba9d5e to f16ef97 Compare June 21, 2023 22:07
@MxmUrw
Copy link
Contributor

MxmUrw commented Jun 21, 2023

We have a remark regarding the name DifferentialPrivacyBudget: It seems that in the literature the term "privacy budget" is used in the context of iteration of privacy mechanisms (to track the privacy which was already spent), while simply "privacy parameter" is used to refer to the $(\epsilon,\delta)$ or $\rho$ parameters occurring in the definitions of the various notions of privacy, such as zCDP, etc. Compare with various papers such as the definition of CDP, the analysis of the discrete gaussian, the "deep learning with dp" paper.

Usually, the privacy budget is calculated by summing up the privacy parameters of each round, which makes the distinction somewhat vacuous - the privacy parameter is effectively just the privacy budget of a single differentially private mechanism. We would be happy to leave the code as is, but wanted to give a heads-up regarding this. Of course, I assume there might be communities where using the term "budget" is more customary?

@MxmUrw
Copy link
Contributor

MxmUrw commented Jun 21, 2023

Thinking further about the discussed direction for this PR, we concluded the following:

  • It would be useful to separate the choice of noising mechanism (gaussian, laplace) from the choice of the parametrization, i.e., "notion of privacy" (e.g., epsilon-delta, zCDP)
  • It should be possible to implement dp-noising for VDAFs generically over a tuple of (noising mechanism, parameter), this would require a function new_with_param(param : Parameter, sensitivity : Rational) -> Distribution for each supported (distribution, parameter) pair.

As a sketch, this might look as follows:

// We need a trait which inherits from `rand::Distribution`
trait ParametrizedDistribution<DPBudget : DifferentialPrivacyBudget> : Distribution<BigInt>
{
    fn new_with_budget(budget: DPBudget, sensitivity: Rational) -> Self;
}

And the Aggregator trait would take it as a further parameter:

pub trait AggregatorWithNoise<DPBudget, PDistribution>: Aggregator
where
    DPBudget: DifferentialPrivacyBudget,
    PDistribution: ParametrizedDistribution<DPBudget>,
{
    /// Adds noise to an aggregate share such that the aggregate result
    /// is `DPBudget`-differentially private as long as one Aggregator is honest.
    fn add_noise_to_agg_share(
        &self,
        budget: &DPBudget,
        agg_param: &Self::AggregationParam,
        agg_share: &mut Self::AggregateShare,
        num_measurements: usize,
    ) -> Result<(), VdafError>;
}

Then implementations of add_noise_to_agg_share can use new_with_budget to create a distribution and sample from it.

@tholop
Copy link
Contributor Author

tholop commented Jun 22, 2023

I like this proposal @MxmUrw. One concern I have is that your sketch hardcodes the type of the distribution to be BigInt. That can be problematic if some VDAFs prefer distributions that sample in different sets, e.g. a truncated discrete Gaussian sampling directly a u128. So I added a generic T in my implementation. Let me know if that looks good!

Another note: DifferentialPrivacyDistribution will sample individual elements (because it inherits from Distribution), so the constructor only specifies DP guarantees for scalar-valued queries. For vector-valued queries, it will be up to add_noise_to_agg_share to compute the right sensitivity for the scalar distribution, and sample each element i.i.d. I don't think it will be hard (e.g. with Theorem 14 of https://arxiv.org/pdf/2004.00010.pdf) but I wanted to keep track of this.

Regarding the name: I tend to use "privacy budget" more often than "privacy parameters" (as do some practitioners like this survey, the Programming DP book or even theory papers like the RDP definition). Both terms seem pretty interchangeable. I just like the concrete feel of "budget" compared to generic "parameters" that could represent many things for the non-expert. I agree that "privacy parameters" might be more widespread though, so I don't have strong feelings.

Edit: here is how an implementation could look like

impl DifferentialPrivacyDistribution<ZeroConcentratedDifferentialPrivacyBudget, BigInt>
    for DiscreteGaussian
{
    /// Uses Theorem 4 from [[CKS20]]
    ///
    /// [CKS20]: https://arxiv.org/pdf/2004.00010.pdf
    fn from(
        privacy_budget: &ZeroConcentratedDifferentialPrivacyBudget,
        sensitivity: Rational,
    ) -> Self {
        let sigma = Rational {
            numerator: privacy_budget.epsilon.denominator * sensitivity.numerator,
            denominator: privacy_budget.epsilon.numerator * sensitivity.denominator,
        };
        Self { sigma }
    }
}

@tholop
Copy link
Contributor Author

tholop commented Jun 26, 2023

@MxmUrw @ooovi @divergentdave do you have more changes to suggest? If not, I can rebase and squash so you can merge.

@MxmUrw
Copy link
Contributor

MxmUrw commented Jun 26, 2023

Looks good to us!

I think parametrizing over T is a good choice, and I do agree with the fact that the name "budget" gives a better idea of the purpose of this trait, so I am in favor of keeping it as is.

@divergentdave
Copy link
Contributor

I would prefer that we take the correspondence from DP budgets to DP noise distributions out of the type system.

As a motivating thought experiment, if noise needs to be non-identically distributed for some future combination of DP budget and VDAF, then we could still write all the code needed to do so by ignoring DifferentialPrivacyDistribution::from() and instead constructing multiple distribution instances directly. Moreover, the code in AggregatorWithNoise impl blocks can always do this, so I think the DifferentialPrivacyDistribution trait adds unnecessary complexity at the type system level.

With the current trait definitions, we would need to specify AggregatorWithNoise's DPDistribution and T type parameters wherever it's used, because they cannot be inferred from the arguments or other type parameters. Additionally, I worry that DPDistribution: DifferentialPrivacyDistribution<DPBudget, T> would become another "infectious" trait bound that has to be copied to all generic code. (this has been an ergonomic issue for libprio in the past)

So long as our conception of the DifferentialPrivacyBudget trait encompasses a choice of noise distribution, I think we could eliminate DifferentialPrivacyDistribution, and leave it up to the impl AggregatorWithNoise<..., MyDPBudget> blocks to choose and parameterize the correct distributions.

@MxmUrw
Copy link
Contributor

MxmUrw commented Jun 26, 2023

If I understand correctly, this means that it would become unpractical to implement VDAFs generically over the choice of distribution. That is, the VDAF would hard-code the choice of distribution in impl AggregatorWithNoise<..., MyDPBudget> for a given MyDPBudget. (I think coupling choice of distribution to choice of budget type would be confusing, because usually these choices are quite orthogonal, as far as I know - @tholop @ooovi ?)

OTOH, it may not be necessary for the AggregatorWithNoise trait to be as generic as proposed. For instance, we only ever intend to use the discrete gaussian for our fixedvec type. Are there even cases where the same VDAF is intended to work with multiple distributions?

EDIT: I am sympathetic with the rest of the argument, I think this is just a matter of figuring out the proper parametrization. I do agree that add_noise_to_agg_share could work just as well without the ::from() method. But I do think that choosing budget type and choosing distribution are two separate concerns. We can even combine it into the same trait, but then it should reflect this by being called sth along the lines of DPBudgetAndDistribution and we would then have for example EpsilonDeltaDiscreteGaussian and zCDPDiscreteGaussian...

@tholop
Copy link
Contributor Author

tholop commented Jun 26, 2023

Are there even cases where the same VDAF is intended to work with multiple distributions?

Yes. For instance, Prio3Count or Prio3Sum make sense with both Laplace and Gaussian noise, and the end user should be able to decide depending on their requirements (Pure DP, composition, type of error, etc). We could still hardcode the distribution, e.g. with Prio3CountGaussian and Prio3CountLaplace. That would leave room for a hypothetical future VDAF using non i.i.d noise, but I don't know if it's desirable according to @divergentdave:

So long as our conception of the DifferentialPrivacyBudget trait encompasses a choice of noise distribution, I think we could eliminate DifferentialPrivacyDistribution, and leave it up to the impl AggregatorWithNoise<..., MyDPBudget> blocks to choose and parameterize the correct distributions.

Let me try to summarize some desired properties (they might not be all required, actually 1 and 3 seem incompatible):

  1. Pass something that encapsulates a noise distribution and the corresponding budget to AggregatorWithNoise
  2. Don't pollute the trait system with too many trait bounds
  3. Allow some VDAFs to combine different distributions
  4. Keep some separation between the type of distribution (Gaussian/Laplace) and the budget (zCDP with epsilon=2, EpsilonDeltaDP with epsilon, delta = 1, 0.1)

And I think that we have two solutions:

  • @MxmUrw's proposal of DPBudgetAndDistribution satisfies 1, 2 and 4. I like that, it might actually make our life easier when we plug VDAF into DAP (e.g. we need codepoints to pass the type of distribution and the budget). Daphne has such a DpConfig type: https://github.com/cloudflare/daphne/blob/edda510353709496427e52f07ebbd29dd7302531/daphne/src/messages/taskprov.rs#L117.
  • I don't know any VDAF that would need 3 at the moment, but if we want to support that then I think that we should hardcode the distribution with something like Prio3CountGaussian (in which case we lose 1). This solution sounds pretty good to me too (we just have two types of distributions, so hardcoding them is not too complicated).

@divergentdave
Copy link
Contributor

The DPBudgetAndDistribution/DpConfig approach sounds good to me. (one other candidate name: DifferentialPrivacyStrategy?)

Regarding non-i.i.d. noise, I was only considering sampling noise from multiple distributions of the same kind, i.e. two different discrete Gaussian distributions for a VDAF that computes both a sum and a sum of squares. (not combining discrete Gaussian and Laplace) Thus, a DPBudgetAndDistribution argument should be expressive enough for such cases.

@tholop
Copy link
Contributor Author

tholop commented Jun 26, 2023

I was only considering sampling noise from multiple distributions of the same kind, i.e. two different discrete Gaussian distributions for a VDAF that computes both a sum and a sum of squares.

Ah indeed, this would be a realistic VDAF, and it's unclear if we can build it with DifferentialPrivacyDistribution::from() by playing with the sensitivity parameter (for a fixed budget). Instead, we can optionally provide a conversion function similar to from directly in the relevant DPBudgetAndDistribution/DPConfig object. I added such an example in ZCdpDiscreteGaussian::create_distribution().

The new implementation should be extensible enough to support non i.i.d VDAFs or other changes, and does static dispatch. The drawback is that we might have to implement multiple combinations of accounting methods & distributions, as @MxmUrw pointed out. An alternative would be to pass a (distribution_type, budget) tuple and determine the compatibility at runtime, as suggested in #607 (comment).

src/dp.rs Outdated
///
/// [CKS20]: https://arxiv.org/pdf/2004.00010.pdf
pub fn create_distribution(
budget: ZeroConcentratedDifferentialPrivacyBudget,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this could take a reference instead

Suggested change
budget: ZeroConcentratedDifferentialPrivacyBudget,
budget: &ZeroConcentratedDifferentialPrivacyBudget,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@tholop
Copy link
Contributor Author

tholop commented Jun 29, 2023

@divergentdave @tgeoghegan is there anything else I can do before merging? Thank you!

src/dp.rs Outdated Show resolved Hide resolved
src/dp.rs Outdated
Comment on lines 11 to 12
numerator: u32,
denominator: u32,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these aren't pub then there's no way to construct a Rational from outside the crate. Additionally, if there are any restrictions on what a valid Rational is (for instance, is it OK for numerator > denominator?) then we should provide a constructor that enforces that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. I actually had a constructor but we removed it since it wasn't used. We don't have any constraint on the fields, so I'll just stick to public fields.

src/dp.rs Outdated Show resolved Hide resolved
src/dp.rs Outdated Show resolved Hide resolved
src/dp.rs Outdated Show resolved Hide resolved
src/vdaf.rs Show resolved Hide resolved
Copy link
Contributor

@tgeoghegan tgeoghegan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

- AggregatorWithNoise trait with a method to add noise
- DifferentialPrivacyStrategy trait to represent budgets and distributions
- Partial example with DiscreteGaussian and ZeroConcentratedDifferentialPrivacy
@divergentdave divergentdave merged commit 9fd1874 into divviup:main Jun 30, 2023
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants