Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documented noisy aggregate functions in new page #22715

Merged
merged 1 commit into from May 15, 2024

Conversation

jonhehir
Copy link
Contributor

@jonhehir jonhehir commented May 10, 2024

Description

Moved documentation for noisy_count_gaussian and related functions from the Aggregate Functions page to a new Noisy Aggregate Functions page. Also added documentation of the SFM sketch type and functions, such as noisy_approx_set_sfm.

Motivation and Context

This is the documentation promised back in #21290.

Impact

Docs only

Test Plan

Docs build and render nicely.

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Add documentation for noisy aggregate functions at :doc:`/functions/noisy`, including :func:`noisy_approx_distinct_sfm` and :func:`noisy_approx_set_sfm` (:pr:`21290`, :pr:`22715`)

@jonhehir jonhehir requested review from steveburnett and a team as code owners May 10, 2024 18:32
@jonhehir jonhehir requested a review from presto-oss May 10, 2024 18:32
@github-actions github-actions bot added the docs label May 10, 2024
Noisy Aggregate Functions
=========================

Noisy aggregate functions are aggregate functions that provide random, noisy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are aggregate functions --> are functions

mechanisms, it is important to note that neither the values returned by
these functions nor the query results that incorporate these functions
are differentially private in general. See `Limitations`_ below for more
details. Users who wish to support a strong privacy guarantee should
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd delete " Users who wish to support a strong privacy guarantee should
discuss with a suitable technical expert first."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to vote to leave this sentence, as we want to be abundantly clear about both the limitations and their practical implications.


.. note::

Unlike :func:`count`, this function will return ``NULL`` when the (true) count of ``x`` is 0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will return --> returns


.. function:: noisy_count_gaussian(x, noise_scale[, random_seed]) -> bigint

Counts the non-``NULL`` values in ``x`` and then adds random Gaussian noise with 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's x? (a column I assume)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I've renamed all these from x to col, which is hopefully more clear.


Distinct counting can be performed using ``noisy_count_gaussian(DISTINCT x, ...)``, or with
:func:`noisy_approx_distinct_sfm`. Generally speaking, :func:`noisy_count_gaussian` will
return more accurate results but at a larger computational cost.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will return __> returns

.. function:: noisy_avg_gaussian(x, noise_scale[, random_seed]) -> double
:noindex:

Calculates the average (arithmetic mean) of all the input values in ``x`` and then adds
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"adds a normally distributed random double value with 0 mean..."


.. note::

Unlike :func:`approx_set`, this function will return ``NULL`` when ``x`` is empty.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will return --> returns

.. note::

Unlike :func:`approx_set`, this function will return ``NULL`` when ``x`` is empty.
If this behavior is undesirable, :func:`coalesce` with :func:`noisy_empty_approx_set_sfm`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sentence fragment


.. note::

Unlike :func:`approx_distinct`, this function will return ``NULL`` when ``x`` is empty.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

returns

Libraries and How to Fix It. In *Proceedings of the 2022 ACM SIGSAC Conference
on Computer and Communications Security* (pp. 471-484).

.. [Hehir2023] Hehir, J., Ting, D., & Cormode, G. (2023). Sketch-Flip-Merge:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've gone ahead and linked all the references.

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, thanks! I had a suggestion I couldn't make inline so I'll mention it here.

Similar to the table of contents that I suggest adding to noisy.rst, aggregate.rst could benefit from a table of contents even more due to its length. Could you add a table of contents to aggregate.rst, the same as I added a comment in noisy.rst about?
Screenshot 2024-05-13 at 1 22 19 PM

If you do so, you could add a heading to aggregate.rst for Noisy Aggregate Functions, and underneath that header link to the new page with "See :doc:noisy. " underneath it, to help the reader know that there are aggregate functions that are not on the Aggregate Functions page.

Screenshot of aggregate.rst in local build with screenshot.

Screenshot of local build for consideration.

presto-docs/src/main/sphinx/language/types.rst Outdated Show resolved Hide resolved
presto-docs/src/main/sphinx/language/types.rst Outdated Show resolved Hide resolved
presto-docs/src/main/sphinx/language/types.rst Outdated Show resolved Hide resolved
presto-docs/src/main/sphinx/functions.rst Outdated Show resolved Hide resolved
presto-docs/src/main/sphinx/functions/noisy.rst Outdated Show resolved Hide resolved
presto-docs/src/main/sphinx/functions/noisy.rst Outdated Show resolved Hide resolved
presto-docs/src/main/sphinx/functions/noisy.rst Outdated Show resolved Hide resolved
Unlike :func:`count`, this function returns ``NULL`` when the (true) count of ``col`` is 0.

Distinct counting can be performed using ``noisy_count_gaussian(DISTINCT col, ...)``, or with
:func:`noisy_approx_distinct_sfm`. Generally speaking, :func:`noisy_count_gaussian` will
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
:func:`noisy_approx_distinct_sfm`. Generally speaking, :func:`noisy_count_gaussian` will
:func:`noisy_approx_distinct_sfm`. Generally speaking, :func:`noisy_count_gaussian`

presto-docs/src/main/sphinx/functions/noisy.rst Outdated Show resolved Hide resolved
@jonhehir
Copy link
Contributor Author

Thank you, @steveburnett! I appreciate the edits and reStructuredText pointers. I think I've hit your full wish list. Let me know if all looks good. 😄

Moved documentation for noisy_count_gaussian and related
functions from the Aggregate Functions page to a new Noisy
Aggregate Functions page. Also added documentation of the
SFM sketch type and functions, such as noisy_approx_set_sfm.
Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Pull updated branch, new local build of docs, verified changes. New review of the documents as a whole, everything looks good. Thanks!

@jonhehir jonhehir merged commit d8376e1 into prestodb:master May 15, 2024
57 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

3 participants