aws payload tagging #4309

Draft
wants to merge 6 commits into
base: master

@tlhunter tlhunter commented May 15, 2024

This PR rebuilds #4131. It removes hundreds of files' worth of whitespace changes and rebuilds yarn.lock based on the current master branch. Ultimately @jbertran will have done 90% of the work in this PR.

What does this PR do?

This PR introduces AWS payload reporting as tags.

Configuration

We introduce 3 new environment variables:

  • DD_TRACE_CLOUD_REQUEST_PAYLOAD_TAGGING defines the activation of the feature for requests, values being either "all" (no additional redaction) or a comma-separated list of JSONPath queries identifying payload paths to be replaced with the value "redacted".
  • DD_TRACE_CLOUD_RESPONSE_PAYLOAD_TAGGING
  • DD_TRACE_CLOUD_PAYLOAD_TAGGING_MAX_DEPTH sets the depth after which we stop creating tags from a payload
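
As a rough illustration of how such a value could be interpreted, here is a minimal sketch. The helper name `parsePayloadTaggingRules` and its return shape are assumptions made for this example and are not taken from the PR. It also splits naively on commas, so a JSONPath query that itself contains a comma would need smarter handling.

```javascript
'use strict'

// Hypothetical helper (not from the PR): turn the raw value of
// DD_TRACE_CLOUD_REQUEST_PAYLOAD_TAGGING into a small config object.
function parsePayloadTaggingRules (raw) {
  // Unset or empty: the feature stays off.
  if (raw === undefined || raw.trim() === '') {
    return { enabled: false, redactionPaths: [] }
  }
  // "all": tag the payload with no additional user-supplied redaction.
  if (raw.trim().toLowerCase() === 'all') {
    return { enabled: true, redactionPaths: [] }
  }
  // Otherwise: a comma-separated list of JSONPath queries to redact.
  // Assumption: no query contains a comma of its own.
  const redactionPaths = raw.split(',').map(s => s.trim()).filter(Boolean)
  return { enabled: true, redactionPaths }
}

const rules = parsePayloadTaggingRules('$.Message, $.MessageAttributes.token')
console.log(rules)
```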

Behaviour

With the feature activated, aws-sdk calls to the enabled plugins will create additional tags representing the payload, with the following modifications:

  1. Paths known to be PII/sensitive are hard-coded to be redacted (service by service)
  2. Paths known to be user-input data likely to contain JSON are expanded
  3. Paths matching the JSONPath queries passed by the environment variables or corresponding runtime tracer configuration are redacted
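
To make the tag-creation step more concrete, the sketch below flattens a payload into dotted tag names, expands strings that look like JSON, and stops at a maximum depth. Everything here is an assumption for illustration: the function names, the `aws.request.body` prefix, the `'truncated'` marker, and the choice to expand any JSON-looking string (the PR expands only paths known to carry JSON).

```javascript
'use strict'

// Hypothetical helper: parse a string only if it looks like a JSON
// object or array; return undefined otherwise.
function tryParseJson (text) {
  const trimmed = text.trim()
  if (!trimmed.startsWith('{') && !trimmed.startsWith('[')) return undefined
  try {
    return JSON.parse(trimmed)
  } catch {
    return undefined
  }
}

// Hypothetical sketch: flatten `value` into `tags`, one tag per leaf.
function payloadToTags (value, prefix, maxDepth, tags = {}, depth = 0) {
  // Expand embedded JSON so its fields become tags too.
  if (typeof value === 'string') {
    const expanded = tryParseJson(value)
    if (expanded !== undefined) value = expanded
  }
  // Leaves become string-valued tags.
  if (value === null || typeof value !== 'object') {
    tags[prefix] = String(value)
    return tags
  }
  // Past the configured depth, stop descending (marker is an assumption).
  if (depth >= maxDepth) {
    tags[prefix] = 'truncated'
    return tags
  }
  for (const [key, child] of Object.entries(value)) {
    payloadToTags(child, `${prefix}.${key}`, maxDepth, tags, depth + 1)
  }
  return tags
}

const params = { TopicArn: 'arn:aws:sns:example', Message: '{"a":{"b":1}}' }
console.log(payloadToTags(params, 'aws.request.body', 10))
console.log(payloadToTags(params, 'aws.request.body', 2))
```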

This PR only provides the feature for SNS as a first service, but the framework introduced here only requires slight adaptations of a given AWS service plugin to make it available, as well as the addition of the static PII fields configuration.

New dependencies

Adding jsonpath seems safe given the constraints it imposes on its scripts, even if I don't expect scripts to be used. Using rfdc is more questionable - we need a deep clone because JSONPath apply can only do side-effects, and we must not modify the payload, but maybe something simpler works.
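
The clone-then-mutate pattern described above can be sketched without either dependency. In this example `structuredClone` (built into recent Node.js versions) stands in for rfdc, and a hypothetical dotted-path setter stands in for a JSONPath apply call; neither substitution reflects what the PR actually does, they only show why the copy is needed.

```javascript
'use strict'

// Hypothetical in-place redactor: overwrites the value at a simple dotted
// path. Like a JSONPath "apply", it mutates the object it is given.
function redactInPlace (target, dottedPath) {
  const keys = dottedPath.split('.')
  let node = target
  for (const key of keys.slice(0, -1)) {
    if (node === null || typeof node !== 'object' || !(key in node)) return
    node = node[key]
  }
  const last = keys[keys.length - 1]
  if (node !== null && typeof node === 'object' && last in node) {
    node[last] = 'redacted'
  }
}

// Deep-clone first so the caller's payload, which is still going to be
// sent to AWS, is never modified.
function redactedCopy (payload, dottedPaths) {
  const copy = structuredClone(payload)
  for (const path of dottedPaths) redactInPlace(copy, path)
  return copy
}

const original = { Message: 'hi', Attributes: { Token: 'secret' } }
const copy = redactedCopy(original, ['Attributes.Token'])
console.log(copy.Attributes.Token)     // the copy is redacted
console.log(original.Attributes.Token) // the original is untouched
```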

Remaining work

In some cases, JSONPath filter expressions are not sufficient to do what we want.

For example, setting attributes for entities (like SNS topics) requires setting an AttributeName and an AttributeValue at top-level of the JSON payload. Ideally, we should be able to redact the AttributeValue only when the AttributeName matches a disallowed value (for example KMSMasterKeyId). JSONPath syntax does not allow such a complex query, so we need to also specify custom logic hooks that do not go through JSONPath to redact data.
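
A custom hook of that kind might look like the sketch below. The function name, the deny-list constant, and the hook's signature are all hypothetical; only the AttributeName/AttributeValue fields and the KMSMasterKeyId example come from the description above.

```javascript
'use strict'

// Assumed deny-list; KMSMasterKeyId is the example named in this PR.
const SENSITIVE_ATTRIBUTE_NAMES = new Set(['KMSMasterKeyId'])

// Hypothetical hook: redact AttributeValue only when the sibling
// AttributeName is on the deny-list. Returns a shallow copy when it
// redacts, so the original request parameters are left alone.
function redactSensitiveAttribute (params) {
  if (!SENSITIVE_ATTRIBUTE_NAMES.has(params.AttributeName)) return params
  return { ...params, AttributeValue: 'redacted' }
}

const sensitive = redactSensitiveAttribute({
  TopicArn: 'arn:aws:sns:example',
  AttributeName: 'KMSMasterKeyId',
  AttributeValue: 'my-key-id'
})
const harmless = redactSensitiveAttribute({
  TopicArn: 'arn:aws:sns:example',
  AttributeName: 'DisplayName',
  AttributeValue: 'My topic'
})
console.log(sensitive.AttributeValue, harmless.AttributeValue)
```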

Motivation

This comes from:

  1. the desire to have real-world data correlated with traces
  2. the fact that AWS upstream API is well-defined and well-documented, helping us avoid PII/sensitive data pitfalls
  3. the existence of such a mechanism in datadog-lambda-js, but only scoped to lambda function input and output. This provides the same level of information, with additional redaction granularity, for AWS plugins.

Plugin Checklist

Additional Notes

Security

Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.

Unsure? Have a question? Request a review!

github-actions bot commented May 15, 2024

Overall package size

Self size: 6.64 MB
Deduped: 62.95 MB
No deduping: 63.23 MB

Dependency sizes

| name | version | self size | total size |
|------|---------|-----------|------------|
| @datadog/native-appsec | 8.0.1 | 15.59 MB | 15.6 MB |
| @datadog/native-iast-taint-tracking | 2.1.0 | 14.91 MB | 14.92 MB |
| @datadog/pprof | 5.3.0 | 9.85 MB | 10.22 MB |
| protobufjs | 7.2.5 | 2.77 MB | 6.56 MB |
| @datadog/native-iast-rewriter | 2.3.1 | 2.15 MB | 2.24 MB |
| @opentelemetry/core | 1.14.0 | 872.87 kB | 1.47 MB |
| @datadog/native-metrics | 2.0.0 | 898.77 kB | 1.3 MB |
| @opentelemetry/api | 1.8.0 | 1.21 MB | 1.21 MB |
| jsonpath-plus | 9.0.0 | 580.4 kB | 1.03 MB |
| import-in-the-middle | 1.7.4 | 70.19 kB | 739.86 kB |
| msgpack-lite | 0.1.26 | 201.16 kB | 281.59 kB |
| opentracing | 0.14.7 | 194.81 kB | 194.81 kB |
| semver | 7.5.4 | 93.4 kB | 123.8 kB |
| pprof-format | 2.1.0 | 111.69 kB | 111.69 kB |
| @datadog/sketches-js | 2.1.0 | 109.9 kB | 109.9 kB |
| lodash.sortby | 4.7.0 | 75.76 kB | 75.76 kB |
| lru-cache | 7.14.0 | 74.95 kB | 74.95 kB |
| ignore | 5.2.4 | 51.22 kB | 51.22 kB |
| int64-buffer | 0.1.10 | 49.18 kB | 49.18 kB |
| shell-quote | 1.8.1 | 44.96 kB | 44.96 kB |
| istanbul-lib-coverage | 3.2.0 | 29.34 kB | 29.34 kB |
| rfdc | 1.3.1 | 25.21 kB | 25.21 kB |
| tlhunter-sorted-set | 0.1.0 | 24.94 kB | 24.94 kB |
| limiter | 1.1.5 | 23.17 kB | 23.17 kB |
| dc-polyfill | 0.1.4 | 23.1 kB | 23.1 kB |
| retry | 0.13.1 | 18.85 kB | 18.85 kB |
| jest-docblock | 29.7.0 | 8.99 kB | 12.76 kB |
| crypto-randomuuid | 1.0.0 | 11.18 kB | 11.18 kB |
| path-to-regexp | 0.1.7 | 6.78 kB | 6.78 kB |
| koalas | 1.0.2 | 6.47 kB | 6.47 kB |
| module-details-from-path | 1.0.3 | 4.47 kB | 4.47 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

@tlhunter tlhunter force-pushed the tlhunter/aws-payload-tagging branch from 988466f to 2b87d75 Compare May 30, 2024 17:49
codecov bot commented May 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.03%. Comparing base (e60feae) to head (f944c60).
Report is 1 commits behind head on master.

Current head f944c60 differs from pull request most recent head aa6cfc5

Please upload reports for the commit aa6cfc5 to get more accurate results.

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #4309       +/-   ##
===========================================
+ Coverage   69.19%   88.03%   +18.84%     
===========================================
  Files           1      109      +108     
  Lines         198     3812     +3614     
  Branches       33       33               
===========================================
+ Hits          137     3356     +3219     
- Misses         61      456      +395     

@tlhunter tlhunter mentioned this pull request May 31, 2024
8 tasks
@tlhunter

It looks like the jsonpath module might not be very friendly with ESBuild:


  esbuild
cd /home/runner/work/dd-trace-js/dd-trace-js/integration-tests/esbuild
npm run build
[WARNING] "../include/module.js" should be marked as external for use with "require.resolve" [require-resolve-not-external]

    ../../node_modules/jsonpath/lib/grammar.js:102:58:
      102 │ ...clude = fs.readFileSync(require.resolve("../include/module.js"));
          ╵                                            ~~~~~~~~~~~~~~~~~~~~~~

[WARNING] "../include/action.js" should be marked as external for use with "require.resolve" [require-resolve-not-external]

    ../../node_modules/jsonpath/lib/grammar.js:103:58:
      103 │ ...clude = fs.readFileSync(require.resolve("../include/action.js"));
          ╵                                            ~~~~~~~~~~~~~~~~~~~~~~

[WARNING] "esprima" should be marked as external for use with "require.resolve" [require-resolve-not-external]

    ../../node_modules/jsonpath/lib/aesprim.js:4:27:
      4 │ var file = require.resolve('esprima');
        ╵                            ~~~~~~~~~

npm run built
node:internal/modules/cjs/loader:1189
  throw err;
  ^

Error: Cannot find module '../include/module.js'
Require stack:
- /home/runner/work/dd-trace-js/dd-trace-js/integration-tests/esbuild/out.js
    at Module._resolveFilename (node:internal/modules/cjs/loader:1186:15)
    at Function.resolve (node:internal/modules/helpers:133:19)
    at ../../node_modules/jsonpath/lib/grammar.js (/home/runner/work/dd-trace-js/dd-trace-js/integration-tests/esbuild/out.js:34936:55)
    at __require (/home/runner/work/dd-trace-js/dd-trace-js/integration-tests/esbuild/out.js:12:51)
    at ../../node_modules/jsonpath/lib/parser.js (/home/runner/work/dd-trace-js/dd-trace-js/integration-tests/esbuild/out.js:35589:19)
    at __require (/home/runner/work/dd-trace-js/dd-trace-js/integration-tests/esbuild/out.js:12:51)
    at ../../node_modules/jsonpath/lib/index.js (/home/runner/work/dd-trace-js/dd-trace-js/integration-tests/esbuild/out.js:42469:18)
    at __require (/home/runner/work/dd-trace-js/dd-trace-js/integration-tests/esbuild/out.js:12:51)
    at ../../node_modules/jsonpath/index.js (/home/runner/work/dd-trace-js/dd-trace-js/integration-tests/esbuild/out.js:42655:23)
    at __require (/home/runner/work/dd-trace-js/dd-trace-js/integration-tests/esbuild/out.js:12:51) {
  code: 'MODULE_NOT_FOUND',
  requireStack: [
    '/home/runner/work/dd-trace-js/dd-trace-js/integration-tests/esbuild/out.js'
  ]
}

tlhunter commented May 31, 2024

The forked library jsonpath-plus has about 50% more downloads than jsonpath and fixes the issue so I'll switch to that:
https://www.npmjs.com/package/jsonpath-plus

pr-commenter bot commented May 31, 2024

Benchmarks

Benchmark execution time: 2024-06-04 16:46:28

Comparing candidate commit aa6cfc5 in PR branch tlhunter/aws-payload-tagging with baseline commit 9b410b7 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 260 metrics, 6 unstable metrics.

@tlhunter tlhunter force-pushed the tlhunter/aws-payload-tagging branch from bfec84d to c48d29a Compare June 3, 2024 16:48
@tlhunter tlhunter force-pushed the tlhunter/aws-payload-tagging branch from c48d29a to aa6cfc5 Compare June 4, 2024 16:29