Spreadsheet-based classifications for RUM data – how? #529

Open
trieloff opened this issue Nov 2, 2021 · 1 comment
Labels
question Further information is requested

Comments

@trieloff

trieloff commented Nov 2, 2021

Overview

Our RUM data has a checkpoint field that currently tracks technical events such as top (JS execution started), lcp, or load. User interaction is tracked via the click event, which applies to any click on the page. For deeper tracking of conversions, it would be ideal to let users define a mapping of URL patterns to named conversion events in .helix/config.xlsx (see https://github.com/adobe/helix-admin/issues/282) or a similar file.
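
To make the idea concrete, here is a minimal sketch of such a mapping. It assumes glob-style URL patterns and a two-column sheet (pattern, event). The column names, the sample rows, and the `classify` helper are all hypothetical; the actual sheet format has not been defined.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional

# Hypothetical rows, as they might be exported from a sheet in .helix/config.xlsx.
MAPPING: List[Dict[str, str]] = [
    {"pattern": "/checkout/thank-you*", "event": "buy"},
    {"pattern": "/newsletter/confirmed", "event": "subscribe"},
]


def classify(path: str, mapping: List[Dict[str, str]] = MAPPING) -> Optional[str]:
    """Return the first named event whose pattern matches the path, else None."""
    for row in mapping:
        if fnmatchcase(path, row["pattern"]):
            return row["event"]
    return None
```

With these rows, `classify("/checkout/thank-you/123")` would yield `buy`, and an unmapped path such as `/blog` would yield `None`.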

What would be the best way of passing this classification information into the helix-run-query service?

Details

I see the following options:

  1. We use the Primary/Replica Architecture outlined in "H3 Multi-Cloud Storage Architecture" (helix-home#207) to create a replica of the Helix content bus in Google Cloud Storage. Plain JSON files in Google Cloud Storage can be addressed in BigQuery just like a regular table, which can then be JOINed with the other RUM data.
  2. We allow helix-run-query to access the content bus directly, read the JSON file, and pass it to the query as a String parameter. The query would then be responsible for parsing the JSON and joining it with the RUM data, which is achievable. Whoever runs the query would still have to provide owner/repo/ref so that the content bus ID can be resolved.
  3. We use an array query parameter for each possible named event. Whoever calls the query service would be responsible for fetching the mapping table and adding the query parameters.
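
As a rough illustration of option (3), the caller could group the URL patterns by event name and send each event as a repeated query parameter. This is only a sketch under assumptions: the endpoint URL, the `build_query_url` helper, and the row format (pattern, event) are hypothetical and not part of helix-run-query.

```python
from typing import Dict, List
from urllib.parse import urlencode


def build_query_url(endpoint: str, mapping: List[Dict[str, str]]) -> str:
    """Group URL patterns by event name and encode each event as a repeated parameter."""
    grouped: Dict[str, List[str]] = {}
    for row in mapping:
        grouped.setdefault(row["event"], []).append(row["pattern"])
    # doseq=True emits one key=value pair per list element (buy=...&buy=...).
    return f"{endpoint}?{urlencode(grouped, doseq=True)}"


# Hypothetical mapping rows and endpoint, for illustration only.
rows = [
    {"pattern": "/a", "event": "buy"},
    {"pattern": "/b", "event": "buy"},
    {"pattern": "/c", "event": "subscribe"},
]
print(build_query_url("https://example.com/q", rows))
# prints https://example.com/q?buy=%2Fa&buy=%2Fb&subscribe=%2Fc
```

The query side would then only need to declare one array parameter per named event, which is what ties this option to a fixed list of event names.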

Proposed Actions

At the moment, (3) looks like the easiest option to me. Its only limitation is that the list of supported classified checkpoints would need to be predefined. This limitation has its upsides: it lets us infer deeper meaning when events are called buy, subscribe, or recommend rather than carrying generic names like conversion1.
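
A caller could enforce the predefined list before building the request, for example by separating rows the query understands from rows it does not. The set below simply reuses the three names mentioned above as an assumption; the real list, and the `split_supported` helper, are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed set of supported event names; the actual list is yet to be decided.
SUPPORTED_EVENTS = frozenset({"buy", "subscribe", "recommend"})

Row = Dict[str, str]


def split_supported(mapping: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Separate rows with a supported event name from rows the query would ignore."""
    accepted = [row for row in mapping if row["event"] in SUPPORTED_EVENTS]
    rejected = [row for row in mapping if row["event"] not in SUPPORTED_EVENTS]
    return accepted, rejected
```

Reporting the rejected rows back to whoever maintains the spreadsheet would make typos such as `conversion1` visible instead of silently dropping them.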

(1) and (2) would also raise an access control problem: we would need to make sure the query service cannot read arbitrary content, only the mapping tables in .helix.

trieloff added the question label on Nov 2, 2021
@tripodsan

I also think that (3) is the best solution for now.
