Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transfer to v3 #1024

Open
wants to merge 13 commits into
base: main
Choose a base branch
from
Open

Transfer to v3 #1024

wants to merge 13 commits into from

Conversation

MarquiseRosier
Copy link
Collaborator

@MarquiseRosier MarquiseRosier commented Jan 5, 2024

A minimal set of data endpoints necessary to run https://data.aem.live/rum-dashboard

Also I updated rum-pageviews to use Events_V4; because the version that uses Events_V3 cannot resolve www.petplace.com to petplace.com

When we use Events_V4 which prepends www and checks for results for a url, www.petplace.com and petplace.com can appear.

@trieloff
@langswei
@dbrrt

Copy link

github-actions bot commented Jan 5, 2024

This PR will trigger a minor release when merged.

Copy link

codecov bot commented Jan 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (387b11e) 100.00% compared to head (c39dc76) 100.00%.

Additional details and impacted files
@@            Coverage Diff            @@
##              main     #1024   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            6         6           
  Lines          710       710           
=========================================
  Hits           710       710           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@langswei
Copy link
Collaborator

langswei commented Jan 5, 2024

I believe that not every PR merged into main automatically results in v3 semantic versioning. A conventional commit message such as feat: updated dashboard query must be included as at least one of the commits. Lars can confirm if my understanding is correct.

@langswei
Copy link
Collaborator

langswei commented Jan 5, 2024

Please run SQLFluff.

@langswei
Copy link
Collaborator

langswei commented Jan 5, 2024

Given the move to _V4 table functions in some queries, I suggest creating a file similar to /src/queries/common-event-v3.sql where the new table functions are documented.

@MarquiseRosier
Copy link
Collaborator Author

@langswei I don't think I'll add a query to record the new table function as it wouldn't really fit into this PR, feel free to document the new table functions if you'd like in another PR though!

Events_V4 was created by @trieloff sometime ago; some queries simply haven't been upgraded to use it.

@langswei
Copy link
Collaborator

langswei commented Jan 7, 2024

@langswei I don't think I'll add a query to record the new table function as it wouldn't really fit into this PR, feel free to document the new table functions if you'd like in another PR though!

Events_V4 was created by @trieloff sometime ago; some queries simply haven't been upgraded to use it.

Sounds good, separate PR makes sense, no reason to combine here. My overarching goal is to ensure queries used by data.aem.live and queries used to feed spacecat and eventually DaaS -- especially those related to page view estimates -- are based on common logic so that we maintain consistency. I agree with you this is a separate topic.

source,
COUNT(id) AS ids,
COUNT(DISTINCT url) AS pages,
APPROX_TOP_COUNT(url, 1)[OFFSET(0)].value AS topurl,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see a slack thread started by @shsteimer in #aem-data-desk on Jan 3. I believe that grouping by source and taking the top url is hiding data. Might make more sense to group by source+url combination, or group by url and take the top source (referer).

Copy link
Contributor

@trieloff trieloff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't add _V4 in any of the built-in queries, make sure sqlfluff runs cleanly.

@@ -37,7 +37,7 @@ BEGIN
WHEN 365 THEN TIMESTAMP_TRUNC(time, YEAR)
ELSE TIMESTAMP_TRUNC(time, DAY)
END AS date
FROM helix_rum.PAGEVIEWS_V3(
FROM helix_rum.PAGEVIEWS_V4(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not create PAGEVIEWS_V4 and I object strongly to using it in any of the default queries as it is hiding data.

MAX(weight) AS pageviews
FROM
`helix-225321.helix_rum.EVENTS_V4`(
net.host(@url), @offset, @interval, '-', '-', 'UTC', 'all', @domainkey
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • @url should have the same semantics as the rest of the @url parameters, so URL prefix without https://, optional path and $ or ? delimiters.
  • if you list startdate and enddate in the supported parameters, you should also pass them through here
  • same for timezone
  • same for device

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the same data should come from PAGEVIEWS_V4 – why does this query exist when rum-pageviews is there already?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants