Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automate the update of the CWV Tech Report #26

Open
rviscomi opened this issue Aug 10, 2021 · 1 comment
Open

Automate the update of the CWV Tech Report #26

rviscomi opened this issue Aug 10, 2021 · 1 comment

Comments

@rviscomi
Copy link
Member

Each month I manually run a query to append the latest data to the httparchive:core_web_vitals.technologies table.

The query have a few places that need to be updated to reflect the latest dataset ("2021-07-01" and "2021_07_01" below). The query also depends on two tables that are made available at different times: the HA technologies.YYYY_MM_DD_* tables are made available at the end of the month corresponding to the release date, while the CrUX materialzied.device_summary table isn't updated until the second Tuesday of the month following the release date. If that's confusing, for example the 2021-07-01 (July 2021) dataset was released today on August 10, the second Tuesday of August.

I can help to publish a Pub/Sub topic when the CrUX release is available on BigQuery. This project would need to subscribe to that topic and kick off a process to run a query like the one below for the given release and append the results to the CWV table.

CREATE TEMP FUNCTION IS_GOOD(good FLOAT64, needs_improvement FLOAT64, poor FLOAT64) RETURNS BOOL AS (
  good / (good + needs_improvement + poor) >= 0.75
);

CREATE TEMP FUNCTION IS_NON_ZERO(good FLOAT64, needs_improvement FLOAT64, poor FLOAT64) RETURNS BOOL AS (
  good + needs_improvement + poor > 0
);

SELECT
  date,
  ARRAY_TO_STRING(ARRAY_AGG(DISTINCT category IGNORE NULLS ORDER BY category), ', ') AS categories,
  app,
  client,
  COUNT(DISTINCT url) AS origins,
  COUNT(DISTINCT IF(good_fid, url, NULL)) AS origins_with_good_fid,
  COUNT(DISTINCT IF(good_cls, url, NULL)) AS origins_with_good_cls,
  COUNT(DISTINCT IF(good_lcp, url, NULL)) AS origins_with_good_lcp,
  COUNT(DISTINCT IF(any_fid, url, NULL)) AS origins_with_any_fid,
  COUNT(DISTINCT IF(any_cls, url, NULL)) AS origins_with_any_cls,
  COUNT(DISTINCT IF(any_lcp, url, NULL)) AS origins_with_any_lcp,
  COUNT(DISTINCT IF(good_cwv, url, NULL)) AS origins_with_good_cwv,
  COUNT(DISTINCT IF(any_lcp AND any_cls, url, NULL)) AS origins_eligible_for_cwv,
  SAFE_DIVIDE(COUNTIF(good_cwv), COUNTIF(any_lcp AND any_cls)) AS pct_eligible_origins_with_good_cwv
FROM (
  SELECT
    date,
    CONCAT(origin, '/') AS url,
    IF(device = 'desktop', 'desktop', 'mobile') AS client,
    IS_NON_ZERO(fast_fid, avg_fid, slow_fid) AS any_fid,
    IS_GOOD(fast_fid, avg_fid, slow_fid) AS good_fid,
    IS_NON_ZERO(small_cls, medium_cls, large_cls) AS any_cls,
    IS_GOOD(small_cls, medium_cls, large_cls) AS good_cls,
    IS_NON_ZERO(fast_lcp, avg_lcp, slow_lcp) AS any_lcp,
    IS_GOOD(fast_lcp, avg_lcp, slow_lcp) AS good_lcp,
    (IS_GOOD(fast_fid, avg_fid, slow_fid) OR fast_fid IS NULL) AND
    IS_GOOD(small_cls, medium_cls, large_cls) AND
    IS_GOOD(fast_lcp, avg_lcp, slow_lcp) AS good_cwv
  FROM
    `chrome-ux-report.materialized.device_summary`
  WHERE
    date = '2021-07-01' AND
    device IN ('desktop', 'phone')
) JOIN (
  SELECT DISTINCT
    CAST('2021-07-01' AS DATE) AS date,
    category,
    app,
    _TABLE_SUFFIX AS client,
    url
  FROM
    `httparchive.technologies.2021_07_01_*`
  WHERE
    app IS NOT NULL AND
    app != ''
) USING (date, url, client)
GROUP BY
  date,
  app,
  client
@tunetheweb
Copy link
Member

FYI latest SQL is actually here: https://github.com/HTTPArchive/cwv-tech-report/blob/main/sql/monthly.sql

@rviscomi rviscomi transferred this issue from HTTPArchive/httparchive.org Jun 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants