This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Researching existing code corpora #37

Open
acutmore opened this issue Jul 7, 2021 · 0 comments
acutmore commented Jul 7, 2021

An issue to collect discussions/results around researching existing code to extract data that may help with the design/scope of this proposal.

Array Method Usage

Using the GitHub dataset on BigQuery to scan for usage of existing array methods.

Number of repositories that include a particular method call

Query
-- Outer query: count the repositories in which each pattern appears at least once.
SELECT
  SUM(1) AS js_repos,
  SUM(CAST(has_map AS INT64)) AS has_map,
  SUM(CAST(has_filter AS INT64)) AS has_filter,
  SUM(CAST(has_reduce AS INT64)) AS has_reduce,
  SUM(CAST(has_copy_within AS INT64)) AS has_copy_within,
  SUM(CAST(has_fill AS INT64)) AS has_fill,
  SUM(CAST(has_pop AS INT64)) AS has_pop,
  SUM(CAST(has_push AS INT64)) AS has_push,
  SUM(CAST(has_reverse AS INT64)) AS has_reverse,
  SUM(CAST(has_shift AS INT64)) AS has_shift,
  SUM(CAST(has_sort AS INT64)) AS has_sort,
  SUM(CAST(has_slice AS INT64)) AS has_slice,
  SUM(CAST(has_slice_default AS INT64)) AS has_slice_default,
  SUM(CAST(has_slice_pop AS INT64)) AS has_slice_pop,
  SUM(CAST(has_slice_shift AS INT64)) AS has_slice_shift,
  SUM(CAST(has_splice AS INT64)) AS has_splice,
  SUM(CAST(has_unshift AS INT64)) AS has_unshift,
FROM (
  -- Per repository: TRUE if any .js file contains the pattern.
  SELECT
    repo_name,
    LOGICAL_OR(REGEXP_CONTAINS(content, r"\.map\(")) AS has_map,
    LOGICAL_OR(REGEXP_CONTAINS(content, r"\.filter\(")) AS has_filter,
    LOGICAL_OR(REGEXP_CONTAINS(content, r"\.reduce\(")) AS has_reduce,
    LOGICAL_OR(REGEXP_CONTAINS(content, r"\.copyWithin\(")) AS has_copy_within,
    LOGICAL_OR(REGEXP_CONTAINS(content, r"\.fill\(")) AS has_fill,
    LOGICAL_OR(REGEXP_CONTAINS(content, r"\.pop\( ?\)")) AS has_pop,
    LOGICAL_OR(REGEXP_CONTAINS(content, r"\.push\(")) AS has_push,
    LOGICAL_OR(REGEXP_CONTAINS(content, r"\.reverse\( ?\)")) AS has_reverse,
    LOGICAL_OR(REGEXP_CONTAINS(content, r"\.shift\( ?\)")) AS has_shift,
    LOGICAL_OR(REGEXP_CONTAINS(content, r"\.sort\(")) AS has_sort,
    LOGICAL_OR(REGEXP_CONTAINS(content, r"\.slice\(")) AS has_slice,
    LOGICAL_OR(REGEXP_CONTAINS(content, r"\.slice\(\)")) AS has_slice_default,
    LOGICAL_OR(REGEXP_CONTAINS(content, r"\.slice\( ?0 ?, ?-1 ?\)")) AS has_slice_pop,
    LOGICAL_OR(REGEXP_CONTAINS(content, r"\.slice\( ?1 ?\)")) AS has_slice_shift,
    LOGICAL_OR(REGEXP_CONTAINS(content, r"\.splice\(")) AS has_splice,
    LOGICAL_OR(REGEXP_CONTAINS(content, r"\.unshift\(")) AS has_unshift,
  FROM (
    -- .js files whose path does not look like it contains a version number.
    SELECT
      repo_name,
      content
    FROM
      `bigquery-public-data.github_repos.files`
    INNER JOIN
      `bigquery-public-data.github_repos.contents`
    USING
      (id)
    WHERE
      ENDS_WITH(path, '.js')
      AND NOT REGEXP_CONTAINS(path, r"\d\.\d"))
  GROUP BY
    repo_name )

The aim is to get a general sense of the relative usage of the methods for which we are proposing non-mutating versions, with other methods such as .map included as a benchmark.

  • Tries to exclude files that are copies of libraries by excluding file paths that appear to contain a version number (matching /\d\.\d/).
  • Does not exclude forks (tbc).
  • Includes false positives: .map could be Observable.prototype.map, and .slice could be String.prototype.slice.
  • Dataset is from 2016 (tbc).
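The false-positive caveat above can be illustrated with a small sketch. It assumes the query's patterns behave the same way as JavaScript regular expressions for these simple cases; the sample source strings are hypothetical and not taken from the dataset.

```javascript
// A few of the query's patterns, rewritten as JavaScript regular expressions.
const patterns = {
  has_map: /\.map\(/,
  has_slice: /\.slice\(/,
  has_slice_pop: /\.slice\( ?0 ?, ?-1 ?\)/,
  has_slice_shift: /\.slice\( ?1 ?\)/,
};

// Hypothetical file contents used only for illustration.
const samples = {
  arraySlice: "const rest = items.slice(1);",
  stringSlice: "const trimmed = 'hello'.slice(0, -1);",
  observableMap: "click$.map(e => e.target);",
};

// Returns the names of all patterns that match the given source text.
function matches(source) {
  return Object.keys(patterns).filter((name) => patterns[name].test(source));
}

for (const [label, source] of Object.entries(samples)) {
  console.log(label, matches(source));
}
```

The string and Observable samples are counted exactly like the array sample, because a text search has no type information about the receiver of the call.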
| Category | Count | % of repos |
| --- | --- | --- |
| All repos with .js | 1,187,155 | 100 |
| .push(... | 813,437 | 69 |
| .slice(... | 625,390 | 53 |
| .map(... | 624,565 | 53 |
| .filter(... | 571,678 | 48 |
| .sort(... | 536,098 | 45 |
| .splice(... | 533,482 | 45 |
| .shift() | 500,447 | 42 |
| .pop() | 497,781 | 42 |
| .unshift(... | 434,742 | 37 |
| .reverse() | 403,688 | 34 |
| .reduce(... | 248,995 | 21 |
| .fill(... | 141,936 | 12 |
| .copyWithin(... | 6,034 | 0.5 |
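As a rough check on the % column, the sketch below recomputes a few percentages from the counts in the table. The helper name `percentOfRepos` is hypothetical, and rounding to the nearest whole number (one decimal place for very small values) is an assumption about how the column was produced.

```javascript
// Total number of repositories with .js files, from the table.
const totalRepos = 1187155;

// Hypothetical helper: share of repositories, rounded to `digits` decimal places.
function percentOfRepos(count, digits = 0) {
  return Number(((count / totalRepos) * 100).toFixed(digits));
}

console.log(percentOfRepos(813437));  // .push(...
console.log(percentOfRepos(625390));  // .slice(...
console.log(percentOfRepos(6034, 1)); // .copyWithin(...
```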

Slice usage

.withPopped() is roughly equivalent to .slice(0, -1), and .withShifted() is roughly equivalent to .slice(1), so we can also look for those particular patterns.
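The correspondence can be sketched with two standalone helper functions. These are hypothetical illustrations built on Array.prototype.slice, not the proposal's actual method definitions.

```javascript
// Hypothetical helper: a copy of the array without its last element.
function withPopped(array) {
  return array.slice(0, -1);
}

// Hypothetical helper: a copy of the array without its first element.
function withShifted(array) {
  return array.slice(1);
}

const original = [1, 2, 3];
const popped = withPopped(original);   // [1, 2]
const shifted = withShifted(original); // [2, 3]

// Unlike pop() and shift(), neither helper mutates the input.
console.log(original, popped, shifted);
```

Counting occurrences of .slice(0, -1) and .slice(1) therefore gives a rough proxy for how often code already takes a non-mutating pop or shift.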

| Category | Count | % of repos |
| --- | --- | --- |
| All repos with .js | 1,187,155 | 100 |
| .slice(... | 625,390 | 53 |
| .slice(1) | 434,558 | 37 |
| .slice(0, -1) | 264,466 | 22 |