FYI: Temporary change in language and extension popularity assessment #5756

Open
lildude opened this issue Feb 2, 2022 · 10 comments
@lildude
Member

lildude commented Feb 2, 2022

GitHub's Search is struggling at the moment, so all Search requests are being heavily restricted, making it almost impossible to count the number of unique :user/:repo combinations via the likes of Harvester or the API.

Search is in the process of being rewritten, with the Tech Preview available at https://cs.github.com/ (please tinker with it and send GitHub feedback). However, it isn't accessible via the API yet and doesn't quite yet meet our needs for assessing our current usage requirements, so for the foreseeable future I'll be using my judgment to determine popularity until the new Search gains the functionality we need and/or the restrictions are lifted (or we can come up with other qualifying criteria).

I know this is subjective and open to debate so the loose rules I'll be using are along the lines of:

  • at least 2000 files per extension indexed in the last year (the number you see at the top of the search results), unless the extension is expected to occur only once per repo, in which case 200 files.
  • with a reasonable distribution across unique :user/:repo combinations assessed by manually and randomly clicking through the results.
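
The file-count rule in the first bullet can be written down as a small check. This is only a sketch of the loose rule as stated here; the function name and parameters are made up for illustration, and the distribution across unique :user/:repo combinations is still assessed by hand.

```python
def meets_file_threshold(indexed_files: int, once_per_repo: bool = False) -> bool:
    """Apply the loose file-count rule from this issue.

    indexed_files: the result count shown at the top of the search results.
    once_per_repo: True if the extension is expected to occur only once per repo.
    """
    # 200 files for once-per-repo extensions, otherwise 2000.
    required = 200 if once_per_repo else 2000
    return indexed_files >= required


# Usage: a regular extension with 1,500 hits falls short,
# while a once-per-repo file name with 250 hits passes.
print(meets_file_threshold(1500))
print(meets_file_threshold(250, once_per_repo=True))
```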

If particular users are showing a high proportion of the results, I'll manually filter out those users using -user:<username> to reduce their impact on my assessment.
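
For anyone wanting to do the same filtering on their own query, here is a rough sketch. Only the `-user:<username>` qualifier comes from this issue; the helper names, the 20% cut-off, the `path:*.foo` base query and the sample repo list are all assumptions for illustration, and the list of `user/repo` names would have to be collected by hand from the results.

```python
from collections import Counter
from urllib.parse import urlencode


def dominant_users(repos, max_share=0.2):
    """Return owners holding more than max_share of the sampled results.

    repos: iterable of "user/repo" strings noted down from the search results.
    max_share: hypothetical cut-off; the issue does not define a number.
    """
    owners = Counter(repo.split("/", 1)[0] for repo in repos)
    total = sum(owners.values())
    return sorted(owner for owner, n in owners.items() if n / total > max_share)


def exclude_users(query, users):
    """Append a -user:<username> qualifier for each user to filter out."""
    return " ".join([query, *(f"-user:{user}" for user in users)])


def search_url(query):
    """Build a GitHub code search URL for the query."""
    return "https://github.com/search?" + urlencode({"q": query, "type": "code"})


# Usage with made-up sample data.
sampled = ["alice/a", "alice/b", "alice/c", "bob/x", "carol/y"]
noisy = dominant_users(sampled, max_share=0.5)
query = exclude_users("path:*.foo", noisy)
print(query)  # path:*.foo -user:alice
print(search_url(query))
```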

I know this isn't ideal, but I think it's the best option for the moment. I'm open to suggestions too. On the plus side, it does mean a lot more PRs are likely to be merged 😁.

I'll be going back through older PRs in the next week or two and will re-assess them based on these notes, merging any that satisfy them.

lildude self-assigned this Feb 2, 2022
lildude pinned this issue Feb 2, 2022
@Alhadis
Collaborator

Alhadis commented Feb 3, 2022

> Search is in the process of being rewritten

Might be a good time to request a "search by extension/filename" feature to simplify the task of adding new languages to GitHub... 😉

@lildude
Member Author

lildude commented Feb 3, 2022

> Might be a good time to request a "search by extension/filename" feature to simplify the task of adding new languages to GitHub... 😉

I think we're covered already by a combination of scopes and more intuitive file path expressions:

[Screenshot: CleanShot 2022-02-03 at 09 26 58]

@Alhadis
Collaborator

Alhadis commented Feb 4, 2022

Wow, regular expressions will be supported? Now we're talking. 😀

Also, I tried to access https://cs.github.com/ but it simply redirected me to my activity feed (i.e., https://github.com/). Is it staff-only or something?

@lildude
Member Author

lildude commented Feb 4, 2022

Nope. You need to be invited. Join the waitlist at https://cs.github.com/about

@Alhadis
Collaborator

Alhadis commented Feb 4, 2022

Done. Hopefully this'll make Harvester's rewrite less intimidating. 😅

@elimisteve

> On the plus side, it does mean a lot more PRs are likely to be merged 😁.

That's what really matters anyway -- yay! 🎉

@runarorama

I want to submit support for Unison (https://unison-lang.org), but there are zero GitHub repositories with Unison code in them, since Unison is an image-based language that can't really use Git and doesn't have source files. I estimate that Unison has roughly 2000 users total.

Is it worth trying to submit?

@mawildoer

One perhaps stupid question (I'm sorry if I missed this!) but how should we (as a language's creators) find the ~5k lines of code/200 repos?

I believe we're getting to the right point (based on the telemetry we do have), but we're finding it really hard to turn the repos up based on keywords.

I also blindly attempted to see if I could declare a language that GitHub would process somewhat generically for the sake of marking repos, to no avail. https://github.com/atopile/spin-servo-drive/blob/main/.gitattributes

@lildude
Member Author

lildude commented Feb 20, 2024

> One perhaps stupid question (I'm sorry if I missed this!) but how should we (as a language's creators) find the ~5k lines of code/200 repos?

Use GitHub's Search. This is the only way we assess popularity: we use the search URL offered in the PR template, plus any further customisations you make to it. The more precise you make the query, the better. Note that we do not count lines of code; we never have and never will.

> I believe we're getting to the right point (based on the telemetry we do have), but we're finding it really hard to turn the repos up based on keywords.

The new search is pretty good now and you can use regular expressions too.
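
As a rough sketch of what a regular-expression query might look like, the helper below builds a `path:` qualifier with a slash-delimited regex that matches files ending in a given extension. The helper name and the `foo` extension are hypothetical, and the query is only a starting point to refine with keywords specific to the language.

```python
import re


def extension_regex_query(extension: str) -> str:
    """Build a code search query matching paths that end in .<extension>.

    The regex is wrapped in slashes, which is how the new code search
    marks a regular expression.
    """
    pattern = rf"\.{re.escape(extension)}$"
    return f"path:/{pattern}/"


# Usage with a hypothetical extension.
print(extension_regex_query("foo"))  # path:/\.foo$/
```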

> I also blindly attempted to see if I could declare a language that GitHub would process somewhat generically for the sake of marking repos, to no avail. https://github.com/atopile/spin-servo-drive/blob/main/.gitattributes

This is expected. This has been discussed at length in other issues and will not be discussed in this issue.
