Detect typo-squatting #1074
There were a few good ideas in the recent Rust subreddit post. Maybe they could be evaluated?
It would be awesome if you could list the ideas here with pros/cons of each!
Sure, I will do my best to summarize:
My preference would be to start with the lightest-weight solution here: the first one you noted, which is very similar to the description of this issue. Before changing policies or putting up barriers, I would like to be notified about what is happening, how often, and by whom, and to have time to adjust parameters such as the edit distance before taking more drastic measures.
Hey, I would be interested in implementing this. I think we would first need a list of popular crates (possibly the 50 most downloaded crates). Having such a list would make it possible to check whether there are already crates that might be typo-squats. The actual implementation, just silently flagging the crate or sending the email upon creation, shouldn't be hard to do in the end.
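As a minimal sketch of that first step, assuming per-crate all-time download counts have already been fetched, picking the N most downloaded crates is just a sort followed by a truncation. The `CrateStats` type and `most_popular` function are hypothetical names for illustration, not part of crates.io or its API.

```rust
/// Hypothetical record pairing a crate name with its all-time downloads.
/// Field names are illustrative, not the crates.io API schema.
#[derive(Debug, Clone)]
struct CrateStats {
    name: String,
    downloads: u64,
}

/// Return the names of the `n` most-downloaded crates, most popular first.
/// Ties are broken alphabetically so the result is deterministic.
fn most_popular(crates: &[CrateStats], n: usize) -> Vec<String> {
    let mut sorted: Vec<&CrateStats> = crates.iter().collect();
    sorted.sort_by(|a, b| {
        b.downloads
            .cmp(&a.downloads)
            .then_with(|| a.name.cmp(&b.name))
    });
    sorted.into_iter().take(n).map(|c| c.name.clone()).collect()
}
```

With `n = 50` this yields the kind of list the rest of the thread compares new names against.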
Hi esclear, @carols10cents is looking at getting me a snapshot of the database so I can look into this. I'd be happy to work with you on it.
Sure 👍
Okay, I'm currently working on some data analysis. So far, the 50 most popular crates (by all-time downloads) are:
[Collapsed list: 50 most popular crates]
Tomorrow I'll provide a list of other crates with names similar to these.
Okay, I accidentally did it right now. Turns out that for the […]. Using a Levenshtein distance lower than 3 as an indicator of possible typo-squatting would yield the following result:
[Collapsed list: crates with names similar to the 50 most popular crates, along with the Levenshtein distance]
Thus, I would suggest treating names as possible typo-squats if:
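For reference, the Levenshtein distance used in this analysis can be computed with the textbook dynamic-programming recurrence. This is an illustrative sketch only: `levenshtein` and `similar_names` are hypothetical helpers, names are compared per Unicode scalar value, and the `< 3` cut-off simply mirrors the threshold mentioned above.

```rust
/// Levenshtein edit distance: the minimum number of single-character
/// insertions, deletions, and substitutions that turn `a` into `b`.
/// Keeps only two rows of the dynamic-programming table.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the first i chars of `a` and the first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];
    for i in 1..=a.len() {
        curr[0] = i;
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1) // deletion
                .min(curr[j - 1] + 1) // insertion
                .min(prev[j - 1] + cost); // substitution or match
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    // After the final swap, `prev` holds the last computed row.
    prev[b.len()]
}

/// Hypothetical helper: every known name whose distance to `candidate` is
/// strictly below `max_exclusive`, excluding exact matches (distance 0).
fn similar_names<'a>(
    candidate: &str,
    known: &[&'a str],
    max_exclusive: usize,
) -> Vec<(&'a str, usize)> {
    known
        .iter()
        .map(|&name| (name, levenshtein(candidate, name)))
        .filter(|&(_, d)| d > 0 && d < max_exclusive)
        .collect()
}
```

For example, `similar_names("serd", &["serde", "tokio", "rand"], 3)` keeps only `("serde", 1)`, since `rand` is 3 edits away and `tokio` 5.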
Because of #159, I've been hesitant to look at this via crates.io search. I agree, though, that we will have to adjust the distances based on name length.
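One way to make the threshold depend on name length is a simple lookup keyed on the popular crate's name length. This is only a sketch: `max_distance_for`, `is_suspicious`, and especially the cut-off values are hypothetical placeholders that would have to be tuned against real data.

```rust
/// Hypothetical length-aware threshold: the largest edit distance at which a
/// new name still counts as suspiciously close to `popular`.
/// The cut-offs are illustrative placeholders, not tuned values.
fn max_distance_for(popular: &str) -> usize {
    match popular.chars().count() {
        // Short names: a larger distance would match many unrelated names.
        0..=4 => 1,
        5..=8 => 2,
        // Longer names can absorb more edits and still look alike.
        _ => 3,
    }
}

/// True if `distance` (computed elsewhere, e.g. with Levenshtein) is within
/// the threshold for `popular`. Exact matches (distance 0) are not flagged.
fn is_suspicious(popular: &str, distance: usize) -> bool {
    distance > 0 && distance <= max_distance_for(popular)
}
```

A relative measure (distance divided by name length) would be an alternative design with the same intent.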
This wasn't done via the search. I got the list of all package names from the crates.io-index, and the 50 most popular crates along with their download counts from the API. After discussing typo-squatting with some friends, I think it would be sufficient to flag any crate whose name is within a Levenshtein distance of 1 of a popular crate's name.
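With the threshold fixed at a distance of 1, the full dynamic-programming table is unnecessary: a single linear scan can decide whether two names differ by exactly one insertion, deletion, or substitution. A minimal sketch, where `is_one_edit_apart` and `flag_against_popular` are hypothetical names and the list of popular names is supplied by the caller:

```rust
/// True if `a` and `b` are exactly one insertion, deletion, or substitution
/// apart (Levenshtein distance == 1), checked in linear time.
fn is_one_edit_apart(a: &str, b: &str) -> bool {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // Make `a` the shorter (or equal-length) name.
    let (a, b) = if a.len() <= b.len() { (a, b) } else { (b, a) };
    if b.len() - a.len() > 1 {
        return false;
    }
    // Skip the common prefix.
    let mut i = 0;
    while i < a.len() && a[i] == b[i] {
        i += 1;
    }
    if i == a.len() {
        // `a` is a prefix of `b`: one edit only if `b` has one extra char
        // (identical names are distance 0 and are not flagged).
        return b.len() == a.len() + 1;
    }
    if a.len() == b.len() {
        // One substitution at position i; everything after must match.
        a[i + 1..] == b[i + 1..]
    } else {
        // One insertion into `a` at position i; the rest must line up shifted by one.
        a[i..] == b[i + 1..]
    }
}

/// Hypothetical helper: the popular names a newly published name is one edit away from.
fn flag_against_popular<'a>(new_name: &str, popular: &[&'a str]) -> Vec<&'a str> {
    popular
        .iter()
        .copied()
        .filter(|p| is_one_edit_apart(new_name, p))
        .collect()
}
```

Note that swapping two adjacent characters (e.g. `serde` vs `sedre`) is distance 2 under plain Levenshtein, so this check does not catch transpositions; the Damerau-Levenshtein variant counts a transposition as a single edit.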
I too have some prior work on this and would love to be involved in moving this forward! I was starting to research adding a typo check to cargo-edit. It would be convenient if there were an API for getting the possible typos from crates.io. It would also be nice if they appeared prominently in the search results. For a good, but non-malicious, example, I think […] Perhaps a link from each crate's page […]
Resurrecting this thread in light of recent events. I have a proposed solution that is a bit of a mix of @TheDan64's points one and two. Proposed solution: whenever a new crate is published on crates.io, check whether a similarly named crate already exists, using Levenshtein distance as mentioned above. If one does, perform a basic code comparison, and if the code is substantially similar:
The Levenshtein distance parameters could be tuned as needed to keep the number of code comparisons manageable. Also, the relative popularity of a crate may need to be taken into consideration, both in terms of risk and in terms of prioritization for the Rust Security Response WG. I was originally thinking this should just be a […]
Links: […]
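The proposal leaves the "basic code comparison" unspecified. One simple possibility, swapped in here purely for illustration, is Jaccard similarity over the sets of identifier-like tokens in two sources. The tokenizer, `looks_like_copy`, and the 0.8 threshold are all assumptions, and a real check would have to compare whole crates rather than single strings.

```rust
use std::collections::HashSet;

/// Crude tokenizer: splits on anything that is not alphanumeric or `_`.
/// It ignores string literals, comments, and formatting entirely.
fn tokens(src: &str) -> HashSet<&str> {
    src.split(|c: char| !(c.is_alphanumeric() || c == '_'))
        .filter(|t| !t.is_empty())
        .collect()
}

/// Jaccard similarity |A ∩ B| / |A ∪ B| of the two token sets, in [0, 1].
fn jaccard(a: &str, b: &str) -> f64 {
    let (ta, tb) = (tokens(a), tokens(b));
    let union = ta.union(&tb).count();
    if union == 0 {
        return 1.0; // two token-free sources are trivially identical
    }
    ta.intersection(&tb).count() as f64 / union as f64
}

/// Hypothetical cut-off for "substantially similar"; would need tuning.
const SIMILARITY_THRESHOLD: f64 = 0.8;

fn looks_like_copy(new_src: &str, existing_src: &str) -> bool {
    jaccard(new_src, existing_src) >= SIMILARITY_THRESHOLD
}
```

Because sets ignore order and repetition, a lightly edited fork (say, one extra `+ 1`) still scores high, which is the case the proposal is concerned with.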
I like the proposal, except that I worry the warning won't be seen by most people, since it depends on the use of the non-built-in […]. Ideally, a warning could be printed by something within […].
Also note that if the manual review approach is taken, every version would need to be reviewed. Otherwise the protection is easy to evade: upload the initial release of a typo-squatted crate with a small bugfix (so it looks like you just needed to publish a fork with the fix), then, once it passes security review, publish an update with the malicious code.
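The per-version point can be made concrete with review records keyed by crate name and version together, so that approving one release says nothing about the next. This hypothetical `ReviewLog` is only a sketch of the bookkeeping, not an existing crates.io structure.

```rust
use std::collections::HashSet;

/// Hypothetical review tracker: approvals are stored per (crate, version)
/// pair rather than per crate.
#[derive(Default)]
struct ReviewLog {
    approved: HashSet<(String, String)>,
}

impl ReviewLog {
    /// Record that one specific release of a flagged crate passed review.
    fn approve(&mut self, krate: &str, version: &str) {
        self.approved.insert((krate.to_string(), version.to_string()));
    }

    /// A release counts as cleared only if that exact version was reviewed.
    fn is_cleared(&self, krate: &str, version: &str) -> bool {
        self.approved
            .contains(&(krate.to_string(), version.to_string()))
    }
}
```

Under this scheme, approving `0.1.0` of a flagged crate leaves `0.1.1` uncleared until it is reviewed too.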
Yeah, I very much agree that it will be difficult to cover all workflows. And that is another good point: there may need to be some form of ongoing checks.
We should also compare notes with other communities that have tried it in the past or have it now.
That document was compiled by the OpenSSF Working Group on Securing Software Repositories, so when we have a proposal, we can ask for people's input there.
We integrated https://github.com/rustfoundation/typomania last year and are expanding its integration in the near future. I guess this means the original issue is resolved :)
When a new crate's name is within some small edit distance of an existing crate's name, send an email to help@crates.io with a link to the new crate and a link to the existing crate whose name it is close to?
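A minimal sketch of the alert described here, assuming the name match has already been found. `TyposquatAlert` and `build_alert` are hypothetical; the message is only assembled, not sent, and the links use the public `https://crates.io/crates/<name>` page format.

```rust
/// Hypothetical alert payload. Actually delivering it (SMTP, a job queue,
/// etc.) is out of scope for this sketch.
#[derive(Debug, PartialEq)]
struct TyposquatAlert {
    to: String,
    subject: String,
    body: String,
}

/// Public crates.io page for a crate.
fn crate_url(name: &str) -> String {
    format!("https://crates.io/crates/{}", name)
}

/// Build the alert: a link to the new crate and a link to the existing
/// crate whose name it is close to, plus the measured edit distance.
fn build_alert(new_crate: &str, similar_to: &str, distance: usize) -> TyposquatAlert {
    TyposquatAlert {
        to: "help@crates.io".to_string(),
        subject: format!(
            "Possible typo-squat: `{}` resembles `{}`",
            new_crate, similar_to
        ),
        body: format!(
            "New crate: {}\nSimilar existing crate: {}\nEdit distance: {}\n",
            crate_url(new_crate),
            crate_url(similar_to),
            distance
        ),
    }
}
```

Keeping construction separate from delivery makes it easy to log or batch alerts first, which fits the "notify before taking drastic measures" approach discussed in the thread.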