Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate homepage URLs for repositories #1438

Draft
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

jdno
Copy link
Member

@jdno jdno commented Apr 24, 2024

Repositories can be configured with a homepage, which is prominently featured on GitHub as a link. For repositories under the rust-lang organization, we want to make sure that those links only point to domains that are explicitly allowed. Ideally, only domains owned and operated by the Rust project itself will be whitelisted.

The risk with other domains is that they might expire silently and get taken over by malicious actors, who can then host phishing campaigns or malware on sites "advertised" by the Rust project.

An initial selection of domains has been added to the allowlist for homepage URLs. The domains are either owned and operated by the infra-team or belong to GitHub.

Repositories can be configured with a homepage, which is prominently
featured on GitHub as a link. For repositories under the `rust-lang`
organization, we want to make sure that those links only point to
domains that are explicitly allowed. Ideally, only domains owned and
operated by the Rust project itself will be whitelisted.

The risk with other domains is that they might expire silently and get
taken over by malicious actors, who can then host phishing campaigns or
malware on sites "advertised" by the Rust project.
An initial selection of domains has been added to the allowlist for
homepage URLs. The domains are either owned and operated by the
infra-team, or belong to GitHub.
@jdno
Copy link
Member Author

jdno commented Apr 24, 2024

Copy link
Contributor

@Kobzol Kobzol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, left one comment.

if let Some(homepage) = &repo.homepage {
if !allowed_domains.iter().any(|domain| domain.is_match(homepage)) {
errors.push(format!(
"homepage URL for {}/{} is not on an allowed domain: {}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this could print the list of allowed domains to make debugging faster?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm I'm not sure if that is helpful in the output. What I haven't done yet is document the allowlist in the schema and point to the configuration file. I think that should hopefully be enough for folks to quickly find the list.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, fair enough, if it's documented than that should be enough.

config.toml Outdated
"https://areweasyncyet.rs/?.*",
"https://crates.io/?.*",
"https://docs.rs/?.*",
"https://github.com/rust-lang/rust/issues/35437",
Copy link
Contributor

@apiraino apiraino Apr 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious: why also the URL of this github issue? :)

.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because someone set it as the URL on a repo. 😅 I don't want to allow all of GitHub with something like https://github.com/?.*, because that could be abused with typosquatting. But we can change this to allow anything under the rust-lang organization with a rule like http://github.com/rust-lang/?.*.

@apiraino
Copy link
Contributor

@jdno since this patch in in draft, I assume this list is still incomplete, correct?

@jdno
Copy link
Member Author

jdno commented Apr 25, 2024

@jdno since this patch in in draft, I assume this list is still incomplete, correct?

Yes, there are three URLs that are currently missing from the list. As far as I can tell, they are not owned/managed by the infra-team:

[ERROR rust_team::validate] validation error: homepage URL for rust-lang/rustlings is not on an allowed domain: https://rustlings.cool
[ERROR rust_team::validate] validation error: homepage URL for rust-lang/this-week-in-rust is not on an allowed domain: https://this-week-in-rust.org/
[ERROR rust_team::validate] validation error: homepage URL for rust-lang/wg-allocators is not on an allowed domain: http://bit.ly/hello-wg-allocators

@apiraino
Copy link
Contributor

Yes, there are three URLs that are currently missing from the list. As far as I can tell, they are not owned/managed by the infra-team:

rustup.rs also? I didn't check them all, I'm sure you will find many more.

That bit.ly URL is ... ugh 😅

@jdno
Copy link
Member Author

jdno commented Apr 25, 2024

rustup.rs also? I didn't check them all, I'm sure you will find many more.

I didn't add all the domains that we own. Instead, I looked at the links that are currently set as a homepage and added most of them to the allowlist. I'm not sure if there's a lot of value to preemptively add all our domains to the list or try to keep them in sync as we add more. I think it's easier to add them on a case-by-case basis.

@@ -18,6 +18,18 @@ allowed-github-orgs = [
"rust-analyzer",
]

allowed-domains = [
"https://(.*)rust-lang.org/?.*",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"https://(.*)rust-lang.org/?.*",
"https://(.*)\\.rust-lang.org/?.*",

instead of all of this regex, would it make more sense to parse the URLs and check whether the host part ends with a list of allowed host suffixes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was the first approach that I considered, but then dropped when I came across the link to the GitHub issue and the bit.ly link. There are probably valid reasons to link to domains that we don't own, but in that case I would only want to allow specific URLs and not all of bit.ly or GitHub.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants