Consider recalibrating how bits are divided in Span #413

Open
dtolnay opened this issue Oct 10, 2023 · 3 comments
Owner

dtolnay commented Oct 10, 2023

Currently fallback spans store a pair of 32-bit low and high character indices.

proc-macro2/src/fallback.rs, lines 491 to 496 in fecb02d:

pub(crate) struct Span {
    #[cfg(span_locations)]
    pub(crate) lo: u32,
    #[cfg(span_locations)]
    pub(crate) hi: u32,
}

A span in which lo > hi is malformed, so right off the bat, approximately half of possible Span bit patterns are wasted.
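To make the "approximately half" figure concrete, here is a small sketch that counts well-formed pairs exhaustively for a narrow field width. The function name `count_well_formed` is made up for this illustration and is not part of proc-macro2. For fields that can each hold n values, the count works out to n(n+1)/2 of n² patterns, which is just over half.

```rust
/// Counts (lo, hi) pairs with lo <= hi when both fields are `bits` wide.
/// Returns (well_formed, total). Hypothetical helper for illustration only;
/// keep `bits` small, since the loop is quadratic in 2^bits.
fn count_well_formed(bits: u32) -> (u64, u64) {
    let n: u64 = 1 << bits; // number of distinct values per field
    let mut well_formed = 0u64;
    for lo in 0..n {
        for hi in 0..n {
            if lo <= hi {
                well_formed += 1;
            }
        }
    }
    (well_formed, n * n)
}

fn main() {
    let (ok, total) = count_well_formed(8);
    println!("{ok} of {total} bit patterns are well formed");
}
```

The same ratio holds for the real 32-bit fields; only the exhaustive loop would be impractical there.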

Separately, tokens are usually small compared to the total amount of input parsed by a thread. If we switch to storing lo and hi - lo instead of lo and hi, then an even split of 32 bits each may not be the wisest allocation. For example, we could decide to give 36 bits to lo (supporting 64 GB input size) and 28 bits to hi - lo (limiting token size to 256 MB). Or some other uneven split.
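As a rough sketch of the uneven split described above, the following packs a 36-bit lo and a 28-bit hi - lo into one u64. The type `PackedSpan` and its methods are assumptions made for this illustration, not proc-macro2's actual representation, and the 36/28 split is only one of the possible allocations.

```rust
// Hypothetical layout: upper 36 bits hold `lo`, lower 28 bits hold `hi - lo`.
const LEN_BITS: u32 = 28;
const LO_BITS: u32 = 64 - LEN_BITS;
const LEN_MASK: u64 = (1 << LEN_BITS) - 1;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct PackedSpan(u64);

impl PackedSpan {
    /// Returns None if lo > hi, if lo needs more than 36 bits,
    /// or if the token length needs more than 28 bits.
    fn new(lo: u64, hi: u64) -> Option<Self> {
        let len = hi.checked_sub(lo)?;
        if (lo >> LO_BITS) != 0 || len > LEN_MASK {
            return None;
        }
        Some(PackedSpan((lo << LEN_BITS) | len))
    }

    fn lo(self) -> u64 {
        self.0 >> LEN_BITS
    }

    fn hi(self) -> u64 {
        self.lo() + (self.0 & LEN_MASK)
    }
}

fn main() {
    // An offset beyond u32::MAX, which the current lo/hi pair cannot store.
    let span = PackedSpan::new(5_000_000_000, 5_000_000_010).unwrap();
    println!("lo = {}, hi = {}", span.lo(), span.hi());
}
```

Because the length is stored instead of hi, every bit pattern decodes to a span with lo <= hi, so none are wasted on malformed spans.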

Owner Author

dtolnay commented Oct 10, 2023

Rustc enforces a file size limit of 4 GB, so a token cannot be bigger than that.

use std::fs::File;
use std::io::Write as _;

fn main() {
    let buf = vec![b' '; 1024 * 1024];
    let mut file = File::create("spanoverflow.rs").unwrap();
    file.write_all(b"fn main() {\n").unwrap();
    for _ in 0..4100 {
        file.write_all(&buf).unwrap();
    }
    file.write_all(b"}\n").unwrap();
}

$ ls -lh spanoverflow.rs
-rw-r--r-- 1 dtolnay users 4.1G Oct  9 18:41 spanoverflow.rs

$ rustc spanoverflow.rs
fatal error: rustc does not support files larger than 4GB

Owner Author

dtolnay commented Oct 10, 2023

There appears to be no limit on the total amount of text parsed by rustc, even though its internal representation for BytePos is 32 bits.

https://github.com/rust-lang/rust/blob/1.73.0/compiler/rustc_span/src/lib.rs#L2010-L2014

If you parse more than 2^32 bytes, it overflows and you get bogus spans referring to the wrong files.

use std::fs::File;
use std::io::Write as _;

fn main() {
    let buf = vec![b' '; 1024 * 1024];

    let mut file = File::create("spanoverflow.rs").unwrap();
    file.write_all(b"mod module;\n").unwrap();
    for _ in 0..2050 {
        file.write_all(&buf).unwrap();
    }
    file.write_all(b"fn main() {}\n").unwrap();

    let mut file = File::create("module.rs").unwrap();
    for _ in 0..2050 {
        file.write_all(&buf).unwrap();
    }
    file.write_all(b"pub fn f() {}\n").unwrap();
}

According to rustc -Zunpretty=ast-tree,expanded spanoverflow.rs, this is the location of the f function (wrong):

                        Item {
                            attrs: [],
                            id: NodeId(10),
                            span: spanoverflow.rs:2:4194319: 2:4194332 (#0),
                            ident: f#0,
                            kind: Fn(

and this is the location of main (wrong):

        Item {
            attrs: [],
            id: NodeId(12),
            span: /home/dtolnay/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/string.rs:2952:2144473580: 2952:2144473592 (#0),
            ident: main#0,
            kind: Fn(

The correct locations would be module.rs and spanoverflow.rs respectively, which you get if the files do not overflow 2^32 bytes total size.
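The wraparound can be modeled in a few lines. This is a simplified sketch, not rustc's actual source map code: it assumes every file is assigned a contiguous range in one global offset space, that module.rs is placed directly after spanoverflow.rs, and that positions are truncated to 32 bits. The helper names `to_byte_pos` and `to_byte_pos_checked` are hypothetical.

```rust
const MIB: u64 = 1024 * 1024;

/// Models a 32-bit position by keeping only the low 32 bits (silent wraparound).
fn to_byte_pos(global_offset: u64) -> u32 {
    global_offset as u32
}

/// Alternative that reports overflow instead of wrapping.
fn to_byte_pos_checked(global_offset: u64) -> Option<u32> {
    u32::try_from(global_offset).ok()
}

fn main() {
    // Sizes of the generated files from the reproduction above.
    let spanoverflow_len = 12 + 2050 * MIB + 13; // "mod module;\n" + padding + "fn main() {}\n"
    let module_len = 2050 * MIB + 14; // padding + "pub fn f() {}\n"
    println!("total bytes: {}", spanoverflow_len + module_len);

    // Assumed layout: module.rs starts right after spanoverflow.rs,
    // and `pub fn f` begins after the padding in module.rs.
    let f_offset = spanoverflow_len + 2050 * MIB;

    println!("true offset:    {f_offset}");
    println!("wrapped offset: {}", to_byte_pos(f_offset));
    println!("checked:        {:?}", to_byte_pos_checked(f_offset));
}
```

Under these assumptions the wrapped offset is a small number that falls inside the range of an earlier file, which is one way a span can end up attributed to the wrong file.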

Owner Author

dtolnay commented Oct 10, 2023

For scale, about 200 GB of Rust code is currently published on crates.io. The newest version of every crate alone adds up to 16 GB. So a workload that involves parsing this, even on multiple threads, would currently overflow the 32-bit positions.
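To relate these sizes to bit widths, the sketch below computes how many bits are needed to represent a byte count. It assumes the figures are decimal gigabytes (10^9 bytes); the helper `bits_needed` is made up for this illustration.

```rust
/// Number of bits needed to represent `n` as an unsigned integer.
/// Hypothetical helper; returns 0 for n == 0.
fn bits_needed(n: u64) -> u32 {
    64 - n.leading_zeros()
}

fn main() {
    // Assumption: "GB" in the figures above means 10^9 bytes.
    let newest_versions: u64 = 16_000_000_000;
    let all_versions: u64 = 200_000_000_000;

    println!("u32::MAX:        {} bits", bits_needed(u32::MAX as u64));
    println!("newest versions: {} bits", bits_needed(newest_versions));
    println!("all versions:    {} bits", bits_needed(all_versions));
}
```

With these inputs, 16 GB needs 34 bits and 200 GB needs 38 bits, so a 36-bit lo would cover the newest-versions workload in a single thread but not the full 200 GB.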
