Mitigate `cpu_usage()` from returning NaN on Macs #1236

Hoverbear · 2024-03-27T19:45:47Z

Describe the bug

Using the current latest checkout of this repo (or any 0.30.x version) on a Mac can result in cpu.cpu_usage() reporting a NaN. This is primarily caused by this code:

sysinfo/src/unix/apple/cpu.rs

Lines 239 to 244 in b7d7f31

    
               in_use = get_in_use(cpu_info, offset).saturating_sub(get_in_use(old_cpu_info, offset)); 
        
               total = in_use.saturating_add( 
        
                   get_idle(cpu_info, offset).saturating_sub(get_idle(old_cpu_info, offset)) as _, 
        
               ); 
        
           } 
        
           in_use as f32 / total as f32 * 100.

If refresh_cpu() is called in rapid succession the collected data can be the same, resulting in total being 0, since in_use is divided by total, this creates a divide by zero situation and can return a NaN.

It can be worked around by waiting sysinfo::MINIMUM_CPU_UPDATE_INTERVAL between calls to refresh_cpu(), as the data will (hopefully) no longer be the same.

Some platforms, such as Linux, do this interval check internally:

sysinfo/src/unix/linux/cpu.rs

Lines 83 to 85 in b7d7f31

    
           // If the last CPU usage update is too close (less than `MINIMUM_CPU_UPDATE_INTERVAL`), 
        
           // we don't want to update CPUs times. 
        
           if need_cpu_usage_update {

This PR takes the same strategy as Linux and enforces an interval internally.

I can confirm that as a result of this change, cargo test passess successfully on my aarch64 Mac.

To Reproduce

I used a aarch64 Mac, but it should be reproducable on any Mac. It was also observed on during local Rust builds with build.metrics enabled.

In order to reproduce this outside of the Rust repo, I did a bit of light refactoring of compute_usage_of_cpu:

// src/unix/apple/cpu.rs
pub(crate) fn compute_usage_of_cpu(proc_: &Cpu, cpu_info: *mut i32, offset: isize) -> f32 {
    let old_cpu_info = proc_.inner.data().cpu_info.0;
    let in_use;
    let total;

    // In case we are initializing cpus, there is no "old value" yet.
    if old_cpu_info == cpu_info {
        in_use = get_in_use(cpu_info, offset);
        eprintln !("in_use {in_use}");
        let idle = get_idle(cpu_info, offset);
        total = in_use.saturating_add(idle as _);
    } else {
        in_use = get_in_use(cpu_info, offset).saturating_sub(get_in_use(old_cpu_info, offset));
        eprintln !("in_use {in_use}");
        let idle = get_idle(cpu_info, offset);
        let old_idle = get_idle(old_cpu_info, offset);
        eprintln!("idle {idle}, old_idle {old_idle}");
        total = in_use.saturating_add(
            idle.saturating_sub(old_idle) as _,
        );
    }
    eprintln !("total {total}");
    in_use as f32 / total as f32 * 100.
}

Then I updated the the test_cpu test to rapidly call refresh_cpu() twice, and assert that none of the usage was NaN:

// tests/cpu.rs
#[test]
fn test_cpu() {
    let mut s = sysinfo::System::new();
    assert!(s.cpus().is_empty());

    if !sysinfo::IS_SUPPORTED_SYSTEM {
        return;
    }

    s.refresh_cpu();
    assert!(!s.cpus().is_empty());
    // std::thread::sleep(std::time::Duration::from_millis(200));
    s.refresh_cpu();

    assert!(s.cpus().iter().any(|c| !c.cpu_usage().is_nan()));

    let mut s = sysinfo::System::new_all();
    assert!(!s.cpus().is_empty());

    assert!(!s.cpus()[0].brand().chars().any(|c| c == '\0'));

    if !cfg!(target_os = "freebsd") {
        // This information is currently not retrieved on freebsd...
        assert!(s.cpus().iter().any(|c| !c.brand().is_empty()));
    }
    assert!(s.cpus().iter().any(|c| !c.vendor_id().is_empty()));
}

With that done, I was able to reproduce the issue via cargo test test_cpu -- --nocapture:

~/git/GuillameGomez/sysinfo> cargo test test_cpu -- --nocapture
# snipped

     Running tests/cpu.rs (target/debug/deps/cpu-02526d4c3ab91678)

running 1 test
in_use 5690012
total 24318088
in_use 5149645
total 24419660
in_use 3795005
total 24553864
in_use 2914872
total 24646964
in_use 2465127
total 24707921
in_use 2097229
total 24736675
in_use 878747
total 24836498
in_use 882113
total 24842076
in_use 874508
total 24843371
in_use 883108
total 24843382
in_use 886023
total 24844247
in_use 870234
total 24845131
in_use 0
idle 18628076, old_idle 18628076
total 0
in_use 0
idle 19270015, old_idle 19270015
total 0
in_use 0
idle 20758859, old_idle 20758859
total 0
in_use 0
idle 21732092, old_idle 21732092
total 0
in_use 0
idle 22242794, old_idle 22242794
total 0
in_use 0
idle 22639446, old_idle 22639446
total 0
in_use 0
idle 23957751, old_idle 23957751
total 0
in_use 0
idle 23959963, old_idle 23959963
total 0
in_use 0
idle 23968863, old_idle 23968863
total 0
in_use 0
idle 23960274, old_idle 23960274
total 0
in_use 0
idle 23958224, old_idle 23958224
total 0
in_use 0
idle 23974897, old_idle 23974897
total 0
thread 'test_cpu' panicked at tests/cpu.rs:19:5:
assertion failed: s.cpus().iter().any(|c| !c.cpu_usage().is_nan())
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
test test_cpu ... FAILED

failures:

failures:
    test_cpu

I confirmed that adding the delay resolves the issue.

Hoverbear · 2024-03-27T19:50:13Z

I believe the clippy failure to be unrelated to this PR, please let me know if I should take action to resolve it.

src/unix/apple/cpu.rs

GuillaumeGomez · 2024-03-27T21:55:24Z

Hi @Hoverbear! o/

First of all: thanks a lot for the PR, the detailed explanations of the issue and for adding test. It's really awesome. :)

Just one nit and I merge.

As for the clippy lint, it's not from your code, just last nightly clippy version. Can you add a #[allow()] attribute for it where it is triggered please? If you don't want to it's fine, I'll send a fix for it.

Hoverbear · 2024-03-28T04:22:17Z

So good to bump into you again, my friend. :) Really appreciate this crate and your work on it!

Hoverbear · 2024-03-28T05:10:33Z

I renamed this issue, I believe cpu_usage() may still be able to return NaN after this patch, it will simply be much more uncommon.

If the measurements of the total = in_use + (idle - old_idle) result in 0 (for example, when the measurement was identical to the last measurement) I believe a NaN could still be returned.

GuillaumeGomez · 2024-03-28T09:12:28Z

Could you not simply add a check and if it's 0, just return 0?

GuillaumeGomez · 2024-03-28T09:18:41Z

Forgot to precise: in addition to your add (which I think is a very good idea).

Hoverbear · 2024-03-28T13:53:51Z

Hmmmm... I believe that reporting zero would be incorrect, no? It's possible that say, in_use + new_idle - old_idle is zero but that actual usage the same as the last usage?

Let's say:

let old_in_use = 200;
let new_in_use = 200;
let old_idle = 4000;
let new_idle = 4000;
// Results in
let in_use = (new_in_use - old_in_use); // 0
let total = (new_in_use - old_in_use) + (new_idle - old_idle); // 0

Then we'd go divide by total and get a NaN again! Shucks. Returning zero here isn't correct either, since the usage is indeed not zero.

Therefore, to be safe here, we can check to see if the measurement is the same, and return the just the straight measurement instead of the difference if so.

let new_in_use = get_in_use(cpu_info, offset);
let old_in_use = get_in_use(old_cpu_info, offset);

let new_idle = get_idle(cpu_info, offset);
let old_idle = get_idle(old_cpu_info, offset);

// Check the old measurement and new one differ before trying to calculate the difference
// Avoid `total` becoming a zero since `in_use / 0 == NaN`
if new_in_use == old_in_use && new_idle == old_idle {
    in_use = new_in_use.saturating_sub(old_in_use);
    total = new_in_use.saturating_add(new_idle as _);
} else {
    in_use = new_in_use.saturating_sub(old_in_use);
    total = in_use.saturating_add(new_idle.saturating_sub(old_idle) as _);
}

Then, even still, it's possible to get a NaN! Assuming that:

let old_in_use = 400;  // It's bigger!
let new_in_use = 200;  // It's smaller!
let old_idle = 4000;
let new_idle = 4000;   // Unchanged because... CPU Frequency scaled(?)
// Results in
let in_use = (new_in_use - old_in_use); // 0
let total = (new_in_use - old_in_use) + (new_idle - old_idle); // 0

Heck! In this case we now need to do even more work, and how do we provide something accurate? 😵

Instead, I suggest we set in_use and idle, then calculate in_use / (idle + in_use)? I believe that would be more frequently correct...

let in_use;
let idle;

// In case we are initializing cpus, there is no "old value" yet.
if old_cpu_info == cpu_info {
    in_use = get_in_use(cpu_info, offset);
    idle = get_idle(cpu_info, offset);
} else {
    let new_in_use = get_in_use(cpu_info, offset);
    let old_in_use = get_in_use(old_cpu_info, offset);

    let new_idle = get_idle(cpu_info, offset);
    let old_idle = get_idle(old_cpu_info, offset);

    in_use = new_in_use.saturating_sub(old_in_use);
    idle = new_idle.saturating_sub(old_idle) as _;
}
let total = in_use.saturating_add(idle as _);
let usage = (in_use as f32 / total as f32) * 100.;
if usage.is_nan() {
    // If divided by zero, avoid returning a NaN
    0.
} else {
    usage
}

Please let me know if my brain became momentarily vacant and there would be a more appropriate fix?

Here's a log from a run using that code:

~/git/GuillameGomez/sysinfo> cargo run --example simple                                                                                                                                                                                                                  2024-03-28 06:53:03 AM
    Finished dev [unoptimized + debuginfo] target(s) in 0.01s
     Running `target/debug/examples/simple`
Getting system information...
Done.
To get the commands' list, enter 'help'.
> refresh_cpu
Refreshing CPUs...
Done.
> cpus
number of physical cores: 12
total CPU usage: 3.2543619%
Cpu { name: "1", CPU usage: 12.703101, frequency: 4056, vendor ID: "Apple", brand: "Apple M3 Pro" }
Cpu { name: "2", CPU usage: 10.895523, frequency: 4056, vendor ID: "Apple", brand: "Apple M3 Pro" }
Cpu { name: "3", CPU usage: 5.8479533, frequency: 4056, vendor ID: "Apple", brand: "Apple M3 Pro" }
Cpu { name: "4", CPU usage: 3.9358602, frequency: 4056, vendor ID: "Apple", brand: "Apple M3 Pro" }
Cpu { name: "5", CPU usage: 2.1802325, frequency: 4056, vendor ID: "Apple", brand: "Apple M3 Pro" }
Cpu { name: "6", CPU usage: 1.1678832, frequency: 4056, vendor ID: "Apple", brand: "Apple M3 Pro" }
Cpu { name: "7", CPU usage: 0.5805515, frequency: 4056, vendor ID: "Apple", brand: "Apple M3 Pro" }
Cpu { name: "8", CPU usage: 0.14492755, frequency: 4056, vendor ID: "Apple", brand: "Apple M3 Pro" }
Cpu { name: "9", CPU usage: 0.5805515, frequency: 4056, vendor ID: "Apple", brand: "Apple M3 Pro" }
Cpu { name: "10", CPU usage: 0.8708273, frequency: 4056, vendor ID: "Apple", brand: "Apple M3 Pro" }
Cpu { name: "11", CPU usage: 0.0, frequency: 4056, vendor ID: "Apple", brand: "Apple M3 Pro" }
Cpu { name: "12", CPU usage: 0.14492755, frequency: 4056, vendor ID: "Apple", brand: "Apple M3 Pro" }
> refresh_cpu
Refreshing CPUs...
Done.
> cpus
number of physical cores: 12
total CPU usage: 3.3380797%
Cpu { name: "1", CPU usage: 13.250883, frequency: 4056, vendor ID: "Apple", brand: "Apple M3 Pro" }
Cpu { name: "2", CPU usage: 10.229277, frequency: 4056, vendor ID: "Apple", brand: "Apple M3 Pro" }
Cpu { name: "3", CPU usage: 8.391608, frequency: 4056, vendor ID: "Apple", brand: "Apple M3 Pro" }
Cpu { name: "4", CPU usage: 2.7972028, frequency: 4056, vendor ID: "Apple", brand: "Apple M3 Pro" }
Cpu { name: "5", CPU usage: 1.9130435, frequency: 4056, vendor ID: "Apple", brand: "Apple M3 Pro" }
Cpu { name: "6", CPU usage: 0.8695652, frequency: 4056, vendor ID: "Apple", brand: "Apple M3 Pro" }
Cpu { name: "7", CPU usage: 1.0416667, frequency: 4056, vendor ID: "Apple", brand: "Apple M3 Pro" }
Cpu { name: "8", CPU usage: 0.0, frequency: 4056, vendor ID: "Apple", brand: "Apple M3 Pro" }
Cpu { name: "9", CPU usage: 0.34843206, frequency: 4056, vendor ID: "Apple", brand: "Apple M3 Pro" }
Cpu { name: "10", CPU usage: 1.2152778, frequency: 4056, vendor ID: "Apple", brand: "Apple M3 Pro" }
Cpu { name: "11", CPU usage: 0.0, frequency: 4056, vendor ID: "Apple", brand: "Apple M3 Pro" }
Cpu { name: "12", CPU usage: 0.0, frequency: 4056, vendor ID: "Apple", brand: "Apple M3 Pro" }

GuillaumeGomez · 2024-03-28T21:39:54Z

I agree with you. New code looks better too. Thanks a lot for this!

…o-0.30.8, r=Nilstrieb Update sysinfo to 0.30.8 Fixes a Mac specific issue when using `metrics = true` in `config.toml`. ```config.toml # Collect information and statistics about the current build and writes it to # disk. Enabling this or not has no impact on the resulting build output. The # schema of the file generated by the build metrics feature is unstable, and # this is not intended to be used during local development. metrics = true ``` During repeated builds, as the generated `metrics.json` grew, eventually `refresh_cpu()` would be called in quick enough succession (specifically: under 200ms) that a divide by zero would occur, leading to a `NaN` which would not be serialized, then when the `metrics.json` was re-read it would fail to parse. That error looks like this (collected from Ferrocene's CI): ``` Compiling rustdoc-tool v0.0.0 (/Users/distiller/project/src/tools/rustdoc) Finished release [optimized] target(s) in 38.37s thread 'main' panicked at src/utils/metrics.rs:180:21: serde_json::from_slice::<JsonRoot>(&contents) failed with invalid type: null, expected f64 at line 1 column 9598 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace Build completed unsuccessfully in 0:00:40 Exited with code exit status 1 ``` Related: GuillaumeGomez/sysinfo#1236

Rollup merge of rust-lang#123378 - ferrocene:hoverbear/bump-sysinfo-to-0.30.8, r=Nilstrieb Update sysinfo to 0.30.8 Fixes a Mac specific issue when using `metrics = true` in `config.toml`. ```config.toml # Collect information and statistics about the current build and writes it to # disk. Enabling this or not has no impact on the resulting build output. The # schema of the file generated by the build metrics feature is unstable, and # this is not intended to be used during local development. metrics = true ``` During repeated builds, as the generated `metrics.json` grew, eventually `refresh_cpu()` would be called in quick enough succession (specifically: under 200ms) that a divide by zero would occur, leading to a `NaN` which would not be serialized, then when the `metrics.json` was re-read it would fail to parse. That error looks like this (collected from Ferrocene's CI): ``` Compiling rustdoc-tool v0.0.0 (/Users/distiller/project/src/tools/rustdoc) Finished release [optimized] target(s) in 38.37s thread 'main' panicked at src/utils/metrics.rs:180:21: serde_json::from_slice::<JsonRoot>(&contents) failed with invalid type: null, expected f64 at line 1 column 9598 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace Build completed unsuccessfully in 0:00:40 Exited with code exit status 1 ``` Related: GuillaumeGomez/sysinfo#1236

Prevent Mac cpu_usage() from returning NaN

3c6dcc7

Hoverbear force-pushed the hoverbear/sysinfo branch from 829f614 to 3c6dcc7 Compare March 27, 2024 19:47

GuillaumeGomez reviewed Mar 27, 2024

View reviewed changes

src/unix/apple/cpu.rs Outdated Show resolved Hide resolved

Fixup clippy & review nit

adcf3c6

Hoverbear changed the title ~~Prevent cpu_usage() from returning NaN on Macs~~ Mitigate cpu_usage() from returning NaN on Macs Mar 28, 2024

Fully prevent NaNs from cpu_usage() on Mac

d3708f2

GuillaumeGomez approved these changes Mar 28, 2024

View reviewed changes

GuillaumeGomez merged commit 93c1c15 into GuillaumeGomez:master Mar 28, 2024
66 of 67 checks passed

Hoverbear deleted the hoverbear/sysinfo branch March 28, 2024 23:20

Hoverbear mentioned this pull request Apr 2, 2024

Update sysinfo to 0.30.8 rust-lang/rust#123378

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Mitigate `cpu_usage()` from returning NaN on Macs #1236

Mitigate `cpu_usage()` from returning NaN on Macs #1236

Hoverbear commented Mar 27, 2024

Hoverbear commented Mar 27, 2024

GuillaumeGomez commented Mar 27, 2024

Hoverbear commented Mar 28, 2024

Hoverbear commented Mar 28, 2024

GuillaumeGomez commented Mar 28, 2024

GuillaumeGomez commented Mar 28, 2024

Hoverbear commented Mar 28, 2024 •

edited

GuillaumeGomez commented Mar 28, 2024

	in_use = get_in_use(cpu_info, offset).saturating_sub(get_in_use(old_cpu_info, offset));
	total = in_use.saturating_add(
	get_idle(cpu_info, offset).saturating_sub(get_idle(old_cpu_info, offset)) as _,
	);
	}
	in_use as f32 / total as f32 * 100.

	// If the last CPU usage update is too close (less than `MINIMUM_CPU_UPDATE_INTERVAL`),
	// we don't want to update CPUs times.
	if need_cpu_usage_update {

Mitigate cpu_usage() from returning NaN on Macs #1236

Mitigate cpu_usage() from returning NaN on Macs #1236

Conversation

Hoverbear commented Mar 27, 2024

Hoverbear commented Mar 27, 2024

GuillaumeGomez commented Mar 27, 2024

Hoverbear commented Mar 28, 2024

Hoverbear commented Mar 28, 2024

GuillaumeGomez commented Mar 28, 2024

GuillaumeGomez commented Mar 28, 2024

Hoverbear commented Mar 28, 2024 • edited

GuillaumeGomez commented Mar 28, 2024

Mitigate `cpu_usage()` from returning NaN on Macs #1236

Mitigate `cpu_usage()` from returning NaN on Macs #1236

Hoverbear commented Mar 28, 2024 •

edited