Latest Oryx Pro Firmware Update - Fan Usage/Noise Issue v:2021-07-20_93c2809 #241

Open
bartlebee13 opened this issue Sep 9, 2021 · 21 comments

Comments

@bartlebee13

I upgraded my oryx pro firmware to the 2021-07-20_93c2809 version today.

Now the fans start randomly humming and every so often make kind of a low frequency weird chugging click sound.

I have slack, discord, chrome, postman, sublime, and a single terminal window open with two external monitors plugged in on Nvidia Graphics + High Performance Mode.

comp info:

NAME="Pop!_OS"
VERSION="21.04"
ID=pop
ID_LIKE="ubuntu debian"
PRETTY_NAME="Pop!_OS 21.04"
VERSION_ID="21.04"
HOME_URL="https://pop.system76.com"
SUPPORT_URL="https://support.system76.com"
BUG_REPORT_URL="https://github.com/pop-os/pop/issues"
PRIVACY_POLICY_URL="https://system76.com/privacy"
VERSION_CODENAME=hirsute
UBUNTU_CODENAME=hirsute
LOGO=distributor-logo-pop-os

sensors output:

iwlwifi_1-virtual-0
Adapter: Virtual device
temp1:            N/A  

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +55.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:        +54.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:        +54.0°C  (high = +100.0°C, crit = +100.0°C)
Core 2:        +53.0°C  (high = +100.0°C, crit = +100.0°C)
Core 3:        +53.0°C  (high = +100.0°C, crit = +100.0°C)
Core 4:        +52.0°C  (high = +100.0°C, crit = +100.0°C)
Core 5:        +55.0°C  (high = +100.0°C, crit = +100.0°C)
Core 6:        +54.0°C  (high = +100.0°C, crit = +100.0°C)
Core 7:        +56.0°C  (high = +100.0°C, crit = +100.0°C)

nvme-pci-2700
Adapter: PCI adapter
Composite:    +36.9°C  (low  = -273.1°C, high = +84.8°C)
                       (crit = +84.8°C)
Sensor 1:     +36.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +35.9°C  (low  = -273.1°C, high = +65261.8°C)

pch_cometlake-virtual-0
Adapter: Virtual device
temp1:        +50.0°C  

system76_acpi-acpi-0
Adapter: ACPI interface
CPU fan:     1509 RPM
GPU fan:     1953 RPM
CPU temp:     +54.0°C  
GPU temp:     +54.0°C  

BAT0-acpi-0
Adapter: ACPI interface
in0:          12.83 V  
curr1:         0.00 A  

@bartlebee13
Author

Also the zoom problem that I described in a previous issue where zoom would cause fans to go crazy seems to be back with this firmware update

#163

@bartlebee13
Author

iwlwifi_1-virtual-0
Adapter: Virtual device
temp1:            N/A  

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +75.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:        +75.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:        +60.0°C  (high = +100.0°C, crit = +100.0°C)
Core 2:        +60.0°C  (high = +100.0°C, crit = +100.0°C)
Core 3:        +59.0°C  (high = +100.0°C, crit = +100.0°C)
Core 4:        +58.0°C  (high = +100.0°C, crit = +100.0°C)
Core 5:        +61.0°C  (high = +100.0°C, crit = +100.0°C)
Core 6:        +58.0°C  (high = +100.0°C, crit = +100.0°C)
Core 7:        +59.0°C  (high = +100.0°C, crit = +100.0°C)

nvme-pci-2700
Adapter: PCI adapter
Composite:    +39.9°C  (low  = -273.1°C, high = +84.8°C)
                       (crit = +84.8°C)
Sensor 1:     +39.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +42.9°C  (low  = -273.1°C, high = +65261.8°C)

pch_cometlake-virtual-0
Adapter: Virtual device
temp1:        +54.0°C  

system76_acpi-acpi-0
Adapter: ACPI interface
CPU fan:     3546 RPM
GPU fan:     3809 RPM
CPU temp:     +72.0°C  
GPU temp:     +53.0°C  

BAT0-acpi-0
Adapter: ACPI interface
in0:          12.87 V  
curr1:         0.00 A  

starting up vscode made the fans lift off into space 😄

@jackpot51
Member

Thanks for the information, and the sensors data. We'll take a look.

@sed-i

sed-i commented Sep 14, 2021

Another regression in this update is stop-start behavior of the fans: at <10% load avg, the fans start (low speed) for a few seconds, then stop, and then start again within 1-2 seconds. In terms of duty cycle, it is almost 100%. I suspect that, for long-term wear, it would be better to keep them running for longer at a lower speed or something of that sort.

@curiousercreative
Contributor

Someone comfortable flashing the EC could experiment with setting this line to 0: https://github.com/system76/ec/blob/master/src/board/system76/oryp7/board.mk#L32

@curiousercreative
Contributor

You can also try setting CFLAGS+=-DBOARD_HEATUP=20, which samples temperature over approximately 5 seconds rather than 1 (divide the value by 4 to get seconds).
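The divide-by-4 rule above can be written out as a quick calculation. This is only a sketch in Python for illustration (the EC firmware itself is not Python); the function names are hypothetical, and the rate of 4 readings per second is the approximate figure given in this thread, not a measured value.

```python
# Approximate number of temperature readings per second, as stated in this
# thread. Treat it as an assumption; the real rate may differ per board.
READS_PER_SECOND = 4


def window_seconds(samples, reads_per_second=READS_PER_SECOND):
    """Rough time span covered by a window of `samples` readings."""
    return samples / reads_per_second


def samples_for(seconds, reads_per_second=READS_PER_SECOND):
    """Window size to try for a desired time span, rounded to whole readings."""
    return round(seconds * reads_per_second)


print(window_seconds(20))  # BOARD_HEATUP=20
print(window_seconds(4))   # a window of 4 readings
print(samples_for(5))      # window size for roughly 5 seconds
```

Going the other way, `samples_for` gives a starting value to try if you have a target delay in mind.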

@MilesBHuff

MilesBHuff commented Oct 15, 2021

fwiw, this config has been working well for me since June on an oryp7: https://github.com/MilesBHuff/ec/blob/patch-1/src/board/system76/oryp7/board.mk#L31-L94.
(I've been running @curiousercreative's smoothing code etc from before it was officially released.)

It has the clicking issue when fans start up / stop, but it does a good job of not being extremely jarring during light, medium, and heavy loads. I'll give @curiousercreative's recommendations above a try at some point; but @curiousercreative: wasn't https://github.com/system76/ec/blob/master/src/board/system76/oryp7/board.mk#L32 added to mitigate the buzzing issue in the first place?

@bartlebee13
Author

the fans seem to start and stop a lot more lately.. any updates on this?

@Cadjoe

Cadjoe commented Nov 27, 2021

@MilesBHuff, if you have a clicking fan issue, you might have to replace the fans, or at least one of them. I had to do that with one of them in my oryp4, and the clicking stopped.

Now, I'm dealing with the 1172 issue. Unfortunately, I haven't yet found a concrete solution to it.

@MilesBHuff

MilesBHuff commented Nov 29, 2021

@Cadjoe

if you have a clicking fan issue, you might have to replace them or at least one of them.

The problem is, everyone who owns an oryp7 has the same clicking issue -- it's not feasible to tell every oryp7 owner to buy new fans if they don't want them to click. Hence, the software workaround.

@jacobgkau
Member

The problem is, everyone who owns an oryp7 has the same clicking issue

Our lab oryp7 is not clicking, which means there is at least one unit that is not affected (and almost certainly more.) Please do not make generalized statements without data to back them up.

@bartlebee13
Author

I think I've pinpointed the issue on my machine down to this scenario:

  • Oryx Pro with latest firmware and Pop!_OS 21.04
  • Nvidia Graphics Mode (w/ latest nvidia driver 470.86)
  • Two external displays: one via HDMI, one via USB-C

Basically if I unplug one of the monitors, the random spinning up and down of the fans every 10 seconds stops happening.

Otherwise this is the fan speed and it starts up and down every 5-10 seconds.
[image: fan speed]

@curiousercreative
Contributor

@barthvader13 do you have fractional scaling enabled? My Galago Pro will behave similarly whenever I scroll a webpage or really do anything when fractional scaling is enabled.

@MilesBHuff

@jacobgkau

Our lab oryp7 is not clicking, which means there is at least one unit that is not affected (and almost certainly more.) Please do not make generalized statements without data to back them up.

Oh! Sorry, I was under the impression this whole time that it was every oryp7 that clicked, and that was why the fan minimum was set to 20% (later 25%). Mine's clicked since I first got it.

@bartlebee13
Author

@barthvader13 do you have fractional scaling enabled? My Galago Pro will behave similarly whenever I scroll a webpage or really do anything when fractional scaling is enabled.

@curiousercreative I do not have that setting enabled

@r00t3g

r00t3g commented Dec 14, 2022

A year later, I have a related problem. I got the latest 4K Oryx Pro recently, and the fans run like mad almost all the time while the IDE is running:

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +81.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:        +55.0°C  (high = +100.0°C, crit = +100.0°C)
Core 4:        +51.0°C  (high = +100.0°C, crit = +100.0°C)
Core 8:        +66.0°C  (high = +100.0°C, crit = +100.0°C)
Core 12:       +81.0°C  (high = +100.0°C, crit = +100.0°C)
Core 16:       +59.0°C  (high = +100.0°C, crit = +100.0°C)
Core 20:       +65.0°C  (high = +100.0°C, crit = +100.0°C)
Core 24:       +50.0°C  (high = +100.0°C, crit = +100.0°C)
Core 25:       +50.0°C  (high = +100.0°C, crit = +100.0°C)
Core 26:       +50.0°C  (high = +100.0°C, crit = +100.0°C)
Core 27:       +50.0°C  (high = +100.0°C, crit = +100.0°C)
Core 28:       +53.0°C  (high = +100.0°C, crit = +100.0°C)
Core 29:       +53.0°C  (high = +100.0°C, crit = +100.0°C)
Core 30:       +53.0°C  (high = +100.0°C, crit = +100.0°C)
Core 31:       +53.0°C  (high = +100.0°C, crit = +100.0°C)

nvme-pci-0200
Adapter: PCI adapter
Composite:    +41.9°C  (low  = -273.1°C, high = +81.8°C)
                       (crit = +84.8°C)
Sensor 1:     +41.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +55.9°C  (low  = -273.1°C, high = +65261.8°C)

iwlwifi_2-virtual-0
Adapter: Virtual device
temp1:        +46.0°C  

system76_acpi-acpi-0
Adapter: ACPI interface
CPU fan:     5208 RPM
GPU fan:     5284 RPM
CPU temp:     +86.0°C  
GPU temp:      +0.0°C  

BAT0-acpi-0
Adapter: ACPI interface
in0:          12.72 V  
curr1:         0.00 A  

I see temperature spikes to 80-85 degrees on one core. Each spike barely lasts 0.1-0.3 seconds, but it makes the fans spin up to maximum speed (observed visually with watch -d -n0.1 sensors) and then back down a couple of seconds later.
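For anyone who wants numbers rather than eyeballing watch output, the text that sensors prints can be parsed into fan speeds and temperatures. The following is a minimal Python sketch, not a tested tool: the function name and regular expressions are my own, they assume the plain-text layout pasted in this thread, and labels repeated across chips (such as temp1) overwrite each other in the result.

```python
import re

# Matches lines such as "CPU fan:     5208 RPM".
FAN_RE = re.compile(r"^(?P<name>.+? fan):\s+(?P<rpm>\d+) RPM", re.MULTILINE)
# Matches lines such as "CPU temp:     +86.0°C" or "Core 12:       +81.0°C".
TEMP_RE = re.compile(
    r"^(?P<name>[^:\n]+):\s+\+(?P<temp>\d+(?:\.\d+)?)°C", re.MULTILINE
)


def parse_sensors(text):
    """Return (fans, temps) dicts keyed by the label printed by sensors.

    Labels that repeat across chips overwrite each other; this sketch
    does not track which chip a reading belongs to.
    """
    fans = {m["name"]: int(m["rpm"]) for m in FAN_RE.finditer(text)}
    temps = {m["name"]: float(m["temp"]) for m in TEMP_RE.finditer(text)}
    return fans, temps


# Sample copied from the output above.
SAMPLE = """system76_acpi-acpi-0
Adapter: ACPI interface
CPU fan:     5208 RPM
GPU fan:     5284 RPM
CPU temp:     +86.0°C
GPU temp:      +0.0°C
"""

fans, temps = parse_sensors(SAMPLE)
print(fans, temps)
```

To sample a live system, the same function could be fed the output of subprocess.run(["sensors"], capture_output=True, text=True).stdout in a loop with a short sleep, logging each result with a timestamp.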

This makes the laptop really noisy, as it either cycles the fans up and down or runs them at high speed constantly.

And all this happens with a load average of 1.05, 1.56, 2.00 and no core at 100% utilization according to htop.

The noise levels are hardly bearable. :( Need I note that this makes the Oryx barely usable in an office environment?

Hopefully system76/firmware-open#365 will be merged, so that there is at least the option to control the fans manually or from the OS.

Latest (and only) original firmware, latest Pop!_OS 22.04.

@curiousercreative
Contributor

@r00t3g any chance you can record a video of fans from zero load zero fan to what is considered loud fans and back to zero? I only have my galp5.

@r00t3g

r00t3g commented Dec 19, 2022

@curiousercreative I've recorded it and uploaded it to YT: https://www.youtube.com/watch?v=_acMuPZjIK8 . Unfortunately, there seems to be some sound compression done by YouTube, but hopefully the difference is still audible.

@curiousercreative
Contributor

@curiousercreative I've recorded it and uploaded it to YT: https://www.youtube.com/watch?v=_acMuPZjIK8 . Unfortunately, there seems to be some sound compression done by YouTube, but hopefully the difference is still audible.

Thanks for uploading! Sounds like fan smoothing is working the same as on my galp5 which is good news. So you've raised two concerns:

  1. Fan curves are based on package temperature (highest temp across cores)
  2. Not captured in the video, but you say the fans spin up too quickly and stay up too long in response to temp increases? Can you record a video of that? It should be subject to the same smoothing, which requires a sustained increase in temp.

For #1, the best I can suggest short term is running in battery mode. Mid term, we can contribute power profiles for your model to target specific temperatures per profile. This works well for my galp5 effectively limiting my fans to 40% unless I explicitly go to performance power profile. If I need silence I can use battery profile (with considerations).

If you're interested, I can help you on the path to contributing power profiles.

@r00t3g

r00t3g commented Dec 20, 2022

@curiousercreative, as far as I can observe, several things occur and lead to the described behavior:

  • When the temp goes up (even on a single core) for a short while, the fans still speed up, but with a delay
  • After the temp drops, the fans keep running for a while
  • If another short temp/cpu load occurs during this period, the speed increases even more

I tried to reproduce such behavior in https://www.youtube.com/watch?v=Ha-4gQCiZ0A . I start and quickly interrupt single-threaded, then multi-threaded nodejs compilation (make -j1, -j8, -j24 - last one intentionally stressful). The last one shows for how long the fans can keep running at increased speed.

Furthermore, I am not really sure that such intense fan rotation is actually required for efficient cooling. However, I am unable to prove it, since I have no way to control the fan speed and check the cooling efficiency under various workloads and at various speeds.

As for power profiles contribution, do I get it right, that you refer to device-specific cases in https://github.com/pop-os/system76-power/blame/master/src/daemon/profiles.rs ? So, basically, I can try adjusting the power daemon to what I find optimal in terms of temp/fan speed balance? And in case of success, just open a PR, of course... :) If yes, everything seems quite clear there, however, if you could tell me what are the units for pl1, pl2 values (watts? ms?), I'd be very thankful :)

@curiousercreative
Contributor

curiousercreative commented Dec 20, 2022

@r00t3g looking at these values, we see that the fan speed target only rises once the increase has held across the last 5 temperature readings (heatup) and only falls once the decrease has held across the last 20 (cooldown). On my galp5, temperatures are read about 4 times per second, so that's a delay of about 1.25 seconds on increasing temperatures and about 5 seconds on decreasing temperatures. This is a mechanism to mitigate fan speed "bounciness". You're welcome to tinker with these values yourself; the defaults are actually 4 and 10 respectively. If you'll be flashing the EC, the most powerful configuration to modify is the smoothing. Just jack those numbers up if you feel like the fan speeds are ramping up and down too fast. Here's my galp5 config that I run.
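As a rough illustration of how heatup and cooldown windows produce those delays, here is a minimal Python sketch. It is not the EC's code (the firmware is written in C); the class name, the pre-filled idle value, and the use of duty percentages as input are assumptions made for the example. The idea is that a rise is delayed by passing on the lowest value in the heatup window, and a fall is delayed by holding the highest value in the cooldown window.

```python
from collections import deque


class HeatupCooldownFilter:
    """Hypothetical window filter for a fan duty target (0-100)."""

    def __init__(self, heatup=5, cooldown=20, idle=0):
        # Pre-fill both windows with the idle duty so the filter starts
        # from a settled, cool state (an assumption of this sketch).
        self.heatup = deque([idle] * heatup, maxlen=heatup)
        self.cooldown = deque([idle] * cooldown, maxlen=cooldown)

    def update(self, duty):
        # Heatup stage: the lowest recent value wins, so a rise only
        # passes once it has held for the whole heatup window.
        self.heatup.append(duty)
        sustained = min(self.heatup)
        # Cooldown stage: the highest recent value wins, so a fall only
        # passes once it has held for the whole cooldown window.
        self.cooldown.append(sustained)
        return max(self.cooldown)


# A 3-reading spike is shorter than the 5-reading heatup window,
# so it never reaches the output.
f = HeatupCooldownFilter()
spike = [f.update(d) for d in [100, 100, 100, 0, 0, 0]]
print(spike)
```

With roughly 4 readings per second, a heatup window of 5 ignores spikes shorter than about 1.25 seconds, which is why larger values calm the fans down at the cost of a slower response to real load.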

Yes, that's the correct system76-power file. pl1 is the long-term power limit and pl2 is the short-term power limit, both in watts. I'd suggest setting those to None and just focusing on the tcc_offset. Once you have a good value there for a given profile, you can find power limits that are in the right ballpark to prevent "thermal runaway". Here's a discussion thread between a System76 engineer and myself from when I was contributing profiles for my galp5: pop-os/system76-power#210.
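To make the role of tcc_offset concrete: on Intel CPUs it is the number of degrees below the maximum junction temperature (TjMax) at which thermal throttling begins. The sketch below is illustrative Python only, not the system76-power code (that project is written in Rust); the class, the function, the offset of 20, and the TjMax of 100 °C (chosen to match the limits sensors prints in this thread) are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProfileSketch:
    """Hypothetical stand-in for one power profile's tunables."""

    pl1: Optional[int]         # long-term power limit in watts, or None
    pl2: Optional[int]         # short-term power limit in watts, or None
    tcc_offset: Optional[int]  # degrees C below TjMax where throttling starts


def throttle_temp(tj_max_c, profile):
    """Temperature at which the CPU would begin to throttle."""
    if profile.tcc_offset is None:
        return tj_max_c
    return tj_max_c - profile.tcc_offset


# Made-up example: power limits left unset, offset of 20 degrees.
balanced = ProfileSketch(pl1=None, pl2=None, tcc_offset=20)
print(throttle_temp(100, balanced))
```

A larger offset caps the temperature lower, which in turn keeps the fan curve from reaching its upper steps; the power limits can then be set near whatever wattage the CPU settles at under that cap.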
