* [Nouveau] Fans ramping up randomly when idle
@ 2022-11-05 19:31 Timothy Madden
  2022-11-07 11:41 ` Karol Herbst
  2022-11-09 23:44 ` Karol Herbst
  0 siblings, 2 replies; 9+ messages in thread
From: Timothy Madden @ 2022-11-05 19:31 UTC (permalink / raw)
  To: nouveau

Hello

My Msi Gaming X Trio 2080 Ti randomly ramps up the fans with no way to recover
(I have to reboot) even when the card is idle or is only showing the desktop.

This issue happens even when the card is not connected to a monitor.

My dmesg output from nouveau is included below, I think the last 2 lines are
the relevant ones:
    [ 9426.768449] nvidia-gpu 0000:0b:00.3: Unable to change power state from D3hot to D0, device inaccessible
    [ 9427.889387] nvidia-gpu 0000:0b:00.3: i2c timeout error ffffffff

timothy@localhost:~> dmesg | grep -i -e nouveau -e nvidia
[ 6.511064] nouveau 0000:0b:00.0: NVIDIA TU102 (162000a1)
[ 6.594464] nouveau 0000:0b:00.0: bios: version 90.02.42.00.14
[ 6.597756] nouveau 0000:0b:00.0: pmu: firmware unavailable
[ 6.601947] nouveau 0000:0b:00.0: fb: 11264 MiB GDDR6
[ 6.618463] nouveau 0000:0b:00.0: DRM: VRAM: 11264 MiB
[ 6.618465] nouveau 0000:0b:00.0: DRM: GART: 536870912 MiB
[ 6.618466] nouveau 0000:0b:00.0: DRM: BIT table 'A' not found
[ 6.618468] nouveau 0000:0b:00.0: DRM: BIT table 'L' not found
[ 6.618469] nouveau 0000:0b:00.0: DRM: TMDS table version 2.0
[ 6.618470] nouveau 0000:0b:00.0: DRM: DCB version 4.1
[ 6.618471] nouveau 0000:0b:00.0: DRM: DCB outp 00: 02800f66 04600020
[ 6.618473] nouveau 0000:0b:00.0: DRM: DCB outp 01: 02000f62 00020020
[ 6.618474] nouveau 0000:0b:00.0: DRM: DCB outp 03: 02011f52 00020010
[ 6.618475] nouveau 0000:0b:00.0: DRM: DCB outp 04: 04822f76 04600010
[ 6.618476] nouveau 0000:0b:00.0: DRM: DCB outp 05: 04022f72 00020010
[ 6.618477] nouveau 0000:0b:00.0: DRM: DCB outp 08: 01844f36 04600010
[ 6.618478] nouveau 0000:0b:00.0: DRM: DCB outp 09: 01044f32 00020010
[ 6.618479] nouveau 0000:0b:00.0: DRM: DCB outp 10: 04833f86 04600020
[ 6.618481] nouveau 0000:0b:00.0: DRM: DCB conn 00: 00020046
[ 6.618481] nouveau 0000:0b:00.0: DRM: DCB conn 01: 00010161
[ 6.618482] nouveau 0000:0b:00.0: DRM: DCB conn 02: 01000246
[ 6.618483] nouveau 0000:0b:00.0: DRM: DCB conn 03: 02000371
[ 6.618484] nouveau 0000:0b:00.0: DRM: DCB conn 04: 00001446
[ 6.620448] nouveau 0000:0b:00.0: DRM: MM: using COPY for buffer copies
[ 7.062338] nouveau 0000:0b:00.0: [drm] Cannot find any crtc or sizes
[ 7.065331] [drm] Initialized nouveau 1.3.1 20120801 for 0000:0b:00.0 on minor 1
[ 7.254317] nouveau 0000:0b:00.0: [drm] Cannot find any crtc or sizes
[ 7.446318] nouveau 0000:0b:00.0: [drm] Cannot find any crtc or sizes
[ 8.501252] nvidia-gpu 0000:0b:00.3: enabling device (0000 -> 0002)
[ 8.696138] audit: type=1400 audit(1667665884.700:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=926 comm="apparmor_parser"
[ 8.696141] audit: type=1400 audit(1667665884.700:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=926 comm="apparmor_parser"
[ 8.704333] snd_hda_intel 0000:0b:00.1: bound 0000:0b:00.0 (ops nv50_audio_component_bind_ops [nouveau])
[ 8.708797] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input15
[ 8.708903] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input16
[ 8.708936] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input17
[ 8.708965] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input18
[ 8.708994] input: HDA NVidia HDMI/DP,pcm=10 as /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input19
[ 8.709032] input: HDA NVidia HDMI/DP,pcm=11 as /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input20
[ 8.709065] input: HDA NVidia HDMI/DP,pcm=12 as /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input21
[ 10.776280] nouveau 0000:0b:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[ 3275.720190] nouveau 0000:0b:00.0: therm: temperature (90 C) hit the 'fanboost' threshold
[ 9426.768449] nvidia-gpu 0000:0b:00.3: Unable to change power state from D3hot to D0, device inaccessible
[ 9427.889387] nvidia-gpu 0000:0b:00.3: i2c timeout error ffffffff
timothy@localhost:~>

^ permalink raw reply	[flat|nested] 9+ messages in thread
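For readers following along, the thermal events in a log like the one in this message can be pulled out with a few lines of Python. This is only a sketch: the function name `thermal_events` and the regular expression are assumptions modelled on the `therm:` line quoted in the message, not something nouveau itself provides.

```python
import re

# Matches nouveau thermal messages such as:
#   nouveau 0000:0b:00.0: therm: temperature (90 C) hit the 'fanboost' threshold
# The pattern is an assumption derived from the log lines quoted in this thread.
THERM_RE = re.compile(
    r"nouveau (?P<dev>[0-9a-f:.]+): therm: temperature \((?P<temp>\d+) C\) "
    r"hit the '(?P<name>\w+)' threshold"
)


def thermal_events(log_text):
    """Return (pci_address, temperature_c, threshold_name) tuples in log order."""
    return [
        (m.group("dev"), int(m.group("temp")), m.group("name"))
        for m in THERM_RE.finditer(log_text)
    ]
```

The function takes the log as a string, so it can be fed the text of `dmesg` or of a saved journal file; lines that are not nouveau thermal messages are simply ignored.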
* Re: [Nouveau] Fans ramping up randomly when idle
  2022-11-05 19:31 [Nouveau] Fans ramping up randomly when idle Timothy Madden
@ 2022-11-07 11:41 ` Karol Herbst
  2022-11-09 22:50   ` Ajay Gupta
  2022-11-10 15:01   ` Timothy Madden
  2022-11-09 23:44 ` Karol Herbst
  1 sibling, 2 replies; 9+ messages in thread
From: Karol Herbst @ 2022-11-07 11:41 UTC (permalink / raw)
  To: Timothy Madden; +Cc: nouveau, Ajay Gupta

On Sat, Nov 5, 2022 at 8:36 PM Timothy Madden <terminatorul@gmail.com> wrote:
>
> Hello
>
> My Msi Gaming X Trio 2080 Ti randomly ramps up the fans with no way to recover
> (I have to reboot) even when the card is idle or is only showing the desktop.
>
> This issue happens even when the card is not connected to a monitor.
>
> My dmesg output from nouveau is included below, I think the last 2 lines are
> the relevant ones:
>     [ 9426.768449] nvidia-gpu 0000:0b:00.3: Unable to change power state from D3hot to D0, device inaccessible
>     [ 9427.889387] nvidia-gpu 0000:0b:00.3: i2c timeout error ffffffff
>
>

That's kind of odd, because "nvidia-gpu" implies you might have multiple
drivers here. Though .3 should be a USB/UCSI or similar sub-device on the
GPU, and Nvidia might have messed it up (adding the maintainer of the
i2c-nvidia-gpu driver on CC).

Anyway, the fans are probably controlled by the laptop's firmware, and maybe
something goes wrong with the runtime power management feature here. As far
as I can tell it works on the Nouveau side, but i2c-nvidia-gpu might prevent
the GPU from powering down, causing more heat. It's also interesting that
the GPU runs that hot, but given that we don't yet support changing power
states in Nouveau (wiring up the newly released firmware from Nvidia is
still WIP), there is not much we can do at this point while the GPU is
actually in use.

> > > timothy@localhost:~> dmesg | grep -i -e nouveau -e nvidia > [ 6.511064] nouveau 0000:0b:00.0: NVIDIA TU102 (162000a1) > [ 6.594464] nouveau 0000:0b:00.0: bios: version 90.02.42.00.14 > [ 6.597756] nouveau 0000:0b:00.0: pmu: firmware unavailable > [ 6.601947] nouveau 0000:0b:00.0: fb: 11264 MiB GDDR6 > [ 6.618463] nouveau 0000:0b:00.0: DRM: VRAM: 11264 MiB > [ 6.618465] nouveau 0000:0b:00.0: DRM: GART: 536870912 MiB > [ 6.618466] nouveau 0000:0b:00.0: DRM: BIT table 'A' not found > [ 6.618468] nouveau 0000:0b:00.0: DRM: BIT table 'L' not found > [ 6.618469] nouveau 0000:0b:00.0: DRM: TMDS table version 2.0 > [ 6.618470] nouveau 0000:0b:00.0: DRM: DCB version 4.1 > [ 6.618471] nouveau 0000:0b:00.0: DRM: DCB outp 00: 02800f66 04600020 > [ 6.618473] nouveau 0000:0b:00.0: DRM: DCB outp 01: 02000f62 00020020 > [ 6.618474] nouveau 0000:0b:00.0: DRM: DCB outp 03: 02011f52 00020010 > [ 6.618475] nouveau 0000:0b:00.0: DRM: DCB outp 04: 04822f76 04600010 > [ 6.618476] nouveau 0000:0b:00.0: DRM: DCB outp 05: 04022f72 00020010 > [ 6.618477] nouveau 0000:0b:00.0: DRM: DCB outp 08: 01844f36 04600010 > [ 6.618478] nouveau 0000:0b:00.0: DRM: DCB outp 09: 01044f32 00020010 > [ 6.618479] nouveau 0000:0b:00.0: DRM: DCB outp 10: 04833f86 04600020 > [ 6.618481] nouveau 0000:0b:00.0: DRM: DCB conn 00: 00020046 > [ 6.618481] nouveau 0000:0b:00.0: DRM: DCB conn 01: 00010161 > [ 6.618482] nouveau 0000:0b:00.0: DRM: DCB conn 02: 01000246 > [ 6.618483] nouveau 0000:0b:00.0: DRM: DCB conn 03: 02000371 > [ 6.618484] nouveau 0000:0b:00.0: DRM: DCB conn 04: 00001446 > [ 6.620448] nouveau 0000:0b:00.0: DRM: MM: using COPY for buffer copies > [ 7.062338] nouveau 0000:0b:00.0: [drm] Cannot find any crtc or sizes > [ 7.065331] [drm] Initialized nouveau 1.3.1 20120801 for 0000:0b:00.0 on minor 1 > [ 7.254317] nouveau 0000:0b:00.0: [drm] Cannot find any crtc or sizes > [ 7.446318] nouveau 0000:0b:00.0: [drm] Cannot find any crtc or sizes > [ 8.501252] nvidia-gpu 0000:0b:00.3: enabling device 
(0000 -> 0002) > [ 8.696138] audit: type=1400 audit(1667665884.700:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=926 comm="apparmor_parser" > [ 8.696141] audit: type=1400 audit(1667665884.700:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=926 comm="apparmor_parser" > [ 8.704333] snd_hda_intel 0000:0b:00.1: bound 0000:0b:00.0 (ops nv50_audio_component_bind_ops [nouveau]) > [ 8.708797] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input15 > [ 8.708903] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input16 > [ 8.708936] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input17 > [ 8.708965] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input18 > [ 8.708994] input: HDA NVidia HDMI/DP,pcm=10 as /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input19 > [ 8.709032] input: HDA NVidia HDMI/DP,pcm=11 as /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input20 > [ 8.709065] input: HDA NVidia HDMI/DP,pcm=12 as /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input21 > [ 10.776280] nouveau 0000:0b:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none > [ 3275.720190] nouveau 0000:0b:00.0: therm: temperature (90 C) hit the 'fanboost' threshold > [ 9426.768449] nvidia-gpu 0000:0b:00.3: Unable to change power state from D3hot to D0, device inaccessible > [ 9427.889387] nvidia-gpu 0000:0b:00.3: i2c timeout error ffffffff > timothy@localhost:~> > ^ permalink raw reply [flat|nested] 9+ messages in thread
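One way to look at the runtime power management state discussed in this reply is to read the `power/control` and `power/runtime_status` attributes that Linux exposes in sysfs for each PCI function. The following is a minimal sketch, assuming the GPU sits at slot 0000:0b:00 as in the dmesg output and has functions .0 to .3; the helper names `runtime_pm_state` and `gpu_functions_state` are hypothetical, and the `sysfs_root` parameter exists only so the code can be pointed at a test directory.

```python
from pathlib import Path


def runtime_pm_state(pci_addr, sysfs_root="/sys/bus/pci/devices"):
    """Read the runtime-PM sysfs attributes of one PCI function.

    Returns a dict with 'control' (e.g. "auto" or "on") and 'runtime_status'
    (e.g. "active" or "suspended"). A value is None if the file cannot be read.
    """
    power_dir = Path(sysfs_root) / pci_addr / "power"
    state = {}
    for attr in ("control", "runtime_status"):
        try:
            state[attr] = (power_dir / attr).read_text().strip()
        except OSError:
            state[attr] = None
    return state


def gpu_functions_state(slot="0000:0b:00", functions=range(4),
                        sysfs_root="/sys/bus/pci/devices"):
    """Collect the runtime-PM state of every function of one PCI slot.

    The default slot and function range are assumptions taken from the log.
    """
    return {
        f"{slot}.{fn}": runtime_pm_state(f"{slot}.{fn}", sysfs_root)
        for fn in functions
    }
```

Calling `gpu_functions_state()` while the fans are ramped up and comparing the result with a freshly booted system would show whether any function of the card is stuck in an unexpected state. It only reads sysfs and changes nothing.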
* Re: [Nouveau] Fans ramping up randomly when idle
  2022-11-07 11:41 ` Karol Herbst
@ 2022-11-09 22:50   ` Ajay Gupta
  2022-11-10 15:22     ` Timothy Madden
  2022-11-10 15:40     ` Timothy Madden
  2022-11-10 15:01   ` Timothy Madden
  1 sibling, 2 replies; 9+ messages in thread
From: Ajay Gupta @ 2022-11-09 22:50 UTC (permalink / raw)
  To: Karol Herbst, Timothy Madden; +Cc: nouveau

Hi

> -----Original Message-----
> From: Karol Herbst <kherbst@redhat.com>
> Sent: Monday, November 7, 2022 3:42 AM
> To: Timothy Madden <terminatorul@gmail.com>
> Cc: nouveau@lists.freedesktop.org; Ajay Gupta <ajayg@nvidia.com>
> Subject: Re: [Nouveau] Fans ramping up randomly when idle
>
> On Sat, Nov 5, 2022 at 8:36 PM Timothy Madden <terminatorul@gmail.com> wrote:
> >
> > Hello
> >
> > My Msi Gaming X Trio 2080 Ti randomly ramps up the fans with no way to
> > recover (I have to reboot) even when the card is idle or is only showing the
> > desktop.
> >
> > This issue happens even when the card is not connected to a monitor.
> >
> > My dmesg output from nouveau is included below, I think the last 2
> > lines are the relevant ones:
> > [ 9426.768449] nvidia-gpu 0000:0b:00.3: Unable to change power state from D3hot to D0, device inaccessible
> > [ 9427.889387] nvidia-gpu 0000:0b:00.3: i2c timeout error ffffffff

This only implies that there is no USB/UCSI device on the card. It is expected
on such cards and should be seen in dmesg even when the heating issue is not
present.

Thanks

> >
> >
>
> that's kind of odd, because "nvidia-gpu" implies you might have multiple
> drivers here? Though .3 should be some USB/UCSI or something related sub
> device on the GPU and Nvidia might have messed it up (adding the
> maintainer of the i2c-nvidia-gpu driver on CC).
> > Anyway, the fans are probably controlled by the Laptops firmware and > maybe something goes wrong with the runtime power management feature > here, which as far as I can tell works on the Nouveau side, but i2c-nvidia-gpu > might prevent the GPU from powering done and so causing more heat. It's > also interesting that the GPU runs that hot, but given we don't support > changing power states yet in Nouveau (still WIP wiring up the new released > firmware from nvidia), not much we can do while the GPU is actually in use at > this point. > > > > > > > timothy@localhost:~> dmesg | grep -i -e nouveau -e nvidia > > [ 6.511064] nouveau 0000:0b:00.0: NVIDIA TU102 (162000a1) > > [ 6.594464] nouveau 0000:0b:00.0: bios: version 90.02.42.00.14 > > [ 6.597756] nouveau 0000:0b:00.0: pmu: firmware unavailable > > [ 6.601947] nouveau 0000:0b:00.0: fb: 11264 MiB GDDR6 > > [ 6.618463] nouveau 0000:0b:00.0: DRM: VRAM: 11264 MiB > > [ 6.618465] nouveau 0000:0b:00.0: DRM: GART: 536870912 MiB > > [ 6.618466] nouveau 0000:0b:00.0: DRM: BIT table 'A' not found > > [ 6.618468] nouveau 0000:0b:00.0: DRM: BIT table 'L' not found > > [ 6.618469] nouveau 0000:0b:00.0: DRM: TMDS table version 2.0 > > [ 6.618470] nouveau 0000:0b:00.0: DRM: DCB version 4.1 > > [ 6.618471] nouveau 0000:0b:00.0: DRM: DCB outp 00: 02800f66 04600020 > > [ 6.618473] nouveau 0000:0b:00.0: DRM: DCB outp 01: 02000f62 00020020 > > [ 6.618474] nouveau 0000:0b:00.0: DRM: DCB outp 03: 02011f52 00020010 > > [ 6.618475] nouveau 0000:0b:00.0: DRM: DCB outp 04: 04822f76 04600010 > > [ 6.618476] nouveau 0000:0b:00.0: DRM: DCB outp 05: 04022f72 00020010 > > [ 6.618477] nouveau 0000:0b:00.0: DRM: DCB outp 08: 01844f36 04600010 > > [ 6.618478] nouveau 0000:0b:00.0: DRM: DCB outp 09: 01044f32 00020010 > > [ 6.618479] nouveau 0000:0b:00.0: DRM: DCB outp 10: 04833f86 04600020 > > [ 6.618481] nouveau 0000:0b:00.0: DRM: DCB conn 00: 00020046 > > [ 6.618481] nouveau 0000:0b:00.0: DRM: DCB conn 01: 00010161 > > [ 6.618482] nouveau 
0000:0b:00.0: DRM: DCB conn 02: 01000246 > > [ 6.618483] nouveau 0000:0b:00.0: DRM: DCB conn 03: 02000371 > > [ 6.618484] nouveau 0000:0b:00.0: DRM: DCB conn 04: 00001446 > > [ 6.620448] nouveau 0000:0b:00.0: DRM: MM: using COPY for buffer > copies > > [ 7.062338] nouveau 0000:0b:00.0: [drm] Cannot find any crtc or sizes > > [ 7.065331] [drm] Initialized nouveau 1.3.1 20120801 for 0000:0b:00.0 on > minor 1 > > [ 7.254317] nouveau 0000:0b:00.0: [drm] Cannot find any crtc or sizes > > [ 7.446318] nouveau 0000:0b:00.0: [drm] Cannot find any crtc or sizes > > [ 8.501252] nvidia-gpu 0000:0b:00.3: enabling device (0000 -> 0002) > > [ 8.696138] audit: type=1400 audit(1667665884.700:5): > apparmor="STATUS" operation="profile_load" profile="unconfined" > name="nvidia_modprobe" pid=926 comm="apparmor_parser" > > [ 8.696141] audit: type=1400 audit(1667665884.700:6): > apparmor="STATUS" operation="profile_load" profile="unconfined" > name="nvidia_modprobe//kmod" pid=926 comm="apparmor_parser" > > [ 8.704333] snd_hda_intel 0000:0b:00.1: bound 0000:0b:00.0 (ops > nv50_audio_component_bind_ops [nouveau]) > > [ 8.708797] input: HDA NVidia HDMI/DP,pcm=3 as > /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input15 > > [ 8.708903] input: HDA NVidia HDMI/DP,pcm=7 as > /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input16 > > [ 8.708936] input: HDA NVidia HDMI/DP,pcm=8 as > /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input17 > > [ 8.708965] input: HDA NVidia HDMI/DP,pcm=9 as > /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input18 > > [ 8.708994] input: HDA NVidia HDMI/DP,pcm=10 as > /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input19 > > [ 8.709032] input: HDA NVidia HDMI/DP,pcm=11 as > /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input20 > > [ 8.709065] input: HDA NVidia HDMI/DP,pcm=12 as > /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input21 > > [ 10.776280] nouveau 0000:0b:00.0: vgaarb: changed 
VGA decodes: > olddecodes=io+mem,decodes=none:owns=none > > [ 3275.720190] nouveau 0000:0b:00.0: therm: temperature (90 C) hit the > > 'fanboost' threshold [ 9426.768449] nvidia-gpu 0000:0b:00.3: Unable to > > change power state from D3hot to D0, device inaccessible [ > > 9427.889387] nvidia-gpu 0000:0b:00.3: i2c timeout error ffffffff > > timothy@localhost:~> > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Nouveau] Fans ramping up randomly when idle
  2022-11-09 22:50   ` Ajay Gupta
@ 2022-11-10 15:22     ` Timothy Madden
  2022-11-10 15:40     ` Timothy Madden
  1 sibling, 0 replies; 9+ messages in thread
From: Timothy Madden @ 2022-11-10 15:22 UTC (permalink / raw)
  To: Ajay Gupta, Karol Herbst; +Cc: nouveau

[-- Attachment #1: Type: text/plain, Size: 3848 bytes --]

On 11/10/22 00:50, Ajay Gupta wrote:
>
>>> This issue happens even when the card is not connected to a monitor.
>>>
>>> My dmesg output from nouveau is included below, I think the last 2
>>> lines are the relevant ones:
>>> [ 9426.768449] nvidia-gpu 0000:0b:00.3: Unable to change power state from D3hot to D0, device inaccessible
>>> [ 9427.889387] nvidia-gpu 0000:0b:00.3: i2c timeout error ffffffff
> This only implies that there is no usb/ucsi device on the card, it is expected from
> such cards and should be seen in dmesg even when heating issue is not there.
>
> Thanks

The 2080 Ti graphics card has a USB-C output, and it works when I connect a
USB storage device (see the DataTraveler device below).

Should I try the USB port as a video output? I would first need to bring in a
USB-C monitor from a different location. I also have a USB-C to HDMI adapter,
but using such an adapter triggers a different crash in the nouveau driver,
which I found on a different machine (my work laptop at the time) and
reported before. So I suppose it would not be a good test now.

localhost:/home/timothy #lsusb
Bus 008 Device 002: ID 05e3:0616 Genesys Logic, Inc. hub
Bus 008 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 007 Device 002: ID 05e3:0610 Genesys Logic, Inc.
Hub Bus 007 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub Bus 006 Device 002: ID 0951:176c Kingston Technology DataTraveler Max Bus 006 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub Bus 005 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub Bus 002 Device 003: ID 11b0:5111 ATECH FLASH TECHNOLOGY PRO88 Reader Bus 002 Device 002: ID 11b0:0031 ATECH FLASH TECHNOLOGY USB3.1 Hub Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub Bus 001 Device 006: ID 09da:2268 A4Tech Co., Ltd. USB Keyboard Bus 001 Device 004: ID 1a40:0101 Terminus Technology Inc. Hub Bus 001 Device 003: ID 8087:0025 Intel Corp. Wireless-AC 9260 Bluetooth Adapter Bus 001 Device 005: ID 046d:c24a Logitech, Inc. G600 Gaming Mouse Bus 001 Device 002: ID 11b0:0021 ATECH FLASH TECHNOLOGY USB2.0 Hub Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub localhost:/home/timothy #lsusb --tree /: Bus 08.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/2p, 5000M /: Bus 07.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 480M |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/2p, 480M /: Bus 06.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 10000M |__ Port 1: Dev 2, If 0, Class=Mass Storage, Driver=uas, 10000M /: Bus 05.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 480M /: Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 10000M /: Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 480M /: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/8p, 10000M |__ Port 2: Dev 2, If 0, Class=Hub, Driver=hub/4p, 5000M |__ Port 1: Dev 3, If 0, Class=Mass Storage, Driver=usb-storage, 5000M /: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/14p, 480M |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M |__ Port 2: Dev 3, If 0, Class=Wireless, Driver=btusb, 12M |__ Port 2: Dev 
3, If 1, Class=Wireless, Driver=btusb, 12M |__ Port 9: Dev 4, If 0, Class=Hub, Driver=hub/4p, 480M |__ Port 3: Dev 6, If 0, Class=Human Interface Device, Driver=usbhid, 1.5M |__ Port 3: Dev 6, If 1, Class=Human Interface Device, Driver=usbhid, 1.5M |__ Port 10: Dev 5, If 0, Class=Human Interface Device, Driver=usbhid, 12M |__ Port 10: Dev 5, If 1, Class=Human Interface Device, Driver=usbhid, 12M localhost:/home/timothy # [-- Attachment #2: Type: text/html, Size: 6270 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Nouveau] Fans ramping up randomly when idle
  2022-11-09 22:50   ` Ajay Gupta
  2022-11-10 15:22     ` Timothy Madden
@ 2022-11-10 15:40     ` Timothy Madden
  1 sibling, 0 replies; 9+ messages in thread
From: Timothy Madden @ 2022-11-10 15:40 UTC (permalink / raw)
  To: Ajay Gupta, Karol Herbst; +Cc: nouveau

On 11/10/22 00:50, Ajay Gupta wrote:
>
>> On Sat, Nov 5, 2022 at 8:36 PM Timothy Madden <terminatorul@gmail.com>
>> wrote:
>>> Hello
>>>
>>> My Msi Gaming X Trio 2080 Ti randomly ramps up the fans with no way to
>>> recover (I have to reboot) even when the card is idle or is only showing the
>>> desktop.
>>> This issue happens even when the card is not connected to a monitor.
>>>
>>> My dmesg output from nouveau is included below, I think the last 2
>>> lines are the relevant ones:
>>> [ 9426.768449] nvidia-gpu 0000:0b:00.3: Unable to change power state from D3hot to D0, device inaccessible
>>> [ 9427.889387] nvidia-gpu 0000:0b:00.3: i2c timeout error ffffffff
> This only implies that there is no usb/ucsi device on the card, it is expected from
> such cards and should be seen in dmesg even when heating issue is not there.
>
>

More dmesg output from another session, showing the same message from
xhci_hcd plus a kernel trace that occurs when trying to shut down the
computer:

Nov 10 14:39:58.800882 localhost kernel: thermal thermal_zone1: failed to read out thermal zone (-61)
Nov 10 14:39:58.804606 localhost kernel: input: HD-Audio Generic Front Mic as /devices/pci0000:00/0000:00:08.1/0000:0d:00.3/sound/card2/input22
Nov 10 14:39:58.804644 localhost kernel: input: HD-Audio Generic Rear Mic as /devices/pci0000:00/0000:00:08.1/0000:0d:00.3/sound/card2/input23
Nov 10 14:39:58.804662 localhost kernel: input: HD-Audio Generic Line as /devices/pci0000:00/0000:00:08.1/0000:0d:00.3/sound/card2/input24
Nov 10 14:39:58.804679 localhost kernel: input: HD-Audio Generic Line Out Front as /devices/pci0000:00/0000:00:08.1/0000:0d:00.3/sound/card2/input25
Nov 10 14:39:58.804695 localhost kernel: input: HD-Audio Generic Line Out Surround as /devices/pci0000:00/0000:00:08.1/0000:0d:00.3/sound/card2/input26
Nov 10 14:39:58.804712 localhost kernel: input: HD-Audio Generic Line Out CLFE as /devices/pci0000:00/0000:00:08.1/0000:0d:00.3/sound/card2/input27
Nov 10 14:39:58.804729 localhost kernel: input: HD-Audio Generic Front Headphone as /devices/pci0000:00/0000:00:08.1/0000:0d:00.3/sound/card2/input28
Nov 10 14:39:58.812600 localhost kernel: Bluetooth: hci0: Found device firmware: intel/ibt-18-16-1.sfi
Nov 10 14:39:58.812635 localhost kernel: Bluetooth: hci0: Boot Address: 0x40800
Nov 10 14:39:58.812651 localhost kernel: Bluetooth: hci0: Firmware Version: 214-6.22
Nov 10 14:39:58.812665 localhost kernel: Bluetooth: hci0: Firmware already loaded
Nov 10 14:39:58.844599 localhost kernel: iwlwifi 0000:06:00.0: base HW address: d4:6d:6d:ad:12:bd, OTP minor version: 0x4
Nov 10 14:39:58.912599 localhost kernel: ieee80211 phy0: Selected rate control algorithm 'iwl-mvm-rs'
Nov 10 14:39:58.976599 localhost kernel: iwlwifi 0000:06:00.0 wlp6s0: renamed from wlan0
Nov 10 14:39:59.068610 localhost kernel: intel_rapl_common:
Found RAPL domain package Nov 10 14:39:59.068681 localhost kernel: intel_rapl_common: Found RAPL domain core Nov 10 14:39:59.548603 localhost kernel: EXT4-fs (sda3): mounted filesystem with ordered data mode. Quota mode: none. Nov 10 14:40:00.100765 localhost kernel: Bluetooth: BNEP (Ethernet Emulation) ver 1.3 Nov 10 14:40:00.100802 localhost kernel: Bluetooth: BNEP filters: protocol multicast Nov 10 14:40:00.100817 localhost kernel: Bluetooth: BNEP socket layer initialized Nov 10 14:40:00.100833 localhost kernel: Bluetooth: MGMT ver 1.22 Nov 10 14:40:00.100847 localhost kernel: NET: Registered PF_ALG protocol family Nov 10 14:40:00.252604 localhost kernel: NET: Registered PF_QIPCRTR protocol family Nov 10 14:40:00.392608 localhost kernel: bpfilter: Loaded bpfilter_umh pid 1434 Nov 10 14:40:00.392695 localhost unknown: Started bpfilter Nov 10 14:40:00.692608 localhost.localdomain kernel: ACPI: \: failed to evaluate _DSM bf0212f2-788f-c64d-a5b3-1f738e285ade (0x1001) Nov 10 14:40:00.692657 localhost.localdomain kernel: ACPI: \: failed to evaluate _DSM bf0212f2-788f-c64d-a5b3-1f738e285ade (0x1001) Nov 10 14:40:00.692680 localhost.localdomain kernel: ACPI: \: failed to evaluate _DSM bf0212f2-788f-c64d-a5b3-1f738e285ade (0x1001) Nov 10 14:40:00.692698 localhost.localdomain kernel: ACPI: \: failed to evaluate _DSM bf0212f2-788f-c64d-a5b3-1f738e285ade (0x1001) Nov 10 14:40:00.692720 localhost.localdomain kernel: ACPI: \: failed to evaluate _DSM bf0212f2-788f-c64d-a5b3-1f738e285ade (0x1001) Nov 10 14:40:00.692738 localhost.localdomain kernel: ACPI: \: failed to evaluate _DSM bf0212f2-788f-c64d-a5b3-1f738e285ade (0x1001) Nov 10 14:40:00.692753 localhost.localdomain kernel: ACPI: \: failed to evaluate _DSM bf0212f2-788f-c64d-a5b3-1f738e285ade (0x1001) Nov 10 14:40:00.700598 localhost.localdomain kernel: amdgpu 0000:0a:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem Nov 10 14:40:00.700757 localhost.localdomain kernel: nouveau 0000:0b:00.0: 
vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none Nov 10 14:40:00.932605 localhost.localdomain kernel: ACPI: \: failed to evaluate _DSM bf0212f2-788f-c64d-a5b3-1f738e285ade (0x1001) Nov 10 14:40:00.932660 localhost.localdomain kernel: ACPI: \: failed to evaluate _DSM bf0212f2-788f-c64d-a5b3-1f738e285ade (0x1001) Nov 10 14:40:00.932688 localhost.localdomain kernel: ACPI: \: failed to evaluate _DSM bf0212f2-788f-c64d-a5b3-1f738e285ade (0x1001) Nov 10 14:40:00.932709 localhost.localdomain kernel: ACPI: \: failed to evaluate _DSM bf0212f2-788f-c64d-a5b3-1f738e285ade (0x1001) Nov 10 14:40:00.932729 localhost.localdomain kernel: ACPI: \: failed to evaluate _DSM bf0212f2-788f-c64d-a5b3-1f738e285ade (0x1001) Nov 10 14:40:00.932767 localhost.localdomain kernel: ACPI: \: failed to evaluate _DSM bf0212f2-788f-c64d-a5b3-1f738e285ade (0x1001) Nov 10 14:40:00.932790 localhost.localdomain kernel: ACPI: \: failed to evaluate _DSM bf0212f2-788f-c64d-a5b3-1f738e285ade (0x1001) Nov 10 14:40:01.036607 localhost.localdomain kernel: NET: Registered PF_PACKET protocol family Nov 10 14:40:03.344601 localhost.localdomain kernel: igb 0000:07:00.0 enp7s0: igb: enp7s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX Nov 10 14:40:03.672602 localhost.localdomain kernel: IPv6: ADDRCONF(NETDEV_CHANGE): enp7s0: link becomes ready Nov 10 14:40:07.783797 localhost.localdomain systemd-journald[759]: Time jumped backwards, rotating. 
Nov 10 14:40:10.122652 localhost.localdomain kernel: Bluetooth: RFCOMM TTY layer initialized
Nov 10 14:40:10.122672 localhost.localdomain kernel: Bluetooth: RFCOMM socket layer initialized
Nov 10 14:40:10.122683 localhost.localdomain kernel: Bluetooth: RFCOMM ver 1.11
Nov 10 15:34:21.665521 localhost.localdomain kernel: nouveau 0000:0b:00.0: therm: temperature (90 C) hit the 'fanboost' threshold
Nov 10 15:56:57.670622 localhost.localdomain kernel: nouveau 0000:0b:00.0: therm: temperature (95 C) hit the 'downclock' threshold
Nov 10 16:00:02.318661 localhost.localdomain kernel: nvidia-gpu 0000:0b:00.3: Unable to change power state from D3hot to D0, device inaccessible
Nov 10 16:00:02.378659 localhost.localdomain kernel: xhci_hcd 0000:0b:00.2: Unable to change power state from D3hot to D0, device inaccessible
Nov 10 16:00:02.438659 localhost.localdomain kernel: xhci_hcd 0000:0b:00.2: Unable to change power state from D3cold to D0, device inaccessible
Nov 10 16:00:02.438970 localhost.localdomain kernel: xhci_hcd 0000:0b:00.2: Controller not ready at resume -19
Nov 10 16:00:02.439185 localhost.localdomain kernel: xhci_hcd 0000:0b:00.2: PCI post-resume error -19!
Nov 10 16:00:02.439380 localhost.localdomain kernel: xhci_hcd 0000:0b:00.2: HC died; cleaning up Nov 10 16:00:03.438657 localhost.localdomain kernel: nvidia-gpu 0000:0b:00.3: i2c timeout error ffffffff Nov 10 16:00:03.438965 localhost.localdomain kernel: ucsi_ccg 28-0008: i2c_transfer failed -110 Nov 10 16:01:22.954678 localhost.localdomain kernel: show_signal_msg: 46 callbacks suppressed Nov 10 16:01:22.954803 localhost.localdomain kernel: kwalletd5[2791]: segfault at 557dd537c ip 00007f86b7476213 sp 00007ffca735cd70 error 4 in libqca-qt5.so.2.3.5[7f86b7466000+7a000] Nov 10 16:01:22.954846 localhost.localdomain kernel: Code: 1f 84 00 00 00 00 00 8b 47 10 85 c0 7e 69 53 80 3f 00 74 2b 48 8b 5f 18 48 85 db 74 52 48 8b 7b 10 8b 53 0c 48 8b 33 48 8b 07 <ff> 50 08 48 89 df be 18 00 00 00 5b e9 ac 01 ff ff 0f 1f 40 00 48 Nov 10 16:01:47.207199 localhost.localdomain kernel: nouveau 0000:0b:00.0: timer: stalled at ffffffffffffffff Nov 10 16:01:47.207632 localhost.localdomain kernel: ------------[ cut here ]------------ Nov 10 16:01:47.207861 localhost.localdomain kernel: nouveau 0000:0b:00.0: timeout Nov 10 16:01:47.207905 localhost.localdomain kernel: WARNING: CPU: 0 PID: 1506 at drivers/gpu/drm/nouveau/nvkm/subdev/bar/g84.c:35 g84_bar_flush+0xca/0xe0 [nouveau] Nov 10 16:01:47.207941 localhost.localdomain kernel: Modules linked in: rfcomm nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct af_packet nft_chain_nat nf_tables ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle> Nov 10 16:01:47.208040 localhost.localdomain kernel: snd joydev soundcore ecdh_generic dca rfkill thermal gpio_amdpt gpio_generic tiny_power_button acpi_cpufreq fuse configfs ip_tables x_tables hid_generic usbhid uas usb_storage amdgpu crct10dif_pclmul crc32_pclmul nouveau polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel video iommu_v2 drm_ttm_h> 
Nov 10 16:01:47.208100 localhost.localdomain kernel: CPU: 0 PID: 1506 Comm: Xorg.bin Not tainted 6.0.7-1-default #1 openSUSE Tumbleweed 5e8b6dc6c4ea2058c3659c46d114e84bdc37a88e Nov 10 16:01:47.208138 localhost.localdomain kernel: Hardware name: Gigabyte Technology Co., Ltd. X470 AORUS GAMING 5 WIFI/X470 AORUS GAMING 5 WIFI-CF, BIOS F63c 07/20/2022 Nov 10 16:01:47.208167 localhost.localdomain kernel: RIP: 0010:g84_bar_flush+0xca/0xe0 [nouveau] Nov 10 16:01:47.208195 localhost.localdomain kernel: Code: 8b 40 10 48 8b 78 10 48 8b 5f 50 48 85 db 75 03 48 8b 1f e8 88 9e e8 e0 48 89 da 48 c7 c7 8e 7f aa c0 48 89 c6 e8 81 94 1b e1 <0f> 0b eb aa e8 8d a6 20 e1 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 Nov 10 16:01:47.208229 localhost.localdomain kernel: RSP: 0018:ffffb5e4c0c338a8 EFLAGS: 00010086 Nov 10 16:01:47.208254 localhost.localdomain kernel: RAX: 0000000000000000 RBX: ffff937b018f8850 RCX: 0000000000000027 Nov 10 16:01:47.208282 localhost.localdomain kernel: RDX: ffff937e3ea224e8 RSI: 0000000000000001 RDI: ffff937e3ea224e0 Nov 10 16:01:47.208310 localhost.localdomain kernel: RBP: ffff937b00f3b918 R08: 0000000000000000 R09: ffffb5e4c0c33750 Nov 10 16:01:47.208341 localhost.localdomain kernel: R10: 0000000000000003 R11: ffff937e3e7fffe8 R12: 0000000000000246 Nov 10 16:01:47.208377 localhost.localdomain kernel: R13: 0000000000000010 R14: 0000000000000004 R15: 0000000000000006 Nov 10 16:01:47.208414 localhost.localdomain kernel: FS: 00007fee02b02940(0000) GS:ffff937e3ea00000(0000) knlGS:0000000000000000 Nov 10 16:01:47.208447 localhost.localdomain kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 10 16:01:47.208474 localhost.localdomain kernel: CR2: 000055c259db74c0 CR3: 0000000106ec6000 CR4: 00000000003506f0 Nov 10 16:01:47.208503 localhost.localdomain kernel: Call Trace: Nov 10 16:01:47.208527 localhost.localdomain kernel: <TASK> Nov 10 16:01:47.208557 localhost.localdomain kernel: nv50_instobj_release+0x2a/0xa0 [nouveau 
34ea2b686d636eb789a9af34726a5f23f567b847] Nov 10 16:01:47.208586 localhost.localdomain kernel: nvkm_vmm_iter.constprop.0+0x7e4/0x880 [nouveau 34ea2b686d636eb789a9af34726a5f23f567b847] Nov 10 16:01:47.208622 localhost.localdomain kernel: ? nvkm_vmm_ptes_sparse+0x1e0/0x1e0 [nouveau 34ea2b686d636eb789a9af34726a5f23f567b847] Nov 10 16:01:47.208662 localhost.localdomain kernel: ? gp100_vmm_pgt_sparse+0xc0/0xc0 [nouveau 34ea2b686d636eb789a9af34726a5f23f567b847] Nov 10 16:01:47.208700 localhost.localdomain kernel: nvkm_vmm_put_locked+0x105/0x280 [nouveau 34ea2b686d636eb789a9af34726a5f23f567b847] Nov 10 16:01:47.208733 localhost.localdomain kernel: ? nvkm_vmm_ptes_sparse+0x1e0/0x1e0 [nouveau 34ea2b686d636eb789a9af34726a5f23f567b847] Nov 10 16:01:47.208756 localhost.localdomain kernel: ? gp100_vmm_pgt_sparse+0xc0/0xc0 [nouveau 34ea2b686d636eb789a9af34726a5f23f567b847] Nov 10 16:01:47.208778 localhost.localdomain kernel: nvkm_uvmm_mthd+0x672/0x6a0 [nouveau 34ea2b686d636eb789a9af34726a5f23f567b847] Nov 10 16:01:47.208814 localhost.localdomain kernel: ? btrfs_block_rsv_release+0x161/0x1c0 [btrfs 99d03bf8982233f7af3195e0b6d760148166d664] Nov 10 16:01:47.208844 localhost.localdomain kernel: nvkm_ioctl+0xd8/0x180 [nouveau 34ea2b686d636eb789a9af34726a5f23f567b847] Nov 10 16:01:47.208868 localhost.localdomain kernel: nvif_object_mthd+0xc1/0x1f0 [nouveau 34ea2b686d636eb789a9af34726a5f23f567b847] Nov 10 16:01:47.208896 localhost.localdomain kernel: ? __count_memcg_events+0x2c/0x80 Nov 10 16:01:47.208923 localhost.localdomain kernel: ? uncharge_batch+0xca/0x120 Nov 10 16:01:47.208951 localhost.localdomain kernel: ? 
__slab_free+0xc4/0x2f0 Nov 10 16:01:47.208986 localhost.localdomain kernel: nvif_vmm_put+0x60/0x80 [nouveau 34ea2b686d636eb789a9af34726a5f23f567b847] Nov 10 16:01:47.209015 localhost.localdomain kernel: nouveau_vma_del+0x7c/0xb0 [nouveau 34ea2b686d636eb789a9af34726a5f23f567b847] Nov 10 16:01:47.209049 localhost.localdomain kernel: nouveau_gem_object_close+0x1cb/0x1e0 [nouveau 34ea2b686d636eb789a9af34726a5f23f567b847] Nov 10 16:01:47.209072 localhost.localdomain kernel: drm_gem_handle_delete+0x69/0xd0 Nov 10 16:01:47.209098 localhost.localdomain kernel: ? drm_mode_destroy_dumb+0x40/0x40 Nov 10 16:01:47.209504 localhost.localdomain kernel: drm_ioctl_kernel+0xc1/0x160 Nov 10 16:01:47.209531 localhost.localdomain kernel: drm_ioctl+0x21f/0x420 Nov 10 16:01:47.209558 localhost.localdomain kernel: ? drm_mode_destroy_dumb+0x40/0x40 Nov 10 16:01:47.209590 localhost.localdomain kernel: ? tlb_finish_mmu+0x65/0x170 Nov 10 16:01:47.209633 localhost.localdomain kernel: ? __slab_free+0xc4/0x2f0 Nov 10 16:01:47.209669 localhost.localdomain kernel: nouveau_drm_ioctl+0x56/0xb0 [nouveau 34ea2b686d636eb789a9af34726a5f23f567b847] Nov 10 16:01:47.209696 localhost.localdomain kernel: __x64_sys_ioctl+0x90/0xd0 Nov 10 16:01:47.209724 localhost.localdomain kernel: do_syscall_64+0x5b/0x80 Nov 10 16:01:47.209750 localhost.localdomain kernel: ? __vm_munmap+0x93/0x130 Nov 10 16:01:47.209773 localhost.localdomain kernel: ? syscall_exit_to_user_mode+0x17/0x40 Nov 10 16:01:47.209795 localhost.localdomain kernel: ? do_syscall_64+0x67/0x80 Nov 10 16:01:47.209818 localhost.localdomain kernel: ? 
exc_page_fault+0x66/0x150 Nov 10 16:01:47.209844 localhost.localdomain kernel: entry_SYSCALL_64_after_hwframe+0x63/0xcd Nov 10 16:01:47.209870 localhost.localdomain kernel: RIP: 0033:0x7fee0301c9bf Nov 10 16:01:47.209896 localhost.localdomain kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00 Nov 10 16:01:47.209923 localhost.localdomain kernel: RSP: 002b:00007ffe8c577e00 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 Nov 10 16:01:47.209952 localhost.localdomain kernel: RAX: ffffffffffffffda RBX: 000055c25a6f8680 RCX: 00007fee0301c9bf Nov 10 16:01:47.209971 localhost.localdomain kernel: RDX: 00007ffe8c577e94 RSI: 00000000c00464b4 RDI: 0000000000000015 Nov 10 16:01:47.209992 localhost.localdomain kernel: RBP: 00007ffe8c577e94 R08: 000055c25a6f83b0 R09: 00007fee03748840 Nov 10 16:01:47.210018 localhost.localdomain kernel: R10: 00007fee02f21e90 R11: 0000000000000246 R12: 00000000c00464b4 Nov 10 16:01:47.210045 localhost.localdomain kernel: R13: 0000000000000015 R14: 000055c25a024dc0 R15: 00007fee02722a60 Nov 10 16:01:47.210067 localhost.localdomain kernel: </TASK> Nov 10 16:01:47.210093 localhost.localdomain kernel: ---[ end trace 0000000000000000 ]--- Nov 10 16:01:47.210116 localhost.localdomain kernel: nouveau 0000:0b:00.0: timer: stalled at ffffffffffffffff ^ permalink raw reply [flat|nested] 9+ messages in thread
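For anyone reading along: the journal above is long, and only a couple of lines in it are nouveau thermal events. A small filter can pull those out. This is only a minimal sketch and was not posted in the thread; the function name `therm_events` is made up for illustration, and it assumes journal lines in the timestamped format shown above (month, day, time, host, `kernel:`).

```shell
#!/bin/sh
# therm_events (hypothetical helper): read kernel log lines on stdin and
# print one line per nouveau thermal threshold event, in the form
#   <timestamp> <temperature in C> <threshold name>
# Assumes lines like the ones quoted in this thread, e.g.
#   Nov 10 15:34:21.665521 host kernel: nouveau 0000:0b:00.0: therm: temperature (90 C) hit the 'fanboost' threshold
therm_events() {
    sed -n -E "s/^(.*[0-9]{2}:[0-9]{2}:[0-9]{2}[.0-9]*) .*therm: temperature \(([0-9]+) C\) hit the '([a-z]+)' threshold.*/\1 \2 \3/p"
}
```

It could be fed from the kernel journal of the current boot, for example `journalctl -k -b -o short-precise | therm_events`. On the log above this should pick out the 90 C 'fanboost' and 95 C 'downclock' entries and nothing else.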
* Re: [Nouveau] Fans ramping up randomly when idle
2022-11-07 11:41 ` Karol Herbst
2022-11-09 22:50 ` Ajay Gupta
@ 2022-11-10 15:01 ` Timothy Madden
2022-11-10 15:36 ` Karol Herbst
1 sibling, 1 reply; 9+ messages in thread
From: Timothy Madden @ 2022-11-10 15:01 UTC (permalink / raw)
To: nouveau; +Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Ajay Gupta

On 11/7/22 13:41, Karol Herbst wrote:
> On Sat, Nov 5, 2022 at 8:36 PM Timothy Madden <terminatorul@gmail.com> wrote:
>>
>> Hello
>>
>> My Msi Gaming X Trio 2080 Ti randomly ramps up the fans with no way to recover
>> (I have to reboot) even when the card is idle or is only showing the desktop.
>>
>> This issue happens even when the card is not connected to a monitor.
>>
>> My dmesg output from nouveau is included below, I think the last 2 lines are
>> the relevant ones:
>> [ 9426.768449] nvidia-gpu 0000:0b:00.3: Unable to change power state from D3hot to D0, device inaccessible
>> [ 9427.889387] nvidia-gpu 0000:0b:00.3: i2c timeout error ffffffff
>>
>>
>
> that's kind of odd, because "nvidia-gpu" implies you might have
> multiple drivers here? Though .3 should be some USB/UCSI or something
> related sub device on the GPU and Nvidia might have messed it up
> (adding the maintainer of the i2c-nvidia-gpu driver on CC).

Is there a way to check for multiple drivers? I have openSUSE
Tumbleweed at version 2022-11-08, and I did not install proprietary or
other NVIDIA drivers.

>
> Anyway, the fans are probably controlled by the laptop's firmware and

I meant the fans on the graphics card. No laptop here; my desktop
computer has a Gigabyte X470 Aorus Gaming 5 WiFi motherboard with the
latest UEFI from the Gigabyte web site.

> maybe something goes wrong with the runtime power management feature
> here, which as far as I can tell works on the Nouveau side, but
> i2c-nvidia-gpu might prevent the GPU from powering down and so causing
> more heat. It's also interesting that the GPU runs that hot, but given
> we don't support changing power states yet in Nouveau (still WIP
> wiring up the new released firmware from nvidia), not much we can do
> while the GPU is actually in use at this point.
>
>>

^ permalink raw reply [flat|nested] 9+ messages in thread
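On the question of how to check for multiple drivers: one way is to look at which kernel driver is bound to each PCI function of the card via sysfs. The following is a minimal sketch rather than anything posted in the thread. The function name `pci_drivers` and its optional second argument (an alternative sysfs root, mainly useful for testing) are made up for illustration; the `driver` symlink under each device directory is standard sysfs.

```shell
#!/bin/sh
# pci_drivers SLOT [SYSFS_ROOT]   (hypothetical helper)
# Print "<pci function> <bound driver>" for every function of SLOT,
# e.g. SLOT=0000:0b:00 covers 0000:0b:00.0 through 0000:0b:00.3.
# SYSFS_ROOT defaults to /sys/bus/pci/devices.
pci_drivers() {
    slot=$1
    root=${2:-/sys/bus/pci/devices}
    for dev in "$root"/"$slot".*; do
        # Skip the literal pattern if the glob matched nothing.
        [ -e "$dev" ] || continue
        if [ -L "$dev/driver" ]; then
            # The "driver" symlink points at the bound driver's directory.
            drv=$(basename "$(readlink "$dev/driver")")
        else
            drv="(none)"
        fi
        printf '%s %s\n' "$(basename "$dev")" "$drv"
    done
}
```

Running `pci_drivers 0000:0b:00` would list the four functions of the card from this thread. Judging only by the log prefixes quoted here, one would expect nouveau on .0, snd_hda_intel on .1, xhci_hcd on .2 and nvidia-gpu on .3, which would mean separate in-kernel drivers for the card's sub-devices rather than a second graphics driver. `lspci -k -s 0b:00` reports the same information under "Kernel driver in use".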
* Re: [Nouveau] Fans ramping up randomly when idle
2022-11-10 15:01 ` Timothy Madden
@ 2022-11-10 15:36 ` Karol Herbst
0 siblings, 0 replies; 9+ messages in thread
From: Karol Herbst @ 2022-11-10 15:36 UTC (permalink / raw)
To: Timothy Madden
Cc: nouveau, nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Ajay Gupta

On Thu, Nov 10, 2022 at 4:01 PM Timothy Madden <terminatorul@gmail.com> wrote:
>
> On 11/7/22 13:41, Karol Herbst wrote:
> > On Sat, Nov 5, 2022 at 8:36 PM Timothy Madden <terminatorul@gmail.com> wrote:
> >>
> >> Hello
> >>
> >> My Msi Gaming X Trio 2080 Ti randomly ramps up the fans with no way to recover
> >> (I have to reboot) even when the card is idle or is only showing the desktop.
> >>
> >> This issue happens even when the card is not connected to a monitor.
> >>
> >> My dmesg output from nouveau is included below, I think the last 2 lines are
> >> the relevant ones:
> >> [ 9426.768449] nvidia-gpu 0000:0b:00.3: Unable to change power state from D3hot to D0, device inaccessible
> >> [ 9427.889387] nvidia-gpu 0000:0b:00.3: i2c timeout error ffffffff
> >>
> >>
> >
> > that's kind of odd, because "nvidia-gpu" implies you might have
> > multiple drivers here? Though .3 should be some USB/UCSI or something
> > related sub device on the GPU and Nvidia might have messed it up
> > (adding the maintainer of the i2c-nvidia-gpu driver on CC).
>
> Is there a way to check for multiple drivers ? I have openSUSE
> Tumbleweed at version 2022-11-08, and I did not install proprietary or
> other NVIDIA drivers.
>
> >
> > Anyway, the fans are probably controlled by the Laptops firmware and
>
> I meant the fans on the graphics card. No laptop here, my desktop
> computer has a Gigabyte X470 Aorus Gaming 5 WiFi motherboard with the
> latest UEFI from gigabyte web site.
>

Ah, then I got that part wrong. Okay, if that's a desktop GPU, then
it's something we literally can't fix. We require signed firmware in
order to control the fans from the driver side. What you seem to
experience is the GPU doing it itself, because it overheats. Sad part:
we can't change the clocks without signed firmware either.

What could help is the new GSP firmware we got, but that's still WIP
and would require you to compile your own kernel and fetch the firmware
from somewhere, and then it might not even work correctly yet.

> > maybe something goes wrong with the runtime power management feature
> > here, which as far as I can tell works on the Nouveau side, but
> > i2c-nvidia-gpu might prevent the GPU from powering done and so causing
> > more heat. It's also interesting that the GPU runs that hot, but given
> > we don't support changing power states yet in Nouveau (still WIP
> > wiring up the new released firmware from nvidia), not much we can do
> > while the GPU is actually in use at this point.
> >
> >>
>

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Nouveau] Fans ramping up randomly when idle
2022-11-05 19:31 [Nouveau] Fans ramping up randomly when idle Timothy Madden
2022-11-07 11:41 ` Karol Herbst
@ 2022-11-09 23:44 ` Karol Herbst
2022-11-10 14:51 ` Timothy Madden
1 sibling, 1 reply; 9+ messages in thread
From: Karol Herbst @ 2022-11-09 23:44 UTC (permalink / raw)
To: Timothy Madden; +Cc: nouveau

On Sat, Nov 5, 2022 at 8:36 PM Timothy Madden <terminatorul@gmail.com> wrote:
>
> Hello
>
> My Msi Gaming X Trio 2080 Ti randomly ramps up the fans with no way to recover
> (I have to reboot) even when the card is idle or is only showing the desktop.
>
> This issue happens even when the card is not connected to a monitor.
>
> My dmesg output from nouveau is included below, I think the last 2 lines are
> the relevant ones:
> [ 9426.768449] nvidia-gpu 0000:0b:00.3: Unable to change power state from D3hot to D0, device inaccessible
> [ 9427.889387] nvidia-gpu 0000:0b:00.3: i2c timeout error ffffffff
>
>
>
>
> timothy@localhost:~> dmesg | grep -i -e nouveau -e nvidia
> [ 6.511064] nouveau 0000:0b:00.0: NVIDIA TU102 (162000a1)
> [ 6.594464] nouveau 0000:0b:00.0: bios: version 90.02.42.00.14
> [ 6.597756] nouveau 0000:0b:00.0: pmu: firmware unavailable
> [ 6.601947] nouveau 0000:0b:00.0: fb: 11264 MiB GDDR6
> [ 6.618463] nouveau 0000:0b:00.0: DRM: VRAM: 11264 MiB
> [ 6.618465] nouveau 0000:0b:00.0: DRM: GART: 536870912 MiB
> [ 6.618466] nouveau 0000:0b:00.0: DRM: BIT table 'A' not found
> [ 6.618468] nouveau 0000:0b:00.0: DRM: BIT table 'L' not found
> [ 6.618469] nouveau 0000:0b:00.0: DRM: TMDS table version 2.0
> [ 6.618470] nouveau 0000:0b:00.0: DRM: DCB version 4.1
> [ 6.618471] nouveau 0000:0b:00.0: DRM: DCB outp 00: 02800f66 04600020
> [ 6.618473] nouveau 0000:0b:00.0: DRM: DCB outp 01: 02000f62 00020020
> [ 6.618474] nouveau 0000:0b:00.0: DRM: DCB outp 03: 02011f52 00020010
> [ 6.618475] nouveau 0000:0b:00.0: DRM: DCB outp 04: 04822f76 04600010
> [ 6.618476] nouveau 0000:0b:00.0: DRM: DCB outp 05: 04022f72 00020010
> [ 6.618477] nouveau 0000:0b:00.0: DRM: DCB outp 08: 01844f36 04600010
> [ 6.618478] nouveau 0000:0b:00.0: DRM: DCB outp 09: 01044f32 00020010
> [ 6.618479] nouveau 0000:0b:00.0: DRM: DCB outp 10: 04833f86 04600020
> [ 6.618481] nouveau 0000:0b:00.0: DRM: DCB conn 00: 00020046
> [ 6.618481] nouveau 0000:0b:00.0: DRM: DCB conn 01: 00010161
> [ 6.618482] nouveau 0000:0b:00.0: DRM: DCB conn 02: 01000246
> [ 6.618483] nouveau 0000:0b:00.0: DRM: DCB conn 03: 02000371
> [ 6.618484] nouveau 0000:0b:00.0: DRM: DCB conn 04: 00001446
> [ 6.620448] nouveau 0000:0b:00.0: DRM: MM: using COPY for buffer copies
> [ 7.062338] nouveau 0000:0b:00.0: [drm] Cannot find any crtc or sizes
> [ 7.065331] [drm] Initialized nouveau 1.3.1 20120801 for 0000:0b:00.0 on minor 1
> [ 7.254317] nouveau 0000:0b:00.0: [drm] Cannot find any crtc or sizes
> [ 7.446318] nouveau 0000:0b:00.0: [drm] Cannot find any crtc or sizes
> [ 8.501252] nvidia-gpu 0000:0b:00.3: enabling device (0000 -> 0002)
> [ 8.696138] audit: type=1400 audit(1667665884.700:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=926 comm="apparmor_parser"
> [ 8.696141] audit: type=1400 audit(1667665884.700:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=926 comm="apparmor_parser"
> [ 8.704333] snd_hda_intel 0000:0b:00.1: bound 0000:0b:00.0 (ops nv50_audio_component_bind_ops [nouveau])
> [ 8.708797] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input15
> [ 8.708903] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input16
> [ 8.708936] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input17
> [ 8.708965] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input18
> [ 8.708994] input: HDA NVidia HDMI/DP,pcm=10 as /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input19
> [ 8.709032] input: HDA NVidia HDMI/DP,pcm=11 as /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input20
> [ 8.709065] input: HDA NVidia HDMI/DP,pcm=12 as /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input21
> [ 10.776280] nouveau 0000:0b:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
> [ 3275.720190] nouveau 0000:0b:00.0: therm: temperature (90 C) hit the 'fanboost' threshold

one thing which might help to figure out what's going on would be to
know the output of `lspci -t` and `grep .
/sys/bus/pci/devices/*/power/control`

> [ 9426.768449] nvidia-gpu 0000:0b:00.3: Unable to change power state from D3hot to D0, device inaccessible
> [ 9427.889387] nvidia-gpu 0000:0b:00.3: i2c timeout error ffffffff
> timothy@localhost:~>
>

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Nouveau] Fans ramping up randomly when idle
2022-11-09 23:44 ` Karol Herbst
@ 2022-11-10 14:51 ` Timothy Madden
0 siblings, 0 replies; 9+ messages in thread
From: Timothy Madden @ 2022-11-10 14:51 UTC (permalink / raw)
To: nouveau

On 11/10/22 01:44, Karol Herbst wrote:
>> [ 10.776280] nouveau 0000:0b:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
>> [ 3275.720190] nouveau 0000:0b:00.0: therm: temperature (90 C) hit the 'fanboost' threshold
>
> one thing which might help to figure out what's going on would be to
> know the output of `lspci -t` and `grep .
> /sys/bus/pci/devices/*/power/control`

localhost:/home/timothy # lspci -t
-[0000:00]-+-00.0
           +-00.2
           +-01.0
           +-01.1-[01]----00.0
           +-01.3-[02-09]--+-00.0
           |               +-00.1
           |               \-00.2-[03-09]--+-00.0-[04]--
           |                               +-01.0-[05]--
           |                               +-02.0-[06]----00.0
           |                               +-03.0-[07]----00.0
           |                               +-04.0-[08]--
           |                               \-09.0-[09]----00.0
           +-02.0
           +-03.0
           +-03.1-[0a]--+-00.0
           |            \-00.1
           +-03.2-[0b]--+-00.0
           |            +-00.1
           |            +-00.2
           |            \-00.3
           +-04.0
           +-07.0
           +-07.1-[0c]--+-00.0
           |            +-00.2
           |            \-00.3
           +-08.0
           +-08.1-[0d]--+-00.0
           |            +-00.2
           |            \-00.3
           +-14.0
           +-14.3
           +-18.0
           +-18.1
           +-18.2
           +-18.3
           +-18.4
           +-18.5
           +-18.6
           \-18.7
localhost:/home/timothy # grep . /sys/bus/pci/devices/*/power/control
/sys/bus/pci/devices/0000:00:00.0/power/control:on
/sys/bus/pci/devices/0000:00:00.2/power/control:on
/sys/bus/pci/devices/0000:00:01.0/power/control:on
/sys/bus/pci/devices/0000:00:01.1/power/control:auto
/sys/bus/pci/devices/0000:00:01.3/power/control:auto
/sys/bus/pci/devices/0000:00:02.0/power/control:on
/sys/bus/pci/devices/0000:00:03.0/power/control:on
/sys/bus/pci/devices/0000:00:03.1/power/control:auto
/sys/bus/pci/devices/0000:00:03.2/power/control:auto
/sys/bus/pci/devices/0000:00:04.0/power/control:on
/sys/bus/pci/devices/0000:00:07.0/power/control:on
/sys/bus/pci/devices/0000:00:07.1/power/control:auto
/sys/bus/pci/devices/0000:00:08.0/power/control:on
/sys/bus/pci/devices/0000:00:08.1/power/control:auto
/sys/bus/pci/devices/0000:00:14.0/power/control:on
/sys/bus/pci/devices/0000:00:14.3/power/control:on
/sys/bus/pci/devices/0000:00:18.0/power/control:on
/sys/bus/pci/devices/0000:00:18.1/power/control:on
/sys/bus/pci/devices/0000:00:18.2/power/control:on
/sys/bus/pci/devices/0000:00:18.3/power/control:on
/sys/bus/pci/devices/0000:00:18.4/power/control:on
/sys/bus/pci/devices/0000:00:18.5/power/control:on
/sys/bus/pci/devices/0000:00:18.6/power/control:on
/sys/bus/pci/devices/0000:00:18.7/power/control:on
/sys/bus/pci/devices/0000:01:00.0/power/control:on
/sys/bus/pci/devices/0000:02:00.0/power/control:on
/sys/bus/pci/devices/0000:02:00.1/power/control:on
/sys/bus/pci/devices/0000:02:00.2/power/control:auto
/sys/bus/pci/devices/0000:03:00.0/power/control:auto
/sys/bus/pci/devices/0000:03:01.0/power/control:auto
/sys/bus/pci/devices/0000:03:02.0/power/control:auto
/sys/bus/pci/devices/0000:03:03.0/power/control:auto
/sys/bus/pci/devices/0000:03:04.0/power/control:auto
/sys/bus/pci/devices/0000:03:09.0/power/control:auto
/sys/bus/pci/devices/0000:06:00.0/power/control:auto
/sys/bus/pci/devices/0000:07:00.0/power/control:on
/sys/bus/pci/devices/0000:09:00.0/power/control:on
/sys/bus/pci/devices/0000:0a:00.0/power/control:auto
/sys/bus/pci/devices/0000:0a:00.1/power/control:auto
/sys/bus/pci/devices/0000:0b:00.0/power/control:on
/sys/bus/pci/devices/0000:0b:00.1/power/control:auto
/sys/bus/pci/devices/0000:0b:00.2/power/control:auto
/sys/bus/pci/devices/0000:0b:00.3/power/control:auto
/sys/bus/pci/devices/0000:0c:00.0/power/control:on
/sys/bus/pci/devices/0000:0c:00.2/power/control:on
/sys/bus/pci/devices/0000:0c:00.3/power/control:on
/sys/bus/pci/devices/0000:0d:00.0/power/control:on
/sys/bus/pci/devices/0000:0d:00.2/power/control:on
/sys/bus/pci/devices/0000:0d:00.3/power/control:auto
localhost:/home/timothy #

^ permalink raw reply [flat|nested] 9+ messages in thread
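The `power/control` listing above covers every PCI device; for the runtime power management question only the four functions of the card on bus 0b matter. In the output above, 0b:00.0 is set to `on` while 0b:00.1 through 0b:00.3 are `auto`. A minimal sketch for narrowing the view to one slot and also showing the current runtime PM state follows. The function name `pci_runtime_pm` and its optional sysfs-root argument are made up for illustration; `power/control` and `power/runtime_status` are the standard sysfs attributes.

```shell
#!/bin/sh
# pci_runtime_pm SLOT [SYSFS_ROOT]   (hypothetical helper)
# For every function of SLOT (e.g. 0000:0b:00) print the runtime PM
# policy (power/control: "on" or "auto") and the current state
# (power/runtime_status, e.g. "active" or "suspended").
# A "?" is printed when an attribute cannot be read.
pci_runtime_pm() {
    slot=$1
    root=${2:-/sys/bus/pci/devices}
    for dev in "$root"/"$slot".*; do
        [ -d "$dev/power" ] || continue
        control=$(cat "$dev/power/control" 2>/dev/null || echo "?")
        status=$(cat "$dev/power/runtime_status" 2>/dev/null || echo "?")
        printf '%s control=%s runtime_status=%s\n' \
            "$(basename "$dev")" "$control" "$status"
    done
}
```

Calling `pci_runtime_pm 0000:0b:00` before and after the fans ramp up would show whether any of the sub-devices changed state around the time of the "Unable to change power state from D3hot to D0" message. This only reads sysfs and changes nothing.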
end of thread, other threads:[~2023-05-04 12:32 UTC | newest]

Thread overview: 9+ messages
2022-11-05 19:31 [Nouveau] Fans ramping up randomly when idle Timothy Madden
2022-11-07 11:41 ` Karol Herbst
2022-11-09 22:50 ` Ajay Gupta
2022-11-10 15:22 ` Timothy Madden
2022-11-10 15:40 ` Timothy Madden
2022-11-10 15:01 ` Timothy Madden
2022-11-10 15:36 ` Karol Herbst
2022-11-09 23:44 ` Karol Herbst
2022-11-10 14:51 ` Timothy Madden