* [Bug 207673] New: radeon: crash due to over temperature
From: bugzilla-daemon @ 2020-05-10 13:43 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207673

            Bug ID: 207673
           Summary: radeon: crash due to over temperature
           Product: Drivers
           Version: 2.5
    Kernel Version: 5.6.x and previous
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: Video(DRI - non Intel)
          Assignee: drivers_video-dri@kernel-bugs.osdl.org
          Reporter: phil@jpmr.org
        Regression: No

Created attachment 289045
  --> https://bugzilla.kernel.org/attachment.cgi?id=289045&action=edit
kernel log + lspci + glxinfo + patch

The radeon driver crashes when my AMD Cape Verde Pro graphics card
overheats.

On my system, there is no overclocking, and power management is left at its
defaults: power_method dpm, power_dpm_state balanced and
power_dpm_force_performance_level auto.
The GPU is used for display and OpenCL computing.
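
For reference, the three settings named above are sysfs attributes that the
radeon driver exposes under the card's device directory. A minimal Python
sketch for dumping them follows. It assumes the card is card0; the helper name
read_pm_settings and the default path are illustrative and not part of the
report.

```python
from pathlib import Path

# sysfs attributes named in the report
PM_ATTRS = (
    "power_method",
    "power_dpm_state",
    "power_dpm_force_performance_level",
)


def read_pm_settings(device_dir="/sys/class/drm/card0/device"):
    """Return {attribute: value or None} for the power management attributes.

    device_dir is an assumption (card0); pass another directory if the GPU
    is enumerated under a different card number.
    """
    settings = {}
    for name in PM_ATTRS:
        path = Path(device_dir) / name
        try:
            settings[name] = path.read_text().strip()
        except OSError:
            settings[name] = None  # attribute missing or unreadable
    return settings


if __name__ == "__main__":
    for name, value in read_pm_settings().items():
        print(f"{name}: {value}")
```

An attribute that a given driver does not provide is reported as None rather
than raising, so the same sketch can be run on systems where only some of the
files exist.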

The default over-temperature limit in r600_dpm is 120C, which seems to be too
high for this chip/card.
I patched my system to use a 100C limit, and the crashes have stopped.
(I also tried 110C, which is still too high.)
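
To make the units concrete: hwmon sysfs files such as temp1_input report
temperatures in millidegrees Celsius, so the two limits discussed here are
120000 and 100000 in that unit. The Python sketch below is illustrative only
and is not driver code. The constant names, the helper names and the sample
reading are hypothetical, and the hwmon directory has to be supplied by the
caller because its index varies between systems.

```python
from pathlib import Path

DEFAULT_LIMIT_MC = 120 * 1000  # 120C default cited in the report
PATCHED_LIMIT_MC = 100 * 1000  # 100C limit used in the reporter's patch


def read_temp_mc(hwmon_dir):
    """Read temp1_input (millidegrees Celsius) from a hwmon directory."""
    return int((Path(hwmon_dir) / "temp1_input").read_text().strip())


def headroom_c(temp_mc, limit_mc):
    """Degrees Celsius left before the limit; negative means over the limit."""
    return (limit_mc - temp_mc) / 1000.0


if __name__ == "__main__":
    # Made-up sample reading, not a measurement from the report.
    sample_mc = 105 * 1000
    print("headroom vs default limit:", headroom_c(sample_mc, DEFAULT_LIMIT_MC))
    print("headroom vs patched limit:", headroom_c(sample_mc, PATCHED_LIMIT_MC))
```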

Attached are the full kernel log of the crash event, the lspci and glxinfo
output for the graphics card, and the proposed patch.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


* [Bug 207673] radeon: crash due to over temperature
From: bugzilla-daemon @ 2020-05-10 13:46 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207673

--- Comment #1 from phileimer (phil@jpmr.org) ---
Created attachment 289047
  --> https://bugzilla.kernel.org/attachment.cgi?id=289047&action=edit
radeon: lower the high temperature limit

Limit the chip temperature to 100C, instead of 120C.


* [Bug 207673] radeon: crash due to over temperature
From: bugzilla-daemon @ 2020-05-13 12:20 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207673

--- Comment #2 from phileimer (phil@jpmr.org) ---
I can give more information about the over-temperature problem:

* if I keep the 120C limit, the card runs at power level 3 until the driver
crashes

* with a 100C limit, the driver drops to power level 2 after a small
overshoot, i.e. the temperature reaches 103-104C

* once at power level 2, the temperature stabilizes around 96C

* to test further, I decreased the case fan speed; then, even with the 100C
limit, the card keeps running at power level 2 until the driver crashes
at around 112C

So, there seem to be two problems:
* the default 120C limit is clearly too high, at least for this board/chip
* the temperature limit is used to go from power level (PWL) 3 to PWL 2, but
there is no further decrease to a lower PWL (1 or 0) as a safety measure
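
The second point can be illustrated with a small Python model of a policy that
keeps stepping down while the temperature stays over the limit. This is a
hypothetical sketch and not how radeon or amdgpu implement throttling: the
function name, the one-level-per-step rule and the 5C hysteresis are all
assumptions chosen for illustration. Only the 100C limit and the power levels
0 to 3 come from the report.

```python
MAX_LEVEL = 3  # highest power level mentioned in the report


def next_power_level(level, temp_c, limit_c, hysteresis_c=5):
    """Hypothetical throttling policy.

    Step down one level whenever the temperature is at or over the limit,
    all the way to level 0 if needed. Step back up only once the
    temperature has dropped hysteresis_c below the limit.
    """
    if temp_c >= limit_c and level > 0:
        return level - 1
    if temp_c <= limit_c - hysteresis_c and level < MAX_LEVEL:
        return level + 1
    return level


if __name__ == "__main__":
    level = MAX_LEVEL
    # Made-up temperature trace (degrees Celsius) with poor case airflow.
    for temp_c in (98, 103, 108, 112, 104, 97):
        level = next_power_level(level, temp_c, limit_c=100)
        print(f"temp={temp_c}C -> power level {level}")
```

With such a policy the card would not remain at PWL 2 while the temperature
keeps climbing; it would continue down to PWL 1 and then PWL 0 before the
temperature reaches the point where the crash was observed.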


* [Bug 207673] amdgpu/radeon: crash due to over temperature
From: bugzilla-daemon @ 2020-06-22 14:10 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207673

phileimer (phil@jpmr.org) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|radeon: crash due to over   |amdgpu/radeon: crash due to
                   |temperature                 |over temperature


* [Bug 207673] amdgpu/radeon: crash due to over temperature
From: bugzilla-daemon @ 2020-06-22 14:11 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207673

phileimer (phil@jpmr.org) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Kernel Version|5.6.x and previous          |5.6.x, 5.7.x


* [Bug 207673] amdgpu/radeon: crash due to over temperature
From: bugzilla-daemon @ 2020-06-22 14:14 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207673

--- Comment #3 from phileimer (phil@jpmr.org) ---
Created attachment 289807
  --> https://bugzilla.kernel.org/attachment.cgi?id=289807&action=edit
amdgpu: lower the temperature limit to avoid kernel crash


* [Bug 207673] amdgpu/radeon: crash due to over temperature
From: bugzilla-daemon @ 2020-06-22 14:16 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207673

--- Comment #4 from phileimer (phil@jpmr.org) ---
I changed my kernel configuration to use the newer amdgpu driver for this SI
chip instead of the legacy radeon driver.
The same problem occurs: to avoid frequent kernel crashes, I have to apply a
patch that lowers the maximum allowed temperature.


* [Bug 207673] amdgpu/radeon: crash due to over temperature
From: bugzilla-daemon @ 2020-06-27 12:44 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207673

--- Comment #5 from phileimer (phil@jpmr.org) ---
Created attachment 289897
  --> https://bugzilla.kernel.org/attachment.cgi?id=289897&action=edit
amdgpu: kernel log when over temperature crash

