* [Bug 96964] R290X stuck at 100% GPU load / full core clock on non-x86 machines
From: bugzilla-daemon @ 2016-07-17 10:19 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=96964

            Bug ID: 96964
           Summary: R290X stuck at 100% GPU load / full core clock on
                    non-x86 machines
           Product: DRI
           Version: XOrg git
          Hardware: Other
                OS: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: DRM/Radeon
          Assignee: dri-devel@lists.freedesktop.org
          Reporter: kb9vqf@pearsoncomputing.net

Our twin Radeon 290X cards are stuck at 100% GPU load (according to radeontop
and Gallium) and full core clock (according to radeon_pm_info) on non-x86
machines such as our POWER8 compute server.  The identical card does not show
this behaviour on a test x86 machine.

Forcibly crashing the GPU (causing a soft reset) fixes the issue.  Relevant
dmesg output starts at line 4 in this pastebin:
https://bugzilla.kernel.org/show_bug.cgi?id=70651  It is unknown if simply
triggering a soft reset without the GPU crash would also resolve the issue.

I suspect this is related to the x86-specific AtomBIOS option ROM (oprom) code
executing only on x86 machines, so that the related setup is never finalized by
the radeon driver itself on non-x86 machines.  However, this is just an
educated guess.

radeontop output of stuck card:
gpu 100.00%, ee 0.00%, vgt 0.00%, ta 0.00%, sx 0.00%, sh 0.00%, spi 0.00%, sc 0.00%, pa 0.00%, db 0.00%, cb 0.00%

radeontop output of "fixed" card after GPU crash / reset, running 3D app:
gpu 4.17%, ee 0.00%, vgt 0.00%, ta 3.33%, sx 3.33%, sh 0.00%, spi 3.33%, sc 3.33%, pa 0.00%, db 3.33%, cb 3.33%, vram 11.72% 479.87mb

Despite the "100% GPU load" indication, there is no sign of actual load being
placed on the GPU.  3D-intensive applications function 100% correctly with no
apparent performance degradation, so the reading appears to be (a) spurious and
(b) causing the core clock to throttle up needlessly.

-- 
You are receiving this mail because:
You are the assignee for the bug.

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

From: bugzilla-daemon @ 2016-07-17 10:22 UTC
  To: dri-devel

--- Comment #1 from Timothy Pearson <kb9vqf@pearsoncomputing.net> ---
I should note that after "fixing" the GPU, the radeon driver can be unloaded
and loaded repeatedly without the issue reappearing.  However, rebooting the
machine (i.e. hard GPU reset with firmware reload) will cause the issue to
appear, and it will persist until the GPU is "fixed" (crashed) again.

From: bugzilla-daemon @ 2016-07-17 10:28 UTC
  To: dri-devel

--- Comment #2 from Timothy Pearson <kb9vqf@pearsoncomputing.net> ---
Additional information requested:
Kernel 4.6

Issue appears before X is loaded.  Loading X makes no difference.  Terminating
X makes no difference.  Unloading / reloading radeon driver makes no
difference.  Forced hard reset through the radeon_gpu_reset device node makes
no difference.

From: bugzilla-daemon @ 2016-07-17 10:56 UTC
  To: dri-devel

--- Comment #3 from Timothy Pearson <kb9vqf@pearsoncomputing.net> ---
Corrected pastebin:
https://paste.ee/p/Utp5X

From: bugzilla-daemon @ 2016-07-17 12:53 UTC
  To: dri-devel

--- Comment #4 from Vedran Miletić <vedran@miletic.net> ---
Have you confirmed this affecting aarch64 as well?

From: bugzilla-daemon @ 2016-07-17 18:52 UTC
  To: dri-devel

--- Comment #5 from John Bridgman <john.bridgman@amd.com> ---
At the risk of sending things off in the wrong direction, my first thought is
some kind of funky data caching thing when reading GRBM_STATUS using POWER
hardware. 

If bit 31 were always 1 and the other bits were behaving normally then the idea
of being stuck at 100% load would make more sense, but bit 31 stuck at 1 and
all the rest stuck at 0 seems really odd.

From: bugzilla-daemon @ 2016-07-17 19:05 UTC
  To: dri-devel

--- Comment #6 from Timothy Pearson <kb9vqf@pearsoncomputing.net> ---
(In reply to John Bridgman from comment #5)
> At the risk of sending things off in the wrong direction, my first thought
> is some kind of funky data caching thing when reading GRBM_STATUS using
> POWER hardware. 
> 
> If bit 31 were always 1 and the other bits were behaving normally then the
> idea of being stuck at 100% load would make more sense, but bit 31 stuck at
> 1 and all the rest stuck at 0 seems really odd.

If it were a data caching issue, how would the GPU crash / soft reset fix it?

From: bugzilla-daemon @ 2016-07-17 19:05 UTC
  To: dri-devel

--- Comment #7 from Timothy Pearson <kb9vqf@pearsoncomputing.net> ---
(In reply to Vedran Miletić from comment #4)
> Have you confirmed this affecting aarch64 as well?

No, I have not.  It is non-trivial to test this using the systems on this end.

From: bugzilla-daemon @ 2016-07-17 19:53 UTC
  To: dri-devel

--- Comment #8 from John Bridgman <john.bridgman@amd.com> ---
Hold on, there is additional info in the radeon IRC log (and in the OP's head :))
that is not yet in the ticket:

>radeontop output of stuck card:
>gpu 100.00%, ee 0.00%, vgt 0.00%, ta 0.00%, sx 0.00%, sh 0.00%, spi 0.00%, sc 0.00%, pa 0.00%, db 0.00%, cb 0.00%

The above applies only when there is no load on the card; when a 3D app is
running, the gpu bit stays stuck at 1 (100%) but the other bits behave normally.

I think that pretty much eliminates the caching idea.

From: bugzilla-daemon @ 2016-07-17 23:22 UTC
  To: dri-devel

--- Comment #9 from Timothy Pearson <kb9vqf@pearsoncomputing.net> ---
A bit more information:
 * Disabling DPM does not fix the problem (dpm=0 on module load)
 * Using hard reset instead of soft reset just makes a complete mess / host
hang
 * It looks like only the CP block needs to be reset (GPU softreset: 0x00000008
corresponds to RADEON_RESET_CP).
 * After reset DPM is broken, but DPM also breaks after unloading / reloading
the radeon module so this may be a red herring.

From: bugzilla-daemon @ 2016-07-18  3:39 UTC
  To: dri-devel

--- Comment #10 from Timothy Pearson <kb9vqf@pearsoncomputing.net> ---
Created attachment 125126
  --> https://bugs.freedesktop.org/attachment.cgi?id=125126&action=edit
Hack around spurious GPU load indication

This is rather nasty but it does fix the problem.  DPM works perfectly on both
cards with this applied.

From: bugzilla-daemon @ 2016-07-18  4:26 UTC
  To: dri-devel

--- Comment #11 from Timothy Pearson <kb9vqf@pearsoncomputing.net> ---
This bug is also triggered on x86 if the BIOS is set to not execute option ROMs
on installed PCI/PCIe cards.

From: bugzilla-daemon @ 2019-11-19  9:17 UTC
  To: dri-devel

Martin Peres <martin.peres@free.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |MOVED
             Status|NEW                         |RESOLVED

--- Comment #12 from Martin Peres <martin.peres@free.fr> ---
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been
closed from further activity.

You can subscribe and participate further through the new bug through this link
to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/727.
