dri-devel Archive on lore.kernel.org
* [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
@ 2020-04-21  9:51 bugzilla-daemon
  2020-04-21  9:57 ` [Bug 207383] " bugzilla-daemon
                   ` (116 more replies)
  0 siblings, 117 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-04-21  9:51 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

            Bug ID: 207383
           Summary: [Regression] 5.7-rc: amdgpu/polaris11 gpf:
                    amdgpu_atomic_commit_tail
           Product: Drivers
           Version: 2.5
    Kernel Version: 5.7-rc1, 5.7-rc2
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: blocking
          Priority: P1
         Component: Video(DRI - non Intel)
          Assignee: drivers_video-dri@kernel-bugs.osdl.org
          Reporter: 1i5t5.duncan@cox.net
        Regression: No

Created attachment 288649
  --> https://bugzilla.kernel.org/attachment.cgi?id=288649&action=edit
kernel config

5.7-rc1 and rc2 regression from kernel 5.6.0

After starting X/plasma on 5.7-rc1 and rc2, the system runs for anywhere from a
few seconds to a few hours, then the display freezes.  The pointer remains
movable and audio continues to play for some seconds, but they eventually stop
as well.  The kernel stays alive at least enough to reboot with SRQ-b; I'm not
sure whether the earlier SRQs have any effect.

Sometimes but not always there's a gpf left in the log, appearing to confirm
it's amdgpu (the -dirty is simply a patch making mounts noatime by default):

Apr 20 03:25:55 h2 kernel: general protection fault, probably for non-canonical
address 0xc1316515e40a92f6: 0000 [#1] SMP
Apr 20 03:25:55 h2 kernel: CPU: 3 PID: 3921 Comm: kworker/u16:5 Tainted: G     
          T 5.7.0-rc2-dirty #194
Apr 20 03:25:55 h2 kernel: Hardware name: Gigabyte Technology Co., Ltd.
GA-990FXA-UD3/GA-990FXA-UD3, BIOS F6 03/30/2012
Apr 20 03:25:55 h2 kernel: Workqueue: events_unbound commit_work
Apr 20 03:25:55 h2 kernel: RIP: 0010:amdgpu_dm_atomic_commit_tail+0x102d/0x1fd8
Apr 20 03:25:55 h2 kernel: Code: 48 89 9d a0 fc ff ff 8b 90 e0 02 00 00 85 d2
0f 85 26 f1 ff ff 48 8b 85 e0 fc ff ff 48 89 85 a0 fc ff ff 48 8b b5 e0 fc ff
ff <80> be b0 01 00 00 01 0f 86 b4 00 00 00 31 c0 48 b9 00 00 00 00 01
Apr 20 03:25:55 h2 kernel: RSP: 0018:ffffc9000216bad0 EFLAGS: 00010286
Apr 20 03:25:55 h2 kernel: RAX: ffff88842a6e1000 RBX: ffff8883d1d5b800 RCX:
ffff8884283db200
Apr 20 03:25:55 h2 kernel: RDX: ffff8884283db2e0 RSI: c1316515e40a92f6 RDI:
0000000000000002
Apr 20 03:25:55 h2 kernel: RBP: ffffc9000216be50 R08: 0000000000000001 R09:
0000000000000001
Apr 20 03:25:55 h2 kernel: R10: 0000000000030000 R11: 0000000000000000 R12:
0000000000000000
Apr 20 03:25:55 h2 kernel: R13: 0000000000000005 R14: ffff88842bb76000 R15:
ffff88841c08cc00
Apr 20 03:25:55 h2 kernel: FS:  0000000000000000(0000)
GS:ffff88842ecc0000(0000) knlGS:0000000000000000
Apr 20 03:25:55 h2 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 20 03:25:55 h2 kernel: CR2: 000078617de4fffc CR3: 000000040ca0e000 CR4:
00000000000406e0
Apr 20 03:25:55 h2 kernel: Call Trace:
Apr 20 03:25:55 h2 kernel:  ? 0xffffffff81000000
Apr 20 03:25:55 h2 kernel:  ? __switch_to_asm+0x34/0x70
Apr 20 03:25:55 h2 kernel:  ? __switch_to_asm+0x40/0x70
Apr 20 03:25:55 h2 kernel:  ? __switch_to_asm+0x34/0x70
Apr 20 03:25:55 h2 kernel:  ? __switch_to_asm+0x40/0x70
Apr 20 03:25:55 h2 kernel:  ? commit_tail+0x8e/0x120
Apr 20 03:25:55 h2 kernel:  ? process_one_work+0x1a9/0x300
Apr 20 03:25:55 h2 kernel:  ? worker_thread+0x45/0x3b8
Apr 20 03:25:55 h2 kernel:  ? kthread+0xf3/0x130
Apr 20 03:25:55 h2 kernel:  ? process_one_work+0x300/0x300
Apr 20 03:25:55 h2 kernel:  ? __kthread_create_on_node+0x180/0x180
Apr 20 03:25:55 h2 kernel:  ? ret_from_fork+0x22/0x40
Apr 20 03:25:55 h2 kernel: ---[ end trace 33869116def8e8ad ]---
Apr 20 03:25:55 h2 kernel: RIP: 0010:amdgpu_dm_atomic_commit_tail+0x102d/0x1fd8
Apr 20 03:25:55 h2 kernel: Code: 48 89 9d a0 fc ff ff 8b 90 e0 02 00 00 85 d2
0f 85 26 f1 ff ff 48 8b 85 e0 fc ff ff 48 89 85 a0 fc ff ff 48 89 85 a0 fc ff
ff 48 8b b5 e0 fc ff ff <80> be b0 01 00 00 01 0f 86 b4 00 00 00 31 c0 48 b9 00
00 00 00 01
Apr 20 03:25:55 h2 kernel: RSP: 0018:ffffc9000216bad0 EFLAGS: 00010286
Apr 20 03:25:55 h2 kernel: RAX: ffff88842a6e1000 RBX: ffff8883d1d5b800 RCX:
ffff8884283db200
Apr 20 03:25:55 h2 kernel: RDX: ffff8884283db2e0 RSI: c1316515e40a92f6 RDI:
0000000000000002
Apr 20 03:25:55 h2 kernel: RBP: ffffc9000216be50 R08: 0000000000000001 R09:
0000000000000001
Apr 20 03:25:55 h2 kernel: R10: 0000000000030000 R11: 0000000000000000 R12:
0000000000000000
Apr 20 03:25:55 h2 kernel: R13: 0000000000000005 R14: ffff88842bb76000 R15:
ffff88841c08cc00
Apr 20 03:25:55 h2 kernel: FS:  0000000000000000(0000)
GS:ffff88842ecc0000(0000) knlGS:0000000000000000
Apr 20 03:25:55 h2 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 20 03:25:55 h2 kernel: CR2: 000078617de4fffc CR3: 000000040ca0e000 CR4:
00000000000406e0

That's it.  Nothing in the log since boot before, and the next entry is after
reboot.
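
A note on the "non-canonical address" wording in the fault above: on x86-64
with 48-bit virtual addresses, bits 63 through 47 of a pointer must all be
equal.  The following is a minimal sketch of that check, assuming 48-bit
addressing; the function name is illustrative and not taken from the kernel.
It shows why the RSI value in the dump faults while RAX looks like an ordinary
kernel pointer.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if addr is a canonical x86-64 virtual address.

    Assumes va_bits-wide virtual addresses (48 is an assumption here):
    bits 63 down to va_bits-1 must be all zeros or all ones.
    """
    upper = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return upper in (0, all_ones)


# Register values copied from the dump above.
rsi = 0xC1316515E40A92F6  # the address named in the fault message
rax = 0xFFFF88842A6E1000

print(is_canonical(rsi))  # False
print(is_canonical(rax))  # True
```

A value like the one in RSI is typically what a stale or overwritten pointer
looks like, though the dump alone does not say where it came from.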

gcc version 9.3.0 on Gentoo.  AMD FX-6100 on the Gigabyte board in the log
above.
xorg-server 1.20.8, mesa 20.0.4, xf86-video-amdgpu 19.1.0, linux-firmware
20200413

kernel config attached

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


* [Bug 207383] [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
@ 2020-04-21  9:57 ` bugzilla-daemon
  2020-04-21 10:04 ` bugzilla-daemon
                   ` (115 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-04-21  9:57 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #1 from Duncan (1i5t5.duncan@cox.net) ---
Created attachment 288651
  --> https://bugzilla.kernel.org/attachment.cgi?id=288651&action=edit
automated boot-time dmesg dump


* [Bug 207383] [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
  2020-04-21  9:57 ` [Bug 207383] " bugzilla-daemon
@ 2020-04-21 10:04 ` bugzilla-daemon
  2020-04-23  4:59 ` bugzilla-daemon
                   ` (114 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-04-21 10:04 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

Duncan (1i5t5.duncan@cox.net) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Regression|No                          |Yes

--- Comment #2 from Duncan (1i5t5.duncan@cox.net) ---
I build kernels from git and can apply testing patches as necessary.  I may
bisect, but haven't yet; it'd take a while and may not be reliable, as the
trigger time is variable.  Plus of course I can't do anything I don't want
interrupted while attempting to bisect.  So I'm hoping that the polaris-11
detail, the log, and the pin to v5.6..v5.7-rc1 are enough.


* [Bug 207383] [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
  2020-04-21  9:57 ` [Bug 207383] " bugzilla-daemon
  2020-04-21 10:04 ` bugzilla-daemon
@ 2020-04-23  4:59 ` bugzilla-daemon
  2020-04-27 19:24 ` bugzilla-daemon
                   ` (113 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-04-23  4:59 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #3 from Duncan (1i5t5.duncan@cox.net) ---
CCed the two from MAINTAINERS bugzi would let me add.  It wouldn't let me add
amd-gfx@ or david1.zhou@, and Alex's gmail address according to bugzi isn't
what's in MAINTAINERS.


* [Bug 207383] [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (2 preceding siblings ...)
  2020-04-23  4:59 ` bugzilla-daemon
@ 2020-04-27 19:24 ` bugzilla-daemon
  2020-04-27 19:42 ` bugzilla-daemon
                   ` (112 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-04-27 19:24 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

Duncan (1i5t5.duncan@cox.net) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Kernel Version|5.7-rc1, 5.7-rc2            |5.7-rc1, 5.7-rc2, 5.7-rc3

--- Comment #4 from Duncan (1i5t5.duncan@cox.net) ---
Still there with 5.7-rc3, altho /maybe/ it's not triggering as quickly.  Took
13 hours to trigger this time and I'd almost decided it was fixed as it had
been triggering sooner than that, but could simply be luck.  Rebooted to rc3
again.  We'll see...


* [Bug 207383] [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (3 preceding siblings ...)
  2020-04-27 19:24 ` bugzilla-daemon
@ 2020-04-27 19:42 ` bugzilla-daemon
  2020-04-27 19:43 ` bugzilla-daemon
                   ` (111 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-04-27 19:42 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #5 from Duncan (1i5t5.duncan@cox.net) ---
Well, that didn't take long.  Four konsole terminals open to do (various
aspects of) a system update.  Just a few seconds after I entered the
(git-based) sync command, display-FREEZE!

Back on 5.6.0 now.  I'll probably test again with rc4, perhaps earlier if I see
a set of drm/amdgpu updates in mainline git.


* [Bug 207383] [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (4 preceding siblings ...)
  2020-04-27 19:42 ` bugzilla-daemon
@ 2020-04-27 19:43 ` bugzilla-daemon
  2020-05-01  8:20 ` bugzilla-daemon
                   ` (110 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-04-27 19:43 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #6 from Alex Deucher (alexdeucher@gmail.com) ---
Can you bisect?


* [Bug 207383] [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (5 preceding siblings ...)
  2020-04-27 19:43 ` bugzilla-daemon
@ 2020-05-01  8:20 ` bugzilla-daemon
  2020-05-01  8:28 ` bugzilla-daemon
                   ` (109 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-05-01  8:20 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #7 from Duncan (1i5t5.duncan@cox.net) ---
Bisecting, but it's slow going when the bug can take 12+ hours to trigger, and
even then I can't be sure a "good" is actually so.

So far (at 5.6.0-01623-g12ab316ce, ~7 bisect steps to go, under 100 commits
"after"), the first few steps were all "good".  The one I'm currently testing
hasn't shown itself "bad" in terms of this bug yet, but it does display a nasty
buffer-sync issue, with off-frame read-outs and eventual firefox crashes when
trying to play 4k@30fps youtube.  That's a bit of a struggle with this kit but
usually OK (it's 4k@60fps that's the real problem in firefox/chromium, tho it
tends to be fine without the browser overhead in mpv/smplayer/vlc).

But I hadn't seen that issue with the full 5.7-rc1 thru rc3, so it was
apparently already fixed by rc1.  And there have been no incidents of this bug
(full system or full graphics lockups with a gpf in
amdgpu_dm_atomic_commit_tail) during the bisect yet.
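
As a rough cross-check of the "~7 bisect steps" figure: bisection halves the
candidate range at each step, so about log2(N) tests remain for N commits.
The sketch below is only that arithmetic.  The helper names are hypothetical,
and the 12-hour soak time per step is borrowed from the estimate above, not
measured.

```python
def bisect_steps(candidates: int) -> int:
    """Worst-case number of tests to isolate one commit among `candidates`."""
    if candidates < 1:
        raise ValueError("need at least one candidate commit")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without floats.
    return (candidates - 1).bit_length()


def soak_hours(candidates: int, hours_per_step: float) -> float:
    """Total wall-clock hours if every step needs a full soak test."""
    return bisect_steps(candidates) * hours_per_step


print(bisect_steps(100))      # 7
print(soak_hours(100, 12.0))  # 84.0
```

So even under 100 commits can mean several days of testing when each verdict
needs half a day of uptime.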


* [Bug 207383] [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (6 preceding siblings ...)
  2020-05-01  8:20 ` bugzilla-daemon
@ 2020-05-01  8:28 ` bugzilla-daemon
  2020-05-02 16:03 ` bugzilla-daemon
                   ` (108 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-05-01  8:28 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #8 from Duncan (1i5t5.duncan@cox.net) ---
Hmm.  Don't think I mentioned on this bug yet that I'm running dual 4K TVs as
monitors.  So it may trigger only with dual displays, and two 4K displays means
it's pumping a lot more pixels than most cards, too.


* [Bug 207383] [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (7 preceding siblings ...)
  2020-05-01  8:28 ` bugzilla-daemon
@ 2020-05-02 16:03 ` bugzilla-daemon
  2020-05-03 15:10 ` bugzilla-daemon
                   ` (107 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-05-02 16:03 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #9 from Duncan (1i5t5.duncan@cox.net) ---
I'm not there yet but it's starting to look like a possibly dud bisect:
everything showing good so far.  Maybe I didn't wait long enough for the bug to
trigger at some step and I'm running up the wrong side of the tree, or maybe
it's not drm after all (I thought I'd try something new and limit the paths to
drivers/gpu/drm/ and include/drm/, but that may have been a critical mistake). 
Right now there are only 3-4 even remotely reasonable candidates (out of 14 left
to test... the rest being mediatek or similar):

4064b9827
Peter Xu
mm: allow VM_FAULT_RETRY for multiple times

6bfef2f91
Jason Gunthorpe
mm/hmm: remove HMM_FAULT_SNAPSHOT

17ffdc482
Christoph Hellwig
mm: simplify device private page handling in hmm_range_fault

And maybe (but I'm neither EFI nor 32-bit)

72e0ef0e5
Mikel Rychliski
PCI: Use ioremap(), not phys_to_virt() for platform ROM


Meanwhile, user-side I've gotten vulkan/mesa/etc updates recently.  I'm
considering checking out linus-master/HEAD again, doing a pull, and seeing if
by chance either the last week's kernel updates or the user-side updates have
eliminated the problem.  If not I can come back and finish the bisect (or try
just reverting those four on current linus-master/HEAD), before starting a new
clean bisect if necessary.  Just saved the bisect log and current pointer.
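
The worry above, that a "good" verdict may only mean the wait was too short,
can be put in rough numbers.  The following is a sketch under a strong
assumption that is not from this report: that the freeze arrives at random
with a constant average rate (an exponential waiting time).  The function
names and the mean-time-to-failure values are made up for illustration.

```python
import math


def p_false_good(wait_hours: float, mean_hours_to_failure: float) -> float:
    """Chance that a bad kernel survives the whole wait (exponential model)."""
    return math.exp(-wait_hours / mean_hours_to_failure)


def wait_needed(max_false_good: float, mean_hours_to_failure: float) -> float:
    """Hours to wait so a bad kernel slips through with at most this chance."""
    return -mean_hours_to_failure * math.log(max_false_good)


# Hypothetical: failures average one per 12 hours, and each step waits 12 hours.
print(round(p_false_good(12.0, 12.0), 3))  # 0.368
# Hypothetical: failures average one per 6 hours, target 5% false "good".
print(round(wait_needed(0.05, 6.0), 1))    # 18.0
```

Under this model a single wait equal to the mean time to failure still mislabels
a bad kernel as "good" more than a third of the time, which is one way a
bisect can end up on the wrong side of the tree.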


* [Bug 207383] [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (8 preceding siblings ...)
  2020-05-02 16:03 ` bugzilla-daemon
@ 2020-05-03 15:10 ` bugzilla-daemon
  2020-05-05  4:23 ` bugzilla-daemon
                   ` (106 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-05-03 15:10 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #10 from Duncan (1i5t5.duncan@cox.net) ---
(In reply to Duncan from comment #9)
> I'm not there yet but it's starting to look like a possibly dud bisect:
> everything showing good so far

Good but not ideal news!

I did get an apparent graphics crash at the bisect-point above, but it didn't
dump anything in the log this time and behavior was a bit different than usual
for this bug -- audio continued playing longer and I was able to confirm SRQ-E
termination via audio and cpu-fan, and SRQ-S sync via sata-activity LED.

So I'm not sure whether it's the same bug or a different one; I'm bisecting
pre-rc1 after all, so other bugs aren't unlikely.

So I've rebooted to the same bisect step to try again, with any luck to get that
gpf dump in the log confirming it's the same bug this time.

If it *is* the same bug, it looks like I avoided a dud bisect after all, just
happened to be all good until almost the very end, I'm only a few steps away
from pinning it down, and it's almost certainly one of the commits listed in
comment #9. =:^)

> Meanwhile, user-side I've gotten vulkan/mesa/etc updates recently.  I'm
> considering checking out linus-master/HEAD again, doing a pull, and seeing
> if by chance either the last week's kernel updates or the user-side updates
> have eliminated the problem.

Been there, done that, still had the bug, with gpf-log-dump confirmation.  Back
to the bisect.


* [Bug 207383] [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (9 preceding siblings ...)
  2020-05-03 15:10 ` bugzilla-daemon
@ 2020-05-05  4:23 ` bugzilla-daemon
  2020-05-06 17:46 ` bugzilla-daemon
                   ` (105 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-05-05  4:23 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #11 from Duncan (1i5t5.duncan@cox.net) ---
(In reply to Duncan from comment #10)
> I did get an apparent graphics crash at the bisect-point above, but it
> didn't dump anything in the log this time

Got a gpf dump with amdgpu_atomic_commit_tail, confirming it's the same bug. 
Still a couple bisect steps to go, but the EFI candidate's out now, leaving
only three (plus mediatek and nouveau, and an amdgpu that says it was doc fix
only), and the current round is testing between 406 and the 6bf/17f pair so I
should eliminate at least one of the three this round:

4064b9827
Peter Xu
mm: allow VM_FAULT_RETRY for multiple times

6bfef2f91
Jason Gunthorpe
mm/hmm: remove HMM_FAULT_SNAPSHOT

17ffdc482
Christoph Hellwig
mm: simplify device private page handling in hmm_range_fault


* [Bug 207383] [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (10 preceding siblings ...)
  2020-05-05  4:23 ` bugzilla-daemon
@ 2020-05-06 17:46 ` bugzilla-daemon
  2020-05-06 22:06 ` bugzilla-daemon
                   ` (104 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-05-06 17:46 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #12 from Duncan (1i5t5.duncan@cox.net) ---
OK, bisect says:

4064b9827
Peter Xu
mm: allow VM_FAULT_RETRY for multiple times

... which came in via akpm and touches drivers/gpu/drm/ttm/ttm_bo_vm.c.

But I'm not entirely confident in that result ATM.  Among other things I had
set ZSWAP_DEFAULT_ON for 5.7, and I had zswap configured but not active
previously, so that could be it too.  I'm not typically under enough memory
pressure to trigger it, but...

Luckily a git show -R generated patch still applies cleanly on current master
(5.7.0-rc4-00029-gdc56c5acd, tho I've only built it not rebooted to test it
yet) so I can test both the commit-revert patch and the changed zswap options
now.

So I'm confirming still.

But perhaps take another look at that commit and see if there's some way
allowing unlimited VM_FAULT_RETRY could leave drm, at least on amdgpu,
eternally stalled.  That does seem to fit the symptoms, whether the cause is
unlimited VM_FAULT_RETRY or not.


* [Bug 207383] [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (11 preceding siblings ...)
  2020-05-06 17:46 ` bugzilla-daemon
@ 2020-05-06 22:06 ` bugzilla-daemon
  2020-06-03  0:04 ` [Bug 207383] [Regression] 5.7 " bugzilla-daemon
                   ` (103 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-05-06 22:06 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #13 from Duncan (1i5t5.duncan@cox.net) ---
Well, so much for /that/ bisect!  Took me a few hours but then had the graphics
stall twice in a few minutes... with the above commit reverted AND with memory
compression off.

So it's back to square one, except I know that my originally chosen new memory
compression options aren't involved.  New bisect time.


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (12 preceding siblings ...)
  2020-05-06 22:06 ` bugzilla-daemon
@ 2020-06-03  0:04 ` bugzilla-daemon
  2020-06-21  7:01 ` bugzilla-daemon
                   ` (102 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-06-03  0:04 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

Duncan (1i5t5.duncan@cox.net) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[Regression] 5.7-rc:        |[Regression] 5.7
                   |amdgpu/polaris11 gpf:       |amdgpu/polaris11 gpf:
                   |amdgpu_atomic_commit_tail   |amdgpu_atomic_commit_tail

--- Comment #14 from Duncan (1i5t5.duncan@cox.net) ---
Unfortunately the bug's still there in 5.7 release. =:^(

Not properly bisected yet.  After the first failure I needed something
reasonably stable for a while, as I had about a dozen live-git kde-plasma
userspace bugs to track down and report.  Kernel 5.6.0-07388-gf365ab31e (built
May 6) has been exactly that, stable for me for weeks now, and the bug
definitely triggered in 5.7-rc1, so it's gotta be between those.  With the
unrelated userspace side mostly fixed now, and this kernelspace bug now known
to remain unfixed in the normal development cycle, maybe I can get back to
bisecting it again.


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (13 preceding siblings ...)
  2020-06-03  0:04 ` [Bug 207383] [Regression] 5.7 " bugzilla-daemon
@ 2020-06-21  7:01 ` bugzilla-daemon
  2020-06-22 15:20 ` bugzilla-daemon
                   ` (101 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-06-21  7:01 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #15 from Duncan (1i5t5.duncan@cox.net) ---
Bug's in v5.8-rc1-226-g4333a9b0b too.


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (14 preceding siblings ...)
  2020-06-21  7:01 ` bugzilla-daemon
@ 2020-06-22 15:20 ` bugzilla-daemon
  2020-06-22 17:44 ` bugzilla-daemon
                   ` (100 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-06-22 15:20 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

rtmasura+kernel@hotmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rtmasura+kernel@hotmail.com

--- Comment #16 from rtmasura+kernel@hotmail.com ---
Reporting I've had the same issue with kernel 5.7.2 and 5.7.4:

Jun 22 07:10:24 abiggun kernel: general protection fault, probably for
non-canonical address 0xd3d74027d6d8fad4: 0000 [#1] PREEMPT SMP NOPTI
Jun 22 07:10:24 abiggun kernel: CPU: 0 PID: 32680 Comm: kworker/u12:9 Not
tainted 5.7.4-arch1-1 #1
Jun 22 07:10:24 abiggun kernel: Hardware name: System manufacturer System
Product Name/Crosshair IV Formula, BIOS 1102    08/24/2010
Jun 22 07:10:24 abiggun kernel: Workqueue: events_unbound commit_work
[drm_kms_helper]
Jun 22 07:10:24 abiggun kernel: RIP:
0010:amdgpu_dm_atomic_commit_tail+0x2aa/0x2310 [amdgpu]
Jun 22 07:10:24 abiggun kernel: Code: 4f 08 8b 81 e0 02 00 00 41 83 c5 01 44 39
e8 0f 87 46 ff ff ff 48 83 bd f0 fc ff ff 00 0f 84 03 01 00 00 48 8b bd f0 f>
Jun 22 07:10:24 abiggun kernel: RSP: 0018:ffffb0cc421abaf8 EFLAGS: 00010286
Jun 22 07:10:24 abiggun kernel: RAX: 0000000000000006 RBX: ffffa21b8e16c400
RCX: ffffa21cab9c8800
Jun 22 07:10:24 abiggun kernel: RDX: ffffa21ca7326200 RSI: ffffffffc10de1a0
RDI: d3d74027d6d8fad4
Jun 22 07:10:24 abiggun kernel: RBP: ffffb0cc421abe60 R08: 0000000000000001
R09: 0000000000000001
Jun 22 07:10:24 abiggun kernel: R10: 00000000000002be R11: 00000000001c57a1
R12: 0000000000000000
Jun 22 07:10:24 abiggun kernel: R13: 0000000000000006 R14: ffffa218e4959800
R15: ffffa219e5b12780
Jun 22 07:10:24 abiggun kernel: FS:  0000000000000000(0000)
GS:ffffa21cbfc00000(0000) knlGS:0000000000000000
Jun 22 07:10:24 abiggun kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Jun 22 07:10:24 abiggun kernel: CR2: 00007fec2b573008 CR3: 0000000344bd8000
CR4: 00000000000006f0
Jun 22 07:10:24 abiggun kernel: Call Trace:
Jun 22 07:10:24 abiggun kernel:  ? cpumask_next_and+0x19/0x20
Jun 22 07:10:24 abiggun kernel:  ? update_sd_lb_stats.constprop.0+0x115/0x8f0
Jun 22 07:10:24 abiggun kernel:  ? __update_load_avg_cfs_rq+0x277/0x2f0
Jun 22 07:10:24 abiggun kernel:  ? update_load_avg+0x58f/0x660
Jun 22 07:10:24 abiggun kernel:  ? update_curr+0x108/0x1f0
Jun 22 07:10:24 abiggun kernel:  ? __switch_to_asm+0x34/0x70
Jun 22 07:10:24 abiggun kernel:  ? __switch_to_asm+0x40/0x70
Jun 22 07:10:24 abiggun kernel:  ? __switch_to_asm+0x34/0x70
Jun 22 07:10:24 abiggun kernel:  ? __switch_to_asm+0x40/0x70
Jun 22 07:10:24 abiggun kernel:  ? rescuer_thread+0x3f0/0x3f0
Jun 22 07:10:24 abiggun kernel:  commit_tail+0x94/0x130 [drm_kms_helper]
Jun 22 07:10:24 abiggun kernel:  process_one_work+0x1da/0x3d0
Jun 22 07:10:24 abiggun kernel:  ? rescuer_thread+0x3f0/0x3f0
Jun 22 07:10:24 abiggun kernel:  worker_thread+0x4d/0x3e0
Jun 22 07:10:24 abiggun kernel:  ? rescuer_thread+0x3f0/0x3f0
Jun 22 07:10:24 abiggun kernel:  kthread+0x13e/0x160
Jun 22 07:10:24 abiggun kernel:  ? __kthread_bind_mask+0x60/0x60
Jun 22 07:10:24 abiggun kernel:  ret_from_fork+0x22/0x40
Jun 22 07:10:24 abiggun kernel: Modules linked in: snd_usb_audio
snd_usbmidi_lib snd_rawmidi hid_plantronics mc vhost_net vhost tap vhost_iotlb
snd_seq_dumm>
Jun 22 07:10:24 abiggun kernel:  crypto_simd cryptd glue_helper xts dm_crypt
hid_generic usbhid hid raid456 libcrc32c crc32c_generic async_raid6_recov
async>
Jun 22 07:10:24 abiggun kernel: ---[ end trace 536cfe34e3c36293 ]---
Jun 22 07:10:24 abiggun kernel: RIP:
0010:amdgpu_dm_atomic_commit_tail+0x2aa/0x2310 [amdgpu]
Jun 22 07:10:24 abiggun kernel: Code: 4f 08 8b 81 e0 02 00 00 41 83 c5 01 44 39
e8 0f 87 46 ff ff ff 48 83 bd f0 fc ff ff 00 0f 84 03 01 00 00 48 8b bd f0 f>
Jun 22 07:10:25 abiggun kernel: RSP: 0018:ffffb0cc421abaf8 EFLAGS: 00010286
Jun 22 07:10:25 abiggun kernel: RAX: 0000000000000006 RBX: ffffa21b8e16c400
RCX: ffffa21cab9c8800
Jun 22 07:10:25 abiggun kernel: RDX: ffffa21ca7326200 RSI: ffffffffc10de1a0
RDI: d3d74027d6d8fad4
Jun 22 07:10:25 abiggun kernel: RBP: ffffb0cc421abe60 R08: 0000000000000001
R09: 0000000000000001
Jun 22 07:10:25 abiggun kernel: R10: 00000000000002be R11: 00000000001c57a1
R12: 0000000000000000
Jun 22 07:10:25 abiggun kernel: R13: 0000000000000006 R14: ffffa218e4959800
R15: ffffa219e5b12780
Jun 22 07:10:25 abiggun kernel: FS:  0000000000000000(0000)
GS:ffffa21cbfc00000(0000) knlGS:0000000000000000
Jun 22 07:10:25 abiggun kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Jun 22 07:10:25 abiggun kernel: CR2: 00007fec2b573008 CR3: 0000000344bd8000
CR4: 00000000000006f0
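
As a reading aid for dumps like the one above, here is a minimal sketch (not
part of the original report) of the canonical-address rule behind the
"non-canonical address" wording that general protection fault reports in this
thread use.  It assumes 48-bit virtual addresses, which is an assumption about
the CPU, and the function name is hypothetical.  Under that assumption RDI in
the dump (d3d74027d6d8fad4) fails the check, while an ordinary kernel pointer
such as RBX (ffffa21b8e16c400) passes.

```python
def is_canonical(addr: int, virt_bits: int = 48) -> bool:
    """Return True if addr is a canonical x86_64 virtual address.

    An address is canonical when bits 63 down to (virt_bits - 1) are all
    copies of the same bit, i.e. all zeros or all ones.
    virt_bits=48 is an assumption; the real width depends on the CPU.
    """
    # Keep only the bits that must all be equal (bits 63..47 for 48-bit).
    upper = addr >> (virt_bits - 1)
    all_ones = (1 << (64 - virt_bits + 1)) - 1
    return upper == 0 or upper == all_ones


# Register values copied from the dump above.
rdi = 0xD3D74027D6D8FAD4
rbx = 0xFFFFA21B8E16C400
print(f"RDI canonical: {is_canonical(rdi)}")
print(f"RBX canonical: {is_canonical(rbx)}")
```

A pointer with seemingly random upper bits, like RDI here, is what one would
expect from a corrupted or already-freed pointer rather than a plain NULL
dereference, though the dump alone does not prove which.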

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (15 preceding siblings ...)
  2020-06-22 15:20 ` bugzilla-daemon
@ 2020-06-22 17:44 ` bugzilla-daemon
  2020-06-22 17:57 ` bugzilla-daemon
                   ` (99 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-06-22 17:44 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #17 from Duncan (1i5t5.duncan@cox.net) ---
(In reply to rtmasura+kernel from comment #16)
> Reporting I've had the same issue with kernel 5.7.2 and 5.7.4:

Thanks!

> Jun 22 07:10:24 abiggun kernel: Hardware name: System manufacturer System
> Product Name/Crosshair IV Formula, BIOS 1102    08/24/2010

So socket AM3 from 2010, slightly older than my AM3+ from 2012.  Both are
PCIe-2.0.

What's your CPU and GPU?

As above, my GPU is Polaris11 (AMD Radeon RX 460, arctic-islands/gcn4 series,
pcie-3), with an AMD FX-6100 CPU.

Guessing the bug is gpu-series code specific or there'd be more people howling,
so what you're running for gpu is significant.  It's /possible/ it may be
specific to people running pcie mismatch, as well (note my pcie-3 gpu card on a
pcie-2 mobo).

> Jun 22 07:10:24 abiggun kernel: Workqueue: events_unbound commit_work
> [drm_kms_helper]
> 0010:amdgpu_dm_atomic_commit_tail+0x2aa/0x2310 [amdgpu]

That's the bit of the dump I understand, similar to mine...
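
For anyone comparing dumps across reports, the RIP line can be split into its
parts mechanically.  A minimal sketch, not from the original thread: the regex
and the parse_rip name are illustrative assumptions, and the only format
knowledge used is that the kernel prints the location as
symbol+offset/size [module].

```python
import re

# segment:function+0xOFFSET/0xSIZE, optionally followed by " [module]"
RIP_RE = re.compile(
    r"(?P<cs>[0-9a-f]{4}):(?P<func>[\w.]+)\+0x(?P<offset>[0-9a-f]+)"
    r"/0x(?P<size>[0-9a-f]+)(?: \[(?P<module>\w+)\])?"
)


def parse_rip(text):
    """Extract function, offset, size and module from an oops RIP value.

    Returns None when the text does not contain a recognisable location.
    """
    m = RIP_RE.search(text)
    if m is None:
        return None
    return {
        "function": m["func"],
        "offset": int(m["offset"], 16),  # bytes into the function
        "size": int(m["size"], 16),      # total size of the function
        "module": m["module"],           # None for built-in code
    }


print(parse_rip("0010:amdgpu_dm_atomic_commit_tail+0x2aa/0x2310 [amdgpu]"))
```

Two reports that give the same function and offset on the same kernel build
are very likely faulting at the same instruction, which is a stronger match
than the function name alone.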

If you can find a quicker/more-reliable way to trigger the crash, it'd sure be
helpful for bisecting.  Also, if you've run a bad kernel long enough to tell
(not just gone back to 5.6 after finding 5.7 bad), does it reliably dump to the
log before the reboot for you?

I'm back to a veerrry--sloowww second bisect attempt.  My current kernel, for
instance, has crashed three times now, so it's obviously bugged, but nothing
has been dumped in the log on the way down yet, so I can't guarantee it's the
_same_ bug (the bisect is in pre-rc1 code, so chances of a different bug are
definitely non-zero).  Given the bad results on the first bisect, I'm trying to
confirm each bisect-bad with a log-dump and each bisect-good with at least 3-4
days without a crash.  But this one's in between right now: frequent crashing
but no log-dump to confirm it's the same bug.
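
The confirmation rule described above can be written down as a small decision
function.  This is only a sketch of that rule as stated in the comment: the
names are hypothetical, and the 3-day default is the lower end of the "3-4
days" mentioned, not a recommended value.

```python
from enum import Enum


class Verdict(Enum):
    BAD = "bad"              # safe to run: git bisect bad
    GOOD = "good"            # safe to run: git bisect good
    UNDECIDED = "undecided"  # keep testing this kernel


def bisect_verdict(log_dump_seen, crashes, days_without_crash, good_days=3.0):
    """Classify a bisect step using the rule from the comment above.

    bad:  only once a log-dump confirms it is the same bug
    good: only after good_days of uptime with no crash at all
    """
    if log_dump_seen:
        return Verdict.BAD
    if crashes == 0 and days_without_crash >= good_days:
        return Verdict.GOOD
    # Crashes without a log-dump, or too little uptime so far.
    return Verdict.UNDECIDED


# The "in between" kernel: three crashes, nothing in the log yet.
print(bisect_verdict(log_dump_seen=False, crashes=3, days_without_crash=0))
```

The cost of the rule is visible in the UNDECIDED case: a kernel that crashes
without leaving a dump cannot be marked either way, which is why each step can
take days.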



* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (16 preceding siblings ...)
  2020-06-22 17:44 ` bugzilla-daemon
@ 2020-06-22 17:57 ` bugzilla-daemon
  2020-06-22 19:36 ` bugzilla-daemon
                   ` (98 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-06-22 17:57 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #18 from rtmasura+kernel@hotmail.com ---
lspci:
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890 Northbridge
only single slot PCI-e GFX Hydra part (rev 02)
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD/ATI] RD890S/RD990 I/O Memory
Management Unit (IOMMU)
00:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890/RD9x0/RX980
PCI to PCI bridge (PCI Express GFX port 0)
00:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890/RD9x0/RX980
PCI to PCI bridge (PCI Express GPP Port 0)
00:07.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890/RD9x0/RX980
PCI to PCI bridge (PCI Express GPP Port 3)
00:0b.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890/RD990 PCI to
PCI bridge (PCI Express GFX2 port 0)
00:0d.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890/RD9x0/RX980
PCI to PCI bridge (PCI Express GPP2 Port 0)
00:11.0 RAID bus controller: Advanced Micro Devices, Inc. [AMD/ATI]
SB7x0/SB8x0/SB9x0 SATA Controller [RAID5 mode] (rev 40)
00:12.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI]
SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:12.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI]
SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:13.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI]
SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:13.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI]
SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 SMBus Controller
(rev 42)
00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 Azalia
(Intel HDA) (rev 40)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0
LPC host controller (rev 40)
00:14.4 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 PCI to PCI
Bridge (rev 40)
00:14.5 USB controller: Advanced Micro Devices, Inc. [AMD/ATI]
SB7x0/SB8x0/SB9x0 USB OHCI2 Controller
00:16.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI]
SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:16.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI]
SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor
HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor
Address Map
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor
DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor
Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor
Link Control
02:00.0 PCI bridge: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express
Gen 2 (5.0 GT/s) Switch [ExpressLane] (rev bb)
03:04.0 PCI bridge: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express
Gen 2 (5.0 GT/s) Switch [ExpressLane] (rev bb)
03:05.0 PCI bridge: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express
Gen 2 (5.0 GT/s) Switch [ExpressLane] (rev bb)
03:06.0 PCI bridge: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express
Gen 2 (5.0 GT/s) Switch [ExpressLane] (rev bb)
03:08.0 PCI bridge: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express
Gen 2 (5.0 GT/s) Switch [ExpressLane] (rev bb)
03:09.0 PCI bridge: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express
Gen 2 (5.0 GT/s) Switch [ExpressLane] (rev bb)
04:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection
(rev 01)
04:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection
(rev 01)
06:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection
(rev 01)
06:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection
(rev 01)
07:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection
(rev 01)
07:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection
(rev 01)
09:00.0 VGA compatible controller: NVIDIA Corporation GP104GL [Quadro P4000]
(rev a1)
09:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller
(rev a1)
0a:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev
03)
0b:00.0 SATA controller: JMicron Technology Corp. JMB363 SATA/IDE Controller
(rev 03)
0b:00.1 IDE interface: JMicron Technology Corp. JMB363 SATA/IDE Controller (rev
03)
0c:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1470 (rev c3)
0d:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1471
0e:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega
10 XL/XT [Radeon RX Vega 56/64] (rev c3)
0e:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 HDMI Audio
[Radeon Vega 56/64]

A few notes on that: The AMD Vega56 is used for this PC, the Quadro P4000 is
disabled on my system and passed through to VMs. 

I haven't found any way to trigger it. Seems completely random. Sat down this
morning to update a VM (not the one with the nvidia passthrough) and it froze,
wasn't any real graphical things going on other than normal KDE stuff. 


lscpu:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   48 bits physical, 48 bits virtual
CPU(s):                          6
On-line CPU(s) list:             0-5
Thread(s) per core:              1
Core(s) per socket:              6
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       AuthenticAMD
CPU family:                      16
Model:                           10
Model name:                      AMD Phenom(tm) II X6 1090T Processor
Stepping:                        0
CPU MHz:                         3355.192
BogoMIPS:                        6421.46
Virtualization:                  AMD-V
L1d cache:                       384 KiB
L1i cache:                       384 KiB
L2 cache:                        3 MiB
L3 cache:                        6 MiB
NUMA node0 CPU(s):               0-5
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Not affected
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and
__user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full AMD retpoline, STIBP
disabled, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep
mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext
fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl
nonstop_tsc cpuid extd_apicid aperfmperf pni monitor cx16 popcnt lahf_lm
cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
skinit wdt cpb hw_pstate vmmcall npt lbrv svm_lock nrip_save pausefilter


I would be happy to help with any testing, just let me know what information
you need.



* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (17 preceding siblings ...)
  2020-06-22 17:57 ` bugzilla-daemon
@ 2020-06-22 19:36 ` bugzilla-daemon
  2020-06-22 20:00 ` bugzilla-daemon
                   ` (97 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-06-22 19:36 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #19 from Duncan (1i5t5.duncan@cox.net) ---
(In reply to rtmasura+kernel from comment #18)
> 09:00.0 VGA compatible controller: NVIDIA Corporation GP104GL [Quadro P4000]
> (rev a1)

> 0e:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
> Vega 10 XL/XT [Radeon RX Vega 56/64] (rev c3)

> A few notes on that: The AMD Vega56 is used for this PC, the Quadro P4000 is
> disabled on my system and passed through to VMs. 

So newer graphics, Vega56/gcn5 compared to my gcn4.

No VMs at all here so that can be excluded as a factor (unless it's a minor
trigger similar to my zooming or video play).

> I haven't found any way to trigger it. Seems completely random. Sat down
> this morning to update a VM (not the one with the nvidia passthrough) and it
> froze, wasn't any real graphical things going on other than normal KDE
> stuff. 

KDE/Plasma here too.  I think kwin exercises the opengl a bit more than some
WMs, in part because it's a compositor as well.  The bug most often hits here
when playing video or using kwin's zoom effect, which exercise the graphics a
bit.

So if kde/kwin is the main trigger, that would shrink the population hitting
the bug, and with both of us running it, it may well be a factor.

> Model name:                      AMD Phenom(tm) II X6 1090T Processor

Newer graphics (gcn5 vs. my gcn4) but an older cpu (phenom ii vs. my fx) than
here.

So we know gcn4 and gcn5 are affected, and the common factors so far, and thus
possible triggers, are pcie3 cards on a pcie2 bus and kde/kwin.

> I would be happy to help with any testing, just let me know what information
> you need.

If you happen to run anything besides KDE/Plasma on X, duplicating (or failing
to duplicate) the bug on non-kde and/or on wayland would be useful info.  I
only run KDE Plasma on X here.  Well, that and the CLI (on the amdgpu drm
framebuffer), which I use more than some, but not enough that I'd have expected
to see the bug there, and I haven't.



* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (18 preceding siblings ...)
  2020-06-22 19:36 ` bugzilla-daemon
@ 2020-06-22 20:00 ` bugzilla-daemon
  2020-06-23 15:36 ` bugzilla-daemon
                   ` (96 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-06-22 20:00 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #20 from rtmasura+kernel@hotmail.com ---
I have XFCE4 installed as well.  I'll give it a test and let you know in 24
hours; a GPF should have happened by then.



* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (19 preceding siblings ...)
  2020-06-22 20:00 ` bugzilla-daemon
@ 2020-06-23 15:36 ` bugzilla-daemon
  2020-06-23 23:41 ` bugzilla-daemon
                   ` (95 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-06-23 15:36 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #21 from rtmasura+kernel@hotmail.com ---
OK. I've uninstalled the vast majority of KDE and am using a vanilla XFCE4.
It's been about 12 hours on 5.7.4-arch1-1 and I have yet to have a crash. It is
looking like it may be something with KDE.



* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (20 preceding siblings ...)
  2020-06-23 15:36 ` bugzilla-daemon
@ 2020-06-23 23:41 ` bugzilla-daemon
  2020-06-24  8:55 ` bugzilla-daemon
                   ` (94 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-06-23 23:41 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #22 from Duncan (1i5t5.duncan@cox.net) ---
(In reply to rtmasura+kernel from comment #21)
> OK. I've uninstalled the vast majority of KDE and am using a vanilla XFCE4.
> It's been about 12 hours on 5.7.4-arch1-1 and I have yet to have a crash. It
> is looking like it may be something with KDE.

Note that it is possible to run kwin (kwin_x11 being the actual executable) on
another desktop, or conversely, a different WM on plasma.  To run kwin and make
it replace the existing WM, you'd simply type kwin_x11 --replace in the xfce
runner or a terminal window.  (It can be done from a different VT as well, but
then you have to feed kwin the display information too.)  Presumably other WMs
have a similar command-line option.

I've never actually done it on a non-plasma desktop (tho I run live-git plasma
and frameworks, so I must always be prepared to restart it or various other
plasma components, to the point I have non-kde-invoked shortcuts set up to do
it there), but I /think/ kwin would continue to use the configuration set up in
kde: the various window rules, configured kwin keyboard shortcuts and effects,
etc.

That could prove whether it's actually kwin triggering or not (tho it's a
kernel bug regardless), tho I suspect the proof is academic at this point given
that you've demonstrated that the trigger does appear to be kde/plasma related,
at least.  IMO kwin triggering is a reasonably safe assumption given that.  But
it does explain why the bug isn't widely reported: with plasma the apparent
biggest trigger and the bug limited to specific, now-older generations of
hardware, few people, even of those running the latest kernels, are going to
see it.

Meanwhile, I actually got a log-dump on the 4th crash of the kernel at that
bisect step, confirming it is indeed this bug, and have advanced a bisect step.
But git says I still have ~11 steps, 1000+ commits, so the range is still far
too large to start trying to pick out candidate buggy commits from the
remainder.  Slow going indeed.  At this rate a full bisect and fix could well
come after the 5.8 release, giving us two full bad release cycles and kernels
before a fix.  Not good. =:^(
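
The relation between the commit count and the step count quoted above follows
from bisection halving the candidate range at each step.  A rough sketch of
that arithmetic, under the assumption of a clean halving each time; it is not
git's exact estimate, and the function name is made up for illustration.

```python
def estimated_bisect_steps(commits: int) -> int:
    """Steps needed to narrow `commits` candidates down to one by halving.

    Equivalent to ceil(log2(commits)), computed with integers only.
    """
    if commits < 1:
        raise ValueError("need at least one candidate commit")
    return (commits - 1).bit_length()


for n in (1000, 1500, 2000):
    print(n, "commits ->", estimated_bisect_steps(n), "steps")
```

By this estimate, "~11 steps" corresponds to somewhere between 1025 and 2048
remaining commits, consistent with the "1000+" figure.  At several days per
confirmed step, that is where the multi-week timeline comes from.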



* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (21 preceding siblings ...)
  2020-06-23 23:41 ` bugzilla-daemon
@ 2020-06-24  8:55 ` bugzilla-daemon
  2020-06-27  4:37 ` bugzilla-daemon
                   ` (93 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-06-24  8:55 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #23 from rtmasura+kernel@hotmail.com ---
Yeah, over 24 hours and still stable. And glad I could help, I rarely have
anything I can give back to the community.

And wow, that much work. Truly, we all do appreciate your work, but I don't
think most of us understand how much. Thank you from all of us :)



* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (22 preceding siblings ...)
  2020-06-24  8:55 ` bugzilla-daemon
@ 2020-06-27  4:37 ` bugzilla-daemon
  2020-06-27  4:38 ` bugzilla-daemon
                   ` (92 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-06-27  4:37 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #24 from rtmasura+kernel@hotmail.com ---
I've been up and stable on XFCE4 since that last message, but just crashed
today with a bit of a different error. This happened after I turned on a screen
tear fix:

xfconf-query -c xfwm4 -p /general/vblank_mode -s glx

I also didn't reboot to activate it, I just hot loaded it with:

xfwm4 --replace --vblank=glx &

Don't think that changes anything, but just in case. Not sure if it's related,
but I had a game idling on my monitor while I was cooking, and it was the first
time I had played it. It was Battle for Wesnoth. Anyway, here's the log:


Jun 26 21:08:03 abiggun kernel: general protection fault, probably for
non-canonical address 0x3b963e011fb9f84: 0000 [#1] PREEMPT SMP NOPTI
Jun 26 21:08:03 abiggun kernel: CPU: 4 PID: 362093 Comm: kworker/u12:1 Not
tainted 5.7.4-arch1-1 #1
Jun 26 21:08:03 abiggun kernel: Hardware name: System manufacturer System
Product Name/Crosshair IV Formula, BIOS 1102    08/24/2010
Jun 26 21:08:03 abiggun kernel: Workqueue: events_unbound commit_work
[drm_kms_helper]
Jun 26 21:08:03 abiggun kernel: RIP:
0010:amdgpu_dm_atomic_commit_tail+0x2aa/0x2310 [amdgpu]
Jun 26 21:08:03 abiggun kernel: Code: 4f 08 8b 81 e0 02 00 00 41 83 c5 01 44 39
e8 0f 87 46 ff ff ff 48 83 bd f0 fc ff ff 00 0f 84 03 01 00 00 48 8b bd f0 fc
ff ff <80> bf b0 01 00 00 01 0f 86 ac 00 00 00 48 b9 00 00 00 00 01 00 00
Jun 26 21:08:03 abiggun kernel: RSP: 0018:ffff993cc4037af8 EFLAGS: 00010206
Jun 26 21:08:03 abiggun kernel: RAX: 0000000000000006 RBX: ffff931ae09c0800
RCX: ffff931bfe478000
Jun 26 21:08:03 abiggun kernel: RDX: ffff931bf2dd2600 RSI: ffffffffc10a51a0
RDI: 03b963e011fb9f84
Jun 26 21:08:03 abiggun kernel: RBP: ffff993cc4037e60 R08: 0000000000000001
R09: 0000000000000001
Jun 26 21:08:03 abiggun kernel: R10: 0000000000000018 R11: 0000000000000018
R12: 0000000000000000
Jun 26 21:08:03 abiggun kernel: R13: 0000000000000006 R14: ffff931bd0450c00
R15: ffff931b3574dc80
Jun 26 21:08:03 abiggun kernel: FS:  0000000000000000(0000)
GS:ffff931c3fd00000(0000) knlGS:0000000000000000
Jun 26 21:08:03 abiggun kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Jun 26 21:08:03 abiggun kernel: CR2: 00007fe602dc0008 CR3: 0000000418080000
CR4: 00000000000006e0
Jun 26 21:08:03 abiggun kernel: Call Trace:
Jun 26 21:08:03 abiggun kernel:  ? tomoyo_write_self+0x100/0x1d0
Jun 26 21:08:03 abiggun kernel:  ? __switch_to_asm+0x34/0x70
Jun 26 21:08:03 abiggun kernel:  ? __switch_to_asm+0x40/0x70
Jun 26 21:08:03 abiggun kernel:  ? __switch_to_asm+0x34/0x70
Jun 26 21:08:03 abiggun kernel:  ? __switch_to_asm+0x40/0x70
Jun 26 21:08:03 abiggun kernel:  ? rescuer_thread+0x3f0/0x3f0
Jun 26 21:08:03 abiggun kernel:  commit_tail+0x94/0x130 [drm_kms_helper]
Jun 26 21:08:03 abiggun kernel:  process_one_work+0x1da/0x3d0
Jun 26 21:08:03 abiggun kernel:  ? rescuer_thread+0x3f0/0x3f0
Jun 26 21:08:03 abiggun kernel:  worker_thread+0x4d/0x3e0
Jun 26 21:08:03 abiggun kernel:  ? rescuer_thread+0x3f0/0x3f0
Jun 26 21:08:03 abiggun kernel:  kthread+0x13e/0x160
Jun 26 21:08:03 abiggun kernel:  ? __kthread_bind_mask+0x60/0x60
Jun 26 21:08:03 abiggun kernel:  ret_from_fork+0x22/0x40
Jun 26 21:08:03 abiggun kernel: Modules linked in: snd_usb_audio
snd_usbmidi_lib snd_rawmidi snd_seq_device mc hid_plantronics macvtap macvlan
vhost_net vhost tap vhost_iotlb fuse xt_CHECKSUM xt_MASQUERADE xt_conntrack
ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle
iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ebtable_filter
ebtables ip6table_filter ip6_tables iptable_filter bridge stp llc rfkill tun
lm92 hwmon_vid input_leds amdgpu squashfs nouveau loop edac_mce_amd kvm_amd ccp
rng_core mxm_wmi snd_hda_codec_via gpu_sched snd_hda_codec_generic
snd_hda_codec_hdmi ledtrig_audio kvm ttm snd_hda_intel snd_intel_dspcfg
wmi_bmof snd_hda_codec drm_kms_helper snd_hda_core pcspkr sp5100_tco k10temp
snd_hwdep snd_pcm cec i2c_piix4 joydev rc_core mousedev igb syscopyarea
snd_timer sysfillrect snd sysimgblt i2c_algo_bit dca fb_sys_fops soundcore
asus_atk0110 evdev mac_hid wmi drm crypto_user agpgart ip_tables x_tables ext4
crc16 mbcache jbd2 ecb crypto_simd cryptd
Jun 26 21:08:03 abiggun kernel:  glue_helper xts hid_generic usbhid hid
dm_crypt raid456 libcrc32c crc32c_generic async_raid6_recov async_memcpy
async_pq async_xor xor async_tx ohci_pci raid6_pq md_mod ehci_pci ehci_hcd
ohci_hcd xhci_pci xhci_hcd ata_generic pata_acpi pata_jmicron vfio_pci
irqbypass vfio_virqfd vfio_iommu_type1 vfio dm_mod
Jun 26 21:08:03 abiggun kernel: ---[ end trace 4e7c8ad2195077a2 ]---
Jun 26 21:08:03 abiggun kernel: RIP:
0010:amdgpu_dm_atomic_commit_tail+0x2aa/0x2310 [amdgpu]
Jun 26 21:08:03 abiggun kernel: Code: 4f 08 8b 81 e0 02 00 00 41 83 c5 01 44 39
e8 0f 87 46 ff ff ff 48 83 bd f0 fc ff ff 00 0f 84 03 01 00 00 48 8b bd f0 fc
ff ff <80> bf b0 01 00 00 01 0f 86 ac 00 00 00 48 b9 00 00 00 00 01 00 00
Jun 26 21:08:03 abiggun kernel: RSP: 0018:ffff993cc4037af8 EFLAGS: 00010206
Jun 26 21:08:03 abiggun kernel: RAX: 0000000000000006 RBX: ffff931ae09c0800
RCX: ffff931bfe478000
Jun 26 21:08:03 abiggun kernel: RDX: ffff931bf2dd2600 RSI: ffffffffc10a51a0
RDI: 03b963e011fb9f84
Jun 26 21:08:03 abiggun kernel: RBP: ffff993cc4037e60 R08: 0000000000000001
R09: 0000000000000001
Jun 26 21:08:03 abiggun kernel: R10: 0000000000000018 R11: 0000000000000018
R12: 0000000000000000
Jun 26 21:08:03 abiggun kernel: R13: 0000000000000006 R14: ffff931bd0450c00
R15: ffff931b3574dc80
Jun 26 21:08:03 abiggun kernel: FS:  0000000000000000(0000)
GS:ffff931c3fd00000(0000) knlGS:0000000000000000
Jun 26 21:08:03 abiggun kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Jun 26 21:08:03 abiggun kernel: CR2: 00007fe602dc0008 CR3: 0000000418080000
CR4: 00000000000006e0
Jun 26 21:08:23 abiggun Thunar[3946]: 2020-06-27T04:08:23.137Z - debug:
[REPOSITORY] fetch request: /cytrus.json
Jun 26 21:08:23 abiggun Thunar[3946]: 2020-06-27T04:08:23.138Z - debug:
[REPOSITORY] request: /cytrus.json
Jun 26 21:08:23 abiggun Thunar[3946]: { repository:
'https://launcher.cdn.ankama.com' }
Jun 26 21:08:23 abiggun Thunar[3946]: 2020-06-27T04:08:23.155Z - debug:
[REPOSITORY] fetchJson: Parsing data for /cytrus.json
Jun 26 21:08:23 abiggun Thunar[3946]: 2020-06-27T04:08:23.156Z - debug:
[REGISTRY] update
Jun 26 21:08:23 abiggun Thunar[3946]: 2020-06-27T04:08:23.156Z - debug:
[REGISTRY] Parse repository Data
Jun 26 21:08:40 abiggun audit[241624]: ANOM_ABEND auid=1000 uid=1000 gid=985
ses=2 subj==unconfined pid=241624 comm="GpuWatchdog"
exe="/opt/google/chrome/chrome" sig=11 res=1
Jun 26 21:08:40 abiggun kernel: GpuWatchdog[241650]: segfault at 0 ip
0000556ef31897ad sp 00007f11132a95d0 error 6 in chrome[556eeeadc000+785b000]
Jun 26 21:08:40 abiggun kernel: Code: 00 79 09 48 8b 7d b0 e8 f1 95 6c fe c7 45
b0 aa aa aa aa 0f ae f0 41 8b 84 24 e0 00 00 00 89 45 b0 48 8d 7d b0 e8 f3 5a
ba fb <c7> 04 25 00 00 00 00 37 13 00 00 48 83 c4 38 5b 41 5c 41 5d 41 5e
Jun 26 21:08:40 abiggun audit: BPF prog-id=71 op=LOAD
Jun 26 21:08:40 abiggun audit: BPF prog-id=72 op=LOAD
Jun 26 21:08:40 abiggun systemd[1]: Started Process Core Dump (PID 362491/UID
0).
Jun 26 21:08:40 abiggun audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295
ses=4294967295 subj==unconfined msg='unit=systemd-coredump@4-362491-0
comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=?
res=success'
Jun 26 21:08:45 abiggun systemd-coredump[362492]: Process 241624 (chrome) of
user 1000 dumped core.

                                                  Stack trace of thread 241650:
                                                  #0  0x0000556ef31897ad n/a
(chrome + 0x62b07ad)
                                                  #1  0x0000556ef17e5c93 n/a
(chrome + 0x490cc93)
                                                  #2  0x0000556ef17f7199 n/a
(chrome + 0x491e199)
                                                  #3  0x0000556ef17ad6cf n/a
(chrome + 0x48d46cf)
                                                  #4  0x0000556ef17f795c n/a
(chrome + 0x491e95c)
                                                  #5  0x0000556ef17d08b9 n/a
(chrome + 0x48f78b9)
                                                  #6  0x0000556ef180ea1b n/a
(chrome + 0x4935a1b)
                                                  #7  0x0000556ef184ae78 n/a
(chrome + 0x4971e78)
                                                  #8  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #9  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 241624:
                                                  #0  0x00007f1117c2a05f __poll
(libc.so.6 + 0xf505f)
                                                  #1  0x00007f11190c663b n/a
(libxcb.so.1 + 0xc63b)
                                                  #2  0x00007f11190c845b
xcb_wait_for_special_event (libxcb.so.1 + 0xe45b)
                                                  #3  0x00007f11128cd381 n/a
(libGLX_mesa.so.0 + 0x57381)
                                                  #4  0x00007f11128c132b n/a
(libGLX_mesa.so.0 + 0x4b32b)
                                                  #5  0x0000556ef295706e n/a
(chrome + 0x5a7e06e)
                                                  #6  0x0000556ef2955cb8 n/a
(chrome + 0x5a7ccb8)
                                                  #7  0x0000556ef17e5c93 n/a
(chrome + 0x490cc93)
                                                  #8  0x0000556ef17f7199 n/a
(chrome + 0x491e199)
                                                  #9  0x0000556ef17ad999 n/a
(chrome + 0x48d4999)
                                                  #10 0x0000556ef17f795c n/a
(chrome + 0x491e95c)
                                                  #11 0x0000556ef17d08b9 n/a
(chrome + 0x48f78b9)
                                                  #12 0x0000556ef59a9ed9 n/a
(chrome + 0x8ad0ed9)
                                                  #13 0x0000556ef13329b4 n/a
(chrome + 0x44599b4)
                                                  #14 0x0000556ef139addd n/a
(chrome + 0x44c1ddd)
                                                  #15 0x0000556ef1330901 n/a
(chrome + 0x4457901)
                                                  #16 0x0000556eeede80ce
ChromeMain (chrome + 0x1f0f0ce)
                                                  #17 0x00007f1117b5c002
__libc_start_main (libc.so.6 + 0x27002)
                                                  #18 0x0000556eeeadc6aa _start
(chrome + 0x1c036aa)

                                                  Stack trace of thread 241636:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 241642:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 241644:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 241643:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 359981:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 241651:
                                                  #0  0x00007f1117c34f3e
epoll_wait (libc.so.6 + 0xfff3e)
                                                  #1  0x0000556ef192ea1a n/a
(chrome + 0x4a55a1a)
                                                  #2  0x0000556ef192c227 n/a
(chrome + 0x4a53227)
                                                  #3  0x0000556ef18588d0 n/a
(chrome + 0x497f8d0)
                                                  #4  0x0000556ef17f795c n/a
(chrome + 0x491e95c)
                                                  #5  0x0000556ef17d08b9 n/a
(chrome + 0x48f78b9)
                                                  #6  0x0000556ef1809624 n/a
(chrome + 0x4930624)
                                                  #7  0x0000556ef180ea1b n/a
(chrome + 0x4935a1b)
                                                  #8  0x0000556ef184ae78 n/a
(chrome + 0x4971e78)
                                                  #9  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #10 0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 241655:
                                                  #0  0x00007f1119245158
pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0 + 0x10158)
                                                  #1  0x0000556ef1846f60 n/a
(chrome + 0x496df60)
                                                  #2  0x0000556ef18475b0 n/a
(chrome + 0x496e5b0)
                                                  #3  0x0000556ef17ad716 n/a
(chrome + 0x48d4716)
                                                  #4  0x0000556ef17f795c n/a
(chrome + 0x491e95c)
                                                  #5  0x0000556ef17d08b9 n/a
(chrome + 0x48f78b9)
                                                  #6  0x0000556ef180ea1b n/a
(chrome + 0x4935a1b)
                                                  #7  0x0000556ef184ae78 n/a
(chrome + 0x4971e78)
                                                  #8  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #9  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 241656:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 242011:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 241646:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 241657:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 241658:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 351071:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 351072:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 359972:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 241659:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 361357:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 241647:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 241652:
                                                  #0  0x00007f1119245158
pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0 + 0x10158)
                                                  #1  0x0000556ef1846f60 n/a
(chrome + 0x496df60)
                                                  #2  0x0000556ef18475b0 n/a
(chrome + 0x496e5b0)
                                                  #3  0x0000556ef1809c6a n/a
(chrome + 0x4930c6a)
                                                  #4  0x0000556ef180a54c n/a
(chrome + 0x493154c)
                                                  #5  0x0000556ef180a234 n/a
(chrome + 0x4931234)
                                                  #6  0x0000556ef184ae78 n/a
(chrome + 0x4971e78)
                                                  #7  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #8  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 241653:
                                                  #0  0x00007f1117c34f3e
epoll_wait (libc.so.6 + 0xfff3e)
                                                  #1  0x0000556ef192ea1a n/a
(chrome + 0x4a55a1a)
                                                  #2  0x0000556ef192c227 n/a
(chrome + 0x4a53227)
                                                  #3  0x0000556ef18588d0 n/a
(chrome + 0x497f8d0)
                                                  #4  0x0000556ef17f795c n/a
(chrome + 0x491e95c)
                                                  #5  0x0000556ef17d08b9 n/a
(chrome + 0x48f78b9)
                                                  #6  0x0000556ef180ea1b n/a
(chrome + 0x4935a1b)
                                                  #7  0x0000556ef184ae78 n/a
(chrome + 0x4971e78)
                                                  #8  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #9  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 241660:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 241661:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 241662:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 241665:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 241666:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 241663:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 241667:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 241664:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 241851:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 241852:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 241853:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 245560:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x0000556ef1846e48 n/a
(chrome + 0x496de48)
                                                  #2  0x0000556ef18475d9 n/a
(chrome + 0x496e5d9)
                                                  #3  0x0000556ef184739f n/a
(chrome + 0x496e39f)
                                                  #4  0x0000556ef17ad751 n/a
(chrome + 0x48d4751)
                                                  #5  0x0000556ef17f795c n/a
(chrome + 0x491e95c)
                                                  #6  0x0000556ef17d08b9 n/a
(chrome + 0x48f78b9)
                                                  #7  0x0000556ef180ea1b n/a
(chrome + 0x4935a1b)
                                                  #8  0x0000556ef184ae78 n/a
(chrome + 0x4971e78)
                                                  #9  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #10 0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 241862:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 361354:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 361028:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 241902:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 361345:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 361358:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 241645:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 241638:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 241639:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 241640:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 241641:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 241750:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 241855:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 309100:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 359991:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

                                                  Stack trace of thread 241637:
                                                  #0  0x00007f1119244e32
pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a
(radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a
(radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422
start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3
__clone (libc.so.6 + 0xffbf3)

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (23 preceding siblings ...)
  2020-06-27  4:37 ` bugzilla-daemon
@ 2020-06-27  4:38 ` bugzilla-daemon
  2020-06-27  5:16 ` bugzilla-daemon
                   ` (91 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-06-27  4:38 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #25 from rtmasura+kernel@hotmail.com ---
Same kernel (5.7.4) and I'll try to reproduce it, and if it happens I'll turn
off the screen tear and try to reproduce again

Let me know if there's anything I can provide you

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (24 preceding siblings ...)
  2020-06-27  4:38 ` bugzilla-daemon
@ 2020-06-27  5:16 ` bugzilla-daemon
  2020-06-27  6:08 ` bugzilla-daemon
                   ` (90 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-06-27  5:16 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #26 from rtmasura+kernel@hotmail.com ---
and just got another crash, only watching a video in chrome. Guess the chrome
bit at the end might be more important than I thought

I *think* I've turned off the glx for xfwm.. we'll see. My computer has been
showing video in chrome every day without issues before today. I hadn't updated
since last week either, no changes in the system.

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (25 preceding siblings ...)
  2020-06-27  5:16 ` bugzilla-daemon
@ 2020-06-27  6:08 ` bugzilla-daemon
  2020-06-27  7:07 ` bugzilla-daemon
                   ` (89 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-06-27  6:08 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #27 from rtmasura+kernel@hotmail.com ---
and another crash, chrome's good at causing them (watching youtube). Used -s ""
for the setting which I think should set it to 'auto', and what I assumed was
default. I've changed that to -s "off" to see if that helps.

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (26 preceding siblings ...)
  2020-06-27  6:08 ` bugzilla-daemon
@ 2020-06-27  7:07 ` bugzilla-daemon
  2020-06-27 22:26 ` bugzilla-daemon
                   ` (88 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-06-27  7:07 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #28 from Duncan (1i5t5.duncan@cox.net) ---
(In reply to rtmasura+kernel from comment #27)
> and another crash, chrome's good at causing them (watching youtube). Used -s
> "" for the setting which I think should set it to 'auto', and what I assumed
> was default. I've changed that to -s "off" to see if that helps.

You just added those updates as I was typing a comment pointing out that
chrome/chromium shows up in your bug; bugzilla warned of a mid-air collision!
Chrom(e|ium) has new vulkan accel code and very likely exercises some of the
same relatively new amdgpu kernel code kwin does, so both of them triggering
the bug wouldn't surprise me at all.

As it happens I switched back to firefox during the 5.6 kernel cycle, so
haven't seen chromium's interaction with the (kernel 5.7) bug myself, but once
I saw it in that trace I said to myself I bet that's his trigger!


FWIW I advanced a couple more bisect steps pretty quickly as it was triggering
as I tried to complete system updates (which on gentoo of course means building
the packages), but then I hit an apparently good kernel, and uptime says 3 days
now, something I've not seen in a while!  Only thing is, I finished those
updates and they were pretty calm the next couple days, so I've not been
stressing the system to the same extent, either.  Given the problems I got
myself into the first bisect run, I'm going to run on this kernel a bit longer
before I do that bisect good to advance a step.  If it reaches a week and I've
done either a good system update or some heavy 4k@60 youtube on firefox, I'll
call it good, but I'm not ready to yet.

The good news is, in a couple more bisect steps I'll be down to some practical
number of remaining commits to report the range here, and if they have the
time, a dev with a practiced eye should be able to narrow it down by say 3/4
(two steps ahead of my bisect), leaving something actually practical to examine
closer.  After that it'll be past the point of my bisect being the only
bottleneck, if it's big enough to get dev priority time, of course.  If not,
I'll just have to keep plugging away at the bisect...

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (27 preceding siblings ...)
  2020-06-27  7:07 ` bugzilla-daemon
@ 2020-06-27 22:26 ` bugzilla-daemon
  2020-06-28  1:12 ` bugzilla-daemon
                   ` (87 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-06-27 22:26 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

zzyxpaw@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |zzyxpaw@gmail.com

--- Comment #29 from zzyxpaw@gmail.com ---
Just hit this on Archlinux with linux-5.7.6 on a Vega 64. So far I've had three
crashes mostly occurring within the first few minutes of uptime. I'm not running
kwin or chrome, just a light window manager (bspwm) and compton.

During the first two, steam's fossilize was running which led me to suspect it
was triggered by an interaction with that. However the third crashed before I
even managed to start steam, so either I'm just lucky or my system is good at
triggering this. @Duncan I'm not sure if you want to muddle your bisect results
with a different system configuration, but I'm happy to help test commits if
that would be helpful.

I've noticed the call traces reported in the kernel log are slightly different
for each crash; I'm not sure if they're likely to be useful or not. Here's at
least the one from my first crash:

Jun 27 14:04:40 erebor kernel: general protection fault, probably for
non-canonical address 0x5dda9795528973db: 0000 [#1] PREEMPT SMP NOPTI
Jun 27 14:04:40 erebor kernel: CPU: 14 PID: 193610 Comm: kworker/u32:14
Tainted: G           OE     5.7.6-arch1-1 #1
Jun 27 14:04:40 erebor kernel: Hardware name: To Be Filled By O.E.M. To Be
Filled By O.E.M./AB350 Pro4, BIOS P4.90 06/14/2018
Jun 27 14:04:40 erebor kernel: Workqueue: events_unbound commit_work
[drm_kms_helper]
Jun 27 14:04:40 erebor kernel: RIP:
0010:amdgpu_dm_atomic_commit_tail+0x2aa/0x2310 [amdgpu]
Jun 27 14:04:40 erebor kernel: Code: 4f 08 8b 81 e0 02 00 00 41 83 c5 01 44 39
e8 0f 87 46 ff ff ff 48 83 bd f0 fc ff ff 00 0f 84 03 01 00 00 48 8b bd f0 fc
ff ff <80> bf b0 01 00 00 01 0f 86 ac 00 00>
Jun 27 14:04:40 erebor kernel: RSP: 0018:ffffbcec0a4afaf8 EFLAGS: 00010206
Jun 27 14:04:40 erebor kernel: RAX: 0000000000000006 RBX: ffff9b71dbaed000 RCX:
ffff9b7472e4b800
Jun 27 14:04:40 erebor kernel: RDX: ffff9b72504ea400 RSI: ffffffffc13181e0 RDI:
5dda9795528973db
Jun 27 14:04:40 erebor kernel: RBP: ffffbcec0a4afe60 R08: 0000000000000001 R09:
0000000000000001
Jun 27 14:04:40 erebor kernel: R10: 0000000000000082 R11: 00000000000730e2 R12:
0000000000000000
Jun 27 14:04:40 erebor kernel: R13: 0000000000000006 R14: ffff9b71dbaed800 R15:
ffff9b71a8fdb580
Jun 27 14:04:40 erebor kernel: FS:  0000000000000000(0000)
GS:ffff9b747ef80000(0000) knlGS:0000000000000000
Jun 27 14:04:40 erebor kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Jun 27 14:04:40 erebor kernel: CR2: 000056460ce164b0 CR3: 0000000341c86000 CR4:
00000000003406e0
Jun 27 14:04:40 erebor kernel: Call Trace:
Jun 27 14:04:40 erebor kernel:  ? __erst_read+0x160/0x1d0
Jun 27 14:04:40 erebor kernel:  ? __switch_to_asm+0x34/0x70
Jun 27 14:04:40 erebor kernel:  ? __switch_to_asm+0x40/0x70
Jun 27 14:04:40 erebor kernel:  ? __switch_to_asm+0x34/0x70
Jun 27 14:04:40 erebor kernel:  ? __switch_to_asm+0x40/0x70
Jun 27 14:04:40 erebor kernel:  ? rescuer_thread+0x3f0/0x3f0
Jun 27 14:04:40 erebor kernel:  commit_tail+0x94/0x130 [drm_kms_helper]
Jun 27 14:04:40 erebor kernel:  process_one_work+0x1da/0x3d0
Jun 27 14:04:40 erebor kernel:  ? rescuer_thread+0x3f0/0x3f0
Jun 27 14:04:40 erebor kernel:  worker_thread+0x4d/0x3e0
Jun 27 14:04:40 erebor kernel:  ? rescuer_thread+0x3f0/0x3f0
Jun 27 14:04:40 erebor kernel:  kthread+0x13e/0x160
Jun 27 14:04:40 erebor kernel:  ? __kthread_bind_mask+0x60/0x60
Jun 27 14:04:40 erebor kernel:  ret_from_fork+0x22/0x40
Jun 27 14:04:40 erebor kernel: Modules linked in: snd_seq_midi snd_seq_dummy
snd_seq_midi_event snd_hrtimer snd_seq fuse ccm 8021q garp mrp stp llc
snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_de>
Jun 27 14:04:40 erebor kernel:  blake2b_generic libcrc32c crc32c_generic xor
uas usb_storage raid6_pq crc32c_intel xhci_pci xhci_hcd
Jun 27 14:04:40 erebor kernel: ---[ end trace cb5c0d96dd991657 ]---
Jun 27 14:04:40 erebor kernel: RIP:
0010:amdgpu_dm_atomic_commit_tail+0x2aa/0x2310 [amdgpu]
Jun 27 14:04:40 erebor kernel: Code: 4f 08 8b 81 e0 02 00 00 41 83 c5 01 44 39
e8 0f 87 46 ff ff ff 48 83 bd f0 fc ff ff 00 0f 84 03 01 00 00 48 8b bd f0 fc
ff ff <80> bf b0 01 00 00 01 0f 86 ac 00 00>
Jun 27 14:04:40 erebor kernel: RSP: 0018:ffffbcec0a4afaf8 EFLAGS: 00010206
Jun 27 14:04:40 erebor kernel: RAX: 0000000000000006 RBX: ffff9b71dbaed000 RCX:
ffff9b7472e4b800
Jun 27 14:04:40 erebor kernel: RDX: ffff9b72504ea400 RSI: ffffffffc13181e0 RDI:
5dda9795528973db
Jun 27 14:04:40 erebor kernel: RBP: ffffbcec0a4afe60 R08: 0000000000000001 R09:
0000000000000001
Jun 27 14:04:40 erebor kernel: R10: 0000000000000082 R11: 00000000000730e2 R12:
0000000000000000
Jun 27 14:04:40 erebor kernel: R13: 0000000000000006 R14: ffff9b71dbaed800 R15:
ffff9b71a8fdb580
Jun 27 14:04:40 erebor kernel: FS:  0000000000000000(0000)
GS:ffff9b747ef80000(0000) knlGS:0000000000000000
Jun 27 14:04:40 erebor kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Jun 27 14:04:40 erebor kernel: CR2: 000056460ce164b0 CR3: 0000000341c86000 CR4:
00000000003406e0

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (28 preceding siblings ...)
  2020-06-27 22:26 ` bugzilla-daemon
@ 2020-06-28  1:12 ` bugzilla-daemon
  2020-06-28 10:48 ` bugzilla-daemon
                   ` (86 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-06-28  1:12 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #30 from mnrzk@protonmail.com ---
I've been looking at this bug for a while now and I'll try to share what I've
found about it.

In some conditions, when amdgpu_dm_atomic_commit_tail calls
dm_atomic_get_new_state, dm_atomic_get_new_state returns a struct
dm_atomic_state* with a garbage context pointer.

I've also found that this bug exclusively occurs when commit_work is on the
workqueue. After forcing drm_atomic_helper_commit to run all of the commits
without adding to the workqueue and running the OS, the issue seems to have
disappeared. The system was stable for at least 1.5 hours before I manually
shut it down (meanwhile it has usually crashed within 30-45 minutes).

Perhaps there's some sort of race condition occurring after commit_work is
queued?

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (29 preceding siblings ...)
  2020-06-28  1:12 ` bugzilla-daemon
@ 2020-06-28 10:48 ` bugzilla-daemon
  2020-06-28 15:30 ` bugzilla-daemon
                   ` (85 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-06-28 10:48 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

Duncan (1i5t5.duncan@cox.net) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Kernel Version|5.7-rc1, 5.7-rc2, 5.7-rc3   |5.7-rc1 - 5.7 - 5.8-rc1+

--- Comment #31 from Duncan (1i5t5.duncan@cox.net) ---
(In reply to mnrzk from comment #30)
> In some conditions, when amdgpu_dm_atomic_commit_tail calls
> dm_atomic_get_new_state, dm_atomic_get_new_state returns a struct
> dm_atomic_state* with a garbage context pointer.

Good! Someone with the bug who can actually read and work the code, now.
Portends well for a fix.  =:^)

> I've also found that this bug exclusively occurs when commit_work is on the
> workqueue. After forcing drm_atomic_helper_commit to run all of the commits
> without adding to the workqueue and running the OS, the issue seems to have
> disappeared.

I see it always with the workqueue too, but not being a dev I simply assumed
that was how it was; I had no idea it could be taken off the workqueue.

> The system was stable for at least 1.5 hours before I manually
> shut it down (meanwhile it has usually crashed within 30-45 minutes).

You're seeing a crash much faster than I am.  I believe my longest uptime
before a crash with the telltale trace was something like two and a half days,
with the obvious implications for bisect good since it's always a gamble that
I've simply not tested long enough.

> Perhaps there's some sort of race condition occurring after commit_work is
> queued?

Agreed, FWIW, tho you've taken it farther than I could, not being able to work
with code much beyond bisect or modifying an existing patch here or there.

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (30 preceding siblings ...)
  2020-06-28 10:48 ` bugzilla-daemon
@ 2020-06-28 15:30 ` bugzilla-daemon
  2020-06-29  7:39 ` bugzilla-daemon
                   ` (84 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-06-28 15:30 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #32 from Duncan (1i5t5.duncan@cox.net) ---
Created attachment 289911
  --> https://bugzilla.kernel.org/attachment.cgi?id=289911&action=edit
Partial git bisect log

(In reply to zzyxpaw from comment #29)
> @Duncan I'm not sure if you want to muddle your
> bisect results with a different system configuration, but I'm happy to help
> test commits if that would be helpful.

Here's my current git bisect log you can replay.

I believe that should leave you at v5.6-rc2-245-gcf6c26ec7, which I'm going to
build and boot to as soon as I post this.

But if your system's as good at triggering the bug as you suggest, try deleting
that last good before the replay as I'm only ~98% sure about it given a
potential trigger-time of days on my system.  That should leave you at
7be97138e which you can try triggering it with.  If your system's reliably
triggering within minutes and it doesn't trigger on that, you can confirm my
bisect good and go from there.

Note that if you're building with gcc-10.x you'll likely need a couple patches
that were committed later in the 5.7 cycle, depending on whether they were
applied before or after whatever you're testing.  If you're building with
gcc-9.3 (and presumably earlier) they shouldn't be necessary.

a9a3ed1ef and e78d334a5 are the commits in question.  One was necessary to
build with gcc-10, the other to get past a boot-time crash when built with
gcc-10.  Only one's applying at cf6c26ec7, I don't remember which, but they
were both necessary for 7be97138e.

At my somewhat limited git skill level it was easiest to redirect a git show of
the commit to a patchfile, then apply the patch on top of whatever git bisect
gave me and git reset --hard to clean up the patches before the next git bisect
good/bad.  I guess a git cherry-pick would be the usual way to apply them but
I'm not entirely sure how that interacts with git bisect, so applying the
patches on top was the easier way for me, particularly given that I already have
scripts to automate patch application for my local default-to-noatime patch.
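
For anyone replaying this, the patch-on-top routine described above might look
roughly like the following. It is a sketch, not the exact scripts used here:
the function names apply_fixes and mark_step are made up for illustration, and
the ancestor check is an added assumption so that a fix already contained in
the commit under test gets skipped.

```shell
#!/bin/sh
# Sketch only: carry extra fix commits on top of a bisect step as plain
# working-tree patches, then drop them again before marking the step.

# apply_fixes COMMIT...: apply each commit as a patch without committing,
# so HEAD stays at the commit git bisect checked out.
apply_fixes() {
    for c in "$@"; do
        # Skip a fix that is already part of the history being tested.
        if git merge-base --is-ancestor "$c" HEAD 2>/dev/null; then
            continue
        fi
        # Same idea as redirecting "git show" to a patchfile and applying it.
        git show "$c" | git apply || return 1
    done
}

# mark_step good|bad: discard the applied patches, then record the verdict.
mark_step() {
    git reset --hard -q && git bisect "$1"
}
```

After a git bisect replay of the attached log, one step would then be
something like "apply_fixes a9a3ed1ef e78d334a5" before building and
"mark_step good" (or bad) after testing. Note that git reset --hard leaves
untracked files alone, so a fix that adds new files would also need a git
clean.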

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (31 preceding siblings ...)
  2020-06-28 15:30 ` bugzilla-daemon
@ 2020-06-29  7:39 ` bugzilla-daemon
  2020-06-29 22:09 ` bugzilla-daemon
                   ` (83 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-06-29  7:39 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #33 from Michel Dänzer (michel@daenzer.net) ---
(In reply to rtmasura+kernel from comment #24)
> xfwm4 --replace --vblank=glx &

FWIW, I recommend

 xfwm4 --vblank=xpresent

instead. --vblank=glx is less efficient and relies on rather exotic GLX
functionality which can be quirky with Mesa.
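
To make that choice persistent rather than passing it on the command line,
the same mode can presumably be stored through xfconf. The sketch below
assumes the xfwm4 channel and its /general/vblank_mode property, and that the
accepted values are auto, xpresent, glx and off; the helper name
set_xfwm_vblank is made up for illustration.

```shell
#!/bin/sh
# Sketch only: persist xfwm4's vblank mode through xfconf.
# Assumes the xfwm4 channel and the /general/vblank_mode property.

# set_xfwm_vblank MODE: reject unknown modes, otherwise write the setting.
set_xfwm_vblank() {
    case "$1" in
        auto|xpresent|glx|off) ;;
        *) echo "unknown vblank mode: $1" >&2; return 2 ;;
    esac
    xfconf-query -c xfwm4 -p /general/vblank_mode -s "$1"
}
```

Calling "set_xfwm_vblank xpresent" would then correspond to the
--vblank=xpresent recommendation, and "set_xfwm_vblank off" to turning
vblank sync off entirely.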

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (32 preceding siblings ...)
  2020-06-29  7:39 ` bugzilla-daemon
@ 2020-06-29 22:09 ` bugzilla-daemon
  2020-07-01 19:08 ` bugzilla-daemon
                   ` (82 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-06-29 22:09 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #34 from mnrzk@protonmail.com ---
Has anyone tried 5.8-rc3? I've been testing it out for the past 3 hours and it
seems stable to me. Also, there were some amdgpu drm fixes pushed between rc2
and rc3 which could have fixed it.

Could someone else experiencing this bug test 5.8-rc3 and see if it's fixed?

I have some debug code and kernel options which may have interfered with my
testing so I wouldn't exactly say the bug is fixed based on my findings.

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (33 preceding siblings ...)
  2020-06-29 22:09 ` bugzilla-daemon
@ 2020-07-01 19:08 ` bugzilla-daemon
  2020-07-04 19:57 ` bugzilla-daemon
                   ` (81 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-01 19:08 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #35 from Duncan (1i5t5.duncan@cox.net) ---
(In reply to mnrzk from comment #34)
> Has anyone tried 5.8-rc3? I've been testing it out for the past 3 hours and
> it seems stable to me.

I have now (well, v5.8.0-rc3-00017-g7c30b859a).  Unfortunately got a freeze
with our familiar trace fairly quickly (building kde updates at the time) so
it's not fixed yet.  =:^(

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (34 preceding siblings ...)
  2020-07-01 19:08 ` bugzilla-daemon
@ 2020-07-04 19:57 ` bugzilla-daemon
  2020-07-04 20:13 ` bugzilla-daemon
                   ` (80 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-04 19:57 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

Duncan (1i5t5.duncan@cox.net) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #289911|0                           |1
        is obsolete|                            |

--- Comment #36 from Duncan (1i5t5.duncan@cox.net) ---
Created attachment 290093
  --> https://bugzilla.kernel.org/attachment.cgi?id=290093&action=edit
Updated partial git bisect log

Updated partial git bisect log.  Looks like 226 commits including merges.

There appear to be four Linus-level merge-trees, one of which appears to be the
majority of the remaining commits:

8c1b724dd kvm (medium).  No kvm here so that /should/ be out.

f14a9532e tip (single commit).  sparse warning, x86: bitops.h.  Says generated
code shouldn't be affected.

7f218319c integrity (small). Shouldn't be.

6cad420cc akpm (the majority).  Very likely in this tree.  The current bisect
step is the first code commit (as opposed to tree merge) step and (if I'm
reading things right) appears to split this one, much of this tree on one side,
the rest of it and everything else on the other.

Notice, no drm tree, tho whatever buggy commit it is obviously affects
drm/amdgpu.

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (35 preceding siblings ...)
  2020-07-04 19:57 ` bugzilla-daemon
@ 2020-07-04 20:13 ` bugzilla-daemon
  2020-07-05 16:58 ` bugzilla-daemon
                   ` (79 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-04 20:13 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #37 from mnrzk@protonmail.com ---
>Notice, no drm tree, tho whatever buggy commit it is obviously affects
>drm/amdgpu.

Yeah, I kind of noticed that while I was just skimming through the commit
history. Perhaps it's possible that the issue has existed for a while but
became much more apparent since 5.7?

Whatever it is, keep up the good work; maybe you'll find some sort of clue
while bisecting.

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (36 preceding siblings ...)
  2020-07-04 20:13 ` bugzilla-daemon
@ 2020-07-05 16:58 ` bugzilla-daemon
  2020-07-05 22:08 ` bugzilla-daemon
                   ` (78 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-05 16:58 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

Duncan (1i5t5.duncan@cox.net) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #290093|0                           |1
        is obsolete|                            |

--- Comment #38 from Duncan (1i5t5.duncan@cox.net) ---
Created attachment 290101
  --> https://bugzilla.kernel.org/attachment.cgi?id=290101&action=edit
Another partial git bisect log update

Just as I was thinking that step was going to be bisect good... it wasn't. 
Confirmed with the usual tail-commit log trace.

(In reply to Duncan from comment #36)
> 6cad420cc akpm (the majority).  Very likely in this tree.

Definitely this tree/pull.  No merge but 113 commits remaining *at* this step
(not _after_), all with signed-off-by both Andrew and Linus so it's all the
akpm tree.  We know the tree, now.

FWIW for anyone relatively new to the bug who skipped some of the first
comments, my bad first bisect attempt ended up in akpm as well.  I haven't
checked if it was the same pull altho I'd guess so.  However, at that time I
was only testing commits with drm in the path (including several that went in
via the akpm tree not the drm tree, one of which that bisect ultimately pointed
me at), and I suspect that's what did me in.

So I strongly suspect that while it's the akpm tree, it's *NOT* the one
remaining candidate with the drm-path in it (4064b9827), thus explaining why
the first bisect ended up pointing at a drm-path commit that I tested by
reverting, only to still have the bug.  I tried a shortcut and it ended up a
rabbit trail. =:^(

Other than that, 113 candidate commits left (well, 112 if we subtract that one)
is still too many (for me) to guess at or really to even just list here.  Two
more steps should bring it down to 28ish, three to 14ish, and maybe I can start
guessing then.  With luck I'll get a couple more bad ones right away and narrow
it down quickly.
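
As a rough check on those estimates, the sketch below assumes every bisect
step halves the candidate set.  That is an idealization: the counts git
actually reports depend on the shape of the history.  The function name is
illustrative only and is not part of git.

```python
def remaining_after(candidates: int, steps: int) -> int:
    """Candidates left after `steps` idealized halvings (never below 1)."""
    for _ in range(steps):
        # Each step discards roughly half of what is left.
        candidates = max(1, candidates // 2)
    return candidates


if __name__ == "__main__":
    for steps in (2, 3):
        print(steps, "more steps ->", remaining_after(113, steps), "candidates")
```

Under that assumption, 113 candidates drop to 28 after two steps and to 14
after three, matching the "28ish" and "14ish" figures.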

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (37 preceding siblings ...)
  2020-07-05 16:58 ` bugzilla-daemon
@ 2020-07-05 22:08 ` bugzilla-daemon
  2020-07-06 16:24 ` bugzilla-daemon
                   ` (77 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-05 22:08 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

Duncan (1i5t5.duncan@cox.net) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #290101|0                           |1
        is obsolete|                            |

--- Comment #39 from Duncan (1i5t5.duncan@cox.net) ---
Created attachment 290105
  --> https://bugzilla.kernel.org/attachment.cgi?id=290105&action=edit
Partial git bisect log update #3

(In reply to Duncan from comment #38)
> With luck I'll get a couple more bad ones
> right away and narrow it down quickly.

And so it is.  28 candidates ATM, several of which are OCFS2 or spelling fixes
neither of which should affect this bug.  Excluding those there are eleven
left; the penultimate (next to last) one looks to be a good candidate:

5f2d5026b mm/Makefile: disable KCSAN for kmemleak
b0d14fc43 mm/kmemleak.c: use address-of operator on section symbols
667c79016 revert "topology: add support for node_to_mem_node() to determine the
fallback node"
3202fa62f slub: relocate freelist pointer to middle of object
1ad53d9fa slub: improve bit diffusion for freelist ptr obfuscation
bbd4e305e mm/slub.c: replace kmem_cache->cpu_partial with wrapped APIs
4c7ba22e4 mm/slub.c: replace cpu_slab->partial with wrapped APIs
c537338c0 fs_parse: remove pr_notice() about each validation
630f289b7 asm-generic: make more kernel-space headers mandatory
98c985d7d kthread: mark timer used by delayed kthread works as IRQ safe
4054ab64e tools/accounting/getdelays.c: fix netlink attribute length

My gut says it's 98c "kthread: mark ... delayed kthread... IRQ safe".  Not a
coder but the comment talks about delayed kthreads, we always see the workqueue
in the traces, and mnrzk observes in comment #30 that forcing
drm_atomic_helper_commit to run directly instead of using the workqueue seems
to eliminate the freeze.  If it's called from the amdgpu code and that commit
changes the IRQ-safety assumptions the amdgpu code was depending on in the
workqueue, where the unqueued context is automatically IRQ-safe...

Still could be wrong, but at 11 real candidates it's a 9% chance even simply
statistically, and it sure seems to fit.  Anyway, if it /is/ correct, the next
few bisect steps should be bisect bad and thus go faster, narrowing it down
even further.

Regardless, we're down far enough that someone that can actually read code
might be able to take a look at that and the others now, so my bisect shouldn't
be the /entire/ bottleneck any longer.


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (38 preceding siblings ...)
  2020-07-05 22:08 ` bugzilla-daemon
@ 2020-07-06 16:24 ` bugzilla-daemon
  2020-07-06 23:57 ` bugzilla-daemon
                   ` (76 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-06 16:24 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #40 from Alex Deucher (alexdeucher@gmail.com) ---
Does this patch help?
https://gitlab.freedesktop.org/drm/amd/uploads/356586b6aa81f64cfa9b4b034499fdd8/amdgpu-bugfix-revert-vmalloc-size-change.patch


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (39 preceding siblings ...)
  2020-07-06 16:24 ` bugzilla-daemon
@ 2020-07-06 23:57 ` bugzilla-daemon
  2020-07-07  0:37 ` bugzilla-daemon
                   ` (75 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-06 23:57 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #41 from Duncan (1i5t5.duncan@cox.net) ---
(In reply to Alex Deucher from comment #40)
> Does this patch help?

Booted to v5.7 with it applied now.  We'll see.  Since the bug can take a while
to trigger on my hardware, if the patch fixes it I won't know for days, and
won't be /sure/ for say a week, which is why bisecting has been taking so long.

(It wouldn't apply to current 5.8-rc4-plus-an-s390-pull.  Too tired to figure
out why ATM, but if it's because it's already there, hopefully it was pulled
in after v5.8-rc3: I tested that and got the same graphics freeze with the
characteristic trace, so if the patch was already in v5.8-rc3, it does /not/
fix the bug.)

As for bisecting, I've hard-crashed twice on the current step, apparently with
a different bug, so while _this_ bug hasn't seemed to trigger yet, I haven't
gotten the necessary confidence that it's a bisect-good.  So hopefully this
patch /does/ fix it, and I can put this entirely too frustrating bug-bisect
behind me!


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (40 preceding siblings ...)
  2020-07-06 23:57 ` bugzilla-daemon
@ 2020-07-07  0:37 ` bugzilla-daemon
  2020-07-07  3:01 ` bugzilla-daemon
                   ` (74 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-07  0:37 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #42 from Duncan (1i5t5.duncan@cox.net) ---
(In reply to Alex Deucher from comment #40)
> Does this patch help?

No.  v5.7 with the patch applied gave me the same graphics freeze, with the
usual log trace confirming it's _this_ bug.

Sigh, back to the bisect. =:^(


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (41 preceding siblings ...)
  2020-07-07  0:37 ` bugzilla-daemon
@ 2020-07-07  3:01 ` bugzilla-daemon
  2020-07-07 11:01 ` bugzilla-daemon
                   ` (73 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-07  3:01 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

Christopher Snowhill (kode54@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kode54@gmail.com

--- Comment #43 from Christopher Snowhill (kode54@gmail.com) ---
What about this patch?

https://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-fixes-5.8&id=6eb3cf2e06d22b2b08e6b0ab48cb9c05a8e1a107


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (42 preceding siblings ...)
  2020-07-07  3:01 ` bugzilla-daemon
@ 2020-07-07 11:01 ` bugzilla-daemon
  2020-07-07 12:43 ` bugzilla-daemon
                   ` (72 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-07 11:01 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #44 from Duncan (1i5t5.duncan@cox.net) ---
(In reply to Christopher Snowhill from comment #43)
> What about this patch?
> 
> https://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-fixes-5.8&id=6eb3cf2e06d22b2b08e6b0ab48cb9c05a8e1a107

I see that in mainline as of 5.8-rc4 which I just triggered this bug on, so no,
that doesn't fix it.


As for the bisect, now that I'm down to just a few commits, I woke up a couple
hours ago with the idea to just try patch-reverting them on top of 5.7 or
current 5.8-rc, thus eliminating the apparently unrelated kernel-panics I've
twice triggered at the current bisect step (which is after all pre-5.6-rc1, so
other bugs are to be expected).  I delayed that to try this patch, to no
avail, but that's what I'm going to try now.  For all I know some of the
reverts won't apply to current due to either being already reverted or more
code changes since, but we'll see how it goes.


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (43 preceding siblings ...)
  2020-07-07 11:01 ` bugzilla-daemon
@ 2020-07-07 12:43 ` bugzilla-daemon
  2020-07-07 15:27 ` bugzilla-daemon
                   ` (71 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-07 12:43 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #45 from Fabian Möller (fabianm88@gmail.com) ---
(In reply to Christopher Snowhill from comment #43)
> What about this patch?
> 
> https://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-fixes-5.8&id=6eb3cf2e06d22b2b08e6b0ab48cb9c05a8e1a107

Applying 6eb3cf2e06d22b2b08e6b0ab48cb9c05a8e1a107 to v5.7.7 fixed the issue for
a RX5700/Navi10 under Wayland for me. 
It still produces the following warning, which might be related to
https://bugzilla.kernel.org/show_bug.cgi?id=206349.

------------[ cut here ]------------
WARNING: CPU: 2 PID: 1176 at arch/x86/kernel/fpu/core.c:109
kernel_fpu_end+0x19/0x20
Modules linked in: fuse xt_conntrack xt_MASQUERADE nf_conntrack_netlink
nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat
nf_conntrack nf_defrag_ipv4 br_netfilter overlay wireguard curve25519_x86_64
libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libblake2s blake2s_x86_64
ip6_udp_tunnel udp_tunnel libcurve25519_generic libchacha libblake2s_generic
af_packet rfkill msr amdgpu amd_iommu_v2 gpu_sched ttm drm_kms_helper drm
snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi
wmi_bmof mxm_wmi snd_hda_intel nls_iso8859_1 agpgart igb deflate nls_cp437
snd_intel_dspcfg sp5100_tco ptp mousedev vfat efi_pstore fb_sys_fops evdev fat
snd_hda_codec edac_mce_amd pstore mac_hid syscopyarea watchdog pps_core
sysfillrect edac_core snd_hda_core sysimgblt dca i2c_piix4 backlight
crc32_pclmul i2c_algo_bit ghash_clmulni_intel efivars k10temp snd_hwdep
i2c_core thermal wmi pinctrl_amd tiny_power_button button acpi_cpufreq
sch_fq_codel snd_pcm_oss
 snd_mixer_oss snd_pcm snd_timer snd soundcore atkbd libps2 serio loop
cpufreq_ondemand tun tap macvlan bridge stp llc vboxnetflt(OE) vboxnetadp(OE)
vboxdrv(OE) kvm_amd kvm irqbypass efivarfs ip_tables x_tables ipv6
nf_defrag_ipv6 crc_ccitt autofs4 xfs libcrc32c crc32c_generic dm_crypt
algif_skcipher af_alg input_leds led_class hid_generic usbhid hid ahci xhci_pci
libahci crc32c_intel xhci_hcd libata aesni_intel libaes crypto_simd nvme
usbcore cryptd scsi_mod glue_helper nvme_core t10_pi crc_t10dif
crct10dif_generic crct10dif_pclmul usb_common crct10dif_common rtc_cmos
dm_snapshot dm_bufio dm_mod
CPU: 2 PID: 1176 Comm: systemd-logind Tainted: G           OE     5.7.7
#1-NixOS
Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO/X570 AORUS PRO,
BIOS F10c 11/08/2019
RIP: 0010:kernel_fpu_end+0x19/0x20
Code: 90 e9 db 9b 14 00 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 65 8a
05 5c 1a 3e 66 84 c0 74 09 65 c6 05 50 1a 3e 66 00 c3 <0f> 0b eb f3 0f 1f 00 0f
1f 44 00 00 8b 15 cd 59 57 01 31 f6 e8 2e
RSP: 0018:ffffb9dbc1417660 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000121b
RDX: 0000000000000001 RSI: ffff90b685fa1cd4 RDI: 000000000002f980
RBP: ffff90b685fa0000 R08: 0000000000000000 R09: 0000000000000040
R10: ffffb9dbc14175b0 R11: ffffb9dbc14170a0 R12: 0000000000000001
R13: ffff90b685fa1da8 R14: 0000000000000006 R15: ffff90b5158f8400
FS:  00007fd7ab143880(0000) GS:ffff90b6bea80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000e81091ef030 CR3: 00000007d80c4000 CR4: 0000000000340ee0
Call Trace:
 dcn20_validate_bandwidth+0x2c/0x40 [amdgpu]
 dc_commit_updates_for_stream+0xad7/0x1930 [amdgpu]
 ? amdgpu_display_get_crtc_scanoutpos+0x85/0x190 [amdgpu]
 amdgpu_dm_atomic_commit_tail+0xb4c/0x1fc0 [amdgpu]
 commit_tail+0x94/0x130 [drm_kms_helper]
 drm_atomic_helper_commit+0x113/0x140 [drm_kms_helper]
 drm_client_modeset_commit_atomic+0x1c9/0x200 [drm]
 drm_client_modeset_commit_locked+0x50/0x150 [drm]
 __drm_fb_helper_restore_fbdev_mode_unlocked+0x59/0xc0 [drm_kms_helper]
 drm_fb_helper_set_par+0x3c/0x50 [drm_kms_helper]
 fb_set_var+0x175/0x370
 ? update_load_avg+0x78/0x630
 ? update_curr+0x69/0x1a0
 fbcon_blank+0x20d/0x270
 do_unblank_screen+0xaa/0x150
 complete_change_console+0x54/0xd0
 vt_ioctl+0x126f/0x1320
 tty_ioctl+0x372/0x8c0
 ksys_ioctl+0x87/0xc0
 __x64_sys_ioctl+0x16/0x20
 do_syscall_64+0x4e/0x160
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fd7ab75d1c7
Code: 00 00 90 48 8b 05 b9 9c 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff
c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01
c3 48 8b 0d 89 9c 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffe9138fec8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd7ab75d1c7
RDX: 0000000000000001 RSI: 0000000000005605 RDI: 0000000000000015
RBP: 0000000000000015 R08: 0000000000000000 R09: 00000000ffffffff
R10: 0000000000000001 R11: 0000000000000246 R12: 00007ffe9138ff38
R13: 0000000000000006 R14: 00007ffe91390050 R15: 00007ffe91390048
---[ end trace eefc00b763354df8 ]---


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (44 preceding siblings ...)
  2020-07-07 12:43 ` bugzilla-daemon
@ 2020-07-07 15:27 ` bugzilla-daemon
  2020-07-07 19:05 ` bugzilla-daemon
                   ` (70 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-07 15:27 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #46 from Duncan (1i5t5.duncan@cox.net) ---
(In reply to Fabian Möller from comment #45)
> Applying 6eb3cf2e06d22b2b08e6b0ab48cb9c05a8e1a107 to v5.7.7 fixed the issue
> for a RX5700/Navi10 under Wayland for me. 

Does Polaris11 use a different code path or need an additional fix?  (Less
likely, maybe X/plasma/kwin makes the kernel calls differently?)

Progress for all and a fix for some in any case! =:^)


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (45 preceding siblings ...)
  2020-07-07 15:27 ` bugzilla-daemon
@ 2020-07-07 19:05 ` bugzilla-daemon
  2020-07-08  0:25 ` bugzilla-daemon
                   ` (69 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-07 19:05 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #47 from Duncan (1i5t5.duncan@cox.net) ---
(In reply to Duncan from comment #39)
> 28 candidates ATM, several of which are OCFS2 or spelling
> fixes neither of which should affect this bug.  Excluding those there are
> eleven left; the penultimate (next to last) one looks to be a good candidate:
> 
> 5f2d5026b mm/Makefile: disable KCSAN for kmemleak
> b0d14fc43 mm/kmemleak.c: use address-of operator on section symbols
> 667c79016 revert "topology: add support for node_to_mem_node() to determine
> the fallback node"
> 3202fa62f slub: relocate freelist pointer to middle of object
> 1ad53d9fa slub: improve bit diffusion for freelist ptr obfuscation
> bbd4e305e mm/slub.c: replace kmem_cache->cpu_partial with wrapped APIs
> 4c7ba22e4 mm/slub.c: replace cpu_slab->partial with wrapped APIs
> c537338c0 fs_parse: remove pr_notice() about each validation
> 630f289b7 asm-generic: make more kernel-space headers mandatory
> 98c985d7d kthread: mark timer used by delayed kthread works as IRQ safe
> 4054ab64e tools/accounting/getdelays.c: fix netlink attribute length

(... and comment #44)
> [I]dea to just try patch-reverting them on top of
> 5.7 or current 5.8-rc, thus eliminating the apparently unrelated
> kernel-panics I've twice triggered at the current bisect step.

[Again noting that on my polaris11 the bug doesn't seem to be fixed, despite
comment #45 saying it is on his navi10 with a patch/commit that I can see in
5.8-rc4+.]

So I tried this with the 11 above commits against 5.8.0-rc4-00025-gbfe91da29,
which previously tested as triggering the freeze for me.  Of the 11, nine
clean-reversed and I simply noted and skipped the other two (3202fa62f and
630f289b7) for the moment.  The patched kernel successfully built and I'm
booted to it now.  I just completed a system update (on gentoo so built from
source), which doesn't always trigger the freeze, but seems to do so with a
reasonable number of package updates on kernels with this bug perhaps 50% of
the time.  No freeze.
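
The revert-on-top approach described above can be sketched roughly as follows.
This is an illustration only, not the commands actually used: it assumes each
commit is reverse-applied as a patch by feeding `git show` output to
`git apply -R`, checks with `--check` first, and skips anything that does not
reverse cleanly.  The names reverse_apply, git_runner and the run parameter
are hypothetical.

```python
import subprocess
from typing import Callable, Iterable, List, Tuple

# Hypothetical runner type: takes argv and stdin bytes, returns (exit code, stdout).
Runner = Callable[[List[str], bytes], Tuple[int, bytes]]


def git_runner(argv: List[str], stdin: bytes = b"") -> Tuple[int, bytes]:
    """Run a command in the current directory (expected to be a git checkout)."""
    proc = subprocess.run(argv, input=stdin,
                          stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    return proc.returncode, proc.stdout


def reverse_apply(commits: Iterable[str],
                  run: Runner = git_runner) -> Tuple[List[str], List[str]]:
    """Reverse-apply each commit's patch; return (reverted, skipped) ids."""
    reverted: List[str] = []
    skipped: List[str] = []
    for sha in commits:
        # Fetch the commit as a patch.
        rc, patch = run(["git", "show", sha], b"")
        if rc != 0:
            skipped.append(sha)
            continue
        # Dry run: does the patch reverse cleanly against the working tree?
        rc, _ = run(["git", "apply", "-R", "--check", "-"], patch)
        if rc != 0:
            skipped.append(sha)
            continue
        # Actually reverse it in the working tree.
        rc, _ = run(["git", "apply", "-R", "-"], patch)
        (reverted if rc == 0 else skipped).append(sha)
    return reverted, skipped
```

Run from the top of a kernel checkout, for instance as
reverse_apply(["98c985d7d", "4054ab64e"]); anything returned in the skipped
list would need a manual look, as with the two commits noted above.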

I'll now try some 4k youtube in firefox, the other stressor that sometimes
seems to trigger it here, and perhaps combine that with an unnecessary rebuild
(since my system's already current) of something big like qtwebengine.  If that
doesn't trigger a freeze I'll stay booted to this thing another few days and
try some more, before being confident enough to declare that one of those nine
commits triggers the bug on my hardware and reverting them eliminates it.

Assuming it is one of those 9 commits (down from 28, as I quoted above, at my
last completed auto-bisect step) I'll reset and try manually bisecting on the
9.  It's looking good so far, but other kernels have looked good at this stage
and then ultimately frozen with the telltale gpf log, so it remains to be seen.

Meanwhile, nice to be on a current development kernel and well past rc1 stage,
again. =:^)  Bisect-testing otherwise long-stale pre-rc1 kernels with other
kernel-crasher bugs to complicate things is *not* my definition of fun!


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (46 preceding siblings ...)
  2020-07-07 19:05 ` bugzilla-daemon
@ 2020-07-08  0:25 ` bugzilla-daemon
  2020-07-08  1:25 ` bugzilla-daemon
                   ` (68 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-08  0:25 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #48 from Duncan (1i5t5.duncan@cox.net) ---
(In reply to Duncan from comment #47)
> > [I]dea to just try patch-reverting them on top of
> > 5.7 or current 5.8-rc, thus eliminating the apparently unrelated
> > kernel-panics I've twice triggered at the current bisect step.
> 
> So I tried this with the 11 above commits against
> 5.8.0-rc4-00025-gbfe91da29, which previously tested as triggering the freeze
> for me.  Of the 11, nine clean-reversed and I simply noted and skipped the
> other two (3202fa62f and 630f289b7) for the moment.  The patched kernel
> successfully built and I'm booted to it now.

Bah, humbug!  Got a freeze and the infamous logged trace on that too!  I was
hoping to demonstrably prove it to be in those nine!  I proved it *NOT* to be!

Well, there's still the two commits to look at that wouldn't cleanly
simple-revert.  Maybe I'll get lucky and it's just an ordering thing, since I
applied out of order compared to original commit, and they'll simple-revert on
top of the others.  Otherwise I'll have to actually look and see if I can make
sense of it and manual revert, maybe/maybe-not for a non-coder, or try on 5.7
instead of 5.8-rc.

If not them, maybe I'll just have to declare defeat on the bisect and hope for
a fix without that.  Last resort there's the buy-my-way-out solution, tho of
course that leaves others without that option in a bind.  But given the hours
I've put into this (that I've only been able to thanks to COVID work
suspension), at some point you just gotta cut your losses and declare defeat.

But we're not there yet.  There's still the two to look at first, and the
middle-ground 5.7 to try all 11 against.  Hopefully...


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (47 preceding siblings ...)
  2020-07-08  0:25 ` bugzilla-daemon
@ 2020-07-08  1:25 ` bugzilla-daemon
  2020-07-08 20:16 ` bugzilla-daemon
                   ` (67 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-08  1:25 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #49 from Christopher Snowhill (kode54@gmail.com) ---
One possibility that I hadn't considered when I was originally testing this: I
use the GNOME 3 desktop on Arch, and have two monitors, one 3840x2160@60Hz, one
1920x1080@60Hz, both DisplayPort. One thing I haven't re-enabled since I
switched back from my Nvidia GTX 960 backup card is Variable Refresh Rate,
which I had previously enabled in my Xorg configuration.

I never experienced crashes like these on page flips on my Nvidia card, and am
awaiting a crash any day now on the RX 480, assuming I haven't magically
configured it away with the deletion of that Xorg config snippet which did
nothing but enable VRR.


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (48 preceding siblings ...)
  2020-07-08  1:25 ` bugzilla-daemon
@ 2020-07-08 20:16 ` bugzilla-daemon
  2020-07-08 20:17 ` bugzilla-daemon
                   ` (66 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-08 20:16 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #50 from rtmasura+kernel@hotmail.com ---
I have 3 monitors, 2 1080p and one 1440p. Happens when I use vblank_mode glx or
xpresent, off and I'm stable.


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (49 preceding siblings ...)
  2020-07-08 20:16 ` bugzilla-daemon
@ 2020-07-08 20:17 ` bugzilla-daemon
  2020-07-09  7:45 ` bugzilla-daemon
                   ` (65 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-08 20:17 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #51 from rtmasura+kernel@hotmail.com ---
that didn't read well, with vblank_mode off for XFWM I don't have this issue at
all.


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (50 preceding siblings ...)
  2020-07-08 20:17 ` bugzilla-daemon
@ 2020-07-09  7:45 ` bugzilla-daemon
  2020-07-10  7:23 ` bugzilla-daemon
                   ` (64 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-09  7:45 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #52 from Michel Dänzer (michel@daenzer.net) ---
(In reply to rtmasura+kernel from comment #51)
> that didn't read well, with vblank_mode off for XFWM I don't have this issue
> at all.

That just avoids the problem by not doing any page flips.


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (51 preceding siblings ...)
  2020-07-09  7:45 ` bugzilla-daemon
@ 2020-07-10  7:23 ` bugzilla-daemon
  2020-07-10  7:36 ` bugzilla-daemon
                   ` (63 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-10  7:23 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

Stratos Zolotas (strzol@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |strzol@gmail.com

--- Comment #53 from Stratos Zolotas (strzol@gmail.com) ---
Hi everyone.

Don't know if it helps, but I'm getting a similar issue on openSUSE Tumbleweed
with kernel 5.7.7. Reverting to kernel 5.7.5 makes things stable for me. My GPU
is an RX580.

    Ιουλ 09 21:17:39.718030 teras.baskin.cywn kernel: general protection fault,
probably for non-canonical address 0x3e9478a9ecb3abc8: 0000 [#1] SMP NOPTI
    Ιουλ 09 21:17:39.718200 teras.baskin.cywn kernel: CPU: 1 PID: 141 Comm:
kworker/u16:3 Tainted: G           O      5.7.7-1-default #1 openSUSE
Tumbleweed (unreleased)
    Ιουλ 09 21:17:39.718239 teras.baskin.cywn kernel: Hardware name: Gigabyte
Technology Co., Ltd. To be filled by O.E.M./970A-DS3P, BIOS FD 02/26/2016
    Ιουλ 09 21:17:39.718273 teras.baskin.cywn kernel: Workqueue: events_unbound
commit_work [drm_kms_helper]
    Ιουλ 09 21:17:39.718306 teras.baskin.cywn kernel: RIP:
0010:amdgpu_dm_atomic_commit_tail+0x273/0x10f0 [amdgpu]
    Ιουλ 09 21:17:39.718339 teras.baskin.cywn kernel: Code: 43 08 8b 90 e0 02
00 00 41 83 c6 01 44 39 f2 0f 87 3a ff ff ff 48 83 bd a0 fd ff ff 00 0f 84 03
01 00 00 48 8b bd a0 fd ff ff <80> bf b0 01 00 00 01 0f 86 ac 00 00 00 48 b9 00
00 00 00 01 00 00
    Ιουλ 09 21:17:39.718368 teras.baskin.cywn kernel: RSP:
0018:ffffb7cf4037bbe0 EFLAGS: 00010202
    Ιουλ 09 21:17:39.718400 teras.baskin.cywn kernel: RAX: ffff8fb2a5e11800
RBX: ffff8fb28f2c2880 RCX: ffff8fb10ff8ec00
    Ιουλ 09 21:17:39.718442 teras.baskin.cywn kernel: RDX: 0000000000000006
RSI: ffffffffc0b7f530 RDI: 3e9478a9ecb3abc8
    Ιουλ 09 21:17:39.718482 teras.baskin.cywn kernel: RBP: ffffb7cf4037be68
R08: 0000000000000001 R09: 0000000000000001
    Ιουλ 09 21:17:39.718519 teras.baskin.cywn kernel: R10: 000000000000014d
R11: 0000000000000018 R12: ffff8fb2a35e7400
    Ιουλ 09 21:17:39.718547 teras.baskin.cywn kernel: R13: 0000000000000000
R14: 0000000000000006 R15: ffff8fb112001000
    Ιουλ 09 21:17:39.718584 teras.baskin.cywn kernel: FS: 
0000000000000000(0000) GS:ffff8fb2aec40000(0000) knlGS:0000000000000000
    Ιουλ 09 21:17:39.718620 teras.baskin.cywn kernel: CS:  0010 DS: 0000 ES:
0000 CR0: 0000000080050033
    Ιουλ 09 21:17:39.718652 teras.baskin.cywn kernel: CR2: 00007f364b9bc000
CR3: 000000042a2ae000 CR4: 00000000000406e0
    Ιουλ 09 21:17:39.718683 teras.baskin.cywn kernel: Call Trace:
    Ιουλ 09 21:17:39.718715 teras.baskin.cywn kernel:  ?
__switch_to_asm+0x40/0x70
    Ιουλ 09 21:17:39.718750 teras.baskin.cywn kernel:  ?
__switch_to_asm+0x34/0x70
    Ιουλ 09 21:17:39.718784 teras.baskin.cywn kernel:  ?
__switch_to_asm+0x40/0x70
    Ιουλ 09 21:17:39.718810 teras.baskin.cywn kernel:  ?
__switch_to_asm+0x34/0x70
    Ιουλ 09 21:17:39.718840 teras.baskin.cywn kernel:  ?
__switch_to_asm+0x40/0x70
    Ιουλ 09 21:17:39.718868 teras.baskin.cywn kernel:  ?
__switch_to_asm+0x34/0x70
    Ιουλ 09 21:17:39.718894 teras.baskin.cywn kernel:  ?
__switch_to_asm+0x40/0x70
    Ιουλ 09 21:17:39.718921 teras.baskin.cywn kernel:  ?
__switch_to_asm+0x34/0x70
    Ιουλ 09 21:17:39.718946 teras.baskin.cywn kernel:  ?
__switch_to_asm+0x40/0x70
    Ιουλ 09 21:17:39.718972 teras.baskin.cywn kernel:  ?
__switch_to_asm+0x34/0x70
    Ιουλ 09 21:17:39.718999 teras.baskin.cywn kernel:  ?
__switch_to_asm+0x40/0x70
    Ιουλ 09 21:17:39.719026 teras.baskin.cywn kernel:  ?
__switch_to_asm+0x34/0x70
    Ιουλ 09 21:17:39.719062 teras.baskin.cywn kernel:  ?
__switch_to_asm+0x40/0x70
    Ιουλ 09 21:17:39.719088 teras.baskin.cywn kernel:  ?
__switch_to_asm+0x34/0x70
    Ιουλ 09 21:17:39.719122 teras.baskin.cywn kernel:  ?
__switch_to_asm+0x40/0x70
    Ιουλ 09 21:17:39.719149 teras.baskin.cywn kernel:  ?
__switch_to_asm+0x34/0x70
    Ιουλ 09 21:17:39.719177 teras.baskin.cywn kernel:  ?
__switch_to_asm+0x40/0x70
    Ιουλ 09 21:17:39.719203 teras.baskin.cywn kernel:  ?
__switch_to_asm+0x34/0x70
    Ιουλ 09 21:17:39.719229 teras.baskin.cywn kernel:  ?
__switch_to_asm+0x40/0x70
    Ιουλ 09 21:17:39.719254 teras.baskin.cywn kernel:  ?
__switch_to_asm+0x34/0x70
    Ιουλ 09 21:17:39.719280 teras.baskin.cywn kernel:  ?
__switch_to_asm+0x40/0x70
    Ιουλ 09 21:17:39.719306 teras.baskin.cywn kernel:  ?
__switch_to_asm+0x34/0x70
    Ιουλ 09 21:17:39.719333 teras.baskin.cywn kernel:  ?
__switch_to_asm+0x40/0x70
    Ιουλ 09 21:17:39.719359 teras.baskin.cywn kernel:  ?
__switch_to_asm+0x34/0x70
    Ιουλ 09 21:17:39.719383 teras.baskin.cywn kernel:  ?
__switch_to_asm+0x40/0x70
    Ιουλ 09 21:17:39.719408 teras.baskin.cywn kernel:  ?
__switch_to_asm+0x34/0x70
    Ιουλ 09 21:17:39.719433 teras.baskin.cywn kernel:  ?
__switch_to_asm+0x40/0x70
    Ιουλ 09 21:17:39.719465 teras.baskin.cywn kernel:  ?
__switch_to_asm+0x34/0x70
    Ιουλ 09 21:17:39.719490 teras.baskin.cywn kernel:  ?
__switch_to_asm+0x40/0x70
    Ιουλ 09 21:17:39.719515 teras.baskin.cywn kernel:  ?
__switch_to+0x152/0x380
    Ιουλ 09 21:17:39.719545 teras.baskin.cywn kernel:  ?
__switch_to_asm+0x34/0x70
    Ιουλ 09 21:17:39.719572 teras.baskin.cywn kernel:  ? __schedule+0x1fe/0x560
    Ιουλ 09 21:17:39.719605 teras.baskin.cywn kernel:  ? usleep_range+0x80/0x80
    Ιουλ 09 21:17:39.719637 teras.baskin.cywn kernel:  ?
_cond_resched+0x16/0x40
    Ιουλ 09 21:17:39.719664 teras.baskin.cywn kernel:  ?
__wait_for_common+0x3b/0x160
    Ιουλ 09 21:17:39.719690 teras.baskin.cywn kernel:  commit_tail+0x94/0x130
[drm_kms_helper]
    Ιουλ 09 21:17:39.719727 teras.baskin.cywn kernel: 
process_one_work+0x1e3/0x3b0
    Ιουλ 09 21:17:39.719760 teras.baskin.cywn kernel:  worker_thread+0x46/0x340
    Ιουλ 09 21:17:39.719795 teras.baskin.cywn kernel:  ?
process_one_work+0x3b0/0x3b0
    Ιουλ 09 21:17:39.719830 teras.baskin.cywn kernel:  kthread+0x115/0x140
    Ιουλ 09 21:17:39.719888 teras.baskin.cywn kernel:  ?
__kthread_bind_mask+0x60/0x60
    Ιουλ 09 21:17:39.719920 teras.baskin.cywn kernel:  ret_from_fork+0x22/0x40
    Ιουλ 09 21:17:39.719952 teras.baskin.cywn kernel: Modules linked in: rfcomm
fuse af_packet vboxnetadp(O) vboxnetflt(O) cmac algif_hash vboxdrv(O)
algif_skcipher af_alg bnep dmi_sysfs msr it87 hwmon_vid squashfs xfs
nls_iso8859_1 nls_cp437 vfat fat loop edac_mce_amd uvcvideo kvm_amd pktcdvd ccp
videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 kvm videobuf2_common
snd_hda_codec_realtek snd_hda_codec_generic snd_usb_audio irqbypass
snd_hda_codec_hdmi ledtrig_audio videodev btusb snd_usbmidi_lib btrtl
snd_rawmidi btbcm snd_hda_intel btintel snd_seq_device snd_intel_dspcfg
crct10dif_pclmul crc32_pclmul mc ghash_clmulni_intel bluetooth joydev
snd_hda_codec aesni_intel ecdh_generic crypto_simd rfkill ecc cryptd
snd_hda_core glue_helper efi_pstore fam15h_power pcspkr k10temp sp5100_tco
snd_hwdep i2c_piix4 snd_pcm r8169 snd_timer realtek snd libphy soundcore
tiny_power_button button acpi_cpufreq tcp_bbr sch_fq hid_logitech_hidpp
hid_logitech_dj uas usb_storage hid_generic usbhid btrfs sr_mod cdrom
blake2b_generic libcrc32c xor amdgpu
    Ιουλ 09 21:17:39.720011 teras.baskin.cywn kernel:  ohci_pci amd_iommu_v2
gpu_sched i2c_algo_bit ttm drm_kms_helper raid6_pq crc32c_intel ata_generic
syscopyarea xhci_pci sysfillrect sysimgblt fb_sys_fops xhci_hcd cec ohci_hcd
rc_core ehci_pci ehci_hcd drm pata_atiixp usbcore sg dm_multipath dm_mod
scsi_dh_rdac scsi_dh_emc scsi_dh_alua efivarfs
    Ιουλ 09 21:17:39.720045 teras.baskin.cywn kernel: ---[ end trace
573bd378072b1ec2 ]---
    Ιουλ 09 21:17:39.720078 teras.baskin.cywn kernel: RIP:
0010:amdgpu_dm_atomic_commit_tail+0x273/0x10f0 [amdgpu]
    Ιουλ 09 21:17:39.720105 teras.baskin.cywn kernel: Code: 43 08 8b 90 e0 02
00 00 41 83 c6 01 44 39 f2 0f 87 3a ff ff ff 48 83 bd a0 fd ff ff 00 0f 84 03
01 00 00 48 8b bd a0 fd ff ff <80> bf b0 01 00 00 01 0f 86 ac 00 00 00 48 b9 00
00 00 00 01 00 00
    Ιουλ 09 21:17:39.720133 teras.baskin.cywn kernel: RSP:
0018:ffffb7cf4037bbe0 EFLAGS: 00010202
    Ιουλ 09 21:17:39.720163 teras.baskin.cywn kernel: RAX: ffff8fb2a5e11800
RBX: ffff8fb28f2c2880 RCX: ffff8fb10ff8ec00
    Ιουλ 09 21:17:39.720193 teras.baskin.cywn kernel: RDX: 0000000000000006
RSI: ffffffffc0b7f530 RDI: 3e9478a9ecb3abc8
    Ιουλ 09 21:17:39.720241 teras.baskin.cywn kernel: RBP: ffffb7cf4037be68
R08: 0000000000000001 R09: 0000000000000001
    Ιουλ 09 21:17:39.720294 teras.baskin.cywn kernel: R10: 000000000000014d
R11: 0000000000000018 R12: ffff8fb2a35e7400
    Ιουλ 09 21:17:39.720322 teras.baskin.cywn kernel: R13: 0000000000000000
R14: 0000000000000006 R15: ffff8fb112001000
    Ιουλ 09 21:17:39.720351 teras.baskin.cywn kernel: FS: 
0000000000000000(0000) GS:ffff8fb2aec40000(0000) knlGS:0000000000000000
    Ιουλ 09 21:17:39.720380 teras.baskin.cywn kernel: CS:  0010 DS: 0000 ES:
0000 CR0: 0000000080050033
    Ιουλ 09 21:17:39.720409 teras.baskin.cywn kernel: CR2: 00007f364b9bc000
CR3: 000000042a2ae000 CR4: 00000000000406e0
    Ιουλ 09 21:19:54.107989 teras.baskin.cywn kernel:
[drm:do_aquire_global_lock.isra.0 [amdgpu]] *ERROR* [CRTC:49:crtc-1] hw_done or
flip_done timed out
    Ιουλ 09 21:20:04.348029 teras.baskin.cywn kernel:
[drm:do_aquire_global_lock.isra.0 [amdgpu]] *ERROR* [CRTC:49:crtc-1] hw_done or
flip_done timed out
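
For readers unfamiliar with the fault message: on x86-64 with 4-level paging (48-bit virtual addresses, which is an assumption about this machine), an address is canonical only if bits 63..47 are all equal. The address in the fault line above is the same value the register dump shows in RDI. A minimal sketch of the check follows; the function name and default width are illustrative, not kernel code.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if a 64-bit address is canonical for the given VA width.

    Assumes 4-level paging (va_bits=48) unless told otherwise.
    """
    addr &= (1 << 64) - 1
    # Bit (va_bits - 1) and every bit above it must be identical.
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


# Values copied from the register dump in the trace above.
for name, value in [("RDI", 0x3E9478A9ECB3ABC8), ("RAX", 0xFFFF8FB2A5E11800)]:
    print(f"{name} = {value:#018x} canonical: {is_canonical(value)}")
```

RDI fails the check while RAX, which looks like an ordinary kernel pointer, passes. That is consistent with the general protection fault being raised by a dereference through RDI.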

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (52 preceding siblings ...)
  2020-07-10  7:23 ` bugzilla-daemon
@ 2020-07-10  7:36 ` bugzilla-daemon
  2020-07-10  8:10 ` bugzilla-daemon
                   ` (62 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-10  7:36 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #54 from Paul Menzel (pmenzel+bugzilla.kernel.org@molgen.mpg.de) ---
(In reply to Stratos Zolotas from comment #53)

> Don't know if it helps. I'm getting a similar issue on Opensuse Tumbleweed
> with kernel 5.7.7. Reverting to kernel 5.7.5 makes things stable for me. My
> GPU is RX580.

[…]

Thank you for your report. How quickly can you reproduce it? If you could
bisect the issue to pinpoint the culprit commit between 5.7.5 and 5.7.7, that’d
be great. Maybe open even a separate bug report, in case they are unrelated.
They can always be marked as duplicates later.


^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (53 preceding siblings ...)
  2020-07-10  7:36 ` bugzilla-daemon
@ 2020-07-10  8:10 ` bugzilla-daemon
  2020-07-10 10:55 ` bugzilla-daemon
                   ` (61 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-10  8:10 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #55 from Stratos Zolotas (strzol@gmail.com) ---
(In reply to Paul Menzel from comment #54)

> Thank you for your report. How quickly can you reproduce it? If you could
> bisect the issue to pinpoint the culprit commit between 5.7.5 and 5.7.7,
> that’d be great. Maybe open even a separate bug report, in case they are
> unrelated. They can always be marked as duplicates later.

If you guide me on what to do, I can report back in a few hours (I'm not at that
system right now). I had 4 crashes yesterday with kernel 5.7.7 within 3 hours of
everyday use (no gaming or anything like that). The system was unresponsive: ssh
to the box worked, but reboot from the console also hung, and only ALT+SysRq+B
rebooted the system. I booted the previous kernel (5.7.5) and it was stable for
over 6-7 hours.
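
As background for the bisection request: bisecting is a binary search over the commits between the last known-good and the first known-bad kernel. The sketch below only models that search. It assumes a linear history and a test that always gives the right verdict, and the commit names and the culprit are hypothetical. With a freeze that can take hours to appear, a "good" verdict in practice needs a long enough soak time.

```python
from typing import Callable, List, Sequence


def bisect_first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit.

    commits is ordered oldest to newest; commits[0] is known good and
    commits[-1] is known bad, so neither endpoint is tested again.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid      # culprit is at mid or earlier
        else:
            good = mid     # culprit is after mid
    return commits[bad]


# Hypothetical history between the two releases named in the thread.
history: List[str] = ["v5.7.5"] + [f"commit-{i:03d}" for i in range(1, 200)] + ["v5.7.7"]
culprit = "commit-137"     # made up for the demonstration
tested: List[str] = []


def is_bad(commit: str) -> bool:
    """Stand-in for 'build, boot and run this kernel until it freezes or not'."""
    tested.append(commit)
    return history.index(commit) >= history.index(culprit)


first_bad = bisect_first_bad(history, is_bad)
print(f"first bad: {first_bad}, kernels tested: {len(tested)}")
```

The point of the sketch is the cost: roughly log2(N) kernel builds and test runs for N candidate commits, rather than N.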


^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (54 preceding siblings ...)
  2020-07-10  8:10 ` bugzilla-daemon
@ 2020-07-10 10:55 ` bugzilla-daemon
  2020-07-10 11:25 ` bugzilla-daemon
                   ` (60 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-10 10:55 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

Duncan (1i5t5.duncan@cox.net) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Kernel Version|5.7-rc1 - 5.7 - 5.8-rc1+    |5.7-rc1 - 5.7 - 5.8-rc4+

--- Comment #56 from Duncan (1i5t5.duncan@cox.net) ---
Some notes and a question (last * point):

* There seem to be two, and it's now looking like three, near-identical bugs or
variants of the same bug, all with a very similar
amdgpu-dm-atomic-commit-tail/events-unbound-commit-work log trace.

1) Until now all the reports seemed to start by 5.7.0 and presumably between
5.6.0 and 5.7-rc1, which was when I first saw it.  But now, comment #53 is
reporting an origin with 5.7.6 or 5.7.7 while 5.7.5 was fine.  That's on rx580,
which Wikipedia says is polaris20.

2) Of the other two, one is reported fixed (on an rx5700/navi10) by commit
6eb3cf2e0 which we were asked to try above, that made it into 5.8-rc4, while...

3) My older rx460/polaris11, started with a pull shortly before 5.7-rc1 (that
I've been unable to properly bisect to, once for sure and it's looking like
twice, much to my frustration!) and continues all the way thru today's almost
5.8-rc5 -- the 6eb commit didn't help.

Seems the vega/navi graphics bug either started later (your 5.7.5 good, 5.7.7
bad) or is fixed by 6eb, while on my older polaris it started earlier and isn't
fixed by 6eb.

BTW Stratos, that 6eb commit appears to be in the fresh 5.7.8 as well.  Seeing
if the bug is still there would thus be interesting.

* Chris mentioned variable-refresh-rate/VRR in comment #49.  He was wondering
if turning it OFF helped him as he had done so when migrating cards and hadn't
seen the problem on his rx480 after that.

I hadn't messed with VRR here on my rx460/polaris11, because I'm running dual
4k TVs as monitors and didn't think they supported it, yet I was the OP, so at
least on rx460 having VRR off doesn't seem to help.  But just for kicks I did
try turning it on yesterday while back on a stable 5.6.0, and then booted to
today's near-5.8-rc5 to test it.  Still got the graphics freeze.  So that
didn't appear to affect the bug here on my rx460 anyway.

Interestingly enough, tho, quite aside from this bug and maybe it's all in my
head, but despite thinking VRR shouldn't be available here and expecting no
difference, turning it on /does/ seem to make things smoother.  Now I'm
wondering if even without actual VRR, turning it on helps something stay in
sync better, at least on my hardware.  <shrug>  Tho it doesn't seem to affect
how the bug triggers, maybe that'll be the hint necessary for the devs to
figure out what's different with the bug on my rx460 compared to the newer
stuff, thus helping them to fix the older stuff too.

* Now the question: Anybody with this bug that is **NOT** running multi-monitor
when it triggers?  Seems all I've seen are multi-monitor, but someone could
have simply not mentioned (or I just missed it) that they're seeing it on
single-monitor too.  (If you are running multi-monitor you don't need to post a
reply just for this, as that seems to be the reported default.  But having
explicit confirmation of whether it affects single-monitor or not could be
helpful.)


^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (55 preceding siblings ...)
  2020-07-10 10:55 ` bugzilla-daemon
@ 2020-07-10 11:25 ` bugzilla-daemon
  2020-07-10 14:31 ` bugzilla-daemon
                   ` (59 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-10 11:25 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #57 from Anthony Ruhier (anthony.ruhier@gmail.com) ---
To be more precise about the kernel version range: I've been staying on 5.6.19
for a while, and it doesn't have the issue. That's pretty bad though, as it's EOL.

Only the 5.7 branch has it. So it's something that wasn't backported.

I also have a multi-monitor setup, with one monitor that supports VRR, though I
don't know if VRR changes anything in my case as the 2 other screens don't
support it.


^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (56 preceding siblings ...)
  2020-07-10 11:25 ` bugzilla-daemon
@ 2020-07-10 14:31 ` bugzilla-daemon
  2020-07-12  5:20 ` bugzilla-daemon
                   ` (58 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-10 14:31 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #58 from Anthony Ruhier (anthony.ruhier@gmail.com) ---
Sorry, I forgot to say that I have a vega64.


^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (57 preceding siblings ...)
  2020-07-10 14:31 ` bugzilla-daemon
@ 2020-07-12  5:20 ` bugzilla-daemon
  2020-07-12  5:47 ` bugzilla-daemon
                   ` (57 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-12  5:20 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #59 from chancuan66@gmail.com ---
(In reply to Paul Menzel from comment #54)
> (In reply to Stratos Zolotas from comment #53)
> 
> > Don't know if it helps. I'm getting a similar issue on Opensuse Tumbleweed
> > with kernel 5.7.7. Reverting to kernel 5.7.5 makes things stable for me. My
> > GPU is RX580.
> 
> […]
> 
> Thank you for your report. How quickly can you reproduce it? If you could
> bisect the issue to pinpoint the culprit commit between 5.7.5 and 5.7.7,
> that’d be great. Maybe open even a separate bug report, in case they are
> unrelated. They can always be marked as duplicates later.

I am running the same setup as in that comment: RX 580, Tumbleweed, with both
kernels 5.7.5 and 5.7.7 installed. On 5.7.7, it happens almost immediately after
login. However, reverting to 5.7.5 does NOT stabilise things; the same problem
arises after somewhere between 1 and 10 minutes.

I didn't have this issue prior to installing the 5.7.7 kernel though...


^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (58 preceding siblings ...)
  2020-07-12  5:20 ` bugzilla-daemon
@ 2020-07-12  5:47 ` bugzilla-daemon
  2020-07-12  7:47 ` bugzilla-daemon
                   ` (56 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-12  5:47 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #60 from Stratos Zolotas (strzol@gmail.com) ---
(In reply to Chan Cuan from comment #59)

> I didn't have this issue prior to installing the 5.7.7 kernel though...

To make things look even stranger... I have an inexplicable development with
this issue. When it appeared, I was in the middle of upgrading some components
of my system. I replaced my AMD FX-8350 with an AMD Ryzen 5 3600X and my
Gigabyte GA-970a-ds3p motherboard with a Gigabyte X570 UD (along with new RAM
DIMMs, going from 16GB to 32GB). The RX580 stayed the same and the OS is also
the same (disks moved to the new motherboard, no re-install). Guess what... I've
been running 5.7.7 for 48 hours now without issues.... the problem has
disappeared. I suspect a very rare combination of things, maybe not even in the
amdgpu driver itself... With 5.7.7 on my "old" configuration, I had the crash
almost immediately after login, like in the above comment.


^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (59 preceding siblings ...)
  2020-07-12  5:47 ` bugzilla-daemon
@ 2020-07-12  7:47 ` bugzilla-daemon
  2020-07-14 23:36 ` bugzilla-daemon
                   ` (55 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-12  7:47 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #61 from Christopher Snowhill (kode54@gmail.com) ---
It may be worth noting that I also haven't experienced this crash lately, and
one of the things I did recently was update my motherboard BIOS, which included
an update from AGESA 1.0.0.4 release 2, to 1.0.0.6am4.


^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (60 preceding siblings ...)
  2020-07-12  7:47 ` bugzilla-daemon
@ 2020-07-14 23:36 ` bugzilla-daemon
  2020-07-15 16:49 ` bugzilla-daemon
                   ` (54 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-14 23:36 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

Duncan (1i5t5.duncan@cox.net) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Kernel Version|5.7-rc1 - 5.7 - 5.8-rc4+    |5.7-rc1 - 5.7 - 5.8-rc5+

--- Comment #62 from Duncan (1i5t5.duncan@cox.net) ---
(In reply to Duncan from comment #48)
> (In reply to Duncan from comment #47)
> > So I tried [patch-reverting] with the 11 above commits against
> > 5.8.0-rc4-00025-gbfe91da29, which previously tested as triggering the
> > freeze for me.  Of the 11, nine clean-reversed and I simply noted and
> > skipped the other two (3202fa62f and 630f289b7) for the moment.  The
> > patched kernel successfully built and I'm booted to it now.
> 
> Bah, humbug!  Got a freeze and the infamous logged trace on that too

After taking a few days' discouragement break, I'm back at trying to pin it
down.  The test quoted above left two candidate commits, 3202fa62f and
630f289b7, neither of which would revert cleanly because later commits were
preventing that.

630f289b7 is a few lines changed in many files so I'm focusing on the simpler
3202fa62f first.  Turns out the reason 320... wasn't reverting was two
additional fixes to it that landed before v5.7.  Since they had Fixes: 320...
labels they were easy enough to find and patch-revert, after which
patch-reverting 320... itself worked against a current v5.8-rc5-8-g0dc589da8. 
I first tested it without the reverts to be sure it's still triggering this bug
for me, and just confirmed it was, freeze with the telltale log dump.

So for me at least, v5.8-rc5 is bad (just updated the version field to reflect
that).

Meanwhile I've applied the three 320-and-followups revert-patches to
v5.8-rc5-8-g0dc589da8 and just did the rebuild with them applied.  Now to
reboot to it and see if it still has our bug.  If no, great, pinned down.  If
yes, there's still that 630... commit to try to test.


^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (61 preceding siblings ...)
  2020-07-14 23:36 ` bugzilla-daemon
@ 2020-07-15 16:49 ` bugzilla-daemon
  2020-07-15 17:12 ` bugzilla-daemon
                   ` (53 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-15 16:49 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #63 from Duncan (1i5t5.duncan@cox.net) ---
(In reply to Duncan from comment #62)
> I've applied the three 320-and-followups revert-patches to
> v5.8-rc5-8-g0dc589da8 and just did the rebuild with them applied.
> Now to reboot to it and see if it still has our bug.

NB: The 3202fa62f followups are cbfc35a48 and 89b83f282.  That should let
anyone else with git and kernel building skills try reverting the three.

Still too early (by days) to call it nailed down as I've had it take 2-3 days
to trigger, but no gfx freeze here yet on that v5.8-rc5+ with 320-and-followups
reverted so far, despite playing 4k video to try to trigger it as it has
previously on affected kernels.  I'll be trying update builds (gentoo) later
today or tomorrow, another previous trigger, so we'll see how it goes.

But initial results are good enough to let others know that may want to try
it...


^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (62 preceding siblings ...)
  2020-07-15 16:49 ` bugzilla-daemon
@ 2020-07-15 17:12 ` bugzilla-daemon
  2020-07-16  2:12 ` bugzilla-daemon
                   ` (52 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-15 17:12 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #64 from Anthony Ruhier (anthony.ruhier@gmail.com) ---
(In reply to Duncan from comment #63)
> (In reply to Duncan from comment #62)
> > I've applied the three 320-and-followups revert-patches to
> > v5.8-rc5-8-g0dc589da8 and just did the rebuild with them applied.
> > Now to reboot to it and see if it still has our bug.
> 
> NB: The 3202fa62f followups are cbfc35a48 and 89b83f282.  That should let
> anyone else with git and kernel building skills try reverting the three.
> 
> Still too early (by days) to call it nailed down as I've had it take 2-3
> days to trigger, but no gfx freeze here yet on that v5.8-rc5+ with
> 320-and-followups reverted so far, despite playing 4k video to try to
> trigger it as it has previously on affected kernels.  I'll be trying update
> builds (gentoo) later today or tomorrow, another previous trigger, so we'll
> see how it goes.
> 
> But initial results are good enough to let others know that may want to try
> it...

Thanks a lot, I'm also trying on my side.


^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (63 preceding siblings ...)
  2020-07-15 17:12 ` bugzilla-daemon
@ 2020-07-16  2:12 ` bugzilla-daemon
  2020-07-16  6:37 ` bugzilla-daemon
                   ` (51 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-16  2:12 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #65 from Duncan (1i5t5.duncan@cox.net) ---
(In reply to Duncan from comment #63)
> NB: The 3202fa62f followups are cbfc35a48 and 89b83f282.  That should let
> anyone else with git and kernel building skills try reverting the three.
> 
> Still too early (by days) to call it nailed down as I've had it take 2-3
> days to trigger, but no gfx freeze here yet on that v5.8-rc5+ with
> 320-and-followups reverted so far, despite playing 4k video to try to
> trigger it as it has previously on affected kernels.  I'll be trying update
> builds (gentoo) later today or tomorrow, another previous trigger, so we'll
> see how it goes.

I'm still not saying for sure, but that's actually looking like the culprit.

Today's gentoo update included a dep of qtwebengine, which changed ABI so
qtwebengine needed rebuilt on top of it, and qtwebengine is chromium-based. 
And as anyone that's built chromium (or firefox for that matter) can tell you,
at least on older fx-based hardware, it's several hours of near constant 100%
all-cores.

While rebuilding qtwebengine (at a batch-nice of +19 so it doesn't interfere
too badly with anything else I want to run), I was playing youtube videos at
1080p, not normally a problem by themselves (tho 4k can be, especially 4k60)
but with qtwebengine building at the same time...

No freezes.

I'm going to run with the 320 commit and followups reverted a few more days
before declaring it for sure the culprit, and I'm watching for Anthony's
results as well, but the bug's sure doing a convincing job of hiding ATM if
that commit isn't the culprit!

I'd say it's time to start reviewing the amdgpu code to see what relocating the
slub freelist pointer to the middle of the object (what the 320 commit did
according to its git log explanation) could tickle, when the work goes on the
work queue to run later, since that's consistently what the logs say is the
scenario and what mnrzk confirmed by forcing it /not/ to go to the work queue
in comment #30.
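
To illustrate why moving the freelist pointer could matter, here is a toy model, not kernel code and not a confirmed diagnosis. It assumes that some code keeps reading a pointer field from an object after the object was freed (a use-after-free), and that the field happens to sit in the middle of the object. The object size, field offset and the "link" value are all hypothetical; the link value simply reuses the faulting address from the trace as a stand-in for whatever the allocator stores there.

```python
import struct

OBJ_SIZE = 64                 # hypothetical object size in bytes
FIELD_OFFSET = 32             # hypothetical pointer field in the middle of the object
PTR = struct.Struct("<Q")     # one little-endian 64-bit pointer


def make_object(field_value: int) -> bytearray:
    """Create a zeroed object whose pointer field holds field_value."""
    obj = bytearray(OBJ_SIZE)
    PTR.pack_into(obj, FIELD_OFFSET, field_value)
    return obj


def toy_free(obj: bytearray, link_value: int, link_offset: int) -> None:
    """Mimic an allocator reusing part of a freed object for its free-list link."""
    PTR.pack_into(obj, link_offset, link_value)


def stale_read(obj: bytearray) -> int:
    """Read the pointer field through a stale reference (use-after-free)."""
    return PTR.unpack_from(obj, FIELD_OFFSET)[0]


VALID = 0xFFFF8FB2A5E11800    # kernel-style pointer (RAX in the trace), illustrative
LINK = 0x3E9478A9ECB3ABC8     # stand-in for the stored free-list value, illustrative

# Old layout: the link lives at offset 0, so the stale field survives the free.
old_layout = make_object(VALID)
toy_free(old_layout, LINK, link_offset=0)

# New layout: the link lives in the middle and overwrites the stale field.
new_layout = make_object(VALID)
toy_free(new_layout, LINK, link_offset=OBJ_SIZE // 2)

print(f"old layout stale read: {stale_read(old_layout):#018x}")
print(f"new layout stale read: {stale_read(new_layout):#018x}")
```

In this model the same latent bug goes unnoticed with the old layout, because the stale field still holds a plausible pointer, and faults with the new layout, because the field now holds allocator data. Whether amdgpu actually has such a stale read on the deferred commit path is exactly what would need reviewing.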

Hopefully we can still get and confirm a proper codefix by 5.8.0 release. =:^)


^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (64 preceding siblings ...)
  2020-07-16  2:12 ` bugzilla-daemon
@ 2020-07-16  6:37 ` bugzilla-daemon
  2020-07-16  9:35 ` bugzilla-daemon
                   ` (50 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-16  6:37 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

Paul Menzel (pmenzel+bugzilla.kernel.org@molgen.mpg.de) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |akpm@linux-foundation.org,
                   |                            |kees@outflux.net

--- Comment #66 from Paul Menzel (pmenzel+bugzilla.kernel.org@molgen.mpg.de) ---
Kees, Andrew, do you have any idea how commit 3202fa62fb (slub: relocate
freelist pointer to middle of object) could cause a regression?


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (65 preceding siblings ...)
  2020-07-16  6:37 ` bugzilla-daemon
@ 2020-07-16  9:35 ` bugzilla-daemon
  2020-07-16 10:24 ` bugzilla-daemon
                   ` (49 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-16  9:35 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #67 from Anthony Ruhier (anthony.ruhier@gmail.com) ---
No freezes for me either. I compiled firefox yesterday, which usually triggers
a freeze on 5.7, and nothing so far. That's really good news if it stays that
way, thanks a lot Duncan!

FYI, I applied the revert on 5.7.8, as I didn't want to run 5.8.


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (66 preceding siblings ...)
  2020-07-16  9:35 ` bugzilla-daemon
@ 2020-07-16 10:24 ` bugzilla-daemon
  2020-07-16 10:30 ` bugzilla-daemon
                   ` (48 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-16 10:24 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #68 from Stratos Zolotas (strzol@gmail.com) ---
(In reply to Stratos Zolotas from comment #60)
> 
> To make things looks more strange... I have a non-explicable development
> with this issue. When it appeared to me I was in the middle of upgrading
> some components on my system. I replaced my AMD FX-8350 with one AMD Ryzen 5
> 3600X and my Gigabyte GA-970a-ds3p motherboard with one Gigabyte X570 UD
> (along with new RAM dimms from 16GB to 32GB). RX580 stayed the same and also
> OS is the same (disks moved to the new motherboard, no re-install). Guess
> what... running with 5.7.7 for 48 hours now without issues.... problem has
> disappeared. I suspect a very rare combination of things maybe even not in
> the amdgpu driver itself... With 5.7.7 on my "old" configuration, I had the
> crash almost immediately after login like in the above comment.

Just to report that I got the issue again after some days with my new hardware
setup, so it is still there. Hope you guys pinpoint it soon!


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (67 preceding siblings ...)
  2020-07-16 10:24 ` bugzilla-daemon
@ 2020-07-16 10:30 ` bugzilla-daemon
  2020-07-16 10:32 ` bugzilla-daemon
                   ` (47 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-16 10:30 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #69 from Anthony Ruhier (anthony.ruhier@gmail.com) ---
(In reply to Stratos Zolotas from comment #68)
> (In reply to Stratos Zolotas from comment #60)
> > 
> > To make things looks more strange... I have a non-explicable development
> > with this issue. When it appeared to me I was in the middle of upgrading
> > some components on my system. I replaced my AMD FX-8350 with one AMD Ryzen 5
> > 3600X and my Gigabyte GA-970a-ds3p motherboard with one Gigabyte X570 UD
> > (along with new RAM dimms from 16GB to 32GB). RX580 stayed the same and also
> > OS is the same (disks moved to the new motherboard, no re-install). Guess
> > what... running with 5.7.7 for 48 hours now without issues.... problem has
> > disappeared. I suspect a very rare combination of things maybe even not in
> > the amdgpu driver itself... With 5.7.7 on my "old" configuration, I had the
> > crash almost immediately after login like in the above comment.
> 
> Just to report that got the issue after some days with my new hardware
> setup, so it is still there, hope you guys pinpoint it soon!

You're talking about having the bug with 5.7.7 vanilla, right? Not with the
revert of the commits quoted above?


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (68 preceding siblings ...)
  2020-07-16 10:30 ` bugzilla-daemon
@ 2020-07-16 10:32 ` bugzilla-daemon
  2020-07-17 12:39 ` bugzilla-daemon
                   ` (46 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-16 10:32 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #70 from Stratos Zolotas (strzol@gmail.com) ---
(In reply to Anthony Ruhier from comment #69)

> 
> You're talking about having the bug with 5.7.7 vanilla, right? Not with the
> revert of the commits quoted above?

Yes! It seemed to have "disappeared" with the hardware change, but it probably
just took a while to appear. And correct, I'm on a vanilla kernel.


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (69 preceding siblings ...)
  2020-07-16 10:32 ` bugzilla-daemon
@ 2020-07-17 12:39 ` bugzilla-daemon
  2020-07-20  2:20 ` bugzilla-daemon
                   ` (45 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-17 12:39 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #71 from Anthony Ruhier (anthony.ruhier@gmail.com) ---
Just to give some news, I can confirm that I haven't had any freeze since
Wednesday. Usually, when my system just idled, it would quickly trigger the
bug. That or doing something CPU intensive (like compiling firefox). But
nothing since I reverted the 3 commits.

Really good job Duncan! Thanks a lot for your debug!

MB chipset: x470 
CPU: ryzen 2700x
GPU: vega64


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (70 preceding siblings ...)
  2020-07-17 12:39 ` bugzilla-daemon
@ 2020-07-20  2:20 ` bugzilla-daemon
  2020-07-21 16:40 ` bugzilla-daemon
                   ` (44 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-20  2:20 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #72 from Vinicius (mphantomx@yahoo.com.br) ---
Confirming that reverting 3202fa62f, cbfc35a48 and 89b83f282, fixed my
polaris10 too.

Tested with 5.7.8 and 5.7.9, Radeon RX 570.


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (71 preceding siblings ...)
  2020-07-20  2:20 ` bugzilla-daemon
@ 2020-07-21 16:40 ` bugzilla-daemon
  2020-07-21 16:57 ` bugzilla-daemon
                   ` (43 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-21 16:40 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

Jeremy Kescher (jeremy@kescher.at) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jeremy@kescher.at

--- Comment #73 from Jeremy Kescher (jeremy@kescher.at) ---
Confirming as well that 3202fa62f, cbfc35a48 and 89b83f282 are the commits that
cause this regression.

Tested with 5.7.9, Radeon RX 480.


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (72 preceding siblings ...)
  2020-07-21 16:40 ` bugzilla-daemon
@ 2020-07-21 16:57 ` bugzilla-daemon
  2020-07-21 19:32 ` bugzilla-daemon
                   ` (42 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-21 16:57 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #74 from Paul Menzel (pmenzel+bugzilla.kernel.org@molgen.mpg.de) ---
I sent a message to the LKML and amd-gfx lists [1], asking Kees and Andrew how
to proceed.

[1]: https://lkml.org/lkml/2020/7/21/729
     "[Regression] hangs caused by commit 3202fa62fb (slub: relocate freelist
pointer to middle of object)"


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (73 preceding siblings ...)
  2020-07-21 16:57 ` bugzilla-daemon
@ 2020-07-21 19:32 ` bugzilla-daemon
  2020-07-21 20:33 ` bugzilla-daemon
                   ` (41 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-21 19:32 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #75 from Kees Cook (kees@outflux.net) ---
Hi!

First, let me say sorry for all the work my patch has caused! It seems like it
might be tickling another (previously dormant) bug in the gpu driver.


(In reply to mnrzk from comment #30)
> I've been looking at this bug for a while now and I'll try to share what
> I've found about it.
> 
> In some conditions, when amdgpu_dm_atomic_commit_tail calls
> dm_atomic_get_new_state, dm_atomic_get_new_state returns a struct
> dm_atomic_state* with an garbage context pointer.
> 
> I've also found that this bug exclusively occurs when commit_work is on the
> workqueue. After forcing drm_atomic_helper_commit to run all of the commits
> without adding to the workqueue and running the OS, the issue seems to have
> disappeared. The system was stable for at least 1.5 hours before I manually
> shut it down (meanwhile it has usually crashed within 30-45 minutes).
> 
> Perhaps there's some sort of race condition occurring after commit_work is
> queued?

If it helps to explain what's happening in 3202fa62f, the kernel memory
allocator is moving its free pointer from offset 0 to the middle of the
object. That means that when the memory is freed, it writes 8 bytes to join the
newly freed memory into the allocator's freelist. That always happened, but
after 3202fa62f it began writing it in the middle, not offset 0. If the work
queue is trying to use freed memory, and before it didn't notice the first 8
bytes getting written, now it appears to notice the overwrite... but that still
means something is freeing memory before it should.
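
To make the offsets concrete, here is a minimal user-space sketch (not the
kernel's code). The helper name freeptr_offset() is hypothetical, and the
midpoint formula (half the object size, rounded up to pointer alignment) is an
assumption about what 3202fa62f does. Objects smaller than two pointers are
deliberately left out of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to the next multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a) (((x) + ((size_t)(a) - 1)) & ~((size_t)(a) - 1))

/*
 * Hypothetical helper: byte offset at which the free pointer is stored
 * inside a freed object of `size` bytes.
 * Assumption: offset 0 before 3202fa62f, and the pointer-aligned
 * midpoint of the object after it.
 */
static size_t freeptr_offset(size_t size, int relocated)
{
	/* The sketch only covers objects that can hold two pointers. */
	assert(size >= 2 * sizeof(void *));

	if (!relocated)
		return 0;
	return ALIGN_UP(size / 2, sizeof(void *));
}
```

Under these assumptions a 16-byte object gets its free pointer written at
offset 0 before the change and at offset 8 after it, i.e. into the second
pointer-sized slot.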

Finding that might be a real trick. :( However, if you've suffered through all
those bisections, I wonder if you can try one other thing, which is to compile
the kernel with KASAN:

CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y
CONFIG_KASAN_OUTLINE=y
CONFIG_KASAN_STACK=y
CONFIG_KASAN_VMALLOC=y

This will make things _slow_, which might mean the use-after-free race may
never trigger. *However* it's possible that it'll catch a bad behavior before
it even needs to get hit in a race that triggers the behavior you're seeing.
(And note that swapping CONFIG_KASAN_OUTLINE=y for CONFIG_KASAN_INLINE=y might
speed things up, but the kernel image gets bigger).

I'm going to try to read the work queue code for the driver and see if anything
obvious stands out...


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (74 preceding siblings ...)
  2020-07-21 19:32 ` bugzilla-daemon
@ 2020-07-21 20:33 ` bugzilla-daemon
  2020-07-21 20:49 ` bugzilla-daemon
                   ` (40 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-21 20:33 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #76 from mnrzk@protonmail.com ---
(In reply to Kees Cook from comment #75)
> Hi!
> 
> First, let me say sorry for all the work my patch has caused! It seems like
> it might be tickling another (previously dormant) bug in the gpu driver.
> 
> 
> (In reply to mnrzk from comment #30)
> > I've been looking at this bug for a while now and I'll try to share what
> > I've found about it.
> > 
> > In some conditions, when amdgpu_dm_atomic_commit_tail calls
> > dm_atomic_get_new_state, dm_atomic_get_new_state returns a struct
> > dm_atomic_state* with an garbage context pointer.
> > 
> > I've also found that this bug exclusively occurs when commit_work is on the
> > workqueue. After forcing drm_atomic_helper_commit to run all of the commits
> > without adding to the workqueue and running the OS, the issue seems to have
> > disappeared. The system was stable for at least 1.5 hours before I manually
> > shut it down (meanwhile it has usually crashed within 30-45 minutes).
> > 
> > Perhaps there's some sort of race condition occurring after commit_work is
> > queued?
> 
> If it helps to explain what's happening in 3202fa62f, the kernel memory
> allocator is moving its free pointer from offset 0 to the middle of the
> object. That means that when the memory is freed, it writes 8 bytes to join
> the newly freed memory into the allocator's freelist. That always happened,
> but after 3202fa62f it began writing it in the middle, not offset 0. If the
> work queue is trying to use freed memory, and before it didn't notice the
> first 8 bytes getting written, now it appears to notice the overwrite... but
> that still means something is freeing memory before it should.
> 
> Finding that might be a real trick. :( However, if you've suffered through
> all those bisections, I wonder if you can try one other thing, which is to
> compile the kernel with KASAN:
> 
> CONFIG_KASAN=y
> CONFIG_KASAN_GENERIC=y
> CONFIG_KASAN_OUTLINE=y
> CONFIG_KASAN_STACK=y
> CONFIG_KASAN_VMALLOC=y
> 
> This will make things _slow_, which might mean the use-after-free race may
> never trigger. *However* it's possible that it'll catch a bad behavior
> before it even needs to get hit in a race that triggers the behavior you're
> seeing. (And note that swapping CONFIG_KASAN_OUTLINE=y for
> CONFIG_KASAN_INLINE=y might speed things up, but the kernel image gets
> bigger).
> 
> I'm going to try to read the work queue code for the driver and see if
> anything obvious stands out...

Actually, this makes perfect sense: struct dm_atomic_state* dm_state has
two members, base (a struct containing a struct drm_atomic_state*) and
context (a struct dc_state*). Reading through the code of
amdgpu_dm_atomic_commit_tail, I see that dm_state->base is never used.

If my understanding is correct, base would previously have been overwritten
by the freelist pointer (since it's the first 8 bytes). Now that the freelist
pointer is being put in the middle (rounded to the nearest sizeof(void*),
or 8 bytes), it lands in the last 8 bytes of *dm_state, i.e.
dm_state->context.

I'll place a void* for padding in the middle of struct dm_atomic_state and,
if my hypothesis is correct, the padding will be filled with garbage data
instead of context and the bug should be fixed. Of course, there would
still be a use-after-free bug in the code which may cause other issues in
the future so I wouldn't really consider it a solution.

Regarding KASAN, I've tried compiling the kernel with KASAN enabled and
from my experience, the bug did not trigger after actively using the system
for 3 hours and leaving it on for 12 hours. This was almost a month ago
though so maybe I'll try again with different KASAN options (i.e.
CONFIG_KASAN_INLINE=y). If anyone has any more tips on getting KASAN to run
faster, I'll be glad to hear them.


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (75 preceding siblings ...)
  2020-07-21 20:33 ` bugzilla-daemon
@ 2020-07-21 20:49 ` bugzilla-daemon
  2020-07-21 20:56 ` bugzilla-daemon
                   ` (39 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-21 20:49 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #77 from Kees Cook (kees@outflux.net) ---
(Midair collision... you saw the same about the structure layout as I did.
Here's my comment...)

(In reply to mnrzk from comment #30)
> I've been looking at this bug for a while now and I'll try to share what
> I've found about it.
> 
> In some conditions, when amdgpu_dm_atomic_commit_tail calls
> dm_atomic_get_new_state, dm_atomic_get_new_state returns a struct
> dm_atomic_state* with an garbage context pointer.

It looks like when amdgpu_dm_atomic_commit_tail() walks the private objects
list with for_each_new_private_obj_in_state(), it'll return the first object's
state when the function pointer tables match. This is a struct dm_atomic_state
allocation, which is 16 bytes:

struct drm_private_state {
        struct drm_atomic_state *state;
};

struct dm_atomic_state {
        struct drm_private_state base;
        struct dc_state *context;
};

If struct dm_atomic_state is being freed early, this would match the behavior
seen: before 3202fa62f, .base.state would be overwritten with a freelist
pointer. After 3202fa62f, .context will be overwritten.
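
The effect on a stale reader can be mimicked in user space. This is only a
sketch: fake_free() and context_survives_free() are hypothetical helpers, the
DRM/DC types are left opaque, and the "free" just stores a dummy link value at
the given offset instead of calling any real allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Opaque stand-ins so the layouts quoted above compile in user space. */
struct drm_atomic_state;
struct dc_state;

struct drm_private_state {
	struct drm_atomic_state *state;
};

struct dm_atomic_state {
	struct drm_private_state base;
	struct dc_state *context;
};

/*
 * Hypothetical stand-in for the allocator's free path: store a freelist
 * link at byte offset `off` of the "freed" object. The object stays
 * readable, as it would for a stale reader in the kernel.
 */
static void fake_free(struct dm_atomic_state *obj, size_t off, uintptr_t link)
{
	assert(off + sizeof(link) <= sizeof(*obj));
	memcpy((unsigned char *)obj + off, &link, sizeof(link));
}

/* Returns 1 if a stale reader would still see the original context. */
static int context_survives_free(size_t freeptr_off)
{
	/* Dummy pointer value, never dereferenced. */
	struct dc_state *ctx = (struct dc_state *)(uintptr_t)0x1000;
	struct dm_atomic_state obj = { .base = { .state = NULL }, .context = ctx };

	fake_free(&obj, freeptr_off, (uintptr_t)0xdeadbeef);
	return obj.context == ctx;
}
```

With the link stored at offsetof(struct dm_atomic_state, base) the context
pointer is untouched, so a premature free goes unnoticed by code that only
reads .context. With the link stored at offsetof(struct dm_atomic_state,
context) the stale reader picks up the link value instead.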

In looking for all "kfree(.*state" patterns in
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c, I see a few suspicious
things, maybe. dm_crtc_destroy_state() and amdgpu_dm_connector_funcs_reset() do
an explicit kfree(state) -- should they use dm_atomic_destroy_state() instead?
Or nothing at all, since I'd expect "state" to be managed by the drm layer via
the .atomic_destroy_state callback?


> I've also found that this bug exclusively occurs when commit_work is on the
> workqueue. After forcing drm_atomic_helper_commit to run all of the commits
> without adding to the workqueue and running the OS, the issue seems to have
> disappeared. The system was stable for at least 1.5 hours before I manually
> shut it down (meanwhile it has usually crashed within 30-45 minutes).

Is this the async call to "commit_work" in drm_atomic_helper_commit()?

There's a big warning in there:

        /*
         * Everything below can be run asynchronously without the need to grab
         * any modeset locks at all under one condition: It must be guaranteed
         * that the asynchronous work has either been cancelled (if the driver
         * supports it, which at least requires that the framebuffers get
         * cleaned up with drm_atomic_helper_cleanup_planes()) or completed
         * before the new state gets committed on the software side with
         * drm_atomic_helper_swap_state().
         ...

I'm not sure how to determine whether amdgpu_dm.c is doing this correctly.

I can't tell what can interfere with drm_atomic_helper_commit() -- I would
guess the race is between that and something else causing a kfree(), but I
don't know the APIs here at all...


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (76 preceding siblings ...)
  2020-07-21 20:49 ` bugzilla-daemon
@ 2020-07-21 20:56 ` bugzilla-daemon
  2020-07-21 21:16 ` bugzilla-daemon
                   ` (38 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-21 20:56 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #78 from Kees Cook (kees@outflux.net) ---
(In reply to mnrzk from comment #76)
> If my understanding is correct, base would have previously been filled with
> the freelist pointer (since it's the first 8 bytes). Now since the freelist
> pointer is being put in the middle (rounded to the nearest sizeof(void*),
>  or 8 bytes), it's being put in the last 8 bytes of *dm_state
> (or dm_state->context).
> 
> I'll place a void* for padding in the middle of struct dm_atomic_state* and
> if my hypothesis is correct, the padding will be filled with garbage data
> instead of context and the bug should be fixed. Of course, there would
> still be a use-after-free bug in the code which may cause other issues in
> the future so I wouldn't really consider it a solution.

Agreed: that should make it disappear again, but as you say, it's just kicking
the problem down the road since now the failing condition is losing a race with
kfree()+kmalloc()+new contents.

And if you want to detect without crashing, you can just zero the padding at
init time and report when it's non-NULL at workqueue run time... I wonder if
KASAN can run in a mode where the allocation/freeing tracking happens, but
without the heavy checking instrumentation? Then when the corruption is
detected, it could dump a traceback about who did the early kfree()... hmmm.
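
A rough user-space sketch of that detect-without-crashing idea follows. The
struct and function names are hypothetical, the fields are plain void*
stand-ins for the real members, and it assumes the allocator's write actually
lands on the padding. In the driver the report would presumably be a warning
with a stack dump rather than a return value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space mirror of the padded struct. */
struct padded_state {
	void *base_state; /* stands in for struct drm_private_state base */
	void *canary;     /* padding: zeroed at init, never written by the owner */
	void *context;    /* stands in for struct dc_state *context */
};

/* Zero everything (including the canary), then set the real payload. */
static void padded_state_init(struct padded_state *s, void *context)
{
	assert(s != NULL);
	memset(s, 0, sizeof(*s));
	s->context = context;
}

/*
 * Call at the point of use (e.g. when the queued work runs).
 * Returns 1 if something wrote to the padding since init, which would
 * suggest the object was freed (and possibly reused) in the meantime.
 */
static int padded_state_canary_tripped(const struct padded_state *s)
{
	return s->canary != NULL;
}
```

A clean canary does not prove the object is still live; it only means that
nothing has written to that slot yet.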


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (77 preceding siblings ...)
  2020-07-21 20:56 ` bugzilla-daemon
@ 2020-07-21 21:16 ` bugzilla-daemon
  2020-07-22  2:03 ` bugzilla-daemon
                   ` (37 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-21 21:16 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #79 from mnrzk@protonmail.com ---
(In reply to Kees Cook from comment #78)
> (In reply to mnrzk from comment #76)
> > If my understanding is correct, base would have previously been filled with
> > the freelist pointer (since it's the first 8 bytes). Now since the freelist
> > pointer is being put in the middle (rounded to the nearest sizeof(void*),
> >  or 8 bytes), it's being put in the last 8 bytes of *dm_state
> > (or dm_state->context).
> > 
> > I'll place a void* for padding in the middle of struct dm_atomic_state* and
> > if my hypothesis is correct, the padding will be filled with garbage data
> > instead of context and the bug should be fixed. Of course, there would
> > still be a use-after-free bug in the code which may cause other issues in
> > the future so I wouldn't really consider it a solution.
> 
> Agreed: that should make it disappear again, but as you say, it's just
> kicking the problem down the road since now the failing condition is losing
> a race with kfree()+kmalloc()+new contents.
> 
> And if you want to detect without crashing, you can just zero the padding at
> init time and report when it's non-NULL at workqueue run time... I wonder if
> KASAN can run in a mode where the allocation/freeing tracking happens, but
> without the heavy checking instrumentation? Then when the corruption is
> detected, it could dump a traceback about who did the early kfree()... hmmm.

So far I've been testing it by passing my GPU to my VM via vfio-pci and
attaching kgdb to the guest. To test if the context was invalid, I added
a check to make sure the context pointer wasn't garbage data (by checking
if dc_state was not null and the upper 16 bits were set on dc_state).
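
For reference, the check amounts to something like the sketch below. The
function names are hypothetical, and the heuristic assumes x86-64 with a
48-bit virtual address space, where kernel pointers have all of the upper 16
bits set.

```c
#include <stdint.h>

/*
 * Heuristic: on x86-64 with 48-bit virtual addresses, a kernel pointer
 * has its upper 16 bits all set (0xffffXXXXXXXXXXXX).
 */
static int looks_like_kernel_pointer(uint64_t value)
{
	return (value >> 48) == 0xffffu;
}

/* A context value is suspicious if it is non-NULL yet fails the heuristic. */
static int context_looks_corrupt(uint64_t value)
{
	return value != 0 && !looks_like_kernel_pointer(value);
}
```

The faulting address from the original report, 0xc1316515e40a92f6, fails this
check because its upper 16 bits are 0xc131.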

I wonder if there's any way to set a watchpoint to see where exactly the
dm_atomic_state gets filled with garbage data.

Also, since I'm not too familiar with freelists, do freelist pointers look
like regular pointers? On a system with a 48-bit virtual address space,
regular pointers would look something like
0xffffXXXXXXXXXXXX. I've noticed that the data being inserted never
followed this format. Is this something valuable to note or is that just
the nature of freelist pointers?


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (78 preceding siblings ...)
  2020-07-21 21:16 ` bugzilla-daemon
@ 2020-07-22  2:03 ` bugzilla-daemon
  2020-07-22  2:05 ` bugzilla-daemon
                   ` (36 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-22  2:03 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #80 from Kees Cook (kees@outflux.net) ---
(In reply to mnrzk from comment #79)
> I wonder if there's any way to set a watchpoint to see where exactly the
> dm_atomic_state gets filled with garbage data.

set_freepointer() in mm/slub.c (reached via several possible paths through
slab_free()) writes the pointer. What you really want to know is "who called
kfree() before this tried to read from here?".

> Also, since I'm not too familiar with freelists, do freelist pointers look
> like regular pointers? On a regular pointer on a system with a 48-bit
> virtual address space, regular pointers would be something like
> 0xffffXXXXXXXXXXXX. I've noticed that the data being inserted never
> followed this format. Is this something valuable to note or is that just
> the nature of freelist pointers?

With CONFIG_SLAB_FREELIST_HARDENED=y the contents will be randomly permuted on
a per-slab basis. Without, they'll look like a "regular" kernel heap pointer
(0xffff....). You may have much more exciting failure modes without
CONFIG_SLAB_FREELIST_HARDENED since the pointer will actually be valid. :P
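
As a rough illustration of why a hardened freelist value does not look like a
pointer, here is a sketch of XOR-style obfuscation. It is an assumption that
the stored value mixes the real next-free pointer with a per-cache random
secret and the address it is stored at; the function names are hypothetical
and the kernel's exact transform may differ.

```c
#include <stdint.h>

/*
 * Sketch only. Assumption: stored = next ^ secret ^ slot_addr, where
 * `secret` is a per-cache random value and `slot_addr` is the address
 * at which the link is stored.
 */
static uintptr_t obfuscate_link(uintptr_t next, uintptr_t secret,
				uintptr_t slot_addr)
{
	return next ^ secret ^ slot_addr;
}

/* XOR is its own inverse, so decoding applies the same transform. */
static uintptr_t decode_link(uintptr_t stored, uintptr_t secret,
			     uintptr_t slot_addr)
{
	return obfuscate_link(stored, secret, slot_addr);
}
```

Code that reads the stored value as if it were a plain pointer (a stale
reader, for example) sees the scrambled bits, which would be consistent with
the non-0xffff... values observed above.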


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (79 preceding siblings ...)
  2020-07-22  2:03 ` bugzilla-daemon
@ 2020-07-22  2:05 ` bugzilla-daemon
  2020-07-22  3:37 ` bugzilla-daemon
                   ` (35 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-22  2:05 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #81 from Kees Cook (kees@outflux.net) ---
I assume this is the change, BTW:

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
index d61186ff411d..2b8da2b17a5d 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
@@ -424,6 +424,8 @@ struct dm_crtc_state {
 struct dm_atomic_state {
        struct drm_private_state base;

+       /* This will be overwritten by the freelist pointer during kfree() */
+       void *padding;
        struct dc_state *context;
 };


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (80 preceding siblings ...)
  2020-07-22  2:05 ` bugzilla-daemon
@ 2020-07-22  3:37 ` bugzilla-daemon
  2020-07-22  7:27 ` bugzilla-daemon
                   ` (34 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-22  3:37 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #82 from mnrzk@protonmail.com ---
(In reply to Kees Cook from comment #81)
> I assume this is the change, BTW:
> 
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
> index d61186ff411d..2b8da2b17a5d 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
> @@ -424,6 +424,8 @@ struct dm_crtc_state {
>  struct dm_atomic_state {
>         struct drm_private_state base;
>  
> +       /* This will be overwritten by the freelist pointer during kfree() */
> +       void *padding;
>         struct dc_state *context;
>  };

Yeah that's exactly the change I made, save for the comment of course.

I just got around to actually testing it and it appears to still crash.
Either my hypothesis was wrong or I'm doing something wrong here.

Do you have any ideas?


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (81 preceding siblings ...)
  2020-07-22  3:37 ` bugzilla-daemon
@ 2020-07-22  7:27 ` bugzilla-daemon
  2020-07-22 13:04 ` bugzilla-daemon
                   ` (33 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-22  7:27 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #83 from Christian König (christian.koenig@amd.com) ---
Instead of working around the bug I think we should concentrate on nailing the
root cause.

I suggest inserting a use-after-free check into just that structure. In other
words, add a field "magic_number", fill it with 0xdeadbeef on allocation, and set
it to zero before the kfree().

A simple BUG_ON(ptr->magic_number != 0xdeadbeef) should yield results rather
quickly.

Then just add printk()s before the kfree() to figure out why we have this use
after free race.


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (82 preceding siblings ...)
  2020-07-22  7:27 ` bugzilla-daemon
@ 2020-07-22 13:04 ` bugzilla-daemon
  2020-07-23  0:48 ` bugzilla-daemon
                   ` (32 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-22 13:04 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #84 from Nicholas Kazlauskas (nicholas.kazlauskas@amd.com) ---
We don't manually free the dm_state from amdgpu, that should be handled by the
DRM core.

It should generally only be freed once it's no longer in use by the DRM core as
well, once the state has been swapped and we drop the reference on the old state
at the end of commit tail.

If DRM private objects work the same as regular DRM objects - which from my
impression they should - then they should be NULL until they've been acquired
for a new state as needed.

This turns out to be on almost every commit in our current code. I think most
commits that touch planes or CRTCs would end up doing this.

I kind of wonder if we're keeping the old dm_state pointer that was freed in
the case where it isn't duplicated and for whatever reason it isn't actually
NULL.

Based on the above discussion I guess we're probably not doing a use after free
on the dc_state itself.

There have been other bugs with private objects in DRM in the past that didn't
exist with the regular objects, enough that I'd almost consider finding an
alternative solution here and not keeping an old vs new dc_state, just to avoid
using them in the first place.


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (83 preceding siblings ...)
  2020-07-22 13:04 ` bugzilla-daemon
@ 2020-07-23  0:48 ` bugzilla-daemon
  2020-07-23  5:46 ` bugzilla-daemon
                   ` (31 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-23  0:48 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #85 from mnrzk@protonmail.com ---
(In reply to Christian König from comment #83)
> Instead of working around the bug I think we should concentrate on nailing
> the root cause.
> 
> I suggest to insert an use after free check into just that structure. In
> other words add a field "magic_number" will it with 0xdeadbeef on allocation
> and set it to zero before the kfree().
> 
> A simple BUG_ON(ptr->magic_number != 0xdeadbeef) should yield results rather
> quickly.
> 
> Then just add printk()s before the kfree() to figure out why we have this
> use after free race.

Fair point, I was just trying to confirm my hypothesis.

I realised why the test failed: adding 8 bytes of padding to the middle
made the struct size 24 bytes. Since the freelist pointer is placed at
the middle (12 bytes), aligned up to the next multiple of 8 bytes, the
pointer ended up at an offset of 16 bytes (context).

After making the padding an array of 2 void* and initialising it to
{0xDEADBEEFCAFEF00D, 0x1BADF00D1BADC0DE}, the padding was eventually
corrupted with the context being left intact and therefore, no crashing.

GDB output of dm_state:
{
    base = {state = 0xffff888273884c00},
    padding = {0xdeadbeefcafef00d, 0x513df83afd3ad7b2},
    context = 0xffff88824e680000
}

That said, I still don't know the root cause of the bug. I'll see
if I can use KASAN or something to figure out what exactly freed
dm_state. If anyone who is more familiar with this code has any advice
for me, please let me know.


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (84 preceding siblings ...)
  2020-07-23  0:48 ` bugzilla-daemon
@ 2020-07-23  5:46 ` bugzilla-daemon
  2020-07-23 21:30 ` bugzilla-daemon
                   ` (30 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-23  5:46 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #86 from mnrzk@protonmail.com ---
Created attachment 290475
  --> https://bugzilla.kernel.org/attachment.cgi?id=290475&action=edit
KASAN Use-after-free

Good news, I got KASAN to spit out a use-after-free bug report.

Here's the KASAN bug report, I'm currently trying to understand
what's going on here.

Hopefully someone else can figure something out from this.


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (85 preceding siblings ...)
  2020-07-23  5:46 ` bugzilla-daemon
@ 2020-07-23 21:30 ` bugzilla-daemon
  2020-07-23 21:34 ` bugzilla-daemon
                   ` (29 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-23 21:30 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #87 from mnrzk@protonmail.com ---
Good news, I wrote a patch that fixed this bug on my machine and submitted
it to the Linux kernel mailing list [1].

I've tested this for almost 12 hours with KASAN enabled and 3 hours with
all debugging options disabled while watching videos and there have been no
crashes. The longest it's taken for the bug to occur in the past for me was
about 1 hour.

To anyone experiencing this bug, please test out the patch and report on
whether or not it works. I think we'll need some Tested-bys in the LKML
thread and in here before we can consider this bug fixed.

[1] https://lkml.org/lkml/2020/7/23/1123


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (86 preceding siblings ...)
  2020-07-23 21:30 ` bugzilla-daemon
@ 2020-07-23 21:34 ` bugzilla-daemon
  2020-07-24  7:18 ` bugzilla-daemon
                   ` (28 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-23 21:34 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #88 from mnrzk@protonmail.com ---
Created attachment 290485
  --> https://bugzilla.kernel.org/attachment.cgi?id=290485&action=edit
Possible bug fix #1

(In reply to mnrzk from comment #87)
> Good news, I wrote a patch that fixed this bug on my machine and submitted
> it to the Linux kernel mailing list [1].
> 
> I've tested this for almost 12 hours with KASAN enabled and 3 hours with
> all debugging options disabled while watching videos and there have been no
> crashes. The longest it's taken for the bug to occur in the past for me was
> about 1 hour.
> 
> To anyone experiencing this bug, please test out the patch and report on
> whether on not it works. I think we'll need some Tested-bys in the LKML
> thread and in here before we can consider this bug fixed.
> 
> [1] https://lkml.org/lkml/2020/7/23/1123

For convenience, I'll attach the patch here as well.


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (87 preceding siblings ...)
  2020-07-23 21:34 ` bugzilla-daemon
@ 2020-07-24  7:18 ` bugzilla-daemon
  2020-07-24  7:24 ` bugzilla-daemon
                   ` (27 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-24  7:18 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #89 from Christian König (christian.koenig@amd.com) ---
(In reply to mnrzk from comment #87)
> Good news, I wrote a patch that fixed this bug on my machine and submitted
> it to the Linux kernel mailing list [1].

You should probably send it to the amd-gfx@lists.freedesktop.org mailing list
as well if you haven't already done so.

I'm not an expert on the DC state stuff, so Harry or Alex need to validate this
patch. But offhand it looks like a nice catch to me.

Good work :)


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (88 preceding siblings ...)
  2020-07-24  7:18 ` bugzilla-daemon
@ 2020-07-24  7:24 ` bugzilla-daemon
  2020-07-24 19:08 ` bugzilla-daemon
                   ` (26 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-24  7:24 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #90 from mnrzk@protonmail.com ---
(In reply to Christian König from comment #89)
> (In reply to mnrzk from comment #87)
> > Good news, I wrote a patch that fixed this bug on my machine and submitted
> > it to the Linux kernel mailing list [1].
> 
> You should probably send it to the amd-gfx@lists.freedesktop.org mailing
> list as well if you haven't already done so.
> 
> I'm not an expert on the DC state stuff, so Harry or Alex need to validate
> this patch. But of hand it looks like a nice catch to me.
> 
> Good work :)

After further testing, it seems that it only caused the issue to be delayed.
An hour and a half after I submitted the patch, my system crashed.

I mentioned this on the LKML thread but I forgot to mention it here.

I have a suspicion that the same state is being committed twice. I'll have
to investigate this further though. Once I determine if it is, I'll report
back on here and perhaps that will help with a bug fix.


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (89 preceding siblings ...)
  2020-07-24  7:24 ` bugzilla-daemon
@ 2020-07-24 19:08 ` bugzilla-daemon
  2020-07-24 21:00 ` bugzilla-daemon
                   ` (25 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-24 19:08 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

laser.eyess.trackers@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |laser.eyess.trackers@gmail.
                   |                            |com

--- Comment #91 from laser.eyess.trackers@gmail.com ---
I wanted to comment on this bug because I believe I have been experiencing it
based on a bug report I filed with amdgpu[1]. As of 5.7.8 on Arch Linux I am no
longer experiencing this bug regularly. Usually I could trigger it every 1-3
days. The biggest change I made was turning off adaptive_sync (VRR, Freesync,
etc.) in my window manager. Now it's been almost a week and I haven't seen it.
Right now I am on 5.7.9 and will keep running as long as possible until it
crashes again, if it crashes again.


I see some discussion here about race conditions between memory allocations and
atomic commits, and while I don't understand most of it, would I be correct in
assuming that variable frame timing would exacerbate this bug? If so, I believe
that is exactly what I am experiencing. I'd love to help test patches for this
as they come in, but for now I want to add that VRR is an important part of the
equation for this bug for me.


The bug report linked in [1] has more of my set up but all I'll say here is
that I also have a multimonitor setup, each one supports VRR and they are at
varying resolutions/refresh rates; two at 1440p 144Hz, one at 4k 60Hz.


1. https://gitlab.freedesktop.org/drm/amd/-/issues/1216


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (90 preceding siblings ...)
  2020-07-24 19:08 ` bugzilla-daemon
@ 2020-07-24 21:00 ` bugzilla-daemon
  2020-07-25  2:38 ` bugzilla-daemon
                   ` (24 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-24 21:00 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #92 from Nicholas Kazlauskas (nicholas.kazlauskas@amd.com) ---
This sounds very similar to a bug I fixed a year ago but that issue was with
freeing the dc_state.

https://bugzilla.kernel.org/show_bug.cgi?id=204181

1. Client requests non-blocking Commit #1, has a new dc_state #1,
state is swapped, commit tail is deferred to work queue

2. Client requests non-blocking Commit #2, has a new dc_state #2,
state is swapped, commit tail is deferred to work queue

3. Commit #2 work starts before Commit #1, commit tail finishes,
atomic state is cleared, dc_state #1 is freed

4. Commit #1 work starts after Commit #2, uses dc_state #1, NULL pointer deref.

This issue was fixed, but it occurred under similar conditions - heavy system
load and frequent pageflipping.

However, in the case of dm_state things can't be solved in the same manner.
Commit #2 can't free Commit #1's commit - only the commit tail for Commit #1
can free it along with the IOCTL caller.

I don't know if this is going down any of the deadlock paths in DRM core
because that might trigger strange behavior as well with clearing/putting the
dm_state.

If someone who can reproduce this issue can produce a dmesg log with the DRM
IOCTLs logged (I think drm.debug=0x54 should work) then I should be able to
examine the IOCTL sequence in more detail.


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (91 preceding siblings ...)
  2020-07-24 21:00 ` bugzilla-daemon
@ 2020-07-25  2:38 ` bugzilla-daemon
  2020-07-26  6:47 ` bugzilla-daemon
                   ` (23 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-25  2:38 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #93 from mnrzk@protonmail.com ---
(In reply to Nicholas Kazlauskas from comment #92)
> This sounds very similar to a bug I fixed a year ago but that issue was with
> freeing the dc_state.
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=204181
> 
> 1. Client requests non-blocking Commit #1, has a new dc_state #1,
> state is swapped, commit tail is deferred to work queue
> 
> 2. Client requests non-blocking Commit #2, has a new dc_state #2,
> state is swapped, commit tail is deferred to work queue
> 
> 3. Commit #2 work starts before Commit #1, commit tail finishes,
> atomic state is cleared, dc_state #1 is freed
> 
> 4. Commit #1 work starts after Commit #2, uses dc_state #1, NULL pointer
> deref.
> 
> This issue was fixed, but it occurred under similar conditions - heavy
> system load and frequent pageflipping.
> 
> However, in the case of dm_state things can't be solved in the same manner.
> Commit #2 can't free Commit #1's commit - only the commit tail for Commit #1
> can free it along with the IOCTL caller.
> 
> I don't know if this is going down any of the deadlock paths in DRM core
> because that might trigger strange behavior as well with clearing/putting
> the dm_state.
> 
> If someone who can reproduce this issue can produce a dmesg log with the DRM
> IOCTLs logged (I think drm.debug=0x54 should work) then I should be able to
> examine the IOCTL sequence in more detail.

Yes, this actually seems quite similar to that bug. Perhaps it's something
like that bug but with dm_state instead?

Also, some more observations I've made:
While dm_state is encountering a use-after-free bug, it does not seem like
state as a whole is. The KASAN bug report only states that reading from
dm_state is invalid, but the same cannot be said about state.

Furthermore, dm_state seems to be used in two separate commits and is being
freed after one commit is complete. This creates a race between the two
commits where the completion of one commit before the other calls
dm_atomic_get_new_state causes a use-after-free.

I think the bug works something like this. Keep in mind that I haven't
worked with this code outside of this bug report so there may be a few
misconceptions:

1. Client requests non-blocking Commit #1, has a new dm_state #1,
state is swapped, commit tail is deferred to work queue

2. Client requests non-blocking Commit #2, has a new dm_state #2,
state is swapped, commit tail is deferred to work queue

3. Commit #2 work starts before Commit #1, commit tail finishes,
atomic state is cleared, dm_state #1 is freed

4. Commit #1 work starts after Commit #2, uses dm_state #1 (use-after-free),
reads bad context pointer and dereferences freelist pointer instead.

So I would agree that this is very similar to the dc_state bug (I even
based that explanation on yours). Perhaps that bug you fixed also
affected dm_state as a whole but only caused an issue with dc_state at the
time?


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (92 preceding siblings ...)
  2020-07-25  2:38 ` bugzilla-daemon
@ 2020-07-26  6:47 ` bugzilla-daemon
  2020-07-26 18:40 ` bugzilla-daemon
                   ` (22 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-26  6:47 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #94 from mnrzk@protonmail.com ---
I just got this interesting log w/ drm.debug=0x54 right before a crash:

[  971.537862] [drm:drm_atomic_state_init [drm]] Allocated atomic state
00000000cac2d51a
[  971.537909] [drm:drm_atomic_get_crtc_state [drm]] Added [CRTC:47:crtc-0]
00000000dc3e08a2 state to 00000000cac2d51a
[  971.537938] [drm:drm_atomic_get_plane_state [drm]] Added [PLANE:45:plane-5]
00000000ab054dfb state to 00000000cac2d51a
[  971.537963] [drm:drm_atomic_set_fb_for_plane [drm]] Set [FB:103] for
[PLANE:45:plane-5] state 00000000ab054dfb
[  971.537988] [drm:drm_atomic_check_only [drm]] checking 00000000cac2d51a
[  971.538064] [drm:drm_atomic_get_private_obj_state [drm]] Added new private
object 00000000da817c3e state 000000001743c8e6 to 00000000cac2d51a
[  971.538211] [drm:drm_atomic_nonblocking_commit [drm]] committing
00000000cac2d51a nonblocking
[  971.538898] [drm:drm_atomic_state_init [drm]] Allocated atomic state
00000000cc027c4b
[  971.538941] [drm:drm_atomic_get_crtc_state [drm]] Added [CRTC:49:crtc-1]
00000000992fcbd2 state to 00000000cc027c4b
[  971.538968] [drm:drm_atomic_get_plane_state [drm]] Added [PLANE:44:plane-4]
000000009d6970b1 state to 00000000cc027c4b
[  971.538992] [drm:drm_atomic_set_fb_for_plane [drm]] Set [FB:103] for
[PLANE:44:plane-4] state 000000009d6970b1
[  971.539017] [drm:drm_atomic_check_only [drm]] checking 00000000cc027c4b
[  971.539108] [drm:drm_atomic_get_private_obj_state [drm]] Added new private
object 00000000da817c3e state 0000000057153d72 to 00000000cc027c4b
[  971.539140] [drm:drm_atomic_nonblocking_commit [drm]] committing
00000000cc027c4b nonblocking
[  971.544942] [drm:drm_atomic_state_default_clear [drm]] Clearing atomic state
00000000cc027c4b
[  971.544977] [drm:__drm_atomic_state_free [drm]] Freeing atomic state
00000000cc027c4b

and then my debugger detected a use-after-free while 00000000cac2d51a was being
committed.

Basically the sequence of events is as follows:

1. Non-blocking commit #1 (00000000cac2d51a) was requested, allocated, and is
deferred to workqueue.

2. Non-blocking commit #2 (00000000cc027c4b) was requested, allocated, and is
deferred to workqueue.

3. Commit #2 starts and completes before commit #1 is started, dm_state is
freed.

4. Commit #1 starts after commit #2 and is using commit #2's freed dm_state
pointer.

And from every instance of this bug I have seen, it has been due to
page-flipping.

So Nicholas, it seems your observation was correct; the sequence of events is
very similar to how you've described the other bug.

Perhaps we'll have to look into the page-flipping code to figure out what
exactly is going on.


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (93 preceding siblings ...)
  2020-07-26  6:47 ` bugzilla-daemon
@ 2020-07-26 18:40 ` bugzilla-daemon
  2020-07-26 19:55 ` bugzilla-daemon
                   ` (21 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-26 18:40 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #95 from Nicholas Kazlauskas (nicholas.kazlauskas@amd.com) ---
Created attachment 290583
  --> https://bugzilla.kernel.org/attachment.cgi?id=290583&action=edit
0001-drm-amd-display-Force-add-all-CRTCs-to-state-when-us.patch

So the sequence looks like the following:

1. Non-blocking commit #1 requested, checked, swaps state and deferred to work
queue.

2. Non-blocking commit #2 requested, checked, swaps state and deferred to work
queue.

Commits #1 and #2 don't touch any of the same core DRM objects (CRTCs, Planes,
Connectors) so Commit #2 does not stall for Commit #1. DRM Private Objects have
always been avoided in stall checks, so we have no safety from DRM core in this
regard.

3. Due to system load commit #2 executes first and finishes its commit tail
work. At the end of commit tail, as part of DRM core, it calls
drm_atomic_state_put().

Since this was the pageflip IOCTL we likely already dropped the reference on
the state held by the IOCTL itself. So it's going to actually free at this
point.

This eventually calls drm_atomic_state_clear() which does the following:

obj->funcs->atomic_destroy_state(obj, state->private_objs[i].state);

Note that it clears "state" here. Commit sets "state" to the following:

state->private_objs[i].state = old_obj_state;
obj->state = new_obj_state;

Since Commit #1 swapped first this means Commit #2 actually does free Commit
#1's private object.

4. Commit #1 then executes and we get a use after free.

Same bug, it's just that this was never corrupted before the slab changes. It's
been sitting dormant for 5.0~5.8.

Attached is a patch that might help resolve this.


* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (94 preceding siblings ...)
  2020-07-26 18:40 ` bugzilla-daemon
@ 2020-07-26 19:55 ` bugzilla-daemon
  2020-07-26 22:52 ` bugzilla-daemon
                   ` (20 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-26 19:55 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #96 from mnrzk@protonmail.com ---
(In reply to Nicholas Kazlauskas from comment #95)
> Created attachment 290583 [details]
> 0001-drm-amd-display-Force-add-all-CRTCs-to-state-when-us.patch
> 
> So the sequence looks like the following:
> 
> 1. Non-blocking commit #1 requested, checked, swaps state and deferred to
> work queue.
> 
> 2. Non-blocking commit #2 requested, checked, swaps state and deferred to
> work queue.
> 
> Commits #1 and #2 don't touch any of the same core DRM objects (CRTCs,
> Planes, Connectors) so Commit #2 does not stall for Commit #1. DRM Private
> Objects have always been avoided in stall checks, so we have no safety from
> DRM core in this regard.
> 
> 3. Due to system load commit #2 executes first and finishes its commit tail
> work. At the end of commit tail, as part of DRM core, it calls
> drm_atomic_state_put().
> 
> Since this was the pageflip IOCTL we likely already dropped the reference on
> the state held by the IOCTL itself. So it's going to actually free at this
> point.
> 
> This eventually calls drm_atomic_state_clear() which does the following:
> 
> obj->funcs->atomic_destroy_state(obj, state->private_objs[i].state);
> 
> Note that it clears "state" here. Commit sets "state" to the following:
> 
> state->private_objs[i].state = old_obj_state;
> obj->state = new_obj_state;

What line number roughly does that happen on? I can't seem to find that 
anywhere in amdgpu_dm.c

> 
> Since Commit #1 swapped first this means Commit #2 actually does free Commit
> #1's private object.
> 
> 4. Commit #1 then executes and we get a use after free.
> 
> Same bug, it's just this was never corrupted before by the slab changes.
> It's been sitting dormant for 5.0~5.8.
> 
> Attached is a patch that might help resolve this.

I actually just started testing my own patch, but I'll apply your patch
and see if it works though.

My patch is based on how you solved bug 204181 [1] and instead of setting
the new dc_state to the old dc_state, it frees the dm_state and removes
the associated private object.

If I understand correctly, if dm_state is set to NULL (i.e. new state
cannot be found), commit_tail retains the current state and context.
Since dm_state only contains the context (which is unused), I don't see
why freeing the state and clearing the private object beforehand would
be an issue.
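
To make the idea concrete, here is a rough user-space sketch of dropping one private object entry from a state's array and then looking it up again. It is not the attached patch: the types, the fixed-size array, and the names remove_private_obj and find_private_state are assumptions made for illustration, and a "destroyed" flag stands in for the object's atomic_destroy_state() callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PRIVATE_OBJS 4

/* Hypothetical, simplified stand-ins; not the kernel's struct layouts. */
struct private_obj { int id; };
struct obj_state { bool destroyed; };

struct private_entry {
    struct private_obj *ptr;
    struct obj_state *state;
};

struct atomic_state {
    struct private_entry private_objs[MAX_PRIVATE_OBJS];
    int num_private_objs;
};

/* Look up the state tracked for obj; NULL means "not part of this commit". */
static struct obj_state *find_private_state(const struct atomic_state *state,
                                            const struct private_obj *obj)
{
    for (int i = 0; i < state->num_private_objs; i++)
        if (state->private_objs[i].ptr == obj)
            return state->private_objs[i].state;
    return NULL;
}

/* Destroy the state tracked for obj and drop its entry; true if it was found. */
static bool remove_private_obj(struct atomic_state *state,
                               const struct private_obj *obj)
{
    assert(state->num_private_objs <= MAX_PRIVATE_OBJS);

    for (int i = 0; i < state->num_private_objs; i++) {
        int last = state->num_private_objs - 1;

        if (state->private_objs[i].ptr != obj)
            continue;

        /* Stands in for obj->funcs->atomic_destroy_state(). */
        state->private_objs[i].state->destroyed = true;

        /* Keep the array dense: move the last entry into the hole. */
        if (i != last)
            state->private_objs[i] = state->private_objs[last];

        state->private_objs[last].ptr = NULL;
        state->private_objs[last].state = NULL;
        state->num_private_objs = last;
        return true;
    }
    return false;
}
```

After the removal, find_private_state() returns NULL for that object, which corresponds to the "new state cannot be found" case described above; a later clear of the commit then has nothing of that object's to destroy.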

I would attach the patch but I'll need to clean up my code first. If the
patch works for the next few hours, I'll clean it up and attach it.

[1] https://patchwork.freedesktop.org/patch/320797/


^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (95 preceding siblings ...)
  2020-07-26 19:55 ` bugzilla-daemon
@ 2020-07-26 22:52 ` bugzilla-daemon
  2020-07-26 23:30 ` bugzilla-daemon
                   ` (19 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-26 22:52 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

mnrzk@protonmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #290485|0                           |1
        is obsolete|                            |

--- Comment #97 from mnrzk@protonmail.com ---
Created attachment 290591
  --> https://bugzilla.kernel.org/attachment.cgi?id=290591&action=edit
drm/amd/display: Clear dm_state for fast updates

Alright, the patch I mentioned in the last comment seems to be good
after a few hours of testing.

Please try out this patch and see if it fixes the issue for the rest of
you.

In the meantime, I'm doing more extended tests on this patch to confirm it
works well enough before posting it on LKML.

Nicholas, I haven't tested your commit since I was too busy with this. I'll
try it out if this one fails though.

Also, can you please review this patch to confirm that I'm not doing
anything wrong here?


^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (96 preceding siblings ...)
  2020-07-26 22:52 ` bugzilla-daemon
@ 2020-07-26 23:30 ` bugzilla-daemon
  2020-07-26 23:52 ` bugzilla-daemon
                   ` (18 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-26 23:30 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #98 from Nicholas Kazlauskas (nicholas.kazlauskas@amd.com) ---
As much as I'd like to remove the DRM private object from the state instead of
just carrying it over I'd really rather not be hacking around behavior from the
DRM core itself.

Maybe there's value in adding these as DRM helpers in the case where a driver
explicitly wants to remove something from the state. My guess as to why these
don't exist today is because they can be bug prone since the core implicitly
adds some objects (like CRTCs when you add a plane and CRTCs when you add
connectors), but I don't see any technical limitation that would prevent
exposing this.


^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (97 preceding siblings ...)
  2020-07-26 23:30 ` bugzilla-daemon
@ 2020-07-26 23:52 ` bugzilla-daemon
  2020-07-27  6:11 ` bugzilla-daemon
                   ` (17 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-26 23:52 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #99 from mnrzk@protonmail.com ---
(In reply to Nicholas Kazlauskas from comment #98)
> As much as I'd like to remove the DRM private object from the state instead
> of just carrying it over I'd really rather not be hacking around behavior
> from the DRM core itself.
> 
> Maybe there's value in adding these as DRM helpers in the case where a
> driver explicitly wants to remove something from the state. My guess as to
> why these don't exist today is because they can be bug prone since the core
> implicitly adds some objects (like CRTCs when you add a plane and CRTCs when
> you add connectors) but I don't see any technical limitation for not
> exposing this.

I'm a little bit confused, is there anything particularly illegal or
discouraged about the patch I sent? If so, how should I correct it?

Should I create some sort of DRM helper for deleting a private object and
use that to delete the state's associated private object?


^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (98 preceding siblings ...)
  2020-07-26 23:52 ` bugzilla-daemon
@ 2020-07-27  6:11 ` bugzilla-daemon
  2020-07-27 16:55 ` bugzilla-daemon
                   ` (16 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-27  6:11 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #100 from mnrzk@protonmail.com ---
I posted the patch on the LKML [1] just now so I can get the other
reviewers' input on it. I think it's safe to say that it's working now due
to how much I've tested it but I will test more over the coming days just
to be safe.

If anyone else can test this patch and give their Tested-by in the LKML
thread, or just comment in here about it, please do.

Aside from the description, this patch is identical to the one I just
attached.

Nicholas, sorry but I wasn't quite sure if you were giving a suggestion in
that comment earlier. Please tell me if you have any suggestions or
concerns with this patch.

[1] https://lkml.org/lkml/2020/7/27/64


^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (99 preceding siblings ...)
  2020-07-27  6:11 ` bugzilla-daemon
@ 2020-07-27 16:55 ` bugzilla-daemon
  2020-07-28  2:29 ` bugzilla-daemon
                   ` (15 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-27 16:55 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #101 from Duncan (1i5t5.duncan@cox.net) ---
(In reply to Nicholas Kazlauskas from comment #95)
> Created attachment 290583 [details]
> 0001-drm-amd-display-Force-add-all-CRTCs-to-state-when-us.patch

Just booted to 5.8-rc7 with this patched in locally (and the g320+ reverts
/not/ patched in).  So testing, but noting again that the bug can take a couple
days to trigger on my hardware, so while verifying bug-still-there /might/ be
fast, verifying that it's /not/ there will take awhile.

If this still bugs on me (and barring other developments first) I'll try
mnrzk's patch in place of this one.  Even if it's not permanent, getting it
into 5.8 as a temporary fix and doing something better for 5.9 would buy us
some time to develop and test the more permanent fix.


^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (100 preceding siblings ...)
  2020-07-27 16:55 ` bugzilla-daemon
@ 2020-07-28  2:29 ` bugzilla-daemon
  2020-07-28  3:21 ` bugzilla-daemon
                   ` (14 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-28  2:29 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #102 from Duncan (1i5t5.duncan@cox.net) ---
(In reply to Duncan from comment #101)
> (In reply to Nicholas Kazlauskas from comment #95)
> > 0001-drm-amd-display-Force-add-all-CRTCs-to-state-when-us.patch
> 
> Just booted to 5.8-rc7 with this patched in locally (and the g320+ reverts
> /not/ patched in).  So testing, but noting again that the bug can take a
> couple days to trigger on my hardware, so while verifying bug-still-there
> /might/ be fast, verifying that it's /not/ there will take awhile.

So far I've been building system updates (so heavy CPU load) while playing only
moderately demanding FHD video.  No freezes, but I have seen a bit of the
predicted judder.

I suspect the synchronization is preventing the freezes, and the judder hasn't
been /bad/.  But with different-refresh monitors (mine are both 60 Hz 4k
bigscreen TVs so same refresh), or trying 4k video, particularly 4k60 which my
system already struggles with, or possibly even both say 120 Hz monitors, the
judder would be noticeably worse.  The 4k30 and 4k60 youtube tests will
probably have to wait for tomorrow, tho, as I've been up near 24 now...


^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (101 preceding siblings ...)
  2020-07-28  2:29 ` bugzilla-daemon
@ 2020-07-28  3:21 ` bugzilla-daemon
  2020-07-28  3:39 ` bugzilla-daemon
                   ` (13 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-28  3:21 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #103 from mnrzk@protonmail.com ---
(In reply to Nicholas Kazlauskas from comment #95)
> Created attachment 290583 [details]
> 0001-drm-amd-display-Force-add-all-CRTCs-to-state-when-us.patch
> 
> So the sequence looks like the following:
> 
> 1. Non-blocking commit #1 requested, checked, swaps state and deferred to
> work queue.
> 
> 2. Non-blocking commit #2 requested, checked, swaps state and deferred to
> work queue.
> 
> Commits #1 and #2 don't touch any of the same core DRM objects (CRTCs,
> Planes, Connectors) so Commit #2 does not stall for Commit #1. DRM Private
> Objects have always been avoided in stall checks, so we have no safety from
> DRM core in this regard.
> 
> 3. Due to system load commit #2 executes first and finishes its commit tail
> work. At the end of commit tail, as part of DRM core, it calls
> drm_atomic_state_put().
> 
> Since this was the pageflip IOCTL we likely already dropped the reference on
> the state held by the IOCTL itself. So it's going to actually free at this
> point.
> 
> This eventually calls drm_atomic_state_clear() which does the following:
> 
> obj->funcs->atomic_destroy_state(obj, state->private_objs[i].state);
> 
> Note that it clears "state" here. Commit sets "state" to the following:
> 
> state->private_objs[i].state = old_obj_state;
> obj->state = new_obj_state;
> 
> Since Commit #1 swapped first this means Commit #2 actually does free Commit
> #1's private object.
> 
> 4. Commit #1 then executes and we get a use after free.
> 
> Same bug, it's just this was never corrupted before by the slab changes.
> It's been sitting dormant for 5.0~5.8.
> 
> Attached is a patch that might help resolve this.

So I just got around to testing this patch and so far, not very promising.

Right now I can't comment on if the bug in question was resolved but this
just introduced some new critical bugs for me.

I first tried this on my bare metal system w/ my RX 480 and it boots into
lightdm just fine. As soon as I log in and start up XFCE however, one of my
two monitors goes black (monitor reports being asleep) but my cursor seems
to drift into the other monitor just fine. So after that, I check the
display settings and both monitors are detected. So I tried re-enabling the
off monitor and then both monitors work fine.

After that, another bug: I now have two cursors, one only works on my right
monitor and the other only stays in one position.

At this point, I recompiled and remade the initramfs, and sure enough, same
issues. This time, however, changing the display settings didn't "fix" the
issue with one monitor being blank; the off monitor activated, but the
previously working one just froze.

I also tried this on my VM passing through my GPU w/ vfio-pci; similar
issues. Lightdm worked fine but when I started KDE Plasma, it started
flashing white and one of my monitors just became blank. This time, I
couldn't enable the blank display from the settings, it just didn't show
up. Xrandr only showed one output as well; switching HDMI outputs still
only lets me use the monitor on the "working" HDMI port.

I don't exactly know how I would go about debugging this since there are just
too many bugs to count. I also don't know if it would be worth it at all.

Do you have any idea why this would occur? This patch only seems to force
synchronisation, I don't quite know why it would break my system so much.


^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (102 preceding siblings ...)
  2020-07-28  3:21 ` bugzilla-daemon
@ 2020-07-28  3:39 ` bugzilla-daemon
  2020-07-28  7:14 ` bugzilla-daemon
                   ` (12 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-28  3:39 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #104 from mnrzk@protonmail.com ---
(In reply to mnrzk from comment #103)
> (In reply to Nicholas Kazlauskas from comment #95)
> > Created attachment 290583 [details]
> > 0001-drm-amd-display-Force-add-all-CRTCs-to-state-when-us.patch
> > 
> > So the sequence looks like the following:
> > 
> > 1. Non-blocking commit #1 requested, checked, swaps state and deferred to
> > work queue.
> > 
> > 2. Non-blocking commit #2 requested, checked, swaps state and deferred to
> > work queue.
> > 
> > Commits #1 and #2 don't touch any of the same core DRM objects (CRTCs,
> > Planes, Connectors) so Commit #2 does not stall for Commit #1. DRM Private
> > Objects have always been avoided in stall checks, so we have no safety from
> > DRM core in this regard.
> > 
> > 3. Due to system load commit #2 executes first and finishes its commit tail
> > work. At the end of commit tail, as part of DRM core, it calls
> > drm_atomic_state_put().
> > 
> > Since this was the pageflip IOCTL we likely already dropped the reference
> > on the state held by the IOCTL itself. So it's going to actually free at
> > this point.
> > 
> > This eventually calls drm_atomic_state_clear() which does the following:
> > 
> > obj->funcs->atomic_destroy_state(obj, state->private_objs[i].state);
> > 
> > Note that it clears "state" here. Commit sets "state" to the following:
> > 
> > state->private_objs[i].state = old_obj_state;
> > obj->state = new_obj_state;
> > 
> > Since Commit #1 swapped first this means Commit #2 actually does free
> > Commit #1's private object.
> > 
> > 4. Commit #1 then executes and we get a use after free.
> > 
> > Same bug, it's just this was never corrupted before by the slab changes.
> > It's been sitting dormant for 5.0~5.8.
> > 
> > Attached is a patch that might help resolve this.
> 
> So I just got around to testing this patch and so far, not very promising.
> 
> Right now I can't comment on if the bug in question was resolved but this
> just introduced some new critical bugs for me.
> 
> I first tried this on my bare metal system w/ my RX 480 and it boots into
> lightdm just fine. As soon as I log in and start up XFCE however, one of my
> two monitors goes black (monitor reports being asleep) but my cursor seems
> to drift into the other monitor just fine. So after that, I check the
> display settings and both monitors are detected. So I tried re-enabling the
> off monitor and then both monitors work fine.
> 
> After that, another bug: I now have two cursors, one only works on my right
> monitor and the other only stays in one position.
> 
> At this point, I recompiled and remade the initramfs, and sure enough, same
> issues. This time, however, changing the display settings didn't "fix" the
> issue with one monitor being blank; the off monitor activated, but the
> previously working one just froze.
> 
> I also tried this on my VM passing through my GPU w/ vfio-pci; similar
> issues. Lightdm worked fine but when I started KDE Plasma, it started
> flashing white and one of my monitors just became blank. This time, I
> couldn't enable the blank display from the settings, it just didn't show
> up. Xrandr only showed one output as well; switching HDMI outputs still
> only lets me use the monitor on the "working" HDMI port.
> 
> I don't exactly know how I would go about debugging this since there's just
> too many bugs to count. I also don't know if it would be worth it at all.
> 
> Do you have any idea why this would occur? This patch only seems to force
> synchronisation, I don't quite know why it would break my system so much.

This just gets even weirder the more I test it out. Swapping the two
monitors (i.e. swapping the HDMI ports used for each monitor) seems to fix
the issue completely on my VM (at least from 1 minute of testing), but on
the host it fixes some of the issues (my cursor still disappears on one of
my monitors).


^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (103 preceding siblings ...)
  2020-07-28  3:39 ` bugzilla-daemon
@ 2020-07-28  7:14 ` bugzilla-daemon
  2020-07-29  2:33 ` bugzilla-daemon
                   ` (11 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-28  7:14 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #105 from Duncan (1i5t5.duncan@cox.net) ---
(In reply to Duncan from comment #102)
> (In reply to Duncan from comment #101)
> > (In reply to Nicholas Kazlauskas from comment #95)
> > > 0001-drm-amd-display-Force-add-all-CRTCs-to-state-when-us.patch
> > 
> > Just booted to 5.8-rc7 with this patched in 
> 
> So far building system updates so heavy cpu load while playing only moderate
> FHD video.  No freezes but I have seen a bit of the predicted judder.
> 
> The 4k30 and 4k60 youtube tests will probably have to wait for tomorrow, tho,
> as I've been up near 24 now...

Still up...  Here's the promised 4k youtube-in-firefox tests.

4k is a bit more stuttery than normal with the patch, but not nearly as bad as I
expected it to be.  I can normally run 4k60 at 80-85% normal speed with
occasional stutters but without freezing the video entirely until I drop the
speed down again as I often have to do if I try running over that.  With the
patch I was doing 70-75%.  So there's definitely some effect on 4k60. 
Switching to the performance cpufreq governor from my default conservative, as
usual, helps a bit, but not a lot, maybe 5%, tho the frame-freezes seem to
recover a bit better on performance.  In addition to long video freezes at the
full 4k60 100%, even normally I'll sometimes get tab-crashes depending on the
video.  I didn't have any for this test but then I'm so used to not being able
to run at full-speed that I didn't try it for long.

I can normally run 4k30 videos without much problem on default conservative. 
With the patch I was still getting some stuttering at 30fps on conservative,
but it pretty much cleared up with on-demand.  I did just have a tab-crash at
4k30, something I very rarely if ever see normally on 4k30, it normally takes
4k60 to trigger them, so it's definitely affecting it.

But... other than slowing down the usable 4k fps, I'm not seeing any of the
judder artifacts on the work (non-video-playing) monitor that I was seeing with the
high system load but relatively low video load build testing with only FHD
video.  That surprised me.  I expected to see more of that with the more
demanding video.  But apparently that's tied to CPU or memory load, not video
load.

But nothing like the problems mnrzk's seeing with the patch, at all.  Both
monitors running fine in text mode, login, startx to plasma, running fine there
too.  Hardware cursor's fine. <shrug>  The only thing I'm seeing is some
slowdown and judder, as described above.


^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (104 preceding siblings ...)
  2020-07-28  7:14 ` bugzilla-daemon
@ 2020-07-29  2:33 ` bugzilla-daemon
  2020-07-29  6:41 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-29  2:33 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #106 from Duncan (1i5t5.duncan@cox.net) ---
(In reply to Duncan from comment #101)
> (In reply to Nicholas Kazlauskas from comment #95)
> > Created attachment 290583 [details]
> > 0001-drm-amd-display-Force-add-all-CRTCs-to-state-when-us.patch
> 
> Just booted to 5.8-rc7 with this patched in locally (and the g320+ reverts
> /not/ patched in).  So testing, but noting again that the bug can take a
> couple days to trigger on my hardware

This doesn't seem to trigger the bug at all for me, tho there's the expected
slowdown/judder from force-syncing all CRTCs as detailed in my last couple
comments.  But with the more serious side effects mnrzk is seeing with it, it's
clearly not useful as even a temporary mainline candidate.
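
As a rough illustration of why force-adding every CRTC serializes commits, here is a toy model in which each commit is reduced to a bitmask of the CRTCs it touches. The names (struct commit, must_stall, force_add_all_crtcs) are hypothetical, and the real DRM core tracks pending commits per CRTC rather than comparing masks; this is only a sketch of the overlap idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: a commit is just the set of CRTC indices it touches. */
struct commit {
    uint32_t crtc_mask;
};

/* An incoming commit only waits for a pending one if they share a CRTC. */
static bool must_stall(const struct commit *pending, const struct commit *incoming)
{
    return (pending->crtc_mask & incoming->crtc_mask) != 0;
}

/* Pull every CRTC into the commit, so it overlaps with any other commit. */
static void force_add_all_crtcs(struct commit *c, unsigned int num_crtcs)
{
    assert(num_crtcs > 0 && num_crtcs <= 32);
    c->crtc_mask = (num_crtcs == 32) ? UINT32_MAX
                                     : (UINT32_C(1) << num_crtcs) - 1;
}
```

With two monitors, a flip on CRTC 0 and a flip on CRTC 1 are independent in this model until one of them is widened to all CRTCs; after that every commit has to wait for the previous one, which would be consistent with the slowdown and judder described above.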

I'll be testing mnrzk's patch now.  Hopefully it'll be good enough for the
quickly approaching 5.8, tho dev consensus seems to be that a deeper rework is
needed longer-term.


^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (105 preceding siblings ...)
  2020-07-29  2:33 ` bugzilla-daemon
@ 2020-07-29  6:41 ` bugzilla-daemon
  2020-07-29 16:02 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-29  6:41 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #107 from Paul Menzel (pmenzel+bugzilla.kernel.org@molgen.mpg.de) ---
Everyone seeing this, it’d be great if you tested

    [PATCH] drm/amd/display: Clear dm_state for fast updates

and reported any noticeable performance regressions.


^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (106 preceding siblings ...)
  2020-07-29  6:41 ` bugzilla-daemon
@ 2020-07-29 16:02 ` bugzilla-daemon
  2020-07-29 16:37 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-29 16:02 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #108 from Duncan (1i5t5.duncan@cox.net) ---
(In reply to Paul Menzel from comment #107)
> Everyone seeing this, it’d be great, if you tested
> 
>     [PATCH] drm/amd/display: Clear dm_state for fast updates

I've been testing it for... ~12 hours now and so far... nothing unusual to
report. =:^)

Everything seems to be working normally including 4k video and update builds. 
The only two caveats are that there wasn't anything /too/ heavy in the update
pipeline to build, and it has only been 12 hours, while sometimes this bug took
two days to bite on my setup.  But so far, so good, and now that I'm posting
this, if the bug's going to bite it's likely to be right after I hit submit, so
let's see! =:^)


^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (107 preceding siblings ...)
  2020-07-29 16:02 ` bugzilla-daemon
@ 2020-07-29 16:37 ` bugzilla-daemon
  2020-07-29 16:45 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-29 16:37 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #109 from zzyxpaw@gmail.com ---
I've been testing mnrzk's patch for about 12 hours as well, so far so good. No
obvious performance degradation has appeared, at least that I can discern just
by "feel". My testing has been interrupted a couple times by the new
power-off-on-overtemperature feature while attempting to test heavier loads.


^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (108 preceding siblings ...)
  2020-07-29 16:37 ` bugzilla-daemon
@ 2020-07-29 16:45 ` bugzilla-daemon
  2020-07-29 20:32 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-29 16:45 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #110 from Nicholas Kazlauskas (nicholas.kazlauskas@amd.com) ---
That's in line with expectations, I think.

That patch shouldn't cause any performance or stuttering impacts and it should
resolve the protection fault.

If there were issues with the patch I would expect to see them within the first
few pageflips from booting into desktop.

For now I'll give my Reviewed-by on the patch on the mailing list and get it
merged in.

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (109 preceding siblings ...)
  2020-07-29 16:45 ` bugzilla-daemon
@ 2020-07-29 20:32 ` bugzilla-daemon
  2020-07-31 16:38 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-29 20:32 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #111 from mnrzk@protonmail.com ---
Yeah, no noticeable performance impact on my end either, and I don't really
see why the patch would cause one. I could run a benchmark to compare, but I
don't know what specifically to benchmark.

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (110 preceding siblings ...)
  2020-07-29 20:32 ` bugzilla-daemon
@ 2020-07-31 16:38 ` bugzilla-daemon
  2020-08-02  1:40 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-07-31 16:38 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #112 from Duncan (1i5t5.duncan@cox.net) ---
(In reply to Paul Menzel from comment #107)
> Everyone seeing this, it’d be great, if you tested
> 
>     [PATCH] drm/amd/display: Clear dm_state for fast updates

For the record, with no reported problems that's in 5.8-post-rc7 now as
fde9f39ac, merged into the drm tree with merge-commit 887c909dd, which in turn
was merged into mainline on Thursday July 30 with merge-commit d8b9faec5.

Thanks, everyone. =:^)

Close the bug on 5.8.0 release?

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (111 preceding siblings ...)
  2020-07-31 16:38 ` bugzilla-daemon
@ 2020-08-02  1:40 ` bugzilla-daemon
  2020-08-02 13:06 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-08-02  1:40 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #113 from laser.eyess.trackers@gmail.com ---
I have been using this patch for about 24 hours now, and there has not been any
noticeable performance degradation. I have not experienced any crashes, but it
was much harder for me to get this crash (1-3 days, if I was lucky), so I'm not
sure what that means. At the very least nothing is worse.

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (112 preceding siblings ...)
  2020-08-02  1:40 ` bugzilla-daemon
@ 2020-08-02 13:06 ` bugzilla-daemon
  2020-08-03 13:51 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-08-02 13:06 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #114 from Jeremy Kescher (jeremy@kescher.at) ---
(In reply to Duncan from comment #108)
> (In reply to Paul Menzel from comment #107)
> > Everyone seeing this, it’d be great, if you tested
> > 
> >     [PATCH] drm/amd/display: Clear dm_state for fast updates
> 


It fixes the issue for me. My system would, without any patches, crash in a
matter of minutes (perhaps a mix of 144 Hz and 60 Hz monitors causes this crash
to happen faster?), but it has been running for multiple hours on intense
workloads now, without any hiccups or anything.

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (113 preceding siblings ...)
  2020-08-02 13:06 ` bugzilla-daemon
@ 2020-08-03 13:51 ` bugzilla-daemon
  2020-08-05 16:10 ` bugzilla-daemon
  2020-08-17  5:45 ` bugzilla-daemon
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-08-03 13:51 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #115 from Duncan (1i5t5.duncan@cox.net) ---
So 5.8.0 has been out for a few hours with the patch-fix, and I see Greg K-H
has it applied to the 5.7 stable tree as well as 5.4 LTS (the bug was in 5.4
but latent, not exposed until developments in 5.7), so they should be covered
in their next releases.

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (114 preceding siblings ...)
  2020-08-03 13:51 ` bugzilla-daemon
@ 2020-08-05 16:10 ` bugzilla-daemon
  2020-08-17  5:45 ` bugzilla-daemon
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-08-05 16:10 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

Duncan (1i5t5.duncan@cox.net) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
     Kernel Version|5.7-rc1 - 5.7 - 5.8-rc5+    |5.7-rc1 - 5.7 - 5.8-rc5+,
                   |                            |fixed in 5.8.0, 5.7.13,
                   |                            |5.4.56
         Resolution|---                         |CODE_FIX

--- Comment #116 from Duncan (1i5t5.duncan@cox.net) ---
For those not on 5.8 yet, Mazin's patch is in the 5.7.13 stable and 5.4.56 LTS
releases.
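
As a rough illustration of the release numbers above (5.8.0, 5.7.13 and
5.4.56), here is a minimal Python sketch that checks whether a kernel release
string carries the fix. The names parse_release and carries_fix are
hypothetical, the handling of 5.8 release candidates is a conservative
assumption (the fix landed after rc7), and the result says nothing about
whether an older kernel is affected by the bug in the first place.

```python
import re

# First patch level in each stable series that carries the fix,
# as listed in this comment. Other pre-5.8 series are not mentioned.
FIXED_IN = {(5, 4): 56, (5, 7): 13}
MAINLINE_FIX = (5, 8)  # 5.8.0 and every later series include it


def parse_release(release):
    """Turn '5.7.13-gentoo' or '5.8' into a (major, minor, patch) tuple."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


def carries_fix(release):
    """Return True if this release is listed as fixed in the comment above."""
    major, minor, patch = parse_release(release)
    # Assumption: treat every 5.8 release candidate as unfixed,
    # since the patch was merged only after 5.8-rc7.
    if (major, minor) == MAINLINE_FIX and "-rc" in release:
        return False
    if (major, minor) >= MAINLINE_FIX:
        return True
    first_fixed = FIXED_IN.get((major, minor))
    return first_fixed is not None and patch >= first_fixed
```

On a running system the string could come from platform.release() in the
Python standard library, e.g. carries_fix(platform.release()).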

As far as I'm concerned (and lacking any NAKs to my previous question about
closing), there's no further reason to leave the bug open, so I'm closing it.
The bugzilla.kernel.org installation has some confusing custom resolution
choices that aren't documented in the status help link (bug #13851, filed
years ago, as its number implies). I don't know whether CODE_FIX or
PATCH_ALREADY_AVAILABLE is more appropriate, as both seem to apply equally, so
I'll leave it at the default CODE_FIX.

Thanks again to everyone who confirmed the bug and/or worked on fixes and
testing.

^ permalink raw reply	[flat|nested] 118+ messages in thread

* [Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
  2020-04-21  9:51 [Bug 207383] New: [Regression] 5.7-rc: amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail bugzilla-daemon
                   ` (115 preceding siblings ...)
  2020-08-05 16:10 ` bugzilla-daemon
@ 2020-08-17  5:45 ` bugzilla-daemon
  116 siblings, 0 replies; 118+ messages in thread
From: bugzilla-daemon @ 2020-08-17  5:45 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #117 from Duncan (1i5t5.duncan@cox.net) ---
For those on stable-series 5.4 and/or interested in related bugs...

FWIW, there's an (apparently different) atomic_commit_tail bug reported against
5.4.58 now, bug #208913, with the patch for this bug (which went into 5.4.56
after hitting a late 5.8-rc) originally listed as a potential trigger.

But the filer closed the bug and moved it to the gitlab instance on
freedesktop.org https://gitlab.freedesktop.org/drm/amd/-/issues/1263 , where he
said reverting the patch didn't cure his issue, so there's something else going
on there.

Just posting this here as related, in case anyone here wants to follow it,
since I came across it while checking on a different (not graphics-related) bug
in 5.9-rc1.  With any luck, however, the similar bug will help get a better
longer term fix for both bugs, since the patch in 5.8 (backported to 5.7 and
5.4) was seen as a temporary bandaid, not a permanent fix.

^ permalink raw reply	[flat|nested] 118+ messages in thread
