dri-devel.lists.freedesktop.org archive mirror
* [Bug 110142] "Oops: Kernel access of bad area sig 7" on Kernel 5.0.0 PPC64LE when loading amdgpu, xorg hangs after being unable to load after OS boots.
@ 2019-03-17  2:52 bugzilla-daemon
  2019-03-18  2:33 ` bugzilla-daemon
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: bugzilla-daemon @ 2019-03-17  2:52 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=110142

            Bug ID: 110142
           Summary: "Oops: Kernel access of bad area sig 7" on Kernel
                    5.0.0 PPC64LE when loading amdgpu, xorg hangs after
                    being unable to load after OS boots.
           Product: DRI
           Version: unspecified
          Hardware: PowerPC
                OS: Linux (All)
            Status: NEW
          Severity: critical
          Priority: medium
         Component: DRM/AMDgpu
          Assignee: dri-devel@lists.freedesktop.org
          Reporter: JollyRoger@Mailfence.com

Created attachment 143700
  --> https://bugs.freedesktop.org/attachment.cgi?id=143700&action=edit
dmesg output, kernel configuration file, lspci, and Xorg.0.log respectively.

Ahoy!

amdgpu appears to trigger a kernel "Oops" during initialization on kernels 5.0.0
and later on Linux PPC64 (little-endian) platforms, right as the module is
loaded.

In the attached dmesg it looks like it starts around here: 

[   34.247578] Oops: Kernel access of bad area, sig: 7 [#1]

I noticed this bug after upgrading the kernel from 4.20.11 on Gentoo and from
4.20.1 on Debian: after rebooting, trying to bring up xfce4 hangs. On Gentoo
the hang even carries over to shutdown, where / cannot be unmounted and a hard
poweroff is required, even if I try to kill the process that started xfce4.

I tried it with both 5.0.0 and 5.0.2 on Gentoo, after upgrading from 4.20.11,
and got similar results: when I enter "startxfce4" at the prompt, xorg hangs. 

I'm attaching my dmesg, kernel configuration file, the output of lspci, and the
Xorg.0.log, in that order. The Xorg.0.log file is only 48 lines long and hasn't
been truncated; that's all that's in it. Currently the only workaround is for
me to use an older kernel. I can post the dmesg from 4.20.11 (the last kernel
I had that worked) if it's required.
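
For anyone triaging a similar report, the Oops line quoted above can be pulled
out of a saved dmesg dump mechanically. The following is a minimal Python
sketch, not part of the original report; the function name `find_oops` and the
regular expression are assumptions fitted to the single line shown here, and
Oops lines in other formats may not match.

```python
import re

# Assumed pattern, fitted only to the Oops line quoted in this report:
#   [   34.247578] Oops: Kernel access of bad area, sig: 7 [#1]
OOPS_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+Oops: (?P<reason>.+?), "
    r"sig: (?P<sig>\d+) \[#(?P<count>\d+)\]"
)


def find_oops(dmesg_text):
    """Return one dict per Oops line found in the given dmesg text."""
    hits = []
    for line in dmesg_text.splitlines():
        match = OOPS_RE.search(line)
        if match:
            hits.append({
                "timestamp": float(match.group("ts")),  # seconds since boot
                "reason": match.group("reason"),
                "signal": int(match.group("sig")),
                "count": int(match.group("count")),     # the [#N] counter
            })
    return hits


sample = "[   34.247578] Oops: Kernel access of bad area, sig: 7 [#1]"
for hit in find_oops(sample):
    print(hit)
```

Running it over the full attached dmesg would also show whether later Oops
lines (a counter higher than 1) follow the first one.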

-- 
You are receiving this mail because:
You are the assignee for the bug.

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

* [Bug 110142] "Oops: Kernel access of bad area sig 7" on Kernel 5.0.0 PPC64LE when loading amdgpu, xorg hangs after being unable to load after OS boots.
  2019-03-17  2:52 [Bug 110142] "Oops: Kernel access of bad area sig 7" on Kernel 5.0.0 PPC64LE when loading amdgpu, xorg hangs after being unable to load after OS boots bugzilla-daemon
@ 2019-03-18  2:33 ` bugzilla-daemon
  2019-03-18  2:59 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: bugzilla-daemon @ 2019-03-18  2:33 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=110142

--- Comment #1 from Alex Deucher <alexdeucher@gmail.com> ---
Can you bisect?

* [Bug 110142] "Oops: Kernel access of bad area sig 7" on Kernel 5.0.0 PPC64LE when loading amdgpu, xorg hangs after being unable to load after OS boots.
  2019-03-17  2:52 [Bug 110142] "Oops: Kernel access of bad area sig 7" on Kernel 5.0.0 PPC64LE when loading amdgpu, xorg hangs after being unable to load after OS boots bugzilla-daemon
  2019-03-18  2:33 ` bugzilla-daemon
@ 2019-03-18  2:59 ` bugzilla-daemon
  2019-03-18 10:03 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: bugzilla-daemon @ 2019-03-18  2:59 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=110142

--- Comment #2 from Peter Easton <JollyRoger@Mailfence.com> ---
Sure! 

I haven't bisected a kernel with git before, so I'll teach myself how and
report back as soon as I figure it out (I'm currently testing 4.20.16, which
also seems to work). I apologize for the delay; I will keep you updated!

* [Bug 110142] "Oops: Kernel access of bad area sig 7" on Kernel 5.0.0 PPC64LE when loading amdgpu, xorg hangs after being unable to load after OS boots.
  2019-03-17  2:52 [Bug 110142] "Oops: Kernel access of bad area sig 7" on Kernel 5.0.0 PPC64LE when loading amdgpu, xorg hangs after being unable to load after OS boots bugzilla-daemon
  2019-03-18  2:33 ` bugzilla-daemon
  2019-03-18  2:59 ` bugzilla-daemon
@ 2019-03-18 10:03 ` bugzilla-daemon
  2019-03-18 10:05 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: bugzilla-daemon @ 2019-03-18 10:03 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=110142

--- Comment #3 from Michel Dänzer <michel@daenzer.net> ---
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c713a461459202504050305242cd854bad57837c
seems the most likely candidate; it's the only significant change to
gmc_v9_0_late_init between 4.20 and 5.0.

I guess the problem is actually in gmc_v9_0_allocate_vm_inv_eng though. Peter,
what does

 scripts/faddr2line drivers/gpu/drm/amd/amdgpu/amdgpu.ko gmc_v9_0_late_init+0x114/0x500

say in the kernel build tree (in the state after building the binaries which
generated the attached dmesg output)?
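
The second argument to scripts/faddr2line above uses the function+offset/size
form that kernel backtraces print: a hexadecimal offset into the function,
followed by the function's total size in hexadecimal. As a small illustration
of how to read it, here is a hedged Python sketch; `parse_sym_offset` is a
hypothetical helper written for this example and is not part of faddr2line.

```python
import re

# Matches the function+offset/size form, e.g. gmc_v9_0_late_init+0x114/0x500
SYM_RE = re.compile(
    r"^(?P<func>[A-Za-z_][\w.]*)"
    r"\+(?P<off>0x[0-9a-fA-F]+)"
    r"/(?P<size>0x[0-9a-fA-F]+)$"
)


def parse_sym_offset(text):
    """Split 'func+0xOFF/0xSIZE' into (func, offset, size) with integer values."""
    match = SYM_RE.match(text.strip())
    if match is None:
        raise ValueError("not in func+offset/size form: %r" % text)
    return (
        match.group("func"),
        int(match.group("off"), 16),
        int(match.group("size"), 16),
    )


func, offset, size = parse_sym_offset("gmc_v9_0_late_init+0x114/0x500")
print("%s: byte %d of %d" % (func, offset, size))
```

For the address in this report, that is byte 276 of a 1280-byte function, so
the fault is in the first quarter of gmc_v9_0_late_init.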

* [Bug 110142] "Oops: Kernel access of bad area sig 7" on Kernel 5.0.0 PPC64LE when loading amdgpu, xorg hangs after being unable to load after OS boots.
  2019-03-17  2:52 [Bug 110142] "Oops: Kernel access of bad area sig 7" on Kernel 5.0.0 PPC64LE when loading amdgpu, xorg hangs after being unable to load after OS boots bugzilla-daemon
                   ` (2 preceding siblings ...)
  2019-03-18 10:03 ` bugzilla-daemon
@ 2019-03-18 10:05 ` bugzilla-daemon
  2019-03-19  3:21 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: bugzilla-daemon @ 2019-03-18 10:05 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=110142

Christian König <christian.koenig@amd.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

--- Comment #4 from Christian König <christian.koenig@amd.com> ---
Yeah, that is a known problem.

Give me a moment to submit a fix to the mailing list.

* [Bug 110142] "Oops: Kernel access of bad area sig 7" on Kernel 5.0.0 PPC64LE when loading amdgpu, xorg hangs after being unable to load after OS boots.
  2019-03-17  2:52 [Bug 110142] "Oops: Kernel access of bad area sig 7" on Kernel 5.0.0 PPC64LE when loading amdgpu, xorg hangs after being unable to load after OS boots bugzilla-daemon
                   ` (3 preceding siblings ...)
  2019-03-18 10:05 ` bugzilla-daemon
@ 2019-03-19  3:21 ` bugzilla-daemon
  2019-03-19  3:24 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: bugzilla-daemon @ 2019-03-19  3:21 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=110142

--- Comment #5 from Peter Easton <JollyRoger@Mailfence.com> ---

I'm going to try to narrow things down by finding the last commit that worked
before the bug appeared and then bisecting the kernel from there. I'll compile
each kernel, install it, reboot, and try starting xfce4, one by one, until I
find the one that won't start.

I apologize in advance for the sluggishness: there are a lot of commits here
and the computer boots very slowly, which limits how many kernels I can test
in a given timeframe. I'll hurry as best I can and keep this thread updated as
I make progress. Right now I think I'll start at commit
af0df68432f65915b2a316aa99eeeb588d4c65a2, since that one works, and work my way
towards 5.0.0 from there to see if I can narrow down the commit that borked
the driver.
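
Because each test costs a kernel build and a slow reboot, the number of
kernels tested matters. Bisection (what `git bisect start`, `git bisect good`
and `git bisect bad` automate) halves the candidate range on every test
instead of walking forward one commit at a time. The Python sketch below
simulates this on a made-up linear history; the commit numbering, the position
of the first bad commit, and the `bisect_first_bad` helper are all hypothetical,
and real kernel history is a graph of merges rather than a straight line.

```python
def bisect_first_bad(commits, is_bad):
    """Find the first bad commit in a list ordered oldest to newest.

    Assumes commits[0] is known good and commits[-1] is known bad.
    Returns (first_bad_commit, number_of_tests_performed).
    """
    good, bad = 0, len(commits) - 1   # indices of the known good / known bad ends
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1                    # one build + reboot + startxfce4 attempt
        if is_bad(commits[mid]):
            bad = mid                 # regression is at mid or earlier
        else:
            good = mid                # regression is after mid
    return commits[bad], tests


# Hypothetical history: 1001 commits, the regression lands at commit 700.
commits = list(range(1001))
culprit, tests = bisect_first_bad(commits, lambda commit: commit >= 700)
print("first bad commit:", culprit, "found after", tests, "tests")
```

With about a thousand candidate commits, bisection needs on the order of ten
test boots, where testing one by one could need hundreds.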

> I guess the problem is actually in gmc_v9_0_allocate_vm_inv_eng though. Peter, what does...[truncated]

Sure. On 5.0.2, the kernel that gave us that dmesg output, I got the message
below when I entered the command as given. This is what the screen looks like:

> captain@morgans-revenge /usr/src/linux-5.0.2-gentoo/scripts $ ./faddr2line ../drivers/gpu/drm/amd/amdgpu/amdgpu.ko gmc_v9_0_late_init+0x114/0x500
> gmc_v9_0_late_init+0x114/0x500:
> gmc_v9_0_late_init at gmc_v9_0.c:?
> captain@morgans-revenge /usr/src/linux-5.0.2-gentoo/scripts $

I hope this might be what you are looking for? I wasn't sure what to enter so I
looked for the drivers folder and then typed the rest in as it was.

* [Bug 110142] "Oops: Kernel access of bad area sig 7" on Kernel 5.0.0 PPC64LE when loading amdgpu, xorg hangs after being unable to load after OS boots.
  2019-03-17  2:52 [Bug 110142] "Oops: Kernel access of bad area sig 7" on Kernel 5.0.0 PPC64LE when loading amdgpu, xorg hangs after being unable to load after OS boots bugzilla-daemon
                   ` (4 preceding siblings ...)
  2019-03-19  3:21 ` bugzilla-daemon
@ 2019-03-19  3:24 ` bugzilla-daemon
  2019-03-19  8:50 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: bugzilla-daemon @ 2019-03-19  3:24 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=110142

--- Comment #6 from Peter Easton <JollyRoger@Mailfence.com> ---
Whoops, hit return too quickly. 

(In reply to Michel Dänzer from comment #3)
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c713a461459202504050305242cd854bad57837c
> seems the most likely candidate, it's the only significant change to
> gmc_v9_0_late_init between 4.20 and 5.0.

Sure. I'll go and give this one a try first, actually. I'm out of time for the
night but I can hop back on it first thing tomorrow.

* [Bug 110142] "Oops: Kernel access of bad area sig 7" on Kernel 5.0.0 PPC64LE when loading amdgpu, xorg hangs after being unable to load after OS boots.
  2019-03-17  2:52 [Bug 110142] "Oops: Kernel access of bad area sig 7" on Kernel 5.0.0 PPC64LE when loading amdgpu, xorg hangs after being unable to load after OS boots bugzilla-daemon
                   ` (5 preceding siblings ...)
  2019-03-19  3:24 ` bugzilla-daemon
@ 2019-03-19  8:50 ` bugzilla-daemon
  2019-03-20  0:42 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: bugzilla-daemon @ 2019-03-19  8:50 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=110142

--- Comment #7 from Michel Dänzer <michel@daenzer.net> ---
Please try https://patchwork.freedesktop.org/patch/292720/ , it should fix the
problem.

* [Bug 110142] "Oops: Kernel access of bad area sig 7" on Kernel 5.0.0 PPC64LE when loading amdgpu, xorg hangs after being unable to load after OS boots.
  2019-03-17  2:52 [Bug 110142] "Oops: Kernel access of bad area sig 7" on Kernel 5.0.0 PPC64LE when loading amdgpu, xorg hangs after being unable to load after OS boots bugzilla-daemon
                   ` (6 preceding siblings ...)
  2019-03-19  8:50 ` bugzilla-daemon
@ 2019-03-20  0:42 ` bugzilla-daemon
  2019-03-20  2:01 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: bugzilla-daemon @ 2019-03-20  0:42 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=110142

--- Comment #8 from Peter Easton <JollyRoger@Mailfence.com> ---
Great, I'll go try it on 5.0.2 and report back, thanks!

* [Bug 110142] "Oops: Kernel access of bad area sig 7" on Kernel 5.0.0 PPC64LE when loading amdgpu, xorg hangs after being unable to load after OS boots.
  2019-03-17  2:52 [Bug 110142] "Oops: Kernel access of bad area sig 7" on Kernel 5.0.0 PPC64LE when loading amdgpu, xorg hangs after being unable to load after OS boots bugzilla-daemon
                   ` (7 preceding siblings ...)
  2019-03-20  0:42 ` bugzilla-daemon
@ 2019-03-20  2:01 ` bugzilla-daemon
  2019-03-20  9:47 ` bugzilla-daemon
  2019-03-22  1:40 ` bugzilla-daemon
  10 siblings, 0 replies; 12+ messages in thread
From: bugzilla-daemon @ 2019-03-20  2:01 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=110142

--- Comment #9 from Peter Easton <JollyRoger@Mailfence.com> ---
(In reply to Michel Dänzer from comment #7)
> Please try https://patchwork.freedesktop.org/patch/292720/ , it should fix
> the problem.

Splice the mainbrace! It worked like a charm! 

xfce4 started without a hitch this time with the new kernel. Thank you so
much!

What shall we do now? Is there a way to get that patch merged upstream?

* [Bug 110142] "Oops: Kernel access of bad area sig 7" on Kernel 5.0.0 PPC64LE when loading amdgpu, xorg hangs after being unable to load after OS boots.
  2019-03-17  2:52 [Bug 110142] "Oops: Kernel access of bad area sig 7" on Kernel 5.0.0 PPC64LE when loading amdgpu, xorg hangs after being unable to load after OS boots bugzilla-daemon
                   ` (8 preceding siblings ...)
  2019-03-20  2:01 ` bugzilla-daemon
@ 2019-03-20  9:47 ` bugzilla-daemon
  2019-03-22  1:40 ` bugzilla-daemon
  10 siblings, 0 replies; 12+ messages in thread
From: bugzilla-daemon @ 2019-03-20  9:47 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=110142

--- Comment #10 from Michel Dänzer <michel@daenzer.net> ---
It's on its way already, but it might take a while for it to land in a 5.0.y
release.

* [Bug 110142] "Oops: Kernel access of bad area sig 7" on Kernel 5.0.0 PPC64LE when loading amdgpu, xorg hangs after being unable to load after OS boots.
  2019-03-17  2:52 [Bug 110142] "Oops: Kernel access of bad area sig 7" on Kernel 5.0.0 PPC64LE when loading amdgpu, xorg hangs after being unable to load after OS boots bugzilla-daemon
                   ` (9 preceding siblings ...)
  2019-03-20  9:47 ` bugzilla-daemon
@ 2019-03-22  1:40 ` bugzilla-daemon
  10 siblings, 0 replies; 12+ messages in thread
From: bugzilla-daemon @ 2019-03-22  1:40 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=110142

Peter Easton <JollyRoger@Mailfence.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #11 from Peter Easton <JollyRoger@Mailfence.com> ---
Yay, glad to hear! 

I'm going to change it to fixed then, if that's okay with you guys? 

Thank you so much for the help.
