* [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume
@ 2018-06-28 19:33 bugzilla-daemon
  2018-06-28 19:42 ` bugzilla-daemon
                   ` (27 more replies)
  0 siblings, 28 replies; 29+ messages in thread
From: bugzilla-daemon @ 2018-06-28 19:33 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 5258 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=107065

            Bug ID: 107065
           Summary: "BUG: unable to handle kernel paging request at
                    0000000000002000" at amdgpu_vm_cpu_set_ptes at S3
                    resume
           Product: DRI
           Version: DRI git
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: major
          Priority: medium
         Component: DRM/AMDgpu
          Assignee: dri-devel@lists.freedesktop.org
          Reporter: jb5sgc1n.nya@20mm.eu

When I resume from S3 using the 4.17.2-1-ARCH kernel, with
amdgpu.vm_update_mode=3 (for reasons explained in
https://bugs.freedesktop.org/show_bug.cgi?id=102322 ) first the amdgpu driver
and shortly thereafter the system crashes with the following kernel messages:

Jun 28 21:14:25 ryzen kernel: ACPI: Low-level resume complete
Jun 28 21:14:25 ryzen kernel: PM: Restoring platform NVS memory
Jun 28 21:14:25 ryzen kernel: Enabling non-boot CPUs ...
...
Jun 28 21:14:25 ryzen kernel: [drm] PCIE GART of 256M enabled (table at
0x000000F400040000).
Jun 28 21:14:25 ryzen kernel: [drm] UVD and UVD ENC initialized successfully.
Jun 28 21:14:25 ryzen kernel: [drm] VCE initialized successfully.
Jun 28 21:14:25 ryzen kernel: OOM killer enabled.
Jun 28 21:14:25 ryzen kernel: Restarting tasks ... done.
Jun 28 21:14:25 ryzen kernel: PM: suspend exit
Jun 28 21:14:25 ryzen kernel: BUG: unable to handle kernel paging request at
0000000000002000
Jun 28 21:14:25 ryzen kernel: PGD 0 P4D 0 
Jun 28 21:14:25 ryzen kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI
Jun 28 21:14:25 ryzen kernel: Modules linked in: arc4 md4 sha512_ssse3
sha512_generic nls_utf8 cifs ccm dns_resolver fscache>
Jun 28 21:14:25 ryzen kernel:  bluetooth snd_hwdep snd_pcm eeepc_wmi snd_timer
asus_wmi snd sparse_keymap mxm_wmi wmi_bmof i>
Jun 28 21:14:25 ryzen kernel:  dm_crypt dm_mod i2c_dev
Jun 28 21:14:25 ryzen kernel: CPU: 3 PID: 882 Comm: amdgpu_cs:0 Tainted: G     
  W  O      4.17.2-1-ARCH #1
Jun 28 21:14:25 ryzen kernel: Hardware name: System manufacturer System Product
Name/PRIME X370-PRO, BIOS 4011 04/19/2018
Jun 28 21:14:25 ryzen kernel: RIP: 0010:gmc_v8_0_set_pte_pde+0x1b/0x30 [amdgpu]
Jun 28 21:14:25 ryzen kernel: RSP: 0018:ffffb8b8c3fa7a70 EFLAGS: 00010202
Jun 28 21:14:25 ryzen kernel: RAX: 000000fffffff000 RBX: 0000000000000001 RCX:
000000f400956001
Jun 28 21:14:25 ryzen kernel: RDX: 0000000000002000 RSI: 0000000000002000 RDI:
ffff9edab48a0000
Jun 28 21:14:25 ryzen kernel: RBP: 0000000000000000 R08: 0000000000000001 R09:
0000000000000000
Jun 28 21:14:25 ryzen kernel: R10: ffffffffc03e4c50 R11: ffff9edab30d0800 R12:
0000000000002000
Jun 28 21:14:25 ryzen kernel: R13: 0000000000000001 R14: ffffb8b8c3fa7ae8 R15:
000000f400956000
Jun 28 21:14:25 ryzen kernel: FS:  00007f622bb59700(0000)
GS:ffff9edadecc0000(0000) knlGS:0000000000000000
Jun 28 21:14:25 ryzen kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 28 21:14:25 ryzen kernel: CR2: 0000000000002000 CR3: 00000007e03f8000 CR4:
00000000003406e0
Jun 28 21:14:25 ryzen kernel: Call Trace:
Jun 28 21:14:25 ryzen kernel:  amdgpu_vm_cpu_set_ptes+0x76/0xf0 [amdgpu]
Jun 28 21:14:25 ryzen kernel:  amdgpu_vm_update_directories+0x1ca/0x3c0
[amdgpu]
Jun 28 21:14:25 ryzen kernel:  ? amdgpu_vm_do_copy_ptes+0xc0/0xc0 [amdgpu]
Jun 28 21:14:25 ryzen kernel:  amdgpu_cs_ioctl+0x1169/0x1a70 [amdgpu]
Jun 28 21:14:25 ryzen kernel:  ? dequeue_entity+0x156/0x950
Jun 28 21:14:25 ryzen kernel:  ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu]
Jun 28 21:14:25 ryzen kernel:  drm_ioctl_kernel+0x5b/0xb0 [drm]
Jun 28 21:14:25 ryzen kernel:  drm_ioctl+0x1b7/0x370 [drm]
Jun 28 21:14:25 ryzen kernel:  ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu]
Jun 28 21:14:25 ryzen kernel:  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
Jun 28 21:14:25 ryzen kernel:  do_vfs_ioctl+0xa4/0x610
Jun 28 21:14:25 ryzen kernel:  ksys_ioctl+0x60/0x90
Jun 28 21:14:25 ryzen kernel:  __x64_sys_ioctl+0x16/0x20
Jun 28 21:14:25 ryzen kernel:  do_syscall_64+0x5b/0x170
Jun 28 21:14:25 ryzen kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jun 28 21:14:25 ryzen kernel: RIP: 0033:0x7f623b586667
Jun 28 21:14:25 ryzen kernel: RSP: 002b:00007f622bb58a98 EFLAGS: 00000246
ORIG_RAX: 0000000000000010
Jun 28 21:14:25 ryzen kernel: RAX: ffffffffffffffda RBX: 00007f622bb58b88 RCX:
00007f623b586667
Jun 28 21:14:25 ryzen kernel: RDX: 00007f622bb58b00 RSI: 00000000c0186444 RDI:
000000000000000b
Jun 28 21:14:25 ryzen kernel: RBP: 00007f622bb58b00 R08: 00007f622bb58bb0 R09:
0000000000000010
Jun 28 21:14:25 ryzen kernel: R10: 00007f622bb58bb0 R11: 0000000000000246 R12:
00000000c0186444
Jun 28 21:14:25 ryzen kernel: R13: 000000000000000b R14: 000000000000000a R15:
0000000000000000
Jun 28 21:14:25 ryzen kernel: Code: 8b 80 d8 00 00 00 e9 85 ed 5c c2 0f 1f 44
00 00 0f 1f 44 00 00 48 b8 00 f0 ff ff ff 00 0>
Jun 28 21:14:25 ryzen kernel: RIP: gmc_v8_0_set_pte_pde+0x1b/0x30 [amdgpu] RSP:
ffffb8b8c3fa7a70
Jun 28 21:14:25 ryzen kernel: CR2: 0000000000002000
Jun 28 21:14:25 ryzen kernel: ---[ end trace 6fce4be2faa5be7e ]---
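
For readers decoding the trace above: the value printed after "Oops:" is the
x86 page-fault error code. The following is a minimal Python sketch of the
standard bit layout; the function name decode_oops_code is made up for
illustration and is not part of any kernel tool.

```python
# Bit meanings of the x86 page-fault error code printed after "Oops:".
PF_PROT = 1 << 0   # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1  # 0: read access,      1: write access
PF_USER = 1 << 2   # 0: kernel mode,      1: user mode
PF_RSVD = 1 << 3   # reserved bit set in a paging-structure entry
PF_INSTR = 1 << 4  # fault was caused by an instruction fetch


def decode_oops_code(code_hex: str) -> dict:
    """Decode the hex error code from an 'Oops: XXXX' line into named flags."""
    code = int(code_hex, 16)
    return {
        "page_present": bool(code & PF_PROT),
        "write": bool(code & PF_WRITE),
        "user_mode": bool(code & PF_USER),
        "reserved_bit": bool(code & PF_RSVD),
        "instruction_fetch": bool(code & PF_INSTR),
    }


if __name__ == "__main__":
    # "0002" is the code from the "Oops: 0002 [#1]" line in the log above.
    print(decode_oops_code("0002"))
```

Read this way, "Oops: 0002" is a kernel-mode write to a page that is not
present, which fits the faulting address 0000000000002000 reported in CR2.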

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 6805 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume
  2018-06-28 19:33 [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume bugzilla-daemon
@ 2018-06-28 19:42 ` bugzilla-daemon
  2018-06-28 19:52 ` bugzilla-daemon
                   ` (26 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2018-06-28 19:42 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 356 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=107065

--- Comment #1 from dwagner <jb5sgc1n.nya@20mm.eu> ---
Created attachment 140383
  --> https://bugs.freedesktop.org/attachment.cgi?id=140383&action=edit
dmesg of the system boot and before and at the crash at S3 resume


* [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume
  2018-06-28 19:33 [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume bugzilla-daemon
  2018-06-28 19:42 ` bugzilla-daemon
@ 2018-06-28 19:52 ` bugzilla-daemon
  2018-06-28 20:49 ` bugzilla-daemon
                   ` (25 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2018-06-28 19:52 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 342 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=107065

--- Comment #2 from dwagner <jb5sgc1n.nya@20mm.eu> ---
(Just for reference: This bug report is for a different kind of S3-resume-crash
than reported in https://bugs.freedesktop.org/show_bug.cgi?id=103277 )


* [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume
  2018-06-28 19:33 [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume bugzilla-daemon
  2018-06-28 19:42 ` bugzilla-daemon
  2018-06-28 19:52 ` bugzilla-daemon
@ 2018-06-28 20:49 ` bugzilla-daemon
  2018-06-28 22:50 ` bugzilla-daemon
                   ` (24 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2018-06-28 20:49 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 324 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=107065

--- Comment #3 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
Can you use addr2line, or gdb with the 'list' command, to give the line number
matching amdgpu_vm_cpu_set_ptes+0x76/0xf0?
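
As a rough illustration of the request above, a backtrace frame can be split
into symbol, offset and function size, and turned into the expression usually
passed to gdb's 'list' command. This is only a sketch: parse_frame and
gdb_list_expr are hypothetical helper names, and the resulting command is
only useful with a module that was built with debug information.

```python
import re

# Matches frames such as "amdgpu_vm_cpu_set_ptes+0x76/0xf0 [amdgpu]".
FRAME_RE = re.compile(
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frame(frame: str):
    """Return (symbol, offset, function_size) from a kernel backtrace frame."""
    m = FRAME_RE.search(frame)
    if m is None:
        raise ValueError(f"not a symbol+offset frame: {frame!r}")
    return m.group("sym"), int(m.group("off"), 16), int(m.group("size"), 16)


def gdb_list_expr(frame: str) -> str:
    """Build a 'list *(symbol+offset)' command string for gdb."""
    sym, off, _size = parse_frame(frame)
    return f"list *({sym}+{off:#x})"


if __name__ == "__main__":
    print(gdb_list_expr("amdgpu_vm_cpu_set_ptes+0x76/0xf0 [amdgpu]"))
```

The printed string would be typed at the gdb prompt after loading the amdgpu
module object that carries the debug symbols.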


* [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume
  2018-06-28 19:33 [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume bugzilla-daemon
                   ` (2 preceding siblings ...)
  2018-06-28 20:49 ` bugzilla-daemon
@ 2018-06-28 22:50 ` bugzilla-daemon
  2018-06-29  0:37 ` bugzilla-daemon
                   ` (23 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2018-06-28 22:50 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 4772 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=107065

--- Comment #4 from dwagner <jb5sgc1n.nya@20mm.eu> ---
(In reply to Andrey Grodzovsky from comment #3)
> Can you use addr2line or gdb with 'list' command to give the line number
> matching
> amdgpu_vm_cpu_set_ptes+0x76/0xf0 ?

That would have been easy had I used my self-compiled kernel - but it seems
there is no debuginfo file available for the Arch Linux supplied kernels, which
I ran in this case.

So I can only provide a disassembled listing of that function, with offset 0x76
aka +118 inside:

Dump of assembler code for function amdgpu_vm_cpu_set_ptes:
   0x0000000000027c80 <+0>:     callq  0x27c85 <amdgpu_vm_cpu_set_ptes+5>
   0x0000000000027c85 <+5>:     push   %r15
   0x0000000000027c87 <+7>:     mov    %rcx,%r15
   0x0000000000027c8a <+10>:    push   %r14
   0x0000000000027c8c <+12>:    mov    %rdi,%r14
   0x0000000000027c8f <+15>:    mov    %rsi,%rdi
   0x0000000000027c92 <+18>:    push   %r13
   0x0000000000027c94 <+20>:    mov    %r8d,%r13d
   0x0000000000027c97 <+23>:    push   %r12
   0x0000000000027c99 <+25>:    mov    %rdx,%r12
   0x0000000000027c9c <+28>:    push   %rbp
   0x0000000000027c9d <+29>:    mov    %r9d,%ebp
   0x0000000000027ca0 <+32>:    push   %rbx
   0x0000000000027ca1 <+33>:    callq  0x27ca6 <amdgpu_vm_cpu_set_ptes+38>
   0x0000000000027ca6 <+38>:    add    %rax,%r12
   0x0000000000027ca9 <+41>:    nopl   0x0(%rax,%rax,1)
   0x0000000000027cae <+46>:    xor    %ebx,%ebx
   0x0000000000027cb0 <+48>:    test   %r13d,%r13d
   0x0000000000027cb3 <+51>:    je     0x27cfb <amdgpu_vm_cpu_set_ptes+123>
   0x0000000000027cb5 <+53>:    mov    0x28(%r14),%rax
   0x0000000000027cb9 <+57>:    mov    %r15,%rcx
   0x0000000000027cbc <+60>:    test   %rax,%rax
   0x0000000000027cbf <+63>:    je     0x27cd3 <amdgpu_vm_cpu_set_ptes+83>
   0x0000000000027cc1 <+65>:    mov    %r15,%rdx
   0x0000000000027cc4 <+68>:    mov    $0xfffffffffffff000,%rcx
   0x0000000000027ccb <+75>:    shr    $0xc,%rdx
   0x0000000000027ccf <+79>:    and    (%rax,%rdx,8),%rcx
   0x0000000000027cd3 <+83>:    mov    (%r14),%rdi
   0x0000000000027cd6 <+86>:    mov    %ebx,%edx
   0x0000000000027cd8 <+88>:    add    $0x1,%ebx
   0x0000000000027cdb <+91>:    mov    0x38(%rsp),%r8
   0x0000000000027ce0 <+96>:    mov    %r12,%rsi
   0x0000000000027ce3 <+99>:    add    %rbp,%r15
   0x0000000000027ce6 <+102>:   mov    0x968(%rdi),%rax
   0x0000000000027ced <+109>:   mov    0x18(%rax),%rax
   0x0000000000027cf1 <+113>:   callq  0x27cf6 <amdgpu_vm_cpu_set_ptes+118>
   0x0000000000027cf6 <+118>:   cmp    %ebx,%r13d
   0x0000000000027cf9 <+121>:   jne    0x27cb5 <amdgpu_vm_cpu_set_ptes+53>
   0x0000000000027cfb <+123>:   pop    %rbx
   0x0000000000027cfc <+124>:   pop    %rbp
   0x0000000000027cfd <+125>:   pop    %r12
   0x0000000000027cff <+127>:   pop    %r13
   0x0000000000027d01 <+129>:   pop    %r14
   0x0000000000027d03 <+131>:   pop    %r15
   0x0000000000027d05 <+133>:   retq   
   0x0000000000027d06 <+134>:   mov    %gs:0x0(%rip),%eax        # 0x27d0d <amdgpu_vm_cpu_set_ptes+141>
   0x0000000000027d0d <+141>:   mov    %eax,%eax
   0x0000000000027d0f <+143>:   bt     %rax,0x0(%rip)        # 0x27d17 <amdgpu_vm_cpu_set_ptes+151>
   0x0000000000027d17 <+151>:   jae    0x27cae <amdgpu_vm_cpu_set_ptes+46>
   0x0000000000027d19 <+153>:   incl   %gs:0x0(%rip)        # 0x27d20 <amdgpu_vm_cpu_set_ptes+160>
   0x0000000000027d20 <+160>:   mov    0x0(%rip),%rbx        # 0x27d27 <amdgpu_vm_cpu_set_ptes+167>
   0x0000000000027d27 <+167>:   test   %rbx,%rbx
   0x0000000000027d2a <+170>:   je     0x27d55 <amdgpu_vm_cpu_set_ptes+213>
   0x0000000000027d2c <+172>:   mov    (%rbx),%rax
   0x0000000000027d2f <+175>:   mov    0x8(%rbx),%rdi
   0x0000000000027d33 <+179>:   add    $0x18,%rbx
   0x0000000000027d37 <+183>:   mov    0x38(%rsp),%r9
   0x0000000000027d3c <+188>:   mov    %ebp,%r8d
   0x0000000000027d3f <+191>:   mov    %r13d,%ecx
   0x0000000000027d42 <+194>:   mov    %r15,%rdx
   0x0000000000027d45 <+197>:   mov    %r12,%rsi
   0x0000000000027d48 <+200>:   callq  0x27d4d <amdgpu_vm_cpu_set_ptes+205>
   0x0000000000027d4d <+205>:   mov    (%rbx),%rax
   0x0000000000027d50 <+208>:   test   %rax,%rax
   0x0000000000027d53 <+211>:   jne    0x27d2f <amdgpu_vm_cpu_set_ptes+175>
   0x0000000000027d55 <+213>:   decl   %gs:0x0(%rip)        # 0x27d5c <amdgpu_vm_cpu_set_ptes+220>
   0x0000000000027d5c <+220>:   jne    0x27cae <amdgpu_vm_cpu_set_ptes+46>
   0x0000000000027d62 <+226>:   callq  0x27d67 <amdgpu_vm_cpu_set_ptes+231>
   0x0000000000027d67 <+231>:   jmpq   0x27cae <amdgpu_vm_cpu_set_ptes+46>
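
To relate the backtrace offset to the listing above, here is a small Python
sketch of the arithmetic. The helper names are hypothetical, and the few
instructions in LISTING are copied by hand from the dump. The offset in a
backtrace frame is a return address, so the instruction of interest is the
call just before it.

```python
FUNC_BASE = 0x27C80    # address of <+0> in the dump above
RETURN_OFFSET = 0x76   # from "amdgpu_vm_cpu_set_ptes+0x76/0xf0"

# (address, instruction) pairs copied from the dump around the offset.
LISTING = [
    (0x27CE6, "mov 0x968(%rdi),%rax"),
    (0x27CED, "mov 0x18(%rax),%rax"),
    (0x27CF1, "callq"),
    (0x27CF6, "cmp %ebx,%r13d"),
]


def frame_address(base: int, offset: int) -> int:
    """Address inside the module object that a symbol+offset frame refers to."""
    return base + offset


def instruction_before(listing, addr):
    """Return the listing entry immediately preceding the instruction at addr."""
    addrs = [a for a, _ in listing]
    i = addrs.index(addr)  # raises ValueError if addr is not an instruction start
    if i == 0:
        raise ValueError("no earlier instruction in this listing")
    return listing[i - 1]


if __name__ == "__main__":
    ret = frame_address(FUNC_BASE, RETURN_OFFSET)
    call_addr, insn = instruction_before(LISTING, ret)
    print(f"return address {ret:#x}, preceded by {insn} at +{call_addr - FUNC_BASE}")
```

The call at +113 is preceded by two loads that fetch a pointer through
0x968(%rdi) and 0x18(%rax), so it is presumably an indirect call through a
function pointer. That would be consistent with the faulting RIP in the first
message, gmc_v8_0_set_pte_pde, although the unrelocated dump does not show the
call target.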


* [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume
  2018-06-28 19:33 [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume bugzilla-daemon
                   ` (3 preceding siblings ...)
  2018-06-28 22:50 ` bugzilla-daemon
@ 2018-06-29  0:37 ` bugzilla-daemon
  2018-06-29 16:16 ` bugzilla-daemon
                   ` (22 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2018-06-29  0:37 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 2208 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=107065

--- Comment #5 from dwagner <jb5sgc1n.nya@20mm.eu> ---
Interesting: With amd-staging-drm-next, I see the same crash at
https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c?h=amd-staging-drm-next#n921
with the same backtrace with vm_update_mode=3 immediately upon starting X11 -
not only after S3 resume. Here with symbols translated to source lines:

Jun 29 01:49:05 ryzen kernel: amdgpu_vm_cpu_set_ptes
(/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:921 (discriminator 2)) amdgpu
Jun 29 01:49:05 ryzen kernel: amdgpu_vm_update_directories
(/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:989
/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1096) amdgpu
Jun 29 01:49:05 ryzen kernel: ? amdgpu_vm_do_copy_ptes
(/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:913) amdgpu
Jun 29 01:49:05 ryzen kernel: amdgpu_gem_va_ioctl
(/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c:542
/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c:674) amdgpu
Jun 29 01:49:05 ryzen kernel: ? __alloc_pages_nodemask (/mm/page_alloc.c:4355) 
Jun 29 01:49:05 ryzen kernel: ? amdgpu_gem_metadata_ioctl
(/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c:548) amdgpu
Jun 29 01:49:05 ryzen kernel: drm_ioctl_kernel+0xa7/0xf0 drm
Jun 29 01:49:05 ryzen kernel: drm_ioctl+0x2f1/0x3c0 drm
Jun 29 01:49:05 ryzen kernel: ? amdgpu_gem_metadata_ioctl
(/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c:548) amdgpu
Jun 29 01:49:05 ryzen kernel: amdgpu_drm_ioctl
(/./include/linux/pm_runtime.h:108
/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:842) amdgpu
Jun 29 01:49:05 ryzen kernel: do_vfs_ioctl (/fs/ioctl.c:46 /fs/ioctl.c:500
/fs/ioctl.c:684) 
Jun 29 01:49:05 ryzen kernel: ? handle_mm_fault (/mm/memory.c:4133) 
Jun 29 01:49:05 ryzen kernel: ksys_ioctl (/./include/linux/file.h:39
/fs/ioctl.c:702) 
Jun 29 01:49:05 ryzen kernel: __x64_sys_ioctl (/fs/ioctl.c:708 /fs/ioctl.c:706
/fs/ioctl.c:706) 
Jun 29 01:49:05 ryzen kernel: do_syscall_64 (/arch/x86/entry/common.c:290) 
Jun 29 01:49:05 ryzen kernel: entry_SYSCALL_64_after_hwframe
(/./include/trace/events/initcall.h:10 /./include/trace/events/initcall.h:10)


* [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume
  2018-06-28 19:33 [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume bugzilla-daemon
                   ` (4 preceding siblings ...)
  2018-06-29  0:37 ` bugzilla-daemon
@ 2018-06-29 16:16 ` bugzilla-daemon
  2018-06-29 19:10 ` bugzilla-daemon
                   ` (21 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2018-06-29 16:16 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 2455 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=107065

--- Comment #6 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to dwagner from comment #5)
> Interesting: With amd-staging-drm-next, I see the same crash at
> https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/amdgpu/
> amdgpu_vm.c?h=amd-staging-drm-next#n921 with the same backtrace with
> vm_update_mode=3 immediately upon starting X11 - not only after S3 resume.
> Here with symbols translated to source lines:
> 
> Jun 29 01:49:05 ryzen kernel: amdgpu_vm_cpu_set_ptes
> (/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:921 (discriminator 2)) amdgpu
> Jun 29 01:49:05 ryzen kernel: amdgpu_vm_update_directories
> (/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:989
> /drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1096) amdgpu
> Jun 29 01:49:05 ryzen kernel: ? amdgpu_vm_do_copy_ptes
> (/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:913) amdgpu
> Jun 29 01:49:05 ryzen kernel: amdgpu_gem_va_ioctl
> (/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c:542
> /drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c:674) amdgpu
> Jun 29 01:49:05 ryzen kernel: ? __alloc_pages_nodemask
> (/mm/page_alloc.c:4355) 
> Jun 29 01:49:05 ryzen kernel: ? amdgpu_gem_metadata_ioctl
> (/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c:548) amdgpu
> Jun 29 01:49:05 ryzen kernel: drm_ioctl_kernel+0xa7/0xf0 drm
> Jun 29 01:49:05 ryzen kernel: drm_ioctl+0x2f1/0x3c0 drm
> Jun 29 01:49:05 ryzen kernel: ? amdgpu_gem_metadata_ioctl
> (/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c:548) amdgpu
> Jun 29 01:49:05 ryzen kernel: amdgpu_drm_ioctl
> (/./include/linux/pm_runtime.h:108
> /drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:842) amdgpu
> Jun 29 01:49:05 ryzen kernel: do_vfs_ioctl (/fs/ioctl.c:46 /fs/ioctl.c:500
> /fs/ioctl.c:684) 
> Jun 29 01:49:05 ryzen kernel: ? handle_mm_fault (/mm/memory.c:4133) 
> Jun 29 01:49:05 ryzen kernel: ksys_ioctl (/./include/linux/file.h:39
> /fs/ioctl.c:702) 
> Jun 29 01:49:05 ryzen kernel: __x64_sys_ioctl (/fs/ioctl.c:708
> /fs/ioctl.c:706 /fs/ioctl.c:706) 
> Jun 29 01:49:05 ryzen kernel: do_syscall_64 (/arch/x86/entry/common.c:290) 
> Jun 29 01:49:05 ryzen kernel: entry_SYSCALL_64_after_hwframe
> (/./include/trace/events/initcall.h:10 /./include/trace/events/initcall.h:10)

So with Arch Linux kernel it happens only during S3 but with
amd-staging-drm-next it happens once you start X ?


* [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume
  2018-06-28 19:33 [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume bugzilla-daemon
                   ` (5 preceding siblings ...)
  2018-06-29 16:16 ` bugzilla-daemon
@ 2018-06-29 19:10 ` bugzilla-daemon
  2018-06-29 19:10 ` [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" in amdgpu_vm_cpu_set_ptes at amdgpu_vm.c:921 bugzilla-daemon
                   ` (20 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2018-06-29 19:10 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1447 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=107065

--- Comment #7 from dwagner <jb5sgc1n.nya@20mm.eu> ---
(In reply to Andrey Grodzovsky from comment #6)
> So with Arch Linux kernel it happens only during S3 but with
> amd-staging-drm-next it happens once you start X ?

Yes. I know it sounds strange, but it's currently 100% reproducible to me:

Booting linux-4.17.2-ARCH with amdgpu.vm_update_mode=0:
 X11 starts fine, but system crashes after minutes of firefox browsing

Booting linux-4.17.2-ARCH with amdgpu.vm_update_mode=3:
 X11 starts fine, system does not crash (for at least hours of use)
 but crashes as above if resumed from S3 sleep

Booting linux compiled from amd-staging-drm-next, as of commit
527d6e839a0e52b744fd092453544e4f58977334 from yesterday, with
amdgpu.vm_update_mode=0:
 X11 starts fine, but system crashes after minutes of firefox browsing

Booting linux compiled from amd-staging-drm-next, as of commit
527d6e839a0e52b744fd092453544e4f58977334 from yesterday, with
amdgpu.vm_update_mode=3:
 X11 does not start, crashes immediately with the same above pasted kernel BUG
message and backtrace


So something with CPU-based vm_update_mode is broken, but in a different way
than the SDMA-based method.

I will change the subject of this report to reflect that this crash is not
necessarily S3-resume-related.
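
For reference, the amdgpu.vm_update_mode values used in the four test cases
above can be read as a two-bit mask, following the description of the module
parameter (0 = never, 1 = graphics only, 2 = compute only, 3 = both). The
Python sketch below only illustrates that reading; the constant and function
names are made up and the default value -1 is not handled.

```python
USE_CPU_FOR_GFX = 1 << 0      # bit 0: graphics VM page tables updated by the CPU
USE_CPU_FOR_COMPUTE = 1 << 1  # bit 1: compute VM page tables updated by the CPU


def describe_vm_update_mode(mode: int) -> str:
    """Describe which VM page-table updates are done by the CPU for a given mode."""
    if mode < 0 or mode > 3:
        raise ValueError(f"unsupported vm_update_mode in this sketch: {mode}")
    gfx = bool(mode & USE_CPU_FOR_GFX)
    compute = bool(mode & USE_CPU_FOR_COMPUTE)
    if gfx and compute:
        return "CPU updates for graphics and compute"
    if gfx:
        return "CPU updates for graphics only"
    if compute:
        return "CPU updates for compute only"
    return "no CPU updates (SDMA is used)"


if __name__ == "__main__":
    for mode in (0, 3):
        print(mode, "->", describe_vm_update_mode(mode))
```

With this reading, the two vm_update_mode=3 cases exercise the CPU update path
(amdgpu_vm_cpu_set_ptes), while the two vm_update_mode=0 cases exercise the
SDMA path.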


* [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" in amdgpu_vm_cpu_set_ptes at amdgpu_vm.c:921
  2018-06-28 19:33 [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume bugzilla-daemon
                   ` (6 preceding siblings ...)
  2018-06-29 19:10 ` bugzilla-daemon
@ 2018-06-29 19:10 ` bugzilla-daemon
  2018-06-29 19:17 ` bugzilla-daemon
                   ` (19 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2018-06-29 19:10 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 671 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=107065

dwagner <jb5sgc1n.nya@20mm.eu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|"BUG: unable to handle      |"BUG: unable to handle
                   |kernel paging request at    |kernel paging request at
                   |0000000000002000" at        |0000000000002000" in
                   |amdgpu_vm_cpu_set_ptes at   |amdgpu_vm_cpu_set_ptes at
                   |S3 resume                   |amdgpu_vm.c:921


* [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" in amdgpu_vm_cpu_set_ptes at amdgpu_vm.c:921
  2018-06-28 19:33 [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume bugzilla-daemon
                   ` (7 preceding siblings ...)
  2018-06-29 19:10 ` [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" in amdgpu_vm_cpu_set_ptes at amdgpu_vm.c:921 bugzilla-daemon
@ 2018-06-29 19:17 ` bugzilla-daemon
  2018-06-29 19:21 ` bugzilla-daemon
                   ` (18 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2018-06-29 19:17 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1684 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=107065

--- Comment #8 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to dwagner from comment #7)
> (In reply to Andrey Grodzovsky from comment #6)
> > So with Arch Linux kernel it happens only during S3 but with
> > amd-staging-drm-next it happens once you start X ?
> 
> Yes. I know it sounds strange, but it's currently 100% reproducible to me:
> 
> Booting linux-4.17.2-ARCH with amdgpu.vm_update_mode=0:
>  X11 starts fine, but system crashes after minutes of firefox browsing
> 
> Booting linux-4.17.2-ARCH with amdgpu.vm_update_mode=3:
>  X11 starts fine, system does not crash (for at least hours of use)
>  but crashes as above if resumed from S3 sleep
> 
> Booting linux compiled from amd-staging-drm-next, as of commit
> 527d6e839a0e52b744fd092453544e4f58977334 from yesterday, with
> amdgpu.vm_update_mode=0:
>  X11 starts fine, but system crashes after minutes of firefox browsing
> 
> Booting linux compiled from amd-staging-drm-next, as of commit
> 527d6e839a0e52b744fd092453544e4f58977334 from yesterday, with
> amdgpu.vm_update_mode=3:
>  X11 does not start, crashes immediately with the same above pasted kernel
> BUG message and backtrace
> 
> 
> So something with CPU-based vm_update_mode is broken, but in a different way
> than the SDMA-based method.
> 
> I will change the subject of this report to reflect that this crash is not
> necessarily S3-resume-related.

I am going to try to reproduce the crash with CPU update mode here. Please
describe exactly which ASIC you are using.


* [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" in amdgpu_vm_cpu_set_ptes at amdgpu_vm.c:921
  2018-06-28 19:33 [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume bugzilla-daemon
                   ` (8 preceding siblings ...)
  2018-06-29 19:17 ` bugzilla-daemon
@ 2018-06-29 19:21 ` bugzilla-daemon
  2018-07-02  3:11 ` bugzilla-daemon
                   ` (17 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2018-06-29 19:21 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1820 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=107065

--- Comment #9 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to Andrey Grodzovsky from comment #8)
> (In reply to dwagner from comment #7)
> > (In reply to Andrey Grodzovsky from comment #6)
> > > So with Arch Linux kernel it happens only during S3 but with
> > > amd-staging-drm-next it happens once you start X ?
> > 
> > Yes. I know it sounds strange, but it's currently 100% reproducible to me:
> > 
> > Booting linux-4.17.2-ARCH with amdgpu.vm_update_mode=0:
> >  X11 starts fine, but system crashes after minutes of firefox browsing
> > 
> > Booting linux-4.17.2-ARCH with amdgpu.vm_update_mode=3:
> >  X11 starts fine, system does not crash (for at least hours of use)
> >  but crashes as above if resumed from S3 sleep
> > 
> > Booting linux compiled from amd-staging-drm-next, as of commit
> > 527d6e839a0e52b744fd092453544e4f58977334 from yesterday, with
> > amdgpu.vm_update_mode=0:
> >  X11 starts fine, but system crashes after minutes of firefox browsing
> > 
> > Booting linux compiled from amd-staging-drm-next, as of commit
> > 527d6e839a0e52b744fd092453544e4f58977334 from yesterday, with
> > amdgpu.vm_update_mode=3:
> >  X11 does not start, crashes immediately with the same above pasted kernel
> > BUG message and backtrace
> > 
> > 
> > So something with CPU-based vm_update_mode is broken, but in a different way
> > than the SDMA-based method.
> > 
> > I will change the subject of this report to reflect that this crash is not
> > necessarily S3-resume-related.
> 
> I am going to try to reproduce the crash with CPU update mode here. Please
> describe exactly which ASIC you are using.

Got it already.


* [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" in amdgpu_vm_cpu_set_ptes at amdgpu_vm.c:921
  2018-06-28 19:33 [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume bugzilla-daemon
                   ` (9 preceding siblings ...)
  2018-06-29 19:21 ` bugzilla-daemon
@ 2018-07-02  3:11 ` bugzilla-daemon
  2018-07-02 11:03 ` bugzilla-daemon
                   ` (16 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2018-07-02  3:11 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1034 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=107065

--- Comment #10 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
Created attachment 140418
  --> https://bugs.freedesktop.org/attachment.cgi?id=140418&action=edit
drm/amdgpu: Verify root PD is mapped into kernel address space.

dwagner, please try this patch. Fixes the issue for me and I observed no
suspend/resume issues.

Christian, please take a look at the patch. The problem was that in
amdgpu_vm_update_directories the parent BO didn't have a kernel mapping, so
later, inside amdgpu_vm_cpu_set_ptes,
pe += (unsigned long)amdgpu_bo_kptr(bo); would equal 0000000000002000,
since amdgpu_bo_kptr would return NULL for the parent. The parent was the
root PD.

This was still working at commit 67b8d5c (Linus Torvalds, 7 weeks ago,
"Linux 4.17-rc5", tag: v4.17-rc5), but I wasn't able to pinpoint exactly
which change broke it. I am not sure my fix is the right one, so please advise.
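
To illustrate the arithmetic described above, here is a minimal userspace
sketch in C. It is not the driver code: struct fake_bo, fake_bo_kptr() and
both helper functions are hypothetical stand-ins invented for this example.
It only shows why a NULL kernel mapping turns a page-directory offset such as
0x2000 directly into the faulting address, and what kind of guard avoids that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a buffer object: kptr is the CPU mapping,
 * or NULL when the BO was never mapped into the kernel address space. */
struct fake_bo {
    void *kptr;
};

/* Hypothetical stand-in for amdgpu_bo_kptr(). */
static void *fake_bo_kptr(const struct fake_bo *bo)
{
    return bo->kptr;
}

/* Mirrors the quoted statement: pe starts out as a byte offset into the
 * page directory and becomes a CPU address by adding the mapping base. */
static uintptr_t pde_cpu_address(const struct fake_bo *bo, uintptr_t pe_offset)
{
    return pe_offset + (uintptr_t)fake_bo_kptr(bo);
}

/* Guarded variant: refuses to produce an address when there is no mapping.
 * Returns 0 on success and -1 if the BO is unmapped. */
static int pde_cpu_address_checked(const struct fake_bo *bo,
                                   uintptr_t pe_offset, uintptr_t *out)
{
    assert(out != NULL);
    if (fake_bo_kptr(bo) == NULL)
        return -1;
    *out = pde_cpu_address(bo, pe_offset);
    return 0;
}
```

With an unmapped BO, pde_cpu_address(&bo, 0x2000) yields 0x2000, which matches
the address in the BUG message, while the checked variant returns -1 instead.
Where such a check belongs in the real driver is a separate question.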


* [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" in amdgpu_vm_cpu_set_ptes at amdgpu_vm.c:921
  2018-06-28 19:33 [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume bugzilla-daemon
                   ` (10 preceding siblings ...)
  2018-07-02  3:11 ` bugzilla-daemon
@ 2018-07-02 11:03 ` bugzilla-daemon
  2018-07-02 19:48 ` bugzilla-daemon
                   ` (15 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2018-07-02 11:03 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1364 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=107065

--- Comment #11 from Christian König <ckoenig.leichtzumerken@gmail.com> ---
(In reply to Andrey Grodzovsky from comment #10)
> Created attachment 140418 [details] [review]
> drm/amdgpu: Verify root PD is mapped into kernel address space.
> 
> dwagner, please try this patch. Fixes the issue for me and I observed no
> suspend/resume issues.
> 
> Christian, please take a look at the patch. The problem was that in
> amdgpu_vm_update_directories the parent BO didn't have a kernel mapping, so
> later, inside amdgpu_vm_cpu_set_ptes,
> pe += (unsigned long)amdgpu_bo_kptr(bo); would equal 0000000000002000,
> since amdgpu_bo_kptr would return NULL for the parent. The parent was the
> root PD.
> 
> This was still working at commit 67b8d5c (Linus Torvalds, 7 weeks ago,
> "Linux 4.17-rc5", tag: v4.17-rc5), but I wasn't able to pinpoint exactly
> which change broke it. I am not sure my fix is the right one, so please advise.

No idea when that broke either; CPU-based updates are not something we usually
test.

Anyway it's a good catch, but I would rather add that to
amdgpu_vm_bo_base_init() (with the appropriate checks).

That would also allow us to remove the duplicated code from
amdgpu_vm_alloc_levels().


* [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" in amdgpu_vm_cpu_set_ptes at amdgpu_vm.c:921
  2018-06-28 19:33 [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume bugzilla-daemon
                   ` (11 preceding siblings ...)
  2018-07-02 11:03 ` bugzilla-daemon
@ 2018-07-02 19:48 ` bugzilla-daemon
  2018-07-02 22:55 ` bugzilla-daemon
                   ` (14 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2018-07-02 19:48 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 10351 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=107065

--- Comment #12 from dwagner <jb5sgc1n.nya@20mm.eu> ---
(In reply to Andrey Grodzovsky from comment #10)
> Created attachment 140418 [details] [review]
> drm/amdgpu: Verify root PD is mapped into kernel address space.
> 
> dwagner, please try this patch. Fixes the issue for me and I observed no
> suspend/resume issues.

While I can start X11 with this patch applied to current amd-staging-drm-next,
attempts to resume from S3 fail consistently.

The following related output is emitted right before the suspend:

Jul 02 21:31:32 ryzen kernel: Freezing remaining freezable tasks ... (elapsed
0.000 seconds) done.
Jul 02 21:31:32 ryzen kernel: Suspending console(s) (use no_console_suspend to
debug)
Jul 02 21:31:32 ryzen kernel: sd 9:0:0:0: [sda] Synchronizing SCSI cache
Jul 02 21:31:32 ryzen kernel: [TTM] Buffer eviction failed
Jul 02 21:31:32 ryzen kernel: ACPI: Preparing to enter system sleep state S3
Jul 02 21:31:32 ryzen kernel: PM: Saving platform NVS memory
Jul 02 21:31:32 ryzen kernel: Disabling non-boot CPUs ...

(I wonder if that "[TTM] Buffer eviction failed" is a bad sign - I have seen
it at other times in conjunction with heavy use of the amdgpu driver.)


Then, upon resume, the following messages are emitted:

Jul 02 21:31:33 ryzen kernel: ACPI: Low-level resume complete
Jul 02 21:31:33 ryzen kernel: [drm] PCIE GART of 256M enabled (table at
0x000000F400300000).
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               failed to send message 146 ret is 0 
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               last message was failed ret is 0
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               failed to send message 148 ret is 0 
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               last message was failed ret is 0
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               failed to send message 145 ret is 0 
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               last message was failed ret is 0
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               failed to send message 146 ret is 0 
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               last message was failed ret is 0
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               failed to send message 189 ret is 0 
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               last message was failed ret is 0
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               failed to send message 306 ret is 0 
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               last message was failed ret is 0
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               failed to send message 5e ret is 0 
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               last message was failed ret is 0
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               failed to send message 18a ret is 0 
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               last message was failed ret is 0
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               failed to send message 145 ret is 0 
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               last message was failed ret is 0
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               failed to send message 146 ret is 0 
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               last message was failed ret is 0
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               failed to send message 148 ret is 0 
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               last message was failed ret is 0
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               failed to send message 145 ret is 0 
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               last message was failed ret is 0
Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay] 
                               failed to send message 146 ret is 0 
Jul 02 21:31:33 ryzen kernel: [drm:gfx_v8_0_ring_test_ring [amdgpu]] *ERROR*
amdgpu: ring 0 test failed (scratch(0xC040)=0xC>
Jul 02 21:31:33 ryzen kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]]
*ERROR* resume of IP block <gfx_v8_0> failed -22
Jul 02 21:31:33 ryzen kernel: [drm:amdgpu_device_resume [amdgpu]] *ERROR*
amdgpu_device_ip_resume failed (-22).
Jul 02 21:31:33 ryzen kernel: dpm_run_callback(): pci_pm_resume+0x0/0xa0
returns -22
Jul 02 21:31:33 ryzen kernel: PM: Device 0000:0a:00.0 failed to resume async:
error -22
Jul 02 21:31:33 ryzen kernel: OOM killer enabled.
Jul 02 21:31:33 ryzen kernel: Restarting tasks ... done.
Jul 02 21:31:33 ryzen kernel: PM: suspend exit
Jul 02 21:31:33 ryzen kernel: BUG: unable to handle kernel paging request at
0000000000001000
Jul 02 21:31:33 ryzen kernel: PGD 0 P4D 0 
Jul 02 21:31:33 ryzen kernel: Oops: 0002 [#1] SMP
Jul 02 21:31:33 ryzen kernel: CPU: 14 PID: 791 Comm: amdgpu_cs:0 Tainted: G    
   W  O      4.18.0-rc1-amd+ #45
Jul 02 21:31:33 ryzen kernel: Hardware name: System manufacturer System Product
Name/PRIME X370-PRO, BIOS 4011 04/19/2018
Jul 02 21:31:33 ryzen kernel: RIP: 0010:gmc_v8_0_set_pte_pde+0x1b/0x30 [amdgpu]
Jul 02 21:31:33 ryzen kernel: Code: 80 d8 00 00 00 e9 25 78 60 e1 0f 1f 44 00
00 0f 1f 44 00 00 48 b8 00 f0 ff ff ff 00 00 0>
Jul 02 21:31:33 ryzen kernel: RSP: 0018:ffffc90003e73898 EFLAGS: 00010202
Jul 02 21:31:33 ryzen kernel: RAX: 000000fffffff000 RBX: 0000000000000001 RCX:
000000000fe004f1
Jul 02 21:31:33 ryzen kernel: RDX: 0000000000001000 RSI: 0000000000001000 RDI:
ffff8807e2f70000
Jul 02 21:31:33 ryzen kernel: RBP: 0000000000001000 R08: 00000000000004f1 R09:
0000000000001000
Jul 02 21:31:33 ryzen kernel: R10: ffffffffa03ac7e0 R11: ffff8807daf78000 R12:
0000000000001000
Jul 02 21:31:33 ryzen kernel: R13: 0000000000000200 R14: ffffc90003e73a18 R15:
000000000fe01000
Jul 02 21:31:33 ryzen kernel: FS:  00007f8b57266700(0000)
GS:ffff88081ef80000(0000) knlGS:0000000000000000
Jul 02 21:31:33 ryzen kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 02 21:31:33 ryzen kernel: CR2: 0000000000001000 CR3: 00000007dbbda000 CR4:
00000000003406e0
Jul 02 21:31:33 ryzen kernel: Call Trace:
Jul 02 21:31:33 ryzen kernel:  amdgpu_vm_cpu_set_ptes+0x76/0xe0 [amdgpu]
Jul 02 21:31:33 ryzen kernel:  amdgpu_vm_update_ptes+0x1d3/0x2e0 [amdgpu]
Jul 02 21:31:33 ryzen kernel:  amdgpu_vm_frag_ptes+0xae/0x130 [amdgpu]
Jul 02 21:31:33 ryzen kernel:  amdgpu_vm_bo_update_mapping+0xed/0x410 [amdgpu]
Jul 02 21:31:33 ryzen kernel:  ? amdgpu_vm_do_copy_ptes+0xa0/0xa0 [amdgpu]
Jul 02 21:31:33 ryzen kernel:  amdgpu_vm_bo_update+0x310/0x680 [amdgpu]
Jul 02 21:31:33 ryzen kernel:  amdgpu_cs_ioctl+0x1092/0x1a50 [amdgpu]
Jul 02 21:31:33 ryzen kernel:  ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu]
Jul 02 21:31:33 ryzen kernel:  drm_ioctl_kernel+0xa7/0xf0 [drm]
Jul 02 21:31:33 ryzen kernel:  drm_ioctl+0x2f1/0x3c0 [drm]
Jul 02 21:31:33 ryzen kernel:  ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu]
Jul 02 21:31:33 ryzen kernel:  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
Jul 02 21:31:33 ryzen kernel:  do_vfs_ioctl+0xa4/0x620
Jul 02 21:31:33 ryzen kernel:  ? __se_sys_futex+0x138/0x180
Jul 02 21:31:33 ryzen kernel:  ksys_ioctl+0x60/0x90
Jul 02 21:31:33 ryzen kernel:  __x64_sys_ioctl+0x16/0x20
Jul 02 21:31:33 ryzen kernel:  do_syscall_64+0x48/0xf0
Jul 02 21:31:33 ryzen kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 02 21:31:33 ryzen kernel: RIP: 0033:0x7f8b66c92667
Jul 02 21:31:33 ryzen kernel: Code: 00 00 90 48 8b 05 e9 67 2c 00 64 c7 00 26
00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 8>
Jul 02 21:31:33 ryzen kernel: RSP: 002b:00007f8b57265a98 EFLAGS: 00000246
ORIG_RAX: 0000000000000010
Jul 02 21:31:33 ryzen kernel: RAX: ffffffffffffffda RBX: 00007f8b57265b88 RCX:
00007f8b66c92667
Jul 02 21:31:33 ryzen kernel: RDX: 00007f8b57265b00 RSI: 00000000c0186444 RDI:
000000000000000b
Jul 02 21:31:33 ryzen kernel: RBP: 00007f8b57265b00 R08: 00007f8b57265bb0 R09:
0000000000000010
Jul 02 21:31:33 ryzen kernel: R10: 00007f8b57265bb0 R11: 0000000000000246 R12:
00000000c0186444
Jul 02 21:31:33 ryzen kernel: R13: 000000000000000b R14: 0000000000000002 R15:
0000000000000000
Jul 02 21:31:33 ryzen kernel: Modules linked in: it87(O) joydev mousedev
hid_generic hidp hid ipt_REJECT nf_reject_ipv4 nf_l>
Jul 02 21:31:33 ryzen kernel:  serio_raw crc32_pclmul atkbd ghash_clmulni_intel
libps2 pcbc ahci libahci xhci_pci libata aes>
Jul 02 21:31:33 ryzen kernel: CR2: 0000000000001000
Jul 02 21:31:33 ryzen kernel: ---[ end trace 517a8a72887251f0 ]---
Jul 02 21:31:33 ryzen kernel: RIP: 0010:gmc_v8_0_set_pte_pde+0x1b/0x30 [amdgpu]
Jul 02 21:31:33 ryzen kernel: Code: 80 d8 00 00 00 e9 25 78 60 e1 0f 1f 44 00
00 0f 1f 44 00 00 48 b8 00 f0 ff ff ff 00 00 0>
Jul 02 21:31:33 ryzen kernel: RSP: 0018:ffffc90003e73898 EFLAGS: 00010202
Jul 02 21:31:33 ryzen kernel: RAX: 000000fffffff000 RBX: 0000000000000001 RCX:
000000000fe004f1
Jul 02 21:31:33 ryzen kernel: RDX: 0000000000001000 RSI: 0000000000001000 RDI:
ffff8807e2f70000
Jul 02 21:31:33 ryzen kernel: RBP: 0000000000001000 R08: 00000000000004f1 R09:
0000000000001000
Jul 02 21:31:33 ryzen kernel: R10: ffffffffa03ac7e0 R11: ffff8807daf78000 R12:
0000000000001000
Jul 02 21:31:33 ryzen kernel: R13: 0000000000000200 R14: ffffc90003e73a18 R15:
000000000fe01000
Jul 02 21:31:33 ryzen kernel: FS:  00007f8b57266700(0000)
GS:ffff88081ef80000(0000) knlGS:0000000000000000
Jul 02 21:31:33 ryzen kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 02 21:31:33 ryzen kernel: CR2: 0000000000001000 CR3: 00000007dbbda000 CR4:
00000000003406e0

(At this point, the machine is just dead and does not react to anything.)

So something is still wrong at amdgpu_vm_cpu_set_ptes+0x76
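
For reading the trace above: the number after "Oops:" is the x86 page-fault
error code, and CR2 holds the faulting address. The following is a small
userspace sketch in C that decodes the three low bits of that code. The macro,
struct and function names are made up for this example and are not kernel
identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Low bits of the x86 page-fault error code, as printed after "Oops:". */
#define PF_PROT  (1UL << 0) /* 0: page not present, 1: protection violation */
#define PF_WRITE (1UL << 1) /* 0: read access,      1: write access */
#define PF_USER  (1UL << 2) /* 0: kernel mode,      1: user mode */

struct pf_info {
    bool not_present;
    bool write;
    bool user_mode;
};

/* Splits the error code into the three flags above. */
static struct pf_info decode_pf_error(unsigned long code)
{
    struct pf_info info = {
        .not_present = !(code & PF_PROT),
        .write = (code & PF_WRITE) != 0,
        .user_mode = (code & PF_USER) != 0,
    };
    return info;
}

/* Writes a one-line summary into buf and returns the snprintf() result. */
static int describe_pf(unsigned long code, unsigned long cr2,
                       char *buf, size_t len)
{
    struct pf_info info = decode_pf_error(code);

    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%s-mode %s of a %s page at 0x%lx",
                    info.user_mode ? "user" : "kernel",
                    info.write ? "write" : "read",
                    info.not_present ? "not-present" : "protected",
                    cr2);
}
```

Applied to "Oops: 0002" with CR2 = 0000000000001000, this describes a
kernel-mode write to a page that is not present, at address 0x1000. That is
consistent with a write through a small offset from a NULL base, as in the
original 0x2000 report.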


* [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" in amdgpu_vm_cpu_set_ptes at amdgpu_vm.c:921
  2018-06-28 19:33 [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume bugzilla-daemon
                   ` (12 preceding siblings ...)
  2018-07-02 19:48 ` bugzilla-daemon
@ 2018-07-02 22:55 ` bugzilla-daemon
  2018-07-03 20:42 ` bugzilla-daemon
                   ` (13 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2018-07-02 22:55 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 11214 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=107065

--- Comment #13 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to dwagner from comment #12)
> (In reply to Andrey Grodzovsky from comment #10)
> > Created attachment 140418 [details] [review] [review]
> > drm/amdgpu: Verify root PD is mapped into kernel address space.
> > 
> > dwagner, please try this patch. Fixes the issue for me and I observed no
> > suspend/resume issues.
> 
> While I can start X11 with this patch applied to current
> amd-staging-drm-next, attempts to resume from S3 fail consistently.
> 
> [...]
> 
> So something is still wrong at amdgpu_vm_cpu_set_ptes+0x76


My guess is that on resume from S3 the root PD needs to be mapped into the CPU
address space again. Maybe changing the patch according to Christian's advice
will be enough; I will take a look tomorrow. Alternatively, it has to do with
the resume failure you are experiencing. What ASIC are you using? I also
tested with a gfx8 ASIC and haven't observed any issues with resume. Did you
update the firmware for this ASIC to the latest version?


* [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" in amdgpu_vm_cpu_set_ptes at amdgpu_vm.c:921
  2018-06-28 19:33 [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume bugzilla-daemon
                   ` (13 preceding siblings ...)
  2018-07-02 22:55 ` bugzilla-daemon
@ 2018-07-03 20:42 ` bugzilla-daemon
  2018-07-03 22:58 ` bugzilla-daemon
                   ` (12 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2018-07-03 20:42 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 622 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=107065

--- Comment #14 from dwagner <jb5sgc1n.nya@20mm.eu> ---
(In reply to Andrey Grodzovsky from comment #13)
> What ASIC are you using? I also tested with a
> gfx8 ASIC and haven't observed any issues with resume. Did you update the
> firmware for this ASIC to the latest version?

The GPU is an RX460 "POLARIS11 0x1002:0x67EF 0x1682:0x9460 0xCF",
with the latest firmware from the kernel git; you can see the
details in https://bugs.freedesktop.org/attachment.cgi?id=140383
uploaded earlier.


* [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" in amdgpu_vm_cpu_set_ptes at amdgpu_vm.c:921
  2018-06-28 19:33 [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume bugzilla-daemon
                   ` (14 preceding siblings ...)
  2018-07-03 20:42 ` bugzilla-daemon
@ 2018-07-03 22:58 ` bugzilla-daemon
  2018-07-04 22:55 ` bugzilla-daemon
                   ` (11 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2018-07-03 22:58 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 902 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=107065

--- Comment #15 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to dwagner from comment #14)
> (In reply to Andrey Grodzovsky from comment #13)
> > What ASIC are you using? I also tested with a
> > gfx8 ASIC and haven't observed any issues with resume. Did you update the
> > firmware for this ASIC to the latest version?
> 
> The GPU is an RX460 "POLARIS11 0x1002:0x67EF 0x1682:0x9460 0xCF",
> with the latest firmware from the kernel git; you can see the
> details in https://bugs.freedesktop.org/attachment.cgi?id=140383
> uploaded earlier.

Our setups differ only in minor ways, but I can't reproduce it. Maybe the
resume failure is indeed due to the eviction failure during suspend. Does the
S3 failure happen only when you switch to CPU update mode?


* [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" in amdgpu_vm_cpu_set_ptes at amdgpu_vm.c:921
  2018-06-28 19:33 [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume bugzilla-daemon
                   ` (15 preceding siblings ...)
  2018-07-03 22:58 ` bugzilla-daemon
@ 2018-07-04 22:55 ` bugzilla-daemon
  2018-07-06 23:03 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2018-07-04 22:55 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 2194 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=107065

--- Comment #16 from dwagner <jb5sgc1n.nya@20mm.eu> ---
(In reply to Andrey Grodzovsky from comment #15)
> Our setups differ only in minor ways, but I can't reproduce it. Maybe the
> resume failure is indeed due to the eviction failure during suspend. Does the
> S3 failure happen only when you switch to CPU update mode?

No. When I boot amd-staging-drm-next with amdgpu.vm_update_mode=0
and suspend to S3, resuming also crashes, but with different
messages - _not_ with
 "BUG: unable to handle kernel paging request at 0000000000002000"
like in the vm_update_mode=3 case.

In the journal, I can see the following after a vm_update_mode=0 S3 resume
attempt:

Jul 05 00:41:59 ryzen kernel: [TTM] Buffer eviction failed
Jul 05 00:41:59 ryzen kernel: ACPI: Preparing to enter system sleep state S3
...
Jul 05 00:42:00 ryzen kernel: [drm:gfx_v8_0_ring_test_ring [amdgpu]] *ERROR*
amdgpu: ring 0 test failed (scratch(0xC040)=0xC>
Jul 05 00:42:00 ryzen kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]]
*ERROR* resume of IP block <gfx_v8_0> failed -22
Jul 05 00:42:00 ryzen kernel: [drm:amdgpu_device_resume [amdgpu]] *ERROR*
amdgpu_device_ip_resume failed (-22).
Jul 05 00:42:00 ryzen kernel: dpm_run_callback(): pci_pm_resume+0x0/0xa0
returns -22
Jul 05 00:42:00 ryzen kernel: PM: Device 0000:0a:00.0 failed to resume async:
error -22
...
Jul 05 00:42:00 ryzen kernel: amdgpu 0000:0a:00.0: couldn't schedule ib on ring
<sdma0>
Jul 05 00:42:00 ryzen kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
scheduling IBs (-22)
Jul 05 00:42:00 ryzen kernel: amdgpu 0000:0a:00.0: couldn't schedule ib on ring
<sdma0>
Jul 05 00:42:00 ryzen kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
scheduling IBs (-22)
Jul 05 00:42:00 ryzen kernel: amdgpu 0000:0a:00.0: couldn't schedule ib on ring
<sdma0>
Jul 05 00:42:00 ryzen kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
scheduling IBs (-22)
Jul 05 00:42:00 ryzen kernel: amdgpu 0000:0a:00.0: couldn't schedule ib on ring
<sdma0>
... many more of these, but no kernel BUG or Oops.
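
The two failure modes reported so far in this thread (the paging-request BUG
with vm_update_mode=3, and the eviction/resume errors with vm_update_mode=0)
can be told apart mechanically when going through many journal dumps. The
following is only a minimal sketch: the regular expressions are derived from
the log excerpts quoted in this thread, while the category names and the
functions classify_journal() and describe() are hypothetical helpers invented
for illustration, not part of any amdgpu or systemd tooling.

```python
import re
from collections import Counter

# Signatures taken from the log excerpts in this thread; the keys are
# hypothetical labels chosen for this sketch.
SIGNATURES = {
    "paging_bug": re.compile(r"BUG: unable to handle kernel paging request"),
    "eviction_failed": re.compile(r"\[TTM\] Buffer eviction failed"),
    "ring_test_failed": re.compile(r"ring \d+ test failed"),
    "ip_resume_failed": re.compile(r"resume of IP block <\w+> failed"),
    "ib_schedule_failed": re.compile(r"couldn't schedule ib on ring"),
}


def classify_journal(lines):
    """Count how often each known failure signature occurs in the lines."""
    counts = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def describe(counts):
    """Return a coarse label for the failure mode of one resume attempt."""
    if counts["paging_bug"]:
        return "kernel paging BUG (as reported with vm_update_mode=3)"
    if counts["eviction_failed"] or counts["ip_resume_failed"]:
        return "resume failure without kernel BUG (as reported with vm_update_mode=0)"
    return "no known signature"
```

Feeding it the output of "journalctl -k -b -1" split into lines would be one
way to use it; the patterns also match the wrapped continuation lines shown
above, since each signature fits on a single line.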


* [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" in amdgpu_vm_cpu_set_ptes at amdgpu_vm.c:921
  2018-06-28 19:33 [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume bugzilla-daemon
                   ` (16 preceding siblings ...)
  2018-07-04 22:55 ` bugzilla-daemon
@ 2018-07-06 23:03 ` bugzilla-daemon
  2018-07-09 18:16 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2018-07-06 23:03 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 685 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=107065

--- Comment #17 from dwagner <jb5sgc1n.nya@20mm.eu> ---
Interesting observation: If I first switch from the X11 display to the console
display (with Alt-F2) and then enter "echo mem >/sys/power/state" on the
console, the crashes described above do not occur upon S3 resume, and I do not
see "[TTM] Buffer eviction failed" in the kernel log, with either
vm_update_mode=0 or vm_update_mode=3.

Switching back to the X11 display after a successful S3 resume to the console
also works fine.

What could be the relevant difference here?
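
For repeating the console test many times, the "echo mem >/sys/power/state"
step can be scripted. This is a minimal sketch under stated assumptions: it
must run as root, the function names supported_sleep_states() and
request_suspend() are hypothetical, and the state_path parameter exists only
so the logic can be tried against an ordinary file instead of the real sysfs
node.

```python
from pathlib import Path


def supported_sleep_states(state_path="/sys/power/state"):
    """Return the sleep states listed in the power-state file as a list."""
    return Path(state_path).read_text().split()


def request_suspend(state="mem", state_path="/sys/power/state"):
    """Write `state` to the power-state file.

    On a real system this starts the suspend immediately, so call it only
    from the console session that is being tested.
    """
    if state not in supported_sleep_states(state_path):
        raise ValueError(f"sleep state {state!r} not offered by {state_path}")
    Path(state_path).write_text(state)
```

Checking the advertised states first avoids a confusing write error on
machines where "mem" is not offered.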


* [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" in amdgpu_vm_cpu_set_ptes at amdgpu_vm.c:921
  2018-06-28 19:33 [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume bugzilla-daemon
                   ` (17 preceding siblings ...)
  2018-07-06 23:03 ` bugzilla-daemon
@ 2018-07-09 18:16 ` bugzilla-daemon
  2018-07-11 22:04 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2018-07-09 18:16 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1059 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=107065

--- Comment #18 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to dwagner from comment #17)
> Interesting observation: If I first switch from the X11 display to the
> console display (with Alt-F2) and then enter "echo mem >/sys/power/state"
> on the console, the crashes described above do not occur upon S3 resume,
> and I do not see "[TTM] Buffer eviction failed" in the kernel log, with
> either vm_update_mode=0 or vm_update_mode=3.
> 
> Switching back to the X11 display after a successful S3 resume to the
> console also works fine.
> 
> What could be the relevant difference here?

Well, there is no acceleration involved when in console mode. So maybe this has
something to do with it.

Anyway, I am sidetracked a bit by an internal requirement, but once I finish I
will get back to this issue, especially because I got another report with the
same failure as you describe.


* [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" in amdgpu_vm_cpu_set_ptes at amdgpu_vm.c:921
  2018-06-28 19:33 [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume bugzilla-daemon
                   ` (18 preceding siblings ...)
  2018-07-09 18:16 ` bugzilla-daemon
@ 2018-07-11 22:04 ` bugzilla-daemon
  2018-07-11 22:23 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2018-07-11 22:04 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1354 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=107065

--- Comment #19 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to Andrey Grodzovsky from comment #18)
> (In reply to dwagner from comment #17)
> > Interesting observation: If I first switch from the X11 display to the
> > console display (with Alt-F2) and then enter "echo mem >/sys/power/state"
> > on the console, the crashes described above do not occur upon S3 resume,
> > and I do not see "[TTM] Buffer eviction failed" in the kernel log, with
> > either vm_update_mode=0 or vm_update_mode=3.
> > 
> > Switching back to the X11 display after a successful S3 resume to the
> > console also works fine.
> > 
> > What could be the relevant difference here?
> 
> Well, there is no acceleration involved when in console mode. So maybe this
> has something to do with it.
> 
> Anyway, I am sidetracked a bit by an internal requirement, but once I finish
> I will get back to this issue, especially because I got another report with
> the same failure as you describe.

I was able to reproduce this instantly without even using the CPU page table
update mode. It looks like a regression, since S3 was working fine for a long
time. Were you able to find a regression point for this?


* [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" in amdgpu_vm_cpu_set_ptes at amdgpu_vm.c:921
  2018-06-28 19:33 [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume bugzilla-daemon
                   ` (19 preceding siblings ...)
  2018-07-11 22:04 ` bugzilla-daemon
@ 2018-07-11 22:23 ` bugzilla-daemon
  2018-07-13 21:01 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2018-07-11 22:23 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1123 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=107065

--- Comment #20 from dwagner <jb5sgc1n.nya@20mm.eu> ---
(In reply to Andrey Grodzovsky from comment #19)
> I was able to reproduce this instantly without even using the CPU page
> table update mode. It looks like a regression, since S3 was working fine
> for a long time. Were you able to find a regression point for this?

Not for the exact symptom described in this report, but for an older S3 resume
issue that was partially resolved -
https://bugs.freedesktop.org/show_bug.cgi?id=103277 - I did once find the
regression caused by the "drm/amd/display: Match actual state during S3 resume"
commit.

Unfortunately, the many changes that followed no longer allow bisecting the
symptom there to one specific commit, but given that it still occurs
if I use the option "drm.edid_firmware=edid/LG_EG9609_edid.bin", I think there
is still some bug in the order of things done during re-initialization upon S3
resume, and setting a fixed EDID seems to expose it as a crash.


* [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" in amdgpu_vm_cpu_set_ptes at amdgpu_vm.c:921
  2018-06-28 19:33 [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume bugzilla-daemon
                   ` (20 preceding siblings ...)
  2018-07-11 22:23 ` bugzilla-daemon
@ 2018-07-13 21:01 ` bugzilla-daemon
  2018-07-13 23:45 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2018-07-13 21:01 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1526 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=107065

--- Comment #21 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to dwagner from comment #20)
> (In reply to Andrey Grodzovsky from comment #19)
> > I was able to reproduce this instantly without even using the CPU page
> > table update mode. It looks like a regression, since S3 was working fine
> > for a long time. Were you able to find a regression point for this?
> 
> Not for the exact symptom described in this report, but for an older S3
> resume issue that was partially resolved -
> https://bugs.freedesktop.org/show_bug.cgi?id=103277 - I did once find the
> regression caused by the "drm/amd/display: Match actual state during S3
> resume" commit.
> 
> Unfortunately, the many changes that followed no longer allow bisecting
> the symptom there to one specific commit, but given that it still occurs
> if I use the option "drm.edid_firmware=edid/LG_EG9609_edid.bin", I think
> there is still some bug in the order of things done during
> re-initialization upon S3 resume, and setting a fixed EDID seems to
> expose it as a crash.

I found the offending patch - "drm: Stop updating plane->crtc/fb/old_fb on
atomic drivers".
I'm not sure yet what's going on there, and I'm not sure reverting it will fix
your amdgpu_vm_cpu_set_ptes page fault after S3, since I haven't observed that
here. It is still worth trying a revert on your side to see what happens.


* [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" in amdgpu_vm_cpu_set_ptes at amdgpu_vm.c:921
  2018-06-28 19:33 [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume bugzilla-daemon
                   ` (21 preceding siblings ...)
  2018-07-13 21:01 ` bugzilla-daemon
@ 2018-07-13 23:45 ` bugzilla-daemon
  2018-07-14  4:28 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2018-07-13 23:45 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 892 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=107065

--- Comment #22 from dwagner <jb5sgc1n.nya@20mm.eu> ---
(In reply to Andrey Grodzovsky from comment #21)
> I found the offending patch - "drm: Stop updating plane->crtc/fb/old_fb on
> atomic drivers".
> I'm not sure yet what's going on there, and I'm not sure reverting it will
> fix your amdgpu_vm_cpu_set_ptes page fault after S3, since I haven't
> observed that here. It is still worth trying a revert on your side to see
> what happens.

Reverting the commit "drm: Stop updating plane->crtc/fb/old_fb on atomic
drivers" changes only one thing for me: after S3 resume, the very picture that
was visible before S3 sleep is displayed again. The kernel crash at
"amdgpu_vm_cpu_set_ptes+0x76" still happens, so the "resumed picture" is as
frozen as the system is dead.


* [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" in amdgpu_vm_cpu_set_ptes at amdgpu_vm.c:921
  2018-06-28 19:33 [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume bugzilla-daemon
                   ` (22 preceding siblings ...)
  2018-07-13 23:45 ` bugzilla-daemon
@ 2018-07-14  4:28 ` bugzilla-daemon
  2018-07-14 13:15 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2018-07-14  4:28 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1031 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=107065

--- Comment #23 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to dwagner from comment #22)
> (In reply to Andrey Grodzovsky from comment #21)
> > I found the offending patch - "drm: Stop updating plane->crtc/fb/old_fb
> > on atomic drivers".
> > I'm not sure yet what's going on there, and I'm not sure reverting it will
> > fix your amdgpu_vm_cpu_set_ptes page fault after S3, since I haven't
> > observed that here. It is still worth trying a revert on your side to see
> > what happens.
> 
> Reverting the commit "drm: Stop updating plane->crtc/fb/old_fb on atomic
> drivers" changes only one thing for me: after S3 resume, the very picture
> that was visible before S3 sleep is displayed again. The kernel crash at
> "amdgpu_vm_cpu_set_ptes+0x76" still happens, so the "resumed picture" is as
> frozen as the system is dead.

Can you attach the dmesg output from the system with the patch reverted?


* [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" in amdgpu_vm_cpu_set_ptes at amdgpu_vm.c:921
  2018-06-28 19:33 [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume bugzilla-daemon
                   ` (23 preceding siblings ...)
  2018-07-14  4:28 ` bugzilla-daemon
@ 2018-07-14 13:15 ` bugzilla-daemon
  2018-07-14 13:16 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2018-07-14 13:15 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 624 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=107065

--- Comment #24 from dwagner <jb5sgc1n.nya@20mm.eu> ---
> > Reverting the commit "drm: Stop updating plane->crtc/fb/old_fb on atomic
> > drivers" changes only one thing for me: after S3 resume, the very picture
> > that was visible before S3 sleep is displayed again. The kernel crash at
> > "amdgpu_vm_cpu_set_ptes+0x76" still happens, so the "resumed picture" is
> > as frozen as the system is dead.
> 
> Can you attach the dmesg output from the system with the patch reverted?

Sure, will do


* [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" in amdgpu_vm_cpu_set_ptes at amdgpu_vm.c:921
  2018-06-28 19:33 [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume bugzilla-daemon
                   ` (24 preceding siblings ...)
  2018-07-14 13:15 ` bugzilla-daemon
@ 2018-07-14 13:16 ` bugzilla-daemon
  2018-07-16 13:52 ` bugzilla-daemon
  2018-07-19 16:42 ` bugzilla-daemon
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2018-07-14 13:16 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 365 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=107065

--- Comment #25 from dwagner <jb5sgc1n.nya@20mm.eu> ---
Created attachment 140634
  --> https://bugs.freedesktop.org/attachment.cgi?id=140634&action=edit
dmesg before and after S3 sleep with commit "updating plane ..." reverted


* [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" in amdgpu_vm_cpu_set_ptes at amdgpu_vm.c:921
  2018-06-28 19:33 [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume bugzilla-daemon
                   ` (25 preceding siblings ...)
  2018-07-14 13:16 ` bugzilla-daemon
@ 2018-07-16 13:52 ` bugzilla-daemon
  2018-07-19 16:42 ` bugzilla-daemon
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2018-07-16 13:52 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 805 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=107065

--- Comment #26 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to dwagner from comment #25)
> Created attachment 140634 [details]
> dmesg before and after S3 sleep with commit "updating plane ..." reverted

Reverting the patch makes the TTM eviction failure and the subsequent driver
resume failure go away. So that is one issue. Another issue is that you still
experience a fault related to page table updates during S3. I can't reproduce
that issue.

I am currently looking into how this patch broke S3; this is the more pressing
issue, as other people experience it too. Later I will try to give you a debug
printk patch to sort out your page fault issue.


* [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" in amdgpu_vm_cpu_set_ptes at amdgpu_vm.c:921
  2018-06-28 19:33 [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" at amdgpu_vm_cpu_set_ptes at S3 resume bugzilla-daemon
                   ` (26 preceding siblings ...)
  2018-07-16 13:52 ` bugzilla-daemon
@ 2018-07-19 16:42 ` bugzilla-daemon
  27 siblings, 0 replies; 29+ messages in thread
From: bugzilla-daemon @ 2018-07-19 16:42 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 825 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=107065

Andrey Grodzovsky <andrey.grodzovsky@amd.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|dri-devel@lists.freedesktop |andrey.grodzovsky@amd.com
                   |.org                        |

--- Comment #27 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
Created attachment 140715
  --> https://bugs.freedesktop.org/attachment.cgi?id=140715&action=edit
0001-drm-amdgpu-Fix-S3-resume-failre.patch

Please try the attached patch for the S3 issue. It might not be the final fix
yet, but it is still worth a try. It's not a fix for your CPU page table
updates fault.


