From: bugzilla-daemon@freedesktop.org
To: dri-devel@lists.freedesktop.org
Subject: [Bug 107065] "BUG: unable to handle kernel paging request at 0000000000002000" in amdgpu_vm_cpu_set_ptes at amdgpu_vm.c:921
Date: Mon, 02 Jul 2018 22:55:24 +0000

https://bugs.freedesktop.org/show_bug.cgi?id=107065

Comment #13 on bug 107065 from Andrey Grodzovsky
(In reply to dwagner from comment #12)
> (In reply to Andrey Grodzovsky from comment #10)
> > Created attachment 140418
> > drm/amdgpu: Verify root PD is mapped into kernel address space.
> >
> > dwagner, please try this patch. Fixes the issue for me and I observed no
> > suspend/resume issues.
>
> While I can start X11 with this patch applied to current
> amd-staging-drm-next, attempts to resume from S3 fail consistently.
>
> The following related output is emitted right before the suspend:
>
> Jul 02 21:31:32 ryzen kernel: Freezing remaining freezable tasks ... (elapsed 0.000 seconds) done.
> Jul 02 21:31:32 ryzen kernel: Suspending console(s) (use no_console_suspend to debug)
> Jul 02 21:31:32 ryzen kernel: sd 9:0:0:0: [sda] Synchronizing SCSI cache
> Jul 02 21:31:32 ryzen kernel: [TTM] Buffer eviction failed
> Jul 02 21:31:32 ryzen kernel: ACPI: Preparing to enter system sleep state S3
> Jul 02 21:31:32 ryzen kernel: PM: Saving platform NVS memory
> Jul 02 21:31:32 ryzen kernel: Disabling non-boot CPUs ...
>
> (I wonder if that "[TTM] Buffer eviction failed" is a bad sign - as I have
> seen it some other times in conjunction with heavy uses of the amdgpu
> driver.)
>
>
> Then, upon resume, the following messages are emitted:
>
> Jul 02 21:31:33 ryzen kernel: ACPI: Low-level resume complete
> Jul 02 21:31:33 ryzen kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400300000).
> Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay]
>                               failed to send message 146 ret is 0
> Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay]
>                               last message was failed ret is 0
> Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay]
>                               failed to send message 148 ret is 0
> Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay]
>                               last message was failed ret is 0
> Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay]
>                               failed to send message 145 ret is 0
> Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay]
>                               last message was failed ret is 0
> Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay]
>                               failed to send message 146 ret is 0
> Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay]
>                               last message was failed ret is 0
> Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay]
>                               failed to send message 189 ret is 0
> Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay]
>                               last message was failed ret is 0
> Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay]
>                               failed to send message 306 ret is 0
> Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay]
>                               last message was failed ret is 0
> Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay]
>                               failed to send message 5e ret is 0
> Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay]
>                               last message was failed ret is 0
> Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay]
>                               failed to send message 18a ret is 0
> Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay]
>                               last message was failed ret is 0
> Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay]
>                               failed to send message 145 ret is 0
> Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay]
>                               last message was failed ret is 0
> Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay]
>                               failed to send message 146 ret is 0
> Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay]
>                               last message was failed ret is 0
> Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay]
>                               failed to send message 148 ret is 0
> Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay]
>                               last message was failed ret is 0
> Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay]
>                               failed to send message 145 ret is 0
> Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay]
>                               last message was failed ret is 0
> Jul 02 21:31:33 ryzen kernel: amdgpu: [powerplay]
>                               failed to send message 146 ret is 0
> Jul 02 21:31:33 ryzen kernel: [drm:gfx_v8_0_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 0 test failed (scratch(0xC040)=0xC>
> Jul 02 21:31:33 ryzen kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v8_0> failed -22
> Jul 02 21:31:33 ryzen kernel: [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-22).
> Jul 02 21:31:33 ryzen kernel: dpm_run_callback(): pci_pm_resume+0x0/0xa0 returns -22
> Jul 02 21:31:33 ryzen kernel: PM: Device 0000:0a:00.0 failed to resume async: error -22
> Jul 02 21:31:33 ryzen kernel: OOM killer enabled.
> Jul 02 21:31:33 ryzen kernel: Restarting tasks ... done.
> Jul 02 21:31:33 ryzen kernel: PM: suspend exit
> Jul 02 21:31:33 ryzen kernel: BUG: unable to handle kernel paging request at 0000000000001000
> Jul 02 21:31:33 ryzen kernel: PGD 0 P4D 0
> Jul 02 21:31:33 ryzen kernel: Oops: 0002 [#1] SMP
> Jul 02 21:31:33 ryzen kernel: CPU: 14 PID: 791 Comm: amdgpu_cs:0 Tainted: G        W  O      4.18.0-rc1-amd+ #45
> Jul 02 21:31:33 ryzen kernel: Hardware name: System manufacturer System Product Name/PRIME X370-PRO, BIOS 4011 04/19/2018
> Jul 02 21:31:33 ryzen kernel: RIP: 0010:gmc_v8_0_set_pte_pde+0x1b/0x30 [amdgpu]
> Jul 02 21:31:33 ryzen kernel: Code: 80 d8 00 00 00 e9 25 78 60 e1 0f 1f 44 00 00 0f 1f 44 00 00 48 b8 00 f0 ff ff ff 00 00 0>
> Jul 02 21:31:33 ryzen kernel: RSP: 0018:ffffc90003e73898 EFLAGS: 00010202
> Jul 02 21:31:33 ryzen kernel: RAX: 000000fffffff000 RBX: 0000000000000001 RCX: 000000000fe004f1
> Jul 02 21:31:33 ryzen kernel: RDX: 0000000000001000 RSI: 0000000000001000 RDI: ffff8807e2f70000
> Jul 02 21:31:33 ryzen kernel: RBP: 0000000000001000 R08: 00000000000004f1 R09: 0000000000001000
> Jul 02 21:31:33 ryzen kernel: R10: ffffffffa03ac7e0 R11: ffff8807daf78000 R12: 0000000000001000
> Jul 02 21:31:33 ryzen kernel: R13: 0000000000000200 R14: ffffc90003e73a18 R15: 000000000fe01000
> Jul 02 21:31:33 ryzen kernel: FS:  00007f8b57266700(0000) GS:ffff88081ef80000(0000) knlGS:0000000000000000
> Jul 02 21:31:33 ryzen kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jul 02 21:31:33 ryzen kernel: CR2: 0000000000001000 CR3: 00000007dbbda000 CR4: 00000000003406e0
> Jul 02 21:31:33 ryzen kernel: Call Trace:
> Jul 02 21:31:33 ryzen kernel:  amdgpu_vm_cpu_set_ptes+0x76/0xe0 [amdgpu]
> Jul 02 21:31:33 ryzen kernel:  amdgpu_vm_update_ptes+0x1d3/0x2e0 [amdgpu]
> Jul 02 21:31:33 ryzen kernel:  amdgpu_vm_frag_ptes+0xae/0x130 [amdgpu]
> Jul 02 21:31:33 ryzen kernel:  amdgpu_vm_bo_update_mapping+0xed/0x410 [amdgpu]
> Jul 02 21:31:33 ryzen kernel:  ? amdgpu_vm_do_copy_ptes+0xa0/0xa0 [amdgpu]
> Jul 02 21:31:33 ryzen kernel:  amdgpu_vm_bo_update+0x310/0x680 [amdgpu]
> Jul 02 21:31:33 ryzen kernel:  amdgpu_cs_ioctl+0x1092/0x1a50 [amdgpu]
> Jul 02 21:31:33 ryzen kernel:  ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu]
> Jul 02 21:31:33 ryzen kernel:  drm_ioctl_kernel+0xa7/0xf0 [drm]
> Jul 02 21:31:33 ryzen kernel:  drm_ioctl+0x2f1/0x3c0 [drm]
> Jul 02 21:31:33 ryzen kernel:  ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu]
> Jul 02 21:31:33 ryzen kernel:  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
> Jul 02 21:31:33 ryzen kernel:  do_vfs_ioctl+0xa4/0x620
> Jul 02 21:31:33 ryzen kernel:  ? __se_sys_futex+0x138/0x180
> Jul 02 21:31:33 ryzen kernel:  ksys_ioctl+0x60/0x90
> Jul 02 21:31:33 ryzen kernel:  __x64_sys_ioctl+0x16/0x20
> Jul 02 21:31:33 ryzen kernel:  do_syscall_64+0x48/0xf0
> Jul 02 21:31:33 ryzen kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> Jul 02 21:31:33 ryzen kernel: RIP: 0033:0x7f8b66c92667
> Jul 02 21:31:33 ryzen kernel: Code: 00 00 90 48 8b 05 e9 67 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 8>
> Jul 02 21:31:33 ryzen kernel: RSP: 002b:00007f8b57265a98 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> Jul 02 21:31:33 ryzen kernel: RAX: ffffffffffffffda RBX: 00007f8b57265b88 RCX: 00007f8b66c92667
> Jul 02 21:31:33 ryzen kernel: RDX: 00007f8b57265b00 RSI: 00000000c0186444 RDI: 000000000000000b
> Jul 02 21:31:33 ryzen kernel: RBP: 00007f8b57265b00 R08: 00007f8b57265bb0 R09: 0000000000000010
> Jul 02 21:31:33 ryzen kernel: R10: 00007f8b57265bb0 R11: 0000000000000246 R12: 00000000c0186444
> Jul 02 21:31:33 ryzen kernel: R13: 000000000000000b R14: 0000000000000002 R15: 0000000000000000
> Jul 02 21:31:33 ryzen kernel: Modules linked in: it87(O) joydev mousedev hid_generic hidp hid ipt_REJECT nf_reject_ipv4 nf_l>
> Jul 02 21:31:33 ryzen kernel:  serio_raw crc32_pclmul atkbd ghash_clmulni_intel libps2 pcbc ahci libahci xhci_pci libata aes>
> Jul 02 21:31:33 ryzen kernel: CR2: 0000000000001000
> Jul 02 21:31:33 ryzen kernel: ---[ end trace 517a8a72887251f0 ]---
> Jul 02 21:31:33 ryzen kernel: RIP: 0010:gmc_v8_0_set_pte_pde+0x1b/0x30 [amdgpu]
> Jul 02 21:31:33 ryzen kernel: Code: 80 d8 00 00 00 e9 25 78 60 e1 0f 1f 44 00 00 0f 1f 44 00 00 48 b8 00 f0 ff ff ff 00 00 0>
> Jul 02 21:31:33 ryzen kernel: RSP: 0018:ffffc90003e73898 EFLAGS: 00010202
> Jul 02 21:31:33 ryzen kernel: RAX: 000000fffffff000 RBX: 0000000000000001 RCX: 000000000fe004f1
> Jul 02 21:31:33 ryzen kernel: RDX: 0000000000001000 RSI: 0000000000001000 RDI: ffff8807e2f70000
> Jul 02 21:31:33 ryzen kernel: RBP: 0000000000001000 R08: 00000000000004f1 R09: 0000000000001000
> Jul 02 21:31:33 ryzen kernel: R10: ffffffffa03ac7e0 R11: ffff8807daf78000 R12: 0000000000001000
> Jul 02 21:31:33 ryzen kernel: R13: 0000000000000200 R14: ffffc90003e73a18 R15: 000000000fe01000
> Jul 02 21:31:33 ryzen kernel: FS:  00007f8b57266700(0000) GS:ffff88081ef80000(0000) knlGS:0000000000000000
> Jul 02 21:31:33 ryzen kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jul 02 21:31:33 ryzen kernel: CR2: 0000000000001000 CR3: 00000007dbbda000 CR4: 00000000003406e0
>
> (At this point, the machine is just dead, and reacts upon nothing.)
>
> So something is still wrong at amdgpu_vm_cpu_set_ptes+0x76


My guess is that on resume from S3 the root PD needs to be mapped into the CPU
address space again. Maybe changing the patch according to Christian's advice
will be enough; I will take a look tomorrow. Alternatively, the problem may be
related to the resume failure you are experiencing. What ASIC are you using?
I also tested with a gfx8 ASIC and haven't observed any issues with resume.
Did you update the firmware for this ASIC to the latest version?
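The guess above can be illustrated with a small user-space C model. This is a hypothetical sketch, not amdgpu code: `struct pd_model` and the `pd_model_*` helpers are invented for illustration, and a plain pointer assignment stands in for the driver's real mapping call. If the cached CPU mapping of the page directory were NULL after resume, base plus a byte offset of 0x1000 would give exactly 0x1000, which is the faulting address in CR2 (and the value in RSI) in the oops above; checking the mapping before each CPU write avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of a page directory whose entries are
 * written by the CPU through a cached kernel mapping. These are NOT the
 * real amdgpu structures or functions. */
struct pd_model {
    uint64_t *backing; /* stands in for the buffer object's memory */
    uint64_t *kptr;    /* cached CPU mapping; NULL when unmapped */
    size_t entries;    /* number of 64-bit entries in the directory */
};

/* Assumed effect of S3 in this model: the CPU mapping is dropped. */
static void pd_model_suspend(struct pd_model *pd)
{
    pd->kptr = NULL;
}

/* Guard in the spirit of "verify the root PD is mapped": re-establish the
 * mapping before any CPU address is derived from it. Assigning `backing`
 * stands in for a real kernel mapping call (assumption). */
static bool pd_model_ensure_mapped(struct pd_model *pd)
{
    if (pd->kptr == NULL)
        pd->kptr = pd->backing;
    return pd->kptr != NULL;
}

/* Address an unguarded write would target: mapping base plus byte offset.
 * Only computed as an integer here, never dereferenced. */
static uintptr_t pd_model_target_addr(const struct pd_model *pd, size_t offset)
{
    return (uintptr_t)pd->kptr + offset;
}

/* Guarded CPU update of one 64-bit entry at a byte offset. */
static bool pd_model_set_entry(struct pd_model *pd, size_t offset, uint64_t value)
{
    assert(offset % sizeof(uint64_t) == 0); /* entries are 8-byte aligned */
    if (offset / sizeof(uint64_t) >= pd->entries)
        return false; /* offset outside the directory */
    if (!pd_model_ensure_mapped(pd))
        return false; /* no mapping could be established */
    pd->kptr[offset / sizeof(uint64_t)] = value;
    return true;
}
```

Checking on every write keeps the sketch short; remapping the root PD once during resume, as suggested above, would avoid the per-write check.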


You are receiving this mail because:
  • You are the assignee for the bug.