* amd-staging-drm-next: Oops - BUG: unable to handle kernel NULL pointer dereference, bisected.
From: Przemek Socha @ 2019-01-30 11:07 UTC
To: Christian König
Cc: Chunming Zhou, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Good morning,

after the last pull from the amd-staging-drm-next tree (29th of January) I get random Oopses on an A6 6310 APU with r4 Mullins.

Here is the Oops part of the log taken from pstore:

<1>[ 55.166270] BUG: unable to handle kernel NULL pointer dereference at 0000000000000208
<1>[ 55.166281] #PF error: [normal kernel read fault]
<6>[ 55.166285] PGD 0 P4D 0
<4>[ 55.166293] Oops: 0000 [#1] PREEMPT SMP
<4>[ 55.166301] CPU: 3 PID: 11006 Comm: kwin_x11:cs0 Not tainted 5.0.0-rc1+ #44
<4>[ 55.166305] Hardware name: LENOVO 80E3/Lancer 5B2, BIOS A2CN45WW(V2.13) 08/04/2016
<4>[ 55.166320] RIP: 0010:ttm_bo_bulk_move_lru_tail+0xd3/0x188 [ttm]
<4>[ 55.166326] Code: 00 4c 8b 0a 48 8b 81 a8 00 00 00 48 81 c1 a8 00 00 00 49 89 02 4c 8b 92 b0 00 00 00 4c 89 50 08 44 89 c0 48 c1 e0 04 4c 01 c8 <4c> 8b 90 08 02 00 00 4d 89 1a 4c 8b 90 08 02 00 00 4c 89 92 b0 00
<4>[ 55.166330] RSP: 0018:ffffa8bdc0f33b18 EFLAGS: 00010246
<4>[ 55.166335] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff9cfa935778f8
<4>[ 55.166339] RDX: ffff9cfa950c5050 RSI: 0000000000000070 RDI: ffff9cfa93575dd0
<4>[ 55.166342] RBP: ffff9cfa5d44d800 R08: 0000000000000000 R09: 0000000000000000
<4>[ 55.166346] R10: ffff9cfa8f7730f8 R11: ffff9cfa950c50f8 R12: ffff9cfa93575dd0
<4>[ 55.166350] R13: ffff9cfa93575800 R14: 0000000000000001 R15: ffffffffc03adc10
<4>[ 55.166355] FS: 00007fb327fff700(0000) GS:ffff9cfa97b80000(0000) knlGS:0000000000000000
<4>[ 55.166359] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 55.166363] CR2: 0000000000000208 CR3: 00000002150f0000 CR4: 00000000000406e0
<4>[ 55.166366] Call Trace:
<4>[ 55.166477] amdgpu_vm_move_to_lru_tail+0xe4/0x100 [amdgpu]
<4>[ 55.166563] amdgpu_cs_ioctl+0x14e7/0x1b08 [amdgpu]
<4>[ 55.166586] ? __switch_to_asm+0x40/0x70
<4>[ 55.166689] ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu]
<4>[ 55.166698] drm_ioctl_kernel+0xa4/0xe8
<4>[ 55.166707] drm_ioctl+0x1db/0x358
<4>[ 55.166805] ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu]
<4>[ 55.166901] amdgpu_drm_ioctl+0x44/0x78 [amdgpu]
<4>[ 55.166931] do_vfs_ioctl+0x9f/0x618
<4>[ 55.166940] ksys_ioctl+0x5b/0x88
<4>[ 55.166947] __x64_sys_ioctl+0x11/0x18
<4>[ 55.166955] do_syscall_64+0x50/0x168
<4>[ 55.166963] entry_SYSCALL_64_after_hwframe+0x44/0xa9
<4>[ 55.166969] RIP: 0033:0x7fb34b035fa7
<4>[ 55.166974] Code: 00 00 00 75 0c 48 c7 c0 ff ff ff ff 48 83 c4 18 c3 e8 8d dc 01 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a9 ae 0c 00 f7 d8 64 89 01 48
<4>[ 55.166978] RSP: 002b:00007fb327ffea88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
<4>[ 55.166984] RAX: ffffffffffffffda RBX: 00007fb327ffec58 RCX: 00007fb34b035fa7
<4>[ 55.166987] RDX: 00007fb327ffeb10 RSI: 00000000c0186444 RDI: 0000000000000010
<4>[ 55.166991] RBP: 00007fb327ffeb10 R08: 00007fb327ffec80 R09: 00007fb327ffec58
<4>[ 55.166995] R10: 00007fb327ffeca0 R11: 0000000000000246 R12: 00000000c0186444
<4>[ 55.166998] R13: 0000000000000010 R14: 000055ecd2705dc0 R15: 0000000000000003
<4>[ 55.167004] Modules linked in: rfcomm nf_tables ebtable_nat ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables overlay squashfs loop bnep ipv6 rtsx_usb_ms memstick rtsx_usb_sdmmc rtsx_usb uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev media ath3k btusb btintel bluetooth ecdh_generic ath9k ath9k_common kvm_amd ath9k_hw sdhci_pci kvm cqhci irqbypass mac80211 sdhci crc32_pclmul ghash_clmulni_intel ath serio_raw mmc_core cfg80211 amdgpu mfd_core chash gpu_sched xhci_pci ttm xhci_hcd ehci_pci ehci_hcd sp5100_tco
<4>[ 55.167063] CR2: 0000000000000208
<4>[ 55.167069] ---[ end trace bf1c4be089002236 ]---

I bisected it, and the bad commit seems to be "drm/amdgpu: cleanup setting bulk_movable". I hope this is relevant.

Full git bisect log:

git bisect start
# good: [10117450735c7a7c0858095fb46a860e7037cb9a] drm/amd/display: add -msse2 to prevent Clang from emitting libcalls to undefined SW FP routines
git bisect good 10117450735c7a7c0858095fb46a860e7037cb9a
# bad: [b9c6252b7f980e7e03c0bf659a251798b36a8094] Revert "drm/amd/display: add -msse2 to prevent Clang from emitting libcalls to undefined SW FP routines"
git bisect bad b9c6252b7f980e7e03c0bf659a251798b36a8094
# good: [1de29da5b7281c9a8427d84948bf3d77bc4b8d16] drm: disable uncached DMA optimization for ARM and arm64
git bisect good 1de29da5b7281c9a8427d84948bf3d77bc4b8d16
# good: [bbf48cae572b39c4df6023b01d6f8de66ef41b34] Revert "test patch for hpd dpms check"
git bisect good bbf48cae572b39c4df6023b01d6f8de66ef41b34
# good: [257b75d373c77d6792d0011f7379398ba60799ec] drm/amdgpu: Show XGMI node and hive message per device only once
git bisect good 257b75d373c77d6792d0011f7379398ba60799ec
# good: [4d771657c533d8fe3b574c561084f66aebc77bb6] drm/amdgpu: cleanup amdgpu_pte_update_params
git bisect good 4d771657c533d8fe3b574c561084f66aebc77bb6
# bad: [4ef27005fefd4be102010b7d8552fec1ee13435a] drm/amdgpu: cleanup setting bulk_movable
git bisect bad 4ef27005fefd4be102010b7d8552fec1ee13435a
# first bad commit: [4ef27005fefd4be102010b7d8552fec1ee13435a] drm/amdgpu: cleanup setting bulk_movable

4ef27005fefd4be102010b7d8552fec1ee13435a is the first bad commit
commit 4ef27005fefd4be102010b7d8552fec1ee13435a
Author: Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>
Date:   Mon Jan 28 13:41:58 2019 +0100

    drm/amdgpu: cleanup setting bulk_movable

    We only need to set this to false now when BOs are removed from the LRU.

    Signed-off-by: Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>
    Reviewed-by: Chunming Zhou <david1.zhou-5C7GfCeVMHo@public.gmane.org>

If any other info is needed, please do not hesitate to ask.

Thanks,
Przemek.

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
* Re: amd-staging-drm-next: Oops - BUG: unable to handle kernel NULL pointer dereference, bisected.
From: Christian König @ 2019-01-30 12:02 UTC
To: soprwa-Re5JQEeQqe8AvxtiuMwx3w, Christian König
Cc: Chunming Zhou, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

This is a known issue, see here as well: https://bugs.freedesktop.org/show_bug.cgi?id=109487

Christian.

On 30.01.19 at 12:07, Przemek Socha wrote:
> Good morning,
> [...]
* Re: amd-staging-drm-next: Oops - BUG: unable to handle kernel NULL pointer dereference, bisected.
From: Koenig, Christian @ 2019-01-30 12:06 UTC
To: soprwa-Re5JQEeQqe8AvxtiuMwx3w
Cc: Zhou, David(ChunMing), amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Sorry, I accidentally replied to the wrong mail. This is a new issue.

Going to take a look now.

Christian.

On 30.01.19 at 13:02, Christian König wrote:
> This is a known issue, see here as well
> https://bugs.freedesktop.org/show_bug.cgi?id=109487
>
> Christian.
>
> On 30.01.19 at 12:07, Przemek Socha wrote:
>> Good morning,
>> [...]
* Re: amd-staging-drm-next: Oops - BUG: unable to handle kernel NULL pointer dereference, bisected.
From: Koenig, Christian @ 2019-01-30 12:42 UTC
To: soprwa-Re5JQEeQqe8AvxtiuMwx3w
Cc: Zhou, David(ChunMing), amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Does the attached patch fix the issue?

Christian.

On 30.01.19 at 13:06, Christian König wrote:
> Sorry, I accidentally replied to the wrong mail. This is a new issue.
>
> Going to take a look now.
>
> Christian.
>
> On 30.01.19 at 13:02, Christian König wrote:
>> This is a known issue, see here as well
>> https://bugs.freedesktop.org/show_bug.cgi?id=109487
>>
>> Christian.
>>
>> On 30.01.19 at 12:07, Przemek Socha wrote:
>>> Good morning,
>>> [...]

[-- Attachment #2: 0001-drm-amdgpu-partial-revert-cleanup-setting-bulk_movab.patch --]

From 3a7a65eb1952439a90f244a07f6d9bb338c2e4b1 Mon Sep 17 00:00:00 2001
From: Christian König <christian.koenig@amd.com>
Date: Wed, 30 Jan 2019 13:41:05 +0100
Subject: [PATCH] drm/amdgpu: partial revert cleanup setting bulk_movable
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

We still need to set bulk_movable to false when new BOs are added.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 79f9dde70bc0..1e101a77eec9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -332,6 +332,7 @@ static void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
 	if (bo->tbo.resv != vm->root.base.bo->tbo.resv)
 		return;
 
+	vm->bulk_moveable = false;
 	if (bo->tbo.type == ttm_bo_type_kernel)
 		amdgpu_vm_bo_relocated(base);
 	else
-- 
2.17.1
* Re: amd-staging-drm-next: Oops - BUG: unable to handle kernel NULL pointer dereference, bisected. [not found] ` <7a65412b-1b41-9e5b-f700-0a944a33cf49-5C7GfCeVMHo@public.gmane.org> @ 2019-01-30 14:17 ` Przemek Socha 2019-01-30 14:37 ` StDenis, Tom 2019-01-31 9:23 ` Przemek Socha 2 siblings, 0 replies; 9+ messages in thread From: Przemek Socha @ 2019-01-30 14:17 UTC (permalink / raw) To: Koenig, Christian Cc: Zhou, David(ChunMing), amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW [-- Attachment #1.1: Type: text/plain, Size: 4032 bytes --] Dnia środa, 30 stycznia 2019 13:42:33 CET piszesz: > Does the attached patch fix the issue? > > Christian. > > ..... Thanks for the rapid response, but unfortunately no. System freezes and only mouse pointer is movable (cannot switch tty's, reboot by pwr button, tree-finger-salute doesn't work also). Here is a trace log after applying the patch. I'm attaching it because it looks different: <4>[ 46.864336] ------------[ cut here ]------------ <2>[ 46.864343] kernel BUG at drivers/gpu/drm/ttm/ttm_bo.c:196! 
<4>[ 46.864361] invalid opcode: 0000 [#1] PREEMPT SMP
<4>[ 46.864369] CPU: 3 PID: 10966 Comm: plasmashel:cs0 Not tainted 5.0.0-rc1+ #44
<4>[ 46.864373] Hardware name: LENOVO 80E3/Lancer 5B2, BIOS A2CN45WW(V2.13) 08/04/2016
<4>[ 46.864388] RIP: 0010:ttm_bo_ref_bug+0x0/0x8 [ttm]
<4>[ 46.864393] Code: 00 00 08 00 75 0c 48 83 c7 0c 4c 39 cf 75 ab 31 c0 c3 b8 01 00 00 00 c3 66 90 f0 ff 8f a4 00 00 00 c3 0f 1f 84 00 00 00 00 00 <0f> 0b 66 0f 1f 44 00 00 53 48 8b 07 48 89 fb 48 8b 40 18 48 8b 40
<4>[ 46.864397] RSP: 0018:ffffa86fc1263af8 EFLAGS: 00010247
<4>[ 46.864403] RAX: ffff8c7b133a787c RBX: ffffa86fc1263c48 RCX: ffff8c7b0f7698f8
<4>[ 46.864406] RDX: ffff8c7b133a78f8 RSI: ffff8c7b11aa2800 RDI: ffff8c7b133a787c
<4>[ 46.864410] RBP: ffff8c7ac16d1b38 R08: ffff8c7b1348d0f8 R09: ffffa86fc12639b0
<4>[ 46.864414] R10: ffffcfd6c84d07c0 R11: 0000000000000003 R12: ffffffffc0364c10
<4>[ 46.864417] R13: ffffa86fc1263be0 R14: 0000000000000000 R15: ffffa86fc1263c48
<4>[ 46.864422] FS: 00007f3e34019700(0000) GS:ffff8c7b17b80000(0000) knlGS:0000000000000000
<4>[ 46.864426] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 46.864430] CR2: 00007fb1cb765000 CR3: 000000021333e000 CR4: 00000000000406e0
<4>[ 46.864433] Call Trace:
<4>[ 46.864446] ttm_bo_del_from_lru+0xab/0xc8 [ttm]
<4>[ 46.864456] ttm_eu_reserve_buffers+0x140/0x2c8 [ttm]
<4>[ 46.864557] amdgpu_cs_ioctl+0x4ee/0x1b08 [amdgpu]
<4>[ 46.864575] ? __switch_to_asm+0x40/0x70
<4>[ 46.864668] ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu]
<4>[ 46.864678] drm_ioctl_kernel+0xa4/0xe8
<4>[ 46.864686] drm_ioctl+0x1db/0x358
<4>[ 46.864767] ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu]
<4>[ 46.864848] amdgpu_drm_ioctl+0x44/0x78 [amdgpu]
<4>[ 46.864859] do_vfs_ioctl+0x9f/0x618
<4>[ 46.864867] ksys_ioctl+0x5b/0x88
<4>[ 46.864874] __x64_sys_ioctl+0x11/0x18
<4>[ 46.864881] do_syscall_64+0x50/0x168
<4>[ 46.864888] entry_SYSCALL_64_after_hwframe+0x44/0xa9
<4>[ 46.864895] RIP: 0033:0x7f3e4a939fa7
<4>[ 46.864900] Code: 00 00 00 75 0c 48 c7 c0 ff ff ff ff 48 83 c4 18 c3 e8 8d dc 01 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a9 ae 0c 00 f7 d8 64 89 01 48
<4>[ 46.864904] RSP: 002b:00007f3e34018ab8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
<4>[ 46.864909] RAX: ffffffffffffffda RBX: 00007f3e34018c58 RCX: 00007f3e4a939fa7
<4>[ 46.864913] RDX: 00007f3e34018b40 RSI: 00000000c0186444 RDI: 0000000000000010
<4>[ 46.864916] RBP: 00007f3e34018b40 R08: 00007f3e34018c80 R09: 00007f3e34018c58
<4>[ 46.864920] R10: 00007f3e34018ca0 R11: 0000000000000246 R12: 00000000c0186444
<4>[ 46.864923] R13: 0000000000000010 R14: 000055555e550d70 R15: 0000000000000003
<4>[ 46.864929] Modules linked in: rfcomm nf_tables ebtable_nat ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables overlay squashfs loop bnep ipv6 rtsx_usb_ms memstick rtsx_usb_sdmmc rtsx_usb ath3k btusb btintel bluetooth ecdh_generic uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev media kvm_amd ath9k kvm ath9k_common irqbypass ath9k_hw crc32_pclmul mac80211 sdhci_pci cqhci sdhci ghash_clmulni_intel serio_raw mmc_core ath cfg80211 amdgpu mfd_core chash gpu_sched xhci_pci ttm ehci_pci xhci_hcd ehci_hcd sp5100_tco
<4>[ 46.864981] ---[ end trace 7bdf1a5927cdc874 ]---

Thanks,
Przemek.

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
* Re: amd-staging-drm-next: Oops - BUG: unable to handle kernel NULL pointer dereference, bisected.
From: StDenis, Tom @ 2019-01-30 14:37 UTC
To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 2019-01-30 7:42 a.m., Koenig, Christian wrote:
> Does the attached patch fix the issue?

No. Now I get a lockup when I start GNOME and try to bring up a terminal.

The patch also didn't apply cleanly on top of drm-next, but I was able to add the line manually.

[ 88.018735] general protection fault: 0000 [#1] SMP NOPTI
[ 88.018741] CPU: 5 PID: 4164 Comm: gnome-shel:cs0 Tainted: G W 5.0.0-rc1+ #20
[ 88.018743] Hardware name: System manufacturer System Product Name/TUF B350M-PLUS GAMING, BIOS 4011 04/19/2018
[ 88.018750] RIP: 0010:ttm_bo_bulk_move_lru_tail+0x36/0x190 [ttm]
[ 88.018753] Code: 90 48 85 d2 74 66 48 8b 4c 37 98 4c 8b 92 b0 00 00 00 4c 8d 9a a8 00 00 00 4c 8b 0a 48 8b 81 a8 00 00 00 48 81 c1 a8 00 00 00 <49> 89 02 4c 8b 92 b0 00 00 00 4c 89 50 08 44 89 c0 48 c1 e0 04 4c
[ 88.018755] RSP: 0018:ffffb419c1fefb18 EFLAGS: 00010296
[ 88.018757] RAX: ffff9692d9a013a0 RBX: 0000000000000000 RCX: ffff9693032f2f90
[ 88.018759] RDX: ffff9692e099cad8 RSI: 0000000000000070 RDI: ffff9693058a7598
[ 88.018761] RBP: ffff9692ed34f4e8 R08: 0000000000000000 R09: 6b6b6b6b6b6b6b6b
[ 88.018762] R10: 6b6b6b6b6b6b6b6b R11: ffff9692e099cb80 R12: ffff9693058a7598
[ 88.018763] R13: ffff9693058a6fc8 R14: 0000000000000001 R15: ffffffffc033dbc0
[ 88.018765] FS: 00007fc351843700(0000) GS:ffff969337b40000(0000) knlGS:0000000000000000
[ 88.018767] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 88.018769] CR2: 00007fa78adb08a0 CR3: 0000000206d56000 CR4: 00000000003406e0
[ 88.018770] Call Trace:
[ 88.018807] amdgpu_vm_move_to_lru_tail+0xe1/0x100 [amdgpu]
[ 88.018842] amdgpu_cs_ioctl+0x14de/0x1ad0 [amdgpu]
[ 88.018846] ? __switch_to_asm+0x34/0x70
[ 88.018881] ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu]
[ 88.018884] drm_ioctl_kernel+0xa4/0xf0
[ 88.018887] drm_ioctl+0x1db/0x370
[ 88.018921] ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu]
[ 88.018970] amdgpu_drm_ioctl+0x44/0x80 [amdgpu]
[ 88.018975] do_vfs_ioctl+0x9f/0x610
[ 88.018980] ? __x64_sys_futex+0x137/0x180
[ 88.018983] ksys_ioctl+0x5b/0x90
[ 88.018986] __x64_sys_ioctl+0x11/0x20
[ 88.018989] do_syscall_64+0x43/0xf0
[ 88.018992] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 88.018995] RIP: 0033:0x7fc37a1b1c97
[ 88.018997] Code: 00 00 90 48 8b 05 09 82 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d9 81 2c 00 f7 d8 64 89 01 48
[ 88.018999] RSP: 002b:00007fc3518425d8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[ 88.019002] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc37a1b1c97
[ 88.019004] RDX: 00007fc3518426b0 RSI: 00000000c0186444 RDI: 000000000000000c
[ 88.019005] RBP: 00007fc351842610 R08: 00007fc351842720 R09: 00007fc351842798
[ 88.019007] R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffdf8dc068e
[ 88.019009] R13: 00007ffdf8dc068f R14: 00007ffdf8dc0690 R15: 0000000000000000
[ 88.019011] Modules linked in: fuse amdgpu mfd_core chash gpu_sched ttm ax88179_178a usbnet
[ 88.019019] ---[ end trace dc532cd45c6dc064 ]---

Tom

>
> Christian.
>
> On 30.01.19 at 13:06, Christian König wrote:
>> Sorry, I accidentally replied to the wrong mail.
>>
>> This is a new issue. Going to take a look now.
>>
>> Christian.
>>
>> On 30.01.19 at 13:02, Christian König wrote:
>>> This is a known issue, see here as well
>>> https://bugs.freedesktop.org/show_bug.cgi?id=109487
>>>
>>> Christian.
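The fault addresses in the two TTM traces can be read fairly directly. In the original report RAX is 0 and CR2 is 0x208, which is what a load from a member at offset 0x208 of a NULL base pointer produces. In the trace above, R09 and R10 both hold 0x6b6b6b6b6b6b6b6b, and the faulting instruction stores through R10. 0x6b is the kernel's POISON_FREE byte for freed slab memory, so this looks like a pointer read out of an already freed object (assuming slab poisoning was enabled in that build, which the thread does not state). The sketch below only illustrates that manual triage; `classify_fault`, the enum, and the `TOY_*` constants are hypothetical names, not a kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper for reading an Oops by hand; not a kernel API. */
enum fault_hint {
    FAULT_NEAR_NULL,    /* small offset from a NULL base pointer */
    FAULT_SLAB_POISON,  /* every byte is 0x6b: likely use-after-free */
    FAULT_UNKNOWN
};

#define TOY_PAGE_SIZE   4096u  /* x86-64 base page size */
#define TOY_POISON_FREE 0x6bu  /* same value as POISON_FREE in include/linux/poison.h */

static enum fault_hint classify_fault(uint64_t addr)
{
    uint64_t poison = 0;
    int i;

    /* Build the 64-bit pattern 0x6b6b6b6b6b6b6b6b byte by byte. */
    for (i = 0; i < 8; i++)
        poison = (poison << 8) | TOY_POISON_FREE;

    if (addr < TOY_PAGE_SIZE)
        return FAULT_NEAR_NULL;
    if (addr == poison)
        return FAULT_SLAB_POISON;
    return FAULT_UNKNOWN;
}
```

Feeding it the CR2 value from the first report (0x208) gives `FAULT_NEAR_NULL`, and the R10 value from the second trace gives `FAULT_SLAB_POISON`. Both hints are consistent with list pointers that went stale, but neither proves the root cause on its own.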
* Re: amd-staging-drm-next: Oops - BUG: unable to handle kernel NULL pointer dereference, bisected.
From: Przemek Socha @ 2019-01-31 9:23 UTC
To: Koenig, Christian, Tom St Denis
Cc: Zhou, David(ChunMing), amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Wednesday, 30 January 2019 13:42:33 CET you wrote:
> Does the attached patch fix the issue?
>
> Christian.

I have also tested this one, "drm/amdgpu: partial revert cleanup setting bulk_movable v2":

>We still need to set bulk_movable to false when new BOs are added or removed.
>
>v2: also set it to false on removal
>
>Signed-off-by: Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>index 79f9dde70bc0..822546a149fa 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>@@ -332,6 +332,7 @@ static void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
> 	if (bo->tbo.resv != vm->root.base.bo->tbo.resv)
> 		return;
>
>+	vm->bulk_moveable = false;
> 	if (bo->tbo.type == ttm_bo_type_kernel)
> 		amdgpu_vm_bo_relocated(base);
> 	else
>@@ -2772,6 +2773,9 @@ void amdgpu_vm_bo_rmv(struct amdgpu_device *adev,
> 	struct amdgpu_vm_bo_base **base;
>
> 	if (bo) {
>+		if (bo->tbo.resv == vm->root.base.bo->tbo.resv)
>+			vm->bulk_moveable = false;
>+
> 		for (base = &bo_va->base.bo->vm_bo; *base;
> 		     base = &(*base)->next) {
> 			if (*base != &bo_va->base)

So far I have seen no lockup and no Oops with it, so I think this one is OK.

Thank you very much,
Przemek.
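The patch quoted in the message above clears `vm->bulk_moveable` when BOs are added or removed. As far as the patch description suggests, the flag says whether a cached range of the VM's BOs on the LRU may still be reused for a fast bulk move, and that range stops being trustworthy once the set of BOs changes. The following is a toy model of that idea in plain C, not the amdgpu or TTM code: every `toy_*` name is hypothetical, and the real structures and locking are considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; NOT the kernel's structures. */
struct toy_bo {
    struct toy_bo *prev, *next;
    bool in_vm;                  /* true if the BO belongs to the VM */
};

struct toy_vm {
    struct toy_bo lru;           /* sentinel of a circular LRU list */
    struct toy_bo *first, *last; /* cached run of VM BOs on the LRU */
    bool bulk_moveable;          /* true only while first/last can be trusted */
};

static void toy_vm_init(struct toy_vm *vm)
{
    vm->lru.prev = vm->lru.next = &vm->lru;
    vm->lru.in_vm = false;
    vm->first = vm->last = NULL;
    vm->bulk_moveable = false;
}

static void toy_unlink(struct toy_bo *bo)
{
    bo->prev->next = bo->next;
    bo->next->prev = bo->prev;
    bo->prev = bo->next = NULL;
}

static void toy_link_tail(struct toy_vm *vm, struct toy_bo *bo)
{
    bo->prev = vm->lru.prev;
    bo->next = &vm->lru;
    vm->lru.prev->next = bo;
    vm->lru.prev = bo;
}

/* Adding a VM BO: the cached run no longer covers every VM BO. */
static void toy_bo_add(struct toy_vm *vm, struct toy_bo *bo, bool in_vm)
{
    bo->in_vm = in_vm;
    toy_link_tail(vm, bo);
    if (in_vm)
        vm->bulk_moveable = false;
}

/* Removing a VM BO: first/last may now point at an unlinked node. */
static void toy_bo_rmv(struct toy_vm *vm, struct toy_bo *bo)
{
    toy_unlink(bo);
    if (bo->in_vm)
        vm->bulk_moveable = false;
}

static void toy_vm_move_to_lru_tail(struct toy_vm *vm)
{
    struct toy_bo *bo, *next;
    size_t n = 0, i;

    if (vm->bulk_moveable) {
        /* Fast path: splice the cached run first..last to the tail. */
        assert(vm->first && vm->last);
        vm->first->prev->next = vm->last->next;
        vm->last->next->prev = vm->first->prev;
        vm->first->prev = vm->lru.prev;
        vm->lru.prev->next = vm->first;
        vm->last->next = &vm->lru;
        vm->lru.prev = vm->last;
        return;
    }

    /* Slow path: move VM BOs one by one and rebuild the cached run. */
    vm->first = vm->last = NULL;
    for (bo = vm->lru.next; bo != &vm->lru; bo = bo->next)
        n++;
    for (bo = vm->lru.next, i = 0; i < n; bo = next, i++) {
        next = bo->next;
        if (!bo->in_vm)
            continue;
        toy_unlink(bo);
        toy_link_tail(vm, bo);
        if (!vm->first)
            vm->first = bo;
        vm->last = bo;
    }
    vm->bulk_moveable = (vm->first != NULL);
}
```

In this model, skipping the flag reset in `toy_bo_rmv` would leave `first` or `last` pointing at a node whose `prev` and `next` are NULL, and the next fast-path splice would dereference them. That is only an analogy for the kind of failure reported in this thread, not a reconstruction of the actual kernel bug.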
* Re: amd-staging-drm-next: Oops - BUG: unable to handle kernel NULL pointer dereference, bisected.
From: StDenis, Tom @ 2019-01-31 16:56 UTC
To: soprwa-Re5JQEeQqe8AvxtiuMwx3w, Koenig, Christian
Cc: Zhou, David(ChunMing), amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 2019-01-31 4:23 a.m., Przemek Socha wrote:
> On Wednesday, 30 January 2019 13:42:33 CET you wrote:
>> Does the attached patch fix the issue?
>>
>> Christian.
>
> I have also tested this one, "drm/amdgpu: partial revert cleanup setting
> bulk_movable v2":
>
>> We still need to set bulk_movable to false when new BOs are added or removed.
>>
>> v2: also set it to false on removal
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index 79f9dde70bc0..822546a149fa 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -332,6 +332,7 @@ static void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
>>  	if (bo->tbo.resv != vm->root.base.bo->tbo.resv)
>>  		return;
>>
>> +	vm->bulk_moveable = false;
>>  	if (bo->tbo.type == ttm_bo_type_kernel)
>>  		amdgpu_vm_bo_relocated(base);
>>  	else
>> @@ -2772,6 +2773,9 @@ void amdgpu_vm_bo_rmv(struct amdgpu_device *adev,
>>  	struct amdgpu_vm_bo_base **base;
>>
>>  	if (bo) {
>> +		if (bo->tbo.resv == vm->root.base.bo->tbo.resv)
>> +			vm->bulk_moveable = false;
>> +
>>  		for (base = &bo_va->base.bo->vm_bo; *base;
>>  		     base = &(*base)->next) {
>>  			if (*base != &bo_va->base)
>
> So far I have seen no lockup and no Oops with it, so I think this one is OK.

In my experience only the last chunk of the patch is necessary. Can you try this without:

>> +	vm->bulk_moveable = false;

Too?

Thanks,
Tom
* Re: amd-staging-drm-next: Oops - BUG: unable to handle kernel NULL pointer dereference, bisected.
From: Przemek Socha @ 2019-01-31 17:31 UTC
To: StDenis, Tom, Koenig, Christian
Cc: Zhou, David(ChunMing), amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Thursday, 31 January 2019 17:56:32 CET you wrote:
> In my experience only the last chunk of the patch is necessary. Can you
> try this without:
>
> >> +	vm->bulk_moveable = false;
>
> Too?
>
> Thanks,
> Tom

Sure.
I have applied only the last chunk of the patch on top of today's amd-staging-drm-next pull:

> >> @@ -2772,6 +2773,9 @@ void amdgpu_vm_bo_rmv(struct amdgpu_device *adev,
> >>  	struct amdgpu_vm_bo_base **base;
> >>
> >>  	if (bo) {
> >> +		if (bo->tbo.resv == vm->root.base.bo->tbo.resv)
> >> +			vm->bulk_moveable = false;
> >> +
> >>  		for (base = &bo_va->base.bo->vm_bo; *base;
> >>  		     base = &(*base)->next) {
> >>  			if (*base != &bo_va->base)

and it also seems to work as expected.

Thanks,
Przemek.
end of thread, other threads:[~2019-01-31 17:31 UTC | newest]

Thread overview: 9+ messages
2019-01-30 11:07 amd-staging-drm-next: Oops - BUG: unable to handle kernel NULL pointer dereference, bisected Przemek Socha
2019-01-30 12:02 ` Christian König
2019-01-30 12:06   ` Koenig, Christian
2019-01-30 12:42     ` Koenig, Christian
2019-01-30 14:17       ` Przemek Socha
2019-01-30 14:37       ` StDenis, Tom
2019-01-31  9:23       ` Przemek Socha
2019-01-31 16:56         ` StDenis, Tom
2019-01-31 17:31           ` Przemek Socha