* [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start
@ 2022-10-21  8:08 ` Mikhail Gavrilov
  0 siblings, 0 replies; 29+ messages in thread
From: Mikhail Gavrilov @ 2022-10-21  8:08 UTC (permalink / raw)
  To: Christian König, Deucher, Alexander
  Cc: amd-gfx list, Linux List Kernel Mailing, dri-devel

Hi!
I found that some games (Cyberpunk 2077, Forza Horizon 4/5) hang at
start after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6.

dd80d9c8eecac8c516da5b240d01a35660ba6cb6 is the first bad commit
commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6
Author: Christian König <christian.koenig@amd.com>
Date:   Thu Jul 14 10:23:38 2022 +0200

    drm/amdgpu: revert "partial revert "remove ctx->lock" v2"

    This reverts commit 94f4c4965e5513ba624488f4b601d6b385635aec.

    We found that the bo_list is missing a protection for its list entries.
    Since that is fixed now this workaround can be removed again.

    Signed-off-by: Christian König <christian.koenig@amd.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 21 ++++++---------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c |  2 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h |  1 -
 3 files changed, 6 insertions(+), 18 deletions(-)


When it happens, the following backtrace appears in the kernel log:
[  231.331210] ------------[ cut here ]------------
[  231.331262] WARNING: CPU: 11 PID: 6555 at
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:675
amdgpu_ttm_tt_get_user_pages+0x14c/0x190 [amdgpu]
[  231.331424] Modules linked in: uinput rfcomm snd_seq_dummy
snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast
nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink
qrtr bnep intel_rapl_msr intel_rapl_common snd_sof_amd_renoir
snd_sof_amd_acp snd_sof_pci snd_hda_codec_realtek snd_sof
snd_hda_codec_generic snd_hda_codec_hdmi snd_sof_utils mt7921e
snd_hda_intel sunrpc snd_intel_dspcfg mt7921_common binfmt_misc
snd_intel_sdw_acpi snd_hda_codec mt76_connac_lib edac_mce_amd btusb
snd_soc_core mt76 snd_hda_core btrtl snd_hwdep snd_compress kvm_amd
ac97_bus snd_seq btbcm snd_pcm_dmaengine btintel snd_rpl_pci_acp6x
mac80211 btmtk snd_pci_acp6x kvm snd_seq_device snd_pcm snd_pci_acp5x
libarc4 irqbypass bluetooth snd_rn_pci_acp3x snd_timer pcspkr
asus_nb_wmi rapl joydev wmi_bmof snd_acp_config cfg80211 snd_soc_acpi
vfat snd
[  231.331490]  snd_pci_acp3x i2c_piix4 soundcore fat k10temp amd_pmc
asus_wireless zram amdgpu drm_ttm_helper ttm hid_asus asus_wmi
iommu_v2 crct10dif_pclmul crc32_pclmul gpu_sched crc32c_intel
ledtrig_audio sparse_keymap polyval_clmulni platform_profile drm_buddy
polyval_generic hid_multitouch drm_display_helper rfkill nvme
ucsi_acpi ghash_clmulni_intel nvme_core video typec_ucsi serio_raw ccp
sha512_ssse3 sp5100_tco r8169 cec nvme_common typec wmi i2c_hid_acpi
i2c_hid ip6_tables ip_tables fuse
[  231.331532] CPU: 11 PID: 6555 Comm: GameThread Tainted: G        W
  L    -------  ---
6.1.0-0.rc1.20221019gitaae703b02f92.17.fc38.x86_64 #1
[  231.331534] Hardware name: ASUSTeK COMPUTER INC. ROG Strix
G513QY_G513QY/G513QY, BIOS G513QY.318 03/29/2022
[  231.331537] RIP: 0010:amdgpu_ttm_tt_get_user_pages+0x14c/0x190 [amdgpu]
[  231.331654] Code: a8 d0 e9 32 ff ff ff 4c 89 e9 89 ea 48 c7 c6 40
82 f3 c0 48 c7 c7 10 60 14 c1 e8 2f a0 f4 d0 eb 8e 66 90 bd f2 ff ff
ff eb 8d <0f> 0b eb f5 bd fd ff ff ff eb 82 bd f2 ff ff ff e9 62 ff ff
ff 48
[  231.331656] RSP: 0018:ffffaad4c705bae8 EFLAGS: 00010286
[  231.331659] RAX: ffff8e9cbdbe3200 RBX: ffff8e997e3f2440 RCX: 0000000000000000
[  231.331661] RDX: 0000000000000000 RSI: ffff8e9cbdbe3200 RDI: ffff8e9c31208000
[  231.331663] RBP: 0000000000000001 R08: 0000000000000dc0 R09: 00000000ffffffff
[  231.331665] R10: 0000000000000001 R11: 0000000000000000 R12: ffffaad4c705bb90
[  231.331666] R13: 0000000076510000 R14: ffff8e9c89f334e0 R15: ffff8e991fda8000
[  231.331668] FS:  000000007c2af6c0(0000) GS:ffff8ea7d8e00000(0000)
knlGS:000000007b2c0000
[  231.331671] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  231.331673] CR2: 00007ff65ffd8000 CR3: 00000004f90f0000 CR4: 0000000000750ee0
[  231.331674] PKRU: 55555554
[  231.331676] Call Trace:
[  231.331678]  <TASK>
[  231.331682]  amdgpu_cs_ioctl+0x87e/0x1fc0 [amdgpu]
[  231.331824]  ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
[  231.331981]  drm_ioctl_kernel+0xac/0x160
[  231.331990]  drm_ioctl+0x1e7/0x450
[  231.331994]  ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
[  231.332118]  amdgpu_drm_ioctl+0x4a/0x80 [amdgpu]
[  231.332233]  __x64_sys_ioctl+0x90/0xd0
[  231.332238]  do_syscall_64+0x5b/0x80
[  231.332243]  ? asm_exc_page_fault+0x22/0x30
[  231.332247]  ? lockdep_hardirqs_on+0x7d/0x100
[  231.332250]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  231.332253] RIP: 0033:0x7ff677c5704f
[  231.332256] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24
10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00
00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28
00 00
[  231.332258] RSP: 002b:000000007c2ad470 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[  231.332261] RAX: ffffffffffffffda RBX: 000000007c2ad718 RCX: 00007ff677c5704f
[  231.332263] RDX: 000000007c2ad540 RSI: 00000000c0186444 RDI: 00000000000000a7
[  231.332265] RBP: 000000007c2ad540 R08: 00007ff590048590 R09: 000000007c2ad510
[  231.332266] R10: 000000007e864ec0 R11: 0000000000000246 R12: 00000000c0186444
[  231.332268] R13: 00000000000000a7 R14: 000000007c2ad6f0 R15: 0000000000000005
[  231.332277]  </TASK>
[  231.332279] irq event stamp: 18035
[  231.332281] hardirqs last  enabled at (18043): [<ffffffff9118e8de>]
__up_console_sem+0x5e/0x70
[  231.332284] hardirqs last disabled at (18050): [<ffffffff9118e8c3>]
__up_console_sem+0x43/0x70
[  231.332287] softirqs last  enabled at (17864): [<ffffffff911012ed>]
__irq_exit_rcu+0xed/0x160
[  231.332289] softirqs last disabled at (17857): [<ffffffff911012ed>]
__irq_exit_rcu+0xed/0x160
[  231.332291] ---[ end trace 0000000000000000 ]---
[  231.332299] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to
process the buffer list -14!
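
For readers decoding the trace above: the -14 in the error message is
-EFAULT ("Bad address"), and the RSI value 0xc0186444 in the user-space
register dump is the ioctl request number. A minimal sketch of the
decoding follows, assuming the standard Linux _IOC bit layout and the DRM
numbering (DRM_COMMAND_BASE 0x40, DRM_AMDGPU_CS 0x04). The struct and
function names are illustrative only and are not kernel APIs.

```c
#include <assert.h>
#include <errno.h>

/* Assumed values, as defined in the DRM uapi headers. */
#define SKETCH_DRM_IOCTL_BASE   'd'
#define SKETCH_DRM_COMMAND_BASE 0x40
#define SKETCH_DRM_AMDGPU_CS    0x04

/* Fields of a Linux ioctl request number (generic _IOC layout):
 * bits 0-7 nr, bits 8-15 type, bits 16-29 size, bits 30-31 direction. */
struct ioctl_fields {
    unsigned nr;
    unsigned type;
    unsigned size;
    unsigned dir;
};

/* Split a 32-bit ioctl request number into its four fields. */
static struct ioctl_fields decode_ioctl(unsigned long cmd)
{
    struct ioctl_fields f;

    assert(cmd <= 0xffffffffUL); /* request numbers are 32 bits wide */
    f.nr   = (unsigned)(cmd & 0xff);
    f.type = (unsigned)((cmd >> 8) & 0xff);
    f.size = (unsigned)((cmd >> 16) & 0x3fff);
    f.dir  = (unsigned)((cmd >> 30) & 0x3);
    return f;
}

/* True if the request number is the amdgpu command-submission ioctl. */
static int is_amdgpu_cs(unsigned long cmd)
{
    struct ioctl_fields f = decode_ioctl(cmd);

    return f.type == SKETCH_DRM_IOCTL_BASE &&
           f.nr == SKETCH_DRM_COMMAND_BASE + SKETCH_DRM_AMDGPU_CS;
}

/* True if a kernel-style negative return value means -EFAULT. */
static int is_efault(int ret)
{
    return ret == -EFAULT;
}
```

Decoding 0xc0186444 this way gives type 'd', nr 0x44, size 24 and
direction 3 (read and write), which is consistent with the
amdgpu_cs_ioctl frame in the call trace.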

[  231.332423] ================================================
[  231.332424] WARNING: lock held when returning to user space!
[  231.332425] 6.1.0-0.rc1.20221019gitaae703b02f92.17.fc38.x86_64 #1
Tainted: G        W    L    -------  ---
[  231.332426] ------------------------------------------------
[  231.332427] GameThread/6555 is leaving the kernel with locks still held!
[  231.332428] 1 lock held by GameThread/6555:
[  231.332429]  #0: ffff8e9cfbac64a8
(&list->bo_list_mutex){+.+.}-{3:3}, at: amdgpu_cs_ioctl+0x786/0x1fc0
[amdgpu]
[  389.428155] amdgpu 0000:03:00.0: amdgpu: free PSP TMR buffer
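
The lockdep report above says that bo_list_mutex, taken inside
amdgpu_cs_ioctl, was still held when the ioctl returned -14 to user
space. The thread does not state the root cause at this point, so the
following is only a user-space illustration of that general pattern with
pthreads, not the amdgpu code. All names in it (list_mutex,
process_buffers, submit_leaky, submit_fixed, mutex_is_held) are
hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t list_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for a step that can fail with -EFAULT. */
static int process_buffers(int fail)
{
    return fail ? -EFAULT : 0;
}

/* Leaky shape: the error path returns with the mutex still held. */
static int submit_leaky(int fail)
{
    int r;

    pthread_mutex_lock(&list_mutex);
    r = process_buffers(fail);
    if (r)
        return r; /* BUG: no unlock on this path */
    pthread_mutex_unlock(&list_mutex);
    return 0;
}

/* Balanced shape: every exit passes through the same unlock. */
static int submit_fixed(int fail)
{
    int r;

    pthread_mutex_lock(&list_mutex);
    r = process_buffers(fail);
    pthread_mutex_unlock(&list_mutex);
    return r;
}

/* Returns 1 if list_mutex is currently held, 0 otherwise. */
static int mutex_is_held(void)
{
    int rc;

    if (pthread_mutex_trylock(&list_mutex) != 0)
        return 1;
    rc = pthread_mutex_unlock(&list_mutex);
    assert(rc == 0);
    return 0;
}
```

Calling submit_leaky(1) returns -EFAULT and leaves the mutex locked, so
any later attempt to take it blocks. In the kernel, lockdep reports the
same condition as "lock held when returning to user space".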

Christian, any ideas?
Thanks.

Full kernel log: https://pastebin.com/6SEaDay8
My hardware:
GPU: 6900XT, 6800M
CPU: 3950X, 5900HX

-- 
Best Regards,
Mike Gavrilov.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start
  2022-10-21  8:08 ` Mikhail Gavrilov
@ 2022-10-21  8:32   ` Christian König
  -1 siblings, 0 replies; 29+ messages in thread
From: Christian König @ 2022-10-21  8:32 UTC (permalink / raw)
  To: Mikhail Gavrilov, Deucher, Alexander
  Cc: dri-devel, Linux List Kernel Mailing, amd-gfx list

Hi,

Yes, Bas already reported this issue, but I couldn't reproduce it. I need
to come up with a patch to narrow this down further.

Can I send you something to test?

Thanks for the help,
Christian.

On 21.10.22 at 10:08, Mikhail Gavrilov wrote:
> Hi!
> I found that some games (Cyberpunk 2077, Forza Horizon 4/5) hang at
> start after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6.
>
> dd80d9c8eecac8c516da5b240d01a35660ba6cb6 is the first bad commit
> commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6
> Author: Christian König <christian.koenig@amd.com>
> Date:   Thu Jul 14 10:23:38 2022 +0200
>
>      drm/amdgpu: revert "partial revert "remove ctx->lock" v2"
>
>      This reverts commit 94f4c4965e5513ba624488f4b601d6b385635aec.
>
>      We found that the bo_list is missing a protection for its list entries.
>      Since that is fixed now this workaround can be removed again.
>
>      Signed-off-by: Christian König <christian.koenig@amd.com>
>      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
>      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 21 ++++++---------------
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c |  2 --
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h |  1 -
>   3 files changed, 6 insertions(+), 18 deletions(-)
>
>
> When it happens, the following backtrace appears in the kernel log:
> [  231.331210] ------------[ cut here ]------------
> [  231.331262] WARNING: CPU: 11 PID: 6555 at
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:675
> amdgpu_ttm_tt_get_user_pages+0x14c/0x190 [amdgpu]
> [  231.331424] Modules linked in: uinput rfcomm snd_seq_dummy
> snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast
> nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
> nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink
> qrtr bnep intel_rapl_msr intel_rapl_common snd_sof_amd_renoir
> snd_sof_amd_acp snd_sof_pci snd_hda_codec_realtek snd_sof
> snd_hda_codec_generic snd_hda_codec_hdmi snd_sof_utils mt7921e
> snd_hda_intel sunrpc snd_intel_dspcfg mt7921_common binfmt_misc
> snd_intel_sdw_acpi snd_hda_codec mt76_connac_lib edac_mce_amd btusb
> snd_soc_core mt76 snd_hda_core btrtl snd_hwdep snd_compress kvm_amd
> ac97_bus snd_seq btbcm snd_pcm_dmaengine btintel snd_rpl_pci_acp6x
> mac80211 btmtk snd_pci_acp6x kvm snd_seq_device snd_pcm snd_pci_acp5x
> libarc4 irqbypass bluetooth snd_rn_pci_acp3x snd_timer pcspkr
> asus_nb_wmi rapl joydev wmi_bmof snd_acp_config cfg80211 snd_soc_acpi
> vfat snd
> [  231.331490]  snd_pci_acp3x i2c_piix4 soundcore fat k10temp amd_pmc
> asus_wireless zram amdgpu drm_ttm_helper ttm hid_asus asus_wmi
> iommu_v2 crct10dif_pclmul crc32_pclmul gpu_sched crc32c_intel
> ledtrig_audio sparse_keymap polyval_clmulni platform_profile drm_buddy
> polyval_generic hid_multitouch drm_display_helper rfkill nvme
> ucsi_acpi ghash_clmulni_intel nvme_core video typec_ucsi serio_raw ccp
> sha512_ssse3 sp5100_tco r8169 cec nvme_common typec wmi i2c_hid_acpi
> i2c_hid ip6_tables ip_tables fuse
> [  231.331532] CPU: 11 PID: 6555 Comm: GameThread Tainted: G        W
>    L    -------  ---
> 6.1.0-0.rc1.20221019gitaae703b02f92.17.fc38.x86_64 #1
> [  231.331534] Hardware name: ASUSTeK COMPUTER INC. ROG Strix
> G513QY_G513QY/G513QY, BIOS G513QY.318 03/29/2022
> [  231.331537] RIP: 0010:amdgpu_ttm_tt_get_user_pages+0x14c/0x190 [amdgpu]
> [  231.331654] Code: a8 d0 e9 32 ff ff ff 4c 89 e9 89 ea 48 c7 c6 40
> 82 f3 c0 48 c7 c7 10 60 14 c1 e8 2f a0 f4 d0 eb 8e 66 90 bd f2 ff ff
> ff eb 8d <0f> 0b eb f5 bd fd ff ff ff eb 82 bd f2 ff ff ff e9 62 ff ff
> ff 48
> [  231.331656] RSP: 0018:ffffaad4c705bae8 EFLAGS: 00010286
> [  231.331659] RAX: ffff8e9cbdbe3200 RBX: ffff8e997e3f2440 RCX: 0000000000000000
> [  231.331661] RDX: 0000000000000000 RSI: ffff8e9cbdbe3200 RDI: ffff8e9c31208000
> [  231.331663] RBP: 0000000000000001 R08: 0000000000000dc0 R09: 00000000ffffffff
> [  231.331665] R10: 0000000000000001 R11: 0000000000000000 R12: ffffaad4c705bb90
> [  231.331666] R13: 0000000076510000 R14: ffff8e9c89f334e0 R15: ffff8e991fda8000
> [  231.331668] FS:  000000007c2af6c0(0000) GS:ffff8ea7d8e00000(0000)
> knlGS:000000007b2c0000
> [  231.331671] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  231.331673] CR2: 00007ff65ffd8000 CR3: 00000004f90f0000 CR4: 0000000000750ee0
> [  231.331674] PKRU: 55555554
> [  231.331676] Call Trace:
> [  231.331678]  <TASK>
> [  231.331682]  amdgpu_cs_ioctl+0x87e/0x1fc0 [amdgpu]
> [  231.331824]  ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
> [  231.331981]  drm_ioctl_kernel+0xac/0x160
> [  231.331990]  drm_ioctl+0x1e7/0x450
> [  231.331994]  ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
> [  231.332118]  amdgpu_drm_ioctl+0x4a/0x80 [amdgpu]
> [  231.332233]  __x64_sys_ioctl+0x90/0xd0
> [  231.332238]  do_syscall_64+0x5b/0x80
> [  231.332243]  ? asm_exc_page_fault+0x22/0x30
> [  231.332247]  ? lockdep_hardirqs_on+0x7d/0x100
> [  231.332250]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [  231.332253] RIP: 0033:0x7ff677c5704f
> [  231.332256] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24
> 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00
> 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28
> 00 00
> [  231.332258] RSP: 002b:000000007c2ad470 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000010
> [  231.332261] RAX: ffffffffffffffda RBX: 000000007c2ad718 RCX: 00007ff677c5704f
> [  231.332263] RDX: 000000007c2ad540 RSI: 00000000c0186444 RDI: 00000000000000a7
> [  231.332265] RBP: 000000007c2ad540 R08: 00007ff590048590 R09: 000000007c2ad510
> [  231.332266] R10: 000000007e864ec0 R11: 0000000000000246 R12: 00000000c0186444
> [  231.332268] R13: 00000000000000a7 R14: 000000007c2ad6f0 R15: 0000000000000005
> [  231.332277]  </TASK>
> [  231.332279] irq event stamp: 18035
> [  231.332281] hardirqs last  enabled at (18043): [<ffffffff9118e8de>]
> __up_console_sem+0x5e/0x70
> [  231.332284] hardirqs last disabled at (18050): [<ffffffff9118e8c3>]
> __up_console_sem+0x43/0x70
> [  231.332287] softirqs last  enabled at (17864): [<ffffffff911012ed>]
> __irq_exit_rcu+0xed/0x160
> [  231.332289] softirqs last disabled at (17857): [<ffffffff911012ed>]
> __irq_exit_rcu+0xed/0x160
> [  231.332291] ---[ end trace 0000000000000000 ]---
> [  231.332299] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to
> process the buffer list -14!
>
> [  231.332423] ================================================
> [  231.332424] WARNING: lock held when returning to user space!
> [  231.332425] 6.1.0-0.rc1.20221019gitaae703b02f92.17.fc38.x86_64 #1
> Tainted: G        W    L    -------  ---
> [  231.332426] ------------------------------------------------
> [  231.332427] GameThread/6555 is leaving the kernel with locks still held!
> [  231.332428] 1 lock held by GameThread/6555:
> [  231.332429]  #0: ffff8e9cfbac64a8
> (&list->bo_list_mutex){+.+.}-{3:3}, at: amdgpu_cs_ioctl+0x786/0x1fc0
> [amdgpu]
> [  389.428155] amdgpu 0000:03:00.0: amdgpu: free PSP TMR buffer
>
> Christian, any ideas?
> Thanks.
>
> Full kernel log: https://pastebin.com/6SEaDay8
> My hardware:
> GPU: 6900XT, 6800M
> CPU: 3950X, 5900HX
>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start
  2022-10-21  8:32   ` Christian König
@ 2022-10-21 12:36     ` Mikhail Gavrilov
  -1 siblings, 0 replies; 29+ messages in thread
From: Mikhail Gavrilov @ 2022-10-21 12:36 UTC (permalink / raw)
  To: Christian König
  Cc: Deucher, Alexander, amd-gfx list, Linux List Kernel Mailing, dri-devel

On Fri, Oct 21, 2022 at 1:33 PM Christian König
<christian.koenig@amd.com> wrote:
>
> Hi,
>
> yes Bas already reported this issue, but I couldn't reproduce it. Need
> to come up with a patch to narrow this down further.
>
> Can I send you something to test?

I would be happy to test any patches or ideas.

-- 
Best Regards,
Mike Gavrilov.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start #forregzbot
  2022-10-21  8:08 ` Mikhail Gavrilov
@ 2022-10-23 13:20   ` Thorsten Leemhuis
  -1 siblings, 0 replies; 29+ messages in thread
From: Thorsten Leemhuis @ 2022-10-23 13:20 UTC (permalink / raw)
  To: regressions; +Cc: dri-devel, Linux List Kernel Mailing, amd-gfx list

[Note: this mail is primarily sent for documentation purposes and/or for
regzbot, my Linux kernel regression tracking bot. That's why I removed
most or all folks from the list of recipients, but left any that looked
like a mailing list. These mails usually contain '#forregzbot' in the
subject, to make them easy to spot and filter out.]

[TLDR: I'm adding this regression report to the list of tracked
regressions; all text from me you find below is based on a few template
paragraphs you might have encountered already in similar form.]

Hi, this is your Linux kernel regression tracker. CCing the regression
mailing list, as it should be in the loop for all regressions, as
explained here:
https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html


On 21.10.22 10:08, Mikhail Gavrilov wrote:
> Hi!
> I found that some games (Cyberpunk 2077, Forza Horizon 4/5) hang at
> start after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6.

Thanks for the report. To make sure the issue below doesn't fall through
the cracks unnoticed, I'm adding it to regzbot, my Linux kernel regression
tracking bot:

#regzbot ^introduced dd80d9c8eecac
#regzbot title drm: amdgpu: some games (Cyberpunk 2077, Forza Horizon
4/5) hang at start
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it is already being
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply -- ideally while also
telling regzbot about it, as explained here:
https://linux-regtracking.leemhuis.info/tracked-regression/

Reminder for developers: When fixing the issue, add 'Link:' tags
pointing to the report (the mail this one replies to), as explained in
the Linux kernel's documentation; the webpage above explains why this is
important for tracked regressions.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply; it's in everyone's interest to set the public record straight.

> dd80d9c8eecac8c516da5b240d01a35660ba6cb6 is the first bad commit
> commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6
> Author: Christian König <christian.koenig@amd.com>
> Date:   Thu Jul 14 10:23:38 2022 +0200
> 
>     drm/amdgpu: revert "partial revert "remove ctx->lock" v2"
> 
>     This reverts commit 94f4c4965e5513ba624488f4b601d6b385635aec.
> 
>     We found that the bo_list is missing a protection for its list entries.
>     Since that is fixed now this workaround can be removed again.
> 
>     Signed-off-by: Christian König <christian.koenig@amd.com>
>     Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
>     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 21 ++++++---------------
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c |  2 --
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h |  1 -
>  3 files changed, 6 insertions(+), 18 deletions(-)
> 
> 
> When it happens, the following backtrace appears in the kernel log:
> [  231.331210] ------------[ cut here ]------------
> [  231.331262] WARNING: CPU: 11 PID: 6555 at
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:675
> amdgpu_ttm_tt_get_user_pages+0x14c/0x190 [amdgpu]
> [  231.331424] Modules linked in: uinput rfcomm snd_seq_dummy
> snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast
> nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
> nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink
> qrtr bnep intel_rapl_msr intel_rapl_common snd_sof_amd_renoir
> snd_sof_amd_acp snd_sof_pci snd_hda_codec_realtek snd_sof
> snd_hda_codec_generic snd_hda_codec_hdmi snd_sof_utils mt7921e
> snd_hda_intel sunrpc snd_intel_dspcfg mt7921_common binfmt_misc
> snd_intel_sdw_acpi snd_hda_codec mt76_connac_lib edac_mce_amd btusb
> snd_soc_core mt76 snd_hda_core btrtl snd_hwdep snd_compress kvm_amd
> ac97_bus snd_seq btbcm snd_pcm_dmaengine btintel snd_rpl_pci_acp6x
> mac80211 btmtk snd_pci_acp6x kvm snd_seq_device snd_pcm snd_pci_acp5x
> libarc4 irqbypass bluetooth snd_rn_pci_acp3x snd_timer pcspkr
> asus_nb_wmi rapl joydev wmi_bmof snd_acp_config cfg80211 snd_soc_acpi
> vfat snd
> [  231.331490]  snd_pci_acp3x i2c_piix4 soundcore fat k10temp amd_pmc
> asus_wireless zram amdgpu drm_ttm_helper ttm hid_asus asus_wmi
> iommu_v2 crct10dif_pclmul crc32_pclmul gpu_sched crc32c_intel
> ledtrig_audio sparse_keymap polyval_clmulni platform_profile drm_buddy
> polyval_generic hid_multitouch drm_display_helper rfkill nvme
> ucsi_acpi ghash_clmulni_intel nvme_core video typec_ucsi serio_raw ccp
> sha512_ssse3 sp5100_tco r8169 cec nvme_common typec wmi i2c_hid_acpi
> i2c_hid ip6_tables ip_tables fuse
> [  231.331532] CPU: 11 PID: 6555 Comm: GameThread Tainted: G        W
>   L    -------  ---
> 6.1.0-0.rc1.20221019gitaae703b02f92.17.fc38.x86_64 #1
> [  231.331534] Hardware name: ASUSTeK COMPUTER INC. ROG Strix
> G513QY_G513QY/G513QY, BIOS G513QY.318 03/29/2022
> [  231.331537] RIP: 0010:amdgpu_ttm_tt_get_user_pages+0x14c/0x190 [amdgpu]
> [  231.331654] Code: a8 d0 e9 32 ff ff ff 4c 89 e9 89 ea 48 c7 c6 40
> 82 f3 c0 48 c7 c7 10 60 14 c1 e8 2f a0 f4 d0 eb 8e 66 90 bd f2 ff ff
> ff eb 8d <0f> 0b eb f5 bd fd ff ff ff eb 82 bd f2 ff ff ff e9 62 ff ff
> ff 48
> [  231.331656] RSP: 0018:ffffaad4c705bae8 EFLAGS: 00010286
> [  231.331659] RAX: ffff8e9cbdbe3200 RBX: ffff8e997e3f2440 RCX: 0000000000000000
> [  231.331661] RDX: 0000000000000000 RSI: ffff8e9cbdbe3200 RDI: ffff8e9c31208000
> [  231.331663] RBP: 0000000000000001 R08: 0000000000000dc0 R09: 00000000ffffffff
> [  231.331665] R10: 0000000000000001 R11: 0000000000000000 R12: ffffaad4c705bb90
> [  231.331666] R13: 0000000076510000 R14: ffff8e9c89f334e0 R15: ffff8e991fda8000
> [  231.331668] FS:  000000007c2af6c0(0000) GS:ffff8ea7d8e00000(0000)
> knlGS:000000007b2c0000
> [  231.331671] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  231.331673] CR2: 00007ff65ffd8000 CR3: 00000004f90f0000 CR4: 0000000000750ee0
> [  231.331674] PKRU: 55555554
> [  231.331676] Call Trace:
> [  231.331678]  <TASK>
> [  231.331682]  amdgpu_cs_ioctl+0x87e/0x1fc0 [amdgpu]
> [  231.331824]  ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
> [  231.331981]  drm_ioctl_kernel+0xac/0x160
> [  231.331990]  drm_ioctl+0x1e7/0x450
> [  231.331994]  ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
> [  231.332118]  amdgpu_drm_ioctl+0x4a/0x80 [amdgpu]
> [  231.332233]  __x64_sys_ioctl+0x90/0xd0
> [  231.332238]  do_syscall_64+0x5b/0x80
> [  231.332243]  ? asm_exc_page_fault+0x22/0x30
> [  231.332247]  ? lockdep_hardirqs_on+0x7d/0x100
> [  231.332250]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [  231.332253] RIP: 0033:0x7ff677c5704f
> [  231.332256] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24
> 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00
> 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28
> 00 00
> [  231.332258] RSP: 002b:000000007c2ad470 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000010
> [  231.332261] RAX: ffffffffffffffda RBX: 000000007c2ad718 RCX: 00007ff677c5704f
> [  231.332263] RDX: 000000007c2ad540 RSI: 00000000c0186444 RDI: 00000000000000a7
> [  231.332265] RBP: 000000007c2ad540 R08: 00007ff590048590 R09: 000000007c2ad510
> [  231.332266] R10: 000000007e864ec0 R11: 0000000000000246 R12: 00000000c0186444
> [  231.332268] R13: 00000000000000a7 R14: 000000007c2ad6f0 R15: 0000000000000005
> [  231.332277]  </TASK>
> [  231.332279] irq event stamp: 18035
> [  231.332281] hardirqs last  enabled at (18043): [<ffffffff9118e8de>]
> __up_console_sem+0x5e/0x70
> [  231.332284] hardirqs last disabled at (18050): [<ffffffff9118e8c3>]
> __up_console_sem+0x43/0x70
> [  231.332287] softirqs last  enabled at (17864): [<ffffffff911012ed>]
> __irq_exit_rcu+0xed/0x160
> [  231.332289] softirqs last disabled at (17857): [<ffffffff911012ed>]
> __irq_exit_rcu+0xed/0x160
> [  231.332291] ---[ end trace 0000000000000000 ]---
> [  231.332299] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to
> process the buffer list -14!
> 
> [  231.332423] ================================================
> [  231.332424] WARNING: lock held when returning to user space!
> [  231.332425] 6.1.0-0.rc1.20221019gitaae703b02f92.17.fc38.x86_64 #1
> Tainted: G        W    L    -------  ---
> [  231.332426] ------------------------------------------------
> [  231.332427] GameThread/6555 is leaving the kernel with locks still held!
> [  231.332428] 1 lock held by GameThread/6555:
> [  231.332429]  #0: ffff8e9cfbac64a8
> (&list->bo_list_mutex){+.+.}-{3:3}, at: amdgpu_cs_ioctl+0x786/0x1fc0
> [amdgpu]
> [  389.428155] amdgpu 0000:03:00.0: amdgpu: free PSP TMR buffer
> 
> Christian, any ideas?
> Thanks.
> 
> Full kernel log: https://pastebin.com/6SEaDay8
> My hardware:
> GPU: 6900XT, 6800M
> CPU: 3950X, 5900HX
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start
  2022-10-21 12:36     ` Mikhail Gavrilov
@ 2022-10-26  7:29       ` Christian König
  -1 siblings, 0 replies; 29+ messages in thread
From: Christian König @ 2022-10-26  7:29 UTC (permalink / raw)
  To: Mikhail Gavrilov; +Cc: Deucher, Alexander, dri-devel, amd-gfx list

[-- Attachment #1: Type: text/plain, Size: 587 bytes --]

Attached is the original test patch rebased on current amd-staging-drm-next.

Can you test if this is enough to make sure that the games start without 
crashing by fetching the userptrs?

Thanks in advance,
Christian.

Am 21.10.22 um 14:36 schrieb Mikhail Gavrilov:
> On Fri, Oct 21, 2022 at 1:33 PM Christian König
> <christian.koenig@amd.com> wrote:
>> Hi,
>>
>> yes Bas already reported this issue, but I couldn't reproduce it. Need
>> to come up with a patch to narrow this down further.
>>
>> Can I send you something to test?
> I would appreciate to test any patches and ideas.
>

[-- Attachment #2: 0001-drm-amdgpu-partial-revert-remove-ctx-lock-v2.patch --]
[-- Type: text/x-patch, Size: 4010 bytes --]

From 852c78656f083394296b3d3b96db33608ce0f272 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christian=20K=C3=B6nig?= <christian.koenig@amd.com>
Date: Wed, 26 Oct 2022 09:26:01 +0200
Subject: [PATCH] drm/amdgpu: partial revert "remove ctx->lock" v2""
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This reverts commit 6c052af778a61977c271632044c754dbbca4f892.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 26 +++++++++++++++++--------
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h |  1 +
 3 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 1bbd39b3b0fc..0b331e8bfba6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -57,6 +57,8 @@ static int amdgpu_cs_parser_init(struct amdgpu_cs_parser *p,
 	if (!p->ctx)
 		return -EINVAL;
 
+	mutex_lock(&p->ctx->lock);
+
 	if (atomic_read(&p->ctx->guilty)) {
 		amdgpu_ctx_put(p->ctx);
 		return -ECANCELED;
@@ -578,6 +580,9 @@ static int amdgpu_cs_pass2(struct amdgpu_cs_parser *p)
 	unsigned int ce_preempt = 0, de_preempt = 0;
 	int i, r;
 
+	/* TODO: Investigate why we still need the context lock */
+	mutex_unlock(&p->ctx->lock);
+
 	for (i = 0; i < p->nchunks; ++i) {
 		struct amdgpu_cs_chunk *chunk;
 
@@ -587,38 +592,41 @@ static int amdgpu_cs_pass2(struct amdgpu_cs_parser *p)
 		case AMDGPU_CHUNK_ID_IB:
 			r = amdgpu_cs_p2_ib(p, chunk, &ce_preempt, &de_preempt);
 			if (r)
-				return r;
+				goto out;
 			break;
 		case AMDGPU_CHUNK_ID_DEPENDENCIES:
 		case AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES:
 			r = amdgpu_cs_p2_dependencies(p, chunk);
 			if (r)
-				return r;
+				goto out;
 			break;
 		case AMDGPU_CHUNK_ID_SYNCOBJ_IN:
 			r = amdgpu_cs_p2_syncobj_in(p, chunk);
 			if (r)
-				return r;
+				goto out;
 			break;
 		case AMDGPU_CHUNK_ID_SYNCOBJ_OUT:
 			r = amdgpu_cs_p2_syncobj_out(p, chunk);
 			if (r)
-				return r;
+				goto out;
 			break;
 		case AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_WAIT:
 			r = amdgpu_cs_p2_syncobj_timeline_wait(p, chunk);
 			if (r)
-				return r;
+				goto out;
 			break;
 		case AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_SIGNAL:
 			r = amdgpu_cs_p2_syncobj_timeline_signal(p, chunk);
 			if (r)
-				return r;
+				goto out;
 			break;
 		}
 	}
 
-	return 0;
+	r = 0;
+out:
+	mutex_lock(&p->ctx->lock);
+	return r;
 }
 
 /* Convert microseconds to bytes. */
@@ -1335,8 +1343,10 @@ static void amdgpu_cs_parser_fini(struct amdgpu_cs_parser *parser)
 
 	dma_fence_put(parser->fence);
 
-	if (parser->ctx)
+	if (parser->ctx) {
+		mutex_unlock(&parser->ctx->lock);
 		amdgpu_ctx_put(parser->ctx);
+	}
 	if (parser->bo_list)
 		amdgpu_bo_list_put(parser->bo_list);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 331aa191910c..3a23fa45bfed 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -315,6 +315,7 @@ static int amdgpu_ctx_init(struct amdgpu_ctx_mgr *mgr, int32_t priority,
 	kref_init(&ctx->refcount);
 	ctx->mgr = mgr;
 	spin_lock_init(&ctx->ring_lock);
+	mutex_init(&ctx->lock);
 
 	ctx->reset_counter = atomic_read(&mgr->adev->gpu_reset_counter);
 	ctx->reset_counter_query = ctx->reset_counter;
@@ -409,6 +410,7 @@ static void amdgpu_ctx_fini(struct kref *ref)
 		drm_dev_exit(idx);
 	}
 
+	mutex_destroy(&ctx->lock);
 	kfree(ctx);
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
index 0fa0e56daf67..cc7c8afff414 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
@@ -53,6 +53,7 @@ struct amdgpu_ctx {
 	bool				preamble_presented;
 	int32_t				init_priority;
 	int32_t				override_priority;
+	struct mutex			lock;
 	atomic_t			guilty;
 	unsigned long			ras_counter_ce;
 	unsigned long			ras_counter_ue;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start
  2022-10-26  7:29       ` Christian König
@ 2022-10-30 22:05         ` Mikhail Gavrilov
  -1 siblings, 0 replies; 29+ messages in thread
From: Mikhail Gavrilov @ 2022-10-30 22:05 UTC (permalink / raw)
  To: Christian König; +Cc: Deucher, Alexander, dri-devel, amd-gfx list

On Wed, Oct 26, 2022 at 12:29 PM Christian König
<christian.koenig@amd.com> wrote:
>
> Attached is the original test patch rebased on current amd-staging-drm-next.
>
> Can you test if this is enough to make sure that the games start without
> crashing by fetching the userptrs?

1. Over the past week, the list of games affected by this issue has grown
to include The Outlast Trials, Gotham Knights, and Sackboy: A Big
Adventure.

2. I tested the patch. It solves the launch problem for all the listed
games and does not create new problems.

3. The only thing I noticed is that in Sackboy: A Big Adventure, when
using the kernel built from commit
b229b6ca5abbd63ff40c1396095b1b36b18139c3 + the attached patch, I can’t
connect to a friend’s co-op session because the Steam client hangs. The
kernel built from commit 736ec9fadd7a1fde8480df7e5cfac465c07ff6f3
(the commit just before dd80d9c8eecac8c516da5b240d01a35660ba6cb6) is
free of this problem.

I need to spend some more time finding the commit that causes the
Steam client hang described in [3].

Thanks.

-- 
Best Regards,
Mike Gavrilov.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start
  2022-10-30 22:05         ` Mikhail Gavrilov
@ 2022-11-01 17:52           ` Christian König
  -1 siblings, 0 replies; 29+ messages in thread
From: Christian König @ 2022-11-01 17:52 UTC (permalink / raw)
  To: Mikhail Gavrilov, Christian König
  Cc: Deucher, Alexander, amd-gfx list, dri-devel

Hi Mikhail,

Am 30.10.22 um 23:05 schrieb Mikhail Gavrilov:
> On Wed, Oct 26, 2022 at 12:29 PM Christian König
> <christian.koenig@amd.com> wrote:
>> Attached is the original test patch rebased on current amd-staging-drm-next.
>>
>> Can you test if this is enough to make sure that the games start without
>> crashing by fetching the userptrs?
> 1. Over the past week the list of games affected by this issue has grown
> to include The Outlast Trials, Gotham Knights, and Sackboy: A Big
> Adventure.
>
> 2. I tested the patch: it fixes the launch of all the listed games and
> does not introduce new problems.
>
> 3. The only thing I noticed is that in Sackboy: A Big Adventure, when
> using the kernel built from commit
> b229b6ca5abbd63ff40c1396095b1b36b18139c3 + the attached patch, I can’t
> join a friend's co-op session because the Steam client hangs. The
> kernel built from commit 736ec9fadd7a1fde8480df7e5cfac465c07ff6f3
> (the commit just before dd80d9c8eecac8c516da5b240d01a35660ba6cb6) is
> free of this problem.
>
> I need to spend some more time to find the commit that causes the
> Steam client hang described in point 3.

Let's focus on one problem at a time.

The issue here is that somehow userptr handling became racy after we 
removed the lock, but I don't see why.

We need to fix this ASAP since it is probably a much wider problem and 
the additional lock just hides it somehow.

Going to provide you with an updated patch tomorrow.

Thanks,
Christian.

>
> Thanks.
>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start
  2022-11-01 17:52           ` Christian König
@ 2022-11-02 13:36             ` Mikhail Gavrilov
  -1 siblings, 0 replies; 29+ messages in thread
From: Mikhail Gavrilov @ 2022-11-02 13:36 UTC (permalink / raw)
  To: Christian König
  Cc: Deucher, Alexander, amd-gfx list, Christian König, dri-devel

On Tue, Nov 1, 2022 at 10:52 PM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> Let's focus on one problem at a time.
>
> The issue here is that somehow userptr handling became racy after we
> removed the lock, but I don't see why.
>
> We need to fix this ASAP since it is probably a much wider problem and
> the additional lock just hides it somehow.
>
> Going to provide you with an updated patch tomorrow.
>
> Thanks,
> Christian.

Sackboy was updated recently, and now the kernel log contains a trace
very similar to the one in the first post, even with the patch
applied.

[  155.948044] ------------[ cut here ]------------
[  155.948164] WARNING: CPU: 3 PID: 4850 at drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:678 amdgpu_ttm_tt_get_user_pages+0x14c/0x190 [amdgpu]
[  155.948342] Modules linked in: uinput rfcomm snd_seq_dummy snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_sof_amd_renoir snd_sof_amd_acp snd_hda_codec_generic snd_hda_codec_hdmi snd_sof_pci sunrpc binfmt_misc snd_sof snd_hda_intel snd_sof_utils snd_intel_dspcfg mt7921e snd_intel_sdw_acpi snd_hda_codec mt7921_common snd_soc_core edac_mce_amd mt76_connac_lib btusb snd_hda_core snd_compress snd_hwdep mt76 btrtl ac97_bus kvm_amd snd_pcm_dmaengine btbcm snd_rpl_pci_acp6x snd_pci_acp6x btintel mac80211 btmtk snd_seq snd_seq_device kvm snd_pcm snd_pci_acp5x libarc4 bluetooth irqbypass vfat snd_timer snd_rn_pci_acp3x fat rapl snd_acp_config asus_nb_wmi snd cfg80211 snd_soc_acpi wmi_bmof k10temp pcspkr
[  155.948436]  snd_pci_acp3x i2c_piix4 soundcore asus_wireless amd_pmc joydev zram amdgpu drm_ttm_helper ttm crct10dif_pclmul hid_asus crc32_pclmul asus_wmi crc32c_intel iommu_v2 ledtrig_audio polyval_clmulni gpu_sched sparse_keymap polyval_generic platform_profile drm_buddy drm_display_helper nvme rfkill ghash_clmulni_intel hid_multitouch ucsi_acpi sha512_ssse3 nvme_core typec_ucsi serio_raw sp5100_tco r8169 ccp cec nvme_common typec i2c_hid_acpi i2c_hid video wmi ip6_tables ip_tables fuse
[  155.948540] CPU: 3 PID: 4850 Comm: Sackboy-Win64-T Tainted: G        W    L    -------  --- 6.1.0-0.rc3.20221101git5aaef24b5c6d.29.fc38.x86_64 #1
[  155.948544] Hardware name: ASUSTeK COMPUTER INC. ROG Strix G513QY_G513QY/G513QY, BIOS G513QY.318 03/29/2022
[  155.948547] RIP: 0010:amdgpu_ttm_tt_get_user_pages+0x14c/0x190 [amdgpu]
[  155.948748] Code: 9e f1 e9 32 ff ff ff 4c 89 e9 89 ea 48 c7 c6 a8 a3 fd c0 48 c7 c7 88 81 1e c1 e8 af 97 ea f1 eb 8e 66 90 bd f2 ff ff ff eb 8d <0f> 0b eb f5 bd fd ff ff ff eb 82 bd f2 ff ff ff e9 62 ff ff ff 48
[  155.948751] RSP: 0018:ffff960b544d3a50 EFLAGS: 00010282
[  155.948756] RAX: ffff8a4e40d44e00 RBX: ffff8a4f0e564140 RCX: 0000000000000001
[  155.948759] RDX: 0000000000000000 RSI: ffff8a4e40d44e00 RDI: ffff8a4f4b52b400
[  155.948761] RBP: ffff8a4e8c979000 R08: 0000000000000dc0 R09: 00000000ffffffff
[  155.948764] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8a4e8aaad558
[  155.948767] R13: 000000003b910000 R14: ffff8a4f0e667180 R15: ffff8a4f4b52b458
[  155.948770] FS:  00007fa13fe006c0(0000) GS:ffff8a5d16e00000(0000) knlGS:0000000036f80000
[  155.948772] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  155.948775] CR2: 0000000025c9e1d0 CR3: 0000000361990000 CR4: 0000000000750ee0
[  155.948778] PKRU: 55555554
[  155.948780] Call Trace:
[  155.948783]  <TASK>
[  155.948790]  amdgpu_cs_ioctl+0x9fd/0x2030 [amdgpu]
[  155.948992]  ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
[  155.949155]  drm_ioctl_kernel+0xac/0x160
[  155.949165]  drm_ioctl+0x1e7/0x450
[  155.949172]  ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
[  155.949344]  amdgpu_drm_ioctl+0x4a/0x80 [amdgpu]
[  155.949528]  __x64_sys_ioctl+0x90/0xd0
[  155.949537]  do_syscall_64+0x5b/0x80
[  155.949547]  ? lock_is_held_type+0xe8/0x140
[  155.949559]  ? do_syscall_64+0x67/0x80
[  155.949565]  ? lockdep_hardirqs_on+0x7d/0x100
[  155.949573]  ? do_syscall_64+0x67/0x80
[  155.949579]  ? do_syscall_64+0x67/0x80
[  155.949586]  ? do_syscall_64+0x67/0x80
[  155.949592]  ? lockdep_hardirqs_on+0x7d/0x100
[  155.949597]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  155.949603] RIP: 0033:0x7fa1b7fd912f
[  155.949610] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[  155.949615] RSP: 002b:00007fa13fdfe920 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  155.949621] RAX: ffffffffffffffda RBX: 00007fa13fdfebe8 RCX: 00007fa1b7fd912f
[  155.949625] RDX: 00007fa13fdfea10 RSI: 00000000c0186444 RDI: 0000000000000165
[  155.949629] RBP: 00007fa13fdfea10 R08: 00007f9ff80018e0 R09: 00007fa13fdfe9c0
[  155.949633] R10: 000000007eb11590 R11: 0000000000000246 R12: 00000000c0186444
[  155.949635] R13: 0000000000000165 R14: 00007f9ff8001860 R15: 0000000000000005
[  155.949647]  </TASK>
[  155.949650] irq event stamp: 5375
[  155.949652] hardirqs last  enabled at (5383): [<ffffffffb218e8fe>] __up_console_sem+0x5e/0x70
[  155.949657] hardirqs last disabled at (5390): [<ffffffffb218e8e3>] __up_console_sem+0x43/0x70
[  155.949659] softirqs last  enabled at (3236): [<ffffffffb21012ed>] __irq_exit_rcu+0xed/0x160
[  155.949663] softirqs last disabled at (3231): [<ffffffffb21012ed>] __irq_exit_rcu+0xed/0x160
[  155.949665] ---[ end trace 0000000000000000 ]---
[  155.949676] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -14!
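
For reference, the -14 in the error line is a negated errno value, and
the RSI value 00000000c0186444 in the user-space register dump is the
ioctl request number (second syscall argument on x86-64). Below is a
minimal Python sketch of decoding both. It assumes the generic Linux
_IOC bit layout (8-bit nr, 8-bit type, 14-bit size, 2-bit direction)
and the DRM uAPI constants DRM_COMMAND_BASE = 0x40 and
DRM_AMDGPU_CS = 0x04; the helper names are illustrative only.

```python
import errno

# Assumed DRM uAPI constants (from drm.h / amdgpu_drm.h).
DRM_COMMAND_BASE = 0x40
DRM_AMDGPU_CS = 0x04

# Direction encoding of the generic _IOC layout.
_IOC_DIRS = {0: "none", 1: "write", 2: "read", 3: "read|write"}


def decode_ioctl(cmd):
    """Split an ioctl request number into dir/type/nr/size fields."""
    return {
        "dir": _IOC_DIRS[(cmd >> 30) & 0x3],
        "type": chr((cmd >> 8) & 0xFF),
        "nr": cmd & 0xFF,
        "size": (cmd >> 16) & 0x3FFF,
    }


def decode_errno(ret):
    """Map a negative kernel return code to its symbolic errno name."""
    return errno.errorcode[-ret]


if __name__ == "__main__":
    info = decode_ioctl(0xC0186444)
    print(info)                                   # type 'd' is the DRM ioctl class
    print(info["nr"] - DRM_COMMAND_BASE == DRM_AMDGPU_CS)
    print(decode_errno(-14))
```

Under those assumptions the request decodes to DRM type 'd', command
0x44 (DRM_COMMAND_BASE + DRM_AMDGPU_CS) with a 24-byte argument, which
matches amdgpu_cs_ioctl in the call trace, and -14 is -EFAULT.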

[  155.950689] ================================================
[  155.950690] WARNING: lock held when returning to user space!
[  155.950691] 6.1.0-0.rc3.20221101git5aaef24b5c6d.29.fc38.x86_64 #1 Tainted: G        W    L    -------  ---
[  155.950694] ------------------------------------------------
[  155.950695] Sackboy-Win64-T/4850 is leaving the kernel with locks still held!
[  155.950697] 1 lock held by Sackboy-Win64-T/4850:
[  155.950698]  #0: ffff8a4e8aaad0a8 (&list->bo_list_mutex){+.+.}-{3:3}, at: amdgpu_cs_ioctl+0x903/0x2030 [amdgpu]
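
The second warning is lockdep reporting that the ioctl returned to
user space with list->bo_list_mutex still held. That is consistent
with an error path (here the -14 failure) that returns without
unlocking. The sketch below is not the amdgpu code; it only models
that class of bug and the usual fix in Python, with hypothetical
function names.

```python
import threading

EFAULT = 14


def submit_leaky(lock, pages_ok):
    """Buggy pattern: the early error return skips the unlock."""
    lock.acquire()
    if not pages_ok:
        return -EFAULT          # lock is still held when we return
    lock.release()
    return 0


def submit_fixed(lock, pages_ok):
    """Fixed pattern: every exit path releases the lock."""
    with lock:                  # released on any return or exception
        if not pages_ok:
            return -EFAULT
        return 0


if __name__ == "__main__":
    m = threading.Lock()
    submit_leaky(m, pages_ok=False)
    print("leaky path leaves lock held:", m.locked())
    m.release()
    submit_fixed(m, pages_ok=False)
    print("fixed path leaves lock held:", m.locked())
```

In kernel C the equivalent fix is usually a single unlock label that
all error paths jump to before returning.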

But the most interesting thing is that the previous kernels 6.0 and
5.19 are affected by the problem too, so reverting commit
dd80d9c8eecac8c516da5b240d01a35660ba6cb6 is not enough.

Full kernel log 6.1-rc3 + patch above: https://pastebin.com/6ebmReer
Full kernel log 5.19: https://pastebin.com/5dRCgxNW

Thanks.

-- 
Best Regards,
Mike Gavrilov.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start
  2022-11-02 13:36             ` Mikhail Gavrilov
@ 2022-11-02 13:43               ` Christian König
  -1 siblings, 0 replies; 29+ messages in thread
From: Christian König @ 2022-11-02 13:43 UTC (permalink / raw)
  To: Mikhail Gavrilov
  Cc: Deucher, Alexander, amd-gfx list, Christian König, dri-devel

Am 02.11.22 um 14:36 schrieb Mikhail Gavrilov:
> On Tue, Nov 1, 2022 at 10:52 PM Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
>> Let's focus on one problem at a time.
>>
>> The issue here is that somehow userptr handling became racy after we
>> removed the lock, but I don't see why.
>>
>> We need to fix this ASAP since it is probably a much wider problem and
>> the additional lock just hides it somehow.
>>
>> Going to provide you with an updated patch tomorrow.
>>
>> Thanks,
>> Christian.
> Sackboy was updated recently, and now the kernel log contains a trace
> very similar to the one in the first post, even with the patch
> applied.
>
> [  155.948044] ------------[ cut here ]------------
> [  155.948164] WARNING: CPU: 3 PID: 4850 at
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:678
> amdgpu_ttm_tt_get_user_pages+0x14c/0x190 [amdgpu]
> [  155.948342] Modules linked in: uinput rfcomm snd_seq_dummy
> snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast
> nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
> nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink
> qrtr bnep intel_rapl_msr intel_rapl_common snd_hda_codec_realtek
> snd_sof_amd_renoir snd_sof_amd_acp snd_hda_codec_generic
> snd_hda_codec_hdmi snd_sof_pci sunrpc binfmt_misc snd_sof
> snd_hda_intel snd_sof_utils snd_intel_dspcfg mt7921e
> snd_intel_sdw_acpi snd_hda_codec mt7921_common snd_soc_core
> edac_mce_amd mt76_connac_lib btusb snd_hda_core snd_compress snd_hwdep
> mt76 btrtl ac97_bus kvm_amd snd_pcm_dmaengine btbcm snd_rpl_pci_acp6x
> snd_pci_acp6x btintel mac80211 btmtk snd_seq snd_seq_device kvm
> snd_pcm snd_pci_acp5x libarc4 bluetooth irqbypass vfat snd_timer
> snd_rn_pci_acp3x fat rapl snd_acp_config asus_nb_wmi snd cfg80211
> snd_soc_acpi wmi_bmof k10temp pcspkr
> [  155.948436]  snd_pci_acp3x i2c_piix4 soundcore asus_wireless
> amd_pmc joydev zram amdgpu drm_ttm_helper ttm crct10dif_pclmul
> hid_asus crc32_pclmul asus_wmi crc32c_intel iommu_v2 ledtrig_audio
> polyval_clmulni gpu_sched sparse_keymap polyval_generic
> platform_profile drm_buddy drm_display_helper nvme rfkill
> ghash_clmulni_intel hid_multitouch ucsi_acpi sha512_ssse3 nvme_core
> typec_ucsi serio_raw sp5100_tco r8169 ccp cec nvme_common typec
> i2c_hid_acpi i2c_hid video wmi ip6_tables ip_tables fuse
> [  155.948540] CPU: 3 PID: 4850 Comm: Sackboy-Win64-T Tainted: G
>   W    L    -------  ---
> 6.1.0-0.rc3.20221101git5aaef24b5c6d.29.fc38.x86_64 #1
> [  155.948544] Hardware name: ASUSTeK COMPUTER INC. ROG Strix
> G513QY_G513QY/G513QY, BIOS G513QY.318 03/29/2022
> [  155.948547] RIP: 0010:amdgpu_ttm_tt_get_user_pages+0x14c/0x190 [amdgpu]
> [  155.948748] Code: 9e f1 e9 32 ff ff ff 4c 89 e9 89 ea 48 c7 c6 a8
> a3 fd c0 48 c7 c7 88 81 1e c1 e8 af 97 ea f1 eb 8e 66 90 bd f2 ff ff
> ff eb 8d <0f> 0b eb f5 bd fd ff ff ff eb 82 bd f2 ff ff ff e9 62 ff ff
> ff 48
> [  155.948751] RSP: 0018:ffff960b544d3a50 EFLAGS: 00010282
> [  155.948756] RAX: ffff8a4e40d44e00 RBX: ffff8a4f0e564140 RCX: 0000000000000001
> [  155.948759] RDX: 0000000000000000 RSI: ffff8a4e40d44e00 RDI: ffff8a4f4b52b400
> [  155.948761] RBP: ffff8a4e8c979000 R08: 0000000000000dc0 R09: 00000000ffffffff
> [  155.948764] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8a4e8aaad558
> [  155.948767] R13: 000000003b910000 R14: ffff8a4f0e667180 R15: ffff8a4f4b52b458
> [  155.948770] FS:  00007fa13fe006c0(0000) GS:ffff8a5d16e00000(0000)
> knlGS:0000000036f80000
> [  155.948772] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  155.948775] CR2: 0000000025c9e1d0 CR3: 0000000361990000 CR4: 0000000000750ee0
> [  155.948778] PKRU: 55555554
> [  155.948780] Call Trace:
> [  155.948783]  <TASK>
> [  155.948790]  amdgpu_cs_ioctl+0x9fd/0x2030 [amdgpu]
> [  155.948992]  ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
> [  155.949155]  drm_ioctl_kernel+0xac/0x160
> [  155.949165]  drm_ioctl+0x1e7/0x450
> [  155.949172]  ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
> [  155.949344]  amdgpu_drm_ioctl+0x4a/0x80 [amdgpu]
> [  155.949528]  __x64_sys_ioctl+0x90/0xd0
> [  155.949537]  do_syscall_64+0x5b/0x80
> [  155.949547]  ? lock_is_held_type+0xe8/0x140
> [  155.949559]  ? do_syscall_64+0x67/0x80
> [  155.949565]  ? lockdep_hardirqs_on+0x7d/0x100
> [  155.949573]  ? do_syscall_64+0x67/0x80
> [  155.949579]  ? do_syscall_64+0x67/0x80
> [  155.949586]  ? do_syscall_64+0x67/0x80
> [  155.949592]  ? lockdep_hardirqs_on+0x7d/0x100
> [  155.949597]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [  155.949603] RIP: 0033:0x7fa1b7fd912f
> [  155.949610] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24
> 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00
> 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28
> 00 00
> [  155.949615] RSP: 002b:00007fa13fdfe920 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000010
> [  155.949621] RAX: ffffffffffffffda RBX: 00007fa13fdfebe8 RCX: 00007fa1b7fd912f
> [  155.949625] RDX: 00007fa13fdfea10 RSI: 00000000c0186444 RDI: 0000000000000165
> [  155.949629] RBP: 00007fa13fdfea10 R08: 00007f9ff80018e0 R09: 00007fa13fdfe9c0
> [  155.949633] R10: 000000007eb11590 R11: 0000000000000246 R12: 00000000c0186444
> [  155.949635] R13: 0000000000000165 R14: 00007f9ff8001860 R15: 0000000000000005
> [  155.949647]  </TASK>
> [  155.949650] irq event stamp: 5375
> [  155.949652] hardirqs last  enabled at (5383): [<ffffffffb218e8fe>]
> __up_console_sem+0x5e/0x70
> [  155.949657] hardirqs last disabled at (5390): [<ffffffffb218e8e3>]
> __up_console_sem+0x43/0x70
> [  155.949659] softirqs last  enabled at (3236): [<ffffffffb21012ed>]
> __irq_exit_rcu+0xed/0x160
> [  155.949663] softirqs last disabled at (3231): [<ffffffffb21012ed>]
> __irq_exit_rcu+0xed/0x160
> [  155.949665] ---[ end trace 0000000000000000 ]---
> [  155.949676] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to
> process the buffer list -14!
>
> [  155.950689] ================================================
> [  155.950690] WARNING: lock held when returning to user space!
> [  155.950691] 6.1.0-0.rc3.20221101git5aaef24b5c6d.29.fc38.x86_64 #1
> Tainted: G        W    L    -------  ---
> [  155.950694] ------------------------------------------------
> [  155.950695] Sackboy-Win64-T/4850 is leaving the kernel with locks still held!
> [  155.950697] 1 lock held by Sackboy-Win64-T/4850:
> [  155.950698]  #0: ffff8a4e8aaad0a8
> (&list->bo_list_mutex){+.+.}-{3:3}, at: amdgpu_cs_ioctl+0x903/0x2030
> [amdgpu]
>
> But the most interesting thing is that the previous kernels 6.0 and
> 5.19 are affected by the problem too, so reverting commit
> dd80d9c8eecac8c516da5b240d01a35660ba6cb6 is not enough.

Yeah, that totally confirms what I expected. The context lock just hides 
the problem because userspace tended to use the same context.

What the application now seems to do is to use multiple contexts for its 
submission and in this case re-adding the lock doesn't even help.
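
A toy model of this point, assuming (as the lockdep output suggests)
that the buffer list is a separate object which several contexts can
reference. The class and function names are made up for illustration
and this is not driver code. It shows that a per-context lock only
serializes submissions on the same context, while a lock on the
shared list serializes all of them.

```python
import threading


class BoList:
    """Stand-in for a buffer list shared by several contexts."""

    def __init__(self):
        self.mutex = threading.Lock()   # protects the list itself


class Context:
    """Stand-in for a submission context with its own lock."""

    def __init__(self, bo_list):
        self.lock = threading.Lock()    # per-context lock
        self.bo_list = bo_list


def can_enter_concurrently(lock_a, lock_b):
    """True if a second submitter could proceed while lock_a is held."""
    with lock_a:
        got = lock_b.acquire(blocking=False)
        if got:
            lock_b.release()
        return got


if __name__ == "__main__":
    shared = BoList()
    a, b = Context(shared), Context(shared)
    # Same context: the context lock serializes the two submissions.
    print(can_enter_concurrently(a.lock, a.lock))
    # Different contexts: the context locks do not exclude each other,
    # so both submitters can touch the shared list at once.
    print(can_enter_concurrently(a.lock, b.lock))
    # A lock on the shared list covers both contexts.
    print(can_enter_concurrently(a.bo_list.mutex, b.bo_list.mutex))
```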

Thanks for that information, gets me a lot closer to a solution.

Regards,
Christian.

>
> Full kernel log 6.1-rc3 + patch above: https://pastebin.com/6ebmReer
> Full kernel log 5.19: https://pastebin.com/5dRCgxNW
>
> Thanks.
>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start
@ 2022-11-02 13:43               ` Christian König
  0 siblings, 0 replies; 29+ messages in thread
From: Christian König @ 2022-11-02 13:43 UTC (permalink / raw)
  To: Mikhail Gavrilov
  Cc: Deucher, Alexander, amd-gfx list, Christian König,
	dri-devel, Bas Nieuwenhuizen

Am 02.11.22 um 14:36 schrieb Mikhail Gavrilov:
> On Tue, Nov 1, 2022 at 10:52 PM Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
>> Let's focus on one problem at a time.
>>
>> The issue here is that somehow userptr handling became racy after we
>> removed the lock, but I don't see why.
>>
>> We need to fix this ASAP since it is probably a much wider problem and
>> the additional lock just hides it somehow.
>>
>> Going to provide you with an updated patch tomorrow.
>>
>> Thanks,
>> Christian.
> Recently sackboy has been updated and now the kernel log contains a
> trace very similar to the one in the first post, even with the patch
> applied.
>
> [  155.948044] ------------[ cut here ]------------
> [  155.948164] WARNING: CPU: 3 PID: 4850 at
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:678
> amdgpu_ttm_tt_get_user_pages+0x14c/0x190 [amdgpu]
> [  155.948342] Modules linked in: uinput rfcomm snd_seq_dummy
> snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast
> nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
> nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink
> qrtr bnep intel_rapl_msr intel_rapl_common snd_hda_codec_realtek
> snd_sof_amd_renoir snd_sof_amd_acp snd_hda_codec_generic
> snd_hda_codec_hdmi snd_sof_pci sunrpc binfmt_misc snd_sof
> snd_hda_intel snd_sof_utils snd_intel_dspcfg mt7921e
> snd_intel_sdw_acpi snd_hda_codec mt7921_common snd_soc_core
> edac_mce_amd mt76_connac_lib btusb snd_hda_core snd_compress snd_hwdep
> mt76 btrtl ac97_bus kvm_amd snd_pcm_dmaengine btbcm snd_rpl_pci_acp6x
> snd_pci_acp6x btintel mac80211 btmtk snd_seq snd_seq_device kvm
> snd_pcm snd_pci_acp5x libarc4 bluetooth irqbypass vfat snd_timer
> snd_rn_pci_acp3x fat rapl snd_acp_config asus_nb_wmi snd cfg80211
> snd_soc_acpi wmi_bmof k10temp pcspkr
> [  155.948436]  snd_pci_acp3x i2c_piix4 soundcore asus_wireless
> amd_pmc joydev zram amdgpu drm_ttm_helper ttm crct10dif_pclmul
> hid_asus crc32_pclmul asus_wmi crc32c_intel iommu_v2 ledtrig_audio
> polyval_clmulni gpu_sched sparse_keymap polyval_generic
> platform_profile drm_buddy drm_display_helper nvme rfkill
> ghash_clmulni_intel hid_multitouch ucsi_acpi sha512_ssse3 nvme_core
> typec_ucsi serio_raw sp5100_tco r8169 ccp cec nvme_common typec
> i2c_hid_acpi i2c_hid video wmi ip6_tables ip_tables fuse
> [  155.948540] CPU: 3 PID: 4850 Comm: Sackboy-Win64-T Tainted: G
>   W    L    -------  ---
> 6.1.0-0.rc3.20221101git5aaef24b5c6d.29.fc38.x86_64 #1
> [  155.948544] Hardware name: ASUSTeK COMPUTER INC. ROG Strix
> G513QY_G513QY/G513QY, BIOS G513QY.318 03/29/2022
> [  155.948547] RIP: 0010:amdgpu_ttm_tt_get_user_pages+0x14c/0x190 [amdgpu]
> [  155.948748] Code: 9e f1 e9 32 ff ff ff 4c 89 e9 89 ea 48 c7 c6 a8
> a3 fd c0 48 c7 c7 88 81 1e c1 e8 af 97 ea f1 eb 8e 66 90 bd f2 ff ff
> ff eb 8d <0f> 0b eb f5 bd fd ff ff ff eb 82 bd f2 ff ff ff e9 62 ff ff
> ff 48
> [  155.948751] RSP: 0018:ffff960b544d3a50 EFLAGS: 00010282
> [  155.948756] RAX: ffff8a4e40d44e00 RBX: ffff8a4f0e564140 RCX: 0000000000000001
> [  155.948759] RDX: 0000000000000000 RSI: ffff8a4e40d44e00 RDI: ffff8a4f4b52b400
> [  155.948761] RBP: ffff8a4e8c979000 R08: 0000000000000dc0 R09: 00000000ffffffff
> [  155.948764] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8a4e8aaad558
> [  155.948767] R13: 000000003b910000 R14: ffff8a4f0e667180 R15: ffff8a4f4b52b458
> [  155.948770] FS:  00007fa13fe006c0(0000) GS:ffff8a5d16e00000(0000)
> knlGS:0000000036f80000
> [  155.948772] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  155.948775] CR2: 0000000025c9e1d0 CR3: 0000000361990000 CR4: 0000000000750ee0
> [  155.948778] PKRU: 55555554
> [  155.948780] Call Trace:
> [  155.948783]  <TASK>
> [  155.948790]  amdgpu_cs_ioctl+0x9fd/0x2030 [amdgpu]
> [  155.948992]  ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
> [  155.949155]  drm_ioctl_kernel+0xac/0x160
> [  155.949165]  drm_ioctl+0x1e7/0x450
> [  155.949172]  ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
> [  155.949344]  amdgpu_drm_ioctl+0x4a/0x80 [amdgpu]
> [  155.949528]  __x64_sys_ioctl+0x90/0xd0
> [  155.949537]  do_syscall_64+0x5b/0x80
> [  155.949547]  ? lock_is_held_type+0xe8/0x140
> [  155.949559]  ? do_syscall_64+0x67/0x80
> [  155.949565]  ? lockdep_hardirqs_on+0x7d/0x100
> [  155.949573]  ? do_syscall_64+0x67/0x80
> [  155.949579]  ? do_syscall_64+0x67/0x80
> [  155.949586]  ? do_syscall_64+0x67/0x80
> [  155.949592]  ? lockdep_hardirqs_on+0x7d/0x100
> [  155.949597]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [  155.949603] RIP: 0033:0x7fa1b7fd912f
> [  155.949610] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24
> 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00
> 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28
> 00 00
> [  155.949615] RSP: 002b:00007fa13fdfe920 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000010
> [  155.949621] RAX: ffffffffffffffda RBX: 00007fa13fdfebe8 RCX: 00007fa1b7fd912f
> [  155.949625] RDX: 00007fa13fdfea10 RSI: 00000000c0186444 RDI: 0000000000000165
> [  155.949629] RBP: 00007fa13fdfea10 R08: 00007f9ff80018e0 R09: 00007fa13fdfe9c0
> [  155.949633] R10: 000000007eb11590 R11: 0000000000000246 R12: 00000000c0186444
> [  155.949635] R13: 0000000000000165 R14: 00007f9ff8001860 R15: 0000000000000005
> [  155.949647]  </TASK>
> [  155.949650] irq event stamp: 5375
> [  155.949652] hardirqs last  enabled at (5383): [<ffffffffb218e8fe>]
> __up_console_sem+0x5e/0x70
> [  155.949657] hardirqs last disabled at (5390): [<ffffffffb218e8e3>]
> __up_console_sem+0x43/0x70
> [  155.949659] softirqs last  enabled at (3236): [<ffffffffb21012ed>]
> __irq_exit_rcu+0xed/0x160
> [  155.949663] softirqs last disabled at (3231): [<ffffffffb21012ed>]
> __irq_exit_rcu+0xed/0x160
> [  155.949665] ---[ end trace 0000000000000000 ]---
> [  155.949676] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to
> process the buffer list -14!
>
> [  155.950689] ================================================
> [  155.950690] WARNING: lock held when returning to user space!
> [  155.950691] 6.1.0-0.rc3.20221101git5aaef24b5c6d.29.fc38.x86_64 #1
> Tainted: G        W    L    -------  ---
> [  155.950694] ------------------------------------------------
> [  155.950695] Sackboy-Win64-T/4850 is leaving the kernel with locks still held!
> [  155.950697] 1 lock held by Sackboy-Win64-T/4850:
> [  155.950698]  #0: ffff8a4e8aaad0a8
> (&list->bo_list_mutex){+.+.}-{3:3}, at: amdgpu_cs_ioctl+0x903/0x2030
> [amdgpu]
>
> But the most interesting thing is that all previous kernels 6.0, 5.19
> are affected by the problem. It is not enough to revert the
> dd80d9c8eecac8c516da5b240d01a35660ba6cb6 commit.

Yeah, that totally confirms what I expected. The context lock just hides 
the problem because userspace tended to use the same context.

What the application now seems to do is to use multiple contexts for its 
submission and in this case re-adding the lock doesn't even help.

Thanks for that information, gets me a lot closer to a solution.

Regards,
Christian.
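
The point about the context lock can be illustrated with a small model. This is only a sketch under stated assumptions: `Context` and `can_enter_concurrently` are hypothetical names invented here, and Python's `threading.Lock` stands in for a per-context kernel mutex. It shows that a per-context lock serializes two submissions only when they share a context; with two contexts, both submissions can be inside the critical section at once, so any state shared between contexts is not protected by it.

```python
import threading


class Context:
    """Hypothetical stand-in for a submission context that owns its own lock."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()


def can_enter_concurrently(ctx_a, ctx_b):
    """Return True if a submission on ctx_b could start while one on ctx_a is in flight."""
    with ctx_a.lock:
        # Try to take the second context's lock without blocking.
        acquired = ctx_b.lock.acquire(blocking=False)
        if acquired:
            ctx_b.lock.release()
        return acquired


shared = Context("shared")
first, second = Context("first"), Context("second")

# Same context: the second submission has to wait, so the race is hidden.
same_context_overlaps = can_enter_concurrently(shared, shared)
# Different contexts: nothing stops both submissions from running together.
different_contexts_overlap = can_enter_concurrently(first, second)
print(same_context_overlaps, different_contexts_overlap)
```

In this model the lock never fixed anything for data shared across contexts; it only made overlapping access unlikely as long as userspace kept using one context.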

>
> Full kernel log 6.1-rc3 + patch above: https://pastebin.com/6ebmReer
> Full kernel log 5.19: https://pastebin.com/5dRCgxNW
>
> Thanks.
>
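
Two values in the quoted trace can be decoded by hand, which may help when reading similar reports. The sketch below assumes the generic Linux ioctl number layout (8 bits command number, 8 bits type, 14 bits size, 2 bits direction) used on x86-64, and takes `DRM_COMMAND_BASE = 0x40` and `DRM_AMDGPU_CS = 0x04` from the DRM uapi headers as remembered, not checked against this exact kernel. `decode_ioctl` is a hypothetical helper written for this illustration. The inputs are the RSI value from the user-space register dump and the `-14` from the "Failed to process the buffer list" message.

```python
import errno


def decode_ioctl(request):
    """Split an ioctl request number into its fields (generic Linux layout)."""
    return {
        "dir": (request >> 30) & 0x3,      # 1 = write, 2 = read, 3 = both
        "size": (request >> 16) & 0x3FFF,  # size of the argument structure
        "type": chr((request >> 8) & 0xFF),
        "nr": request & 0xFF,
    }


# Assumed values from the DRM uapi headers (not verified here).
DRM_COMMAND_BASE = 0x40
DRM_AMDGPU_CS = 0x04

fields = decode_ioctl(0xC0186444)  # RSI in the user-space register dump
is_amdgpu_cs = fields["nr"] == DRM_COMMAND_BASE + DRM_AMDGPU_CS
error_name = errno.errorcode[14]   # the log reports -14

print(fields, is_amdgpu_cs, error_name)
```

Under those assumptions the request is a read/write DRM ioctl with a 24-byte argument and command number 0x44, which matches `amdgpu_cs_ioctl` in the call trace, and the error is EFAULT ("Bad address").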


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start
  2022-11-02 13:43               ` Christian König
  (?)
@ 2022-11-14  9:43               ` Thorsten Leemhuis
  -1 siblings, 0 replies; 29+ messages in thread
From: Thorsten Leemhuis @ 2022-11-14  9:43 UTC (permalink / raw)
  To: Christian König, Mikhail Gavrilov
  Cc: Deucher, Alexander, dri-devel, Christian König, amd-gfx list

Hi, this is your Linux kernel regression tracker. Top-posting for once,
to make this easily accessible to everyone.

Christian, was any progress made to address this? It looks stalled for
10+ days, as I looked for posts and commits that referenced this report,
but couldn't find anything.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

On 02.11.22 14:43, Christian König wrote:
> Am 02.11.22 um 14:36 schrieb Mikhail Gavrilov:
>> On Tue, Nov 1, 2022 at 10:52 PM Christian König
>> <ckoenig.leichtzumerken@gmail.com> wrote:
>>> Let's focus on one problem at a time.
>>>
>>> The issue here is that somehow userptr handling became racy after we
>>> removed the lock, but I don't see why.
>>>
>>> We need to fix this ASAP since it is probably a much wider problem and
>>> the additional lock just hides it somehow.
>>>
>>> Going to provide you with an updated patch tomorrow.
>>>
>>> Thanks,
>>> Christian.
>> Recently sackboy has been updated and now the kernel log contains a
>> trace very similar to the one in the first post, even with the patch
>> applied.
>>
>> [  155.948044] ------------[ cut here ]------------
>> [  155.948164] WARNING: CPU: 3 PID: 4850 at
>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:678
>> amdgpu_ttm_tt_get_user_pages+0x14c/0x190 [amdgpu]
>> [  155.948342] Modules linked in: uinput rfcomm snd_seq_dummy
>> snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast
>> nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
>> nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
>> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink
>> qrtr bnep intel_rapl_msr intel_rapl_common snd_hda_codec_realtek
>> snd_sof_amd_renoir snd_sof_amd_acp snd_hda_codec_generic
>> snd_hda_codec_hdmi snd_sof_pci sunrpc binfmt_misc snd_sof
>> snd_hda_intel snd_sof_utils snd_intel_dspcfg mt7921e
>> snd_intel_sdw_acpi snd_hda_codec mt7921_common snd_soc_core
>> edac_mce_amd mt76_connac_lib btusb snd_hda_core snd_compress snd_hwdep
>> mt76 btrtl ac97_bus kvm_amd snd_pcm_dmaengine btbcm snd_rpl_pci_acp6x
>> snd_pci_acp6x btintel mac80211 btmtk snd_seq snd_seq_device kvm
>> snd_pcm snd_pci_acp5x libarc4 bluetooth irqbypass vfat snd_timer
>> snd_rn_pci_acp3x fat rapl snd_acp_config asus_nb_wmi snd cfg80211
>> snd_soc_acpi wmi_bmof k10temp pcspkr
>> [  155.948436]  snd_pci_acp3x i2c_piix4 soundcore asus_wireless
>> amd_pmc joydev zram amdgpu drm_ttm_helper ttm crct10dif_pclmul
>> hid_asus crc32_pclmul asus_wmi crc32c_intel iommu_v2 ledtrig_audio
>> polyval_clmulni gpu_sched sparse_keymap polyval_generic
>> platform_profile drm_buddy drm_display_helper nvme rfkill
>> ghash_clmulni_intel hid_multitouch ucsi_acpi sha512_ssse3 nvme_core
>> typec_ucsi serio_raw sp5100_tco r8169 ccp cec nvme_common typec
>> i2c_hid_acpi i2c_hid video wmi ip6_tables ip_tables fuse
>> [  155.948540] CPU: 3 PID: 4850 Comm: Sackboy-Win64-T Tainted: G
>>   W    L    -------  ---
>> 6.1.0-0.rc3.20221101git5aaef24b5c6d.29.fc38.x86_64 #1
>> [  155.948544] Hardware name: ASUSTeK COMPUTER INC. ROG Strix
>> G513QY_G513QY/G513QY, BIOS G513QY.318 03/29/2022
>> [  155.948547] RIP: 0010:amdgpu_ttm_tt_get_user_pages+0x14c/0x190
>> [amdgpu]
>> [  155.948748] Code: 9e f1 e9 32 ff ff ff 4c 89 e9 89 ea 48 c7 c6 a8
>> a3 fd c0 48 c7 c7 88 81 1e c1 e8 af 97 ea f1 eb 8e 66 90 bd f2 ff ff
>> ff eb 8d <0f> 0b eb f5 bd fd ff ff ff eb 82 bd f2 ff ff ff e9 62 ff ff
>> ff 48
>> [  155.948751] RSP: 0018:ffff960b544d3a50 EFLAGS: 00010282
>> [  155.948756] RAX: ffff8a4e40d44e00 RBX: ffff8a4f0e564140 RCX:
>> 0000000000000001
>> [  155.948759] RDX: 0000000000000000 RSI: ffff8a4e40d44e00 RDI:
>> ffff8a4f4b52b400
>> [  155.948761] RBP: ffff8a4e8c979000 R08: 0000000000000dc0 R09:
>> 00000000ffffffff
>> [  155.948764] R10: 0000000000000001 R11: 0000000000000000 R12:
>> ffff8a4e8aaad558
>> [  155.948767] R13: 000000003b910000 R14: ffff8a4f0e667180 R15:
>> ffff8a4f4b52b458
>> [  155.948770] FS:  00007fa13fe006c0(0000) GS:ffff8a5d16e00000(0000)
>> knlGS:0000000036f80000
>> [  155.948772] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  155.948775] CR2: 0000000025c9e1d0 CR3: 0000000361990000 CR4:
>> 0000000000750ee0
>> [  155.948778] PKRU: 55555554
>> [  155.948780] Call Trace:
>> [  155.948783]  <TASK>
>> [  155.948790]  amdgpu_cs_ioctl+0x9fd/0x2030 [amdgpu]
>> [  155.948992]  ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
>> [  155.949155]  drm_ioctl_kernel+0xac/0x160
>> [  155.949165]  drm_ioctl+0x1e7/0x450
>> [  155.949172]  ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
>> [  155.949344]  amdgpu_drm_ioctl+0x4a/0x80 [amdgpu]
>> [  155.949528]  __x64_sys_ioctl+0x90/0xd0
>> [  155.949537]  do_syscall_64+0x5b/0x80
>> [  155.949547]  ? lock_is_held_type+0xe8/0x140
>> [  155.949559]  ? do_syscall_64+0x67/0x80
>> [  155.949565]  ? lockdep_hardirqs_on+0x7d/0x100
>> [  155.949573]  ? do_syscall_64+0x67/0x80
>> [  155.949579]  ? do_syscall_64+0x67/0x80
>> [  155.949586]  ? do_syscall_64+0x67/0x80
>> [  155.949592]  ? lockdep_hardirqs_on+0x7d/0x100
>> [  155.949597]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
>> [  155.949603] RIP: 0033:0x7fa1b7fd912f
>> [  155.949610] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24
>> 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00
>> 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28
>> 00 00
>> [  155.949615] RSP: 002b:00007fa13fdfe920 EFLAGS: 00000246 ORIG_RAX:
>> 0000000000000010
>> [  155.949621] RAX: ffffffffffffffda RBX: 00007fa13fdfebe8 RCX:
>> 00007fa1b7fd912f
>> [  155.949625] RDX: 00007fa13fdfea10 RSI: 00000000c0186444 RDI:
>> 0000000000000165
>> [  155.949629] RBP: 00007fa13fdfea10 R08: 00007f9ff80018e0 R09:
>> 00007fa13fdfe9c0
>> [  155.949633] R10: 000000007eb11590 R11: 0000000000000246 R12:
>> 00000000c0186444
>> [  155.949635] R13: 0000000000000165 R14: 00007f9ff8001860 R15:
>> 0000000000000005
>> [  155.949647]  </TASK>
>> [  155.949650] irq event stamp: 5375
>> [  155.949652] hardirqs last  enabled at (5383): [<ffffffffb218e8fe>]
>> __up_console_sem+0x5e/0x70
>> [  155.949657] hardirqs last disabled at (5390): [<ffffffffb218e8e3>]
>> __up_console_sem+0x43/0x70
>> [  155.949659] softirqs last  enabled at (3236): [<ffffffffb21012ed>]
>> __irq_exit_rcu+0xed/0x160
>> [  155.949663] softirqs last disabled at (3231): [<ffffffffb21012ed>]
>> __irq_exit_rcu+0xed/0x160
>> [  155.949665] ---[ end trace 0000000000000000 ]---
>> [  155.949676] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to
>> process the buffer list -14!
>>
>> [  155.950689] ================================================
>> [  155.950690] WARNING: lock held when returning to user space!
>> [  155.950691] 6.1.0-0.rc3.20221101git5aaef24b5c6d.29.fc38.x86_64 #1
>> Tainted: G        W    L    -------  ---
>> [  155.950694] ------------------------------------------------
>> [  155.950695] Sackboy-Win64-T/4850 is leaving the kernel with locks
>> still held!
>> [  155.950697] 1 lock held by Sackboy-Win64-T/4850:
>> [  155.950698]  #0: ffff8a4e8aaad0a8
>> (&list->bo_list_mutex){+.+.}-{3:3}, at: amdgpu_cs_ioctl+0x903/0x2030
>> [amdgpu]
>>
>> But the most interesting thing is that all previous kernels 6.0, 5.19
>> are affected by the problem. It is not enough to revert the
>> dd80d9c8eecac8c516da5b240d01a35660ba6cb6 commit.
> 
> Yeah, that totally confirms what I expected. The context lock just hides
> the problem because userspace tended to use the same context.
> 
> What the application now seems to do is to use multiple contexts for its
> submission and in this case re-adding the lock doesn't even help.
> 
> Thanks for that information, gets me a lot closer to a solution.
> 
> Regards,
> Christian.
> 
>>
>> Full kernel log 6.1-rc3 + patch above: https://pastebin.com/6ebmReer
>> Full kernel log 5.19: https://pastebin.com/5dRCgxNW
>>
>> Thanks.
>>
> 
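
The "lock held when returning to user space" warning in the quoted log follows directly after the "Failed to process the buffer list -14" error, which suggests an error return that skipped the unlock. The following is a minimal model of that class of bug, not the amdgpu code: `bo_list_mutex` is a plain Python lock borrowing the name from the log, and `process_buffer_list`, `submit_leaky` and `submit_fixed` are hypothetical functions made up for the illustration.

```python
import threading

bo_list_mutex = threading.Lock()  # stand-in for the mutex named in the log


def process_buffer_list(fail):
    """Hypothetical step that can fail with EFAULT (errno 14)."""
    if fail:
        raise OSError(14, "Bad address")


def submit_leaky(fail):
    bo_list_mutex.acquire()
    try:
        process_buffer_list(fail)
    except OSError as err:
        return -err.errno  # bug: early return with the mutex still held
    bo_list_mutex.release()
    return 0


def submit_fixed(fail):
    # The context manager releases the mutex on every exit path.
    with bo_list_mutex:
        try:
            process_buffer_list(fail)
        except OSError as err:
            return -err.errno
    return 0


leaky_result = submit_leaky(fail=True)
leaked = bo_list_mutex.locked()
bo_list_mutex.release()  # clean up after the leaky call

fixed_result = submit_fixed(fail=True)
still_locked = bo_list_mutex.locked()
print(leaky_result, leaked, fixed_result, still_locked)
```

In C the usual equivalent of the `with` block is a single unlock label that every error path jumps to before returning.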

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start
  2022-11-02 13:43               ` Christian König
@ 2022-11-14 13:22                 ` Christian König
  -1 siblings, 0 replies; 29+ messages in thread
From: Christian König @ 2022-11-14 13:22 UTC (permalink / raw)
  To: Christian König, Mikhail Gavrilov
  Cc: Deucher, Alexander, amd-gfx list, dri-devel

Hi Mikhail,

Am 02.11.22 um 14:43 schrieb Christian König:
> Am 02.11.22 um 14:36 schrieb Mikhail Gavrilov:
>> On Tue, Nov 1, 2022 at 10:52 PM Christian König
>> <ckoenig.leichtzumerken@gmail.com> wrote:
>> [SNIP]
>> But the most interesting thing is that all previous kernels 6.0, 5.19
>> are affected by the problem. It is not enough to revert the
>> dd80d9c8eecac8c516da5b240d01a35660ba6cb6 commit.
>
> Yeah, that totally confirms what I expected. The context lock just 
> hides the problem because userspace tended to use the same context.
>
> What the application now seems to do is to use multiple contexts for 
> its submission and in this case re-adding the lock doesn't even help.
>
> Thanks for that information, gets me a lot closer to a solution.

I've found and fixed a few problems around the userptr handling which 
might explain what you see here.

A series of four patches starting with "drm/amdgpu: always register an 
MMU notifier for userptr" is under review now.

Going to give that a bit of cleanup later today and will CC you when I send 
that out. Would be nice if you could give that some testing.

Thanks,
Christian.

>
> Regards,
> Christian.
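
As background on why userptr handling depends on invalidation tracking, here is a generic sketch of the "snapshot a sequence number, collect pages, retry if the sequence changed" pattern. It is not taken from the patch series mentioned above. The class and function names are hypothetical, and the model only loosely mirrors the kernel's `mmu_interval_read_begin()` / `mmu_interval_read_retry()` helpers.

```python
class UserptrRange:
    """Hypothetical model of a user memory range with an invalidation counter."""

    def __init__(self, pages):
        self.pages = list(pages)
        self.seq = 0

    def invalidate(self, new_pages):
        # What a notifier callback would do: the mapping changed underneath us.
        self.pages = list(new_pages)
        self.seq += 1

    def read_begin(self):
        return self.seq

    def read_retry(self, seq):
        return self.seq != seq


def get_pages(rng, racing_invalidation=None):
    """Collect a page snapshot, retrying if an invalidation raced with it."""
    attempts = 0
    while True:
        attempts += 1
        seq = rng.read_begin()
        snapshot = list(rng.pages)
        if racing_invalidation is not None:
            racing_invalidation(rng)  # simulate one concurrent invalidation
            racing_invalidation = None
        if not rng.read_retry(seq):
            return snapshot, attempts


rng = UserptrRange(["p0", "p1"])
quiet = get_pages(rng)
raced = get_pages(rng, lambda r: r.invalidate(["p2", "p3"]))
print(quiet, raced)
```

Without a registered notifier there is nothing to bump the counter, so a stale snapshot would be accepted; that is the kind of gap this pattern is meant to close.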


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start #forregzbot
  2022-11-14 13:22                 ` Christian König
  (?)
@ 2022-11-20 17:25                 ` Thorsten Leemhuis
  2022-11-27 10:56                   ` Thorsten Leemhuis
  -1 siblings, 1 reply; 29+ messages in thread
From: Thorsten Leemhuis @ 2022-11-20 17:25 UTC (permalink / raw)
  To: amd-gfx list; +Cc: dri-devel

[Note: this mail is primarily sent for documentation purposes and/or for
regzbot, my Linux kernel regression tracking bot. That's why I removed
most or all folks from the list of recipients, but left any that looked
like mailing lists. These mails usually contain '#forregzbot' in the
subject, to make them easy to spot and filter out.]

On 14.11.22 14:22, Christian König wrote:
> 
> I've found and fixed a few problems around the userptr handling which
> might explain what you see here.
> 
> A series of four patches starting with "drm/amdgpu: always register an
> MMU notifier for userptr" is under review now.

#regzbot monitor:
https://lore.kernel.org/all/20221115133853.7950-1-christian.koenig@amd.com/
#regzbot fixed-by: fec8fdb54e8f
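
For anyone filtering such mails, the commands above are easy to pull out of a message body. The sketch below is an assumption-laden illustration, not regzbot's own parser: the regular expression and `extract_regzbot_commands` are made up here, and the only rule encoded is the one visible in this mail, namely that an argument may follow on the next line when the command line ends with a colon.

```python
import re

# Matches "#regzbot <command>[:] [argument]" at the start of a line.
COMMAND = re.compile(r"^#regzbot\s+([\w-]+):?\s*(.*)$")


def extract_regzbot_commands(body):
    """Return (command, argument) pairs found in a mail body."""
    lines = body.splitlines()
    commands = []
    for index, line in enumerate(lines):
        match = COMMAND.match(line.strip())
        if not match:
            continue
        name, argument = match.group(1), match.group(2).strip()
        # The argument may be wrapped onto the following line.
        if not argument and index + 1 < len(lines):
            argument = lines[index + 1].strip()
        commands.append((name, argument))
    return commands


mail = """#regzbot monitor:
https://lore.kernel.org/all/20221115133853.7950-1-christian.koenig@amd.com/
#regzbot fixed-by: fec8fdb54e8f
"""
commands = extract_regzbot_commands(mail)
print(commands)
```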



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start
  2022-11-14 13:22                 ` Christian König
@ 2022-11-21 23:42                   ` Mikhail Gavrilov
  -1 siblings, 0 replies; 29+ messages in thread
From: Mikhail Gavrilov @ 2022-11-21 23:42 UTC (permalink / raw)
  To: Christian König
  Cc: Deucher, Alexander, Christian König, amd-gfx list, dri-devel

On Mon, Nov 14, 2022 at 6:22 PM Christian König
<christian.koenig@amd.com> wrote:
>
> I've found and fixed a few problems around the userptr handling which
> might explain what you see here.
>
> A series of four patches starting with "drm/amdgpu: always register an
> MMU notifier for userptr" is under review now.
>
> Going to give that a bit of cleanup later today and will CC you when I send
> that out. Would be nice if you could give that some testing.
>
> Thanks,
> Christian.
>

Christian, I tested all four patches for about a week and can say that this
issue is completely gone.
All known broken games are working.
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>

The only thing I don't like is the flood in the kernel logs of the
message "WARNING message at drivers/gpu/drm/drm_modeset_lock.c:276
drm_modeset_drop_locks+0x63/0x70", but this is not related to the
patches being checked.
All kernel logs uploaded to pastebin [1][2][3][4][5][6][7][8]

I wrote a separate bug report about "drm_modeset_lock" [9], it's a
pity that no one paid attention to it. I even found the first bad
commit. It is b261509952bc19d1012cf732f853659be6ebc61e.

[1] https://pastebin.com/WZWczupk
[2] https://pastebin.com/f4i9pvjS
[3] https://pastebin.com/rsDWaMR1
[4] https://pastebin.com/tDNEYJq0
[5] https://pastebin.com/xfZVbm1f
[6] https://pastebin.com/Vx9gDyKt
[7] https://pastebin.com/XvRkLckV
[8] https://pastebin.com/pd8WBkgx
[9] https://www.spinics.net/lists/dri-devel/msg367543.html

Thanks.

-- 
Best Regards,
Mike Gavrilov.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start
  2022-11-21 23:42                   ` Mikhail Gavrilov
@ 2022-11-22  7:16                     ` Christian König
  -1 siblings, 0 replies; 29+ messages in thread
From: Christian König @ 2022-11-22  7:16 UTC (permalink / raw)
  To: Mikhail Gavrilov
  Cc: Deucher, Alexander, Christian König, amd-gfx list, dri-devel

Am 22.11.22 um 00:42 schrieb Mikhail Gavrilov:
> On Mon, Nov 14, 2022 at 6:22 PM Christian König
> <christian.koenig@amd.com> wrote:
>> I've found and fixed a few problems around the userptr handling which
>> might explain what you see here.
>>
>> A series of four patches starting with "drm/amdgpu: always register an
>> MMU notifier for userptr" is under review now.
>>
>> Going to give that a bit of cleanup later today and will CC you when I send
>> that out. Would be nice if you could give that some testing.
>>
>> Thanks,
>> Christian.
>>
> Christian, I tested all four patches for about a week and can say that this
> issue is completely gone.
> All known broken games are working.
> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>

Ah, thanks a lot for this. I've already pushed the patches into our 
internal branch, but getting this confirmation is still great!

This was quite a fundamental bug in the handling, and I hope to get 
this completely reworked at some point, since it is currently only mitigated.

> The only thing I don't like is the flood in the kernel logs of the
> message "WARNING message at drivers/gpu/drm/drm_modeset_lock.c:276
> drm_modeset_drop_locks+0x63/0x70", but this is not related to the
> patches being checked.
> All kernel logs uploaded to pastebin [1][2][3][4][5][6][7][8]

No idea what that could be. Modesetting is not something I work on.

The best advice I can give you is to maybe ping Harry and our other 
display people, they should know that stuff better than I do.

Thanks,
Christian.

>
> I wrote a separate bug report about "drm_modeset_lock" [9], it's a
> pity that no one paid attention to it. I even found the first bad
> commit. It is b261509952bc19d1012cf732f853659be6ebc61e.
>
> [1] https://pastebin.com/WZWczupk
> [2] https://pastebin.com/f4i9pvjS
> [3] https://pastebin.com/rsDWaMR1
> [4] https://pastebin.com/tDNEYJq0
> [5] https://pastebin.com/xfZVbm1f
> [6] https://pastebin.com/Vx9gDyKt
> [7] https://pastebin.com/XvRkLckV
> [8] https://pastebin.com/pd8WBkgx
> [9] https://www.spinics.net/lists/dri-devel/msg367543.html
>
> Thanks.
>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start #forregzbot
  2022-11-20 17:25                 ` [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start #forregzbot Thorsten Leemhuis
@ 2022-11-27 10:56                   ` Thorsten Leemhuis
  0 siblings, 0 replies; 29+ messages in thread
From: Thorsten Leemhuis @ 2022-11-27 10:56 UTC (permalink / raw)
  To: amd-gfx list; +Cc: dri-devel



On 20.11.22 18:25, Thorsten Leemhuis wrote:
> [Note: this mail is primarily sent for documentation purposes and/or for
> regzbot, my Linux kernel regression tracking bot. That's why I removed
> most or all folks from the list of recipients, but left any that looked
> like a mailing list. These mails usually contain '#forregzbot' in the
> subject, to make them easy to spot and filter out.]
> 
> On 14.11.22 14:22, Christian König wrote:
>>
>> I've found and fixed a few problems around the userptr handling which
>> might explain what you see here.
>>
>> A series of four patches starting with "drm/amdgpu: always register an
>> MMU notifier for userptr" is under review now.
> 
> #regzbot monitor:
> https://lore.kernel.org/all/20221115133853.7950-1-christian.koenig@amd.com/
> #regzbot fixed-by: fec8fdb54e8f

#regzbot fixed-by: 4458da0bb09d443595


* Re: [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start
  2022-11-22  7:16                     ` Christian König
@ 2022-11-28 23:16                       ` Mikhail Gavrilov
  -1 siblings, 0 replies; 29+ messages in thread
From: Mikhail Gavrilov @ 2022-11-28 23:16 UTC (permalink / raw)
  To: Christian König
  Cc: Deucher, Alexander, Christian König, amd-gfx list, dri-devel

On Tue, Nov 22, 2022 at 12:16 PM Christian König
<christian.koenig@amd.com> wrote:
>
> Ah, thanks a lot for this. I've already pushed the patches into our
> internal branch, but getting this confirmation is still great!
>
> This was quite a fundamental bug in the handling and I hope to get
> this completely reworked at some point, since it is currently only mitigated.

Looks like the final version of this patch was successfully merged into 6.1-rc7.
Big thanks, all games work again!

> No idea what that could be. Modesetting is not something I work on.
>
> The best advice I can give you is to maybe ping Harry and our other
> display people, they should know that stuff better than I do.

Unfortunately Harry didn't answer. I hope my email wasn't marked as spam.

-- 
Best Regards,
Mike Gavrilov.


Thread overview: 29+ messages
2022-10-21  8:08 [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start Mikhail Gavrilov
2022-10-21  8:08 ` Mikhail Gavrilov
2022-10-21  8:32 ` Christian König
2022-10-21  8:32   ` Christian König
2022-10-21 12:36   ` Mikhail Gavrilov
2022-10-21 12:36     ` Mikhail Gavrilov
2022-10-26  7:29     ` Christian König
2022-10-26  7:29       ` Christian König
2022-10-30 22:05       ` Mikhail Gavrilov
2022-10-30 22:05         ` Mikhail Gavrilov
2022-11-01 17:52         ` Christian König
2022-11-01 17:52           ` Christian König
2022-11-02 13:36           ` Mikhail Gavrilov
2022-11-02 13:36             ` Mikhail Gavrilov
2022-11-02 13:43             ` Christian König
2022-11-02 13:43               ` Christian König
2022-11-14  9:43               ` Thorsten Leemhuis
2022-11-14 13:22               ` Christian König
2022-11-14 13:22                 ` Christian König
2022-11-20 17:25                 ` [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start #forregzbot Thorsten Leemhuis
2022-11-27 10:56                   ` Thorsten Leemhuis
2022-11-21 23:42                 ` [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start Mikhail Gavrilov
2022-11-21 23:42                   ` Mikhail Gavrilov
2022-11-22  7:16                   ` Christian König
2022-11-22  7:16                     ` Christian König
2022-11-28 23:16                     ` Mikhail Gavrilov
2022-11-28 23:16                       ` Mikhail Gavrilov
2022-10-23 13:20 ` [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start #forregzbot Thorsten Leemhuis
2022-10-23 13:20   ` Thorsten Leemhuis
