* Lockdep splat on killing a process
@ 2021-10-01 10:50 Christian König
  2021-10-01 14:52 ` Daniel Vetter
  2021-10-01 15:10 ` Andrey Grodzovsky
  0 siblings, 2 replies; 19+ messages in thread
From: Christian König @ 2021-10-01 10:50 UTC (permalink / raw)
  To: Andrey Grodzovsky, amd-gfx list, dri-devel

Hey, Andrey.

while investigating some memory management problems I've got the lockdep 
splat below.

Looks like something is wrong with drm_sched_entity_kill_jobs_cb(), can 
you investigate?

Thanks,
Christian.

[11176.741052] ============================================
[11176.741056] WARNING: possible recursive locking detected
[11176.741060] 5.15.0-rc1-00031-g9d546d600800 #171 Not tainted
[11176.741066] --------------------------------------------
[11176.741070] swapper/12/0 is trying to acquire lock:
[11176.741074] ffff9c337ed175a8 (&fence->lock){-.-.}-{3:3}, at: 
dma_fence_signal+0x28/0x80
[11176.741088]
                but task is already holding lock:
[11176.741092] ffff9c337ed172a8 (&fence->lock){-.-.}-{3:3}, at: 
dma_fence_signal+0x28/0x80
[11176.741100]
                other info that might help us debug this:
[11176.741104]  Possible unsafe locking scenario:

[11176.741108]        CPU0
[11176.741110]        ----
[11176.741113]   lock(&fence->lock);
[11176.741118]   lock(&fence->lock);
[11176.741122]
                 *** DEADLOCK ***

[11176.741125]  May be due to missing lock nesting notation

[11176.741128] 2 locks held by swapper/12/0:
[11176.741133]  #0: ffff9c339c30f768 
(&ring->fence_drv.lock){-.-.}-{3:3}, at: dma_fence_signal+0x28/0x80
[11176.741142]  #1: ffff9c337ed172a8 (&fence->lock){-.-.}-{3:3}, at: 
dma_fence_signal+0x28/0x80
[11176.741151]
                stack backtrace:
[11176.741155] CPU: 12 PID: 0 Comm: swapper/12 Not tainted 
5.15.0-rc1-00031-g9d546d600800 #171
[11176.741160] Hardware name: System manufacturer System Product 
Name/PRIME X399-A, BIOS 0808 10/12/2018
[11176.741165] Call Trace:
[11176.741169]  <IRQ>
[11176.741173]  dump_stack_lvl+0x5b/0x74
[11176.741181]  dump_stack+0x10/0x12
[11176.741186]  __lock_acquire.cold+0x208/0x2df
[11176.741197]  lock_acquire+0xc6/0x2d0
[11176.741204]  ? dma_fence_signal+0x28/0x80
[11176.741212]  _raw_spin_lock_irqsave+0x4d/0x70
[11176.741219]  ? dma_fence_signal+0x28/0x80
[11176.741225]  dma_fence_signal+0x28/0x80
[11176.741230]  drm_sched_fence_finished+0x12/0x20 [gpu_sched]
[11176.741240]  drm_sched_entity_kill_jobs_cb+0x1c/0x50 [gpu_sched]
[11176.741248]  dma_fence_signal_timestamp_locked+0xac/0x1a0
[11176.741254]  dma_fence_signal+0x3b/0x80
[11176.741260]  drm_sched_fence_finished+0x12/0x20 [gpu_sched]
[11176.741268]  drm_sched_job_done.isra.0+0x7f/0x1a0 [gpu_sched]
[11176.741277]  drm_sched_job_done_cb+0x12/0x20 [gpu_sched]
[11176.741284]  dma_fence_signal_timestamp_locked+0xac/0x1a0
[11176.741290]  dma_fence_signal+0x3b/0x80
[11176.741296]  amdgpu_fence_process+0xd1/0x140 [amdgpu]
[11176.741504]  sdma_v4_0_process_trap_irq+0x8c/0xb0 [amdgpu]
[11176.741731]  amdgpu_irq_dispatch+0xce/0x250 [amdgpu]
[11176.741954]  amdgpu_ih_process+0x81/0x100 [amdgpu]
[11176.742174]  amdgpu_irq_handler+0x26/0xa0 [amdgpu]
[11176.742393]  __handle_irq_event_percpu+0x4f/0x2c0
[11176.742402]  handle_irq_event_percpu+0x33/0x80
[11176.742408]  handle_irq_event+0x39/0x60
[11176.742414]  handle_edge_irq+0x93/0x1d0
[11176.742419]  __common_interrupt+0x50/0xe0
[11176.742426]  common_interrupt+0x80/0x90
[11176.742431]  </IRQ>
[11176.742436]  asm_common_interrupt+0x1e/0x40
[11176.742442] RIP: 0010:cpuidle_enter_state+0xff/0x470
[11176.742449] Code: 0f a3 05 04 54 24 01 0f 82 70 02 00 00 31 ff e8 37 
5d 6f ff 80 7d d7 00 0f 85 e9 01 00 00 e8 58 a2 7f ff fb 66 0f 1f 44 00 
00 <45> 85 ff 0f 88 01 01 00 00 49 63 c7 4c 2b 75 c8 48 8d 14 40 48 8d
[11176.742455] RSP: 0018:ffffb6970021fe48 EFLAGS: 00000202
[11176.742461] RAX: 000000000059be25 RBX: 0000000000000002 RCX: 
0000000000000000
[11176.742465] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
ffffffff9efeed78
[11176.742470] RBP: ffffb6970021fe80 R08: 0000000000000001 R09: 
0000000000000001
[11176.742473] R10: 0000000000000001 R11: 0000000000000001 R12: 
ffff9c3350b0e800
[11176.742477] R13: ffffffffa00e9680 R14: 00000a2a49ada060 R15: 
0000000000000002
[11176.742483]  ? cpuidle_enter_state+0xf8/0x470
[11176.742489]  ? cpuidle_enter_state+0xf8/0x470
[11176.742495]  cpuidle_enter+0x2e/0x40
[11176.742500]  call_cpuidle+0x23/0x40
[11176.742506]  do_idle+0x201/0x280
[11176.742512]  cpu_startup_entry+0x20/0x30
[11176.742517]  start_secondary+0x11f/0x160
[11176.742523]  secondary_startup_64_no_verify+0xb0/0xbb



* Re: Lockdep splat on killing a process
  2021-10-01 10:50 Lockdep splat on killing a process Christian König
@ 2021-10-01 14:52 ` Daniel Vetter
  2021-10-01 15:10 ` Andrey Grodzovsky
  1 sibling, 0 replies; 19+ messages in thread
From: Daniel Vetter @ 2021-10-01 14:52 UTC (permalink / raw)
  To: Christian König; +Cc: Andrey Grodzovsky, amd-gfx list, dri-devel

On Fri, Oct 01, 2021 at 12:50:35PM +0200, Christian König wrote:
> Hey, Andrey.
> 
> while investigating some memory management problems I've got the logdep
> splat below.
> 
> Looks like something is wrong with drm_sched_entity_kill_jobs_cb(), can you
> investigate?

Probably needs more irq_work so that we get rid of fence completion
signalling recursions.
-Daniel

> 
> Thanks,
> Christian.
> 
> [11176.741052] ============================================
> [11176.741056] WARNING: possible recursive locking detected
> [11176.741060] 5.15.0-rc1-00031-g9d546d600800 #171 Not tainted
> [11176.741066] --------------------------------------------
> [11176.741070] swapper/12/0 is trying to acquire lock:
> [11176.741074] ffff9c337ed175a8 (&fence->lock){-.-.}-{3:3}, at:
> dma_fence_signal+0x28/0x80
> [11176.741088]
>                but task is already holding lock:
> [11176.741092] ffff9c337ed172a8 (&fence->lock){-.-.}-{3:3}, at:
> dma_fence_signal+0x28/0x80
> [11176.741100]
>                other info that might help us debug this:
> [11176.741104]  Possible unsafe locking scenario:
> 
> [11176.741108]        CPU0
> [11176.741110]        ----
> [11176.741113]   lock(&fence->lock);
> [11176.741118]   lock(&fence->lock);
> [11176.741122]
>                 *** DEADLOCK ***
> 
> [11176.741125]  May be due to missing lock nesting notation
> 
> [11176.741128] 2 locks held by swapper/12/0:
> [11176.741133]  #0: ffff9c339c30f768 (&ring->fence_drv.lock){-.-.}-{3:3},
> at: dma_fence_signal+0x28/0x80
> [11176.741142]  #1: ffff9c337ed172a8 (&fence->lock){-.-.}-{3:3}, at:
> dma_fence_signal+0x28/0x80
> [11176.741151]
>                stack backtrace:
> [11176.741155] CPU: 12 PID: 0 Comm: swapper/12 Not tainted
> 5.15.0-rc1-00031-g9d546d600800 #171
> [11176.741160] Hardware name: System manufacturer System Product Name/PRIME
> X399-A, BIOS 0808 10/12/2018
> [11176.741165] Call Trace:
> [11176.741169]  <IRQ>
> [11176.741173]  dump_stack_lvl+0x5b/0x74
> [11176.741181]  dump_stack+0x10/0x12
> [11176.741186]  __lock_acquire.cold+0x208/0x2df
> [11176.741197]  lock_acquire+0xc6/0x2d0
> [11176.741204]  ? dma_fence_signal+0x28/0x80
> [11176.741212]  _raw_spin_lock_irqsave+0x4d/0x70
> [11176.741219]  ? dma_fence_signal+0x28/0x80
> [11176.741225]  dma_fence_signal+0x28/0x80
> [11176.741230]  drm_sched_fence_finished+0x12/0x20 [gpu_sched]
> [11176.741240]  drm_sched_entity_kill_jobs_cb+0x1c/0x50 [gpu_sched]
> [11176.741248]  dma_fence_signal_timestamp_locked+0xac/0x1a0
> [11176.741254]  dma_fence_signal+0x3b/0x80
> [11176.741260]  drm_sched_fence_finished+0x12/0x20 [gpu_sched]
> [11176.741268]  drm_sched_job_done.isra.0+0x7f/0x1a0 [gpu_sched]
> [11176.741277]  drm_sched_job_done_cb+0x12/0x20 [gpu_sched]
> [11176.741284]  dma_fence_signal_timestamp_locked+0xac/0x1a0
> [11176.741290]  dma_fence_signal+0x3b/0x80
> [11176.741296]  amdgpu_fence_process+0xd1/0x140 [amdgpu]
> [11176.741504]  sdma_v4_0_process_trap_irq+0x8c/0xb0 [amdgpu]
> [11176.741731]  amdgpu_irq_dispatch+0xce/0x250 [amdgpu]
> [11176.741954]  amdgpu_ih_process+0x81/0x100 [amdgpu]
> [11176.742174]  amdgpu_irq_handler+0x26/0xa0 [amdgpu]
> [11176.742393]  __handle_irq_event_percpu+0x4f/0x2c0
> [11176.742402]  handle_irq_event_percpu+0x33/0x80
> [11176.742408]  handle_irq_event+0x39/0x60
> [11176.742414]  handle_edge_irq+0x93/0x1d0
> [11176.742419]  __common_interrupt+0x50/0xe0
> [11176.742426]  common_interrupt+0x80/0x90
> [11176.742431]  </IRQ>
> [11176.742436]  asm_common_interrupt+0x1e/0x40
> [11176.742442] RIP: 0010:cpuidle_enter_state+0xff/0x470
> [11176.742449] Code: 0f a3 05 04 54 24 01 0f 82 70 02 00 00 31 ff e8 37 5d
> 6f ff 80 7d d7 00 0f 85 e9 01 00 00 e8 58 a2 7f ff fb 66 0f 1f 44 00 00 <45>
> 85 ff 0f 88 01 01 00 00 49 63 c7 4c 2b 75 c8 48 8d 14 40 48 8d
> [11176.742455] RSP: 0018:ffffb6970021fe48 EFLAGS: 00000202
> [11176.742461] RAX: 000000000059be25 RBX: 0000000000000002 RCX:
> 0000000000000000
> [11176.742465] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> ffffffff9efeed78
> [11176.742470] RBP: ffffb6970021fe80 R08: 0000000000000001 R09:
> 0000000000000001
> [11176.742473] R10: 0000000000000001 R11: 0000000000000001 R12:
> ffff9c3350b0e800
> [11176.742477] R13: ffffffffa00e9680 R14: 00000a2a49ada060 R15:
> 0000000000000002
> [11176.742483]  ? cpuidle_enter_state+0xf8/0x470
> [11176.742489]  ? cpuidle_enter_state+0xf8/0x470
> [11176.742495]  cpuidle_enter+0x2e/0x40
> [11176.742500]  call_cpuidle+0x23/0x40
> [11176.742506]  do_idle+0x201/0x280
> [11176.742512]  cpu_startup_entry+0x20/0x30
> [11176.742517]  start_secondary+0x11f/0x160
> [11176.742523]  secondary_startup_64_no_verify+0xb0/0xbb
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


* Re: Lockdep splat on killing a process
  2021-10-01 10:50 Lockdep splat on killing a process Christian König
  2021-10-01 14:52 ` Daniel Vetter
@ 2021-10-01 15:10 ` Andrey Grodzovsky
  2021-10-04  8:14   ` Christian König
  1 sibling, 1 reply; 19+ messages in thread
From: Andrey Grodzovsky @ 2021-10-01 15:10 UTC (permalink / raw)
  To: Christian König, amd-gfx list, dri-devel

From what I see here you are supposed to get an actual deadlock and not 
only a warning: sched_fence->finished is first signaled from within the
hw fence done callback (drm_sched_job_done_cb), but then again from 
within its own callback (drm_sched_entity_kill_jobs_cb), so it looks 
like the same fence object is recursively signaled twice. This leads 
to an attempt to lock fence->lock a second time while it's already
locked. I don't see a need to call drm_sched_fence_finished from within 
drm_sched_entity_kill_jobs_cb, as this callback is already registered
on the sched_fence->finished fence (entity->last_scheduled == 
s_fence->finished) and hence the signaling already took place.

Andrey
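
To make the pattern behind the splat concrete, here is a minimal, 
hypothetical sketch (not the actual scheduler code; the container struct 
and names are made up, and it assumes both fences' spinlocks belong to 
the same lockdep class, as in the trace above): a callback attached to 
one fence signals another fence, so the second fence's lock is taken 
while the first fence's lock is still held.

#include <linux/dma-fence.h>

/* Hypothetical container: a callback registered on fence A that, when
 * invoked, signals a second fence B of the same lock class. */
struct chained_signal {
	struct dma_fence_cb cb;
	struct dma_fence *next;		/* fence B */
};

static void chained_signal_cb(struct dma_fence *a, struct dma_fence_cb *cb)
{
	struct chained_signal *c = container_of(cb, struct chained_signal, cb);

	/*
	 * We are called from dma_fence_signal(a) with a->lock held.
	 * dma_fence_signal(c->next) now takes next->lock, which lockdep
	 * sees as the same class (&fence->lock), hence the "possible
	 * recursive locking" report; if B's callbacks do the same thing
	 * again, the nesting repeats exactly as in the trace above.
	 */
	dma_fence_signal(c->next);
}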

On 2021-10-01 6:50 a.m., Christian König wrote:
> Hey, Andrey.
>
> while investigating some memory management problems I've got the 
> logdep splat below.
>
> Looks like something is wrong with drm_sched_entity_kill_jobs_cb(), 
> can you investigate?
>
> Thanks,
> Christian.
>
> [11176.741052] ============================================
> [11176.741056] WARNING: possible recursive locking detected
> [11176.741060] 5.15.0-rc1-00031-g9d546d600800 #171 Not tainted
> [11176.741066] --------------------------------------------
> [11176.741070] swapper/12/0 is trying to acquire lock:
> [11176.741074] ffff9c337ed175a8 (&fence->lock){-.-.}-{3:3}, at: 
> dma_fence_signal+0x28/0x80
> [11176.741088]
>                but task is already holding lock:
> [11176.741092] ffff9c337ed172a8 (&fence->lock){-.-.}-{3:3}, at: 
> dma_fence_signal+0x28/0x80
> [11176.741100]
>                other info that might help us debug this:
> [11176.741104]  Possible unsafe locking scenario:
>
> [11176.741108]        CPU0
> [11176.741110]        ----
> [11176.741113]   lock(&fence->lock);
> [11176.741118]   lock(&fence->lock);
> [11176.741122]
>                 *** DEADLOCK ***
>
> [11176.741125]  May be due to missing lock nesting notation
>
> [11176.741128] 2 locks held by swapper/12/0:
> [11176.741133]  #0: ffff9c339c30f768 
> (&ring->fence_drv.lock){-.-.}-{3:3}, at: dma_fence_signal+0x28/0x80
> [11176.741142]  #1: ffff9c337ed172a8 (&fence->lock){-.-.}-{3:3}, at: 
> dma_fence_signal+0x28/0x80
> [11176.741151]
>                stack backtrace:
> [11176.741155] CPU: 12 PID: 0 Comm: swapper/12 Not tainted 
> 5.15.0-rc1-00031-g9d546d600800 #171
> [11176.741160] Hardware name: System manufacturer System Product 
> Name/PRIME X399-A, BIOS 0808 10/12/2018
> [11176.741165] Call Trace:
> [11176.741169]  <IRQ>
> [11176.741173]  dump_stack_lvl+0x5b/0x74
> [11176.741181]  dump_stack+0x10/0x12
> [11176.741186]  __lock_acquire.cold+0x208/0x2df
> [11176.741197]  lock_acquire+0xc6/0x2d0
> [11176.741204]  ? dma_fence_signal+0x28/0x80
> [11176.741212]  _raw_spin_lock_irqsave+0x4d/0x70
> [11176.741219]  ? dma_fence_signal+0x28/0x80
> [11176.741225]  dma_fence_signal+0x28/0x80
> [11176.741230]  drm_sched_fence_finished+0x12/0x20 [gpu_sched]
> [11176.741240]  drm_sched_entity_kill_jobs_cb+0x1c/0x50 [gpu_sched]
> [11176.741248]  dma_fence_signal_timestamp_locked+0xac/0x1a0
> [11176.741254]  dma_fence_signal+0x3b/0x80
> [11176.741260]  drm_sched_fence_finished+0x12/0x20 [gpu_sched]
> [11176.741268]  drm_sched_job_done.isra.0+0x7f/0x1a0 [gpu_sched]
> [11176.741277]  drm_sched_job_done_cb+0x12/0x20 [gpu_sched]
> [11176.741284]  dma_fence_signal_timestamp_locked+0xac/0x1a0
> [11176.741290]  dma_fence_signal+0x3b/0x80
> [11176.741296]  amdgpu_fence_process+0xd1/0x140 [amdgpu]
> [11176.741504]  sdma_v4_0_process_trap_irq+0x8c/0xb0 [amdgpu]
> [11176.741731]  amdgpu_irq_dispatch+0xce/0x250 [amdgpu]
> [11176.741954]  amdgpu_ih_process+0x81/0x100 [amdgpu]
> [11176.742174]  amdgpu_irq_handler+0x26/0xa0 [amdgpu]
> [11176.742393]  __handle_irq_event_percpu+0x4f/0x2c0
> [11176.742402]  handle_irq_event_percpu+0x33/0x80
> [11176.742408]  handle_irq_event+0x39/0x60
> [11176.742414]  handle_edge_irq+0x93/0x1d0
> [11176.742419]  __common_interrupt+0x50/0xe0
> [11176.742426]  common_interrupt+0x80/0x90
> [11176.742431]  </IRQ>
> [11176.742436]  asm_common_interrupt+0x1e/0x40
> [11176.742442] RIP: 0010:cpuidle_enter_state+0xff/0x470
> [11176.742449] Code: 0f a3 05 04 54 24 01 0f 82 70 02 00 00 31 ff e8 
> 37 5d 6f ff 80 7d d7 00 0f 85 e9 01 00 00 e8 58 a2 7f ff fb 66 0f 1f 
> 44 00 00 <45> 85 ff 0f 88 01 01 00 00 49 63 c7 4c 2b 75 c8 48 8d 14 40 
> 48 8d
> [11176.742455] RSP: 0018:ffffb6970021fe48 EFLAGS: 00000202
> [11176.742461] RAX: 000000000059be25 RBX: 0000000000000002 RCX: 
> 0000000000000000
> [11176.742465] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
> ffffffff9efeed78
> [11176.742470] RBP: ffffb6970021fe80 R08: 0000000000000001 R09: 
> 0000000000000001
> [11176.742473] R10: 0000000000000001 R11: 0000000000000001 R12: 
> ffff9c3350b0e800
> [11176.742477] R13: ffffffffa00e9680 R14: 00000a2a49ada060 R15: 
> 0000000000000002
> [11176.742483]  ? cpuidle_enter_state+0xf8/0x470
> [11176.742489]  ? cpuidle_enter_state+0xf8/0x470
> [11176.742495]  cpuidle_enter+0x2e/0x40
> [11176.742500]  call_cpuidle+0x23/0x40
> [11176.742506]  do_idle+0x201/0x280
> [11176.742512]  cpu_startup_entry+0x20/0x30
> [11176.742517]  start_secondary+0x11f/0x160
> [11176.742523]  secondary_startup_64_no_verify+0xb0/0xbb
>


* Re: Lockdep splat on killing a process
  2021-10-01 15:10 ` Andrey Grodzovsky
@ 2021-10-04  8:14   ` Christian König
  2021-10-04 15:27     ` Andrey Grodzovsky
  2021-10-20 19:32     ` Andrey Grodzovsky
  0 siblings, 2 replies; 19+ messages in thread
From: Christian König @ 2021-10-04  8:14 UTC (permalink / raw)
  To: Andrey Grodzovsky, Christian König, amd-gfx list, dri-devel

The problem is a bit different.

The callback is on the dependent fence, while we need to signal the 
scheduler fence.

Daniel is right that this needs an irq_work struct to handle this properly.

Christian.
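
As a hedged sketch of what that could look like (illustrative only; the 
container struct and function names below are assumptions, not the 
eventual patch): the dma_fence callback merely queues an irq_work item, 
and the scheduler fence is signalled from the irq_work handler, outside 
the dependent fence's lock.

#include <linux/dma-fence.h>
#include <linux/irq_work.h>
#include <drm/gpu_scheduler.h>

/* Hypothetical per-job state for deferring the signalling. */
struct kill_jobs_deferred {
	struct dma_fence_cb cb;		/* registered on the dependent fence */
	struct irq_work work;		/* runs later, outside fence->lock   */
	struct drm_sched_fence *s_fence;/* scheduler fence to signal         */
};

static void kill_jobs_irq_work(struct irq_work *wrk)
{
	struct kill_jobs_deferred *d =
		container_of(wrk, struct kill_jobs_deferred, work);

	/* No fence->lock is held here, so signalling cannot recurse. */
	drm_sched_fence_finished(d->s_fence);
}

static void kill_jobs_cb(struct dma_fence *f, struct dma_fence_cb *cb)
{
	struct kill_jobs_deferred *d =
		container_of(cb, struct kill_jobs_deferred, cb);

	/* Called with the dependent fence's lock held: only defer. */
	irq_work_queue(&d->work);
}

/* Setup when killing the entity's jobs (sketch):
 *	init_irq_work(&d->work, kill_jobs_irq_work);
 *	dma_fence_add_callback(dep_fence, &d->cb, kill_jobs_cb);
 */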

Am 01.10.21 um 17:10 schrieb Andrey Grodzovsky:
> From what I see here you supposed to have actual deadlock and not only 
> warning, sched_fence->finished is  first signaled from within
> hw fence done callback (drm_sched_job_done_cb) but then again from 
> within it's own callback (drm_sched_entity_kill_jobs_cb) and so
> looks like same fence  object is recursively signaled twice. This 
> leads to attempt to lock fence->lock second time while it's already
> locked. I don't see a need to call drm_sched_fence_finished from 
> within drm_sched_entity_kill_jobs_cb as this callback already registered
> on sched_fence->finished fence (entity->last_scheduled == 
> s_fence->finished) and hence the signaling already took place.
>
> Andrey
>
> On 2021-10-01 6:50 a.m., Christian König wrote:
>> Hey, Andrey.
>>
>> while investigating some memory management problems I've got the 
>> logdep splat below.
>>
>> Looks like something is wrong with drm_sched_entity_kill_jobs_cb(), 
>> can you investigate?
>>
>> Thanks,
>> Christian.
>>
>> [11176.741052] ============================================
>> [11176.741056] WARNING: possible recursive locking detected
>> [11176.741060] 5.15.0-rc1-00031-g9d546d600800 #171 Not tainted
>> [11176.741066] --------------------------------------------
>> [11176.741070] swapper/12/0 is trying to acquire lock:
>> [11176.741074] ffff9c337ed175a8 (&fence->lock){-.-.}-{3:3}, at: 
>> dma_fence_signal+0x28/0x80
>> [11176.741088]
>>                but task is already holding lock:
>> [11176.741092] ffff9c337ed172a8 (&fence->lock){-.-.}-{3:3}, at: 
>> dma_fence_signal+0x28/0x80
>> [11176.741100]
>>                other info that might help us debug this:
>> [11176.741104]  Possible unsafe locking scenario:
>>
>> [11176.741108]        CPU0
>> [11176.741110]        ----
>> [11176.741113]   lock(&fence->lock);
>> [11176.741118]   lock(&fence->lock);
>> [11176.741122]
>>                 *** DEADLOCK ***
>>
>> [11176.741125]  May be due to missing lock nesting notation
>>
>> [11176.741128] 2 locks held by swapper/12/0:
>> [11176.741133]  #0: ffff9c339c30f768 
>> (&ring->fence_drv.lock){-.-.}-{3:3}, at: dma_fence_signal+0x28/0x80
>> [11176.741142]  #1: ffff9c337ed172a8 (&fence->lock){-.-.}-{3:3}, at: 
>> dma_fence_signal+0x28/0x80
>> [11176.741151]
>>                stack backtrace:
>> [11176.741155] CPU: 12 PID: 0 Comm: swapper/12 Not tainted 
>> 5.15.0-rc1-00031-g9d546d600800 #171
>> [11176.741160] Hardware name: System manufacturer System Product 
>> Name/PRIME X399-A, BIOS 0808 10/12/2018
>> [11176.741165] Call Trace:
>> [11176.741169]  <IRQ>
>> [11176.741173]  dump_stack_lvl+0x5b/0x74
>> [11176.741181]  dump_stack+0x10/0x12
>> [11176.741186]  __lock_acquire.cold+0x208/0x2df
>> [11176.741197]  lock_acquire+0xc6/0x2d0
>> [11176.741204]  ? dma_fence_signal+0x28/0x80
>> [11176.741212]  _raw_spin_lock_irqsave+0x4d/0x70
>> [11176.741219]  ? dma_fence_signal+0x28/0x80
>> [11176.741225]  dma_fence_signal+0x28/0x80
>> [11176.741230]  drm_sched_fence_finished+0x12/0x20 [gpu_sched]
>> [11176.741240]  drm_sched_entity_kill_jobs_cb+0x1c/0x50 [gpu_sched]
>> [11176.741248]  dma_fence_signal_timestamp_locked+0xac/0x1a0
>> [11176.741254]  dma_fence_signal+0x3b/0x80
>> [11176.741260]  drm_sched_fence_finished+0x12/0x20 [gpu_sched]
>> [11176.741268]  drm_sched_job_done.isra.0+0x7f/0x1a0 [gpu_sched]
>> [11176.741277]  drm_sched_job_done_cb+0x12/0x20 [gpu_sched]
>> [11176.741284]  dma_fence_signal_timestamp_locked+0xac/0x1a0
>> [11176.741290]  dma_fence_signal+0x3b/0x80
>> [11176.741296]  amdgpu_fence_process+0xd1/0x140 [amdgpu]
>> [11176.741504]  sdma_v4_0_process_trap_irq+0x8c/0xb0 [amdgpu]
>> [11176.741731]  amdgpu_irq_dispatch+0xce/0x250 [amdgpu]
>> [11176.741954]  amdgpu_ih_process+0x81/0x100 [amdgpu]
>> [11176.742174]  amdgpu_irq_handler+0x26/0xa0 [amdgpu]
>> [11176.742393]  __handle_irq_event_percpu+0x4f/0x2c0
>> [11176.742402]  handle_irq_event_percpu+0x33/0x80
>> [11176.742408]  handle_irq_event+0x39/0x60
>> [11176.742414]  handle_edge_irq+0x93/0x1d0
>> [11176.742419]  __common_interrupt+0x50/0xe0
>> [11176.742426]  common_interrupt+0x80/0x90
>> [11176.742431]  </IRQ>
>> [11176.742436]  asm_common_interrupt+0x1e/0x40
>> [11176.742442] RIP: 0010:cpuidle_enter_state+0xff/0x470
>> [11176.742449] Code: 0f a3 05 04 54 24 01 0f 82 70 02 00 00 31 ff e8 
>> 37 5d 6f ff 80 7d d7 00 0f 85 e9 01 00 00 e8 58 a2 7f ff fb 66 0f 1f 
>> 44 00 00 <45> 85 ff 0f 88 01 01 00 00 49 63 c7 4c 2b 75 c8 48 8d 14 
>> 40 48 8d
>> [11176.742455] RSP: 0018:ffffb6970021fe48 EFLAGS: 00000202
>> [11176.742461] RAX: 000000000059be25 RBX: 0000000000000002 RCX: 
>> 0000000000000000
>> [11176.742465] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
>> ffffffff9efeed78
>> [11176.742470] RBP: ffffb6970021fe80 R08: 0000000000000001 R09: 
>> 0000000000000001
>> [11176.742473] R10: 0000000000000001 R11: 0000000000000001 R12: 
>> ffff9c3350b0e800
>> [11176.742477] R13: ffffffffa00e9680 R14: 00000a2a49ada060 R15: 
>> 0000000000000002
>> [11176.742483]  ? cpuidle_enter_state+0xf8/0x470
>> [11176.742489]  ? cpuidle_enter_state+0xf8/0x470
>> [11176.742495]  cpuidle_enter+0x2e/0x40
>> [11176.742500]  call_cpuidle+0x23/0x40
>> [11176.742506]  do_idle+0x201/0x280
>> [11176.742512]  cpu_startup_entry+0x20/0x30
>> [11176.742517]  start_secondary+0x11f/0x160
>> [11176.742523]  secondary_startup_64_no_verify+0xb0/0xbb
>>



* Re: Lockdep splat on killing a process
  2021-10-04  8:14   ` Christian König
@ 2021-10-04 15:27     ` Andrey Grodzovsky
  2021-10-20 19:32     ` Andrey Grodzovsky
  1 sibling, 0 replies; 19+ messages in thread
From: Andrey Grodzovsky @ 2021-10-04 15:27 UTC (permalink / raw)
  To: Christian König, Christian König, amd-gfx list, dri-devel

I see my confusion: we hang all unsubmitted jobs on the last job 
submitted to HW.
Yes, in this case rescheduling to a different thread context will indeed 
avoid the splat, but
the scheduled work cannot be done for each dependency signalling; it has 
to be done the way we do it
for ttm_bo_delayed_delete, with a list of dependencies to signal. 
Otherwise some of the scheduled
work will be dropped because a previous invocation is still pending execution.

Andrey
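
A hedged sketch of that "collect on a list and drain" idea (the names 
are illustrative assumptions and this is a concept sketch, not TTM's 
actual code): callbacks push the scheduler fences onto a lock-free llist 
and a single irq_work drains whatever has accumulated, so nothing is 
lost when the work item is already queued at the time another callback 
fires.

#include <linux/dma-fence.h>
#include <linux/irq_work.h>
#include <linux/llist.h>
#include <linux/slab.h>
#include <drm/gpu_scheduler.h>

/* Hypothetical: one entry per scheduler fence waiting to be signalled. */
struct deferred_finish {
	struct dma_fence_cb cb;
	struct llist_node node;
	struct drm_sched_fence *s_fence;
};

static LLIST_HEAD(deferred_list);
static struct irq_work deferred_work;	/* init_irq_work(&deferred_work,
					 * deferred_finish_work) at init time */

static void deferred_finish_work(struct irq_work *wrk)
{
	struct llist_node *list = llist_del_all(&deferred_list);
	struct deferred_finish *d, *tmp;

	/* Drain everything queued so far, outside any fence->lock. */
	llist_for_each_entry_safe(d, tmp, list, node) {
		drm_sched_fence_finished(d->s_fence);
		kfree(d);
	}
}

static void deferred_finish_cb(struct dma_fence *f, struct dma_fence_cb *cb)
{
	struct deferred_finish *d = container_of(cb, struct deferred_finish, cb);

	llist_add(&d->node, &deferred_list);
	/* Queueing an already-pending irq_work is a no-op, but the entry
	 * is already on the list, so the pending run will pick it up. */
	irq_work_queue(&deferred_work);
}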

On 2021-10-04 4:14 a.m., Christian König wrote:
> The problem is a bit different.
>
> The callback is on the dependent fence, while we need to signal the 
> scheduler fence.
>
> Daniel is right that this needs an irq_work struct to handle this 
> properly.
>
> Christian.
>
> Am 01.10.21 um 17:10 schrieb Andrey Grodzovsky:
>> From what I see here you supposed to have actual deadlock and not 
>> only warning, sched_fence->finished is  first signaled from within
>> hw fence done callback (drm_sched_job_done_cb) but then again from 
>> within it's own callback (drm_sched_entity_kill_jobs_cb) and so
>> looks like same fence  object is recursively signaled twice. This 
>> leads to attempt to lock fence->lock second time while it's already
>> locked. I don't see a need to call drm_sched_fence_finished from 
>> within drm_sched_entity_kill_jobs_cb as this callback already registered
>> on sched_fence->finished fence (entity->last_scheduled == 
>> s_fence->finished) and hence the signaling already took place.
>>
>> Andrey
>>
>> On 2021-10-01 6:50 a.m., Christian König wrote:
>>> Hey, Andrey.
>>>
>>> while investigating some memory management problems I've got the 
>>> logdep splat below.
>>>
>>> Looks like something is wrong with drm_sched_entity_kill_jobs_cb(), 
>>> can you investigate?
>>>
>>> Thanks,
>>> Christian.
>>>
>>> [11176.741052] ============================================
>>> [11176.741056] WARNING: possible recursive locking detected
>>> [11176.741060] 5.15.0-rc1-00031-g9d546d600800 #171 Not tainted
>>> [11176.741066] --------------------------------------------
>>> [11176.741070] swapper/12/0 is trying to acquire lock:
>>> [11176.741074] ffff9c337ed175a8 (&fence->lock){-.-.}-{3:3}, at: 
>>> dma_fence_signal+0x28/0x80
>>> [11176.741088]
>>>                but task is already holding lock:
>>> [11176.741092] ffff9c337ed172a8 (&fence->lock){-.-.}-{3:3}, at: 
>>> dma_fence_signal+0x28/0x80
>>> [11176.741100]
>>>                other info that might help us debug this:
>>> [11176.741104]  Possible unsafe locking scenario:
>>>
>>> [11176.741108]        CPU0
>>> [11176.741110]        ----
>>> [11176.741113]   lock(&fence->lock);
>>> [11176.741118]   lock(&fence->lock);
>>> [11176.741122]
>>>                 *** DEADLOCK ***
>>>
>>> [11176.741125]  May be due to missing lock nesting notation
>>>
>>> [11176.741128] 2 locks held by swapper/12/0:
>>> [11176.741133]  #0: ffff9c339c30f768 
>>> (&ring->fence_drv.lock){-.-.}-{3:3}, at: dma_fence_signal+0x28/0x80
>>> [11176.741142]  #1: ffff9c337ed172a8 (&fence->lock){-.-.}-{3:3}, at: 
>>> dma_fence_signal+0x28/0x80
>>> [11176.741151]
>>>                stack backtrace:
>>> [11176.741155] CPU: 12 PID: 0 Comm: swapper/12 Not tainted 
>>> 5.15.0-rc1-00031-g9d546d600800 #171
>>> [11176.741160] Hardware name: System manufacturer System Product 
>>> Name/PRIME X399-A, BIOS 0808 10/12/2018
>>> [11176.741165] Call Trace:
>>> [11176.741169]  <IRQ>
>>> [11176.741173]  dump_stack_lvl+0x5b/0x74
>>> [11176.741181]  dump_stack+0x10/0x12
>>> [11176.741186]  __lock_acquire.cold+0x208/0x2df
>>> [11176.741197]  lock_acquire+0xc6/0x2d0
>>> [11176.741204]  ? dma_fence_signal+0x28/0x80
>>> [11176.741212]  _raw_spin_lock_irqsave+0x4d/0x70
>>> [11176.741219]  ? dma_fence_signal+0x28/0x80
>>> [11176.741225]  dma_fence_signal+0x28/0x80
>>> [11176.741230]  drm_sched_fence_finished+0x12/0x20 [gpu_sched]
>>> [11176.741240]  drm_sched_entity_kill_jobs_cb+0x1c/0x50 [gpu_sched]
>>> [11176.741248]  dma_fence_signal_timestamp_locked+0xac/0x1a0
>>> [11176.741254]  dma_fence_signal+0x3b/0x80
>>> [11176.741260]  drm_sched_fence_finished+0x12/0x20 [gpu_sched]
>>> [11176.741268]  drm_sched_job_done.isra.0+0x7f/0x1a0 [gpu_sched]
>>> [11176.741277]  drm_sched_job_done_cb+0x12/0x20 [gpu_sched]
>>> [11176.741284]  dma_fence_signal_timestamp_locked+0xac/0x1a0
>>> [11176.741290]  dma_fence_signal+0x3b/0x80
>>> [11176.741296]  amdgpu_fence_process+0xd1/0x140 [amdgpu]
>>> [11176.741504]  sdma_v4_0_process_trap_irq+0x8c/0xb0 [amdgpu]
>>> [11176.741731]  amdgpu_irq_dispatch+0xce/0x250 [amdgpu]
>>> [11176.741954]  amdgpu_ih_process+0x81/0x100 [amdgpu]
>>> [11176.742174]  amdgpu_irq_handler+0x26/0xa0 [amdgpu]
>>> [11176.742393]  __handle_irq_event_percpu+0x4f/0x2c0
>>> [11176.742402]  handle_irq_event_percpu+0x33/0x80
>>> [11176.742408]  handle_irq_event+0x39/0x60
>>> [11176.742414]  handle_edge_irq+0x93/0x1d0
>>> [11176.742419]  __common_interrupt+0x50/0xe0
>>> [11176.742426]  common_interrupt+0x80/0x90
>>> [11176.742431]  </IRQ>
>>> [11176.742436]  asm_common_interrupt+0x1e/0x40
>>> [11176.742442] RIP: 0010:cpuidle_enter_state+0xff/0x470
>>> [11176.742449] Code: 0f a3 05 04 54 24 01 0f 82 70 02 00 00 31 ff e8 
>>> 37 5d 6f ff 80 7d d7 00 0f 85 e9 01 00 00 e8 58 a2 7f ff fb 66 0f 1f 
>>> 44 00 00 <45> 85 ff 0f 88 01 01 00 00 49 63 c7 4c 2b 75 c8 48 8d 14 
>>> 40 48 8d
>>> [11176.742455] RSP: 0018:ffffb6970021fe48 EFLAGS: 00000202
>>> [11176.742461] RAX: 000000000059be25 RBX: 0000000000000002 RCX: 
>>> 0000000000000000
>>> [11176.742465] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
>>> ffffffff9efeed78
>>> [11176.742470] RBP: ffffb6970021fe80 R08: 0000000000000001 R09: 
>>> 0000000000000001
>>> [11176.742473] R10: 0000000000000001 R11: 0000000000000001 R12: 
>>> ffff9c3350b0e800
>>> [11176.742477] R13: ffffffffa00e9680 R14: 00000a2a49ada060 R15: 
>>> 0000000000000002
>>> [11176.742483]  ? cpuidle_enter_state+0xf8/0x470
>>> [11176.742489]  ? cpuidle_enter_state+0xf8/0x470
>>> [11176.742495]  cpuidle_enter+0x2e/0x40
>>> [11176.742500]  call_cpuidle+0x23/0x40
>>> [11176.742506]  do_idle+0x201/0x280
>>> [11176.742512]  cpu_startup_entry+0x20/0x30
>>> [11176.742517]  start_secondary+0x11f/0x160
>>> [11176.742523]  secondary_startup_64_no_verify+0xb0/0xbb
>>>
>


* Re: Lockdep splat on killing a process
  2021-10-04  8:14   ` Christian König
  2021-10-04 15:27     ` Andrey Grodzovsky
@ 2021-10-20 19:32     ` Andrey Grodzovsky
  2021-10-21  6:34       ` Christian König
  1 sibling, 1 reply; 19+ messages in thread
From: Andrey Grodzovsky @ 2021-10-20 19:32 UTC (permalink / raw)
  To: Christian König, Christian König, amd-gfx list, dri-devel

On 2021-10-04 4:14 a.m., Christian König wrote:

> The problem is a bit different.
>
> The callback is on the dependent fence, while we need to signal the 
> scheduler fence.
>
> Daniel is right that this needs an irq_work struct to handle this 
> properly.
>
> Christian.


So we had some discussions with Christian regarding irq_work and agreed 
I should look into doing it, but stepping back for a sec -

Why do we insist on calling the dma_fence_cb with fence->lock held? Is 
it because of dma_fence_add_callback?
Because we first test for DMA_FENCE_FLAG_SIGNALED_BIT and only after 
that take fence->lock? If so, can't we
move the DMA_FENCE_FLAG_SIGNALED_BIT check inside the locked section? 
Because if in theory
we could call the cb with fence->lock unlocked (i.e. this kind of 
iteration: 
https://elixir.bootlin.com/linux/v5.15-rc6/source/drivers/gpu/drm/ttm/ttm_resource.c#L117)
we wouldn't have the lockdep splat. And in general, is it really
the correct approach to call third-party code from a callback with 
a spinlock held? We don't know what the cb does inside,
and I don't see any explicit restrictions in the documentation of 
dma_fence_func_t on what can and cannot be done there.

Andrey


>
> Am 01.10.21 um 17:10 schrieb Andrey Grodzovsky:
>> From what I see here you supposed to have actual deadlock and not 
>> only warning, sched_fence->finished is  first signaled from within
>> hw fence done callback (drm_sched_job_done_cb) but then again from 
>> within it's own callback (drm_sched_entity_kill_jobs_cb) and so
>> looks like same fence  object is recursively signaled twice. This 
>> leads to attempt to lock fence->lock second time while it's already
>> locked. I don't see a need to call drm_sched_fence_finished from 
>> within drm_sched_entity_kill_jobs_cb as this callback already registered
>> on sched_fence->finished fence (entity->last_scheduled == 
>> s_fence->finished) and hence the signaling already took place.
>>
>> Andrey
>>
>> On 2021-10-01 6:50 a.m., Christian König wrote:
>>> Hey, Andrey.
>>>
>>> while investigating some memory management problems I've got the 
>>> logdep splat below.
>>>
>>> Looks like something is wrong with drm_sched_entity_kill_jobs_cb(), 
>>> can you investigate?
>>>
>>> Thanks,
>>> Christian.
>>>
>>> [11176.741052] ============================================
>>> [11176.741056] WARNING: possible recursive locking detected
>>> [11176.741060] 5.15.0-rc1-00031-g9d546d600800 #171 Not tainted
>>> [11176.741066] --------------------------------------------
>>> [11176.741070] swapper/12/0 is trying to acquire lock:
>>> [11176.741074] ffff9c337ed175a8 (&fence->lock){-.-.}-{3:3}, at: 
>>> dma_fence_signal+0x28/0x80
>>> [11176.741088]
>>>                but task is already holding lock:
>>> [11176.741092] ffff9c337ed172a8 (&fence->lock){-.-.}-{3:3}, at: 
>>> dma_fence_signal+0x28/0x80
>>> [11176.741100]
>>>                other info that might help us debug this:
>>> [11176.741104]  Possible unsafe locking scenario:
>>>
>>> [11176.741108]        CPU0
>>> [11176.741110]        ----
>>> [11176.741113]   lock(&fence->lock);
>>> [11176.741118]   lock(&fence->lock);
>>> [11176.741122]
>>>                 *** DEADLOCK ***
>>>
>>> [11176.741125]  May be due to missing lock nesting notation
>>>
>>> [11176.741128] 2 locks held by swapper/12/0:
>>> [11176.741133]  #0: ffff9c339c30f768 
>>> (&ring->fence_drv.lock){-.-.}-{3:3}, at: dma_fence_signal+0x28/0x80
>>> [11176.741142]  #1: ffff9c337ed172a8 (&fence->lock){-.-.}-{3:3}, at: 
>>> dma_fence_signal+0x28/0x80
>>> [11176.741151]
>>>                stack backtrace:
>>> [11176.741155] CPU: 12 PID: 0 Comm: swapper/12 Not tainted 
>>> 5.15.0-rc1-00031-g9d546d600800 #171
>>> [11176.741160] Hardware name: System manufacturer System Product 
>>> Name/PRIME X399-A, BIOS 0808 10/12/2018
>>> [11176.741165] Call Trace:
>>> [11176.741169]  <IRQ>
>>> [11176.741173]  dump_stack_lvl+0x5b/0x74
>>> [11176.741181]  dump_stack+0x10/0x12
>>> [11176.741186]  __lock_acquire.cold+0x208/0x2df
>>> [11176.741197]  lock_acquire+0xc6/0x2d0
>>> [11176.741204]  ? dma_fence_signal+0x28/0x80
>>> [11176.741212]  _raw_spin_lock_irqsave+0x4d/0x70
>>> [11176.741219]  ? dma_fence_signal+0x28/0x80
>>> [11176.741225]  dma_fence_signal+0x28/0x80
>>> [11176.741230]  drm_sched_fence_finished+0x12/0x20 [gpu_sched]
>>> [11176.741240]  drm_sched_entity_kill_jobs_cb+0x1c/0x50 [gpu_sched]
>>> [11176.741248]  dma_fence_signal_timestamp_locked+0xac/0x1a0
>>> [11176.741254]  dma_fence_signal+0x3b/0x80
>>> [11176.741260]  drm_sched_fence_finished+0x12/0x20 [gpu_sched]
>>> [11176.741268]  drm_sched_job_done.isra.0+0x7f/0x1a0 [gpu_sched]
>>> [11176.741277]  drm_sched_job_done_cb+0x12/0x20 [gpu_sched]
>>> [11176.741284]  dma_fence_signal_timestamp_locked+0xac/0x1a0
>>> [11176.741290]  dma_fence_signal+0x3b/0x80
>>> [11176.741296]  amdgpu_fence_process+0xd1/0x140 [amdgpu]
>>> [11176.741504]  sdma_v4_0_process_trap_irq+0x8c/0xb0 [amdgpu]
>>> [11176.741731]  amdgpu_irq_dispatch+0xce/0x250 [amdgpu]
>>> [11176.741954]  amdgpu_ih_process+0x81/0x100 [amdgpu]
>>> [11176.742174]  amdgpu_irq_handler+0x26/0xa0 [amdgpu]
>>> [11176.742393]  __handle_irq_event_percpu+0x4f/0x2c0
>>> [11176.742402]  handle_irq_event_percpu+0x33/0x80
>>> [11176.742408]  handle_irq_event+0x39/0x60
>>> [11176.742414]  handle_edge_irq+0x93/0x1d0
>>> [11176.742419]  __common_interrupt+0x50/0xe0
>>> [11176.742426]  common_interrupt+0x80/0x90
>>> [11176.742431]  </IRQ>
>>> [11176.742436]  asm_common_interrupt+0x1e/0x40
>>> [11176.742442] RIP: 0010:cpuidle_enter_state+0xff/0x470
>>> [11176.742449] Code: 0f a3 05 04 54 24 01 0f 82 70 02 00 00 31 ff e8 
>>> 37 5d 6f ff 80 7d d7 00 0f 85 e9 01 00 00 e8 58 a2 7f ff fb 66 0f 1f 
>>> 44 00 00 <45> 85 ff 0f 88 01 01 00 00 49 63 c7 4c 2b 75 c8 48 8d 14 
>>> 40 48 8d
>>> [11176.742455] RSP: 0018:ffffb6970021fe48 EFLAGS: 00000202
>>> [11176.742461] RAX: 000000000059be25 RBX: 0000000000000002 RCX: 
>>> 0000000000000000
>>> [11176.742465] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
>>> ffffffff9efeed78
>>> [11176.742470] RBP: ffffb6970021fe80 R08: 0000000000000001 R09: 
>>> 0000000000000001
>>> [11176.742473] R10: 0000000000000001 R11: 0000000000000001 R12: 
>>> ffff9c3350b0e800
>>> [11176.742477] R13: ffffffffa00e9680 R14: 00000a2a49ada060 R15: 
>>> 0000000000000002
>>> [11176.742483]  ? cpuidle_enter_state+0xf8/0x470
>>> [11176.742489]  ? cpuidle_enter_state+0xf8/0x470
>>> [11176.742495]  cpuidle_enter+0x2e/0x40
>>> [11176.742500]  call_cpuidle+0x23/0x40
>>> [11176.742506]  do_idle+0x201/0x280
>>> [11176.742512]  cpu_startup_entry+0x20/0x30
>>> [11176.742517]  start_secondary+0x11f/0x160
>>> [11176.742523]  secondary_startup_64_no_verify+0xb0/0xbb
>>>
>


* Re: Lockdep splat on killing a process
  2021-10-20 19:32     ` Andrey Grodzovsky
@ 2021-10-21  6:34       ` Christian König
  2021-10-25 19:10         ` Andrey Grodzovsky
  0 siblings, 1 reply; 19+ messages in thread
From: Christian König @ 2021-10-21  6:34 UTC (permalink / raw)
  To: Andrey Grodzovsky, Christian König, amd-gfx list, dri-devel



Am 20.10.21 um 21:32 schrieb Andrey Grodzovsky:
> On 2021-10-04 4:14 a.m., Christian König wrote:
>
>> The problem is a bit different.
>>
>> The callback is on the dependent fence, while we need to signal the 
>> scheduler fence.
>>
>> Daniel is right that this needs an irq_work struct to handle this 
>> properly.
>>
>> Christian.
>
>
> So we had some discussions with Christian regarding irq_work and 
> agreed I should look into doing it but stepping back for a sec -
>
> Why we insist on calling the dma_fence_cb  with fence->lock locked ? 
> Is it because of dma_fence_add_callback ?
> Because we first test for DMA_FENCE_FLAG_SIGNALED_BIT and only after 
> that lock the fence->lock ? If so, can't we
> move DMA_FENCE_FLAG_SIGNALED_BIT  check inside the locked section ? 
> Because if in theory
> we could call the cb with unlocked fence->lock (i.e. this kind of 
> iteration 
> https://elixir.bootlin.com/linux/v5.15-rc6/source/drivers/gpu/drm/ttm/ttm_resource.c#L117)
> we wouldn't have the lockdep splat. And in general, is it really
> the correct approach to call a third party code from a call back with 
> locked spinlock ? We don't know what the cb does inside
> and I don't see any explicit restrictions in documentation of 
> dma_fence_func_t what can and cannot be done there.

Yeah, that's exactly what I meant by using the irq_work directly in 
the fence code.

The problem is dma_fence_signal_locked(), which is used by quite a number 
of drivers to signal the fence while holding the lock.

Otherwise we could indeed simplify the fence handling a lot.

Christian.
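
For context, a hedged example of the pattern being referred to (the 
"toy" driver below is entirely made up): drivers commonly initialise 
their fences with their own interrupt spinlock and then call 
dma_fence_signal_locked() from the interrupt handler while that lock is 
held, which is why the core cannot simply drop the lock around callback 
invocation without auditing such callers.

#include <linux/dma-fence.h>
#include <linux/list.h>
#include <linux/spinlock.h>

/* Entirely hypothetical driver state: the fences share the driver's
 * interrupt lock, as passed to dma_fence_init(). */
struct toy_ring {
	spinlock_t lock;
	struct list_head pending;	/* of struct toy_fence.link */
};

struct toy_fence {
	struct dma_fence base;
	struct list_head link;
};

static const char *toy_fence_name(struct dma_fence *f)
{
	return "toy";
}

static const struct dma_fence_ops toy_fence_ops = {
	.get_driver_name = toy_fence_name,
	.get_timeline_name = toy_fence_name,
};

static void toy_fence_init(struct toy_ring *ring, struct toy_fence *f,
			   u64 context, u64 seqno)
{
	/* The fence's lock *is* the driver's lock. */
	dma_fence_init(&f->base, &toy_fence_ops, &ring->lock, context, seqno);
}

static void toy_ring_irq(struct toy_ring *ring)
{
	struct toy_fence *f, *tmp;
	unsigned long flags;

	spin_lock_irqsave(&ring->lock, flags);
	list_for_each_entry_safe(f, tmp, &ring->pending, link) {
		list_del(&f->link);
		/* Valid today because the callbacks run under the very
		 * lock we hold; dropping it inside dma_fence_signal_locked()
		 * would change the rules for every driver doing this. */
		dma_fence_signal_locked(&f->base);
	}
	spin_unlock_irqrestore(&ring->lock, flags);
}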

>
> Andrey
>
>
>>
>> Am 01.10.21 um 17:10 schrieb Andrey Grodzovsky:
>>> From what I see here you supposed to have actual deadlock and not 
>>> only warning, sched_fence->finished is  first signaled from within
>>> hw fence done callback (drm_sched_job_done_cb) but then again from 
>>> within it's own callback (drm_sched_entity_kill_jobs_cb) and so
>>> looks like same fence  object is recursively signaled twice. This 
>>> leads to attempt to lock fence->lock second time while it's already
>>> locked. I don't see a need to call drm_sched_fence_finished from 
>>> within drm_sched_entity_kill_jobs_cb as this callback already 
>>> registered
>>> on sched_fence->finished fence (entity->last_scheduled == 
>>> s_fence->finished) and hence the signaling already took place.
>>>
>>> Andrey
>>>
>>> On 2021-10-01 6:50 a.m., Christian König wrote:
>>>> Hey, Andrey.
>>>>
>>>> while investigating some memory management problems I've got the 
>>>> logdep splat below.
>>>>
>>>> Looks like something is wrong with drm_sched_entity_kill_jobs_cb(), 
>>>> can you investigate?
>>>>
>>>> Thanks,
>>>> Christian.
>>>>
>>>> [11176.741052] ============================================
>>>> [11176.741056] WARNING: possible recursive locking detected
>>>> [11176.741060] 5.15.0-rc1-00031-g9d546d600800 #171 Not tainted
>>>> [11176.741066] --------------------------------------------
>>>> [11176.741070] swapper/12/0 is trying to acquire lock:
>>>> [11176.741074] ffff9c337ed175a8 (&fence->lock){-.-.}-{3:3}, at: 
>>>> dma_fence_signal+0x28/0x80
>>>> [11176.741088]
>>>>                but task is already holding lock:
>>>> [11176.741092] ffff9c337ed172a8 (&fence->lock){-.-.}-{3:3}, at: 
>>>> dma_fence_signal+0x28/0x80
>>>> [11176.741100]
>>>>                other info that might help us debug this:
>>>> [11176.741104]  Possible unsafe locking scenario:
>>>>
>>>> [11176.741108]        CPU0
>>>> [11176.741110]        ----
>>>> [11176.741113]   lock(&fence->lock);
>>>> [11176.741118]   lock(&fence->lock);
>>>> [11176.741122]
>>>>                 *** DEADLOCK ***
>>>>
>>>> [11176.741125]  May be due to missing lock nesting notation
>>>>
>>>> [11176.741128] 2 locks held by swapper/12/0:
>>>> [11176.741133]  #0: ffff9c339c30f768 
>>>> (&ring->fence_drv.lock){-.-.}-{3:3}, at: dma_fence_signal+0x28/0x80
>>>> [11176.741142]  #1: ffff9c337ed172a8 (&fence->lock){-.-.}-{3:3}, 
>>>> at: dma_fence_signal+0x28/0x80
>>>> [11176.741151]
>>>>                stack backtrace:
>>>> [11176.741155] CPU: 12 PID: 0 Comm: swapper/12 Not tainted 
>>>> 5.15.0-rc1-00031-g9d546d600800 #171
>>>> [11176.741160] Hardware name: System manufacturer System Product 
>>>> Name/PRIME X399-A, BIOS 0808 10/12/2018
>>>> [11176.741165] Call Trace:
>>>> [11176.741169]  <IRQ>
>>>> [11176.741173]  dump_stack_lvl+0x5b/0x74
>>>> [11176.741181]  dump_stack+0x10/0x12
>>>> [11176.741186]  __lock_acquire.cold+0x208/0x2df
>>>> [11176.741197]  lock_acquire+0xc6/0x2d0
>>>> [11176.741204]  ? dma_fence_signal+0x28/0x80
>>>> [11176.741212]  _raw_spin_lock_irqsave+0x4d/0x70
>>>> [11176.741219]  ? dma_fence_signal+0x28/0x80
>>>> [11176.741225]  dma_fence_signal+0x28/0x80
>>>> [11176.741230]  drm_sched_fence_finished+0x12/0x20 [gpu_sched]
>>>> [11176.741240]  drm_sched_entity_kill_jobs_cb+0x1c/0x50 [gpu_sched]
>>>> [11176.741248]  dma_fence_signal_timestamp_locked+0xac/0x1a0
>>>> [11176.741254]  dma_fence_signal+0x3b/0x80
>>>> [11176.741260]  drm_sched_fence_finished+0x12/0x20 [gpu_sched]
>>>> [11176.741268]  drm_sched_job_done.isra.0+0x7f/0x1a0 [gpu_sched]
>>>> [11176.741277]  drm_sched_job_done_cb+0x12/0x20 [gpu_sched]
>>>> [11176.741284]  dma_fence_signal_timestamp_locked+0xac/0x1a0
>>>> [11176.741290]  dma_fence_signal+0x3b/0x80
>>>> [11176.741296]  amdgpu_fence_process+0xd1/0x140 [amdgpu]
>>>> [11176.741504]  sdma_v4_0_process_trap_irq+0x8c/0xb0 [amdgpu]
>>>> [11176.741731]  amdgpu_irq_dispatch+0xce/0x250 [amdgpu]
>>>> [11176.741954]  amdgpu_ih_process+0x81/0x100 [amdgpu]
>>>> [11176.742174]  amdgpu_irq_handler+0x26/0xa0 [amdgpu]
>>>> [11176.742393]  __handle_irq_event_percpu+0x4f/0x2c0
>>>> [11176.742402]  handle_irq_event_percpu+0x33/0x80
>>>> [11176.742408]  handle_irq_event+0x39/0x60
>>>> [11176.742414]  handle_edge_irq+0x93/0x1d0
>>>> [11176.742419]  __common_interrupt+0x50/0xe0
>>>> [11176.742426]  common_interrupt+0x80/0x90
>>>> [11176.742431]  </IRQ>
>>>> [11176.742436]  asm_common_interrupt+0x1e/0x40
>>>> [11176.742442] RIP: 0010:cpuidle_enter_state+0xff/0x470
>>>> [11176.742449] Code: 0f a3 05 04 54 24 01 0f 82 70 02 00 00 31 ff 
>>>> e8 37 5d 6f ff 80 7d d7 00 0f 85 e9 01 00 00 e8 58 a2 7f ff fb 66 
>>>> 0f 1f 44 00 00 <45> 85 ff 0f 88 01 01 00 00 49 63 c7 4c 2b 75 c8 48 
>>>> 8d 14 40 48 8d
>>>> [11176.742455] RSP: 0018:ffffb6970021fe48 EFLAGS: 00000202
>>>> [11176.742461] RAX: 000000000059be25 RBX: 0000000000000002 RCX: 
>>>> 0000000000000000
>>>> [11176.742465] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
>>>> ffffffff9efeed78
>>>> [11176.742470] RBP: ffffb6970021fe80 R08: 0000000000000001 R09: 
>>>> 0000000000000001
>>>> [11176.742473] R10: 0000000000000001 R11: 0000000000000001 R12: 
>>>> ffff9c3350b0e800
>>>> [11176.742477] R13: ffffffffa00e9680 R14: 00000a2a49ada060 R15: 
>>>> 0000000000000002
>>>> [11176.742483]  ? cpuidle_enter_state+0xf8/0x470
>>>> [11176.742489]  ? cpuidle_enter_state+0xf8/0x470
>>>> [11176.742495]  cpuidle_enter+0x2e/0x40
>>>> [11176.742500]  call_cpuidle+0x23/0x40
>>>> [11176.742506]  do_idle+0x201/0x280
>>>> [11176.742512]  cpu_startup_entry+0x20/0x30
>>>> [11176.742517]  start_secondary+0x11f/0x160
>>>> [11176.742523]  secondary_startup_64_no_verify+0xb0/0xbb
>>>>
>>



* Re: Lockdep splat on killing a process
  2021-10-21  6:34       ` Christian König
@ 2021-10-25 19:10         ` Andrey Grodzovsky
  2021-10-25 19:56           ` Christian König
  0 siblings, 1 reply; 19+ messages in thread
From: Andrey Grodzovsky @ 2021-10-25 19:10 UTC (permalink / raw)
  To: Christian König, Christian König, amd-gfx list,
	dri-devel, chris, daniel.vetter

[-- Attachment #1: Type: text/plain, Size: 8748 bytes --]

Adding back Daniel (somehow he got dropped from the address list) and 
Chris, who worked a lot in this area.

On 2021-10-21 2:34 a.m., Christian König wrote:
>
>
> Am 20.10.21 um 21:32 schrieb Andrey Grodzovsky:
>> On 2021-10-04 4:14 a.m., Christian König wrote:
>>
>>> The problem is a bit different.
>>>
>>> The callback is on the dependent fence, while we need to signal the 
>>> scheduler fence.
>>>
>>> Daniel is right that this needs an irq_work struct to handle this 
>>> properly.
>>>
>>> Christian.
>>
>>
>> So we had some discussions with Christian regarding irq_work and 
>> agreed I should look into doing it but stepping back for a sec -
>>
>> Why we insist on calling the dma_fence_cb  with fence->lock locked ? 
>> Is it because of dma_fence_add_callback ?
>> Because we first test for DMA_FENCE_FLAG_SIGNALED_BIT and only after 
>> that lock the fence->lock ? If so, can't we
>> move DMA_FENCE_FLAG_SIGNALED_BIT  check inside the locked section ? 
>> Because if in theory
>> we could call the cb with unlocked fence->lock (i.e. this kind of 
>> iteration 
>> https://elixir.bootlin.com/linux/v5.15-rc6/source/drivers/gpu/drm/ttm/ttm_resource.c#L117)
>> we wouldn't have the lockdep splat. And in general, is it really
>> the correct approach to call a third party code from a call back with 
>> locked spinlock ? We don't know what the cb does inside
>> and I don't see any explicit restrictions in documentation of 
>> dma_fence_func_t what can and cannot be done there.
>
> Yeah, that's exactly what I meant with using the irq_work directly in 
> the fence code.


My idea is not to use irq_work at all but instead to implement unlocked 
dma_fence cb execution using an iteration
which drops the spinlock each time the next cb is executed and acquires 
it again afterwards (until cb_list is empty).


>
>
> The problem is dma_fence_signal_locked() which is used by quite a 
> number of drivers to signal the fence while holding the lock.


For this I think we should not reuse dma_fence_signal_locked inside 
dma_fence_signal and instead implement it using the
unlocked iteration I mentioned above. I looked a bit at the code and the 
history, and I see that until some time ago
(this commit by Chris: 0fc89b6802ba1fcc561b0c906e0cefd384e3b2e5) 
dma_fence_signal was indeed doing its own, locked iteration
and wasn't reusing dma_fence_signal_locked. This way whoever relies on 
dma_fence_signal_locked won't be impacted,
and whoever does not (like us in 
drm_sched_fence_scheduled/drm_sched_fence_finished) should also not be 
impacted by the narrower
scope of the lock. I also looked at dma_fence_default_wait and how it 
takes fence->lock and checks whether the fence is signaled
before the wait starts, and I don't see a problem there either.

I attached a quick draft of this proposal to clarify.

Andrey


>
> Otherwise we could indeed simplify the fence handling a lot.
>
> Christian.
>
>>
>> Andrey
>>
>>
>>>
>>> Am 01.10.21 um 17:10 schrieb Andrey Grodzovsky:
>>>> From what I see here you supposed to have actual deadlock and not 
>>>> only warning, sched_fence->finished is  first signaled from within
>>>> hw fence done callback (drm_sched_job_done_cb) but then again from 
>>>> within it's own callback (drm_sched_entity_kill_jobs_cb) and so
>>>> looks like same fence  object is recursively signaled twice. This 
>>>> leads to attempt to lock fence->lock second time while it's already
>>>> locked. I don't see a need to call drm_sched_fence_finished from 
>>>> within drm_sched_entity_kill_jobs_cb as this callback already 
>>>> registered
>>>> on sched_fence->finished fence (entity->last_scheduled == 
>>>> s_fence->finished) and hence the signaling already took place.
>>>>
>>>> Andrey
>>>>
>>>> On 2021-10-01 6:50 a.m., Christian König wrote:
>>>>> Hey, Andrey.
>>>>>
>>>>> while investigating some memory management problems I've got the 
>>>>> logdep splat below.
>>>>>
>>>>> Looks like something is wrong with 
>>>>> drm_sched_entity_kill_jobs_cb(), can you investigate?
>>>>>
>>>>> Thanks,
>>>>> Christian.
>>>>>
>>>>> [11176.741052] ============================================
>>>>> [11176.741056] WARNING: possible recursive locking detected
>>>>> [11176.741060] 5.15.0-rc1-00031-g9d546d600800 #171 Not tainted
>>>>> [11176.741066] --------------------------------------------
>>>>> [11176.741070] swapper/12/0 is trying to acquire lock:
>>>>> [11176.741074] ffff9c337ed175a8 (&fence->lock){-.-.}-{3:3}, at: 
>>>>> dma_fence_signal+0x28/0x80
>>>>> [11176.741088]
>>>>>                but task is already holding lock:
>>>>> [11176.741092] ffff9c337ed172a8 (&fence->lock){-.-.}-{3:3}, at: 
>>>>> dma_fence_signal+0x28/0x80
>>>>> [11176.741100]
>>>>>                other info that might help us debug this:
>>>>> [11176.741104]  Possible unsafe locking scenario:
>>>>>
>>>>> [11176.741108]        CPU0
>>>>> [11176.741110]        ----
>>>>> [11176.741113]   lock(&fence->lock);
>>>>> [11176.741118]   lock(&fence->lock);
>>>>> [11176.741122]
>>>>>                 *** DEADLOCK ***
>>>>>
>>>>> [11176.741125]  May be due to missing lock nesting notation
>>>>>
>>>>> [11176.741128] 2 locks held by swapper/12/0:
>>>>> [11176.741133]  #0: ffff9c339c30f768 
>>>>> (&ring->fence_drv.lock){-.-.}-{3:3}, at: dma_fence_signal+0x28/0x80
>>>>> [11176.741142]  #1: ffff9c337ed172a8 (&fence->lock){-.-.}-{3:3}, 
>>>>> at: dma_fence_signal+0x28/0x80
>>>>> [11176.741151]
>>>>>                stack backtrace:
>>>>> [11176.741155] CPU: 12 PID: 0 Comm: swapper/12 Not tainted 
>>>>> 5.15.0-rc1-00031-g9d546d600800 #171
>>>>> [11176.741160] Hardware name: System manufacturer System Product 
>>>>> Name/PRIME X399-A, BIOS 0808 10/12/2018
>>>>> [11176.741165] Call Trace:
>>>>> [11176.741169]  <IRQ>
>>>>> [11176.741173]  dump_stack_lvl+0x5b/0x74
>>>>> [11176.741181]  dump_stack+0x10/0x12
>>>>> [11176.741186]  __lock_acquire.cold+0x208/0x2df
>>>>> [11176.741197]  lock_acquire+0xc6/0x2d0
>>>>> [11176.741204]  ? dma_fence_signal+0x28/0x80
>>>>> [11176.741212]  _raw_spin_lock_irqsave+0x4d/0x70
>>>>> [11176.741219]  ? dma_fence_signal+0x28/0x80
>>>>> [11176.741225]  dma_fence_signal+0x28/0x80
>>>>> [11176.741230]  drm_sched_fence_finished+0x12/0x20 [gpu_sched]
>>>>> [11176.741240]  drm_sched_entity_kill_jobs_cb+0x1c/0x50 [gpu_sched]
>>>>> [11176.741248] dma_fence_signal_timestamp_locked+0xac/0x1a0
>>>>> [11176.741254]  dma_fence_signal+0x3b/0x80
>>>>> [11176.741260]  drm_sched_fence_finished+0x12/0x20 [gpu_sched]
>>>>> [11176.741268]  drm_sched_job_done.isra.0+0x7f/0x1a0 [gpu_sched]
>>>>> [11176.741277]  drm_sched_job_done_cb+0x12/0x20 [gpu_sched]
>>>>> [11176.741284] dma_fence_signal_timestamp_locked+0xac/0x1a0
>>>>> [11176.741290]  dma_fence_signal+0x3b/0x80
>>>>> [11176.741296]  amdgpu_fence_process+0xd1/0x140 [amdgpu]
>>>>> [11176.741504]  sdma_v4_0_process_trap_irq+0x8c/0xb0 [amdgpu]
>>>>> [11176.741731]  amdgpu_irq_dispatch+0xce/0x250 [amdgpu]
>>>>> [11176.741954]  amdgpu_ih_process+0x81/0x100 [amdgpu]
>>>>> [11176.742174]  amdgpu_irq_handler+0x26/0xa0 [amdgpu]
>>>>> [11176.742393]  __handle_irq_event_percpu+0x4f/0x2c0
>>>>> [11176.742402]  handle_irq_event_percpu+0x33/0x80
>>>>> [11176.742408]  handle_irq_event+0x39/0x60
>>>>> [11176.742414]  handle_edge_irq+0x93/0x1d0
>>>>> [11176.742419]  __common_interrupt+0x50/0xe0
>>>>> [11176.742426]  common_interrupt+0x80/0x90
>>>>> [11176.742431]  </IRQ>
>>>>> [11176.742436]  asm_common_interrupt+0x1e/0x40
>>>>> [11176.742442] RIP: 0010:cpuidle_enter_state+0xff/0x470
>>>>> [11176.742449] Code: 0f a3 05 04 54 24 01 0f 82 70 02 00 00 31 ff 
>>>>> e8 37 5d 6f ff 80 7d d7 00 0f 85 e9 01 00 00 e8 58 a2 7f ff fb 66 
>>>>> 0f 1f 44 00 00 <45> 85 ff 0f 88 01 01 00 00 49 63 c7 4c 2b 75 c8 
>>>>> 48 8d 14 40 48 8d
>>>>> [11176.742455] RSP: 0018:ffffb6970021fe48 EFLAGS: 00000202
>>>>> [11176.742461] RAX: 000000000059be25 RBX: 0000000000000002 RCX: 
>>>>> 0000000000000000
>>>>> [11176.742465] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
>>>>> ffffffff9efeed78
>>>>> [11176.742470] RBP: ffffb6970021fe80 R08: 0000000000000001 R09: 
>>>>> 0000000000000001
>>>>> [11176.742473] R10: 0000000000000001 R11: 0000000000000001 R12: 
>>>>> ffff9c3350b0e800
>>>>> [11176.742477] R13: ffffffffa00e9680 R14: 00000a2a49ada060 R15: 
>>>>> 0000000000000002
>>>>> [11176.742483]  ? cpuidle_enter_state+0xf8/0x470
>>>>> [11176.742489]  ? cpuidle_enter_state+0xf8/0x470
>>>>> [11176.742495]  cpuidle_enter+0x2e/0x40
>>>>> [11176.742500]  call_cpuidle+0x23/0x40
>>>>> [11176.742506]  do_idle+0x201/0x280
>>>>> [11176.742512]  cpu_startup_entry+0x20/0x30
>>>>> [11176.742517]  start_secondary+0x11f/0x160
>>>>> [11176.742523]  secondary_startup_64_no_verify+0xb0/0xbb
>>>>>
>>>
>

[-- Attachment #2: draft.path --]
[-- Type: text/x-systemd-unit, Size: 901 bytes --]

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index ce0f5eff575d..011cb99afdfb 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -459,9 +459,21 @@ int dma_fence_signal(struct dma_fence *fence)
 
 	tmp = dma_fence_begin_signalling();
 
-	spin_lock_irqsave(fence->lock, flags);
-	ret = dma_fence_signal_timestamp_locked(fence, ktime_get());
-	spin_unlock_irqrestore(fence->lock, flags);
+	if (unlikely(test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
+				      &fence->flags)))
+		return -EINVAL;
+
+	trace_dma_fence_signaled(fence);
+
+	spin_lock_irqsave(fence->lock, flags);
+	while (!list_empty(&fence->cb_list)) {
+		struct dma_fence_cb *cur =
+			list_first_entry(&fence->cb_list,
+					 struct dma_fence_cb, node);
+
+		list_del_init(&cur->node);
+		/* Drop the lock while the callback runs so it can signal
+		 * other fences without recursing on fence->lock. */
+		spin_unlock_irqrestore(fence->lock, flags);
+		cur->func(fence, cur);
+		spin_lock_irqsave(fence->lock, flags);
+	}
+	spin_unlock_irqrestore(fence->lock, flags);
 
 	dma_fence_end_signalling(tmp);
 


* Re: Lockdep splat on killing a process
  2021-10-25 19:10         ` Andrey Grodzovsky
@ 2021-10-25 19:56           ` Christian König
  2021-10-26  2:33             ` Andrey Grodzovsky
  0 siblings, 1 reply; 19+ messages in thread
From: Christian König @ 2021-10-25 19:56 UTC (permalink / raw)
  To: Andrey Grodzovsky, Christian König, amd-gfx list, dri-devel,
	chris, daniel.vetter

In general I'm all for getting this fixed, but there is one major 
problem: drivers don't expect the lock to be dropped.

What we could do is change all drivers so they always call the 
dma_fence_signal functions and drop the _locked variants. This way we 
could move calling the callbacks out of the spinlock.

But that requires an audit of all drivers, so quite a lot of work to do.

Regards,
Christian.
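
Purely to illustrate the kind of per-driver conversion that would be 
needed (a hedged before/after on the made-up toy_ring driver sketched 
earlier above, not a real patch): the interrupt handler stops signalling 
under its own lock and instead collects the completed fences, then 
signals them after the lock is dropped.

/* Hypothetical conversion of the toy_ring_irq() sketch above: collect
 * completed fences under the lock, then signal them with the plain
 * dma_fence_signal() once the lock has been released. */
static void toy_ring_irq_converted(struct toy_ring *ring)
{
	struct toy_fence *f, *tmp;
	unsigned long flags;
	LIST_HEAD(done);

	spin_lock_irqsave(&ring->lock, flags);
	list_for_each_entry_safe(f, tmp, &ring->pending, link)
		list_move_tail(&f->link, &done);
	spin_unlock_irqrestore(&ring->lock, flags);

	list_for_each_entry_safe(f, tmp, &done, link) {
		list_del(&f->link);
		dma_fence_signal(&f->base);	/* no driver lock held */
	}
}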

Am 25.10.21 um 21:10 schrieb Andrey Grodzovsky:
> Adding back Daniel (somehow he got off the addresses list) and Chris 
> who worked a lot in this area.
>
> On 2021-10-21 2:34 a.m., Christian König wrote:
>>
>>
>> Am 20.10.21 um 21:32 schrieb Andrey Grodzovsky:
>>> On 2021-10-04 4:14 a.m., Christian König wrote:
>>>
>>>> The problem is a bit different.
>>>>
>>>> The callback is on the dependent fence, while we need to signal the 
>>>> scheduler fence.
>>>>
>>>> Daniel is right that this needs an irq_work struct to handle this 
>>>> properly.
>>>>
>>>> Christian.
>>>
>>>
>>> So we had some discussions with Christian regarding irq_work and 
>>> agreed I should look into doing it but stepping back for a sec -
>>>
>>> Why we insist on calling the dma_fence_cb  with fence->lock locked ? 
>>> Is it because of dma_fence_add_callback ?
>>> Because we first test for DMA_FENCE_FLAG_SIGNALED_BIT and only after 
>>> that lock the fence->lock ? If so, can't we
>>> move DMA_FENCE_FLAG_SIGNALED_BIT  check inside the locked section ? 
>>> Because if in theory
>>> we could call the cb with unlocked fence->lock (i.e. this kind of 
>>> iteration 
>>> https://elixir.bootlin.com/linux/v5.15-rc6/source/drivers/gpu/drm/ttm/ttm_resource.c#L117)
>>> we wouldn't have the lockdep splat. And in general, is it really
>>> the correct approach to call a third party code from a call back 
>>> with locked spinlock ? We don't know what the cb does inside
>>> and I don't see any explicit restrictions in documentation of 
>>> dma_fence_func_t what can and cannot be done there.
>>
>> Yeah, that's exactly what I meant with using the irq_work directly in 
>> the fence code.
>
>
> My idea is not to use irq work at all but instead to implement 
> unlocked dma_fence cb execution using iteration
> which drops the spinlock each time next cb is executed and acquiring 
> it again after (until cb_list is empy).
>
>
>>
>>
>> The problem is dma_fence_signal_locked() which is used by quite a 
>> number of drivers to signal the fence while holding the lock.
>
>
> For this I think we should not reuse dma_fence_signal_locked inside 
> dma_fence_signal and instead implement it using the
> unlocked iteration I mentioned above. I looked a bit in the code and 
> the history and I see that until some time ago
> (this commit by Chris 0fc89b6802ba1fcc561b0c906e0cefd384e3b2e5), 
> indeed dma_fence_signal was doing it's own, locked iteration
> and wasn't reusing dma_fence_signal_locked. This way whoever relies on 
> the dma_fence_signal_locked won't be impacted
> an who is not (like us in 
> drm_sched_fence_scheduled/drm_sched_fence_finished) should also not be 
> impacted by more narrow
> scope of the lock. I also looked at dma_fence_default_wait and how it 
> locks the fence->lock and check if fence is signaled
> before wait start and I don't see a problem there either.
>
> I attached quick draft of this proposal to clarify.
>
> Andrey
>
>
>>
>> Otherwise we could indeed simplify the fence handling a lot.
>>
>> Christian.
>>
>>>
>>> Andrey
>>>
>>>
>>>>
>>>> Am 01.10.21 um 17:10 schrieb Andrey Grodzovsky:
>>>>> From what I see here you supposed to have actual deadlock and not 
>>>>> only warning, sched_fence->finished is  first signaled from within
>>>>> hw fence done callback (drm_sched_job_done_cb) but then again from 
>>>>> within it's own callback (drm_sched_entity_kill_jobs_cb) and so
>>>>> looks like same fence  object is recursively signaled twice. This 
>>>>> leads to attempt to lock fence->lock second time while it's already
>>>>> locked. I don't see a need to call drm_sched_fence_finished from 
>>>>> within drm_sched_entity_kill_jobs_cb as this callback already 
>>>>> registered
>>>>> on sched_fence->finished fence (entity->last_scheduled == 
>>>>> s_fence->finished) and hence the signaling already took place.
>>>>>
>>>>> Andrey
>>>>>
>>>>> On 2021-10-01 6:50 a.m., Christian König wrote:
>>>>>> Hey, Andrey.
>>>>>>
>>>>>> while investigating some memory management problems I've got the 
>>>>>> logdep splat below.
>>>>>>
>>>>>> Looks like something is wrong with 
>>>>>> drm_sched_entity_kill_jobs_cb(), can you investigate?
>>>>>>
>>>>>> Thanks,
>>>>>> Christian.
>>>>>>
>>>>>> [11176.741052] ============================================
>>>>>> [11176.741056] WARNING: possible recursive locking detected
>>>>>> [11176.741060] 5.15.0-rc1-00031-g9d546d600800 #171 Not tainted
>>>>>> [11176.741066] --------------------------------------------
>>>>>> [11176.741070] swapper/12/0 is trying to acquire lock:
>>>>>> [11176.741074] ffff9c337ed175a8 (&fence->lock){-.-.}-{3:3}, at: 
>>>>>> dma_fence_signal+0x28/0x80
>>>>>> [11176.741088]
>>>>>>                but task is already holding lock:
>>>>>> [11176.741092] ffff9c337ed172a8 (&fence->lock){-.-.}-{3:3}, at: 
>>>>>> dma_fence_signal+0x28/0x80
>>>>>> [11176.741100]
>>>>>>                other info that might help us debug this:
>>>>>> [11176.741104]  Possible unsafe locking scenario:
>>>>>>
>>>>>> [11176.741108]        CPU0
>>>>>> [11176.741110]        ----
>>>>>> [11176.741113]   lock(&fence->lock);
>>>>>> [11176.741118]   lock(&fence->lock);
>>>>>> [11176.741122]
>>>>>>                 *** DEADLOCK ***
>>>>>>
>>>>>> [11176.741125]  May be due to missing lock nesting notation
>>>>>>
>>>>>> [11176.741128] 2 locks held by swapper/12/0:
>>>>>> [11176.741133]  #0: ffff9c339c30f768 
>>>>>> (&ring->fence_drv.lock){-.-.}-{3:3}, at: dma_fence_signal+0x28/0x80
>>>>>> [11176.741142]  #1: ffff9c337ed172a8 (&fence->lock){-.-.}-{3:3}, 
>>>>>> at: dma_fence_signal+0x28/0x80
>>>>>> [11176.741151]
>>>>>>                stack backtrace:
>>>>>> [11176.741155] CPU: 12 PID: 0 Comm: swapper/12 Not tainted 
>>>>>> 5.15.0-rc1-00031-g9d546d600800 #171
>>>>>> [11176.741160] Hardware name: System manufacturer System Product 
>>>>>> Name/PRIME X399-A, BIOS 0808 10/12/2018
>>>>>> [11176.741165] Call Trace:
>>>>>> [11176.741169]  <IRQ>
>>>>>> [11176.741173]  dump_stack_lvl+0x5b/0x74
>>>>>> [11176.741181]  dump_stack+0x10/0x12
>>>>>> [11176.741186]  __lock_acquire.cold+0x208/0x2df
>>>>>> [11176.741197]  lock_acquire+0xc6/0x2d0
>>>>>> [11176.741204]  ? dma_fence_signal+0x28/0x80
>>>>>> [11176.741212]  _raw_spin_lock_irqsave+0x4d/0x70
>>>>>> [11176.741219]  ? dma_fence_signal+0x28/0x80
>>>>>> [11176.741225]  dma_fence_signal+0x28/0x80
>>>>>> [11176.741230]  drm_sched_fence_finished+0x12/0x20 [gpu_sched]
>>>>>> [11176.741240]  drm_sched_entity_kill_jobs_cb+0x1c/0x50 [gpu_sched]
>>>>>> [11176.741248] dma_fence_signal_timestamp_locked+0xac/0x1a0
>>>>>> [11176.741254]  dma_fence_signal+0x3b/0x80
>>>>>> [11176.741260]  drm_sched_fence_finished+0x12/0x20 [gpu_sched]
>>>>>> [11176.741268]  drm_sched_job_done.isra.0+0x7f/0x1a0 [gpu_sched]
>>>>>> [11176.741277]  drm_sched_job_done_cb+0x12/0x20 [gpu_sched]
>>>>>> [11176.741284] dma_fence_signal_timestamp_locked+0xac/0x1a0
>>>>>> [11176.741290]  dma_fence_signal+0x3b/0x80
>>>>>> [11176.741296]  amdgpu_fence_process+0xd1/0x140 [amdgpu]
>>>>>> [11176.741504]  sdma_v4_0_process_trap_irq+0x8c/0xb0 [amdgpu]
>>>>>> [11176.741731]  amdgpu_irq_dispatch+0xce/0x250 [amdgpu]
>>>>>> [11176.741954]  amdgpu_ih_process+0x81/0x100 [amdgpu]
>>>>>> [11176.742174]  amdgpu_irq_handler+0x26/0xa0 [amdgpu]
>>>>>> [11176.742393]  __handle_irq_event_percpu+0x4f/0x2c0
>>>>>> [11176.742402]  handle_irq_event_percpu+0x33/0x80
>>>>>> [11176.742408]  handle_irq_event+0x39/0x60
>>>>>> [11176.742414]  handle_edge_irq+0x93/0x1d0
>>>>>> [11176.742419]  __common_interrupt+0x50/0xe0
>>>>>> [11176.742426]  common_interrupt+0x80/0x90
>>>>>> [11176.742431]  </IRQ>
>>>>>> [11176.742436]  asm_common_interrupt+0x1e/0x40
>>>>>> [11176.742442] RIP: 0010:cpuidle_enter_state+0xff/0x470
>>>>>> [11176.742449] Code: 0f a3 05 04 54 24 01 0f 82 70 02 00 00 31 ff 
>>>>>> e8 37 5d 6f ff 80 7d d7 00 0f 85 e9 01 00 00 e8 58 a2 7f ff fb 66 
>>>>>> 0f 1f 44 00 00 <45> 85 ff 0f 88 01 01 00 00 49 63 c7 4c 2b 75 c8 
>>>>>> 48 8d 14 40 48 8d
>>>>>> [11176.742455] RSP: 0018:ffffb6970021fe48 EFLAGS: 00000202
>>>>>> [11176.742461] RAX: 000000000059be25 RBX: 0000000000000002 RCX: 
>>>>>> 0000000000000000
>>>>>> [11176.742465] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
>>>>>> ffffffff9efeed78
>>>>>> [11176.742470] RBP: ffffb6970021fe80 R08: 0000000000000001 R09: 
>>>>>> 0000000000000001
>>>>>> [11176.742473] R10: 0000000000000001 R11: 0000000000000001 R12: 
>>>>>> ffff9c3350b0e800
>>>>>> [11176.742477] R13: ffffffffa00e9680 R14: 00000a2a49ada060 R15: 
>>>>>> 0000000000000002
>>>>>> [11176.742483]  ? cpuidle_enter_state+0xf8/0x470
>>>>>> [11176.742489]  ? cpuidle_enter_state+0xf8/0x470
>>>>>> [11176.742495]  cpuidle_enter+0x2e/0x40
>>>>>> [11176.742500]  call_cpuidle+0x23/0x40
>>>>>> [11176.742506]  do_idle+0x201/0x280
>>>>>> [11176.742512]  cpu_startup_entry+0x20/0x30
>>>>>> [11176.742517]  start_secondary+0x11f/0x160
>>>>>> [11176.742523]  secondary_startup_64_no_verify+0xb0/0xbb
>>>>>>
>>>>
>>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Lockdep spalt on killing a processes
  2021-10-25 19:56           ` Christian König
@ 2021-10-26  2:33             ` Andrey Grodzovsky
  2021-10-26 10:54               ` Christian König
  0 siblings, 1 reply; 19+ messages in thread
From: Andrey Grodzovsky @ 2021-10-26  2:33 UTC (permalink / raw)
  To: Christian König, Christian König, amd-gfx list,
	dri-devel, chris, daniel.vetter

On 2021-10-25 3:56 p.m., Christian König wrote:

> In general I'm all there to get this fixed, but there is one major 
> problem: Drivers don't expect the lock to be dropped.


I am probably missing something, but in my approach we only modify the 
code for those clients that call dma_fence_signal, not 
dma_fence_signal_locked. In those cases the drivers are agnostic to the 
lock behavior (or should be, at least) since the lock is acquired within 
the dma-fence code. Note that if you are worried about calling the 
callback without the lock, then the exact same concern applies to using 
the irq_work directly in the fence code, since the irq_work will execute 
at a later time without fence->lock held (which is the point of using 
irq_work).
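
Just to make the idea concrete, a rough sketch of a dma_fence_signal 
that does its own iteration (untested, modeled on 
dma_fence_signal_timestamp_locked() from v5.15; the callback list is 
detached while fence->lock is held and the callbacks then run without 
the lock):

int dma_fence_signal(struct dma_fence *fence)
{
	struct dma_fence_cb *cur, *tmp;
	struct list_head cb_list;
	unsigned long flags;

	if (!fence)
		return -EINVAL;

	spin_lock_irqsave(fence->lock, flags);

	if (unlikely(test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
				      &fence->flags))) {
		spin_unlock_irqrestore(fence->lock, flags);
		return -EINVAL;
	}

	/* cb_list shares a union with timestamp, so detach it first */
	list_replace(&fence->cb_list, &cb_list);

	fence->timestamp = ktime_get();
	set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);

	spin_unlock_irqrestore(fence->lock, flags);

	/* the callbacks now run without fence->lock held */
	list_for_each_entry_safe(cur, tmp, &cb_list, node) {
		INIT_LIST_HEAD(&cur->node);
		cur->func(fence, cur);
	}

	return 0;
}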

>
> What we could do is to change all drivers so they call always call the 
> dma_fence_signal functions and drop the _locked variants. This way we 
> could move calling the callback out of the spinlock.
>
> But that requires audit of all drivers, so quite a lot of work to do.


As I said earlier - if we only modify dma_fence_signal and don't touch 
dma_fence_signal_locked, then our only concern should be the users of 
dma_fence_signal.
Please let me know if I am still missing some point of yours.

Andrey


>
> Regards,
> Christian.
>
> Am 25.10.21 um 21:10 schrieb Andrey Grodzovsky:
>> Adding back Daniel (somehow he got off the addresses list) and Chris 
>> who worked a lot in this area.
>>
>> On 2021-10-21 2:34 a.m., Christian König wrote:
>>>
>>>
>>> Am 20.10.21 um 21:32 schrieb Andrey Grodzovsky:
>>>> On 2021-10-04 4:14 a.m., Christian König wrote:
>>>>
>>>>> The problem is a bit different.
>>>>>
>>>>> The callback is on the dependent fence, while we need to signal 
>>>>> the scheduler fence.
>>>>>
>>>>> Daniel is right that this needs an irq_work struct to handle this 
>>>>> properly.
>>>>>
>>>>> Christian.
>>>>
>>>>
>>>> So we had some discussions with Christian regarding irq_work and 
>>>> agreed I should look into doing it but stepping back for a sec -
>>>>
>>>> Why we insist on calling the dma_fence_cb  with fence->lock locked 
>>>> ? Is it because of dma_fence_add_callback ?
>>>> Because we first test for DMA_FENCE_FLAG_SIGNALED_BIT and only 
>>>> after that lock the fence->lock ? If so, can't we
>>>> move DMA_FENCE_FLAG_SIGNALED_BIT  check inside the locked section ? 
>>>> Because if in theory
>>>> we could call the cb with unlocked fence->lock (i.e. this kind of 
>>>> iteration 
>>>> https://elixir.bootlin.com/linux/v5.15-rc6/source/drivers/gpu/drm/ttm/ttm_resource.c#L117)
>>>> we wouldn't have the lockdep splat. And in general, is it really
>>>> the correct approach to call a third party code from a call back 
>>>> with locked spinlock ? We don't know what the cb does inside
>>>> and I don't see any explicit restrictions in documentation of 
>>>> dma_fence_func_t what can and cannot be done there.
>>>
>>> Yeah, that's exactly what I meant with using the irq_work directly 
>>> in the fence code.
>>
>>
>> My idea is not to use irq work at all but instead to implement 
>> unlocked dma_fence cb execution using iteration
>> which drops the spinlock each time next cb is executed and acquiring 
>> it again after (until cb_list is empy).
>>
>>
>>>
>>>
>>> The problem is dma_fence_signal_locked() which is used by quite a 
>>> number of drivers to signal the fence while holding the lock.
>>
>>
>> For this I think we should not reuse dma_fence_signal_locked inside 
>> dma_fence_signal and instead implement it using the
>> unlocked iteration I mentioned above. I looked a bit in the code and 
>> the history and I see that until some time ago
>> (this commit by Chris 0fc89b6802ba1fcc561b0c906e0cefd384e3b2e5), 
>> indeed dma_fence_signal was doing it's own, locked iteration
>> and wasn't reusing dma_fence_signal_locked. This way whoever relies 
>> on the dma_fence_signal_locked won't be impacted
>> an who is not (like us in 
>> drm_sched_fence_scheduled/drm_sched_fence_finished) should also not 
>> be impacted by more narrow
>> scope of the lock. I also looked at dma_fence_default_wait and how it 
>> locks the fence->lock and check if fence is signaled
>> before wait start and I don't see a problem there either.
>>
>> I attached quick draft of this proposal to clarify.
>>
>> Andrey
>>
>>
>>>
>>> Otherwise we could indeed simplify the fence handling a lot.
>>>
>>> Christian.
>>>
>>>>
>>>> Andrey
>>>>
>>>>
>>>>>
>>>>> Am 01.10.21 um 17:10 schrieb Andrey Grodzovsky:
>>>>>> From what I see here you supposed to have actual deadlock and not 
>>>>>> only warning, sched_fence->finished is  first signaled from within
>>>>>> hw fence done callback (drm_sched_job_done_cb) but then again 
>>>>>> from within it's own callback (drm_sched_entity_kill_jobs_cb) and so
>>>>>> looks like same fence  object is recursively signaled twice. This 
>>>>>> leads to attempt to lock fence->lock second time while it's already
>>>>>> locked. I don't see a need to call drm_sched_fence_finished from 
>>>>>> within drm_sched_entity_kill_jobs_cb as this callback already 
>>>>>> registered
>>>>>> on sched_fence->finished fence (entity->last_scheduled == 
>>>>>> s_fence->finished) and hence the signaling already took place.
>>>>>>
>>>>>> Andrey
>>>>>>
>>>>>> On 2021-10-01 6:50 a.m., Christian König wrote:
>>>>>>> Hey, Andrey.
>>>>>>>
>>>>>>> while investigating some memory management problems I've got the 
>>>>>>> logdep splat below.
>>>>>>>
>>>>>>> Looks like something is wrong with 
>>>>>>> drm_sched_entity_kill_jobs_cb(), can you investigate?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Christian.
>>>>>>>
>>>>>>> [11176.741052] ============================================
>>>>>>> [11176.741056] WARNING: possible recursive locking detected
>>>>>>> [11176.741060] 5.15.0-rc1-00031-g9d546d600800 #171 Not tainted
>>>>>>> [11176.741066] --------------------------------------------
>>>>>>> [11176.741070] swapper/12/0 is trying to acquire lock:
>>>>>>> [11176.741074] ffff9c337ed175a8 (&fence->lock){-.-.}-{3:3}, at: 
>>>>>>> dma_fence_signal+0x28/0x80
>>>>>>> [11176.741088]
>>>>>>>                but task is already holding lock:
>>>>>>> [11176.741092] ffff9c337ed172a8 (&fence->lock){-.-.}-{3:3}, at: 
>>>>>>> dma_fence_signal+0x28/0x80
>>>>>>> [11176.741100]
>>>>>>>                other info that might help us debug this:
>>>>>>> [11176.741104]  Possible unsafe locking scenario:
>>>>>>>
>>>>>>> [11176.741108]        CPU0
>>>>>>> [11176.741110]        ----
>>>>>>> [11176.741113]   lock(&fence->lock);
>>>>>>> [11176.741118]   lock(&fence->lock);
>>>>>>> [11176.741122]
>>>>>>>                 *** DEADLOCK ***
>>>>>>>
>>>>>>> [11176.741125]  May be due to missing lock nesting notation
>>>>>>>
>>>>>>> [11176.741128] 2 locks held by swapper/12/0:
>>>>>>> [11176.741133]  #0: ffff9c339c30f768 
>>>>>>> (&ring->fence_drv.lock){-.-.}-{3:3}, at: dma_fence_signal+0x28/0x80
>>>>>>> [11176.741142]  #1: ffff9c337ed172a8 (&fence->lock){-.-.}-{3:3}, 
>>>>>>> at: dma_fence_signal+0x28/0x80
>>>>>>> [11176.741151]
>>>>>>>                stack backtrace:
>>>>>>> [11176.741155] CPU: 12 PID: 0 Comm: swapper/12 Not tainted 
>>>>>>> 5.15.0-rc1-00031-g9d546d600800 #171
>>>>>>> [11176.741160] Hardware name: System manufacturer System Product 
>>>>>>> Name/PRIME X399-A, BIOS 0808 10/12/2018
>>>>>>> [11176.741165] Call Trace:
>>>>>>> [11176.741169]  <IRQ>
>>>>>>> [11176.741173]  dump_stack_lvl+0x5b/0x74
>>>>>>> [11176.741181]  dump_stack+0x10/0x12
>>>>>>> [11176.741186]  __lock_acquire.cold+0x208/0x2df
>>>>>>> [11176.741197]  lock_acquire+0xc6/0x2d0
>>>>>>> [11176.741204]  ? dma_fence_signal+0x28/0x80
>>>>>>> [11176.741212]  _raw_spin_lock_irqsave+0x4d/0x70
>>>>>>> [11176.741219]  ? dma_fence_signal+0x28/0x80
>>>>>>> [11176.741225]  dma_fence_signal+0x28/0x80
>>>>>>> [11176.741230]  drm_sched_fence_finished+0x12/0x20 [gpu_sched]
>>>>>>> [11176.741240] drm_sched_entity_kill_jobs_cb+0x1c/0x50 [gpu_sched]
>>>>>>> [11176.741248] dma_fence_signal_timestamp_locked+0xac/0x1a0
>>>>>>> [11176.741254]  dma_fence_signal+0x3b/0x80
>>>>>>> [11176.741260]  drm_sched_fence_finished+0x12/0x20 [gpu_sched]
>>>>>>> [11176.741268]  drm_sched_job_done.isra.0+0x7f/0x1a0 [gpu_sched]
>>>>>>> [11176.741277]  drm_sched_job_done_cb+0x12/0x20 [gpu_sched]
>>>>>>> [11176.741284] dma_fence_signal_timestamp_locked+0xac/0x1a0
>>>>>>> [11176.741290]  dma_fence_signal+0x3b/0x80
>>>>>>> [11176.741296]  amdgpu_fence_process+0xd1/0x140 [amdgpu]
>>>>>>> [11176.741504]  sdma_v4_0_process_trap_irq+0x8c/0xb0 [amdgpu]
>>>>>>> [11176.741731]  amdgpu_irq_dispatch+0xce/0x250 [amdgpu]
>>>>>>> [11176.741954]  amdgpu_ih_process+0x81/0x100 [amdgpu]
>>>>>>> [11176.742174]  amdgpu_irq_handler+0x26/0xa0 [amdgpu]
>>>>>>> [11176.742393]  __handle_irq_event_percpu+0x4f/0x2c0
>>>>>>> [11176.742402]  handle_irq_event_percpu+0x33/0x80
>>>>>>> [11176.742408]  handle_irq_event+0x39/0x60
>>>>>>> [11176.742414]  handle_edge_irq+0x93/0x1d0
>>>>>>> [11176.742419]  __common_interrupt+0x50/0xe0
>>>>>>> [11176.742426]  common_interrupt+0x80/0x90
>>>>>>> [11176.742431]  </IRQ>
>>>>>>> [11176.742436]  asm_common_interrupt+0x1e/0x40
>>>>>>> [11176.742442] RIP: 0010:cpuidle_enter_state+0xff/0x470
>>>>>>> [11176.742449] Code: 0f a3 05 04 54 24 01 0f 82 70 02 00 00 31 
>>>>>>> ff e8 37 5d 6f ff 80 7d d7 00 0f 85 e9 01 00 00 e8 58 a2 7f ff 
>>>>>>> fb 66 0f 1f 44 00 00 <45> 85 ff 0f 88 01 01 00 00 49 63 c7 4c 2b 
>>>>>>> 75 c8 48 8d 14 40 48 8d
>>>>>>> [11176.742455] RSP: 0018:ffffb6970021fe48 EFLAGS: 00000202
>>>>>>> [11176.742461] RAX: 000000000059be25 RBX: 0000000000000002 RCX: 
>>>>>>> 0000000000000000
>>>>>>> [11176.742465] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
>>>>>>> ffffffff9efeed78
>>>>>>> [11176.742470] RBP: ffffb6970021fe80 R08: 0000000000000001 R09: 
>>>>>>> 0000000000000001
>>>>>>> [11176.742473] R10: 0000000000000001 R11: 0000000000000001 R12: 
>>>>>>> ffff9c3350b0e800
>>>>>>> [11176.742477] R13: ffffffffa00e9680 R14: 00000a2a49ada060 R15: 
>>>>>>> 0000000000000002
>>>>>>> [11176.742483]  ? cpuidle_enter_state+0xf8/0x470
>>>>>>> [11176.742489]  ? cpuidle_enter_state+0xf8/0x470
>>>>>>> [11176.742495]  cpuidle_enter+0x2e/0x40
>>>>>>> [11176.742500]  call_cpuidle+0x23/0x40
>>>>>>> [11176.742506]  do_idle+0x201/0x280
>>>>>>> [11176.742512]  cpu_startup_entry+0x20/0x30
>>>>>>> [11176.742517]  start_secondary+0x11f/0x160
>>>>>>> [11176.742523] secondary_startup_64_no_verify+0xb0/0xbb
>>>>>>>
>>>>>
>>>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Lockdep spalt on killing a processes
  2021-10-26  2:33             ` Andrey Grodzovsky
@ 2021-10-26 10:54               ` Christian König
  2021-10-27 14:27                 ` Andrey Grodzovsky
  0 siblings, 1 reply; 19+ messages in thread
From: Christian König @ 2021-10-26 10:54 UTC (permalink / raw)
  To: Andrey Grodzovsky, Christian König, amd-gfx list, dri-devel,
	chris, daniel.vetter

Am 26.10.21 um 04:33 schrieb Andrey Grodzovsky:
> On 2021-10-25 3:56 p.m., Christian König wrote:
>
>> In general I'm all there to get this fixed, but there is one major 
>> problem: Drivers don't expect the lock to be dropped.
>
>
> I am probably missing something but in my approach we only modify the 
> code for those clients that call dma_fence_signal,
> not dma_fence_signal_locked. In those cases the drivers are agnostic 
> to lock behavior (or should be at least) since the lock
> is acquired within the dma fence code. Note that if you are worried 
> about calling the callback without lock then same exact
> concern is relevant to using the irq_work directly in the fence code 
> since the irq_work will execute at a later time without locked
> fence->lock (which is the point of using irq_work).

Yeah, I've seen that; it just doesn't make much sense to me.

>>
>> What we could do is to change all drivers so they call always call 
>> the dma_fence_signal functions and drop the _locked variants. This 
>> way we could move calling the callback out of the spinlock.
>>
>> But that requires audit of all drivers, so quite a lot of work to do.
>
>
> As i said earlier - if we only modify dma_fence_signal and don't touch 
> dma_fence_signal_locked then our only concern should the users of 
> dma_fence_signal.

Yes, but what do you do with the drivers that call the _locked variant?

> Let me please know if I am still missing some point of yours.

Well, I mean we need to be able to handle this for all drivers.

Regards,
Christian.

>
> Andrey
>
>
>>
>> Regards,
>> Christian.
>>
>> Am 25.10.21 um 21:10 schrieb Andrey Grodzovsky:
>>> Adding back Daniel (somehow he got off the addresses list) and Chris 
>>> who worked a lot in this area.
>>>
>>> On 2021-10-21 2:34 a.m., Christian König wrote:
>>>>
>>>>
>>>> Am 20.10.21 um 21:32 schrieb Andrey Grodzovsky:
>>>>> On 2021-10-04 4:14 a.m., Christian König wrote:
>>>>>
>>>>>> The problem is a bit different.
>>>>>>
>>>>>> The callback is on the dependent fence, while we need to signal 
>>>>>> the scheduler fence.
>>>>>>
>>>>>> Daniel is right that this needs an irq_work struct to handle this 
>>>>>> properly.
>>>>>>
>>>>>> Christian.
>>>>>
>>>>>
>>>>> So we had some discussions with Christian regarding irq_work and 
>>>>> agreed I should look into doing it but stepping back for a sec -
>>>>>
>>>>> Why we insist on calling the dma_fence_cb  with fence->lock locked 
>>>>> ? Is it because of dma_fence_add_callback ?
>>>>> Because we first test for DMA_FENCE_FLAG_SIGNALED_BIT and only 
>>>>> after that lock the fence->lock ? If so, can't we
>>>>> move DMA_FENCE_FLAG_SIGNALED_BIT  check inside the locked section 
>>>>> ? Because if in theory
>>>>> we could call the cb with unlocked fence->lock (i.e. this kind of 
>>>>> iteration 
>>>>> https://elixir.bootlin.com/linux/v5.15-rc6/source/drivers/gpu/drm/ttm/ttm_resource.c#L117)
>>>>> we wouldn't have the lockdep splat. And in general, is it really
>>>>> the correct approach to call a third party code from a call back 
>>>>> with locked spinlock ? We don't know what the cb does inside
>>>>> and I don't see any explicit restrictions in documentation of 
>>>>> dma_fence_func_t what can and cannot be done there.
>>>>
>>>> Yeah, that's exactly what I meant with using the irq_work directly 
>>>> in the fence code.
>>>
>>>
>>> My idea is not to use irq work at all but instead to implement 
>>> unlocked dma_fence cb execution using iteration
>>> which drops the spinlock each time next cb is executed and acquiring 
>>> it again after (until cb_list is empy).
>>>
>>>
>>>>
>>>>
>>>> The problem is dma_fence_signal_locked() which is used by quite a 
>>>> number of drivers to signal the fence while holding the lock.
>>>
>>>
>>> For this I think we should not reuse dma_fence_signal_locked inside 
>>> dma_fence_signal and instead implement it using the
>>> unlocked iteration I mentioned above. I looked a bit in the code and 
>>> the history and I see that until some time ago
>>> (this commit by Chris 0fc89b6802ba1fcc561b0c906e0cefd384e3b2e5), 
>>> indeed dma_fence_signal was doing it's own, locked iteration
>>> and wasn't reusing dma_fence_signal_locked. This way whoever relies 
>>> on the dma_fence_signal_locked won't be impacted
>>> an who is not (like us in 
>>> drm_sched_fence_scheduled/drm_sched_fence_finished) should also not 
>>> be impacted by more narrow
>>> scope of the lock. I also looked at dma_fence_default_wait and how 
>>> it locks the fence->lock and check if fence is signaled
>>> before wait start and I don't see a problem there either.
>>>
>>> I attached quick draft of this proposal to clarify.
>>>
>>> Andrey
>>>
>>>
>>>>
>>>> Otherwise we could indeed simplify the fence handling a lot.
>>>>
>>>> Christian.
>>>>
>>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>>>
>>>>>> Am 01.10.21 um 17:10 schrieb Andrey Grodzovsky:
>>>>>>> From what I see here you supposed to have actual deadlock and 
>>>>>>> not only warning, sched_fence->finished is  first signaled from 
>>>>>>> within
>>>>>>> hw fence done callback (drm_sched_job_done_cb) but then again 
>>>>>>> from within it's own callback (drm_sched_entity_kill_jobs_cb) 
>>>>>>> and so
>>>>>>> looks like same fence  object is recursively signaled twice. 
>>>>>>> This leads to attempt to lock fence->lock second time while it's 
>>>>>>> already
>>>>>>> locked. I don't see a need to call drm_sched_fence_finished from 
>>>>>>> within drm_sched_entity_kill_jobs_cb as this callback already 
>>>>>>> registered
>>>>>>> on sched_fence->finished fence (entity->last_scheduled == 
>>>>>>> s_fence->finished) and hence the signaling already took place.
>>>>>>>
>>>>>>> Andrey
>>>>>>>
>>>>>>> On 2021-10-01 6:50 a.m., Christian König wrote:
>>>>>>>> Hey, Andrey.
>>>>>>>>
>>>>>>>> while investigating some memory management problems I've got 
>>>>>>>> the logdep splat below.
>>>>>>>>
>>>>>>>> Looks like something is wrong with 
>>>>>>>> drm_sched_entity_kill_jobs_cb(), can you investigate?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>> [11176.741052] ============================================
>>>>>>>> [11176.741056] WARNING: possible recursive locking detected
>>>>>>>> [11176.741060] 5.15.0-rc1-00031-g9d546d600800 #171 Not tainted
>>>>>>>> [11176.741066] --------------------------------------------
>>>>>>>> [11176.741070] swapper/12/0 is trying to acquire lock:
>>>>>>>> [11176.741074] ffff9c337ed175a8 (&fence->lock){-.-.}-{3:3}, at: 
>>>>>>>> dma_fence_signal+0x28/0x80
>>>>>>>> [11176.741088]
>>>>>>>>                but task is already holding lock:
>>>>>>>> [11176.741092] ffff9c337ed172a8 (&fence->lock){-.-.}-{3:3}, at: 
>>>>>>>> dma_fence_signal+0x28/0x80
>>>>>>>> [11176.741100]
>>>>>>>>                other info that might help us debug this:
>>>>>>>> [11176.741104]  Possible unsafe locking scenario:
>>>>>>>>
>>>>>>>> [11176.741108]        CPU0
>>>>>>>> [11176.741110]        ----
>>>>>>>> [11176.741113]   lock(&fence->lock);
>>>>>>>> [11176.741118]   lock(&fence->lock);
>>>>>>>> [11176.741122]
>>>>>>>>                 *** DEADLOCK ***
>>>>>>>>
>>>>>>>> [11176.741125]  May be due to missing lock nesting notation
>>>>>>>>
>>>>>>>> [11176.741128] 2 locks held by swapper/12/0:
>>>>>>>> [11176.741133]  #0: ffff9c339c30f768 
>>>>>>>> (&ring->fence_drv.lock){-.-.}-{3:3}, at: 
>>>>>>>> dma_fence_signal+0x28/0x80
>>>>>>>> [11176.741142]  #1: ffff9c337ed172a8 
>>>>>>>> (&fence->lock){-.-.}-{3:3}, at: dma_fence_signal+0x28/0x80
>>>>>>>> [11176.741151]
>>>>>>>>                stack backtrace:
>>>>>>>> [11176.741155] CPU: 12 PID: 0 Comm: swapper/12 Not tainted 
>>>>>>>> 5.15.0-rc1-00031-g9d546d600800 #171
>>>>>>>> [11176.741160] Hardware name: System manufacturer System 
>>>>>>>> Product Name/PRIME X399-A, BIOS 0808 10/12/2018
>>>>>>>> [11176.741165] Call Trace:
>>>>>>>> [11176.741169]  <IRQ>
>>>>>>>> [11176.741173]  dump_stack_lvl+0x5b/0x74
>>>>>>>> [11176.741181]  dump_stack+0x10/0x12
>>>>>>>> [11176.741186]  __lock_acquire.cold+0x208/0x2df
>>>>>>>> [11176.741197]  lock_acquire+0xc6/0x2d0
>>>>>>>> [11176.741204]  ? dma_fence_signal+0x28/0x80
>>>>>>>> [11176.741212]  _raw_spin_lock_irqsave+0x4d/0x70
>>>>>>>> [11176.741219]  ? dma_fence_signal+0x28/0x80
>>>>>>>> [11176.741225]  dma_fence_signal+0x28/0x80
>>>>>>>> [11176.741230]  drm_sched_fence_finished+0x12/0x20 [gpu_sched]
>>>>>>>> [11176.741240] drm_sched_entity_kill_jobs_cb+0x1c/0x50 [gpu_sched]
>>>>>>>> [11176.741248] dma_fence_signal_timestamp_locked+0xac/0x1a0
>>>>>>>> [11176.741254]  dma_fence_signal+0x3b/0x80
>>>>>>>> [11176.741260]  drm_sched_fence_finished+0x12/0x20 [gpu_sched]
>>>>>>>> [11176.741268]  drm_sched_job_done.isra.0+0x7f/0x1a0 [gpu_sched]
>>>>>>>> [11176.741277]  drm_sched_job_done_cb+0x12/0x20 [gpu_sched]
>>>>>>>> [11176.741284] dma_fence_signal_timestamp_locked+0xac/0x1a0
>>>>>>>> [11176.741290]  dma_fence_signal+0x3b/0x80
>>>>>>>> [11176.741296]  amdgpu_fence_process+0xd1/0x140 [amdgpu]
>>>>>>>> [11176.741504]  sdma_v4_0_process_trap_irq+0x8c/0xb0 [amdgpu]
>>>>>>>> [11176.741731]  amdgpu_irq_dispatch+0xce/0x250 [amdgpu]
>>>>>>>> [11176.741954]  amdgpu_ih_process+0x81/0x100 [amdgpu]
>>>>>>>> [11176.742174]  amdgpu_irq_handler+0x26/0xa0 [amdgpu]
>>>>>>>> [11176.742393]  __handle_irq_event_percpu+0x4f/0x2c0
>>>>>>>> [11176.742402]  handle_irq_event_percpu+0x33/0x80
>>>>>>>> [11176.742408]  handle_irq_event+0x39/0x60
>>>>>>>> [11176.742414]  handle_edge_irq+0x93/0x1d0
>>>>>>>> [11176.742419]  __common_interrupt+0x50/0xe0
>>>>>>>> [11176.742426]  common_interrupt+0x80/0x90
>>>>>>>> [11176.742431]  </IRQ>
>>>>>>>> [11176.742436]  asm_common_interrupt+0x1e/0x40
>>>>>>>> [11176.742442] RIP: 0010:cpuidle_enter_state+0xff/0x470
>>>>>>>> [11176.742449] Code: 0f a3 05 04 54 24 01 0f 82 70 02 00 00 31 
>>>>>>>> ff e8 37 5d 6f ff 80 7d d7 00 0f 85 e9 01 00 00 e8 58 a2 7f ff 
>>>>>>>> fb 66 0f 1f 44 00 00 <45> 85 ff 0f 88 01 01 00 00 49 63 c7 4c 
>>>>>>>> 2b 75 c8 48 8d 14 40 48 8d
>>>>>>>> [11176.742455] RSP: 0018:ffffb6970021fe48 EFLAGS: 00000202
>>>>>>>> [11176.742461] RAX: 000000000059be25 RBX: 0000000000000002 RCX: 
>>>>>>>> 0000000000000000
>>>>>>>> [11176.742465] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
>>>>>>>> ffffffff9efeed78
>>>>>>>> [11176.742470] RBP: ffffb6970021fe80 R08: 0000000000000001 R09: 
>>>>>>>> 0000000000000001
>>>>>>>> [11176.742473] R10: 0000000000000001 R11: 0000000000000001 R12: 
>>>>>>>> ffff9c3350b0e800
>>>>>>>> [11176.742477] R13: ffffffffa00e9680 R14: 00000a2a49ada060 R15: 
>>>>>>>> 0000000000000002
>>>>>>>> [11176.742483]  ? cpuidle_enter_state+0xf8/0x470
>>>>>>>> [11176.742489]  ? cpuidle_enter_state+0xf8/0x470
>>>>>>>> [11176.742495]  cpuidle_enter+0x2e/0x40
>>>>>>>> [11176.742500]  call_cpuidle+0x23/0x40
>>>>>>>> [11176.742506]  do_idle+0x201/0x280
>>>>>>>> [11176.742512]  cpu_startup_entry+0x20/0x30
>>>>>>>> [11176.742517]  start_secondary+0x11f/0x160
>>>>>>>> [11176.742523] secondary_startup_64_no_verify+0xb0/0xbb
>>>>>>>>
>>>>>>
>>>>
>>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Lockdep spalt on killing a processes
  2021-10-26 10:54               ` Christian König
@ 2021-10-27 14:27                 ` Andrey Grodzovsky
  2021-10-27 14:34                   ` Christian König
  0 siblings, 1 reply; 19+ messages in thread
From: Andrey Grodzovsky @ 2021-10-27 14:27 UTC (permalink / raw)
  To: Christian König, Christian König, amd-gfx list,
	dri-devel, chris, daniel.vetter


On 2021-10-26 6:54 a.m., Christian König wrote:
> Am 26.10.21 um 04:33 schrieb Andrey Grodzovsky:
>> On 2021-10-25 3:56 p.m., Christian König wrote:
>>
>>> In general I'm all there to get this fixed, but there is one major 
>>> problem: Drivers don't expect the lock to be dropped.
>>
>>
>> I am probably missing something but in my approach we only modify the 
>> code for those clients that call dma_fence_signal,
>> not dma_fence_signal_locked. In those cases the drivers are agnostic 
>> to lock behavior (or should be at least) since the lock
>> is acquired within the dma fence code. Note that if you are worried 
>> about calling the callback without lock then same exact
>> concern is relevant to using the irq_work directly in the fence code 
>> since the irq_work will execute at a later time without locked
>> fence->lock (which is the point of using irq_work).
>
> Yeah, I've seen that it just doesn't make much sense to me.


Not clear what doesn't make sense?


>
>>>
>>> What we could do is to change all drivers so they call always call 
>>> the dma_fence_signal functions and drop the _locked variants. This 
>>> way we could move calling the callback out of the spinlock.
>>>
>>> But that requires audit of all drivers, so quite a lot of work to do.
>>
>>
>> As i said earlier - if we only modify dma_fence_signal and don't 
>> touch dma_fence_signal_locked then our only concern should the users 
>> of dma_fence_signal.
>
> Yes, but what do you do with the drivers who call the _locked variant?


IMHO we don't touch them at all, they stay as is; we only re-implement 
dma_fence_signal, because drivers that use the locked variant take the 
lock explicitly and so they intend for their callbacks to run under the 
lock. If they don't, that's their own problem within their code and they 
should fix it locally there. Drivers that call dma_fence_signal are our 
only problem, because they didn't take the lock explicitly but were 
forced to run the callback under the lock by the dma-fence framework, 
and that is something we can change.


>
>> Let me please know if I am still missing some point of yours.
>
> Well, I mean we need to be able to handle this for all drivers.


For sure, but as I said above, in my opinion we only need to change 
this for those drivers that don't use the _locked version.

Andrey


>
> Regards,
> Christian.
>
>>
>> Andrey
>>
>>
>>>
>>> Regards,
>>> Christian.
>>>
>>> Am 25.10.21 um 21:10 schrieb Andrey Grodzovsky:
>>>> Adding back Daniel (somehow he got off the addresses list) and 
>>>> Chris who worked a lot in this area.
>>>>
>>>> On 2021-10-21 2:34 a.m., Christian König wrote:
>>>>>
>>>>>
>>>>> Am 20.10.21 um 21:32 schrieb Andrey Grodzovsky:
>>>>>> On 2021-10-04 4:14 a.m., Christian König wrote:
>>>>>>
>>>>>>> The problem is a bit different.
>>>>>>>
>>>>>>> The callback is on the dependent fence, while we need to signal 
>>>>>>> the scheduler fence.
>>>>>>>
>>>>>>> Daniel is right that this needs an irq_work struct to handle 
>>>>>>> this properly.
>>>>>>>
>>>>>>> Christian.
>>>>>>
>>>>>>
>>>>>> So we had some discussions with Christian regarding irq_work and 
>>>>>> agreed I should look into doing it but stepping back for a sec -
>>>>>>
>>>>>> Why we insist on calling the dma_fence_cb  with fence->lock 
>>>>>> locked ? Is it because of dma_fence_add_callback ?
>>>>>> Because we first test for DMA_FENCE_FLAG_SIGNALED_BIT and only 
>>>>>> after that lock the fence->lock ? If so, can't we
>>>>>> move DMA_FENCE_FLAG_SIGNALED_BIT  check inside the locked section 
>>>>>> ? Because if in theory
>>>>>> we could call the cb with unlocked fence->lock (i.e. this kind of 
>>>>>> iteration 
>>>>>> https://elixir.bootlin.com/linux/v5.15-rc6/source/drivers/gpu/drm/ttm/ttm_resource.c#L117)
>>>>>> we wouldn't have the lockdep splat. And in general, is it really
>>>>>> the correct approach to call a third party code from a call back 
>>>>>> with locked spinlock ? We don't know what the cb does inside
>>>>>> and I don't see any explicit restrictions in documentation of 
>>>>>> dma_fence_func_t what can and cannot be done there.
>>>>>
>>>>> Yeah, that's exactly what I meant with using the irq_work directly 
>>>>> in the fence code.
>>>>
>>>>
>>>> My idea is not to use irq work at all but instead to implement 
>>>> unlocked dma_fence cb execution using iteration
>>>> which drops the spinlock each time next cb is executed and 
>>>> acquiring it again after (until cb_list is empy).
>>>>
>>>>
>>>>>
>>>>>
>>>>> The problem is dma_fence_signal_locked() which is used by quite a 
>>>>> number of drivers to signal the fence while holding the lock.
>>>>
>>>>
>>>> For this I think we should not reuse dma_fence_signal_locked inside 
>>>> dma_fence_signal and instead implement it using the
>>>> unlocked iteration I mentioned above. I looked a bit in the code 
>>>> and the history and I see that until some time ago
>>>> (this commit by Chris 0fc89b6802ba1fcc561b0c906e0cefd384e3b2e5), 
>>>> indeed dma_fence_signal was doing it's own, locked iteration
>>>> and wasn't reusing dma_fence_signal_locked. This way whoever relies 
>>>> on the dma_fence_signal_locked won't be impacted
>>>> an who is not (like us in 
>>>> drm_sched_fence_scheduled/drm_sched_fence_finished) should also not 
>>>> be impacted by more narrow
>>>> scope of the lock. I also looked at dma_fence_default_wait and how 
>>>> it locks the fence->lock and check if fence is signaled
>>>> before wait start and I don't see a problem there either.
>>>>
>>>> I attached quick draft of this proposal to clarify.
>>>>
>>>> Andrey
>>>>
>>>>
>>>>>
>>>>> Otherwise we could indeed simplify the fence handling a lot.
>>>>>
>>>>> Christian.
>>>>>
>>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Am 01.10.21 um 17:10 schrieb Andrey Grodzovsky:
>>>>>>>> From what I see here you supposed to have actual deadlock and 
>>>>>>>> not only warning, sched_fence->finished is  first signaled from 
>>>>>>>> within
>>>>>>>> hw fence done callback (drm_sched_job_done_cb) but then again 
>>>>>>>> from within it's own callback (drm_sched_entity_kill_jobs_cb) 
>>>>>>>> and so
>>>>>>>> looks like same fence  object is recursively signaled twice. 
>>>>>>>> This leads to attempt to lock fence->lock second time while 
>>>>>>>> it's already
>>>>>>>> locked. I don't see a need to call drm_sched_fence_finished 
>>>>>>>> from within drm_sched_entity_kill_jobs_cb as this callback 
>>>>>>>> already registered
>>>>>>>> on sched_fence->finished fence (entity->last_scheduled == 
>>>>>>>> s_fence->finished) and hence the signaling already took place.
>>>>>>>>
>>>>>>>> Andrey
>>>>>>>>
>>>>>>>> On 2021-10-01 6:50 a.m., Christian König wrote:
>>>>>>>>> Hey, Andrey.
>>>>>>>>>
>>>>>>>>> while investigating some memory management problems I've got 
>>>>>>>>> the logdep splat below.
>>>>>>>>>
>>>>>>>>> Looks like something is wrong with 
>>>>>>>>> drm_sched_entity_kill_jobs_cb(), can you investigate?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>> [11176.741052] ============================================
>>>>>>>>> [11176.741056] WARNING: possible recursive locking detected
>>>>>>>>> [11176.741060] 5.15.0-rc1-00031-g9d546d600800 #171 Not tainted
>>>>>>>>> [11176.741066] --------------------------------------------
>>>>>>>>> [11176.741070] swapper/12/0 is trying to acquire lock:
>>>>>>>>> [11176.741074] ffff9c337ed175a8 (&fence->lock){-.-.}-{3:3}, 
>>>>>>>>> at: dma_fence_signal+0x28/0x80
>>>>>>>>> [11176.741088]
>>>>>>>>>                but task is already holding lock:
>>>>>>>>> [11176.741092] ffff9c337ed172a8 (&fence->lock){-.-.}-{3:3}, 
>>>>>>>>> at: dma_fence_signal+0x28/0x80
>>>>>>>>> [11176.741100]
>>>>>>>>>                other info that might help us debug this:
>>>>>>>>> [11176.741104]  Possible unsafe locking scenario:
>>>>>>>>>
>>>>>>>>> [11176.741108]        CPU0
>>>>>>>>> [11176.741110]        ----
>>>>>>>>> [11176.741113]   lock(&fence->lock);
>>>>>>>>> [11176.741118]   lock(&fence->lock);
>>>>>>>>> [11176.741122]
>>>>>>>>>                 *** DEADLOCK ***
>>>>>>>>>
>>>>>>>>> [11176.741125]  May be due to missing lock nesting notation
>>>>>>>>>
>>>>>>>>> [11176.741128] 2 locks held by swapper/12/0:
>>>>>>>>> [11176.741133]  #0: ffff9c339c30f768 
>>>>>>>>> (&ring->fence_drv.lock){-.-.}-{3:3}, at: 
>>>>>>>>> dma_fence_signal+0x28/0x80
>>>>>>>>> [11176.741142]  #1: ffff9c337ed172a8 
>>>>>>>>> (&fence->lock){-.-.}-{3:3}, at: dma_fence_signal+0x28/0x80
>>>>>>>>> [11176.741151]
>>>>>>>>>                stack backtrace:
>>>>>>>>> [11176.741155] CPU: 12 PID: 0 Comm: swapper/12 Not tainted 
>>>>>>>>> 5.15.0-rc1-00031-g9d546d600800 #171
>>>>>>>>> [11176.741160] Hardware name: System manufacturer System 
>>>>>>>>> Product Name/PRIME X399-A, BIOS 0808 10/12/2018
>>>>>>>>> [11176.741165] Call Trace:
>>>>>>>>> [11176.741169]  <IRQ>
>>>>>>>>> [11176.741173]  dump_stack_lvl+0x5b/0x74
>>>>>>>>> [11176.741181]  dump_stack+0x10/0x12
>>>>>>>>> [11176.741186]  __lock_acquire.cold+0x208/0x2df
>>>>>>>>> [11176.741197]  lock_acquire+0xc6/0x2d0
>>>>>>>>> [11176.741204]  ? dma_fence_signal+0x28/0x80
>>>>>>>>> [11176.741212]  _raw_spin_lock_irqsave+0x4d/0x70
>>>>>>>>> [11176.741219]  ? dma_fence_signal+0x28/0x80
>>>>>>>>> [11176.741225]  dma_fence_signal+0x28/0x80
>>>>>>>>> [11176.741230]  drm_sched_fence_finished+0x12/0x20 [gpu_sched]
>>>>>>>>> [11176.741240] drm_sched_entity_kill_jobs_cb+0x1c/0x50 
>>>>>>>>> [gpu_sched]
>>>>>>>>> [11176.741248] dma_fence_signal_timestamp_locked+0xac/0x1a0
>>>>>>>>> [11176.741254]  dma_fence_signal+0x3b/0x80
>>>>>>>>> [11176.741260]  drm_sched_fence_finished+0x12/0x20 [gpu_sched]
>>>>>>>>> [11176.741268] drm_sched_job_done.isra.0+0x7f/0x1a0 [gpu_sched]
>>>>>>>>> [11176.741277]  drm_sched_job_done_cb+0x12/0x20 [gpu_sched]
>>>>>>>>> [11176.741284] dma_fence_signal_timestamp_locked+0xac/0x1a0
>>>>>>>>> [11176.741290]  dma_fence_signal+0x3b/0x80
>>>>>>>>> [11176.741296]  amdgpu_fence_process+0xd1/0x140 [amdgpu]
>>>>>>>>> [11176.741504] sdma_v4_0_process_trap_irq+0x8c/0xb0 [amdgpu]
>>>>>>>>> [11176.741731]  amdgpu_irq_dispatch+0xce/0x250 [amdgpu]
>>>>>>>>> [11176.741954]  amdgpu_ih_process+0x81/0x100 [amdgpu]
>>>>>>>>> [11176.742174]  amdgpu_irq_handler+0x26/0xa0 [amdgpu]
>>>>>>>>> [11176.742393] __handle_irq_event_percpu+0x4f/0x2c0
>>>>>>>>> [11176.742402]  handle_irq_event_percpu+0x33/0x80
>>>>>>>>> [11176.742408]  handle_irq_event+0x39/0x60
>>>>>>>>> [11176.742414]  handle_edge_irq+0x93/0x1d0
>>>>>>>>> [11176.742419]  __common_interrupt+0x50/0xe0
>>>>>>>>> [11176.742426]  common_interrupt+0x80/0x90
>>>>>>>>> [11176.742431]  </IRQ>
>>>>>>>>> [11176.742436]  asm_common_interrupt+0x1e/0x40
>>>>>>>>> [11176.742442] RIP: 0010:cpuidle_enter_state+0xff/0x470
>>>>>>>>> [11176.742449] Code: 0f a3 05 04 54 24 01 0f 82 70 02 00 00 31 
>>>>>>>>> ff e8 37 5d 6f ff 80 7d d7 00 0f 85 e9 01 00 00 e8 58 a2 7f ff 
>>>>>>>>> fb 66 0f 1f 44 00 00 <45> 85 ff 0f 88 01 01 00 00 49 63 c7 4c 
>>>>>>>>> 2b 75 c8 48 8d 14 40 48 8d
>>>>>>>>> [11176.742455] RSP: 0018:ffffb6970021fe48 EFLAGS: 00000202
>>>>>>>>> [11176.742461] RAX: 000000000059be25 RBX: 0000000000000002 
>>>>>>>>> RCX: 0000000000000000
>>>>>>>>> [11176.742465] RDX: 0000000000000000 RSI: 0000000000000000 
>>>>>>>>> RDI: ffffffff9efeed78
>>>>>>>>> [11176.742470] RBP: ffffb6970021fe80 R08: 0000000000000001 
>>>>>>>>> R09: 0000000000000001
>>>>>>>>> [11176.742473] R10: 0000000000000001 R11: 0000000000000001 
>>>>>>>>> R12: ffff9c3350b0e800
>>>>>>>>> [11176.742477] R13: ffffffffa00e9680 R14: 00000a2a49ada060 
>>>>>>>>> R15: 0000000000000002
>>>>>>>>> [11176.742483]  ? cpuidle_enter_state+0xf8/0x470
>>>>>>>>> [11176.742489]  ? cpuidle_enter_state+0xf8/0x470
>>>>>>>>> [11176.742495]  cpuidle_enter+0x2e/0x40
>>>>>>>>> [11176.742500]  call_cpuidle+0x23/0x40
>>>>>>>>> [11176.742506]  do_idle+0x201/0x280
>>>>>>>>> [11176.742512]  cpu_startup_entry+0x20/0x30
>>>>>>>>> [11176.742517]  start_secondary+0x11f/0x160
>>>>>>>>> [11176.742523] secondary_startup_64_no_verify+0xb0/0xbb
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Lockdep spalt on killing a processes
  2021-10-27 14:27                 ` Andrey Grodzovsky
@ 2021-10-27 14:34                   ` Christian König
  2021-10-27 14:47                     ` Andrey Grodzovsky
  0 siblings, 1 reply; 19+ messages in thread
From: Christian König @ 2021-10-27 14:34 UTC (permalink / raw)
  To: Andrey Grodzovsky, Christian König, amd-gfx list, dri-devel,
	chris, daniel.vetter

Am 27.10.21 um 16:27 schrieb Andrey Grodzovsky:
> [SNIP]
>>
>>> Let me please know if I am still missing some point of yours.
>>
>> Well, I mean we need to be able to handle this for all drivers.
>
>
> For sure, but as i said above in my opinion we need to change only for 
> those drivers that don't use the _locked version.

And that absolutely won't work.

See, the dma_fence is a contract between drivers, so you need the same 
calling convention between all drivers.

Either we always call the callback with the lock held or we always call 
it without the lock; calling it sometimes one way and sometimes the 
other won't work.

Christian.

>
> Andrey


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Lockdep spalt on killing a processes
  2021-10-27 14:34                   ` Christian König
@ 2021-10-27 14:47                     ` Andrey Grodzovsky
  2021-10-27 14:50                       ` Christian König
  0 siblings, 1 reply; 19+ messages in thread
From: Andrey Grodzovsky @ 2021-10-27 14:47 UTC (permalink / raw)
  To: Christian König, Christian König, amd-gfx list,
	dri-devel, chris, daniel.vetter


On 2021-10-27 10:34 a.m., Christian König wrote:
> Am 27.10.21 um 16:27 schrieb Andrey Grodzovsky:
>> [SNIP]
>>>
>>>> Let me please know if I am still missing some point of yours.
>>>
>>> Well, I mean we need to be able to handle this for all drivers.
>>
>>
>> For sure, but as i said above in my opinion we need to change only 
>> for those drivers that don't use the _locked version.
>
> And that absolutely won't work.
>
> See the dma_fence is a contract between drivers, so you need the same 
> calling convention between all drivers.
>
> Either we always call the callback with the lock held or we always 
> call it without the lock, but sometimes like that and sometimes 
> otherwise won't work.
>
> Christian.


I am not sure I fully understand what problems this will cause, but 
anyway, then we are back to irq_work. We cannot embed an irq_work in the 
union with dma_fence's cb_list because that storage is already reused 
for the timestamp and for the rcu head after the fence is signaled. So I 
will do it within drm_scheduler with a single irq_work per 
drm_sched_entity, as we discussed before.
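
For reference, this is roughly the relevant part of struct dma_fence in 
v5.15 (trimmed); the cb_list shares its storage with the timestamp and 
the rcu head, so there is no room left to fold an irq_work into the 
fence itself:

/* include/linux/dma-fence.h (v5.15), trimmed for illustration */
struct dma_fence {
	spinlock_t *lock;
	const struct dma_fence_ops *ops;
	union {
		struct list_head cb_list;
		/* @cb_list replaced by @timestamp on dma_fence_signal() */
		ktime_t timestamp;
		/* @timestamp replaced by @rcu on dma_fence_release() */
		struct rcu_head rcu;
	};
	u64 context;
	u64 seqno;
	unsigned long flags;
	struct kref refcount;
	int error;
};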

Andrey


>
>>
>> Andrey
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Lockdep spalt on killing a processes
  2021-10-27 14:47                     ` Andrey Grodzovsky
@ 2021-10-27 14:50                       ` Christian König
  2021-10-27 19:58                         ` Andrey Grodzovsky
  0 siblings, 1 reply; 19+ messages in thread
From: Christian König @ 2021-10-27 14:50 UTC (permalink / raw)
  To: Andrey Grodzovsky, Christian König, amd-gfx list, dri-devel,
	chris, daniel.vetter

Am 27.10.21 um 16:47 schrieb Andrey Grodzovsky:
>
> On 2021-10-27 10:34 a.m., Christian König wrote:
>> Am 27.10.21 um 16:27 schrieb Andrey Grodzovsky:
>>> [SNIP]
>>>>
>>>>> Let me please know if I am still missing some point of yours.
>>>>
>>>> Well, I mean we need to be able to handle this for all drivers.
>>>
>>>
>>> For sure, but as i said above in my opinion we need to change only 
>>> for those drivers that don't use the _locked version.
>>
>> And that absolutely won't work.
>>
>> See the dma_fence is a contract between drivers, so you need the same 
>> calling convention between all drivers.
>>
>> Either we always call the callback with the lock held or we always 
>> call it without the lock, but sometimes like that and sometimes 
>> otherwise won't work.
>>
>> Christian.
>
>
> I am not sure I fully understand what problems this will cause but 
> anyway, then we are back to irq_work. We cannot embed irq_work as 
> union within dma_fenc's cb_list
> because it's already reused as timestamp and as rcu head after the 
> fence is signaled. So I will do it within drm_scheduler with single 
> irq_work per drm_sched_entity
> as we discussed before.

That won't work either. We free up the entity after the cleanup 
function. That's the reason we use the callback on the job in the first 
place.

We could overload the cb structure in the job though.

Christian.

>
> Andrey
>
>
>>
>>>
>>> Andrey
>>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Lockdep spalt on killing a processes
  2021-10-27 14:50                       ` Christian König
@ 2021-10-27 19:58                         ` Andrey Grodzovsky
  2021-10-28 17:26                           ` Andrey Grodzovsky
  0 siblings, 1 reply; 19+ messages in thread
From: Andrey Grodzovsky @ 2021-10-27 19:58 UTC (permalink / raw)
  To: Christian König, Christian König, amd-gfx list,
	dri-devel, chris, daniel.vetter


On 2021-10-27 10:50 a.m., Christian König wrote:
> Am 27.10.21 um 16:47 schrieb Andrey Grodzovsky:
>>
>> On 2021-10-27 10:34 a.m., Christian König wrote:
>>> Am 27.10.21 um 16:27 schrieb Andrey Grodzovsky:
>>>> [SNIP]
>>>>>
>>>>>> Let me please know if I am still missing some point of yours.
>>>>>
>>>>> Well, I mean we need to be able to handle this for all drivers.
>>>>
>>>>
>>>> For sure, but as i said above in my opinion we need to change only 
>>>> for those drivers that don't use the _locked version.
>>>
>>> And that absolutely won't work.
>>>
>>> See the dma_fence is a contract between drivers, so you need the 
>>> same calling convention between all drivers.
>>>
>>> Either we always call the callback with the lock held or we always 
>>> call it without the lock, but sometimes like that and sometimes 
>>> otherwise won't work.
>>>
>>> Christian.
>>
>>
>> I am not sure I fully understand what problems this will cause but 
>> anyway, then we are back to irq_work. We cannot embed irq_work as 
>> union within dma_fenc's cb_list
>> because it's already reused as timestamp and as rcu head after the 
>> fence is signaled. So I will do it within drm_scheduler with single 
>> irq_work per drm_sched_entity
>> as we discussed before.
>
> That won't work either. We free up the entity after the cleanup 
> function. That's the reason we use the callback on the job in the 
> first place.


Yep, missed it.


>
> We could overlead the cb structure in the job though.


I guess, since no one else is using this member after the cb has executed.

Andrey


>
> Christian.
>
>>
>> Andrey
>>
>>
>>>
>>>>
>>>> Andrey
>>>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Lockdep spalt on killing a processes
  2021-10-27 19:58                         ` Andrey Grodzovsky
@ 2021-10-28 17:26                           ` Andrey Grodzovsky
  2021-10-29  7:07                             ` Christian König
  0 siblings, 1 reply; 19+ messages in thread
From: Andrey Grodzovsky @ 2021-10-28 17:26 UTC (permalink / raw)
  To: Christian König, Christian König, amd-gfx list,
	dri-devel, chris, daniel.vetter

[-- Attachment #1: Type: text/plain, Size: 2139 bytes --]


On 2021-10-27 3:58 p.m., Andrey Grodzovsky wrote:
>
> On 2021-10-27 10:50 a.m., Christian König wrote:
>> Am 27.10.21 um 16:47 schrieb Andrey Grodzovsky:
>>>
>>> On 2021-10-27 10:34 a.m., Christian König wrote:
>>>> Am 27.10.21 um 16:27 schrieb Andrey Grodzovsky:
>>>>> [SNIP]
>>>>>>
>>>>>>> Let me please know if I am still missing some point of yours.
>>>>>>
>>>>>> Well, I mean we need to be able to handle this for all drivers.
>>>>>
>>>>>
>>>>> For sure, but as i said above in my opinion we need to change only 
>>>>> for those drivers that don't use the _locked version.
>>>>
>>>> And that absolutely won't work.
>>>>
>>>> See the dma_fence is a contract between drivers, so you need the 
>>>> same calling convention between all drivers.
>>>>
>>>> Either we always call the callback with the lock held or we always 
>>>> call it without the lock, but sometimes like that and sometimes 
>>>> otherwise won't work.
>>>>
>>>> Christian.
>>>
>>>
>>> I am not sure I fully understand what problems this will cause but 
>>> anyway, then we are back to irq_work. We cannot embed irq_work as 
>>> union within dma_fenc's cb_list
>>> because it's already reused as timestamp and as rcu head after the 
>>> fence is signaled. So I will do it within drm_scheduler with single 
>>> irq_work per drm_sched_entity
>>> as we discussed before.
>>
>> That won't work either. We free up the entity after the cleanup 
>> function. That's the reason we use the callback on the job in the 
>> first place.
>
>
> Yep, missed it.
>
>
>>
>> We could overlead the cb structure in the job though.
>
>
> I guess, since no one else is using this member it after the cb executed.
>
> Andrey


Attached a patch, please give it a try. I tested it on my side and 
tried to generate the right conditions to trigger this code path by 
repeatedly submitting commands while issuing a GPU reset to stop the 
scheduler and then killing the command submission process in the middle. 
But for some reason it looks like the job_queue was always empty already 
at the time of the entity kill.

Andrey


>
>
>>
>> Christian.
>>
>>>
>>> Andrey
>>>
>>>
>>>>
>>>>>
>>>>> Andrey
>>>>
>>

[-- Attachment #2: 0001-drm-sched-Avoid-lockdep-spalt-on-killing-a-processes.patch --]
[-- Type: text/x-patch, Size: 4397 bytes --]

From 8ba5c089939b79a6567411c33d4db40e5846eef3 Mon Sep 17 00:00:00 2001
From: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Date: Thu, 28 Oct 2021 12:24:03 -0400
Subject: [PATCH] drm/sched: Avoid lockdep spalt on killing a processes
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Problem:
Signaling one sched fence from within another sched
fence's signal callback generates a lockdep splat because
both have the same lockdep class for their fence->lock.

Fix:
Fix the below stack by rescheduling the signaling and killing
of the jobs that are left when the entity is killed to an irq work.

[11176.741181]  dump_stack+0x10/0x12
[11176.741186] __lock_acquire.cold+0x208/0x2df
[11176.741197]  lock_acquire+0xc6/0x2d0
[11176.741204]  ? dma_fence_signal+0x28/0x80
[11176.741212] _raw_spin_lock_irqsave+0x4d/0x70
[11176.741219]  ? dma_fence_signal+0x28/0x80
[11176.741225]  dma_fence_signal+0x28/0x80
[11176.741230] drm_sched_fence_finished+0x12/0x20 [gpu_sched]
[11176.741240] drm_sched_entity_kill_jobs_cb+0x1c/0x50 [gpu_sched]
[11176.741248] dma_fence_signal_timestamp_locked+0xac/0x1a0
[11176.741254]  dma_fence_signal+0x3b/0x80
[11176.741260] drm_sched_fence_finished+0x12/0x20 [gpu_sched]
[11176.741268] drm_sched_job_done.isra.0+0x7f/0x1a0 [gpu_sched]
[11176.741277] drm_sched_job_done_cb+0x12/0x20 [gpu_sched]
[11176.741284] dma_fence_signal_timestamp_locked+0xac/0x1a0
[11176.741290]  dma_fence_signal+0x3b/0x80
[11176.741296] amdgpu_fence_process+0xd1/0x140 [amdgpu]
[11176.741504] sdma_v4_0_process_trap_irq+0x8c/0xb0 [amdgpu]
[11176.741731]  amdgpu_irq_dispatch+0xce/0x250 [amdgpu]
[11176.741954]  amdgpu_ih_process+0x81/0x100 [amdgpu]
[11176.742174]  amdgpu_irq_handler+0x26/0xa0 [amdgpu]
[11176.742393] __handle_irq_event_percpu+0x4f/0x2c0
[11176.742402] handle_irq_event_percpu+0x33/0x80
[11176.742408]  handle_irq_event+0x39/0x60
[11176.742414]  handle_edge_irq+0x93/0x1d0
[11176.742419]  __common_interrupt+0x50/0xe0
[11176.742426]  common_interrupt+0x80/0x90

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Suggested-by: Daniel Vetter  <daniel.vetter@ffwll.ch>
Suggested-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/scheduler/sched_entity.c | 15 ++++++++++++---
 include/drm/gpu_scheduler.h              | 12 +++++++++++-
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 27e1573af96e..191c56064f19 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -190,6 +190,16 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
 }
 EXPORT_SYMBOL(drm_sched_entity_flush);
 
+static void drm_sched_entity_kill_jobs_irq_work(struct irq_work *wrk)
+{
+	struct drm_sched_job *job = container_of(wrk, typeof(*job), work);
+
+	drm_sched_fence_finished(job->s_fence);
+	WARN_ON(job->s_fence->parent);
+	job->sched->ops->free_job(job);
+}
+
+
 /* Signal the scheduler finished fence when the entity in question is killed. */
 static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
 					  struct dma_fence_cb *cb)
@@ -197,9 +207,8 @@ static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
 	struct drm_sched_job *job = container_of(cb, struct drm_sched_job,
 						 finish_cb);
 
-	drm_sched_fence_finished(job->s_fence);
-	WARN_ON(job->s_fence->parent);
-	job->sched->ops->free_job(job);
+	init_irq_work(&job->work, drm_sched_entity_kill_jobs_irq_work);
+	irq_work_queue(&job->work);
 }
 
 static struct dma_fence *
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index f011e4c407f2..bbc22fad8d80 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -28,6 +28,7 @@
 #include <linux/dma-fence.h>
 #include <linux/completion.h>
 #include <linux/xarray.h>
+#include <linux/irq_work.h>
 
 #define MAX_WAIT_SCHED_ENTITY_Q_EMPTY msecs_to_jiffies(1000)
 
@@ -286,7 +287,16 @@ struct drm_sched_job {
 	struct list_head		list;
 	struct drm_gpu_scheduler	*sched;
 	struct drm_sched_fence		*s_fence;
-	struct dma_fence_cb		finish_cb;
+
+	/*
+	 * work is used only after finish_cb has been used and will not be
+	 * accessed anymore.
+	 */
+	union {
+		struct dma_fence_cb		finish_cb;
+		struct irq_work 		work;
+	};
+
 	uint64_t			id;
 	atomic_t			karma;
 	enum drm_sched_priority		s_priority;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: Lockdep spalt on killing a processes
  2021-10-28 17:26                           ` Andrey Grodzovsky
@ 2021-10-29  7:07                             ` Christian König
  2021-11-01 15:24                               ` Andrey Grodzovsky
  0 siblings, 1 reply; 19+ messages in thread
From: Christian König @ 2021-10-29  7:07 UTC (permalink / raw)
  To: Andrey Grodzovsky, Christian König, amd-gfx list, dri-devel,
	chris, daniel.vetter

Am 28.10.21 um 19:26 schrieb Andrey Grodzovsky:
>
> On 2021-10-27 3:58 p.m., Andrey Grodzovsky wrote:
>>
>> On 2021-10-27 10:50 a.m., Christian König wrote:
>>> Am 27.10.21 um 16:47 schrieb Andrey Grodzovsky:
>>>>
>>>> On 2021-10-27 10:34 a.m., Christian König wrote:
>>>>> Am 27.10.21 um 16:27 schrieb Andrey Grodzovsky:
>>>>>> [SNIP]
>>>>>>>
>>>>>>>> Let me please know if I am still missing some point of yours.
>>>>>>>
>>>>>>> Well, I mean we need to be able to handle this for all drivers.
>>>>>>
>>>>>>
>>>>>> For sure, but as i said above in my opinion we need to change 
>>>>>> only for those drivers that don't use the _locked version.
>>>>>
>>>>> And that absolutely won't work.
>>>>>
>>>>> See, the dma_fence is a contract between drivers, so you need the 
>>>>> same calling convention across all drivers.
>>>>>
>>>>> Either we always call the callback with the lock held or we always 
>>>>> call it without the lock; sometimes one way and sometimes the other 
>>>>> won't work (see the sketch after this message for why mixing the two 
>>>>> trips lockdep).
>>>>>
>>>>> Christian.
>>>>
>>>>
>>>> I am not sure I fully understand what problems this would cause, but 
>>>> anyway, then we are back to irq_work. We cannot embed the irq_work as 
>>>> a union within the dma_fence's cb_list, because that field is already 
>>>> reused as the timestamp and as the rcu head after the fence is 
>>>> signaled. So I will do it within drm_scheduler with a single irq_work 
>>>> per drm_sched_entity, as we discussed before.
>>>
>>> That won't work either. We free up the entity after the cleanup 
>>> function. That's the reason we use the callback on the job in the 
>>> first place.
>>
>>
>> Yep, missed it.
>>
>>
>>>
>>> We could overload the cb structure in the job though.
>>
>>
>> I guess so, since no one else is using this member after the cb has 
>> executed.
>>
>> Andrey
>
>
> Attached a patch. Please give it a try; I tested it on my side and 
> tried to generate the right conditions to trigger this code path by 
> repeatedly submitting commands while issuing a GPU reset to stop the 
> scheduler and then killing the command submission process in the 
> middle. But for some reason it looks like the job_queue was always 
> empty already at the time of the entity kill.

It was trivial to trigger with the stress utility I've hacked together:

amdgpu_stress -b v 1g -b g 1g -c 1 2 1g 1k

Then, while it is copying, just hit Ctrl+C to kill it.

The patch itself is:

Tested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>

Thanks,
Christian.

>
> Andrey
>
>
>>
>>
>>>
>>> Christian.
>>>
>>>>
>>>> Andrey
>>>>
>>>>
>>>>>
>>>>>>
>>>>>> Andrey
>>>>>
>>>
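
To make the calling-convention point quoted above concrete, here is a hypothetical, self-contained sketch (not from this thread) of the pattern that produces the splat: a callback running under one fence's lock signals a second fence whose lock was initialized at the same call site and therefore shares a lockdep class. Everything except the dma-fence and spinlock API calls is made up for illustration, and fence lifetime handling is deliberately omitted.

#include <linux/dma-fence.h>
#include <linux/spinlock.h>

static const char *demo_fence_name(struct dma_fence *f)
{
	return "demo";
}

static const struct dma_fence_ops demo_fence_ops = {
	.get_driver_name   = demo_fence_name,
	.get_timeline_name = demo_fence_name,
};

static spinlock_t lock_a, lock_b;
static struct dma_fence fence_a, fence_b;
static struct dma_fence_cb chain_cb;

/* One init helper means one spin_lock_init() call site, so both locks
 * end up in the same lockdep class - just like scheduler fences that
 * are all initialized from the same code path. */
static void demo_fence_setup(struct dma_fence *f, spinlock_t *lock,
			     u64 ctx, u64 seqno)
{
	spin_lock_init(lock);
	dma_fence_init(f, &demo_fence_ops, lock, ctx, seqno);
}

/* Invoked by dma_fence_signal(&fence_a), i.e. with lock_a held. */
static void chain_signal_cb(struct dma_fence *f, struct dma_fence_cb *cb)
{
	/* Takes lock_b, which has the same class as the already-held
	 * lock_a: lockdep reports possible recursive locking. */
	dma_fence_signal(&fence_b);
}

static void demo_trigger(void)
{
	u64 ctx = dma_fence_context_alloc(1);

	demo_fence_setup(&fence_a, &lock_a, ctx, 1);
	demo_fence_setup(&fence_b, &lock_b, ctx, 2);

	dma_fence_add_callback(&fence_a, &chain_cb, chain_signal_cb);
	dma_fence_signal(&fence_a);	/* callback fires under lock_a */
}

With a consistent convention, which the patch keeps by always deferring the second signal to irq_work, this nesting never happens; mixing locked and unlocked callback invocation between drivers would reintroduce it in whichever driver assumed the other convention.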



* Re: Lockdep splat on killing a process
  2021-10-29  7:07                             ` Christian König
@ 2021-11-01 15:24                               ` Andrey Grodzovsky
  0 siblings, 0 replies; 19+ messages in thread
From: Andrey Grodzovsky @ 2021-11-01 15:24 UTC (permalink / raw)
  To: Christian König, Christian König, amd-gfx list,
	dri-devel, chris, daniel.vetter


Pushed to drm-misc-next

Andrey

On 2021-10-29 3:07 a.m., Christian König wrote:
>>
>> Attached a patch. Please give it a try; I tested it on my side and 
>> tried to generate the right conditions to trigger this code path by 
>> repeatedly submitting commands while issuing a GPU reset to stop the 
>> scheduler and then killing the command submission process in the 
>> middle. But for some reason it looks like the job_queue was always 
>> empty already at the time of the entity kill.
>
> It was trivial to trigger with the stress utility I've hacked together:
>
> amdgpu_stress -b v 1g -b g 1g -c 1 2 1g 1k
>
> Then, while it is copying, just hit Ctrl+C to kill it.
>
> The patch itself is:
>
> Tested-by: Christian König <christian.koenig@amd.com>
> Reviewed-by: Christian König <christian.koenig@amd.com>
>
> Thanks,
> Christian. 



end of thread, newest message: 2021-11-01 15:24 UTC

Thread overview: 19+ messages
2021-10-01 10:50 Lockdep splat on killing a process Christian König
2021-10-01 14:52 ` Daniel Vetter
2021-10-01 15:10 ` Andrey Grodzovsky
2021-10-04  8:14   ` Christian König
2021-10-04 15:27     ` Andrey Grodzovsky
2021-10-20 19:32     ` Andrey Grodzovsky
2021-10-21  6:34       ` Christian König
2021-10-25 19:10         ` Andrey Grodzovsky
2021-10-25 19:56           ` Christian König
2021-10-26  2:33             ` Andrey Grodzovsky
2021-10-26 10:54               ` Christian König
2021-10-27 14:27                 ` Andrey Grodzovsky
2021-10-27 14:34                   ` Christian König
2021-10-27 14:47                     ` Andrey Grodzovsky
2021-10-27 14:50                       ` Christian König
2021-10-27 19:58                         ` Andrey Grodzovsky
2021-10-28 17:26                           ` Andrey Grodzovsky
2021-10-29  7:07                             ` Christian König
2021-11-01 15:24                               ` Andrey Grodzovsky
