linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [REPORT] libata deadlock possibilities by DEPT
@ 2022-02-16  4:12 Byungchul Park
  2022-02-16  4:16 ` Report 2 in ata_scsi_port_error_handler() Byungchul Park
  2022-02-16  6:52 ` [REPORT] libata deadlock possibilities by DEPT Byungchul Park
  0 siblings, 2 replies; 10+ messages in thread
From: Byungchul Park @ 2022-02-16  4:12 UTC (permalink / raw)
  To: damien.lemoal, linux-ide
  Cc: torvalds, mingo, linux-kernel, peterz, will, tglx, rostedt, joel,
	sashal, daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso,
	willy, david, amir73il, bfields, gregkh, kernel-team, linux-mm,
	akpm, mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis,
	cl, penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

Hi Damien and libata folks,

I've been developing a tool for detecting deadlock possibilities by
tracking wait/event rather than lock(?) acquisition order to try to
cover all synchonization machanisms. It's done on v5.17-rc1 tag.

https://github.com/lgebyungchulpark/linux-dept/commits/dept1.11_on_v5.17-rc1

Benifit:

	0. Works with all lock primitives.
	1. Works with wait_for_completion()/complete().
	2. Works with 'wait' on PG_locked.
	3. Works with 'wait' on PG_writeback.
	4. Works with swait/wakeup.
	5. Works with waitqueue.
	6. Multiple reports are allowed.
	7. Deduplication control on multiple reports.
	8. Withstand false positives thanks to 6.
	9. Easy to tag any wait/event.

Future work:

	0. To make it more stable.
	1. To separates Dept from Lockdep.
	2. To improves performance in terms of time and space.
	3. To use Dept as a dependency engine for Lockdep.
	4. To add any missing tags of wait/event in the kernel.
	5. To deduplicate stack trace.

I've got several reports from the tool. Some of them look like false
alarms caused by Lockdep's fake annotations added for better detection.
However, some others look like real deadlock possibility. Because of my
unfamiliarity of the domain, it's hard to confirm if it's a real one.
I'd like to ask for your opinion on it and it'd be appreciated.

How to interpret the report is:

	1. E(event) in each context cannot be triggered because of the
	   W(wait) that cannot be woken.
	2. The stack trace helping find the problematic code is located
	   in each conext's detail.

Let me add the reports on this email thread.

---
Thanks,
Byungchul


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Report 2 in ata_scsi_port_error_handler()
  2022-02-16  4:12 [REPORT] libata deadlock possibilities by DEPT Byungchul Park
@ 2022-02-16  4:16 ` Byungchul Park
  2022-02-16  4:16   ` Report 3 " Byungchul Park
  2022-02-16  4:16   ` Report " Byungchul Park
  2022-02-16  6:52 ` [REPORT] libata deadlock possibilities by DEPT Byungchul Park
  1 sibling, 2 replies; 10+ messages in thread
From: Byungchul Park @ 2022-02-16  4:16 UTC (permalink / raw)
  To: damien.lemoal, linux-ide
  Cc: torvalds, mingo, linux-kernel, peterz, will, tglx, rostedt, joel,
	sashal, daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso,
	willy, david, amir73il, bfields, gregkh, kernel-team, linux-mm,
	akpm, mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis,
	cl, penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

[    2.051991] ===================================================
[    2.051991] DEPT: Circular dependency has been detected.
[    2.051991] 5.17.0-rc1-00014-gcf3441bb2012 #2 Tainted: G        W        
[    2.051991] ---------------------------------------------------
[    2.051991] summary
[    2.051991] ---------------------------------------------------
[    2.051991] *** DEADLOCK ***
[    2.051991] 
[    2.051991] context A
[    2.051991]     [S] (unknown)(&(&ap->eh_wait_q)->dmap:0)
[    2.051991]     [W] wait_for_completion_timeout(&wait:0)
[    2.051991]     [E] event(&(&ap->eh_wait_q)->dmap:0)
[    2.051991] 
[    2.051991] context B
[    2.051991]     [S] __raw_spin_lock_irqsave(&host->lock:0)
[    2.051991]     [W] wait(&(&ap->eh_wait_q)->dmap:0)
[    2.051991]     [E] spin_unlock(&host->lock:0)
[    2.051991] 
[    2.051991] context C
[    2.051991]     [S] (unknown)(&wait:0)
[    2.051991]     [W] __raw_spin_lock_irq(&host->lock:0)
[    2.051991]     [E] complete(&wait:0)
[    2.051991] 
[    2.051991] [S]: start of the event context
[    2.051991] [W]: the wait blocked
[    2.051991] [E]: the event not reachable
[    2.051991] ---------------------------------------------------
[    2.051991] context A's detail
[    2.051991] ---------------------------------------------------
[    2.051991] context A
[    2.051991]     [S] (unknown)(&(&ap->eh_wait_q)->dmap:0)
[    2.051991]     [W] wait_for_completion_timeout(&wait:0)
[    2.051991]     [E] event(&(&ap->eh_wait_q)->dmap:0)
[    2.051991] 
[    2.051991] [S] (unknown)(&(&ap->eh_wait_q)->dmap:0):
[    2.051991] (N/A)
[    2.051991] 
[    2.051991] [W] wait_for_completion_timeout(&wait:0):
[    2.051991] [<ffffffff817763b1>] ata_exec_internal_sg+0x401/0x690
[    2.051991] stacktrace:
[    2.051991]       wait_for_completion_timeout+0x40/0xf0
[    2.051991]       ata_exec_internal_sg+0x401/0x690
[    2.051991]       ata_exec_internal+0x4a/0x70
[    2.051991]       ata_dev_read_id+0x2ce/0x540
[    2.051991]       ata_dev_reread_id+0x33/0xc0
[    2.051991]       ata_dev_revalidate+0x41/0x190
[    2.051991]       ata_do_set_mode+0x61c/0xac0
[    2.051991]       ata_set_mode+0xfc/0x110
[    2.051991]       ata_eh_recover+0x1061/0x1360
[    2.051991]       ata_do_eh+0x3f/0xa0
[    2.051991]       ata_scsi_port_error_handler+0x432/0x740
[    2.051991]       ata_scsi_error+0x94/0xc0
[    2.051991]       scsi_error_handler+0x8d/0x3a0
[    2.051991]       kthread+0xe3/0x110
[    2.051991]       ret_from_fork+0x22/0x30
[    2.051991] 
[    2.051991] [E] event(&(&ap->eh_wait_q)->dmap:0):
[    2.051991] [<ffffffff810baa53>] __wake_up_common+0x93/0x1a0
[    2.051991] stacktrace:
[    2.051991]       dept_event+0x12b/0x1f0
[    2.051991]       __wake_up_common+0xb0/0x1a0
[    2.051991]       __wake_up_common_lock+0x65/0x90
[    2.051991]       ata_scsi_port_error_handler+0x67a/0x740
[    2.051991]       ata_scsi_error+0x94/0xc0
[    2.051991]       scsi_error_handler+0x8d/0x3a0
[    2.051991]       kthread+0xe3/0x110
[    2.051991]       ret_from_fork+0x22/0x30
[    2.051991] ---------------------------------------------------
[    2.051991] context B's detail
[    2.051991] ---------------------------------------------------
[    2.051991] context B
[    2.051991]     [S] __raw_spin_lock_irqsave(&host->lock:0)
[    2.051991]     [W] wait(&(&ap->eh_wait_q)->dmap:0)
[    2.051991]     [E] spin_unlock(&host->lock:0)
[    2.051991] 
[    2.051991] [S] __raw_spin_lock_irqsave(&host->lock:0):
[    2.051991] [<ffffffff8178047c>] ata_port_wait_eh+0x6c/0xc0
[    2.051991] stacktrace:
[    2.051991]       _raw_spin_lock_irqsave+0x82/0xa0
[    2.051991]       ata_port_wait_eh+0x6c/0xc0
[    2.051991]       ata_port_probe+0x1f/0x30
[    2.051991]       async_port_probe+0x27/0x50
[    2.051991]       async_run_entry_fn+0x21/0xa0
[    2.051991]       process_one_work+0x317/0x640
[    2.051991]       worker_thread+0x44/0x410
[    2.051991]       kthread+0xe3/0x110
[    2.051991]       ret_from_fork+0x22/0x30
[    2.051991] 
[    2.051991] [W] wait(&(&ap->eh_wait_q)->dmap:0):
[    2.051991] [<ffffffff810bb017>] prepare_to_wait+0x47/0xd0
[    2.051991] stacktrace:
[    2.051991]       ata_port_wait_eh+0x52/0xc0
[    2.051991]       ata_port_probe+0x1f/0x30
[    2.051991]       async_port_probe+0x27/0x50
[    2.051991]       async_run_entry_fn+0x21/0xa0
[    2.051991]       process_one_work+0x317/0x640
[    2.051991]       worker_thread+0x44/0x410
[    2.051991]       kthread+0xe3/0x110
[    2.051991]       ret_from_fork+0x22/0x30
[    2.051991] 
[    2.051991] [E] spin_unlock(&host->lock:0):
[    2.051991] [<ffffffff8178046e>] ata_port_wait_eh+0x5e/0xc0
[    2.051991] stacktrace:
[    2.051991]       _raw_spin_unlock_irqrestore+0x35/0x80
[    2.051991]       ata_port_wait_eh+0x5e/0xc0
[    2.051991]       ata_port_probe+0x1f/0x30
[    2.051991]       async_port_probe+0x27/0x50
[    2.051991]       async_run_entry_fn+0x21/0xa0
[    2.051991]       process_one_work+0x317/0x640
[    2.051991]       worker_thread+0x44/0x410
[    2.051991]       kthread+0xe3/0x110
[    2.051991]       ret_from_fork+0x22/0x30
[    2.051991] ---------------------------------------------------
[    2.051991] context C's detail
[    2.051991] ---------------------------------------------------
[    2.051991] context C
[    2.051991]     [S] (unknown)(&wait:0)
[    2.051991]     [W] __raw_spin_lock_irq(&host->lock:0)
[    2.051991]     [E] complete(&wait:0)
[    2.051991] 
[    2.051991] [S] (unknown)(&wait:0):
[    2.051991] (N/A)
[    2.051991] 
[    2.051991] [W] __raw_spin_lock_irq(&host->lock:0):
[    2.051991] [<ffffffff8178c51b>] ata_sff_pio_task+0x1b/0x1b0
[    2.051991] stacktrace:
[    2.051991]       _raw_spin_lock_irq+0x58/0x90
[    2.051991]       ata_sff_pio_task+0x1b/0x1b0
[    2.051991]       process_one_work+0x317/0x640
[    2.051991]       worker_thread+0x44/0x410
[    2.051991]       kthread+0xe3/0x110
[    2.051991]       ret_from_fork+0x22/0x30
[    2.051991] 
[    2.051991] [E] complete(&wait:0):
[    2.051991] [<ffffffff81780569>] ata_do_link_abort+0x99/0xd0
[    2.051991] stacktrace:
[    2.051991]       complete+0x28/0x60
[    2.051991]       ata_do_link_abort+0x99/0xd0
[    2.051991]       ata_sff_hsm_move+0xfc/0xad0
[    2.051991]       ata_sff_pio_task+0x177/0x1b0
[    2.051991]       process_one_work+0x317/0x640
[    2.051991]       worker_thread+0x44/0x410
[    2.051991]       kthread+0xe3/0x110
[    2.051991]       ret_from_fork+0x22/0x30
[    2.051991] ---------------------------------------------------
[    2.051991] information that might be helpful
[    2.051991] ---------------------------------------------------
[    2.051991] CPU: 1 PID: 53 Comm: scsi_eh_1 Tainted: G        W         5.17.0-rc1-00014-gcf3441bb2012 #2
[    2.051991] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[    2.051991] Call Trace:
[    2.051991]  <TASK>
[    2.051991]  dump_stack_lvl+0x44/0x57
[    2.051991]  print_circle+0x384/0x510
[    2.051991]  ? print_circle+0x510/0x510
[    2.051991]  cb_check_dl+0x58/0x60
[    2.051991]  bfs+0xdc/0x1b0
[    2.051991]  add_dep+0x94/0x120
[    2.051991]  do_event.isra.22+0x284/0x300
[    2.051991]  ? __wake_up_common+0x93/0x1a0
[    2.051991]  dept_event+0x12b/0x1f0
[    2.051991]  __wake_up_common+0xb0/0x1a0
[    2.051991]  __wake_up_common_lock+0x65/0x90
[    2.051991]  ata_scsi_port_error_handler+0x67a/0x740
[    2.051991]  ? trace_hardirqs_on+0x38/0xe0
[    2.051991]  ? scsi_eh_get_sense+0x150/0x150
[    2.051991]  ata_scsi_error+0x94/0xc0
[    2.051991]  scsi_error_handler+0x8d/0x3a0
[    2.051991]  ? _raw_spin_unlock_irqrestore+0x63/0x80
[    2.051991]  ? scsi_eh_get_sense+0x150/0x150
[    2.051991]  kthread+0xe3/0x110
[    2.051991]  ? kthread_complete_and_exit+0x20/0x20
[    2.051991]  ret_from_fork+0x22/0x30
[    2.051991]  </TASK>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Report 3 in ata_scsi_port_error_handler()
  2022-02-16  4:16 ` Report 2 in ata_scsi_port_error_handler() Byungchul Park
@ 2022-02-16  4:16   ` Byungchul Park
  2022-02-16  4:16   ` Report " Byungchul Park
  1 sibling, 0 replies; 10+ messages in thread
From: Byungchul Park @ 2022-02-16  4:16 UTC (permalink / raw)
  To: damien.lemoal, linux-ide
  Cc: torvalds, mingo, linux-kernel, peterz, will, tglx, rostedt, joel,
	sashal, daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso,
	willy, david, amir73il, bfields, gregkh, kernel-team, linux-mm,
	akpm, mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis,
	cl, penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

[    2.051991] ===================================================
[    2.051991] DEPT: Circular dependency has been detected.
[    2.051991] 5.17.0-rc1-00014-gcf3441bb2012 #2 Tainted: G        W        
[    2.051991] ---------------------------------------------------
[    2.051991] summary
[    2.051991] ---------------------------------------------------
[    2.051991] *** DEADLOCK ***
[    2.051991] 
[    2.051991] context A
[    2.051991]     [S] (unknown)(&(&ap->eh_wait_q)->dmap:0)
[    2.051991]     [W] __mutex_lock_common(&host->eh_mutex:0)
[    2.051991]     [E] event(&(&ap->eh_wait_q)->dmap:0)
[    2.051991] 
[    2.051991] context B
[    2.051991]     [S] __raw_spin_lock_irqsave(&host->lock:0)
[    2.051991]     [W] wait(&(&ap->eh_wait_q)->dmap:0)
[    2.051991]     [E] spin_unlock(&host->lock:0)
[    2.051991] 
[    2.051991] context C
[    2.051991]     [S] __mutex_lock_common(&host->eh_mutex:0)
[    2.051991]     [W] __raw_spin_lock_irqsave(&host->lock:0)
[    2.051991]     [E] mutex_unlock(&host->eh_mutex:0)
[    2.051991] 
[    2.051991] [S]: start of the event context
[    2.051991] [W]: the wait blocked
[    2.051991] [E]: the event not reachable
[    2.051991] ---------------------------------------------------
[    2.051991] context A's detail
[    2.051991] ---------------------------------------------------
[    2.051991] context A
[    2.051991]     [S] (unknown)(&(&ap->eh_wait_q)->dmap:0)
[    2.051991]     [W] __mutex_lock_common(&host->eh_mutex:0)
[    2.051991]     [E] event(&(&ap->eh_wait_q)->dmap:0)
[    2.051991] 
[    2.051991] [S] (unknown)(&(&ap->eh_wait_q)->dmap:0):
[    2.051991] (N/A)
[    2.051991] 
[    2.051991] [W] __mutex_lock_common(&host->eh_mutex:0):
[    2.051991] [<ffffffff81780ad9>] ata_eh_acquire+0x19/0x40
[    2.051991] stacktrace:
[    2.051991]       __mutex_lock+0x52c/0x900
[    2.051991]       ata_eh_acquire+0x19/0x40
[    2.051991]       ata_exec_internal_sg+0x41a/0x690
[    2.051991]       ata_exec_internal+0x4a/0x70
[    2.051991]       ata_dev_read_id+0x2ce/0x540
[    2.051991]       ata_dev_reread_id+0x33/0xc0
[    2.051991]       ata_dev_revalidate+0x41/0x190
[    2.051991]       ata_do_set_mode+0x61c/0xac0
[    2.051991]       ata_set_mode+0xfc/0x110
[    2.051991]       ata_eh_recover+0x1061/0x1360
[    2.051991]       ata_do_eh+0x3f/0xa0
[    2.051991]       ata_scsi_port_error_handler+0x432/0x740
[    2.051991]       ata_scsi_error+0x94/0xc0
[    2.051991]       scsi_error_handler+0x8d/0x3a0
[    2.051991]       kthread+0xe3/0x110
[    2.051991]       ret_from_fork+0x22/0x30
[    2.051991] 
[    2.051991] [E] event(&(&ap->eh_wait_q)->dmap:0):
[    2.051991] [<ffffffff810baa53>] __wake_up_common+0x93/0x1a0
[    2.051991] stacktrace:
[    2.051991]       dept_event+0x12b/0x1f0
[    2.051991]       __wake_up_common+0xb0/0x1a0
[    2.051991]       __wake_up_common_lock+0x65/0x90
[    2.051991]       ata_scsi_port_error_handler+0x67a/0x740
[    2.051991]       ata_scsi_error+0x94/0xc0
[    2.051991]       scsi_error_handler+0x8d/0x3a0
[    2.051991]       kthread+0xe3/0x110
[    2.051991]       ret_from_fork+0x22/0x30
[    2.051991] ---------------------------------------------------
[    2.051991] context B's detail
[    2.051991] ---------------------------------------------------
[    2.051991] context B
[    2.051991]     [S] __raw_spin_lock_irqsave(&host->lock:0)
[    2.051991]     [W] wait(&(&ap->eh_wait_q)->dmap:0)
[    2.051991]     [E] spin_unlock(&host->lock:0)
[    2.051991] 
[    2.051991] [S] __raw_spin_lock_irqsave(&host->lock:0):
[    2.051991] [<ffffffff8178047c>] ata_port_wait_eh+0x6c/0xc0
[    2.051991] stacktrace:
[    2.051991]       _raw_spin_lock_irqsave+0x82/0xa0
[    2.051991]       ata_port_wait_eh+0x6c/0xc0
[    2.051991]       ata_port_probe+0x1f/0x30
[    2.051991]       async_port_probe+0x27/0x50
[    2.051991]       async_run_entry_fn+0x21/0xa0
[    2.051991]       process_one_work+0x317/0x640
[    2.051991]       worker_thread+0x44/0x410
[    2.051991]       kthread+0xe3/0x110
[    2.051991]       ret_from_fork+0x22/0x30
[    2.051991] 
[    2.051991] [W] wait(&(&ap->eh_wait_q)->dmap:0):
[    2.051991] [<ffffffff810bb017>] prepare_to_wait+0x47/0xd0
[    2.051991] stacktrace:
[    2.051991]       ata_port_wait_eh+0x52/0xc0
[    2.051991]       ata_port_probe+0x1f/0x30
[    2.051991]       async_port_probe+0x27/0x50
[    2.051991]       async_run_entry_fn+0x21/0xa0
[    2.051991]       process_one_work+0x317/0x640
[    2.051991]       worker_thread+0x44/0x410
[    2.051991]       kthread+0xe3/0x110
[    2.051991]       ret_from_fork+0x22/0x30
[    2.051991] 
[    2.051991] [E] spin_unlock(&host->lock:0):
[    2.051991] [<ffffffff8178046e>] ata_port_wait_eh+0x5e/0xc0
[    2.051991] stacktrace:
[    2.051991]       _raw_spin_unlock_irqrestore+0x35/0x80
[    2.051991]       ata_port_wait_eh+0x5e/0xc0
[    2.051991]       ata_port_probe+0x1f/0x30
[    2.051991]       async_port_probe+0x27/0x50
[    2.051991]       async_run_entry_fn+0x21/0xa0
[    2.051991]       process_one_work+0x317/0x640
[    2.051991]       worker_thread+0x44/0x410
[    2.051991]       kthread+0xe3/0x110
[    2.051991]       ret_from_fork+0x22/0x30
[    2.051991] ---------------------------------------------------
[    2.051991] context C's detail
[    2.051991] ---------------------------------------------------
[    2.051991] context C
[    2.051991]     [S] __mutex_lock_common(&host->eh_mutex:0)
[    2.051991]     [W] __raw_spin_lock_irqsave(&host->lock:0)
[    2.051991]     [E] mutex_unlock(&host->eh_mutex:0)
[    2.051991] 
[    2.051991] [S] __mutex_lock_common(&host->eh_mutex:0):
[    2.051991] [<ffffffff81780ad9>] ata_eh_acquire+0x19/0x40
[    2.051991] stacktrace:
[    2.051991]       __mutex_lock+0x54d/0x900
[    2.051991]       ata_eh_acquire+0x19/0x40
[    2.051991]       ata_scsi_port_error_handler+0x32/0x740
[    2.051991]       ata_scsi_error+0x94/0xc0
[    2.051991]       scsi_error_handler+0x8d/0x3a0
[    2.051991]       kthread+0xe3/0x110
[    2.051991]       ret_from_fork+0x22/0x30
[    2.051991] 
[    2.051991] [W] __raw_spin_lock_irqsave(&host->lock:0):
[    2.051991] [<ffffffff817853df>] ata_scsi_port_error_handler+0x4f/0x740
[    2.051991] stacktrace:
[    2.051991]       _raw_spin_lock_irqsave+0x5f/0xa0
[    2.051991]       ata_scsi_port_error_handler+0x4f/0x740
[    2.051991]       ata_scsi_error+0x94/0xc0
[    2.051991]       scsi_error_handler+0x8d/0x3a0
[    2.051991]       kthread+0xe3/0x110
[    2.051991]       ret_from_fork+0x22/0x30
[    2.051991] 
[    2.051991] [E] mutex_unlock(&host->eh_mutex:0):
[    2.051991] [<ffffffff81773cd7>] ata_msleep+0x77/0x80
[    2.051991] stacktrace:
[    2.051991]       __mutex_unlock_slowpath+0x49/0x2a0
[    2.051991]       ata_msleep+0x77/0x80
[    2.051991]       ata_sff_wait_after_reset+0x27/0x150
[    2.051991]       ata_sff_softreset+0xc8/0x1a0
[    2.051991]       ata_eh_reset+0x3dc/0x1390
[    2.051991]       ata_eh_recover+0x257/0x1360
[    2.051991]       ata_do_eh+0x3f/0xa0
[    2.051991]       ata_scsi_port_error_handler+0x432/0x740
[    2.051991]       ata_scsi_error+0x94/0xc0
[    2.051991]       scsi_error_handler+0x8d/0x3a0
[    2.051991]       kthread+0xe3/0x110
[    2.051991]       ret_from_fork+0x22/0x30
[    2.051991] ---------------------------------------------------
[    2.051991] information that might be helpful
[    2.051991] ---------------------------------------------------
[    2.051991] CPU: 1 PID: 53 Comm: scsi_eh_1 Tainted: G        W         5.17.0-rc1-00014-gcf3441bb2012 #2
[    2.051991] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[    2.051991] Call Trace:
[    2.051991]  <TASK>
[    2.051991]  dump_stack_lvl+0x44/0x57
[    2.051991]  print_circle+0x384/0x510
[    2.051991]  ? print_circle+0x510/0x510
[    2.051991]  cb_check_dl+0x58/0x60
[    2.051991]  bfs+0xdc/0x1b0
[    2.051991]  add_dep+0x94/0x120
[    2.051991]  do_event.isra.22+0x284/0x300
[    2.051991]  ? __wake_up_common+0x93/0x1a0
[    2.051991]  dept_event+0x12b/0x1f0
[    2.051991]  __wake_up_common+0xb0/0x1a0
[    2.051991]  __wake_up_common_lock+0x65/0x90
[    2.051991]  ata_scsi_port_error_handler+0x67a/0x740
[    2.051991]  ? trace_hardirqs_on+0x38/0xe0
[    2.051991]  ? scsi_eh_get_sense+0x150/0x150
[    2.051991]  ata_scsi_error+0x94/0xc0
[    2.051991]  scsi_error_handler+0x8d/0x3a0
[    2.051991]  ? _raw_spin_unlock_irqrestore+0x63/0x80
[    2.051991]  ? scsi_eh_get_sense+0x150/0x150
[    2.051991]  kthread+0xe3/0x110
[    2.051991]  ? kthread_complete_and_exit+0x20/0x20
[    2.051991]  ret_from_fork+0x22/0x30
[    2.051991]  </TASK>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Report in ata_scsi_port_error_handler()
  2022-02-16  4:16 ` Report 2 in ata_scsi_port_error_handler() Byungchul Park
  2022-02-16  4:16   ` Report 3 " Byungchul Park
@ 2022-02-16  4:16   ` Byungchul Park
  2022-02-16  6:37     ` Damien Le Moal
  1 sibling, 1 reply; 10+ messages in thread
From: Byungchul Park @ 2022-02-16  4:16 UTC (permalink / raw)
  To: damien.lemoal, linux-ide
  Cc: torvalds, mingo, linux-kernel, peterz, will, tglx, rostedt, joel,
	sashal, daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso,
	willy, david, amir73il, bfields, gregkh, kernel-team, linux-mm,
	akpm, mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis,
	cl, penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

[    2.051040] ===================================================
[    2.051406] DEPT: Circular dependency has been detected.
[    2.051730] 5.17.0-rc1-00014-gcf3441bb2012 #2 Tainted: G        W        
[    2.051991] ---------------------------------------------------
[    2.051991] summary
[    2.051991] ---------------------------------------------------
[    2.051991] *** DEADLOCK ***
[    2.051991] 
[    2.051991] context A
[    2.051991]     [S] (unknown)(&(&ap->eh_wait_q)->dmap:0)
[    2.051991]     [W] __raw_spin_lock_irq(&host->lock:0)
[    2.051991]     [E] event(&(&ap->eh_wait_q)->dmap:0)
[    2.051991] 
[    2.051991] context B
[    2.051991]     [S] __raw_spin_lock_irqsave(&host->lock:0)
[    2.051991]     [W] wait(&(&ap->eh_wait_q)->dmap:0)
[    2.051991]     [E] spin_unlock(&host->lock:0)
[    2.051991] 
[    2.051991] [S]: start of the event context
[    2.051991] [W]: the wait blocked
[    2.051991] [E]: the event not reachable
[    2.051991] ---------------------------------------------------
[    2.051991] context A's detail
[    2.051991] ---------------------------------------------------
[    2.051991] context A
[    2.051991]     [S] (unknown)(&(&ap->eh_wait_q)->dmap:0)
[    2.051991]     [W] __raw_spin_lock_irq(&host->lock:0)
[    2.051991]     [E] event(&(&ap->eh_wait_q)->dmap:0)
[    2.051991] 
[    2.051991] [S] (unknown)(&(&ap->eh_wait_q)->dmap:0):
[    2.051991] (N/A)
[    2.051991] 
[    2.051991] [W] __raw_spin_lock_irq(&host->lock:0):
[    2.051991] [<ffffffff8178d355>] ata_sff_flush_pio_task+0x45/0xc0
[    2.051991] stacktrace:
[    2.051991]       _raw_spin_lock_irq+0x58/0x90
[    2.051991]       ata_sff_flush_pio_task+0x45/0xc0
[    2.051991]       ata_exec_internal_sg+0x422/0x690
[    2.051991]       ata_do_set_mode+0x5ee/0xac0
[    2.051991]       ata_set_mode+0xfc/0x110
[    2.051991]       ata_eh_recover+0x1061/0x1360
[    2.051991]       ata_do_eh+0x3f/0xa0
[    2.051991]       ata_scsi_port_error_handler+0x432/0x740
[    2.051991]       ata_scsi_error+0x94/0xc0
[    2.051991]       scsi_error_handler+0x8d/0x3a0
[    2.051991]       kthread+0xe3/0x110
[    2.051991]       ret_from_fork+0x22/0x30
[    2.051991] 
[    2.051991] [E] event(&(&ap->eh_wait_q)->dmap:0):
[    2.051991] [<ffffffff810baa53>] __wake_up_common+0x93/0x1a0
[    2.051991] stacktrace:
[    2.051991]       dept_event+0x12b/0x1f0
[    2.051991]       __wake_up_common+0xb0/0x1a0
[    2.051991]       __wake_up_common_lock+0x65/0x90
[    2.051991]       ata_scsi_port_error_handler+0x67a/0x740
[    2.051991]       ata_scsi_error+0x94/0xc0
[    2.051991]       scsi_error_handler+0x8d/0x3a0
[    2.051991]       kthread+0xe3/0x110
[    2.051991]       ret_from_fork+0x22/0x30
[    2.051991] ---------------------------------------------------
[    2.051991] context B's detail
[    2.051991] ---------------------------------------------------
[    2.051991] context B
[    2.051991]     [S] __raw_spin_lock_irqsave(&host->lock:0)
[    2.051991]     [W] wait(&(&ap->eh_wait_q)->dmap:0)
[    2.051991]     [E] spin_unlock(&host->lock:0)
[    2.051991] 
[    2.051991] [S] __raw_spin_lock_irqsave(&host->lock:0):
[    2.051991] [<ffffffff8178047c>] ata_port_wait_eh+0x6c/0xc0
[    2.051991] stacktrace:
[    2.051991]       _raw_spin_lock_irqsave+0x82/0xa0
[    2.051991]       ata_port_wait_eh+0x6c/0xc0
[    2.051991]       ata_port_probe+0x1f/0x30
[    2.051991]       async_port_probe+0x27/0x50
[    2.051991]       async_run_entry_fn+0x21/0xa0
[    2.051991]       process_one_work+0x317/0x640
[    2.051991]       worker_thread+0x44/0x410
[    2.051991]       kthread+0xe3/0x110
[    2.051991]       ret_from_fork+0x22/0x30
[    2.051991] 
[    2.051991] [W] wait(&(&ap->eh_wait_q)->dmap:0):
[    2.051991] [<ffffffff810bb017>] prepare_to_wait+0x47/0xd0
[    2.051991] stacktrace:
[    2.051991]       ata_port_wait_eh+0x52/0xc0
[    2.051991]       ata_port_probe+0x1f/0x30
[    2.051991]       async_port_probe+0x27/0x50
[    2.051991]       async_run_entry_fn+0x21/0xa0
[    2.051991]       process_one_work+0x317/0x640
[    2.051991]       worker_thread+0x44/0x410
[    2.051991]       kthread+0xe3/0x110
[    2.051991]       ret_from_fork+0x22/0x30
[    2.051991] 
[    2.051991] [E] spin_unlock(&host->lock:0):
[    2.051991] [<ffffffff8178046e>] ata_port_wait_eh+0x5e/0xc0
[    2.051991] stacktrace:
[    2.051991]       _raw_spin_unlock_irqrestore+0x35/0x80
[    2.051991]       ata_port_wait_eh+0x5e/0xc0
[    2.051991]       ata_port_probe+0x1f/0x30
[    2.051991]       async_port_probe+0x27/0x50
[    2.051991]       async_run_entry_fn+0x21/0xa0
[    2.051991]       process_one_work+0x317/0x640
[    2.051991]       worker_thread+0x44/0x410
[    2.051991]       kthread+0xe3/0x110
[    2.051991]       ret_from_fork+0x22/0x30
[    2.051991] ---------------------------------------------------
[    2.051991] information that might be helpful
[    2.051991] ---------------------------------------------------
[    2.051991] CPU: 1 PID: 53 Comm: scsi_eh_1 Tainted: G        W         5.17.0-rc1-00014-gcf3441bb2012 #2
[    2.051991] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[    2.051991] Call Trace:
[    2.051991]  <TASK>
[    2.051991]  dump_stack_lvl+0x44/0x57
[    2.051991]  print_circle+0x384/0x510
[    2.051991]  ? print_circle+0x510/0x510
[    2.051991]  cb_check_dl+0x58/0x60
[    2.051991]  bfs+0xdc/0x1b0
[    2.051991]  add_dep+0x94/0x120
[    2.051991]  do_event.isra.22+0x284/0x300
[    2.051991]  ? __wake_up_common+0x93/0x1a0
[    2.051991]  dept_event+0x12b/0x1f0
[    2.051991]  __wake_up_common+0xb0/0x1a0
[    2.051991]  __wake_up_common_lock+0x65/0x90
[    2.051991]  ata_scsi_port_error_handler+0x67a/0x740
[    2.051991]  ? trace_hardirqs_on+0x38/0xe0
[    2.051991]  ? scsi_eh_get_sense+0x150/0x150
[    2.051991]  ata_scsi_error+0x94/0xc0
[    2.051991]  scsi_error_handler+0x8d/0x3a0
[    2.051991]  ? _raw_spin_unlock_irqrestore+0x63/0x80
[    2.051991]  ? scsi_eh_get_sense+0x150/0x150
[    2.051991]  kthread+0xe3/0x110
[    2.051991]  ? kthread_complete_and_exit+0x20/0x20
[    2.051991]  ret_from_fork+0x22/0x30
[    2.051991]  </TASK>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Report in ata_scsi_port_error_handler()
  2022-02-16  4:16   ` Report " Byungchul Park
@ 2022-02-16  6:37     ` Damien Le Moal
  2022-02-16 18:09       ` Linus Torvalds
  0 siblings, 1 reply; 10+ messages in thread
From: Damien Le Moal @ 2022-02-16  6:37 UTC (permalink / raw)
  To: Byungchul Park, linux-ide
  Cc: torvalds, mingo, linux-kernel, peterz, will, tglx, rostedt, joel,
	sashal, daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso,
	willy, david, amir73il, bfields, gregkh, kernel-team, linux-mm,
	akpm, mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis,
	cl, penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

On 2/16/22 13:16, Byungchul Park wrote:
> [    2.051040] ===================================================
> [    2.051406] DEPT: Circular dependency has been detected.
> [    2.051730] 5.17.0-rc1-00014-gcf3441bb2012 #2 Tainted: G        W        
> [    2.051991] ---------------------------------------------------
> [    2.051991] summary
> [    2.051991] ---------------------------------------------------
> [    2.051991] *** DEADLOCK ***
> [    2.051991] 
> [    2.051991] context A
> [    2.051991]     [S] (unknown)(&(&ap->eh_wait_q)->dmap:0)
> [    2.051991]     [W] __raw_spin_lock_irq(&host->lock:0)
> [    2.051991]     [E] event(&(&ap->eh_wait_q)->dmap:0)
> [    2.051991] 
> [    2.051991] context B
> [    2.051991]     [S] __raw_spin_lock_irqsave(&host->lock:0)
> [    2.051991]     [W] wait(&(&ap->eh_wait_q)->dmap:0)
> [    2.051991]     [E] spin_unlock(&host->lock:0)

Sleeping with a spinlock held would be triggering warnings already, so
these reports seem bogus to me.

In any case, trying to make sense of these reports is not easy, to say
the least. If you think there is really a bug, then please send a clear
explanation of your findings rather than a raw tool output that did not
go through human analysis.

> [    2.051991] 
> [    2.051991] [S]: start of the event context
> [    2.051991] [W]: the wait blocked
> [    2.051991] [E]: the event not reachable
> [    2.051991] ---------------------------------------------------
> [    2.051991] context A's detail
> [    2.051991] ---------------------------------------------------
> [    2.051991] context A
> [    2.051991]     [S] (unknown)(&(&ap->eh_wait_q)->dmap:0)
> [    2.051991]     [W] __raw_spin_lock_irq(&host->lock:0)
> [    2.051991]     [E] event(&(&ap->eh_wait_q)->dmap:0)
> [    2.051991] 
> [    2.051991] [S] (unknown)(&(&ap->eh_wait_q)->dmap:0):
> [    2.051991] (N/A)
> [    2.051991] 
> [    2.051991] [W] __raw_spin_lock_irq(&host->lock:0):
> [    2.051991] [<ffffffff8178d355>] ata_sff_flush_pio_task+0x45/0xc0
> [    2.051991] stacktrace:
> [    2.051991]       _raw_spin_lock_irq+0x58/0x90
> [    2.051991]       ata_sff_flush_pio_task+0x45/0xc0
> [    2.051991]       ata_exec_internal_sg+0x422/0x690
> [    2.051991]       ata_do_set_mode+0x5ee/0xac0
> [    2.051991]       ata_set_mode+0xfc/0x110
> [    2.051991]       ata_eh_recover+0x1061/0x1360
> [    2.051991]       ata_do_eh+0x3f/0xa0
> [    2.051991]       ata_scsi_port_error_handler+0x432/0x740
> [    2.051991]       ata_scsi_error+0x94/0xc0
> [    2.051991]       scsi_error_handler+0x8d/0x3a0
> [    2.051991]       kthread+0xe3/0x110
> [    2.051991]       ret_from_fork+0x22/0x30
> [    2.051991] 
> [    2.051991] [E] event(&(&ap->eh_wait_q)->dmap:0):
> [    2.051991] [<ffffffff810baa53>] __wake_up_common+0x93/0x1a0
> [    2.051991] stacktrace:
> [    2.051991]       dept_event+0x12b/0x1f0
> [    2.051991]       __wake_up_common+0xb0/0x1a0
> [    2.051991]       __wake_up_common_lock+0x65/0x90
> [    2.051991]       ata_scsi_port_error_handler+0x67a/0x740
> [    2.051991]       ata_scsi_error+0x94/0xc0
> [    2.051991]       scsi_error_handler+0x8d/0x3a0
> [    2.051991]       kthread+0xe3/0x110
> [    2.051991]       ret_from_fork+0x22/0x30
> [    2.051991] ---------------------------------------------------
> [    2.051991] context B's detail
> [    2.051991] ---------------------------------------------------
> [    2.051991] context B
> [    2.051991]     [S] __raw_spin_lock_irqsave(&host->lock:0)
> [    2.051991]     [W] wait(&(&ap->eh_wait_q)->dmap:0)
> [    2.051991]     [E] spin_unlock(&host->lock:0)
> [    2.051991] 
> [    2.051991] [S] __raw_spin_lock_irqsave(&host->lock:0):
> [    2.051991] [<ffffffff8178047c>] ata_port_wait_eh+0x6c/0xc0
> [    2.051991] stacktrace:
> [    2.051991]       _raw_spin_lock_irqsave+0x82/0xa0
> [    2.051991]       ata_port_wait_eh+0x6c/0xc0
> [    2.051991]       ata_port_probe+0x1f/0x30
> [    2.051991]       async_port_probe+0x27/0x50
> [    2.051991]       async_run_entry_fn+0x21/0xa0
> [    2.051991]       process_one_work+0x317/0x640
> [    2.051991]       worker_thread+0x44/0x410
> [    2.051991]       kthread+0xe3/0x110
> [    2.051991]       ret_from_fork+0x22/0x30
> [    2.051991] 
> [    2.051991] [W] wait(&(&ap->eh_wait_q)->dmap:0):
> [    2.051991] [<ffffffff810bb017>] prepare_to_wait+0x47/0xd0
> [    2.051991] stacktrace:
> [    2.051991]       ata_port_wait_eh+0x52/0xc0
> [    2.051991]       ata_port_probe+0x1f/0x30
> [    2.051991]       async_port_probe+0x27/0x50
> [    2.051991]       async_run_entry_fn+0x21/0xa0
> [    2.051991]       process_one_work+0x317/0x640
> [    2.051991]       worker_thread+0x44/0x410
> [    2.051991]       kthread+0xe3/0x110
> [    2.051991]       ret_from_fork+0x22/0x30
> [    2.051991] 
> [    2.051991] [E] spin_unlock(&host->lock:0):
> [    2.051991] [<ffffffff8178046e>] ata_port_wait_eh+0x5e/0xc0
> [    2.051991] stacktrace:
> [    2.051991]       _raw_spin_unlock_irqrestore+0x35/0x80
> [    2.051991]       ata_port_wait_eh+0x5e/0xc0
> [    2.051991]       ata_port_probe+0x1f/0x30
> [    2.051991]       async_port_probe+0x27/0x50
> [    2.051991]       async_run_entry_fn+0x21/0xa0
> [    2.051991]       process_one_work+0x317/0x640
> [    2.051991]       worker_thread+0x44/0x410
> [    2.051991]       kthread+0xe3/0x110
> [    2.051991]       ret_from_fork+0x22/0x30
> [    2.051991] ---------------------------------------------------
> [    2.051991] information that might be helpful
> [    2.051991] ---------------------------------------------------
> [    2.051991] CPU: 1 PID: 53 Comm: scsi_eh_1 Tainted: G        W         5.17.0-rc1-00014-gcf3441bb2012 #2
> [    2.051991] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> [    2.051991] Call Trace:
> [    2.051991]  <TASK>
> [    2.051991]  dump_stack_lvl+0x44/0x57
> [    2.051991]  print_circle+0x384/0x510
> [    2.051991]  ? print_circle+0x510/0x510
> [    2.051991]  cb_check_dl+0x58/0x60
> [    2.051991]  bfs+0xdc/0x1b0
> [    2.051991]  add_dep+0x94/0x120
> [    2.051991]  do_event.isra.22+0x284/0x300
> [    2.051991]  ? __wake_up_common+0x93/0x1a0
> [    2.051991]  dept_event+0x12b/0x1f0
> [    2.051991]  __wake_up_common+0xb0/0x1a0
> [    2.051991]  __wake_up_common_lock+0x65/0x90
> [    2.051991]  ata_scsi_port_error_handler+0x67a/0x740
> [    2.051991]  ? trace_hardirqs_on+0x38/0xe0
> [    2.051991]  ? scsi_eh_get_sense+0x150/0x150
> [    2.051991]  ata_scsi_error+0x94/0xc0
> [    2.051991]  scsi_error_handler+0x8d/0x3a0
> [    2.051991]  ? _raw_spin_unlock_irqrestore+0x63/0x80
> [    2.051991]  ? scsi_eh_get_sense+0x150/0x150
> [    2.051991]  kthread+0xe3/0x110
> [    2.051991]  ? kthread_complete_and_exit+0x20/0x20
> [    2.051991]  ret_from_fork+0x22/0x30
> [    2.051991]  </TASK>


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [REPORT] libata deadlock possibilities by DEPT
  2022-02-16  4:12 [REPORT] libata deadlock possibilities by DEPT Byungchul Park
  2022-02-16  4:16 ` Report 2 in ata_scsi_port_error_handler() Byungchul Park
@ 2022-02-16  6:52 ` Byungchul Park
  1 sibling, 0 replies; 10+ messages in thread
From: Byungchul Park @ 2022-02-16  6:52 UTC (permalink / raw)
  To: damien.lemoal, linux-ide
  Cc: torvalds, mingo, linux-kernel, peterz, will, tglx, rostedt, joel,
	sashal, daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso,
	willy, david, amir73il, bfields, gregkh, kernel-team, linux-mm,
	akpm, mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis,
	cl, penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

Thanks to Matthew, I found the reports are spurious.. I missed the case
that it's not going to real sleep after preparing a wait.

I'm sorry for bothering you guys. Please ignore the reports.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Report in ata_scsi_port_error_handler()
  2022-02-16  6:37     ` Damien Le Moal
@ 2022-02-16 18:09       ` Linus Torvalds
  2022-02-16 18:33         ` Steven Rostedt
  2022-02-17 10:35         ` Byungchul Park
  0 siblings, 2 replies; 10+ messages in thread
From: Linus Torvalds @ 2022-02-16 18:09 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Byungchul Park, linux-ide, Ingo Molnar,
	Linux Kernel Mailing List, Peter Zijlstra, Will Deacon,
	Thomas Gleixner, Steven Rostedt, Joel Fernandes, Sasha Levin,
	Daniel Vetter, Chris Wilson, duyuyang, johannes.berg, Tejun Heo,
	Theodore Ts'o, Matthew Wilcox, Dave Chinner, Amir Goldstein,
	J. Bruce Fields, Greg Kroah-Hartman, kernel-team, Linux-MM,
	Andrew Morton, Michal Hocko, Minchan Kim, Johannes Weiner,
	Vladimir Davydov, sj, Jerome Glisse, Dennis Zhou,
	Christoph Lameter, Pekka Enberg, David Rientjes, Vlastimil Babka,
	ngupta, linux-block, Jens Axboe, paolo.valente, Josef Bacik,
	linux-fsdevel, Al Viro, Jan Kara, Jeff Layton, Dan Williams,
	Christoph Hellwig, Darrick J. Wong, dri-devel, Dave Airlie,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

On Tue, Feb 15, 2022 at 10:37 PM Damien Le Moal
<damien.lemoal@opensource.wdc.com> wrote:
>
> On 2/16/22 13:16, Byungchul Park wrote:
> > [    2.051040] ===================================================
> > [    2.051406] DEPT: Circular dependency has been detected.
> > [    2.051730] 5.17.0-rc1-00014-gcf3441bb2012 #2 Tainted: G        W
> > [    2.051991] ---------------------------------------------------
> > [    2.051991] summary
> > [    2.051991] ---------------------------------------------------
> > [    2.051991] *** DEADLOCK ***
> > [    2.051991]
> > [    2.051991] context A
> > [    2.051991]     [S] (unknown)(&(&ap->eh_wait_q)->dmap:0)
> > [    2.051991]     [W] __raw_spin_lock_irq(&host->lock:0)
> > [    2.051991]     [E] event(&(&ap->eh_wait_q)->dmap:0)
> > [    2.051991]
> > [    2.051991] context B
> > [    2.051991]     [S] __raw_spin_lock_irqsave(&host->lock:0)
> > [    2.051991]     [W] wait(&(&ap->eh_wait_q)->dmap:0)
> > [    2.051991]     [E] spin_unlock(&host->lock:0)
>
> Sleeping with a spinlock held would be triggering warnings already, so
> these reports seem bogus to me.

Yeah, Matthew pointed out the same thing for another use-case, where
it looks like DEPT is looking at the state at the wrong point (not at
the scheduling point, but at prepare_to_sleep()).

This ata_port_wait() is the exact same pattern, ie we have

        spin_lock_irqsave(ap->lock, flags);

        while (ap->pflags & (ATA_PFLAG_EH_PENDING | ATA_PFLAG_EH_IN_PROGRESS)) {
                prepare_to_wait(&ap->eh_wait_q, &wait, TASK_UNINTERRUPTIBLE);
                spin_unlock_irqrestore(ap->lock, flags);
                schedule();

and DEPT has incorrectly taken it to mean that 'ap->lock' is held
during the wait, when it is actually released before actually waiting.

For the spin-locks, this is all very obvious (because they'd have been
caught long ago by much simpler debug code), but the same
prepare_to_wait -> wait pattern can most definitely happen with
sleeping locks too, so they are all slightly suspect.

And yes, the detailed reports are hard to read because the locations
are given as "ata_port_wait_eh+0x52/0xc0". Running them through
scripts/decode_stacktrace.sh to turn them into filename and line
numbers - and also sort out inlining - would help a lot.

Byungchul, could you fix those two issues? Some of your reports may
well be entirely valid, but the hard-to-read hex offsets and the
knowledge that at least some of them are confused about how
prepare_to_wait -> wait actually works makes the motivation to look at
the details much less..

           Linus

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Report in ata_scsi_port_error_handler()
  2022-02-16 18:09       ` Linus Torvalds
@ 2022-02-16 18:33         ` Steven Rostedt
  2022-02-17 23:55           ` Byungchul Park
  2022-02-17 10:35         ` Byungchul Park
  1 sibling, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2022-02-16 18:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Damien Le Moal, Byungchul Park, linux-ide, Ingo Molnar,
	Linux Kernel Mailing List, Peter Zijlstra, Will Deacon,
	Thomas Gleixner, Joel Fernandes, Sasha Levin, Daniel Vetter,
	Chris Wilson, duyuyang, johannes.berg, Tejun Heo,
	Theodore Ts'o, Matthew Wilcox, Dave Chinner, Amir Goldstein,
	J. Bruce Fields, Greg Kroah-Hartman, kernel-team, Linux-MM,
	Andrew Morton, Michal Hocko, Minchan Kim, Johannes Weiner,
	Vladimir Davydov, sj, Jerome Glisse, Dennis Zhou,
	Christoph Lameter, Pekka Enberg, David Rientjes, Vlastimil Babka,
	ngupta, linux-block, Jens Axboe, paolo.valente, Josef Bacik,
	linux-fsdevel, Al Viro, Jan Kara, Jeff Layton, Dan Williams,
	Christoph Hellwig, Darrick J. Wong, dri-devel, Dave Airlie,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

On Wed, 16 Feb 2022 10:09:14 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> Byungchul, could you fix those two issues? Some of your reports may
> well be entirely valid, but the hard-to-read hex offsets and the
> knowledge that at least some of them are confused about how
> prepare_to_wait -> wait actually works makes the motivation to look at
> the details much less..

Hi Byungchul,

I'm not sure what your tool is using to attach to the kernel to analyze the
events (perhaps tracepoints?). But you can have the prepare_to_wait event
just flag the task is about to wait, and then attach to the schedule
(sched_switch) event to denote that it actually waited.

Of course have the finish_wait() clear the flag.

-- Steve

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Report in ata_scsi_port_error_handler()
  2022-02-16 18:09       ` Linus Torvalds
  2022-02-16 18:33         ` Steven Rostedt
@ 2022-02-17 10:35         ` Byungchul Park
  1 sibling, 0 replies; 10+ messages in thread
From: Byungchul Park @ 2022-02-17 10:35 UTC (permalink / raw)
  To: torvalds
  Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	linux-kernel, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy,
	david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, axboe,
	paolo.valente, josef, linux-fsdevel, viro, jack, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, airlied,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

<torvalds@linux-foundation.org> wrote:
> On Tue, Feb 15, 2022 at 10:37 PM Damien Le Moal
> <damien.lemoal@opensource.wdc.com> wrote:
> >
> > On 2/16/22 13:16, Byungchul Park wrote:
> > > [    2.051040 ] ===================================================
> > > [    2.051406 ] DEPT: Circular dependency has been detected.
> > > [    2.051730 ] 5.17.0-rc1-00014-gcf3441bb2012 #2 Tainted: G        W
> > > [    2.051991 ] ---------------------------------------------------
> > > [    2.051991 ] summary
> > > [    2.051991 ] ---------------------------------------------------
> > > [    2.051991 ] *** DEADLOCK ***
> > > [    2.051991 ]
> > > [    2.051991 ] context A
> > > [    2.051991 ]     [S] (unknown)(&(&ap->eh_wait_q)->dmap:0)
> > > [    2.051991 ]     [W] __raw_spin_lock_irq(&host->lock:0)
> > > [    2.051991 ]     [E] event(&(&ap->eh_wait_q)->dmap:0)
> > > [    2.051991 ]
> > > [    2.051991 ] context B
> > > [    2.051991 ]     [S] __raw_spin_lock_irqsave(&host->lock:0)
> > > [    2.051991 ]     [W] wait(&(&ap->eh_wait_q)->dmap:0)
> > > [    2.051991 ]     [E] spin_unlock(&host->lock:0)
> >
> > Sleeping with a spinlock held would be triggering warnings already, so
> > these reports seem bogus to me.
> 
> Yeah, Matthew pointed out the same thing for another use-case, where
> it looks like DEPT is looking at the state at the wrong point (not at
> the scheduling point, but at prepare_to_sleep()).
> 
> This ata_port_wait() is the exact same pattern, ie we have
> 
>	spin_lock_irqsave(ap->lock, flags);
> 
>	while (ap->pflags & (ATA_PFLAG_EH_PENDING | ATA_PFLAG_EH_IN_PROGRESS)) {
>		prepare_to_wait(&ap->eh_wait_q, &wait, TASK_UNINTERRUPTIBLE);
>		spin_unlock_irqrestore(ap->lock, flags);
>		schedule();
> 
> and DEPT has incorrectly taken it to mean that 'ap->lock' is held
> during the wait, when it is actually released before actually waiting.
> 
> For the spin-locks, this is all very obvious (because they'd have been
> caught long ago by much simpler debug code), but the same
> prepare_to_wait -> wait pattern can most definitely happen with
> sleeping locks too, so they are all slightly suspect.
> 
> And yes, the detailed reports are hard to read because the locations
> are given as "ata_port_wait_eh+0x52/0xc0". Running them through
> scripts/decode_stacktrace.sh to turn them into filename and line
> numbers - and also sort out inlining - would help a lot.
> 
> Byungchul, could you fix those two issues? Some of your reports may

Of couse, that's what I should do. Thanks for your feedback.

> well be entirely valid, but the hard-to-read hex offsets and the
> knowledge that at least some of them are confused about how
> prepare_to_wait -> wait actually works makes the motivation to look at
> the details much less..
> 
> 	Linus

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Report in ata_scsi_port_error_handler()
  2022-02-16 18:33         ` Steven Rostedt
@ 2022-02-17 23:55           ` Byungchul Park
  0 siblings, 0 replies; 10+ messages in thread
From: Byungchul Park @ 2022-02-17 23:55 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Linus Torvalds, Damien Le Moal, linux-ide, Ingo Molnar,
	Linux Kernel Mailing List, Peter Zijlstra, Will Deacon,
	Thomas Gleixner, Joel Fernandes, Sasha Levin, Daniel Vetter,
	Chris Wilson, duyuyang, johannes.berg, Tejun Heo,
	Theodore Ts'o, Matthew Wilcox, Dave Chinner, Amir Goldstein,
	J. Bruce Fields, Greg Kroah-Hartman, kernel-team, Linux-MM,
	Andrew Morton, Michal Hocko, Minchan Kim, Johannes Weiner,
	Vladimir Davydov, sj, Jerome Glisse, Dennis Zhou,
	Christoph Lameter, Pekka Enberg, David Rientjes, Vlastimil Babka,
	ngupta, linux-block, Jens Axboe, paolo.valente, Josef Bacik,
	linux-fsdevel, Al Viro, Jan Kara, Jeff Layton, Dan Williams,
	Christoph Hellwig, Darrick J. Wong, dri-devel, Dave Airlie,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa

On Wed, Feb 16, 2022 at 01:33:18PM -0500, Steven Rostedt wrote:
> On Wed, 16 Feb 2022 10:09:14 -0800
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > Byungchul, could you fix those two issues? Some of your reports may
> > well be entirely valid, but the hard-to-read hex offsets and the
> > knowledge that at least some of them are confused about how
> > prepare_to_wait -> wait actually works makes the motivation to look at
> > the details much less..
> 
> Hi Byungchul,
> 
> I'm not sure what your tool is using to attach to the kernel to analyze the
> events (perhaps tracepoints?). But you can have the prepare_to_wait event
> just flag the task is about to wait, and then attach to the schedule
> (sched_switch) event to denote that it actually waited.
> 
> Of course have the finish_wait() clear the flag.

(Sorry for late reply, which was because of my email client issue.)

Thank you for the hint. However, unfortunately, I didn't use tracepoint
for that. However, the key idea is the thing that I should take!

Thanks a lot!

Thanks,
Byungchul

> 
> -- Steve

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-02-17 23:57 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-16  4:12 [REPORT] libata deadlock possibilities by DEPT Byungchul Park
2022-02-16  4:16 ` Report 2 in ata_scsi_port_error_handler() Byungchul Park
2022-02-16  4:16   ` Report 3 " Byungchul Park
2022-02-16  4:16   ` Report " Byungchul Park
2022-02-16  6:37     ` Damien Le Moal
2022-02-16 18:09       ` Linus Torvalds
2022-02-16 18:33         ` Steven Rostedt
2022-02-17 23:55           ` Byungchul Park
2022-02-17 10:35         ` Byungchul Park
2022-02-16  6:52 ` [REPORT] libata deadlock possibilities by DEPT Byungchul Park

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).