* [REPORT] libata deadlock possibilities by DEPT @ 2022-02-16 4:12 Byungchul Park 2022-02-16 4:16 ` Report 2 in ata_scsi_port_error_handler() Byungchul Park 2022-02-16 6:52 ` [REPORT] libata deadlock possibilities by DEPT Byungchul Park 0 siblings, 2 replies; 10+ messages in thread From: Byungchul Park @ 2022-02-16 4:12 UTC (permalink / raw) To: damien.lemoal, linux-ide Cc: torvalds, mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal, daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy, david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm, mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes, vbabka, ngupta, linux-block, axboe, paolo.valente, josef, linux-fsdevel, viro, jack, jlayton, dan.j.williams, hch, djwong, dri-devel, airlied, rodrigosiqueiramelo, melissa.srw, hamohammed.sa Hi Damien and libata folks, I've been developing a tool for detecting deadlock possibilities by tracking wait/event rather than lock(?) acquisition order to try to cover all synchonization machanisms. It's done on v5.17-rc1 tag. https://github.com/lgebyungchulpark/linux-dept/commits/dept1.11_on_v5.17-rc1 Benifit: 0. Works with all lock primitives. 1. Works with wait_for_completion()/complete(). 2. Works with 'wait' on PG_locked. 3. Works with 'wait' on PG_writeback. 4. Works with swait/wakeup. 5. Works with waitqueue. 6. Multiple reports are allowed. 7. Deduplication control on multiple reports. 8. Withstand false positives thanks to 6. 9. Easy to tag any wait/event. Future work: 0. To make it more stable. 1. To separates Dept from Lockdep. 2. To improves performance in terms of time and space. 3. To use Dept as a dependency engine for Lockdep. 4. To add any missing tags of wait/event in the kernel. 5. To deduplicate stack trace. I've got several reports from the tool. Some of them look like false alarms caused by Lockdep's fake annotations added for better detection. However, some others look like real deadlock possibility. Because of my unfamiliarity of the domain, it's hard to confirm if it's a real one. I'd like to ask for your opinion on it and it'd be appreciated. How to interpret the report is: 1. E(event) in each context cannot be triggered because of the W(wait) that cannot be woken. 2. The stack trace helping find the problematic code is located in each conext's detail. Let me add the reports on this email thread. --- Thanks, Byungchul ^ permalink raw reply [flat|nested] 10+ messages in thread
* Report 2 in ata_scsi_port_error_handler() 2022-02-16 4:12 [REPORT] libata deadlock possibilities by DEPT Byungchul Park @ 2022-02-16 4:16 ` Byungchul Park 2022-02-16 4:16 ` Report 3 " Byungchul Park 2022-02-16 4:16 ` Report " Byungchul Park 2022-02-16 6:52 ` [REPORT] libata deadlock possibilities by DEPT Byungchul Park 1 sibling, 2 replies; 10+ messages in thread From: Byungchul Park @ 2022-02-16 4:16 UTC (permalink / raw) To: damien.lemoal, linux-ide Cc: torvalds, mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal, daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy, david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm, mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes, vbabka, ngupta, linux-block, axboe, paolo.valente, josef, linux-fsdevel, viro, jack, jlayton, dan.j.williams, hch, djwong, dri-devel, airlied, rodrigosiqueiramelo, melissa.srw, hamohammed.sa [ 2.051991] =================================================== [ 2.051991] DEPT: Circular dependency has been detected. [ 2.051991] 5.17.0-rc1-00014-gcf3441bb2012 #2 Tainted: G W [ 2.051991] --------------------------------------------------- [ 2.051991] summary [ 2.051991] --------------------------------------------------- [ 2.051991] *** DEADLOCK *** [ 2.051991] [ 2.051991] context A [ 2.051991] [S] (unknown)(&(&ap->eh_wait_q)->dmap:0) [ 2.051991] [W] wait_for_completion_timeout(&wait:0) [ 2.051991] [E] event(&(&ap->eh_wait_q)->dmap:0) [ 2.051991] [ 2.051991] context B [ 2.051991] [S] __raw_spin_lock_irqsave(&host->lock:0) [ 2.051991] [W] wait(&(&ap->eh_wait_q)->dmap:0) [ 2.051991] [E] spin_unlock(&host->lock:0) [ 2.051991] [ 2.051991] context C [ 2.051991] [S] (unknown)(&wait:0) [ 2.051991] [W] __raw_spin_lock_irq(&host->lock:0) [ 2.051991] [E] complete(&wait:0) [ 2.051991] [ 2.051991] [S]: start of the event context [ 2.051991] [W]: the wait blocked [ 2.051991] [E]: the event not reachable [ 2.051991] --------------------------------------------------- [ 2.051991] context A's detail [ 2.051991] --------------------------------------------------- [ 2.051991] context A [ 2.051991] [S] (unknown)(&(&ap->eh_wait_q)->dmap:0) [ 2.051991] [W] wait_for_completion_timeout(&wait:0) [ 2.051991] [E] event(&(&ap->eh_wait_q)->dmap:0) [ 2.051991] [ 2.051991] [S] (unknown)(&(&ap->eh_wait_q)->dmap:0): [ 2.051991] (N/A) [ 2.051991] [ 2.051991] [W] wait_for_completion_timeout(&wait:0): [ 2.051991] [<ffffffff817763b1>] ata_exec_internal_sg+0x401/0x690 [ 2.051991] stacktrace: [ 2.051991] wait_for_completion_timeout+0x40/0xf0 [ 2.051991] ata_exec_internal_sg+0x401/0x690 [ 2.051991] ata_exec_internal+0x4a/0x70 [ 2.051991] ata_dev_read_id+0x2ce/0x540 [ 2.051991] ata_dev_reread_id+0x33/0xc0 [ 2.051991] ata_dev_revalidate+0x41/0x190 [ 2.051991] ata_do_set_mode+0x61c/0xac0 [ 2.051991] ata_set_mode+0xfc/0x110 [ 2.051991] ata_eh_recover+0x1061/0x1360 [ 2.051991] ata_do_eh+0x3f/0xa0 [ 2.051991] ata_scsi_port_error_handler+0x432/0x740 [ 2.051991] ata_scsi_error+0x94/0xc0 [ 2.051991] scsi_error_handler+0x8d/0x3a0 [ 2.051991] kthread+0xe3/0x110 [ 2.051991] ret_from_fork+0x22/0x30 [ 2.051991] [ 2.051991] [E] event(&(&ap->eh_wait_q)->dmap:0): [ 2.051991] [<ffffffff810baa53>] __wake_up_common+0x93/0x1a0 [ 2.051991] stacktrace: [ 2.051991] dept_event+0x12b/0x1f0 [ 2.051991] __wake_up_common+0xb0/0x1a0 [ 2.051991] __wake_up_common_lock+0x65/0x90 [ 2.051991] ata_scsi_port_error_handler+0x67a/0x740 [ 2.051991] ata_scsi_error+0x94/0xc0 [ 2.051991] scsi_error_handler+0x8d/0x3a0 [ 2.051991] kthread+0xe3/0x110 [ 2.051991] ret_from_fork+0x22/0x30 [ 2.051991] --------------------------------------------------- [ 2.051991] context B's detail [ 2.051991] --------------------------------------------------- [ 2.051991] context B [ 2.051991] [S] __raw_spin_lock_irqsave(&host->lock:0) [ 2.051991] [W] wait(&(&ap->eh_wait_q)->dmap:0) [ 2.051991] [E] spin_unlock(&host->lock:0) [ 2.051991] [ 2.051991] [S] __raw_spin_lock_irqsave(&host->lock:0): [ 2.051991] [<ffffffff8178047c>] ata_port_wait_eh+0x6c/0xc0 [ 2.051991] stacktrace: [ 2.051991] _raw_spin_lock_irqsave+0x82/0xa0 [ 2.051991] ata_port_wait_eh+0x6c/0xc0 [ 2.051991] ata_port_probe+0x1f/0x30 [ 2.051991] async_port_probe+0x27/0x50 [ 2.051991] async_run_entry_fn+0x21/0xa0 [ 2.051991] process_one_work+0x317/0x640 [ 2.051991] worker_thread+0x44/0x410 [ 2.051991] kthread+0xe3/0x110 [ 2.051991] ret_from_fork+0x22/0x30 [ 2.051991] [ 2.051991] [W] wait(&(&ap->eh_wait_q)->dmap:0): [ 2.051991] [<ffffffff810bb017>] prepare_to_wait+0x47/0xd0 [ 2.051991] stacktrace: [ 2.051991] ata_port_wait_eh+0x52/0xc0 [ 2.051991] ata_port_probe+0x1f/0x30 [ 2.051991] async_port_probe+0x27/0x50 [ 2.051991] async_run_entry_fn+0x21/0xa0 [ 2.051991] process_one_work+0x317/0x640 [ 2.051991] worker_thread+0x44/0x410 [ 2.051991] kthread+0xe3/0x110 [ 2.051991] ret_from_fork+0x22/0x30 [ 2.051991] [ 2.051991] [E] spin_unlock(&host->lock:0): [ 2.051991] [<ffffffff8178046e>] ata_port_wait_eh+0x5e/0xc0 [ 2.051991] stacktrace: [ 2.051991] _raw_spin_unlock_irqrestore+0x35/0x80 [ 2.051991] ata_port_wait_eh+0x5e/0xc0 [ 2.051991] ata_port_probe+0x1f/0x30 [ 2.051991] async_port_probe+0x27/0x50 [ 2.051991] async_run_entry_fn+0x21/0xa0 [ 2.051991] process_one_work+0x317/0x640 [ 2.051991] worker_thread+0x44/0x410 [ 2.051991] kthread+0xe3/0x110 [ 2.051991] ret_from_fork+0x22/0x30 [ 2.051991] --------------------------------------------------- [ 2.051991] context C's detail [ 2.051991] --------------------------------------------------- [ 2.051991] context C [ 2.051991] [S] (unknown)(&wait:0) [ 2.051991] [W] __raw_spin_lock_irq(&host->lock:0) [ 2.051991] [E] complete(&wait:0) [ 2.051991] [ 2.051991] [S] (unknown)(&wait:0): [ 2.051991] (N/A) [ 2.051991] [ 2.051991] [W] __raw_spin_lock_irq(&host->lock:0): [ 2.051991] [<ffffffff8178c51b>] ata_sff_pio_task+0x1b/0x1b0 [ 2.051991] stacktrace: [ 2.051991] _raw_spin_lock_irq+0x58/0x90 [ 2.051991] ata_sff_pio_task+0x1b/0x1b0 [ 2.051991] process_one_work+0x317/0x640 [ 2.051991] worker_thread+0x44/0x410 [ 2.051991] kthread+0xe3/0x110 [ 2.051991] ret_from_fork+0x22/0x30 [ 2.051991] [ 2.051991] [E] complete(&wait:0): [ 2.051991] [<ffffffff81780569>] ata_do_link_abort+0x99/0xd0 [ 2.051991] stacktrace: [ 2.051991] complete+0x28/0x60 [ 2.051991] ata_do_link_abort+0x99/0xd0 [ 2.051991] ata_sff_hsm_move+0xfc/0xad0 [ 2.051991] ata_sff_pio_task+0x177/0x1b0 [ 2.051991] process_one_work+0x317/0x640 [ 2.051991] worker_thread+0x44/0x410 [ 2.051991] kthread+0xe3/0x110 [ 2.051991] ret_from_fork+0x22/0x30 [ 2.051991] --------------------------------------------------- [ 2.051991] information that might be helpful [ 2.051991] --------------------------------------------------- [ 2.051991] CPU: 1 PID: 53 Comm: scsi_eh_1 Tainted: G W 5.17.0-rc1-00014-gcf3441bb2012 #2 [ 2.051991] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 [ 2.051991] Call Trace: [ 2.051991] <TASK> [ 2.051991] dump_stack_lvl+0x44/0x57 [ 2.051991] print_circle+0x384/0x510 [ 2.051991] ? print_circle+0x510/0x510 [ 2.051991] cb_check_dl+0x58/0x60 [ 2.051991] bfs+0xdc/0x1b0 [ 2.051991] add_dep+0x94/0x120 [ 2.051991] do_event.isra.22+0x284/0x300 [ 2.051991] ? __wake_up_common+0x93/0x1a0 [ 2.051991] dept_event+0x12b/0x1f0 [ 2.051991] __wake_up_common+0xb0/0x1a0 [ 2.051991] __wake_up_common_lock+0x65/0x90 [ 2.051991] ata_scsi_port_error_handler+0x67a/0x740 [ 2.051991] ? trace_hardirqs_on+0x38/0xe0 [ 2.051991] ? scsi_eh_get_sense+0x150/0x150 [ 2.051991] ata_scsi_error+0x94/0xc0 [ 2.051991] scsi_error_handler+0x8d/0x3a0 [ 2.051991] ? _raw_spin_unlock_irqrestore+0x63/0x80 [ 2.051991] ? scsi_eh_get_sense+0x150/0x150 [ 2.051991] kthread+0xe3/0x110 [ 2.051991] ? kthread_complete_and_exit+0x20/0x20 [ 2.051991] ret_from_fork+0x22/0x30 [ 2.051991] </TASK> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Report 3 in ata_scsi_port_error_handler() 2022-02-16 4:16 ` Report 2 in ata_scsi_port_error_handler() Byungchul Park @ 2022-02-16 4:16 ` Byungchul Park 2022-02-16 4:16 ` Report " Byungchul Park 1 sibling, 0 replies; 10+ messages in thread From: Byungchul Park @ 2022-02-16 4:16 UTC (permalink / raw) To: damien.lemoal, linux-ide Cc: torvalds, mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal, daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy, david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm, mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes, vbabka, ngupta, linux-block, axboe, paolo.valente, josef, linux-fsdevel, viro, jack, jlayton, dan.j.williams, hch, djwong, dri-devel, airlied, rodrigosiqueiramelo, melissa.srw, hamohammed.sa [ 2.051991] =================================================== [ 2.051991] DEPT: Circular dependency has been detected. [ 2.051991] 5.17.0-rc1-00014-gcf3441bb2012 #2 Tainted: G W [ 2.051991] --------------------------------------------------- [ 2.051991] summary [ 2.051991] --------------------------------------------------- [ 2.051991] *** DEADLOCK *** [ 2.051991] [ 2.051991] context A [ 2.051991] [S] (unknown)(&(&ap->eh_wait_q)->dmap:0) [ 2.051991] [W] __mutex_lock_common(&host->eh_mutex:0) [ 2.051991] [E] event(&(&ap->eh_wait_q)->dmap:0) [ 2.051991] [ 2.051991] context B [ 2.051991] [S] __raw_spin_lock_irqsave(&host->lock:0) [ 2.051991] [W] wait(&(&ap->eh_wait_q)->dmap:0) [ 2.051991] [E] spin_unlock(&host->lock:0) [ 2.051991] [ 2.051991] context C [ 2.051991] [S] __mutex_lock_common(&host->eh_mutex:0) [ 2.051991] [W] __raw_spin_lock_irqsave(&host->lock:0) [ 2.051991] [E] mutex_unlock(&host->eh_mutex:0) [ 2.051991] [ 2.051991] [S]: start of the event context [ 2.051991] [W]: the wait blocked [ 2.051991] [E]: the event not reachable [ 2.051991] --------------------------------------------------- [ 2.051991] context A's detail [ 2.051991] --------------------------------------------------- [ 2.051991] context A [ 2.051991] [S] (unknown)(&(&ap->eh_wait_q)->dmap:0) [ 2.051991] [W] __mutex_lock_common(&host->eh_mutex:0) [ 2.051991] [E] event(&(&ap->eh_wait_q)->dmap:0) [ 2.051991] [ 2.051991] [S] (unknown)(&(&ap->eh_wait_q)->dmap:0): [ 2.051991] (N/A) [ 2.051991] [ 2.051991] [W] __mutex_lock_common(&host->eh_mutex:0): [ 2.051991] [<ffffffff81780ad9>] ata_eh_acquire+0x19/0x40 [ 2.051991] stacktrace: [ 2.051991] __mutex_lock+0x52c/0x900 [ 2.051991] ata_eh_acquire+0x19/0x40 [ 2.051991] ata_exec_internal_sg+0x41a/0x690 [ 2.051991] ata_exec_internal+0x4a/0x70 [ 2.051991] ata_dev_read_id+0x2ce/0x540 [ 2.051991] ata_dev_reread_id+0x33/0xc0 [ 2.051991] ata_dev_revalidate+0x41/0x190 [ 2.051991] ata_do_set_mode+0x61c/0xac0 [ 2.051991] ata_set_mode+0xfc/0x110 [ 2.051991] ata_eh_recover+0x1061/0x1360 [ 2.051991] ata_do_eh+0x3f/0xa0 [ 2.051991] ata_scsi_port_error_handler+0x432/0x740 [ 2.051991] ata_scsi_error+0x94/0xc0 [ 2.051991] scsi_error_handler+0x8d/0x3a0 [ 2.051991] kthread+0xe3/0x110 [ 2.051991] ret_from_fork+0x22/0x30 [ 2.051991] [ 2.051991] [E] event(&(&ap->eh_wait_q)->dmap:0): [ 2.051991] [<ffffffff810baa53>] __wake_up_common+0x93/0x1a0 [ 2.051991] stacktrace: [ 2.051991] dept_event+0x12b/0x1f0 [ 2.051991] __wake_up_common+0xb0/0x1a0 [ 2.051991] __wake_up_common_lock+0x65/0x90 [ 2.051991] ata_scsi_port_error_handler+0x67a/0x740 [ 2.051991] ata_scsi_error+0x94/0xc0 [ 2.051991] scsi_error_handler+0x8d/0x3a0 [ 2.051991] kthread+0xe3/0x110 [ 2.051991] ret_from_fork+0x22/0x30 [ 2.051991] --------------------------------------------------- [ 2.051991] context B's detail [ 2.051991] --------------------------------------------------- [ 2.051991] context B [ 2.051991] [S] __raw_spin_lock_irqsave(&host->lock:0) [ 2.051991] [W] wait(&(&ap->eh_wait_q)->dmap:0) [ 2.051991] [E] spin_unlock(&host->lock:0) [ 2.051991] [ 2.051991] [S] __raw_spin_lock_irqsave(&host->lock:0): [ 2.051991] [<ffffffff8178047c>] ata_port_wait_eh+0x6c/0xc0 [ 2.051991] stacktrace: [ 2.051991] _raw_spin_lock_irqsave+0x82/0xa0 [ 2.051991] ata_port_wait_eh+0x6c/0xc0 [ 2.051991] ata_port_probe+0x1f/0x30 [ 2.051991] async_port_probe+0x27/0x50 [ 2.051991] async_run_entry_fn+0x21/0xa0 [ 2.051991] process_one_work+0x317/0x640 [ 2.051991] worker_thread+0x44/0x410 [ 2.051991] kthread+0xe3/0x110 [ 2.051991] ret_from_fork+0x22/0x30 [ 2.051991] [ 2.051991] [W] wait(&(&ap->eh_wait_q)->dmap:0): [ 2.051991] [<ffffffff810bb017>] prepare_to_wait+0x47/0xd0 [ 2.051991] stacktrace: [ 2.051991] ata_port_wait_eh+0x52/0xc0 [ 2.051991] ata_port_probe+0x1f/0x30 [ 2.051991] async_port_probe+0x27/0x50 [ 2.051991] async_run_entry_fn+0x21/0xa0 [ 2.051991] process_one_work+0x317/0x640 [ 2.051991] worker_thread+0x44/0x410 [ 2.051991] kthread+0xe3/0x110 [ 2.051991] ret_from_fork+0x22/0x30 [ 2.051991] [ 2.051991] [E] spin_unlock(&host->lock:0): [ 2.051991] [<ffffffff8178046e>] ata_port_wait_eh+0x5e/0xc0 [ 2.051991] stacktrace: [ 2.051991] _raw_spin_unlock_irqrestore+0x35/0x80 [ 2.051991] ata_port_wait_eh+0x5e/0xc0 [ 2.051991] ata_port_probe+0x1f/0x30 [ 2.051991] async_port_probe+0x27/0x50 [ 2.051991] async_run_entry_fn+0x21/0xa0 [ 2.051991] process_one_work+0x317/0x640 [ 2.051991] worker_thread+0x44/0x410 [ 2.051991] kthread+0xe3/0x110 [ 2.051991] ret_from_fork+0x22/0x30 [ 2.051991] --------------------------------------------------- [ 2.051991] context C's detail [ 2.051991] --------------------------------------------------- [ 2.051991] context C [ 2.051991] [S] __mutex_lock_common(&host->eh_mutex:0) [ 2.051991] [W] __raw_spin_lock_irqsave(&host->lock:0) [ 2.051991] [E] mutex_unlock(&host->eh_mutex:0) [ 2.051991] [ 2.051991] [S] __mutex_lock_common(&host->eh_mutex:0): [ 2.051991] [<ffffffff81780ad9>] ata_eh_acquire+0x19/0x40 [ 2.051991] stacktrace: [ 2.051991] __mutex_lock+0x54d/0x900 [ 2.051991] ata_eh_acquire+0x19/0x40 [ 2.051991] ata_scsi_port_error_handler+0x32/0x740 [ 2.051991] ata_scsi_error+0x94/0xc0 [ 2.051991] scsi_error_handler+0x8d/0x3a0 [ 2.051991] kthread+0xe3/0x110 [ 2.051991] ret_from_fork+0x22/0x30 [ 2.051991] [ 2.051991] [W] __raw_spin_lock_irqsave(&host->lock:0): [ 2.051991] [<ffffffff817853df>] ata_scsi_port_error_handler+0x4f/0x740 [ 2.051991] stacktrace: [ 2.051991] _raw_spin_lock_irqsave+0x5f/0xa0 [ 2.051991] ata_scsi_port_error_handler+0x4f/0x740 [ 2.051991] ata_scsi_error+0x94/0xc0 [ 2.051991] scsi_error_handler+0x8d/0x3a0 [ 2.051991] kthread+0xe3/0x110 [ 2.051991] ret_from_fork+0x22/0x30 [ 2.051991] [ 2.051991] [E] mutex_unlock(&host->eh_mutex:0): [ 2.051991] [<ffffffff81773cd7>] ata_msleep+0x77/0x80 [ 2.051991] stacktrace: [ 2.051991] __mutex_unlock_slowpath+0x49/0x2a0 [ 2.051991] ata_msleep+0x77/0x80 [ 2.051991] ata_sff_wait_after_reset+0x27/0x150 [ 2.051991] ata_sff_softreset+0xc8/0x1a0 [ 2.051991] ata_eh_reset+0x3dc/0x1390 [ 2.051991] ata_eh_recover+0x257/0x1360 [ 2.051991] ata_do_eh+0x3f/0xa0 [ 2.051991] ata_scsi_port_error_handler+0x432/0x740 [ 2.051991] ata_scsi_error+0x94/0xc0 [ 2.051991] scsi_error_handler+0x8d/0x3a0 [ 2.051991] kthread+0xe3/0x110 [ 2.051991] ret_from_fork+0x22/0x30 [ 2.051991] --------------------------------------------------- [ 2.051991] information that might be helpful [ 2.051991] --------------------------------------------------- [ 2.051991] CPU: 1 PID: 53 Comm: scsi_eh_1 Tainted: G W 5.17.0-rc1-00014-gcf3441bb2012 #2 [ 2.051991] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 [ 2.051991] Call Trace: [ 2.051991] <TASK> [ 2.051991] dump_stack_lvl+0x44/0x57 [ 2.051991] print_circle+0x384/0x510 [ 2.051991] ? print_circle+0x510/0x510 [ 2.051991] cb_check_dl+0x58/0x60 [ 2.051991] bfs+0xdc/0x1b0 [ 2.051991] add_dep+0x94/0x120 [ 2.051991] do_event.isra.22+0x284/0x300 [ 2.051991] ? __wake_up_common+0x93/0x1a0 [ 2.051991] dept_event+0x12b/0x1f0 [ 2.051991] __wake_up_common+0xb0/0x1a0 [ 2.051991] __wake_up_common_lock+0x65/0x90 [ 2.051991] ata_scsi_port_error_handler+0x67a/0x740 [ 2.051991] ? trace_hardirqs_on+0x38/0xe0 [ 2.051991] ? scsi_eh_get_sense+0x150/0x150 [ 2.051991] ata_scsi_error+0x94/0xc0 [ 2.051991] scsi_error_handler+0x8d/0x3a0 [ 2.051991] ? _raw_spin_unlock_irqrestore+0x63/0x80 [ 2.051991] ? scsi_eh_get_sense+0x150/0x150 [ 2.051991] kthread+0xe3/0x110 [ 2.051991] ? kthread_complete_and_exit+0x20/0x20 [ 2.051991] ret_from_fork+0x22/0x30 [ 2.051991] </TASK> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Report in ata_scsi_port_error_handler() 2022-02-16 4:16 ` Report 2 in ata_scsi_port_error_handler() Byungchul Park 2022-02-16 4:16 ` Report 3 " Byungchul Park @ 2022-02-16 4:16 ` Byungchul Park 2022-02-16 6:37 ` Damien Le Moal 1 sibling, 1 reply; 10+ messages in thread From: Byungchul Park @ 2022-02-16 4:16 UTC (permalink / raw) To: damien.lemoal, linux-ide Cc: torvalds, mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal, daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy, david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm, mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes, vbabka, ngupta, linux-block, axboe, paolo.valente, josef, linux-fsdevel, viro, jack, jlayton, dan.j.williams, hch, djwong, dri-devel, airlied, rodrigosiqueiramelo, melissa.srw, hamohammed.sa [ 2.051040] =================================================== [ 2.051406] DEPT: Circular dependency has been detected. [ 2.051730] 5.17.0-rc1-00014-gcf3441bb2012 #2 Tainted: G W [ 2.051991] --------------------------------------------------- [ 2.051991] summary [ 2.051991] --------------------------------------------------- [ 2.051991] *** DEADLOCK *** [ 2.051991] [ 2.051991] context A [ 2.051991] [S] (unknown)(&(&ap->eh_wait_q)->dmap:0) [ 2.051991] [W] __raw_spin_lock_irq(&host->lock:0) [ 2.051991] [E] event(&(&ap->eh_wait_q)->dmap:0) [ 2.051991] [ 2.051991] context B [ 2.051991] [S] __raw_spin_lock_irqsave(&host->lock:0) [ 2.051991] [W] wait(&(&ap->eh_wait_q)->dmap:0) [ 2.051991] [E] spin_unlock(&host->lock:0) [ 2.051991] [ 2.051991] [S]: start of the event context [ 2.051991] [W]: the wait blocked [ 2.051991] [E]: the event not reachable [ 2.051991] --------------------------------------------------- [ 2.051991] context A's detail [ 2.051991] --------------------------------------------------- [ 2.051991] context A [ 2.051991] [S] (unknown)(&(&ap->eh_wait_q)->dmap:0) [ 2.051991] [W] __raw_spin_lock_irq(&host->lock:0) [ 2.051991] [E] event(&(&ap->eh_wait_q)->dmap:0) [ 2.051991] [ 2.051991] [S] (unknown)(&(&ap->eh_wait_q)->dmap:0): [ 2.051991] (N/A) [ 2.051991] [ 2.051991] [W] __raw_spin_lock_irq(&host->lock:0): [ 2.051991] [<ffffffff8178d355>] ata_sff_flush_pio_task+0x45/0xc0 [ 2.051991] stacktrace: [ 2.051991] _raw_spin_lock_irq+0x58/0x90 [ 2.051991] ata_sff_flush_pio_task+0x45/0xc0 [ 2.051991] ata_exec_internal_sg+0x422/0x690 [ 2.051991] ata_do_set_mode+0x5ee/0xac0 [ 2.051991] ata_set_mode+0xfc/0x110 [ 2.051991] ata_eh_recover+0x1061/0x1360 [ 2.051991] ata_do_eh+0x3f/0xa0 [ 2.051991] ata_scsi_port_error_handler+0x432/0x740 [ 2.051991] ata_scsi_error+0x94/0xc0 [ 2.051991] scsi_error_handler+0x8d/0x3a0 [ 2.051991] kthread+0xe3/0x110 [ 2.051991] ret_from_fork+0x22/0x30 [ 2.051991] [ 2.051991] [E] event(&(&ap->eh_wait_q)->dmap:0): [ 2.051991] [<ffffffff810baa53>] __wake_up_common+0x93/0x1a0 [ 2.051991] stacktrace: [ 2.051991] dept_event+0x12b/0x1f0 [ 2.051991] __wake_up_common+0xb0/0x1a0 [ 2.051991] __wake_up_common_lock+0x65/0x90 [ 2.051991] ata_scsi_port_error_handler+0x67a/0x740 [ 2.051991] ata_scsi_error+0x94/0xc0 [ 2.051991] scsi_error_handler+0x8d/0x3a0 [ 2.051991] kthread+0xe3/0x110 [ 2.051991] ret_from_fork+0x22/0x30 [ 2.051991] --------------------------------------------------- [ 2.051991] context B's detail [ 2.051991] --------------------------------------------------- [ 2.051991] context B [ 2.051991] [S] __raw_spin_lock_irqsave(&host->lock:0) [ 2.051991] [W] wait(&(&ap->eh_wait_q)->dmap:0) [ 2.051991] [E] spin_unlock(&host->lock:0) [ 2.051991] [ 2.051991] [S] __raw_spin_lock_irqsave(&host->lock:0): [ 2.051991] [<ffffffff8178047c>] ata_port_wait_eh+0x6c/0xc0 [ 2.051991] stacktrace: [ 2.051991] _raw_spin_lock_irqsave+0x82/0xa0 [ 2.051991] ata_port_wait_eh+0x6c/0xc0 [ 2.051991] ata_port_probe+0x1f/0x30 [ 2.051991] async_port_probe+0x27/0x50 [ 2.051991] async_run_entry_fn+0x21/0xa0 [ 2.051991] process_one_work+0x317/0x640 [ 2.051991] worker_thread+0x44/0x410 [ 2.051991] kthread+0xe3/0x110 [ 2.051991] ret_from_fork+0x22/0x30 [ 2.051991] [ 2.051991] [W] wait(&(&ap->eh_wait_q)->dmap:0): [ 2.051991] [<ffffffff810bb017>] prepare_to_wait+0x47/0xd0 [ 2.051991] stacktrace: [ 2.051991] ata_port_wait_eh+0x52/0xc0 [ 2.051991] ata_port_probe+0x1f/0x30 [ 2.051991] async_port_probe+0x27/0x50 [ 2.051991] async_run_entry_fn+0x21/0xa0 [ 2.051991] process_one_work+0x317/0x640 [ 2.051991] worker_thread+0x44/0x410 [ 2.051991] kthread+0xe3/0x110 [ 2.051991] ret_from_fork+0x22/0x30 [ 2.051991] [ 2.051991] [E] spin_unlock(&host->lock:0): [ 2.051991] [<ffffffff8178046e>] ata_port_wait_eh+0x5e/0xc0 [ 2.051991] stacktrace: [ 2.051991] _raw_spin_unlock_irqrestore+0x35/0x80 [ 2.051991] ata_port_wait_eh+0x5e/0xc0 [ 2.051991] ata_port_probe+0x1f/0x30 [ 2.051991] async_port_probe+0x27/0x50 [ 2.051991] async_run_entry_fn+0x21/0xa0 [ 2.051991] process_one_work+0x317/0x640 [ 2.051991] worker_thread+0x44/0x410 [ 2.051991] kthread+0xe3/0x110 [ 2.051991] ret_from_fork+0x22/0x30 [ 2.051991] --------------------------------------------------- [ 2.051991] information that might be helpful [ 2.051991] --------------------------------------------------- [ 2.051991] CPU: 1 PID: 53 Comm: scsi_eh_1 Tainted: G W 5.17.0-rc1-00014-gcf3441bb2012 #2 [ 2.051991] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 [ 2.051991] Call Trace: [ 2.051991] <TASK> [ 2.051991] dump_stack_lvl+0x44/0x57 [ 2.051991] print_circle+0x384/0x510 [ 2.051991] ? print_circle+0x510/0x510 [ 2.051991] cb_check_dl+0x58/0x60 [ 2.051991] bfs+0xdc/0x1b0 [ 2.051991] add_dep+0x94/0x120 [ 2.051991] do_event.isra.22+0x284/0x300 [ 2.051991] ? __wake_up_common+0x93/0x1a0 [ 2.051991] dept_event+0x12b/0x1f0 [ 2.051991] __wake_up_common+0xb0/0x1a0 [ 2.051991] __wake_up_common_lock+0x65/0x90 [ 2.051991] ata_scsi_port_error_handler+0x67a/0x740 [ 2.051991] ? trace_hardirqs_on+0x38/0xe0 [ 2.051991] ? scsi_eh_get_sense+0x150/0x150 [ 2.051991] ata_scsi_error+0x94/0xc0 [ 2.051991] scsi_error_handler+0x8d/0x3a0 [ 2.051991] ? _raw_spin_unlock_irqrestore+0x63/0x80 [ 2.051991] ? scsi_eh_get_sense+0x150/0x150 [ 2.051991] kthread+0xe3/0x110 [ 2.051991] ? kthread_complete_and_exit+0x20/0x20 [ 2.051991] ret_from_fork+0x22/0x30 [ 2.051991] </TASK> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Report in ata_scsi_port_error_handler() 2022-02-16 4:16 ` Report " Byungchul Park @ 2022-02-16 6:37 ` Damien Le Moal 2022-02-16 18:09 ` Linus Torvalds 0 siblings, 1 reply; 10+ messages in thread From: Damien Le Moal @ 2022-02-16 6:37 UTC (permalink / raw) To: Byungchul Park, linux-ide Cc: torvalds, mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal, daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy, david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm, mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes, vbabka, ngupta, linux-block, axboe, paolo.valente, josef, linux-fsdevel, viro, jack, jlayton, dan.j.williams, hch, djwong, dri-devel, airlied, rodrigosiqueiramelo, melissa.srw, hamohammed.sa On 2/16/22 13:16, Byungchul Park wrote: > [ 2.051040] =================================================== > [ 2.051406] DEPT: Circular dependency has been detected. > [ 2.051730] 5.17.0-rc1-00014-gcf3441bb2012 #2 Tainted: G W > [ 2.051991] --------------------------------------------------- > [ 2.051991] summary > [ 2.051991] --------------------------------------------------- > [ 2.051991] *** DEADLOCK *** > [ 2.051991] > [ 2.051991] context A > [ 2.051991] [S] (unknown)(&(&ap->eh_wait_q)->dmap:0) > [ 2.051991] [W] __raw_spin_lock_irq(&host->lock:0) > [ 2.051991] [E] event(&(&ap->eh_wait_q)->dmap:0) > [ 2.051991] > [ 2.051991] context B > [ 2.051991] [S] __raw_spin_lock_irqsave(&host->lock:0) > [ 2.051991] [W] wait(&(&ap->eh_wait_q)->dmap:0) > [ 2.051991] [E] spin_unlock(&host->lock:0) Sleeping with a spinlock held would be triggering warnings already, so these reports seem bogus to me. In any case, trying to make sense of these reports is not easy, to say the least. If you think there is really a bug, then please send a clear explanation of your findings rather than a raw tool output that did not go through human analysis. > [ 2.051991] > [ 2.051991] [S]: start of the event context > [ 2.051991] [W]: the wait blocked > [ 2.051991] [E]: the event not reachable > [ 2.051991] --------------------------------------------------- > [ 2.051991] context A's detail > [ 2.051991] --------------------------------------------------- > [ 2.051991] context A > [ 2.051991] [S] (unknown)(&(&ap->eh_wait_q)->dmap:0) > [ 2.051991] [W] __raw_spin_lock_irq(&host->lock:0) > [ 2.051991] [E] event(&(&ap->eh_wait_q)->dmap:0) > [ 2.051991] > [ 2.051991] [S] (unknown)(&(&ap->eh_wait_q)->dmap:0): > [ 2.051991] (N/A) > [ 2.051991] > [ 2.051991] [W] __raw_spin_lock_irq(&host->lock:0): > [ 2.051991] [<ffffffff8178d355>] ata_sff_flush_pio_task+0x45/0xc0 > [ 2.051991] stacktrace: > [ 2.051991] _raw_spin_lock_irq+0x58/0x90 > [ 2.051991] ata_sff_flush_pio_task+0x45/0xc0 > [ 2.051991] ata_exec_internal_sg+0x422/0x690 > [ 2.051991] ata_do_set_mode+0x5ee/0xac0 > [ 2.051991] ata_set_mode+0xfc/0x110 > [ 2.051991] ata_eh_recover+0x1061/0x1360 > [ 2.051991] ata_do_eh+0x3f/0xa0 > [ 2.051991] ata_scsi_port_error_handler+0x432/0x740 > [ 2.051991] ata_scsi_error+0x94/0xc0 > [ 2.051991] scsi_error_handler+0x8d/0x3a0 > [ 2.051991] kthread+0xe3/0x110 > [ 2.051991] ret_from_fork+0x22/0x30 > [ 2.051991] > [ 2.051991] [E] event(&(&ap->eh_wait_q)->dmap:0): > [ 2.051991] [<ffffffff810baa53>] __wake_up_common+0x93/0x1a0 > [ 2.051991] stacktrace: > [ 2.051991] dept_event+0x12b/0x1f0 > [ 2.051991] __wake_up_common+0xb0/0x1a0 > [ 2.051991] __wake_up_common_lock+0x65/0x90 > [ 2.051991] ata_scsi_port_error_handler+0x67a/0x740 > [ 2.051991] ata_scsi_error+0x94/0xc0 > [ 2.051991] scsi_error_handler+0x8d/0x3a0 > [ 2.051991] kthread+0xe3/0x110 > [ 2.051991] ret_from_fork+0x22/0x30 > [ 2.051991] --------------------------------------------------- > [ 2.051991] context B's detail > [ 2.051991] --------------------------------------------------- > [ 2.051991] context B > [ 2.051991] [S] __raw_spin_lock_irqsave(&host->lock:0) > [ 2.051991] [W] wait(&(&ap->eh_wait_q)->dmap:0) > [ 2.051991] [E] spin_unlock(&host->lock:0) > [ 2.051991] > [ 2.051991] [S] __raw_spin_lock_irqsave(&host->lock:0): > [ 2.051991] [<ffffffff8178047c>] ata_port_wait_eh+0x6c/0xc0 > [ 2.051991] stacktrace: > [ 2.051991] _raw_spin_lock_irqsave+0x82/0xa0 > [ 2.051991] ata_port_wait_eh+0x6c/0xc0 > [ 2.051991] ata_port_probe+0x1f/0x30 > [ 2.051991] async_port_probe+0x27/0x50 > [ 2.051991] async_run_entry_fn+0x21/0xa0 > [ 2.051991] process_one_work+0x317/0x640 > [ 2.051991] worker_thread+0x44/0x410 > [ 2.051991] kthread+0xe3/0x110 > [ 2.051991] ret_from_fork+0x22/0x30 > [ 2.051991] > [ 2.051991] [W] wait(&(&ap->eh_wait_q)->dmap:0): > [ 2.051991] [<ffffffff810bb017>] prepare_to_wait+0x47/0xd0 > [ 2.051991] stacktrace: > [ 2.051991] ata_port_wait_eh+0x52/0xc0 > [ 2.051991] ata_port_probe+0x1f/0x30 > [ 2.051991] async_port_probe+0x27/0x50 > [ 2.051991] async_run_entry_fn+0x21/0xa0 > [ 2.051991] process_one_work+0x317/0x640 > [ 2.051991] worker_thread+0x44/0x410 > [ 2.051991] kthread+0xe3/0x110 > [ 2.051991] ret_from_fork+0x22/0x30 > [ 2.051991] > [ 2.051991] [E] spin_unlock(&host->lock:0): > [ 2.051991] [<ffffffff8178046e>] ata_port_wait_eh+0x5e/0xc0 > [ 2.051991] stacktrace: > [ 2.051991] _raw_spin_unlock_irqrestore+0x35/0x80 > [ 2.051991] ata_port_wait_eh+0x5e/0xc0 > [ 2.051991] ata_port_probe+0x1f/0x30 > [ 2.051991] async_port_probe+0x27/0x50 > [ 2.051991] async_run_entry_fn+0x21/0xa0 > [ 2.051991] process_one_work+0x317/0x640 > [ 2.051991] worker_thread+0x44/0x410 > [ 2.051991] kthread+0xe3/0x110 > [ 2.051991] ret_from_fork+0x22/0x30 > [ 2.051991] --------------------------------------------------- > [ 2.051991] information that might be helpful > [ 2.051991] --------------------------------------------------- > [ 2.051991] CPU: 1 PID: 53 Comm: scsi_eh_1 Tainted: G W 5.17.0-rc1-00014-gcf3441bb2012 #2 > [ 2.051991] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 > [ 2.051991] Call Trace: > [ 2.051991] <TASK> > [ 2.051991] dump_stack_lvl+0x44/0x57 > [ 2.051991] print_circle+0x384/0x510 > [ 2.051991] ? print_circle+0x510/0x510 > [ 2.051991] cb_check_dl+0x58/0x60 > [ 2.051991] bfs+0xdc/0x1b0 > [ 2.051991] add_dep+0x94/0x120 > [ 2.051991] do_event.isra.22+0x284/0x300 > [ 2.051991] ? __wake_up_common+0x93/0x1a0 > [ 2.051991] dept_event+0x12b/0x1f0 > [ 2.051991] __wake_up_common+0xb0/0x1a0 > [ 2.051991] __wake_up_common_lock+0x65/0x90 > [ 2.051991] ata_scsi_port_error_handler+0x67a/0x740 > [ 2.051991] ? trace_hardirqs_on+0x38/0xe0 > [ 2.051991] ? scsi_eh_get_sense+0x150/0x150 > [ 2.051991] ata_scsi_error+0x94/0xc0 > [ 2.051991] scsi_error_handler+0x8d/0x3a0 > [ 2.051991] ? _raw_spin_unlock_irqrestore+0x63/0x80 > [ 2.051991] ? scsi_eh_get_sense+0x150/0x150 > [ 2.051991] kthread+0xe3/0x110 > [ 2.051991] ? kthread_complete_and_exit+0x20/0x20 > [ 2.051991] ret_from_fork+0x22/0x30 > [ 2.051991] </TASK> -- Damien Le Moal Western Digital Research ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Report in ata_scsi_port_error_handler() 2022-02-16 6:37 ` Damien Le Moal @ 2022-02-16 18:09 ` Linus Torvalds 2022-02-16 18:33 ` Steven Rostedt 2022-02-17 10:35 ` Byungchul Park 0 siblings, 2 replies; 10+ messages in thread From: Linus Torvalds @ 2022-02-16 18:09 UTC (permalink / raw) To: Damien Le Moal Cc: Byungchul Park, linux-ide, Ingo Molnar, Linux Kernel Mailing List, Peter Zijlstra, Will Deacon, Thomas Gleixner, Steven Rostedt, Joel Fernandes, Sasha Levin, Daniel Vetter, Chris Wilson, duyuyang, johannes.berg, Tejun Heo, Theodore Ts'o, Matthew Wilcox, Dave Chinner, Amir Goldstein, J. Bruce Fields, Greg Kroah-Hartman, kernel-team, Linux-MM, Andrew Morton, Michal Hocko, Minchan Kim, Johannes Weiner, Vladimir Davydov, sj, Jerome Glisse, Dennis Zhou, Christoph Lameter, Pekka Enberg, David Rientjes, Vlastimil Babka, ngupta, linux-block, Jens Axboe, paolo.valente, Josef Bacik, linux-fsdevel, Al Viro, Jan Kara, Jeff Layton, Dan Williams, Christoph Hellwig, Darrick J. Wong, dri-devel, Dave Airlie, rodrigosiqueiramelo, melissa.srw, hamohammed.sa On Tue, Feb 15, 2022 at 10:37 PM Damien Le Moal <damien.lemoal@opensource.wdc.com> wrote: > > On 2/16/22 13:16, Byungchul Park wrote: > > [ 2.051040] =================================================== > > [ 2.051406] DEPT: Circular dependency has been detected. > > [ 2.051730] 5.17.0-rc1-00014-gcf3441bb2012 #2 Tainted: G W > > [ 2.051991] --------------------------------------------------- > > [ 2.051991] summary > > [ 2.051991] --------------------------------------------------- > > [ 2.051991] *** DEADLOCK *** > > [ 2.051991] > > [ 2.051991] context A > > [ 2.051991] [S] (unknown)(&(&ap->eh_wait_q)->dmap:0) > > [ 2.051991] [W] __raw_spin_lock_irq(&host->lock:0) > > [ 2.051991] [E] event(&(&ap->eh_wait_q)->dmap:0) > > [ 2.051991] > > [ 2.051991] context B > > [ 2.051991] [S] __raw_spin_lock_irqsave(&host->lock:0) > > [ 2.051991] [W] wait(&(&ap->eh_wait_q)->dmap:0) > > [ 2.051991] [E] spin_unlock(&host->lock:0) > > Sleeping with a spinlock held would be triggering warnings already, so > these reports seem bogus to me. Yeah, Matthew pointed out the same thing for another use-case, where it looks like DEPT is looking at the state at the wrong point (not at the scheduling point, but at prepare_to_sleep()). This ata_port_wait() is the exact same pattern, ie we have spin_lock_irqsave(ap->lock, flags); while (ap->pflags & (ATA_PFLAG_EH_PENDING | ATA_PFLAG_EH_IN_PROGRESS)) { prepare_to_wait(&ap->eh_wait_q, &wait, TASK_UNINTERRUPTIBLE); spin_unlock_irqrestore(ap->lock, flags); schedule(); and DEPT has incorrectly taken it to mean that 'ap->lock' is held during the wait, when it is actually released before actually waiting. For the spin-locks, this is all very obvious (because they'd have been caught long ago by much simpler debug code), but the same prepare_to_wait -> wait pattern can most definitely happen with sleeping locks too, so they are all slightly suspect. And yes, the detailed reports are hard to read because the locations are given as "ata_port_wait_eh+0x52/0xc0". Running them through scripts/decode_stacktrace.sh to turn them into filename and line numbers - and also sort out inlining - would help a lot. Byungchul, could you fix those two issues? Some of your reports may well be entirely valid, but the hard-to-read hex offsets and the knowledge that at least some of them are confused about how prepare_to_wait -> wait actually works makes the motivation to look at the details much less.. Linus ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Report in ata_scsi_port_error_handler() 2022-02-16 18:09 ` Linus Torvalds @ 2022-02-16 18:33 ` Steven Rostedt 2022-02-17 23:55 ` Byungchul Park 2022-02-17 10:35 ` Byungchul Park 1 sibling, 1 reply; 10+ messages in thread From: Steven Rostedt @ 2022-02-16 18:33 UTC (permalink / raw) To: Linus Torvalds Cc: Damien Le Moal, Byungchul Park, linux-ide, Ingo Molnar, Linux Kernel Mailing List, Peter Zijlstra, Will Deacon, Thomas Gleixner, Joel Fernandes, Sasha Levin, Daniel Vetter, Chris Wilson, duyuyang, johannes.berg, Tejun Heo, Theodore Ts'o, Matthew Wilcox, Dave Chinner, Amir Goldstein, J. Bruce Fields, Greg Kroah-Hartman, kernel-team, Linux-MM, Andrew Morton, Michal Hocko, Minchan Kim, Johannes Weiner, Vladimir Davydov, sj, Jerome Glisse, Dennis Zhou, Christoph Lameter, Pekka Enberg, David Rientjes, Vlastimil Babka, ngupta, linux-block, Jens Axboe, paolo.valente, Josef Bacik, linux-fsdevel, Al Viro, Jan Kara, Jeff Layton, Dan Williams, Christoph Hellwig, Darrick J. Wong, dri-devel, Dave Airlie, rodrigosiqueiramelo, melissa.srw, hamohammed.sa On Wed, 16 Feb 2022 10:09:14 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote: > Byungchul, could you fix those two issues? Some of your reports may > well be entirely valid, but the hard-to-read hex offsets and the > knowledge that at least some of them are confused about how > prepare_to_wait -> wait actually works makes the motivation to look at > the details much less.. Hi Byungchul, I'm not sure what your tool is using to attach to the kernel to analyze the events (perhaps tracepoints?). But you can have the prepare_to_wait event just flag the task is about to wait, and then attach to the schedule (sched_switch) event to denote that it actually waited. Of course have the finish_wait() clear the flag. -- Steve ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Report in ata_scsi_port_error_handler() 2022-02-16 18:33 ` Steven Rostedt @ 2022-02-17 23:55 ` Byungchul Park 0 siblings, 0 replies; 10+ messages in thread From: Byungchul Park @ 2022-02-17 23:55 UTC (permalink / raw) To: Steven Rostedt Cc: Linus Torvalds, Damien Le Moal, linux-ide, Ingo Molnar, Linux Kernel Mailing List, Peter Zijlstra, Will Deacon, Thomas Gleixner, Joel Fernandes, Sasha Levin, Daniel Vetter, Chris Wilson, duyuyang, johannes.berg, Tejun Heo, Theodore Ts'o, Matthew Wilcox, Dave Chinner, Amir Goldstein, J. Bruce Fields, Greg Kroah-Hartman, kernel-team, Linux-MM, Andrew Morton, Michal Hocko, Minchan Kim, Johannes Weiner, Vladimir Davydov, sj, Jerome Glisse, Dennis Zhou, Christoph Lameter, Pekka Enberg, David Rientjes, Vlastimil Babka, ngupta, linux-block, Jens Axboe, paolo.valente, Josef Bacik, linux-fsdevel, Al Viro, Jan Kara, Jeff Layton, Dan Williams, Christoph Hellwig, Darrick J. Wong, dri-devel, Dave Airlie, rodrigosiqueiramelo, melissa.srw, hamohammed.sa On Wed, Feb 16, 2022 at 01:33:18PM -0500, Steven Rostedt wrote: > On Wed, 16 Feb 2022 10:09:14 -0800 > Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > Byungchul, could you fix those two issues? Some of your reports may > > well be entirely valid, but the hard-to-read hex offsets and the > > knowledge that at least some of them are confused about how > > prepare_to_wait -> wait actually works makes the motivation to look at > > the details much less.. > > Hi Byungchul, > > I'm not sure what your tool is using to attach to the kernel to analyze the > events (perhaps tracepoints?). But you can have the prepare_to_wait event > just flag the task is about to wait, and then attach to the schedule > (sched_switch) event to denote that it actually waited. > > Of course have the finish_wait() clear the flag. (Sorry for late reply, which was because of my email client issue.) Thank you for the hint. However, unfortunately, I didn't use tracepoint for that. However, the key idea is the thing that I should take! Thanks a lot! Thanks, Byungchul > > -- Steve ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Report in ata_scsi_port_error_handler() 2022-02-16 18:09 ` Linus Torvalds 2022-02-16 18:33 ` Steven Rostedt @ 2022-02-17 10:35 ` Byungchul Park 1 sibling, 0 replies; 10+ messages in thread From: Byungchul Park @ 2022-02-17 10:35 UTC (permalink / raw) To: torvalds Cc: damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal, daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy, david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm, mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes, vbabka, ngupta, linux-block, axboe, paolo.valente, josef, linux-fsdevel, viro, jack, jack, jlayton, dan.j.williams, hch, djwong, dri-devel, airlied, rodrigosiqueiramelo, melissa.srw, hamohammed.sa <torvalds@linux-foundation.org> wrote: > On Tue, Feb 15, 2022 at 10:37 PM Damien Le Moal > <damien.lemoal@opensource.wdc.com> wrote: > > > > On 2/16/22 13:16, Byungchul Park wrote: > > > [ 2.051040 ] =================================================== > > > [ 2.051406 ] DEPT: Circular dependency has been detected. > > > [ 2.051730 ] 5.17.0-rc1-00014-gcf3441bb2012 #2 Tainted: G W > > > [ 2.051991 ] --------------------------------------------------- > > > [ 2.051991 ] summary > > > [ 2.051991 ] --------------------------------------------------- > > > [ 2.051991 ] *** DEADLOCK *** > > > [ 2.051991 ] > > > [ 2.051991 ] context A > > > [ 2.051991 ] [S] (unknown)(&(&ap->eh_wait_q)->dmap:0) > > > [ 2.051991 ] [W] __raw_spin_lock_irq(&host->lock:0) > > > [ 2.051991 ] [E] event(&(&ap->eh_wait_q)->dmap:0) > > > [ 2.051991 ] > > > [ 2.051991 ] context B > > > [ 2.051991 ] [S] __raw_spin_lock_irqsave(&host->lock:0) > > > [ 2.051991 ] [W] wait(&(&ap->eh_wait_q)->dmap:0) > > > [ 2.051991 ] [E] spin_unlock(&host->lock:0) > > > > Sleeping with a spinlock held would be triggering warnings already, so > > these reports seem bogus to me. > > Yeah, Matthew pointed out the same thing for another use-case, where > it looks like DEPT is looking at the state at the wrong point (not at > the scheduling point, but at prepare_to_sleep()). > > This ata_port_wait() is the exact same pattern, ie we have > > spin_lock_irqsave(ap->lock, flags); > > while (ap->pflags & (ATA_PFLAG_EH_PENDING | ATA_PFLAG_EH_IN_PROGRESS)) { > prepare_to_wait(&ap->eh_wait_q, &wait, TASK_UNINTERRUPTIBLE); > spin_unlock_irqrestore(ap->lock, flags); > schedule(); > > and DEPT has incorrectly taken it to mean that 'ap->lock' is held > during the wait, when it is actually released before actually waiting. > > For the spin-locks, this is all very obvious (because they'd have been > caught long ago by much simpler debug code), but the same > prepare_to_wait -> wait pattern can most definitely happen with > sleeping locks too, so they are all slightly suspect. > > And yes, the detailed reports are hard to read because the locations > are given as "ata_port_wait_eh+0x52/0xc0". Running them through > scripts/decode_stacktrace.sh to turn them into filename and line > numbers - and also sort out inlining - would help a lot. > > Byungchul, could you fix those two issues? Some of your reports may Of couse, that's what I should do. Thanks for your feedback. > well be entirely valid, but the hard-to-read hex offsets and the > knowledge that at least some of them are confused about how > prepare_to_wait -> wait actually works makes the motivation to look at > the details much less.. > > Linus ^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: [REPORT] libata deadlock possibilities by DEPT 2022-02-16 4:12 [REPORT] libata deadlock possibilities by DEPT Byungchul Park 2022-02-16 4:16 ` Report 2 in ata_scsi_port_error_handler() Byungchul Park @ 2022-02-16 6:52 ` Byungchul Park 1 sibling, 0 replies; 10+ messages in thread From: Byungchul Park @ 2022-02-16 6:52 UTC (permalink / raw) To: damien.lemoal, linux-ide Cc: torvalds, mingo, linux-kernel, peterz, will, tglx, rostedt, joel, sashal, daniel.vetter, chris, duyuyang, johannes.berg, tj, tytso, willy, david, amir73il, bfields, gregkh, kernel-team, linux-mm, akpm, mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes, vbabka, ngupta, linux-block, axboe, paolo.valente, josef, linux-fsdevel, viro, jack, jlayton, dan.j.williams, hch, djwong, dri-devel, airlied, rodrigosiqueiramelo, melissa.srw, hamohammed.sa Thanks to Matthew, I found the reports are spurious.. I missed the case that it's not going to real sleep after preparing a wait. I'm sorry for bothering you guys. Please ignore the reports. ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2022-02-17 23:57 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-02-16 4:12 [REPORT] libata deadlock possibilities by DEPT Byungchul Park 2022-02-16 4:16 ` Report 2 in ata_scsi_port_error_handler() Byungchul Park 2022-02-16 4:16 ` Report 3 " Byungchul Park 2022-02-16 4:16 ` Report " Byungchul Park 2022-02-16 6:37 ` Damien Le Moal 2022-02-16 18:09 ` Linus Torvalds 2022-02-16 18:33 ` Steven Rostedt 2022-02-17 23:55 ` Byungchul Park 2022-02-17 10:35 ` Byungchul Park 2022-02-16 6:52 ` [REPORT] libata deadlock possibilities by DEPT Byungchul Park
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).