* Re: [PATCH v3] sched/core: Always flush pending blk_plug
[not found] ` <20220709161956.GA20046@windriver.com>
@ 2022-07-10 1:01 ` Hillf Danton
2022-07-12 18:29 ` Paul Gortmaker
0 siblings, 1 reply; 3+ messages in thread
From: Hillf Danton @ 2022-07-10 1:01 UTC (permalink / raw)
To: Paul Gortmaker
Cc: John Keeping, linux-kernel, linux-mm, Peter Zijlstra,
Thomas Gleixner, Mel Gorman, Theodore Ts'o
On Sat, 9 Jul 2022 12:19:56 -0400 Paul Gortmaker wrote:
> On 08/07/2022 (Fri 17:27) John Keeping wrote:
> > With CONFIG_PREEMPT_RT, it is possible to hit a deadlock between two
> > normal priority tasks (SCHED_OTHER, nice level zero):
> >
> > INFO: task kworker/u8:0:8 blocked for more than 491 seconds.
> > Not tainted 5.15.49-rt46 #1
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > task:kworker/u8:0 state:D stack: 0 pid: 8 ppid: 2 flags:0x00000000
> > Workqueue: writeback wb_workfn (flush-7:0)
>
> This is great. I was on-and-off looking at an ext4 deadlock triggered
> by LTP's "ftest01" - and was just putting my notes together this morning
> to clarify for myself what I thought was going on before I forgot.
>
> Which is good, because I then could recognize the very similar patterns
> in what you saw to what I'd observed/deduced. Otherwise I'd probably not
> have had your backtraces grab my attention at all.
>
> Long story short, I applied this to v5.19-rc3-rt5 and it resolves the
> LTP/ext4 deadlock issue. So...
>
> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
>
> As I'd already put together my ext4 debug info, I'll include it below
> in case it helps future people searching for LTP and ftest01.
>
> Thanks,
> Paul.
>
> -----------------
>
>
> There is a reproducible deadlock in ext4 on preempt-rt triggered by LTP
> "ftest01".
>
> Reproducer details are at the bottom. What seems to happen is as follows:
>
> The kworker doing writeback wb_workfn (pid 213 below) stalls on the
> one and only down_write in ext4_map_blocks() and never moves again.
>
> /*
> * New blocks allocate and/or writing to unwritten extent
> * will possibly result in updating i_data, so we take
> * the write lock of i_data_sem, and call get_block()
> * with create == 1 flag.
> */
> down_write(&EXT4_I(inode)->i_data_sem);
>
> So who is blocking the kworker? There are two of the ftest01 processes
> (pid 818 and pid 821) doing ext4 truncate which look like this:
>
> ext4_truncate+0x46a/0x660
> --> ext4_ext_truncate+0xd6/0xf0
> --> ext4_ext_remove_space+0x59f/0x1830
> --> jbd2_journal_get_write_access+0x51/0x80
> --> do_get_write_access+0x2ad/0x550
> --> complete_all+0x60/0x60
> --> bit_wait_io+0x11/0x60
>
> They too are stalled in D state waiting there forever - presumably
> waiting on the kworker to make the progress they are waiting for.
>
> Here is where I think lockdep shows us the problem:
>
> 5 locks held by ftest01/818:
> [...]
> #4: ffff957c2a679958 (&ei->i_data_sem){++++}-{3:3}, at: ext4_truncate+0x443/0x660
>
> 5 locks held by ftest01/821:
> [...]
> #4: ffff957c2a67bb78 (&ei->i_data_sem){++++}-{3:3}, at: ext4_truncate+0x443/0x660
>
> Yes - that is the one the kworker was trying the down_write() on.
>
> Also note that the bit_wait is a big telltale sign here - there is only
> one in the jbd2's do_get_write_access() function:
>
> /*
> * There is one case we have to be very careful about. If the
> * committing transaction is currently writing this buffer out to disk
> * and has NOT made a copy-out, then we cannot modify the buffer
> * contents at all right now. The essence of copy-out is that it is
> * the extra copy, not the primary copy, which gets journaled. If the
> * primary copy is already going to disk then we cannot do copy-out
> * here.
> */
> if (buffer_shadow(bh)) {
> JBUFFER_TRACE(jh, "on shadow: sleep");
> spin_unlock(&jh->b_state_lock);
> wait_on_bit_io(&bh->b_state, BH_Shadow, TASK_UNINTERRUPTIBLE);
> goto repeat;
> }
>
> Presumably this is a corner case that preempt-rt makes easier to trigger?
>
> The full trace with interspersed lockdep info follows.
>
> Various detailed reproducer information follows that.
>
> -----------------------
>
> sysrq-w backtrace information interspersed with sysrq-d lockdep info
> on 5.19-rc2-rt3 kernel:
>
> task:kworker/u129:34 state:D stack: 0 pid: 213 ppid: 2 flags:0x00004000
> Workqueue: writeback wb_workfn (flush-7:0)
> Call Trace:
> <TASK>
> __schedule+0x329/0xae0
> ? __this_cpu_preempt_check+0x13/0x20
> ? __rt_mutex_slowlock.isra.21.constprop.25+0xc0e/0xc40
> schedule+0x48/0xc0
> __rt_mutex_slowlock.isra.21.constprop.25+0x1b8/0xc40
> rwbase_write_lock+0x98/0x6e0
> down_write+0x46/0x50
> ? ext4_map_blocks+0x156/0x5c0
> ext4_map_blocks+0x156/0x5c0
> ext4_writepages+0xa1a/0x1290
> ? __this_cpu_preempt_check+0x13/0x20
> ? lockdep_hardirqs_on+0xcd/0x150
> do_writepages+0xd2/0x1a0
> __writeback_single_inode+0x64/0x850
> writeback_sb_inodes+0x22d/0x5b0
> __writeback_inodes_wb+0x67/0xe0
> wb_writeback+0x269/0x5f0
> ? get_nr_inodes+0x49/0x70
> wb_workfn+0x43d/0x780
> ? __this_cpu_preempt_check+0x13/0x20
> process_one_work+0x286/0x710
> worker_thread+0x3c/0x3d0
> ? process_one_work+0x710/0x710
> kthread+0x13b/0x150
> ? kthread_complete_and_exit+0x20/0x20
> ret_from_fork+0x1f/0x30
> </TASK>
>
> 6 locks held by kworker/u129:34/213:
> #0: ffff957c25e9ef28 ((wq_completion)writeback){+.+.}-{0:0}, at: process_one_work+0x201/0x710
> #1: ffffb7e9c3797e68 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: process_one_work+0x201/0x710
> #2: ffff957b9c9b70d0 (&type->s_umount_key#34){++++}-{3:3}, at: trylock_super+0x1b/0x50
> #3: ffff957b8a9b2d18 (&sbi->s_writepages_rwsem){.+.+}-{0:0}, at: do_writepages+0xd2/0x1a0
> #4: ffff957b9c9b5be8 (jbd2_handle){++++}-{0:0}, at: start_this_handle+0x13e/0x5f0
> #5: ffff957c2a679958 (&ei->i_data_sem){++++}-{3:3}, at: ext4_map_blocks+0x156/0x5c0
>
>
> task:jbd2/loop0-8 state:D stack: 0 pid: 679 ppid: 2 flags:0x00004000
> Call Trace:
> <TASK>
> __schedule+0x329/0xae0
> schedule+0x48/0xc0
> jbd2_journal_wait_updates+0x7c/0xf0
> ? schedstat_stop+0x10/0x10
> jbd2_journal_commit_transaction+0x284/0x20e0
> ? __this_cpu_preempt_check+0x13/0x20
> ? find_held_lock+0x35/0xa0
> ? _raw_spin_unlock_irqrestore+0x86/0x8f
> ? _raw_spin_unlock_irqrestore+0x86/0x8f
> ? try_to_del_timer_sync+0x53/0x80
> ? trace_hardirqs_on+0x3b/0x120
> ? try_to_del_timer_sync+0x53/0x80
> kjournald2+0xd5/0x2b0
> ? schedstat_stop+0x10/0x10
> ? commit_timeout+0x20/0x20
> kthread+0x13b/0x150
> ? kthread_complete_and_exit+0x20/0x20
> ret_from_fork+0x1f/0x30
> </TASK>
>
>
> task:ftest01 state:D stack: 0 pid: 817 ppid: 816 flags:0x00004002
> Call Trace:
> <TASK>
> __schedule+0x329/0xae0
> ? __x64_sys_tee+0xd0/0xd0
> ? __x64_sys_tee+0xd0/0xd0
> schedule+0x48/0xc0
> wb_wait_for_completion+0x62/0x90
> ? schedstat_stop+0x10/0x10
> sync_inodes_sb+0xdd/0x460
> ? __this_cpu_preempt_check+0x13/0x20
> ? __x64_sys_tee+0xd0/0xd0
> sync_inodes_one_sb+0x15/0x20
> iterate_supers+0x94/0x100
> ksys_sync+0x42/0xa0
> __do_sys_sync+0xe/0x20
> do_syscall_64+0x3f/0x90
> entry_SYSCALL_64_after_hwframe+0x46/0xb0
> RIP: 0033:0x7f3f9dae2cdb
> RSP: 002b:00007ffe06430318 EFLAGS: 00000246 ORIG_RAX: 00000000000000a2
> RAX: ffffffffffffffda RBX: 0000000000000800 RCX: 00007f3f9dae2cdb
> RDX: 0000000000000040 RSI: 0000561f57de52f0 RDI: 0000561f57de5340
> RBP: 00000000000ff000 R08: 0000000000000000 R09: 00007f3f9dbce0a0
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 0000000000000207 R14: 0000000000000049 R15: 0000000000000003
> </TASK>
>
> 1 lock held by ftest01/817:
> #0: ffff957b9c9b70d0 (&type->s_umount_key#34){++++}-{3:3}, at: iterate_supers+0x7d/0x100
>
>
> task:ftest01 state:D stack: 0 pid: 818 ppid: 816 flags:0x00004002
> Call Trace:
> <TASK>
> __schedule+0x329/0xae0
> ? bit_wait+0x60/0x60
> schedule+0x48/0xc0
> io_schedule+0x16/0x40
> bit_wait_io+0x11/0x60
> __wait_on_bit_lock+0x5e/0xb0
> out_of_line_wait_on_bit_lock+0x93/0xb0
> ? complete_all+0x60/0x60
> __lock_buffer+0x3f/0x50
pid 818 was trying to lock the buffer.
> do_get_write_access+0x2ad/0x550
> jbd2_journal_get_write_access+0x51/0x80
> __ext4_journal_get_write_access+0xf9/0x1a0
> ext4_ext_get_access.isra.32+0x34/0x50
> ext4_ext_remove_space+0x59f/0x1830
> ? ext4_ext_truncate+0xa2/0xf0
> ? ext4_ext_truncate+0x5e/0xf0
> ext4_ext_truncate+0xd6/0xf0
> ext4_truncate+0x46a/0x660
> ext4_setattr+0x6db/0xa40
> notify_change+0x3e5/0x540
> do_truncate+0x7b/0xc0
> ? do_truncate+0x7b/0xc0
> do_sys_ftruncate+0x164/0x2f0
> __x64_sys_ftruncate+0x1b/0x20
> do_syscall_64+0x3f/0x90
> entry_SYSCALL_64_after_hwframe+0x46/0xb0
> RIP: 0033:0x7f3f9dae3f7b
> RSP: 002b:00007ffe06430318 EFLAGS: 00000202 ORIG_RAX: 000000000000004d
> RAX: ffffffffffffffda RBX: 00000000000001a5 RCX: 00007f3f9dae3f7b
> RDX: 00000000000001a5 RSI: 00000000000d2800 RDI: 0000000000000003
> RBP: 00000000000001a5 R08: 00007f3f9dbce090 R09: 00007f3f9dbce0a0
> R10: 0000000000000008 R11: 0000000000000202 R12: 0000000000000000
> R13: 0000000000000207 R14: 0000000000000043 R15: 0000000000000003
> </TASK>
>
> 5 locks held by ftest01/818:
> #0: ffff957b9c9b74e8 (sb_writers#3){.+.+}-{0:0}, at: __x64_sys_ftruncate+0x1b/0x20
> #1: ffff957c2a679b00 (&sb->s_type->i_mutex_key#10){+.+.}-{3:3}, at: do_truncate+0x6a/0xc0
> #2: ffff957c2a679cc0 (mapping.invalidate_lock){++++}-{3:3}, at: ext4_setattr+0x39a/0xa40
> #3: ffff957b9c9b5be8 (jbd2_handle){++++}-{0:0}, at: start_this_handle+0x13e/0x5f0
> #4: ffff957c2a679958 (&ei->i_data_sem){++++}-{3:3}, at: ext4_truncate+0x443/0x660
>
>
> task:ftest01 state:D stack: 0 pid: 819 ppid: 816 flags:0x00004002
> Call Trace:
> <TASK>
> __schedule+0x329/0xae0
> ? __x64_sys_tee+0xd0/0xd0
> ? __x64_sys_tee+0xd0/0xd0
> schedule+0x48/0xc0
> wb_wait_for_completion+0x62/0x90
> ? schedstat_stop+0x10/0x10
> sync_inodes_sb+0xdd/0x460
> ? __this_cpu_preempt_check+0x13/0x20
> ? __x64_sys_tee+0xd0/0xd0
> sync_inodes_one_sb+0x15/0x20
> iterate_supers+0x94/0x100
> ksys_sync+0x42/0xa0
> __do_sys_sync+0xe/0x20
> do_syscall_64+0x3f/0x90
> entry_SYSCALL_64_after_hwframe+0x46/0xb0
> RIP: 0033:0x7f3f9dae2cdb
> RSP: 002b:00007ffe06430318 EFLAGS: 00000246 ORIG_RAX: 00000000000000a2
> RAX: ffffffffffffffda RBX: 0000000000000800 RCX: 00007f3f9dae2cdb
> RDX: 0000000000000040 RSI: 0000561f57de52f0 RDI: 0000561f57de5340
> RBP: 00000000000c2800 R08: 00000000fffff800 R09: 0000000000004000
> R10: 0000000000000010 R11: 0000000000000246 R12: 0000000000000000
> R13: 0000000000000207 R14: 0000000000000075 R15: 0000000000000003
> </TASK>
>
> 1 lock held by ftest01/819:
> #0: ffff957b9c9b70d0 (&type->s_umount_key#34){++++}-{3:3}, at: iterate_supers+0x7d/0x100
>
>
> task:ftest01 state:D stack: 0 pid: 820 ppid: 816 flags:0x00004002
> Call Trace:
> <TASK>
> __schedule+0x329/0xae0
> ? __x64_sys_tee+0xd0/0xd0
> ? __x64_sys_tee+0xd0/0xd0
> schedule+0x48/0xc0
> wb_wait_for_completion+0x62/0x90
> ? schedstat_stop+0x10/0x10
> sync_inodes_sb+0xdd/0x460
> ? __this_cpu_preempt_check+0x13/0x20
> ? __x64_sys_tee+0xd0/0xd0
> sync_inodes_one_sb+0x15/0x20
> iterate_supers+0x94/0x100
> ksys_sync+0x42/0xa0
> __do_sys_sync+0xe/0x20
> do_syscall_64+0x3f/0x90
> entry_SYSCALL_64_after_hwframe+0x46/0xb0
> RIP: 0033:0x7f3f9dae2cdb
> RSP: 002b:00007ffe06430318 EFLAGS: 00000246 ORIG_RAX: 00000000000000a2
> RAX: ffffffffffffffda RBX: 0000000000000800 RCX: 00007f3f9dae2cdb
> RDX: 0000000000000040 RSI: 0000561f57de52f0 RDI: 0000561f57de5340
> RBP: 000000000006a800 R08: 00000000fffff800 R09: 0000000000004000
> R10: 0000000000000010 R11: 0000000000000246 R12: 0000000000000000
> R13: 0000000000000207 R14: 0000000000000069 R15: 0000000000000003
> </TASK>
>
> 1 lock held by ftest01/820:
> #0: ffff957b9c9b70d0 (&type->s_umount_key#34){++++}-{3:3}, at: iterate_supers+0x7d/0x100
>
>
> task:ftest01 state:D stack: 0 pid: 821 ppid: 816 flags:0x00000002
> Call Trace:
> <TASK>
> __schedule+0x329/0xae0
> ? bit_wait+0x60/0x60
> schedule+0x48/0xc0
> io_schedule+0x16/0x40
> bit_wait_io+0x11/0x60
> __wait_on_bit_lock+0x5e/0xb0
> out_of_line_wait_on_bit_lock+0x93/0xb0
> ? complete_all+0x60/0x60
> __lock_buffer+0x3f/0x50
So was pid 821.
> do_get_write_access+0x2ad/0x550
> jbd2_journal_get_write_access+0x51/0x80
> __ext4_journal_get_write_access+0xf9/0x1a0
> ext4_ext_get_access.isra.32+0x34/0x50
> ext4_ext_remove_space+0x59f/0x1830
> ? ext4_ext_truncate+0xa2/0xf0
> ? ext4_ext_truncate+0x5e/0xf0
> ext4_ext_truncate+0xd6/0xf0
> ext4_truncate+0x46a/0x660
> ext4_setattr+0x6db/0xa40
> notify_change+0x3e5/0x540
> do_truncate+0x7b/0xc0
> ? do_truncate+0x7b/0xc0
> do_sys_ftruncate+0x164/0x2f0
> __x64_sys_ftruncate+0x1b/0x20
> do_syscall_64+0x3f/0x90
> entry_SYSCALL_64_after_hwframe+0x46/0xb0
> RIP: 0033:0x7f3f9dae3f7b
> RSP: 002b:00007ffe06430318 EFLAGS: 00000202 ORIG_RAX: 000000000000004d
> RAX: ffffffffffffffda RBX: 000000000000016a RCX: 00007f3f9dae3f7b
> RDX: 000000000000016a RSI: 00000000000b5000 RDI: 0000000000000003
> RBP: 000000000000016a R08: 00007f3f9dbce080 R09: 00007f3f9dbce0a0
> R10: 0000000000000002 R11: 0000000000000202 R12: 0000000000000000
> R13: 0000000000000207 R14: 0000000000000040 R15: 0000000000000003
> </TASK>
>
> 5 locks held by ftest01/821:
> #0: ffff957b9c9b74e8 (sb_writers#3){.+.+}-{0:0}, at: __x64_sys_ftruncate+0x1b/0x20
> #1: ffff957c2a67bd20 (&sb->s_type->i_mutex_key#10){+.+.}-{3:3}, at: do_truncate+0x6a/0xc0
> #2: ffff957c2a67bee0 (mapping.invalidate_lock){++++}-{3:3}, at: ext4_setattr+0x39a/0xa40
> #3: ffff957b9c9b5be8 (jbd2_handle){++++}-{0:0}, at: start_this_handle+0x13e/0x5f0
> #4: ffff957c2a67bb78 (&ei->i_data_sem){++++}-{3:3}, at: ext4_truncate+0x443/0x660
>
Looks like an ABBA deadlock that should be fed to lockdep, given what was
blocking the flusher (pid 213).
> --------------------------------------
>
> Reproducer:
>
> Get latest preempt-rt kernel:
> git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git
> checkout v5.19-rc3-rt5 (or linux-5.19.y-rt branch)
> make defconfig
> manually enable CONFIG_EXPERT and then CONFIG_PREEMPT_RT
> make oldconfig and take defaults for other options hidden behind EXPERT
> can optionally enable LOCKDEP options so sysrq-d works.
>
> Get LTP ; build + install
> https://github.com/linux-test-project/ltp.git
>
> Boot and then run reproducer:
> dd if=/dev/zero of=/root/tmp_mount_file bs=1M count=512
> mkfs.ext4 -F /root/tmp_mount_file
> mount /root/tmp_mount_file /mnt/
> /opt/ltp/runltp -f fs -s ftest01 -d /mnt/
> umount /mnt
>
> The loop device isn't strictly necessary, but it contains the fs fallout
> when the inevitable reboot is required to get rid of the D-state processes.
>
> Test completes on v5.10-rt, v5.12-rt - should take about 2-3m total.
> Test hangs forever on v5.15-rt and latest v5.19-rc-rt. Test threads
> in D state - any other i/o directed at /mnt also hangs in D state.
>
Make lock_buffer have mutex_lock() semantics in an attempt to catch the
deadlock above with lockdep.
Only for thoughts now.
Hillf
--- mainline/fs/buffer.c
+++ b/fs/buffer.c
@@ -66,12 +66,17 @@ EXPORT_SYMBOL(touch_buffer);
void __lock_buffer(struct buffer_head *bh)
{
+ /* lock_buffer has mutex_lock() semantics here */
+ mutex_acquire(&bh->b_dep_map, 0, 0, _RET_IP_);
+
wait_on_bit_lock_io(&bh->b_state, BH_Lock, TASK_UNINTERRUPTIBLE);
}
EXPORT_SYMBOL(__lock_buffer);
void unlock_buffer(struct buffer_head *bh)
{
+ mutex_release(&bh->b_dep_map, _RET_IP_);
+
clear_bit_unlock(BH_Lock, &bh->b_state);
smp_mb__after_atomic();
wake_up_bit(&bh->b_state, BH_Lock);
@@ -3289,12 +3294,22 @@ static void recalc_bh_state(void)
buffer_heads_over_limit = (tot > max_buffer_heads);
}
+static void buffer_head_init_lockdep(struct buffer_head *buf)
+{
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ static struct lock_class_key bh_key;
+
+ lockdep_init_map_wait(&buf->b_dep_map, "buffer_head_lock", &bh_key, 0, LD_WAIT_SLEEP);
+#endif
+}
+
struct buffer_head *alloc_buffer_head(gfp_t gfp_flags)
{
struct buffer_head *ret = kmem_cache_zalloc(bh_cachep, gfp_flags);
if (ret) {
INIT_LIST_HEAD(&ret->b_assoc_buffers);
spin_lock_init(&ret->b_uptodate_lock);
+ buffer_head_init_lockdep(ret);
preempt_disable();
__this_cpu_inc(bh_accounting.nr);
recalc_bh_state();
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -76,6 +76,9 @@ struct buffer_head {
spinlock_t b_uptodate_lock; /* Used by the first bh in a page, to
* serialise IO completion of other
* buffers in the page */
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ struct lockdep_map b_dep_map;
+#endif
};
/*
* Re: [PATCH v3] sched/core: Always flush pending blk_plug
2022-07-10 1:01 ` [PATCH v3] sched/core: Always flush pending blk_plug Hillf Danton
@ 2022-07-12 18:29 ` Paul Gortmaker
2022-07-12 22:38 ` Hillf Danton
0 siblings, 1 reply; 3+ messages in thread
From: Paul Gortmaker @ 2022-07-12 18:29 UTC (permalink / raw)
To: Hillf Danton
Cc: John Keeping, linux-kernel, linux-mm, Peter Zijlstra,
Thomas Gleixner, Mel Gorman, Theodore Ts'o
[Re: [PATCH v3] sched/core: Always flush pending blk_plug] On 10/07/2022 (Sun 09:01) Hillf Danton wrote:
> Looks like a ABBA deadlock that should be fed to lockdep, given what was
> blocking flusher (pid 213).
[...]
> Make lock_buffer have mutex_lock() semantics in attempt to catch the
> deadlock above.
>
> Only for thoughts now.
Thanks for the test patch - I'm keeping it as a reference for future
use.
So, preempt-rt issues an early lockdep splat during partition probing at
boot, both with and without the original v3 patch from this thread.
Of course then I figured I'd try the debug patch on vanilla v5.19-rc3
and we get pretty much the same lockdep complaint.
sd 1:0:0:0: Attached scsi generic sg1 type 0
sd 1:0:0:0: [sdb] Preferred minimum I/O size 512 bytes
scsi 2:0:0:0: CD-ROM HL-DT-ST DVD-ROM DH30N A101 PQ: 0 ANSI: 5
=====================================
WARNING: bad unlock balance detected!
5.19.0-rc3-dirty #2 Not tainted
-------------------------------------
swapper/2/0 is trying to release lock (buffer_head_lock) at:
[<ffffffffbc494a6b>] end_buffer_async_read+0x5b/0x180
but there are no more locks to release!
other info that might help us debug this:
1 lock held by swapper/2/0:
#0: ffff8bee27744080 (&ret->b_uptodate_lock){....}-{2:2}, at: end_buffer_async_read+0x47/0x180
stack backtrace:
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.19.0-rc3-dirty #2
Hardware name: Dell Inc. Precision WorkStation T5500 /0D883F, BIOS A16 05/28/2013
Call Trace:
<IRQ>
dump_stack_lvl+0x40/0x5c
lock_release+0x245/0x3f0
unlock_buffer+0x15/0x30
end_buffer_async_read+0x5b/0x180
end_bio_bh_io_sync+0x1e/0x40
blk_update_request+0x9a/0x470
scsi_end_request+0x27/0x190
scsi_io_completion+0x3c/0x580
blk_complete_reqs+0x39/0x50
__do_softirq+0x11d/0x344
irq_exit_rcu+0xa9/0xc0
common_interrupt+0xa5/0xc0
</IRQ>
<TASK>
asm_common_interrupt+0x27/0x40
RIP: 0010:cpuidle_enter_state+0x12d/0x3f0
Code: 49 43 0f 84 b7 02 00 00 31 ff e8 fe 1c 74 ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 8e 02 00 00 31 ff e8 97 95 7a ff fb 45 85 f6 <0f> 88 12 01 00 00 49 63 d6 4c 2b 24 24 48 8d 04 52 48 8d 04 82 49
RSP: 0000:ffffa2890013fe90 EFLAGS: 00000206
RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000002
RDX: 0000000080000002 RSI: ffffffffbd78d911 RDI: 00000000ffffffff
RBP: ffff8bee2739a400 R08: 0000000000000000 R09: 00000000001faf40
R10: ffffa2890013fdc8 R11: 0000000000000000 R12: 00000000a7b149d6
R13: ffffffffbdbee3a0 R14: 0000000000000003 R15: 0000000000000001
cpuidle_enter+0x24/0x40
do_idle+0x1e3/0x230
cpu_startup_entry+0x14/0x20
start_secondary+0xe8/0xf0
secondary_startup_64_no_verify+0xe0/0xeb
</TASK>
sda: sda1 sda2 sda3
sd 0:0:0:0: [sda] Attached SCSI disk
sdb: sdb1 sdb2
sd 1:0:0:0: [sdb] Attached SCSI disk
Not quite sure what to make of that.
Paul.
--
* Re: [PATCH v3] sched/core: Always flush pending blk_plug
2022-07-12 18:29 ` Paul Gortmaker
@ 2022-07-12 22:38 ` Hillf Danton
0 siblings, 0 replies; 3+ messages in thread
From: Hillf Danton @ 2022-07-12 22:38 UTC (permalink / raw)
To: Paul Gortmaker
Cc: John Keeping, linux-kernel, linux-mm, Peter Zijlstra,
Thomas Gleixner, Mel Gorman, Theodore Ts'o
On Tue, 12 Jul 2022 14:29:14 -0400 Paul Gortmaker wrote:
> On 10/07/2022 (Sun 09:01) Hillf Danton wrote:
>
> > Looks like a ABBA deadlock that should be fed to lockdep, given what was
> > blocking flusher (pid 213).
>
> [...]
>
> > Make lock_buffer have mutex_lock() semantics in attempt to catch the
> > deadlock above.
> >
> > Only for thoughts now.
>
> Thanks for the test patch - I'm keeping it as a reference for future
> use.
>
> So, preempt-rt issues an early lockdep splat at partition probe bootup,
> both with and without the original v3 patch from this thread.
>
> Of course then I figured I'd try the debug patch on vanilla v5.19-rc3
> and we get pretty much the same lockdep complaint.
>
>
> sd 1:0:0:0: Attached scsi generic sg1 type 0
> sd 1:0:0:0: [sdb] Preferred minimum I/O size 512 bytes
> scsi 2:0:0:0: CD-ROM HL-DT-ST DVD-ROM DH30N A101 PQ: 0 ANSI: 5
>
> =====================================
> WARNING: bad unlock balance detected!
> 5.19.0-rc3-dirty #2 Not tainted
> -------------------------------------
> swapper/2/0 is trying to release lock (buffer_head_lock) at:
> [<ffffffffbc494a6b>] end_buffer_async_read+0x5b/0x180
> but there are no more locks to release!
It caught the lock being released by a context other than the one that
acquired it.
>
> other info that might help us debug this:
> 1 lock held by swapper/2/0:
> #0: ffff8bee27744080 (&ret->b_uptodate_lock){....}-{2:2}, at: end_buffer_async_read+0x47/0x180
>
> stack backtrace:
> CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.19.0-rc3-dirty #2
> Hardware name: Dell Inc. Precision WorkStation T5500 /0D883F, BIOS A16 05/28/2013
> Call Trace:
> <IRQ>
> dump_stack_lvl+0x40/0x5c
> lock_release+0x245/0x3f0
> unlock_buffer+0x15/0x30
> end_buffer_async_read+0x5b/0x180
> end_bio_bh_io_sync+0x1e/0x40
> blk_update_request+0x9a/0x470
> scsi_end_request+0x27/0x190
> scsi_io_completion+0x3c/0x580
> blk_complete_reqs+0x39/0x50
> __do_softirq+0x11d/0x344
> irq_exit_rcu+0xa9/0xc0
> common_interrupt+0xa5/0xc0
> </IRQ>
> <TASK>
> asm_common_interrupt+0x27/0x40
> RIP: 0010:cpuidle_enter_state+0x12d/0x3f0
> Code: 49 43 0f 84 b7 02 00 00 31 ff e8 fe 1c 74 ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 8e 02 00 00 31 ff e8 97 95 7a ff fb 45 85 f6 <0f> 88 12 01 00 00 49 63 d6 4c 2b 24 24 48 8d 04 52 48 8d 04 82 49
> RSP: 0000:ffffa2890013fe90 EFLAGS: 00000206
> RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000002
> RDX: 0000000080000002 RSI: ffffffffbd78d911 RDI: 00000000ffffffff
> RBP: ffff8bee2739a400 R08: 0000000000000000 R09: 00000000001faf40
> R10: ffffa2890013fdc8 R11: 0000000000000000 R12: 00000000a7b149d6
> R13: ffffffffbdbee3a0 R14: 0000000000000003 R15: 0000000000000001
> cpuidle_enter+0x24/0x40
> do_idle+0x1e3/0x230
> cpu_startup_entry+0x14/0x20
> start_secondary+0xe8/0xf0
> secondary_startup_64_no_verify+0xe0/0xeb
> </TASK>
> sda: sda1 sda2 sda3
> sd 0:0:0:0: [sda] Attached SCSI disk
> sdb: sdb1 sdb2
> sd 1:0:0:0: [sdb] Attached SCSI disk
>
> Not quite sure what to make of that.
You provide the answer to the question of why lock_buffer has gone so long
without an annotation to help lockdep. Perhaps the same goes for lock_page,
even though the relevant long-running deadlock does not sit deep under
the water.
Thanks
Hillf