* Re: [PATCH] fuse: disable irqs for fuse_iqueue::waitq.lock
[not found] ` <20190822233529.4176-1-ebiggers@kernel.org>
@ 2019-09-03 7:31 ` Miklos Szeredi
2019-09-03 13:39 ` Eric Biggers
0 siblings, 1 reply; 5+ messages in thread
From: Miklos Szeredi @ 2019-09-03 7:31 UTC (permalink / raw)
To: Eric Biggers
Cc: linux-fsdevel, linux-aio, Benjamin LaHaise, syzkaller-bugs,
stable, Christoph Hellwig
On Fri, Aug 23, 2019 at 1:35 AM Eric Biggers <ebiggers@kernel.org> wrote:
>
> From: Eric Biggers <ebiggers@google.com>
>
> When IOCB_CMD_POLL is used on the FUSE device, aio_poll() disables IRQs
> and takes kioctx::ctx_lock, then fuse_iqueue::waitq.lock.
Not in -linus.
Which tree was this reproduced with?
Thanks,
Miklos
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] fuse: disable irqs for fuse_iqueue::waitq.lock
2019-09-03 7:31 ` [PATCH] fuse: disable irqs for fuse_iqueue::waitq.lock Miklos Szeredi
@ 2019-09-03 13:39 ` Eric Biggers
2019-09-04 14:29 ` Miklos Szeredi
0 siblings, 1 reply; 5+ messages in thread
From: Eric Biggers @ 2019-09-03 13:39 UTC (permalink / raw)
To: Miklos Szeredi
Cc: linux-fsdevel, linux-aio, Benjamin LaHaise, syzkaller-bugs,
stable, Christoph Hellwig
On Tue, Sep 03, 2019 at 09:31:29AM +0200, Miklos Szeredi wrote:
> On Fri, Aug 23, 2019 at 1:35 AM Eric Biggers <ebiggers@kernel.org> wrote:
> >
> > From: Eric Biggers <ebiggers@google.com>
> >
> > When IOCB_CMD_POLL is used on the FUSE device, aio_poll() disables IRQs
> > and takes kioctx::ctx_lock, then fuse_iqueue::waitq.lock.
>
> Not in -linus.
>
> Which tree was this reproduced with?
>
> Thanks,
> Miklos
Linus's tree. Here's the full symbolized output on v5.3-rc7:
=====================================================
WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
5.3.0-rc7 #1 Not tainted
-----------------------------------------------------
syz_fuse/150 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
00000000c2701b26 (&fiq->waitq){+.+.}, at: spin_lock include/linux/spinlock.h:338 [inline]
00000000c2701b26 (&fiq->waitq){+.+.}, at: aio_poll fs/aio.c:1751 [inline]
00000000c2701b26 (&fiq->waitq){+.+.}, at: __io_submit_one.constprop.0+0x203/0x5b0 fs/aio.c:1825
and this task is already holding:
00000000f4c02e81 (&(&ctx->ctx_lock)->rlock){..-.}, at: spin_lock_irq include/linux/spinlock.h:363 [inline]
00000000f4c02e81 (&(&ctx->ctx_lock)->rlock){..-.}, at: aio_poll fs/aio.c:1749 [inline]
00000000f4c02e81 (&(&ctx->ctx_lock)->rlock){..-.}, at: __io_submit_one.constprop.0+0x1f4/0x5b0 fs/aio.c:1825
which would create a new lock dependency:
(&(&ctx->ctx_lock)->rlock){..-.} -> (&fiq->waitq){+.+.}
but this new dependency connects a SOFTIRQ-irq-safe lock:
(&(&ctx->ctx_lock)->rlock){..-.}
... which became SOFTIRQ-irq-safe at:
lock_acquire+0x99/0x180 kernel/locking/lockdep.c:4412
__raw_spin_lock_irq include/linux/spinlock_api_smp.h:128 [inline]
_raw_spin_lock_irq+0x32/0x50 kernel/locking/spinlock.c:167
spin_lock_irq include/linux/spinlock.h:363 [inline]
free_ioctx_users+0x25/0x190 fs/aio.c:618
percpu_ref_put_many include/linux/percpu-refcount.h:293 [inline]
percpu_ref_put include/linux/percpu-refcount.h:309 [inline]
percpu_ref_call_confirm_rcu lib/percpu-refcount.c:130 [inline]
percpu_ref_switch_to_atomic_rcu+0x202/0x210 lib/percpu-refcount.c:165
__rcu_reclaim kernel/rcu/rcu.h:222 [inline]
rcu_do_batch+0x2ae/0x890 kernel/rcu/tree.c:2114
rcu_core+0x13a/0x370 kernel/rcu/tree.c:2314
rcu_core_si+0x9/0x10 kernel/rcu/tree.c:2323
__do_softirq+0xbe/0x400 kernel/softirq.c:292
invoke_softirq kernel/softirq.c:373 [inline]
irq_exit+0x8f/0xc0 kernel/softirq.c:413
exiting_irq arch/x86/include/asm/apic.h:537 [inline]
smp_apic_timer_interrupt+0x8e/0x210 arch/x86/kernel/apic/apic.c:1133
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830
native_safe_halt arch/x86/include/asm/irqflags.h:60 [inline]
arch_safe_halt arch/x86/include/asm/irqflags.h:103 [inline]
default_idle+0x29/0x160 arch/x86/kernel/process.c:580
arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:571
default_idle_call+0x1e/0x30 kernel/sched/idle.c:94
cpuidle_idle_call kernel/sched/idle.c:154 [inline]
do_idle+0x1f0/0x220 kernel/sched/idle.c:263
cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:354
rest_init+0x18a/0x24d init/main.c:451
arch_call_rest_init+0x9/0xc
start_kernel+0x530/0x54e init/main.c:785
x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:472
x86_64_start_kernel+0x6d/0x71 arch/x86/kernel/head64.c:453
secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:241
to a SOFTIRQ-irq-unsafe lock:
(&fiq->waitq){+.+.}
... which became SOFTIRQ-irq-unsafe at:
...
lock_acquire+0x99/0x180 kernel/locking/lockdep.c:4412
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0x2b/0x40 kernel/locking/spinlock.c:151
spin_lock include/linux/spinlock.h:338 [inline]
flush_bg_queue+0x91/0xf0 fs/fuse/dev.c:415
fuse_request_queue_background+0xd1/0x140 fs/fuse/dev.c:676
fuse_request_send_background+0x27/0x60 fs/fuse/dev.c:687
fuse_send_init fs/fuse/inode.c:986 [inline]
fuse_fill_super+0x656/0x681 fs/fuse/inode.c:1211
mount_nodev+0x42/0x90 fs/super.c:1329
fuse_mount+0x13/0x20 fs/fuse/inode.c:1236
legacy_get_tree+0x2c/0x50 fs/fs_context.c:661
vfs_get_tree+0x22/0xc0 fs/super.c:1413
do_new_mount fs/namespace.c:2791 [inline]
do_mount+0x7e3/0xa60 fs/namespace.c:3111
ksys_mount+0x7d/0xd0 fs/namespace.c:3320
__do_sys_mount fs/namespace.c:3334 [inline]
__se_sys_mount fs/namespace.c:3331 [inline]
__x64_sys_mount+0x20/0x30 fs/namespace.c:3331
do_syscall_64+0x4a/0x1a0 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x49/0xbe
other info that might help us debug this:
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&fiq->waitq);
local_irq_disable();
lock(&(&ctx->ctx_lock)->rlock);
lock(&fiq->waitq);
<Interrupt>
lock(&(&ctx->ctx_lock)->rlock);
*** DEADLOCK ***
1 lock held by syz_fuse/150:
#0: 00000000f4c02e81 (&(&ctx->ctx_lock)->rlock){..-.}, at: spin_lock_irq include/linux/spinlock.h:363 [inline]
#0: 00000000f4c02e81 (&(&ctx->ctx_lock)->rlock){..-.}, at: aio_poll fs/aio.c:1749 [inline]
#0: 00000000f4c02e81 (&(&ctx->ctx_lock)->rlock){..-.}, at: __io_submit_one.constprop.0+0x1f4/0x5b0 fs/aio.c:1825
the dependencies between SOFTIRQ-irq-safe lock and the holding lock:
-> (&(&ctx->ctx_lock)->rlock){..-.} {
IN-SOFTIRQ-W at:
lock_acquire+0x99/0x180 kernel/locking/lockdep.c:4412
__raw_spin_lock_irq include/linux/spinlock_api_smp.h:128 [inline]
_raw_spin_lock_irq+0x32/0x50 kernel/locking/spinlock.c:167
spin_lock_irq include/linux/spinlock.h:363 [inline]
free_ioctx_users+0x25/0x190 fs/aio.c:618
percpu_ref_put_many include/linux/percpu-refcount.h:293 [inline]
percpu_ref_put include/linux/percpu-refcount.h:309 [inline]
percpu_ref_call_confirm_rcu lib/percpu-refcount.c:130 [inline]
percpu_ref_switch_to_atomic_rcu+0x202/0x210 lib/percpu-refcount.c:165
__rcu_reclaim kernel/rcu/rcu.h:222 [inline]
rcu_do_batch+0x2ae/0x890 kernel/rcu/tree.c:2114
rcu_core+0x13a/0x370 kernel/rcu/tree.c:2314
rcu_core_si+0x9/0x10 kernel/rcu/tree.c:2323
__do_softirq+0xbe/0x400 kernel/softirq.c:292
invoke_softirq kernel/softirq.c:373 [inline]
irq_exit+0x8f/0xc0 kernel/softirq.c:413
exiting_irq arch/x86/include/asm/apic.h:537 [inline]
smp_apic_timer_interrupt+0x8e/0x210 arch/x86/kernel/apic/apic.c:1133
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830
native_safe_halt arch/x86/include/asm/irqflags.h:60 [inline]
arch_safe_halt arch/x86/include/asm/irqflags.h:103 [inline]
default_idle+0x29/0x160 arch/x86/kernel/process.c:580
arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:571
default_idle_call+0x1e/0x30 kernel/sched/idle.c:94
cpuidle_idle_call kernel/sched/idle.c:154 [inline]
do_idle+0x1f0/0x220 kernel/sched/idle.c:263
cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:354
rest_init+0x18a/0x24d init/main.c:451
arch_call_rest_init+0x9/0xc
start_kernel+0x530/0x54e init/main.c:785
x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:472
x86_64_start_kernel+0x6d/0x71 arch/x86/kernel/head64.c:453
secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:241
INITIAL USE at:
lock_acquire+0x99/0x180 kernel/locking/lockdep.c:4412
__raw_spin_lock_irq include/linux/spinlock_api_smp.h:128 [inline]
_raw_spin_lock_irq+0x32/0x50 kernel/locking/spinlock.c:167
spin_lock_irq include/linux/spinlock.h:363 [inline]
free_ioctx_users+0x25/0x190 fs/aio.c:618
percpu_ref_put_many include/linux/percpu-refcount.h:293 [inline]
percpu_ref_put include/linux/percpu-refcount.h:309 [inline]
percpu_ref_call_confirm_rcu lib/percpu-refcount.c:130 [inline]
percpu_ref_switch_to_atomic_rcu+0x202/0x210 lib/percpu-refcount.c:165
__rcu_reclaim kernel/rcu/rcu.h:222 [inline]
rcu_do_batch+0x2ae/0x890 kernel/rcu/tree.c:2114
rcu_core+0x13a/0x370 kernel/rcu/tree.c:2314
rcu_core_si+0x9/0x10 kernel/rcu/tree.c:2323
__do_softirq+0xbe/0x400 kernel/softirq.c:292
invoke_softirq kernel/softirq.c:373 [inline]
irq_exit+0x8f/0xc0 kernel/softirq.c:413
exiting_irq arch/x86/include/asm/apic.h:537 [inline]
smp_apic_timer_interrupt+0x8e/0x210 arch/x86/kernel/apic/apic.c:1133
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830
native_safe_halt arch/x86/include/asm/irqflags.h:60 [inline]
arch_safe_halt arch/x86/include/asm/irqflags.h:103 [inline]
default_idle+0x29/0x160 arch/x86/kernel/process.c:580
arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:571
default_idle_call+0x1e/0x30 kernel/sched/idle.c:94
cpuidle_idle_call kernel/sched/idle.c:154 [inline]
do_idle+0x1f0/0x220 kernel/sched/idle.c:263
cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:354
rest_init+0x18a/0x24d init/main.c:451
arch_call_rest_init+0x9/0xc
start_kernel+0x530/0x54e init/main.c:785
x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:472
x86_64_start_kernel+0x6d/0x71 arch/x86/kernel/head64.c:453
secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:241
}
... key at: [<ffffffff82860670>] __key.50377+0x0/0x10
... acquired at:
check_prev_add+0xa7/0x950 kernel/locking/lockdep.c:2409
check_prevs_add kernel/locking/lockdep.c:2507 [inline]
validate_chain+0x483/0x870 kernel/locking/lockdep.c:2897
__lock_acquire+0x447/0x7d0 kernel/locking/lockdep.c:3880
lock_acquire+0x99/0x180 kernel/locking/lockdep.c:4412
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0x2b/0x40 kernel/locking/spinlock.c:151
spin_lock include/linux/spinlock.h:338 [inline]
aio_poll fs/aio.c:1751 [inline]
__io_submit_one.constprop.0+0x203/0x5b0 fs/aio.c:1825
io_submit_one+0x146/0x5b0 fs/aio.c:1862
__do_sys_io_submit fs/aio.c:1921 [inline]
__se_sys_io_submit+0x8e/0x270 fs/aio.c:1891
__x64_sys_io_submit+0x15/0x20 fs/aio.c:1891
do_syscall_64+0x4a/0x1a0 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x49/0xbe
the dependencies between the lock to be acquired
and SOFTIRQ-irq-unsafe lock:
-> (&fiq->waitq){+.+.} {
HARDIRQ-ON-W at:
lock_acquire+0x99/0x180 kernel/locking/lockdep.c:4412
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0x2b/0x40 kernel/locking/spinlock.c:151
spin_lock include/linux/spinlock.h:338 [inline]
flush_bg_queue+0x91/0xf0 fs/fuse/dev.c:415
fuse_request_queue_background+0xd1/0x140 fs/fuse/dev.c:676
fuse_request_send_background+0x27/0x60 fs/fuse/dev.c:687
fuse_send_init fs/fuse/inode.c:986 [inline]
fuse_fill_super+0x656/0x681 fs/fuse/inode.c:1211
mount_nodev+0x42/0x90 fs/super.c:1329
fuse_mount+0x13/0x20 fs/fuse/inode.c:1236
legacy_get_tree+0x2c/0x50 fs/fs_context.c:661
vfs_get_tree+0x22/0xc0 fs/super.c:1413
do_new_mount fs/namespace.c:2791 [inline]
do_mount+0x7e3/0xa60 fs/namespace.c:3111
ksys_mount+0x7d/0xd0 fs/namespace.c:3320
__do_sys_mount fs/namespace.c:3334 [inline]
__se_sys_mount fs/namespace.c:3331 [inline]
__x64_sys_mount+0x20/0x30 fs/namespace.c:3331
do_syscall_64+0x4a/0x1a0 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x49/0xbe
SOFTIRQ-ON-W at:
lock_acquire+0x99/0x180 kernel/locking/lockdep.c:4412
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0x2b/0x40 kernel/locking/spinlock.c:151
spin_lock include/linux/spinlock.h:338 [inline]
flush_bg_queue+0x91/0xf0 fs/fuse/dev.c:415
fuse_request_queue_background+0xd1/0x140 fs/fuse/dev.c:676
fuse_request_send_background+0x27/0x60 fs/fuse/dev.c:687
fuse_send_init fs/fuse/inode.c:986 [inline]
fuse_fill_super+0x656/0x681 fs/fuse/inode.c:1211
mount_nodev+0x42/0x90 fs/super.c:1329
fuse_mount+0x13/0x20 fs/fuse/inode.c:1236
legacy_get_tree+0x2c/0x50 fs/fs_context.c:661
vfs_get_tree+0x22/0xc0 fs/super.c:1413
do_new_mount fs/namespace.c:2791 [inline]
do_mount+0x7e3/0xa60 fs/namespace.c:3111
ksys_mount+0x7d/0xd0 fs/namespace.c:3320
__do_sys_mount fs/namespace.c:3334 [inline]
__se_sys_mount fs/namespace.c:3331 [inline]
__x64_sys_mount+0x20/0x30 fs/namespace.c:3331
do_syscall_64+0x4a/0x1a0 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x49/0xbe
INITIAL USE at:
lock_acquire+0x99/0x180 kernel/locking/lockdep.c:4412
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0x2b/0x40 kernel/locking/spinlock.c:151
spin_lock include/linux/spinlock.h:338 [inline]
flush_bg_queue+0x91/0xf0 fs/fuse/dev.c:415
fuse_request_queue_background+0xd1/0x140 fs/fuse/dev.c:676
fuse_request_send_background+0x27/0x60 fs/fuse/dev.c:687
fuse_send_init fs/fuse/inode.c:986 [inline]
fuse_fill_super+0x656/0x681 fs/fuse/inode.c:1211
mount_nodev+0x42/0x90 fs/super.c:1329
fuse_mount+0x13/0x20 fs/fuse/inode.c:1236
legacy_get_tree+0x2c/0x50 fs/fs_context.c:661
vfs_get_tree+0x22/0xc0 fs/super.c:1413
do_new_mount fs/namespace.c:2791 [inline]
do_mount+0x7e3/0xa60 fs/namespace.c:3111
ksys_mount+0x7d/0xd0 fs/namespace.c:3320
__do_sys_mount fs/namespace.c:3334 [inline]
__se_sys_mount fs/namespace.c:3331 [inline]
__x64_sys_mount+0x20/0x30 fs/namespace.c:3331
do_syscall_64+0x4a/0x1a0 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x49/0xbe
}
... key at: [<ffffffff82863ea0>] __key.40870+0x0/0x10
... acquired at:
check_prev_add+0xa7/0x950 kernel/locking/lockdep.c:2409
check_prevs_add kernel/locking/lockdep.c:2507 [inline]
validate_chain+0x483/0x870 kernel/locking/lockdep.c:2897
__lock_acquire+0x447/0x7d0 kernel/locking/lockdep.c:3880
lock_acquire+0x99/0x180 kernel/locking/lockdep.c:4412
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0x2b/0x40 kernel/locking/spinlock.c:151
spin_lock include/linux/spinlock.h:338 [inline]
aio_poll fs/aio.c:1751 [inline]
__io_submit_one.constprop.0+0x203/0x5b0 fs/aio.c:1825
io_submit_one+0x146/0x5b0 fs/aio.c:1862
__do_sys_io_submit fs/aio.c:1921 [inline]
__se_sys_io_submit+0x8e/0x270 fs/aio.c:1891
__x64_sys_io_submit+0x15/0x20 fs/aio.c:1891
do_syscall_64+0x4a/0x1a0 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x49/0xbe
stack backtrace:
CPU: 1 PID: 150 Comm: syz_fuse Not tainted 5.3.0-rc7 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-20181126_142135-anatol 04/01/2014
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x70/0x9a lib/dump_stack.c:113
print_bad_irq_dependency.cold+0x380/0x3d8 kernel/locking/lockdep.c:2025
check_irq_usage+0x337/0x400 kernel/locking/lockdep.c:2223
check_prev_add+0xa7/0x950 kernel/locking/lockdep.c:2409
check_prevs_add kernel/locking/lockdep.c:2507 [inline]
validate_chain+0x483/0x870 kernel/locking/lockdep.c:2897
__lock_acquire+0x447/0x7d0 kernel/locking/lockdep.c:3880
lock_acquire+0x99/0x180 kernel/locking/lockdep.c:4412
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0x2b/0x40 kernel/locking/spinlock.c:151
spin_lock include/linux/spinlock.h:338 [inline]
aio_poll fs/aio.c:1751 [inline]
__io_submit_one.constprop.0+0x203/0x5b0 fs/aio.c:1825
io_submit_one+0x146/0x5b0 fs/aio.c:1862
__do_sys_io_submit fs/aio.c:1921 [inline]
__se_sys_io_submit+0x8e/0x270 fs/aio.c:1891
__x64_sys_io_submit+0x15/0x20 fs/aio.c:1891
do_syscall_64+0x4a/0x1a0 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f1bbde7ec6d
Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f3 51 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffffa3a70e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
RAX: ffffffffffffffda RBX: 00007ffffa3a7140 RCX: 00007f1bbde7ec6d
RDX: 00007ffffa3a70f8 RSI: 0000000000000001 RDI: 00007f1bbe27d000
RBP: 0000559337c29290 R08: 0000000000000000 R09: 00007f1bbe27d000
R10: 0000000000000029 R11: 0000000000000246 R12: 0000559337c29190
R13: 00007ffffa3a72b0 R14: 0000000000000000 R15: 0000000000000000
* Re: [PATCH] fuse: disable irqs for fuse_iqueue::waitq.lock
2019-09-03 13:39 ` Eric Biggers
@ 2019-09-04 14:29 ` Miklos Szeredi
2019-09-06 4:43 ` Eric Biggers
0 siblings, 1 reply; 5+ messages in thread
From: Miklos Szeredi @ 2019-09-04 14:29 UTC (permalink / raw)
To: Eric Biggers
Cc: linux-fsdevel, linux-aio, Benjamin LaHaise, syzkaller-bugs,
stable, Christoph Hellwig
On Tue, Sep 3, 2019 at 3:39 PM Eric Biggers <ebiggers@kernel.org> wrote:
>
> On Tue, Sep 03, 2019 at 09:31:29AM +0200, Miklos Szeredi wrote:
> > On Fri, Aug 23, 2019 at 1:35 AM Eric Biggers <ebiggers@kernel.org> wrote:
> > >
> > > From: Eric Biggers <ebiggers@google.com>
> > >
> > > When IOCB_CMD_POLL is used on the FUSE device, aio_poll() disables IRQs
> > > and takes kioctx::ctx_lock, then fuse_iqueue::waitq.lock.
> >
> > Not in -linus.
> >
> > Which tree was this reproduced with?
> >
> > Thanks,
> > Miklos
>
> Linus's tree. Here's the full symbolized output on v5.3-rc7:
Okay.
TBH, I find the fix disgusting. It's confusing to sprinkle code that
has absolutely nothing to do with interrupts with spin_lock_irq()
calls.
I think the lock/unlock calls should at least be done with a helper
with a comment explaining why disabling interrupts is needed (though I
have not managed to understand why aio needs to actually mess with the
waitq lock...)
Probably a better fix would be to just use a separate spinlock to
avoid the need to disable interrupts in cases where it's not
necessary.
Thanks,
Miklos
* Re: [PATCH] fuse: disable irqs for fuse_iqueue::waitq.lock
2019-09-04 14:29 ` Miklos Szeredi
@ 2019-09-06 4:43 ` Eric Biggers
2019-09-06 6:58 ` Miklos Szeredi
0 siblings, 1 reply; 5+ messages in thread
From: Eric Biggers @ 2019-09-06 4:43 UTC (permalink / raw)
To: Miklos Szeredi
Cc: linux-fsdevel, linux-aio, Benjamin LaHaise, syzkaller-bugs,
stable, Christoph Hellwig
On Wed, Sep 04, 2019 at 04:29:03PM +0200, Miklos Szeredi wrote:
> On Tue, Sep 3, 2019 at 3:39 PM Eric Biggers <ebiggers@kernel.org> wrote:
> >
> > On Tue, Sep 03, 2019 at 09:31:29AM +0200, Miklos Szeredi wrote:
> > > On Fri, Aug 23, 2019 at 1:35 AM Eric Biggers <ebiggers@kernel.org> wrote:
> > > >
> > > > From: Eric Biggers <ebiggers@google.com>
> > > >
> > > > When IOCB_CMD_POLL is used on the FUSE device, aio_poll() disables IRQs
> > > > and takes kioctx::ctx_lock, then fuse_iqueue::waitq.lock.
> > >
> > > Not in -linus.
> > >
> > > Which tree was this reproduced with?
> > >
> > > Thanks,
> > > Miklos
> >
> > Linus's tree. Here's the full symbolized output on v5.3-rc7:
>
> Okay.
>
> TBH, I find the fix disgusting. It's confusing to sprinkle code that
> has absolutely nothing to do with interrupts with spin_lock_irq()
> calls.
>
> I think the lock/unlock calls should at least be done with a helper
> with a comment explaining why disabling interrupts is needed (though I
> have not managed to understand why aio needs to actually mess with the
> waitq lock...)
The aio code is doing a poll(), so it needs to use the wait queue.
>
> Probably a better fix would be to just use a separate spinlock to
> avoid the need to disable interrupts in cases where it's not
> necessary.
Well, the below is what a separate lock would look like. Note that it actually
still disables IRQs in some places; it's just hidden away in the nested
spin_lock_irqsave() in wake_up(). Likewise, adding something to the fuse_iqueue
then requires taking 2 spin locks -- one explicit, and one hidden in wake_up().
Is this the solution you'd prefer?
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index ea8237513dfa..1d9d555f2699 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -377,7 +377,7 @@ static void queue_request(struct fuse_iqueue *fiq, struct fuse_req *req)
req->in.h.len = sizeof(struct fuse_in_header) +
len_args(req->in.numargs, (struct fuse_arg *) req->in.args);
list_add_tail(&req->list, &fiq->pending);
- wake_up_locked(&fiq->waitq);
+ wake_up(&fiq->waitq);
kill_fasync(&fiq->fasync, SIGIO, POLL_IN);
}
@@ -389,16 +389,16 @@ void fuse_queue_forget(struct fuse_conn *fc, struct fuse_forget_link *forget,
forget->forget_one.nodeid = nodeid;
forget->forget_one.nlookup = nlookup;
- spin_lock(&fiq->waitq.lock);
+ spin_lock(&fiq->lock);
if (fiq->connected) {
fiq->forget_list_tail->next = forget;
fiq->forget_list_tail = forget;
- wake_up_locked(&fiq->waitq);
+ wake_up(&fiq->waitq);
kill_fasync(&fiq->fasync, SIGIO, POLL_IN);
} else {
kfree(forget);
}
- spin_unlock(&fiq->waitq.lock);
+ spin_unlock(&fiq->lock);
}
static void flush_bg_queue(struct fuse_conn *fc)
@@ -412,10 +412,10 @@ static void flush_bg_queue(struct fuse_conn *fc)
req = list_first_entry(&fc->bg_queue, struct fuse_req, list);
list_del(&req->list);
fc->active_background++;
- spin_lock(&fiq->waitq.lock);
+ spin_lock(&fiq->lock);
req->in.h.unique = fuse_get_unique(fiq);
queue_request(fiq, req);
- spin_unlock(&fiq->waitq.lock);
+ spin_unlock(&fiq->lock);
}
}
@@ -439,9 +439,9 @@ static void request_end(struct fuse_conn *fc, struct fuse_req *req)
* smp_mb() from queue_interrupt().
*/
if (!list_empty(&req->intr_entry)) {
- spin_lock(&fiq->waitq.lock);
+ spin_lock(&fiq->lock);
list_del_init(&req->intr_entry);
- spin_unlock(&fiq->waitq.lock);
+ spin_unlock(&fiq->lock);
}
WARN_ON(test_bit(FR_PENDING, &req->flags));
WARN_ON(test_bit(FR_SENT, &req->flags));
@@ -483,10 +483,10 @@ static void request_end(struct fuse_conn *fc, struct fuse_req *req)
static int queue_interrupt(struct fuse_iqueue *fiq, struct fuse_req *req)
{
- spin_lock(&fiq->waitq.lock);
+ spin_lock(&fiq->lock);
/* Check for we've sent request to interrupt this req */
if (unlikely(!test_bit(FR_INTERRUPTED, &req->flags))) {
- spin_unlock(&fiq->waitq.lock);
+ spin_unlock(&fiq->lock);
return -EINVAL;
}
@@ -499,13 +499,13 @@ static int queue_interrupt(struct fuse_iqueue *fiq, struct fuse_req *req)
smp_mb();
if (test_bit(FR_FINISHED, &req->flags)) {
list_del_init(&req->intr_entry);
- spin_unlock(&fiq->waitq.lock);
+ spin_unlock(&fiq->lock);
return 0;
}
- wake_up_locked(&fiq->waitq);
+ wake_up(&fiq->waitq);
kill_fasync(&fiq->fasync, SIGIO, POLL_IN);
}
- spin_unlock(&fiq->waitq.lock);
+ spin_unlock(&fiq->lock);
return 0;
}
@@ -535,16 +535,16 @@ static void request_wait_answer(struct fuse_conn *fc, struct fuse_req *req)
if (!err)
return;
- spin_lock(&fiq->waitq.lock);
+ spin_lock(&fiq->lock);
/* Request is not yet in userspace, bail out */
if (test_bit(FR_PENDING, &req->flags)) {
list_del(&req->list);
- spin_unlock(&fiq->waitq.lock);
+ spin_unlock(&fiq->lock);
__fuse_put_request(req);
req->out.h.error = -EINTR;
return;
}
- spin_unlock(&fiq->waitq.lock);
+ spin_unlock(&fiq->lock);
}
/*
@@ -559,9 +559,9 @@ static void __fuse_request_send(struct fuse_conn *fc, struct fuse_req *req)
struct fuse_iqueue *fiq = &fc->iq;
BUG_ON(test_bit(FR_BACKGROUND, &req->flags));
- spin_lock(&fiq->waitq.lock);
+ spin_lock(&fiq->lock);
if (!fiq->connected) {
- spin_unlock(&fiq->waitq.lock);
+ spin_unlock(&fiq->lock);
req->out.h.error = -ENOTCONN;
} else {
req->in.h.unique = fuse_get_unique(fiq);
@@ -569,7 +569,7 @@ static void __fuse_request_send(struct fuse_conn *fc, struct fuse_req *req)
/* acquire extra reference, since request is still needed
after request_end() */
__fuse_get_request(req);
- spin_unlock(&fiq->waitq.lock);
+ spin_unlock(&fiq->lock);
request_wait_answer(fc, req);
/* Pairs with smp_wmb() in request_end() */
@@ -700,12 +700,12 @@ static int fuse_request_send_notify_reply(struct fuse_conn *fc,
__clear_bit(FR_ISREPLY, &req->flags);
req->in.h.unique = unique;
- spin_lock(&fiq->waitq.lock);
+ spin_lock(&fiq->lock);
if (fiq->connected) {
queue_request(fiq, req);
err = 0;
}
- spin_unlock(&fiq->waitq.lock);
+ spin_unlock(&fiq->lock);
return err;
}
@@ -1149,12 +1149,12 @@ static int request_pending(struct fuse_iqueue *fiq)
* Unlike other requests this is assembled on demand, without a need
* to allocate a separate fuse_req structure.
*
- * Called with fiq->waitq.lock held, releases it
+ * Called with fiq->lock held, releases it
*/
static int fuse_read_interrupt(struct fuse_iqueue *fiq,
struct fuse_copy_state *cs,
size_t nbytes, struct fuse_req *req)
-__releases(fiq->waitq.lock)
+__releases(fiq->lock)
{
struct fuse_in_header ih;
struct fuse_interrupt_in arg;
@@ -1169,7 +1169,7 @@ __releases(fiq->waitq.lock)
ih.unique = (req->in.h.unique | FUSE_INT_REQ_BIT);
arg.unique = req->in.h.unique;
- spin_unlock(&fiq->waitq.lock);
+ spin_unlock(&fiq->lock);
if (nbytes < reqsize)
return -EINVAL;
@@ -1206,7 +1206,7 @@ static struct fuse_forget_link *dequeue_forget(struct fuse_iqueue *fiq,
static int fuse_read_single_forget(struct fuse_iqueue *fiq,
struct fuse_copy_state *cs,
size_t nbytes)
-__releases(fiq->waitq.lock)
+__releases(fiq->lock)
{
int err;
struct fuse_forget_link *forget = dequeue_forget(fiq, 1, NULL);
@@ -1220,7 +1220,7 @@ __releases(fiq->waitq.lock)
.len = sizeof(ih) + sizeof(arg),
};
- spin_unlock(&fiq->waitq.lock);
+ spin_unlock(&fiq->lock);
kfree(forget);
if (nbytes < ih.len)
return -EINVAL;
@@ -1238,7 +1238,7 @@ __releases(fiq->waitq.lock)
static int fuse_read_batch_forget(struct fuse_iqueue *fiq,
struct fuse_copy_state *cs, size_t nbytes)
-__releases(fiq->waitq.lock)
+__releases(fiq->lock)
{
int err;
unsigned max_forgets;
@@ -1252,13 +1252,13 @@ __releases(fiq->waitq.lock)
};
if (nbytes < ih.len) {
- spin_unlock(&fiq->waitq.lock);
+ spin_unlock(&fiq->lock);
return -EINVAL;
}
max_forgets = (nbytes - ih.len) / sizeof(struct fuse_forget_one);
head = dequeue_forget(fiq, max_forgets, &count);
- spin_unlock(&fiq->waitq.lock);
+ spin_unlock(&fiq->lock);
arg.count = count;
ih.len += count * sizeof(struct fuse_forget_one);
@@ -1288,7 +1288,7 @@ __releases(fiq->waitq.lock)
static int fuse_read_forget(struct fuse_conn *fc, struct fuse_iqueue *fiq,
struct fuse_copy_state *cs,
size_t nbytes)
-__releases(fiq->waitq.lock)
+__releases(fiq->lock)
{
if (fc->minor < 16 || fiq->forget_list_head.next->next == NULL)
return fuse_read_single_forget(fiq, cs, nbytes);
@@ -1318,16 +1318,19 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
unsigned int hash;
restart:
- spin_lock(&fiq->waitq.lock);
- err = -EAGAIN;
- if ((file->f_flags & O_NONBLOCK) && fiq->connected &&
- !request_pending(fiq))
- goto err_unlock;
+ for (;;) {
+ spin_lock(&fiq->lock);
+ if (!fiq->connected || request_pending(fiq))
+ break;
+ spin_unlock(&fiq->lock);
- err = wait_event_interruptible_exclusive_locked(fiq->waitq,
+ if (file->f_flags & O_NONBLOCK)
+ return -EAGAIN;
+ err = wait_event_interruptible_exclusive(fiq->waitq,
!fiq->connected || request_pending(fiq));
- if (err)
- goto err_unlock;
+ if (err)
+ return err;
+ }
if (!fiq->connected) {
err = fc->aborted ? -ECONNABORTED : -ENODEV;
@@ -1351,7 +1354,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
req = list_entry(fiq->pending.next, struct fuse_req, list);
clear_bit(FR_PENDING, &req->flags);
list_del_init(&req->list);
- spin_unlock(&fiq->waitq.lock);
+ spin_unlock(&fiq->lock);
in = &req->in;
reqsize = in->h.len;
@@ -1409,7 +1412,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
return err;
err_unlock:
- spin_unlock(&fiq->waitq.lock);
+ spin_unlock(&fiq->lock);
return err;
}
@@ -2121,12 +2124,12 @@ static __poll_t fuse_dev_poll(struct file *file, poll_table *wait)
fiq = &fud->fc->iq;
poll_wait(file, &fiq->waitq, wait);
- spin_lock(&fiq->waitq.lock);
+ spin_lock(&fiq->lock);
if (!fiq->connected)
mask = EPOLLERR;
else if (request_pending(fiq))
mask |= EPOLLIN | EPOLLRDNORM;
- spin_unlock(&fiq->waitq.lock);
+ spin_unlock(&fiq->lock);
return mask;
}
@@ -2221,15 +2224,15 @@ void fuse_abort_conn(struct fuse_conn *fc)
flush_bg_queue(fc);
spin_unlock(&fc->bg_lock);
- spin_lock(&fiq->waitq.lock);
+ spin_lock(&fiq->lock);
fiq->connected = 0;
list_for_each_entry(req, &fiq->pending, list)
clear_bit(FR_PENDING, &req->flags);
list_splice_tail_init(&fiq->pending, &to_end);
while (forget_pending(fiq))
kfree(dequeue_forget(fiq, 1, NULL));
- wake_up_all_locked(&fiq->waitq);
- spin_unlock(&fiq->waitq.lock);
+ wake_up_all(&fiq->waitq);
+ spin_unlock(&fiq->lock);
kill_fasync(&fiq->fasync, SIGIO, POLL_IN);
end_polls(fc);
wake_up_all(&fc->blocked_waitq);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 24dbca777775..5de7fd0fd6ef 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -450,6 +450,9 @@ struct fuse_iqueue {
/** Connection established */
unsigned connected;
+ /** Lock protecting accesses to members of this structure */
+ spinlock_t lock;
+
/** Readers of the connection are waiting on this */
wait_queue_head_t waitq;
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 4bb885b0f032..987877860c01 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -582,6 +582,7 @@ static int fuse_show_options(struct seq_file *m, struct dentry *root)
static void fuse_iqueue_init(struct fuse_iqueue *fiq)
{
memset(fiq, 0, sizeof(struct fuse_iqueue));
+ spin_lock_init(&fiq->lock);
init_waitqueue_head(&fiq->waitq);
INIT_LIST_HEAD(&fiq->pending);
INIT_LIST_HEAD(&fiq->interrupts);
--
2.23.0
* Re: [PATCH] fuse: disable irqs for fuse_iqueue::waitq.lock
2019-09-06 4:43 ` Eric Biggers
@ 2019-09-06 6:58 ` Miklos Szeredi
0 siblings, 0 replies; 5+ messages in thread
From: Miklos Szeredi @ 2019-09-06 6:58 UTC (permalink / raw)
To: Eric Biggers
Cc: linux-fsdevel, linux-aio, Benjamin LaHaise, syzkaller-bugs,
stable, Christoph Hellwig
On Fri, Sep 6, 2019 at 6:43 AM Eric Biggers <ebiggers@kernel.org> wrote:
>
> On Wed, Sep 04, 2019 at 04:29:03PM +0200, Miklos Szeredi wrote:
> > TBH, I find the fix disgusting. It's confusing to sprinkle code that
> > has absolutely nothing to do with interrupts with spin_lock_irq()
> > calls.
> >
> > I think the lock/unlock calls should at least be done with a helper
> > with a comment explaining why disabling interrupts is needed (though I
> > have not managed to understand why aio needs to actually mess with the
> > waitq lock...)
>
> The aio code is doing a poll(), so it needs to use the wait queue.
Doesn't explain why the irq disabled nested locking is needed in
aio_poll(). poll/select manage to do that without messing with waitq
internals. How is aio poll different?
> >
> > Probably a better fix would be to just use a separate spinlock to
> > avoid the need to disable interrupts in cases where it's not
> > necessary.
>
> Well, the below is what a separate lock would look like. Note that it actually
> still disables IRQs in some places; it's just hidden away in the nested
> spin_lock_irqsave() in wake_up(). Likewise, adding something to the fuse_iqueue
> then requires taking 2 spin locks -- one explicit, and one hidden in wake_up().
Right, that's exactly why the waitq lock was used.
> Is this the solution you'd prefer?
I'd actually prefer if aio was fixed. But I guess that's not
realistic, so yes, the below patch looks okay. If fiq->lock is in the
same cacheline as fiq->waitq then it shouldn't make a difference.
Thanks,
Miklos