Linux-Fsdevel Archive on lore.kernel.org
From: Bart Van Assche <bvanassche@acm.org>
To: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org, linux-aio@kvack.org,
	Bart Van Assche <bvanassche@acm.org>,
	syzbot <syzkaller@googlegroups.com>,
	Christoph Hellwig <hch@lst.de>, Avi Kivity <avi@scylladb.com>,
	Eric Dumazet <edumazet@google.com>,
	stable@vger.kernel.org
Subject: [PATCH] aio: Fix locking in aio_poll()
Date: Mon,  4 Feb 2019 09:45:55 -0800
Message-ID: <20190204174555.83603-1-bvanassche@acm.org>

Since kioctx.ctx_lock may be acquired from interrupt context (in the report
below, from softirq context via free_ioctx_users()), all code that acquires
that lock from thread context must disable interrupts. This patch fixes the
following lockdep complaint:

=====================================================
WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
5.0.0-rc4-next-20190131 #23 Not tainted
-----------------------------------------------------
syz-executor2/13779 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
0000000098ac1230 (&fiq->waitq){+.+.}, at: spin_lock include/linux/spinlock.h:329 [inline]
0000000098ac1230 (&fiq->waitq){+.+.}, at: aio_poll fs/aio.c:1772 [inline]
0000000098ac1230 (&fiq->waitq){+.+.}, at: __io_submit_one fs/aio.c:1875 [inline]
0000000098ac1230 (&fiq->waitq){+.+.}, at: io_submit_one+0xedf/0x1cf0 fs/aio.c:1908

and this task is already holding:
000000003c46111c (&(&ctx->ctx_lock)->rlock){..-.}, at: spin_lock_irq include/linux/spinlock.h:354 [inline]
000000003c46111c (&(&ctx->ctx_lock)->rlock){..-.}, at: aio_poll fs/aio.c:1771 [inline]
000000003c46111c (&(&ctx->ctx_lock)->rlock){..-.}, at: __io_submit_one fs/aio.c:1875 [inline]
000000003c46111c (&(&ctx->ctx_lock)->rlock){..-.}, at: io_submit_one+0xeb6/0x1cf0 fs/aio.c:1908
which would create a new lock dependency:
 (&(&ctx->ctx_lock)->rlock){..-.} -> (&fiq->waitq){+.+.}

but this new dependency connects a SOFTIRQ-irq-safe lock:
 (&(&ctx->ctx_lock)->rlock){..-.}

... which became SOFTIRQ-irq-safe at:
  lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3826
  __raw_spin_lock_irq include/linux/spinlock_api_smp.h:128 [inline]
  _raw_spin_lock_irq+0x60/0x80 kernel/locking/spinlock.c:160
  spin_lock_irq include/linux/spinlock.h:354 [inline]
  free_ioctx_users+0x2d/0x4a0 fs/aio.c:610
  percpu_ref_put_many include/linux/percpu-refcount.h:285 [inline]
  percpu_ref_put include/linux/percpu-refcount.h:301 [inline]
  percpu_ref_call_confirm_rcu lib/percpu-refcount.c:123 [inline]
  percpu_ref_switch_to_atomic_rcu+0x3e7/0x520 lib/percpu-refcount.c:158
  __rcu_reclaim kernel/rcu/rcu.h:240 [inline]
  rcu_do_batch kernel/rcu/tree.c:2486 [inline]
  invoke_rcu_callbacks kernel/rcu/tree.c:2799 [inline]
  rcu_core+0x928/0x1390 kernel/rcu/tree.c:2780
  __do_softirq+0x266/0x95a kernel/softirq.c:292
  run_ksoftirqd kernel/softirq.c:654 [inline]
  run_ksoftirqd+0x8e/0x110 kernel/softirq.c:646
  smpboot_thread_fn+0x6ab/0xa10 kernel/smpboot.c:164
  kthread+0x357/0x430 kernel/kthread.c:247
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352

to a SOFTIRQ-irq-unsafe lock:
 (&fiq->waitq){+.+.}

... which became SOFTIRQ-irq-unsafe at:
...
  lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3826
  __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
  _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:144
  spin_lock include/linux/spinlock.h:329 [inline]
  flush_bg_queue+0x1f3/0x3c0 fs/fuse/dev.c:415
  fuse_request_queue_background+0x2d1/0x580 fs/fuse/dev.c:676
  fuse_request_send_background+0x58/0x120 fs/fuse/dev.c:687
  fuse_send_init fs/fuse/inode.c:989 [inline]
  fuse_fill_super+0x13bb/0x1730 fs/fuse/inode.c:1214
  mount_nodev+0x68/0x110 fs/super.c:1392
  fuse_mount+0x2d/0x40 fs/fuse/inode.c:1239
  legacy_get_tree+0xf2/0x200 fs/fs_context.c:590
  vfs_get_tree+0x123/0x450 fs/super.c:1481
  do_new_mount fs/namespace.c:2610 [inline]
  do_mount+0x1436/0x2c40 fs/namespace.c:2932
  ksys_mount+0xdb/0x150 fs/namespace.c:3148
  __do_sys_mount fs/namespace.c:3162 [inline]
  __se_sys_mount fs/namespace.c:3159 [inline]
  __x64_sys_mount+0xbe/0x150 fs/namespace.c:3159
  do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

other info that might help us debug this:

 Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&fiq->waitq);
                               local_irq_disable();
                               lock(&(&ctx->ctx_lock)->rlock);
                               lock(&fiq->waitq);
  <Interrupt>
    lock(&(&ctx->ctx_lock)->rlock);

 *** DEADLOCK ***

1 lock held by syz-executor2/13779:
 #0: 000000003c46111c (&(&ctx->ctx_lock)->rlock){..-.}, at: spin_lock_irq include/linux/spinlock.h:354 [inline]
 #0: 000000003c46111c (&(&ctx->ctx_lock)->rlock){..-.}, at: aio_poll fs/aio.c:1771 [inline]
 #0: 000000003c46111c (&(&ctx->ctx_lock)->rlock){..-.}, at: __io_submit_one fs/aio.c:1875 [inline]
 #0: 000000003c46111c (&(&ctx->ctx_lock)->rlock){..-.}, at: io_submit_one+0xeb6/0x1cf0 fs/aio.c:1908

the dependencies between SOFTIRQ-irq-safe lock and the holding lock:
-> (&(&ctx->ctx_lock)->rlock){..-.} {
   IN-SOFTIRQ-W at:
                    lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3826
                    __raw_spin_lock_irq include/linux/spinlock_api_smp.h:128 [inline]
                    _raw_spin_lock_irq+0x60/0x80 kernel/locking/spinlock.c:160
                    spin_lock_irq include/linux/spinlock.h:354 [inline]
                    free_ioctx_users+0x2d/0x4a0 fs/aio.c:610
                    percpu_ref_put_many include/linux/percpu-refcount.h:285 [inline]
                    percpu_ref_put include/linux/percpu-refcount.h:301 [inline]
                    percpu_ref_call_confirm_rcu lib/percpu-refcount.c:123 [inline]
                    percpu_ref_switch_to_atomic_rcu+0x3e7/0x520 lib/percpu-refcount.c:158
                    __rcu_reclaim kernel/rcu/rcu.h:240 [inline]
                    rcu_do_batch kernel/rcu/tree.c:2486 [inline]
                    invoke_rcu_callbacks kernel/rcu/tree.c:2799 [inline]
                    rcu_core+0x928/0x1390 kernel/rcu/tree.c:2780
                    __do_softirq+0x266/0x95a kernel/softirq.c:292
                    run_ksoftirqd kernel/softirq.c:654 [inline]
                    run_ksoftirqd+0x8e/0x110 kernel/softirq.c:646
                    smpboot_thread_fn+0x6ab/0xa10 kernel/smpboot.c:164
                    kthread+0x357/0x430 kernel/kthread.c:247
                    ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
   INITIAL USE at:
                   lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3826
                   __raw_spin_lock_irq include/linux/spinlock_api_smp.h:128 [inline]
                   _raw_spin_lock_irq+0x60/0x80 kernel/locking/spinlock.c:160
                   spin_lock_irq include/linux/spinlock.h:354 [inline]
                   __do_sys_io_cancel fs/aio.c:2052 [inline]
                   __se_sys_io_cancel fs/aio.c:2035 [inline]
                   __x64_sys_io_cancel+0xd5/0x5a0 fs/aio.c:2035
                   do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
                   entry_SYSCALL_64_after_hwframe+0x49/0xbe
 }
 ... key      at: [<ffffffff8a574140>] __key.52370+0x0/0x40
 ... acquired at:
   lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3826
   __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
   _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:144
   spin_lock include/linux/spinlock.h:329 [inline]
   aio_poll fs/aio.c:1772 [inline]
   __io_submit_one fs/aio.c:1875 [inline]
   io_submit_one+0xedf/0x1cf0 fs/aio.c:1908
   __do_sys_io_submit fs/aio.c:1953 [inline]
   __se_sys_io_submit fs/aio.c:1923 [inline]
   __x64_sys_io_submit+0x1bd/0x580 fs/aio.c:1923
   do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

the dependencies between the lock to be acquired
 and SOFTIRQ-irq-unsafe lock:
-> (&fiq->waitq){+.+.} {
   HARDIRQ-ON-W at:
                    lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3826
                    __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
                    _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:144
                    spin_lock include/linux/spinlock.h:329 [inline]
                    flush_bg_queue+0x1f3/0x3c0 fs/fuse/dev.c:415
                    fuse_request_queue_background+0x2d1/0x580 fs/fuse/dev.c:676
                    fuse_request_send_background+0x58/0x120 fs/fuse/dev.c:687
                    fuse_send_init fs/fuse/inode.c:989 [inline]
                    fuse_fill_super+0x13bb/0x1730 fs/fuse/inode.c:1214
                    mount_nodev+0x68/0x110 fs/super.c:1392
                    fuse_mount+0x2d/0x40 fs/fuse/inode.c:1239
                    legacy_get_tree+0xf2/0x200 fs/fs_context.c:590
                    vfs_get_tree+0x123/0x450 fs/super.c:1481
                    do_new_mount fs/namespace.c:2610 [inline]
                    do_mount+0x1436/0x2c40 fs/namespace.c:2932
                    ksys_mount+0xdb/0x150 fs/namespace.c:3148
                    __do_sys_mount fs/namespace.c:3162 [inline]
                    __se_sys_mount fs/namespace.c:3159 [inline]
                    __x64_sys_mount+0xbe/0x150 fs/namespace.c:3159
                    do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
                    entry_SYSCALL_64_after_hwframe+0x49/0xbe
   SOFTIRQ-ON-W at:
                    lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3826
                    __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
                    _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:144
                    spin_lock include/linux/spinlock.h:329 [inline]
                    flush_bg_queue+0x1f3/0x3c0 fs/fuse/dev.c:415
                    fuse_request_queue_background+0x2d1/0x580 fs/fuse/dev.c:676
                    fuse_request_send_background+0x58/0x120 fs/fuse/dev.c:687
                    fuse_send_init fs/fuse/inode.c:989 [inline]
                    fuse_fill_super+0x13bb/0x1730 fs/fuse/inode.c:1214
                    mount_nodev+0x68/0x110 fs/super.c:1392
                    fuse_mount+0x2d/0x40 fs/fuse/inode.c:1239
                    legacy_get_tree+0xf2/0x200 fs/fs_context.c:590
                    vfs_get_tree+0x123/0x450 fs/super.c:1481
                    do_new_mount fs/namespace.c:2610 [inline]
                    do_mount+0x1436/0x2c40 fs/namespace.c:2932
                    ksys_mount+0xdb/0x150 fs/namespace.c:3148
                    __do_sys_mount fs/namespace.c:3162 [inline]
                    __se_sys_mount fs/namespace.c:3159 [inline]
                    __x64_sys_mount+0xbe/0x150 fs/namespace.c:3159
                    do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
                    entry_SYSCALL_64_after_hwframe+0x49/0xbe
   INITIAL USE at:
                   lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3826
                   __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
                   _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:144
                   spin_lock include/linux/spinlock.h:329 [inline]
                   flush_bg_queue+0x1f3/0x3c0 fs/fuse/dev.c:415
                   fuse_request_queue_background+0x2d1/0x580 fs/fuse/dev.c:676
                   fuse_request_send_background+0x58/0x120 fs/fuse/dev.c:687
                   fuse_send_init fs/fuse/inode.c:989 [inline]
                   fuse_fill_super+0x13bb/0x1730 fs/fuse/inode.c:1214
                   mount_nodev+0x68/0x110 fs/super.c:1392
                   fuse_mount+0x2d/0x40 fs/fuse/inode.c:1239
                   legacy_get_tree+0xf2/0x200 fs/fs_context.c:590
                   vfs_get_tree+0x123/0x450 fs/super.c:1481
                   do_new_mount fs/namespace.c:2610 [inline]
                   do_mount+0x1436/0x2c40 fs/namespace.c:2932
                   ksys_mount+0xdb/0x150 fs/namespace.c:3148
                   __do_sys_mount fs/namespace.c:3162 [inline]
                   __se_sys_mount fs/namespace.c:3159 [inline]
                   __x64_sys_mount+0xbe/0x150 fs/namespace.c:3159
                   do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
                   entry_SYSCALL_64_after_hwframe+0x49/0xbe
 }
 ... key      at: [<ffffffff8a60dec0>] __key.43450+0x0/0x40
 ... acquired at:
   lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3826
   __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
   _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:144
   spin_lock include/linux/spinlock.h:329 [inline]
   aio_poll fs/aio.c:1772 [inline]
   __io_submit_one fs/aio.c:1875 [inline]
   io_submit_one+0xedf/0x1cf0 fs/aio.c:1908
   __do_sys_io_submit fs/aio.c:1953 [inline]
   __se_sys_io_submit fs/aio.c:1923 [inline]
   __x64_sys_io_submit+0x1bd/0x580 fs/aio.c:1923
   do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

stack backtrace:
CPU: 0 PID: 13779 Comm: syz-executor2 Not tainted 5.0.0-rc4-next-20190131 #23
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 print_bad_irq_dependency kernel/locking/lockdep.c:1573 [inline]
 check_usage.cold+0x60f/0x940 kernel/locking/lockdep.c:1605
 check_irq_usage kernel/locking/lockdep.c:1650 [inline]
 check_prev_add_irq kernel/locking/lockdep_states.h:8 [inline]
 check_prev_add kernel/locking/lockdep.c:1860 [inline]
 check_prevs_add kernel/locking/lockdep.c:1968 [inline]
 validate_chain kernel/locking/lockdep.c:2339 [inline]
 __lock_acquire+0x1f12/0x4790 kernel/locking/lockdep.c:3320
 lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3826
 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
 _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:144
 spin_lock include/linux/spinlock.h:329 [inline]
 aio_poll fs/aio.c:1772 [inline]
 __io_submit_one fs/aio.c:1875 [inline]
 io_submit_one+0xedf/0x1cf0 fs/aio.c:1908
 __do_sys_io_submit fs/aio.c:1953 [inline]
 __se_sys_io_submit fs/aio.c:1923 [inline]
 __x64_sys_io_submit+0x1bd/0x580 fs/aio.c:1923
 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Avi Kivity <avi@scylladb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: <stable@vger.kernel.org>
Fixes: e8693bcfa0b4 ("aio: allow direct aio poll comletions for keyed wakeups") # v4.19
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 fs/aio.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index b906ff70c90f..41bb7114678e 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1688,9 +1688,9 @@ static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
 			return 0;
 
 		/* try to complete the iocb inline if we can: */
-		if (spin_trylock(&iocb->ki_ctx->ctx_lock)) {
+		if (spin_trylock_irq(&iocb->ki_ctx->ctx_lock)) {
 			list_del(&iocb->ki_list);
-			spin_unlock(&iocb->ki_ctx->ctx_lock);
+			spin_unlock_irq(&iocb->ki_ctx->ctx_lock);
 
 			list_del_init(&req->wait.entry);
 			aio_poll_complete(iocb, mask);
-- 
2.20.1.611.gfbb209baf1-goog


Thread overview: 12+ messages
2019-02-04 17:45 Bart Van Assche [this message]
2019-02-04 17:49 ` Christoph Hellwig
2019-02-05  8:12   ` Miklos Szeredi
2019-02-06  0:53     ` Bart Van Assche
2019-02-06  8:36       ` Miklos Szeredi
2019-02-06 13:47         ` Christoph Hellwig
2019-02-06 14:31           ` Miklos Szeredi
2019-02-06 13:39       ` Christoph Hellwig
2019-02-09  0:59 Bart Van Assche
2019-02-12  7:56 ` Christoph Hellwig
2019-02-21 22:28   ` Bart Van Assche
2019-02-22  3:17     ` Al Viro
