linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* lockdep warning in sys_swapon
@ 2020-03-23 15:17 Qais Yousef
  2020-03-23 15:30 ` Qais Yousef
  0 siblings, 1 reply; 7+ messages in thread
From: Qais Yousef @ 2020-03-23 15:17 UTC (permalink / raw)
  To: Andrew Morton, linux-mm; +Cc: linux-kernel

Hi

I hit the following 2 warnings when running with LOCKDEP=y on arm64 platform
(juno-r2), running on v5.6-rc6

The 1st one is when I execute `swapon -a`. The 2nd one happens at boot. I have
/dev/sda2 as my swap in /etc/fstab

Note that I either hit 1 OR 2, but didn't hit both warnings at the same time,
yet at least.

/dev/sda2 is a usb flash drive, in case it matters somehow.


### WARNING 1 ###

# swapon -a
[ 1682.788988]
[ 1682.790503] =====================================
[ 1682.795226] WARNING: bad unlock balance detected!
[ 1682.799951] 5.6.0-rc6 #533 Not tainted
[ 1682.803714] -------------------------------------
[ 1682.808437] swapon/437 is trying to release lock (&sb->s_type->i_mutex_key) at:
[ 1682.815797] [<ffff80001030c07c>] __arm64_sys_swapon+0x3fc/0x1140
[ 1682.821828] but there are no more locks to release!
[ 1682.826724]
[ 1682.826724] other info that might help us debug this:
[ 1682.833280] no locks held by swapon/437.
[ 1682.837216]
[ 1682.837216] stack backtrace:
[ 1682.841596] CPU: 4 PID: 437 Comm: swapon Not tainted 5.6.0-rc6 #533
[ 1682.847889] Hardware name: ARM Juno development board (r2) (DT)
[ 1682.853834] Call trace:
[ 1682.856293]  dump_backtrace+0x0/0x1a8
[ 1682.859971]  show_stack+0x24/0x30
[ 1682.863302]  dump_stack+0xe8/0x150
[ 1682.866722]  print_unlock_imbalance_bug+0xe4/0xe8
[ 1682.871447]  lock_release+0x274/0x350
[ 1682.875128]  up_write+0x30/0x190
[ 1682.878371]  __arm64_sys_swapon+0x3fc/0x1140
[ 1682.882662]  el0_svc_common.constprop.2+0x7c/0x178
[ 1682.887475]  do_el0_svc+0x34/0xa0
[ 1682.890805]  el0_sync_handler+0x114/0x1d0
[ 1682.894832]  el0_sync+0x140/0x180
[ 1682.898233] ------------[ cut here ]------------
[ 1682.902936] DEBUG_RWSEMS_WARN_ON((rwsem_owner(sem) != current) && !rwsem_test_oflags(sem, RWSEM_NONSPINNABLE)): count = 0x0, magic = 0xffff000971018270, owner = 0x0, curr 0xffff0009752fc680, list empty
[ 1682.921045] WARNING: CPU: 4 PID: 437 at kernel/locking/rwsem.c:1459 up_write+0x134/0x190
[ 1682.929203] Modules linked in:
[ 1682.932288] CPU: 4 PID: 437 Comm: swapon Not tainted 5.6.0-rc6 #533
[ 1682.938604] Hardware name: ARM Juno development board (r2) (DT)
[ 1682.944574] pstate: 60000005 (nZCv daif -PAN -UAO)
[ 1682.949407] pc : up_write+0x134/0x190
[ 1682.953102] lr : up_write+0x134/0x190
[ 1682.956793] sp : ffff800018e1bd30
[ 1682.960135] x29: ffff800018e1bd30 x28: ffff000970fb4000
[ 1682.965494] x27: ffff0009750bd000 x26: 00000000fffffff0
[ 1682.970852] x25: 0000000000000000 x24: fffffffffffffff0
[ 1682.976211] x23: ffff000971439200 x22: ffff80001323d000
[ 1682.981569] x21: ffff800013241000 x20: ffff80001030c07c
[ 1682.986927] x19: ffff000971018270 x18: ffffffffffffffff
[ 1682.992285] x17: 0000000000000000 x16: 0000000000000000
[ 1682.997643] x15: ffff80001323da88 x14: 0720072007200720
[ 1683.003002] x13: 0720072007200720 x12: 0720072007200720
[ 1683.008360] x11: 0720072007200720 x10: 0000000000000000
[ 1683.013718] x9 : ffff80001323da88 x8 : ffff800013267808
[ 1683.019076] x7 : ffff80001019f77c x6 : ffff800018e1b9f0
[ 1683.024434] x5 : 0000000000000001 x4 : 0000000000000001
[ 1683.029791] x3 : 775f041afc1e3200 x2 : ffff800013267860
[ 1683.035149] x1 : 775f041afc1e3200 x0 : 0000000000000000
[ 1683.040507] Call trace:
[ 1683.042977]  up_write+0x134/0x190
[ 1683.046323]  __arm64_sys_swapon+0x3fc/0x1140
[ 1683.050632]  el0_svc_common.constprop.2+0x7c/0x178
[ 1683.055464]  do_el0_svc+0x34/0xa0
[ 1683.058810]  el0_sync_handler+0x114/0x1d0
[ 1683.062854]  el0_sync+0x140/0x180
[ 1683.066196] irq event stamp: 1095
[ 1683.069543] hardirqs last  enabled at (1095): [<ffff80001191b724>] _raw_spin_unlock_irqrestore+0x7c/0x88
[ 1683.079104] hardirqs last disabled at (1094): [<ffff80001191b4c0>] _raw_spin_lock_irqsave+0x38/0x88
[ 1683.088226] softirqs last  enabled at (996): [<ffff8000100818a4>] __do_softirq+0x4bc/0x568
[ 1683.096562] softirqs last disabled at (985): [<ffff800010114064>] irq_exit+0x144/0x150
[ 1683.104545] ---[ end trace 737bf4f253f20ead ]---


### WARNING 2 ###


[   15.502583] Unable to find swap-space signature
[   15.507496]
[   15.508996] ======================================================
[   15.515202] WARNING: possible circular locking dependency detected
[   15.521411] 5.6.0-rc6 #533 Not tainted
[   15.525173] ------------------------------------------------------
[   15.531378] swapon/321 is trying to acquire lock:
[   15.536102] ffff000971020078 (&bdev->bd_mutex){+.+.}, at: blkdev_put+0x30/0x120
[   15.543460]
[   15.543460] but task is already holding lock:
[   15.549317] ffff0009710202d8 (&sb->s_type->i_mutex_key#7){+.+.}, at: __arm64_sys_swapon+0x2cc/0x1140
[   15.558505]
[   15.558505] which lock already depends on the new lock.
[   15.558505]
[   15.566719]
[   15.566719] the existing dependency chain (in reverse order) is:
[   15.574234]
[   15.574234] -> #1 (&sb->s_type->i_mutex_key#7){+.+.}:
[   15.580802]        down_write+0x50/0xb0
[   15.584656]        __blkdev_get+0x2a4/0x4f8
[   15.588859]        blkdev_get+0xfc/0x190
[   15.592800]        __device_add_disk+0x470/0x4b8
[   15.597437]        device_add_disk+0x38/0x48
[   15.601729]        sd_probe+0x350/0x488
[   15.605585]        really_probe+0x10c/0x358
[   15.609788]        driver_probe_device+0x5c/0x100
[   15.614514]        __device_attach_driver+0x98/0xb8
[   15.619415]        bus_for_each_drv+0x74/0xd8
[   15.623792]        __device_attach_async_helper+0xc8/0xf0
[   15.629217]        async_run_entry_fn+0x48/0x148
[   15.633857]        process_one_work+0x2a4/0x748
[   15.638407]        worker_thread+0x48/0x498
[   15.642609]        kthread+0x13c/0x140
[   15.646376]        ret_from_fork+0x10/0x18
[   15.650487]
[   15.650487] -> #0 (&bdev->bd_mutex){+.+.}:
[   15.656093]        __lock_acquire+0x1008/0x1450
[   15.660645]        lock_acquire+0xf8/0x280
[   15.664759]        __mutex_lock+0xac/0x920
[   15.668873]        mutex_lock_nested+0x3c/0x50
[   15.673337]        blkdev_put+0x30/0x120
[   15.677278]        __arm64_sys_swapon+0x5a8/0x1140
[   15.682093]        el0_svc_common.constprop.2+0x7c/0x178
[   15.687429]        do_el0_svc+0x34/0xa0
[   15.691282]        el0_sync_handler+0x114/0x1d0
[   15.695833]        el0_sync+0x140/0x180
[   15.699683]
[   15.699683] other info that might help us debug this:
[   15.699683]
[   15.707721]  Possible unsafe locking scenario:
[   15.707721]
[   15.713665]        CPU0                    CPU1
[   15.718213]        ----                    ----
[   15.722759]   lock(&sb->s_type->i_mutex_key#7);
[   15.727311]                                lock(&bdev->bd_mutex);
[   15.733430]                                lock(&sb->s_type->i_mutex_key#7);
[   15.740510]   lock(&bdev->bd_mutex);
[   15.744100]
[   15.744100]  *** DEADLOCK ***
[   15.744100]
[   15.750048] 1 lock held by swapon/321:
[   15.753810]  #0: ffff0009710202d8 (&sb->s_type->i_mutex_key#7){+.+.}, at: __arm64_sys_swapon+0x2cc/0x1140
[   15.763432]
[   15.763432] stack backtrace:
[   15.767811] CPU: 3 PID: 321 Comm: swapon Not tainted 5.6.0-rc6 #533
[   15.774104] Hardware name: ARM Juno development board (r2) (DT)
[   15.780048] Call trace:
[   15.782505]  dump_backtrace+0x0/0x1a8
[   15.786182]  show_stack+0x24/0x30
[   15.789512]  dump_stack+0xe8/0x150
[   15.792928]  print_circular_bug.isra.39+0x1b8/0x210
[   15.797826]  check_noncircular+0x178/0x1e0
[   15.801940]  __lock_acquire+0x1008/0x1450
[   15.805967]  lock_acquire+0xf8/0x280
[   15.809558]  __mutex_lock+0xac/0x920
[   15.813149]  mutex_lock_nested+0x3c/0x50
[   15.817089]  blkdev_put+0x30/0x120
[   15.820507]  __arm64_sys_swapon+0x5a8/0x1140
[   15.824797]  el0_svc_common.constprop.2+0x7c/0x178
[   15.829609]  do_el0_svc+0x34/0xa0
[   15.832938]  el0_sync_handler+0x114/0x1d0
[   15.836964]  el0_sync+0x140/0x180
swapon: /dev/sda2: Invalid argument



Thanks

--
Qais Yousef


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: lockdep warning in sys_swapon
  2020-03-23 15:17 lockdep warning in sys_swapon Qais Yousef
@ 2020-03-23 15:30 ` Qais Yousef
  2020-03-23 21:06   ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread
From: Qais Yousef @ 2020-03-23 15:30 UTC (permalink / raw)
  To: Andrew Morton, linux-mm; +Cc: linux-kernel

On 03/23/20 15:17, Qais Yousef wrote:
> Hi
> 
> I hit the following 2 warnings when running with LOCKDEP=y on arm64 platform
> (juno-r2), running on v5.6-rc6
> 
> The 1st one is when I execute `swapon -a`. The 2nd one happens at boot. I have
> /dev/sda2 as my swap in /etc/fstab
> 
> Note that I either hit 1 OR 2, but didn't hit both warnings at the same time,
> yet at least.
> 
> /dev/sda2 is a usb flash drive, in case it matters somehow.

By the way, I noticed that in claim_swapfile() if we fail we don't release the
lock. Shouldn't we release the lock here?

I tried with that FWIW, but it had no effect on the warnings.


diff --git a/mm/swapfile.c b/mm/swapfile.c
index b2a2e45c9a36..0293e8c7b472 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2900,8 +2900,10 @@ static int claim_swapfile(struct swap_info_struct *p, struct inode *inode)
        }

        inode_lock(inode);
-       if (IS_SWAPFILE(inode))
+       if (IS_SWAPFILE(inode)) {
+               inode_unlock(inode);
                return -EBUSY;
+       }

        return 0;
 }


Thanks

--
Qais Yousef


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: lockdep warning in sys_swapon
  2020-03-23 15:30 ` Qais Yousef
@ 2020-03-23 21:06   ` Andrew Morton
  2020-03-23 21:12     ` Darrick J. Wong
  2020-03-24 10:51     ` Qais Yousef
  0 siblings, 2 replies; 7+ messages in thread
From: Andrew Morton @ 2020-03-23 21:06 UTC (permalink / raw)
  To: Qais Yousef
  Cc: linux-mm, linux-kernel, Naohiro Aota, Darrick J. Wong, Christoph Hellwig

On Mon, 23 Mar 2020 15:30:45 +0000 Qais Yousef <qais.yousef@arm.com> wrote:

> On 03/23/20 15:17, Qais Yousef wrote:
> > Hi
> > 
> > I hit the following 2 warnings when running with LOCKDEP=y on arm64 platform
> > (juno-r2), running on v5.6-rc6
> > 
> > The 1st one is when I execute `swapon -a`. The 2nd one happens at boot. I have
> > /dev/sda2 as my swap in /etc/fstab
> > 
> > Note that I either hit 1 OR 2, but didn't hit both warnings at the same time,
> > yet at least.
> > 
> > /dev/sda2 is a usb flash drive, in case it matters somehow.
> 
> By the way, I noticed that in claim_swapfile() if we fail we don't release the
> lock. Shouldn't we release the lock here?
> 
> I tried with that FWIW, but it had no effect on the warnings.
> 

I'll be sending the below into Linus this week.

I was hoping to hear from Darrick/Christoph (?) but it looks like the
right thing to do.  Are you able to test it?

I think I'll add a cc:stable to this one.



From: Naohiro Aota <naohiro.aota@wdc.com>
Subject: mm/swapfile.c: move inode_lock out of claim_swapfile

claim_swapfile() currently keeps the inode locked when it is successful,
or the file is already swapfile (with -EBUSY).  And, on the other error
cases, it does not lock the inode.

This inconsistency of the lock state and return value is quite confusing
and actually causing a bad unlock balance as below in the "bad_swap"
section of __do_sys_swapon().

This commit fixes this issue by moving the inode_lock() and IS_SWAPFILE
check out of claim_swapfile().  The inode is unlocked in
"bad_swap_unlock_inode" section, so that the inode is ensured to be
unlocked at "bad_swap".  Thus, error handling codes after the locking now
jumps to "bad_swap_unlock_inode" instead of "bad_swap".

    =====================================
    WARNING: bad unlock balance detected!
    5.5.0-rc7+ #176 Not tainted
    -------------------------------------
    swapon/4294 is trying to release lock (&sb->s_type->i_mutex_key) at:
    [<ffffffff8173a6eb>] __do_sys_swapon+0x94b/0x3550
    but there are no more locks to release!

    other info that might help us debug this:
    no locks held by swapon/4294.

    stack backtrace:
    CPU: 5 PID: 4294 Comm: swapon Not tainted 5.5.0-rc7-BTRFS-ZNS+ #176
    Hardware name: ASUS All Series/H87-PRO, BIOS 2102 07/29/2014
    Call Trace:
     dump_stack+0xa1/0xea
     ? __do_sys_swapon+0x94b/0x3550
     print_unlock_imbalance_bug.cold+0x114/0x123
     ? __do_sys_swapon+0x94b/0x3550
     lock_release+0x562/0xed0
     ? kvfree+0x31/0x40
     ? lock_downgrade+0x770/0x770
     ? kvfree+0x31/0x40
     ? rcu_read_lock_sched_held+0xa1/0xd0
     ? rcu_read_lock_bh_held+0xb0/0xb0
     up_write+0x2d/0x490
     ? kfree+0x293/0x2f0
     __do_sys_swapon+0x94b/0x3550
     ? putname+0xb0/0xf0
     ? kmem_cache_free+0x2e7/0x370
     ? do_sys_open+0x184/0x3e0
     ? generic_max_swapfile_size+0x40/0x40
     ? do_syscall_64+0x27/0x4b0
     ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
     ? lockdep_hardirqs_on+0x38c/0x590
     __x64_sys_swapon+0x54/0x80
     do_syscall_64+0xa4/0x4b0
     entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x7f15da0a0dc7

Link: http://lkml.kernel.org/r/20200206090132.154869-1-naohiro.aota@wdc.com
Fixes: 1638045c3677 ("mm: set S_SWAPFILE on blockdev swap devices")
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/swapfile.c |   41 ++++++++++++++++++++---------------------
 1 file changed, 20 insertions(+), 21 deletions(-)

--- a/mm/swapfile.c~mm-swap-move-inode_lock-out-of-claim_swapfile
+++ a/mm/swapfile.c
@@ -2899,10 +2899,6 @@ static int claim_swapfile(struct swap_in
 		p->bdev = inode->i_sb->s_bdev;
 	}
 
-	inode_lock(inode);
-	if (IS_SWAPFILE(inode))
-		return -EBUSY;
-
 	return 0;
 }
 
@@ -3157,36 +3153,41 @@ SYSCALL_DEFINE2(swapon, const char __use
 	mapping = swap_file->f_mapping;
 	inode = mapping->host;
 
-	/* will take i_rwsem; */
 	error = claim_swapfile(p, inode);
 	if (unlikely(error))
 		goto bad_swap;
 
+	inode_lock(inode);
+	if (IS_SWAPFILE(inode)) {
+		error = -EBUSY;
+		goto bad_swap_unlock_inode;
+	}
+
 	/*
 	 * Read the swap header.
 	 */
 	if (!mapping->a_ops->readpage) {
 		error = -EINVAL;
-		goto bad_swap;
+		goto bad_swap_unlock_inode;
 	}
 	page = read_mapping_page(mapping, 0, swap_file);
 	if (IS_ERR(page)) {
 		error = PTR_ERR(page);
-		goto bad_swap;
+		goto bad_swap_unlock_inode;
 	}
 	swap_header = kmap(page);
 
 	maxpages = read_swap_header(p, swap_header, inode);
 	if (unlikely(!maxpages)) {
 		error = -EINVAL;
-		goto bad_swap;
+		goto bad_swap_unlock_inode;
 	}
 
 	/* OK, set up the swap map and apply the bad block list */
 	swap_map = vzalloc(maxpages);
 	if (!swap_map) {
 		error = -ENOMEM;
-		goto bad_swap;
+		goto bad_swap_unlock_inode;
 	}
 
 	if (bdi_cap_stable_pages_required(inode_to_bdi(inode)))
@@ -3211,7 +3212,7 @@ SYSCALL_DEFINE2(swapon, const char __use
 					GFP_KERNEL);
 		if (!cluster_info) {
 			error = -ENOMEM;
-			goto bad_swap;
+			goto bad_swap_unlock_inode;
 		}
 
 		for (ci = 0; ci < nr_cluster; ci++)
@@ -3220,7 +3221,7 @@ SYSCALL_DEFINE2(swapon, const char __use
 		p->percpu_cluster = alloc_percpu(struct percpu_cluster);
 		if (!p->percpu_cluster) {
 			error = -ENOMEM;
-			goto bad_swap;
+			goto bad_swap_unlock_inode;
 		}
 		for_each_possible_cpu(cpu) {
 			struct percpu_cluster *cluster;
@@ -3234,13 +3235,13 @@ SYSCALL_DEFINE2(swapon, const char __use
 
 	error = swap_cgroup_swapon(p->type, maxpages);
 	if (error)
-		goto bad_swap;
+		goto bad_swap_unlock_inode;
 
 	nr_extents = setup_swap_map_and_extents(p, swap_header, swap_map,
 		cluster_info, maxpages, &span);
 	if (unlikely(nr_extents < 0)) {
 		error = nr_extents;
-		goto bad_swap;
+		goto bad_swap_unlock_inode;
 	}
 	/* frontswap enabled? set up bit-per-page map for frontswap */
 	if (IS_ENABLED(CONFIG_FRONTSWAP))
@@ -3280,7 +3281,7 @@ SYSCALL_DEFINE2(swapon, const char __use
 
 	error = init_swap_address_space(p->type, maxpages);
 	if (error)
-		goto bad_swap;
+		goto bad_swap_unlock_inode;
 
 	/*
 	 * Flush any pending IO and dirty mappings before we start using this
@@ -3290,7 +3291,7 @@ SYSCALL_DEFINE2(swapon, const char __use
 	error = inode_drain_writes(inode);
 	if (error) {
 		inode->i_flags &= ~S_SWAPFILE;
-		goto bad_swap;
+		goto bad_swap_unlock_inode;
 	}
 
 	mutex_lock(&swapon_mutex);
@@ -3315,6 +3316,8 @@ SYSCALL_DEFINE2(swapon, const char __use
 
 	error = 0;
 	goto out;
+bad_swap_unlock_inode:
+	inode_unlock(inode);
 bad_swap:
 	free_percpu(p->percpu_cluster);
 	p->percpu_cluster = NULL;
@@ -3322,6 +3325,7 @@ bad_swap:
 		set_blocksize(p->bdev, p->old_block_size);
 		blkdev_put(p->bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
 	}
+	inode = NULL;
 	destroy_swap_extents(p);
 	swap_cgroup_swapoff(p->type);
 	spin_lock(&swap_lock);
@@ -3333,13 +3337,8 @@ bad_swap:
 	kvfree(frontswap_map);
 	if (inced_nr_rotate_swap)
 		atomic_dec(&nr_rotate_swap);
-	if (swap_file) {
-		if (inode) {
-			inode_unlock(inode);
-			inode = NULL;
-		}
+	if (swap_file)
 		filp_close(swap_file, NULL);
-	}
 out:
 	if (page && !IS_ERR(page)) {
 		kunmap(page);
_



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: lockdep warning in sys_swapon
  2020-03-23 21:06   ` Andrew Morton
@ 2020-03-23 21:12     ` Darrick J. Wong
  2020-03-23 21:59       ` Naohiro Aota
  2020-03-24 10:51     ` Qais Yousef
  1 sibling, 1 reply; 7+ messages in thread
From: Darrick J. Wong @ 2020-03-23 21:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Qais Yousef, linux-mm, linux-kernel, Naohiro Aota, Christoph Hellwig

On Mon, Mar 23, 2020 at 02:06:15PM -0700, Andrew Morton wrote:
> On Mon, 23 Mar 2020 15:30:45 +0000 Qais Yousef <qais.yousef@arm.com> wrote:
> 
> > On 03/23/20 15:17, Qais Yousef wrote:
> > > Hi
> > > 
> > > I hit the following 2 warnings when running with LOCKDEP=y on arm64 platform
> > > (juno-r2), running on v5.6-rc6
> > > 
> > > The 1st one is when I execute `swapon -a`. The 2nd one happens at boot. I have
> > > /dev/sda2 as my swap in /etc/fstab
> > > 
> > > Note that I either hit 1 OR 2, but didn't hit both warnings at the same time,
> > > yet at least.
> > > 
> > > /dev/sda2 is a usb flash drive, in case it matters somehow.
> > 
> > By the way, I noticed that in claim_swapfile() if we fail we don't release the
> > lock. Shouldn't we release the lock here?
> > 
> > I tried with that FWIW, but it had no effect on the warnings.
> > 
> 
> I'll be sending the below into Linus this week.
> 
> I was hoping to hear from Darrick/Christoph (?) but it looks like the
> right thing to do.  Are you able to test it?

I had questions[1] about the patch, but nobody ever replied.

--D

[1] https://lore.kernel.org/linux-fsdevel/20200212164956.GK6874@magnolia/

> I think I'll add a cc:stable to this one.
> 
> 
> 
> From: Naohiro Aota <naohiro.aota@wdc.com>
> Subject: mm/swapfile.c: move inode_lock out of claim_swapfile
> 
> claim_swapfile() currently keeps the inode locked when it is successful,
> or the file is already swapfile (with -EBUSY).  And, on the other error
> cases, it does not lock the inode.
> 
> This inconsistency of the lock state and return value is quite confusing
> and actually causing a bad unlock balance as below in the "bad_swap"
> section of __do_sys_swapon().
> 
> This commit fixes this issue by moving the inode_lock() and IS_SWAPFILE
> check out of claim_swapfile().  The inode is unlocked in
> "bad_swap_unlock_inode" section, so that the inode is ensured to be
> unlocked at "bad_swap".  Thus, error handling codes after the locking now
> jumps to "bad_swap_unlock_inode" instead of "bad_swap".
> 
>     =====================================
>     WARNING: bad unlock balance detected!
>     5.5.0-rc7+ #176 Not tainted
>     -------------------------------------
>     swapon/4294 is trying to release lock (&sb->s_type->i_mutex_key) at:
>     [<ffffffff8173a6eb>] __do_sys_swapon+0x94b/0x3550
>     but there are no more locks to release!
> 
>     other info that might help us debug this:
>     no locks held by swapon/4294.
> 
>     stack backtrace:
>     CPU: 5 PID: 4294 Comm: swapon Not tainted 5.5.0-rc7-BTRFS-ZNS+ #176
>     Hardware name: ASUS All Series/H87-PRO, BIOS 2102 07/29/2014
>     Call Trace:
>      dump_stack+0xa1/0xea
>      ? __do_sys_swapon+0x94b/0x3550
>      print_unlock_imbalance_bug.cold+0x114/0x123
>      ? __do_sys_swapon+0x94b/0x3550
>      lock_release+0x562/0xed0
>      ? kvfree+0x31/0x40
>      ? lock_downgrade+0x770/0x770
>      ? kvfree+0x31/0x40
>      ? rcu_read_lock_sched_held+0xa1/0xd0
>      ? rcu_read_lock_bh_held+0xb0/0xb0
>      up_write+0x2d/0x490
>      ? kfree+0x293/0x2f0
>      __do_sys_swapon+0x94b/0x3550
>      ? putname+0xb0/0xf0
>      ? kmem_cache_free+0x2e7/0x370
>      ? do_sys_open+0x184/0x3e0
>      ? generic_max_swapfile_size+0x40/0x40
>      ? do_syscall_64+0x27/0x4b0
>      ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
>      ? lockdep_hardirqs_on+0x38c/0x590
>      __x64_sys_swapon+0x54/0x80
>      do_syscall_64+0xa4/0x4b0
>      entry_SYSCALL_64_after_hwframe+0x49/0xbe
>     RIP: 0033:0x7f15da0a0dc7
> 
> Link: http://lkml.kernel.org/r/20200206090132.154869-1-naohiro.aota@wdc.com
> Fixes: 1638045c3677 ("mm: set S_SWAPFILE on blockdev swap devices")
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
> Cc: Darrick J. Wong <darrick.wong@oracle.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
>  mm/swapfile.c |   41 ++++++++++++++++++++---------------------
>  1 file changed, 20 insertions(+), 21 deletions(-)
> 
> --- a/mm/swapfile.c~mm-swap-move-inode_lock-out-of-claim_swapfile
> +++ a/mm/swapfile.c
> @@ -2899,10 +2899,6 @@ static int claim_swapfile(struct swap_in
>  		p->bdev = inode->i_sb->s_bdev;
>  	}
>  
> -	inode_lock(inode);
> -	if (IS_SWAPFILE(inode))
> -		return -EBUSY;
> -
>  	return 0;
>  }
>  
> @@ -3157,36 +3153,41 @@ SYSCALL_DEFINE2(swapon, const char __use
>  	mapping = swap_file->f_mapping;
>  	inode = mapping->host;
>  
> -	/* will take i_rwsem; */
>  	error = claim_swapfile(p, inode);
>  	if (unlikely(error))
>  		goto bad_swap;
>  
> +	inode_lock(inode);
> +	if (IS_SWAPFILE(inode)) {
> +		error = -EBUSY;
> +		goto bad_swap_unlock_inode;
> +	}
> +
>  	/*
>  	 * Read the swap header.
>  	 */
>  	if (!mapping->a_ops->readpage) {
>  		error = -EINVAL;
> -		goto bad_swap;
> +		goto bad_swap_unlock_inode;
>  	}
>  	page = read_mapping_page(mapping, 0, swap_file);
>  	if (IS_ERR(page)) {
>  		error = PTR_ERR(page);
> -		goto bad_swap;
> +		goto bad_swap_unlock_inode;
>  	}
>  	swap_header = kmap(page);
>  
>  	maxpages = read_swap_header(p, swap_header, inode);
>  	if (unlikely(!maxpages)) {
>  		error = -EINVAL;
> -		goto bad_swap;
> +		goto bad_swap_unlock_inode;
>  	}
>  
>  	/* OK, set up the swap map and apply the bad block list */
>  	swap_map = vzalloc(maxpages);
>  	if (!swap_map) {
>  		error = -ENOMEM;
> -		goto bad_swap;
> +		goto bad_swap_unlock_inode;
>  	}
>  
>  	if (bdi_cap_stable_pages_required(inode_to_bdi(inode)))
> @@ -3211,7 +3212,7 @@ SYSCALL_DEFINE2(swapon, const char __use
>  					GFP_KERNEL);
>  		if (!cluster_info) {
>  			error = -ENOMEM;
> -			goto bad_swap;
> +			goto bad_swap_unlock_inode;
>  		}
>  
>  		for (ci = 0; ci < nr_cluster; ci++)
> @@ -3220,7 +3221,7 @@ SYSCALL_DEFINE2(swapon, const char __use
>  		p->percpu_cluster = alloc_percpu(struct percpu_cluster);
>  		if (!p->percpu_cluster) {
>  			error = -ENOMEM;
> -			goto bad_swap;
> +			goto bad_swap_unlock_inode;
>  		}
>  		for_each_possible_cpu(cpu) {
>  			struct percpu_cluster *cluster;
> @@ -3234,13 +3235,13 @@ SYSCALL_DEFINE2(swapon, const char __use
>  
>  	error = swap_cgroup_swapon(p->type, maxpages);
>  	if (error)
> -		goto bad_swap;
> +		goto bad_swap_unlock_inode;
>  
>  	nr_extents = setup_swap_map_and_extents(p, swap_header, swap_map,
>  		cluster_info, maxpages, &span);
>  	if (unlikely(nr_extents < 0)) {
>  		error = nr_extents;
> -		goto bad_swap;
> +		goto bad_swap_unlock_inode;
>  	}
>  	/* frontswap enabled? set up bit-per-page map for frontswap */
>  	if (IS_ENABLED(CONFIG_FRONTSWAP))
> @@ -3280,7 +3281,7 @@ SYSCALL_DEFINE2(swapon, const char __use
>  
>  	error = init_swap_address_space(p->type, maxpages);
>  	if (error)
> -		goto bad_swap;
> +		goto bad_swap_unlock_inode;
>  
>  	/*
>  	 * Flush any pending IO and dirty mappings before we start using this
> @@ -3290,7 +3291,7 @@ SYSCALL_DEFINE2(swapon, const char __use
>  	error = inode_drain_writes(inode);
>  	if (error) {
>  		inode->i_flags &= ~S_SWAPFILE;
> -		goto bad_swap;
> +		goto bad_swap_unlock_inode;
>  	}
>  
>  	mutex_lock(&swapon_mutex);
> @@ -3315,6 +3316,8 @@ SYSCALL_DEFINE2(swapon, const char __use
>  
>  	error = 0;
>  	goto out;
> +bad_swap_unlock_inode:
> +	inode_unlock(inode);
>  bad_swap:
>  	free_percpu(p->percpu_cluster);
>  	p->percpu_cluster = NULL;
> @@ -3322,6 +3325,7 @@ bad_swap:
>  		set_blocksize(p->bdev, p->old_block_size);
>  		blkdev_put(p->bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
>  	}
> +	inode = NULL;
>  	destroy_swap_extents(p);
>  	swap_cgroup_swapoff(p->type);
>  	spin_lock(&swap_lock);
> @@ -3333,13 +3337,8 @@ bad_swap:
>  	kvfree(frontswap_map);
>  	if (inced_nr_rotate_swap)
>  		atomic_dec(&nr_rotate_swap);
> -	if (swap_file) {
> -		if (inode) {
> -			inode_unlock(inode);
> -			inode = NULL;
> -		}
> +	if (swap_file)
>  		filp_close(swap_file, NULL);
> -	}
>  out:
>  	if (page && !IS_ERR(page)) {
>  		kunmap(page);
> _
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: lockdep warning in sys_swapon
  2020-03-23 21:12     ` Darrick J. Wong
@ 2020-03-23 21:59       ` Naohiro Aota
  2020-03-25 16:08         ` Darrick J. Wong
  0 siblings, 1 reply; 7+ messages in thread
From: Naohiro Aota @ 2020-03-23 21:59 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Andrew Morton, Qais Yousef, linux-mm, linux-kernel, Christoph Hellwig

On Mon, Mar 23, 2020 at 02:12:01PM -0700, Darrick J. Wong wrote:
>On Mon, Mar 23, 2020 at 02:06:15PM -0700, Andrew Morton wrote:
>> On Mon, 23 Mar 2020 15:30:45 +0000 Qais Yousef <qais.yousef@arm.com> wrote:
>>
>> > On 03/23/20 15:17, Qais Yousef wrote:
>> > > Hi
>> > >
>> > > I hit the following 2 warnings when running with LOCKDEP=y on arm64 platform
>> > > (juno-r2), running on v5.6-rc6
>> > >
>> > > The 1st one is when I execute `swapon -a`. The 2nd one happens at boot. I have
>> > > /dev/sda2 as my swap in /etc/fstab
>> > >
>> > > Note that I either hit 1 OR 2, but didn't hit both warnings at the same time,
>> > > yet at least.
>> > >
>> > > /dev/sda2 is a usb flash drive, in case it matters somehow.
>> >
>> > By the way, I noticed that in claim_swapfile() if we fail we don't release the
>> > lock. Shouldn't we release the lock here?
>> >
>> > I tried with that FWIW, but it had no effect on the warnings.
>> >
>>
>> I'll be sending the below into Linus this week.
>>
>> I was hoping to hear from Darrick/Christoph (?) but it looks like the
>> right thing to do.  Are you able to test it?
>
>I had questions[1] about the patch, but nobody ever replied.

Sorry, I have overlooked that one. I'm replying here.

> Sorry to wander in late, but I don't see how we unlock the inode in the
> success case.  Before this patch, the "if (inode) inode_unlock(inode);"
> below out: would take care of this for both the success case and the
> bad_swap case, but now that's gone, and AFAICT after this patch we only
> unlock the inode when erroring out...
> 
> > +bad_swap_unlock_inode:
> > +	inode_unlock(inode);
> 
> ...since we never goto bad_swap_unlock_inode when error == 0, correct?

Actually, this patch does not touch "if (inode) inode_unlock(inode);" below
the "out:" label, so we still have that "inode_unlock" after this patch and
it unlocks the inode in the success case.

>
>--D
>
>[1] https://lore.kernel.org/linux-fsdevel/20200212164956.GK6874@magnolia/
>
>> I think I'll add a cc:stable to this one.
>>
>>
>>
>> From: Naohiro Aota <naohiro.aota@wdc.com>
>> Subject: mm/swapfile.c: move inode_lock out of claim_swapfile
>>
>> claim_swapfile() currently keeps the inode locked when it is successful,
>> or the file is already swapfile (with -EBUSY).  And, on the other error
>> cases, it does not lock the inode.
>>
>> This inconsistency of the lock state and return value is quite confusing
>> and actually causing a bad unlock balance as below in the "bad_swap"
>> section of __do_sys_swapon().
>>
>> This commit fixes this issue by moving the inode_lock() and IS_SWAPFILE
>> check out of claim_swapfile().  The inode is unlocked in
>> "bad_swap_unlock_inode" section, so that the inode is ensured to be
>> unlocked at "bad_swap".  Thus, error handling codes after the locking now
>> jumps to "bad_swap_unlock_inode" instead of "bad_swap".
>>
>>     =====================================
>>     WARNING: bad unlock balance detected!
>>     5.5.0-rc7+ #176 Not tainted
>>     -------------------------------------
>>     swapon/4294 is trying to release lock (&sb->s_type->i_mutex_key) at:
>>     [<ffffffff8173a6eb>] __do_sys_swapon+0x94b/0x3550
>>     but there are no more locks to release!
>>
>>     other info that might help us debug this:
>>     no locks held by swapon/4294.
>>
>>     stack backtrace:
>>     CPU: 5 PID: 4294 Comm: swapon Not tainted 5.5.0-rc7-BTRFS-ZNS+ #176
>>     Hardware name: ASUS All Series/H87-PRO, BIOS 2102 07/29/2014
>>     Call Trace:
>>      dump_stack+0xa1/0xea
>>      ? __do_sys_swapon+0x94b/0x3550
>>      print_unlock_imbalance_bug.cold+0x114/0x123
>>      ? __do_sys_swapon+0x94b/0x3550
>>      lock_release+0x562/0xed0
>>      ? kvfree+0x31/0x40
>>      ? lock_downgrade+0x770/0x770
>>      ? kvfree+0x31/0x40
>>      ? rcu_read_lock_sched_held+0xa1/0xd0
>>      ? rcu_read_lock_bh_held+0xb0/0xb0
>>      up_write+0x2d/0x490
>>      ? kfree+0x293/0x2f0
>>      __do_sys_swapon+0x94b/0x3550
>>      ? putname+0xb0/0xf0
>>      ? kmem_cache_free+0x2e7/0x370
>>      ? do_sys_open+0x184/0x3e0
>>      ? generic_max_swapfile_size+0x40/0x40
>>      ? do_syscall_64+0x27/0x4b0
>>      ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>      ? lockdep_hardirqs_on+0x38c/0x590
>>      __x64_sys_swapon+0x54/0x80
>>      do_syscall_64+0xa4/0x4b0
>>      entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>     RIP: 0033:0x7f15da0a0dc7
>>
>> Link: http://lkml.kernel.org/r/20200206090132.154869-1-naohiro.aota@wdc.com
>> Fixes: 1638045c3677 ("mm: set S_SWAPFILE on blockdev swap devices")
>> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
>> Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Darrick J. Wong <darrick.wong@oracle.com>
>> Cc: Christoph Hellwig <hch@infradead.org>
>> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>> ---
>>
>>  mm/swapfile.c |   41 ++++++++++++++++++++---------------------
>>  1 file changed, 20 insertions(+), 21 deletions(-)
>>
>> --- a/mm/swapfile.c~mm-swap-move-inode_lock-out-of-claim_swapfile
>> +++ a/mm/swapfile.c
>> @@ -2899,10 +2899,6 @@ static int claim_swapfile(struct swap_in
>>  		p->bdev = inode->i_sb->s_bdev;
>>  	}
>>
>> -	inode_lock(inode);
>> -	if (IS_SWAPFILE(inode))
>> -		return -EBUSY;
>> -
>>  	return 0;
>>  }
>>
>> @@ -3157,36 +3153,41 @@ SYSCALL_DEFINE2(swapon, const char __use
>>  	mapping = swap_file->f_mapping;
>>  	inode = mapping->host;
>>
>> -	/* will take i_rwsem; */
>>  	error = claim_swapfile(p, inode);
>>  	if (unlikely(error))
>>  		goto bad_swap;
>>
>> +	inode_lock(inode);
>> +	if (IS_SWAPFILE(inode)) {
>> +		error = -EBUSY;
>> +		goto bad_swap_unlock_inode;
>> +	}
>> +
>>  	/*
>>  	 * Read the swap header.
>>  	 */
>>  	if (!mapping->a_ops->readpage) {
>>  		error = -EINVAL;
>> -		goto bad_swap;
>> +		goto bad_swap_unlock_inode;
>>  	}
>>  	page = read_mapping_page(mapping, 0, swap_file);
>>  	if (IS_ERR(page)) {
>>  		error = PTR_ERR(page);
>> -		goto bad_swap;
>> +		goto bad_swap_unlock_inode;
>>  	}
>>  	swap_header = kmap(page);
>>
>>  	maxpages = read_swap_header(p, swap_header, inode);
>>  	if (unlikely(!maxpages)) {
>>  		error = -EINVAL;
>> -		goto bad_swap;
>> +		goto bad_swap_unlock_inode;
>>  	}
>>
>>  	/* OK, set up the swap map and apply the bad block list */
>>  	swap_map = vzalloc(maxpages);
>>  	if (!swap_map) {
>>  		error = -ENOMEM;
>> -		goto bad_swap;
>> +		goto bad_swap_unlock_inode;
>>  	}
>>
>>  	if (bdi_cap_stable_pages_required(inode_to_bdi(inode)))
>> @@ -3211,7 +3212,7 @@ SYSCALL_DEFINE2(swapon, const char __use
>>  					GFP_KERNEL);
>>  		if (!cluster_info) {
>>  			error = -ENOMEM;
>> -			goto bad_swap;
>> +			goto bad_swap_unlock_inode;
>>  		}
>>
>>  		for (ci = 0; ci < nr_cluster; ci++)
>> @@ -3220,7 +3221,7 @@ SYSCALL_DEFINE2(swapon, const char __use
>>  		p->percpu_cluster = alloc_percpu(struct percpu_cluster);
>>  		if (!p->percpu_cluster) {
>>  			error = -ENOMEM;
>> -			goto bad_swap;
>> +			goto bad_swap_unlock_inode;
>>  		}
>>  		for_each_possible_cpu(cpu) {
>>  			struct percpu_cluster *cluster;
>> @@ -3234,13 +3235,13 @@ SYSCALL_DEFINE2(swapon, const char __use
>>
>>  	error = swap_cgroup_swapon(p->type, maxpages);
>>  	if (error)
>> -		goto bad_swap;
>> +		goto bad_swap_unlock_inode;
>>
>>  	nr_extents = setup_swap_map_and_extents(p, swap_header, swap_map,
>>  		cluster_info, maxpages, &span);
>>  	if (unlikely(nr_extents < 0)) {
>>  		error = nr_extents;
>> -		goto bad_swap;
>> +		goto bad_swap_unlock_inode;
>>  	}
>>  	/* frontswap enabled? set up bit-per-page map for frontswap */
>>  	if (IS_ENABLED(CONFIG_FRONTSWAP))
>> @@ -3280,7 +3281,7 @@ SYSCALL_DEFINE2(swapon, const char __use
>>
>>  	error = init_swap_address_space(p->type, maxpages);
>>  	if (error)
>> -		goto bad_swap;
>> +		goto bad_swap_unlock_inode;
>>
>>  	/*
>>  	 * Flush any pending IO and dirty mappings before we start using this
>> @@ -3290,7 +3291,7 @@ SYSCALL_DEFINE2(swapon, const char __use
>>  	error = inode_drain_writes(inode);
>>  	if (error) {
>>  		inode->i_flags &= ~S_SWAPFILE;
>> -		goto bad_swap;
>> +		goto bad_swap_unlock_inode;
>>  	}
>>
>>  	mutex_lock(&swapon_mutex);
>> @@ -3315,6 +3316,8 @@ SYSCALL_DEFINE2(swapon, const char __use
>>
>>  	error = 0;
>>  	goto out;
>> +bad_swap_unlock_inode:
>> +	inode_unlock(inode);
>>  bad_swap:
>>  	free_percpu(p->percpu_cluster);
>>  	p->percpu_cluster = NULL;
>> @@ -3322,6 +3325,7 @@ bad_swap:
>>  		set_blocksize(p->bdev, p->old_block_size);
>>  		blkdev_put(p->bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
>>  	}
>> +	inode = NULL;
>>  	destroy_swap_extents(p);
>>  	swap_cgroup_swapoff(p->type);
>>  	spin_lock(&swap_lock);
>> @@ -3333,13 +3337,8 @@ bad_swap:
>>  	kvfree(frontswap_map);
>>  	if (inced_nr_rotate_swap)
>>  		atomic_dec(&nr_rotate_swap);
>> -	if (swap_file) {
>> -		if (inode) {
>> -			inode_unlock(inode);
>> -			inode = NULL;
>> -		}
>> +	if (swap_file)
>>  		filp_close(swap_file, NULL);
>> -	}
>>  out:
>>  	if (page && !IS_ERR(page)) {
>>  		kunmap(page);
>> _
>>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: lockdep warning in sys_swapon
  2020-03-23 21:06   ` Andrew Morton
  2020-03-23 21:12     ` Darrick J. Wong
@ 2020-03-24 10:51     ` Qais Yousef
  1 sibling, 0 replies; 7+ messages in thread
From: Qais Yousef @ 2020-03-24 10:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Naohiro Aota, Darrick J. Wong, Christoph Hellwig

On 03/23/20 14:06, Andrew Morton wrote:
> On Mon, 23 Mar 2020 15:30:45 +0000 Qais Yousef <qais.yousef@arm.com> wrote:
> 
> > On 03/23/20 15:17, Qais Yousef wrote:
> > > Hi
> > > 
> > > I hit the following 2 warnings when running with LOCKDEP=y on arm64 platform
> > > (juno-r2), running on v5.6-rc6
> > > 
> > > The 1st one is when I execute `swapon -a`. The 2nd one happens at boot. I have
> > > /dev/sda2 as my swap in /etc/fstab
> > > 
> > > Note that I either hit 1 OR 2, but didn't hit both warnings at the same time,
> > > yet at least.
> > > 
> > > /dev/sda2 is a usb flash drive, in case it matters somehow.
> > 
> > By the way, I noticed that in claim_swapfile() if we fail we don't release the
> > lock. Shouldn't we release the lock here?
> > 
> > I tried with that FWIW, but it had no effect on the warnings.
> > 
> 
> I'll be sending the below into Linus this week.
> 
> I was hoping to hear from Darrick/Christoph (?) but it looks like the
> right thing to do.  Are you able to test it?
> 
> I think I'll add a cc:stable to this one.
> 
> 
> 
> From: Naohiro Aota <naohiro.aota@wdc.com>
> Subject: mm/swapfile.c: move inode_lock out of claim_swapfile
> 
> claim_swapfile() currently keeps the inode locked when it is successful,
> or the file is already swapfile (with -EBUSY).  And, on the other error
> cases, it does not lock the inode.
> 
> This inconsistency of the lock state and return value is quite confusing
> and actually causing a bad unlock balance as below in the "bad_swap"
> section of __do_sys_swapon().
> 
> This commit fixes this issue by moving the inode_lock() and IS_SWAPFILE
> check out of claim_swapfile().  The inode is unlocked in
> "bad_swap_unlock_inode" section, so that the inode is ensured to be
> unlocked at "bad_swap".  Thus, error handling codes after the locking now
> jumps to "bad_swap_unlock_inode" instead of "bad_swap".
> 
>     =====================================
>     WARNING: bad unlock balance detected!
>     5.5.0-rc7+ #176 Not tainted
>     -------------------------------------
>     swapon/4294 is trying to release lock (&sb->s_type->i_mutex_key) at:
>     [<ffffffff8173a6eb>] __do_sys_swapon+0x94b/0x3550
>     but there are no more locks to release!
> 
>     other info that might help us debug this:
>     no locks held by swapon/4294.
> 
>     stack backtrace:
>     CPU: 5 PID: 4294 Comm: swapon Not tainted 5.5.0-rc7-BTRFS-ZNS+ #176
>     Hardware name: ASUS All Series/H87-PRO, BIOS 2102 07/29/2014
>     Call Trace:
>      dump_stack+0xa1/0xea
>      ? __do_sys_swapon+0x94b/0x3550
>      print_unlock_imbalance_bug.cold+0x114/0x123
>      ? __do_sys_swapon+0x94b/0x3550
>      lock_release+0x562/0xed0
>      ? kvfree+0x31/0x40
>      ? lock_downgrade+0x770/0x770
>      ? kvfree+0x31/0x40
>      ? rcu_read_lock_sched_held+0xa1/0xd0
>      ? rcu_read_lock_bh_held+0xb0/0xb0
>      up_write+0x2d/0x490
>      ? kfree+0x293/0x2f0
>      __do_sys_swapon+0x94b/0x3550
>      ? putname+0xb0/0xf0
>      ? kmem_cache_free+0x2e7/0x370
>      ? do_sys_open+0x184/0x3e0
>      ? generic_max_swapfile_size+0x40/0x40
>      ? do_syscall_64+0x27/0x4b0
>      ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
>      ? lockdep_hardirqs_on+0x38c/0x590
>      __x64_sys_swapon+0x54/0x80
>      do_syscall_64+0xa4/0x4b0
>      entry_SYSCALL_64_after_hwframe+0x49/0xbe
>     RIP: 0033:0x7f15da0a0dc7
> 
> Link: http://lkml.kernel.org/r/20200206090132.154869-1-naohiro.aota@wdc.com
> Fixes: 1638045c3677 ("mm: set S_SWAPFILE on blockdev swap devices")
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
> Cc: Darrick J. Wong <darrick.wong@oracle.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---

With this patch applied I can no longer reproduce the problem.

Tested-by: Qais Youef <qais.yousef@arm.com>

Thanks!

--
Qais Yousef

> 
>  mm/swapfile.c |   41 ++++++++++++++++++++---------------------
>  1 file changed, 20 insertions(+), 21 deletions(-)
> 
> --- a/mm/swapfile.c~mm-swap-move-inode_lock-out-of-claim_swapfile
> +++ a/mm/swapfile.c
> @@ -2899,10 +2899,6 @@ static int claim_swapfile(struct swap_in
>  		p->bdev = inode->i_sb->s_bdev;
>  	}
>  
> -	inode_lock(inode);
> -	if (IS_SWAPFILE(inode))
> -		return -EBUSY;
> -
>  	return 0;
>  }
>  
> @@ -3157,36 +3153,41 @@ SYSCALL_DEFINE2(swapon, const char __use
>  	mapping = swap_file->f_mapping;
>  	inode = mapping->host;
>  
> -	/* will take i_rwsem; */
>  	error = claim_swapfile(p, inode);
>  	if (unlikely(error))
>  		goto bad_swap;
>  
> +	inode_lock(inode);
> +	if (IS_SWAPFILE(inode)) {
> +		error = -EBUSY;
> +		goto bad_swap_unlock_inode;
> +	}
> +
>  	/*
>  	 * Read the swap header.
>  	 */
>  	if (!mapping->a_ops->readpage) {
>  		error = -EINVAL;
> -		goto bad_swap;
> +		goto bad_swap_unlock_inode;
>  	}
>  	page = read_mapping_page(mapping, 0, swap_file);
>  	if (IS_ERR(page)) {
>  		error = PTR_ERR(page);
> -		goto bad_swap;
> +		goto bad_swap_unlock_inode;
>  	}
>  	swap_header = kmap(page);
>  
>  	maxpages = read_swap_header(p, swap_header, inode);
>  	if (unlikely(!maxpages)) {
>  		error = -EINVAL;
> -		goto bad_swap;
> +		goto bad_swap_unlock_inode;
>  	}
>  
>  	/* OK, set up the swap map and apply the bad block list */
>  	swap_map = vzalloc(maxpages);
>  	if (!swap_map) {
>  		error = -ENOMEM;
> -		goto bad_swap;
> +		goto bad_swap_unlock_inode;
>  	}
>  
>  	if (bdi_cap_stable_pages_required(inode_to_bdi(inode)))
> @@ -3211,7 +3212,7 @@ SYSCALL_DEFINE2(swapon, const char __use
>  					GFP_KERNEL);
>  		if (!cluster_info) {
>  			error = -ENOMEM;
> -			goto bad_swap;
> +			goto bad_swap_unlock_inode;
>  		}
>  
>  		for (ci = 0; ci < nr_cluster; ci++)
> @@ -3220,7 +3221,7 @@ SYSCALL_DEFINE2(swapon, const char __use
>  		p->percpu_cluster = alloc_percpu(struct percpu_cluster);
>  		if (!p->percpu_cluster) {
>  			error = -ENOMEM;
> -			goto bad_swap;
> +			goto bad_swap_unlock_inode;
>  		}
>  		for_each_possible_cpu(cpu) {
>  			struct percpu_cluster *cluster;
> @@ -3234,13 +3235,13 @@ SYSCALL_DEFINE2(swapon, const char __use
>  
>  	error = swap_cgroup_swapon(p->type, maxpages);
>  	if (error)
> -		goto bad_swap;
> +		goto bad_swap_unlock_inode;
>  
>  	nr_extents = setup_swap_map_and_extents(p, swap_header, swap_map,
>  		cluster_info, maxpages, &span);
>  	if (unlikely(nr_extents < 0)) {
>  		error = nr_extents;
> -		goto bad_swap;
> +		goto bad_swap_unlock_inode;
>  	}
>  	/* frontswap enabled? set up bit-per-page map for frontswap */
>  	if (IS_ENABLED(CONFIG_FRONTSWAP))
> @@ -3280,7 +3281,7 @@ SYSCALL_DEFINE2(swapon, const char __use
>  
>  	error = init_swap_address_space(p->type, maxpages);
>  	if (error)
> -		goto bad_swap;
> +		goto bad_swap_unlock_inode;
>  
>  	/*
>  	 * Flush any pending IO and dirty mappings before we start using this
> @@ -3290,7 +3291,7 @@ SYSCALL_DEFINE2(swapon, const char __use
>  	error = inode_drain_writes(inode);
>  	if (error) {
>  		inode->i_flags &= ~S_SWAPFILE;
> -		goto bad_swap;
> +		goto bad_swap_unlock_inode;
>  	}
>  
>  	mutex_lock(&swapon_mutex);
> @@ -3315,6 +3316,8 @@ SYSCALL_DEFINE2(swapon, const char __use
>  
>  	error = 0;
>  	goto out;
> +bad_swap_unlock_inode:
> +	inode_unlock(inode);
>  bad_swap:
>  	free_percpu(p->percpu_cluster);
>  	p->percpu_cluster = NULL;
> @@ -3322,6 +3325,7 @@ bad_swap:
>  		set_blocksize(p->bdev, p->old_block_size);
>  		blkdev_put(p->bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
>  	}
> +	inode = NULL;
>  	destroy_swap_extents(p);
>  	swap_cgroup_swapoff(p->type);
>  	spin_lock(&swap_lock);
> @@ -3333,13 +3337,8 @@ bad_swap:
>  	kvfree(frontswap_map);
>  	if (inced_nr_rotate_swap)
>  		atomic_dec(&nr_rotate_swap);
> -	if (swap_file) {
> -		if (inode) {
> -			inode_unlock(inode);
> -			inode = NULL;
> -		}
> +	if (swap_file)
>  		filp_close(swap_file, NULL);
> -	}
>  out:
>  	if (page && !IS_ERR(page)) {
>  		kunmap(page);
> _
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: lockdep warning in sys_swapon
  2020-03-23 21:59       ` Naohiro Aota
@ 2020-03-25 16:08         ` Darrick J. Wong
  0 siblings, 0 replies; 7+ messages in thread
From: Darrick J. Wong @ 2020-03-25 16:08 UTC (permalink / raw)
  To: Naohiro Aota
  Cc: Andrew Morton, Qais Yousef, linux-mm, linux-kernel, Christoph Hellwig

On Tue, Mar 24, 2020 at 06:59:08AM +0900, Naohiro Aota wrote:
> On Mon, Mar 23, 2020 at 02:12:01PM -0700, Darrick J. Wong wrote:
> > On Mon, Mar 23, 2020 at 02:06:15PM -0700, Andrew Morton wrote:
> > > On Mon, 23 Mar 2020 15:30:45 +0000 Qais Yousef <qais.yousef@arm.com> wrote:
> > > 
> > > > On 03/23/20 15:17, Qais Yousef wrote:
> > > > > Hi
> > > > >
> > > > > I hit the following 2 warnings when running with LOCKDEP=y on arm64 platform
> > > > > (juno-r2), running on v5.6-rc6
> > > > >
> > > > > The 1st one is when I execute `swapon -a`. The 2nd one happens at boot. I have
> > > > > /dev/sda2 as my swap in /etc/fstab
> > > > >
> > > > > Note that I either hit 1 OR 2, but didn't hit both warnings at the same time,
> > > > > yet at least.
> > > > >
> > > > > /dev/sda2 is a usb flash drive, in case it matters somehow.
> > > >
> > > > By the way, I noticed that in claim_swapfile() if we fail we don't release the
> > > > lock. Shouldn't we release the lock here?
> > > >
> > > > I tried with that FWIW, but it had no effect on the warnings.
> > > >
> > > 
> > > I'll be sending the below into Linus this week.
> > > 
> > > I was hoping to hear from Darrick/Christoph (?) but it looks like the
> > > right thing to do.  Are you able to test it?
> > 
> > I had questions[1] about the patch, but nobody ever replied.
> 
> Sorry, I have overlooked that one. I'm replying here.
> 
> > Sorry to wander in late, but I don't see how we unlock the inode in the
> > success case.  Before this patch, the "if (inode) inode_unlock(inode);"
> > below out: would take care of this for both the success case and the
> > bad_swap case, but now that's gone, and AFAICT after this patch we only
> > unlock the inode when erroring out...
> > 
> > > +bad_swap_unlock_inode:
> > > +	inode_unlock(inode);
> > 
> > ...since we never goto bad_swap_unlock_inode when error == 0, correct?
> 
> Actually, this patch does not touch "if (inode) inode_unlock(inode);" below
> the "out:" label, so we still have that "inode_unlock" after this patch and
> it unlocks the inode in the success case.

Ah, right, I don't know what I was thinking when I wrote that. :)

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> 
> > 
> > --D
> > 
> > [1] https://lore.kernel.org/linux-fsdevel/20200212164956.GK6874@magnolia/
> > 
> > > I think I'll add a cc:stable to this one.
> > > 
> > > 
> > > 
> > > From: Naohiro Aota <naohiro.aota@wdc.com>
> > > Subject: mm/swapfile.c: move inode_lock out of claim_swapfile
> > > 
> > > claim_swapfile() currently keeps the inode locked when it is successful,
> > > or the file is already swapfile (with -EBUSY).  And, on the other error
> > > cases, it does not lock the inode.
> > > 
> > > This inconsistency of the lock state and return value is quite confusing
> > > and actually causing a bad unlock balance as below in the "bad_swap"
> > > section of __do_sys_swapon().
> > > 
> > > This commit fixes this issue by moving the inode_lock() and IS_SWAPFILE
> > > check out of claim_swapfile().  The inode is unlocked in
> > > "bad_swap_unlock_inode" section, so that the inode is ensured to be
> > > unlocked at "bad_swap".  Thus, error handling codes after the locking now
> > > jumps to "bad_swap_unlock_inode" instead of "bad_swap".
> > > 
> > >     =====================================
> > >     WARNING: bad unlock balance detected!
> > >     5.5.0-rc7+ #176 Not tainted
> > >     -------------------------------------
> > >     swapon/4294 is trying to release lock (&sb->s_type->i_mutex_key) at:
> > >     [<ffffffff8173a6eb>] __do_sys_swapon+0x94b/0x3550
> > >     but there are no more locks to release!
> > > 
> > >     other info that might help us debug this:
> > >     no locks held by swapon/4294.
> > > 
> > >     stack backtrace:
> > >     CPU: 5 PID: 4294 Comm: swapon Not tainted 5.5.0-rc7-BTRFS-ZNS+ #176
> > >     Hardware name: ASUS All Series/H87-PRO, BIOS 2102 07/29/2014
> > >     Call Trace:
> > >      dump_stack+0xa1/0xea
> > >      ? __do_sys_swapon+0x94b/0x3550
> > >      print_unlock_imbalance_bug.cold+0x114/0x123
> > >      ? __do_sys_swapon+0x94b/0x3550
> > >      lock_release+0x562/0xed0
> > >      ? kvfree+0x31/0x40
> > >      ? lock_downgrade+0x770/0x770
> > >      ? kvfree+0x31/0x40
> > >      ? rcu_read_lock_sched_held+0xa1/0xd0
> > >      ? rcu_read_lock_bh_held+0xb0/0xb0
> > >      up_write+0x2d/0x490
> > >      ? kfree+0x293/0x2f0
> > >      __do_sys_swapon+0x94b/0x3550
> > >      ? putname+0xb0/0xf0
> > >      ? kmem_cache_free+0x2e7/0x370
> > >      ? do_sys_open+0x184/0x3e0
> > >      ? generic_max_swapfile_size+0x40/0x40
> > >      ? do_syscall_64+0x27/0x4b0
> > >      ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > >      ? lockdep_hardirqs_on+0x38c/0x590
> > >      __x64_sys_swapon+0x54/0x80
> > >      do_syscall_64+0xa4/0x4b0
> > >      entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > >     RIP: 0033:0x7f15da0a0dc7
> > > 
> > > Link: http://lkml.kernel.org/r/20200206090132.154869-1-naohiro.aota@wdc.com
> > > Fixes: 1638045c3677 ("mm: set S_SWAPFILE on blockdev swap devices")
> > > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> > > Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
> > > Cc: Darrick J. Wong <darrick.wong@oracle.com>
> > > Cc: Christoph Hellwig <hch@infradead.org>
> > > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> > > ---
> > > 
> > >  mm/swapfile.c |   41 ++++++++++++++++++++---------------------
> > >  1 file changed, 20 insertions(+), 21 deletions(-)
> > > 
> > > --- a/mm/swapfile.c~mm-swap-move-inode_lock-out-of-claim_swapfile
> > > +++ a/mm/swapfile.c
> > > @@ -2899,10 +2899,6 @@ static int claim_swapfile(struct swap_in
> > >  		p->bdev = inode->i_sb->s_bdev;
> > >  	}
> > > 
> > > -	inode_lock(inode);
> > > -	if (IS_SWAPFILE(inode))
> > > -		return -EBUSY;
> > > -
> > >  	return 0;
> > >  }
> > > 
> > > @@ -3157,36 +3153,41 @@ SYSCALL_DEFINE2(swapon, const char __use
> > >  	mapping = swap_file->f_mapping;
> > >  	inode = mapping->host;
> > > 
> > > -	/* will take i_rwsem; */
> > >  	error = claim_swapfile(p, inode);
> > >  	if (unlikely(error))
> > >  		goto bad_swap;
> > > 
> > > +	inode_lock(inode);
> > > +	if (IS_SWAPFILE(inode)) {
> > > +		error = -EBUSY;
> > > +		goto bad_swap_unlock_inode;
> > > +	}
> > > +
> > >  	/*
> > >  	 * Read the swap header.
> > >  	 */
> > >  	if (!mapping->a_ops->readpage) {
> > >  		error = -EINVAL;
> > > -		goto bad_swap;
> > > +		goto bad_swap_unlock_inode;
> > >  	}
> > >  	page = read_mapping_page(mapping, 0, swap_file);
> > >  	if (IS_ERR(page)) {
> > >  		error = PTR_ERR(page);
> > > -		goto bad_swap;
> > > +		goto bad_swap_unlock_inode;
> > >  	}
> > >  	swap_header = kmap(page);
> > > 
> > >  	maxpages = read_swap_header(p, swap_header, inode);
> > >  	if (unlikely(!maxpages)) {
> > >  		error = -EINVAL;
> > > -		goto bad_swap;
> > > +		goto bad_swap_unlock_inode;
> > >  	}
> > > 
> > >  	/* OK, set up the swap map and apply the bad block list */
> > >  	swap_map = vzalloc(maxpages);
> > >  	if (!swap_map) {
> > >  		error = -ENOMEM;
> > > -		goto bad_swap;
> > > +		goto bad_swap_unlock_inode;
> > >  	}
> > > 
> > >  	if (bdi_cap_stable_pages_required(inode_to_bdi(inode)))
> > > @@ -3211,7 +3212,7 @@ SYSCALL_DEFINE2(swapon, const char __use
> > >  					GFP_KERNEL);
> > >  		if (!cluster_info) {
> > >  			error = -ENOMEM;
> > > -			goto bad_swap;
> > > +			goto bad_swap_unlock_inode;
> > >  		}
> > > 
> > >  		for (ci = 0; ci < nr_cluster; ci++)
> > > @@ -3220,7 +3221,7 @@ SYSCALL_DEFINE2(swapon, const char __use
> > >  		p->percpu_cluster = alloc_percpu(struct percpu_cluster);
> > >  		if (!p->percpu_cluster) {
> > >  			error = -ENOMEM;
> > > -			goto bad_swap;
> > > +			goto bad_swap_unlock_inode;
> > >  		}
> > >  		for_each_possible_cpu(cpu) {
> > >  			struct percpu_cluster *cluster;
> > > @@ -3234,13 +3235,13 @@ SYSCALL_DEFINE2(swapon, const char __use
> > > 
> > >  	error = swap_cgroup_swapon(p->type, maxpages);
> > >  	if (error)
> > > -		goto bad_swap;
> > > +		goto bad_swap_unlock_inode;
> > > 
> > >  	nr_extents = setup_swap_map_and_extents(p, swap_header, swap_map,
> > >  		cluster_info, maxpages, &span);
> > >  	if (unlikely(nr_extents < 0)) {
> > >  		error = nr_extents;
> > > -		goto bad_swap;
> > > +		goto bad_swap_unlock_inode;
> > >  	}
> > >  	/* frontswap enabled? set up bit-per-page map for frontswap */
> > >  	if (IS_ENABLED(CONFIG_FRONTSWAP))
> > > @@ -3280,7 +3281,7 @@ SYSCALL_DEFINE2(swapon, const char __use
> > > 
> > >  	error = init_swap_address_space(p->type, maxpages);
> > >  	if (error)
> > > -		goto bad_swap;
> > > +		goto bad_swap_unlock_inode;
> > > 
> > >  	/*
> > >  	 * Flush any pending IO and dirty mappings before we start using this
> > > @@ -3290,7 +3291,7 @@ SYSCALL_DEFINE2(swapon, const char __use
> > >  	error = inode_drain_writes(inode);
> > >  	if (error) {
> > >  		inode->i_flags &= ~S_SWAPFILE;
> > > -		goto bad_swap;
> > > +		goto bad_swap_unlock_inode;
> > >  	}
> > > 
> > >  	mutex_lock(&swapon_mutex);
> > > @@ -3315,6 +3316,8 @@ SYSCALL_DEFINE2(swapon, const char __use
> > > 
> > >  	error = 0;
> > >  	goto out;
> > > +bad_swap_unlock_inode:
> > > +	inode_unlock(inode);
> > >  bad_swap:
> > >  	free_percpu(p->percpu_cluster);
> > >  	p->percpu_cluster = NULL;
> > > @@ -3322,6 +3325,7 @@ bad_swap:
> > >  		set_blocksize(p->bdev, p->old_block_size);
> > >  		blkdev_put(p->bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
> > >  	}
> > > +	inode = NULL;
> > >  	destroy_swap_extents(p);
> > >  	swap_cgroup_swapoff(p->type);
> > >  	spin_lock(&swap_lock);
> > > @@ -3333,13 +3337,8 @@ bad_swap:
> > >  	kvfree(frontswap_map);
> > >  	if (inced_nr_rotate_swap)
> > >  		atomic_dec(&nr_rotate_swap);
> > > -	if (swap_file) {
> > > -		if (inode) {
> > > -			inode_unlock(inode);
> > > -			inode = NULL;
> > > -		}
> > > +	if (swap_file)
> > >  		filp_close(swap_file, NULL);
> > > -	}
> > >  out:
> > >  	if (page && !IS_ERR(page)) {
> > >  		kunmap(page);
> > > _
> > > 


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-03-25 16:11 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-23 15:17 lockdep warning in sys_swapon Qais Yousef
2020-03-23 15:30 ` Qais Yousef
2020-03-23 21:06   ` Andrew Morton
2020-03-23 21:12     ` Darrick J. Wong
2020-03-23 21:59       ` Naohiro Aota
2020-03-25 16:08         ` Darrick J. Wong
2020-03-24 10:51     ` Qais Yousef

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).