* lockdep warning in sys_swapon @ 2020-03-23 15:17 Qais Yousef 2020-03-23 15:30 ` Qais Yousef 0 siblings, 1 reply; 7+ messages in thread From: Qais Yousef @ 2020-03-23 15:17 UTC (permalink / raw) To: Andrew Morton, linux-mm; +Cc: linux-kernel Hi I hit the following 2 warnings when running with LOCKDEP=y on arm64 platform (juno-r2), running on v5.6-rc6 The 1st one is when I execute `swapon -a`. The 2nd one happens at boot. I have /dev/sda2 as my swap in /etc/fstab Note that I either hit 1 OR 2, but didn't hit both warnings at the same time, yet at least. /dev/sda2 is a usb flash drive, in case it matters somehow. ### WARNING 1 ### # swapon -a [ 1682.788988] [ 1682.790503] ===================================== [ 1682.795226] WARNING: bad unlock balance detected! [ 1682.799951] 5.6.0-rc6 #533 Not tainted [ 1682.803714] ------------------------------------- [ 1682.808437] swapon/437 is trying to release lock (&sb->s_type->i_mutex_key) at: [ 1682.815797] [<ffff80001030c07c>] __arm64_sys_swapon+0x3fc/0x1140 [ 1682.821828] but there are no more locks to release! [ 1682.826724] [ 1682.826724] other info that might help us debug this: [ 1682.833280] no locks held by swapon/437. [ 1682.837216] [ 1682.837216] stack backtrace: [ 1682.841596] CPU: 4 PID: 437 Comm: swapon Not tainted 5.6.0-rc6 #533 [ 1682.847889] Hardware name: ARM Juno development board (r2) (DT) [ 1682.853834] Call trace: [ 1682.856293] dump_backtrace+0x0/0x1a8 [ 1682.859971] show_stack+0x24/0x30 [ 1682.863302] dump_stack+0xe8/0x150 [ 1682.866722] print_unlock_imbalance_bug+0xe4/0xe8 [ 1682.871447] lock_release+0x274/0x350 [ 1682.875128] up_write+0x30/0x190 [ 1682.878371] __arm64_sys_swapon+0x3fc/0x1140 [ 1682.882662] el0_svc_common.constprop.2+0x7c/0x178 [ 1682.887475] do_el0_svc+0x34/0xa0 [ 1682.890805] el0_sync_handler+0x114/0x1d0 [ 1682.894832] el0_sync+0x140/0x180 [ 1682.898233] ------------[ cut here ]------------ [ 1682.902936] DEBUG_RWSEMS_WARN_ON((rwsem_owner(sem) != current) && !rwsem_test_oflags(sem, RWSEM_NONSPINNABLE)): count = 0x0, magic = 0xffff000971018270, owner = 0x0, curr 0xffff0009752fc680, list empty [ 1682.921045] WARNING: CPU: 4 PID: 437 at kernel/locking/rwsem.c:1459 up_write+0x134/0x190 [ 1682.929203] Modules linked in: [ 1682.932288] CPU: 4 PID: 437 Comm: swapon Not tainted 5.6.0-rc6 #533 [ 1682.938604] Hardware name: ARM Juno development board (r2) (DT) [ 1682.944574] pstate: 60000005 (nZCv daif -PAN -UAO) [ 1682.949407] pc : up_write+0x134/0x190 [ 1682.953102] lr : up_write+0x134/0x190 [ 1682.956793] sp : ffff800018e1bd30 [ 1682.960135] x29: ffff800018e1bd30 x28: ffff000970fb4000 [ 1682.965494] x27: ffff0009750bd000 x26: 00000000fffffff0 [ 1682.970852] x25: 0000000000000000 x24: fffffffffffffff0 [ 1682.976211] x23: ffff000971439200 x22: ffff80001323d000 [ 1682.981569] x21: ffff800013241000 x20: ffff80001030c07c [ 1682.986927] x19: ffff000971018270 x18: ffffffffffffffff [ 1682.992285] x17: 0000000000000000 x16: 0000000000000000 [ 1682.997643] x15: ffff80001323da88 x14: 0720072007200720 [ 1683.003002] x13: 0720072007200720 x12: 0720072007200720 [ 1683.008360] x11: 0720072007200720 x10: 0000000000000000 [ 1683.013718] x9 : ffff80001323da88 x8 : ffff800013267808 [ 1683.019076] x7 : ffff80001019f77c x6 : ffff800018e1b9f0 [ 1683.024434] x5 : 0000000000000001 x4 : 0000000000000001 [ 1683.029791] x3 : 775f041afc1e3200 x2 : ffff800013267860 [ 1683.035149] x1 : 775f041afc1e3200 x0 : 0000000000000000 [ 1683.040507] Call trace: [ 1683.042977] up_write+0x134/0x190 [ 1683.046323] __arm64_sys_swapon+0x3fc/0x1140 [ 1683.050632] el0_svc_common.constprop.2+0x7c/0x178 [ 1683.055464] do_el0_svc+0x34/0xa0 [ 1683.058810] el0_sync_handler+0x114/0x1d0 [ 1683.062854] el0_sync+0x140/0x180 [ 1683.066196] irq event stamp: 1095 [ 1683.069543] hardirqs last enabled at (1095): [<ffff80001191b724>] _raw_spin_unlock_irqrestore+0x7c/0x88 [ 1683.079104] hardirqs last disabled at (1094): [<ffff80001191b4c0>] _raw_spin_lock_irqsave+0x38/0x88 [ 1683.088226] softirqs last enabled at (996): [<ffff8000100818a4>] __do_softirq+0x4bc/0x568 [ 1683.096562] softirqs last disabled at (985): [<ffff800010114064>] irq_exit+0x144/0x150 [ 1683.104545] ---[ end trace 737bf4f253f20ead ]--- ### WARNING 2 ### [ 15.502583] Unable to find swap-space signature [ 15.507496] [ 15.508996] ====================================================== [ 15.515202] WARNING: possible circular locking dependency detected [ 15.521411] 5.6.0-rc6 #533 Not tainted [ 15.525173] ------------------------------------------------------ [ 15.531378] swapon/321 is trying to acquire lock: [ 15.536102] ffff000971020078 (&bdev->bd_mutex){+.+.}, at: blkdev_put+0x30/0x120 [ 15.543460] [ 15.543460] but task is already holding lock: [ 15.549317] ffff0009710202d8 (&sb->s_type->i_mutex_key#7){+.+.}, at: __arm64_sys_swapon+0x2cc/0x1140 [ 15.558505] [ 15.558505] which lock already depends on the new lock. [ 15.558505] [ 15.566719] [ 15.566719] the existing dependency chain (in reverse order) is: [ 15.574234] [ 15.574234] -> #1 (&sb->s_type->i_mutex_key#7){+.+.}: [ 15.580802] down_write+0x50/0xb0 [ 15.584656] __blkdev_get+0x2a4/0x4f8 [ 15.588859] blkdev_get+0xfc/0x190 [ 15.592800] __device_add_disk+0x470/0x4b8 [ 15.597437] device_add_disk+0x38/0x48 [ 15.601729] sd_probe+0x350/0x488 [ 15.605585] really_probe+0x10c/0x358 [ 15.609788] driver_probe_device+0x5c/0x100 [ 15.614514] __device_attach_driver+0x98/0xb8 [ 15.619415] bus_for_each_drv+0x74/0xd8 [ 15.623792] __device_attach_async_helper+0xc8/0xf0 [ 15.629217] async_run_entry_fn+0x48/0x148 [ 15.633857] process_one_work+0x2a4/0x748 [ 15.638407] worker_thread+0x48/0x498 [ 15.642609] kthread+0x13c/0x140 [ 15.646376] ret_from_fork+0x10/0x18 [ 15.650487] [ 15.650487] -> #0 (&bdev->bd_mutex){+.+.}: [ 15.656093] __lock_acquire+0x1008/0x1450 [ 15.660645] lock_acquire+0xf8/0x280 [ 15.664759] __mutex_lock+0xac/0x920 [ 15.668873] mutex_lock_nested+0x3c/0x50 [ 15.673337] blkdev_put+0x30/0x120 [ 15.677278] __arm64_sys_swapon+0x5a8/0x1140 [ 15.682093] el0_svc_common.constprop.2+0x7c/0x178 [ 15.687429] do_el0_svc+0x34/0xa0 [ 15.691282] el0_sync_handler+0x114/0x1d0 [ 15.695833] el0_sync+0x140/0x180 [ 15.699683] [ 15.699683] other info that might help us debug this: [ 15.699683] [ 15.707721] Possible unsafe locking scenario: [ 15.707721] [ 15.713665] CPU0 CPU1 [ 15.718213] ---- ---- [ 15.722759] lock(&sb->s_type->i_mutex_key#7); [ 15.727311] lock(&bdev->bd_mutex); [ 15.733430] lock(&sb->s_type->i_mutex_key#7); [ 15.740510] lock(&bdev->bd_mutex); [ 15.744100] [ 15.744100] *** DEADLOCK *** [ 15.744100] [ 15.750048] 1 lock held by swapon/321: [ 15.753810] #0: ffff0009710202d8 (&sb->s_type->i_mutex_key#7){+.+.}, at: __arm64_sys_swapon+0x2cc/0x1140 [ 15.763432] [ 15.763432] stack backtrace: [ 15.767811] CPU: 3 PID: 321 Comm: swapon Not tainted 5.6.0-rc6 #533 [ 15.774104] Hardware name: ARM Juno development board (r2) (DT) [ 15.780048] Call trace: [ 15.782505] dump_backtrace+0x0/0x1a8 [ 15.786182] show_stack+0x24/0x30 [ 15.789512] dump_stack+0xe8/0x150 [ 15.792928] print_circular_bug.isra.39+0x1b8/0x210 [ 15.797826] check_noncircular+0x178/0x1e0 [ 15.801940] __lock_acquire+0x1008/0x1450 [ 15.805967] lock_acquire+0xf8/0x280 [ 15.809558] __mutex_lock+0xac/0x920 [ 15.813149] mutex_lock_nested+0x3c/0x50 [ 15.817089] blkdev_put+0x30/0x120 [ 15.820507] __arm64_sys_swapon+0x5a8/0x1140 [ 15.824797] el0_svc_common.constprop.2+0x7c/0x178 [ 15.829609] do_el0_svc+0x34/0xa0 [ 15.832938] el0_sync_handler+0x114/0x1d0 [ 15.836964] el0_sync+0x140/0x180 swapon: /dev/sda2: Invalid argument Thanks -- Qais Yousef ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: lockdep warning in sys_swapon 2020-03-23 15:17 lockdep warning in sys_swapon Qais Yousef @ 2020-03-23 15:30 ` Qais Yousef 2020-03-23 21:06 ` Andrew Morton 0 siblings, 1 reply; 7+ messages in thread From: Qais Yousef @ 2020-03-23 15:30 UTC (permalink / raw) To: Andrew Morton, linux-mm; +Cc: linux-kernel On 03/23/20 15:17, Qais Yousef wrote: > Hi > > I hit the following 2 warnings when running with LOCKDEP=y on arm64 platform > (juno-r2), running on v5.6-rc6 > > The 1st one is when I execute `swapon -a`. The 2nd one happens at boot. I have > /dev/sda2 as my swap in /etc/fstab > > Note that I either hit 1 OR 2, but didn't hit both warnings at the same time, > yet at least. > > /dev/sda2 is a usb flash drive, in case it matters somehow. By the way, I noticed that in claim_swapfile() if we fail we don't release the lock. Shouldn't we release the lock here? I tried with that FWIW, but it had no effect on the warnings. diff --git a/mm/swapfile.c b/mm/swapfile.c index b2a2e45c9a36..0293e8c7b472 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -2900,8 +2900,10 @@ static int claim_swapfile(struct swap_info_struct *p, struct inode *inode) } inode_lock(inode); - if (IS_SWAPFILE(inode)) + if (IS_SWAPFILE(inode)) { + inode_unlock(inode); return -EBUSY; + } return 0; } Thanks -- Qais Yousef ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: lockdep warning in sys_swapon 2020-03-23 15:30 ` Qais Yousef @ 2020-03-23 21:06 ` Andrew Morton 2020-03-23 21:12 ` Darrick J. Wong 2020-03-24 10:51 ` Qais Yousef 0 siblings, 2 replies; 7+ messages in thread From: Andrew Morton @ 2020-03-23 21:06 UTC (permalink / raw) To: Qais Yousef Cc: linux-mm, linux-kernel, Naohiro Aota, Darrick J. Wong, Christoph Hellwig On Mon, 23 Mar 2020 15:30:45 +0000 Qais Yousef <qais.yousef@arm.com> wrote: > On 03/23/20 15:17, Qais Yousef wrote: > > Hi > > > > I hit the following 2 warnings when running with LOCKDEP=y on arm64 platform > > (juno-r2), running on v5.6-rc6 > > > > The 1st one is when I execute `swapon -a`. The 2nd one happens at boot. I have > > /dev/sda2 as my swap in /etc/fstab > > > > Note that I either hit 1 OR 2, but didn't hit both warnings at the same time, > > yet at least. > > > > /dev/sda2 is a usb flash drive, in case it matters somehow. > > By the way, I noticed that in claim_swapfile() if we fail we don't release the > lock. Shouldn't we release the lock here? > > I tried with that FWIW, but it had no effect on the warnings. > I'll be sending the below into Linus this week. I was hoping to hear from Darrick/Christoph (?) but it looks like the right thing to do. Are you able to test it? I think I'll add a cc:stable to this one. From: Naohiro Aota <naohiro.aota@wdc.com> Subject: mm/swapfile.c: move inode_lock out of claim_swapfile claim_swapfile() currently keeps the inode locked when it is successful, or the file is already swapfile (with -EBUSY). And, on the other error cases, it does not lock the inode. This inconsistency of the lock state and return value is quite confusing and actually causing a bad unlock balance as below in the "bad_swap" section of __do_sys_swapon(). This commit fixes this issue by moving the inode_lock() and IS_SWAPFILE check out of claim_swapfile(). The inode is unlocked in "bad_swap_unlock_inode" section, so that the inode is ensured to be unlocked at "bad_swap". Thus, error handling codes after the locking now jumps to "bad_swap_unlock_inode" instead of "bad_swap". ===================================== WARNING: bad unlock balance detected! 5.5.0-rc7+ #176 Not tainted ------------------------------------- swapon/4294 is trying to release lock (&sb->s_type->i_mutex_key) at: [<ffffffff8173a6eb>] __do_sys_swapon+0x94b/0x3550 but there are no more locks to release! other info that might help us debug this: no locks held by swapon/4294. stack backtrace: CPU: 5 PID: 4294 Comm: swapon Not tainted 5.5.0-rc7-BTRFS-ZNS+ #176 Hardware name: ASUS All Series/H87-PRO, BIOS 2102 07/29/2014 Call Trace: dump_stack+0xa1/0xea ? __do_sys_swapon+0x94b/0x3550 print_unlock_imbalance_bug.cold+0x114/0x123 ? __do_sys_swapon+0x94b/0x3550 lock_release+0x562/0xed0 ? kvfree+0x31/0x40 ? lock_downgrade+0x770/0x770 ? kvfree+0x31/0x40 ? rcu_read_lock_sched_held+0xa1/0xd0 ? rcu_read_lock_bh_held+0xb0/0xb0 up_write+0x2d/0x490 ? kfree+0x293/0x2f0 __do_sys_swapon+0x94b/0x3550 ? putname+0xb0/0xf0 ? kmem_cache_free+0x2e7/0x370 ? do_sys_open+0x184/0x3e0 ? generic_max_swapfile_size+0x40/0x40 ? do_syscall_64+0x27/0x4b0 ? entry_SYSCALL_64_after_hwframe+0x49/0xbe ? lockdep_hardirqs_on+0x38c/0x590 __x64_sys_swapon+0x54/0x80 do_syscall_64+0xa4/0x4b0 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7f15da0a0dc7 Link: http://lkml.kernel.org/r/20200206090132.154869-1-naohiro.aota@wdc.com Fixes: 1638045c3677 ("mm: set S_SWAPFILE on blockdev swap devices") Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/swapfile.c | 41 ++++++++++++++++++++--------------------- 1 file changed, 20 insertions(+), 21 deletions(-) --- a/mm/swapfile.c~mm-swap-move-inode_lock-out-of-claim_swapfile +++ a/mm/swapfile.c @@ -2899,10 +2899,6 @@ static int claim_swapfile(struct swap_in p->bdev = inode->i_sb->s_bdev; } - inode_lock(inode); - if (IS_SWAPFILE(inode)) - return -EBUSY; - return 0; } @@ -3157,36 +3153,41 @@ SYSCALL_DEFINE2(swapon, const char __use mapping = swap_file->f_mapping; inode = mapping->host; - /* will take i_rwsem; */ error = claim_swapfile(p, inode); if (unlikely(error)) goto bad_swap; + inode_lock(inode); + if (IS_SWAPFILE(inode)) { + error = -EBUSY; + goto bad_swap_unlock_inode; + } + /* * Read the swap header. */ if (!mapping->a_ops->readpage) { error = -EINVAL; - goto bad_swap; + goto bad_swap_unlock_inode; } page = read_mapping_page(mapping, 0, swap_file); if (IS_ERR(page)) { error = PTR_ERR(page); - goto bad_swap; + goto bad_swap_unlock_inode; } swap_header = kmap(page); maxpages = read_swap_header(p, swap_header, inode); if (unlikely(!maxpages)) { error = -EINVAL; - goto bad_swap; + goto bad_swap_unlock_inode; } /* OK, set up the swap map and apply the bad block list */ swap_map = vzalloc(maxpages); if (!swap_map) { error = -ENOMEM; - goto bad_swap; + goto bad_swap_unlock_inode; } if (bdi_cap_stable_pages_required(inode_to_bdi(inode))) @@ -3211,7 +3212,7 @@ SYSCALL_DEFINE2(swapon, const char __use GFP_KERNEL); if (!cluster_info) { error = -ENOMEM; - goto bad_swap; + goto bad_swap_unlock_inode; } for (ci = 0; ci < nr_cluster; ci++) @@ -3220,7 +3221,7 @@ SYSCALL_DEFINE2(swapon, const char __use p->percpu_cluster = alloc_percpu(struct percpu_cluster); if (!p->percpu_cluster) { error = -ENOMEM; - goto bad_swap; + goto bad_swap_unlock_inode; } for_each_possible_cpu(cpu) { struct percpu_cluster *cluster; @@ -3234,13 +3235,13 @@ SYSCALL_DEFINE2(swapon, const char __use error = swap_cgroup_swapon(p->type, maxpages); if (error) - goto bad_swap; + goto bad_swap_unlock_inode; nr_extents = setup_swap_map_and_extents(p, swap_header, swap_map, cluster_info, maxpages, &span); if (unlikely(nr_extents < 0)) { error = nr_extents; - goto bad_swap; + goto bad_swap_unlock_inode; } /* frontswap enabled? set up bit-per-page map for frontswap */ if (IS_ENABLED(CONFIG_FRONTSWAP)) @@ -3280,7 +3281,7 @@ SYSCALL_DEFINE2(swapon, const char __use error = init_swap_address_space(p->type, maxpages); if (error) - goto bad_swap; + goto bad_swap_unlock_inode; /* * Flush any pending IO and dirty mappings before we start using this @@ -3290,7 +3291,7 @@ SYSCALL_DEFINE2(swapon, const char __use error = inode_drain_writes(inode); if (error) { inode->i_flags &= ~S_SWAPFILE; - goto bad_swap; + goto bad_swap_unlock_inode; } mutex_lock(&swapon_mutex); @@ -3315,6 +3316,8 @@ SYSCALL_DEFINE2(swapon, const char __use error = 0; goto out; +bad_swap_unlock_inode: + inode_unlock(inode); bad_swap: free_percpu(p->percpu_cluster); p->percpu_cluster = NULL; @@ -3322,6 +3325,7 @@ bad_swap: set_blocksize(p->bdev, p->old_block_size); blkdev_put(p->bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL); } + inode = NULL; destroy_swap_extents(p); swap_cgroup_swapoff(p->type); spin_lock(&swap_lock); @@ -3333,13 +3337,8 @@ bad_swap: kvfree(frontswap_map); if (inced_nr_rotate_swap) atomic_dec(&nr_rotate_swap); - if (swap_file) { - if (inode) { - inode_unlock(inode); - inode = NULL; - } + if (swap_file) filp_close(swap_file, NULL); - } out: if (page && !IS_ERR(page)) { kunmap(page); _ ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: lockdep warning in sys_swapon 2020-03-23 21:06 ` Andrew Morton @ 2020-03-23 21:12 ` Darrick J. Wong 2020-03-23 21:59 ` Naohiro Aota 2020-03-24 10:51 ` Qais Yousef 1 sibling, 1 reply; 7+ messages in thread From: Darrick J. Wong @ 2020-03-23 21:12 UTC (permalink / raw) To: Andrew Morton Cc: Qais Yousef, linux-mm, linux-kernel, Naohiro Aota, Christoph Hellwig On Mon, Mar 23, 2020 at 02:06:15PM -0700, Andrew Morton wrote: > On Mon, 23 Mar 2020 15:30:45 +0000 Qais Yousef <qais.yousef@arm.com> wrote: > > > On 03/23/20 15:17, Qais Yousef wrote: > > > Hi > > > > > > I hit the following 2 warnings when running with LOCKDEP=y on arm64 platform > > > (juno-r2), running on v5.6-rc6 > > > > > > The 1st one is when I execute `swapon -a`. The 2nd one happens at boot. I have > > > /dev/sda2 as my swap in /etc/fstab > > > > > > Note that I either hit 1 OR 2, but didn't hit both warnings at the same time, > > > yet at least. > > > > > > /dev/sda2 is a usb flash drive, in case it matters somehow. > > > > By the way, I noticed that in claim_swapfile() if we fail we don't release the > > lock. Shouldn't we release the lock here? > > > > I tried with that FWIW, but it had no effect on the warnings. > > > > I'll be sending the below into Linus this week. > > I was hoping to hear from Darrick/Christoph (?) but it looks like the > right thing to do. Are you able to test it? I had questions[1] about the patch, but nobody ever replied. --D [1] https://lore.kernel.org/linux-fsdevel/20200212164956.GK6874@magnolia/ > I think I'll add a cc:stable to this one. > > > > From: Naohiro Aota <naohiro.aota@wdc.com> > Subject: mm/swapfile.c: move inode_lock out of claim_swapfile > > claim_swapfile() currently keeps the inode locked when it is successful, > or the file is already swapfile (with -EBUSY). And, on the other error > cases, it does not lock the inode. > > This inconsistency of the lock state and return value is quite confusing > and actually causing a bad unlock balance as below in the "bad_swap" > section of __do_sys_swapon(). > > This commit fixes this issue by moving the inode_lock() and IS_SWAPFILE > check out of claim_swapfile(). The inode is unlocked in > "bad_swap_unlock_inode" section, so that the inode is ensured to be > unlocked at "bad_swap". Thus, error handling codes after the locking now > jumps to "bad_swap_unlock_inode" instead of "bad_swap". > > ===================================== > WARNING: bad unlock balance detected! > 5.5.0-rc7+ #176 Not tainted > ------------------------------------- > swapon/4294 is trying to release lock (&sb->s_type->i_mutex_key) at: > [<ffffffff8173a6eb>] __do_sys_swapon+0x94b/0x3550 > but there are no more locks to release! > > other info that might help us debug this: > no locks held by swapon/4294. > > stack backtrace: > CPU: 5 PID: 4294 Comm: swapon Not tainted 5.5.0-rc7-BTRFS-ZNS+ #176 > Hardware name: ASUS All Series/H87-PRO, BIOS 2102 07/29/2014 > Call Trace: > dump_stack+0xa1/0xea > ? __do_sys_swapon+0x94b/0x3550 > print_unlock_imbalance_bug.cold+0x114/0x123 > ? __do_sys_swapon+0x94b/0x3550 > lock_release+0x562/0xed0 > ? kvfree+0x31/0x40 > ? lock_downgrade+0x770/0x770 > ? kvfree+0x31/0x40 > ? rcu_read_lock_sched_held+0xa1/0xd0 > ? rcu_read_lock_bh_held+0xb0/0xb0 > up_write+0x2d/0x490 > ? kfree+0x293/0x2f0 > __do_sys_swapon+0x94b/0x3550 > ? putname+0xb0/0xf0 > ? kmem_cache_free+0x2e7/0x370 > ? do_sys_open+0x184/0x3e0 > ? generic_max_swapfile_size+0x40/0x40 > ? do_syscall_64+0x27/0x4b0 > ? entry_SYSCALL_64_after_hwframe+0x49/0xbe > ? lockdep_hardirqs_on+0x38c/0x590 > __x64_sys_swapon+0x54/0x80 > do_syscall_64+0xa4/0x4b0 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > RIP: 0033:0x7f15da0a0dc7 > > Link: http://lkml.kernel.org/r/20200206090132.154869-1-naohiro.aota@wdc.com > Fixes: 1638045c3677 ("mm: set S_SWAPFILE on blockdev swap devices") > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> > Reviewed-by: Andrew Morton <akpm@linux-foundation.org> > Cc: Darrick J. Wong <darrick.wong@oracle.com> > Cc: Christoph Hellwig <hch@infradead.org> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org> > --- > > mm/swapfile.c | 41 ++++++++++++++++++++--------------------- > 1 file changed, 20 insertions(+), 21 deletions(-) > > --- a/mm/swapfile.c~mm-swap-move-inode_lock-out-of-claim_swapfile > +++ a/mm/swapfile.c > @@ -2899,10 +2899,6 @@ static int claim_swapfile(struct swap_in > p->bdev = inode->i_sb->s_bdev; > } > > - inode_lock(inode); > - if (IS_SWAPFILE(inode)) > - return -EBUSY; > - > return 0; > } > > @@ -3157,36 +3153,41 @@ SYSCALL_DEFINE2(swapon, const char __use > mapping = swap_file->f_mapping; > inode = mapping->host; > > - /* will take i_rwsem; */ > error = claim_swapfile(p, inode); > if (unlikely(error)) > goto bad_swap; > > + inode_lock(inode); > + if (IS_SWAPFILE(inode)) { > + error = -EBUSY; > + goto bad_swap_unlock_inode; > + } > + > /* > * Read the swap header. > */ > if (!mapping->a_ops->readpage) { > error = -EINVAL; > - goto bad_swap; > + goto bad_swap_unlock_inode; > } > page = read_mapping_page(mapping, 0, swap_file); > if (IS_ERR(page)) { > error = PTR_ERR(page); > - goto bad_swap; > + goto bad_swap_unlock_inode; > } > swap_header = kmap(page); > > maxpages = read_swap_header(p, swap_header, inode); > if (unlikely(!maxpages)) { > error = -EINVAL; > - goto bad_swap; > + goto bad_swap_unlock_inode; > } > > /* OK, set up the swap map and apply the bad block list */ > swap_map = vzalloc(maxpages); > if (!swap_map) { > error = -ENOMEM; > - goto bad_swap; > + goto bad_swap_unlock_inode; > } > > if (bdi_cap_stable_pages_required(inode_to_bdi(inode))) > @@ -3211,7 +3212,7 @@ SYSCALL_DEFINE2(swapon, const char __use > GFP_KERNEL); > if (!cluster_info) { > error = -ENOMEM; > - goto bad_swap; > + goto bad_swap_unlock_inode; > } > > for (ci = 0; ci < nr_cluster; ci++) > @@ -3220,7 +3221,7 @@ SYSCALL_DEFINE2(swapon, const char __use > p->percpu_cluster = alloc_percpu(struct percpu_cluster); > if (!p->percpu_cluster) { > error = -ENOMEM; > - goto bad_swap; > + goto bad_swap_unlock_inode; > } > for_each_possible_cpu(cpu) { > struct percpu_cluster *cluster; > @@ -3234,13 +3235,13 @@ SYSCALL_DEFINE2(swapon, const char __use > > error = swap_cgroup_swapon(p->type, maxpages); > if (error) > - goto bad_swap; > + goto bad_swap_unlock_inode; > > nr_extents = setup_swap_map_and_extents(p, swap_header, swap_map, > cluster_info, maxpages, &span); > if (unlikely(nr_extents < 0)) { > error = nr_extents; > - goto bad_swap; > + goto bad_swap_unlock_inode; > } > /* frontswap enabled? set up bit-per-page map for frontswap */ > if (IS_ENABLED(CONFIG_FRONTSWAP)) > @@ -3280,7 +3281,7 @@ SYSCALL_DEFINE2(swapon, const char __use > > error = init_swap_address_space(p->type, maxpages); > if (error) > - goto bad_swap; > + goto bad_swap_unlock_inode; > > /* > * Flush any pending IO and dirty mappings before we start using this > @@ -3290,7 +3291,7 @@ SYSCALL_DEFINE2(swapon, const char __use > error = inode_drain_writes(inode); > if (error) { > inode->i_flags &= ~S_SWAPFILE; > - goto bad_swap; > + goto bad_swap_unlock_inode; > } > > mutex_lock(&swapon_mutex); > @@ -3315,6 +3316,8 @@ SYSCALL_DEFINE2(swapon, const char __use > > error = 0; > goto out; > +bad_swap_unlock_inode: > + inode_unlock(inode); > bad_swap: > free_percpu(p->percpu_cluster); > p->percpu_cluster = NULL; > @@ -3322,6 +3325,7 @@ bad_swap: > set_blocksize(p->bdev, p->old_block_size); > blkdev_put(p->bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL); > } > + inode = NULL; > destroy_swap_extents(p); > swap_cgroup_swapoff(p->type); > spin_lock(&swap_lock); > @@ -3333,13 +3337,8 @@ bad_swap: > kvfree(frontswap_map); > if (inced_nr_rotate_swap) > atomic_dec(&nr_rotate_swap); > - if (swap_file) { > - if (inode) { > - inode_unlock(inode); > - inode = NULL; > - } > + if (swap_file) > filp_close(swap_file, NULL); > - } > out: > if (page && !IS_ERR(page)) { > kunmap(page); > _ > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: lockdep warning in sys_swapon 2020-03-23 21:12 ` Darrick J. Wong @ 2020-03-23 21:59 ` Naohiro Aota 2020-03-25 16:08 ` Darrick J. Wong 0 siblings, 1 reply; 7+ messages in thread From: Naohiro Aota @ 2020-03-23 21:59 UTC (permalink / raw) To: Darrick J. Wong Cc: Andrew Morton, Qais Yousef, linux-mm, linux-kernel, Christoph Hellwig On Mon, Mar 23, 2020 at 02:12:01PM -0700, Darrick J. Wong wrote: >On Mon, Mar 23, 2020 at 02:06:15PM -0700, Andrew Morton wrote: >> On Mon, 23 Mar 2020 15:30:45 +0000 Qais Yousef <qais.yousef@arm.com> wrote: >> >> > On 03/23/20 15:17, Qais Yousef wrote: >> > > Hi >> > > >> > > I hit the following 2 warnings when running with LOCKDEP=y on arm64 platform >> > > (juno-r2), running on v5.6-rc6 >> > > >> > > The 1st one is when I execute `swapon -a`. The 2nd one happens at boot. I have >> > > /dev/sda2 as my swap in /etc/fstab >> > > >> > > Note that I either hit 1 OR 2, but didn't hit both warnings at the same time, >> > > yet at least. >> > > >> > > /dev/sda2 is a usb flash drive, in case it matters somehow. >> > >> > By the way, I noticed that in claim_swapfile() if we fail we don't release the >> > lock. Shouldn't we release the lock here? >> > >> > I tried with that FWIW, but it had no effect on the warnings. >> > >> >> I'll be sending the below into Linus this week. >> >> I was hoping to hear from Darrick/Christoph (?) but it looks like the >> right thing to do. Are you able to test it? > >I had questions[1] about the patch, but nobody ever replied. Sorry, I have overlooked that one. I'm replying here. > Sorry to wander in late, but I don't see how we unlock the inode in the > success case. Before this patch, the "if (inode) inode_unlock(inode);" > below out: would take care of this for both the success case and the > bad_swap case, but now that's gone, and AFAICT after this patch we only > unlock the inode when erroring out... > > > +bad_swap_unlock_inode: > > + inode_unlock(inode); > > ...since we never goto bad_swap_unlock_inode when error == 0, correct? Actually, this patch does not touch "if (inode) inode_unlock(inode);" below the "out:" label, so we still have that "inode_unlock" after this patch and it unlocks the inode in the success case. > >--D > >[1] https://lore.kernel.org/linux-fsdevel/20200212164956.GK6874@magnolia/ > >> I think I'll add a cc:stable to this one. >> >> >> >> From: Naohiro Aota <naohiro.aota@wdc.com> >> Subject: mm/swapfile.c: move inode_lock out of claim_swapfile >> >> claim_swapfile() currently keeps the inode locked when it is successful, >> or the file is already swapfile (with -EBUSY). And, on the other error >> cases, it does not lock the inode. >> >> This inconsistency of the lock state and return value is quite confusing >> and actually causing a bad unlock balance as below in the "bad_swap" >> section of __do_sys_swapon(). >> >> This commit fixes this issue by moving the inode_lock() and IS_SWAPFILE >> check out of claim_swapfile(). The inode is unlocked in >> "bad_swap_unlock_inode" section, so that the inode is ensured to be >> unlocked at "bad_swap". Thus, error handling codes after the locking now >> jumps to "bad_swap_unlock_inode" instead of "bad_swap". >> >> ===================================== >> WARNING: bad unlock balance detected! >> 5.5.0-rc7+ #176 Not tainted >> ------------------------------------- >> swapon/4294 is trying to release lock (&sb->s_type->i_mutex_key) at: >> [<ffffffff8173a6eb>] __do_sys_swapon+0x94b/0x3550 >> but there are no more locks to release! >> >> other info that might help us debug this: >> no locks held by swapon/4294. >> >> stack backtrace: >> CPU: 5 PID: 4294 Comm: swapon Not tainted 5.5.0-rc7-BTRFS-ZNS+ #176 >> Hardware name: ASUS All Series/H87-PRO, BIOS 2102 07/29/2014 >> Call Trace: >> dump_stack+0xa1/0xea >> ? __do_sys_swapon+0x94b/0x3550 >> print_unlock_imbalance_bug.cold+0x114/0x123 >> ? __do_sys_swapon+0x94b/0x3550 >> lock_release+0x562/0xed0 >> ? kvfree+0x31/0x40 >> ? lock_downgrade+0x770/0x770 >> ? kvfree+0x31/0x40 >> ? rcu_read_lock_sched_held+0xa1/0xd0 >> ? rcu_read_lock_bh_held+0xb0/0xb0 >> up_write+0x2d/0x490 >> ? kfree+0x293/0x2f0 >> __do_sys_swapon+0x94b/0x3550 >> ? putname+0xb0/0xf0 >> ? kmem_cache_free+0x2e7/0x370 >> ? do_sys_open+0x184/0x3e0 >> ? generic_max_swapfile_size+0x40/0x40 >> ? do_syscall_64+0x27/0x4b0 >> ? entry_SYSCALL_64_after_hwframe+0x49/0xbe >> ? lockdep_hardirqs_on+0x38c/0x590 >> __x64_sys_swapon+0x54/0x80 >> do_syscall_64+0xa4/0x4b0 >> entry_SYSCALL_64_after_hwframe+0x49/0xbe >> RIP: 0033:0x7f15da0a0dc7 >> >> Link: http://lkml.kernel.org/r/20200206090132.154869-1-naohiro.aota@wdc.com >> Fixes: 1638045c3677 ("mm: set S_SWAPFILE on blockdev swap devices") >> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> >> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> >> Cc: Darrick J. Wong <darrick.wong@oracle.com> >> Cc: Christoph Hellwig <hch@infradead.org> >> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> >> --- >> >> mm/swapfile.c | 41 ++++++++++++++++++++--------------------- >> 1 file changed, 20 insertions(+), 21 deletions(-) >> >> --- a/mm/swapfile.c~mm-swap-move-inode_lock-out-of-claim_swapfile >> +++ a/mm/swapfile.c >> @@ -2899,10 +2899,6 @@ static int claim_swapfile(struct swap_in >> p->bdev = inode->i_sb->s_bdev; >> } >> >> - inode_lock(inode); >> - if (IS_SWAPFILE(inode)) >> - return -EBUSY; >> - >> return 0; >> } >> >> @@ -3157,36 +3153,41 @@ SYSCALL_DEFINE2(swapon, const char __use >> mapping = swap_file->f_mapping; >> inode = mapping->host; >> >> - /* will take i_rwsem; */ >> error = claim_swapfile(p, inode); >> if (unlikely(error)) >> goto bad_swap; >> >> + inode_lock(inode); >> + if (IS_SWAPFILE(inode)) { >> + error = -EBUSY; >> + goto bad_swap_unlock_inode; >> + } >> + >> /* >> * Read the swap header. >> */ >> if (!mapping->a_ops->readpage) { >> error = -EINVAL; >> - goto bad_swap; >> + goto bad_swap_unlock_inode; >> } >> page = read_mapping_page(mapping, 0, swap_file); >> if (IS_ERR(page)) { >> error = PTR_ERR(page); >> - goto bad_swap; >> + goto bad_swap_unlock_inode; >> } >> swap_header = kmap(page); >> >> maxpages = read_swap_header(p, swap_header, inode); >> if (unlikely(!maxpages)) { >> error = -EINVAL; >> - goto bad_swap; >> + goto bad_swap_unlock_inode; >> } >> >> /* OK, set up the swap map and apply the bad block list */ >> swap_map = vzalloc(maxpages); >> if (!swap_map) { >> error = -ENOMEM; >> - goto bad_swap; >> + goto bad_swap_unlock_inode; >> } >> >> if (bdi_cap_stable_pages_required(inode_to_bdi(inode))) >> @@ -3211,7 +3212,7 @@ SYSCALL_DEFINE2(swapon, const char __use >> GFP_KERNEL); >> if (!cluster_info) { >> error = -ENOMEM; >> - goto bad_swap; >> + goto bad_swap_unlock_inode; >> } >> >> for (ci = 0; ci < nr_cluster; ci++) >> @@ -3220,7 +3221,7 @@ SYSCALL_DEFINE2(swapon, const char __use >> p->percpu_cluster = alloc_percpu(struct percpu_cluster); >> if (!p->percpu_cluster) { >> error = -ENOMEM; >> - goto bad_swap; >> + goto bad_swap_unlock_inode; >> } >> for_each_possible_cpu(cpu) { >> struct percpu_cluster *cluster; >> @@ -3234,13 +3235,13 @@ SYSCALL_DEFINE2(swapon, const char __use >> >> error = swap_cgroup_swapon(p->type, maxpages); >> if (error) >> - goto bad_swap; >> + goto bad_swap_unlock_inode; >> >> nr_extents = setup_swap_map_and_extents(p, swap_header, swap_map, >> cluster_info, maxpages, &span); >> if (unlikely(nr_extents < 0)) { >> error = nr_extents; >> - goto bad_swap; >> + goto bad_swap_unlock_inode; >> } >> /* frontswap enabled? set up bit-per-page map for frontswap */ >> if (IS_ENABLED(CONFIG_FRONTSWAP)) >> @@ -3280,7 +3281,7 @@ SYSCALL_DEFINE2(swapon, const char __use >> >> error = init_swap_address_space(p->type, maxpages); >> if (error) >> - goto bad_swap; >> + goto bad_swap_unlock_inode; >> >> /* >> * Flush any pending IO and dirty mappings before we start using this >> @@ -3290,7 +3291,7 @@ SYSCALL_DEFINE2(swapon, const char __use >> error = inode_drain_writes(inode); >> if (error) { >> inode->i_flags &= ~S_SWAPFILE; >> - goto bad_swap; >> + goto bad_swap_unlock_inode; >> } >> >> mutex_lock(&swapon_mutex); >> @@ -3315,6 +3316,8 @@ SYSCALL_DEFINE2(swapon, const char __use >> >> error = 0; >> goto out; >> +bad_swap_unlock_inode: >> + inode_unlock(inode); >> bad_swap: >> free_percpu(p->percpu_cluster); >> p->percpu_cluster = NULL; >> @@ -3322,6 +3325,7 @@ bad_swap: >> set_blocksize(p->bdev, p->old_block_size); >> blkdev_put(p->bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL); >> } >> + inode = NULL; >> destroy_swap_extents(p); >> swap_cgroup_swapoff(p->type); >> spin_lock(&swap_lock); >> @@ -3333,13 +3337,8 @@ bad_swap: >> kvfree(frontswap_map); >> if (inced_nr_rotate_swap) >> atomic_dec(&nr_rotate_swap); >> - if (swap_file) { >> - if (inode) { >> - inode_unlock(inode); >> - inode = NULL; >> - } >> + if (swap_file) >> filp_close(swap_file, NULL); >> - } >> out: >> if (page && !IS_ERR(page)) { >> kunmap(page); >> _ >> ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: lockdep warning in sys_swapon 2020-03-23 21:59 ` Naohiro Aota @ 2020-03-25 16:08 ` Darrick J. Wong 0 siblings, 0 replies; 7+ messages in thread From: Darrick J. Wong @ 2020-03-25 16:08 UTC (permalink / raw) To: Naohiro Aota Cc: Andrew Morton, Qais Yousef, linux-mm, linux-kernel, Christoph Hellwig On Tue, Mar 24, 2020 at 06:59:08AM +0900, Naohiro Aota wrote: > On Mon, Mar 23, 2020 at 02:12:01PM -0700, Darrick J. Wong wrote: > > On Mon, Mar 23, 2020 at 02:06:15PM -0700, Andrew Morton wrote: > > > On Mon, 23 Mar 2020 15:30:45 +0000 Qais Yousef <qais.yousef@arm.com> wrote: > > > > > > > On 03/23/20 15:17, Qais Yousef wrote: > > > > > Hi > > > > > > > > > > I hit the following 2 warnings when running with LOCKDEP=y on arm64 platform > > > > > (juno-r2), running on v5.6-rc6 > > > > > > > > > > The 1st one is when I execute `swapon -a`. The 2nd one happens at boot. I have > > > > > /dev/sda2 as my swap in /etc/fstab > > > > > > > > > > Note that I either hit 1 OR 2, but didn't hit both warnings at the same time, > > > > > yet at least. > > > > > > > > > > /dev/sda2 is a usb flash drive, in case it matters somehow. > > > > > > > > By the way, I noticed that in claim_swapfile() if we fail we don't release the > > > > lock. Shouldn't we release the lock here? > > > > > > > > I tried with that FWIW, but it had no effect on the warnings. > > > > > > > > > > I'll be sending the below into Linus this week. > > > > > > I was hoping to hear from Darrick/Christoph (?) but it looks like the > > > right thing to do. Are you able to test it? > > > > I had questions[1] about the patch, but nobody ever replied. > > Sorry, I have overlooked that one. I'm replying here. > > > Sorry to wander in late, but I don't see how we unlock the inode in the > > success case. Before this patch, the "if (inode) inode_unlock(inode);" > > below out: would take care of this for both the success case and the > > bad_swap case, but now that's gone, and AFAICT after this patch we only > > unlock the inode when erroring out... > > > > > +bad_swap_unlock_inode: > > > + inode_unlock(inode); > > > > ...since we never goto bad_swap_unlock_inode when error == 0, correct? > > Actually, this patch does not touch "if (inode) inode_unlock(inode);" below > the "out:" label, so we still have that "inode_unlock" after this patch and > it unlocks the inode in the success case. Ah, right, I don't know what I was thinking when I wrote that. :) Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> --D > > > > > --D > > > > [1] https://lore.kernel.org/linux-fsdevel/20200212164956.GK6874@magnolia/ > > > > > I think I'll add a cc:stable to this one. > > > > > > > > > > > > From: Naohiro Aota <naohiro.aota@wdc.com> > > > Subject: mm/swapfile.c: move inode_lock out of claim_swapfile > > > > > > claim_swapfile() currently keeps the inode locked when it is successful, > > > or the file is already swapfile (with -EBUSY). And, on the other error > > > cases, it does not lock the inode. > > > > > > This inconsistency of the lock state and return value is quite confusing > > > and actually causing a bad unlock balance as below in the "bad_swap" > > > section of __do_sys_swapon(). > > > > > > This commit fixes this issue by moving the inode_lock() and IS_SWAPFILE > > > check out of claim_swapfile(). The inode is unlocked in > > > "bad_swap_unlock_inode" section, so that the inode is ensured to be > > > unlocked at "bad_swap". Thus, error handling codes after the locking now > > > jumps to "bad_swap_unlock_inode" instead of "bad_swap". > > > > > > ===================================== > > > WARNING: bad unlock balance detected! > > > 5.5.0-rc7+ #176 Not tainted > > > ------------------------------------- > > > swapon/4294 is trying to release lock (&sb->s_type->i_mutex_key) at: > > > [<ffffffff8173a6eb>] __do_sys_swapon+0x94b/0x3550 > > > but there are no more locks to release! > > > > > > other info that might help us debug this: > > > no locks held by swapon/4294. > > > > > > stack backtrace: > > > CPU: 5 PID: 4294 Comm: swapon Not tainted 5.5.0-rc7-BTRFS-ZNS+ #176 > > > Hardware name: ASUS All Series/H87-PRO, BIOS 2102 07/29/2014 > > > Call Trace: > > > dump_stack+0xa1/0xea > > > ? __do_sys_swapon+0x94b/0x3550 > > > print_unlock_imbalance_bug.cold+0x114/0x123 > > > ? __do_sys_swapon+0x94b/0x3550 > > > lock_release+0x562/0xed0 > > > ? kvfree+0x31/0x40 > > > ? lock_downgrade+0x770/0x770 > > > ? kvfree+0x31/0x40 > > > ? rcu_read_lock_sched_held+0xa1/0xd0 > > > ? rcu_read_lock_bh_held+0xb0/0xb0 > > > up_write+0x2d/0x490 > > > ? kfree+0x293/0x2f0 > > > __do_sys_swapon+0x94b/0x3550 > > > ? putname+0xb0/0xf0 > > > ? kmem_cache_free+0x2e7/0x370 > > > ? do_sys_open+0x184/0x3e0 > > > ? generic_max_swapfile_size+0x40/0x40 > > > ? do_syscall_64+0x27/0x4b0 > > > ? entry_SYSCALL_64_after_hwframe+0x49/0xbe > > > ? lockdep_hardirqs_on+0x38c/0x590 > > > __x64_sys_swapon+0x54/0x80 > > > do_syscall_64+0xa4/0x4b0 > > > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > > RIP: 0033:0x7f15da0a0dc7 > > > > > > Link: http://lkml.kernel.org/r/20200206090132.154869-1-naohiro.aota@wdc.com > > > Fixes: 1638045c3677 ("mm: set S_SWAPFILE on blockdev swap devices") > > > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> > > > Reviewed-by: Andrew Morton <akpm@linux-foundation.org> > > > Cc: Darrick J. Wong <darrick.wong@oracle.com> > > > Cc: Christoph Hellwig <hch@infradead.org> > > > Signed-off-by: Andrew Morton <akpm@linux-foundation.org> > > > --- > > > > > > mm/swapfile.c | 41 ++++++++++++++++++++--------------------- > > > 1 file changed, 20 insertions(+), 21 deletions(-) > > > > > > --- a/mm/swapfile.c~mm-swap-move-inode_lock-out-of-claim_swapfile > > > +++ a/mm/swapfile.c > > > @@ -2899,10 +2899,6 @@ static int claim_swapfile(struct swap_in > > > p->bdev = inode->i_sb->s_bdev; > > > } > > > > > > - inode_lock(inode); > > > - if (IS_SWAPFILE(inode)) > > > - return -EBUSY; > > > - > > > return 0; > > > } > > > > > > @@ -3157,36 +3153,41 @@ SYSCALL_DEFINE2(swapon, const char __use > > > mapping = swap_file->f_mapping; > > > inode = mapping->host; > > > > > > - /* will take i_rwsem; */ > > > error = claim_swapfile(p, inode); > > > if (unlikely(error)) > > > goto bad_swap; > > > > > > + inode_lock(inode); > > > + if (IS_SWAPFILE(inode)) { > > > + error = -EBUSY; > > > + goto bad_swap_unlock_inode; > > > + } > > > + > > > /* > > > * Read the swap header. > > > */ > > > if (!mapping->a_ops->readpage) { > > > error = -EINVAL; > > > - goto bad_swap; > > > + goto bad_swap_unlock_inode; > > > } > > > page = read_mapping_page(mapping, 0, swap_file); > > > if (IS_ERR(page)) { > > > error = PTR_ERR(page); > > > - goto bad_swap; > > > + goto bad_swap_unlock_inode; > > > } > > > swap_header = kmap(page); > > > > > > maxpages = read_swap_header(p, swap_header, inode); > > > if (unlikely(!maxpages)) { > > > error = -EINVAL; > > > - goto bad_swap; > > > + goto bad_swap_unlock_inode; > > > } > > > > > > /* OK, set up the swap map and apply the bad block list */ > > > swap_map = vzalloc(maxpages); > > > if (!swap_map) { > > > error = -ENOMEM; > > > - goto bad_swap; > > > + goto bad_swap_unlock_inode; > > > } > > > > > > if (bdi_cap_stable_pages_required(inode_to_bdi(inode))) > > > @@ -3211,7 +3212,7 @@ SYSCALL_DEFINE2(swapon, const char __use > > > GFP_KERNEL); > > > if (!cluster_info) { > > > error = -ENOMEM; > > > - goto bad_swap; > > > + goto bad_swap_unlock_inode; > > > } > > > > > > for (ci = 0; ci < nr_cluster; ci++) > > > @@ -3220,7 +3221,7 @@ SYSCALL_DEFINE2(swapon, const char __use > > > p->percpu_cluster = alloc_percpu(struct percpu_cluster); > > > if (!p->percpu_cluster) { > > > error = -ENOMEM; > > > - goto bad_swap; > > > + goto bad_swap_unlock_inode; > > > } > > > for_each_possible_cpu(cpu) { > > > struct percpu_cluster *cluster; > > > @@ -3234,13 +3235,13 @@ SYSCALL_DEFINE2(swapon, const char __use > > > > > > error = swap_cgroup_swapon(p->type, maxpages); > > > if (error) > > > - goto bad_swap; > > > + goto bad_swap_unlock_inode; > > > > > > nr_extents = setup_swap_map_and_extents(p, swap_header, swap_map, > > > cluster_info, maxpages, &span); > > > if (unlikely(nr_extents < 0)) { > > > error = nr_extents; > > > - goto bad_swap; > > > + goto bad_swap_unlock_inode; > > > } > > > /* frontswap enabled? set up bit-per-page map for frontswap */ > > > if (IS_ENABLED(CONFIG_FRONTSWAP)) > > > @@ -3280,7 +3281,7 @@ SYSCALL_DEFINE2(swapon, const char __use > > > > > > error = init_swap_address_space(p->type, maxpages); > > > if (error) > > > - goto bad_swap; > > > + goto bad_swap_unlock_inode; > > > > > > /* > > > * Flush any pending IO and dirty mappings before we start using this > > > @@ -3290,7 +3291,7 @@ SYSCALL_DEFINE2(swapon, const char __use > > > error = inode_drain_writes(inode); > > > if (error) { > > > inode->i_flags &= ~S_SWAPFILE; > > > - goto bad_swap; > > > + goto bad_swap_unlock_inode; > > > } > > > > > > mutex_lock(&swapon_mutex); > > > @@ -3315,6 +3316,8 @@ SYSCALL_DEFINE2(swapon, const char __use > > > > > > error = 0; > > > goto out; > > > +bad_swap_unlock_inode: > > > + inode_unlock(inode); > > > bad_swap: > > > free_percpu(p->percpu_cluster); > > > p->percpu_cluster = NULL; > > > @@ -3322,6 +3325,7 @@ bad_swap: > > > set_blocksize(p->bdev, p->old_block_size); > > > blkdev_put(p->bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL); > > > } > > > + inode = NULL; > > > destroy_swap_extents(p); > > > swap_cgroup_swapoff(p->type); > > > spin_lock(&swap_lock); > > > @@ -3333,13 +3337,8 @@ bad_swap: > > > kvfree(frontswap_map); > > > if (inced_nr_rotate_swap) > > > atomic_dec(&nr_rotate_swap); > > > - if (swap_file) { > > > - if (inode) { > > > - inode_unlock(inode); > > > - inode = NULL; > > > - } > > > + if (swap_file) > > > filp_close(swap_file, NULL); > > > - } > > > out: > > > if (page && !IS_ERR(page)) { > > > kunmap(page); > > > _ > > > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: lockdep warning in sys_swapon 2020-03-23 21:06 ` Andrew Morton 2020-03-23 21:12 ` Darrick J. Wong @ 2020-03-24 10:51 ` Qais Yousef 1 sibling, 0 replies; 7+ messages in thread From: Qais Yousef @ 2020-03-24 10:51 UTC (permalink / raw) To: Andrew Morton Cc: linux-mm, linux-kernel, Naohiro Aota, Darrick J. Wong, Christoph Hellwig On 03/23/20 14:06, Andrew Morton wrote: > On Mon, 23 Mar 2020 15:30:45 +0000 Qais Yousef <qais.yousef@arm.com> wrote: > > > On 03/23/20 15:17, Qais Yousef wrote: > > > Hi > > > > > > I hit the following 2 warnings when running with LOCKDEP=y on arm64 platform > > > (juno-r2), running on v5.6-rc6 > > > > > > The 1st one is when I execute `swapon -a`. The 2nd one happens at boot. I have > > > /dev/sda2 as my swap in /etc/fstab > > > > > > Note that I either hit 1 OR 2, but didn't hit both warnings at the same time, > > > yet at least. > > > > > > /dev/sda2 is a usb flash drive, in case it matters somehow. > > > > By the way, I noticed that in claim_swapfile() if we fail we don't release the > > lock. Shouldn't we release the lock here? > > > > I tried with that FWIW, but it had no effect on the warnings. > > > > I'll be sending the below into Linus this week. > > I was hoping to hear from Darrick/Christoph (?) but it looks like the > right thing to do. Are you able to test it? > > I think I'll add a cc:stable to this one. > > > > From: Naohiro Aota <naohiro.aota@wdc.com> > Subject: mm/swapfile.c: move inode_lock out of claim_swapfile > > claim_swapfile() currently keeps the inode locked when it is successful, > or the file is already swapfile (with -EBUSY). And, on the other error > cases, it does not lock the inode. > > This inconsistency of the lock state and return value is quite confusing > and actually causing a bad unlock balance as below in the "bad_swap" > section of __do_sys_swapon(). > > This commit fixes this issue by moving the inode_lock() and IS_SWAPFILE > check out of claim_swapfile(). The inode is unlocked in > "bad_swap_unlock_inode" section, so that the inode is ensured to be > unlocked at "bad_swap". Thus, error handling codes after the locking now > jumps to "bad_swap_unlock_inode" instead of "bad_swap". > > ===================================== > WARNING: bad unlock balance detected! > 5.5.0-rc7+ #176 Not tainted > ------------------------------------- > swapon/4294 is trying to release lock (&sb->s_type->i_mutex_key) at: > [<ffffffff8173a6eb>] __do_sys_swapon+0x94b/0x3550 > but there are no more locks to release! > > other info that might help us debug this: > no locks held by swapon/4294. > > stack backtrace: > CPU: 5 PID: 4294 Comm: swapon Not tainted 5.5.0-rc7-BTRFS-ZNS+ #176 > Hardware name: ASUS All Series/H87-PRO, BIOS 2102 07/29/2014 > Call Trace: > dump_stack+0xa1/0xea > ? __do_sys_swapon+0x94b/0x3550 > print_unlock_imbalance_bug.cold+0x114/0x123 > ? __do_sys_swapon+0x94b/0x3550 > lock_release+0x562/0xed0 > ? kvfree+0x31/0x40 > ? lock_downgrade+0x770/0x770 > ? kvfree+0x31/0x40 > ? rcu_read_lock_sched_held+0xa1/0xd0 > ? rcu_read_lock_bh_held+0xb0/0xb0 > up_write+0x2d/0x490 > ? kfree+0x293/0x2f0 > __do_sys_swapon+0x94b/0x3550 > ? putname+0xb0/0xf0 > ? kmem_cache_free+0x2e7/0x370 > ? do_sys_open+0x184/0x3e0 > ? generic_max_swapfile_size+0x40/0x40 > ? do_syscall_64+0x27/0x4b0 > ? entry_SYSCALL_64_after_hwframe+0x49/0xbe > ? lockdep_hardirqs_on+0x38c/0x590 > __x64_sys_swapon+0x54/0x80 > do_syscall_64+0xa4/0x4b0 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > RIP: 0033:0x7f15da0a0dc7 > > Link: http://lkml.kernel.org/r/20200206090132.154869-1-naohiro.aota@wdc.com > Fixes: 1638045c3677 ("mm: set S_SWAPFILE on blockdev swap devices") > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> > Reviewed-by: Andrew Morton <akpm@linux-foundation.org> > Cc: Darrick J. Wong <darrick.wong@oracle.com> > Cc: Christoph Hellwig <hch@infradead.org> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org> > --- With this patch applied I can no longer reproduce the problem. Tested-by: Qais Youef <qais.yousef@arm.com> Thanks! -- Qais Yousef > > mm/swapfile.c | 41 ++++++++++++++++++++--------------------- > 1 file changed, 20 insertions(+), 21 deletions(-) > > --- a/mm/swapfile.c~mm-swap-move-inode_lock-out-of-claim_swapfile > +++ a/mm/swapfile.c > @@ -2899,10 +2899,6 @@ static int claim_swapfile(struct swap_in > p->bdev = inode->i_sb->s_bdev; > } > > - inode_lock(inode); > - if (IS_SWAPFILE(inode)) > - return -EBUSY; > - > return 0; > } > > @@ -3157,36 +3153,41 @@ SYSCALL_DEFINE2(swapon, const char __use > mapping = swap_file->f_mapping; > inode = mapping->host; > > - /* will take i_rwsem; */ > error = claim_swapfile(p, inode); > if (unlikely(error)) > goto bad_swap; > > + inode_lock(inode); > + if (IS_SWAPFILE(inode)) { > + error = -EBUSY; > + goto bad_swap_unlock_inode; > + } > + > /* > * Read the swap header. > */ > if (!mapping->a_ops->readpage) { > error = -EINVAL; > - goto bad_swap; > + goto bad_swap_unlock_inode; > } > page = read_mapping_page(mapping, 0, swap_file); > if (IS_ERR(page)) { > error = PTR_ERR(page); > - goto bad_swap; > + goto bad_swap_unlock_inode; > } > swap_header = kmap(page); > > maxpages = read_swap_header(p, swap_header, inode); > if (unlikely(!maxpages)) { > error = -EINVAL; > - goto bad_swap; > + goto bad_swap_unlock_inode; > } > > /* OK, set up the swap map and apply the bad block list */ > swap_map = vzalloc(maxpages); > if (!swap_map) { > error = -ENOMEM; > - goto bad_swap; > + goto bad_swap_unlock_inode; > } > > if (bdi_cap_stable_pages_required(inode_to_bdi(inode))) > @@ -3211,7 +3212,7 @@ SYSCALL_DEFINE2(swapon, const char __use > GFP_KERNEL); > if (!cluster_info) { > error = -ENOMEM; > - goto bad_swap; > + goto bad_swap_unlock_inode; > } > > for (ci = 0; ci < nr_cluster; ci++) > @@ -3220,7 +3221,7 @@ SYSCALL_DEFINE2(swapon, const char __use > p->percpu_cluster = alloc_percpu(struct percpu_cluster); > if (!p->percpu_cluster) { > error = -ENOMEM; > - goto bad_swap; > + goto bad_swap_unlock_inode; > } > for_each_possible_cpu(cpu) { > struct percpu_cluster *cluster; > @@ -3234,13 +3235,13 @@ SYSCALL_DEFINE2(swapon, const char __use > > error = swap_cgroup_swapon(p->type, maxpages); > if (error) > - goto bad_swap; > + goto bad_swap_unlock_inode; > > nr_extents = setup_swap_map_and_extents(p, swap_header, swap_map, > cluster_info, maxpages, &span); > if (unlikely(nr_extents < 0)) { > error = nr_extents; > - goto bad_swap; > + goto bad_swap_unlock_inode; > } > /* frontswap enabled? set up bit-per-page map for frontswap */ > if (IS_ENABLED(CONFIG_FRONTSWAP)) > @@ -3280,7 +3281,7 @@ SYSCALL_DEFINE2(swapon, const char __use > > error = init_swap_address_space(p->type, maxpages); > if (error) > - goto bad_swap; > + goto bad_swap_unlock_inode; > > /* > * Flush any pending IO and dirty mappings before we start using this > @@ -3290,7 +3291,7 @@ SYSCALL_DEFINE2(swapon, const char __use > error = inode_drain_writes(inode); > if (error) { > inode->i_flags &= ~S_SWAPFILE; > - goto bad_swap; > + goto bad_swap_unlock_inode; > } > > mutex_lock(&swapon_mutex); > @@ -3315,6 +3316,8 @@ SYSCALL_DEFINE2(swapon, const char __use > > error = 0; > goto out; > +bad_swap_unlock_inode: > + inode_unlock(inode); > bad_swap: > free_percpu(p->percpu_cluster); > p->percpu_cluster = NULL; > @@ -3322,6 +3325,7 @@ bad_swap: > set_blocksize(p->bdev, p->old_block_size); > blkdev_put(p->bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL); > } > + inode = NULL; > destroy_swap_extents(p); > swap_cgroup_swapoff(p->type); > spin_lock(&swap_lock); > @@ -3333,13 +3337,8 @@ bad_swap: > kvfree(frontswap_map); > if (inced_nr_rotate_swap) > atomic_dec(&nr_rotate_swap); > - if (swap_file) { > - if (inode) { > - inode_unlock(inode); > - inode = NULL; > - } > + if (swap_file) > filp_close(swap_file, NULL); > - } > out: > if (page && !IS_ERR(page)) { > kunmap(page); > _ > ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2020-03-25 16:11 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-03-23 15:17 lockdep warning in sys_swapon Qais Yousef 2020-03-23 15:30 ` Qais Yousef 2020-03-23 21:06 ` Andrew Morton 2020-03-23 21:12 ` Darrick J. Wong 2020-03-23 21:59 ` Naohiro Aota 2020-03-25 16:08 ` Darrick J. Wong 2020-03-24 10:51 ` Qais Yousef
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).