From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Naohiro Aota <naohiro.aota@wdc.com>
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
linux-block@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Christoph Hellwig <hch@infradead.org>
Subject: Re: [PATCH v2] mm, swap: move inode_lock out of claim_swapfile
Date: Wed, 12 Feb 2020 08:49:56 -0800 [thread overview]
Message-ID: <20200212164956.GK6874@magnolia> (raw)
In-Reply-To: <20200206090132.154869-1-naohiro.aota@wdc.com>
On Thu, Feb 06, 2020 at 06:01:32PM +0900, Naohiro Aota wrote:
> claim_swapfile() currently keeps the inode locked when it is successful, or
> the file is already swapfile (with -EBUSY). And, on the other error cases,
> it does not lock the inode.
>
> This inconsistency of the lock state and return value is quite confusing
> and actually causing a bad unlock balance as below in the "bad_swap"
> section of __do_sys_swapon().
>
> This commit fixes this issue by moving the inode_lock() and IS_SWAPFILE
> check out of claim_swapfile(). The inode is unlocked in
> "bad_swap_unlock_inode" section, so that the inode is ensured to be
> unlocked at "bad_swap". Thus, error handling codes after the locking now
> jumps to "bad_swap_unlock_inode" instead of "bad_swap".
>
> =====================================
> WARNING: bad unlock balance detected!
> 5.5.0-rc7+ #176 Not tainted
> -------------------------------------
> swapon/4294 is trying to release lock (&sb->s_type->i_mutex_key) at:
> [<ffffffff8173a6eb>] __do_sys_swapon+0x94b/0x3550
> but there are no more locks to release!
>
> other info that might help us debug this:
> no locks held by swapon/4294.
>
> stack backtrace:
> CPU: 5 PID: 4294 Comm: swapon Not tainted 5.5.0-rc7-BTRFS-ZNS+ #176
> Hardware name: ASUS All Series/H87-PRO, BIOS 2102 07/29/2014
> Call Trace:
> dump_stack+0xa1/0xea
> ? __do_sys_swapon+0x94b/0x3550
> print_unlock_imbalance_bug.cold+0x114/0x123
> ? __do_sys_swapon+0x94b/0x3550
> lock_release+0x562/0xed0
> ? kvfree+0x31/0x40
> ? lock_downgrade+0x770/0x770
> ? kvfree+0x31/0x40
> ? rcu_read_lock_sched_held+0xa1/0xd0
> ? rcu_read_lock_bh_held+0xb0/0xb0
> up_write+0x2d/0x490
> ? kfree+0x293/0x2f0
> __do_sys_swapon+0x94b/0x3550
> ? putname+0xb0/0xf0
> ? kmem_cache_free+0x2e7/0x370
> ? do_sys_open+0x184/0x3e0
> ? generic_max_swapfile_size+0x40/0x40
> ? do_syscall_64+0x27/0x4b0
> ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
> ? lockdep_hardirqs_on+0x38c/0x590
> __x64_sys_swapon+0x54/0x80
> do_syscall_64+0xa4/0x4b0
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x7f15da0a0dc7
>
> Fixes: 1638045c3677 ("mm: set S_SWAPFILE on blockdev swap devices")
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
> Changelog:
> - Avoid taking inode lock in claim_swapfile()
> - Change error handling
> - Add "bad_swap_unlock_inode" section to ensure the inode is unlocked at
> "bad_swap"
> ---
> mm/swapfile.c | 41 ++++++++++++++++++++---------------------
> 1 file changed, 20 insertions(+), 21 deletions(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index bb3261d45b6a..2c4c349e1101 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -2899,10 +2899,6 @@ static int claim_swapfile(struct swap_info_struct *p, struct inode *inode)
> p->bdev = inode->i_sb->s_bdev;
> }
>
> - inode_lock(inode);
> - if (IS_SWAPFILE(inode))
> - return -EBUSY;
> -
> return 0;
> }
>
> @@ -3157,36 +3153,41 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
> mapping = swap_file->f_mapping;
> inode = mapping->host;
>
> - /* If S_ISREG(inode->i_mode) will do inode_lock(inode); */
> error = claim_swapfile(p, inode);
> if (unlikely(error))
> goto bad_swap;
>
> + inode_lock(inode);
> + if (IS_SWAPFILE(inode)) {
> + error = -EBUSY;
> + goto bad_swap_unlock_inode;
> + }
> +
> /*
> * Read the swap header.
> */
> if (!mapping->a_ops->readpage) {
> error = -EINVAL;
> - goto bad_swap;
> + goto bad_swap_unlock_inode;
> }
> page = read_mapping_page(mapping, 0, swap_file);
> if (IS_ERR(page)) {
> error = PTR_ERR(page);
> - goto bad_swap;
> + goto bad_swap_unlock_inode;
> }
> swap_header = kmap(page);
>
> maxpages = read_swap_header(p, swap_header, inode);
> if (unlikely(!maxpages)) {
> error = -EINVAL;
> - goto bad_swap;
> + goto bad_swap_unlock_inode;
> }
>
> /* OK, set up the swap map and apply the bad block list */
> swap_map = vzalloc(maxpages);
> if (!swap_map) {
> error = -ENOMEM;
> - goto bad_swap;
> + goto bad_swap_unlock_inode;
> }
>
> if (bdi_cap_stable_pages_required(inode_to_bdi(inode)))
> @@ -3211,7 +3212,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
> GFP_KERNEL);
> if (!cluster_info) {
> error = -ENOMEM;
> - goto bad_swap;
> + goto bad_swap_unlock_inode;
> }
>
> for (ci = 0; ci < nr_cluster; ci++)
> @@ -3220,7 +3221,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
> p->percpu_cluster = alloc_percpu(struct percpu_cluster);
> if (!p->percpu_cluster) {
> error = -ENOMEM;
> - goto bad_swap;
> + goto bad_swap_unlock_inode;
> }
> for_each_possible_cpu(cpu) {
> struct percpu_cluster *cluster;
> @@ -3234,13 +3235,13 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
>
> error = swap_cgroup_swapon(p->type, maxpages);
> if (error)
> - goto bad_swap;
> + goto bad_swap_unlock_inode;
>
> nr_extents = setup_swap_map_and_extents(p, swap_header, swap_map,
> cluster_info, maxpages, &span);
> if (unlikely(nr_extents < 0)) {
> error = nr_extents;
> - goto bad_swap;
> + goto bad_swap_unlock_inode;
> }
> /* frontswap enabled? set up bit-per-page map for frontswap */
> if (IS_ENABLED(CONFIG_FRONTSWAP))
> @@ -3280,7 +3281,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
>
> error = init_swap_address_space(p->type, maxpages);
> if (error)
> - goto bad_swap;
> + goto bad_swap_unlock_inode;
>
> /*
> * Flush any pending IO and dirty mappings before we start using this
> @@ -3290,7 +3291,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
> error = inode_drain_writes(inode);
> if (error) {
> inode->i_flags &= ~S_SWAPFILE;
> - goto bad_swap;
> + goto bad_swap_unlock_inode;
> }
>
> mutex_lock(&swapon_mutex);
> @@ -3315,6 +3316,8 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
>
> error = 0;
> goto out;
Sorry to wander in late, but I don't see how we unlock the inode in the
success case. Before this patch, the "if (inode) inode_unlock(inode);"
below out: would take care of this for both the success case and the
bad_swap case, but now that's gone, and AFAICT after this patch we only
unlock the inode when erroring out...
> +bad_swap_unlock_inode:
> + inode_unlock(inode);
...since we never goto bad_swap_unlock_inode when error == 0, correct?
--D
> bad_swap:
> free_percpu(p->percpu_cluster);
> p->percpu_cluster = NULL;
> @@ -3322,6 +3325,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
> set_blocksize(p->bdev, p->old_block_size);
> blkdev_put(p->bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
> }
> + inode = NULL;
> destroy_swap_extents(p);
> swap_cgroup_swapoff(p->type);
> spin_lock(&swap_lock);
> @@ -3333,13 +3337,8 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
> kvfree(frontswap_map);
> if (inced_nr_rotate_swap)
> atomic_dec(&nr_rotate_swap);
> - if (swap_file) {
> - if (inode) {
> - inode_unlock(inode);
> - inode = NULL;
> - }
> + if (swap_file)
> filp_close(swap_file, NULL);
> - }
> out:
> if (page && !IS_ERR(page)) {
> kunmap(page);
> --
> 2.25.0
>
prev parent reply other threads:[~2020-02-12 16:50 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-02-06 9:01 [PATCH v2] mm, swap: move inode_lock out of claim_swapfile Naohiro Aota
2020-02-10 4:16 ` Andrew Morton
2020-02-10 4:25 ` Naohiro Aota
2020-02-12 16:49 ` Darrick J. Wong [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200212164956.GK6874@magnolia \
--to=darrick.wong@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=hch@infradead.org \
--cc=linux-block@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=naohiro.aota@wdc.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).