* [PATCH] mm, swap: unlock inode in error path of claim_swapfile
@ 2020-02-04 9:59 Naohiro Aota
2020-02-04 15:42 ` Darrick J. Wong
0 siblings, 1 reply; 5+ messages in thread
From: Naohiro Aota @ 2020-02-04 9:59 UTC (permalink / raw)
To: linux-mm
Cc: linux-fsdevel, linux-block, Andrew Morton, Christoph Hellwig,
Darrick J . Wong, Naohiro Aota
claim_swapfile() currently keeps the inode locked when it is successful, or
the file is already swapfile (with -EBUSY). And, on the other error cases,
it does not lock the inode.
This inconsistency of the lock state and return value is quite confusing
and actually causing a bad unlock balance as below in the "bad_swap"
section of __do_sys_swapon().
This commit fixes this issue by unlocking the inode on the error path. It
also reverts blocksize and releases bdev, so that the caller can safely
forget about the inode.
=====================================
WARNING: bad unlock balance detected!
5.5.0-rc7+ #176 Not tainted
-------------------------------------
swapon/4294 is trying to release lock (&sb->s_type->i_mutex_key) at:
[<ffffffff8173a6eb>] __do_sys_swapon+0x94b/0x3550
but there are no more locks to release!
other info that might help us debug this:
no locks held by swapon/4294.
stack backtrace:
CPU: 5 PID: 4294 Comm: swapon Not tainted 5.5.0-rc7-BTRFS-ZNS+ #176
Hardware name: ASUS All Series/H87-PRO, BIOS 2102 07/29/2014
Call Trace:
dump_stack+0xa1/0xea
? __do_sys_swapon+0x94b/0x3550
print_unlock_imbalance_bug.cold+0x114/0x123
? __do_sys_swapon+0x94b/0x3550
lock_release+0x562/0xed0
? kvfree+0x31/0x40
? lock_downgrade+0x770/0x770
? kvfree+0x31/0x40
? rcu_read_lock_sched_held+0xa1/0xd0
? rcu_read_lock_bh_held+0xb0/0xb0
up_write+0x2d/0x490
? kfree+0x293/0x2f0
__do_sys_swapon+0x94b/0x3550
? putname+0xb0/0xf0
? kmem_cache_free+0x2e7/0x370
? do_sys_open+0x184/0x3e0
? generic_max_swapfile_size+0x40/0x40
? do_syscall_64+0x27/0x4b0
? entry_SYSCALL_64_after_hwframe+0x49/0xbe
? lockdep_hardirqs_on+0x38c/0x590
__x64_sys_swapon+0x54/0x80
do_syscall_64+0xa4/0x4b0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f15da0a0dc7
Fixes: 1638045c3677 ("mm: set S_SWAPFILE on blockdev swap devices")
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
mm/swapfile.c | 29 ++++++++++++++++++++++-------
1 file changed, 22 insertions(+), 7 deletions(-)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index bb3261d45b6a..dd5d7fa42282 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2886,24 +2886,37 @@ static int claim_swapfile(struct swap_info_struct *p, struct inode *inode)
p->old_block_size = block_size(p->bdev);
error = set_blocksize(p->bdev, PAGE_SIZE);
if (error < 0)
- return error;
+ goto err;
/*
* Zoned block devices contain zones that have a sequential
* write only restriction. Hence zoned block devices are not
* suitable for swapping. Disallow them here.
*/
- if (blk_queue_is_zoned(p->bdev->bd_queue))
- return -EINVAL;
+ if (blk_queue_is_zoned(p->bdev->bd_queue)) {
+ error = -EINVAL;
+ goto err;
+ }
p->flags |= SWP_BLKDEV;
} else if (S_ISREG(inode->i_mode)) {
p->bdev = inode->i_sb->s_bdev;
}
inode_lock(inode);
- if (IS_SWAPFILE(inode))
- return -EBUSY;
+ if (IS_SWAPFILE(inode)) {
+ inode_unlock(inode);
+ error = -EBUSY;
+ goto err;
+ }
return 0;
+
+err:
+ if (S_ISBLK(inode->i_mode)) {
+ set_blocksize(p->bdev, p->old_block_size);
+ blkdev_put(p->bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
+ }
+
+ return error;
}
@@ -3157,10 +3170,12 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
mapping = swap_file->f_mapping;
inode = mapping->host;
- /* If S_ISREG(inode->i_mode) will do inode_lock(inode); */
+ /* do inode_lock(inode); */
error = claim_swapfile(p, inode);
- if (unlikely(error))
+ if (unlikely(error)) {
+ inode = NULL;
goto bad_swap;
+ }
/*
* Read the swap header.
--
2.25.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] mm, swap: unlock inode in error path of claim_swapfile
2020-02-04 9:59 [PATCH] mm, swap: unlock inode in error path of claim_swapfile Naohiro Aota
@ 2020-02-04 15:42 ` Darrick J. Wong
2020-02-04 23:49 ` Naohiro Aota
0 siblings, 1 reply; 5+ messages in thread
From: Darrick J. Wong @ 2020-02-04 15:42 UTC (permalink / raw)
To: Naohiro Aota
Cc: linux-mm, linux-fsdevel, linux-block, Andrew Morton, Christoph Hellwig
On Tue, Feb 04, 2020 at 06:59:43PM +0900, Naohiro Aota wrote:
> claim_swapfile() currently keeps the inode locked when it is successful, or
> the file is already swapfile (with -EBUSY). And, on the other error cases,
> it does not lock the inode.
>
> This inconsistency of the lock state and return value is quite confusing
> and actually causing a bad unlock balance as below in the "bad_swap"
> section of __do_sys_swapon().
>
> This commit fixes this issue by unlocking the inode on the error path. It
> also reverts blocksize and releases bdev, so that the caller can safely
> forget about the inode.
>
> =====================================
> WARNING: bad unlock balance detected!
> 5.5.0-rc7+ #176 Not tainted
> -------------------------------------
> swapon/4294 is trying to release lock (&sb->s_type->i_mutex_key) at:
> [<ffffffff8173a6eb>] __do_sys_swapon+0x94b/0x3550
> but there are no more locks to release!
>
> other info that might help us debug this:
> no locks held by swapon/4294.
>
> stack backtrace:
> CPU: 5 PID: 4294 Comm: swapon Not tainted 5.5.0-rc7-BTRFS-ZNS+ #176
> Hardware name: ASUS All Series/H87-PRO, BIOS 2102 07/29/2014
> Call Trace:
> dump_stack+0xa1/0xea
> ? __do_sys_swapon+0x94b/0x3550
> print_unlock_imbalance_bug.cold+0x114/0x123
> ? __do_sys_swapon+0x94b/0x3550
> lock_release+0x562/0xed0
> ? kvfree+0x31/0x40
> ? lock_downgrade+0x770/0x770
> ? kvfree+0x31/0x40
> ? rcu_read_lock_sched_held+0xa1/0xd0
> ? rcu_read_lock_bh_held+0xb0/0xb0
> up_write+0x2d/0x490
> ? kfree+0x293/0x2f0
> __do_sys_swapon+0x94b/0x3550
> ? putname+0xb0/0xf0
> ? kmem_cache_free+0x2e7/0x370
> ? do_sys_open+0x184/0x3e0
> ? generic_max_swapfile_size+0x40/0x40
> ? do_syscall_64+0x27/0x4b0
> ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
> ? lockdep_hardirqs_on+0x38c/0x590
> __x64_sys_swapon+0x54/0x80
> do_syscall_64+0xa4/0x4b0
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x7f15da0a0dc7
>
> Fixes: 1638045c3677 ("mm: set S_SWAPFILE on blockdev swap devices")
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
> mm/swapfile.c | 29 ++++++++++++++++++++++-------
> 1 file changed, 22 insertions(+), 7 deletions(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index bb3261d45b6a..dd5d7fa42282 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -2886,24 +2886,37 @@ static int claim_swapfile(struct swap_info_struct *p, struct inode *inode)
> p->old_block_size = block_size(p->bdev);
> error = set_blocksize(p->bdev, PAGE_SIZE);
> if (error < 0)
> - return error;
> + goto err;
> /*
> * Zoned block devices contain zones that have a sequential
> * write only restriction. Hence zoned block devices are not
> * suitable for swapping. Disallow them here.
> */
> - if (blk_queue_is_zoned(p->bdev->bd_queue))
> - return -EINVAL;
> + if (blk_queue_is_zoned(p->bdev->bd_queue)) {
> + error = -EINVAL;
> + goto err;
> + }
> p->flags |= SWP_BLKDEV;
> } else if (S_ISREG(inode->i_mode)) {
> p->bdev = inode->i_sb->s_bdev;
> }
>
> inode_lock(inode);
> - if (IS_SWAPFILE(inode))
> - return -EBUSY;
> + if (IS_SWAPFILE(inode)) {
> + inode_unlock(inode);
> + error = -EBUSY;
> + goto err;
> + }
>
> return 0;
> +
> +err:
> + if (S_ISBLK(inode->i_mode)) {
> + set_blocksize(p->bdev, p->old_block_size);
> + blkdev_put(p->bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
> + }
> +
> + return error;
> }
>
>
> @@ -3157,10 +3170,12 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
> mapping = swap_file->f_mapping;
> inode = mapping->host;
>
> - /* If S_ISREG(inode->i_mode) will do inode_lock(inode); */
> + /* do inode_lock(inode); */
What if we made this function responsible for calling inode_lock (and
unlock) instead of splitting it between sys_swapon and claim_swapfile?
--D
> error = claim_swapfile(p, inode);
> - if (unlikely(error))
> + if (unlikely(error)) {
> + inode = NULL;
> goto bad_swap;
> + }
>
> /*
> * Read the swap header.
> --
> 2.25.0
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] mm, swap: unlock inode in error path of claim_swapfile
2020-02-04 15:42 ` Darrick J. Wong
@ 2020-02-04 23:49 ` Naohiro Aota
2020-02-04 23:56 ` Darrick J. Wong
0 siblings, 1 reply; 5+ messages in thread
From: Naohiro Aota @ 2020-02-04 23:49 UTC (permalink / raw)
To: Darrick J. Wong
Cc: linux-mm, linux-fsdevel, linux-block, Andrew Morton, Christoph Hellwig
On Tue, Feb 04, 2020 at 07:42:29AM -0800, Darrick J. Wong wrote:
>On Tue, Feb 04, 2020 at 06:59:43PM +0900, Naohiro Aota wrote:
>> claim_swapfile() currently keeps the inode locked when it is successful, or
>> the file is already swapfile (with -EBUSY). And, on the other error cases,
>> it does not lock the inode.
>>
>> This inconsistency of the lock state and return value is quite confusing
>> and actually causing a bad unlock balance as below in the "bad_swap"
>> section of __do_sys_swapon().
>>
>> This commit fixes this issue by unlocking the inode on the error path. It
>> also reverts blocksize and releases bdev, so that the caller can safely
>> forget about the inode.
>>
>> =====================================
>> WARNING: bad unlock balance detected!
>> 5.5.0-rc7+ #176 Not tainted
>> -------------------------------------
>> swapon/4294 is trying to release lock (&sb->s_type->i_mutex_key) at:
>> [<ffffffff8173a6eb>] __do_sys_swapon+0x94b/0x3550
>> but there are no more locks to release!
>>
>> other info that might help us debug this:
>> no locks held by swapon/4294.
>>
>> stack backtrace:
>> CPU: 5 PID: 4294 Comm: swapon Not tainted 5.5.0-rc7-BTRFS-ZNS+ #176
>> Hardware name: ASUS All Series/H87-PRO, BIOS 2102 07/29/2014
>> Call Trace:
>> dump_stack+0xa1/0xea
>> ? __do_sys_swapon+0x94b/0x3550
>> print_unlock_imbalance_bug.cold+0x114/0x123
>> ? __do_sys_swapon+0x94b/0x3550
>> lock_release+0x562/0xed0
>> ? kvfree+0x31/0x40
>> ? lock_downgrade+0x770/0x770
>> ? kvfree+0x31/0x40
>> ? rcu_read_lock_sched_held+0xa1/0xd0
>> ? rcu_read_lock_bh_held+0xb0/0xb0
>> up_write+0x2d/0x490
>> ? kfree+0x293/0x2f0
>> __do_sys_swapon+0x94b/0x3550
>> ? putname+0xb0/0xf0
>> ? kmem_cache_free+0x2e7/0x370
>> ? do_sys_open+0x184/0x3e0
>> ? generic_max_swapfile_size+0x40/0x40
>> ? do_syscall_64+0x27/0x4b0
>> ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
>> ? lockdep_hardirqs_on+0x38c/0x590
>> __x64_sys_swapon+0x54/0x80
>> do_syscall_64+0xa4/0x4b0
>> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>> RIP: 0033:0x7f15da0a0dc7
>>
>> Fixes: 1638045c3677 ("mm: set S_SWAPFILE on blockdev swap devices")
>> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
>> ---
>> mm/swapfile.c | 29 ++++++++++++++++++++++-------
>> 1 file changed, 22 insertions(+), 7 deletions(-)
>>
>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index bb3261d45b6a..dd5d7fa42282 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -2886,24 +2886,37 @@ static int claim_swapfile(struct swap_info_struct *p, struct inode *inode)
>> p->old_block_size = block_size(p->bdev);
>> error = set_blocksize(p->bdev, PAGE_SIZE);
>> if (error < 0)
>> - return error;
>> + goto err;
>> /*
>> * Zoned block devices contain zones that have a sequential
>> * write only restriction. Hence zoned block devices are not
>> * suitable for swapping. Disallow them here.
>> */
>> - if (blk_queue_is_zoned(p->bdev->bd_queue))
>> - return -EINVAL;
>> + if (blk_queue_is_zoned(p->bdev->bd_queue)) {
>> + error = -EINVAL;
>> + goto err;
>> + }
>> p->flags |= SWP_BLKDEV;
>> } else if (S_ISREG(inode->i_mode)) {
>> p->bdev = inode->i_sb->s_bdev;
>> }
>>
>> inode_lock(inode);
>> - if (IS_SWAPFILE(inode))
>> - return -EBUSY;
>> + if (IS_SWAPFILE(inode)) {
>> + inode_unlock(inode);
>> + error = -EBUSY;
>> + goto err;
>> + }
>>
>> return 0;
>> +
>> +err:
>> + if (S_ISBLK(inode->i_mode)) {
>> + set_blocksize(p->bdev, p->old_block_size);
>> + blkdev_put(p->bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
>> + }
>> +
>> + return error;
>> }
>>
>>
>> @@ -3157,10 +3170,12 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
>> mapping = swap_file->f_mapping;
>> inode = mapping->host;
>>
>> - /* If S_ISREG(inode->i_mode) will do inode_lock(inode); */
>> + /* do inode_lock(inode); */
>
>What if we made this function responsible for calling inode_lock (and
>unlock) instead of splitting it between sys_swapon and claim_swapfile?
I think we cannot take inode_lock before claim_swapfile() because we can
have circular locking dependency as:
claim_swapfile()
-> blkdev_get()
-> __blkdev_get()
-> mutex_lock(&bdev->bd_mutex)
-> bd_set_size()
-> inode_lock(&bdev->bd_inode);
So, one thing we can do is to move inode_lock() and "if (IS_SWAPFILE(..))
..." out of claim_swapfile(). In this case, the "bad_swap" section must
check if "inode_is_locked" to call "inode_unlock".
>
>--D
>
>> error = claim_swapfile(p, inode);
>> - if (unlikely(error))
>> + if (unlikely(error)) {
>> + inode = NULL;
>> goto bad_swap;
>> + }
>>
>> /*
>> * Read the swap header.
>> --
>> 2.25.0
>>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] mm, swap: unlock inode in error path of claim_swapfile
2020-02-04 23:49 ` Naohiro Aota
@ 2020-02-04 23:56 ` Darrick J. Wong
2020-02-05 2:01 ` Naohiro Aota
0 siblings, 1 reply; 5+ messages in thread
From: Darrick J. Wong @ 2020-02-04 23:56 UTC (permalink / raw)
To: Naohiro Aota
Cc: linux-mm, linux-fsdevel, linux-block, Andrew Morton, Christoph Hellwig
On Wed, Feb 05, 2020 at 08:49:16AM +0900, Naohiro Aota wrote:
> On Tue, Feb 04, 2020 at 07:42:29AM -0800, Darrick J. Wong wrote:
> > On Tue, Feb 04, 2020 at 06:59:43PM +0900, Naohiro Aota wrote:
> > > claim_swapfile() currently keeps the inode locked when it is successful, or
> > > the file is already swapfile (with -EBUSY). And, on the other error cases,
> > > it does not lock the inode.
> > >
> > > This inconsistency of the lock state and return value is quite confusing
> > > and actually causing a bad unlock balance as below in the "bad_swap"
> > > section of __do_sys_swapon().
> > >
> > > This commit fixes this issue by unlocking the inode on the error path. It
> > > also reverts blocksize and releases bdev, so that the caller can safely
> > > forget about the inode.
> > >
> > > =====================================
> > > WARNING: bad unlock balance detected!
> > > 5.5.0-rc7+ #176 Not tainted
> > > -------------------------------------
> > > swapon/4294 is trying to release lock (&sb->s_type->i_mutex_key) at:
> > > [<ffffffff8173a6eb>] __do_sys_swapon+0x94b/0x3550
> > > but there are no more locks to release!
> > >
> > > other info that might help us debug this:
> > > no locks held by swapon/4294.
> > >
> > > stack backtrace:
> > > CPU: 5 PID: 4294 Comm: swapon Not tainted 5.5.0-rc7-BTRFS-ZNS+ #176
> > > Hardware name: ASUS All Series/H87-PRO, BIOS 2102 07/29/2014
> > > Call Trace:
> > > dump_stack+0xa1/0xea
> > > ? __do_sys_swapon+0x94b/0x3550
> > > print_unlock_imbalance_bug.cold+0x114/0x123
> > > ? __do_sys_swapon+0x94b/0x3550
> > > lock_release+0x562/0xed0
> > > ? kvfree+0x31/0x40
> > > ? lock_downgrade+0x770/0x770
> > > ? kvfree+0x31/0x40
> > > ? rcu_read_lock_sched_held+0xa1/0xd0
> > > ? rcu_read_lock_bh_held+0xb0/0xb0
> > > up_write+0x2d/0x490
> > > ? kfree+0x293/0x2f0
> > > __do_sys_swapon+0x94b/0x3550
> > > ? putname+0xb0/0xf0
> > > ? kmem_cache_free+0x2e7/0x370
> > > ? do_sys_open+0x184/0x3e0
> > > ? generic_max_swapfile_size+0x40/0x40
> > > ? do_syscall_64+0x27/0x4b0
> > > ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > ? lockdep_hardirqs_on+0x38c/0x590
> > > __x64_sys_swapon+0x54/0x80
> > > do_syscall_64+0xa4/0x4b0
> > > entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > RIP: 0033:0x7f15da0a0dc7
> > >
> > > Fixes: 1638045c3677 ("mm: set S_SWAPFILE on blockdev swap devices")
> > > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> > > ---
> > > mm/swapfile.c | 29 ++++++++++++++++++++++-------
> > > 1 file changed, 22 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > > index bb3261d45b6a..dd5d7fa42282 100644
> > > --- a/mm/swapfile.c
> > > +++ b/mm/swapfile.c
> > > @@ -2886,24 +2886,37 @@ static int claim_swapfile(struct swap_info_struct *p, struct inode *inode)
> > > p->old_block_size = block_size(p->bdev);
> > > error = set_blocksize(p->bdev, PAGE_SIZE);
> > > if (error < 0)
> > > - return error;
> > > + goto err;
> > > /*
> > > * Zoned block devices contain zones that have a sequential
> > > * write only restriction. Hence zoned block devices are not
> > > * suitable for swapping. Disallow them here.
> > > */
> > > - if (blk_queue_is_zoned(p->bdev->bd_queue))
> > > - return -EINVAL;
> > > + if (blk_queue_is_zoned(p->bdev->bd_queue)) {
> > > + error = -EINVAL;
> > > + goto err;
> > > + }
> > > p->flags |= SWP_BLKDEV;
> > > } else if (S_ISREG(inode->i_mode)) {
> > > p->bdev = inode->i_sb->s_bdev;
> > > }
> > >
> > > inode_lock(inode);
> > > - if (IS_SWAPFILE(inode))
> > > - return -EBUSY;
> > > + if (IS_SWAPFILE(inode)) {
> > > + inode_unlock(inode);
> > > + error = -EBUSY;
> > > + goto err;
> > > + }
> > >
> > > return 0;
> > > +
> > > +err:
> > > + if (S_ISBLK(inode->i_mode)) {
> > > + set_blocksize(p->bdev, p->old_block_size);
> > > + blkdev_put(p->bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
> > > + }
> > > +
> > > + return error;
> > > }
> > >
> > >
> > > @@ -3157,10 +3170,12 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
> > > mapping = swap_file->f_mapping;
> > > inode = mapping->host;
> > >
> > > - /* If S_ISREG(inode->i_mode) will do inode_lock(inode); */
> > > + /* do inode_lock(inode); */
> >
> > What if we made this function responsible for calling inode_lock (and
> > unlock) instead of splitting it between sys_swapon and claim_swapfile?
>
> I think we cannot take inode_lock before claim_swapfile() because we can
> have circular locking dependency as:
>
> claim_swapfile()
> -> blkdev_get() -> __blkdev_get()
> -> mutex_lock(&bdev->bd_mutex)
> -> bd_set_size()
> -> inode_lock(&bdev->bd_inode);
Ah, good point. Thank you for doing the research on that. :)
> So, one thing we can do is to move inode_lock() and "if (IS_SWAPFILE(..))
> ..." out of claim_swapfile(). In this case, the "bad_swap" section must
> check if "inode_is_locked" to call "inode_unlock".
I think I wouldn't rely on inode_is_locked and structure the error
escape as follows:
err = claim_swapfile()
if (err)
goto bad_swap;
inode_lock()
if (IS_SWAPFILE)
goto unlock_swap;
other_stuff()
unlock_swap:
inode_unlock()
bad_swap:
fput()
since that's how we (well, XFS anyway :)) tend to do it.
--D
> >
> > --D
> >
> > > error = claim_swapfile(p, inode);
> > > - if (unlikely(error))
> > > + if (unlikely(error)) {
> > > + inode = NULL;
> > > goto bad_swap;
> > > + }
> > >
> > > /*
> > > * Read the swap header.
> > > --
> > > 2.25.0
> > >
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] mm, swap: unlock inode in error path of claim_swapfile
2020-02-04 23:56 ` Darrick J. Wong
@ 2020-02-05 2:01 ` Naohiro Aota
0 siblings, 0 replies; 5+ messages in thread
From: Naohiro Aota @ 2020-02-05 2:01 UTC (permalink / raw)
To: Darrick J. Wong
Cc: linux-mm, linux-fsdevel, linux-block, Andrew Morton, Christoph Hellwig
On Tue, Feb 04, 2020 at 03:56:08PM -0800, Darrick J. Wong wrote:
>On Wed, Feb 05, 2020 at 08:49:16AM +0900, Naohiro Aota wrote:
>> On Tue, Feb 04, 2020 at 07:42:29AM -0800, Darrick J. Wong wrote:
>> > On Tue, Feb 04, 2020 at 06:59:43PM +0900, Naohiro Aota wrote:
>> > > claim_swapfile() currently keeps the inode locked when it is successful, or
>> > > the file is already swapfile (with -EBUSY). And, on the other error cases,
>> > > it does not lock the inode.
>> > >
>> > > This inconsistency of the lock state and return value is quite confusing
>> > > and actually causing a bad unlock balance as below in the "bad_swap"
>> > > section of __do_sys_swapon().
>> > >
>> > > This commit fixes this issue by unlocking the inode on the error path. It
>> > > also reverts blocksize and releases bdev, so that the caller can safely
>> > > forget about the inode.
>> > >
>> > > =====================================
>> > > WARNING: bad unlock balance detected!
>> > > 5.5.0-rc7+ #176 Not tainted
>> > > -------------------------------------
>> > > swapon/4294 is trying to release lock (&sb->s_type->i_mutex_key) at:
>> > > [<ffffffff8173a6eb>] __do_sys_swapon+0x94b/0x3550
>> > > but there are no more locks to release!
>> > >
>> > > other info that might help us debug this:
>> > > no locks held by swapon/4294.
>> > >
>> > > stack backtrace:
>> > > CPU: 5 PID: 4294 Comm: swapon Not tainted 5.5.0-rc7-BTRFS-ZNS+ #176
>> > > Hardware name: ASUS All Series/H87-PRO, BIOS 2102 07/29/2014
>> > > Call Trace:
>> > > dump_stack+0xa1/0xea
>> > > ? __do_sys_swapon+0x94b/0x3550
>> > > print_unlock_imbalance_bug.cold+0x114/0x123
>> > > ? __do_sys_swapon+0x94b/0x3550
>> > > lock_release+0x562/0xed0
>> > > ? kvfree+0x31/0x40
>> > > ? lock_downgrade+0x770/0x770
>> > > ? kvfree+0x31/0x40
>> > > ? rcu_read_lock_sched_held+0xa1/0xd0
>> > > ? rcu_read_lock_bh_held+0xb0/0xb0
>> > > up_write+0x2d/0x490
>> > > ? kfree+0x293/0x2f0
>> > > __do_sys_swapon+0x94b/0x3550
>> > > ? putname+0xb0/0xf0
>> > > ? kmem_cache_free+0x2e7/0x370
>> > > ? do_sys_open+0x184/0x3e0
>> > > ? generic_max_swapfile_size+0x40/0x40
>> > > ? do_syscall_64+0x27/0x4b0
>> > > ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
>> > > ? lockdep_hardirqs_on+0x38c/0x590
>> > > __x64_sys_swapon+0x54/0x80
>> > > do_syscall_64+0xa4/0x4b0
>> > > entry_SYSCALL_64_after_hwframe+0x49/0xbe
>> > > RIP: 0033:0x7f15da0a0dc7
>> > >
>> > > Fixes: 1638045c3677 ("mm: set S_SWAPFILE on blockdev swap devices")
>> > > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
>> > > ---
>> > > mm/swapfile.c | 29 ++++++++++++++++++++++-------
>> > > 1 file changed, 22 insertions(+), 7 deletions(-)
>> > >
>> > > diff --git a/mm/swapfile.c b/mm/swapfile.c
>> > > index bb3261d45b6a..dd5d7fa42282 100644
>> > > --- a/mm/swapfile.c
>> > > +++ b/mm/swapfile.c
>> > > @@ -2886,24 +2886,37 @@ static int claim_swapfile(struct swap_info_struct *p, struct inode *inode)
>> > > p->old_block_size = block_size(p->bdev);
>> > > error = set_blocksize(p->bdev, PAGE_SIZE);
>> > > if (error < 0)
>> > > - return error;
>> > > + goto err;
>> > > /*
>> > > * Zoned block devices contain zones that have a sequential
>> > > * write only restriction. Hence zoned block devices are not
>> > > * suitable for swapping. Disallow them here.
>> > > */
>> > > - if (blk_queue_is_zoned(p->bdev->bd_queue))
>> > > - return -EINVAL;
>> > > + if (blk_queue_is_zoned(p->bdev->bd_queue)) {
>> > > + error = -EINVAL;
>> > > + goto err;
>> > > + }
>> > > p->flags |= SWP_BLKDEV;
>> > > } else if (S_ISREG(inode->i_mode)) {
>> > > p->bdev = inode->i_sb->s_bdev;
>> > > }
>> > >
>> > > inode_lock(inode);
>> > > - if (IS_SWAPFILE(inode))
>> > > - return -EBUSY;
>> > > + if (IS_SWAPFILE(inode)) {
>> > > + inode_unlock(inode);
>> > > + error = -EBUSY;
>> > > + goto err;
>> > > + }
>> > >
>> > > return 0;
>> > > +
>> > > +err:
>> > > + if (S_ISBLK(inode->i_mode)) {
>> > > + set_blocksize(p->bdev, p->old_block_size);
>> > > + blkdev_put(p->bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
>> > > + }
>> > > +
>> > > + return error;
>> > > }
>> > >
>> > >
>> > > @@ -3157,10 +3170,12 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
>> > > mapping = swap_file->f_mapping;
>> > > inode = mapping->host;
>> > >
>> > > - /* If S_ISREG(inode->i_mode) will do inode_lock(inode); */
>> > > + /* do inode_lock(inode); */
>> >
>> > What if we made this function responsible for calling inode_lock (and
>> > unlock) instead of splitting it between sys_swapon and claim_swapfile?
>>
>> I think we cannot take inode_lock before claim_swapfile() because we can
>> have circular locking dependency as:
>>
>> claim_swapfile()
>> -> blkdev_get() -> __blkdev_get()
>> -> mutex_lock(&bdev->bd_mutex)
>> -> bd_set_size()
>> -> inode_lock(&bdev->bd_inode);
>
>Ah, good point. Thank you for doing the research on that. :)
>
>> So, one thing we can do is to move inode_lock() and "if (IS_SWAPFILE(..))
>> ..." out of claim_swapfile(). In this case, the "bad_swap" section must
>> check if "inode_is_locked" to call "inode_unlock".
>
>I think I wouldn't rely on inode_is_locked and structure the error
>escape as follows:
>
> err = claim_swapfile()
> if (err)
> goto bad_swap;
>
> inode_lock()
> if (IS_SWAPFILE)
> goto unlock_swap;
>
> other_stuff()
>
>unlock_swap:
> inode_unlock()
>bad_swap:
> fput()
>
>since that's how we (well, XFS anyway :)) tend to do it.
That's possible, but current error handling (the "bad_swap" section) is not
well organized, so we may hit some other lock issue or race problem ... OK,
I'll investigate and try to reorder the error handling code to be cleaner.
Thanks,
>
>--D
>
>> >
>> > --D
>> >
>> > > error = claim_swapfile(p, inode);
>> > > - if (unlikely(error))
>> > > + if (unlikely(error)) {
>> > > + inode = NULL;
>> > > goto bad_swap;
>> > > + }
>> > >
>> > > /*
>> > > * Read the swap header.
>> > > --
>> > > 2.25.0
>> > >
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2020-02-05 2:01 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-04 9:59 [PATCH] mm, swap: unlock inode in error path of claim_swapfile Naohiro Aota
2020-02-04 15:42 ` Darrick J. Wong
2020-02-04 23:49 ` Naohiro Aota
2020-02-04 23:56 ` Darrick J. Wong
2020-02-05 2:01 ` Naohiro Aota
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).