linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] swap: fix setting PAGE_SIZE blocksize during swapoff/swapon race
@ 2013-10-14 13:58 Krzysztof Kozlowski
  2013-10-15  9:59 ` Hugh Dickins
  0 siblings, 1 reply; 6+ messages in thread
From: Krzysztof Kozlowski @ 2013-10-14 13:58 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, linux-mm
  Cc: Hugh Dickins, Weijie Yang, Michal Hocko, Shaohua Li, Minchan Kim,
	Krzysztof Kozlowski

Fix race between swapoff and swapon resulting in setting blocksize of
PAGE_SIZE for block devices during swapoff.

The swapon modifies swap_info->old_block_size before acquiring
swapon_mutex. It reads block_size of bdev, stores it under
swap_info->old_block_size and sets new block_size to PAGE_SIZE.

On the other hand the swapoff sets the device's block_size to
old_block_size after releasing swapon_mutex.

This patch locks the swapon_mutex much earlier during swapon. It also
releases the swapon_mutex later during swapoff.

The effect of race can be triggered by following scenario:
 - One block swap device with block size of 512
 - thread 1: Swapon is called, swap is activated,
   p->old_block_size = block_size(p->bdev); /512/
   block_size(p->bdev) = PAGE_SIZE;
   Thread ends.

 - thread 2: Swapoff is called and it goes just after releasing the
   swapon_mutex. The swap is now fully disabled except of setting the
   block size to old value. The p->bdev->block_size is still equal to
   PAGE_SIZE.

 - thread 3: New swapon is called. This swap is disabled so without
   acquiring the swapon_mutex:
   - p->old_block_size = block_size(p->bdev); /PAGE_SIZE (!!!)/
   - block_size(p->bdev) = PAGE_SIZE;
   Swap is activated and thread ends.

 - thread 2: resumes work and sets blocksize to old value:
   - set_blocksize(bdev, p->old_block_size)
   But now the p->old_block_size is equal to PAGE_SIZE.

The patch swap-fix-set_blocksize-race-during-swapon-swapoff does not fix
this particular issue. It reduces the possibility of races as the swapon
must overwrite p->old_block_size before acquiring swapon_mutex in
swapoff.

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
---
 mm/swapfile.c |   20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 3963fc2..9b64ef4 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1926,7 +1926,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	spin_unlock(&p->lock);
 	spin_unlock(&swap_lock);
 	frontswap_invalidate_area(type);
-	mutex_unlock(&swapon_mutex);
 	free_percpu(p->percpu_cluster);
 	p->percpu_cluster = NULL;
 	vfree(swap_map);
@@ -1946,6 +1945,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 		mutex_unlock(&inode->i_mutex);
 	}
 	filp_close(swap_file, NULL);
+	mutex_unlock(&swapon_mutex);
 	err = 0;
 	atomic_inc(&proc_poll_event);
 	wake_up_interruptible(&proc_poll_wait);
@@ -2402,37 +2402,38 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 		}
 	}
 
+	mutex_lock(&swapon_mutex);
 	inode = mapping->host;
 	/* If S_ISREG(inode->i_mode) will do mutex_lock(&inode->i_mutex); */
 	error = claim_swapfile(p, inode);
 	if (unlikely(error))
-		goto bad_swap;
+		goto bad_swap_wmutex;
 
 	/*
 	 * Read the swap header.
 	 */
 	if (!mapping->a_ops->readpage) {
 		error = -EINVAL;
-		goto bad_swap;
+		goto bad_swap_wmutex;
 	}
 	page = read_mapping_page(mapping, 0, swap_file);
 	if (IS_ERR(page)) {
 		error = PTR_ERR(page);
-		goto bad_swap;
+		goto bad_swap_wmutex;
 	}
 	swap_header = kmap(page);
 
 	maxpages = read_swap_header(p, swap_header, inode);
 	if (unlikely(!maxpages)) {
 		error = -EINVAL;
-		goto bad_swap;
+		goto bad_swap_wmutex;
 	}
 
 	/* OK, set up the swap map and apply the bad block list */
 	swap_map = vzalloc(maxpages);
 	if (!swap_map) {
 		error = -ENOMEM;
-		goto bad_swap;
+		goto bad_swap_wmutex;
 	}
 	if (p->bdev && blk_queue_nonrot(bdev_get_queue(p->bdev))) {
 		p->flags |= SWP_SOLIDSTATE;
@@ -2462,13 +2463,13 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 
 	error = swap_cgroup_swapon(p->type, maxpages);
 	if (error)
-		goto bad_swap;
+		goto bad_swap_wmutex;
 
 	nr_extents = setup_swap_map_and_extents(p, swap_header, swap_map,
 		cluster_info, maxpages, &span);
 	if (unlikely(nr_extents < 0)) {
 		error = nr_extents;
-		goto bad_swap;
+		goto bad_swap_wmutex;
 	}
 	/* frontswap enabled? set up bit-per-page map for frontswap */
 	if (frontswap_enabled)
@@ -2504,7 +2505,6 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 		}
 	}
 
-	mutex_lock(&swapon_mutex);
 	prio = -1;
 	if (swap_flags & SWAP_FLAG_PREFER)
 		prio =
@@ -2529,6 +2529,8 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 		inode->i_flags |= S_SWAPFILE;
 	error = 0;
 	goto out;
+bad_swap_wmutex:
+	mutex_unlock(&swapon_mutex);
 bad_swap:
 	free_percpu(p->percpu_cluster);
 	p->percpu_cluster = NULL;
-- 
1.7.9.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] swap: fix setting PAGE_SIZE blocksize during swapoff/swapon race
  2013-10-14 13:58 [PATCH] swap: fix setting PAGE_SIZE blocksize during swapoff/swapon race Krzysztof Kozlowski
@ 2013-10-15  9:59 ` Hugh Dickins
  2013-10-15 11:22   ` Krzysztof Kozlowski
  0 siblings, 1 reply; 6+ messages in thread
From: Hugh Dickins @ 2013-10-15  9:59 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: Andrew Morton, linux-kernel, linux-mm, Hugh Dickins, Weijie Yang,
	Michal Hocko, Shaohua Li, Minchan Kim

On Mon, 14 Oct 2013, Krzysztof Kozlowski wrote:

> Fix race between swapoff and swapon resulting in setting blocksize of
> PAGE_SIZE for block devices during swapoff.
> 
> The swapon modifies swap_info->old_block_size before acquiring
> swapon_mutex. It reads block_size of bdev, stores it under
> swap_info->old_block_size and sets new block_size to PAGE_SIZE.
> 
> On the other hand the swapoff sets the device's block_size to
> old_block_size after releasing swapon_mutex.
> 
> This patch locks the swapon_mutex much earlier during swapon. It also
> releases the swapon_mutex later during swapoff.
> 
> The effect of race can be triggered by following scenario:
>  - One block swap device with block size of 512
>  - thread 1: Swapon is called, swap is activated,
>    p->old_block_size = block_size(p->bdev); /512/
>    block_size(p->bdev) = PAGE_SIZE;
>    Thread ends.
> 
>  - thread 2: Swapoff is called and it goes just after releasing the
>    swapon_mutex. The swap is now fully disabled except of setting the
>    block size to old value. The p->bdev->block_size is still equal to
>    PAGE_SIZE.
> 
>  - thread 3: New swapon is called. This swap is disabled so without
>    acquiring the swapon_mutex:
>    - p->old_block_size = block_size(p->bdev); /PAGE_SIZE (!!!)/
>    - block_size(p->bdev) = PAGE_SIZE;
>    Swap is activated and thread ends.
> 
>  - thread 2: resumes work and sets blocksize to old value:
>    - set_blocksize(bdev, p->old_block_size)
>    But now the p->old_block_size is equal to PAGE_SIZE.
> 
> The patch swap-fix-set_blocksize-race-during-swapon-swapoff does not fix
> this particular issue. It reduces the possibility of races as the swapon
> must overwrite p->old_block_size before acquiring swapon_mutex in
> swapoff.
> 
> Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>

Sorry you're being blown back and forth on this, but I say Nack to
this version.  I've not spent the time to check whether it ends up
correct or not; but your original patch was appropriate to the bug,
and this one is just unnecessary churn in my view.

Hugh

> ---
>  mm/swapfile.c |   20 +++++++++++---------
>  1 file changed, 11 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 3963fc2..9b64ef4 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1926,7 +1926,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
>  	spin_unlock(&p->lock);
>  	spin_unlock(&swap_lock);
>  	frontswap_invalidate_area(type);
> -	mutex_unlock(&swapon_mutex);
>  	free_percpu(p->percpu_cluster);
>  	p->percpu_cluster = NULL;
>  	vfree(swap_map);
> @@ -1946,6 +1945,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
>  		mutex_unlock(&inode->i_mutex);
>  	}
>  	filp_close(swap_file, NULL);
> +	mutex_unlock(&swapon_mutex);
>  	err = 0;
>  	atomic_inc(&proc_poll_event);
>  	wake_up_interruptible(&proc_poll_wait);
> @@ -2402,37 +2402,38 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
>  		}
>  	}
>  
> +	mutex_lock(&swapon_mutex);
>  	inode = mapping->host;
>  	/* If S_ISREG(inode->i_mode) will do mutex_lock(&inode->i_mutex); */
>  	error = claim_swapfile(p, inode);
>  	if (unlikely(error))
> -		goto bad_swap;
> +		goto bad_swap_wmutex;
>  
>  	/*
>  	 * Read the swap header.
>  	 */
>  	if (!mapping->a_ops->readpage) {
>  		error = -EINVAL;
> -		goto bad_swap;
> +		goto bad_swap_wmutex;
>  	}
>  	page = read_mapping_page(mapping, 0, swap_file);
>  	if (IS_ERR(page)) {
>  		error = PTR_ERR(page);
> -		goto bad_swap;
> +		goto bad_swap_wmutex;
>  	}
>  	swap_header = kmap(page);
>  
>  	maxpages = read_swap_header(p, swap_header, inode);
>  	if (unlikely(!maxpages)) {
>  		error = -EINVAL;
> -		goto bad_swap;
> +		goto bad_swap_wmutex;
>  	}
>  
>  	/* OK, set up the swap map and apply the bad block list */
>  	swap_map = vzalloc(maxpages);
>  	if (!swap_map) {
>  		error = -ENOMEM;
> -		goto bad_swap;
> +		goto bad_swap_wmutex;
>  	}
>  	if (p->bdev && blk_queue_nonrot(bdev_get_queue(p->bdev))) {
>  		p->flags |= SWP_SOLIDSTATE;
> @@ -2462,13 +2463,13 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
>  
>  	error = swap_cgroup_swapon(p->type, maxpages);
>  	if (error)
> -		goto bad_swap;
> +		goto bad_swap_wmutex;
>  
>  	nr_extents = setup_swap_map_and_extents(p, swap_header, swap_map,
>  		cluster_info, maxpages, &span);
>  	if (unlikely(nr_extents < 0)) {
>  		error = nr_extents;
> -		goto bad_swap;
> +		goto bad_swap_wmutex;
>  	}
>  	/* frontswap enabled? set up bit-per-page map for frontswap */
>  	if (frontswap_enabled)
> @@ -2504,7 +2505,6 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
>  		}
>  	}
>  
> -	mutex_lock(&swapon_mutex);
>  	prio = -1;
>  	if (swap_flags & SWAP_FLAG_PREFER)
>  		prio =
> @@ -2529,6 +2529,8 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
>  		inode->i_flags |= S_SWAPFILE;
>  	error = 0;
>  	goto out;
> +bad_swap_wmutex:
> +	mutex_unlock(&swapon_mutex);
>  bad_swap:
>  	free_percpu(p->percpu_cluster);
>  	p->percpu_cluster = NULL;
> -- 
> 1.7.9.5
> 
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] swap: fix setting PAGE_SIZE blocksize during swapoff/swapon race
  2013-10-15  9:59 ` Hugh Dickins
@ 2013-10-15 11:22   ` Krzysztof Kozlowski
  2013-10-15 17:19     ` Hugh Dickins
  0 siblings, 1 reply; 6+ messages in thread
From: Krzysztof Kozlowski @ 2013-10-15 11:22 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, linux-kernel, linux-mm, Weijie Yang, Michal Hocko,
	Shaohua Li, Minchan Kim

On Tue, 2013-10-15 at 02:59 -0700, Hugh Dickins wrote:
> On Mon, 14 Oct 2013, Krzysztof Kozlowski wrote:
> 
> > Fix race between swapoff and swapon resulting in setting blocksize of
> > PAGE_SIZE for block devices during swapoff.
> > 
> > The swapon modifies swap_info->old_block_size before acquiring
> > swapon_mutex. It reads block_size of bdev, stores it under
> > swap_info->old_block_size and sets new block_size to PAGE_SIZE.
> > 
> > On the other hand the swapoff sets the device's block_size to
> > old_block_size after releasing swapon_mutex.
> > 
> > This patch locks the swapon_mutex much earlier during swapon. It also
> > releases the swapon_mutex later during swapoff.
> > 
> > The effect of race can be triggered by following scenario:
> >  - One block swap device with block size of 512
> >  - thread 1: Swapon is called, swap is activated,
> >    p->old_block_size = block_size(p->bdev); /512/
> >    block_size(p->bdev) = PAGE_SIZE;
> >    Thread ends.
> > 
> >  - thread 2: Swapoff is called and it goes just after releasing the
> >    swapon_mutex. The swap is now fully disabled except of setting the
> >    block size to old value. The p->bdev->block_size is still equal to
> >    PAGE_SIZE.
> > 
> >  - thread 3: New swapon is called. This swap is disabled so without
> >    acquiring the swapon_mutex:
> >    - p->old_block_size = block_size(p->bdev); /PAGE_SIZE (!!!)/
> >    - block_size(p->bdev) = PAGE_SIZE;
> >    Swap is activated and thread ends.
> > 
> >  - thread 2: resumes work and sets blocksize to old value:
> >    - set_blocksize(bdev, p->old_block_size)
> >    But now the p->old_block_size is equal to PAGE_SIZE.
> > 
> > The patch swap-fix-set_blocksize-race-during-swapon-swapoff does not fix
> > this particular issue. It reduces the possibility of races as the swapon
> > must overwrite p->old_block_size before acquiring swapon_mutex in
> > swapoff.
> > 
> > Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
> 
> Sorry you're being blown back and forth on this, but I say Nack to
> this version.  I've not spent the time to check whether it ends up
> correct or not; but your original patch was appropriate to the bug,
> and this one is just unnecessary churn in my view.

Hi,

I still think my previous patch does not solve the issue entirely.
The call set_blocksize() in swapoff quite often sets PAGE_SIZE instead
of valid block size (e.g. 512). I trigger this with:
------
for i in `seq 1000`
do
	swapoff /dev/sdc1 &
	swapon /dev/sdc1 &
	swapon /dev/sdc1 &
done
------
10 seconds run of this script resulted in 50% of set_blocksize(PAGE_SIZE).
Although effect can only be observed after adding printks (block device is
released).

Best regards,
Krzysztof



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] swap: fix setting PAGE_SIZE blocksize during swapoff/swapon race
  2013-10-15 11:22   ` Krzysztof Kozlowski
@ 2013-10-15 17:19     ` Hugh Dickins
  2013-10-17 15:48       ` Weijie Yang
  0 siblings, 1 reply; 6+ messages in thread
From: Hugh Dickins @ 2013-10-15 17:19 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: Al Viro, Andrew Morton, linux-kernel, linux-mm, Weijie Yang,
	Michal Hocko, Shaohua Li, Minchan Kim

On Tue, 15 Oct 2013, Krzysztof Kozlowski wrote:
> On Tue, 2013-10-15 at 02:59 -0700, Hugh Dickins wrote:
> > On Mon, 14 Oct 2013, Krzysztof Kozlowski wrote:
> > 
> > > Fix race between swapoff and swapon resulting in setting blocksize of
> > > PAGE_SIZE for block devices during swapoff.
> > > 
> > > The swapon modifies swap_info->old_block_size before acquiring
> > > swapon_mutex. It reads block_size of bdev, stores it under
> > > swap_info->old_block_size and sets new block_size to PAGE_SIZE.
> > > 
> > > On the other hand the swapoff sets the device's block_size to
> > > old_block_size after releasing swapon_mutex.
> > > 
> > > This patch locks the swapon_mutex much earlier during swapon. It also
> > > releases the swapon_mutex later during swapoff.
> > > 
> > > The effect of race can be triggered by following scenario:
> > >  - One block swap device with block size of 512
> > >  - thread 1: Swapon is called, swap is activated,
> > >    p->old_block_size = block_size(p->bdev); /512/
> > >    block_size(p->bdev) = PAGE_SIZE;
> > >    Thread ends.
> > > 
> > >  - thread 2: Swapoff is called and it goes just after releasing the
> > >    swapon_mutex. The swap is now fully disabled except of setting the
> > >    block size to old value. The p->bdev->block_size is still equal to
> > >    PAGE_SIZE.
> > > 
> > >  - thread 3: New swapon is called. This swap is disabled so without
> > >    acquiring the swapon_mutex:
> > >    - p->old_block_size = block_size(p->bdev); /PAGE_SIZE (!!!)/
> > >    - block_size(p->bdev) = PAGE_SIZE;
> > >    Swap is activated and thread ends.
> > > 
> > >  - thread 2: resumes work and sets blocksize to old value:
> > >    - set_blocksize(bdev, p->old_block_size)
> > >    But now the p->old_block_size is equal to PAGE_SIZE.
> > > 
> > > The patch swap-fix-set_blocksize-race-during-swapon-swapoff does not fix
> > > this particular issue. It reduces the possibility of races as the swapon
> > > must overwrite p->old_block_size before acquiring swapon_mutex in
> > > swapoff.
> > > 
> > > Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
> > 
> > Sorry you're being blown back and forth on this, but I say Nack to
> > this version.  I've not spent the time to check whether it ends up
> > correct or not; but your original patch was appropriate to the bug,
> > and this one is just unnecessary churn in my view.
> 
> Hi,
> 
> I still think my previous patch does not solve the issue entirely.
> The call set_blocksize() in swapoff quite often sets PAGE_SIZE instead
> of valid block size (e.g. 512). I trigger this with:

PAGE_SIZE and 512 are equally valid block sizes,
it's just hard to support both consistently at the same instant.

> ------
> for i in `seq 1000`
> do
> 	swapoff /dev/sdc1 &
> 	swapon /dev/sdc1 &
> 	swapon /dev/sdc1 &
> done
> ------
> 10 seconds run of this script resulted in 50% of set_blocksize(PAGE_SIZE).
> Although effect can only be observed after adding printks (block device is
> released).

But despite PAGE_SIZE being a valid block size,
I agree that it's odd if you see variation there.

Here's my guess: it looks as if the p->bdev test is inadequate, in the
decision whether bad_swap should set_blocksize() or not: p->bdev is not
usually reset when a swap_info_struct is released for reuse.

Please try correcting that, either by resetting p->bdev where necessary,
or by putting a better test in bad_swap: see if that fixes this oddity.

I still much prefer your original little patch,
to this extension of the use of swapon_mutex.

However, a bigger question would be, why does swapoff have to set block
size back to old_block_size anyway?  That was introduced in 2.5.13 by

<viro@math.psu.edu> (02/05/01 1.447.69.1)
	[PATCH] (1/6) blksize_size[] removal
	
	 - preliminary cleanups: make sure that swapoff restores original block
	   size, kill set_blocksize() (and use of __bread()) in multipath.c,
	   reorder opening device and finding its block size in mtdblock.c.

Al, not an urgent question, but is this swapoff old_block_size stuff
still necessary?  And can't swapon just use whatever bd_block_size is
already in force?  IIUC, it plays no part beyond the initial readpage
of swap header.

Thanks,
Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] swap: fix setting PAGE_SIZE blocksize during swapoff/swapon race
  2013-10-15 17:19     ` Hugh Dickins
@ 2013-10-17 15:48       ` Weijie Yang
  2013-10-17 15:59         ` Weijie Yang
  0 siblings, 1 reply; 6+ messages in thread
From: Weijie Yang @ 2013-10-17 15:48 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Krzysztof Kozlowski, Al Viro, Andrew Morton, Linux-Kernel,
	Linux-MM, Michal Hocko, Shaohua Li, Minchan Kim

On Wed, Oct 16, 2013 at 1:19 AM, Hugh Dickins <hughd@google.com> wrote:
> On Tue, 15 Oct 2013, Krzysztof Kozlowski wrote:
>> On Tue, 2013-10-15 at 02:59 -0700, Hugh Dickins wrote:
>> > On Mon, 14 Oct 2013, Krzysztof Kozlowski wrote:
>> >
>> > > Fix race between swapoff and swapon resulting in setting blocksize of
>> > > PAGE_SIZE for block devices during swapoff.
>> > >
>> > > The swapon modifies swap_info->old_block_size before acquiring
>> > > swapon_mutex. It reads block_size of bdev, stores it under
>> > > swap_info->old_block_size and sets new block_size to PAGE_SIZE.
>> > >
>> > > On the other hand the swapoff sets the device's block_size to
>> > > old_block_size after releasing swapon_mutex.
>> > >
>> > > This patch locks the swapon_mutex much earlier during swapon. It also
>> > > releases the swapon_mutex later during swapoff.
>> > >
>> > > The effect of race can be triggered by following scenario:
>> > >  - One block swap device with block size of 512
>> > >  - thread 1: Swapon is called, swap is activated,
>> > >    p->old_block_size = block_size(p->bdev); /512/
>> > >    block_size(p->bdev) = PAGE_SIZE;
>> > >    Thread ends.
>> > >
>> > >  - thread 2: Swapoff is called and it goes just after releasing the
>> > >    swapon_mutex. The swap is now fully disabled except of setting the
>> > >    block size to old value. The p->bdev->block_size is still equal to
>> > >    PAGE_SIZE.
>> > >
>> > >  - thread 3: New swapon is called. This swap is disabled so without
>> > >    acquiring the swapon_mutex:
>> > >    - p->old_block_size = block_size(p->bdev); /PAGE_SIZE (!!!)/
>> > >    - block_size(p->bdev) = PAGE_SIZE;
>> > >    Swap is activated and thread ends.
>> > >
>> > >  - thread 2: resumes work and sets blocksize to old value:
>> > >    - set_blocksize(bdev, p->old_block_size)
>> > >    But now the p->old_block_size is equal to PAGE_SIZE.
>> > >
>> > > The patch swap-fix-set_blocksize-race-during-swapon-swapoff does not fix
>> > > this particular issue. It reduces the possibility of races as the swapon
>> > > must overwrite p->old_block_size before acquiring swapon_mutex in
>> > > swapoff.
>> > >
>> > > Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
>> >
>> > Sorry you're being blown back and forth on this, but I say Nack to
>> > this version.  I've not spent the time to check whether it ends up
>> > correct or not; but your original patch was appropriate to the bug,
>> > and this one is just unnecessary churn in my view.
>>
>> Hi,
>>
>> I still think my previous patch does not solve the issue entirely.
>> The call set_blocksize() in swapoff quite often sets PAGE_SIZE instead
>> of valid block size (e.g. 512). I trigger this with:
>
> PAGE_SIZE and 512 are equally valid block sizes,
> it's just hard to support both consistently at the same instant.
>
>> ------
>> for i in `seq 1000`
>> do
>>       swapoff /dev/sdc1 &
>>       swapon /dev/sdc1 &
>>       swapon /dev/sdc1 &
>> done
>> ------
>> 10 seconds run of this script resulted in 50% of set_blocksize(PAGE_SIZE).
>> Although effect can only be observed after adding printks (block device is
>> released).
>
> But despite PAGE_SIZE being a valid block size,
> I agree that it's odd if you see variation there.
>
> Here's my guess: it looks as if the p->bdev test is inadequate, in the
> decision whether bad_swap should set_blocksize() or not: p->bdev is not
> usually reset when a swap_info_struct is released for reuse.
>
> Please try correcting that, either by resetting p->bdev where necessary,
> or by putting a better test in bad_swap: see if that fixes this oddity.
>
> I still much prefer your original little patch,
> to this extension of the use of swapon_mutex.
>
> However, a bigger question would be, why does swapoff have to set block
> size back to old_block_size anyway?  That was introduced in 2.5.13 by
>
> <viro@math.psu.edu> (02/05/01 1.447.69.1)
>         [PATCH] (1/6) blksize_size[] removal
>
>          - preliminary cleanups: make sure that swapoff restores original block
>            size, kill set_blocksize() (and use of __bread()) in multipath.c,
>            reorder opening device and finding its block size in mtdblock.c.
>
> Al, not an urgent question, but is this swapoff old_block_size stuff
> still necessary?  And can't swapon just use whatever bd_block_size is
> already in force?  IIUC, it plays no part beyond the initial readpage
> of swap header.
>
> Thanks,
> Hugh

Let me try to explain(and guess):
we have to set_block in swapon. the swap_header is PAGE_SIZE, if device's
blocksize is more than PAGE_SIZE, then the swap entry address on swapfile
would be not PAGE_SIZE aligned. or one swap page can not fill a block.
There maybe a problem for some device.
The set_blocksize() do the judgement work for swapon.
And may be some userland tools assume swap device blocksize is PAGE_SIZE?

issues here are more than this one:
After swap_info_struct is released for reuse in swapoff.
Its corresponding resources are released later, such as:
- swap_cgroup_swapoff(type);
- blkdev_put
- inode->i_flags &= ~S_SWAPFILE;

we need release(or clean) these resources before release swap_info_struct.

to Krzysztof: I think it is better to add this handle to your patch

regards

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] swap: fix setting PAGE_SIZE blocksize during swapoff/swapon race
  2013-10-17 15:48       ` Weijie Yang
@ 2013-10-17 15:59         ` Weijie Yang
  0 siblings, 0 replies; 6+ messages in thread
From: Weijie Yang @ 2013-10-17 15:59 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Krzysztof Kozlowski, Al Viro, Andrew Morton, Linux-Kernel,
	Linux-MM, Michal Hocko, Shaohua Li, Minchan Kim

On Thu, Oct 17, 2013 at 11:48 PM, Weijie Yang <weijie.yang.kh@gmail.com> wrote:
> On Wed, Oct 16, 2013 at 1:19 AM, Hugh Dickins <hughd@google.com> wrote:
>> On Tue, 15 Oct 2013, Krzysztof Kozlowski wrote:
>>> On Tue, 2013-10-15 at 02:59 -0700, Hugh Dickins wrote:
>>> > On Mon, 14 Oct 2013, Krzysztof Kozlowski wrote:
>>> >
>>> > > Fix race between swapoff and swapon resulting in setting blocksize of
>>> > > PAGE_SIZE for block devices during swapoff.
>>> > >
>>> > > The swapon modifies swap_info->old_block_size before acquiring
>>> > > swapon_mutex. It reads block_size of bdev, stores it under
>>> > > swap_info->old_block_size and sets new block_size to PAGE_SIZE.
>>> > >
>>> > > On the other hand the swapoff sets the device's block_size to
>>> > > old_block_size after releasing swapon_mutex.
>>> > >
>>> > > This patch locks the swapon_mutex much earlier during swapon. It also
>>> > > releases the swapon_mutex later during swapoff.
>>> > >
>>> > > The effect of race can be triggered by following scenario:
>>> > >  - One block swap device with block size of 512
>>> > >  - thread 1: Swapon is called, swap is activated,
>>> > >    p->old_block_size = block_size(p->bdev); /512/
>>> > >    block_size(p->bdev) = PAGE_SIZE;
>>> > >    Thread ends.
>>> > >
>>> > >  - thread 2: Swapoff is called and it goes just after releasing the
>>> > >    swapon_mutex. The swap is now fully disabled except of setting the
>>> > >    block size to old value. The p->bdev->block_size is still equal to
>>> > >    PAGE_SIZE.
>>> > >
>>> > >  - thread 3: New swapon is called. This swap is disabled so without
>>> > >    acquiring the swapon_mutex:
>>> > >    - p->old_block_size = block_size(p->bdev); /PAGE_SIZE (!!!)/
>>> > >    - block_size(p->bdev) = PAGE_SIZE;
>>> > >    Swap is activated and thread ends.
>>> > >
>>> > >  - thread 2: resumes work and sets blocksize to old value:
>>> > >    - set_blocksize(bdev, p->old_block_size)
>>> > >    But now the p->old_block_size is equal to PAGE_SIZE.
>>> > >
>>> > > The patch swap-fix-set_blocksize-race-during-swapon-swapoff does not fix
>>> > > this particular issue. It reduces the possibility of races as the swapon
>>> > > must overwrite p->old_block_size before acquiring swapon_mutex in
>>> > > swapoff.
>>> > >
>>> > > Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
>>> >
>>> > Sorry you're being blown back and forth on this, but I say Nack to
>>> > this version.  I've not spent the time to check whether it ends up
>>> > correct or not; but your original patch was appropriate to the bug,
>>> > and this one is just unnecessary churn in my view.
>>>
>>> Hi,
>>>
>>> I still think my previous patch does not solve the issue entirely.
>>> The call set_blocksize() in swapoff quite often sets PAGE_SIZE instead
>>> of valid block size (e.g. 512). I trigger this with:
>>
>> PAGE_SIZE and 512 are equally valid block sizes,
>> it's just hard to support both consistently at the same instant.
>>
>>> ------
>>> for i in `seq 1000`
>>> do
>>>       swapoff /dev/sdc1 &
>>>       swapon /dev/sdc1 &
>>>       swapon /dev/sdc1 &
>>> done
>>> ------
>>> 10 seconds run of this script resulted in 50% of set_blocksize(PAGE_SIZE).
>>> Although effect can only be observed after adding printks (block device is
>>> released).
>>
>> But despite PAGE_SIZE being a valid block size,
>> I agree that it's odd if you see variation there.
>>
>> Here's my guess: it looks as if the p->bdev test is inadequate, in the
>> decision whether bad_swap should set_blocksize() or not: p->bdev is not
>> usually reset when a swap_info_struct is released for reuse.
>>
>> Please try correcting that, either by resetting p->bdev where necessary,
>> or by putting a better test in bad_swap: see if that fixes this oddity.
>>
>> I still much prefer your original little patch,
>> to this extension of the use of swapon_mutex.
>>
>> However, a bigger question would be, why does swapoff have to set block
>> size back to old_block_size anyway?  That was introduced in 2.5.13 by
>>
>> <viro@math.psu.edu> (02/05/01 1.447.69.1)
>>         [PATCH] (1/6) blksize_size[] removal
>>
>>          - preliminary cleanups: make sure that swapoff restores original block
>>            size, kill set_blocksize() (and use of __bread()) in multipath.c,
>>            reorder opening device and finding its block size in mtdblock.c.
>>
>> Al, not an urgent question, but is this swapoff old_block_size stuff
>> still necessary?  And can't swapon just use whatever bd_block_size is
>> already in force?  IIUC, it plays no part beyond the initial readpage
>> of swap header.
>>
>> Thanks,
>> Hugh
>
> Let me try to explain(and guess):
> we have to set_block in swapon. the swap_header is PAGE_SIZE, if device's
> blocksize is more than PAGE_SIZE, then the swap entry address on swapfile
> would be not PAGE_SIZE aligned. or one swap page can not fill a block.
> There maybe a problem for some device.
> The set_blocksize() do the judgement work for swapon.
> And may be some userland tools assume swap device blocksize is PAGE_SIZE?
>
> issues here are more than this one:
> After swap_info_struct is released for reuse in swapoff.
> Its corresponding resources are released later, such as:
> - swap_cgroup_swapoff(type);
> - blkdev_put
> - inode->i_flags &= ~S_SWAPFILE;
>

my code is 3.11 version. And in 3.12-rc5,
free_percpu(p->percpu_cluster);
is another issue that released later.

> we need release(or clean) these resources before release swap_info_struct.
>
> to Krzysztof: I think it is better to add this handle to your patch
>
> regards

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-10-17 15:59 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-10-14 13:58 [PATCH] swap: fix setting PAGE_SIZE blocksize during swapoff/swapon race Krzysztof Kozlowski
2013-10-15  9:59 ` Hugh Dickins
2013-10-15 11:22   ` Krzysztof Kozlowski
2013-10-15 17:19     ` Hugh Dickins
2013-10-17 15:48       ` Weijie Yang
2013-10-17 15:59         ` Weijie Yang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).