linux-block.vger.kernel.org archive mirror
* [PATCH] mm, swap: disallow swapon() on zoned block devices
@ 2019-10-15  4:38 Naohiro Aota
  2019-10-15  7:57 ` Christoph Hellwig
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Naohiro Aota @ 2019-10-15  4:38 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-fsdevel, linux-block, Andrew Morton, Naohiro Aota

A zoned block device consists of a number of zones. Zones are
either conventional and accepting random writes or sequential and
requiring that writes be issued in LBA order from each zone write
pointer position. For the write restriction, zoned block devices are
not suitable for a swap device. Disallow swapon on them.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 mm/swapfile.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index dab43523afdd..a9da20739017 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2887,6 +2887,8 @@ static int claim_swapfile(struct swap_info_struct *p, struct inode *inode)
 		error = set_blocksize(p->bdev, PAGE_SIZE);
 		if (error < 0)
 			return error;
+		if (blk_queue_is_zoned(p->bdev->bd_queue))
+			return -EINVAL;
 		p->flags |= SWP_BLKDEV;
 	} else if (S_ISREG(inode->i_mode)) {
 		p->bdev = inode->i_sb->s_bdev;
-- 
2.23.0


* Re: [PATCH] mm, swap: disallow swapon() on zoned block devices
  2019-10-15  4:38 [PATCH] mm, swap: disallow swapon() on zoned block devices Naohiro Aota
@ 2019-10-15  7:57 ` Christoph Hellwig
  2019-10-15  8:58 ` [PATCH v2] " Naohiro Aota
  2019-10-15 11:35 ` Project idea: Swap to " Matthew Wilcox
  2 siblings, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2019-10-15  7:57 UTC (permalink / raw)
  To: Naohiro Aota; +Cc: linux-mm, linux-fsdevel, linux-block, Andrew Morton

On Tue, Oct 15, 2019 at 01:38:27PM +0900, Naohiro Aota wrote:
> +		if (blk_queue_is_zoned(p->bdev->bd_queue))
> +			return -EINVAL;

Please add a comment here (based on your changelog).

* [PATCH v2] mm, swap: disallow swapon() on zoned block devices
  2019-10-15  4:38 [PATCH] mm, swap: disallow swapon() on zoned block devices Naohiro Aota
  2019-10-15  7:57 ` Christoph Hellwig
@ 2019-10-15  8:58 ` Naohiro Aota
  2019-10-15  9:06   ` Christoph Hellwig
  2019-10-15 11:35 ` Project idea: Swap to " Matthew Wilcox
  2 siblings, 1 reply; 11+ messages in thread
From: Naohiro Aota @ 2019-10-15  8:58 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-fsdevel, linux-block, Andrew Morton, Christoph Hellwig,
	Naohiro Aota

A zoned block device consists of a number of zones. Zones are either
conventional and accepting random writes or sequential and requiring that
writes be issued in LBA order from each zone write pointer position. For
the write restriction, zoned block devices are not suitable for a swap
device. Disallow swapon on them.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
v2: add comments according to Christoph's feedback, reformat changelog.
---
 mm/swapfile.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index dab43523afdd..f2c4224d1f8a 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2887,6 +2887,14 @@ static int claim_swapfile(struct swap_info_struct *p, struct inode *inode)
 		error = set_blocksize(p->bdev, PAGE_SIZE);
 		if (error < 0)
 			return error;
+		/*
+		 * Zoned block device contains zones that have
+		 * sequential write only restriction. For the restriction,
+		 * zoned block devices are not suitable for a swap device.
+		 * Disallow them here.
+		 */
+		if (blk_queue_is_zoned(p->bdev->bd_queue))
+			return -EINVAL;
 		p->flags |= SWP_BLKDEV;
 	} else if (S_ISREG(inode->i_mode)) {
 		p->bdev = inode->i_sb->s_bdev;
-- 
2.23.0
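
[Editorial sketch, not part of the patch.] From user space, the zoned model
of a disk is visible in the sysfs attribute /sys/block/<disk>/queue/zoned,
which reads "none", "host-aware" or "host-managed". The helper names below
(zoned_model_is_zoned, disk_is_zoned) are hypothetical. Treating both
host-aware and host-managed as zoned is an assumption about what
blk_queue_is_zoned() reports, so take this as a rough way to predict whether
swapon() would be refused, not as a statement of the kernel's exact logic.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Classify the string read from /sys/block/<disk>/queue/zoned.
 * Returns 1 for a zoned model, 0 for "none", -1 for anything else.
 * Assumption: both "host-aware" and "host-managed" count as zoned.
 */
static int zoned_model_is_zoned(const char *model)
{
	assert(model != NULL);
	if (strcmp(model, "none") == 0)
		return 0;
	if (strcmp(model, "host-aware") == 0 ||
	    strcmp(model, "host-managed") == 0)
		return 1;
	return -1;
}

/*
 * Read the zoned model of a whole disk (e.g. "sda", not a partition).
 * Returns 1 if zoned, 0 if not, -1 if the attribute cannot be read
 * or holds an unrecognized value.
 */
static int disk_is_zoned(const char *disk)
{
	char path[256];
	char model[32];
	FILE *f;

	assert(disk != NULL);
	snprintf(path, sizeof(path), "/sys/block/%s/queue/zoned", disk);

	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(model, sizeof(model), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	model[strcspn(model, "\n")] = '\0';	/* strip trailing newline */
	return zoned_model_is_zoned(model);
}
```

A caller could run disk_is_zoned("sda") before attempting swapon and warn the
user when it returns 1.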


* Re: [PATCH v2] mm, swap: disallow swapon() on zoned block devices
  2019-10-15  8:58 ` [PATCH v2] " Naohiro Aota
@ 2019-10-15  9:06   ` Christoph Hellwig
  2019-10-15 20:43     ` Andrew Morton
  0 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2019-10-15  9:06 UTC (permalink / raw)
  To: Naohiro Aota
  Cc: linux-mm, linux-fsdevel, linux-block, Andrew Morton, Christoph Hellwig

On Tue, Oct 15, 2019 at 05:58:14PM +0900, Naohiro Aota wrote:
> A zoned block device consists of a number of zones. Zones are either
> conventional and accepting random writes or sequential and requiring that
> writes be issued in LBA order from each zone write pointer position. For
> the write restriction, zoned block devices are not suitable for a swap
> device. Disallow swapon on them.
> 
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
> v2: add comments according to Christoph's feedback, reformat changelog.
> ---
>  mm/swapfile.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index dab43523afdd..f2c4224d1f8a 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -2887,6 +2887,14 @@ static int claim_swapfile(struct swap_info_struct *p, struct inode *inode)
>  		error = set_blocksize(p->bdev, PAGE_SIZE);
>  		if (error < 0)
>  			return error;
> +		/*
> +		 * Zoned block device contains zones that have
> +		 * sequential write only restriction. For the restriction,
> +		 * zoned block devices are not suitable for a swap device.
> +		 * Disallow them here.
> +		 */
> +		if (blk_queue_is_zoned(p->bdev->bd_queue))

Please use up all 80 chars per line.  Otherwise this looks fine:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Project idea: Swap to zoned block devices
  2019-10-15  4:38 [PATCH] mm, swap: disallow swapon() on zoned block devices Naohiro Aota
  2019-10-15  7:57 ` Christoph Hellwig
  2019-10-15  8:58 ` [PATCH v2] " Naohiro Aota
@ 2019-10-15 11:35 ` Matthew Wilcox
  2019-10-15 13:27   ` Theodore Y. Ts'o
  2019-10-15 13:48   ` Hannes Reinecke
  2 siblings, 2 replies; 11+ messages in thread
From: Matthew Wilcox @ 2019-10-15 11:35 UTC (permalink / raw)
  To: Naohiro Aota; +Cc: linux-mm, linux-fsdevel, linux-block, Andrew Morton

On Tue, Oct 15, 2019 at 01:38:27PM +0900, Naohiro Aota wrote:
> A zoned block device consists of a number of zones. Zones are
> either conventional and accepting random writes or sequential and
> requiring that writes be issued in LBA order from each zone write
> pointer position. For the write restriction, zoned block devices are
> not suitable for a swap device. Disallow swapon on them.

That's unfortunate.  I wonder what it would take to make the swap code be
suitable for zoned devices.  It might even perform better on conventional
drives since swapout would be a large linear write.  Swapin would be a
fragmented, seeky set of reads, but this would seem like an excellent
university project.

* Re: Project idea: Swap to zoned block devices
  2019-10-15 11:35 ` Project idea: Swap to " Matthew Wilcox
@ 2019-10-15 13:27   ` Theodore Y. Ts'o
  2019-10-15 13:48   ` Hannes Reinecke
  1 sibling, 0 replies; 11+ messages in thread
From: Theodore Y. Ts'o @ 2019-10-15 13:27 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Naohiro Aota, linux-mm, linux-fsdevel, linux-block, Andrew Morton

On Tue, Oct 15, 2019 at 04:35:48AM -0700, Matthew Wilcox wrote:
> On Tue, Oct 15, 2019 at 01:38:27PM +0900, Naohiro Aota wrote:
> > A zoned block device consists of a number of zones. Zones are
> > either conventional and accepting random writes or sequential and
> > requiring that writes be issued in LBA order from each zone write
> > pointer position. For the write restriction, zoned block devices are
> > not suitable for a swap device. Disallow swapon on them.
> 
> That's unfortunate.  I wonder what it would take to make the swap code be
> suitable for zoned devices.  It might even perform better on conventional
> drives since swapout would be a large linear write.  Swapin would be a
> fragmented, seeky set of reads, but this would seem like an excellent
> university project.

Also maybe a great Outreachy or GSOC project?


* Re: Project idea: Swap to zoned block devices
  2019-10-15 11:35 ` Project idea: Swap to " Matthew Wilcox
  2019-10-15 13:27   ` Theodore Y. Ts'o
@ 2019-10-15 13:48   ` Hannes Reinecke
  2019-10-15 14:50     ` Christopher Lameter
  2019-10-15 15:09     ` Matthew Wilcox
  1 sibling, 2 replies; 11+ messages in thread
From: Hannes Reinecke @ 2019-10-15 13:48 UTC (permalink / raw)
  To: Matthew Wilcox, Naohiro Aota
  Cc: linux-mm, linux-fsdevel, linux-block, Andrew Morton

On 10/15/19 1:35 PM, Matthew Wilcox wrote:
> On Tue, Oct 15, 2019 at 01:38:27PM +0900, Naohiro Aota wrote:
>> A zoned block device consists of a number of zones. Zones are
>> either conventional and accepting random writes or sequential and
>> requiring that writes be issued in LBA order from each zone write
>> pointer position. For the write restriction, zoned block devices are
>> not suitable for a swap device. Disallow swapon on them.
> 
> That's unfortunate.  I wonder what it would take to make the swap code be
> suitable for zoned devices.  It might even perform better on conventional
> drives since swapout would be a large linear write.  Swapin would be a
> fragmented, seeky set of reads, but this would seem like an excellent
> university project.
> 
The main problem I'm seeing is the eviction of pages from swap.
While swapin is easy (as you can do random access on reads), evicting pages
from cache becomes extremely tricky as you can only delete entire zones.
So how do we mark pages within zones as being stale?
Or can we modify the swapin code to always swap in an entire zone and
discard it immediately?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      Teamlead Storage & Networking
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 247165 (AG München), GF: Felix Imendörffer

* Re: Project idea: Swap to zoned block devices
  2019-10-15 13:48   ` Hannes Reinecke
@ 2019-10-15 14:50     ` Christopher Lameter
  2019-10-15 15:09     ` Matthew Wilcox
  1 sibling, 0 replies; 11+ messages in thread
From: Christopher Lameter @ 2019-10-15 14:50 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Matthew Wilcox, Naohiro Aota, linux-mm, linux-fsdevel,
	linux-block, Andrew Morton

On Tue, 15 Oct 2019, Hannes Reinecke wrote:

> On 10/15/19 1:35 PM, Matthew Wilcox wrote:
> > On Tue, Oct 15, 2019 at 01:38:27PM +0900, Naohiro Aota wrote:
> >> A zoned block device consists of a number of zones. Zones are
> >> either conventional and accepting random writes or sequential and
> >> requiring that writes be issued in LBA order from each zone write
> >> pointer position. For the write restriction, zoned block devices are
> >> not suitable for a swap device. Disallow swapon on them.
> >
> > That's unfortunate.  I wonder what it would take to make the swap code be
> > suitable for zoned devices.  It might even perform better on conventional
> > drives since swapout would be a large linear write.  Swapin would be a
> > fragmented, seeky set of reads, but this would seem like an excellent
> > university project.
> >
> The main problem I'm seeing is the eviction of pages from swap.
> While swapin is easy (as you can do random access on reads), evicting pages
> from cache becomes extremely tricky as you can only delete entire zones.
> So how do we mark pages within zones as being stale?
> Or can we modify the swapin code to always swap in an entire zone and
> discard it immediately?

On swapout you would change the block number on the swap device to the
latest and increment it?

Mark the prior block number as unused and then at some convenient time scan
the map and see if you can somehow free up a zone?
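
[Editorial sketch.] The two steps suggested above (always append at the
latest position, mark the prior location unused, then scan for zones that can
be freed) could look roughly like the following user-space model. All names
(struct zone_map, zmap_append, zmap_release, zmap_reclaim) and the tiny sizes
are hypothetical. Resetting a zone is modelled by rewinding its write
pointer. Nothing here is taken from mm/swapfile.c, and relocating the
remaining live pages out of a partly stale zone is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_ZONES	4	/* hypothetical, tiny for illustration */
#define SLOTS_PER_ZONE	4	/* page-sized slots per zone */
#define NO_SLOT		((size_t)-1)

struct zone_map {
	unsigned cur;				/* zone currently being filled */
	unsigned wp[NR_ZONES];			/* per-zone write pointer, in slots */
	unsigned live[NR_ZONES];		/* slots in the zone still in use */
	bool used[NR_ZONES][SLOTS_PER_ZONE];	/* which slots are still in use */
};

/* Mark a previously written slot as unused (its page was rewritten or freed). */
static void zmap_release(struct zone_map *m, size_t slot)
{
	unsigned z = slot / SLOTS_PER_ZONE;
	unsigned i = slot % SLOTS_PER_ZONE;

	assert(z < NR_ZONES && m->used[z][i]);
	m->used[z][i] = false;
	m->live[z]--;
}

/* Append one page at the write pointer. Returns the slot, or NO_SLOT if full. */
static size_t zmap_append(struct zone_map *m)
{
	unsigned i;

	if (m->wp[m->cur] == SLOTS_PER_ZONE) {
		unsigned z;

		/* Current zone is full: move on to any empty zone. */
		for (z = 0; z < NR_ZONES; z++)
			if (m->wp[z] == 0)
				break;
		if (z == NR_ZONES)
			return NO_SLOT;
		m->cur = z;
	}

	i = m->wp[m->cur]++;
	m->used[m->cur][i] = true;
	m->live[m->cur]++;
	return (size_t)m->cur * SLOTS_PER_ZONE + i;
}

/* Scan the map and reset every full zone that has no live slots left. */
static unsigned zmap_reclaim(struct zone_map *m)
{
	unsigned z, freed = 0;

	for (z = 0; z < NR_ZONES; z++) {
		if (m->wp[z] == SLOTS_PER_ZONE && m->live[z] == 0) {
			m->wp[z] = 0;	/* stands in for a zone reset */
			freed++;
		}
	}
	return freed;
}
```

Rewriting a page is then zmap_release() on the old slot followed by
zmap_append(), and zmap_reclaim() can be run whenever convenient.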


* Re: Project idea: Swap to zoned block devices
  2019-10-15 13:48   ` Hannes Reinecke
  2019-10-15 14:50     ` Christopher Lameter
@ 2019-10-15 15:09     ` Matthew Wilcox
  2019-10-15 15:22       ` Hannes Reinecke
  1 sibling, 1 reply; 11+ messages in thread
From: Matthew Wilcox @ 2019-10-15 15:09 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Naohiro Aota, linux-mm, linux-fsdevel, linux-block, Andrew Morton

On Tue, Oct 15, 2019 at 03:48:47PM +0200, Hannes Reinecke wrote:
> On 10/15/19 1:35 PM, Matthew Wilcox wrote:
> > On Tue, Oct 15, 2019 at 01:38:27PM +0900, Naohiro Aota wrote:
> >> A zoned block device consists of a number of zones. Zones are
> >> either conventional and accepting random writes or sequential and
> >> requiring that writes be issued in LBA order from each zone write
> >> pointer position. For the write restriction, zoned block devices are
> >> not suitable for a swap device. Disallow swapon on them.
> > 
> > That's unfortunate.  I wonder what it would take to make the swap code be
> > suitable for zoned devices.  It might even perform better on conventional
> > drives since swapout would be a large linear write.  Swapin would be a
> > fragmented, seeky set of reads, but this would seem like an excellent
> > university project.
> 
> The main problem I'm seeing is the eviction of pages from swap.
> While swapin is easy (as you can do random access on reads), evicting pages
> from cache becomes extremely tricky as you can only delete entire zones.
> So how do we mark pages within zones as being stale?
> Or can we modify the swapin code to always swap in an entire zone and
> discard it immediately?

I thought zones were too big to swap in all at once?  What's a typical
zone size these days?  (the answer looks very different if a zone is 1MB
or if it's 1GB)

Fundamentally an allocated anonymous page has 5 states:

A: In memory, not written to swap (allocated)
B: In memory, dirty, not written to swap (app modifies page)
C: In memory, clean, written to swap (kernel decides to write it)
D: Not in memory, written to swap (kernel decides to reuse the memory)
E: In memory, clean, written to swap (app faults it back in for read)

We currently have a sixth state which is a page that has previously been
written to swap but has been redirtied by the app.  It will be written
back to the allocated location the next time it's targeted for writeout.

That would have to change; since we can't do random writes, pages would
transition from states D or E back to B.  Swapping out a page that has
previously been swapped will now mean appending to the tail of the swap,
not writing in place.

So the swap code will now need to keep track of which pages are still
in use in storage and will need to be relocated once we decide to reuse
the zone.  Not an insurmountable task, but not entirely trivial.

There'd be some other gunk to deal with around handling badblocks.
Those are currently stored in page 1, so adding new ones would be
a rewrite of that block.
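
[Editorial sketch.] The five states and the proposed change (a redirtied page
goes back to B and its old swap copy becomes stale, instead of being rewritten
in place) can be written down as a small transition function. The enum and
function names are hypothetical. Treating state C like E on an application
write, and letting E be reclaimed back to D, are assumptions that go slightly
beyond what the message spells out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum anon_state {
	ST_A,	/* in memory, not written to swap (allocated) */
	ST_B,	/* in memory, dirty, not written to swap */
	ST_C,	/* in memory, clean, written to swap */
	ST_D,	/* not in memory, written to swap */
	ST_E,	/* in memory, clean, written to swap (faulted back in) */
};

enum anon_event {
	EV_APP_WRITE,	/* application modifies the page */
	EV_SWAP_WRITE,	/* kernel appends the page to swap */
	EV_RECLAIM,	/* kernel reuses the memory */
	EV_FAULT_READ,	/* application faults the page in for read */
};

/*
 * Next state on an append-only (zoned) swap device.  *drop_slot is set
 * when the existing on-device copy becomes stale and must be tracked as
 * garbage.  Events that do not apply leave the state unchanged.
 */
static enum anon_state anon_next(enum anon_state s, enum anon_event ev,
				 bool *drop_slot)
{
	assert(drop_slot != NULL);
	*drop_slot = false;

	switch (ev) {
	case EV_APP_WRITE:
		if (s == ST_A)
			return ST_B;
		if (s == ST_C || s == ST_D || s == ST_E) {
			/* No write in place: forget the old copy. */
			*drop_slot = true;
			return ST_B;
		}
		break;
	case EV_SWAP_WRITE:
		if (s == ST_B)
			return ST_C;	/* appended at the tail of swap */
		break;
	case EV_RECLAIM:
		if (s == ST_C || s == ST_E)
			return ST_D;
		break;
	case EV_FAULT_READ:
		if (s == ST_D)
			return ST_E;
		break;
	}
	return s;
}
```

The drop_slot output is where the extra bookkeeping described above would
hook in: each stale copy has to be recorded so the zone can eventually be
reused.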

* Re: Project idea: Swap to zoned block devices
  2019-10-15 15:09     ` Matthew Wilcox
@ 2019-10-15 15:22       ` Hannes Reinecke
  0 siblings, 0 replies; 11+ messages in thread
From: Hannes Reinecke @ 2019-10-15 15:22 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Naohiro Aota, linux-mm, linux-fsdevel, linux-block, Andrew Morton

On 10/15/19 5:09 PM, Matthew Wilcox wrote:
> On Tue, Oct 15, 2019 at 03:48:47PM +0200, Hannes Reinecke wrote:
>> On 10/15/19 1:35 PM, Matthew Wilcox wrote:
>>> On Tue, Oct 15, 2019 at 01:38:27PM +0900, Naohiro Aota wrote:
>>>> A zoned block device consists of a number of zones. Zones are
>>>> either conventional and accepting random writes or sequential and
>>>> requiring that writes be issued in LBA order from each zone write
>>>> pointer position. For the write restriction, zoned block devices are
>>>> not suitable for a swap device. Disallow swapon on them.
>>>
>>> That's unfortunate.  I wonder what it would take to make the swap code be
>>> suitable for zoned devices.  It might even perform better on conventional
>>> drives since swapout would be a large linear write.  Swapin would be a
>>> fragmented, seeky set of reads, but this would seem like an excellent
>>> university project.
>>
>> The main problem I'm seeing is the eviction of pages from swap.
>> While swapin is easy (as you can do random access on reads), evicting pages
>> from cache becomes extremely tricky as you can only delete entire zones.
>> So how do we mark pages within zones as being stale?
>> Or can we modify the swapin code to always swap in an entire zone and
>> discard it immediately?
> 
> I thought zones were too big to swap in all at once?  What's a typical
> zone size these days?  (the answer looks very different if a zone is 1MB
> or if it's 1GB)
> 
Currently things have settled at 256MB, might be increased for ZNS.
But GB would be the upper limit I'd assume.

> Fundamentally an allocated anonymous page has 5 states:
> 
> A: In memory, not written to swap (allocated)
> B: In memory, dirty, not written to swap (app modifies page)
> C: In memory, clean, written to swap (kernel decides to write it)
> D: Not in memory, written to swap (kernel decides to reuse the memory)
> E: In memory, clean, written to swap (app faults it back in for read)
> 
> We currently have a sixth state which is a page that has previously been
> written to swap but has been redirtied by the app.  It will be written
> back to the allocated location the next time it's targeted for writeout.
> 
> That would have to change; since we can't do random writes, pages would
> transition from states D or E back to B.  Swapping out a page that has
> previously been swapped will now mean appending to the tail of the swap,
> not writing in place.
> 
> So the swap code will now need to keep track of which pages are still
> in use in storage and will need to be relocated once we decide to reuse
> the zone.  Not an insurmountable task, but not entirely trivial.
> 
Precisely my worries.
However, clearing stuff is _really_ fast (you just have to reset the
pointer which is kept in NVRAM of the device). Which might help a bit.

> There'd be some other gunk to deal with around handling badblocks.
> Those are currently stored in page 1, so adding new ones would be
> a rewrite of that block.
> 
Bah. Can't we make that optional?
We really only need badblocks when writing to crappy media (or NV-DIMM
:-). Zoned devices _will_ have proper error recovery in place, so the
only time where badblocks might be used is when the device is
essentially dead ;-)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      Teamlead Storage & Networking
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 247165 (AG München), GF: Felix Imendörffer

* Re: [PATCH v2] mm, swap: disallow swapon() on zoned block devices
  2019-10-15  9:06   ` Christoph Hellwig
@ 2019-10-15 20:43     ` Andrew Morton
  0 siblings, 0 replies; 11+ messages in thread
From: Andrew Morton @ 2019-10-15 20:43 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Naohiro Aota, linux-mm, linux-fsdevel, linux-block

On Tue, 15 Oct 2019 02:06:41 -0700 Christoph Hellwig <hch@infradead.org> wrote:

> > +		/*
> > +		 * Zoned block device contains zones that have
> > +		 * sequential write only restriction. For the restriction,
> > +		 * zoned block devices are not suitable for a swap device.
> > +		 * Disallow them here.
> > +		 */
> > +		if (blk_queue_is_zoned(p->bdev->bd_queue))
> 
> Please use up all 80 chars per line.  Otherwise this looks fine:

I redid the text a bit as well.

--- a/mm/swapfile.c~mm-swap-disallow-swapon-on-zoned-block-devices-fix
+++ a/mm/swapfile.c
@@ -2888,10 +2888,9 @@ static int claim_swapfile(struct swap_in
 		if (error < 0)
 			return error;
 		/*
-		 * Zoned block device contains zones that have
-		 * sequential write only restriction. For the restriction,
-		 * zoned block devices are not suitable for a swap device.
-		 * Disallow them here.
+		 * Zoned block devices contain zones that have a sequential
+		 * write only restriction.  Hence zoned block devices are not
+		 * suitable for swapping.  Disallow them here.
 		 */
 		if (blk_queue_is_zoned(p->bdev->bd_queue))
 			return -EINVAL;
_


Thread overview: 11+ messages
2019-10-15  4:38 [PATCH] mm, swap: disallow swapon() on zoned block devices Naohiro Aota
2019-10-15  7:57 ` Christoph Hellwig
2019-10-15  8:58 ` [PATCH v2] " Naohiro Aota
2019-10-15  9:06   ` Christoph Hellwig
2019-10-15 20:43     ` Andrew Morton
2019-10-15 11:35 ` Project idea: Swap to " Matthew Wilcox
2019-10-15 13:27   ` Theodore Y. Ts'o
2019-10-15 13:48   ` Hannes Reinecke
2019-10-15 14:50     ` Christopher Lameter
2019-10-15 15:09     ` Matthew Wilcox
2019-10-15 15:22       ` Hannes Reinecke
