All of lore.kernel.org
 help / color / mirror / Atom feed
* fallocate on XFS for swap
@ 2018-03-09 22:05 Besogonov, Aleksei
  2018-03-09 23:44   ` Darrick J. Wong
  0 siblings, 1 reply; 15+ messages in thread
From: Besogonov, Aleksei @ 2018-03-09 22:05 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm; +Cc: Dave Chinner

Hi!

We’re working at Amazon on making XFS our default root filesystem for the upcoming Amazon Linux 2 (now in prod preview). One of the problems that we’ve encountered is inability to use fallocated files for swap on XFS. This is really important for us, since we’re shipping our current Amazon Linux with hibernation support .

I’ve traced the problem to bmap(), used in generic_swapfile_activate call, which returns 0 for blocks inside holes created by fallocate and Dave Chinner confirmed it in a private email. I’m thinking about ways to fix it, so far I see the following possibilities:

1. Change bmap() to not return zeroes for blocks inside holes. But this is an ABI change and it likely will break some obscure userspace utility somewhere.
2. Change generic_swap_activate to use a more modern interface, by adding fiemap-like operation to address_space_operations with fallback on bmap().
3. Add an XFS-specific implementation of swapfile_activate.

What do the people think about it? I kinda like option 2, since it'll make fallocate() work for any other FS that implements fiemap.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: fallocate on XFS for swap
  2018-03-09 22:05 fallocate on XFS for swap Besogonov, Aleksei
@ 2018-03-09 23:44   ` Darrick J. Wong
  0 siblings, 0 replies; 15+ messages in thread
From: Darrick J. Wong @ 2018-03-09 23:44 UTC (permalink / raw)
  To: Besogonov, Aleksei; +Cc: linux-fsdevel, linux-mm, Dave Chinner, xfs

[you really ought to cc the xfs list]

On Fri, Mar 09, 2018 at 10:05:24PM +0000, Besogonov, Aleksei wrote:
> Hi!
> 
> We’re working at Amazon on making XFS our default root filesystem for
> the upcoming Amazon Linux 2 (now in prod preview). One of the problems
> that we’ve encountered is inability to use fallocated files for swap
> on XFS. This is really important for us, since we’re shipping our
> current Amazon Linux with hibernation support .

<shudder>

> I’ve traced the problem to bmap(), used in generic_swapfile_activate
> call, which returns 0 for blocks inside holes created by fallocate and
> Dave Chinner confirmed it in a private email. I’m thinking about ways
> to fix it, so far I see the following possibilities:
> 
> 1. Change bmap() to not return zeroes for blocks inside holes. But
> this is an ABI change and it likely will break some obscure userspace
> utility somewhere.

bmap is a horrible interface, let's leave it to wither and eventually go
away.

> 2. Change generic_swap_activate to use a more modern interface, by
> adding fiemap-like operation to address_space_operations with fallback
> on bmap().

Probably the best idea, but see fs/iomap.c since we're basically leasing
a chunk of file space to the kernel.  Leasing space to a user that wants
direct access is becoming rather common (rdma, map_sync, etc.)

> 3. Add an XFS-specific implementation of swapfile_activate.

Ugh no.

> What do the people think about it? I kinda like option 2, since it'll
> make fallocate() work for any other FS that implements fiemap.

--D

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: fallocate on XFS for swap
@ 2018-03-09 23:44   ` Darrick J. Wong
  0 siblings, 0 replies; 15+ messages in thread
From: Darrick J. Wong @ 2018-03-09 23:44 UTC (permalink / raw)
  To: Besogonov, Aleksei; +Cc: linux-fsdevel, linux-mm, Dave Chinner, xfs

[you really ought to cc the xfs list]

On Fri, Mar 09, 2018 at 10:05:24PM +0000, Besogonov, Aleksei wrote:
> Hi!
> 
> Wea??re working at Amazon on making XFS our default root filesystem for
> the upcoming Amazon Linux 2 (now in prod preview). One of the problems
> that wea??ve encountered is inability to use fallocated files for swap
> on XFS. This is really important for us, since wea??re shipping our
> current Amazon Linux with hibernation support .

<shudder>

> Ia??ve traced the problem to bmap(), used in generic_swapfile_activate
> call, which returns 0 for blocks inside holes created by fallocate and
> Dave Chinner confirmed it in a private email. Ia??m thinking about ways
> to fix it, so far I see the following possibilities:
> 
> 1. Change bmap() to not return zeroes for blocks inside holes. But
> this is an ABI change and it likely will break some obscure userspace
> utility somewhere.

bmap is a horrible interface, let's leave it to wither and eventually go
away.

> 2. Change generic_swap_activate to use a more modern interface, by
> adding fiemap-like operation to address_space_operations with fallback
> on bmap().

Probably the best idea, but see fs/iomap.c since we're basically leasing
a chunk of file space to the kernel.  Leasing space to a user that wants
direct access is becoming rather common (rdma, map_sync, etc.)

> 3. Add an XFS-specific implementation of swapfile_activate.

Ugh no.

> What do the people think about it? I kinda like option 2, since it'll
> make fallocate() work for any other FS that implements fiemap.

--D

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: fallocate on XFS for swap
  2018-03-09 23:44   ` Darrick J. Wong
@ 2018-03-10  0:58     ` Dave Chinner
  -1 siblings, 0 replies; 15+ messages in thread
From: Dave Chinner @ 2018-03-10  0:58 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Besogonov, Aleksei, linux-fsdevel, linux-mm, xfs

On Fri, Mar 09, 2018 at 03:44:22PM -0800, Darrick J. Wong wrote:
> [you really ought to cc the xfs list]
> 
> On Fri, Mar 09, 2018 at 10:05:24PM +0000, Besogonov, Aleksei wrote:
> > Hi!
> > 
> > We’re working at Amazon on making XFS our default root filesystem for
> > the upcoming Amazon Linux 2 (now in prod preview). One of the problems
> > that we’ve encountered is inability to use fallocated files for swap
> > on XFS. This is really important for us, since we’re shipping our
> > current Amazon Linux with hibernation support .
> 
> <shudder>
> 
> > I’ve traced the problem to bmap(), used in generic_swapfile_activate
> > call, which returns 0 for blocks inside holes created by fallocate and
> > Dave Chinner confirmed it in a private email. I’m thinking about ways
> > to fix it, so far I see the following possibilities:
> > 
> > 1. Change bmap() to not return zeroes for blocks inside holes. But
> > this is an ABI change and it likely will break some obscure userspace
> > utility somewhere.
> 
> bmap is a horrible interface, let's leave it to wither and eventually go
> away.
> 
> > 2. Change generic_swap_activate to use a more modern interface, by
> > adding fiemap-like operation to address_space_operations with fallback
> > on bmap().
> 
> Probably the best idea, but see fs/iomap.c since we're basically leasing
> a chunk of file space to the kernel.  Leasing space to a user that wants
> direct access is becoming rather common (rdma, map_sync, etc.)

thing is, we don't want in-kernel users of fiemap. We've got other
block mapping interfaces that can be used, such as iomap...

> > 3. Add an XFS-specific implementation of swapfile_activate.
> 
> Ugh no.

What we want is an iomap-based re-implementation of
generic_swap_activate(). One of the ways to plumb that in is to
use ->swapfile_activate() like so:

iomap_swapfile_activate()
{
	return iomap_apply(... iomap_swapfile_add_extent, ...)
}

xfs_vm_swapfile_activate()
{
	return iomap_swapfile_activate(xfs_iomap_ops);
}

	.swapfile_activate = xfs_vm_swapfile_activate()

And massage the swapfile_activate callout be friendly to fragmented
files. i.e. change the nfs caller to run a
"add_single_swap_extent()" caller rather than have to do it in the
generic code on return....

IOWs, I think the choices we have are to either re-implement
generic_swapfile_activate() and then be stuck with using get_block
style interfaces forever in XFS, or we use the filesystem specific
callout to implement more advanced generic support using the
filesystem supplied get_block/iomap interfaces for block mapping
like we do for everything else that the VM needs the filesystem to
do....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: fallocate on XFS for swap
@ 2018-03-10  0:58     ` Dave Chinner
  0 siblings, 0 replies; 15+ messages in thread
From: Dave Chinner @ 2018-03-10  0:58 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Besogonov, Aleksei, linux-fsdevel, linux-mm, xfs

On Fri, Mar 09, 2018 at 03:44:22PM -0800, Darrick J. Wong wrote:
> [you really ought to cc the xfs list]
> 
> On Fri, Mar 09, 2018 at 10:05:24PM +0000, Besogonov, Aleksei wrote:
> > Hi!
> > 
> > Wea??re working at Amazon on making XFS our default root filesystem for
> > the upcoming Amazon Linux 2 (now in prod preview). One of the problems
> > that wea??ve encountered is inability to use fallocated files for swap
> > on XFS. This is really important for us, since wea??re shipping our
> > current Amazon Linux with hibernation support .
> 
> <shudder>
> 
> > Ia??ve traced the problem to bmap(), used in generic_swapfile_activate
> > call, which returns 0 for blocks inside holes created by fallocate and
> > Dave Chinner confirmed it in a private email. Ia??m thinking about ways
> > to fix it, so far I see the following possibilities:
> > 
> > 1. Change bmap() to not return zeroes for blocks inside holes. But
> > this is an ABI change and it likely will break some obscure userspace
> > utility somewhere.
> 
> bmap is a horrible interface, let's leave it to wither and eventually go
> away.
> 
> > 2. Change generic_swap_activate to use a more modern interface, by
> > adding fiemap-like operation to address_space_operations with fallback
> > on bmap().
> 
> Probably the best idea, but see fs/iomap.c since we're basically leasing
> a chunk of file space to the kernel.  Leasing space to a user that wants
> direct access is becoming rather common (rdma, map_sync, etc.)

thing is, we don't want in-kernel users of fiemap. We've got other
block mapping interfaces that can be used, such as iomap...

> > 3. Add an XFS-specific implementation of swapfile_activate.
> 
> Ugh no.

What we want is an iomap-based re-implementation of
generic_swap_activate(). One of the ways to plumb that in is to
use ->swapfile_activate() like so:

iomap_swapfile_activate()
{
	return iomap_apply(... iomap_swapfile_add_extent, ...)
}

xfs_vm_swapfile_activate()
{
	return iomap_swapfile_activate(xfs_iomap_ops);
}

	.swapfile_activate = xfs_vm_swapfile_activate()

And massage the swapfile_activate callout be friendly to fragmented
files. i.e. change the nfs caller to run a
"add_single_swap_extent()" caller rather than have to do it in the
generic code on return....

IOWs, I think the choices we have are to either re-implement
generic_swapfile_activate() and then be stuck with using get_block
style interfaces forever in XFS, or we use the filesystem specific
callout to implement more advanced generic support using the
filesystem supplied get_block/iomap interfaces for block mapping
like we do for everything else that the VM needs the filesystem to
do....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: fallocate on XFS for swap
  2018-03-10  0:58     ` Dave Chinner
@ 2018-03-10  1:17       ` Darrick J. Wong
  -1 siblings, 0 replies; 15+ messages in thread
From: Darrick J. Wong @ 2018-03-10  1:17 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Besogonov, Aleksei, linux-fsdevel, linux-mm, xfs

On Sat, Mar 10, 2018 at 11:58:50AM +1100, Dave Chinner wrote:
> On Fri, Mar 09, 2018 at 03:44:22PM -0800, Darrick J. Wong wrote:
> > [you really ought to cc the xfs list]
> > 
> > On Fri, Mar 09, 2018 at 10:05:24PM +0000, Besogonov, Aleksei wrote:
> > > Hi!
> > > 
> > > We’re working at Amazon on making XFS our default root filesystem for
> > > the upcoming Amazon Linux 2 (now in prod preview). One of the problems
> > > that we’ve encountered is inability to use fallocated files for swap
> > > on XFS. This is really important for us, since we’re shipping our
> > > current Amazon Linux with hibernation support .
> > 
> > <shudder>
> > 
> > > I’ve traced the problem to bmap(), used in generic_swapfile_activate
> > > call, which returns 0 for blocks inside holes created by fallocate and
> > > Dave Chinner confirmed it in a private email. I’m thinking about ways
> > > to fix it, so far I see the following possibilities:
> > > 
> > > 1. Change bmap() to not return zeroes for blocks inside holes. But
> > > this is an ABI change and it likely will break some obscure userspace
> > > utility somewhere.
> > 
> > bmap is a horrible interface, let's leave it to wither and eventually go
> > away.
> > 
> > > 2. Change generic_swap_activate to use a more modern interface, by
> > > adding fiemap-like operation to address_space_operations with fallback
> > > on bmap().
> > 
> > Probably the best idea, but see fs/iomap.c since we're basically leasing
> > a chunk of file space to the kernel.  Leasing space to a user that wants
> > direct access is becoming rather common (rdma, map_sync, etc.)
> 
> thing is, we don't want in-kernel users of fiemap. We've got other
> block mapping interfaces that can be used, such as iomap...

Well yes, I was clumsily trying to suggest reimplementing
generic_swap_activate with an iomap backend replacing/augmenting the old
get_blocks thing... :)

> > > 3. Add an XFS-specific implementation of swapfile_activate.
> > 
> > Ugh no.
> 
> What we want is an iomap-based re-implementation of
> generic_swap_activate(). One of the ways to plumb that in is to
> use ->swapfile_activate() like so:

Is this distinct from the ->swap_activate function pointer in
address_operations or a new one?  I think it'd be best to have it be a
separate callback like you suggest:

> iomap_swapfile_activate()
> {
> 	return iomap_apply(... iomap_swapfile_add_extent, ...)
> }
> 
> xfs_vm_swapfile_activate()
> {
> 	return iomap_swapfile_activate(xfs_iomap_ops);
> }
> 
> 	.swapfile_activate = xfs_vm_swapfile_activate()
> 
> And massage the swapfile_activate callout be friendly to fragmented
> files. i.e. change the nfs caller to run a
> "add_single_swap_extent()" caller rather than have to do it in the
> generic code on return....

But ugh, the names are confusing.  ->swapfile_activate, ->swap_activate,
and generic_swapfile_activate.  Not sure what's needed to clean up the
other filesystems to use a single mapping interface, though.

> IOWs, I think the choices we have are to either re-implement
> generic_swapfile_activate() and then be stuck with using get_block
> style interfaces forever in XFS, or we use the filesystem specific
> callout to implement more advanced generic support using the
> filesystem supplied get_block/iomap interfaces for block mapping
> like we do for everything else that the VM needs the filesystem to
> do....

Yes, that's what I was trying to nudge Mr. Besogonov towards, though not
as clearly as you've put it.  Thanks. :)

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: fallocate on XFS for swap
@ 2018-03-10  1:17       ` Darrick J. Wong
  0 siblings, 0 replies; 15+ messages in thread
From: Darrick J. Wong @ 2018-03-10  1:17 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Besogonov, Aleksei, linux-fsdevel, linux-mm, xfs

On Sat, Mar 10, 2018 at 11:58:50AM +1100, Dave Chinner wrote:
> On Fri, Mar 09, 2018 at 03:44:22PM -0800, Darrick J. Wong wrote:
> > [you really ought to cc the xfs list]
> > 
> > On Fri, Mar 09, 2018 at 10:05:24PM +0000, Besogonov, Aleksei wrote:
> > > Hi!
> > > 
> > > Wea??re working at Amazon on making XFS our default root filesystem for
> > > the upcoming Amazon Linux 2 (now in prod preview). One of the problems
> > > that wea??ve encountered is inability to use fallocated files for swap
> > > on XFS. This is really important for us, since wea??re shipping our
> > > current Amazon Linux with hibernation support .
> > 
> > <shudder>
> > 
> > > Ia??ve traced the problem to bmap(), used in generic_swapfile_activate
> > > call, which returns 0 for blocks inside holes created by fallocate and
> > > Dave Chinner confirmed it in a private email. Ia??m thinking about ways
> > > to fix it, so far I see the following possibilities:
> > > 
> > > 1. Change bmap() to not return zeroes for blocks inside holes. But
> > > this is an ABI change and it likely will break some obscure userspace
> > > utility somewhere.
> > 
> > bmap is a horrible interface, let's leave it to wither and eventually go
> > away.
> > 
> > > 2. Change generic_swap_activate to use a more modern interface, by
> > > adding fiemap-like operation to address_space_operations with fallback
> > > on bmap().
> > 
> > Probably the best idea, but see fs/iomap.c since we're basically leasing
> > a chunk of file space to the kernel.  Leasing space to a user that wants
> > direct access is becoming rather common (rdma, map_sync, etc.)
> 
> thing is, we don't want in-kernel users of fiemap. We've got other
> block mapping interfaces that can be used, such as iomap...

Well yes, I was clumsily trying to suggest reimplementing
generic_swap_activate with an iomap backend replacing/augmenting the old
get_blocks thing... :)

> > > 3. Add an XFS-specific implementation of swapfile_activate.
> > 
> > Ugh no.
> 
> What we want is an iomap-based re-implementation of
> generic_swap_activate(). One of the ways to plumb that in is to
> use ->swapfile_activate() like so:

Is this distinct from the ->swap_activate function pointer in
address_operations or a new one?  I think it'd be best to have it be a
separate callback like you suggest:

> iomap_swapfile_activate()
> {
> 	return iomap_apply(... iomap_swapfile_add_extent, ...)
> }
> 
> xfs_vm_swapfile_activate()
> {
> 	return iomap_swapfile_activate(xfs_iomap_ops);
> }
> 
> 	.swapfile_activate = xfs_vm_swapfile_activate()
> 
> And massage the swapfile_activate callout be friendly to fragmented
> files. i.e. change the nfs caller to run a
> "add_single_swap_extent()" caller rather than have to do it in the
> generic code on return....

But ugh, the names are confusing.  ->swapfile_activate, ->swap_activate,
and generic_swapfile_activate.  Not sure what's needed to clean up the
other filesystems to use a single mapping interface, though.

> IOWs, I think the choices we have are to either re-implement
> generic_swapfile_activate() and then be stuck with using get_block
> style interfaces forever in XFS, or we use the filesystem specific
> callout to implement more advanced generic support using the
> filesystem supplied get_block/iomap interfaces for block mapping
> like we do for everything else that the VM needs the filesystem to
> do....

Yes, that's what I was trying to nudge Mr. Besogonov towards, though not
as clearly as you've put it.  Thanks. :)

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: fallocate on XFS for swap
  2018-03-10  1:17       ` Darrick J. Wong
@ 2018-03-10  1:36         ` Dave Chinner
  -1 siblings, 0 replies; 15+ messages in thread
From: Dave Chinner @ 2018-03-10  1:36 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Besogonov, Aleksei, linux-fsdevel, linux-mm, xfs

On Fri, Mar 09, 2018 at 05:17:07PM -0800, Darrick J. Wong wrote:
> On Sat, Mar 10, 2018 at 11:58:50AM +1100, Dave Chinner wrote:
> > On Fri, Mar 09, 2018 at 03:44:22PM -0800, Darrick J. Wong wrote:
> > > [you really ought to cc the xfs list]
> > > 
> > > On Fri, Mar 09, 2018 at 10:05:24PM +0000, Besogonov, Aleksei wrote:
> > > > Hi!
> > > > 
> > > > We’re working at Amazon on making XFS our default root filesystem for
> > > > the upcoming Amazon Linux 2 (now in prod preview). One of the problems
> > > > that we’ve encountered is inability to use fallocated files for swap
> > > > on XFS. This is really important for us, since we’re shipping our
> > > > current Amazon Linux with hibernation support .
> > > 
> > > <shudder>
> > > 
> > > > I’ve traced the problem to bmap(), used in generic_swapfile_activate
> > > > call, which returns 0 for blocks inside holes created by fallocate and
> > > > Dave Chinner confirmed it in a private email. I’m thinking about ways
> > > > to fix it, so far I see the following possibilities:
> > > > 
> > > > 1. Change bmap() to not return zeroes for blocks inside holes. But
> > > > this is an ABI change and it likely will break some obscure userspace
> > > > utility somewhere.
> > > 
> > > bmap is a horrible interface, let's leave it to wither and eventually go
> > > away.
> > > 
> > > > 2. Change generic_swap_activate to use a more modern interface, by
> > > > adding fiemap-like operation to address_space_operations with fallback
> > > > on bmap().
> > > 
> > > Probably the best idea, but see fs/iomap.c since we're basically leasing
> > > a chunk of file space to the kernel.  Leasing space to a user that wants
> > > direct access is becoming rather common (rdma, map_sync, etc.)
> > 
> > thing is, we don't want in-kernel users of fiemap. We've got other
> > block mapping interfaces that can be used, such as iomap...
> 
> Well yes, I was clumsily trying to suggest reimplementing
> generic_swap_activate with an iomap backend replacing/augmenting the old
> get_blocks thing... :)
> 
> > > > 3. Add an XFS-specific implementation of swapfile_activate.
> > > 
> > > Ugh no.
> > 
> > What we want is an iomap-based re-implementation of
> > generic_swap_activate(). One of the ways to plumb that in is to
> > use ->swapfile_activate() like so:
> 
> Is this distinct from the ->swap_activate function pointer in
> address_operations or a new one?  I think it'd be best to have it be a
> separate callback like you suggest:

No, we don't need to create a new one - the existing one is used by
a single caller and we can easily move all the functionality it
requires inside the NFS specific implementation - it's just mapping
the entire range as a single extent, but the callout is needed to
mark the sockets backing the file as in the memalloc path...

> > iomap_swapfile_activate()
> > {
> > 	return iomap_apply(... iomap_swapfile_add_extent, ...)
> > }
> > 
> > xfs_vm_swapfile_activate()
> > {
> > 	return iomap_swapfile_activate(xfs_iomap_ops);
> > }
> > 
> > 	.swapfile_activate = xfs_vm_swapfile_activate()
> > 
> > And massage the swapfile_activate callout be friendly to fragmented
> > files. i.e. change the nfs caller to run a
> > "add_single_swap_extent()" caller rather than have to do it in the
> > generic code on return....
> 
> But ugh, the names are confusing.  ->swapfile_activate, ->swap_activate,
> and generic_swapfile_activate.  Not sure what's needed to clean up the
> other filesystems to use a single mapping interface, though.

If they don't implement the callout, they use the
generic_swapfile_activate code that currently exists. Maybe with a
name change, but this way we don't have to touch them....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: fallocate on XFS for swap
@ 2018-03-10  1:36         ` Dave Chinner
  0 siblings, 0 replies; 15+ messages in thread
From: Dave Chinner @ 2018-03-10  1:36 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Besogonov, Aleksei, linux-fsdevel, linux-mm, xfs

On Fri, Mar 09, 2018 at 05:17:07PM -0800, Darrick J. Wong wrote:
> On Sat, Mar 10, 2018 at 11:58:50AM +1100, Dave Chinner wrote:
> > On Fri, Mar 09, 2018 at 03:44:22PM -0800, Darrick J. Wong wrote:
> > > [you really ought to cc the xfs list]
> > > 
> > > On Fri, Mar 09, 2018 at 10:05:24PM +0000, Besogonov, Aleksei wrote:
> > > > Hi!
> > > > 
> > > > Wea??re working at Amazon on making XFS our default root filesystem for
> > > > the upcoming Amazon Linux 2 (now in prod preview). One of the problems
> > > > that wea??ve encountered is inability to use fallocated files for swap
> > > > on XFS. This is really important for us, since wea??re shipping our
> > > > current Amazon Linux with hibernation support .
> > > 
> > > <shudder>
> > > 
> > > > Ia??ve traced the problem to bmap(), used in generic_swapfile_activate
> > > > call, which returns 0 for blocks inside holes created by fallocate and
> > > > Dave Chinner confirmed it in a private email. Ia??m thinking about ways
> > > > to fix it, so far I see the following possibilities:
> > > > 
> > > > 1. Change bmap() to not return zeroes for blocks inside holes. But
> > > > this is an ABI change and it likely will break some obscure userspace
> > > > utility somewhere.
> > > 
> > > bmap is a horrible interface, let's leave it to wither and eventually go
> > > away.
> > > 
> > > > 2. Change generic_swap_activate to use a more modern interface, by
> > > > adding fiemap-like operation to address_space_operations with fallback
> > > > on bmap().
> > > 
> > > Probably the best idea, but see fs/iomap.c since we're basically leasing
> > > a chunk of file space to the kernel.  Leasing space to a user that wants
> > > direct access is becoming rather common (rdma, map_sync, etc.)
> > 
> > thing is, we don't want in-kernel users of fiemap. We've got other
> > block mapping interfaces that can be used, such as iomap...
> 
> Well yes, I was clumsily trying to suggest reimplementing
> generic_swap_activate with an iomap backend replacing/augmenting the old
> get_blocks thing... :)
> 
> > > > 3. Add an XFS-specific implementation of swapfile_activate.
> > > 
> > > Ugh no.
> > 
> > What we want is an iomap-based re-implementation of
> > generic_swap_activate(). One of the ways to plumb that in is to
> > use ->swapfile_activate() like so:
> 
> Is this distinct from the ->swap_activate function pointer in
> address_operations or a new one?  I think it'd be best to have it be a
> separate callback like you suggest:

No, we don't need to create a new one - the existing one is used by
a single caller and we can easily move all the functionality it
requires inside the NFS specific implementation - it's just mapping
the entire range as a single extent, but the callout is needed to
mark the sockets backing the file as in the memalloc path...

> > iomap_swapfile_activate()
> > {
> > 	return iomap_apply(... iomap_swapfile_add_extent, ...)
> > }
> > 
> > xfs_vm_swapfile_activate()
> > {
> > 	return iomap_swapfile_activate(xfs_iomap_ops);
> > }
> > 
> > 	.swapfile_activate = xfs_vm_swapfile_activate()
> > 
> > And massage the swapfile_activate callout be friendly to fragmented
> > files. i.e. change the nfs caller to run a
> > "add_single_swap_extent()" caller rather than have to do it in the
> > generic code on return....
> 
> But ugh, the names are confusing.  ->swapfile_activate, ->swap_activate,
> and generic_swapfile_activate.  Not sure what's needed to clean up the
> other filesystems to use a single mapping interface, though.

If they don't implement the callout, they use the
generic_swapfile_activate code that currently exists. Maybe with a
name change, but this way we don't have to touch them....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: fallocate on XFS for swap
  2018-03-10  0:58     ` Dave Chinner
  (?)
  (?)
@ 2018-03-10  9:38     ` Christoph Hellwig
  2018-03-12 21:46       ` Dave Chinner
  -1 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2018-03-10  9:38 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Darrick J. Wong, Besogonov, Aleksei, linux-fsdevel, linux-mm, xfs

On Sat, Mar 10, 2018 at 11:58:50AM +1100, Dave Chinner wrote:
> > Probably the best idea, but see fs/iomap.c since we're basically leasing
> > a chunk of file space to the kernel.  Leasing space to a user that wants
> > direct access is becoming rather common (rdma, map_sync, etc.)
> 
> thing is, we don't want in-kernel users of fiemap. We've got other
> block mapping interfaces that can be used, such as iomap...

Agreed.  fiemap is in many ways just as bad as bmap - it is an
information at a given point in time interface.  It is more detailed
than bmap and allows better error reporting, but it still is
fundamentally the wrong thing to use for any sort of the I/O path.

> 
> > > 3. Add an XFS-specific implementation of swapfile_activate.
> > 
> > Ugh no.
> 
> What we want is an iomap-based re-implementation of
> generic_swap_activate(). One of the ways to plumb that in is to
> use ->swapfile_activate() like so:

Hmm.  Fundamentally swap is the same problem as the pNFS block layout
or get_user_pages on DAX mappings - we want to get a 'lease' on the
current block mapping, and make sure it stays that way as the external
user (the swap code in this case) uses it.  The twist for the swap code
is mostly that it never wants to break the least but instead disallow
any external operation, but that's not really such a big difference.

So maybe we want a layout based swap code instead of reinventing it,
with the slight twist to the layout break code to never try a lease
break and just return an error for the IS_SWAPFILE case.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: fallocate on XFS for swap
  2018-03-10  0:58     ` Dave Chinner
                       ` (2 preceding siblings ...)
  (?)
@ 2018-03-12 18:40     ` Besogonov, Aleksei
  -1 siblings, 0 replies; 15+ messages in thread
From: Besogonov, Aleksei @ 2018-03-12 18:40 UTC (permalink / raw)
  To: Dave Chinner, Darrick J. Wong; +Cc: linux-fsdevel, linux-mm, xfs

On 3/9/18, 16:58, "Dave Chinner" <david@fromorbit.com> wrote:
    On Fri, Mar 09, 2018 at 03:44:22PM -0800, Darrick J. Wong wrote:
    > [you really ought to cc the xfs list]
    > 
    > On Fri, Mar 09, 2018 at 10:05:24PM +0000, Besogonov, Aleksei wrote:
    > > Hi!
    > > 
    > > We’re working at Amazon on making XFS our default root filesystem for
    > > the upcoming Amazon Linux 2 (now in prod preview). One of the problems
    > > that we’ve encountered is inability to use fallocated files for swap
    > > on XFS. This is really important for us, since we’re shipping our
    > > current Amazon Linux with hibernation support .
    > 
    > <shudder>
    > 
    > > I’ve traced the problem to bmap(), used in generic_swapfile_activate
    > > call, which returns 0 for blocks inside holes created by fallocate and
    > > Dave Chinner confirmed it in a private email. I’m thinking about ways
    > > to fix it, so far I see the following possibilities:
    > > 
    > > 1. Change bmap() to not return zeroes for blocks inside holes. But
    > > this is an ABI change and it likely will break some obscure userspace
    > > utility somewhere.
    > 
    > bmap is a horrible interface, let's leave it to wither and eventually go
    > away.
    > 
    > > 2. Change generic_swap_activate to use a more modern interface, by
    > > adding fiemap-like operation to address_space_operations with fallback
    > > on bmap().
    > 
    > Probably the best idea, but see fs/iomap.c since we're basically leasing
    > a chunk of file space to the kernel.  Leasing space to a user that wants
    > direct access is becoming rather common (rdma, map_sync, etc.)
    
    thing is, we don't want in-kernel users of fiemap. We've got other
    block mapping interfaces that can be used, such as iomap...
    
    > > 3. Add an XFS-specific implementation of swapfile_activate.
    > 
    > Ugh no.
    
    What we want is an iomap-based re-implementation of
    generic_swap_activate(). One of the ways to plumb that in is to
    use ->swapfile_activate() like so:
    
    iomap_swapfile_activate()
    {
    	return iomap_apply(... iomap_swapfile_add_extent, ...)
    }
    
    xfs_vm_swapfile_activate()
    {
    	return iomap_swapfile_activate(xfs_iomap_ops);
    }
    
    	.swapfile_activate = xfs_vm_swapfile_activate()
    
    And massage the swapfile_activate callout be friendly to fragmented
    files. i.e. change the nfs caller to run a
    "add_single_swap_extent()" caller rather than have to do it in the
    generic code on return....

This sounds reasonable, I'll try to implement it this week or so for XFS.
No guarantees about NFS, though. 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: fallocate on XFS for swap
  2018-03-10  9:38     ` Christoph Hellwig
@ 2018-03-12 21:46       ` Dave Chinner
  2018-03-13  7:14         ` Christoph Hellwig
  0 siblings, 1 reply; 15+ messages in thread
From: Dave Chinner @ 2018-03-12 21:46 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Darrick J. Wong, Besogonov, Aleksei, linux-fsdevel, linux-mm, xfs

On Sat, Mar 10, 2018 at 01:38:44AM -0800, Christoph Hellwig wrote:
> On Sat, Mar 10, 2018 at 11:58:50AM +1100, Dave Chinner wrote:
> > > > 3. Add an XFS-specific implementation of swapfile_activate.
> > > 
> > > Ugh no.
> > 
> > What we want is an iomap-based re-implementation of
> > generic_swap_activate(). One of the ways to plumb that in is to
> > use ->swapfile_activate() like so:
> 
> Hmm.  Fundamentally swap is the same problem as the pNFS block layout
> or get_user_pages on DAX mappings - we want to get a 'lease' on the
> current block mapping, and make sure it stays that way as the external
> user (the swap code in this case) uses it.  The twist for the swap code
> is mostly that it never wants to break the least but instead disallow
> any external operation, but that's not really such a big difference.

True.

> So maybe we want a layout based swap code instead of reinventing it,
> with the slight twist to the layout break code to never try a lease
> break and just return an error for the IS_SWAPFILE case.

Hmmm - won't that change user visible behaviour on swapfiles? Not
that it would be a bad thing to reject read/write from root on swap
files, but it would make XFS different to everything else.

Speaking of which - we probably need to spend some time at LSFMM in
the fs track talking about the iomap infrastructure and long term
plans to migrate the major filesystems to it....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: fallocate on XFS for swap
  2018-03-10  1:36         ` Dave Chinner
  (?)
@ 2018-03-12 22:01         ` Besogonov, Aleksei
  2018-03-13  1:31           ` Dave Chinner
  -1 siblings, 1 reply; 15+ messages in thread
From: Besogonov, Aleksei @ 2018-03-12 22:01 UTC (permalink / raw)
  To: Dave Chinner, Darrick J. Wong; +Cc: linux-fsdevel, linux-mm, xfs

[snip unrelated]

So I'm looking at the XFS code and it appears that the iomap is limited to 1024*PAGE_SIZE blocks at a time, which is too small for most of swap use-cases. I can of course just loop through the file in 4Mb increments and, just like the bmap() code does today. But this just doesn't look right and it's not atomic. And it looks like iomap in ext2 doesn't have this limitation. 

The stated rationale for the XFS limit is:
>/*
> * We cap the maximum length we map here to MAX_WRITEBACK_PAGES pages
> * to keep the chunks of work done where somewhat symmetric with the
> * work writeback does. This is a completely arbitrary number pulled
> * out of thin air as a best guess for initial testing.
> *
> * Note that the values needs to be less than 32-bits wide until
> * the lower level functions are updated.
> */

So can it be lifted today?


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: fallocate on XFS for swap
  2018-03-12 22:01         ` Besogonov, Aleksei
@ 2018-03-13  1:31           ` Dave Chinner
  0 siblings, 0 replies; 15+ messages in thread
From: Dave Chinner @ 2018-03-13  1:31 UTC (permalink / raw)
  To: Besogonov, Aleksei; +Cc: Darrick J. Wong, linux-fsdevel, linux-mm, xfs

[Aleski, can you word wrap your email text at 72 columns? ]

On Mon, Mar 12, 2018 at 10:01:54PM +0000, Besogonov, Aleksei wrote:
> [snip unrelated]
> 
> So I'm looking at the XFS code and it appears that the iomap is
> limited to 1024*PAGE_SIZE blocks at a time,

Take a closer look - that code is not used for reading file extents
and returning them to the caller.

> which is too small for
> most of swap use-cases. I can of course just loop through the file
> in 4Mb increments and, just like the bmap() code does today. But
> this just doesn't look right and it's not atomic. And it looks
> like iomap in ext2 doesn't have this limitation. 
> 
> The stated rationale for the XFS limit is:
> >/*
> > * We cap the maximum length we map here to MAX_WRITEBACK_PAGES pages
> > * to keep the chunks of work done where somewhat symmetric with the
> > * work writeback does. This is a completely arbitrary number pulled
> > * out of thin air as a best guess for initial testing.
> > *
> > * Note that the values needs to be less than 32-bits wide until
> > * the lower level functions are updated.
> > */

Yeah, that's in the IOMAP_WRITE path used for block allocation. swap
file mapping should not be asking for IOMAP_WRITE mappings that
trigger extent allocation, so you should never hit this case.

You should probably be using the IOMAP_REPORT path (i.e. basically
very similar code to iomap_fiemap/iomap_fiemap_apply and rejecting
any file that returns an iomap that is not IOMAP_MAPPED or
IOMAP_UNWRITTEN. Also, you want to reject any file that returns
IOMAP_F_SHARED in iomap->flags, too, because swapfiles can't do COW
to break extent sharing on writes.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: fallocate on XFS for swap
  2018-03-12 21:46       ` Dave Chinner
@ 2018-03-13  7:14         ` Christoph Hellwig
  0 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2018-03-13  7:14 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, Darrick J. Wong, Besogonov, Aleksei,
	linux-fsdevel, linux-mm, xfs

On Tue, Mar 13, 2018 at 08:46:26AM +1100, Dave Chinner wrote:
> > So maybe we want a layout based swap code instead of reinventing it,
> > with the slight twist to the layout break code to never try a lease
> > break and just return an error for the IS_SWAPFILE case.
> 
> Hmmm - won't that change user visible behaviour on swapfiles? Not
> that it would be a bad thing to reject read/write from root on swap
> files, but it would make XFS different to everything else.

We already can't writew to active swap files, thank god:

root@testvm:~# dd if=/dev/zero of=swapfile bs=1M count=64
64+0 records in
64+0 records out
67108864 bytes (67 MB, 64 MiB) copied, 0.0458446 s, 1.5 GB/s
mkswap swapfile
mkswap: swapfile: insecure permissions 0644, 0600 suggested.
Setting up swapspace version 1, size = 64 MiB (67104768 bytes)
no label, UUID=bb42b883-f224-4627-8580-c1ba9f4569ab
root@testvm:~# swapon swapfile
swapon: /root/swapfile: insecure permissions 0644, 0600 suggested.
[   54.165439] Adding 65532k swap on /root/swapfile.  Priority:-2 extents:1 across:65532k
root@testvm:~# dd if=/dev/zero of=swapfile bs=1M count=64
dd: failed to open 'swapfile': Text file busy

> 
> Speaking of which - we probably need to spend some time at LSFMM in
> the fs track talking about the iomap infrastructure and long term
> plans to migrate the major filesystems to it....

I won't be there, as I'll be busy working the local election ballot.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2018-03-13  7:14 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-09 22:05 fallocate on XFS for swap Besogonov, Aleksei
2018-03-09 23:44 ` Darrick J. Wong
2018-03-09 23:44   ` Darrick J. Wong
2018-03-10  0:58   ` Dave Chinner
2018-03-10  0:58     ` Dave Chinner
2018-03-10  1:17     ` Darrick J. Wong
2018-03-10  1:17       ` Darrick J. Wong
2018-03-10  1:36       ` Dave Chinner
2018-03-10  1:36         ` Dave Chinner
2018-03-12 22:01         ` Besogonov, Aleksei
2018-03-13  1:31           ` Dave Chinner
2018-03-10  9:38     ` Christoph Hellwig
2018-03-12 21:46       ` Dave Chinner
2018-03-13  7:14         ` Christoph Hellwig
2018-03-12 18:40     ` Besogonov, Aleksei

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.