All of lore.kernel.org
 help / color / mirror / Atom feed
* Swapfile on btrfs
@ 2013-02-26 21:45 Alex Elsayed
  2013-02-27 17:52 ` David Sterba
  0 siblings, 1 reply; 4+ messages in thread
From: Alex Elsayed @ 2013-02-26 21:45 UTC (permalink / raw)
  To: linux-btrfs

Hey, I was looking at the wiki 'unclaimed projects' list and I thought I'd 
take a look at what might be needed to use the swap-over-n{fs,bd} 
functionality in order to implement swapfile support. It *looks* like it's 
surprisingly uncomplicated - to the point where I suspect I'm missing 
something, and would like some sanity checking.

My primary references for this are the patch[1] implementing swap-over-NFS 
and HEAD of cmason's for-linus branch.

I'm going to go over the sections of the commitdiff from top to bottom (and 
hunk-by-hunk where I feel it necessary), and I'd like it if people could 
correct any mistakes I make.

Kconfig:
Simple enough that I'm not worried.

direct.c:
The first hunk seems to serve to factor out the difference between 
rw={READ,WRITE} and the KERNEL_ equivalents into a 'uio' parameter, so I'll 
view it as being mostly a code organization thing.

The second hunk just adds that uio param to read_schedule_segment(), which 
it is given secondhand via direct_read().

The third hunk seems to be where the actual reading occurs in 
read_schedule_segment(), and the differences between uio and !uio seem to 
boil down to that KERNEL_READ is only allowed to operate one page at a time 
under penalty of BUG, and that it uses get_kernel_page instead of 
get_user_pages. Since btrfs calls into __blockdev_direct_IO(), I suspect 
that fs/direct-io.c would (in this case) be the right place to implement 
both the {READ,WRITE} vs KERNEL_* logic and those checks in this case. Also, 
check_direct_IO() in fs/btrfs/inode.c may want to treat KERNEL_WRITE 
similarly to WRITE - I'm not completely sure of that, however.

Hunks 4-8 are just dealing with passing the additional uio parameter around.

Hunk 9 is the write-side equivalent of hunk 3, and seems to correspond 
pretty exactly. It's looking like fs/direct_io.c is definitely the place to 
implement this.

Hunks 10-17 are more uio parameter passing.

file.c:
Pretty simple - as far as I can tell, the activate() sets the span to what 
the swap header expects, the xs_swapper call basically sets the appropriate 
kmalloc flags for the network stuff (irrelevant here), and the deactivate() 
just resets those flags. All the smarts seem to be in the DIO code.

The rest:
The remainder of the patch seems to just be a.) handing that uio parameter 
around and b.) stuff specific to swapping over the network.

Did I miss anything important? If not, I may well decide to tackle this as 
my first non-trivial kernel contribution. Does anyone have any suggestions 
on how I should stresstest this? I figure a good starting point would be 
running xfstests under memory pressure with such a swapfile.

[1] 
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;h=a564b8f


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Swapfile on btrfs
  2013-02-26 21:45 Swapfile on btrfs Alex Elsayed
@ 2013-02-27 17:52 ` David Sterba
  2013-02-27 20:52   ` Alex Elsayed
  0 siblings, 1 reply; 4+ messages in thread
From: David Sterba @ 2013-02-27 17:52 UTC (permalink / raw)
  To: Alex Elsayed; +Cc: linux-btrfs

On Tue, Feb 26, 2013 at 01:45:12PM -0800, Alex Elsayed wrote:
> Hey, I was looking at the wiki 'unclaimed projects' list and I thought I'd 
> take a look at what might be needed to use the swap-over-n{fs,bd} 
> functionality in order to implement swapfile support. It *looks* like it's 
> surprisingly uncomplicated - to the point where I suspect I'm missing 
> something, and would like some sanity checking.
> 
> My primary references for this are the patch[1] implementing swap-over-NFS 
> and HEAD of cmason's for-linus branch.

The patch [1] is not enough to cook swap support for btrfs, have look at
the rest of the series. There's some explanation how the interfaces look
like. The read/write mechanism is done via direct_IO, but the filesystem
has to add the swap_activate and swap_deactivate operations and take
care of preallocating the swap file extents.

Details:

Analogy of these patches has to be done in btrfs as well:

d56b4ddf7781e	nfs: teach the NFS client how to treat PG_swapcache pages
29418aa4bd487	nfs: disable data cache revalidation for swapfiles

The patch you've analyzed in detail adds some nfs-specific bits, not the
core work (and thus I don't disagree that it looks surprisingly
uncomplicated :)

The generic swap support is demonstrated in

a509bc1a9e48	mm: swap: implement generic handler for swap_activate

and in case of btrfs it has to avoid using bmap.

Thanks for tackling this project, I think it's not an easy one. There
are implications and API requirements that need to be taken into account
wrt snapshots and stable block ranges during the file lifetime.  I have
a few commetns about that from Mel and can share them if you still want
to proceed with this project.

david

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Swapfile on btrfs
  2013-02-27 17:52 ` David Sterba
@ 2013-02-27 20:52   ` Alex Elsayed
  2013-03-15 15:22     ` David Sterba
  0 siblings, 1 reply; 4+ messages in thread
From: Alex Elsayed @ 2013-02-27 20:52 UTC (permalink / raw)
  To: linux-btrfs

David Sterba wrote:

> On Tue, Feb 26, 2013 at 01:45:12PM -0800, Alex Elsayed wrote:
>> Hey, I was looking at the wiki 'unclaimed projects' list and I thought
>> I'd take a look at what might be needed to use the swap-over-n{fs,bd}
>> functionality in order to implement swapfile support. It *looks* like
>> it's surprisingly uncomplicated - to the point where I suspect I'm
>> missing something, and would like some sanity checking.
>> 
>> My primary references for this are the patch[1] implementing
>> swap-over-NFS and HEAD of cmason's for-linus branch.
> 
> The patch [1] is not enough to cook swap support for btrfs, have look at
> the rest of the series. There's some explanation how the interfaces look
> like. The read/write mechanism is done via direct_IO, but the filesystem
> has to add the swap_activate and swap_deactivate operations and take
> care of preallocating the swap file extents.
> 
> Details:
> 
> Analogy of these patches has to be done in btrfs as well:
> 
> d56b4ddf7781e	nfs: teach the NFS client how to treat PG_swapcache 
pages
> 29418aa4bd487	nfs: disable data cache revalidation for swapfiles
> 
> The patch you've analyzed in detail adds some nfs-specific bits, not the
> core work (and thus I don't disagree that it looks surprisingly
> uncomplicated :)
> 
> The generic swap support is demonstrated in
> 
> a509bc1a9e48	mm: swap: implement generic handler for 
swap_activate
> 
> and in case of btrfs it has to avoid using bmap.
> 
> Thanks for tackling this project, I think it's not an easy one. There
> are implications and API requirements that need to be taken into account
> wrt snapshots and stable block ranges during the file lifetime.  I have
> a few commetns about that from Mel and can share them if you still want
> to proceed with this project.
> 
> david

Thanks for the information! I probably will keep going on this, since at 
this point my disk basically boils down to an EFI system partition and a 
LUKS volume containing swap and btrfs over LVM. Being able to dwap the swap 
partition would let me drop LVM, and simplify my setup a little (along with 
letting me omit the LVM module from dracut).

It will, however, likely take a while - I'm not particularly skilled with C, 
and the kernel is still slightly magic in places, but I'm stubborn and pick 
up on patterns reasonably well so hopefully it'll get done eventually.

Also, in talking on IRC with Hugo Mills he helped me realize another issue - 
specifically, that a.) DIO to compressed data falls back to buffered IO, and 
b.) (at least according to the wiki) the compress_force mount option takes 
precedence over the NOCOMPRESS file flag.

Thinking about it further, I realized that individual extents could be 
compressed, at which point this could get rather onerous on swapon since we 
might well need to check each and every extent to make sure it isn't 
compressed.

Any suggestions?



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Swapfile on btrfs
  2013-02-27 20:52   ` Alex Elsayed
@ 2013-03-15 15:22     ` David Sterba
  0 siblings, 0 replies; 4+ messages in thread
From: David Sterba @ 2013-03-15 15:22 UTC (permalink / raw)
  To: Alex Elsayed; +Cc: linux-btrfs

On Wed, Feb 27, 2013 at 12:52:50PM -0800, Alex Elsayed wrote:
> Also, in talking on IRC with Hugo Mills he helped me realize another issue - 
> specifically, that a.) DIO to compressed data falls back to buffered IO, and 
> b.) (at least according to the wiki) the compress_force mount option takes 
> precedence over the NOCOMPRESS file flag.
> 
> Thinking about it further, I realized that individual extents could be 
> compressed, at which point this could get rather onerous on swapon since we 
> might well need to check each and every extent to make sure it isn't 
> compressed.

There are a few limitations I can think of:

* the file has to be nocow (stable block ranges)
* no compression for the swap file (due to nocow)

* snapshot of the swapfile would work probably the same way as in any
other nocow file, but may imply unexpected or unwanted locking and
interfering with swap.

* relocation must not move the file extents, which makes balance
still possible, but it may not be able to satisfy the filter
requirements (the 1G chunks are basically pinned until the swap file
is active)
  - this affects 'device remove', 'fi resize'
  - depends on the raid profile used for the file, raid 0 is likely to
    pin all devices

david

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2013-03-15 15:22 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-02-26 21:45 Swapfile on btrfs Alex Elsayed
2013-02-27 17:52 ` David Sterba
2013-02-27 20:52   ` Alex Elsayed
2013-03-15 15:22     ` David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.