All of lore.kernel.org
 help / color / mirror / Atom feed
From: Omar Sandoval <osandov@osandov.com>
To: Nikolay Borisov <nborisov@suse.com>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com,
	linux-fsdevel@vger.kernel.org, Tejun Heo <tj@kernel.org>
Subject: Re: [RFC PATCH v4 6/6] Btrfs: support swap files
Date: Fri, 25 May 2018 09:16:47 -0700	[thread overview]
Message-ID: <20180525161647.GC29193@vader> (raw)
In-Reply-To: <bc13c9f8-34b0-3566-5d4d-a2cb5040e0fa@suse.com>

On Fri, May 25, 2018 at 01:07:00PM +0300, Nikolay Borisov wrote:
> 
> 
> On 25.05.2018 00:41, Omar Sandoval wrote:
> > From: Omar Sandoval <osandov@fb.com>
> > 
> > Implement the swap file a_ops on Btrfs. Activation needs to make sure
> > that the file can be used as a swap file, which currently means it must
> > be fully allocated as nocow with no compression on one device. It also
> > sets up the swap extents directly with add_swap_extent(), so export it.
> > 
> > Signed-off-by: Omar Sandoval <osandov@fb.com>
> 
> What testing (apart form the xfstest patches you sent) have this code
> seen?

Light testing with my swapme script [1] and btrfsck to make sure I
didn't swap to the wrong place. I was meaning to put this through
something more intensive like a kernel build, thanks for the reminder.
As opposed to the previous approach, the swapin/swapout paths are
simple, core code, so the edge cases I'm worried about are really in
activate/deactivate and other ioctls breaking things.

1: https://github.com/osandov/osandov-linux/blob/master/scripts/swapme.c

> Have you run it with lockdep enabled (I'm asking because when I
> picked up your v3 there was quite a bunch of deadlock warnings). Also
> see some inline questions below.

Yup, I've been running it with lockdep, no warnings. The nice part of
this approach is that there's no new locking involved, just whatever the
swap code does itself.

> > ---
> >  fs/btrfs/inode.c | 220 +++++++++++++++++++++++++++++++++++++++++++++++
> >  mm/swapfile.c    |   1 +
> >  2 files changed, 221 insertions(+)
> > 
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index 9228cb866115..6cca8529e307 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -10526,6 +10526,224 @@ void btrfs_set_range_writeback(void *private_data, u64 start, u64 end)
> >  	}
> >  }
> >  
> <snip>
> > +static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
> > +			       sector_t *span)
> > +{
> > +	struct inode *inode = file_inode(file);
> > +	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
> > +	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
> > +	struct extent_state *cached_state = NULL;
> > +	struct extent_map *em = NULL;
> > +	struct btrfs_device *device = NULL;
> > +	struct btrfs_swap_info bsi = {
> > +		.lowest_ppage = (sector_t)-1ULL,
> > +	};
> > +	int ret = 0;
> > +	u64 isize = inode->i_size;
> > +	u64 start;
> > +
> > +	/*
> > +	 * The inode is locked, so these flags won't change after we check them.
> > +	 */
> > +	if (BTRFS_I(inode)->flags & BTRFS_INODE_COMPRESS) {
> > +		btrfs_err(fs_info, "swapfile is compressed");
> > +		return -EINVAL;
> > +	}
> > +	if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW)) {
> > +		btrfs_err(fs_info, "swapfile is copy-on-write");
> > +		return -EINVAL;
> > +	}
> > +
> > +	/*
> > +	 * Balance or device remove/replace/resize can move stuff around from
> > +	 * under us. The EXCL_OP flag makes sure they aren't running/won't run
> > +	 * concurrently while we are mapping the swap extents, and the fs_info
> > +	 * nr_swapfiles counter prevents them from running while the swap file
> > +	 * is active and moving the extents. Note that this also prevents a
> > +	 * concurrent device add which isn't actually necessary, but it's not
> > +	 * really worth the trouble to allow it.
> > +	 */
> > +	if (test_and_set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags))
> > +		return -EBUSY;
> > +	atomic_inc(&fs_info->nr_swapfiles);
> > +	/*
> > +	 * Snapshots can create extents which require COW even if NODATACOW is
> > +	 * set. We use this counter to prevent snapshots. We must increment it
> > +	 * before walking the extents because we don't want a concurrent
> > +	 * snapshot to run after we've already checked the extents.
> > +	 */
> > +	atomic_inc(&BTRFS_I(inode)->root->nr_swapfiles);
> > +
> > +	lock_extent_bits(io_tree, 0, isize - 1, &cached_state);
> > +	start = 0;
> > +	while (start < isize) {
> > +		u64 end, logical_block_start, physical_block_start;
> > +		u64 len = isize - start;
> > +
> > +		em = btrfs_get_extent(BTRFS_I(inode), NULL, 0, start, len, 0);
> > +		if (IS_ERR(em)) {
> > +			ret = PTR_ERR(em);
> > +			goto out;
> > +		}
> > +		end = extent_map_end(em);
> > +
> > +		if (em->block_start == EXTENT_MAP_HOLE) {
> > +			btrfs_err(fs_info, "swapfile has holes");
> > +			ret = -EINVAL;
> > +			goto out;
> > +		}
> > +		if (em->block_start == EXTENT_MAP_INLINE) {
> > +			/*
> > +			 * It's unlikely we'll ever actually find ourselves
> > +			 * here, as a file small enough to fit inline won't be
> > +			 * big enough to store more than the swap header, but in
> > +			 * case something changes in the future, let's catch it
> > +			 * here rather than later.
> > +			 */
> > +			btrfs_err(fs_info, "swapfile is inline");
> > +			ret = -EINVAL;
> > +			goto out;
> > +		}
> > +		if (test_bit(EXTENT_FLAG_COMPRESSED, &em->flags)) {
> > +			btrfs_err(fs_info, "swapfile is compressed");
> > +			ret = -EINVAL;
> > +			goto out;
> > +		}
> 
> Isn't this implied by the earlier BTRFS_I(inode)->flags &
> BTRFS_INODE_COMPRESS check ?

Nope, if you look at btrfs_ioctl_setflags(), you can clear the inode
flag even if there are compressed extents.

> > +
> > +		logical_block_start = em->block_start + (start - em->start);
> 
> So which offset are you calculating by doing the start - em->start
> calculation? start is really the ending logical address of the previous
> extent but isn't this always < em->start of the next extent ?

[em->start, em->start + em->len) will always contain start.
start - em->start is the offset of start from the beginning of the
extent we just found. It should always be zero since we're walking
sequentially and we have the extents locked, but I wanted to be
defensive here.

> > +		len = min(len, em->len - (start - em->start));
> > +		free_extent_map(em);
> > +		em = NULL;
> > +
> > +		ret = can_nocow_extent(inode, start, &len, NULL, NULL, NULL);
> > +		if (ret < 0) {
> > +			goto out;
> > +		} else if (ret) {
> > +			ret = 0;
> > +		} else {
> > +			btrfs_err(fs_info, "swapfile is copy-on-write");
> > +			ret = -EINVAL;
> > +			goto out;
> > +		}
> > +
> > +		em = btrfs_get_chunk_map(fs_info, logical_block_start, len);
> > +		if (IS_ERR(em)) {
> > +			ret = PTR_ERR(em);
> > +			goto out;
> > +		}
> > +
> > +		if (em->map_lookup->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
> > +			btrfs_err(fs_info, "swapfile data profile is not single");
> > +			ret = -EINVAL;
> > +			goto out;
> > +		}
> > +
> > +		if (device == NULL) {
> > +			device = em->map_lookup->stripes[0].dev;
> > +		} else if (device != em->map_lookup->stripes[0].dev) {
> > +			btrfs_err(fs_info, "swapfile is on multiple devices");
> > +			ret = -EINVAL;
> > +			goto out;
> > +		}
> > +
> <snip>

  reply	other threads:[~2018-05-25 16:16 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-24 21:41 [RFC PATCH v4 0/6] Btrfs: implement swap file support Omar Sandoval
2018-05-24 21:41 ` [RFC PATCH v4 1/6] mm: split SWP_FILE into SWP_ACTIVATED and SWP_FS Omar Sandoval
2018-05-25  9:11   ` Nikolay Borisov
2018-05-24 21:41 ` [RFC PATCH v4 2/6] vfs: update swap_{,de}activate documentation Omar Sandoval
2018-05-25  9:15   ` Nikolay Borisov
2018-05-24 21:41 ` [RFC PATCH v4 3/6] Btrfs: push EXCL_OP set into btrfs_rm_device() Omar Sandoval
2018-05-25  9:19   ` Nikolay Borisov
2018-05-28 13:29   ` David Sterba
2018-05-24 21:41 ` [RFC PATCH v4 4/6] Btrfs: prevent ioctls from interfering with a swap file Omar Sandoval
2018-05-25 14:50   ` David Sterba
2018-05-25 16:00     ` Omar Sandoval
2018-05-25 16:10       ` David Sterba
2018-08-21  8:46         ` Omar Sandoval
2018-05-24 21:41 ` [RFC PATCH v4 5/6] Btrfs: rename get_chunk_map() and make it non-static Omar Sandoval
2018-05-25  9:21   ` Nikolay Borisov
2018-05-25 16:02     ` Omar Sandoval
2018-05-24 21:41 ` [RFC PATCH v4 6/6] Btrfs: support swap files Omar Sandoval
2018-05-25 10:07   ` Nikolay Borisov
2018-05-25 16:16     ` Omar Sandoval [this message]
2018-05-28 13:46   ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180525161647.GC29193@vader \
    --to=osandov@osandov.com \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=nborisov@suse.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.