From: Mike Snitzer <snitzer@redhat.com>
To: Brian Foster <bfoster@redhat.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>,
	Joe Thornber <ejt@redhat.com>,
	xfs@oss.sgi.com, linux-block@vger.kernel.org,
	dm-devel@redhat.com, linux-fsdevel@vger.kernel.org
Subject: Re: [RFC v2 PATCH 05/10] dm thin: add methods to set and get reserved space
Date: Thu, 14 Apr 2016 11:10:14 -0400
Message-ID: <20160414151014.GA13074@redhat.com>
In-Reply-To: <20160413204117.GA6870@bfoster.bfoster>

On Wed, Apr 13 2016 at  4:41pm -0400,
Brian Foster <bfoster@redhat.com> wrote:

> On Wed, Apr 13, 2016 at 02:33:52PM -0400, Brian Foster wrote:
> > On Wed, Apr 13, 2016 at 10:44:42AM -0700, Darrick J. Wong wrote:
> > > On Tue, Apr 12, 2016 at 12:42:48PM -0400, Brian Foster wrote:
> > > > From: Joe Thornber <ejt@redhat.com>
> > > > 
> > > > Experimental reserve interface for XFS guys to play with.
> > > > 
> > > > I have big reservations (no pun intended) about this patch.
> > > > 
> > > > [BF:
> > > >  - Support for reservation reduction.
> > > >  - Support for space provisioning.
> > > >  - Condensed to a single function.]
> > > > 
> > > > Not-Signed-off-by: Joe Thornber <ejt@redhat.com>
> > > > Not-Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> > > > ---
> > > >  drivers/md/dm-thin.c | 181 ++++++++++++++++++++++++++++++++++++++++++++++++---
> > > >  1 file changed, 171 insertions(+), 10 deletions(-)
> > > > 
> > > > diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> > > > index 92237b6..32bc5bd 100644
> > > > --- a/drivers/md/dm-thin.c
> > > > +++ b/drivers/md/dm-thin.c
> > ...
> > > > @@ -4271,6 +4343,94 @@ static void thin_io_hints(struct dm_target *ti, struct queue_limits *limits)
> > > >  	limits->max_discard_sectors = 2048 * 1024 * 16; /* 16G */
> > > >  }
> > > >  
> > > > +static int thin_provision_space(struct dm_target *ti, sector_t offset,
> > > > +				sector_t len, sector_t *res)
> > > > +{
> > > > +	struct thin_c *tc = ti->private;
> > > > +	struct pool *pool = tc->pool;
> > > > +	sector_t end;
> > > > +	dm_block_t pblock;
> > > > +	dm_block_t vblock;
> > > > +	int error;
> > > > +	struct dm_thin_lookup_result lookup;
> > > > +
> > > > +	if (!is_factor(offset, pool->sectors_per_block))
> > > > +		return -EINVAL;
> > > > +
> > > > +	if (!len || !is_factor(len, pool->sectors_per_block))
> > > > +		return -EINVAL;
> > > > +
> > > > +	if (res && !is_factor(*res, pool->sectors_per_block))
> > > > +		return -EINVAL;
> > > > +
> > > > +	end = offset + len;
> > > > +
> > > > +	while (offset < end) {
> > > > +		vblock = offset;
> > > > +		do_div(vblock, pool->sectors_per_block);
> > > > +
> > > > +		error = dm_thin_find_block(tc->td, vblock, true, &lookup);
> > > > +		if (error == 0)
> > > > +			goto next;
> > > > +		if (error != -ENODATA)
> > > > +			return error;
> > > > +
> > > > +		error = alloc_data_block(tc, &pblock);
> > > 
> > > So this means that if fallocate wants to BDEV_RES_PROVISION N blocks, it must
> > > first increase the reservation (BDEV_RES_MOD) by N blocks to avoid using up
> > > space that was previously reserved by some other caller.  I think?
> > > 
> > 
> > Yes, assuming this is being called from a filesystem using the
> > reservation mechanism.

Brian, I need to circle back with you to understand why XFS even needs a
reservation mechanism as opposed to just using something like fallocate
(which would provision the space before you actually initiate the I/O that
would use it).  But we can discuss that in person and then report back to
the list, if that makes it easier...
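
Darrick's point about ordering (grow the reservation by N blocks before
provisioning N blocks) can be illustrated with a toy model of the space
accounting.  This is a sketch under stated assumptions, not the patch's
code: struct toy_pool, toy_reserve() and toy_provision() are hypothetical
names, reservations are tracked per caller in whole blocks, and
provisioning consumes only the caller's own reservation.  The alignment
checks and the skip-if-already-mapped behavior mirror
thin_provision_space() as quoted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model of thin-pool space accounting.  All names are hypothetical;
 * this is not the dm-thin or block-layer API. */
#define NR_VBLOCKS 16
#define UNMAPPED UINT64_MAX

struct toy_pool {
	uint64_t free_blocks;       /* unallocated data blocks in the pool */
	uint64_t reserved;          /* sum of all callers' reservations */
	uint64_t sectors_per_block;
	uint64_t map[NR_VBLOCKS];   /* vblock -> pblock, UNMAPPED for a hole */
	uint64_t next_pblock;       /* trivial allocator for the toy */
};

static void toy_pool_init(struct toy_pool *p, uint64_t free_blocks,
			  uint64_t sectors_per_block)
{
	memset(p, 0, sizeof(*p));
	p->free_blocks = free_blocks;
	p->sectors_per_block = sectors_per_block;
	for (int i = 0; i < NR_VBLOCKS; i++)
		p->map[i] = UNMAPPED;
}

/* Grow the caller's reservation (*res) by 'blocks' if enough unreserved
 * space remains.  Returns 0, or -1 (think -ENOSPC). */
static int toy_reserve(struct toy_pool *p, uint64_t *res, uint64_t blocks)
{
	if (p->free_blocks - p->reserved < blocks)
		return -1;
	p->reserved += blocks;
	*res += blocks;
	return 0;
}

/* Provision [offset, offset + len) in sectors.  Already-mapped blocks are
 * skipped; each newly mapped block consumes one block of the caller's own
 * reservation, so other callers' reservations are never touched.
 * Returns 0, or -1 on bad alignment (think -EINVAL) or when the caller
 * has not reserved enough (think -ENOSPC). */
static int toy_provision(struct toy_pool *p, uint64_t *res,
			 uint64_t offset, uint64_t len)
{
	uint64_t end;

	if (offset % p->sectors_per_block)
		return -1;
	if (!len || len % p->sectors_per_block)
		return -1;

	end = offset + len;
	for (; offset < end; offset += p->sectors_per_block) {
		uint64_t vblock = offset / p->sectors_per_block;

		if (vblock >= NR_VBLOCKS)
			return -1;
		if (p->map[vblock] != UNMAPPED)
			continue;       /* already provisioned: costs nothing */
		if (*res == 0)
			return -1;      /* would eat someone else's reservation */
		p->map[vblock] = p->next_pblock++;
		p->free_blocks--;
		p->reserved--;
		(*res)--;
		assert(p->reserved <= p->free_blocks);
	}
	return 0;
}
```

With this ordering, a pool of 16 free blocks in which another caller holds
10 blocks of reservation lets fallocate reserve and then provision 4
blocks while the other caller's 10 stay intact; provisioning before
reserving fails instead of silently eating into that reservation.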
 
> > > > +		if (error)
> > > > +			return error;
> > > > +
> > > > +		error = dm_thin_insert_block(tc->td, vblock, pblock);
> > > 
> > > Having reserved and mapped blocks, what happens when we try to read them?
> > > Do we actually get zeroes, or does the read go straight through to whatever
> > > happens to be in the disk blocks?  I don't think it's correct that we could
> > > BDEV_RES_PROVISION and end up with stale credit card numbers from some other
> > > thin device.
> > > 
> > 
> > Agree, but I'm not really sure how this works in thinp tbh. fallocate
> > wasn't really on my mind when doing this. I was simply trying to cobble
> > together what I could to facilitate making progress on the fs parts
> > (e.g., I just needed a call that allocated blocks and consumed
> > reservation in the process).
> > 
> > Skimming through the dm-thin code, it looks like a (configurable) block
> > zeroing mechanism can be triggered from somewhere around
> > provision_block()->schedule_zero(), depending on whether the incoming
> > write overwrites the newly allocated block. If that's the case, then I
> > suspect that means reads would just fall through to the block and return
> > whatever was on disk. This code would probably need to tie into that
> > zeroing mechanism one way or another to deal with that issue. (Though
> > somebody who actually knows something about dm-thin should verify that.
> > :)
> > 
> 
> BTW, if that mechanism is in fact doing I/O, that might not be the
> appropriate solution for fallocate. Perhaps we'd have to consider an
> unwritten flag or some such in dm-thin, if possible.

DM thinp defaults to enabling 'zero_new_blocks' (this can be disabled with
the 'skip_block_zeroing' feature argument when loading the DM table for
the thin-pool).  With block zeroing enabled, any block that is provisioned
_will_ be overwritten with zeroes (using dm-kcopyd, which is trained to use
WRITE_SAME if supported).
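
The interplay between zeroing and provisioning can be summarized as a
small decision function.  This is a simplified sketch loosely modeled on
the provision_block()->schedule_zero() path mentioned above, not
dm-thin's code: the enum and choose_action() are hypothetical, the toy
assumes a write never spans two blocks, and "no write at all" stands in
for a provisioning call such as thin_provision_space() that maps blocks
without any bio.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the choice made when a thin device maps a newly
 * allocated data block.  Names are hypothetical.  Sizes are in sectors;
 * write_offset is relative to the start of the block. */
enum new_block_action {
	MAP_ONLY,          /* skip_block_zeroing: old contents stay visible */
	MAP_AND_OVERWRITE, /* the write covers the whole block: no extra I/O */
	ZERO_THEN_MAP,     /* zero the whole block first: extra I/O */
};

static enum new_block_action
choose_action(bool zero_new_blocks, uint64_t sectors_per_block,
	      bool have_write, uint64_t write_offset, uint64_t write_sectors)
{
	/* Toy assumption: a write never crosses a block boundary. */
	assert(!have_write || write_offset + write_sectors <= sectors_per_block);

	if (!zero_new_blocks)
		return MAP_ONLY;
	if (have_write && write_offset == 0 &&
	    write_sectors == sectors_per_block)
		return MAP_AND_OVERWRITE;
	/* Partial write, or no write at all (fallocate-style provisioning). */
	return ZERO_THEN_MAP;
}
```

The last case is the crux for fallocate: with zeroing enabled,
provisioning without a covering write costs a full-block zeroing I/O, and
with skip_block_zeroing it simply maps whatever was on disk.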

But yeah, for fallocate... that's certainly not something we want, as it
defeats the point of fallocate being cheap.

So we probably would need a flag comparable to the
ext4-stale-flag-that-shall-not-be-named ;)
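
The "unwritten flag" idea raised earlier in the thread can be sketched as
follows.  dm-thin has no such flag today and this is not meant to describe
the ext4 flag alluded to above; struct toy_thin and the uw_*() functions
are hypothetical, blocks are identity-mapped, and the "disk" is an
in-memory array that may hold stale data from a previous user.
Provisioning only sets flags, reads of unwritten blocks return zeroes
without touching the disk, and the zeroing cost is deferred to the first
write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model of an "unwritten" mapping flag.  All names are hypothetical. */
#define BLOCK_SIZE 8
#define NR_BLOCKS 4

struct toy_thin {
	unsigned char disk[NR_BLOCKS][BLOCK_SIZE]; /* physical blocks */
	bool mapped[NR_BLOCKS];    /* identity-mapped for simplicity */
	bool unwritten[NR_BLOCKS]; /* provisioned but never written */
};

/* fallocate-style provisioning: map the block, do no data I/O. */
static void uw_provision(struct toy_thin *t, size_t blk)
{
	t->mapped[blk] = true;
	t->unwritten[blk] = true;
}

/* Reads of holes or unwritten blocks return zeroes, never stale data. */
static void uw_read(const struct toy_thin *t, size_t blk, unsigned char *buf)
{
	if (!t->mapped[blk] || t->unwritten[blk])
		memset(buf, 0, BLOCK_SIZE);
	else
		memcpy(buf, t->disk[blk], BLOCK_SIZE);
}

/* The first write into an unwritten block zeroes the rest of the block
 * before clearing the flag; otherwise later reads could expose stale
 * bytes outside the written range. */
static void uw_write(struct toy_thin *t, size_t blk, size_t off,
		     const unsigned char *data, size_t len)
{
	assert(off + len <= BLOCK_SIZE);
	if (!t->mapped[blk])
		uw_provision(t, blk); /* write to a hole: allocate, then zero */
	if (t->unwritten[blk]) {
		memset(t->disk[blk], 0, BLOCK_SIZE);
		t->unwritten[blk] = false;
	}
	memcpy(t->disk[blk] + off, data, len);
}
```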

Mike

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

Thread overview:
2016-04-12 16:42 [RFC v2 PATCH 00/10] dm-thin/xfs: prototype a block reservation allocation model Brian Foster
2016-04-12 16:42 ` [RFC v2 PATCH 01/10] xfs: refactor xfs_reserve_blocks() to handle ENOSPC correctly Brian Foster
2016-04-12 16:42 ` [RFC v2 PATCH 02/10] xfs: replace xfs_mod_fdblocks() bool param with flags Brian Foster
2016-04-12 16:42 ` [RFC v2 PATCH 03/10] block: add block_device_operations methods to set and get reserved space Brian Foster
2016-04-14  0:32   ` Dave Chinner
2016-04-12 16:42 ` [RFC v2 PATCH 04/10] dm: add " Brian Foster
2016-04-12 16:42 ` [RFC v2 PATCH 05/10] dm thin: " Brian Foster
2016-04-13 17:44   ` Darrick J. Wong
2016-04-13 18:33     ` Brian Foster
2016-04-13 20:41       ` Brian Foster
2016-04-13 21:01         ` Darrick J. Wong
2016-04-14 15:10         ` Mike Snitzer [this message]
2016-04-14 16:23           ` Brian Foster
2016-04-14 20:18             ` Mike Snitzer
2016-04-15 11:48               ` Brian Foster
2016-04-12 16:42 ` [RFC v2 PATCH 06/10] xfs: thin block device reservation mechanism Brian Foster
2016-04-12 16:42 ` [RFC v2 PATCH 07/10] xfs: adopt a reserved allocation model on dm-thin devices Brian Foster
2016-04-12 16:42 ` [RFC v2 PATCH 08/10] xfs: handle bdev reservation ENOSPC correctly from XFS reserved pool Brian Foster
2016-04-12 16:42 ` [RFC v2 PATCH 09/10] xfs: support no block reservation transaction mode Brian Foster
2016-04-12 16:42 ` [RFC v2 PATCH 10/10] xfs: use contiguous bdev reservation for file preallocation Brian Foster
2016-04-12 20:04 ` [RFC PATCH] block: wire blkdev_fallocate() to block_device_operations' reserve_space Mike Snitzer
2016-04-12 20:39   ` Darrick J. Wong
2016-04-12 20:46     ` Mike Snitzer
2016-04-12 22:25       ` Darrick J. Wong
2016-04-12 21:04     ` Mike Snitzer
2016-04-13  0:12       ` Darrick J. Wong
2016-04-14 15:18         ` Mike Snitzer
