From: Mike Snitzer <snitzer@redhat.com>
To: Brian Foster <bfoster@redhat.com>
Cc: xfs@oss.sgi.com, linux-block@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, dm-devel@redhat.com,
	"Darrick J. Wong" <darrick.wong@oracle.com>
Subject: [RFC PATCH] block: wire blkdev_fallocate() to block_device_operations' reserve_space
Date: Tue, 12 Apr 2016 16:04:59 -0400
Message-ID: <20160412200459.GA10730@redhat.com>
In-Reply-To: <1460479373-63317-1-git-send-email-bfoster@redhat.com>

On Tue, Apr 12 2016 at 12:42pm -0400,
Brian Foster <bfoster@redhat.com> wrote:

> Hi all,
> 
> This is v2 of the XFS and block device reservation experiment. The
> significant changes in v2 are that the bdev interface has been condensed
> to a single callback function, the XFS transaction reservation
> management has been reworked to make transactions responsible for
> tracking and releasing excess reservation (for non-delalloc cases) and a
> workaround for the fallocate over-reservation issue is included. Beyond
> that, this version adds a bunch of miscellaneous cleanups and fixes some
> of the nastier locking/leak issues present in the first rfc.
> 
> Patches 1-2 refactor some XFS reserve pool and block accounting code in
> preparation for subsequent patches. Patches 3-5 add block/device-mapper
> reservation support. Patches 6-10 add the core reservation
> infrastructure and management bits to XFS. See the link to the original
> rfc below for instructions and further details around the purpose of
> this series.
> 
> Finally, note that this is still highly experimental/theoretical and
> should not be used on production systems. Thoughts, reviews, flames
> appreciated.

Thanks for carrying on with this work, Brian.

I've started to review your patchset and Darrick's fallocate patchset.
I've pushed a branch to linux-dm.git that combines the two; see:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-fallocate

and then added the RFC patch below at the end of that branch. It relies
on both of your patchsets. As the FIXME in blkdev_ensure_space_exists()
suggests, it is little more than a stub at this point (and completely
untested):

From: Mike Snitzer <snitzer@redhat.com>
Date: Tue, 12 Apr 2016 15:54:31 -0400
Subject: [RFC PATCH] block: wire blkdev_fallocate() to block_device_operations' reserve_space

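Expose the "ensure space exists" primitive: blkdev_fallocate() now
handles plain preallocation (mode 0 or FALLOC_FL_KEEP_SIZE) by calling
the block_device_operations' reserve_space method rather than returning
-EOPNOTSUPP unconditionally.

Also require start and len to be aligned to the device's minimum IO size
(io_min) instead of its logical block size, since io_min may be larger on
devices such as DM thinp.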

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 block/blk-lib.c        | 26 ++++++++++++++++++++++++++
 fs/block_dev.c         | 20 +++++++++++---------
 include/linux/blkdev.h |  2 ++
 3 files changed, 39 insertions(+), 9 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 9dca6bb..5042a84 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -314,3 +314,29 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 	return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
 }
 EXPORT_SYMBOL(blkdev_issue_zeroout);
+
+/**
+ * blkdev_ensure_space_exists - preallocate a block range
+ * @bdev:	blockdev to preallocate space for
+ * @sector:	start sector
+ * @nr_sects:	number of sectors to preallocate
+ * @flags:	FALLOC_FL_* to control behaviour (currently unused)
+ *
+ * Description:
+ *    Ensure space exists, or is preallocated, for the sectors in question.
+ */
+int blkdev_ensure_space_exists(struct block_device *bdev, sector_t sector,
+		sector_t nr_sects, unsigned long flags)
+{
+	sector_t res;
+	const struct block_device_operations *ops = bdev->bd_disk->fops;
+
+	if (!ops->reserve_space)
+		return -EOPNOTSUPP;
+
+	/* FIXME: check with Brian Foster on whether it makes sense to
+	 * use BDEV_RES_GET/BDEV_RES_MOD instead of BDEV_RES_PROVISION?
+	 */
+	return ops->reserve_space(bdev, BDEV_RES_PROVISION, sector, nr_sects, &res);
+}
+EXPORT_SYMBOL(blkdev_ensure_space_exists);
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 5a2c3ab..b34c07b 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1801,17 +1801,13 @@ long blkdev_fallocate(struct file *file, int mode, loff_t start, loff_t len)
 	struct request_queue *q = bdev_get_queue(bdev);
 	struct address_space *mapping;
 	loff_t end = start + len - 1;
-	loff_t bs_mask, isize;
+	loff_t isize;
 	int error;
 
 	/* We only support zero range and punch hole. */
 	if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
 		return -EOPNOTSUPP;
 
-	/* We haven't a primitive for "ensure space exists" right now. */
-	if (!(mode & ~FALLOC_FL_KEEP_SIZE))
-		return -EOPNOTSUPP;
-
 	/* Only punch if the device can do zeroing discard. */
 	if ((mode & FALLOC_FL_PUNCH_HOLE) &&
 	    (!blk_queue_discard(q) || !q->limits.discard_zeroes_data))
@@ -1829,9 +1825,12 @@ long blkdev_fallocate(struct file *file, int mode, loff_t start, loff_t len)
 			return -EINVAL;
 	}
 
-	/* Don't allow IO that isn't aligned to logical block size */
-	bs_mask = bdev_logical_block_size(bdev) - 1;
-	if ((start | len) & bs_mask)
+	/*
+	 * Don't allow IO that isn't aligned to the minimum IO size (io_min):
+	 * - for normal devices io_min is usually the logical block size
+	 * - DM thinp, for example, may use a larger, non-power-of-2 io_min
+	 */
+	if ((start % bdev_io_min(bdev)) || (len % bdev_io_min(bdev)))
 		return -EINVAL;
 
 	/* Invalidate the page cache, including dirty pages. */
@@ -1839,7 +1838,10 @@ long blkdev_fallocate(struct file *file, int mode, loff_t start, loff_t len)
 	truncate_inode_pages_range(mapping, start, end);
 
 	error = -EINVAL;
-	if (mode & FALLOC_FL_ZERO_RANGE)
+	if (!(mode & ~FALLOC_FL_KEEP_SIZE))
+		error = blkdev_ensure_space_exists(bdev, start >> 9, len >> 9,
+						   mode);
+	else if (mode & FALLOC_FL_ZERO_RANGE)
 		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
 					    GFP_KERNEL, false);
 	else if (mode & FALLOC_FL_PUNCH_HOLE)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 6c6ea96..4147af2 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1132,6 +1132,8 @@ extern int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, struct page *page);
 extern int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, bool discard);
+extern int blkdev_ensure_space_exists(struct block_device *bdev, sector_t sector,
+		sector_t nr_sects, unsigned long flags);
 static inline int sb_issue_discard(struct super_block *sb, sector_t block,
 		sector_t nr_blocks, gfp_t gfp_mask, unsigned long flags)
 {
-- 
2.6.4 (Apple Git-63)
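
For review purposes, the mode handling in the patched blkdev_fallocate()
can be modeled in userspace. This is a rough sketch, not kernel code: the
FALLOC_FL_* values mirror <linux/falloc.h>, BLKDEV_FALLOC_FL_SUPPORTED is
assumed to be KEEP_SIZE | PUNCH_HOLE | ZERO_RANGE (its real definition
lives in Darrick's patchset), the discard_zeroes_data check for punch hole
is omitted, and falloc_action/classify_mode() are hypothetical names.

```c
/*
 * Userspace model of the mode dispatch in the patched blkdev_fallocate().
 * Hypothetical names; flag values mirror <linux/falloc.h>.
 */
#ifndef FALLOC_FL_KEEP_SIZE
#define FALLOC_FL_KEEP_SIZE	0x01
#endif
#ifndef FALLOC_FL_PUNCH_HOLE
#define FALLOC_FL_PUNCH_HOLE	0x02
#endif
#ifndef FALLOC_FL_ZERO_RANGE
#define FALLOC_FL_ZERO_RANGE	0x10
#endif

/* Assumption: mask as defined by Darrick's patchset (not shown in this patch). */
#define BLKDEV_FALLOC_FL_SUPPORTED \
	(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE | FALLOC_FL_ZERO_RANGE)

enum falloc_action {
	FA_EOPNOTSUPP,		/* unsupported flag bits: -EOPNOTSUPP */
	FA_ENSURE_SPACE,	/* mode 0 or KEEP_SIZE: blkdev_ensure_space_exists() */
	FA_ZEROOUT,		/* ZERO_RANGE: blkdev_issue_zeroout() */
	FA_PUNCH,		/* PUNCH_HOLE: zeroing discard */
	FA_EINVAL,		/* no branch taken: error stays -EINVAL */
};

enum falloc_action classify_mode(int mode)
{
	/* Same first check as the kernel: reject unknown flags. */
	if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
		return FA_EOPNOTSUPP;

	/* Branch order matches the patched if/else chain. */
	if (!(mode & ~FALLOC_FL_KEEP_SIZE))
		return FA_ENSURE_SPACE;
	if (mode & FALLOC_FL_ZERO_RANGE)
		return FA_ZEROOUT;
	if (mode & FALLOC_FL_PUNCH_HOLE)
		return FA_PUNCH;

	/* Not reachable once the mask check passes; mirrors error = -EINVAL. */
	return FA_EINVAL;
}
```

Before this patch, mode 0 and FALLOC_FL_KEEP_SIZE alone returned
-EOPNOTSUPP outright; with it, they reach the driver's reserve_space
method, and still get -EOPNOTSUPP only when the driver has none.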

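The io_min alignment check has to hold for start and len individually.
OR-ing them before a single modulo is only equivalent when io_min is a
power of 2, which a thin pool's io_min may not be. A small userspace
comparison of the two forms follows; the helper names are hypothetical,
and 192 KiB is just an illustrative thin-pool block size (a multiple of
64 KiB).

```c
#include <stdbool.h>
#include <stdint.h>

/* Single test on (start | len): only valid when io_min is a power of 2. */
bool aligned_or_mod(uint64_t start, uint64_t len, uint32_t io_min)
{
	return ((start | len) % io_min) == 0;
}

/* Test each value separately: valid for any non-zero io_min. */
bool aligned_each(uint64_t start, uint64_t len, uint32_t io_min)
{
	return (start % io_min) == 0 && (len % io_min) == 0;
}
```

With io_min = 4096 both forms agree. With io_min = 196608 (192 KiB),
aligned_or_mod() rejects the aligned pair (196608, 393216), since their
OR is 458752 and leaves a remainder of 65536, yet accepts the misaligned
pair (65536, 131072), whose OR is exactly 196608.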

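From userspace, this path would be reached by calling fallocate(2) on an
open block device with mode 0 (or FALLOC_FL_KEEP_SIZE). A minimal sketch,
assuming a kernel carrying both patchsets plus this patch; bdev_preallocate()
is a hypothetical helper. Going by the patch, the call should fail with
EOPNOTSUPP when the driver has no reserve_space method, and with EINVAL for
a range that fails the io_min alignment check.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask for [offset, offset + len) on the block device
 * at path to be preallocated.  Returns 0 on success or a negative errno.
 */
int bdev_preallocate(const char *path, off_t offset, off_t len)
{
	int fd, ret;

	/* fallocate(2) needs a file descriptor opened for writing. */
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	/* mode 0: plain preallocation; capture errno before close() */
	ret = fallocate(fd, 0, offset, len) ? -errno : 0;

	close(fd);
	return ret;
}
```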