From: Mike Snitzer <snitzer@redhat.com>
To: Brian Foster <bfoster@redhat.com>
Cc: xfs@oss.sgi.com, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, dm-devel@redhat.com, "Darrick J. Wong" <darrick.wong@oracle.com>
Subject: [RFC PATCH] block: wire blkdev_fallocate() to block_device_operations' reserve_space
Date: Tue, 12 Apr 2016 16:04:59 -0400
Message-ID: <20160412200459.GA10730@redhat.com>
In-Reply-To: <1460479373-63317-1-git-send-email-bfoster@redhat.com>

On Tue, Apr 12 2016 at 12:42P -0400,
Brian Foster <bfoster@redhat.com> wrote:

> Hi all,
>
> This is v2 of the XFS and block device reservation experiment. The
> significant changes in v2 are that the bdev interface has been condensed
> to a single callback function, the XFS transaction reservation
> management has been reworked to make transactions responsible for
> tracking and releasing excess reservation (for non-delalloc cases) and a
> workaround for the fallocate over-reservation issue is included. Beyond
> that, this version adds a bunch of miscellaneous cleanups and fixes some
> of the nastier locking/leak issues present in the first rfc.
>
> Patches 1-2 refactor some XFS reserve pool and block accounting code in
> preparation for subsequent patches. Patches 3-5 add block/device-mapper
> reservation support. Patches 6-10 add the core reservation
> infrastructure and management bits to XFS. See the link to the original
> rfc below for instructions and further details around the purpose of
> this series.
>
> Finally, note that this is still highly experimental/theoretical and
> should not be used on production systems. Thoughts, reviews, flames
> appreciated.

Thanks for carrying on with this work Brian.

I've started to review your patchset and Darrick's fallocate patchset.
I've pushed a branch to linux-dm.git that combines the 2, see:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-fallocate

and then added this RFC patch, at the end, which relies on both of your
patchsets -- you'll see blkdev_ensure_space_exists() has a FIXME which
implies it isn't much more than simply stubbed out at this point
(completely untested):

From: Mike Snitzer <snitzer@redhat.com>
Date: Tue, 12 Apr 2016 15:54:31 -0400
Subject: [RFC PATCH] block: wire blkdev_fallocate() to block_device_operations' reserve_space

This effectively exposes the primitive for "ensure space exists". It
relies on block_device_operations' reserve_space method.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 block/blk-lib.c        | 26 ++++++++++++++++++++++++++
 fs/block_dev.c         | 20 +++++++++++---------
 include/linux/blkdev.h |  2 ++
 3 files changed, 39 insertions(+), 9 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 9dca6bb..5042a84 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -314,3 +314,29 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 	return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
 }
 EXPORT_SYMBOL(blkdev_issue_zeroout);
+
+/**
+ * blkdev_ensure_space_exists - preallocate a block range
+ * @bdev:	blockdev to preallocate space for
+ * @sector:	start sector
+ * @nr_sects:	number of sectors to preallocate
+ * @gfp_mask:	memory allocation flags (for bio_alloc)
+ * @flags:	FALLOC_FL_* to control behaviour
+ *
+ * Description:
+ *  Ensure space exists, or is preallocated, for the sectors in question.
+ */
+int blkdev_ensure_space_exists(struct block_device *bdev, sector_t sector,
+			       sector_t nr_sects, unsigned long flags)
+{
+	sector_t res;
+	const struct block_device_operations *ops = bdev->bd_disk->fops;
+
+	if (!ops->reserve_space)
+		return -EOPNOTSUPP;
+
+	// FIXME: check with Brian Foster on whether it makes sense to
+	// use BDEV_RES_GET/BDEV_RES_MOD instead of BDEV_RES_PROVISION?
+	return ops->reserve_space(bdev, BDEV_RES_PROVISION, sector, nr_sects, &res);
+}
+EXPORT_SYMBOL(blkdev_ensure_space_exists);
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 5a2c3ab..b34c07b 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1801,17 +1801,13 @@ long blkdev_fallocate(struct file *file, int mode, loff_t start, loff_t len)
 	struct request_queue *q = bdev_get_queue(bdev);
 	struct address_space *mapping;
 	loff_t end = start + len - 1;
-	loff_t bs_mask, isize;
+	loff_t isize;
 	int error;
 
 	/* We only support zero range and punch hole. */
 	if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
 		return -EOPNOTSUPP;
 
-	/* We haven't a primitive for "ensure space exists" right now. */
-	if (!(mode & ~FALLOC_FL_KEEP_SIZE))
-		return -EOPNOTSUPP;
-
 	/* Only punch if the device can do zeroing discard. */
 	if ((mode & FALLOC_FL_PUNCH_HOLE) &&
 	    (!blk_queue_discard(q) || !q->limits.discard_zeroes_data))
@@ -1829,9 +1825,12 @@ long blkdev_fallocate(struct file *file, int mode, loff_t start, loff_t len)
 		return -EINVAL;
 	}
 
-	/* Don't allow IO that isn't aligned to logical block size */
-	bs_mask = bdev_logical_block_size(bdev) - 1;
-	if ((start | len) & bs_mask)
+	/*
+	 * Don't allow IO that isn't aligned to minimum IO size (io_min)
+	 * - for normal device's io_min is usually logical block size
+	 * - but for more exotic devices (e.g. DM thinp) it may be larger
+	 */
+	if ((start | len) % bdev_io_min(bdev))
 		return -EINVAL;
 
 	/* Invalidate the page cache, including dirty pages. */
@@ -1839,7 +1838,10 @@ long blkdev_fallocate(struct file *file, int mode, loff_t start, loff_t len)
 	truncate_inode_pages_range(mapping, start, end);
 
 	error = -EINVAL;
-	if (mode & FALLOC_FL_ZERO_RANGE)
+	if (!(mode & ~FALLOC_FL_KEEP_SIZE))
+		error = blkdev_ensure_space_exists(bdev, start >> 9, len >> 9,
+						   mode);
+	else if (mode & FALLOC_FL_ZERO_RANGE)
 		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
 					    GFP_KERNEL, false);
 	else if (mode & FALLOC_FL_PUNCH_HOLE)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 6c6ea96..4147af2 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1132,6 +1132,8 @@ extern int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, struct page *page);
 extern int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, bool discard);
+extern int blkdev_ensure_space_exists(struct block_device *bdev, sector_t sector,
+		sector_t nr_sects, unsigned long flags);
 static inline int sb_issue_discard(struct super_block *sb, sector_t block,
 		sector_t nr_blocks, gfp_t gfp_mask, unsigned long flags)
 {
-- 
2.6.4 (Apple Git-63)
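
For illustration only, the mode dispatch and the alignment test in the fs/block_dev.c hunks can be sketched in user space. This is not part of the patch and is as untested against a real kernel as the patch itself. Every sk_* and SK_* name is hypothetical; the SK_FALLOC_FL_* values are assumed to match the FALLOC_FL_* flags in <linux/falloc.h>. The sketch tests start and len separately, on the assumption that io_min might not be a power of two (e.g. some DM thinp block sizes), in which case an OR-then-modulo test can reject a range that is actually aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins, assumed equal to FALLOC_FL_* in <linux/falloc.h>. */
#define SK_FALLOC_FL_KEEP_SIZE  0x01
#define SK_FALLOC_FL_PUNCH_HOLE 0x02
#define SK_FALLOC_FL_ZERO_RANGE 0x10

/* Which branch of the patched blkdev_fallocate() dispatch a mode selects. */
enum sk_falloc_op {
	SK_OP_ENSURE_SPACE,	/* mode is 0 or only KEEP_SIZE */
	SK_OP_ZERO_RANGE,
	SK_OP_PUNCH_HOLE,
	SK_OP_NONE		/* no branch taken; error stays -EINVAL */
};

/* Mirrors the if / else if chain added by the patch, in the same order. */
enum sk_falloc_op sk_classify_mode(int mode)
{
	if (!(mode & ~SK_FALLOC_FL_KEEP_SIZE))
		return SK_OP_ENSURE_SPACE;
	if (mode & SK_FALLOC_FL_ZERO_RANGE)
		return SK_OP_ZERO_RANGE;
	if (mode & SK_FALLOC_FL_PUNCH_HOLE)
		return SK_OP_PUNCH_HOLE;
	return SK_OP_NONE;
}

/*
 * Alignment test that checks start and len independently, so it stays
 * correct for any non-zero granule, power of two or not.
 */
bool sk_range_is_aligned(uint64_t start, uint64_t len, uint32_t granule)
{
	if (granule == 0)
		return false;
	return (start % granule) == 0 && (len % granule) == 0;
}

/* Small self-check showing where the two alignment forms disagree. */
void sk_self_check(void)
{
	/* Power-of-two granule: OR-then-modulo and the split test agree. */
	assert(((4096u | 8192u) % 4096u) == 0);
	assert(sk_range_is_aligned(4096, 8192, 4096));

	/* 1536-byte granule: both values are multiples, the OR is not. */
	assert(((1536u | 3072u) % 1536u) != 0);
	assert(sk_range_is_aligned(1536, 3072, 1536));
}
```

Calling sk_self_check() from a test harness exercises both cases. If io_min is always a power of two on the devices of interest, the combined form in the patch gives the same answer as the split test.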