From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: axboe@kernel.dk, akpm@linux-foundation.org, darrick.wong@oracle.com
Cc: linux-block@vger.kernel.org, tytso@mit.edu, martin.petersen@oracle.com,
	snitzer@redhat.com, linux-api@vger.kernel.org, bfoster@redhat.com,
	xfs@oss.sgi.com, hch@infradead.org, dm-devel@redhat.com,
	linux-fsdevel@vger.kernel.org, bart.vanassche@sandisk.com
Subject: [PATCH 3/3] block: implement (some of) fallocate for block devices
Date: Wed, 28 Sep 2016 17:39:51 -0700
Message-ID: <147510959149.8940.2897845352082568677.stgit@birch.djwong.org>
In-Reply-To: <147510957066.8940.13803086684642725401.stgit@birch.djwong.org>

After much discussion, it seems that the fallocate feature flag
FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been
whitelisted for zeroing SCSI UNMAP.  Punch still requires that
FALLOC_FL_KEEP_SIZE is set.  A length that goes past the end of the
device will be clamped to the device size if KEEP_SIZE is set; or will
return -EINVAL if not.  Both start and length must be aligned to the
device's logical block size.

Since the semantics of fallocate are fairly well established already,
wire up the two pieces.  The other fallocate variants (collapse range,
insert range, and allocate blocks) are not supported.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
v2: Incorporate feedback from Christoph & Linus.  Tentatively add
a requirement that the fallocate arguments be aligned to logical block
size, and put in a few XXX comments ahead of LSF discussion.
v3: Forward port to 4.7.
v4: Forward port to 4.8.
---
 fs/block_dev.c |   78 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/open.c      |    3 +-
 2 files changed, 80 insertions(+), 1 deletion(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 08ae993..0c808fc 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -30,6 +30,7 @@
 #include <linux/cleancache.h>
 #include <linux/dax.h>
 #include <linux/badblocks.h>
+#include <linux/falloc.h>
 #include <asm/uaccess.h>
 #include "internal.h"
 
@@ -1787,6 +1788,82 @@ static const struct address_space_operations def_blk_aops = {
 	.is_dirty_writeback = buffer_check_dirty_writeback,
 };
 
+#define	BLKDEV_FALLOC_FL_SUPPORTED					\
+		(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |		\
+		 FALLOC_FL_ZERO_RANGE | FALLOC_FL_NO_HIDE_STALE)
+
+static long blkdev_fallocate(struct file *file, int mode, loff_t start,
+			     loff_t len)
+{
+	struct block_device *bdev = I_BDEV(bdev_file_inode(file));
+	struct request_queue *q = bdev_get_queue(bdev);
+	struct address_space *mapping;
+	loff_t end = start + len - 1;
+	loff_t isize;
+	int error;
+
+	/* Fail if we don't recognize the flags. */
+	if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
+		return -EOPNOTSUPP;
+
+	/* Don't go off the end of the device. */
+	isize = i_size_read(bdev->bd_inode);
+	if (start >= isize)
+		return -EINVAL;
+	if (end > isize) {
+		if (mode & FALLOC_FL_KEEP_SIZE) {
+			len = isize - start;
+			end = start + len - 1;
+		} else
+			return -EINVAL;
+	}
+
+	/*
+	 * Don't allow IO that isn't aligned to logical block size.
+	 */
+	if ((start | len) & (bdev_logical_block_size(bdev) - 1))
+		return -EINVAL;
+
+	/* Invalidate the page cache, including dirty pages. */
+	mapping = bdev->bd_inode->i_mapping;
+	truncate_inode_pages_range(mapping, start, end);
+
+	switch (mode) {
+	case FALLOC_FL_ZERO_RANGE:
+	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
+		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
+					    GFP_KERNEL, false);
+		if (error)
+			return error;
+		break;
+	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
+		/* Only punch if the device can do zeroing discard. */
+		if (!blk_queue_discard(q) || !q->limits.discard_zeroes_data)
+			return -EOPNOTSUPP;
+		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
+					     GFP_KERNEL, 0);
+		if (error)
+			return error;
+		break;
+	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
+		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
+					     GFP_KERNEL, 0);
+		if (error)
+			return error;
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	/*
+	 * Invalidate again; if someone wandered in and dirtied a page,
+	 * the caller will be given -EBUSY;
+	 */
+	return invalidate_inode_pages2_range(mapping,
+					     start >> PAGE_SHIFT,
+					     end >> PAGE_SHIFT);
+}
+
 const struct file_operations def_blk_fops = {
 	.open		= blkdev_open,
 	.release	= blkdev_close,
@@ -1801,6 +1878,7 @@ const struct file_operations def_blk_fops = {
 #endif
 	.splice_read	= generic_file_splice_read,
 	.splice_write	= iter_file_splice_write,
+	.fallocate	= blkdev_fallocate,
 };
 
 int ioctl_by_bdev(struct block_device *bdev, unsigned cmd, unsigned long arg)
diff --git a/fs/open.c b/fs/open.c
index 4fd6e25..01b6092 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -289,7 +289,8 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 	 * Let individual file system decide if it supports preallocation
 	 * for directories or not.
 	 */
-	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
+	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode) &&
+	    !S_ISBLK(inode->i_mode))
 		return -ENODEV;
 
 	/* Check for wrap through zero too */
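
For readers who want to see how the semantics described in the commit message look from userspace, here is a minimal sketch, not part of the patch. It assumes a kernel with this series applied. The helper names blkdev_falloc_check_range() and zero_blkdev_range() are hypothetical and exist only for this illustration. The range check follows the rules stated in the commit message (clamp to the device size when KEEP_SIZE is set, otherwise -EINVAL, and require start and length to be aligned to the logical block size). It is written from that description and is not a line-by-line copy of blkdev_fallocate().

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/falloc.h>	/* FALLOC_FL_* */
#include <linux/fs.h>		/* BLKSSZGET, BLKGETSIZE64 */

/*
 * Hypothetical helper: apply the range rules from the commit message.
 * dev_size is the device size in bytes and lbs the logical block size
 * (assumed to be a power of two).  On success returns 0 and may shrink
 * *len; otherwise returns -EINVAL.
 */
int blkdev_falloc_check_range(int64_t dev_size, int64_t lbs, int keep_size,
			      int64_t start, int64_t *len)
{
	/* vfs_fallocate() rejects negative offsets and non-positive lengths. */
	if (start < 0 || *len <= 0)
		return -EINVAL;

	/* The range must start inside the device. */
	if (start >= dev_size)
		return -EINVAL;

	/* Past the end: clamp only if the caller asked for KEEP_SIZE. */
	if (*len > dev_size - start) {
		if (!keep_size)
			return -EINVAL;
		*len = dev_size - start;
	}

	/* Both start and length must be logical-block aligned. */
	if ((start | *len) & (lbs - 1))
		return -EINVAL;

	return 0;
}

/*
 * Hypothetical caller: zero a byte range of a block device through
 * fallocate(2).  Returns 0 on success or a negative errno value.
 */
int zero_blkdev_range(const char *path, int64_t start, int64_t len)
{
	uint64_t dev_size;
	int lbs, ret, fd;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	/* Ask the device for its logical block size and its size in bytes. */
	if (ioctl(fd, BLKSSZGET, &lbs) < 0 ||
	    ioctl(fd, BLKGETSIZE64, &dev_size) < 0) {
		ret = -errno;
		goto out;
	}

	/* Pre-check locally so a bad range fails before touching the device. */
	ret = blkdev_falloc_check_range((int64_t)dev_size, lbs, 1, start, &len);
	if (ret)
		goto out;

	if (fallocate(fd, FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE,
		      (off_t)start, (off_t)len) < 0)
		ret = -errno;
out:
	close(fd);
	return ret;
}
```

A call such as zero_blkdev_range("/dev/sdX", 0, 1 << 20) would request zeroing of the first MiB, where the device path is a placeholder. On kernels without this series the fallocate() call on a block device is expected to fail with ENODEV, as the fs/open.c hunk above suggests. Punching with FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE would take the same shape but can return EOPNOTSUPP when the device does not advertise zeroing discard.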