From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: axboe@kernel.dk, torvalds@linux-foundation.org, darrick.wong@oracle.com
Cc: hch@infradead.org, tytso@mit.edu, martin.petersen@oracle.com,
	linux-api@vger.kernel.org, david@fromorbit.com,
	linux-kernel@vger.kernel.org, shane.seymour@hpe.com,
	bfields@fieldses.org, linux-fsdevel@vger.kernel.org,
	jlayton@poochiereds.net, akpm@linux-foundation.org
Subject: [PATCH 3/3] block: implement (some of) fallocate for block devices
Date: Fri, 04 Mar 2016 16:56:17 -0800
Message-ID: <20160305005617.29738.85316.stgit@birch.djwong.org>
In-Reply-To: <20160305005556.29738.66782.stgit@birch.djwong.org>

After much discussion, it seems that the fallocate feature flag
FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME, and that
FALLOC_FL_PUNCH_HOLE maps nicely to SCSI UNMAP on the devices that
have been whitelisted as zeroing on UNMAP.  Both flags require that
FALLOC_FL_KEEP_SIZE is set, both return EINVAL if one tries to
write past the end of the device, and both require that the offset
and length be aligned to the device's logical block size (at least
512 bytes).

Since the semantics of fallocate are fairly well established already,
wire up the two pieces.  The other fallocate variants (collapse range,
insert range, and allocate blocks) are not supported.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
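Not part of the commit: a minimal userspace sketch of how a caller
might exercise this, assuming fd refers to a block device opened for
writing.  The helper names bdev_zero_range() and bdev_punch_hole() are
made up for illustration; only fallocate(2) and the FALLOC_FL_* flags
are real interfaces.

```c
#define _GNU_SOURCE		/* for fallocate(2) and FALLOC_FL_* */
#include <errno.h>
#include <fcntl.h>

/*
 * Hypothetical helper, not part of this patch: ask the kernel to zero
 * [start, start + len) on an already-open block device.
 * Returns 0 on success or a negative errno.
 */
int bdev_zero_range(int fd, off_t start, off_t len)
{
	if (fallocate(fd, FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE,
		      start, len) < 0)
		return -errno;
	return 0;
}

/*
 * Hypothetical helper: same call with punch hole.  With this patch a
 * device without zeroing discard is expected to give -EOPNOTSUPP.
 */
int bdev_punch_hole(int fd, off_t start, off_t len)
{
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      start, len) < 0)
		return -errno;
	return 0;
}
```

Both helpers always pass FALLOC_FL_KEEP_SIZE, since the patch rejects
any mode without it.
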
 fs/block_dev.c |   67 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/open.c      |    3 ++-
 2 files changed, 69 insertions(+), 1 deletion(-)


diff --git a/fs/block_dev.c b/fs/block_dev.c
index 826b164..c9c9421 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -30,6 +30,7 @@
 #include <linux/cleancache.h>
 #include <linux/dax.h>
 #include <asm/uaccess.h>
+#include <linux/falloc.h>
 #include "internal.h"
 
 struct bdev_inode {
@@ -1786,6 +1787,71 @@ static int blkdev_mmap(struct file *file, struct vm_area_struct *vma)
 #define blkdev_mmap generic_file_mmap
 #endif
 
+#define	BLKDEV_FALLOC_FL_SUPPORTED					\
+		(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |		\
+		 FALLOC_FL_ZERO_RANGE)
+
+long blkdev_fallocate(struct file *file, int mode, loff_t start, loff_t len)
+{
+	struct block_device *bdev = I_BDEV(bdev_file_inode(file));
+	struct request_queue *q = bdev_get_queue(bdev);
+	struct address_space *mapping;
+	loff_t end = start + len - 1;
+	loff_t bs_mask;
+	int error;
+
+	/* We only support zero range and punch hole. */
+	if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
+		return -EOPNOTSUPP;
+
+	/* We can't change the bdev size from here */
+	if (!(mode & FALLOC_FL_KEEP_SIZE))
+		return -EOPNOTSUPP;
+
+	/* We haven't a primitive for "ensure space exists" right now. */
+	if (mode == FALLOC_FL_KEEP_SIZE)
+		return -EOPNOTSUPP;
+
+	/* Only punch if the device can do zeroing discard. */
+	if ((mode & FALLOC_FL_PUNCH_HOLE) &&
+	    (!blk_queue_discard(q) || !q->limits.discard_zeroes_data))
+		return -EOPNOTSUPP;
+
+	/* Don't allow IO that isn't aligned to logical block size */
+	bs_mask = bdev_logical_block_size(bdev) - 1;
+	if ((start & bs_mask) || ((start + len) & bs_mask))
+		return -EINVAL;
+
+	/* Don't go off the end of the device */
+	if (end >= i_size_read(bdev->bd_inode))
+		return -EINVAL;
+	if (end < start)
+		return -EINVAL;
+
+	/* Invalidate the page cache, including dirty pages. */
+	mapping = bdev->bd_inode->i_mapping;
+	truncate_inode_pages_range(mapping, start, end);
+
+	error = -EINVAL;
+	if (mode & FALLOC_FL_ZERO_RANGE)
+		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
+					    GFP_KERNEL, false);
+	else if (mode & FALLOC_FL_PUNCH_HOLE)
+		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
+					     GFP_KERNEL, 0);
+	if (error)
+		return error;
+
+	/*
+	 * Invalidate again; if someone wandered in and dirtied a page,
+	 * the caller will be given -EBUSY.
+	 */
+	return invalidate_inode_pages2_range(mapping,
+					     start >> PAGE_CACHE_SHIFT,
+					     end >> PAGE_CACHE_SHIFT);
+}
+EXPORT_SYMBOL_GPL(blkdev_fallocate);
+
 const struct file_operations def_blk_fops = {
 	.open		= blkdev_open,
 	.release	= blkdev_close,
@@ -1800,6 +1866,7 @@ const struct file_operations def_blk_fops = {
 #endif
 	.splice_read	= generic_file_splice_read,
 	.splice_write	= iter_file_splice_write,
+	.fallocate	= blkdev_fallocate,
 };
 
 int ioctl_by_bdev(struct block_device *bdev, unsigned cmd, unsigned long arg)
diff --git a/fs/open.c b/fs/open.c
index 55bdc75..4f99adc 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -289,7 +289,8 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 	 * Let individual file system decide if it supports preallocation
 	 * for directories or not.
 	 */
-	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
+	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode) &&
+	    !S_ISBLK(inode->i_mode))
 		return -ENODEV;
 
 	/* Check for wrap through zero too */
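
For readers who want to see which argument combinations the new
blkdev_fallocate() is meant to accept, here is a rough userspace mirror
of its checks.  This is a sketch only: the function name, the dev_size,
block_size and zeroing_discard parameters are hypothetical stand-ins for
i_size_read(), bdev_logical_block_size() and the queue's discard limits,
and the order of the two -EINVAL checks is swapped to avoid overflow.
The first check reflects what vfs_fallocate() already does before
calling into the driver.

```c
#include <errno.h>
#include <stdint.h>

/* Fallback definitions matching the values in <linux/falloc.h>. */
#ifndef FALLOC_FL_KEEP_SIZE
#define FALLOC_FL_KEEP_SIZE	0x01
#endif
#ifndef FALLOC_FL_PUNCH_HOLE
#define FALLOC_FL_PUNCH_HOLE	0x02
#endif
#ifndef FALLOC_FL_ZERO_RANGE
#define FALLOC_FL_ZERO_RANGE	0x10
#endif

#define SKETCH_FALLOC_FL_SUPPORTED \
	(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE | FALLOC_FL_ZERO_RANGE)

/*
 * Hypothetical mirror of the argument checks; block_size must be a
 * power of two.  Returns 0 if the request would reach the device, or
 * the negative errno the caller should expect.
 */
int sketch_blkdev_falloc_check(int mode, int64_t start, int64_t len,
			       int64_t dev_size, uint32_t block_size,
			       int zeroing_discard)
{
	int64_t bs_mask = (int64_t)block_size - 1;

	/* vfs_fallocate() rejects these before the driver is called. */
	if (start < 0 || len <= 0)
		return -EINVAL;

	/* Only zero range and punch hole, and only with KEEP_SIZE. */
	if (mode & ~SKETCH_FALLOC_FL_SUPPORTED)
		return -EOPNOTSUPP;
	if (!(mode & FALLOC_FL_KEEP_SIZE))
		return -EOPNOTSUPP;
	if (mode == FALLOC_FL_KEEP_SIZE)
		return -EOPNOTSUPP;
	if ((mode & FALLOC_FL_PUNCH_HOLE) && !zeroing_discard)
		return -EOPNOTSUPP;

	/* Range must stay on the device (written to avoid overflow). */
	if (start > dev_size || len > dev_size - start)
		return -EINVAL;

	/* Both ends must sit on a logical block boundary. */
	if ((start & bs_mask) || ((start + len) & bs_mask))
		return -EINVAL;

	return 0;
}
```

For example, on a 1 MiB device with 512-byte blocks, zeroing the last
block with KEEP_SIZE passes, while the same request without KEEP_SIZE
or with an unaligned start does not.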
