* [PATCH v10 0/3] fallocate for block devices
From: Darrick J. Wong @ 2016-09-29  0:39 UTC
  To: axboe, akpm, darrick.wong
  Cc: linux-block, tytso, martin.petersen, snitzer, linux-api, bfoster,
	xfs, hch, dm-devel, linux-fsdevel, bart.vanassche

Hi Andrew & others,

This is a resend of the patchset to fix page cache coherency with
BLKZEROOUT and implement fallocate for block devices.  This time I'm
sending it direct to Andrew for inclusion because the block layer
maintainer has not been responsive over the past year of submissions.
Can this please go upstream for 4.9?

The first patch is still a fix to the existing BLKZEROOUT ioctl to
invalidate the page cache if the zeroing command to the underlying
device succeeds.  Without this patch we still have the pagecache
coherence bug that's been in the kernel forever.

The second patch changes the internal block device functions to reject
attempts to discard or zeroout that are not aligned to the logical
block size.  Previously, we only checked that the start/len parameters
were 512-byte aligned, which caused kernel BUG_ONs for unaligned IOs
to 4k-LBA devices.

The third patch creates an fallocate handler for block devices, wires
up the FALLOC_FL_PUNCH_HOLE flag to zeroing-discard, and connects
FALLOC_FL_ZERO_RANGE to write-same so that we can have a consistent
fallocate interface between files and block devices.  It also allows
the combination of PUNCH_HOLE and NO_HIDE_STALE to invoke non-zeroing
discard.
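
As a rough illustration of the flag combinations described above, here is
a minimal userspace sketch that classifies an fallocate mode word the way
the third patch's switch statement does.  The enum and function names are
hypothetical labels chosen for this example, not kernel identifiers; only
the FALLOC_FL_* constants come from <linux/falloc.h>.

```c
#include <linux/falloc.h>

/* Hypothetical labels for illustration only; not kernel identifiers. */
enum blk_falloc_op {
	OP_UNSUPPORTED,
	OP_ZEROOUT,		/* zero the range */
	OP_ZEROING_DISCARD,	/* discard, only if the device zeroes on discard */
	OP_PLAIN_DISCARD,	/* discard; later reads may return stale data */
};

/* Map an fallocate mode word onto the operation the handler would pick. */
enum blk_falloc_op classify_mode(int mode)
{
	switch (mode) {
	case FALLOC_FL_ZERO_RANGE:
	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
		return OP_ZEROOUT;
	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
		return OP_ZEROING_DISCARD;
	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
		return OP_PLAIN_DISCARD;
	default:
		return OP_UNSUPPORTED;
	}
}
```

Note that PUNCH_HOLE without KEEP_SIZE falls through to the unsupported
case, matching the requirement that punch always keeps the size.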

Test cases for the new block device fallocate are now in xfstests as
generic/349-351.

Comments and questions are, as always, welcome.  Patches are against
4.8-rc8.

v7: Strengthen parameter checking and fix various code issues pointed
out by Linus and Christoph.
v8: More code rearranging, rebase to 4.6-rc3, and dig into alignment
issues.
v9: Forward port to 4.7.
v10: Forward port to 4.8.  Remove the extra call to
invalidate_inode_pages2_range per Bart Van Assche's request.

--D

* [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT.
From: Darrick J. Wong @ 2016-09-29  0:39 UTC
  To: axboe, akpm, darrick.wong
  Cc: linux-block, tytso, martin.petersen, snitzer, linux-api, bfoster,
	xfs, hch, dm-devel, linux-fsdevel, bart.vanassche,
	Christoph Hellwig

Invalidate the page cache (as a regular O_DIRECT write would do) to avoid
returning stale cache contents at a later time.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 block/ioctl.c |   18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)


diff --git a/block/ioctl.c b/block/ioctl.c
index ed2397f..755119c 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -225,7 +225,8 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
 		unsigned long arg)
 {
 	uint64_t range[2];
-	uint64_t start, len;
+	struct address_space *mapping;
+	uint64_t start, end, len;
 
 	if (!(mode & FMODE_WRITE))
 		return -EBADF;
@@ -235,18 +236,23 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
 
 	start = range[0];
 	len = range[1];
+	end = start + len - 1;
 
 	if (start & 511)
 		return -EINVAL;
 	if (len & 511)
 		return -EINVAL;
-	start >>= 9;
-	len >>= 9;
-
-	if (start + len > (i_size_read(bdev->bd_inode) >> 9))
+	if (end >= (uint64_t)i_size_read(bdev->bd_inode))
+		return -EINVAL;
+	if (end < start)
 		return -EINVAL;
 
-	return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL, false);
+	/* Invalidate the page cache, including dirty pages */
+	mapping = bdev->bd_inode->i_mapping;
+	truncate_inode_pages_range(mapping, start, end);
+
+	return blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
+				    false);
 }
 
 static int put_ushort(unsigned long arg, unsigned short val)
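
Besides adding the invalidation, the diff above reworks the range check
to operate on bytes with an inclusive end.  The following is a minimal
userspace model of those checks, assuming plain uint64_t arithmetic; the
function name and the dev_size parameter (standing in for the value of
i_size_read()) are made up for this sketch.

```c
#include <errno.h>
#include <stdint.h>

/* Userspace model of the range checks in the patched blk_ioctl_zeroout(). */
int check_zeroout_range(uint64_t start, uint64_t len, uint64_t dev_size)
{
	uint64_t end = start + len - 1;	/* inclusive last byte; may wrap */

	if ((start & 511) || (len & 511))
		return -EINVAL;		/* not 512-byte aligned */
	if (end >= dev_size)
		return -EINVAL;		/* runs past the end of the device */
	if (end < start)
		return -EINVAL;		/* arithmetic wrapped, or len == 0 */
	return 0;
}
```

With an 8192-byte device, (start=4096, len=4096) passes because the last
byte is 8191, while a huge 512-aligned len that wraps around is caught by
the end < start test.  A zero len is rejected by the same two tests.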

* [PATCH 2/3] block: require write_same and discard requests align to logical block size
From: Darrick J. Wong @ 2016-09-29  0:39 UTC
  To: axboe, akpm, darrick.wong
  Cc: linux-block, tytso, martin.petersen, snitzer, linux-api, bfoster,
	xfs, hch, dm-devel, linux-fsdevel, bart.vanassche,
	Christoph Hellwig

Make sure that the offset and length arguments that we're using to
construct WRITE SAME and DISCARD requests are actually aligned to the
logical block size.  Failure to do this causes other errors in other
parts of the block layer or the SCSI layer because disks don't support
partial logical block writes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 block/blk-lib.c |   15 +++++++++++++++
 1 file changed, 15 insertions(+)


diff --git a/block/blk-lib.c b/block/blk-lib.c
index 083e56f..46fe924 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -31,6 +31,7 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	unsigned int granularity;
 	enum req_op op;
 	int alignment;
+	sector_t bs_mask;
 
 	if (!q)
 		return -ENXIO;
@@ -50,6 +51,10 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		op = REQ_OP_DISCARD;
 	}
 
+	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+	if ((sector | nr_sects) & bs_mask)
+		return -EINVAL;
+
 	/* Zero-sector (unknown) and one-sector granularities are the same.  */
 	granularity = max(q->limits.discard_granularity >> 9, 1U);
 	alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
@@ -150,10 +155,15 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 	unsigned int max_write_same_sectors;
 	struct bio *bio = NULL;
 	int ret = 0;
+	sector_t bs_mask;
 
 	if (!q)
 		return -ENXIO;
 
+	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+	if ((sector | nr_sects) & bs_mask)
+		return -EINVAL;
+
 	/* Ensure that max_write_same_sectors doesn't overflow bi_size */
 	max_write_same_sectors = UINT_MAX >> 9;
 
@@ -202,6 +212,11 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 	int ret;
 	struct bio *bio = NULL;
 	unsigned int sz;
+	sector_t bs_mask;
+
+	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+	if ((sector | nr_sects) & bs_mask)
+		return -EINVAL;
 
 	while (nr_sects != 0) {
 		bio = next_bio(bio, min(nr_sects, (sector_t)BIO_MAX_PAGES),
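
The bs_mask test added in each hunk above can be tried out in userspace.
This is a small sketch, assuming the logical block size is a power of two
of at least 512 bytes; the function name and the lbs_bytes parameter
(standing in for bdev_logical_block_size()) are invented for the example.

```c
#include <errno.h>
#include <stdint.h>

/*
 * sector and nr_sects are in 512-byte units.  lbs_bytes is the logical
 * block size in bytes, assumed to be a power of two >= 512.
 */
int check_lbs_alignment(uint64_t sector, uint64_t nr_sects,
			unsigned int lbs_bytes)
{
	/* Number of 512-byte sectors per logical block, minus one. */
	uint64_t bs_mask = (lbs_bytes >> 9) - 1;

	/* Any low bit set in either value means a partial logical block. */
	if ((sector | nr_sects) & bs_mask)
		return -EINVAL;
	return 0;
}
```

For a 4096-byte logical block size the mask is 7, so a request starting
at sector 1 is rejected even though it is 512-byte aligned.  For a
512-byte logical block size the mask is 0 and every request passes.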

* [PATCH 3/3] block: implement (some of) fallocate for block devices
From: Darrick J. Wong @ 2016-09-29  0:39 UTC
  To: axboe, akpm, darrick.wong
  Cc: linux-block, tytso, martin.petersen, snitzer, linux-api, bfoster,
	xfs, hch, dm-devel, linux-fsdevel, bart.vanassche

After much discussion, it seems that the fallocate feature flag
FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been
whitelisted for zeroing SCSI UNMAP.  Punch still requires that
FALLOC_FL_KEEP_SIZE is set.  A length that goes past the end of the
device will be clamped to the device size if KEEP_SIZE is set; or will
return -EINVAL if not.  Both start and length must be aligned to the
device's logical block size.

Since the semantics of fallocate are fairly well established already,
wire up the two pieces.  The other fallocate variants (collapse range,
insert range, and allocate blocks) are not supported.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
v2: Incorporate feedback from Christoph & Linus.  Tentatively add
a requirement that the fallocate arguments be aligned to logical block
size, and put in a few XXX comments ahead of LSF discussion.
v3: Forward port to 4.7.
v4: Forward port to 4.8.
---
 fs/block_dev.c |   78 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/open.c      |    3 +-
 2 files changed, 80 insertions(+), 1 deletion(-)


diff --git a/fs/block_dev.c b/fs/block_dev.c
index 08ae993..0c808fc 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -30,6 +30,7 @@
 #include <linux/cleancache.h>
 #include <linux/dax.h>
 #include <linux/badblocks.h>
+#include <linux/falloc.h>
 #include <asm/uaccess.h>
 #include "internal.h"
 
@@ -1787,6 +1788,82 @@ static const struct address_space_operations def_blk_aops = {
 	.is_dirty_writeback = buffer_check_dirty_writeback,
 };
 
+#define	BLKDEV_FALLOC_FL_SUPPORTED					\
+		(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |		\
+		 FALLOC_FL_ZERO_RANGE | FALLOC_FL_NO_HIDE_STALE)
+
+static long blkdev_fallocate(struct file *file, int mode, loff_t start,
+			     loff_t len)
+{
+	struct block_device *bdev = I_BDEV(bdev_file_inode(file));
+	struct request_queue *q = bdev_get_queue(bdev);
+	struct address_space *mapping;
+	loff_t end = start + len - 1;
+	loff_t isize;
+	int error;
+
+	/* Fail if we don't recognize the flags. */
+	if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
+		return -EOPNOTSUPP;
+
+	/* Don't go off the end of the device. */
+	isize = i_size_read(bdev->bd_inode);
+	if (start >= isize)
+		return -EINVAL;
+	if (end > isize) {
+		if (mode & FALLOC_FL_KEEP_SIZE) {
+			len = isize - start;
+			end = start + len - 1;
+		} else
+			return -EINVAL;
+	}
+
+	/*
+	 * Don't allow IO that isn't aligned to logical block size.
+	 */
+	if ((start | len) & (bdev_logical_block_size(bdev) - 1))
+		return -EINVAL;
+
+	/* Invalidate the page cache, including dirty pages. */
+	mapping = bdev->bd_inode->i_mapping;
+	truncate_inode_pages_range(mapping, start, end);
+
+	switch (mode) {
+	case FALLOC_FL_ZERO_RANGE:
+	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
+		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
+					    GFP_KERNEL, false);
+		if (error)
+			return error;
+		break;
+	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
+		/* Only punch if the device can do zeroing discard. */
+		if (!blk_queue_discard(q) || !q->limits.discard_zeroes_data)
+			return -EOPNOTSUPP;
+		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
+					     GFP_KERNEL, 0);
+		if (error)
+			return error;
+		break;
+	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
+		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
+					     GFP_KERNEL, 0);
+		if (error)
+			return error;
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	/*
+	 * Invalidate again; if someone wandered in and dirtied a page,
+	 * the caller will be given -EBUSY;
+	 */
+	return invalidate_inode_pages2_range(mapping,
+					     start >> PAGE_SHIFT,
+					     end >> PAGE_SHIFT);
+}
+
 const struct file_operations def_blk_fops = {
 	.open		= blkdev_open,
 	.release	= blkdev_close,
@@ -1801,6 +1878,7 @@ const struct file_operations def_blk_fops = {
 #endif
 	.splice_read	= generic_file_splice_read,
 	.splice_write	= iter_file_splice_write,
+	.fallocate	= blkdev_fallocate,
 };
 
 int ioctl_by_bdev(struct block_device *bdev, unsigned cmd, unsigned long arg)
diff --git a/fs/open.c b/fs/open.c
index 4fd6e25..01b6092 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -289,7 +289,8 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 	 * Let individual file system decide if it supports preallocation
 	 * for directories or not.
 	 */
-	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
+	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode) &&
+	    !S_ISBLK(inode->i_mode))
 		return -ENODEV;
 
 	/* Check for wrap through zero too */

* Re: [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT.
From: Bart Van Assche @ 2016-09-29  1:16 UTC
  To: Darrick J. Wong, axboe, akpm
  Cc: linux-block, tytso, martin.petersen, snitzer, linux-api, bfoster,
	xfs, hch, dm-devel, linux-fsdevel, Christoph Hellwig

On 09/28/16 17:39, Darrick J. Wong wrote:
> Invalidate the page cache (as a regular O_DIRECT write would do) to avoid
> returning stale cache contents at a later time.

Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>

* Re: [PATCH 3/3] block: implement (some of) fallocate for block devices
From: Bart Van Assche @ 2016-09-29  1:42 UTC
  To: Darrick J. Wong, axboe, akpm
  Cc: linux-block, tytso, martin.petersen, snitzer, linux-api, bfoster,
	xfs, hch, dm-devel, linux-fsdevel

On 09/28/16 17:39, Darrick J. Wong wrote:
> +	if (end > isize) {
> +		if (mode & FALLOC_FL_KEEP_SIZE) {
> +			len = isize - start;
> +			end = start + len - 1;
> +		} else
> +			return -EINVAL;
> +	}

If FALLOC_FL_KEEP_SIZE has been set and end == isize, the above code
won't reduce end to isize - 1. Shouldn't "end > isize" be changed into
"end >= isize"?

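To make the boundary case concrete, here is a small userspace sketch that
compares the clamp as posted with the ">=" variant.  It assumes
FALLOC_FL_KEEP_SIZE is set and start < isize; the struct and function
names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Result of clamping: the adjusted length and the inclusive end offset. */
struct clamp_result {
	int64_t len;
	int64_t end;
};

/* Clamp as posted: only triggers when end > isize. */
struct clamp_result clamp_posted(int64_t start, int64_t len, int64_t isize)
{
	struct clamp_result r = { len, start + len - 1 };

	if (r.end > isize) {
		r.len = isize - start;
		r.end = start + r.len - 1;
	}
	return r;
}

/* Clamp with the comparison changed to end >= isize. */
struct clamp_result clamp_ge(int64_t start, int64_t len, int64_t isize)
{
	struct clamp_result r = { len, start + len - 1 };

	if (r.end >= isize) {
		r.len = isize - start;
		r.end = start + r.len - 1;
	}
	return r;
}
```

With isize = 8192, start = 4096 and len = 4097, the inclusive end is
exactly 8192.  The posted clamp leaves it there, one byte past the last
valid offset of 8191, while the ">=" variant reduces len to 4096 and end
to 8191.
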
> +	switch (mode) {
> +	case FALLOC_FL_ZERO_RANGE:
> +	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
> +		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
> +					    GFP_KERNEL, false);
> +		if (error)
> +			return error;
> +		break;
> +	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
> +		/* Only punch if the device can do zeroing discard. */
> +		if (!blk_queue_discard(q) || !q->limits.discard_zeroes_data)
> +			return -EOPNOTSUPP;
> +		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
> +					     GFP_KERNEL, 0);
> +		if (error)
> +			return error;
> +		break;
> +	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
> +		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
> +					     GFP_KERNEL, 0);
> +		if (error)
> +			return error;
> +		break;
> +	default:
> +		return -EOPNOTSUPP;
> +	}

Have you considered moving "if (error) return error" out of the switch
statement?
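
One possible shape for that, sketched in userspace with stand-in
callbacks rather than the real blkdev_issue_zeroout() and
blkdev_issue_discard() calls.  The enum, the issue_fn type, the stub
functions and the can_zeroing_discard flag are all assumptions made up
for this example.

```c
#include <errno.h>

/* Hypothetical operation selector standing in for the mode flags. */
enum op { OP_ZERO, OP_PUNCH, OP_PUNCH_STALE, OP_OTHER };

/* Stand-in for an "issue the request" call; returns 0 or -errno. */
typedef int (*issue_fn)(void);

/* Stubs so the sketch can be exercised without a block device. */
int issue_ok(void)  { return 0; }
int issue_eio(void) { return -EIO; }

int dispatch(enum op op, int can_zeroing_discard,
	     issue_fn zeroout, issue_fn discard)
{
	int error;

	switch (op) {
	case OP_ZERO:
		error = zeroout();
		break;
	case OP_PUNCH:
		/* Only punch if the device can do zeroing discard. */
		if (!can_zeroing_discard)
			return -EOPNOTSUPP;
		error = discard();
		break;
	case OP_PUNCH_STALE:
		error = discard();
		break;
	default:
		return -EOPNOTSUPP;
	}

	/* Single error check shared by every case that issued a request. */
	if (error)
		return error;

	/* The second page cache invalidation would follow here. */
	return 0;
}
```

The early -EOPNOTSUPP returns stay inside the switch because they happen
before any request is issued.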

> +	/*
> +	 * Invalidate again; if someone wandered in and dirtied a page,
> +	 * the caller will be given -EBUSY;
> +	 */
> +	return invalidate_inode_pages2_range(mapping,
> +					     start >> PAGE_SHIFT,
> +					     end >> PAGE_SHIFT);

A comment might be appropriate here explaining that, since end is
inclusive and the third argument of invalidate_inode_pages2_range() is
also inclusive, rounding down yields the correct result.
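
A quick userspace check of that arithmetic, assuming 4 KiB pages.
SKETCH_PAGE_SHIFT, the struct and the function name are placeholders for
this example; the kernel code uses PAGE_SHIFT.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumes 4 KiB pages */

/* First and last page index covered by a byte range, both inclusive. */
struct page_span {
	uint64_t first;
	uint64_t last;
};

/* end_inclusive is the offset of the last byte in the range. */
struct page_span byte_range_to_pages(uint64_t start, uint64_t end_inclusive)
{
	struct page_span s = {
		start >> SKETCH_PAGE_SHIFT,
		end_inclusive >> SKETCH_PAGE_SHIFT,
	};
	return s;
}
```

A range covering bytes 0 through 4095 maps to pages 0..0.  Had an
exclusive end of 4096 been shifted instead, the span would have included
page 1, which the range does not touch.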

Bart.

* Re: [PATCH 3/3] block: implement (some of) fallocate for block devices
  2016-09-29  1:42     ` Bart Van Assche
@ 2016-09-29  2:09       ` Darrick J. Wong
  -1 siblings, 0 replies; 66+ messages in thread
From: Darrick J. Wong @ 2016-09-29  2:09 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: axboe, akpm, linux-block, tytso, martin.petersen, snitzer,
	linux-api, bfoster, xfs, hch, dm-devel, linux-fsdevel

On Wed, Sep 28, 2016 at 06:42:14PM -0700, Bart Van Assche wrote:
> On 09/28/16 17:39, Darrick J. Wong wrote:
> >+	if (end > isize) {
> >+		if (mode & FALLOC_FL_KEEP_SIZE) {
> >+			len = isize - start;
> >+			end = start + len - 1;
> >+		} else
> >+			return -EINVAL;
> >+	}
> 
> If FALLOC_FL_KEEP_SIZE has been set and end == isize the above code won't
> reduce end to isize - 1. Shouldn't "end > isize" be changed into "end >=
> isize" ?

Oops.  Will fix and send out a v2.
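
For readers following along, the off-by-one can be modelled in userspace
roughly as below. This is only a sketch: sketch_clamp_range() and its
parameters are hypothetical names, and the logic merely mirrors the shape
of the check under discussion, with "end" treated as an inclusive offset.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of the end-of-device check.
 * Returns 0 (possibly shrinking *len) or -EINVAL.
 */
static inline int sketch_clamp_range(int64_t start, int64_t *len,
				     int64_t isize, bool keep_size)
{
	int64_t end = start + *len - 1;	/* inclusive last byte */

	if (start >= isize)
		return -EINVAL;
	/* ">=": an inclusive end equal to isize is already one byte too far. */
	if (end >= isize) {
		if (!keep_size)
			return -EINVAL;
		*len = isize - start;
	}
	return 0;
}
```

For a 4096-byte device, start = 0 and len = 4097 give end == 4096 == isize.
With "end > isize" that request would slip through unclamped; with
"end >= isize" it is clamped to 4096 bytes (or rejected without KEEP_SIZE).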

> >+	switch (mode) {
> >+	case FALLOC_FL_ZERO_RANGE:
> >+	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
> >+		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
> >+					    GFP_KERNEL, false);
> >+		if (error)
> >+			return error;
> >+		break;
> >+	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
> >+		/* Only punch if the device can do zeroing discard. */
> >+		if (!blk_queue_discard(q) || !q->limits.discard_zeroes_data)
> >+			return -EOPNOTSUPP;
> >+		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
> >+					     GFP_KERNEL, 0);
> >+		if (error)
> >+			return error;
> >+		break;
> >+	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
> >+		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
> >+					     GFP_KERNEL, 0);
> >+		if (error)
> >+			return error;
> >+		break;
> >+	default:
> >+		return -EOPNOTSUPP;
> >+	}
> 
> Have you considered moving "if (error) return error" out of the switch
> statement?

Sure, I could do that.

> >+	/*
> >+	 * Invalidate again; if someone wandered in and dirtied a page,
> >+	 * the caller will be given -EBUSY;
> >+	 */
> >+	return invalidate_inode_pages2_range(mapping,
> >+					     start >> PAGE_SHIFT,
> >+					     end >> PAGE_SHIFT);
> 
> A comment might be appropriate here: since end is inclusive and the
> third argument of invalidate_inode_pages2_range() is also inclusive,
> rounding down yields the correct result.

/me thought the documentation of invalidate_inode_pages2_range was clear
enough on that point, but I could throw that into the comment too.

--D
> 
> Bart.
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH v2 3/3] block: implement (some of) fallocate for block devices
  2016-09-29  0:39   ` Darrick J. Wong
@ 2016-09-29  2:19     ` Darrick J. Wong
  -1 siblings, 0 replies; 66+ messages in thread
From: Darrick J. Wong @ 2016-09-29  2:19 UTC (permalink / raw)
  To: axboe, akpm
  Cc: hch, tytso, martin.petersen, snitzer, linux-api, bfoster, xfs,
	linux-block, dm-devel, linux-fsdevel, bart.vanassche

After much discussion, it seems that the fallocate feature flag
FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been
whitelisted for zeroing SCSI UNMAP.  Punch still requires that
FALLOC_FL_KEEP_SIZE is set.  A length that goes past the end of the
device will be clamped to the device size if KEEP_SIZE is set; or will
return -EINVAL if not.  Both start and length must be aligned to the
device's logical block size.

Since the semantics of fallocate are fairly well established already,
wire up the two pieces.  The other fallocate variants (collapse range,
insert range, and allocate blocks) are not supported.
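
The alignment rule can be illustrated with a small userspace sketch. The
helper name sketch_is_aligned() is hypothetical, and the sketch assumes
the logical block size is a power of two, as the mask trick requires.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if both start and len are multiples of the
 * logical block size lbs. Assumes lbs is a power of two.
 */
static inline bool sketch_is_aligned(uint64_t start, uint64_t len,
				     uint32_t lbs)
{
	/* OR-ing first lets a single mask test cover both values. */
	return ((start | len) & ((uint64_t)lbs - 1)) == 0;
}
```

For example, start = 512 with len = 4096 passes a 512-byte check but fails
against a 4096-byte logical block size, which is the kind of request a
4k-LBA device cannot service.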

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
v2: Incorporate feedback from Christoph & Linus.  Tentatively add
a requirement that the fallocate arguments be aligned to logical block
size, and put in a few XXX comments ahead of LSF discussion.
v3: Forward port to 4.7.
v4: Forward port to 4.8.
---
 fs/block_dev.c |   75 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/open.c      |    3 +-
 2 files changed, 77 insertions(+), 1 deletion(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 08ae993..7b6d096 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -30,6 +30,7 @@
 #include <linux/cleancache.h>
 #include <linux/dax.h>
 #include <linux/badblocks.h>
+#include <linux/falloc.h>
 #include <asm/uaccess.h>
 #include "internal.h"
 
@@ -1787,6 +1788,79 @@ static const struct address_space_operations def_blk_aops = {
 	.is_dirty_writeback = buffer_check_dirty_writeback,
 };
 
+#define	BLKDEV_FALLOC_FL_SUPPORTED					\
+		(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |		\
+		 FALLOC_FL_ZERO_RANGE | FALLOC_FL_NO_HIDE_STALE)
+
+static long blkdev_fallocate(struct file *file, int mode, loff_t start,
+			     loff_t len)
+{
+	struct block_device *bdev = I_BDEV(bdev_file_inode(file));
+	struct request_queue *q = bdev_get_queue(bdev);
+	struct address_space *mapping;
+	loff_t end = start + len - 1;
+	loff_t isize;
+	int error;
+
+	/* Fail if we don't recognize the flags. */
+	if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
+		return -EOPNOTSUPP;
+
+	/* Don't go off the end of the device. */
+	isize = i_size_read(bdev->bd_inode);
+	if (start >= isize)
+		return -EINVAL;
+	if (end >= isize) {
+		if (mode & FALLOC_FL_KEEP_SIZE) {
+			len = isize - start;
+			end = start + len - 1;
+		} else
+			return -EINVAL;
+	}
+
+	/*
+	 * Don't allow IO that isn't aligned to logical block size.
+	 */
+	if ((start | len) & (bdev_logical_block_size(bdev) - 1))
+		return -EINVAL;
+
+	/* Invalidate the page cache, including dirty pages. */
+	mapping = bdev->bd_inode->i_mapping;
+	truncate_inode_pages_range(mapping, start, end);
+
+	switch (mode) {
+	case FALLOC_FL_ZERO_RANGE:
+	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
+		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
+					    GFP_KERNEL, false);
+		break;
+	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
+		/* Only punch if the device can do zeroing discard. */
+		if (!blk_queue_discard(q) || !q->limits.discard_zeroes_data)
+			return -EOPNOTSUPP;
+		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
+					     GFP_KERNEL, 0);
+		break;
+	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
+		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
+					     GFP_KERNEL, 0);
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+	if (error)
+		return error;
+
+	/*
+	 * Invalidate again; if someone wandered in and dirtied a page,
+	 * the caller will be given -EBUSY.  The third argument is
+	 * inclusive, so the rounding here is safe.
+	 */
+	return invalidate_inode_pages2_range(mapping,
+					     start >> PAGE_SHIFT,
+					     end >> PAGE_SHIFT);
+}
+
 const struct file_operations def_blk_fops = {
 	.open		= blkdev_open,
 	.release	= blkdev_close,
@@ -1801,6 +1875,7 @@ const struct file_operations def_blk_fops = {
 #endif
 	.splice_read	= generic_file_splice_read,
 	.splice_write	= iter_file_splice_write,
+	.fallocate	= blkdev_fallocate,
 };
 
 int ioctl_by_bdev(struct block_device *bdev, unsigned cmd, unsigned long arg)
diff --git a/fs/open.c b/fs/open.c
index 4fd6e25..01b6092 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -289,7 +289,8 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 	 * Let individual file system decide if it supports preallocation
 	 * for directories or not.
 	 */
-	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
+	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode) &&
+	    !S_ISBLK(inode->i_mode))
 		return -ENODEV;
 
 	/* Check for wrap through zero too */

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT.
  2016-09-29  0:39   ` Darrick J. Wong
@ 2016-09-29  5:56     ` Hannes Reinecke
  -1 siblings, 0 replies; 66+ messages in thread
From: Hannes Reinecke @ 2016-09-29  5:56 UTC (permalink / raw)
  To: Darrick J. Wong, axboe, akpm
  Cc: linux-block, tytso, martin.petersen, snitzer, linux-api, bfoster,
	xfs, hch, dm-devel, linux-fsdevel, bart.vanassche,
	Christoph Hellwig

On 09/29/2016 02:39 AM, Darrick J. Wong wrote:
> Invalidate the page cache (as a regular O_DIRECT write would do) to avoid
> returning stale cache contents at a later time.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
> ---
>  block/ioctl.c |   18 ++++++++++++------
>  1 file changed, 12 insertions(+), 6 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/3] block: require write_same and discard requests align to logical block size
  2016-09-29  0:39   ` Darrick J. Wong
  (?)
@ 2016-09-29  5:56     ` Hannes Reinecke
  -1 siblings, 0 replies; 66+ messages in thread
From: Hannes Reinecke @ 2016-09-29  5:56 UTC (permalink / raw)
  To: Darrick J. Wong, axboe, akpm
  Cc: linux-block, tytso, martin.petersen, snitzer, linux-api, bfoster,
	xfs, hch, dm-devel, linux-fsdevel, bart.vanassche,
	Christoph Hellwig

On 09/29/2016 02:39 AM, Darrick J. Wong wrote:
> Make sure that the offset and length arguments that we're using to
> construct WRITE SAME and DISCARD requests are actually aligned to the
> logical block size.  Failure to do this causes other errors in other
> parts of the block layer or the SCSI layer because disks don't support
> partial logical block writes.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
> ---
>  block/blk-lib.c |   15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 3/3] block: implement (some of) fallocate for block devices
  2016-09-29  0:39   ` Darrick J. Wong
@ 2016-09-29  5:57     ` Hannes Reinecke
  -1 siblings, 0 replies; 66+ messages in thread
From: Hannes Reinecke @ 2016-09-29  5:57 UTC (permalink / raw)
  To: Darrick J. Wong, axboe, akpm
  Cc: linux-block, tytso, martin.petersen, snitzer, linux-api, bfoster,
	xfs, hch, dm-devel, linux-fsdevel, bart.vanassche

On 09/29/2016 02:39 AM, Darrick J. Wong wrote:
> After much discussion, it seems that the fallocate feature flag
> FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
> FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been
> whitelisted for zeroing SCSI UNMAP.  Punch still requires that
> FALLOC_FL_KEEP_SIZE is set.  A length that goes past the end of the
> device will be clamped to the device size if KEEP_SIZE is set; or will
> return -EINVAL if not.  Both start and length must be aligned to the
> device's logical block size.
> 
> Since the semantics of fallocate are fairly well established already,
> wire up the two pieces.  The other fallocate variants (collapse range,
> insert range, and allocate blocks) are not supported.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> v2: Incorporate feedback from Christoph & Linus.  Tentatively add
> a requirement that the fallocate arguments be aligned to logical block
> size, and put in a few XXX comments ahead of LSF discussion.
> v3: Forward port to 4.7.
> v4: Forward port to 4.8.
> ---
>  fs/block_dev.c |   78 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/open.c      |    3 +-
>  2 files changed, 80 insertions(+), 1 deletion(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 3/3] block: implement (some of) fallocate for block devices
  2016-09-29  2:19     ` Darrick J. Wong
  (?)
@ 2016-09-29 20:08       ` Bart Van Assche
  -1 siblings, 0 replies; 66+ messages in thread
From: Bart Van Assche @ 2016-09-29 20:08 UTC (permalink / raw)
  To: Darrick J. Wong, axboe, akpm
  Cc: hch, tytso, martin.petersen, snitzer, linux-api, bfoster, xfs,
	linux-block, dm-devel, linux-fsdevel

On 09/28/2016 07:19 PM, Darrick J. Wong wrote:
> After much discussion, it seems that the fallocate feature flag
> FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
> FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been
> whitelisted for zeroing SCSI UNMAP.  Punch still requires that
> FALLOC_FL_KEEP_SIZE is set.  A length that goes past the end of the
> device will be clamped to the device size if KEEP_SIZE is set; or will
> return -EINVAL if not.  Both start and length must be aligned to the
> device's logical block size.
>
> Since the semantics of fallocate are fairly well established already,
> wire up the two pieces.  The other fallocate variants (collapse range,
> insert range, and allocate blocks) are not supported.

For the FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE |
FALLOC_FL_NO_HIDE_STALE case, it's probably safer not to send a discard
to block devices that do not support discard, to avoid hitting block
driver bugs. But that's something we can still discuss later. Hence:

Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
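
One way to picture the suggestion above is the userspace model below. It
is only a sketch under stated assumptions: every sketch_ name is
hypothetical, the flag values are copied to mirror linux/falloc.h, and the
extra capability check in the NO_HIDE_STALE case is the idea raised in
this review, not something the v2 patch does.

```c
#include <errno.h>
#include <stdbool.h>

/* Local copies of the flag values, mirroring linux/falloc.h. */
#define SKETCH_KEEP_SIZE	0x01
#define SKETCH_PUNCH_HOLE	0x02
#define SKETCH_NO_HIDE_STALE	0x04
#define SKETCH_ZERO_RANGE	0x10

enum sketch_op { SKETCH_OP_ZEROOUT, SKETCH_OP_DISCARD };

struct sketch_caps {
	bool discard;			/* models blk_queue_discard(q) */
	bool discard_zeroes_data;	/* models q->limits.discard_zeroes_data */
};

/* Returns the operation to issue, or -EOPNOTSUPP. */
static inline int sketch_pick_op(int mode, const struct sketch_caps *caps)
{
	switch (mode) {
	case SKETCH_ZERO_RANGE:
	case SKETCH_ZERO_RANGE | SKETCH_KEEP_SIZE:
		return SKETCH_OP_ZEROOUT;
	case SKETCH_PUNCH_HOLE | SKETCH_KEEP_SIZE:
		/* Only punch if the device can do zeroing discard. */
		if (!caps->discard || !caps->discard_zeroes_data)
			return -EOPNOTSUPP;
		return SKETCH_OP_DISCARD;
	case SKETCH_PUNCH_HOLE | SKETCH_KEEP_SIZE | SKETCH_NO_HIDE_STALE:
		/* Suggested extra check: skip devices without discard. */
		if (!caps->discard)
			return -EOPNOTSUPP;
		return SKETCH_OP_DISCARD;
	default:
		return -EOPNOTSUPP;
	}
}
```

In this model a device without discard support gets -EOPNOTSUPP for the
NO_HIDE_STALE combination instead of being handed a discard request.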

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 3/3] block: implement (some of) fallocate for block devices
@ 2016-09-29 20:08       ` Bart Van Assche
  0 siblings, 0 replies; 66+ messages in thread
From: Bart Van Assche @ 2016-09-29 20:08 UTC (permalink / raw)
  To: Darrick J. Wong, axboe-tSWWG44O7X1aa/9Udqfwiw,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
  Cc: hch-wEGCiKHe2LqWVfeAwA7xHQ, tytso-3s7WtUTddSA,
	martin.petersen-QHcLZuEGTsvQT0dZR+AlfA,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, linux-api-u79uwXL29TY76Z2rM5mHXA,
	bfoster-H+wXaHxf7aLQT0dZR+AlfA, xfs-VZNHf3L845pBDgjK7y7TUQ,
	linux-block-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

On 09/28/2016 07:19 PM, Darrick J. Wong wrote:
> After much discussion, it seems that the fallocate feature flag
> FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
> FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been
> whitelisted for zeroing SCSI UNMAP.  Punch still requires that
> FALLOC_FL_KEEP_SIZE is set.  A length that goes past the end of the
> device will be clamped to the device size if KEEP_SIZE is set; or will
> return -EINVAL if not.  Both start and length must be aligned to the
> device's logical block size.
>
> Since the semantics of fallocate are fairly well established already,
> wire up the two pieces.  The other fallocate variants (collapse range,
> insert range, and allocate blocks) are not supported.

For the FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | 
FALLOC_FL_NO_HIDE_STALE case, it's probably safer not to try to send a 
discard to block devices that do not support discard in order not to hit 
block driver bugs. But that's something we can still discuss later. Hence:

Reviewed-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 3/3] block: implement (some of) fallocate for block devices
  2016-09-29 20:08       ` Bart Van Assche
@ 2016-09-29 20:35         ` Darrick J. Wong
  -1 siblings, 0 replies; 66+ messages in thread
From: Darrick J. Wong @ 2016-09-29 20:35 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: axboe, akpm, hch, tytso, martin.petersen, snitzer, linux-api,
	bfoster, xfs, linux-block, dm-devel, linux-fsdevel

On Thu, Sep 29, 2016 at 01:08:57PM -0700, Bart Van Assche wrote:
> On 09/28/2016 07:19 PM, Darrick J. Wong wrote:
> >After much discussion, it seems that the fallocate feature flag
> >FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
> >FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been
> >whitelisted for zeroing SCSI UNMAP.  Punch still requires that
> >FALLOC_FL_KEEP_SIZE is set.  A length that goes past the end of the
> >device will be clamped to the device size if KEEP_SIZE is set; or will
> >return -EINVAL if not.  Both start and length must be aligned to the
> >device's logical block size.
> >
> >Since the semantics of fallocate are fairly well established already,
> >wire up the two pieces.  The other fallocate variants (collapse range,
> >insert range, and allocate blocks) are not supported.
> 
> For the FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE
> case, it's probably safer not to try to send a discard to block devices that
> do not support discard in order not to hit block driver bugs. But that's
> something we can still discuss later. Hence:

I'll just change it to check the queue flags and post a new revision.
At this point I might as well repost the whole thing to reflect the
reviewed-bys.

--D

> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH 3/3] block: implement (some of) fallocate for block devices
  2016-09-29 21:16 [PATCH v11 0/3] " Darrick J. Wong
@ 2016-09-29 21:16   ` Darrick J. Wong
  0 siblings, 0 replies; 66+ messages in thread
From: Darrick J. Wong @ 2016-09-29 21:16 UTC (permalink / raw)
  To: axboe, akpm, darrick.wong
  Cc: linux-block, Hannes Reinecke, tytso, martin.petersen, snitzer,
	linux-api, bfoster, xfs, hch, dm-devel, hare, linux-fsdevel,
	bart.vanassche

After much discussion, it seems that the fallocate feature flag
FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been
whitelisted for zeroing SCSI UNMAP.  Punch still requires that
FALLOC_FL_KEEP_SIZE is set.  A length that goes past the end of the
device will be clamped to the device size if KEEP_SIZE is set; or will
return -EINVAL if not.  Both start and length must be aligned to the
device's logical block size.
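
To make the clamping and alignment rules above concrete, here is a
rough userspace model of them.  It is only a sketch and not part of
the patch: the helper name blk_falloc_check_range() and its signature
are hypothetical, it returns -1 where the kernel code would return
-EINVAL, and it assumes start and len are already known to be
non-negative with len > 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the range checks described above.  Returns 0
 * (possibly after shrinking *len) when the range is acceptable, or -1
 * where the kernel handler would return -EINVAL.
 */
static int blk_falloc_check_range(int64_t start, int64_t *len,
				  int64_t dev_size, uint32_t lbs,
				  bool keep_size)
{
	int64_t end;

	/* Logical block sizes are powers of two, so a mask test works. */
	assert(lbs != 0 && (lbs & (lbs - 1)) == 0);

	end = start + *len - 1;		/* inclusive last byte */

	/* A range that starts at or past the end is always rejected. */
	if (start >= dev_size)
		return -1;

	/* A range that runs past the end is clamped only with KEEP_SIZE. */
	if (end >= dev_size) {
		if (!keep_size)
			return -1;
		*len = dev_size - start;
	}

	/* Both start and the (possibly clamped) length must be aligned. */
	if ((start | *len) & (int64_t)(lbs - 1))
		return -1;

	return 0;
}
```

For example, on an 8192-byte device with 512-byte logical blocks, a
request for start=4096, len=8192 is clamped to len=4096 when KEEP_SIZE
is set and rejected otherwise.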

Since the semantics of fallocate are fairly well established already,
wire up the two pieces.  The other fallocate variants (collapse range,
insert range, and allocate blocks) are not supported.
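
The set of accepted mode combinations can be summarized with a small
classifier.  This is an illustrative sketch rather than code from the
patch: the enum and the function blk_falloc_classify() are
hypothetical names, and only the FALLOC_FL_* constants come from
<linux/falloc.h>.  It shows which exact flag combinations select which
operation; everything else is treated as unsupported.

```c
#include <assert.h>
#include <linux/falloc.h>	/* FALLOC_FL_* flag values */

/* The four flags involved are expected to be distinct bits. */
static_assert((FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
	       FALLOC_FL_ZERO_RANGE | FALLOC_FL_NO_HIDE_STALE) == 0x17,
	      "unexpected uapi flag values");

/* Hypothetical labels for the operations the handler can issue. */
enum blk_falloc_action {
	BLK_FALLOC_UNSUPPORTED,		/* would return -EOPNOTSUPP */
	BLK_FALLOC_ZEROOUT,		/* zero the range */
	BLK_FALLOC_ZEROING_DISCARD,	/* discard that must read back zeroes */
	BLK_FALLOC_DISCARD,		/* discard, stale data may remain */
};

/* Map an fallocate mode to the operation it selects. */
static enum blk_falloc_action blk_falloc_classify(int mode)
{
	switch (mode) {
	case FALLOC_FL_ZERO_RANGE:
	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
		return BLK_FALLOC_ZEROOUT;
	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
		return BLK_FALLOC_ZEROING_DISCARD;
	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE |
	     FALLOC_FL_NO_HIDE_STALE:
		return BLK_FALLOC_DISCARD;
	default:
		/* Plain allocation, collapse range, insert range, etc. */
		return BLK_FALLOC_UNSUPPORTED;
	}
}
```

Note that the match is on the exact mode value, so PUNCH_HOLE without
KEEP_SIZE and a mode of 0 (plain allocation) both fall through to the
unsupported case.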

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
---
 fs/block_dev.c |   77 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/open.c      |    3 +-
 2 files changed, 79 insertions(+), 1 deletion(-)


diff --git a/fs/block_dev.c b/fs/block_dev.c
index 08ae993..777fd9b 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -30,6 +30,7 @@
 #include <linux/cleancache.h>
 #include <linux/dax.h>
 #include <linux/badblocks.h>
+#include <linux/falloc.h>
 #include <asm/uaccess.h>
 #include "internal.h"
 
@@ -1787,6 +1788,81 @@ static const struct address_space_operations def_blk_aops = {
 	.is_dirty_writeback = buffer_check_dirty_writeback,
 };
 
+#define	BLKDEV_FALLOC_FL_SUPPORTED					\
+		(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |		\
+		 FALLOC_FL_ZERO_RANGE | FALLOC_FL_NO_HIDE_STALE)
+
+static long blkdev_fallocate(struct file *file, int mode, loff_t start,
+			     loff_t len)
+{
+	struct block_device *bdev = I_BDEV(bdev_file_inode(file));
+	struct request_queue *q = bdev_get_queue(bdev);
+	struct address_space *mapping;
+	loff_t end = start + len - 1;
+	loff_t isize;
+	int error;
+
+	/* Fail if we don't recognize the flags. */
+	if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
+		return -EOPNOTSUPP;
+
+	/* Don't go off the end of the device. */
+	isize = i_size_read(bdev->bd_inode);
+	if (start >= isize)
+		return -EINVAL;
+	if (end >= isize) {
+		if (mode & FALLOC_FL_KEEP_SIZE) {
+			len = isize - start;
+			end = start + len - 1;
+		} else
+			return -EINVAL;
+	}
+
+	/*
+	 * Don't allow IO that isn't aligned to logical block size.
+	 */
+	if ((start | len) & (bdev_logical_block_size(bdev) - 1))
+		return -EINVAL;
+
+	/* Invalidate the page cache, including dirty pages. */
+	mapping = bdev->bd_inode->i_mapping;
+	truncate_inode_pages_range(mapping, start, end);
+
+	switch (mode) {
+	case FALLOC_FL_ZERO_RANGE:
+	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
+		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
+					    GFP_KERNEL, false);
+		break;
+	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
+		/* Only punch if the device can do zeroing discard. */
+		if (!blk_queue_discard(q) || !q->limits.discard_zeroes_data)
+			return -EOPNOTSUPP;
+		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
+					     GFP_KERNEL, 0);
+		break;
+	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
+		if (!blk_queue_discard(q))
+			return -EOPNOTSUPP;
+		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
+					     GFP_KERNEL, 0);
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+	if (error)
+		return error;
+
+	/*
+	 * Invalidate again; if someone wandered in and dirtied a page,
+	 * the caller will be given -EBUSY.  The third argument is
+	 * inclusive, so the rounding here is safe.
+	 */
+	return invalidate_inode_pages2_range(mapping,
+					     start >> PAGE_SHIFT,
+					     end >> PAGE_SHIFT);
+}
+
 const struct file_operations def_blk_fops = {
 	.open		= blkdev_open,
 	.release	= blkdev_close,
@@ -1801,6 +1877,7 @@ const struct file_operations def_blk_fops = {
 #endif
 	.splice_read	= generic_file_splice_read,
 	.splice_write	= iter_file_splice_write,
+	.fallocate	= blkdev_fallocate,
 };
 
 int ioctl_by_bdev(struct block_device *bdev, unsigned cmd, unsigned long arg)
diff --git a/fs/open.c b/fs/open.c
index 4fd6e25..01b6092 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -289,7 +289,8 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 	 * Let individual file system decide if it supports preallocation
 	 * for directories or not.
 	 */
-	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
+	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode) &&
+	    !S_ISBLK(inode->i_mode))
 		return -ENODEV;
 
 	/* Check for wrap through zero too */

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 3/3] block: implement (some of) fallocate for block devices
  2016-08-26  0:02 [PATCH v10 0/3] " Darrick J. Wong
@ 2016-08-26  0:02   ` Darrick J. Wong
  0 siblings, 0 replies; 66+ messages in thread
From: Darrick J. Wong @ 2016-08-26  0:02 UTC (permalink / raw)
  To: axboe, darrick.wong
  Cc: linux-block, tytso, martin.petersen, snitzer, linux-api, bfoster,
	xfs, hch, dm-devel, linux-fsdevel, bart.vanassche

After much discussion, it seems that the fallocate feature flag
FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been
whitelisted for zeroing SCSI UNMAP.  Punch still requires that
FALLOC_FL_KEEP_SIZE is set.  A length that goes past the end of the
device will be clamped to the device size if KEEP_SIZE is set; or will
return -EINVAL if not.  Both start and length must be aligned to the
device's logical block size.

Since the semantics of fallocate are fairly well established already,
wire up the two pieces.  The other fallocate variants (collapse range,
insert range, and allocate blocks) are not supported.

v2: Incorporate feedback from Christoph & Linus.  Tentatively add
a requirement that the fallocate arguments be aligned to logical block
size, and put in a few XXX comments ahead of LSF discussion.

v3: Forward port to 4.7.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/block_dev.c |   84 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/open.c      |    3 +-
 2 files changed, 86 insertions(+), 1 deletion(-)


diff --git a/fs/block_dev.c b/fs/block_dev.c
index c3cdde8..4df3fc8 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -30,6 +30,7 @@
 #include <linux/cleancache.h>
 #include <linux/dax.h>
 #include <linux/badblocks.h>
+#include <linux/falloc.h>
 #include <asm/uaccess.h>
 #include "internal.h"
 
@@ -1786,6 +1787,88 @@ static const struct address_space_operations def_blk_aops = {
 	.is_dirty_writeback = buffer_check_dirty_writeback,
 };
 
+#define	BLKDEV_FALLOC_FL_SUPPORTED					\
+		(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |		\
+		 FALLOC_FL_ZERO_RANGE | FALLOC_FL_NO_HIDE_STALE)
+
+static long blkdev_fallocate(struct file *file, int mode, loff_t start,
+			     loff_t len)
+{
+	struct block_device *bdev = I_BDEV(bdev_file_inode(file));
+	struct request_queue *q = bdev_get_queue(bdev);
+	struct address_space *mapping;
+	loff_t end = start + len - 1;
+	loff_t isize;
+	int error;
+
+	/* Fail if we don't recognize the flags. */
+	if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
+		return -EOPNOTSUPP;
+
+	/* Don't go off the end of the device. */
+	isize = i_size_read(bdev->bd_inode);
+	if (start >= isize)
+		return -EINVAL;
+	if (end > isize) {
+		if (mode & FALLOC_FL_KEEP_SIZE) {
+			len = isize - start;
+			end = start + len - 1;
+		} else
+			return -EINVAL;
+	}
+
+	/*
+	 * Don't allow IO that isn't aligned to logical block size.
+	 */
+	if ((start | len) & (bdev_logical_block_size(bdev) - 1))
+		return -EINVAL;
+
+	/* Invalidate the page cache, including dirty pages. */
+	mapping = bdev->bd_inode->i_mapping;
+	truncate_inode_pages_range(mapping, start, end);
+
+	switch (mode) {
+	case FALLOC_FL_ZERO_RANGE:
+	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
+		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
+					    GFP_KERNEL, false);
+		if (error)
+			return error;
+		break;
+	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
+		/* Only punch if the device can do zeroing discard. */
+		if (!blk_queue_discard(q) || !q->limits.discard_zeroes_data)
+			return -EOPNOTSUPP;
+		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
+					     GFP_KERNEL, 0);
+		if (error)
+			return error;
+		break;
+	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
+		/*
+		 * XXX: a well known search engine vendor interprets this
+		 * flag (in other circumstances) to mean "I don't care if
+		 * we can read stale contents later".  Is it appropriate
+		 * to wire this up to the non-zeroing discard?
+		 */
+		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
+					     GFP_KERNEL, 0);
+		if (error)
+			return error;
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	/*
+	 * Invalidate again; if someone wandered in and dirtied a page,
+	 * the caller will be given -EBUSY;
+	 */
+	return invalidate_inode_pages2_range(mapping,
+					     start >> PAGE_SHIFT,
+					     end >> PAGE_SHIFT);
+}
+
 const struct file_operations def_blk_fops = {
 	.open		= blkdev_open,
 	.release	= blkdev_close,
@@ -1800,6 +1883,7 @@ const struct file_operations def_blk_fops = {
 #endif
 	.splice_read	= generic_file_splice_read,
 	.splice_write	= iter_file_splice_write,
+	.fallocate	= blkdev_fallocate,
 };
 
 int ioctl_by_bdev(struct block_device *bdev, unsigned cmd, unsigned long arg)
diff --git a/fs/open.c b/fs/open.c
index 4fd6e25..01b6092 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -289,7 +289,8 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 	 * Let individual file system decide if it supports preallocation
 	 * for directories or not.
 	 */
-	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
+	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode) &&
+	    !S_ISBLK(inode->i_mode))
 		return -ENODEV;
 
 	/* Check for wrap through zero too */

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 3/3] block: implement (some of) fallocate for block devices
  2016-06-17  1:17 [PATCH v9 0/3] " Darrick J. Wong
  2016-06-17  1:17   ` Darrick J. Wong
@ 2016-06-17  1:17   ` Darrick J. Wong
  0 siblings, 0 replies; 66+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:17 UTC (permalink / raw)
  To: axboe, darrick.wong
  Cc: linux-block, tytso, martin.petersen, snitzer, linux-api, bfoster,
	xfs, hch, dm-devel, linux-fsdevel

After much discussion, it seems that the fallocate feature flag
FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been
whitelisted for zeroing SCSI UNMAP.  Punch still requires that
FALLOC_FL_KEEP_SIZE is set.  A length that goes past the end of the
device will be clamped to the device size if KEEP_SIZE is set; or will
return -EINVAL if not.  Both start and length must be aligned to the
device's logical block size.

Since the semantics of fallocate are fairly well established already,
wire up the two pieces.  The other fallocate variants (collapse range,
insert range, and allocate blocks) are not supported.

v2: Incorporate feedback from Christoph & Linus.  Tentatively add
a requirement that the fallocate arguments be aligned to logical block
size, and put in a few XXX comments ahead of LSF discussion.

v3: Forward port to 4.7.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/block_dev.c |   84 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/open.c      |    3 +-
 2 files changed, 86 insertions(+), 1 deletion(-)


diff --git a/fs/block_dev.c b/fs/block_dev.c
index 71ccab1..a3975c4 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -30,6 +30,7 @@
 #include <linux/cleancache.h>
 #include <linux/dax.h>
 #include <linux/badblocks.h>
+#include <linux/falloc.h>
 #include <asm/uaccess.h>
 #include "internal.h"
 
@@ -1802,6 +1803,88 @@ static const struct address_space_operations def_blk_aops = {
 	.is_dirty_writeback = buffer_check_dirty_writeback,
 };
 
+#define	BLKDEV_FALLOC_FL_SUPPORTED					\
+		(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |		\
+		 FALLOC_FL_ZERO_RANGE | FALLOC_FL_NO_HIDE_STALE)
+
+static long blkdev_fallocate(struct file *file, int mode, loff_t start,
+			     loff_t len)
+{
+	struct block_device *bdev = I_BDEV(bdev_file_inode(file));
+	struct request_queue *q = bdev_get_queue(bdev);
+	struct address_space *mapping;
+	loff_t end = start + len - 1;
+	loff_t isize;
+	int error;
+
+	/* Fail if we don't recognize the flags. */
+	if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
+		return -EOPNOTSUPP;
+
+	/* Don't go off the end of the device. */
+	isize = i_size_read(bdev->bd_inode);
+	if (start >= isize)
+		return -EINVAL;
+	if (end > isize) {
+		if (mode & FALLOC_FL_KEEP_SIZE) {
+			len = isize - start;
+			end = start + len - 1;
+		} else
+			return -EINVAL;
+	}
+
+	/*
+	 * Don't allow IO that isn't aligned to logical block size.
+	 */
+	if ((start | len) & (bdev_logical_block_size(bdev) - 1))
+		return -EINVAL;
+
+	/* Invalidate the page cache, including dirty pages. */
+	mapping = bdev->bd_inode->i_mapping;
+	truncate_inode_pages_range(mapping, start, end);
+
+	switch (mode) {
+	case FALLOC_FL_ZERO_RANGE:
+	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
+		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
+					    GFP_KERNEL, false);
+		if (error)
+			return error;
+		break;
+	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
+		/* Only punch if the device can do zeroing discard. */
+		if (!blk_queue_discard(q) || !q->limits.discard_zeroes_data)
+			return -EOPNOTSUPP;
+		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
+					     GFP_KERNEL, 0);
+		if (error)
+			return error;
+		break;
+	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
+		/*
+		 * XXX: a well known search engine vendor interprets this
+		 * flag (in other circumstances) to mean "I don't care if
+		 * we can read stale contents later".  Is it appropriate
+		 * to wire this up to the non-zeroing discard?
+		 */
+		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
+					     GFP_KERNEL, 0);
+		if (error)
+			return error;
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	/*
+	 * Invalidate again; if someone wandered in and dirtied a page,
+	 * the caller will be given -EBUSY.
+	 */
+	return invalidate_inode_pages2_range(mapping,
+					     start >> PAGE_SHIFT,
+					     end >> PAGE_SHIFT);
+}
+
 const struct file_operations def_blk_fops = {
 	.open		= blkdev_open,
 	.release	= blkdev_close,
@@ -1816,6 +1899,7 @@ const struct file_operations def_blk_fops = {
 #endif
 	.splice_read	= generic_file_splice_read,
 	.splice_write	= iter_file_splice_write,
+	.fallocate	= blkdev_fallocate,
 };
 
 int ioctl_by_bdev(struct block_device *bdev, unsigned cmd, unsigned long arg)
diff --git a/fs/open.c b/fs/open.c
index 93ae3cd..b2dbda4 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -289,7 +289,8 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 	 * Let individual file system decide if it supports preallocation
 	 * for directories or not.
 	 */
-	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
+	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode) &&
+	    !S_ISBLK(inode->i_mode))
 		return -ENODEV;
 
 	/* Check for wrap through zero too */

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 3/3] block: implement (some of) fallocate for block devices
@ 2016-06-17  1:17   ` Darrick J. Wong
  0 siblings, 0 replies; 66+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:17 UTC (permalink / raw)
  To: axboe, darrick.wong
  Cc: hch, tytso, martin.petersen, snitzer, linux-api, bfoster, xfs,
	linux-block, dm-devel, linux-fsdevel

After much discussion, it seems that the fallocate mode flag
FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME, and that
FALLOC_FL_PUNCH_HOLE maps nicely to SCSI UNMAP on the devices that have
been whitelisted as returning zeroes after a discard.  Punch still
requires that FALLOC_FL_KEEP_SIZE is set.  A range that extends past
the end of the device is clamped to the device size if KEEP_SIZE is
set; otherwise the call returns -EINVAL.  Both start and length must be
aligned to the device's logical block size.

Since the semantics of fallocate are fairly well established already,
wire up the two pieces.  The other fallocate variants (collapse range,
insert range, and allocate blocks) are not supported.
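
For illustration only (this is not part of the patch): a minimal
userspace sketch of how a caller might drive the two supported
operations through fallocate(2) once this is applied.  The helper names
blkdev_zero_range() and blkdev_punch_hole() are made up for this
example; each returns 0 or a negative errno value.

```c
#define _GNU_SOURCE		/* exposes fallocate() and FALLOC_FL_* via <fcntl.h> */
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <sys/types.h>

/* Hypothetical helper: zero [start, start + len) on an open block device. */
int blkdev_zero_range(int fd, off_t start, off_t len)
{
	if (fallocate(fd, FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE,
		      start, len) < 0)
		return -errno;
	return 0;
}

/*
 * Hypothetical helper: punch (discard) the range.  KEEP_SIZE is mandatory;
 * going by the patch, -EOPNOTSUPP is the expected result when the device
 * cannot guarantee zeroes after a discard.
 */
int blkdev_punch_hole(int fd, off_t start, off_t len)
{
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      start, len) < 0)
		return -errno;
	return 0;
}
```

The fd would come from open(2) on the device node with write access.
start and len have to be multiples of the logical block size (as
reported by the BLKSSZGET ioctl), otherwise the patch returns -EINVAL.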

v2: Incorporate feedback from Christoph & Linus.  Tentatively add
a requirement that the fallocate arguments be aligned to logical block
size, and put in a few XXX comments ahead of LSF discussion.

v3: Forward port to 4.7.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/block_dev.c |   84 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/open.c      |    3 +-
 2 files changed, 86 insertions(+), 1 deletion(-)


diff --git a/fs/block_dev.c b/fs/block_dev.c
index 71ccab1..a3975c4 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -30,6 +30,7 @@
 #include <linux/cleancache.h>
 #include <linux/dax.h>
 #include <linux/badblocks.h>
+#include <linux/falloc.h>
 #include <asm/uaccess.h>
 #include "internal.h"
 
@@ -1802,6 +1803,88 @@ static const struct address_space_operations def_blk_aops = {
 	.is_dirty_writeback = buffer_check_dirty_writeback,
 };
 
+#define	BLKDEV_FALLOC_FL_SUPPORTED					\
+		(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |		\
+		 FALLOC_FL_ZERO_RANGE | FALLOC_FL_NO_HIDE_STALE)
+
+static long blkdev_fallocate(struct file *file, int mode, loff_t start,
+			     loff_t len)
+{
+	struct block_device *bdev = I_BDEV(bdev_file_inode(file));
+	struct request_queue *q = bdev_get_queue(bdev);
+	struct address_space *mapping;
+	loff_t end = start + len - 1;
+	loff_t isize;
+	int error;
+
+	/* Fail if we don't recognize the flags. */
+	if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
+		return -EOPNOTSUPP;
+
+	/* Don't go off the end of the device. */
+	isize = i_size_read(bdev->bd_inode);
+	if (start >= isize)
+		return -EINVAL;
+	if (end >= isize) {
+		if (mode & FALLOC_FL_KEEP_SIZE) {
+			len = isize - start;
+			end = start + len - 1;
+		} else
+			return -EINVAL;
+	}
+
+	/*
+	 * Don't allow IO that isn't aligned to logical block size.
+	 */
+	if ((start | len) & (bdev_logical_block_size(bdev) - 1))
+		return -EINVAL;
+
+	/* Invalidate the page cache, including dirty pages. */
+	mapping = bdev->bd_inode->i_mapping;
+	truncate_inode_pages_range(mapping, start, end);
+
+	switch (mode) {
+	case FALLOC_FL_ZERO_RANGE:
+	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
+		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
+					    GFP_KERNEL, false);
+		if (error)
+			return error;
+		break;
+	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
+		/* Only punch if the device can do zeroing discard. */
+		if (!blk_queue_discard(q) || !q->limits.discard_zeroes_data)
+			return -EOPNOTSUPP;
+		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
+					     GFP_KERNEL, 0);
+		if (error)
+			return error;
+		break;
+	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
+		/*
+		 * XXX: a well known search engine vendor interprets this
+		 * flag (in other circumstances) to mean "I don't care if
+		 * we can read stale contents later".  Is it appropriate
+		 * to wire this up to the non-zeroing discard?
+		 */
+		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
+					     GFP_KERNEL, 0);
+		if (error)
+			return error;
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	/*
+	 * Invalidate again; if someone wandered in and dirtied a page,
+	 * the caller will be given -EBUSY.
+	 */
+	return invalidate_inode_pages2_range(mapping,
+					     start >> PAGE_SHIFT,
+					     end >> PAGE_SHIFT);
+}
+
 const struct file_operations def_blk_fops = {
 	.open		= blkdev_open,
 	.release	= blkdev_close,
@@ -1816,6 +1899,7 @@ const struct file_operations def_blk_fops = {
 #endif
 	.splice_read	= generic_file_splice_read,
 	.splice_write	= iter_file_splice_write,
+	.fallocate	= blkdev_fallocate,
 };
 
 int ioctl_by_bdev(struct block_device *bdev, unsigned cmd, unsigned long arg)
diff --git a/fs/open.c b/fs/open.c
index 93ae3cd..b2dbda4 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -289,7 +289,8 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 	 * Let individual file system decide if it supports preallocation
 	 * for directories or not.
 	 */
-	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
+	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode) &&
+	    !S_ISBLK(inode->i_mode))
 		return -ENODEV;
 
 	/* Check for wrap through zero too */

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 3/3] block: implement (some of) fallocate for block devices
  2016-04-13  4:01 [RFC DONOTMERGE v8 0/3] " Darrick J. Wong
  2016-04-13  4:01   ` Darrick J. Wong
@ 2016-04-13  4:01   ` Darrick J. Wong
  0 siblings, 0 replies; 66+ messages in thread
From: Darrick J. Wong @ 2016-04-13  4:01 UTC (permalink / raw)
  To: darrick.wong
  Cc: axboe, hch, tytso, martin.petersen, snitzer, linux-api, bfoster,
	xfs, linux-block, dm-devel, linux-fsdevel

After much discussion, it seems that the fallocate mode flag
FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME, and that
FALLOC_FL_PUNCH_HOLE maps nicely to SCSI UNMAP on the devices that have
been whitelisted as returning zeroes after a discard.  Punch still
requires that FALLOC_FL_KEEP_SIZE is set.  A range that extends past
the end of the device is clamped to the device size if KEEP_SIZE is
set; otherwise the call returns -EINVAL.  Both start and length must be
aligned to the device's logical block size.

Since the semantics of fallocate are fairly well established already,
wire up the two pieces.  The other fallocate variants (collapse range,
insert range, and allocate blocks) are not supported.
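
The range rules in the description can be sketched as a small
standalone function.  This is a hypothetical userspace model, not the
kernel code: the name blk_falloc_check_range() and its parameters are
invented here, and the device size and logical block size are passed in
rather than read from a real device.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the range rules described above.  On success
 * returns 0 and may shrink *len; otherwise returns -EINVAL.
 */
int blk_falloc_check_range(int64_t start, int64_t *len, int64_t dev_size,
			   uint32_t lbs, bool keep_size)
{
	/* Logical block sizes are powers of two, so (lbs - 1) is a mask. */
	assert(lbs != 0 && (lbs & (lbs - 1)) == 0);

	if (start < 0 || *len <= 0)
		return -EINVAL;
	if (start >= dev_size)
		return -EINVAL;

	/* Range runs past the end: clamp only if KEEP_SIZE was given. */
	if (*len > dev_size - start) {
		if (!keep_size)
			return -EINVAL;
		*len = dev_size - start;
	}

	/* Both start and the (possibly clamped) length must be aligned. */
	if (((uint64_t)start | (uint64_t)*len) & (lbs - 1))
		return -EINVAL;
	return 0;
}
```

For example, on a 1 MiB device with 512-byte logical blocks, a request
with start = 1044480 and len = 8192 is clamped to len = 4096 when
KEEP_SIZE is set, and fails with -EINVAL when it is not.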

v2: Incorporate feedback from Christoph & Linus.  Tentatively add
a requirement that the fallocate arguments be aligned to logical block
size, and put in a few XXX comments ahead of LSF discussion.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/block_dev.c |   87 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/open.c      |    3 +-
 2 files changed, 89 insertions(+), 1 deletion(-)


diff --git a/fs/block_dev.c b/fs/block_dev.c
index 20a2c02..5c8eb0c 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -29,6 +29,7 @@
 #include <linux/log2.h>
 #include <linux/cleancache.h>
 #include <linux/dax.h>
+#include <linux/falloc.h>
 #include <asm/uaccess.h>
 #include "internal.h"
 
@@ -1790,6 +1791,91 @@ static int blkdev_mmap(struct file *file, struct vm_area_struct *vma)
 #define blkdev_mmap generic_file_mmap
 #endif
 
+#define	BLKDEV_FALLOC_FL_SUPPORTED					\
+		(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |		\
+		 FALLOC_FL_ZERO_RANGE | FALLOC_FL_NO_HIDE_STALE)
+
+static long blkdev_fallocate(struct file *file, int mode, loff_t start,
+			     loff_t len)
+{
+	struct block_device *bdev = I_BDEV(bdev_file_inode(file));
+	struct request_queue *q = bdev_get_queue(bdev);
+	struct address_space *mapping;
+	loff_t end = start + len - 1;
+	loff_t isize;
+	int error;
+
+	/* Fail if we don't recognize the flags. */
+	if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
+		return -EOPNOTSUPP;
+
+	/* Don't go off the end of the device. */
+	isize = i_size_read(bdev->bd_inode);
+	if (start >= isize)
+		return -EINVAL;
+	if (end >= isize) {
+		if (mode & FALLOC_FL_KEEP_SIZE) {
+			len = isize - start;
+			end = start + len - 1;
+		} else
+			return -EINVAL;
+	}
+
+	/*
+	 * Don't allow IO that isn't aligned to logical block size.
+	 * XXX: Should zero and punch write zeroes through the page cache
+	 *      if start or end aren't lbs aligned?
+	 * XXX: What about thinp which prefers io_min alignment?
+	 */
+	if ((start | len) & (bdev_logical_block_size(bdev) - 1))
+		return -EINVAL;
+
+	/* Invalidate the page cache, including dirty pages. */
+	mapping = bdev->bd_inode->i_mapping;
+	truncate_inode_pages_range(mapping, start, end);
+
+	switch (mode) {
+	case FALLOC_FL_ZERO_RANGE:
+	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
+		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
+					    GFP_KERNEL, false);
+		if (error)
+			return error;
+		break;
+	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
+		/* Only punch if the device can do zeroing discard. */
+		if (!blk_queue_discard(q) || !q->limits.discard_zeroes_data)
+			return -EOPNOTSUPP;
+		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
+					     GFP_KERNEL, 0);
+		if (error)
+			return error;
+		break;
+	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
+		/*
+		 * XXX: a well known search engine vendor interprets this
+		 * flag (in other circumstances) to mean "I don't care if
+		 * we can read stale contents later".  Is it appropriate
+		 * to wire this up to the non-zeroing discard?
+		 */
+		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
+					     GFP_KERNEL, 0);
+		if (error)
+			return error;
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	/*
+	 * Invalidate again; if someone wandered in and dirtied a page,
+	 * the caller will be given -EBUSY.
+	 */
+	return invalidate_inode_pages2_range(mapping,
+					     start >> PAGE_SHIFT,
+					     end >> PAGE_SHIFT);
+}
+
 const struct file_operations def_blk_fops = {
 	.open		= blkdev_open,
 	.release	= blkdev_close,
@@ -1804,6 +1890,7 @@ const struct file_operations def_blk_fops = {
 #endif
 	.splice_read	= generic_file_splice_read,
 	.splice_write	= iter_file_splice_write,
+	.fallocate	= blkdev_fallocate,
 };
 
 int ioctl_by_bdev(struct block_device *bdev, unsigned cmd, unsigned long arg)
diff --git a/fs/open.c b/fs/open.c
index 17cb6b1..f9ebe32 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -289,7 +289,8 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 	 * Let individual file system decide if it supports preallocation
 	 * for directories or not.
 	 */
-	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
+	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode) &&
+	    !S_ISBLK(inode->i_mode))
 		return -ENODEV;
 
 	/* Check for wrap through zero too */


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH 3/3] block: implement (some of) fallocate for block devices
@ 2016-03-21 20:59           ` Brian Foster
  0 siblings, 0 replies; 66+ messages in thread
From: Brian Foster @ 2016-03-21 20:59 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Darrick J. Wong, Jens Axboe, Linus Torvalds, Bruce Fields,
	Theodore Ts'o, Martin K. Petersen, linux-api, david,
	linux-kernel, shane.seymour, Christoph Hellwig, linux-fsdevel,
	Jeff Layton, Andrew Morton, device-mapper development

On Mon, Mar 21, 2016 at 03:22:29PM -0400, Mike Snitzer wrote:
> On Mon, Mar 21 2016 at  3:11pm -0400,
> Darrick J. Wong <darrick.wong@oracle.com> wrote:
> 
> > On Mon, Mar 21, 2016 at 02:52:00PM -0400, Mike Snitzer wrote:
> > > On Tue, Mar 15, 2016 at 3:42 PM, Darrick J. Wong
> > > <darrick.wong@oracle.com> wrote:
> > > > After much discussion, it seems that the fallocate feature flag
> > > > FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
> > > > FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been
> > > > whitelisted for zeroing SCSI UNMAP.  Punch still requires that
> > > > FALLOC_FL_KEEP_SIZE is set.  A length that goes past the end of the
> > > > device will be clamped to the device size if KEEP_SIZE is set; or will
> > > > return -EINVAL if not.  Both start and length must be aligned to the
> > > > device's logical block size.
> > > >
> > > > Since the semantics of fallocate are fairly well established already,
> > > > wire up the two pieces.  The other fallocate variants (collapse range,
> > > > insert range, and allocate blocks) are not supported.
> > > 
> > > I'd like to see fallocate (block allocation) extend down to DM thinp.
> > > This more traditional use of fallocate would be useful for ensuring
> > > ENOSPC won't occur -- especially important if the FS has committed
> > > space in response to fallocate.  As of now fallocate doesn't inform DM
> > > thinp at all.  Curious why you decided not to wire it up?
> > 
> > I don't know what to wire it up to. :)
> 
> Fair enough.  Yes something needs to be invented.
>  
> > I didn't find any blkdev_* function that looked encouraging, though I
> > haven't dug too deeply into bfoster's "prototype a block reservation
> > allocation model" patchset yet.  At a high level I'd guess that would
> > be a reasonable piece to connect to?  It looks like the piece I want
> > is blk_provision_space().
> 
> Yes, something like that.
> 

Just a note that the caveat/hack with the provision call in there is
that it returns an allocated block count. That was necessary to help
maintain the local reservation accounting. I'd love to find a way to
handle that more cleanly or take advantage of generic fallocate, but I
don't have a clear idea on how to do that at the moment. (I do wonder
whether an internal-only set of falloc "reserve" flags would fly...).

Anyways, that's a separate topic. Feel free to steal any of that dm-thin
provision code if it is useful for generic fallocate(). :)

Brian

> > > But I'm not sure what "it" (the "allocate blocks" variant) even is
> > > given falloc.h doesn't show anything like "_ALLOCATE_BLOCKS"...
> > 
> > The default behavior of fallocate is to allocate blocks, which means
> > that one invokes it by not passing any mode flags (except possibly
> > KEEP_SIZE).
> 
> OK.
> 
> > > It would require a new block interface to pass the fallocate extent
> > > down.  But it seems bizarre to implement "some of" fallocate but not
> > > the most widely used case for fallocate.
> > 
> > Agreed.  I'd like to get the existing functionality wired up sooner than
> > later, and plumbing "allocate blocks" down to thinp can be done as a
> > followup.
> > 
> > (Or stall long enough that it becomes one patchset.)
> 
> Sure, sounds good.  Glad we're in agreement.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 3/3] block: implement (some of) fallocate for block devices
  2016-03-21 19:11       ` Darrick J. Wong
  (?)
@ 2016-03-21 19:22       ` Mike Snitzer
  2016-03-21 20:59           ` Brian Foster
  -1 siblings, 1 reply; 66+ messages in thread
From: Mike Snitzer @ 2016-03-21 19:22 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Jens Axboe, Linus Torvalds, Bruce Fields, Theodore Ts'o,
	Martin K. Petersen, linux-api, david, linux-kernel,
	shane.seymour, Christoph Hellwig, linux-fsdevel, Jeff Layton,
	Andrew Morton, device-mapper development

On Mon, Mar 21 2016 at  3:11pm -0400,
Darrick J. Wong <darrick.wong@oracle.com> wrote:

> On Mon, Mar 21, 2016 at 02:52:00PM -0400, Mike Snitzer wrote:
> > On Tue, Mar 15, 2016 at 3:42 PM, Darrick J. Wong
> > <darrick.wong@oracle.com> wrote:
> > > After much discussion, it seems that the fallocate feature flag
> > > FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
> > > FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been
> > > whitelisted for zeroing SCSI UNMAP.  Punch still requires that
> > > FALLOC_FL_KEEP_SIZE is set.  A length that goes past the end of the
> > > device will be clamped to the device size if KEEP_SIZE is set; or will
> > > return -EINVAL if not.  Both start and length must be aligned to the
> > > device's logical block size.
> > >
> > > Since the semantics of fallocate are fairly well established already,
> > > wire up the two pieces.  The other fallocate variants (collapse range,
> > > insert range, and allocate blocks) are not supported.
> > 
> > I'd like to see fallocate (block allocation) extend down to DM thinp.
> > This more traditional use of fallocate would be useful for ensuring
> > ENOSPC won't occur -- especially important if the FS has committed
> > space in response to fallocate.  As of now fallocate doesn't inform DM
> > thinp at all.  Curious why you decided not to wire it up?
> 
> I don't know what to wire it up to. :)

Fair enough.  Yes something needs to be invented.
 
> I didn't find any blkdev_* function that looked encouraging, though I
> haven't dug too deeply into bfoster's "prototype a block reservation
> allocation model" patchset yet.  At a high level I'd guess that would
> be a reasonable piece to connect to?  It looks like the piece I want
> is blk_provision_space().

Yes, something like that.

> > But I'm not sure what "it" (the "allocate blocks" variant) even is
> > given falloc.h doesn't show anything like "_ALLOCATE_BLOCKS"...
> 
> The default behavior of fallocate is to allocate blocks, which means
> that one invokes it by not passing any mode flags (except possibly
> KEEP_SIZE).

OK.

> > It would require a new block interface to pass the fallocate extent
> > down.  But it seems bizarre to implement "some of" fallocate but not
> > the most widely used case for fallocate.
> 
> Agreed.  I'd like to get the existing functionality wired up sooner than
> later, and plumbing "allocate blocks" down to thinp can be done as a
> followup.
> 
> (Or stall long enough that it becomes one patchset.)

Sure, sounds good.  Glad we're in agreement.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 3/3] block: implement (some of) fallocate for block devices
  2016-03-21 18:52     ` Mike Snitzer
@ 2016-03-21 19:11       ` Darrick J. Wong
  -1 siblings, 0 replies; 66+ messages in thread
From: Darrick J. Wong @ 2016-03-21 19:11 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Jens Axboe, Linus Torvalds, Bruce Fields, Theodore Ts'o,
	Martin K. Petersen, linux-api, david, linux-kernel,
	shane.seymour, Christoph Hellwig, linux-fsdevel, Jeff Layton,
	Andrew Morton, device-mapper development

On Mon, Mar 21, 2016 at 02:52:00PM -0400, Mike Snitzer wrote:
> On Tue, Mar 15, 2016 at 3:42 PM, Darrick J. Wong
> <darrick.wong@oracle.com> wrote:
> > After much discussion, it seems that the fallocate feature flag
> > FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
> > FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been
> > whitelisted for zeroing SCSI UNMAP.  Punch still requires that
> > FALLOC_FL_KEEP_SIZE is set.  A length that goes past the end of the
> > device will be clamped to the device size if KEEP_SIZE is set; or will
> > return -EINVAL if not.  Both start and length must be aligned to the
> > device's logical block size.
> >
> > Since the semantics of fallocate are fairly well established already,
> > wire up the two pieces.  The other fallocate variants (collapse range,
> > insert range, and allocate blocks) are not supported.
> 
> I'd like to see fallocate (block allocation) extend down to DM thinp.
> This more traditional use of fallocate would be useful for ensuring
> ENOSPC won't occur -- especially important if the FS has committed
> space in response to fallocate.  As of now fallocate doesn't inform DM
> thinp at all.  Curious why you decided not to wire it up?

I don't know what to wire it up to. :)

I didn't find any blkdev_* function that looked encouraging, though I
haven't dug too deeply into bfoster's "prototype a block reservation
allocation model" patchset yet.  At a high level I'd guess that would
be a reasonable piece to connect to?  It looks like the piece I want
is blk_provision_space().

> But I'm not sure what "it" (the "allocate blocks" variant) even is
> given falloc.h doesn't show anything like "_ALLOCATE_BLOCKS"...

The default behavior of fallocate is to allocate blocks, which means
that one invokes it by not passing any mode flags (except possibly
KEEP_SIZE).

> It would require a new block interface to pass the fallocate extent
> down.  But it seems bizarre to implement "some of" fallocate but not
> the most widely used case for fallocate.

Agreed.  I'd like to get the existing functionality wired up sooner than
later, and plumbing "allocate blocks" down to thinp can be done as a
followup.

(Or stall long enough that it becomes one patchset.)

--D

> 
> Mike

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 3/3] block: implement (some of) fallocate for block devices
  2016-03-15 19:42   ` Darrick J. Wong
@ 2016-03-21 18:52     ` Mike Snitzer
  -1 siblings, 0 replies; 66+ messages in thread
From: Mike Snitzer @ 2016-03-21 18:52 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Jens Axboe, Linus Torvalds, Bruce Fields, Theodore Ts'o,
	Martin K. Petersen, linux-api, david, linux-kernel,
	shane.seymour, Christoph Hellwig, linux-fsdevel, Jeff Layton,
	Andrew Morton, device-mapper development

On Tue, Mar 15, 2016 at 3:42 PM, Darrick J. Wong
<darrick.wong@oracle.com> wrote:
> After much discussion, it seems that the fallocate feature flag
> FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
> FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been
> whitelisted for zeroing SCSI UNMAP.  Punch still requires that
> FALLOC_FL_KEEP_SIZE is set.  A length that goes past the end of the
> device will be clamped to the device size if KEEP_SIZE is set; or will
> return -EINVAL if not.  Both start and length must be aligned to the
> device's logical block size.
>
> Since the semantics of fallocate are fairly well established already,
> wire up the two pieces.  The other fallocate variants (collapse range,
> insert range, and allocate blocks) are not supported.

I'd like to see fallocate (block allocation) extend down to DM thinp.
This more traditional use of fallocate would be useful for ensuring
ENOSPC won't occur -- especially important if the FS has committed
space in response to fallocate.  As of now fallocate doesn't inform DM
thinp at all.  Curious why you decided not to wire it up?

But I'm not sure what "it" (the "allocate blocks" variant) even is
given falloc.h doesn't show anything like "_ALLOCATE_BLOCKS"...

It would require a new block interface to pass the fallocate extent
down.  But it seems bizarre to implement "some of" fallocate but not
the most widely used case for fallocate.

Mike

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 3/3] block: implement (some of) fallocate for block devices
@ 2016-03-21 18:21           ` Martin K. Petersen
  0 siblings, 0 replies; 66+ messages in thread
From: Martin K. Petersen @ 2016-03-21 18:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Darrick J. Wong, axboe, torvalds, bfields, tytso,
	martin.petersen, linux-api, david, linux-kernel, shane.seymour,
	linux-fsdevel, jlayton, akpm

>>>>> "Christoph" == Christoph Hellwig <hch@infradead.org> writes:

Christoph> On Mon, Mar 21, 2016 at 10:52:35AM -0700, Darrick J. Wong wrote:
>> > I don't really understand the comment.  But I think you'd be much
>> 
>> I don't know of a block device primitive that corresponds to the
>> "default" mode of fallocate, as documented in the manpage (i.e. mode
>> == 0).  I agree that the whole thing could be simplified in the
>> manner you point out below.

Christoph> SCSI allows 'anchoring' blocks, which is pretty similar to a
Christoph> normal fallocate, but we don't support anchoring blocks in
Christoph> Linux yet.

Chicken and egg problem. I actually did tinker with this for a different
project a while back and can easily revive it.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 3/3] block: implement (some of) fallocate for block devices
@ 2016-03-21 18:17         ` Christoph Hellwig
  0 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2016-03-21 18:17 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, axboe, torvalds, bfields, tytso,
	martin.petersen, linux-api, david, linux-kernel, shane.seymour,
	linux-fsdevel, jlayton, akpm

On Mon, Mar 21, 2016 at 10:52:35AM -0700, Darrick J. Wong wrote:
> > I don't really understand the comment.  But I think you'd be much
> 
> I don't know of a block device primitive that corresponds to the "default"
> mode of fallocate, as documented in the manpage (i.e. mode == 0).  I agree
> that the whole thing could be simplified in the manner you point out below.

SCSI allows 'anchoring' blocks, which is pretty similar to a normal
fallocate, but we don't support anchoring blocks in Linux yet.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 3/3] block: implement (some of) fallocate for block devices
@ 2016-03-21 17:52       ` Darrick J. Wong
  0 siblings, 0 replies; 66+ messages in thread
From: Darrick J. Wong @ 2016-03-21 17:52 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, torvalds, bfields, tytso, martin.petersen, linux-api,
	david, linux-kernel, shane.seymour, linux-fsdevel, jlayton, akpm

On Mon, Mar 21, 2016 at 08:38:27AM -0700, Christoph Hellwig wrote:
> On Tue, Mar 15, 2016 at 12:42:44PM -0700, Darrick J. Wong wrote:
> >  #include <linux/cleancache.h>
> >  #include <linux/dax.h>
> >  #include <asm/uaccess.h>
> > +#include <linux/falloc.h>
> 
> Maybe keep this before asm/uaccess.h
> 
> > +long blkdev_fallocate(struct file *file, int mode, loff_t start, loff_t len)
> 
> should be marked static.

Ok (to both).

> 
> > +	/* We haven't a primitive for "ensure space exists" right now. */
> > +	if (!(mode & ~FALLOC_FL_KEEP_SIZE))
> > +		return -EOPNOTSUPP;
> 
> I don't really understand the comment.  But I think you'd be much

I don't know of a block device primitive that corresponds to the "default"
mode of fallocate, as documented in the manpage (i.e. mode == 0).  I agree
that the whole thing could be simplified in the manner you point out below.
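
To make the check in question concrete: !(mode & ~FALLOC_FL_KEEP_SIZE) is
true exactly when no bit other than KEEP_SIZE is set, i.e. for mode == 0 and
mode == FALLOC_FL_KEEP_SIZE, which are the plain "allocate blocks" requests.
A minimal userspace sketch follows.  The DEMO_* names and
demo_is_plain_allocate() are made up for illustration, and the constants are
assumed to mirror the values in include/uapi/linux/falloc.h:

```c
#include <stdbool.h>

/* Illustrative copies of the uapi flag values (assumed to match falloc.h). */
#define DEMO_FL_KEEP_SIZE	0x01
#define DEMO_FL_PUNCH_HOLE	0x02
#define DEMO_FL_ZERO_RANGE	0x10

/*
 * Hypothetical helper: true when the caller asked for the default
 * "allocate blocks" behaviour, i.e. no mode bit besides KEEP_SIZE is set.
 */
bool demo_is_plain_allocate(int mode)
{
	return !(mode & ~DEMO_FL_KEEP_SIZE);
}
```

So the check rejects mode 0 and a bare KEEP_SIZE, and lets any request that
carries PUNCH_HOLE or ZERO_RANGE through to the later checks.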

> better off with having blkdev_fallocate as just a tiny wrapper that has
> a switch for the supported modes, e.g.
> 
> 	switch (mode) {
> 	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
> 		return blkdev_punch_hole();
> 	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
> 		return blkdev_zero_range();
> 	default:
> 		return -EOPNOTSUPP;
> 	}
> 
> > +EXPORT_SYMBOL_GPL(blkdev_fallocate);
> 
> and no need to export it either..

Ok.

--D

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 3/3] block: implement (some of) fallocate for block devices
@ 2016-03-21 15:38     ` Christoph Hellwig
  0 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2016-03-21 15:38 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: axboe, torvalds, bfields, tytso, martin.petersen, linux-api,
	david, linux-kernel, shane.seymour, hch, linux-fsdevel, jlayton,
	akpm

On Tue, Mar 15, 2016 at 12:42:44PM -0700, Darrick J. Wong wrote:
>  #include <linux/cleancache.h>
>  #include <linux/dax.h>
>  #include <asm/uaccess.h>
> +#include <linux/falloc.h>

Maybe keep this before asm/uaccess.h

> +long blkdev_fallocate(struct file *file, int mode, loff_t start, loff_t len)

should be marked static.

> +	/* We haven't a primitive for "ensure space exists" right now. */
> +	if (!(mode & ~FALLOC_FL_KEEP_SIZE))
> +		return -EOPNOTSUPP;

I don't really understand the comment.  But I think you'd be much
better off with having blkdev_fallocate as just a tiny wrapper that has
a switch for the supported modes, e.g.

	switch (mode) {
	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
		return blkdev_punch_hole();
	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
		return blkdev_zero_range();
	default:
		return -EOPNOTSUPP;
	}
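
A slightly fuller, userspace-only sketch of the exact-match dispatch
suggested above, in case it helps.  Everything named demo_* / DEMO_* is
hypothetical; the flag values are assumed to mirror
include/uapi/linux/falloc.h, and the function only classifies the mode
instead of calling real blkdev helpers:

```c
#include <errno.h>

/* Illustrative copies of the uapi flag values (assumed to match falloc.h). */
#define DEMO_FL_KEEP_SIZE	0x01
#define DEMO_FL_PUNCH_HOLE	0x02
#define DEMO_FL_ZERO_RANGE	0x10

/* Hypothetical operation codes standing in for the real helpers. */
enum demo_op {
	DEMO_OP_PUNCH = 1,	/* would call something like blkdev_punch_hole() */
	DEMO_OP_ZERO  = 2,	/* would call something like blkdev_zero_range() */
};

/* Returns a demo_op for a supported mode, or -EOPNOTSUPP otherwise. */
int demo_classify_mode(int mode)
{
	switch (mode) {
	case DEMO_FL_PUNCH_HOLE | DEMO_FL_KEEP_SIZE:
		return DEMO_OP_PUNCH;
	case DEMO_FL_ZERO_RANGE | DEMO_FL_KEEP_SIZE:
		return DEMO_OP_ZERO;
	default:
		/* Covers mode == 0, bare KEEP_SIZE, and every other combination. */
		return -EOPNOTSUPP;
	}
}
```

One thing the exact match makes visible: FALLOC_FL_ZERO_RANGE without
KEEP_SIZE falls into the default case here, whereas the posted patch accepts
it.  If that combination should stay supported, it needs its own case label.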

> +EXPORT_SYMBOL_GPL(blkdev_fallocate);

and no need to export it either..

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH 3/3] block: implement (some of) fallocate for block devices
@ 2016-03-15 19:42   ` Darrick J. Wong
  0 siblings, 0 replies; 66+ messages in thread
From: Darrick J. Wong @ 2016-03-15 19:42 UTC (permalink / raw)
  To: axboe, torvalds, darrick.wong
  Cc: bfields, tytso, martin.petersen, linux-api, david, linux-kernel,
	shane.seymour, hch, linux-fsdevel, jlayton, akpm

After much discussion, it seems that the fallocate feature flag
FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been
whitelisted for zeroing SCSI UNMAP.  Punch still requires that
FALLOC_FL_KEEP_SIZE is set.  A length that goes past the end of the
device will be clamped to the device size if KEEP_SIZE is set; or will
return -EINVAL if not.  Both start and length must be aligned to the
device's logical block size.

Since the semantics of fallocate are fairly well established already,
wire up the two pieces.  The other fallocate variants (collapse range,
insert range, and allocate blocks) are not supported.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/block_dev.c |   69 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/open.c      |    3 ++
 2 files changed, 71 insertions(+), 1 deletion(-)
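
For reference, the range rules from the description above (clamp to the
device size only with KEEP_SIZE, otherwise -EINVAL; start and length aligned
to the logical block size) can be modelled in userspace roughly as follows.
This is a sketch under stated assumptions, not the patch code:
demo_clamp_len() and its parameters are hypothetical, it rejects negative or
zero-length requests itself (an assumption made here), and it compares len
against isize - start rather than computing an inclusive end offset.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns the (possibly clamped) length in bytes,
 * or a negative errno if the request would be rejected.
 */
int64_t demo_clamp_len(int64_t start, int64_t len, int64_t isize,
		       int64_t lbs, int keep_size)
{
	/* Logical block sizes are powers of two, so lbs - 1 is a bit mask. */
	assert(lbs > 0 && (lbs & (lbs - 1)) == 0);

	if (start < 0 || len <= 0 || start >= isize)
		return -EINVAL;

	/* The range runs past the end of the device. */
	if (len > isize - start) {
		if (!keep_size)
			return -EINVAL;
		len = isize - start;
	}

	/* Both start and the (clamped) length must be block aligned. */
	if ((start | len) & (lbs - 1))
		return -EINVAL;

	return len;
}
```

With a 1 MiB device and 4096-byte logical blocks, an 8192-byte request
starting at the last block is clamped to 4096 bytes when keep_size is set and
rejected when it is not; a request starting at offset 512 is rejected as
unaligned.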


diff --git a/fs/block_dev.c b/fs/block_dev.c
index 826b164..6137c6e 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -30,6 +30,7 @@
 #include <linux/cleancache.h>
 #include <linux/dax.h>
 #include <asm/uaccess.h>
+#include <linux/falloc.h>
 #include "internal.h"
 
 struct bdev_inode {
@@ -1786,6 +1787,73 @@ static int blkdev_mmap(struct file *file, struct vm_area_struct *vma)
 #define blkdev_mmap generic_file_mmap
 #endif
 
+#define	BLKDEV_FALLOC_FL_SUPPORTED					\
+		(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |		\
+		 FALLOC_FL_ZERO_RANGE)
+
+long blkdev_fallocate(struct file *file, int mode, loff_t start, loff_t len)
+{
+	struct block_device *bdev = I_BDEV(bdev_file_inode(file));
+	struct request_queue *q = bdev_get_queue(bdev);
+	struct address_space *mapping;
+	loff_t end = start + len - 1;
+	loff_t bs_mask, isize;
+	int error;
+
+	/* We only support zero range and punch hole. */
+	if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
+		return -EOPNOTSUPP;
+
+	/* We haven't a primitive for "ensure space exists" right now. */
+	if (!(mode & ~FALLOC_FL_KEEP_SIZE))
+		return -EOPNOTSUPP;
+
+	/* Only punch if the device can do zeroing discard. */
+	if ((mode & FALLOC_FL_PUNCH_HOLE) &&
+	    (!blk_queue_discard(q) || !q->limits.discard_zeroes_data))
+		return -EOPNOTSUPP;
+
+	/* Don't go off the end of the device */
+	isize = i_size_read(bdev->bd_inode);
+	if (start >= isize)
+		return -EINVAL;
+	if (end > isize) {
+		if (mode & FALLOC_FL_KEEP_SIZE) {
+			len = isize - start;
+			end = start + len - 1;
+		} else
+			return -EINVAL;
+	}
+
+	/* Don't allow IO that isn't aligned to logical block size */
+	bs_mask = bdev_logical_block_size(bdev) - 1;
+	if ((start | len) & bs_mask)
+		return -EINVAL;
+
+	/* Invalidate the page cache, including dirty pages. */
+	mapping = bdev->bd_inode->i_mapping;
+	truncate_inode_pages_range(mapping, start, end);
+
+	error = -EINVAL;
+	if (mode & FALLOC_FL_ZERO_RANGE)
+		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
+					    GFP_KERNEL, false);
+	else if (mode & FALLOC_FL_PUNCH_HOLE)
+		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
+					     GFP_KERNEL, 0);
+	if (error)
+		return error;
+
+	/*
+	 * Invalidate again; if someone wandered in and dirtied a page,
+	 * the caller will be given -EBUSY;
+	 */
+	return invalidate_inode_pages2_range(mapping,
+					     start >> PAGE_CACHE_SHIFT,
+					     end >> PAGE_CACHE_SHIFT);
+}
+EXPORT_SYMBOL_GPL(blkdev_fallocate);
+
 const struct file_operations def_blk_fops = {
 	.open		= blkdev_open,
 	.release	= blkdev_close,
@@ -1800,6 +1868,7 @@ const struct file_operations def_blk_fops = {
 #endif
 	.splice_read	= generic_file_splice_read,
 	.splice_write	= iter_file_splice_write,
+	.fallocate	= blkdev_fallocate,
 };
 
 int ioctl_by_bdev(struct block_device *bdev, unsigned cmd, unsigned long arg)
diff --git a/fs/open.c b/fs/open.c
index 55bdc75..4f99adc 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -289,7 +289,8 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 	 * Let individual file system decide if it supports preallocation
 	 * for directories or not.
 	 */
-	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
+	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode) &&
+	    !S_ISBLK(inode->i_mode))
 		return -ENODEV;
 
 	/* Check for wrap through zero too */

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH 3/3] block: implement (some of) fallocate for block devices
@ 2016-03-05 20:58       ` Christoph Hellwig
  0 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2016-03-05 20:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Darrick J. Wong, Jens Axboe, Christoph Hellwig,
	Theodore Ts'o, Martin K. Petersen, Linux API, Dave Chinner,
	Linux Kernel Mailing List, shane.seymour, Bruce Fields,
	linux-fsdevel, Jeff Layton, Andrew Morton

On Fri, Mar 04, 2016 at 07:13:25PM -0800, Linus Torvalds wrote:
> > +       /* We can't change the bdev size from here */
> > +       if (!(mode & FALLOC_FL_KEEP_SIZE))
> > +               return -EOPNOTSUPP;
> 
> Oh, and this I think is wrong.
> 
> The thing is, FALLOC_FL_KEEP_SIZE is only supposed to matter if the
> region is outside the existing length.

For allocations...

> So if you punch a hole in the middle of a file, you don't need
> FALLOC_FL_KEEP_SIZE.

For FALLOC_FL_PUNCH_HOLE we always require FALLOC_FL_KEEP_SIZE so far,
and I'd rather not change things for block devices just because we can.
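
From the caller's side this means a hole punch is always requested with
both flags.  A minimal userspace sketch, assuming glibc's fallocate()
wrapper; the helper name punch_hole() is made up for this example:

```c
#define _GNU_SOURCE		/* exposes fallocate() in <fcntl.h> on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <linux/falloc.h>	/* FALLOC_FL_PUNCH_HOLE, FALLOC_FL_KEEP_SIZE */

/*
 * Ask the kernel to deallocate [start, start + len) on fd.
 * PUNCH_HOLE is always paired with KEEP_SIZE, as required for regular
 * files today.  Returns 0 on success or a negative errno value.
 */
int punch_hole(int fd, off_t start, off_t len)
{
	int mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE;

	if (fallocate(fd, mode, start, len) != 0)
		return -errno;
	return 0;
}
```

Whether the call succeeds on a given descriptor depends on the kernel and
on what backs the descriptor; callers should be prepared for -EOPNOTSUPP.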

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 3/3] block: implement (some of) fallocate for block devices
@ 2016-03-05 20:57       ` Christoph Hellwig
  0 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2016-03-05 20:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Darrick J. Wong, Jens Axboe, Christoph Hellwig,
	Theodore Ts'o, Martin K. Petersen, Linux API, Dave Chinner,
	Linux Kernel Mailing List, shane.seymour, Bruce Fields,
	linux-fsdevel, Jeff Layton, Andrew Morton

On Fri, Mar 04, 2016 at 07:06:38PM -0800, Linus Torvalds wrote:
> > +       if ((mode & FALLOC_FL_PUNCH_HOLE) &&
> > +           (!blk_queue_discard(q) || !q->limits.discard_zeroes_data))
> > +               return -EOPNOTSUPP;
> 
> I'm ok with this, but suspect that some users would prefer to just
> turn this into ZERO_RANGE silently.
> 
> Comments from people who would be expected to use this?

A hole punch should be a hole punch, and not silently allocate blocks
instead of deallocating them.  It's not even a fallback, it's pretty
much the opposite for some workloads.
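
To make the "no silent fallback" policy concrete, here is a small
userspace sketch of the capability check quoted above.  struct blk_caps
and punch_hole_check() are hypothetical stand-ins for the queue limits
and the check in the patch; the point is only that an unsupported punch
reports -EOPNOTSUPP instead of being turned into a zeroing write.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the queue limits consulted by the patch. */
struct blk_caps {
	bool can_discard;		/* blk_queue_discard(q) in the patch */
	bool discard_zeroes_data;	/* q->limits.discard_zeroes_data */
};

/*
 * Return 0 if a hole punch may be issued as a zeroing discard, or
 * -EOPNOTSUPP otherwise.  There is deliberately no fallback to writing
 * zeroes: that would allocate blocks rather than release them.
 */
int punch_hole_check(const struct blk_caps *caps)
{
	assert(caps != NULL);

	if (!caps->can_discard || !caps->discard_zeroes_data)
		return -EOPNOTSUPP;
	return 0;
}
```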

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 3/3] block: implement (some of) fallocate for block devices
@ 2016-03-05  3:13     ` Linus Torvalds
  0 siblings, 0 replies; 66+ messages in thread
From: Linus Torvalds @ 2016-03-05  3:13 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Jens Axboe, Christoph Hellwig, Theodore Ts'o,
	Martin K. Petersen, Linux API, Dave Chinner,
	Linux Kernel Mailing List, shane.seymour, Bruce Fields,
	linux-fsdevel, Jeff Layton, Andrew Morton

On Fri, Mar 4, 2016 at 4:56 PM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> +
> +       /* We can't change the bdev size from here */
> +       if (!(mode & FALLOC_FL_KEEP_SIZE))
> +               return -EOPNOTSUPP;

Oh, and this I think is wrong.

The thing is, FALLOC_FL_KEEP_SIZE is only supposed to matter if the
region is outside the existing length.

So if you punch a hole in the middle of a file, you don't need
FALLOC_FL_KEEP_SIZE.

I would suggest removing this check entirely, since you already check
that people don't try to punch holes past the end of the device. So
FALLOC_FL_KEEP_SIZE is simply a non-issue, and shouldn't even be
checked.

              Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 3/3] block: implement (some of) fallocate for block devices
@ 2016-03-05  3:06     ` Linus Torvalds
  0 siblings, 0 replies; 66+ messages in thread
From: Linus Torvalds @ 2016-03-05  3:06 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Jens Axboe, Christoph Hellwig, Theodore Ts'o,
	Martin K. Petersen, Linux API, Dave Chinner,
	Linux Kernel Mailing List, shane.seymour, Bruce Fields,
	linux-fsdevel, Jeff Layton, Andrew Morton

On Fri, Mar 4, 2016 at 4:56 PM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> +       /* Only punch if the device can do zeroing discard. */
> +       if ((mode & FALLOC_FL_PUNCH_HOLE) &&
> +           (!blk_queue_discard(q) || !q->limits.discard_zeroes_data))
> +               return -EOPNOTSUPP;

I'm ok with this, but suspect that some users would prefer to just
turn this into ZERO_RANGE silently.

Comments from people who would be expected to use this?

            Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH 3/3] block: implement (some of) fallocate for block devices
  2016-03-05  0:55 [PATCH v6 0/3] fallocate for block devices to provide zero-out Darrick J. Wong
@ 2016-03-05  0:56 ` Darrick J. Wong
  2016-03-05  3:06     ` Linus Torvalds
  2016-03-05  3:13     ` Linus Torvalds
  0 siblings, 2 replies; 66+ messages in thread
From: Darrick J. Wong @ 2016-03-05  0:56 UTC (permalink / raw)
  To: axboe, torvalds, darrick.wong
  Cc: hch, tytso, martin.petersen, linux-api, david, linux-kernel,
	shane.seymour, bfields, linux-fsdevel, jlayton, akpm

After much discussion, it seems that the fallocate feature flag
FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been
whitelisted for zeroing SCSI UNMAP.  Both flags require that
FALLOC_FL_KEEP_SIZE is set, both return EINVAL if one tries
to write past the end of the device, and both require that the
offset and length be aligned at least to 512-byte offsets.

Since the semantics of fallocate are fairly well established already,
wire up the two pieces.  The other fallocate variants (collapse range,
insert range, and allocate blocks) are not supported.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/block_dev.c |   67 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/open.c      |    3 ++-
 2 files changed, 69 insertions(+), 1 deletion(-)


diff --git a/fs/block_dev.c b/fs/block_dev.c
index 826b164..c9c9421 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -30,6 +30,7 @@
 #include <linux/cleancache.h>
 #include <linux/dax.h>
 #include <asm/uaccess.h>
+#include <linux/falloc.h>
 #include "internal.h"
 
 struct bdev_inode {
@@ -1786,6 +1787,71 @@ static int blkdev_mmap(struct file *file, struct vm_area_struct *vma)
 #define blkdev_mmap generic_file_mmap
 #endif
 
+#define	BLKDEV_FALLOC_FL_SUPPORTED					\
+		(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |		\
+		 FALLOC_FL_ZERO_RANGE)
+
+long blkdev_fallocate(struct file *file, int mode, loff_t start, loff_t len)
+{
+	struct block_device *bdev = I_BDEV(bdev_file_inode(file));
+	struct request_queue *q = bdev_get_queue(bdev);
+	struct address_space *mapping;
+	loff_t end = start + len - 1;
+	loff_t bs_mask;
+	int error;
+
+	/* We only support zero range and punch hole. */
+	if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
+		return -EOPNOTSUPP;
+
+	/* We can't change the bdev size from here */
+	if (!(mode & FALLOC_FL_KEEP_SIZE))
+		return -EOPNOTSUPP;
+
+	/* We haven't a primitive for "ensure space exists" right now. */
+	if (mode == FALLOC_FL_KEEP_SIZE)
+		return -EOPNOTSUPP;
+
+	/* Only punch if the device can do zeroing discard. */
+	if ((mode & FALLOC_FL_PUNCH_HOLE) &&
+	    (!blk_queue_discard(q) || !q->limits.discard_zeroes_data))
+		return -EOPNOTSUPP;
+
+	/* Don't allow IO that isn't aligned to logical block size */
+	bs_mask = bdev_logical_block_size(bdev) - 1;
+	if ((start & bs_mask) || ((start + len) & bs_mask))
+		return -EINVAL;
+
+	/* Don't go off the end of the device */
+	if (end > i_size_read(bdev->bd_inode))
+		return -EINVAL;
+	if (end < start)
+		return -EINVAL;
+
+	/* Invalidate the page cache, including dirty pages. */
+	mapping = bdev->bd_inode->i_mapping;
+	truncate_inode_pages_range(mapping, start, end);
+
+	error = -EINVAL;
+	if (mode & FALLOC_FL_ZERO_RANGE)
+		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
+					    GFP_KERNEL, false);
+	else if (mode & FALLOC_FL_PUNCH_HOLE)
+		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
+					     GFP_KERNEL, 0);
+	if (error)
+		return error;
+
+	/*
+	 * Invalidate again; if someone wandered in and dirtied a page,
+	 * the caller will be given -EBUSY;
+	 */
+	return invalidate_inode_pages2_range(mapping,
+					     start >> PAGE_CACHE_SHIFT,
+					     end >> PAGE_CACHE_SHIFT);
+}
+EXPORT_SYMBOL_GPL(blkdev_fallocate);
+
 const struct file_operations def_blk_fops = {
 	.open		= blkdev_open,
 	.release	= blkdev_close,
@@ -1800,6 +1866,7 @@ const struct file_operations def_blk_fops = {
 #endif
 	.splice_read	= generic_file_splice_read,
 	.splice_write	= iter_file_splice_write,
+	.fallocate	= blkdev_fallocate,
 };
 
 int ioctl_by_bdev(struct block_device *bdev, unsigned cmd, unsigned long arg)
diff --git a/fs/open.c b/fs/open.c
index 55bdc75..4f99adc 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -289,7 +289,8 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 	 * Let individual file system decide if it supports preallocation
 	 * for directories or not.
 	 */
-	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
+	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode) &&
+	    !S_ISBLK(inode->i_mode))
 		return -ENODEV;
 
 	/* Check for wrap through zero too */

^ permalink raw reply related	[flat|nested] 66+ messages in thread

end of thread, other threads:[~2016-09-29 21:17 UTC | newest]

Thread overview: 66+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-29  0:39 [PATCH v10 0/3] fallocate for block devices Darrick J. Wong
2016-09-29  0:39 ` Darrick J. Wong
2016-09-29  0:39 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong
2016-09-29  0:39   ` Darrick J. Wong
2016-09-29  0:39   ` Darrick J. Wong
2016-09-29  1:16   ` Bart Van Assche
2016-09-29  1:16     ` Bart Van Assche
2016-09-29  5:56   ` Hannes Reinecke
2016-09-29  5:56     ` Hannes Reinecke
2016-09-29  0:39 ` [PATCH 2/3] block: require write_same and discard requests align to logical block size Darrick J. Wong
2016-09-29  0:39   ` Darrick J. Wong
2016-09-29  5:56   ` Hannes Reinecke
2016-09-29  5:56     ` Hannes Reinecke
2016-09-29  5:56     ` Hannes Reinecke
2016-09-29  0:39 ` [PATCH 3/3] block: implement (some of) fallocate for block devices Darrick J. Wong
2016-09-29  0:39   ` Darrick J. Wong
2016-09-29  1:42   ` Bart Van Assche
2016-09-29  1:42     ` Bart Van Assche
2016-09-29  1:42     ` Bart Van Assche
2016-09-29  2:09     ` Darrick J. Wong
2016-09-29  2:09       ` Darrick J. Wong
2016-09-29  2:19   ` [PATCH v2 " Darrick J. Wong
2016-09-29  2:19     ` Darrick J. Wong
2016-09-29 20:08     ` Bart Van Assche
2016-09-29 20:08       ` Bart Van Assche
2016-09-29 20:08       ` Bart Van Assche
2016-09-29 20:35       ` Darrick J. Wong
2016-09-29 20:35         ` Darrick J. Wong
2016-09-29  5:57   ` [PATCH " Hannes Reinecke
2016-09-29  5:57     ` Hannes Reinecke
  -- strict thread matches above, loose matches on Subject: below --
2016-09-29 21:16 [PATCH v11 0/3] " Darrick J. Wong
2016-09-29 21:16 ` [PATCH 3/3] block: implement (some of) " Darrick J. Wong
2016-09-29 21:16   ` Darrick J. Wong
2016-08-26  0:02 [PATCH v10 0/3] " Darrick J. Wong
2016-08-26  0:02 ` [PATCH 3/3] block: implement (some of) " Darrick J. Wong
2016-08-26  0:02   ` Darrick J. Wong
2016-06-17  1:17 [PATCH v9 0/3] " Darrick J. Wong
2016-06-17  1:17 ` [PATCH 3/3] block: implement (some of) " Darrick J. Wong
2016-06-17  1:17   ` Darrick J. Wong
2016-06-17  1:17   ` Darrick J. Wong
2016-04-13  4:01 [RFC DONOTMERGE v8 0/3] " Darrick J. Wong
2016-04-13  4:01 ` [PATCH 3/3] block: implement (some of) " Darrick J. Wong
2016-04-13  4:01   ` Darrick J. Wong
2016-04-13  4:01   ` Darrick J. Wong
2016-03-15 19:42 [PATCH v7 0/3] fallocate for block devices to provide zero-out Darrick J. Wong
2016-03-15 19:42 ` [PATCH 3/3] block: implement (some of) fallocate for block devices Darrick J. Wong
2016-03-15 19:42   ` Darrick J. Wong
2016-03-21 15:38   ` Christoph Hellwig
2016-03-21 15:38     ` Christoph Hellwig
2016-03-21 17:52     ` Darrick J. Wong
2016-03-21 17:52       ` Darrick J. Wong
2016-03-21 18:17       ` Christoph Hellwig
2016-03-21 18:17         ` Christoph Hellwig
2016-03-21 18:21         ` Martin K. Petersen
2016-03-21 18:21           ` Martin K. Petersen
2016-03-21 18:52   ` Mike Snitzer
2016-03-21 18:52     ` Mike Snitzer
2016-03-21 19:11     ` Darrick J. Wong
2016-03-21 19:11       ` Darrick J. Wong
2016-03-21 19:22       ` Mike Snitzer
2016-03-21 20:59         ` Brian Foster
2016-03-21 20:59           ` Brian Foster
2016-03-05  0:55 [PATCH v6 0/3] fallocate for block devices to provide zero-out Darrick J. Wong
2016-03-05  0:56 ` [PATCH 3/3] block: implement (some of) fallocate for block devices Darrick J. Wong
2016-03-05  3:06   ` Linus Torvalds
2016-03-05  3:06     ` Linus Torvalds
2016-03-05 20:57     ` Christoph Hellwig
2016-03-05 20:57       ` Christoph Hellwig
2016-03-05  3:13   ` Linus Torvalds
2016-03-05  3:13     ` Linus Torvalds
2016-03-05 20:58     ` Christoph Hellwig
2016-03-05 20:58       ` Christoph Hellwig
