linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v9 0/3] fallocate for block devices
@ 2016-06-17  1:17 Darrick J. Wong
  2016-06-17  1:17 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:17 UTC (permalink / raw)
  To: axboe, darrick.wong
  Cc: hch, tytso, martin.petersen, snitzer, linux-api, bfoster, xfs,
	linux-block, dm-devel, linux-fsdevel

Hi,

This is a redesign of the patch series that fixes various interface
problems with the existing "zero out this part of a block device"
code.  BLKZEROOUT2 is gone.

The first patch is still a fix to the existing BLKZEROOUT ioctl to
invalidate the page cache if the zeroing command to the underlying
device succeeds.  Without this patch we still have the pagecache
coherence bug that's been in the kernel forever.

The second patch changes the internal block device functions to reject
attempts to discard or zeroout that are not aligned to the logical
block size.  Previously, we only checked that the start/len parameters
were 512-byte aligned, which caused kernel BUG_ONs for unaligned IOs
to 4k-LBA devices.

The third patch creates an fallocate handler for block devices, wires
up the FALLOC_FL_PUNCH_HOLE flag to zeroing-discard, and connects
FALLOC_FL_ZERO_RANGE to write-same so that we can have a consistent
fallocate interface between files and block devices.  It also allows
the combination of PUNCH_HOLE and NO_HIDE_STALE to invoke non-zeroing
discard.

Test cases for the new block device fallocate are now in xfstests as
generic/349-351.

Comments and questions are, as always, welcome.  Patches are against
4.7-rc3.

v7: Strengthen parameter checking and fix various code issues pointed
out by Linus and Christoph.
v8: More code rearranging, rebase to 4.6-rc3, and dig into alignment
issues.
v9: Forward port to 4.7.

--D

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT.
  2016-06-17  1:17 [PATCH v9 0/3] fallocate for block devices Darrick J. Wong
@ 2016-06-17  1:17 ` Darrick J. Wong
  2016-06-20 12:35   ` Bart Van Assche
  2016-06-29  4:57   ` Martin K. Petersen
  2016-06-17  1:17 ` [PATCH 2/3] block: require write_same and discard requests align to logical block size Darrick J. Wong
  2016-06-17  1:17 ` [PATCH 3/3] block: implement (some of) fallocate for block devices Darrick J. Wong
  2 siblings, 2 replies; 9+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:17 UTC (permalink / raw)
  To: axboe, darrick.wong
  Cc: hch, tytso, martin.petersen, snitzer, linux-api, bfoster, xfs,
	linux-block, dm-devel, linux-fsdevel, Christoph Hellwig

Invalidate the page cache (as a regular O_DIRECT write would do) to avoid
returning stale cache contents at a later time.

v5: Refactor the 4.4 refactoring of the ioctl code into separate functions.
Split the page invalidation and the new ioctl into separate patches.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 block/ioctl.c |   29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)


diff --git a/block/ioctl.c b/block/ioctl.c
index ed2397f..d001f52 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -225,7 +225,9 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
 		unsigned long arg)
 {
 	uint64_t range[2];
-	uint64_t start, len;
+	struct address_space *mapping;
+	uint64_t start, end, len;
+	int ret;
 
 	if (!(mode & FMODE_WRITE))
 		return -EBADF;
@@ -235,18 +237,33 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
 
 	start = range[0];
 	len = range[1];
+	end = start + len - 1;
 
 	if (start & 511)
 		return -EINVAL;
 	if (len & 511)
 		return -EINVAL;
-	start >>= 9;
-	len >>= 9;
-
-	if (start + len > (i_size_read(bdev->bd_inode) >> 9))
+	if (end >= (uint64_t)i_size_read(bdev->bd_inode))
+		return -EINVAL;
+	if (end < start)
 		return -EINVAL;
 
-	return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL, false);
+	/* Invalidate the page cache, including dirty pages */
+	mapping = bdev->bd_inode->i_mapping;
+	truncate_inode_pages_range(mapping, start, end);
+
+	ret = blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
+				    false);
+	if (ret)
+		return ret;
+
+	/*
+	 * Invalidate again; if someone wandered in and dirtied a page,
+	 * the caller will be given -EBUSY.
+	 */
+	return invalidate_inode_pages2_range(mapping,
+					     start >> PAGE_SHIFT,
+					     end >> PAGE_SHIFT);
 }
 
 static int put_ushort(unsigned long arg, unsigned short val)

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/3] block: require write_same and discard requests align to logical block size
  2016-06-17  1:17 [PATCH v9 0/3] fallocate for block devices Darrick J. Wong
  2016-06-17  1:17 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong
@ 2016-06-17  1:17 ` Darrick J. Wong
  2016-06-20 12:37   ` Bart Van Assche
  2016-06-29  4:58   ` Martin K. Petersen
  2016-06-17  1:17 ` [PATCH 3/3] block: implement (some of) fallocate for block devices Darrick J. Wong
  2 siblings, 2 replies; 9+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:17 UTC (permalink / raw)
  To: axboe, darrick.wong
  Cc: hch, tytso, martin.petersen, snitzer, linux-api, bfoster, xfs,
	linux-block, dm-devel, linux-fsdevel, Christoph Hellwig

Make sure that the offset and length arguments that we're using to
construct WRITE SAME and DISCARD requests are actually aligned to the
logical block size.  Failure to do this causes other errors in other
parts of the block layer or the SCSI layer because disks don't support
partial logical block writes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-lib.c |   15 +++++++++++++++
 1 file changed, 15 insertions(+)


diff --git a/block/blk-lib.c b/block/blk-lib.c
index 9e29dc3..012aa98 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -29,6 +29,7 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	struct bio *bio = *biop;
 	unsigned int granularity;
 	int alignment;
+	sector_t bs_mask;
 
 	if (!q)
 		return -ENXIO;
@@ -37,6 +38,10 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	if ((type & REQ_SECURE) && !blk_queue_secdiscard(q))
 		return -EOPNOTSUPP;
 
+	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+	if ((sector | nr_sects) & bs_mask)
+		return -EINVAL;
+
 	/* Zero-sector (unknown) and one-sector granularities are the same.  */
 	granularity = max(q->limits.discard_granularity >> 9, 1U);
 	alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
@@ -140,10 +145,15 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 	unsigned int max_write_same_sectors;
 	struct bio *bio = NULL;
 	int ret = 0;
+	sector_t bs_mask;
 
 	if (!q)
 		return -ENXIO;
 
+	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+	if ((sector | nr_sects) & bs_mask)
+		return -EINVAL;
+
 	/* Ensure that max_write_same_sectors doesn't overflow bi_size */
 	max_write_same_sectors = UINT_MAX >> 9;
 
@@ -191,6 +201,11 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 	int ret;
 	struct bio *bio = NULL;
 	unsigned int sz;
+	sector_t bs_mask;
+
+	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+	if ((sector | nr_sects) & bs_mask)
+		return -EINVAL;
 
 	while (nr_sects != 0) {
 		bio = next_bio(bio, WRITE,

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/3] block: implement (some of) fallocate for block devices
  2016-06-17  1:17 [PATCH v9 0/3] fallocate for block devices Darrick J. Wong
  2016-06-17  1:17 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong
  2016-06-17  1:17 ` [PATCH 2/3] block: require write_same and discard requests align to logical block size Darrick J. Wong
@ 2016-06-17  1:17 ` Darrick J. Wong
  2 siblings, 0 replies; 9+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:17 UTC (permalink / raw)
  To: axboe, darrick.wong
  Cc: hch, tytso, martin.petersen, snitzer, linux-api, bfoster, xfs,
	linux-block, dm-devel, linux-fsdevel

After much discussion, it seems that the fallocate feature flag
FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been
whitelisted for zeroing SCSI UNMAP.  Punch still requires that
FALLOC_FL_KEEP_SIZE is set.  A length that goes past the end of the
device will be clamped to the device size if KEEP_SIZE is set; or will
return -EINVAL if not.  Both start and length must be aligned to the
device's logical block size.

Since the semantics of fallocate are fairly well established already,
wire up the two pieces.  The other fallocate variants (collapse range,
insert range, and allocate blocks) are not supported.

v2: Incorporate feedback from Christoph & Linus.  Tentatively add
a requirement that the fallocate arguments be aligned to logical block
size, and put in a few XXX comments ahead of LSF discussion.

v3: Forward port to 4.7.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/block_dev.c |   84 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/open.c      |    3 +-
 2 files changed, 86 insertions(+), 1 deletion(-)


diff --git a/fs/block_dev.c b/fs/block_dev.c
index 71ccab1..a3975c4 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -30,6 +30,7 @@
 #include <linux/cleancache.h>
 #include <linux/dax.h>
 #include <linux/badblocks.h>
+#include <linux/falloc.h>
 #include <asm/uaccess.h>
 #include "internal.h"
 
@@ -1802,6 +1803,88 @@ static const struct address_space_operations def_blk_aops = {
 	.is_dirty_writeback = buffer_check_dirty_writeback,
 };
 
+#define	BLKDEV_FALLOC_FL_SUPPORTED					\
+		(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |		\
+		 FALLOC_FL_ZERO_RANGE | FALLOC_FL_NO_HIDE_STALE)
+
+static long blkdev_fallocate(struct file *file, int mode, loff_t start,
+			     loff_t len)
+{
+	struct block_device *bdev = I_BDEV(bdev_file_inode(file));
+	struct request_queue *q = bdev_get_queue(bdev);
+	struct address_space *mapping;
+	loff_t end = start + len - 1;
+	loff_t isize;
+	int error;
+
+	/* Fail if we don't recognize the flags. */
+	if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
+		return -EOPNOTSUPP;
+
+	/* Don't go off the end of the device. */
+	isize = i_size_read(bdev->bd_inode);
+	if (start >= isize)
+		return -EINVAL;
+	if (end > isize) {
+		if (mode & FALLOC_FL_KEEP_SIZE) {
+			len = isize - start;
+			end = start + len - 1;
+		} else
+			return -EINVAL;
+	}
+
+	/*
+	 * Don't allow IO that isn't aligned to logical block size.
+	 */
+	if ((start | len) & (bdev_logical_block_size(bdev) - 1))
+		return -EINVAL;
+
+	/* Invalidate the page cache, including dirty pages. */
+	mapping = bdev->bd_inode->i_mapping;
+	truncate_inode_pages_range(mapping, start, end);
+
+	switch (mode) {
+	case FALLOC_FL_ZERO_RANGE:
+	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
+		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
+					    GFP_KERNEL, false);
+		if (error)
+			return error;
+		break;
+	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
+		/* Only punch if the device can do zeroing discard. */
+		if (!blk_queue_discard(q) || !q->limits.discard_zeroes_data)
+			return -EOPNOTSUPP;
+		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
+					     GFP_KERNEL, 0);
+		if (error)
+			return error;
+		break;
+	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
+		/*
+		 * XXX: a well known search engine vendor interprets this
+		 * flag (in other circumstances) to mean "I don't care if
+		 * we can read stale contents later".  Is it appropriate
+		 * to wire this up to the non-zeroing discard?
+		 */
+		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
+					     GFP_KERNEL, 0);
+		if (error)
+			return error;
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	/*
+	 * Invalidate again; if someone wandered in and dirtied a page,
+	 * the caller will be given -EBUSY;
+	 */
+	return invalidate_inode_pages2_range(mapping,
+					     start >> PAGE_SHIFT,
+					     end >> PAGE_SHIFT);
+}
+
 const struct file_operations def_blk_fops = {
 	.open		= blkdev_open,
 	.release	= blkdev_close,
@@ -1816,6 +1899,7 @@ const struct file_operations def_blk_fops = {
 #endif
 	.splice_read	= generic_file_splice_read,
 	.splice_write	= iter_file_splice_write,
+	.fallocate	= blkdev_fallocate,
 };
 
 int ioctl_by_bdev(struct block_device *bdev, unsigned cmd, unsigned long arg)
diff --git a/fs/open.c b/fs/open.c
index 93ae3cd..b2dbda4 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -289,7 +289,8 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 	 * Let individual file system decide if it supports preallocation
 	 * for directories or not.
 	 */
-	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
+	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode) &&
+	    !S_ISBLK(inode->i_mode))
 		return -ENODEV;
 
 	/* Check for wrap through zero too */

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT.
  2016-06-17  1:17 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong
@ 2016-06-20 12:35   ` Bart Van Assche
  2016-06-28 19:13     ` Darrick J. Wong
  2016-06-29  4:57   ` Martin K. Petersen
  1 sibling, 1 reply; 9+ messages in thread
From: Bart Van Assche @ 2016-06-20 12:35 UTC (permalink / raw)
  To: Darrick J. Wong, axboe
  Cc: hch, tytso, martin.petersen, snitzer, linux-api, bfoster, xfs,
	linux-block, dm-devel, linux-fsdevel, Christoph Hellwig

On 06/17/2016 03:18 AM, Darrick J. Wong wrote:
> Invalidate the page cache (as a regular O_DIRECT write would do) to avoid
> returning stale cache contents at a later time.
>
> v5: Refactor the 4.4 refactoring of the ioctl code into separate functions.
> Split the page invalidation and the new ioctl into separate patches.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>  block/ioctl.c |   29 +++++++++++++++++++++++------
>  1 file changed, 23 insertions(+), 6 deletions(-)
>
>
> diff --git a/block/ioctl.c b/block/ioctl.c
> index ed2397f..d001f52 100644
> --- a/block/ioctl.c
> +++ b/block/ioctl.c
> @@ -225,7 +225,9 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
>  		unsigned long arg)
>  {
>  	uint64_t range[2];
> -	uint64_t start, len;
> +	struct address_space *mapping;
> +	uint64_t start, end, len;
> +	int ret;
>
>  	if (!(mode & FMODE_WRITE))
>  		return -EBADF;
> @@ -235,18 +237,33 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
>
>  	start = range[0];
>  	len = range[1];
> +	end = start + len - 1;
>
>  	if (start & 511)
>  		return -EINVAL;
>  	if (len & 511)
>  		return -EINVAL;
> -	start >>= 9;
> -	len >>= 9;
> -
> -	if (start + len > (i_size_read(bdev->bd_inode) >> 9))
> +	if (end >= (uint64_t)i_size_read(bdev->bd_inode))
> +		return -EINVAL;
> +	if (end < start)
>  		return -EINVAL;
>
> -	return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL, false);
> +	/* Invalidate the page cache, including dirty pages */
> +	mapping = bdev->bd_inode->i_mapping;
> +	truncate_inode_pages_range(mapping, start, end);
> +
> +	ret = blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
> +				    false);
> +	if (ret)
> +		return ret;
> +
> +	/*
> +	 * Invalidate again; if someone wandered in and dirtied a page,
> +	 * the caller will be given -EBUSY.
> +	 */
> +	return invalidate_inode_pages2_range(mapping,
> +					     start >> PAGE_SHIFT,
> +					     end >> PAGE_SHIFT);
>  }

Hello Darrick,

Maybe this has already been discussed, but anyway: in the POSIX spec 
(http://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html) I 
found the following: "This volume of POSIX.1-2008 does not specify 
behavior of concurrent writes to a file from multiple processes. 
Applications should use some form of concurrency control."

Do we really need the invalidate_inode_pages2_range() call?

Thanks,

Bart.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/3] block: require write_same and discard requests align to logical block size
  2016-06-17  1:17 ` [PATCH 2/3] block: require write_same and discard requests align to logical block size Darrick J. Wong
@ 2016-06-20 12:37   ` Bart Van Assche
  2016-06-29  4:58   ` Martin K. Petersen
  1 sibling, 0 replies; 9+ messages in thread
From: Bart Van Assche @ 2016-06-20 12:37 UTC (permalink / raw)
  To: Darrick J. Wong, axboe
  Cc: hch, tytso, martin.petersen, snitzer, linux-api, bfoster, xfs,
	linux-block, dm-devel, linux-fsdevel, Bart Van Assche,
	Christoph Hellwig

On 06/17/2016 03:19 AM, Darrick J. Wong wrote:
> Make sure that the offset and length arguments that we're using to
> construct WRITE SAME and DISCARD requests are actually aligned to the
> logical block size.  Failure to do this causes other errors in other
> parts of the block layer or the SCSI layer because disks don't support
> partial logical block writes.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT.
  2016-06-20 12:35   ` Bart Van Assche
@ 2016-06-28 19:13     ` Darrick J. Wong
  0 siblings, 0 replies; 9+ messages in thread
From: Darrick J. Wong @ 2016-06-28 19:13 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: axboe, hch, tytso, martin.petersen, snitzer, linux-api, bfoster,
	xfs, linux-block, dm-devel, linux-fsdevel, Christoph Hellwig

On Mon, Jun 20, 2016 at 02:35:29PM +0200, Bart Van Assche wrote:
> On 06/17/2016 03:18 AM, Darrick J. Wong wrote:
> >Invalidate the page cache (as a regular O_DIRECT write would do) to avoid
> >returning stale cache contents at a later time.
> >
> >v5: Refactor the 4.4 refactoring of the ioctl code into separate functions.
> >Split the page invalidation and the new ioctl into separate patches.
> >
> >Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> >Reviewed-by: Christoph Hellwig <hch@lst.de>
> >---
> > block/ioctl.c |   29 +++++++++++++++++++++++------
> > 1 file changed, 23 insertions(+), 6 deletions(-)
> >
> >
> >diff --git a/block/ioctl.c b/block/ioctl.c
> >index ed2397f..d001f52 100644
> >--- a/block/ioctl.c
> >+++ b/block/ioctl.c
> >@@ -225,7 +225,9 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
> > 		unsigned long arg)
> > {
> > 	uint64_t range[2];
> >-	uint64_t start, len;
> >+	struct address_space *mapping;
> >+	uint64_t start, end, len;
> >+	int ret;
> >
> > 	if (!(mode & FMODE_WRITE))
> > 		return -EBADF;
> >@@ -235,18 +237,33 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
> >
> > 	start = range[0];
> > 	len = range[1];
> >+	end = start + len - 1;
> >
> > 	if (start & 511)
> > 		return -EINVAL;
> > 	if (len & 511)
> > 		return -EINVAL;
> >-	start >>= 9;
> >-	len >>= 9;
> >-
> >-	if (start + len > (i_size_read(bdev->bd_inode) >> 9))
> >+	if (end >= (uint64_t)i_size_read(bdev->bd_inode))
> >+		return -EINVAL;
> >+	if (end < start)
> > 		return -EINVAL;
> >
> >-	return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL, false);
> >+	/* Invalidate the page cache, including dirty pages */
> >+	mapping = bdev->bd_inode->i_mapping;
> >+	truncate_inode_pages_range(mapping, start, end);
> >+
> >+	ret = blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
> >+				    false);
> >+	if (ret)
> >+		return ret;
> >+
> >+	/*
> >+	 * Invalidate again; if someone wandered in and dirtied a page,
> >+	 * the caller will be given -EBUSY.
> >+	 */
> >+	return invalidate_inode_pages2_range(mapping,
> >+					     start >> PAGE_SHIFT,
> >+					     end >> PAGE_SHIFT);
> > }
> 
> Hello Darrick,
> 
> Maybe this has already been discussed, but anyway: in the POSIX spec
> (http://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html) I
> found the following: "This volume of POSIX.1-2008 does not specify behavior
> of concurrent writes to a file from multiple processes. Applications should
> use some form of concurrency control."
> 
> Do we really need the invalidate_inode_pages2_range() call?

It's not strictly necessary.  I like the idea of having the kernel bonking
userspace when they don't coordinate and collide, but we could just jump
out after the blkdev_*() calls and let userspace fend for themselves. :)

--D

> 
> Thanks,
> 
> Bart.
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT.
  2016-06-17  1:17 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong
  2016-06-20 12:35   ` Bart Van Assche
@ 2016-06-29  4:57   ` Martin K. Petersen
  1 sibling, 0 replies; 9+ messages in thread
From: Martin K. Petersen @ 2016-06-29  4:57 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: axboe, linux-block, tytso, martin.petersen, snitzer, linux-api,
	bfoster, xfs, hch, dm-devel, linux-fsdevel, Christoph Hellwig

>>>>> "Darrick" == Darrick J Wong <darrick.wong@oracle.com> writes:

Darrick> Invalidate the page cache (as a regular O_DIRECT write would
Darrick> do) to avoid returning stale cache contents at a later time.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/3] block: require write_same and discard requests align to logical block size
  2016-06-17  1:17 ` [PATCH 2/3] block: require write_same and discard requests align to logical block size Darrick J. Wong
  2016-06-20 12:37   ` Bart Van Assche
@ 2016-06-29  4:58   ` Martin K. Petersen
  1 sibling, 0 replies; 9+ messages in thread
From: Martin K. Petersen @ 2016-06-29  4:58 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: axboe, linux-block, tytso, martin.petersen, snitzer, linux-api,
	bfoster, xfs, hch, dm-devel, linux-fsdevel, Christoph Hellwig

>>>>> "Darrick" == Darrick J Wong <darrick.wong@oracle.com> writes:

Darrick> Make sure that the offset and length arguments that we're using
Darrick> to construct WRITE SAME and DISCARD requests are actually
Darrick> aligned to the logical block size.  Failure to do this causes
Darrick> other errors in other parts of the block layer or the SCSI
Darrick> layer because disks don't support partial logical block writes.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-06-29  4:58 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-17  1:17 [PATCH v9 0/3] fallocate for block devices Darrick J. Wong
2016-06-17  1:17 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong
2016-06-20 12:35   ` Bart Van Assche
2016-06-28 19:13     ` Darrick J. Wong
2016-06-29  4:57   ` Martin K. Petersen
2016-06-17  1:17 ` [PATCH 2/3] block: require write_same and discard requests align to logical block size Darrick J. Wong
2016-06-20 12:37   ` Bart Van Assche
2016-06-29  4:58   ` Martin K. Petersen
2016-06-17  1:17 ` [PATCH 3/3] block: implement (some of) fallocate for block devices Darrick J. Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).