* [RFC DONOTMERGE v8 0/3] fallocate for block devices
@ 2016-04-13  4:01 ` Darrick J. Wong
  0 siblings, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2016-04-13  4:01 UTC (permalink / raw)
  To: darrick.wong
  Cc: axboe, hch, tytso, martin.petersen, snitzer, linux-api, bfoster,
	xfs, linux-block, dm-devel, linux-fsdevel

Hi,

This is a redesign of the patch series that fixes various interface
problems with the existing "zero out this part of a block device"
code.  BLKZEROOUT2 is gone.

The first patch is still a fix to the existing BLKZEROOUT ioctl to
invalidate the page cache if the zeroing command to the underlying
device succeeds.

The second patch changes the internal block device functions to reject
discard or zero-out requests that are not aligned to the logical
block size.  Previously, we only checked that the start/len parameters
were 512-byte aligned, which caused kernel BUG_ONs for unaligned IOs
to 4k-LBA devices.
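
As a rough illustration of the gap (not part of the patch), here is a
minimal userspace sketch of the sector-based check.  The helper name
range_is_lbs_aligned is hypothetical; the mask arithmetic follows the
bs_mask computation in the second patch, and the sketch assumes the
logical block size is a power of two of at least 512 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when a range expressed in 512-byte
 * sectors is aligned to the device's logical block size (in bytes).
 */
static bool range_is_lbs_aligned(uint64_t sector, uint64_t nr_sects,
				 unsigned int lbs_bytes)
{
	uint64_t bs_mask;

	/* Assumption: lbs_bytes is a power of two and at least 512. */
	assert(lbs_bytes >= 512 && (lbs_bytes & (lbs_bytes - 1)) == 0);

	/* Sectors per logical block, minus one, is the low-bit mask. */
	bs_mask = (lbs_bytes >> 9) - 1;

	/* Both the start and the length must have those low bits clear. */
	return ((sector | nr_sects) & bs_mask) == 0;
}
```

On a 512-byte-LBA device the mask is 0, so every sector range passes.
On a 4k-LBA device the mask is 7, so a range starting at sector 1
(byte offset 512) is rejected even though it passes a 512-byte check.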

The third patch creates an fallocate handler for block devices, wires
up the FALLOC_FL_PUNCH_HOLE flag to zeroing-discard, and connects
FALLOC_FL_ZERO_RANGE to write-same so that we can have a consistent
fallocate interface between files and block devices.  It also allows
the combination of PUNCH_HOLE and NO_HIDE_STALE to invoke non-zeroing
discard.
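
The flag-to-operation mapping can be sketched in plain C as below.
This is only a model of the switch statement in the third patch, not
kernel code: the enum bdev_op, the function bdev_falloc_op, and the
single can_zeroing_discard parameter (standing in for the queue's
discard and discard_zeroes_data checks) are all hypothetical names.
The flag values are the ones from the uapi header linux/falloc.h.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values as defined in the Linux uapi header <linux/falloc.h>. */
#ifndef FALLOC_FL_KEEP_SIZE
#define FALLOC_FL_KEEP_SIZE	0x01
#endif
#ifndef FALLOC_FL_PUNCH_HOLE
#define FALLOC_FL_PUNCH_HOLE	0x02
#endif
#ifndef FALLOC_FL_NO_HIDE_STALE
#define FALLOC_FL_NO_HIDE_STALE	0x04
#endif
#ifndef FALLOC_FL_ZERO_RANGE
#define FALLOC_FL_ZERO_RANGE	0x10
#endif

/* Hypothetical labels for the three block-layer operations. */
enum bdev_op {
	BDEV_OP_ZEROOUT,		/* write-same / zeroout path */
	BDEV_OP_ZEROING_DISCARD,	/* discard that reads back zeroes */
	BDEV_OP_DISCARD,		/* discard, stale data allowed */
};

/* Returns 0 and stores the operation in *op, or -EOPNOTSUPP. */
static int bdev_falloc_op(int mode, bool can_zeroing_discard,
			  enum bdev_op *op)
{
	assert(op != NULL);

	switch (mode) {
	case FALLOC_FL_ZERO_RANGE:
	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
		*op = BDEV_OP_ZEROOUT;
		return 0;
	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
		/* Plain punch is refused unless discard zeroes data. */
		if (!can_zeroing_discard)
			return -EOPNOTSUPP;
		*op = BDEV_OP_ZEROING_DISCARD;
		return 0;
	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE |
	     FALLOC_FL_NO_HIDE_STALE:
		*op = BDEV_OP_DISCARD;
		return 0;
	default:
		/* Includes PUNCH_HOLE without KEEP_SIZE. */
		return -EOPNOTSUPP;
	}
}
```

Note that the mode is matched as a whole, so any flag combination not
listed falls through to -EOPNOTSUPP rather than being partly honored.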

This patchset is not meant to go upstream; it is a starting point for
a discussion at LSF.  Don't merge this!  Foremost in my mind are three
questions: whether we require the offset/len parameters to be aligned
to the logical block size or to minimum_io_size; what error code to
return for unaligned values; and whether we should accept arbitrary
byte ranges and zero the unaligned partial blocks through the page
cache (as file fallocate does now).  It'll also be a jumping-off point
for Brian Foster's and Mike Snitzer's patches to allow bdev clients to
ask that space be allocated to a range, and to plumb that out to
userspace.

Test cases for the new block device fallocate have been submitted to
the xfstests list as generic/70[5-7], though the latest versions of
those test cases will be attached to this patchset for convenience.

Comments and questions are, as always, welcome.  Patches are against
4.6-rc3.

v7: Strengthen parameter checking and fix various code issues pointed
out by Linus and Christoph.
v8: More code rearranging, rebase to 4.6-rc3, and dig into alignment
issues.

--D

* [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT.
  2016-04-13  4:01 ` Darrick J. Wong
  (?)
@ 2016-04-13  4:01   ` Darrick J. Wong
  -1 siblings, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2016-04-13  4:01 UTC (permalink / raw)
  To: darrick.wong
  Cc: axboe, hch, tytso, martin.petersen, snitzer, linux-api, bfoster,
	xfs, linux-block, dm-devel, linux-fsdevel, Christoph Hellwig

Invalidate the page cache (as a regular O_DIRECT write would do) to avoid
returning stale cache contents at a later time.

v5: Refactor the 4.4 refactoring of the ioctl code into separate functions.
Split the page invalidation and the new ioctl into separate patches.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 block/ioctl.c |   29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)


diff --git a/block/ioctl.c b/block/ioctl.c
index 4ff1f92..52b60b2 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -226,7 +226,9 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
 		unsigned long arg)
 {
 	uint64_t range[2];
-	uint64_t start, len;
+	struct address_space *mapping;
+	uint64_t start, end, len;
+	int ret;
 
 	if (!(mode & FMODE_WRITE))
 		return -EBADF;
@@ -236,18 +238,33 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
 
 	start = range[0];
 	len = range[1];
+	end = start + len - 1;
 
 	if (start & 511)
 		return -EINVAL;
 	if (len & 511)
 		return -EINVAL;
-	start >>= 9;
-	len >>= 9;
-
-	if (start + len > (i_size_read(bdev->bd_inode) >> 9))
+	if (end >= (uint64_t)i_size_read(bdev->bd_inode))
+		return -EINVAL;
+	if (end < start)
 		return -EINVAL;
 
-	return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL, false);
+	/* Invalidate the page cache, including dirty pages */
+	mapping = bdev->bd_inode->i_mapping;
+	truncate_inode_pages_range(mapping, start, end);
+
+	ret = blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
+				    false);
+	if (ret)
+		return ret;
+
+	/*
+	 * Invalidate again; if someone wandered in and dirtied a page,
+	 * the caller will be given -EBUSY.
+	 */
+	return invalidate_inode_pages2_range(mapping,
+					     start >> PAGE_SHIFT,
+					     end >> PAGE_SHIFT);
 }
 
 static int put_ushort(unsigned long arg, unsigned short val)
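
For readers following the range checks in the hunk above, here is a
small userspace model of them.  The function name check_zeroout_range
and its out-parameter are hypothetical; the comparisons are the ones
in the patch, with end being the inclusive last byte of the range.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: validate a byte range against a device of
 * dev_size bytes.  Returns 0 and stores the inclusive last byte in
 * *end_out, or returns -EINVAL.
 */
static int check_zeroout_range(uint64_t start, uint64_t len,
			       uint64_t dev_size, uint64_t *end_out)
{
	uint64_t end;

	assert(end_out != NULL);

	/* Inclusive end; wraps around when len is 0 or on overflow. */
	end = start + len - 1;

	/* Both values must be multiples of 512 bytes. */
	if ((start & 511) || (len & 511))
		return -EINVAL;
	/* The last byte must lie inside the device. */
	if (end >= dev_size)
		return -EINVAL;
	/* Catches a zero length and unsigned wraparound. */
	if (end < start)
		return -EINVAL;

	*end_out = end;
	return 0;
}
```

The final comparison matters because the arithmetic is unsigned: a
zero length or an overflowing start + len wraps to a small end value
that could otherwise slip past the device-size check.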


* [PATCH 2/3] block: require write_same and discard requests align to logical block size
  2016-04-13  4:01 ` Darrick J. Wong
@ 2016-04-13  4:01   ` Darrick J. Wong
  -1 siblings, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2016-04-13  4:01 UTC (permalink / raw)
  To: darrick.wong
  Cc: axboe, hch, tytso, martin.petersen, snitzer, linux-api, bfoster,
	xfs, linux-block, dm-devel, linux-fsdevel, Christoph Hellwig

Make sure that the offset and length arguments that we're using to
construct WRITE SAME and DISCARD requests are actually aligned to the
logical block size.  Failure to do so causes errors elsewhere in the
block layer or the SCSI layer, because disks don't support partial
logical block writes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-lib.c |   15 +++++++++++++++
 1 file changed, 15 insertions(+)


diff --git a/block/blk-lib.c b/block/blk-lib.c
index 9ebf653..9dca6bb 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -49,6 +49,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	struct bio *bio;
 	int ret = 0;
 	struct blk_plug plug;
+	sector_t bs_mask;
 
 	if (!q)
 		return -ENXIO;
@@ -56,6 +57,10 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	if (!blk_queue_discard(q))
 		return -EOPNOTSUPP;
 
+	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+	if ((sector | nr_sects) & bs_mask)
+		return -EINVAL;
+
 	/* Zero-sector (unknown) and one-sector granularities are the same.  */
 	granularity = max(q->limits.discard_granularity >> 9, 1U);
 	alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
@@ -148,6 +153,7 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 	DECLARE_COMPLETION_ONSTACK(wait);
 	struct request_queue *q = bdev_get_queue(bdev);
 	unsigned int max_write_same_sectors;
+	sector_t bs_mask;
 	struct bio_batch bb;
 	struct bio *bio;
 	int ret = 0;
@@ -155,6 +161,10 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 	if (!q)
 		return -ENXIO;
 
+	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+	if ((sector | nr_sects) & bs_mask)
+		return -EINVAL;
+
 	/* Ensure that max_write_same_sectors doesn't overflow bi_size */
 	max_write_same_sectors = UINT_MAX >> 9;
 
@@ -218,9 +228,14 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 	int ret;
 	struct bio *bio;
 	struct bio_batch bb;
+	sector_t bs_mask;
 	unsigned int sz;
 	DECLARE_COMPLETION_ONSTACK(wait);
 
+	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+	if ((sector | nr_sects) & bs_mask)
+		return -EINVAL;
+
 	atomic_set(&bb.done, 1);
 	bb.error = 0;
 	bb.wait = &wait;


* [PATCH 3/3] block: implement (some of) fallocate for block devices
  2016-04-13  4:01 ` Darrick J. Wong
  (?)
@ 2016-04-13  4:01   ` Darrick J. Wong
  -1 siblings, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2016-04-13  4:01 UTC (permalink / raw)
  To: darrick.wong
  Cc: axboe, hch, tytso, martin.petersen, snitzer, linux-api, bfoster,
	xfs, linux-block, dm-devel, linux-fsdevel

After much discussion, it seems that the fallocate feature flag
FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been
whitelisted for zeroing SCSI UNMAP.  Punch still requires that
FALLOC_FL_KEEP_SIZE is set.  A length that goes past the end of the
device will be clamped to the device size if KEEP_SIZE is set; or will
return -EINVAL if not.  Both start and length must be aligned to the
device's logical block size.

Since the semantics of fallocate are fairly well established already,
wire up the two pieces.  The other fallocate variants (collapse range,
insert range, and allocate blocks) are not supported.

v2: Incorporate feedback from Christoph & Linus.  Tentatively add
a requirement that the fallocate arguments be aligned to logical block
size, and put in a few XXX comments ahead of LSF discussion.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/block_dev.c |   87 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/open.c      |    3 +-
 2 files changed, 89 insertions(+), 1 deletion(-)


diff --git a/fs/block_dev.c b/fs/block_dev.c
index 20a2c02..5c8eb0c 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -29,6 +29,7 @@
 #include <linux/log2.h>
 #include <linux/cleancache.h>
 #include <linux/dax.h>
+#include <linux/falloc.h>
 #include <asm/uaccess.h>
 #include "internal.h"
 
@@ -1790,6 +1791,91 @@ static int blkdev_mmap(struct file *file, struct vm_area_struct *vma)
 #define blkdev_mmap generic_file_mmap
 #endif
 
+#define	BLKDEV_FALLOC_FL_SUPPORTED					\
+		(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |		\
+		 FALLOC_FL_ZERO_RANGE | FALLOC_FL_NO_HIDE_STALE)
+
+static long blkdev_fallocate(struct file *file, int mode, loff_t start,
+			     loff_t len)
+{
+	struct block_device *bdev = I_BDEV(bdev_file_inode(file));
+	struct request_queue *q = bdev_get_queue(bdev);
+	struct address_space *mapping;
+	loff_t end = start + len - 1;
+	loff_t isize;
+	int error;
+
+	/* Fail if we don't recognize the flags. */
+	if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
+		return -EOPNOTSUPP;
+
+	/* Don't go off the end of the device. */
+	isize = i_size_read(bdev->bd_inode);
+	if (start >= isize)
+		return -EINVAL;
+	if (end > isize) {
+		if (mode & FALLOC_FL_KEEP_SIZE) {
+			len = isize - start;
+			end = start + len - 1;
+		} else
+			return -EINVAL;
+	}
+
+	/*
+	 * Don't allow IO that isn't aligned to logical block size.
+	 * XXX: Should zero and punch write zeroes through the page cache
+	 *      if start or end aren't lbs aligned?
+	 * XXX: What about thinp which prefers io_min alignment?
+	 */
+	if ((start | len) & bdev_logical_block_size(bdev) - 1)
+		return -EINVAL;
+
+	/* Invalidate the page cache, including dirty pages. */
+	mapping = bdev->bd_inode->i_mapping;
+	truncate_inode_pages_range(mapping, start, end);
+
+	switch (mode) {
+	case FALLOC_FL_ZERO_RANGE:
+	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
+		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
+					    GFP_KERNEL, false);
+		if (error)
+			return error;
+		break;
+	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
+		/* Only punch if the device can do zeroing discard. */
+		if (!blk_queue_discard(q) || !q->limits.discard_zeroes_data)
+			return -EOPNOTSUPP;
+		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
+					     GFP_KERNEL, 0);
+		if (error)
+			return error;
+		break;
+	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
+		/*
+		 * XXX: a well known search engine vendor interprets this
+		 * flag (in other circumstances) to mean "I don't care if
+		 * we can read stale contents later".  Is it appropriate
+		 * to wire this up to the non-zeroing discard?
+		 */
+		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
+					     GFP_KERNEL, 0);
+		if (error)
+			return error;
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	/*
+	 * Invalidate again; if someone wandered in and dirtied a page,
+	 * the caller will be given -EBUSY;
+	 */
+	return invalidate_inode_pages2_range(mapping,
+					     start >> PAGE_SHIFT,
+					     end >> PAGE_SHIFT);
+}
+
 const struct file_operations def_blk_fops = {
 	.open		= blkdev_open,
 	.release	= blkdev_close,
@@ -1804,6 +1890,7 @@ const struct file_operations def_blk_fops = {
 #endif
 	.splice_read	= generic_file_splice_read,
 	.splice_write	= iter_file_splice_write,
+	.fallocate	= blkdev_fallocate,
 };
 
 int ioctl_by_bdev(struct block_device *bdev, unsigned cmd, unsigned long arg)
diff --git a/fs/open.c b/fs/open.c
index 17cb6b1..f9ebe32 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -289,7 +289,8 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 	 * Let individual file system decide if it supports preallocation
 	 * for directories or not.
 	 */
-	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
+	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode) &&
+	    !S_ISBLK(inode->i_mode))
 		return -ENODEV;
 
 	/* Check for wrap through zero too */
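
The end-of-device rule from the commit message (clamp when KEEP_SIZE
is set, otherwise fail with -EINVAL) can be modeled in userspace as
below.  This is a sketch under stated assumptions, not the patch's
code: the name clamp_falloc_range is hypothetical, and it compares the
length against the bytes remaining after start instead of computing an
inclusive end, which is one way to read the intended behavior while
avoiding signed overflow.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: apply the end-of-device rule to a request.
 * On success returns 0 and leaves the (possibly shortened) length in
 * *len; otherwise returns -EINVAL.
 */
static int clamp_falloc_range(int64_t start, int64_t *len,
			      int64_t dev_size, bool keep_size)
{
	assert(len != NULL);

	/* Assumption: the caller hands us non-negative, non-empty input. */
	if (start < 0 || *len <= 0)
		return -EINVAL;

	/* A range that starts at or past the end is always rejected. */
	if (start >= dev_size)
		return -EINVAL;

	/* dev_size - start is positive here, so this cannot overflow. */
	if (*len > dev_size - start) {
		if (!keep_size)
			return -EINVAL;
		*len = dev_size - start;	/* clamp to the device end */
	}
	return 0;
}
```

With an 8192-byte device, a request for 8192 bytes at offset 4096 is
shortened to 4096 bytes when keep_size is true and rejected when it is
false.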


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 3/3] block: implement (some of) fallocate for block devices
@ 2016-04-13  4:01   ` Darrick J. Wong
  0 siblings, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2016-04-13  4:01 UTC (permalink / raw)
  To: darrick.wong
  Cc: axboe, linux-block, tytso, martin.petersen, snitzer, linux-api,
	bfoster, xfs, hch, dm-devel, linux-fsdevel

After much discussion, it seems that the fallocate feature flag
FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been
whitelisted for zeroing SCSI UNMAP.  Punch still requires that
FALLOC_FL_KEEP_SIZE is set.  A length that goes past the end of the
device will be clamped to the device size if KEEP_SIZE is set; or will
return -EINVAL if not.  Both start and length must be aligned to the
device's logical block size.

Since the semantics of fallocate are fairly well established already,
wire up the two pieces.  The other fallocate variants (collapse range,
insert range, and allocate blocks) are not supported.

v2: Incorporate feedback from Christoph & Linus.  Tentatively add
a requirement that the fallocate arguments be aligned to logical block
size, and put in a few XXX comments ahead of LSF discussion.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/block_dev.c |   87 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/open.c      |    3 +-
 2 files changed, 89 insertions(+), 1 deletion(-)


diff --git a/fs/block_dev.c b/fs/block_dev.c
index 20a2c02..5c8eb0c 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -29,6 +29,7 @@
 #include <linux/log2.h>
 #include <linux/cleancache.h>
 #include <linux/dax.h>
+#include <linux/falloc.h>
 #include <asm/uaccess.h>
 #include "internal.h"
 
@@ -1790,6 +1791,91 @@ static int blkdev_mmap(struct file *file, struct vm_area_struct *vma)
 #define blkdev_mmap generic_file_mmap
 #endif
 
+#define	BLKDEV_FALLOC_FL_SUPPORTED					\
+		(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |		\
+		 FALLOC_FL_ZERO_RANGE | FALLOC_FL_NO_HIDE_STALE)
+
+static long blkdev_fallocate(struct file *file, int mode, loff_t start,
+			     loff_t len)
+{
+	struct block_device *bdev = I_BDEV(bdev_file_inode(file));
+	struct request_queue *q = bdev_get_queue(bdev);
+	struct address_space *mapping;
+	loff_t end = start + len - 1;
+	loff_t isize;
+	int error;
+
+	/* Fail if we don't recognize the flags. */
+	if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
+		return -EOPNOTSUPP;
+
+	/* Don't go off the end of the device. */
+	isize = i_size_read(bdev->bd_inode);
+	if (start >= isize)
+		return -EINVAL;
+	if (end > isize) {
+		if (mode & FALLOC_FL_KEEP_SIZE) {
+			len = isize - start;
+			end = start + len - 1;
+		} else
+			return -EINVAL;
+	}
+
+	/*
+	 * Don't allow IO that isn't aligned to logical block size.
+	 * XXX: Should zero and punch write zeroes through the page cache
+	 *      if start or end aren't lbs aligned?
+	 * XXX: What about thinp which prefers io_min alignment?
+	 */
+	if ((start | len) & bdev_logical_block_size(bdev) - 1)
+		return -EINVAL;
+
+	/* Invalidate the page cache, including dirty pages. */
+	mapping = bdev->bd_inode->i_mapping;
+	truncate_inode_pages_range(mapping, start, end);
+
+	switch (mode) {
+	case FALLOC_FL_ZERO_RANGE:
+	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
+		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
+					    GFP_KERNEL, false);
+		if (error)
+			return error;
+		break;
+	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
+		/* Only punch if the device can do zeroing discard. */
+		if (!blk_queue_discard(q) || !q->limits.discard_zeroes_data)
+			return -EOPNOTSUPP;
+		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
+					     GFP_KERNEL, 0);
+		if (error)
+			return error;
+		break;
+	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
+		/*
+		 * XXX: a well known search engine vendor interprets this
+		 * flag (in other circumstances) to mean "I don't care if
+		 * we can read stale contents later".  Is it appropriate
+		 * to wire this up to the non-zeroing discard?
+		 */
+		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
+					     GFP_KERNEL, 0);
+		if (error)
+			return error;
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	/*
+	 * Invalidate again; if someone wandered in and dirtied a page,
+	 * the caller will be given -EBUSY;
+	 */
+	return invalidate_inode_pages2_range(mapping,
+					     start >> PAGE_SHIFT,
+					     end >> PAGE_SHIFT);
+}
+
 const struct file_operations def_blk_fops = {
 	.open		= blkdev_open,
 	.release	= blkdev_close,
@@ -1804,6 +1890,7 @@ const struct file_operations def_blk_fops = {
 #endif
 	.splice_read	= generic_file_splice_read,
 	.splice_write	= iter_file_splice_write,
+	.fallocate	= blkdev_fallocate,
 };
 
 int ioctl_by_bdev(struct block_device *bdev, unsigned cmd, unsigned long arg)
diff --git a/fs/open.c b/fs/open.c
index 17cb6b1..f9ebe32 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -289,7 +289,8 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 	 * Let individual file system decide if it supports preallocation
 	 * for directories or not.
 	 */
-	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
+	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode) &&
+	    !S_ISBLK(inode->i_mode))
 		return -ENODEV;
 
 	/* Check for wrap through zero too */

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* [PATCH 3/3] block: implement (some of) fallocate for block devices
@ 2016-04-13  4:01   ` Darrick J. Wong
  0 siblings, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2016-04-13  4:01 UTC (permalink / raw)
  To: darrick.wong
  Cc: axboe, hch, tytso, martin.petersen, snitzer, linux-api, bfoster,
	xfs, linux-block, dm-devel, linux-fsdevel

After much discussion, it seems that the fallocate feature flag
FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been
whitelisted for zeroing SCSI UNMAP.  Punch still requires that
FALLOC_FL_KEEP_SIZE is set.  A range that extends past the end of the
device is clamped to the device size if KEEP_SIZE is set; otherwise the
call fails with -EINVAL.  Both start and length must be aligned to the
device's logical block size.
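
As an illustration only (not part of the patch), the range rules above
can be modelled in userspace C.  The function name and signature are
hypothetical; the sketch assumes the checks run in the order described
here: reject a start at or past the end of the device, clamp or reject
a range that runs past the end, then require logical block alignment.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model of the range rules; not kernel code.
 * Returns 0 and stores the (possibly clamped) length in *len_out,
 * or returns a negative errno value.
 */
int bdev_falloc_check_range(int64_t start, int64_t len, int64_t dev_size,
			    uint32_t lbs, bool keep_size, int64_t *len_out)
{
	/* Logical block sizes are powers of two, so a mask test works. */
	assert(lbs != 0 && (lbs & (lbs - 1)) == 0);

	/* vfs_fallocate() already rejects these before any handler runs. */
	if (start < 0 || len <= 0)
		return -EINVAL;

	/* A range that starts at or past the end of the device is invalid. */
	if (start >= dev_size)
		return -EINVAL;

	/* Past the end: clamp with KEEP_SIZE, fail without it. */
	if (len > dev_size - start) {
		if (!keep_size)
			return -EINVAL;
		len = dev_size - start;
	}

	/* Both start and the (clamped) length must be block aligned. */
	if ((start | len) & (int64_t)(lbs - 1))
		return -EINVAL;

	*len_out = len;
	return 0;
}
```

On a 4 MiB device with 512-byte blocks, a request starting at 3 MiB
with a length of 4 MiB and KEEP_SIZE set is clamped to 1 MiB; without
KEEP_SIZE the same request fails with -EINVAL.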

Since the semantics of fallocate are fairly well established already,
wire up the two pieces.  The other fallocate variants (collapse range,
insert range, and allocate blocks) are not supported.
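
For readers who want the flag combinations at a glance, here is a
small userspace sketch of the dispatch (illustrative, not the kernel
code).  The enum and function names are made up for this example; only
the FALLOC_FL_* constants come from <linux/falloc.h>.  It assumes that
only the exact mode combinations named in this patch are accepted and
that everything else is reported as unsupported.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/falloc.h>

/* Hypothetical labels for the block layer operation a mode selects. */
enum bdev_falloc_op {
	BDEV_OP_UNSUPPORTED,
	BDEV_OP_ZEROOUT,		/* WRITE SAME or an equivalent */
	BDEV_OP_ZEROING_DISCARD,	/* discard that reads back as zeroes */
	BDEV_OP_DISCARD,		/* discard, stale data may remain */
};

/* Map an fallocate mode to the operation described in this patch. */
enum bdev_falloc_op bdev_falloc_classify(int mode)
{
	switch (mode) {
	case FALLOC_FL_ZERO_RANGE:
	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
		return BDEV_OP_ZEROOUT;
	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
		return BDEV_OP_ZEROING_DISCARD;
	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE |
	     FALLOC_FL_NO_HIDE_STALE:
		return BDEV_OP_DISCARD;
	default:
		/* Plain allocate, collapse range, insert range, ... */
		return BDEV_OP_UNSUPPORTED;
	}
}

/* Human readable name, handy for debugging output. */
const char *bdev_falloc_op_name(enum bdev_falloc_op op)
{
	static const char *const names[] = {
		"unsupported", "zeroout", "zeroing discard", "discard",
	};

	assert((size_t)op < sizeof(names) / sizeof(names[0]));
	return names[op];
}
```

A mode of 0 (plain allocation) and FALLOC_FL_PUNCH_HOLE without
FALLOC_FL_KEEP_SIZE both land in the default branch, which matches the
"not supported" statement above.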

v2: Incorporate feedback from Christoph & Linus.  Tentatively add
a requirement that the fallocate arguments be aligned to logical block
size, and put in a few XXX comments ahead of LSF discussion.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/block_dev.c |   87 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/open.c      |    3 +-
 2 files changed, 89 insertions(+), 1 deletion(-)


diff --git a/fs/block_dev.c b/fs/block_dev.c
index 20a2c02..5c8eb0c 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -29,6 +29,7 @@
 #include <linux/log2.h>
 #include <linux/cleancache.h>
 #include <linux/dax.h>
+#include <linux/falloc.h>
 #include <asm/uaccess.h>
 #include "internal.h"
 
@@ -1790,6 +1791,91 @@ static int blkdev_mmap(struct file *file, struct vm_area_struct *vma)
 #define blkdev_mmap generic_file_mmap
 #endif
 
+#define	BLKDEV_FALLOC_FL_SUPPORTED					\
+		(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |		\
+		 FALLOC_FL_ZERO_RANGE | FALLOC_FL_NO_HIDE_STALE)
+
+static long blkdev_fallocate(struct file *file, int mode, loff_t start,
+			     loff_t len)
+{
+	struct block_device *bdev = I_BDEV(bdev_file_inode(file));
+	struct request_queue *q = bdev_get_queue(bdev);
+	struct address_space *mapping;
+	loff_t end = start + len - 1;
+	loff_t isize;
+	int error;
+
+	/* Fail if we don't recognize the flags. */
+	if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
+		return -EOPNOTSUPP;
+
+	/* Don't go off the end of the device. */
+	isize = i_size_read(bdev->bd_inode);
+	if (start >= isize)
+		return -EINVAL;
+	if (end >= isize) {
+		if (mode & FALLOC_FL_KEEP_SIZE) {
+			len = isize - start;
+			end = start + len - 1;
+		} else
+			return -EINVAL;
+	}
+
+	/*
+	 * Don't allow IO that isn't aligned to logical block size.
+	 * XXX: Should zero and punch write zeroes through the page cache
+	 *      if start or end aren't lbs aligned?
+	 * XXX: What about thinp which prefers io_min alignment?
+	 */
+	if ((start | len) & (bdev_logical_block_size(bdev) - 1))
+		return -EINVAL;
+
+	/* Invalidate the page cache, including dirty pages. */
+	mapping = bdev->bd_inode->i_mapping;
+	truncate_inode_pages_range(mapping, start, end);
+
+	switch (mode) {
+	case FALLOC_FL_ZERO_RANGE:
+	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
+		error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
+					    GFP_KERNEL, false);
+		if (error)
+			return error;
+		break;
+	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
+		/* Only punch if the device can do zeroing discard. */
+		if (!blk_queue_discard(q) || !q->limits.discard_zeroes_data)
+			return -EOPNOTSUPP;
+		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
+					     GFP_KERNEL, 0);
+		if (error)
+			return error;
+		break;
+	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
+		/*
+		 * XXX: a well known search engine vendor interprets this
+		 * flag (in other circumstances) to mean "I don't care if
+		 * we can read stale contents later".  Is it appropriate
+		 * to wire this up to the non-zeroing discard?
+		 */
+		error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
+					     GFP_KERNEL, 0);
+		if (error)
+			return error;
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	/*
+	 * Invalidate again; if someone wandered in and dirtied a page,
+	 * the caller will be given -EBUSY.
+	 */
+	return invalidate_inode_pages2_range(mapping,
+					     start >> PAGE_SHIFT,
+					     end >> PAGE_SHIFT);
+}
+
 const struct file_operations def_blk_fops = {
 	.open		= blkdev_open,
 	.release	= blkdev_close,
@@ -1804,6 +1890,7 @@ const struct file_operations def_blk_fops = {
 #endif
 	.splice_read	= generic_file_splice_read,
 	.splice_write	= iter_file_splice_write,
+	.fallocate	= blkdev_fallocate,
 };
 
 int ioctl_by_bdev(struct block_device *bdev, unsigned cmd, unsigned long arg)
diff --git a/fs/open.c b/fs/open.c
index 17cb6b1..f9ebe32 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -289,7 +289,8 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 	 * Let individual file system decide if it supports preallocation
 	 * for directories or not.
 	 */
-	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
+	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode) &&
+	    !S_ISBLK(inode->i_mode))
 		return -ENODEV;
 
 	/* Check for wrap through zero too */

* [PATCH 4/3] block: test fallocate for block devices
  2016-04-13  4:01 ` Darrick J. Wong
@ 2016-04-13  4:04   ` Darrick J. Wong
  -1 siblings, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2016-04-13  4:04 UTC (permalink / raw)
  To: darrick.wong
  Cc: axboe, hch, tytso, martin.petersen, snitzer, linux-api, bfoster,
	xfs, linux-block, dm-devel, linux-fsdevel, fstests

Now that we're wiring up fallocate's PUNCH_HOLE and ZERO_RANGE
features for block devices, add some tests to make sure they
work correctly.

v2: Update tests to reflect EOD clamping suggested by Linus.
Note that the VFS fallocate makes us play some weird games wrt
MAX_LFS_FILESIZE.
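
To make the MAX_LFS_FILESIZE games a little more concrete, here is a
userspace sketch (illustrative only) of the limit the tests compute
and of the size check that runs before the block device handler is
reached.  Both function names are hypothetical.  The 32-bit formula is
the one used in the test scripts below, and the -EFBIG result is
assumed from the expected test output rather than taken from this
series.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Largest file offset, computed the same way as in the test scripts. */
int64_t max_lfs_filesize(int long_bits, int64_t page_size)
{
	assert(long_bits == 32 || long_bits == 64);
	assert(page_size > 0);

	if (long_bits == 32)
		return (page_size << (long_bits - 1)) - 1;
	return INT64_MAX;
}

/*
 * Model of the generic size check: a request whose end lies beyond
 * max_bytes fails with -EFBIG before any clamping can happen.
 */
int vfs_falloc_check_limit(int64_t offset, int64_t len, int64_t max_bytes)
{
	if (offset < 0 || len <= 0)
		return -EINVAL;

	/* Written as a subtraction so that offset + len cannot overflow. */
	if (len > max_bytes - offset)
		return -EFBIG;

	return 0;
}
```

This is why a zero range from offset 0 with length MAX_LFS_FILESIZE
reaches the handler (and is then clamped or rejected there), while the
same length starting at 512k is refused up front as "File too large".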

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 common/scsi_debug     |    6 ++
 tests/generic/705     |   89 ++++++++++++++++++++++++++++++++++
 tests/generic/705.out |   11 ++++
 tests/generic/706     |   86 ++++++++++++++++++++++++++++++++
 tests/generic/706.out |   10 ++++
 tests/generic/707     |  130 +++++++++++++++++++++++++++++++++++++++++++++++++
 tests/generic/707.out |   32 ++++++++++++
 tests/generic/group   |    3 +
 8 files changed, 366 insertions(+), 1 deletion(-)
 create mode 100755 tests/generic/705
 create mode 100644 tests/generic/705.out
 create mode 100755 tests/generic/706
 create mode 100644 tests/generic/706.out
 create mode 100755 tests/generic/707
 create mode 100644 tests/generic/707.out

diff --git a/common/scsi_debug b/common/scsi_debug
index eb08126..74c3802 100644
--- a/common/scsi_debug
+++ b/common/scsi_debug
@@ -40,13 +40,17 @@ _get_scsi_debug_dev()
 	logical=${2-512}
 	unaligned=${3-0}
 	size=${4-128}
+	test -n "$4" && shift
+	test -n "$3" && shift
+	test -n "$2" && shift
+	test -n "$1" && shift
 
 	phys_exp=0
 	while [ $logical -lt $physical ]; do
 		let physical=physical/2
 		let phys_exp=phys_exp+1
 	done
-	opts="sector_size=$logical physblk_exp=$phys_exp lowest_aligned=$unaligned dev_size_mb=$size"
+	opts="sector_size=$logical physblk_exp=$phys_exp lowest_aligned=$unaligned dev_size_mb=$size $@"
 	echo "scsi_debug options $opts" >> $seqres.full
 	modprobe scsi_debug $opts
 	[ $? -eq 0 ] || _fail "scsi_debug modprobe failed"
diff --git a/tests/generic/705 b/tests/generic/705
new file mode 100755
index 0000000..f30f2c3
--- /dev/null
+++ b/tests/generic/705
@@ -0,0 +1,89 @@
+#! /bin/bash
+# FS QA Test No. 705
+#
+# Test fallocate(ZERO_RANGE) on a block device, which should be able to
+# WRITE SAME (or equivalent) the range.
+#
+#-----------------------------------------------------------------------
+# Copyright (c) 2016 Oracle, Inc.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 7 15
+
+_cleanup()
+{
+    cd /
+    rm -rf $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/scsi_debug
+
+# real QA test starts here
+_supported_os Linux
+_require_scsi_debug
+_require_xfs_io_command "fzero"
+
+echo "Create and format"
+dev=$(_get_scsi_debug_dev 512 512 0 4 "lbpws=1 lbpws10=1")
+_pwrite_byte 0x62 0 4m $dev >> $seqres.full
+
+echo "Zero range"
+$XFS_IO_PROG -c "fzero -k 512k 1m" $dev
+
+echo "Zero range without keep_size"
+$XFS_IO_PROG -c "fzero 384k 64k" $dev
+
+echo "Zero range past EOD"
+$XFS_IO_PROG -c "fzero -k 3m 4m" $dev
+
+echo "Check contents"
+md5sum $dev | sed -e "s|$dev|SCSI_DEBUG_DEV|g"
+
+echo "Zero range to MAX_LFS_FILESIZE"
+# zod = MAX_LFS_FILESIZE
+case "$(getconf LONG_BIT)" in
+"32")
+	zod=$(( ($(getconf PAGE_SIZE) << ($(getconf LONG_BIT) - 1) ) - 1))
+	;;
+"64")
+	zod=9223372036854775807
+	;;
+*)
+	_fail "sizeof(long) == $(getconf LONG_BIT)?"
+	;;
+esac
+$XFS_IO_PROG -c "fzero -k 0 $zod" $dev
+
+echo "Check contents"
+md5sum $dev | sed -e "s|$dev|SCSI_DEBUG_DEV|g"
+
+echo "Destroy device"
+_put_scsi_debug_dev
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/705.out b/tests/generic/705.out
new file mode 100644
index 0000000..86d0317
--- /dev/null
+++ b/tests/generic/705.out
@@ -0,0 +1,11 @@
+QA output created by 705
+Create and format
+Zero range
+Zero range without keep_size
+Zero range past EOD
+Check contents
+f0cb9070c098aa347f664bead3a219d9  SCSI_DEBUG_DEV
+Zero range to MAX_LFS_FILESIZE
+Check contents
+b5cfa9d6c8febd618f91ac2843d50a1c  SCSI_DEBUG_DEV
+Destroy device
diff --git a/tests/generic/706 b/tests/generic/706
new file mode 100755
index 0000000..dd502e2
--- /dev/null
+++ b/tests/generic/706
@@ -0,0 +1,86 @@
+#! /bin/bash
+# FS QA Test No. 706
+#
+# Test fallocate(PUNCH_HOLE) on a block device, which should be able to
+# zero-TRIM (or equivalent) the range.
+#
+#-----------------------------------------------------------------------
+# Copyright (c) 2016 Oracle, Inc.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 7 15
+
+_cleanup()
+{
+    cd /
+    rm -rf $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/scsi_debug
+
+# real QA test starts here
+_supported_os Linux
+_require_scsi_debug
+_require_xfs_io_command "fpunch"
+
+echo "Create and format"
+dev=$(_get_scsi_debug_dev 512 512 0 4 "lbpws=1 lbpws10=1")
+_pwrite_byte 0x62 0 4m $dev >> $seqres.full
+
+echo "Zero punch"
+$XFS_IO_PROG -c "fpunch 512k 1m" $dev
+
+echo "Punch range past EOD"
+$XFS_IO_PROG -c "fpunch 3m 4m" $dev
+
+echo "Check contents"
+md5sum $dev | sed -e "s|$dev|SCSI_DEBUG_DEV|g"
+
+echo "Punch to MAX_LFS_FILESIZE"
+# zod = MAX_LFS_FILESIZE
+case "$(getconf LONG_BIT)" in
+"32")
+	zod=$(( ($(getconf PAGE_SIZE) << ($(getconf LONG_BIT) - 1) ) - 1))
+	;;
+"64")
+	zod=9223372036854775807
+	;;
+*)
+	_fail "sizeof(long) == $(getconf LONG_BIT)?"
+	;;
+esac
+$XFS_IO_PROG -c "fpunch 0 $zod" $dev
+
+echo "Check contents"
+md5sum $dev | sed -e "s|$dev|SCSI_DEBUG_DEV|g"
+
+echo "Destroy device"
+_put_scsi_debug_dev
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/706.out b/tests/generic/706.out
new file mode 100644
index 0000000..de8de0c
--- /dev/null
+++ b/tests/generic/706.out
@@ -0,0 +1,10 @@
+QA output created by 706
+Create and format
+Zero punch
+Punch range past EOD
+Check contents
+8c6a3fd51601141b56eaebbab3746156  SCSI_DEBUG_DEV
+Punch to MAX_LFS_FILESIZE
+Check contents
+b5cfa9d6c8febd618f91ac2843d50a1c  SCSI_DEBUG_DEV
+Destroy device
diff --git a/tests/generic/707 b/tests/generic/707
new file mode 100755
index 0000000..229b152
--- /dev/null
+++ b/tests/generic/707
@@ -0,0 +1,130 @@
+#! /bin/bash
+# FS QA Test No. 707
+#
+# Test the unsupported fallocate flags on a block device.  No collapse
+# or insert range, no regular fallocate, no forgetting keep-size on
+# zero range, no punching past EOD, no requests that aren't aligned
+# with the logical sector size, and make sure the fallbacks work for
+# devices that don't support write_same or discard.
+#
+#-----------------------------------------------------------------------
+# Copyright (c) 2016 Oracle, Inc.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 7 15
+
+_cleanup()
+{
+    cd /
+    rm -rf $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/scsi_debug
+
+# real QA test starts here
+_supported_os Linux
+_require_scsi_debug
+_require_xfs_io_command "falloc"
+_require_xfs_io_command "finsert"
+_require_xfs_io_command "fcollapse"
+_require_xfs_io_command "fzero"
+_require_xfs_io_command "fpunch"
+
+
+echo "Create and format"
+dev=$(_get_scsi_debug_dev 4096 4096 0 4 "lbpws=1 lbpws10=1")
+_pwrite_byte 0x62 0 4m $dev >> $seqres.full
+$XFS_IO_PROG -c "fsync" $dev
+
+echo "Regular fallocate"
+$XFS_IO_PROG -c "falloc 64k 64k" $dev
+
+echo "Insert range"
+$XFS_IO_PROG -c "finsert 128k 64k" $dev
+
+echo "Collapse range"
+$XFS_IO_PROG -c "fcollapse 256k 64k" $dev
+
+echo "Unaligned zero range"
+$XFS_IO_PROG -c "fzero -k 512 512" $dev
+
+echo "Unaligned punch"
+$XFS_IO_PROG -c "fpunch 512 512" $dev
+
+echo "Zero range past MAX_LFS_FILESIZE keep size"
+# zod = MAX_LFS_FILESIZE
+case "$(getconf LONG_BIT)" in
+"32")
+	zod=$(( ($(getconf PAGE_SIZE) << ($(getconf LONG_BIT) - 1) ) - 1))
+	;;
+"64")
+	zod=9223372036854775807
+	;;
+*)
+	_fail "sizeof(long) == $(getconf LONG_BIT)?"
+	;;
+esac
+$XFS_IO_PROG -c "fzero -k 512k $zod" $dev
+
+echo "Zero range past MAX_LFS_FILESIZE"
+$XFS_IO_PROG -c "fzero 512k $zod" $dev
+
+echo "Zero range to MAX_LFS_FILESIZE fail w/o keepsize"
+$XFS_IO_PROG -c "fzero 0 $zod" $dev
+
+echo "Zero range starts past EOD"
+$XFS_IO_PROG -c "fzero -k 900m 1m" $dev
+
+echo "Punch starts past EOD"
+$XFS_IO_PROG -c "fpunch 900m 1m" $dev
+
+echo "Check contents"
+md5sum $dev | sed -e "s|$dev|SCSI_DEBUG_DEV|g"
+
+echo "Destroy device"
+_put_scsi_debug_dev
+
+echo "Create w/o unmap or writesame and format"
+dev=$(_get_scsi_debug_dev 512 512 0 4 "lbpws=0 lbpws10=0 lbpu=0 write_same_length=0 unmap_max_blocks=0")
+_pwrite_byte 0x62 0 4m $dev >> $seqres.full
+$XFS_IO_PROG -c "fsync" $dev
+
+echo "Zero punch, no fallback available"
+$XFS_IO_PROG -c "fpunch 512k 512k" $dev
+
+echo "Zero range, write fallback"
+$XFS_IO_PROG -c "fzero -k 1536k 512k" $dev
+
+echo "Check contents"
+md5sum $dev | sed -e "s|$dev|SCSI_DEBUG_DEV|g"
+
+echo "Destroy device"
+_put_scsi_debug_dev
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/707.out b/tests/generic/707.out
new file mode 100644
index 0000000..a520221
--- /dev/null
+++ b/tests/generic/707.out
@@ -0,0 +1,32 @@
+QA output created by 707
+Create and format
+Regular fallocate
+fallocate: Operation not supported
+Insert range
+fallocate: Operation not supported
+Collapse range
+fallocate: Operation not supported
+Unaligned zero range
+fallocate: Invalid argument
+Unaligned punch
+fallocate: Invalid argument
+Zero range past MAX_LFS_FILESIZE keep size
+fallocate: File too large
+Zero range past MAX_LFS_FILESIZE
+fallocate: File too large
+Zero range to MAX_LFS_FILESIZE fail w/o keepsize
+fallocate: Invalid argument
+Zero range starts past EOD
+fallocate: Invalid argument
+Punch starts past EOD
+fallocate: Invalid argument
+Check contents
+b83f9394092e15bdcda585cd8e776dc6  SCSI_DEBUG_DEV
+Destroy device
+Create w/o unmap or writesame and format
+Zero punch, no fallback available
+fallocate: Operation not supported
+Zero range, write fallback
+Check contents
+0fc6bc93cd0cd97e3cde5ea39ea1185d  SCSI_DEBUG_DEV
+Destroy device
diff --git a/tests/generic/group b/tests/generic/group
index ef1a423..cc14c80 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -345,3 +345,6 @@
 340 auto
 341 auto quick metadata
 342 auto quick metadata
+705 blockdev quick rw
+706 blockdev quick rw
+707 blockdev quick rw

* [PATCH 4/3] block: test fallocate for block devices
@ 2016-04-13  4:04   ` Darrick J. Wong
  0 siblings, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2016-04-13  4:04 UTC (permalink / raw)
  To: darrick.wong
  Cc: axboe, linux-block, tytso, martin.petersen, snitzer, linux-api,
	bfoster, fstests, xfs, hch, dm-devel, linux-fsdevel

Now that we're wiring up fallocate's PUNCH_HOLE and ZERO_RANGE
features for block devices, add some tests to make sure they
work correctly.

v2: Update tests to reflect EOD clamping suggested by Linus.
Note that the VFS fallocate makes us play some weird games wrt
MAX_LFS_FILESIZE.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 common/scsi_debug     |    6 ++
 tests/generic/705     |   89 ++++++++++++++++++++++++++++++++++
 tests/generic/705.out |   11 ++++
 tests/generic/706     |   86 ++++++++++++++++++++++++++++++++
 tests/generic/706.out |   10 ++++
 tests/generic/707     |  130 +++++++++++++++++++++++++++++++++++++++++++++++++
 tests/generic/707.out |   32 ++++++++++++
 tests/generic/group   |    3 +
 8 files changed, 366 insertions(+), 1 deletion(-)
 create mode 100755 tests/generic/705
 create mode 100644 tests/generic/705.out
 create mode 100755 tests/generic/706
 create mode 100644 tests/generic/706.out
 create mode 100755 tests/generic/707
 create mode 100644 tests/generic/707.out

diff --git a/common/scsi_debug b/common/scsi_debug
index eb08126..74c3802 100644
--- a/common/scsi_debug
+++ b/common/scsi_debug
@@ -40,13 +40,17 @@ _get_scsi_debug_dev()
 	logical=${2-512}
 	unaligned=${3-0}
 	size=${4-128}
+	test -n "$4" && shift
+	test -n "$3" && shift
+	test -n "$2" && shift
+	test -n "$1" && shift
 
 	phys_exp=0
 	while [ $logical -lt $physical ]; do
 		let physical=physical/2
 		let phys_exp=phys_exp+1
 	done
-	opts="sector_size=$logical physblk_exp=$phys_exp lowest_aligned=$unaligned dev_size_mb=$size"
+	opts="sector_size=$logical physblk_exp=$phys_exp lowest_aligned=$unaligned dev_size_mb=$size $@"
 	echo "scsi_debug options $opts" >> $seqres.full
 	modprobe scsi_debug $opts
 	[ $? -eq 0 ] || _fail "scsi_debug modprobe failed"
diff --git a/tests/generic/705 b/tests/generic/705
new file mode 100755
index 0000000..f30f2c3
--- /dev/null
+++ b/tests/generic/705
@@ -0,0 +1,89 @@
+#! /bin/bash
+# FS QA Test No. 705
+#
+# Test fallocate(ZERO_RANGE) on a block device, which should be able to
+# WRITE SAME (or equivalent) the range.
+#
+#-----------------------------------------------------------------------
+# Copyright (c) 2016 Oracle, Inc.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 7 15
+
+_cleanup()
+{
+    cd /
+    rm -rf $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/scsi_debug
+
+# real QA test starts here
+_supported_os Linux
+_require_scsi_debug
+_require_xfs_io_command "fzero"
+
+echo "Create and format"
+dev=$(_get_scsi_debug_dev 512 512 0 4 "lbpws=1 lbpws10=1")
+_pwrite_byte 0x62 0 4m $dev >> $seqres.full
+
+echo "Zero range"
+$XFS_IO_PROG -c "fzero -k 512k 1m" $dev
+
+echo "Zero range without keep_size"
+$XFS_IO_PROG -c "fzero 384k 64k" $dev
+
+echo "Zero range past EOD"
+$XFS_IO_PROG -c "fzero -k 3m 4m" $dev
+
+echo "Check contents"
+md5sum $dev | sed -e "s|$dev|SCSI_DEBUG_DEV|g"
+
+echo "Zero range to MAX_LFS_FILESIZE"
+# zod = MAX_LFS_FILESIZE
+case "$(getconf LONG_BIT)" in
+"32")
+	zod=$(( ($(getconf PAGE_SIZE) << ($(getconf LONG_BIT) - 1) ) - 1))
+	;;
+"64")
+	zod=9223372036854775807
+	;;
+*)
+	_fail "sizeof(long) == $(getconf LONG_BIT)?"
+	;;
+esac
+$XFS_IO_PROG -c "fzero -k 0 $zod" $dev
+
+echo "Check contents"
+md5sum $dev | sed -e "s|$dev|SCSI_DEBUG_DEV|g"
+
+echo "Destroy device"
+_put_scsi_debug_dev
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/705.out b/tests/generic/705.out
new file mode 100644
index 0000000..86d0317
--- /dev/null
+++ b/tests/generic/705.out
@@ -0,0 +1,11 @@
+QA output created by 705
+Create and format
+Zero range
+Zero range without keep_size
+Zero range past EOD
+Check contents
+f0cb9070c098aa347f664bead3a219d9  SCSI_DEBUG_DEV
+Zero range to MAX_LFS_FILESIZE
+Check contents
+b5cfa9d6c8febd618f91ac2843d50a1c  SCSI_DEBUG_DEV
+Destroy device
diff --git a/tests/generic/706 b/tests/generic/706
new file mode 100755
index 0000000..dd502e2
--- /dev/null
+++ b/tests/generic/706
@@ -0,0 +1,86 @@
+#! /bin/bash
+# FS QA Test No. 706
+#
+# Test fallocate(PUNCH_HOLE) on a block device, which should be able to
+# zero-TRIM (or equivalent) the range.
+#
+#-----------------------------------------------------------------------
+# Copyright (c) 2016 Oracle, Inc.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 7 15
+
+_cleanup()
+{
+    cd /
+    rm -rf $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/scsi_debug
+
+# real QA test starts here
+_supported_os Linux
+_require_scsi_debug
+_require_xfs_io_command "fpunch"
+
+echo "Create and format"
+dev=$(_get_scsi_debug_dev 512 512 0 4 "lbpws=1 lbpws10=1")
+_pwrite_byte 0x62 0 4m $dev >> $seqres.full
+
+echo "Zero punch"
+$XFS_IO_PROG -c "fpunch 512k 1m" $dev
+
+echo "Punch range past EOD"
+$XFS_IO_PROG -c "fpunch 3m 4m" $dev
+
+echo "Check contents"
+md5sum $dev | sed -e "s|$dev|SCSI_DEBUG_DEV|g"
+
+echo "Punch to MAX_LFS_FILESIZE"
+# zod = MAX_LFS_FILESIZE
+case "$(getconf LONG_BIT)" in
+"32")
+	zod=$(( ($(getconf PAGE_SIZE) << ($(getconf LONG_BIT) - 1) ) - 1))
+	;;
+"64")
+	zod=9223372036854775807
+	;;
+*)
+	_fail "sizeof(long) == $(getconf LONG_BIT)?"
+	;;
+esac
+$XFS_IO_PROG -c "fpunch 0 $zod" $dev
+
+echo "Check contents"
+md5sum $dev | sed -e "s|$dev|SCSI_DEBUG_DEV|g"
+
+echo "Destroy device"
+_put_scsi_debug_dev
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/706.out b/tests/generic/706.out
new file mode 100644
index 0000000..de8de0c
--- /dev/null
+++ b/tests/generic/706.out
@@ -0,0 +1,10 @@
+QA output created by 706
+Create and format
+Zero punch
+Punch range past EOD
+Check contents
+8c6a3fd51601141b56eaebbab3746156  SCSI_DEBUG_DEV
+Punch to MAX_LFS_FILESIZE
+Check contents
+b5cfa9d6c8febd618f91ac2843d50a1c  SCSI_DEBUG_DEV
+Destroy device
diff --git a/tests/generic/707 b/tests/generic/707
new file mode 100755
index 0000000..229b152
--- /dev/null
+++ b/tests/generic/707
@@ -0,0 +1,130 @@
+#! /bin/bash
+# FS QA Test No. 707
+#
+# Test the unsupported fallocate flags on a block device.  No collapse
+# or insert range, no regular fallocate, no forgetting keep-space on
+# zero range, no punching past EOD, no requests that aren't aligned
+# with the logicalsector size, and make sure the fallbacks work for
+# devices that don't support write_same or discard.
+#
+#-----------------------------------------------------------------------
+# Copyright (c) 2016 Oracle, Inc.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 7 15
+
+_cleanup()
+{
+    cd /
+    rm -rf $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/scsi_debug
+
+# real QA test starts here
+_supported_os Linux
+_require_scsi_debug
+_require_xfs_io_command "falloc"
+_require_xfs_io_command "finsert"
+_require_xfs_io_command "fcollapse"
+_require_xfs_io_command "fzero"
+_require_xfs_io_command "fpunch"
+
+
+echo "Create and format"
+dev=$(_get_scsi_debug_dev 4096 4096 0 4 "lbpws=1 lbpws10=1")
+_pwrite_byte 0x62 0 4m $dev >> $seqres.full
+$XFS_IO_PROG -c "fsync" $dev
+
+echo "Regular fallocate"
+$XFS_IO_PROG -c "falloc 64k 64k" $dev
+
+echo "Insert range"
+$XFS_IO_PROG -c "finsert 128k 64k" $dev
+
+echo "Collapse range"
+$XFS_IO_PROG -c "fcollapse 256k 64k" $dev
+
+echo "Unaligned zero range"
+$XFS_IO_PROG -c "fzero -k 512 512" $dev
+
+echo "Unaligned punch"
+$XFS_IO_PROG -c "fpunch 512 512" $dev
+
+echo "Zero range past MAX_LFS_FILESIZE keep size"
+# zod = MAX_LFS_FILESIZE
+case "$(getconf LONG_BIT)" in
+"32")
+	zod=$(( ($(getconf PAGE_SIZE) << ($(getconf LONG_BIT) - 1) ) - 1))
+	;;
+"64")
+	zod=9223372036854775807
+	;;
+*)
+	_fail "sizeof(long) == $(getconf LONG_BIT)?"
+	;;
+esac
+$XFS_IO_PROG -c "fzero -k 512k $zod" $dev
+
+echo "Zero range past MAX_LFS_FILESIZE"
+$XFS_IO_PROG -c "fzero 512k $zod" $dev
+
+echo "Zero range to MAX_LFS_FILESIZE fail w/o keepsize"
+$XFS_IO_PROG -c "fzero 0 $zod" $dev
+
+echo "Zero range starts past EOD"
+$XFS_IO_PROG -c "fzero -k 900m 1m" $dev
+
+echo "Punch starts past EOD"
+$XFS_IO_PROG -c "fpunch 900m 1m" $dev
+
+echo "Check contents"
+md5sum $dev | sed -e "s|$dev|SCSI_DEBUG_DEV|g"
+
+echo "Destroy device"
+_put_scsi_debug_dev
+
+echo "Create w/o unmap or writesame and format"
+dev=$(_get_scsi_debug_dev 512 512 0 4 "lbpws=0 lbpws10=0 lbpu=0 write_same_length=0 unmap_max_blocks=0")
+_pwrite_byte 0x62 0 4m $dev >> $seqres.full
+$XFS_IO_PROG -c "fsync" $dev
+
+echo "Zero punch, no fallback available"
+$XFS_IO_PROG -c "fpunch 512k 512k" $dev
+
+echo "Zero range, write fallback"
+$XFS_IO_PROG -c "fzero -k 1536k 512k" $dev
+
+echo "Check contents"
+md5sum $dev | sed -e "s|$dev|SCSI_DEBUG_DEV|g"
+
+echo "Destroy device"
+_put_scsi_debug_dev
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/707.out b/tests/generic/707.out
new file mode 100644
index 0000000..a520221
--- /dev/null
+++ b/tests/generic/707.out
@@ -0,0 +1,32 @@
+QA output created by 707
+Create and format
+Regular fallocate
+fallocate: Operation not supported
+Insert range
+fallocate: Operation not supported
+Collapse range
+fallocate: Operation not supported
+Unaligned zero range
+fallocate: Invalid argument
+Unaligned punch
+fallocate: Invalid argument
+Zero range past MAX_LFS_FILESIZE keep size
+fallocate: File too large
+Zero range past MAX_LFS_FILESIZE
+fallocate: File too large
+Zero range to MAX_LFS_FILESIZE fail w/o keepsize
+fallocate: Invalid argument
+Zero range starts past EOD
+fallocate: Invalid argument
+Punch starts past EOD
+fallocate: Invalid argument
+Check contents
+b83f9394092e15bdcda585cd8e776dc6  SCSI_DEBUG_DEV
+Destroy device
+Create w/o unmap or writesame and format
+Zero punch, no fallback available
+fallocate: Operation not supported
+Zero range, write fallback
+Check contents
+0fc6bc93cd0cd97e3cde5ea39ea1185d  SCSI_DEBUG_DEV
+Destroy device
diff --git a/tests/generic/group b/tests/generic/group
index ef1a423..cc14c80 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -345,3 +345,6 @@
 340 auto
 341 auto quick metadata
 342 auto quick metadata
+705 blockdev quick rw
+706 blockdev quick rw
+707 blockdev quick rw
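
For readers mapping the expected output in 707.out back to the system call: as far as I understand xfs_io, "fzero -k" issues fallocate(2) with FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE. Below is a minimal userspace sketch of that call. The helper name zero_range_keep_size is made up for illustration and is not part of this series. Whether a block device accepts the call at all depends on a kernel carrying these patches.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of the patch set): ask the kernel to zero
 * the byte range [offset, offset + len) on an already-open descriptor
 * without changing its size.  Returns 0 on success or a negative errno
 * value, e.g. -EINVAL for the unaligned cases the test above exercises.
 */
static int zero_range_keep_size(int fd, off_t offset, off_t len)
{
	if (fallocate(fd, FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE,
		      offset, len) == 0)
		return 0;
	return -errno;
}
```

A caller would open the device with O_RDWR, pass offsets that are multiples of the logical block size, and treat -EOPNOTSUPP as "no zeroing method available on this device".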

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/3] block: require write_same and discard requests align to logical block size
  2016-04-13  4:01   ` Darrick J. Wong
@ 2016-04-13 14:23     ` Christoph Hellwig
  -1 siblings, 0 replies; 35+ messages in thread
From: Christoph Hellwig @ 2016-04-13 14:23 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: axboe, hch, tytso, martin.petersen, snitzer, linux-api, bfoster,
	xfs, linux-block, dm-devel, linux-fsdevel, Christoph Hellwig

On Tue, Apr 12, 2016 at 09:01:35PM -0700, Darrick J. Wong wrote:
> Make sure that the offset and length arguments that we're using to
> construct WRITE SAME and DISCARD requests are actually aligned to the
> logical block size.  Failure to do this causes other errors in other
> parts of the block layer or the SCSI layer because disks don't support
> partial logical block writes.

FYI, Bart has just posted a patchset that includes this, but
goes further.  Can you take a look at it?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 2/3] block: require write_same and discard requests align to logical block size
  2016-09-29 21:16 [PATCH v11 0/3] " Darrick J. Wong
@ 2016-09-29 21:16   ` Darrick J. Wong
  0 siblings, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2016-09-29 21:16 UTC (permalink / raw)
  To: axboe, akpm, darrick.wong
  Cc: linux-block, Hannes Reinecke, tytso, snitzer, martin.petersen,
	linux-api, bfoster, xfs, hch, dm-devel, hare, linux-fsdevel,
	bart.vanassche, Christoph Hellwig

Make sure that the offset and length arguments that we're using to
construct WRITE SAME and DISCARD requests are actually aligned to the
logical block size.  Failure to do this causes other errors in other
parts of the block layer or the SCSI layer because disks don't support
partial logical block writes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
---
 block/blk-lib.c |   15 +++++++++++++++
 1 file changed, 15 insertions(+)


diff --git a/block/blk-lib.c b/block/blk-lib.c
index 083e56f..46fe924 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -31,6 +31,7 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	unsigned int granularity;
 	enum req_op op;
 	int alignment;
+	sector_t bs_mask;
 
 	if (!q)
 		return -ENXIO;
@@ -50,6 +51,10 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		op = REQ_OP_DISCARD;
 	}
 
+	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+	if ((sector | nr_sects) & bs_mask)
+		return -EINVAL;
+
 	/* Zero-sector (unknown) and one-sector granularities are the same.  */
 	granularity = max(q->limits.discard_granularity >> 9, 1U);
 	alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
@@ -150,10 +155,15 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 	unsigned int max_write_same_sectors;
 	struct bio *bio = NULL;
 	int ret = 0;
+	sector_t bs_mask;
 
 	if (!q)
 		return -ENXIO;
 
+	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+	if ((sector | nr_sects) & bs_mask)
+		return -EINVAL;
+
 	/* Ensure that max_write_same_sectors doesn't overflow bi_size */
 	max_write_same_sectors = UINT_MAX >> 9;
 
@@ -202,6 +212,11 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 	int ret;
 	struct bio *bio = NULL;
 	unsigned int sz;
+	sector_t bs_mask;
+
+	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+	if ((sector | nr_sects) & bs_mask)
+		return -EINVAL;
 
 	while (nr_sects != 0) {
 		bio = next_bio(bio, min(nr_sects, (sector_t)BIO_MAX_PAGES),
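
The check added in each hunk above can be tried outside the kernel. This is a small userspace sketch, assuming the logical block size is a power of two of at least 512 bytes. The function name range_is_lbs_aligned is invented for illustration. Only the mask arithmetic mirrors the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sectors are counted in 512-byte units, hence the shift by 9 in the patch. */
#define SECTOR_SHIFT 9

/*
 * Hypothetical userspace helper (not a kernel function): returns true when
 * both the start sector and the sector count are multiples of the logical
 * block size, given in bytes.
 */
static bool range_is_lbs_aligned(uint64_t sector, uint64_t nr_sects,
				 unsigned int logical_block_size)
{
	uint64_t bs_mask;

	/* Assumption of this sketch: power-of-two block size, >= 512. */
	assert(logical_block_size >= 512 &&
	       (logical_block_size & (logical_block_size - 1)) == 0);

	/* 512-byte blocks give mask 0; 4096-byte blocks give mask 7. */
	bs_mask = (logical_block_size >> SECTOR_SHIFT) - 1;

	/* OR-ing the two values lets one test catch either being misaligned. */
	return ((sector | nr_sects) & bs_mask) == 0;
}
```

With a 4096-byte logical block size, sector 8 with 16 sectors passes, while sector 1 or a 3-sector length fails. The patch returns -EINVAL in the failing case. With 512-byte blocks the mask is zero, so every sector range passes.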

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/3] block: require write_same and discard requests align to logical block size
  2016-09-29  0:39   ` Darrick J. Wong
  (?)
@ 2016-09-29  5:56     ` Hannes Reinecke
  -1 siblings, 0 replies; 35+ messages in thread
From: Hannes Reinecke @ 2016-09-29  5:56 UTC (permalink / raw)
  To: Darrick J. Wong, axboe, akpm
  Cc: linux-block, tytso, martin.petersen, snitzer, linux-api, bfoster,
	xfs, hch, dm-devel, linux-fsdevel, bart.vanassche,
	Christoph Hellwig

On 09/29/2016 02:39 AM, Darrick J. Wong wrote:
> Make sure that the offset and length arguments that we're using to
> construct WRITE SAME and DISCARD requests are actually aligned to the
> logical block size.  Failure to do this causes other errors in other
> parts of the block layer or the SCSI layer because disks don't support
> partial logical block writes.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
> ---
>  block/blk-lib.c |   15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 2/3] block: require write_same and discard requests align to logical block size
  2016-09-29  0:39 [PATCH v10 0/3] fallocate for block devices Darrick J. Wong
@ 2016-09-29  0:39   ` Darrick J. Wong
  0 siblings, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2016-09-29  0:39 UTC (permalink / raw)
  To: axboe, akpm, darrick.wong
  Cc: linux-block, tytso, martin.petersen, snitzer, linux-api, bfoster,
	xfs, hch, dm-devel, linux-fsdevel, bart.vanassche,
	Christoph Hellwig

Make sure that the offset and length arguments that we're using to
construct WRITE SAME and DISCARD requests are actually aligned to the
logical block size.  Failure to do this causes other errors in other
parts of the block layer or the SCSI layer because disks don't support
partial logical block writes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 block/blk-lib.c |   15 +++++++++++++++
 1 file changed, 15 insertions(+)


diff --git a/block/blk-lib.c b/block/blk-lib.c
index 083e56f..46fe924 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -31,6 +31,7 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	unsigned int granularity;
 	enum req_op op;
 	int alignment;
+	sector_t bs_mask;
 
 	if (!q)
 		return -ENXIO;
@@ -50,6 +51,10 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		op = REQ_OP_DISCARD;
 	}
 
+	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+	if ((sector | nr_sects) & bs_mask)
+		return -EINVAL;
+
 	/* Zero-sector (unknown) and one-sector granularities are the same.  */
 	granularity = max(q->limits.discard_granularity >> 9, 1U);
 	alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
@@ -150,10 +155,15 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 	unsigned int max_write_same_sectors;
 	struct bio *bio = NULL;
 	int ret = 0;
+	sector_t bs_mask;
 
 	if (!q)
 		return -ENXIO;
 
+	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+	if ((sector | nr_sects) & bs_mask)
+		return -EINVAL;
+
 	/* Ensure that max_write_same_sectors doesn't overflow bi_size */
 	max_write_same_sectors = UINT_MAX >> 9;
 
@@ -202,6 +212,11 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 	int ret;
 	struct bio *bio = NULL;
 	unsigned int sz;
+	sector_t bs_mask;
+
+	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+	if ((sector | nr_sects) & bs_mask)
+		return -EINVAL;
 
 	while (nr_sects != 0) {
 		bio = next_bio(bio, min(nr_sects, (sector_t)BIO_MAX_PAGES),

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 2/3] block: require write_same and discard requests align to logical block size
  2016-08-26  0:02 [PATCH v10 0/3] fallocate for block devices Darrick J. Wong
@ 2016-08-26  0:02   ` Darrick J. Wong
  0 siblings, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2016-08-26  0:02 UTC (permalink / raw)
  To: axboe, darrick.wong
  Cc: linux-block, tytso, martin.petersen, snitzer, linux-api, bfoster,
	xfs, hch, dm-devel, linux-fsdevel, bart.vanassche,
	Christoph Hellwig

Make sure that the offset and length arguments that we're using to
construct WRITE SAME and DISCARD requests are actually aligned to the
logical block size.  Failure to do this causes other errors in other
parts of the block layer or the SCSI layer because disks don't support
partial logical block writes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 block/blk-lib.c |   15 +++++++++++++++
 1 file changed, 15 insertions(+)


diff --git a/block/blk-lib.c b/block/blk-lib.c
index 083e56f..46fe924 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -31,6 +31,7 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	unsigned int granularity;
 	enum req_op op;
 	int alignment;
+	sector_t bs_mask;
 
 	if (!q)
 		return -ENXIO;
@@ -50,6 +51,10 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		op = REQ_OP_DISCARD;
 	}
 
+	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+	if ((sector | nr_sects) & bs_mask)
+		return -EINVAL;
+
 	/* Zero-sector (unknown) and one-sector granularities are the same.  */
 	granularity = max(q->limits.discard_granularity >> 9, 1U);
 	alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
@@ -150,10 +155,15 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 	unsigned int max_write_same_sectors;
 	struct bio *bio = NULL;
 	int ret = 0;
+	sector_t bs_mask;
 
 	if (!q)
 		return -ENXIO;
 
+	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+	if ((sector | nr_sects) & bs_mask)
+		return -EINVAL;
+
 	/* Ensure that max_write_same_sectors doesn't overflow bi_size */
 	max_write_same_sectors = UINT_MAX >> 9;
 
@@ -202,6 +212,11 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 	int ret;
 	struct bio *bio = NULL;
 	unsigned int sz;
+	sector_t bs_mask;
+
+	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+	if ((sector | nr_sects) & bs_mask)
+		return -EINVAL;
 
 	while (nr_sects != 0) {
 		bio = next_bio(bio, min(nr_sects, (sector_t)BIO_MAX_PAGES),

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/3] block: require write_same and discard requests align to logical block size
  2016-06-17  1:17   ` Darrick J. Wong
@ 2016-06-29  4:58     ` Martin K. Petersen
  -1 siblings, 0 replies; 35+ messages in thread
From: Martin K. Petersen @ 2016-06-29  4:58 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: axboe, hch, tytso, martin.petersen, snitzer, linux-api, bfoster,
	xfs, linux-block, dm-devel, linux-fsdevel, Christoph Hellwig

>>>>> "Darrick" == Darrick J Wong <darrick.wong@oracle.com> writes:

Darrick> Make sure that the offset and length arguments that we're using
Darrick> to construct WRITE SAME and DISCARD requests are actually
Darrick> aligned to the logical block size.  Failure to do this causes
Darrick> other errors in other parts of the block layer or the SCSI
Darrick> layer because disks don't support partial logical block writes.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/3] block: require write_same and discard requests align to logical block size
  2016-06-17  1:17   ` Darrick J. Wong
@ 2016-06-20 12:37     ` Bart Van Assche
  -1 siblings, 0 replies; 35+ messages in thread
From: Bart Van Assche @ 2016-06-20 12:37 UTC (permalink / raw)
  To: Darrick J. Wong, axboe
  Cc: linux-block, tytso, martin.petersen, snitzer, linux-api, bfoster,
	xfs, hch, dm-devel, linux-fsdevel, Christoph Hellwig,
	Bart Van Assche

On 06/17/2016 03:19 AM, Darrick J. Wong wrote:
> Make sure that the offset and length arguments that we're using to
> construct WRITE SAME and DISCARD requests are actually aligned to the
> logical block size.  Failure to do this causes other errors in other
> parts of the block layer or the SCSI layer because disks don't support
> partial logical block writes.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 2/3] block: require write_same and discard requests align to logical block size
  2016-06-17  1:17 [PATCH v9 0/3] fallocate for block devices Darrick J. Wong
  2016-06-17  1:17   ` Darrick J. Wong
@ 2016-06-17  1:17   ` Darrick J. Wong
  0 siblings, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:17 UTC (permalink / raw)
  To: axboe, darrick.wong
  Cc: linux-block, tytso, martin.petersen, snitzer, linux-api, bfoster,
	xfs, hch, dm-devel, linux-fsdevel, Christoph Hellwig

Make sure that the offset and length arguments that we're using to
construct WRITE SAME and DISCARD requests are actually aligned to the
logical block size.  Failure to do this causes other errors in other
parts of the block layer or the SCSI layer because disks don't support
partial logical block writes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-lib.c |   15 +++++++++++++++
 1 file changed, 15 insertions(+)


diff --git a/block/blk-lib.c b/block/blk-lib.c
index 9e29dc3..012aa98 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -29,6 +29,7 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	struct bio *bio = *biop;
 	unsigned int granularity;
 	int alignment;
+	sector_t bs_mask;
 
 	if (!q)
 		return -ENXIO;
@@ -37,6 +38,10 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	if ((type & REQ_SECURE) && !blk_queue_secdiscard(q))
 		return -EOPNOTSUPP;
 
+	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+	if ((sector | nr_sects) & bs_mask)
+		return -EINVAL;
+
 	/* Zero-sector (unknown) and one-sector granularities are the same.  */
 	granularity = max(q->limits.discard_granularity >> 9, 1U);
 	alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
@@ -140,10 +145,15 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 	unsigned int max_write_same_sectors;
 	struct bio *bio = NULL;
 	int ret = 0;
+	sector_t bs_mask;
 
 	if (!q)
 		return -ENXIO;
 
+	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+	if ((sector | nr_sects) & bs_mask)
+		return -EINVAL;
+
 	/* Ensure that max_write_same_sectors doesn't overflow bi_size */
 	max_write_same_sectors = UINT_MAX >> 9;
 
@@ -191,6 +201,11 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 	int ret;
 	struct bio *bio = NULL;
 	unsigned int sz;
+	sector_t bs_mask;
+
+	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+	if ((sector | nr_sects) & bs_mask)
+		return -EINVAL;
 
 	while (nr_sects != 0) {
 		bio = next_bio(bio, WRITE,

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 2/3] block: require write_same and discard requests align to logical block size
@ 2016-06-17  1:17   ` Darrick J. Wong
  0 siblings, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2016-06-17  1:17 UTC (permalink / raw)
  To: axboe, darrick.wong
  Cc: hch, tytso, martin.petersen, snitzer, linux-api, bfoster, xfs,
	linux-block, dm-devel, linux-fsdevel, Christoph Hellwig

Make sure that the offset and length arguments that we're using to
construct WRITE SAME and DISCARD requests are actually aligned to the
logical block size.  Failure to do this causes other errors in other
parts of the block layer or the SCSI layer because disks don't support
partial logical block writes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-lib.c |   15 +++++++++++++++
 1 file changed, 15 insertions(+)


diff --git a/block/blk-lib.c b/block/blk-lib.c
index 9e29dc3..012aa98 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -29,6 +29,7 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	struct bio *bio = *biop;
 	unsigned int granularity;
 	int alignment;
+	sector_t bs_mask;
 
 	if (!q)
 		return -ENXIO;
@@ -37,6 +38,10 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	if ((type & REQ_SECURE) && !blk_queue_secdiscard(q))
 		return -EOPNOTSUPP;
 
+	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+	if ((sector | nr_sects) & bs_mask)
+		return -EINVAL;
+
 	/* Zero-sector (unknown) and one-sector granularities are the same.  */
 	granularity = max(q->limits.discard_granularity >> 9, 1U);
 	alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
@@ -140,10 +145,15 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 	unsigned int max_write_same_sectors;
 	struct bio *bio = NULL;
 	int ret = 0;
+	sector_t bs_mask;
 
 	if (!q)
 		return -ENXIO;
 
+	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+	if ((sector | nr_sects) & bs_mask)
+		return -EINVAL;
+
 	/* Ensure that max_write_same_sectors doesn't overflow bi_size */
 	max_write_same_sectors = UINT_MAX >> 9;
 
@@ -191,6 +201,11 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 	int ret;
 	struct bio *bio = NULL;
 	unsigned int sz;
+	sector_t bs_mask;
+
+	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+	if ((sector | nr_sects) & bs_mask)
+		return -EINVAL;
 
 	while (nr_sects != 0) {
 		bio = next_bio(bio, WRITE,

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 2/3] block: require write_same and discard requests align to logical block size
  2016-03-15 19:42 [PATCH v7 0/3] fallocate for block devices to provide zero-out Darrick J. Wong
@ 2016-03-15 19:42 ` Darrick J. Wong
  0 siblings, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2016-03-15 19:42 UTC (permalink / raw)
  To: axboe, torvalds, darrick.wong
  Cc: bfields, tytso, akpm, martin.petersen, linux-api, david,
	linux-kernel, shane.seymour, hch, linux-fsdevel, jlayton,
	Christoph Hellwig

Make sure that the offset and length arguments that we're using to
construct WRITE SAME and DISCARD requests are actually aligned to the
logical block size.  Failure to do this causes errors elsewhere in the
block layer or the SCSI layer because disks don't support partial
logical block writes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-lib.c |   15 +++++++++++++++
 1 file changed, 15 insertions(+)


diff --git a/block/blk-lib.c b/block/blk-lib.c
index 9ebf653..9dca6bb 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -49,6 +49,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	struct bio *bio;
 	int ret = 0;
 	struct blk_plug plug;
+	sector_t bs_mask;
 
 	if (!q)
 		return -ENXIO;
@@ -56,6 +57,10 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	if (!blk_queue_discard(q))
 		return -EOPNOTSUPP;
 
+	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+	if ((sector | nr_sects) & bs_mask)
+		return -EINVAL;
+
 	/* Zero-sector (unknown) and one-sector granularities are the same.  */
 	granularity = max(q->limits.discard_granularity >> 9, 1U);
 	alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
@@ -148,6 +153,7 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 	DECLARE_COMPLETION_ONSTACK(wait);
 	struct request_queue *q = bdev_get_queue(bdev);
 	unsigned int max_write_same_sectors;
+	sector_t bs_mask;
 	struct bio_batch bb;
 	struct bio *bio;
 	int ret = 0;
@@ -155,6 +161,10 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 	if (!q)
 		return -ENXIO;
 
+	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+	if ((sector | nr_sects) & bs_mask)
+		return -EINVAL;
+
 	/* Ensure that max_write_same_sectors doesn't overflow bi_size */
 	max_write_same_sectors = UINT_MAX >> 9;
 
@@ -218,9 +228,14 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 	int ret;
 	struct bio *bio;
 	struct bio_batch bb;
+	sector_t bs_mask;
 	unsigned int sz;
 	DECLARE_COMPLETION_ONSTACK(wait);
 
+	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+	if ((sector | nr_sects) & bs_mask)
+		return -EINVAL;
+
 	atomic_set(&bb.done, 1);
 	bb.error = 0;
 	bb.wait = &wait;

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/3] block: require write_same and discard requests align to logical block size
  2016-03-05  0:56 ` [PATCH 2/3] block: require write_same and discard requests align to logical block size Darrick J. Wong
  2016-03-05  3:02   ` Linus Torvalds
@ 2016-03-15  7:34   ` Christoph Hellwig
  1 sibling, 0 replies; 35+ messages in thread
From: Christoph Hellwig @ 2016-03-15  7:34 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: axboe, torvalds, hch, tytso, martin.petersen, linux-api, david,
	linux-kernel, shane.seymour, bfields, linux-fsdevel, jlayton,
	akpm

On Fri, Mar 04, 2016 at 04:56:10PM -0800, Darrick J. Wong wrote:
> Make sure that the offset and length arguments that we're using to
> construct WRITE SAME and DISCARD requests are actually aligned to the
> logical block size.  Failure to do this causes errors elsewhere in the
> block layer or the SCSI layer because disks don't support partial
> logical block writes.

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/3] block: require write_same and discard requests align to logical block size
  2016-03-05  0:56 ` [PATCH 2/3] block: require write_same and discard requests align to logical block size Darrick J. Wong
@ 2016-03-05  3:02   ` Linus Torvalds
  2016-03-15  7:34   ` Christoph Hellwig
  1 sibling, 0 replies; 35+ messages in thread
From: Linus Torvalds @ 2016-03-05  3:02 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Jens Axboe, Christoph Hellwig, Theodore Ts'o,
	Martin K. Petersen, Linux API, Dave Chinner,
	Linux Kernel Mailing List, shane.seymour, Bruce Fields,
	linux-fsdevel, Jeff Layton, Andrew Morton

On Fri, Mar 4, 2016 at 4:56 PM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
>
> +       bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
> +       if ((sector & bs_mask) || ((sector + nr_sects) & bs_mask))
> +               return -EINVAL;

This test may _work_, but it's kind of crazily overly complicated.

The sane test would be just "are the start and length aligned":

        if ((sector & bs_mask) || (nr_sects & bs_mask))
                return -EINVAL;

and the *smart* test is simpler still, and asks "are there invalid
bits in either the start or the length":

        if ((sector | nr_sects) & bs_mask)
                return -EINVAL;

I suspect either of these would be fine, and the compiler may even
notice that there's the smart way of doing it.

The compiler *might* even notice that the original version can be
simplified and generate sane code.

But I think that original version is not only overly complicated, it's
also actually less obvious than the simpler versions, if only because
the whole conditional is so big that you have to actively parse it.

That last shortest form is actually so simple that I think it's the
easiest to understand too - the conditional is simply so small that it
doesn't take a lot of effort to see what it does.

            Linus

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 2/3] block: require write_same and discard requests align to logical block size
  2016-03-05  0:55 [PATCH v6 0/3] fallocate for block devices to provide zero-out Darrick J. Wong
@ 2016-03-05  0:56 ` Darrick J. Wong
  2016-03-05  3:02   ` Linus Torvalds
  2016-03-15  7:34   ` Christoph Hellwig
  0 siblings, 2 replies; 35+ messages in thread
From: Darrick J. Wong @ 2016-03-05  0:56 UTC (permalink / raw)
  To: axboe, torvalds, darrick.wong
  Cc: hch, tytso, martin.petersen, linux-api, david, linux-kernel,
	shane.seymour, bfields, linux-fsdevel, jlayton, akpm

Make sure that the offset and length arguments that we're using to
construct WRITE SAME and DISCARD requests are actually aligned to the
logical block size.  Failure to do this causes errors elsewhere in the
block layer or the SCSI layer because disks don't support partial
logical block writes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 block/blk-lib.c |   15 +++++++++++++++
 1 file changed, 15 insertions(+)


diff --git a/block/blk-lib.c b/block/blk-lib.c
index 9ebf653..3e5ca28 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -49,6 +49,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	struct bio *bio;
 	int ret = 0;
 	struct blk_plug plug;
+	sector_t bs_mask;
 
 	if (!q)
 		return -ENXIO;
@@ -56,6 +57,10 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	if (!blk_queue_discard(q))
 		return -EOPNOTSUPP;
 
+	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+	if ((sector & bs_mask) || ((sector + nr_sects) & bs_mask))
+		return -EINVAL;
+
 	/* Zero-sector (unknown) and one-sector granularities are the same.  */
 	granularity = max(q->limits.discard_granularity >> 9, 1U);
 	alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
@@ -148,6 +153,7 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 	DECLARE_COMPLETION_ONSTACK(wait);
 	struct request_queue *q = bdev_get_queue(bdev);
 	unsigned int max_write_same_sectors;
+	sector_t bs_mask;
 	struct bio_batch bb;
 	struct bio *bio;
 	int ret = 0;
@@ -155,6 +161,10 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 	if (!q)
 		return -ENXIO;
 
+	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+	if ((sector & bs_mask) || ((sector + nr_sects) & bs_mask))
+		return -EINVAL;
+
 	/* Ensure that max_write_same_sectors doesn't overflow bi_size */
 	max_write_same_sectors = UINT_MAX >> 9;
 
@@ -218,9 +228,14 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 	int ret;
 	struct bio *bio;
 	struct bio_batch bb;
+	sector_t bs_mask;
 	unsigned int sz;
 	DECLARE_COMPLETION_ONSTACK(wait);
 
+	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+	if ((sector & bs_mask) || ((sector + nr_sects) & bs_mask))
+		return -EINVAL;
+
 	atomic_set(&bb.done, 1);
 	bb.error = 0;
 	bb.wait = &wait;

^ permalink raw reply related	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2016-09-29 21:17 UTC | newest]

Thread overview: 35+ messages
2016-04-13  4:01 [RFC DONOTMERGE v8 0/3] fallocate for block devices Darrick J. Wong
2016-04-13  4:01 ` Darrick J. Wong
2016-04-13  4:01 ` Darrick J. Wong
2016-04-13  4:01 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong
2016-04-13  4:01   ` Darrick J. Wong
2016-04-13  4:01   ` Darrick J. Wong
2016-04-13  4:01 ` [PATCH 2/3] block: require write_same and discard requests align to logical block size Darrick J. Wong
2016-04-13  4:01   ` Darrick J. Wong
2016-04-13 14:23   ` Christoph Hellwig
2016-04-13 14:23     ` Christoph Hellwig
2016-04-13  4:01 ` [PATCH 3/3] block: implement (some of) fallocate for block devices Darrick J. Wong
2016-04-13  4:01   ` Darrick J. Wong
2016-04-13  4:01   ` Darrick J. Wong
2016-04-13  4:04 ` [PATCH 4/3] block: test " Darrick J. Wong
2016-04-13  4:04   ` Darrick J. Wong
  -- strict thread matches above, loose matches on Subject: below --
2016-09-29 21:16 [PATCH v11 0/3] " Darrick J. Wong
2016-09-29 21:16 ` [PATCH 2/3] block: require write_same and discard requests align to logical block size Darrick J. Wong
2016-09-29 21:16   ` Darrick J. Wong
2016-09-29  0:39 [PATCH v10 0/3] fallocate for block devices Darrick J. Wong
2016-09-29  0:39 ` [PATCH 2/3] block: require write_same and discard requests align to logical block size Darrick J. Wong
2016-09-29  0:39   ` Darrick J. Wong
2016-09-29  5:56   ` Hannes Reinecke
2016-09-29  5:56     ` Hannes Reinecke
2016-09-29  5:56     ` Hannes Reinecke
2016-08-26  0:02 [PATCH v10 0/3] fallocate for block devices Darrick J. Wong
2016-08-26  0:02 ` [PATCH 2/3] block: require write_same and discard requests align to logical block size Darrick J. Wong
2016-08-26  0:02   ` Darrick J. Wong
2016-06-17  1:17 [PATCH v9 0/3] fallocate for block devices Darrick J. Wong
2016-06-17  1:17 ` [PATCH 2/3] block: require write_same and discard requests align to logical block size Darrick J. Wong
2016-06-17  1:17   ` Darrick J. Wong
2016-06-17  1:17   ` Darrick J. Wong
2016-06-20 12:37   ` Bart Van Assche
2016-06-20 12:37     ` Bart Van Assche
2016-06-29  4:58   ` Martin K. Petersen
2016-06-29  4:58     ` Martin K. Petersen
2016-03-15 19:42 [PATCH v7 0/3] fallocate for block devices to provide zero-out Darrick J. Wong
2016-03-15 19:42 ` [PATCH 2/3] block: require write_same and discard requests align to logical block size Darrick J. Wong
2016-03-05  0:55 [PATCH v6 0/3] fallocate for block devices to provide zero-out Darrick J. Wong
2016-03-05  0:56 ` [PATCH 2/3] block: require write_same and discard requests align to logical block size Darrick J. Wong
2016-03-05  3:02   ` Linus Torvalds
2016-03-15  7:34   ` Christoph Hellwig
