All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4] block: create ioctl to discard-or-zeroout a range of blocks
@ 2015-11-13 22:01 Darrick J. Wong
  2015-11-16 22:56 ` Seymour, Shane M
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Darrick J. Wong @ 2015-11-13 22:01 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig
  Cc: Seymour, Shane M, linux-kernel, linux-fsdevel, linux-api,
	Jeff Layton, J. Bruce Fields, martin.petersen

Create a new ioctl to expose the block layer's newfound ability to
issue either a zeroing discard, a WRITE SAME with a zero page, or a
regular write with the zero page.  This BLKZEROOUT2 ioctl takes
{start, length, flags} as parameters.  So far, the only flag available
is to enable the zeroing discard part -- without it, the call invokes
the old BLKZEROOUT behavior.  start and length have the same meaning
as in BLKZEROOUT.

Furthermore, because BLKZEROOUT2 issues commands directly to the
storage device, we must invalidate the page cache (as a regular
O_DIRECT write would do) to avoid returning stale cache contents at a
later time.

v3: Add extra padding for future expansion, and check the padding is zero.
v4: Check the start/len arguments for overflows prior to feeding the page
    cache bogus numbers (that it'll ignore anyway).

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 block/ioctl.c           |   50 ++++++++++++++++++++++++++++++++++++++++-------
 include/uapi/linux/fs.h |    9 ++++++++
 2 files changed, 52 insertions(+), 7 deletions(-)

diff --git a/block/ioctl.c b/block/ioctl.c
index 8061eba..5c092e8 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -213,19 +213,41 @@ static int blk_ioctl_discard(struct block_device *bdev, uint64_t start,
 }
 
 static int blk_ioctl_zeroout(struct block_device *bdev, uint64_t start,
-			     uint64_t len)
+			     uint64_t len, uint32_t flags)
 {
+	int ret;
+	struct address_space *mapping;
+	uint64_t end = start + len - 1;
+
+	if (flags & ~BLKZEROOUT2_DISCARD_OK)
+		return -EINVAL;
 	if (start & 511)
 		return -EINVAL;
 	if (len & 511)
 		return -EINVAL;
-	start >>= 9;
-	len >>= 9;
-
-	if (start + len > (i_size_read(bdev->bd_inode) >> 9))
+	if (end >= (uint64_t)i_size_read(bdev->bd_inode))
+		return -EINVAL;
+	if (end < start)
 		return -EINVAL;
 
-	return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL, false);
+	/* Invalidate the page cache, including dirty pages */
+	mapping = bdev->bd_inode->i_mapping;
+	truncate_inode_pages_range(mapping, start, end);
+
+	ret = blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
+				   flags & BLKZEROOUT2_DISCARD_OK);
+	if (ret)
+		goto out;
+
+	/*
+	 * Invalidate again; if someone wandered in and dirtied a page,
+	 * the caller will be given -EBUSY.
+	 */
+	ret = invalidate_inode_pages2_range(mapping,
+					    start >> PAGE_CACHE_SHIFT,
+					    end >> PAGE_CACHE_SHIFT);
+out:
+	return ret;
 }
 
 static int put_ushort(unsigned long arg, unsigned short val)
@@ -353,7 +375,21 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
 		if (copy_from_user(range, (void __user *)arg, sizeof(range)))
 			return -EFAULT;
 
-		return blk_ioctl_zeroout(bdev, range[0], range[1]);
+		return blk_ioctl_zeroout(bdev, range[0], range[1], 0);
+	}
+	case BLKZEROOUT2: {
+		struct blkzeroout2 p;
+
+		if (!(mode & FMODE_WRITE))
+			return -EBADF;
+
+		if (copy_from_user(&p, (void __user *)arg, sizeof(p)))
+			return -EFAULT;
+
+		if (p.padding || p.padding2)
+			return -EINVAL;
+
+		return blk_ioctl_zeroout(bdev, p.start, p.length, p.flags);
 	}
 
 	case HDIO_GETGEO: {
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 9b964a5..b811fa4 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -152,6 +152,15 @@ struct inodes_stat_t {
 #define BLKSECDISCARD _IO(0x12,125)
 #define BLKROTATIONAL _IO(0x12,126)
 #define BLKZEROOUT _IO(0x12,127)
+struct blkzeroout2 {
+	__u64 start;
+	__u64 length;
+	__u32 flags;
+	__u32 padding;
+	__u64 padding2;
+};
+#define BLKZEROOUT2_DISCARD_OK	1
+#define BLKZEROOUT2 _IOR(0x12, 127, struct blkzeroout2)
 
 #define BMAP_IOCTL 1		/* obsolete - kept for compatibility */
 #define FIBMAP	   _IO(0x00,1)	/* bmap access */

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* RE: [PATCH v4] block: create ioctl to discard-or-zeroout a range of blocks
  2015-11-13 22:01 [PATCH v4] block: create ioctl to discard-or-zeroout a range of blocks Darrick J. Wong
@ 2015-11-16 22:56 ` Seymour, Shane M
  2015-12-08  1:16   ` Martin K. Petersen
  2015-12-08  2:40 ` Christoph Hellwig
  2 siblings, 0 replies; 6+ messages in thread
From: Seymour, Shane M @ 2015-11-16 22:56 UTC (permalink / raw)
  To: Darrick J. Wong, Jens Axboe, Christoph Hellwig
  Cc: linux-kernel, linux-fsdevel, linux-api, Jeff Layton,
	J. Bruce Fields, martin.petersen


> v4: Check the start/len arguments for overflows prior to feeding the page
>     cache bogus numbers (that it'll ignore anyway).

> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Reviewed-by: Shane Seymour <shane.seymour@hpe.com>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v4] block: create ioctl to discard-or-zeroout a range of blocks
@ 2015-12-08  1:16   ` Martin K. Petersen
  0 siblings, 0 replies; 6+ messages in thread
From: Martin K. Petersen @ 2015-12-08  1:16 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Jens Axboe, Christoph Hellwig, Seymour, Shane M, linux-kernel,
	linux-fsdevel, linux-api, Jeff Layton, J. Bruce Fields,
	martin.petersen

>>>>> "Darrick" == Darrick J Wong <darrick.wong@oracle.com> writes:

Darrick> Create a new ioctl to expose the block layer's newfound ability
Darrick> to issue either a zeroing discard, a WRITE SAME with a zero
Darrick> page, or a regular write with the zero page.  This BLKZEROOUT2
Darrick> ioctl takes {start, length, flags} as parameters.  So far, the
Darrick> only flag available is to enable the zeroing discard part --
Darrick> without it, the call invokes the old BLKZEROOUT behavior.
Darrick> start and length have the same meaning as in BLKZEROOUT.

Darrick> Furthermore, because BLKZEROOUT2 issues commands directly to
Darrick> the storage device, we must invalidate the page cache (as a
Darrick> regular O_DIRECT write would do) to avoid returning stale cache
Darrick> contents at a later time.

Jens? Christoph?

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v4] block: create ioctl to discard-or-zeroout a range of blocks
@ 2015-12-08  1:16   ` Martin K. Petersen
  0 siblings, 0 replies; 6+ messages in thread
From: Martin K. Petersen @ 2015-12-08  1:16 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Jens Axboe, Christoph Hellwig, Seymour, Shane M,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-api@vger.kernel.org, Jeff Layton, J. Bruce Fields,
	martin.petersen@oracle.com

>>>>> "Darrick" == Darrick J Wong <darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> writes:

Darrick> Create a new ioctl to expose the block layer's newfound ability
Darrick> to issue either a zeroing discard, a WRITE SAME with a zero
Darrick> page, or a regular write with the zero page.  This BLKZEROOUT2
Darrick> ioctl takes {start, length, flags} as parameters.  So far, the
Darrick> only flag available is to enable the zeroing discard part --
Darrick> without it, the call invokes the old BLKZEROOUT behavior.
Darrick> start and length have the same meaning as in BLKZEROOUT.

Darrick> Furthermore, because BLKZEROOUT2 issues commands directly to
Darrick> the storage device, we must invalidate the page cache (as a
Darrick> regular O_DIRECT write would do) to avoid returning stale cache
Darrick> contents at a later time.

Jens? Christoph?

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v4] block: create ioctl to discard-or-zeroout a range of blocks
  2015-11-13 22:01 [PATCH v4] block: create ioctl to discard-or-zeroout a range of blocks Darrick J. Wong
  2015-11-16 22:56 ` Seymour, Shane M
  2015-12-08  1:16   ` Martin K. Petersen
@ 2015-12-08  2:40 ` Christoph Hellwig
  2015-12-08  5:28   ` Darrick J. Wong
  2 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2015-12-08  2:40 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Jens Axboe, Christoph Hellwig, Seymour, Shane M, linux-kernel,
	linux-fsdevel, linux-api, Jeff Layton, J. Bruce Fields,
	martin.petersen

On Fri, Nov 13, 2015 at 02:01:43PM -0800, Darrick J. Wong wrote:
> Create a new ioctl to expose the block layer's newfound ability to
> issue either a zeroing discard, a WRITE SAME with a zero page, or a
> regular write with the zero page.  This BLKZEROOUT2 ioctl takes
> {start, length, flags} as parameters.  So far, the only flag available
> is to enable the zeroing discard part -- without it, the call invokes
> the old BLKZEROOUT behavior.  start and length have the same meaning
> as in BLKZEROOUT.
> 
> Furthermore, because BLKZEROOUT2 issues commands directly to the
> storage device, we must invalidate the page cache (as a regular
> O_DIRECT write would do) to avoid returning stale cache contents at a
> later time.

So does BLKZEROOUT.  Seems like adding the cache invalidation should
be one patch and the ioctl another one.  Otherwise this looks fine
except that I kinda hate BLKZEROOUT2 name, but can't come up with
anything better.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v4] block: create ioctl to discard-or-zeroout a range of blocks
  2015-12-08  2:40 ` Christoph Hellwig
@ 2015-12-08  5:28   ` Darrick J. Wong
  0 siblings, 0 replies; 6+ messages in thread
From: Darrick J. Wong @ 2015-12-08  5:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Seymour, Shane M, linux-kernel, linux-fsdevel,
	linux-api, Jeff Layton, J. Bruce Fields, martin.petersen

On Mon, Dec 07, 2015 at 06:40:15PM -0800, Christoph Hellwig wrote:
> On Fri, Nov 13, 2015 at 02:01:43PM -0800, Darrick J. Wong wrote:
> > Create a new ioctl to expose the block layer's newfound ability to
> > issue either a zeroing discard, a WRITE SAME with a zero page, or a
> > regular write with the zero page.  This BLKZEROOUT2 ioctl takes
> > {start, length, flags} as parameters.  So far, the only flag available
> > is to enable the zeroing discard part -- without it, the call invokes
> > the old BLKZEROOUT behavior.  start and length have the same meaning
> > as in BLKZEROOUT.
> > 
> > Furthermore, because BLKZEROOUT2 issues commands directly to the
> > storage device, we must invalidate the page cache (as a regular
> > O_DIRECT write would do) to avoid returning stale cache contents at a
> > later time.
> 
> So does BLKZEROOUT.  Seems like adding the cache invalidation should
> be one patch and the ioctl another one.  Otherwise this looks fine
> except that I kinda hate BLKZEROOUT2 name, but can't come up with
> anything better.

I should've sent the 4.4 version -- since the ioctl code was refactored into
separate functions and the new function modifies its arguments, I ended up
rewriting most of the function body just to set up the page cache invalidation.

(Ah, I see -- 4.4-rc1 came out and I forgot to resend.)

Blergh, v5 on its way...

--D

> --
> To unsubscribe from this list: send the line "unsubscribe linux-api" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2015-12-08  5:29 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-13 22:01 [PATCH v4] block: create ioctl to discard-or-zeroout a range of blocks Darrick J. Wong
2015-11-16 22:56 ` Seymour, Shane M
2015-12-08  1:16 ` Martin K. Petersen
2015-12-08  1:16   ` Martin K. Petersen
2015-12-08  2:40 ` Christoph Hellwig
2015-12-08  5:28   ` Darrick J. Wong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.