linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFCv2 0/3] Support for high-priority block device flag
@ 2016-05-14  0:02 Jon Derrick
  2016-05-14  0:02 ` [RFCv2 1/3] block: allow other bd i_node flags when DAX is disabled Jon Derrick
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Jon Derrick @ 2016-05-14  0:02 UTC (permalink / raw)
  To: linux-block
  Cc: Jon Derrick, Jens Axboe, Alexander Viro, linux-fsdevel,
	Dan Williams, Jeff Moyer, Stephen Bates, Keith Busch, linux-nvme,
	Christoph Hellwig


V2: Added inode locking as appropriate when setting i_flags


This set intends to recreate block polling (now HIPRI) behavior that was
present in 4.5, where all IO on a queue could be selected to use
block polling behavior. The set allows a block device file to subscribe
to block polling on a block device granularity, rather than a per-queue
granularity.

There have been few-to-no arguments in support of the per-queue,
always-poll functionality that 4.5 offered, moreso in favor of enabling
polling on the entire block device (or indivual IOs as 4.6 offers).

I've been made aware that streams and ioprio may supercede this
functionality in the future, but I'm hoping this is an acceptable
stopgap in the meantime.

This set applies against 4.6-rc7 as well as Jens' for-4.7/core

(I've also been made aware that it may not apply cleanly to 4.7 after
several DAX changes)


Jon Derrick (3):
  block: allow other bd i_node flags when DAX is disabled
  block: add helper for setting and clearing S_DAX on inode
  block: Introduce S_HIPRI inode flag

 block/ioctl.c           | 43 +++++++++++++++++++++++++++++++++++++++++++
 fs/block_dev.c          | 23 +++++++++++++++++++----
 include/linux/fs.h      |  2 ++
 include/uapi/linux/fs.h |  2 ++
 4 files changed, 66 insertions(+), 4 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [RFCv2 1/3] block: allow other bd i_node flags when DAX is disabled
  2016-05-14  0:02 [RFCv2 0/3] Support for high-priority block device flag Jon Derrick
@ 2016-05-14  0:02 ` Jon Derrick
  2016-05-14  0:02 ` [RFCv2 2/3] block: add helper for setting and clearing S_DAX on inode Jon Derrick
  2016-05-14  0:02 ` [RFCv2 3/3] block: Introduce S_HIPRI inode flag Jon Derrick
  2 siblings, 0 replies; 7+ messages in thread
From: Jon Derrick @ 2016-05-14  0:02 UTC (permalink / raw)
  To: linux-block
  Cc: Jon Derrick, Jens Axboe, Alexander Viro, linux-fsdevel,
	Dan Williams, Jeff Moyer, Stephen Bates, Keith Busch, linux-nvme,
	Christoph Hellwig

When DAX is not compiled into the kernel or the device does not support
direct-access, the block device file's inode flags are fully cleared.
This patch changes it to only clear the S_DAX flag when DAX is disabled.

This reverts to i_flags behavior prior to
bbab37ddc20bae4709bca8745c128c4f46fe63c5

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
---
 fs/block_dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 20a2c02..d4fa725 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1208,7 +1208,7 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
 		if (IS_ENABLED(CONFIG_BLK_DEV_DAX) && disk->fops->direct_access)
 			bdev->bd_inode->i_flags = S_DAX;
 		else
-			bdev->bd_inode->i_flags = 0;
+			bdev->bd_inode->i_flags &= ~S_DAX;
 
 		if (!partno) {
 			ret = -ENXIO;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [RFCv2 2/3] block: add helper for setting and clearing S_DAX on inode
  2016-05-14  0:02 [RFCv2 0/3] Support for high-priority block device flag Jon Derrick
  2016-05-14  0:02 ` [RFCv2 1/3] block: allow other bd i_node flags when DAX is disabled Jon Derrick
@ 2016-05-14  0:02 ` Jon Derrick
  2016-05-14  5:05   ` Elliott, Robert (Persistent Memory)
  2016-05-14  0:02 ` [RFCv2 3/3] block: Introduce S_HIPRI inode flag Jon Derrick
  2 siblings, 1 reply; 7+ messages in thread
From: Jon Derrick @ 2016-05-14  0:02 UTC (permalink / raw)
  To: linux-block
  Cc: Jon Derrick, Jens Axboe, Alexander Viro, linux-fsdevel,
	Dan Williams, Jeff Moyer, Stephen Bates, Keith Busch, linux-nvme,
	Christoph Hellwig

inode->i_flags locking rules suggest using the i_mutex lock. This patch
adds a helper to do the locking and setting of S_DAX on the block device
inode.

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
---
 fs/block_dev.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index d4fa725..252e459 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1159,6 +1159,18 @@ void bd_set_size(struct block_device *bdev, loff_t size)
 }
 EXPORT_SYMBOL(bd_set_size);
 
+static void bd_set_dax(struct block_device *bdev, bool enabled)
+{
+	struct inode *inode = bdev->bd_inode;
+
+	inode_lock(inode);
+	if (enabled)
+		inode->i_flags = S_DAX;
+	else
+		inode->i_flags &= ~S_DAX;
+	inode_unlock(inode);
+}
+
 static void __blkdev_put(struct block_device *bdev, fmode_t mode, int for_part);
 
 /*
@@ -1206,9 +1218,9 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
 		bdev->bd_queue = disk->queue;
 		bdev->bd_contains = bdev;
 		if (IS_ENABLED(CONFIG_BLK_DEV_DAX) && disk->fops->direct_access)
-			bdev->bd_inode->i_flags = S_DAX;
+			bd_set_dax(bdev, 1);
 		else
-			bdev->bd_inode->i_flags &= ~S_DAX;
+			bd_set_dax(bdev, 0);
 
 		if (!partno) {
 			ret = -ENXIO;
@@ -1239,7 +1251,7 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
 			if (!ret) {
 				bd_set_size(bdev,(loff_t)get_capacity(disk)<<9);
 				if (!blkdev_dax_capable(bdev))
-					bdev->bd_inode->i_flags &= ~S_DAX;
+					bd_set_dax(bdev, 0);
 			}
 
 			/*
@@ -1276,7 +1288,7 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
 			}
 			bd_set_size(bdev, (loff_t)bdev->bd_part->nr_sects << 9);
 			if (!blkdev_dax_capable(bdev))
-				bdev->bd_inode->i_flags &= ~S_DAX;
+				bd_set_dax(bdev, 0);
 		}
 	} else {
 		if (bdev->bd_contains == bdev) {
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [RFCv2 3/3] block: Introduce S_HIPRI inode flag
  2016-05-14  0:02 [RFCv2 0/3] Support for high-priority block device flag Jon Derrick
  2016-05-14  0:02 ` [RFCv2 1/3] block: allow other bd i_node flags when DAX is disabled Jon Derrick
  2016-05-14  0:02 ` [RFCv2 2/3] block: add helper for setting and clearing S_DAX on inode Jon Derrick
@ 2016-05-14  0:02 ` Jon Derrick
  2016-05-14 14:22   ` Dan Williams
  2 siblings, 1 reply; 7+ messages in thread
From: Jon Derrick @ 2016-05-14  0:02 UTC (permalink / raw)
  To: linux-block
  Cc: Jon Derrick, Jens Axboe, Alexander Viro, linux-fsdevel,
	Dan Williams, Jeff Moyer, Stephen Bates, Keith Busch, linux-nvme,
	Christoph Hellwig

S_HIPRI is a hint that indicates the file (currently only block devices)
is a high priority file. This hint allows direct-io to the block device
to poll for completions if polling is available to the block device.

The motivation for this patch comes from tiered caching solutions. A
user may wish to have low-latency block devices act as a cache for
higher-latency storage media.

With the introduction of block polling, polling could be enabled on a
queue of a block device. The preadv2/pwritev2 sets allowed a user to
specify per-io polling, but removed the ability to poll per-queue.

Instead of having a user modify their software to use preadv2/pwritev2,
this patch allows a user to set S_HIPRI on a block device file to request
all direct-io for this file to be polled.

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
---
 block/ioctl.c           | 43 +++++++++++++++++++++++++++++++++++++++++++
 fs/block_dev.c          |  3 +++
 include/linux/fs.h      |  2 ++
 include/uapi/linux/fs.h |  2 ++
 4 files changed, 50 insertions(+)

diff --git a/block/ioctl.c b/block/ioctl.c
index 4ff1f92..1b50fef 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -520,6 +520,45 @@ static int blkdev_bszset(struct block_device *bdev, fmode_t mode,
 	return ret;
 }
 
+static int blkdev_hipriget(struct block_device *bdev)
+{
+	return !!(bdev->bd_inode->i_flags & S_HIPRI);
+}
+
+static int blkdev_hipriset(struct block_device *bdev, fmode_t mode,
+		int __user *argp)
+{
+	int n;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EACCES;
+	if (!argp)
+		return -EINVAL;
+	if (get_user(n, argp))
+		return -EFAULT;
+
+	n = !!n;
+	if (n == blkdev_hipriget(bdev))
+		return 0;
+
+	if (!(mode & FMODE_EXCL)) {
+		bdgrab(bdev);
+		if (blkdev_get(bdev, mode | FMODE_EXCL, &bdev) < 0)
+			return -EBUSY;
+	}
+
+	inode_lock(bdev->bd_inode);
+	if (n)
+		bdev->bd_inode->i_flags |= S_HIPRI;
+	else
+		bdev->bd_inode->i_flags &= ~S_HIPRI;
+	inode_unlock(bdev->bd_inode);
+
+	if (!(mode & FMODE_EXCL))
+		blkdev_put(bdev, mode | FMODE_EXCL);
+	return 0;
+}
+
 /*
  * always keep this in sync with compat_blkdev_ioctl()
  */
@@ -601,6 +640,10 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
 	case BLKDAXGET:
 		return put_int(arg, !!(bdev->bd_inode->i_flags & S_DAX));
 		break;
+	case BLKHIPRISET:
+		return blkdev_hipriset(bdev, mode, argp);
+	case BLKHIPRIGET:
+		return put_int(arg, blkdev_hipriget(bdev));
 	case IOC_PR_REGISTER:
 		return blkdev_pr_register(bdev, argp);
 	case IOC_PR_RESERVE:
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 252e459..ede8325 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -170,6 +170,9 @@ blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, loff_t offset)
 	if (IS_DAX(inode))
 		return dax_do_io(iocb, inode, iter, offset, blkdev_get_block,
 				NULL, DIO_SKIP_DIO_COUNT);
+
+	if (IS_HIPRI(inode))
+		iocb->ki_flags |= IOCB_HIPRI;
 	return __blockdev_direct_IO(iocb, inode, I_BDEV(inode), iter, offset,
 				    blkdev_get_block, NULL, NULL,
 				    DIO_SKIP_DIO_COUNT);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 70e61b5..8ae39ea 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1788,6 +1788,7 @@ struct super_operations {
 #else
 #define S_DAX		0	/* Make all the DAX code disappear */
 #endif
+#define S_HIPRI		16384	/* IO for this file has high priority */
 
 /*
  * Note that nosuid etc flags are inode-specific: setting some file-system
@@ -1826,6 +1827,7 @@ struct super_operations {
 #define IS_AUTOMOUNT(inode)	((inode)->i_flags & S_AUTOMOUNT)
 #define IS_NOSEC(inode)		((inode)->i_flags & S_NOSEC)
 #define IS_DAX(inode)		((inode)->i_flags & S_DAX)
+#define IS_HIPRI(inode)		((inode)->i_flags & S_HIPRI)
 
 #define IS_WHITEOUT(inode)	(S_ISCHR(inode->i_mode) && \
 				 (inode)->i_rdev == WHITEOUT_DEV)
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index a079d50..d6e262c 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -223,6 +223,8 @@ struct fsxattr {
 #define BLKROTATIONAL _IO(0x12,126)
 #define BLKZEROOUT _IO(0x12,127)
 #define BLKDAXGET _IO(0x12,129)
+#define BLKHIPRISET _IOW(0x12,130,int)
+#define BLKHIPRIGET _IO(0x12,131)
 
 #define BMAP_IOCTL 1		/* obsolete - kept for compatibility */
 #define FIBMAP	   _IO(0x00,1)	/* bmap access */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* RE: [RFCv2 2/3] block: add helper for setting and clearing S_DAX on inode
  2016-05-14  0:02 ` [RFCv2 2/3] block: add helper for setting and clearing S_DAX on inode Jon Derrick
@ 2016-05-14  5:05   ` Elliott, Robert (Persistent Memory)
  2016-05-16 17:31     ` Jon Derrick
  0 siblings, 1 reply; 7+ messages in thread
From: Elliott, Robert (Persistent Memory) @ 2016-05-14  5:05 UTC (permalink / raw)
  To: Jon Derrick, linux-block
  Cc: Jens Axboe, Alexander Viro, linux-fsdevel, Dan Williams,
	Jeff Moyer, Stephen Bates, Keith Busch, linux-nvme,
	Christoph Hellwig



> -----Original Message-----
> From: linux-block-owner@vger.kernel.org [mailto:linux-block-
> owner@vger.kernel.org] On Behalf Of Jon Derrick
> Sent: Friday, May 13, 2016 7:03 PM
...
> Subject: [RFCv2 2/3] block: add helper for setting and clearing S_DAX on
> inode
> 
> inode->i_flags locking rules suggest using the i_mutex lock. This patch
> adds a helper to do the locking and setting of S_DAX on the block device
> inode.
...
> +static void bd_set_dax(struct block_device *bdev, bool enabled)
> +{
> +	struct inode *inode = bdev->bd_inode;
> +
> +	inode_lock(inode);
> +	if (enabled)
> +		inode->i_flags = S_DAX;
> +	else
> +		inode->i_flags &= ~S_DAX;
> +	inode_unlock(inode);
> +}

This is not symmetric - setting wipes out any other bits, but
clearing only clears the S_DAX bit. That seems confusing for
a helper function.

Using |= would be symmetric, but wouldn't replace what 
__blkdev_get does (if what it does is appropriate).

> @@ -1206,9 +1218,9 @@ static int __blkdev_get(struct block_device *bdev,
> fmode_t mode, int for_part)
>  		bdev->bd_queue = disk->queue;
>  		bdev->bd_contains = bdev;
>  		if (IS_ENABLED(CONFIG_BLK_DEV_DAX) && disk->fops-
> >direct_access)
> -			bdev->bd_inode->i_flags = S_DAX;
> +			bd_set_dax(bdev, 1);
>  		else
> -			bdev->bd_inode->i_flags &= ~S_DAX;
> +			bd_set_dax(bdev, 0);


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFCv2 3/3] block: Introduce S_HIPRI inode flag
  2016-05-14  0:02 ` [RFCv2 3/3] block: Introduce S_HIPRI inode flag Jon Derrick
@ 2016-05-14 14:22   ` Dan Williams
  0 siblings, 0 replies; 7+ messages in thread
From: Dan Williams @ 2016-05-14 14:22 UTC (permalink / raw)
  To: Jon Derrick
  Cc: linux-block, Jens Axboe, Alexander Viro, linux-fsdevel,
	Jeff Moyer, Stephen Bates, Keith Busch, linux-nvme,
	Christoph Hellwig, Martin K. Petersen

[ adding Martin, given his hint infrastructure that is in the works... ]


On Fri, May 13, 2016 at 5:02 PM, Jon Derrick <jonathan.derrick@intel.com> wrote:
> S_HIPRI is a hint that indicates the file (currently only block devices)
> is a high priority file. This hint allows direct-io to the block device
> to poll for completions if polling is available to the block device.
>
> The motivation for this patch comes from tiered caching solutions. A
> user may wish to have low-latency block devices act as a cache for
> higher-latency storage media.
>
> With the introduction of block polling, polling could be enabled on a
> queue of a block device. The preadv2/pwritev2 sets allowed a user to
> specify per-io polling, but removed the ability to poll per-queue.
>
> Instead of having a user modify their software to use preadv2/pwritev2,
> this patch allows a user to set S_HIPRI on a block device file to request
> all direct-io for this file to be polled.

Setting aside whether this should be a per-task ioprio hint vs a block
device inode hint, this single flag loses the granularity of whether
reads or writes are polled.  Low latency reads with posted writes
seems a valid configuration.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFCv2 2/3] block: add helper for setting and clearing S_DAX on inode
  2016-05-14  5:05   ` Elliott, Robert (Persistent Memory)
@ 2016-05-16 17:31     ` Jon Derrick
  0 siblings, 0 replies; 7+ messages in thread
From: Jon Derrick @ 2016-05-16 17:31 UTC (permalink / raw)
  To: Elliott, Robert (Persistent Memory)
  Cc: linux-block, Jens Axboe, Alexander Viro, linux-fsdevel,
	Dan Williams, Jeff Moyer, Stephen Bates, Keith Busch, linux-nvme,
	Christoph Hellwig

On Sat, May 14, 2016 at 05:05:59AM +0000, Elliott, Robert (Persistent Memory) wrote:
> 
> 
> > -----Original Message-----
> > From: linux-block-owner@vger.kernel.org [mailto:linux-block-
> > owner@vger.kernel.org] On Behalf Of Jon Derrick
> > Sent: Friday, May 13, 2016 7:03 PM
> ...
> > Subject: [RFCv2 2/3] block: add helper for setting and clearing S_DAX on
> > inode
> > 
> > inode->i_flags locking rules suggest using the i_mutex lock. This patch
> > adds a helper to do the locking and setting of S_DAX on the block device
> > inode.
> ...
> > +static void bd_set_dax(struct block_device *bdev, bool enabled)
> > +{
> > +	struct inode *inode = bdev->bd_inode;
> > +
> > +	inode_lock(inode);
> > +	if (enabled)
> > +		inode->i_flags = S_DAX;
> > +	else
> > +		inode->i_flags &= ~S_DAX;
> > +	inode_unlock(inode);
> > +}
> 
> This is not symmetric - setting wipes out any other bits, but
> clearing only clears the S_DAX bit. That seems confusing for
> a helper function.
> 
> Using |= would be symmetric, but wouldn't replace what 
> __blkdev_get does (if what it does is appropriate).
> 
I'm under the impression that S_DAX is the only inode flag appropriate for a block device inode with direct-access, even though it does break the 'flag' semantic.

It does bring up an interesting question about the fate of the block device inode when S_DAX is set and then cleared later due to the two other conditions that can clear S_DAX.

I think the appropriate fix is to save the inode flags when setting DAX and restore them when we need to clear S_DAX.

> > @@ -1206,9 +1218,9 @@ static int __blkdev_get(struct block_device *bdev,
> > fmode_t mode, int for_part)
> >  		bdev->bd_queue = disk->queue;
> >  		bdev->bd_contains = bdev;
> >  		if (IS_ENABLED(CONFIG_BLK_DEV_DAX) && disk->fops-
> > >direct_access)
> > -			bdev->bd_inode->i_flags = S_DAX;
> > +			bd_set_dax(bdev, 1);
> >  		else
> > -			bdev->bd_inode->i_flags &= ~S_DAX;
> > +			bd_set_dax(bdev, 0);
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-block" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2016-05-16 17:31 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-14  0:02 [RFCv2 0/3] Support for high-priority block device flag Jon Derrick
2016-05-14  0:02 ` [RFCv2 1/3] block: allow other bd i_node flags when DAX is disabled Jon Derrick
2016-05-14  0:02 ` [RFCv2 2/3] block: add helper for setting and clearing S_DAX on inode Jon Derrick
2016-05-14  5:05   ` Elliott, Robert (Persistent Memory)
2016-05-16 17:31     ` Jon Derrick
2016-05-14  0:02 ` [RFCv2 3/3] block: Introduce S_HIPRI inode flag Jon Derrick
2016-05-14 14:22   ` Dan Williams

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).