* [PATCHSET v10] Add support for write life time hints
@ 2017-06-26 15:37 Jens Axboe
  2017-06-26 15:37 ` [PATCH 1/9] fs: add fcntl() interface for setting/getting " Jens Axboe
                   ` (8 more replies)
  0 siblings, 9 replies; 42+ messages in thread
From: Jens Axboe @ 2017-06-26 15:37 UTC (permalink / raw)
  To: linux-block; +Cc: linux-fsdevel, hch, martin.petersen

A new iteration of this patchset, previously known as write streams.
As before, this patchset aims to enable applications to split up
writes into separate streams, based on the perceived life time
of the data written. This is useful for a variety of reasons:

- For NVMe, this feature is ratified and released with the NVMe 1.3
  spec. Devices implementing Directives can expose multiple streams.
  Separating data written into streams based on life time can
  drastically reduce the write amplification. This helps device
  endurance, and increases performance. Testing just performed
  internally at Facebook with these patches showed up to a 25% reduction
  in NAND writes in a RocksDB setup.

- Software caching solutions can make more intelligent decisions
  on how and where to place data.

Contrary to previous patches, we're not exposing numeric stream values anymore.
I've previously advocated for just doing a set of hints that make sense
instead. See the coverage from the LSFMM summit this year:

https://lwn.net/Articles/717755/

This patchset attempts to do that. We add an fcntl(2) interface to
get/set these types of hints. We define 4 hints that pertain to
data write life times:

RWH_WRITE_LIFE_SHORT	Data written with this flag is expected to have
			a high overwrite rate, i.e. a short life time.

RWH_WRITE_LIFE_MEDIUM	Longer life time than SHORT

RWH_WRITE_LIFE_LONG	Longer life time than MEDIUM

RWH_WRITE_LIFE_EXTREME	Longer life time than LONG

The idea is that these are relative values, so an application can
use them as they see fit. The underlying device can then place
data appropriately, or be free to ignore the hint. It's just a hint.
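
To illustrate what "relative" means in practice, here is a minimal
sketch of how an LSM-tree style store might pick a hint per level.
The level_to_write_hint() helper and the policy itself are purely
hypothetical and not part of this patchset; only the RWH_* values
come from the uapi header added in patch 1.

```c
#include <assert.h>
#include <stdint.h>

/* Hint values as defined by this patchset's uapi header. */
#ifndef RWH_WRITE_LIFE_SHORT
#define RWH_WRITE_LIFE_NONE	1
#define RWH_WRITE_LIFE_SHORT	2
#define RWH_WRITE_LIFE_MEDIUM	3
#define RWH_WRITE_LIFE_LONG	4
#define RWH_WRITE_LIFE_EXTREME	5
#endif

/*
 * Hypothetical policy: level 0 is rewritten most often, deeper levels
 * live longer. Only the ordering between hints matters, no absolute
 * life time is implied.
 */
static uint64_t level_to_write_hint(int level)
{
	assert(level >= 0);

	switch (level) {
	case 0:
		return RWH_WRITE_LIFE_SHORT;
	case 1:
		return RWH_WRITE_LIFE_MEDIUM;
	case 2:
		return RWH_WRITE_LIFE_LONG;
	default:
		return RWH_WRITE_LIFE_EXTREME;
	}
}
```

The returned value is what an application would pass to the fcntl(2)
set command for the file backing that level. A different application
is free to draw the boundaries elsewhere.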

A branch based on current master can be pulled
from here:

git://git.kernel.dk/linux-block write-stream.10

Changes since v9:

- Address Christoph's concerns:
	- Add NVMe 'streams' parameter, default to off.
	- Add file get/set fcntl() commands.
	- Add helper for getting block opf mask from inode write
	  hint.
	- Fix up a few lines longer than 80 columns.

Changes since v8:

- Add file write hints as well. File hints override inode hints,
  if both are valid and available.
- Distinguish between "hint not set" and "hint none".
- NVMe: remove global stream allocation and stream parameter
- Rebase on top of new for-4.13/block, to fixup conflicts with
  the NOWAIT patchset.

Changes since v7:

- NVMe: change 'streams' parameter to be a bool enable/disable. We
  hardwire the number of streams anyway and use the appropriate amount,
  so no point in exposing this value.
- NVMe: collapse stream values appropriately, instead of just doing
  a basic MOD.
- Get rid of pwritev2(2) flags. Just use the fcntl(2) interface.
- Collapse some patches
- Change fcntl(2) interface to get/set values from a user supplied
  64-bit pointer.
- Move inode-to-iocb mask setting to iocb_flags().

Changes since v6:

- Rewrite NVMe write stream assignment
- Change NVMe stream assignment to be per-controller, not per-ns. Then
  we can use the same IDs across name spaces, and we don't have to do
  lazy setup of streams.
- If streams are enabled on nvme, set io min/opt and discard
  granularity based on the stream params reported.
- Fixup F_SET_RW_HINT definition, it was 20, should have been 12.

Changes since v5:

- Change enum write_hint to enum rw_hint.
- Change fcntl() interface to be read/write generic
- Bring enum rw_hint all the way to bio/request
- Change references to streams in changelogs and debugfs interface
- Rebase to master to resolve blkdev.h conflict
- Reshuffle patches so the WRITE_LIFE_* hints and type come first. Allowed
  me to merge two block patches as well.

Changes since v4:

- Add enum write_hint and the WRITE_HINT_* values. This is what we
  use internally (until transformed to req/bio flags), and what is
  exposed to user space with the fcntl() interface. Maps directly
  to the RWF_WRITE_LIFE_* values.
- Add fcntl() interface for getting/setting hint values.
- Get rid of inode ->i_write_hint, encode the 3 bits of hint info
  in the inode flags instead.
- Allow a write with no hint to clear the old hint. Previously we
  only changed the hint if a new valid hint was given, not if no
  hint was passed in.
- Shrink flag space grabbed from 4 to 3 bits for RWF_* and the inode
  flags.

Changes since v3:

- Change any naming of stream ID to write hint.
- Various little API changes, suggested by Christoph
- Cleanup the NVMe bits, dump the debug info.
- Change NVMe to lazily allocate the streams.
- Various NVMe error handling improvements and command checking.

Changes since v2:

- Get rid of bio->bi_stream and replace with four request/bio flags.
  These map directly to the RWF_WRITE_* flags that the user passes in.
- Cleanup the NVMe stream setting.
- Drivers now responsible for updating the queue stream write counter,
  as they determine what stream to map a given flag to.

Changes since v1:

- Guard queue stream stats to ensure we don't mess up memory, if
  bio_stream() ever were to return a larger value than we support.
- NVMe: ensure we set the stream modulo the name space defined count.
- Cleanup the RWF_ and IOCB_ flags. Set aside 4 bits, and just store
  the stream value in there. This makes the passing of stream ID from
  RWF_ space to IOCB_ (and IOCB_ to bio) more efficient, and cleans it
  up in general.
- Kill the block internal definitions of the stream type, we don't need
  them anymore. See above.

-- 
Jens Axboe


* [PATCH 1/9] fs: add fcntl() interface for setting/getting write life time hints
  2017-06-26 15:37 [PATCHSET v10] Add support for write life time hints Jens Axboe
@ 2017-06-26 15:37 ` Jens Axboe
  2017-06-27 14:42   ` Christoph Hellwig
  2017-06-26 15:37 ` [PATCH 2/9] block: add support for write hints in a bio Jens Axboe
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 42+ messages in thread
From: Jens Axboe @ 2017-06-26 15:37 UTC (permalink / raw)
  To: linux-block; +Cc: linux-fsdevel, hch, martin.petersen, Jens Axboe

Define a set of write life time hints:

RWH_WRITE_LIFE_NOT_SET	No hint information set
RWH_WRITE_LIFE_NONE	No hints about write life time
RWH_WRITE_LIFE_SHORT	Data written has a short life time
RWH_WRITE_LIFE_MEDIUM	Data written has a medium life time
RWH_WRITE_LIFE_LONG	Data written has a long life time
RWH_WRITE_LIFE_EXTREME	Data written has an extremely long life time

The intent is for these values to be relative to each other, no
absolute meaning should be attached to these flag names.

Add an fcntl interface for querying and setting these flags:

F_GET_RW_HINT		Returns the read/write hint set on the
			underlying inode.

F_SET_RW_HINT		Set one of the above write hints on the
			underlying inode.

F_GET_FILE_RW_HINT	Returns the read/write hint set on the
			file descriptor.

F_SET_FILE_RW_HINT	Set one of the above write hints on the
			file descriptor.

The user passes in a 64-bit pointer to get/set these values, and
the interface returns 0/-1 on success/error.

Sample program testing/implementing basic setting/getting of write
hints is below.

Add support for storing the write life time hint in the inode flags
and in struct file as well, and pass them to the kiocb flags. If
both a file and its corresponding inode have a write hint, then we
use the one in the file. The file hint can be used for sync/direct
IO; for buffered writeback, only the inode hint is available.

This is in preparation for utilizing these hints in the block layer,
to guide on-media data placement.
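
The resolution order described above can be modelled in user space
roughly as follows. This is only a sketch: struct model_inode and
struct model_file are simplified stand-ins (the kernel stores the
inode hint as 3 bits in i_flags rather than in a dedicated field),
and the model_* names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum rw_hint {
	WRITE_LIFE_NOT_SET = 0,
	WRITE_LIFE_NONE,
	WRITE_LIFE_SHORT,
	WRITE_LIFE_MEDIUM,
	WRITE_LIFE_LONG,
	WRITE_LIFE_EXTREME,
};

/* Simplified stand-ins for struct inode and struct file. */
struct model_inode {
	enum rw_hint write_hint;
};

struct model_file {
	enum rw_hint f_write_hint;
	struct model_inode *inode;
};

/* An inode without a hint reports NONE, never NOT_SET. */
static enum rw_hint model_inode_write_hint(const struct model_inode *inode)
{
	if (!inode || inode->write_hint == WRITE_LIFE_NOT_SET)
		return WRITE_LIFE_NONE;

	return inode->write_hint;
}

/* A hint set on the file wins, otherwise fall back to the inode. */
static enum rw_hint model_file_write_hint(const struct model_file *file)
{
	assert(file != NULL);

	if (file->f_write_hint != WRITE_LIFE_NOT_SET)
		return file->f_write_hint;

	return model_inode_write_hint(file->inode);
}
```

Buffered writeback has no struct file to consult, so in this model it
would call model_inode_write_hint() directly.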

/*
 * writehint.c: get or set an inode write hint
 */
 #include <stdio.h>
 #include <fcntl.h>
 #include <stdlib.h>
 #include <unistd.h>
 #include <stdbool.h>
 #include <inttypes.h>

 #ifndef F_GET_RW_HINT
 #define F_LINUX_SPECIFIC_BASE	1024
 #define F_GET_RW_HINT		(F_LINUX_SPECIFIC_BASE + 11)
 #define F_SET_RW_HINT		(F_LINUX_SPECIFIC_BASE + 12)
 #endif

static char *str[] = { "RWF_WRITE_LIFE_NOT_SET", "RWH_WRITE_LIFE_NONE",
			"RWH_WRITE_LIFE_SHORT", "RWH_WRITE_LIFE_MEDIUM",
			"RWH_WRITE_LIFE_LONG", "RWH_WRITE_LIFE_EXTREME" };

int main(int argc, char *argv[])
{
	uint64_t hint;
	int fd, ret;

	if (argc < 2) {
		fprintf(stderr, "%s: file <hint>\n", argv[0]);
		return 1;
	}

	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 2;
	}

	if (argc > 2) {
		hint = atoi(argv[2]);
		ret = fcntl(fd, F_SET_RW_HINT, &hint);
		if (ret < 0) {
			perror("fcntl: F_SET_RW_HINT");
			return 4;
		}
	}

	ret = fcntl(fd, F_GET_RW_HINT, &hint);
	if (ret < 0) {
		perror("fcntl: F_GET_RW_HINT");
		return 3;
	}

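	/* Guard the lookup in case the kernel returns an unknown value */
	if (hint < sizeof(str) / sizeof(str[0]))
		printf("%s: hint %s\n", argv[1], str[hint]);
	else
		printf("%s: hint %" PRIu64 " (unknown)\n", argv[1], hint);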
	close(fd);
	return 0;
}

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/fcntl.c                 | 66 +++++++++++++++++++++++++++++++++++++++++
 fs/inode.c                 | 11 +++++++
 fs/open.c                  |  1 +
 include/linux/fs.h         | 74 ++++++++++++++++++++++++++++++++++++++++++++--
 include/uapi/linux/fcntl.h | 21 +++++++++++++
 5 files changed, 171 insertions(+), 2 deletions(-)

diff --git a/fs/fcntl.c b/fs/fcntl.c
index f4e7267d117f..e166807646bf 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -243,6 +243,66 @@ static int f_getowner_uids(struct file *filp, unsigned long arg)
 }
 #endif
 
+static long fcntl_rw_hint(struct file *file, unsigned int cmd,
+			  unsigned long arg)
+{
+	struct inode *inode = file_inode(file);
+	bool on_file = false;
+	enum rw_hint hint;
+	long ret = 0;
+
+	switch (cmd) {
+	case F_GET_FILE_RW_HINT:
+		on_file = true;
+	case F_GET_RW_HINT:
+		/*
+		 * If we ask for the file descriptor hint and it isn't set,
+		 * return the underlying inode write hint. This is what
+		 * writeback does as well.
+		 */
+		hint = RWF_WRITE_LIFE_NOT_SET;
+		if (on_file)
+			hint = file->f_write_hint;
+
+		if (!on_file || hint == RWF_WRITE_LIFE_NOT_SET)
+			hint = mask_to_write_hint(inode->i_flags,
+							S_WRITE_LIFE_SHIFT);
+		if (put_user(hint, (u64 __user *) arg))
+			ret = -EFAULT;
+		break;
+	case F_SET_FILE_RW_HINT:
+		on_file = true;
+	case F_SET_RW_HINT:
+		if (get_user(hint, (u64 __user *) arg)) {
+			ret = -EFAULT;
+			break;
+		}
+		switch (hint) {
+		case RWF_WRITE_LIFE_NOT_SET:
+		case RWH_WRITE_LIFE_NONE:
+		case RWH_WRITE_LIFE_SHORT:
+		case RWH_WRITE_LIFE_MEDIUM:
+		case RWH_WRITE_LIFE_LONG:
+		case RWH_WRITE_LIFE_EXTREME:
+			if (on_file) {
+				spin_lock(&file->f_lock);
+				file->f_write_hint = hint;
+				spin_unlock(&file->f_lock);
+			} else
+				inode_set_write_hint(inode, hint);
+			break;
+		default:
+			ret = -EINVAL;
+		}
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+
+	return ret;
+}
+
 static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
 		struct file *filp)
 {
@@ -337,6 +397,12 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
 	case F_GET_SEALS:
 		err = shmem_fcntl(filp, cmd, arg);
 		break;
+	case F_GET_RW_HINT:
+	case F_SET_RW_HINT:
+	case F_GET_FILE_RW_HINT:
+	case F_SET_FILE_RW_HINT:
+		err = fcntl_rw_hint(filp, cmd, arg);
+		break;
 	default:
 		break;
 	}
diff --git a/fs/inode.c b/fs/inode.c
index db5914783a71..defb015a2c6d 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2120,3 +2120,14 @@ struct timespec current_time(struct inode *inode)
 	return timespec_trunc(now, inode->i_sb->s_time_gran);
 }
 EXPORT_SYMBOL(current_time);
+
+void inode_set_write_hint(struct inode *inode, enum rw_hint hint)
+{
+	unsigned int flags = write_hint_to_mask(hint, S_WRITE_LIFE_SHIFT);
+
+	if (flags != mask_to_write_hint(inode->i_flags, S_WRITE_LIFE_SHIFT)) {
+		inode_lock(inode);
+		inode_set_flags(inode, flags, S_WRITE_LIFE_MASK);
+		inode_unlock(inode);
+	}
+}
diff --git a/fs/open.c b/fs/open.c
index cd0c5be8d012..3fe0c4aa7d27 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -759,6 +759,7 @@ static int do_dentry_open(struct file *f,
 	     likely(f->f_op->write || f->f_op->write_iter))
 		f->f_mode |= FMODE_CAN_WRITE;
 
+	f->f_write_hint = WRITE_LIFE_NOT_SET;
 	f->f_flags &= ~(O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC);
 
 	file_ra_state_init(&f->f_ra, f->f_mapping->host->i_mapping);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 4574121f4746..0ef5d110d2bc 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -274,6 +274,13 @@ struct writeback_control;
 #define IOCB_WRITE		(1 << 6)
 #define IOCB_NOWAIT		(1 << 7)
 
+/*
+ * Steal 3 bits for write hint information, this allows 8 valid hints
+ */
+#define IOCB_WRITE_LIFE_SHIFT	8
+#define IOCB_WRITE_LIFE_MASK	(7 << IOCB_WRITE_LIFE_SHIFT)
+
+
 struct kiocb {
 	struct file		*ki_filp;
 	loff_t			ki_pos;
@@ -297,6 +304,12 @@ static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
 	};
 }
 
+static inline int iocb_write_hint(const struct kiocb *iocb)
+{
+	return (iocb->ki_flags & IOCB_WRITE_LIFE_MASK) >>
+			IOCB_WRITE_LIFE_SHIFT;
+}
+
 /*
  * "descriptor" for what we're up to with a read.
  * This allows us to use the same read code yet
@@ -828,6 +841,20 @@ struct file_ra_state {
 	loff_t prev_pos;		/* Cache last read() position */
 };
 
+#include <linux/fcntl.h>
+
+/*
+ * Write life time hint values.
+ */
+enum rw_hint {
+	WRITE_LIFE_NOT_SET = 0,
+	WRITE_LIFE_NONE = RWH_WRITE_LIFE_NONE,
+	WRITE_LIFE_SHORT = RWH_WRITE_LIFE_SHORT,
+	WRITE_LIFE_MEDIUM = RWH_WRITE_LIFE_MEDIUM,
+	WRITE_LIFE_LONG = RWH_WRITE_LIFE_LONG,
+	WRITE_LIFE_EXTREME = RWH_WRITE_LIFE_EXTREME,
+};
+
 /*
  * Check if @index falls in the readahead windows.
  */
@@ -851,6 +878,7 @@ struct file {
 	 * Must not be taken from IRQ context.
 	 */
 	spinlock_t		f_lock;
+	enum rw_hint		f_write_hint;
 	atomic_long_t		f_count;
 	unsigned int 		f_flags;
 	fmode_t			f_mode;
@@ -1026,8 +1054,6 @@ struct file_lock_context {
 #define OFFT_OFFSET_MAX	INT_LIMIT(off_t)
 #endif
 
-#include <linux/fcntl.h>
-
 extern void send_sigio(struct fown_struct *fown, int fd, int band);
 
 /*
@@ -1833,6 +1859,14 @@ struct super_operations {
 #endif
 
 /*
+ * Expected life time hint of a write for this inode. This uses the
+ * WRITE_LIFE_* encoding, we just need to define the shift. We need
+ * 3 bits for this. Next S_* value is 131072, bit 17.
+ */
+#define S_WRITE_LIFE_SHIFT	14	/* 16384, next bit */
+#define S_WRITE_LIFE_MASK	(7 << S_WRITE_LIFE_SHIFT)
+
+/*
  * Note that nosuid etc flags are inode-specific: setting some file-system
  * flags just means all the inodes inherit those flags by default. It might be
  * possible to override it selectively if you really wanted to with some
@@ -1878,6 +1912,39 @@ static inline bool HAS_UNMAPPED_ID(struct inode *inode)
 	return !uid_valid(inode->i_uid) || !gid_valid(inode->i_gid);
 }
 
+static inline unsigned int write_hint_to_mask(enum rw_hint hint,
+					      unsigned int shift)
+{
+	return hint << shift;
+}
+
+static inline enum rw_hint mask_to_write_hint(unsigned int mask,
+					      unsigned int shift)
+{
+	return (mask >> shift) & 0x7;
+}
+
+static inline enum rw_hint inode_write_hint(struct inode *inode)
+{
+	enum rw_hint ret = WRITE_LIFE_NONE;
+
+	if (inode) {
+		ret = mask_to_write_hint(inode->i_flags, S_WRITE_LIFE_SHIFT);
+		if (ret == WRITE_LIFE_NOT_SET)
+			ret = WRITE_LIFE_NONE;
+	}
+
+	return ret;
+}
+
+static inline enum rw_hint file_write_hint(struct file *file)
+{
+	if (file->f_write_hint != WRITE_LIFE_NOT_SET)
+		return file->f_write_hint;
+
+	return inode_write_hint(file_inode(file));
+}
+
 /*
  * Inode state bits.  Protected by inode->i_lock
  *
@@ -2764,6 +2831,7 @@ extern struct inode *new_inode(struct super_block *sb);
 extern void free_inode_nonrcu(struct inode *inode);
 extern int should_remove_suid(struct dentry *);
 extern int file_remove_privs(struct file *);
+extern void inode_set_write_hint(struct inode *inode, enum rw_hint hint);
 
 extern void __insert_inode_hash(struct inode *, unsigned long hashval);
 static inline void insert_inode_hash(struct inode *inode)
@@ -3060,6 +3128,8 @@ static inline int iocb_flags(struct file *file)
 		res |= IOCB_DSYNC;
 	if (file->f_flags & __O_SYNC)
 		res |= IOCB_SYNC;
+
+	res |= write_hint_to_mask(file->f_write_hint, IOCB_WRITE_LIFE_SHIFT);
 	return res;
 }
 
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index 813afd6eee71..ec69d55bcec7 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -43,6 +43,27 @@
 /* (1U << 31) is reserved for signed error codes */
 
 /*
+ * Set/Get write life time hints. {GET,SET}_RW_HINT operate on the
+ * underlying inode, while {GET,SET}_FILE_RW_HINT operate only on
+ * the specific file.
+ */
+#define F_GET_RW_HINT		(F_LINUX_SPECIFIC_BASE + 11)
+#define F_SET_RW_HINT		(F_LINUX_SPECIFIC_BASE + 12)
+#define F_GET_FILE_RW_HINT	(F_LINUX_SPECIFIC_BASE + 13)
+#define F_SET_FILE_RW_HINT	(F_LINUX_SPECIFIC_BASE + 14)
+
+/*
+ * Valid hint values for F_{GET,SET}_RW_HINT. 0 is "not set", or can be
+ * used to clear any hints previously set.
+ */
+#define RWF_WRITE_LIFE_NOT_SET	0
+#define RWH_WRITE_LIFE_NONE	1
+#define RWH_WRITE_LIFE_SHORT	2
+#define RWH_WRITE_LIFE_MEDIUM	3
+#define RWH_WRITE_LIFE_LONG	4
+#define RWH_WRITE_LIFE_EXTREME	5
+
+/*
  * Types of directory notifications that may be requested.
  */
 #define DN_ACCESS	0x00000001	/* File accessed */
-- 
2.7.4


* [PATCH 2/9] block: add support for write hints in a bio
  2017-06-26 15:37 [PATCHSET v10] Add support for write life time hints Jens Axboe
  2017-06-26 15:37 ` [PATCH 1/9] fs: add fcntl() interface for setting/getting " Jens Axboe
@ 2017-06-26 15:37 ` Jens Axboe
  2017-06-26 15:37 ` [PATCH 3/9] blk-mq: expose write hints through debugfs Jens Axboe
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 42+ messages in thread
From: Jens Axboe @ 2017-06-26 15:37 UTC (permalink / raw)
  To: linux-block; +Cc: linux-fsdevel, hch, martin.petersen, Jens Axboe

No functional changes in this patch, we just set aside 3 bits
in the bio/request flags, which can be used to hold a WRITE_LIFE_*
life time hint.

Ensure that we don't merge requests that have different life time
hints assigned to them.
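
As a rough user-space illustration of the two points above (packing a
hint into 3 flag bits, and refusing to merge when those bits differ),
consider the sketch below. The shift value of 20 and the MODEL_* and
model_* names are assumptions made for the example; in the kernel the
bit position falls out of enum req_flag_bits.

```c
#include <assert.h>
#include <stdbool.h>

enum rw_hint {
	WRITE_LIFE_NOT_SET = 0,
	WRITE_LIFE_NONE,
	WRITE_LIFE_SHORT,
	WRITE_LIFE_MEDIUM,
	WRITE_LIFE_LONG,
	WRITE_LIFE_EXTREME,
};

/* Assumed bit position for this example only. */
#define MODEL_WRITE_HINT_SHIFT	20
#define MODEL_WRITE_LIFE_MASK	(0x7u << MODEL_WRITE_HINT_SHIFT)

/* Pack a hint into the 3 bits reserved in the flags word. */
static unsigned int model_hint_to_opf(enum rw_hint hint)
{
	assert((unsigned int)hint <= 0x7u);
	return (unsigned int)hint << MODEL_WRITE_HINT_SHIFT;
}

/* Unpack the hint. NOT_SET is reported as NONE. */
static enum rw_hint model_opf_to_hint(unsigned int opf)
{
	enum rw_hint ret;

	ret = (enum rw_hint)((opf & MODEL_WRITE_LIFE_MASK) >>
				MODEL_WRITE_HINT_SHIFT);
	if (ret == WRITE_LIFE_NOT_SET)
		ret = WRITE_LIFE_NONE;

	return ret;
}

/* Two IOs may only be merged if their hint bits are identical. */
static bool model_hints_mergeable(unsigned int a_flags, unsigned int b_flags)
{
	return (a_flags & MODEL_WRITE_LIFE_MASK) ==
	       (b_flags & MODEL_WRITE_LIFE_MASK);
}
```

Note that the comparison is on the raw bits, so an IO carrying a hint
is also kept apart from an IO carrying no hint at all.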

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-merge.c         | 16 ++++++++++++++++
 fs/inode.c                |  9 +++++++++
 include/linux/blk_types.h | 31 +++++++++++++++++++++++++++++++
 include/linux/fs.h        |  2 ++
 4 files changed, 58 insertions(+)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 5df13041b851..be1e955db75e 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -673,6 +673,14 @@ static struct request *attempt_merge(struct request_queue *q,
 		return NULL;
 
 	/*
+	 * Don't allow merge of different write hints, or for a hint with
+	 * non-hint IO.
+	 */
+	if ((req->cmd_flags & REQ_WRITE_LIFE_MASK) !=
+	    (next->cmd_flags & REQ_WRITE_LIFE_MASK))
+		return NULL;
+
+	/*
 	 * If we are allowed to merge, then append bio list
 	 * from next to rq and release next. merge_requests_fn
 	 * will have updated segment counts, update sector
@@ -791,6 +799,14 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
 	    !blk_write_same_mergeable(rq->bio, bio))
 		return false;
 
+	/*
+	 * Don't allow merge of different write hints, or for a hint with
+	 * non-hint IO.
+	 */
+	if ((rq->cmd_flags & REQ_WRITE_LIFE_MASK) !=
+	    (bio->bi_opf & REQ_WRITE_LIFE_MASK))
+		return false;
+
 	return true;
 }
 
diff --git a/fs/inode.c b/fs/inode.c
index defb015a2c6d..66cc431c9a96 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2131,3 +2131,12 @@ void inode_set_write_hint(struct inode *inode, enum rw_hint hint)
 		inode_unlock(inode);
 	}
 }
+
+/*
+ * Returns block write hint mask for the inode
+ */
+unsigned int inode_hint_to_opf(struct inode *inode)
+{
+	return write_hint_to_opf(inode_write_hint(inode));
+}
+EXPORT_SYMBOL(inode_hint_to_opf);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index e210da6d14b8..0d44dce19d9f 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -7,6 +7,7 @@
 
 #include <linux/types.h>
 #include <linux/bvec.h>
+#include <linux/fs.h>
 
 struct bio_set;
 struct bio;
@@ -223,6 +224,10 @@ enum req_flag_bits {
 	__REQ_RAHEAD,		/* read ahead, can fail anytime */
 	__REQ_BACKGROUND,	/* background IO */
 
+	__REQ_WRITE_HINT_SHIFT,	/* 3 bits for life time hint */
+	__REQ_WRITE_HINT_PAD1,
+	__REQ_WRITE_HINT_PAD2,
+
 	/* command specific flags for REQ_OP_WRITE_ZEROES: */
 	__REQ_NOUNMAP,		/* do not free blocks when zeroing */
 
@@ -244,6 +249,13 @@ enum req_flag_bits {
 #define REQ_RAHEAD		(1ULL << __REQ_RAHEAD)
 #define REQ_BACKGROUND		(1ULL << __REQ_BACKGROUND)
 
+#define REQ_WRITE_SHORT		(WRITE_LIFE_SHORT << __REQ_WRITE_HINT_SHIFT)
+#define REQ_WRITE_MEDIUM	(WRITE_LIFE_MEDIUM << __REQ_WRITE_HINT_SHIFT)
+#define REQ_WRITE_LONG		(WRITE_LIFE_LONG << __REQ_WRITE_HINT_SHIFT)
+#define REQ_WRITE_EXTREME	(WRITE_LIFE_EXTREME << __REQ_WRITE_HINT_SHIFT)
+
+#define REQ_WRITE_LIFE_MASK	(0x7 << __REQ_WRITE_HINT_SHIFT)
+
 #define REQ_NOUNMAP		(1ULL << __REQ_NOUNMAP)
 #define REQ_NOWAIT		(1ULL << __REQ_NOWAIT)
 
@@ -335,4 +347,23 @@ struct blk_rq_stat {
 	u64 batch;
 };
 
+static inline unsigned int write_hint_to_opf(enum rw_hint hint)
+{
+	return hint << __REQ_WRITE_HINT_SHIFT;
+}
+
+/*
+ * Don't let drivers see WRITE_LIFE_NOT_SET, return NONE for that
+ */
+static inline enum rw_hint opf_to_write_hint(unsigned int opf)
+{
+	enum rw_hint ret;
+
+	ret = (opf & REQ_WRITE_LIFE_MASK) >> __REQ_WRITE_HINT_SHIFT;
+	if (ret == WRITE_LIFE_NOT_SET)
+		ret = WRITE_LIFE_NONE;
+
+	return ret;
+}
+
 #endif /* __LINUX_BLK_TYPES_H */
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0ef5d110d2bc..86888a6ccad1 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1912,6 +1912,8 @@ static inline bool HAS_UNMAPPED_ID(struct inode *inode)
 	return !uid_valid(inode->i_uid) || !gid_valid(inode->i_gid);
 }
 
+extern unsigned int inode_hint_to_opf(struct inode *inode);
+
 static inline unsigned int write_hint_to_mask(enum rw_hint hint,
 					      unsigned int shift)
 {
-- 
2.7.4


* [PATCH 3/9] blk-mq: expose write hints through debugfs
  2017-06-26 15:37 [PATCHSET v10] Add support for write life time hints Jens Axboe
  2017-06-26 15:37 ` [PATCH 1/9] fs: add fcntl() interface for setting/getting " Jens Axboe
  2017-06-26 15:37 ` [PATCH 2/9] block: add support for write hints in a bio Jens Axboe
@ 2017-06-26 15:37 ` Jens Axboe
  2017-06-27 15:17   ` Christoph Hellwig
  2017-06-26 15:37 ` [PATCH 4/9] fs: add O_DIRECT support for sending down write life time hints Jens Axboe
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 42+ messages in thread
From: Jens Axboe @ 2017-06-26 15:37 UTC (permalink / raw)
  To: linux-block; +Cc: linux-fsdevel, hch, martin.petersen, Jens Axboe

Useful to verify that things are working the way they should.
Reading the file returns the number of KB written with each
write hint. Writing to the file resets the statistics. No care
is taken to ensure that we don't race on updates.

Drivers will write to q->write_hints[] if they handle a given
write hint.
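
For illustration, the accounting a driver might do can be sketched in
user space as below. The driver side is not part of this patch, so
the index mapping (hint minus one, with "not set" folded into the
first slot) and the KB unit are assumptions, as are the model_* names.
Only the array size of 5 is taken from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_WRITE_HINTS	5

/* Stand-in for the counters added to struct request_queue. */
struct model_queue {
	uint64_t write_hints[MODEL_MAX_WRITE_HINTS];
};

/*
 * Account a completed write. Assumed mapping: hint values 1..5 land in
 * slots 0..4, and 0 ("not set") shares slot 0. Out-of-range hints are
 * dropped so a bad value can never index past the array.
 */
static void model_account_write(struct model_queue *q, unsigned int hint,
				uint64_t bytes)
{
	unsigned int idx = hint ? hint - 1 : 0;

	assert(q != NULL);
	if (idx >= MODEL_MAX_WRITE_HINTS)
		return;

	q->write_hints[idx] += bytes >> 10;	/* bytes to KB */
}

/* What a write to the debugfs file does: clear every counter. */
static void model_reset_write_hints(struct model_queue *q)
{
	int i;

	assert(q != NULL);
	for (i = 0; i < MODEL_MAX_WRITE_HINTS; i++)
		q->write_hints[i] = 0;
}
```

As in the kernel version, nothing here serializes updates against a
concurrent reset. That is acceptable for a debugging aid.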

Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-mq-debugfs.c | 24 ++++++++++++++++++++++++
 include/linux/blkdev.h |  3 +++
 2 files changed, 27 insertions(+)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 9edebbdce0bd..9ebc2945f991 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -135,6 +135,29 @@ static void print_stat(struct seq_file *m, struct blk_rq_stat *stat)
 	}
 }
 
+static int queue_write_hint_show(void *data, struct seq_file *m)
+{
+	struct request_queue *q = data;
+	int i;
+
+	for (i = 0; i < BLK_MAX_WRITE_HINTS; i++)
+		seq_printf(m, "hint%d: %llu\n", i, q->write_hints[i]);
+
+	return 0;
+}
+
+static ssize_t queue_write_hint_store(void *data, const char __user *buf,
+				      size_t count, loff_t *ppos)
+{
+	struct request_queue *q = data;
+	int i;
+
+	for (i = 0; i < BLK_MAX_WRITE_HINTS; i++)
+		q->write_hints[i] = 0;
+
+	return count;
+}
+
 static int queue_poll_stat_show(void *data, struct seq_file *m)
 {
 	struct request_queue *q = data;
@@ -730,6 +753,7 @@ static const struct blk_mq_debugfs_attr blk_mq_debugfs_queue_attrs[] = {
 	{"poll_stat", 0400, queue_poll_stat_show},
 	{"requeue_list", 0400, .seq_ops = &queue_requeue_list_seq_ops},
 	{"state", 0600, queue_state_show, queue_state_write},
+	{"write_hints", 0600, queue_write_hint_show, queue_write_hint_store},
 	{},
 };
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index bf2157141d53..596de77b9a0a 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -594,6 +594,9 @@ struct request_queue {
 	void			*rq_alloc_data;
 
 	struct work_struct	release_work;
+
+#define BLK_MAX_WRITE_HINTS	5
+	u64			write_hints[BLK_MAX_WRITE_HINTS];
 };
 
 #define QUEUE_FLAG_QUEUED	1	/* uses generic tag queueing */
-- 
2.7.4


* [PATCH 4/9] fs: add O_DIRECT support for sending down write life time hints
  2017-06-26 15:37 [PATCHSET v10] Add support for write life time hints Jens Axboe
                   ` (2 preceding siblings ...)
  2017-06-26 15:37 ` [PATCH 3/9] blk-mq: expose write hints through debugfs Jens Axboe
@ 2017-06-26 15:37 ` Jens Axboe
  2017-06-27 14:53   ` Christoph Hellwig
  2017-06-26 15:37 ` [PATCH 5/9] fs: add support for buffered writeback to pass down write hints Jens Axboe
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 42+ messages in thread
From: Jens Axboe @ 2017-06-26 15:37 UTC (permalink / raw)
  To: linux-block; +Cc: linux-fsdevel, hch, martin.petersen, Jens Axboe

Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/block_dev.c | 2 ++
 fs/direct-io.c | 2 ++
 fs/iomap.c     | 5 ++++-
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index dd91c99e9ba0..30e1fb65c2fa 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -183,6 +183,8 @@ static unsigned int dio_bio_write_op(struct kiocb *iocb)
 	/* avoid the need for a I/O completion work item */
 	if (iocb->ki_flags & IOCB_DSYNC)
 		op |= REQ_FUA;
+
+	op |= write_hint_to_opf(iocb_write_hint(iocb));
 	return op;
 }
 
diff --git a/fs/direct-io.c b/fs/direct-io.c
index c87077d1dc33..5fea570551e5 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -385,6 +385,8 @@ dio_bio_alloc(struct dio *dio, struct dio_submit *sdio,
 	else
 		bio->bi_end_io = dio_bio_end_io;
 
+	bio->bi_opf |= write_hint_to_opf(iocb_write_hint(dio->iocb));
+
 	sdio->bio = bio;
 	sdio->logical_offset_in_bio = sdio->cur_page_fs_offset;
 }
diff --git a/fs/iomap.c b/fs/iomap.c
index c71a64b97fba..42d4ecf3ba54 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -803,7 +803,10 @@ iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length,
 		}
 
 		if (dio->flags & IOMAP_DIO_WRITE) {
-			bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_SYNC | REQ_IDLE);
+			bio_set_op_attrs(bio, REQ_OP_WRITE,
+						REQ_SYNC | REQ_IDLE);
+			bio->bi_opf |=
+				write_hint_to_opf(iocb_write_hint(dio->iocb));
 			task_io_account_write(bio->bi_iter.bi_size);
 		} else {
 			bio_set_op_attrs(bio, REQ_OP_READ, 0);
-- 
2.7.4


* [PATCH 5/9] fs: add support for buffered writeback to pass down write hints
  2017-06-26 15:37 [PATCHSET v10] Add support for write life time hints Jens Axboe
                   ` (3 preceding siblings ...)
  2017-06-26 15:37 ` [PATCH 4/9] fs: add O_DIRECT support for sending down write life time hints Jens Axboe
@ 2017-06-26 15:37 ` Jens Axboe
  2017-06-26 15:37 ` [PATCH 6/9] ext4: add support for passing in write hints for buffered writes Jens Axboe
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 42+ messages in thread
From: Jens Axboe @ 2017-06-26 15:37 UTC (permalink / raw)
  To: linux-block; +Cc: linux-fsdevel, hch, martin.petersen, Jens Axboe

Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/buffer.c | 14 +++++++++-----
 fs/mpage.c  |  1 +
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 306b720f7383..307b508c9d60 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -49,7 +49,7 @@
 
 static int fsync_buffers_list(spinlock_t *lock, struct list_head *list);
 static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,
-			 struct writeback_control *wbc);
+			 unsigned int hint, struct writeback_control *wbc);
 
 #define BH_ENTRY(list) list_entry((list), struct buffer_head, b_assoc_buffers)
 
@@ -1829,7 +1829,8 @@ int __block_write_full_page(struct inode *inode, struct page *page,
 	do {
 		struct buffer_head *next = bh->b_this_page;
 		if (buffer_async_write(bh)) {
-			submit_bh_wbc(REQ_OP_WRITE, write_flags, bh, wbc);
+			submit_bh_wbc(REQ_OP_WRITE, write_flags, bh,
+					inode_write_hint(inode), wbc);
 			nr_underway++;
 		}
 		bh = next;
@@ -1883,7 +1884,8 @@ int __block_write_full_page(struct inode *inode, struct page *page,
 		struct buffer_head *next = bh->b_this_page;
 		if (buffer_async_write(bh)) {
 			clear_buffer_dirty(bh);
-			submit_bh_wbc(REQ_OP_WRITE, write_flags, bh, wbc);
+			submit_bh_wbc(REQ_OP_WRITE, write_flags, bh,
+					inode_write_hint(inode), wbc);
 			nr_underway++;
 		}
 		bh = next;
@@ -3091,7 +3093,7 @@ void guard_bio_eod(int op, struct bio *bio)
 }
 
 static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,
-			 struct writeback_control *wbc)
+			 unsigned int write_hint, struct writeback_control *wbc)
 {
 	struct bio *bio;
 
@@ -3134,6 +3136,8 @@ static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,
 		op_flags |= REQ_META;
 	if (buffer_prio(bh))
 		op_flags |= REQ_PRIO;
+
+	op_flags |= write_hint_to_opf(write_hint);
 	bio_set_op_attrs(bio, op, op_flags);
 
 	submit_bio(bio);
@@ -3142,7 +3146,7 @@ static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,
 
 int submit_bh(int op, int op_flags, struct buffer_head *bh)
 {
-	return submit_bh_wbc(op, op_flags, bh, NULL);
+	return submit_bh_wbc(op, op_flags, bh, 0, NULL);
 }
 EXPORT_SYMBOL(submit_bh);
 
diff --git a/fs/mpage.c b/fs/mpage.c
index 9524fdde00c2..07587fd6debf 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -615,6 +615,7 @@ static int __mpage_writepage(struct page *page, struct writeback_control *wbc,
 			goto confused;
 
 		wbc_init_bio(wbc, bio);
+		bio->bi_opf |= inode_hint_to_opf(inode);
 	}
 
 	/*
-- 
2.7.4


* [PATCH 6/9] ext4: add support for passing in write hints for buffered writes
  2017-06-26 15:37 [PATCHSET v10] Add support for write life time hints Jens Axboe
                   ` (4 preceding siblings ...)
  2017-06-26 15:37 ` [PATCH 5/9] fs: add support for buffered writeback to pass down write hints Jens Axboe
@ 2017-06-26 15:37 ` Jens Axboe
  2017-06-26 15:37 ` [PATCH 7/9] xfs: " Jens Axboe
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 42+ messages in thread
From: Jens Axboe @ 2017-06-26 15:37 UTC (permalink / raw)
  To: linux-block; +Cc: linux-fsdevel, hch, martin.petersen, Jens Axboe

Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/ext4/page-io.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 930ca0fc9a0f..02e5a7b8d60b 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -350,6 +350,7 @@ void ext4_io_submit(struct ext4_io_submit *io)
 	if (bio) {
 		int io_op_flags = io->io_wbc->sync_mode == WB_SYNC_ALL ?
 				  REQ_SYNC : 0;
+		io_op_flags |= inode_hint_to_opf(io->io_end->inode);
 		bio_set_op_attrs(io->io_bio, REQ_OP_WRITE, io_op_flags);
 		submit_bio(io->io_bio);
 	}
@@ -397,6 +398,7 @@ static int io_submit_add_bh(struct ext4_io_submit *io,
 		ret = io_submit_init_bio(io, bh);
 		if (ret)
 			return ret;
+		io->io_bio->bi_opf |= inode_hint_to_opf(inode);
 	}
 	ret = bio_add_page(io->io_bio, page, bh->b_size, bh_offset(bh));
 	if (ret != bh->b_size)
-- 
2.7.4


* [PATCH 7/9] xfs: add support for passing in write hints for buffered writes
  2017-06-26 15:37 [PATCHSET v10] Add support for write life time hints Jens Axboe
                   ` (5 preceding siblings ...)
  2017-06-26 15:37 ` [PATCH 6/9] ext4: add support for passing in write hints for buffered writes Jens Axboe
@ 2017-06-26 15:37 ` Jens Axboe
  2017-06-26 15:37 ` [PATCH 8/9] btrfs: " Jens Axboe
  2017-06-26 15:38 ` [PATCH 9/9] nvme: add support for streams and directives Jens Axboe
  8 siblings, 0 replies; 42+ messages in thread
From: Jens Axboe @ 2017-06-26 15:37 UTC (permalink / raw)
  To: linux-block; +Cc: linux-fsdevel, hch, martin.petersen, Jens Axboe

Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/xfs/xfs_aops.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 76b6f988e2fa..ceb124bd8f80 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -506,6 +506,7 @@ xfs_submit_ioend(
 		return status;
 	}
 
+	ioend->io_bio->bi_opf |= inode_hint_to_opf(ioend->io_inode);
 	submit_bio(ioend->io_bio);
 	return 0;
 }
@@ -565,6 +566,7 @@ xfs_chain_bio(
 	bio_chain(ioend->io_bio, new);
 	bio_get(ioend->io_bio);		/* for xfs_destroy_ioend */
 	ioend->io_bio->bi_opf = REQ_OP_WRITE | wbc_to_write_flags(wbc);
+	ioend->io_bio->bi_opf |= inode_hint_to_opf(ioend->io_inode);
 	submit_bio(ioend->io_bio);
 	ioend->io_bio = new;
 }
-- 
2.7.4


* [PATCH 8/9] btrfs: add support for passing in write hints for buffered writes
  2017-06-26 15:37 [PATCHSET v10] Add support for write life time hints Jens Axboe
                   ` (6 preceding siblings ...)
  2017-06-26 15:37 ` [PATCH 7/9] xfs: " Jens Axboe
@ 2017-06-26 15:37 ` Jens Axboe
  2017-06-26 15:38 ` [PATCH 9/9] nvme: add support for streams and directives Jens Axboe
  8 siblings, 0 replies; 42+ messages in thread
From: Jens Axboe @ 2017-06-26 15:37 UTC (permalink / raw)
  To: linux-block; +Cc: linux-fsdevel, hch, martin.petersen, Jens Axboe

Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Chris Mason <clm@fb.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/btrfs/extent_io.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 19eedf2e630b..fde09c6005fc 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2830,6 +2830,7 @@ static int submit_extent_page(int op, int op_flags, struct extent_io_tree *tree,
 	bio_add_page(bio, page, page_size, offset);
 	bio->bi_end_io = end_io_func;
 	bio->bi_private = tree;
+	op_flags |= inode_hint_to_opf(page->mapping->host);
 	bio_set_op_attrs(bio, op, op_flags);
 	if (wbc) {
 		wbc_init_bio(wbc, bio);
-- 
2.7.4


* [PATCH 9/9] nvme: add support for streams and directives
  2017-06-26 15:37 [PATCHSET v10] Add support for write life time hints Jens Axboe
                   ` (7 preceding siblings ...)
  2017-06-26 15:37 ` [PATCH 8/9] btrfs: " Jens Axboe
@ 2017-06-26 15:38 ` Jens Axboe
  8 siblings, 0 replies; 42+ messages in thread
From: Jens Axboe @ 2017-06-26 15:38 UTC (permalink / raw)
  To: linux-block; +Cc: linux-fsdevel, hch, martin.petersen, Jens Axboe

This adds support for Directives in NVMe, in particular for the Streams
directive. Support for Directives is a new feature in NVMe 1.3. It
allows a user to pass in information about where to store the data, so
that the device can do so most efficiently. If an application is
managing and writing data with different life times, mixing data with
different life times onto the same locations on flash can cause write
amplification to grow. This, in turn, will reduce performance and life
time of the device.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 drivers/nvme/host/core.c | 148 +++++++++++++++++++++++++++++++++++++++++++++--
 drivers/nvme/host/nvme.h |   4 ++
 include/linux/nvme.h     |  48 +++++++++++++++
 3 files changed, 196 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index aee37b73231d..2d9835617953 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -65,6 +65,10 @@ static bool force_apst;
 module_param(force_apst, bool, 0644);
 MODULE_PARM_DESC(force_apst, "allow APST for newly enumerated devices even if quirked off");
 
+static bool streams;
+module_param(streams, bool, 0644);
+MODULE_PARM_DESC(streams, "turn on support for Streams write directives");
+
 struct workqueue_struct *nvme_wq;
 EXPORT_SYMBOL_GPL(nvme_wq);
 
@@ -297,6 +301,102 @@ struct request *nvme_alloc_request(struct request_queue *q,
 }
 EXPORT_SYMBOL_GPL(nvme_alloc_request);
 
+static int nvme_enable_streams(struct nvme_ctrl *ctrl)
+{
+	struct nvme_command c;
+
+	memset(&c, 0, sizeof(c));
+
+	c.directive.opcode = nvme_admin_directive_send;
+	c.directive.nsid = cpu_to_le32(0xffffffff);
+	c.directive.doper = NVME_DIR_SND_ID_OP_ENABLE;
+	c.directive.dtype = NVME_DIR_IDENTIFY;
+	c.directive.tdtype = NVME_DIR_STREAMS;
+	c.directive.endir = NVME_DIR_ENDIR;
+
+	return nvme_submit_sync_cmd(ctrl->admin_q, &c, NULL, 0);
+}
+
+static int nvme_get_stream_params(struct nvme_ctrl *ctrl,
+				  struct streams_directive_params *s, u32 nsid)
+{
+	struct nvme_command c;
+
+	memset(&c, 0, sizeof(c));
+	memset(s, 0, sizeof(*s));
+
+	c.directive.opcode = nvme_admin_directive_recv;
+	c.directive.nsid = cpu_to_le32(nsid);
+	c.directive.numd = sizeof(*s);
+	c.directive.doper = NVME_DIR_RCV_ST_OP_PARAM;
+	c.directive.dtype = NVME_DIR_STREAMS;
+
+	return nvme_submit_sync_cmd(ctrl->admin_q, &c, s, sizeof(*s));
+}
+
+static int nvme_configure_directives(struct nvme_ctrl *ctrl)
+{
+	struct streams_directive_params s;
+	int ret;
+
+	if (!(ctrl->oacs & NVME_CTRL_OACS_DIRECTIVES))
+		return 0;
+	if (!streams)
+		return 0;
+
+	ret = nvme_enable_streams(ctrl);
+	if (ret)
+		return ret;
+
+	ret = nvme_get_stream_params(ctrl, &s, 0xffffffff);
+	if (ret)
+		return ret;
+
+	ctrl->nssa = le16_to_cpu(s.nssa);
+	ctrl->nr_streams = min_t(unsigned, ctrl->nssa, BLK_MAX_WRITE_HINTS - 1);
+	return 0;
+}
+
+/*
+ * Write hint number to stream mappings
+ */
+static const unsigned int stream_mappings[BLK_MAX_WRITE_HINTS][BLK_MAX_WRITE_HINTS] = {
+	/* 0 or 1 stream, we don't use streams */
+	{ 0, },
+	{ 0, },
+	/* collapse short+medium to short, and long+extreme to medium */
+	{ WRITE_LIFE_NONE, WRITE_LIFE_SHORT, WRITE_LIFE_SHORT,
+		WRITE_LIFE_MEDIUM, WRITE_LIFE_MEDIUM },
+	/* collapse long+extreme to long */
+	{ WRITE_LIFE_NONE, WRITE_LIFE_SHORT, WRITE_LIFE_MEDIUM,
+		WRITE_LIFE_LONG, WRITE_LIFE_LONG },
+	/* 4 streams, no collapsing needed */
+	{ WRITE_LIFE_NONE, WRITE_LIFE_SHORT, WRITE_LIFE_MEDIUM,
+		WRITE_LIFE_LONG, WRITE_LIFE_EXTREME },
+};
+
+/*
+ * Check if 'req' has a write hint associated with it. If it does, assign
+ * a valid namespace stream to the write. If we haven't setup streams yet,
+ * kick off configuration and ignore the hints until that has completed.
+ */
+static void nvme_assign_write_stream(struct nvme_ctrl *ctrl,
+				     struct request *req, u16 *control,
+				     u32 *dsmgmt)
+{
+	enum rw_hint streamid;
+
+	streamid = opf_to_write_hint(req->cmd_flags);
+	if (streamid != WRITE_LIFE_NONE) {
+		streamid = stream_mappings[ctrl->nr_streams][streamid - 1];
+		*control |= NVME_RW_DTYPE_STREAMS;
+		*dsmgmt |= streamid << 16;
+	}
+
+	if (streamid < ARRAY_SIZE(req->q->write_hints))
+		req->q->write_hints[streamid] += blk_rq_bytes(req) >> 9;
+}
+
 static inline void nvme_setup_flush(struct nvme_ns *ns,
 		struct nvme_command *cmnd)
 {
@@ -348,6 +448,7 @@ static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
 static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
 		struct request *req, struct nvme_command *cmnd)
 {
+	struct nvme_ctrl *ctrl = ns->ctrl;
 	u16 control = 0;
 	u32 dsmgmt = 0;
 
@@ -375,6 +476,9 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
 	cmnd->rw.slba = cpu_to_le64(nvme_block_nr(ns, blk_rq_pos(req)));
 	cmnd->rw.length = cpu_to_le16((blk_rq_bytes(req) >> ns->lba_shift) - 1);
 
+	if (req_op(req) == REQ_OP_WRITE && ctrl->nr_streams)
+		nvme_assign_write_stream(ctrl, req, &control, &dsmgmt);
+
 	if (ns->ms) {
 		switch (ns->pi_type) {
 		case NVME_NS_DPS_PI_TYPE3:
@@ -1094,8 +1198,15 @@ static void nvme_config_discard(struct nvme_ns *ns)
 	BUILD_BUG_ON(PAGE_SIZE / sizeof(struct nvme_dsm_range) <
 			NVME_DSM_MAX_RANGES);
 
-	ns->queue->limits.discard_alignment = logical_block_size;
-	ns->queue->limits.discard_granularity = logical_block_size;
+	if (ctrl->nr_streams && ns->sws && ns->sgs) {
+		unsigned int sz = logical_block_size * ns->sws * ns->sgs;
+
+		ns->queue->limits.discard_alignment = sz;
+		ns->queue->limits.discard_granularity = sz;
+	} else {
+		ns->queue->limits.discard_alignment = logical_block_size;
+		ns->queue->limits.discard_granularity = logical_block_size;
+	}
 	blk_queue_max_discard_sectors(ns->queue, UINT_MAX);
 	blk_queue_max_discard_segments(ns->queue, NVME_DSM_MAX_RANGES);
 	queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, ns->queue);
@@ -1135,6 +1246,7 @@ static int nvme_revalidate_ns(struct nvme_ns *ns, struct nvme_id_ns **id)
 static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 {
 	struct nvme_ns *ns = disk->private_data;
+	struct nvme_ctrl *ctrl = ns->ctrl;
 	u16 bs;
 
 	/*
@@ -1149,7 +1261,7 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 
 	blk_mq_freeze_queue(disk->queue);
 
-	if (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED)
+	if (ctrl->ops->flags & NVME_F_METADATA_SUPPORTED)
 		nvme_prep_integrity(disk, id, bs);
 	blk_queue_logical_block_size(ns->queue, bs);
 	if (ns->noiob)
@@ -1161,7 +1273,7 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 	else
 		set_capacity(disk, le64_to_cpup(&id->nsze) << (ns->lba_shift - 9));
 
-	if (ns->ctrl->oncs & NVME_CTRL_ONCS_DSM)
+	if (ctrl->oncs & NVME_CTRL_ONCS_DSM)
 		nvme_config_discard(ns);
 	blk_mq_unfreeze_queue(disk->queue);
 }
@@ -1766,6 +1878,7 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
 		dev_pm_qos_hide_latency_tolerance(ctrl->device);
 
 	nvme_configure_apst(ctrl);
+	nvme_configure_directives(ctrl);
 
 	ctrl->identified = true;
 
@@ -2158,6 +2271,32 @@ static struct nvme_ns *nvme_find_get_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	return ret;
 }
 
+static int nvme_setup_streams_ns(struct nvme_ctrl *ctrl, struct nvme_ns *ns)
+{
+	struct streams_directive_params s;
+	int ret;
+
+	if (!ctrl->nr_streams)
+		return 0;
+
+	ret = nvme_get_stream_params(ctrl, &s, ns->ns_id);
+	if (ret)
+		return ret;
+
+	ns->sws = le32_to_cpu(s.sws);
+	ns->sgs = le16_to_cpu(s.sgs);
+
+	if (ns->sws) {
+		unsigned int bs = 1 << ns->lba_shift;
+
+		blk_queue_io_min(ns->queue, bs * ns->sws);
+		if (ns->sgs)
+			blk_queue_io_opt(ns->queue, bs * ns->sws * ns->sgs);
+	}
+
+	return 0;
+}
+
 static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 {
 	struct nvme_ns *ns;
@@ -2187,6 +2326,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 
 	blk_queue_logical_block_size(ns->queue, 1 << ns->lba_shift);
 	nvme_set_queue_limits(ctrl, ns->queue);
+	nvme_setup_streams_ns(ctrl, ns);
 
 	sprintf(disk_name, "nvme%dn%d", ctrl->instance, ns->instance);
 
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index ec8c7363934d..f616835afc4c 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -147,6 +147,8 @@ struct nvme_ctrl {
 	u16 oncs;
 	u16 vid;
 	u16 oacs;
+	u16 nssa;
+	u16 nr_streams;
 	atomic_t abort_limit;
 	u8 event_limit;
 	u8 vwc;
@@ -199,6 +201,8 @@ struct nvme_ns {
 	unsigned ns_id;
 	int lba_shift;
 	u16 ms;
+	u16 sgs;
+	u32 sws;
 	bool ext;
 	u8 pi_type;
 	unsigned long flags;
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 291587a0743f..f516a975bb21 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -253,6 +253,7 @@ enum {
 	NVME_CTRL_ONCS_WRITE_ZEROES		= 1 << 3,
 	NVME_CTRL_VWC_PRESENT			= 1 << 0,
 	NVME_CTRL_OACS_SEC_SUPP                 = 1 << 0,
+	NVME_CTRL_OACS_DIRECTIVES		= 1 << 5,
 	NVME_CTRL_OACS_DBBUF_SUPP		= 1 << 7,
 };
 
@@ -304,6 +305,19 @@ enum {
 };
 
 enum {
+	NVME_DIR_IDENTIFY		= 0x00,
+	NVME_DIR_STREAMS		= 0x01,
+	NVME_DIR_SND_ID_OP_ENABLE	= 0x01,
+	NVME_DIR_SND_ST_OP_REL_ID	= 0x01,
+	NVME_DIR_SND_ST_OP_REL_RSC	= 0x02,
+	NVME_DIR_RCV_ID_OP_PARAM	= 0x01,
+	NVME_DIR_RCV_ST_OP_PARAM	= 0x01,
+	NVME_DIR_RCV_ST_OP_STATUS	= 0x02,
+	NVME_DIR_RCV_ST_OP_RESOURCE	= 0x03,
+	NVME_DIR_ENDIR			= 0x01,
+};
+
+enum {
 	NVME_NS_FEAT_THIN	= 1 << 0,
 	NVME_NS_FLBAS_LBA_MASK	= 0xf,
 	NVME_NS_FLBAS_META_EXT	= 0x10,
@@ -560,6 +574,7 @@ enum {
 	NVME_RW_PRINFO_PRCHK_APP	= 1 << 11,
 	NVME_RW_PRINFO_PRCHK_GUARD	= 1 << 12,
 	NVME_RW_PRINFO_PRACT		= 1 << 13,
+	NVME_RW_DTYPE_STREAMS		= 1 << 4,
 };
 
 struct nvme_dsm_cmd {
@@ -634,6 +649,8 @@ enum nvme_admin_opcode {
 	nvme_admin_download_fw		= 0x11,
 	nvme_admin_ns_attach		= 0x15,
 	nvme_admin_keep_alive		= 0x18,
+	nvme_admin_directive_send	= 0x19,
+	nvme_admin_directive_recv	= 0x1a,
 	nvme_admin_dbbuf		= 0x7C,
 	nvme_admin_format_nvm		= 0x80,
 	nvme_admin_security_send	= 0x81,
@@ -797,6 +814,24 @@ struct nvme_get_log_page_command {
 	__u32			rsvd14[2];
 };
 
+struct nvme_directive_cmd {
+	__u8			opcode;
+	__u8			flags;
+	__u16			command_id;
+	__le32			nsid;
+	__u64			rsvd2[2];
+	union nvme_data_ptr	dptr;
+	__le32			numd;
+	__u8			doper;
+	__u8			dtype;
+	__le16			dspec;
+	__u8			endir;
+	__u8			tdtype;
+	__u16			rsvd15;
+
+	__u32			rsvd16[3];
+};
+
 /*
  * Fabrics subcommands.
  */
@@ -927,6 +962,18 @@ struct nvme_dbbuf {
 	__u32			rsvd12[6];
 };
 
+struct streams_directive_params {
+	__u16	msl;
+	__u16	nssa;
+	__u16	nsso;
+	__u8	rsvd[10];
+	__u32	sws;
+	__u16	sgs;
+	__u16	nsa;
+	__u16	nso;
+	__u8	rsvd2[6];
+};
+
 struct nvme_command {
 	union {
 		struct nvme_common_command common;
@@ -947,6 +994,7 @@ struct nvme_command {
 		struct nvmf_property_set_command prop_set;
 		struct nvmf_property_get_command prop_get;
 		struct nvme_dbbuf dbbuf;
+		struct nvme_directive_cmd directive;
 	};
 };
 
-- 
2.7.4
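
The comments in the stream_mappings table above describe how the four life
time hints are collapsed when the device offers fewer than four streams. A
minimal standalone sketch of that collapsing rule follows. It models the
behaviour stated in the comments rather than the table's exact indexing, and
the names (enum life_hint, collapse_hint, stream_to_dsmgmt) are hypothetical.
Only the shift by 16 is taken from nvme_assign_write_stream().

```c
#include <stdint.h>

/* Hint values as modeled here: 0 means no hint, then short..extreme. */
enum life_hint { LIFE_NONE, LIFE_SHORT, LIFE_MEDIUM, LIFE_LONG, LIFE_EXTREME };

/*
 * Collapse a hint onto the streams the device offers.
 * Returns 0 when no stream should be used.
 */
static unsigned int collapse_hint(unsigned int nr_streams, enum life_hint hint)
{
	/* 0 or 1 stream: streams are not used at all */
	if (hint == LIFE_NONE || nr_streams < 2)
		return 0;

	switch (nr_streams) {
	case 2:
		/* short+medium share one stream, long+extreme the other */
		return (hint <= LIFE_MEDIUM) ? LIFE_SHORT : LIFE_MEDIUM;
	case 3:
		/* long+extreme share the third stream */
		return (hint >= LIFE_LONG) ? LIFE_LONG : hint;
	default:
		/* 4 or more streams: one stream per hint */
		return hint;
	}
}

/* The stream id is placed in the upper 16 bits of the dsmgmt dword. */
static uint32_t stream_to_dsmgmt(unsigned int stream)
{
	return (uint32_t)stream << 16;
}
```

With two streams, for example, collapse_hint(2, LIFE_EXTREME) yields
LIFE_MEDIUM, so long-lived data still stays apart from short-lived data.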


* Re: [PATCH 1/9] fs: add fcntl() interface for setting/getting write life time hints
  2017-06-26 15:37 ` [PATCH 1/9] fs: add fcntl() interface for setting/getting " Jens Axboe
@ 2017-06-27 14:42   ` Christoph Hellwig
  2017-06-27 14:52     ` Christoph Hellwig
                       ` (2 more replies)
  0 siblings, 3 replies; 42+ messages in thread
From: Christoph Hellwig @ 2017-06-27 14:42 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-fsdevel, hch, martin.petersen

The API looks ok, but the code could use some cleanups.  What do
you think about the incremental patch below:

It refactors various manipulations, and stores the write hint right
in the iocb as there is a 4 byte hole (this will need some minor
adjustments in the next patches):

diff --git a/fs/fcntl.c b/fs/fcntl.c
index f4e7267d117f..c436278154b4 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -243,6 +243,63 @@ static int f_getowner_uids(struct file *filp, unsigned long arg)
 }
 #endif
 
+static bool rw_hint_valid(enum rw_hint hint)
+{
+	switch (hint) {
+	case RWF_WRITE_LIFE_NOT_SET:
+	case RWH_WRITE_LIFE_NONE:
+	case RWH_WRITE_LIFE_SHORT:
+	case RWH_WRITE_LIFE_MEDIUM:
+	case RWH_WRITE_LIFE_LONG:
+	case RWH_WRITE_LIFE_EXTREME:
+		return true;
+	default:
+		return false;
+	}
+}
+
+static long fcntl_rw_hint(struct file *file, unsigned int cmd,
+			  unsigned long arg)
+{
+	struct inode *inode = file_inode(file);
+	u64 __user *argp = (u64 __user *)arg;
+	enum rw_hint hint;
+
+	switch (cmd) {
+	case F_GET_FILE_RW_HINT:
+		if (put_user(__file_write_hint(file), argp))
+			return -EFAULT;
+		return 0;
+	case F_SET_FILE_RW_HINT:
+		if (get_user(hint, argp))
+			return -EFAULT;
+		if (!rw_hint_valid(hint))
+			return -EINVAL;
+
+		spin_lock(&file->f_lock);
+		file->f_write_hint = hint;
+		spin_unlock(&file->f_lock);
+		return 0;
+	case F_GET_RW_HINT:
+		if (put_user(__inode_write_hint(inode), argp))
+			return -EFAULT;
+		return 0;
+	case F_SET_RW_HINT:
+		if (get_user(hint, argp))
+			return -EFAULT;
+		if (!rw_hint_valid(hint))
+			return -EINVAL;
+
+		inode_lock(inode);
+		inode_set_flags(inode, hint << S_WRITE_LIFE_SHIFT,
+				S_WRITE_LIFE_MASK);
+		inode_unlock(inode);
+		return 0;
+	default:
+		return -EINVAL;
+	}
+}
+
 static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
 		struct file *filp)
 {
@@ -337,6 +394,12 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
 	case F_GET_SEALS:
 		err = shmem_fcntl(filp, cmd, arg);
 		break;
+	case F_GET_RW_HINT:
+	case F_SET_RW_HINT:
+	case F_GET_FILE_RW_HINT:
+	case F_SET_FILE_RW_HINT:
+		err = fcntl_rw_hint(filp, cmd, arg);
+		break;
 	default:
 		break;
 	}
diff --git a/fs/open.c b/fs/open.c
index cd0c5be8d012..3fe0c4aa7d27 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -759,6 +759,7 @@ static int do_dentry_open(struct file *f,
 	     likely(f->f_op->write || f->f_op->write_iter))
 		f->f_mode |= FMODE_CAN_WRITE;
 
+	f->f_write_hint = WRITE_LIFE_NOT_SET;
 	f->f_flags &= ~(O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC);
 
 	file_ra_state_init(&f->f_ra, f->f_mapping->host->i_mapping);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 4574121f4746..a07e9ce970d1 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -265,6 +265,18 @@ struct page;
 struct address_space;
 struct writeback_control;
 
+/*
+ * Write life time hint values.
+ */
+enum rw_hint {
+	WRITE_LIFE_NOT_SET	= 0,
+	WRITE_LIFE_NONE		= RWH_WRITE_LIFE_NONE,
+	WRITE_LIFE_SHORT	= RWH_WRITE_LIFE_SHORT,
+	WRITE_LIFE_MEDIUM	= RWH_WRITE_LIFE_MEDIUM,
+	WRITE_LIFE_LONG		= RWH_WRITE_LIFE_LONG,
+	WRITE_LIFE_EXTREME	= RWH_WRITE_LIFE_EXTREME,
+};
+
 #define IOCB_EVENTFD		(1 << 0)
 #define IOCB_APPEND		(1 << 1)
 #define IOCB_DIRECT		(1 << 2)
@@ -280,6 +292,7 @@ struct kiocb {
 	void (*ki_complete)(struct kiocb *iocb, long ret, long ret2);
 	void			*private;
 	int			ki_flags;
+	enum rw_hint		ki_hint;
 };
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)
@@ -851,6 +864,7 @@ struct file {
 	 * Must not be taken from IRQ context.
 	 */
 	spinlock_t		f_lock;
+	enum rw_hint		f_write_hint;
 	atomic_long_t		f_count;
 	unsigned int 		f_flags;
 	fmode_t			f_mode;
@@ -1833,6 +1847,14 @@ struct super_operations {
 #endif
 
 /*
+ * Expected life time hint of a write for this inode. This uses the
+ * WRITE_LIFE_* encoding, we just need to define the shift. We need
+ * 3 bits for this. Next S_* value is 131072, bit 17.
+ */
+#define S_WRITE_LIFE_SHIFT	14	/* 16384, next bit */
+#define S_WRITE_LIFE_MASK	(7 << S_WRITE_LIFE_SHIFT)
+
+/*
  * Note that nosuid etc flags are inode-specific: setting some file-system
  * flags just means all the inodes inherit those flags by default. It might be
  * possible to override it selectively if you really wanted to with some
@@ -1878,6 +1900,35 @@ static inline bool HAS_UNMAPPED_ID(struct inode *inode)
 	return !uid_valid(inode->i_uid) || !gid_valid(inode->i_gid);
 }
 
+static inline enum rw_hint __inode_write_hint(struct inode *inode)
+{
+	return (inode->i_flags >> S_WRITE_LIFE_SHIFT) & 0x7;
+}
+
+static inline enum rw_hint inode_write_hint(struct inode *inode)
+{
+	enum rw_hint ret = __inode_write_hint(inode);
+	if (ret != WRITE_LIFE_NOT_SET)
+		return ret;
+	return WRITE_LIFE_NONE;
+}
+
+static inline enum rw_hint __file_write_hint(struct file *file)
+{
+	if (file->f_write_hint != WRITE_LIFE_NOT_SET)
+		return file->f_write_hint;
+
+	return __inode_write_hint(file_inode(file));
+}
+
+static inline enum rw_hint file_write_hint(struct file *file)
+{
+	enum rw_hint ret = __file_write_hint(file);
+	if (ret != WRITE_LIFE_NOT_SET)
+		return ret;
+	return WRITE_LIFE_NONE;
+}
+
 /*
  * Inode state bits.  Protected by inode->i_lock
  *
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index 813afd6eee71..ec69d55bcec7 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -43,6 +43,27 @@
 /* (1U << 31) is reserved for signed error codes */
 
 /*
+ * Set/Get write life time hints. {GET,SET}_RW_HINT operate on the
+ * underlying inode, while {GET,SET}_FILE_RW_HINT operate only on
+ * the specific file.
+ */
+#define F_GET_RW_HINT		(F_LINUX_SPECIFIC_BASE + 11)
+#define F_SET_RW_HINT		(F_LINUX_SPECIFIC_BASE + 12)
+#define F_GET_FILE_RW_HINT	(F_LINUX_SPECIFIC_BASE + 13)
+#define F_SET_FILE_RW_HINT	(F_LINUX_SPECIFIC_BASE + 14)
+
+/*
+ * Valid hint values for F_{GET,SET}_RW_HINT. 0 is "not set", or can be
+ * used to clear any hints previously set.
+ */
+#define RWF_WRITE_LIFE_NOT_SET	0
+#define RWH_WRITE_LIFE_NONE	1
+#define RWH_WRITE_LIFE_SHORT	2
+#define RWH_WRITE_LIFE_MEDIUM	3
+#define RWH_WRITE_LIFE_LONG	4
+#define RWH_WRITE_LIFE_EXTREME	5
+
+/*
  * Types of directory notifications that may be requested.
  */
 #define DN_ACCESS	0x00000001	/* File accessed */
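
For reference, the inode-level half of the proposed interface could be
exercised from userspace roughly as sketched below. The fallback #defines
mirror the values in the uapi hunk above, for libc headers that do not
provide them. The helper names (roundtrip_inode_hint, demo_on_temp_file) and
the /tmp scratch path are assumptions for illustration. On a kernel without
this interface the fcntl() calls are expected to fail, most likely with
EINVAL.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Fallback definitions taken from the uapi hunk, for older libc headers. */
#ifndef F_LINUX_SPECIFIC_BASE
#define F_LINUX_SPECIFIC_BASE 1024
#endif
#ifndef F_GET_RW_HINT
#define F_GET_RW_HINT (F_LINUX_SPECIFIC_BASE + 11)
#endif
#ifndef F_SET_RW_HINT
#define F_SET_RW_HINT (F_LINUX_SPECIFIC_BASE + 12)
#endif
#ifndef RWH_WRITE_LIFE_SHORT
#define RWH_WRITE_LIFE_SHORT 2
#endif

/*
 * Set the inode-level hint on fd, then read it back into *out.
 * Returns 0 on success or a negative errno value.
 */
static int roundtrip_inode_hint(int fd, uint64_t hint, uint64_t *out)
{
	/* The argument is a pointer to a 64-bit value, not the value itself. */
	if (fcntl(fd, F_SET_RW_HINT, &hint) < 0)
		return -errno;
	if (fcntl(fd, F_GET_RW_HINT, out) < 0)
		return -errno;
	return 0;
}

/* Try the round trip on a scratch file, then remove the file again. */
static int demo_on_temp_file(uint64_t *out)
{
	char path[] = "/tmp/rwhint-XXXXXX";
	int fd = mkstemp(path);
	int rc;

	if (fd < 0)
		return -errno;
	rc = roundtrip_inode_hint(fd, RWH_WRITE_LIFE_SHORT, out);
	close(fd);
	unlink(path);
	return rc;
}
```

F_GET_FILE_RW_HINT and F_SET_FILE_RW_HINT take the same pointer argument
but act on the open file description instead of the inode.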


* Re: [PATCH 1/9] fs: add fcntl() interface for setting/getting write life time hints
  2017-06-27 14:42   ` Christoph Hellwig
@ 2017-06-27 14:52     ` Christoph Hellwig
  2017-06-27 14:55     ` Jens Axboe
  2017-06-27 15:09     ` Jens Axboe
  2 siblings, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2017-06-27 14:52 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-fsdevel, hch, martin.petersen

On Tue, Jun 27, 2017 at 07:42:55AM -0700, Christoph Hellwig wrote:
> The API looks ok, but the code could use some cleanups.  What do
> you think about the incremental patch below:
> 
> It refactors various manipulations, and stores the write hint right
> in the iocb as there is a 4 byte hole (this will need some minor
> adjustments in the next patches):

And looking over the follow-ons I'd love to just store the hints
directly in the inode, bio and request themselves.  We have big
enough holes at least in the bio and request to store them, although
instead of the enum, which is at least int sized, we'd have to make them
an explicit u16 or even u8.
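
The size argument can be illustrated with a small standalone sketch. The two
structs below are hypothetical stand-ins, not the real bio or request
layouts. They only show how a one-byte hint can sit in padding that an
int-sized enum cannot use. All names (rw_hint_model, req_with_enum,
req_with_u8, hint_pack, hint_unpack) are made up for this example.

```c
#include <stdint.h>

enum rw_hint_model {
	HINT_NOT_SET, HINT_NONE, HINT_SHORT, HINT_MEDIUM, HINT_LONG, HINT_EXTREME
};

/* Hint stored as an enum: needs its own aligned slot after ioprio. */
struct req_with_enum {
	uint32_t flags;
	uint16_t ioprio;
	enum rw_hint_model hint;
};

/* Hint narrowed to one byte: fits in the padding after ioprio. */
struct req_with_u8 {
	uint32_t flags;
	uint16_t ioprio;
	uint8_t hint;
};

/* All defined hint values fit in a byte, so the narrowing loses nothing. */
static inline uint8_t hint_pack(enum rw_hint_model h)
{
	return (uint8_t)h;
}

static inline enum rw_hint_model hint_unpack(uint8_t v)
{
	return (enum rw_hint_model)v;
}
```

On typical Linux ABIs, where the enum is int sized, the first struct is
larger than the second. The exact enum size is implementation-defined.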


* Re: [PATCH 4/9] fs: add O_DIRECT support for sending down write life time hints
  2017-06-26 15:37 ` [PATCH 4/9] fs: add O_DIRECT support for sending down write life time hints Jens Axboe
@ 2017-06-27 14:53   ` Christoph Hellwig
  0 siblings, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2017-06-27 14:53 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-fsdevel, hch, martin.petersen

> -			bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_SYNC | REQ_IDLE);
> +			bio_set_op_attrs(bio, REQ_OP_WRITE,
> +						REQ_SYNC | REQ_IDLE);

			bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_IDLE;


* Re: [PATCH 1/9] fs: add fcntl() interface for setting/getting write life time hints
  2017-06-27 14:42   ` Christoph Hellwig
  2017-06-27 14:52     ` Christoph Hellwig
@ 2017-06-27 14:55     ` Jens Axboe
  2017-06-27 14:57       ` Christoph Hellwig
  2017-06-27 15:09     ` Jens Axboe
  2 siblings, 1 reply; 42+ messages in thread
From: Jens Axboe @ 2017-06-27 14:55 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, linux-fsdevel, hch, martin.petersen

On 06/27/2017 08:42 AM, Christoph Hellwig wrote:
> The API looks ok, but the code could use some cleanups.  What do
> you think about the incremental patch below:
> 
> It refactors various manipulations, and stores the write hint right
> in the iocb as there is a 4 byte hole (this will need some minor
> adjustments in the next patches):

Sigh... Sure, that's how I did it originally as well.

BTW, that patch does not look like an incremental patch, what's
this against?

-- 
Jens Axboe


* Re: [PATCH 1/9] fs: add fcntl() interface for setting/getting write life time hints
  2017-06-27 14:55     ` Jens Axboe
@ 2017-06-27 14:57       ` Christoph Hellwig
  2017-06-27 14:58         ` Jens Axboe
  0 siblings, 1 reply; 42+ messages in thread
From: Christoph Hellwig @ 2017-06-27 14:57 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, linux-block, linux-fsdevel, hch, martin.petersen

On Tue, Jun 27, 2017 at 08:55:02AM -0600, Jens Axboe wrote:
> BTW, that patch does not look like an incremental patch, what's
> this against?

The patch I'm replying to, without the other ones.


* Re: [PATCH 1/9] fs: add fcntl() interface for setting/getting write life time hints
  2017-06-27 14:57       ` Christoph Hellwig
@ 2017-06-27 14:58         ` Jens Axboe
  0 siblings, 0 replies; 42+ messages in thread
From: Jens Axboe @ 2017-06-27 14:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christoph Hellwig, linux-block, linux-fsdevel, martin.petersen

On 06/27/2017 08:57 AM, Christoph Hellwig wrote:
> On Tue, Jun 27, 2017 at 08:55:02AM -0600, Jens Axboe wrote:
>> BTW, that patch does not look like an incremental patch, what's
>> this against?
> 
> The patch I'm replying to, without the other ones.

Looks like a replacement patch, not incremental to that. I'll
update. And I'm fine with not using flags, in fact that's what
I preferred to do initially.

-- 
Jens Axboe


* Re: [PATCH 1/9] fs: add fcntl() interface for setting/getting write life time hints
  2017-06-27 14:42   ` Christoph Hellwig
  2017-06-27 14:52     ` Christoph Hellwig
  2017-06-27 14:55     ` Jens Axboe
@ 2017-06-27 15:09     ` Jens Axboe
  2017-06-27 15:16       ` Christoph Hellwig
  2 siblings, 1 reply; 42+ messages in thread
From: Jens Axboe @ 2017-06-27 15:09 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, linux-fsdevel, hch, martin.petersen

On 06/27/2017 08:42 AM, Christoph Hellwig wrote:
> The API looks ok, but the code could use some cleanups.  What do
> you think about the incremental patch below:
> 
> It refactors various manipulations, and stores the write hint right
> in the iocb as there is a 4 byte hole (this will need some minor
> adjustments in the next patches):

How's this? Fixes for compile, and also squeeze an enum rw_hint into
a hole in the inode structure.

I'll refactor around this and squeeze into bio/rq holes as well, then
re-test it.

diff --git a/fs/fcntl.c b/fs/fcntl.c
index f4e7267d117f..25f96a101f1a 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -243,6 +243,62 @@ static int f_getowner_uids(struct file *filp, unsigned long arg)
 }
 #endif
 
+static bool rw_hint_valid(enum rw_hint hint)
+{
+	switch (hint) {
+	case RWF_WRITE_LIFE_NOT_SET:
+	case RWH_WRITE_LIFE_NONE:
+	case RWH_WRITE_LIFE_SHORT:
+	case RWH_WRITE_LIFE_MEDIUM:
+	case RWH_WRITE_LIFE_LONG:
+	case RWH_WRITE_LIFE_EXTREME:
+		return true;
+	default:
+		return false;
+	}
+}
+
+static long fcntl_rw_hint(struct file *file, unsigned int cmd,
+			  unsigned long arg)
+{
+	struct inode *inode = file_inode(file);
+	u64 __user *argp = (u64 __user *)arg;
+	enum rw_hint hint;
+
+	switch (cmd) {
+	case F_GET_FILE_RW_HINT:
+		if (put_user(__file_write_hint(file), argp))
+			return -EFAULT;
+		return 0;
+	case F_SET_FILE_RW_HINT:
+		if (get_user(hint, argp))
+			return -EFAULT;
+		if (!rw_hint_valid(hint))
+			return -EINVAL;
+
+		spin_lock(&file->f_lock);
+		file->f_write_hint = hint;
+		spin_unlock(&file->f_lock);
+		return 0;
+	case F_GET_RW_HINT:
+		if (put_user(__inode_write_hint(inode), argp))
+			return -EFAULT;
+		return 0;
+	case F_SET_RW_HINT:
+		if (get_user(hint, argp))
+			return -EFAULT;
+		if (!rw_hint_valid(hint))
+			return -EINVAL;
+
+		inode_lock(inode);
+		inode->i_write_hint = hint;
+		inode_unlock(inode);
+		return 0;
+	default:
+		return -EINVAL;
+	}
+}
+
 static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
 		struct file *filp)
 {
@@ -337,6 +393,12 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
 	case F_GET_SEALS:
 		err = shmem_fcntl(filp, cmd, arg);
 		break;
+	case F_GET_RW_HINT:
+	case F_SET_RW_HINT:
+	case F_GET_FILE_RW_HINT:
+	case F_SET_FILE_RW_HINT:
+		err = fcntl_rw_hint(filp, cmd, arg);
+		break;
 	default:
 		break;
 	}
diff --git a/fs/inode.c b/fs/inode.c
index db5914783a71..f0e5fc77e6a4 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -146,6 +146,7 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
 	i_gid_write(inode, 0);
 	atomic_set(&inode->i_writecount, 0);
 	inode->i_size = 0;
+	inode->i_write_hint = WRITE_LIFE_NOT_SET;
 	inode->i_blocks = 0;
 	inode->i_bytes = 0;
 	inode->i_generation = 0;
diff --git a/fs/open.c b/fs/open.c
index cd0c5be8d012..3fe0c4aa7d27 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -759,6 +759,7 @@ static int do_dentry_open(struct file *f,
 	     likely(f->f_op->write || f->f_op->write_iter))
 		f->f_mode |= FMODE_CAN_WRITE;
 
+	f->f_write_hint = WRITE_LIFE_NOT_SET;
 	f->f_flags &= ~(O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC);
 
 	file_ra_state_init(&f->f_ra, f->f_mapping->host->i_mapping);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 4574121f4746..4587a181162e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -265,6 +265,20 @@ struct page;
 struct address_space;
 struct writeback_control;
 
+#include <linux/fcntl.h>
+
+/*
+ * Write life time hint values.
+ */
+enum rw_hint {
+	WRITE_LIFE_NOT_SET	= 0,
+	WRITE_LIFE_NONE		= RWH_WRITE_LIFE_NONE,
+	WRITE_LIFE_SHORT	= RWH_WRITE_LIFE_SHORT,
+	WRITE_LIFE_MEDIUM	= RWH_WRITE_LIFE_MEDIUM,
+	WRITE_LIFE_LONG		= RWH_WRITE_LIFE_LONG,
+	WRITE_LIFE_EXTREME	= RWH_WRITE_LIFE_EXTREME,
+};
+
 #define IOCB_EVENTFD		(1 << 0)
 #define IOCB_APPEND		(1 << 1)
 #define IOCB_DIRECT		(1 << 2)
@@ -280,6 +294,7 @@ struct kiocb {
 	void (*ki_complete)(struct kiocb *iocb, long ret, long ret2);
 	void			*private;
 	int			ki_flags;
+	enum rw_hint		ki_hint;
 };
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)
@@ -597,6 +612,7 @@ struct inode {
 	spinlock_t		i_lock;	/* i_blocks, i_bytes, maybe i_size */
 	unsigned short          i_bytes;
 	unsigned int		i_blkbits;
+	enum rw_hint		i_write_hint;
 	blkcnt_t		i_blocks;
 
 #ifdef __NEED_I_SIZE_ORDERED
@@ -851,6 +867,7 @@ struct file {
 	 * Must not be taken from IRQ context.
 	 */
 	spinlock_t		f_lock;
+	enum rw_hint		f_write_hint;
 	atomic_long_t		f_count;
 	unsigned int 		f_flags;
 	fmode_t			f_mode;
@@ -1026,8 +1043,6 @@ struct file_lock_context {
 #define OFFT_OFFSET_MAX	INT_LIMIT(off_t)
 #endif
 
-#include <linux/fcntl.h>
-
 extern void send_sigio(struct fown_struct *fown, int fd, int band);
 
 /*
@@ -1878,6 +1893,35 @@ static inline bool HAS_UNMAPPED_ID(struct inode *inode)
 	return !uid_valid(inode->i_uid) || !gid_valid(inode->i_gid);
 }
 
+static inline enum rw_hint __inode_write_hint(struct inode *inode)
+{
+	return inode->i_write_hint;
+}
+
+static inline enum rw_hint inode_write_hint(struct inode *inode)
+{
+	enum rw_hint ret = __inode_write_hint(inode);
+	if (ret != WRITE_LIFE_NOT_SET)
+		return ret;
+	return WRITE_LIFE_NONE;
+}
+
+static inline enum rw_hint __file_write_hint(struct file *file)
+{
+	if (file->f_write_hint != WRITE_LIFE_NOT_SET)
+		return file->f_write_hint;
+
+	return __inode_write_hint(file_inode(file));
+}
+
+static inline enum rw_hint file_write_hint(struct file *file)
+{
+	enum rw_hint ret = __file_write_hint(file);
+	if (ret != WRITE_LIFE_NOT_SET)
+		return ret;
+	return WRITE_LIFE_NONE;
+}
+
 /*
  * Inode state bits.  Protected by inode->i_lock
  *
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index 813afd6eee71..ec69d55bcec7 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -43,6 +43,27 @@
 /* (1U << 31) is reserved for signed error codes */
 
 /*
+ * Set/Get write life time hints. {GET,SET}_RW_HINT operate on the
+ * underlying inode, while {GET,SET}_FILE_RW_HINT operate only on
+ * the specific file.
+ */
+#define F_GET_RW_HINT		(F_LINUX_SPECIFIC_BASE + 11)
+#define F_SET_RW_HINT		(F_LINUX_SPECIFIC_BASE + 12)
+#define F_GET_FILE_RW_HINT	(F_LINUX_SPECIFIC_BASE + 13)
+#define F_SET_FILE_RW_HINT	(F_LINUX_SPECIFIC_BASE + 14)
+
+/*
+ * Valid hint values for F_{GET,SET}_RW_HINT. 0 is "not set", or can be
+ * used to clear any hints previously set.
+ */
+#define RWF_WRITE_LIFE_NOT_SET	0
+#define RWH_WRITE_LIFE_NONE	1
+#define RWH_WRITE_LIFE_SHORT	2
+#define RWH_WRITE_LIFE_MEDIUM	3
+#define RWH_WRITE_LIFE_LONG	4
+#define RWH_WRITE_LIFE_EXTREME	5
+
+/*
  * Types of directory notifications that may be requested.
  */
 #define DN_ACCESS	0x00000001	/* File accessed */

-- 
Jens Axboe

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/9] fs: add fcntl() interface for setting/getting write life time hints
  2017-06-27 15:09     ` Jens Axboe
@ 2017-06-27 15:16       ` Christoph Hellwig
  2017-06-27 15:18         ` Jens Axboe
  0 siblings, 1 reply; 42+ messages in thread
From: Christoph Hellwig @ 2017-06-27 15:16 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-fsdevel, martin.petersen

On Tue, Jun 27, 2017 at 09:09:48AM -0600, Jens Axboe wrote:
> On 06/27/2017 08:42 AM, Christoph Hellwig wrote:
> > The API looks ok, but the code could use some cleanups.  What do
> > you think about the incremental patch below:
> > 
> > It refactors various manipulations, and stores the write hint right
> > in the iocb as there is a 4 byte hole (this will need some minor
> > adjustments in the next patches):
> 
> How's this? Fixes for compile, and also squeeze an enum rw_hint into
> a hole in the inode structure.
> 
> I'll refactor around this and squeeze into bio/rq holes as well, then
> re-test it.

Looks good, minor nitpick below:

> index 4574121f4746..4587a181162e 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -265,6 +265,20 @@ struct page;
>  struct address_space;
>  struct writeback_control;
>  
> +#include <linux/fcntl.h>

I didn't seem to need the move.  But if you want to move it can
we keep all the includes together at the very top?

> +static inline enum rw_hint __inode_write_hint(struct inode *inode)
> +{
> +	return inode->i_write_hint;
> +}
> +
> +static inline enum rw_hint inode_write_hint(struct inode *inode)
> +{
> +	enum rw_hint ret = __inode_write_hint(inode);
> +	if (ret != WRITE_LIFE_NOT_SET)
> +		return ret;
> +	return WRITE_LIFE_NONE;
> +}
> +
> +static inline enum rw_hint __file_write_hint(struct file *file)
> +{
> +	if (file->f_write_hint != WRITE_LIFE_NOT_SET)
> +		return file->f_write_hint;
> +
> +	return __inode_write_hint(file_inode(file));
> +}
> +
> +static inline enum rw_hint file_write_hint(struct file *file)
> +{
> +	enum rw_hint ret = __file_write_hint(file);
> +	if (ret != WRITE_LIFE_NOT_SET)
> +		return ret;
> +	return WRITE_LIFE_NONE;
> +}

I'd say kill all these helpers and just treat both WRITE_LIFE_NONE
and WRITE_LIFE_NOT_SET special all the way down in NVMe.
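
For illustration, here is a minimal userspace sketch of what handling both values at the consumer could look like. The enum values mirror the patch; hint_to_stream() and its 1-based stream numbering are hypothetical and are not taken from the NVMe patch in this series.

```c
#include <assert.h>

/* Values mirror enum rw_hint from the patch under review. */
enum rw_hint {
	WRITE_LIFE_NOT_SET	= 0,
	WRITE_LIFE_NONE		= 1,
	WRITE_LIFE_SHORT	= 2,
	WRITE_LIFE_MEDIUM	= 3,
	WRITE_LIFE_LONG		= 4,
	WRITE_LIFE_EXTREME	= 5,
};

/*
 * Hypothetical consumer-side mapping: returns 0 for "no stream",
 * otherwise a 1-based stream index. Not the actual driver code.
 */
static unsigned int hint_to_stream(enum rw_hint hint)
{
	assert(hint <= WRITE_LIFE_EXTREME);

	/* "Never set" and "explicitly none" both mean: do not tag the write. */
	if (hint == WRITE_LIFE_NOT_SET || hint == WRITE_LIFE_NONE)
		return 0;

	/* SHORT..EXTREME map to streams 1..4. */
	return (unsigned int)hint - WRITE_LIFE_SHORT + 1;
}
```

With the check done in one place, the upper layers can pass the raw stored hint down unchanged, so the NOT_SET to NONE fallback in the helpers above would no longer be needed.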

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/9] blk-mq: expose write hints through debugfs
  2017-06-26 15:37 ` [PATCH 3/9] blk-mq: expose write hints through debugfs Jens Axboe
@ 2017-06-27 15:17   ` Christoph Hellwig
  2017-06-27 15:20     ` Jens Axboe
  0 siblings, 1 reply; 42+ messages in thread
From: Christoph Hellwig @ 2017-06-27 15:17 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-fsdevel, hch, martin.petersen

On Mon, Jun 26, 2017 at 09:37:54AM -0600, Jens Axboe wrote:
> Useful to verify that things are working the way they should.
> Reading the file will return number of kb written with each
> write hint. Writing the file will reset the statistics. No care
> is taken to ensure that we don't race on updates.
> 
> Drivers will write to q->write_hints[] if they handle a given
> write hint.

How about moving the accounting itself to blk-mq as well?  Just noticed
that it's completely generic while looking over the nvme patch.
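
As a rough model of the accounting being discussed, the sketch below keeps one counter per write hint, reports it in kb, and resets on demand. struct hint_stats and its helpers are hypothetical stand-ins for q->write_hints[]; counting in 512-byte sectors is an assumption, since the thread only says the file reports kb per hint.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_WRITE_HINTS	6	/* NOT_SET .. EXTREME, as in enum rw_hint */

/* Hypothetical stand-in for the per-queue counters (q->write_hints[]). */
struct hint_stats {
	uint64_t sectors[NR_WRITE_HINTS];	/* 512-byte sectors per hint */
};

/* Account one write. Deliberately unlocked, as the patch description says. */
static void hint_stats_add(struct hint_stats *st, unsigned int hint,
			   uint64_t nr_sectors)
{
	assert(hint < NR_WRITE_HINTS);
	st->sectors[hint] += nr_sectors;
}

/* What reading the debugfs file would report for one hint, in kb. */
static uint64_t hint_stats_kb(const struct hint_stats *st, unsigned int hint)
{
	assert(hint < NR_WRITE_HINTS);
	return st->sectors[hint] / 2;	/* two sectors per kb */
}

/* What writing to the debugfs file would do: clear every counter. */
static void hint_stats_reset(struct hint_stats *st)
{
	memset(st, 0, sizeof(*st));
}
```

Nothing in this model depends on the driver, which is the point being raised: the add step could live in generic block code and be called only when a hint is actually honoured.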

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/9] fs: add fcntl() interface for setting/getting write life time hints
  2017-06-27 15:16       ` Christoph Hellwig
@ 2017-06-27 15:18         ` Jens Axboe
  0 siblings, 0 replies; 42+ messages in thread
From: Jens Axboe @ 2017-06-27 15:18 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, linux-fsdevel, martin.petersen

On 06/27/2017 09:16 AM, Christoph Hellwig wrote:
> On Tue, Jun 27, 2017 at 09:09:48AM -0600, Jens Axboe wrote:
>> On 06/27/2017 08:42 AM, Christoph Hellwig wrote:
>>> The API looks ok, but the code could use some cleanups.  What do
>>> you think about the incremental patch below:
>>>
>>> It refactors various manipulations, and stores the write hint right
>>> in the iocb as there is a 4 byte hole (this will need some minor
>>> adjustments in the next patches):
>>
>> How's this? Fixes for compile, and also squeeze an enum rw_hint into
>> a hole in the inode structure.
>>
>> I'll refactor around this and squeeze into bio/rq holes as well, then
>> re-test it.
> 
> Looks good, minor nitpick below:
> 
>> index 4574121f4746..4587a181162e 100644
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -265,6 +265,20 @@ struct page;
>>  struct address_space;
>>  struct writeback_control;
>>  
>> +#include <linux/fcntl.h>
> 
> I didn't seem to need the move.  But if you want to move it can
> we keep all the includes together at the very top?

It did here, we need it for the RWH_ defines or my compile blows up.
But yeah, let's just move it to the top, not sure why it's in the
middle.

>> +static inline enum rw_hint __inode_write_hint(struct inode *inode)
>> +{
>> +	return inode->i_write_hint;
>> +}
>> +
>> +static inline enum rw_hint inode_write_hint(struct inode *inode)
>> +{
>> +	enum rw_hint ret = __inode_write_hint(inode);
>> +	if (ret != WRITE_LIFE_NOT_SET)
>> +		return ret;
>> +	return WRITE_LIFE_NONE;
>> +}
>> +
>> +static inline enum rw_hint __file_write_hint(struct file *file)
>> +{
>> +	if (file->f_write_hint != WRITE_LIFE_NOT_SET)
>> +		return file->f_write_hint;
>> +
>> +	return __inode_write_hint(file_inode(file));
>> +}
>> +
>> +static inline enum rw_hint file_write_hint(struct file *file)
>> +{
>> +	enum rw_hint ret = __file_write_hint(file);
>> +	if (ret != WRITE_LIFE_NOT_SET)
>> +		return ret;
>> +	return WRITE_LIFE_NONE;
>> +}
> 
> I'd say kill all these helpers and just treat both WRITE_LIFE_NONE
> and WRITE_LIFE_NOT_SET special all the way down in NVMe.

Sure, we can do that.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/9] blk-mq: expose write hints through debugfs
  2017-06-27 15:17   ` Christoph Hellwig
@ 2017-06-27 15:20     ` Jens Axboe
  0 siblings, 0 replies; 42+ messages in thread
From: Jens Axboe @ 2017-06-27 15:20 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, linux-fsdevel, martin.petersen

On 06/27/2017 09:17 AM, Christoph Hellwig wrote:
> On Mon, Jun 26, 2017 at 09:37:54AM -0600, Jens Axboe wrote:
>> Useful to verify that things are working the way they should.
>> Reading the file will return number of kb written with each
>> write hint. Writing the file will reset the statistics. No care
>> is taken to ensure that we don't race on updates.
>>
>> Drivers will write to q->write_hints[] if they handle a given
>> write hint.
> 
> How about moving the accounting itself to blk-mq as well?  Just noticed
> that it's completely generic while looking over the nvme patch.

I didn't want to do it, unless the driver is using streams.


-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/9] fs: add fcntl() interface for setting/getting write life time hints
@ 2017-06-26 16:29           ` Jens Axboe
  0 siblings, 0 replies; 42+ messages in thread
From: Jens Axboe @ 2017-06-26 16:29 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, linux-fsdevel, linux-block, adilger,
	martin.petersen, linux-nvme, linux-api, linux-man

On 06/26/2017 10:09 AM, Darrick J. Wong wrote:
> On Mon, Jun 26, 2017 at 07:55:27AM -0600, Jens Axboe wrote:
>> On 06/26/2017 03:51 AM, Christoph Hellwig wrote:
>>> Please document the userspace API (added linux-api and linux-man
>>> to CC for suggestions), especially including the odd effects of the
>>> per-inode settings.
>>
>> Of course, I'll send in a diff for the fcntl(2) man page.
>>
>>> Also I would highly recommend to use different fcntl commands
>>> for the file vs inode hints to avoid any strange behavior.
>>
>> OK, used to have that too... I can add specific _FILE versions.
> 
> While you're at it, can you also send in an xfstest or two to check the
> basic functionality of the fcntl so that we know the code reflects the
> userspace API ("I set this hint and now I can query it back" and "file
> hint overrides inode hint") that we want?

I definitely can. I already wrote the below to verify that it behaves
the way it should.


/*
 * test-writehints.c: test file/inode write hint setting/getting
 */
#include <stdio.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdbool.h>
#include <inttypes.h>
#include <assert.h>

#ifndef F_GET_RW_HINT
#define F_LINUX_SPECIFIC_BASE	1024
#define F_GET_RW_HINT		(F_LINUX_SPECIFIC_BASE + 11)
#define F_SET_RW_HINT		(F_LINUX_SPECIFIC_BASE + 12)
#define F_GET_FILE_RW_HINT	(F_LINUX_SPECIFIC_BASE + 13)
#define F_SET_FILE_RW_HINT	(F_LINUX_SPECIFIC_BASE + 14)

#define RWF_WRITE_LIFE_NOT_SET	0
#define RWH_WRITE_LIFE_NONE	1
#define RWH_WRITE_LIFE_SHORT	2
#define RWH_WRITE_LIFE_MEDIUM	3
#define RWH_WRITE_LIFE_LONG	4
#define RWH_WRITE_LIFE_EXTREME	5

#endif

static int __get_write_hint(int fd, int cmd)
{
	uint64_t hint;
	int ret;

	ret = fcntl(fd, cmd, &hint);
	if (ret < 0) {
		perror("fcntl: get write hint");
		return -1;
	}

	return hint;
}

static int get_file_write_hint(int fd)
{
	return __get_write_hint(fd, F_GET_FILE_RW_HINT);
}

static int get_inode_write_hint(int fd)
{
	return __get_write_hint(fd, F_GET_RW_HINT);
}

static void set_file_write_hint(int fd, uint64_t hint)
{
	uint64_t set_hint = hint;
	int ret;

	ret = fcntl(fd, F_SET_FILE_RW_HINT, &set_hint);
	if (ret < 0) {
		perror("fcntl: F_SET_FILE_RW_HINT");
		return;
	}
}

static void set_inode_write_hint(int fd, uint64_t hint)
{
	uint64_t set_hint = hint;
	int ret;

	ret = fcntl(fd, F_SET_RW_HINT, &set_hint);
	if (ret < 0) {
		perror("fcntl: F_SET_RW_HINT");
		return;
	}
}

int main(int argc, char *argv[])
{
	char filename[] = "/tmp/writehintsXXXXXX";
	int ihint, fhint, fd;

	fd = open(filename, O_RDWR | O_CREAT, 0644);
	if (fd < 0) {
		perror("open");
		return 2;
	}

	/*
	 * Default hints for both file and inode should be NOT_SET
	 */
	fhint = get_file_write_hint(fd);
	if (fhint < 0)
		return 0;
	ihint = get_inode_write_hint(fd);
	assert(fhint == ihint);
	assert(fhint == RWF_WRITE_LIFE_NOT_SET);

	/*
	 * Set inode hint, check file hint returns the right hint
	 */
	set_inode_write_hint(fd, RWH_WRITE_LIFE_SHORT);
	fhint = get_file_write_hint(fd);
	ihint = get_inode_write_hint(fd);
	assert(fhint == ihint);
	assert(fhint == RWH_WRITE_LIFE_SHORT);

	/*
	 * Now set file hint, ensure that this is now the hint we get
	 */
	set_file_write_hint(fd, RWH_WRITE_LIFE_LONG);
	fhint = get_file_write_hint(fd);
	ihint = get_inode_write_hint(fd);
	assert(fhint == RWH_WRITE_LIFE_LONG);
	assert(ihint == RWH_WRITE_LIFE_SHORT);

	/*
	 * Clear inode write hint, ensure that file still returns the set hint
	 */
	set_inode_write_hint(fd, RWF_WRITE_LIFE_NOT_SET);
	fhint = get_file_write_hint(fd);
	ihint = get_inode_write_hint(fd);
	assert(fhint == RWH_WRITE_LIFE_LONG);
	assert(ihint == RWF_WRITE_LIFE_NOT_SET);

	/*
	 * Clear file write hint, ensure that now returns cleared
	 */
	set_file_write_hint(fd, RWF_WRITE_LIFE_NOT_SET);
	fhint = get_file_write_hint(fd);
	assert(fhint == RWF_WRITE_LIFE_NOT_SET);

	close(fd);
	unlink(filename);
	return 0;
}


-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/9] fs: add fcntl() interface for setting/getting write life time hints
@ 2017-06-26 16:09         ` Darrick J. Wong
  0 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2017-06-26 16:09 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, linux-fsdevel, linux-block, adilger,
	martin.petersen, linux-nvme, linux-api, linux-man

On Mon, Jun 26, 2017 at 07:55:27AM -0600, Jens Axboe wrote:
> On 06/26/2017 03:51 AM, Christoph Hellwig wrote:
> > Please document the userspace API (added linux-api and linux-man
> > to CC for suggestions), especially including the odd effects of the
> > per-inode settings.
> 
> Of course, I'll send in a diff for the fcntl(2) man page.
> 
> > Also I would highly recommend to use different fcntl commands
> > for the file vs inode hints to avoid any strange behavior.
> 
> OK, used to have that too... I can add specific _FILE versions.

While you're at it, can you also send in an xfstest or two to check the
basic functionality of the fcntl so that we know the code reflects the
userspace API ("I set this hint and now I can query it back" and "file
hint overrides inode hint") that we want?

--D

> 
> -- 
> Jens Axboe
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/9] fs: add fcntl() interface for setting/getting write life time hints
@ 2017-06-26 13:55       ` Jens Axboe
  0 siblings, 0 replies; 42+ messages in thread
From: Jens Axboe @ 2017-06-26 13:55 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-fsdevel, linux-block, adilger, martin.petersen, linux-nvme,
	linux-api, linux-man

On 06/26/2017 03:51 AM, Christoph Hellwig wrote:
> Please document the userspace API (added linux-api and linux-man
> to CC for suggestions), especially including the odd effects of the
> per-inode settings.

Of course, I'll send in a diff for the fcntl(2) man page.

> Also I would highly recommend to use different fcntl commands
> for the file vs inode hints to avoid any strange behavior.

OK, used to have that too... I can add specific _FILE versions.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/9] fs: add fcntl() interface for setting/getting write life time hints
@ 2017-06-26  9:51     ` Christoph Hellwig
  0 siblings, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2017-06-26  9:51 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-fsdevel, linux-block, adilger, hch, martin.petersen,
	linux-nvme, linux-api, linux-man

Please document the userspace API (added linux-api and linux-man
to CC for suggestions), especially including the odd effects of the
per-inode settings.

Also I would highly recommend to use different fcntl commands
for the file vs inode hints to avoid any strange behavior.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 1/9] fs: add fcntl() interface for setting/getting write life time hints
  2017-06-21  0:21 [PATCHSET v9] Add support for write life time hints Jens Axboe
@ 2017-06-21  0:21   ` Jens Axboe
  0 siblings, 0 replies; 42+ messages in thread
From: Jens Axboe @ 2017-06-21  0:21 UTC (permalink / raw)
  To: linux-fsdevel, linux-block
  Cc: adilger, hch, martin.petersen, linux-nvme, Jens Axboe

Define a set of write life time hints:

RWH_WRITE_LIFE_NONE	No hints about write life time
RWH_WRITE_LIFE_SHORT	Data written has a short life time
RWH_WRITE_LIFE_MEDIUM	Data written has a medium life time
RWH_WRITE_LIFE_LONG	Data written has a long life time
RWH_WRITE_LIFE_EXTREME	Data written has an extremely long life time

The intent is for these values to be relative to each other; no
absolute meaning should be attached to these flag names.

Add an fcntl interface for querying and setting these flags:

F_GET_RW_HINT		Returns the read/write hint set.

F_SET_RW_HINT		Pass one of the above write hints.

The user passes in a pointer to a 64-bit value to get/set these hints,
and the interface returns 0 on success and -1 on error.

Sample program testing/implementing basic setting/getting of write
hints is below.

Add support for storing the write life time hint in the inode flags
and in struct file as well, and pass them to the kiocb flags. If
both a file and its corresponding inode have a write hint, then we
use the one in the file. The file hint can be used for sync/direct
IO; for buffered writeback, only the inode hint is available.

This is in preparation for utilizing these hints in the block layer,
to guide on-media data placement.
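
As a rough illustration of how a 3-bit hint is packed into a flags word,
the following userspace sketch mirrors the write_hint_to_mask() and
mask_to_write_hint() helpers from this patch. The shift values are taken
from the diff below; flags_set_hint() is a hypothetical helper that only
approximates what inode_set_flags() does with S_WRITE_LIFE_MASK.

```c
#include <assert.h>

/* Shift values as used in this patch. */
#define S_WRITE_LIFE_SHIFT	14
#define S_WRITE_LIFE_MASK	(7u << S_WRITE_LIFE_SHIFT)
#define IOCB_WRITE_LIFE_SHIFT	8

/* Move a hint value into its bit field. */
static inline unsigned int hint_to_mask(unsigned int hint, unsigned int shift)
{
	return hint << shift;
}

/* Extract the 3-bit hint field from a flags word. */
static inline unsigned int mask_to_hint(unsigned int flags, unsigned int shift)
{
	return (flags >> shift) & 0x7;
}

/* Hypothetical: replace the inode hint field, leaving other bits alone. */
static inline unsigned int flags_set_hint(unsigned int flags, unsigned int hint)
{
	return (flags & ~S_WRITE_LIFE_MASK) |
		hint_to_mask(hint, S_WRITE_LIFE_SHIFT);
}
```

Clearing the field before OR-ing in the new value matters: without it, a
previously stored hint would leak into the new one.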

/*
 * writehint.c: check or set a file/inode write hint
 */
 #include <stdio.h>
 #include <fcntl.h>
 #include <stdlib.h>
 #include <unistd.h>
 #include <stdbool.h>
 #include <inttypes.h>

 /* Fall back to the values from this patch if the headers lack them */
 #ifndef F_GET_RW_HINT
 #ifndef F_LINUX_SPECIFIC_BASE
 #define F_LINUX_SPECIFIC_BASE	1024
 #endif
 #define F_GET_RW_HINT		(F_LINUX_SPECIFIC_BASE + 11)
 #define F_SET_RW_HINT		(F_LINUX_SPECIFIC_BASE + 12)
 #endif

static const char *str[] = { "WRITE_LIFE_NOT_SET", "WRITE_LIFE_NONE",
			"WRITE_LIFE_SHORT", "WRITE_LIFE_MEDIUM",
			"WRITE_LIFE_LONG", "WRITE_LIFE_EXTREME" };

int main(int argc, char *argv[])
{
	uint64_t hint;
	int fd, ret;

	if (argc < 2) {
		fprintf(stderr, "%s: file <hint>\n", argv[0]);
		return 1;
	}

	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 2;
	}

	/* Optional second argument: numeric hint to set */
	if (argc > 2) {
		hint = atoi(argv[2]);
		ret = fcntl(fd, F_SET_RW_HINT, &hint);
		if (ret < 0) {
			perror("fcntl: F_SET_RW_HINT");
			return 4;
		}
	}

	ret = fcntl(fd, F_GET_RW_HINT, &hint);
	if (ret < 0) {
		perror("fcntl: F_GET_RW_HINT");
		return 3;
	}

	/* Don't index past the name table on an unexpected value */
	if (hint >= sizeof(str) / sizeof(str[0])) {
		fprintf(stderr, "%s: unknown hint %" PRIu64 "\n", argv[1], hint);
		return 5;
	}

	printf("%s: hint %s\n", argv[1], str[hint]);
	close(fd);
	return 0;
}

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/fcntl.c                 | 60 +++++++++++++++++++++++++++++++++++++
 fs/inode.c                 | 11 +++++++
 fs/open.c                  |  1 +
 include/linux/fs.h         | 74 ++++++++++++++++++++++++++++++++++++++++++++--
 include/uapi/linux/fcntl.h | 16 ++++++++++
 5 files changed, 160 insertions(+), 2 deletions(-)

diff --git a/fs/fcntl.c b/fs/fcntl.c
index f4e7267d117f..7037f0560f36 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -243,6 +243,62 @@ static int f_getowner_uids(struct file *filp, unsigned long arg)
 }
 #endif
 
+static long fcntl_rw_hint(struct file *file, unsigned int cmd,
+			  unsigned long arg)
+{
+	struct inode *inode = file_inode(file);
+	enum rw_hint hint, old_hint;
+	long ret = 0;
+
+	switch (cmd) {
+	case F_GET_RW_HINT:
+		if (file->f_write_hint != WRITE_LIFE_NOT_SET)
+			hint = file->f_write_hint;
+		else
+			hint = mask_to_write_hint(inode->i_flags,
+							S_WRITE_LIFE_SHIFT);
+		if (put_user(hint, (u64 __user *) arg))
+			ret = -EFAULT;
+		break;
+	case F_SET_RW_HINT:
+		if (get_user(hint, (u64 __user *) arg)) {
+			ret = -EFAULT;
+			break;
+		}
+		switch (hint) {
+		case WRITE_LIFE_NOT_SET:
+		case WRITE_LIFE_NONE:
+		case WRITE_LIFE_SHORT:
+		case WRITE_LIFE_MEDIUM:
+		case WRITE_LIFE_LONG:
+		case WRITE_LIFE_EXTREME:
+			spin_lock(&file->f_lock);
+			file->f_write_hint = hint;
+			spin_unlock(&file->f_lock);
+
+			/*
+			 * Only propagate hint to inode, if no hint is set,
+			 * or if the hint is being cleared
+			 */
+			old_hint = mask_to_write_hint(inode->i_flags,
+							S_WRITE_LIFE_SHIFT);
+			if (old_hint == WRITE_LIFE_NOT_SET ||
+			    hint == WRITE_LIFE_NOT_SET)
+				inode_set_write_hint(inode, hint);
+			ret = 0;
+			break;
+		default:
+			ret = -EINVAL;
+		}
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+
+	return ret;
+}
+
 static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
 		struct file *filp)
 {
@@ -337,6 +393,10 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
 	case F_GET_SEALS:
 		err = shmem_fcntl(filp, cmd, arg);
 		break;
+	case F_GET_RW_HINT:
+	case F_SET_RW_HINT:
+		err = fcntl_rw_hint(filp, cmd, arg);
+		break;
 	default:
 		break;
 	}
diff --git a/fs/inode.c b/fs/inode.c
index db5914783a71..defb015a2c6d 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2120,3 +2120,14 @@ struct timespec current_time(struct inode *inode)
 	return timespec_trunc(now, inode->i_sb->s_time_gran);
 }
 EXPORT_SYMBOL(current_time);
+
+void inode_set_write_hint(struct inode *inode, enum rw_hint hint)
+{
+	unsigned int flags = write_hint_to_mask(hint, S_WRITE_LIFE_SHIFT);
+
+	if (flags != mask_to_write_hint(inode->i_flags, S_WRITE_LIFE_SHIFT)) {
+		inode_lock(inode);
+		inode_set_flags(inode, flags, S_WRITE_LIFE_MASK);
+		inode_unlock(inode);
+	}
+}
diff --git a/fs/open.c b/fs/open.c
index cd0c5be8d012..3fe0c4aa7d27 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -759,6 +759,7 @@ static int do_dentry_open(struct file *f,
 	     likely(f->f_op->write || f->f_op->write_iter))
 		f->f_mode |= FMODE_CAN_WRITE;
 
+	f->f_write_hint = WRITE_LIFE_NOT_SET;
 	f->f_flags &= ~(O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC);
 
 	file_ra_state_init(&f->f_ra, f->f_mapping->host->i_mapping);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 4574121f4746..9c554a783a6f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -274,6 +274,13 @@ struct writeback_control;
 #define IOCB_WRITE		(1 << 6)
 #define IOCB_NOWAIT		(1 << 7)
 
+/*
+ * Steal 3 bits for stream information, this allows 8 valid streams
+ */
+#define IOCB_WRITE_LIFE_SHIFT	8
+#define IOCB_WRITE_LIFE_MASK	(7 << IOCB_WRITE_LIFE_SHIFT)
+
+
 struct kiocb {
 	struct file		*ki_filp;
 	loff_t			ki_pos;
@@ -297,6 +304,12 @@ static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
 	};
 }
 
+static inline int iocb_write_hint(const struct kiocb *iocb)
+{
+	return (iocb->ki_flags & IOCB_WRITE_LIFE_MASK) >>
+			IOCB_WRITE_LIFE_SHIFT;
+}
+
 /*
  * "descriptor" for what we're up to with a read.
  * This allows us to use the same read code yet
@@ -828,6 +841,20 @@ struct file_ra_state {
 	loff_t prev_pos;		/* Cache last read() position */
 };
 
+#include <linux/fcntl.h>
+
+/*
+ * Write life time hint values.
+ */
+enum rw_hint {
+	WRITE_LIFE_NOT_SET = 0,
+	WRITE_LIFE_NONE = RWH_WRITE_LIFE_NONE,
+	WRITE_LIFE_SHORT = RWH_WRITE_LIFE_SHORT,
+	WRITE_LIFE_MEDIUM = RWH_WRITE_LIFE_MEDIUM,
+	WRITE_LIFE_LONG = RWH_WRITE_LIFE_LONG,
+	WRITE_LIFE_EXTREME = RWH_WRITE_LIFE_EXTREME,
+};
+
 /*
  * Check if @index falls in the readahead windows.
  */
@@ -851,6 +878,7 @@ struct file {
 	 * Must not be taken from IRQ context.
 	 */
 	spinlock_t		f_lock;
+	enum rw_hint		f_write_hint;
 	atomic_long_t		f_count;
 	unsigned int 		f_flags;
 	fmode_t			f_mode;
@@ -1026,8 +1054,6 @@ struct file_lock_context {
 #define OFFT_OFFSET_MAX	INT_LIMIT(off_t)
 #endif
 
-#include <linux/fcntl.h>
-
 extern void send_sigio(struct fown_struct *fown, int fd, int band);
 
 /*
@@ -1833,6 +1859,14 @@ struct super_operations {
 #endif
 
 /*
+ * Expected life time hint of a write for this inode. This uses the
+ * WRITE_LIFE_* encoding, we just need to define the shift. We need
+ * 3 bits for this. Next S_* value is 131072, bit 17.
+ */
+#define S_WRITE_LIFE_SHIFT	14	/* 16384, next bit */
+#define S_WRITE_LIFE_MASK	(7 << S_WRITE_LIFE_SHIFT)
+
+/*
  * Note that nosuid etc flags are inode-specific: setting some file-system
  * flags just means all the inodes inherit those flags by default. It might be
  * possible to override it selectively if you really wanted to with some
@@ -1878,6 +1912,39 @@ static inline bool HAS_UNMAPPED_ID(struct inode *inode)
 	return !uid_valid(inode->i_uid) || !gid_valid(inode->i_gid);
 }
 
+static inline unsigned int write_hint_to_mask(enum rw_hint hint,
+					      unsigned int shift)
+{
+	return hint << shift;
+}
+
+static inline enum rw_hint mask_to_write_hint(unsigned int mask,
+					      unsigned int shift)
+{
+	return (mask >> shift) & 0x7;
+}
+
+static inline enum rw_hint inode_write_hint(struct inode *inode)
+{
+	enum rw_hint ret = WRITE_LIFE_NONE;
+
+	if (inode) {
+		ret = mask_to_write_hint(inode->i_flags, S_WRITE_LIFE_SHIFT);
+		if (ret == WRITE_LIFE_NOT_SET)
+			ret = WRITE_LIFE_NONE;
+	}
+
+	return ret;
+}
+
+static inline enum rw_hint file_write_hint(struct file *file)
+{
+	if (file->f_write_hint != WRITE_LIFE_NOT_SET)
+		return file->f_write_hint;
+
+	return inode_write_hint(file_inode(file));
+}
+
 /*
  * Inode state bits.  Protected by inode->i_lock
  *
@@ -2764,6 +2831,7 @@ extern struct inode *new_inode(struct super_block *sb);
 extern void free_inode_nonrcu(struct inode *inode);
 extern int should_remove_suid(struct dentry *);
 extern int file_remove_privs(struct file *);
+extern void inode_set_write_hint(struct inode *inode, enum rw_hint hint);
 
 extern void __insert_inode_hash(struct inode *, unsigned long hashval);
 static inline void insert_inode_hash(struct inode *inode)
@@ -3060,6 +3128,8 @@ static inline int iocb_flags(struct file *file)
 		res |= IOCB_DSYNC;
 	if (file->f_flags & __O_SYNC)
 		res |= IOCB_SYNC;
+
+	res |= write_hint_to_mask(file->f_write_hint, IOCB_WRITE_LIFE_SHIFT);
 	return res;
 }
 
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index 813afd6eee71..defe6e77fc99 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -43,6 +43,22 @@
 /* (1U << 31) is reserved for signed error codes */
 
 /*
+ * Set/Get write life time hints.
+ */
+#define F_GET_RW_HINT		(F_LINUX_SPECIFIC_BASE + 11)
+#define F_SET_RW_HINT		(F_LINUX_SPECIFIC_BASE + 12)
+
+/*
+ * Valid hint values for F_{GET,SET}_RW_HINT. 0 is "not set", or can be
+ * used to clear any hints previously set.
+ */
+#define RWH_WRITE_LIFE_NONE	1
+#define RWH_WRITE_LIFE_SHORT	2
+#define RWH_WRITE_LIFE_MEDIUM	3
+#define RWH_WRITE_LIFE_LONG	4
+#define RWH_WRITE_LIFE_EXTREME	5
+
+/*
  * Types of directory notifications that may be requested.
  */
 #define DN_ACCESS	0x00000001	/* File accessed */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/9] fs: add fcntl() interface for setting/getting write life time hints
  2017-06-20 23:09     ` Bart Van Assche
@ 2017-06-20 23:49       ` Jens Axboe
  -1 siblings, 0 replies; 42+ messages in thread
From: Jens Axboe @ 2017-06-20 23:49 UTC (permalink / raw)
  To: Bart Van Assche, linux-block, linux-fsdevel
  Cc: hch, adilger, linux-nvme, martin.petersen

On 06/20/2017 05:09 PM, Bart Van Assche wrote:
> On Mon, 2017-06-19 at 11:04 -0600, Jens Axboe wrote:
>> +static long fcntl_rw_hint(struct file *file, unsigned int cmd,
>> +			  u64 __user *ptr)
>> +{
>> +	struct inode *inode = file_inode(file);
>> +	long ret = 0;
>> +	u64 hint;
>> +
>> +	switch (cmd) {
>> +	case F_GET_RW_HINT:
>> +		hint = mask_to_write_hint(inode->i_flags, S_WRITE_LIFE_SHIFT);
>> +		if (put_user(hint, ptr))
>> +			ret = -EFAULT;
>> +		break;
>> +	case F_SET_RW_HINT:
>> +		if (get_user(hint, ptr)) {
>> +			ret = -EFAULT;
>> +			break;
>> +		}
>> +		switch (hint) {
>> +		case WRITE_LIFE_NONE:
>> +		case WRITE_LIFE_SHORT:
>> +		case WRITE_LIFE_MEDIUM:
>> +		case WRITE_LIFE_LONG:
>> +		case WRITE_LIFE_EXTREME:
>> +			inode_set_write_hint(inode, hint);
>> +			ret = 0;
>> +			break;
>> +		default:
>> +			ret = -EINVAL;
>> +		}
>> +		break;
>> +	default:
>> +		ret = -EINVAL;
>> +		break;
>> +	}
>> +
>> +	return ret;
>> +}
> 
> Hello Jens,
> 
> Do we need an (inline) helper function for checking the validity of a
> numerical WRITE_LIFE value next to the definition of the WRITE_LIFE_*
> constants, e.g. WRITE_LIFE_NONE <= hint && hint <= WRITE_LIFE_EXTREME?

Might not hurt in general, I can fold something like that in.
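
A helper along the lines Bart suggests could look roughly like the
sketch below. The name rw_hint_valid() is hypothetical, and the enum
values are those of the quoted revision, where WRITE_LIFE_NONE is 0;
a later revision shifts them up by one to make room for a "not set"
value, in which case the lower bound starts to matter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as in the quoted revision of the patch. */
enum rw_hint {
	WRITE_LIFE_NONE    = 0,
	WRITE_LIFE_SHORT   = 1,
	WRITE_LIFE_MEDIUM  = 2,
	WRITE_LIFE_LONG    = 3,
	WRITE_LIFE_EXTREME = 4,
};

/*
 * Hypothetical helper: true if a value read from userspace falls in
 * the contiguous range of defined hints. This relies on the enum
 * values being contiguous, which holds for the definitions above.
 */
static inline bool rw_hint_valid(uint64_t hint)
{
	return hint >= (uint64_t)WRITE_LIFE_NONE &&
	       hint <= (uint64_t)WRITE_LIFE_EXTREME;
}
```

The inner switch in fcntl_rw_hint() could then collapse to a single
if (!rw_hint_valid(hint)) check that returns -EINVAL.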

>> +/*
>> + * Steal 3 bits for stream information, this allows 8 valid streams
>> + */
>> +#define IOCB_WRITE_LIFE_SHIFT	7
>> +#define IOCB_WRITE_LIFE_MASK	(BIT(7) | BIT(8) | BIT(9))
> 
> A minor comment: how about making this easier to read by defining
> IOCB_WRITE_LIFE_MASK as (7 << IOCB_WRITE_LIFE_SHIFT)?

Agree, that would be prettier.

>>  /*
>> + * Expected life time hint of a write for this inode. This uses the
>> + * WRITE_LIFE_* encoding, we just need to define the shift. We need
>> + * 3 bits for this. Next S_* value is 131072, bit 17.
>> + */
>> +#define S_WRITE_LIFE_MASK	0x1c000	/* bits 14..16 */
>> +#define S_WRITE_LIFE_SHIFT	14	/* 16384, next bit */
> 
> Another minor comment: how about making this easier to read by defining
> S_WRITE_LIFE_MASK as (7 << S_WRITE_LIFE_SHIFT)?

Agree, I'll make that change too.
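
For reference, both suggested rewrites produce the same bit patterns as
the originals. A small compile-time check, assuming a local BIT()
stand-in for the kernel macro and hypothetical *_OLD/*_NEW names:

```c
#include <assert.h>

/* Local stand-in for the kernel's BIT() macro. */
#define BIT(n)			(1UL << (n))

#define IOCB_WRITE_LIFE_SHIFT	7
#define IOCB_MASK_OLD		(BIT(7) | BIT(8) | BIT(9))
#define IOCB_MASK_NEW		(7UL << IOCB_WRITE_LIFE_SHIFT)

#define S_WRITE_LIFE_SHIFT	14
#define S_MASK_OLD		0x1c000UL	/* bits 14..16 */
#define S_MASK_NEW		(7UL << S_WRITE_LIFE_SHIFT)

/* 7 is binary 111, so shifting it yields three adjacent set bits. */
_Static_assert(IOCB_MASK_OLD == IOCB_MASK_NEW, "kiocb masks differ");
_Static_assert(S_MASK_OLD == S_MASK_NEW, "inode masks differ");

/* Runtime form of the same check, for use from a test. */
static inline int masks_agree(void)
{
	return IOCB_MASK_OLD == IOCB_MASK_NEW && S_MASK_OLD == S_MASK_NEW;
}
```

Deriving the mask from the shift also means the two cannot drift apart
if the shift is changed later.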

>> /*
>> + * Write life time hint values.
>> + */
>> +enum rw_hint {
>> +	WRITE_LIFE_NONE = RWH_WRITE_LIFE_NONE,
>> +	WRITE_LIFE_SHORT = RWH_WRITE_LIFE_SHORT,
>> +	WRITE_LIFE_MEDIUM = RWH_WRITE_LIFE_MEDIUM,
>> +	WRITE_LIFE_LONG = RWH_WRITE_LIFE_LONG,
>> +	WRITE_LIFE_EXTREME = RWH_WRITE_LIFE_EXTREME
>> +};
>> [ ... ]
>> +/*
>> + * Valid hint values for F_{GET,SET}_RW_HINT
>> + */
>> +#define RWH_WRITE_LIFE_NONE	0
>> +#define RWH_WRITE_LIFE_SHORT	1
>> +#define RWH_WRITE_LIFE_MEDIUM	2
>> +#define RWH_WRITE_LIFE_LONG	3
>> +#define RWH_WRITE_LIFE_EXTREME	4
> 
> Maybe I missed something, but it's not clear to me why we have both an
> enum and defines with the same numerical values? BTW, I prefer an enum
> above #defines.

We use the enum internally; that's the hint that the fs and block layers
see. The defines are there for the user interface, where we don't want
that to be an enum. So the mapping between the two is the definition of
the enum rw_hint values.
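
A minimal sketch of that split, using the values from the quoted
revision: the defines are the user-visible ABI, and the enum is
initialized from them so the two cannot diverge. user_to_rw_hint() is a
hypothetical helper showing where a raw value from userspace would be
turned into the internal type.

```c
#include <assert.h>
#include <stdint.h>

/* User-visible ABI: plain integers, as in the quoted uapi header. */
#define RWH_WRITE_LIFE_NONE	0
#define RWH_WRITE_LIFE_SHORT	1
#define RWH_WRITE_LIFE_MEDIUM	2
#define RWH_WRITE_LIFE_LONG	3
#define RWH_WRITE_LIFE_EXTREME	4

/* Internal type: each value is defined in terms of the ABI constant. */
enum rw_hint {
	WRITE_LIFE_NONE    = RWH_WRITE_LIFE_NONE,
	WRITE_LIFE_SHORT   = RWH_WRITE_LIFE_SHORT,
	WRITE_LIFE_MEDIUM  = RWH_WRITE_LIFE_MEDIUM,
	WRITE_LIFE_LONG    = RWH_WRITE_LIFE_LONG,
	WRITE_LIFE_EXTREME = RWH_WRITE_LIFE_EXTREME,
};

/*
 * Hypothetical helper: convert a raw 64-bit value from userspace to
 * the internal enum. Returns 0 on success, -1 for an unknown value.
 */
static inline int user_to_rw_hint(uint64_t user, enum rw_hint *out)
{
	switch (user) {
	case RWH_WRITE_LIFE_NONE:
	case RWH_WRITE_LIFE_SHORT:
	case RWH_WRITE_LIFE_MEDIUM:
	case RWH_WRITE_LIFE_LONG:
	case RWH_WRITE_LIFE_EXTREME:
		*out = (enum rw_hint)user;
		return 0;
	default:
		return -1;
	}
}
```

Keeping the ABI as fixed-width integers avoids exposing the size of an
enum type to userspace, which the C standard leaves to the compiler.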

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/9] fs: add fcntl() interface for setting/getting write life time hints
  2017-06-19 17:04   ` Jens Axboe
  (?)
@ 2017-06-20 23:09     ` Bart Van Assche
  -1 siblings, 0 replies; 42+ messages in thread
From: Bart Van Assche @ 2017-06-20 23:09 UTC (permalink / raw)
  To: linux-block, axboe, linux-fsdevel
  Cc: hch, adilger, linux-nvme, martin.petersen

On Mon, 2017-06-19 at 11:04 -0600, Jens Axboe wrote:
> +static long fcntl_rw_hint(struct file *file, unsigned int cmd,
> +			  u64 __user *ptr)
> +{
> +	struct inode *inode = file_inode(file);
> +	long ret = 0;
> +	u64 hint;
> +
> +	switch (cmd) {
> +	case F_GET_RW_HINT:
> +		hint = mask_to_write_hint(inode->i_flags, S_WRITE_LIFE_SHIFT);
> +		if (put_user(hint, ptr))
> +			ret = -EFAULT;
> +		break;
> +	case F_SET_RW_HINT:
> +		if (get_user(hint, ptr)) {
> +			ret = -EFAULT;
> +			break;
> +		}
> +		switch (hint) {
> +		case WRITE_LIFE_NONE:
> +		case WRITE_LIFE_SHORT:
> +		case WRITE_LIFE_MEDIUM:
> +		case WRITE_LIFE_LONG:
> +		case WRITE_LIFE_EXTREME:
> +			inode_set_write_hint(inode, hint);
> +			ret = 0;
> +			break;
> +		default:
> +			ret = -EINVAL;
> +		}
> +		break;
> +	default:
> +		ret = -EINVAL;
> +		break;
> +	}
> +
> +	return ret;
> +}

Hello Jens,

Do we need an (inline) helper function for checking the validity of a
numerical WRITE_LIFE value next to the definition of the WRITE_LIFE_*
constants, e.g. WRITE_LIFE_NONE <= hint && hint <= WRITE_LIFE_EXTREME?

> +/*
> + * Steal 3 bits for stream information, this allows 8 valid streams
> + */
> +#define IOCB_WRITE_LIFE_SHIFT	7
> +#define IOCB_WRITE_LIFE_MASK	(BIT(7) | BIT(8) | BIT(9))

A minor comment: how about making this easier to read by defining
IOCB_WRITE_LIFE_MASK as (7 << IOCB_WRITE_LIFE_SHIFT)?

>  /*
> + * Expected life time hint of a write for this inode. This uses the
> + * WRITE_LIFE_* encoding, we just need to define the shift. We need
> + * 3 bits for this. Next S_* value is 131072, bit 17.
> + */
> +#define S_WRITE_LIFE_MASK	0x1c000	/* bits 14..16 */
> +#define S_WRITE_LIFE_SHIFT	14	/* 16384, next bit */

Another minor comment: how about making this easier to read by defining
S_WRITE_LIFE_MASK as (7 << S_WRITE_LIFE_SHIFT)?

> /*
> + * Write life time hint values.
> + */
> +enum rw_hint {
> +	WRITE_LIFE_NONE = RWH_WRITE_LIFE_NONE,
> +	WRITE_LIFE_SHORT = RWH_WRITE_LIFE_SHORT,
> +	WRITE_LIFE_MEDIUM = RWH_WRITE_LIFE_MEDIUM,
> +	WRITE_LIFE_LONG = RWH_WRITE_LIFE_LONG,
> +	WRITE_LIFE_EXTREME = RWH_WRITE_LIFE_EXTREME
> +};
> [ ... ]
> +/*
> + * Valid hint values for F_{GET,SET}_RW_HINT
> + */
> +#define RWH_WRITE_LIFE_NONE	0
> +#define RWH_WRITE_LIFE_SHORT	1
> +#define RWH_WRITE_LIFE_MEDIUM	2
> +#define RWH_WRITE_LIFE_LONG	3
> +#define RWH_WRITE_LIFE_EXTREME	4

Maybe I missed something, but it's not clear to me why we have both an enum and
defines with the same numerical values? BTW, I prefer an enum above #defines.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 1/9] fs: add fcntl() interface for setting/getting write life time hints
  2017-06-19 17:04 [PATCHSET v8] Add support for " Jens Axboe
@ 2017-06-19 17:04   ` Jens Axboe
  0 siblings, 0 replies; 42+ messages in thread
From: Jens Axboe @ 2017-06-19 17:04 UTC (permalink / raw)
  To: linux-fsdevel, linux-block
  Cc: adilger, hch, martin.petersen, linux-nvme, Jens Axboe

Define a set of write life time hints:

RWH_WRITE_LIFE_NONE	0
RWH_WRITE_LIFE_SHORT	1
RWH_WRITE_LIFE_MEDIUM	2
RWH_WRITE_LIFE_LONG	3
RWH_WRITE_LIFE_EXTREME	4

and add an fcntl interface for querying and setting them:

F_GET_RW_HINT		Returns the read/write hint set.

F_SET_RW_HINT		Pass one of the above write hints.

The user passes in a pointer to a 64-bit value to get/set these hints,
and the interface returns 0/-1 on success/error.

Sample program testing/implementing basic setting/getting of write
hints is below.

Add support for storing the write life time hint in the inode flags,
and pass them to the kiocb flags as well. This is in preparation
for utilizing these hints in the block layer, to guide on-media
data placement.
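
The inode-to-kiocb hand-off described above amounts to moving a 3-bit field from one flags word to another. A minimal userspace sketch, assuming the shift and mask values from this patch; `set_inode_hint()` and `inode_to_iocb_bits()` are hypothetical names that only mirror what the patch does in inode_set_write_hint() and iocb_flags():

```c
#include <assert.h>

/* Values copied from this patch */
#define S_WRITE_LIFE_SHIFT	14
#define S_WRITE_LIFE_MASK	0x1c000u	/* bits 14..16 */
#define IOCB_WRITE_LIFE_SHIFT	7

static_assert((S_WRITE_LIFE_MASK >> S_WRITE_LIFE_SHIFT) == 0x7,
	      "the inode field is three bits wide");

/* Same arithmetic as the helpers added to include/linux/fs.h */
static inline unsigned int write_hint_to_mask(unsigned int hint,
					      unsigned int shift)
{
	return hint << shift;
}

static inline unsigned int mask_to_write_hint(unsigned int mask,
					      unsigned int shift)
{
	return (mask >> shift) & 0x7;
}

/* Hypothetical: replace the hint bits in an inode flags word */
static inline unsigned int set_inode_hint(unsigned int i_flags,
					  unsigned int hint)
{
	return (i_flags & ~S_WRITE_LIFE_MASK) |
		write_hint_to_mask(hint, S_WRITE_LIFE_SHIFT);
}

/* Hypothetical: hint bits as they would appear in kiocb flags */
static inline unsigned int inode_to_iocb_bits(unsigned int i_flags)
{
	unsigned int hint = mask_to_write_hint(i_flags, S_WRITE_LIFE_SHIFT);

	return write_hint_to_mask(hint, IOCB_WRITE_LIFE_SHIFT);
}
```

For example, a hint of 3 (WRITE_LIFE_LONG) occupies 0xc000 in the inode flags and 0x180 in the kiocb flags; unrelated flag bits are left untouched.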

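/*
 * writehint.c: check or set a file/inode write hint
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Fallback definitions matching this patch, for headers that lack them.
 * The base value is assumed from asm-generic/fcntl.h.
 */
#ifndef F_LINUX_SPECIFIC_BASE
#define F_LINUX_SPECIFIC_BASE	1024
#endif
#ifndef F_GET_RW_HINT
#define F_GET_RW_HINT		(F_LINUX_SPECIFIC_BASE + 11)
#define F_SET_RW_HINT		(F_LINUX_SPECIFIC_BASE + 12)
#endif

static char *str[] = { "WRITE_LIFE_NONE", "WRITE_LIFE_SHORT",
			"WRITE_LIFE_MEDIUM", "WRITE_LIFE_LONG",
			"WRITE_LIFE_EXTREME" };

int main(int argc, char *argv[])
{
	uint64_t hint = 0;
	int set = argc > 2;	/* a second argument means "set" */
	int fd, ret;

	if (argc < 2) {
		fprintf(stderr, "%s: dev <hint>\n", argv[0]);
		return 1;
	}

	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 2;
	}

	if (set)
		hint = strtoull(argv[2], NULL, 0);

	if (!set) {
		ret = fcntl(fd, F_GET_RW_HINT, &hint);
		if (ret < 0) {
			perror("fcntl: F_GET_RW_HINT");
			return 3;
		}
	} else {
		ret = fcntl(fd, F_SET_RW_HINT, &hint);
		if (ret < 0) {
			perror("fcntl: F_SET_RW_HINT");
			return 4;
		}
	}

	/* Bounds-check the value before indexing str[] */
	printf("%s: %shint %s\n", argv[1], set ? "set " : "",
	       hint < sizeof(str) / sizeof(str[0]) ? str[hint] : "unknown");
	close(fd);
	return 0;
}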

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/fcntl.c                 | 43 ++++++++++++++++++++++++++++++++
 fs/inode.c                 | 11 +++++++++
 include/linux/fs.h         | 61 ++++++++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/fcntl.h | 15 ++++++++++++
 4 files changed, 130 insertions(+)

diff --git a/fs/fcntl.c b/fs/fcntl.c
index f4e7267d117f..113b78c11631 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -243,6 +243,45 @@ static int f_getowner_uids(struct file *filp, unsigned long arg)
 }
 #endif
 
+static long fcntl_rw_hint(struct file *file, unsigned int cmd,
+			  u64 __user *ptr)
+{
+	struct inode *inode = file_inode(file);
+	long ret = 0;
+	u64 hint;
+
+	switch (cmd) {
+	case F_GET_RW_HINT:
+		hint = mask_to_write_hint(inode->i_flags, S_WRITE_LIFE_SHIFT);
+		if (put_user(hint, ptr))
+			ret = -EFAULT;
+		break;
+	case F_SET_RW_HINT:
+		if (get_user(hint, ptr)) {
+			ret = -EFAULT;
+			break;
+		}
+		switch (hint) {
+		case WRITE_LIFE_NONE:
+		case WRITE_LIFE_SHORT:
+		case WRITE_LIFE_MEDIUM:
+		case WRITE_LIFE_LONG:
+		case WRITE_LIFE_EXTREME:
+			inode_set_write_hint(inode, hint);
+			ret = 0;
+			break;
+		default:
+			ret = -EINVAL;
+		}
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+
+	return ret;
+}
+
 static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
 		struct file *filp)
 {
@@ -337,6 +376,10 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
 	case F_GET_SEALS:
 		err = shmem_fcntl(filp, cmd, arg);
 		break;
+	case F_GET_RW_HINT:
+	case F_SET_RW_HINT:
+		err = fcntl_rw_hint(filp, cmd, (u64 __user *) arg);
+		break;
 	default:
 		break;
 	}
diff --git a/fs/inode.c b/fs/inode.c
index db5914783a71..defb015a2c6d 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2120,3 +2120,14 @@ struct timespec current_time(struct inode *inode)
 	return timespec_trunc(now, inode->i_sb->s_time_gran);
 }
 EXPORT_SYMBOL(current_time);
+
+void inode_set_write_hint(struct inode *inode, enum rw_hint hint)
+{
+	unsigned int flags = write_hint_to_mask(hint, S_WRITE_LIFE_SHIFT);
+
+	if (hint != mask_to_write_hint(inode->i_flags, S_WRITE_LIFE_SHIFT)) {
+		inode_lock(inode);
+		inode_set_flags(inode, flags, S_WRITE_LIFE_MASK);
+		inode_unlock(inode);
+	}
+}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 023f0324762b..8720251cc153 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -270,6 +270,12 @@ struct writeback_control;
 #define IOCB_SYNC		(1 << 5)
 #define IOCB_WRITE		(1 << 6)
 
+/*
+ * Steal 3 bits for stream information, this allows 8 valid streams
+ */
+#define IOCB_WRITE_LIFE_SHIFT	7
+#define IOCB_WRITE_LIFE_MASK	(BIT(7) | BIT(8) | BIT(9))
+
 struct kiocb {
 	struct file		*ki_filp;
 	loff_t			ki_pos;
@@ -293,6 +299,12 @@ static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
 	};
 }
 
+static inline int iocb_write_hint(const struct kiocb *iocb)
+{
+	return (iocb->ki_flags & IOCB_WRITE_LIFE_MASK) >>
+			IOCB_WRITE_LIFE_SHIFT;
+}
+
 /*
  * "descriptor" for what we're up to with a read.
  * This allows us to use the same read code yet
@@ -1829,6 +1841,14 @@ struct super_operations {
 #endif
 
 /*
+ * Expected life time hint of a write for this inode. This uses the
+ * WRITE_LIFE_* encoding, we just need to define the shift. We need
+ * 3 bits for this. Next S_* value is 131072, bit 17.
+ */
+#define S_WRITE_LIFE_MASK	0x1c000	/* bits 14..16 */
+#define S_WRITE_LIFE_SHIFT	14	/* 16384, next bit */
+
+/*
  * Note that nosuid etc flags are inode-specific: setting some file-system
  * flags just means all the inodes inherit those flags by default. It might be
  * possible to override it selectively if you really wanted to with some
@@ -1875,6 +1895,37 @@ static inline bool HAS_UNMAPPED_ID(struct inode *inode)
 }
 
 /*
+ * Write life time hint values.
+ */
+enum rw_hint {
+	WRITE_LIFE_NONE = RWH_WRITE_LIFE_NONE,
+	WRITE_LIFE_SHORT = RWH_WRITE_LIFE_SHORT,
+	WRITE_LIFE_MEDIUM = RWH_WRITE_LIFE_MEDIUM,
+	WRITE_LIFE_LONG = RWH_WRITE_LIFE_LONG,
+	WRITE_LIFE_EXTREME = RWH_WRITE_LIFE_EXTREME
+};
+
+static inline unsigned int write_hint_to_mask(enum rw_hint hint,
+					      unsigned int shift)
+{
+	return hint << shift;
+}
+
+static inline enum rw_hint mask_to_write_hint(unsigned int mask,
+					      unsigned int shift)
+{
+	return (mask >> shift) & 0x7;
+}
+
+static inline unsigned int inode_write_hint(struct inode *inode)
+{
+	if (inode)
+		return mask_to_write_hint(inode->i_flags, S_WRITE_LIFE_SHIFT);
+
+	return 0;
+}
+
+/*
  * Inode state bits.  Protected by inode->i_lock
  *
  * Three bits determine the dirty state of the inode, I_DIRTY_SYNC,
@@ -2758,6 +2809,7 @@ extern struct inode *new_inode(struct super_block *sb);
 extern void free_inode_nonrcu(struct inode *inode);
 extern int should_remove_suid(struct dentry *);
 extern int file_remove_privs(struct file *);
+extern void inode_set_write_hint(struct inode *inode, enum rw_hint hint);
 
 extern void __insert_inode_hash(struct inode *, unsigned long hashval);
 static inline void insert_inode_hash(struct inode *inode)
@@ -3045,7 +3097,9 @@ static inline bool io_is_direct(struct file *filp)
 
 static inline int iocb_flags(struct file *file)
 {
+	struct inode *inode = file_inode(file);
 	int res = 0;
+
 	if (file->f_flags & O_APPEND)
 		res |= IOCB_APPEND;
 	if (io_is_direct(file))
@@ -3054,6 +3108,13 @@ static inline int iocb_flags(struct file *file)
 		res |= IOCB_DSYNC;
 	if (file->f_flags & __O_SYNC)
 		res |= IOCB_SYNC;
+	if (mask_to_write_hint(inode->i_flags, S_WRITE_LIFE_SHIFT)) {
+		enum rw_hint hint;
+
+		hint = mask_to_write_hint(inode->i_flags, S_WRITE_LIFE_SHIFT);
+		res |= write_hint_to_mask(hint, IOCB_WRITE_LIFE_SHIFT);
+	}
+
 	return res;
 }
 
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index 813afd6eee71..def8f70e8bae 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -43,6 +43,21 @@
 /* (1U << 31) is reserved for signed error codes */
 
 /*
+ * Set/Get write life time hints.
+ */
+#define F_GET_RW_HINT		(F_LINUX_SPECIFIC_BASE + 11)
+#define F_SET_RW_HINT		(F_LINUX_SPECIFIC_BASE + 12)
+
+/*
+ * Valid hint values for F_{GET,SET}_RW_HINT
+ */
+#define RWH_WRITE_LIFE_NONE	0
+#define RWH_WRITE_LIFE_SHORT	1
+#define RWH_WRITE_LIFE_MEDIUM	2
+#define RWH_WRITE_LIFE_LONG	3
+#define RWH_WRITE_LIFE_EXTREME	4
+
+/*
  * Types of directory notifications that may be requested.
  */
 #define DN_ACCESS	0x00000001	/* File accessed */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2017-06-27 15:20 UTC | newest]

Thread overview: 42+ messages
2017-06-26 15:37 [PATCHSET v10] Add support for write life time hints Jens Axboe
2017-06-26 15:37 ` [PATCH 1/9] fs: add fcntl() interface for setting/getting " Jens Axboe
2017-06-27 14:42   ` Christoph Hellwig
2017-06-27 14:52     ` Christoph Hellwig
2017-06-27 14:55     ` Jens Axboe
2017-06-27 14:57       ` Christoph Hellwig
2017-06-27 14:58         ` Jens Axboe
2017-06-27 15:09     ` Jens Axboe
2017-06-27 15:16       ` Christoph Hellwig
2017-06-27 15:18         ` Jens Axboe
2017-06-26 15:37 ` [PATCH 2/9] block: add support for write hints in a bio Jens Axboe
2017-06-26 15:37 ` [PATCH 3/9] blk-mq: expose write hints through debugfs Jens Axboe
2017-06-27 15:17   ` Christoph Hellwig
2017-06-27 15:20     ` Jens Axboe
2017-06-26 15:37 ` [PATCH 4/9] fs: add O_DIRECT support for sending down write life time hints Jens Axboe
2017-06-27 14:53   ` Christoph Hellwig
2017-06-26 15:37 ` [PATCH 5/9] fs: add support for buffered writeback to pass down write hints Jens Axboe
2017-06-26 15:37 ` [PATCH 6/9] ext4: add support for passing in write hints for buffered writes Jens Axboe
2017-06-26 15:37 ` [PATCH 7/9] xfs: " Jens Axboe
2017-06-26 15:37 ` [PATCH 8/9] btrfs: " Jens Axboe
2017-06-26 15:38 ` [PATCH 9/9] nvme: add support for streams and directives Jens Axboe
  -- strict thread matches above, loose matches on Subject: below --
2017-06-21  0:21 [PATCHSET v9] Add support for write life time hints Jens Axboe
2017-06-21  0:21 ` [PATCH 1/9] fs: add fcntl() interface for setting/getting " Jens Axboe
2017-06-26  9:51   ` Christoph Hellwig
2017-06-26 13:55     ` Jens Axboe
2017-06-26 16:09       ` Darrick J. Wong
2017-06-26 16:29         ` Jens Axboe
2017-06-19 17:04 [PATCHSET v8] Add support for " Jens Axboe
2017-06-19 17:04 ` [PATCH 1/9] fs: add fcntl() interface for setting/getting " Jens Axboe
2017-06-20 23:09   ` Bart Van Assche
2017-06-20 23:49     ` Jens Axboe
