* [PATCH 00/13] Pass data temperature information to zoned UFS devices
@ 2023-09-20 19:14 Bart Van Assche
  2023-09-20 19:14 ` [PATCH 01/13] fs/f2fs: Restore the whint_mode mount option Bart Van Assche
                   ` (19 more replies)
  0 siblings, 20 replies; 61+ messages in thread
From: Bart Van Assche @ 2023-09-20 19:14 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, Bart Van Assche

Hi Jens,

Zoned UFS vendors need data temperature information. This patch series
therefore restores write hint support in F2FS and in the block layer.
The SCSI disk (sd) driver is modified such that it passes write hint
information to SCSI devices via the GROUP NUMBER field.

Please consider this patch series for the next merge window.

Thanks,

Bart.

Bart Van Assche (13):
  fs/f2fs: Restore the whint_mode mount option
  fs: Restore support for F_GET_FILE_RW_HINT and F_SET_FILE_RW_HINT
  fs: Restore kiocb.ki_hint
  block: Restore write hint support
  scsi: core: Query the Block Limits Extension VPD page
  scsi_proto: Add struct io_group_descriptor
  sd: Translate data lifetime information
  scsi_debug: Reduce code duplication
  scsi_debug: Support the block limits extension VPD page
  scsi_debug: Rework page code error handling
  scsi_debug: Rework subpage code error handling
  scsi_debug: Implement the IO Advice Hints Grouping mode page
  scsi_debug: Maintain write statistics per group number

 Documentation/filesystems/f2fs.rst |  70 ++++++++++
 block/bio.c                        |   2 +
 block/blk-crypto-fallback.c        |   1 +
 block/blk-merge.c                  |  14 ++
 block/blk-mq.c                     |   2 +
 block/bounce.c                     |   1 +
 block/fops.c                       |   3 +
 drivers/scsi/scsi.c                |   2 +
 drivers/scsi/scsi_debug.c          | 202 +++++++++++++++++++----------
 drivers/scsi/scsi_sysfs.c          |  10 ++
 drivers/scsi/sd.c                  |  78 ++++++++++-
 drivers/scsi/sd.h                  |   2 +
 fs/aio.c                           |   1 +
 fs/buffer.c                        |  13 +-
 fs/cachefiles/io.c                 |   2 +
 fs/direct-io.c                     |   1 +
 fs/f2fs/data.c                     |   2 +
 fs/f2fs/f2fs.h                     |   9 ++
 fs/f2fs/file.c                     |   6 +
 fs/f2fs/segment.c                  |  95 ++++++++++++++
 fs/f2fs/super.c                    |  32 ++++-
 fs/fcntl.c                         |  18 +++
 fs/iomap/buffered-io.c             |   2 +
 fs/iomap/direct-io.c               |   1 +
 fs/mpage.c                         |   1 +
 fs/open.c                          |   1 +
 include/linux/blk-mq.h             |   1 +
 include/linux/blk_types.h          |   1 +
 include/linux/fs.h                 |  21 +++
 include/scsi/scsi_device.h         |   1 +
 include/scsi/scsi_proto.h          |  40 ++++++
 include/trace/events/f2fs.h        |   5 +-
 io_uring/rw.c                      |   1 +
 33 files changed, 566 insertions(+), 75 deletions(-)



* [PATCH 01/13] fs/f2fs: Restore the whint_mode mount option
  2023-09-20 19:14 [PATCH 00/13] Pass data temperature information to zoned UFS devices Bart Van Assche
@ 2023-09-20 19:14 ` Bart Van Assche
  2023-10-02 10:32   ` Avri Altman
  2023-10-03 19:33   ` Bean Huo
  2023-09-20 19:14 ` [PATCH 02/13] fs: Restore support for F_GET_FILE_RW_HINT and F_SET_FILE_RW_HINT Bart Van Assche
                   ` (18 subsequent siblings)
  19 siblings, 2 replies; 61+ messages in thread
From: Bart Van Assche @ 2023-09-20 19:14 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, Bart Van Assche, Jaegeuk Kim, Chao Yu,
	Jonathan Corbet

Restore support for the whint_mode mount option by reverting commit
930e2607638d ("f2fs: remove obsolete whint_mode").

Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 Documentation/filesystems/f2fs.rst | 70 ++++++++++++++++++++++
 fs/f2fs/f2fs.h                     |  9 +++
 fs/f2fs/segment.c                  | 95 ++++++++++++++++++++++++++++++
 fs/f2fs/super.c                    | 32 +++++++++-
 4 files changed, 205 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst
index d32c6209685d..de412ddebcc8 100644
--- a/Documentation/filesystems/f2fs.rst
+++ b/Documentation/filesystems/f2fs.rst
@@ -242,6 +242,12 @@ offgrpjquota		 Turn off group journalled quota.
 offprjjquota		 Turn off project journalled quota.
 quota			 Enable plain user disk quota accounting.
 noquota			 Disable all plain disk quota option.
+whint_mode=%s		 Control which write hints are passed down to block
+			 layer. This supports "off", "user-based", and
+			 "fs-based".  In "off" mode (default), f2fs does not pass
+			 down hints. In "user-based" mode, f2fs tries to pass
+			 down hints given by users. And in "fs-based" mode, f2fs
+			 passes down hints with its policy.
 alloc_mode=%s		 Adjust block allocation policy, which supports "reuse"
 			 and "default".
 fsync_mode=%s		 Control the policy of fsync. Currently supports "posix",
@@ -776,6 +782,70 @@ In order to identify whether the data in the victim segment are valid or not,
 F2FS manages a bitmap. Each bit represents the validity of a block, and the
 bitmap is composed of a bit stream covering whole blocks in main area.
 
+Write-hint Policy
+-----------------
+
+1) whint_mode=off. F2FS only passes down WRITE_LIFE_NOT_SET.
+
+2) whint_mode=user-based. F2FS tries to pass down hints given by
+users.
+
+===================== ======================== ===================
+User                  F2FS                     Block
+===================== ======================== ===================
+N/A                   META                     WRITE_LIFE_NOT_SET
+N/A                   HOT_NODE                 "
+N/A                   WARM_NODE                "
+N/A                   COLD_NODE                "
+ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
+extension list        "                        "
+
+-- buffered io
+WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
+WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
+WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
+WRITE_LIFE_NONE       "                        "
+WRITE_LIFE_MEDIUM     "                        "
+WRITE_LIFE_LONG       "                        "
+
+-- direct io
+WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
+WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
+WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
+WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
+WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
+WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG
+===================== ======================== ===================
+
+3) whint_mode=fs-based. F2FS passes down hints with its policy.
+
+===================== ======================== ===================
+User                  F2FS                     Block
+===================== ======================== ===================
+N/A                   META                     WRITE_LIFE_MEDIUM
+N/A                   HOT_NODE                 WRITE_LIFE_NOT_SET
+N/A                   WARM_NODE                "
+N/A                   COLD_NODE                WRITE_LIFE_NONE
+ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
+extension list        "                        "
+
+-- buffered io
+WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
+WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
+WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_LONG
+WRITE_LIFE_NONE       "                        "
+WRITE_LIFE_MEDIUM     "                        "
+WRITE_LIFE_LONG       "                        "
+
+-- direct io
+WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
+WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
+WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
+WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
+WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
+WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG
+===================== ======================== ===================
+
 Fallocate(2) Policy
 -------------------
 
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 6d688e42d89c..39ffad5c4087 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -157,6 +157,7 @@ struct f2fs_mount_info {
 	int s_jquota_fmt;			/* Format of quota to use */
 #endif
 	/* For which write hints are passed down to block layer */
+	int whint_mode;
 	int alloc_mode;			/* segment allocation policy */
 	int fsync_mode;			/* fsync policy */
 	int fs_mode;			/* fs mode: LFS or ADAPTIVE */
@@ -1343,6 +1344,12 @@ enum {
 	FS_MODE_FRAGMENT_BLK,		/* block fragmentation mode */
 };
 
+enum {
+	WHINT_MODE_OFF,		/* not pass down write hints */
+	WHINT_MODE_USER,	/* try to pass down hints given by users */
+	WHINT_MODE_FS,		/* pass down hints with F2FS policy */
+};
+
 enum {
 	ALLOC_MODE_DEFAULT,	/* stay default */
 	ALLOC_MODE_REUSE,	/* reuse segments as much as possible */
@@ -3727,6 +3734,8 @@ void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi);
 int __init f2fs_create_segment_manager_caches(void);
 void f2fs_destroy_segment_manager_caches(void);
 int f2fs_rw_hint_to_seg_type(enum rw_hint hint);
+enum rw_hint f2fs_io_type_to_rw_hint(struct f2fs_sb_info *sbi,
+			enum page_type type, enum temp_type temp);
 unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi,
 			unsigned int segno);
 unsigned int f2fs_usable_blks_in_seg(struct f2fs_sb_info *sbi,
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index d05b41608fc0..38c0cb8d9571 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -3290,6 +3290,101 @@ int f2fs_rw_hint_to_seg_type(enum rw_hint hint)
 	}
 }
 
+/* This returns write hints for each segment type. These hints will be
+ * passed down to block layer. There are mapping tables which depend on
+ * the mount option 'whint_mode'.
+ *
+ * 1) whint_mode=off. F2FS only passes down WRITE_LIFE_NOT_SET.
+ *
+ * 2) whint_mode=user-based. F2FS tries to pass down hints given by users.
+ *
+ * User                  F2FS                     Block
+ * ----                  ----                     -----
+ *                       META                     WRITE_LIFE_NOT_SET
+ *                       HOT_NODE                 "
+ *                       WARM_NODE                "
+ *                       COLD_NODE                "
+ * ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
+ * extension list        "                        "
+ *
+ * -- buffered io
+ * WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
+ * WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
+ * WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
+ * WRITE_LIFE_NONE       "                        "
+ * WRITE_LIFE_MEDIUM     "                        "
+ * WRITE_LIFE_LONG       "                        "
+ *
+ * -- direct io
+ * WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
+ * WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
+ * WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
+ * WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
+ * WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
+ * WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG
+ *
+ * 3) whint_mode=fs-based. F2FS passes down hints with its policy.
+ *
+ * User                  F2FS                     Block
+ * ----                  ----                     -----
+ *                       META                     WRITE_LIFE_MEDIUM
+ *                       HOT_NODE                 WRITE_LIFE_NOT_SET
+ *                       WARM_NODE                "
+ *                       COLD_NODE                WRITE_LIFE_NONE
+ * ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
+ * extension list        "                        "
+ *
+ * -- buffered io
+ * WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
+ * WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
+ * WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_LONG
+ * WRITE_LIFE_NONE       "                        "
+ * WRITE_LIFE_MEDIUM     "                        "
+ * WRITE_LIFE_LONG       "                        "
+ *
+ * -- direct io
+ * WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
+ * WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
+ * WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
+ * WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
+ * WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
+ * WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG
+ */
+
+enum rw_hint f2fs_io_type_to_rw_hint(struct f2fs_sb_info *sbi,
+				enum page_type type, enum temp_type temp)
+{
+	if (F2FS_OPTION(sbi).whint_mode == WHINT_MODE_USER) {
+		if (type == DATA) {
+			if (temp == WARM)
+				return WRITE_LIFE_NOT_SET;
+			else if (temp == HOT)
+				return WRITE_LIFE_SHORT;
+			else if (temp == COLD)
+				return WRITE_LIFE_EXTREME;
+		} else {
+			return WRITE_LIFE_NOT_SET;
+		}
+	} else if (F2FS_OPTION(sbi).whint_mode == WHINT_MODE_FS) {
+		if (type == DATA) {
+			if (temp == WARM)
+				return WRITE_LIFE_LONG;
+			else if (temp == HOT)
+				return WRITE_LIFE_SHORT;
+			else if (temp == COLD)
+				return WRITE_LIFE_EXTREME;
+		} else if (type == NODE) {
+			if (temp == WARM || temp == HOT)
+				return WRITE_LIFE_NOT_SET;
+			else if (temp == COLD)
+				return WRITE_LIFE_NONE;
+		} else if (type == META) {
+			return WRITE_LIFE_MEDIUM;
+		}
+	}
+	return WRITE_LIFE_NOT_SET;
+}
+
 static int __get_segment_type_2(struct f2fs_io_info *fio)
 {
 	if (fio->type == DATA)
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index a8c8232852bb..5bb062075acf 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -141,6 +141,7 @@ enum {
 	Opt_jqfmt_vfsold,
 	Opt_jqfmt_vfsv0,
 	Opt_jqfmt_vfsv1,
+	Opt_whint,
 	Opt_alloc,
 	Opt_fsync,
 	Opt_test_dummy_encryption,
@@ -220,6 +221,7 @@ static match_table_t f2fs_tokens = {
 	{Opt_jqfmt_vfsold, "jqfmt=vfsold"},
 	{Opt_jqfmt_vfsv0, "jqfmt=vfsv0"},
 	{Opt_jqfmt_vfsv1, "jqfmt=vfsv1"},
+	{Opt_whint, "whint_mode=%s"},
 	{Opt_alloc, "alloc_mode=%s"},
 	{Opt_fsync, "fsync_mode=%s"},
 	{Opt_test_dummy_encryption, "test_dummy_encryption=%s"},
@@ -988,6 +990,22 @@ static int parse_options(struct super_block *sb, char *options, bool is_remount)
 			f2fs_info(sbi, "quota operations not supported");
 			break;
 #endif
+		case Opt_whint:
+			name = match_strdup(&args[0]);
+			if (!name)
+				return -ENOMEM;
+			if (!strcmp(name, "user-based")) {
+				F2FS_OPTION(sbi).whint_mode = WHINT_MODE_USER;
+			} else if (!strcmp(name, "off")) {
+				F2FS_OPTION(sbi).whint_mode = WHINT_MODE_OFF;
+			} else if (!strcmp(name, "fs-based")) {
+				F2FS_OPTION(sbi).whint_mode = WHINT_MODE_FS;
+			} else {
+				kfree(name);
+				return -EINVAL;
+			}
+			kfree(name);
+			break;
 		case Opt_alloc:
 			name = match_strdup(&args[0]);
 			if (!name)
@@ -1389,6 +1407,12 @@ static int parse_options(struct super_block *sb, char *options, bool is_remount)
 		return -EINVAL;
 	}
 
+	/* Do not pass down write hints if the number of active logs is less
+	 * than NR_CURSEG_PERSIST_TYPE.
+	 */
+	if (F2FS_OPTION(sbi).active_logs != NR_CURSEG_PERSIST_TYPE)
+		F2FS_OPTION(sbi).whint_mode = WHINT_MODE_OFF;
+
 	if (f2fs_sb_has_readonly(sbi) && !f2fs_readonly(sbi->sb)) {
 		f2fs_err(sbi, "Allow to mount readonly mode only");
 		return -EROFS;
@@ -2060,6 +2084,10 @@ static int f2fs_show_options(struct seq_file *seq, struct dentry *root)
 		seq_puts(seq, ",prjquota");
 #endif
 	f2fs_show_quota_options(seq, sbi->sb);
+	if (F2FS_OPTION(sbi).whint_mode == WHINT_MODE_USER)
+		seq_printf(seq, ",whint_mode=%s", "user-based");
+	else if (F2FS_OPTION(sbi).whint_mode == WHINT_MODE_FS)
+		seq_printf(seq, ",whint_mode=%s", "fs-based");
 
 	fscrypt_show_test_dummy_encryption(seq, ',', sbi->sb);
 
@@ -2129,6 +2157,7 @@ static void default_options(struct f2fs_sb_info *sbi, bool remount)
 		F2FS_OPTION(sbi).active_logs = NR_CURSEG_PERSIST_TYPE;
 
 	F2FS_OPTION(sbi).inline_xattr_size = DEFAULT_INLINE_XATTR_ADDRS;
+	F2FS_OPTION(sbi).whint_mode = WHINT_MODE_OFF;
 	if (le32_to_cpu(F2FS_RAW_SUPER(sbi)->segment_count_main) <=
 							SMALL_VOLUME_SEGMENTS)
 		F2FS_OPTION(sbi).alloc_mode = ALLOC_MODE_REUSE;
@@ -2443,7 +2472,8 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data)
 		need_stop_gc = true;
 	}
 
-	if (*flags & SB_RDONLY) {
+	if (*flags & SB_RDONLY ||
+	    F2FS_OPTION(sbi).whint_mode != org_mount_opt.whint_mode) {
 		sync_inodes_sb(sb);
 
 		set_sbi_flag(sbi, SBI_IS_DIRTY);
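
The segment-type mapping restored by this patch can be explored outside
the kernel with a small userspace model. The sketch below is not kernel
code: the enums are local stand-ins, the model_* names are hypothetical,
and it only mirrors the decision tree of f2fs_io_type_to_rw_hint() as
shown in the diff above. The direct I/O rows of the documentation tables
are not covered, since those hints do not come from this function.

```c
#include <assert.h>

/* Local stand-ins for the kernel enums so the sketch builds in userspace. */
enum model_whint_mode { WHINT_MODE_OFF, WHINT_MODE_USER, WHINT_MODE_FS };
enum model_page_type { DATA, NODE, META };
enum model_temp_type { HOT, WARM, COLD };
enum model_rw_hint {
	WRITE_LIFE_NOT_SET,
	WRITE_LIFE_NONE,
	WRITE_LIFE_SHORT,
	WRITE_LIFE_MEDIUM,
	WRITE_LIFE_LONG,
	WRITE_LIFE_EXTREME,
};

/* Data blocks: HOT and COLD map the same way in both modes; only WARM differs. */
static enum model_rw_hint model_data_hint(enum model_whint_mode mode,
					  enum model_temp_type temp)
{
	switch (temp) {
	case HOT:
		return WRITE_LIFE_SHORT;
	case COLD:
		return WRITE_LIFE_EXTREME;
	default: /* WARM */
		return mode == WHINT_MODE_FS ? WRITE_LIFE_LONG
					     : WRITE_LIFE_NOT_SET;
	}
}

/* Hypothetical userspace model of the mapping tables in the patch. */
static enum model_rw_hint model_io_type_to_rw_hint(enum model_whint_mode mode,
						   enum model_page_type type,
						   enum model_temp_type temp)
{
	if (mode == WHINT_MODE_OFF)
		return WRITE_LIFE_NOT_SET;
	if (type == DATA)
		return model_data_hint(mode, temp);
	if (mode == WHINT_MODE_USER)
		return WRITE_LIFE_NOT_SET; /* node and meta blocks carry no hint */
	if (type == META)
		return WRITE_LIFE_MEDIUM;
	/* NODE in fs-based mode */
	return temp == COLD ? WRITE_LIFE_NONE : WRITE_LIFE_NOT_SET;
}

/* Usage example: spot-check a few rows of the documented tables. */
static void model_whint_self_check(void)
{
	assert(model_io_type_to_rw_hint(WHINT_MODE_OFF, DATA, HOT) ==
	       WRITE_LIFE_NOT_SET);
	assert(model_io_type_to_rw_hint(WHINT_MODE_USER, DATA, WARM) ==
	       WRITE_LIFE_NOT_SET);
	assert(model_io_type_to_rw_hint(WHINT_MODE_FS, DATA, WARM) ==
	       WRITE_LIFE_LONG);
	assert(model_io_type_to_rw_hint(WHINT_MODE_FS, NODE, COLD) ==
	       WRITE_LIFE_NONE);
}
```

The only differences between "user-based" and "fs-based" in this model
are warm data, cold node blocks and meta blocks, which matches the two
tables in Documentation/filesystems/f2fs.rst.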


* [PATCH 02/13] fs: Restore support for F_GET_FILE_RW_HINT and F_SET_FILE_RW_HINT
  2023-09-20 19:14 [PATCH 00/13] Pass data temperature information to zoned UFS devices Bart Van Assche
  2023-09-20 19:14 ` [PATCH 01/13] fs/f2fs: Restore the whint_mode mount option Bart Van Assche
@ 2023-09-20 19:14 ` Bart Van Assche
  2023-10-02 10:35   ` Avri Altman
  2023-10-03 19:42   ` Bean Huo
  2023-09-20 19:14 ` [PATCH 03/13] fs: Restore kiocb.ki_hint Bart Van Assche
                   ` (17 subsequent siblings)
  19 siblings, 2 replies; 61+ messages in thread
From: Bart Van Assche @ 2023-09-20 19:14 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, Bart Van Assche, Dave Chinner, Alexander Viro,
	Christian Brauner, Jeff Layton, Chuck Lever

Restore support for F_GET_FILE_RW_HINT and F_SET_FILE_RW_HINT by
reverting commit 7b12e49669c9 ("fs: remove fs.f_write_hint").

Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 fs/fcntl.c         | 18 ++++++++++++++++++
 fs/open.c          |  1 +
 include/linux/fs.h |  9 +++++++++
 3 files changed, 28 insertions(+)

diff --git a/fs/fcntl.c b/fs/fcntl.c
index e871009f6c88..acaa49fb1a35 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -292,6 +292,22 @@ static long fcntl_rw_hint(struct file *file, unsigned int cmd,
 	u64 h;
 
 	switch (cmd) {
+	case F_GET_FILE_RW_HINT:
+		h = file_write_hint(file);
+		if (copy_to_user(argp, &h, sizeof(*argp)))
+			return -EFAULT;
+		return 0;
+	case F_SET_FILE_RW_HINT:
+		if (copy_from_user(&h, argp, sizeof(h)))
+			return -EFAULT;
+		hint = (enum rw_hint) h;
+		if (!rw_hint_valid(hint))
+			return -EINVAL;
+
+		spin_lock(&file->f_lock);
+		file->f_write_hint = hint;
+		spin_unlock(&file->f_lock);
+		return 0;
 	case F_GET_RW_HINT:
 		h = inode->i_write_hint;
 		if (copy_to_user(argp, &h, sizeof(*argp)))
@@ -417,6 +433,8 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
 		break;
 	case F_GET_RW_HINT:
 	case F_SET_RW_HINT:
+	case F_GET_FILE_RW_HINT:
+	case F_SET_FILE_RW_HINT:
 		err = fcntl_rw_hint(filp, cmd, arg);
 		break;
 	default:
diff --git a/fs/open.c b/fs/open.c
index 98f6601fbac6..9e31b8c50cc4 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -942,6 +942,7 @@ static int do_dentry_open(struct file *f,
 	if (f->f_mapping->a_ops && f->f_mapping->a_ops->direct_IO)
 		f->f_mode |= FMODE_CAN_ODIRECT;
 
+	f->f_write_hint = WRITE_LIFE_NOT_SET;
 	f->f_flags &= ~(O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC);
 	f->f_iocb_flags = iocb_flags(f);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 4aeb3fa11927..ba2c5c90af6d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1001,6 +1001,7 @@ struct file {
 	 * Must not be taken from IRQ context.
 	 */
 	spinlock_t		f_lock;
+	enum rw_hint		f_write_hint;
 	fmode_t			f_mode;
 	atomic_long_t		f_count;
 	struct mutex		f_pos_lock;
@@ -2134,6 +2135,14 @@ static inline bool HAS_UNMAPPED_ID(struct mnt_idmap *idmap,
 	       !vfsgid_valid(i_gid_into_vfsgid(idmap, inode));
 }
 
+static inline enum rw_hint file_write_hint(struct file *file)
+{
+	if (file->f_write_hint != WRITE_LIFE_NOT_SET)
+		return file->f_write_hint;
+
+	return file_inode(file)->i_write_hint;
+}
+
 static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
 {
 	*kiocb = (struct kiocb) {
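
The precedence implemented by file_write_hint() in the diff above (a
hint set on the open file overrides the hint stored in the inode) can be
illustrated with a userspace model. This is only a sketch: struct
model_inode and struct model_file are hypothetical stand-ins that keep
just the two fields involved, and no fcntl() call is made.

```c
#include <assert.h>

/* Local copy of the hint values so the sketch builds in userspace. */
enum model_rw_hint {
	WRITE_LIFE_NOT_SET,
	WRITE_LIFE_NONE,
	WRITE_LIFE_SHORT,
	WRITE_LIFE_MEDIUM,
	WRITE_LIFE_LONG,
	WRITE_LIFE_EXTREME,
};

/* Hypothetical, minimal stand-ins for struct inode and struct file. */
struct model_inode {
	enum model_rw_hint i_write_hint;
};

struct model_file {
	enum model_rw_hint f_write_hint;
	struct model_inode *inode;
};

/* Per-file hint wins; otherwise fall back to the per-inode hint. */
static enum model_rw_hint model_file_write_hint(const struct model_file *file)
{
	if (file->f_write_hint != WRITE_LIFE_NOT_SET)
		return file->f_write_hint;

	return file->inode->i_write_hint;
}

/* Usage example: two open files that share one inode. */
static void model_file_hint_self_check(void)
{
	struct model_inode inode = { .i_write_hint = WRITE_LIFE_LONG };
	struct model_file a = { WRITE_LIFE_NOT_SET, &inode };
	struct model_file b = { WRITE_LIFE_SHORT, &inode };

	/* 'a' has no per-file hint, so it inherits the inode hint. */
	assert(model_file_write_hint(&a) == WRITE_LIFE_LONG);
	/* 'b' carries its own hint, which takes precedence. */
	assert(model_file_write_hint(&b) == WRITE_LIFE_SHORT);
}
```

In this model a per-file hint cannot be reset to "not set" in a way that
hides the inode hint, because WRITE_LIFE_NOT_SET is also the value that
triggers the fallback.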


* [PATCH 03/13] fs: Restore kiocb.ki_hint
  2023-09-20 19:14 [PATCH 00/13] Pass data temperature information to zoned UFS devices Bart Van Assche
  2023-09-20 19:14 ` [PATCH 01/13] fs/f2fs: Restore the whint_mode mount option Bart Van Assche
  2023-09-20 19:14 ` [PATCH 02/13] fs: Restore support for F_GET_FILE_RW_HINT and F_SET_FILE_RW_HINT Bart Van Assche
@ 2023-09-20 19:14 ` Bart Van Assche
  2023-10-02 10:45   ` Avri Altman
  2023-09-20 19:14 ` [PATCH 04/13] block: Restore write hint support Bart Van Assche
                   ` (16 subsequent siblings)
  19 siblings, 1 reply; 61+ messages in thread
From: Bart Van Assche @ 2023-09-20 19:14 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, Bart Van Assche, Dave Chinner, Alexander Viro,
	Christian Brauner, Benjamin LaHaise, David Howells, Jaegeuk Kim,
	Chao Yu, Steven Rostedt, Masami Hiramatsu

Restore support for passing write hint information from a filesystem to the
block layer. Write hint information can be set via fcntl(fd, F_SET_RW_HINT,
&hint). This patch reverts commit 41d36a9f3e53 ("fs: remove kiocb.ki_hint").

Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 fs/aio.c                    |  1 +
 fs/cachefiles/io.c          |  2 ++
 fs/f2fs/file.c              |  6 ++++++
 include/linux/fs.h          | 12 ++++++++++++
 include/trace/events/f2fs.h |  5 ++++-
 io_uring/rw.c               |  1 +
 6 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/fs/aio.c b/fs/aio.c
index a4c2a6bac72c..a09743049738 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1466,6 +1466,7 @@ static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb)
 	req->ki_flags = req->ki_filp->f_iocb_flags;
 	if (iocb->aio_flags & IOCB_FLAG_RESFD)
 		req->ki_flags |= IOCB_EVENTFD;
+	req->ki_hint = ki_hint_validate(file_write_hint(req->ki_filp));
 	if (iocb->aio_flags & IOCB_FLAG_IOPRIO) {
 		/*
 		 * If the IOCB_FLAG_IOPRIO flag of aio_flags is set, then
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index 009d23cd435b..ad2870748c15 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -138,6 +138,7 @@ static int cachefiles_read(struct netfs_cache_resources *cres,
 	ki->iocb.ki_filp	= file;
 	ki->iocb.ki_pos		= start_pos + skipped;
 	ki->iocb.ki_flags	= IOCB_DIRECT;
+	ki->iocb.ki_hint	= ki_hint_validate(file_write_hint(file));
 	ki->iocb.ki_ioprio	= get_current_ioprio();
 	ki->skipped		= skipped;
 	ki->object		= object;
@@ -306,6 +307,7 @@ int __cachefiles_write(struct cachefiles_object *object,
 	ki->iocb.ki_filp	= file;
 	ki->iocb.ki_pos		= start_pos;
 	ki->iocb.ki_flags	= IOCB_DIRECT | IOCB_WRITE;
+	ki->iocb.ki_hint	= ki_hint_validate(file_write_hint(file));
 	ki->iocb.ki_ioprio	= get_current_ioprio();
 	ki->object		= object;
 	ki->start		= start_pos;
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index ca5904129b16..9dc0e06c38ba 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4634,8 +4634,10 @@ static ssize_t f2fs_dio_write_iter(struct kiocb *iocb, struct iov_iter *from,
 	struct f2fs_inode_info *fi = F2FS_I(inode);
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
 	const bool do_opu = f2fs_lfs_mode(sbi);
+	const int whint_mode = F2FS_OPTION(sbi).whint_mode;
 	const loff_t pos = iocb->ki_pos;
 	const ssize_t count = iov_iter_count(from);
+	const enum rw_hint hint = iocb->ki_hint;
 	unsigned int dio_flags;
 	struct iomap_dio *dio;
 	ssize_t ret;
@@ -4668,6 +4670,8 @@ static ssize_t f2fs_dio_write_iter(struct kiocb *iocb, struct iov_iter *from,
 		if (do_opu)
 			f2fs_down_read(&fi->i_gc_rwsem[READ]);
 	}
+	if (whint_mode == WHINT_MODE_OFF)
+		iocb->ki_hint = WRITE_LIFE_NOT_SET;
 
 	/*
 	 * We have to use __iomap_dio_rw() and iomap_dio_complete() instead of
@@ -4690,6 +4694,8 @@ static ssize_t f2fs_dio_write_iter(struct kiocb *iocb, struct iov_iter *from,
 		ret = iomap_dio_complete(dio);
 	}
 
+	if (whint_mode == WHINT_MODE_OFF)
+		iocb->ki_hint = hint;
 	if (do_opu)
 		f2fs_up_read(&fi->i_gc_rwsem[READ]);
 	f2fs_up_read(&fi->i_gc_rwsem[WRITE]);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index ba2c5c90af6d..8ebed22dfc88 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -374,6 +374,7 @@ struct kiocb {
 	void (*ki_complete)(struct kiocb *iocb, long ret);
 	void			*private;
 	int			ki_flags;
+	u16			ki_hint;
 	u16			ki_ioprio; /* See linux/ioprio.h */
 	union {
 		/*
@@ -2143,11 +2144,21 @@ static inline enum rw_hint file_write_hint(struct file *file)
 	return file_inode(file)->i_write_hint;
 }
 
+static inline u16 ki_hint_validate(enum rw_hint hint)
+{
+	typeof(((struct kiocb *)0)->ki_hint) max_hint = -1;
+
+	if (hint <= max_hint)
+		return hint;
+	return 0;
+}
+
 static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
 {
 	*kiocb = (struct kiocb) {
 		.ki_filp = filp,
 		.ki_flags = filp->f_iocb_flags,
+		.ki_hint = ki_hint_validate(file_write_hint(filp)),
 		.ki_ioprio = get_current_ioprio(),
 	};
 }
@@ -2158,6 +2169,7 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
 	*kiocb = (struct kiocb) {
 		.ki_filp = filp,
 		.ki_flags = kiocb_src->ki_flags,
+		.ki_hint = kiocb_src->ki_hint,
 		.ki_ioprio = kiocb_src->ki_ioprio,
 		.ki_pos = kiocb_src->ki_pos,
 	};
diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
index 793f82cc1515..9247ad58034e 100644
--- a/include/trace/events/f2fs.h
+++ b/include/trace/events/f2fs.h
@@ -946,6 +946,7 @@ TRACE_EVENT(f2fs_direct_IO_enter,
 		__field(ino_t,	ino)
 		__field(loff_t,	ki_pos)
 		__field(int,	ki_flags)
+		__field(u16,    ki_hint)
 		__field(u16,	ki_ioprio)
 		__field(unsigned long,	len)
 		__field(int,	rw)
@@ -956,16 +957,18 @@ TRACE_EVENT(f2fs_direct_IO_enter,
 		__entry->ino		= inode->i_ino;
 		__entry->ki_pos		= iocb->ki_pos;
 		__entry->ki_flags	= iocb->ki_flags;
+		__entry->ki_hint	= iocb->ki_hint;
 		__entry->ki_ioprio	= iocb->ki_ioprio;
 		__entry->len		= len;
 		__entry->rw		= rw;
 	),
 
-	TP_printk("dev = (%d,%d), ino = %lu pos = %lld len = %lu ki_flags = %x ki_ioprio = %x rw = %d",
+	TP_printk("dev = (%d,%d), ino = %lu pos = %lld len = %lu ki_flags = %x ki_hint = %x ki_ioprio = %x rw = %d",
 		show_dev_ino(__entry),
 		__entry->ki_pos,
 		__entry->len,
 		__entry->ki_flags,
+		__entry->ki_hint,
 		__entry->ki_ioprio,
 		__entry->rw)
 );
diff --git a/io_uring/rw.c b/io_uring/rw.c
index c8c822fa7980..c41ae6654116 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -677,6 +677,7 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode)
 		req->flags |= io_file_get_flags(file);
 
 	kiocb->ki_flags = file->f_iocb_flags;
+	kiocb->ki_hint = file_inode(file)->i_write_hint;
 	ret = kiocb_set_rw_flags(kiocb, rw->flags);
 	if (unlikely(ret))
 		return ret;
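
The range check performed by ki_hint_validate() in the diff above can be
modelled in userspace as follows. This is a sketch under stated
assumptions: the function name is hypothetical, and the argument is
widened to uint64_t (the kernel helper takes an enum rw_hint) so that
out-of-range inputs can be demonstrated.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of ki_hint_validate(): a hint is kept only if it
 * fits in the 16-bit ki_hint field, otherwise it is replaced by 0.
 */
static uint16_t model_ki_hint_validate(uint64_t hint)
{
	/* Same value the kernel obtains by assigning -1 to a u16. */
	const uint16_t max_hint = UINT16_MAX;

	if (hint <= max_hint)
		return (uint16_t)hint;
	return 0;
}

/* Usage example: in-range hints pass through, larger values become 0. */
static void model_ki_hint_self_check(void)
{
	assert(model_ki_hint_validate(5) == 5);
	assert(model_ki_hint_validate(UINT16_MAX) == UINT16_MAX);
	assert(model_ki_hint_validate((uint64_t)UINT16_MAX + 1) == 0);
}
```

Returning 0 for an oversized value avoids silently truncating it to an
unrelated hint when it is stored in the narrower field.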


* [PATCH 04/13] block: Restore write hint support
  2023-09-20 19:14 [PATCH 00/13] Pass data temperature information to zoned UFS devices Bart Van Assche
                   ` (2 preceding siblings ...)
  2023-09-20 19:14 ` [PATCH 03/13] fs: Restore kiocb.ki_hint Bart Van Assche
@ 2023-09-20 19:14 ` Bart Van Assche
  2023-10-02 11:23   ` Avri Altman
                     ` (2 more replies)
  2023-09-20 19:14 ` [PATCH 05/13] scsi: core: Query the Block Limits Extension VPD page Bart Van Assche
                   ` (15 subsequent siblings)
  19 siblings, 3 replies; 61+ messages in thread
From: Bart Van Assche @ 2023-09-20 19:14 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, Bart Van Assche, Alexander Viro,
	Christian Brauner, Jaegeuk Kim, Chao Yu, Darrick J. Wong

This patch partially reverts commit c75e707fe1aa ("block: remove the
per-bio/request write hint"). The following functionality removed by that
commit has been restored:
- Passing the struct kiocb write hint information to struct bio.
- Passing the struct bio write hint information to struct request.
- Not merging requests with different write hints.
- Passing write hint information from the VFS layer to the block layer.
- Initializing bio.bi_write_hint in F2FS.

The following functionality removed by that commit has not been restored:
- Debugfs support for retrieving and modifying write hints.
- md-raid, BTRFS, ext4, gfs2 and zonefs write hint support.
- The write_hints[] array in struct request_queue.

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/bio.c                 |  2 ++
 block/blk-crypto-fallback.c |  1 +
 block/blk-merge.c           | 14 ++++++++++++++
 block/blk-mq.c              |  2 ++
 block/bounce.c              |  1 +
 block/fops.c                |  3 +++
 fs/buffer.c                 | 13 ++++++++-----
 fs/direct-io.c              |  1 +
 fs/f2fs/data.c              |  2 ++
 fs/iomap/buffered-io.c      |  2 ++
 fs/iomap/direct-io.c        |  1 +
 fs/mpage.c                  |  1 +
 include/linux/blk-mq.h      |  1 +
 include/linux/blk_types.h   |  1 +
 14 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 816d412c06e9..755fcde5cb66 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -251,6 +251,7 @@ void bio_init(struct bio *bio, struct block_device *bdev, struct bio_vec *table,
 	bio->bi_opf = opf;
 	bio->bi_flags = 0;
 	bio->bi_ioprio = 0;
+	bio->bi_write_hint = 0;
 	bio->bi_status = 0;
 	bio->bi_iter.bi_sector = 0;
 	bio->bi_iter.bi_size = 0;
@@ -813,6 +814,7 @@ static int __bio_clone(struct bio *bio, struct bio *bio_src, gfp_t gfp)
 {
 	bio_set_flag(bio, BIO_CLONED);
 	bio->bi_ioprio = bio_src->bi_ioprio;
+	bio->bi_write_hint = bio_src->bi_write_hint;
 	bio->bi_iter = bio_src->bi_iter;
 
 	if (bio->bi_bdev) {
diff --git a/block/blk-crypto-fallback.c b/block/blk-crypto-fallback.c
index e6468eab2681..b1e7415f8439 100644
--- a/block/blk-crypto-fallback.c
+++ b/block/blk-crypto-fallback.c
@@ -172,6 +172,7 @@ static struct bio *blk_crypto_fallback_clone_bio(struct bio *bio_src)
 	if (bio_flagged(bio_src, BIO_REMAPPED))
 		bio_set_flag(bio, BIO_REMAPPED);
 	bio->bi_ioprio		= bio_src->bi_ioprio;
+	bio->bi_write_hint	= bio_src->bi_write_hint;
 	bio->bi_iter.bi_sector	= bio_src->bi_iter.bi_sector;
 	bio->bi_iter.bi_size	= bio_src->bi_iter.bi_size;
 
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 65e75efa9bd3..b1854d6bd081 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -817,6 +817,13 @@ static struct request *attempt_merge(struct request_queue *q,
 	if (req->ioprio != next->ioprio)
 		return NULL;
 
+	/*
+	 * Don't allow merge of different write hints, or for a hint with
+	 * non-hint IO.
+	 */
+	if (req->write_hint != next->write_hint)
+		return NULL;
+
 	/*
 	 * If we are allowed to merge, then append bio list
 	 * from next to rq and release next. merge_requests_fn
@@ -944,6 +951,13 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
 	if (rq->ioprio != bio_prio(bio))
 		return false;
 
+	/*
+	 * Don't allow merge of different write hints, or for a hint with
+	 * non-hint IO.
+	 */
+	if (rq->write_hint != bio->bi_write_hint)
+		return false;
+
 	return true;
 }
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index ec922c6bccbe..1326d1661f0e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2563,6 +2563,7 @@ static void blk_mq_bio_to_request(struct request *rq, struct bio *bio,
 		rq->cmd_flags |= REQ_FAILFAST_MASK;
 
 	rq->__sector = bio->bi_iter.bi_sector;
+	rq->write_hint = bio->bi_write_hint;
 	blk_rq_bio_prep(rq, bio, nr_segs);
 
 	/* This can't fail, since GFP_NOIO includes __GFP_DIRECT_RECLAIM. */
@@ -3160,6 +3161,7 @@ int blk_rq_prep_clone(struct request *rq, struct request *rq_src,
 	}
 	rq->nr_phys_segments = rq_src->nr_phys_segments;
 	rq->ioprio = rq_src->ioprio;
+	rq->write_hint = rq_src->write_hint;
 
 	if (rq->bio && blk_crypto_rq_bio_prep(rq, rq->bio, gfp_mask) < 0)
 		goto free_and_out;
diff --git a/block/bounce.c b/block/bounce.c
index 7cfcb242f9a1..d6a5219f29dd 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -169,6 +169,7 @@ static struct bio *bounce_clone_bio(struct bio *bio_src)
 	if (bio_flagged(bio_src, BIO_REMAPPED))
 		bio_set_flag(bio, BIO_REMAPPED);
 	bio->bi_ioprio		= bio_src->bi_ioprio;
+	bio->bi_write_hint	= bio_src->bi_write_hint;
 	bio->bi_iter.bi_sector	= bio_src->bi_iter.bi_sector;
 	bio->bi_iter.bi_size	= bio_src->bi_iter.bi_size;
 
diff --git a/block/fops.c b/block/fops.c
index acff3d5d22d4..6923de13665f 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -74,6 +74,7 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
 	}
 	bio.bi_iter.bi_sector = pos >> SECTOR_SHIFT;
 	bio.bi_ioprio = iocb->ki_ioprio;
+	bio.bi_write_hint = iocb->ki_hint;
 
 	ret = bio_iov_iter_get_pages(&bio, iter);
 	if (unlikely(ret))
@@ -206,6 +207,7 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 		bio->bi_private = dio;
 		bio->bi_end_io = blkdev_bio_end_io;
 		bio->bi_ioprio = iocb->ki_ioprio;
+		bio->bi_write_hint = iocb->ki_hint;
 
 		ret = bio_iov_iter_get_pages(bio, iter);
 		if (unlikely(ret)) {
@@ -323,6 +325,7 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
 	bio->bi_iter.bi_sector = pos >> SECTOR_SHIFT;
 	bio->bi_end_io = blkdev_bio_end_io_async;
 	bio->bi_ioprio = iocb->ki_ioprio;
+	bio->bi_write_hint = iocb->ki_hint;
 
 	if (iov_iter_is_bvec(iter)) {
 		/*
diff --git a/fs/buffer.c b/fs/buffer.c
index 2379564e5aea..bf1d94f7a96a 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -55,7 +55,7 @@
 
 static int fsync_buffers_list(spinlock_t *lock, struct list_head *list);
 static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
-			  struct writeback_control *wbc);
+			  enum rw_hint hint, struct writeback_control *wbc);
 
 #define BH_ENTRY(list) list_entry((list), struct buffer_head, b_assoc_buffers)
 
@@ -1904,7 +1904,8 @@ int __block_write_full_folio(struct inode *inode, struct folio *folio,
 	do {
 		struct buffer_head *next = bh->b_this_page;
 		if (buffer_async_write(bh)) {
-			submit_bh_wbc(REQ_OP_WRITE | write_flags, bh, wbc);
+			submit_bh_wbc(REQ_OP_WRITE | write_flags, bh,
+					inode->i_write_hint, wbc);
 			nr_underway++;
 		}
 		bh = next;
@@ -1958,7 +1959,8 @@ int __block_write_full_folio(struct inode *inode, struct folio *folio,
 		struct buffer_head *next = bh->b_this_page;
 		if (buffer_async_write(bh)) {
 			clear_buffer_dirty(bh);
-			submit_bh_wbc(REQ_OP_WRITE | write_flags, bh, wbc);
+			submit_bh_wbc(REQ_OP_WRITE | write_flags, bh,
+					inode->i_write_hint, wbc);
 			nr_underway++;
 		}
 		bh = next;
@@ -2770,7 +2772,7 @@ static void end_bio_bh_io_sync(struct bio *bio)
 }
 
 static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
-			  struct writeback_control *wbc)
+			  enum rw_hint write_hint, struct writeback_control *wbc)
 {
 	const enum req_op op = opf & REQ_OP_MASK;
 	struct bio *bio;
@@ -2797,6 +2799,7 @@ static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
 	fscrypt_set_bio_crypt_ctx_bh(bio, bh, GFP_NOIO);
 
 	bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
+	bio->bi_write_hint = write_hint;
 
 	__bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh));
 
@@ -2816,7 +2819,7 @@ static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
 
 void submit_bh(blk_opf_t opf, struct buffer_head *bh)
 {
-	submit_bh_wbc(opf, bh, NULL);
+	submit_bh_wbc(opf, bh, 0, NULL);
 }
 EXPORT_SYMBOL(submit_bh);
 
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 7bc494ee56b9..bfa32c6ed3dd 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -404,6 +404,7 @@ dio_bio_alloc(struct dio *dio, struct dio_submit *sdio,
 	 */
 	bio = bio_alloc(bdev, nr_vecs, dio->opf, GFP_KERNEL);
 	bio->bi_iter.bi_sector = first_sector;
+	bio->bi_write_hint = dio->iocb->ki_hint;
 	if (dio->is_async)
 		bio->bi_end_io = dio_bio_end_aio;
 	else
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 916e317ac925..d759a7b8478f 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -478,6 +478,8 @@ static struct bio *__bio_alloc(struct f2fs_io_info *fio, int npages)
 	} else {
 		bio->bi_end_io = f2fs_write_end_io;
 		bio->bi_private = sbi;
+		bio->bi_write_hint =
+			f2fs_io_type_to_rw_hint(sbi, fio->type, fio->temp);
 	}
 	iostat_alloc_and_bind_ctx(sbi, bio, NULL);
 
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index ae8673ce08b1..a344418a82ad 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1654,6 +1654,7 @@ iomap_alloc_ioend(struct inode *inode, struct iomap_writepage_ctx *wpc,
 			       REQ_OP_WRITE | wbc_to_write_flags(wbc),
 			       GFP_NOFS, &iomap_ioend_bioset);
 	bio->bi_iter.bi_sector = sector;
+	bio->bi_write_hint = inode->i_write_hint;
 	wbc_init_bio(wbc, bio);
 
 	ioend = container_of(bio, struct iomap_ioend, io_inline_bio);
@@ -1684,6 +1685,7 @@ iomap_chain_bio(struct bio *prev)
 	new = bio_alloc(prev->bi_bdev, BIO_MAX_VECS, prev->bi_opf, GFP_NOFS);
 	bio_clone_blkg_association(new, prev);
 	new->bi_iter.bi_sector = bio_end_sector(prev);
+	new->bi_write_hint = prev->bi_write_hint;
 
 	bio_chain(prev, new);
 	bio_get(prev);		/* for iomap_finish_ioend */
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index bcd3f8cf5ea4..afb704f98a97 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -380,6 +380,7 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 		fscrypt_set_bio_crypt_ctx(bio, inode, pos >> inode->i_blkbits,
 					  GFP_KERNEL);
 		bio->bi_iter.bi_sector = iomap_sector(iomap, pos);
+		bio->bi_write_hint = dio->iocb->ki_hint;
 		bio->bi_ioprio = dio->iocb->ki_ioprio;
 		bio->bi_private = dio;
 		bio->bi_end_io = iomap_dio_bio_end_io;
diff --git a/fs/mpage.c b/fs/mpage.c
index 242e213ee064..5d444d2c39f1 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -612,6 +612,7 @@ static int __mpage_writepage(struct folio *folio, struct writeback_control *wbc,
 				GFP_NOFS);
 		bio->bi_iter.bi_sector = blocks[0] << (blkbits - 9);
 		wbc_init_bio(wbc, bio);
+		bio->bi_write_hint = inode->i_write_hint;
 	}
 
 	/*
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 958ed7e89b30..d2605fb5ee63 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -137,6 +137,7 @@ struct request {
 	struct blk_crypto_keyslot *crypt_keyslot;
 #endif
 
+	unsigned short write_hint;
 	unsigned short ioprio;
 
 	enum mq_rq_state state;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index d5c5e59ddbd2..6d1617f2123b 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -269,6 +269,7 @@ struct bio {
 						 */
 	unsigned short		bi_flags;	/* BIO_* below */
 	unsigned short		bi_ioprio;
+	unsigned short		bi_write_hint;
 	blk_status_t		bi_status;
 	atomic_t		__bi_remaining;
 

* [PATCH 05/13] scsi: core: Query the Block Limits Extension VPD page
  2023-09-20 19:14 [PATCH 00/13] Pass data temperature information to zoned UFS devices Bart Van Assche
                   ` (3 preceding siblings ...)
  2023-09-20 19:14 ` [PATCH 04/13] block: Restore write hint support Bart Van Assche
@ 2023-09-20 19:14 ` Bart Van Assche
  2023-10-02 11:29   ` Avri Altman
  2023-09-20 19:14 ` [PATCH 06/13] scsi_proto: Add struct io_group_descriptor Bart Van Assche
                   ` (14 subsequent siblings)
  19 siblings, 1 reply; 61+ messages in thread
From: Bart Van Assche @ 2023-09-20 19:14 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, Bart Van Assche, James E.J. Bottomley

Parse the Reduced Stream Control Supported (RSCS) bit from the block
limits extension VPD page. The RSCS bit is defined in T10 document
"SBC-5 Constrained Streams with Data Lifetimes"
(https://www.t10.org/cgi-bin/ac.pl?t=d&f=23-024r3.pdf).
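
For illustration only (this is not part of the patch), the RSCS check can
be sketched as a standalone helper. The sketch assumes, as the sd.c hunk
below does, that the buffer holds the complete VPD page including its
four-byte header, so that RSCS is bit 0 of byte 5. The name vpd_b7_rscs()
is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of this patch: report whether the RSCS bit
 * is set in a Block Limits Extension VPD page (page code 0xb7).
 * Assumption: @page points at the complete page, header included.
 */
static bool vpd_b7_rscs(const uint8_t *page, size_t len)
{
	/* Byte 5 must be present before it can be examined. */
	if (!page || len < 6)
		return false;
	/* Byte 1 of every VPD page holds the page code. */
	if (page[1] != 0xb7)
		return false;
	/* RSCS is bit 0 of byte 5. */
	return page[5] & 1;
}
```

A caller would treat a short or missing page the same as RSCS being
cleared, which is why the helper returns false in those cases.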

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi.c        |  2 ++
 drivers/scsi/scsi_sysfs.c  | 10 ++++++++++
 drivers/scsi/sd.c          | 13 +++++++++++++
 drivers/scsi/sd.h          |  1 +
 include/scsi/scsi_device.h |  1 +
 5 files changed, 27 insertions(+)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index d0911bc28663..5ad291770806 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -499,6 +499,8 @@ void scsi_attach_vpd(struct scsi_device *sdev)
 			scsi_update_vpd_page(sdev, 0xb1, &sdev->vpd_pgb1);
 		if (vpd_buf->data[i] == 0xb2)
 			scsi_update_vpd_page(sdev, 0xb2, &sdev->vpd_pgb2);
+		if (vpd_buf->data[i] == 0xb7)
+			scsi_update_vpd_page(sdev, 0xb7, &sdev->vpd_pgb7);
 	}
 	kfree(vpd_buf);
 }
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 24f6eefb6803..93652a786a46 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -449,6 +449,7 @@ static void scsi_device_dev_release(struct device *dev)
 	struct scsi_vpd *vpd_pg80 = NULL, *vpd_pg83 = NULL;
 	struct scsi_vpd *vpd_pg0 = NULL, *vpd_pg89 = NULL;
 	struct scsi_vpd *vpd_pgb0 = NULL, *vpd_pgb1 = NULL, *vpd_pgb2 = NULL;
+	struct scsi_vpd *vpd_pgb7 = NULL;
 	unsigned long flags;
 
 	might_sleep();
@@ -494,6 +495,8 @@ static void scsi_device_dev_release(struct device *dev)
 				       lockdep_is_held(&sdev->inquiry_mutex));
 	vpd_pgb2 = rcu_replace_pointer(sdev->vpd_pgb2, vpd_pgb2,
 				       lockdep_is_held(&sdev->inquiry_mutex));
+	vpd_pgb7 = rcu_replace_pointer(sdev->vpd_pgb7, vpd_pgb7,
+				       lockdep_is_held(&sdev->inquiry_mutex));
 	mutex_unlock(&sdev->inquiry_mutex);
 
 	if (vpd_pg0)
@@ -510,6 +513,8 @@ static void scsi_device_dev_release(struct device *dev)
 		kfree_rcu(vpd_pgb1, rcu);
 	if (vpd_pgb2)
 		kfree_rcu(vpd_pgb2, rcu);
+	if (vpd_pgb7)
+		kfree_rcu(vpd_pgb7, rcu);
 	kfree(sdev->inquiry);
 	kfree(sdev);
 
@@ -921,6 +926,7 @@ sdev_vpd_pg_attr(pg89);
 sdev_vpd_pg_attr(pgb0);
 sdev_vpd_pg_attr(pgb1);
 sdev_vpd_pg_attr(pgb2);
+sdev_vpd_pg_attr(pgb7);
 sdev_vpd_pg_attr(pg0);
 
 static ssize_t show_inquiry(struct file *filep, struct kobject *kobj,
@@ -1295,6 +1301,9 @@ static umode_t scsi_sdev_bin_attr_is_visible(struct kobject *kobj,
 	if (attr == &dev_attr_vpd_pgb2 && !sdev->vpd_pgb2)
 		return 0;
 
+	if (attr == &dev_attr_vpd_pgb7 && !sdev->vpd_pgb7)
+		return 0;
+
 	return S_IRUGO;
 }
 
@@ -1347,6 +1356,7 @@ static struct bin_attribute *scsi_sdev_bin_attrs[] = {
 	&dev_attr_vpd_pgb0,
 	&dev_attr_vpd_pgb1,
 	&dev_attr_vpd_pgb2,
+	&dev_attr_vpd_pgb7,
 	&dev_attr_inquiry,
 	NULL
 };
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index c92a317ba547..879edbc1a065 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3019,6 +3019,18 @@ static void sd_read_block_limits(struct scsi_disk *sdkp)
 	rcu_read_unlock();
 }
 
+/* Parse the Block Limits Extension VPD page (0xb7) */
+static void sd_read_block_limits_ext(struct scsi_disk *sdkp)
+{
+	struct scsi_vpd *vpd;
+
+	rcu_read_lock();
+	vpd = rcu_dereference(sdkp->device->vpd_pgb7);
+	if (vpd && vpd->len >= 6)
+		sdkp->rscs = vpd->data[5] & 1;
+	rcu_read_unlock();
+}
+
 /**
  * sd_read_block_characteristics - Query block dev. characteristics
  * @sdkp: disk to query
@@ -3373,6 +3385,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
 		if (scsi_device_supports_vpd(sdp)) {
 			sd_read_block_provisioning(sdkp);
 			sd_read_block_limits(sdkp);
+			sd_read_block_limits_ext(sdkp);
 			sd_read_block_characteristics(sdkp);
 			sd_zbc_read_zones(sdkp, buffer);
 			sd_read_cpr(sdkp);
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 5eea762f84d1..84685168b6e0 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -150,6 +150,7 @@ struct scsi_disk {
 	unsigned	urswrz : 1;
 	unsigned	security : 1;
 	unsigned	ignore_medium_access_errors : 1;
+	bool		rscs : 1; /* reduced stream control support */
 };
 #define to_scsi_disk(obj) container_of(obj, struct scsi_disk, disk_dev)
 
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index b9230b6add04..2dd96ae101e1 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -153,6 +153,7 @@ struct scsi_device {
 	struct scsi_vpd __rcu *vpd_pgb0;
 	struct scsi_vpd __rcu *vpd_pgb1;
 	struct scsi_vpd __rcu *vpd_pgb2;
+	struct scsi_vpd __rcu *vpd_pgb7;
 
 	struct scsi_target      *sdev_target;
 

* [PATCH 06/13] scsi_proto: Add struct io_group_descriptor
  2023-09-20 19:14 [PATCH 00/13] Pass data temperature information to zoned UFS devices Bart Van Assche
                   ` (4 preceding siblings ...)
  2023-09-20 19:14 ` [PATCH 05/13] scsi: core: Query the Block Limits Extension VPD page Bart Van Assche
@ 2023-09-20 19:14 ` Bart Van Assche
  2023-10-02 11:41   ` Avri Altman
  2023-10-02 18:16   ` Avri Altman
  2023-09-20 19:14 ` [PATCH 07/13] sd: Translate data lifetime information Bart Van Assche
                   ` (13 subsequent siblings)
  19 siblings, 2 replies; 61+ messages in thread
From: Bart Van Assche @ 2023-09-20 19:14 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, Bart Van Assche, James E.J. Bottomley

Add struct scsi_io_group_descriptor, the SBC-5 IO advice hints group
descriptor, in preparation for adding code that fills in and parses this
data structure.
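
As a rough cross-check of the bit layout (not part of this patch), byte 0
of the descriptor can also be decoded with shifts and masks, which does
not depend on bitfield ordering. The bit positions below are read off the
__BIG_ENDIAN branch of the structure in this patch; struct io_group_flags
and decode_io_group_byte0() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of byte 0 of an IO advice hints group descriptor. */
struct io_group_flags {
	uint8_t io_advice_hints_mode;	/* bits 7..6 */
	bool st_enble;			/* bit 2 */
	bool cs_enble;			/* bit 1 */
	bool ic_enable;			/* bit 0 */
};

/* Decode byte 0 without relying on compiler-specific bitfield layout. */
static struct io_group_flags decode_io_group_byte0(uint8_t b)
{
	struct io_group_flags f = {
		.io_advice_hints_mode = (b >> 6) & 0x3,
		.st_enble = (b >> 2) & 1,
		.cs_enble = (b >> 1) & 1,
		.ic_enable = b & 1,
	};

	return f;
}
```

Bits 5..3 are reserved in this layout and are ignored by the sketch.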

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 include/scsi/scsi_proto.h | 40 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/include/scsi/scsi_proto.h b/include/scsi/scsi_proto.h
index 07d65c1f59db..4e3691cb67da 100644
--- a/include/scsi/scsi_proto.h
+++ b/include/scsi/scsi_proto.h
@@ -10,6 +10,7 @@
 #ifndef _SCSI_PROTO_H_
 #define _SCSI_PROTO_H_
 
+#include <linux/build_bug.h>
 #include <linux/types.h>
 
 /*
@@ -275,6 +276,45 @@ struct scsi_lun {
 	__u8 scsi_lun[8];
 };
 
+/* SBC-5 IO advice hints group descriptor */
+struct scsi_io_group_descriptor {
+#if defined(__BIG_ENDIAN)
+	u8 io_advice_hints_mode: 2;
+	u8 reserved1: 3;
+	u8 st_enble: 1;
+	u8 cs_enble: 1;
+	u8 ic_enable: 1;
+#elif defined(__LITTLE_ENDIAN)
+	u8 ic_enable: 1;
+	u8 cs_enble: 1;
+	u8 st_enble: 1;
+	u8 reserved1: 3;
+	u8 io_advice_hints_mode: 2;
+#else
+#error
+#endif
+	u8 reserved2[3];
+	/* Logical block markup descriptor */
+#if defined(__BIG_ENDIAN)
+	u8 acdlu: 1;
+	u8 reserved3: 1;
+	u8 rlbsr: 2;
+	u8 lbm_descriptor_type: 4;
+#elif defined(__LITTLE_ENDIAN)
+	u8 lbm_descriptor_type: 4;
+	u8 rlbsr: 2;
+	u8 reserved3: 1;
+	u8 acdlu: 1;
+#else
+#error
+#endif
+	u8 params[2];
+	u8 reserved4;
+	u8 reserved5[8];
+};
+
+static_assert(sizeof(struct scsi_io_group_descriptor) == 16);
+
 /* SPC asymmetric access states */
 #define SCSI_ACCESS_STATE_OPTIMAL     0x00
 #define SCSI_ACCESS_STATE_ACTIVE      0x01

* [PATCH 07/13] sd: Translate data lifetime information
  2023-09-20 19:14 [PATCH 00/13] Pass data temperature information to zoned UFS devices Bart Van Assche
                   ` (5 preceding siblings ...)
  2023-09-20 19:14 ` [PATCH 06/13] scsi_proto: Add struct io_group_descriptor Bart Van Assche
@ 2023-09-20 19:14 ` Bart Van Assche
  2023-10-02 13:11   ` Avri Altman
  2023-09-20 19:14 ` [PATCH 08/13] scsi_debug: Reduce code duplication Bart Van Assche
                   ` (12 subsequent siblings)
  19 siblings, 1 reply; 61+ messages in thread
From: Bart Van Assche @ 2023-09-20 19:14 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, Bart Van Assche, Damien Le Moal,
	James E.J. Bottomley

Recently T10 standardized SBC constrained streams. This mechanism enables
passing data lifetime information to SCSI devices in the group number
field. Add support for translating write hint information into a
permanent stream number in the sd driver.
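
The translation can be sketched in isolation as follows. This is a
simplified illustration rather than the driver code: the enumerators
mirror the values of the kernel's enum rw_hint, and group_number() and
rw16_byte14() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: these mirror the values of the kernel's enum rw_hint. */
enum hint_sketch {
	HINT_NOT_SET = 0,
	HINT_NONE = 1,
	HINT_SHORT = 2,
	HINT_MEDIUM = 3,
	HINT_LONG = 4,
	HINT_EXTREME = 5,
};

/* Map a write hint to a six-bit GROUP NUMBER, clamped to the stream count. */
static uint8_t group_number(bool rscs, uint16_t permanent_streams,
			    unsigned int hint)
{
	unsigned int max_gn = permanent_streams < 0x3f ? permanent_streams : 0x3f;
	unsigned int gn;

	/* Without RSCS or without a hint, no lifetime is reported. */
	if (!rscs || hint == HINT_NOT_SET)
		return 0;
	gn = hint - HINT_NONE;
	return gn < max_gn ? gn : max_gn;
}

/* WRITE(16) byte 14: duration limit bits in 7..6, group number in 5..0. */
static uint8_t rw16_byte14(unsigned int dld, uint8_t gn)
{
	return ((dld & 0x03) << 6) | (gn & 0x3f);
}
```

With five permanent streams, a "short" hint maps to group number 1 and an
"extreme" hint to group number 4; with only two permanent streams, every
hint above "medium" is clamped to 2.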

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/sd.c | 65 ++++++++++++++++++++++++++++++++++++++++++++---
 drivers/scsi/sd.h |  1 +
 2 files changed, 63 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 879edbc1a065..7bbc58cd99d1 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1001,12 +1001,38 @@ static blk_status_t sd_setup_flush_cmnd(struct scsi_cmnd *cmd)
 	return BLK_STS_OK;
 }
 
+/**
+ * sd_group_number() - Compute the GROUP NUMBER field
+ * @cmd: SCSI command for which to compute the value of the six-bit GROUP NUMBER
+ *	field.
+ *
+ * From "SBC-5 Constrained Streams with Data Lifetimes"
+ * (https://www.t10.org/cgi-bin/ac.pl?t=d&f=23-024r3.pdf):
+ * 0: no relative lifetime.
+ * 1: shortest relative lifetime.
+ * 2: second shortest relative lifetime.
+ * 3 - 0x3d: intermediate relative lifetimes.
+ * 0x3e: second longest relative lifetime.
+ * 0x3f: longest relative lifetime.
+ */
+static u8 sd_group_number(struct scsi_cmnd *cmd)
+{
+	const struct request *rq = scsi_cmd_to_rq(cmd);
+	struct scsi_disk *sdkp = scsi_disk(rq->q->disk);
+	const int max_gn = min_t(u16, sdkp->permanent_stream_count, 0x3f);
+
+	if (!sdkp->rscs || rq->write_hint == WRITE_LIFE_NOT_SET)
+		return 0;
+	return min(rq->write_hint - WRITE_LIFE_NONE, max_gn);
+}
+
 static blk_status_t sd_setup_rw32_cmnd(struct scsi_cmnd *cmd, bool write,
 				       sector_t lba, unsigned int nr_blocks,
 				       unsigned char flags, unsigned int dld)
 {
 	cmd->cmd_len = SD_EXT_CDB_SIZE;
 	cmd->cmnd[0]  = VARIABLE_LENGTH_CMD;
+	cmd->cmnd[6]  = sd_group_number(cmd);
 	cmd->cmnd[7]  = 0x18; /* Additional CDB len */
 	cmd->cmnd[9]  = write ? WRITE_32 : READ_32;
 	cmd->cmnd[10] = flags;
@@ -1025,7 +1051,7 @@ static blk_status_t sd_setup_rw16_cmnd(struct scsi_cmnd *cmd, bool write,
 	cmd->cmd_len  = 16;
 	cmd->cmnd[0]  = write ? WRITE_16 : READ_16;
 	cmd->cmnd[1]  = flags | ((dld >> 2) & 0x01);
-	cmd->cmnd[14] = (dld & 0x03) << 6;
+	cmd->cmnd[14] = ((dld & 0x03) << 6) | sd_group_number(cmd);
 	cmd->cmnd[15] = 0;
 	put_unaligned_be64(lba, &cmd->cmnd[2]);
 	put_unaligned_be32(nr_blocks, &cmd->cmnd[10]);
@@ -1040,7 +1066,7 @@ static blk_status_t sd_setup_rw10_cmnd(struct scsi_cmnd *cmd, bool write,
 	cmd->cmd_len = 10;
 	cmd->cmnd[0] = write ? WRITE_10 : READ_10;
 	cmd->cmnd[1] = flags;
-	cmd->cmnd[6] = 0;
+	cmd->cmnd[6] = sd_group_number(cmd);
 	cmd->cmnd[9] = 0;
 	put_unaligned_be32(lba, &cmd->cmnd[2]);
 	put_unaligned_be16(nr_blocks, &cmd->cmnd[7]);
@@ -1177,7 +1203,8 @@ static blk_status_t sd_setup_read_write_cmnd(struct scsi_cmnd *cmd)
 		ret = sd_setup_rw16_cmnd(cmd, write, lba, nr_blocks,
 					 protect | fua, dld);
 	} else if ((nr_blocks > 0xff) || (lba > 0x1fffff) ||
-		   sdp->use_10_for_rw || protect) {
+		   sdp->use_10_for_rw || protect ||
+		   rq->write_hint != WRITE_LIFE_NOT_SET) {
 		ret = sd_setup_rw10_cmnd(cmd, write, lba, nr_blocks,
 					 protect | fua);
 	} else {
@@ -2912,6 +2939,37 @@ sd_read_cache_type(struct scsi_disk *sdkp, unsigned char *buffer)
 	sdkp->DPOFUA = 0;
 }
 
+static void sd_read_io_hints(struct scsi_disk *sdkp, unsigned char *buffer)
+{
+	struct scsi_device *sdp = sdkp->device;
+	const struct scsi_io_group_descriptor *desc, *start, *end;
+	struct scsi_sense_hdr sshdr;
+	struct scsi_mode_data data;
+	int res;
+
+	res = scsi_mode_sense(sdp, /*dbd=*/0x8, /*modepage=*/0x0a,
+			      /*subpage=*/0x05, buffer, SD_BUF_SIZE,
+			      SD_TIMEOUT, sdkp->max_retries, &data, &sshdr);
+	if (res < 0)
+		return;
+	start = (void *)buffer + data.header_length + 16;
+	end = (void *)buffer + ((data.header_length + data.length)
+				& ~(sizeof(*end) - 1));
+	/*
+	 * From "SBC-5 Constrained Streams with Data Lifetimes": Device servers
+	 * should assign the lowest numbered stream identifiers to permanent
+	 * streams.
+	 */
+	for (desc = start; desc < end; desc++)
+		if (!desc->st_enble)
+			break;
+	sdkp->permanent_stream_count = desc - start;
+	if (sdkp->rscs && sdkp->permanent_stream_count < 2)
+		sdev_printk(KERN_INFO, sdp,
+			    "Unexpected: RSCS has been set and the permanent stream count is %u\n",
+			    sdkp->permanent_stream_count);
+}
+
 /*
  * The ATO bit indicates whether the DIF application tag is available
  * for use by the operating system.
@@ -3395,6 +3453,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
 
 		sd_read_write_protect_flag(sdkp, buffer);
 		sd_read_cache_type(sdkp, buffer);
+		sd_read_io_hints(sdkp, buffer);
 		sd_read_app_tag_own(sdkp, buffer);
 		sd_read_write_same(sdkp, buffer);
 		sd_read_security(sdkp, buffer);
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 84685168b6e0..1863de5ebae4 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -125,6 +125,7 @@ struct scsi_disk {
 	unsigned int	physical_block_size;
 	unsigned int	max_medium_access_timeouts;
 	unsigned int	medium_access_timed_out;
+	u16		permanent_stream_count;	/* number of permanent streams */
 	u8		media_present;
 	u8		write_prot;
 	u8		protection_type;/* Data Integrity Field */

* [PATCH 08/13] scsi_debug: Reduce code duplication
  2023-09-20 19:14 [PATCH 00/13] Pass data temperature information to zoned UFS devices Bart Van Assche
                   ` (6 preceding siblings ...)
  2023-09-20 19:14 ` [PATCH 07/13] sd: Translate data lifetime information Bart Van Assche
@ 2023-09-20 19:14 ` Bart Van Assche
  2023-10-03  6:49   ` Avri Altman
  2023-09-20 19:14 ` [PATCH 09/13] scsi_debug: Support the block limits extension VPD page Bart Van Assche
                   ` (11 subsequent siblings)
  19 siblings, 1 reply; 61+ messages in thread
From: Bart Van Assche @ 2023-09-20 19:14 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, Bart Van Assche, Douglas Gilbert,
	James E.J. Bottomley

All VPD pages have the page code in byte one. Reduce code duplication by
storing the VPD page code once.
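
The pattern can be sketched outside scsi_debug as shown below. This is a
simplified, hypothetical stand-in for the VPD branch of an INQUIRY
handler: build_vpd_page(), VPD_BUF_SZ and the serial number are made up
for the example, and only two pages are handled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VPD_BUF_SZ 64	/* hypothetical buffer size for this sketch */

/*
 * Hypothetical, simplified VPD response builder. Returns the response
 * length in bytes, or 0 if the page code is not supported.
 */
static size_t build_vpd_page(uint8_t arr[VPD_BUF_SZ], uint8_t page_code)
{
	memset(arr, 0, VPD_BUF_SZ);
	arr[1] = page_code;	/* stored once instead of in every branch */

	if (page_code == 0x80) {		/* unit serial number */
		static const char serial[] = "2000";	/* made-up serial */

		arr[3] = sizeof(serial) - 1;
		memcpy(&arr[4], serial, sizeof(serial) - 1);
	} else if (page_code == 0x87) {		/* mode page policy */
		arr[3] = 0x8;	/* number of following bytes */
		arr[4] = 0x2;
		arr[6] = 0x80;
		arr[8] = 0x18;
		arr[10] = 0x82;
	} else {
		return 0;
	}
	/* Four-byte header plus the page length stored in byte 3. */
	return 4 + arr[3];
}
```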

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi_debug.c | 16 ++--------------
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 9c0af50501f9..46eaa2f9e63b 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -1598,7 +1598,8 @@ static int resp_inquiry(struct scsi_cmnd *scp, struct sdebug_dev_info *devip)
 		u32 len;
 		char lu_id_str[6];
 		int host_no = devip->sdbg_host->shost->host_no;
-		
+
+		arr[1] = cmd[2];
 		port_group_id = (((host_no + 1) & 0x7f) << 8) +
 		    (devip->channel & 0x7f);
 		if (sdebug_vpd_use_hostno == 0)
@@ -1609,7 +1610,6 @@ static int resp_inquiry(struct scsi_cmnd *scp, struct sdebug_dev_info *devip)
 				 (devip->target * 1000) - 3;
 		len = scnprintf(lu_id_str, 6, "%d", lu_id_num);
 		if (0 == cmd[2]) { /* supported vital product data pages */
-			arr[1] = cmd[2];	/*sanity */
 			n = 4;
 			arr[n++] = 0x0;   /* this page */
 			arr[n++] = 0x80;  /* unit serial number */
@@ -1630,23 +1630,18 @@ static int resp_inquiry(struct scsi_cmnd *scp, struct sdebug_dev_info *devip)
 			}
 			arr[3] = n - 4;	  /* number of supported VPD pages */
 		} else if (0x80 == cmd[2]) { /* unit serial number */
-			arr[1] = cmd[2];	/*sanity */
 			arr[3] = len;
 			memcpy(&arr[4], lu_id_str, len);
 		} else if (0x83 == cmd[2]) { /* device identification */
-			arr[1] = cmd[2];	/*sanity */
 			arr[3] = inquiry_vpd_83(&arr[4], port_group_id,
 						target_dev_id, lu_id_num,
 						lu_id_str, len,
 						&devip->lu_name);
 		} else if (0x84 == cmd[2]) { /* Software interface ident. */
-			arr[1] = cmd[2];	/*sanity */
 			arr[3] = inquiry_vpd_84(&arr[4]);
 		} else if (0x85 == cmd[2]) { /* Management network addresses */
-			arr[1] = cmd[2];	/*sanity */
 			arr[3] = inquiry_vpd_85(&arr[4]);
 		} else if (0x86 == cmd[2]) { /* extended inquiry */
-			arr[1] = cmd[2];	/*sanity */
 			arr[3] = 0x3c;	/* number of following entries */
 			if (sdebug_dif == T10_PI_TYPE3_PROTECTION)
 				arr[4] = 0x4;	/* SPT: GRD_CHK:1 */
@@ -1656,30 +1651,23 @@ static int resp_inquiry(struct scsi_cmnd *scp, struct sdebug_dev_info *devip)
 				arr[4] = 0x0;   /* no protection stuff */
 			arr[5] = 0x7;   /* head of q, ordered + simple q's */
 		} else if (0x87 == cmd[2]) { /* mode page policy */
-			arr[1] = cmd[2];	/*sanity */
 			arr[3] = 0x8;	/* number of following entries */
 			arr[4] = 0x2;	/* disconnect-reconnect mp */
 			arr[6] = 0x80;	/* mlus, shared */
 			arr[8] = 0x18;	 /* protocol specific lu */
 			arr[10] = 0x82;	 /* mlus, per initiator port */
 		} else if (0x88 == cmd[2]) { /* SCSI Ports */
-			arr[1] = cmd[2];	/*sanity */
 			arr[3] = inquiry_vpd_88(&arr[4], target_dev_id);
 		} else if (is_disk_zbc && 0x89 == cmd[2]) { /* ATA info */
-			arr[1] = cmd[2];        /*sanity */
 			n = inquiry_vpd_89(&arr[4]);
 			put_unaligned_be16(n, arr + 2);
 		} else if (is_disk_zbc && 0xb0 == cmd[2]) { /* Block limits */
-			arr[1] = cmd[2];        /*sanity */
 			arr[3] = inquiry_vpd_b0(&arr[4]);
 		} else if (is_disk_zbc && 0xb1 == cmd[2]) { /* Block char. */
-			arr[1] = cmd[2];        /*sanity */
 			arr[3] = inquiry_vpd_b1(devip, &arr[4]);
 		} else if (is_disk && 0xb2 == cmd[2]) { /* LB Prov. */
-			arr[1] = cmd[2];        /*sanity */
 			arr[3] = inquiry_vpd_b2(&arr[4]);
 		} else if (is_zbc && cmd[2] == 0xb6) { /* ZB dev. charact. */
-			arr[1] = cmd[2];        /*sanity */
 			arr[3] = inquiry_vpd_b6(devip, &arr[4]);
 		} else {
 			mk_sense_invalid_fld(scp, SDEB_IN_CDB, 2, -1);

* [PATCH 09/13] scsi_debug: Support the block limits extension VPD page
  2023-09-20 19:14 [PATCH 00/13] Pass data temperature information to zoned UFS devices Bart Van Assche
                   ` (7 preceding siblings ...)
  2023-09-20 19:14 ` [PATCH 08/13] scsi_debug: Reduce code duplication Bart Van Assche
@ 2023-09-20 19:14 ` Bart Van Assche
  2023-09-20 19:14 ` [PATCH 10/13] scsi_debug: Rework page code error handling Bart Van Assche
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 61+ messages in thread
From: Bart Van Assche @ 2023-09-20 19:14 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, Bart Van Assche, Douglas Gilbert,
	James E.J. Bottomley

From T10 document 23-024r3.pdf:

"Reduced stream control:
a) reduces the maximum number of streams that the device server supports;
   and
b) increases the number of write commands that are able to specify a stream
   to be written in any write command that contains the GROUP NUMBER field
   in its CDB.

If the RSCS bit (see 6.6.5) is set to one, then the device server shall:
a) support per group stream identifier usage as described in 4.32.2;
b) support the IO Advice Hints Grouping mode page (see 6.5.7); and
c) set the MAXIMUM NUMBER OF STREAMS field (see 6.6.5) to a value that is
   less than 64.

Device servers that set the RSCS bit to one may support other features
(e.g., permanent streams (see 4.32.4)).

4.32.4 Permanent streams

A permanent stream is a stream for which the device server does not allow
closing or otherwise modifying the configuration of that stream. The PERM
bit (see 5.9.2.3) indicates whether a stream is a permanent stream. If a
STREAM CONTROL command (see 5.32) specifies the closing of a permanent
stream, the device server terminates that command with CHECK CONDITION
status instead of closing the specified stream. A permanent stream is always
an open stream. Device servers should assign the lowest numbered stream
identifiers to permanent streams."

Report that reduced stream control is supported.
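
As a sketch of what such a response looks like on the wire (assuming the
same layout that this patch uses: a four-byte header, page length 2 and
RSCS in bit 0 of byte 5), a minimal page 0xb7 can be built as follows.
The function name build_vpd_b7() is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: fill @arr with a minimal Block Limits Extension
 * VPD page that reports RSCS = 1. Returns the number of bytes written,
 * or 0 if the buffer is too small.
 */
static size_t build_vpd_b7(uint8_t *arr, size_t size)
{
	if (!arr || size < 6)
		return 0;
	memset(arr, 0, 6);
	arr[1] = 0xb7;	/* page code */
	arr[3] = 2;	/* page length: two bytes follow the header */
	arr[5] = 1;	/* reduced stream control supported (RSCS) */
	return 6;
}
```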

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi_debug.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 46eaa2f9e63b..88cba9374166 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -1627,6 +1627,7 @@ static int resp_inquiry(struct scsi_cmnd *scp, struct sdebug_dev_info *devip)
 					arr[n++] = 0xb2;  /* LB Provisioning */
 				if (is_zbc)
 					arr[n++] = 0xb6;  /* ZB dev. char. */
+				arr[n++] = 0xb7;  /* Block limits extension */
 			}
 			arr[3] = n - 4;	  /* number of supported VPD pages */
 		} else if (0x80 == cmd[2]) { /* unit serial number */
@@ -1669,6 +1670,9 @@ static int resp_inquiry(struct scsi_cmnd *scp, struct sdebug_dev_info *devip)
 			arr[3] = inquiry_vpd_b2(&arr[4]);
 		} else if (is_zbc && cmd[2] == 0xb6) { /* ZB dev. charact. */
 			arr[3] = inquiry_vpd_b6(devip, &arr[4]);
+		} else if (cmd[2] == 0xb7) { /* block limits extension page */
+			arr[3] = 2; /* page length */
+			arr[5] = 1; /* Reduced stream control support (RSCS) */
 		} else {
 			mk_sense_invalid_fld(scp, SDEB_IN_CDB, 2, -1);
 			kfree(arr);

* [PATCH 10/13] scsi_debug: Rework page code error handling
  2023-09-20 19:14 [PATCH 00/13] Pass data temperature information to zoned UFS devices Bart Van Assche
                   ` (8 preceding siblings ...)
  2023-09-20 19:14 ` [PATCH 09/13] scsi_debug: Support the block limits extension VPD page Bart Van Assche
@ 2023-09-20 19:14 ` Bart Van Assche
  2023-09-20 19:14 ` [PATCH 11/13] scsi_debug: Rework subpage " Bart Van Assche
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 61+ messages in thread
From: Bart Van Assche @ 2023-09-20 19:14 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, Bart Van Assche, Douglas Gilbert,
	James E.J. Bottomley

Instead of tracking whether or not the page code is valid in a boolean
variable, jump to error handling code if an unsupported page code is
encountered.
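
The control flow after this change can be sketched as follows. The page
codes are taken from the switch statement in resp_mode_sense(), but the
function itself and its return values are hypothetical and heavily
simplified.

```c
#include <stdbool.h>

/* Hypothetical result codes for this sketch. */
enum { SKETCH_OK = 0, SKETCH_CHECK_CONDITION = 2 };

/* Validate a mode page code, jumping to a shared error label on failure. */
static int check_page_code(int pcode, bool is_disk)
{
	switch (pcode) {
	case 0x1:	/* read-write error recovery page */
		break;
	case 0x3:	/* format device page, disks only */
		if (!is_disk)
			goto bad_pcode;
		break;
	case 0xa:	/* control mode page, all devices */
		break;
	default:
		goto bad_pcode;
	}
	return SKETCH_OK;

bad_pcode:
	/* Single place to build the sense data for an invalid page code. */
	return SKETCH_CHECK_CONDITION;
}
```

Compared with a boolean flag, the error path is reached directly from the
point of failure and no state has to be checked after the switch.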

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi_debug.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 88cba9374166..6b87d267c9c5 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -2327,7 +2327,7 @@ static int resp_mode_sense(struct scsi_cmnd *scp,
 	unsigned char *ap;
 	unsigned char arr[SDEBUG_MAX_MSENSE_SZ];
 	unsigned char *cmd = scp->cmnd;
-	bool dbd, llbaa, msense_6, is_disk, is_zbc, bad_pcode;
+	bool dbd, llbaa, msense_6, is_disk, is_zbc;
 
 	dbd = !!(cmd[1] & 0x8);		/* disable block descriptors */
 	pcontrol = (cmd[2] & 0xc0) >> 6;
@@ -2391,7 +2391,6 @@ static int resp_mode_sense(struct scsi_cmnd *scp,
 		mk_sense_invalid_fld(scp, SDEB_IN_CDB, 3, -1);
 		return check_condition_result;
 	}
-	bad_pcode = false;
 
 	switch (pcode) {
 	case 0x1:	/* Read-Write error recovery page, direct access */
@@ -2406,15 +2405,17 @@ static int resp_mode_sense(struct scsi_cmnd *scp,
 		if (is_disk) {
 			len = resp_format_pg(ap, pcontrol, target);
 			offset += len;
-		} else
-			bad_pcode = true;
+		} else {
+			goto bad_pcode;
+		}
 		break;
 	case 0x8:	/* Caching page, direct access */
 		if (is_disk || is_zbc) {
 			len = resp_caching_pg(ap, pcontrol, target);
 			offset += len;
-		} else
-			bad_pcode = true;
+		} else {
+			goto bad_pcode;
+		}
 		break;
 	case 0xa:	/* Control Mode page, all devices */
 		len = resp_ctrl_m_pg(ap, pcontrol, target);
@@ -2467,18 +2468,17 @@ static int resp_mode_sense(struct scsi_cmnd *scp,
 		}
 		break;
 	default:
-		bad_pcode = true;
-		break;
-	}
-	if (bad_pcode) {
-		mk_sense_invalid_fld(scp, SDEB_IN_CDB, 2, 5);
-		return check_condition_result;
+		goto bad_pcode;
 	}
 	if (msense_6)
 		arr[0] = offset - 1;
 	else
 		put_unaligned_be16((offset - 2), arr + 0);
 	return fill_from_dev_buffer(scp, arr, min_t(u32, alloc_len, offset));
+
+bad_pcode:
+	mk_sense_invalid_fld(scp, SDEB_IN_CDB, 2, 5);
+	return check_condition_result;
 }
 
 #define SDEBUG_MAX_MSELECT_SZ 512
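
For readers who want to see the control-flow pattern from this patch in
isolation, here is a minimal standalone sketch. It is not the driver code:
sketch_mode_sense(), the RESP_* values and the page lengths are made-up
placeholders. It only shows how a per-case "goto bad_pcode" replaces a
boolean flag that is tested after the switch statement.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status codes, stand-ins for the driver's return values. */
enum { RESP_GOOD = 0, RESP_CHECK_CONDITION = 2 };

/*
 * Simplified model of a MODE SENSE page dispatcher. On success the page
 * length is stored in *len. An unsupported page code jumps to one shared
 * error path instead of setting a flag.
 */
static int sketch_mode_sense(unsigned int pcode, bool is_disk, size_t *len)
{
	switch (pcode) {
	case 0x1:	/* accepted for every device type in this sketch */
		*len = 12;	/* placeholder length */
		break;
	case 0x3:	/* only accepted for disks in this sketch */
		if (!is_disk)
			goto bad_pcode;
		*len = 24;	/* placeholder length */
		break;
	default:
		goto bad_pcode;
	}
	return RESP_GOOD;

bad_pcode:
	/* Single place where the error response is produced. */
	*len = 0;
	return RESP_CHECK_CONDITION;
}
```

Calling sketch_mode_sense(0x3, false, &len) takes the bad_pcode path, just
as an unknown page code does, so the error response is built in one place.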

* [PATCH 11/13] scsi_debug: Rework subpage code error handling
  2023-09-20 19:14 [PATCH 00/13] Pass data temperature information to zoned UFS devices Bart Van Assche
                   ` (9 preceding siblings ...)
  2023-09-20 19:14 ` [PATCH 10/13] scsi_debug: Rework page code error handling Bart Van Assche
@ 2023-09-20 19:14 ` Bart Van Assche
  2023-09-20 19:14 ` [PATCH 12/13] scsi_debug: Implement the IO Advice Hints Grouping mode page Bart Van Assche
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 61+ messages in thread
From: Bart Van Assche @ 2023-09-20 19:14 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, Bart Van Assche, Douglas Gilbert,
	James E.J. Bottomley

Move the subpage code checks into the switch statement to make it easier
to add support for new page code / subpage code combinations.

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi_debug.c | 70 ++++++++++++++++++++-------------------
 1 file changed, 36 insertions(+), 34 deletions(-)

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 6b87d267c9c5..a96eb0d10346 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -2386,22 +2386,22 @@ static int resp_mode_sense(struct scsi_cmnd *scp,
 		ap = arr + offset;
 	}
 
-	if ((subpcode > 0x0) && (subpcode < 0xff) && (0x19 != pcode)) {
-		/* TODO: Control Extension page */
-		mk_sense_invalid_fld(scp, SDEB_IN_CDB, 3, -1);
-		return check_condition_result;
-	}
-
 	switch (pcode) {
 	case 0x1:	/* Read-Write error recovery page, direct access */
+		if (subpcode > 0x0 && subpcode < 0xff)
+			goto bad_subpcode;
 		len = resp_err_recov_pg(ap, pcontrol, target);
 		offset += len;
 		break;
 	case 0x2:	/* Disconnect-Reconnect page, all devices */
+		if (subpcode > 0x0 && subpcode < 0xff)
+			goto bad_subpcode;
 		len = resp_disconnect_pg(ap, pcontrol, target);
 		offset += len;
 		break;
 	case 0x3:       /* Format device page, direct access */
+		if (subpcode > 0x0 && subpcode < 0xff)
+			goto bad_subpcode;
 		if (is_disk) {
 			len = resp_format_pg(ap, pcontrol, target);
 			offset += len;
@@ -2410,6 +2410,8 @@ static int resp_mode_sense(struct scsi_cmnd *scp,
 		}
 		break;
 	case 0x8:	/* Caching page, direct access */
+		if (subpcode > 0x0 && subpcode < 0xff)
+			goto bad_subpcode;
 		if (is_disk || is_zbc) {
 			len = resp_caching_pg(ap, pcontrol, target);
 			offset += len;
@@ -2418,14 +2420,14 @@ static int resp_mode_sense(struct scsi_cmnd *scp,
 		}
 		break;
 	case 0xa:	/* Control Mode page, all devices */
+		if (subpcode > 0x0 && subpcode < 0xff)
+			goto bad_subpcode;
 		len = resp_ctrl_m_pg(ap, pcontrol, target);
 		offset += len;
 		break;
 	case 0x19:	/* if spc==1 then sas phy, control+discover */
-		if ((subpcode > 0x2) && (subpcode < 0xff)) {
-			mk_sense_invalid_fld(scp, SDEB_IN_CDB, 3, -1);
-			return check_condition_result;
-		}
+		if (subpcode > 0x2 && subpcode < 0xff)
+			goto bad_subpcode;
 		len = 0;
 		if ((0x0 == subpcode) || (0xff == subpcode))
 			len += resp_sas_sf_m_pg(ap + len, pcontrol, target);
@@ -2437,35 +2439,31 @@ static int resp_mode_sense(struct scsi_cmnd *scp,
 		offset += len;
 		break;
 	case 0x1c:	/* Informational Exceptions Mode page, all devices */
+		if (subpcode > 0x0 && subpcode < 0xff)
+			goto bad_subpcode;
 		len = resp_iec_m_pg(ap, pcontrol, target);
 		offset += len;
 		break;
 	case 0x3f:	/* Read all Mode pages */
-		if ((0 == subpcode) || (0xff == subpcode)) {
-			len = resp_err_recov_pg(ap, pcontrol, target);
-			len += resp_disconnect_pg(ap + len, pcontrol, target);
-			if (is_disk) {
-				len += resp_format_pg(ap + len, pcontrol,
-						      target);
-				len += resp_caching_pg(ap + len, pcontrol,
-						       target);
-			} else if (is_zbc) {
-				len += resp_caching_pg(ap + len, pcontrol,
-						       target);
-			}
-			len += resp_ctrl_m_pg(ap + len, pcontrol, target);
-			len += resp_sas_sf_m_pg(ap + len, pcontrol, target);
-			if (0xff == subpcode) {
-				len += resp_sas_pcd_m_spg(ap + len, pcontrol,
-						  target, target_dev_id);
-				len += resp_sas_sha_m_spg(ap + len, pcontrol);
-			}
-			len += resp_iec_m_pg(ap + len, pcontrol, target);
-			offset += len;
-		} else {
-			mk_sense_invalid_fld(scp, SDEB_IN_CDB, 3, -1);
-			return check_condition_result;
+		if (subpcode > 0x0 && subpcode < 0xff)
+			goto bad_subpcode;
+		len = resp_err_recov_pg(ap, pcontrol, target);
+		len += resp_disconnect_pg(ap + len, pcontrol, target);
+		if (is_disk) {
+			len += resp_format_pg(ap + len, pcontrol, target);
+			len += resp_caching_pg(ap + len, pcontrol, target);
+		} else if (is_zbc) {
+			len += resp_caching_pg(ap + len, pcontrol, target);
+		}
+		len += resp_ctrl_m_pg(ap + len, pcontrol, target);
+		len += resp_sas_sf_m_pg(ap + len, pcontrol, target);
+		if (0xff == subpcode) {
+			len += resp_sas_pcd_m_spg(ap + len, pcontrol, target,
+						  target_dev_id);
+			len += resp_sas_sha_m_spg(ap + len, pcontrol);
 		}
+		len += resp_iec_m_pg(ap + len, pcontrol, target);
+		offset += len;
 		break;
 	default:
 		goto bad_pcode;
@@ -2479,6 +2477,10 @@ static int resp_mode_sense(struct scsi_cmnd *scp,
 bad_pcode:
 	mk_sense_invalid_fld(scp, SDEB_IN_CDB, 2, 5);
 	return check_condition_result;
+
+bad_subpcode:
+	mk_sense_invalid_fld(scp, SDEB_IN_CDB, 3, -1);
+	return check_condition_result;
 }
 
 #define SDEBUG_MAX_MSELECT_SZ 512

* [PATCH 12/13] scsi_debug: Implement the IO Advice Hints Grouping mode page
  2023-09-20 19:14 [PATCH 00/13] Pass data temperature information to zoned UFS devices Bart Van Assche
                   ` (10 preceding siblings ...)
  2023-09-20 19:14 ` [PATCH 11/13] scsi_debug: Rework subpage " Bart Van Assche
@ 2023-09-20 19:14 ` Bart Van Assche
  2023-09-20 19:14 ` [PATCH 13/13] scsi_debug: Maintain write statistics per group number Bart Van Assche
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 61+ messages in thread
From: Bart Van Assche @ 2023-09-20 19:14 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, Bart Van Assche, Douglas Gilbert,
	James E.J. Bottomley

Implement an IO Advice Hints Grouping mode page with three permanent
streams. A permanent stream is a stream for which the device server does
not allow closing or otherwise modifying the configuration of that
stream. The stream identifier enable (ST_ENBLE) bit specifies whether
the stream identifier may be used in the GROUP NUMBER field of SCSI
WRITE commands.

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi_debug.c | 39 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index a96eb0d10346..ae46bcf8374b 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -2241,6 +2241,33 @@ static int resp_ctrl_m_pg(unsigned char *p, int pcontrol, int target)
 	return sizeof(ctrl_m_pg);
 }
 
+/* IO Advice Hints Grouping mode page */
+static int resp_grouping_m_pg(unsigned char *p, int pcontrol, int target)
+{
+	/* IO Advice Hints Grouping mode page */
+	struct grouping_m_pg {
+		u8 page_code;
+		u8 subpage_code;
+		__be16 page_length;
+		u8 reserved[12];
+		struct scsi_io_group_descriptor descr[4];
+	};
+	static const struct grouping_m_pg gr_m_pg = {
+		.page_code = 0xa | 0x40,
+		.subpage_code = 5,
+		.page_length = cpu_to_be16(sizeof(gr_m_pg) - 4),
+		.descr = {
+			{ .st_enble = 1 },
+			{ .st_enble = 1 },
+			{ .st_enble = 1 },
+			{ .st_enble = 0 },
+		}
+	};
+
+	BUILD_BUG_ON(sizeof(struct grouping_m_pg) != 16 + 4 * 16);
+	memcpy(p, &gr_m_pg, sizeof(gr_m_pg));
+	return sizeof(gr_m_pg);
+}
 
 static int resp_iec_m_pg(unsigned char *p, int pcontrol, int target)
 {	/* Informational Exceptions control mode page for mode_sense */
@@ -2420,9 +2447,17 @@ static int resp_mode_sense(struct scsi_cmnd *scp,
 		}
 		break;
 	case 0xa:	/* Control Mode page, all devices */
-		if (subpcode > 0x0 && subpcode < 0xff)
+		switch (subpcode) {
+		case 0:
+		case 0xff:
+			len = resp_ctrl_m_pg(ap, pcontrol, target);
+			break;
+		case 0x05:
+			len = resp_grouping_m_pg(ap, pcontrol, target);
+			break;
+		default:
 			goto bad_subpcode;
-		len = resp_ctrl_m_pg(ap, pcontrol, target);
+		}
 		offset += len;
 		break;
 	case 0x19:	/* if spc==1 then sas phy, control+discover */
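
The size arithmetic behind the BUILD_BUG_ON() and the page_length
initializer can be checked in userspace with the sketch below. It makes
several assumptions: the descriptor is treated as an opaque 16-byte
placeholder because its real layout is not part of this patch, all sketch_*
names are hypothetical, and setting the SPF bit (0x40) in byte 0 is my
reading of the SPC sub_page mode page format rather than something taken
from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Opaque 16-byte placeholder; the real descriptor layout lives elsewhere. */
struct sketch_group_descriptor {
	uint8_t bytes[16];
};

struct sketch_grouping_m_pg {
	uint8_t page_code;
	uint8_t subpage_code;
	uint8_t page_length[2];		/* big endian */
	uint8_t reserved[12];
	struct sketch_group_descriptor descr[4];
};

/* 16-byte header plus four 16-byte descriptors. */
_Static_assert(sizeof(struct sketch_grouping_m_pg) == 16 + 4 * 16,
	       "unexpected mode page size");

/* Fill @p (at least 80 bytes) and return the number of bytes written. */
static size_t sketch_fill_grouping_pg(uint8_t *p)
{
	struct sketch_grouping_m_pg pg;
	/* PAGE LENGTH does not count the first four bytes of the page. */
	const uint16_t len = sizeof(pg) - 4;

	memset(&pg, 0, sizeof(pg));
	pg.page_code = 0x40 | 0xa;	/* assumption: SPF=1, page code 0xa */
	pg.subpage_code = 0x05;
	pg.page_length[0] = len >> 8;
	pg.page_length[1] = len & 0xff;
	memcpy(p, &pg, sizeof(pg));
	return sizeof(pg);
}
```

With these definitions the page occupies 80 bytes and the PAGE LENGTH field
holds 76 (0x004c).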

* [PATCH 13/13] scsi_debug: Maintain write statistics per group number
  2023-09-20 19:14 [PATCH 00/13] Pass data temperature information to zoned UFS devices Bart Van Assche
                   ` (11 preceding siblings ...)
  2023-09-20 19:14 ` [PATCH 12/13] scsi_debug: Implement the IO Advice Hints Grouping mode page Bart Van Assche
@ 2023-09-20 19:14 ` Bart Van Assche
  2023-09-20 19:28 ` [PATCH 00/13] Pass data temperature information to zoned UFS devices Matthew Wilcox
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 61+ messages in thread
From: Bart Van Assche @ 2023-09-20 19:14 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, Bart Van Assche, Douglas Gilbert,
	James E.J. Bottomley

Track per GROUP NUMBER how many write commands have been processed. Make
this information available in sysfs. Reset these statistics if any data
is written into the sysfs attribute.

Note: SCSI devices should only interpret the information in the GROUP
NUMBER field as a stream identifier if the ST_ENBLE bit has been set to
one. This patch follows a simpler approach: count the number of writes
per GROUP NUMBER whether or not the group number represents a stream
identifier.

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi_debug.c | 51 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 47 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index ae46bcf8374b..728d9a1831e2 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -841,6 +841,8 @@ static int sdeb_zbc_nr_conv = DEF_ZBC_NR_CONV_ZONES;
 static int submit_queues = DEF_SUBMIT_QUEUES;  /* > 1 for multi-queue (mq) */
 static int poll_queues; /* iouring iopoll interface.*/
 
+static atomic_long_t writes_by_group_number[64];
+
 static char sdebug_proc_name[] = MY_NAME;
 static const char *my_name = MY_NAME;
 
@@ -3032,7 +3034,8 @@ static inline struct sdeb_store_info *devip2sip(struct sdebug_dev_info *devip,
 
 /* Returns number of bytes copied or -1 if error. */
 static int do_device_access(struct sdeb_store_info *sip, struct scsi_cmnd *scp,
-			    u32 sg_skip, u64 lba, u32 num, bool do_write)
+			    u32 sg_skip, u64 lba, u32 num, bool do_write,
+			    u8 group_number)
 {
 	int ret;
 	u64 block, rest = 0;
@@ -3051,6 +3054,10 @@ static int do_device_access(struct sdeb_store_info *sip, struct scsi_cmnd *scp,
 		return 0;
 	if (scp->sc_data_direction != dir)
 		return -1;
+
+	if (do_write && group_number < ARRAY_SIZE(writes_by_group_number))
+		atomic_long_inc(&writes_by_group_number[group_number]);
+
 	fsp = sip->storep;
 
 	block = do_div(lba, sdebug_store_sectors);
@@ -3424,7 +3431,7 @@ static int resp_read_dt0(struct scsi_cmnd *scp, struct sdebug_dev_info *devip)
 		}
 	}
 
-	ret = do_device_access(sip, scp, 0, lba, num, false);
+	ret = do_device_access(sip, scp, 0, lba, num, false, 0);
 	sdeb_read_unlock(sip);
 	if (unlikely(ret == -1))
 		return DID_ERROR << 16;
@@ -3609,6 +3616,7 @@ static int resp_write_dt0(struct scsi_cmnd *scp, struct sdebug_dev_info *devip)
 {
 	bool check_prot;
 	u32 num;
+	u8 group = 0;
 	u32 ei_lba;
 	int ret;
 	u64 lba;
@@ -3620,11 +3628,13 @@ static int resp_write_dt0(struct scsi_cmnd *scp, struct sdebug_dev_info *devip)
 		ei_lba = 0;
 		lba = get_unaligned_be64(cmd + 2);
 		num = get_unaligned_be32(cmd + 10);
+		group = cmd[14] & 0x3f;
 		check_prot = true;
 		break;
 	case WRITE_10:
 		ei_lba = 0;
 		lba = get_unaligned_be32(cmd + 2);
+		group = cmd[6] & 0x3f;
 		num = get_unaligned_be16(cmd + 7);
 		check_prot = true;
 		break;
@@ -3639,15 +3649,18 @@ static int resp_write_dt0(struct scsi_cmnd *scp, struct sdebug_dev_info *devip)
 		ei_lba = 0;
 		lba = get_unaligned_be32(cmd + 2);
 		num = get_unaligned_be32(cmd + 6);
+		group = cmd[6] & 0x3f;
 		check_prot = true;
 		break;
 	case 0x53:	/* XDWRITEREAD(10) */
 		ei_lba = 0;
 		lba = get_unaligned_be32(cmd + 2);
+		group = cmd[6] & 0x1f;
 		num = get_unaligned_be16(cmd + 7);
 		check_prot = false;
 		break;
 	default:	/* assume WRITE(32) */
+		group = cmd[6] & 0x3f;
 		lba = get_unaligned_be64(cmd + 12);
 		ei_lba = get_unaligned_be32(cmd + 20);
 		num = get_unaligned_be32(cmd + 28);
@@ -3702,7 +3715,7 @@ static int resp_write_dt0(struct scsi_cmnd *scp, struct sdebug_dev_info *devip)
 		}
 	}
 
-	ret = do_device_access(sip, scp, 0, lba, num, true);
+	ret = do_device_access(sip, scp, 0, lba, num, true, group);
 	if (unlikely(scsi_debug_lbp()))
 		map_region(sip, lba, num);
 	/* If ZBC zone then bump its write pointer */
@@ -3754,12 +3767,14 @@ static int resp_write_scat(struct scsi_cmnd *scp,
 	u32 lb_size = sdebug_sector_size;
 	u32 ei_lba;
 	u64 lba;
+	u8 group;
 	int ret, res;
 	bool is_16;
 	static const u32 lrd_size = 32; /* + parameter list header size */
 
 	if (cmd[0] == VARIABLE_LENGTH_CMD) {
 		is_16 = false;
+		group = cmd[6] & 0x3f;
 		wrprotect = (cmd[10] >> 5) & 0x7;
 		lbdof = get_unaligned_be16(cmd + 12);
 		num_lrd = get_unaligned_be16(cmd + 16);
@@ -3770,6 +3785,7 @@ static int resp_write_scat(struct scsi_cmnd *scp,
 		lbdof = get_unaligned_be16(cmd + 4);
 		num_lrd = get_unaligned_be16(cmd + 8);
 		bt_len = get_unaligned_be32(cmd + 10);
+		group = cmd[14] & 0x3f;
 		if (unlikely(have_dif_prot)) {
 			if (sdebug_dif == T10_PI_TYPE2_PROTECTION &&
 			    wrprotect) {
@@ -3858,7 +3874,8 @@ static int resp_write_scat(struct scsi_cmnd *scp,
 			}
 		}
 
-		ret = do_device_access(sip, scp, sg_off, lba, num, true);
+		ret = do_device_access(sip, scp, sg_off, lba, num, true,
+				       group);
 		/* If ZBC zone then bump its write pointer */
 		if (sdebug_dev_is_zoned(devip))
 			zbc_inc_wp(devip, lba, num);
@@ -6783,6 +6800,31 @@ static ssize_t tur_ms_to_ready_show(struct device_driver *ddp, char *buf)
 }
 static DRIVER_ATTR_RO(tur_ms_to_ready);
 
+static ssize_t group_number_stats_show(struct device_driver *ddp, char *buf)
+{
+	char *p = buf, *end = buf + PAGE_SIZE;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(writes_by_group_number); i++)
+		p += scnprintf(p, end - p, "%d %ld\n", i,
+			       atomic_long_read(&writes_by_group_number[i]));
+
+	return p - buf;
+}
+
+static ssize_t group_number_stats_store(struct device_driver *ddp,
+					const char *buf,
+					size_t count)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(writes_by_group_number); i++)
+		atomic_long_set(&writes_by_group_number[i], 0);
+
+	return count;
+}
+static DRIVER_ATTR_RW(group_number_stats);
+
 /* Note: The following array creates attribute files in the
    /sys/bus/pseudo/drivers/scsi_debug directory. The advantage of these
    files (over those found in the /sys/module/scsi_debug/parameters
@@ -6829,6 +6871,7 @@ static struct attribute *sdebug_drv_attrs[] = {
 	&driver_attr_cdb_len.attr,
 	&driver_attr_tur_ms_to_ready.attr,
 	&driver_attr_zbc.attr,
+	&driver_attr_group_number_stats.attr,
 	NULL,
 };
 ATTRIBUTE_GROUPS(sdebug_drv);
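
The accounting scheme described in the commit message can be modelled in
userspace as follows. This is only a sketch under stated assumptions: it
uses C11 atomics instead of the kernel's atomic_long_t, it handles only
WRITE(10) and WRITE(16) with the byte offsets and masks used in the diff
above, and every sketch_* / SKETCH_* name is hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_WRITE_10		0x2a	/* WRITE(10) opcode */
#define SKETCH_WRITE_16		0x8a	/* WRITE(16) opcode */
#define SKETCH_NR_GROUPS	64	/* GROUP NUMBER is at most 6 bits wide */

/* One write counter per GROUP NUMBER; static storage starts at zero. */
static atomic_long sketch_writes_by_group[SKETCH_NR_GROUPS];

/* Extract GROUP NUMBER; opcodes unknown to this sketch map to group 0. */
static uint8_t sketch_group_number(const uint8_t *cdb)
{
	switch (cdb[0]) {
	case SKETCH_WRITE_10:
		return cdb[6] & 0x3f;
	case SKETCH_WRITE_16:
		return cdb[14] & 0x3f;
	default:
		return 0;
	}
}

/* Count one write command against the group named in its CDB. */
static void sketch_account_write(const uint8_t *cdb)
{
	uint8_t group = sketch_group_number(cdb);

	if (group < SKETCH_NR_GROUPS)
		atomic_fetch_add(&sketch_writes_by_group[group], 1);
}

/* Read the counter of one group, as the sysfs "show" callback would. */
static long sketch_group_writes(unsigned int group)
{
	return atomic_load(&sketch_writes_by_group[group]);
}

/* Reset all counters, as writing to the sysfs attribute would. */
static void sketch_reset(void)
{
	size_t i;

	for (i = 0; i < SKETCH_NR_GROUPS; i++)
		atomic_store(&sketch_writes_by_group[i], 0);
}
```

Like the patch, the sketch counts writes per GROUP NUMBER whether or not the
group number represents a stream identifier.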

* Re: [PATCH 00/13] Pass data temperature information to zoned UFS devices
  2023-09-20 19:14 [PATCH 00/13] Pass data temperature information to zoned UFS devices Bart Van Assche
                   ` (12 preceding siblings ...)
  2023-09-20 19:14 ` [PATCH 13/13] scsi_debug: Maintain write statistics per group number Bart Van Assche
@ 2023-09-20 19:28 ` Matthew Wilcox
  2023-09-20 20:46   ` Bart Van Assche
  2023-09-27 19:14 ` Martin K. Petersen
                   ` (5 subsequent siblings)
  19 siblings, 1 reply; 61+ messages in thread
From: Matthew Wilcox @ 2023-09-20 19:28 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, linux-scsi, linux-fsdevel,
	Martin K . Petersen, Christoph Hellwig

On Wed, Sep 20, 2023 at 12:14:25PM -0700, Bart Van Assche wrote:
> Hi Jens,
> 
> Zoned UFS vendors need the data temperature information. Hence this patch
> series that restores write hint information in F2FS and in the block layer.
> The SCSI disk (sd) driver is modified such that it passes write hint
> information to SCSI devices via the GROUP NUMBER field.

"Need" in what sense?  Can you quantify what improvements we might
see from this patchset?

* Re: [PATCH 00/13] Pass data temperature information to zoned UFS devices
  2023-09-20 19:28 ` [PATCH 00/13] Pass data temperature information to zoned UFS devices Matthew Wilcox
@ 2023-09-20 20:46   ` Bart Van Assche
  2023-09-21  7:46     ` Niklas Cassel
  0 siblings, 1 reply; 61+ messages in thread
From: Bart Van Assche @ 2023-09-20 20:46 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jens Axboe, linux-block, linux-scsi, linux-fsdevel,
	Martin K . Petersen, Christoph Hellwig

On 9/20/23 12:28, Matthew Wilcox wrote:
> On Wed, Sep 20, 2023 at 12:14:25PM -0700, Bart Van Assche wrote:
>> Zoned UFS vendors need the data temperature information. Hence
>> this patch series that restores write hint information in F2FS and
>> in the block layer. The SCSI disk (sd) driver is modified such that
>> it passes write hint information to SCSI devices via the GROUP
>> NUMBER field.
> 
> "Need" in what sense?  Can you quantify what improvements we might 
> see from this patchset?

Hi Matthew,

This is what Jens wrote about 1.5 years ago in reply to complaints about
the removal of write hint support making it impossible to pass write 
hint information to SSD devices: "If at some point there's a
desire to actually try and upstream this support, then we'll be happy to
review that patchset."
(https://lore.kernel.org/linux-block/ef77ef36-df95-8658-ff54-7d8046f5d0e7@kernel.dk/). 
Hence this patch series.

Recently T10 standardized how data temperature information should be 
passed to SCSI devices. One of the patches in this series translates 
write hint information into a data temperature for SCSI devices. This 
can be used by SCSI SSD devices (including UFS devices) to reduce write 
amplification inside the device because host software should assign the 
same data temperature to all data that will be garbage collected at once.

Thanks,

Bart.

* Re: [PATCH 00/13] Pass data temperature information to zoned UFS devices
  2023-09-20 20:46   ` Bart Van Assche
@ 2023-09-21  7:46     ` Niklas Cassel
  2023-09-21 14:27       ` Bart Van Assche
  0 siblings, 1 reply; 61+ messages in thread
From: Niklas Cassel @ 2023-09-21  7:46 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Matthew Wilcox, Jens Axboe, linux-block, linux-scsi,
	linux-fsdevel, Martin K . Petersen, Christoph Hellwig

On Wed, Sep 20, 2023 at 01:46:41PM -0700, Bart Van Assche wrote:
> On 9/20/23 12:28, Matthew Wilcox wrote:
> > On Wed, Sep 20, 2023 at 12:14:25PM -0700, Bart Van Assche wrote:
> > > Zoned UFS vendors need the data temperature information. Hence
> > > this patch series that restores write hint information in F2FS and
> > > in the block layer. The SCSI disk (sd) driver is modified such that
> > > it passes write hint information to SCSI devices via the GROUP
> > > NUMBER field.
> > 
> > "Need" in what sense?  Can you quantify what improvements we might see
> > from this patchset?
> 
> Hi Matthew,
> 
> This is what Jens wrote about 1.5 years ago in reply to complaints about
> the removal of write hint support making it impossible to pass write hint
> information to SSD devices: "If at some point there's a
> desire to actually try and upstream this support, then we'll be happy to
> review that patchset."
> (https://lore.kernel.org/linux-block/ef77ef36-df95-8658-ff54-7d8046f5d0e7@kernel.dk/).
> Hence this patch series.
> 
> Recently T10 standardized how data temperature information should be passed
> to SCSI devices. One of the patches in this series translates write hint
> information into a data temperature for SCSI devices. This can be used by
> SCSI SSD devices (including UFS devices) to reduce write amplification
> inside the device because host software should assign the same data
> temperature to all data that will be garbage collected at once.

Hello Bart,

Considering that this API (F_GET_FILE_RW_HINT / F_SET_FILE_RW_HINT)
was previously only used by NVMe (NVMe streams).

Yet, this API and the support in NVMe (NVMe streams) was removed.

Now you want to re-add the same API, but this time, it will only
be used by SCSI.

Since you basically revert (some of) the patches, I would have expected
the cover letter to at least mention NVMe somewhere.

Should NVMe streams be brought back? Yes? No?
While I have a strong guess of what the NVMe maintainers will say, I think
that your cover letter should mention "why"/"why not" the NVMe support
"is"/"is not" reverted.


Kind regards,
Niklas

* Re: [PATCH 00/13] Pass data temperature information to zoned UFS devices
  2023-09-21  7:46     ` Niklas Cassel
@ 2023-09-21 14:27       ` Bart Van Assche
  2023-09-21 15:34         ` Niklas Cassel
  2023-09-21 19:27         ` Matthew Wilcox
  0 siblings, 2 replies; 61+ messages in thread
From: Bart Van Assche @ 2023-09-21 14:27 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Matthew Wilcox, Jens Axboe, linux-block, linux-scsi,
	linux-fsdevel, Martin K . Petersen, Christoph Hellwig

On 9/21/23 00:46, Niklas Cassel wrote:
> Considering that this API (F_GET_FILE_RW_HINT / F_SET_FILE_RW_HINT) 
> was previously only used by NVMe (NVMe streams).

That doesn't sound correct to me. I think support for this API was added
in F2FS in November 2017 (commit 4f0a03d34dd4 ("f2fs: apply write hints
to select the type of segments for buffered write")). That was a few
months after NVMe stream support was added (June 2017) by commit
f5d118406247 ("nvme: add support for streams and directives").

> Should NVMe streams be brought back? Yes? No?

From commit 561593a048d7 ("Merge tag 'for-5.18/write-streams-2022-03-18'
of git://git.kernel.dk/linux-block"): "This removes the write streams
support in NVMe. No vendor ever really shipped working support for this,
and they are not interested in supporting it."

I do not want to reopen the discussion about NVMe streams.

Thanks,

Bart.

* Re: [PATCH 00/13] Pass data temperature information to zoned UFS devices
  2023-09-21 14:27       ` Bart Van Assche
@ 2023-09-21 15:34         ` Niklas Cassel
  2023-09-21 17:00           ` Bart Van Assche
  2023-09-21 19:27         ` Matthew Wilcox
  1 sibling, 1 reply; 61+ messages in thread
From: Niklas Cassel @ 2023-09-21 15:34 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Matthew Wilcox, Jens Axboe, linux-block, linux-scsi,
	linux-fsdevel, Martin K . Petersen, Christoph Hellwig

Hello Bart,

On Thu, Sep 21, 2023 at 07:27:08AM -0700, Bart Van Assche wrote:
> On 9/21/23 00:46, Niklas Cassel wrote:
> > Considering that this API (F_GET_FILE_RW_HINT / F_SET_FILE_RW_HINT) was
> > previously only used by NVMe (NVMe streams).
> 
> That doesn't sound correct to me. I think support for this API was added
> in F2FS in November 2017 (commit 4f0a03d34dd4 ("f2fs: apply write hints
> to select the type of segments for buffered write")). That was a few
> months after NVMe stream support was added (June 2017) by commit
> f5d118406247 ("nvme: add support for streams and directives").

I wrote "this API (F_GET_FILE_RW_HINT / F_SET_FILE_RW_HINT)",
i.e. the support for hints in the block layer.

This addition to the block layer API was added in:
c75b1d9421f8 ("fs: add fcntl() interface for setting/getting write life time hints")

As part of this series:
https://lore.kernel.org/linux-block/1498491480-16306-1-git-send-email-axboe@kernel.dk/

So this support included:
-the block layer API changes
-the support for NVMe streams


The modifications to f2fs to actually make use of these block layer write
hints were not included in this initial series. They were added several
months later.


> From commit 561593a048d7 ("Merge tag 'for-5.18/write-streams-2022-03-18'
> of git://git.kernel.dk/linux-block"): "This removes the write streams
> support in NVMe. No vendor ever really shipped working support for this,
> and they are not interested in supporting it."
> 
> I do not want to reopen the discussion about NVMe streams.

I don't think we need to.

I simply think that your cover letter should mention it somehow,
since the whole reason why the block layer API was merged was to be
able to support NVMe streams.

Since you are bringing back this API, I think that you should at least
mention that you are not bringing back NVMe streams, that you are
bringing back the support for f2fs, and that you are adding support for
SCSI, with a short motivation of why support is needed in both SCSI and
f2fs.

Right now your cover letter is 4 lines :)
I don't recall when I last saw such a small cover letter for a feature
impacting so many different parts of the kernel.


Kind regards,
Niklas

* Re: [PATCH 00/13] Pass data temperature information to zoned UFS devices
  2023-09-21 15:34         ` Niklas Cassel
@ 2023-09-21 17:00           ` Bart Van Assche
  0 siblings, 0 replies; 61+ messages in thread
From: Bart Van Assche @ 2023-09-21 17:00 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Matthew Wilcox, Jens Axboe, linux-block, linux-scsi,
	linux-fsdevel, Martin K . Petersen, Christoph Hellwig

On 9/21/23 08:34, Niklas Cassel wrote:
> Right now your cover letter is 4 lines :)
> I don't recall when I last saw such a small cover letter for a feature
> impacting so many different parts of the kernel.

I will expand the cover letter if I have to repost this patch series.

Thanks,

Bart.

* Re: [PATCH 00/13] Pass data temperature information to zoned UFS devices
  2023-09-21 14:27       ` Bart Van Assche
  2023-09-21 15:34         ` Niklas Cassel
@ 2023-09-21 19:27         ` Matthew Wilcox
  2023-09-21 19:39           ` Bart Van Assche
  1 sibling, 1 reply; 61+ messages in thread
From: Matthew Wilcox @ 2023-09-21 19:27 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Niklas Cassel, Jens Axboe, linux-block, linux-scsi,
	linux-fsdevel, Martin K . Petersen, Christoph Hellwig

On Thu, Sep 21, 2023 at 07:27:08AM -0700, Bart Van Assche wrote:
> On 9/21/23 00:46, Niklas Cassel wrote:
> > Considering that this API (F_GET_FILE_RW_HINT / F_SET_FILE_RW_HINT) was
> > previously only used by NVMe (NVMe streams).
> 
> That doesn't sound correct to me. I think support for this API was added
> in F2FS in November 2017 (commit 4f0a03d34dd4 ("f2fs: apply write hints
> to select the type of segments for buffered write")). That was a few
> months after NVMe stream support was added (June 2017) by commit
> f5d118406247 ("nvme: add support for streams and directives").
> 
> > Should NVMe streams be brought back? Yes? No?
> 
> From commit 561593a048d7 ("Merge tag 'for-5.18/write-streams-2022-03-18'
> of git://git.kernel.dk/linux-block"): "This removes the write streams
> support in NVMe. No vendor ever really shipped working support for this,
> and they are not interested in supporting it."

It sounds like UFS is at the same stage that NVMe got to -- standard
exists, no vendor has committed to actually shipping it.  Isn't bringing
it back a little premature?

* Re: [PATCH 00/13] Pass data temperature information to zoned UFS devices
  2023-09-21 19:27         ` Matthew Wilcox
@ 2023-09-21 19:39           ` Bart Van Assche
  2023-09-21 19:46             ` Matthew Wilcox
  0 siblings, 1 reply; 61+ messages in thread
From: Bart Van Assche @ 2023-09-21 19:39 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Niklas Cassel, Jens Axboe, linux-block, linux-scsi,
	linux-fsdevel, Martin K . Petersen, Christoph Hellwig

On 9/21/23 12:27, Matthew Wilcox wrote:
> On Thu, Sep 21, 2023 at 07:27:08AM -0700, Bart Van Assche wrote:
>> On 9/21/23 00:46, Niklas Cassel wrote:
>>> Should NVMe streams be brought back? Yes? No?
>>
>> From commit 561593a048d7 ("Merge tag 'for-5.18/write-streams-2022-03-18'
>> of git://git.kernel.dk/linux-block"): "This removes the write streams
>> support in NVMe. No vendor ever really shipped working support for this,
>> and they are not interested in supporting it."
> 
> It sounds like UFS is at the same stage that NVMe got to -- standard
> exists, no vendor has committed to actually shipping it.  Isn't bringing
> it back a little premature?

Hi Matthew,

That's a misunderstanding. UFS vendors have supported interpreting the
SCSI GROUP NUMBER as a data temperature for many years, probably for more
than ten years. Additionally, for multiple UFS vendors having the data 
temperature available is important for achieving good performance. This 
message shows how UFS vendors were using that information before write 
hint support was removed: 
https://lore.kernel.org/linux-block/PH0PR08MB7889642784B2E1FC1799A828DB0B9@PH0PR08MB7889.namprd08.prod.outlook.com/

This patch series implements support for passing data temperature 
information from F2FS to UFS devices in a standards-compliant way.

Thanks,

Bart.

* Re: [PATCH 00/13] Pass data temperature information to zoned UFS devices
  2023-09-21 19:39           ` Bart Van Assche
@ 2023-09-21 19:46             ` Matthew Wilcox
  2023-09-21 20:11               ` Bart Van Assche
  2023-09-21 20:47               ` Jaegeuk Kim
  0 siblings, 2 replies; 61+ messages in thread
From: Matthew Wilcox @ 2023-09-21 19:46 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Niklas Cassel, Jens Axboe, linux-block, linux-scsi,
	linux-fsdevel, Martin K . Petersen, Christoph Hellwig

On Thu, Sep 21, 2023 at 12:39:00PM -0700, Bart Van Assche wrote:
> On 9/21/23 12:27, Matthew Wilcox wrote:
> > On Thu, Sep 21, 2023 at 07:27:08AM -0700, Bart Van Assche wrote:
> > > On 9/21/23 00:46, Niklas Cassel wrote:
> > > > Should NVMe streams be brought back? Yes? No?
> > > 
> > > From commit 561593a048d7 ("Merge tag 'for-5.18/write-streams-2022-03-18'
> > > of git://git.kernel.dk/linux-block"): "This removes the write streams
> > > support in NVMe. No vendor ever really shipped working support for this,
> > > and they are not interested in supporting it."
> > 
> > It sounds like UFS is at the same stage that NVMe got to -- standard
> > exists, no vendor has committed to actually shipping it.  Isn't bringing
> > it back a little premature?
> 
> Hi Matthew,
> 
> That's a misunderstanding. UFS vendors have supported interpreting the SCSI
> GROUP NUMBER as a data temperature for many years, probably for more than
> ten years. Additionally, for multiple UFS vendors, having the data temperature
> available is important for achieving good performance. This message shows
> how UFS vendors were using that information before write hint support was
> removed: https://lore.kernel.org/linux-block/PH0PR08MB7889642784B2E1FC1799A828DB0B9@PH0PR08MB7889.namprd08.prod.outlook.com/

If vendor support already exists, then why did you dodge the question
asking for quantified data that I asked earlier?  And can we have that
data now?

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 00/13] Pass data temperature information to zoned UFS devices
  2023-09-21 19:46             ` Matthew Wilcox
@ 2023-09-21 20:11               ` Bart Van Assche
  2023-09-21 20:47               ` Jaegeuk Kim
  1 sibling, 0 replies; 61+ messages in thread
From: Bart Van Assche @ 2023-09-21 20:11 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Niklas Cassel, Jens Axboe, linux-block, linux-scsi,
	linux-fsdevel, Martin K . Petersen, Christoph Hellwig, Bean Huo,
	Bean Huo, Avri Altman, Daejun Park, Luca Porzio

On 9/21/23 12:46, Matthew Wilcox wrote:
> If vendor support already exists, then why did you dodge the question
> asking for quantified data that I asked earlier?  And can we have that
> data now?

From Rho, Eunhee, Kanchan Joshi, Seung-Uk Shin, Nitesh Jagadeesh
Shetty, Jooyoung Hwang, Sangyeun Cho, Daniel DG Lee, and Jaeheon Jeong.
"FStream: Managing Flash Streams in the File System." In 16th USENIX
Conference on File and Storage Technologies (FAST 18), pp. 257-264. 
2018: "Experimental results show that FStream enhances the filebench 
performance by 5%∼35% and reduces WAF (Write Amplification Factor) by 
7%∼46%. For a NoSQL database benchmark, performance is improved by up to 
38% and WAF is reduced by up to 81%." Please note that these results are 
for ext4 instead of F2FS. The benefit for F2FS is probably smaller since 
F2FS is optimized for NAND flash media.

I have Cc-ed open source contributors from multiple UFS vendors on this 
email and I hope that they can share performance numbers for F2FS.

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 00/13] Pass data temperature information to zoned UFS devices
  2023-09-21 19:46             ` Matthew Wilcox
  2023-09-21 20:11               ` Bart Van Assche
@ 2023-09-21 20:47               ` Jaegeuk Kim
  1 sibling, 0 replies; 61+ messages in thread
From: Jaegeuk Kim @ 2023-09-21 20:47 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Bart Van Assche, Niklas Cassel, Jens Axboe, linux-block,
	linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig

On 09/21, Matthew Wilcox wrote:
> On Thu, Sep 21, 2023 at 12:39:00PM -0700, Bart Van Assche wrote:
> > On 9/21/23 12:27, Matthew Wilcox wrote:
> > > On Thu, Sep 21, 2023 at 07:27:08AM -0700, Bart Van Assche wrote:
> > > > On 9/21/23 00:46, Niklas Cassel wrote:
> > > > > Should NVMe streams be brought back? Yes? No?
> > > > 
> > > > From commit 561593a048d7 ("Merge tag 'for-5.18/write-streams-2022-03-18'
> > > > of git://git.kernel.dk/linux-block"): "This removes the write streams
> > > > support in NVMe. No vendor ever really shipped working support for this,
> > > > and they are not interested in supporting it."
> > > 
> > > It sounds like UFS is at the same stage that NVMe got to -- standard
> > > exists, no vendor has committed to actually shipping it.  Isn't bringing
> > > it back a little premature?
> > 
> > Hi Matthew,
> > 
> > > That's a misunderstanding. UFS vendors have supported interpreting the SCSI
> > > GROUP NUMBER as a data temperature for many years, probably for more than
> > > ten years. Additionally, for multiple UFS vendors, having the data temperature
> > available is important for achieving good performance. This message shows
> > how UFS vendors were using that information before write hint support was
> > removed: https://lore.kernel.org/linux-block/PH0PR08MB7889642784B2E1FC1799A828DB0B9@PH0PR08MB7889.namprd08.prod.outlook.com/
> 
> If vendor support already exists, then why did you dodge the question
> asking for quantified data that I asked earlier?  And can we have that
> data now?

I doubt that this patch set really requires the quantified data, which may be
mostly confidential to all the companies, also given that the reason for the
revert was that there was no user, IIUC. OTOH, I'm not sure whether you're
familiar with FTLs, but when we consider the entire stack, ranging from f2fs to
the FTL that manages NAND blocks, I do see a clear benefit in giving temperature
hints to the FTL so that it can align its garbage collection unit with the one
in f2fs, which is the key idea of Zoned UFS in the mobile world, I believe.
Otherwise, the device can show non-deterministic, longer write latencies due to
internal GCs and increase WAI, leading to a shorter lifetime.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 00/13] Pass data temperature information to zoned UFS devices
  2023-09-20 19:14 [PATCH 00/13] Pass data temperature information to zoned UFS devices Bart Van Assche
                   ` (13 preceding siblings ...)
  2023-09-20 19:28 ` [PATCH 00/13] Pass data temperature information to zoned UFS devices Matthew Wilcox
@ 2023-09-27 19:14 ` Martin K. Petersen
  2023-09-27 20:49   ` Bart Van Assche
  2023-10-02 11:38   ` Niklas Cassel
       [not found] ` <CGME20230920191557epcas2p34a114957acf221c0d8f60acbb3107c77@epcms2p6>
                   ` (4 subsequent siblings)
  19 siblings, 2 replies; 61+ messages in thread
From: Martin K. Petersen @ 2023-09-27 19:14 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, linux-scsi, linux-fsdevel,
	Martin K . Petersen, Christoph Hellwig


Hi Bart!

> Zoned UFS vendors need the data temperature information. Hence this
> patch series that restores write hint information in F2FS and in the
> block layer. The SCSI disk (sd) driver is modified such that it passes
> write hint information to SCSI devices via the GROUP NUMBER field.

I don't have any particular problems with your implementation, although
I'm still trying to wrap my head around how to make this coexist with my
I/O hinting series. But I guess there's probably not going to be a big
overlap between devices that support both features.

However, it still pains me greatly to see the SBC proposal being
intertwined with the travesty that is streams. Why not define everything
in the IO advice hints group descriptor? I/O hints already use GROUP
NUMBER as an index. Why not just define a few permanent hint
descriptors? What's the point of the additional level of indirection to
tie this new feature into streams? RSCS basically says "ignore the
streams-specific bits and bobs and do this other stuff instead". What
does the streams infrastructure provide that can't be solved trivially
in the IO advise mode page alone?

For existing UFS devices which predate RSCS and streams but which
support getting data temperature from GROUP NUMBER, what is the
mechanism for detecting and enabling the feature?

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 00/13] Pass data temperature information to zoned UFS devices
  2023-09-27 19:14 ` Martin K. Petersen
@ 2023-09-27 20:49   ` Bart Van Assche
  2023-10-02 11:38   ` Niklas Cassel
  1 sibling, 0 replies; 61+ messages in thread
From: Bart Van Assche @ 2023-09-27 20:49 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Jens Axboe, linux-block, linux-scsi, linux-fsdevel, Christoph Hellwig

On 9/27/23 12:14, Martin K. Petersen wrote:
> I don't have any particular problems with your implementation, 
> although I'm still trying to wrap my head around how to make this 
> coexist with my I/O hinting series. But I guess there's probably not
> going to be a big overlap between devices that support both 
> features.

Hi Martin,

This patch series should make it easier to implement I/O hint support
since some of the code added by this patch series is also needed to
implement I/O hint support.

> However, it still pains me greatly to see the SBC proposal being 
> intertwined with the travesty that is streams. Why not define 
> everything in the IO advice hints group descriptor? I/O hints already
> use GROUP NUMBER as an index. Why not just define a few permanent
> hint descriptors? What's the point of the additional level of
> indirection to tie this new feature into streams? RSCS basically says
> "ignore the streams-specific bits and bobs and do this other stuff
> instead". What does the streams infrastructure provide that can't be
> solved trivially in the IO advise mode page alone?

Hmm ... isn't that exactly what T10 did, define everything in the IO
advice hints group descriptor by introducing the new ST_ENBLE bit in
that descriptor?

This patch series relies on the constrained streams mechanism. A
constrained stream is permanently open. The new ST_ENBLE bit in the IO
advice hints group descriptor indicates whether or not an IO advice
hints group represents a permanent stream.

The new ST_ENBLE bit in the IO advice hints group descriptor allows SCSI
devices to interpret the index of the descriptor as a data lifetime.
From the approved T10 proposal:

Table x1 – RELATIVE LIFETIME field
..............................................
Code        Relative lifetime
..............................................
00h         no relative lifetime is applicable
01h         shortest relative lifetime
02h         second shortest relative lifetime
03h to 3Dh  intermediate relative lifetimes
3Eh         second longest relative lifetime
3Fh         longest relative lifetime
..............................................
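
To make the table concrete, here is a minimal sketch that classifies a
six-bit RELATIVE LIFETIME code into the categories listed above. It is not
taken from the patch series; the enum and the function name are hypothetical
and only illustrate how the code ranges partition the field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, chosen for this sketch only. */
enum sk_rel_lifetime {
	SK_REL_NONE,		/* 00h */
	SK_REL_SHORTEST,	/* 01h */
	SK_REL_SECOND_SHORTEST,	/* 02h */
	SK_REL_INTERMEDIATE,	/* 03h to 3Dh */
	SK_REL_SECOND_LONGEST,	/* 3Eh */
	SK_REL_LONGEST,		/* 3Fh */
};

/* Map a RELATIVE LIFETIME code onto the rows of the table above. */
static enum sk_rel_lifetime sk_classify_relative_lifetime(uint8_t code)
{
	assert(code <= 0x3f);	/* the table ends at 3Fh */

	switch (code) {
	case 0x00:
		return SK_REL_NONE;
	case 0x01:
		return SK_REL_SHORTEST;
	case 0x02:
		return SK_REL_SECOND_SHORTEST;
	case 0x3e:
		return SK_REL_SECOND_LONGEST;
	case 0x3f:
		return SK_REL_LONGEST;
	default:
		return SK_REL_INTERMEDIATE;
	}
}
```
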

> For existing UFS devices which predate RSCS and streams but which 
> support getting data temperature from GROUP NUMBER, what is the 
> mechanism for detecting and enabling the feature?

We plan to ask UFS device vendors to modify the UFS device firmware and
to add support for the VPD and mode pages this patch series relies on.
My understanding is that this can be done easily in UFS device firmware.

Although it is technically possible to update the firmware of UFS
devices in smartphones, most smartphones do not support this because
this is considered risky. Hence, only new smartphones will benefit from
this patch series.

I do not want to add support in the Linux kernel for how conventional
UFS devices use the GROUP NUMBER field today. Conventional UFS devices
interpret the GROUP NUMBER field as a "ContextID". The ContextID
mechanism has a state, just like the SCSI stream mechanism. UFS contexts
need to be opened explicitly and are closed upon reset. From the UFS 4.0
specification: "No ContextID shall be open after power cycle."

Please let me know if you need more information.

Bart.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH 01/13] fs/f2fs: Restore the whint_mode mount option
  2023-09-20 19:14 ` [PATCH 01/13] fs/f2fs: Restore the whint_mode mount option Bart Van Assche
@ 2023-10-02 10:32   ` Avri Altman
  2023-10-03 19:33   ` Bean Huo
  1 sibling, 0 replies; 61+ messages in thread
From: Avri Altman @ 2023-10-02 10:32 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, Jaegeuk Kim, Chao Yu, Jonathan Corbet

> Restore support for the whint_mode mount option by reverting commit
> 930e2607638d ("f2fs: remove obsolete whint_mode").
> 
> Cc: Jaegeuk Kim <jaegeuk@kernel.org>
> Cc: Chao Yu <chao@kernel.org>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Avri Altman <avri.altman@wdc.com>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH 02/13] fs: Restore support for F_GET_FILE_RW_HINT and F_SET_FILE_RW_HINT
  2023-09-20 19:14 ` [PATCH 02/13] fs: Restore support for F_GET_FILE_RW_HINT and F_SET_FILE_RW_HINT Bart Van Assche
@ 2023-10-02 10:35   ` Avri Altman
  2023-10-03 19:42   ` Bean Huo
  1 sibling, 0 replies; 61+ messages in thread
From: Avri Altman @ 2023-10-02 10:35 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, Dave Chinner, Alexander Viro,
	Christian Brauner, Jeff Layton, Chuck Lever

> Restore support for F_GET_FILE_RW_HINT and F_SET_FILE_RW_HINT by
> reverting commit 7b12e49669c9 ("fs: remove fs.f_write_hint").
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Avri Altman <avri.altman@wdc.com>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH 03/13] fs: Restore kiocb.ki_hint
  2023-09-20 19:14 ` [PATCH 03/13] fs: Restore kiocb.ki_hint Bart Van Assche
@ 2023-10-02 10:45   ` Avri Altman
  2023-10-02 16:39     ` Bart Van Assche
  0 siblings, 1 reply; 61+ messages in thread
From: Avri Altman @ 2023-10-02 10:45 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, Dave Chinner, Alexander Viro,
	Christian Brauner, Benjamin LaHaise, David Howells, Jaegeuk Kim,
	Chao Yu, Steven Rostedt, Masami Hiramatsu

 
> Restore support for passing write hint information from a filesystem to the
> block layer. Write hint information can be set via fcntl(fd, F_SET_RW_HINT,
> &hint). This patch reverts commit 41d36a9f3e53 ("fs: remove kiocb.ki_hint").
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

.....

> diff --git a/io_uring/rw.c b/io_uring/rw.c
> index c8c822fa7980..c41ae6654116 100644
> --- a/io_uring/rw.c
> +++ b/io_uring/rw.c
> @@ -677,6 +677,7 @@ static int io_rw_init_file(struct io_kiocb *req,
> fmode_t mode)
>                 req->flags |= io_file_get_flags(file);
> 
>         kiocb->ki_flags = file->f_iocb_flags;
> +       kiocb->ki_hint = file_inode(file)->i_write_hint;
Originally ki_hint_validate() was used here as well?

Thanks,
Avri

>         ret = kiocb_set_rw_flags(kiocb, rw->flags);
>         if (unlikely(ret))
>                 return ret;

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH 04/13] block: Restore write hint support
  2023-09-20 19:14 ` [PATCH 04/13] block: Restore write hint support Bart Van Assche
@ 2023-10-02 11:23   ` Avri Altman
  2023-10-02 17:02     ` Bart Van Assche
  2023-10-02 18:08   ` Avri Altman
  2023-10-03 19:52   ` Bean Huo
  2 siblings, 1 reply; 61+ messages in thread
From: Avri Altman @ 2023-10-02 11:23 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, Alexander Viro, Christian Brauner,
	Jaegeuk Kim, Chao Yu, Darrick J. Wong

> This patch partially reverts commit c75e707fe1aa ("block: remove the
> per-bio/request write hint"). The following aspects of that commit have
> been reverted:
> - Pass the struct kiocb write hint information to struct bio.
> - Pass the struct bio write hint information to struct request.
> - Do not merge requests with different write hints.
> - Passing write hint information from the VFS layer to the block layer.
> - In F2FS, initialization of bio.bi_write_hint.
> 
> The following aspects of that commit have been dropped:
> - Debugfs support for retrieving and modifying write hints.
Any particular reason to leave those out?

> - md-raid, BTRFS, ext4, gfs2 and zonefs write hint support.
Native Linux with ext4 is being used in automotive, and even mobile platforms.
E.g. Qualcomm's RB5 is formally maintained with Debian - https://releases.linaro.org/96boards/rb5/linaro/debian/21.12/

Thanks,
Avri
> - The write_hints[] array in struct request_queue.
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH 05/13] scsi: core: Query the Block Limits Extension VPD page
  2023-09-20 19:14 ` [PATCH 05/13] scsi: core: Query the Block Limits Extension VPD page Bart Van Assche
@ 2023-10-02 11:29   ` Avri Altman
  0 siblings, 0 replies; 61+ messages in thread
From: Avri Altman @ 2023-10-02 11:29 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, James E.J. Bottomley

> Parse the Reduced Stream Control Supported (RSCS) bit from the block limits
> extension VPD page. The RSCS bit is defined in T10 document
> "SBC-5 Constrained Streams with Data Lifetimes"
> (https://www.t10.org/cgi-bin/ac.pl?t=d&f=23-024r3.pdf).
> 
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Avri Altman <avri.altman@wdc.com>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 00/13] Pass data temperature information to zoned UFS devices
  2023-09-27 19:14 ` Martin K. Petersen
  2023-09-27 20:49   ` Bart Van Assche
@ 2023-10-02 11:38   ` Niklas Cassel
  2023-10-02 11:53     ` Niklas Cassel
                       ` (2 more replies)
  1 sibling, 3 replies; 61+ messages in thread
From: Niklas Cassel @ 2023-10-02 11:38 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Bart Van Assche, Jens Axboe, linux-block, linux-scsi,
	linux-fsdevel, Christoph Hellwig, Damien Le Moal

On Wed, Sep 27, 2023 at 03:14:10PM -0400, Martin K. Petersen wrote:
> 
> Hi Bart!
> 
> > Zoned UFS vendors need the data temperature information. Hence this
> > patch series that restores write hint information in F2FS and in the
> > block layer. The SCSI disk (sd) driver is modified such that it passes
> > write hint information to SCSI devices via the GROUP NUMBER field.
> 
> I don't have any particular problems with your implementation, although
> I'm still trying to wrap my head around how to make this coexist with my
> I/O hinting series. But I guess there's probably not going to be a big
> overlap between devices that support both features.

Hello Bart, Martin,

I don't know which user facing API Martin's I/O hinting series is intending
to use.

However, while discussing this series at ALPSS, we did ask ourselves why this
series is not reusing the already existing block layer API for providing I/O
hints:
https://github.com/torvalds/linux/blob/v6.6-rc4/include/uapi/linux/ioprio.h#L83-L103

We can have 1023 possible I/O hints, and so far we are only using 7, which
means that there are 1016 possible hints left.
This also enables you to define more than the 4 previous temperature hints
(extreme, long, medium, short), if so desired.

There is also support in fio for these I/O hints:
https://github.com/axboe/fio/blob/master/HOWTO.rst?plain=1#L2294-L2302

When this new I/O hint API was added, there was no other I/O hint API
in the kernel (since the old fcntl() F_GET_FILE_RW_HINT / F_SET_FILE_RW_HINT
API had already been removed when this new API was added).

So there should probably be a good argument why we would want to introduce
yet another API for providing I/O hints, instead of extending the I/O hint
API that we already have in the kernel right now.
(Especially since it seems fairly easy to modify your patches to reuse the
existing API.)
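
The arithmetic behind "1023 possible hints, 7 in use, 1016 left" can be
sketched as below. The constants mirror, from memory, the layout in the
ioprio.h header linked above (a 3-bit class, a 10-bit hint and a 3-bit
level packed into 16 bits); treat them as assumptions and check the header
before relying on them. The SK_ names and helper functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of a 16-bit ioprio value: class[15:13] hint[12:3] level[2:0] */
#define SK_IOPRIO_CLASS_SHIFT	13
#define SK_IOPRIO_LEVEL_NR_BITS	3
#define SK_IOPRIO_HINT_SHIFT	SK_IOPRIO_LEVEL_NR_BITS
#define SK_IOPRIO_HINT_NR_BITS	10
#define SK_IOPRIO_NR_HINTS	(1u << SK_IOPRIO_HINT_NR_BITS)
#define SK_IOPRIO_HINT_MASK	(SK_IOPRIO_NR_HINTS - 1)

/* Pack class, hint and level into one ioprio value. */
static uint16_t sk_ioprio_value(unsigned int cls, unsigned int level,
				unsigned int hint)
{
	assert(cls < 8 && level < 8 && hint < SK_IOPRIO_NR_HINTS);
	return (uint16_t)((cls << SK_IOPRIO_CLASS_SHIFT) |
			  (hint << SK_IOPRIO_HINT_SHIFT) | level);
}

/* Extract the hint field again. */
static unsigned int sk_ioprio_hint(uint16_t ioprio)
{
	return (ioprio >> SK_IOPRIO_HINT_SHIFT) & SK_IOPRIO_HINT_MASK;
}

/* Hint value 0 means "no hint", so NR_HINTS - 1 values are usable. */
static unsigned int sk_unused_hints(unsigned int hints_in_use)
{
	assert(hints_in_use <= SK_IOPRIO_NR_HINTS - 1);
	return (SK_IOPRIO_NR_HINTS - 1) - hints_in_use;
}
```

With the seven command duration limit hints in use, sk_unused_hints(7)
yields the 1016 free values mentioned above.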


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH 06/13] scsi_proto: Add struct io_group_descriptor
  2023-09-20 19:14 ` [PATCH 06/13] scsi_proto: Add struct io_group_descriptor Bart Van Assche
@ 2023-10-02 11:41   ` Avri Altman
  2023-10-02 17:16     ` Bart Van Assche
  2023-10-02 18:16   ` Avri Altman
  1 sibling, 1 reply; 61+ messages in thread
From: Avri Altman @ 2023-10-02 11:41 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, James E.J. Bottomley

> Prepare for adding code that will fill in and parse this data structure.
> 
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  include/scsi/scsi_proto.h | 40
> +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
> 
> diff --git a/include/scsi/scsi_proto.h b/include/scsi/scsi_proto.h
> index 07d65c1f59db..4e3691cb67da 100644
> --- a/include/scsi/scsi_proto.h
> +++ b/include/scsi/scsi_proto.h
> @@ -10,6 +10,7 @@
>  #ifndef _SCSI_PROTO_H_
>  #define _SCSI_PROTO_H_
> 
> +#include <linux/build_bug.h>
>  #include <linux/types.h>
> 
>  /*
> @@ -275,6 +276,45 @@ struct scsi_lun {
>         __u8 scsi_lun[8];
>  };
> 
> +/* SBC-5 IO advice hints group descriptor */
> +struct scsi_io_group_descriptor {
> +#if defined(__BIG_ENDIAN)
> +       u8 io_advice_hints_mode: 2;
> +       u8 reserved1: 3;
> +       u8 st_enble: 1;
> +       u8 cs_enble: 1;
> +       u8 ic_enable: 1;
> +#elif defined(__LITTLE_ENDIAN)
> +       u8 ic_enable: 1;
> +       u8 cs_enble: 1;
> +       u8 st_enble: 1;
> +       u8 reserved1: 3;
> +       u8 io_advice_hints_mode: 2;
> +#else
> +#error
> +#endif
Anything past byte offset 0 is irrelevant for constrained streams.
Why do we need that further drill down of the descriptor structure?

Thanks,
Avri

> +       u8 reserved2[3];
> +       /* Logical block markup descriptor */
> +#if defined(__BIG_ENDIAN)
> +       u8 acdlu: 1;
> +       u8 reserved3: 1;
> +       u8 rlbsr: 2;
> +       u8 lbm_descriptor_type: 4;
> +#elif defined(__LITTLE_ENDIAN)
> +       u8 lbm_descriptor_type: 4;
> +       u8 rlbsr: 2;
> +       u8 reserved3: 1;
> +       u8 acdlu: 1;
> +#else
> +#error
> +#endif
> +       u8 params[2];
> +       u8 reserved4;
> +       u8 reserved5[8];
> +};
> +
> +static_assert(sizeof(struct scsi_io_group_descriptor) == 16);
> +
>  /* SPC asymmetric access states */
>  #define SCSI_ACCESS_STATE_OPTIMAL     0x00
>  #define SCSI_ACCESS_STATE_ACTIVE      0x01
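
As a cross-check of the bitfield layout quoted above, byte 0 of each
16-byte descriptor can also be decoded with plain masks. In both the
big-endian and the little-endian variant of the struct, st_enble ends up
as bit 2 (mask 0x04). The sketch below is hypothetical user-space code,
not part of the patch; it counts the leading descriptors that have
ST_ENBLE set, which is how the sd patch in this series walks the mode page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_IO_GROUP_DESC_SIZE	16	/* matches the static_assert above */
#define SK_IC_ENABLE		0x01	/* byte 0, bit 0 */
#define SK_CS_ENBLE		0x02	/* byte 0, bit 1 */
#define SK_ST_ENBLE		0x04	/* byte 0, bit 2 */

/* IO ADVICE HINTS MODE occupies the two most significant bits of byte 0. */
static unsigned int sk_io_advice_hints_mode(uint8_t byte0)
{
	return byte0 >> 6;
}

/*
 * Count how many descriptors at the start of @desc have ST_ENBLE set.
 * @len is the number of valid bytes; a trailing partial descriptor is
 * ignored.
 */
static unsigned int sk_count_st_enabled(const uint8_t *desc, size_t len)
{
	size_t n = len / SK_IO_GROUP_DESC_SIZE;
	size_t i;

	assert(desc != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (!(desc[i * SK_IO_GROUP_DESC_SIZE] & SK_ST_ENBLE))
			break;
	return (unsigned int)i;
}
```
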

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 00/13] Pass data temperature information to zoned UFS devices
  2023-10-02 11:38   ` Niklas Cassel
@ 2023-10-02 11:53     ` Niklas Cassel
  2023-10-02 16:33       ` Bart Van Assche
  2023-10-02 17:20     ` Bart Van Assche
  2023-10-03  1:40     ` Martin K. Petersen
  2 siblings, 1 reply; 61+ messages in thread
From: Niklas Cassel @ 2023-10-02 11:53 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Bart Van Assche, Jens Axboe, linux-block, linux-scsi,
	linux-fsdevel, Christoph Hellwig, Damien Le Moal

On Mon, Oct 02, 2023 at 01:37:59PM +0200, Niklas Cassel wrote:
> On Wed, Sep 27, 2023 at 03:14:10PM -0400, Martin K. Petersen wrote:
> > 
> > Hi Bart!
> > 
> > > Zoned UFS vendors need the data temperature information. Hence this
> > > patch series that restores write hint information in F2FS and in the
> > > block layer. The SCSI disk (sd) driver is modified such that it passes
> > > write hint information to SCSI devices via the GROUP NUMBER field.
> > 
> > I don't have any particular problems with your implementation, although
> > I'm still trying to wrap my head around how to make this coexist with my
> > I/O hinting series. But I guess there's probably not going to be a big
> > overlap between devices that support both features.
> 
> Hello Bart, Martin,
> 
> I don't know which user facing API Martin's I/O hinting series is intending
> to use.
> 
> However, while discussing this series at ALPSS, we did ask ourselves why this
> series is not reusing the already existing block layer API for providing I/O
> hints:
> https://github.com/torvalds/linux/blob/v6.6-rc4/include/uapi/linux/ioprio.h#L83-L103
> 
> We can have 1023 possible I/O hints, and so far we are only using 7, which
> means that there are 1016 possible hints left.
> This also enables you to define more than the 4 previous temperature hints
> (extreme, long, medium, short), if so desired.
> 
> There is also support in fio for these I/O hints:
> https://github.com/axboe/fio/blob/master/HOWTO.rst?plain=1#L2294-L2302
> 
> When this new I/O hint API was added, there was no other I/O hint API
> in the kernel (since the old fcntl() F_GET_FILE_RW_HINT / F_SET_FILE_RW_HINT
> API had already been removed when this new API was added).
> 
> So there should probably be a good argument why we would want to introduce
> yet another API for providing I/O hints, instead of extending the I/O hint
> API that we already have in the kernel right now.
> (Especially since it seems fairly easy to modify your patches to reuse the
> existing API.)

One argument might be that the current I/O hints API does not allow hints to
be stacked. So one would not e.g. be able to combine a command duration limit
with a temperature hint...


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH 07/13] sd: Translate data lifetime information
  2023-09-20 19:14 ` [PATCH 07/13] sd: Translate data lifetime information Bart Van Assche
@ 2023-10-02 13:11   ` Avri Altman
  2023-10-02 17:42     ` Bart Van Assche
  0 siblings, 1 reply; 61+ messages in thread
From: Avri Altman @ 2023-10-02 13:11 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, Damien Le Moal, James E.J. Bottomley

> 
> Recently T10 standardized SBC constrained streams. This mechanism enables
> passing data lifetime information to SCSI devices in the group number
> field. Add support for translating write hint information into a
> permanent stream number in the sd driver.
> 
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/scsi/sd.c | 65 ++++++++++++++++++++++++++++++++++++++++++++-
> --
>  drivers/scsi/sd.h |  1 +
>  2 files changed, 63 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 879edbc1a065..7bbc58cd99d1 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -1001,12 +1001,38 @@ static blk_status_t sd_setup_flush_cmnd(struct
> scsi_cmnd *cmd)
>         return BLK_STS_OK;
>  }
> 
> +/**
> + * sd_group_number() - Compute the GROUP NUMBER field
> + * @cmd: SCSI command for which to compute the value of the six-bit
> GROUP NUMBER
> + *     field.
> + *
> + * From "SBC-5 Constrained Streams with Data Lifetimes"
> + * (https://www.t10.org/cgi-bin/ac.pl?t=d&f=23-024r3.pdf):
> + * 0: no relative lifetime.
> + * 1: shortest relative lifetime.
> + * 2: second shortest relative lifetime.
> + * 3 - 0x3d: intermediate relative lifetimes.
> + * 0x3e: second longest relative lifetime.
> + * 0x3f: longest relative lifetime.
> + */
> +static u8 sd_group_number(struct scsi_cmnd *cmd)
> +{
> +       const struct request *rq = scsi_cmd_to_rq(cmd);
> +       struct scsi_disk *sdkp = scsi_disk(rq->q->disk);
> +       const int max_gn = min_t(u16, sdkp->permanent_stream_count, 0x3f);
> +
> +       if (!sdkp->rscs || rq->write_hint == WRITE_LIFE_NOT_SET)
> +               return 0;
> +       return min(rq->write_hint - WRITE_LIFE_NONE, max_gn);
> +}
> +
>  static blk_status_t sd_setup_rw32_cmnd(struct scsi_cmnd *cmd, bool write,
>                                        sector_t lba, unsigned int nr_blocks,
>                                        unsigned char flags, unsigned int dld)
>  {
>         cmd->cmd_len = SD_EXT_CDB_SIZE;
>         cmd->cmnd[0]  = VARIABLE_LENGTH_CMD;
> +       cmd->cmnd[6]  = sd_group_number(cmd);
>         cmd->cmnd[7]  = 0x18; /* Additional CDB len */
>         cmd->cmnd[9]  = write ? WRITE_32 : READ_32;
>         cmd->cmnd[10] = flags;
> @@ -1025,7 +1051,7 @@ static blk_status_t sd_setup_rw16_cmnd(struct
> scsi_cmnd *cmd, bool write,
>         cmd->cmd_len  = 16;
>         cmd->cmnd[0]  = write ? WRITE_16 : READ_16;
>         cmd->cmnd[1]  = flags | ((dld >> 2) & 0x01);
> -       cmd->cmnd[14] = (dld & 0x03) << 6;
> +       cmd->cmnd[14] = ((dld & 0x03) << 6) | sd_group_number(cmd);
>         cmd->cmnd[15] = 0;
>         put_unaligned_be64(lba, &cmd->cmnd[2]);
>         put_unaligned_be32(nr_blocks, &cmd->cmnd[10]);
> @@ -1040,7 +1066,7 @@ static blk_status_t sd_setup_rw10_cmnd(struct
> scsi_cmnd *cmd, bool write,
>         cmd->cmd_len = 10;
>         cmd->cmnd[0] = write ? WRITE_10 : READ_10;
>         cmd->cmnd[1] = flags;
> -       cmd->cmnd[6] = 0;
> +       cmd->cmnd[6] = sd_group_number(cmd);
>         cmd->cmnd[9] = 0;
>         put_unaligned_be32(lba, &cmd->cmnd[2]);
>         put_unaligned_be16(nr_blocks, &cmd->cmnd[7]);
> @@ -1177,7 +1203,8 @@ static blk_status_t
> sd_setup_read_write_cmnd(struct scsi_cmnd *cmd)
>                 ret = sd_setup_rw16_cmnd(cmd, write, lba, nr_blocks,
>                                          protect | fua, dld);
>         } else if ((nr_blocks > 0xff) || (lba > 0x1fffff) ||
> -                  sdp->use_10_for_rw || protect) {
> +                  sdp->use_10_for_rw || protect ||
> +                  rq->write_hint != WRITE_LIFE_NOT_SET) {
Is this a typo?

>                 ret = sd_setup_rw10_cmnd(cmd, write, lba, nr_blocks,
>                                          protect | fua);
>         } else {
> @@ -2912,6 +2939,37 @@ sd_read_cache_type(struct scsi_disk *sdkp,
> unsigned char *buffer)
>         sdkp->DPOFUA = 0;
>  }
> 
> +static void sd_read_io_hints(struct scsi_disk *sdkp, unsigned char *buffer)
> +{
> +       struct scsi_device *sdp = sdkp->device;
> +       const struct scsi_io_group_descriptor *desc, *start, *end;
> +       struct scsi_sense_hdr sshdr;
> +       struct scsi_mode_data data;
> +       int res;
> +
> +       res = scsi_mode_sense(sdp, /*dbd=*/0x8, /*modepage=*/0x0a,
> +                             /*subpage=*/0x05, buffer, SD_BUF_SIZE,
> +                             SD_TIMEOUT, sdkp->max_retries, &data, &sshdr);
> +       if (res < 0)
> +               return;
> +       start = (void *)buffer + data.header_length + 16;
> +       end = (void *)buffer + ((data.header_length + data.length)
> +                               & ~(sizeof(*end) - 1));
> +       /*
> +        * From "SBC-5 Constrained Streams with Data Lifetimes": Device servers
> +        * should assign the lowest numbered stream identifiers to permanent
> +        * streams.
> +        */
> +       for (desc = start; desc < end; desc++)
> +               if (!desc->st_enble)
> +                       break;
I don't see how you can conclude that the stream is permanent,
without reading the perm bit from the stream status descriptor.

> +       sdkp->permanent_stream_count = desc - start;
> +       if (sdkp->rscs && sdkp->permanent_stream_count < 2)
> +               sdev_printk(KERN_INFO, sdp,
> +                           "Unexpected: RSCS has been set and the permanent stream
> count is %u\n",
> +                           sdkp->permanent_stream_count);
> +}
> +
>  /*
>   * The ATO bit indicates whether the DIF application tag is available
>   * for use by the operating system.
> @@ -3395,6 +3453,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
> 
>                 sd_read_write_protect_flag(sdkp, buffer);
>                 sd_read_cache_type(sdkp, buffer);
> +               sd_read_io_hints(sdkp, buffer);
>                 sd_read_app_tag_own(sdkp, buffer);
>                 sd_read_write_same(sdkp, buffer);
>                 sd_read_security(sdkp, buffer);
> diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
> index 84685168b6e0..1863de5ebae4 100644
> --- a/drivers/scsi/sd.h
> +++ b/drivers/scsi/sd.h
> @@ -125,6 +125,7 @@ struct scsi_disk {
>         unsigned int    physical_block_size;
>         unsigned int    max_medium_access_timeouts;
>         unsigned int    medium_access_timed_out;
> +       u16             permanent_stream_count; /* maximum number of streams */
This comment is a bit misleading:
The Block Limits Extension VPD page has a "maximum number of streams" field.
Maybe avoid the unnecessary confusion?

Thanks,
Avri

>         u8              media_present;
>         u8              write_prot;
>         u8              protection_type;/* Data Integrity Field */

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 00/13] Pass data temperature information to zoned UFS devices
  2023-10-02 11:53     ` Niklas Cassel
@ 2023-10-02 16:33       ` Bart Van Assche
  2023-10-02 19:19         ` Niklas Cassel
  0 siblings, 1 reply; 61+ messages in thread
From: Bart Van Assche @ 2023-10-02 16:33 UTC (permalink / raw)
  To: Niklas Cassel, Martin K. Petersen
  Cc: Jens Axboe, linux-block, linux-scsi, linux-fsdevel,
	Christoph Hellwig, Damien Le Moal

On 10/2/23 04:53, Niklas Cassel wrote:
> On Mon, Oct 02, 2023 at 01:37:59PM +0200, Niklas Cassel wrote:
>> I don't know which user facing API Martin's I/O hinting series is intending
>> to use.
>>
>> However, while discussing this series at ALPSS, we did ask ourselves why this
>> series is not reusing the already existing block layer API for providing I/O
>> hints:
>> https://github.com/torvalds/linux/blob/v6.6-rc4/include/uapi/linux/ioprio.h#L83-L103
>>
>> We can have 1023 possible I/O hints, and so far we are only using 7, which
>> means that there are 1016 possible hints left.
>> This also enables you to define more than the 4 previous temperature hints
>> (extreme, long, medium, short), if so desired.
>>
>> There is also support in fio for these I/O hints:
>> https://github.com/axboe/fio/blob/master/HOWTO.rst?plain=1#L2294-L2302
>>
>> When this new I/O hint API was added, there was no other I/O hint API
>> in the kernel (since the old fcntl() F_GET_FILE_RW_HINT / F_SET_FILE_RW_HINT
>> API had already been removed when this new API was added).
>>
>> So there should probably be a good argument why we would want to introduce
>> yet another API for providing I/O hints, instead of extending the I/O hint
>> API that we already have in the kernel right now.
>> (Especially since it seems fairly easy to modify your patches to reuse the
>> existing API.)
> 
> One argument might be that the current I/O hints API does not allow hints to
> be stacked. So one would not e.g. be able to combine a command duration limit
> with a temperature hint...

Hi Niklas,

Is your feedback about the user space API only or also about the
mechanism that is used internally in the kernel?

Restoring the ability to pass data temperature information from a
filesystem to a block device is much more important to me than
restoring the ability to pass data temperature information from user
space to a filesystem. Would it be sufficient to address your concern
if patch 2/13 would be dropped from this series?

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 03/13] fs: Restore kiocb.ki_hint
  2023-10-02 10:45   ` Avri Altman
@ 2023-10-02 16:39     ` Bart Van Assche
  0 siblings, 0 replies; 61+ messages in thread
From: Bart Van Assche @ 2023-10-02 16:39 UTC (permalink / raw)
  To: Avri Altman, Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, Dave Chinner, Alexander Viro,
	Christian Brauner, Benjamin LaHaise, David Howells, Jaegeuk Kim,
	Chao Yu, Steven Rostedt, Masami Hiramatsu

On 10/2/23 03:45, Avri Altman wrote:
>> diff --git a/io_uring/rw.c b/io_uring/rw.c
>> index c8c822fa7980..c41ae6654116 100644
>> --- a/io_uring/rw.c
>> +++ b/io_uring/rw.c
>> @@ -677,6 +677,7 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode)
>>                  req->flags |= io_file_get_flags(file);
>>
>>          kiocb->ki_flags = file->f_iocb_flags;
>> +       kiocb->ki_hint = file_inode(file)->i_write_hint;
>
> Originally ki_hint_validate() was used here as well?

Thanks for having reported this. I will restore the ki_hint_validate()
call in the io_uring code.

Bart.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 04/13] block: Restore write hint support
  2023-10-02 11:23   ` Avri Altman
@ 2023-10-02 17:02     ` Bart Van Assche
  0 siblings, 0 replies; 61+ messages in thread
From: Bart Van Assche @ 2023-10-02 17:02 UTC (permalink / raw)
  To: Avri Altman, Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, Alexander Viro, Christian Brauner,
	Jaegeuk Kim, Chao Yu, Darrick J. Wong

On 10/2/23 04:23, Avri Altman wrote:
>> The following aspects of that commit have been dropped:
>> - Debugfs support for retrieving and modifying write hints.
> 
> Any particular reason to leave those out?

The above comment is misleading: what has not been restored is the
struct request_queue write_hints[] array member and the debugfs
interface for accessing that array. My understanding is that that array
was used to track stream statistics by the NVMe driver. From version
v5.17 of the NVMe driver:

	if (streamid < ARRAY_SIZE(req->q->write_hints))
		req->q->write_hints[streamid] += blk_rq_bytes(req) >> 9;

>> - md-raid, BTRFS, ext4, gfs2 and zonefs write hint support.
> 
> Native Linux with ext4 is being used in automotive, and even mobile
> platforms. E.g. Qualcomm's RB5 is formally maintained with Debian -
> https://releases.linaro.org/96boards/rb5/linaro/debian/21.12/

All ext4 did with write hint information was to copy the inode write
hint information into the bio. The inode write hint information is set
by the F_SET_RW_HINT fcntl. The only software packages that use the
F_SET_RW_HINT fcntl and that I'm aware of are RocksDB, Samba, stress_ng
and rr. Are any of these software packages used in automotive software
on top of ext4?
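
For reference, a minimal user space sketch of how such a package would
set the inode write hint. The helper name is made up for illustration,
and the fallback definitions assume the values from
include/uapi/linux/fcntl.h for the case where the libc headers do not
expose them:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <stdint.h>

/* Fallbacks for older libc headers (assumed to match the uapi header). */
#ifndef F_SET_RW_HINT
#define F_GET_RW_HINT	1035	/* F_LINUX_SPECIFIC_BASE + 11 */
#define F_SET_RW_HINT	1036	/* F_LINUX_SPECIFIC_BASE + 12 */
#endif
#ifndef RWH_WRITE_LIFE_SHORT
#define RWH_WRITE_LIFE_NOT_SET	0
#define RWH_WRITE_LIFE_NONE	1
#define RWH_WRITE_LIFE_SHORT	2
#define RWH_WRITE_LIFE_MEDIUM	3
#define RWH_WRITE_LIFE_LONG	4
#define RWH_WRITE_LIFE_EXTREME	5
#endif

/*
 * Hypothetical helper: tag the inode behind fd with a write lifetime hint.
 * Returns 0 on success and -1 with errno set on failure.
 */
static int sketch_set_inode_write_hint(int fd, uint64_t hint)
{
	/* The kernel expects a pointer to a 64-bit hint value. */
	return fcntl(fd, F_SET_RW_HINT, &hint);
}
```

A caller would invoke this once after open(), e.g. with
RWH_WRITE_LIFE_SHORT for a file that is expected to be overwritten soon.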

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 06/13] scsi_proto: Add struct io_group_descriptor
  2023-10-02 11:41   ` Avri Altman
@ 2023-10-02 17:16     ` Bart Van Assche
  0 siblings, 0 replies; 61+ messages in thread
From: Bart Van Assche @ 2023-10-02 17:16 UTC (permalink / raw)
  To: Avri Altman, Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, James E.J. Bottomley

On 10/2/23 04:41, Avri Altman wrote:
>> +/* SBC-5 IO advice hints group descriptor */
>> +struct scsi_io_group_descriptor {
>> +#if defined(__BIG_ENDIAN)
>> +       u8 io_advice_hints_mode: 2;
>> +       u8 reserved1: 3;
>> +       u8 st_enble: 1;
>> +       u8 cs_enble: 1;
>> +       u8 ic_enable: 1;
>> +#elif defined(__LITTLE_ENDIAN)
>> +       u8 ic_enable: 1;
>> +       u8 cs_enble: 1;
>> +       u8 st_enble: 1;
>> +       u8 reserved1: 3;
>> +       u8 io_advice_hints_mode: 2;
>> +#else
>> +#error
>> +#endif
>
> Anything past byte offset 0 is irrelevant for constrained streams.
> Why do we need that further drill down of the descriptor structure?

The data structures in header file include/scsi/scsi_proto.h follow the
SCSI standards closely. These data structures should not be tailored to
the current use case of these data structures.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 00/13] Pass data temperature information to zoned UFS devices
  2023-10-02 11:38   ` Niklas Cassel
  2023-10-02 11:53     ` Niklas Cassel
@ 2023-10-02 17:20     ` Bart Van Assche
  2023-10-03  1:40     ` Martin K. Petersen
  2 siblings, 0 replies; 61+ messages in thread
From: Bart Van Assche @ 2023-10-02 17:20 UTC (permalink / raw)
  To: Niklas Cassel, Martin K. Petersen
  Cc: Jens Axboe, linux-block, linux-scsi, linux-fsdevel,
	Christoph Hellwig, Damien Le Moal

On 10/2/23 04:38, Niklas Cassel wrote:
> So there should probably be a good argument why we would want to 
> introduce yet another API for providing I/O hints, instead of 
> extending the I/O hint API that we already have in the kernel right 
> now. (Especially since it seems fairly easy to modify your patches
> to reuse the existing API.)

Here is a strong argument: there is user space software that is using
the F_SET_FILE_RW_HINT API, e.g. Samba. I don't think that the above
arguments are strong enough to tell all developers of user space
software to switch from F_SET_FILE_RW_HINT to another API. This would
force user space developers to check the kernel version before they
can decide which user space API to use. If the new user space API would
get backported to distro kernels then that would cause a real nightmare
for user space developers who want to use F_SET_FILE_RW_HINT or its
equivalent.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 07/13] sd: Translate data lifetime information
  2023-10-02 13:11   ` Avri Altman
@ 2023-10-02 17:42     ` Bart Van Assche
  2023-10-03  5:48       ` Avri Altman
  0 siblings, 1 reply; 61+ messages in thread
From: Bart Van Assche @ 2023-10-02 17:42 UTC (permalink / raw)
  To: Avri Altman, Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, Damien Le Moal, James E.J. Bottomley

On 10/2/23 06:11, Avri Altman wrote:
>> sd_setup_read_write_cmnd(struct scsi_cmnd *cmd)
>>                  ret = sd_setup_rw16_cmnd(cmd, write, lba, nr_blocks,
>>                                           protect | fua, dld);
>>          } else if ((nr_blocks > 0xff) || (lba > 0x1fffff) ||
>> -                  sdp->use_10_for_rw || protect) {
>> +                  sdp->use_10_for_rw || protect ||
>> +                  rq->write_hint != WRITE_LIFE_NOT_SET) {
>
> Is this a typo?

I don't see a typo? Am I perhaps overlooking something?

>> +static void sd_read_io_hints(struct scsi_disk *sdkp, unsigned char *buffer)
>> +{
>> +       struct scsi_device *sdp = sdkp->device;
>> +       const struct scsi_io_group_descriptor *desc, *start, *end;
>> +       struct scsi_sense_hdr sshdr;
>> +       struct scsi_mode_data data;
>> +       int res;
>> +
>> +       res = scsi_mode_sense(sdp, /*dbd=*/0x8, /*modepage=*/0x0a,
>> +                             /*subpage=*/0x05, buffer, SD_BUF_SIZE,
>> +                             SD_TIMEOUT, sdkp->max_retries, &data, &sshdr);
>> +       if (res < 0)
>> +               return;
>> +       start = (void *)buffer + data.header_length + 16;
>> +       end = (void *)buffer + ((data.header_length + data.length)
>> +                               & ~(sizeof(*end) - 1));
>> +       /*
>> +        * From "SBC-5 Constrained Streams with Data Lifetimes": Device servers
>> +        * should assign the lowest numbered stream identifiers to permanent
>> +        * streams.
>> +        */
>> +       for (desc = start; desc < end; desc++)
>> +               if (!desc->st_enble)
>> +                       break;
> I don't see how you can conclude that the stream is permanent,
> without reading the perm bit from the stream status descriptor.

I will add code that retrieves the stream status and that checks the 
PERM bit.

>> diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
>> index 84685168b6e0..1863de5ebae4 100644
>> --- a/drivers/scsi/sd.h
>> +++ b/drivers/scsi/sd.h
>> @@ -125,6 +125,7 @@ struct scsi_disk {
>>          unsigned int    physical_block_size;
>>          unsigned int    max_medium_access_timeouts;
>>          unsigned int    medium_access_timed_out;
>> +       u16             permanent_stream_count; /* maximum number of streams */
>
> This comment is a bit misleading:
> The Block Limits Extension VPD page has a "maximum number of streams" field.
> Maybe avoid the unnecessary confusion?

I will change that comment or leave it out.

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH 04/13] block: Restore write hint support
  2023-09-20 19:14 ` [PATCH 04/13] block: Restore write hint support Bart Van Assche
  2023-10-02 11:23   ` Avri Altman
@ 2023-10-02 18:08   ` Avri Altman
  2023-10-03 19:52   ` Bean Huo
  2 siblings, 0 replies; 61+ messages in thread
From: Avri Altman @ 2023-10-02 18:08 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, Alexander Viro, Christian Brauner,
	Jaegeuk Kim, Chao Yu, Darrick J. Wong

 
> This patch partially reverts commit c75e707fe1aa ("block: remove the per-
> bio/request write hint"). The following aspects of that commit have been
> reverted:
> - Pass the struct kiocb write hint information to struct bio.
> - Pass the struct bio write hint information to struct request.
> - Do not merge requests with different write hints.
> - Passing write hint information from the VFS layer to the block layer.
> - In F2FS, initialization of bio.bi_write_hint.
> 
> The following aspects of that commit have been dropped:
> - Debugfs support for retrieving and modifying write hints.
> - md-raid, BTRFS, ext4, gfs2 and zonefs write hint support.
> - The write_hints[] array in struct request_queue.
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Avri Altman <avri.altman@wdc.com>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH 06/13] scsi_proto: Add struct io_group_descriptor
  2023-09-20 19:14 ` [PATCH 06/13] scsi_proto: Add struct io_group_descriptor Bart Van Assche
  2023-10-02 11:41   ` Avri Altman
@ 2023-10-02 18:16   ` Avri Altman
  1 sibling, 0 replies; 61+ messages in thread
From: Avri Altman @ 2023-10-02 18:16 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, James E.J. Bottomley

 
> Prepare for adding code that will fill in and parse this data structure.
> 
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Avri Altman <avri.altman@wdc.com>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 00/13] Pass data temperature information to zoned UFS devices
  2023-10-02 16:33       ` Bart Van Assche
@ 2023-10-02 19:19         ` Niklas Cassel
  0 siblings, 0 replies; 61+ messages in thread
From: Niklas Cassel @ 2023-10-02 19:19 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Martin K. Petersen, Jens Axboe, linux-block, linux-scsi,
	linux-fsdevel, Christoph Hellwig, Damien Le Moal

On Mon, Oct 02, 2023 at 09:33:22AM -0700, Bart Van Assche wrote:
> On 10/2/23 04:53, Niklas Cassel wrote:
> > On Mon, Oct 02, 2023 at 01:37:59PM +0200, Niklas Cassel wrote:
> > > I don't know which user facing API Martin's I/O hinting series is intending
> > > to use.
> > > 
> > > However, while discussing this series at ALPSS, we did ask ourselves why this
> > > series is not reusing the already existing block layer API for providing I/O
> > > hints:
> > > https://github.com/torvalds/linux/blob/v6.6-rc4/include/uapi/linux/ioprio.h#L83-L103
> > > 
> > > We can have 1023 possible I/O hints, and so far we are only using 7, which
> > > means that there are 1016 possible hints left.
> > > This also enables you to define more than the 4 previous temperature hints
> > > (extreme, long, medium, short), if so desired.
> > > 
> > > There is also support in fio for these I/O hints:
> > > https://github.com/axboe/fio/blob/master/HOWTO.rst?plain=1#L2294-L2302
> > > 
> > > When this new I/O hint API was added, there was no other I/O hint API
> > > in the kernel (since the old fcntl() F_GET_FILE_RW_HINT / F_SET_FILE_RW_HINT
> > > API had already been removed when this new API was added).
> > > 
> > > So there should probably be a good argument why we would want to introduce
> > > yet another API for providing I/O hints, instead of extending the I/O hint
> > > API that we already have in the kernel right now.
> > > (Especially since it seems fairly easy to modify your patches to reuse the
> > > existing API.)
> > 
> > One argument might be that the current I/O hints API does not allow hints to
> > be stacked. So one would not e.g. be able to combine a command duration limit
> > with a temperature hint...
> 
> Hi Niklas,
> 
> Is your feedback about the user space API only or also about the
> mechanism that is used internally in the kernel?

The concern is only related to the user space API.

(However, if you do reuse the existing I/O prio hints, you will avoid
adding a new struct member to a lot of structs.)


> 
> Restoring the ability to pass data temperature information from a
> filesystem to a block device is much more important to me than
> restoring the ability to pass data temperature information from user
> space to a filesystem. Would it be sufficient to address your concern
> if patch 2/13 would be dropped from this series?

Right now 0 means no I/O hint.
Value 1-7 is used for CDL.
This means that bits 0-2 are currently used by CDL.

I guess we could define e.g. bits 3-5 to be used by temperature hints,
i.e. temperature hints could have values 0-7, where 0 would be no
temperature hint. (I guess we could still limit the temperature hints
to 1-4 if we want to keep the previous extreme/long/medium/short constants.)

This way, we can combine a CDL value with a temperature hint.
I.e. if user space has set bits in both bits 0-2 and 3-5, then both CDL
and temperature hints are used.

(And we would still have 4 bits left in the 10-bit I/O hints field that
can be used by some other I/O hint feature in the future.)

We could theoretically do this without changing the existing I/O prio hints
API, as all the existing hints (CDL descriptors 1-7) would keep their existing
values.
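
A rough sketch of the layout described above, within the 10-bit hints
field only. All macro and function names here are hypothetical and do
not exist in include/uapi/linux/ioprio.h; the bit positions are the
ones proposed in this email, not an agreed-upon ABI:

```c
#include <stdint.h>

/* Hypothetical layout of the 10-bit I/O hints field. */
#define SKETCH_HINT_BITS	10
#define SKETCH_CDL_SHIFT	0	/* bits 0-2: CDL descriptor, 0 = none */
#define SKETCH_CDL_MASK		0x7u
#define SKETCH_TEMP_SHIFT	3	/* bits 3-5: temperature, 0 = none */
#define SKETCH_TEMP_MASK	0x7u

/* Combine a CDL descriptor index and a temperature hint into one value. */
static inline uint16_t sketch_hint_pack(unsigned int cdl, unsigned int temp)
{
	return (uint16_t)(((cdl & SKETCH_CDL_MASK) << SKETCH_CDL_SHIFT) |
			  ((temp & SKETCH_TEMP_MASK) << SKETCH_TEMP_SHIFT));
}

/* Extract the CDL descriptor index (0 means no CDL hint). */
static inline unsigned int sketch_hint_cdl(uint16_t hint)
{
	return (hint >> SKETCH_CDL_SHIFT) & SKETCH_CDL_MASK;
}

/* Extract the temperature hint (0 means no temperature hint). */
static inline unsigned int sketch_hint_temp(uint16_t hint)
{
	return (hint >> SKETCH_TEMP_SHIFT) & SKETCH_TEMP_MASK;
}
```

With this packing, a hint that only carries a CDL descriptor has the
same numeric value as today, which is why the existing values would not
need to change.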

I think this sounds quite nice, since it would avoid what your patches
currently do: adding a new "write_hint" struct member to the following structs:
struct kiocb, struct file, struct request, struct bio.

Instead it would rely on the existing ioprio struct members in these structs.
Additionally you would not need to add code that avoids merging of requests with
different write hints, as the current code already avoids merging of requests
with different ioprio (which thus extends to ioprio I/O hints).

Anyway, even if I do think that modifying your patch series to use the I/O prio
hints API would be a simpler and cleaner solution, including a smaller diffstat,
I do not care too strongly about this, and will leave the pondering to the very
wise maintainers.


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 00/13] Pass data temperature information to zoned UFS devices
  2023-10-02 11:38   ` Niklas Cassel
  2023-10-02 11:53     ` Niklas Cassel
  2023-10-02 17:20     ` Bart Van Assche
@ 2023-10-03  1:40     ` Martin K. Petersen
  2023-10-03 17:26       ` Bart Van Assche
  2 siblings, 1 reply; 61+ messages in thread
From: Martin K. Petersen @ 2023-10-03  1:40 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Martin K. Petersen, Bart Van Assche, Jens Axboe, linux-block,
	linux-scsi, linux-fsdevel, Christoph Hellwig, Damien Le Moal


Niklas,

> I don't know which user facing API Martin's I/O hinting series is
> intending to use.

I'm just using ioprio.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH 07/13] sd: Translate data lifetime information
  2023-10-02 17:42     ` Bart Van Assche
@ 2023-10-03  5:48       ` Avri Altman
  2023-10-03 16:58         ` Bart Van Assche
  0 siblings, 1 reply; 61+ messages in thread
From: Avri Altman @ 2023-10-03  5:48 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, Damien Le Moal, James E.J. Bottomley

> On 10/2/23 06:11, Avri Altman wrote:
> >> sd_setup_read_write_cmnd(struct scsi_cmnd *cmd)
> >>                  ret = sd_setup_rw16_cmnd(cmd, write, lba, nr_blocks,
> >>                                           protect | fua, dld);
> >>          } else if ((nr_blocks > 0xff) || (lba > 0x1fffff) ||
> >> -                  sdp->use_10_for_rw || protect) {
> >> +                  sdp->use_10_for_rw || protect ||
> >> +                  rq->write_hint != WRITE_LIFE_NOT_SET) {
> >
> > Is this a typo?
> 
> I don't see a typo? Am I perhaps overlooking something?
Forcing READ(6) into READ(10) because that req carries a write-hint
deserves an extra line in the commit log IMO.

Thanks,
Avri

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH 08/13] scsi_debug: Reduce code duplication
  2023-09-20 19:14 ` [PATCH 08/13] scsi_debug: Reduce code duplication Bart Van Assche
@ 2023-10-03  6:49   ` Avri Altman
  0 siblings, 0 replies; 61+ messages in thread
From: Avri Altman @ 2023-10-03  6:49 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, Douglas Gilbert, James E.J. Bottomley

> All VPD pages have the page code in byte one. Reduce code duplication by
> storing the VPD page code once.
> 
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Cc: Douglas Gilbert <dgilbert@interlog.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Avri Altman <avri.altman@wdc.com>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 07/13] sd: Translate data lifetime information
  2023-10-03  5:48       ` Avri Altman
@ 2023-10-03 16:58         ` Bart Van Assche
  2023-10-03 16:59           ` Bart Van Assche
  0 siblings, 1 reply; 61+ messages in thread
From: Bart Van Assche @ 2023-10-03 16:58 UTC (permalink / raw)
  To: Avri Altman, Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, Damien Le Moal, James E.J. Bottomley

On 10/2/23 22:48, Avri Altman wrote:
>> On 10/2/23 06:11, Avri Altman wrote:
>>>> sd_setup_read_write_cmnd(struct scsi_cmnd *cmd)
>>>>                   ret = sd_setup_rw16_cmnd(cmd, write, lba, nr_blocks,
>>>>                                            protect | fua, dld);
>>>>           } else if ((nr_blocks > 0xff) || (lba > 0x1fffff) ||
>>>> -                  sdp->use_10_for_rw || protect) {
>>>> +                  sdp->use_10_for_rw || protect ||
>>>> +                  rq->write_hint != WRITE_LIFE_NOT_SET) {
>>>
>>> Is this a typo?
>>
>> I don't see a typo? Am I perhaps overlooking something?
>
> Forcing READ(6) into READ(10) because that req carries a write-hint,
> Deserves an extra line in the commit log IMO.

Right, I should explain that the READ(6) command does not support write 
hints and hence that READ(10) is selected if a write hint is present.

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 07/13] sd: Translate data lifetime information
  2023-10-03 16:58         ` Bart Van Assche
@ 2023-10-03 16:59           ` Bart Van Assche
  0 siblings, 0 replies; 61+ messages in thread
From: Bart Van Assche @ 2023-10-03 16:59 UTC (permalink / raw)
  To: Avri Altman, Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, Damien Le Moal, James E.J. Bottomley

On 10/3/23 09:58, Bart Van Assche wrote:
> On 10/2/23 22:48, Avri Altman wrote:
>>> On 10/2/23 06:11, Avri Altman wrote:
>>>>> sd_setup_read_write_cmnd(struct scsi_cmnd *cmd)
>>>>>                   ret = sd_setup_rw16_cmnd(cmd, write, lba, nr_blocks,
>>>>>                                            protect | fua, dld);
>>>>>           } else if ((nr_blocks > 0xff) || (lba > 0x1fffff) ||
>>>>> -                  sdp->use_10_for_rw || protect) {
>>>>> +                  sdp->use_10_for_rw || protect ||
>>>>> +                  rq->write_hint != WRITE_LIFE_NOT_SET) {
>>>>
>>>> Is this a typo?
>>>
>>> I don't see a typo? Am I perhaps overlooking something?
>>
>> Forcing READ(6) into READ(10) because that req carries a write-hint,
>> Deserves an extra line in the commit log IMO.
> 
> Right, I should explain that the READ(6) command does not support write 
> hints and hence that READ(10) is selected if a write hint is present.

(replying to my own email)

In my answer READ should be changed into WRITE.
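
To illustrate the condition quoted above, here is a simplified,
standalone sketch of the 6-byte versus 10-byte choice. It is not the sd
driver code: the enum, the constant and the function name are made up,
and the 32-byte and 16-byte cases that precede this branch in
sd_setup_read_write_cmnd() are left out. The point is that the 6-byte
CDB has no GROUP NUMBER field, so a request with a write hint cannot
use it:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names, for illustration only. */
enum sketch_cdb { SKETCH_RW6, SKETCH_RW10 };
#define SKETCH_WRITE_LIFE_NOT_SET 0

/*
 * Pick the 10-byte CDB when the 6-byte one cannot express the request:
 * more than 255 blocks, an LBA above 21 bits, a device that asks for
 * 10-byte commands, protection information, or a write hint that has
 * to be carried in the GROUP NUMBER field.
 */
static enum sketch_cdb sketch_pick_cdb(uint32_t nr_blocks, uint64_t lba,
				       bool use_10_for_rw, bool protect,
				       unsigned int write_hint)
{
	if (nr_blocks > 0xff || lba > 0x1fffff || use_10_for_rw || protect ||
	    write_hint != SKETCH_WRITE_LIFE_NOT_SET)
		return SKETCH_RW10;
	return SKETCH_RW6;
}
```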

Bart.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 00/13] Pass data temperature information to zoned UFS devices
  2023-10-03  1:40     ` Martin K. Petersen
@ 2023-10-03 17:26       ` Bart Van Assche
  2023-10-03 18:45         ` Niklas Cassel
  2023-10-04  3:17         ` Martin K. Petersen
  0 siblings, 2 replies; 61+ messages in thread
From: Bart Van Assche @ 2023-10-03 17:26 UTC (permalink / raw)
  To: Martin K. Petersen, Niklas Cassel
  Cc: Jens Axboe, linux-block, linux-scsi, linux-fsdevel,
	Christoph Hellwig, Damien Le Moal

On 10/2/23 18:40, Martin K. Petersen wrote:
> 
> Niklas,
> 
>> I don't know which user facing API Martin's I/O hinting series is
>> intending to use.
> 
> I'm just using ioprio.

Hi Martin,

Do you plan to use existing bits from the ioprio bitmask or new bits? 
Bits 0-2 are used for the priority level. Bits 3-5 are used for CDL. 
Bits 13-15 are used for the I/O priority. The SCSI and NVMe standards
define 64 different data lifetimes (six bits). So there are
16 - 3 - 3 - 6 = 4 remaining bits.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 00/13] Pass data temperature information to zoned UFS devices
  2023-10-03 17:26       ` Bart Van Assche
@ 2023-10-03 18:45         ` Niklas Cassel
  2023-10-04  3:17         ` Martin K. Petersen
  1 sibling, 0 replies; 61+ messages in thread
From: Niklas Cassel @ 2023-10-03 18:45 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Martin K. Petersen, Jens Axboe, linux-block, linux-scsi,
	linux-fsdevel, Christoph Hellwig, Damien Le Moal

On Tue, Oct 03, 2023 at 10:26:27AM -0700, Bart Van Assche wrote:
> On 10/2/23 18:40, Martin K. Petersen wrote:
> > 
> > Niklas,
> > 
> > > I don't know which user facing API Martin's I/O hinting series is
> > > intending to use.
> > 
> > I'm just using ioprio.
> 
> Hi Martin,
> 
> Do you plan to use existing bits from the ioprio bitmask or new bits? Bits
> 0-2 are used for the priority level. Bits 3-5 are used for CDL. Bits 13-15
> are used for the I/O priority. The SCSI and NVMe standard define 64
> different data lifetimes (six bits). So there are 16 - 3 - 3 - 6 = 4
> remaining bits.

Hello Bart,

I think the math is:

16 - 3 (prio level) - 3 (CDL) - 3 (prio class) = 7

so if we want 64 different values for data lifetimes
(we previously only had 4 different values), that is 6 bits:

16 - 3 (prio level) - 3 (CDL) - 3 (prio class) - 6 (lifetime) = 1

so only one bit left for Martin :)

Not very much room to play with...
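
The same bit budget written out as code, as a sketch only. The field
widths are the ones listed in this email and the identifiers are
hypothetical:

```c
/* Field widths of the 16-bit ioprio value as counted above. */
enum {
	SKETCH_IOPRIO_BITS	= 16,
	SKETCH_LEVEL_BITS	= 3,	/* priority level */
	SKETCH_CDL_BITS		= 3,	/* command duration limits */
	SKETCH_CLASS_BITS	= 3,	/* priority class */
};

/* Number of ioprio bits left after reserving lifetime_bits for lifetimes. */
static inline int sketch_ioprio_free_bits(int lifetime_bits)
{
	return SKETCH_IOPRIO_BITS - SKETCH_LEVEL_BITS - SKETCH_CDL_BITS -
	       SKETCH_CLASS_BITS - lifetime_bits;
}
```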


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 01/13] fs/f2fs: Restore the whint_mode mount option
  2023-09-20 19:14 ` [PATCH 01/13] fs/f2fs: Restore the whint_mode mount option Bart Van Assche
  2023-10-02 10:32   ` Avri Altman
@ 2023-10-03 19:33   ` Bean Huo
  1 sibling, 0 replies; 61+ messages in thread
From: Bean Huo @ 2023-10-03 19:33 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, Jaegeuk Kim, Chao Yu, Jonathan Corbet

On 20.09.23 9:14 PM, Bart Van Assche wrote:
> Restore support for the whint_mode mount option by reverting commit
> 930e2607638d ("f2fs: remove obsolete whint_mode").
>
> Cc: Jaegeuk Kim<jaegeuk@kernel.org>
> Cc: Chao Yu<chao@kernel.org>
> Signed-off-by: Bart Van Assche<bvanassche@acm.org>

Reviewed-by: Bean Huo <beanhuo@micron.com>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 02/13] fs: Restore support for F_GET_FILE_RW_HINT and F_SET_FILE_RW_HINT
  2023-09-20 19:14 ` [PATCH 02/13] fs: Restore support for F_GET_FILE_RW_HINT and F_SET_FILE_RW_HINT Bart Van Assche
  2023-10-02 10:35   ` Avri Altman
@ 2023-10-03 19:42   ` Bean Huo
  1 sibling, 0 replies; 61+ messages in thread
From: Bean Huo @ 2023-10-03 19:42 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, Dave Chinner, Alexander Viro,
	Christian Brauner, Jeff Layton, Chuck Lever

On 20.09.23 9:14 PM, Bart Van Assche wrote:
> Restore support for F_GET_FILE_RW_HINT and F_SET_FILE_RW_HINT by
> reverting commit 7b12e49669c9 ("fs: remove fs.f_write_hint").
>
> Cc: Christoph Hellwig<hch@lst.de>
> Cc: Dave Chinner<dchinner@redhat.com>
> Signed-off-by: Bart Van Assche<bvanassche@acm.org>

Reviewed-by: Bean Huo <beanhuo@micron.com>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 04/13] block: Restore write hint support
  2023-09-20 19:14 ` [PATCH 04/13] block: Restore write hint support Bart Van Assche
  2023-10-02 11:23   ` Avri Altman
  2023-10-02 18:08   ` Avri Altman
@ 2023-10-03 19:52   ` Bean Huo
  2 siblings, 0 replies; 61+ messages in thread
From: Bean Huo @ 2023-10-03 19:52 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, Alexander Viro, Christian Brauner,
	Jaegeuk Kim, Chao Yu, Darrick J. Wong

On 20.09.23 9:14 PM, Bart Van Assche wrote:
> Cc: Christoph Hellwig<hch@lst.de>
> Signed-off-by: Bart Van Assche<bvanassche@acm.org>

Reviewed-by: Bean Huo <beanhuo@micron.com>



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 00/13] Pass data temperature information to zoned UFS devices
  2023-10-03 17:26       ` Bart Van Assche
  2023-10-03 18:45         ` Niklas Cassel
@ 2023-10-04  3:17         ` Martin K. Petersen
  1 sibling, 0 replies; 61+ messages in thread
From: Martin K. Petersen @ 2023-10-04  3:17 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Martin K. Petersen, Niklas Cassel, Jens Axboe, linux-block,
	linux-scsi, linux-fsdevel, Christoph Hellwig, Damien Le Moal


Bart,

> Do you plan to use existing bits from the ioprio bitmask or new bits?
> Bits 0-2 are used for the priority level. Bits 3-5 are used for CDL.
> Bits 13-15 are used for the I/O priority. The SCSI and NVMe standard
> define 64 different data lifetimes (six bits). So there are 16 - 3 - 3
> - 6 = 4 remaining bits.

I just use the existing I/O priority classes and levels to set a
high/normal/low relative priority.

I would still like to pursue I/O classification since that performed better
in our testing. But that does involve working with vendors on a Linux
profile as discussed at LSF/MM. Don't really need more than a handful in
either case.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH 01/13] fs/f2fs: Restore the whint_mode mount option
       [not found] ` <CGME20230920191557epcas2p34a114957acf221c0d8f60acbb3107c77@epcms2p6>
@ 2023-10-05 11:41   ` Daejun Park
  0 siblings, 0 replies; 61+ messages in thread
From: Daejun Park @ 2023-10-05 11:41 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, Jaegeuk Kim, Chao Yu, Jonathan Corbet,
	Daejun Park, Seokhwan Kim, Yonggil Song, Jorn Lee


> Restore support for the whint_mode mount option by reverting commit
> 930e2607638d ("f2fs: remove obsolete whint_mode").
> 
> Cc: Jaegeuk Kim <jaegeuk@kernel.org>
> Cc: Chao Yu <chao@kernel.org>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Reviewed-by: Daejun Park <daejun7.park@samsung.com>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH 02/13] fs: Restore support for F_GET_FILE_RW_HINT and F_SET_FILE_RW_HINT
       [not found] ` <CGME20230920191549epcas2p35174687f1bebe87c42a658fa6aa57bff@epcms2p7>
@ 2023-10-05 11:43   ` Daejun Park
  0 siblings, 0 replies; 61+ messages in thread
From: Daejun Park @ 2023-10-05 11:43 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, Dave Chinner, Alexander Viro,
	Christian Brauner, Jeff Layton, Chuck Lever, Daejun Park,
	Seokhwan Kim, Yonggil Song, Jorn Lee


> Restore support for F_GET_FILE_RW_HINT and F_SET_FILE_RW_HINT by
> reverting commit 7b12e49669c9 ("fs: remove fs.f_write_hint").
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Reviewed-by: Daejun Park <daejun7.park@samsung.com>

* RE: [PATCH 04/13] block: Restore write hint support
       [not found] ` <CGME20230920191556epcas2p39b150e6715248b625588a50b333e82e2@epcms2p1>
@ 2023-10-05 11:46   ` Daejun Park
  0 siblings, 0 replies; 61+ messages in thread
From: Daejun Park @ 2023-10-05 11:46 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, Alexander Viro, Christian Brauner,
	Jaegeuk Kim, Chao Yu, Darrick J. Wong, Daejun Park, Seokhwan Kim,
	Yonggil Song, Jorn Lee


> This patch partially reverts commit c75e707fe1aa ("block: remove the
> per-bio/request write hint"). The following aspects of that commit have
> been reverted:
> - Pass the struct kiocb write hint information to struct bio.
> - Pass the struct bio write hint information to struct request.
> - Do not merge requests with different write hints.
> - Passing write hint information from the VFS layer to the block layer.
> - In F2FS, initialization of bio.bi_write_hint.
> 
> The following aspects of that commit have been dropped:
> - Debugfs support for retrieving and modifying write hints.
> - md-raid, BTRFS, ext4, gfs2 and zonefs write hint support.
> - The write_hints[] array in struct request_queue.
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Reviewed-by: Daejun Park <daejun7.park@samsung.com>

* RE: [PATCH 05/13] scsi: core: Query the Block Limits Extension VPD page
       [not found] ` <CGME20230920191816epcas2p1b30d19aa41e51ffaf7c95f9100ee6311@epcms2p3>
@ 2023-10-05 11:58   ` Daejun Park
  0 siblings, 0 replies; 61+ messages in thread
From: Daejun Park @ 2023-10-05 11:58 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, James E.J. Bottomley, Daejun Park, Jorn Lee,
	Seokhwan Kim, Yonggil Song

 
> Parse the Reduced Stream Control Supported (RSCS) bit from the block
> limits extension VPD page. The RSCS bit is defined in T10 document
> "SBC-5 Constrained Streams with Data Lifetimes"
> (https://www.t10.org/cgi-bin/ac.pl?t=d&f=23-024r3.pdf).
> 
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Reviewed-by: Daejun Park <daejun7.park@samsung.com>

* RE: [PATCH 06/13] scsi_proto: Add struct io_group_descriptor
       [not found] ` <CGME20230920191554epcas2p2280a25d6b2a7fa81563bd6cf1e75549d@epcms2p8>
@ 2023-10-05 11:59   ` Daejun Park
  0 siblings, 0 replies; 61+ messages in thread
From: Daejun Park @ 2023-10-05 11:59 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, linux-fsdevel, Martin K . Petersen,
	Christoph Hellwig, James E.J. Bottomley, Daejun Park, Jorn Lee,
	Seokhwan Kim, Yonggil Song

> Prepare for adding code that will fill in and parse this data structure.
> 
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Reviewed-by: Daejun Park <daejun7.park@samsung.com>

end of thread, other threads:[~2023-10-05 16:17 UTC | newest]

Thread overview: 61+ messages
2023-09-20 19:14 [PATCH 00/13] Pass data temperature information to zoned UFS devices Bart Van Assche
2023-09-20 19:14 ` [PATCH 01/13] fs/f2fs: Restore the whint_mode mount option Bart Van Assche
2023-10-02 10:32   ` Avri Altman
2023-10-03 19:33   ` Bean Huo
2023-09-20 19:14 ` [PATCH 02/13] fs: Restore support for F_GET_FILE_RW_HINT and F_SET_FILE_RW_HINT Bart Van Assche
2023-10-02 10:35   ` Avri Altman
2023-10-03 19:42   ` Bean Huo
2023-09-20 19:14 ` [PATCH 03/13] fs: Restore kiocb.ki_hint Bart Van Assche
2023-10-02 10:45   ` Avri Altman
2023-10-02 16:39     ` Bart Van Assche
2023-09-20 19:14 ` [PATCH 04/13] block: Restore write hint support Bart Van Assche
2023-10-02 11:23   ` Avri Altman
2023-10-02 17:02     ` Bart Van Assche
2023-10-02 18:08   ` Avri Altman
2023-10-03 19:52   ` Bean Huo
2023-09-20 19:14 ` [PATCH 05/13] scsi: core: Query the Block Limits Extension VPD page Bart Van Assche
2023-10-02 11:29   ` Avri Altman
2023-09-20 19:14 ` [PATCH 06/13] scsi_proto: Add struct io_group_descriptor Bart Van Assche
2023-10-02 11:41   ` Avri Altman
2023-10-02 17:16     ` Bart Van Assche
2023-10-02 18:16   ` Avri Altman
2023-09-20 19:14 ` [PATCH 07/13] sd: Translate data lifetime information Bart Van Assche
2023-10-02 13:11   ` Avri Altman
2023-10-02 17:42     ` Bart Van Assche
2023-10-03  5:48       ` Avri Altman
2023-10-03 16:58         ` Bart Van Assche
2023-10-03 16:59           ` Bart Van Assche
2023-09-20 19:14 ` [PATCH 08/13] scsi_debug: Reduce code duplication Bart Van Assche
2023-10-03  6:49   ` Avri Altman
2023-09-20 19:14 ` [PATCH 09/13] scsi_debug: Support the block limits extension VPD page Bart Van Assche
2023-09-20 19:14 ` [PATCH 10/13] scsi_debug: Rework page code error handling Bart Van Assche
2023-09-20 19:14 ` [PATCH 11/13] scsi_debug: Rework subpage " Bart Van Assche
2023-09-20 19:14 ` [PATCH 12/13] scsi_debug: Implement the IO Advice Hints Grouping mode page Bart Van Assche
2023-09-20 19:14 ` [PATCH 13/13] scsi_debug: Maintain write statistics per group number Bart Van Assche
2023-09-20 19:28 ` [PATCH 00/13] Pass data temperature information to zoned UFS devices Matthew Wilcox
2023-09-20 20:46   ` Bart Van Assche
2023-09-21  7:46     ` Niklas Cassel
2023-09-21 14:27       ` Bart Van Assche
2023-09-21 15:34         ` Niklas Cassel
2023-09-21 17:00           ` Bart Van Assche
2023-09-21 19:27         ` Matthew Wilcox
2023-09-21 19:39           ` Bart Van Assche
2023-09-21 19:46             ` Matthew Wilcox
2023-09-21 20:11               ` Bart Van Assche
2023-09-21 20:47               ` Jaegeuk Kim
2023-09-27 19:14 ` Martin K. Petersen
2023-09-27 20:49   ` Bart Van Assche
2023-10-02 11:38   ` Niklas Cassel
2023-10-02 11:53     ` Niklas Cassel
2023-10-02 16:33       ` Bart Van Assche
2023-10-02 19:19         ` Niklas Cassel
2023-10-02 17:20     ` Bart Van Assche
2023-10-03  1:40     ` Martin K. Petersen
2023-10-03 17:26       ` Bart Van Assche
2023-10-03 18:45         ` Niklas Cassel
2023-10-04  3:17         ` Martin K. Petersen
     [not found] ` <CGME20230920191557epcas2p34a114957acf221c0d8f60acbb3107c77@epcms2p6>
2023-10-05 11:41   ` [PATCH 01/13] fs/f2fs: Restore the whint_mode mount option Daejun Park
     [not found] ` <CGME20230920191549epcas2p35174687f1bebe87c42a658fa6aa57bff@epcms2p7>
2023-10-05 11:43   ` [PATCH 02/13] fs: Restore support for F_GET_FILE_RW_HINT and F_SET_FILE_RW_HINT Daejun Park
     [not found] ` <CGME20230920191556epcas2p39b150e6715248b625588a50b333e82e2@epcms2p1>
2023-10-05 11:46   ` [PATCH 04/13] block: Restore write hint support Daejun Park
     [not found] ` <CGME20230920191816epcas2p1b30d19aa41e51ffaf7c95f9100ee6311@epcms2p3>
2023-10-05 11:58   ` [PATCH 05/13] scsi: core: Query the Block Limits Extension VPD page Daejun Park
     [not found] ` <CGME20230920191554epcas2p2280a25d6b2a7fa81563bd6cf1e75549d@epcms2p8>
2023-10-05 11:59   ` [PATCH 06/13] scsi_proto: Add struct io_group_descriptor Daejun Park
