Linux-Block Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v2 0/4] Write-hint for FS journal
       [not found] <CGME20190109153328epcas2p4643cbdc7a2182b47893a2bcaa0778e17@epcas2p4.samsung.com>
@ 2019-01-09 15:30 ` Kanchan Joshi
       [not found]   ` <CGME20190109153332epcas1p187b419176a8d1d0be4982a275c0b9e86@epcas1p1.samsung.com>
                     ` (6 more replies)
  0 siblings, 7 replies; 20+ messages in thread
From: Kanchan Joshi @ 2019-01-09 15:30 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-block, linux-ext4, linux-nvme, jack, david, tytso,
	prakash.v, Kanchan Joshi

Towards supporing write-hints/streams for filesystem journal.                   
                                                                                
Here is the v1 patch for background -                                           
https://marc.info/?l=linux-fsdevel&m=154444637519020&w=2                        
                                                                                
Changes since v1:                                                               
- introduce four more hints for in-kernel use, as recommended by Dave chinner   
  & Jens axboe. This isolates kernel-mode hints from user-mode ones.            
- remove mount-option to specify write-hint, as recommended by Jan kara &       
  Dave chinner. Rather, FS always sets write-hint for journal. This gets ignored
  if device does not support stream.                                            
- Removed code-redundancy for write_dirty_buffer (Jan kara's review comment)


Kanchan Joshi (4):
  block: Increase count of supported write-hints
  fs: introduce four macros for in-kernel hints
  fs: introduce APIs to enable sending write-hint with buffer-head
  fs/ext4,jbd2: add support for passing write-hint with journal.

 fs/buffer.c                 | 18 ++++++++++++++++--
 fs/ext4/super.c             |  2 ++
 fs/jbd2/commit.c            | 11 +++++++----
 fs/jbd2/journal.c           |  3 ++-
 fs/jbd2/revoke.c            |  3 ++-
 include/linux/blkdev.h      |  5 ++++-
 include/linux/buffer_head.h |  3 +++
 include/linux/fs.h          |  5 +++++
 include/linux/jbd2.h        |  8 ++++++++
 9 files changed, 49 insertions(+), 9 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/4] block: Increase count of supported write-hints
       [not found]   ` <CGME20190109153332epcas1p187b419176a8d1d0be4982a275c0b9e86@epcas1p1.samsung.com>
@ 2019-01-09 15:30     ` Kanchan Joshi
  0 siblings, 0 replies; 20+ messages in thread
From: Kanchan Joshi @ 2019-01-09 15:30 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-block, linux-ext4, linux-nvme, jack, david, tytso,
	prakash.v, Kanchan Joshi

This patch bumps up write-hint count to support four new, in-kernel
hints.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
---
 include/linux/blkdev.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 338604d..df07759 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -568,7 +568,10 @@ struct request_queue {
 
 	struct work_struct	release_work;
 
-#define BLK_MAX_WRITE_HINTS	5
+#define BLK_MAX_USER_WRITE_HINTS	5
+#define BLK_MAX_KERNEL_WRITE_HINTS	4
+#define BLK_MAX_WRITE_HINTS	(BLK_MAX_USER_WRITE_HINTS + \
+				BLK_MAX_KERNEL_WRITE_HINTS)
 	u64			write_hints[BLK_MAX_WRITE_HINTS];
 };
 
-- 
2.7.4


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 2/4] fs: introduce four macros for in-kernel hints
       [not found]   ` <CGME20190109153336epcas2p29b3275b6c545e483a3f43b92268f08bf@epcas2p2.samsung.com>
@ 2019-01-09 15:30     ` Kanchan Joshi
  2019-01-23 18:27       ` [PATCH 2/4] " Javier González
  0 siblings, 1 reply; 20+ messages in thread
From: Kanchan Joshi @ 2019-01-09 15:30 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-block, linux-ext4, linux-nvme, jack, david, tytso,
	prakash.v, Kanchan Joshi

Exiting write-hints are exposed to user-mode. There is a possiblity
of conflict if kernel happens to use those. This patch introduces four
write-hints for exclusive kernel-mode use.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
---
 include/linux/fs.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 811c777..e8548eb 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -291,6 +291,11 @@ enum rw_hint {
 	WRITE_LIFE_MEDIUM	= RWH_WRITE_LIFE_MEDIUM,
 	WRITE_LIFE_LONG		= RWH_WRITE_LIFE_LONG,
 	WRITE_LIFE_EXTREME	= RWH_WRITE_LIFE_EXTREME,
+/* below ones are meant for in-kernel use */
+	KERN_WRITE_LIFE_SHORT,
+	KERN_WRITE_LIFE_MEDIUM,
+	KERN_WRITE_LIFE_LONG,
+	KERN_WRITE_LIFE_EXTREME
 };
 
 #define IOCB_EVENTFD		(1 << 0)
-- 
2.7.4


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 3/4] fs: introduce APIs to enable sending write-hint with buffer-head
       [not found]   ` <CGME20190109153339epcas2p4691a898dde0174a7565d62fcb3be0b6d@epcas2p4.samsung.com>
@ 2019-01-09 15:31     ` Kanchan Joshi
  0 siblings, 0 replies; 20+ messages in thread
From: Kanchan Joshi @ 2019-01-09 15:31 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-block, linux-ext4, linux-nvme, jack, david, tytso,
	prakash.v, Kanchan Joshi

submit_bh and write_dirty_buffer do not take write-hint as parameter.
This patch introduces variants which do.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
---
 fs/buffer.c                 | 18 ++++++++++++++++--
 include/linux/buffer_head.h |  3 +++
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index d60d61e..2a94676 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3094,6 +3094,13 @@ int submit_bh(int op, int op_flags, struct buffer_head *bh)
 }
 EXPORT_SYMBOL(submit_bh);
 
+int submit_bh_write_hint(int op, int op_flags, struct buffer_head *bh,
+			 enum rw_hint hint)
+{
+	return submit_bh_wbc(op, op_flags, bh, hint, NULL);
+}
+EXPORT_SYMBOL(submit_bh_write_hint);
+
 /**
  * ll_rw_block: low-level access to block devices (DEPRECATED)
  * @op: whether to %READ or %WRITE
@@ -3151,6 +3158,13 @@ EXPORT_SYMBOL(ll_rw_block);
 
 void write_dirty_buffer(struct buffer_head *bh, int op_flags)
 {
+	write_dirty_buffer_with_hint(bh, op_flags, 0);
+}
+EXPORT_SYMBOL(write_dirty_buffer);
+
+void write_dirty_buffer_with_hint(struct buffer_head *bh, int op_flags,
+				   enum rw_hint hint)
+{
 	lock_buffer(bh);
 	if (!test_clear_buffer_dirty(bh)) {
 		unlock_buffer(bh);
@@ -3158,9 +3172,9 @@ void write_dirty_buffer(struct buffer_head *bh, int op_flags)
 	}
 	bh->b_end_io = end_buffer_write_sync;
 	get_bh(bh);
-	submit_bh(REQ_OP_WRITE, op_flags, bh);
+	submit_bh_wbc(REQ_OP_WRITE, op_flags, bh, hint, NULL);
 }
-EXPORT_SYMBOL(write_dirty_buffer);
+EXPORT_SYMBOL(write_dirty_buffer_with_hint);
 
 /*
  * For a data-integrity writeout, we need to wait upon any in-progress I/O
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 7b73ef7..3d682ac 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -200,7 +200,10 @@ void ll_rw_block(int, int, int, struct buffer_head * bh[]);
 int sync_dirty_buffer(struct buffer_head *bh);
 int __sync_dirty_buffer(struct buffer_head *bh, int op_flags);
 void write_dirty_buffer(struct buffer_head *bh, int op_flags);
+void write_dirty_buffer_with_hint(struct buffer_head *bh, int op_flags,
+				   enum rw_hint hint);
 int submit_bh(int, int, struct buffer_head *);
+int submit_bh_write_hint(int, int, struct buffer_head *, enum rw_hint hint);
 void write_boundary_block(struct block_device *bdev,
 			sector_t bblock, unsigned blocksize);
 int bh_uptodate_or_lock(struct buffer_head *bh);
-- 
2.7.4


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 4/4] fs/ext4,jbd2: add support for passing write-hint with journal.
       [not found]   ` <CGME20190109153342epcas2p3208f62a4dd876f8e1765b48f8aec2432@epcas2p3.samsung.com>
@ 2019-01-09 15:31     ` Kanchan Joshi
  2019-01-24  8:50       ` Jan Kara
  0 siblings, 1 reply; 20+ messages in thread
From: Kanchan Joshi @ 2019-01-09 15:31 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-block, linux-ext4, linux-nvme, jack, david, tytso,
	prakash.v, Kanchan Joshi

For NAND based SSDs, mixing of data with different life-time reduces
efficiency of internal garbage-collection. During FS operations, series
of journal updates will follow/precede series of data/meta updates, causing
intermixing inside SSD. By passing a write-hint with journal, its write
can be isolated from other data/meta writes, leading to endurance/performance
benefit on SSD.

This patch introduces "j_writehint" member in JBD2 journal, using which
Ext4 specifies write-hint (as SHORT) for journal.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
---
 fs/ext4/super.c      |  2 ++
 fs/jbd2/commit.c     | 11 +++++++----
 fs/jbd2/journal.c    |  3 ++-
 fs/jbd2/revoke.c     |  3 ++-
 include/linux/jbd2.h |  8 ++++++++
 5 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index d6c142d..3af4049 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4289,6 +4289,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 
 	set_task_ioprio(sbi->s_journal->j_task, journal_ioprio);
 
+	sbi->s_journal->j_writehint = KERN_WRITE_LIFE_SHORT;
+
 	sbi->s_journal->j_commit_callback = ext4_journal_commit_callback;
 
 no_journal:
diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 2eb55c3..6da4c28 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -153,10 +153,12 @@ static int journal_submit_commit_record(journal_t *journal,
 
 	if (journal->j_flags & JBD2_BARRIER &&
 	    !jbd2_has_feature_async_commit(journal))
-		ret = submit_bh(REQ_OP_WRITE,
-			REQ_SYNC | REQ_PREFLUSH | REQ_FUA, bh);
+		ret = submit_bh_write_hint(REQ_OP_WRITE,
+			REQ_SYNC | REQ_PREFLUSH | REQ_FUA, bh,
+			journal->j_writehint);
 	else
-		ret = submit_bh(REQ_OP_WRITE, REQ_SYNC, bh);
+		ret = submit_bh_write_hint(REQ_OP_WRITE, REQ_SYNC, bh,
+			journal->j_writehint);
 
 	*cbh = bh;
 	return ret;
@@ -711,7 +713,8 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 				clear_buffer_dirty(bh);
 				set_buffer_uptodate(bh);
 				bh->b_end_io = journal_end_buffer_io_sync;
-				submit_bh(REQ_OP_WRITE, REQ_SYNC, bh);
+				submit_bh_write_hint(REQ_OP_WRITE, REQ_SYNC,
+						bh, journal->j_writehint);
 			}
 			cond_resched();
 			stats.run.rs_blocks_logged += bufs;
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 8ef6b6d..804dc2c 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1384,7 +1384,8 @@ static int jbd2_write_superblock(journal_t *journal, int write_flags)
 	jbd2_superblock_csum_set(journal, sb);
 	get_bh(bh);
 	bh->b_end_io = end_buffer_write_sync;
-	ret = submit_bh(REQ_OP_WRITE, write_flags, bh);
+	ret = submit_bh_write_hint(REQ_OP_WRITE, write_flags, bh,
+				journal->j_writehint);
 	wait_on_buffer(bh);
 	if (buffer_write_io_error(bh)) {
 		clear_buffer_write_io_error(bh);
diff --git a/fs/jbd2/revoke.c b/fs/jbd2/revoke.c
index a1143e5..376b1d8 100644
--- a/fs/jbd2/revoke.c
+++ b/fs/jbd2/revoke.c
@@ -642,7 +642,8 @@ static void flush_descriptor(journal_t *journal,
 	set_buffer_jwrite(descriptor);
 	BUFFER_TRACE(descriptor, "write");
 	set_buffer_dirty(descriptor);
-	write_dirty_buffer(descriptor, REQ_SYNC);
+	write_dirty_buffer_with_hint(descriptor, REQ_SYNC,
+				journal->j_writehint);
 }
 #endif
 
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 0f919d5..918f21e 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1139,6 +1139,14 @@ struct journal_s
 	 */
 	__u32 j_csum_seed;
 
+	/**
+	 * @j_writehint:
+	 *
+	 * write-hint for journal (set by FS).
+	 */
+	enum rw_hint	j_writehint;
+
+
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 	/**
 	 * @j_trans_commit_map:
-- 
2.7.4


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/4] introduce four macros for in-kernel hints
  2019-01-09 15:30     ` [PATCH 2/4] fs: introduce four macros for in-kernel hints Kanchan Joshi
@ 2019-01-23 18:27       ` " Javier González
  2019-01-24  8:35         ` Jan Kara
  0 siblings, 1 reply; 20+ messages in thread
From: Javier González @ 2019-01-23 18:27 UTC (permalink / raw)
  To: Kanchan Joshi
  Cc: linux-fsdevel, linux-block, linux-ext4, linux-nvme, jack, david,
	tytso, prakash.v

[-- Attachment #1: Type: text/plain, Size: 1038 bytes --]


> On 9 Jan 2019, at 16.30, Kanchan Joshi <joshi.k@samsung.com> wrote:
> 
> Exiting write-hints are exposed to user-mode. There is a possiblity
> of conflict if kernel happens to use those. This patch introduces four
> write-hints for exclusive kernel-mode use.
> 
> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
> ---
> include/linux/fs.h | 5 +++++
> 1 file changed, 5 insertions(+)
> 
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 811c777..e8548eb 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -291,6 +291,11 @@ enum rw_hint {
> 	WRITE_LIFE_MEDIUM	= RWH_WRITE_LIFE_MEDIUM,
> 	WRITE_LIFE_LONG		= RWH_WRITE_LIFE_LONG,
> 	WRITE_LIFE_EXTREME	= RWH_WRITE_LIFE_EXTREME,
> +/* below ones are meant for in-kernel use */
> +	KERN_WRITE_LIFE_SHORT,
> +	KERN_WRITE_LIFE_MEDIUM,
> +	KERN_WRITE_LIFE_LONG,
> +	KERN_WRITE_LIFE_EXTREME
> };
> 

I think Jens and Dave meant kernel hints to go top down. This would also
give space for supporting more hints / streams from both ends for user
and kernel.

Javier

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/4] Write-hint for FS journal
  2019-01-09 15:30 ` [PATCH v2 0/4] Write-hint for FS journal Kanchan Joshi
                     ` (3 preceding siblings ...)
       [not found]   ` <CGME20190109153342epcas2p3208f62a4dd876f8e1765b48f8aec2432@epcas2p3.samsung.com>
@ 2019-01-23 18:35   ` Javier González
  2019-01-24  8:29   ` Jan Kara
  2019-01-25 16:23   ` Keith Busch
  6 siblings, 0 replies; 20+ messages in thread
From: Javier González @ 2019-01-23 18:35 UTC (permalink / raw)
  To: Kanchan Joshi
  Cc: linux-fsdevel, linux-block, linux-ext4, linux-nvme, jack, david,
	tytso, prakash.v

[-- Attachment #1: Type: text/plain, Size: 1839 bytes --]


> On 9 Jan 2019, at 16.30, Kanchan Joshi <joshi.k@samsung.com> wrote:
> 
> Towards supporing write-hints/streams for filesystem journal.
> 
> Here is the v1 patch for background -
> https://marc.info/?l=linux-fsdevel&m=154444637519020&w=2
> 
> Changes since v1:
> - introduce four more hints for in-kernel use, as recommended by Dave chinner
>  & Jens axboe. This isolates kernel-mode hints from user-mode ones.
> - remove mount-option to specify write-hint, as recommended by Jan kara &
>  Dave chinner. Rather, FS always sets write-hint for journal. This gets ignored
>  if device does not support stream.
> - Removed code-redundancy for write_dirty_buffer (Jan kara's review comment)
> 
> 
> Kanchan Joshi (4):
>  block: Increase count of supported write-hints
>  fs: introduce four macros for in-kernel hints
>  fs: introduce APIs to enable sending write-hint with buffer-head
>  fs/ext4,jbd2: add support for passing write-hint with journal.
> 
> fs/buffer.c                 | 18 ++++++++++++++++--
> fs/ext4/super.c             |  2 ++
> fs/jbd2/commit.c            | 11 +++++++----
> fs/jbd2/journal.c           |  3 ++-
> fs/jbd2/revoke.c            |  3 ++-
> include/linux/blkdev.h      |  5 ++++-
> include/linux/buffer_head.h |  3 +++
> include/linux/fs.h          |  5 +++++
> include/linux/jbd2.h        |  8 ++++++++
> 9 files changed, 49 insertions(+), 9 deletions(-)
> 
> --
> 2.7.4

Worth sharing the paper where you describe the design and the numbers
you collected [1]. Also, addressing Dave's comment on stream support, it
points to a Samsung drive supporting streams (PM963). In this context,
you should verify in V3 that we are at least passing xfstests and
blktests with these changes.

[1] https://www.usenix.org/system/files/conference/fast18/fast18-rho.pdf

Javier

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/4] Write-hint for FS journal
  2019-01-09 15:30 ` [PATCH v2 0/4] Write-hint for FS journal Kanchan Joshi
                     ` (4 preceding siblings ...)
  2019-01-23 18:35   ` [PATCH v2 0/4] Write-hint for FS journal Javier González
@ 2019-01-24  8:29   ` Jan Kara
  2019-01-25 14:20     ` Kanchan Joshi
  2019-01-25 16:23   ` Keith Busch
  6 siblings, 1 reply; 20+ messages in thread
From: Jan Kara @ 2019-01-24  8:29 UTC (permalink / raw)
  To: Kanchan Joshi
  Cc: linux-fsdevel, linux-block, linux-ext4, linux-nvme, jack, david,
	tytso, prakash.v

Hello,

On Wed 09-01-19 21:00:57, Kanchan Joshi wrote:
> Towards supporing write-hints/streams for filesystem journal.                   
>                                                                                 
> Here is the v1 patch for background -                                           
> https://marc.info/?l=linux-fsdevel&m=154444637519020&w=2                        
>                                                                                 
> Changes since v1:                                                               
> - introduce four more hints for in-kernel use, as recommended by Dave chinner   
>   & Jens axboe. This isolates kernel-mode hints from user-mode ones.            
> - remove mount-option to specify write-hint, as recommended by Jan kara &       
>   Dave chinner. Rather, FS always sets write-hint for journal. This gets ignored
>   if device does not support stream.                                            
> - Removed code-redundancy for write_dirty_buffer (Jan kara's review comment)

I guess the series should also go to Jens since he was the original author
of write-hint support so he has the best idea about the architecture.

								Honza

> Kanchan Joshi (4):
>   block: Increase count of supported write-hints
>   fs: introduce four macros for in-kernel hints
>   fs: introduce APIs to enable sending write-hint with buffer-head
>   fs/ext4,jbd2: add support for passing write-hint with journal.
> 
>  fs/buffer.c                 | 18 ++++++++++++++++--
>  fs/ext4/super.c             |  2 ++
>  fs/jbd2/commit.c            | 11 +++++++----
>  fs/jbd2/journal.c           |  3 ++-
>  fs/jbd2/revoke.c            |  3 ++-
>  include/linux/blkdev.h      |  5 ++++-
>  include/linux/buffer_head.h |  3 +++
>  include/linux/fs.h          |  5 +++++
>  include/linux/jbd2.h        |  8 ++++++++
>  9 files changed, 49 insertions(+), 9 deletions(-)
> 
> -- 
> 2.7.4
> 
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/4] introduce four macros for in-kernel hints
  2019-01-23 18:27       ` [PATCH 2/4] " Javier González
@ 2019-01-24  8:35         ` Jan Kara
  2019-01-24  9:23           ` Javier González
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Kara @ 2019-01-24  8:35 UTC (permalink / raw)
  To: Javier González
  Cc: Kanchan Joshi, linux-fsdevel, linux-block, linux-ext4,
	linux-nvme, jack, david, tytso, prakash.v, Jens Axboe

On Wed 23-01-19 19:27:12, Javier González wrote:
> 
> > On 9 Jan 2019, at 16.30, Kanchan Joshi <joshi.k@samsung.com> wrote:
> > 
> > Exiting write-hints are exposed to user-mode. There is a possiblity
> > of conflict if kernel happens to use those. This patch introduces four
> > write-hints for exclusive kernel-mode use.
> > 
> > Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
> > ---
> > include/linux/fs.h | 5 +++++
> > 1 file changed, 5 insertions(+)
> > 
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 811c777..e8548eb 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -291,6 +291,11 @@ enum rw_hint {
> > 	WRITE_LIFE_MEDIUM	= RWH_WRITE_LIFE_MEDIUM,
> > 	WRITE_LIFE_LONG		= RWH_WRITE_LIFE_LONG,
> > 	WRITE_LIFE_EXTREME	= RWH_WRITE_LIFE_EXTREME,
> > +/* below ones are meant for in-kernel use */
> > +	KERN_WRITE_LIFE_SHORT,
> > +	KERN_WRITE_LIFE_MEDIUM,
> > +	KERN_WRITE_LIFE_LONG,
> > +	KERN_WRITE_LIFE_EXTREME
> > };
> > 
> 
> I think Jens and Dave meant kernel hints to go top down. This would also
> give space for supporting more hints / streams from both ends for user
> and kernel.

Yes, that was the idea however if I understand it right, the write hints do
not really have to be consistent boot-to-boot since they aren't stored
persistently by the disk, are they? If that's the case, it doesn't really
matter which numbers we pick.

One thing I don't quite like is the naming of KERN_WRITE_LIFE_SHORT etc.. It
is upto filesystem to assign meanings to the write hints. So I think it is
enough to provide something like KERN_WRITE_HINT_MIN which is the first
hint available to the kernel and then the number of hints available to the
kernel.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] fs/ext4,jbd2: add support for passing write-hint with journal.
  2019-01-09 15:31     ` [PATCH 4/4] fs/ext4,jbd2: add support for passing write-hint with journal Kanchan Joshi
@ 2019-01-24  8:50       ` Jan Kara
  0 siblings, 0 replies; 20+ messages in thread
From: Jan Kara @ 2019-01-24  8:50 UTC (permalink / raw)
  To: Kanchan Joshi
  Cc: linux-fsdevel, linux-block, linux-ext4, linux-nvme, jack, david,
	tytso, prakash.v

On Wed 09-01-19 21:01:01, Kanchan Joshi wrote:
> For NAND based SSDs, mixing of data with different life-time reduces
> efficiency of internal garbage-collection. During FS operations, series
> of journal updates will follow/precede series of data/meta updates, causing
> intermixing inside SSD. By passing a write-hint with journal, its write
> can be isolated from other data/meta writes, leading to endurance/performance
> benefit on SSD.
> 
> This patch introduces "j_writehint" member in JBD2 journal, using which
> Ext4 specifies write-hint (as SHORT) for journal.
> 
> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>

Thanks for the patch. It looks mostly good, just one nit below.

> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index d6c142d..3af4049 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -4289,6 +4289,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
>  
>  	set_task_ioprio(sbi->s_journal->j_task, journal_ioprio);
>  
> +	sbi->s_journal->j_writehint = KERN_WRITE_LIFE_SHORT;
> +
>  	sbi->s_journal->j_commit_callback = ext4_journal_commit_callback;

So I'd rather have defines like:

#define EXT4_WRITE_HINT_JOURNAL KERN_WRITE_HINT_MIN

like I suggested in another email and then later we could add stuff like

#define EXT4_WRITE_HINT_BITMAP	(KERN_WRITE_HINT_MIN + 1)

etc.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/4] introduce four macros for in-kernel hints
  2019-01-24  8:35         ` Jan Kara
@ 2019-01-24  9:23           ` Javier González
  0 siblings, 0 replies; 20+ messages in thread
From: Javier González @ 2019-01-24  9:23 UTC (permalink / raw)
  To: Jan Kara
  Cc: Kanchan Joshi, linux-fsdevel, linux-block, linux-ext4,
	linux-nvme, jack, david, tytso, prakash.v, Jens Axboe

[-- Attachment #1: Type: text/plain, Size: 2345 bytes --]


> On 24 Jan 2019, at 09.35, Jan Kara <jack@suse.cz> wrote:
> 
> On Wed 23-01-19 19:27:12, Javier González wrote:
>>> On 9 Jan 2019, at 16.30, Kanchan Joshi <joshi.k@samsung.com> wrote:
>>> 
>>> Exiting write-hints are exposed to user-mode. There is a possiblity
>>> of conflict if kernel happens to use those. This patch introduces four
>>> write-hints for exclusive kernel-mode use.
>>> 
>>> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
>>> ---
>>> include/linux/fs.h | 5 +++++
>>> 1 file changed, 5 insertions(+)
>>> 
>>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>>> index 811c777..e8548eb 100644
>>> --- a/include/linux/fs.h
>>> +++ b/include/linux/fs.h
>>> @@ -291,6 +291,11 @@ enum rw_hint {
>>> 	WRITE_LIFE_MEDIUM	= RWH_WRITE_LIFE_MEDIUM,
>>> 	WRITE_LIFE_LONG		= RWH_WRITE_LIFE_LONG,
>>> 	WRITE_LIFE_EXTREME	= RWH_WRITE_LIFE_EXTREME,
>>> +/* below ones are meant for in-kernel use */
>>> +	KERN_WRITE_LIFE_SHORT,
>>> +	KERN_WRITE_LIFE_MEDIUM,
>>> +	KERN_WRITE_LIFE_LONG,
>>> +	KERN_WRITE_LIFE_EXTREME
>>> };
>> 
>> I think Jens and Dave meant kernel hints to go top down. This would also
>> give space for supporting more hints / streams from both ends for user
>> and kernel.
> 
> Yes, that was the idea however if I understand it right, the write hints do
> not really have to be consistent boot-to-boot since they aren't stored
> persistently by the disk, are they? If that's the case, it doesn't really
> matter which numbers we pick.
> 

I guess this is implementation specific. Some times the drive will want
to store this to improve GC. For the current "coldness" hint I does not
matter much, but if the hint were to express other metric it can become
relevant.

Anyway, the comment was more to separate user / kernel hints and allow
them to grow from the ends.


> One thing I don't quite like is the naming of KERN_WRITE_LIFE_SHORT etc.. It
> is upto filesystem to assign meanings to the write hints. So I think it is
> enough to provide something like KERN_WRITE_HINT_MIN which is the first
> hint available to the kernel and then the number of hints available to the
> kernel.
> 

Makes sense to me. Then we can rename the hint for each FS to give it
proper mening.

> 								Honza
> 
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

Javier

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/4] Write-hint for FS journal
  2019-01-24  8:29   ` Jan Kara
@ 2019-01-25 14:20     ` Kanchan Joshi
  0 siblings, 0 replies; 20+ messages in thread
From: Kanchan Joshi @ 2019-01-25 14:20 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-fsdevel, linux-block, linux-ext4, linux-nvme, jack, david,
	Jan Kara, prakash.v

Hi Jens,
Can you please have a glance on this patch series, given than attempt is 
to extend the original architecture of write-hints.

Thanks,

On Thursday 24 January 2019 01:59 PM, Jan Kara wrote:
> Hello,
> 
> On Wed 09-01-19 21:00:57, Kanchan Joshi wrote:
>> Towards supporing write-hints/streams for filesystem journal.
>>                                                                                  
>> Here is the v1 patch for background -
>> https://marc.info/?l=linux-fsdevel&m=154444637519020&w=2
>>                                                                                  
>> Changes since v1:
>> - introduce four more hints for in-kernel use, as recommended by Dave chinner
>>    & Jens axboe. This isolates kernel-mode hints from user-mode ones.
>> - remove mount-option to specify write-hint, as recommended by Jan kara &
>>    Dave chinner. Rather, FS always sets write-hint for journal. This gets ignored
>>    if device does not support stream.
>> - Removed code-redundancy for write_dirty_buffer (Jan kara's review comment)
> 
> I guess the series should also go to Jens since he was the original author
> of write-hint support so he has the best idea about the architecture.
> 
> 								Honza
> 
>> Kanchan Joshi (4):
>>    block: Increase count of supported write-hints
>>    fs: introduce four macros for in-kernel hints
>>    fs: introduce APIs to enable sending write-hint with buffer-head
>>    fs/ext4,jbd2: add support for passing write-hint with journal.
>>
>>   fs/buffer.c                 | 18 ++++++++++++++++--
>>   fs/ext4/super.c             |  2 ++
>>   fs/jbd2/commit.c            | 11 +++++++----
>>   fs/jbd2/journal.c           |  3 ++-
>>   fs/jbd2/revoke.c            |  3 ++-
>>   include/linux/blkdev.h      |  5 ++++-
>>   include/linux/buffer_head.h |  3 +++
>>   include/linux/fs.h          |  5 +++++
>>   include/linux/jbd2.h        |  8 ++++++++
>>   9 files changed, 49 insertions(+), 9 deletions(-)
>>
>> -- 
>> 2.7.4
>>
>>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/4] Write-hint for FS journal
  2019-01-09 15:30 ` [PATCH v2 0/4] Write-hint for FS journal Kanchan Joshi
                     ` (5 preceding siblings ...)
  2019-01-24  8:29   ` Jan Kara
@ 2019-01-25 16:23   ` Keith Busch
  2019-01-28 12:47     ` Jan Kara
  6 siblings, 1 reply; 20+ messages in thread
From: Keith Busch @ 2019-01-25 16:23 UTC (permalink / raw)
  To: Kanchan Joshi
  Cc: linux-fsdevel, linux-block, linux-ext4, linux-nvme, jack, david,
	tytso, prakash.v

On Wed, Jan 09, 2019 at 09:00:57PM +0530, Kanchan Joshi wrote:
> Towards supporing write-hints/streams for filesystem journal.                   
>                                                                                 
> Here is the v1 patch for background -                                           
> https://marc.info/?l=linux-fsdevel&m=154444637519020&w=2                        
>                                                                                 
> Changes since v1:                                                               
> - introduce four more hints for in-kernel use, as recommended by Dave chinner   
>   & Jens axboe. This isolates kernel-mode hints from user-mode ones.            

The nvme driver disables streams if the controller doesn't support
BLK_MAX_WRITE_HINT number of streams, so this series breaks the feature
for controllers that only support up to 4.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/4] Write-hint for FS journal
  2019-01-25 16:23   ` Keith Busch
@ 2019-01-28 12:47     ` Jan Kara
  2019-01-28 23:24       ` Keith Busch
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Kara @ 2019-01-28 12:47 UTC (permalink / raw)
  To: Keith Busch
  Cc: Kanchan Joshi, linux-fsdevel, linux-block, linux-ext4,
	linux-nvme, jack, david, tytso, prakash.v

On Fri 25-01-19 09:23:53, Keith Busch wrote:
> On Wed, Jan 09, 2019 at 09:00:57PM +0530, Kanchan Joshi wrote:
> > Towards supporing write-hints/streams for filesystem journal.                   
> >                                                                                 
> > Here is the v1 patch for background -                                           
> > https://marc.info/?l=linux-fsdevel&m=154444637519020&w=2                        
> >                                                                                 
> > Changes since v1:                                                               
> > - introduce four more hints for in-kernel use, as recommended by Dave chinner   
> >   & Jens axboe. This isolates kernel-mode hints from user-mode ones.            
> 
> The nvme driver disables streams if the controller doesn't support
> BLK_MAX_WRITE_HINT number of streams, so this series breaks the feature
> for controllers that only support up to 4.

Right. Do you know if there are such controllers? Or are you just afraid
that there could be?

									Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/4] Write-hint for FS journal
  2019-01-28 12:47     ` Jan Kara
@ 2019-01-28 23:24       ` Keith Busch
  2019-01-29 10:07         ` Jan Kara
  0 siblings, 1 reply; 20+ messages in thread
From: Keith Busch @ 2019-01-28 23:24 UTC (permalink / raw)
  To: Jan Kara
  Cc: Kanchan Joshi, linux-fsdevel, linux-block, linux-ext4,
	linux-nvme, jack, david, tytso, prakash.v

On Mon, Jan 28, 2019 at 04:47:09AM -0800, Jan Kara wrote:
> On Fri 25-01-19 09:23:53, Keith Busch wrote:
> > On Wed, Jan 09, 2019 at 09:00:57PM +0530, Kanchan Joshi wrote:
> > > Towards supporing write-hints/streams for filesystem journal.                   
> > >                                                                                 
> > > Here is the v1 patch for background -                                           
> > > https://marc.info/?l=linux-fsdevel&m=154444637519020&w=2                        
> > >                                                                                 
> > > Changes since v1:                                                               
> > > - introduce four more hints for in-kernel use, as recommended by Dave chinner   
> > >   & Jens axboe. This isolates kernel-mode hints from user-mode ones.            
> > 
> > The nvme driver disables streams if the controller doesn't support
> > BLK_MAX_WRITE_HINT number of streams, so this series breaks the feature
> > for controllers that only support up to 4.
> 
> Right. Do you know if there are such controllers? Or are you just afraid
> that there could be?

I've asked around, and the concensus I received is all currently support
at least 8, but they couldn't say if that would be true for potential
lower budget products. Can we implement a reasonable fallback to use
what's available?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/4] Write-hint for FS journal
  2019-01-28 23:24       ` Keith Busch
@ 2019-01-29 10:07         ` Jan Kara
  2019-01-30  0:13           ` Dave Chinner
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Kara @ 2019-01-29 10:07 UTC (permalink / raw)
  To: Keith Busch
  Cc: Jan Kara, Kanchan Joshi, linux-fsdevel, linux-block, linux-ext4,
	linux-nvme, jack, david, tytso, prakash.v, Jens Axboe

On Mon 28-01-19 16:24:24, Keith Busch wrote:
> On Mon, Jan 28, 2019 at 04:47:09AM -0800, Jan Kara wrote:
> > On Fri 25-01-19 09:23:53, Keith Busch wrote:
> > > On Wed, Jan 09, 2019 at 09:00:57PM +0530, Kanchan Joshi wrote:
> > > > Towards supporing write-hints/streams for filesystem journal.                   
> > > >                                                                                 
> > > > Here is the v1 patch for background -                                           
> > > > https://marc.info/?l=linux-fsdevel&m=154444637519020&w=2                        
> > > >                                                                                 
> > > > Changes since v1:                                                               
> > > > - introduce four more hints for in-kernel use, as recommended by Dave chinner   
> > > >   & Jens axboe. This isolates kernel-mode hints from user-mode ones.            
> > > 
> > > The nvme driver disables streams if the controller doesn't support
> > > BLK_MAX_WRITE_HINT number of streams, so this series breaks the feature
> > > for controllers that only support up to 4.
> > 
> > Right. Do you know if there are such controllers? Or are you just afraid
> > that there could be?
> 
> I've asked around, and the concensus I received is all currently support
> at least 8, but they couldn't say if that would be true for potential
> lower budget products. Can we implement a reasonable fallback to use
> what's available?

OK, thanks for input. So probably we should just map kernel stream IDs to 0
if the device doesn't support them. But that probably means we need to
propagate number of available streams up from NVME into the block layer so
that this can be handled reasonably seamlessly. Jens, Kanchan?

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/4] Write-hint for FS journal
  2019-01-29 10:07         ` Jan Kara
@ 2019-01-30  0:13           ` Dave Chinner
  2019-01-30 13:54             ` Kanchan Joshi
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2019-01-30  0:13 UTC (permalink / raw)
  To: Jan Kara
  Cc: Keith Busch, Kanchan Joshi, linux-fsdevel, linux-block,
	linux-ext4, linux-nvme, jack, tytso, prakash.v, Jens Axboe

On Tue, Jan 29, 2019 at 11:07:02AM +0100, Jan Kara wrote:
> On Mon 28-01-19 16:24:24, Keith Busch wrote:
> > On Mon, Jan 28, 2019 at 04:47:09AM -0800, Jan Kara wrote:
> > > On Fri 25-01-19 09:23:53, Keith Busch wrote:
> > > > On Wed, Jan 09, 2019 at 09:00:57PM +0530, Kanchan Joshi wrote:
> > > > > Towards supporing write-hints/streams for filesystem journal.                   
> > > > >                                                                                 
> > > > > Here is the v1 patch for background -                                           
> > > > > https://marc.info/?l=linux-fsdevel&m=154444637519020&w=2                        
> > > > >                                                                                 
> > > > > Changes since v1:                                                               
> > > > > - introduce four more hints for in-kernel use, as recommended by Dave chinner   
> > > > >   & Jens axboe. This isolates kernel-mode hints from user-mode ones.            
> > > > 
> > > > The nvme driver disables streams if the controller doesn't support
> > > > BLK_MAX_WRITE_HINT number of streams, so this series breaks the feature
> > > > for controllers that only support up to 4.
> > > 
> > > Right. Do you know if there are such controllers? Or are you just afraid
> > > that there could be?
> > 
> > I've asked around, and the concensus I received is all currently support
> > at least 8, but they couldn't say if that would be true for potential
> > lower budget products. Can we implement a reasonable fallback to use
> > what's available?
> 
> OK, thanks for input. So probably we should just map kernel stream IDs to 0
> if the device doesn't support them. But that probably means we need to
> propagate number of available streams up from NVME into the block layer so
> that this can be handled reasonably seamlessly. Jens, Kanchan?

Yeah, that's basically what I said we needed to do when this was
last discussed. i.e. that the block layer needed to know how many
streams the hardware had and map the 4 "kernel internal" hints
appropriately to what he device supports.

e.g. if the device only supports 4 hints, then it needs to map the
kernel hints either to zero. If it supports less than 8 streams,
then they need otbe mapped into the hints above index 5. If there
are N streams, then they need to be mapped to the hints {N-3,N}

And, to top it all off, there needs to be guards so that if we want
to grow the userspace hints to more than 4 hints, they don't crash
into ranges the kernel is already reserving because of limited
device range support.

Nothing is ever simple....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/4] Write-hint for FS journal
  2019-01-30  0:13           ` Dave Chinner
@ 2019-01-30 13:54             ` Kanchan Joshi
  2019-02-05 11:50               ` Jan Kara
  0 siblings, 1 reply; 20+ messages in thread
From: Kanchan Joshi @ 2019-01-30 13:54 UTC (permalink / raw)
  To: Dave Chinner, Jan Kara
  Cc: Keith Busch, linux-fsdevel, linux-block, linux-ext4, linux-nvme,
	jack, tytso, prakash.v, Jens Axboe


On Wednesday 30 January 2019 05:43 AM, Dave Chinner wrote:
> On Tue, Jan 29, 2019 at 11:07:02AM +0100, Jan Kara wrote:
>> On Mon 28-01-19 16:24:24, Keith Busch wrote:
>>> On Mon, Jan 28, 2019 at 04:47:09AM -0800, Jan Kara wrote:
>>>> On Fri 25-01-19 09:23:53, Keith Busch wrote:
>>>>> On Wed, Jan 09, 2019 at 09:00:57PM +0530, Kanchan Joshi wrote:
>>>>>> Towards supporing write-hints/streams for filesystem journal.
>>>>>>                                                                                  
>>>>>> Here is the v1 patch for background -
>>>>>> https://marc.info/?l=linux-fsdevel&m=154444637519020&w=2
>>>>>>                                                                                  
>>>>>> Changes since v1:
>>>>>> - introduce four more hints for in-kernel use, as recommended by Dave chinner
>>>>>>    & Jens axboe. This isolates kernel-mode hints from user-mode ones.
>>>>>
>>>>> The nvme driver disables streams if the controller doesn't support
>>>>> BLK_MAX_WRITE_HINT number of streams, so this series breaks the feature
>>>>> for controllers that only support up to 4.
>>>>
>>>> Right. Do you know if there are such controllers? Or are you just afraid
>>>> that there could be?
>>>
>>> I've asked around, and the concensus I received is all currently support
>>> at least 8, but they couldn't say if that would be true for potential
>>> lower budget products. Can we implement a reasonable fallback to use
>>> what's available?
>>
>> OK, thanks for input. So probably we should just map kernel stream IDs to 0
>> if the device doesn't support them. But that probably means we need to
>> propagate number of available streams up from NVME into the block layer so
>> that this can be handled reasonably seamlessly. Jens, Kanchan?
> 
> Yeah, that's basically what I said we needed to do when this was
> last discussed. i.e. that the block layer needed to know how many
> streams the hardware had and map the 4 "kernel internal" hints
> appropriately to what he device supports.
> 
> e.g. if the device only supports 4 hints, then it needs to map the
> kernel hints either to zero. If it supports less than 8 streams,
> then they need otbe mapped into the hints above index 5. If there
> are N streams, then they need to be mapped to the hints {N-3,N}
> 
> And, to top it all off, there needs to be guards so that if we want
> to grow the userspace hints to more than 4 hints, they don't crash
> into ranges the kernel is already reserving because of limited
> device range support.
> 
> Nothing is ever simple....
> 
Thanks all for feedback.
user-hints, when they reach to kernel via fcntl path, are sanity-checked 
(rw_hint_valid function).
Currently streams are enabled when nvme driver is made to run with 
"streams =1" option, while stream users always pass some write-hint, 
without bothering whether streams (and how many of those) are 
operational or not. This keeps configuration simple for stream users. 
Second, block layer does not translate write-hint to stream-number, 
rather it is done inside nvme driver. I suppose I should keep both these 
properties intact.
And considering all the suggestions, this is the plan for V3 -

[In block layer]
1. Introduce one macro "KERN_WRITE_HINT_MIN" which will take the value 
"user_hint_cnt + 1".
FS code will use this value (onwards) to define their own streams.

2. Introduce another macro "BLK_MAX_KERNEL_WRITE_HINTS" which will be 
set to 4 for now.

[In nvme driver]
1. Continue working as before if device supports just 4 streams. All 
these streams are used by user-hints, and kernel-hints are translated to 0.

2. If device supports any more than 4 streams, those will be mapped to 
serve kernel-hints, starting from KERN_WRITE_HINT_MIN onwards.
For example, if device has 6 streams, four streams (numbers = 1,2,3,4) 
will be used to serve user-hints and two streams ( numbers = 65535, 
65534) will be used to serve first two kernel hints. Other kernel-hints 
get mapped to 0. OTOH, if device has 10 streams, first four kernel-hints 
will be mapped to non-zero values (65535 to 65532) and anything else 
would get turned to 0.


Let me know if this sounds fine?


Thanks,
Kanchan







^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/4] Write-hint for FS journal
  2019-01-30 13:54             ` Kanchan Joshi
@ 2019-02-05 11:50               ` Jan Kara
  2019-02-05 22:53                 ` Dave Chinner
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Kara @ 2019-02-05 11:50 UTC (permalink / raw)
  To: Kanchan Joshi
  Cc: Dave Chinner, Jan Kara, Keith Busch, linux-fsdevel, linux-block,
	linux-ext4, linux-nvme, jack, tytso, prakash.v, Jens Axboe

On Wed 30-01-19 19:24:39, Kanchan Joshi wrote:
> 
> On Wednesday 30 January 2019 05:43 AM, Dave Chinner wrote:
> > On Tue, Jan 29, 2019 at 11:07:02AM +0100, Jan Kara wrote:
> > > On Mon 28-01-19 16:24:24, Keith Busch wrote:
> > > > On Mon, Jan 28, 2019 at 04:47:09AM -0800, Jan Kara wrote:
> > > > > On Fri 25-01-19 09:23:53, Keith Busch wrote:
> > > > > > On Wed, Jan 09, 2019 at 09:00:57PM +0530, Kanchan Joshi wrote:
> > > > > > > Towards supporing write-hints/streams for filesystem journal.
> > > > > > > Here is the v1 patch for background -
> > > > > > > https://marc.info/?l=linux-fsdevel&m=154444637519020&w=2
> > > > > > > Changes since v1:
> > > > > > > - introduce four more hints for in-kernel use, as recommended by Dave chinner
> > > > > > >    & Jens axboe. This isolates kernel-mode hints from user-mode ones.
> > > > > > 
> > > > > > The nvme driver disables streams if the controller doesn't support
> > > > > > BLK_MAX_WRITE_HINT number of streams, so this series breaks the feature
> > > > > > for controllers that only support up to 4.
> > > > > 
> > > > > Right. Do you know if there are such controllers? Or are you just afraid
> > > > > that there could be?
> > > > 
> > > > I've asked around, and the concensus I received is all currently support
> > > > at least 8, but they couldn't say if that would be true for potential
> > > > lower budget products. Can we implement a reasonable fallback to use
> > > > what's available?
> > > 
> > > OK, thanks for input. So probably we should just map kernel stream IDs to 0
> > > if the device doesn't support them. But that probably means we need to
> > > propagate number of available streams up from NVME into the block layer so
> > > that this can be handled reasonably seamlessly. Jens, Kanchan?
> > 
> > Yeah, that's basically what I said we needed to do when this was
> > last discussed. i.e. that the block layer needed to know how many
> > streams the hardware had and map the 4 "kernel internal" hints
> > appropriately to what he device supports.
> > 
> > e.g. if the device only supports 4 hints, then it needs to map the
> > kernel hints either to zero. If it supports less than 8 streams,
> > then they need otbe mapped into the hints above index 5. If there
> > are N streams, then they need to be mapped to the hints {N-3,N}
> > 
> > And, to top it all off, there needs to be guards so that if we want
> > to grow the userspace hints to more than 4 hints, they don't crash
> > into ranges the kernel is already reserving because of limited
> > device range support.
> > 
> > Nothing is ever simple....
> > 
> Thanks all for feedback.
> user-hints, when they reach to kernel via fcntl path, are sanity-checked
> (rw_hint_valid function).
> Currently streams are enabled when nvme driver is made to run with "streams
> =1" option, while stream users always pass some write-hint, without
> bothering whether streams (and how many of those) are operational or not.
> This keeps configuration simple for stream users. Second, block layer does
> not translate write-hint to stream-number, rather it is done inside nvme
> driver. I suppose I should keep both these properties intact.
> And considering all the suggestions, this is the plan for V3 -
> 
> [In block layer]
> 1. Introduce one macro "KERN_WRITE_HINT_MIN" which will take the value
> "user_hint_cnt + 1".
> FS code will use this value (onwards) to define their own streams.
> 
> 2. Introduce another macro "BLK_MAX_KERNEL_WRITE_HINTS" which will be set to
> 4 for now.
> 
> [In nvme driver]
> 1. Continue working as before if device supports just 4 streams. All these
> streams are used by user-hints, and kernel-hints are translated to 0.
> 
> 2. If device supports any more than 4 streams, those will be mapped to serve
> kernel-hints, starting from KERN_WRITE_HINT_MIN onwards.
> For example, if device has 6 streams, four streams (numbers = 1,2,3,4) will
> be used to serve user-hints and two streams ( numbers = 65535, 65534) will
> be used to serve first two kernel hints. Other kernel-hints get mapped to 0.
> OTOH, if device has 10 streams, first four kernel-hints will be mapped to
> non-zero values (65535 to 65532) and anything else would get turned to 0.

Well, I'm not sure if the mapping should happen in the NVME driver. In
future, there will be potentially more drivers supporting write hints and
we probably don't want each of them to replicate the mapping behavior. So
IMO the mapping should rather belong to the block layer...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/4] Write-hint for FS journal
  2019-02-05 11:50               ` Jan Kara
@ 2019-02-05 22:53                 ` Dave Chinner
  0 siblings, 0 replies; 20+ messages in thread
From: Dave Chinner @ 2019-02-05 22:53 UTC (permalink / raw)
  To: Jan Kara
  Cc: Kanchan Joshi, Keith Busch, linux-fsdevel, linux-block,
	linux-ext4, linux-nvme, jack, tytso, prakash.v, Jens Axboe

On Tue, Feb 05, 2019 at 12:50:48PM +0100, Jan Kara wrote:
> On Wed 30-01-19 19:24:39, Kanchan Joshi wrote:
> > 
> > On Wednesday 30 January 2019 05:43 AM, Dave Chinner wrote:
> > > On Tue, Jan 29, 2019 at 11:07:02AM +0100, Jan Kara wrote:
> > > > On Mon 28-01-19 16:24:24, Keith Busch wrote:
> > > > > On Mon, Jan 28, 2019 at 04:47:09AM -0800, Jan Kara wrote:
> > > > > > On Fri 25-01-19 09:23:53, Keith Busch wrote:
> > > > > > > On Wed, Jan 09, 2019 at 09:00:57PM +0530, Kanchan Joshi wrote:
> > > > > > > > Towards supporing write-hints/streams for filesystem journal.
> > > > > > > > Here is the v1 patch for background -
> > > > > > > > https://marc.info/?l=linux-fsdevel&m=154444637519020&w=2
> > > > > > > > Changes since v1:
> > > > > > > > - introduce four more hints for in-kernel use, as recommended by Dave chinner
> > > > > > > >    & Jens axboe. This isolates kernel-mode hints from user-mode ones.
> > > > > > > 
> > > > > > > The nvme driver disables streams if the controller doesn't support
> > > > > > > BLK_MAX_WRITE_HINT number of streams, so this series breaks the feature
> > > > > > > for controllers that only support up to 4.
> > > > > > 
> > > > > > Right. Do you know if there are such controllers? Or are you just afraid
> > > > > > that there could be?
> > > > > 
> > > > > I've asked around, and the concensus I received is all currently support
> > > > > at least 8, but they couldn't say if that would be true for potential
> > > > > lower budget products. Can we implement a reasonable fallback to use
> > > > > what's available?
> > > > 
> > > > OK, thanks for input. So probably we should just map kernel stream IDs to 0
> > > > if the device doesn't support them. But that probably means we need to
> > > > propagate number of available streams up from NVME into the block layer so
> > > > that this can be handled reasonably seamlessly. Jens, Kanchan?
> > > 
> > > Yeah, that's basically what I said we needed to do when this was
> > > last discussed. i.e. that the block layer needed to know how many
> > > streams the hardware had and map the 4 "kernel internal" hints
> > > appropriately to what he device supports.
> > > 
> > > e.g. if the device only supports 4 hints, then it needs to map the
> > > kernel hints either to zero. If it supports less than 8 streams,
> > > then they need otbe mapped into the hints above index 5. If there
> > > are N streams, then they need to be mapped to the hints {N-3,N}
> > > 
> > > And, to top it all off, there needs to be guards so that if we want
> > > to grow the userspace hints to more than 4 hints, they don't crash
> > > into ranges the kernel is already reserving because of limited
> > > device range support.
> > > 
> > > Nothing is ever simple....
> > > 
> > Thanks all for feedback.
> > user-hints, when they reach to kernel via fcntl path, are sanity-checked
> > (rw_hint_valid function).
> > Currently streams are enabled when nvme driver is made to run with "streams
> > =1" option, while stream users always pass some write-hint, without
> > bothering whether streams (and how many of those) are operational or not.
> > This keeps configuration simple for stream users. Second, block layer does
> > not translate write-hint to stream-number, rather it is done inside nvme
> > driver. I suppose I should keep both these properties intact.
> > And considering all the suggestions, this is the plan for V3 -
> > 
> > [In block layer]
> > 1. Introduce one macro "KERN_WRITE_HINT_MIN" which will take the value
> > "user_hint_cnt + 1".
> > FS code will use this value (onwards) to define their own streams.
> > 
> > 2. Introduce another macro "BLK_MAX_KERNEL_WRITE_HINTS" which will be set to
> > 4 for now.
> > 
> > [In nvme driver]
> > 1. Continue working as before if device supports just 4 streams. All these
> > streams are used by user-hints, and kernel-hints are translated to 0.
> > 
> > 2. If device supports any more than 4 streams, those will be mapped to serve
> > kernel-hints, starting from KERN_WRITE_HINT_MIN onwards.
> > For example, if device has 6 streams, four streams (numbers = 1,2,3,4) will
> > be used to serve user-hints and two streams ( numbers = 65535, 65534) will
> > be used to serve first two kernel hints. Other kernel-hints get mapped to 0.
> > OTOH, if device has 10 streams, first four kernel-hints will be mapped to
> > non-zero values (65535 to 65532) and anything else would get turned to 0.
> 
> Well, I'm not sure if the mapping should happen in the NVME driver. In
> future, there will be potentially more drivers supporting write hints and
> we probably don't want each of them to replicate the mapping behavior. So
> IMO the mapping should rather belong to the block layer...

*nod*

That's what I was suggesting. All the driver does is supply the
block layer with the number of hints it supports, and the block
layer does the rest. After all, this has to work with DM, MD, etc
so it really does need to bubble up from the driver to the block
layer so it can be handled appropriately by multi-device block
drivers. e.g. md raid might want to reserve a kernel channel for
itself (e.g. internal metadata) and so only present 7 channels to
the next layer up (4 user and 3 kernel)....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, back to index

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CGME20190109153328epcas2p4643cbdc7a2182b47893a2bcaa0778e17@epcas2p4.samsung.com>
2019-01-09 15:30 ` [PATCH v2 0/4] Write-hint for FS journal Kanchan Joshi
     [not found]   ` <CGME20190109153332epcas1p187b419176a8d1d0be4982a275c0b9e86@epcas1p1.samsung.com>
2019-01-09 15:30     ` [PATCH 1/4] block: Increase count of supported write-hints Kanchan Joshi
     [not found]   ` <CGME20190109153336epcas2p29b3275b6c545e483a3f43b92268f08bf@epcas2p2.samsung.com>
2019-01-09 15:30     ` [PATCH 2/4] fs: introduce four macros for in-kernel hints Kanchan Joshi
2019-01-23 18:27       ` [PATCH 2/4] " Javier González
2019-01-24  8:35         ` Jan Kara
2019-01-24  9:23           ` Javier González
     [not found]   ` <CGME20190109153339epcas2p4691a898dde0174a7565d62fcb3be0b6d@epcas2p4.samsung.com>
2019-01-09 15:31     ` [PATCH 3/4] fs: introduce APIs to enable sending write-hint with buffer-head Kanchan Joshi
     [not found]   ` <CGME20190109153342epcas2p3208f62a4dd876f8e1765b48f8aec2432@epcas2p3.samsung.com>
2019-01-09 15:31     ` [PATCH 4/4] fs/ext4,jbd2: add support for passing write-hint with journal Kanchan Joshi
2019-01-24  8:50       ` Jan Kara
2019-01-23 18:35   ` [PATCH v2 0/4] Write-hint for FS journal Javier González
2019-01-24  8:29   ` Jan Kara
2019-01-25 14:20     ` Kanchan Joshi
2019-01-25 16:23   ` Keith Busch
2019-01-28 12:47     ` Jan Kara
2019-01-28 23:24       ` Keith Busch
2019-01-29 10:07         ` Jan Kara
2019-01-30  0:13           ` Dave Chinner
2019-01-30 13:54             ` Kanchan Joshi
2019-02-05 11:50               ` Jan Kara
2019-02-05 22:53                 ` Dave Chinner

Linux-Block Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-block/0 linux-block/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-block linux-block/ https://lore.kernel.org/linux-block \
		linux-block@vger.kernel.org
	public-inbox-index linux-block

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-block


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git