linux-block.vger.kernel.org archive mirror
From: Jan Kara <jack@suse.cz>
To: Kanchan Joshi <joshi.k@samsung.com>
Cc: Dave Chinner <david@fromorbit.com>, Jan Kara <jack@suse.cz>,
	Keith Busch <keith.busch@intel.com>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	"jack@suse.com" <jack@suse.com>, "tytso@mit.edu" <tytso@mit.edu>,
	"prakash.v@samsung.com" <prakash.v@samsung.com>,
	Jens Axboe <axboe@fb.com>
Subject: Re: [PATCH v2 0/4] Write-hint for FS journal
Date: Tue, 5 Feb 2019 12:50:48 +0100
Message-ID: <20190205115048.GC3872@quack2.suse.cz>
In-Reply-To: <0ab2f0e1-27f2-7ab4-1772-f96c1430ea3b@samsung.com>

On Wed 30-01-19 19:24:39, Kanchan Joshi wrote:
> 
> On Wednesday 30 January 2019 05:43 AM, Dave Chinner wrote:
> > On Tue, Jan 29, 2019 at 11:07:02AM +0100, Jan Kara wrote:
> > > On Mon 28-01-19 16:24:24, Keith Busch wrote:
> > > > On Mon, Jan 28, 2019 at 04:47:09AM -0800, Jan Kara wrote:
> > > > > On Fri 25-01-19 09:23:53, Keith Busch wrote:
> > > > > > On Wed, Jan 09, 2019 at 09:00:57PM +0530, Kanchan Joshi wrote:
> > > > > > > > Towards supporting write-hints/streams for filesystem journal.
> > > > > > > Here is the v1 patch for background -
> > > > > > > https://marc.info/?l=linux-fsdevel&m=154444637519020&w=2
> > > > > > > Changes since v1:
> > > > > > > > - introduce four more hints for in-kernel use, as recommended by Dave Chinner
> > > > > > > >    & Jens Axboe. This isolates kernel-mode hints from user-mode ones.
> > > > > > 
> > > > > > The nvme driver disables streams if the controller doesn't support
> > > > > > BLK_MAX_WRITE_HINT number of streams, so this series breaks the feature
> > > > > > for controllers that only support up to 4.
> > > > > 
> > > > > Right. Do you know if there are such controllers? Or are you just afraid
> > > > > that there could be?
> > > > 
> > > > > I've asked around, and the consensus I received is that all currently support
> > > > at least 8, but they couldn't say if that would be true for potential
> > > > lower budget products. Can we implement a reasonable fallback to use
> > > > what's available?
> > > 
> > > OK, thanks for input. So probably we should just map kernel stream IDs to 0
> > > if the device doesn't support them. But that probably means we need to
> > > propagate number of available streams up from NVME into the block layer so
> > > that this can be handled reasonably seamlessly. Jens, Kanchan?
> > 
> > Yeah, that's basically what I said we needed to do when this was
> > last discussed. i.e. that the block layer needed to know how many
> > streams the hardware had and map the 4 "kernel internal" hints
> > > appropriately to what the device supports.
> > 
> > > e.g. if the device only supports 4 hints, then it needs to map the
> > > kernel hints to zero. If it supports fewer than 8 streams,
> > > then they need to be mapped into the hints above index 5. If there
> > > are N streams, then they need to be mapped to the hints {N-3,N}
> > 
> > And, to top it all off, there needs to be guards so that if we want
> > to grow the userspace hints to more than 4 hints, they don't crash
> > into ranges the kernel is already reserving because of limited
> > device range support.
> > 
> > Nothing is ever simple....
> > 
> Thanks all for the feedback.
> User hints, when they reach the kernel via the fcntl path, are sanity-checked
> (rw_hint_valid function).
> Currently streams are enabled when the nvme driver is run with the
> "streams=1" option, while stream users always pass some write-hint, without
> bothering whether streams (and how many of those) are operational or not.
> This keeps configuration simple for stream users. Second, the block layer does
> not translate write-hint to stream-number; rather, it is done inside the nvme
> driver. I suppose I should keep both these properties intact.
> And considering all the suggestions, this is the plan for V3 -
> 
> [In block layer]
> 1. Introduce one macro "KERN_WRITE_HINT_MIN" which will take the value
> "user_hint_cnt + 1".
> FS code will use this value (onwards) to define their own streams.
> 
> 2. Introduce another macro "BLK_MAX_KERNEL_WRITE_HINTS" which will be set to
> 4 for now.
> 
> [In nvme driver]
> 1. Continue working as before if device supports just 4 streams. All these
> streams are used by user-hints, and kernel-hints are translated to 0.
> 
> 2. If device supports any more than 4 streams, those will be mapped to serve
> kernel-hints, starting from KERN_WRITE_HINT_MIN onwards.
> For example, if the device has 6 streams, four streams (numbers = 1,2,3,4) will
> be used to serve user-hints and two streams (numbers = 65535, 65534) will
> be used to serve the first two kernel hints. Other kernel-hints get mapped to 0.
> OTOH, if device has 10 streams, first four kernel-hints will be mapped to
> non-zero values (65535 to 65532) and anything else would get turned to 0.
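
The numbering in the plan quoted above can be sketched as a small
stand-alone C function. This is only an illustration, not the nvme driver
code: BLK_MAX_KERNEL_WRITE_HINTS and the stream numbers come from the
proposal, while USER_STREAMS, KERN_STREAM_TOP, the 0-based kernel hint
index and kernel_hint_to_stream() are names assumed here for the example.

```c
#include <stdint.h>

#define USER_STREAMS               4       /* streams 1..4 serve user hints (from the plan) */
#define BLK_MAX_KERNEL_WRITE_HINTS 4       /* macro proposed in the plan */
#define KERN_STREAM_TOP            65535u  /* assumed name: first kernel stream number */

/*
 * Hypothetical helper: map the kidx-th kernel hint (0-based) to a stream
 * number on a device that offers nr_streams streams. Returns 0 when no
 * stream is left over for that hint.
 */
static uint16_t kernel_hint_to_stream(unsigned int kidx, unsigned int nr_streams)
{
	unsigned int spare;

	/* All streams are consumed by user hints: kernel hints become 0. */
	if (nr_streams <= USER_STREAMS)
		return 0;

	/* Streams beyond the first four serve kernel hints, at most four of them. */
	spare = nr_streams - USER_STREAMS;
	if (spare > BLK_MAX_KERNEL_WRITE_HINTS)
		spare = BLK_MAX_KERNEL_WRITE_HINTS;

	if (kidx >= spare)
		return 0;

	/* Kernel streams are handed out downwards from 65535. */
	return (uint16_t)(KERN_STREAM_TOP - kidx);
}
```

With 6 streams this gives 65535 and 65534 for the first two kernel hints
and 0 for the rest; with 10 streams the four kernel hints get 65535 down to
65532, matching the two examples in the plan.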

Well, I'm not sure the mapping should happen in the NVMe driver. In the
future, there will potentially be more drivers supporting write hints, and
we probably don't want each of them to replicate the mapping behavior. So
IMO the mapping should rather belong in the block layer...
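
As a rough sketch of what that could look like: the driver reports how
many streams it has, and one shared helper decides which hints survive.
Everything named here is hypothetical (struct hint_limits, NR_USER_HINTS,
NR_KERNEL_HINTS, blk_hint_to_stream_idx() are not existing kernel
interfaces), and falling back to 0 for any hint above the device's stream
count is an assumption of the sketch.

```c
/* Hypothetical per-queue limit a driver would fill in once at probe time. */
struct hint_limits {
	unsigned int nr_streams;	/* number of streams the device offers */
};

#define NR_USER_HINTS	4	/* assumed: hints 1..4 come from userspace */
#define NR_KERNEL_HINTS	4	/* assumed: hints 5..8 are kernel internal */

/*
 * Hypothetical block-layer helper: turn a hint number into a dense stream
 * index that any driver can translate to its own stream ID. Index 0 means
 * "no stream", so unsupported hints degrade instead of failing.
 */
static unsigned int blk_hint_to_stream_idx(const struct hint_limits *lim,
					   unsigned int hint)
{
	/* No hint given, or a hint outside the known range. */
	if (hint == 0 || hint > NR_USER_HINTS + NR_KERNEL_HINTS)
		return 0;

	/* The device has too few streams to back this hint. */
	if (hint > lim->nr_streams)
		return 0;

	return hint;
}
```

A driver would then only convert the returned index to a device stream ID,
without repeating the range checks.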

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
