Linux-Block Archive on lore.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Jan Kara <jack@suse.cz>
Cc: Kanchan Joshi <joshi.k@samsung.com>,
	Keith Busch <keith.busch@intel.com>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	"jack@suse.com" <jack@suse.com>, "tytso@mit.edu" <tytso@mit.edu>,
	"prakash.v@samsung.com" <prakash.v@samsung.com>,
	Jens Axboe <axboe@fb.com>
Subject: Re: [PATCH v2 0/4] Write-hint for FS journal
Date: Wed, 6 Feb 2019 09:53:15 +1100
Message-ID: <20190205225315.GY6173@dastard>
In-Reply-To: <20190205115048.GC3872@quack2.suse.cz>

On Tue, Feb 05, 2019 at 12:50:48PM +0100, Jan Kara wrote:
> On Wed 30-01-19 19:24:39, Kanchan Joshi wrote:
> > 
> > On Wednesday 30 January 2019 05:43 AM, Dave Chinner wrote:
> > > On Tue, Jan 29, 2019 at 11:07:02AM +0100, Jan Kara wrote:
> > > > On Mon 28-01-19 16:24:24, Keith Busch wrote:
> > > > > On Mon, Jan 28, 2019 at 04:47:09AM -0800, Jan Kara wrote:
> > > > > > On Fri 25-01-19 09:23:53, Keith Busch wrote:
> > > > > > > On Wed, Jan 09, 2019 at 09:00:57PM +0530, Kanchan Joshi wrote:
> > > > > > > > Towards supporting write-hints/streams for filesystem journal.
> > > > > > > > Here is the v1 patch for background -
> > > > > > > > https://marc.info/?l=linux-fsdevel&m=154444637519020&w=2
> > > > > > > > Changes since v1:
> > > > > > > > - introduce four more hints for in-kernel use, as recommended by Dave Chinner
> > > > > > > >    & Jens Axboe. This isolates kernel-mode hints from user-mode ones.
> > > > > > > 
> > > > > > > The nvme driver disables streams if the controller doesn't support
> > > > > > > BLK_MAX_WRITE_HINT number of streams, so this series breaks the feature
> > > > > > > for controllers that only support up to 4.
> > > > > > 
> > > > > > Right. Do you know if there are such controllers? Or are you just afraid
> > > > > > that there could be?
> > > > > 
> > > > > I've asked around, and the consensus I received is all currently support
> > > > > at least 8, but they couldn't say if that would be true for potential
> > > > > lower budget products. Can we implement a reasonable fallback to use
> > > > > what's available?
> > > > 
> > > > OK, thanks for input. So probably we should just map kernel stream IDs to 0
> > > > if the device doesn't support them. But that probably means we need to
> > > > propagate number of available streams up from NVME into the block layer so
> > > > that this can be handled reasonably seamlessly. Jens, Kanchan?
> > > 
> > > Yeah, that's basically what I said we needed to do when this was
> > > last discussed. i.e. that the block layer needed to know how many
> > > streams the hardware had and map the 4 "kernel internal" hints
> > > appropriately to what the device supports.
> > > 
> > > e.g. if the device only supports 4 hints, then it needs to map the
> > > kernel hints to zero. If it supports fewer than 8 streams,
> > > then they need to be mapped into the hints above index 5. If there
> > > are N streams, then they need to be mapped to the hints {N-3,N}
> > > 
> > > And, to top it all off, there needs to be guards so that if we want
> > > to grow the userspace hints to more than 4 hints, they don't crash
> > > into ranges the kernel is already reserving because of limited
> > > device range support.
> > > 
> > > Nothing is ever simple....
> > > 
> > Thanks all for feedback.
> > User hints, when they reach the kernel via the fcntl path, are sanity-checked
> > (rw_hint_valid function).
> > Currently streams are enabled when the nvme driver is run with the
> > "streams=1" option, while stream users always pass some write-hint without
> > checking whether streams (and how many of them) are operational.
> > This keeps configuration simple for stream users. Second, the block layer does
> > not translate write-hint to stream number; rather, that is done inside the nvme
> > driver. I suppose I should keep both these properties intact.
> > And considering all the suggestions, this is the plan for V3 -
> > 
> > [In block layer]
> > 1. Introduce one macro "KERN_WRITE_HINT_MIN" which will take the value
> > "user_hint_cnt + 1".
> > FS code will use this value (onwards) to define their own streams.
> > 
> > 2. Introduce another macro "BLK_MAX_KERNEL_WRITE_HINTS" which will be set to
> > 4 for now.
> > 
> > [In nvme driver]
> > 1. Continue working as before if device supports just 4 streams. All these
> > streams are used by user-hints, and kernel-hints are translated to 0.
> > 
> > 2. If device supports any more than 4 streams, those will be mapped to serve
> > kernel-hints, starting from KERN_WRITE_HINT_MIN onwards.
> > For example, if device has 6 streams, four streams (numbers = 1,2,3,4) will
> > be used to serve user-hints and two streams ( numbers = 65535, 65534) will
> > be used to serve first two kernel hints. Other kernel-hints get mapped to 0.
> > OTOH, if device has 10 streams, first four kernel-hints will be mapped to
> > non-zero values (65535 to 65532) and anything else would get turned to 0.
> 
> Well, I'm not sure if the mapping should happen in the NVME driver. In
> future, there will be potentially more drivers supporting write hints and
> we probably don't want each of them to replicate the mapping behavior. So
> IMO the mapping should rather belong to the block layer...

*nod*

That's what I was suggesting. All the driver does is supply the
block layer with the number of hints it supports, and the block
layer does the rest. After all, this has to work with DM, MD, etc.,
so it really does need to bubble up from the driver to the block
layer so it can be handled appropriately by multi-device block
drivers. e.g. md raid might want to reserve a kernel channel for
itself (e.g. internal metadata) and so only present 7 channels to
the next layer up (4 user and 3 kernel)....
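
As a rough illustration only, the split of responsibilities could be
sketched in plain userspace C as below. None of these names exist in the
kernel: hint_to_stream(), stacked_nr_streams() and the two NR_* constants
are hypothetical, and the sketch assumes hints are numbered contiguously
(1-4 user, 5-8 kernel) rather than using the 65535-downwards numbering
proposed earlier in the thread. The only thing the driver contributes is
the stream count.

```c
/* Assumed split from the thread: 4 user hints, then 4 kernel hints. */
#define NR_USER_HINTS 4
#define NR_KERN_HINTS 4

/*
 * Hypothetical block-layer helper: translate a write hint (1..8) into a
 * stream number, given how many streams the driver reported.
 * Returns 0 ("no stream") when the hint is out of range or the device
 * does not have enough streams to back it.
 */
static unsigned int hint_to_stream(unsigned int hint, unsigned int nr_streams)
{
	if (hint == 0 || hint > NR_USER_HINTS + NR_KERN_HINTS)
		return 0;
	return hint <= nr_streams ? hint : 0;
}

/*
 * Hypothetical helper for a stacking driver (e.g. md): compute how many
 * streams to present to the layer above after keeping 'reserve' kernel
 * streams for itself. User streams are never taken away, and at most
 * NR_USER_HINTS + NR_KERN_HINTS streams are ever passed up.
 */
static unsigned int stacked_nr_streams(unsigned int lower, unsigned int reserve)
{
	unsigned int max = NR_USER_HINTS + NR_KERN_HINTS;
	unsigned int usable = lower < max ? lower : max;
	unsigned int kern = usable > NR_USER_HINTS ? usable - NR_USER_HINTS : 0;

	if (reserve > kern)
		reserve = kern;	/* cannot reserve more kernel streams than exist */
	return usable - reserve;
}
```

With these assumptions, a device reporting 4 streams sends every kernel
hint to stream 0, a device reporting 6 backs only the first two kernel
hints, and an md device sitting on an 8-stream device that keeps one
stream for itself presents 7 to the layer above.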

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

