All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Wheeler <bcache@lists.ewheeler.net>
To: Kent Overstreet <kent.overstreet@gmail.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
	Coly Li <colyli@suse.de>,
	linux-block@vger.kernel.org, Jonathan Corbet <corbet@lwn.net>,
	"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>,
	open list <linux-kernel@vger.kernel.org>,
	"open list:BCACHE (BLOCK LAYER CACHE)"
	<linux-bcache@vger.kernel.org>
Subject: Re: [PATCH] bcache: make stripe_size configurable and persistent for hardware raid5/6
Date: Sat, 8 Jan 2022 13:51:22 -0800 (PST)	[thread overview]
Message-ID: <98aa1886-859-abb9-164f-c9eb9be38a91@ewheeler.net> (raw)
In-Reply-To: <c9abd220-6b7f-9299-48a1-a16d64981734@ewheeler.net>

On Fri, 7 Jan 2022, Eric Wheeler wrote:
> On Fri, 7 Jan 2022, Martin K. Petersen wrote:
> > Eric,
> > 
> > > Even new new RAID controlers that _do_ provide `io_opt` still do _not_ 
> > > indicate partial_stripes_expensive (which is an mdraid feature, but Martin 
> > > please correct me if I'm wrong here).
> > 
> > partial_stripes_expensive is a bcache thing, I am not sure why it needs
> > a separate flag. It is implied, although I guess one could argue that
> > RAID0 is a special case since partial writes are not as painful as with
> > parity RAID.
> 
> I'm guessing bcache used did some optimization for 
> queue->limits.raid_partial_stripes_expensive because md raid5 code sets 
> this flag.  At least when using Linux md as the RAID5 implementation it 
> gets configured automatically:
>    raid5.c:       mddev->queue->limits.raid_partial_stripes_expensive = 1;
> 
> https://elixir.bootlin.com/linux/latest/source/drivers/md/raid5.c#L7729
> 
> Interestingly only bcache uses it, but md does set it.

Ok so `git blame` shows that Kent added this to md/raid5.c in 
c78afc6261b (Kent Overstreet 2013-07-11 22:39:53 -0700 7526)
	mddev->queue->limits.raid_partial_stripes_expensive = 1;

    bcache/md: Use raid stripe size
    
    Now that we've got code for raid5/6 stripe awareness, bcache just needs
    to know about the stripes and when writing partial stripes is expensive
    - we probably don't want to enable this optimization for raid1 or 10,
    even though they have stripes. So add a flag to queue_limits.

Kent, Martin:

Do you think we should leave the md-specific 
raid_partial_stripes_expensive setting and require users of RAID 
controllers to set the bit themselves in bcache---or---remove all 
raid_partial_stripes_expensive code and always treat writes as "expensive" 
when `opt_io` is defined?

--
Eric Wheeler


> 
> > The SCSI spec states that submitting an I/O that is smaller than io_min
> > "may incur delays in processing the command". And similarly, submitting
> > a command larger than io_opt "may incur delays in processing the
> > command".
> > 
> > IOW, the spec says "don't write less than an aligned multiple of the
> > stripe chunk size" and "don't write more than an aligned full
> > stripe". That leaves "aligned multiples of the stripe chunk size but
> > less than the full stripe width" unaccounted for. And I guess that's
> > what the bcache flag is trying to capture.
> 
> Maybe any time io_opt is provided then partial_stripes_expensive should be 
> flagged too and any code to the contrary should be removed?
> 
> Question: Does anyone have a reason to keep partial_stripes_expensive in 
> the kernel at all?
> 
> > SCSI doesn't go into details about RAID levels and other implementation
> > details which is why the wording is deliberately vague. But obviously
> > the expectation is that partial stripe writes are slower than full.
> > 
> > In my book any component in the stack that sees either io_min or io_opt
> > should try very hard to send I/Os that are aligned multiples of those
> > values. I am not opposed to letting users manually twiddle the
> > settings. But I do think that we should aim for the stack doing the
> > right thing when it sees io_opt reported on a device.
> 
> Agreed, thanks for the feedback!
> 
> -Eric
> 
> 
> > 
> > -- 
> > Martin K. Petersen	Oracle Linux Engineering
> > 
> 

  reply	other threads:[~2022-01-08 21:51 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <d3f7fd44-9287-c7fa-ee95-c3b8a4d56c93@suse.de>
2019-06-22 23:16 ` [PATCH] bcache: make stripe_size configurable and persistent for hardware raid5/6 Eric Wheeler
2019-06-22 23:16   ` Eric Wheeler
2019-06-23  0:41   ` Martin K. Petersen
2019-06-23  0:41     ` Martin K. Petersen
2019-06-24  6:57   ` Coly Li
2019-06-24  7:05   ` Coly Li
2019-06-24 18:14     ` Eric Wheeler
2019-06-24 23:24       ` Martin K. Petersen
2019-06-24 23:24         ` Martin K. Petersen
2019-06-26  0:23         ` Eric Wheeler
2019-06-26  0:23           ` Eric Wheeler
2019-06-26  2:50           ` Martin K. Petersen
2019-06-26  2:50             ` Martin K. Petersen
2019-06-25  1:59       ` Coly Li
2022-01-06  3:29         ` Eric Wheeler
2022-01-06 16:17           ` Coly Li
2022-01-08  0:21           ` Martin K. Petersen
2022-01-08  4:54             ` Eric Wheeler
2022-01-08 21:51               ` Eric Wheeler [this message]
2022-01-10 16:14                 ` Martin K. Petersen
2022-01-10 23:30                   ` Eric Wheeler
2022-01-11  2:55                     ` Martin K. Petersen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=98aa1886-859-abb9-164f-c9eb9be38a91@ewheeler.net \
    --to=bcache@lists.ewheeler.net \
    --cc=colyli@suse.de \
    --cc=corbet@lwn.net \
    --cc=kent.overstreet@gmail.com \
    --cc=linux-bcache@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.