All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Wheeler <bcache@lists.ewheeler.net>
To: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Coly Li <colyli@suse.de>,
	linux-block@vger.kernel.org, Jonathan Corbet <corbet@lwn.net>,
	Kent Overstreet <kent.overstreet@gmail.com>,
	"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>,
	open list <linux-kernel@vger.kernel.org>,
	"open list:BCACHE (BLOCK LAYER CACHE)"
	<linux-bcache@vger.kernel.org>
Subject: Re: [PATCH] bcache: make stripe_size configurable and persistent for hardware raid5/6
Date: Fri, 7 Jan 2022 20:54:16 -0800 (PST)	[thread overview]
Message-ID: <c9abd220-6b7f-9299-48a1-a16d64981734@ewheeler.net> (raw)
In-Reply-To: <yq15yqvw1f0.fsf@ca-mkp.ca.oracle.com>

On Fri, 7 Jan 2022, Martin K. Petersen wrote:
> Eric,
> 
> > Even new new RAID controlers that _do_ provide `io_opt` still do _not_ 
> > indicate partial_stripes_expensive (which is an mdraid feature, but Martin 
> > please correct me if I'm wrong here).
> 
> partial_stripes_expensive is a bcache thing, I am not sure why it needs
> a separate flag. It is implied, although I guess one could argue that
> RAID0 is a special case since partial writes are not as painful as with
> parity RAID.

I'm guessing bcache used did some optimization for 
queue->limits.raid_partial_stripes_expensive because md raid5 code sets 
this flag.  At least when using Linux md as the RAID5 implementation it 
gets configured automatically:
   raid5.c:       mddev->queue->limits.raid_partial_stripes_expensive = 1;

https://elixir.bootlin.com/linux/latest/source/drivers/md/raid5.c#L7729

Interestingly only bcache uses it, but md does set it.

> The SCSI spec states that submitting an I/O that is smaller than io_min
> "may incur delays in processing the command". And similarly, submitting
> a command larger than io_opt "may incur delays in processing the
> command".
> 
> IOW, the spec says "don't write less than an aligned multiple of the
> stripe chunk size" and "don't write more than an aligned full
> stripe". That leaves "aligned multiples of the stripe chunk size but
> less than the full stripe width" unaccounted for. And I guess that's
> what the bcache flag is trying to capture.

Maybe any time io_opt is provided then partial_stripes_expensive should be 
flagged too and any code to the contrary should be removed?

Question: Does anyone have a reason to keep partial_stripes_expensive in 
the kernel at all?

> SCSI doesn't go into details about RAID levels and other implementation
> details which is why the wording is deliberately vague. But obviously
> the expectation is that partial stripe writes are slower than full.
> 
> In my book any component in the stack that sees either io_min or io_opt
> should try very hard to send I/Os that are aligned multiples of those
> values. I am not opposed to letting users manually twiddle the
> settings. But I do think that we should aim for the stack doing the
> right thing when it sees io_opt reported on a device.

Agreed, thanks for the feedback!

-Eric


> 
> -- 
> Martin K. Petersen	Oracle Linux Engineering
> 

  reply	other threads:[~2022-01-08  4:54 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <d3f7fd44-9287-c7fa-ee95-c3b8a4d56c93@suse.de>
2019-06-22 23:16 ` [PATCH] bcache: make stripe_size configurable and persistent for hardware raid5/6 Eric Wheeler
2019-06-22 23:16   ` Eric Wheeler
2019-06-23  0:41   ` Martin K. Petersen
2019-06-23  0:41     ` Martin K. Petersen
2019-06-24  6:57   ` Coly Li
2019-06-24  7:05   ` Coly Li
2019-06-24 18:14     ` Eric Wheeler
2019-06-24 23:24       ` Martin K. Petersen
2019-06-24 23:24         ` Martin K. Petersen
2019-06-26  0:23         ` Eric Wheeler
2019-06-26  0:23           ` Eric Wheeler
2019-06-26  2:50           ` Martin K. Petersen
2019-06-26  2:50             ` Martin K. Petersen
2019-06-25  1:59       ` Coly Li
2022-01-06  3:29         ` Eric Wheeler
2022-01-06 16:17           ` Coly Li
2022-01-08  0:21           ` Martin K. Petersen
2022-01-08  4:54             ` Eric Wheeler [this message]
2022-01-08 21:51               ` Eric Wheeler
2022-01-10 16:14                 ` Martin K. Petersen
2022-01-10 23:30                   ` Eric Wheeler
2022-01-11  2:55                     ` Martin K. Petersen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c9abd220-6b7f-9299-48a1-a16d64981734@ewheeler.net \
    --to=bcache@lists.ewheeler.net \
    --cc=colyli@suse.de \
    --cc=corbet@lwn.net \
    --cc=kent.overstreet@gmail.com \
    --cc=linux-bcache@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.