linux-kernel.vger.kernel.org archive mirror
From: "Martin K. Petersen" <martin.petersen@oracle.com>
To: Eric Wheeler <bcache@lists.ewheeler.net>
Cc: Coly Li <colyli@suse.de>,
	linux-block@vger.kernel.org, Jonathan Corbet <corbet@lwn.net>,
	Kent Overstreet <kent.overstreet@gmail.com>,
	"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>,
	open list <linux-kernel@vger.kernel.org>,
	"open list:BCACHE (BLOCK LAYER CACHE)"
	<linux-bcache@vger.kernel.org>,
	"Martin K. Petersen" <martin.petersen@oracle.com>
Subject: Re: [PATCH] bcache: make stripe_size configurable and persistent for hardware raid5/6
Date: Fri, 07 Jan 2022 19:21:28 -0500
Message-ID: <yq15yqvw1f0.fsf@ca-mkp.ca.oracle.com>
In-Reply-To: <fdb85dc1-eee6-e55e-8e9c-fa1f36b4a37@ewheeler.net> (Eric Wheeler's message of "Wed, 5 Jan 2022 19:29:05 -0800 (PST)")


Eric,

> Even new RAID controllers that _do_ provide `io_opt` still do _not_
> indicate partial_stripes_expensive (which is an mdraid feature, but Martin
> please correct me if I'm wrong here).

partial_stripes_expensive is a bcache thing; I am not sure why it needs
a separate flag. It is implied, although I guess one could argue that
RAID0 is a special case since partial writes are not as painful as with
parity RAID.

The SCSI spec states that submitting an I/O that is smaller than io_min
"may incur delays in processing the command". And similarly, submitting
a command larger than io_opt "may incur delays in processing the
command".

IOW, the spec says "don't write less than an aligned multiple of the
stripe chunk size" and "don't write more than an aligned full
stripe". That leaves "aligned multiples of the stripe chunk size but
less than the full stripe width" unaccounted for. And I guess that's
what the bcache flag is trying to capture.
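
To make the three cases concrete, here is a minimal sketch in Python. It
is not bcache or SCSI code: the function name, the category labels and
the example geometry (64 KiB chunks, 4 data disks) are made up for
illustration, and it assumes io_min is the chunk size and io_opt the
full stripe width, both in bytes.

```python
def classify_write(offset, length, io_min, io_opt):
    """Classify a write against the chunk size (io_min) and the
    full stripe width (io_opt). All values are in bytes.

    The labels are hypothetical; they only mirror the cases above.
    """
    if io_min <= 0 or io_opt <= 0 or io_opt % io_min != 0:
        raise ValueError("io_opt must be a positive multiple of io_min")
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length > 0")

    # Not an aligned multiple of the chunk size: the io_min case.
    if offset % io_min != 0 or length % io_min != 0:
        return "unaligned"
    # More than one full stripe in a single command: the io_opt case.
    if length > io_opt:
        return "larger-than-io_opt"
    # Exactly one stripe, starting on a stripe boundary.
    if length == io_opt and offset % io_opt == 0:
        return "full-stripe"
    # Chunk-aligned but not a whole aligned stripe: the case the
    # spec leaves unaccounted for.
    return "partial-stripe"


if __name__ == "__main__":
    KIB = 1024
    io_min, io_opt = 64 * KIB, 256 * KIB  # assumed: 4 data disks
    for off, length in [(0, 4 * KIB), (64 * KIB, 128 * KIB),
                        (0, 256 * KIB), (0, 512 * KIB)]:
        print(off, length, classify_write(off, length, io_min, io_opt))
```

The "partial-stripe" result is the one that neither io_min nor io_opt
speaks to directly.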

SCSI doesn't go into RAID levels and other implementation details,
which is why the wording is deliberately vague. But obviously the
expectation is that partial stripe writes are slower than full stripe
writes.

In my book any component in the stack that sees either io_min or io_opt
should try very hard to send I/Os that are aligned multiples of those
values. I am not opposed to letting users manually twiddle the
settings. But I do think that we should aim for the stack doing the
right thing when it sees io_opt reported on a device.
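
As a rough userspace illustration of "doing the right thing", the sketch
below reads the two hints from the block queue's sysfs attributes
(minimum_io_size and optimal_io_size) and splits an extent so that no
piece crosses a stripe boundary. The helper names and the sysfs_root
parameter are hypothetical, and this is not how bcache or the block
layer implements it.

```python
from pathlib import Path


def read_io_hints(dev, sysfs_root="/sys/block"):
    """Return (io_min, io_opt) in bytes for a block device name.

    optimal_io_size reads as 0 when the device reports no io_opt.
    """
    queue = Path(sysfs_root) / dev / "queue"
    io_min = int((queue / "minimum_io_size").read_text())
    io_opt = int((queue / "optimal_io_size").read_text())
    return io_min, io_opt


def split_on_stripes(offset, length, io_opt):
    """Split [offset, offset + length) at multiples of io_opt.

    Returns a list of (offset, length) pieces. Interior pieces are
    whole aligned stripes; only the first and last may be partial.
    """
    if io_opt <= 0:
        raise ValueError("io_opt must be positive (0 means not reported)")
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")

    pieces = []
    end = offset + length
    while offset < end:
        # Next stripe boundary strictly after the current offset.
        boundary = (offset // io_opt + 1) * io_opt
        nxt = min(boundary, end)
        pieces.append((offset, nxt - offset))
        offset = nxt
    return pieces
```

A caller would typically fall back to unsplit I/O when read_io_hints()
returns 0 for io_opt, since there is then no stripe width to align to.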

-- 
Martin K. Petersen	Oracle Linux Engineering
