All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bryan Gurney <bgurney@redhat.com>
To: Damien Le Moal <Damien.LeMoal@wdc.com>
Cc: Jens Axboe <axboe@kernel.dk>, Jeff Moyer <jmoyer@redhat.com>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>
Subject: Re: [PATCH 0/2][RFC] block: default to deadline for SMR devices
Date: Wed, 30 May 2018 09:45:51 -0400	[thread overview]
Message-ID: <CAHhmqcQC8JZh_kL6DMEcwQu69s0havYsJogUgzn6w0pFPZHTbw@mail.gmail.com> (raw)
In-Reply-To: <122bc1d9-a074-7282-6812-aba26f4868f4@wdc.com>

On Wed, May 30, 2018 at 2:22 AM, Damien Le Moal <Damien.LeMoal@wdc.com> wrote:
> Jens,
>
> On 5/26/18 13:01, Jens Axboe wrote:
>> On 5/25/18 4:18 PM, Jeff Moyer wrote:
>>> Hi, Jens,
>>>
>>> Jens Axboe <axboe@kernel.dk> writes:
>>>
>>>> On 5/25/18 3:14 PM, Jeff Moyer wrote:
>>>>> Bryan Gurney reported I/O errors when using dm-zoned with a host-managed
>>>>> SMR device.  It turns out he was using CFQ, which is the default.
>>>>> Unfortunately, as of v4.16, only the deadline schedulers work well with
>>>>> host-managed SMR devices.  This series aatempts to switch the elevator
>>>>> to deadline for those devices.
>>>>>
>>>>> NOTE: I'm not super happy with setting up one iosched and then
>>>>> immediately tearing it down.  I'm open to suggestions on better ways
>>>>> to accomplish this goal.
>>>>
>>>> Let's please not do this, a few years ago I finally managed to kill
>>>> drivers changing the scheduler manually. Why can't this go into a
>>>> udev (or similar) rule? That's where it belongs, imho.
>>>
>>> We could do that.  The downside is that distros will have to pick up
>>> udev rules, which they haven't done yet, and the udev rules will have to
>>> be kernel version dependent.  And then later, when this restriction is
>>> lifted, we'll have to update the udev rules.  That also sounds awful to
>>> me.
>>
>> They only have to be feature dependent, which isn't that hard. And if I
>> had to pick between a kernel upgrade and a udev rule package update, the
>> choice is pretty clear.
>
> Currently, a queue elevator is first initialized with a call to
> elevator_init() from either blk_mq_init_allocated_queue() or
> blk_init_allocated_queue(). When these are called, the device has not
> yet been probed and no information is available. Do you think it is
> reasonable to delay the default elevator bring-up to after some
> information about the device is obtained ?
>
> That would necessitate splitting elevator_init() into a generic elevator
> initialization function setting up the elevator related fields of the
> request queue and a second part setting up the default elevator (e.g.
> elevator_set_default()). Doing so, the function elevator_set_default()
> could be called later in the device initialization sequence, after
> information from the device has been obtained. It would make choosing a
> sane default elevator much cleaner.
>
> Granted, that is in effect very similar to what Jeff's patches do. But
> in a sense, the current code already consider some of the device
> characteristics when setting the default elevator. That is, if the
> device is mq or not. Would it be a stretch to add "is_zoned" as another
> characteristic that is looked at ?
>
> On another note, I noticed that the only two callers of elevator_init()
> are blk_mq_init_allocated_queue() and blk_init_allocated_queue(). Both
> call elevator_init() with a NULL name. I wonder the usefulness of that
> argument. Furthermore, elevator_init() is also exported but I do not see
> any external module user. Is this in purpose or is it the remnant of
> older use of it ? I can send a patch cleaning that up if you feel that
> needs so cleanup.
>
>>
>>> I understand why you don't like this patch set, but I happen to think
>>> the alternative is worse.  FYI, in Bryan's case, his system actually got
>>> bricked (likely due to buggy firmware).
>>
>> I disagree, I think the rule approach is much easier. If the wrong write
>> location bricked the drive, then I think that user has much larger
>> issues... That seems like a trivial issue that should have been caught
>> in basic testing, I would not trust that drive with any data if it
>> bricks that easily.
>
> I think that the bug in this particular case was with the BIOS (probably
> UEFI), not the disk. Disks only answer to trivial requests when probed
> on boot, unless the BIOS is too smart for its own good and doing very
> weird things. Different issue anyway, not related to the kernel.

Actually, the BIOS boot mode was configured to "Legacy" mode (i.e.:
non-UEFI) at the time of the bricking.  I'm pretty confident of this,
because after the BIOS reflash, I can see a different rendering mode
in the GRUB screen of Fedora 28, and when I checked back in the BIOS
setup utility, I saw that the default for the new BIOS image was UEFI.

On the test systems in general, I prefer to use legacy BIOS if I don't
need UEFI.  I'm not sure if this means that I walked into a latent
BIOS bug, though.


Thanks,

Bryan

  parent reply	other threads:[~2018-05-30 13:45 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-25 21:14 [PATCH 0/2][RFC] block: default to deadline for SMR devices Jeff Moyer
2018-05-25 21:14 ` [PATCH 1/2] block: __elevator_change: add try_loading parameter Jeff Moyer
2018-05-25 21:14 ` [PATCH 2/2] block: default to deadline for host-managed SMR devices Jeff Moyer
2018-05-25 22:04 ` [PATCH 0/2][RFC] block: default to deadline for " Jens Axboe
2018-05-25 22:18   ` Jeff Moyer
2018-05-26  4:01     ` Jens Axboe
2018-05-29 12:14       ` Bryan Gurney
2018-05-30  6:22       ` Damien Le Moal
2018-05-30  8:47         ` Christoph Hellwig
2018-05-31  5:36           ` Damien Le Moal
2018-05-30 13:45         ` Bryan Gurney [this message]
2018-05-30  8:49       ` Christoph Hellwig
2018-05-30 14:22         ` Jens Axboe
2018-05-30 15:06           ` Jeff Moyer
2018-05-30 15:17             ` Jens Axboe
2018-05-30 15:45               ` Jeff Moyer
2018-05-30 15:52                 ` Jens Axboe
2018-05-31  3:16                   ` Damien Le Moal
2018-05-31  3:04                 ` Damien Le Moal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAHhmqcQC8JZh_kL6DMEcwQu69s0havYsJogUgzn6w0pFPZHTbw@mail.gmail.com \
    --to=bgurney@redhat.com \
    --cc=Damien.LeMoal@wdc.com \
    --cc=axboe@kernel.dk \
    --cc=jmoyer@redhat.com \
    --cc=linux-block@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.