All of lore.kernel.org
 help / color / mirror / Atom feed
From: Niklas Cassel <Niklas.Cassel@wdc.com>
To: Bart Van Assche <bvanassche@acm.org>
Cc: Jens Axboe <axboe@kernel.dk>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	Christoph Hellwig <hch@lst.de>, Jaegeuk Kim <jaegeuk@kernel.org>,
	Adam Manzanares <a.manzanares@samsung.com>,
	Damien Le Moal <Damien.LeMoal@wdc.com>,
	Hannes Reinecke <hare@suse.de>, Ming Lei <ming.lei@redhat.com>,
	Johannes Thumshirn <Johannes.Thumshirn@wdc.com>,
	Himanshu Madhani <himanshu.madhani@oracle.com>
Subject: Re: [PATCH v3 16/16] block/mq-deadline: Prioritize high-priority requests
Date: Tue, 24 Aug 2021 21:33:58 +0000	[thread overview]
Message-ID: <YSVlxRutpVkfo7W/@x1-carbon> (raw)
In-Reply-To: <b6131780-8b67-8efb-a942-e40b68df082e@acm.org>

On Mon, Aug 23, 2021 at 10:15:39AM -0700, Bart Van Assche wrote:
> On 8/23/21 12:36 AM, Niklas Cassel wrote:
> > I was mainly thinking that it should be possible to do a generic fix,
> > such that we eventually won't need a similar fix as yours in all the
> > different I/O schedulers.
> 
> Coming up with a generic fix would be great but I have not yet found an
> elegant approach ...
> 
> Another question is what the impact is of scheduler bypass on zoned block
> devices? Is the zone locking performed by the mq-deadline scheduler for
> writes to zoned block devices compatible with I/O scheduler bypass?

If anyone is curious of how the stack trace looks:

#0  dd_finish_request (rq=0xffff8881051b0000) at block/mq-deadline.c:790
#1  0xffffffff81741fcf in blk_mq_free_request (rq=rq@entry=0xffff8881051b0000)
    at block/blk-mq.c:516
#2  0xffffffff8172e4fa in blk_put_request (req=req@entry=0xffff8881051b0000)
    at block/blk-core.c:644
#3  0xffffffff819c51c2 in __scsi_execute (sdev=0xffff888106064000,
    cmd=cmd@entry=0xffffc900002c78d8 "", data_direction=data_direction@entry=3,
    buffer=buffer@entry=0x0 <fixed_percpu_data>, bufflen=bufflen@entry=0,
    sense=sense@entry=0x0 <fixed_percpu_data>, sshdr=0xffffc900002c7878, timeout=30000, retries=5,
    flags=0, rq_flags=0, resid=0x0 <fixed_percpu_data>) at drivers/scsi/scsi_lib.c:260
#4  0xffffffff819e12b6 in scsi_execute_req (resid=0x0 <fixed_percpu_data>, retries=5,
    timeout=30000, sshdr=0xffffc900002c7878, bufflen=0, buffer=0x0 <fixed_percpu_data>,
    data_direction=3, cmd=0xffffc900002c78d8 "", sdev=<optimized out>)
    at ./include/scsi/scsi_device.h:463
#5  sd_spinup_disk (sdkp=<optimized out>) at drivers/scsi/sd.c:2177
#6  sd_revalidate_disk (disk=<optimized out>) at drivers/scsi/sd.c:3302
#7  0xffffffff819e479d in sd_open (bdev=0xffff888102b45800, mode=1) at drivers/scsi/sd.c:1443
#8  0xffffffff81422285 in blkdev_get_whole (bdev=bdev@entry=0xffff888102b45800, mode=mode@entry=1)
    at fs/block_dev.c:1253
#9  0xffffffff8142348a in blkdev_get_by_dev (dev=<optimized out>, mode=mode@entry=1,
    holder=holder@entry=0x0 <fixed_percpu_data>) at fs/block_dev.c:1417
#10 0xffffffff8175632c in disk_scan_partitions (disk=0xffff88810514dc00) at block/genhd.c:388
#11 register_disk (groups=<optimized out>, disk=0xffff88810514dc00, parent=0xffff888106064268)
    at block/genhd.c:435
#12 __device_add_disk (parent=parent@entry=0xffff888106064268, disk=disk@entry=0xffff88810514dc00,
    groups=groups@entry=0x0 <fixed_percpu_data>, register_queue=register_queue@entry=true)
    at block/genhd.c:527
#13 0xffffffff8175640f in device_add_disk (parent=parent@entry=0xffff888106064268,
    disk=disk@entry=0xffff88810514dc00, groups=groups@entry=0x0 <fixed_percpu_data>)
    at block/genhd.c:548
#14 0xffffffff819e4d0f in sd_probe (dev=0xffff888106064268) at drivers/scsi/sd.c:3581


This stack trace is simply the first one, it can be traced back to
sd_revalidate_disk(). All the other dd_finish_request() calls (which doesn't
have a matching insert) also originate from sd_revalidate_disk().

Like we suspected, this is because of scheduler bypass.

E.g.
sd_revalidate_disk() -> read_capacity_16() -> __scsi_execute() ->
blk_execute_rq() -> blk_execute_rq_nowait() -> blk_mq_sched_insert_request()
-> blk_mq_sched_bypass_insert() -> blk_mq_request_bypass_insert()

__scsi_execute() sets req op to REQ_OP_DRV_OUT or REQ_OP_DRV_IN.

blk_mq_sched_insert_request() bypasses the scheduler when
blk_mq_sched_bypass_insert() returns true, which it does if
blk_rq_is_passthrough().
blk_rq_is_passthrough() returns true if req op is REQ_OP_DRV_OUT
or REQ_OP_DRV_IN.

Basically __scsi_execute() is the equivalent of __nvme_submit_sync_cmd(),
but with a worse name :)


"Is the zone locking performed by the mq-deadline scheduler for
writes to zoned block devices compatible with I/O scheduler bypass?"

Since sd_revalidate_disk() doesn't do any writes, everything is fine.
(And like Damien said, if any kernel code did passthrough writes,
we would have seen errors from the drive a long time ago.)

Yes, a user submitting passthrough writes can of course do "bad things",
but that is expected :)


Kind regards,
Niklas

  parent reply	other threads:[~2021-08-24 21:34 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-18  0:44 [PATCH v3 00/16] Improve I/O priority support Bart Van Assche
2021-06-18  0:44 ` [PATCH v3 01/16] block/Kconfig: Make the BLK_WBT and BLK_WBT_MQ entries consecutive Bart Van Assche
2021-06-18 17:09   ` Adam Manzanares
2021-06-18 19:49   ` Himanshu Madhani
2021-06-18  0:44 ` [PATCH v3 02/16] block/blk-cgroup: Swap the blk_throtl_init() and blk_iolatency_init() calls Bart Van Assche
2021-06-18 17:15   ` Adam Manzanares
2021-06-18 19:49   ` Himanshu Madhani
2021-06-21 14:24   ` Tejun Heo
2021-06-18  0:44 ` [PATCH v3 03/16] block/blk-rq-qos: Move a function from a header file into a C file Bart Van Assche
2021-06-18 17:22   ` Adam Manzanares
2021-06-18 19:49   ` Himanshu Madhani
2021-06-18  0:44 ` [PATCH v3 04/16] block: Introduce the ioprio rq-qos policy Bart Van Assche
2021-06-18 22:02   ` Adam Manzanares
2021-06-18  0:44 ` [PATCH v3 05/16] block/mq-deadline: Add several comments Bart Van Assche
2021-06-18 22:06   ` Adam Manzanares
2021-06-18  0:44 ` [PATCH v3 06/16] block/mq-deadline: Add two lockdep_assert_held() statements Bart Van Assche
2021-06-18 22:09   ` Adam Manzanares
2021-06-18  0:44 ` [PATCH v3 07/16] block/mq-deadline: Remove two local variables Bart Van Assche
2021-06-18 22:16   ` Adam Manzanares
2021-06-18  0:44 ` [PATCH v3 08/16] block/mq-deadline: Rename dd_init_queue() and dd_exit_queue() Bart Van Assche
2021-06-18 22:18   ` Adam Manzanares
2021-06-18  0:44 ` [PATCH v3 09/16] block/mq-deadline: Improve compile-time argument checking Bart Van Assche
2021-06-18 22:30   ` Adam Manzanares
2021-06-18  0:44 ` [PATCH v3 10/16] block/mq-deadline: Improve the sysfs show and store macros Bart Van Assche
2021-06-18 23:07   ` Adam Manzanares
2021-06-18  0:44 ` [PATCH v3 11/16] block/mq-deadline: Reserve 25% of scheduler tags for synchronous requests Bart Van Assche
2021-06-18  0:44 ` [PATCH v3 12/16] block/mq-deadline: Micro-optimize the batching algorithm Bart Van Assche
2021-06-18  0:44 ` [PATCH v3 13/16] block/mq-deadline: Add I/O priority support Bart Van Assche
2021-06-18  0:44 ` [PATCH v3 14/16] block/mq-deadline: Track I/O statistics Bart Van Assche
2021-06-18  0:44 ` [PATCH v3 15/16] block/mq-deadline: Add cgroup support Bart Van Assche
2021-06-18  0:44 ` [PATCH v3 16/16] block/mq-deadline: Prioritize high-priority requests Bart Van Assche
2021-08-20  0:45   ` Niklas Cassel
2021-08-20 18:04     ` Bart Van Assche
2021-08-20 23:05       ` Niklas Cassel
2021-08-20 23:38         ` Bart Van Assche
2021-08-23  7:36           ` Niklas Cassel
2021-08-23 17:15             ` Bart Van Assche
2021-08-23 23:01               ` Damien Le Moal
2021-08-24 21:33               ` Niklas Cassel [this message]
2021-06-21 16:06 ` [PATCH v3 00/16] Improve I/O priority support Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YSVlxRutpVkfo7W/@x1-carbon \
    --to=niklas.cassel@wdc.com \
    --cc=Damien.LeMoal@wdc.com \
    --cc=Johannes.Thumshirn@wdc.com \
    --cc=a.manzanares@samsung.com \
    --cc=axboe@kernel.dk \
    --cc=bvanassche@acm.org \
    --cc=hare@suse.de \
    --cc=hch@lst.de \
    --cc=himanshu.madhani@oracle.com \
    --cc=jaegeuk@kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=ming.lei@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.