linux-kernel.vger.kernel.org archive mirror
* [PATCHSET/RFC] blk-mq scheduling framework
@ 2016-12-07 23:09 Jens Axboe
  2016-12-07 23:09 ` [PATCH 1/7] blk-mq: add blk_mq_start_stopped_hw_queue() Jens Axboe
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Jens Axboe @ 2016-12-07 23:09 UTC
  To: axboe, linux-block, linux-kernel; +Cc: paolo.valente, osandov

I hacked up some support for registering blk-mq capable IO schedulers,
and when that was done, I adapted deadline to work with it as a new
mq-deadline scheduler.

Basically this is similar to the legacy scheduling patches I posted
recently, in that we set up a list of fake requests (I called them
shadows) that the IO scheduler works on. We then transform each of
those into a real blk-mq request when we have to dispatch to hardware.
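
A minimal user-space sketch of the shadow idea described above, not the code from the patches: the scheduler holds lightweight shadow entries, and a real request is only filled in (by copying the shadow's fields) once a hardware tag is available at dispatch time. All names here (sketch_request, sketch_hw_queue, sketch_dispatch, SKETCH_NR_TAGS) are hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in; this is not the kernel's struct request. */
struct sketch_request {
	unsigned long sector;	/* start sector of the IO */
	unsigned int nr_bytes;	/* length of the IO */
	int tag;		/* hardware tag, -1 while unassigned */
};

#define SKETCH_NR_TAGS 2	/* assumed tiny tag space, for the example */

struct sketch_hw_queue {
	struct sketch_request rqs[SKETCH_NR_TAGS];	/* "real" requests, one per tag */
	int in_use[SKETCH_NR_TAGS];
};

/*
 * Turn a scheduler-owned shadow into a real request at dispatch time.
 * Returns NULL when no tag is free, in which case the shadow simply
 * stays with the scheduler and can be retried later.
 */
struct sketch_request *
sketch_dispatch(struct sketch_hw_queue *hq, const struct sketch_request *shadow)
{
	for (int tag = 0; tag < SKETCH_NR_TAGS; tag++) {
		if (hq->in_use[tag])
			continue;
		hq->in_use[tag] = 1;
		hq->rqs[tag] = *shadow;		/* copy the shadow's fields */
		hq->rqs[tag].tag = tag;
		return &hq->rqs[tag];
	}
	return NULL;
}
```

The shadow itself is left untouched by the copy, so in this sketch the scheduler can keep sorting and merging shadows without holding a hardware tag.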

There's a hack in place to make this work with the flush/fua
requests, as those bypass the regular software queue insertion.
For now we simply ensure that we allocate a real request for
those. However, I'd prefer if we simply inserted the request as
we usually would, and then start the flush state machinery when
we pull the request out of the queue. That would both be cleaner
from a flush perspective, and from the scheduling side as well.

I'm reusing the existing elevator interface, just augmenting
that with mq_ops and a ->uses_mq so we can tell which is which.
They show up automatically, for instance on a scsi-mq device:

$ cat /sys/block/sda/queue/scheduler 
[mq-deadline] none

vs just a legacy device:

$ cat /sys/block/nullb0/queue/scheduler 
noop deadline [cfq]

Changing schedulers is done in the same way as it always has been,
by echoing into the 'scheduler' file. For MQ, there's a 'none'
setting as well. It isn't a real scheduler; it simply turns off
scheduling. Handy for comparison.
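
As a rough illustration of how a ->uses_mq flag can decide which schedulers a queue lists, here is a user-space sketch. It is not the elevator.c code: the struct, the table and sketch_show_schedulers() are hypothetical, and showing an unscheduled MQ queue as "[none]" is an assumption rather than something taken from the patches.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an elevator type. */
struct sketch_elevator {
	const char *name;
	int uses_mq;	/* nonzero: blk-mq scheduler, zero: legacy scheduler */
};

const struct sketch_elevator sketch_elevators[] = {
	{ "noop", 0 }, { "deadline", 0 }, { "cfq", 0 }, { "mq-deadline", 1 },
};

/*
 * Format the scheduler list for a queue, bracketing the active entry.
 * Only schedulers whose uses_mq matches the queue type are listed; MQ
 * queues also get the 'none' pseudo-entry. Pass active == NULL when no
 * scheduler is attached. buf must have room for at least one byte.
 */
void sketch_show_schedulers(int queue_is_mq, const char *active,
			    char *buf, size_t len)
{
	size_t n = sizeof(sketch_elevators) / sizeof(sketch_elevators[0]);
	size_t off = 0;

	buf[0] = '\0';
	for (size_t i = 0; i < n && off < len; i++) {
		const struct sketch_elevator *e = &sketch_elevators[i];
		int hit;

		if (!e->uses_mq != !queue_is_mq)
			continue;	/* wrong kind of scheduler for this queue */
		hit = active && strcmp(active, e->name) == 0;
		off += snprintf(buf + off, len - off, "%s%s%s%s",
				off ? " " : "", hit ? "[" : "",
				e->name, hit ? "]" : "");
	}
	if (queue_is_mq && off < len)
		snprintf(buf + off, len - off, "%s%s", off ? " " : "",
			 active ? "none" : "[none]");
}
```

With this table, an MQ queue running mq-deadline formats as "[mq-deadline] none" and a legacy queue running cfq as "noop deadline [cfq]", matching the two sysfs outputs quoted above.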

Obviously a direct deadline adaptation has performance implications,
so it can be improved. A _real_ MQ scheduler is forthcoming, which
will sit on top of this interface. Paolo, for BFQ, this is the
interface you should target. Let me know if you have any questions
about how it works.


 block/Kconfig.iosched    |    6 
 block/Makefile           |    3 
 block/blk-core.c         |    3 
 block/blk-exec.c         |    3 
 block/blk-flush.c        |    7 
 block/blk-mq-sched.c     |  243 ++++++++++++++++++
 block/blk-mq-sched.h     |  168 ++++++++++++
 block/blk-mq-tag.c       |    1 
 block/blk-mq.c           |  241 ++++++++++--------
 block/blk-mq.h           |   33 --
 block/elevator.c         |  140 +++++++---
 block/mq-deadline.c      |  622 +++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/blk-mq.h   |    3 
 include/linux/elevator.h |   30 ++
 14 files changed, 1331 insertions(+), 172 deletions(-)


-- 
Jens Axboe

* [PATCHSET/RFC v2] blk-mq scheduling framework
@ 2016-12-08 20:13 Jens Axboe
  2016-12-08 20:13 ` [PATCH 5/7] blk-mq-sched: add framework for MQ capable IO schedulers Jens Axboe
  0 siblings, 1 reply; 19+ messages in thread
From: Jens Axboe @ 2016-12-08 20:13 UTC
  To: axboe, linux-block, linux-kernel; +Cc: paolo.valente, osandov

As a followup to this posting from yesterday:

https://marc.info/?l=linux-block&m=148115232806065&w=2

this is version 2. I wanted to post a new one fairly quickly, as there
ended up being a number of potential crashes in v1. This one should be
solid, I've run mq-deadline on both NVMe and regular rotating storage,
and we handle the various merging cases correctly.

You can download it from git as well:

git://git.kernel.dk/linux-block blk-mq-sched.2

Note that this is based on for-4.10/block, which is in turn based on
v4.9-rc1. I suggest pulling it into my for-next branch, which would
then merge nicely with 'master' as well.

Changes since v1:

- Add Kconfig entries to allow the user to choose what the default
  scheduler should be for blk-mq, and whether that depends on the
  number of hardware queues.

- Properly abstract the whole get/put of a request, so we can manage
  the lifetime properly.

- Enable full merging on mq-deadline (front/back, bio-to-rq, rq-to-rq).
  Has full feature parity with deadline now.

- Export necessary symbols for compiling mq-deadline as a module.

- Various API adjustments for the mq schedulers.

- Various cleanups and improvements.

- Fix a lot of bugs. A lot. Upgrade!

 block/Kconfig.iosched    |   37 ++
 block/Makefile           |    3 
 block/blk-core.c         |    9 
 block/blk-exec.c         |    3 
 block/blk-flush.c        |    7 
 block/blk-merge.c        |    3 
 block/blk-mq-sched.c     |  265 +++++++++++++++++++
 block/blk-mq-sched.h     |  188 +++++++++++++
 block/blk-mq-tag.c       |    1 
 block/blk-mq.c           |  254 ++++++++++--------
 block/blk-mq.h           |   35 +-
 block/elevator.c         |  194 ++++++++++----
 block/mq-deadline.c      |  647 +++++++++++++++++++++++++++++++++++++++++++++++
 drivers/nvme/host/pci.c  |    1 
 include/linux/blk-mq.h   |    4 
 include/linux/elevator.h |   34 ++
 16 files changed, 1495 insertions(+), 190 deletions(-)

* [PATCHSET v3] blk-mq scheduling framework
@ 2016-12-15  5:26 Jens Axboe
  2016-12-15  5:26 ` [PATCH 5/7] blk-mq-sched: add framework for MQ capable IO schedulers Jens Axboe
  0 siblings, 1 reply; 19+ messages in thread
From: Jens Axboe @ 2016-12-15  5:26 UTC
  To: axboe, linux-block, linux-kernel; +Cc: paolo.valente, osandov

This is version 3 of the blk-mq scheduling framework. Version 2
was posted here:

https://marc.info/?l=linux-block&m=148122805026762&w=2

It's fully stable. In fact I'm running it on my laptop [1]. That may
or may not have been part of a dare. In any case, it's been stable
on that too, and has survived lengthy testing on dedicated test
boxes.

[1] $ cat /sys/block/nvme0n1/queue/scheduler
[mq-deadline] none

I'm still mentally debating whether to shift this over to have
duplicate request tags, one for the scheduler and one for the issue
side. We run into various issues if we do that, but we also get
rid of the shadow request field copying. I think both approaches
have their downsides. I originally considered both, and thought that
the shadow request would potentially be the cleanest.

I've rebased this against Linus' master branch, since a bunch of
the prep patches are now in, and the general block changes are in
as well.

The patches can be pulled here:

git://git.kernel.dk/linux-block blk-mq-sched.3

Changes since v2:

- Fix the Kconfig single/multi queue sched entry. Suggested by Bart.

- Move the queue ref put into the failure path of the request getting,
  so the caller doesn't have to know about it. Suggested by Bart.

- Add support for IO context management. Needed for the BFQ port.

- Change the anonymous elevator ops union to a named one, since
  old (looking at you, gcc 4.4) compilers don't support named
  initialization of anon unions.

- Constify the blk_mq_ops structure pointers.

- Add generic merging code, so mq-deadline (and others) don't have to
  handle/duplicate that.

- Switched the dispatch hook to list based, so we can move more entries
  at a time, if we want/need to. From Omar.

- Add support for schedulers to continue using the software queues.
  From Omar.

- Ensure that it works with blk-wbt.

- Fix a failure case if we fail registering the MQ elevator. We'd
  fall back to trying noop, which we'd find, but that would not
  work for MQ devices. Fall back to 'none' instead.

- Verified queue ref management.

- Fixed a bunch of bugs, and added a bunch of cleanups.
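
Two of the items above (the named ops union and the fallback to 'none') can be sketched together in user space. This is an illustration under stated assumptions, not the code from the series: the struct layouts, the sketch_* names and the single dispatch hook are hypothetical, and a NULL return is used here to stand for 'none' on an MQ queue.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical op tables; real ones would carry many more hooks. */
struct sketch_sq_ops { int (*dispatch)(void); };
struct sketch_mq_ops { int (*dispatch)(void); };

struct sketch_elevator_type {
	/*
	 * Named union: '.ops.mq = ...' is a plain nested designator, so it
	 * also works on older compilers that cannot initialize members of
	 * an anonymous union by name.
	 */
	union {
		struct sketch_sq_ops sq;
		struct sketch_mq_ops mq;
	} ops;
	const char *name;
	int uses_mq;
};

int sketch_noop_dispatch(void) { return 0; }
int sketch_mq_deadline_dispatch(void) { return 1; }

const struct sketch_elevator_type sketch_types[] = {
	{ .ops.sq = { .dispatch = sketch_noop_dispatch },
	  .name = "noop", .uses_mq = 0 },
	{ .ops.mq = { .dispatch = sketch_mq_deadline_dispatch },
	  .name = "mq-deadline", .uses_mq = 1 },
};

/* Find a scheduler by name, considering only the matching queue type. */
const struct sketch_elevator_type *
sketch_find(const char *name, int queue_is_mq)
{
	size_t n = sizeof(sketch_types) / sizeof(sketch_types[0]);

	for (size_t i = 0; i < n; i++) {
		const struct sketch_elevator_type *t = &sketch_types[i];

		if (!t->uses_mq == !queue_is_mq && strcmp(t->name, name) == 0)
			return t;
	}
	return NULL;
}

/*
 * Pick a scheduler with a fallback. Legacy queues fall back to noop;
 * MQ queues never do, and a NULL return there means 'none'.
 */
const struct sketch_elevator_type *
sketch_pick(const char *wanted, int queue_is_mq)
{
	const struct sketch_elevator_type *t = sketch_find(wanted, queue_is_mq);

	if (!t && !queue_is_mq)
		t = sketch_find("noop", 0);
	return t;
}
```

Filtering on uses_mq inside the lookup is one way to make sure an MQ queue can never end up with the legacy noop, which is the failure case the changelog entry describes.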

 block/Kconfig.iosched    |   37 ++
 block/Makefile           |    3 
 block/blk-core.c         |   23 -
 block/blk-exec.c         |    3 
 block/blk-flush.c        |    7 
 block/blk-ioc.c          |    8 
 block/blk-merge.c        |    4 
 block/blk-mq-sched.c     |  394 +++++++++++++++++++++++++++++
 block/blk-mq-sched.h     |  192 ++++++++++++++
 block/blk-mq-tag.c       |    1 
 block/blk-mq.c           |  226 +++++++---------
 block/blk-mq.h           |   28 ++
 block/blk.h              |   26 +
 block/cfq-iosched.c      |    2 
 block/deadline-iosched.c |    2 
 block/elevator.c         |  229 ++++++++++++----
 block/mq-deadline.c      |  638 +++++++++++++++++++++++++++++++++++++++++++++++
 block/noop-iosched.c     |    2 
 drivers/nvme/host/pci.c  |    1 
 include/linux/blk-mq.h   |    6 
 include/linux/blkdev.h   |    2 
 include/linux/elevator.h |   33 ++
 22 files changed, 1635 insertions(+), 232 deletions(-)

-- 
Jens Axboe



Thread overview: 19+ messages
2016-12-07 23:09 [PATCHSET/RFC] blk-mq scheduling framework Jens Axboe
2016-12-07 23:09 ` [PATCH 1/7] blk-mq: add blk_mq_start_stopped_hw_queue() Jens Axboe
2016-12-07 23:09 ` [PATCH 2/7] blk-mq: abstract out blk_mq_dispatch_rq_list() helper Jens Axboe
2016-12-07 23:09 ` [PATCH 3/7] elevator: make the rqhash helpers exported Jens Axboe
2016-12-07 23:09 ` [PATCH 4/7] blk-flush: run the queue when inserting blk-mq flush Jens Axboe
2016-12-07 23:09 ` [PATCH 5/7] blk-mq-sched: add framework for MQ capable IO schedulers Jens Axboe
2016-12-07 23:10 ` [PATCH 6/7] " Jens Axboe
2016-12-07 23:10 ` [PATCH 7/7] mq-deadline: add blk-mq adaptation of the deadline IO scheduler Jens Axboe
2016-12-08 20:13 [PATCHSET/RFC v2] blk-mq scheduling framework Jens Axboe
2016-12-08 20:13 ` [PATCH 5/7] blk-mq-sched: add framework for MQ capable IO schedulers Jens Axboe
2016-12-13 13:56   ` Bart Van Assche
2016-12-13 15:14     ` Jens Axboe
2016-12-14 10:31       ` Bart Van Assche
2016-12-14 15:05         ` Jens Axboe
2016-12-13 14:29   ` Bart Van Assche
2016-12-13 15:20     ` Jens Axboe
2016-12-15  5:26 [PATCHSET v3] blk-mq scheduling framework Jens Axboe
2016-12-15  5:26 ` [PATCH 5/7] blk-mq-sched: add framework for MQ capable IO schedulers Jens Axboe
2016-12-15 19:29   ` Omar Sandoval
2016-12-15 20:14     ` Jens Axboe
2016-12-15 21:44     ` Jens Axboe
