* [PATCH V2 00/20] blk-mq-sched: improve SCSI-MQ performance
@ 2017-08-05  6:56 Ming Lei
  2017-08-05  6:56 ` [PATCH V2 01/20] blk-mq-sched: fix scheduler bad performance Ming Lei
                   ` (23 more replies)
  0 siblings, 24 replies; 84+ messages in thread
From: Ming Lei @ 2017-08-05  6:56 UTC
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, Laurence Oberman, Ming Lei

In Red Hat's internal storage tests of the blk-mq scheduler, we
found that I/O performance is very poor with mq-deadline, especially
for sequential I/O on some multi-queue SCSI devices (lpfc, qla2xxx,
SRP...).

It turns out that one big issue causes the performance regression:
requests are still dequeued from the sw queue/scheduler queue even when
the LLD's queue is busy, so I/O merging becomes quite difficult, and
sequential I/O degrades a lot.

The first five patches improve this situation and recover
some of the lost performance.

But it looks like they are still not enough, because of
the queue depth shared among all hw queues. For SCSI devices,
.cmd_per_lun defines the max number of pending I/Os on one
request queue, which is a per-request_queue depth. So during
dispatch, if one hctx is too busy to move on, no other hctx can
dispatch either, because of the per-request_queue depth.
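
To illustrate the shared-depth effect, here is a minimal userspace
sketch. It is not kernel code: struct toy_queue and the toy_* helpers
are hypothetical names, and a single counter stands in for the
per-request_queue limit that every hw queue has to respect.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one depth budget shared by every hw queue. */
struct toy_queue {
	unsigned int depth;     /* stands in for the per-request_queue limit */
	unsigned int in_flight; /* requests currently owned by the "driver" */
};

/* Any hctx must take one unit of budget before issuing a request. */
static bool toy_get_budget(struct toy_queue *q)
{
	assert(q->in_flight <= q->depth);
	if (q->in_flight == q->depth)
		return false; /* busy for every hw queue, not just the caller */
	q->in_flight++;
	return true;
}

/* Called when a request completes. */
static void toy_put_budget(struct toy_queue *q)
{
	assert(q->in_flight > 0);
	q->in_flight--;
}

/* Issue up to 'want' requests on behalf of one hctx; return how many went out. */
static unsigned int toy_dispatch(struct toy_queue *q, unsigned int want)
{
	unsigned int sent = 0;

	while (sent < want && toy_get_budget(q))
		sent++;
	return sent;
}
```

With depth 3, a first hctx asking for 5 requests gets 3 out, and a
second hctx then gets none until toy_put_budget() is called, no matter
how idle that second hctx is.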

Patches 6~14 use a per-request_queue dispatch list to avoid
dequeuing requests from the sw/scheduler queue when the LLD queue
is busy.
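
The ordering this implies can be sketched as below. This is a toy model
under stated assumptions, not the patch code: struct toy_rq_queue,
toy_pop() and toy_run() are made-up names, the lists are small arrays,
and "driver_free" simply counts how many requests the driver would
accept right now.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX 16

/* Small FIFO of request ids (hypothetical stand-in for a list_head). */
struct toy_list {
	int rq[TOY_MAX];
	size_t n;
};

struct toy_rq_queue {
	struct toy_list dispatch; /* requests left over from an earlier busy run */
	struct toy_list sw;       /* requests still in the sw queue, still mergeable */
	unsigned int driver_free; /* slots the driver can accept right now */
};

/* Remove and return the oldest entry. */
static int toy_pop(struct toy_list *l)
{
	assert(l->n > 0);
	int rq = l->rq[0];

	for (size_t i = 1; i < l->n; i++)
		l->rq[i - 1] = l->rq[i];
	l->n--;
	return rq;
}

/* One queue run; returns the number of requests handed to the driver. */
static size_t toy_run(struct toy_rq_queue *q)
{
	size_t sent = 0;

	/* 1. Flush leftovers from the dispatch list first. */
	while (q->dispatch.n && q->driver_free) {
		(void)toy_pop(&q->dispatch);
		q->driver_free--;
		sent++;
	}

	/* 2. Leftovers remain, so the driver is busy: leave the sw queue alone. */
	if (q->dispatch.n)
		return sent;

	/* 3. Dequeue one request at a time while the driver still has room. */
	while (q->sw.n && q->driver_free) {
		(void)toy_pop(&q->sw);
		q->driver_free--;
		sent++;
	}
	return sent;
}
```

Requests that stay in the sw queue during step 2 remain candidates for
merging with incoming bios, which is the point of not dequeuing them
early.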

Patches 15~20 improve bio merging via a hash table in the sw queue,
which makes bio merging more efficient than the current approach,
in which only the last 8 requests are checked. Since patches
6~14 convert to the scheduler's way of dequeuing one request
from the sw queue at a time for SCSI devices, ctx->lock is
acquired more often; merging bios via the hash table
decreases the hold time of ctx->lock and should eliminate
the effect of patch 14.
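
As a rough illustration of why a hash lookup beats scanning the last 8
requests, here is a small userspace sketch of a back-merge lookup keyed
on each request's end sector. All names (toy_rq, toy_hash,
toy_back_merge, ...) are hypothetical and the table is a plain chained
array; it does not reproduce the rqhash helpers from this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BUCKETS 64 /* must be a power of two */

struct toy_rq {
	uint64_t pos;        /* first sector */
	uint32_t sectors;    /* length in sectors */
	struct toy_rq *next; /* bucket chain */
};

struct toy_hash {
	struct toy_rq *bucket[TOY_BUCKETS];
};

/* Hash key: the sector right after the request, where a back merge would start. */
static uint64_t toy_end(const struct toy_rq *rq)
{
	return rq->pos + rq->sectors;
}

static size_t toy_slot(uint64_t key)
{
	return (size_t)(key & (TOY_BUCKETS - 1));
}

static void toy_hash_add(struct toy_hash *h, struct toy_rq *rq)
{
	size_t s = toy_slot(toy_end(rq));

	rq->next = h->bucket[s];
	h->bucket[s] = rq;
}

static void toy_hash_del(struct toy_hash *h, struct toy_rq *rq)
{
	struct toy_rq **pp = &h->bucket[toy_slot(toy_end(rq))];

	while (*pp && *pp != rq)
		pp = &(*pp)->next;
	assert(*pp == rq); /* caller must only delete hashed requests */
	*pp = rq->next;
}

/* Back-merge a bio starting at bio_pos; returns the grown request or NULL. */
static struct toy_rq *toy_back_merge(struct toy_hash *h, uint64_t bio_pos,
				     uint32_t bio_sectors)
{
	struct toy_rq *rq = h->bucket[toy_slot(bio_pos)];

	while (rq && toy_end(rq) != bio_pos)
		rq = rq->next;
	if (!rq)
		return NULL;

	toy_hash_del(h, rq); /* the key changes once the request grows */
	rq->sectors += bio_sectors;
	toy_hash_add(h, rq);
	return rq;
}
```

The lookup cost depends on the bucket chain length rather than on how
many requests are pending, so the lock protecting the table can be held
for a shorter time than a linear scan would need.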

With these changes, SCSI-MQ sequential I/O performance is
much improved. For lpfc, it is basically brought back to the level
of the legacy block path[1]; in particular, mq-deadline
is improved by > 10X [1] on lpfc and by > 3X on SCSI SRP.
For mq-none, it is improved by 10% on lpfc, and write is
improved by > 10% on SRP too.

Bart was also worried that this patchset may affect SRP, so
test data on SCSI SRP is provided this time:

- fio(libaio, bs:4k, dio, queue_depth:64, 64 jobs)
- system(16 cores, dual sockets, mem: 96G)

              |v4.13-rc3     |v4.13-rc3     |v4.13-rc3+patches |
              |blk-legacy dd |blk-mq none   |blk-mq none       |
----------------------------------------------------------------
read     :iops|         587K |         526K |             537K |
randread :iops|         115K |         140K |             139K |
write    :iops|         596K |         519K |             602K |
randwrite:iops|         103K |         122K |             120K |


              |v4.13-rc3     |v4.13-rc3     |v4.13-rc3+patches |
              |blk-legacy dd |blk-mq dd     |blk-mq dd         |
----------------------------------------------------------------
read     :iops|         587K |         155K |             522K |
randread :iops|         115K |         140K |             141K |
write    :iops|         596K |         135K |             587K |
randwrite:iops|         103K |         120K |             118K |
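
For anyone who wants to check the ratios against the tables, the
arithmetic can be sketched as follows. The helper names are made up for
this note; the inputs are the IOPS figures (in K) from the tables above.

```c
#include <assert.h>

/* Ratio of patched to unpatched throughput, e.g. 2.0 means twice as fast. */
static double toy_speedup(double before_kiops, double after_kiops)
{
	assert(before_kiops > 0.0);
	return after_kiops / before_kiops;
}

/* Same comparison expressed as a percentage gain. */
static double toy_gain_percent(double before_kiops, double after_kiops)
{
	return (toy_speedup(before_kiops, after_kiops) - 1.0) * 100.0;
}
```

For blk-mq dd on SRP, toy_speedup(155, 522) is roughly 3.4 for
sequential read and toy_speedup(135, 587) is roughly 4.3 for sequential
write. For blk-mq none, toy_gain_percent(519, 602) is roughly 16 for
sequential write.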

V2:
	- dequeue requests from sw queues in round-robin style,
	as suggested by Bart, and introduce one helper in sbitmap
	for this purpose
	- improve bio merge via hash table from sw queue
	- add comments about using the DISPATCH_BUSY state in a lockless
	way, simplifying the handling of the busy state
	- hold ctx->lock when clearing the ctx busy bit, as suggested
	by Bart
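
The round-robin dequeue in the first item can be pictured with the
sketch below. It is only an illustration: it walks a plain 64-bit word
instead of a struct sbitmap, and toy_for_each_set_from() and
toy_dequeue_rr() are hypothetical names, not the signature of the
helper added in patch 02.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef bool (*toy_bit_fn)(unsigned int bit, void *data);

/*
 * Visit each set bit of 'map' once, beginning at 'start' and wrapping
 * around. Iteration stops early when the callback returns false.
 */
static void toy_for_each_set_from(uint64_t map, unsigned int nbits,
				  unsigned int start, toy_bit_fn fn, void *data)
{
	assert(nbits >= 1 && nbits <= 64 && start < nbits);

	for (unsigned int i = 0; i < nbits; i++) {
		unsigned int bit = (start + i) % nbits;

		if (((map >> bit) & 1u) && !fn(bit, data))
			return;
	}
}

/* Callback: remember the first busy ctx and stop. */
static bool toy_pick_first(unsigned int bit, void *data)
{
	*(int *)data = (int)bit;
	return false;
}

/*
 * Pick the next busy sw queue at or after *next and advance *next past
 * it, so the following call starts from a different ctx.
 * Returns the chosen index, or -1 if no bit is set.
 */
static int toy_dequeue_rr(uint64_t busy, unsigned int nbits, unsigned int *next)
{
	int picked = -1;

	toy_for_each_set_from(busy, nbits, *next, toy_pick_first, &picked);
	if (picked >= 0)
		*next = ((unsigned int)picked + 1) % nbits;
	return picked;
}
```

Starting each scan where the previous one stopped keeps a single busy
ctx from being served repeatedly while the others wait.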


[1] http://marc.info/?l=linux-block&m=150151989915776&w=2

Ming Lei (20):
  blk-mq-sched: fix scheduler bad performance
  sbitmap: introduce __sbitmap_for_each_set()
  blk-mq: introduce blk_mq_dispatch_rq_from_ctx()
  blk-mq-sched: move actual dispatching into one helper
  blk-mq-sched: improve dispatching from sw queue
  blk-mq-sched: don't dequeue request until all in ->dispatch are
    flushed
  blk-mq-sched: introduce blk_mq_sched_queue_depth()
  blk-mq-sched: use q->queue_depth as hint for q->nr_requests
  blk-mq: introduce BLK_MQ_F_SHARED_DEPTH
  blk-mq-sched: introduce helpers for query, change busy state
  blk-mq: introduce helpers for operating ->dispatch list
  blk-mq: introduce pointers to dispatch lock & list
  blk-mq: pass 'request_queue *' to several helpers of operating BUSY
  blk-mq-sched: improve IO scheduling on SCSI devcie
  block: introduce rqhash helpers
  block: move actual bio merge code into __elv_merge
  block: add check on elevator for supporting bio merge via hashtable
    from blk-mq sw queue
  block: introduce .last_merge and .hash to blk_mq_ctx
  blk-mq-sched: refactor blk_mq_sched_try_merge()
  blk-mq: improve bio merge from blk-mq sw queue

 block/blk-mq-debugfs.c  |  12 ++--
 block/blk-mq-sched.c    | 187 +++++++++++++++++++++++++++++-------------------
 block/blk-mq-sched.h    |  23 ++++++
 block/blk-mq.c          | 133 +++++++++++++++++++++++++++++++---
 block/blk-mq.h          |  73 +++++++++++++++++++
 block/blk-settings.c    |   2 +
 block/blk.h             |  55 ++++++++++++++
 block/elevator.c        |  93 ++++++++++++++----------
 include/linux/blk-mq.h  |   5 ++
 include/linux/blkdev.h  |   5 ++
 include/linux/sbitmap.h |  54 ++++++++++----
 11 files changed, 504 insertions(+), 138 deletions(-)

-- 
2.9.4


end of thread, other threads:[~2017-08-26  8:53 UTC | newest]

Thread overview: 84+ messages
2017-08-05  6:56 [PATCH V2 00/20] blk-mq-sched: improve SCSI-MQ performance Ming Lei
2017-08-05  6:56 ` [PATCH V2 01/20] blk-mq-sched: fix scheduler bad performance Ming Lei
2017-08-09  0:11   ` Omar Sandoval
2017-08-09  2:32     ` Ming Lei
2017-08-09  7:11       ` Omar Sandoval
2017-08-21  8:18         ` Ming Lei
2017-08-23  7:48         ` Ming Lei
2017-08-05  6:56 ` [PATCH V2 02/20] sbitmap: introduce __sbitmap_for_each_set() Ming Lei
2017-08-22 18:28   ` Bart Van Assche
2017-08-24  3:57     ` Ming Lei
2017-08-25 21:36       ` Bart Van Assche
2017-08-26  8:43         ` Ming Lei
2017-08-22 18:37   ` Bart Van Assche
2017-08-24  4:02     ` Ming Lei
2017-08-05  6:56 ` [PATCH V2 03/20] blk-mq: introduce blk_mq_dispatch_rq_from_ctx() Ming Lei
2017-08-22 18:45   ` Bart Van Assche
2017-08-24  4:52     ` Ming Lei
2017-08-25 21:41       ` Bart Van Assche
2017-08-26  8:47         ` Ming Lei
2017-08-05  6:56 ` [PATCH V2 04/20] blk-mq-sched: move actual dispatching into one helper Ming Lei
2017-08-22 19:50   ` Bart Van Assche
2017-08-05  6:56 ` [PATCH V2 05/20] blk-mq-sched: improve dispatching from sw queue Ming Lei
2017-08-22 19:55   ` Bart Van Assche
2017-08-23 19:58     ` Jens Axboe
2017-08-24  5:52     ` Ming Lei
2017-08-22 20:57   ` Bart Van Assche
2017-08-24  6:12     ` Ming Lei
2017-08-05  6:56 ` [PATCH V2 06/20] blk-mq-sched: don't dequeue request until all in ->dispatch are flushed Ming Lei
2017-08-22 20:09   ` Bart Van Assche
2017-08-24  6:18     ` Ming Lei
2017-08-23 19:56   ` Jens Axboe
2017-08-24  6:38     ` Ming Lei
2017-08-25 10:19       ` Ming Lei
2017-08-05  6:56 ` [PATCH V2 07/20] blk-mq-sched: introduce blk_mq_sched_queue_depth() Ming Lei
2017-08-22 20:10   ` Bart Van Assche
2017-08-05  6:56 ` [PATCH V2 08/20] blk-mq-sched: use q->queue_depth as hint for q->nr_requests Ming Lei
2017-08-22 20:20   ` Bart Van Assche
2017-08-24  6:39     ` Ming Lei
2017-08-05  6:56 ` [PATCH V2 09/20] blk-mq: introduce BLK_MQ_F_SHARED_DEPTH Ming Lei
2017-08-22 21:55   ` Bart Van Assche
2017-08-23  6:46     ` Hannes Reinecke
2017-08-24  6:52     ` Ming Lei
2017-08-25 22:23       ` Bart Van Assche
2017-08-26  8:53         ` Ming Lei
2017-08-05  6:56 ` [PATCH V2 10/20] blk-mq-sched: introduce helpers for query, change busy state Ming Lei
2017-08-22 20:41   ` Bart Van Assche
2017-08-23 20:02     ` Jens Axboe
2017-08-24  6:55       ` Ming Lei
2017-08-24  6:54     ` Ming Lei
2017-08-05  6:56 ` [PATCH V2 11/20] blk-mq: introduce helpers for operating ->dispatch list Ming Lei
2017-08-22 20:43   ` Bart Van Assche
2017-08-24  0:59     ` Damien Le Moal
2017-08-24  7:10       ` Ming Lei
2017-08-24  7:42         ` Damien Le Moal
2017-08-24  6:57     ` Ming Lei
2017-08-05  6:56 ` [PATCH V2 12/20] blk-mq: introduce pointers to dispatch lock & list Ming Lei
2017-08-05  6:56 ` [PATCH V2 13/20] blk-mq: pass 'request_queue *' to several helpers of operating BUSY Ming Lei
2017-08-05  6:56 ` [PATCH V2 14/20] blk-mq-sched: improve IO scheduling on SCSI devcie Ming Lei
2017-08-22 20:51   ` Bart Van Assche
2017-08-24  7:14     ` Ming Lei
2017-08-05  6:57 ` [PATCH V2 15/20] block: introduce rqhash helpers Ming Lei
2017-08-05  6:57 ` [PATCH V2 16/20] block: move actual bio merge code into __elv_merge Ming Lei
2017-08-05  6:57 ` [PATCH V2 17/20] block: add check on elevator for supporting bio merge via hashtable from blk-mq sw queue Ming Lei
2017-08-05  6:57 ` [PATCH V2 18/20] block: introduce .last_merge and .hash to blk_mq_ctx Ming Lei
2017-08-05  6:57 ` [PATCH V2 19/20] blk-mq-sched: refactor blk_mq_sched_try_merge() Ming Lei
2017-08-05  6:57 ` [PATCH V2 20/20] blk-mq: improve bio merge from blk-mq sw queue Ming Lei
2017-08-07 12:48 ` [PATCH V2 00/20] blk-mq-sched: improve SCSI-MQ performance Laurence Oberman
2017-08-07 15:27   ` Bart Van Assche
2017-08-07 17:29     ` Laurence Oberman
2017-08-07 18:46       ` Laurence Oberman
2017-08-07 19:46         ` Laurence Oberman
2017-08-07 23:04       ` Ming Lei
     [not found]   ` <CAFfF4qv3W6D-j8BSSZbwPLqhd_mmwk8CZQe7dSqud8cMMd2yPg@mail.gmail.com>
2017-08-07 22:29     ` Bart Van Assche
2017-08-07 23:17     ` Ming Lei
2017-08-08 13:41     ` Ming Lei
2017-08-08 13:58       ` Laurence Oberman
2017-08-08  8:09 ` Paolo Valente
2017-08-08  9:09   ` Ming Lei
2017-08-08  9:13     ` Paolo Valente
2017-08-11  8:11 ` Christoph Hellwig
2017-08-11 14:25   ` James Bottomley
2017-08-23 16:12 ` Bart Van Assche
2017-08-23 16:15   ` Jens Axboe
2017-08-23 16:24     ` Ming Lei
