From: Paolo Valente <paolo.valente@linaro.org>
To: Ming Lei <ming.lei@redhat.com>
Cc: Jens Axboe <axboe@fb.com>,
	linux-block <linux-block@vger.kernel.org>,
	Christoph Hellwig <hch@infradead.org>,
	Bart Van Assche <bart.vanassche@sandisk.com>,
	Laurence Oberman <loberman@redhat.com>
Subject: Re: [PATCH V2 00/20] blk-mq-sched: improve SCSI-MQ performance
Date: Tue, 8 Aug 2017 11:13:50 +0200
Message-ID: <542C6135-7FEF-419F-A382-1091296CB671@linaro.org>
In-Reply-To: <20170808090938.GA19390@ming.t460p>


> On 8 Aug 2017, at 11:09, Ming Lei <ming.lei@redhat.com> wrote:
>
> On Tue, Aug 08, 2017 at 10:09:57AM +0200, Paolo Valente wrote:
>>
>>> On 5 Aug 2017, at 08:56, Ming Lei <ming.lei@redhat.com> wrote:
>>>
>>> In Red Hat internal storage tests of the blk-mq scheduler, we
>>> found that I/O performance is very poor with mq-deadline, especially
>>> for sequential I/O on some multi-queue SCSI devices (lpfc, qla2xxx,
>>> SRP...).
>>>
>>> It turns out that one big issue causes the performance regression:
>>> requests are still dequeued from the sw queue/scheduler queue even
>>> when the LLD's queue is busy, so I/O merging becomes quite difficult,
>>> and sequential I/O degrades a lot.
>>>
>>> The first five patches improve this situation and recover
>>> some of the lost performance.
>>>
>>> But it looks like they are still not enough. The remaining problem
>>> is caused by the queue depth shared among all hw queues. For SCSI
>>> devices, .cmd_per_lun defines the max number of pending I/Os on one
>>> request queue, which is a per-request_queue depth. So during
>>> dispatch, if one hctx is too busy to move on, no other hctx can
>>> dispatch either, because of the per-request_queue depth.
>>>
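The shared-depth effect described above can be illustrated with a toy user-space model. This is only a sketch, not kernel code: the class `SharedDepthQueue` and its methods are hypothetical names, and `depth` merely stands in for the per-request_queue limit that .cmd_per_lun imposes.

```python
class SharedDepthQueue:
    """Toy model: one depth budget shared by every hctx of a request queue."""

    def __init__(self, depth):
        self.depth = depth      # stands in for the .cmd_per_lun limit
        self.in_flight = 0      # requests currently held by the driver

    def try_dispatch(self, hctx):
        # The budget is per request queue, so which hctx asks is irrelevant.
        if self.in_flight >= self.depth:
            return False
        self.in_flight += 1
        return True

    def complete(self):
        # One request finished; a slot of the shared budget is free again.
        self.in_flight -= 1


q = SharedDepthQueue(depth=3)

# hctx 0 alone uses up the whole budget...
results_hctx0 = [q.try_dispatch(hctx=0) for _ in range(4)]
# ...so hctx 1 cannot dispatch even though it has sent nothing yet.
blocked_hctx1 = q.try_dispatch(hctx=1)

# Only after a completion can any hctx make progress again.
q.complete()
after_completion = q.try_dispatch(hctx=1)
```

In this model a request dequeued by hctx 1 while the budget is exhausted would simply sit outside the sw/scheduler queue, which is the situation the quoted text says makes merging difficult.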
>>> Patches 6-14 use a per-request_queue dispatch list to avoid
>>> dequeuing requests from the sw/scheduler queue when the LLD queue
>>> is busy.
>>>
>>> Patches 15-20 improve bio merging via a hash table in the sw queue,
>>> which makes bio merging more efficient than the current approach,
>>> in which only the last 8 requests are checked. Patches 6-14
>>> convert SCSI devices to the scheduler way of dequeuing one request
>>> from the sw queue at a time, so ctx->lock is acquired more often;
>>> merging bios via the hash table decreases the holding time of
>>> ctx->lock and should eliminate the effect of patch 14.
>>>
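The difference between checking only the last 8 requests and looking a request up in a hash table can be sketched in user space. This is an assumption-laden illustration, not the patch's implementation: `Request`, `find_back_merge_scan`, `build_end_hash` and `find_back_merge_hash` are hypothetical names, and only back merges (a bio that starts where a pending request ends) are modelled.

```python
from collections import namedtuple

# Hypothetical stand-in for a pending request: start sector and length in sectors
Request = namedtuple("Request", ["start", "sectors"])


def end_sector(rq):
    return rq.start + rq.sectors


def find_back_merge_scan(pending, bio_start, window=8):
    """Look for a merge candidate among the newest `window` requests only."""
    for rq in reversed(pending[-window:]):
        if end_sector(rq) == bio_start:
            return rq
    return None


def build_end_hash(pending):
    """Index pending requests by the sector at which they end."""
    return {end_sector(rq): rq for rq in pending}


def find_back_merge_hash(end_hash, bio_start):
    """A bio can be back-merged into the request that ends where it starts."""
    return end_hash.get(bio_start)


# 20 pending requests of 8 sectors each, starting at sectors 0, 100, 200, ...
pending = [Request(start=i * 100, sectors=8) for i in range(20)]
end_hash = build_end_hash(pending)

old_bio_start = 108      # continues pending[1], far from the tail
recent_bio_start = 1908  # continues pending[19], the newest request

scan_old = find_back_merge_scan(pending, old_bio_start)
hash_old = find_back_merge_hash(end_hash, old_bio_start)
scan_recent = find_back_merge_scan(pending, recent_bio_start)
hash_recent = find_back_merge_hash(end_hash, recent_bio_start)
```

Here the windowed scan misses the older candidate while the hash lookup finds it with a single probe, which is also why the lookup needs to hold a lock for less time than a list walk would.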
>>> With these changes, SCSI-MQ sequential I/O performance is
>>> much improved. For lpfc, it is basically brought back to the
>>> level of the legacy block path [1]; in particular, mq-deadline
>>> is improved by > 10X [1] on lpfc and by > 3X on SCSI SRP.
>>> For mq-none it is improved by 10% on lpfc, and write is
>>> improved by > 10% on SRP too.
>>>
>>> Also, Bart worried that this patchset may affect SRP, so here is
>>> test data on SCSI SRP this time:
>>>
>>> - fio (libaio, bs: 4k, dio, queue_depth: 64, 64 jobs)
>>> - system (16 cores, dual sockets, mem: 96G)
>>>
>>>                | v4.13-rc3     | v4.13-rc3   | v4.13-rc3+patches |
>>>                | blk-legacy dd | blk-mq none | blk-mq none       |
>>> ------------------------------------------------------------------
>>> read     :iops |          587K |        526K |              537K |
>>> randread :iops |          115K |        140K |              139K |
>>> write    :iops |          596K |        519K |              602K |
>>> randwrite:iops |          103K |        122K |              120K |
>>>
>>>
>>>                | v4.13-rc3     | v4.13-rc3   | v4.13-rc3+patches |
>>>                | blk-legacy dd | blk-mq dd   | blk-mq dd         |
>>> ------------------------------------------------------------------
>>> read     :iops |          587K |        155K |              522K |
>>> randread :iops |          115K |        140K |              141K |
>>> write    :iops |          596K |        135K |              587K |
>>> randwrite:iops |          103K |        120K |              118K |
>>>
>>> V2:
>>> 	- dequeue requests from sw queues in round-robin style,
>>> 	as suggested by Bart, and introduce one helper in sbitmap
>>> 	for this purpose
>>> 	- improve bio merge via hash table from sw queue
>>> 	- add comments about using the DISPATCH_BUSY state in a lockless
>>> 	way, simplifying the handling of the busy state
>>> 	- hold ctx->lock when clearing the ctx busy bit, as suggested
>>> 	by Bart
>>>
>>>
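The round-robin dequeue mentioned in the V2 notes can be pictured as walking the set bits of a bitmap from a moving start offset, wrapping around once. The following is a rough user-space sketch under that assumption; `for_each_set_from` and `RoundRobinDequeuer` are hypothetical names and do not reproduce the sbitmap helper's actual signature.

```python
def for_each_set_from(bits, start):
    """Yield indices of set bits, beginning at `start` and wrapping around once."""
    n = len(bits)
    for i in range(n):
        idx = (start + i) % n
        if bits[idx]:
            yield idx


class RoundRobinDequeuer:
    """Take one request at a time, rotating fairly over non-empty sw queues."""

    def __init__(self, sw_queues):
        self.sw_queues = sw_queues   # one FIFO list per software queue
        self.next_start = 0          # where the next search begins

    def dequeue_one(self):
        # A set bit means the corresponding sw queue has pending requests.
        busy = [bool(q) for q in self.sw_queues]
        for idx in for_each_set_from(busy, self.next_start):
            # Resume after this queue next time so no queue is starved.
            self.next_start = (idx + 1) % len(self.sw_queues)
            return self.sw_queues[idx].pop(0)
        return None


rr = RoundRobinDequeuer([["a1", "a2", "a3"], [], ["c1"], ["d1", "d2"]])
order = [rr.dequeue_one() for _ in range(7)]
```

Starting each search just past the queue served last is what keeps a single busy sw queue from monopolising dispatch in this model.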
>>
>> Hi,
>> I've performance-tested Ming's patchset with the dbench4 test in
>> MMTests, and with the mq-deadline and bfq schedulers.  Max latencies
>> have decreased dramatically: up to 32 times.  Very good results for
>> average latencies as well.
>>
>> For brevity, here are only results for deadline.  You can find full
>> results with bfq in the thread that triggered my testing of Ming's
>> patches [1].
>>
>> MQ-DEADLINE WITHOUT MING'S PATCHES
>>
>> Operation                Count    AvgLat    MaxLat
>> --------------------------------------------------
>> Flush                    13760    90.542 13221.495
>> Close                   137654     0.008    27.133
>> LockX                      640     0.009     0.115
>> Rename                    8064     1.062   246.759
>> ReadX                   297956     0.051   347.018
>> WriteX                   94698   425.636 15090.020
>> Unlink                   35077     0.580   208.462
>> UnlockX                    640     0.007     0.291
>> FIND_FIRST               66630     0.566   530.339
>> SET_FILE_INFORMATION     16000     1.419   811.494
>> QUERY_FILE_INFORMATION   30717     0.004     1.108
>> QUERY_PATH_INFORMATION  176153     0.182   517.419
>> QUERY_FS_INFORMATION     30857     0.018    18.562
>> NTCreateX               184145     0.281   582.076
>>
>> Throughput 8.93961 MB/sec  64 clients  64 procs  max_latency=15090.026 ms
>>
>> MQ-DEADLINE WITH MING'S PATCHES
>>
>> Operation                Count    AvgLat    MaxLat
>> --------------------------------------------------
>> Flush                    13760    48.650   431.525
>> Close                   144320     0.004     7.605
>> LockX                      640     0.005     0.019
>> Rename                    8320     0.187     5.702
>> ReadX                   309248     0.023   216.220
>> WriteX                   97176   338.961  5464.995
>> Unlink                   39744     0.454   315.207
>> UnlockX                    640     0.004     0.027
>> FIND_FIRST               69184     0.042    17.648
>> SET_FILE_INFORMATION     16128     0.113   134.464
>> QUERY_FILE_INFORMATION   31104     0.004     0.370
>> QUERY_PATH_INFORMATION  187136     0.031   168.554
>> QUERY_FS_INFORMATION     33024     0.009     2.915
>> NTCreateX               196672     0.152   163.835
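
To compare the two mq-deadline tables above row by row, the change in MaxLat can be expressed as a ratio per operation. This is a small sketch that uses only the MaxLat values copied from those two tables; the dictionary and variable names are hypothetical, and a ratio below 1 means the maximum latency went up with the patches.

```python
# MaxLat values (ms) copied from the two mq-deadline tables in this message
max_lat_without = {
    "Flush": 13221.495, "Close": 27.133, "LockX": 0.115, "Rename": 246.759,
    "ReadX": 347.018, "WriteX": 15090.020, "Unlink": 208.462,
    "UnlockX": 0.291, "FIND_FIRST": 530.339,
    "SET_FILE_INFORMATION": 811.494, "QUERY_FILE_INFORMATION": 1.108,
    "QUERY_PATH_INFORMATION": 517.419, "QUERY_FS_INFORMATION": 18.562,
    "NTCreateX": 582.076,
}
max_lat_with = {
    "Flush": 431.525, "Close": 7.605, "LockX": 0.019, "Rename": 5.702,
    "ReadX": 216.220, "WriteX": 5464.995, "Unlink": 315.207,
    "UnlockX": 0.027, "FIND_FIRST": 17.648,
    "SET_FILE_INFORMATION": 134.464, "QUERY_FILE_INFORMATION": 0.370,
    "QUERY_PATH_INFORMATION": 168.554, "QUERY_FS_INFORMATION": 2.915,
    "NTCreateX": 163.835,
}

# Ratio > 1: max latency dropped with the patches; ratio < 1: it rose.
ratios = {op: max_lat_without[op] / max_lat_with[op] for op in max_lat_without}

largest_gain = max(ratios, key=ratios.get)
regressions = sorted(op for op, r in ratios.items() if r < 1.0)

# Print operations from the largest to the smallest improvement.
for op, r in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{op:24s} {r:7.2f}x")
```
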
>
> Hi Paolo,
>
> Thanks very much for testing this patchset!
>
> BTW, could you tell us which kind of disk you are using
> in this test?
>

Absolutely:

ATA device, with non-removable media
	Model Number:       HITACHI HTS727550A9E364
	Serial Number:      J3370082G622JD
	Firmware Revision:  JF3ZD0H0
	Transport:          Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6; Revision: ATA8-AST T13 Project D1697 Revision 0b

Thanks,
Paolo

> Thanks,
> Ming
