From: Hannes Reinecke <hare@suse.de>
To: Mike Snitzer <snitzer@redhat.com>
Cc: axboe@kernel.dk, "keith.busch@intel.com" <keith.busch@intel.com>,
	Sagi Grimberg <sagig@dev.mellanox.co.il>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	Christoph Hellwig <hch@infradead.org>,
	device-mapper development <dm-devel@redhat.com>,
	linux-block@vger.kernel.org,
	Bart Van Assche <bart.vanassche@sandisk.com>
Subject: Re: dm-multipath low performance with blk-mq
Date: Mon, 1 Feb 2016 07:46:59 +0100
Message-ID: <56AEFF63.7050606@suse.de>
In-Reply-To: <20160130191238.GA18686@redhat.com>

On 01/30/2016 08:12 PM, Mike Snitzer wrote:
> On Sat, Jan 30 2016 at  3:52am -0500,
> Hannes Reinecke <hare@suse.de> wrote:
> 
>> On 01/30/2016 12:35 AM, Mike Snitzer wrote:
>>>
>>> Your test above is prone to exhaust the dm-mpath blk-mq tags (128)
>>> because 24 threads * 32 easily exceeds 128 (by a factor of 6).
>>>
>>> I found that we were context switching (via bt_get's io_schedule)
>>> waiting for tags to become available.
>>>
>>> This is embarrassing but, until Jens told me today, I was oblivious to
>>> the fact that the number of blk-mq's tags per hw_queue was defined by
>>> tag_set.queue_depth.
>>>
>>> Previously request-based DM's blk-mq support had:
>>> md->tag_set.queue_depth = BLKDEV_MAX_RQ; (again: 128)
>>>
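
To spell out the mechanics (the ops structure below is a placeholder,
the rest is the stock blk-mq setup API):

	#include <linux/blk-mq.h>

	/* queue_depth is the number of tags, i.e. concurrently in-flight
	 * requests, per hardware queue -- the 128 being exhausted above. */
	struct blk_mq_tag_set set = {
		.ops		= &example_mq_ops,	/* placeholder */
		.nr_hw_queues	= 1,
		.queue_depth	= BLKDEV_MAX_RQ,	/* 128 */
		.numa_node	= NUMA_NO_NODE,
	};
	int ret = blk_mq_alloc_tag_set(&set);

So a dm-mpath device caps out at 128 in-flight requests per hw queue,
no matter how hard fio pushes.
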
>>> Now I have a patch that allows tuning queue_depth via dm_mod module
>>> parameter.  And I'll likely bump the default to 4096 or something (doing
>>> so eliminated blocking in bt_get).
>>>
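
Guessing at the shape of that patch (the parameter name and default
below are made up, just to illustrate the mechanism):

	/* Sketch: expose the dm-mq tag depth as a dm_mod parameter. */
	static unsigned dm_mq_queue_depth = 4096;
	module_param(dm_mq_queue_depth, uint, S_IRUGO);
	MODULE_PARM_DESC(dm_mq_queue_depth,
			 "Queue depth for request-based dm-mq devices");

	/* ...picked up when the tag set is built: */
	md->tag_set.queue_depth = dm_mq_queue_depth;

With something like that in place the depth can be chosen at load time,
e.g. "modprobe dm_mod dm_mq_queue_depth=4096" (using the made-up name
above).
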
>>> But eliminating the tags bottleneck only raised my read IOPs from ~600K
>>> to ~800K (using 1 hw_queue for both null_blk and dm-mpath).
>>>
>>> When I raise nr_hw_queues to 4 for null_blk (keeping dm-mq at 1) I see a
>>> whole lot more context switching due to request-based DM's use of
>>> ksoftirqd (and kworkers) for request completion.
>>>
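
To picture the hand-off (the names below are made up, only the flow
matters): the clone completes in BLOCK_SOFTIRQ context -- ksoftirqd once
the softirq load is high enough -- and finishing the original request on
the dm queue takes yet another hand-off:

	/* Sketch: softirq completion of the cloned request. */
	static void example_clone_softirq_done(struct request *clone)
	{
		/* We're in softirq (possibly ksoftirqd) context here --
		 * too restricted to tear down the clone and complete the
		 * original request, so punt once more to a kworker: */
		queue_work(example_wq, &example_completion_work);
	}

Every completion can thus cross two extra contexts, which would explain
the extra context switching.
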
>>> So I'm moving on to optimizing the completion path.  But at least some
>>> progress was made, more to come...
>>>
>>
>> Would you mind sharing your patches?
> 
> I'm still working through this.  I'll hopefully have a handful of
> RFC-level changes by end of day Monday.  But could take longer.
> 
> One change that I already shared in a previous mail is:
> http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=devel2&id=99ebcaf36d9d1fa3acec98492c36664d57ba8fbd
> 
>> We're currently doing tests with a high-performance FC setup
>> (16G FC with all-flash storage), and are still 20% short of the
>> advertised backend performance.
>>
>> Just as a side note: we're currently getting 550k IOPs.
>> With unpatched dm-mpath.
> 
> What is your test workload?  If you can share I'll be sure to factor it
> into my testing.
> 
That's a plain random read via fio, using 8 LUNs on the target.

>> So nearly on par with your null_blk setup, but with real hardware.
>> (Which in itself is pretty cool. You should get faster RAM :-)
> 
> You've misunderstood what I said my null_blk (RAM) performance is.
> 
> My null_blk test gets ~1900K read IOPs.  But dm-mpath on top only gets
> between 600K and 1000K IOPs depending on $FIO_QUEUE_DEPTH and if I
> use multiple $NULL_BLK_HW_QUEUES.
> 
Right.
We're using two 16G FC links, each talking to 4 LUNs.
With dm-mpath on top. The FC HBAs have a hardware queue depth
of roughly 2000, so we might need to tweak the queue depth of the
multipath devices, too.


I'll have a look at your patches.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
