linux-nvme.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: Hannes Reinecke <hare@suse.de>
To: Ming Lei <ming.lei@redhat.com>, Long Li <longli@microsoft.com>
Cc: Keith Busch <kbusch@kernel.org>, Jens Axboe <axboe@fb.com>,
	Christoph Hellwig <hch@lst.de>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	Sagi Grimberg <sagi@grimberg.me>
Subject: Re: [PATCH 2/2] nvme-pci: poll IO after batch submission for multi-mapping queue
Date: Tue, 12 Nov 2019 17:25:59 +0100	[thread overview]
Message-ID: <8198fd99-6b47-7594-ba1c-4a15ffe25269@suse.de> (raw)
In-Reply-To: <20191112023920.GD15079@ming.t460p>

On 11/12/19 3:39 AM, Ming Lei wrote:
> On Tue, Nov 12, 2019 at 12:33:50AM +0000, Long Li wrote:
>>> From: Christoph Hellwig <hch@lst.de>
>>> Sent: Monday, November 11, 2019 12:45 PM
>>> To: Ming Lei <ming.lei@redhat.com>
>>> Cc: linux-nvme@lists.infradead.org; Keith Busch <kbusch@kernel.org>; Jens
>>> Axboe <axboe@fb.com>; Christoph Hellwig <hch@lst.de>; Sagi Grimberg
>>> <sagi@grimberg.me>; Long Li <longli@microsoft.com>
>>> Subject: Re: [PATCH 2/2] nvme-pci: poll IO after batch submission for multi-
>>> mapping queue
>>>
>>> On Fri, Nov 08, 2019 at 11:55:08AM +0800, Ming Lei wrote:
>>>> f9dde187fa92("nvme-pci: remove cq check after submission") removes cq
>>>> check after submission, this change actually causes performance
>>>> regression on some NVMe drive in which single nvmeq handles requests
>>>> originated from more than one blk-mq sw queues(call it multi-mapping
>>>> queue).
>>>
>>>> Follows test result done on Azure L80sv2 guest with NVMe drive(
>>>> Microsoft Corporation Device b111). This guest has 80 CPUs and 10 numa
>>>> nodes, and each NVMe drive supports 8 hw queues.
>>>
>>> Have you actually seen this on a real nvme drive as well?
>>>
>>> Note that it is kinda silly to limit queues like that in VMs, so I really don't think
>>> we should optimize the driver for this particular case.
>>
>> I tested on an Azure L80s_v2 VM with newer Samsung P983 NVMe SSD (with 32 hardware queues). Tests also showed soft lockup when 32 queues are shared by 80 CPUs. 
>>
> 
> BTW, do you see if this simple change makes a difference?
> 
>> The issue will likely show up if the number of NVMe hardware queues is less than the number of CPUs. I think this may be a likely configuration on a very large system. (e.g. the largest VM on Azure has 416 cores)
>>
> 
> 'the number of NVMe hardware queues' above should be the number of single NVMe drive.
> I believe 32 hw queues is common, also poll queues may take several from the total 32.
> When interrupt handling on single CPU core can't catch up with NVMe's IO handling,
> soft lockup could be triggered. Of course, there are lot kinds of supported processors
> by Linux.
> 
But then we should rather work on eliminating the soft lockup itself.
Switching to polling for completions on the same CPU isn't going to
help; you just stall all other NVMe's which might be waiting for
interrupts arriving on this CPU.
(Nitpick: what does happen with the interrupt if we have a mask of
several CPUs? Will the interrupt delivered to one CPU?
To all in the mask? And if that, how do the other CPU cores notice that
one is working on that interrupt? Questions ...)

Can't we implement blk_poll? Or maybe even threaded interrupts?

> Also when (nr_nvme_drives * nr_nvme_hw_queues) > nr_cpu_cores, one same CPU
> can be assigned to handle more than 1 nvme IO queue interrupt from different
> NVMe drive, the situation becomes worse.
> 
That is arguably bad; especially so as we're doing automatic interrupt
affinity.

-- 
Dr. Hannes Reinecke		      Teamlead Storage & Networking
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 247165 (AG München), GF: Felix Imendörffer

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

  reply	other threads:[~2019-11-12 16:26 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-08  3:55 [PATCH 0/2] nvme-pci: improve IO performance via poll after batch submission Ming Lei
2019-11-08  3:55 ` [PATCH 1/2] nvme-pci: move sq/cq_poll lock initialization into nvme_init_queue Ming Lei
2019-11-08  4:12   ` Keith Busch
2019-11-08  7:09     ` Ming Lei
2019-11-08  3:55 ` [PATCH 2/2] nvme-pci: poll IO after batch submission for multi-mapping queue Ming Lei
2019-11-11 20:44   ` Christoph Hellwig
2019-11-12  0:33     ` Long Li
2019-11-12  1:35       ` Sagi Grimberg
2019-11-12  2:39       ` Ming Lei
2019-11-12 16:25         ` Hannes Reinecke [this message]
2019-11-12 16:49           ` Keith Busch
2019-11-12 17:29             ` Hannes Reinecke
2019-11-13  3:05               ` Ming Lei
2019-11-13  3:17                 ` Keith Busch
2019-11-13  3:57                   ` Ming Lei
2019-11-12 21:20         ` Long Li
2019-11-12 21:36           ` Keith Busch
2019-11-13  0:50             ` Long Li
2019-11-13  2:24           ` Ming Lei
2019-11-12  2:07     ` Ming Lei
2019-11-12  1:44   ` Sagi Grimberg
2019-11-12  9:56     ` Ming Lei
2019-11-12 17:35       ` Sagi Grimberg
2019-11-12 21:17         ` Long Li
2019-11-12 23:44         ` Jens Axboe
2019-11-13  2:47         ` Ming Lei
2019-11-12 18:11   ` Nadolski, Edmund
2019-11-13 13:46     ` Ming Lei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8198fd99-6b47-7594-ba1c-4a15ffe25269@suse.de \
    --to=hare@suse.de \
    --cc=axboe@fb.com \
    --cc=hch@lst.de \
    --cc=kbusch@kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=longli@microsoft.com \
    --cc=ming.lei@redhat.com \
    --cc=sagi@grimberg.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).