Linux-NVME Archive on lore.kernel.org
From: Ming Lei <ming.lei@redhat.com>
To: Hannes Reinecke <hare@suse.de>
Cc: Sagi Grimberg <sagi@grimberg.me>, Long Li <longli@microsoft.com>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	Jens Axboe <axboe@fb.com>, Keith Busch <kbusch@kernel.org>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH 2/2] nvme-pci: poll IO after batch submission for multi-mapping queue
Date: Wed, 13 Nov 2019 11:05:20 +0800
Message-ID: <20191113030520.GC28701@ming.t460p> (raw)
In-Reply-To: <f69d4e4c-3d6e-74c0-ed97-cac3c6b230c2@suse.de>

On Tue, Nov 12, 2019 at 06:29:34PM +0100, Hannes Reinecke wrote:
> On 11/12/19 5:49 PM, Keith Busch wrote:
> > On Tue, Nov 12, 2019 at 05:25:59PM +0100, Hannes Reinecke wrote:
> > > (Nitpick: what happens with the interrupt if we have a mask of
> > > several CPUs? Will the interrupt be delivered to one CPU?
> > > To all in the mask?)
> > 
> > The hard-interrupt will be delivered to effectively one of the CPUs in the
> > mask. The one that is selected is determined when the IRQ is allocated,
> > and it should try to select one from the mask that is least used (see
> > matrix_find_best_cpu_managed()).
> > 
> Yeah, just as I thought.
> Which also means that we need to redirect the irq to a non-busy cpu to avoid
> stalls under high load.
> Especially if we have several NVMe drives to deal with.

The IRQ matrix allocator tries its best to assign a different effective CPU
to each vector for handling that vector's interrupt.

In theory, if (nr_nvme_drives * nr_nvme_hw_queues) < nr_cpu_cores, each
hw queue can be assigned its own effective CPU for handling that queue's
interrupt. Otherwise, one CPU may end up handling interrupts from more
than one drive's queues. But that is only in theory; for example, the irq
matrix also takes admin queues into account when allocating managed IRQs.
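
To make that arithmetic concrete with made-up numbers (purely hypothetical,
not figures from this thread):

  nr_nvme_drives    = 4       (hypothetical)
  nr_nvme_hw_queues = 32      (per drive, hypothetical)
  nr_cpu_cores      = 64

  4 * 32 = 128 I/O queue vectors > 64 cores
  => on average each CPU must service interrupts from two hw queues,
     and the per-drive admin queue vectors skew the spreading further.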

On Azure there are such cases, yet the soft lockup still can't be
triggered after applying the patch that checks the CQ in the submission
path. That means one CPU is enough to handle two hw queues' interrupts in
this case. Again, it depends on both the CPU and the NVMe drive.

For networking, a packet flood can arrive at any time without limit,
whereas the number of in-flight storage requests is always bounded, so the
situation could be much better for storage IO than for networking, where
NAPI is used to avoid this kind of issue.
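
As a rough illustration of the "check the CQ in the submission path" idea,
here is a self-contained toy model (not the nvme-pci patch; every name in
it is made up): after the last command of a batch is queued, the submitter
reaps whatever completions are already sitting in the CQ, so less work is
left for the hard interrupt handler.

  #include <stdio.h>

  #define QUEUE_DEPTH 8

  /* Toy completion queue: a slot becomes non-zero once the "device"
   * has completed the corresponding command. */
  static int cq[QUEUE_DEPTH];
  static int cq_head;

  /* Pretend the device completes a command as soon as it is submitted. */
  static void toy_submit_cmd(int tag)
  {
          cq[tag % QUEUE_DEPTH] = 1;
  }

  /* Reap completions already present in the CQ; this is the work the
   * patch moves (partially) out of the interrupt handler. */
  static int toy_poll_cq(void)
  {
          int reaped = 0;

          while (cq[cq_head]) {
                  cq[cq_head] = 0;
                  cq_head = (cq_head + 1) % QUEUE_DEPTH;
                  reaped++;
          }
          return reaped;
  }

  int main(void)
  {
          int tag;

          for (tag = 0; tag < 4; tag++)   /* submit a batch of 4 */
                  toy_submit_cmd(tag);

          /* last request of the batch -> poll the CQ once inline */
          printf("reaped %d completions inline\n", toy_poll_cq());
          return 0;
  }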

> 
> > > Can't we implement blk_poll? Or maybe even threaded interrupts?
> > 
> > Threaded interrupts sound good. Currently, though, threaded interrupts
> > execute only on the same cpu as the hard irq. There was a proposal here to
> > change that to use any CPU in the mask, and I still think it makes sense
> > 
> >    http://lists.infradead.org/pipermail/linux-nvme/2019-August/026628.html
> > 
> That looks like just the ticket.
> In combination with threaded irqs and possibly blk_poll to avoid irq storms
> we should be good.

Threaded irqs can't help Azure's performance, because Azure's NVMe
implementation applies aggressive interrupt coalescing.
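
For reference, the threaded-interrupt approach discussed above is the usual
request_threaded_irq() split; a minimal sketch of the idea follows, where
the handler bodies and the nvme_* helpers are assumptions for illustration,
not the actual nvme-pci code:

  /* Hard-irq handler: only checks whether work is pending and defers
   * the heavy CQ processing to the irq thread. */
  static irqreturn_t nvme_irq_check(int irq, void *data)
  {
          struct nvme_queue *nvmeq = data;

          if (nvme_cqe_pending(nvmeq))            /* assumed helper */
                  return IRQ_WAKE_THREAD;
          return IRQ_NONE;
  }

  /* Threaded handler: reaps completions in a schedulable context, so a
   * completion storm no longer pins one CPU in hard-irq context. */
  static irqreturn_t nvme_irq_thread(int irq, void *data)
  {
          nvme_process_cq(data);                  /* assumed helper */
          return IRQ_HANDLED;
  }

  /* registration, e.g. when the queue is created */
  ret = request_threaded_irq(irq, nvme_irq_check, nvme_irq_thread,
                             IRQF_ONESHOT, "nvme-queue", nvmeq);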

Thanks, 
Ming


