From: Ming Lei <ming.lei@redhat.com>
To: Hannes Reinecke <hare@suse.de>
Cc: Sagi Grimberg <sagi@grimberg.me>, Long Li <longli@microsoft.com>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	Jens Axboe <axboe@fb.com>, Keith Busch <kbusch@kernel.org>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH 2/2] nvme-pci: poll IO after batch submission for multi-mapping queue
Date: Wed, 13 Nov 2019 11:05:20 +0800
Message-ID: <20191113030520.GC28701@ming.t460p>
In-Reply-To: <f69d4e4c-3d6e-74c0-ed97-cac3c6b230c2@suse.de>

On Tue, Nov 12, 2019 at 06:29:34PM +0100, Hannes Reinecke wrote:
> On 11/12/19 5:49 PM, Keith Busch wrote:
> > On Tue, Nov 12, 2019 at 05:25:59PM +0100, Hannes Reinecke wrote:
> > > (Nitpick: what happens with the interrupt if we have a mask of
> > > several CPUs? Will the interrupt be delivered to one CPU?
> > > To all in the mask?)
> >
> > The hard interrupt will be delivered to effectively one of the CPUs in
> > the mask. The one that is selected is determined when the IRQ is
> > allocated, and it should try to select one from the mask that is least
> > used (see matrix_find_best_cpu_managed()).
> >
> Yeah, just as I thought.
> Which also means that we need to redirect the irq to a non-busy cpu to
> avoid stalls under high load.
> Especially if we have several NVMes to deal with.

The IRQ matrix allocator tries its best to assign a different effective
CPU to each vector for interrupt handling. In theory, if
(nr_nvme_drives * nr_nvme_hw_queues) < nr_cpu_cores, each hw queue can
be assigned its own effective CPU for handling that queue's interrupt;
otherwise, one CPU may end up handling interrupts from more than one
drive's queues.

But that is only in theory. For example, the irq matrix also takes the
admin queues into account when assigning managed IRQs.

On Azure there are such cases; however, the soft lockup still can't be
triggered once the CQ is checked in the submission path. That means one
CPU is enough to handle two hw queues' interrupts in this case. Again,
it depends on both the CPU and the NVMe drive.

For networking, a packet flood can arrive at any time and without
bound, whereas the number of in-flight storage requests is always
limited, so the situation should be much better for storage IO than for
networking, where NAPI is needed to avoid the issue.

> > > Can't we implement blk_poll? Or maybe even threaded interrupts?
> >
> > Threaded interrupts sound good. Currently, though, threaded interrupts
> > execute only on the same cpu as the hard irq. There was a proposal here
> > to change that to use any CPU in the mask, and I still think it makes
> > sense:
> >
> > http://lists.infradead.org/pipermail/linux-nvme/2019-August/026628.html
> >
> That looks like just the ticket.
> In combination with threaded irqs and possibly blk_poll to avoid irq
> storms we should be good.

A threaded irq can't help Azure's performance, because Azure's NVMe
implementation applies aggressive interrupt coalescing.

Thanks,
Ming

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
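
For context on the "least used" selection mentioned above: the idea behind
matrix_find_best_cpu_managed() is to walk the CPUs allowed by the interrupt's
affinity mask and pick the one currently carrying the fewest managed vectors.
A minimal, self-contained userspace sketch of that selection follows; the
plain arrays and the name pick_best_cpu() are illustrative stand-ins for the
kernel's per-CPU irq_matrix state, not the actual kernel code:

#include <limits.h>
#include <stdio.h>

#define NR_CPUS 8

/* Among the CPUs allowed by the affinity mask, pick the one that is
 * currently handling the fewest managed vectors. */
static int pick_best_cpu(const int allowed[NR_CPUS],
                         const unsigned int managed_allocated[NR_CPUS])
{
        int best = -1;
        unsigned int best_load = UINT_MAX;
        int cpu;

        for (cpu = 0; cpu < NR_CPUS; cpu++) {
                if (!allowed[cpu])
                        continue;
                if (managed_allocated[cpu] < best_load) {
                        best_load = managed_allocated[cpu];
                        best = cpu;
                }
        }
        return best;
}

int main(void)
{
        int mask[NR_CPUS]               = { 0, 0, 1, 1, 0, 0, 0, 0 };
        unsigned int allocated[NR_CPUS] = { 4, 1, 3, 2, 0, 5, 1, 2 };

        /* CPU 3 wins: it is in the mask and carries fewer managed
         * vectors than CPU 2 (2 vs 3). */
        printf("effective CPU: %d\n", pick_best_cpu(mask, allocated));
        return 0;
}

Because this choice is made once, at vector allocation time, a CPU that later
becomes busy keeps receiving that queue's hard interrupts, which is exactly
the stall concern raised in the quoted discussion.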
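
To make the "checking cq in submission" approach concrete: after a batch of
commands has been queued and the submission doorbell rung, the submitting CPU
scans the completion queue for entries whose phase tag already matches the
expected phase and reaps them, instead of leaving all completion work to the
hard interrupt on a possibly remote, busy CPU. The toy userspace model below
only demonstrates the NVMe-style phase-bit scan; struct cqe, struct cq and
poll_cq() are illustrative stand-ins, not the nvme-pci structures touched by
the patch:

#include <stdint.h>
#include <stdio.h>

#define CQ_DEPTH 8

/* Minimal model of an NVMe-style completion queue entry: bit 0 of the
 * status field is the phase tag, which the device flips on every wrap. */
struct cqe {
        uint16_t cid;           /* command identifier */
        uint16_t status;        /* bit 0 = phase tag */
};

struct cq {
        struct cqe entries[CQ_DEPTH];
        unsigned int head;
        uint16_t phase;         /* phase value that marks a "new" entry */
};

/* Reap every completion that is already visible; return how many. */
static int poll_cq(struct cq *cq)
{
        int reaped = 0;

        while ((cq->entries[cq->head].status & 1) == cq->phase) {
                printf("completed cid %u\n",
                       (unsigned int)cq->entries[cq->head].cid);
                if (++cq->head == CQ_DEPTH) {   /* wrap, flip expected phase */
                        cq->head = 0;
                        cq->phase ^= 1;
                }
                reaped++;
        }
        return reaped;
}

int main(void)
{
        struct cq cq = { .phase = 1 };

        /* Pretend the device already posted two completions while we were
         * batch-submitting commands. */
        cq.entries[0] = (struct cqe){ .cid = 7, .status = 1 };
        cq.entries[1] = (struct cqe){ .cid = 8, .status = 1 };

        /* Poll right after the batch submission instead of waiting for the
         * hard interrupt to fire on some other CPU. */
        printf("reaped %d completions\n", poll_cq(&cq));
        return 0;
}

Since the number of in-flight requests bounds how many completions can ever
be pending, this extra scan from the submission path is cheap and caps the
work left for the interrupt handler, which is why the soft lockup no longer
triggers in the Azure case described above.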
Thread overview: 28+ messages

2019-11-08  3:55 [PATCH 0/2] nvme-pci: improve IO performance via poll after batch submission Ming Lei
2019-11-08  3:55 ` [PATCH 1/2] nvme-pci: move sq/cq_poll lock initialization into nvme_init_queue Ming Lei
2019-11-08  4:12 ` Keith Busch
2019-11-08  7:09 ` Ming Lei
2019-11-08  3:55 ` [PATCH 2/2] nvme-pci: poll IO after batch submission for multi-mapping queue Ming Lei
2019-11-11 20:44 ` Christoph Hellwig
2019-11-12  0:33 ` Long Li
2019-11-12  1:35 ` Sagi Grimberg
2019-11-12  2:39 ` Ming Lei
2019-11-12 16:25 ` Hannes Reinecke
2019-11-12 16:49 ` Keith Busch
2019-11-12 17:29 ` Hannes Reinecke
2019-11-13  3:05 ` Ming Lei [this message]
2019-11-13  3:17 ` Keith Busch
2019-11-13  3:57 ` Ming Lei
2019-11-12 21:20 ` Long Li
2019-11-12 21:36 ` Keith Busch
2019-11-13  0:50 ` Long Li
2019-11-13  2:24 ` Ming Lei
2019-11-12  2:07 ` Ming Lei
2019-11-12  1:44 ` Sagi Grimberg
2019-11-12  9:56 ` Ming Lei
2019-11-12 17:35 ` Sagi Grimberg
2019-11-12 21:17 ` Long Li
2019-11-12 23:44 ` Jens Axboe
2019-11-13  2:47 ` Ming Lei
2019-11-12 18:11 ` Nadolski, Edmund
2019-11-13 13:46 ` Ming Lei