From: Ming Lei <ming.lei@redhat.com>
To: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>, Long Li <longli@microsoft.com>,
	linux-nvme@lists.infradead.org, Jens Axboe <axboe@fb.com>,
	Nadolski Edmund <edmund.nadolski@intel.com>,
	Keith Busch <kbusch@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH V3 0/2] nvme-pci: check CQ after batch submission for Microsoft device
Date: Fri, 22 Nov 2019 18:25:17 +0800	[thread overview]
Message-ID: <20191122102517.GA30001@ming.t460p> (raw)
In-Reply-To: <20191122095743.GA21087@lst.de>

On Fri, Nov 22, 2019 at 10:57:43AM +0100, Christoph Hellwig wrote:
> On Fri, Nov 22, 2019 at 05:44:57PM +0800, Ming Lei wrote:
> > > Can this default coalescing setting be turned off with a "set feature"
> > > command?
> > > 
> > 
> > By default, 'get feature -f 0x8' shows zero, and nothing changes after
> > running 'set feature -f 0x8 -v 0'.
> > 
> > BTW, a soft lockup on another Samsung NVMe can be fixed by this patch
> > too. I am confirming whether that Samsung NVMe also applies aggressive
> > interrupt coalescing.
> 
> I think we are mixing up a few things here, and just polling the
> completion queue from submission context isn't the right answer
> either.
> 
> The aggressive interrupt coalescing and resulting long run times of
> the irq handler really just means we need to stop processing them from
> hard irq context at all.  NVMe already has support for threaded
> interrupts and we need to make that the default (and possibly even
> the only option).  At that point we can do a cond_resched() in this
> handler to avoid soft lockups.

I am pretty sure that threaded interrupts cause a big performance drop,
and Long has verified that.
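
(To make sure we are talking about the same thing, I read your
suggestion as roughly the sketch below; nvme_poll_cq() and
nvme_cqe_pending() are stand-ins for the driver's CQ helpers, and the
registration detail varies by kernel version:)

#include <linux/interrupt.h>
#include <linux/sched.h>

static irqreturn_t nvme_irq_check(int irq, void *data)
{
	struct nvme_queue *nvmeq = data;

	/* hard irq part: only wake the thread if a CQE is actually posted */
	if (nvme_cqe_pending(nvmeq))
		return IRQ_WAKE_THREAD;
	return IRQ_NONE;
}

static irqreturn_t nvme_irq_thread(int irq, void *data)
{
	struct nvme_queue *nvmeq = data;

	/* drain the CQ in process context, yielding between batches */
	while (nvme_poll_cq(nvmeq))	/* stand-in: reap a batch of CQEs */
		cond_resched();

	return IRQ_HANDLED;
}

/*
 * registration, instead of a plain request_irq():
 *
 *	request_threaded_irq(irq, nvme_irq_check, nvme_irq_thread,
 *			     0, "nvme", nvmeq);
 */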

> 
> The next problem is drivers with less completion queues than cpu cores,

The IRQ matrix already balances the interrupt load among CPUs. For
example, on Azure eight CPU cores can be mapped to one hctx, but only a
few CPUs actually handle interrupts, each from at most two queues.
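
(That balancing comes from managed irq affinity; roughly what nvme-pci
already does, simplified here, since the real code additionally splits
the vectors into default/read/poll sets and differs by kernel version:)

	struct irq_affinity affd = {
		.pre_vectors	= 1,	/* vector 0 shared with the admin queue */
	};
	int nr_vecs;

	/* the core spreads the I/O queue vectors evenly over all CPUs */
	nr_vecs = pci_alloc_irq_vectors_affinity(pdev, 1, nr_io_queues + 1,
			PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, &affd);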

> as that will still overload the one cpu that the interrupt handler was
> assigned to.  A dumb fix would be a cpu mask for the threaded interrupt

Actually, one CPU is fast enough to handle the interrupt processing of
several drives. There is also a per-queue depth limit, so the kind of
interrupt flood seen in networking can't be as serious for storage.

So far I have only received three NVMe soft-lockup tickets: two of them
can be fixed by this patch, and the third turned out to be an IOMMU
lock issue. Are there other NVMe soft-lockup reports?

At least Azure's NVMe applies aggressive interrupt coalescing, where
the soft lockup is usually triggered by delayed interrupt handling:
lots of requests can pile up and then have to be handled in a single
run of the interrupt handler. That doesn't mean the CPU is saturated;
it is just an interrupt peak, and it is very specific to Azure's
implementation.

Otherwise, this simple patch can't fix the issue.
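
(For clarity, the core of the patch is just this kind of opportunistic
reaping; the helper name below is made up for illustration, and
nvme_process_cq()'s exact signature differs between kernel versions:)

	/*
	 * After queueing a batch of commands, reap any completions the
	 * device has already posted, so a hard irq that is delayed by
	 * coalescing does not have to drain a huge backlog later.
	 */
	static void nvme_check_cq_after_batch_submit(struct nvme_queue *nvmeq)
	{
		if (spin_trylock(&nvmeq->cq_poll_lock)) {
			nvme_process_cq(nvmeq);
			spin_unlock(&nvmeq->cq_poll_lock);
		}
	}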

> handler that can be used for round robin scheduling, but that probably
> won't help with getting good performance.  The other idea is to use
> "virtual" completion queues.  NVMe allows free form command ids, so
> we could OR an index for the relative cpu number inside this queue
> into the command id and then create one interrupt thread for
> each of them.  Although I'd like to hear from Thomas on what he thinks
> of multiple threads per hardirq first.
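
(If I understand the encoding you have in mind, it would be something
like the following; purely hypothetical, none of these names exist in
the driver today:)

	/* fold a relative cpu index into the upper bits of the 16-bit CID */
	#define NVME_VCQ_SHIFT	12
	#define NVME_VCQ_MASK	(0xfu << NVME_VCQ_SHIFT)

	static inline u16 nvme_make_cid(u16 tag, unsigned int rel_cpu)
	{
		return tag | (rel_cpu << NVME_VCQ_SHIFT);
	}

	static inline unsigned int nvme_cid_to_vcq(u16 cid)
	{
		return (cid & NVME_VCQ_MASK) >> NVME_VCQ_SHIFT;
	}

	/*
	 * The CQ reaper would then wake the per-cpu interrupt thread
	 * selected by nvme_cid_to_vcq(cqe->command_id).
	 */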

As mentioned above, there is no proof that the problem is caused by the
interrupt load saturating the CPU, so I am not sure we need to reinvent
the wheel.


Thanks,
Ming




Thread overview: 16+ messages
2019-11-14  2:59 [PATCH V3 0/2] nvme-pci: check CQ after batch submission for Microsoft device Ming Lei
2019-11-14  2:59 ` [PATCH V3 1/2] nvme-pci: move sq/cq_poll lock initialization into nvme_init_queue Ming Lei
2019-11-14  2:59 ` [PATCH V3 2/2] nvme-pci: check CQ after batch submission for Microsoft device Ming Lei
2019-11-14  4:56   ` Keith Busch
2019-11-14  8:56     ` Ming Lei
2019-11-21  3:11 ` [PATCH V3 0/2] " Ming Lei
2019-11-21  6:14   ` Christoph Hellwig
2019-11-21  7:46     ` Ming Lei
2019-11-21 15:45       ` Keith Busch
2019-11-22  9:44         ` Ming Lei
2019-11-22  9:57           ` Christoph Hellwig
2019-11-22 10:25             ` Ming Lei [this message]
2019-11-22 14:04               ` Jens Axboe
2019-11-22 21:49                 ` Ming Lei
2019-11-22 21:58                   ` Jens Axboe
2019-11-22 22:30                     ` Ming Lei
