Re: [PATCH V3 0/2] nvme-pci: check CQ after batch submission for Microsoft device

From: Christoph Hellwig <hch@lst.de>
To: Ming Lei <ming.lei@redhat.com>
Cc: Sagi Grimberg <sagi@grimberg.me>, Long Li <longli@microsoft.com>,
	linux-nvme@lists.infradead.org, Jens Axboe <axboe@fb.com>,
	Nadolski Edmund <edmund.nadolski@intel.com>,
	Keith Busch <kbusch@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH V3 0/2] nvme-pci: check CQ after batch submission for Microsoft device
Date: Fri, 22 Nov 2019 10:57:43 +0100
Message-ID: <20191122095743.GA21087@lst.de>
In-Reply-To: <20191122094457.GA23632@ming.t460p>

On Fri, Nov 22, 2019 at 05:44:57PM +0800, Ming Lei wrote:
> > Can this default coalescing setting be turned off with a "set feature"
> > command?
> > 
> 
> At default, 'get feature -f 0x8' shows zero, and nothing changes after
> running 'set feature -f 0x8 -v 0'.
> 
> BTW, soft lockup from another Samsung NVMe can be fixed by this patch
> too. I am confirming if the Samsung NVMe applies aggressive interrupt
> coalescing too.

I think we are mixing up a few things here, and just polling the
completion queue from submission context isn't the right answer for
either of them.

The aggressive interrupt coalescing and resulting long run times of
the irq handler really just mean we need to stop processing completions
from hard irq context at all.  NVMe already has support for threaded
interrupts and we need to make that the default (and possibly even
the only option).  At that point we can do a cond_resched() in this
handler to avoid soft lockups.

The next problem is drivers with fewer completion queues than cpu cores,
as that will still overload the one cpu that the interrupt handler was
assigned to.  A dumb fix would be a cpu mask for the threaded interrupt
handler that can be used for round robin scheduling, but that probably
won't help with getting good performance.  The other idea is to use
"virtual" completion queues.  NVMe allows free form command ids, so
we could OR an index for the relative cpu number inside this queue
into the command id and then create one interrupt thread for
each of them.  Although I'd like to hear from Thomas on what he thinks
of multiple threads per hardirq first.
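
To make the command id part concrete, here is a rough userspace sketch,
not driver code.  The only hard fact in it is that the NVMe command
identifier is 16 bits wide.  The 4/12 split and the names VCQ_BITS,
vcq_encode(), vcq_index() and vcq_tag() are made up for illustration;
a real split would depend on queue depth and on how many cpus share
the queue.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical split of the 16-bit command id: the upper VCQ_BITS hold
 * the "virtual" completion queue index (relative cpu number inside this
 * queue), the lower TAG_BITS hold the per-queue tag. */
#define VCQ_BITS 4u
#define TAG_BITS (16u - VCQ_BITS)
#define TAG_MASK ((uint16_t)((1u << TAG_BITS) - 1u))

/* Submission side: OR the virtual CQ index into the command id. */
static uint16_t vcq_encode(unsigned int vcq, uint16_t tag)
{
	assert(vcq < (1u << VCQ_BITS));
	assert(tag <= TAG_MASK);
	return (uint16_t)((vcq << TAG_BITS) | tag);
}

/* Completion side: pick the interrupt thread to hand the entry to. */
static unsigned int vcq_index(uint16_t command_id)
{
	return command_id >> TAG_BITS;
}

/* Completion side: recover the original tag to look up the request. */
static uint16_t vcq_tag(uint16_t command_id)
{
	return command_id & TAG_MASK;
}
```

With this split, vcq_encode(3, 0x02a) gives command id 0x302a, and the
completion side gets 3 and 0x02a back from vcq_index() and vcq_tag().
The cost is that each bit given to the index halves the number of tags
usable per queue.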
