From: Ming Lei <ming.lei@redhat.com>
To: Keith Busch <kbusch@kernel.org>
Cc: Long Li <longli@microsoft.com>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	Keith Busch <keith.busch@intel.com>,
	Hannes Reinecke <hare@suse.com>,
	Bart Van Assche <bvanassche@acm.org>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	John Garry <john.garry@huawei.com>,
	LKML <linux-kernel@vger.kernel.org>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	Jens Axboe <axboe@fb.com>, Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Christoph Hellwig <hch@lst.de>, Sagi Grimberg <sagi@grimberg.me>
Subject: Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism
Date: Sat, 7 Sep 2019 07:13:22 +0800	[thread overview]
Message-ID: <20190906231321.GB12290@ming.t460p> (raw)
In-Reply-To: <20190906222555.GB4260@localhost.localdomain>

On Fri, Sep 06, 2019 at 04:25:55PM -0600, Keith Busch wrote:
> On Sat, Sep 07, 2019 at 06:19:21AM +0800, Ming Lei wrote:
> > On Fri, Sep 06, 2019 at 05:50:49PM +0000, Long Li wrote:
> > > >Subject: Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism
> > > >
> > > >Why are all 8 nvmes sharing the same CPU for interrupt handling?
> > > >Shouldn't matrix_find_best_cpu_managed() handle selecting the least used
> > > >CPU from the cpumask for the effective interrupt handling?
> > > 
> > > The tests run on 10 NVMe disks on a system of 80 CPUs. Each NVMe disk has 32 hardware queues.
> > 
> > Then there are 320 NVMe MSI-X vectors in total, and 80 CPUs, so the irq
> > matrix can't avoid overlapping effective CPUs at all.
> 
> Sure, but it's at most half; meanwhile, the CPU that's dispatching requests
> would naturally be throttled by the other half, whose completions are
> interrupting that CPU, no?

The root cause is multiple submissions vs. a single completion CPU.
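
As a rough illustration, here is a toy back-of-the-envelope sketch of that
imbalance (not numbers from these tests; the IOPS and per-completion cost
figures are made-up placeholders):

def completion_cpu_busy(sources, iops_per_source, usec_per_completion):
    """Fraction of one CPU spent handling completion interrupts when the
    completion vectors of `sources` drives are all steered to that CPU."""
    return sources * iops_per_source * usec_per_completion / 1_000_000

# one drive completing on the CPU: heavy, but the CPU can keep up
print(completion_cpu_busy(sources=1, iops_per_source=600_000,
                          usec_per_completion=1))   # 0.6
# four drives completing on the same CPU: more work than one CPU has
print(completion_cpu_busy(sources=4, iops_per_source=600_000,
                          usec_per_completion=1))   # 2.4

Once that fraction goes past 1.0, the completion CPU can no longer keep up,
which is the flood.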

Let's look at two cases:

1) 10 NVMe drives, each with 8 queues, 80 CPU cores
- suppose the genirq matrix can avoid effective CPU overlap, so each CPU
  handles only one NVMe interrupt
- there can be concurrent submissions from 10 CPUs, all of which may be
  completed on one CPU
- an IRQ flood couldn't happen in this case, given that each CPU only
  handles completions from one NVMe drive, which shouldn't be faster
  than the CPU.

2) 10 NVMe drives, each with 32 queues, 80 CPU cores
- one CPU may handle 4 NVMe interrupts, each from a different NVMe drive
- then there may be 4*3 submissions aimed at a single completion CPU, so an
  IRQ flood is easily triggered on the CPU handling those 4 NVMe interrupts,
  because IO from 4 NVMe drives may be quicker than one CPU can process
  (see the arithmetic sketch below).
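
The arithmetic behind these two cases can be sketched like this (the drive,
queue and CPU counts are the ones above; the rest is simple division and
worst-case rounding):

import math

def describe(n_drives, queues_per_drive, n_cpus):
    vectors = n_drives * queues_per_drive
    # NVMe vectors one CPU may have to serve (worst case)
    vectors_per_cpu = math.ceil(vectors / n_cpus)
    # CPUs submitting to a single hardware queue
    cpus_per_queue = math.ceil(n_cpus / queues_per_drive)
    # Worst case: each of those vectors belongs to a different drive, so one
    # CPU absorbs that many drives' completion streams, fed by roughly
    # vectors_per_cpu * cpus_per_queue submitting CPUs.
    print(f"{n_drives} drives x {queues_per_drive} queues on {n_cpus} CPUs: "
          f"{vectors_per_cpu} vector(s)/CPU, "
          f"~{vectors_per_cpu * cpus_per_queue} submitting CPUs per completion CPU")

describe(10, 8, 80)    # case 1: 1 vector/CPU, ~10 submitters
describe(10, 32, 80)   # case 2: 4 vectors/CPU, ~12 submitters (the 4*3 above)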

I can observe an IRQ flood in case #1 as well, but there are still CPUs
handling 2 NVMe interrupts, which is the reason, as Long mentioned. We
could improve this case.
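
For checking where the vectors actually land on a given machine, a quick
sketch like the following works (not part of this patchset; it assumes a
kernel that exposes /proc/irq/<n>/effective_affinity_list and that each
managed vector is steered to a single CPU, which is the usual case on x86):

import collections

per_cpu = collections.Counter()
with open("/proc/interrupts") as f:
    for line in f:
        fields = line.split()
        # only nvme queue vectors, e.g. "... IR-PCI-MSI 1048576-edge nvme0q1"
        if not fields or not fields[-1].startswith("nvme"):
            continue
        irq = fields[0].rstrip(":")
        if not irq.isdigit():
            continue
        try:
            with open(f"/proc/irq/{irq}/effective_affinity_list") as aff:
                cpu = aff.read().strip()
        except OSError:
            continue
        per_cpu[cpu] += 1

for cpu, count in per_cpu.most_common():
    print(f"CPU {cpu}: {count} nvme vector(s)")

CPUs that show more than one nvme vector are the candidates for the flood
described above.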

Thanks,
Ming


Thread overview: 65+ messages
2019-08-27  8:53 [PATCH 0/4] genirq/nvme: add IRQF_RESCUE_THREAD for avoiding IRQ flood Ming Lei
2019-08-27  8:53 ` [PATCH 1/4] softirq: implement IRQ flood detection mechanism Ming Lei
2019-08-27 14:42   ` Thomas Gleixner
2019-08-27 16:19     ` Thomas Gleixner
2019-08-27 23:04       ` Ming Lei
2019-08-27 23:12         ` Thomas Gleixner
2019-08-27 22:58     ` Ming Lei
2019-08-27 23:09       ` Thomas Gleixner
2019-08-28 11:06         ` Ming Lei
2019-08-28 11:23           ` Thomas Gleixner
2019-08-28 13:50             ` Ming Lei
2019-08-28 14:07               ` Thomas Gleixner
2019-09-03  3:30                 ` Ming Lei
2019-09-03  5:59                   ` Daniel Lezcano
2019-09-03  6:31                     ` Ming Lei
2019-09-03  6:40                       ` Daniel Lezcano
2019-09-03  7:28                         ` Ming Lei
2019-09-03  7:50                           ` Daniel Lezcano
2019-09-03  9:30                             ` Ming Lei
2019-09-04 17:07                             ` Bart Van Assche
2019-09-04 17:31                               ` Daniel Lezcano
2019-09-04 17:38                                 ` Bart Van Assche
2019-09-04 18:02                                   ` Peter Zijlstra
2019-09-04 19:47                                     ` Bart Van Assche
2019-09-05  9:11                                       ` Ming Lei
2019-09-05  9:06                                 ` Ming Lei
2019-09-05 10:37                                   ` Daniel Lezcano
2019-09-06  1:22                                     ` Long Li
2019-09-06  4:36                                       ` Daniel Lezcano
2019-09-06  4:44                                         ` Long Li
2019-09-06  1:48                                     ` Ming Lei
2019-09-06  5:14                                       ` Daniel Lezcano
2019-09-06 18:30                                         ` Sagi Grimberg
2019-09-06 18:52                                           ` Keith Busch
2019-09-07  0:01                                           ` Ming Lei
2019-09-10  3:10                                             ` Sagi Grimberg
2019-09-18  0:00                                               ` Long Li
2019-09-20 17:14                                                 ` Sagi Grimberg
2019-09-20 19:12                                                   ` Long Li
2019-09-20 20:45                                                     ` Sagi Grimberg
2019-09-24  0:57                                                       ` Long Li
2019-09-18 14:37                                               ` Ming Lei
2019-09-20 17:09                                                 ` Sagi Grimberg
2019-09-06 14:18                                       ` Keith Busch
2019-09-06 17:50                                         ` Long Li
2019-09-06 22:19                                           ` Ming Lei
2019-09-06 22:25                                             ` Keith Busch
2019-09-06 23:13                                               ` Ming Lei [this message]
2019-09-10  0:24                                             ` Ming Lei
2019-09-03  8:09                           ` Thomas Gleixner
2019-09-03  9:24                             ` Ming Lei
2019-08-29  6:15   ` Long Li
2019-08-30  0:55     ` Ming Lei
2019-08-27  8:53 ` [PATCH 2/4] genirq: add IRQF_RESCUE_THREAD Ming Lei
2019-08-27  8:53 ` [PATCH 3/4] nvme: pci: pass IRQF_RESCURE_THREAD to request_threaded_irq Ming Lei
2019-08-27  9:06   ` Johannes Thumshirn
2019-08-27  9:09     ` Ming Lei
2019-08-27  9:12       ` Johannes Thumshirn
2019-08-27 14:34       ` Keith Busch
2019-08-27 14:44         ` Keith Busch
2019-08-27 15:10   ` Bart Van Assche
2019-08-28  1:45     ` Ming Lei
2019-08-27  8:53 ` [PATCH 4/4] genirq: use irq's affinity for threaded irq with IRQF_RESCUE_THREAD Ming Lei
2019-08-27 14:35   ` Keith Busch
2019-09-06  8:50   ` John Garry
