From: Ming Lei <ming.lei@redhat.com>
To: Keith Busch <kbusch@kernel.org>
Cc: Long Li <longli@microsoft.com>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	Keith Busch <keith.busch@intel.com>,
	Hannes Reinecke <hare@suse.com>,
	Bart Van Assche <bvanassche@acm.org>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	John Garry <john.garry@huawei.com>,
	LKML <linux-kernel@vger.kernel.org>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	Jens Axboe <axboe@fb.com>, Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Christoph Hellwig <hch@lst.de>, Sagi Grimberg <sagi@grimberg.me>
Subject: Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism
Date: Sat, 7 Sep 2019 07:13:22 +0800	[thread overview]
Message-ID: <20190906231321.GB12290@ming.t460p> (raw)
In-Reply-To: <20190906222555.GB4260@localhost.localdomain>

On Fri, Sep 06, 2019 at 04:25:55PM -0600, Keith Busch wrote:
> On Sat, Sep 07, 2019 at 06:19:21AM +0800, Ming Lei wrote:
> > On Fri, Sep 06, 2019 at 05:50:49PM +0000, Long Li wrote:
> > > >Subject: Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism
> > > >
> > > >Why are all 8 nvmes sharing the same CPU for interrupt handling?
> > > >Shouldn't matrix_find_best_cpu_managed() handle selecting the least used
> > > >CPU from the cpumask for the effective interrupt handling?
> > > 
> > > The tests run on 10 NVMe disks on a system of 80 CPUs. Each NVMe disk has 32 hardware queues.
> > 
> > Then there are 320 NVMe MSI-X vectors in total, and 80 CPUs, so the irq
> > matrix can't avoid overlapping effective CPUs at all.
> 
> Sure, but it's at most half; meanwhile, the CPU that's dispatching requests
> would naturally be throttled by the other half, whose completions are
> interrupting that CPU, no?

The root cause is multiple submissions vs. a single completion CPU.
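
As a rough illustration, here is a toy back-of-the-envelope sketch of that
imbalance (not numbers from these tests; the IOPS and per-completion cost
figures are made-up placeholders):

def completion_cpu_busy(sources, iops_per_source, usec_per_completion):
    """Fraction of one CPU spent handling completion interrupts when the
    completion vectors of `sources` drives are all steered to that CPU."""
    return sources * iops_per_source * usec_per_completion / 1_000_000

# one drive completing on the CPU: heavy, but the CPU can keep up
print(completion_cpu_busy(sources=1, iops_per_source=600_000,
                          usec_per_completion=1))   # 0.6
# four drives completing on the same CPU: more work than one CPU has
print(completion_cpu_busy(sources=4, iops_per_source=600_000,
                          usec_per_completion=1))   # 2.4

Once that fraction goes past 1.0, the completion CPU can no longer keep up,
which is the flood.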

Let's look at two cases:

1) 10 NVMe drives, each with 8 queues, 80 CPU cores
- suppose the genirq matrix can avoid effective CPU overlap, so each CPU
  handles only one NVMe interrupt
- there can be concurrent submissions from 10 CPUs, all of which may be
  completed on one CPU
- an IRQ flood couldn't happen in this case, given that each CPU only
  handles completions from one NVMe drive, which shouldn't be faster
  than the CPU.

2) 10 NVMe drives, each with 32 queues, 80 CPU cores
- one CPU may handle 4 NVMe interrupts, each from a different NVMe drive
- then there may be 4*3 submissions aimed at a single completion CPU, so an
  IRQ flood is easily triggered on the CPU handling those 4 NVMe interrupts,
  because IO from 4 NVMe drives may be quicker than one CPU can process
  (see the arithmetic sketch below).
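
The arithmetic behind these two cases can be sketched like this (the drive,
queue and CPU counts are the ones above; the rest is simple division and
worst-case rounding):

import math

def describe(n_drives, queues_per_drive, n_cpus):
    vectors = n_drives * queues_per_drive
    # NVMe vectors one CPU may have to serve (worst case)
    vectors_per_cpu = math.ceil(vectors / n_cpus)
    # CPUs submitting to a single hardware queue
    cpus_per_queue = math.ceil(n_cpus / queues_per_drive)
    # Worst case: each of those vectors belongs to a different drive, so one
    # CPU absorbs that many drives' completion streams, fed by roughly
    # vectors_per_cpu * cpus_per_queue submitting CPUs.
    print(f"{n_drives} drives x {queues_per_drive} queues on {n_cpus} CPUs: "
          f"{vectors_per_cpu} vector(s)/CPU, "
          f"~{vectors_per_cpu * cpus_per_queue} submitting CPUs per completion CPU")

describe(10, 8, 80)    # case 1: 1 vector/CPU, ~10 submitters
describe(10, 32, 80)   # case 2: 4 vectors/CPU, ~12 submitters (the 4*3 above)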

I can observe an IRQ flood in case #1 as well, but there are still CPUs
handling 2 NVMe interrupts, which is the reason, as Long mentioned. We
could improve this case.
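
For checking where the vectors actually land on a given machine, a quick
sketch like the following works (not part of this patchset; it assumes a
kernel that exposes /proc/irq/<n>/effective_affinity_list and that each
managed vector is steered to a single CPU, which is the usual case on x86):

import collections

per_cpu = collections.Counter()
with open("/proc/interrupts") as f:
    for line in f:
        fields = line.split()
        # only nvme queue vectors, e.g. "... IR-PCI-MSI 1048576-edge nvme0q1"
        if not fields or not fields[-1].startswith("nvme"):
            continue
        irq = fields[0].rstrip(":")
        if not irq.isdigit():
            continue
        try:
            with open(f"/proc/irq/{irq}/effective_affinity_list") as aff:
                cpu = aff.read().strip()
        except OSError:
            continue
        per_cpu[cpu] += 1

for cpu, count in per_cpu.most_common():
    print(f"CPU {cpu}: {count} nvme vector(s)")

CPUs that show more than one nvme vector are the candidates for the flood
described above.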

Thanks,
Ming


Thread overview: 65+ messages
2019-08-27  8:53 [PATCH 0/4] genirq/nvme: add IRQF_RESCUE_THREAD for avoiding IRQ flood Ming Lei
2019-08-27  8:53 ` [PATCH 1/4] softirq: implement IRQ flood detection mechanism Ming Lei
2019-08-27 14:42   ` Thomas Gleixner
2019-08-27 16:19     ` Thomas Gleixner
2019-08-27 23:04       ` Ming Lei
2019-08-27 23:12         ` Thomas Gleixner
2019-08-27 22:58     ` Ming Lei
2019-08-27 23:09       ` Thomas Gleixner
2019-08-28 11:06         ` Ming Lei
2019-08-28 11:23           ` Thomas Gleixner
2019-08-28 13:50             ` Ming Lei
2019-08-28 14:07               ` Thomas Gleixner
2019-09-03  3:30                 ` Ming Lei
2019-09-03  5:59                   ` Daniel Lezcano
2019-09-03  6:31                     ` Ming Lei
2019-09-03  6:40                       ` Daniel Lezcano
2019-09-03  7:28                         ` Ming Lei
2019-09-03  7:50                           ` Daniel Lezcano
2019-09-03  9:30                             ` Ming Lei
2019-09-04 17:07                             ` Bart Van Assche
2019-09-04 17:31                               ` Daniel Lezcano
2019-09-04 17:38                                 ` Bart Van Assche
2019-09-04 18:02                                   ` Peter Zijlstra
2019-09-04 19:47                                     ` Bart Van Assche
2019-09-05  9:11                                       ` Ming Lei
2019-09-05  9:06                                 ` Ming Lei
2019-09-05 10:37                                   ` Daniel Lezcano
2019-09-06  1:22                                     ` Long Li
2019-09-06  4:36                                       ` Daniel Lezcano
2019-09-06  4:44                                         ` Long Li
2019-09-06  1:48                                     ` Ming Lei
2019-09-06  5:14                                       ` Daniel Lezcano
2019-09-06 18:30                                         ` Sagi Grimberg
2019-09-06 18:52                                           ` Keith Busch
2019-09-07  0:01                                           ` Ming Lei
2019-09-10  3:10                                             ` Sagi Grimberg
2019-09-18  0:00                                               ` Long Li
2019-09-20 17:14                                                 ` Sagi Grimberg
2019-09-20 19:12                                                   ` Long Li
2019-09-20 20:45                                                     ` Sagi Grimberg
2019-09-24  0:57                                                       ` Long Li
2019-09-18 14:37                                               ` Ming Lei
2019-09-20 17:09                                                 ` Sagi Grimberg
2019-09-06 14:18                                       ` Keith Busch
2019-09-06 17:50                                         ` Long Li
2019-09-06 22:19                                           ` Ming Lei
2019-09-06 22:25                                             ` Keith Busch
2019-09-06 23:13                                               ` Ming Lei [this message]
2019-09-10  0:24                                             ` Ming Lei
2019-09-03  8:09                           ` Thomas Gleixner
2019-09-03  9:24                             ` Ming Lei
2019-08-29  6:15   ` Long Li
2019-08-30  0:55     ` Ming Lei
2019-08-27  8:53 ` [PATCH 2/4] genirq: add IRQF_RESCUE_THREAD Ming Lei
2019-08-27  8:53 ` [PATCH 3/4] nvme: pci: pass IRQF_RESCURE_THREAD to request_threaded_irq Ming Lei
2019-08-27  9:06   ` Johannes Thumshirn
2019-08-27  9:09     ` Ming Lei
2019-08-27  9:12       ` Johannes Thumshirn
2019-08-27 14:34       ` Keith Busch
2019-08-27 14:44         ` Keith Busch
2019-08-27 15:10   ` Bart Van Assche
2019-08-28  1:45     ` Ming Lei
2019-08-27  8:53 ` [PATCH 4/4] genirq: use irq's affinity for threaded irq with IRQF_RESCUE_THREAD Ming Lei
2019-08-27 14:35   ` Keith Busch
2019-09-06  8:50   ` John Garry
