From: "Derrick, Jonathan" <jonathan.derrick@intel.com>
To: "kbusch@kernel.org" <kbusch@kernel.org>,
"tglx@linutronix.de" <tglx@linutronix.de>
Cc: "hch@lst.de" <hch@lst.de>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
"ming.lei@redhat.com" <ming.lei@redhat.com>
Subject: Re: [PATCH] genirq/affinity: report extra vectors on uneven nodes
Date: Thu, 8 Aug 2019 22:46:06 +0000
Message-ID: <1a6ab898b8800c3e660054f77ac81bfc3921d45a.camel@intel.com>
In-Reply-To: <20190808163224.GB27077@localhost.localdomain>
On Thu, 2019-08-08 at 10:32 -0600, Keith Busch wrote:
> On Thu, Aug 08, 2019 at 09:04:28AM +0200, Thomas Gleixner wrote:
> > On Wed, 7 Aug 2019, Jon Derrick wrote:
> > > The current irq spreading algorithm spreads vectors amongst cpus evenly
> > > per node. If a node has more cpus than another node, the extra vectors
> > > being spread may not be reported back to the caller.
> > >
> > > This is most apparent with the NVMe driver and nr_cpus < vectors, where
> > > the underreporting results in the caller's WARN being triggered:
> > >
> > > irq_build_affinity_masks()
> > > ...
> > > if (nr_present < numvecs)
> > > WARN_ON(nr_present + nr_others < numvecs);
> > >
> > > Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
> > > ---
> > > kernel/irq/affinity.c | 7 +++++--
> > > 1 file changed, 5 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
> > > index 4352b08ae48d..9beafb8c7e92 100644
> > > --- a/kernel/irq/affinity.c
> > > +++ b/kernel/irq/affinity.c
> > > @@ -127,7 +127,8 @@ static int __irq_build_affinity_masks(unsigned int startvec,
> > > }
> > >
> > > for_each_node_mask(n, nodemsk) {
> > > - unsigned int ncpus, v, vecs_to_assign, vecs_per_node;
> > > + unsigned int ncpus, v, vecs_to_assign, total_vecs_to_assign,
> > > + vecs_per_node;
> > >
> > > /* Spread the vectors per node */
> > > vecs_per_node = (numvecs - (curvec - firstvec)) / nodes;
> > > @@ -141,14 +142,16 @@ static int __irq_build_affinity_masks(unsigned int startvec,
> > >
> > > /* Account for rounding errors */
> > > extra_vecs = ncpus - vecs_to_assign * (ncpus / vecs_to_assign);
> > > + total_vecs_to_assign = vecs_to_assign + extra_vecs;
> > >
> > > - for (v = 0; curvec < last_affv && v < vecs_to_assign;
> > > + for (v = 0; curvec < last_affv && v < total_vecs_to_assign;
> > > curvec++, v++) {
> > > cpus_per_vec = ncpus / vecs_to_assign;
> > >
> > > /* Account for extra vectors to compensate rounding errors */
> > > if (extra_vecs) {
> > > cpus_per_vec++;
> > > + v++;
> > > --extra_vecs;
> > > }
> > > irq_spread_init_one(&masks[curvec].mask, nmsk,
> > > --
>
> This looks like it will break the spread to non-present CPUs since
> it's not accurately reporting how many vectors were assigned for the
> present spread.
>
> I think the real problem is the spread's vecs_per_node doesn't account
> which nodes contribute more CPUs than others. For example:
>
> Node 0 has 32 CPUs
> Node 1 has 8 CPUs
> Assign 32 vectors
>
> The current algorithm assigns 16 vectors to node 0 because vecs_per_node
> is calculated as 32 vectors / 2 nodes on the first iteration. The
> subsequent iteration for node 1 gets 8 vectors because it has only 8
> CPUs, leaving 8 vectors unassigned.
>
> A more fair spread would give node 0 the remaining 8 vectors. This
> optimization, however, is a bit more complex than the current algorithm,
> which is probably why it wasn't done, so I think the warning should just
> be removed.
It does get a bit complex to handle what is a rare scenario.
Maybe just make it an informational warning rather than a
stack-dumping WARN?
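
For reference, the arithmetic in the example above can be reproduced
with a small userspace sketch. This is only an approximation under
stated assumptions: spread_vectors() is a hypothetical helper, not the
kernel function, and it models only how many vectors each node
receives (vecs_per_node capped by the node's CPU count). It ignores
the mask building and the extra_vecs rounding entirely.

```c
/*
 * Hypothetical model of the per-node vector count, loosely following
 * the loop in __irq_build_affinity_masks(). Not kernel code.
 *
 * node_cpus[n] - number of CPUs on node n
 * nr_nodes     - number of nodes
 * numvecs      - number of vectors the caller wants spread
 * assigned[n]  - output: vectors given to node n (caller zero-fills)
 *
 * Returns the total number of vectors actually assigned.
 */
static unsigned int spread_vectors(const unsigned int *node_cpus,
				   unsigned int nr_nodes,
				   unsigned int numvecs,
				   unsigned int *assigned)
{
	unsigned int nodes = nr_nodes;	/* nodes still to be visited */
	unsigned int done = 0;		/* vectors assigned so far */
	unsigned int n;

	for (n = 0; n < nr_nodes && done < numvecs; n++) {
		/* Split the remaining vectors evenly over remaining nodes */
		unsigned int vecs_per_node = (numvecs - done) / nodes;
		/* A node never gets more vectors than it has CPUs */
		unsigned int vecs = vecs_per_node < node_cpus[n] ?
				    vecs_per_node : node_cpus[n];

		assigned[n] = vecs;
		done += vecs;
		nodes--;
	}

	return done;
}
```

With node_cpus = { 32, 8 } and numvecs = 32, this gives node 0 16
vectors and node 1 8 vectors, 24 in total, so 8 vectors are left
unassigned, matching the numbers quoted above.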
Thread overview: 6+ messages
2019-08-07 20:10 [PATCH] genirq/affinity: report extra vectors on uneven nodes Jon Derrick
2019-08-08 7:04 ` Thomas Gleixner
2019-08-08 16:32 ` Keith Busch
2019-08-08 22:46 ` Derrick, Jonathan [this message]
2019-08-08 23:08 ` Keith Busch
2019-08-09 3:04 ` Ming Lei