Date: Thu, 8 Aug 2019 10:32:24 -0600
From: Keith Busch
To: Thomas Gleixner
Cc: Jon Derrick, LKML, linux-nvme@lists.infradead.org, Ming Lei, Christoph Hellwig
Subject: Re: [PATCH] genirq/affinity: report extra vectors on uneven nodes
Message-ID: <20190808163224.GB27077@localhost.localdomain>
References: <20190807201051.32662-1-jonathan.derrick@intel.com>

On Thu, Aug 08, 2019 at 09:04:28AM +0200, Thomas Gleixner wrote:
> On Wed, 7 Aug 2019, Jon Derrick wrote:
> > The current irq spreading algorithm spreads vectors amongst cpus evenly
> > per node. If a node has more cpus than another node, the extra vectors
> > being spread may not be reported back to the caller.
> >
> > This is most apparent with the NVMe driver and nr_cpus < vectors, where
> > the underreporting results in the caller's WARN being triggered:
> >
> > irq_build_affinity_masks()
> > ...
> > 	if (nr_present < numvecs)
> > 		WARN_ON(nr_present + nr_others < numvecs);
> >
> > Signed-off-by: Jon Derrick
> > ---
> >  kernel/irq/affinity.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
> > index 4352b08ae48d..9beafb8c7e92 100644
> > --- a/kernel/irq/affinity.c
> > +++ b/kernel/irq/affinity.c
> > @@ -127,7 +127,8 @@ static int __irq_build_affinity_masks(unsigned int startvec,
> >  	}
> >
> >  	for_each_node_mask(n, nodemsk) {
> > -		unsigned int ncpus, v, vecs_to_assign, vecs_per_node;
> > +		unsigned int ncpus, v, vecs_to_assign, total_vecs_to_assign,
> > +			vecs_per_node;
> >
> >  		/* Spread the vectors per node */
> >  		vecs_per_node = (numvecs - (curvec - firstvec)) / nodes;
> > @@ -141,14 +142,16 @@ static int __irq_build_affinity_masks(unsigned int startvec,
> >
> >  		/* Account for rounding errors */
> >  		extra_vecs = ncpus - vecs_to_assign * (ncpus / vecs_to_assign);
> > +		total_vecs_to_assign = vecs_to_assign + extra_vecs;
> >
> > -		for (v = 0; curvec < last_affv && v < vecs_to_assign;
> > +		for (v = 0; curvec < last_affv && v < total_vecs_to_assign;
> >  		     curvec++, v++) {
> >  			cpus_per_vec = ncpus / vecs_to_assign;
> >
> >  			/* Account for extra vectors to compensate rounding errors */
> >  			if (extra_vecs) {
> >  				cpus_per_vec++;
> > +				v++;
> >  				--extra_vecs;
> >  			}
> >  			irq_spread_init_one(&masks[curvec].mask, nmsk,
> > --

This looks like it will break the spread to non-present CPUs since it's
not accurately reporting how many vectors were assigned for the present
spread.

I think the real problem is that the spread's vecs_per_node doesn't
account for which nodes contribute more CPUs than others. For example:

  Node 0 has 32 CPUs
  Node 1 has 8 CPUs
  Assign 32 vectors

The current algorithm assigns 16 vectors to node 0 because vecs_per_node
is calculated as 32 vectors / 2 nodes on the first iteration. The
subsequent iteration for node 1 gets only 8 vectors because it has only
8 CPUs, leaving 8 vectors unassigned. A fairer spread would give node 0
the remaining 8 vectors.

That optimization, however, is a bit more complex than the current
algorithm, which is probably why it wasn't done, so I think the warning
should just be removed.
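To make the arithmetic above concrete, here is a minimal user-space
sketch (not the kernel code; the per-node CPU counts are the example
values, and only the vecs_per_node / vecs_to_assign calculation from
__irq_build_affinity_masks() is mirrored):

#include <stdio.h>

/*
 * Sketch of the per-node vector split described above; not kernel code.
 * Node 0 has 32 CPUs, node 1 has 8 CPUs, and 32 vectors are requested.
 */
int main(void)
{
	unsigned int ncpus_on_node[] = { 32, 8 };
	unsigned int nodes = 2, numvecs = 32, done = 0;
	unsigned int n;

	for (n = 0; n < 2; n++, nodes--) {
		/* Remaining vectors split evenly over remaining nodes */
		unsigned int vecs_per_node = (numvecs - done) / nodes;
		unsigned int ncpus = ncpus_on_node[n];
		/* Capped by the CPUs actually present on this node */
		unsigned int vecs_to_assign =
			vecs_per_node < ncpus ? vecs_per_node : ncpus;

		printf("node %u: vecs_per_node=%u assigned=%u\n",
		       n, vecs_per_node, vecs_to_assign);
		done += vecs_to_assign;
	}
	/* Prints 16 for node 0 and 8 for node 1: 24 of 32, 8 unassigned */
	printf("total assigned: %u of %u\n", done, numvecs);
	return 0;
}

A fairer split would have to weight vecs_per_node by each node's CPU
count (or revisit node 0) so the leftover 8 vectors land on the node
that still has unused CPUs.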