Subject: Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs
From: Nitesh Narayan Lal <nitesh@redhat.com>
To: Thomas Gleixner, Marcelo Tosatti
Cc: Robin Murphy, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, frederic@kernel.org, juri.lelli@redhat.com, abelits@marvell.com, bhelgaas@google.com, linux-pci@vger.kernel.org, rostedt@goodmis.org, mingo@kernel.org, peterz@infradead.org, davem@davemloft.net, akpm@linux-foundation.org, sfr@canb.auug.org.au, stephen@networkplumber.org, rppt@linux.vnet.ibm.com, jinyuqi@huawei.com, zhangshaokun@hisilicon.com
Date: Sat, 6 Feb 2021 19:43:30 -0500
On 2/5/21 5:23 PM, Thomas Gleixner wrote:
> On Thu, Feb 04 2021 at 14:17, Nitesh Narayan Lal wrote:
>> On 2/4/21 2:06 PM, Marcelo Tosatti wrote:
>>>>> How about adding a new flag for isolcpus instead?
>>>>>
>>>> Do you mean a flag based on which we can switch the affinity mask to
>>>> housekeeping for all the devices at the time of IRQ distribution?
>>> Yes, a new flag for isolcpus. HK_FLAG_IRQ_SPREAD or some better name.
>> Does sound like a nice idea to explore, let's see what Thomas thinks about it.
> I just read back up on that whole discussion and stared into the usage
> sites a bit.
>
> There are a couple of issues here in a larger picture. Looking at it
> from the device side first:
>
> The spreading is done for non-managed queues/interrupts, which makes them
> movable by user space. So it could be argued from both sides that the
> damage done by allowing the full online mask or by allowing only the
> housekeeping mask can be fixed up by user space.
>
> But that's the trivial part of the problem. The real problem is CPU
> hotplug and offline CPUs and the way interrupts are set up for their
> initial affinity.
>
> As Robin noticed, the change in 1abdfe706a57 ("lib: Restrict
> cpumask_local_spread to houskeeping CPUs") is broken as it can return
> offline CPUs in both the NOHZ_FULL and the !NOHZ_FULL case.

A quick question here: is there any reason why we don't use
cpu_online_mask instead of cpu_possible_mask in housekeeping_cpumask()?
(not for this particular patch, but in general)

> The original code is racy vs. hotplug unless the callers block hotplug.
>
> Let's look at all the callers and what they do with it.
>
>   cptvf_set_irq_affinity()        affinity hint
>   safexcel_request_ring_irq()     affinity hint
>   mv_cesa_probe()                 affinity hint
>   bnxt_request_irq()              affinity hint
>   nicvf_set_irq_affinity()        affinity hint
>   cxgb4_set_msix_aff()            affinity hint
>   enic_init_affinity_hint()       affinity hint
>   iavf_request_traffic_irqs()     affinity hint
>   ionic_alloc_qcq_interrupt()     affinity hint
>   efx_set_interrupt_affinity()    affinity hint
>   i40e_vsi_request_irq_msix()     affinity hint
>
>   be_evt_queues_create()          affinity hint, queue affinity
>   hns3_nic_set_cpumask()          affinity hint, queue affinity
>   mlx4_en_init_affinity_hint()    affinity hint, queue affinity
>   mlx4_en_create_tx_ring()        affinity hint, queue affinity
>   set_comp_irq_affinity_hint()    affinity hint, queue affinity
>   i40e_config_xps_tx_ring()       affinity hint, queue affinity
>
>   hclge_configure()               affinity hint, queue affinity, workqueue selection
>
>   ixgbe_alloc_q_vector()          node selection, affinity hint, queue affinity
>
> None of them care about disabling hotplug. Taking cpus_read_lock()
> inside of that spread function would not solve anything because once the
> lock is dropped the CPU can go away.
>
> There are 3 classes of this:
>
>  1) Does not matter: affinity hint only.
>
>  2) Might fail to set up the network queue when the selected CPU
>     is offline.
>
>  3) Broken: the hclge driver, which uses the CPU to schedule work on
>     that CPU. That's broken, but unfortunately neither the workqueue
>     code nor the timer code will ever notice. The work just won't be
>     scheduled until the CPU comes online again, which might be never.

Agreed.
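Just to make sure I am reading class 3) correctly, a minimal sketch of the
broken pattern (loosely modeled on the hclge case; the foo_priv/wq/service_task
names are made up for illustration):

	/* Hypothetical driver init path, no hotplug lock held anywhere. */
	static void foo_sched_service_task(struct foo_priv *priv, int i)
	{
		/* Pick a "local" CPU for queue i. */
		int cpu = cpumask_local_spread(i, dev_to_node(priv->dev));

		/*
		 * The CPU picked above may go (or already be) offline
		 * before the work is queued. queue_work_on() will still
		 * accept it, and per the above the work then silently
		 * never runs until the CPU comes back, which might be
		 * never.
		 */
		queue_work_on(cpu, priv->wq, &priv->service_task);
	}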
> But looking at the above I really have to ask the question what the
> commit in question is actually trying to solve.
>
> AFAICT, nothing at all. Why?
>
>  1) The majority of the drivers set the hint _after_ requesting the
>     interrupt.
>
>  2) Even if it is set _before_ requesting the interrupt, it does not
>     solve anything because it's a hint and the interrupt core code does
>     not care about it at all. It provides the storage and the procfs
>     interface, nothing else.
>
> So how does that prevent the interrupt subsystem from assigning an
> interrupt to an isolated CPU? Not at all.
>
> Interrupts which are freshly allocated get the default interrupt
> affinity mask, which is either set on the command line or via /proc. The
> affinity of the interrupt can be changed after it has been populated in
> /proc.
>
> When the interrupt is requested, one of the online CPUs in its
> affinity mask is chosen.
>
> X86 is special here because this also requires that there are free
> vectors on one of the online CPUs in the mask. If the CPUs in the
> affinity mask run out of vectors, it will grab a vector from some
> other CPU which might be an isolated CPU.
>
> When the affinity mask of the interrupt at the time it is actually
> requested contains an isolated CPU, then nothing prevents the kernel from
> steering it at an isolated CPU. But that has absolutely nothing to do
> with that spreading thingy.
>
> The only difference which this change makes is the fact that the
> affinity hint changes. Nothing else.

Thanks for the detailed explanation.

Before I posted this patch, I was doing some debugging on a setup where I
was observing latency issues caused by iavf IRQs that were pinned on the
isolated CPUs.

Based on some initial traces I had the impression that the affinity hint
or cpumask_local_spread was somehow playing a role in deciding the
affinity mask of these IRQs, although that does look incorrect after
going through your explanation.

For some reason, with a kernel that had this patch, the iavf IRQs always
ended up on the HK CPUs when I created VFs. The reason for this is still
not clear to me, so I will investigate further to properly understand
this behavior.
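To start with, I will probably just dump which CPUs the iavf vectors
actually land on right after they are requested; a rough debug hack along
these lines (not the exact iavf code, recalled from memory):

	/* In the iavf vector setup loop, after request_irq() succeeds,
	 * print the affinity mask the interrupt core actually assigned. */
	err = request_irq(irq, iavf_msix_clean_rings, 0,
			  q_vector->name, q_vector);
	if (!err)
		pr_info("irq %d affinity %*pbl\n", irq,
			cpumask_pr_args(irq_get_affinity_mask(irq)));

That should make it obvious whether the default affinity mask or something
else is steering these vectors to the HK CPUs.

--
Nitesh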