From: Cédric Le Goater
To: Michael Ellerman
Cc: Laurent Vivier, Srikar Dronamraju, Geetika Moolchandani, stable@vger.kernel.org, David Gibson, Kairui Song
Subject: Re: [PATCH] powerpc/xive: Do not skip CPU-less nodes when creating the IPIs
Date: Mon, 2 Aug 2021 14:59:01 +0200
Message-ID: <1facebb0-3ac1-18cc-e473-46120a5ef4ad@kaod.org>
In-Reply-To: <87a6m0l5by.fsf@mpe.ellerman.id.au>
References: <20210629131542.743888-1-clg@kaod.org> <87a6m0l5by.fsf@mpe.ellerman.id.au>
List-Id: Linux on PowerPC Developers Mail List

On 8/2/21 8:37 AM, Michael Ellerman wrote:
> Cédric Le Goater writes:
>> On PowerVM, CPU-less nodes can be populated with hot-plugged CPUs at
>> runtime. Today, the IPI is not created for such nodes, and hot-plugged
>> CPUs use a bogus IPI, which leads to soft lockups.
>>
>> We could create the node IPI on demand but it is a bit complex because
>> this code would be called under bringup_up() and some IRQ locking is
>> being done. The simplest solution is to create the IPIs for all nodes
>> at startup.
>>
>> Fixes: 7dcc37b3eff9 ("powerpc/xive: Map one IPI interrupt per node")
>> Cc: stable@vger.kernel.org # v5.13
>> Reported-by: Geetika Moolchandani
>> Cc: Srikar Dronamraju
>> Signed-off-by: Cédric Le Goater
>> ---
>>
>> This patch breaks old versions of irqbalance (<= v1.4).
>> Possible nodes are collected from /sys/devices/system/node/ but
>> CPU-less nodes are not listed there. When interrupts are scanned,
>> the link representing the node structure is NULL and segfault occurs.
>
> Breaking userspace is usually frowned upon, even if it is irqbalance.
>
> If CPU-less nodes appeared in /sys/devices/system/node would that fix
> it? Could we do that or is that not possible for other reasons?
>
>> Version 1.7 seems immune.
>
> Which was released in August 2020.
>
> Looks like some distros still ship 1.6, I take it you're not sure if
> that is broken or not.

I did a bisect on irqbalance and the "bad" commit was introduced between
version 1.7 and version 1.8:

  commit 31dea01f3a47 ("Also fetch node info for non-PCI devices")
  https://github.com/Irqbalance/irqbalance/commit/31dea01f3a47aa6374560638486879e5129f9c94

which was backported to RHEL 8 in RPM irqbalance-1.4.0-6.el8.

Any distro using irqbalance <= 1.7 without the patch above is fine.

Since irqbalance cleanly handled IRQs referencing offline nodes before
that commit, I am inclined to think that the irqbalance fix is incomplete.
Unfortunately, the commit log lacks some context on the non-PCI devices.

Thanks,

C.