From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.8 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 17BE0C31E5B for ; Tue, 18 Jun 2019 17:40:27 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E555C20863 for ; Tue, 18 Jun 2019 17:40:26 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="jafNvHZS" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E555C20863 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=arm.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+infradead-linux-arm-kernel=archiver.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20170209; h=Sender: Content-Transfer-Encoding:Content-Type:Cc:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:Date: Message-ID:From:References:To:Subject:Reply-To:Content-ID:Content-Description :Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=T8rklq80l3bH6GMKqVzcn49br5NfMY95CzqfEfn+EsA=; b=jafNvHZSKVlJOA gskrm65mSBQ4z88hukhEh27it8GnMMgpoFMz+2ZlcZLjgJz2bwPGrLCnIR4OfEAF75GJB1pDinhZ7 3uKTvT6t3rkYecWvUDLxd150+HMR4+QRYbvhYeTnzQgbT0RUHfw1bN5GiuLWbX7LyQnRCmLfrxXke 7tY2QCqPaHEWBrtFWFkSBIxNHI4ZD1YQOtuEG8Srgsx0hBEIboiQa9rLHOxARCBmh6twj7AyBC2fU 6dG0a3WU5THc5Zi6fCb4KmaaRR0pwr5GA5WhZhek4jrllxVUQ2nVJg4c+KFTZZY83GnxvHqtvg3xH lT4a526mkY0MoGPEbLUg==; Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.92 #3 (Red Hat Linux)) id 1hdI5m-00005t-Ii; Tue, 18 Jun 2019 17:40:22 +0000 Received: from foss.arm.com ([217.140.110.172]) by bombadil.infradead.org with esmtp (Exim 4.92 #3 (Red Hat Linux)) id 1hdI5j-00005R-4X for linux-arm-kernel@lists.infradead.org; Tue, 18 Jun 2019 17:40:20 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 3B4BF344; Tue, 18 Jun 2019 10:40:18 -0700 (PDT) Received: from [10.1.197.50] (e120937-lin.cambridge.arm.com [10.1.197.50]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 5017E3F738; Tue, 18 Jun 2019 10:40:17 -0700 (PDT) Subject: Re: [PATCH] arm64: smp: fix smp_send_stop() behaviour To: Russell King - ARM Linux admin , Will Deacon References: <20190613122146.45459-1-cristian.marussi@arm.com> <20190617180913.GN30800@fuggles.cambridge.arm.com> <20190618125413.7la4mg3mojfshw6n@shell.armlinux.org.uk> From: Cristian Marussi Message-ID: <7100ad3c-9b64-9512-e513-5bf389002c30@arm.com> Date: Tue, 18 Jun 2019 18:40:16 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.7.0 MIME-Version: 1.0 In-Reply-To: <20190618125413.7la4mg3mojfshw6n@shell.armlinux.org.uk> Content-Language: en-US X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20190618_104019_266207_A063E762 X-CRM114-Status: GOOD ( 20.44 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: mark.rutland@arm.com, catalin.marinas@arm.com, james.morse@arm.com, dave.martin@arm.com, linux-arm-kernel@lists.infradead.org Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+infradead-linux-arm-kernel=archiver.kernel.org@lists.infradead.org Hi On 18/06/2019 13:54, Russell King - ARM Linux admin wrote: > On Mon, Jun 17, 2019 at 07:09:13PM +0100, Will Deacon wrote: >> [+James M] >> >> On Thu, Jun 13, 2019 at 01:21:46PM +0100, Cristian Marussi wrote: >>> On a 2-CPUs system, when one CPU is already online if the other >>> panics while starting-up, smp_send_stop() will fail to send any >>> STOP message to the other already online core, resulting in a >>> system still responsive and alive at the end of the panic procedure. >>> This patch makes smp_send_stop() account also for the online status >>> of the calling CPU while evaluating how many CPUs are effectively >>> online: this way, an adequate number of STOPs is sent, so enforcing >>> a proper freeze of the system at the end of panic even under the >>> above conditions. >>> >>> Reported-by: Dave Martin >>> Signed-off-by: Cristian Marussi >>> --- >>> >>> This peculiar panic-procedure behaviour was exposed hitting a BUG() >>> while running a KSFT cpu-hotplug test on a 2-core ARMv8 model. >>> Such trigger-BUG() was fixed by a distinct commit already included >>> in Linux 5.2-rc4 [0] >>> >>> [0] https://lore.kernel.org/linux-arm-kernel/1559576102-12156-1-git-send-email-Dave.Martin@arm.com/ >>> --- >>> arch/arm64/kernel/smp.c | 12 +++++++++--- >>> 1 file changed, 9 insertions(+), 3 deletions(-) >>> >>> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c >>> index bb4b3f07761a..c7d604427883 100644 >>> --- a/arch/arm64/kernel/smp.c >>> +++ b/arch/arm64/kernel/smp.c >>> @@ -971,8 +971,14 @@ void tick_broadcast(const struct cpumask *mask) >>> void smp_send_stop(void) >>> { >>> unsigned long timeout; >>> + unsigned int this_cpu_online = cpu_online(smp_processor_id()); >>> >>> - if (num_online_cpus() > 1) { >>> + /* >>> + * If this CPU isn't fully online, it will not be counted in >>> + * num_online_cpus(): on a 2-CPU system this situation will >>> + * result in no message being sent to the other already online CPU. >>> + */ >>> + if (num_online_cpus() > this_cpu_online) { >>> cpumask_t mask; >>> >>> cpumask_copy(&mask, cpu_online_mask); >>> @@ -985,10 +991,10 @@ void smp_send_stop(void) >>> >>> /* Wait up to one second for other CPUs to stop */ >>> timeout = USEC_PER_SEC; >>> - while (num_online_cpus() > 1 && timeout--) >>> + while (num_online_cpus() > this_cpu_online && timeout--) >>> udelay(1); >>> >>> - if (num_online_cpus() > 1) >>> + if (num_online_cpus() > this_cpu_online) >>> pr_warning("SMP: failed to stop secondary CPUs %*pbl\n", >>> cpumask_pr_args(cpu_online_mask)); >> >> Whilst this looks ok to me, I'm worried about whether or not we have this >> sort of logic elsewhere. For example, do we need to fix >> crash_smp_send_stop() (and possibly machine_kexec()) too? > > What about other architectures? This, or very similar code, is present > on other architectures too. > Thanks for the review, indeed glancing quickly at other arch's, there is a lot of common things and subtle differences in handling these situations...potentially some common logic to fix once for all and abstract out of arch specific code...(or at least try to do it)....so now a question arises to me: is it still worth to properly fix on arm64 at first extending this patch (easier and quicker maybe) or should I opt directly for the abstraction party ? Thanks Cristian _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel