From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E54D1C35640 for ; Fri, 21 Feb 2020 08:38:58 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id B7C03206ED for ; Fri, 21 Feb 2020 08:38:58 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1582274338; bh=gecPiQ7v5GjEeaOs2ggxatXWN3mW4uwxGEFNfz7DXuU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:List-ID:From; b=JmvKVCG6O+Qi7fIvILlmYjvk4n0Wi5056d2AMPGz7IM63OBA4527JdW5vyTJutttT 71z7oql16D+Lqbn4Gwwj3li8ggF5pQutKM/ezdv/PzP8jZjqEGSjHk/WDbfKD77Lu6 4SArCZVma3T0b8yEkvhdVbQPNRI67UG6ugDXvX6Q= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731274AbgBUIB5 (ORCPT ); Fri, 21 Feb 2020 03:01:57 -0500 Received: from mail.kernel.org ([198.145.29.99]:34232 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1731110AbgBUIBw (ORCPT ); Fri, 21 Feb 2020 03:01:52 -0500 Received: from localhost (83-86-89-107.cable.dynamic.v4.ziggo.nl [83.86.89.107]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id AD2EA206ED; Fri, 21 Feb 2020 08:01:51 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1582272112; bh=gecPiQ7v5GjEeaOs2ggxatXWN3mW4uwxGEFNfz7DXuU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=AU5B0i29iHqC7tzA3CbDIhW7pUeyjOql/Ft9Ur5lvlNUMlHsO79iNI72d4Dymb84N AnelOaoMGLadG65rlOYc2vx/vv1GsnHUNDMO0W1mpAn2TXgrHSW++3wohRkIk2oerg R/3M8BZVM7OyLcIBgNVVo2ykMlIu7Iudi7o2rDWU= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, "Paul E. McKenney" , "Peter Zijlstra (Intel)" , Sasha Levin , Tejun Heo Subject: [PATCH 5.4 020/344] cpu/hotplug, stop_machine: Fix stop_machine vs hotplug order Date: Fri, 21 Feb 2020 08:36:59 +0100 Message-Id: <20200221072351.043868164@linuxfoundation.org> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200221072349.335551332@linuxfoundation.org> References: <20200221072349.335551332@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Peter Zijlstra [ Upstream commit 45178ac0cea853fe0e405bf11e101bdebea57b15 ] Paul reported a very sporadic, rcutorture induced, workqueue failure. When the planets align, the workqueue rescuer's self-migrate fails and then triggers a WARN for running a work on the wrong CPU. Tejun then figured that set_cpus_allowed_ptr()'s stop_one_cpu() call could be ignored! When stopper->enabled is false, stop_machine will insta complete the work, without actually doing the work. Worse, it will not WARN about this (we really should fix this). It turns out there is a small window where a freshly online'ed CPU is marked 'online' but doesn't yet have the stopper task running: BP AP bringup_cpu() __cpu_up(cpu, idle) --> start_secondary() ... cpu_startup_entry() bringup_wait_for_ap() wait_for_ap_thread() <-- cpuhp_online_idle() while (1) do_idle() ... available to run kthreads ... stop_machine_unpark() stopper->enable = true; Close this by moving the stop_machine_unpark() into cpuhp_online_idle(), such that the stopper thread is ready before we start the idle loop and schedule. Reported-by: "Paul E. McKenney" Debugged-by: Tejun Heo Signed-off-by: Peter Zijlstra (Intel) Tested-by: "Paul E. McKenney" Signed-off-by: Sasha Levin --- kernel/cpu.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/kernel/cpu.c b/kernel/cpu.c index 116825437cd61..406828fb30388 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -525,8 +525,7 @@ static int bringup_wait_for_ap(unsigned int cpu) if (WARN_ON_ONCE((!cpu_online(cpu)))) return -ECANCELED; - /* Unpark the stopper thread and the hotplug thread of the target cpu */ - stop_machine_unpark(cpu); + /* Unpark the hotplug thread of the target cpu */ kthread_unpark(st->thread); /* @@ -1089,8 +1088,8 @@ void notify_cpu_starting(unsigned int cpu) /* * Called from the idle task. Wake up the controlling task which brings the - * stopper and the hotplug thread of the upcoming CPU up and then delegates - * the rest of the online bringup to the hotplug thread. + * hotplug thread of the upcoming CPU up and then delegates the rest of the + * online bringup to the hotplug thread. */ void cpuhp_online_idle(enum cpuhp_state state) { @@ -1100,6 +1099,12 @@ void cpuhp_online_idle(enum cpuhp_state state) if (state != CPUHP_AP_ONLINE_IDLE) return; + /* + * Unpart the stopper thread before we start the idle loop (and start + * scheduling); this ensures the stopper task is always available. + */ + stop_machine_unpark(smp_processor_id()); + st->state = CPUHP_AP_ONLINE_IDLE; complete_ap_thread(st, true); } -- 2.20.1