From: Frederic Weisbecker
To: "Paul E. McKenney"
Cc: LKML, Frederic Weisbecker, Peter Zijlstra, Marco Elver, Neeraj Upadhyay,
    Valentin Schneider, Boqun Feng, Uladzislau Rezki, Joel Fernandes
Subject: [PATCH 1/3] rcu: Fix expedited GP polling against UP/no-preempt environment
Date: Mon, 14 Mar 2022 14:37:36 +0100
Message-Id: <20220314133738.269522-2-frederic@kernel.org>
In-Reply-To: <20220314133738.269522-1-frederic@kernel.org>
References: <20220314133738.269522-1-frederic@kernel.org>

synchronize_rcu_expedited() has an early return condition: if the
current CPU is the only one online and the kernel doesn't run in
preemption mode, the quiescent state assumed for the caller is worth a
full grace period.

However, the expedited grace-period polling code calls
synchronize_rcu_expedited() after taking a GP sequence snapshot and
expects that snapshot to have completed by the end of the call. If
synchronize_rcu_expedited() takes the above-described UP/no-preempt
early return instead, the grace-period sequence doesn't move and the
expedited grace-period poller may stall.

Fix this by treating the polling callers of synchronize_rcu_expedited()
differently and ignoring the UP/no-preempt optimization in that case.
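For illustration, the stall can be modeled in a minimal user-space
sketch of the sequence-snapshot logic (the counter arithmetic is
deliberately simplified compared to the rcu_seq_*() helpers in
kernel/rcu/rcu.h, and all function names below are placeholders, not
kernel APIs):

#include <stdbool.h>
#include <stdio.h>

/* Models rcu_state.expedited_sequence (simplified: one GP == +1). */
static unsigned long exp_sequence;

/* Snapshot whose completion guarantees a full GP after this call. */
static unsigned long gp_snap(void)
{
	return exp_sequence + 1;
}

/* Has the sequence counter reached the snapshot? */
static bool gp_done(unsigned long snap)
{
	return (long)(exp_sequence - snap) >= 0;
}

/*
 * The UP/no-preempt early return: nothing to wait for, so the call
 * returns without touching the sequence counter.
 */
static void sync_exp_early_return(void)
{
}

/* The full expedited path always advances the sequence counter. */
static void sync_exp_full_gp(void)
{
	exp_sequence++;
}

int main(void)
{
	unsigned long snap = gp_snap();

	/*
	 * A poller spinning on gp_done(snap) stalls if only the early
	 * return path runs: the counter never moves.
	 */
	sync_exp_early_return();
	printf("after early return: done=%d\n", gp_done(snap));	/* 0 */

	/* Forcing the real expedited path (polling == true) completes it. */
	sync_exp_full_gp();
	printf("after full GP:      done=%d\n", gp_done(snap));	/* 1 */

	return 0;
}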
Reported-by: Paul E. McKenney
Signed-off-by: Frederic Weisbecker
Cc: Uladzislau Rezki
Cc: Joel Fernandes
Cc: Boqun Feng
Cc: Peter Zijlstra
Cc: Neeraj Upadhyay
Cc: Valentin Schneider
---
 kernel/rcu/tree_exp.h | 57 +++++++++++++++++++++++++++++++++++++++++----------------------
 1 file changed, 35 insertions(+), 22 deletions(-)

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index d5f30085b0cf..3d8216ced93e 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -794,27 +794,14 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
 
 #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
 
-/**
- * synchronize_rcu_expedited - Brute-force RCU grace period
- *
- * Wait for an RCU grace period, but expedite it.  The basic idea is to
- * IPI all non-idle non-nohz online CPUs.  The IPI handler checks whether
- * the CPU is in an RCU critical section, and if so, it sets a flag that
- * causes the outermost rcu_read_unlock() to report the quiescent state
- * for RCU-preempt or asks the scheduler for help for RCU-sched.  On the
- * other hand, if the CPU is not in an RCU read-side critical section,
- * the IPI handler reports the quiescent state immediately.
- *
- * Although this is a great improvement over previous expedited
- * implementations, it is still unfriendly to real-time workloads, so is
- * thus not recommended for any sort of common-case code.  In fact, if
- * you are using synchronize_rcu_expedited() in a loop, please restructure
- * your code to batch your updates, and then use a single synchronize_rcu()
- * instead.
- *
- * This has the same semantics as (but is more brutal than) synchronize_rcu().
+/*
+ * Start and wait for an expedited grace period completion.
+ * If it happens to be called by polling functions (@polling = true),
+ * there is no possible early return in UP no-preempt mode because
+ * the callers are waiting for an actual given sequence snapshot to start
+ * and end.
  */
-void synchronize_rcu_expedited(void)
+static void __synchronize_rcu_expedited(bool polling)
 {
 	bool boottime = (rcu_scheduler_active == RCU_SCHEDULER_INIT);
 	struct rcu_exp_work rew;
@@ -827,7 +814,7 @@ void synchronize_rcu_expedited(void)
 			 "Illegal synchronize_rcu_expedited() in RCU read-side critical section");
 
 	/* Is the state is such that the call is a grace period? */
-	if (rcu_blocking_is_gp())
+	if (rcu_blocking_is_gp() && !polling)
 		return;
 
 	/* If expedited grace periods are prohibited, fall back to normal. */
@@ -863,6 +850,32 @@ void synchronize_rcu_expedited(void)
 
 	if (likely(!boottime))
 		destroy_work_on_stack(&rew.rew_work);
+
+}
+
+/**
+ * synchronize_rcu_expedited - Brute-force RCU grace period
+ *
+ * Wait for an RCU grace period, but expedite it.  The basic idea is to
+ * IPI all non-idle non-nohz online CPUs.  The IPI handler checks whether
+ * the CPU is in an RCU critical section, and if so, it sets a flag that
+ * causes the outermost rcu_read_unlock() to report the quiescent state
+ * for RCU-preempt or asks the scheduler for help for RCU-sched.  On the
+ * other hand, if the CPU is not in an RCU read-side critical section,
+ * the IPI handler reports the quiescent state immediately.
+ *
+ * Although this is a great improvement over previous expedited
+ * implementations, it is still unfriendly to real-time workloads, so is
+ * thus not recommended for any sort of common-case code.  In fact, if
+ * you are using synchronize_rcu_expedited() in a loop, please restructure
+ * your code to batch your updates, and then use a single synchronize_rcu()
+ * instead.
+ *
+ * This has the same semantics as (but is more brutal than) synchronize_rcu().
+ */
+void synchronize_rcu_expedited(void)
+{
+	__synchronize_rcu_expedited(false);
 }
 EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
 
@@ -903,7 +916,7 @@ static void sync_rcu_do_polled_gp(struct work_struct *wp)
 	if (s & 0x1)
 		return;
 	while (!sync_exp_work_done(s))
-		synchronize_rcu_expedited();
+		__synchronize_rcu_expedited(true);
 	raw_spin_lock_irqsave(&rnp->exp_poll_lock, flags);
 	s = rnp->exp_seq_poll_rq;
 	if (!(s & 0x1) && !sync_exp_work_done(s))
-- 
2.25.1