From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Sat, 16 Nov 2019 07:48:21 -0800
From: "Paul E. McKenney"
To: Lai Jiangshan
Cc: linux-kernel@vger.kernel.org, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, "H. Peter Anvin", x86@kernel.org,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Andi Kleen, Andy Lutomirski, Fenghua Yu, Kees Cook,
	"Rafael J. Wysocki", Sebastian Andrzej Siewior, Dave Hansen,
	Babu Moger, Rik van Riel, "Chang S. Bae", Jann Horn, David Windsor,
	Elena Reshetova, Yuyang Du, Anshuman Khandual, Richard Guy Briggs,
	Andrew Morton, Christian Brauner, Michal Hocko, Andrea Arcangeli,
	Al Viro,
Levin" , rcu@vger.kernel.org Subject: Re: [PATCH V2 7/7] x86,rcu: use percpu rcu_preempt_depth Message-ID: <20191116154821.GE2865@paulmck-ThinkPad-P72> Reply-To: paulmck@kernel.org References: <20191102124559.1135-1-laijs@linux.alibaba.com> <20191102124559.1135-8-laijs@linux.alibaba.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20191102124559.1135-8-laijs@linux.alibaba.com> User-Agent: Mutt/1.9.4 (2018-02-28) Sender: rcu-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org On Sat, Nov 02, 2019 at 12:45:59PM +0000, Lai Jiangshan wrote: > Convert x86 to use a per-cpu rcu_preempt_depth. The reason for doing so > is that accessing per-cpu variables is a lot cheaper than accessing > task_struct or thread_info variables. > > We need to save/restore the actual rcu_preempt_depth when switch. > We also place the per-cpu rcu_preempt_depth close to __preempt_count > and current_task variable. > > Using the idea of per-cpu __preempt_count. > > No function call when using rcu_read_[un]lock(). > Single instruction for rcu_read_lock(). > 2 instructions for fast path of rcu_read_unlock(). > > CC: Peter Zijlstra > Signed-off-by: Lai Jiangshan Wouldn't putting RCU's nesting-depth counter in task info be just as fast, just as nice for #include/inlining, and a lot less per-architecture code? Or am I missing some issue with the task-info approach? Thanx, Paul > --- > arch/x86/Kconfig | 2 + > arch/x86/include/asm/rcu_preempt_depth.h | 87 ++++++++++++++++++++++++ > arch/x86/kernel/cpu/common.c | 7 ++ > arch/x86/kernel/process_32.c | 2 + > arch/x86/kernel/process_64.c | 2 + > include/linux/rcupdate.h | 24 +++++++ > init/init_task.c | 2 +- > kernel/fork.c | 2 +- > kernel/rcu/Kconfig | 3 + > kernel/rcu/tree_exp.h | 2 + > kernel/rcu/tree_plugin.h | 37 +++++++--- > 11 files changed, 158 insertions(+), 12 deletions(-) > create mode 100644 arch/x86/include/asm/rcu_preempt_depth.h > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index d6e1faa28c58..af9fedc0fdc4 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -18,6 +18,7 @@ config X86_32 > select MODULES_USE_ELF_REL > select OLD_SIGACTION > select GENERIC_VDSO_32 > + select ARCH_HAVE_RCU_PREEMPT_DEEPTH > > config X86_64 > def_bool y > @@ -31,6 +32,7 @@ config X86_64 > select NEED_DMA_MAP_STATE > select SWIOTLB > select ARCH_HAS_SYSCALL_WRAPPER > + select ARCH_HAVE_RCU_PREEMPT_DEEPTH > > config FORCE_DYNAMIC_FTRACE > def_bool y > diff --git a/arch/x86/include/asm/rcu_preempt_depth.h b/arch/x86/include/asm/rcu_preempt_depth.h > new file mode 100644 > index 000000000000..88010ad59c20 > --- /dev/null > +++ b/arch/x86/include/asm/rcu_preempt_depth.h > @@ -0,0 +1,87 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > +#ifndef __ASM_RCU_PREEMPT_DEPTH_H > +#define __ASM_RCU_PREEMPT_DEPTH_H > + > +#include > +#include > + > +#ifdef CONFIG_PREEMPT_RCU > +DECLARE_PER_CPU(int, __rcu_preempt_depth); > + > +/* > + * We use the RCU_NEED_SPECIAL bit as an inverted need_special > + * such that a decrement hitting 0 means we can and should do > + * rcu_read_unlock_special(). 
> + */
> +#define RCU_NEED_SPECIAL 0x80000000
> +
> +#define INIT_RCU_PREEMPT_DEPTH (RCU_NEED_SPECIAL)
> +
> +/* We mask the RCU_NEED_SPECIAL bit so that it returns the real depth */
> +static __always_inline int rcu_preempt_depth(void)
> +{
> +	return raw_cpu_read_4(__rcu_preempt_depth) & ~RCU_NEED_SPECIAL;
> +}
> +
> +static __always_inline void rcu_preempt_depth_set(int pc)
> +{
> +	int old, new;
> +
> +	do {
> +		old = raw_cpu_read_4(__rcu_preempt_depth);
> +		new = (old & RCU_NEED_SPECIAL) |
> +			(pc & ~RCU_NEED_SPECIAL);
> +	} while (raw_cpu_cmpxchg_4(__rcu_preempt_depth, old, new) != old);
> +}
> +
> +/*
> + * We fold the RCU_NEED_SPECIAL bit into the rcu_preempt_depth such that
> + * rcu_read_unlock() can decrement and test for needing to do special
> + * work with a single instruction.
> + *
> + * We invert the actual bit, so that when the decrement hits 0 we know
> + * both that we just exited the outermost rcu_read_lock() critical section
> + * and that we need to do the special work (the bit is cleared), unless it
> + * needs to be deferred.
> + */
> +
> +static inline void set_rcu_preempt_need_special(void)
> +{
> +	raw_cpu_and_4(__rcu_preempt_depth, ~RCU_NEED_SPECIAL);
> +}
> +
> +/*
> + * irqs need to be disabled when clearing any bits of ->rcu_read_unlock_special
> + * and when calling this function.  Otherwise it may clear the work done
> + * by set_rcu_preempt_need_special() in an interrupt.
> + */
> +static inline void clear_rcu_preempt_need_special(void)
> +{
> +	raw_cpu_or_4(__rcu_preempt_depth, RCU_NEED_SPECIAL);
> +}
> +
> +static __always_inline void rcu_preempt_depth_inc(void)
> +{
> +	raw_cpu_add_4(__rcu_preempt_depth, 1);
> +}
> +
> +static __always_inline bool rcu_preempt_depth_dec_and_test(void)
> +{
> +	return GEN_UNARY_RMWcc("decl", __rcu_preempt_depth, e, __percpu_arg([var]));
> +}
> +
> +/* must be macros to avoid header recursion hell */
> +#define save_restore_rcu_preempt_depth(prev_p, next_p) do {				\
> +		prev_p->rcu_read_lock_nesting = this_cpu_read(__rcu_preempt_depth);	\
> +		this_cpu_write(__rcu_preempt_depth, next_p->rcu_read_lock_nesting);	\
> +	} while (0)
> +
> +#define DEFINE_PERCPU_RCU_PREEMP_DEPTH						\
> +	DEFINE_PER_CPU(int, __rcu_preempt_depth) = INIT_RCU_PREEMPT_DEPTH;	\
> +	EXPORT_PER_CPU_SYMBOL(__rcu_preempt_depth)
> +#else /* #ifdef CONFIG_PREEMPT_RCU */
> +#define save_restore_rcu_preempt_depth(prev_p, next_p) do {} while (0)
> +#define DEFINE_PERCPU_RCU_PREEMP_DEPTH	/* empty */
> +#endif /* #else #ifdef CONFIG_PREEMPT_RCU */
> +
> +#endif /* __ASM_RCU_PREEMPT_DEPTH_H */
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 9ae7d1bcd4f4..0151737e196c 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -46,6 +46,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -1633,6 +1634,9 @@ DEFINE_PER_CPU(unsigned int, irq_count) __visible = -1;
>  DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT;
>  EXPORT_PER_CPU_SYMBOL(__preempt_count);
>  
> +/* close to __preempt_count */
> +DEFINE_PERCPU_RCU_PREEMP_DEPTH;
> +
>  /* May not be marked __init: used by software suspend */
>  void syscall_init(void)
>  {
> @@ -1690,6 +1694,9 @@ EXPORT_PER_CPU_SYMBOL(current_task);
>  DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT;
>  EXPORT_PER_CPU_SYMBOL(__preempt_count);
>  
> +/* close to __preempt_count */
> +DEFINE_PERCPU_RCU_PREEMP_DEPTH;
> +
>  /*
>   * On x86_32, vm86 modifies tss.sp0, so sp0 isn't a reliable way to find
>   * the top of the kernel stack.  Use an extra percpu variable to track the
> diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
> index b8ceec4974fe..ab1f20353663 100644
> --- a/arch/x86/kernel/process_32.c
> +++ b/arch/x86/kernel/process_32.c
> @@ -51,6 +51,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -290,6 +291,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
>  	if (prev->gs | next->gs)
>  		lazy_load_gs(next->gs);
>  
> +	save_restore_rcu_preempt_depth(prev_p, next_p);
>  	this_cpu_write(current_task, next_p);
>  
>  	switch_fpu_finish(next_fpu);
> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> index af64519b2695..2e1c6e829d30 100644
> --- a/arch/x86/kernel/process_64.c
> +++ b/arch/x86/kernel/process_64.c
> @@ -50,6 +50,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -559,6 +560,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
>  
>  	x86_fsgsbase_load(prev, next);
>  
> +	save_restore_rcu_preempt_depth(prev_p, next_p);
>  	/*
>  	 * Switch the PDA and FPU contexts.
>  	 */
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index a35daab95d14..0d2abf08b694 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -41,6 +41,29 @@ void synchronize_rcu(void);
>  
>  #ifdef CONFIG_PREEMPT_RCU
>  
> +#ifdef CONFIG_ARCH_HAVE_RCU_PREEMPT_DEEPTH
> +#include
> +
> +#ifndef CONFIG_PROVE_LOCKING
> +extern void rcu_read_unlock_special(void);
> +
> +static inline void __rcu_read_lock(void)
> +{
> +	rcu_preempt_depth_inc();
> +}
> +
> +static inline void __rcu_read_unlock(void)
> +{
> +	if (unlikely(rcu_preempt_depth_dec_and_test()))
> +		rcu_read_unlock_special();
> +}
> +#else
> +void __rcu_read_lock(void);
> +void __rcu_read_unlock(void);
> +#endif
> +
> +#else /* #ifdef CONFIG_ARCH_HAVE_RCU_PREEMPT_DEEPTH */
> +#define INIT_RCU_PREEMPT_DEPTH (0)
>  void __rcu_read_lock(void);
>  void __rcu_read_unlock(void);
>  
> @@ -51,6 +74,7 @@ void __rcu_read_unlock(void);
>   * types of kernel builds, the rcu_read_lock() nesting depth is unknowable.
>   */
>  #define rcu_preempt_depth() (current->rcu_read_lock_nesting)
> +#endif /* #else #ifdef CONFIG_ARCH_HAVE_RCU_PREEMPT_DEEPTH */
>  
>  #else /* #ifdef CONFIG_PREEMPT_RCU */
>  
> diff --git a/init/init_task.c b/init/init_task.c
> index 9e5cbe5eab7b..0a91e38fba37 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -130,7 +130,7 @@ struct task_struct init_task
>  	.perf_event_list = LIST_HEAD_INIT(init_task.perf_event_list),
>  #endif
>  #ifdef CONFIG_PREEMPT_RCU
> -	.rcu_read_lock_nesting = 0,
> +	.rcu_read_lock_nesting = INIT_RCU_PREEMPT_DEPTH,
>  	.rcu_read_unlock_special.s = 0,
>  	.rcu_node_entry = LIST_HEAD_INIT(init_task.rcu_node_entry),
>  	.rcu_blocked_node = NULL,
> diff --git a/kernel/fork.c b/kernel/fork.c
> index f9572f416126..7368d4ccb857 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1665,7 +1665,7 @@ init_task_pid(struct task_struct *task, enum pid_type type, struct pid *pid)
>  static inline void rcu_copy_process(struct task_struct *p)
>  {
>  #ifdef CONFIG_PREEMPT_RCU
> -	p->rcu_read_lock_nesting = 0;
> +	p->rcu_read_lock_nesting = INIT_RCU_PREEMPT_DEPTH;
>  	p->rcu_read_unlock_special.s = 0;
>  	p->rcu_blocked_node = NULL;
>  	INIT_LIST_HEAD(&p->rcu_node_entry);
> diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
> index 1cc940fef17c..d2ecca49a1a4 100644
> --- a/kernel/rcu/Kconfig
> +++ b/kernel/rcu/Kconfig
> @@ -14,6 +14,9 @@ config TREE_RCU
>  	  thousands of CPUs.  It also scales down nicely to
>  	  smaller systems.
>  
> +config ARCH_HAVE_RCU_PREEMPT_DEEPTH
> +	def_bool n
> +
>  config PREEMPT_RCU
>  	bool
>  	default y if PREEMPTION
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index dcb2124203cf..f919881832d4 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -588,6 +588,7 @@ static void wait_rcu_exp_gp(struct work_struct *wp)
>  }
>  
>  #ifdef CONFIG_PREEMPT_RCU
> +static inline void set_rcu_preempt_need_special(void);
>  
>  /*
>   * Remote handler for smp_call_function_single().  If there is an
> @@ -637,6 +638,7 @@ static void rcu_exp_handler(void *unused)
>  		if (rnp->expmask & rdp->grpmask) {
>  			rdp->exp_deferred_qs = true;
>  			t->rcu_read_unlock_special.b.exp_hint = true;
> +			set_rcu_preempt_need_special();
>  		}
>  		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>  		return;
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index e16c3867d2ff..e6774a7ab16b 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -82,7 +82,7 @@ static void __init rcu_bootup_announce_oddness(void)
>  #ifdef CONFIG_PREEMPT_RCU
>  
>  static void rcu_report_exp_rnp(struct rcu_node *rnp, bool wake);
> -static void rcu_read_unlock_special(struct task_struct *t);
> +void rcu_read_unlock_special(void);
>  
>  /*
>   * Tell them what RCU they are running.
> @@ -298,6 +298,7 @@ void rcu_note_context_switch(bool preempt)
>  		t->rcu_read_unlock_special.b.need_qs = false;
>  		t->rcu_read_unlock_special.b.blocked = true;
>  		t->rcu_blocked_node = rnp;
> +		set_rcu_preempt_need_special();
>  
>  		/*
>  		 * Verify the CPU's sanity, trace the preemption, and
> @@ -345,6 +346,7 @@ static int rcu_preempt_blocked_readers_cgp(struct rcu_node *rnp)
>  /* Bias and limit values for ->rcu_read_lock_nesting. */
>  #define RCU_NEST_PMAX (INT_MAX / 2)
>  
> +#ifndef CONFIG_ARCH_HAVE_RCU_PREEMPT_DEEPTH
>  static inline void rcu_preempt_depth_inc(void)
>  {
>  	current->rcu_read_lock_nesting++;
> @@ -352,7 +354,12 @@ static inline void rcu_preempt_depth_inc(void)
>  
>  static inline bool rcu_preempt_depth_dec_and_test(void)
>  {
> -	return --current->rcu_read_lock_nesting == 0;
> +	if (--current->rcu_read_lock_nesting == 0) {
> +		/* check special after dec ->rcu_read_lock_nesting */
> +		barrier();
> +		return READ_ONCE(current->rcu_read_unlock_special.s);
> +	}
> +	return 0;
>  }
>  
>  static inline void rcu_preempt_depth_set(int val)
> @@ -360,6 +367,12 @@ static inline void rcu_preempt_depth_set(int val)
>  	current->rcu_read_lock_nesting = val;
>  }
>  
> +static inline void clear_rcu_preempt_need_special(void) {}
> +static inline void set_rcu_preempt_need_special(void) {}
> +
> +#endif /* #ifndef CONFIG_ARCH_HAVE_RCU_PREEMPT_DEEPTH */
> +
> +#if !defined(CONFIG_ARCH_HAVE_RCU_PREEMPT_DEEPTH) || defined (CONFIG_PROVE_LOCKING)
>  /*
>   * Preemptible RCU implementation for rcu_read_lock().
>   * Just increment ->rcu_read_lock_nesting, shared state will be updated
> @@ -383,18 +396,16 @@ EXPORT_SYMBOL_GPL(__rcu_read_lock);
>   */
>  void __rcu_read_unlock(void)
>  {
> -	struct task_struct *t = current;
> -
>  	if (rcu_preempt_depth_dec_and_test()) {
> -		barrier();  /* critical section before exit code. */
> -		if (unlikely(READ_ONCE(t->rcu_read_unlock_special.s)))
> -			rcu_read_unlock_special(t);
> +		rcu_read_unlock_special();
>  	}
>  	if (IS_ENABLED(CONFIG_PROVE_LOCKING)) {
> -		WARN_ON_ONCE(rcu_preempt_depth() < 0);
> +		WARN_ON_ONCE(rcu_preempt_depth() < 0 ||
> +			     rcu_preempt_depth() > RCU_NEST_PMAX);
>  	}
>  }
>  EXPORT_SYMBOL_GPL(__rcu_read_unlock);
> +#endif /* #if !defined(CONFIG_ARCH_HAVE_RCU_PREEMPT_DEEPTH) || defined (CONFIG_PROVE_LOCKING) */
>  
>  /*
>   * Advance a ->blkd_tasks-list pointer to the next entry, instead
> @@ -449,6 +460,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
>  		return;
>  	}
>  	t->rcu_read_unlock_special.s = 0;
> +	clear_rcu_preempt_need_special();
>  	if (special.b.need_qs)
>  		rcu_qs();
>  
> @@ -579,8 +591,9 @@ static void rcu_preempt_deferred_qs_handler(struct irq_work *iwp)
>   * notify RCU core processing or task having blocked during the RCU
>   * read-side critical section.
>   */
> -static void rcu_read_unlock_special(struct task_struct *t)
> +void rcu_read_unlock_special(void)
>  {
> +	struct task_struct *t = current;
>  	unsigned long flags;
>  	bool preempt_bh_were_disabled =
>  			!!(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK));
> @@ -631,6 +644,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
>  	}
>  	rcu_preempt_deferred_qs_irqrestore(t, flags);
>  }
> +EXPORT_SYMBOL_GPL(rcu_read_unlock_special);
>  
>  /*
>   * Check that the list of blocked tasks for the newly completed grace
> @@ -694,8 +708,10 @@ static void rcu_flavor_sched_clock_irq(int user)
>  	    __this_cpu_read(rcu_data.core_needs_qs) &&
>  	    __this_cpu_read(rcu_data.cpu_no_qs.b.norm) &&
>  	    !t->rcu_read_unlock_special.b.need_qs &&
> -	    time_after(jiffies, rcu_state.gp_start + HZ))
> +	    time_after(jiffies, rcu_state.gp_start + HZ)) {
>  		t->rcu_read_unlock_special.b.need_qs = true;
> +		set_rcu_preempt_need_special();
> +	}
>  }
>  
>  /*
> @@ -714,6 +730,7 @@ void exit_rcu(void)
>  		rcu_preempt_depth_set(1);
>  		barrier();
>  		WRITE_ONCE(t->rcu_read_unlock_special.b.blocked, true);
> +		set_rcu_preempt_need_special();
>  	} else if (unlikely(rcu_preempt_depth())) {
>  		rcu_preempt_depth_set(1);
>  	} else {
> -- 
> 2.20.1
> 
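
The counting trick the quoted patch relies on can be seen in isolation:
the NEED_SPECIAL bit is kept inverted in the top bit of the nesting
counter, so a plain decrement reaches zero only when the outermost
reader exits while special processing is pending.  Below is a
standalone, single-threaded userspace sketch of that scheme; the names
(read_lock(), set_need_special(), and so on) are illustrative
stand-ins, not the kernel's real accessors.

/*
 * Standalone illustration of folding an inverted NEED_SPECIAL bit into
 * a read-side nesting counter.  Illustrative only -- not kernel code.
 */
#include <stdbool.h>
#include <stdio.h>

#define RCU_NEED_SPECIAL 0x80000000u

/* Bit set means "no special work needed"; no readers yet. */
static unsigned int depth = RCU_NEED_SPECIAL;	/* like INIT_RCU_PREEMPT_DEPTH */

static void set_need_special(void)   { depth &= ~RCU_NEED_SPECIAL; }
static void clear_need_special(void) { depth |= RCU_NEED_SPECIAL; }

static void read_lock(void)          { depth++; }	/* one increment */

/*
 * One decrement-and-test: reaches zero only when the outermost reader
 * exits *and* the (inverted) NEED_SPECIAL bit has been cleared.
 */
static bool read_unlock_needs_special(void)
{
	return --depth == 0;
}

int main(void)
{
	read_lock();
	read_lock();
	set_need_special();	/* e.g. the reader was preempted */
	printf("%d\n", read_unlock_needs_special());	/* 0: still nested */
	printf("%d\n", read_unlock_needs_special());	/* 1: do the special work */
	clear_need_special();	/* done with the special work */
	return 0;
}

Compiled with a plain C compiler, this prints 0 then 1: the inner
unlock is an ordinary decrement, and only the outermost unlock with
special work pending takes the slow path.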