Date: Mon, 22 May 2017 17:00:36 -0700
From: "Paul E. McKenney"
To: Steven Rostedt
Cc: Ingo Molnar, linux-kernel@vger.kernel.org, Peter Zijlstra, Thomas Gleixner
Subject: Re: Use case for TASKS_RCU
Reply-To: paulmck@linux.vnet.ibm.com
References: <20170515182354.GA25440@linux.vnet.ibm.com>
 <20170516062233.tyz7ze7ilmbkxtjc@gmail.com>
 <20170516122354.GB3956@linux.vnet.ibm.com>
 <20170519062331.52dhungzvcsdxdgo@gmail.com>
 <20170519133550.GD3956@linux.vnet.ibm.com>
 <20170519100421.27298063@gandalf.local.home>
 <20170519102331.0d5a8536@gandalf.local.home>
 <20170519190609.GE3956@linux.vnet.ibm.com>
In-Reply-To: <20170519190609.GE3956@linux.vnet.ibm.com>
Message-Id: <20170523000036.GA13506@linux.vnet.ibm.com>

On Fri, May 19, 2017 at 12:06:09PM -0700, Paul E. McKenney wrote:
> On Fri, May 19, 2017 at 10:23:31AM -0400, Steven Rostedt wrote:
> > On Fri, 19 May 2017 10:04:21 -0400
> > Steven Rostedt wrote:
> > 
> > > On Fri, 19 May 2017 06:35:50 -0700
> > > "Paul E. McKenney" wrote:
> > > 
> > > > Simpler would be better!
> > > > 
> > > > However, is it really guaranteed that one SCHED_IDLE thread cannot
> > > > preempt another?  If not, then the trampoline-freeing SCHED_IDLE
> > > > thread might preempt some other SCHED_IDLE thread in the middle of
> > > > a trampoline.  I am not seeing anything that prevents such
> > > > preemption, but it is rather early local time, so I could easily
> > > > be missing something.
> > > > 
> > > > However, if SCHED_IDLE threads cannot preempt other threads, even
> > > > other SCHED_IDLE threads, then your approach sounds quite promising
> > > > to me.
> > > > 
> > > > Steve, Peter, thoughts?
> > > 
> > > SCHED_IDLE is the swapper task.  There's one on each CPU, and they
> > > don't migrate.  And they only get called when there's no other task
> > > running.
> > 
> > Peter just "schooled" me on IRC.  I stand corrected (and he may respond
> > to this email too).  I guess any task can become SCHED_IDLE.
> > 
> > But that just makes this an even less likely option for
> > synchronize_rcu_tasks().
> 
> Hmmm...
> The goal is to make sure that any task that was preempted or
> running at a given point in time passes through a voluntary context
> switch (or userspace execution, or, ...).
> 
> What is the simplest way to get this job done?  To Ingo's point, I bet
> that there is a simpler way than the current TASKS_RCU implementation.
> 
> Ingo, if I make it fit into 100 lines of code, would you be OK with it?
> I probably need a one-line hook at task-creation time and another
> at task-exit time, if that makes a difference.

And please see below for such a patch, which does add (just barely)
fewer than 100 lines net.

Unfortunately, it does not work, as I should have known ahead of time
from the dyntick-idle experience.  Not all context switches go through
context_switch().  :-/

I believe this is fixable, more or less like dyntick-idle's
half-interrupts were fixable, but it will likely be a few days.  Not
clear whether the result will be simpler than current TASKS_RCU, but
there is only one way to find out.  ;-)

							Thanx, Paul

------------------------------------------------------------------------

 include/linux/init_task.h |    7 +++++
 include/linux/rcupdate.h  |   12 ++++++++++
 include/linux/sched.h     |    5 ++++
 kernel/fork.c             |    4 +++
 kernel/rcu/Kconfig        |    8 +++---
 kernel/rcu/Kconfig.debug  |    2 -
 kernel/rcu/tree.c         |   15 +++++++++---
 kernel/rcu/tree_plugin.h  |   55 ++++++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/core.c       |    3 ++
 9 files changed, 101 insertions(+), 10 deletions(-)

commit 25cb9b6882542f3db0242f6556b94e139c4b29d5
Author: Paul E. McKenney
Date:   Mon May 22 08:55:01 2017 -0700

    rcu: Provide simplified RCU-tasks implementation
    
    This commit provides a simplified RCU-tasks implementation in terms
    of SRCU for CONFIG_PREEMPT=y kernels and in terms of RCU-sched
    otherwise.  The RCU-sched approach simply maps synchronize_rcu_tasks()
    directly onto synchronize_sched().
    
    The SRCU approach provides a tasks_srcu srcu_struct structure that
    tracks usermode execution and voluntary context switches.  When each
    task is created, it invokes tasks_rcu_qs_exit(), and when each task
    exits, it invokes tasks_rcu_qs_enter().  On context switch,
    tasks_rcu_qs_exit() is invoked for the incoming task, but only if
    that task is resuming from a voluntary context switch.  If the
    current context switch is a preemption, tasks_rcu_qs_enter() is
    invoked for the outgoing task, and either way the preempt-or-not
    state is recorded for the task's next resumption.  A "context switch"
    that keeps the same task invokes tasks_rcu_qs().
    
    If the scheduling-clock interrupt sees a task in usermode, it invokes
    tasks_rcu_qs().  It is normally unsafe to do SRCU read-side operations
    from interrupt, but in this case, usermode was interrupted, so there
    is no chance of having interrupted another SRCU read-side operation.
    
    Given all this, synchronize_srcu(&tasks_srcu) waits for all tasks to
    be seen voluntarily blocked or in user space.  This means that
    synchronize_rcu_tasks() is a one-line function.
    
    Signed-off-by: Paul E. McKenney

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index bd11d33d48cc..631f3d3f2dc3 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -140,6 +140,13 @@ extern struct group_info init_groups;
 #else
 #define INIT_TASK_RCU_PREEMPT(tsk)
 #endif
+#ifdef CONFIG_TASKS_RCU
+#define INIT_TASK_RCU_TASKS(tsk) \
+        .rcu_tasks_idx = 0, \
+        .rcu_tasks_preempt = false,
+#else /* #ifdef CONFIG_TASKS_RCU */
+#define INIT_TASK_RCU_TASKS(tsk)
+#endif /* #else #ifdef CONFIG_TASKS_RCU */
 
 extern struct cred init_cred;
 
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 5088a44ce99c..1b2f104a467f 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -156,6 +156,18 @@ static inline void rcu_init_nohz(void) { }
                rcu_irq_exit_irqson(); \
        } while (0)
 
+#ifdef CONFIG_TASKS_RCU
+void tasks_rcu_qs(bool preempt);
+void tasks_rcu_qs_enter(struct task_struct *t, bool preempt);
+void tasks_rcu_qs_exit(struct task_struct *t);
+void synchronize_rcu_tasks(void);
+#else /* #ifdef CONFIG_TASKS_RCU */
+static inline void tasks_rcu_qs(bool preempt) { }
+static inline void tasks_rcu_qs_enter(struct task_struct *t, bool preempt) { }
+static inline void tasks_rcu_qs_exit(struct task_struct *t) { }
+static inline void synchronize_rcu_tasks(void) { synchronize_sched(); }
+#endif /* #else #ifdef CONFIG_TASKS_RCU */
+
 /**
  * cond_resched_rcu_qs - Report potential quiescent states to RCU
  *
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 3b7d9aa80349..1e5d27d1e885 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -544,6 +544,11 @@ struct task_struct {
        struct rcu_node *rcu_blocked_node;
 #endif /* #ifdef CONFIG_PREEMPT_RCU */
 
+#ifdef CONFIG_TASKS_RCU
+       u8 rcu_tasks_idx;
+       u8 rcu_tasks_preempt;
+#endif /* #ifdef CONFIG_TASKS_RCU */
+
        struct sched_info sched_info;
 
        struct list_head tasks;
diff --git a/kernel/fork.c b/kernel/fork.c
index 1784f49b0cd4..8767d739eed4 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1498,6 +1498,10 @@ static inline void rcu_copy_process(struct task_struct *p)
        p->rcu_blocked_node = NULL;
        INIT_LIST_HEAD(&p->rcu_node_entry);
 #endif /* #ifdef CONFIG_PREEMPT_RCU */
+#ifdef CONFIG_TASKS_RCU
+       p->rcu_tasks_preempt = false;
+       tasks_rcu_qs_exit(p);
+#endif /* #ifdef CONFIG_TASKS_RCU */
 }
 
 /*
diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
index be90c945063f..3077568930f3 100644
--- a/kernel/rcu/Kconfig
+++ b/kernel/rcu/Kconfig
@@ -69,13 +69,13 @@ config TREE_SRCU
          This option selects the full-fledged version of SRCU.
 
 config TASKS_RCU
-       bool
-       default n
+       def_bool PREEMPT_RCU
        select SRCU
        help
          This option enables a task-based RCU implementation that uses
-         only voluntary context switch (not preemption!), idle, and
-         user-mode execution as quiescent states.
+         only voluntary context switch (not preemption!) and user-mode
+         execution as quiescent states.  For !PREEMPT_RCU kernels,
+         synchronize_rcu_tasks() maps to synchronize_sched().
 
 config RCU_STALL_COMMON
        def_bool ( TREE_RCU || PREEMPT_RCU )
diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
index 0ec7d1d33a14..b6619ceedafc 100644
--- a/kernel/rcu/Kconfig.debug
+++ b/kernel/rcu/Kconfig.debug
@@ -16,7 +16,6 @@ config RCU_PERF_TEST
        depends on DEBUG_KERNEL
        select TORTURE_TEST
        select SRCU
-       select TASKS_RCU
        default n
        help
          This option provides a kernel module that runs performance
@@ -33,7 +32,6 @@ config RCU_TORTURE_TEST
        depends on DEBUG_KERNEL
        select TORTURE_TEST
        select SRCU
-       select TASKS_RCU
        default n
        help
          This option provides a kernel module that runs torture tests
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8008541b04f2..05e7abecde68 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -166,6 +166,7 @@ static void invoke_rcu_callbacks(struct rcu_state *rsp, struct rcu_data *rdp);
 static void rcu_report_exp_rdp(struct rcu_state *rsp, struct rcu_data *rdp,
                               bool wake);
 static void sync_sched_exp_online_cleanup(int cpu);
+static void rcu_all_qs_ctxt(bool ctxt);
 
 /* rcuc/rcub kthread realtime priority */
 static int kthread_prio = IS_ENABLED(CONFIG_RCU_BOOST) ? 1 : 0;
@@ -467,7 +468,7 @@ void rcu_note_context_switch(bool preempt)
                rcu_momentary_dyntick_idle();
        this_cpu_inc(rcu_dynticks.rcu_qs_ctr);
        if (!preempt)
-               rcu_all_qs();
+               rcu_all_qs_ctxt(true);
 out:
        trace_rcu_utilization(TPS("End context switch"));
        barrier(); /* Avoid RCU read-side critical sections leaking up. */
@@ -480,14 +481,13 @@ EXPORT_SYMBOL_GPL(rcu_note_context_switch);
  * dyntick-idle quiescent state visible to other CPUs (but only for those
  * RCU flavors in desperate need of a quiescent state, which will normally
  * be none of them).  Either way, do a lightweight quiescent state for
- * all RCU flavors.
+ * all RCU flavors.  If ctxt is false, also do a tasks-RCU quiescent state.
  *
  * The barrier() calls are redundant in the common case when this is
  * called externally, but just in case this is called from within this
  * file.
- *
  */
-void rcu_all_qs(void)
+static void rcu_all_qs_ctxt(bool ctxt)
 {
        unsigned long flags;
 
@@ -509,9 +509,16 @@ void rcu_all_qs(void)
        if (unlikely(raw_cpu_read(rcu_sched_data.cpu_no_qs.b.exp)))
                rcu_sched_qs();
        this_cpu_inc(rcu_dynticks.rcu_qs_ctr);
+       if (!ctxt)
+               tasks_rcu_qs(false);
        barrier(); /* Avoid RCU read-side critical sections leaking up. */
        preempt_enable();
 }
+
+void rcu_all_qs(void)
+{
+       rcu_all_qs_ctxt(false);
+}
 EXPORT_SYMBOL_GPL(rcu_all_qs);
 
 #define DEFAULT_RCU_BLIMIT 10 /* Maximum callbacks per rcu_do_batch. */
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index d7325553cf82..21b254dd28b8 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -783,6 +783,7 @@ void exit_rcu(void)
        barrier();
        t->rcu_read_unlock_special.b.blocked = true;
        __rcu_read_unlock();
+       tasks_rcu_qs_enter(t, false);
 }
 
 #else /* #ifdef CONFIG_PREEMPT_RCU */
@@ -2597,3 +2598,57 @@ static void rcu_bind_gp_kthread(void)
                return;
        housekeeping_affine(current);
 }
+
+#ifdef CONFIG_TASKS_RCU
+
+DEFINE_STATIC_SRCU(tasks_srcu);
+
+/* Record a momentary RCU-tasks quiescent state. */
+void tasks_rcu_qs(bool preempt)
+{
+       int idx;
+
+       if (preempt || is_idle_task(current))
+               return;  /* Preempted, not a quiescent state. */
+       /* Acquire new before releasing old to allow tracing SRCU. */
+       preempt_disable();
+       idx = __srcu_read_lock(&tasks_srcu);
+       __srcu_read_unlock(&tasks_srcu, current->rcu_tasks_idx);
+       current->rcu_tasks_idx = idx;
+       preempt_enable();
+}
+
+/* Record entering an extended RCU-tasks quiescent state. */
+void tasks_rcu_qs_enter(struct task_struct *t, bool preempt)
+{
+       t->rcu_tasks_preempt = preempt;
+       if (preempt || is_idle_task(t))
+               return;  /* Preempted, not a quiescent state. */
+       preempt_disable();
+       __srcu_read_unlock(&tasks_srcu, t->rcu_tasks_idx);
+       preempt_enable();
+}
+
+/* Record exiting an extended RCU-tasks quiescent state. */
+void tasks_rcu_qs_exit(struct task_struct *t)
+{
+       if (t->rcu_tasks_preempt || is_idle_task(t))
+               return;  /* Preempted, not a quiescent state. */
+       preempt_disable();
+       t->rcu_tasks_idx = __srcu_read_lock(&tasks_srcu);
+       preempt_enable();
+}
+
+/**
+ * synchronize_rcu_tasks - Wait for all tasks to voluntarily leave the kernel
+ *
+ * Waits until each task is either voluntarily blocked, running in
+ * userspace, or has offered to block (as in cond_resched_rcu_qs()).
+ * Idle tasks are ignored.
+ */
+void synchronize_rcu_tasks(void)
+{
+       synchronize_srcu(&tasks_srcu);
+}
+
+#endif /* #ifdef CONFIG_TASKS_RCU */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 75353aa53ae3..1ba79734377f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3430,6 +3430,7 @@ static void __sched notrace __schedule(bool preempt)
        clear_preempt_need_resched();
 
        if (likely(prev != next)) {
+               tasks_rcu_qs_exit(next);
                rq->nr_switches++;
                rq->curr = next;
                ++*switch_count;
@@ -3438,9 +3439,11 @@ static void __sched notrace __schedule(bool preempt)
 
                /* Also unlocks the rq: */
                rq = context_switch(rq, prev, next, &rf);
+               tasks_rcu_qs_enter(prev, preempt);
        } else {
                rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP);
                rq_unlock_irq(rq, &rf);
+               tasks_rcu_qs(preempt);
        }
 
        balance_callback(rq);
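
As an aside for anyone reading along, here is a minimal, purely
illustrative sketch of the kind of caller synchronize_rcu_tasks() is
aimed at, loosely based on the trampoline-freeing discussion earlier in
this thread.  The old_trampoline variable, retire_trampoline(), and
tramp_free() are made-up names for this sketch only; they are not part
of the patch above or of mainline.

extern void tramp_free(void *tramp);   /* Hypothetical helper. */

static void *old_trampoline;           /* Hypothetical trampoline pointer. */

static void retire_trampoline(void)
{
       void *tramp = old_trampoline;

       /* Unhook the trampoline so that no new callers can enter it. */
       WRITE_ONCE(old_trampoline, NULL);

       /*
        * Wait until every task has either blocked voluntarily or run in
        * userspace.  A task preempted while executing in the trampoline
        * holds up this grace period, so the free below cannot pull the
        * code out from under it.
        */
       synchronize_rcu_tasks();

       tramp_free(tramp);              /* Now safe to free. */
}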