Date: Fri, 19 May 2017 06:35:50 -0700
From: "Paul E. McKenney"
To: Ingo Molnar
Cc: rostedt@goodmis.org, linux-kernel@vger.kernel.org, Peter Zijlstra, Thomas Gleixner
Subject: Re: Use case for TASKS_RCU
Reply-To: paulmck@linux.vnet.ibm.com
Message-Id: <20170519133550.GD3956@linux.vnet.ibm.com>
In-Reply-To: <20170519062331.52dhungzvcsdxdgo@gmail.com>
References: <20170515182354.GA25440@linux.vnet.ibm.com>
 <20170516062233.tyz7ze7ilmbkxtjc@gmail.com>
 <20170516122354.GB3956@linux.vnet.ibm.com>
 <20170519062331.52dhungzvcsdxdgo@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, May 19, 2017 at 08:23:31AM +0200, Ingo Molnar wrote:
>
> * Paul E. McKenney wrote:
>
> > On Tue, May 16, 2017 at 08:22:33AM +0200, Ingo Molnar wrote:
> > >
> > > * Paul E. McKenney wrote:
> > >
> > > > Hello!
> > > >
> > > > The question of the use case for TASKS_RCU came up, and here is my
> > > > understanding. Steve will not be shy about correcting any misconceptions
> > > > I might have. ;-)
> > > >
> > > > The use case is to support freeing of trampolines used in tracing/probing
> > > > in CONFIG_PREEMPT=y kernels. It is necessary to wait until any task
> > > > executing in the trampoline in question has left it, taking into account
> > > > that the trampoline's code might be interrupted and preempted. However,
> > > > the code in the trampolines is guaranteed never to context switch.
> > > >
> > > > Note that in CONFIG_PREEMPT=n kernels, synchronize_sched() suffices.
> > > > It is therefore tempting to think in terms of disabling preemption across
> > > > the trampolines, but there is apparently not enough room to accommodate
> > > > the needed preempt_disable() and preempt_enable() in the code invoking
> > > > the trampoline, and putting the preempt_disable() and preempt_enable()
> > > > in the trampoline itself fails because of the possibility of preemption
> > > > just before the preempt_disable() and just after the preempt_enable().
> > > > Similar reasoning rules out use of rcu_read_lock() and rcu_read_unlock().
> > >
> > > So how was this solved before TASKS_RCU? Also, nothing uses call_rcu_tasks() at
> > > the moment, so it's hard for me to review its users. What am I missing?
> >
> > Before TASKS_RCU, the trampolines were just leaked when CONFIG_PREEMPT=y.
> >
> > Current mainline kernel/trace/ftrace.c uses synchronize_rcu_tasks().
> > So yes, currently one user.
>
> So why not schedule a worklet on every CPU to drive the trampoline freeing? To
> guarantee that nothing was preempted it could run at SCHED_IDLE and could observe
> nr_running from the worklet and use a short timeout loop. Batching and hysteresis
> would ensure that this is only running rarely in practice.
>
> It doesn't have to be fast or particularly elegant, but it could use existing
> kernel facilities just fine: it's a corner case cost and quirk of our live kernel
> text modifying trampoline code and our current CONFIG_PREEMPT=y model.
>
> I.e. don't make it an RCU facility that complicates not just the RCU code but has
> various costs in generic code as well:
>
>   kernel/exit.c:  TASKS_RCU(int tasks_rcu_i);
>   kernel/exit.c:  TASKS_RCU(preempt_disable());
>   kernel/exit.c:  TASKS_RCU(tasks_rcu_i = __srcu_read_lock(&tasks_rcu_exit_srcu));
>   kernel/exit.c:  TASKS_RCU(preempt_enable());
>   kernel/exit.c:  TASKS_RCU(__srcu_read_unlock(&tasks_rcu_exit_srcu, tasks_rcu_i));
>
> I.e. I question that this should be a generic RCU facility.

Simpler would be better!

However, is it really guaranteed that one SCHED_IDLE thread cannot
preempt another? If not, then the trampoline-freeing SCHED_IDLE thread
might preempt some other SCHED_IDLE thread in the middle of a trampoline.
I am not seeing anything that prevents such preemption, but it is rather
early local time, so I could easily be missing something.

However, if SCHED_IDLE threads cannot preempt other threads, even other
SCHED_IDLE threads, then your approach sounds quite promising to me.

Steve, Peter, thoughts?

Thanx, Paul