From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1758599Ab0DPQpR (ORCPT );
	Fri, 16 Apr 2010 12:45:17 -0400
Received: from e5.ny.us.ibm.com ([32.97.182.145]:44127 "EHLO e5.ny.us.ibm.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1758584Ab0DPQpL (ORCPT );
	Fri, 16 Apr 2010 12:45:11 -0400
Date: Fri, 16 Apr 2010 09:45:03 -0700
From: "Paul E. McKenney"
To: Peter Zijlstra
Cc: Benjamin Herrenschmidt , Andrea Arcangeli , Avi Kivity ,
	Thomas Gleixner , Rik van Riel , Ingo Molnar ,
	akpm@linux-foundation.org, Linus Torvalds ,
	linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
	David Miller , Hugh Dickins , Mel Gorman , Nick Piggin
Subject: Re: [PATCH 01/13] powerpc: Add rcu_read_lock() to gup_fast() implementation
Message-ID: <20100416164503.GH2615@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20100413034311.GB2772@linux.vnet.ibm.com>
	<1271253110.32749.47.camel@laptop>
	<20100415142852.GA2471@linux.vnet.ibm.com>
	<1271425881.4807.2319.camel@twins>
	<20100416141745.GC2615@linux.vnet.ibm.com>
	<1271427819.4807.2353.camel@twins>
	<20100416143202.GE2615@linux.vnet.ibm.com>
	<1271429810.4807.2390.camel@twins>
	<20100416150909.GF2615@linux.vnet.ibm.com>
	<1271430855.4807.2411.camel@twins>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1271430855.4807.2411.camel@twins>
User-Agent: Mutt/1.5.20 (2009-06-14)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Apr 16, 2010 at 05:14:15PM +0200, Peter Zijlstra wrote:
> On Fri, 2010-04-16 at 08:09 -0700, Paul E. McKenney wrote:
> > On Fri, Apr 16, 2010 at 04:56:50PM +0200, Peter Zijlstra wrote:
> > > On Fri, 2010-04-16 at 07:32 -0700, Paul E. McKenney wrote:
> > > > On Fri, Apr 16, 2010 at 04:23:39PM +0200, Peter Zijlstra wrote:
> > > > > On Fri, 2010-04-16 at 07:17 -0700, Paul E. McKenney wrote:
> > > > >
> > > > > > > > Of course, with call_rcu_sched(), the corresponding RCU read-side critical
> > > > > > > > sections are non-preemptible. Therefore, in CONFIG_PREEMPT_RT, these
> > > > > > > > read-side critical sections must use raw spinlocks.
> > > > > > > >
> > > > > > > OK, so if we fully remove CONFIG_TREE_PREEMPT_RCU (defaulting to y),
> > > > > > > rename all the {call_rcu, rcu_read_lock, rcu_read_unlock,
> > > > > > > synchronize_rcu} functions to {*}_preempt and then add a new
> > > > > > > CONFIG_PREEMPT_RCU that simply maps {*} to either {*}_sched or
> > > > > > > {*}_preempt, we've basically got what I've been asking for for a while,
> > > > > > > no?
> > > > > > >
> > > > > > What would rcu_read_lock_preempt() do in a !CONFIG_PREEMPT kernel?
> > > > > >
> > > > > Same as for a preempt one, since you'd have to be able to schedule()
> > > > > while holding it to be able to do things like mutex_lock().
> > > > >
> > > > So what you really want is something like rcu_read_lock_sleep() rather
> > > > than rcu_read_lock_preempt(), right? The point is that you want to do
> > > > more than merely preempt, given that it is legal to do general blocking
> > > > while holding a mutex, correct?
> > > >
> > > Right, but CONFIG_TREE_PREEMPT_RCU=y ends up being that. We could change
> > > the name to _sleep, but we've been calling it preemptible-rcu for a long
> > > while now.
> > >
> > It is actually not permitted to do general blocking in a preemptible RCU
> > read-side critical section. Otherwise, someone is going to block waiting
> > for a network packet that never comes, thus OOMing the system.
> >
> Sure, something that guarantees progress seems like a sensible
> restriction for any lock, and in particular RCU :-)

Excellent point -- much of the issue really does center around
forward-progress guarantees.
In fact the Linux kernel has a number of locking primitives that require
different degrees of forward-progress guarantee from the code in their
respective critical sections:

o	spin_lock_irqsave(): Critical sections must guarantee forward
	progress against everything except NMI handlers.

o	raw_spin_lock(): Critical sections must guarantee forward progress
	against everything except IRQ (including softirq) and NMI handlers.

o	spin_lock(): Critical sections must guarantee forward progress
	against everything except IRQ (again including softirq) and NMI
	handlers and (given CONFIG_PREEMPT_RT) higher-priority realtime
	tasks.

o	mutex_lock(): Critical sections need not guarantee forward
	progress, as general blocking is permitted.

The other issue is the scope of the lock. The Linux kernel has the
following:

o	BKL: global scope.

o	Everything else: scope defined by the use of the underlying lock
	variable.

One of the many reasons that we are trying to get rid of BKL is because
it combines global scope with relatively weak forward-progress
guarantees.

So here is how the various RCU flavors stack up:

o	rcu_read_lock_bh(): critical sections must guarantee forward
	progress against everything except NMI handlers and IRQ handlers,
	but not against softirq handlers. Global in scope, so that
	violating the forward-progress guarantee risks OOMing the system.

o	rcu_read_lock_sched(): critical sections must guarantee forward
	progress against everything except NMI and IRQ handlers, including
	softirq handlers. Global in scope, so that violating the
	forward-progress guarantee risks OOMing the system.

o	rcu_read_lock(): critical sections must guarantee forward progress
	against everything except NMI handlers, IRQ handlers, softirq
	handlers, and (in CONFIG_PREEMPT_RT) higher-priority realtime
	tasks. Global in scope, so that violating the forward-progress
	guarantee risks OOMing the system.

o	srcu_read_lock(): critical sections need not guarantee forward
	progress, as general blocking is permitted. Scope is controlled
	by the use of the underlying srcu_struct structure.

As you say, one can block in rcu_read_lock() critical sections, but the
only blocking that is really safe is blocking that is subject to
priority inheritance. This prohibits mutexes, because although the
mutexes themselves are subject to priority inheritance, the mutexes'
critical sections might well not be.

So the easy response is "just use SRCU." Of course, SRCU has some
disadvantages at the moment:

o	The return value from srcu_read_lock() must be passed to
	srcu_read_unlock(). I believe that I can fix this.

o	There is no call_srcu(). I believe that I can fix this.

o	SRCU uses a flat per-CPU counter scheme that is not particularly
	scalable. I believe that I can fix this.

o	SRCU's current implementation makes it almost impossible to
	implement priority boosting. I believe that I can fix this.

o	SRCU requires explicit initialization of the underlying
	srcu_struct. Unfortunately, I don't see a reasonable way around
	this. Not yet, anyway.

So, is there anything else that you don't like about SRCU?

							Thanx, Paul
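[Editorial sketch for concreteness: the RCU flavors compared above share the same read-side shape, differing only in which lock/unlock pair is used and hence in what may run during the critical section. This is a kernel-style illustration, not buildable standalone; `struct my_entry`, `my_list`, `my_lock`, and the functions are made-up names, while the RCU/list calls are the real kernel API.]

```c
#include <linux/rcupdate.h>
#include <linux/rculist.h>
#include <linux/spinlock.h>
#include <linux/slab.h>

/* Hypothetical RCU-protected list; all "my_*" names are illustrative. */
struct my_entry {
	int val;
	struct list_head list;
};
static LIST_HEAD(my_list);
static DEFINE_SPINLOCK(my_lock);

/*
 * Reader: may not block, per the forward-progress table above.
 * rcu_read_lock_bh() or rcu_read_lock_sched() would be used the same
 * way, paired with the matching unlock and grace-period primitives.
 */
static int my_sum(void)
{
	struct my_entry *p;
	int sum = 0;

	rcu_read_lock();
	list_for_each_entry_rcu(p, &my_list, list)
		sum += p->val;
	rcu_read_unlock();
	return sum;
}

/* Updater: unlink, wait for a grace period, then free. */
static void my_del(struct my_entry *p)
{
	spin_lock(&my_lock);
	list_del_rcu(&p->list);
	spin_unlock(&my_lock);
	synchronize_rcu();	/* or call_rcu() for an async callback */
	kfree(p);
}
```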
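[Editorial sketch for concreteness: the SRCU points above, again as a kernel-style illustration rather than buildable code; `my_srcu` and the function names are made up, while init_srcu_struct(), srcu_read_lock()/srcu_read_unlock(), and synchronize_srcu() are the real (2010-era) API.]

```c
#include <linux/srcu.h>

/* Scope is per-srcu_struct, and explicit initialization is required. */
static struct srcu_struct my_srcu;

static int my_setup(void)
{
	return init_srcu_struct(&my_srcu);	/* the unavoidable init step */
}

static void my_read_side(void)
{
	int idx;

	idx = srcu_read_lock(&my_srcu);	/* returned index must be kept... */
	/* ... read-side critical section; general blocking permitted ... */
	srcu_read_unlock(&my_srcu, idx);	/* ...and passed back here */
}

static void my_update_side(void)
{
	/* No call_srcu() exists at this point; only synchronous waiting: */
	synchronize_srcu(&my_srcu);
}
```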