From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Paul E. McKenney" Subject: Re: [PATCH 4/6] kvm tools: Add rwlock wrapper Date: Thu, 26 May 2011 17:58:22 -0700 Message-ID: <20110527005822.GL2386@linux.vnet.ibm.com> References: <1306419950-19064-1-git-send-email-levinsasha928@gmail.com> <1306419950-19064-4-git-send-email-levinsasha928@gmail.com> <1306426743.3065.34.camel@lappy> <20110526180518.GA3572@elte.hu> <4DDE97CE.4000302@redhat.com> <20110526202531.GA2765@elte.hu> <20110526230508.GA15983@Krystal> Reply-To: paulmck@linux.vnet.ibm.com Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Ingo Molnar , Pekka Enberg , Avi Kivity , Sasha Levin , john@jfloren.net, kvm@vger.kernel.org, asias.hejun@gmail.com, gorcunov@gmail.com, prasadjoshi124@gmail.com To: Mathieu Desnoyers Return-path: Received: from e1.ny.us.ibm.com ([32.97.182.141]:49330 "EHLO e1.ny.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752040Ab1E0A60 (ORCPT ); Thu, 26 May 2011 20:58:26 -0400 Received: from d01relay05.pok.ibm.com (d01relay05.pok.ibm.com [9.56.227.237]) by e1.ny.us.ibm.com (8.14.4/8.13.1) with ESMTP id p4R0l36A001465 for ; Thu, 26 May 2011 20:47:03 -0400 Received: from d01av01.pok.ibm.com (d01av01.pok.ibm.com [9.56.224.215]) by d01relay05.pok.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id p4R0wPXf121690 for ; Thu, 26 May 2011 20:58:25 -0400 Received: from d01av01.pok.ibm.com (loopback [127.0.0.1]) by d01av01.pok.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id p4R0wOPd029483 for ; Thu, 26 May 2011 20:58:25 -0400 Content-Disposition: inline In-Reply-To: <20110526230508.GA15983@Krystal> Sender: kvm-owner@vger.kernel.org List-ID: On Thu, May 26, 2011 at 07:05:08PM -0400, Mathieu Desnoyers wrote: > * Ingo Molnar (mingo@elte.hu) wrote: > >=20 > > * Pekka Enberg wrote: > >=20 > > > On Thu, May 26, 2011 at 9:11 PM, Avi Kivity wrot= e: > > > > On 05/26/2011 09:05 PM, Ingo Molnar wrote: > > > >> > > > >> > > > > >> > =A0I've added some rwlocks because of what Ingo said yesterd= ay about > > > >> > =A0adding/removing devices after the first initialization ph= ase. > > > >> > > > > >> > =A0Take MMIO lock for example: Since we can now run SMP gues= ts, we may > > > >> > =A0have multiple MMIO exits (one from each VCPU thread). Eac= h of those > > > >> > =A0exits leads to searching the MMIO rbtree. > > > >> > > > > >> > =A0We can use a mutex to lock it, but it just means that tho= se threads > > > >> > =A0will be blocked there instead of concurrently searching t= he MMIO > > > >> > =A0tree which makes the search linear instead of parallel. > > > >> > > > > >> > =A0It's hard to bring 'real' numbers at this stage because t= he only > > > >> > =A0'real' device we have which uses MMIO is the VESA driver,= and we > > > >> > =A0can't really simulate many VCPUs writing to it :) > > > >> > > > >> I'd suggest keeping it simple first - rwlocks are nasty and wi= ll > > > >> bounce a cacheline just as much. > > > > > > > > Well, this is the first case where tools/kvm can do better than= qemu with > > > > its global lock, so I think it's worth it. > > > > > > > >> If lookup scalability is an issue we can extend RCU to tools/k= vm/. > > > > > > > > Definitely rcu is a perfect patch for mmio dispatch. > > >=20 > > > Userspace RCU code is here, Sasha, if you feel like tackling this= : > > >=20 > > > http://lttng.org/urcu > > >=20 > > > :-) > > >=20 > > > I'm CC'ing Paul and Mathieu as well for urcu. I am hoping we can get better convergence between the user-level and kernel-level URCU implementations once I get SRCU merged into the TREE_= RCU and TINY_RCU implementations. But it is early days for user-level RCU implementations -- for example, the kernel-level implementations have deep dependencies on being able to lock themselves cheaply to a given C= PU, which does not exist at user level. But there seems to be an assumption that there should be only one URCU implementation, and I am not sure that this assumption holds. For example, there are several in http://lttng.org/urcu, each corresponding to different use cases. This should not be too much of a surprise, giv= en that there are quite a few implementations in the Linux kernel: TINY_RC= U, TINY_PREEMPT_RCU, TREE_RCU, TREE_PREEMPT_RCU, and SRCU. Of course, each of the first four variants provides RCU-bh and RCU-sched, and TINY_PREEMPT_RCU and TREE_PREEMPT_RCU provide preemptible RCU in additi= on. And back in the mid-1990s, I would never have imagined a need for more than one implementation of RCU. ;-) All that aside, one advantage of http://lttng.org/urcu is that it alrea= dy exists, which allows prototyping to proceed immediately. If it turns out that URCU doesn't help for whatever reason, then there is no point = in worrying further. And if URCU does turn out to help, we will know more about exactly what this particular situation requires of URCU, which will likely help us better understand what the implemenation should look like -- perhaps very close to one of the URCU implementations, perhaps very close to one of the in-kernel implementations. Seem reasonable, or am I missing something here? Thanx, Paul > > I think we should rather share some of the kernel RCU code in an=20 > > intelligent way >=20 > Hi Ingo, >=20 > By all means feel free to redo all the work Paul have spent care and > time coding and testing. >=20 > > instead of bringing in yet another library which is a=20 > > IIRC a distant copy of the kernel code to begin with. >=20 > This is either a lie, or immensely misinformed. You should go and loo= k > at the source before doing nonsensical assumptions like this. What yo= u > are saying cannot be further from the truth. >=20 > >=20 > > That way we could lazily benefit from all the enhancements Paul doe= s=20 > > to the kernel RCU code! :-) >=20 > Maybe there is a reason why Paul have been working with me on the > userspace RCU implementation in parallel with working on the Linux > kernel one ? >=20 > >=20 > > Note that kernel/treercu.c is pretty tied to the kernel right now, = so=20 > > a first approach could be to: > >=20 > > - Check Paul's excellent documentation about RCU and make sure > > you don't miss Paul's excellent 3-part primer on LWN.net: > >=20 > > http://lwn.net/Articles/262464/ > > http://lwn.net/Articles/263130/ > > http://lwn.net/Articles/264090/ > >=20 > > There are also lots of very good RCU articles on the LWN Kernel=20 > > Index page: > >=20 > > http://lwn.net/Kernel/Index/ >=20 > Or just (see README) >=20 > git clone git://git.lttng.org/urcu > cd userspace-rcu > ./bootstrap > ./configure > make > make install > ldconfig >=20 > #include > gcc -lurcu ... >=20 > and be done with it ? >=20 > > - Check kernel/tinyrcu.c to see how RCU is implemented in its=20 > > simplest form. :) >=20 > ...so simplistic it only works on UP systems, which are not so common > these days on the systems targeted by kvm. >=20 > >=20 > > - Copy the tree-RCU code from kernel/treercu.c to tools/kvm/rcu/ >=20 > This code is very much tied with the kernel scheduler. This is actual= ly > one of the main reason why we had to reimplement RCU for userspace > rather than to "simply copy the kernel one" as you recommend. >=20 > >=20 > > - Massage it so far that it is suitable for tools/kvm/. We really > > only need a few core RCU facilities initially: > >=20 > > struct rcu_head; > >=20 > > rcu_read_lock(); > > rcu_read_unlock(); > >=20 > > rcu_dereference() > >=20 > > call_rcu(head, func); > >=20 > > rcu_synchronize(); > >=20 > > The rest, _bh(), etc. are all kernelisms. >=20 > rcu_synchronize and the rcu read lock/unlock, in tree RCU, are tied t= o > the scheduler to deal with preemption. User-land does not have this > luxury. >=20 > >=20 > > - Then once it's working we could look at doing the code sharing > > *for real*: splitting the functionality out of the original > > treercu.c code into kernel/treercu-lib.c and rcuupdate-lib.h or = so > > and include that one in tools/kvm/rcu/. > >=20 > > - [ You might also benefit from porting rcutorture code to=20 > > user-space. It will catch RCU bugs like nothing else. ] >=20 > Userspace RCU already includes the torture test suite you are referri= ng > to. >=20 > >=20 > > That way the 'core RCU' logic would be contained in treercu-lib.c,=20 > > all kernel glue would be in kernel/rcutree.c. > >=20 > > Some other approach might be possible as well, this was just a firs= t=20 > > rough idea :) >=20 > "wheel not invented here" syndrome ? >=20 > Mathieu >=20 > --=20 > Mathieu Desnoyers > Operating System Efficiency R&D Consultant > EfficiOS Inc. > http://www.efficios.com