From mboxrd@z Thu Jan 1 00:00:00 1970 From: Mathieu Desnoyers Subject: RCU red-black tree (was: Re: [PATCH 4/6] kvm tools: Add rwlock wrapper) Date: Sun, 29 May 2011 13:01:04 -0400 Message-ID: <20110529170104.GA17189@Krystal> References: <1306419950-19064-4-git-send-email-levinsasha928@gmail.com> <1306426743.3065.34.camel@lappy> <20110526180518.GA3572@elte.hu> <4DDE97CE.4000302@redhat.com> <1306436223.3065.36.camel@lappy> <20110526230923.GB15983@Krystal> <1306491547.3217.9.camel@lappy> <20110527131400.GC29744@Krystal> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Pekka Enberg , Avi Kivity , Ingo Molnar , john@jfloren.net, kvm@vger.kernel.org, asias.hejun@gmail.com, gorcunov@gmail.com, prasadjoshi124@gmail.com, "Paul E. McKenney" , Phil Howard , rp@svcs.cs.pdx.edu To: Sasha Levin Return-path: Received: from mail.openrapids.net ([64.15.138.104]:60513 "EHLO blackscsi.openrapids.net" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1752031Ab1E2RBI (ORCPT ); Sun, 29 May 2011 13:01:08 -0400 Content-Disposition: inline In-Reply-To: <20110527131400.GC29744@Krystal> Sender: kvm-owner@vger.kernel.org List-ID: * Mathieu Desnoyers (mathieu.desnoyers@efficios.com) wrote: > * Sasha Levin (levinsasha928@gmail.com) wrote: [...] > > Hi Mathieu! > > > > In tools/kvm/ we use a rb-tree (same one used by the kernel) with the > > augmentation feature to support an interval rb-tree - which means that > > every update to the tree not only updates the nodes directly related to > > the updated node but also all the nodes on the path to the root of the > > tree. > > Cool !! > > I'm adding in copy Phil Howard who has been working on RCU RB tree for > much longer than myself. > > > I see that in liburcu there is an implementation of a rcu linked list > > but no implementation of a rb-tree. > > > > Are you currently working on one? or maybe I should try writing one and > > sending it to you? > > Actually, I started working on one last year, but had to interrupt my > effort before I got it even working right. [...] > We'd have to see how we can go from this implementation of a standard RB > tree to an interval RB tree too. I guess it will depend whether you need > the updates from the target node up to the root to be done "all at once" > from a reader perspective (then you would probably need to replace a > copy of a part of the tree all at once), or if you can allow the update > to be done piece-wise on a node-by-node basis as readers go through the > tree (from root to leafs). I've revisited the RCU rbtree implementation this weekend, and it works much better now. I reimplemented the whole thing from 0 starting from the CLRS chapter 12 algorithms (to get the non-rcu (insertion/removal)-only stress-tests working) and incrementally RCU-ized the updates and then added read-side tests. All along, I used the test_urcu_rbtree test case that does some basic coherency tests by searching for some random elements that *should* be there in parellel with insertion and removals. The implementation I currently have survives the "search for known elements in parallel with updates" stress test (so far). (e.g. test_urcu_rbtree 6 2 10 -g 30 : 6 readers, 2 writers, 30 known random elements, writers are adding/removing 6 random elements, on a 8-core machine) See: git://git.lttng.org/userspace-rcu.git branch : rbtree2 The key idea I used in this implementation is to "decay" the old nodes (AFAIK, I just made this up) : "decaying" a node could be best described as creating an exact copy of a node, and putting a pointer to this new node into the old node to form a "decay chain". This allowed me to keep the algorithm very much similar to CLRS by just walking the decay chains whenever needed. The old node "decays" by using call_rcu to free it after a grace period passes. This imply that the updates must hold the RCU read-side lock in addition to a mutex to make sure the decaying nodes stay valid for the duration of their use. This implementation never requires the read-side to loop, thus guaranteeing a wait-free read-side behavior (so search operations will always be strictly log(n) without any busy-loop delay). I have not created stress-tests for next/prev walk of the tree yet. It is therefore entirely possible that this does not work as expected. Comments are welcome, Thanks, Mathieu > > Thanks, > > Mathieu > > > -- > Mathieu Desnoyers > Operating System Efficiency R&D Consultant > EfficiOS Inc. > http://www.efficios.com -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com