From: Nick Piggin <npiggin@suse.de>
To: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Andrew Morton <akpm@osdl.org>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Paul McKenney <Paul.McKenney@us.ibm.com>,
Linux Kernel <linux-kernel@vger.kernel.org>,
Linux Memory Management <linux-mm@kvack.org>
Subject: Re: [patch 3/3] radix-tree: RCU lockless readside
Date: Thu, 22 Jun 2006 18:55:51 +0200 [thread overview]
Message-ID: <20060622165551.GB23109@wotan.suse.de> (raw)
In-Reply-To: <20060622163032.GC1295@us.ibm.com>
On Thu, Jun 22, 2006 at 09:30:32AM -0700, Paul E. McKenney wrote:
> On Thu, Jun 22, 2006 at 05:45:18PM +0200, Nick Piggin wrote:
> >
> > I'll probably put a little table in radix-tree.h to summarise the
> > API synchronisation requirements, OK?
>
> Makes sense to me -- except will that feed into the docbook stuff?
> It seems to me to be really important to get these sorts of requirements
> included in the docbook stuff. I have had too many people show me
> code that assumed that RCU somehow synchronizes updates, so it is
> good to call out these requirements early and often.
I'm not a docbook expert, but that's a good point. Will RFComments
on the comments when I'm done ;)
> >
> > Does the single rcu_dereference in radix_tree_gang_lookup look OK?
>
> Well, it does put a memory barrier in the right place on Alpha, but the
> intent would be more clear to me if the rcu_dereference() were on the
> assignments to each element of the results array. And there would be
> no additional overhead on most architectures.
>
> So I would much prefer the rcu_dereference() be on the assignment to
> the results array.
No problem, will change.
> > Ah indeed, that's confusing. Yes, the lookup_tag must exclude updates.
> > I guess I got too mechanical in my conversion... however, tag lookups
> > can be RCUified without a great deal of trouble, so I might take this
> > opportunity.
>
> The tag lookups would then find anything that (1) had been tagged in a
> prior operation and (2) had not been deleted in the meantime, right?
Yes. Where "prior" is only really prior as guaranteed by some
synchronising (or otherwise dependent) operation. But I don't
need to tell you that ;)
> And the caller could hold a lock across both the tagging and tag
> lookup if greater certainty was desired. I could imagine this sort
> of semantic being useful for deferred operations on ranges of memory,
> where new additions would have the operation implicit in creation and any
> deletions would no longer need the operation to be performed (or might
> be performed as part of deletion operation), but have not actually used
> this sort of thing myself.
>
> So I must defer to people who have used tagging and tagged lookups
> in anger.
We use tagged lookups for writeout and synching -- slow IO related
which is why I had not converted it over to lockless. But it could
use lockless tagged lookups: eg. so long as sync catches all the
pages that we *know* to be dirty at the time of the sync, that's
fine.
> > I've tried to get that message across in the radix_tree_lookup_slot
> > comment, if they're using RCU lookups. Enough? I guess I'll add it
> > to the locking summary too.
>
> This is what I saw in the radix_tree_lookup_slot() comment:
>
> + * radix_tree_lookup_slot - lookup a slot in a radix tree
> + * @root: radix tree root
> + * @index: index key
> + *
> + * Lookup the slot corresponding to the position @index in the radix tree
> + * @root. This is useful for update-if-exists operations.
> + *
> + * This function cannot be called under rcu_read_lock, it must be
> + * excluded from writers, as must the returned slot.
>
> This comment does not make the RCU-protection point clear to me.
> The constraint is that if you use RCU radix-tree lookups, then you must
> use synchronize_rcu() or call_rcu() when freeing any elements removed
> from the radix tree via radix_tree_delete().
>
> Or am I missing something here?
Well the pagecache uses pointers to struct page. struct page is never
freed, so we can forget the whole thing ;)
(mem unplug may want to free them, so in that case they would have to
synchronize_rcu).
But I guess for less specialised users, RCU would be the usual way to
go... ah I see, I must have got out of synch somewhere. I have comments
along these lines for radix_tree_lookup_slot:
+ * This function can be called under rcu_read_lock, however it is the
+ * duty of the caller to manage the lifetimes of the leaf nodes (ie.
+ * they would usually be RCU protected as well). Also, dereferencing
+ * the slot pointer would require rcu_dereference, and modifying it
+ * would require rcu_assign_pointer.
> > > Rough notes, FYA:
>
> You went through all these as well? Hope you have recovered from the
> bout of insomnia! ;-)
Pretty well gone through them.
>
> > > o Don't the items placed into the radix tree need to be protected
> > > by RCU? If not, how does the radix tree avoid handing a pointer
> > > to something that has recently been removed, that the caller to
> > > radix_tree_delete() might have already freed?
> >
> > Yes, they'll need to be protected by something. In the lockless pagecache,
> > they are never freed, so that isn't an issue. But other users (Ben's
> > irq patch perhaps, unless it uses the slot pointers directly) will have to
> > be careful.
>
> Ah! If they are never freed, there is still a need to take care when
> reusing them. One approach is to prevent them from being reused until
> after a grace period has elapsed, another is to use revalidation checks
> after lookup. Either way, one needs to allow for the fact that a
> lookup might hand you something that has just been deleted.
Yep, validation checks are done after lookup. The core of it, the
``page_cache_get_speculative'' function isn't that big.
> > I'll send out an incremental diff with changes.
>
> Looking forward to it! Maybe not as anxiously as Ben Herrenschmidt, but
> so it goes. ;-)
Well I couldn't ask you to spend any more time on it, but if you
get interested and take a peek, that'll be a bonus for me ;)
Thanks again
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2006-06-22 16:55 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-06-20 14:48 [patch 0/3] 2.6.17 radix-tree: updates and lockless Nick Piggin
2006-06-20 14:48 ` [patch 1/3] radix-tree: direct data Nick Piggin
2006-06-20 14:48 ` [patch 2/3] radix-tree: small Nick Piggin
2006-06-20 14:48 ` [patch 3/3] radix-tree: RCU lockless readside Nick Piggin
2006-06-22 1:49 ` Paul E. McKenney
2006-06-22 15:45 ` Nick Piggin
2006-06-22 16:30 ` Paul E. McKenney
2006-06-22 16:55 ` Nick Piggin [this message]
2006-06-22 17:40 ` Paul E. McKenney
2006-06-22 18:11 ` Nick Piggin
2006-06-23 7:09 ` Andrew Morton
2006-06-23 8:03 ` Nick Piggin
2006-06-23 8:39 ` Nick Piggin
2006-06-23 8:41 ` Nick Piggin
2006-06-20 22:08 ` [patch 0/3] 2.6.17 radix-tree: updates and lockless Benjamin Herrenschmidt
2006-06-20 22:35 ` Andrew Morton
2006-06-20 23:09 ` Benjamin Herrenschmidt
2006-06-20 23:30 ` Andrew Morton
2006-06-20 23:50 ` Benjamin Herrenschmidt
2006-06-21 0:34 ` Christoph Lameter
2006-06-21 0:47 ` Benjamin Herrenschmidt
2006-06-21 1:07 ` Christoph Lameter
2006-06-21 1:33 ` Benjamin Herrenschmidt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20060622165551.GB23109@wotan.suse.de \
--to=npiggin@suse.de \
--cc=Paul.McKenney@us.ibm.com \
--cc=akpm@osdl.org \
--cc=benh@kernel.crashing.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=paulmck@us.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).