All of lore.kernel.org
 help / color / mirror / Atom feed
* PREEMPT_RCU breaks anon_vma locking ?
@ 2007-02-23 21:23 Oleg Nesterov
  2007-02-23 22:41 ` Paul E. McKenney
  2007-02-24 22:04 ` Hugh Dickins
  0 siblings, 2 replies; 10+ messages in thread
From: Oleg Nesterov @ 2007-02-23 21:23 UTC (permalink / raw)
  To: Paul E. McKenney, Hugh Dickins; +Cc: dipankar, Andrew Morton, linux-kernel

If my understanding correct, vmscan can find a page which lives in a already
anon_vma_unlink'ed vma. This is ok, the page is pinned, and page->mapping is
not cleared until free_hot_cold_page().

So page_lock_anon_vma() works correctly due to SLAB_DESTROY_BY_RCU even if
anon_vma_unlink() has already freed anon_vma. In that case we should see
list_empty(&anon_vma->head), we are safe.

However, we are doing spin_unlock(anon_vma->lock) after page_lock_anon_vma(),
and this looks unsafe to me because page_lock_anon_vma() does rcu_read_unlock()
on return.

This worked before because spin_lock() implied rcu_read_lock(), so rcu was
blocked if page_lock_anon_vma() returns !NULL. With CONFIG_PREEMPT_RCU this
is not true (yes?), so it is possible that the slab returns the memory to
the system and it is re-used when we write to anon_vma->lock.

IOW, don't we need something like this

	static struct anon_vma *page_lock_anon_vma(struct page *page)
	{
		struct anon_vma *anon_vma;
		unsigned long anon_mapping;

		rcu_read_lock();
		anon_mapping = (unsigned long) page->mapping;
		if (!(anon_mapping & PAGE_MAPPING_ANON))
			goto out;
		if (!page_mapped(page))
			goto out;

		anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
		spin_lock(&anon_vma->lock);
		return anon_vma;

	out:
		rcu_read_unlock();
		return NULL;
	}

	static inline void page_lock_anon_vma(struct anon_vma *anon_vma)
	{
		spin_unlock(&anon_vma->lock);
		rcu_read_unlock();
	}
?

Oleg.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: PREEMPT_RCU breaks anon_vma locking ?
  2007-02-23 21:23 PREEMPT_RCU breaks anon_vma locking ? Oleg Nesterov
@ 2007-02-23 22:41 ` Paul E. McKenney
  2007-02-24 22:10   ` Hugh Dickins
  2007-02-24 22:04 ` Hugh Dickins
  1 sibling, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2007-02-23 22:41 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Hugh Dickins, dipankar, Andrew Morton, linux-kernel

On Sat, Feb 24, 2007 at 12:23:03AM +0300, Oleg Nesterov wrote:
> If my understanding correct, vmscan can find a page which lives in a already
> anon_vma_unlink'ed vma. This is ok, the page is pinned, and page->mapping is
> not cleared until free_hot_cold_page().
> 
> So page_lock_anon_vma() works correctly due to SLAB_DESTROY_BY_RCU even if
> anon_vma_unlink() has already freed anon_vma. In that case we should see
> list_empty(&anon_vma->head), we are safe.
> 
> However, we are doing spin_unlock(anon_vma->lock) after page_lock_anon_vma(),
> and this looks unsafe to me because page_lock_anon_vma() does rcu_read_unlock()
> on return.

This would indeed be bad when using CONFIG_PREEMPT_RCU!  Good catch!!!

> This worked before because spin_lock() implied rcu_read_lock(), so rcu was
> blocked if page_lock_anon_vma() returns !NULL. With CONFIG_PREEMPT_RCU this
> is not true (yes?), so it is possible that the slab returns the memory to
> the system and it is re-used when we write to anon_vma->lock.
> 
> IOW, don't we need something like this
> 
> 	static struct anon_vma *page_lock_anon_vma(struct page *page)
> 	{
> 		struct anon_vma *anon_vma;
> 		unsigned long anon_mapping;
> 
> 		rcu_read_lock();
> 		anon_mapping = (unsigned long) page->mapping;
> 		if (!(anon_mapping & PAGE_MAPPING_ANON))
> 			goto out;
> 		if (!page_mapped(page))
> 			goto out;
> 
> 		anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
> 		spin_lock(&anon_vma->lock);
> 		return anon_vma;
> 
> 	out:
> 		rcu_read_unlock();
> 		return NULL;
> 	}
> 
> 	static inline void page_lock_anon_vma(struct anon_vma *anon_vma)
> 	{
> 		spin_unlock(&anon_vma->lock);
> 		rcu_read_unlock();
> 	}
> ?

This look like a valid fix to me, at least as long as the lock is never
dropped in the meantime (e.g., to do I/O).  If the lock -is- dropped in
the meantime, then presumably whatever is done to keep the page from
vanishing should allow an rcu_read_unlock() to be placed after each
spin_unlock(&...->lock) and an rcu_read_lock() to be placed before each
spin_lock(&...->lock).

						Thanx, Paul

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: PREEMPT_RCU breaks anon_vma locking ?
  2007-02-23 21:23 PREEMPT_RCU breaks anon_vma locking ? Oleg Nesterov
  2007-02-23 22:41 ` Paul E. McKenney
@ 2007-02-24 22:04 ` Hugh Dickins
  2007-02-24 22:53   ` Paul E. McKenney
                     ` (2 more replies)
  1 sibling, 3 replies; 10+ messages in thread
From: Hugh Dickins @ 2007-02-24 22:04 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Paul E. McKenney, dipankar, Andrew Morton, Christoph Lameter,
	linux-kernel

On Sat, 24 Feb 2007, Oleg Nesterov wrote:

> If my understanding correct, vmscan can find a page which lives in a already
> anon_vma_unlink'ed vma. This is ok, the page is pinned, and page->mapping is
> not cleared until free_hot_cold_page().

That's about right.  The page_mapped checks, at several levels, make
it very hard to hit this case in practice; but it is possible for the
page to be unmapped and the anon_vma unlinked and kmem_cache_freed
while vmscan is racing through page_lock_anon_vma.

> 
> So page_lock_anon_vma() works correctly due to SLAB_DESTROY_BY_RCU even if
> anon_vma_unlink() has already freed anon_vma. In that case we should see
> list_empty(&anon_vma->head), we are safe.

(It doesn't affect your argument, but we won't necessarily see list_empty
there: the anon_vma slot may already have got reused for a different
bundle of vmas completely; but its lock remains a lock and its list
remains a list of vmas, and the worst that happens is that
page_referenced_anon or try_to_unmap_anon wanders through an irrelevant
bundle of vmas, looking for a page that cannot be found there.)

> 
> However, we are doing spin_unlock(anon_vma->lock) after page_lock_anon_vma(),
> and this looks unsafe to me because page_lock_anon_vma() does rcu_read_unlock()
> on return.
> 
> This worked before because spin_lock() implied rcu_read_lock(), so rcu was
> blocked if page_lock_anon_vma() returns !NULL. With CONFIG_PREEMPT_RCU this
> is not true (yes?), so it is possible that the slab returns the memory to
> the system and it is re-used when we write to anon_vma->lock.

I had been meaning to take a fresh look at page_lock_anon_vma, to see
whether recent RCU developments in -mm affected it.  Thanks for saving
me the trouble.

You and Paul know infinitely more about CONFIG_PREEMPT_RCU than I do,
so if you believe that the change below is enough, that's great, it's
much simpler than I'd feared might be needed.  CONFIG_PREEMPT_RCU
rather scares me, but perhaps it's less worrying than I'd imagined.

Have you checked through the SLAB_DESTROY_BY_RCU end in slab.c?
Is what that's doing still valid?

> 
> IOW, don't we need something like this
> 
> 	static struct anon_vma *page_lock_anon_vma(struct page *page)
> 	{
> 		struct anon_vma *anon_vma;
> 		unsigned long anon_mapping;
> 
> 		rcu_read_lock();
> 		anon_mapping = (unsigned long) page->mapping;
> 		if (!(anon_mapping & PAGE_MAPPING_ANON))
> 			goto out;
> 		if (!page_mapped(page))
> 			goto out;
> 
> 		anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
> 		spin_lock(&anon_vma->lock);
> 		return anon_vma;
> 
> 	out:
> 		rcu_read_unlock();
> 		return NULL;
> 	}
> 
> 	static inline void page_lock_anon_vma(struct anon_vma *anon_vma)

It might be wiser to call that one page_unlock_anon_vma ;)

(I'm slightly disgruntled that page_lock_anon_vma takes a struct page *,
but page_unlock_anon_vma no struct page *.  But it would be silly to do
it differently, or mess with the naming: besides, it's a static function
and the prototype guards against error anyway.)

> 	{
> 		spin_unlock(&anon_vma->lock);
> 		rcu_read_unlock();
> 	}
> ?

Looks fine to me: please go ahead a make a patch for -mm, Oleg: thanks.

I've CC'ed Christoph for several reasons.

One, I think he was trying to persuade me a year ago to change
page_lock_anon_vma as you're now proposing; but I resisted, preferring
to keep the RCU stuff self-contained within page_lock_anon_vma: let's
credit him for prescience, and admit you've proved me wrong.

Two, in his SLUB thread it sounds like he's toying with mixing up kmem
caches of similar sizes: that seems a really bad idea to me (for reasons
Andi and others have made) and I expect it'll get dropped; but in case
not, let's note that the anon_vma cache is one which would fare very
badly from getting mixed in with others (lock would no longer remain
a lock, list would no longer remain a list, across that race).

Three, he has a recurring interest in this unusual page_lock_anon_vma,
mainly for page migration reasons, so let's keep him informed.

Hugh

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: PREEMPT_RCU breaks anon_vma locking ?
  2007-02-23 22:41 ` Paul E. McKenney
@ 2007-02-24 22:10   ` Hugh Dickins
  2007-02-24 22:36     ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Hugh Dickins @ 2007-02-24 22:10 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Oleg Nesterov, dipankar, Andrew Morton, linux-kernel

On Fri, 23 Feb 2007, Paul E. McKenney wrote:
> 
> This look like a valid fix to me, at least as long as the lock is never
> dropped in the meantime (e.g., to do I/O).  If the lock -is- dropped in
> the meantime, then presumably whatever is done to keep the page from
> vanishing should allow an rcu_read_unlock() to be placed after each
> spin_unlock(&...->lock) and an rcu_read_lock() to be placed before each
> spin_lock(&...->lock).

Thankfully no complications of that kind, page_lock_anon_vma is static
to mm/rmap.c, and only used to hold the spin lock while examining page
tables of the vmas in the list, never a need to drop that lock at all.
(Until the day when someone reports such a long list that we start to
worry about the latency.)

Hugh

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: PREEMPT_RCU breaks anon_vma locking ?
  2007-02-24 22:10   ` Hugh Dickins
@ 2007-02-24 22:36     ` Paul E. McKenney
  0 siblings, 0 replies; 10+ messages in thread
From: Paul E. McKenney @ 2007-02-24 22:36 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Oleg Nesterov, dipankar, Andrew Morton, linux-kernel

On Sat, Feb 24, 2007 at 10:10:57PM +0000, Hugh Dickins wrote:
> On Fri, 23 Feb 2007, Paul E. McKenney wrote:
> > 
> > This look like a valid fix to me, at least as long as the lock is never
> > dropped in the meantime (e.g., to do I/O).  If the lock -is- dropped in
> > the meantime, then presumably whatever is done to keep the page from
> > vanishing should allow an rcu_read_unlock() to be placed after each
> > spin_unlock(&...->lock) and an rcu_read_lock() to be placed before each
> > spin_lock(&...->lock).
> 
> Thankfully no complications of that kind, page_lock_anon_vma is static
> to mm/rmap.c, and only used to hold the spin lock while examining page
> tables of the vmas in the list, never a need to drop that lock at all.
> (Until the day when someone reports such a long list that we start to
> worry about the latency.)

Whew!!!  For now, anyway!  ;-)

						Thanx, Paul

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: PREEMPT_RCU breaks anon_vma locking ?
  2007-02-24 22:04 ` Hugh Dickins
@ 2007-02-24 22:53   ` Paul E. McKenney
  2007-03-02 16:27     ` Hugh Dickins
  2007-02-25  0:13   ` Christoph Lameter
  2007-02-25 20:05   ` Oleg Nesterov
  2 siblings, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2007-02-24 22:53 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Oleg Nesterov, dipankar, Andrew Morton, Christoph Lameter, linux-kernel

On Sat, Feb 24, 2007 at 10:04:04PM +0000, Hugh Dickins wrote:
> On Sat, 24 Feb 2007, Oleg Nesterov wrote:
>
> Have you checked through the SLAB_DESTROY_BY_RCU end in slab.c?
> Is what that's doing still valid?

The only thing I see needed due to PREEMPT_RCU is the following comment
change.

For a terrified few minutes, I thought that the code assumed that struct
rcu_head was the same size as struct list_head, but it turns out to only
assume that struct slab is at least as large as struct slab_rcu.

						Thanx, Paul

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---

diff -urpNa -X dontdiff linux-2.6.20/mm/slab.c linux-2.6.20-slabrcufix/mm/slab.c
--- linux-2.6.20/mm/slab.c	2007-02-04 10:44:54.000000000 -0800
+++ linux-2.6.20-slabrcufix/mm/slab.c	2007-02-24 14:50:39.000000000 -0800
@@ -238,7 +238,7 @@ struct slab {
  * other kind of object (which our subsystem's lock might corrupt).
  *
  * rcu_read_lock before reading the address, then rcu_read_unlock after
- * taking the spinlock within the structure expected at that address.
+ * releasing the spinlock within the structure expected at that address.
  *
  * We assume struct slab_rcu can overlay struct slab when destroying.
  */

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: PREEMPT_RCU breaks anon_vma locking ?
  2007-02-24 22:04 ` Hugh Dickins
  2007-02-24 22:53   ` Paul E. McKenney
@ 2007-02-25  0:13   ` Christoph Lameter
  2007-02-25 20:05   ` Oleg Nesterov
  2 siblings, 0 replies; 10+ messages in thread
From: Christoph Lameter @ 2007-02-25  0:13 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Oleg Nesterov, Paul E. McKenney, dipankar, Andrew Morton, linux-kernel

On Sat, 24 Feb 2007, Hugh Dickins wrote:

> Two, in his SLUB thread it sounds like he's toying with mixing up kmem
> caches of similar sizes: that seems a really bad idea to me (for reasons
> Andi and others have made) and I expect it'll get dropped; but in case
> not, let's note that the anon_vma cache is one which would fare very
> badly from getting mixed in with others (lock would no longer remain
> a lock, list would no longer remain a list, across that race).

Dont worry DESTROY_BY_RCU slabs wont be mixed with those not marked 
DESTROY_BY_RCU.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: PREEMPT_RCU breaks anon_vma locking ?
  2007-02-24 22:04 ` Hugh Dickins
  2007-02-24 22:53   ` Paul E. McKenney
  2007-02-25  0:13   ` Christoph Lameter
@ 2007-02-25 20:05   ` Oleg Nesterov
  2007-02-26  1:53     ` Paul E. McKenney
  2 siblings, 1 reply; 10+ messages in thread
From: Oleg Nesterov @ 2007-02-25 20:05 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Paul E. McKenney, dipankar, Andrew Morton, Christoph Lameter,
	linux-kernel

On 02/24, Hugh Dickins wrote:
>
> On Sat, 24 Feb 2007, Oleg Nesterov wrote:
> 
> > So page_lock_anon_vma() works correctly due to SLAB_DESTROY_BY_RCU even if
> > anon_vma_unlink() has already freed anon_vma. In that case we should see
> > list_empty(&anon_vma->head), we are safe.
> 
> (It doesn't affect your argument, but we won't necessarily see list_empty
> there: the anon_vma slot may already have got reused for a different
> bundle of vmas completely; but its lock remains a lock and its list
> remains a list of vmas, and the worst that happens is that
> page_referenced_anon or try_to_unmap_anon wanders through an irrelevant
> bundle of vmas, looking for a page that cannot be found there.)

Yes, but in that case we are safe, right? We hold the lock, anon_vma can't be
freed. But thanks for clarification! Somehow I missed that not only unlock()
is unsafe (in theory). If anon_vma's memory was re-used for something else, we
can't assume that we will see list_empty(&anon_vma->head).

> > 	static inline void page_lock_anon_vma(struct anon_vma *anon_vma)
> 
> It might be wiser to call that one page_unlock_anon_vma ;)

Congratulations, you passed the test! Paul didn't :)

> (I'm slightly disgruntled that page_lock_anon_vma takes a struct page *,
> but page_unlock_anon_vma no struct page *.  But it would be silly to do
> it differently, or mess with the naming: besides, it's a static function
> and the prototype guards against error anyway.)

OK. I thought about "unlock_anon_vma", but symmetry is good indeed.

Oleg.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: PREEMPT_RCU breaks anon_vma locking ?
  2007-02-25 20:05   ` Oleg Nesterov
@ 2007-02-26  1:53     ` Paul E. McKenney
  0 siblings, 0 replies; 10+ messages in thread
From: Paul E. McKenney @ 2007-02-26  1:53 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Hugh Dickins, dipankar, Andrew Morton, Christoph Lameter, linux-kernel

On Sun, Feb 25, 2007 at 11:05:50PM +0300, Oleg Nesterov wrote:
> On 02/24, Hugh Dickins wrote:
> >
> > On Sat, 24 Feb 2007, Oleg Nesterov wrote:
> > 
> > > So page_lock_anon_vma() works correctly due to SLAB_DESTROY_BY_RCU even if
> > > anon_vma_unlink() has already freed anon_vma. In that case we should see
> > > list_empty(&anon_vma->head), we are safe.
> > 
> > (It doesn't affect your argument, but we won't necessarily see list_empty
> > there: the anon_vma slot may already have got reused for a different
> > bundle of vmas completely; but its lock remains a lock and its list
> > remains a list of vmas, and the worst that happens is that
> > page_referenced_anon or try_to_unmap_anon wanders through an irrelevant
> > bundle of vmas, looking for a page that cannot be found there.)
> 
> Yes, but in that case we are safe, right? We hold the lock, anon_vma can't be
> freed. But thanks for clarification! Somehow I missed that not only unlock()
> is unsafe (in theory). If anon_vma's memory was re-used for something else, we
> can't assume that we will see list_empty(&anon_vma->head).
> 
> > > 	static inline void page_lock_anon_vma(struct anon_vma *anon_vma)
> > 
> > It might be wiser to call that one page_unlock_anon_vma ;)
> 
> Congratulations, you passed the test! Paul didn't :)

What is in a name?  ;-)

						Thanx, Paul

> > (I'm slightly disgruntled that page_lock_anon_vma takes a struct page *,
> > but page_unlock_anon_vma no struct page *.  But it would be silly to do
> > it differently, or mess with the naming: besides, it's a static function
> > and the prototype guards against error anyway.)
> 
> OK. I thought about "unlock_anon_vma", but symmetry is good indeed.
> 
> Oleg.
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: PREEMPT_RCU breaks anon_vma locking ?
  2007-02-24 22:53   ` Paul E. McKenney
@ 2007-03-02 16:27     ` Hugh Dickins
  0 siblings, 0 replies; 10+ messages in thread
From: Hugh Dickins @ 2007-03-02 16:27 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Oleg Nesterov, dipankar, Andrew Morton, Christoph Lameter, linux-kernel

On Sat, 24 Feb 2007, Paul E. McKenney wrote:
> On Sat, Feb 24, 2007 at 10:04:04PM +0000, Hugh Dickins wrote:
> >
> > Have you checked through the SLAB_DESTROY_BY_RCU end in slab.c?
> > Is what that's doing still valid?
> 
> The only thing I see needed due to PREEMPT_RCU is the following comment
> change.
> 
> For a terrified few minutes, I thought that the code assumed that struct
> rcu_head was the same size as struct list_head, but it turns out to only
> assume that struct slab is at least as large as struct slab_rcu.
> 
> 						Thanx, Paul

Thanks for enduring the terror, checking it out, and arriving at
such a reassuring conclusion.  Andrew, please add this to your -mm
collection after (or folded into) Paul's rcu-preemptible-rcu.patch.


PREEMPT_RCU has stricter needs: updated comment on SLAB_DESTROY_BY_RCU.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
---

diff -urpNa -X dontdiff linux-2.6.20/mm/slab.c linux-2.6.20-slabrcufix/mm/slab.c
--- linux-2.6.20/mm/slab.c	2007-02-04 10:44:54.000000000 -0800
+++ linux-2.6.20-slabrcufix/mm/slab.c	2007-02-24 14:50:39.000000000 -0800
@@ -238,7 +238,7 @@ struct slab {
  * other kind of object (which our subsystem's lock might corrupt).
  *
  * rcu_read_lock before reading the address, then rcu_read_unlock after
- * taking the spinlock within the structure expected at that address.
+ * releasing the spinlock within the structure expected at that address.
  *
  * We assume struct slab_rcu can overlay struct slab when destroying.
  */

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2007-03-02 16:27 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-02-23 21:23 PREEMPT_RCU breaks anon_vma locking ? Oleg Nesterov
2007-02-23 22:41 ` Paul E. McKenney
2007-02-24 22:10   ` Hugh Dickins
2007-02-24 22:36     ` Paul E. McKenney
2007-02-24 22:04 ` Hugh Dickins
2007-02-24 22:53   ` Paul E. McKenney
2007-03-02 16:27     ` Hugh Dickins
2007-02-25  0:13   ` Christoph Lameter
2007-02-25 20:05   ` Oleg Nesterov
2007-02-26  1:53     ` Paul E. McKenney

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.