All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mm: fix hang on anon_vma->root->lock
@ 2010-08-26  6:12 ` Hugh Dickins
  0 siblings, 0 replies; 44+ messages in thread
From: Hugh Dickins @ 2010-08-26  6:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Rik van Riel, Andrea Arcangeli, linux-kernel, linux-mm

After several hours, kbuild tests hang with anon_vma_prepare() spinning on
a newly allocated anon_vma's lock - on a box with CONFIG_TREE_PREEMPT_RCU=y
(which makes this very much more likely, but it could happen without).

The ever-subtle page_lock_anon_vma() now needs a further twist: since
anon_vma_prepare() and anon_vma_fork() are liable to change the ->root
of a reused anon_vma structure at any moment, page_lock_anon_vma()
needs to check page_mapped() again before succeeding, otherwise
page_unlock_anon_vma() might address a different root->lock.

Signed-off-by: Hugh Dickins <hughd@google.com>
---

 mm/rmap.c |   19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

--- 2.6.36-rc2/mm/rmap.c	2010-08-16 00:18:01.000000000 -0700
+++ linux/mm/rmap.c	2010-08-24 03:22:17.000000000 -0700
@@ -316,7 +316,7 @@ void __init anon_vma_init(void)
  */
 struct anon_vma *page_lock_anon_vma(struct page *page)
 {
-	struct anon_vma *anon_vma;
+	struct anon_vma *anon_vma, *root_anon_vma;
 	unsigned long anon_mapping;
 
 	rcu_read_lock();
@@ -327,8 +327,21 @@ struct anon_vma *page_lock_anon_vma(stru
 		goto out;
 
 	anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
-	anon_vma_lock(anon_vma);
-	return anon_vma;
+	root_anon_vma = ACCESS_ONCE(anon_vma->root);
+	spin_lock(&root_anon_vma->lock);
+
+	/*
+	 * If this page is still mapped, then its anon_vma cannot have been
+	 * freed.  But if it has been unmapped, we have no security against
+	 * the anon_vma structure being freed and reused (for another anon_vma:
+	 * SLAB_DESTROY_BY_RCU guarantees that - so the spin_lock above cannot
+	 * corrupt): with anon_vma_prepare() or anon_vma_fork() redirecting
+	 * anon_vma->root before page_unlock_anon_vma() is called to unlock.
+	 */
+	if (page_mapped(page))
+		return anon_vma;
+
+	spin_unlock(&root_anon_vma->lock);
 out:
 	rcu_read_unlock();
 	return NULL;

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH] mm: fix hang on anon_vma->root->lock
@ 2010-08-26  6:12 ` Hugh Dickins
  0 siblings, 0 replies; 44+ messages in thread
From: Hugh Dickins @ 2010-08-26  6:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Rik van Riel, Andrea Arcangeli, linux-kernel, linux-mm

After several hours, kbuild tests hang with anon_vma_prepare() spinning on
a newly allocated anon_vma's lock - on a box with CONFIG_TREE_PREEMPT_RCU=y
(which makes this very much more likely, but it could happen without).

The ever-subtle page_lock_anon_vma() now needs a further twist: since
anon_vma_prepare() and anon_vma_fork() are liable to change the ->root
of a reused anon_vma structure at any moment, page_lock_anon_vma()
needs to check page_mapped() again before succeeding, otherwise
page_unlock_anon_vma() might address a different root->lock.

Signed-off-by: Hugh Dickins <hughd@google.com>
---

 mm/rmap.c |   19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

--- 2.6.36-rc2/mm/rmap.c	2010-08-16 00:18:01.000000000 -0700
+++ linux/mm/rmap.c	2010-08-24 03:22:17.000000000 -0700
@@ -316,7 +316,7 @@ void __init anon_vma_init(void)
  */
 struct anon_vma *page_lock_anon_vma(struct page *page)
 {
-	struct anon_vma *anon_vma;
+	struct anon_vma *anon_vma, *root_anon_vma;
 	unsigned long anon_mapping;
 
 	rcu_read_lock();
@@ -327,8 +327,21 @@ struct anon_vma *page_lock_anon_vma(stru
 		goto out;
 
 	anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
-	anon_vma_lock(anon_vma);
-	return anon_vma;
+	root_anon_vma = ACCESS_ONCE(anon_vma->root);
+	spin_lock(&root_anon_vma->lock);
+
+	/*
+	 * If this page is still mapped, then its anon_vma cannot have been
+	 * freed.  But if it has been unmapped, we have no security against
+	 * the anon_vma structure being freed and reused (for another anon_vma:
+	 * SLAB_DESTROY_BY_RCU guarantees that - so the spin_lock above cannot
+	 * corrupt): with anon_vma_prepare() or anon_vma_fork() redirecting
+	 * anon_vma->root before page_unlock_anon_vma() is called to unlock.
+	 */
+	if (page_mapped(page))
+		return anon_vma;
+
+	spin_unlock(&root_anon_vma->lock);
 out:
 	rcu_read_unlock();
 	return NULL;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
  2010-08-26  6:12 ` Hugh Dickins
@ 2010-08-26  6:41   ` David Miller
  -1 siblings, 0 replies; 44+ messages in thread
From: David Miller @ 2010-08-26  6:41 UTC (permalink / raw)
  To: hughd; +Cc: torvalds, akpm, riel, aarcange, linux-kernel, linux-mm

From: Hugh Dickins <hughd@google.com>
Date: Wed, 25 Aug 2010 23:12:54 -0700 (PDT)

> After several hours, kbuild tests hang with anon_vma_prepare() spinning on
> a newly allocated anon_vma's lock - on a box with CONFIG_TREE_PREEMPT_RCU=y
> (which makes this very much more likely, but it could happen without).
> 
> The ever-subtle page_lock_anon_vma() now needs a further twist: since
> anon_vma_prepare() and anon_vma_fork() are liable to change the ->root
> of a reused anon_vma structure at any moment, page_lock_anon_vma()
> needs to check page_mapped() again before succeeding, otherwise
> page_unlock_anon_vma() might address a different root->lock.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>

Interesting, is the condition which allows this to trigger specific
to this merge window or was it always possible?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
@ 2010-08-26  6:41   ` David Miller
  0 siblings, 0 replies; 44+ messages in thread
From: David Miller @ 2010-08-26  6:41 UTC (permalink / raw)
  To: hughd; +Cc: torvalds, akpm, riel, aarcange, linux-kernel, linux-mm

From: Hugh Dickins <hughd@google.com>
Date: Wed, 25 Aug 2010 23:12:54 -0700 (PDT)

> After several hours, kbuild tests hang with anon_vma_prepare() spinning on
> a newly allocated anon_vma's lock - on a box with CONFIG_TREE_PREEMPT_RCU=y
> (which makes this very much more likely, but it could happen without).
> 
> The ever-subtle page_lock_anon_vma() now needs a further twist: since
> anon_vma_prepare() and anon_vma_fork() are liable to change the ->root
> of a reused anon_vma structure at any moment, page_lock_anon_vma()
> needs to check page_mapped() again before succeeding, otherwise
> page_unlock_anon_vma() might address a different root->lock.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>

Interesting, is the condition which allows this to trigger specific
to this merge window or was it always possible?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
  2010-08-26  6:41   ` David Miller
@ 2010-08-26 10:54     ` Hugh Dickins
  -1 siblings, 0 replies; 44+ messages in thread
From: Hugh Dickins @ 2010-08-26 10:54 UTC (permalink / raw)
  To: David Miller; +Cc: torvalds, akpm, riel, aarcange, linux-kernel, linux-mm

On Wed, Aug 25, 2010 at 11:41 PM, David Miller <davem@davemloft.net> wrote:
> From: Hugh Dickins <hughd@google.com>
> Date: Wed, 25 Aug 2010 23:12:54 -0700 (PDT)
>
>> After several hours, kbuild tests hang with anon_vma_prepare() spinning on
>> a newly allocated anon_vma's lock - on a box with CONFIG_TREE_PREEMPT_RCU=y
>> (which makes this very much more likely, but it could happen without).
>>
>> The ever-subtle page_lock_anon_vma() now needs a further twist: since
>> anon_vma_prepare() and anon_vma_fork() are liable to change the ->root
>> of a reused anon_vma structure at any moment, page_lock_anon_vma()
>> needs to check page_mapped() again before succeeding, otherwise
>> page_unlock_anon_vma() might address a different root->lock.
>>
>> Signed-off-by: Hugh Dickins <hughd@google.com>
>
> Interesting, is the condition which allows this to trigger specific
> to this merge window or was it always possible?

Just specific to this merge window, which started using
anon_vma->root->lock in place of anon_vma->lock (anon_vma->root is
often anon_vma itself, but not always).  I _think_ that change was
itself a simplification of the locking in 2.6.35, rather than plugging
a particular hole (it's not been backported to -stable), but I may be
wrong on that - Rik?

Hugh

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
@ 2010-08-26 10:54     ` Hugh Dickins
  0 siblings, 0 replies; 44+ messages in thread
From: Hugh Dickins @ 2010-08-26 10:54 UTC (permalink / raw)
  To: David Miller; +Cc: torvalds, akpm, riel, aarcange, linux-kernel, linux-mm

On Wed, Aug 25, 2010 at 11:41 PM, David Miller <davem@davemloft.net> wrote:
> From: Hugh Dickins <hughd@google.com>
> Date: Wed, 25 Aug 2010 23:12:54 -0700 (PDT)
>
>> After several hours, kbuild tests hang with anon_vma_prepare() spinning on
>> a newly allocated anon_vma's lock - on a box with CONFIG_TREE_PREEMPT_RCU=y
>> (which makes this very much more likely, but it could happen without).
>>
>> The ever-subtle page_lock_anon_vma() now needs a further twist: since
>> anon_vma_prepare() and anon_vma_fork() are liable to change the ->root
>> of a reused anon_vma structure at any moment, page_lock_anon_vma()
>> needs to check page_mapped() again before succeeding, otherwise
>> page_unlock_anon_vma() might address a different root->lock.
>>
>> Signed-off-by: Hugh Dickins <hughd@google.com>
>
> Interesting, is the condition which allows this to trigger specific
> to this merge window or was it always possible?

Just specific to this merge window, which started using
anon_vma->root->lock in place of anon_vma->lock (anon_vma->root is
often anon_vma itself, but not always).  I _think_ that change was
itself a simplification of the locking in 2.6.35, rather than plugging
a particular hole (it's not been backported to -stable), but I may be
wrong on that - Rik?

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
  2010-08-26  6:12 ` Hugh Dickins
@ 2010-08-26 13:32   ` Rik van Riel
  -1 siblings, 0 replies; 44+ messages in thread
From: Rik van Riel @ 2010-08-26 13:32 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Andrew Morton, Andrea Arcangeli, linux-kernel, linux-mm

On 08/26/2010 02:12 AM, Hugh Dickins wrote:
> After several hours, kbuild tests hang with anon_vma_prepare() spinning on
> a newly allocated anon_vma's lock - on a box with CONFIG_TREE_PREEMPT_RCU=y
> (which makes this very much more likely, but it could happen without).
>
> The ever-subtle page_lock_anon_vma() now needs a further twist: since
> anon_vma_prepare() and anon_vma_fork() are liable to change the ->root
> of a reused anon_vma structure at any moment, page_lock_anon_vma()
> needs to check page_mapped() again before succeeding, otherwise
> page_unlock_anon_vma() might address a different root->lock.
>
> Signed-off-by: Hugh Dickins<hughd@google.com>

Reviewed-by: Rik van Riel <riel@redhat.com>

And yes, AFAIK this code lived just in -mm up to now.

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
@ 2010-08-26 13:32   ` Rik van Riel
  0 siblings, 0 replies; 44+ messages in thread
From: Rik van Riel @ 2010-08-26 13:32 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Andrew Morton, Andrea Arcangeli, linux-kernel, linux-mm

On 08/26/2010 02:12 AM, Hugh Dickins wrote:
> After several hours, kbuild tests hang with anon_vma_prepare() spinning on
> a newly allocated anon_vma's lock - on a box with CONFIG_TREE_PREEMPT_RCU=y
> (which makes this very much more likely, but it could happen without).
>
> The ever-subtle page_lock_anon_vma() now needs a further twist: since
> anon_vma_prepare() and anon_vma_fork() are liable to change the ->root
> of a reused anon_vma structure at any moment, page_lock_anon_vma()
> needs to check page_mapped() again before succeeding, otherwise
> page_unlock_anon_vma() might address a different root->lock.
>
> Signed-off-by: Hugh Dickins<hughd@google.com>

Reviewed-by: Rik van Riel <riel@redhat.com>

And yes, AFAIK this code lived just in -mm up to now.

-- 
All rights reversed

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
  2010-08-26 10:54     ` Hugh Dickins
@ 2010-08-26 19:00       ` David Miller
  -1 siblings, 0 replies; 44+ messages in thread
From: David Miller @ 2010-08-26 19:00 UTC (permalink / raw)
  To: hughd; +Cc: torvalds, akpm, riel, aarcange, linux-kernel, linux-mm

From: Hugh Dickins <hughd@google.com>
Date: Thu, 26 Aug 2010 03:54:28 -0700

> On Wed, Aug 25, 2010 at 11:41 PM, David Miller <davem@davemloft.net> wrote:
>> From: Hugh Dickins <hughd@google.com>
>> Date: Wed, 25 Aug 2010 23:12:54 -0700 (PDT)
>>
>>> After several hours, kbuild tests hang with anon_vma_prepare() spinning on
>>> a newly allocated anon_vma's lock - on a box with CONFIG_TREE_PREEMPT_RCU=y
>>> (which makes this very much more likely, but it could happen without).
>>>
>>> The ever-subtle page_lock_anon_vma() now needs a further twist: since
>>> anon_vma_prepare() and anon_vma_fork() are liable to change the ->root
>>> of a reused anon_vma structure at any moment, page_lock_anon_vma()
>>> needs to check page_mapped() again before succeeding, otherwise
>>> page_unlock_anon_vma() might address a different root->lock.
>>>
>>> Signed-off-by: Hugh Dickins <hughd@google.com>
>>
>> Interesting, is the condition which allows this to trigger specific
>> to this merge window or was it always possible?
> 
> Just specific to this merge window, which started using
> anon_vma->root->lock in place of anon_vma->lock (anon_vma->root is
> often anon_vma itself, but not always).

Ok, because I've been seeing mysterious hangs recently on one of my
sparc64 boxes and I was wondering if this might explain them :-)


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
@ 2010-08-26 19:00       ` David Miller
  0 siblings, 0 replies; 44+ messages in thread
From: David Miller @ 2010-08-26 19:00 UTC (permalink / raw)
  To: hughd; +Cc: torvalds, akpm, riel, aarcange, linux-kernel, linux-mm

From: Hugh Dickins <hughd@google.com>
Date: Thu, 26 Aug 2010 03:54:28 -0700

> On Wed, Aug 25, 2010 at 11:41 PM, David Miller <davem@davemloft.net> wrote:
>> From: Hugh Dickins <hughd@google.com>
>> Date: Wed, 25 Aug 2010 23:12:54 -0700 (PDT)
>>
>>> After several hours, kbuild tests hang with anon_vma_prepare() spinning on
>>> a newly allocated anon_vma's lock - on a box with CONFIG_TREE_PREEMPT_RCU=y
>>> (which makes this very much more likely, but it could happen without).
>>>
>>> The ever-subtle page_lock_anon_vma() now needs a further twist: since
>>> anon_vma_prepare() and anon_vma_fork() are liable to change the ->root
>>> of a reused anon_vma structure at any moment, page_lock_anon_vma()
>>> needs to check page_mapped() again before succeeding, otherwise
>>> page_unlock_anon_vma() might address a different root->lock.
>>>
>>> Signed-off-by: Hugh Dickins <hughd@google.com>
>>
>> Interesting, is the condition which allows this to trigger specific
>> to this merge window or was it always possible?
> 
> Just specific to this merge window, which started using
> anon_vma->root->lock in place of anon_vma->lock (anon_vma->root is
> often anon_vma itself, but not always).

Ok, because I've been seeing mysterious hangs recently on one of my
sparc64 boxes and I was wondering if this might explain them :-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
  2010-08-26  6:12 ` Hugh Dickins
@ 2010-08-26 23:50   ` Andrea Arcangeli
  -1 siblings, 0 replies; 44+ messages in thread
From: Andrea Arcangeli @ 2010-08-26 23:50 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Andrew Morton, Rik van Riel, linux-kernel, linux-mm

Hi Hugh,

On Wed, Aug 25, 2010 at 11:12:54PM -0700, Hugh Dickins wrote:
> After several hours, kbuild tests hang with anon_vma_prepare() spinning on
> a newly allocated anon_vma's lock - on a box with CONFIG_TREE_PREEMPT_RCU=y
> (which makes this very much more likely, but it could happen without).
> 
> The ever-subtle page_lock_anon_vma() now needs a further twist: since
> anon_vma_prepare() and anon_vma_fork() are liable to change the ->root
> of a reused anon_vma structure at any moment, page_lock_anon_vma()
> needs to check page_mapped() again before succeeding, otherwise
> page_unlock_anon_vma() might address a different root->lock.

I don't get it, the anon_vma can be freed and reused only after we run
rcu_read_unlock(). And the anon_vma->root can't change unless the
anon_vma is freed and reused. Last but not the least by the time
page->mapping points to "anon_vma" the "anon_vma->root" is already
initialized and stable.

The page_mapped test is only relevant against the rcu_read_lock, not
the spin_lock, so how it can make a difference to run it twice inside
the same rcu_read_lock protected critical section? The first one still
is valid also after the anon_vma_lock() returns, it's not like that
anon_vma_lock drops the rcu_read_lock internally.

Furthermore no need of ACCESS_ONCE on the anon_vma->root because it
can't change from under us as the anon_vma can't be freed from under
us until rcu_read_unlock returns (after we verified the first time
that page_mapped is true under the rcu_read_lock, which we already do
before trying to take the anon_vma_lock).

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
@ 2010-08-26 23:50   ` Andrea Arcangeli
  0 siblings, 0 replies; 44+ messages in thread
From: Andrea Arcangeli @ 2010-08-26 23:50 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Andrew Morton, Rik van Riel, linux-kernel, linux-mm

Hi Hugh,

On Wed, Aug 25, 2010 at 11:12:54PM -0700, Hugh Dickins wrote:
> After several hours, kbuild tests hang with anon_vma_prepare() spinning on
> a newly allocated anon_vma's lock - on a box with CONFIG_TREE_PREEMPT_RCU=y
> (which makes this very much more likely, but it could happen without).
> 
> The ever-subtle page_lock_anon_vma() now needs a further twist: since
> anon_vma_prepare() and anon_vma_fork() are liable to change the ->root
> of a reused anon_vma structure at any moment, page_lock_anon_vma()
> needs to check page_mapped() again before succeeding, otherwise
> page_unlock_anon_vma() might address a different root->lock.

I don't get it, the anon_vma can be freed and reused only after we run
rcu_read_unlock(). And the anon_vma->root can't change unless the
anon_vma is freed and reused. Last but not the least by the time
page->mapping points to "anon_vma" the "anon_vma->root" is already
initialized and stable.

The page_mapped test is only relevant against the rcu_read_lock, not
the spin_lock, so how it can make a difference to run it twice inside
the same rcu_read_lock protected critical section? The first one still
is valid also after the anon_vma_lock() returns, it's not like that
anon_vma_lock drops the rcu_read_lock internally.

Furthermore no need of ACCESS_ONCE on the anon_vma->root because it
can't change from under us as the anon_vma can't be freed from under
us until rcu_read_unlock returns (after we verified the first time
that page_mapped is true under the rcu_read_lock, which we already do
before trying to take the anon_vma_lock).

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
  2010-08-26 10:54     ` Hugh Dickins
@ 2010-08-27  0:19       ` Andrea Arcangeli
  -1 siblings, 0 replies; 44+ messages in thread
From: Andrea Arcangeli @ 2010-08-27  0:19 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: David Miller, torvalds, akpm, riel, linux-kernel, linux-mm

On Thu, Aug 26, 2010 at 03:54:28AM -0700, Hugh Dickins wrote:
> On Wed, Aug 25, 2010 at 11:41 PM, David Miller <davem@davemloft.net> wrote:
> > From: Hugh Dickins <hughd@google.com>
> > Date: Wed, 25 Aug 2010 23:12:54 -0700 (PDT)
> >
> >> After several hours, kbuild tests hang with anon_vma_prepare() spinning on
> >> a newly allocated anon_vma's lock - on a box with CONFIG_TREE_PREEMPT_RCU=y
> >> (which makes this very much more likely, but it could happen without).
> >>
> >> The ever-subtle page_lock_anon_vma() now needs a further twist: since
> >> anon_vma_prepare() and anon_vma_fork() are liable to change the ->root
> >> of a reused anon_vma structure at any moment, page_lock_anon_vma()
> >> needs to check page_mapped() again before succeeding, otherwise
> >> page_unlock_anon_vma() might address a different root->lock.
> >>
> >> Signed-off-by: Hugh Dickins <hughd@google.com>
> >
> > Interesting, is the condition which allows this to trigger specific
> > to this merge window or was it always possible?
> 
> Just specific to this merge window, which started using
> anon_vma->root->lock in place of anon_vma->lock (anon_vma->root is
> often anon_vma itself, but not always).  I _think_ that change was
> itself a simplification of the locking in 2.6.35, rather than plugging
> a particular hole (it's not been backported to -stable), but I may be
> wrong on that - Rik?

rmap_walk_anon isn't stable without the anon_vma->root->lock. This is
because it only locks the local anon_vma and but the "vma" can be
still modified under it in vma_adjust, so vma_address can fail and
migration will crash. swapping is not issue as it's ok to miss a pte
once in a while if the vma is under vma_adjust by the time the VM
tries to unmap the page.

See the anon_vma_lock added as well to vma_adjust (without it, it'd be
just an useless exercise, but instead it's really plugging a real hole
thanks to the anon_vma_lock addition to vma_adjust).

Again not an issue unless you use migration (incidentally what memory
compaction uses, and you know who the primary user of memory
compaction is :).

Secondly ksm_does_need_to_copy can't be fixed to cow only pages that
are nonlinear as the page->mapping != anon_vma can now generate false
positives. It's now trivial to fix with page->mapping->root !=
anon_vma->root.

About your patch, it's a noop in my view... A single page_mapping
check after rcu_read_lock is enough. And "anon_vma->root" can't change
if page->mapping points to "anon_vma" at any time after rcu_read_lock
returns. rcu_read_lock works for all anon_vma including
anon_vma->root.

If you were running the page_mapped() check on a different "page" then
it could make a difference, but repeating it on the same page in the
same rcu_read_lock protected critical section won't make a difference
as far as the anon_vma freeing is concerned.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
@ 2010-08-27  0:19       ` Andrea Arcangeli
  0 siblings, 0 replies; 44+ messages in thread
From: Andrea Arcangeli @ 2010-08-27  0:19 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: David Miller, torvalds, akpm, riel, linux-kernel, linux-mm

On Thu, Aug 26, 2010 at 03:54:28AM -0700, Hugh Dickins wrote:
> On Wed, Aug 25, 2010 at 11:41 PM, David Miller <davem@davemloft.net> wrote:
> > From: Hugh Dickins <hughd@google.com>
> > Date: Wed, 25 Aug 2010 23:12:54 -0700 (PDT)
> >
> >> After several hours, kbuild tests hang with anon_vma_prepare() spinning on
> >> a newly allocated anon_vma's lock - on a box with CONFIG_TREE_PREEMPT_RCU=y
> >> (which makes this very much more likely, but it could happen without).
> >>
> >> The ever-subtle page_lock_anon_vma() now needs a further twist: since
> >> anon_vma_prepare() and anon_vma_fork() are liable to change the ->root
> >> of a reused anon_vma structure at any moment, page_lock_anon_vma()
> >> needs to check page_mapped() again before succeeding, otherwise
> >> page_unlock_anon_vma() might address a different root->lock.
> >>
> >> Signed-off-by: Hugh Dickins <hughd@google.com>
> >
> > Interesting, is the condition which allows this to trigger specific
> > to this merge window or was it always possible?
> 
> Just specific to this merge window, which started using
> anon_vma->root->lock in place of anon_vma->lock (anon_vma->root is
> often anon_vma itself, but not always).  I _think_ that change was
> itself a simplification of the locking in 2.6.35, rather than plugging
> a particular hole (it's not been backported to -stable), but I may be
> wrong on that - Rik?

rmap_walk_anon isn't stable without the anon_vma->root->lock. This is
because it only locks the local anon_vma and but the "vma" can be
still modified under it in vma_adjust, so vma_address can fail and
migration will crash. swapping is not issue as it's ok to miss a pte
once in a while if the vma is under vma_adjust by the time the VM
tries to unmap the page.

See the anon_vma_lock added as well to vma_adjust (without it, it'd be
just an useless exercise, but instead it's really plugging a real hole
thanks to the anon_vma_lock addition to vma_adjust).

Again not an issue unless you use migration (incidentally what memory
compaction uses, and you know who the primary user of memory
compaction is :).

Secondly ksm_does_need_to_copy can't be fixed to cow only pages that
are nonlinear as the page->mapping != anon_vma can now generate false
positives. It's now trivial to fix with page->mapping->root !=
anon_vma->root.

About your patch, it's a noop in my view... A single page_mapping
check after rcu_read_lock is enough. And "anon_vma->root" can't change
if page->mapping points to "anon_vma" at any time after rcu_read_lock
returns. rcu_read_lock works for all anon_vma including
anon_vma->root.

If you were running the page_mapped() check on a different "page" then
it could make a difference, but repeating it on the same page in the
same rcu_read_lock protected critical section won't make a difference
as far as the anon_vma freeing is concerned.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
  2010-08-26 23:50   ` Andrea Arcangeli
@ 2010-08-27  1:43     ` Hugh Dickins
  -1 siblings, 0 replies; 44+ messages in thread
From: Hugh Dickins @ 2010-08-27  1:43 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Linus Torvalds, Andrew Morton, Rik van Riel, linux-kernel, linux-mm

On Thu, Aug 26, 2010 at 4:50 PM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> On Wed, Aug 25, 2010 at 11:12:54PM -0700, Hugh Dickins wrote:
>> After several hours, kbuild tests hang with anon_vma_prepare() spinning on
>> a newly allocated anon_vma's lock - on a box with CONFIG_TREE_PREEMPT_RCU=y
>> (which makes this very much more likely, but it could happen without).
>>
>> The ever-subtle page_lock_anon_vma() now needs a further twist: since
>> anon_vma_prepare() and anon_vma_fork() are liable to change the ->root
>> of a reused anon_vma structure at any moment, page_lock_anon_vma()
>> needs to check page_mapped() again before succeeding, otherwise
>> page_unlock_anon_vma() might address a different root->lock.
>
> I don't get it, the anon_vma can be freed and reused only after we run
> rcu_read_unlock().

No.  Between rcu_read_lock and rcu_read_unlock, once we've done the
first (original) page_mapped test to make sure that this isn't just a
long-stale page->mapping left over in there, SLAB_DESTROY_BY_RCU
ensures that the slab page on which this "struct anon_vma" resides
cannot be freed and reused for some other purpose e.g. a page of user
data.  But that piece of slab holding this "struct anon_vma" is liable
to be freed and reused for another struct anon_vma at any point, until
we've got the right lock on it.

> And the anon_vma->root can't change unless the
> anon_vma is freed and reused.

Yes, but RCU is not protecting against that: all it's doing is
guaranteeing that when we "speculatively" spin_lock the anon_vma, we
won't be corrupting some other kind of structure or data.

> Last but not the least by the time
> page->mapping points to "anon_vma" the "anon_vma->root" is already
> initialized and stable.

Yes, but two things to be careful of there: one, we leave
page->mapping pointing to the anon_vma maybe long after that address
has got reused for something else, it's only when the page is finally
freed that it's cleared (and there certainly was a good racing reason
for that, but I'd have to think long and hard to reconstruct the
sequence - OTOH it was a race between page_remove_rmap and
page_add_anon_rmap); and two, notice how anon_vma_prepare() sets
anon_vma->root = anon_vma on a newly allocated anon_vma, before it
gets anon_vma_lock - so anon_vma->root can change underneath us at any
point there, until we've got anon_vma_lock _and_ checked stability
with a second page_mapped test.

>
> The page_mapped test is only relevant against the rcu_read_lock, not
> the spin_lock, so how it can make a difference to run it twice inside
> the same rcu_read_lock protected critical section? The first one still
> is valid also after the anon_vma_lock() returns, it's not like that
> anon_vma_lock drops the rcu_read_lock internally.
>
> Furthermore no need of ACCESS_ONCE on the anon_vma->root because it
> can't change from under us as the anon_vma can't be freed from under
> us until rcu_read_unlock returns (after we verified the first time
> that page_mapped is true under the rcu_read_lock, which we already do
> before trying to take the anon_vma_lock).

I must rush off for a few hours: hopefully my assertions above shed
some light., I think you're mistaking the role that RCU plays here.
You remark in other mail "About your patch, it's a noop in my view..."
- but seems quite an effective noop ;)  Without the patch my kbuilds
would hang usually within 6 hours - though one time they did manage 20
hours, I admit. They ran with the patch for 52 hours before I switched
the machine over to something else.

Hugh

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
@ 2010-08-27  1:43     ` Hugh Dickins
  0 siblings, 0 replies; 44+ messages in thread
From: Hugh Dickins @ 2010-08-27  1:43 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Linus Torvalds, Andrew Morton, Rik van Riel, linux-kernel, linux-mm

On Thu, Aug 26, 2010 at 4:50 PM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> On Wed, Aug 25, 2010 at 11:12:54PM -0700, Hugh Dickins wrote:
>> After several hours, kbuild tests hang with anon_vma_prepare() spinning on
>> a newly allocated anon_vma's lock - on a box with CONFIG_TREE_PREEMPT_RCU=y
>> (which makes this very much more likely, but it could happen without).
>>
>> The ever-subtle page_lock_anon_vma() now needs a further twist: since
>> anon_vma_prepare() and anon_vma_fork() are liable to change the ->root
>> of a reused anon_vma structure at any moment, page_lock_anon_vma()
>> needs to check page_mapped() again before succeeding, otherwise
>> page_unlock_anon_vma() might address a different root->lock.
>
> I don't get it, the anon_vma can be freed and reused only after we run
> rcu_read_unlock().

No.  Between rcu_read_lock and rcu_read_unlock, once we've done the
first (original) page_mapped test to make sure that this isn't just a
long-stale page->mapping left over in there, SLAB_DESTROY_BY_RCU
ensures that the slab page on which this "struct anon_vma" resides
cannot be freed and reused for some other purpose e.g. a page of user
data.  But that piece of slab holding this "struct anon_vma" is liable
to be freed and reused for another struct anon_vma at any point, until
we've got the right lock on it.

> And the anon_vma->root can't change unless the
> anon_vma is freed and reused.

Yes, but RCU is not protecting against that: all it's doing is
guaranteeing that when we "speculatively" spin_lock the anon_vma, we
won't be corrupting some other kind of structure or data.

> Last but not the least by the time
> page->mapping points to "anon_vma" the "anon_vma->root" is already
> initialized and stable.

Yes, but two things to be careful of there: one, we leave
page->mapping pointing to the anon_vma maybe long after that address
has got reused for something else, it's only when the page is finally
freed that it's cleared (and there certainly was a good racing reason
for that, but I'd have to think long and hard to reconstruct the
sequence - OTOH it was a race between page_remove_rmap and
page_add_anon_rmap); and two, notice how anon_vma_prepare() sets
anon_vma->root = anon_vma on a newly allocated anon_vma, before it
gets anon_vma_lock - so anon_vma->root can change underneath us at any
point there, until we've got anon_vma_lock _and_ checked stability
with a second page_mapped test.

>
> The page_mapped test is only relevant against the rcu_read_lock, not
> the spin_lock, so how it can make a difference to run it twice inside
> the same rcu_read_lock protected critical section? The first one still
> is valid also after the anon_vma_lock() returns, it's not like that
> anon_vma_lock drops the rcu_read_lock internally.
>
> Furthermore no need of ACCESS_ONCE on the anon_vma->root because it
> can't change from under us as the anon_vma can't be freed from under
> us until rcu_read_unlock returns (after we verified the first time
> that page_mapped is true under the rcu_read_lock, which we already do
> before trying to take the anon_vma_lock).

I must rush off for a few hours: hopefully my assertions above shed
some light., I think you're mistaking the role that RCU plays here.
You remark in other mail "About your patch, it's a noop in my view..."
- but seems quite an effective noop ;)  Without the patch my kbuilds
would hang usually within 6 hours - though one time they did manage 20
hours, I admit. They ran with the patch for 52 hours before I switched
the machine over to something else.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
  2010-08-27  1:43     ` Hugh Dickins
@ 2010-08-27  9:55       ` Andrea Arcangeli
  -1 siblings, 0 replies; 44+ messages in thread
From: Andrea Arcangeli @ 2010-08-27  9:55 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Andrew Morton, Rik van Riel, linux-kernel, linux-mm

On Thu, Aug 26, 2010 at 06:43:31PM -0700, Hugh Dickins wrote:
> some light., I think you're mistaking the role that RCU plays here.

That's exactly correct, I thought it prevented reuse of the slab
entry, not only of the whole slab... SLAB_DESTROY_BY_RCU is a lot more
tricky to use than I though...

However at the light of this, I think page_lock_anon_vma could have
returned a freed and reused anon_vma well before the anon-vma changes.

The anon_vma could have been freed after the first page_mapped check
succeed but before taking the spinlock. I think, it worked fine
because the rmap walks are robust enough just not to fall apart on a
reused anon_vma while the lock is hold. It become a visible problem
now because we were unlocking the wrong lock leading to a
deadlock. But I guess it wasn't too intentional to return a reused
anon_vma out of page_lock_anon_vma.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
@ 2010-08-27  9:55       ` Andrea Arcangeli
  0 siblings, 0 replies; 44+ messages in thread
From: Andrea Arcangeli @ 2010-08-27  9:55 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Andrew Morton, Rik van Riel, linux-kernel, linux-mm

On Thu, Aug 26, 2010 at 06:43:31PM -0700, Hugh Dickins wrote:
> some light., I think you're mistaking the role that RCU plays here.

That's exactly correct, I thought it prevented reuse of the slab
entry, not only of the whole slab... SLAB_DESTROY_BY_RCU is a lot more
tricky to use than I though...

However at the light of this, I think page_lock_anon_vma could have
returned a freed and reused anon_vma well before the anon-vma changes.

The anon_vma could have been freed after the first page_mapped check
succeed but before taking the spinlock. I think, it worked fine
because the rmap walks are robust enough just not to fall apart on a
reused anon_vma while the lock is hold. It become a visible problem
now because we were unlocking the wrong lock leading to a
deadlock. But I guess it wasn't too intentional to return a reused
anon_vma out of page_lock_anon_vma.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
  2010-08-27  9:55       ` Andrea Arcangeli
@ 2010-08-27 16:43         ` Hugh Dickins
  -1 siblings, 0 replies; 44+ messages in thread
From: Hugh Dickins @ 2010-08-27 16:43 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Linus Torvalds, Andrew Morton, Rik van Riel, Christoph Lameter,
	Peter Zijlstra, linux-kernel, linux-mm

On Fri, Aug 27, 2010 at 2:55 AM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> On Thu, Aug 26, 2010 at 06:43:31PM -0700, Hugh Dickins wrote:
>> some light., I think you're mistaking the role that RCU plays here.
>
> That's exactly correct, I thought it prevented reuse of the slab
> entry, not only of the whole slab... SLAB_DESTROY_BY_RCU is a lot more
> tricky to use than I though...
>
> However at the light of this, I think page_lock_anon_vma could have
> returned a freed and reused anon_vma well before the anon-vma changes.
>
> The anon_vma could have been freed after the first page_mapped check
> succeed but before taking the spinlock. I think, it worked fine
> because the rmap walks are robust enough just not to fall apart on a
> reused anon_vma while the lock is hold. It become a visible problem
> now because we were unlocking the wrong lock leading to a
> deadlock. But I guess it wasn't too intentional to return a reused
> anon_vma out of page_lock_anon_vma.

What you say there is all exactly right, except for "I guess it wasn't
too intentional": it was intentional, and known that it all worked out
okay in the rare case when a reused anon_vma got fed into the loops -
the anon_vma, after all, is nothing more than a list of places where
you may find the page mapped, it has never asserted that a page will
be found everywhere that the anon_vma lists.

I would have liked to say "well known" above, but perhaps well known
only to me: you're certainly not the first to be surprised by this.
IIRC both Christoph and Peter have at different times proposed patches
to tighten up page_lock_anon_vma() to avoid returning a stale/reused
anon_vma, probably both were dropped because neither was actually
necessary, until now: I guess it's a good thing for understandability
that anon_vma->root->lock now requires that we weed out that case.

Hugh

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
@ 2010-08-27 16:43         ` Hugh Dickins
  0 siblings, 0 replies; 44+ messages in thread
From: Hugh Dickins @ 2010-08-27 16:43 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Linus Torvalds, Andrew Morton, Rik van Riel, Christoph Lameter,
	Peter Zijlstra, linux-kernel, linux-mm

On Fri, Aug 27, 2010 at 2:55 AM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> On Thu, Aug 26, 2010 at 06:43:31PM -0700, Hugh Dickins wrote:
>> some light., I think you're mistaking the role that RCU plays here.
>
> That's exactly correct, I thought it prevented reuse of the slab
> entry, not only of the whole slab... SLAB_DESTROY_BY_RCU is a lot more
> tricky to use than I though...
>
> However at the light of this, I think page_lock_anon_vma could have
> returned a freed and reused anon_vma well before the anon-vma changes.
>
> The anon_vma could have been freed after the first page_mapped check
> succeed but before taking the spinlock. I think, it worked fine
> because the rmap walks are robust enough just not to fall apart on a
> reused anon_vma while the lock is hold. It become a visible problem
> now because we were unlocking the wrong lock leading to a
> deadlock. But I guess it wasn't too intentional to return a reused
> anon_vma out of page_lock_anon_vma.

What you say there is all exactly right, except for "I guess it wasn't
too intentional": it was intentional, and known that it all worked out
okay in the rare case when a reused anon_vma got fed into the loops -
the anon_vma, after all, is nothing more than a list of places where
you may find the page mapped, it has never asserted that a page will
be found everywhere that the anon_vma lists.

I would have liked to say "well known" above, but perhaps well known
only to me: you're certainly not the first to be surprised by this.
IIRC both Christoph and Peter have at different times proposed patches
to tighten up page_lock_anon_vma() to avoid returning a stale/reused
anon_vma, probably both were dropped because neither was actually
necessary, until now: I guess it's a good thing for understandability
that anon_vma->root->lock now requires that we weed out that case.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
  2010-08-27 16:43         ` Hugh Dickins
@ 2010-08-27 17:13           ` Christoph Lameter
  -1 siblings, 0 replies; 44+ messages in thread
From: Christoph Lameter @ 2010-08-27 17:13 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrea Arcangeli, Linus Torvalds, Andrew Morton, Rik van Riel,
	Peter Zijlstra, linux-kernel, linux-mm

On Fri, 27 Aug 2010, Hugh Dickins wrote:

> I would have liked to say "well known" above, but perhaps well known
> only to me: you're certainly not the first to be surprised by this.

Most people dealing with this for the first time get through a discovery
period. The network guys had similar problems when they first tried to use
SLAB_DESTROY_BY_RCU.

> IIRC both Christoph and Peter have at different times proposed patches
> to tighten up page_lock_anon_vma() to avoid returning a stale/reused
> anon_vma, probably both were dropped because neither was actually
> necessary, until now: I guess it's a good thing for understandability
> that anon_vma->root->lock now requires that we weed out that case.

Right. We need to verify that the object we have reached is the correct
one.

The basic problem with SLAB_DESTROY_BY_RCU is that you get a reference to
an object that is guaranteed only to have the same type (the instance may
fluctuate and be replaced from under you unless other measures are taken).

Typically one must take a lock within the memory structure to pin down
the object (or take a refcount). Only then can you follow pointers and
such. It is only possible to verify that the right object has been
reached *after* locking. Following a pointer without having determined
that we hit the right object should not occur.

A solution here would be to take the anon_vma->lock (prevents the object
switching under us) and then verify that the mapping is the one we are
looking for and that the pointer points to the right root. Then take the
root lock.

Hughs solution takes a global spinlock which will limit scalability.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
@ 2010-08-27 17:13           ` Christoph Lameter
  0 siblings, 0 replies; 44+ messages in thread
From: Christoph Lameter @ 2010-08-27 17:13 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrea Arcangeli, Linus Torvalds, Andrew Morton, Rik van Riel,
	Peter Zijlstra, linux-kernel, linux-mm

On Fri, 27 Aug 2010, Hugh Dickins wrote:

> I would have liked to say "well known" above, but perhaps well known
> only to me: you're certainly not the first to be surprised by this.

Most people dealing with this for the first time get through a discovery
period. The network guys had similar problems when they first tried to use
SLAB_DESTROY_BY_RCU.

> IIRC both Christoph and Peter have at different times proposed patches
> to tighten up page_lock_anon_vma() to avoid returning a stale/reused
> anon_vma, probably both were dropped because neither was actually
> necessary, until now: I guess it's a good thing for understandability
> that anon_vma->root->lock now requires that we weed out that case.

Right. We need to verify that the object we have reached is the correct
one.

The basic problem with SLAB_DESTROY_BY_RCU is that you get a reference to
an object that is guaranteed only to have the same type (the instance may
fluctuate and be replaced from under you unless other measures are taken).

Typically one must take a lock within the memory structure to pin down
the object (or take a refcount). Only then can you follow pointers and
such. It is only possible to verify that the right object has been
reached *after* locking. Following a pointer without having determined
that we hit the right object should not occur.

A solution here would be to take the anon_vma->lock (prevents the object
switching under us) and then verify that the mapping is the one we are
looking for and that the pointer points to the right root. Then take the
root lock.

Hughs solution takes a global spinlock which will limit scalability.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
  2010-08-27 17:13           ` Christoph Lameter
@ 2010-08-27 17:55             ` Hugh Dickins
  -1 siblings, 0 replies; 44+ messages in thread
From: Hugh Dickins @ 2010-08-27 17:55 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrea Arcangeli, Linus Torvalds, Andrew Morton, Rik van Riel,
	Peter Zijlstra, linux-kernel, linux-mm

On Fri, Aug 27, 2010 at 10:13 AM, Christoph Lameter <cl@linux.com> wrote:

> The basic problem with SLAB_DESTROY_BY_RCU is that you get a reference to
> an object that is guaranteed only to have the same type (the instance may
> fluctuate and be replaced from under you unless other measures are taken).

(I wouldn't describe that as a "problem with SLAB_DESTROY_BY_RCU":
it's precisely the nature of SLAB_DESTROY_BY_RCU, what makes it useful
in solving backward-locking problems elsewhere.)

>
> Typically one must take a lock within the memory structure to pin down
> the object (or take a refcount). Only then can you follow pointers and
> such. It is only possible to verify that the right object has been
> reached *after* locking. Following a pointer without having determined
> that we hit the right object should not occur.
>
> A solution here would be to take the anon_vma->lock (prevents the object
> switching under us) and then verify that the mapping is the one we are
> looking for and that the pointer points to the right root. Then take the
> root lock.
>
> Hughs solution takes a global spinlock which will limit scalability.

Eh?  My solution was a second page_mapped(page) test i.e. testing an atomic.

Hugh

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
@ 2010-08-27 17:55             ` Hugh Dickins
  0 siblings, 0 replies; 44+ messages in thread
From: Hugh Dickins @ 2010-08-27 17:55 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrea Arcangeli, Linus Torvalds, Andrew Morton, Rik van Riel,
	Peter Zijlstra, linux-kernel, linux-mm

On Fri, Aug 27, 2010 at 10:13 AM, Christoph Lameter <cl@linux.com> wrote:

> The basic problem with SLAB_DESTROY_BY_RCU is that you get a reference to
> an object that is guaranteed only to have the same type (the instance may
> fluctuate and be replaced from under you unless other measures are taken).

(I wouldn't describe that as a "problem with SLAB_DESTROY_BY_RCU":
it's precisely the nature of SLAB_DESTROY_BY_RCU, what makes it useful
in solving backward-locking problems elsewhere.)

>
> Typically one must take a lock within the memory structure to pin down
> the object (or take a refcount). Only then can you follow pointers and
> such. It is only possible to verify that the right object has been
> reached *after* locking. Following a pointer without having determined
> that we hit the right object should not occur.
>
> A solution here would be to take the anon_vma->lock (prevents the object
> switching under us) and then verify that the mapping is the one we are
> looking for and that the pointer points to the right root. Then take the
> root lock.
>
> Hughs solution takes a global spinlock which will limit scalability.

Eh?  My solution was a second page_mapped(page) test i.e. testing an atomic.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
  2010-08-27 17:55             ` Hugh Dickins
@ 2010-08-27 19:29               ` Christoph Lameter
  -1 siblings, 0 replies; 44+ messages in thread
From: Christoph Lameter @ 2010-08-27 19:29 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrea Arcangeli, Linus Torvalds, Andrew Morton, Rik van Riel,
	Peter Zijlstra, linux-kernel, linux-mm

On Fri, 27 Aug 2010, Hugh Dickins wrote:

> Eh?  My solution was a second page_mapped(page) test i.e. testing an atomic.

Argh. Right. Looked like a global to me. Did not see the earlier local
def.

If you still use a pointer then what does insure that the root
pointer was not changed after the ACCESS_ONCE? The free semantics
of an anon_vma?

Since there is no lock taken before the mapped check none of the
earlier reads from the anon vma structure nor the page mapped check
necessarily reflect a single state of the anon_vma.





^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
@ 2010-08-27 19:29               ` Christoph Lameter
  0 siblings, 0 replies; 44+ messages in thread
From: Christoph Lameter @ 2010-08-27 19:29 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrea Arcangeli, Linus Torvalds, Andrew Morton, Rik van Riel,
	Peter Zijlstra, linux-kernel, linux-mm

On Fri, 27 Aug 2010, Hugh Dickins wrote:

> Eh?  My solution was a second page_mapped(page) test i.e. testing an atomic.

Argh. Right. Looked like a global to me. Did not see the earlier local
def.

If you still use a pointer then what does insure that the root
pointer was not changed after the ACCESS_ONCE? The free semantics
of an anon_vma?

Since there is no lock taken before the mapped check none of the
earlier reads from the anon vma structure nor the page mapped check
necessarily reflect a single state of the anon_vma.




--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
  2010-08-27 19:29               ` Christoph Lameter
@ 2010-08-27 20:14                 ` Hugh Dickins
  -1 siblings, 0 replies; 44+ messages in thread
From: Hugh Dickins @ 2010-08-27 20:14 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrea Arcangeli, Linus Torvalds, Andrew Morton, Rik van Riel,
	Peter Zijlstra, linux-kernel, linux-mm

On Fri, Aug 27, 2010 at 12:29 PM, Christoph Lameter <cl@linux.com> wrote:
> On Fri, 27 Aug 2010, Hugh Dickins wrote:
>
>> Eh?  My solution was a second page_mapped(page) test i.e. testing an atomic.
>
> Argh. Right. Looked like a global to me. Did not see the earlier local
> def.
>
> If you still use a pointer then what does insure that the root
> pointer was not changed after the ACCESS_ONCE? The free semantics
> of an anon_vma?

Nothing ensures that the root pointer was not changed after the
ACCESS_ONCE, that's exactly why we use ACCESS_ONCE there: once we've
got the lock and realize that what we've locked may not be what we
wanted (or may change from what we were wanting at any moment, the
page no longer being mapped there - but in that case we no longer want
it), we have to be sure to unlock the one we locked, rather than the
one which anon_vma->root might subsequently point to.

(Umm, maybe I'm not the clearest of explainers, sorry!  If you get my
point, fine; if it's gibberish to you, please ask me to try again.)

>
> Since there is no lock taken before the mapped check none of the
> earlier reads from the anon vma structure nor the page mapped check
> necessarily reflect a single state of the anon_vma.

There's no lock (other than RCU's read "lock")  taken before the
original mapped check, and that's important, otherwise our attempt to
lock might actually spinon or corrupt something that was long ago an
anon_vma.  But we do take the anon_vma->root->lock before the second
mapped check which I added.  If the page is still mapped at the point
of that second check, then we know that we got the right anon_vma,
that the page might be mapped in it, and anon_vma->root is not going
to change underneath us before the page_unlock_anon_vma().  (The page
may get unmapped at any time, the lock does not protect against that;
but if it's still mapped once we hold the lock, free_pgtables() cannot
free the anon_vma until we're done.)

Hugh

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
@ 2010-08-27 20:14                 ` Hugh Dickins
  0 siblings, 0 replies; 44+ messages in thread
From: Hugh Dickins @ 2010-08-27 20:14 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrea Arcangeli, Linus Torvalds, Andrew Morton, Rik van Riel,
	Peter Zijlstra, linux-kernel, linux-mm

On Fri, Aug 27, 2010 at 12:29 PM, Christoph Lameter <cl@linux.com> wrote:
> On Fri, 27 Aug 2010, Hugh Dickins wrote:
>
>> Eh?  My solution was a second page_mapped(page) test i.e. testing an atomic.
>
> Argh. Right. Looked like a global to me. Did not see the earlier local
> def.
>
> If you still use a pointer then what does insure that the root
> pointer was not changed after the ACCESS_ONCE? The free semantics
> of an anon_vma?

Nothing ensures that the root pointer was not changed after the
ACCESS_ONCE, that's exactly why we use ACCESS_ONCE there: once we've
got the lock and realize that what we've locked may not be what we
wanted (or may change from what we were wanting at any moment, the
page no longer being mapped there - but in that case we no longer want
it), we have to be sure to unlock the one we locked, rather than the
one which anon_vma->root might subsequently point to.

(Umm, maybe I'm not the clearest of explainers, sorry!  If you get my
point, fine; if it's gibberish to you, please ask me to try again.)

>
> Since there is no lock taken before the mapped check none of the
> earlier reads from the anon vma structure nor the page mapped check
> necessarily reflect a single state of the anon_vma.

There's no lock (other than RCU's read "lock")  taken before the
original mapped check, and that's important, otherwise our attempt to
lock might actually spinon or corrupt something that was long ago an
anon_vma.  But we do take the anon_vma->root->lock before the second
mapped check which I added.  If the page is still mapped at the point
of that second check, then we know that we got the right anon_vma,
that the page might be mapped in it, and anon_vma->root is not going
to change underneath us before the page_unlock_anon_vma().  (The page
may get unmapped at any time, the lock does not protect against that;
but if it's still mapped once we hold the lock, free_pgtables() cannot
free the anon_vma until we're done.)

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
  2010-08-27 20:14                 ` Hugh Dickins
@ 2010-08-27 20:56                   ` Christoph Lameter
  -1 siblings, 0 replies; 44+ messages in thread
From: Christoph Lameter @ 2010-08-27 20:56 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrea Arcangeli, Linus Torvalds, Andrew Morton, Rik van Riel,
	Peter Zijlstra, linux-kernel, linux-mm

On Fri, 27 Aug 2010, Hugh Dickins wrote:

> Nothing ensures that the root pointer was not changed after the
> ACCESS_ONCE, that's exactly why we use ACCESS_ONCE there: once we've
> got the lock and realize that what we've locked may not be what we
> wanted (or may change from what we were wanting at any moment, the
> page no longer being mapped there - but in that case we no longer want
> it), we have to be sure to unlock the one we locked, rather than the
> one which anon_vma->root might subsequently point to.

I do not see any check after we have taken the lock to verify that we
locked the correct object. Was there a second version of the patch?

> > Since there is no lock taken before the mapped check none of the
> > earlier reads from the anon vma structure nor the page mapped check
> > necessarily reflect a single state of the anon_vma.
>
> There's no lock (other than RCU's read "lock")  taken before the
> original mapped check, and that's important, otherwise our attempt to
> lock might actually spinon or corrupt something that was long ago an
> anon_vma.  But we do take the anon_vma->root->lock before the second
> mapped check which I added.  If the page is still mapped at the point

You then are using an object from the anon_vma (the pointer) without a
lock! This is unstable therefore unless there are other constraints. The
anon_vma->lock must be taken before derefencing that pointer. The page may
have been unmapped and mapped again between the two checks. Unlikely but
possible.

> of that second check, then we know that we got the right anon_vma,

I do not see a second check (*after* taking the lock) in the patch and the
way the lock is taken can be a problem in itself.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
@ 2010-08-27 20:56                   ` Christoph Lameter
  0 siblings, 0 replies; 44+ messages in thread
From: Christoph Lameter @ 2010-08-27 20:56 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrea Arcangeli, Linus Torvalds, Andrew Morton, Rik van Riel,
	Peter Zijlstra, linux-kernel, linux-mm

On Fri, 27 Aug 2010, Hugh Dickins wrote:

> Nothing ensures that the root pointer was not changed after the
> ACCESS_ONCE, that's exactly why we use ACCESS_ONCE there: once we've
> got the lock and realize that what we've locked may not be what we
> wanted (or may change from what we were wanting at any moment, the
> page no longer being mapped there - but in that case we no longer want
> it), we have to be sure to unlock the one we locked, rather than the
> one which anon_vma->root might subsequently point to.

I do not see any check after we have taken the lock to verify that we
locked the correct object. Was there a second version of the patch?

> > Since there is no lock taken before the mapped check none of the
> > earlier reads from the anon vma structure nor the page mapped check
> > necessarily reflect a single state of the anon_vma.
>
> There's no lock (other than RCU's read "lock")  taken before the
> original mapped check, and that's important, otherwise our attempt to
> lock might actually spinon or corrupt something that was long ago an
> anon_vma.  But we do take the anon_vma->root->lock before the second
> mapped check which I added.  If the page is still mapped at the point

You then are using an object from the anon_vma (the pointer) without a
lock! This is unstable therefore unless there are other constraints. The
anon_vma->lock must be taken before derefencing that pointer. The page may
have been unmapped and mapped again between the two checks. Unlikely but
possible.

> of that second check, then we know that we got the right anon_vma,

I do not see a second check (*after* taking the lock) in the patch and the
way the lock is taken can be a problem in itself.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
  2010-08-27 20:56                   ` Christoph Lameter
@ 2010-08-27 21:28                     ` Hugh Dickins
  -1 siblings, 0 replies; 44+ messages in thread
From: Hugh Dickins @ 2010-08-27 21:28 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrea Arcangeli, Linus Torvalds, Andrew Morton, Rik van Riel,
	Peter Zijlstra, linux-kernel, linux-mm

On Fri, Aug 27, 2010 at 1:56 PM, Christoph Lameter <cl@linux.com> wrote:
> On Fri, 27 Aug 2010, Hugh Dickins wrote:
>
>> Nothing ensures that the root pointer was not changed after the
>> ACCESS_ONCE, that's exactly why we use ACCESS_ONCE there: once we've
>> got the lock and realize that what we've locked may not be what we
>> wanted (or may change from what we were wanting at any moment, the
>> page no longer being mapped there - but in that case we no longer want
>> it), we have to be sure to unlock the one we locked, rather than the
>> one which anon_vma->root might subsequently point to.
>
> I do not see any check after we have taken the lock to verify that we
> locked the correct object. Was there a second version of the patch?

No second version of the patch, no.  As I said already, it's that
second page_mapped check which gives the guarantee that the anon_vma
has not yet been freed, hence we've locked the correct object.

>
>> > Since there is no lock taken before the mapped check none of the
>> > earlier reads from the anon vma structure nor the page mapped check
>> > necessarily reflect a single state of the anon_vma.
>>
>> There's no lock (other than RCU's read "lock")  taken before the
>> original mapped check, and that's important, otherwise our attempt to
>> lock might actually spinon or corrupt something that was long ago an
>> anon_vma.  But we do take the anon_vma->root->lock before the second
>> mapped check which I added.  If the page is still mapped at the point
>
> You then are using an object from the anon_vma (the pointer) without a
> lock!

Yes. (not counting RCU's read "lock" as a lock).

> This is unstable therefore unless there are other constraints. The
> anon_vma->lock must be taken before derefencing that pointer.

No, SLAB_DESTROY_BY_RCU gives us just the stablity we need to take the lock.

> The page may
> have been unmapped and mapped again between the two checks. Unlikely but
> possible.

Yes, unlikely but possible.  (Well, actually, is it possible?  It can
be unmapped on exit without any lock, but unmapping for pageout would
require the page lock, would insert a swp_entry_t, and mapping again
would go to do_swap_page which would again require the page lock.  But
never mind that, let's assume there is a way it can be unmapped and
mapped again.)  The thing is, page->mapping will point to the same
anon_vma throughout, that only gets reset when the page is freed, and
there should be nowhere else that modifies page->mapping once it's
been set to anon_vma - if you know of somewhere, please point to it,
we do need to examine that case.

>
>> of that second check, then we know that we got the right anon_vma,
>
> I do not see a second check (*after* taking the lock) in the patch






> and the
> way the lock is taken can be a problem in itself.
>
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
@ 2010-08-27 21:28                     ` Hugh Dickins
  0 siblings, 0 replies; 44+ messages in thread
From: Hugh Dickins @ 2010-08-27 21:28 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrea Arcangeli, Linus Torvalds, Andrew Morton, Rik van Riel,
	Peter Zijlstra, linux-kernel, linux-mm

On Fri, Aug 27, 2010 at 1:56 PM, Christoph Lameter <cl@linux.com> wrote:
> On Fri, 27 Aug 2010, Hugh Dickins wrote:
>
>> Nothing ensures that the root pointer was not changed after the
>> ACCESS_ONCE, that's exactly why we use ACCESS_ONCE there: once we've
>> got the lock and realize that what we've locked may not be what we
>> wanted (or may change from what we were wanting at any moment, the
>> page no longer being mapped there - but in that case we no longer want
>> it), we have to be sure to unlock the one we locked, rather than the
>> one which anon_vma->root might subsequently point to.
>
> I do not see any check after we have taken the lock to verify that we
> locked the correct object. Was there a second version of the patch?

No second version of the patch, no.  As I said already, it's that
second page_mapped check which gives the guarantee that the anon_vma
has not yet been freed, hence we've locked the correct object.

>
>> > Since there is no lock taken before the mapped check none of the
>> > earlier reads from the anon vma structure nor the page mapped check
>> > necessarily reflect a single state of the anon_vma.
>>
>> There's no lock (other than RCU's read "lock")  taken before the
>> original mapped check, and that's important, otherwise our attempt to
>> lock might actually spinon or corrupt something that was long ago an
>> anon_vma.  But we do take the anon_vma->root->lock before the second
>> mapped check which I added.  If the page is still mapped at the point
>
> You then are using an object from the anon_vma (the pointer) without a
> lock!

Yes. (not counting RCU's read "lock" as a lock).

> This is unstable therefore unless there are other constraints. The
> anon_vma->lock must be taken before derefencing that pointer.

No, SLAB_DESTROY_BY_RCU gives us just the stablity we need to take the lock

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
  2010-08-27 21:28                     ` Hugh Dickins
@ 2010-08-27 21:33                       ` Hugh Dickins
  -1 siblings, 0 replies; 44+ messages in thread
From: Hugh Dickins @ 2010-08-27 21:33 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrea Arcangeli, Linus Torvalds, Andrew Morton, Rik van Riel,
	Peter Zijlstra, linux-kernel, linux-mm

Sorry, I seem to have hit some key which sent the mail off too soon, a
tab perhaps: finishing up...

On Fri, Aug 27, 2010 at 2:28 PM, Hugh Dickins <hughd@google.com> wrote:
> On Fri, Aug 27, 2010 at 1:56 PM, Christoph Lameter <cl@linux.com> wrote:
>>
>>> of that second check, then we know that we got the right anon_vma,
>>
>> I do not see a second check (*after* taking the lock) in the patch

        if (page_mapped(page))
                return anon_vma;

>> and the way the lock is taken can be a problem in itself.

No, that's what we rely upon SLAB_DESTROY_BY_RCU for.

Hugh

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
@ 2010-08-27 21:33                       ` Hugh Dickins
  0 siblings, 0 replies; 44+ messages in thread
From: Hugh Dickins @ 2010-08-27 21:33 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrea Arcangeli, Linus Torvalds, Andrew Morton, Rik van Riel,
	Peter Zijlstra, linux-kernel, linux-mm

Sorry, I seem to have hit some key which sent the mail off too soon, a
tab perhaps: finishing up...

On Fri, Aug 27, 2010 at 2:28 PM, Hugh Dickins <hughd@google.com> wrote:
> On Fri, Aug 27, 2010 at 1:56 PM, Christoph Lameter <cl@linux.com> wrote:
>>
>>> of that second check, then we know that we got the right anon_vma,
>>
>> I do not see a second check (*after* taking the lock) in the patch

        if (page_mapped(page))
                return anon_vma;

>> and the way the lock is taken can be a problem in itself.

No, that's what we rely upon SLAB_DESTROY_BY_RCU for.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
  2010-08-27 21:33                       ` Hugh Dickins
@ 2010-08-27 23:06                         ` Christoph Lameter
  -1 siblings, 0 replies; 44+ messages in thread
From: Christoph Lameter @ 2010-08-27 23:06 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrea Arcangeli, Linus Torvalds, Andrew Morton, Rik van Riel,
	Peter Zijlstra, linux-kernel, linux-mm

On Fri, 27 Aug 2010, Hugh Dickins wrote:

> >> I do not see a second check (*after* taking the lock) in the patch
>
>         if (page_mapped(page))
>                 return anon_vma;

As far as I can tell you would have to recheck the mapping pointer and the
pointer to the root too after taking the lock because only taking the lock
stabilitzes the object. Any other data you may have obtained before
acquiring the lock may have changed.

> >> and the way the lock is taken can be a problem in itself.
>
> No, that's what we rely upon SLAB_DESTROY_BY_RCU for.

SLAB_DESTROY_BY_RCU does not guarantee that the object stays the same nor
does it prevent any fields from changing. Going through a pointer with
only SLAB_DESTROY_BY_RCU means that you can only rely on the atomicity
guarantee for pointer updates. You get a valid pointer but pointer changes
are not prevented by SLAB_DESTROY_BY_RCU.

The only guarantee of that would be through other synchronization
techniques. If you believe that the page lock provides sufficient
synchronization that then this approach may be ok.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
@ 2010-08-27 23:06                         ` Christoph Lameter
  0 siblings, 0 replies; 44+ messages in thread
From: Christoph Lameter @ 2010-08-27 23:06 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrea Arcangeli, Linus Torvalds, Andrew Morton, Rik van Riel,
	Peter Zijlstra, linux-kernel, linux-mm

On Fri, 27 Aug 2010, Hugh Dickins wrote:

> >> I do not see a second check (*after* taking the lock) in the patch
>
>         if (page_mapped(page))
>                 return anon_vma;

As far as I can tell you would have to recheck the mapping pointer and the
pointer to the root too after taking the lock because only taking the lock
stabilitzes the object. Any other data you may have obtained before
acquiring the lock may have changed.

> >> and the way the lock is taken can be a problem in itself.
>
> No, that's what we rely upon SLAB_DESTROY_BY_RCU for.

SLAB_DESTROY_BY_RCU does not guarantee that the object stays the same nor
does it prevent any fields from changing. Going through a pointer with
only SLAB_DESTROY_BY_RCU means that you can only rely on the atomicity
guarantee for pointer updates. You get a valid pointer but pointer changes
are not prevented by SLAB_DESTROY_BY_RCU.

The only guarantee of that would be through other synchronization
techniques. If you believe that the page lock provides sufficient
synchronization that then this approach may be ok.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
  2010-08-27 23:06                         ` Christoph Lameter
@ 2010-08-28  1:07                           ` Hugh Dickins
  -1 siblings, 0 replies; 44+ messages in thread
From: Hugh Dickins @ 2010-08-28  1:07 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Naoya Horiguchi, Andrea Arcangeli, Linus Torvalds, Andrew Morton,
	Rik van Riel, Peter Zijlstra, linux-kernel, linux-mm

On Fri, Aug 27, 2010 at 4:06 PM, Christoph Lameter <cl@linux.com> wrote:
> On Fri, 27 Aug 2010, Hugh Dickins wrote:
>
>> >> I do not see a second check (*after* taking the lock) in the patch
>>
>>         if (page_mapped(page))
>>                 return anon_vma;
>
> As far as I can tell you would have to recheck the mapping pointer and the

That's a more interesting question than I'd realized.  When
page_lock_anon_vma() first came in (2.6.9) there was nothing which
updated page->mapping of an anon page after it was set, until the page
was freed.

Since then we've gathered a few places which update it while holding
the page lock (migrate.c, ksm.c) - no problem since the callers of
page_lock_anon_vma() hold and must hold page lock.  Well, there is the
fairly recent call to page_lock_anon_vma() from memory-failure.c, and
its even more recent use on hugepages: there's switching back and
forth between p and hpage and page, but I think it does end up
applying page_lock_anon_vma() to the very page that it locked earlier.

Then there's the recently added page_move_anon_rmap(): fine in
memory.c, the page lock is held; but apparently broken in hugetlb.c,
where it's called only when the pagelock has not been taken!
Horiguchi-san Cc'ed.

__page_set_anon_rmap() looks like it might have changed anon
page->mapping in 2.6.35, but Andrea has fixed that with PageAnon tests
in 2.6.36-rc.  Ah, but what if "exclusive" and non-exclusive calls to
__page_set_anon_rmap() are racing?  Not clear, it may be that Andrea
has only narrowed a window not closed it (and I've not yet looked up
the commit to see his intent); or it may be okay, that there cannot be
a conflict of anon_vma in that case.  Need to dig deeper.

__hugepage_set_anon_rmap() appears to copy the 2.6.35
__page_set_anon_rmap(), and probably needs to add in Andrea's fix, or
whatever else is needed there.

This is a different problem (or it may turn out to be a non-existent
problem, aside from the hugetlb.c case to be fixed there) than I was
fixing with my patch, and can be patched separately; but It certainly
looks as if it's worth adding a BUG_ON or VM_BUG_ON to check for a
switch of anon_vma beneath us there.   Plus a
VM_BUG_ON(PageLocked(page)) going into page_lock_anon_vma().

> pointer to the root too after taking the lock because only taking the lock
> stabilitzes the object.

A change in the pointer to the root is covered by the ACCESS_ONCE:
yes, it can change beneath us there, but only through the anon_vma
being freed and reused, in which case the subsequent page_mapped test
tells us the page is no longer mapped, whereupon we back out,
unlocking what we locked.  (I had at one point been tempted to check
anon_vma->root == root_anon_vma there instead of page_mapped(), but
that would not have been good enough: since anon_vma_prepare() sets
anon_vma->root before taking the lock, anon_vma->root could change
under us anywhere between the page_lock_anon_vma() and its
page_unlock_anon_vma() in that case.)

> Any other data you may have obtained before
> acquiring the lock may have changed.
>
>> >> and the way the lock is taken can be a problem in itself.
>>
>> No, that's what we rely upon SLAB_DESTROY_BY_RCU for.
>
> SLAB_DESTROY_BY_RCU does not guarantee that the object stays the same nor
> does it prevent any fields from changing. Going through a pointer with
> only SLAB_DESTROY_BY_RCU means that you can only rely on the atomicity
> guarantee for pointer updates. You get a valid pointer but pointer changes
> are not prevented by SLAB_DESTROY_BY_RCU.

You're speaking too generally there for me to understand its
relevance!  What specific problem do you see?

>
> The only guarantee of that would be through other synchronization
> techniques. If you believe that the page lock provides sufficient
> synchronization that then this approach may be ok.

The page lock should be guaranteeing that page->mapping (anon_vma)
cannot change underneath us; but there is some doubt on that above,
I'll report back when I've had enough quiet time to think through the
__set_page_anon_rmap() possibilities:
thanks for uncovering those doubts.

Hugh

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
@ 2010-08-28  1:07                           ` Hugh Dickins
  0 siblings, 0 replies; 44+ messages in thread
From: Hugh Dickins @ 2010-08-28  1:07 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Naoya Horiguchi, Andrea Arcangeli, Linus Torvalds, Andrew Morton,
	Rik van Riel, Peter Zijlstra, linux-kernel, linux-mm

On Fri, Aug 27, 2010 at 4:06 PM, Christoph Lameter <cl@linux.com> wrote:
> On Fri, 27 Aug 2010, Hugh Dickins wrote:
>
>> >> I do not see a second check (*after* taking the lock) in the patch
>>
>>         if (page_mapped(page))
>>                 return anon_vma;
>
> As far as I can tell you would have to recheck the mapping pointer and the

That's a more interesting question than I'd realized.  When
page_lock_anon_vma() first came in (2.6.9) there was nothing which
updated page->mapping of an anon page after it was set, until the page
was freed.

Since then we've gathered a few places which update it while holding
the page lock (migrate.c, ksm.c) - no problem since the callers of
page_lock_anon_vma() hold and must hold page lock.  Well, there is the
fairly recent call to page_lock_anon_vma() from memory-failure.c, and
its even more recent use on hugepages: there's switching back and
forth between p and hpage and page, but I think it does end up
applying page_lock_anon_vma() to the very page that it locked earlier.

Then there's the recently added page_move_anon_rmap(): fine in
memory.c, the page lock is held; but apparently broken in hugetlb.c,
where it's called only when the pagelock has not been taken!
Horiguchi-san Cc'ed.

__page_set_anon_rmap() looks like it might have changed anon
page->mapping in 2.6.35, but Andrea has fixed that with PageAnon tests
in 2.6.36-rc.  Ah, but what if "exclusive" and non-exclusive calls to
__page_set_anon_rmap() are racing?  Not clear, it may be that Andrea
has only narrowed a window not closed it (and I've not yet looked up
the commit to see his intent); or it may be okay, that there cannot be
a conflict of anon_vma in that case.  Need to dig deeper.

__hugepage_set_anon_rmap() appears to copy the 2.6.35
__page_set_anon_rmap(), and probably needs to add in Andrea's fix, or
whatever else is needed there.

This is a different problem (or it may turn out to be a non-existent
problem, aside from the hugetlb.c case to be fixed there) than I was
fixing with my patch, and can be patched separately; but It certainly
looks as if it's worth adding a BUG_ON or VM_BUG_ON to check for a
switch of anon_vma beneath us there.   Plus a
VM_BUG_ON(PageLocked(page)) going into page_lock_anon_vma().

> pointer to the root too after taking the lock because only taking the lock
> stabilitzes the object.

A change in the pointer to the root is covered by the ACCESS_ONCE:
yes, it can change beneath us there, but only through the anon_vma
being freed and reused, in which case the subsequent page_mapped test
tells us the page is no longer mapped, whereupon we back out,
unlocking what we locked.  (I had at one point been tempted to check
anon_vma->root == root_anon_vma there instead of page_mapped(), but
that would not have been good enough: since anon_vma_prepare() sets
anon_vma->root before taking the lock, anon_vma->root could change
under us anywhere between the page_lock_anon_vma() and its
page_unlock_anon_vma() in that case.)

> Any other data you may have obtained before
> acquiring the lock may have changed.
>
>> >> and the way the lock is taken can be a problem in itself.
>>
>> No, that's what we rely upon SLAB_DESTROY_BY_RCU for.
>
> SLAB_DESTROY_BY_RCU does not guarantee that the object stays the same nor
> does it prevent any fields from changing. Going through a pointer with
> only SLAB_DESTROY_BY_RCU means that you can only rely on the atomicity
> guarantee for pointer updates. You get a valid pointer but pointer changes
> are not prevented by SLAB_DESTROY_BY_RCU.

You're speaking too generally there for me to understand its
relevance!  What specific problem do you see?

>
> The only guarantee of that would be through other synchronization
> techniques. If you believe that the page lock provides sufficient
> synchronization that then this approach may be ok.

The page lock should be guaranteeing that page->mapping (anon_vma)
cannot change underneath us; but there is some doubt on that above,
I'll report back when I've had enough quiet time to think through the
__set_page_anon_rmap() possibilities:
thanks for uncovering those doubts.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
  2010-08-28  1:07                           ` Hugh Dickins
@ 2010-08-28  2:47                             ` Christoph Lameter
  -1 siblings, 0 replies; 44+ messages in thread
From: Christoph Lameter @ 2010-08-28  2:47 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Naoya Horiguchi, Andrea Arcangeli, Linus Torvalds, Andrew Morton,
	Rik van Riel, Peter Zijlstra, linux-kernel, linux-mm

On Fri, 27 Aug 2010, Hugh Dickins wrote:

> >> No, that's what we rely upon SLAB_DESTROY_BY_RCU for.
> >
> > SLAB_DESTROY_BY_RCU does not guarantee that the object stays the same nor
> > does it prevent any fields from changing. Going through a pointer with
> > only SLAB_DESTROY_BY_RCU means that you can only rely on the atomicity
> > guarantee for pointer updates. You get a valid pointer but pointer changes
> > are not prevented by SLAB_DESTROY_BY_RCU.
>
> You're speaking too generally there for me to understand its
> relevance!  What specific problem do you see?

I had the impression that you rely on SLAB_DESTROY_BY_RCU for more than
what it gives you. If the lock taken is not directly in the structure that
is managed by slab but only reachable by a pointer then potential pointer
changes are also danger to consider.

I'd be much more comfortable if the following would be done

A. Pin the anon_vma by either
	I. Take a refcount on the anon vma
	II. Take a lock in the anon vma (something that is not pointed to)

B. Either
	I. All values that have been used before the pinning are
	   verified after the pinning (and the lock is reacquired
           if verification fails).

	II. Or all functions using page_lock_anon_vma() must securely
	    work in the case that the anon_vma was reused for
	    something else before the vma lock was acquired.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
@ 2010-08-28  2:47                             ` Christoph Lameter
  0 siblings, 0 replies; 44+ messages in thread
From: Christoph Lameter @ 2010-08-28  2:47 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Naoya Horiguchi, Andrea Arcangeli, Linus Torvalds, Andrew Morton,
	Rik van Riel, Peter Zijlstra, linux-kernel, linux-mm

On Fri, 27 Aug 2010, Hugh Dickins wrote:

> >> No, that's what we rely upon SLAB_DESTROY_BY_RCU for.
> >
> > SLAB_DESTROY_BY_RCU does not guarantee that the object stays the same nor
> > does it prevent any fields from changing. Going through a pointer with
> > only SLAB_DESTROY_BY_RCU means that you can only rely on the atomicity
> > guarantee for pointer updates. You get a valid pointer but pointer changes
> > are not prevented by SLAB_DESTROY_BY_RCU.
>
> You're speaking too generally there for me to understand its
> relevance!  What specific problem do you see?

I had the impression that you rely on SLAB_DESTROY_BY_RCU for more than
what it gives you. If the lock taken is not directly in the structure that
is managed by slab but only reachable by a pointer then potential pointer
changes are also danger to consider.

I'd be much more comfortable if the following would be done

A. Pin the anon_vma by either
	I. Take a refcount on the anon vma
	II. Take a lock in the anon vma (something that is not pointed to)

B. Either
	I. All values that have been used before the pinning are
	   verified after the pinning (and the lock is reacquired
           if verification fails).

	II. Or all functions using page_lock_anon_vma() must securely
	    work in the case that the anon_vma was reused for
	    something else before the vma lock was acquired.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
  2010-08-28  2:47                             ` Christoph Lameter
@ 2010-08-28 10:17                               ` Peter Zijlstra
  -1 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2010-08-28 10:17 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Hugh Dickins, Naoya Horiguchi, Andrea Arcangeli, Linus Torvalds,
	Andrew Morton, Rik van Riel, linux-kernel, linux-mm

On Fri, 2010-08-27 at 21:47 -0500, Christoph Lameter wrote:
> 
> I'd be much more comfortable if the following would be done
> 
> A. Pin the anon_vma by either
>         I. Take a refcount on the anon vma

My preemptible mmu patches do that..

>         II. Take a lock in the anon vma (something that is not pointed to)
> 
> B. Either
>         I. All values that have been used before the pinning are
>            verified after the pinning (and the lock is reacquired
>            if verification fails).
> 
>         II. Or all functions using page_lock_anon_vma() must securely
>             work in the case that the anon_vma was reused for
>             something else before the vma lock was acquired.

Last time I looked they all work like that, they all use something akin
to vma_address() which validates that the page we're interested in is
indeed part of the vma we obtained from the rmap chain.

Anyway, I'll try and refresh my preemptible mmu patch-set now that the
merge window dust settled and post if again, hopefully we can stick it
in -next.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
@ 2010-08-28 10:17                               ` Peter Zijlstra
  0 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2010-08-28 10:17 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Hugh Dickins, Naoya Horiguchi, Andrea Arcangeli, Linus Torvalds,
	Andrew Morton, Rik van Riel, linux-kernel, linux-mm

On Fri, 2010-08-27 at 21:47 -0500, Christoph Lameter wrote:
> 
> I'd be much more comfortable if the following would be done
> 
> A. Pin the anon_vma by either
>         I. Take a refcount on the anon vma

My preemptible mmu patches do that..

>         II. Take a lock in the anon vma (something that is not pointed to)
> 
> B. Either
>         I. All values that have been used before the pinning are
>            verified after the pinning (and the lock is reacquired
>            if verification fails).
> 
>         II. Or all functions using page_lock_anon_vma() must securely
>             work in the case that the anon_vma was reused for
>             something else before the vma lock was acquired.

Last time I looked they all work like that, they all use something akin
to vma_address() which validates that the page we're interested in is
indeed part of the vma we obtained from the rmap chain.

Anyway, I'll try and refresh my preemptible mmu patch-set now that the
merge window dust settled and post if again, hopefully we can stick it
in -next.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
  2010-08-28  1:07                           ` Hugh Dickins
@ 2010-08-28 15:54                             ` Andrea Arcangeli
  -1 siblings, 0 replies; 44+ messages in thread
From: Andrea Arcangeli @ 2010-08-28 15:54 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Christoph Lameter, Naoya Horiguchi, Linus Torvalds,
	Andrew Morton, Rik van Riel, Peter Zijlstra, linux-kernel,
	linux-mm

On Fri, Aug 27, 2010 at 06:07:23PM -0700, Hugh Dickins wrote:
> __page_set_anon_rmap() looks like it might have changed anon
> page->mapping in 2.6.35, but Andrea has fixed that with PageAnon tests
> in 2.6.36-rc.  Ah, but what if "exclusive" and non-exclusive calls to
> __page_set_anon_rmap() are racing?  Not clear, it may be that Andrea

__page_set_anon_rmap doesn't require the PG_lock only if it's a newly
allocated page and it is called from page_add_new_anon_rmap. So there
cannot be concurrent __page_set_anon_rmap running on the same page. If
there could be concurrent __page_set_anon_rmap running on the same
page, the page_lock_anon_vma running on a changing page->mapping would
be the last worry, as page_add_new_anon_rmap would overwrite _mapcount
with 0 while do_page_add_anon_rmap runs, so corrupting the mapcount
information leading to immediate crash during page freeing....

So definitely it must not happen and not only because of
page_lock_anon_vma... and it's unlikely to go unnoticed.

I think we're safe on that respect.

> has only narrowed a window not closed it (and I've not yet looked up
> the commit to see his intent); or it may be okay, that there cannot be
> a conflict of anon_vma in that case.  Need to dig deeper.

That change is to avoid altering the page->mapping for anon
pages. It's only an optimization. No need to set the page->mapping
back to the anon_vma->root for AnonPages (that in turn have already
their page->mapping set) if we've already more finegrained information
into the page->mapping. If we've already information in page->mapping
(page is Anon) then setting to anon_vma->root can only be a coarser
setting losing anon_vma child granularity. We must set to the root
anon vma however when the page is swapcache but not anon yet... and if
it's not exclusive we've to use the anon_vma->root, otherwise if it's
being taken over by the local process we can use the local
vma->anon_vma. This should explain the logic in __set_page_anon_rmap.

> __hugepage_set_anon_rmap() appears to copy the 2.6.35
> __page_set_anon_rmap(), and probably needs to add in Andrea's fix, or
> whatever else is needed there.

I think it's actually safe in anon_vma terms, setting the
page->mapping to the anon_vma->root _always_ safe, but it should use
anon_vma->root instead of list_entry (should still lead to the same
result) and it can probably also optimize it if it's already an
AnonPage like I did for the not-hugetlbfs case (which also includes
transparent hugepages as they share the core VM paths).

The lack of BUG_ON(!PageLocked(page)) in the hugetlb_add_anon_rmap is
worth fixing...

hugepage_add_new_anon_rmap runs lock_page before so most certainly is
ok (maybe lock_page not needed if it's a new page?).

hugepage_add_anon_rmap seems to run on a local new page too (just
cowed) so I'm unsure why it's not using hugepage_add_new_anon_rmap too
and maybe it's safe without the PG_lock too, but it should use
hugepage_add_new_anon_rmap so we keep the same logic of the core VM.

I didn't spend too much on this hugetlbfs code, this is just a short
review.

> switch of anon_vma beneath us there.   Plus a
> VM_BUG_ON(PageLocked(page)) going into page_lock_anon_vma().

!PageLocked

Ok I think the concurrent writers of page->mapping needs the PG_lock
(unless they're working on the newly allocated page like
page_add_new_anon_rmap) but I really like page_lock_anon_vma to be
safe without PG_lock and to rely only on RCU and the anon_vma lock. Do
you think there's a window there? I think as long as the page is
mapped it doesn't matter... any change that can happen from under us
in page_lock_anon_vma, is still going to point to a valid anon_vma, if
it is reused it can be reused regardless if it's the root or the local
one and we've your fix to take care of that complexity of slab RCU
freeing behavior.

I need to think again and more deeply about page_lock_anon_vma running
on a page not locked, but I was convinced it was safe and that only
the writers needed the PG_lock (and even for the writers case, I think
it is mostly needed for other reasons, notably to keep the mapcount
coherent!). Yeah the write to page->mapping has to be atomic of course
(I actually checked every page->mapping writer in mm/rmap.c to verify
the gcc output to verify it is atomic, like we also relay on gcc to
write 64bit atomic for the pte/spte writes... I worried a little about
the |=1 but gcc is smart enough to do it in registers before writing
to memory, again not so different from the pte updates ;).

My reasoning is that any anon_vma we pick in page_lock_anon_vma is ok
(the root or intermediate one can't go away until the local one goes
away, if it's the local one it can't go away as long as the page is
mapped). So as long as the page->mapping writer writes atomic, and we
check page_mapped after taking the lock like your patch does, we
should be ok without PG_lock in page_lock_anon_vma.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] mm: fix hang on anon_vma->root->lock
@ 2010-08-28 15:54                             ` Andrea Arcangeli
  0 siblings, 0 replies; 44+ messages in thread
From: Andrea Arcangeli @ 2010-08-28 15:54 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Christoph Lameter, Naoya Horiguchi, Linus Torvalds,
	Andrew Morton, Rik van Riel, Peter Zijlstra, linux-kernel,
	linux-mm

On Fri, Aug 27, 2010 at 06:07:23PM -0700, Hugh Dickins wrote:
> __page_set_anon_rmap() looks like it might have changed anon
> page->mapping in 2.6.35, but Andrea has fixed that with PageAnon tests
> in 2.6.36-rc.  Ah, but what if "exclusive" and non-exclusive calls to
> __page_set_anon_rmap() are racing?  Not clear, it may be that Andrea

__page_set_anon_rmap doesn't require the PG_lock only if it's a newly
allocated page and it is called from page_add_new_anon_rmap. So there
cannot be concurrent __page_set_anon_rmap running on the same page. If
there could be concurrent __page_set_anon_rmap running on the same
page, the page_lock_anon_vma running on a changing page->mapping would
be the last worry, as page_add_new_anon_rmap would overwrite _mapcount
with 0 while do_page_add_anon_rmap runs, so corrupting the mapcount
information leading to immediate crash during page freeing....

So definitely it must not happen and not only because of
page_lock_anon_vma... and it's unlikely to go unnoticed.

I think we're safe on that respect.

> has only narrowed a window not closed it (and I've not yet looked up
> the commit to see his intent); or it may be okay, that there cannot be
> a conflict of anon_vma in that case.  Need to dig deeper.

That change is to avoid altering the page->mapping for anon
pages. It's only an optimization. No need to set the page->mapping
back to the anon_vma->root for AnonPages (that in turn have already
their page->mapping set) if we've already more finegrained information
into the page->mapping. If we've already information in page->mapping
(page is Anon) then setting to anon_vma->root can only be a coarser
setting losing anon_vma child granularity. We must set to the root
anon vma however when the page is swapcache but not anon yet... and if
it's not exclusive we've to use the anon_vma->root, otherwise if it's
being taken over by the local process we can use the local
vma->anon_vma. This should explain the logic in __set_page_anon_rmap.

> __hugepage_set_anon_rmap() appears to copy the 2.6.35
> __page_set_anon_rmap(), and probably needs to add in Andrea's fix, or
> whatever else is needed there.

I think it's actually safe in anon_vma terms, setting the
page->mapping to the anon_vma->root _always_ safe, but it should use
anon_vma->root instead of list_entry (should still lead to the same
result) and it can probably also optimize it if it's already an
AnonPage like I did for the not-hugetlbfs case (which also includes
transparent hugepages as they share the core VM paths).

The lack of BUG_ON(!PageLocked(page)) in the hugetlb_add_anon_rmap is
worth fixing...

hugepage_add_new_anon_rmap runs lock_page before so most certainly is
ok (maybe lock_page not needed if it's a new page?).

hugepage_add_anon_rmap seems to run on a local new page too (just
cowed) so I'm unsure why it's not using hugepage_add_new_anon_rmap too
and maybe it's safe without the PG_lock too, but it should use
hugepage_add_new_anon_rmap so we keep the same logic of the core VM.

I didn't spend too much on this hugetlbfs code, this is just a short
review.

> switch of anon_vma beneath us there.   Plus a
> VM_BUG_ON(PageLocked(page)) going into page_lock_anon_vma().

!PageLocked

Ok I think the concurrent writers of page->mapping needs the PG_lock
(unless they're working on the newly allocated page like
page_add_new_anon_rmap) but I really like page_lock_anon_vma to be
safe without PG_lock and to rely only on RCU and the anon_vma lock. Do
you think there's a window there? I think as long as the page is
mapped it doesn't matter... any change that can happen from under us
in page_lock_anon_vma, is still going to point to a valid anon_vma, if
it is reused it can be reused regardless if it's the root or the local
one and we've your fix to take care of that complexity of slab RCU
freeing behavior.

I need to think again and more deeply about page_lock_anon_vma running
on a page not locked, but I was convinced it was safe and that only
the writers needed the PG_lock (and even for the writers case, I think
it is mostly needed for other reasons, notably to keep the mapcount
coherent!). Yeah the write to page->mapping has to be atomic of course
(I actually checked every page->mapping writer in mm/rmap.c to verify
the gcc output to verify it is atomic, like we also relay on gcc to
write 64bit atomic for the pte/spte writes... I worried a little about
the |=1 but gcc is smart enough to do it in registers before writing
to memory, again not so different from the pte updates ;).

My reasoning is that any anon_vma we pick in page_lock_anon_vma is ok
(the root or intermediate one can't go away until the local one goes
away, if it's the local one it can't go away as long as the page is
mapped). So as long as the page->mapping writer writes atomic, and we
check page_mapped after taking the lock like your patch does, we
should be ok without PG_lock in page_lock_anon_vma.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2010-08-28 15:54 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-26  6:12 [PATCH] mm: fix hang on anon_vma->root->lock Hugh Dickins
2010-08-26  6:12 ` Hugh Dickins
2010-08-26  6:41 ` David Miller
2010-08-26  6:41   ` David Miller
2010-08-26 10:54   ` Hugh Dickins
2010-08-26 10:54     ` Hugh Dickins
2010-08-26 19:00     ` David Miller
2010-08-26 19:00       ` David Miller
2010-08-27  0:19     ` Andrea Arcangeli
2010-08-27  0:19       ` Andrea Arcangeli
2010-08-26 13:32 ` Rik van Riel
2010-08-26 13:32   ` Rik van Riel
2010-08-26 23:50 ` Andrea Arcangeli
2010-08-26 23:50   ` Andrea Arcangeli
2010-08-27  1:43   ` Hugh Dickins
2010-08-27  1:43     ` Hugh Dickins
2010-08-27  9:55     ` Andrea Arcangeli
2010-08-27  9:55       ` Andrea Arcangeli
2010-08-27 16:43       ` Hugh Dickins
2010-08-27 16:43         ` Hugh Dickins
2010-08-27 17:13         ` Christoph Lameter
2010-08-27 17:13           ` Christoph Lameter
2010-08-27 17:55           ` Hugh Dickins
2010-08-27 17:55             ` Hugh Dickins
2010-08-27 19:29             ` Christoph Lameter
2010-08-27 19:29               ` Christoph Lameter
2010-08-27 20:14               ` Hugh Dickins
2010-08-27 20:14                 ` Hugh Dickins
2010-08-27 20:56                 ` Christoph Lameter
2010-08-27 20:56                   ` Christoph Lameter
2010-08-27 21:28                   ` Hugh Dickins
2010-08-27 21:28                     ` Hugh Dickins
2010-08-27 21:33                     ` Hugh Dickins
2010-08-27 21:33                       ` Hugh Dickins
2010-08-27 23:06                       ` Christoph Lameter
2010-08-27 23:06                         ` Christoph Lameter
2010-08-28  1:07                         ` Hugh Dickins
2010-08-28  1:07                           ` Hugh Dickins
2010-08-28  2:47                           ` Christoph Lameter
2010-08-28  2:47                             ` Christoph Lameter
2010-08-28 10:17                             ` Peter Zijlstra
2010-08-28 10:17                               ` Peter Zijlstra
2010-08-28 15:54                           ` Andrea Arcangeli
2010-08-28 15:54                             ` Andrea Arcangeli

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.