All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] hwpoison: Fix race with changing page during offlining v2
@ 2014-07-01  0:32 ` Andi Kleen
  0 siblings, 0 replies; 10+ messages in thread
From: Andi Kleen @ 2014-07-01  0:32 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, akpm, Andi Kleen, Naoya Horiguchi

From: Andi Kleen <ak@linux.intel.com>

When a hwpoison page is locked it could change state
due to parallel modifications.  Check after the lock
if the page is still the same compound page.

[v2: Removed earlier non LRU check which should be already
covered elsewhere]

Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 mm/memory-failure.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index cd8989c..99e5077 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1168,6 +1168,16 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
 	lock_page(hpage);
 
 	/*
+	 * The page could have changed compound pages during the locking.
+	 * If this happens just bail out.
+	 */
+	if (compound_head(p) != hpage) {
+		action_result(pfn, "different compound page after locking", IGNORED);
+		res = -EBUSY;
+		goto out;
+	}
+
+	/*
 	 * We use page flags to determine what action should be taken, but
 	 * the flags can be modified by the error containment action.  One
 	 * example is an mlocked page, where PG_mlocked is cleared by
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH] hwpoison: Fix race with changing page during offlining v2
@ 2014-07-01  0:32 ` Andi Kleen
  0 siblings, 0 replies; 10+ messages in thread
From: Andi Kleen @ 2014-07-01  0:32 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, akpm, Andi Kleen, Naoya Horiguchi

From: Andi Kleen <ak@linux.intel.com>

When a hwpoison page is locked it could change state
due to parallel modifications.  Check after the lock
if the page is still the same compound page.

[v2: Removed earlier non LRU check which should be already
covered elsewhere]

Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 mm/memory-failure.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index cd8989c..99e5077 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1168,6 +1168,16 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
 	lock_page(hpage);
 
 	/*
+	 * The page could have changed compound pages during the locking.
+	 * If this happens just bail out.
+	 */
+	if (compound_head(p) != hpage) {
+		action_result(pfn, "different compound page after locking", IGNORED);
+		res = -EBUSY;
+		goto out;
+	}
+
+	/*
 	 * We use page flags to determine what action should be taken, but
 	 * the flags can be modified by the error containment action.  One
 	 * example is an mlocked page, where PG_mlocked is cleared by
-- 
1.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] hwpoison: Fix race with changing page during offlining v2
  2014-07-01  0:32 ` Andi Kleen
@ 2014-07-01  1:21   ` Naoya Horiguchi
  -1 siblings, 0 replies; 10+ messages in thread
From: Naoya Horiguchi @ 2014-07-01  1:21 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-mm, linux-kernel, akpm, Andi Kleen

On Mon, Jun 30, 2014 at 05:32:16PM -0700, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> When a hwpoison page is locked it could change state
> due to parallel modifications.  Check after the lock
> if the page is still the same compound page.
> 
> [v2: Removed earlier non LRU check which should be already
> covered elsewhere]
> 
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Signed-off-by: Andi Kleen <ak@linux.intel.com>

Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

Is it -stable matter?
Maybe 2.6.38+ can profit from this.

Thanks,
Naoya Horiguchi

> ---
>  mm/memory-failure.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index cd8989c..99e5077 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1168,6 +1168,16 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
>  	lock_page(hpage);
>  
>  	/*
> +	 * The page could have changed compound pages during the locking.
> +	 * If this happens just bail out.
> +	 */
> +	if (compound_head(p) != hpage) {
> +		action_result(pfn, "different compound page after locking", IGNORED);
> +		res = -EBUSY;
> +		goto out;
> +	}
> +
> +	/*
>  	 * We use page flags to determine what action should be taken, but
>  	 * the flags can be modified by the error containment action.  One
>  	 * example is an mlocked page, where PG_mlocked is cleared by
> -- 
> 1.9.3
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] hwpoison: Fix race with changing page during offlining v2
@ 2014-07-01  1:21   ` Naoya Horiguchi
  0 siblings, 0 replies; 10+ messages in thread
From: Naoya Horiguchi @ 2014-07-01  1:21 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-mm, linux-kernel, akpm, Andi Kleen

On Mon, Jun 30, 2014 at 05:32:16PM -0700, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> When a hwpoison page is locked it could change state
> due to parallel modifications.  Check after the lock
> if the page is still the same compound page.
> 
> [v2: Removed earlier non LRU check which should be already
> covered elsewhere]
> 
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Signed-off-by: Andi Kleen <ak@linux.intel.com>

Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

Is it -stable matter?
Maybe 2.6.38+ can profit from this.

Thanks,
Naoya Horiguchi

> ---
>  mm/memory-failure.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index cd8989c..99e5077 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1168,6 +1168,16 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
>  	lock_page(hpage);
>  
>  	/*
> +	 * The page could have changed compound pages during the locking.
> +	 * If this happens just bail out.
> +	 */
> +	if (compound_head(p) != hpage) {
> +		action_result(pfn, "different compound page after locking", IGNORED);
> +		res = -EBUSY;
> +		goto out;
> +	}
> +
> +	/*
>  	 * We use page flags to determine what action should be taken, but
>  	 * the flags can be modified by the error containment action.  One
>  	 * example is an mlocked page, where PG_mlocked is cleared by
> -- 
> 1.9.3
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] hwpoison: Fix race with changing page during offlining v2
  2014-07-01  1:21   ` Naoya Horiguchi
@ 2014-07-01  2:15     ` Andi Kleen
  -1 siblings, 0 replies; 10+ messages in thread
From: Andi Kleen @ 2014-07-01  2:15 UTC (permalink / raw)
  To: Naoya Horiguchi; +Cc: Andi Kleen, linux-mm, linux-kernel, akpm, Andi Kleen

> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> 
> Is it -stable matter?
> Maybe 2.6.38+ can profit from this.

Probably not, it's not a critical bug fix.

-Andi

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] hwpoison: Fix race with changing page during offlining v2
@ 2014-07-01  2:15     ` Andi Kleen
  0 siblings, 0 replies; 10+ messages in thread
From: Andi Kleen @ 2014-07-01  2:15 UTC (permalink / raw)
  To: Naoya Horiguchi; +Cc: Andi Kleen, linux-mm, linux-kernel, akpm, Andi Kleen

> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> 
> Is it -stable matter?
> Maybe 2.6.38+ can profit from this.

Probably not, it's not a critical bug fix.

-Andi

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] hwpoison: Fix race with changing page during offlining v2
  2014-07-01  0:32 ` Andi Kleen
@ 2014-07-01 22:27   ` Andrew Morton
  -1 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2014-07-01 22:27 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-mm, linux-kernel, Andi Kleen, Naoya Horiguchi

On Mon, 30 Jun 2014 17:32:16 -0700 Andi Kleen <andi@firstfloor.org> wrote:

> From: Andi Kleen <ak@linux.intel.com>
> 
> When a hwpoison page is locked it could change state
> due to parallel modifications.  Check after the lock
> if the page is still the same compound page.
> 
> ...
>
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1168,6 +1168,16 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
>  	lock_page(hpage);
>  
>  	/*
> +	 * The page could have changed compound pages during the locking.
> +	 * If this happens just bail out.
> +	 */
> +	if (compound_head(p) != hpage) {

How can a 4k page change compound pages?  The original compound page
was torn down and then this 4k page became part of a differently-sized
compound page?

> +		action_result(pfn, "different compound page after locking", IGNORED);
> +		res = -EBUSY;
> +		goto out;
> +	}
> +
> +	/*

I don't get it.  We just go and fail the poisoning attempt?  Shouldn't
we go back, grab the new hpage and try again?



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] hwpoison: Fix race with changing page during offlining v2
@ 2014-07-01 22:27   ` Andrew Morton
  0 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2014-07-01 22:27 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-mm, linux-kernel, Andi Kleen, Naoya Horiguchi

On Mon, 30 Jun 2014 17:32:16 -0700 Andi Kleen <andi@firstfloor.org> wrote:

> From: Andi Kleen <ak@linux.intel.com>
> 
> When a hwpoison page is locked it could change state
> due to parallel modifications.  Check after the lock
> if the page is still the same compound page.
> 
> ...
>
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1168,6 +1168,16 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
>  	lock_page(hpage);
>  
>  	/*
> +	 * The page could have changed compound pages during the locking.
> +	 * If this happens just bail out.
> +	 */
> +	if (compound_head(p) != hpage) {

How can a 4k page change compound pages?  The original compound page
was torn down and then this 4k page became part of a differently-sized
compound page?

> +		action_result(pfn, "different compound page after locking", IGNORED);
> +		res = -EBUSY;
> +		goto out;
> +	}
> +
> +	/*

I don't get it.  We just go and fail the poisoning attempt?  Shouldn't
we go back, grab the new hpage and try again?


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] hwpoison: Fix race with changing page during offlining v2
  2014-07-01 22:27   ` Andrew Morton
@ 2014-07-02  0:02     ` Andi Kleen
  -1 siblings, 0 replies; 10+ messages in thread
From: Andi Kleen @ 2014-07-02  0:02 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andi Kleen, linux-mm, linux-kernel, Andi Kleen, Naoya Horiguchi

> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -1168,6 +1168,16 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
> >  	lock_page(hpage);
> >  
> >  	/*
> > +	 * The page could have changed compound pages during the locking.
> > +	 * If this happens just bail out.
> > +	 */
> > +	if (compound_head(p) != hpage) {
> 
> How can a 4k page change compound pages?  The original compound page
> was torn down and then this 4k page became part of a differently-size
> compound page?

Yes or it was torn down and now it's its own page.

> 
> > +		action_result(pfn, "different compound page after locking", IGNORED);
> > +		res = -EBUSY;
> > +		goto out;
> > +	}
> > +
> > +	/*
> 
> I don't get it.  We just go and fail the poisoning attempt?  Shouldn't
> we go back, grab the new hpage and try again?

It should be quite rare, so I thought this was safest. An retry loop
would be more difficult to test and may have more side effects.

The hwpoison code by design only tries to handle cases that are
reasonably common in workloads, as visible in page-flags.

I'm not really that concerned about handling this (likely rare case),
just not crashing on it.

-Andi

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] hwpoison: Fix race with changing page during offlining v2
@ 2014-07-02  0:02     ` Andi Kleen
  0 siblings, 0 replies; 10+ messages in thread
From: Andi Kleen @ 2014-07-02  0:02 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andi Kleen, linux-mm, linux-kernel, Andi Kleen, Naoya Horiguchi

> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -1168,6 +1168,16 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
> >  	lock_page(hpage);
> >  
> >  	/*
> > +	 * The page could have changed compound pages during the locking.
> > +	 * If this happens just bail out.
> > +	 */
> > +	if (compound_head(p) != hpage) {
> 
> How can a 4k page change compound pages?  The original compound page
> was torn down and then this 4k page became part of a differently-size
> compound page?

Yes or it was torn down and now it's its own page.

> 
> > +		action_result(pfn, "different compound page after locking", IGNORED);
> > +		res = -EBUSY;
> > +		goto out;
> > +	}
> > +
> > +	/*
> 
> I don't get it.  We just go and fail the poisoning attempt?  Shouldn't
> we go back, grab the new hpage and try again?

It should be quite rare, so I thought this was safest. An retry loop
would be more difficult to test and may have more side effects.

The hwpoison code by design only tries to handle cases that are
reasonably common in workloads, as visible in page-flags.

I'm not really that concerned about handling this (likely rare case),
just not crashing on it.

-Andi

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2014-07-02  0:02 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-01  0:32 [PATCH] hwpoison: Fix race with changing page during offlining v2 Andi Kleen
2014-07-01  0:32 ` Andi Kleen
2014-07-01  1:21 ` Naoya Horiguchi
2014-07-01  1:21   ` Naoya Horiguchi
2014-07-01  2:15   ` Andi Kleen
2014-07-01  2:15     ` Andi Kleen
2014-07-01 22:27 ` Andrew Morton
2014-07-01 22:27   ` Andrew Morton
2014-07-02  0:02   ` Andi Kleen
2014-07-02  0:02     ` Andi Kleen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.