linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V2] MCE: fix an error of mce_bad_pages statistics
@ 2012-12-07  8:48 Xishi Qiu
  2012-12-07 14:33 ` Borislav Petkov
  2012-12-07 22:11 ` Andrew Morton
  0 siblings, 2 replies; 28+ messages in thread
From: Xishi Qiu @ 2012-12-07  8:48 UTC (permalink / raw)
  To: WuJianguo, Xishi Qiu, Liujiang, Vyacheslav.Dubeyko,
	Borislav Petkov, andi, akpm, linux-mm, linux-kernel

On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to offline a
free page twice, the value of mce_bad_pages will be added twice. So this is an error,
since the page was already marked HWPoison, we should skip the page and don't add the
value of mce_bad_pages.

$ cat /proc/meminfo | grep HardwareCorrupted

soft_offline_page()
	get_any_page()
		atomic_long_add(1, &mce_bad_pages)

Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
---
 mm/memory-failure.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 8b20278..de760ca 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int flags)
 		return ret;

 done:
-	atomic_long_add(1, &mce_bad_pages);
-	SetPageHWPoison(page);
 	/* keep elevated page count for bad page */
+	if (!PageHWPoison(page)) {
+		atomic_long_add(1, &mce_bad_pages);
+		SetPageHWPoison(page);
+	}
+
 	return ret;
 }
-- 
1.7.6.1



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
  2012-12-07  8:48 [PATCH V2] MCE: fix an error of mce_bad_pages statistics Xishi Qiu
@ 2012-12-07 14:33 ` Borislav Petkov
  2012-12-07 22:11 ` Andrew Morton
  1 sibling, 0 replies; 28+ messages in thread
From: Borislav Petkov @ 2012-12-07 14:33 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: WuJianguo, Liujiang, Vyacheslav.Dubeyko, andi, akpm, linux-mm,
	linux-kernel

On Fri, Dec 07, 2012 at 04:48:45PM +0800, Xishi Qiu wrote:
> On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to offline a
> free page twice, the value of mce_bad_pages will be added twice. So this is an error,
> since the page was already marked HWPoison, we should skip the page and don't add the
> value of mce_bad_pages.
> 
> $ cat /proc/meminfo | grep HardwareCorrupted
> 
> soft_offline_page()
> 	get_any_page()
> 		atomic_long_add(1, &mce_bad_pages)
> 
> Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
> Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
> ---
>  mm/memory-failure.c |    7 +++++--
>  1 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 8b20278..de760ca 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int flags)
>  		return ret;
> 
>  done:
> -	atomic_long_add(1, &mce_bad_pages);
> -	SetPageHWPoison(page);
>  	/* keep elevated page count for bad page */
> +	if (!PageHWPoison(page)) {
> +		atomic_long_add(1, &mce_bad_pages);
> +		SetPageHWPoison(page);
> +	}

Ok, I don't know the memory-failure code all that well but IMHO why
should we wade through the whole soft_offline_page function for a page
which has been marked as poisoned already?

IOW, why not do what you started with previously and exit this function
ASAP:

---
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 8b20278be6a6..a83baeca0644 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1494,6 +1494,12 @@ int soft_offline_page(struct page *page, int flags)
 	if (ret == 0)
 		goto done;
 
+	if (PageHWPoison(page)) {
+		put_page(page);
+		pr_info("soft offline: %#lx page already poisoned\n", pfn);
+		return -EBUSY;
+	}
+
 	/*
 	 * Page cache page we can handle?
 	 */
---

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
  2012-12-07  8:48 [PATCH V2] MCE: fix an error of mce_bad_pages statistics Xishi Qiu
  2012-12-07 14:33 ` Borislav Petkov
@ 2012-12-07 22:11 ` Andrew Morton
  2012-12-07 22:41   ` Borislav Petkov
                     ` (2 more replies)
  1 sibling, 3 replies; 28+ messages in thread
From: Andrew Morton @ 2012-12-07 22:11 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, andi,
	linux-mm, linux-kernel

On Fri, 7 Dec 2012 16:48:45 +0800
Xishi Qiu <qiuxishi@huawei.com> wrote:

> On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to offline a
> free page twice, the value of mce_bad_pages will be added twice. So this is an error,
> since the page was already marked HWPoison, we should skip the page and don't add the
> value of mce_bad_pages.
> 
> $ cat /proc/meminfo | grep HardwareCorrupted
> 
> soft_offline_page()
> 	get_any_page()
> 		atomic_long_add(1, &mce_bad_pages)
> 
> ...
>
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int flags)
>  		return ret;
> 
>  done:
> -	atomic_long_add(1, &mce_bad_pages);
> -	SetPageHWPoison(page);
>  	/* keep elevated page count for bad page */
> +	if (!PageHWPoison(page)) {
> +		atomic_long_add(1, &mce_bad_pages);
> +		SetPageHWPoison(page);
> +	}
> +
>  	return ret;
>  }

A few things:

- soft_offline_page() already checks for this case:

	if (PageHWPoison(page)) {
		unlock_page(page);
		put_page(page);
		pr_info("soft offline: %#lx page already poisoned\n", pfn);
		return -EBUSY;
	}

  so why didn't this check work for you?

  Presumably because one of the earlier "goto done" branches was
  taken.  Which one, any why?

  This function is an utter mess.  It contains six return points
  randomly intermingled with three "goto done" return points.

  This mess is probably the cause of the bug you have observed.  Can
  we please fix it up somehow?  It *seems* that the design (lol) of
  this function is "for errors, return immediately.  For success, goto
  done".  In which case "done" should have been called "success".  But
  if you just look at the function you'll see that this approach didn't
  work.  I suggest it be converted to have two return points - one for
  the success path, one for the failure path.  Or something.

- soft_offline_huge_page() is a miniature copy of soft_offline_page()
  and might suffer the same bug.

- A cleaner, shorter and possibly faster implementation is

	if (!TestSetPageHWPoison(page))
		atomic_long_add(1, &mce_bad_pages);

- We have atomic_long_inc().  Use it?

- Why do we have a variable called "mce_bad_pages"?  MCE is an x86
  concept, and this code is in mm/.  Lights are flashing, bells are
  ringing and a loudspeaker is blaring "layering violation" at us!

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
  2012-12-07 22:11 ` Andrew Morton
@ 2012-12-07 22:41   ` Borislav Petkov
  2012-12-10  4:33   ` Xishi Qiu
       [not found]   ` <20121210083342.GA31670@hacker.(null)>
  2 siblings, 0 replies; 28+ messages in thread
From: Borislav Petkov @ 2012-12-07 22:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Xishi Qiu, WuJianguo, Liujiang, Vyacheslav.Dubeyko, andi,
	linux-mm, linux-kernel

On Fri, Dec 07, 2012 at 02:11:02PM -0800, Andrew Morton wrote:
> A few things:
> 
> - soft_offline_page() already checks for this case:
> 
> 	if (PageHWPoison(page)) {
> 		unlock_page(page);
> 		put_page(page);
> 		pr_info("soft offline: %#lx page already poisoned\n", pfn);
> 		return -EBUSY;
> 	}

Oh, so we do this check after all. But later in the function. Why? Why
not at the beginning so that when a page is marked poisoned already we
can exit early?

Strange.

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
  2012-12-07 22:11 ` Andrew Morton
  2012-12-07 22:41   ` Borislav Petkov
@ 2012-12-10  4:33   ` Xishi Qiu
       [not found]   ` <20121210083342.GA31670@hacker.(null)>
  2 siblings, 0 replies; 28+ messages in thread
From: Xishi Qiu @ 2012-12-10  4:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, andi,
	linux-mm, linux-kernel

On 2012/12/8 6:11, Andrew Morton wrote:

> On Fri, 7 Dec 2012 16:48:45 +0800
> Xishi Qiu <qiuxishi@huawei.com> wrote:
> 
>> On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to offline a
>> free page twice, the value of mce_bad_pages will be added twice. So this is an error,
>> since the page was already marked HWPoison, we should skip the page and don't add the
>> value of mce_bad_pages.
>>
>> $ cat /proc/meminfo | grep HardwareCorrupted
>>
>> soft_offline_page()
>> 	get_any_page()
>> 		atomic_long_add(1, &mce_bad_pages)
>>
>> ...
>>
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int flags)
>>  		return ret;
>>
>>  done:
>> -	atomic_long_add(1, &mce_bad_pages);
>> -	SetPageHWPoison(page);
>>  	/* keep elevated page count for bad page */
>> +	if (!PageHWPoison(page)) {
>> +		atomic_long_add(1, &mce_bad_pages);
>> +		SetPageHWPoison(page);
>> +	}
>> +
>>  	return ret;
>>  }
> 
> A few things:
> 
> - soft_offline_page() already checks for this case:
> 
> 	if (PageHWPoison(page)) {
> 		unlock_page(page);
> 		put_page(page);
> 		pr_info("soft offline: %#lx page already poisoned\n", pfn);
> 		return -EBUSY;
> 	}
> 
>   so why didn't this check work for you?
> 
>   Presumably because one of the earlier "goto done" branches was
>   taken.  Which one, any why?
> 
>   This function is an utter mess.  It contains six return points
>   randomly intermingled with three "goto done" return points.
> 
>   This mess is probably the cause of the bug you have observed.  Can
>   we please fix it up somehow?  It *seems* that the design (lol) of
>   this function is "for errors, return immediately.  For success, goto
>   done".  In which case "done" should have been called "success".  But
>   if you just look at the function you'll see that this approach didn't
>   work.  I suggest it be converted to have two return points - one for
>   the success path, one for the failure path.  Or something.
> 
> - soft_offline_huge_page() is a miniature copy of soft_offline_page()
>   and might suffer the same bug.
> 
> - A cleaner, shorter and possibly faster implementation is
> 
> 	if (!TestSetPageHWPoison(page))
> 		atomic_long_add(1, &mce_bad_pages);
> 
> - We have atomic_long_inc().  Use it?
> 
> - Why do we have a variable called "mce_bad_pages"?  MCE is an x86
>   concept, and this code is in mm/.  Lights are flashing, bells are
>   ringing and a loudspeaker is blaring "layering violation" at us!
> 

Hi Andrew, thank you for your advice, I will send V3 soon.

Thanks
Xishi Qiu

> .
> 




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
       [not found]   ` <20121210083342.GA31670@hacker.(null)>
@ 2012-12-10  9:06     ` Xishi Qiu
  2012-12-10 10:47       ` Simon Jeons
  0 siblings, 1 reply; 28+ messages in thread
From: Xishi Qiu @ 2012-12-10  9:06 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Andrew Morton, WuJianguo, Liujiang, Vyacheslav.Dubeyko,
	Borislav Petkov, andi, linux-mm, linux-kernel, wency

On 2012/12/10 16:33, Wanpeng Li wrote:

> On Fri, Dec 07, 2012 at 02:11:02PM -0800, Andrew Morton wrote:
>> On Fri, 7 Dec 2012 16:48:45 +0800
>> Xishi Qiu <qiuxishi@huawei.com> wrote:
>>
>>> On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to offline a
>>> free page twice, the value of mce_bad_pages will be added twice. So this is an error,
>>> since the page was already marked HWPoison, we should skip the page and don't add the
>>> value of mce_bad_pages.
>>>
>>> $ cat /proc/meminfo | grep HardwareCorrupted
>>>
>>> soft_offline_page()
>>> 	get_any_page()
>>> 		atomic_long_add(1, &mce_bad_pages)
>>>
>>> ...
>>>
>>> --- a/mm/memory-failure.c
>>> +++ b/mm/memory-failure.c
>>> @@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int flags)
>>>  		return ret;
>>>
>>>  done:
>>> -	atomic_long_add(1, &mce_bad_pages);
>>> -	SetPageHWPoison(page);
>>>  	/* keep elevated page count for bad page */
>>> +	if (!PageHWPoison(page)) {
>>> +		atomic_long_add(1, &mce_bad_pages);
>>> +		SetPageHWPoison(page);
>>> +	}
>>> +
>>>  	return ret;
>>>  }
>>
>> A few things:
>>
>> - soft_offline_page() already checks for this case:
>>
>> 	if (PageHWPoison(page)) {
>> 		unlock_page(page);
>> 		put_page(page);
>> 		pr_info("soft offline: %#lx page already poisoned\n", pfn);
>> 		return -EBUSY;
>> 	}
>>
>>  so why didn't this check work for you?
>>
>>  Presumably because one of the earlier "goto done" branches was
>>  taken.  Which one, any why?
>>
>>  This function is an utter mess.  It contains six return points
>>  randomly intermingled with three "goto done" return points.
>>
>>  This mess is probably the cause of the bug you have observed.  Can
>>  we please fix it up somehow?  It *seems* that the design (lol) of
>>  this function is "for errors, return immediately.  For success, goto
>>  done".  In which case "done" should have been called "success".  But
>>  if you just look at the function you'll see that this approach didn't
>>  work.  I suggest it be converted to have two return points - one for
>>  the success path, one for the failure path.  Or something.
>>
>> - soft_offline_huge_page() is a miniature copy of soft_offline_page()
>>  and might suffer the same bug.
>>
>> - A cleaner, shorter and possibly faster implementation is
>>
>> 	if (!TestSetPageHWPoison(page))
>> 		atomic_long_add(1, &mce_bad_pages);
>>
> 
> Hi Andrew,
> 
> Since hwpoison bit for free buddy page has already be set in get_any_page, 
> !TestSetPageHWPoison(page) will not increase mce_bad_pages count even for 
> the first time.
> 
> Regards,
> Wanpeng Li
> 

The poisoned page is isolated in bad_page(), I wonder whether it could be isolated
immediately in soft_offline_page() and memory_failure()?

buffered_rmqueue()
	prep_new_page()
		check_new_page()
			bad_page()

Thanks
Xishi Qiu

>> - We have atomic_long_inc().  Use it?
>>
>> - Why do we have a variable called "mce_bad_pages"?  MCE is an x86
>>  concept, and this code is in mm/.  Lights are flashing, bells are
>>  ringing and a loudspeaker is blaring "layering violation" at us!
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 
> 
> .
> 




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
  2012-12-10  9:06     ` Xishi Qiu
@ 2012-12-10 10:47       ` Simon Jeons
  2012-12-10 11:16         ` Xishi Qiu
  0 siblings, 1 reply; 28+ messages in thread
From: Simon Jeons @ 2012-12-10 10:47 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Wanpeng Li, Andrew Morton, WuJianguo, Liujiang,
	Vyacheslav.Dubeyko, Borislav Petkov, andi, linux-mm,
	linux-kernel, wency

On Mon, 2012-12-10 at 17:06 +0800, Xishi Qiu wrote:
> On 2012/12/10 16:33, Wanpeng Li wrote:
> 
> > On Fri, Dec 07, 2012 at 02:11:02PM -0800, Andrew Morton wrote:
> >> On Fri, 7 Dec 2012 16:48:45 +0800
> >> Xishi Qiu <qiuxishi@huawei.com> wrote:
> >>
> >>> On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to offline a
> >>> free page twice, the value of mce_bad_pages will be added twice. So this is an error,
> >>> since the page was already marked HWPoison, we should skip the page and don't add the
> >>> value of mce_bad_pages.
> >>>
> >>> $ cat /proc/meminfo | grep HardwareCorrupted
> >>>
> >>> soft_offline_page()
> >>> 	get_any_page()
> >>> 		atomic_long_add(1, &mce_bad_pages)
> >>>
> >>> ...
> >>>
> >>> --- a/mm/memory-failure.c
> >>> +++ b/mm/memory-failure.c
> >>> @@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int flags)
> >>>  		return ret;
> >>>
> >>>  done:
> >>> -	atomic_long_add(1, &mce_bad_pages);
> >>> -	SetPageHWPoison(page);
> >>>  	/* keep elevated page count for bad page */
> >>> +	if (!PageHWPoison(page)) {
> >>> +		atomic_long_add(1, &mce_bad_pages);
> >>> +		SetPageHWPoison(page);
> >>> +	}
> >>> +
> >>>  	return ret;
> >>>  }
> >>
> >> A few things:
> >>
> >> - soft_offline_page() already checks for this case:
> >>
> >> 	if (PageHWPoison(page)) {
> >> 		unlock_page(page);
> >> 		put_page(page);
> >> 		pr_info("soft offline: %#lx page already poisoned\n", pfn);
> >> 		return -EBUSY;
> >> 	}
> >>
> >>  so why didn't this check work for you?
> >>
> >>  Presumably because one of the earlier "goto done" branches was
> >>  taken.  Which one, any why?
> >>
> >>  This function is an utter mess.  It contains six return points
> >>  randomly intermingled with three "goto done" return points.
> >>
> >>  This mess is probably the cause of the bug you have observed.  Can
> >>  we please fix it up somehow?  It *seems* that the design (lol) of
> >>  this function is "for errors, return immediately.  For success, goto
> >>  done".  In which case "done" should have been called "success".  But
> >>  if you just look at the function you'll see that this approach didn't
> >>  work.  I suggest it be converted to have two return points - one for
> >>  the success path, one for the failure path.  Or something.
> >>
> >> - soft_offline_huge_page() is a miniature copy of soft_offline_page()
> >>  and might suffer the same bug.
> >>
> >> - A cleaner, shorter and possibly faster implementation is
> >>
> >> 	if (!TestSetPageHWPoison(page))
> >> 		atomic_long_add(1, &mce_bad_pages);
> >>
> > 
> > Hi Andrew,
> > 
> > Since hwpoison bit for free buddy page has already be set in get_any_page, 
> > !TestSetPageHWPoison(page) will not increase mce_bad_pages count even for 
> > the first time.
> > 
> > Regards,
> > Wanpeng Li
> > 
> 
> The poisoned page is isolated in bad_page(), I wonder whether it could be isolated
> immediately in soft_offline_page() and memory_failure()?
> 
> buffered_rmqueue()
> 	prep_new_page()
> 		check_new_page()
> 			bad_page()

Do you mean else if(is_free_buddy_page(p)) branch is redundancy?

> 
> Thanks
> Xishi Qiu
> 
> >> - We have atomic_long_inc().  Use it?
> >>
> >> - Why do we have a variable called "mce_bad_pages"?  MCE is an x86
> >>  concept, and this code is in mm/.  Lights are flashing, bells are
> >>  ringing and a loudspeaker is blaring "layering violation" at us!
> >>
> >> --
> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> >> the body to majordomo@kvack.org.  For more info on Linux MM,
> >> see: http://www.linux-mm.org/ .
> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> > 
> > 
> > .
> > 
> 
> 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
  2012-12-10 10:47       ` Simon Jeons
@ 2012-12-10 11:16         ` Xishi Qiu
  2012-12-10 11:58           ` Simon Jeons
                             ` (3 more replies)
  0 siblings, 4 replies; 28+ messages in thread
From: Xishi Qiu @ 2012-12-10 11:16 UTC (permalink / raw)
  To: Simon Jeons
  Cc: Wanpeng Li, Andrew Morton, WuJianguo, Liujiang,
	Vyacheslav.Dubeyko, Borislav Petkov, andi, linux-mm,
	linux-kernel, wency

On 2012/12/10 18:47, Simon Jeons wrote:

> On Mon, 2012-12-10 at 17:06 +0800, Xishi Qiu wrote:
>> On 2012/12/10 16:33, Wanpeng Li wrote:
>>
>>> On Fri, Dec 07, 2012 at 02:11:02PM -0800, Andrew Morton wrote:
>>>> On Fri, 7 Dec 2012 16:48:45 +0800
>>>> Xishi Qiu <qiuxishi@huawei.com> wrote:
>>>>
>>>>> On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to offline a
>>>>> free page twice, the value of mce_bad_pages will be added twice. So this is an error,
>>>>> since the page was already marked HWPoison, we should skip the page and don't add the
>>>>> value of mce_bad_pages.
>>>>>
>>>>> $ cat /proc/meminfo | grep HardwareCorrupted
>>>>>
>>>>> soft_offline_page()
>>>>> 	get_any_page()
>>>>> 		atomic_long_add(1, &mce_bad_pages)
>>>>>
>>>>> ...
>>>>>
>>>>> --- a/mm/memory-failure.c
>>>>> +++ b/mm/memory-failure.c
>>>>> @@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int flags)
>>>>>  		return ret;
>>>>>
>>>>>  done:
>>>>> -	atomic_long_add(1, &mce_bad_pages);
>>>>> -	SetPageHWPoison(page);
>>>>>  	/* keep elevated page count for bad page */
>>>>> +	if (!PageHWPoison(page)) {
>>>>> +		atomic_long_add(1, &mce_bad_pages);
>>>>> +		SetPageHWPoison(page);
>>>>> +	}
>>>>> +
>>>>>  	return ret;
>>>>>  }
>>>>
>>>> A few things:
>>>>
>>>> - soft_offline_page() already checks for this case:
>>>>
>>>> 	if (PageHWPoison(page)) {
>>>> 		unlock_page(page);
>>>> 		put_page(page);
>>>> 		pr_info("soft offline: %#lx page already poisoned\n", pfn);
>>>> 		return -EBUSY;
>>>> 	}
>>>>
>>>>  so why didn't this check work for you?
>>>>
>>>>  Presumably because one of the earlier "goto done" branches was
>>>>  taken.  Which one, any why?
>>>>
>>>>  This function is an utter mess.  It contains six return points
>>>>  randomly intermingled with three "goto done" return points.
>>>>
>>>>  This mess is probably the cause of the bug you have observed.  Can
>>>>  we please fix it up somehow?  It *seems* that the design (lol) of
>>>>  this function is "for errors, return immediately.  For success, goto
>>>>  done".  In which case "done" should have been called "success".  But
>>>>  if you just look at the function you'll see that this approach didn't
>>>>  work.  I suggest it be converted to have two return points - one for
>>>>  the success path, one for the failure path.  Or something.
>>>>
>>>> - soft_offline_huge_page() is a miniature copy of soft_offline_page()
>>>>  and might suffer the same bug.
>>>>
>>>> - A cleaner, shorter and possibly faster implementation is
>>>>
>>>> 	if (!TestSetPageHWPoison(page))
>>>> 		atomic_long_add(1, &mce_bad_pages);
>>>>
>>>
>>> Hi Andrew,
>>>
>>> Since hwpoison bit for free buddy page has already be set in get_any_page, 
>>> !TestSetPageHWPoison(page) will not increase mce_bad_pages count even for 
>>> the first time.
>>>
>>> Regards,
>>> Wanpeng Li
>>>
>>
>> The poisoned page is isolated in bad_page(), I wonder whether it could be isolated
>> immediately in soft_offline_page() and memory_failure()?
>>
>> buffered_rmqueue()
>> 	prep_new_page()
>> 		check_new_page()
>> 			bad_page()
> 
> Do you mean else if(is_free_buddy_page(p)) branch is redundancy?
> 

Hi Simon,

get_any_page() -> "else if(is_free_buddy_page(p))" branch is *not* redundancy.

It is another topic, I mean since the page is poisoned, so why not isolate it
from page buddy alocator in soft_offline_page() rather than in check_new_page().

I find soft_offline_page() only migrate the page and mark HWPoison, the poisoned
page is still managed by page buddy alocator.

>>
>> Thanks
>> Xishi Qiu
>>
>>>> - We have atomic_long_inc().  Use it?
>>>>
>>>> - Why do we have a variable called "mce_bad_pages"?  MCE is an x86
>>>>  concept, and this code is in mm/.  Lights are flashing, bells are
>>>>  ringing and a loudspeaker is blaring "layering violation" at us!
>>>>



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
       [not found]           ` <20121210113923.GA5579@hacker.(null)>
@ 2012-12-10 11:54             ` Xishi Qiu
  2012-12-10 12:11               ` Borislav Petkov
  2012-12-10 15:39             ` Andi Kleen
  1 sibling, 1 reply; 28+ messages in thread
From: Xishi Qiu @ 2012-12-10 11:54 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Simon Jeons, Andrew Morton, WuJianguo, Liujiang,
	Vyacheslav.Dubeyko, Borislav Petkov, andi, linux-mm,
	linux-kernel, wency

On 2012/12/10 19:39, Wanpeng Li wrote:

> On Mon, Dec 10, 2012 at 07:16:50PM +0800, Xishi Qiu wrote:
>> On 2012/12/10 18:47, Simon Jeons wrote:
>>
>>> On Mon, 2012-12-10 at 17:06 +0800, Xishi Qiu wrote:
>>>> On 2012/12/10 16:33, Wanpeng Li wrote:
>>>>
>>>>> On Fri, Dec 07, 2012 at 02:11:02PM -0800, Andrew Morton wrote:
>>>>>> On Fri, 7 Dec 2012 16:48:45 +0800
>>>>>> Xishi Qiu <qiuxishi@huawei.com> wrote:
>>>>>>
>>>>>>> On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to offline a
>>>>>>> free page twice, the value of mce_bad_pages will be added twice. So this is an error,
>>>>>>> since the page was already marked HWPoison, we should skip the page and don't add the
>>>>>>> value of mce_bad_pages.
>>>>>>>
>>>>>>> $ cat /proc/meminfo | grep HardwareCorrupted
>>>>>>>
>>>>>>> soft_offline_page()
>>>>>>> 	get_any_page()
>>>>>>> 		atomic_long_add(1, &mce_bad_pages)
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>> --- a/mm/memory-failure.c
>>>>>>> +++ b/mm/memory-failure.c
>>>>>>> @@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int flags)
>>>>>>>  		return ret;
>>>>>>>
>>>>>>>  done:
>>>>>>> -	atomic_long_add(1, &mce_bad_pages);
>>>>>>> -	SetPageHWPoison(page);
>>>>>>>  	/* keep elevated page count for bad page */
>>>>>>> +	if (!PageHWPoison(page)) {
>>>>>>> +		atomic_long_add(1, &mce_bad_pages);
>>>>>>> +		SetPageHWPoison(page);
>>>>>>> +	}
>>>>>>> +
>>>>>>>  	return ret;
>>>>>>>  }
>>>>>>
>>>>>> A few things:
>>>>>>
>>>>>> - soft_offline_page() already checks for this case:
>>>>>>
>>>>>> 	if (PageHWPoison(page)) {
>>>>>> 		unlock_page(page);
>>>>>> 		put_page(page);
>>>>>> 		pr_info("soft offline: %#lx page already poisoned\n", pfn);
>>>>>> 		return -EBUSY;
>>>>>> 	}
>>>>>>
>>>>>>  so why didn't this check work for you?
>>>>>>
>>>>>>  Presumably because one of the earlier "goto done" branches was
>>>>>>  taken.  Which one, any why?
>>>>>>
>>>>>>  This function is an utter mess.  It contains six return points
>>>>>>  randomly intermingled with three "goto done" return points.
>>>>>>
>>>>>>  This mess is probably the cause of the bug you have observed.  Can
>>>>>>  we please fix it up somehow?  It *seems* that the design (lol) of
>>>>>>  this function is "for errors, return immediately.  For success, goto
>>>>>>  done".  In which case "done" should have been called "success".  But
>>>>>>  if you just look at the function you'll see that this approach didn't
>>>>>>  work.  I suggest it be converted to have two return points - one for
>>>>>>  the success path, one for the failure path.  Or something.
>>>>>>
>>>>>> - soft_offline_huge_page() is a miniature copy of soft_offline_page()
>>>>>>  and might suffer the same bug.
>>>>>>
>>>>>> - A cleaner, shorter and possibly faster implementation is
>>>>>>
>>>>>> 	if (!TestSetPageHWPoison(page))
>>>>>> 		atomic_long_add(1, &mce_bad_pages);
>>>>>>
>>>>>
>>>>> Hi Andrew,
>>>>>
>>>>> Since hwpoison bit for free buddy page has already be set in get_any_page, 
>>>>> !TestSetPageHWPoison(page) will not increase mce_bad_pages count even for 
>>>>> the first time.
>>>>>
>>>>> Regards,
>>>>> Wanpeng Li
>>>>>
>>>>
>>>> The poisoned page is isolated in bad_page(), I wonder whether it could be isolated
>>>> immediately in soft_offline_page() and memory_failure()?
>>>>
>>>> buffered_rmqueue()
>>>> 	prep_new_page()
>>>> 		check_new_page()
>>>> 			bad_page()
>>>
>>> Do you mean else if(is_free_buddy_page(p)) branch is redundancy?
>>>
>>
>> Hi Simon,
>>
>> get_any_page() -> "else if(is_free_buddy_page(p))" branch is *not* redundancy.
>>
>> It is another topic, I mean since the page is poisoned, so why not isolate it
>>from page buddy alocator in soft_offline_page() rather than in check_new_page().
>>
>> I find soft_offline_page() only migrate the page and mark HWPoison, the poisoned
>> page is still managed by page buddy alocator.
>>
> 
> Hi Xishi,
> 
> HWPoison delays any action on buddy allocator pages, handling can be safely postponed 
> until a later time when the page might be referenced. By delaying, some transient errors 
> may not reoccur or may be irrelevant.
> 
> Regards,
> Wanpeng Li 
> 

Hi Wanpeng, thanks for your explanation.

One more question, can we add a list_head to manager the poisoned pages? I find ia64
has the array which named "static struct page *page_isolate[MAX_PAGE_ISOLATE]".

Andrew, what do you think?

Thanks
Xishi Qiu

>>>>
>>>> Thanks
>>>> Xishi Qiu
>>>>
>>>>>> - We have atomic_long_inc().  Use it?
>>>>>>
>>>>>> - Why do we have a variable called "mce_bad_pages"?  MCE is an x86
>>>>>>  concept, and this code is in mm/.  Lights are flashing, bells are
>>>>>>  ringing and a loudspeaker is blaring "layering violation" at us!
>>>>>>
>>
> 
> 
> .
> 




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
  2012-12-10 11:16         ` Xishi Qiu
@ 2012-12-10 11:58           ` Simon Jeons
       [not found]           ` <1355140561.1821.5.camel@kernel.cn.ibm.com>
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 28+ messages in thread
From: Simon Jeons @ 2012-12-10 11:58 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Wanpeng Li, Andrew Morton, WuJianguo, Liujiang,
	Vyacheslav.Dubeyko, Borislav Petkov, andi, linux-mm,
	linux-kernel, wency

On Mon, 2012-12-10 at 19:16 +0800, Xishi Qiu wrote:
> On 2012/12/10 18:47, Simon Jeons wrote:
> 
> > On Mon, 2012-12-10 at 17:06 +0800, Xishi Qiu wrote:
> >> On 2012/12/10 16:33, Wanpeng Li wrote:
> >>
> >>> On Fri, Dec 07, 2012 at 02:11:02PM -0800, Andrew Morton wrote:
> >>>> On Fri, 7 Dec 2012 16:48:45 +0800
> >>>> Xishi Qiu <qiuxishi@huawei.com> wrote:
> >>>>
> >>>>> On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to offline a
> >>>>> free page twice, the value of mce_bad_pages will be added twice. So this is an error,
> >>>>> since the page was already marked HWPoison, we should skip the page and don't add the
> >>>>> value of mce_bad_pages.
> >>>>>
> >>>>> $ cat /proc/meminfo | grep HardwareCorrupted
> >>>>>
> >>>>> soft_offline_page()
> >>>>> 	get_any_page()
> >>>>> 		atomic_long_add(1, &mce_bad_pages)
> >>>>>
> >>>>> ...
> >>>>>
> >>>>> --- a/mm/memory-failure.c
> >>>>> +++ b/mm/memory-failure.c
> >>>>> @@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int flags)
> >>>>>  		return ret;
> >>>>>
> >>>>>  done:
> >>>>> -	atomic_long_add(1, &mce_bad_pages);
> >>>>> -	SetPageHWPoison(page);
> >>>>>  	/* keep elevated page count for bad page */
> >>>>> +	if (!PageHWPoison(page)) {
> >>>>> +		atomic_long_add(1, &mce_bad_pages);
> >>>>> +		SetPageHWPoison(page);
> >>>>> +	}
> >>>>> +
> >>>>>  	return ret;
> >>>>>  }
> >>>>
> >>>> A few things:
> >>>>
> >>>> - soft_offline_page() already checks for this case:
> >>>>
> >>>> 	if (PageHWPoison(page)) {
> >>>> 		unlock_page(page);
> >>>> 		put_page(page);
> >>>> 		pr_info("soft offline: %#lx page already poisoned\n", pfn);
> >>>> 		return -EBUSY;
> >>>> 	}
> >>>>
> >>>>  so why didn't this check work for you?
> >>>>
> >>>>  Presumably because one of the earlier "goto done" branches was
> >>>>  taken.  Which one, any why?
> >>>>
> >>>>  This function is an utter mess.  It contains six return points
> >>>>  randomly intermingled with three "goto done" return points.
> >>>>
> >>>>  This mess is probably the cause of the bug you have observed.  Can
> >>>>  we please fix it up somehow?  It *seems* that the design (lol) of
> >>>>  this function is "for errors, return immediately.  For success, goto
> >>>>  done".  In which case "done" should have been called "success".  But
> >>>>  if you just look at the function you'll see that this approach didn't
> >>>>  work.  I suggest it be converted to have two return points - one for
> >>>>  the success path, one for the failure path.  Or something.
> >>>>
> >>>> - soft_offline_huge_page() is a miniature copy of soft_offline_page()
> >>>>  and might suffer the same bug.
> >>>>
> >>>> - A cleaner, shorter and possibly faster implementation is
> >>>>
> >>>> 	if (!TestSetPageHWPoison(page))
> >>>> 		atomic_long_add(1, &mce_bad_pages);
> >>>>
> >>>
> >>> Hi Andrew,
> >>>
> >>> Since hwpoison bit for free buddy page has already be set in get_any_page, 
> >>> !TestSetPageHWPoison(page) will not increase mce_bad_pages count even for 
> >>> the first time.
> >>>
> >>> Regards,
> >>> Wanpeng Li
> >>>
> >>
> >> The poisoned page is isolated in bad_page(), I wonder whether it could be isolated
> >> immediately in soft_offline_page() and memory_failure()?
> >>
> >> buffered_rmqueue()
> >> 	prep_new_page()
> >> 		check_new_page()
> >> 			bad_page()
> > 
> > Do you mean else if(is_free_buddy_page(p)) branch is redundancy?
> > 
> 
> Hi Simon,
> 
> get_any_page() -> "else if(is_free_buddy_page(p))" branch is *not* redundancy.
> 
> It is another topic, I mean since the page is poisoned, so why not isolate it

What topic? I still can't figure out when this branch can be executed
since hwpoison inject path can't poison free buddy pages.

> from page buddy alocator in soft_offline_page() rather than in check_new_page().
> 
> I find soft_offline_page() only migrate the page and mark HWPoison, the poisoned
> page is still managed by page buddy alocator.
> 
> >>
> >> Thanks
> >> Xishi Qiu
> >>
> >>>> - We have atomic_long_inc().  Use it?
> >>>>
> >>>> - Why do we have a variable called "mce_bad_pages"?  MCE is an x86
> >>>>  concept, and this code is in mm/.  Lights are flashing, bells are
> >>>>  ringing and a loudspeaker is blaring "layering violation" at us!
> >>>>
> 
> 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
  2012-12-10 11:54             ` Xishi Qiu
@ 2012-12-10 12:11               ` Borislav Petkov
  0 siblings, 0 replies; 28+ messages in thread
From: Borislav Petkov @ 2012-12-10 12:11 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Wanpeng Li, Simon Jeons, Andrew Morton, WuJianguo, Liujiang,
	Vyacheslav.Dubeyko, andi, linux-mm, linux-kernel, wency

On Mon, Dec 10, 2012 at 07:54:53PM +0800, Xishi Qiu wrote:
> One more question, can we add a list_head to manager the poisoned pages?

What would you need that list for? Also, a list is not the most optimal
data structure for when you need to traverse it often.

Thanks.

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
       [not found]             ` <50C5D844.8050707@huawei.com>
@ 2012-12-10 12:47               ` Simon Jeons
       [not found]                 ` <20121211011643.GA15754@hacker.(null)>
  0 siblings, 1 reply; 28+ messages in thread
From: Simon Jeons @ 2012-12-10 12:47 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Wanpeng Li, Simon Jeons, Andrew Morton, WuJianguo, Liujiang,
	Vyacheslav.Dubeyko, Borislav Petkov, andi, linux-mm,
	linux-kernel, wency

Cc other guys.

On Mon, 2012-12-10 at 20:40 +0800, Xishi Qiu wrote:
> On 2012/12/10 19:56, Simon Jeons wrote:
> 
> > On Mon, 2012-12-10 at 19:16 +0800, Xishi Qiu wrote:
> >> On 2012/12/10 18:47, Simon Jeons wrote:
> >>
> >>> On Mon, 2012-12-10 at 17:06 +0800, Xishi Qiu wrote:
> >>>> On 2012/12/10 16:33, Wanpeng Li wrote:
> >>>>
> >>>>> On Fri, Dec 07, 2012 at 02:11:02PM -0800, Andrew Morton wrote:
> >>>>>> On Fri, 7 Dec 2012 16:48:45 +0800
> >>>>>> Xishi Qiu <qiuxishi@huawei.com> wrote:
> >>>>>>
> >>>>>>> On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to offline a
> >>>>>>> free page twice, the value of mce_bad_pages will be added twice. So this is an error,
> >>>>>>> since the page was already marked HWPoison, we should skip the page and don't add the
> >>>>>>> value of mce_bad_pages.
> >>>>>>>
> >>>>>>> $ cat /proc/meminfo | grep HardwareCorrupted
> >>>>>>>
> >>>>>>> soft_offline_page()
> >>>>>>> 	get_any_page()
> >>>>>>> 		atomic_long_add(1, &mce_bad_pages)
> >>>>>>>
> >>>>>>> ...
> >>>>>>>
> >>>>>>> --- a/mm/memory-failure.c
> >>>>>>> +++ b/mm/memory-failure.c
> >>>>>>> @@ -1582,8 +1582,11 @@ int soft_offline_page(struct page *page, int flags)
> >>>>>>>  		return ret;
> >>>>>>>
> >>>>>>>  done:
> >>>>>>> -	atomic_long_add(1, &mce_bad_pages);
> >>>>>>> -	SetPageHWPoison(page);
> >>>>>>>  	/* keep elevated page count for bad page */
> >>>>>>> +	if (!PageHWPoison(page)) {
> >>>>>>> +		atomic_long_add(1, &mce_bad_pages);
> >>>>>>> +		SetPageHWPoison(page);
> >>>>>>> +	}
> >>>>>>> +
> >>>>>>>  	return ret;
> >>>>>>>  }
> >>>>>>
> >>>>>> A few things:
> >>>>>>
> >>>>>> - soft_offline_page() already checks for this case:
> >>>>>>
> >>>>>> 	if (PageHWPoison(page)) {
> >>>>>> 		unlock_page(page);
> >>>>>> 		put_page(page);
> >>>>>> 		pr_info("soft offline: %#lx page already poisoned\n", pfn);
> >>>>>> 		return -EBUSY;
> >>>>>> 	}
> >>>>>>
> >>>>>>  so why didn't this check work for you?
> >>>>>>
> >>>>>>  Presumably because one of the earlier "goto done" branches was
> >>>>>>  taken.  Which one, any why?
> >>>>>>
> >>>>>>  This function is an utter mess.  It contains six return points
> >>>>>>  randomly intermingled with three "goto done" return points.
> >>>>>>
> >>>>>>  This mess is probably the cause of the bug you have observed.  Can
> >>>>>>  we please fix it up somehow?  It *seems* that the design (lol) of
> >>>>>>  this function is "for errors, return immediately.  For success, goto
> >>>>>>  done".  In which case "done" should have been called "success".  But
> >>>>>>  if you just look at the function you'll see that this approach didn't
> >>>>>>  work.  I suggest it be converted to have two return points - one for
> >>>>>>  the success path, one for the failure path.  Or something.
> >>>>>>
> >>>>>> - soft_offline_huge_page() is a miniature copy of soft_offline_page()
> >>>>>>  and might suffer the same bug.
> >>>>>>
> >>>>>> - A cleaner, shorter and possibly faster implementation is
> >>>>>>
> >>>>>> 	if (!TestSetPageHWPoison(page))
> >>>>>> 		atomic_long_add(1, &mce_bad_pages);
> >>>>>>
> >>>>>
> >>>>> Hi Andrew,
> >>>>>
> >>>>> Since hwpoison bit for free buddy page has already be set in get_any_page, 
> >>>>> !TestSetPageHWPoison(page) will not increase mce_bad_pages count even for 
> >>>>> the first time.
> >>>>>
> >>>>> Regards,
> >>>>> Wanpeng Li
> >>>>>
> >>>>
> >>>> The poisoned page is isolated in bad_page(), I wonder whether it could be isolated
> >>>> immediately in soft_offline_page() and memory_failure()?
> >>>>
> >>>> buffered_rmqueue()
> >>>> 	prep_new_page()
> >>>> 		check_new_page()
> >>>> 			bad_page()
> >>>
> >>> Do you mean else if(is_free_buddy_page(p)) branch is redundancy?
> >>>
> >>
> >> Hi Simon,
> >>
> >> get_any_page() -> "else if(is_free_buddy_page(p))" branch is *not* redundancy.
> >>
> >> It is another topic, I mean since the page is poisoned, so why not isolate it
> > 
> > What topic? I still can't figure out when this branch can be executed
> > since hwpoison inject path can't poison free buddy pages. 
> > 
> 
> Hi Simon,
> 
> If we use "/sys/devices/system/memory/soft_offline_page" to offline a
> free page, the value of mce_bad_pages will be added. Then the page is marked
> HWPoison, but it is still managed by page buddy alocator.
> 
> So if we offline it again, the value of mce_bad_pages will be added again.
> Assume the page is not allocated during this short time.
> 
> soft_offline_page()
> 	get_any_page()
> 		"else if (is_free_buddy_page(p))" branch return 0
> 			"goto done";
> 				"atomic_long_add(1, &mce_bad_pages);"
> 
> I think it would be better to move "if(PageHWPoison(page))" at the beginning of
> soft_offline_page(). However I don't know what do these words mean,
> "Synchronized using the page lock with memory_failure()"
> 
> >> from page buddy alocator in soft_offline_page() rather than in check_new_page().
> >>
> >> I find soft_offline_page() only migrate the page and mark HWPoison, the poisoned
> >> page is still managed by page buddy alocator.
> >>
> >>>>
> >>>> Thanks
> >>>> Xishi Qiu
> >>>>
> >>>>>> - We have atomic_long_inc().  Use it?
> >>>>>>
> >>>>>> - Why do we have a variable called "mce_bad_pages"?  MCE is an x86
> >>>>>>  concept, and this code is in mm/.  Lights are flashing, bells are
> >>>>>>  ringing and a loudspeaker is blaring "layering violation" at us!
> >>>>>>
> >>
> >>
> > 
> > 
> > 
> > .
> > 
> 
> 
> 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
  2012-12-10 11:16         ` Xishi Qiu
  2012-12-10 11:58           ` Simon Jeons
       [not found]           ` <1355140561.1821.5.camel@kernel.cn.ibm.com>
@ 2012-12-10 15:38           ` Andi Kleen
  2012-12-11  1:49             ` Simon Jeons
  2012-12-11  2:25             ` Xishi Qiu
       [not found]           ` <20121210113923.GA5579@hacker.(null)>
  3 siblings, 2 replies; 28+ messages in thread
From: Andi Kleen @ 2012-12-10 15:38 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Simon Jeons, Wanpeng Li, Andrew Morton, WuJianguo, Liujiang,
	Vyacheslav.Dubeyko, Borislav Petkov, andi, linux-mm,
	linux-kernel, wency

> It is another topic, I mean since the page is poisoned, so why not isolate it
> from page buddy alocator in soft_offline_page() rather than in check_new_page().
> I find soft_offline_page() only migrate the page and mark HWPoison, the poisoned
> page is still managed by page buddy alocator.

Doing it in check_new_page is the only way if the page is currently
allocated by someone. Since that's not uncommon it's simplest to always
do it this way.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
       [not found]           ` <20121210113923.GA5579@hacker.(null)>
  2012-12-10 11:54             ` Xishi Qiu
@ 2012-12-10 15:39             ` Andi Kleen
  1 sibling, 0 replies; 28+ messages in thread
From: Andi Kleen @ 2012-12-10 15:39 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Xishi Qiu, Simon Jeons, Andrew Morton, WuJianguo, Liujiang,
	Vyacheslav.Dubeyko, Borislav Petkov, andi, linux-mm,
	linux-kernel, wency

> HWPoison delays any action on buddy allocator pages, handling can be safely postponed 
> until a later time when the page might be referenced. By delaying, some transient errors 
> may not reoccur or may be irrelevant.

That's not true for soft offlining, only for hard.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
  2012-12-10 15:38           ` Andi Kleen
@ 2012-12-11  1:49             ` Simon Jeons
  2012-12-11  2:03               ` Andi Kleen
  2012-12-11  2:25             ` Xishi Qiu
  1 sibling, 1 reply; 28+ messages in thread
From: Simon Jeons @ 2012-12-11  1:49 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Xishi Qiu, Wanpeng Li, Andrew Morton, WuJianguo, Liujiang,
	Vyacheslav.Dubeyko, Borislav Petkov, linux-mm, linux-kernel,
	wency

On Mon, 2012-12-10 at 16:38 +0100, Andi Kleen wrote:
> > It is another topic, I mean since the page is poisoned, so why not isolate it
> > from page buddy alocator in soft_offline_page() rather than in check_new_page().
> > I find soft_offline_page() only migrate the page and mark HWPoison, the poisoned
> > page is still managed by page buddy alocator.
> 
> Doing it in check_new_page is the only way if the page is currently
> allocated by someone. Since that's not uncommon it's simplest to always
> do it this way.

Hi Andi,

IIUC, soft offlining will isolate and migrate hwpoisoned page, and this
page will not be accessed by memory management subsystem until unpoison,
correct?

                             -Simon

> 
> -Andi
> 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
  2012-12-11  1:49             ` Simon Jeons
@ 2012-12-11  2:03               ` Andi Kleen
  2012-12-11  2:14                 ` Simon Jeons
  0 siblings, 1 reply; 28+ messages in thread
From: Andi Kleen @ 2012-12-11  2:03 UTC (permalink / raw)
  To: Simon Jeons
  Cc: Andi Kleen, Xishi Qiu, Wanpeng Li, Andrew Morton, WuJianguo,
	Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, linux-mm,
	linux-kernel, wency

> IIUC, soft offlining will isolate and migrate hwpoisoned page, and this
> page will not be accessed by memory management subsystem until unpoison,
> correct?

No, soft offlining can still allow accesses for some time. It'll never kill
anything.

Hard tries much harder and will kill.

In some cases (unshrinkable kernel allocation) they end up doing the same
because there isn't any other alternative though. However these are
expected to only apply to a small percentage of pages in a typical
system.

-Andi

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
  2012-12-11  2:03               ` Andi Kleen
@ 2012-12-11  2:14                 ` Simon Jeons
  2012-12-11  3:01                   ` Andi Kleen
  0 siblings, 1 reply; 28+ messages in thread
From: Simon Jeons @ 2012-12-11  2:14 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Xishi Qiu, Wanpeng Li, Andrew Morton, WuJianguo, Liujiang,
	Vyacheslav.Dubeyko, Borislav Petkov, linux-mm, linux-kernel,
	wency

On Tue, 2012-12-11 at 03:03 +0100, Andi Kleen wrote:
> > IIUC, soft offlining will isolate and migrate hwpoisoned page, and this
> > page will not be accessed by memory management subsystem until unpoison,
> > correct?
> 
> No, soft offlining can still allow accesses for some time. It'll never kill
> anything.

Oh, it will be putback to lru list during migration. So does your "some
time" mean before call check_new_page?

	               -Simon 

> 
> Hard tries much harder and will kill.
> 
> In some cases (unshrinkable kernel allocation) they end up doing the same
> because there isn't any other alternative though. However these are
> expected to only apply to a small percentage of pages in a typical
> system.
> 
> -Andi



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
  2012-12-10 15:38           ` Andi Kleen
  2012-12-11  1:49             ` Simon Jeons
@ 2012-12-11  2:25             ` Xishi Qiu
  2012-12-11  2:45               ` Fengguang Wu
  1 sibling, 1 reply; 28+ messages in thread
From: Xishi Qiu @ 2012-12-11  2:25 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Simon Jeons, Wanpeng Li, Andrew Morton, WuJianguo, Liujiang,
	Vyacheslav.Dubeyko, Borislav Petkov, linux-mm, linux-kernel,
	wency, fengguang.wu, Hanjun Guo

On 2012/12/10 23:38, Andi Kleen wrote:

>> It is another topic, I mean since the page is poisoned, so why not isolate it
>> from page buddy alocator in soft_offline_page() rather than in check_new_page().
>> I find soft_offline_page() only migrate the page and mark HWPoison, the poisoned
>> page is still managed by page buddy alocator.
> 
> Doing it in check_new_page is the only way if the page is currently
> allocated by someone. Since that's not uncommon it's simplest to always
> do it this way.
> 
> -Andi
> 

Hi Andi,

The poisoned page is isolated in check_new_page, however the whole buddy block will
be dropped, it seems to be a waste of memory.

Can we separate the poisoned page from the buddy block, then *only* drop the poisoned
page?

Thanks
Xishi Qiu



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
  2012-12-11  2:25             ` Xishi Qiu
@ 2012-12-11  2:45               ` Fengguang Wu
  2012-12-11  2:58                 ` Andi Kleen
  0 siblings, 1 reply; 28+ messages in thread
From: Fengguang Wu @ 2012-12-11  2:45 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Andi Kleen, Simon Jeons, Wanpeng Li, Andrew Morton, WuJianguo,
	Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, linux-mm,
	linux-kernel, wency, Hanjun Guo

On Tue, Dec 11, 2012 at 10:25:00AM +0800, Xishi Qiu wrote:
> On 2012/12/10 23:38, Andi Kleen wrote:
> 
> >> It is another topic, I mean since the page is poisoned, so why not isolate it
> >> from page buddy alocator in soft_offline_page() rather than in check_new_page().
> >> I find soft_offline_page() only migrate the page and mark HWPoison, the poisoned
> >> page is still managed by page buddy alocator.
> > 
> > Doing it in check_new_page is the only way if the page is currently
> > allocated by someone. Since that's not uncommon it's simplest to always
> > do it this way.
> > 
> > -Andi
> > 
> 
> Hi Andi,
> 
> The poisoned page is isolated in check_new_page, however the whole buddy block will
> be dropped, it seems to be a waste of memory.
> 
> Can we separate the poisoned page from the buddy block, then *only* drop the poisoned
> page?

That sounds like overkill. There are not so many free pages in a
typical server system.

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
  2012-12-11  2:45               ` Fengguang Wu
@ 2012-12-11  2:58                 ` Andi Kleen
  2012-12-11  3:25                   ` Xishi Qiu
  0 siblings, 1 reply; 28+ messages in thread
From: Andi Kleen @ 2012-12-11  2:58 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: Xishi Qiu, Andi Kleen, Simon Jeons, Wanpeng Li, Andrew Morton,
	WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov,
	linux-mm, linux-kernel, wency, Hanjun Guo

> That sounds like overkill. There are not so many free pages in a
> typical server system.

As Fengguang said -- memory error handling is tricky. Lots of things
could be done in theory, but they all have a cost in testing and 
maintenance. 

In general they are only worth doing if the situation is common and
represents a significant percentage of the total pages of a relevant server
workload.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
  2012-12-11  2:14                 ` Simon Jeons
@ 2012-12-11  3:01                   ` Andi Kleen
  2012-12-11  3:13                     ` Simon Jeons
  0 siblings, 1 reply; 28+ messages in thread
From: Andi Kleen @ 2012-12-11  3:01 UTC (permalink / raw)
  To: Simon Jeons
  Cc: Andi Kleen, Xishi Qiu, Wanpeng Li, Andrew Morton, WuJianguo,
	Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, linux-mm,
	linux-kernel, wency

> Oh, it will be putback to lru list during migration. So does your "some
> time" mean before call check_new_page?

Yes until the next check_new_page() whenever that is. If the migration
works it will be earlier, otherwise later.

-andi

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
  2012-12-11  3:01                   ` Andi Kleen
@ 2012-12-11  3:13                     ` Simon Jeons
  2012-12-11  3:19                       ` Andi Kleen
  0 siblings, 1 reply; 28+ messages in thread
From: Simon Jeons @ 2012-12-11  3:13 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Xishi Qiu, Wanpeng Li, Andrew Morton, WuJianguo, Liujiang,
	Vyacheslav.Dubeyko, Borislav Petkov, linux-mm, linux-kernel,
	wency

On Tue, 2012-12-11 at 04:01 +0100, Andi Kleen wrote:
> > Oh, it will be putback to lru list during migration. So does your "some
> > time" mean before call check_new_page?
> 
> Yes until the next check_new_page() whenever that is. If the migration
> works it will be earlier, otherwise later.

But I can't figure out any page reclaim path check if the page is set
PG_hwpoison, can poisoned pages be rclaimed?

			-Simon

> 
> -andi



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
  2012-12-11  3:13                     ` Simon Jeons
@ 2012-12-11  3:19                       ` Andi Kleen
  2012-12-11  3:48                         ` Simon Jeons
  0 siblings, 1 reply; 28+ messages in thread
From: Andi Kleen @ 2012-12-11  3:19 UTC (permalink / raw)
  To: Simon Jeons
  Cc: Andi Kleen, Xishi Qiu, Wanpeng Li, Andrew Morton, WuJianguo,
	Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, linux-mm,
	linux-kernel, wency

On Mon, Dec 10, 2012 at 09:13:11PM -0600, Simon Jeons wrote:
> On Tue, 2012-12-11 at 04:01 +0100, Andi Kleen wrote:
> > > Oh, it will be putback to lru list during migration. So does your "some
> > > time" mean before call check_new_page?
> > 
> > Yes until the next check_new_page() whenever that is. If the migration
> > works it will be earlier, otherwise later.
> 
> But I can't figure out any page reclaim path check if the page is set
> PG_hwpoison, can poisoned pages be rclaimed?

The only way to reclaim a page is to free and reallocate it.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
  2012-12-11  2:58                 ` Andi Kleen
@ 2012-12-11  3:25                   ` Xishi Qiu
  2012-12-11  3:36                     ` Andi Kleen
  0 siblings, 1 reply; 28+ messages in thread
From: Xishi Qiu @ 2012-12-11  3:25 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Fengguang Wu, Simon Jeons, Wanpeng Li, Andrew Morton, WuJianguo,
	Liujiang, Vyacheslav.Dubeyko, Borislav Petkov, linux-mm,
	linux-kernel, wency, Hanjun Guo

On 2012/12/11 10:58, Andi Kleen wrote:

>> That sounds like overkill. There are not so many free pages in a
>> typical server system.
> 
> As Fengguang said -- memory error handling is tricky. Lots of things
> could be done in theory, but they all have a cost in testing and 
> maintenance. 
> 
> In general they are only worth doing if the situation is common and
> represents a significant percentage of the total pages of a relevant server
> workload.
> 
> -Andi
> 

Hi Andi and Fengguang,

"There are not so many free pages in a typical server system", sorry I don't
quite understand it.

buffered_rmqueue()
	prep_new_page()
		check_new_page()
			bad_page()

If we alloc 2^10 pages and one of them is a poisoned page, then the whole 4M
memory will be dropped.

Thanks,
Xishi Qiu


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
  2012-12-11  3:25                   ` Xishi Qiu
@ 2012-12-11  3:36                     ` Andi Kleen
  0 siblings, 0 replies; 28+ messages in thread
From: Andi Kleen @ 2012-12-11  3:36 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Andi Kleen, Fengguang Wu, Simon Jeons, Wanpeng Li, Andrew Morton,
	WuJianguo, Liujiang, Vyacheslav.Dubeyko, Borislav Petkov,
	linux-mm, linux-kernel, wency, Hanjun Guo

> "There are not so many free pages in a typical server system", sorry I don't
> quite understand it.

Linux tries to keep most memory in caches. As Linus says "free memory is
bad memory"

>
> buffered_rmqueue()
> 	prep_new_page()
> 		check_new_page()
> 			bad_page()
> 
> If we alloc 2^10 pages and one of them is a poisoned page, then the whole 4M
> memory will be dropped.

prep_new_page() is only called on whatever is allocated.
MAX_ORDER is much smaller than 2^10

If you allocate a large order page then yes the complete page is
dropped. This is today generally true in hwpoison. It would be one
possible area of improvement (probably mostly if 1GB pages become
more common than they are today)

It's usually not a problem because usually most allocations are
small order and systems have generally very few memory errors,
and even the largest MAX_ORDER pages are a small fraction of the 
total memory.

If you lose larger amounts of memory usually you quickly hit something
that HWPoison cannot handle.

-Andi

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
  2012-12-11  3:19                       ` Andi Kleen
@ 2012-12-11  3:48                         ` Simon Jeons
  2012-12-11  5:55                           ` Xishi Qiu
  0 siblings, 1 reply; 28+ messages in thread
From: Simon Jeons @ 2012-12-11  3:48 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Xishi Qiu, Wanpeng Li, Andrew Morton, WuJianguo, Liujiang,
	Vyacheslav.Dubeyko, Borislav Petkov, linux-mm, linux-kernel,
	wency

On Tue, 2012-12-11 at 04:19 +0100, Andi Kleen wrote:
> On Mon, Dec 10, 2012 at 09:13:11PM -0600, Simon Jeons wrote:
> > On Tue, 2012-12-11 at 04:01 +0100, Andi Kleen wrote:
> > > > Oh, it will be putback to lru list during migration. So does your "some
> > > > time" mean before call check_new_page?
> > > 
> > > Yes until the next check_new_page() whenever that is. If the migration
> > > works it will be earlier, otherwise later.
> > 
> > But I can't figure out any page reclaim path check if the page is set
> > PG_hwpoison, can poisoned pages be rclaimed?
> 
> The only way to reclaim a page is to free and reallocate it.

Then why there doesn't have check in reclaim path to avoid relcaim
poisoned page?

			-Simon

> 
> -Andi



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
  2012-12-11  3:48                         ` Simon Jeons
@ 2012-12-11  5:55                           ` Xishi Qiu
  0 siblings, 0 replies; 28+ messages in thread
From: Xishi Qiu @ 2012-12-11  5:55 UTC (permalink / raw)
  To: Simon Jeons
  Cc: Andi Kleen, Wanpeng Li, Andrew Morton, WuJianguo, Liujiang,
	Vyacheslav.Dubeyko, Borislav Petkov, linux-mm, linux-kernel,
	wency

On 2012/12/11 11:48, Simon Jeons wrote:

> On Tue, 2012-12-11 at 04:19 +0100, Andi Kleen wrote:
>> On Mon, Dec 10, 2012 at 09:13:11PM -0600, Simon Jeons wrote:
>>> On Tue, 2012-12-11 at 04:01 +0100, Andi Kleen wrote:
>>>>> Oh, it will be putback to lru list during migration. So does your "some
>>>>> time" mean before call check_new_page?
>>>>
>>>> Yes until the next check_new_page() whenever that is. If the migration
>>>> works it will be earlier, otherwise later.
>>>
>>> But I can't figure out any page reclaim path check if the page is set
>>> PG_hwpoison, can poisoned pages be rclaimed?
>>
>> The only way to reclaim a page is to free and reallocate it.
> 
> Then why there doesn't have check in reclaim path to avoid relcaim
> poisoned page?
> 
> 			-Simon

Hi Simon,

If the page is free, it will be set PG_hwpoison, and soft_offline_page() is done.
When the page is alocated later, check_new_page() will find the poisoned page and
isolate the whole buddy block(just drop the block).

If the page is not free, soft_offline_page() try to free it first, if this is
failed, it will migrate the page, but the page is still in LRU list after migration,
migrate_pages()
	unmap_and_move()
		if (rc != -EAGAIN) {
			...
			putback_lru_page(page);
		}
We can use lru_add_drain_all() to drain lru pagevec, at last free_hot_cold_page()
will be called, and free_pages_prepare() check the poisoned pages.
free_pages_prepare()
	free_pages_check()
		bad_page()

Is this right, Andi?

Thanks
Xishi Qiu

>>
>> -Andi
> 
> 
> 
> 




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
       [not found]                 ` <20121211011643.GA15754@hacker.(null)>
@ 2012-12-11  6:49                   ` Xishi Qiu
  0 siblings, 0 replies; 28+ messages in thread
From: Xishi Qiu @ 2012-12-11  6:49 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Simon Jeons, Andrew Morton, WuJianguo, Liujiang,
	Vyacheslav.Dubeyko, Borislav Petkov, andi, linux-mm,
	linux-kernel, wency

>>> Hi Simon,

>>>
>>> If we use "/sys/devices/system/memory/soft_offline_page" to offline a
>>> free page, the value of mce_bad_pages will be added. Then the page is marked
>>> HWPoison, but it is still managed by page buddy alocator.
>>>
>>> So if we offline it again, the value of mce_bad_pages will be added again.
>>> Assume the page is not allocated during this short time.
>>>
>>> soft_offline_page()
>>> 	get_any_page()
>>> 		"else if (is_free_buddy_page(p))" branch return 0
>>> 			"goto done";
>>> 				"atomic_long_add(1, &mce_bad_pages);"
>>>
>>> I think it would be better to move "if(PageHWPoison(page))" at the beginning of
>>> soft_offline_page(). However I don't know what do these words mean,
>>> "Synchronized using the page lock with memory_failure()"
> 
> Hi Xishi,
> 
> Unpoison will clear PG_hwpoison flag after hold page lock, memory_failure() and 
> soft_offline_page() take the lock to avoid unpoison clear the flag behind them.
> 
> Regards,
> Wanpeng Li 
> 

Hi Wanpeng,

As you mean, it is the necessary to get the page lock first when we check the
HWPoison flag every time, this is in order to avoid conflict, right?

So why not use a globe lock here? For example lock_memory_hotplug() is used in
online_pages() and offline_pages()?

Thanks,
Xishi Qiu


^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2012-12-11  6:50 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-12-07  8:48 [PATCH V2] MCE: fix an error of mce_bad_pages statistics Xishi Qiu
2012-12-07 14:33 ` Borislav Petkov
2012-12-07 22:11 ` Andrew Morton
2012-12-07 22:41   ` Borislav Petkov
2012-12-10  4:33   ` Xishi Qiu
     [not found]   ` <20121210083342.GA31670@hacker.(null)>
2012-12-10  9:06     ` Xishi Qiu
2012-12-10 10:47       ` Simon Jeons
2012-12-10 11:16         ` Xishi Qiu
2012-12-10 11:58           ` Simon Jeons
     [not found]           ` <1355140561.1821.5.camel@kernel.cn.ibm.com>
     [not found]             ` <50C5D844.8050707@huawei.com>
2012-12-10 12:47               ` Simon Jeons
     [not found]                 ` <20121211011643.GA15754@hacker.(null)>
2012-12-11  6:49                   ` Xishi Qiu
2012-12-10 15:38           ` Andi Kleen
2012-12-11  1:49             ` Simon Jeons
2012-12-11  2:03               ` Andi Kleen
2012-12-11  2:14                 ` Simon Jeons
2012-12-11  3:01                   ` Andi Kleen
2012-12-11  3:13                     ` Simon Jeons
2012-12-11  3:19                       ` Andi Kleen
2012-12-11  3:48                         ` Simon Jeons
2012-12-11  5:55                           ` Xishi Qiu
2012-12-11  2:25             ` Xishi Qiu
2012-12-11  2:45               ` Fengguang Wu
2012-12-11  2:58                 ` Andi Kleen
2012-12-11  3:25                   ` Xishi Qiu
2012-12-11  3:36                     ` Andi Kleen
     [not found]           ` <20121210113923.GA5579@hacker.(null)>
2012-12-10 11:54             ` Xishi Qiu
2012-12-10 12:11               ` Borislav Petkov
2012-12-10 15:39             ` Andi Kleen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).