linux-kernel.vger.kernel.org archive mirror
* [RESEND PATCH v2 1/4] mm/hwpoison: fix traverse hugetlbfs page to avoid printk flood
@ 2013-09-14 23:53 Wanpeng Li
  2013-09-14 23:53 ` [RESEND PATCH v2 2/4] mm/hwpoison: fix miss catch transparent huge page Wanpeng Li
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Wanpeng Li @ 2013-09-14 23:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andi Kleen, Fengguang Wu, Naoya Horiguchi, Tony Luck, gong.chen,
	linux-mm, linux-kernel, Wanpeng Li

madvise_hwpoison() does not check whether a page is a small page or a huge page;
it walks the range unconditionally at small-page granularity. For a huge page
this produces a flood of "MCE xxx: already hardware poisoned" messages. Fix this
by advancing the iterator by PAGE_SIZE << compound_order(compound_head(page)),
so that a huge page is visited only once.
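
To illustrate the difference in iteration count, here is a minimal userspace
sketch (not kernel code). The helper count_visits() and the constant
BASE_PAGE_SIZE are hypothetical names used only for illustration, and order 9
assumes 2 MiB huge pages built from 4 KiB base pages:

```c
#include <assert.h>

#define BASE_PAGE_SIZE 4096UL	/* assumed small page size */

/*
 * Count how many addresses a loop over [start, end) visits when each
 * step advances by BASE_PAGE_SIZE << order. Order 0 models the old
 * small-page stride; order 9 models a 2 MiB huge page stride.
 */
static unsigned long count_visits(unsigned long start, unsigned long end,
				  unsigned int order)
{
	unsigned long visits = 0;

	for (; start < end; start += BASE_PAGE_SIZE << order)
		visits++;
	return visits;
}
```

For a 6 MiB range (three 2 MiB huge pages), an order-0 stride visits 1536
addresses, while an order-9 stride visits 3, one per huge page.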

Testcase:

#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <errno.h>

#define PAGES_TO_TEST 3
/* Huge page size: 512 small pages of 4 KiB each (2 MiB). */
#define PAGE_SIZE	(4096 * 512)

int main(void)
{
	char *mem;

	/* Anonymous mappings should pass fd -1 for portability. */
	mem = mmap(NULL, PAGES_TO_TEST * PAGE_SIZE,
			PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (mem == MAP_FAILED)
		return -1;

	if (madvise(mem, PAGES_TO_TEST * PAGE_SIZE, MADV_HWPOISON) == -1)
		return -1;

	munmap(mem, PAGES_TO_TEST * PAGE_SIZE);

	return 0;
}

Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
---
 mm/madvise.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 6975bc8..539eeb9 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -343,10 +343,11 @@ static long madvise_remove(struct vm_area_struct *vma,
  */
 static int madvise_hwpoison(int bhv, unsigned long start, unsigned long end)
 {
+	struct page *p;
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
-	for (; start < end; start += PAGE_SIZE) {
-		struct page *p;
+	for (; start < end; start += PAGE_SIZE <<
+				compound_order(compound_head(p))) {
 		int ret;
 
 		ret = get_user_pages_fast(start, 1, 0, &p);
-- 
1.8.1.2



* [RESEND PATCH v2 2/4] mm/hwpoison: fix miss catch transparent huge page
  2013-09-14 23:53 [RESEND PATCH v2 1/4] mm/hwpoison: fix traverse hugetlbfs page to avoid printk flood Wanpeng Li
@ 2013-09-14 23:53 ` Wanpeng Li
  2013-09-15  0:14   ` Andi Kleen
  2013-09-14 23:53 ` [RESEND PATCH v2 3/4] mm/hwpoison: fix false report 2nd try page recovery Wanpeng Li
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Wanpeng Li @ 2013-09-14 23:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andi Kleen, Fengguang Wu, Naoya Horiguchi, Tony Luck, gong.chen,
	linux-mm, linux-kernel, Wanpeng Li

Changelog:
 *v1 -> v2: reverse PageTransHuge(page) && !PageHuge(page) check 

PageTransHuge() alone cannot guarantee that a page is a transparent huge page,
since it returns true for both transparent huge pages and hugetlbfs pages. Fix
this by also checking that the page is not a hugetlbfs page.
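
The combined check can be sketched in userspace as follows. This is a
simplified model, not kernel code: struct model_page and the model_*()
helpers are hypothetical stand-ins, and the model of PageTransHuge() only
encodes the behaviour described above (true for any huge page head).

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for struct page; both fields are illustrative. */
struct model_page {
	bool compound_head;	/* head page of any compound (huge) page */
	bool hugetlb;		/* page belongs to hugetlbfs */
};

/* Models PageTransHuge() as described: true for any huge page head. */
static bool model_page_trans_huge(const struct model_page *p)
{
	return p->compound_head;
}

/* Models PageHuge(): true only for hugetlbfs pages. */
static bool model_page_huge(const struct model_page *p)
{
	return p->hugetlb;
}

/* The patched condition: a huge page that is not a hugetlbfs page. */
static bool is_transparent_huge(const struct model_page *p)
{
	return !model_page_huge(p) && model_page_trans_huge(p);
}
```

In this model a hugetlbfs head page passes model_page_trans_huge() but is
rejected by is_transparent_huge(), which is the case the patch addresses.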

Before patch:

[  121.571128] Injecting memory failure at pfn 23a200
[  121.571141] MCE 0x23a200: huge page recovery: Delayed
[  140.355100] MCE: Memory failure is now running on 0x23a200

After patch:

[   94.290793] Injecting memory failure at pfn 23a000
[   94.290800] MCE 0x23a000: huge page recovery: Delayed
[  105.722303] MCE: Software-unpoisoned page 0x23a000

Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
---
 mm/memory-failure.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index e28ee77..b114570 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1349,7 +1349,7 @@ int unpoison_memory(unsigned long pfn)
 	 * worked by memory_failure() and the page lock is not held yet.
 	 * In such case, we yield to memory_failure() and make unpoison fail.
 	 */
-	if (PageTransHuge(page)) {
+	if (!PageHuge(page) && PageTransHuge(page)) {
 		pr_info("MCE: Memory failure is now running on %#lx\n", pfn);
 			return 0;
 	}
-- 
1.8.1.2



* [RESEND PATCH v2 3/4] mm/hwpoison: fix false report 2nd try page recovery
  2013-09-14 23:53 [RESEND PATCH v2 1/4] mm/hwpoison: fix traverse hugetlbfs page to avoid printk flood Wanpeng Li
  2013-09-14 23:53 ` [RESEND PATCH v2 2/4] mm/hwpoison: fix miss catch transparent huge page Wanpeng Li
@ 2013-09-14 23:53 ` Wanpeng Li
  2013-09-15  0:15   ` Andi Kleen
  2013-09-14 23:53 ` [RESEND PATCH v2 4/4] mm/hwpoison: fix the lack of one reference count against poisoned page Wanpeng Li
  2013-09-15  0:13 ` [RESEND PATCH v2 1/4] mm/hwpoison: fix traverse hugetlbfs page to avoid printk flood Andi Kleen
  3 siblings, 1 reply; 12+ messages in thread
From: Wanpeng Li @ 2013-09-14 23:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andi Kleen, Fengguang Wu, Naoya Horiguchi, Tony Luck, gong.chen,
	linux-mm, linux-kernel, Wanpeng Li

If a page is poisoned by software injection with the MF_COUNT_INCREASED flag
set, memory_failure() falsely reports a "2nd try" page recovery. Fix this by
reporting a first-try "free buddy" page recovery when MF_COUNT_INCREASED is set.
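
The message selection can be sketched as a small userspace function. The
helper free_buddy_msg() is a hypothetical name, and the flag value below is
an assumption made for this sketch rather than a definition taken from the
kernel headers:

```c
#include <assert.h>
#include <string.h>

/* Assumed flag value for this sketch only; consult the kernel headers. */
#define MODEL_MF_COUNT_INCREASED (1 << 0)

/*
 * Pick the recovery message for a free buddy page. When the caller
 * already took a reference (MF_COUNT_INCREASED), this is the first
 * attempt, so "2nd try" would be misleading.
 */
static const char *free_buddy_msg(int flags)
{
	if (flags & MODEL_MF_COUNT_INCREASED)
		return "free buddy";
	return "free buddy, 2nd try";
}
```

With the flag set the sketch returns "free buddy"; without it, the original
"free buddy, 2nd try" string is kept.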

Before patch:

[  346.332041] Injecting memory failure at pfn 200010
[  346.332189] MCE 0x200010: free buddy, 2nd try page recovery: Delayed

After patch:

[  297.742600] Injecting memory failure at pfn 200010
[  297.742941] MCE 0x200010: free buddy page recovery: Delayed

Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
---
 mm/memory-failure.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index b114570..6293164 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1114,8 +1114,10 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
 			 * shake_page could have turned it free.
 			 */
 			if (is_free_buddy_page(p)) {
-				action_result(pfn, "free buddy, 2nd try",
-						DELAYED);
+				if (flags & MF_COUNT_INCREASED)
+					action_result(pfn, "free buddy", DELAYED);
+				else
+					action_result(pfn, "free buddy, 2nd try", DELAYED);
 				return 0;
 			}
 			action_result(pfn, "non LRU", IGNORED);
-- 
1.7.5.4



* [RESEND PATCH v2 4/4] mm/hwpoison: fix the lack of one reference count against poisoned page
  2013-09-14 23:53 [RESEND PATCH v2 1/4] mm/hwpoison: fix traverse hugetlbfs page to avoid printk flood Wanpeng Li
  2013-09-14 23:53 ` [RESEND PATCH v2 2/4] mm/hwpoison: fix miss catch transparent huge page Wanpeng Li
  2013-09-14 23:53 ` [RESEND PATCH v2 3/4] mm/hwpoison: fix false report 2nd try page recovery Wanpeng Li
@ 2013-09-14 23:53 ` Wanpeng Li
  2013-09-15  0:16   ` Andi Kleen
  2013-09-15  0:13 ` [RESEND PATCH v2 1/4] mm/hwpoison: fix traverse hugetlbfs page to avoid printk flood Andi Kleen
  3 siblings, 1 reply; 12+ messages in thread
From: Wanpeng Li @ 2013-09-14 23:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andi Kleen, Fengguang Wu, Naoya Horiguchi, Tony Luck, gong.chen,
	linux-mm, linux-kernel, Wanpeng Li

hwpoison_inject does not take a reference on the poisoned page when
hwpoison_filter is disabled. As a result, hwpoison reports that -1 users still
reference the page, whereas the count should be 0 after a successful unmap, not
counting the reference held by the poison handler. Fix this by holding one
reference on the poisoned page in hwpoison_inject whether or not
hwpoison_filter is enabled.
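
The -1 can be reproduced with a small userspace model. It assumes, as a
simplification, that the handler reports the page's reference count minus the
one reference it expects to have been taken on its behalf; remaining_users()
and its parameter are hypothetical names, not kernel API.

```c
#include <assert.h>

/*
 * Users still referencing the page after unmap, excluding the one
 * reference the poison handler expects the injector to have taken.
 */
static int remaining_users(int page_count_after_unmap)
{
	return page_count_after_unmap - 1;
}
```

Without the injector's reference the count after unmap is 0 and the model
yields -1; with the reference held it is 1 and the model yields 0.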

Before patch:

[   71.902112] Injecting memory failure at pfn 224706
[   71.902137] MCE 0x224706: dirty LRU page recovery: Failed
[   71.902138] MCE 0x224706: dirty LRU page still referenced by -1 users

After patch:

[   94.710860] Injecting memory failure at pfn 215b68
[   94.710885] MCE 0x215b68: dirty LRU page recovery: Recovered

Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
---
 mm/hwpoison-inject.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/hwpoison-inject.c b/mm/hwpoison-inject.c
index afc2daa..4c84678 100644
--- a/mm/hwpoison-inject.c
+++ b/mm/hwpoison-inject.c
@@ -20,8 +20,6 @@ static int hwpoison_inject(void *data, u64 val)
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
-	if (!hwpoison_filter_enable)
-		goto inject;
 	if (!pfn_valid(pfn))
 		return -ENXIO;
 
@@ -33,6 +31,9 @@ static int hwpoison_inject(void *data, u64 val)
 	if (!get_page_unless_zero(hpage))
 		return 0;
 
+	if (!hwpoison_filter_enable)
+		goto inject;
+
 	if (!PageLRU(p) && !PageHuge(p))
 		shake_page(p, 0);
 	/*
-- 
1.8.1.2



* Re: [RESEND PATCH v2 1/4] mm/hwpoison: fix traverse hugetlbfs page to avoid printk flood
  2013-09-14 23:53 [RESEND PATCH v2 1/4] mm/hwpoison: fix traverse hugetlbfs page to avoid printk flood Wanpeng Li
                   ` (2 preceding siblings ...)
  2013-09-14 23:53 ` [RESEND PATCH v2 4/4] mm/hwpoison: fix the lack of one reference count against poisoned page Wanpeng Li
@ 2013-09-15  0:13 ` Andi Kleen
  2013-09-16 21:50   ` Luck, Tony
  3 siblings, 1 reply; 12+ messages in thread
From: Andi Kleen @ 2013-09-15  0:13 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Andrew Morton, Andi Kleen, Fengguang Wu, Naoya Horiguchi,
	Tony Luck, gong.chen, linux-mm, linux-kernel

> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>

Acked-by: Andi Kleen <ak@linux.intel.com>


* Re: [RESEND PATCH v2 2/4] mm/hwpoison: fix miss catch transparent huge page
  2013-09-14 23:53 ` [RESEND PATCH v2 2/4] mm/hwpoison: fix miss catch transparent huge page Wanpeng Li
@ 2013-09-15  0:14   ` Andi Kleen
  0 siblings, 0 replies; 12+ messages in thread
From: Andi Kleen @ 2013-09-15  0:14 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Andrew Morton, Andi Kleen, Fengguang Wu, Naoya Horiguchi,
	Tony Luck, gong.chen, linux-mm, linux-kernel

> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>

Acked-by: Andi Kleen <ak@linux.intel.com>

-Andi


* Re: [RESEND PATCH v2 3/4] mm/hwpoison: fix false report 2nd try page recovery
  2013-09-14 23:53 ` [RESEND PATCH v2 3/4] mm/hwpoison: fix false report 2nd try page recovery Wanpeng Li
@ 2013-09-15  0:15   ` Andi Kleen
  0 siblings, 0 replies; 12+ messages in thread
From: Andi Kleen @ 2013-09-15  0:15 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Andrew Morton, Andi Kleen, Fengguang Wu, Naoya Horiguchi,
	Tony Luck, gong.chen, linux-mm, linux-kernel

> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>

Acked-by: Andi Kleen <ak@linux.intel.com>

-Andi


* Re: [RESEND PATCH v2 4/4] mm/hwpoison: fix the lack of one reference count against poisoned page
  2013-09-14 23:53 ` [RESEND PATCH v2 4/4] mm/hwpoison: fix the lack of one reference count against poisoned page Wanpeng Li
@ 2013-09-15  0:16   ` Andi Kleen
  0 siblings, 0 replies; 12+ messages in thread
From: Andi Kleen @ 2013-09-15  0:16 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Andrew Morton, Andi Kleen, Fengguang Wu, Naoya Horiguchi,
	Tony Luck, gong.chen, linux-mm, linux-kernel

> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>

Acked-by: Andi Kleen <ak@linux.intel.com>

-Andi


* RE: [RESEND PATCH v2 1/4] mm/hwpoison: fix traverse hugetlbfs page to avoid printk flood
  2013-09-15  0:13 ` [RESEND PATCH v2 1/4] mm/hwpoison: fix traverse hugetlbfs page to avoid printk flood Andi Kleen
@ 2013-09-16 21:50   ` Luck, Tony
  2013-09-16 22:09     ` Naoya Horiguchi
  0 siblings, 1 reply; 12+ messages in thread
From: Luck, Tony @ 2013-09-16 21:50 UTC (permalink / raw)
  To: Andi Kleen, Wanpeng Li
  Cc: Andrew Morton, Wu, Fengguang, Naoya Horiguchi, gong.chen,
	linux-mm, linux-kernel

This is good - but the real solution is to stop poisoning entire huge pages ... they should
be broken into 4K pages and just one 4K page should be poisoned.

Naoya Horiguchi: I thought that you were looking at this problem some months ago. Any progress?

-Tony


* Re: [RESEND PATCH v2 1/4] mm/hwpoison: fix traverse hugetlbfs page to avoid printk flood
  2013-09-16 21:50   ` Luck, Tony
@ 2013-09-16 22:09     ` Naoya Horiguchi
       [not found]       ` <20130916232345.GA3241@hacker.(null)>
  0 siblings, 1 reply; 12+ messages in thread
From: Naoya Horiguchi @ 2013-09-16 22:09 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Andi Kleen, Wanpeng Li, Andrew Morton, Wu, Fengguang, gong.chen,
	linux-mm, linux-kernel

On Mon, Sep 16, 2013 at 09:50:06PM +0000, Luck, Tony wrote:
> This is good - but the real solution is to stop poisoning entire huge pages ... they should
> be broken into 4K pages and just one 4K page should be poisoned.
> 
> Naoya Horiguchi: I thought that you were looking at this problem some months ago. Any progress?

Sorry, I have no meaningful progress on this. Splitting hugepages is not
a trivial operation, and it introduces more complexity into the hugetlbfs code.
I can't think of any use case for it other than memory failure, so I'm not
sure that it's worth doing now.

Thanks,
Naoya Horiguchi


* RE: [RESEND PATCH v2 1/4] mm/hwpoison: fix traverse hugetlbfs page to avoid printk flood
       [not found]       ` <20130916232345.GA3241@hacker.(null)>
@ 2013-09-16 23:44         ` Luck, Tony
       [not found]           ` <20130917000817.GA5996@hacker.(null)>
  0 siblings, 1 reply; 12+ messages in thread
From: Luck, Tony @ 2013-09-16 23:44 UTC (permalink / raw)
  To: Wanpeng Li, Naoya Horiguchi
  Cc: Andi Kleen, Andrew Morton, Wu, Fengguang, gong.chen, linux-mm,
	linux-kernel

>>Sorry, I have no meaningful progress on this. Splitting hugepages is not
>>a trivial operation, and it introduces more complexity into the hugetlbfs code.
>>I can't think of any use case for it other than memory failure, so I'm not
>>sure that it's worth doing now.
>
> Agreed. ;-)

Agreed that huge pages should be split - or that it is not worth splitting them?

Actually I wonder how useful huge pages still are - transparent huge pages may
give most of the benefits without having to modify applications to use them.
Plus the kernel does know how to split them when an error occurs (which I care
about more than most people).

-Tony


* RE: [RESEND PATCH v2 1/4] mm/hwpoison: fix traverse hugetlbfs page to avoid printk flood
       [not found]           ` <20130917000817.GA5996@hacker.(null)>
@ 2013-09-17 16:47             ` Luck, Tony
  0 siblings, 0 replies; 12+ messages in thread
From: Luck, Tony @ 2013-09-17 16:47 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Naoya Horiguchi, Andi Kleen, Andrew Morton, Wu, Fengguang,
	gong.chen, linux-mm, linux-kernel

> Transparent huge pages are not helpful for DB workloads, which use a lot of
> shared memory

Hmm. Perhaps they should be.  If a database allocates most[1] of the memory on a
machine to a shared memory segment - that *ought* to be a candidate for using
transparent huge pages.  Now that we have them they seem a better choice (much
more flexibility) than hugetlbfs.

-Tony

[1] I've been told that it is normal to configure over 95% of physical memory to the
shared memory region to run a particular transaction based benchmark with one
commercial data base application.


