All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/5] HWpoison: further fixes and cleanups
@ 2020-09-08  7:56 Oscar Salvador
  2020-09-08  7:56 ` [PATCH v2 1/5] mm,hwpoison: Take free pages off the buddy freelists Oscar Salvador
                   ` (5 more replies)
  0 siblings, 6 replies; 8+ messages in thread
From: Oscar Salvador @ 2020-09-08  7:56 UTC (permalink / raw)
  To: akpm
  Cc: naoya.horiguchi, mhocko, tony.luck, cai, linux-kernel, linux-mm,
	Oscar Salvador

The important bit of this patchset is patch#1, which is a fix to take off
HWPoison pages off a buddy freelist since it can lead us to having HWPoison
pages back in the game without no one noticing it.
So fix it (we did that already for soft_offline_page [1]).

The other patches are clean-ups and not that important, so if anything,
consider patch#1 for inclusion.

[1] https://patchwork.kernel.org/cover/11704083/

Thanks

Oscar Salvador (5):
  mm,hwpoison: Take free pages off the buddy freelists
  mm,hwpoison: Refactor madvise_inject_error
  mm,hwpoison: Drain pcplists before bailing out for non-buddy
    zero-refcount page
  mm,hwpoison: Drop unneeded pcplist draining
  mm,hwpoison: Remove stale code

 mm/madvise.c        | 34 ++++++++++++------------------
 mm/memory-failure.c | 50 +++++++++++++++++++++++++++++++++------------
 2 files changed, 50 insertions(+), 34 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 1/5] mm,hwpoison: Take free pages off the buddy freelists
  2020-09-08  7:56 [PATCH v2 0/5] HWpoison: further fixes and cleanups Oscar Salvador
@ 2020-09-08  7:56 ` Oscar Salvador
  2020-09-11  2:37   ` HORIGUCHI NAOYA(堀口 直也)
  2020-09-08  7:56 ` [PATCH v2 2/5] mm,hwpoison: Refactor madvise_inject_error Oscar Salvador
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 8+ messages in thread
From: Oscar Salvador @ 2020-09-08  7:56 UTC (permalink / raw)
  To: akpm
  Cc: naoya.horiguchi, mhocko, tony.luck, cai, linux-kernel, linux-mm,
	Oscar Salvador

The crux of the matter is that historically we left poisoned pages
in the buddy system because we have some checks in place when
allocating a page that a gatekeeper for poisoned pages.
Unfortunately, we do have other users (e.g: compaction [1]) that scan
buddy freelists and try to get a page from there without checking
whether the page is HWPoison.

As I stated already, I think it is fundamentally wrong to keep
HWPoison pages within the buddy systems, checks in place or not.

Let us fix this we same way we did for soft_offline [2], and take
the page off the buddy freelist, so it is completely unreachable.

Note that this is fairly simple to trigger, as we only need
to poison free buddy pages (madvise MADV_HWPOISON) and then we need
to run some sort of memory stress system.

Just for a matter of reference, I put a dump_page in compaction_alloc
to trigger for HWPoison patches:

kernel: page:0000000012b2982b refcount:1 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x1d5db
kernel: flags: 0xfffffc0800000(hwpoison)
kernel: raw: 000fffffc0800000 ffffea00007573c8 ffffc90000857de0 0000000000000000
kernel: raw: 0000000000000001 0000000000000000 00000001ffffffff 0000000000000000
kernel: page dumped because: compaction_alloc

kernel: CPU: 4 PID: 123 Comm: kcompactd0 Tainted: G            E     5.9.0-rc2-mm1-1-default+ #5
kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
kernel: Call Trace:
kernel:  dump_stack+0x6d/0x8b
kernel:  compaction_alloc+0xb2/0xc0
kernel:  migrate_pages+0x2a6/0x12a0
kernel:  ? isolate_freepages+0xc80/0xc80
kernel:  ? __ClearPageMovable+0xb0/0xb0
kernel:  compact_zone+0x5eb/0x11c0
kernel:  ? finish_task_switch+0x74/0x300
kernel:  ? lock_timer_base+0xa8/0x170
kernel:  proactive_compact_node+0x89/0xf0
kernel:  ? kcompactd+0x2d0/0x3a0
kernel:  kcompactd+0x2d0/0x3a0
kernel:  ? finish_wait+0x80/0x80
kernel:  ? kcompactd_do_work+0x350/0x350
kernel:  kthread+0x118/0x130
kernel:  ? kthread_associate_blkcg+0xa0/0xa0
kernel:  ret_from_fork+0x22/0x30

After that, if e.g: someone faults in the page, that someone will get killed
unexpectedly.

While we are at it, I also changed the action result for such cases.
I think that MF_DELAYED is a bit misleading, because in case we could
contain the page and take it off the buddy, such a page is not to be
used again unless it is unpoison, so we "fixed" the situation.

So unless I am missing something, I strongly think that we should report
MF_RECOVERED.

[1] https://lore.kernel.org/linux-mm/20190826104144.GA7849@linux/T/#u
[2] https://patchwork.kernel.org/patch/11694847/

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/memory-failure.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 696505f56910..b0ef5db45ba6 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1323,8 +1323,9 @@ int memory_failure(unsigned long pfn, int flags)
 	struct page *hpage;
 	struct page *orig_head;
 	struct dev_pagemap *pgmap;
-	int res;
 	unsigned long page_flags;
+	int res = 0;
+	bool retry = true;
 
 	if (!sysctl_memory_failure_recovery)
 		panic("Memory failure on page %lx", pfn);
@@ -1364,10 +1365,21 @@ int memory_failure(unsigned long pfn, int flags)
 	 * In fact it's dangerous to directly bump up page count from 0,
 	 * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.
 	 */
+try_again:
 	if (!(flags & MF_COUNT_INCREASED) && !get_hwpoison_page(p)) {
 		if (is_free_buddy_page(p)) {
-			action_result(pfn, MF_MSG_BUDDY, MF_DELAYED);
-			return 0;
+			if (take_page_off_buddy(p)) {
+				action_result(pfn, MF_MSG_BUDDY, MF_RECOVERED);
+				return 0;
+			} else {
+				/* We lost the race, try again */
+				if (retry) {
+					retry = false;
+					goto try_again;
+				}
+				action_result(pfn, MF_MSG_BUDDY, MF_IGNORED);
+				return -EBUSY;
+			}
 		} else {
 			action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
 			return -EBUSY;
@@ -1393,11 +1405,15 @@ int memory_failure(unsigned long pfn, int flags)
 	shake_page(p, 0);
 	/* shake_page could have turned it free. */
 	if (!PageLRU(p) && is_free_buddy_page(p)) {
+		if (!take_page_off_buddy(p))
+			res = -EBUSY;
+
 		if (flags & MF_COUNT_INCREASED)
-			action_result(pfn, MF_MSG_BUDDY, MF_DELAYED);
+			action_result(pfn, MF_MSG_BUDDY, res ? MF_IGNORED : MF_RECOVERED);
 		else
-			action_result(pfn, MF_MSG_BUDDY_2ND, MF_DELAYED);
-		return 0;
+			action_result(pfn, MF_MSG_BUDDY_2ND, res ? MF_IGNORED : MF_RECOVERED);
+
+		return res;
 	}
 
 	lock_page(p);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 2/5] mm,hwpoison: Refactor madvise_inject_error
  2020-09-08  7:56 [PATCH v2 0/5] HWpoison: further fixes and cleanups Oscar Salvador
  2020-09-08  7:56 ` [PATCH v2 1/5] mm,hwpoison: Take free pages off the buddy freelists Oscar Salvador
@ 2020-09-08  7:56 ` Oscar Salvador
  2020-09-08  7:56 ` [PATCH v2 3/5] mm,hwpoison: Drain pcplists before bailing out for non-buddy zero-refcount page Oscar Salvador
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Oscar Salvador @ 2020-09-08  7:56 UTC (permalink / raw)
  To: akpm
  Cc: naoya.horiguchi, mhocko, tony.luck, cai, linux-kernel, linux-mm,
	Oscar Salvador, Oscar Salvador

Make a proper if-else condition for {hard,soft}-offline.

Signed-off-by: Oscar Salvador <osalvador@suse.com>
---
 mm/madvise.c | 30 +++++++++++++-----------------
 1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index e32e7efbba0f..e92e06890b08 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -885,7 +885,6 @@ static long madvise_remove(struct vm_area_struct *vma,
 static int madvise_inject_error(int behavior,
 		unsigned long start, unsigned long end)
 {
-	struct page *page;
 	struct zone *zone;
 	unsigned long size;
 
@@ -895,6 +894,7 @@ static int madvise_inject_error(int behavior,
 
 	for (; start < end; start += size) {
 		unsigned long pfn;
+		struct page *page;
 		int ret;
 
 		ret = get_user_pages_fast(start, 1, 0, &page);
@@ -911,25 +911,21 @@ static int madvise_inject_error(int behavior,
 
 		if (behavior == MADV_SOFT_OFFLINE) {
 			pr_info("Soft offlining pfn %#lx at process virtual address %#lx\n",
-					pfn, start);
-
+				 pfn, start);
 			ret = soft_offline_page(pfn, MF_COUNT_INCREASED);
-			if (ret)
-				return ret;
-			continue;
+		} else {
+			pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n",
+				 pfn, start);
+			/*
+			 * Drop the page reference taken by get_user_pages_fast(). In
+			 * the absence of MF_COUNT_INCREASED the memory_failure()
+			 * routine is responsible for pinning the page to prevent it
+			 * from being released back to the page allocator.
+			 */
+			put_page(page);
+			ret = memory_failure(pfn, 0);
 		}
 
-		pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n",
-				pfn, start);
-
-		/*
-		 * Drop the page reference taken by get_user_pages_fast(). In
-		 * the absence of MF_COUNT_INCREASED the memory_failure()
-		 * routine is responsible for pinning the page to prevent it
-		 * from being released back to the page allocator.
-		 */
-		put_page(page);
-		ret = memory_failure(pfn, 0);
 		if (ret)
 			return ret;
 	}
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 3/5] mm,hwpoison: Drain pcplists before bailing out for non-buddy zero-refcount page
  2020-09-08  7:56 [PATCH v2 0/5] HWpoison: further fixes and cleanups Oscar Salvador
  2020-09-08  7:56 ` [PATCH v2 1/5] mm,hwpoison: Take free pages off the buddy freelists Oscar Salvador
  2020-09-08  7:56 ` [PATCH v2 2/5] mm,hwpoison: Refactor madvise_inject_error Oscar Salvador
@ 2020-09-08  7:56 ` Oscar Salvador
  2020-09-08  7:56 ` [PATCH v2 4/5] mm,hwpoison: Drop unneeded pcplist draining Oscar Salvador
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Oscar Salvador @ 2020-09-08  7:56 UTC (permalink / raw)
  To: akpm
  Cc: naoya.horiguchi, mhocko, tony.luck, cai, linux-kernel, linux-mm,
	Oscar Salvador

A page with 0-refcount and !PageBuddy could perfectly be a pcppage.
Currently, we bail out with an error if we encounter such a page,
meaning that we do not handle pcppages neither from hard-offline
nor from soft-offline path.

Fix this by draining pcplists whenever we find this kind of page
and retry the check again.
It might be that pcplists have been spilled into the buddy allocator
and so we can handle it.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/memory-failure.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index b0ef5db45ba6..e57fc5c5117a 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -950,13 +950,13 @@ static int page_action(struct page_state *ps, struct page *p,
 }
 
 /**
- * get_hwpoison_page() - Get refcount for memory error handling:
+ * __get_hwpoison_page() - Get refcount for memory error handling:
  * @page:	raw error page (hit by memory error)
  *
  * Return: return 0 if failed to grab the refcount, otherwise true (some
  * non-zero value.)
  */
-static int get_hwpoison_page(struct page *page)
+static int __get_hwpoison_page(struct page *page)
 {
 	struct page *head = compound_head(page);
 
@@ -986,6 +986,26 @@ static int get_hwpoison_page(struct page *page)
 	return 0;
 }
 
+static int get_hwpoison_page(struct page *p)
+{
+	int ret;
+	bool drained = false;
+
+retry:
+	ret = __get_hwpoison_page(p);
+	if (!ret && !is_free_buddy_page(p) && !page_count(p) && !drained) {
+		/*
+		 * The page might be in a pcplist, so try to drain those
+		 * and see if we are lucky.
+		 */
+		drain_all_pages(page_zone(p));
+		drained = true;
+		goto retry;
+	}
+
+	return ret;
+}
+
 /*
  * Do all that is necessary to remove user space mappings. Unmap
  * the pages and send SIGBUS to the processes if the data was dirty.
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 4/5] mm,hwpoison: Drop unneeded pcplist draining
  2020-09-08  7:56 [PATCH v2 0/5] HWpoison: further fixes and cleanups Oscar Salvador
                   ` (2 preceding siblings ...)
  2020-09-08  7:56 ` [PATCH v2 3/5] mm,hwpoison: Drain pcplists before bailing out for non-buddy zero-refcount page Oscar Salvador
@ 2020-09-08  7:56 ` Oscar Salvador
  2020-09-08  7:56 ` [PATCH v2 5/5] mm,hwpoison: Remove stale code Oscar Salvador
  2020-09-10 20:07 ` [PATCH v2 0/5] HWpoison: further fixes and cleanups osalvador
  5 siblings, 0 replies; 8+ messages in thread
From: Oscar Salvador @ 2020-09-08  7:56 UTC (permalink / raw)
  To: akpm
  Cc: naoya.horiguchi, mhocko, tony.luck, cai, linux-kernel, linux-mm,
	Oscar Salvador

memory_failure and soft_offline_path paths now drain pcplists by calling
get_hwpoison_page.
memory_failure flags the page as HWPoison before, so that page cannot longer
go into a pcplist, and soft_offline_page only flags a page as HWPoison
if 1) we took the page off a buddy freelist 2) the page was in-use
and we migrated it 3) was a clean pagecache.

Because of that, a page cannot longer be poisoned and be in a pcplist.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/madvise.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index e92e06890b08..e7cdfabd8aca 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -930,10 +930,6 @@ static int madvise_inject_error(int behavior,
 			return ret;
 	}
 
-	/* Ensure that all poisoned pages are removed from per-cpu lists */
-	for_each_populated_zone(zone)
-		drain_all_pages(zone);
-
 	return 0;
 }
 #endif
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 5/5] mm,hwpoison: Remove stale code
  2020-09-08  7:56 [PATCH v2 0/5] HWpoison: further fixes and cleanups Oscar Salvador
                   ` (3 preceding siblings ...)
  2020-09-08  7:56 ` [PATCH v2 4/5] mm,hwpoison: Drop unneeded pcplist draining Oscar Salvador
@ 2020-09-08  7:56 ` Oscar Salvador
  2020-09-10 20:07 ` [PATCH v2 0/5] HWpoison: further fixes and cleanups osalvador
  5 siblings, 0 replies; 8+ messages in thread
From: Oscar Salvador @ 2020-09-08  7:56 UTC (permalink / raw)
  To: akpm
  Cc: naoya.horiguchi, mhocko, tony.luck, cai, linux-kernel, linux-mm,
	Oscar Salvador

Currently we call shake_page and then check whether the page is
Buddy because shake_page calls drain_all_pages, which sends pcp-pages
back to the buddy freelists, so we could have a chance to handle
free pages.

get_hwpoison_page already calls drain_all_pages, and we do call
get_hwpoison_page right before coming here, so we should be on
the safe side.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/memory-failure.c | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index e57fc5c5117a..eb72360bd78b 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1423,18 +1423,6 @@ int memory_failure(unsigned long pfn, int flags)
 	 * walked by the page reclaim code, however that's not a big loss.
 	 */
 	shake_page(p, 0);
-	/* shake_page could have turned it free. */
-	if (!PageLRU(p) && is_free_buddy_page(p)) {
-		if (!take_page_off_buddy(p))
-			res = -EBUSY;
-
-		if (flags & MF_COUNT_INCREASED)
-			action_result(pfn, MF_MSG_BUDDY, res ? MF_IGNORED : MF_RECOVERED);
-		else
-			action_result(pfn, MF_MSG_BUDDY_2ND, res ? MF_IGNORED : MF_RECOVERED);
-
-		return res;
-	}
 
 	lock_page(p);
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 0/5] HWpoison: further fixes and cleanups
  2020-09-08  7:56 [PATCH v2 0/5] HWpoison: further fixes and cleanups Oscar Salvador
                   ` (4 preceding siblings ...)
  2020-09-08  7:56 ` [PATCH v2 5/5] mm,hwpoison: Remove stale code Oscar Salvador
@ 2020-09-10 20:07 ` osalvador
  5 siblings, 0 replies; 8+ messages in thread
From: osalvador @ 2020-09-10 20:07 UTC (permalink / raw)
  To: akpm; +Cc: naoya.horiguchi, mhocko, tony.luck, cai, linux-kernel, linux-mm

On 2020-09-08 09:56, Oscar Salvador wrote:
> The important bit of this patchset is patch#1, which is a fix to take 
> off
> HWPoison pages off a buddy freelist since it can lead us to having 
> HWPoison
> pages back in the game without no one noticing it.
> So fix it (we did that already for soft_offline_page [1]).
> 
> The other patches are clean-ups and not that important, so if anything,
> consider patch#1 for inclusion.
> 
> [1] https://patchwork.kernel.org/cover/11704083/
> 
> Thanks
> 
> Oscar Salvador (5):
>   mm,hwpoison: Take free pages off the buddy freelists
>   mm,hwpoison: Refactor madvise_inject_error
>   mm,hwpoison: Drain pcplists before bailing out for non-buddy
>     zero-refcount page
>   mm,hwpoison: Drop unneeded pcplist draining
>   mm,hwpoison: Remove stale code

Naoya, any insight?

Thanks


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 1/5] mm,hwpoison: Take free pages off the buddy freelists
  2020-09-08  7:56 ` [PATCH v2 1/5] mm,hwpoison: Take free pages off the buddy freelists Oscar Salvador
@ 2020-09-11  2:37   ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 0 replies; 8+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2020-09-11  2:37 UTC (permalink / raw)
  To: Oscar Salvador; +Cc: akpm, mhocko, tony.luck, cai, linux-kernel, linux-mm

On Tue, Sep 08, 2020 at 09:56:22AM +0200, Oscar Salvador wrote:
> The crux of the matter is that historically we left poisoned pages
> in the buddy system because we have some checks in place when
> allocating a page that a gatekeeper for poisoned pages.
> Unfortunately, we do have other users (e.g: compaction [1]) that scan
> buddy freelists and try to get a page from there without checking
> whether the page is HWPoison.
> 
> As I stated already, I think it is fundamentally wrong to keep
> HWPoison pages within the buddy systems, checks in place or not.
> 
> Let us fix this we same way we did for soft_offline [2], and take
> the page off the buddy freelist, so it is completely unreachable.
> 
> Note that this is fairly simple to trigger, as we only need
> to poison free buddy pages (madvise MADV_HWPOISON) and then we need
> to run some sort of memory stress system.
> 
> Just for a matter of reference, I put a dump_page in compaction_alloc
> to trigger for HWPoison patches:
> 
> kernel: page:0000000012b2982b refcount:1 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x1d5db
> kernel: flags: 0xfffffc0800000(hwpoison)
> kernel: raw: 000fffffc0800000 ffffea00007573c8 ffffc90000857de0 0000000000000000
> kernel: raw: 0000000000000001 0000000000000000 00000001ffffffff 0000000000000000
> kernel: page dumped because: compaction_alloc
> 
> kernel: CPU: 4 PID: 123 Comm: kcompactd0 Tainted: G            E     5.9.0-rc2-mm1-1-default+ #5
> kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
> kernel: Call Trace:
> kernel:  dump_stack+0x6d/0x8b
> kernel:  compaction_alloc+0xb2/0xc0
> kernel:  migrate_pages+0x2a6/0x12a0
> kernel:  ? isolate_freepages+0xc80/0xc80
> kernel:  ? __ClearPageMovable+0xb0/0xb0
> kernel:  compact_zone+0x5eb/0x11c0
> kernel:  ? finish_task_switch+0x74/0x300
> kernel:  ? lock_timer_base+0xa8/0x170
> kernel:  proactive_compact_node+0x89/0xf0
> kernel:  ? kcompactd+0x2d0/0x3a0
> kernel:  kcompactd+0x2d0/0x3a0
> kernel:  ? finish_wait+0x80/0x80
> kernel:  ? kcompactd_do_work+0x350/0x350
> kernel:  kthread+0x118/0x130
> kernel:  ? kthread_associate_blkcg+0xa0/0xa0
> kernel:  ret_from_fork+0x22/0x30
> 
> After that, if e.g: someone faults in the page, that someone will get killed
> unexpectedly.
> 
> While we are at it, I also changed the action result for such cases.
> I think that MF_DELAYED is a bit misleading, because in case we could
> contain the page and take it off the buddy, such a page is not to be
> used again unless it is unpoison, so we "fixed" the situation.

Thanks for pointing out this.

Originally (before the reported issue related to compaction is recognized),
this case is categorized as MF_DELAYED because the error page is to be
removed from buddy later when page allocator finds it in subsequent
allocation time (not in memory_failure() time).

But this patch changes the situation, and the removal from buddy is done
immediately in memory_failure(), so ...

> 
> So unless I am missing something, I strongly think that we should report
> MF_RECOVERED.

this change looks correct to me.

> 
> [1] https://lore.kernel.org/linux-mm/20190826104144.GA7849@linux/T/#u
> [2] https://patchwork.kernel.org/patch/11694847/
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
>  mm/memory-failure.c | 28 ++++++++++++++++++++++------
>  1 file changed, 22 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 696505f56910..b0ef5db45ba6 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1323,8 +1323,9 @@ int memory_failure(unsigned long pfn, int flags)
>  	struct page *hpage;
>  	struct page *orig_head;
>  	struct dev_pagemap *pgmap;
> -	int res;
>  	unsigned long page_flags;
> +	int res = 0;
> +	bool retry = true;
>  
>  	if (!sysctl_memory_failure_recovery)
>  		panic("Memory failure on page %lx", pfn);
> @@ -1364,10 +1365,21 @@ int memory_failure(unsigned long pfn, int flags)
>  	 * In fact it's dangerous to directly bump up page count from 0,
>  	 * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.
>  	 */
> +try_again:
>  	if (!(flags & MF_COUNT_INCREASED) && !get_hwpoison_page(p)) {
>  		if (is_free_buddy_page(p)) {
> -			action_result(pfn, MF_MSG_BUDDY, MF_DELAYED);
> -			return 0;
> +			if (take_page_off_buddy(p)) {
> +				action_result(pfn, MF_MSG_BUDDY, MF_RECOVERED);
> +				return 0;
> +			} else {
> +				/* We lost the race, try again */
> +				if (retry) {
> +					retry = false;
> +					goto try_again;
> +				}
> +				action_result(pfn, MF_MSG_BUDDY, MF_IGNORED);

According to include/linux/mm.h MF_IGNORED means "Error: cannot be handled",

    /*
     * Error handlers for various types of pages.
     */
    enum mf_result {
            MF_IGNORED,     /* Error: cannot be handled */
            MF_FAILED,      /* Error: handling failed */
            MF_DELAYED,     /* Will be handled later */
            MF_RECOVERED,   /* Successfully recovered */
    };

This case is an occasional failure, so maybe MF_FAILED seems better to me.

> +				return -EBUSY;
> +			}
>  		} else {
>  			action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
>  			return -EBUSY;
> @@ -1393,11 +1405,15 @@ int memory_failure(unsigned long pfn, int flags)
>  	shake_page(p, 0);
>  	/* shake_page could have turned it free. */
>  	if (!PageLRU(p) && is_free_buddy_page(p)) {
> +		if (!take_page_off_buddy(p))
> +			res = -EBUSY;
> +
>  		if (flags & MF_COUNT_INCREASED)
> -			action_result(pfn, MF_MSG_BUDDY, MF_DELAYED);
> +			action_result(pfn, MF_MSG_BUDDY, res ? MF_IGNORED : MF_RECOVERED);
>  		else
> -			action_result(pfn, MF_MSG_BUDDY_2ND, MF_DELAYED);
> -		return 0;
> +			action_result(pfn, MF_MSG_BUDDY_2ND, res ? MF_IGNORED : MF_RECOVERED);
> +
> +		return res;

Althouth this code will be removed in 5/5, you can use MF_FAILED in failure case too.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-09-11  2:38 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-08  7:56 [PATCH v2 0/5] HWpoison: further fixes and cleanups Oscar Salvador
2020-09-08  7:56 ` [PATCH v2 1/5] mm,hwpoison: Take free pages off the buddy freelists Oscar Salvador
2020-09-11  2:37   ` HORIGUCHI NAOYA(堀口 直也)
2020-09-08  7:56 ` [PATCH v2 2/5] mm,hwpoison: Refactor madvise_inject_error Oscar Salvador
2020-09-08  7:56 ` [PATCH v2 3/5] mm,hwpoison: Drain pcplists before bailing out for non-buddy zero-refcount page Oscar Salvador
2020-09-08  7:56 ` [PATCH v2 4/5] mm,hwpoison: Drop unneeded pcplist draining Oscar Salvador
2020-09-08  7:56 ` [PATCH v2 5/5] mm,hwpoison: Remove stale code Oscar Salvador
2020-09-10 20:07 ` [PATCH v2 0/5] HWpoison: further fixes and cleanups osalvador

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.