linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 0/4] HWpoison: further fixes and cleanups
@ 2020-10-13 14:44 Oscar Salvador
  2020-10-13 14:44 ` [PATCH v5 1/4] mm,hwpoison: drain pcplists before bailing out for non-buddy zero-refcount page Oscar Salvador
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Oscar Salvador @ 2020-10-13 14:44 UTC (permalink / raw)
  To: n-horiguchi; +Cc: linux-mm, linux-kernel, Oscar Salvador

This patchset includes some more fixes and a cleanup.

Patch#2 and patch#3 are both fixes for taking a HWpoison page off a buddy
freelist, since having them there has proved to be bad (see [1] and
pathch#2's commit log).
Patch#3 does the same for hugetlb pages.

[1] https://lkml.org/lkml/2020/9/22/565

Thanks

Oscar Salvador (4):
  mm,hwpoison: drain pcplists before bailing out for non-buddy
    zero-refcount page
  mm,hwpoison: take free pages off the buddy freelists
  mm,hwpoison: take free pages off the buddy freelists for hugetlb
  mm,hwpoison: drop unneeded pcplist draining

 mm/madvise.c        |  5 ----
 mm/memory-failure.c | 70 +++++++++++++++++++++++++++++++++------------
 2 files changed, 52 insertions(+), 23 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v5 1/4] mm,hwpoison: drain pcplists before bailing out for non-buddy zero-refcount page
  2020-10-13 14:44 [PATCH v5 0/4] HWpoison: further fixes and cleanups Oscar Salvador
@ 2020-10-13 14:44 ` Oscar Salvador
  2020-11-25 16:26   ` Vlastimil Babka
  2020-10-13 14:44 ` [PATCH v5 2/4] mm,hwpoison: take free pages off the buddy freelists Oscar Salvador
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Oscar Salvador @ 2020-10-13 14:44 UTC (permalink / raw)
  To: n-horiguchi; +Cc: linux-mm, linux-kernel, Oscar Salvador, Naoya Horiguchi

A page with 0-refcount and !PageBuddy could perfectly be a pcppage.
Currently, we bail out with an error if we encounter such a page, meaning
that we do not handle pcppages neither from hard-offline nor from
soft-offline path.

Fix this by draining pcplists whenever we find this kind of page and retry
the check again.  It might be that pcplists have been spilled into the
buddy allocator and so we can handle it.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
 mm/memory-failure.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index c0bb186bba62..e2f12410c594 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -946,13 +946,13 @@ static int page_action(struct page_state *ps, struct page *p,
 }
 
 /**
- * get_hwpoison_page() - Get refcount for memory error handling:
+ * __get_hwpoison_page() - Get refcount for memory error handling:
  * @page:	raw error page (hit by memory error)
  *
  * Return: return 0 if failed to grab the refcount, otherwise true (some
  * non-zero value.)
  */
-static int get_hwpoison_page(struct page *page)
+static int __get_hwpoison_page(struct page *page)
 {
 	struct page *head = compound_head(page);
 
@@ -982,6 +982,26 @@ static int get_hwpoison_page(struct page *page)
 	return 0;
 }
 
+static int get_hwpoison_page(struct page *p)
+{
+	int ret;
+	bool drained = false;
+
+retry:
+	ret = __get_hwpoison_page(p);
+	if (!ret && !is_free_buddy_page(p) && !page_count(p) && !drained) {
+		/*
+		 * The page might be in a pcplist, so try to drain those
+		 * and see if we are lucky.
+		 */
+		drain_all_pages(page_zone(p));
+		drained = true;
+		goto retry;
+	}
+
+	return ret;
+}
+
 /*
  * Do all that is necessary to remove user space mappings. Unmap
  * the pages and send SIGBUS to the processes if the data was dirty.
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v5 2/4] mm,hwpoison: take free pages off the buddy freelists
  2020-10-13 14:44 [PATCH v5 0/4] HWpoison: further fixes and cleanups Oscar Salvador
  2020-10-13 14:44 ` [PATCH v5 1/4] mm,hwpoison: drain pcplists before bailing out for non-buddy zero-refcount page Oscar Salvador
@ 2020-10-13 14:44 ` Oscar Salvador
  2020-11-25 16:35   ` Vlastimil Babka
  2020-10-13 14:44 ` [PATCH v5 3/4] mm,hwpoison: take free pages off the buddy freelists for hugetlb Oscar Salvador
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Oscar Salvador @ 2020-10-13 14:44 UTC (permalink / raw)
  To: n-horiguchi; +Cc: linux-mm, linux-kernel, Oscar Salvador, Naoya Horiguchi

The crux of the matter is that historically we left poisoned pages in the
buddy system because we have some checks in place when allocating a page
that are gatekeeper for poisoned pages.  Unfortunately, we do have other
users (e.g: compaction [1]) that scan buddy freelists and try to get a
page from there without checking whether the page is HWPoison.

As I stated already, I think it is fundamentally wrong to keep HWPoison
pages within the buddy systems, checks in place or not.

Let us fix this the same way we did for soft_offline [2], taking the page
off the buddy freelist so it is completely unreachable.

Note that this is fairly simple to trigger, as we only need to poison free
buddy pages (madvise MADV_HWPOISON) and then run some sort of memory stress
system.

Just for a matter of reference, I put a dump_page() in compaction_alloc()
to trigger for HWPoison patches:

kernel: page:0000000012b2982b refcount:1 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x1d5db
kernel: flags: 0xfffffc0800000(hwpoison)
kernel: raw: 000fffffc0800000 ffffea00007573c8 ffffc90000857de0 0000000000000000
kernel: raw: 0000000000000001 0000000000000000 00000001ffffffff 0000000000000000
kernel: page dumped because: compaction_alloc

kernel: CPU: 4 PID: 123 Comm: kcompactd0 Tainted: G            E     5.9.0-rc2-mm1-1-default+ #5
kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
kernel: Call Trace:
kernel:  dump_stack+0x6d/0x8b
kernel:  compaction_alloc+0xb2/0xc0
kernel:  migrate_pages+0x2a6/0x12a0
kernel:  ? isolate_freepages+0xc80/0xc80
kernel:  ? __ClearPageMovable+0xb0/0xb0
kernel:  compact_zone+0x5eb/0x11c0
kernel:  ? finish_task_switch+0x74/0x300
kernel:  ? lock_timer_base+0xa8/0x170
kernel:  proactive_compact_node+0x89/0xf0
kernel:  ? kcompactd+0x2d0/0x3a0
kernel:  kcompactd+0x2d0/0x3a0
kernel:  ? finish_wait+0x80/0x80
kernel:  ? kcompactd_do_work+0x350/0x350
kernel:  kthread+0x118/0x130
kernel:  ? kthread_associate_blkcg+0xa0/0xa0
kernel:  ret_from_fork+0x22/0x30

After that, if e.g: a process faults in the page,  it will get killed
unexpectedly.
Fix it by containing the page immediatelly.

Besides that, two more changes can be noticed:

* MF_DELAYED no longer suits as we are fixing the issue by containing
  the page immediately, so it does no longer rely on the allocation-time
  checks to stop HWPoison to be handed over.
  gain unless it is unpoisoned, so we fixed the situation.
  Because of that, let us use MF_RECOVERED from now on.

* The second block that handles PageBuddy pages is no longer needed:
  We call shake_page and then check whether the page is Buddy
  because shake_page calls drain_all_pages, which sends pcp-pages back to
  the buddy freelists, so we could have a chance to handle free pages.
  Currently, get_hwpoison_page already calls drain_all_pages, and we call
  get_hwpoison_page right before coming here, so we should be on the safe
  side.

[1] https://lore.kernel.org/linux-mm/20190826104144.GA7849@linux/T/#u
[2] https://patchwork.kernel.org/cover/11792607/

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
 mm/memory-failure.c | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index e2f12410c594..181bed890c16 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1341,6 +1341,7 @@ int memory_failure(unsigned long pfn, int flags)
 	struct dev_pagemap *pgmap;
 	int res;
 	unsigned long page_flags;
+	bool retry = true;
 
 	if (!sysctl_memory_failure_recovery)
 		panic("Memory failure on page %lx", pfn);
@@ -1358,6 +1359,7 @@ int memory_failure(unsigned long pfn, int flags)
 		return -ENXIO;
 	}
 
+try_again:
 	if (PageHuge(p))
 		return memory_failure_hugetlb(pfn, flags);
 	if (TestSetPageHWPoison(p)) {
@@ -1382,8 +1384,21 @@ int memory_failure(unsigned long pfn, int flags)
 	 */
 	if (!(flags & MF_COUNT_INCREASED) && !get_hwpoison_page(p)) {
 		if (is_free_buddy_page(p)) {
-			action_result(pfn, MF_MSG_BUDDY, MF_DELAYED);
-			return 0;
+			if (take_page_off_buddy(p)) {
+				page_ref_inc(p);
+				res = MF_RECOVERED;
+			} else {
+				/* We lost the race, try again */
+				if (retry) {
+					ClearPageHWPoison(p);
+					num_poisoned_pages_dec();
+					retry = false;
+					goto try_again;
+				}
+				res = MF_FAILED;
+			}
+			action_result(pfn, MF_MSG_BUDDY, res);
+			return res == MF_RECOVERED ? 0 : -EBUSY;
 		} else {
 			action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
 			return -EBUSY;
@@ -1407,14 +1422,6 @@ int memory_failure(unsigned long pfn, int flags)
 	 * walked by the page reclaim code, however that's not a big loss.
 	 */
 	shake_page(p, 0);
-	/* shake_page could have turned it free. */
-	if (!PageLRU(p) && is_free_buddy_page(p)) {
-		if (flags & MF_COUNT_INCREASED)
-			action_result(pfn, MF_MSG_BUDDY, MF_DELAYED);
-		else
-			action_result(pfn, MF_MSG_BUDDY_2ND, MF_DELAYED);
-		return 0;
-	}
 
 	lock_page(p);
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v5 3/4] mm,hwpoison: take free pages off the buddy freelists for hugetlb
  2020-10-13 14:44 [PATCH v5 0/4] HWpoison: further fixes and cleanups Oscar Salvador
  2020-10-13 14:44 ` [PATCH v5 1/4] mm,hwpoison: drain pcplists before bailing out for non-buddy zero-refcount page Oscar Salvador
  2020-10-13 14:44 ` [PATCH v5 2/4] mm,hwpoison: take free pages off the buddy freelists Oscar Salvador
@ 2020-10-13 14:44 ` Oscar Salvador
  2020-10-14  0:05   ` HORIGUCHI NAOYA(堀口 直也)
  2020-11-25 16:36   ` Vlastimil Babka
  2020-10-13 14:44 ` [PATCH v5 4/4] mm,hwpoison: drop unneeded pcplist draining Oscar Salvador
  2020-11-10  6:21 ` [PATCH v5 0/4] HWpoison: further fixes and cleanups Oscar Salvador
  4 siblings, 2 replies; 12+ messages in thread
From: Oscar Salvador @ 2020-10-13 14:44 UTC (permalink / raw)
  To: n-horiguchi; +Cc: linux-mm, linux-kernel, Oscar Salvador

Currently, free hugetlb get dissolved, but we also need to make sure
to take the poisoned subpage off the buddy frelists, so no one stumbles
upon it (see previous patch for more information).

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/memory-failure.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 181bed890c16..30aadeca97d2 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -809,7 +809,7 @@ static int me_swapcache_clean(struct page *p, unsigned long pfn)
  */
 static int me_huge_page(struct page *p, unsigned long pfn)
 {
-	int res = 0;
+	int res;
 	struct page *hpage = compound_head(p);
 	struct address_space *mapping;
 
@@ -820,6 +820,7 @@ static int me_huge_page(struct page *p, unsigned long pfn)
 	if (mapping) {
 		res = truncate_error_page(hpage, pfn, mapping);
 	} else {
+		res = MF_FAILED;
 		unlock_page(hpage);
 		/*
 		 * migration entry prevents later access on error anonymous
@@ -828,8 +829,10 @@ static int me_huge_page(struct page *p, unsigned long pfn)
 		 */
 		if (PageAnon(hpage))
 			put_page(hpage);
-		dissolve_free_huge_page(p);
-		res = MF_RECOVERED;
+		if (!dissolve_free_huge_page(p) && take_page_off_buddy(p)) {
+			page_ref_inc(p);
+			res = MF_RECOVERED;
+		}
 		lock_page(hpage);
 	}
 
@@ -1198,9 +1201,13 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
 			}
 		}
 		unlock_page(head);
-		dissolve_free_huge_page(p);
-		action_result(pfn, MF_MSG_FREE_HUGE, MF_DELAYED);
-		return 0;
+		res = MF_FAILED;
+		if (!dissolve_free_huge_page(p) && take_page_off_buddy(p)) {
+			page_ref_inc(p);
+			res = MF_RECOVERED;
+		}
+		action_result(pfn, MF_MSG_FREE_HUGE, res);
+		return res == MF_RECOVERED ? 0 : -EBUSY;
 	}
 
 	lock_page(head);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v5 4/4] mm,hwpoison: drop unneeded pcplist draining
  2020-10-13 14:44 [PATCH v5 0/4] HWpoison: further fixes and cleanups Oscar Salvador
                   ` (2 preceding siblings ...)
  2020-10-13 14:44 ` [PATCH v5 3/4] mm,hwpoison: take free pages off the buddy freelists for hugetlb Oscar Salvador
@ 2020-10-13 14:44 ` Oscar Salvador
  2020-10-14  0:06   ` HORIGUCHI NAOYA(堀口 直也)
  2020-11-25 16:38   ` Vlastimil Babka
  2020-11-10  6:21 ` [PATCH v5 0/4] HWpoison: further fixes and cleanups Oscar Salvador
  4 siblings, 2 replies; 12+ messages in thread
From: Oscar Salvador @ 2020-10-13 14:44 UTC (permalink / raw)
  To: n-horiguchi; +Cc: linux-mm, linux-kernel, Oscar Salvador

memory_failure and soft_offline_path paths now drain pcplists by calling
get_hwpoison_page.

memory_failure flags the page as HWPoison before, so that page cannot
longer go into a pcplist, and soft_offline_page only flags a page as
HWPoison if 1) we took the page off a buddy freelist 2) the page was
in-use and we migrated it 3) was a clean pagecache.

Because of that, a page cannot longer be poisoned and be in a pcplist.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/madvise.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 416a56b8e757..c6b5524add58 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -877,7 +877,6 @@ static long madvise_remove(struct vm_area_struct *vma,
 static int madvise_inject_error(int behavior,
 		unsigned long start, unsigned long end)
 {
-	struct zone *zone;
 	unsigned long size;
 
 	if (!capable(CAP_SYS_ADMIN))
@@ -922,10 +921,6 @@ static int madvise_inject_error(int behavior,
 			return ret;
 	}
 
-	/* Ensure that all poisoned pages are removed from per-cpu lists */
-	for_each_populated_zone(zone)
-		drain_all_pages(zone);
-
 	return 0;
 }
 #endif
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 3/4] mm,hwpoison: take free pages off the buddy freelists for hugetlb
  2020-10-13 14:44 ` [PATCH v5 3/4] mm,hwpoison: take free pages off the buddy freelists for hugetlb Oscar Salvador
@ 2020-10-14  0:05   ` HORIGUCHI NAOYA(堀口 直也)
  2020-11-25 16:36   ` Vlastimil Babka
  1 sibling, 0 replies; 12+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2020-10-14  0:05 UTC (permalink / raw)
  To: Oscar Salvador; +Cc: n-horiguchi, linux-mm, linux-kernel

On Tue, Oct 13, 2020 at 04:44:46PM +0200, Oscar Salvador wrote:
> Currently, free hugetlb get dissolved, but we also need to make sure
> to take the poisoned subpage off the buddy frelists, so no one stumbles
> upon it (see previous patch for more information).
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>

Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 4/4] mm,hwpoison: drop unneeded pcplist draining
  2020-10-13 14:44 ` [PATCH v5 4/4] mm,hwpoison: drop unneeded pcplist draining Oscar Salvador
@ 2020-10-14  0:06   ` HORIGUCHI NAOYA(堀口 直也)
  2020-11-25 16:38   ` Vlastimil Babka
  1 sibling, 0 replies; 12+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2020-10-14  0:06 UTC (permalink / raw)
  To: Oscar Salvador; +Cc: n-horiguchi, linux-mm, linux-kernel

On Tue, Oct 13, 2020 at 04:44:47PM +0200, Oscar Salvador wrote:
> memory_failure and soft_offline_path paths now drain pcplists by calling
> get_hwpoison_page.
> 
> memory_failure flags the page as HWPoison before, so that page cannot
> longer go into a pcplist, and soft_offline_page only flags a page as
> HWPoison if 1) we took the page off a buddy freelist 2) the page was
> in-use and we migrated it 3) was a clean pagecache.
> 
> Because of that, a page cannot longer be poisoned and be in a pcplist.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>

Thank you!

Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 0/4] HWpoison: further fixes and cleanups
  2020-10-13 14:44 [PATCH v5 0/4] HWpoison: further fixes and cleanups Oscar Salvador
                   ` (3 preceding siblings ...)
  2020-10-13 14:44 ` [PATCH v5 4/4] mm,hwpoison: drop unneeded pcplist draining Oscar Salvador
@ 2020-11-10  6:21 ` Oscar Salvador
  4 siblings, 0 replies; 12+ messages in thread
From: Oscar Salvador @ 2020-11-10  6:21 UTC (permalink / raw)
  To: n-horiguchi; +Cc: linux-mm, linux-kernel, akpm

On Tue, Oct 13, 2020 at 04:44:43PM +0200, Oscar Salvador wrote:
> This patchset includes some more fixes and a cleanup.
> 
> Patch#2 and patch#3 are both fixes for taking a HWpoison page off a buddy
> freelist, since having them there has proved to be bad (see [1] and
> pathch#2's commit log).
> Patch#3 does the same for hugetlb pages.
> 
> [1] https://lkml.org/lkml/2020/9/22/565

Hi Andrew, I think this slipped through the cracks.
(Probably because I did not CC you, my bad)

Could you please consider taking this series? It has been already acked
by Naoya.

Thanks


-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 1/4] mm,hwpoison: drain pcplists before bailing out for non-buddy zero-refcount page
  2020-10-13 14:44 ` [PATCH v5 1/4] mm,hwpoison: drain pcplists before bailing out for non-buddy zero-refcount page Oscar Salvador
@ 2020-11-25 16:26   ` Vlastimil Babka
  0 siblings, 0 replies; 12+ messages in thread
From: Vlastimil Babka @ 2020-11-25 16:26 UTC (permalink / raw)
  To: Oscar Salvador, n-horiguchi; +Cc: linux-mm, linux-kernel, Naoya Horiguchi

On 10/13/20 4:44 PM, Oscar Salvador wrote:
> A page with 0-refcount and !PageBuddy could perfectly be a pcppage.
> Currently, we bail out with an error if we encounter such a page, meaning
> that we do not handle pcppages neither from hard-offline nor from
> soft-offline path.
> 
> Fix this by draining pcplists whenever we find this kind of page and retry
> the check again.  It might be that pcplists have been spilled into the
> buddy allocator and so we can handle it.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>   mm/memory-failure.c | 24 ++++++++++++++++++++++--
>   1 file changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index c0bb186bba62..e2f12410c594 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -946,13 +946,13 @@ static int page_action(struct page_state *ps, struct page *p,
>   }
>   
>   /**
> - * get_hwpoison_page() - Get refcount for memory error handling:
> + * __get_hwpoison_page() - Get refcount for memory error handling:
>    * @page:	raw error page (hit by memory error)
>    *
>    * Return: return 0 if failed to grab the refcount, otherwise true (some
>    * non-zero value.)
>    */
> -static int get_hwpoison_page(struct page *page)
> +static int __get_hwpoison_page(struct page *page)
>   {
>   	struct page *head = compound_head(page);
>   
> @@ -982,6 +982,26 @@ static int get_hwpoison_page(struct page *page)
>   	return 0;
>   }
>   
> +static int get_hwpoison_page(struct page *p)
> +{
> +	int ret;
> +	bool drained = false;
> +
> +retry:
> +	ret = __get_hwpoison_page(p);
> +	if (!ret && !is_free_buddy_page(p) && !page_count(p) && !drained) {
> +		/*
> +		 * The page might be in a pcplist, so try to drain those
> +		 * and see if we are lucky.
> +		 */
> +		drain_all_pages(page_zone(p));
> +		drained = true;
> +		goto retry;
> +	}
> +
> +	return ret;
> +}
> +
>   /*
>    * Do all that is necessary to remove user space mappings. Unmap
>    * the pages and send SIGBUS to the processes if the data was dirty.
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 2/4] mm,hwpoison: take free pages off the buddy freelists
  2020-10-13 14:44 ` [PATCH v5 2/4] mm,hwpoison: take free pages off the buddy freelists Oscar Salvador
@ 2020-11-25 16:35   ` Vlastimil Babka
  0 siblings, 0 replies; 12+ messages in thread
From: Vlastimil Babka @ 2020-11-25 16:35 UTC (permalink / raw)
  To: Oscar Salvador, n-horiguchi; +Cc: linux-mm, linux-kernel, Naoya Horiguchi

On 10/13/20 4:44 PM, Oscar Salvador wrote:
> The crux of the matter is that historically we left poisoned pages in the
> buddy system because we have some checks in place when allocating a page
> that are gatekeeper for poisoned pages.  Unfortunately, we do have other
> users (e.g: compaction [1]) that scan buddy freelists and try to get a
> page from there without checking whether the page is HWPoison.
> 
> As I stated already, I think it is fundamentally wrong to keep HWPoison
> pages within the buddy systems, checks in place or not.
> 
> Let us fix this the same way we did for soft_offline [2], taking the page
> off the buddy freelist so it is completely unreachable.
> 
> Note that this is fairly simple to trigger, as we only need to poison free
> buddy pages (madvise MADV_HWPOISON) and then run some sort of memory stress
> system.
> 
> Just for a matter of reference, I put a dump_page() in compaction_alloc()
> to trigger for HWPoison patches:
> 
> kernel: page:0000000012b2982b refcount:1 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x1d5db
> kernel: flags: 0xfffffc0800000(hwpoison)
> kernel: raw: 000fffffc0800000 ffffea00007573c8 ffffc90000857de0 0000000000000000
> kernel: raw: 0000000000000001 0000000000000000 00000001ffffffff 0000000000000000
> kernel: page dumped because: compaction_alloc
> 
> kernel: CPU: 4 PID: 123 Comm: kcompactd0 Tainted: G            E     5.9.0-rc2-mm1-1-default+ #5
> kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
> kernel: Call Trace:
> kernel:  dump_stack+0x6d/0x8b
> kernel:  compaction_alloc+0xb2/0xc0
> kernel:  migrate_pages+0x2a6/0x12a0
> kernel:  ? isolate_freepages+0xc80/0xc80
> kernel:  ? __ClearPageMovable+0xb0/0xb0
> kernel:  compact_zone+0x5eb/0x11c0
> kernel:  ? finish_task_switch+0x74/0x300
> kernel:  ? lock_timer_base+0xa8/0x170
> kernel:  proactive_compact_node+0x89/0xf0
> kernel:  ? kcompactd+0x2d0/0x3a0
> kernel:  kcompactd+0x2d0/0x3a0
> kernel:  ? finish_wait+0x80/0x80
> kernel:  ? kcompactd_do_work+0x350/0x350
> kernel:  kthread+0x118/0x130
> kernel:  ? kthread_associate_blkcg+0xa0/0xa0
> kernel:  ret_from_fork+0x22/0x30
> 
> After that, if e.g: a process faults in the page,  it will get killed
> unexpectedly.
> Fix it by containing the page immediatelly.
> 
> Besides that, two more changes can be noticed:
> 
> * MF_DELAYED no longer suits as we are fixing the issue by containing
>    the page immediately, so it does no longer rely on the allocation-time
>    checks to stop HWPoison to be handed over.
>    gain unless it is unpoisoned, so we fixed the situation.
>    Because of that, let us use MF_RECOVERED from now on.
> 
> * The second block that handles PageBuddy pages is no longer needed:
>    We call shake_page and then check whether the page is Buddy
>    because shake_page calls drain_all_pages, which sends pcp-pages back to
>    the buddy freelists, so we could have a chance to handle free pages.
>    Currently, get_hwpoison_page already calls drain_all_pages, and we call
>    get_hwpoison_page right before coming here, so we should be on the safe
>    side.
> 
> [1] https://lore.kernel.org/linux-mm/20190826104144.GA7849@linux/T/#u
> [2] https://patchwork.kernel.org/cover/11792607/
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

Makes a lot of sense.
Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>   mm/memory-failure.c | 27 +++++++++++++++++----------
>   1 file changed, 17 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index e2f12410c594..181bed890c16 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1341,6 +1341,7 @@ int memory_failure(unsigned long pfn, int flags)
>   	struct dev_pagemap *pgmap;
>   	int res;
>   	unsigned long page_flags;
> +	bool retry = true;
>   
>   	if (!sysctl_memory_failure_recovery)
>   		panic("Memory failure on page %lx", pfn);
> @@ -1358,6 +1359,7 @@ int memory_failure(unsigned long pfn, int flags)
>   		return -ENXIO;
>   	}
>   
> +try_again:
>   	if (PageHuge(p))
>   		return memory_failure_hugetlb(pfn, flags);
>   	if (TestSetPageHWPoison(p)) {
> @@ -1382,8 +1384,21 @@ int memory_failure(unsigned long pfn, int flags)
>   	 */
>   	if (!(flags & MF_COUNT_INCREASED) && !get_hwpoison_page(p)) {
>   		if (is_free_buddy_page(p)) {
> -			action_result(pfn, MF_MSG_BUDDY, MF_DELAYED);
> -			return 0;
> +			if (take_page_off_buddy(p)) {
> +				page_ref_inc(p);
> +				res = MF_RECOVERED;
> +			} else {
> +				/* We lost the race, try again */
> +				if (retry) {
> +					ClearPageHWPoison(p);
> +					num_poisoned_pages_dec();
> +					retry = false;
> +					goto try_again;
> +				}
> +				res = MF_FAILED;
> +			}
> +			action_result(pfn, MF_MSG_BUDDY, res);
> +			return res == MF_RECOVERED ? 0 : -EBUSY;
>   		} else {
>   			action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
>   			return -EBUSY;
> @@ -1407,14 +1422,6 @@ int memory_failure(unsigned long pfn, int flags)
>   	 * walked by the page reclaim code, however that's not a big loss.
>   	 */
>   	shake_page(p, 0);
> -	/* shake_page could have turned it free. */
> -	if (!PageLRU(p) && is_free_buddy_page(p)) {
> -		if (flags & MF_COUNT_INCREASED)
> -			action_result(pfn, MF_MSG_BUDDY, MF_DELAYED);
> -		else
> -			action_result(pfn, MF_MSG_BUDDY_2ND, MF_DELAYED);
> -		return 0;
> -	}
>   
>   	lock_page(p);
>   
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 3/4] mm,hwpoison: take free pages off the buddy freelists for hugetlb
  2020-10-13 14:44 ` [PATCH v5 3/4] mm,hwpoison: take free pages off the buddy freelists for hugetlb Oscar Salvador
  2020-10-14  0:05   ` HORIGUCHI NAOYA(堀口 直也)
@ 2020-11-25 16:36   ` Vlastimil Babka
  1 sibling, 0 replies; 12+ messages in thread
From: Vlastimil Babka @ 2020-11-25 16:36 UTC (permalink / raw)
  To: Oscar Salvador, n-horiguchi; +Cc: linux-mm, linux-kernel

On 10/13/20 4:44 PM, Oscar Salvador wrote:
> Currently, free hugetlb get dissolved, but we also need to make sure
> to take the poisoned subpage off the buddy frelists, so no one stumbles
> upon it (see previous patch for more information).
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>   mm/memory-failure.c | 19 +++++++++++++------
>   1 file changed, 13 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 181bed890c16..30aadeca97d2 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -809,7 +809,7 @@ static int me_swapcache_clean(struct page *p, unsigned long pfn)
>    */
>   static int me_huge_page(struct page *p, unsigned long pfn)
>   {
> -	int res = 0;
> +	int res;
>   	struct page *hpage = compound_head(p);
>   	struct address_space *mapping;
>   
> @@ -820,6 +820,7 @@ static int me_huge_page(struct page *p, unsigned long pfn)
>   	if (mapping) {
>   		res = truncate_error_page(hpage, pfn, mapping);
>   	} else {
> +		res = MF_FAILED;
>   		unlock_page(hpage);
>   		/*
>   		 * migration entry prevents later access on error anonymous
> @@ -828,8 +829,10 @@ static int me_huge_page(struct page *p, unsigned long pfn)
>   		 */
>   		if (PageAnon(hpage))
>   			put_page(hpage);
> -		dissolve_free_huge_page(p);
> -		res = MF_RECOVERED;
> +		if (!dissolve_free_huge_page(p) && take_page_off_buddy(p)) {
> +			page_ref_inc(p);
> +			res = MF_RECOVERED;
> +		}
>   		lock_page(hpage);
>   	}
>   
> @@ -1198,9 +1201,13 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>   			}
>   		}
>   		unlock_page(head);
> -		dissolve_free_huge_page(p);
> -		action_result(pfn, MF_MSG_FREE_HUGE, MF_DELAYED);
> -		return 0;
> +		res = MF_FAILED;
> +		if (!dissolve_free_huge_page(p) && take_page_off_buddy(p)) {
> +			page_ref_inc(p);
> +			res = MF_RECOVERED;
> +		}
> +		action_result(pfn, MF_MSG_FREE_HUGE, res);
> +		return res == MF_RECOVERED ? 0 : -EBUSY;
>   	}
>   
>   	lock_page(head);
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 4/4] mm,hwpoison: drop unneeded pcplist draining
  2020-10-13 14:44 ` [PATCH v5 4/4] mm,hwpoison: drop unneeded pcplist draining Oscar Salvador
  2020-10-14  0:06   ` HORIGUCHI NAOYA(堀口 直也)
@ 2020-11-25 16:38   ` Vlastimil Babka
  1 sibling, 0 replies; 12+ messages in thread
From: Vlastimil Babka @ 2020-11-25 16:38 UTC (permalink / raw)
  To: Oscar Salvador, n-horiguchi; +Cc: linux-mm, linux-kernel

On 10/13/20 4:44 PM, Oscar Salvador wrote:
> memory_failure and soft_offline_path paths now drain pcplists by calling
> get_hwpoison_page.
> 
> memory_failure flags the page as HWPoison before, so that page cannot
> longer go into a pcplist, and soft_offline_page only flags a page as
> HWPoison if 1) we took the page off a buddy freelist 2) the page was
> in-use and we migrated it 3) was a clean pagecache.
> 
> Because of that, a page cannot longer be poisoned and be in a pcplist.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>   mm/madvise.c | 5 -----
>   1 file changed, 5 deletions(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 416a56b8e757..c6b5524add58 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -877,7 +877,6 @@ static long madvise_remove(struct vm_area_struct *vma,
>   static int madvise_inject_error(int behavior,
>   		unsigned long start, unsigned long end)
>   {
> -	struct zone *zone;
>   	unsigned long size;
>   
>   	if (!capable(CAP_SYS_ADMIN))
> @@ -922,10 +921,6 @@ static int madvise_inject_error(int behavior,
>   			return ret;
>   	}
>   
> -	/* Ensure that all poisoned pages are removed from per-cpu lists */
> -	for_each_populated_zone(zone)
> -		drain_all_pages(zone);
> -
>   	return 0;
>   }
>   #endif
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-11-25 16:38 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-13 14:44 [PATCH v5 0/4] HWpoison: further fixes and cleanups Oscar Salvador
2020-10-13 14:44 ` [PATCH v5 1/4] mm,hwpoison: drain pcplists before bailing out for non-buddy zero-refcount page Oscar Salvador
2020-11-25 16:26   ` Vlastimil Babka
2020-10-13 14:44 ` [PATCH v5 2/4] mm,hwpoison: take free pages off the buddy freelists Oscar Salvador
2020-11-25 16:35   ` Vlastimil Babka
2020-10-13 14:44 ` [PATCH v5 3/4] mm,hwpoison: take free pages off the buddy freelists for hugetlb Oscar Salvador
2020-10-14  0:05   ` HORIGUCHI NAOYA(堀口 直也)
2020-11-25 16:36   ` Vlastimil Babka
2020-10-13 14:44 ` [PATCH v5 4/4] mm,hwpoison: drop unneeded pcplist draining Oscar Salvador
2020-10-14  0:06   ` HORIGUCHI NAOYA(堀口 直也)
2020-11-25 16:38   ` Vlastimil Babka
2020-11-10  6:21 ` [PATCH v5 0/4] HWpoison: further fixes and cleanups Oscar Salvador

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).