Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v3 00/15] HWPOISON: soft offline rework
@ 2020-06-24 15:01 nao.horiguchi
  2020-06-24 15:01 ` [PATCH v3 01/15] mm,hwpoison: cleanup unused PageHuge() check nao.horiguchi
                   ` (18 more replies)
  0 siblings, 19 replies; 34+ messages in thread
From: nao.horiguchi @ 2020-06-24 15:01 UTC (permalink / raw)
  To: linux-mm
  Cc: mhocko, akpm, mike.kravetz, osalvador, tony.luck, david,
	aneesh.kumar, zeil, naoya.horiguchi, linux-kernel

I rebased soft-offline rework patchset [1][2] onto the latest mmotm.  The
rebasing required some non-trivial changes to adjust, but mainly that was
straightforward.  I confirmed that the reported problem doesn't reproduce on
compaction after soft offline.  For more precise description of the problem
and the motivation of this patchset, please see [2].

I think that the following two patches in v2 are better to be done with
separate work of hard-offline rework, so it's not included in this series.

  - mm,hwpoison: Take pages off the buddy when hard-offlining
  - mm/hwpoison-inject: Rip off duplicated checks

These two are not directly related to the reported problem, so they seems
not urgent.  And the first one breaks num_poisoned_pages counting in some
testcases, and The second patch needs more consideration about commented point.

Any comment/suggestion/help would be appreciated.

[1] v1: https://lore.kernel.org/linux-mm/1541746035-13408-1-git-send-email-n-horiguchi@ah.jp.nec.com/
[2] v2: https://lore.kernel.org/linux-mm/20191017142123.24245-1-osalvador@suse.de/

Thanks,
Naoya Horiguchi
---
Summary:

Naoya Horiguchi (7):
      mm,hwpoison: cleanup unused PageHuge() check
      mm, hwpoison: remove recalculating hpage
      mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED
      mm,hwpoison-inject: don't pin for hwpoison_filter
      mm,hwpoison: remove MF_COUNT_INCREASED
      mm,hwpoison: remove flag argument from soft offline functions
      mm,hwpoison: introduce MF_MSG_UNSPLIT_THP

Oscar Salvador (8):
      mm,madvise: Refactor madvise_inject_error
      mm,hwpoison: Un-export get_hwpoison_page and make it static
      mm,hwpoison: Kill put_hwpoison_page
      mm,hwpoison: Unify THP handling for hard and soft offline
      mm,hwpoison: Rework soft offline for free pages
      mm,hwpoison: Rework soft offline for in-use pages
      mm,hwpoison: Refactor soft_offline_huge_page and __soft_offline_page
      mm,hwpoison: Return 0 if the page is already poisoned in soft-offline

 drivers/base/memory.c      |   2 +-
 include/linux/mm.h         |  12 +-
 include/linux/page-flags.h |   6 +-
 include/ras/ras_event.h    |   3 +
 mm/hwpoison-inject.c       |  18 +--
 mm/madvise.c               |  39 +++---
 mm/memory-failure.c        | 331 ++++++++++++++++++++-------------------------
 mm/migrate.c               |  11 +-
 mm/page_alloc.c            |  63 +++++++--
 9 files changed, 233 insertions(+), 252 deletions(-)


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v3 01/15] mm,hwpoison: cleanup unused PageHuge() check
  2020-06-24 15:01 [PATCH v3 00/15] HWPOISON: soft offline rework nao.horiguchi
@ 2020-06-24 15:01 ` nao.horiguchi
  2020-06-30 23:28   ` Mike Kravetz
  2020-06-24 15:01 ` [PATCH v3 02/15] mm, hwpoison: remove recalculating hpage nao.horiguchi
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 34+ messages in thread
From: nao.horiguchi @ 2020-06-24 15:01 UTC (permalink / raw)
  To: linux-mm
  Cc: mhocko, akpm, mike.kravetz, osalvador, tony.luck, david,
	aneesh.kumar, zeil, naoya.horiguchi, linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

Drop the PageHuge check, which is dead code since memory_failure() forks
into memory_failure_hugetlb() for hugetlb pages.

memory_failure() and memory_failure_hugetlb() shares some functions like
hwpoison_user_mappings() and identify_page_state(), so they should properly
handle 4kB page, thp, and hugetlb.

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
changelog v2 -> v3:
- add description about shared logic b/w hugetlb and non-hugetlb path.
---
 mm/memory-failure.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
index 47b8ccb1fb9b..e5d0c14c2332 100644
--- v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c
+++ v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
@@ -1382,10 +1382,7 @@ int memory_failure(unsigned long pfn, int flags)
 	 * page_remove_rmap() in try_to_unmap_one(). So to determine page status
 	 * correctly, we save a copy of the page flags at this time.
 	 */
-	if (PageHuge(p))
-		page_flags = hpage->flags;
-	else
-		page_flags = p->flags;
+	page_flags = p->flags;
 
 	/*
 	 * unpoison always clear PG_hwpoison inside page lock
-- 
2.17.1



^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v3 02/15] mm, hwpoison: remove recalculating hpage
  2020-06-24 15:01 [PATCH v3 00/15] HWPOISON: soft offline rework nao.horiguchi
  2020-06-24 15:01 ` [PATCH v3 01/15] mm,hwpoison: cleanup unused PageHuge() check nao.horiguchi
@ 2020-06-24 15:01 ` nao.horiguchi
  2020-06-30 23:45   ` Mike Kravetz
  2020-06-24 15:01 ` [PATCH v3 03/15] mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED nao.horiguchi
                   ` (16 subsequent siblings)
  18 siblings, 1 reply; 34+ messages in thread
From: nao.horiguchi @ 2020-06-24 15:01 UTC (permalink / raw)
  To: linux-mm
  Cc: mhocko, akpm, mike.kravetz, osalvador, tony.luck, david,
	aneesh.kumar, zeil, naoya.horiguchi, linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

hpage is never used after try_to_split_thp_page() in memory_failure(),
so we don't have to update hpage.  So let's not recalculate/use hpage.

Suggested-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
 mm/memory-failure.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
index e5d0c14c2332..d2d6010764e7 100644
--- v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c
+++ v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
@@ -1342,7 +1342,6 @@ int memory_failure(unsigned long pfn, int flags)
 		}
 		unlock_page(p);
 		VM_BUG_ON_PAGE(!page_count(p), p);
-		hpage = compound_head(p);
 	}
 
 	/*
@@ -1414,11 +1413,8 @@ int memory_failure(unsigned long pfn, int flags)
 	/*
 	 * Now take care of user space mappings.
 	 * Abort on fail: __delete_from_page_cache() assumes unmapped page.
-	 *
-	 * When the raw error page is thp tail page, hpage points to the raw
-	 * page after thp split.
 	 */
-	if (!hwpoison_user_mappings(p, pfn, flags, &hpage)) {
+	if (!hwpoison_user_mappings(p, pfn, flags, &p)) {
 		action_result(pfn, MF_MSG_UNMAP_FAILED, MF_IGNORED);
 		res = -EBUSY;
 		goto out;
-- 
2.17.1



^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v3 03/15] mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED
  2020-06-24 15:01 [PATCH v3 00/15] HWPOISON: soft offline rework nao.horiguchi
  2020-06-24 15:01 ` [PATCH v3 01/15] mm,hwpoison: cleanup unused PageHuge() check nao.horiguchi
  2020-06-24 15:01 ` [PATCH v3 02/15] mm, hwpoison: remove recalculating hpage nao.horiguchi
@ 2020-06-24 15:01 ` nao.horiguchi
  2020-06-24 15:01 ` [PATCH v3 04/15] mm,madvise: Refactor madvise_inject_error nao.horiguchi
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 34+ messages in thread
From: nao.horiguchi @ 2020-06-24 15:01 UTC (permalink / raw)
  To: linux-mm
  Cc: mhocko, akpm, mike.kravetz, osalvador, tony.luck, david,
	aneesh.kumar, zeil, naoya.horiguchi, linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

The call to get_user_pages_fast is only to get the pointer to a struct
page of a given address, pinning it is memory-poisoning handler's job,
so drop the refcount grabbed by get_user_pages_fast().

Note that the target page is still pinned after this put_page() because
the current process should have refcount from mapping.

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/madvise.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git v5.8-rc1-mmots-2020-06-20-21-44/mm/madvise.c v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/madvise.c
index dd1d43cf026d..275b08edd428 100644
--- v5.8-rc1-mmots-2020-06-20-21-44/mm/madvise.c
+++ v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/madvise.c
@@ -893,16 +893,24 @@ static int madvise_inject_error(int behavior,
 		 */
 		size = page_size(compound_head(page));
 
-		if (PageHWPoison(page)) {
-			put_page(page);
+		/*
+		 * The get_user_pages_fast() is just to get the pfn of the
+		 * given address, and the refcount has nothing to do with
+		 * what we try to test, so it should be released immediately.
+		 * This is racy but it's intended because the real hardware
+		 * errors could happen at any moment and memory error handlers
+		 * must properly handle the race.
+		 */
+		put_page(page);
+
+		if (PageHWPoison(page))
 			continue;
-		}
 
 		if (behavior == MADV_SOFT_OFFLINE) {
 			pr_info("Soft offlining pfn %#lx at process virtual address %#lx\n",
 					pfn, start);
 
-			ret = soft_offline_page(pfn, MF_COUNT_INCREASED);
+			ret = soft_offline_page(pfn, 0);
 			if (ret)
 				return ret;
 			continue;
@@ -910,14 +918,6 @@ static int madvise_inject_error(int behavior,
 
 		pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n",
 				pfn, start);
-
-		/*
-		 * Drop the page reference taken by get_user_pages_fast(). In
-		 * the absence of MF_COUNT_INCREASED the memory_failure()
-		 * routine is responsible for pinning the page to prevent it
-		 * from being released back to the page allocator.
-		 */
-		put_page(page);
 		ret = memory_failure(pfn, 0);
 		if (ret)
 			return ret;
-- 
2.17.1



^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v3 04/15] mm,madvise: Refactor madvise_inject_error
  2020-06-24 15:01 [PATCH v3 00/15] HWPOISON: soft offline rework nao.horiguchi
                   ` (2 preceding siblings ...)
  2020-06-24 15:01 ` [PATCH v3 03/15] mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED nao.horiguchi
@ 2020-06-24 15:01 ` nao.horiguchi
  2020-06-24 15:01 ` [PATCH v3 05/15] mm,hwpoison-inject: don't pin for hwpoison_filter nao.horiguchi
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 34+ messages in thread
From: nao.horiguchi @ 2020-06-24 15:01 UTC (permalink / raw)
  To: linux-mm
  Cc: mhocko, akpm, mike.kravetz, osalvador, tony.luck, david,
	aneesh.kumar, zeil, naoya.horiguchi, linux-kernel

From: Oscar Salvador <osalvador@suse.de>

Make a proper if-else condition for {hard,soft}-offline.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
 mm/madvise.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git v5.8-rc1-mmots-2020-06-20-21-44/mm/madvise.c v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/madvise.c
index 275b08edd428..bee1f4ac70d6 100644
--- v5.8-rc1-mmots-2020-06-20-21-44/mm/madvise.c
+++ v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/madvise.c
@@ -869,16 +869,15 @@ static long madvise_remove(struct vm_area_struct *vma,
 static int madvise_inject_error(int behavior,
 		unsigned long start, unsigned long end)
 {
-	struct page *page;
 	struct zone *zone;
 	unsigned long size;
 
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
-
 	for (; start < end; start += size) {
 		unsigned long pfn;
+		struct page *page;
 		int ret;
 
 		ret = get_user_pages_fast(start, 1, 0, &page);
@@ -908,17 +907,14 @@ static int madvise_inject_error(int behavior,
 
 		if (behavior == MADV_SOFT_OFFLINE) {
 			pr_info("Soft offlining pfn %#lx at process virtual address %#lx\n",
-					pfn, start);
-
+				pfn, start);
 			ret = soft_offline_page(pfn, 0);
-			if (ret)
-				return ret;
-			continue;
+		} else {
+			pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n",
+				pfn, start);
+			ret = memory_failure(pfn, 0);
 		}
 
-		pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n",
-				pfn, start);
-		ret = memory_failure(pfn, 0);
 		if (ret)
 			return ret;
 	}
-- 
2.17.1



^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v3 05/15] mm,hwpoison-inject: don't pin for hwpoison_filter
  2020-06-24 15:01 [PATCH v3 00/15] HWPOISON: soft offline rework nao.horiguchi
                   ` (3 preceding siblings ...)
  2020-06-24 15:01 ` [PATCH v3 04/15] mm,madvise: Refactor madvise_inject_error nao.horiguchi
@ 2020-06-24 15:01 ` nao.horiguchi
  2020-06-24 15:01 ` [PATCH v3 06/15] mm,hwpoison: Un-export get_hwpoison_page and make it static nao.horiguchi
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 34+ messages in thread
From: nao.horiguchi @ 2020-06-24 15:01 UTC (permalink / raw)
  To: linux-mm
  Cc: mhocko, akpm, mike.kravetz, osalvador, tony.luck, david,
	aneesh.kumar, zeil, naoya.horiguchi, linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

Another memory error injection interface debugfs:hwpoison/corrupt-pfn
also takes bogus refcount for hwpoison_filter(). It's justified
because this does a coarse filter, expecting that memory_failure()
redoes the check for sure.

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/hwpoison-inject.c | 18 +++++-------------
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git v5.8-rc1-mmots-2020-06-20-21-44/mm/hwpoison-inject.c v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/hwpoison-inject.c
index e488876b168a..1ae1ebc2b9b1 100644
--- v5.8-rc1-mmots-2020-06-20-21-44/mm/hwpoison-inject.c
+++ v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/hwpoison-inject.c
@@ -26,11 +26,6 @@ static int hwpoison_inject(void *data, u64 val)
 
 	p = pfn_to_page(pfn);
 	hpage = compound_head(p);
-	/*
-	 * This implies unable to support free buddy pages.
-	 */
-	if (!get_hwpoison_page(p))
-		return 0;
 
 	if (!hwpoison_filter_enable)
 		goto inject;
@@ -40,23 +35,20 @@ static int hwpoison_inject(void *data, u64 val)
 	 * This implies unable to support non-LRU pages.
 	 */
 	if (!PageLRU(hpage) && !PageHuge(p))
-		goto put_out;
+		return 0;
 
 	/*
-	 * do a racy check with elevated page count, to make sure PG_hwpoison
-	 * will only be set for the targeted owner (or on a free page).
+	 * do a racy check to make sure PG_hwpoison will only be set for
+	 * the targeted owner (or on a free page).
 	 * memory_failure() will redo the check reliably inside page lock.
 	 */
 	err = hwpoison_filter(hpage);
 	if (err)
-		goto put_out;
+		return 0;
 
 inject:
 	pr_info("Injecting memory failure at pfn %#lx\n", pfn);
-	return memory_failure(pfn, MF_COUNT_INCREASED);
-put_out:
-	put_hwpoison_page(p);
-	return 0;
+	return memory_failure(pfn, 0);
 }
 
 static int hwpoison_unpoison(void *data, u64 val)
-- 
2.17.1



^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v3 06/15] mm,hwpoison: Un-export get_hwpoison_page and make it static
  2020-06-24 15:01 [PATCH v3 00/15] HWPOISON: soft offline rework nao.horiguchi
                   ` (4 preceding siblings ...)
  2020-06-24 15:01 ` [PATCH v3 05/15] mm,hwpoison-inject: don't pin for hwpoison_filter nao.horiguchi
@ 2020-06-24 15:01 ` nao.horiguchi
  2020-06-24 15:01 ` [PATCH v3 07/15] mm,hwpoison: Kill put_hwpoison_page nao.horiguchi
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 34+ messages in thread
From: nao.horiguchi @ 2020-06-24 15:01 UTC (permalink / raw)
  To: linux-mm
  Cc: mhocko, akpm, mike.kravetz, osalvador, tony.luck, david,
	aneesh.kumar, zeil, naoya.horiguchi, linux-kernel

From: Oscar Salvador <osalvador@suse.de>

Since get_hwpoison_page is only used in memory-failure code now,
let us un-export it and make it private to that code.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
 include/linux/mm.h  | 1 -
 mm/memory-failure.c | 3 +--
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git v5.8-rc1-mmots-2020-06-20-21-44/include/linux/mm.h v5.8-rc1-mmots-2020-06-20-21-44_patched/include/linux/mm.h
index e6ff54a7b284..050e9cffc2ea 100644
--- v5.8-rc1-mmots-2020-06-20-21-44/include/linux/mm.h
+++ v5.8-rc1-mmots-2020-06-20-21-44_patched/include/linux/mm.h
@@ -2995,7 +2995,6 @@ extern int memory_failure(unsigned long pfn, int flags);
 extern void memory_failure_queue(unsigned long pfn, int flags);
 extern void memory_failure_queue_kick(int cpu);
 extern int unpoison_memory(unsigned long pfn);
-extern int get_hwpoison_page(struct page *page);
 #define put_hwpoison_page(page)	put_page(page)
 extern int sysctl_memory_failure_early_kill;
 extern int sysctl_memory_failure_recovery;
diff --git v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
index d2d6010764e7..48feb45047f7 100644
--- v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c
+++ v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
@@ -925,7 +925,7 @@ static int page_action(struct page_state *ps, struct page *p,
  * Return: return 0 if failed to grab the refcount, otherwise true (some
  * non-zero value.)
  */
-int get_hwpoison_page(struct page *page)
+static int get_hwpoison_page(struct page *page)
 {
 	struct page *head = compound_head(page);
 
@@ -954,7 +954,6 @@ int get_hwpoison_page(struct page *page)
 
 	return 0;
 }
-EXPORT_SYMBOL_GPL(get_hwpoison_page);
 
 /*
  * Do all that is necessary to remove user space mappings. Unmap
-- 
2.17.1



^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v3 07/15] mm,hwpoison: Kill put_hwpoison_page
  2020-06-24 15:01 [PATCH v3 00/15] HWPOISON: soft offline rework nao.horiguchi
                   ` (5 preceding siblings ...)
  2020-06-24 15:01 ` [PATCH v3 06/15] mm,hwpoison: Un-export get_hwpoison_page and make it static nao.horiguchi
@ 2020-06-24 15:01 ` nao.horiguchi
  2020-06-24 15:01 ` [PATCH v3 08/15] mm,hwpoison: remove MF_COUNT_INCREASED nao.horiguchi
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 34+ messages in thread
From: nao.horiguchi @ 2020-06-24 15:01 UTC (permalink / raw)
  To: linux-mm
  Cc: mhocko, akpm, mike.kravetz, osalvador, tony.luck, david,
	aneesh.kumar, zeil, naoya.horiguchi, linux-kernel

From: Oscar Salvador <osalvador@suse.de>

After commit 4e41a30c6d50 ("mm: hwpoison: adjust for new thp refcounting"),
put_hwpoison_page got reduced to a put_page.
Let us just use put_page instead.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
 include/linux/mm.h  |  1 -
 mm/memory-failure.c | 30 +++++++++++++++---------------
 2 files changed, 15 insertions(+), 16 deletions(-)

diff --git v5.8-rc1-mmots-2020-06-20-21-44/include/linux/mm.h v5.8-rc1-mmots-2020-06-20-21-44_patched/include/linux/mm.h
index 050e9cffc2ea..c64ffec363b5 100644
--- v5.8-rc1-mmots-2020-06-20-21-44/include/linux/mm.h
+++ v5.8-rc1-mmots-2020-06-20-21-44_patched/include/linux/mm.h
@@ -2995,7 +2995,6 @@ extern int memory_failure(unsigned long pfn, int flags);
 extern void memory_failure_queue(unsigned long pfn, int flags);
 extern void memory_failure_queue_kick(int cpu);
 extern int unpoison_memory(unsigned long pfn);
-#define put_hwpoison_page(page)	put_page(page)
 extern int sysctl_memory_failure_early_kill;
 extern int sysctl_memory_failure_recovery;
 extern void shake_page(struct page *p, int access);
diff --git v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
index 48feb45047f7..0b7d9769cf29 100644
--- v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c
+++ v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
@@ -1144,7 +1144,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
 		pr_err("Memory failure: %#lx: just unpoisoned\n", pfn);
 		num_poisoned_pages_dec();
 		unlock_page(head);
-		put_hwpoison_page(head);
+		put_page(head);
 		return 0;
 	}
 
@@ -1336,7 +1336,7 @@ int memory_failure(unsigned long pfn, int flags)
 					pfn);
 			if (TestClearPageHWPoison(p))
 				num_poisoned_pages_dec();
-			put_hwpoison_page(p);
+			put_page(p);
 			return -EBUSY;
 		}
 		unlock_page(p);
@@ -1389,14 +1389,14 @@ int memory_failure(unsigned long pfn, int flags)
 		pr_err("Memory failure: %#lx: just unpoisoned\n", pfn);
 		num_poisoned_pages_dec();
 		unlock_page(p);
-		put_hwpoison_page(p);
+		put_page(p);
 		return 0;
 	}
 	if (hwpoison_filter(p)) {
 		if (TestClearPageHWPoison(p))
 			num_poisoned_pages_dec();
 		unlock_page(p);
-		put_hwpoison_page(p);
+		put_page(p);
 		return 0;
 	}
 
@@ -1630,9 +1630,9 @@ int unpoison_memory(unsigned long pfn)
 	}
 	unlock_page(page);
 
-	put_hwpoison_page(page);
+	put_page(page);
 	if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1))
-		put_hwpoison_page(page);
+		put_page(page);
 
 	return 0;
 }
@@ -1690,7 +1690,7 @@ static int get_any_page(struct page *page, unsigned long pfn, int flags)
 		/*
 		 * Try to free it.
 		 */
-		put_hwpoison_page(page);
+		put_page(page);
 		shake_page(page, 1);
 
 		/*
@@ -1699,7 +1699,7 @@ static int get_any_page(struct page *page, unsigned long pfn, int flags)
 		ret = __get_any_page(page, pfn, 0);
 		if (ret == 1 && !PageLRU(page)) {
 			/* Drop page reference which is from __get_any_page() */
-			put_hwpoison_page(page);
+			put_page(page);
 			pr_info("soft_offline: %#lx: unknown non LRU page type %lx (%pGp)\n",
 				pfn, page->flags, &page->flags);
 			return -EIO;
@@ -1722,7 +1722,7 @@ static int soft_offline_huge_page(struct page *page, int flags)
 	lock_page(hpage);
 	if (PageHWPoison(hpage)) {
 		unlock_page(hpage);
-		put_hwpoison_page(hpage);
+		put_page(hpage);
 		pr_info("soft offline: %#lx hugepage already poisoned\n", pfn);
 		return -EBUSY;
 	}
@@ -1733,7 +1733,7 @@ static int soft_offline_huge_page(struct page *page, int flags)
 	 * get_any_page() and isolate_huge_page() takes a refcount each,
 	 * so need to drop one here.
 	 */
-	put_hwpoison_page(hpage);
+	put_page(hpage);
 	if (!ret) {
 		pr_info("soft offline: %#lx hugepage failed to isolate\n", pfn);
 		return -EBUSY;
@@ -1782,7 +1782,7 @@ static int __soft_offline_page(struct page *page, int flags)
 	wait_on_page_writeback(page);
 	if (PageHWPoison(page)) {
 		unlock_page(page);
-		put_hwpoison_page(page);
+		put_page(page);
 		pr_info("soft offline: %#lx page already poisoned\n", pfn);
 		return -EBUSY;
 	}
@@ -1797,7 +1797,7 @@ static int __soft_offline_page(struct page *page, int flags)
 	 * would need to fix isolation locking first.
 	 */
 	if (ret == 1) {
-		put_hwpoison_page(page);
+		put_page(page);
 		pr_info("soft_offline: %#lx: invalidated\n", pfn);
 		SetPageHWPoison(page);
 		num_poisoned_pages_inc();
@@ -1817,7 +1817,7 @@ static int __soft_offline_page(struct page *page, int flags)
 	 * Drop page reference which is came from get_any_page()
 	 * successful isolate_lru_page() already took another one.
 	 */
-	put_hwpoison_page(page);
+	put_page(page);
 	if (!ret) {
 		LIST_HEAD(pagelist);
 		/*
@@ -1861,7 +1861,7 @@ static int soft_offline_in_use_page(struct page *page, int flags)
 				pr_info("soft offline: %#lx: non anonymous thp\n", page_to_pfn(page));
 			else
 				pr_info("soft offline: %#lx: thp split failed\n", page_to_pfn(page));
-			put_hwpoison_page(page);
+			put_page(page);
 			return -EBUSY;
 		}
 		unlock_page(page);
@@ -1934,7 +1934,7 @@ int soft_offline_page(unsigned long pfn, int flags)
 	if (PageHWPoison(page)) {
 		pr_info("soft offline: %#lx page already poisoned\n", pfn);
 		if (flags & MF_COUNT_INCREASED)
-			put_hwpoison_page(page);
+			put_page(page);
 		return -EBUSY;
 	}
 
-- 
2.17.1



^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v3 08/15] mm,hwpoison: remove MF_COUNT_INCREASED
  2020-06-24 15:01 [PATCH v3 00/15] HWPOISON: soft offline rework nao.horiguchi
                   ` (6 preceding siblings ...)
  2020-06-24 15:01 ` [PATCH v3 07/15] mm,hwpoison: Kill put_hwpoison_page nao.horiguchi
@ 2020-06-24 15:01 ` nao.horiguchi
  2020-06-24 15:01 ` [PATCH v3 09/15] mm,hwpoison: remove flag argument from soft offline functions nao.horiguchi
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 34+ messages in thread
From: nao.horiguchi @ 2020-06-24 15:01 UTC (permalink / raw)
  To: linux-mm
  Cc: mhocko, akpm, mike.kravetz, osalvador, tony.luck, david,
	aneesh.kumar, zeil, naoya.horiguchi, linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

Now there's no user of MF_COUNT_INCREASED, so we can safely remove
it from all calling points.

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 include/linux/mm.h  |  7 +++----
 mm/memory-failure.c | 14 +++-----------
 2 files changed, 6 insertions(+), 15 deletions(-)

diff --git v5.8-rc1-mmots-2020-06-20-21-44/include/linux/mm.h v5.8-rc1-mmots-2020-06-20-21-44_patched/include/linux/mm.h
index c64ffec363b5..229efb3479b1 100644
--- v5.8-rc1-mmots-2020-06-20-21-44/include/linux/mm.h
+++ v5.8-rc1-mmots-2020-06-20-21-44_patched/include/linux/mm.h
@@ -2986,10 +2986,9 @@ void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
 				  unsigned long nr_pages);
 
 enum mf_flags {
-	MF_COUNT_INCREASED = 1 << 0,
-	MF_ACTION_REQUIRED = 1 << 1,
-	MF_MUST_KILL = 1 << 2,
-	MF_SOFT_OFFLINE = 1 << 3,
+	MF_ACTION_REQUIRED = 1 << 0,
+	MF_MUST_KILL = 1 << 1,
+	MF_SOFT_OFFLINE = 1 << 2,
 };
 extern int memory_failure(unsigned long pfn, int flags);
 extern void memory_failure_queue(unsigned long pfn, int flags);
diff --git v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
index 0b7d9769cf29..15b8e7fd94ed 100644
--- v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c
+++ v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
@@ -1118,7 +1118,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
 
 	num_poisoned_pages_inc();
 
-	if (!(flags & MF_COUNT_INCREASED) && !get_hwpoison_page(p)) {
+	if (!get_hwpoison_page(p)) {
 		/*
 		 * Check "filter hit" and "race with other subpage."
 		 */
@@ -1314,7 +1314,7 @@ int memory_failure(unsigned long pfn, int flags)
 	 * In fact it's dangerous to directly bump up page count from 0,
 	 * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.
 	 */
-	if (!(flags & MF_COUNT_INCREASED) && !get_hwpoison_page(p)) {
+	if (!get_hwpoison_page(p)) {
 		if (is_free_buddy_page(p)) {
 			action_result(pfn, MF_MSG_BUDDY, MF_DELAYED);
 			return 0;
@@ -1354,10 +1354,7 @@ int memory_failure(unsigned long pfn, int flags)
 	shake_page(p, 0);
 	/* shake_page could have turned it free. */
 	if (!PageLRU(p) && is_free_buddy_page(p)) {
-		if (flags & MF_COUNT_INCREASED)
-			action_result(pfn, MF_MSG_BUDDY, MF_DELAYED);
-		else
-			action_result(pfn, MF_MSG_BUDDY_2ND, MF_DELAYED);
+		action_result(pfn, MF_MSG_BUDDY_2ND, MF_DELAYED);
 		return 0;
 	}
 
@@ -1655,9 +1652,6 @@ static int __get_any_page(struct page *p, unsigned long pfn, int flags)
 {
 	int ret;
 
-	if (flags & MF_COUNT_INCREASED)
-		return 1;
-
 	/*
 	 * When the target page is a free hugepage, just remove it
 	 * from free hugepage list.
@@ -1933,8 +1927,6 @@ int soft_offline_page(unsigned long pfn, int flags)
 
 	if (PageHWPoison(page)) {
 		pr_info("soft offline: %#lx page already poisoned\n", pfn);
-		if (flags & MF_COUNT_INCREASED)
-			put_page(page);
 		return -EBUSY;
 	}
 
-- 
2.17.1



^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v3 09/15] mm,hwpoison: remove flag argument from soft offline functions
  2020-06-24 15:01 [PATCH v3 00/15] HWPOISON: soft offline rework nao.horiguchi
                   ` (7 preceding siblings ...)
  2020-06-24 15:01 ` [PATCH v3 08/15] mm,hwpoison: remove MF_COUNT_INCREASED nao.horiguchi
@ 2020-06-24 15:01 ` nao.horiguchi
  2020-06-24 15:01 ` [PATCH v3 10/15] mm,hwpoison: Unify THP handling for hard and soft offline nao.horiguchi
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 34+ messages in thread
From: nao.horiguchi @ 2020-06-24 15:01 UTC (permalink / raw)
  To: linux-mm
  Cc: mhocko, akpm, mike.kravetz, osalvador, tony.luck, david,
	aneesh.kumar, zeil, naoya.horiguchi, linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

The argument @flag no longer affects the behavior of soft_offline_page()
and its variants, so let's remove them.

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 drivers/base/memory.c |  2 +-
 include/linux/mm.h    |  2 +-
 mm/madvise.c          |  2 +-
 mm/memory-failure.c   | 27 +++++++++++++--------------
 4 files changed, 16 insertions(+), 17 deletions(-)

diff --git v5.8-rc1-mmots-2020-06-20-21-44/drivers/base/memory.c v5.8-rc1-mmots-2020-06-20-21-44_patched/drivers/base/memory.c
index 2b09b68b9f78..20664f279c99 100644
--- v5.8-rc1-mmots-2020-06-20-21-44/drivers/base/memory.c
+++ v5.8-rc1-mmots-2020-06-20-21-44_patched/drivers/base/memory.c
@@ -463,7 +463,7 @@ static ssize_t soft_offline_page_store(struct device *dev,
 	if (kstrtoull(buf, 0, &pfn) < 0)
 		return -EINVAL;
 	pfn >>= PAGE_SHIFT;
-	ret = soft_offline_page(pfn, 0);
+	ret = soft_offline_page(pfn);
 	return ret == 0 ? count : ret;
 }
 
diff --git v5.8-rc1-mmots-2020-06-20-21-44/include/linux/mm.h v5.8-rc1-mmots-2020-06-20-21-44_patched/include/linux/mm.h
index 229efb3479b1..d708033c50c0 100644
--- v5.8-rc1-mmots-2020-06-20-21-44/include/linux/mm.h
+++ v5.8-rc1-mmots-2020-06-20-21-44_patched/include/linux/mm.h
@@ -2998,7 +2998,7 @@ extern int sysctl_memory_failure_early_kill;
 extern int sysctl_memory_failure_recovery;
 extern void shake_page(struct page *p, int access);
 extern atomic_long_t num_poisoned_pages __read_mostly;
-extern int soft_offline_page(unsigned long pfn, int flags);
+extern int soft_offline_page(unsigned long pfn);
 
 
 /*
diff --git v5.8-rc1-mmots-2020-06-20-21-44/mm/madvise.c v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/madvise.c
index bee1f4ac70d6..2b5080253406 100644
--- v5.8-rc1-mmots-2020-06-20-21-44/mm/madvise.c
+++ v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/madvise.c
@@ -908,7 +908,7 @@ static int madvise_inject_error(int behavior,
 		if (behavior == MADV_SOFT_OFFLINE) {
 			pr_info("Soft offlining pfn %#lx at process virtual address %#lx\n",
 				pfn, start);
-			ret = soft_offline_page(pfn, 0);
+			ret = soft_offline_page(pfn);
 		} else {
 			pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n",
 				pfn, start);
diff --git v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
index 15b8e7fd94ed..03d3aae77f89 100644
--- v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c
+++ v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
@@ -1502,7 +1502,7 @@ static void memory_failure_work_func(struct work_struct *work)
 		if (!gotten)
 			break;
 		if (entry.flags & MF_SOFT_OFFLINE)
-			soft_offline_page(entry.pfn, entry.flags);
+			soft_offline_page(entry.pfn);
 		else
 			memory_failure(entry.pfn, entry.flags);
 	}
@@ -1648,7 +1648,7 @@ static struct page *new_page(struct page *p, unsigned long private)
  * that is not free, and 1 for any other page type.
  * For 1 the page is returned with increased page count, otherwise not.
  */
-static int __get_any_page(struct page *p, unsigned long pfn, int flags)
+static int __get_any_page(struct page *p, unsigned long pfn)
 {
 	int ret;
 
@@ -1675,9 +1675,9 @@ static int __get_any_page(struct page *p, unsigned long pfn, int flags)
 	return ret;
 }
 
-static int get_any_page(struct page *page, unsigned long pfn, int flags)
+static int get_any_page(struct page *page, unsigned long pfn)
 {
-	int ret = __get_any_page(page, pfn, flags);
+	int ret = __get_any_page(page, pfn);
 
 	if (ret == 1 && !PageHuge(page) &&
 	    !PageLRU(page) && !__PageMovable(page)) {
@@ -1690,7 +1690,7 @@ static int get_any_page(struct page *page, unsigned long pfn, int flags)
 		/*
 		 * Did it turn free?
 		 */
-		ret = __get_any_page(page, pfn, 0);
+		ret = __get_any_page(page, pfn);
 		if (ret == 1 && !PageLRU(page)) {
 			/* Drop page reference which is from __get_any_page() */
 			put_page(page);
@@ -1702,7 +1702,7 @@ static int get_any_page(struct page *page, unsigned long pfn, int flags)
 	return ret;
 }
 
-static int soft_offline_huge_page(struct page *page, int flags)
+static int soft_offline_huge_page(struct page *page)
 {
 	int ret;
 	unsigned long pfn = page_to_pfn(page);
@@ -1761,7 +1761,7 @@ static int soft_offline_huge_page(struct page *page, int flags)
 	return ret;
 }
 
-static int __soft_offline_page(struct page *page, int flags)
+static int __soft_offline_page(struct page *page)
 {
 	int ret;
 	unsigned long pfn = page_to_pfn(page);
@@ -1841,7 +1841,7 @@ static int __soft_offline_page(struct page *page, int flags)
 	return ret;
 }
 
-static int soft_offline_in_use_page(struct page *page, int flags)
+static int soft_offline_in_use_page(struct page *page)
 {
 	int ret;
 	int mt;
@@ -1871,9 +1871,9 @@ static int soft_offline_in_use_page(struct page *page, int flags)
 	mt = get_pageblock_migratetype(page);
 	set_pageblock_migratetype(page, MIGRATE_ISOLATE);
 	if (PageHuge(page))
-		ret = soft_offline_huge_page(page, flags);
+		ret = soft_offline_huge_page(page);
 	else
-		ret = __soft_offline_page(page, flags);
+		ret = __soft_offline_page(page);
 	set_pageblock_migratetype(page, mt);
 	return ret;
 }
@@ -1894,7 +1894,6 @@ static int soft_offline_free_page(struct page *page)
 /**
  * soft_offline_page - Soft offline a page.
  * @pfn: pfn to soft-offline
- * @flags: flags. Same as memory_failure().
  *
  * Returns 0 on success, otherwise negated errno.
  *
@@ -1913,7 +1912,7 @@ static int soft_offline_free_page(struct page *page)
  * This is not a 100% solution for all memory, but tries to be
  * ``good enough'' for the majority of memory.
  */
-int soft_offline_page(unsigned long pfn, int flags)
+int soft_offline_page(unsigned long pfn)
 {
 	int ret;
 	struct page *page;
@@ -1931,11 +1930,11 @@ int soft_offline_page(unsigned long pfn, int flags)
 	}
 
 	get_online_mems();
-	ret = get_any_page(page, pfn, flags);
+	ret = get_any_page(page, pfn);
 	put_online_mems();
 
 	if (ret > 0)
-		ret = soft_offline_in_use_page(page, flags);
+		ret = soft_offline_in_use_page(page);
 	else if (ret == 0)
 		ret = soft_offline_free_page(page);
 
-- 
2.17.1



^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v3 10/15] mm,hwpoison: Unify THP handling for hard and soft offline
  2020-06-24 15:01 [PATCH v3 00/15] HWPOISON: soft offline rework nao.horiguchi
                   ` (8 preceding siblings ...)
  2020-06-24 15:01 ` [PATCH v3 09/15] mm,hwpoison: remove flag argument from soft offline functions nao.horiguchi
@ 2020-06-24 15:01 ` nao.horiguchi
  2020-06-24 15:01 ` [PATCH v3 11/15] mm,hwpoison: Rework soft offline for free pages nao.horiguchi
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 34+ messages in thread
From: nao.horiguchi @ 2020-06-24 15:01 UTC (permalink / raw)
  To: linux-mm
  Cc: mhocko, akpm, mike.kravetz, osalvador, tony.luck, david,
	aneesh.kumar, zeil, naoya.horiguchi, linux-kernel

From: Oscar Salvador <osalvador@suse.de>

Place the THP's page handling in a helper and use it
from both hard and soft-offline machinery, so we get rid
of some duplicated code.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
 mm/memory-failure.c | 48 +++++++++++++++++++++------------------------
 1 file changed, 22 insertions(+), 26 deletions(-)

diff --git v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
index 03d3aae77f89..2e244d5b83e0 100644
--- v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c
+++ v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
@@ -1103,6 +1103,25 @@ static int identify_page_state(unsigned long pfn, struct page *p,
 	return page_action(ps, p, pfn);
 }
 
+static int try_to_split_thp_page(struct page *page, const char *msg)
+{
+	lock_page(page);
+	if (!PageAnon(page) || unlikely(split_huge_page(page))) {
+		unsigned long pfn = page_to_pfn(page);
+
+		unlock_page(page);
+		if (!PageAnon(page))
+			pr_info("%s: %#lx: non anonymous thp\n", msg, pfn);
+		else
+			pr_info("%s: %#lx: thp split failed\n", msg, pfn);
+		put_page(page);
+		return -EBUSY;
+	}
+	unlock_page(page);
+
+	return 0;
+}
+
 static int memory_failure_hugetlb(unsigned long pfn, int flags)
 {
 	struct page *p = pfn_to_page(pfn);
@@ -1325,21 +1344,8 @@ int memory_failure(unsigned long pfn, int flags)
 	}
 
 	if (PageTransHuge(hpage)) {
-		lock_page(p);
-		if (!PageAnon(p) || unlikely(split_huge_page(p))) {
-			unlock_page(p);
-			if (!PageAnon(p))
-				pr_err("Memory failure: %#lx: non anonymous thp\n",
-					pfn);
-			else
-				pr_err("Memory failure: %#lx: thp split failed\n",
-					pfn);
-			if (TestClearPageHWPoison(p))
-				num_poisoned_pages_dec();
-			put_page(p);
+		if (try_to_split_thp_page(p, "Memory Failure") < 0)
 			return -EBUSY;
-		}
-		unlock_page(p);
 		VM_BUG_ON_PAGE(!page_count(p), p);
 	}
 
@@ -1847,19 +1853,9 @@ static int soft_offline_in_use_page(struct page *page)
 	int mt;
 	struct page *hpage = compound_head(page);
 
-	if (!PageHuge(page) && PageTransHuge(hpage)) {
-		lock_page(page);
-		if (!PageAnon(page) || unlikely(split_huge_page(page))) {
-			unlock_page(page);
-			if (!PageAnon(page))
-				pr_info("soft offline: %#lx: non anonymous thp\n", page_to_pfn(page));
-			else
-				pr_info("soft offline: %#lx: thp split failed\n", page_to_pfn(page));
-			put_page(page);
+	if (!PageHuge(page) && PageTransHuge(hpage))
+		if (try_to_split_thp_page(page, "soft offline") < 0)
 			return -EBUSY;
-		}
-		unlock_page(page);
-	}
 
 	/*
 	 * Setting MIGRATE_ISOLATE here ensures that the page will be linked
-- 
2.17.1



^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v3 11/15] mm,hwpoison: Rework soft offline for free pages
  2020-06-24 15:01 [PATCH v3 00/15] HWPOISON: soft offline rework nao.horiguchi
                   ` (9 preceding siblings ...)
  2020-06-24 15:01 ` [PATCH v3 10/15] mm,hwpoison: Unify THP handling for hard and soft offline nao.horiguchi
@ 2020-06-24 15:01 ` nao.horiguchi
  2020-06-24 15:01 ` [PATCH v3 12/15] mm,hwpoison: Rework soft offline for in-use pages nao.horiguchi
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 34+ messages in thread
From: nao.horiguchi @ 2020-06-24 15:01 UTC (permalink / raw)
  To: linux-mm
  Cc: mhocko, akpm, mike.kravetz, osalvador, tony.luck, david,
	aneesh.kumar, zeil, naoya.horiguchi, linux-kernel

From: Oscar Salvador <osalvador@suse.de>

When trying to soft-offline a free page, we need to first take it off
the buddy allocator.
Once we know is out of reach, we can safely flag it as poisoned.

take_page_off_buddy will be used to take a page meant to be poisoned
off the buddy allocator.
take_page_off_buddy calls break_down_buddy_pages, which splits a
higher-order page in case our page belongs to one.

Once the page is under our control, we call page_handle_poison to set it
as poisoned and grab a refcount on it.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
ChangeLog v2 -> v3:
- use add_to_free_list() instead of add_to_free_area()
- use del_page_from_free_list() instead of del_page_from_free_area()
- add fast return
- move extern definition to header file as warned by checkpatch.pl
---
 include/linux/page-flags.h |  1 +
 mm/memory-failure.c        | 18 ++++++----
 mm/page_alloc.c            | 68 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 81 insertions(+), 6 deletions(-)

diff --git v5.8-rc1-mmots-2020-06-20-21-44/include/linux/page-flags.h v5.8-rc1-mmots-2020-06-20-21-44_patched/include/linux/page-flags.h
index 6be1aa559b1e..9fa5d4e2d69a 100644
--- v5.8-rc1-mmots-2020-06-20-21-44/include/linux/page-flags.h
+++ v5.8-rc1-mmots-2020-06-20-21-44_patched/include/linux/page-flags.h
@@ -423,6 +423,7 @@ PAGEFLAG(HWPoison, hwpoison, PF_ANY)
 TESTSCFLAG(HWPoison, hwpoison, PF_ANY)
 #define __PG_HWPOISON (1UL << PG_hwpoison)
 extern bool set_hwpoison_free_buddy_page(struct page *page);
+extern bool take_page_off_buddy(struct page *page);
 #else
 PAGEFLAG_FALSE(HWPoison)
 static inline bool set_hwpoison_free_buddy_page(struct page *page)
diff --git v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
index 2e244d5b83e0..d79e756a97be 100644
--- v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c
+++ v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
@@ -78,6 +78,13 @@ EXPORT_SYMBOL_GPL(hwpoison_filter_dev_minor);
 EXPORT_SYMBOL_GPL(hwpoison_filter_flags_mask);
 EXPORT_SYMBOL_GPL(hwpoison_filter_flags_value);
 
+static void page_handle_poison(struct page *page)
+{
+	SetPageHWPoison(page);
+	page_ref_inc(page);
+	num_poisoned_pages_inc();
+}
+
 static int hwpoison_filter_dev(struct page *p)
 {
 	struct address_space *mapping;
@@ -1876,14 +1883,13 @@ static int soft_offline_in_use_page(struct page *page)
 
 static int soft_offline_free_page(struct page *page)
 {
-	int rc = dissolve_free_huge_page(page);
+	int rc = -EBUSY;
 
-	if (!rc) {
-		if (set_hwpoison_free_buddy_page(page))
-			num_poisoned_pages_inc();
-		else
-			rc = -EBUSY;
+	if (!dissolve_free_huge_page(page) && take_page_off_buddy(page)) {
+		page_handle_poison(page);
+		rc = 0;
 	}
+
 	return rc;
 }
 
diff --git v5.8-rc1-mmots-2020-06-20-21-44/mm/page_alloc.c v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/page_alloc.c
index 31c32fe0ecfb..3b145bceb477 100644
--- v5.8-rc1-mmots-2020-06-20-21-44/mm/page_alloc.c
+++ v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/page_alloc.c
@@ -8781,6 +8781,74 @@ bool is_free_buddy_page(struct page *page)
 }
 
 #ifdef CONFIG_MEMORY_FAILURE
+/*
+ * Break down a higher-order page in sub-pages, and keep our target out of
+ * buddy allocator.
+ */
+static void break_down_buddy_pages(struct zone *zone, struct page *page,
+				   struct page *target, int low, int high,
+				   int migratetype)
+{
+	unsigned long size = 1 << high;
+	struct page *current_buddy, *next_page;
+
+	while (high > low) {
+		high--;
+		size >>= 1;
+
+		if (target >= &page[size]) {
+			next_page = page + size;
+			current_buddy = page;
+		} else {
+			next_page = page;
+			current_buddy = page + size;
+		}
+
+		if (set_page_guard(zone, current_buddy, high, migratetype))
+			continue;
+
+		if (current_buddy != target) {
+			add_to_free_list(current_buddy, zone, high, migratetype);
+			set_page_order(current_buddy, high);
+			page = next_page;
+		}
+	}
+}
+
+/*
+ * Take a page that will be marked as poisoned off the buddy allocator.
+ */
+bool take_page_off_buddy(struct page *page)
+{
+	struct zone *zone = page_zone(page);
+	unsigned long pfn = page_to_pfn(page);
+	unsigned long flags;
+	unsigned int order;
+	bool ret = false;
+
+	spin_lock_irqsave(&zone->lock, flags);
+	for (order = 0; order < MAX_ORDER; order++) {
+		struct page *page_head = page - (pfn & ((1 << order) - 1));
+		int buddy_order = page_order(page_head);
+
+		if (PageBuddy(page_head) && buddy_order >= order) {
+			unsigned long pfn_head = page_to_pfn(page_head);
+			int migratetype = get_pfnblock_migratetype(page_head,
+								   pfn_head);
+
+			del_page_from_free_list(page_head, zone, buddy_order);
+			break_down_buddy_pages(zone, page_head, page, 0,
+						buddy_order, migratetype);
+			ret = true;
+			break;
+		}
+		if (page_count(page_head) > 0)
+			break;
+	}
+	spin_unlock_irqrestore(&zone->lock, flags);
+	return ret;
+}
+
 /*
  * Set PG_hwpoison flag if a given page is confirmed to be a free page.  This
  * test is performed under the zone lock to prevent a race against page
-- 
2.17.1



^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v3 12/15] mm,hwpoison: Rework soft offline for in-use pages
  2020-06-24 15:01 [PATCH v3 00/15] HWPOISON: soft offline rework nao.horiguchi
                   ` (10 preceding siblings ...)
  2020-06-24 15:01 ` [PATCH v3 11/15] mm,hwpoison: Rework soft offline for free pages nao.horiguchi
@ 2020-06-24 15:01 ` nao.horiguchi
  2020-06-24 15:01 ` [PATCH v3 13/15] mm,hwpoison: Refactor soft_offline_huge_page and __soft_offline_page nao.horiguchi
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 34+ messages in thread
From: nao.horiguchi @ 2020-06-24 15:01 UTC (permalink / raw)
  To: linux-mm
  Cc: mhocko, akpm, mike.kravetz, osalvador, tony.luck, david,
	aneesh.kumar, zeil, naoya.horiguchi, linux-kernel

From: Oscar Salvador <osalvador@suse.de>

This patch changes the way we set and handle in-use poisoned pages.
Until now, poisoned pages were released to the buddy allocator, trusting
that the checks that take place prior to hand the page would act as a
safe net and would skip that page.

This has proved to be wrong, as we got some pfn walkers out there, like
compaction, that all they care is the page to be PageBuddy and be in a
freelist.
Although this might not be the only user, having poisoned pages
in the buddy allocator seems a bad idea as we should only have
free pages that are ready and meant to be used as such.

Before explaining the taken approach, let us break down the kind
of pages we can soft offline.

- Anonymous THP (after the split, they end up being 4K pages)
- Hugetlb
- Order-0 pages (that can be either migrated or invalited)

* Normal pages (order-0 and anon-THP)

  - If they are clean and unmapped page cache pages, we invalidate
    then by means of invalidate_inode_page().
  - If they are mapped/dirty, we do the isolate-and-migrate dance.

  Either way, do not call put_page directly from those paths.
  Instead, we keep the page and send it to page_set_poison to perform the
  right handling.

  page_set_poison sets the HWPoison flag and does the last put_page.
  This call to put_page is mainly to be able to call __page_cache_release,
  since this function is not exported.

  Down the chain, we placed a check for HWPoison page in
  free_pages_prepare, that just skips any poisoned page, so those pages
  do not end up in any pcplist/freelist.

  After that, we set the refcount on the page to 1 and we increment
  the poisoned pages counter.

  We could do as we do for free pages:
  1) wait until the page hits buddy's freelists
  2) take it off
  3) flag it

  The problem is that we could race with an allocation, so by the time we
  want to take the page off the buddy, the page is already allocated, so we
  cannot soft-offline it.
  This is not fatal of course, but if it is better if we can close the race
  as does not require a lot of code.

* Hugetlb pages

  - We isolate-and-migrate them

  After the migration has been successful, we call dissolve_free_huge_page,
  and we set HWPoison on the page if we succeed.
  Hugetlb has a slightly different handling though.

  While for non-hugetlb pages we cared about closing the race with an
  allocation, doing so for hugetlb pages requires quite some additional
  code (we would need to hook in free_huge_page and some other places).
  So I decided to not make the code overly complicated and just fail
  normally if the page we allocated in the meantime.

Because of the way we handle now in-use pages, we no longer need the
put-as-isolation-migratetype dance, that was guarding for poisoned pages
to end up in pcplists.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
 include/linux/page-flags.h |  5 -----
 mm/memory-failure.c        | 44 +++++++++++++-------------------------
 mm/migrate.c               | 11 +++-------
 mm/page_alloc.c            | 31 +++------------------------
 4 files changed, 21 insertions(+), 70 deletions(-)

diff --git v5.8-rc1-mmots-2020-06-20-21-44/include/linux/page-flags.h v5.8-rc1-mmots-2020-06-20-21-44_patched/include/linux/page-flags.h
index 9fa5d4e2d69a..d1df51ed6eeb 100644
--- v5.8-rc1-mmots-2020-06-20-21-44/include/linux/page-flags.h
+++ v5.8-rc1-mmots-2020-06-20-21-44_patched/include/linux/page-flags.h
@@ -422,14 +422,9 @@ PAGEFLAG_FALSE(Uncached)
 PAGEFLAG(HWPoison, hwpoison, PF_ANY)
 TESTSCFLAG(HWPoison, hwpoison, PF_ANY)
 #define __PG_HWPOISON (1UL << PG_hwpoison)
-extern bool set_hwpoison_free_buddy_page(struct page *page);
 extern bool take_page_off_buddy(struct page *page);
 #else
 PAGEFLAG_FALSE(HWPoison)
-static inline bool set_hwpoison_free_buddy_page(struct page *page)
-{
-	return 0;
-}
 #define __PG_HWPOISON 0
 #endif
 
diff --git v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
index d79e756a97be..f744eb90c15c 100644
--- v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c
+++ v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
@@ -78,9 +78,12 @@ EXPORT_SYMBOL_GPL(hwpoison_filter_dev_minor);
 EXPORT_SYMBOL_GPL(hwpoison_filter_flags_mask);
 EXPORT_SYMBOL_GPL(hwpoison_filter_flags_value);
 
-static void page_handle_poison(struct page *page)
+static void page_handle_poison(struct page *page, bool release)
 {
+
 	SetPageHWPoison(page);
+	if (release)
+		put_page(page);
 	page_ref_inc(page);
 	num_poisoned_pages_inc();
 }
@@ -1757,19 +1760,13 @@ static int soft_offline_huge_page(struct page *page)
 			ret = -EIO;
 	} else {
 		/*
-		 * We set PG_hwpoison only when the migration source hugepage
-		 * was successfully dissolved, because otherwise hwpoisoned
-		 * hugepage remains on free hugepage list, then userspace will
-		 * find it as SIGBUS by allocation failure. That's not expected
-		 * in soft-offlining.
+		 * We set PG_hwpoison only when we were able to take the page
+		 * off the buddy.
 		 */
-		ret = dissolve_free_huge_page(page);
-		if (!ret) {
-			if (set_hwpoison_free_buddy_page(page))
-				num_poisoned_pages_inc();
-			else
-				ret = -EBUSY;
-		}
+		if (!dissolve_free_huge_page(page) && take_page_off_buddy(page))
+			page_handle_poison(page, false);
+		else
+			ret = -EBUSY;
 	}
 	return ret;
 }
@@ -1804,10 +1801,8 @@ static int __soft_offline_page(struct page *page)
 	 * would need to fix isolation locking first.
 	 */
 	if (ret == 1) {
-		put_page(page);
 		pr_info("soft_offline: %#lx: invalidated\n", pfn);
-		SetPageHWPoison(page);
-		num_poisoned_pages_inc();
+		page_handle_poison(page, true);
 		return 0;
 	}
 
@@ -1838,7 +1833,9 @@ static int __soft_offline_page(struct page *page)
 		list_add(&page->lru, &pagelist);
 		ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
 					MIGRATE_SYNC, MR_MEMORY_FAILURE);
-		if (ret) {
+		if (!ret) {
+			page_handle_poison(page, true);
+		} else {
 			if (!list_empty(&pagelist))
 				putback_movable_pages(&pagelist);
 
@@ -1857,27 +1854,16 @@ static int __soft_offline_page(struct page *page)
 static int soft_offline_in_use_page(struct page *page)
 {
 	int ret;
-	int mt;
 	struct page *hpage = compound_head(page);
 
 	if (!PageHuge(page) && PageTransHuge(hpage))
 		if (try_to_split_thp_page(page, "soft offline") < 0)
 			return -EBUSY;
 
-	/*
-	 * Setting MIGRATE_ISOLATE here ensures that the page will be linked
-	 * to free list immediately (not via pcplist) when released after
-	 * successful page migration. Otherwise we can't guarantee that the
-	 * page is really free after put_page() returns, so
-	 * set_hwpoison_free_buddy_page() highly likely fails.
-	 */
-	mt = get_pageblock_migratetype(page);
-	set_pageblock_migratetype(page, MIGRATE_ISOLATE);
 	if (PageHuge(page))
 		ret = soft_offline_huge_page(page);
 	else
 		ret = __soft_offline_page(page);
-	set_pageblock_migratetype(page, mt);
 	return ret;
 }
 
@@ -1886,7 +1872,7 @@ static int soft_offline_free_page(struct page *page)
 	int rc = -EBUSY;
 
 	if (!dissolve_free_huge_page(page) && take_page_off_buddy(page)) {
-		page_handle_poison(page);
+		page_handle_poison(page, false);
 		rc = 0;
 	}
 
diff --git v5.8-rc1-mmots-2020-06-20-21-44/mm/migrate.c v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/migrate.c
index c95912f74fe2..4381f76becee 100644
--- v5.8-rc1-mmots-2020-06-20-21-44/mm/migrate.c
+++ v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/migrate.c
@@ -1255,16 +1255,11 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
 		 */
 		if (newpage && PageTransHuge(newpage))
 			thp_pmd_migration_success(true);
-		put_page(page);
-		if (reason == MR_MEMORY_FAILURE) {
+		if (reason != MR_MEMORY_FAILURE)
 			/*
-			 * Set PG_HWPoison on just freed page
-			 * intentionally. Although it's rather weird,
-			 * it's how HWPoison flag works at the moment.
+			 * We release the page in page_handle_poison.
 			 */
-			if (set_hwpoison_free_buddy_page(page))
-				num_poisoned_pages_inc();
-		}
+			put_page(page);
 	} else {
 		if (rc != -EAGAIN) {
 			if (likely(!__PageMovable(page))) {
diff --git v5.8-rc1-mmots-2020-06-20-21-44/mm/page_alloc.c v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/page_alloc.c
index 3b145bceb477..5fbd28d63d60 100644
--- v5.8-rc1-mmots-2020-06-20-21-44/mm/page_alloc.c
+++ v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/page_alloc.c
@@ -1175,6 +1175,9 @@ static __always_inline bool free_pages_prepare(struct page *page,
 
 	VM_BUG_ON_PAGE(PageTail(page), page);
 
+	if (unlikely(PageHWPoison(page)) && !order)
+		return false;
+
 	trace_mm_page_free(page, order);
 
 	/*
@@ -8848,32 +8851,4 @@ bool take_page_off_buddy(struct page *page)
 	spin_unlock_irqrestore(&zone->lock, flags);
 	return ret;
 }
-
-/*
- * Set PG_hwpoison flag if a given page is confirmed to be a free page.  This
- * test is performed under the zone lock to prevent a race against page
- * allocation.
- */
-bool set_hwpoison_free_buddy_page(struct page *page)
-{
-	struct zone *zone = page_zone(page);
-	unsigned long pfn = page_to_pfn(page);
-	unsigned long flags;
-	unsigned int order;
-	bool hwpoisoned = false;
-
-	spin_lock_irqsave(&zone->lock, flags);
-	for (order = 0; order < MAX_ORDER; order++) {
-		struct page *page_head = page - (pfn & ((1 << order) - 1));
-
-		if (PageBuddy(page_head) && page_order(page_head) >= order) {
-			if (!TestSetPageHWPoison(page))
-				hwpoisoned = true;
-			break;
-		}
-	}
-	spin_unlock_irqrestore(&zone->lock, flags);
-
-	return hwpoisoned;
-}
 #endif
-- 
2.17.1



^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v3 13/15] mm,hwpoison: Refactor soft_offline_huge_page and __soft_offline_page
  2020-06-24 15:01 [PATCH v3 00/15] HWPOISON: soft offline rework nao.horiguchi
                   ` (11 preceding siblings ...)
  2020-06-24 15:01 ` [PATCH v3 12/15] mm,hwpoison: Rework soft offline for in-use pages nao.horiguchi
@ 2020-06-24 15:01 ` nao.horiguchi
  2020-06-26 16:53   ` Naresh Kamboju
  2020-06-24 15:01 ` [PATCH v3 14/15] mm,hwpoison: Return 0 if the page is already poisoned in soft-offline nao.horiguchi
                   ` (5 subsequent siblings)
  18 siblings, 1 reply; 34+ messages in thread
From: nao.horiguchi @ 2020-06-24 15:01 UTC (permalink / raw)
  To: linux-mm
  Cc: mhocko, akpm, mike.kravetz, osalvador, tony.luck, david,
	aneesh.kumar, zeil, naoya.horiguchi, linux-kernel

From: Oscar Salvador <osalvador@suse.de>

Merging soft_offline_huge_page and __soft_offline_page let us get rid of
quite some duplicated code, and makes the code much easier to follow.

Now, __soft_offline_page will handle both normal and hugetlb pages.

Note that move put_page() block to the beginning of page_handle_poison()
with drain_all_pages() in order to make sure that the target page is
freed and sent into free list to make take_page_off_buddy() work properly.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
ChangeLog v2 -> v3:
- use page_is_file_lru() instead of page_is_file_cache(),
- add description about put_page() and drain_all_pages().
- fix coding style warnings by checkpatch.pl
---
 mm/memory-failure.c | 185 ++++++++++++++++++++------------------------
 1 file changed, 86 insertions(+), 99 deletions(-)

diff --git v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
index f744eb90c15c..22c904f6d17a 100644
--- v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c
+++ v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
@@ -78,14 +78,36 @@ EXPORT_SYMBOL_GPL(hwpoison_filter_dev_minor);
 EXPORT_SYMBOL_GPL(hwpoison_filter_flags_mask);
 EXPORT_SYMBOL_GPL(hwpoison_filter_flags_value);
 
-static void page_handle_poison(struct page *page, bool release)
+static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release)
 {
+	if (release) {
+		put_page(page);
+		drain_all_pages(page_zone(page));
+	}
+
+	if (hugepage_or_freepage) {
+		/*
+		 * Doing this check for free pages is also fine since dissolve_free_huge_page
+		 * returns 0 for non-hugetlb pages as well.
+		 */
+		if (dissolve_free_huge_page(page) || !take_page_off_buddy(page))
+		/*
+		 * The hugetlb page can end up being enqueued back into
+		 * the freelists by means of:
+		 * unmap_and_move_huge_page
+		 *  putback_active_hugepage
+		 *   put_page->free_huge_page
+		 *    enqueue_huge_page
+		 * If this happens, we might lose the race against an allocation.
+		 */
+			return false;
+	}
 
 	SetPageHWPoison(page);
-	if (release)
-		put_page(page);
 	page_ref_inc(page);
 	num_poisoned_pages_inc();
+
+	return true;
 }
 
 static int hwpoison_filter_dev(struct page *p)
@@ -1718,63 +1740,52 @@ static int get_any_page(struct page *page, unsigned long pfn)
 	return ret;
 }
 
-static int soft_offline_huge_page(struct page *page)
+static bool isolate_page(struct page *page, struct list_head *pagelist)
 {
-	int ret;
-	unsigned long pfn = page_to_pfn(page);
-	struct page *hpage = compound_head(page);
-	LIST_HEAD(pagelist);
+	bool isolated = false;
+	bool lru = PageLRU(page);
+
+	if (PageHuge(page)) {
+		isolated = isolate_huge_page(page, pagelist);
+	} else {
+		if (lru)
+			isolated = !isolate_lru_page(page);
+		else
+			isolated = !isolate_movable_page(page, ISOLATE_UNEVICTABLE);
+
+		if (isolated)
+			list_add(&page->lru, pagelist);
 
-	/*
-	 * This double-check of PageHWPoison is to avoid the race with
-	 * memory_failure(). See also comment in __soft_offline_page().
-	 */
-	lock_page(hpage);
-	if (PageHWPoison(hpage)) {
-		unlock_page(hpage);
-		put_page(hpage);
-		pr_info("soft offline: %#lx hugepage already poisoned\n", pfn);
-		return -EBUSY;
 	}
-	unlock_page(hpage);
 
-	ret = isolate_huge_page(hpage, &pagelist);
+	if (isolated && lru)
+		inc_node_page_state(page, NR_ISOLATED_ANON +
+				    page_is_file_lru(page));
+
 	/*
-	 * get_any_page() and isolate_huge_page() takes a refcount each,
-	 * so need to drop one here.
+	 * If we succeed to isolate the page, we grabbed another refcount on
+	 * the page, so we can safely drop the one we got from get_any_pages().
+	 * If we failed to isolate the page, it means that we cannot go further
+	 * and we will return an error, so drop the reference we got from
+	 * get_any_pages() as well.
 	 */
-	put_page(hpage);
-	if (!ret) {
-		pr_info("soft offline: %#lx hugepage failed to isolate\n", pfn);
-		return -EBUSY;
-	}
-
-	ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
-				MIGRATE_SYNC, MR_MEMORY_FAILURE);
-	if (ret) {
-		pr_info("soft offline: %#lx: hugepage migration failed %d, type %lx (%pGp)\n",
-			pfn, ret, page->flags, &page->flags);
-		if (!list_empty(&pagelist))
-			putback_movable_pages(&pagelist);
-		if (ret > 0)
-			ret = -EIO;
-	} else {
-		/*
-		 * We set PG_hwpoison only when we were able to take the page
-		 * off the buddy.
-		 */
-		if (!dissolve_free_huge_page(page) && take_page_off_buddy(page))
-			page_handle_poison(page, false);
-		else
-			ret = -EBUSY;
-	}
-	return ret;
+	put_page(page);
+	return isolated;
 }
 
+/*
+ * __soft_offline_page handles hugetlb-pages and non-hugetlb pages.
+ * If the page is a non-dirty unmapped page-cache page, it simply invalidates.
+ * If the page is mapped, it migrates the contents over.
+ */
 static int __soft_offline_page(struct page *page)
 {
-	int ret;
+	int ret = 0;
 	unsigned long pfn = page_to_pfn(page);
+	struct page *hpage = compound_head(page);
+	const char *msg_page[] = {"page", "hugepage"};
+	bool huge = PageHuge(page);
+	LIST_HEAD(pagelist);
 
 	/*
 	 * Check PageHWPoison again inside page lock because PageHWPoison
@@ -1783,98 +1794,74 @@ static int __soft_offline_page(struct page *page)
 	 * so there's no race between soft_offline_page() and memory_failure().
 	 */
 	lock_page(page);
-	wait_on_page_writeback(page);
+	if (!PageHuge(page))
+		wait_on_page_writeback(page);
 	if (PageHWPoison(page)) {
 		unlock_page(page);
 		put_page(page);
 		pr_info("soft offline: %#lx page already poisoned\n", pfn);
 		return -EBUSY;
 	}
-	/*
-	 * Try to invalidate first. This should work for
-	 * non dirty unmapped page cache pages.
-	 */
-	ret = invalidate_inode_page(page);
+
+	if (!PageHuge(page))
+		/*
+		 * Try to invalidate first. This should work for
+		 * non dirty unmapped page cache pages.
+		 */
+		ret = invalidate_inode_page(page);
 	unlock_page(page);
+
 	/*
 	 * RED-PEN would be better to keep it isolated here, but we
 	 * would need to fix isolation locking first.
 	 */
-	if (ret == 1) {
+	if (ret) {
 		pr_info("soft_offline: %#lx: invalidated\n", pfn);
-		page_handle_poison(page, true);
+		page_handle_poison(page, false, true);
 		return 0;
 	}
 
-	/*
-	 * Simple invalidation didn't work.
-	 * Try to migrate to a new page instead. migrate.c
-	 * handles a large number of cases for us.
-	 */
-	if (PageLRU(page))
-		ret = isolate_lru_page(page);
-	else
-		ret = isolate_movable_page(page, ISOLATE_UNEVICTABLE);
-	/*
-	 * Drop page reference which is came from get_any_page()
-	 * successful isolate_lru_page() already took another one.
-	 */
-	put_page(page);
-	if (!ret) {
-		LIST_HEAD(pagelist);
-		/*
-		 * After isolated lru page, the PageLRU will be cleared,
-		 * so use !__PageMovable instead for LRU page's mapping
-		 * cannot have PAGE_MAPPING_MOVABLE.
-		 */
-		if (!__PageMovable(page))
-			inc_node_page_state(page, NR_ISOLATED_ANON +
-						page_is_file_lru(page));
-		list_add(&page->lru, &pagelist);
+	if (isolate_page(hpage, &pagelist)) {
 		ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
 					MIGRATE_SYNC, MR_MEMORY_FAILURE);
 		if (!ret) {
-			page_handle_poison(page, true);
+			bool release = !huge;
+
+			if (!page_handle_poison(page, true, release))
+				ret = -EBUSY;
 		} else {
 			if (!list_empty(&pagelist))
 				putback_movable_pages(&pagelist);
 
-			pr_info("soft offline: %#lx: migration failed %d, type %lx (%pGp)\n",
-				pfn, ret, page->flags, &page->flags);
+
+			pr_info("soft offline: %#lx: %s migration failed %d, type %lx (%pGp)\n",
+				pfn, msg_page[huge], ret, page->flags, &page->flags);
 			if (ret > 0)
 				ret = -EIO;
 		}
 	} else {
-		pr_info("soft offline: %#lx: isolation failed: %d, page count %d, type %lx (%pGp)\n",
-			pfn, ret, page_count(page), page->flags, &page->flags);
+		pr_info("soft offline: %#lx: %s isolation failed: %d, page count %d, type %lx (%pGp)\n",
+			pfn, msg_page[huge], ret, page_count(page), page->flags, &page->flags);
 	}
 	return ret;
 }
 
 static int soft_offline_in_use_page(struct page *page)
 {
-	int ret;
 	struct page *hpage = compound_head(page);
 
 	if (!PageHuge(page) && PageTransHuge(hpage))
 		if (try_to_split_thp_page(page, "soft offline") < 0)
 			return -EBUSY;
-
-	if (PageHuge(page))
-		ret = soft_offline_huge_page(page);
-	else
-		ret = __soft_offline_page(page);
-	return ret;
+	return __soft_offline_page(page);
 }
 
 static int soft_offline_free_page(struct page *page)
 {
-	int rc = -EBUSY;
+	int rc = 0;
 
-	if (!dissolve_free_huge_page(page) && take_page_off_buddy(page)) {
-		page_handle_poison(page, false);
-		rc = 0;
-	}
+	if (!page_handle_poison(page, true, false))
+		rc = -EBUSY;
 
 	return rc;
 }
-- 
2.17.1



^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v3 14/15] mm,hwpoison: Return 0 if the page is already poisoned in soft-offline
  2020-06-24 15:01 [PATCH v3 00/15] HWPOISON: soft offline rework nao.horiguchi
                   ` (12 preceding siblings ...)
  2020-06-24 15:01 ` [PATCH v3 13/15] mm,hwpoison: Refactor soft_offline_huge_page and __soft_offline_page nao.horiguchi
@ 2020-06-24 15:01 ` nao.horiguchi
  2020-06-24 15:01 ` [PATCH v3 15/15] mm,hwpoison: introduce MF_MSG_UNSPLIT_THP nao.horiguchi
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 34+ messages in thread
From: nao.horiguchi @ 2020-06-24 15:01 UTC (permalink / raw)
  To: linux-mm
  Cc: mhocko, akpm, mike.kravetz, osalvador, tony.luck, david,
	aneesh.kumar, zeil, naoya.horiguchi, linux-kernel

From: Oscar Salvador <osalvador@suse.de>

Currently, there is an inconsistency when calling soft-offline from
different paths on a page that is already poisoned.

1) madvise:

        madvise_inject_error skips any poisoned page and continues
        the loop.
        If that was the only page to madvise, it returns 0.

2) /sys/devices/system/memory/:

        When calling soft_offline_page_store()->soft_offline_page(),
        we return -EBUSY in case the page is already poisoned.
        This is inconsistent with a) the above example and b)
        memory_failure, where we return 0 if the page was poisoned.

Fix this by dropping the PageHWPoison() check in madvise_inject_error,
and let soft_offline_page return 0 if it finds the page already poisoned.

Please, note that this represents a user-api change, since now the
return error when calling soft_offline_page_store()->soft_offline_page()
will be different.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
 mm/madvise.c        | 3 ---
 mm/memory-failure.c | 4 ++--
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git v5.8-rc1-mmots-2020-06-20-21-44/mm/madvise.c v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/madvise.c
index 2b5080253406..b59bbd6c9982 100644
--- v5.8-rc1-mmots-2020-06-20-21-44/mm/madvise.c
+++ v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/madvise.c
@@ -902,9 +902,6 @@ static int madvise_inject_error(int behavior,
 		 */
 		put_page(page);
 
-		if (PageHWPoison(page))
-			continue;
-
 		if (behavior == MADV_SOFT_OFFLINE) {
 			pr_info("Soft offlining pfn %#lx at process virtual address %#lx\n",
 				pfn, start);
diff --git v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
index 22c904f6d17a..9ad3198a3954 100644
--- v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c
+++ v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
@@ -1800,7 +1800,7 @@ static int __soft_offline_page(struct page *page)
 		unlock_page(page);
 		put_page(page);
 		pr_info("soft offline: %#lx page already poisoned\n", pfn);
-		return -EBUSY;
+		return 0;
 	}
 
 	if (!PageHuge(page))
@@ -1901,7 +1901,7 @@ int soft_offline_page(unsigned long pfn)
 
 	if (PageHWPoison(page)) {
 		pr_info("soft offline: %#lx page already poisoned\n", pfn);
-		return -EBUSY;
+		return 0;
 	}
 
 	get_online_mems();
-- 
2.17.1



^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v3 15/15] mm,hwpoison: introduce MF_MSG_UNSPLIT_THP
  2020-06-24 15:01 [PATCH v3 00/15] HWPOISON: soft offline rework nao.horiguchi
                   ` (13 preceding siblings ...)
  2020-06-24 15:01 ` [PATCH v3 14/15] mm,hwpoison: Return 0 if the page is already poisoned in soft-offline nao.horiguchi
@ 2020-06-24 15:01 ` nao.horiguchi
  2020-06-24 19:17 ` [PATCH v3 00/15] HWPOISON: soft offline rework Andrew Morton
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 34+ messages in thread
From: nao.horiguchi @ 2020-06-24 15:01 UTC (permalink / raw)
  To: linux-mm
  Cc: mhocko, akpm, mike.kravetz, osalvador, tony.luck, david,
	aneesh.kumar, zeil, naoya.horiguchi, linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

memory_failure() is supposed to call action_result() when it handles
a memory error event, but there's one missing case. So let's add it.

I find that include/ras/ras_event.h has some other MF_MSG_* undefined,
so this patch also adds them.

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
 include/linux/mm.h      | 1 +
 include/ras/ras_event.h | 3 +++
 mm/memory-failure.c     | 5 ++++-
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git v5.8-rc1-mmots-2020-06-20-21-44/include/linux/mm.h v5.8-rc1-mmots-2020-06-20-21-44_patched/include/linux/mm.h
index d708033c50c0..10ee48f05f68 100644
--- v5.8-rc1-mmots-2020-06-20-21-44/include/linux/mm.h
+++ v5.8-rc1-mmots-2020-06-20-21-44_patched/include/linux/mm.h
@@ -3033,6 +3033,7 @@ enum mf_action_page_type {
 	MF_MSG_BUDDY,
 	MF_MSG_BUDDY_2ND,
 	MF_MSG_DAX,
+	MF_MSG_UNSPLIT_THP,
 	MF_MSG_UNKNOWN,
 };
 
diff --git v5.8-rc1-mmots-2020-06-20-21-44/include/ras/ras_event.h v5.8-rc1-mmots-2020-06-20-21-44_patched/include/ras/ras_event.h
index 36c5c5e38c1d..0bdbc0d17d2f 100644
--- v5.8-rc1-mmots-2020-06-20-21-44/include/ras/ras_event.h
+++ v5.8-rc1-mmots-2020-06-20-21-44_patched/include/ras/ras_event.h
@@ -361,6 +361,7 @@ TRACE_EVENT(aer_event,
 	EM ( MF_MSG_POISONED_HUGE, "huge page already hardware poisoned" )	\
 	EM ( MF_MSG_HUGE, "huge page" )					\
 	EM ( MF_MSG_FREE_HUGE, "free huge page" )			\
+	EM ( MF_MSG_NON_PMD_HUGE, "non-pmd-sized huge page" )		\
 	EM ( MF_MSG_UNMAP_FAILED, "unmapping failed page" )		\
 	EM ( MF_MSG_DIRTY_SWAPCACHE, "dirty swapcache page" )		\
 	EM ( MF_MSG_CLEAN_SWAPCACHE, "clean swapcache page" )		\
@@ -373,6 +374,8 @@ TRACE_EVENT(aer_event,
 	EM ( MF_MSG_TRUNCATED_LRU, "already truncated LRU page" )	\
 	EM ( MF_MSG_BUDDY, "free buddy page" )				\
 	EM ( MF_MSG_BUDDY_2ND, "free buddy page (2nd try)" )		\
+	EM ( MF_MSG_DAX, "dax page" )					\
+	EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" )			\
 	EMe ( MF_MSG_UNKNOWN, "unknown page" )
 
 /*
diff --git v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
index 9ad3198a3954..e57cfe0606f5 100644
--- v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c
+++ v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
@@ -586,6 +586,7 @@ static const char * const action_page_types[] = {
 	[MF_MSG_BUDDY]			= "free buddy page",
 	[MF_MSG_BUDDY_2ND]		= "free buddy page (2nd try)",
 	[MF_MSG_DAX]			= "dax page",
+	[MF_MSG_UNSPLIT_THP]		= "unsplit thp",
 	[MF_MSG_UNKNOWN]		= "unknown page",
 };
 
@@ -1376,8 +1377,10 @@ int memory_failure(unsigned long pfn, int flags)
 	}
 
 	if (PageTransHuge(hpage)) {
-		if (try_to_split_thp_page(p, "Memory Failure") < 0)
+		if (try_to_split_thp_page(p, "Memory Failure") < 0) {
+			action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
 			return -EBUSY;
+		}
 		VM_BUG_ON_PAGE(!page_count(p), p);
 	}
 
-- 
2.17.1



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 00/15] HWPOISON: soft offline rework
  2020-06-24 15:01 [PATCH v3 00/15] HWPOISON: soft offline rework nao.horiguchi
                   ` (14 preceding siblings ...)
  2020-06-24 15:01 ` [PATCH v3 15/15] mm,hwpoison: introduce MF_MSG_UNSPLIT_THP nao.horiguchi
@ 2020-06-24 19:17 ` Andrew Morton
  2020-06-24 22:36   ` HORIGUCHI NAOYA(堀口 直也)
  2020-06-26 13:35 ` Qian Cai
                   ` (2 subsequent siblings)
  18 siblings, 1 reply; 34+ messages in thread
From: Andrew Morton @ 2020-06-24 19:17 UTC (permalink / raw)
  To: nao.horiguchi
  Cc: linux-mm, mhocko, mike.kravetz, osalvador, tony.luck, david,
	aneesh.kumar, zeil, naoya.horiguchi, linux-kernel

On Wed, 24 Jun 2020 15:01:22 +0000 nao.horiguchi@gmail.com wrote:

> I rebased soft-offline rework patchset [1][2] onto the latest mmotm.  The
> rebasing required some non-trivial changes to adjust, but mainly that was
> straightforward.  I confirmed that the reported problem doesn't reproduce on
> compaction after soft offline.  For more precise description of the problem
> and the motivation of this patchset, please see [2].
> 
> I think that the following two patches in v2 are better to be done with
> separate work of hard-offline rework, so it's not included in this series.
> 
>   - mm,hwpoison: Take pages off the buddy when hard-offlining
>   - mm/hwpoison-inject: Rip off duplicated checks
> 
> These two are not directly related to the reported problem, so they seems
> not urgent.  And the first one breaks num_poisoned_pages counting in some
> testcases, and The second patch needs more consideration about commented point.
> 

It would be nice to have some sort of overview of the patch series in
this [0/n] email.

> [1] v1: https://lore.kernel.org/linux-mm/1541746035-13408-1-git-send-email-n-horiguchi@ah.jp.nec.com/
> [2] v2: https://lore.kernel.org/linux-mm/20191017142123.24245-1-osalvador@suse.de/

The above have such, but are they up to date?



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 00/15] HWPOISON: soft offline rework
  2020-06-24 19:17 ` [PATCH v3 00/15] HWPOISON: soft offline rework Andrew Morton
@ 2020-06-24 22:36   ` HORIGUCHI NAOYA(堀口 直也)
  2020-06-24 22:49     ` Andrew Morton
  0 siblings, 1 reply; 34+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2020-06-24 22:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: nao.horiguchi, linux-mm, mhocko, mike.kravetz, osalvador,
	tony.luck, david, aneesh.kumar, zeil, linux-kernel

On Wed, Jun 24, 2020 at 12:17:42PM -0700, Andrew Morton wrote:
> On Wed, 24 Jun 2020 15:01:22 +0000 nao.horiguchi@gmail.com wrote:
> 
> > I rebased soft-offline rework patchset [1][2] onto the latest mmotm.  The
> > rebasing required some non-trivial changes to adjust, but mainly that was
> > straightforward.  I confirmed that the reported problem doesn't reproduce on
> > compaction after soft offline.  For more precise description of the problem
> > and the motivation of this patchset, please see [2].
> > 
> > I think that the following two patches in v2 are better to be done with
> > separate work of hard-offline rework, so it's not included in this series.
> > 
> >   - mm,hwpoison: Take pages off the buddy when hard-offlining
> >   - mm/hwpoison-inject: Rip off duplicated checks
> > 
> > These two are not directly related to the reported problem, so they seems
> > not urgent.  And the first one breaks num_poisoned_pages counting in some
> > testcases, and The second patch needs more consideration about commented point.
> > 
> 
> It would be nice to have some sort of overview of the patch series in
> this [0/n] email.
> 
> > [1] v1: https://lore.kernel.org/linux-mm/1541746035-13408-1-git-send-email-n-horiguchi@ah.jp.nec.com/
> > [2] v2: https://lore.kernel.org/linux-mm/20191017142123.24245-1-osalvador@suse.de/
> 
> The above have such, but are they up to date?

The description of the problem doesn't change, but there're some new patches
and some patches are postponed, so I should've added an overview of this series:

- patch 1, 2 are cleanups.
- patch 3, 4, 5 change the precondition when calling memory_failure(). Previously
  we sometimes call it with holding refcount of the target page and somtimes call
  without holding it, and we passed a flag of whether refcount was taken out of
  memory_failure().  It was confusing and caused code more complex than needed.
- patch 6-10 are cleanups.
- patch 11 introduces new logic to remove the error page from buddy allocator,
  which is also applied to the path of soft-offling in-use pages in patch 12.
- patch 13 is basically a refactoring but I added some adjustment to make sure
  that the freed page is surely sent back to buddy instead of being kept in pcplist,
  which is based on discussion in v2.
- patch 14 fixes the inconsistency of return values between injection interfaces.
- patch 15 is a new patch to complement missing code found in code review for
  previous version.

Core change is in patch 11 and 12, and the others are kind of cleanup/refactoring.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 00/15] HWPOISON: soft offline rework
  2020-06-24 22:36   ` HORIGUCHI NAOYA(堀口 直也)
@ 2020-06-24 22:49     ` Andrew Morton
  2020-06-24 23:01       ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 1 reply; 34+ messages in thread
From: Andrew Morton @ 2020-06-24 22:49 UTC (permalink / raw)
  To: HORIGUCHI NAOYA
  Cc: nao.horiguchi, linux-mm, mhocko, mike.kravetz, osalvador,
	tony.luck, david, aneesh.kumar, zeil, linux-kernel

On Wed, 24 Jun 2020 22:36:18 +0000 HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@nec.com> wrote:

> On Wed, Jun 24, 2020 at 12:17:42PM -0700, Andrew Morton wrote:
> > On Wed, 24 Jun 2020 15:01:22 +0000 nao.horiguchi@gmail.com wrote:
> > 
> > > I rebased soft-offline rework patchset [1][2] onto the latest mmotm.  The
> > > rebasing required some non-trivial changes to adjust, but mainly that was
> > > straightforward.  I confirmed that the reported problem doesn't reproduce on
> > > compaction after soft offline.  For more precise description of the problem
> > > and the motivation of this patchset, please see [2].
> > > 
> > > I think that the following two patches in v2 are better to be done with
> > > separate work of hard-offline rework, so it's not included in this series.
> > > 
> > >   - mm,hwpoison: Take pages off the buddy when hard-offlining
> > >   - mm/hwpoison-inject: Rip off duplicated checks
> > > 
> > > These two are not directly related to the reported problem, so they seems
> > > not urgent.  And the first one breaks num_poisoned_pages counting in some
> > > testcases, and The second patch needs more consideration about commented point.
> > > 
> > 
> > It would be nice to have some sort of overview of the patch series in
> > this [0/n] email.
> > 
> > > [1] v1: https://lore.kernel.org/linux-mm/1541746035-13408-1-git-send-email-n-horiguchi@ah.jp.nec.com/
> > > [2] v2: https://lore.kernel.org/linux-mm/20191017142123.24245-1-osalvador@suse.de/
> > 
> > The above have such, but are they up to date?
> 
> The description of the problem doesn't change, but there're some new patches
> and some patches are postponed, so I should've added an overview of this series:
> 
> - patch 1, 2 are cleanups.
> - patch 3, 4, 5 change the precondition when calling memory_failure(). Previously
>   we sometimes call it with holding refcount of the target page and somtimes call
>   without holding it, and we passed a flag of whether refcount was taken out of
>   memory_failure().  It was confusing and caused code more complex than needed.
> - patch 6-10 are cleanups.
> - patch 11 introduces new logic to remove the error page from buddy allocator,
>   which is also applied to the path of soft-offling in-use pages in patch 12.
> - patch 13 is basically a refactoring but I added some adjustment to make sure
>   that the freed page is surely sent back to buddy instead of being kept in pcplist,
>   which is based on discussion in v2.
> - patch 14 fixes the inconsistency of return values between injection interfaces.
> - patch 15 is a new patch to complement missing code found in code review for
>   previous version.
> 
> Core change is in patch 11 and 12, and the others are kind of cleanup/refactoring.

And all the other words in
https://lore.kernel.org/linux-mm/1541746035-13408-1-git-send-email-n-horiguchi@ah.jp.nec.com/
are still accurate and complete?



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 00/15] HWPOISON: soft offline rework
  2020-06-24 22:49     ` Andrew Morton
@ 2020-06-24 23:01       ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 0 replies; 34+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2020-06-24 23:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: nao.horiguchi, linux-mm, mhocko, mike.kravetz, osalvador,
	tony.luck, david, aneesh.kumar, zeil, linux-kernel

On Wed, Jun 24, 2020 at 03:49:47PM -0700, Andrew Morton wrote:
> On Wed, 24 Jun 2020 22:36:18 +0000 HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@nec.com> wrote:
> 
> > On Wed, Jun 24, 2020 at 12:17:42PM -0700, Andrew Morton wrote:
> > > On Wed, 24 Jun 2020 15:01:22 +0000 nao.horiguchi@gmail.com wrote:
> > > 
> > > > I rebased soft-offline rework patchset [1][2] onto the latest mmotm.  The
> > > > rebasing required some non-trivial changes to adjust, but mainly that was
> > > > straightforward.  I confirmed that the reported problem doesn't reproduce on
> > > > compaction after soft offline.  For more precise description of the problem
> > > > and the motivation of this patchset, please see [2].
> > > > 
> > > > I think that the following two patches in v2 are better to be done with
> > > > separate work of hard-offline rework, so it's not included in this series.
> > > > 
> > > >   - mm,hwpoison: Take pages off the buddy when hard-offlining
> > > >   - mm/hwpoison-inject: Rip off duplicated checks
> > > > 
> > > > These two are not directly related to the reported problem, so they seems
> > > > not urgent.  And the first one breaks num_poisoned_pages counting in some
> > > > testcases, and The second patch needs more consideration about commented point.
> > > > 
> > > 
> > > It would be nice to have some sort of overview of the patch series in
> > > this [0/n] email.
> > > 
> > > > [1] v1: https://lore.kernel.org/linux-mm/1541746035-13408-1-git-send-email-n-horiguchi@ah.jp.nec.com/
> > > > [2] v2: https://lore.kernel.org/linux-mm/20191017142123.24245-1-osalvador@suse.de/
> > > 
> > > The above have such, but are they up to date?
> > 
> > The description of the problem doesn't change, but there're some new patches
> > and some patches are postponed, so I should've added an overview of this series:
> > 
> > - patch 1, 2 are cleanups.
> > - patch 3, 4, 5 change the precondition when calling memory_failure(). Previously
> >   we sometimes call it with holding refcount of the target page and somtimes call
> >   without holding it, and we passed a flag of whether refcount was taken out of
> >   memory_failure().  It was confusing and caused code more complex than needed.
> > - patch 6-10 are cleanups.
> > - patch 11 introduces new logic to remove the error page from buddy allocator,
> >   which is also applied to the path of soft-offling in-use pages in patch 12.
> > - patch 13 is basically a refactoring but I added some adjustment to make sure
> >   that the freed page is surely sent back to buddy instead of being kept in pcplist,
> >   which is based on discussion in v2.
> > - patch 14 fixes the inconsistency of return values between injection interfaces.
> > - patch 15 is a new patch to complement missing code found in code review for
> >   previous version.
> > 
> > Core change is in patch 11 and 12, and the others are kind of cleanup/refactoring.
> 
> And all the other words in
> https://lore.kernel.org/linux-mm/1541746035-13408-1-git-send-email-n-horiguchi@ah.jp.nec.com/
> are still accurate and complete?

Yes, they are.

- Naoya

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 00/15] HWPOISON: soft offline rework
  2020-06-24 15:01 [PATCH v3 00/15] HWPOISON: soft offline rework nao.horiguchi
                   ` (15 preceding siblings ...)
  2020-06-24 19:17 ` [PATCH v3 00/15] HWPOISON: soft offline rework Andrew Morton
@ 2020-06-26 13:35 ` Qian Cai
  2020-06-29 10:29 ` Oscar Salvador
  2020-06-30  5:08 ` Qian Cai
  18 siblings, 0 replies; 34+ messages in thread
From: Qian Cai @ 2020-06-26 13:35 UTC (permalink / raw)
  To: nao.horiguchi
  Cc: linux-mm, mhocko, akpm, mike.kravetz, osalvador, tony.luck,
	david, aneesh.kumar, zeil, naoya.horiguchi, linux-kernel

On Wed, Jun 24, 2020 at 03:01:22PM +0000, nao.horiguchi@gmail.com wrote:
> I rebased soft-offline rework patchset [1][2] onto the latest mmotm.  The
> rebasing required some non-trivial changes to adjust, but mainly that was
> straightforward.  I confirmed that the reported problem doesn't reproduce on
> compaction after soft offline.  For more precise description of the problem
> and the motivation of this patchset, please see [2].
> 
> I think that the following two patches in v2 are better to be done with
> separate work of hard-offline rework, so it's not included in this series.
> 
>   - mm,hwpoison: Take pages off the buddy when hard-offlining
>   - mm/hwpoison-inject: Rip off duplicated checks
> 
> These two are not directly related to the reported problem, so they seems
> not urgent.  And the first one breaks num_poisoned_pages counting in some
> testcases, and The second patch needs more consideration about commented point.
> 
> Any comment/suggestion/help would be appreciated.

Next-20200626 failed to compile due to this series. Reverting the whole
thing [1] will fix the issue below right away,

mm/memory-failure.c: In function ‘__soft_offline_page’:
mm/memory-failure.c:1827:3: error: implicit declaration of function ‘page_handle_poison’; did you mean ‘page_init_poison’? [-Werror=implicit-function-declaration]
   page_handle_poison(page, false, true);
   ^~~~~~~~~~~~~~~~~~
   page_init_poison

.config used,

https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config
https://raw.githubusercontent.com/cailca/linux-mm/master/arm64.config

[1] git revert --no-edit f296ba1d3a07..0bd1762119e9

> 
> [1] v1: https://lore.kernel.org/linux-mm/1541746035-13408-1-git-send-email-n-horiguchi@ah.jp.nec.com/
> [2] v2: https://lore.kernel.org/linux-mm/20191017142123.24245-1-osalvador@suse.de/
> 
> Thanks,
> Naoya Horiguchi
> ---
> Summary:
> 
> Naoya Horiguchi (7):
>       mm,hwpoison: cleanup unused PageHuge() check
>       mm, hwpoison: remove recalculating hpage
>       mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED
>       mm,hwpoison-inject: don't pin for hwpoison_filter
>       mm,hwpoison: remove MF_COUNT_INCREASED
>       mm,hwpoison: remove flag argument from soft offline functions
>       mm,hwpoison: introduce MF_MSG_UNSPLIT_THP
> 
> Oscar Salvador (8):
>       mm,madvise: Refactor madvise_inject_error
>       mm,hwpoison: Un-export get_hwpoison_page and make it static
>       mm,hwpoison: Kill put_hwpoison_page
>       mm,hwpoison: Unify THP handling for hard and soft offline
>       mm,hwpoison: Rework soft offline for free pages
>       mm,hwpoison: Rework soft offline for in-use pages
>       mm,hwpoison: Refactor soft_offline_huge_page and __soft_offline_page
>       mm,hwpoison: Return 0 if the page is already poisoned in soft-offline
> 
>  drivers/base/memory.c      |   2 +-
>  include/linux/mm.h         |  12 +-
>  include/linux/page-flags.h |   6 +-
>  include/ras/ras_event.h    |   3 +
>  mm/hwpoison-inject.c       |  18 +--
>  mm/madvise.c               |  39 +++---
>  mm/memory-failure.c        | 331 ++++++++++++++++++++-------------------------
>  mm/migrate.c               |  11 +-
>  mm/page_alloc.c            |  63 +++++++--
>  9 files changed, 233 insertions(+), 252 deletions(-)
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 13/15] mm,hwpoison: Refactor soft_offline_huge_page and __soft_offline_page
  2020-06-24 15:01 ` [PATCH v3 13/15] mm,hwpoison: Refactor soft_offline_huge_page and __soft_offline_page nao.horiguchi
@ 2020-06-26 16:53   ` Naresh Kamboju
  2020-06-28  6:54     ` Naoya Horiguchi
  0 siblings, 1 reply; 34+ messages in thread
From: Naresh Kamboju @ 2020-06-26 16:53 UTC (permalink / raw)
  To: nao.horiguchi
  Cc: linux-mm, Michal Hocko, Andrew Morton, Mike Kravetz, osalvador,
	Tony Luck, david, Aneesh Kumar K . V, zeil, naoya.horiguchi,
	open list, lkft-triage, Linux-Next Mailing List,
	Stephen Rothwell

On Wed, 24 Jun 2020 at 20:32, <nao.horiguchi@gmail.com> wrote:
>
> From: Oscar Salvador <osalvador@suse.de>
>
> Merging soft_offline_huge_page and __soft_offline_page let us get rid of
> quite some duplicated code, and makes the code much easier to follow.
>
> Now, __soft_offline_page will handle both normal and hugetlb pages.
>
> Note that move put_page() block to the beginning of page_handle_poison()
> with drain_all_pages() in order to make sure that the target page is
> freed and sent into free list to make take_page_off_buddy() work properly.
>
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> ---
> ChangeLog v2 -> v3:
> - use page_is_file_lru() instead of page_is_file_cache(),
> - add description about put_page() and drain_all_pages().
> - fix coding style warnings by checkpatch.pl
> ---
>  mm/memory-failure.c | 185 ++++++++++++++++++++------------------------
>  1 file changed, 86 insertions(+), 99 deletions(-)
>
> diff --git v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
> index f744eb90c15c..22c904f6d17a 100644
> --- v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c
> +++ v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
> @@ -78,14 +78,36 @@ EXPORT_SYMBOL_GPL(hwpoison_filter_dev_minor);
>  EXPORT_SYMBOL_GPL(hwpoison_filter_flags_mask);
>  EXPORT_SYMBOL_GPL(hwpoison_filter_flags_value);
>
> -static void page_handle_poison(struct page *page, bool release)
> +static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release)
>  {
> +       if (release) {
> +               put_page(page);
> +               drain_all_pages(page_zone(page));
> +       }
> +
> +       if (hugepage_or_freepage) {
> +               /*
> +                * Doing this check for free pages is also fine since dissolve_free_huge_page
> +                * returns 0 for non-hugetlb pages as well.
> +                */
> +               if (dissolve_free_huge_page(page) || !take_page_off_buddy(page))
> +               /*
> +                * The hugetlb page can end up being enqueued back into
> +                * the freelists by means of:
> +                * unmap_and_move_huge_page
> +                *  putback_active_hugepage
> +                *   put_page->free_huge_page
> +                *    enqueue_huge_page
> +                * If this happens, we might lose the race against an allocation.
> +                */
> +                       return false;
> +       }
>
>         SetPageHWPoison(page);
> -       if (release)
> -               put_page(page);
>         page_ref_inc(page);
>         num_poisoned_pages_inc();
> +
> +       return true;
>  }
>
>  static int hwpoison_filter_dev(struct page *p)
> @@ -1718,63 +1740,52 @@ static int get_any_page(struct page *page, unsigned long pfn)
>         return ret;
>  }
>
> -static int soft_offline_huge_page(struct page *page)
> +static bool isolate_page(struct page *page, struct list_head *pagelist)
>  {
> -       int ret;
> -       unsigned long pfn = page_to_pfn(page);
> -       struct page *hpage = compound_head(page);
> -       LIST_HEAD(pagelist);
> +       bool isolated = false;
> +       bool lru = PageLRU(page);
> +
> +       if (PageHuge(page)) {
> +               isolated = isolate_huge_page(page, pagelist);
> +       } else {
> +               if (lru)
> +                       isolated = !isolate_lru_page(page);
> +               else
> +                       isolated = !isolate_movable_page(page, ISOLATE_UNEVICTABLE);
> +
> +               if (isolated)
> +                       list_add(&page->lru, pagelist);
>
> -       /*
> -        * This double-check of PageHWPoison is to avoid the race with
> -        * memory_failure(). See also comment in __soft_offline_page().
> -        */
> -       lock_page(hpage);
> -       if (PageHWPoison(hpage)) {
> -               unlock_page(hpage);
> -               put_page(hpage);
> -               pr_info("soft offline: %#lx hugepage already poisoned\n", pfn);
> -               return -EBUSY;
>         }
> -       unlock_page(hpage);
>
> -       ret = isolate_huge_page(hpage, &pagelist);
> +       if (isolated && lru)
> +               inc_node_page_state(page, NR_ISOLATED_ANON +
> +                                   page_is_file_lru(page));
> +
>         /*
> -        * get_any_page() and isolate_huge_page() takes a refcount each,
> -        * so need to drop one here.
> +        * If we succeed to isolate the page, we grabbed another refcount on
> +        * the page, so we can safely drop the one we got from get_any_pages().
> +        * If we failed to isolate the page, it means that we cannot go further
> +        * and we will return an error, so drop the reference we got from
> +        * get_any_pages() as well.
>          */
> -       put_page(hpage);
> -       if (!ret) {
> -               pr_info("soft offline: %#lx hugepage failed to isolate\n", pfn);
> -               return -EBUSY;
> -       }
> -
> -       ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
> -                               MIGRATE_SYNC, MR_MEMORY_FAILURE);
> -       if (ret) {
> -               pr_info("soft offline: %#lx: hugepage migration failed %d, type %lx (%pGp)\n",
> -                       pfn, ret, page->flags, &page->flags);
> -               if (!list_empty(&pagelist))
> -                       putback_movable_pages(&pagelist);
> -               if (ret > 0)
> -                       ret = -EIO;
> -       } else {
> -               /*
> -                * We set PG_hwpoison only when we were able to take the page
> -                * off the buddy.
> -                */
> -               if (!dissolve_free_huge_page(page) && take_page_off_buddy(page))
> -                       page_handle_poison(page, false);
> -               else
> -                       ret = -EBUSY;
> -       }
> -       return ret;
> +       put_page(page);
> +       return isolated;
>  }
>
> +/*
> + * __soft_offline_page handles hugetlb-pages and non-hugetlb pages.
> + * If the page is a non-dirty unmapped page-cache page, it simply invalidates.
> + * If the page is mapped, it migrates the contents over.
> + */
>  static int __soft_offline_page(struct page *page)
>  {
> -       int ret;
> +       int ret = 0;
>         unsigned long pfn = page_to_pfn(page);
> +       struct page *hpage = compound_head(page);
> +       const char *msg_page[] = {"page", "hugepage"};
> +       bool huge = PageHuge(page);
> +       LIST_HEAD(pagelist);
>
>         /*
>          * Check PageHWPoison again inside page lock because PageHWPoison
> @@ -1783,98 +1794,74 @@ static int __soft_offline_page(struct page *page)
>          * so there's no race between soft_offline_page() and memory_failure().
>          */
>         lock_page(page);
> -       wait_on_page_writeback(page);
> +       if (!PageHuge(page))
> +               wait_on_page_writeback(page);
>         if (PageHWPoison(page)) {
>                 unlock_page(page);
>                 put_page(page);
>                 pr_info("soft offline: %#lx page already poisoned\n", pfn);
>                 return -EBUSY;
>         }
> -       /*
> -        * Try to invalidate first. This should work for
> -        * non dirty unmapped page cache pages.
> -        */
> -       ret = invalidate_inode_page(page);
> +
> +       if (!PageHuge(page))
> +               /*
> +                * Try to invalidate first. This should work for
> +                * non dirty unmapped page cache pages.
> +                */
> +               ret = invalidate_inode_page(page);
>         unlock_page(page);
> +
>         /*
>          * RED-PEN would be better to keep it isolated here, but we
>          * would need to fix isolation locking first.
>          */
> -       if (ret == 1) {
> +       if (ret) {
>                 pr_info("soft_offline: %#lx: invalidated\n", pfn);
> -               page_handle_poison(page, true);
> +               page_handle_poison(page, false, true);

arm64 build failed on linux-next 20200626 tag

make -sk KBUILD_BUILD_USER=TuxBuild -C/linux -j16 ARCH=arm64
CROSS_COMPILE=aarch64-linux-gnu- HOSTCC=gcc CC="sccache
aarch64-linux-gnu-gcc" O=build Image
79 #
80 ../mm/memory-failure.c: In function ‘__soft_offline_page’:
81 ../mm/memory-failure.c:1827:3: error: implicit declaration of
function ‘page_handle_poison’; did you mean ‘page_init_poison’?
[-Werror=implicit-function-declaration]
82  1827 | page_handle_poison(page, false, true);
83  | ^~~~~~~~~~~~~~~~~~
84  | page_init_poison


-- 
Linaro LKFT
https://lkft.linaro.org


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 13/15] mm,hwpoison: Refactor soft_offline_huge_page and __soft_offline_page
  2020-06-26 16:53   ` Naresh Kamboju
@ 2020-06-28  6:54     ` Naoya Horiguchi
  2020-06-29  7:22       ` Stephen Rothwell
  0 siblings, 1 reply; 34+ messages in thread
From: Naoya Horiguchi @ 2020-06-28  6:54 UTC (permalink / raw)
  To: Naresh Kamboju, cai, Andrew Morton
  Cc: linux-mm, Michal Hocko, Mike Kravetz, osalvador, Tony Luck,
	david, Aneesh Kumar K . V, zeil, naoya.horiguchi, open list,
	lkft-triage, Linux-Next Mailing List, Stephen Rothwell

On Fri, Jun 26, 2020 at 10:23:43PM +0530, Naresh Kamboju wrote:
> On Wed, 24 Jun 2020 at 20:32, <nao.horiguchi@gmail.com> wrote:
> >
> > From: Oscar Salvador <osalvador@suse.de>
> >
> > Merging soft_offline_huge_page and __soft_offline_page let us get rid of
> > quite some duplicated code, and makes the code much easier to follow.
> >
> > Now, __soft_offline_page will handle both normal and hugetlb pages.
> >
> > Note that move put_page() block to the beginning of page_handle_poison()
> > with drain_all_pages() in order to make sure that the target page is
> > freed and sent into free list to make take_page_off_buddy() work properly.
> >
> > Signed-off-by: Oscar Salvador <osalvador@suse.de>
> > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > ---
> > ChangeLog v2 -> v3:
> > - use page_is_file_lru() instead of page_is_file_cache(),
> > - add description about put_page() and drain_all_pages().
> > - fix coding style warnings by checkpatch.pl
> > ---
> >  mm/memory-failure.c | 185 ++++++++++++++++++++------------------------
> >  1 file changed, 86 insertions(+), 99 deletions(-)
> >
> > diff --git v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
> > index f744eb90c15c..22c904f6d17a 100644
> > --- v5.8-rc1-mmots-2020-06-20-21-44/mm/memory-failure.c
> > +++ v5.8-rc1-mmots-2020-06-20-21-44_patched/mm/memory-failure.c
> > @@ -78,14 +78,36 @@ EXPORT_SYMBOL_GPL(hwpoison_filter_dev_minor);
> >  EXPORT_SYMBOL_GPL(hwpoison_filter_flags_mask);
> >  EXPORT_SYMBOL_GPL(hwpoison_filter_flags_value);
> >
> > -static void page_handle_poison(struct page *page, bool release)
> > +static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release)
> >  {
> > +       if (release) {
> > +               put_page(page);
> > +               drain_all_pages(page_zone(page));
> > +       }
> > +
> > +       if (hugepage_or_freepage) {
> > +               /*
> > +                * Doing this check for free pages is also fine since dissolve_free_huge_page
> > +                * returns 0 for non-hugetlb pages as well.
> > +                */
> > +               if (dissolve_free_huge_page(page) || !take_page_off_buddy(page))
> > +               /*
> > +                * The hugetlb page can end up being enqueued back into
> > +                * the freelists by means of:
> > +                * unmap_and_move_huge_page
> > +                *  putback_active_hugepage
> > +                *   put_page->free_huge_page
> > +                *    enqueue_huge_page
> > +                * If this happens, we might lose the race against an allocation.
> > +                */
> > +                       return false;
> > +       }
> >
> >         SetPageHWPoison(page);
> > -       if (release)
> > -               put_page(page);
> >         page_ref_inc(page);
> >         num_poisoned_pages_inc();
> > +
> > +       return true;
> >  }
> >
> >  static int hwpoison_filter_dev(struct page *p)
> > @@ -1718,63 +1740,52 @@ static int get_any_page(struct page *page, unsigned long pfn)
> >         return ret;
> >  }
> >
> > -static int soft_offline_huge_page(struct page *page)
> > +static bool isolate_page(struct page *page, struct list_head *pagelist)
> >  {
> > -       int ret;
> > -       unsigned long pfn = page_to_pfn(page);
> > -       struct page *hpage = compound_head(page);
> > -       LIST_HEAD(pagelist);
> > +       bool isolated = false;
> > +       bool lru = PageLRU(page);
> > +
> > +       if (PageHuge(page)) {
> > +               isolated = isolate_huge_page(page, pagelist);
> > +       } else {
> > +               if (lru)
> > +                       isolated = !isolate_lru_page(page);
> > +               else
> > +                       isolated = !isolate_movable_page(page, ISOLATE_UNEVICTABLE);
> > +
> > +               if (isolated)
> > +                       list_add(&page->lru, pagelist);
> >
> > -       /*
> > -        * This double-check of PageHWPoison is to avoid the race with
> > -        * memory_failure(). See also comment in __soft_offline_page().
> > -        */
> > -       lock_page(hpage);
> > -       if (PageHWPoison(hpage)) {
> > -               unlock_page(hpage);
> > -               put_page(hpage);
> > -               pr_info("soft offline: %#lx hugepage already poisoned\n", pfn);
> > -               return -EBUSY;
> >         }
> > -       unlock_page(hpage);
> >
> > -       ret = isolate_huge_page(hpage, &pagelist);
> > +       if (isolated && lru)
> > +               inc_node_page_state(page, NR_ISOLATED_ANON +
> > +                                   page_is_file_lru(page));
> > +
> >         /*
> > -        * get_any_page() and isolate_huge_page() takes a refcount each,
> > -        * so need to drop one here.
> > +        * If we succeed to isolate the page, we grabbed another refcount on
> > +        * the page, so we can safely drop the one we got from get_any_pages().
> > +        * If we failed to isolate the page, it means that we cannot go further
> > +        * and we will return an error, so drop the reference we got from
> > +        * get_any_pages() as well.
> >          */
> > -       put_page(hpage);
> > -       if (!ret) {
> > -               pr_info("soft offline: %#lx hugepage failed to isolate\n", pfn);
> > -               return -EBUSY;
> > -       }
> > -
> > -       ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
> > -                               MIGRATE_SYNC, MR_MEMORY_FAILURE);
> > -       if (ret) {
> > -               pr_info("soft offline: %#lx: hugepage migration failed %d, type %lx (%pGp)\n",
> > -                       pfn, ret, page->flags, &page->flags);
> > -               if (!list_empty(&pagelist))
> > -                       putback_movable_pages(&pagelist);
> > -               if (ret > 0)
> > -                       ret = -EIO;
> > -       } else {
> > -               /*
> > -                * We set PG_hwpoison only when we were able to take the page
> > -                * off the buddy.
> > -                */
> > -               if (!dissolve_free_huge_page(page) && take_page_off_buddy(page))
> > -                       page_handle_poison(page, false);
> > -               else
> > -                       ret = -EBUSY;
> > -       }
> > -       return ret;
> > +       put_page(page);
> > +       return isolated;
> >  }
> >
> > +/*
> > + * __soft_offline_page handles hugetlb-pages and non-hugetlb pages.
> > + * If the page is a non-dirty unmapped page-cache page, it simply invalidates.
> > + * If the page is mapped, it migrates the contents over.
> > + */
> >  static int __soft_offline_page(struct page *page)
> >  {
> > -       int ret;
> > +       int ret = 0;
> >         unsigned long pfn = page_to_pfn(page);
> > +       struct page *hpage = compound_head(page);
> > +       const char *msg_page[] = {"page", "hugepage"};
> > +       bool huge = PageHuge(page);
> > +       LIST_HEAD(pagelist);
> >
> >         /*
> >          * Check PageHWPoison again inside page lock because PageHWPoison
> > @@ -1783,98 +1794,74 @@ static int __soft_offline_page(struct page *page)
> >          * so there's no race between soft_offline_page() and memory_failure().
> >          */
> >         lock_page(page);
> > -       wait_on_page_writeback(page);
> > +       if (!PageHuge(page))
> > +               wait_on_page_writeback(page);
> >         if (PageHWPoison(page)) {
> >                 unlock_page(page);
> >                 put_page(page);
> >                 pr_info("soft offline: %#lx page already poisoned\n", pfn);
> >                 return -EBUSY;
> >         }
> > -       /*
> > -        * Try to invalidate first. This should work for
> > -        * non dirty unmapped page cache pages.
> > -        */
> > -       ret = invalidate_inode_page(page);
> > +
> > +       if (!PageHuge(page))
> > +               /*
> > +                * Try to invalidate first. This should work for
> > +                * non dirty unmapped page cache pages.
> > +                */
> > +               ret = invalidate_inode_page(page);
> >         unlock_page(page);
> > +
> >         /*
> >          * RED-PEN would be better to keep it isolated here, but we
> >          * would need to fix isolation locking first.
> >          */
> > -       if (ret == 1) {
> > +       if (ret) {
> >                 pr_info("soft_offline: %#lx: invalidated\n", pfn);
> > -               page_handle_poison(page, true);
> > +               page_handle_poison(page, false, true);
> 
> arm64 build failed on linux-next 20200626 tag
> 
> make -sk KBUILD_BUILD_USER=TuxBuild -C/linux -j16 ARCH=arm64
> CROSS_COMPILE=aarch64-linux-gnu- HOSTCC=gcc CC="sccache
> aarch64-linux-gnu-gcc" O=build Image
> 79 #
> 80 ../mm/memory-failure.c: In function ‘__soft_offline_page’:
> 81 ../mm/memory-failure.c:1827:3: error: implicit declaration of
> function ‘page_handle_poison’; did you mean ‘page_init_poison’?
> [-Werror=implicit-function-declaration]
> 82  1827 | page_handle_poison(page, false, true);
> 83  | ^~~~~~~~~~~~~~~~~~
> 84  | page_init_poison

Thanks for reporting, Naresh, Qian Cai.

page_handle_poison() is now defined in #ifdef CONFIG_HWPOISON_INJECT block,
which is not correct because this function is used in non-injection code
path.  I'd like to move this function out of the block and it affects
multiple patches in this series, so maybe I have to update/resend the whole
series, but I like to wait for a few days for other reviews, so for quick
fix for linux-next I suggest the following one.

Andrew, could you append this diff on top of this series on mmotm?

Thanks,
Naoya Horiguchi
---
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index e90ddddab397..b49590bc5615 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -78,38 +78,6 @@ EXPORT_SYMBOL_GPL(hwpoison_filter_dev_minor);
 EXPORT_SYMBOL_GPL(hwpoison_filter_flags_mask);
 EXPORT_SYMBOL_GPL(hwpoison_filter_flags_value);
 
-static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release)
-{
-	if (release) {
-		put_page(page);
-		drain_all_pages(page_zone(page));
-	}
-
-	if (hugepage_or_freepage) {
-		/*
-		 * Doing this check for free pages is also fine since dissolve_free_huge_page
-		 * returns 0 for non-hugetlb pages as well.
-		 */
-		if (dissolve_free_huge_page(page) || !take_page_off_buddy(page))
-		/*
-		 * The hugetlb page can end up being enqueued back into
-		 * the freelists by means of:
-		 * unmap_and_move_huge_page
-		 *  putback_active_hugepage
-		 *   put_page->free_huge_page
-		 *    enqueue_huge_page
-		 * If this happens, we might lose the race against an allocation.
-		 */
-			return false;
-	}
-
-	SetPageHWPoison(page);
-	page_ref_inc(page);
-	num_poisoned_pages_inc();
-
-	return true;
-}
-
 static int hwpoison_filter_dev(struct page *p)
 {
 	struct address_space *mapping;
@@ -204,6 +172,38 @@ int hwpoison_filter(struct page *p)
 
 EXPORT_SYMBOL_GPL(hwpoison_filter);
 
+static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release)
+{
+	if (release) {
+		put_page(page);
+		drain_all_pages(page_zone(page));
+	}
+
+	if (hugepage_or_freepage) {
+		/*
+		 * Doing this check for free pages is also fine since dissolve_free_huge_page
+		 * returns 0 for non-hugetlb pages as well.
+		 */
+		if (dissolve_free_huge_page(page) || !take_page_off_buddy(page))
+		/*
+		 * The hugetlb page can end up being enqueued back into
+		 * the freelists by means of:
+		 * unmap_and_move_huge_page
+		 *  putback_active_hugepage
+		 *   put_page->free_huge_page
+		 *    enqueue_huge_page
+		 * If this happens, we might lose the race against an allocation.
+		 */
+			return false;
+	}
+
+	SetPageHWPoison(page);
+	page_ref_inc(page);
+	num_poisoned_pages_inc();
+
+	return true;
+}
+
 /*
  * Kill all processes that have a poisoned page mapped and then isolate
  * the page.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 13/15] mm,hwpoison: Refactor soft_offline_huge_page and __soft_offline_page
  2020-06-28  6:54     ` Naoya Horiguchi
@ 2020-06-29  7:22       ` Stephen Rothwell
  2020-06-29  7:33         ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 1 reply; 34+ messages in thread
From: Stephen Rothwell @ 2020-06-29  7:22 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Naresh Kamboju, cai, Andrew Morton, linux-mm, Michal Hocko,
	Mike Kravetz, osalvador, Tony Luck, david, Aneesh Kumar K . V,
	zeil, naoya.horiguchi, open list, lkft-triage,
	Linux-Next Mailing List


[-- Attachment #1: Type: text/plain, Size: 253 bytes --]

Hi Naoya,

On Sun, 28 Jun 2020 15:54:09 +0900 Naoya Horiguchi <nao.horiguchi@gmail.com> wrote:
>
> Andrew, could you append this diff on top of this series on mmotm?

I have added that patch to linux-next today.

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 13/15] mm,hwpoison: Refactor soft_offline_huge_page and __soft_offline_page
  2020-06-29  7:22       ` Stephen Rothwell
@ 2020-06-29  7:33         ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 0 replies; 34+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2020-06-29  7:33 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Naoya Horiguchi, Naresh Kamboju, cai, Andrew Morton, linux-mm,
	Michal Hocko, Mike Kravetz, osalvador, Tony Luck, david,
	Aneesh Kumar K . V, zeil, open list, lkft-triage,
	Linux-Next Mailing List

On Mon, Jun 29, 2020 at 05:22:40PM +1000, Stephen Rothwell wrote:
> Hi Naoya,
> 
> On Sun, 28 Jun 2020 15:54:09 +0900 Naoya Horiguchi <nao.horiguchi@gmail.com> wrote:
> >
> > Andrew, could you append this diff on top of this series on mmotm?
> 
> I have added that patch to linux-next today.

Thank you!

- Naoya

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 00/15] HWPOISON: soft offline rework
  2020-06-24 15:01 [PATCH v3 00/15] HWPOISON: soft offline rework nao.horiguchi
                   ` (16 preceding siblings ...)
  2020-06-26 13:35 ` Qian Cai
@ 2020-06-29 10:29 ` Oscar Salvador
  2020-06-30  6:50   ` HORIGUCHI NAOYA(堀口 直也)
  2020-06-30  5:08 ` Qian Cai
  18 siblings, 1 reply; 34+ messages in thread
From: Oscar Salvador @ 2020-06-29 10:29 UTC (permalink / raw)
  To: nao.horiguchi, linux-mm
  Cc: mhocko, akpm, mike.kravetz, tony.luck, david, aneesh.kumar, zeil,
	naoya.horiguchi, linux-kernel

On Wed, 2020-06-24 at 15:01 +0000, nao.horiguchi@gmail.com wrote:
> I rebased soft-offline rework patchset [1][2] onto the latest
> mmotm.  The
> rebasing required some non-trivial changes to adjust, but mainly that
> was
> straightforward.  I confirmed that the reported problem doesn't
> reproduce on
> compaction after soft offline.  For more precise description of the
> problem
> and the motivation of this patchset, please see [2].

Hi Naoya,

Thanks for dusting this off.
To be honest, I got stuck with the hard offline mode so this delayed
the resubmission, along other problems.

> I think that the following two patches in v2 are better to be done
> with
> separate work of hard-offline rework, so it's not included in this
> series.
> 
>   - mm,hwpoison: Take pages off the buddy when hard-offlining
>   - mm/hwpoison-inject: Rip off duplicated checks
> 
> These two are not directly related to the reported problem, so they
> seems
> not urgent.  And the first one breaks num_poisoned_pages counting in
> some
> testcases, and The second patch needs more consideration about
> commented point.

I fully agree.

> Any comment/suggestion/help would be appreciated.

My "new" version included a patch to make sure we give a chance to
pages that possibly are in a pcplist.
Current behavior is that if someone tries to soft-offline such a page,
we return an error because page count is 0 but page is not in the buddy
system.

Since this patchset already landed in the mm tree, I could send it as a
standalone patch on top if you agree with it.

My patch looked something like:

From: Oscar Salvador <osalvador@suse.de>
Date: Mon, 29 Jun 2020 12:25:11 +0200
Subject: [PATCH] mm,hwpoison: Drain pcplists before bailing out for
non-buddy
 zero-refcount page

A page with 0-refcount and !PageBuddy could perfectly be a pcppage.
Currently, we bail out with an error if we encounter such a page,
meaning that we do not give a chance to handle pcppages.

Fix this by draining pcplists whenever we find this kind of page
and retry the check again.
It might be that pcplists have been spilled into the buddy allocator
and so we can handle it.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/memory-failure.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index e90ddddab397..3aac3f1eeed0 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -958,7 +958,7 @@ static int page_action(struct page_state *ps,
struct page *p,
  * Return: return 0 if failed to grab the refcount, otherwise true
(some
  * non-zero value.)
  */
-static int get_hwpoison_page(struct page *page)
+static int __get_hwpoison_page(struct page *page)
 {
 	struct page *head = compound_head(page);
 
@@ -988,6 +988,28 @@ static int get_hwpoison_page(struct page *page)
 	return 0;
 }
 
+static int get_hwpoison_page(struct page *p)
+{
+	int ret;
+	bool drained = false;
+
+retry:
+	ret = __get_hwpoison_page(p);
+	if (!ret) {
+		if (!is_free_buddy_page(p) && !page_count(p) &&
!drained) {
+			/*
+			 * The page might be in a pcplist, so try to
drain
+			 * those and see if we are lucky.
+			 */
+			drain_all_pages(page_zone(p));
+			drained = true;
+			goto retry;
+		}
+	}
+
+	return ret;
+}
+
 /*
  * Do all that is necessary to remove user space mappings. Unmap
  * the pages and send SIGBUS to the processes if the data was dirty.
-- 
2.26.2

-- 
Oscar Salvador
SUSE L3


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 00/15] HWPOISON: soft offline rework
  2020-06-24 15:01 [PATCH v3 00/15] HWPOISON: soft offline rework nao.horiguchi
                   ` (17 preceding siblings ...)
  2020-06-29 10:29 ` Oscar Salvador
@ 2020-06-30  5:08 ` Qian Cai
  2020-06-30  6:35   ` Oscar Salvador
  2020-07-14 10:08   ` Oscar Salvador
  18 siblings, 2 replies; 34+ messages in thread
From: Qian Cai @ 2020-06-30  5:08 UTC (permalink / raw)
  To: nao.horiguchi
  Cc: linux-mm, mhocko, akpm, mike.kravetz, osalvador, tony.luck,
	david, aneesh.kumar, zeil, naoya.horiguchi, linux-kernel

On Wed, Jun 24, 2020 at 03:01:22PM +0000, nao.horiguchi@gmail.com wrote:
> I rebased soft-offline rework patchset [1][2] onto the latest mmotm.  The
> rebasing required some non-trivial changes to adjust, but mainly that was
> straightforward.  I confirmed that the reported problem doesn't reproduce on
> compaction after soft offline.  For more precise description of the problem
> and the motivation of this patchset, please see [2].
> 
> I think that the following two patches in v2 are better to be done with
> separate work of hard-offline rework, so it's not included in this series.
> 
>   - mm,hwpoison: Take pages off the buddy when hard-offlining
>   - mm/hwpoison-inject: Rip off duplicated checks
> 
> These two are not directly related to the reported problem, so they seems
> not urgent.  And the first one breaks num_poisoned_pages counting in some
> testcases, and The second patch needs more consideration about commented point.
> 
> Any comment/suggestion/help would be appreciated.

Even after applied the compling fix,

https://lore.kernel.org/linux-mm/20200628065409.GA546944@u2004/

madvise(MADV_SOFT_OFFLINE) will fail with EIO with hugetlb where it
would succeed without this series. Steps:

# git clone https://github.com/cailca/linux-mm
# cd linux-mm; make
# ./random 1 (Need at least two NUMA memory nodes)
 start: migrate_huge_offline
- use NUMA nodes 0,4.
- mmap and free 8388608 bytes hugepages on node 0
- mmap and free 8388608 bytes hugepages on node 4
madvise: Input/output error

(x86.config is also included there.)

[10718.158548][T13039] __get_any_page: 0x1d1600 free huge page
[10718.165684][T13039] Soft offlining pfn 0x1d1600 at process virtual address 0x7f1dd2000000
[10718.175061][T13039] Soft offlining pfn 0x154c00 at process virtual address 0x7f1dd2200000
[10718.185492][T13039] Soft offlining pfn 0x137c00 at process virtual address 0x7f1dd2000000
[10718.195209][T13039] Soft offlining pfn 0x4c7a00 at process virtual address 0x7f1dd2200000
[10718.203483][T13039] soft offline: 0x4c7a00: hugepage isolation failed: 0, page count 2, type bfffc00001000f (locked|referenced|uptodate|dirty|head)
[10718.218228][T13039] Soft offlining pfn 0x4c7a00 at process virtual address 0x7f1dd2000000
[10718.227522][T13039] Soft offlining pfn 0x2da800 at process virtual address 0x7f1dd2200000
[10718.238503][T13039] Soft offlining pfn 0x1de200 at process virtual address 0x7f1dd2000000
[10718.247822][T13039] Soft offlining pfn 0x155c00 at process virtual address 0x7f1dd2200000
[10718.259031][T13039] Soft offlining pfn 0x203600 at process virtual address 0x7f1dd2000000
[10718.268504][T13039] Soft offlining pfn 0x417600 at process virtual address 0x7f1dd2200000
[10718.278830][T13039] Soft offlining pfn 0x20a600 at process virtual address 0x7f1dd2000000
[10718.288133][T13039] Soft offlining pfn 0x1d0800 at process virtual address 0x7f1dd2200000
[10718.299198][T13039] Soft offlining pfn 0x3e5800 at process virtual address 0x7f1dd2000000
[10718.308593][T13039] Soft offlining pfn 0x21f200 at process virtual address 0x7f1dd2200000
[10718.319725][T13039] Soft offlining pfn 0x18c600 at process virtual address 0x7f1dd2000000
[10718.329301][T13039] Soft offlining pfn 0x396a00 at process virtual address 0x7f1dd2200000
[10718.378882][T13039] Soft offlining pfn 0x4d5000 at process virtual address 0x7f1dd2000000
[10718.388415][T13039] Soft offlining pfn 0x4e5000 at process virtual address 0x7f1dd2200000
[10718.398807][T13039] Soft offlining pfn 0x2f5000 at process virtual address 0x7f1dd2000000
[10718.408236][T13039] Soft offlining pfn 0x50b400 at process virtual address 0x7f1dd2200000
[10718.419781][T13039] Soft offlining pfn 0x396800 at process virtual address 0x7f1dd2000000
[10718.429677][T13039] Soft offlining pfn 0xd69c00 at process virtual address 0x7f1dd2200000
[10718.440435][T13039] Soft offlining pfn 0x21f000 at process virtual address 0x7f1dd2000000
[10718.450341][T13039] Soft offlining pfn 0x399400 at process virtual address 0x7f1dd2200000
[10718.458768][T13039] __get_any_page: 0x399400: unknown zero refcount page type bfffc000000000

The main part is,
https://github.com/cailca/linux-mm/blob/master/random.c#L372

> 
> [1] v1: https://lore.kernel.org/linux-mm/1541746035-13408-1-git-send-email-n-horiguchi@ah.jp.nec.com/
> [2] v2: https://lore.kernel.org/linux-mm/20191017142123.24245-1-osalvador@suse.de/
> 
> Thanks,
> Naoya Horiguchi
> ---
> Summary:
> 
> Naoya Horiguchi (7):
>       mm,hwpoison: cleanup unused PageHuge() check
>       mm, hwpoison: remove recalculating hpage
>       mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED
>       mm,hwpoison-inject: don't pin for hwpoison_filter
>       mm,hwpoison: remove MF_COUNT_INCREASED
>       mm,hwpoison: remove flag argument from soft offline functions
>       mm,hwpoison: introduce MF_MSG_UNSPLIT_THP
> 
> Oscar Salvador (8):
>       mm,madvise: Refactor madvise_inject_error
>       mm,hwpoison: Un-export get_hwpoison_page and make it static
>       mm,hwpoison: Kill put_hwpoison_page
>       mm,hwpoison: Unify THP handling for hard and soft offline
>       mm,hwpoison: Rework soft offline for free pages
>       mm,hwpoison: Rework soft offline for in-use pages
>       mm,hwpoison: Refactor soft_offline_huge_page and __soft_offline_page
>       mm,hwpoison: Return 0 if the page is already poisoned in soft-offline
> 
>  drivers/base/memory.c      |   2 +-
>  include/linux/mm.h         |  12 +-
>  include/linux/page-flags.h |   6 +-
>  include/ras/ras_event.h    |   3 +
>  mm/hwpoison-inject.c       |  18 +--
>  mm/madvise.c               |  39 +++---
>  mm/memory-failure.c        | 331 ++++++++++++++++++++-------------------------
>  mm/migrate.c               |  11 +-
>  mm/page_alloc.c            |  63 +++++++--
>  9 files changed, 233 insertions(+), 252 deletions(-)
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 00/15] HWPOISON: soft offline rework
  2020-06-30  5:08 ` Qian Cai
@ 2020-06-30  6:35   ` Oscar Salvador
  2020-07-01  8:22     ` Oscar Salvador
  2020-07-14 10:08   ` Oscar Salvador
  1 sibling, 1 reply; 34+ messages in thread
From: Oscar Salvador @ 2020-06-30  6:35 UTC (permalink / raw)
  To: Qian Cai, nao.horiguchi
  Cc: linux-mm, mhocko, akpm, mike.kravetz, tony.luck, david,
	aneesh.kumar, zeil, naoya.horiguchi, linux-kernel

On Tue, 2020-06-30 at 01:08 -0400, Qian Cai wrote:
> On Wed, Jun 24, 2020 at 03:01:22PM +0000, nao.horiguchi@gmail.com
> wrote:
> > I rebased soft-offline rework patchset [1][2] onto the latest
> > mmotm.  The
> > rebasing required some non-trivial changes to adjust, but mainly
> > that was
> > straightforward.  I confirmed that the reported problem doesn't
> > reproduce on
> > compaction after soft offline.  For more precise description of the
> > problem
> > and the motivation of this patchset, please see [2].
> > 
> > I think that the following two patches in v2 are better to be done
> > with
> > separate work of hard-offline rework, so it's not included in this
> > series.
> > 
> >   - mm,hwpoison: Take pages off the buddy when hard-offlining
> >   - mm/hwpoison-inject: Rip off duplicated checks
> > 
> > These two are not directly related to the reported problem, so they
> > seems
> > not urgent.  And the first one breaks num_poisoned_pages counting
> > in some
> > testcases, and The second patch needs more consideration about
> > commented point.
> > 
> > Any comment/suggestion/help would be appreciated.
> 
> Even after applied the compling fix,
> 
> https://lore.kernel.org/linux-mm/20200628065409.GA546944@u2004/
> 
> madvise(MADV_SOFT_OFFLINE) will fail with EIO with hugetlb where it
> would succeed without this series. Steps:
> 
> # git clone https://github.com/cailca/linux-mm
> # cd linux-mm; make
> # ./random 1 (Need at least two NUMA memory nodes)
>  start: migrate_huge_offline
> - use NUMA nodes 0,4.
> - mmap and free 8388608 bytes hugepages on node 0
> - mmap and free 8388608 bytes hugepages on node 4
> madvise: Input/output error

I think I know why.
It's been a while since I took a look, but I compared the posted
patchset with my newest patchset I had ready and I saw I made some
changes with regard of hugetlb pages.

I will be taking a look, although it might be better to re-post the
patchset instead of adding a fix on top since the changes are a bit
substantial.

Thanks for reporting.

-- 
Oscar Salvador
SUSE L3


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 00/15] HWPOISON: soft offline rework
  2020-06-29 10:29 ` Oscar Salvador
@ 2020-06-30  6:50   ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 0 replies; 34+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2020-06-30  6:50 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: nao.horiguchi, linux-mm, mhocko, akpm, mike.kravetz, tony.luck,
	david, aneesh.kumar, zeil, linux-kernel

On Mon, Jun 29, 2020 at 12:29:25PM +0200, Oscar Salvador wrote:
> On Wed, 2020-06-24 at 15:01 +0000, nao.horiguchi@gmail.com wrote:
> > I rebased soft-offline rework patchset [1][2] onto the latest
> > mmotm.  The
> > rebasing required some non-trivial changes to adjust, but mainly that
> > was
> > straightforward.  I confirmed that the reported problem doesn't
> > reproduce on
> > compaction after soft offline.  For more precise description of the
> > problem
> > and the motivation of this patchset, please see [2].
> 
> Hi Naoya,
> 
> Thanks for dusting this off.
> To be honest, I got stuck with the hard offline mode so this delayed
> the resubmission, along other problems.
> 
> > I think that the following two patches in v2 are better to be done
> > with
> > separate work of hard-offline rework, so it's not included in this
> > series.
> > 
> >   - mm,hwpoison: Take pages off the buddy when hard-offlining
> >   - mm/hwpoison-inject: Rip off duplicated checks
> > 
> > These two are not directly related to the reported problem, so they
> > seems
> > not urgent.  And the first one breaks num_poisoned_pages counting in
> > some
> > testcases, and The second patch needs more consideration about
> > commented point.
> 
> I fully agree.
> 
> > Any comment/suggestion/help would be appreciated.
> 
> My "new" version included a patch to make sure we give a chance to
> pages that possibly are in a pcplist.
> Current behavior is that if someone tries to soft-offline such a page,
> we return an error because page count is 0 but page is not in the buddy
> system.
> 
> Since this patchset already landed in the mm tree, I could send it as a
> standalone patch on top if you agree with it.
> 
> My patch looked something like:
> 
> From: Oscar Salvador <osalvador@suse.de>
> Date: Mon, 29 Jun 2020 12:25:11 +0200
> Subject: [PATCH] mm,hwpoison: Drain pcplists before bailing out for
> non-buddy
>  zero-refcount page
> 
> A page with 0-refcount and !PageBuddy could perfectly be a pcppage.
> Currently, we bail out with an error if we encounter such a page,
> meaning that we do not give a chance to handle pcppages.
> 
> Fix this by draining pcplists whenever we find this kind of page
> and retry the check again.
> It might be that pcplists have been spilled into the buddy allocator
> and so we can handle it.
>
> Signed-off-by: Oscar Salvador <osalvador@suse.de>

I'm fine with this patch, so this will be with the next version with
some minor change like moving function comment block together.

Actually I feel we could refactor get_hwpoison_page() with get_any_page()
to handle this kind of pcplist related issues in one place. But this could
be more important in hard-offline reworking, and we don't have to do it
in this series.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 01/15] mm,hwpoison: cleanup unused PageHuge() check
  2020-06-24 15:01 ` [PATCH v3 01/15] mm,hwpoison: cleanup unused PageHuge() check nao.horiguchi
@ 2020-06-30 23:28   ` Mike Kravetz
  0 siblings, 0 replies; 34+ messages in thread
From: Mike Kravetz @ 2020-06-30 23:28 UTC (permalink / raw)
  To: nao.horiguchi, linux-mm
  Cc: mhocko, akpm, osalvador, tony.luck, david, aneesh.kumar, zeil,
	naoya.horiguchi, linux-kernel

On 6/24/20 8:01 AM, nao.horiguchi@gmail.com wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> Drop the PageHuge check, which is dead code since memory_failure() forks
> into memory_failure_hugetlb() for hugetlb pages.
> 
> memory_failure() and memory_failure_hugetlb() shares some functions like
> hwpoison_user_mappings() and identify_page_state(), so they should properly
> handle 4kB page, thp, and hugetlb.
> 
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Signed-off-by: Oscar Salvador <osalvador@suse.de>

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
-- 
Mike Kravetz


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 02/15] mm, hwpoison: remove recalculating hpage
  2020-06-24 15:01 ` [PATCH v3 02/15] mm, hwpoison: remove recalculating hpage nao.horiguchi
@ 2020-06-30 23:45   ` Mike Kravetz
  0 siblings, 0 replies; 34+ messages in thread
From: Mike Kravetz @ 2020-06-30 23:45 UTC (permalink / raw)
  To: nao.horiguchi, linux-mm
  Cc: mhocko, akpm, osalvador, tony.luck, david, aneesh.kumar, zeil,
	naoya.horiguchi, linux-kernel

On 6/24/20 8:01 AM, nao.horiguchi@gmail.com wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> hpage is never used after try_to_split_thp_page() in memory_failure(),
> so we don't have to update hpage.  So let's not recalculate/use hpage.
> 
> Suggested-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
-- 
Mike Kravetz


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 00/15] HWPOISON: soft offline rework
  2020-06-30  6:35   ` Oscar Salvador
@ 2020-07-01  8:22     ` Oscar Salvador
  2020-07-01  9:07       ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 1 reply; 34+ messages in thread
From: Oscar Salvador @ 2020-07-01  8:22 UTC (permalink / raw)
  To: Qian Cai, nao.horiguchi
  Cc: linux-mm, mhocko, akpm, mike.kravetz, tony.luck, david,
	aneesh.kumar, zeil, naoya.horiguchi, linux-kernel

On Tue, 2020-06-30 at 08:35 +0200, Oscar Salvador wrote:
> > Even after applied the compling fix,
> > 
> > https://lore.kernel.org/linux-mm/20200628065409.GA546944@u2004/
> > 
> > madvise(MADV_SOFT_OFFLINE) will fail with EIO with hugetlb where it
> > would succeed without this series. Steps:
> > 
> > # git clone https://github.com/cailca/linux-mm
> > # cd linux-mm; make
> > # ./random 1 (Need at least two NUMA memory nodes)
> >  start: migrate_huge_offline
> > - use NUMA nodes 0,4.
> > - mmap and free 8388608 bytes hugepages on node 0
> > - mmap and free 8388608 bytes hugepages on node 4
> > madvise: Input/output error
> 
> I think I know why.
> It's been a while since I took a look, but I compared the posted
> patchset with my newest patchset I had ready and I saw I made some
> changes with regard of hugetlb pages.
> 
> I will be taking a look, although it might be better to re-post the
> patchset instead of adding a fix on top since the changes are a bit
> substantial.

Yap, I just confirmed this.

Posted version had some flaws wrt. hugetlb pages, that is why I was
working on a v3.
I am rebasing my v3 on top of the mm with the posted patchset reverted.

I expect to post it on Friday.


-- 
Oscar Salvador
SUSE L3


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 00/15] HWPOISON: soft offline rework
  2020-07-01  8:22     ` Oscar Salvador
@ 2020-07-01  9:07       ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 0 replies; 34+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2020-07-01  9:07 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Qian Cai, nao.horiguchi, linux-mm, mhocko, akpm, mike.kravetz,
	tony.luck, david, aneesh.kumar, zeil, linux-kernel

On Wed, Jul 01, 2020 at 10:22:07AM +0200, Oscar Salvador wrote:
> On Tue, 2020-06-30 at 08:35 +0200, Oscar Salvador wrote:
> > > Even after applied the compling fix,
> > > 
> > > https://lore.kernel.org/linux-mm/20200628065409.GA546944@u2004/
> > > 
> > > madvise(MADV_SOFT_OFFLINE) will fail with EIO with hugetlb where it
> > > would succeed without this series. Steps:
> > > 
> > > # git clone https://github.com/cailca/linux-mm
> > > # cd linux-mm; make
> > > # ./random 1 (Need at least two NUMA memory nodes)
> > >  start: migrate_huge_offline
> > > - use NUMA nodes 0,4.
> > > - mmap and free 8388608 bytes hugepages on node 0
> > > - mmap and free 8388608 bytes hugepages on node 4
> > > madvise: Input/output error
> > 
> > I think I know why.
> > It's been a while since I took a look, but I compared the posted
> > patchset with my newest patchset I had ready and I saw I made some
> > changes with regard of hugetlb pages.
> > 
> > I will be taking a look, although it might be better to re-post the
> > patchset instead of adding a fix on top since the changes are a bit
> > substantial.
> 
> Yap, I just confirmed this.

I've reproduced the EIO, but still not sure how this happens ...

> Posted version had some flaws wrt. hugetlb pages, that is why I was
> working on a v3.
> I am rebasing my v3 on top of the mm with the posted patchset reverted.

OK, thank you for offering to work for v4, so I'll wait for it for testing.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 00/15] HWPOISON: soft offline rework
  2020-06-30  5:08 ` Qian Cai
  2020-06-30  6:35   ` Oscar Salvador
@ 2020-07-14 10:08   ` Oscar Salvador
  1 sibling, 0 replies; 34+ messages in thread
From: Oscar Salvador @ 2020-07-14 10:08 UTC (permalink / raw)
  To: Qian Cai
  Cc: nao.horiguchi, linux-mm, mhocko, akpm, mike.kravetz, tony.luck,
	david, aneesh.kumar, zeil, naoya.horiguchi, linux-kernel

On Tue, Jun 30, 2020 at 01:08:03AM -0400, Qian Cai wrote:
> Even after applied the compling fix,
> 
> https://lore.kernel.org/linux-mm/20200628065409.GA546944@u2004/
> 
> madvise(MADV_SOFT_OFFLINE) will fail with EIO with hugetlb where it
> would succeed without this series. Steps:
> 
> # git clone https://github.com/cailca/linux-mm
> # cd linux-mm; make
> # ./random 1 (Need at least two NUMA memory nodes)
>  start: migrate_huge_offline
> - use NUMA nodes 0,4.
> - mmap and free 8388608 bytes hugepages on node 0
> - mmap and free 8388608 bytes hugepages on node 4
> madvise: Input/output error

Ok, sorry for the lateness, but I had to re-fetch the code on my brain again.

I just finished v4 of this patchset and it seems this problem is gone:

# ./random 1
- start: migrate_huge_offline
- use NUMA nodes 0,1.
- mmap and free 8388608 bytes hugepages on node 0
- mmap and free 8388608 bytes hugepages on node 1
- pass: mmap_offline_node_huge
- start: hotplug_memory
offline: Device or resource busy
offline: Device or resource busy
offline: Device or resource busy
offline: Device or resource busy
offline: Device or resource busy
offline: Device or resource busy
offline: Device or resource busy
offline: Device or resource busy
offline: Invalid argument
offline: Device or resource busy
offline: Invalid argument
offline: Device or resource busy
offline: Device or resource busy
offline: Device or resource busy
offline: Device or resource busy
- pass: hotplug_memory

The test seems to suceed and no crash on the kernel side.

I will just run some more tests to make sure the thing is solid enough
and then I will post v4.

Thanks

-- 
Oscar Salvador
SUSE L3


^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, back to index

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-24 15:01 [PATCH v3 00/15] HWPOISON: soft offline rework nao.horiguchi
2020-06-24 15:01 ` [PATCH v3 01/15] mm,hwpoison: cleanup unused PageHuge() check nao.horiguchi
2020-06-30 23:28   ` Mike Kravetz
2020-06-24 15:01 ` [PATCH v3 02/15] mm, hwpoison: remove recalculating hpage nao.horiguchi
2020-06-30 23:45   ` Mike Kravetz
2020-06-24 15:01 ` [PATCH v3 03/15] mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED nao.horiguchi
2020-06-24 15:01 ` [PATCH v3 04/15] mm,madvise: Refactor madvise_inject_error nao.horiguchi
2020-06-24 15:01 ` [PATCH v3 05/15] mm,hwpoison-inject: don't pin for hwpoison_filter nao.horiguchi
2020-06-24 15:01 ` [PATCH v3 06/15] mm,hwpoison: Un-export get_hwpoison_page and make it static nao.horiguchi
2020-06-24 15:01 ` [PATCH v3 07/15] mm,hwpoison: Kill put_hwpoison_page nao.horiguchi
2020-06-24 15:01 ` [PATCH v3 08/15] mm,hwpoison: remove MF_COUNT_INCREASED nao.horiguchi
2020-06-24 15:01 ` [PATCH v3 09/15] mm,hwpoison: remove flag argument from soft offline functions nao.horiguchi
2020-06-24 15:01 ` [PATCH v3 10/15] mm,hwpoison: Unify THP handling for hard and soft offline nao.horiguchi
2020-06-24 15:01 ` [PATCH v3 11/15] mm,hwpoison: Rework soft offline for free pages nao.horiguchi
2020-06-24 15:01 ` [PATCH v3 12/15] mm,hwpoison: Rework soft offline for in-use pages nao.horiguchi
2020-06-24 15:01 ` [PATCH v3 13/15] mm,hwpoison: Refactor soft_offline_huge_page and __soft_offline_page nao.horiguchi
2020-06-26 16:53   ` Naresh Kamboju
2020-06-28  6:54     ` Naoya Horiguchi
2020-06-29  7:22       ` Stephen Rothwell
2020-06-29  7:33         ` HORIGUCHI NAOYA(堀口 直也)
2020-06-24 15:01 ` [PATCH v3 14/15] mm,hwpoison: Return 0 if the page is already poisoned in soft-offline nao.horiguchi
2020-06-24 15:01 ` [PATCH v3 15/15] mm,hwpoison: introduce MF_MSG_UNSPLIT_THP nao.horiguchi
2020-06-24 19:17 ` [PATCH v3 00/15] HWPOISON: soft offline rework Andrew Morton
2020-06-24 22:36   ` HORIGUCHI NAOYA(堀口 直也)
2020-06-24 22:49     ` Andrew Morton
2020-06-24 23:01       ` HORIGUCHI NAOYA(堀口 直也)
2020-06-26 13:35 ` Qian Cai
2020-06-29 10:29 ` Oscar Salvador
2020-06-30  6:50   ` HORIGUCHI NAOYA(堀口 直也)
2020-06-30  5:08 ` Qian Cai
2020-06-30  6:35   ` Oscar Salvador
2020-07-01  8:22     ` Oscar Salvador
2020-07-01  9:07       ` HORIGUCHI NAOYA(堀口 直也)
2020-07-14 10:08   ` Oscar Salvador

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git