All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC v1 00/11] hwpoison improvement part 1
@ 2018-11-09  6:47 Naoya Horiguchi
  2018-11-09  6:47 ` [RFC][PATCH v1 01/11] mm: hwpoison: cleanup unused PageHuge() check Naoya Horiguchi
                   ` (10 more replies)
  0 siblings, 11 replies; 24+ messages in thread
From: Naoya Horiguchi @ 2018-11-09  6:47 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Michal Hocko, Andrew Morton, Mike Kravetz, xishi.qiuxishi,
	Laurent Dufour

Hi everyone,

I wrote hwpoison patches which partially mention the problems
discussed recently on this area [1].

Main point of this series is how we isolate faulty pages more
safely/reliable. As pointed out from Michal in thread [2], we can
have better isolation functions rather than what we currently have.
Patch 8/11 gives the implementation. As a result, the behavior of
poisoned pages (at least from soft-offline) are more predictable
and I think that memory hotremove should properly work with it.

The structure of this series:
  - patch 1-7 are small fixes, preparation, and/or cleanup.
    I can separate these out from main part if you like.
  - patch 8 is core part of this series, providing some code
    to pick out the target page from buddy allocator,
  - patch 9-11 are changes on caller sides (hard-offline,
    hotremove and unpoison.)

One big issue not addressed by this series is hard-offlining hugetlb,
which is still a todo unfortunately.

Another remaining work is to rework on the behavior of PG_hwpoison
flag from hard-offlining of in-use page. Even with this series,
hard-offline for in-use pages works as in the past (i.e. we still take
racy "set PG_hwpoison at first, then do some handling" approach.)
Without changing this, we can't be free from many "if (PageHWPoison)"
checks in mm code. So I'll think/try more about it after this one.

Anyway this is the first step for better solution (I believe,)
and any kind of help is applicated.

Thanks,
Naoya Horiguchi

[1]: https://lwn.net/Articles/753261/
[2]: https://lkml.org/lkml/2018/7/17/60
---
Summary:

Naoya Horiguchi (11):
      mm: hwpoison: cleanup unused PageHuge() check
      mm: soft-offline: add missing error check of set_hwpoison_free_buddy_page()
      mm: move definition of num_poisoned_pages_inc/dec to include/linux/mm.h
      mm: madvise: call soft_offline_page() without MF_COUNT_INCREASED
      mm: hwpoison-inject: don't pin for hwpoison_filter()
      mm: hwpoison: remove MF_COUNT_INCREASED
      mm: remove flag argument from soft offline functions
      mm: soft-offline: isolate error pages from buddy freelist
      mm: hwpoison: apply buddy page handling code to hard-offline
      mm: clear PageHWPoison in memory hotremove
      mm: hwpoison: introduce clear_hwpoison_free_buddy_page()

 drivers/base/memory.c      |   2 +-
 include/linux/mm.h         |  22 ++++++---
 include/linux/page-flags.h |   8 +++-
 include/linux/swapops.h    |  16 -------
 mm/hwpoison-inject.c       |  18 ++------
 mm/madvise.c               |  25 +++++-----
 mm/memory-failure.c        | 112 ++++++++++++++++++++++++++-------------------
 mm/migrate.c               |   9 ----
 mm/page_alloc.c            |  95 +++++++++++++++++++++++++++++++++++---
 mm/sparse.c                |   2 +-
 10 files changed, 193 insertions(+), 116 deletions(-)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC][PATCH v1 01/11] mm: hwpoison: cleanup unused PageHuge() check
  2018-11-09  6:47 [PATCH RFC v1 00/11] hwpoison improvement part 1 Naoya Horiguchi
@ 2018-11-09  6:47 ` Naoya Horiguchi
  2018-11-09  9:52   ` Anshuman Khandual
  2018-11-09  6:47 ` [RFC][PATCH v1 02/11] mm: soft-offline: add missing error check of set_hwpoison_free_buddy_page() Naoya Horiguchi
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: Naoya Horiguchi @ 2018-11-09  6:47 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Michal Hocko, Andrew Morton, Mike Kravetz, xishi.qiuxishi,
	Laurent Dufour

memory_failure() forks to memory_failure_hugetlb() for hugetlb pages,
so a PageHuge() check after the fork should not be necessary.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 mm/memory-failure.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git v4.19-mmotm-2018-10-30-16-08/mm/memory-failure.c v4.19-mmotm-2018-10-30-16-08_patched/mm/memory-failure.c
index 0cd3de3..9f09bf3 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/memory-failure.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/memory-failure.c
@@ -1357,10 +1357,7 @@ int memory_failure(unsigned long pfn, int flags)
 	 * page_remove_rmap() in try_to_unmap_one(). So to determine page status
 	 * correctly, we save a copy of the page flags at this time.
 	 */
-	if (PageHuge(p))
-		page_flags = hpage->flags;
-	else
-		page_flags = p->flags;
+	page_flags = p->flags;
 
 	/*
 	 * unpoison always clear PG_hwpoison inside page lock
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC][PATCH v1 02/11] mm: soft-offline: add missing error check of set_hwpoison_free_buddy_page()
  2018-11-09  6:47 [PATCH RFC v1 00/11] hwpoison improvement part 1 Naoya Horiguchi
  2018-11-09  6:47 ` [RFC][PATCH v1 01/11] mm: hwpoison: cleanup unused PageHuge() check Naoya Horiguchi
@ 2018-11-09  6:47 ` Naoya Horiguchi
  2018-11-09 10:20   ` Anshuman Khandual
  2018-11-09  6:47 ` [RFC][PATCH v1 03/11] mm: move definition of num_poisoned_pages_inc/dec to include/linux/mm.h Naoya Horiguchi
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: Naoya Horiguchi @ 2018-11-09  6:47 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Michal Hocko, Andrew Morton, Mike Kravetz, xishi.qiuxishi,
	Laurent Dufour

set_hwpoison_free_buddy_page() could fail, then the target page is
finally not isolated, so it's better to report -EBUSY for userspace
to know the failure and chance of retry.

And for consistency, this patch moves set_hwpoison_free_buddy_page()
in unmap_and_move() to __soft_offline_page().

Fixes: 6bc9b56433b7 ("mm: fix race on soft-offlining free huge pages")
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 mm/memory-failure.c | 15 ++++++++++++---
 mm/migrate.c        |  9 ---------
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git v4.19-mmotm-2018-10-30-16-08/mm/memory-failure.c v4.19-mmotm-2018-10-30-16-08_patched/mm/memory-failure.c
index 9f09bf3..11e283e 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/memory-failure.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/memory-failure.c
@@ -1719,14 +1719,18 @@ static int soft_offline_huge_page(struct page *page, int flags)
 		/*
 		 * We set PG_hwpoison only when the migration source hugepage
 		 * was successfully dissolved, because otherwise hwpoisoned
-		 * hugepage remains on free hugepage list, then userspace will
-		 * find it as SIGBUS by allocation failure. That's not expected
-		 * in soft-offlining.
+		 * hugepage remains on free hugepage list. The allocator ignores
+		 * such a hwpoisoned page so it's never allocated, but it could
+		 * kill a process because of no-memory rather than hwpoison.
+		 * Soft-offline never impacts the userspace, so this is
+		 * undesired.
 		 */
 		ret = dissolve_free_huge_page(page);
 		if (!ret) {
 			if (set_hwpoison_free_buddy_page(page))
 				num_poisoned_pages_inc();
+			else
+				ret = -EBUSY;
 		}
 	}
 	return ret;
@@ -1804,6 +1808,11 @@ static int __soft_offline_page(struct page *page, int flags)
 				pfn, ret, page->flags, &page->flags);
 			if (ret > 0)
 				ret = -EIO;
+		} else {
+			if (set_hwpoison_free_buddy_page(page))
+				num_poisoned_pages_inc();
+			else
+				ret = -EBUSY;
 		}
 	} else {
 		pr_info("soft offline: %#lx: isolation failed: %d, page count %d, type %lx (%pGp)\n",
diff --git v4.19-mmotm-2018-10-30-16-08/mm/migrate.c v4.19-mmotm-2018-10-30-16-08_patched/mm/migrate.c
index f7e4bfd..1742372 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/migrate.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/migrate.c
@@ -1199,15 +1199,6 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
 	 */
 	if (rc == MIGRATEPAGE_SUCCESS) {
 		put_page(page);
-		if (reason == MR_MEMORY_FAILURE) {
-			/*
-			 * Set PG_HWPoison on just freed page
-			 * intentionally. Although it's rather weird,
-			 * it's how HWPoison flag works at the moment.
-			 */
-			if (set_hwpoison_free_buddy_page(page))
-				num_poisoned_pages_inc();
-		}
 	} else {
 		if (rc != -EAGAIN) {
 			if (likely(!__PageMovable(page))) {
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC][PATCH v1 03/11] mm: move definition of num_poisoned_pages_inc/dec to include/linux/mm.h
  2018-11-09  6:47 [PATCH RFC v1 00/11] hwpoison improvement part 1 Naoya Horiguchi
  2018-11-09  6:47 ` [RFC][PATCH v1 01/11] mm: hwpoison: cleanup unused PageHuge() check Naoya Horiguchi
  2018-11-09  6:47 ` [RFC][PATCH v1 02/11] mm: soft-offline: add missing error check of set_hwpoison_free_buddy_page() Naoya Horiguchi
@ 2018-11-09  6:47 ` Naoya Horiguchi
  2018-11-09 10:28   ` Anshuman Khandual
  2018-11-09  6:47 ` [RFC][PATCH v1 04/11] mm: madvise: call soft_offline_page() without MF_COUNT_INCREASED Naoya Horiguchi
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: Naoya Horiguchi @ 2018-11-09  6:47 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Michal Hocko, Andrew Morton, Mike Kravetz, xishi.qiuxishi,
	Laurent Dufour

num_poisoned_pages_inc/dec had better be visible to some file like
mm/sparse.c and mm/page_alloc.c (for a subsequent patch). So let's
move it to include/linux/mm.h.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 include/linux/mm.h      | 13 ++++++++++++-
 include/linux/swapops.h | 16 ----------------
 mm/sparse.c             |  2 +-
 3 files changed, 13 insertions(+), 18 deletions(-)

diff --git v4.19-mmotm-2018-10-30-16-08/include/linux/mm.h v4.19-mmotm-2018-10-30-16-08_patched/include/linux/mm.h
index 59df394..22623ba 100644
--- v4.19-mmotm-2018-10-30-16-08/include/linux/mm.h
+++ v4.19-mmotm-2018-10-30-16-08_patched/include/linux/mm.h
@@ -2741,7 +2741,7 @@ extern void shake_page(struct page *p, int access);
 extern atomic_long_t num_poisoned_pages __read_mostly;
 extern int soft_offline_page(struct page *page, int flags);
 
-
+#ifdef CONFIG_MEMORY_FAILURE
 /*
  * Error handlers for various types of pages.
  */
@@ -2777,6 +2777,17 @@ enum mf_action_page_type {
 	MF_MSG_UNKNOWN,
 };
 
+static inline void num_poisoned_pages_inc(void)
+{
+	atomic_long_inc(&num_poisoned_pages);
+}
+
+static inline void num_poisoned_pages_dec(void)
+{
+	atomic_long_dec(&num_poisoned_pages);
+}
+#endif
+
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
 extern void clear_huge_page(struct page *page,
 			    unsigned long addr_hint,
diff --git v4.19-mmotm-2018-10-30-16-08/include/linux/swapops.h v4.19-mmotm-2018-10-30-16-08_patched/include/linux/swapops.h
index 4d96166..88137e9 100644
--- v4.19-mmotm-2018-10-30-16-08/include/linux/swapops.h
+++ v4.19-mmotm-2018-10-30-16-08_patched/include/linux/swapops.h
@@ -320,8 +320,6 @@ static inline int is_pmd_migration_entry(pmd_t pmd)
 
 #ifdef CONFIG_MEMORY_FAILURE
 
-extern atomic_long_t num_poisoned_pages __read_mostly;
-
 /*
  * Support for hardware poisoned pages
  */
@@ -336,16 +334,6 @@ static inline int is_hwpoison_entry(swp_entry_t entry)
 	return swp_type(entry) == SWP_HWPOISON;
 }
 
-static inline void num_poisoned_pages_inc(void)
-{
-	atomic_long_inc(&num_poisoned_pages);
-}
-
-static inline void num_poisoned_pages_dec(void)
-{
-	atomic_long_dec(&num_poisoned_pages);
-}
-
 #else
 
 static inline swp_entry_t make_hwpoison_entry(struct page *page)
@@ -357,10 +345,6 @@ static inline int is_hwpoison_entry(swp_entry_t swp)
 {
 	return 0;
 }
-
-static inline void num_poisoned_pages_inc(void)
-{
-}
 #endif
 
 #if defined(CONFIG_MEMORY_FAILURE) || defined(CONFIG_MIGRATION)
diff --git v4.19-mmotm-2018-10-30-16-08/mm/sparse.c v4.19-mmotm-2018-10-30-16-08_patched/mm/sparse.c
index 33307fc..7ada2e5 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/sparse.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/sparse.c
@@ -726,7 +726,7 @@ static void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
 
 	for (i = 0; i < nr_pages; i++) {
 		if (PageHWPoison(&memmap[i])) {
-			atomic_long_sub(1, &num_poisoned_pages);
+			num_poisoned_pages_dec();
 			ClearPageHWPoison(&memmap[i]);
 		}
 	}
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC][PATCH v1 04/11] mm: madvise: call soft_offline_page() without MF_COUNT_INCREASED
  2018-11-09  6:47 [PATCH RFC v1 00/11] hwpoison improvement part 1 Naoya Horiguchi
                   ` (2 preceding siblings ...)
  2018-11-09  6:47 ` [RFC][PATCH v1 03/11] mm: move definition of num_poisoned_pages_inc/dec to include/linux/mm.h Naoya Horiguchi
@ 2018-11-09  6:47 ` Naoya Horiguchi
  2018-11-09 10:46   ` Anshuman Khandual
  2018-11-09  6:47 ` [RFC][PATCH v1 05/11] mm: hwpoison-inject: don't pin for hwpoison_filter() Naoya Horiguchi
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: Naoya Horiguchi @ 2018-11-09  6:47 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Michal Hocko, Andrew Morton, Mike Kravetz, xishi.qiuxishi,
	Laurent Dufour

Currently madvise_inject_error() pins the target page when calling
memory error handler, but it's not good because the refcount is just
an artifact of error injector and mock nothing about hw error itself.
IOW, pinning the error page is part of error handler's task, so
let's stop doing it.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 mm/madvise.c | 25 +++++++++++--------------
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git v4.19-mmotm-2018-10-30-16-08/mm/madvise.c v4.19-mmotm-2018-10-30-16-08_patched/mm/madvise.c
index 6cb1ca9..9fa0225 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/madvise.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/madvise.c
@@ -637,6 +637,16 @@ static int madvise_inject_error(int behavior,
 		ret = get_user_pages_fast(start, 1, 0, &page);
 		if (ret != 1)
 			return ret;
+		/*
+		 * The get_user_pages_fast() is just to get the pfn of the
+		 * given address, and the refcount has nothing to do with
+		 * what we try to test, so it should be released immediately.
+		 * This is racy but it's intended because the real hardware
+		 * errors could happen at any moment and memory error handlers
+		 * must properly handle the race.
+		 */
+		put_page(page);
+
 		pfn = page_to_pfn(page);
 
 		/*
@@ -646,16 +656,11 @@ static int madvise_inject_error(int behavior,
 		 */
 		order = compound_order(compound_head(page));
 
-		if (PageHWPoison(page)) {
-			put_page(page);
-			continue;
-		}
-
 		if (behavior == MADV_SOFT_OFFLINE) {
 			pr_info("Soft offlining pfn %#lx at process virtual address %#lx\n",
 					pfn, start);
 
-			ret = soft_offline_page(page, MF_COUNT_INCREASED);
+			ret = soft_offline_page(page, 0);
 			if (ret)
 				return ret;
 			continue;
@@ -663,14 +668,6 @@ static int madvise_inject_error(int behavior,
 
 		pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n",
 				pfn, start);
-
-		/*
-		 * Drop the page reference taken by get_user_pages_fast(). In
-		 * the absence of MF_COUNT_INCREASED the memory_failure()
-		 * routine is responsible for pinning the page to prevent it
-		 * from being released back to the page allocator.
-		 */
-		put_page(page);
 		ret = memory_failure(pfn, 0);
 		if (ret)
 			return ret;
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC][PATCH v1 05/11] mm: hwpoison-inject: don't pin for hwpoison_filter()
  2018-11-09  6:47 [PATCH RFC v1 00/11] hwpoison improvement part 1 Naoya Horiguchi
                   ` (3 preceding siblings ...)
  2018-11-09  6:47 ` [RFC][PATCH v1 04/11] mm: madvise: call soft_offline_page() without MF_COUNT_INCREASED Naoya Horiguchi
@ 2018-11-09  6:47 ` Naoya Horiguchi
  2018-11-09  6:47 ` [RFC][PATCH v1 06/11] mm: hwpoison: remove MF_COUNT_INCREASED Naoya Horiguchi
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Naoya Horiguchi @ 2018-11-09  6:47 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Michal Hocko, Andrew Morton, Mike Kravetz, xishi.qiuxishi,
	Laurent Dufour

Another memory error injection interface debugfs:hwpoison/corrupt-pfn
also takes bogus refcount for hwpoison_filter(). It's justified
because this does a coarse filter, expecting that memory_failure()
redoes the check for sure.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 mm/hwpoison-inject.c | 18 +++++-------------
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git v4.19-mmotm-2018-10-30-16-08/mm/hwpoison-inject.c v4.19-mmotm-2018-10-30-16-08_patched/mm/hwpoison-inject.c
index b6ac706..766062c 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/hwpoison-inject.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/hwpoison-inject.c
@@ -25,11 +25,6 @@ static int hwpoison_inject(void *data, u64 val)
 
 	p = pfn_to_page(pfn);
 	hpage = compound_head(p);
-	/*
-	 * This implies unable to support free buddy pages.
-	 */
-	if (!get_hwpoison_page(p))
-		return 0;
 
 	if (!hwpoison_filter_enable)
 		goto inject;
@@ -39,23 +34,20 @@ static int hwpoison_inject(void *data, u64 val)
 	 * This implies unable to support non-LRU pages.
 	 */
 	if (!PageLRU(hpage) && !PageHuge(p))
-		goto put_out;
+		return 0;
 
 	/*
-	 * do a racy check with elevated page count, to make sure PG_hwpoison
-	 * will only be set for the targeted owner (or on a free page).
+	 * do a racy check to make sure PG_hwpoison will only be set for
+	 * the targeted owner (or on a free page).
 	 * memory_failure() will redo the check reliably inside page lock.
 	 */
 	err = hwpoison_filter(hpage);
 	if (err)
-		goto put_out;
+		return 0;
 
 inject:
 	pr_info("Injecting memory failure at pfn %#lx\n", pfn);
-	return memory_failure(pfn, MF_COUNT_INCREASED);
-put_out:
-	put_hwpoison_page(p);
-	return 0;
+	return memory_failure(pfn, 0);
 }
 
 static int hwpoison_unpoison(void *data, u64 val)
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC][PATCH v1 06/11] mm: hwpoison: remove MF_COUNT_INCREASED
  2018-11-09  6:47 [PATCH RFC v1 00/11] hwpoison improvement part 1 Naoya Horiguchi
                   ` (4 preceding siblings ...)
  2018-11-09  6:47 ` [RFC][PATCH v1 05/11] mm: hwpoison-inject: don't pin for hwpoison_filter() Naoya Horiguchi
@ 2018-11-09  6:47 ` Naoya Horiguchi
  2018-11-09  6:47 ` [RFC][PATCH v1 07/11] mm: remove flag argument from soft offline functions Naoya Horiguchi
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Naoya Horiguchi @ 2018-11-09  6:47 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Michal Hocko, Andrew Morton, Mike Kravetz, xishi.qiuxishi,
	Laurent Dufour

Now there's no user of MF_COUNT_INCREASED, so we can safely remove
all calling points.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 include/linux/mm.h  |  7 +++----
 mm/memory-failure.c | 16 +++-------------
 2 files changed, 6 insertions(+), 17 deletions(-)

diff --git v4.19-mmotm-2018-10-30-16-08/include/linux/mm.h v4.19-mmotm-2018-10-30-16-08_patched/include/linux/mm.h
index 22623ba..f85b450 100644
--- v4.19-mmotm-2018-10-30-16-08/include/linux/mm.h
+++ v4.19-mmotm-2018-10-30-16-08_patched/include/linux/mm.h
@@ -2725,10 +2725,9 @@ void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
 				  unsigned long nr_pages);
 
 enum mf_flags {
-	MF_COUNT_INCREASED = 1 << 0,
-	MF_ACTION_REQUIRED = 1 << 1,
-	MF_MUST_KILL = 1 << 2,
-	MF_SOFT_OFFLINE = 1 << 3,
+	MF_ACTION_REQUIRED = 1 << 0,
+	MF_MUST_KILL = 1 << 1,
+	MF_SOFT_OFFLINE = 1 << 2,
 };
 extern int memory_failure(unsigned long pfn, int flags);
 extern void memory_failure_queue(unsigned long pfn, int flags);
diff --git v4.19-mmotm-2018-10-30-16-08/mm/memory-failure.c v4.19-mmotm-2018-10-30-16-08_patched/mm/memory-failure.c
index 11e283e..ed347f8 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/memory-failure.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/memory-failure.c
@@ -1094,7 +1094,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
 
 	num_poisoned_pages_inc();
 
-	if (!(flags & MF_COUNT_INCREASED) && !get_hwpoison_page(p)) {
+	if (!get_hwpoison_page(p)) {
 		/*
 		 * Check "filter hit" and "race with other subpage."
 		 */
@@ -1290,7 +1290,7 @@ int memory_failure(unsigned long pfn, int flags)
 	 * In fact it's dangerous to directly bump up page count from 0,
 	 * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.
 	 */
-	if (!(flags & MF_COUNT_INCREASED) && !get_hwpoison_page(p)) {
+	if (!get_hwpoison_page(p)) {
 		if (is_free_buddy_page(p)) {
 			action_result(pfn, MF_MSG_BUDDY, MF_DELAYED);
 			return 0;
@@ -1331,10 +1331,7 @@ int memory_failure(unsigned long pfn, int flags)
 	shake_page(p, 0);
 	/* shake_page could have turned it free. */
 	if (!PageLRU(p) && is_free_buddy_page(p)) {
-		if (flags & MF_COUNT_INCREASED)
-			action_result(pfn, MF_MSG_BUDDY, MF_DELAYED);
-		else
-			action_result(pfn, MF_MSG_BUDDY_2ND, MF_DELAYED);
+		action_result(pfn, MF_MSG_BUDDY_2ND, MF_DELAYED);
 		return 0;
 	}
 
@@ -1622,9 +1619,6 @@ static int __get_any_page(struct page *p, unsigned long pfn, int flags)
 {
 	int ret;
 
-	if (flags & MF_COUNT_INCREASED)
-		return 1;
-
 	/*
 	 * When the target page is a free hugepage, just remove it
 	 * from free hugepage list.
@@ -1906,15 +1900,11 @@ int soft_offline_page(struct page *page, int flags)
 	if (is_zone_device_page(page)) {
 		pr_debug_ratelimited("soft_offline: %#lx page is device page\n",
 				pfn);
-		if (flags & MF_COUNT_INCREASED)
-			put_page(page);
 		return -EIO;
 	}
 
 	if (PageHWPoison(page)) {
 		pr_info("soft offline: %#lx page already poisoned\n", pfn);
-		if (flags & MF_COUNT_INCREASED)
-			put_hwpoison_page(page);
 		return -EBUSY;
 	}
 
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC][PATCH v1 07/11] mm: remove flag argument from soft offline functions
  2018-11-09  6:47 [PATCH RFC v1 00/11] hwpoison improvement part 1 Naoya Horiguchi
                   ` (5 preceding siblings ...)
  2018-11-09  6:47 ` [RFC][PATCH v1 06/11] mm: hwpoison: remove MF_COUNT_INCREASED Naoya Horiguchi
@ 2018-11-09  6:47 ` Naoya Horiguchi
  2018-11-09  6:47 ` [RFC][PATCH v1 08/11] mm: soft-offline: isolate error pages from buddy freelist Naoya Horiguchi
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Naoya Horiguchi @ 2018-11-09  6:47 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Michal Hocko, Andrew Morton, Mike Kravetz, xishi.qiuxishi,
	Laurent Dufour

The argument @flag no longer affects the behavior of soft_offline_page()
and its variants, so let's remove them.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 drivers/base/memory.c |  2 +-
 include/linux/mm.h    |  2 +-
 mm/madvise.c          |  2 +-
 mm/memory-failure.c   | 27 +++++++++++++--------------
 4 files changed, 16 insertions(+), 17 deletions(-)

diff --git v4.19-mmotm-2018-10-30-16-08/drivers/base/memory.c v4.19-mmotm-2018-10-30-16-08_patched/drivers/base/memory.c
index 0e59856..4a554a5 100644
--- v4.19-mmotm-2018-10-30-16-08/drivers/base/memory.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/drivers/base/memory.c
@@ -548,7 +548,7 @@ store_soft_offline_page(struct device *dev,
 	pfn >>= PAGE_SHIFT;
 	if (!pfn_valid(pfn))
 		return -ENXIO;
-	ret = soft_offline_page(pfn_to_page(pfn), 0);
+	ret = soft_offline_page(pfn_to_page(pfn));
 	return ret == 0 ? count : ret;
 }
 
diff --git v4.19-mmotm-2018-10-30-16-08/include/linux/mm.h v4.19-mmotm-2018-10-30-16-08_patched/include/linux/mm.h
index f85b450..6c496da 100644
--- v4.19-mmotm-2018-10-30-16-08/include/linux/mm.h
+++ v4.19-mmotm-2018-10-30-16-08_patched/include/linux/mm.h
@@ -2738,7 +2738,7 @@ extern int sysctl_memory_failure_early_kill;
 extern int sysctl_memory_failure_recovery;
 extern void shake_page(struct page *p, int access);
 extern atomic_long_t num_poisoned_pages __read_mostly;
-extern int soft_offline_page(struct page *page, int flags);
+extern int soft_offline_page(struct page *page);
 
 #ifdef CONFIG_MEMORY_FAILURE
 /*
diff --git v4.19-mmotm-2018-10-30-16-08/mm/madvise.c v4.19-mmotm-2018-10-30-16-08_patched/mm/madvise.c
index 9fa0225..86453f3 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/madvise.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/madvise.c
@@ -660,7 +660,7 @@ static int madvise_inject_error(int behavior,
 			pr_info("Soft offlining pfn %#lx at process virtual address %#lx\n",
 					pfn, start);
 
-			ret = soft_offline_page(page, 0);
+			ret = soft_offline_page(page);
 			if (ret)
 				return ret;
 			continue;
diff --git v4.19-mmotm-2018-10-30-16-08/mm/memory-failure.c v4.19-mmotm-2018-10-30-16-08_patched/mm/memory-failure.c
index ed347f8..869ff8f 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/memory-failure.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/memory-failure.c
@@ -1482,7 +1482,7 @@ static void memory_failure_work_func(struct work_struct *work)
 		if (!gotten)
 			break;
 		if (entry.flags & MF_SOFT_OFFLINE)
-			soft_offline_page(pfn_to_page(entry.pfn), entry.flags);
+			soft_offline_page(pfn_to_page(entry.pfn));
 		else
 			memory_failure(entry.pfn, entry.flags);
 	}
@@ -1615,7 +1615,7 @@ static struct page *new_page(struct page *p, unsigned long private)
  * that is not free, and 1 for any other page type.
  * For 1 the page is returned with increased page count, otherwise not.
  */
-static int __get_any_page(struct page *p, unsigned long pfn, int flags)
+static int __get_any_page(struct page *p, unsigned long pfn)
 {
 	int ret;
 
@@ -1642,9 +1642,9 @@ static int __get_any_page(struct page *p, unsigned long pfn, int flags)
 	return ret;
 }
 
-static int get_any_page(struct page *page, unsigned long pfn, int flags)
+static int get_any_page(struct page *page, unsigned long pfn)
 {
-	int ret = __get_any_page(page, pfn, flags);
+	int ret = __get_any_page(page, pfn);
 
 	if (ret == 1 && !PageHuge(page) &&
 	    !PageLRU(page) && !__PageMovable(page)) {
@@ -1657,7 +1657,7 @@ static int get_any_page(struct page *page, unsigned long pfn, int flags)
 		/*
 		 * Did it turn free?
 		 */
-		ret = __get_any_page(page, pfn, 0);
+		ret = __get_any_page(page, pfn);
 		if (ret == 1 && !PageLRU(page)) {
 			/* Drop page reference which is from __get_any_page() */
 			put_hwpoison_page(page);
@@ -1669,7 +1669,7 @@ static int get_any_page(struct page *page, unsigned long pfn, int flags)
 	return ret;
 }
 
-static int soft_offline_huge_page(struct page *page, int flags)
+static int soft_offline_huge_page(struct page *page)
 {
 	int ret;
 	unsigned long pfn = page_to_pfn(page);
@@ -1730,7 +1730,7 @@ static int soft_offline_huge_page(struct page *page, int flags)
 	return ret;
 }
 
-static int __soft_offline_page(struct page *page, int flags)
+static int __soft_offline_page(struct page *page)
 {
 	int ret;
 	unsigned long pfn = page_to_pfn(page);
@@ -1815,7 +1815,7 @@ static int __soft_offline_page(struct page *page, int flags)
 	return ret;
 }
 
-static int soft_offline_in_use_page(struct page *page, int flags)
+static int soft_offline_in_use_page(struct page *page)
 {
 	int ret;
 	int mt;
@@ -1847,9 +1847,9 @@ static int soft_offline_in_use_page(struct page *page, int flags)
 	mt = get_pageblock_migratetype(page);
 	set_pageblock_migratetype(page, MIGRATE_ISOLATE);
 	if (PageHuge(page))
-		ret = soft_offline_huge_page(page, flags);
+		ret = soft_offline_huge_page(page);
 	else
-		ret = __soft_offline_page(page, flags);
+		ret = __soft_offline_page(page);
 	set_pageblock_migratetype(page, mt);
 	return ret;
 }
@@ -1873,7 +1873,6 @@ static int soft_offline_free_page(struct page *page)
 /**
  * soft_offline_page - Soft offline a page.
  * @page: page to offline
- * @flags: flags. Same as memory_failure().
  *
  * Returns 0 on success, otherwise negated errno.
  *
@@ -1892,7 +1891,7 @@ static int soft_offline_free_page(struct page *page)
  * This is not a 100% solution for all memory, but tries to be
  * ``good enough'' for the majority of memory.
  */
-int soft_offline_page(struct page *page, int flags)
+int soft_offline_page(struct page *page)
 {
 	int ret;
 	unsigned long pfn = page_to_pfn(page);
@@ -1909,11 +1908,11 @@ int soft_offline_page(struct page *page, int flags)
 	}
 
 	get_online_mems();
-	ret = get_any_page(page, pfn, flags);
+	ret = get_any_page(page, pfn);
 	put_online_mems();
 
 	if (ret > 0)
-		ret = soft_offline_in_use_page(page, flags);
+		ret = soft_offline_in_use_page(page);
 	else if (ret == 0)
 		ret = soft_offline_free_page(page);
 
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC][PATCH v1 08/11] mm: soft-offline: isolate error pages from buddy freelist
  2018-11-09  6:47 [PATCH RFC v1 00/11] hwpoison improvement part 1 Naoya Horiguchi
                   ` (6 preceding siblings ...)
  2018-11-09  6:47 ` [RFC][PATCH v1 07/11] mm: remove flag argument from soft offline functions Naoya Horiguchi
@ 2018-11-09  6:47 ` Naoya Horiguchi
  2018-11-09  6:47 ` [RFC][PATCH v1 09/11] mm: hwpoison: apply buddy page handling code to hard-offline Naoya Horiguchi
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Naoya Horiguchi @ 2018-11-09  6:47 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Michal Hocko, Andrew Morton, Mike Kravetz, xishi.qiuxishi,
	Laurent Dufour

Soft-offline shares PG_hwpoison with hard-offline to keep track
of memory error, but recently we found that the approach can be
undesirable for soft-offline because it never expects to stop
applications unlike hard-offline.

So this patch suggests that memory error handler (not only sets
PG_hwpoison, but) isolates error pages from buddy allocator in
its context.

In previous works [1], we allow soft-offline handler to set
PG_hwpoison only after successful page migration and page freeing.
This patch, along with that, makes the isolation always done via
set_hwpoison_free_buddy_page() with zone->lock, so the behavior
should be less racy and more predictable.

Note that only considering for isolation, we don't have to set
PG_hwpoison, but my analysis shows that to make memory hotremove
properly work, we still need some flag to clearly separate memory
error from any other type of pages. So this patch doesn't change this.

[1]:
  commit 6bc9b56433b7 ("mm: fix race on soft-offlining free huge pages")
  commit d4ae9916ea29 ("mm: soft-offline: close the race against page allocation")

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 mm/memory-failure.c |  8 +++---
 mm/page_alloc.c     | 71 ++++++++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 70 insertions(+), 9 deletions(-)

diff --git v4.19-mmotm-2018-10-30-16-08/mm/memory-failure.c v4.19-mmotm-2018-10-30-16-08_patched/mm/memory-failure.c
index 869ff8f..ecafd4a 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/memory-failure.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/memory-failure.c
@@ -1762,9 +1762,11 @@ static int __soft_offline_page(struct page *page)
 	if (ret == 1) {
 		put_hwpoison_page(page);
 		pr_info("soft_offline: %#lx: invalidated\n", pfn);
-		SetPageHWPoison(page);
-		num_poisoned_pages_inc();
-		return 0;
+		if (set_hwpoison_free_buddy_page(page)) {
+			num_poisoned_pages_inc();
+			return 0;
+		} else
+			return -EBUSY;
 	}
 
 	/*
diff --git v4.19-mmotm-2018-10-30-16-08/mm/page_alloc.c v4.19-mmotm-2018-10-30-16-08_patched/mm/page_alloc.c
index ae31839..970d6ff 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/page_alloc.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/page_alloc.c
@@ -8183,10 +8183,55 @@ bool is_free_buddy_page(struct page *page)
 }
 
 #ifdef CONFIG_MEMORY_FAILURE
+
+/*
+ * Pick out a free page from buddy allocator. Unlike expand(), this
+ * function can choose the target page by @target which is not limited
+ * to the first page of some free block.
+ *
+ * This function changes zone state, so callers need to hold zone->lock.
+ */
+static inline void pickout_buddy_page(struct zone *zone, struct page *page,
+			struct page *target, int torder, int low, int high,
+			struct free_area *area, int migratetype)
+{
+	unsigned long size = 1 << high;
+	struct page *current_buddy, *next_page;
+
+	while (high > low) {
+		area--;
+		high--;
+		size >>= 1;
+
+		if (target >= &page[size]) { /* target is in higher buddy */
+			next_page = page + size;
+			current_buddy = page;
+		} else { /* target is in lower buddy */
+			next_page = page;
+			current_buddy = page + size;
+		}
+		VM_BUG_ON_PAGE(bad_range(zone, current_buddy), current_buddy);
+
+		if (set_page_guard(zone, &page[size], high, migratetype))
+			continue;
+
+		list_add(&current_buddy->lru, &area->free_list[migratetype]);
+		area->nr_free++;
+		set_page_order(current_buddy, high);
+		page = next_page;
+	}
+}
+
 /*
- * Set PG_hwpoison flag if a given page is confirmed to be a free page.  This
- * test is performed under the zone lock to prevent a race against page
- * allocation.
+ * Isolate hwpoisoned free page which actully does the following
+ *   - confirm that a given page is a free page under zone->lock,
+ *   - set PG_hwpoison flag,
+ *   - remove the page from buddy allocator, subdividing buddy page
+ *     of each order.
+ *
+ * Just setting PG_hwpoison flag is not safe enough for complete isolation
+ * because rapidly-changing memory allocator code is always with the
+ * risk of mishandling the flag and potential race.
  */
 bool set_hwpoison_free_buddy_page(struct page *page)
 {
@@ -8199,10 +8244,24 @@ bool set_hwpoison_free_buddy_page(struct page *page)
 	spin_lock_irqsave(&zone->lock, flags);
 	for (order = 0; order < MAX_ORDER; order++) {
 		struct page *page_head = page - (pfn & ((1 << order) - 1));
+		unsigned int forder = page_order(page_head);
+		struct free_area *area = &(zone->free_area[forder]);
 
-		if (PageBuddy(page_head) && page_order(page_head) >= order) {
-			if (!TestSetPageHWPoison(page))
-				hwpoisoned = true;
+		if (PageBuddy(page_head) && forder >= order) {
+			int migtype = get_pfnblock_migratetype(page_head,
+							page_to_pfn(page_head));
+			/*
+			 * TestSetPageHWPoison() will be used later when
+			 * reworking hard-offline part is finished.
+			 */
+			SetPageHWPoison(page);
+
+			list_del(&page_head->lru);
+			rmv_page_order(page_head);
+			area->nr_free--;
+			pickout_buddy_page(zone, page_head, page, 0, 0, forder,
+					area, migtype);
+			hwpoisoned = true;
 			break;
 		}
 	}
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC][PATCH v1 09/11] mm: hwpoison: apply buddy page handling code to hard-offline
  2018-11-09  6:47 [PATCH RFC v1 00/11] hwpoison improvement part 1 Naoya Horiguchi
                   ` (7 preceding siblings ...)
  2018-11-09  6:47 ` [RFC][PATCH v1 08/11] mm: soft-offline: isolate error pages from buddy freelist Naoya Horiguchi
@ 2018-11-09  6:47 ` Naoya Horiguchi
  2018-11-09  6:47 ` [RFC][PATCH v1 10/11] mm: clear PageHWPoison in memory hotremove Naoya Horiguchi
  2018-11-09  6:47 ` [RFC][PATCH v1 11/11] mm: hwpoison: introduce clear_hwpoison_free_buddy_page() Naoya Horiguchi
  10 siblings, 0 replies; 24+ messages in thread
From: Naoya Horiguchi @ 2018-11-09  6:47 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Michal Hocko, Andrew Morton, Mike Kravetz, xishi.qiuxishi,
	Laurent Dufour

Hard-offline of free buddy pages can be handled in the same manner as
soft-offline. So this patch applies the new semantics to hard-offline to
more complete isolation of offlined page. As a result, the successful
case is worth MF_RECOVERED instead of MF_DELAYED, so this patch also
changes it.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 mm/memory-failure.c | 38 ++++++++++++++++++++++++++++----------
 1 file changed, 28 insertions(+), 10 deletions(-)

diff --git v4.19-mmotm-2018-10-30-16-08/mm/memory-failure.c v4.19-mmotm-2018-10-30-16-08_patched/mm/memory-failure.c
index ecafd4a..af541141 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/memory-failure.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/memory-failure.c
@@ -772,6 +772,16 @@ static int me_swapcache_clean(struct page *p, unsigned long pfn)
 		return MF_FAILED;
 }
 
+static int me_huge_free_page(struct page *p)
+{
+	int rc = dissolve_free_huge_page(p);
+
+	if (!rc && set_hwpoison_free_buddy_page(p))
+		return MF_RECOVERED;
+	else
+		return MF_FAILED;
+}
+
 /*
  * Huge pages. Needs work.
  * Issues:
@@ -799,8 +809,7 @@ static int me_huge_page(struct page *p, unsigned long pfn)
 		 */
 		if (PageAnon(hpage))
 			put_page(hpage);
-		dissolve_free_huge_page(p);
-		res = MF_RECOVERED;
+		res = me_huge_free_page(p);
 		lock_page(hpage);
 	}
 
@@ -1108,8 +1117,11 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
 			}
 		}
 		unlock_page(head);
-		dissolve_free_huge_page(p);
-		action_result(pfn, MF_MSG_FREE_HUGE, MF_DELAYED);
+
+		res = me_huge_free_page(p);
+		if (res == MF_FAILED)
+			num_poisoned_pages_dec();
+		action_result(pfn, MF_MSG_FREE_HUGE, res);
 		return 0;
 	}
 
@@ -1270,6 +1282,13 @@ int memory_failure(unsigned long pfn, int flags)
 	p = pfn_to_page(pfn);
 	if (PageHuge(p))
 		return memory_failure_hugetlb(pfn, flags);
+
+	if (set_hwpoison_free_buddy_page(p)) {
+		action_result(pfn, MF_MSG_BUDDY, MF_RECOVERED);
+		num_poisoned_pages_inc();
+		return 0;
+	}
+
 	if (TestSetPageHWPoison(p)) {
 		pr_err("Memory failure: %#lx: already hardware poisoned\n",
 			pfn);
@@ -1281,8 +1300,7 @@ int memory_failure(unsigned long pfn, int flags)
 
 	/*
 	 * We need/can do nothing about count=0 pages.
-	 * 1) it's a free page, and therefore in safe hand:
-	 *    prep_new_page() will be the gate keeper.
+	 * 1) it's a free page, and removed from buddy allocator.
 	 * 2) it's part of a non-compound high order page.
 	 *    Implies some kernel user: cannot stop them from
 	 *    R/W the page; let's pray that the page has been
@@ -1291,8 +1309,8 @@ int memory_failure(unsigned long pfn, int flags)
 	 * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.
 	 */
 	if (!get_hwpoison_page(p)) {
-		if (is_free_buddy_page(p)) {
-			action_result(pfn, MF_MSG_BUDDY, MF_DELAYED);
+		if (set_hwpoison_free_buddy_page(p)) {
+			action_result(pfn, MF_MSG_BUDDY, MF_RECOVERED);
 			return 0;
 		} else {
 			action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
@@ -1330,8 +1348,8 @@ int memory_failure(unsigned long pfn, int flags)
 	 */
 	shake_page(p, 0);
 	/* shake_page could have turned it free. */
-	if (!PageLRU(p) && is_free_buddy_page(p)) {
-		action_result(pfn, MF_MSG_BUDDY_2ND, MF_DELAYED);
+	if (!PageLRU(p) && set_hwpoison_free_buddy_page(p)) {
+		action_result(pfn, MF_MSG_BUDDY_2ND, MF_RECOVERED);
 		return 0;
 	}
 
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC][PATCH v1 10/11] mm: clear PageHWPoison in memory hotremove
  2018-11-09  6:47 [PATCH RFC v1 00/11] hwpoison improvement part 1 Naoya Horiguchi
                   ` (8 preceding siblings ...)
  2018-11-09  6:47 ` [RFC][PATCH v1 09/11] mm: hwpoison: apply buddy page handling code to hard-offline Naoya Horiguchi
@ 2018-11-09  6:47 ` Naoya Horiguchi
  2018-11-13  1:32   ` Naoya Horiguchi
  2018-11-09  6:47 ` [RFC][PATCH v1 11/11] mm: hwpoison: introduce clear_hwpoison_free_buddy_page() Naoya Horiguchi
  10 siblings, 1 reply; 24+ messages in thread
From: Naoya Horiguchi @ 2018-11-09  6:47 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Michal Hocko, Andrew Morton, Mike Kravetz, xishi.qiuxishi,
	Laurent Dufour

One hopeful usecase of memory hotplug is to replace half-broken DIMMs
with new ones, so it makes sense to clear hwpoison info at the time of
memory hotremove.

I hope that this patch covers the topic discussed in
https://lkml.org/lkml/2018/1/17/1228

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 mm/page_alloc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git v4.19-mmotm-2018-10-30-16-08/mm/page_alloc.c v4.19-mmotm-2018-10-30-16-08_patched/mm/page_alloc.c
index 970d6ff..27826b3 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/page_alloc.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/page_alloc.c
@@ -8139,8 +8139,9 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
 		 * The HWPoisoned page may be not in buddy system, and
 		 * page_count() is not 0.
 		 */
-		if (unlikely(!PageBuddy(page) && PageHWPoison(page))) {
+		if (unlikely(!PageBuddy(page) && TestClearPageHWPoison(page))) {
 			pfn++;
+			num_poisoned_pages_dec();
 			SetPageReserved(page);
 			continue;
 		}
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC][PATCH v1 11/11] mm: hwpoison: introduce clear_hwpoison_free_buddy_page()
  2018-11-09  6:47 [PATCH RFC v1 00/11] hwpoison improvement part 1 Naoya Horiguchi
                   ` (9 preceding siblings ...)
  2018-11-09  6:47 ` [RFC][PATCH v1 10/11] mm: clear PageHWPoison in memory hotremove Naoya Horiguchi
@ 2018-11-09  6:47 ` Naoya Horiguchi
  2018-11-09 11:33   ` Anshuman Khandual
  10 siblings, 1 reply; 24+ messages in thread
From: Naoya Horiguchi @ 2018-11-09  6:47 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Michal Hocko, Andrew Morton, Mike Kravetz, xishi.qiuxishi,
	Laurent Dufour

The new function is a reverse operation of set_hwpoison_free_buddy_page()
to adjust unpoison_memory() to the new semantics.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 include/linux/page-flags.h |  8 +++++++-
 mm/memory-failure.c        |  5 +++--
 mm/page_alloc.c            | 21 +++++++++++++++++++++
 3 files changed, 31 insertions(+), 3 deletions(-)

diff --git v4.19-mmotm-2018-10-30-16-08/include/linux/page-flags.h v4.19-mmotm-2018-10-30-16-08_patched/include/linux/page-flags.h
index 50ce1bd..ab0bde0 100644
--- v4.19-mmotm-2018-10-30-16-08/include/linux/page-flags.h
+++ v4.19-mmotm-2018-10-30-16-08_patched/include/linux/page-flags.h
@@ -382,11 +382,17 @@ PAGEFLAG(HWPoison, hwpoison, PF_ANY)
 TESTSCFLAG(HWPoison, hwpoison, PF_ANY)
 #define __PG_HWPOISON (1UL << PG_hwpoison)
 extern bool set_hwpoison_free_buddy_page(struct page *page);
+extern bool clear_hwpoison_free_buddy_page(struct page *page);
 #else
 PAGEFLAG_FALSE(HWPoison)
 static inline bool set_hwpoison_free_buddy_page(struct page *page)
 {
-	return 0;
+	return false;
+}
+
+static inline bool clear_hwpoison_free_buddy_page(struct page *page)
+{
+	return false;
 }
 #define __PG_HWPOISON 0
 #endif
diff --git v4.19-mmotm-2018-10-30-16-08/mm/memory-failure.c v4.19-mmotm-2018-10-30-16-08_patched/mm/memory-failure.c
index af541141..a0e1cd4 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/memory-failure.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/memory-failure.c
@@ -1590,8 +1590,9 @@ int unpoison_memory(unsigned long pfn)
 	}
 
 	if (!get_hwpoison_page(p)) {
-		if (TestClearPageHWPoison(p))
-			num_poisoned_pages_dec();
+		if (!clear_hwpoison_free_buddy_page(p))
+			return 0;
+		num_poisoned_pages_dec();
 		unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n",
 				 pfn, &unpoison_rs);
 		return 0;
diff --git v4.19-mmotm-2018-10-30-16-08/mm/page_alloc.c v4.19-mmotm-2018-10-30-16-08_patched/mm/page_alloc.c
index 27826b3..9a90f93 100644
--- v4.19-mmotm-2018-10-30-16-08/mm/page_alloc.c
+++ v4.19-mmotm-2018-10-30-16-08_patched/mm/page_alloc.c
@@ -8270,4 +8270,25 @@ bool set_hwpoison_free_buddy_page(struct page *page)
 
 	return hwpoisoned;
 }
+
+/*
+ * Reverse operation of set_hwpoison_free_buddy_page(), which is expected
+ * to work only on error pages isolated from buddy allocator.
+ */
+bool clear_hwpoison_free_buddy_page(struct page *page)
+{
+	struct zone *zone = page_zone(page);
+	bool unpoisoned = false;
+
+	spin_lock(&zone->lock);
+	if (TestClearPageHWPoison(page)) {
+		unsigned long pfn = page_to_pfn(page);
+		int migratetype = get_pfnblock_migratetype(page, pfn);
+
+		__free_one_page(page, pfn, zone, 0, migratetype);
+		unpoisoned = true;
+	}
+	spin_unlock(&zone->lock);
+	return unpoisoned;
+}
 #endif
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [RFC][PATCH v1 01/11] mm: hwpoison: cleanup unused PageHuge() check
  2018-11-09  6:47 ` [RFC][PATCH v1 01/11] mm: hwpoison: cleanup unused PageHuge() check Naoya Horiguchi
@ 2018-11-09  9:52   ` Anshuman Khandual
  0 siblings, 0 replies; 24+ messages in thread
From: Anshuman Khandual @ 2018-11-09  9:52 UTC (permalink / raw)
  To: Naoya Horiguchi, linux-mm, linux-kernel
  Cc: Michal Hocko, Andrew Morton, Mike Kravetz, xishi.qiuxishi,
	Laurent Dufour



On 11/09/2018 12:17 PM, Naoya Horiguchi wrote:
> memory_failure() forks to memory_failure_hugetlb() for hugetlb pages,
> so a PageHuge() check after the fork should not be necessary.
> 
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

Pretty straightforward.

Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC][PATCH v1 02/11] mm: soft-offline: add missing error check of set_hwpoison_free_buddy_page()
  2018-11-09  6:47 ` [RFC][PATCH v1 02/11] mm: soft-offline: add missing error check of set_hwpoison_free_buddy_page() Naoya Horiguchi
@ 2018-11-09 10:20   ` Anshuman Khandual
  2018-11-13  0:16     ` Naoya Horiguchi
  0 siblings, 1 reply; 24+ messages in thread
From: Anshuman Khandual @ 2018-11-09 10:20 UTC (permalink / raw)
  To: Naoya Horiguchi, linux-mm, linux-kernel
  Cc: Michal Hocko, Andrew Morton, Mike Kravetz, xishi.qiuxishi,
	Laurent Dufour



On 11/09/2018 12:17 PM, Naoya Horiguchi wrote:
> set_hwpoison_free_buddy_page() could fail, then the target page is
> finally not isolated, so it's better to report -EBUSY for userspace
> to know the failure and chance of retry.
> 

IIUC set_hwpoison_free_buddy_page() could only fail if the page is not
free in the buddy. At least for soft_offline_huge_page() that wont be
the case otherwise dissolve_free_huge_page() would have returned non
zero -EBUSY. Is there any other reason set_hwpoison_free_buddy_page()
would not succeed ?

> And for consistency, this patch moves set_hwpoison_free_buddy_page()
> in unmap_and_move() to __soft_offline_page().

Yeah this check should be handled in soft offline functions not inside
migrations they trigger.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC][PATCH v1 03/11] mm: move definition of num_poisoned_pages_inc/dec to include/linux/mm.h
  2018-11-09  6:47 ` [RFC][PATCH v1 03/11] mm: move definition of num_poisoned_pages_inc/dec to include/linux/mm.h Naoya Horiguchi
@ 2018-11-09 10:28   ` Anshuman Khandual
  2018-11-13  0:17     ` Naoya Horiguchi
  0 siblings, 1 reply; 24+ messages in thread
From: Anshuman Khandual @ 2018-11-09 10:28 UTC (permalink / raw)
  To: Naoya Horiguchi, linux-mm, linux-kernel
  Cc: Michal Hocko, Andrew Morton, Mike Kravetz, xishi.qiuxishi,
	Laurent Dufour



On 11/09/2018 12:17 PM, Naoya Horiguchi wrote:
> num_poisoned_pages_inc/dec had better be visible to some file like
> mm/sparse.c and mm/page_alloc.c (for a subsequent patch). So let's
> move it to include/linux/mm.h.
> 
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> ---
>  include/linux/mm.h      | 13 ++++++++++++-
>  include/linux/swapops.h | 16 ----------------
>  mm/sparse.c             |  2 +-
>  3 files changed, 13 insertions(+), 18 deletions(-)
> 
> diff --git v4.19-mmotm-2018-10-30-16-08/include/linux/mm.h v4.19-mmotm-2018-10-30-16-08_patched/include/linux/mm.h
> index 59df394..22623ba 100644
> --- v4.19-mmotm-2018-10-30-16-08/include/linux/mm.h
> +++ v4.19-mmotm-2018-10-30-16-08_patched/include/linux/mm.h
> @@ -2741,7 +2741,7 @@ extern void shake_page(struct page *p, int access);
>  extern atomic_long_t num_poisoned_pages __read_mostly;
>  extern int soft_offline_page(struct page *page, int flags);
>  
> -
> +#ifdef CONFIG_MEMORY_FAILURE
>  /*
>   * Error handlers for various types of pages.
>   */
> @@ -2777,6 +2777,17 @@ enum mf_action_page_type {
>  	MF_MSG_UNKNOWN,
>  };
>  
> +static inline void num_poisoned_pages_inc(void)
> +{
> +	atomic_long_inc(&num_poisoned_pages);
> +}
> +
> +static inline void num_poisoned_pages_dec(void)
> +{
> +	atomic_long_dec(&num_poisoned_pages);
> +}
> +#endif
> +
>  #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
>  extern void clear_huge_page(struct page *page,
>  			    unsigned long addr_hint,
> diff --git v4.19-mmotm-2018-10-30-16-08/include/linux/swapops.h v4.19-mmotm-2018-10-30-16-08_patched/include/linux/swapops.h
> index 4d96166..88137e9 100644
> --- v4.19-mmotm-2018-10-30-16-08/include/linux/swapops.h
> +++ v4.19-mmotm-2018-10-30-16-08_patched/include/linux/swapops.h
> @@ -320,8 +320,6 @@ static inline int is_pmd_migration_entry(pmd_t pmd)
>  
>  #ifdef CONFIG_MEMORY_FAILURE
>  
> -extern atomic_long_t num_poisoned_pages __read_mostly;
> -
>  /*
>   * Support for hardware poisoned pages
>   */
> @@ -336,16 +334,6 @@ static inline int is_hwpoison_entry(swp_entry_t entry)
>  	return swp_type(entry) == SWP_HWPOISON;
>  }
>  
> -static inline void num_poisoned_pages_inc(void)
> -{
> -	atomic_long_inc(&num_poisoned_pages);
> -}
> -
> -static inline void num_poisoned_pages_dec(void)
> -{
> -	atomic_long_dec(&num_poisoned_pages);
> -}
> -
>  #else
>  
>  static inline swp_entry_t make_hwpoison_entry(struct page *page)
> @@ -357,10 +345,6 @@ static inline int is_hwpoison_entry(swp_entry_t swp)
>  {
>  	return 0;
>  }
> -
> -static inline void num_poisoned_pages_inc(void)
> -{
> -}

I hope this was a stray definition and redundant which does not prevent
build in absence of CONFIG_MEMORY_FAILURE.

>  #endif
>  
>  #if defined(CONFIG_MEMORY_FAILURE) || defined(CONFIG_MIGRATION)
> diff --git v4.19-mmotm-2018-10-30-16-08/mm/sparse.c v4.19-mmotm-2018-10-30-16-08_patched/mm/sparse.c
> index 33307fc..7ada2e5 100644
> --- v4.19-mmotm-2018-10-30-16-08/mm/sparse.c
> +++ v4.19-mmotm-2018-10-30-16-08_patched/mm/sparse.c
> @@ -726,7 +726,7 @@ static void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
>  
>  	for (i = 0; i < nr_pages; i++) {
>  		if (PageHWPoison(&memmap[i])) {
> -			atomic_long_sub(1, &num_poisoned_pages);
> +			num_poisoned_pages_dec();
>  			ClearPageHWPoison(&memmap[i]);
>  		}
>  	}
> 

Otherwise looks good.

Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC][PATCH v1 04/11] mm: madvise: call soft_offline_page() without MF_COUNT_INCREASED
  2018-11-09  6:47 ` [RFC][PATCH v1 04/11] mm: madvise: call soft_offline_page() without MF_COUNT_INCREASED Naoya Horiguchi
@ 2018-11-09 10:46   ` Anshuman Khandual
  2018-11-13  0:18     ` Naoya Horiguchi
  0 siblings, 1 reply; 24+ messages in thread
From: Anshuman Khandual @ 2018-11-09 10:46 UTC (permalink / raw)
  To: Naoya Horiguchi, linux-mm, linux-kernel
  Cc: Michal Hocko, Andrew Morton, Mike Kravetz, xishi.qiuxishi,
	Laurent Dufour



On 11/09/2018 12:17 PM, Naoya Horiguchi wrote:
> Currently madvise_inject_error() pins the target page when calling
> memory error handler, but it's not good because the refcount is just
> an artifact of error injector and mock nothing about hw error itself.
> IOW, pinning the error page is part of error handler's task, so
> let's stop doing it.

Did not get that. Could you please kindly explain how an incremented
ref count through get_user_pages_fast() was a mocking the HW error
previously ? Though I might be missing the some context here.

> 
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> ---
>  mm/madvise.c | 25 +++++++++++--------------
>  1 file changed, 11 insertions(+), 14 deletions(-)
> 
> diff --git v4.19-mmotm-2018-10-30-16-08/mm/madvise.c v4.19-mmotm-2018-10-30-16-08_patched/mm/madvise.c
> index 6cb1ca9..9fa0225 100644
> --- v4.19-mmotm-2018-10-30-16-08/mm/madvise.c
> +++ v4.19-mmotm-2018-10-30-16-08_patched/mm/madvise.c
> @@ -637,6 +637,16 @@ static int madvise_inject_error(int behavior,
>  		ret = get_user_pages_fast(start, 1, 0, &page);
>  		if (ret != 1)
>  			return ret;
> +		/*
> +		 * The get_user_pages_fast() is just to get the pfn of the
> +		 * given address, and the refcount has nothing to do with
> +		 * what we try to test, so it should be released immediately.
> +		 * This is racy but it's intended because the real hardware
> +		 * errors could happen at any moment and memory error handlers
> +		 * must properly handle the race.
> +		 */
> +		put_page(page);
> +
>  		pfn = page_to_pfn(page);
>  
>  		/*
> @@ -646,16 +656,11 @@ static int madvise_inject_error(int behavior,
>  		 */
>  		order = compound_order(compound_head(page));
>  
> -		if (PageHWPoison(page)) {
> -			put_page(page);
> -			continue;
> -		}
> -
>  		if (behavior == MADV_SOFT_OFFLINE) {
>  			pr_info("Soft offlining pfn %#lx at process virtual address %#lx\n",
>  					pfn, start);
>  
> -			ret = soft_offline_page(page, MF_COUNT_INCREASED);
> +			ret = soft_offline_page(page, 0);

Probably something defined as a new "ignored" in the memory faults flag
enumeration instead of passing '0' directly.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC][PATCH v1 11/11] mm: hwpoison: introduce clear_hwpoison_free_buddy_page()
  2018-11-09  6:47 ` [RFC][PATCH v1 11/11] mm: hwpoison: introduce clear_hwpoison_free_buddy_page() Naoya Horiguchi
@ 2018-11-09 11:33   ` Anshuman Khandual
  2018-11-13  0:19     ` Naoya Horiguchi
  0 siblings, 1 reply; 24+ messages in thread
From: Anshuman Khandual @ 2018-11-09 11:33 UTC (permalink / raw)
  To: Naoya Horiguchi, linux-mm, linux-kernel
  Cc: Michal Hocko, Andrew Morton, Mike Kravetz, xishi.qiuxishi,
	Laurent Dufour



On 11/09/2018 12:17 PM, Naoya Horiguchi wrote:
> The new function is a reverse operation of set_hwpoison_free_buddy_page()
> to adjust unpoison_memory() to the new semantics.
> 
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

snip

> +
> +/*
> + * Reverse operation of set_hwpoison_free_buddy_page(), which is expected
> + * to work only on error pages isolated from buddy allocator.
> + */
> +bool clear_hwpoison_free_buddy_page(struct page *page)
> +{
> +	struct zone *zone = page_zone(page);
> +	bool unpoisoned = false;
> +
> +	spin_lock(&zone->lock);
> +	if (TestClearPageHWPoison(page)) {
> +		unsigned long pfn = page_to_pfn(page);
> +		int migratetype = get_pfnblock_migratetype(page, pfn);
> +
> +		__free_one_page(page, pfn, zone, 0, migratetype);
> +		unpoisoned = true;
> +	}
> +	spin_unlock(&zone->lock);
> +	return unpoisoned;
> +}
>  #endif
> 

Though there are multiple page state checks in unpoison_memory() leading
upto clearing HWPoison flag, the page must not be in buddy already if
__free_one_page() would be called on it.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC][PATCH v1 02/11] mm: soft-offline: add missing error check of set_hwpoison_free_buddy_page()
  2018-11-09 10:20   ` Anshuman Khandual
@ 2018-11-13  0:16     ` Naoya Horiguchi
  2018-11-14  8:53       ` Anshuman Khandual
  0 siblings, 1 reply; 24+ messages in thread
From: Naoya Horiguchi @ 2018-11-13  0:16 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linux-mm, linux-kernel, Michal Hocko, Andrew Morton,
	Mike Kravetz, xishi.qiuxishi, Laurent Dufour

Hi Anshuman,

On Fri, Nov 09, 2018 at 03:50:41PM +0530, Anshuman Khandual wrote:
> 
> 
> On 11/09/2018 12:17 PM, Naoya Horiguchi wrote:
> > set_hwpoison_free_buddy_page() could fail, then the target page is
> > finally not isolated, so it's better to report -EBUSY for userspace
> > to know the failure and chance of retry.
> > 
> 
> IIUC set_hwpoison_free_buddy_page() could only fail if the page is not
> free in the buddy. At least for soft_offline_huge_page() that wont be
> the case otherwise dissolve_free_huge_page() would have returned non
> zero -EBUSY. Is there any other reason set_hwpoison_free_buddy_page()
> would not succeed ?

There is a race window between page freeing (after successful soft-offline
-> page migration case) and the containment by set_hwpoison_free_buddy_page().
Or a target page can be allocated just after get_any_page() decided that
the target page is a free page.
So set_hwpoison_free_buddy_page() would safely fail in such cases.

Thanks,
Naoya Horiguchi

> 
> > And for consistency, this patch moves set_hwpoison_free_buddy_page()
> > in unmap_and_move() to __soft_offline_page().
> 
> Yeah this check should be handled in soft offline functions not inside
> migrations they trigger.
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC][PATCH v1 03/11] mm: move definition of num_poisoned_pages_inc/dec to include/linux/mm.h
  2018-11-09 10:28   ` Anshuman Khandual
@ 2018-11-13  0:17     ` Naoya Horiguchi
  0 siblings, 0 replies; 24+ messages in thread
From: Naoya Horiguchi @ 2018-11-13  0:17 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linux-mm, linux-kernel, Michal Hocko, Andrew Morton,
	Mike Kravetz, xishi.qiuxishi, Laurent Dufour

On Fri, Nov 09, 2018 at 03:58:27PM +0530, Anshuman Khandual wrote:
> 
> 
> On 11/09/2018 12:17 PM, Naoya Horiguchi wrote:
> > num_poisoned_pages_inc/dec had better be visible to some file like
> > mm/sparse.c and mm/page_alloc.c (for a subsequent patch). So let's
> > move it to include/linux/mm.h.
> > 
> > Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> > ---
> >  include/linux/mm.h      | 13 ++++++++++++-
> >  include/linux/swapops.h | 16 ----------------
> >  mm/sparse.c             |  2 +-
> >  3 files changed, 13 insertions(+), 18 deletions(-)
> > 
> > diff --git v4.19-mmotm-2018-10-30-16-08/include/linux/mm.h v4.19-mmotm-2018-10-30-16-08_patched/include/linux/mm.h
> > index 59df394..22623ba 100644
> > --- v4.19-mmotm-2018-10-30-16-08/include/linux/mm.h
> > +++ v4.19-mmotm-2018-10-30-16-08_patched/include/linux/mm.h
> > @@ -2741,7 +2741,7 @@ extern void shake_page(struct page *p, int access);
> >  extern atomic_long_t num_poisoned_pages __read_mostly;
> >  extern int soft_offline_page(struct page *page, int flags);
> >  
> > -
> > +#ifdef CONFIG_MEMORY_FAILURE
> >  /*
> >   * Error handlers for various types of pages.
> >   */
> > @@ -2777,6 +2777,17 @@ enum mf_action_page_type {
> >  	MF_MSG_UNKNOWN,
> >  };
> >  
> > +static inline void num_poisoned_pages_inc(void)
> > +{
> > +	atomic_long_inc(&num_poisoned_pages);
> > +}
> > +
> > +static inline void num_poisoned_pages_dec(void)
> > +{
> > +	atomic_long_dec(&num_poisoned_pages);
> > +}
> > +#endif
> > +
> >  #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
> >  extern void clear_huge_page(struct page *page,
> >  			    unsigned long addr_hint,
> > diff --git v4.19-mmotm-2018-10-30-16-08/include/linux/swapops.h v4.19-mmotm-2018-10-30-16-08_patched/include/linux/swapops.h
> > index 4d96166..88137e9 100644
> > --- v4.19-mmotm-2018-10-30-16-08/include/linux/swapops.h
> > +++ v4.19-mmotm-2018-10-30-16-08_patched/include/linux/swapops.h
> > @@ -320,8 +320,6 @@ static inline int is_pmd_migration_entry(pmd_t pmd)
> >  
> >  #ifdef CONFIG_MEMORY_FAILURE
> >  
> > -extern atomic_long_t num_poisoned_pages __read_mostly;
> > -
> >  /*
> >   * Support for hardware poisoned pages
> >   */
> > @@ -336,16 +334,6 @@ static inline int is_hwpoison_entry(swp_entry_t entry)
> >  	return swp_type(entry) == SWP_HWPOISON;
> >  }
> >  
> > -static inline void num_poisoned_pages_inc(void)
> > -{
> > -	atomic_long_inc(&num_poisoned_pages);
> > -}
> > -
> > -static inline void num_poisoned_pages_dec(void)
> > -{
> > -	atomic_long_dec(&num_poisoned_pages);
> > -}
> > -
> >  #else
> >  
> >  static inline swp_entry_t make_hwpoison_entry(struct page *page)
> > @@ -357,10 +345,6 @@ static inline int is_hwpoison_entry(swp_entry_t swp)
> >  {
> >  	return 0;
> >  }
> > -
> > -static inline void num_poisoned_pages_inc(void)
> > -{
> > -}
> 
> I hope this was a stray definition and redundant which does not prevent
> build in absence of CONFIG_MEMORY_FAILURE.

You're right :)

> >  #endif
> >  
> >  #if defined(CONFIG_MEMORY_FAILURE) || defined(CONFIG_MIGRATION)
> > diff --git v4.19-mmotm-2018-10-30-16-08/mm/sparse.c v4.19-mmotm-2018-10-30-16-08_patched/mm/sparse.c
> > index 33307fc..7ada2e5 100644
> > --- v4.19-mmotm-2018-10-30-16-08/mm/sparse.c
> > +++ v4.19-mmotm-2018-10-30-16-08_patched/mm/sparse.c
> > @@ -726,7 +726,7 @@ static void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
> >  
> >  	for (i = 0; i < nr_pages; i++) {
> >  		if (PageHWPoison(&memmap[i])) {
> > -			atomic_long_sub(1, &num_poisoned_pages);
> > +			num_poisoned_pages_dec();
> >  			ClearPageHWPoison(&memmap[i]);
> >  		}
> >  	}
> > 
> 
> Otherwise looks good.
> 
> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>

Thanks!
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC][PATCH v1 04/11] mm: madvise: call soft_offline_page() without MF_COUNT_INCREASED
  2018-11-09 10:46   ` Anshuman Khandual
@ 2018-11-13  0:18     ` Naoya Horiguchi
  0 siblings, 0 replies; 24+ messages in thread
From: Naoya Horiguchi @ 2018-11-13  0:18 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linux-mm, linux-kernel, Michal Hocko, Andrew Morton,
	Mike Kravetz, xishi.qiuxishi, Laurent Dufour

On Fri, Nov 09, 2018 at 04:16:55PM +0530, Anshuman Khandual wrote:
> 
> 
> On 11/09/2018 12:17 PM, Naoya Horiguchi wrote:
> > Currently madvise_inject_error() pins the target page when calling
> > memory error handler, but it's not good because the refcount is just
> > an artifact of error injector and mock nothing about hw error itself.
> > IOW, pinning the error page is part of error handler's task, so
> > let's stop doing it.
> 
> Did not get that. Could you please kindly explain how an incremented
> ref count through get_user_pages_fast() was a mocking the HW error
> previously ? Though I might be missing the some context here.

I meant in "mock nothing about hw error itself" that in the code path
for actual HW error (from MCE handler code) the error page is not pinned
outside (but inside) memory_failure().
So it makes more sense to me to do similarly also in error injection code,
and another good thing is that that makes code more simple (A later patch
eliminates MF_COUNT_INCREASED.)

> 
> > 
> > Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> > ---
> >  mm/madvise.c | 25 +++++++++++--------------
> >  1 file changed, 11 insertions(+), 14 deletions(-)
> > 
> > diff --git v4.19-mmotm-2018-10-30-16-08/mm/madvise.c v4.19-mmotm-2018-10-30-16-08_patched/mm/madvise.c
> > index 6cb1ca9..9fa0225 100644
> > --- v4.19-mmotm-2018-10-30-16-08/mm/madvise.c
> > +++ v4.19-mmotm-2018-10-30-16-08_patched/mm/madvise.c
> > @@ -637,6 +637,16 @@ static int madvise_inject_error(int behavior,
> >  		ret = get_user_pages_fast(start, 1, 0, &page);
> >  		if (ret != 1)
> >  			return ret;
> > +		/*
> > +		 * The get_user_pages_fast() is just to get the pfn of the
> > +		 * given address, and the refcount has nothing to do with
> > +		 * what we try to test, so it should be released immediately.
> > +		 * This is racy but it's intended because the real hardware
> > +		 * errors could happen at any moment and memory error handlers
> > +		 * must properly handle the race.
> > +		 */
> > +		put_page(page);
> > +
> >  		pfn = page_to_pfn(page);
> >  
> >  		/*
> > @@ -646,16 +656,11 @@ static int madvise_inject_error(int behavior,
> >  		 */
> >  		order = compound_order(compound_head(page));
> >  
> > -		if (PageHWPoison(page)) {
> > -			put_page(page);
> > -			continue;
> > -		}
> > -
> >  		if (behavior == MADV_SOFT_OFFLINE) {
> >  			pr_info("Soft offlining pfn %#lx at process virtual address %#lx\n",
> >  					pfn, start);
> >  
> > -			ret = soft_offline_page(page, MF_COUNT_INCREASED);
> > +			ret = soft_offline_page(page, 0);
> 
> Probably something defined as a new "ignored" in the memory faults flag
> enumeration instead of passing '0' directly.

MF_* flags are defined as bitmap, not separate values. And according to
other caller like do_memory_failure(), multiple bits in flags can be set together.

    static int do_memory_failure(struct mce *m)
    {
            int flags = MF_ACTION_REQUIRED;
            ....
            if (!(m->mcgstatus & MCG_STATUS_RIPV))
                    flags |= MF_MUST_KILL;
            ret = memory_failure(m->addr >> PAGE_SHIFT, flags);

So I think that simply adding new MF_* value doesn't work, and "flags == 0"
seems to me to show "no flag set" in the clearest way.
Or if you have any code suggestion, that's great.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC][PATCH v1 11/11] mm: hwpoison: introduce clear_hwpoison_free_buddy_page()
  2018-11-09 11:33   ` Anshuman Khandual
@ 2018-11-13  0:19     ` Naoya Horiguchi
  2018-11-14  8:23       ` Anshuman Khandual
  0 siblings, 1 reply; 24+ messages in thread
From: Naoya Horiguchi @ 2018-11-13  0:19 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linux-mm, linux-kernel, Michal Hocko, Andrew Morton,
	Mike Kravetz, xishi.qiuxishi, Laurent Dufour

On Fri, Nov 09, 2018 at 05:03:06PM +0530, Anshuman Khandual wrote:
> 
> 
> On 11/09/2018 12:17 PM, Naoya Horiguchi wrote:
> > The new function is a reverse operation of set_hwpoison_free_buddy_page()
> > to adjust unpoison_memory() to the new semantics.
> > 
> > Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> 
> snip
> 
> > +
> > +/*
> > + * Reverse operation of set_hwpoison_free_buddy_page(), which is expected
> > + * to work only on error pages isolated from buddy allocator.
> > + */
> > +bool clear_hwpoison_free_buddy_page(struct page *page)
> > +{
> > +	struct zone *zone = page_zone(page);
> > +	bool unpoisoned = false;
> > +
> > +	spin_lock(&zone->lock);
> > +	if (TestClearPageHWPoison(page)) {
> > +		unsigned long pfn = page_to_pfn(page);
> > +		int migratetype = get_pfnblock_migratetype(page, pfn);
> > +
> > +		__free_one_page(page, pfn, zone, 0, migratetype);
> > +		unpoisoned = true;
> > +	}
> > +	spin_unlock(&zone->lock);
> > +	return unpoisoned;
> > +}
> >  #endif
> > 
> 
> Though there are multiple page state checks in unpoison_memory() leading
> upto clearing HWPoison flag, the page must not be in buddy already if
> __free_one_page() would be called on it.

Yes, you're right.
clear_hwpoison_free_buddy_page() is intended to cancel the isolation by
set_hwpoison_free_buddy_page() which removes the target page from buddy allocator,
so the page clear_hwpoison_free_buddy_page() tries to handle is not a buddy page
actually (not linked to any freelist).

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC][PATCH v1 10/11] mm: clear PageHWPoison in memory hotremove
  2018-11-09  6:47 ` [RFC][PATCH v1 10/11] mm: clear PageHWPoison in memory hotremove Naoya Horiguchi
@ 2018-11-13  1:32   ` Naoya Horiguchi
  0 siblings, 0 replies; 24+ messages in thread
From: Naoya Horiguchi @ 2018-11-13  1:32 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Michal Hocko, Andrew Morton, Mike Kravetz, xishi.qiuxishi,
	Laurent Dufour

On Fri, Nov 09, 2018 at 03:47:14PM +0900, Naoya Horiguchi wrote:
> One hopeful usecase of memory hotplug is to replace half-broken DIMMs
> with new ones, so it makes sense to clear hwpoison info at the time of
> memory hotremove.
> 
> I hope that this patch covers the topic discussed in
> https://lkml.org/lkml/2018/1/17/1228
> 
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> ---
>  mm/page_alloc.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git v4.19-mmotm-2018-10-30-16-08/mm/page_alloc.c v4.19-mmotm-2018-10-30-16-08_patched/mm/page_alloc.c
> index 970d6ff..27826b3 100644
> --- v4.19-mmotm-2018-10-30-16-08/mm/page_alloc.c
> +++ v4.19-mmotm-2018-10-30-16-08_patched/mm/page_alloc.c
> @@ -8139,8 +8139,9 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
>  		 * The HWPoisoned page may be not in buddy system, and
>  		 * page_count() is not 0.
>  		 */
> -		if (unlikely(!PageBuddy(page) && PageHWPoison(page))) {
> +		if (unlikely(!PageBuddy(page) && TestClearPageHWPoison(page))) {
>  			pfn++;
> +			num_poisoned_pages_dec();
>  			SetPageReserved(page);
>  			continue;
>  		}

Kbuild test robot shows that this patch causes build errors on
!CONFIG_MEMORY_FAILURE, which should be fixed by the following changes.

Thanks,
Naoya Horiguchi
---
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6c496dab246d..559092915fe6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2785,6 +2785,10 @@ static inline void num_poisoned_pages_dec(void)
 {
 	atomic_long_dec(&num_poisoned_pages);
 }
+#else
+static inline void num_poisoned_pages_dec(void)
+{
+}
 #endif
 
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index ab0bde073050..1461384aa1a3 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -385,6 +385,7 @@ extern bool set_hwpoison_free_buddy_page(struct page *page);
 extern bool clear_hwpoison_free_buddy_page(struct page *page);
 #else
 PAGEFLAG_FALSE(HWPoison)
+TESTCLEARFLAG_FALSE(HWPoison)
 static inline bool set_hwpoison_free_buddy_page(struct page *page)
 {
 	return false;

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [RFC][PATCH v1 11/11] mm: hwpoison: introduce clear_hwpoison_free_buddy_page()
  2018-11-13  0:19     ` Naoya Horiguchi
@ 2018-11-14  8:23       ` Anshuman Khandual
  0 siblings, 0 replies; 24+ messages in thread
From: Anshuman Khandual @ 2018-11-14  8:23 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, linux-kernel, Michal Hocko, Andrew Morton,
	Mike Kravetz, xishi.qiuxishi, Laurent Dufour



On 11/13/2018 05:49 AM, Naoya Horiguchi wrote:
> On Fri, Nov 09, 2018 at 05:03:06PM +0530, Anshuman Khandual wrote:
>>
>>
>> On 11/09/2018 12:17 PM, Naoya Horiguchi wrote:
>>> The new function is a reverse operation of set_hwpoison_free_buddy_page()
>>> to adjust unpoison_memory() to the new semantics.
>>>
>>> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>>
>> snip
>>
>>> +
>>> +/*
>>> + * Reverse operation of set_hwpoison_free_buddy_page(), which is expected
>>> + * to work only on error pages isolated from buddy allocator.
>>> + */
>>> +bool clear_hwpoison_free_buddy_page(struct page *page)
>>> +{
>>> +	struct zone *zone = page_zone(page);
>>> +	bool unpoisoned = false;
>>> +
>>> +	spin_lock(&zone->lock);
>>> +	if (TestClearPageHWPoison(page)) {
>>> +		unsigned long pfn = page_to_pfn(page);
>>> +		int migratetype = get_pfnblock_migratetype(page, pfn);
>>> +
>>> +		__free_one_page(page, pfn, zone, 0, migratetype);
>>> +		unpoisoned = true;
>>> +	}
>>> +	spin_unlock(&zone->lock);
>>> +	return unpoisoned;
>>> +}
>>>  #endif
>>>
>>
>> Though there are multiple page state checks in unpoison_memory() leading
>> upto clearing HWPoison flag, the page must not be in buddy already if
>> __free_one_page() would be called on it.
> 
> Yes, you're right.
> clear_hwpoison_free_buddy_page() is intended to cancel the isolation by
> set_hwpoison_free_buddy_page() which removes the target page from buddy allocator,
> so the page clear_hwpoison_free_buddy_page() tries to handle is not a buddy page
> actually (not linked to any freelist).

Got it. Thanks for the explanation.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC][PATCH v1 02/11] mm: soft-offline: add missing error check of set_hwpoison_free_buddy_page()
  2018-11-13  0:16     ` Naoya Horiguchi
@ 2018-11-14  8:53       ` Anshuman Khandual
  0 siblings, 0 replies; 24+ messages in thread
From: Anshuman Khandual @ 2018-11-14  8:53 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, linux-kernel, Michal Hocko, Andrew Morton,
	Mike Kravetz, xishi.qiuxishi, Laurent Dufour



On 11/13/2018 05:46 AM, Naoya Horiguchi wrote:
> Hi Anshuman,
> 
> On Fri, Nov 09, 2018 at 03:50:41PM +0530, Anshuman Khandual wrote:
>>
>> On 11/09/2018 12:17 PM, Naoya Horiguchi wrote:
>>> set_hwpoison_free_buddy_page() could fail, then the target page is
>>> finally not isolated, so it's better to report -EBUSY for userspace
>>> to know the failure and chance of retry.
>>>
>> IIUC set_hwpoison_free_buddy_page() could only fail if the page is not
>> free in the buddy. At least for soft_offline_huge_page() that wont be
>> the case otherwise dissolve_free_huge_page() would have returned non
>> zero -EBUSY. Is there any other reason set_hwpoison_free_buddy_page()
>> would not succeed ?
> There is a race window between page freeing (after successful soft-offline
> -> page migration case) and the containment by set_hwpoison_free_buddy_page().
> Or a target page can be allocated just after get_any_page() decided that
> the target page is a free page.
> So set_hwpoison_free_buddy_page() would safely fail in such cases.

Makes sense. Thanks.

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2018-11-14  8:54 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-09  6:47 [PATCH RFC v1 00/11] hwpoison improvement part 1 Naoya Horiguchi
2018-11-09  6:47 ` [RFC][PATCH v1 01/11] mm: hwpoison: cleanup unused PageHuge() check Naoya Horiguchi
2018-11-09  9:52   ` Anshuman Khandual
2018-11-09  6:47 ` [RFC][PATCH v1 02/11] mm: soft-offline: add missing error check of set_hwpoison_free_buddy_page() Naoya Horiguchi
2018-11-09 10:20   ` Anshuman Khandual
2018-11-13  0:16     ` Naoya Horiguchi
2018-11-14  8:53       ` Anshuman Khandual
2018-11-09  6:47 ` [RFC][PATCH v1 03/11] mm: move definition of num_poisoned_pages_inc/dec to include/linux/mm.h Naoya Horiguchi
2018-11-09 10:28   ` Anshuman Khandual
2018-11-13  0:17     ` Naoya Horiguchi
2018-11-09  6:47 ` [RFC][PATCH v1 04/11] mm: madvise: call soft_offline_page() without MF_COUNT_INCREASED Naoya Horiguchi
2018-11-09 10:46   ` Anshuman Khandual
2018-11-13  0:18     ` Naoya Horiguchi
2018-11-09  6:47 ` [RFC][PATCH v1 05/11] mm: hwpoison-inject: don't pin for hwpoison_filter() Naoya Horiguchi
2018-11-09  6:47 ` [RFC][PATCH v1 06/11] mm: hwpoison: remove MF_COUNT_INCREASED Naoya Horiguchi
2018-11-09  6:47 ` [RFC][PATCH v1 07/11] mm: remove flag argument from soft offline functions Naoya Horiguchi
2018-11-09  6:47 ` [RFC][PATCH v1 08/11] mm: soft-offline: isolate error pages from buddy freelist Naoya Horiguchi
2018-11-09  6:47 ` [RFC][PATCH v1 09/11] mm: hwpoison: apply buddy page handling code to hard-offline Naoya Horiguchi
2018-11-09  6:47 ` [RFC][PATCH v1 10/11] mm: clear PageHWPoison in memory hotremove Naoya Horiguchi
2018-11-13  1:32   ` Naoya Horiguchi
2018-11-09  6:47 ` [RFC][PATCH v1 11/11] mm: hwpoison: introduce clear_hwpoison_free_buddy_page() Naoya Horiguchi
2018-11-09 11:33   ` Anshuman Khandual
2018-11-13  0:19     ` Naoya Horiguchi
2018-11-14  8:23       ` Anshuman Khandual

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.