* [mm-unstable PATCH v4 0/9] mm, hwpoison: enable 1GB hugepage support (v4)
@ 2022-07-04  1:33 Naoya Horiguchi
  2022-07-04  1:33 ` [mm-unstable PATCH v4 1/9] mm/hugetlb: check gigantic_page_runtime_supported() in return_unused_surplus_pages() Naoya Horiguchi
                   ` (8 more replies)
  0 siblings, 9 replies; 30+ messages in thread
From: Naoya Horiguchi @ 2022-07-04  1:33 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, David Hildenbrand, Mike Kravetz, Miaohe Lin,
	Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	Naoya Horiguchi, linux-kernel

Here is v4 of the "enabling memory error handling on 1GB hugepage" patchset.
It's rebased onto mm-unstable as of Jul 3 (1e6a0f7c1c49).  There were a few
conflicts, but all the resolutions were superficial.

- v1: https://lore.kernel.org/linux-mm/20220602050631.771414-1-naoya.horiguchi@linux.dev/T/#u
- v2: https://lore.kernel.org/linux-mm/20220623235153.2623702-1-naoya.horiguchi@linux.dev/T/#u
- v3: https://lore.kernel.org/linux-mm/20220630022755.3362349-1-naoya.horiguchi@linux.dev/T/#u

Thanks,
Naoya Horiguchi
---
Summary:

Naoya Horiguchi (9):
      mm/hugetlb: check gigantic_page_runtime_supported() in return_unused_surplus_pages()
      mm/hugetlb: separate path for hwpoison entry in copy_hugetlb_page_range()
      mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry
      mm, hwpoison, hugetlb: support saving mechanism of raw error pages
      mm, hwpoison: make unpoison aware of raw error info in hwpoisoned hugepage
      mm, hwpoison: set PG_hwpoison for busy hugetlb pages
      mm, hwpoison: make __page_handle_poison returns int
      mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage
      mm, hwpoison: enable memory error handling on 1GB hugepage

 arch/x86/mm/hugetlbpage.c |   8 ++-
 include/linux/hugetlb.h   |  18 +++++-
 include/linux/mm.h        |   2 +-
 include/linux/swapops.h   |   9 +++
 include/ras/ras_event.h   |   1 -
 mm/hugetlb.c              |  99 ++++++++++++++++++++++---------
 mm/memory-failure.c       | 147 ++++++++++++++++++++++++++++++++++++----------
 7 files changed, 223 insertions(+), 61 deletions(-)


* [mm-unstable PATCH v4 1/9] mm/hugetlb: check gigantic_page_runtime_supported() in return_unused_surplus_pages()
  2022-07-04  1:33 [mm-unstable PATCH v4 0/9] mm, hwpoison: enable 1GB hugepage support (v4) Naoya Horiguchi
@ 2022-07-04  1:33 ` Naoya Horiguchi
  2022-07-05  2:16   ` Miaohe Lin
  2022-07-06 21:51   ` Mike Kravetz
  2022-07-04  1:33 ` [mm-unstable PATCH v4 2/9] mm/hugetlb: separate path for hwpoison entry in copy_hugetlb_page_range() Naoya Horiguchi
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 30+ messages in thread
From: Naoya Horiguchi @ 2022-07-04  1:33 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, David Hildenbrand, Mike Kravetz, Miaohe Lin,
	Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	Naoya Horiguchi, linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

I found a weird state of the 1GB hugepage pool, caused by the following
procedure:

  - run a process reserving all free 1GB hugepages,
  - shrink the free 1GB hugepage pool to zero (i.e. write 0 to
    /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages), then
  - kill the reserving process.

After that, all the hugepages are free *and* surplus at the same time.

  $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
  3
  $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/free_hugepages
  3
  $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/resv_hugepages
  0
  $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/surplus_hugepages
  3

This state is resolved by reserving and allocating the pages and then
freeing them again, so it does not seem to cause a serious problem.
But it's a little surprising (shrinking the pool suddenly fails).

This behavior is caused by the hstate_is_gigantic() check in
return_unused_surplus_pages().  The check was introduced back in 2008
by commit aa888a74977a ("hugetlb: support larger than MAX_ORDER"), when
gigantic pages were not supposed to be allocated/freed at run-time.
Now the kernel supports runtime allocation/freeing, so also check
gigantic_page_runtime_supported().

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
v2 -> v3:
- Fixed typo in patch description,
- add !gigantic_page_runtime_supported() check instead of removing
  hstate_is_gigantic() check (suggested by Miaohe and Muchun)
- add a few more !gigantic_page_runtime_supported() check in
  set_max_huge_pages() (by Mike).
---
 mm/hugetlb.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 2a554f006255..bdc4499f324b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2432,8 +2432,7 @@ static void return_unused_surplus_pages(struct hstate *h,
 	/* Uncommit the reservation */
 	h->resv_huge_pages -= unused_resv_pages;
 
-	/* Cannot return gigantic pages currently */
-	if (hstate_is_gigantic(h))
+	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
 		goto out;
 
 	/*
@@ -3315,7 +3314,8 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
 	 * the user tries to allocate gigantic pages but let the user free the
 	 * boottime allocated gigantic pages.
 	 */
-	if (hstate_is_gigantic(h) && !IS_ENABLED(CONFIG_CONTIG_ALLOC)) {
+	if (hstate_is_gigantic(h) && (!IS_ENABLED(CONFIG_CONTIG_ALLOC) ||
+				      !gigantic_page_runtime_supported())) {
 		if (count > persistent_huge_pages(h)) {
 			spin_unlock_irq(&hugetlb_lock);
 			mutex_unlock(&h->resize_lock);
@@ -3363,6 +3363,19 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
 			goto out;
 	}
 
+	/*
+	 * We can not decrease gigantic pool size if runtime modification
+	 * is not supported.
+	 */
+	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported()) {
+		if (count < persistent_huge_pages(h)) {
+			spin_unlock_irq(&hugetlb_lock);
+			mutex_unlock(&h->resize_lock);
+			NODEMASK_FREE(node_alloc_noretry);
+			return -EINVAL;
+		}
+	}
+
 	/*
 	 * Decrease the pool size
 	 * First return free pages to the buddy allocator (being careful
-- 
2.25.1



* [mm-unstable PATCH v4 2/9] mm/hugetlb: separate path for hwpoison entry in copy_hugetlb_page_range()
  2022-07-04  1:33 [mm-unstable PATCH v4 0/9] mm, hwpoison: enable 1GB hugepage support (v4) Naoya Horiguchi
  2022-07-04  1:33 ` [mm-unstable PATCH v4 1/9] mm/hugetlb: check gigantic_page_runtime_supported() in return_unused_surplus_pages() Naoya Horiguchi
@ 2022-07-04  1:33 ` Naoya Horiguchi
  2022-07-04  1:42   ` Andrew Morton
  2022-07-04  1:33 ` [mm-unstable PATCH v4 3/9] mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry Naoya Horiguchi
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 30+ messages in thread
From: Naoya Horiguchi @ 2022-07-04  1:33 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, David Hildenbrand, Mike Kravetz, Miaohe Lin,
	Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	Naoya Horiguchi, linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

Originally copy_hugetlb_page_range() handled migration entries and hwpoisoned
entries in a similar manner.  But recently the related code path gained more
code for migration entries, and when is_writable_migration_entry() was
converted to !is_readable_migration_entry(), hwpoison entries on source
processes started to be unexpectedly updated (which is legitimate for
migration entries, but not for hwpoison entries).  This results in unexpected
serious issues such as a kernel panic when forking processes with hwpoison
entries in pmd.

Separate the if branch into one for hwpoison entries and one for migration
entries.
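
For reference, here is a heavily condensed sketch of the pre-fix logic
(paraphrased from memory of the 5.18-era code rather than quoted; uffd-wp
handling and other details are omitted), showing why a hwpoison entry takes
the wrong branch:

	} else if (unlikely(is_hugetlb_entry_migration(entry) ||
			    is_hugetlb_entry_hwpoisoned(entry))) {
		swp_entry_t swp_entry = pte_to_swp_entry(entry);

		if (!is_readable_migration_entry(swp_entry) && cow) {
			/*
			 * A hwpoison entry is not a readable migration entry
			 * either, so it wrongly takes this branch and gets
			 * rewritten as if it were a migration entry.
			 */
			swp_entry = make_readable_migration_entry(
						swp_offset(swp_entry));
			entry = swp_entry_to_pte(swp_entry);
			set_huge_pte_at(src, addr, src_pte, entry);
		}
		set_huge_pte_at(dst, addr, dst_pte, entry);
	}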

Fixes: 6c287605fd56 ("mm: remember exclusively mapped anonymous pages with PG_anon_exclusive")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org> # 5.18
---
v3 -> v4:
- replace set_huge_swap_pte_at() with set_huge_pte_at()
---
 mm/hugetlb.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index bdc4499f324b..ad621688370b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4803,8 +4803,13 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			 * sharing with another vma.
 			 */
 			;
-		} else if (unlikely(is_hugetlb_entry_migration(entry) ||
-				    is_hugetlb_entry_hwpoisoned(entry))) {
+		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) {
+			bool uffd_wp = huge_pte_uffd_wp(entry);
+
+			if (!userfaultfd_wp(dst_vma) && uffd_wp)
+				entry = huge_pte_clear_uffd_wp(entry);
+			set_huge_pte_at(dst, addr, dst_pte, entry);
+		} else if (unlikely(is_hugetlb_entry_migration(entry))) {
 			swp_entry_t swp_entry = pte_to_swp_entry(entry);
 			bool uffd_wp = huge_pte_uffd_wp(entry);
 
-- 
2.25.1



* [mm-unstable PATCH v4 3/9] mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry
  2022-07-04  1:33 [mm-unstable PATCH v4 0/9] mm, hwpoison: enable 1GB hugepage support (v4) Naoya Horiguchi
  2022-07-04  1:33 ` [mm-unstable PATCH v4 1/9] mm/hugetlb: check gigantic_page_runtime_supported() in return_unused_surplus_pages() Naoya Horiguchi
  2022-07-04  1:33 ` [mm-unstable PATCH v4 2/9] mm/hugetlb: separate path for hwpoison entry in copy_hugetlb_page_range() Naoya Horiguchi
@ 2022-07-04  1:33 ` Naoya Horiguchi
  2022-07-05  2:46   ` Miaohe Lin
  2022-07-06 22:21   ` Mike Kravetz
  2022-07-04  1:33 ` [mm-unstable PATCH v4 4/9] mm, hwpoison, hugetlb: support saving mechanism of raw error pages Naoya Horiguchi
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 30+ messages in thread
From: Naoya Horiguchi @ 2022-07-04  1:33 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, David Hildenbrand, Mike Kravetz, Miaohe Lin,
	Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	Naoya Horiguchi, linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

follow_pud_mask() does not support non-present pud entries now.  As far as
I tested on an x86_64 server, follow_pud_mask() still simply returns
no_page_table() for a non-present pud entry due to pud_bad(), so no severe
user-visible effect should happen.  But generally we should call
follow_huge_pud() for a non-present pud entry of a 1GB hugetlb page.
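
To illustrate, a heavily simplified sketch of the relevant part of
follow_pud_mask() in mm/gup.c (paraphrased from memory, with declarations
and several cases such as devmap handling omitted):

	pud = pud_offset(p4dp, address);
	if (pud_none(*pud))
		return no_page_table(vma, flags);
	if (pud_huge(*pud) && is_vm_hugetlb_page(vma)) {
		/* only reached when pud_huge() recognizes the entry */
		page = follow_huge_pud(mm, address, pud, flags);
		return page ? page : no_page_table(vma, flags);
	}
	/* ... devmap and other special cases ... */
	if (unlikely(pud_bad(*pud)))
		/* a non-present hugetlb pud currently falls through to here */
		return no_page_table(vma, flags);
	return follow_pmd_mask(vma, address, pud, flags, ctx);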

Update pud_huge() and follow_huge_pud() to handle non-present pud entries.
The changes are similar to the previous work for pmd entries in commit
e66f17ff7177 ("mm/hugetlb: take page table lock in follow_huge_pmd()") and
commit cbef8478bee5 ("mm/hugetlb: pmd_huge() returns true for non-present
hugepage").

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
v2 -> v3:
- fixed typos in subject and description,
- added comment on pud_huge(),
- added comment about fallback for hwpoisoned entry,
- updated initial check about FOLL_{PIN,GET} flags.
---
 arch/x86/mm/hugetlbpage.c |  8 +++++++-
 mm/hugetlb.c              | 32 ++++++++++++++++++++++++++++++--
 2 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index 509408da0da1..6b3033845c6d 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -30,9 +30,15 @@ int pmd_huge(pmd_t pmd)
 		(pmd_val(pmd) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
 }
 
+/*
+ * pud_huge() returns 1 if @pud is hugetlb related entry, that is normal
+ * hugetlb entry or non-present (migration or hwpoisoned) hugetlb entry.
+ * Otherwise, returns 0.
+ */
 int pud_huge(pud_t pud)
 {
-	return !!(pud_val(pud) & _PAGE_PSE);
+	return !pud_none(pud) &&
+		(pud_val(pud) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
 }
 
 #ifdef CONFIG_HUGETLB_PAGE
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ad621688370b..66bb39e0fce8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6994,10 +6994,38 @@ struct page * __weak
 follow_huge_pud(struct mm_struct *mm, unsigned long address,
 		pud_t *pud, int flags)
 {
-	if (flags & (FOLL_GET | FOLL_PIN))
+	struct page *page = NULL;
+	spinlock_t *ptl;
+	pte_t pte;
+
+	if (WARN_ON_ONCE(flags & FOLL_PIN))
 		return NULL;
 
-	return pte_page(*(pte_t *)pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
+retry:
+	ptl = huge_pte_lock(hstate_sizelog(PUD_SHIFT), mm, (pte_t *)pud);
+	if (!pud_huge(*pud))
+		goto out;
+	pte = huge_ptep_get((pte_t *)pud);
+	if (pte_present(pte)) {
+		page = pud_page(*pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
+		if (WARN_ON_ONCE(!try_grab_page(page, flags))) {
+			page = NULL;
+			goto out;
+		}
+	} else {
+		if (is_hugetlb_entry_migration(pte)) {
+			spin_unlock(ptl);
+			__migration_entry_wait(mm, (pte_t *)pud, ptl);
+			goto retry;
+		}
+		/*
+		 * hwpoisoned entry is treated as no_page_table in
+		 * follow_page_mask().
+		 */
+	}
+out:
+	spin_unlock(ptl);
+	return page;
 }
 
 struct page * __weak
-- 
2.25.1



* [mm-unstable PATCH v4 4/9] mm, hwpoison, hugetlb: support saving mechanism of raw error pages
  2022-07-04  1:33 [mm-unstable PATCH v4 0/9] mm, hwpoison: enable 1GB hugepage support (v4) Naoya Horiguchi
                   ` (2 preceding siblings ...)
  2022-07-04  1:33 ` [mm-unstable PATCH v4 3/9] mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry Naoya Horiguchi
@ 2022-07-04  1:33 ` Naoya Horiguchi
  2022-07-06  2:37   ` Miaohe Lin
  2022-07-04  1:33 ` [mm-unstable PATCH v4 5/9] mm, hwpoison: make unpoison aware of raw error info in hwpoisoned hugepage Naoya Horiguchi
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 30+ messages in thread
From: Naoya Horiguchi @ 2022-07-04  1:33 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, David Hildenbrand, Mike Kravetz, Miaohe Lin,
	Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	Naoya Horiguchi, linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

When handling a memory error on a hugetlb page, the error handler tries to
dissolve it and turn it into 4kB pages.  If it's successfully dissolved, the
PageHWPoison flag is moved to the raw error page, so that's all right.
However, dissolve sometimes fails, and then the error page is left as a
hwpoisoned hugepage.  It would be useful if we could retry dissolving it to
save the healthy pages, but that's not possible now because the information
about where the raw error pages are is lost.

Use the private field of a few tail pages to keep that information.  The
code path that shrinks the hugepage pool uses this info to retry the delayed
dissolve.  In order to remember multiple errors in a hugepage, a
singly-linked list originating from the SUBPAGE_INDEX_HWPOISON-th tail page
is constructed.  Only simple operations (adding an entry or clearing all)
are required and the list is assumed not to be very long, so this simple
data structure should be enough.

If we fail to save raw error info, the hwpoison hugepage has errors on an
unknown subpage, and this new saving mechanism no longer works, so disable
saving new raw error info and freeing hwpoison hugepages in that case.

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
v3 -> v4:
- resolve conflict with "mm: hugetlb_vmemmap: improve hugetlb_vmemmap
  code readability", use hugetlb_vmemmap_restore() instead of
  hugetlb_vmemmap_alloc().

v2 -> v3:
- remove duplicate "return ret" lines,
- use GFP_ATOMIC instead of GFP_KERNEL,
- introduce HPageRawHwpUnreliable pseudo flag (suggested by Muchun),
- hugetlb_clear_page_hwpoison removes raw_hwp_page list even if
  HPageRawHwpUnreliable is true, (by Miaohe)

v1 -> v2:
- support hwpoison hugepage with multiple errors,
- moved the new interface functions to mm/memory-failure.c,
- define additional subpage index SUBPAGE_INDEX_HWPOISON_UNRELIABLE,
- stop freeing/dissolving hwpoison hugepages with unreliable raw error info,
- drop hugetlb_clear_page_hwpoison() in dissolve_free_huge_page() because
  that's done in update_and_free_page(),
- move setting/clearing PG_hwpoison flag to the new interfaces,
- checking already hwpoisoned or not on a subpage basis.

ChangeLog since previous post on 4/27:
- fixed typo in patch description (by Miaohe)
- fixed config value in #ifdef statement (by Miaohe)
- added sentences about "multiple hwpoison pages" scenario in patch
  description

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
 include/linux/hugetlb.h | 18 +++++++++-
 mm/hugetlb.c            | 39 ++++++++++----------
 mm/memory-failure.c     | 80 +++++++++++++++++++++++++++++++++++++++--
 3 files changed, 114 insertions(+), 23 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index dce46d571575..29c4d0883d36 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -42,6 +42,9 @@ enum {
 	SUBPAGE_INDEX_CGROUP,		/* reuse page->private */
 	SUBPAGE_INDEX_CGROUP_RSVD,	/* reuse page->private */
 	__MAX_CGROUP_SUBPAGE_INDEX = SUBPAGE_INDEX_CGROUP_RSVD,
+#endif
+#ifdef CONFIG_MEMORY_FAILURE
+	SUBPAGE_INDEX_HWPOISON,
 #endif
 	__NR_USED_SUBPAGE,
 };
@@ -551,7 +554,7 @@ generic_hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
  *	Synchronization:  Initially set after new page allocation with no
  *	locking.  When examined and modified during migration processing
  *	(isolate, migrate, putback) the hugetlb_lock is held.
- * HPG_temporary - - Set on a page that is temporarily allocated from the buddy
+ * HPG_temporary -- Set on a page that is temporarily allocated from the buddy
  *	allocator.  Typically used for migration target pages when no pages
  *	are available in the pool.  The hugetlb free page path will
  *	immediately free pages with this flag set to the buddy allocator.
@@ -561,6 +564,8 @@ generic_hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
  * HPG_freed - Set when page is on the free lists.
  *	Synchronization: hugetlb_lock held for examination and modification.
  * HPG_vmemmap_optimized - Set when the vmemmap pages of the page are freed.
+ * HPG_raw_hwp_unreliable - Set when the hugetlb page has a hwpoison sub-page
+ *     that is not tracked by raw_hwp_page list.
  */
 enum hugetlb_page_flags {
 	HPG_restore_reserve = 0,
@@ -568,6 +573,7 @@ enum hugetlb_page_flags {
 	HPG_temporary,
 	HPG_freed,
 	HPG_vmemmap_optimized,
+	HPG_raw_hwp_unreliable,
 	__NR_HPAGEFLAGS,
 };
 
@@ -614,6 +620,7 @@ HPAGEFLAG(Migratable, migratable)
 HPAGEFLAG(Temporary, temporary)
 HPAGEFLAG(Freed, freed)
 HPAGEFLAG(VmemmapOptimized, vmemmap_optimized)
+HPAGEFLAG(RawHwpUnreliable, raw_hwp_unreliable)
 
 #ifdef CONFIG_HUGETLB_PAGE
 
@@ -796,6 +803,15 @@ extern int dissolve_free_huge_page(struct page *page);
 extern int dissolve_free_huge_pages(unsigned long start_pfn,
 				    unsigned long end_pfn);
 
+#ifdef CONFIG_MEMORY_FAILURE
+extern int hugetlb_clear_page_hwpoison(struct page *hpage);
+#else
+static inline int hugetlb_clear_page_hwpoison(struct page *hpage)
+{
+	return 0;
+}
+#endif
+
 #ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
 #ifndef arch_hugetlb_migration_supported
 static inline bool arch_hugetlb_migration_supported(struct hstate *h)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 66bb39e0fce8..ccd470f0194c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1535,17 +1535,15 @@ static void __update_and_free_page(struct hstate *h, struct page *page)
 	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
 		return;
 
-	if (hugetlb_vmemmap_restore(h, page)) {
-		spin_lock_irq(&hugetlb_lock);
-		/*
-		 * If we cannot allocate vmemmap pages, just refuse to free the
-		 * page and put the page back on the hugetlb free list and treat
-		 * as a surplus page.
-		 */
-		add_hugetlb_page(h, page, true);
-		spin_unlock_irq(&hugetlb_lock);
-		return;
-	}
+	if (hugetlb_vmemmap_restore(h, page))
+		goto fail;
+
+	/*
+	 * Move PageHWPoison flag from head page to the raw error pages,
+	 * which makes any healthy subpages reusable.
+	 */
+	if (unlikely(PageHWPoison(page) && hugetlb_clear_page_hwpoison(page)))
+		goto fail;
 
 	for (i = 0; i < pages_per_huge_page(h);
 	     i++, subpage = mem_map_next(subpage, page, i)) {
@@ -1566,6 +1564,16 @@ static void __update_and_free_page(struct hstate *h, struct page *page)
 	} else {
 		__free_pages(page, huge_page_order(h));
 	}
+	return;
+fail:
+	spin_lock_irq(&hugetlb_lock);
+	/*
+	 * If we cannot allocate vmemmap pages or cannot identify raw hwpoison
+	 * subpages reliably, just refuse to free the page and put the page
+	 * back on the hugetlb free list and treat as a surplus page.
+	 */
+	add_hugetlb_page(h, page, true);
+	spin_unlock_irq(&hugetlb_lock);
 }
 
 /*
@@ -2109,15 +2117,6 @@ int dissolve_free_huge_page(struct page *page)
 		 */
 		rc = hugetlb_vmemmap_restore(h, head);
 		if (!rc) {
-			/*
-			 * Move PageHWPoison flag from head page to the raw
-			 * error page, which makes any subpages rather than
-			 * the error page reusable.
-			 */
-			if (PageHWPoison(head) && page != head) {
-				SetPageHWPoison(page);
-				ClearPageHWPoison(head);
-			}
 			update_and_free_page(h, head, false);
 		} else {
 			spin_lock_irq(&hugetlb_lock);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index c9931c676335..53bf7486a245 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1664,6 +1664,82 @@ int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
 EXPORT_SYMBOL_GPL(mf_dax_kill_procs);
 #endif /* CONFIG_FS_DAX */
 
+/*
+ * Struct raw_hwp_page represents information about "raw error page",
+ * constructing singly linked list originated from ->private field of
+ * SUBPAGE_INDEX_HWPOISON-th tail page.
+ */
+struct raw_hwp_page {
+	struct llist_node node;
+	struct page *page;
+};
+
+static inline struct llist_head *raw_hwp_list_head(struct page *hpage)
+{
+	return (struct llist_head *)&page_private(hpage + SUBPAGE_INDEX_HWPOISON);
+}
+
+static inline int hugetlb_set_page_hwpoison(struct page *hpage,
+					struct page *page)
+{
+	struct llist_head *head;
+	struct raw_hwp_page *raw_hwp;
+	struct llist_node *t, *tnode;
+	int ret;
+
+	/*
+	 * Once the hwpoison hugepage has lost reliable raw error info,
+	 * there is little meaning to keep additional error info precisely,
+	 * so skip to add additional raw error info.
+	 */
+	if (HPageRawHwpUnreliable(hpage))
+		return -EHWPOISON;
+	head = raw_hwp_list_head(hpage);
+	llist_for_each_safe(tnode, t, head->first) {
+		struct raw_hwp_page *p = container_of(tnode, struct raw_hwp_page, node);
+
+		if (p->page == page)
+			return -EHWPOISON;
+	}
+
+	ret = TestSetPageHWPoison(hpage) ? -EHWPOISON : 0;
+	/* the first error event will be counted in action_result(). */
+	if (ret)
+		num_poisoned_pages_inc();
+
+	raw_hwp = kmalloc(sizeof(struct raw_hwp_page), GFP_ATOMIC);
+	if (raw_hwp) {
+		raw_hwp->page = page;
+		llist_add(&raw_hwp->node, head);
+	} else {
+		/*
+		 * Failed to save raw error info.  We no longer trace all
+		 * hwpoisoned subpages, and we need refuse to free/dissolve
+		 * this hwpoisoned hugepage.
+		 */
+		SetHPageRawHwpUnreliable(hpage);
+	}
+	return ret;
+}
+
+inline int hugetlb_clear_page_hwpoison(struct page *hpage)
+{
+	struct llist_head *head;
+	struct llist_node *t, *tnode;
+
+	if (!HPageRawHwpUnreliable(hpage))
+		ClearPageHWPoison(hpage);
+	head = raw_hwp_list_head(hpage);
+	llist_for_each_safe(tnode, t, head->first) {
+		struct raw_hwp_page *p = container_of(tnode, struct raw_hwp_page, node);
+
+		SetPageHWPoison(p->page);
+		kfree(p);
+	}
+	llist_del_all(head);
+	return 0;
+}
+
 /*
  * Called from hugetlb code with hugetlb_lock held.
  *
@@ -1698,7 +1774,7 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
 		goto out;
 	}
 
-	if (TestSetPageHWPoison(head)) {
+	if (hugetlb_set_page_hwpoison(head, page)) {
 		ret = -EHWPOISON;
 		goto out;
 	}
@@ -1751,7 +1827,7 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
 	lock_page(head);
 
 	if (hwpoison_filter(p)) {
-		ClearPageHWPoison(head);
+		hugetlb_clear_page_hwpoison(head);
 		res = -EOPNOTSUPP;
 		goto out;
 	}
-- 
2.25.1



* [mm-unstable PATCH v4 5/9] mm, hwpoison: make unpoison aware of raw error info in hwpoisoned hugepage
  2022-07-04  1:33 [mm-unstable PATCH v4 0/9] mm, hwpoison: enable 1GB hugepage support (v4) Naoya Horiguchi
                   ` (3 preceding siblings ...)
  2022-07-04  1:33 ` [mm-unstable PATCH v4 4/9] mm, hwpoison, hugetlb: support saving mechanism of raw error pages Naoya Horiguchi
@ 2022-07-04  1:33 ` Naoya Horiguchi
  2022-07-06  2:58   ` Miaohe Lin
  2022-07-04  1:33 ` [mm-unstable PATCH v4 6/9] mm, hwpoison: set PG_hwpoison for busy hugetlb pages Naoya Horiguchi
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 30+ messages in thread
From: Naoya Horiguchi @ 2022-07-04  1:33 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, David Hildenbrand, Mike Kravetz, Miaohe Lin,
	Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	Naoya Horiguchi, linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

The raw error info list needs to be removed when a hwpoisoned hugetlb page
is unpoisoned.  And the unpoison handler needs to know how many errors there
are in the target hugepage.  So add them.

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
 include/linux/swapops.h |  9 +++++++++
 mm/memory-failure.c     | 31 +++++++++++++++++++++++++------
 2 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index a01aeb3fcc0b..ddc98f96ad2c 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -498,6 +498,11 @@ static inline void num_poisoned_pages_dec(void)
 	atomic_long_dec(&num_poisoned_pages);
 }
 
+static inline void num_poisoned_pages_sub(long i)
+{
+	atomic_long_sub(i, &num_poisoned_pages);
+}
+
 #else
 
 static inline swp_entry_t make_hwpoison_entry(struct page *page)
@@ -518,6 +523,10 @@ static inline struct page *hwpoison_entry_to_page(swp_entry_t entry)
 static inline void num_poisoned_pages_inc(void)
 {
 }
+
+static inline void num_poisoned_pages_sub(long i)
+{
+}
 #endif
 
 static inline int non_swap_entry(swp_entry_t entry)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 53bf7486a245..6af2096d8ea0 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1722,22 +1722,33 @@ static inline int hugetlb_set_page_hwpoison(struct page *hpage,
 	return ret;
 }
 
-inline int hugetlb_clear_page_hwpoison(struct page *hpage)
+static inline long free_raw_hwp_pages(struct page *hpage, bool move_flag)
 {
 	struct llist_head *head;
 	struct llist_node *t, *tnode;
+	long count = 0;
 
-	if (!HPageRawHwpUnreliable(hpage))
-		ClearPageHWPoison(hpage);
 	head = raw_hwp_list_head(hpage);
 	llist_for_each_safe(tnode, t, head->first) {
 		struct raw_hwp_page *p = container_of(tnode, struct raw_hwp_page, node);
 
-		SetPageHWPoison(p->page);
+		if (move_flag)
+			SetPageHWPoison(p->page);
 		kfree(p);
+		count++;
 	}
 	llist_del_all(head);
-	return 0;
+	return count;
+}
+
+inline int hugetlb_clear_page_hwpoison(struct page *hpage)
+{
+	int ret = -EBUSY;
+
+	if (!HPageRawHwpUnreliable(hpage))
+		ret = !TestClearPageHWPoison(hpage);
+	free_raw_hwp_pages(hpage, true);
+	return ret;
 }
 
 /*
@@ -1882,6 +1893,9 @@ static inline int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *
 	return 0;
 }
 
+static inline void free_raw_hwp_pages(struct page *hpage, bool move_flag)
+{
+}
 #endif	/* CONFIG_HUGETLB_PAGE */
 
 static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
@@ -2287,6 +2301,7 @@ int unpoison_memory(unsigned long pfn)
 	struct page *p;
 	int ret = -EBUSY;
 	int freeit = 0;
+	long count = 1;
 	static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
 					DEFAULT_RATELIMIT_BURST);
 
@@ -2334,6 +2349,8 @@ int unpoison_memory(unsigned long pfn)
 
 	ret = get_hwpoison_page(p, MF_UNPOISON);
 	if (!ret) {
+		if (PageHuge(p))
+			count = free_raw_hwp_pages(page, false);
 		ret = TestClearPageHWPoison(page) ? 0 : -EBUSY;
 	} else if (ret < 0) {
 		if (ret == -EHWPOISON) {
@@ -2342,6 +2359,8 @@ int unpoison_memory(unsigned long pfn)
 			unpoison_pr_info("Unpoison: failed to grab page %#lx\n",
 					 pfn, &unpoison_rs);
 	} else {
+		if (PageHuge(p))
+			count = free_raw_hwp_pages(page, false);
 		freeit = !!TestClearPageHWPoison(p);
 
 		put_page(page);
@@ -2354,7 +2373,7 @@ int unpoison_memory(unsigned long pfn)
 unlock_mutex:
 	mutex_unlock(&mf_mutex);
 	if (!ret || freeit) {
-		num_poisoned_pages_dec();
+		num_poisoned_pages_sub(count);
 		unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
 				 page_to_pfn(p), &unpoison_rs);
 	}
-- 
2.25.1



* [mm-unstable PATCH v4 6/9] mm, hwpoison: set PG_hwpoison for busy hugetlb pages
  2022-07-04  1:33 [mm-unstable PATCH v4 0/9] mm, hwpoison: enable 1GB hugepage support (v4) Naoya Horiguchi
                   ` (4 preceding siblings ...)
  2022-07-04  1:33 ` [mm-unstable PATCH v4 5/9] mm, hwpoison: make unpoison aware of raw error info in hwpoisoned hugepage Naoya Horiguchi
@ 2022-07-04  1:33 ` Naoya Horiguchi
  2022-07-04  1:33 ` [mm-unstable PATCH v4 7/9] mm, hwpoison: make __page_handle_poison returns int Naoya Horiguchi
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 30+ messages in thread
From: Naoya Horiguchi @ 2022-07-04  1:33 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, David Hildenbrand, Mike Kravetz, Miaohe Lin,
	Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	Naoya Horiguchi, linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

If memory_failure() fails to grab the page refcount on a hugetlb page
because it's busy, it returns without setting PG_hwpoison on it.
This not only loses a chance of error containment, but breaks the rule
that action_result() should be called only when memory_failure() does
some handling work (even if that's just setting PG_hwpoison).
This inconsistency could harm code maintainability.

So set PG_hwpoison and call hugetlb_set_page_hwpoison() for such a case.

Fixes: 405ce051236c ("mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
---
 include/linux/mm.h  | 1 +
 mm/memory-failure.c | 8 ++++----
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 433bde7dcbf2..22f2dfe41c99 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3235,6 +3235,7 @@ enum mf_flags {
 	MF_SOFT_OFFLINE = 1 << 3,
 	MF_UNPOISON = 1 << 4,
 	MF_SW_SIMULATED = 1 << 5,
+	MF_NO_RETRY = 1 << 6,
 };
 int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
 		      unsigned long count, int mf_flags);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 6af2096d8ea0..4233b21328a5 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1782,7 +1782,8 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
 			count_increased = true;
 	} else {
 		ret = -EBUSY;
-		goto out;
+		if (!(flags & MF_NO_RETRY))
+			goto out;
 	}
 
 	if (hugetlb_set_page_hwpoison(head, page)) {
@@ -1810,7 +1811,6 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
 	struct page *p = pfn_to_page(pfn);
 	struct page *head;
 	unsigned long page_flags;
-	bool retry = true;
 
 	*hugetlb = 1;
 retry:
@@ -1826,8 +1826,8 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
 		}
 		return res;
 	} else if (res == -EBUSY) {
-		if (retry) {
-			retry = false;
+		if (!(flags & MF_NO_RETRY)) {
+			flags |= MF_NO_RETRY;
 			goto retry;
 		}
 		action_result(pfn, MF_MSG_UNKNOWN, MF_IGNORED);
-- 
2.25.1



* [mm-unstable PATCH v4 7/9] mm, hwpoison: make __page_handle_poison returns int
  2022-07-04  1:33 [mm-unstable PATCH v4 0/9] mm, hwpoison: enable 1GB hugepage support (v4) Naoya Horiguchi
                   ` (5 preceding siblings ...)
  2022-07-04  1:33 ` [mm-unstable PATCH v4 6/9] mm, hwpoison: set PG_hwpoison for busy hugetlb pages Naoya Horiguchi
@ 2022-07-04  1:33 ` Naoya Horiguchi
  2022-07-04  1:33 ` [mm-unstable PATCH v4 8/9] mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage Naoya Horiguchi
  2022-07-04  1:33 ` [mm-unstable PATCH v4 9/9] mm, hwpoison: enable memory error handling on " Naoya Horiguchi
  8 siblings, 0 replies; 30+ messages in thread
From: Naoya Horiguchi @ 2022-07-04  1:33 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, David Hildenbrand, Mike Kravetz, Miaohe Lin,
	Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	Naoya Horiguchi, linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

__page_handle_poison() currently returns a bool that shows whether
take_page_off_buddy() has succeeded or not.  But we will want to
distinguish another case, "dissolve succeeded but taking off failed",
by its return value.  So change the type of the return value.
No functional change.
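
For example, a minimal sketch of how a caller can use the new return value
(based on the three-way contract documented in the comment added below, not
taken from the actual callers):

	int ret = __page_handle_poison(page);

	if (ret > 0) {
		/* dissolved (if needed) and taken off from buddy */
	} else if (ret == 0) {
		/* dissolved (if needed) but not taken off from buddy */
	} else {
		/* failed to dissolve */
	}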

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
---
v2 -> v3:
- move deleting "res = MF_FAILED" to the later patch. (by Miaohe)
---
 mm/memory-failure.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 4233b21328a5..c8939a39fbe6 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -71,7 +71,13 @@ atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
 
 static bool hw_memory_failure __read_mostly = false;
 
-static bool __page_handle_poison(struct page *page)
+/*
+ * Return values:
+ *   1:   the page is dissolved (if needed) and taken off from buddy,
+ *   0:   the page is dissolved (if needed) and not taken off from buddy,
+ *   < 0: failed to dissolve.
+ */
+static int __page_handle_poison(struct page *page)
 {
 	int ret;
 
@@ -81,7 +87,7 @@ static bool __page_handle_poison(struct page *page)
 		ret = take_page_off_buddy(page);
 	zone_pcp_enable(page_zone(page));
 
-	return ret > 0;
+	return ret;
 }
 
 static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release)
@@ -91,7 +97,7 @@ static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, boo
 		 * Doing this check for free pages is also fine since dissolve_free_huge_page
 		 * returns 0 for non-hugetlb pages as well.
 		 */
-		if (!__page_handle_poison(page))
+		if (__page_handle_poison(page) <= 0)
 			/*
 			 * We could fail to take off the target page from buddy
 			 * for example due to racy page allocation, but that's
@@ -1086,7 +1092,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
 		 * subpages.
 		 */
 		put_page(hpage);
-		if (__page_handle_poison(p)) {
+		if (__page_handle_poison(p) > 0) {
 			page_ref_inc(p);
 			res = MF_RECOVERED;
 		}
@@ -1850,7 +1856,7 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
 	if (res == 0) {
 		unlock_page(head);
 		res = MF_FAILED;
-		if (__page_handle_poison(p)) {
+		if (__page_handle_poison(p) > 0) {
 			page_ref_inc(p);
 			res = MF_RECOVERED;
 		}
-- 
2.25.1



* [mm-unstable PATCH v4 8/9] mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage
  2022-07-04  1:33 [mm-unstable PATCH v4 0/9] mm, hwpoison: enable 1GB hugepage support (v4) Naoya Horiguchi
                   ` (6 preceding siblings ...)
  2022-07-04  1:33 ` [mm-unstable PATCH v4 7/9] mm, hwpoison: make __page_handle_poison returns int Naoya Horiguchi
@ 2022-07-04  1:33 ` Naoya Horiguchi
  2022-07-04  1:33 ` [mm-unstable PATCH v4 9/9] mm, hwpoison: enable memory error handling on " Naoya Horiguchi
  8 siblings, 0 replies; 30+ messages in thread
From: Naoya Horiguchi @ 2022-07-04  1:33 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, David Hildenbrand, Mike Kravetz, Miaohe Lin,
	Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	Naoya Horiguchi, linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

Currently if memory_failure() (modified to remove the blocking code by a
subsequent patch) is called on a page in some 1GB hugepage, memory error
handling fails and the raw error page gets into a leaked state.  The impact
is small in production systems (just a single leaked 4kB page), but this
limits testability because unpoison doesn't work for it.
We can no longer create a 1GB hugepage on the 1GB physical address range
with such leaked pages, which is inconvenient when testing on small systems.

When a hwpoison page in a 1GB hugepage is handled, it's caught by the
PageHWPoison check in free_pages_prepare() because the 1GB hugepage is
broken down into raw error pages before reaching this point:

        if (unlikely(PageHWPoison(page)) && !order) {
                ...
                return false;
        }

Then, the page is not sent to buddy and the page refcount is left at 0.

Originally this check was supposed to trigger when the error page is freed
from page_handle_poison() (which is called from soft-offline), but now we
are opening another path that reaches it, so the callers of
__page_handle_poison() need to handle this case by treating the return
value 0 as success.  Then the page refcount for hwpoison is properly
incremented so unpoison works.

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
---
v2 -> v3:
- remove "res = MF_FAILED" in try_memory_failure_hugetlb (by Miaohe)
---
 mm/memory-failure.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index c8939a39fbe6..f095d55f40bc 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1084,7 +1084,6 @@ static int me_huge_page(struct page_state *ps, struct page *p)
 		res = truncate_error_page(hpage, page_to_pfn(p), mapping);
 		unlock_page(hpage);
 	} else {
-		res = MF_FAILED;
 		unlock_page(hpage);
 		/*
 		 * migration entry prevents later access on error hugepage,
@@ -1092,9 +1091,11 @@ static int me_huge_page(struct page_state *ps, struct page *p)
 		 * subpages.
 		 */
 		put_page(hpage);
-		if (__page_handle_poison(p) > 0) {
+		if (__page_handle_poison(p) >= 0) {
 			page_ref_inc(p);
 			res = MF_RECOVERED;
+		} else {
+			res = MF_FAILED;
 		}
 	}
 
@@ -1855,10 +1856,11 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
 	 */
 	if (res == 0) {
 		unlock_page(head);
-		res = MF_FAILED;
-		if (__page_handle_poison(p) > 0) {
+		if (__page_handle_poison(p) >= 0) {
 			page_ref_inc(p);
 			res = MF_RECOVERED;
+		} else {
+			res = MF_FAILED;
 		}
 		action_result(pfn, MF_MSG_FREE_HUGE, res);
 		return res == MF_RECOVERED ? 0 : -EBUSY;
-- 
2.25.1



* [mm-unstable PATCH v4 9/9] mm, hwpoison: enable memory error handling on 1GB hugepage
  2022-07-04  1:33 [mm-unstable PATCH v4 0/9] mm, hwpoison: enable 1GB hugepage support (v4) Naoya Horiguchi
                   ` (7 preceding siblings ...)
  2022-07-04  1:33 ` [mm-unstable PATCH v4 8/9] mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage Naoya Horiguchi
@ 2022-07-04  1:33 ` Naoya Horiguchi
  8 siblings, 0 replies; 30+ messages in thread
From: Naoya Horiguchi @ 2022-07-04  1:33 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, David Hildenbrand, Mike Kravetz, Miaohe Lin,
	Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	Naoya Horiguchi, linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

Now the error handling code is prepared, so remove the blocking code and
enable memory error handling on 1GB hugepages.

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
---
 include/linux/mm.h      |  1 -
 include/ras/ras_event.h |  1 -
 mm/memory-failure.c     | 16 ----------------
 3 files changed, 18 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 22f2dfe41c99..d084ce57c7a6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3288,7 +3288,6 @@ enum mf_action_page_type {
 	MF_MSG_DIFFERENT_COMPOUND,
 	MF_MSG_HUGE,
 	MF_MSG_FREE_HUGE,
-	MF_MSG_NON_PMD_HUGE,
 	MF_MSG_UNMAP_FAILED,
 	MF_MSG_DIRTY_SWAPCACHE,
 	MF_MSG_CLEAN_SWAPCACHE,
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index d0337a41141c..cbd3ddd7c33d 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -360,7 +360,6 @@ TRACE_EVENT(aer_event,
 	EM ( MF_MSG_DIFFERENT_COMPOUND, "different compound page after locking" ) \
 	EM ( MF_MSG_HUGE, "huge page" )					\
 	EM ( MF_MSG_FREE_HUGE, "free huge page" )			\
-	EM ( MF_MSG_NON_PMD_HUGE, "non-pmd-sized huge page" )		\
 	EM ( MF_MSG_UNMAP_FAILED, "unmapping failed page" )		\
 	EM ( MF_MSG_DIRTY_SWAPCACHE, "dirty swapcache page" )		\
 	EM ( MF_MSG_CLEAN_SWAPCACHE, "clean swapcache page" )		\
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index f095d55f40bc..ba24b72b8764 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -765,7 +765,6 @@ static const char * const action_page_types[] = {
 	[MF_MSG_DIFFERENT_COMPOUND]	= "different compound page after locking",
 	[MF_MSG_HUGE]			= "huge page",
 	[MF_MSG_FREE_HUGE]		= "free huge page",
-	[MF_MSG_NON_PMD_HUGE]		= "non-pmd-sized huge page",
 	[MF_MSG_UNMAP_FAILED]		= "unmapping failed page",
 	[MF_MSG_DIRTY_SWAPCACHE]	= "dirty swapcache page",
 	[MF_MSG_CLEAN_SWAPCACHE]	= "clean swapcache page",
@@ -1868,21 +1867,6 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
 
 	page_flags = head->flags;
 
-	/*
-	 * TODO: hwpoison for pud-sized hugetlb doesn't work right now, so
-	 * simply disable it. In order to make it work properly, we need
-	 * make sure that:
-	 *  - conversion of a pud that maps an error hugetlb into hwpoison
-	 *    entry properly works, and
-	 *  - other mm code walking over page table is aware of pud-aligned
-	 *    hwpoison entries.
-	 */
-	if (huge_page_size(page_hstate(head)) > PMD_SIZE) {
-		action_result(pfn, MF_MSG_NON_PMD_HUGE, MF_IGNORED);
-		res = -EBUSY;
-		goto out;
-	}
-
 	if (!hwpoison_user_mappings(p, pfn, flags, head)) {
 		action_result(pfn, MF_MSG_UNMAP_FAILED, MF_IGNORED);
 		res = -EBUSY;
-- 
2.25.1



* Re: [mm-unstable PATCH v4 2/9] mm/hugetlb: separate path for hwpoison entry in copy_hugetlb_page_range()
  2022-07-04  1:33 ` [mm-unstable PATCH v4 2/9] mm/hugetlb: separate path for hwpoison entry in copy_hugetlb_page_range() Naoya Horiguchi
@ 2022-07-04  1:42   ` Andrew Morton
  2022-07-04  2:04     ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Morton @ 2022-07-04  1:42 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, David Hildenbrand, Mike Kravetz, Miaohe Lin,
	Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	Naoya Horiguchi, linux-kernel

On Mon,  4 Jul 2022 10:33:05 +0900 Naoya Horiguchi <naoya.horiguchi@linux.dev> wrote:

> Originally copy_hugetlb_page_range() handles migration entries and hwpoisoned
> entries in similar manner.  But recently the related code path has more code
> for migration entries, and when is_writable_migration_entry() was converted
> to !is_readable_migration_entry(), hwpoison entries on source processes got
> to be unexpectedly updated (which is legitimate for migration entries, but
> not for hwpoison entries).  This results in unexpected serious issues like
> kernel panic when forking processes with hwpoison entries in pmd.
> 
> Separate the if branch into one for hwpoison entries and one for migration
> entries.
> 
> ...
>
> Cc: <stable@vger.kernel.org> # 5.18

It's unusual to have a cc:stable patch in the middle of a series like
this.  One would expect the fix to be a standalone thing against
current -linus.

As presented, this patch won't get into mainline until after 5.20-rc1. 
If that's OK then OK.  Otherwise I can shuffle things around and stage
this patch in mm-hotfixes?



* Re: [mm-unstable PATCH v4 2/9] mm/hugetlb: separate path for hwpoison entry in copy_hugetlb_page_range()
  2022-07-04  1:42   ` Andrew Morton
@ 2022-07-04  2:04     ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 0 replies; 30+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-07-04  2:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Naoya Horiguchi, linux-mm, David Hildenbrand, Mike Kravetz,
	Miaohe Lin, Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	linux-kernel

On Sun, Jul 03, 2022 at 06:42:59PM -0700, Andrew Morton wrote:
> On Mon,  4 Jul 2022 10:33:05 +0900 Naoya Horiguchi <naoya.horiguchi@linux.dev> wrote:
> 
> > Originally copy_hugetlb_page_range() handles migration entries and hwpoisoned
> > entries in similar manner.  But recently the related code path has more code
> > for migration entries, and when is_writable_migration_entry() was converted
> > to !is_readable_migration_entry(), hwpoison entries on source processes got
> > to be unexpectedly updated (which is legitimate for migration entries, but
> > not for hwpoison entries).  This results in unexpected serious issues like
> > kernel panic when forking processes with hwpoison entries in pmd.
> > 
> > Separate the if branch into one for hwpoison entries and one for migration
> > entries.
> > 
> > ...
> >
> > Cc: <stable@vger.kernel.org> # 5.18
> 
> It's unusual to have a cc:stable patch in the middle of a series like
> this.  One would expect the fix to be a standalone thing against
> current -linus.

Ah, OK, I should've submitted this separately.

> 
> As presented, this patch won't get into mainline until after 5.20-rc1. 
> If that's OK then OK.  Otherwise I can shuffle things around and stage
> this patch in mm-hotfixes?

Yes, I'd like to ask you to do it. Thank you for the arrangement.

- Naoya Horiguchi


* Re: [mm-unstable PATCH v4 1/9] mm/hugetlb: check gigantic_page_runtime_supported() in return_unused_surplus_pages()
  2022-07-04  1:33 ` [mm-unstable PATCH v4 1/9] mm/hugetlb: check gigantic_page_runtime_supported() in return_unused_surplus_pages() Naoya Horiguchi
@ 2022-07-05  2:16   ` Miaohe Lin
  2022-07-05  6:39     ` HORIGUCHI NAOYA(堀口 直也)
  2022-07-06 21:51   ` Mike Kravetz
  1 sibling, 1 reply; 30+ messages in thread
From: Miaohe Lin @ 2022-07-05  2:16 UTC (permalink / raw)
  To: Naoya Horiguchi, linux-mm
  Cc: Andrew Morton, David Hildenbrand, Mike Kravetz, Liu Shixin,
	Yang Shi, Oscar Salvador, Muchun Song, Naoya Horiguchi,
	linux-kernel

On 2022/7/4 9:33, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> I found a weird state of 1GB hugepage pool, caused by the following
> procedure:
> 
>   - run a process reserving all free 1GB hugepages,
>   - shrink free 1GB hugepage pool to zero (i.e. writing 0 to
>     /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages), then
>   - kill the reserving process.
> 
> , then all the hugepages are free *and* surplus at the same time.
> 
>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
>   3
>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/free_hugepages
>   3
>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/resv_hugepages
>   0
>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/surplus_hugepages
>   3
> 
> This state is resolved by reserving and allocating the pages then
> freeing them again, so this seems not to result in serious problem.
> But it's a little surprising (shrinking pool suddenly fails).
> 
> This behavior is caused by hstate_is_gigantic() check in
> return_unused_surplus_pages(). This was introduced so long ago in 2008
> by commit aa888a74977a ("hugetlb: support larger than MAX_ORDER"), and
> at that time the gigantic pages were not supposed to be allocated/freed
> at run-time.  Now kernel can support runtime allocation/free, so let's
> check gigantic_page_runtime_supported() together.
> 
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

This patch looks good to me, with a few questions below.

> ---
> v2 -> v3:
> - Fixed typo in patch description,
> - add !gigantic_page_runtime_supported() check instead of removing
>   hstate_is_gigantic() check (suggested by Miaohe and Muchun)
> - add a few more !gigantic_page_runtime_supported() check in
>   set_max_huge_pages() (by Mike).
> ---
>  mm/hugetlb.c | 19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 2a554f006255..bdc4499f324b 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2432,8 +2432,7 @@ static void return_unused_surplus_pages(struct hstate *h,
>  	/* Uncommit the reservation */
>  	h->resv_huge_pages -= unused_resv_pages;
>  
> -	/* Cannot return gigantic pages currently */
> -	if (hstate_is_gigantic(h))
> +	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
>  		goto out;
>  
>  	/*
> @@ -3315,7 +3314,8 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
>  	 * the user tries to allocate gigantic pages but let the user free the
>  	 * boottime allocated gigantic pages.
>  	 */
> -	if (hstate_is_gigantic(h) && !IS_ENABLED(CONFIG_CONTIG_ALLOC)) {
> +	if (hstate_is_gigantic(h) && (!IS_ENABLED(CONFIG_CONTIG_ALLOC) ||
> +				      !gigantic_page_runtime_supported())) {
>  		if (count > persistent_huge_pages(h)) {
>  			spin_unlock_irq(&hugetlb_lock);
>  			mutex_unlock(&h->resize_lock);
> @@ -3363,6 +3363,19 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
>  			goto out;
>  	}
>  
> +	/*
> +	 * We can not decrease gigantic pool size if runtime modification
> +	 * is not supported.
> +	 */
> +	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported()) {
> +		if (count < persistent_huge_pages(h)) {
> +			spin_unlock_irq(&hugetlb_lock);
> +			mutex_unlock(&h->resize_lock);
> +			NODEMASK_FREE(node_alloc_noretry);
> +			return -EINVAL;
> +		}
> +	}

With the above change, we're not allowed to decrease the pool size now. But it was
allowed previously even if !gigantic_page_runtime_supported(). Will this break users?

And it seems it's not allowed to adjust max_huge_pages now if
!gigantic_page_runtime_supported() for gigantic huge pages. Should we just return in
such a case, as there should be nothing to do now? Or am I missing something?

Thanks!

> +
>  	/*
>  	 * Decrease the pool size
>  	 * First return free pages to the buddy allocator (being careful
> 



* Re: [mm-unstable PATCH v4 3/9] mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry
  2022-07-04  1:33 ` [mm-unstable PATCH v4 3/9] mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry Naoya Horiguchi
@ 2022-07-05  2:46   ` Miaohe Lin
  2022-07-05  9:04     ` HORIGUCHI NAOYA(堀口 直也)
  2022-07-06 22:21   ` Mike Kravetz
  1 sibling, 1 reply; 30+ messages in thread
From: Miaohe Lin @ 2022-07-05  2:46 UTC (permalink / raw)
  To: Naoya Horiguchi, linux-mm
  Cc: Andrew Morton, David Hildenbrand, Mike Kravetz, Liu Shixin,
	Yang Shi, Oscar Salvador, Muchun Song, Naoya Horiguchi,
	linux-kernel

On 2022/7/4 9:33, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> follow_pud_mask() does not support non-present pud entry now.  As long as
> I tested on x86_64 server, follow_pud_mask() still simply returns
> no_page_table() for non-present_pud_entry() due to pud_bad(), so no severe
> user-visible effect should happen.  But generally we should call
> follow_huge_pud() for non-present pud entry for 1GB hugetlb page.
> 
> Update pud_huge() and follow_huge_pud() to handle non-present pud entries.
> The changes are similar to previous works for pud entries commit e66f17ff7177
> ("mm/hugetlb: take page table lock in follow_huge_pmd()") and commit
> cbef8478bee5 ("mm/hugetlb: pmd_huge() returns true for non-present hugepage").
> 
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> ---
> v2 -> v3:
> - fixed typos in subject and description,
> - added comment on pud_huge(),
> - added comment about fallback for hwpoisoned entry,
> - updated initial check about FOLL_{PIN,GET} flags.
> ---
>  arch/x86/mm/hugetlbpage.c |  8 +++++++-
>  mm/hugetlb.c              | 32 ++++++++++++++++++++++++++++++--
>  2 files changed, 37 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
> index 509408da0da1..6b3033845c6d 100644
> --- a/arch/x86/mm/hugetlbpage.c
> +++ b/arch/x86/mm/hugetlbpage.c
> @@ -30,9 +30,15 @@ int pmd_huge(pmd_t pmd)
>  		(pmd_val(pmd) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
>  }
>  
> +/*
> + * pud_huge() returns 1 if @pud is hugetlb related entry, that is normal
> + * hugetlb entry or non-present (migration or hwpoisoned) hugetlb entry.
> + * Otherwise, returns 0.
> + */
>  int pud_huge(pud_t pud)
>  {
> -	return !!(pud_val(pud) & _PAGE_PSE);
> +	return !pud_none(pud) &&
> +		(pud_val(pud) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
>  }

Question: Is aarch64 supported too? It seems the aarch64 version of pud_huge()
naturally matches the requirement to me.

Anyway, this patch looks good to me.

Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>

Thanks.

>  
>  #ifdef CONFIG_HUGETLB_PAGE
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index ad621688370b..66bb39e0fce8 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6994,10 +6994,38 @@ struct page * __weak
>  follow_huge_pud(struct mm_struct *mm, unsigned long address,
>  		pud_t *pud, int flags)
>  {
> -	if (flags & (FOLL_GET | FOLL_PIN))
> +	struct page *page = NULL;
> +	spinlock_t *ptl;
> +	pte_t pte;
> +
> +	if (WARN_ON_ONCE(flags & FOLL_PIN))
>  		return NULL;
>  
> -	return pte_page(*(pte_t *)pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
> +retry:
> +	ptl = huge_pte_lock(hstate_sizelog(PUD_SHIFT), mm, (pte_t *)pud);
> +	if (!pud_huge(*pud))
> +		goto out;
> +	pte = huge_ptep_get((pte_t *)pud);
> +	if (pte_present(pte)) {
> +		page = pud_page(*pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
> +		if (WARN_ON_ONCE(!try_grab_page(page, flags))) {
> +			page = NULL;
> +			goto out;
> +		}
> +	} else {
> +		if (is_hugetlb_entry_migration(pte)) {
> +			spin_unlock(ptl);
> +			__migration_entry_wait(mm, (pte_t *)pud, ptl);
> +			goto retry;
> +		}
> +		/*
> +		 * hwpoisoned entry is treated as no_page_table in
> +		 * follow_page_mask().
> +		 */
> +	}
> +out:
> +	spin_unlock(ptl);
> +	return page;
>  }
>  
>  struct page * __weak
> 



* Re: [mm-unstable PATCH v4 1/9] mm/hugetlb: check gigantic_page_runtime_supported() in return_unused_surplus_pages()
  2022-07-05  2:16   ` Miaohe Lin
@ 2022-07-05  6:39     ` HORIGUCHI NAOYA(堀口 直也)
  2022-07-06  3:04       ` Miaohe Lin
  0 siblings, 1 reply; 30+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-07-05  6:39 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Naoya Horiguchi, linux-mm, Andrew Morton, David Hildenbrand,
	Mike Kravetz, Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	linux-kernel

On Tue, Jul 05, 2022 at 10:16:39AM +0800, Miaohe Lin wrote:
> On 2022/7/4 9:33, Naoya Horiguchi wrote:
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > 
> > I found a weird state of 1GB hugepage pool, caused by the following
> > procedure:
> > 
> >   - run a process reserving all free 1GB hugepages,
> >   - shrink free 1GB hugepage pool to zero (i.e. writing 0 to
> >     /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages), then
> >   - kill the reserving process.
> > 
> > , then all the hugepages are free *and* surplus at the same time.
> > 
> >   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
> >   3
> >   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/free_hugepages
> >   3
> >   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/resv_hugepages
> >   0
> >   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/surplus_hugepages
> >   3
> > 
> > This state is resolved by reserving and allocating the pages then
> > freeing them again, so this seems not to result in serious problem.
> > But it's a little surprising (shrinking pool suddenly fails).
> > 
> > This behavior is caused by hstate_is_gigantic() check in
> > return_unused_surplus_pages(). This was introduced so long ago in 2008
> > by commit aa888a74977a ("hugetlb: support larger than MAX_ORDER"), and
> > at that time the gigantic pages were not supposed to be allocated/freed
> > at run-time.  Now kernel can support runtime allocation/free, so let's
> > check gigantic_page_runtime_supported() together.
> > 
> > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> This patch looks good to me with a few question below.

Thank you for reviewing.

> 
> > ---
> > v2 -> v3:
> > - Fixed typo in patch description,
> > - add !gigantic_page_runtime_supported() check instead of removing
> >   hstate_is_gigantic() check (suggested by Miaohe and Muchun)
> > - add a few more !gigantic_page_runtime_supported() check in
> >   set_max_huge_pages() (by Mike).
> > ---
> >  mm/hugetlb.c | 19 ++++++++++++++++---
> >  1 file changed, 16 insertions(+), 3 deletions(-)
> > 
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 2a554f006255..bdc4499f324b 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -2432,8 +2432,7 @@ static void return_unused_surplus_pages(struct hstate *h,
> >  	/* Uncommit the reservation */
> >  	h->resv_huge_pages -= unused_resv_pages;
> >  
> > -	/* Cannot return gigantic pages currently */
> > -	if (hstate_is_gigantic(h))
> > +	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
> >  		goto out;
> >  
> >  	/*
> > @@ -3315,7 +3314,8 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
> >  	 * the user tries to allocate gigantic pages but let the user free the
> >  	 * boottime allocated gigantic pages.
> >  	 */
> > -	if (hstate_is_gigantic(h) && !IS_ENABLED(CONFIG_CONTIG_ALLOC)) {
> > +	if (hstate_is_gigantic(h) && (!IS_ENABLED(CONFIG_CONTIG_ALLOC) ||
> > +				      !gigantic_page_runtime_supported())) {
> >  		if (count > persistent_huge_pages(h)) {
> >  			spin_unlock_irq(&hugetlb_lock);
> >  			mutex_unlock(&h->resize_lock);
> > @@ -3363,6 +3363,19 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
> >  			goto out;
> >  	}
> >  
> > +	/*
> > +	 * We can not decrease gigantic pool size if runtime modification
> > +	 * is not supported.
> > +	 */
> > +	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported()) {
> > +		if (count < persistent_huge_pages(h)) {
> > +			spin_unlock_irq(&hugetlb_lock);
> > +			mutex_unlock(&h->resize_lock);
> > +			NODEMASK_FREE(node_alloc_noretry);
> > +			return -EINVAL;
> > +		}
> > +	}
> 
> With above change, we're not allowed to decrease the pool size now. But it was allowed previously
> even if !gigantic_page_runtime_supported. Does this will break user?

Yes, it does. I might have gotten the wrong idea about the definition of
gigantic_page_runtime_supported(), which indicates whether runtime pool
*extension* is supported (implying that pool shrinking is always possible).
If this is right, this new if-block is not necessary.

> 
> And it seems it's not allowed to adjust the max_huge_pages now if !gigantic_page_runtime_supported
> for gigantic huge page. Should we just return for such case as there should be nothing to do now?
> Or am I miss something?

If pool shrinking is always allowed, we need to update max_huge_pages, so
the above if-block should have "goto out;".  But it will be removed anyway,
so we don't have to care about it.

Thank you for the valuable comment.

- Naoya Horiguchi

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [mm-unstable PATCH v4 3/9] mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry
  2022-07-05  2:46   ` Miaohe Lin
@ 2022-07-05  9:04     ` HORIGUCHI NAOYA(堀口 直也)
  2022-07-06  3:07       ` Miaohe Lin
  0 siblings, 1 reply; 30+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-07-05  9:04 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Naoya Horiguchi, linux-mm, Andrew Morton, David Hildenbrand,
	Mike Kravetz, Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	linux-kernel

On Tue, Jul 05, 2022 at 10:46:09AM +0800, Miaohe Lin wrote:
> On 2022/7/4 9:33, Naoya Horiguchi wrote:
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > 
> > follow_pud_mask() does not support non-present pud entry now.  As long as
> > I tested on x86_64 server, follow_pud_mask() still simply returns
> > no_page_table() for non-present_pud_entry() due to pud_bad(), so no severe
> > user-visible effect should happen.  But generally we should call
> > follow_huge_pud() for non-present pud entry for 1GB hugetlb page.
> > 
> > Update pud_huge() and follow_huge_pud() to handle non-present pud entries.
> > The changes are similar to previous works for pud entries commit e66f17ff7177
> > ("mm/hugetlb: take page table lock in follow_huge_pmd()") and commit
> > cbef8478bee5 ("mm/hugetlb: pmd_huge() returns true for non-present hugepage").
> > 
> > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > ---
> > v2 -> v3:
> > - fixed typos in subject and description,
> > - added comment on pud_huge(),
> > - added comment about fallback for hwpoisoned entry,
> > - updated initial check about FOLL_{PIN,GET} flags.
> > ---
> >  arch/x86/mm/hugetlbpage.c |  8 +++++++-
> >  mm/hugetlb.c              | 32 ++++++++++++++++++++++++++++++--
> >  2 files changed, 37 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
> > index 509408da0da1..6b3033845c6d 100644
> > --- a/arch/x86/mm/hugetlbpage.c
> > +++ b/arch/x86/mm/hugetlbpage.c
> > @@ -30,9 +30,15 @@ int pmd_huge(pmd_t pmd)
> >  		(pmd_val(pmd) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
> >  }
> >  
> > +/*
> > + * pud_huge() returns 1 if @pud is hugetlb related entry, that is normal
> > + * hugetlb entry or non-present (migration or hwpoisoned) hugetlb entry.
> > + * Otherwise, returns 0.
> > + */
> >  int pud_huge(pud_t pud)
> >  {
> > -	return !!(pud_val(pud) & _PAGE_PSE);
> > +	return !pud_none(pud) &&
> > +		(pud_val(pud) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
> >  }
> 
> Question: Is aarch64 supported too? It seems aarch64 version of pud_huge matches
> the requirement naturally for me.

I think that if pmd_huge() and pud_huge() return true for non-present
pmd/pud entries, that's OK.  Otherwise they need to be updated to support
the new feature.

On aarch64, the bits in pte/pmd/pud related to {pmd,pud}_present() and
{pmd,pud}_huge() seem not to overlap with the bit range for the swap type
and swap offset, so maybe that's fine.  But I recommend testing on arm64
if you have access to aarch64 servers.
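
For reference, the arm64 version currently looks roughly like below (quoted
from memory, so please double-check against the actual tree).  It returns
true for any non-empty, non-table pud entry, so non-present hugetlb entries
should be covered without further change:

  int pud_huge(pud_t pud)
  {
  #ifndef __PAGETABLE_PMD_FOLDED
          return pud_val(pud) && !(pud_val(pud) & PUD_TABLE_BIT);
  #else
          return 0;
  #endif
  }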

> 
> Anyway, this patch looks good to me.
> 
> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>

Thank you for reviewing.

- Naoya Horiguchi

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [mm-unstable PATCH v4 4/9] mm, hwpoison, hugetlb: support saving mechanism of raw error pages
  2022-07-04  1:33 ` [mm-unstable PATCH v4 4/9] mm, hwpoison, hugetlb: support saving mechanism of raw error pages Naoya Horiguchi
@ 2022-07-06  2:37   ` Miaohe Lin
  2022-07-06 23:06     ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 1 reply; 30+ messages in thread
From: Miaohe Lin @ 2022-07-06  2:37 UTC (permalink / raw)
  To: Naoya Horiguchi, linux-mm
  Cc: Andrew Morton, David Hildenbrand, Mike Kravetz, Liu Shixin,
	Yang Shi, Oscar Salvador, Muchun Song, Naoya Horiguchi,
	linux-kernel

On 2022/7/4 9:33, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> When handling memory error on a hugetlb page, the error handler tries to
> dissolve and turn it into 4kB pages.  If it's successfully dissolved,
> PageHWPoison flag is moved to the raw error page, so that's all right.
> However, dissolve sometimes fails, then the error page is left as
> hwpoisoned hugepage. It's useful if we can retry to dissolve it to save
> healthy pages, but that's not possible now because the information about
> where the raw error pages is lost.
> 
> Use the private field of a few tail pages to keep that information.  The
> code path of shrinking hugepage pool uses this info to try delayed dissolve.
> In order to remember multiple errors in a hugepage, a singly-linked list
> originated from SUBPAGE_INDEX_HWPOISON-th tail page is constructed.  Only
> simple operations (adding an entry or clearing all) are required and the
> list is assumed not to be very long, so this simple data structure should
> be enough.
> 
> If we failed to save raw error info, the hwpoison hugepage has errors on
> unknown subpage, then this new saving mechanism does not work any more,
> so disable saving new raw error info and freeing hwpoison hugepages.
> 
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> ---
> v3 -> v4:
> - resolve conflict with "mm: hugetlb_vmemmap: improve hugetlb_vmemmap
>   code readability", use hugetlb_vmemmap_restore() instead of
>   hugetlb_vmemmap_alloc().
> 
> v2 -> v3:
> - remove duplicate "return ret" lines,
> - use GFP_ATOMIC instead of GFP_KERNEL,
> - introduce HPageRawHwpUnreliable pseudo flag (suggested by Muchun),
> - hugetlb_clear_page_hwpoison removes raw_hwp_page list even if
>   HPageRawHwpUnreliable is true, (by Miaohe)
> 
> v1 -> v2:
> - support hwpoison hugepage with multiple errors,
> - moved the new interface functions to mm/memory-failure.c,
> - define additional subpage index SUBPAGE_INDEX_HWPOISON_UNRELIABLE,
> - stop freeing/dissolving hwpoison hugepages with unreliable raw error info,
> - drop hugetlb_clear_page_hwpoison() in dissolve_free_huge_page() because
>   that's done in update_and_free_page(),
> - move setting/clearing PG_hwpoison flag to the new interfaces,
> - checking already hwpoisoned or not on a subpage basis.
> 
> ChangeLog since previous post on 4/27:
> - fixed typo in patch description (by Miaohe)
> - fixed config value in #ifdef statement (by Miaohe)
> - added sentences about "multiple hwpoison pages" scenario in patch
>   description
> 
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> ---
>  include/linux/hugetlb.h | 18 +++++++++-
>  mm/hugetlb.c            | 39 ++++++++++----------
>  mm/memory-failure.c     | 80 +++++++++++++++++++++++++++++++++++++++--
>  3 files changed, 114 insertions(+), 23 deletions(-)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index dce46d571575..29c4d0883d36 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -42,6 +42,9 @@ enum {
>  	SUBPAGE_INDEX_CGROUP,		/* reuse page->private */
>  	SUBPAGE_INDEX_CGROUP_RSVD,	/* reuse page->private */
>  	__MAX_CGROUP_SUBPAGE_INDEX = SUBPAGE_INDEX_CGROUP_RSVD,
> +#endif
> +#ifdef CONFIG_MEMORY_FAILURE
> +	SUBPAGE_INDEX_HWPOISON,
>  #endif
>  	__NR_USED_SUBPAGE,
>  };
> @@ -551,7 +554,7 @@ generic_hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
>   *	Synchronization:  Initially set after new page allocation with no
>   *	locking.  When examined and modified during migration processing
>   *	(isolate, migrate, putback) the hugetlb_lock is held.
> - * HPG_temporary - - Set on a page that is temporarily allocated from the buddy
> + * HPG_temporary -- Set on a page that is temporarily allocated from the buddy
>   *	allocator.  Typically used for migration target pages when no pages
>   *	are available in the pool.  The hugetlb free page path will
>   *	immediately free pages with this flag set to the buddy allocator.
> @@ -561,6 +564,8 @@ generic_hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
>   * HPG_freed - Set when page is on the free lists.
>   *	Synchronization: hugetlb_lock held for examination and modification.
>   * HPG_vmemmap_optimized - Set when the vmemmap pages of the page are freed.
> + * HPG_raw_hwp_unreliable - Set when the hugetlb page has a hwpoison sub-page
> + *     that is not tracked by raw_hwp_page list.
>   */
>  enum hugetlb_page_flags {
>  	HPG_restore_reserve = 0,
> @@ -568,6 +573,7 @@ enum hugetlb_page_flags {
>  	HPG_temporary,
>  	HPG_freed,
>  	HPG_vmemmap_optimized,
> +	HPG_raw_hwp_unreliable,
>  	__NR_HPAGEFLAGS,
>  };
>  
> @@ -614,6 +620,7 @@ HPAGEFLAG(Migratable, migratable)
>  HPAGEFLAG(Temporary, temporary)
>  HPAGEFLAG(Freed, freed)
>  HPAGEFLAG(VmemmapOptimized, vmemmap_optimized)
> +HPAGEFLAG(RawHwpUnreliable, raw_hwp_unreliable)
>  
>  #ifdef CONFIG_HUGETLB_PAGE
>  
> @@ -796,6 +803,15 @@ extern int dissolve_free_huge_page(struct page *page);
>  extern int dissolve_free_huge_pages(unsigned long start_pfn,
>  				    unsigned long end_pfn);
>  
> +#ifdef CONFIG_MEMORY_FAILURE
> +extern int hugetlb_clear_page_hwpoison(struct page *hpage);
> +#else
> +static inline int hugetlb_clear_page_hwpoison(struct page *hpage)
> +{
> +	return 0;
> +}
> +#endif
> +
>  #ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
>  #ifndef arch_hugetlb_migration_supported
>  static inline bool arch_hugetlb_migration_supported(struct hstate *h)
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 66bb39e0fce8..ccd470f0194c 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1535,17 +1535,15 @@ static void __update_and_free_page(struct hstate *h, struct page *page)
>  	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
>  		return;
>  
> -	if (hugetlb_vmemmap_restore(h, page)) {
> -		spin_lock_irq(&hugetlb_lock);
> -		/*
> -		 * If we cannot allocate vmemmap pages, just refuse to free the
> -		 * page and put the page back on the hugetlb free list and treat
> -		 * as a surplus page.
> -		 */
> -		add_hugetlb_page(h, page, true);
> -		spin_unlock_irq(&hugetlb_lock);
> -		return;
> -	}
> +	if (hugetlb_vmemmap_restore(h, page))
> +		goto fail;
> +
> +	/*
> +	 * Move PageHWPoison flag from head page to the raw error pages,
> +	 * which makes any healthy subpages reusable.
> +	 */
> +	if (unlikely(PageHWPoison(page) && hugetlb_clear_page_hwpoison(page)))
> +		goto fail;

IIUC, HPageVmemmapOptimized must have been cleared via hugetlb_vmemmap_restore above. So
VM_BUG_ON_PAGE(!HPageVmemmapOptimized(page), page) in add_hugetlb_page will be triggered
if we go to fail here. add_hugetlb_page is expected to be called when we cannot allocate
vmemmap pages.

>  
>  	for (i = 0; i < pages_per_huge_page(h);
>  	     i++, subpage = mem_map_next(subpage, page, i)) {
> @@ -1566,6 +1564,16 @@ static void __update_and_free_page(struct hstate *h, struct page *page)
>  	} else {
>  		__free_pages(page, huge_page_order(h));
>  	}
> +	return;
> +fail:
> +	spin_lock_irq(&hugetlb_lock);
> +	/*
> +	 * If we cannot allocate vmemmap pages or cannot identify raw hwpoison
> +	 * subpages reliably, just refuse to free the page and put the page
> +	 * back on the hugetlb free list and treat as a surplus page.
> +	 */
> +	add_hugetlb_page(h, page, true);
> +	spin_unlock_irq(&hugetlb_lock);
>  }
>  
>  /*
> @@ -2109,15 +2117,6 @@ int dissolve_free_huge_page(struct page *page)
>  		 */
>  		rc = hugetlb_vmemmap_restore(h, head);
>  		if (!rc) {
> -			/*
> -			 * Move PageHWPoison flag from head page to the raw
> -			 * error page, which makes any subpages rather than
> -			 * the error page reusable.
> -			 */
> -			if (PageHWPoison(head) && page != head) {
> -				SetPageHWPoison(page);
> -				ClearPageHWPoison(head);
> -			}
>  			update_and_free_page(h, head, false);
>  		} else {
>  			spin_lock_irq(&hugetlb_lock);
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index c9931c676335..53bf7486a245 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1664,6 +1664,82 @@ int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
>  EXPORT_SYMBOL_GPL(mf_dax_kill_procs);
>  #endif /* CONFIG_FS_DAX */
>  
> +/*
> + * Struct raw_hwp_page represents information about "raw error page",
> + * constructing singly linked list originated from ->private field of
> + * SUBPAGE_INDEX_HWPOISON-th tail page.
> + */
> +struct raw_hwp_page {
> +	struct llist_node node;
> +	struct page *page;
> +};
> +
> +static inline struct llist_head *raw_hwp_list_head(struct page *hpage)
> +{
> +	return (struct llist_head *)&page_private(hpage + SUBPAGE_INDEX_HWPOISON);
> +}
> +
> +static inline int hugetlb_set_page_hwpoison(struct page *hpage,
> +					struct page *page)
> +{
> +	struct llist_head *head;
> +	struct raw_hwp_page *raw_hwp;
> +	struct llist_node *t, *tnode;
> +	int ret;
> +
> +	/*
> +	 * Once the hwpoison hugepage has lost reliable raw error info,
> +	 * there is little meaning to keep additional error info precisely,
> +	 * so skip to add additional raw error info.
> +	 */
> +	if (HPageRawHwpUnreliable(hpage))
> +		return -EHWPOISON;

If we return here, num_poisoned_pages can't reflect all the hwpoisoned hugepages?

> +	head = raw_hwp_list_head(hpage);
> +	llist_for_each_safe(tnode, t, head->first) {
> +		struct raw_hwp_page *p = container_of(tnode, struct raw_hwp_page, node);
> +
> +		if (p->page == page)
> +			return -EHWPOISON;
> +	}
> +
> +	ret = TestSetPageHWPoison(hpage) ? -EHWPOISON : 0;
> +	/* the first error event will be counted in action_result(). */
> +	if (ret)
> +		num_poisoned_pages_inc();
> +
> +	raw_hwp = kmalloc(sizeof(struct raw_hwp_page), GFP_ATOMIC);
> +	if (raw_hwp) {
> +		raw_hwp->page = page;
> +		llist_add(&raw_hwp->node, head);

IMHO, we might need to do num_poisoned_pages_inc here because we decrement the
num_poisoned_pages according to the llist length.

> +	} else {
> +		/*
> +		 * Failed to save raw error info.  We no longer trace all
> +		 * hwpoisoned subpages, and we need refuse to free/dissolve
> +		 * this hwpoisoned hugepage.
> +		 */
> +		SetHPageRawHwpUnreliable(hpage);
> +	}
> +	return ret;
> +}
> +
> +inline int hugetlb_clear_page_hwpoison(struct page *hpage)

off-the-topic: Is "inline" needed here? I see hugetlb_clear_page_hwpoison is "extern" above.

> +{
> +	struct llist_head *head;
> +	struct llist_node *t, *tnode;
> +
> +	if (!HPageRawHwpUnreliable(hpage))
> +		ClearPageHWPoison(hpage);
> +	head = raw_hwp_list_head(hpage);
> +	llist_for_each_safe(tnode, t, head->first) {
> +		struct raw_hwp_page *p = container_of(tnode, struct raw_hwp_page, node);
> +
> +		SetPageHWPoison(p->page);

IMHO, in HPageRawHwpUnreliable(hpage) case, it's better not to do SetPageHWPoison here.
Because hugepage won't be dissolved and thus we cannot write any data to some tail struct
pages if HugeTLB Vmemmap Optimization is enabled. Freeing the memory here should be enough.

> +		kfree(p);
> +	}
> +	llist_del_all(head);
> +	return 0;
> +}

I thought num_poisoned_pages_dec is missing and return value is unneeded. But this is changed
in next patch. So it should be fine here.

> +
>  /*
>   * Called from hugetlb code with hugetlb_lock held.
>   *
> @@ -1698,7 +1774,7 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
>  		goto out;
>  	}
>  
> -	if (TestSetPageHWPoison(head)) {
> +	if (hugetlb_set_page_hwpoison(head, page)) {
>  		ret = -EHWPOISON;
>  		goto out;
>  	}
> @@ -1751,7 +1827,7 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
>  	lock_page(head);
>  
>  	if (hwpoison_filter(p)) {
> -		ClearPageHWPoison(head);
> +		hugetlb_clear_page_hwpoison(head);
>  		res = -EOPNOTSUPP;
>  		goto out;
>  	}
> 

Many thanks for your hard work. :)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [mm-unstable PATCH v4 5/9] mm, hwpoison: make unpoison aware of raw error info in hwpoisoned hugepage
  2022-07-04  1:33 ` [mm-unstable PATCH v4 5/9] mm, hwpoison: make unpoison aware of raw error info in hwpoisoned hugepage Naoya Horiguchi
@ 2022-07-06  2:58   ` Miaohe Lin
  2022-07-06 23:06     ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 1 reply; 30+ messages in thread
From: Miaohe Lin @ 2022-07-06  2:58 UTC (permalink / raw)
  To: Naoya Horiguchi, linux-mm
  Cc: Andrew Morton, David Hildenbrand, Mike Kravetz, Liu Shixin,
	Yang Shi, Oscar Salvador, Muchun Song, Naoya Horiguchi,
	linux-kernel

On 2022/7/4 9:33, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> Raw error info list needs to be removed when hwpoisoned hugetlb is
> unpoisoned.  And unpoison handler needs to know how many errors there
> are in the target hugepage. So add them.
> 
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> ---
>  include/linux/swapops.h |  9 +++++++++
>  mm/memory-failure.c     | 31 +++++++++++++++++++++++++------
>  2 files changed, 34 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/swapops.h b/include/linux/swapops.h
> index a01aeb3fcc0b..ddc98f96ad2c 100644
> --- a/include/linux/swapops.h
> +++ b/include/linux/swapops.h
> @@ -498,6 +498,11 @@ static inline void num_poisoned_pages_dec(void)
>  	atomic_long_dec(&num_poisoned_pages);
>  }
>  
> +static inline void num_poisoned_pages_sub(long i)
> +{
> +	atomic_long_sub(i, &num_poisoned_pages);
> +}
> +
>  #else
>  
>  static inline swp_entry_t make_hwpoison_entry(struct page *page)
> @@ -518,6 +523,10 @@ static inline struct page *hwpoison_entry_to_page(swp_entry_t entry)
>  static inline void num_poisoned_pages_inc(void)
>  {
>  }
> +
> +static inline void num_poisoned_pages_sub(long i)
> +{
> +}
>  #endif
>  
>  static inline int non_swap_entry(swp_entry_t entry)
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 53bf7486a245..6af2096d8ea0 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1722,22 +1722,33 @@ static inline int hugetlb_set_page_hwpoison(struct page *hpage,
>  	return ret;
>  }
>  
> -inline int hugetlb_clear_page_hwpoison(struct page *hpage)
> +static inline long free_raw_hwp_pages(struct page *hpage, bool move_flag)
>  {
>  	struct llist_head *head;
>  	struct llist_node *t, *tnode;
> +	long count = 0;
>  
> -	if (!HPageRawHwpUnreliable(hpage))
> -		ClearPageHWPoison(hpage);
>  	head = raw_hwp_list_head(hpage);
>  	llist_for_each_safe(tnode, t, head->first) {
>  		struct raw_hwp_page *p = container_of(tnode, struct raw_hwp_page, node);
>  
> -		SetPageHWPoison(p->page);
> +		if (move_flag)
> +			SetPageHWPoison(p->page);
>  		kfree(p);
> +		count++;
>  	}
>  	llist_del_all(head);
> -	return 0;
> +	return count;
> +}
> +
> +inline int hugetlb_clear_page_hwpoison(struct page *hpage)
> +{
> +	int ret = -EBUSY;
> +
> +	if (!HPageRawHwpUnreliable(hpage))
> +		ret = !TestClearPageHWPoison(hpage);
> +	free_raw_hwp_pages(hpage, true);
> +	return ret;
>  }
>  
>  /*
> @@ -1882,6 +1893,9 @@ static inline int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *
>  	return 0;
>  }
>  
> +static inline void free_raw_hwp_pages(struct page *hpage, bool move_flag)
> +{
> +}
>  #endif	/* CONFIG_HUGETLB_PAGE */
>  
>  static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
> @@ -2287,6 +2301,7 @@ int unpoison_memory(unsigned long pfn)

Is it safe to unpoison hugepage when HPageRawHwpUnreliable? I'm afraid because
some raw error info is missing..

Thanks.

>  	struct page *p;
>  	int ret = -EBUSY;
>  	int freeit = 0;
> +	long count = 1;
>  	static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
>  					DEFAULT_RATELIMIT_BURST);
>  
> @@ -2334,6 +2349,8 @@ int unpoison_memory(unsigned long pfn)
>  
>  	ret = get_hwpoison_page(p, MF_UNPOISON);
>  	if (!ret) {
> +		if (PageHuge(p))
> +			count = free_raw_hwp_pages(page, false);
>  		ret = TestClearPageHWPoison(page) ? 0 : -EBUSY;
>  	} else if (ret < 0) {
>  		if (ret == -EHWPOISON) {
> @@ -2342,6 +2359,8 @@ int unpoison_memory(unsigned long pfn)
>  			unpoison_pr_info("Unpoison: failed to grab page %#lx\n",
>  					 pfn, &unpoison_rs);
>  	} else {
> +		if (PageHuge(p))
> +			count = free_raw_hwp_pages(page, false);
>  		freeit = !!TestClearPageHWPoison(p);
>  
>  		put_page(page);
> @@ -2354,7 +2373,7 @@ int unpoison_memory(unsigned long pfn)
>  unlock_mutex:
>  	mutex_unlock(&mf_mutex);
>  	if (!ret || freeit) {
> -		num_poisoned_pages_dec();
> +		num_poisoned_pages_sub(count);
>  		unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
>  				 page_to_pfn(p), &unpoison_rs);
>  	}
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [mm-unstable PATCH v4 1/9] mm/hugetlb: check gigantic_page_runtime_supported() in return_unused_surplus_pages()
  2022-07-05  6:39     ` HORIGUCHI NAOYA(堀口 直也)
@ 2022-07-06  3:04       ` Miaohe Lin
  2022-07-06  3:22         ` Mike Kravetz
  0 siblings, 1 reply; 30+ messages in thread
From: Miaohe Lin @ 2022-07-06  3:04 UTC (permalink / raw)
  To: HORIGUCHI NAOYA(堀口 直也)
  Cc: Naoya Horiguchi, linux-mm, Andrew Morton, David Hildenbrand,
	Mike Kravetz, Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	linux-kernel

On 2022/7/5 14:39, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Tue, Jul 05, 2022 at 10:16:39AM +0800, Miaohe Lin wrote:
>> On 2022/7/4 9:33, Naoya Horiguchi wrote:
>>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>>
>>> I found a weird state of 1GB hugepage pool, caused by the following
>>> procedure:
>>>
>>>   - run a process reserving all free 1GB hugepages,
>>>   - shrink free 1GB hugepage pool to zero (i.e. writing 0 to
>>>     /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages), then
>>>   - kill the reserving process.
>>>
>>> , then all the hugepages are free *and* surplus at the same time.
>>>
>>>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
>>>   3
>>>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/free_hugepages
>>>   3
>>>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/resv_hugepages
>>>   0
>>>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/surplus_hugepages
>>>   3
>>>
>>> This state is resolved by reserving and allocating the pages then
>>> freeing them again, so this seems not to result in serious problem.
>>> But it's a little surprising (shrinking pool suddenly fails).
>>>
>>> This behavior is caused by hstate_is_gigantic() check in
>>> return_unused_surplus_pages(). This was introduced so long ago in 2008
>>> by commit aa888a74977a ("hugetlb: support larger than MAX_ORDER"), and
>>> at that time the gigantic pages were not supposed to be allocated/freed
>>> at run-time.  Now kernel can support runtime allocation/free, so let's
>>> check gigantic_page_runtime_supported() together.
>>>
>>> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>
>> This patch looks good to me with a few question below.
> 
> Thank you for reviewing.
> 
>>
>>> ---
>>> v2 -> v3:
>>> - Fixed typo in patch description,
>>> - add !gigantic_page_runtime_supported() check instead of removing
>>>   hstate_is_gigantic() check (suggested by Miaohe and Muchun)
>>> - add a few more !gigantic_page_runtime_supported() check in
>>>   set_max_huge_pages() (by Mike).
>>> ---
>>>  mm/hugetlb.c | 19 ++++++++++++++++---
>>>  1 file changed, 16 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> index 2a554f006255..bdc4499f324b 100644
>>> --- a/mm/hugetlb.c
>>> +++ b/mm/hugetlb.c
>>> @@ -2432,8 +2432,7 @@ static void return_unused_surplus_pages(struct hstate *h,
>>>  	/* Uncommit the reservation */
>>>  	h->resv_huge_pages -= unused_resv_pages;
>>>  
>>> -	/* Cannot return gigantic pages currently */
>>> -	if (hstate_is_gigantic(h))
>>> +	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
>>>  		goto out;
>>>  
>>>  	/*
>>> @@ -3315,7 +3314,8 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
>>>  	 * the user tries to allocate gigantic pages but let the user free the
>>>  	 * boottime allocated gigantic pages.
>>>  	 */
>>> -	if (hstate_is_gigantic(h) && !IS_ENABLED(CONFIG_CONTIG_ALLOC)) {
>>> +	if (hstate_is_gigantic(h) && (!IS_ENABLED(CONFIG_CONTIG_ALLOC) ||
>>> +				      !gigantic_page_runtime_supported())) {
>>>  		if (count > persistent_huge_pages(h)) {
>>>  			spin_unlock_irq(&hugetlb_lock);
>>>  			mutex_unlock(&h->resize_lock);
>>> @@ -3363,6 +3363,19 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
>>>  			goto out;
>>>  	}
>>>  
>>> +	/*
>>> +	 * We can not decrease gigantic pool size if runtime modification
>>> +	 * is not supported.
>>> +	 */
>>> +	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported()) {
>>> +		if (count < persistent_huge_pages(h)) {
>>> +			spin_unlock_irq(&hugetlb_lock);
>>> +			mutex_unlock(&h->resize_lock);
>>> +			NODEMASK_FREE(node_alloc_noretry);
>>> +			return -EINVAL;
>>> +		}
>>> +	}
>>
>> With above change, we're not allowed to decrease the pool size now. But it was allowed previously
>> even if !gigantic_page_runtime_supported. Does this will break user?
> 
> Yes, it does. I might get the wrong idea about the definition of
> gigantic_page_runtime_supported(), which shows that runtime pool *extension*
> is supported or not (implying that pool shrinking is always possible).
> If this is right, this new if-block is not necessary.

I would lean toward removing the above new if-block to keep pool shrinking available.

Thanks.

> 
>>
>> And it seems it's not allowed to adjust the max_huge_pages now if !gigantic_page_runtime_supported
>> for gigantic huge page. Should we just return for such case as there should be nothing to do now?
>> Or am I miss something?
> 
> If pool shrinking is always allowed, we need uptdate max_huge_pages so,
> the above if-block should have "goto out;", but it will be removed anyway
> so we don't have to care for it.
> 
> Thank you for the valuable comment.
> 
> - Naoya Horiguchi
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [mm-unstable PATCH v4 3/9] mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry
  2022-07-05  9:04     ` HORIGUCHI NAOYA(堀口 直也)
@ 2022-07-06  3:07       ` Miaohe Lin
  0 siblings, 0 replies; 30+ messages in thread
From: Miaohe Lin @ 2022-07-06  3:07 UTC (permalink / raw)
  To: HORIGUCHI NAOYA(堀口 直也)
  Cc: Naoya Horiguchi, linux-mm, Andrew Morton, David Hildenbrand,
	Mike Kravetz, Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	linux-kernel

On 2022/7/5 17:04, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Tue, Jul 05, 2022 at 10:46:09AM +0800, Miaohe Lin wrote:
>> On 2022/7/4 9:33, Naoya Horiguchi wrote:
>>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>>
>>> follow_pud_mask() does not support non-present pud entry now.  As long as
>>> I tested on x86_64 server, follow_pud_mask() still simply returns
>>> no_page_table() for non-present_pud_entry() due to pud_bad(), so no severe
>>> user-visible effect should happen.  But generally we should call
>>> follow_huge_pud() for non-present pud entry for 1GB hugetlb page.
>>>
>>> Update pud_huge() and follow_huge_pud() to handle non-present pud entries.
>>> The changes are similar to previous works for pud entries commit e66f17ff7177
>>> ("mm/hugetlb: take page table lock in follow_huge_pmd()") and commit
>>> cbef8478bee5 ("mm/hugetlb: pmd_huge() returns true for non-present hugepage").
>>>
>>> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>> ---
>>> v2 -> v3:
>>> - fixed typos in subject and description,
>>> - added comment on pud_huge(),
>>> - added comment about fallback for hwpoisoned entry,
>>> - updated initial check about FOLL_{PIN,GET} flags.
>>> ---
>>>  arch/x86/mm/hugetlbpage.c |  8 +++++++-
>>>  mm/hugetlb.c              | 32 ++++++++++++++++++++++++++++++--
>>>  2 files changed, 37 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
>>> index 509408da0da1..6b3033845c6d 100644
>>> --- a/arch/x86/mm/hugetlbpage.c
>>> +++ b/arch/x86/mm/hugetlbpage.c
>>> @@ -30,9 +30,15 @@ int pmd_huge(pmd_t pmd)
>>>  		(pmd_val(pmd) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
>>>  }
>>>  
>>> +/*
>>> + * pud_huge() returns 1 if @pud is hugetlb related entry, that is normal
>>> + * hugetlb entry or non-present (migration or hwpoisoned) hugetlb entry.
>>> + * Otherwise, returns 0.
>>> + */
>>>  int pud_huge(pud_t pud)
>>>  {
>>> -	return !!(pud_val(pud) & _PAGE_PSE);
>>> +	return !pud_none(pud) &&
>>> +		(pud_val(pud) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
>>>  }
>>
>> Question: Is aarch64 supported too? It seems aarch64 version of pud_huge matches
>> the requirement naturally for me.
> 
> I think that if pmd_huge() and pud_huge() return true for non-present
> pmd/pud entries, that's OK.  Otherwise we need update to support the
> new feature.
> 
> In aarch64, the bits in pte/pmd/pud related to {pmd,pud}_present() and
> {pmd,pud}_huge() seem not to overlap with the bit range for swap type
> and swap offset, so maybe that's fine.  But I recommend to test with
> arm64 if you have access to aarch64 servers.

I see. This series is intended to enable 1GB hugepage support on x86. If
someone wants to use it on other arches, it's better to test there first. ;)

Thanks.

> 
>>
>> Anyway, this patch looks good to me.
>>
>> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
> 
> Thank you for reviewing.
> 
> - Naoya Horiguchi
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [mm-unstable PATCH v4 1/9] mm/hugetlb: check gigantic_page_runtime_supported() in return_unused_surplus_pages()
  2022-07-06  3:04       ` Miaohe Lin
@ 2022-07-06  3:22         ` Mike Kravetz
  2022-07-07  2:59           ` Miaohe Lin
  0 siblings, 1 reply; 30+ messages in thread
From: Mike Kravetz @ 2022-07-06  3:22 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Naoya Horiguchi, linux-mm, Andrew Morton, David Hildenbrand,
	Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song, linux-kernel

On 07/06/22 11:04, Miaohe Lin wrote:
> On 2022/7/5 14:39, HORIGUCHI NAOYA(堀口 直也) wrote:
> > On Tue, Jul 05, 2022 at 10:16:39AM +0800, Miaohe Lin wrote:
> >> On 2022/7/4 9:33, Naoya Horiguchi wrote:
> >>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> >>>
> >>> I found a weird state of 1GB hugepage pool, caused by the following
> >>> procedure:
> >>>
> >>>   - run a process reserving all free 1GB hugepages,
> >>>   - shrink free 1GB hugepage pool to zero (i.e. writing 0 to
> >>>     /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages), then
> >>>   - kill the reserving process.
> >>>
> >>> , then all the hugepages are free *and* surplus at the same time.
> >>>
> >>>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
> >>>   3
> >>>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/free_hugepages
> >>>   3
> >>>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/resv_hugepages
> >>>   0
> >>>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/surplus_hugepages
> >>>   3
> >>>
> >>> This state is resolved by reserving and allocating the pages then
> >>> freeing them again, so this seems not to result in serious problem.
> >>> But it's a little surprising (shrinking pool suddenly fails).
> >>>
> >>> This behavior is caused by hstate_is_gigantic() check in
> >>> return_unused_surplus_pages(). This was introduced so long ago in 2008
> >>> by commit aa888a74977a ("hugetlb: support larger than MAX_ORDER"), and
> >>> at that time the gigantic pages were not supposed to be allocated/freed
> >>> at run-time.  Now kernel can support runtime allocation/free, so let's
> >>> check gigantic_page_runtime_supported() together.
> >>>
> >>> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> >>
> >> This patch looks good to me with a few question below.
> > 
> > Thank you for reviewing.
> > 
> >>
> >>> ---
> >>> v2 -> v3:
> >>> - Fixed typo in patch description,
> >>> - add !gigantic_page_runtime_supported() check instead of removing
> >>>   hstate_is_gigantic() check (suggested by Miaohe and Muchun)
> >>> - add a few more !gigantic_page_runtime_supported() check in
> >>>   set_max_huge_pages() (by Mike).
> >>> ---
> >>>  mm/hugetlb.c | 19 ++++++++++++++++---
> >>>  1 file changed, 16 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> >>> index 2a554f006255..bdc4499f324b 100644
> >>> --- a/mm/hugetlb.c
> >>> +++ b/mm/hugetlb.c
> >>> @@ -2432,8 +2432,7 @@ static void return_unused_surplus_pages(struct hstate *h,
> >>>  	/* Uncommit the reservation */
> >>>  	h->resv_huge_pages -= unused_resv_pages;
> >>>  
> >>> -	/* Cannot return gigantic pages currently */
> >>> -	if (hstate_is_gigantic(h))
> >>> +	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
> >>>  		goto out;
> >>>  
> >>>  	/*
> >>> @@ -3315,7 +3314,8 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
> >>>  	 * the user tries to allocate gigantic pages but let the user free the
> >>>  	 * boottime allocated gigantic pages.
> >>>  	 */
> >>> -	if (hstate_is_gigantic(h) && !IS_ENABLED(CONFIG_CONTIG_ALLOC)) {
> >>> +	if (hstate_is_gigantic(h) && (!IS_ENABLED(CONFIG_CONTIG_ALLOC) ||
> >>> +				      !gigantic_page_runtime_supported())) {
> >>>  		if (count > persistent_huge_pages(h)) {
> >>>  			spin_unlock_irq(&hugetlb_lock);
> >>>  			mutex_unlock(&h->resize_lock);
> >>> @@ -3363,6 +3363,19 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
> >>>  			goto out;
> >>>  	}
> >>>  
> >>> +	/*
> >>> +	 * We can not decrease gigantic pool size if runtime modification
> >>> +	 * is not supported.
> >>> +	 */
> >>> +	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported()) {
> >>> +		if (count < persistent_huge_pages(h)) {
> >>> +			spin_unlock_irq(&hugetlb_lock);
> >>> +			mutex_unlock(&h->resize_lock);
> >>> +			NODEMASK_FREE(node_alloc_noretry);
> >>> +			return -EINVAL;
> >>> +		}
> >>> +	}
> >>
> >> With above change, we're not allowed to decrease the pool size now. But it was allowed previously
> >> even if !gigantic_page_runtime_supported. Does this will break user?
> > 
> > Yes, it does. I might get the wrong idea about the definition of
> > gigantic_page_runtime_supported(), which shows that runtime pool *extension*
> > is supported or not (implying that pool shrinking is always possible).
> > If this is right, this new if-block is not necessary.
> 
> I tend to remove above new if-block to keep pool shrinking available.
> 

Not sure I am following the questions.

Take a look at __update_and_free_page which will refuse to 'free' a
gigantic page if !gigantic_page_runtime_supported.  I 'think' attempting
to shrink the pool when !gigantic_page_runtime_supported will result in
leaking gigantic pages.  i.e.  Memory will remain allocated for the
gigantic page, but it can not be used.
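
For reference, the early return I am referring to, near the top of
__update_and_free_page() (trimmed):

  static void __update_and_free_page(struct hstate *h, struct page *page)
  {
          if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
                  return;
          ...
  }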

I can take a closer look during my day tomorrow.

IIRC, the only way gigantic_page_runtime_supported is not set today is
in the case of powerpc using 16GB pages allocated/managed by firmware.
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [mm-unstable PATCH v4 1/9] mm/hugetlb: check gigantic_page_runtime_supported() in return_unused_surplus_pages()
  2022-07-04  1:33 ` [mm-unstable PATCH v4 1/9] mm/hugetlb: check gigantic_page_runtime_supported() in return_unused_surplus_pages() Naoya Horiguchi
  2022-07-05  2:16   ` Miaohe Lin
@ 2022-07-06 21:51   ` Mike Kravetz
  2022-07-07  0:56     ` HORIGUCHI NAOYA(堀口 直也)
  1 sibling, 1 reply; 30+ messages in thread
From: Mike Kravetz @ 2022-07-06 21:51 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andrew Morton, David Hildenbrand, Miaohe Lin,
	Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	Naoya Horiguchi, linux-kernel

On 07/04/22 10:33, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> I found a weird state of 1GB hugepage pool, caused by the following
> procedure:
> 
>   - run a process reserving all free 1GB hugepages,
>   - shrink free 1GB hugepage pool to zero (i.e. writing 0 to
>     /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages), then
>   - kill the reserving process.
> 
> , then all the hugepages are free *and* surplus at the same time.
> 
>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
>   3
>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/free_hugepages
>   3
>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/resv_hugepages
>   0
>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/surplus_hugepages
>   3
> 
> This state is resolved by reserving and allocating the pages then
> freeing them again, so this seems not to result in serious problem.
> But it's a little surprising (shrinking pool suddenly fails).
> 
> This behavior is caused by hstate_is_gigantic() check in
> return_unused_surplus_pages(). This was introduced so long ago in 2008
> by commit aa888a74977a ("hugetlb: support larger than MAX_ORDER"), and
> at that time the gigantic pages were not supposed to be allocated/freed
> at run-time.  Now kernel can support runtime allocation/free, so let's
> check gigantic_page_runtime_supported() together.
> 
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> ---
> v2 -> v3:
> - Fixed typo in patch description,
> - add !gigantic_page_runtime_supported() check instead of removing
>   hstate_is_gigantic() check (suggested by Miaohe and Muchun)
> - add a few more !gigantic_page_runtime_supported() check in
>   set_max_huge_pages() (by Mike).

Hi Naoya,

My apologies for suggesting the above checks in set_max_huge_pages().
set_max_huge_pages is only called from __nr_hugepages_store_common.
At the very beginning of __nr_hugepages_store_common is this:

	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
		return -EINVAL;

So, those extra checks in set_max_huge_pages are unnecessary.  Sorry!
-- 
Mike Kravetz


> ---
>  mm/hugetlb.c | 19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 2a554f006255..bdc4499f324b 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2432,8 +2432,7 @@ static void return_unused_surplus_pages(struct hstate *h,
>  	/* Uncommit the reservation */
>  	h->resv_huge_pages -= unused_resv_pages;
>  
> -	/* Cannot return gigantic pages currently */
> -	if (hstate_is_gigantic(h))
> +	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
>  		goto out;
>  
>  	/*
> @@ -3315,7 +3314,8 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
>  	 * the user tries to allocate gigantic pages but let the user free the
>  	 * boottime allocated gigantic pages.
>  	 */
> -	if (hstate_is_gigantic(h) && !IS_ENABLED(CONFIG_CONTIG_ALLOC)) {
> +	if (hstate_is_gigantic(h) && (!IS_ENABLED(CONFIG_CONTIG_ALLOC) ||
> +				      !gigantic_page_runtime_supported())) {
>  		if (count > persistent_huge_pages(h)) {
>  			spin_unlock_irq(&hugetlb_lock);
>  			mutex_unlock(&h->resize_lock);
> @@ -3363,6 +3363,19 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
>  			goto out;
>  	}
>  
> +	/*
> +	 * We can not decrease gigantic pool size if runtime modification
> +	 * is not supported.
> +	 */
> +	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported()) {
> +		if (count < persistent_huge_pages(h)) {
> +			spin_unlock_irq(&hugetlb_lock);
> +			mutex_unlock(&h->resize_lock);
> +			NODEMASK_FREE(node_alloc_noretry);
> +			return -EINVAL;
> +		}
> +	}
> +
>  	/*
>  	 * Decrease the pool size
>  	 * First return free pages to the buddy allocator (being careful
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [mm-unstable PATCH v4 3/9] mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry
  2022-07-04  1:33 ` [mm-unstable PATCH v4 3/9] mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry Naoya Horiguchi
  2022-07-05  2:46   ` Miaohe Lin
@ 2022-07-06 22:21   ` Mike Kravetz
  1 sibling, 0 replies; 30+ messages in thread
From: Mike Kravetz @ 2022-07-06 22:21 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andrew Morton, David Hildenbrand, Miaohe Lin,
	Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	Naoya Horiguchi, linux-kernel

On 07/04/22 10:33, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> follow_pud_mask() does not support non-present pud entry now.  As long as
> I tested on x86_64 server, follow_pud_mask() still simply returns
> no_page_table() for non-present_pud_entry() due to pud_bad(), so no severe
> user-visible effect should happen.  But generally we should call
> follow_huge_pud() for non-present pud entry for 1GB hugetlb page.
> 
> Update pud_huge() and follow_huge_pud() to handle non-present pud entries.
> The changes are similar to previous works for pud entries commit e66f17ff7177
> ("mm/hugetlb: take page table lock in follow_huge_pmd()") and commit
> cbef8478bee5 ("mm/hugetlb: pmd_huge() returns true for non-present hugepage").
> 
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> ---
> v2 -> v3:
> - fixed typos in subject and description,
> - added comment on pud_huge(),
> - added comment about fallback for hwpoisoned entry,
> - updated initial check about FOLL_{PIN,GET} flags.
> ---
>  arch/x86/mm/hugetlbpage.c |  8 +++++++-
>  mm/hugetlb.c              | 32 ++++++++++++++++++++++++++++++--
>  2 files changed, 37 insertions(+), 3 deletions(-)

Thanks!

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [mm-unstable PATCH v4 4/9] mm, hwpoison, hugetlb: support saving mechanism of raw error pages
  2022-07-06  2:37   ` Miaohe Lin
@ 2022-07-06 23:06     ` HORIGUCHI NAOYA(堀口 直也)
  2022-07-07  3:22       ` Miaohe Lin
  0 siblings, 1 reply; 30+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-07-06 23:06 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Naoya Horiguchi, linux-mm, Andrew Morton, David Hildenbrand,
	Mike Kravetz, Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	linux-kernel

On Wed, Jul 06, 2022 at 10:37:50AM +0800, Miaohe Lin wrote:
> On 2022/7/4 9:33, Naoya Horiguchi wrote:
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > 
> > When handling memory error on a hugetlb page, the error handler tries to
> > dissolve and turn it into 4kB pages.  If it's successfully dissolved,
> > PageHWPoison flag is moved to the raw error page, so that's all right.
> > However, dissolve sometimes fails, then the error page is left as
> > hwpoisoned hugepage. It's useful if we can retry to dissolve it to save
> > healthy pages, but that's not possible now because the information about
> > where the raw error pages is lost.
> > 
> > Use the private field of a few tail pages to keep that information.  The
> > code path of shrinking hugepage pool uses this info to try delayed dissolve.
> > In order to remember multiple errors in a hugepage, a singly-linked list
> > originated from SUBPAGE_INDEX_HWPOISON-th tail page is constructed.  Only
> > simple operations (adding an entry or clearing all) are required and the
> > list is assumed not to be very long, so this simple data structure should
> > be enough.
> > 
> > If we failed to save raw error info, the hwpoison hugepage has errors on
> > unknown subpage, then this new saving mechanism does not work any more,
> > so disable saving new raw error info and freeing hwpoison hugepages.
> > 
> > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > ---
...
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 66bb39e0fce8..ccd470f0194c 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1535,17 +1535,15 @@ static void __update_and_free_page(struct hstate *h, struct page *page)
> >  	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
> >  		return;
> >  
> > -	if (hugetlb_vmemmap_restore(h, page)) {
> > -		spin_lock_irq(&hugetlb_lock);
> > -		/*
> > -		 * If we cannot allocate vmemmap pages, just refuse to free the
> > -		 * page and put the page back on the hugetlb free list and treat
> > -		 * as a surplus page.
> > -		 */
> > -		add_hugetlb_page(h, page, true);
> > -		spin_unlock_irq(&hugetlb_lock);
> > -		return;
> > -	}
> > +	if (hugetlb_vmemmap_restore(h, page))
> > +		goto fail;
> > +
> > +	/*
> > +	 * Move PageHWPoison flag from head page to the raw error pages,
> > +	 * which makes any healthy subpages reusable.
> > +	 */
> > +	if (unlikely(PageHWPoison(page) && hugetlb_clear_page_hwpoison(page)))
> > +		goto fail;
> 
> IIUC, HPageVmemmapOptimized must have been cleared via hugetlb_vmemmap_restore above. So
> VM_BUG_ON_PAGE(!HPageVmemmapOptimized(page), page) in add_hugetlb_page will be triggered
> if we go to fail here. add_hugetlb_page is expected to be called when we cannot allocate
> vmemmap pages.

Thanks a lot, you're right. I shouldn't have simply factored the failure
path into the goto label.  I think that it's hard to undo
hugetlb_vmemmap_restore(), so I'll check HPageRawHwpUnreliable() before
hugetlb_vmemmap_restore(), then try hugetlb_clear_page_hwpoison() after it
(where tail pages are available).
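
Roughly something like this (untested sketch just to show the ordering I
have in mind; details may change in the next version):

  static void __update_and_free_page(struct hstate *h, struct page *page)
  {
          ...
          /* Refuse to free before touching vmemmap. */
          if (HPageRawHwpUnreliable(page))
                  return;

          if (hugetlb_vmemmap_restore(h, page)) {
                  spin_lock_irq(&hugetlb_lock);
                  /* existing "put back as a surplus page" fallback stays as-is */
                  add_hugetlb_page(h, page, true);
                  spin_unlock_irq(&hugetlb_lock);
                  return;
          }

          /*
           * vmemmap is restored here, so the tail struct pages are writable
           * and PageHWPoison can be moved to the raw error pages.
           */
          if (unlikely(PageHWPoison(page)))
                  hugetlb_clear_page_hwpoison(page);
          ...
  }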

> 
> >  
> >  	for (i = 0; i < pages_per_huge_page(h);
> >  	     i++, subpage = mem_map_next(subpage, page, i)) {
> > @@ -1566,6 +1564,16 @@ static void __update_and_free_page(struct hstate *h, struct page *page)
> >  	} else {
> >  		__free_pages(page, huge_page_order(h));
> >  	}
> > +	return;
> > +fail:
> > +	spin_lock_irq(&hugetlb_lock);
> > +	/*
> > +	 * If we cannot allocate vmemmap pages or cannot identify raw hwpoison
> > +	 * subpages reliably, just refuse to free the page and put the page
> > +	 * back on the hugetlb free list and treat as a surplus page.
> > +	 */
> > +	add_hugetlb_page(h, page, true);
> > +	spin_unlock_irq(&hugetlb_lock);
> >  }
> >  
> >  /*
> > @@ -2109,15 +2117,6 @@ int dissolve_free_huge_page(struct page *page)
> >  		 */
> >  		rc = hugetlb_vmemmap_restore(h, head);
> >  		if (!rc) {
> > -			/*
> > -			 * Move PageHWPoison flag from head page to the raw
> > -			 * error page, which makes any subpages rather than
> > -			 * the error page reusable.
> > -			 */
> > -			if (PageHWPoison(head) && page != head) {
> > -				SetPageHWPoison(page);
> > -				ClearPageHWPoison(head);
> > -			}
> >  			update_and_free_page(h, head, false);
> >  		} else {
> >  			spin_lock_irq(&hugetlb_lock);
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index c9931c676335..53bf7486a245 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -1664,6 +1664,82 @@ int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
> >  EXPORT_SYMBOL_GPL(mf_dax_kill_procs);
> >  #endif /* CONFIG_FS_DAX */
> >  
> > +/*
> > + * Struct raw_hwp_page represents information about "raw error page",
> > + * constructing singly linked list originated from ->private field of
> > + * SUBPAGE_INDEX_HWPOISON-th tail page.
> > + */
> > +struct raw_hwp_page {
> > +	struct llist_node node;
> > +	struct page *page;
> > +};
> > +
> > +static inline struct llist_head *raw_hwp_list_head(struct page *hpage)
> > +{
> > +	return (struct llist_head *)&page_private(hpage + SUBPAGE_INDEX_HWPOISON);
> > +}
> > +
> > +static inline int hugetlb_set_page_hwpoison(struct page *hpage,
> > +					struct page *page)
> > +{
> > +	struct llist_head *head;
> > +	struct raw_hwp_page *raw_hwp;
> > +	struct llist_node *t, *tnode;
> > +	int ret;
> > +
> > +	/*
> > +	 * Once the hwpoison hugepage has lost reliable raw error info,
> > +	 * there is little meaning to keep additional error info precisely,
> > +	 * so skip to add additional raw error info.
> > +	 */
> > +	if (HPageRawHwpUnreliable(hpage))
> > +		return -EHWPOISON;
> 
> If we return here, num_poisoned_pages can't reflect all the hwpoisoned hugepages?

No, it can't now.  Currently we try (and fail) to count only the number of
hwpoisoned subpages with raw_hwp_info.  If we want to track all corrupted
pages (including hwpoisoned subpages without raw_hwp_info), maybe running
the following part first in this function could make it better.

	ret = TestSetPageHWPoison(hpage) ? -EHWPOISON : 0;
	/* the first error event will be counted in action_result(). */
	if (ret)
		num_poisoned_pages_inc();

But I like the option you suggest below.

> 
> > +	head = raw_hwp_list_head(hpage);
> > +	llist_for_each_safe(tnode, t, head->first) {
> > +		struct raw_hwp_page *p = container_of(tnode, struct raw_hwp_page, node);
> > +
> > +		if (p->page == page)
> > +			return -EHWPOISON;
> > +	}
> > +
> > +	ret = TestSetPageHWPoison(hpage) ? -EHWPOISON : 0;
> > +	/* the first error event will be counted in action_result(). */
> > +	if (ret)
> > +		num_poisoned_pages_inc();
> > +
> > +	raw_hwp = kmalloc(sizeof(struct raw_hwp_page), GFP_ATOMIC);
> > +	if (raw_hwp) {
> > +		raw_hwp->page = page;
> > +		llist_add(&raw_hwp->node, head);
> 
> IMHO, we might need to do num_poisoned_pages_inc here because we decrement the
> num_poisoned_pages according to the llist length.

Yes, if we'd like to count only hwpoisoned subpages with raw_hwp_info,
doing num_poisoned_pages_inc here is fine.
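
i.e. something like this (untested), so that the counter only goes up when a
raw_hwp entry is actually recorded:

        raw_hwp = kmalloc(sizeof(struct raw_hwp_page), GFP_ATOMIC);
        if (raw_hwp) {
                raw_hwp->page = page;
                llist_add(&raw_hwp->node, head);
                /* the first error event will be counted in action_result(). */
                if (ret)
                        num_poisoned_pages_inc();
        } else {
                SetHPageRawHwpUnreliable(hpage);
        }
        return ret;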

> > +	} else {
> > +		/*
> > +		 * Failed to save raw error info.  We no longer trace all
> > +		 * hwpoisoned subpages, and we need refuse to free/dissolve
> > +		 * this hwpoisoned hugepage.
> > +		 */
> > +		SetHPageRawHwpUnreliable(hpage);
> > +	}
> > +	return ret;
> > +}
> > +
> > +inline int hugetlb_clear_page_hwpoison(struct page *hpage)
> 
> off-the-topic: Is "inline" needed here? I see hugetlb_clear_page_hwpoison is "extern" above.

Maybe not, this code is not performance-sensitive, and the inline is actually
a leftover from updates in previous versions. I'll remove it.

> 
> > +{
> > +	struct llist_head *head;
> > +	struct llist_node *t, *tnode;
> > +
> > +	if (!HPageRawHwpUnreliable(hpage))
> > +		ClearPageHWPoison(hpage);
> > +	head = raw_hwp_list_head(hpage);
> > +	llist_for_each_safe(tnode, t, head->first) {
> > +		struct raw_hwp_page *p = container_of(tnode, struct raw_hwp_page, node);
> > +
> > +		SetPageHWPoison(p->page);
> 
> IMHO, in HPageRawHwpUnreliable(hpage) case, it's better not to do SetPageHWPoison here.
> Because hugepage won't be dissolved and thus we cannot write any data to some tail struct
> pages if HugeTLB Vmemmap Optimization is enabled. Freeing the memory here should be enough.

This is a good point, too. The current version surely does not work with
HVO, so I'm thinking of simply giving up clearing hwpoison when
HPageVmemmapOptimized is true. And I should add an inline comment about this.
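
Something like this at the beginning of hugetlb_clear_page_hwpoison() (only
to illustrate the idea; the exact place and return value may change):

        /*
         * With hugetlb vmemmap optimization, the tail struct pages are
         * read-only, so we cannot move PageHWPoison to the raw error
         * pages.  Keep the flag on the head page in that case.
         */
        if (HPageVmemmapOptimized(hpage))
                return -EBUSY;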

> > +		kfree(p);
> > +	}
> > +	llist_del_all(head);
> > +	return 0;
> > +}
> 
> I thought num_poisoned_pages_dec is missing and return value is unneeded. But this is changed
> in next patch. So it should be fine here.

OK.

> 
> > +
> >  /*
> >   * Called from hugetlb code with hugetlb_lock held.
> >   *
> > @@ -1698,7 +1774,7 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
> >  		goto out;
> >  	}
> >  
> > -	if (TestSetPageHWPoison(head)) {
> > +	if (hugetlb_set_page_hwpoison(head, page)) {
> >  		ret = -EHWPOISON;
> >  		goto out;
> >  	}
> > @@ -1751,7 +1827,7 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
> >  	lock_page(head);
> >  
> >  	if (hwpoison_filter(p)) {
> > -		ClearPageHWPoison(head);
> > +		hugetlb_clear_page_hwpoison(head);
> >  		res = -EOPNOTSUPP;
> >  		goto out;
> >  	}
> > 
> 
> Many thanks for your hard work. :)

Thanks for the detailed review and feedback.

- Naoya Horiguchi

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [mm-unstable PATCH v4 5/9] mm, hwpoison: make unpoison aware of raw error info in hwpoisoned hugepage
  2022-07-06  2:58   ` Miaohe Lin
@ 2022-07-06 23:06     ` HORIGUCHI NAOYA(堀口 直也)
  2022-07-07  1:35       ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 1 reply; 30+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-07-06 23:06 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Naoya Horiguchi, linux-mm, Andrew Morton, David Hildenbrand,
	Mike Kravetz, Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	linux-kernel

On Wed, Jul 06, 2022 at 10:58:53AM +0800, Miaohe Lin wrote:
> On 2022/7/4 9:33, Naoya Horiguchi wrote:
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > 
> > Raw error info list needs to be removed when hwpoisoned hugetlb is
> > unpoisoned.  And unpoison handler needs to know how many errors there
> > are in the target hugepage. So add them.
> > 
> > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > ---
> > @@ -2287,6 +2301,7 @@ int unpoison_memory(unsigned long pfn)
> 
> Is it safe to unpoison hugepage when HPageRawHwpUnreliable? I'm afraid because
> some raw error info is missing..

Ah, right. We need to prevent it.  I'll fix it by inserting the check.

 static inline long free_raw_hwp_pages(struct page *hpage, bool move_flag)
 {
         struct llist_head *head;
         struct llist_node *t, *tnode;
         long count = 0;
 
+        if (!HPageRawHwpUnreliable(hpage))
+                return 0;

Thanks,
Naoya Horiguchi

> Thanks.
> 
> >  	struct page *p;
> >  	int ret = -EBUSY;
> >  	int freeit = 0;
> > +	long count = 1;
> >  	static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
> >  					DEFAULT_RATELIMIT_BURST);
> >  
> > @@ -2334,6 +2349,8 @@ int unpoison_memory(unsigned long pfn)
> >  
> >  	ret = get_hwpoison_page(p, MF_UNPOISON);
> >  	if (!ret) {
> > +		if (PageHuge(p))
> > +			count = free_raw_hwp_pages(page, false);
> >  		ret = TestClearPageHWPoison(page) ? 0 : -EBUSY;
> >  	} else if (ret < 0) {
> >  		if (ret == -EHWPOISON) {
> > @@ -2342,6 +2359,8 @@ int unpoison_memory(unsigned long pfn)
> >  			unpoison_pr_info("Unpoison: failed to grab page %#lx\n",
> >  					 pfn, &unpoison_rs);
> >  	} else {
> > +		if (PageHuge(p))
> > +			count = free_raw_hwp_pages(page, false);
> >  		freeit = !!TestClearPageHWPoison(p);
> >  
> >  		put_page(page);
> > @@ -2354,7 +2373,7 @@ int unpoison_memory(unsigned long pfn)
> >  unlock_mutex:
> >  	mutex_unlock(&mf_mutex);
> >  	if (!ret || freeit) {
> > -		num_poisoned_pages_dec();
> > +		num_poisoned_pages_sub(count);
> >  		unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
> >  				 page_to_pfn(p), &unpoison_rs);
> >  	}
> > 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [mm-unstable PATCH v4 1/9] mm/hugetlb: check gigantic_page_runtime_supported() in return_unused_surplus_pages()
  2022-07-06 21:51   ` Mike Kravetz
@ 2022-07-07  0:56     ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 0 replies; 30+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-07-07  0:56 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Naoya Horiguchi, linux-mm, Andrew Morton, David Hildenbrand,
	Miaohe Lin, Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	linux-kernel

On Wed, Jul 06, 2022 at 02:51:00PM -0700, Mike Kravetz wrote:
> On 07/04/22 10:33, Naoya Horiguchi wrote:
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > 
> > I found a weird state of 1GB hugepage pool, caused by the following
> > procedure:
> > 
> >   - run a process reserving all free 1GB hugepages,
> >   - shrink free 1GB hugepage pool to zero (i.e. writing 0 to
> >     /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages), then
> >   - kill the reserving process.
> > 
> > , then all the hugepages are free *and* surplus at the same time.
> > 
> >   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
> >   3
> >   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/free_hugepages
> >   3
> >   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/resv_hugepages
> >   0
> >   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/surplus_hugepages
> >   3
> > 
> > This state is resolved by reserving and allocating the pages then
> > freeing them again, so this seems not to result in serious problem.
> > But it's a little surprising (shrinking pool suddenly fails).
> > 
> > This behavior is caused by hstate_is_gigantic() check in
> > return_unused_surplus_pages(). This was introduced so long ago in 2008
> > by commit aa888a74977a ("hugetlb: support larger than MAX_ORDER"), and
> > at that time the gigantic pages were not supposed to be allocated/freed
> > at run-time.  Now kernel can support runtime allocation/free, so let's
> > check gigantic_page_runtime_supported() together.
> > 
> > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > ---
> > v2 -> v3:
> > - Fixed typo in patch description,
> > - add !gigantic_page_runtime_supported() check instead of removing
> >   hstate_is_gigantic() check (suggested by Miaohe and Muchun)
> > - add a few more !gigantic_page_runtime_supported() check in
> >   set_max_huge_pages() (by Mike).
> 
> Hi Naoya,
> 
> My apologies for suggesting the above checks in set_max_huge_pages().
> set_max_huge_pages is only called from __nr_hugepages_store_common.
> At the very beginning of __nr_hugepages_store_common is this:
> 
> 	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
> 		return -EINVAL;
> 
> So, those extra checks in set_max_huge_pages are unnecessary.  Sorry!

OK, so I'll drop both checks, thank you.
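So, with both set_max_huge_pages() checks dropped, what remains is essentially the
return_unused_surplus_pages() hunk quoted above; the resulting code would read roughly:

 	/* Uncommit the reservation */
 	h->resv_huge_pages -= unused_resv_pages;

 	/*
 	 * Gigantic pages can be returned here too, as long as runtime
 	 * allocation/free of gigantic pages is supported.
 	 */
 	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
 		goto out;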

- Naoya Horiguchi

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [mm-unstable PATCH v4 5/9] mm, hwpoison: make unpoison aware of raw error info in hwpoisoned hugepage
  2022-07-06 23:06     ` HORIGUCHI NAOYA(堀口 直也)
@ 2022-07-07  1:35       ` HORIGUCHI NAOYA(堀口 直也)
  2022-07-07  3:08         ` Miaohe Lin
  0 siblings, 1 reply; 30+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-07-07  1:35 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Naoya Horiguchi, linux-mm, Andrew Morton, David Hildenbrand,
	Mike Kravetz, Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	linux-kernel

On Wed, Jul 06, 2022 at 11:06:28PM +0000, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Wed, Jul 06, 2022 at 10:58:53AM +0800, Miaohe Lin wrote:
> > On 2022/7/4 9:33, Naoya Horiguchi wrote:
> > > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > > 
> > > Raw error info list needs to be removed when hwpoisoned hugetlb is
> > > unpoisoned.  And unpoison handler needs to know how many errors there
> > > are in the target hugepage. So add them.
> > > 
> > > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > > ---
> > > @@ -2287,6 +2301,7 @@ int unpoison_memory(unsigned long pfn)
> > 
> > Is it safe to unpoison a hugepage when HPageRawHwpUnreliable? I'm afraid it isn't, because
> > some raw error info is missing...
> 
> Ah, right. We need to prevent it.  I'll fix it by inserting the check.
> 
>  static inline long free_raw_hwp_pages(struct page *hpage, bool move_flag)
>  {
>          struct llist_head *head;
>          struct llist_node *t, *tnode;
>          long count = 0;
>  
> +        if (!HPageRawHwpUnreliable(hpage))
> +                return 0;

No, I meant "if (HPageRawHwpUnreliable(hpage))", sorry for the noise :(

- Naoya Horiguchi

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [mm-unstable PATCH v4 1/9] mm/hugetlb: check gigantic_page_runtime_supported() in return_unused_surplus_pages()
  2022-07-06  3:22         ` Mike Kravetz
@ 2022-07-07  2:59           ` Miaohe Lin
  0 siblings, 0 replies; 30+ messages in thread
From: Miaohe Lin @ 2022-07-07  2:59 UTC (permalink / raw)
  To: Mike Kravetz, alex
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Naoya Horiguchi, linux-mm, Andrew Morton, David Hildenbrand,
	Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song, linux-kernel

On 2022/7/6 11:22, Mike Kravetz wrote:
> On 07/06/22 11:04, Miaohe Lin wrote:
>> On 2022/7/5 14:39, HORIGUCHI NAOYA(堀口 直也) wrote:
>>> On Tue, Jul 05, 2022 at 10:16:39AM +0800, Miaohe Lin wrote:
>>>> On 2022/7/4 9:33, Naoya Horiguchi wrote:
>>>>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>>>>
>>>>> I found a weird state of 1GB hugepage pool, caused by the following
>>>>> procedure:
>>>>>
>>>>>   - run a process reserving all free 1GB hugepages,
>>>>>   - shrink free 1GB hugepage pool to zero (i.e. writing 0 to
>>>>>     /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages), then
>>>>>   - kill the reserving process.
>>>>>
>>>>> , then all the hugepages are free *and* surplus at the same time.
>>>>>
>>>>>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
>>>>>   3
>>>>>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/free_hugepages
>>>>>   3
>>>>>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/resv_hugepages
>>>>>   0
>>>>>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/surplus_hugepages
>>>>>   3
>>>>>
>>>>> This state is resolved by reserving and allocating the pages then
>>>>> freeing them again, so this seems not to result in serious problem.
>>>>> But it's a little surprising (shrinking pool suddenly fails).
>>>>>
>>>>> This behavior is caused by hstate_is_gigantic() check in
>>>>> return_unused_surplus_pages(). This was introduced so long ago in 2008
>>>>> by commit aa888a74977a ("hugetlb: support larger than MAX_ORDER"), and
>>>>> at that time the gigantic pages were not supposed to be allocated/freed
>>>>> at run-time.  Now kernel can support runtime allocation/free, so let's
>>>>> check gigantic_page_runtime_supported() together.
>>>>>
>>>>> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>>>
>>>> This patch looks good to me with a few question below.
>>>
>>> Thank you for reviewing.
>>>
>>>>
>>>>> ---
>>>>> v2 -> v3:
>>>>> - Fixed typo in patch description,
>>>>> - add !gigantic_page_runtime_supported() check instead of removing
>>>>>   hstate_is_gigantic() check (suggested by Miaohe and Muchun)
>>>>> - add a few more !gigantic_page_runtime_supported() check in
>>>>>   set_max_huge_pages() (by Mike).
>>>>> ---
>>>>>  mm/hugetlb.c | 19 ++++++++++++++++---
>>>>>  1 file changed, 16 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>>>> index 2a554f006255..bdc4499f324b 100644
>>>>> --- a/mm/hugetlb.c
>>>>> +++ b/mm/hugetlb.c
>>>>> @@ -2432,8 +2432,7 @@ static void return_unused_surplus_pages(struct hstate *h,
>>>>>  	/* Uncommit the reservation */
>>>>>  	h->resv_huge_pages -= unused_resv_pages;
>>>>>  
>>>>> -	/* Cannot return gigantic pages currently */
>>>>> -	if (hstate_is_gigantic(h))
>>>>> +	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
>>>>>  		goto out;
>>>>>  
>>>>>  	/*
>>>>> @@ -3315,7 +3314,8 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
>>>>>  	 * the user tries to allocate gigantic pages but let the user free the
>>>>>  	 * boottime allocated gigantic pages.
>>>>>  	 */
>>>>> -	if (hstate_is_gigantic(h) && !IS_ENABLED(CONFIG_CONTIG_ALLOC)) {
>>>>> +	if (hstate_is_gigantic(h) && (!IS_ENABLED(CONFIG_CONTIG_ALLOC) ||
>>>>> +				      !gigantic_page_runtime_supported())) {
>>>>>  		if (count > persistent_huge_pages(h)) {
>>>>>  			spin_unlock_irq(&hugetlb_lock);
>>>>>  			mutex_unlock(&h->resize_lock);
>>>>> @@ -3363,6 +3363,19 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
>>>>>  			goto out;
>>>>>  	}
>>>>>  
>>>>> +	/*
>>>>> +	 * We can not decrease gigantic pool size if runtime modification
>>>>> +	 * is not supported.
>>>>> +	 */
>>>>> +	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported()) {
>>>>> +		if (count < persistent_huge_pages(h)) {
>>>>> +			spin_unlock_irq(&hugetlb_lock);
>>>>> +			mutex_unlock(&h->resize_lock);
>>>>> +			NODEMASK_FREE(node_alloc_noretry);
>>>>> +			return -EINVAL;
>>>>> +		}
>>>>> +	}
>>>>
>>>> With the above change, we're not allowed to decrease the pool size now, but it was allowed
>>>> previously even if !gigantic_page_runtime_supported. Won't this break users?
>>>
>>> Yes, it does. I might have had the wrong idea about the definition of
>>> gigantic_page_runtime_supported(), which indicates whether runtime pool *extension*
>>> is supported (implying that pool shrinking is always possible).
>>> If that is right, this new if-block is not necessary.
>>
>> I'd tend to remove the above new if-block to keep pool shrinking available.
>>
> 
> Not sure I am following the questions.
> 
> Take a look at __update_and_free_page which will refuse to 'free' a
> gigantic page if !gigantic_page_runtime_supported.  I 'think' attempting
> to shrink the pool when !gigantic_page_runtime_supported will result in
> leaking gigantic pages.  i.e.  Memory will remain allocated for the

It seems commit 4eb0716e868e ("hugetlb: allow to free gigantic pages regardless of the configuration")
added the ability to free gigantic pages even if !gigantic_page_supported(). If gigantic pages can't be
freed due to the gigantic_page_runtime_supported check in __update_and_free_page, there might be something
to do -- either disallow trying to free gigantic pages when !gigantic_page_supported, or succeed in freeing
gigantic pages regardless of gigantic_page_supported. Maybe I am missing something important. Adding
Alexandre to help confirm.

Thanks!

> gigantic page, but it can not be used.
> 
> I can take a closer look tomorrow.
> 
> IIRC, the only way gigantic_page_runtime_supported is not set today is
> in the case of powerpc using 16GB pages allocated/managed by firmware.
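To make the leak concrete, a condensed sketch of the sequence described above (names taken
from the existing code, heavily simplified):

 /*
  * By the time __update_and_free_page() runs, remove_hugetlb_page() has
  * already taken the page off the free list and adjusted the pool counters,
  * so this early return leaves a gigantic page that is allocated but
  * unreachable from the pool.
  */
 static void __update_and_free_page(struct hstate *h, struct page *page)
 {
 	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
 		return;

 	/* ... the normal freeing path ... */
 }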
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [mm-unstable PATCH v4 5/9] mm, hwpoison: make unpoison aware of raw error info in hwpoisoned hugepage
  2022-07-07  1:35       ` HORIGUCHI NAOYA(堀口 直也)
@ 2022-07-07  3:08         ` Miaohe Lin
  0 siblings, 0 replies; 30+ messages in thread
From: Miaohe Lin @ 2022-07-07  3:08 UTC (permalink / raw)
  To: HORIGUCHI NAOYA(堀口 直也)
  Cc: Naoya Horiguchi, linux-mm, Andrew Morton, David Hildenbrand,
	Mike Kravetz, Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	linux-kernel

On 2022/7/7 9:35, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Wed, Jul 06, 2022 at 11:06:28PM +0000, HORIGUCHI NAOYA(堀口 直也) wrote:
>> On Wed, Jul 06, 2022 at 10:58:53AM +0800, Miaohe Lin wrote:
>>> On 2022/7/4 9:33, Naoya Horiguchi wrote:
>>>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>>>
>>>> Raw error info list needs to be removed when hwpoisoned hugetlb is
>>>> unpoisoned.  And unpoison handler needs to know how many errors there
>>>> are in the target hugepage. So add them.
>>>>
>>>> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>>> ---
>>>> @@ -2287,6 +2301,7 @@ int unpoison_memory(unsigned long pfn)
>>>
>>> Is it safe to unpoison a hugepage when HPageRawHwpUnreliable? I'm afraid it isn't, because
>>> some raw error info is missing...
>>
>> Ah, right. We need to prevent it.  I'll fix it by inserting the check.
>>
>>  static inline long free_raw_hwp_pages(struct page *hpage, bool move_flag)
>>  {
>>          struct llist_head *head;
>>          struct llist_node *t, *tnode;
>>          long count = 0;
>>  
>> +        if (!HPageRawHwpUnreliable(hpage))
>> +                return 0;

IIUC, even if we return 0 here, the caller will still do TestClearPageHWPoison (please see the
code diff below) and succeed in unpoisoning the page. Or am I missing something?

@@ -2334,6 +2349,8 @@ int unpoison_memory(unsigned long pfn)

 	ret = get_hwpoison_page(p, MF_UNPOISON);
 	if (!ret) {
+		if (PageHuge(p))
+			count = free_raw_hwp_pages(page, false);
 		ret = TestClearPageHWPoison(page) ? 0 : -EBUSY;
 	} else if (ret < 0) {
 		if (ret == -EHWPOISON) {

> 
> No, I meant "if (HPageRawHwpUnreliable(hpage))", sorry for the noise :(

No problem, thanks for your hard work!

> 
> - Naoya Horiguchi

Thanks.
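For what it's worth, one possible way to close that hole (just a sketch of the idea, not
something posted in this thread) is to have unpoison_memory() bail out when nothing could
be taken off the raw error list for a hugetlb page:

 	ret = get_hwpoison_page(p, MF_UNPOISON);
 	if (!ret) {
 		if (PageHuge(p)) {
 			count = free_raw_hwp_pages(page, false);
 			if (count == 0) {
 				/* Raw error info was unreliable; keep the page poisoned. */
 				ret = -EBUSY;
 				goto unlock_mutex;
 			}
 		}
 		ret = TestClearPageHWPoison(page) ? 0 : -EBUSY;
 	}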

> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [mm-unstable PATCH v4 4/9] mm, hwpoison, hugetlb: support saving mechanism of raw error pages
  2022-07-06 23:06     ` HORIGUCHI NAOYA(堀口 直也)
@ 2022-07-07  3:22       ` Miaohe Lin
  0 siblings, 0 replies; 30+ messages in thread
From: Miaohe Lin @ 2022-07-07  3:22 UTC (permalink / raw)
  To: HORIGUCHI NAOYA(堀口 直也)
  Cc: Naoya Horiguchi, linux-mm, Andrew Morton, David Hildenbrand,
	Mike Kravetz, Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	linux-kernel

On 2022/7/7 7:06, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Wed, Jul 06, 2022 at 10:37:50AM +0800, Miaohe Lin wrote:
>> On 2022/7/4 9:33, Naoya Horiguchi wrote:
>>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>>
>>> When handling memory error on a hugetlb page, the error handler tries to
>>> dissolve and turn it into 4kB pages.  If it's successfully dissolved,
>>> PageHWPoison flag is moved to the raw error page, so that's all right.
>>> However, dissolving sometimes fails, and then the error page is left as a
>>> hwpoisoned hugepage. It would be useful if we could retry dissolving it to save
>>> the healthy pages, but that's not possible now because the information about
>>> where the raw error pages are is lost.
>>>
>>> Use the private field of a few tail pages to keep that information.  The
>>> code path that shrinks the hugepage pool uses this info to retry the dissolve.
>>> In order to remember multiple errors in a hugepage, a singly-linked list
>>> originating from the SUBPAGE_INDEX_HWPOISON-th tail page is constructed.  Only
>>> simple operations (adding an entry or clearing all) are required and the
>>> list is assumed not to be very long, so this simple data structure should
>>> be enough.
>>>
>>> If we fail to save the raw error info, the hwpoisoned hugepage has errors on
>>> an unknown subpage and this new saving mechanism no longer works, so disable
>>> saving new raw error info and stop freeing such hwpoisoned hugepages.
>>>
>>> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>> ---
> ...
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> index 66bb39e0fce8..ccd470f0194c 100644
>>> --- a/mm/hugetlb.c
>>> +++ b/mm/hugetlb.c
>>> @@ -1535,17 +1535,15 @@ static void __update_and_free_page(struct hstate *h, struct page *page)
>>>  	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
>>>  		return;
>>>  
>>> -	if (hugetlb_vmemmap_restore(h, page)) {
>>> -		spin_lock_irq(&hugetlb_lock);
>>> -		/*
>>> -		 * If we cannot allocate vmemmap pages, just refuse to free the
>>> -		 * page and put the page back on the hugetlb free list and treat
>>> -		 * as a surplus page.
>>> -		 */
>>> -		add_hugetlb_page(h, page, true);
>>> -		spin_unlock_irq(&hugetlb_lock);
>>> -		return;
>>> -	}
>>> +	if (hugetlb_vmemmap_restore(h, page))
>>> +		goto fail;
>>> +
>>> +	/*
>>> +	 * Move PageHWPoison flag from head page to the raw error pages,
>>> +	 * which makes any healthy subpages reusable.
>>> +	 */
>>> +	if (unlikely(PageHWPoison(page) && hugetlb_clear_page_hwpoison(page)))
>>> +		goto fail;
>>
>> IIUC, HPageVmemmapOptimized must have been cleared via hugetlb_vmemmap_restore above. So
>> VM_BUG_ON_PAGE(!HPageVmemmapOptimized(page), page) in add_hugetlb_page will be triggered
>> if we go to fail here. add_hugetlb_page is expected to be called when we cannot allocate
>> vmemmap pages.
> 
> Thanks a lot, you're right. I shouldn't have simply factored the failure path out with
> the goto label.  Since it's hard to undo hugetlb_vmemmap_restore(), I'll check
> HPageRawHwpUnreliable() before hugetlb_vmemmap_restore(), then try
> hugetlb_clear_page_hwpoison() after it (where the tail pages are available).

Sounds feasible.
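A rough sketch of that reordering (illustrative only, not a posted patch; how the
unreliable case should be handled -- plain return as below, or re-adding the page as
surplus with the add_hugetlb_page() caveat above -- is left open here):

 static void __update_and_free_page(struct hstate *h, struct page *page)
 {
 	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
 		return;

 	/*
 	 * The raw hwpoison subpages are unknown, so the hugepage cannot be
 	 * dissolved safely; leave it alone (effectively leaking it).
 	 */
 	if (unlikely(PageHWPoison(page) && HPageRawHwpUnreliable(page)))
 		return;

 	if (hugetlb_vmemmap_restore(h, page)) {
 		spin_lock_irq(&hugetlb_lock);
 		/* Could not allocate vmemmap pages; treat as a surplus page. */
 		add_hugetlb_page(h, page, true);
 		spin_unlock_irq(&hugetlb_lock);
 		return;
 	}

 	/* The tail struct pages are writable now; move PG_hwpoison to them. */
 	if (unlikely(PageHWPoison(page)))
 		hugetlb_clear_page_hwpoison(page);

 	/* ... the existing freeing path continues from here ... */
 }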

> 
>>
>>>  
>>>  	for (i = 0; i < pages_per_huge_page(h);
>>>  	     i++, subpage = mem_map_next(subpage, page, i)) {
>>> @@ -1566,6 +1564,16 @@ static void __update_and_free_page(struct hstate *h, struct page *page)
>>>  	} else {
>>>  		__free_pages(page, huge_page_order(h));
>>>  	}
>>> +	return;
>>> +fail:
>>> +	spin_lock_irq(&hugetlb_lock);
>>> +	/*
>>> +	 * If we cannot allocate vmemmap pages or cannot identify raw hwpoison
>>> +	 * subpages reliably, just refuse to free the page and put the page
>>> +	 * back on the hugetlb free list and treat as a surplus page.
>>> +	 */
>>> +	add_hugetlb_page(h, page, true);
>>> +	spin_unlock_irq(&hugetlb_lock);
>>>  }
>>>  
>>>  /*
>>> @@ -2109,15 +2117,6 @@ int dissolve_free_huge_page(struct page *page)
>>>  		 */
>>>  		rc = hugetlb_vmemmap_restore(h, head);
>>>  		if (!rc) {
>>> -			/*
>>> -			 * Move PageHWPoison flag from head page to the raw
>>> -			 * error page, which makes any subpages rather than
>>> -			 * the error page reusable.
>>> -			 */
>>> -			if (PageHWPoison(head) && page != head) {
>>> -				SetPageHWPoison(page);
>>> -				ClearPageHWPoison(head);
>>> -			}
>>>  			update_and_free_page(h, head, false);
>>>  		} else {
>>>  			spin_lock_irq(&hugetlb_lock);
>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>> index c9931c676335..53bf7486a245 100644
>>> --- a/mm/memory-failure.c
>>> +++ b/mm/memory-failure.c
>>> @@ -1664,6 +1664,82 @@ int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
>>>  EXPORT_SYMBOL_GPL(mf_dax_kill_procs);
>>>  #endif /* CONFIG_FS_DAX */
>>>  
>>> +/*
>>> + * Struct raw_hwp_page represents information about "raw error page",
>>> + * constructing singly linked list originated from ->private field of
>>> + * SUBPAGE_INDEX_HWPOISON-th tail page.
>>> + */
>>> +struct raw_hwp_page {
>>> +	struct llist_node node;
>>> +	struct page *page;
>>> +};
>>> +
>>> +static inline struct llist_head *raw_hwp_list_head(struct page *hpage)
>>> +{
>>> +	return (struct llist_head *)&page_private(hpage + SUBPAGE_INDEX_HWPOISON);
>>> +}
>>> +
>>> +static inline int hugetlb_set_page_hwpoison(struct page *hpage,
>>> +					struct page *page)
>>> +{
>>> +	struct llist_head *head;
>>> +	struct raw_hwp_page *raw_hwp;
>>> +	struct llist_node *t, *tnode;
>>> +	int ret;
>>> +
>>> +	/*
>>> +	 * Once the hwpoison hugepage has lost reliable raw error info,
>>> +	 * there is little meaning to keep additional error info precisely,
>>> +	 * so skip to add additional raw error info.
>>> +	 */
>>> +	if (HPageRawHwpUnreliable(hpage))
>>> +		return -EHWPOISON;
>>
>> If we return here, num_poisoned_pages can't reflect all the hwpoisoned hugepages?
> 
> No, it can't now.  Currently we try (and fail) to count only the number of
> hwpoison subpages with raw_hwp_info.  If we want to track all corrupted
> pages (including hwpoison subpages without raw_hwp_info), maybe running the
> following part first in this function could make it better.
> 
> 	ret = TestSetPageHWPoison(hpage) ? -EHWPOISON : 0;
> 	/* the first error event will be counted in action_result(). */
> 	if (ret)
> 		num_poisoned_pages_inc();
> 
> But I like the option you suggest below.
> 
>>
>>> +	head = raw_hwp_list_head(hpage);
>>> +	llist_for_each_safe(tnode, t, head->first) {
>>> +		struct raw_hwp_page *p = container_of(tnode, struct raw_hwp_page, node);
>>> +
>>> +		if (p->page == page)
>>> +			return -EHWPOISON;
>>> +	}
>>> +
>>> +	ret = TestSetPageHWPoison(hpage) ? -EHWPOISON : 0;
>>> +	/* the first error event will be counted in action_result(). */
>>> +	if (ret)
>>> +		num_poisoned_pages_inc();
>>> +
>>> +	raw_hwp = kmalloc(sizeof(struct raw_hwp_page), GFP_ATOMIC);
>>> +	if (raw_hwp) {
>>> +		raw_hwp->page = page;
>>> +		llist_add(&raw_hwp->node, head);
>>
>> IMHO, we might need to do num_poisoned_pages_inc here because we decrement the
>> num_poisoned_pages according to the llist length.
> 
> Yes, if we'd like to count only hwpoisoned subpages with raw_hwp_info,
> doing num_poisoned_pages_inc here is fine.
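A sketch of moving the accounting there (illustrative only, reshuffling the hunk quoted
above; the first error would still be counted via action_result()):

 	ret = TestSetPageHWPoison(hpage) ? -EHWPOISON : 0;

 	raw_hwp = kmalloc(sizeof(struct raw_hwp_page), GFP_ATOMIC);
 	if (raw_hwp) {
 		raw_hwp->page = page;
 		llist_add(&raw_hwp->node, head);
 		/*
 		 * Count the error only once its raw info is recorded, so that
 		 * the counter matches the llist length unpoison relies on.
 		 */
 		if (ret)
 			num_poisoned_pages_inc();
 	} else {
 		SetHPageRawHwpUnreliable(hpage);
 	}
 	return ret;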
> 
>>> +	} else {
>>> +		/*
>>> +		 * Failed to save raw error info.  We no longer trace all
>>> +		 * hwpoisoned subpages, and we need refuse to free/dissolve
>>> +		 * this hwpoisoned hugepage.
>>> +		 */
>>> +		SetHPageRawHwpUnreliable(hpage);
>>> +	}
>>> +	return ret;
>>> +}
>>> +
>>> +inline int hugetlb_clear_page_hwpoison(struct page *hpage)
>>
>> Off-topic: is "inline" needed here? I see hugetlb_clear_page_hwpoison is declared "extern" above.
> 
> Maybe not; this code is not performance-sensitive, and the inline is actually a
> leftover from updates in previous versions. I'll remove it.
> 
>>
>>> +{
>>> +	struct llist_head *head;
>>> +	struct llist_node *t, *tnode;
>>> +
>>> +	if (!HPageRawHwpUnreliable(hpage))
>>> +		ClearPageHWPoison(hpage);
>>> +	head = raw_hwp_list_head(hpage);
>>> +	llist_for_each_safe(tnode, t, head->first) {
>>> +		struct raw_hwp_page *p = container_of(tnode, struct raw_hwp_page, node);
>>> +
>>> +		SetPageHWPoison(p->page);
>>
>> IMHO, in the HPageRawHwpUnreliable(hpage) case, it's better not to do SetPageHWPoison here:
>> because the hugepage won't be dissolved, we cannot write any data to some tail struct
>> pages if HugeTLB Vmemmap Optimization is enabled. Freeing the memory here should be enough.
> 
> This is a good point, too. The current version surely does not work with HVO, so
> I'm thinking of simply giving up on clearing hwpoison when HPageVmemmapOptimized is
> true. And I should leave an inline comment about this.

Sounds like a good idea. This will make life easier.

> 
>>> +		kfree(p);
>>> +	}
>>> +	llist_del_all(head);
>>> +	return 0;
>>> +}
>>
>> I thought num_poisoned_pages_dec was missing and the return value was unneeded, but this is
>> changed in the next patch, so it should be fine here.
> 
> OK.
> 
>>
>>> +
>>>  /*
>>>   * Called from hugetlb code with hugetlb_lock held.
>>>   *
>>> @@ -1698,7 +1774,7 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
>>>  		goto out;
>>>  	}
>>>  
>>> -	if (TestSetPageHWPoison(head)) {
>>> +	if (hugetlb_set_page_hwpoison(head, page)) {
>>>  		ret = -EHWPOISON;
>>>  		goto out;
>>>  	}
>>> @@ -1751,7 +1827,7 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
>>>  	lock_page(head);
>>>  
>>>  	if (hwpoison_filter(p)) {
>>> -		ClearPageHWPoison(head);
>>> +		hugetlb_clear_page_hwpoison(head);
>>>  		res = -EOPNOTSUPP;
>>>  		goto out;
>>>  	}
>>>
>>
>> Many thanks for your hard work. :)
> 
> Thanks for the detailed review and feedback.
> 
> - Naoya Horiguchi

Thanks.

> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2022-07-07  3:22 UTC | newest]

Thread overview: 30+ messages
2022-07-04  1:33 [mm-unstable PATCH v4 0/9] mm, hwpoison: enable 1GB hugepage support (v4) Naoya Horiguchi
2022-07-04  1:33 ` [mm-unstable PATCH v4 1/9] mm/hugetlb: check gigantic_page_runtime_supported() in return_unused_surplus_pages() Naoya Horiguchi
2022-07-05  2:16   ` Miaohe Lin
2022-07-05  6:39     ` HORIGUCHI NAOYA(堀口 直也)
2022-07-06  3:04       ` Miaohe Lin
2022-07-06  3:22         ` Mike Kravetz
2022-07-07  2:59           ` Miaohe Lin
2022-07-06 21:51   ` Mike Kravetz
2022-07-07  0:56     ` HORIGUCHI NAOYA(堀口 直也)
2022-07-04  1:33 ` [mm-unstable PATCH v4 2/9] mm/hugetlb: separate path for hwpoison entry in copy_hugetlb_page_range() Naoya Horiguchi
2022-07-04  1:42   ` Andrew Morton
2022-07-04  2:04     ` HORIGUCHI NAOYA(堀口 直也)
2022-07-04  1:33 ` [mm-unstable PATCH v4 3/9] mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry Naoya Horiguchi
2022-07-05  2:46   ` Miaohe Lin
2022-07-05  9:04     ` HORIGUCHI NAOYA(堀口 直也)
2022-07-06  3:07       ` Miaohe Lin
2022-07-06 22:21   ` Mike Kravetz
2022-07-04  1:33 ` [mm-unstable PATCH v4 4/9] mm, hwpoison, hugetlb: support saving mechanism of raw error pages Naoya Horiguchi
2022-07-06  2:37   ` Miaohe Lin
2022-07-06 23:06     ` HORIGUCHI NAOYA(堀口 直也)
2022-07-07  3:22       ` Miaohe Lin
2022-07-04  1:33 ` [mm-unstable PATCH v4 5/9] mm, hwpoison: make unpoison aware of raw error info in hwpoisoned hugepage Naoya Horiguchi
2022-07-06  2:58   ` Miaohe Lin
2022-07-06 23:06     ` HORIGUCHI NAOYA(堀口 直也)
2022-07-07  1:35       ` HORIGUCHI NAOYA(堀口 直也)
2022-07-07  3:08         ` Miaohe Lin
2022-07-04  1:33 ` [mm-unstable PATCH v4 6/9] mm, hwpoison: set PG_hwpoison for busy hugetlb pages Naoya Horiguchi
2022-07-04  1:33 ` [mm-unstable PATCH v4 7/9] mm, hwpoison: make __page_handle_poison returns int Naoya Horiguchi
2022-07-04  1:33 ` [mm-unstable PATCH v4 8/9] mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage Naoya Horiguchi
2022-07-04  1:33 ` [mm-unstable PATCH v4 9/9] mm, hwpoison: enable memory error handling on " Naoya Horiguchi
