[mm-unstable PATCH v7 0/8] mm, hwpoison: enable 1GB hugepage support (v7)

All of lore.kernel.org
 help / color / mirror / Atom feed

* [mm-unstable PATCH v7 0/8] mm, hwpoison: enable 1GB hugepage support (v7)
@ 2022-07-14  4:24 Naoya Horiguchi
  2022-07-14  4:24 ` [mm-unstable PATCH v7 1/8] mm/hugetlb: check gigantic_page_runtime_supported() in return_unused_surplus_pages() Naoya Horiguchi
                   ` (7 more replies)
  0 siblings, 8 replies; 17+ messages in thread
From: Naoya Horiguchi @ 2022-07-14  4:24 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, David Hildenbrand, Mike Kravetz, Miaohe Lin,
	Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	Naoya Horiguchi, linux-kernel

Here is v7 of "enabling memory error handling on 1GB hugepage" patchset.

I applied feedbacks provided for v6 (thank you, Miaohe).
There're a few improvements on on 3/8 and 4/8.

- v1: https://lore.kernel.org/linux-mm/20220602050631.771414-1-naoya.horiguchi@linux.dev/T/#u
- v2: https://lore.kernel.org/linux-mm/20220623235153.2623702-1-naoya.horiguchi@linux.dev/T/#u
- v3: https://lore.kernel.org/linux-mm/20220630022755.3362349-1-naoya.horiguchi@linux.dev/T/#u
- v4: https://lore.kernel.org/linux-mm/20220704013312.2415700-1-naoya.horiguchi@linux.dev/T/#u
- v5: https://lore.kernel.org/linux-mm/20220708053653.964464-1-naoya.horiguchi@linux.dev/T/#u
- v6: https://lore.kernel.org/linux-mm/20220712032858.170414-1-naoya.horiguchi@linux.dev/T/#u

Thanks,
Naoya Horiguchi
---
Summary:

Naoya Horiguchi (8):
      mm/hugetlb: check gigantic_page_runtime_supported() in return_unused_surplus_pages()
      mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry
      mm, hwpoison, hugetlb: support saving mechanism of raw error pages
      mm, hwpoison: make unpoison aware of raw error info in hwpoisoned hugepage
      mm, hwpoison: set PG_hwpoison for busy hugetlb pages
      mm, hwpoison: make __page_handle_poison returns int
      mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage
      mm, hwpoison: enable memory error handling on 1GB hugepage

 arch/x86/mm/hugetlbpage.c |   8 ++-
 include/linux/hugetlb.h   |  17 ++++-
 include/linux/mm.h        |   2 +-
 include/linux/swapops.h   |   9 +++
 include/ras/ras_event.h   |   1 -
 mm/hugetlb.c              |  58 +++++++++++----
 mm/memory-failure.c       | 179 ++++++++++++++++++++++++++++++++++++++--------
 7 files changed, 226 insertions(+), 48 deletions(-)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [mm-unstable PATCH v7 1/8] mm/hugetlb: check gigantic_page_runtime_supported() in return_unused_surplus_pages()
  2022-07-14  4:24 [mm-unstable PATCH v7 0/8] mm, hwpoison: enable 1GB hugepage support (v7) Naoya Horiguchi
@ 2022-07-14  4:24 ` Naoya Horiguchi
  2022-07-14  4:24 ` [mm-unstable PATCH v7 2/8] mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry Naoya Horiguchi
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Naoya Horiguchi @ 2022-07-14  4:24 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, David Hildenbrand, Mike Kravetz, Miaohe Lin,
	Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	Naoya Horiguchi, linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

I found a weird state of 1GB hugepage pool, caused by the following
procedure:

  - run a process reserving all free 1GB hugepages,
  - shrink free 1GB hugepage pool to zero (i.e. writing 0 to
    /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages), then
  - kill the reserving process.

, then all the hugepages are free *and* surplus at the same time.

  $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
  3
  $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/free_hugepages
  3
  $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/resv_hugepages
  0
  $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/surplus_hugepages
  3

This state is resolved by reserving and allocating the pages then
freeing them again, so this seems not to result in serious problem.
But it's a little surprising (shrinking pool suddenly fails).

This behavior is caused by hstate_is_gigantic() check in
return_unused_surplus_pages(). This was introduced so long ago in 2008
by commit aa888a74977a ("hugetlb: support larger than MAX_ORDER"), and
at that time the gigantic pages were not supposed to be allocated/freed
at run-time.  Now kernel can support runtime allocation/free, so let's
check gigantic_page_runtime_supported() together.

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
---
v4 -> v5:
- drop additional gigantic_page_runtime_supported() checks.

v2 -> v3:
- Fixed typo in patch description,
- add !gigantic_page_runtime_supported() check instead of removing
  hstate_is_gigantic() check (suggested by Miaohe and Muchun)
- add a few more !gigantic_page_runtime_supported() check in
  set_max_huge_pages() (by Mike).
---
 mm/hugetlb.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a4506ed1f1db..cf8ccee7654c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2432,8 +2432,7 @@ static void return_unused_surplus_pages(struct hstate *h,
 	/* Uncommit the reservation */
 	h->resv_huge_pages -= unused_resv_pages;
 
-	/* Cannot return gigantic pages currently */
-	if (hstate_is_gigantic(h))
+	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
 		goto out;
 
 	/*
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [mm-unstable PATCH v7 2/8] mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry
  2022-07-14  4:24 [mm-unstable PATCH v7 0/8] mm, hwpoison: enable 1GB hugepage support (v7) Naoya Horiguchi
  2022-07-14  4:24 ` [mm-unstable PATCH v7 1/8] mm/hugetlb: check gigantic_page_runtime_supported() in return_unused_surplus_pages() Naoya Horiguchi
@ 2022-07-14  4:24 ` Naoya Horiguchi
  2022-11-02 20:51     ` [Intel-gfx] " Ville Syrjälä
  2022-07-14  4:24 ` [mm-unstable PATCH v7 3/8] mm, hwpoison, hugetlb: support saving mechanism of raw error pages Naoya Horiguchi
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Naoya Horiguchi @ 2022-07-14  4:24 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, David Hildenbrand, Mike Kravetz, Miaohe Lin,
	Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	Naoya Horiguchi, linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

follow_pud_mask() does not support non-present pud entry now.  As long as
I tested on x86_64 server, follow_pud_mask() still simply returns
no_page_table() for non-present_pud_entry() due to pud_bad(), so no severe
user-visible effect should happen.  But generally we should call
follow_huge_pud() for non-present pud entry for 1GB hugetlb page.

Update pud_huge() and follow_huge_pud() to handle non-present pud entries.
The changes are similar to previous works for pud entries commit e66f17ff7177
("mm/hugetlb: take page table lock in follow_huge_pmd()") and commit
cbef8478bee5 ("mm/hugetlb: pmd_huge() returns true for non-present hugepage").

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
---
v2 -> v3:
- fixed typos in subject and description,
- added comment on pud_huge(),
- added comment about fallback for hwpoisoned entry,
- updated initial check about FOLL_{PIN,GET} flags.
---
 arch/x86/mm/hugetlbpage.c |  8 +++++++-
 mm/hugetlb.c              | 32 ++++++++++++++++++++++++++++++--
 2 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index 509408da0da1..6b3033845c6d 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -30,9 +30,15 @@ int pmd_huge(pmd_t pmd)
 		(pmd_val(pmd) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
 }
 
+/*
+ * pud_huge() returns 1 if @pud is hugetlb related entry, that is normal
+ * hugetlb entry or non-present (migration or hwpoisoned) hugetlb entry.
+ * Otherwise, returns 0.
+ */
 int pud_huge(pud_t pud)
 {
-	return !!(pud_val(pud) & _PAGE_PSE);
+	return !pud_none(pud) &&
+		(pud_val(pud) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
 }
 
 #ifdef CONFIG_HUGETLB_PAGE
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index cf8ccee7654c..77119d93a0f9 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6978,10 +6978,38 @@ struct page * __weak
 follow_huge_pud(struct mm_struct *mm, unsigned long address,
 		pud_t *pud, int flags)
 {
-	if (flags & (FOLL_GET | FOLL_PIN))
+	struct page *page = NULL;
+	spinlock_t *ptl;
+	pte_t pte;
+
+	if (WARN_ON_ONCE(flags & FOLL_PIN))
 		return NULL;
 
-	return pte_page(*(pte_t *)pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
+retry:
+	ptl = huge_pte_lock(hstate_sizelog(PUD_SHIFT), mm, (pte_t *)pud);
+	if (!pud_huge(*pud))
+		goto out;
+	pte = huge_ptep_get((pte_t *)pud);
+	if (pte_present(pte)) {
+		page = pud_page(*pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
+		if (WARN_ON_ONCE(!try_grab_page(page, flags))) {
+			page = NULL;
+			goto out;
+		}
+	} else {
+		if (is_hugetlb_entry_migration(pte)) {
+			spin_unlock(ptl);
+			__migration_entry_wait(mm, (pte_t *)pud, ptl);
+			goto retry;
+		}
+		/*
+		 * hwpoisoned entry is treated as no_page_table in
+		 * follow_page_mask().
+		 */
+	}
+out:
+	spin_unlock(ptl);
+	return page;
 }
 
 struct page * __weak
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [mm-unstable PATCH v7 3/8] mm, hwpoison, hugetlb: support saving mechanism of raw error pages
  2022-07-14  4:24 [mm-unstable PATCH v7 0/8] mm, hwpoison: enable 1GB hugepage support (v7) Naoya Horiguchi
  2022-07-14  4:24 ` [mm-unstable PATCH v7 1/8] mm/hugetlb: check gigantic_page_runtime_supported() in return_unused_surplus_pages() Naoya Horiguchi
  2022-07-14  4:24 ` [mm-unstable PATCH v7 2/8] mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry Naoya Horiguchi
@ 2022-07-14  4:24 ` Naoya Horiguchi
  2022-07-14  4:24 ` [mm-unstable PATCH v7 4/8] mm, hwpoison: make unpoison aware of raw error info in hwpoisoned hugepage Naoya Horiguchi
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Naoya Horiguchi @ 2022-07-14  4:24 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, David Hildenbrand, Mike Kravetz, Miaohe Lin,
	Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	Naoya Horiguchi, linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

When handling memory error on a hugetlb page, the error handler tries to
dissolve and turn it into 4kB pages.  If it's successfully dissolved,
PageHWPoison flag is moved to the raw error page, so that's all right.
However, dissolve sometimes fails, then the error page is left as
hwpoisoned hugepage. It's useful if we can retry to dissolve it to save
healthy pages, but that's not possible now because the information about
where the raw error pages is lost.

Use the private field of a few tail pages to keep that information.  The
code path of shrinking hugepage pool uses this info to try delayed dissolve.
In order to remember multiple errors in a hugepage, a singly-linked list
originated from SUBPAGE_INDEX_HWPOISON-th tail page is constructed.  Only
simple operations (adding an entry or clearing all) are required and the
list is assumed not to be very long, so this simple data structure should
be enough.

If we failed to save raw error info, the hwpoison hugepage has errors on
unknown subpage, then this new saving mechanism does not work any more,
so disable saving new raw error info and freeing hwpoison hugepages.

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
---
v6 -> v7:
- free raw_hwp_page list when HPageRawHwpUnreliable is set,
- hugetlb_clear_page_hwpoison returns immediately when
  HPageRawHwpUnreliable is set.

v5 -> v6:
- remove additional buggy HPageRawHwpUnreliable check

v4 -> v5:
- fixed build error (reported by kernel test robot).
- do not try to undo remove_hugetlb_page() when HPageRawHwpUnreliable is true,
- check HPageRawHwpUnreliable() before hugetlb_vmemmap_restore(),
- call num_poisoned_pages_inc() in hugetlb_set_page_hwpoison() when kalloc
  succeeds,
- remove "inline" in the definition of hugetlb_clear_page_hwpoison().

v3 -> v4:
- resolve conflict with "mm: hugetlb_vmemmap: improve hugetlb_vmemmap
  code readability", use hugetlb_vmemmap_restore() instead of
  hugetlb_vmemmap_alloc().

v2 -> v3:
- remove duplicate "return ret" lines,
- use GFP_ATOMIC instead of GFP_KERNEL,
- introduce HPageRawHwpUnreliable pseudo flag (suggested by Muchun),
- hugetlb_clear_page_hwpoison removes raw_hwp_page list even if
  HPageRawHwpUnreliable is true, (by Miaohe)

v1 -> v2:
- support hwpoison hugepage with multiple errors,
- moved the new interface functions to mm/memory-failure.c,
- define additional subpage index SUBPAGE_INDEX_HWPOISON_UNRELIABLE,
- stop freeing/dissolving hwpoison hugepages with unreliable raw error info,
- drop hugetlb_clear_page_hwpoison() in dissolve_free_huge_page() because
  that's done in update_and_free_page(),
- move setting/clearing PG_hwpoison flag to the new interfaces,
- checking already hwpoisoned or not on a subpage basis.

ChangeLog since previous post on 4/27:
- fixed typo in patch description (by Miaohe)
- fixed config value in #ifdef statement (by Miaohe)
- added sentences about "multiple hwpoison pages" scenario in patch
  description
---
 include/linux/hugetlb.h | 17 +++++++-
 mm/hugetlb.c            | 23 ++++++-----
 mm/memory-failure.c     | 89 +++++++++++++++++++++++++++++++++++++++--
 3 files changed, 116 insertions(+), 13 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 6d0620edf0a6..3ec981a0d8b3 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -42,6 +42,9 @@ enum {
 	SUBPAGE_INDEX_CGROUP,		/* reuse page->private */
 	SUBPAGE_INDEX_CGROUP_RSVD,	/* reuse page->private */
 	__MAX_CGROUP_SUBPAGE_INDEX = SUBPAGE_INDEX_CGROUP_RSVD,
+#endif
+#ifdef CONFIG_MEMORY_FAILURE
+	SUBPAGE_INDEX_HWPOISON,
 #endif
 	__NR_USED_SUBPAGE,
 };
@@ -551,7 +554,7 @@ generic_hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
  *	Synchronization:  Initially set after new page allocation with no
  *	locking.  When examined and modified during migration processing
  *	(isolate, migrate, putback) the hugetlb_lock is held.
- * HPG_temporary - - Set on a page that is temporarily allocated from the buddy
+ * HPG_temporary - Set on a page that is temporarily allocated from the buddy
  *	allocator.  Typically used for migration target pages when no pages
  *	are available in the pool.  The hugetlb free page path will
  *	immediately free pages with this flag set to the buddy allocator.
@@ -561,6 +564,8 @@ generic_hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
  * HPG_freed - Set when page is on the free lists.
  *	Synchronization: hugetlb_lock held for examination and modification.
  * HPG_vmemmap_optimized - Set when the vmemmap pages of the page are freed.
+ * HPG_raw_hwp_unreliable - Set when the hugetlb page has a hwpoison sub-page
+ *     that is not tracked by raw_hwp_page list.
  */
 enum hugetlb_page_flags {
 	HPG_restore_reserve = 0,
@@ -568,6 +573,7 @@ enum hugetlb_page_flags {
 	HPG_temporary,
 	HPG_freed,
 	HPG_vmemmap_optimized,
+	HPG_raw_hwp_unreliable,
 	__NR_HPAGEFLAGS,
 };
 
@@ -614,6 +620,7 @@ HPAGEFLAG(Migratable, migratable)
 HPAGEFLAG(Temporary, temporary)
 HPAGEFLAG(Freed, freed)
 HPAGEFLAG(VmemmapOptimized, vmemmap_optimized)
+HPAGEFLAG(RawHwpUnreliable, raw_hwp_unreliable)
 
 #ifdef CONFIG_HUGETLB_PAGE
 
@@ -796,6 +803,14 @@ extern int dissolve_free_huge_page(struct page *page);
 extern int dissolve_free_huge_pages(unsigned long start_pfn,
 				    unsigned long end_pfn);
 
+#ifdef CONFIG_MEMORY_FAILURE
+extern void hugetlb_clear_page_hwpoison(struct page *hpage);
+#else
+static inline void hugetlb_clear_page_hwpoison(struct page *hpage)
+{
+}
+#endif
+
 #ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
 #ifndef arch_hugetlb_migration_supported
 static inline bool arch_hugetlb_migration_supported(struct hstate *h)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 77119d93a0f9..e590632bc70f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1535,6 +1535,13 @@ static void __update_and_free_page(struct hstate *h, struct page *page)
 	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
 		return;
 
+	/*
+	 * If we don't know which subpages are hwpoisoned, we can't free
+	 * the hugepage, so it's leaked intentionally.
+	 */
+	if (HPageRawHwpUnreliable(page))
+		return;
+
 	if (hugetlb_vmemmap_restore(h, page)) {
 		spin_lock_irq(&hugetlb_lock);
 		/*
@@ -1547,6 +1554,13 @@ static void __update_and_free_page(struct hstate *h, struct page *page)
 		return;
 	}
 
+	/*
+	 * Move PageHWPoison flag from head page to the raw error pages,
+	 * which makes any healthy subpages reusable.
+	 */
+	if (unlikely(PageHWPoison(page)))
+		hugetlb_clear_page_hwpoison(page);
+
 	for (i = 0; i < pages_per_huge_page(h);
 	     i++, subpage = mem_map_next(subpage, page, i)) {
 		subpage->flags &= ~(1 << PG_locked | 1 << PG_error |
@@ -2109,15 +2123,6 @@ int dissolve_free_huge_page(struct page *page)
 		 */
 		rc = hugetlb_vmemmap_restore(h, head);
 		if (!rc) {
-			/*
-			 * Move PageHWPoison flag from head page to the raw
-			 * error page, which makes any subpages rather than
-			 * the error page reusable.
-			 */
-			if (PageHWPoison(head) && page != head) {
-				SetPageHWPoison(page);
-				ClearPageHWPoison(head);
-			}
 			update_and_free_page(h, head, false);
 		} else {
 			spin_lock_irq(&hugetlb_lock);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index c9931c676335..fa29849769ed 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1664,6 +1664,90 @@ int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
 EXPORT_SYMBOL_GPL(mf_dax_kill_procs);
 #endif /* CONFIG_FS_DAX */
 
+#ifdef CONFIG_HUGETLB_PAGE
+/*
+ * Struct raw_hwp_page represents information about "raw error page",
+ * constructing singly linked list originated from ->private field of
+ * SUBPAGE_INDEX_HWPOISON-th tail page.
+ */
+struct raw_hwp_page {
+	struct llist_node node;
+	struct page *page;
+};
+
+static inline struct llist_head *raw_hwp_list_head(struct page *hpage)
+{
+	return (struct llist_head *)&page_private(hpage + SUBPAGE_INDEX_HWPOISON);
+}
+
+static void __free_raw_hwp_pages(struct page *hpage)
+{
+	struct llist_head *head;
+	struct llist_node *t, *tnode;
+
+	head = raw_hwp_list_head(hpage);
+	llist_for_each_safe(tnode, t, head->first) {
+		struct raw_hwp_page *p = container_of(tnode, struct raw_hwp_page, node);
+
+		SetPageHWPoison(p->page);
+		kfree(p);
+	}
+	llist_del_all(head);
+}
+
+static int hugetlb_set_page_hwpoison(struct page *hpage, struct page *page)
+{
+	struct llist_head *head;
+	struct raw_hwp_page *raw_hwp;
+	struct llist_node *t, *tnode;
+	int ret = TestSetPageHWPoison(hpage) ? -EHWPOISON : 0;
+
+	/*
+	 * Once the hwpoison hugepage has lost reliable raw error info,
+	 * there is little meaning to keep additional error info precisely,
+	 * so skip to add additional raw error info.
+	 */
+	if (HPageRawHwpUnreliable(hpage))
+		return -EHWPOISON;
+	head = raw_hwp_list_head(hpage);
+	llist_for_each_safe(tnode, t, head->first) {
+		struct raw_hwp_page *p = container_of(tnode, struct raw_hwp_page, node);
+
+		if (p->page == page)
+			return -EHWPOISON;
+	}
+
+	raw_hwp = kmalloc(sizeof(struct raw_hwp_page), GFP_ATOMIC);
+	if (raw_hwp) {
+		raw_hwp->page = page;
+		llist_add(&raw_hwp->node, head);
+		/* the first error event will be counted in action_result(). */
+		if (ret)
+			num_poisoned_pages_inc();
+	} else {
+		/*
+		 * Failed to save raw error info.  We no longer trace all
+		 * hwpoisoned subpages, and we need refuse to free/dissolve
+		 * this hwpoisoned hugepage.
+		 */
+		SetHPageRawHwpUnreliable(hpage);
+		/*
+		 * Once HPageRawHwpUnreliable is set, raw_hwp_page is not
+		 * used any more, so free it.
+		 */
+		__free_raw_hwp_pages(hpage);
+	}
+	return ret;
+}
+
+void hugetlb_clear_page_hwpoison(struct page *hpage)
+{
+	if (HPageRawHwpUnreliable(hpage))
+		return;
+	ClearPageHWPoison(hpage);
+	__free_raw_hwp_pages(hpage);
+}
+
 /*
  * Called from hugetlb code with hugetlb_lock held.
  *
@@ -1698,7 +1782,7 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
 		goto out;
 	}
 
-	if (TestSetPageHWPoison(head)) {
+	if (hugetlb_set_page_hwpoison(head, page)) {
 		ret = -EHWPOISON;
 		goto out;
 	}
@@ -1710,7 +1794,6 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
 	return ret;
 }
 
-#ifdef CONFIG_HUGETLB_PAGE
 /*
  * Taking refcount of hugetlb pages needs extra care about race conditions
  * with basic operations like hugepage allocation/free/demotion.
@@ -1751,7 +1834,7 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
 	lock_page(head);
 
 	if (hwpoison_filter(p)) {
-		ClearPageHWPoison(head);
+		hugetlb_clear_page_hwpoison(head);
 		res = -EOPNOTSUPP;
 		goto out;
 	}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [mm-unstable PATCH v7 4/8] mm, hwpoison: make unpoison aware of raw error info in hwpoisoned hugepage
  2022-07-14  4:24 [mm-unstable PATCH v7 0/8] mm, hwpoison: enable 1GB hugepage support (v7) Naoya Horiguchi
                   ` (2 preceding siblings ...)
  2022-07-14  4:24 ` [mm-unstable PATCH v7 3/8] mm, hwpoison, hugetlb: support saving mechanism of raw error pages Naoya Horiguchi
@ 2022-07-14  4:24 ` Naoya Horiguchi
  2022-07-14  4:24 ` [mm-unstable PATCH v7 5/8] mm, hwpoison: set PG_hwpoison for busy hugetlb pages Naoya Horiguchi
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Naoya Horiguchi @ 2022-07-14  4:24 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, David Hildenbrand, Mike Kravetz, Miaohe Lin,
	Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	Naoya Horiguchi, linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

Raw error info list needs to be removed when hwpoisoned hugetlb is
unpoisoned.  And unpoison handler needs to know how many errors there
are in the target hugepage. So add them.

HPageVmemmapOptimized(hpage) and HPageRawHwpUnreliable(hpage)) sometimes
can't be unpoisoned, so skip them.

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
---
v6 -> v7:
- free_raw_hwp_pages() checks HPageVmemmapOptimized(hpage) only when
  move_flag is true so that unpoison works for HPageVmemmapOptimized pages.

v5 -> v6:
- set type of return value of hugetlb_clear_page_hwpoison() to void
- change type of return value of hugetlb_clear_page_hwpoison() to unsigned long

v4 -> v5:
- fix type of return value of free_raw_hwp_pages()
  (found by kernel test robot),
- prevent unpoison for HPageVmemmapOptimized and HPageRawHwpUnreliable.
---
 include/linux/swapops.h |  9 +++++++
 mm/memory-failure.c     | 52 +++++++++++++++++++++++++++++++++++++----
 2 files changed, 56 insertions(+), 5 deletions(-)

diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index a01aeb3fcc0b..ddc98f96ad2c 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -498,6 +498,11 @@ static inline void num_poisoned_pages_dec(void)
 	atomic_long_dec(&num_poisoned_pages);
 }
 
+static inline void num_poisoned_pages_sub(long i)
+{
+	atomic_long_sub(i, &num_poisoned_pages);
+}
+
 #else
 
 static inline swp_entry_t make_hwpoison_entry(struct page *page)
@@ -518,6 +523,10 @@ static inline struct page *hwpoison_entry_to_page(swp_entry_t entry)
 static inline void num_poisoned_pages_inc(void)
 {
 }
+
+static inline void num_poisoned_pages_sub(long i)
+{
+}
 #endif
 
 static inline int non_swap_entry(swp_entry_t entry)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index fa29849769ed..8b9c0d228549 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1680,19 +1680,23 @@ static inline struct llist_head *raw_hwp_list_head(struct page *hpage)
 	return (struct llist_head *)&page_private(hpage + SUBPAGE_INDEX_HWPOISON);
 }
 
-static void __free_raw_hwp_pages(struct page *hpage)
+static unsigned long __free_raw_hwp_pages(struct page *hpage, bool move_flag)
 {
 	struct llist_head *head;
 	struct llist_node *t, *tnode;
+	unsigned long count = 0;
 
 	head = raw_hwp_list_head(hpage);
 	llist_for_each_safe(tnode, t, head->first) {
 		struct raw_hwp_page *p = container_of(tnode, struct raw_hwp_page, node);
 
-		SetPageHWPoison(p->page);
+		if (move_flag)
+			SetPageHWPoison(p->page);
 		kfree(p);
+		count++;
 	}
 	llist_del_all(head);
+	return count;
 }
 
 static int hugetlb_set_page_hwpoison(struct page *hpage, struct page *page)
@@ -1735,17 +1739,36 @@ static int hugetlb_set_page_hwpoison(struct page *hpage, struct page *page)
 		 * Once HPageRawHwpUnreliable is set, raw_hwp_page is not
 		 * used any more, so free it.
 		 */
-		__free_raw_hwp_pages(hpage);
+		__free_raw_hwp_pages(hpage, false);
 	}
 	return ret;
 }
 
+static unsigned long free_raw_hwp_pages(struct page *hpage, bool move_flag)
+{
+	/*
+	 * HPageVmemmapOptimized hugepages can't be freed because struct
+	 * pages for tail pages are required but they don't exist.
+	 */
+	if (move_flag && HPageVmemmapOptimized(hpage))
+		return 0;
+
+	/*
+	 * HPageRawHwpUnreliable hugepages shouldn't be unpoisoned by
+	 * definition.
+	 */
+	if (HPageRawHwpUnreliable(hpage))
+		return 0;
+
+	return __free_raw_hwp_pages(hpage, move_flag);
+}
+
 void hugetlb_clear_page_hwpoison(struct page *hpage)
 {
 	if (HPageRawHwpUnreliable(hpage))
 		return;
 	ClearPageHWPoison(hpage);
-	__free_raw_hwp_pages(hpage);
+	free_raw_hwp_pages(hpage, true);
 }
 
 /*
@@ -1889,6 +1912,10 @@ static inline int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *
 	return 0;
 }
 
+static inline unsigned long free_raw_hwp_pages(struct page *hpage, bool flag)
+{
+	return 0;
+}
 #endif	/* CONFIG_HUGETLB_PAGE */
 
 static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
@@ -2294,6 +2321,7 @@ int unpoison_memory(unsigned long pfn)
 	struct page *p;
 	int ret = -EBUSY;
 	int freeit = 0;
+	unsigned long count = 1;
 	static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
 					DEFAULT_RATELIMIT_BURST);
 
@@ -2341,6 +2369,13 @@ int unpoison_memory(unsigned long pfn)
 
 	ret = get_hwpoison_page(p, MF_UNPOISON);
 	if (!ret) {
+		if (PageHuge(p)) {
+			count = free_raw_hwp_pages(page, false);
+			if (count == 0) {
+				ret = -EBUSY;
+				goto unlock_mutex;
+			}
+		}
 		ret = TestClearPageHWPoison(page) ? 0 : -EBUSY;
 	} else if (ret < 0) {
 		if (ret == -EHWPOISON) {
@@ -2349,6 +2384,13 @@ int unpoison_memory(unsigned long pfn)
 			unpoison_pr_info("Unpoison: failed to grab page %#lx\n",
 					 pfn, &unpoison_rs);
 	} else {
+		if (PageHuge(p)) {
+			count = free_raw_hwp_pages(page, false);
+			if (count == 0) {
+				ret = -EBUSY;
+				goto unlock_mutex;
+			}
+		}
 		freeit = !!TestClearPageHWPoison(p);
 
 		put_page(page);
@@ -2361,7 +2403,7 @@ int unpoison_memory(unsigned long pfn)
 unlock_mutex:
 	mutex_unlock(&mf_mutex);
 	if (!ret || freeit) {
-		num_poisoned_pages_dec();
+		num_poisoned_pages_sub(count);
 		unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
 				 page_to_pfn(p), &unpoison_rs);
 	}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [mm-unstable PATCH v7 5/8] mm, hwpoison: set PG_hwpoison for busy hugetlb pages
  2022-07-14  4:24 [mm-unstable PATCH v7 0/8] mm, hwpoison: enable 1GB hugepage support (v7) Naoya Horiguchi
                   ` (3 preceding siblings ...)
  2022-07-14  4:24 ` [mm-unstable PATCH v7 4/8] mm, hwpoison: make unpoison aware of raw error info in hwpoisoned hugepage Naoya Horiguchi
@ 2022-07-14  4:24 ` Naoya Horiguchi
  2022-07-14  4:24 ` [mm-unstable PATCH v7 6/8] mm, hwpoison: make __page_handle_poison returns int Naoya Horiguchi
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Naoya Horiguchi @ 2022-07-14  4:24 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, David Hildenbrand, Mike Kravetz, Miaohe Lin,
	Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	Naoya Horiguchi, linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

If memory_failure() fails to grab page refcount on a hugetlb page
because it's busy, it returns without setting PG_hwpoison on it.
This not only loses a chance of error containment, but breaks the rule
that action_result() should be called only when memory_failure() do
any of handling work (even if that's just setting PG_hwpoison).
This inconsistency could harm code maintainability.

So set PG_hwpoison and call hugetlb_set_page_hwpoison() for such a case.

Fixes: 405ce051236c ("mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
---
 include/linux/mm.h  | 1 +
 mm/memory-failure.c | 8 ++++----
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 4287bec50c28..7668831c919f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3188,6 +3188,7 @@ enum mf_flags {
 	MF_SOFT_OFFLINE = 1 << 3,
 	MF_UNPOISON = 1 << 4,
 	MF_SW_SIMULATED = 1 << 5,
+	MF_NO_RETRY = 1 << 6,
 };
 int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
 		      unsigned long count, int mf_flags);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 8b9c0d228549..f15d521c3f1f 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1802,7 +1802,8 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
 			count_increased = true;
 	} else {
 		ret = -EBUSY;
-		goto out;
+		if (!(flags & MF_NO_RETRY))
+			goto out;
 	}
 
 	if (hugetlb_set_page_hwpoison(head, page)) {
@@ -1829,7 +1830,6 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
 	struct page *p = pfn_to_page(pfn);
 	struct page *head;
 	unsigned long page_flags;
-	bool retry = true;
 
 	*hugetlb = 1;
 retry:
@@ -1845,8 +1845,8 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
 		}
 		return res;
 	} else if (res == -EBUSY) {
-		if (retry) {
-			retry = false;
+		if (!(flags & MF_NO_RETRY)) {
+			flags |= MF_NO_RETRY;
 			goto retry;
 		}
 		action_result(pfn, MF_MSG_UNKNOWN, MF_IGNORED);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [mm-unstable PATCH v7 6/8] mm, hwpoison: make __page_handle_poison returns int
  2022-07-14  4:24 [mm-unstable PATCH v7 0/8] mm, hwpoison: enable 1GB hugepage support (v7) Naoya Horiguchi
                   ` (4 preceding siblings ...)
  2022-07-14  4:24 ` [mm-unstable PATCH v7 5/8] mm, hwpoison: set PG_hwpoison for busy hugetlb pages Naoya Horiguchi
@ 2022-07-14  4:24 ` Naoya Horiguchi
  2022-07-14  4:24 ` [mm-unstable PATCH v7 7/8] mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage Naoya Horiguchi
  2022-07-14  4:24 ` [mm-unstable PATCH v7 8/8] mm, hwpoison: enable memory error handling on " Naoya Horiguchi
  7 siblings, 0 replies; 17+ messages in thread
From: Naoya Horiguchi @ 2022-07-14  4:24 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, David Hildenbrand, Mike Kravetz, Miaohe Lin,
	Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	Naoya Horiguchi, linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

__page_handle_poison() returns bool that shows whether
take_page_off_buddy() has passed or not now.  But we will want to
distinguish another case of "dissolve has passed but taking off failed"
by its return value. So change the type of the return value.
No functional change.

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
---
v2 -> v3:
- move deleting "res = MF_FAILED" to the later patch. (by Miaohe)
---
 mm/memory-failure.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index f15d521c3f1f..c8fa3643791c 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -71,7 +71,13 @@ atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
 
 static bool hw_memory_failure __read_mostly = false;
 
-static bool __page_handle_poison(struct page *page)
+/*
+ * Return values:
+ *   1:   the page is dissolved (if needed) and taken off from buddy,
+ *   0:   the page is dissolved (if needed) and not taken off from buddy,
+ *   < 0: failed to dissolve.
+ */
+static int __page_handle_poison(struct page *page)
 {
 	int ret;
 
@@ -81,7 +87,7 @@ static bool __page_handle_poison(struct page *page)
 		ret = take_page_off_buddy(page);
 	zone_pcp_enable(page_zone(page));
 
-	return ret > 0;
+	return ret;
 }
 
 static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release)
@@ -91,7 +97,7 @@ static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, boo
 		 * Doing this check for free pages is also fine since dissolve_free_huge_page
 		 * returns 0 for non-hugetlb pages as well.
 		 */
-		if (!__page_handle_poison(page))
+		if (__page_handle_poison(page) <= 0)
 			/*
 			 * We could fail to take off the target page from buddy
 			 * for example due to racy page allocation, but that's
@@ -1086,7 +1092,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
 		 * subpages.
 		 */
 		put_page(hpage);
-		if (__page_handle_poison(p)) {
+		if (__page_handle_poison(p) > 0) {
 			page_ref_inc(p);
 			res = MF_RECOVERED;
 		}
@@ -1869,7 +1875,7 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
 	if (res == 0) {
 		unlock_page(head);
 		res = MF_FAILED;
-		if (__page_handle_poison(p)) {
+		if (__page_handle_poison(p) > 0) {
 			page_ref_inc(p);
 			res = MF_RECOVERED;
 		}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [mm-unstable PATCH v7 7/8] mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage
  2022-07-14  4:24 [mm-unstable PATCH v7 0/8] mm, hwpoison: enable 1GB hugepage support (v7) Naoya Horiguchi
                   ` (5 preceding siblings ...)
  2022-07-14  4:24 ` [mm-unstable PATCH v7 6/8] mm, hwpoison: make __page_handle_poison returns int Naoya Horiguchi
@ 2022-07-14  4:24 ` Naoya Horiguchi
  2022-07-14  4:24 ` [mm-unstable PATCH v7 8/8] mm, hwpoison: enable memory error handling on " Naoya Horiguchi
  7 siblings, 0 replies; 17+ messages in thread
From: Naoya Horiguchi @ 2022-07-14  4:24 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, David Hildenbrand, Mike Kravetz, Miaohe Lin,
	Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	Naoya Horiguchi, linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

Currently if memory_failure() (modified to remove blocking code with
subsequent patch) is called on a page in some 1GB hugepage, memory error
handling fails and the raw error page gets into leaked state.  The impact
is small in production systems (just leaked single 4kB page), but this
limits the testability because unpoison doesn't work for it.
We can no longer create 1GB hugepage on the 1GB physical address range
with such leaked pages, that's not useful when testing on small systems.

When a hwpoison page in a 1GB hugepage is handled, it's caught by the
PageHWPoison check in free_pages_prepare() because the 1GB hugepage is
broken down into raw error pages before coming to this point:

        if (unlikely(PageHWPoison(page)) && !order) {
                ...
                return false;
        }

Then, the page is not sent to buddy and the page refcount is left 0.

Originally this check is supposed to work when the error page is freed from
page_handle_poison() (that is called from soft-offline), but now we are
opening another path to call it, so the callers of __page_handle_poison()
need to handle the case by considering the return value 0 as success. Then
page refcount for hwpoison is properly incremented so unpoison works.

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
---
v2 -> v3:
- remove "res = MF_FAILED" in try_memory_failure_hugetlb (by Miaohe)
---
 mm/memory-failure.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index c8fa3643791c..3721de624b98 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1084,7 +1084,6 @@ static int me_huge_page(struct page_state *ps, struct page *p)
 		res = truncate_error_page(hpage, page_to_pfn(p), mapping);
 		unlock_page(hpage);
 	} else {
-		res = MF_FAILED;
 		unlock_page(hpage);
 		/*
 		 * migration entry prevents later access on error hugepage,
@@ -1092,9 +1091,11 @@ static int me_huge_page(struct page_state *ps, struct page *p)
 		 * subpages.
 		 */
 		put_page(hpage);
-		if (__page_handle_poison(p) > 0) {
+		if (__page_handle_poison(p) >= 0) {
 			page_ref_inc(p);
 			res = MF_RECOVERED;
+		} else {
+			res = MF_FAILED;
 		}
 	}
 
@@ -1874,10 +1875,11 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
 	 */
 	if (res == 0) {
 		unlock_page(head);
-		res = MF_FAILED;
-		if (__page_handle_poison(p) > 0) {
+		if (__page_handle_poison(p) >= 0) {
 			page_ref_inc(p);
 			res = MF_RECOVERED;
+		} else {
+			res = MF_FAILED;
 		}
 		action_result(pfn, MF_MSG_FREE_HUGE, res);
 		return res == MF_RECOVERED ? 0 : -EBUSY;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [mm-unstable PATCH v7 8/8] mm, hwpoison: enable memory error handling on 1GB hugepage
  2022-07-14  4:24 [mm-unstable PATCH v7 0/8] mm, hwpoison: enable 1GB hugepage support (v7) Naoya Horiguchi
                   ` (6 preceding siblings ...)
  2022-07-14  4:24 ` [mm-unstable PATCH v7 7/8] mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage Naoya Horiguchi
@ 2022-07-14  4:24 ` Naoya Horiguchi
  7 siblings, 0 replies; 17+ messages in thread
From: Naoya Horiguchi @ 2022-07-14  4:24 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, David Hildenbrand, Mike Kravetz, Miaohe Lin,
	Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	Naoya Horiguchi, linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

Now error handling code is prepared, so remove the blocking code and
enable memory error handling on 1GB hugepage.

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
---
 include/linux/mm.h      |  1 -
 include/ras/ras_event.h |  1 -
 mm/memory-failure.c     | 16 ----------------
 3 files changed, 18 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7668831c919f..b0e83835184e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3241,7 +3241,6 @@ enum mf_action_page_type {
 	MF_MSG_DIFFERENT_COMPOUND,
 	MF_MSG_HUGE,
 	MF_MSG_FREE_HUGE,
-	MF_MSG_NON_PMD_HUGE,
 	MF_MSG_UNMAP_FAILED,
 	MF_MSG_DIRTY_SWAPCACHE,
 	MF_MSG_CLEAN_SWAPCACHE,
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index d0337a41141c..cbd3ddd7c33d 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -360,7 +360,6 @@ TRACE_EVENT(aer_event,
 	EM ( MF_MSG_DIFFERENT_COMPOUND, "different compound page after locking" ) \
 	EM ( MF_MSG_HUGE, "huge page" )					\
 	EM ( MF_MSG_FREE_HUGE, "free huge page" )			\
-	EM ( MF_MSG_NON_PMD_HUGE, "non-pmd-sized huge page" )		\
 	EM ( MF_MSG_UNMAP_FAILED, "unmapping failed page" )		\
 	EM ( MF_MSG_DIRTY_SWAPCACHE, "dirty swapcache page" )		\
 	EM ( MF_MSG_CLEAN_SWAPCACHE, "clean swapcache page" )		\
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 3721de624b98..d86b5acd5754 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -765,7 +765,6 @@ static const char * const action_page_types[] = {
 	[MF_MSG_DIFFERENT_COMPOUND]	= "different compound page after locking",
 	[MF_MSG_HUGE]			= "huge page",
 	[MF_MSG_FREE_HUGE]		= "free huge page",
-	[MF_MSG_NON_PMD_HUGE]		= "non-pmd-sized huge page",
 	[MF_MSG_UNMAP_FAILED]		= "unmapping failed page",
 	[MF_MSG_DIRTY_SWAPCACHE]	= "dirty swapcache page",
 	[MF_MSG_CLEAN_SWAPCACHE]	= "clean swapcache page",
@@ -1887,21 +1886,6 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
 
 	page_flags = head->flags;
 
-	/*
-	 * TODO: hwpoison for pud-sized hugetlb doesn't work right now, so
-	 * simply disable it. In order to make it work properly, we need
-	 * make sure that:
-	 *  - conversion of a pud that maps an error hugetlb into hwpoison
-	 *    entry properly works, and
-	 *  - other mm code walking over page table is aware of pud-aligned
-	 *    hwpoison entries.
-	 */
-	if (huge_page_size(page_hstate(head)) > PMD_SIZE) {
-		action_result(pfn, MF_MSG_NON_PMD_HUGE, MF_IGNORED);
-		res = -EBUSY;
-		goto out;
-	}
-
 	if (!hwpoison_user_mappings(p, pfn, flags, head)) {
 		action_result(pfn, MF_MSG_UNMAP_FAILED, MF_IGNORED);
 		res = -EBUSY;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [mm-unstable PATCH v7 2/8] mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry
  2022-07-14  4:24 ` [mm-unstable PATCH v7 2/8] mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry Naoya Horiguchi
@ 2022-11-02 20:51     ` Ville Syrjälä
  0 siblings, 0 replies; 17+ messages in thread
From: Ville Syrjälä @ 2022-11-02 20:51 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andrew Morton, David Hildenbrand, Mike Kravetz,
	Miaohe Lin, Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	Naoya Horiguchi, linux-kernel, intel-gfx, regressions

On Thu, Jul 14, 2022 at 01:24:14PM +0900, Naoya Horiguchi wrote:
> +/*
> + * pud_huge() returns 1 if @pud is hugetlb related entry, that is normal
> + * hugetlb entry or non-present (migration or hwpoisoned) hugetlb entry.
> + * Otherwise, returns 0.
> + */
>  int pud_huge(pud_t pud)
>  {
> -	return !!(pud_val(pud) & _PAGE_PSE);
> +	return !pud_none(pud) &&
> +		(pud_val(pud) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
>  }

Hi,

This causes i915 to trip a BUG_ON() on x86-32 when I start X.

[  225.777375] kernel BUG at mm/memory.c:2664!
[  225.777391] invalid opcode: 0000 [#1] PREEMPT SMP
[  225.777405] CPU: 0 PID: 2402 Comm: Xorg Not tainted 6.1.0-rc3-bdg+ #86
[  225.777415] Hardware name:  /8I865G775-G, BIOS F1 08/29/2006
[  225.777421] EIP: __apply_to_page_range+0x24d/0x31c
[  225.777437] Code: ff ff 8b 55 e8 8b 45 cc e8 0a 11 ec ff 89 d8 83 c4 28 5b 5e 5f 5d c3 81 7d e0 a0 ef 96 c1 74 ad 8b 45 d0 e8 2d 83 49 00 eb a3 <0f> 0b 25 00 f0 ff ff 81 eb 00 00 00 40 01 c3 8b 45 ec 8b 00 e8 76
[  225.777446] EAX: 00000001 EBX: c53a3b58 ECX: b5c00000 EDX: c258aa00
[  225.777454] ESI: b5c00000 EDI: b5900000 EBP: c4b0fdb4 ESP: c4b0fd80
[  225.777462] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010202
[  225.777470] CR0: 80050033 CR2: b5900000 CR3: 053a3000 CR4: 000006d0
[  225.777479] Call Trace:
[  225.777486]  ? i915_memcpy_init_early+0x63/0x63 [i915]
[  225.777684]  apply_to_page_range+0x21/0x27
[  225.777694]  ? i915_memcpy_init_early+0x63/0x63 [i915]
[  225.777870]  remap_io_mapping+0x49/0x75 [i915]
[  225.778046]  ? i915_memcpy_init_early+0x63/0x63 [i915]
[  225.778220]  ? mutex_unlock+0xb/0xd
[  225.778231]  ? i915_vma_pin_fence+0x6d/0xf7 [i915]
[  225.778420]  vm_fault_gtt+0x2a9/0x8f1 [i915]
[  225.778644]  ? lock_is_held_type+0x56/0xe7
[  225.778655]  ? lock_is_held_type+0x7a/0xe7
[  225.778663]  ? 0xc1000000
[  225.778670]  __do_fault+0x21/0x6a
[  225.778679]  handle_mm_fault+0x708/0xb21
[  225.778686]  ? mt_find+0x21e/0x5ae
[  225.778696]  exc_page_fault+0x185/0x705
[  225.778704]  ? doublefault_shim+0x127/0x127
[  225.778715]  handle_exception+0x130/0x130
[  225.778723] EIP: 0xb700468a
[  225.778730] Code: 44 24 40 8b 7c 24 1c 89 47 54 8b 44 24 5c 65 2b 05 14 00 00 00 0f 85 8a 01 00 00 83 c4 6c 5b 5e 5f 5d c3 8b 44 24 1c 8b 40 28 <c7> 00 00 00 00 00 8b 44 24 20 8d 90 20 1b 00 00 8b 02 83 e8 01 89
[  225.778738] EAX: b5900000 EBX: b7148000 ECX: 00000000 EDX: 00000000
[  225.778745] ESI: 0103eb60 EDI: b7148000 EBP: b6cf7000 ESP: bfd76650
[  225.778752] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00010246
[  225.778761]  ? doublefault_shim+0x127/0x127
[  225.778769] Modules linked in: i915 prime_numbers i2c_algo_bit iosf_mbi drm_buddy video wmi drm_display_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm drm_panel_orientation_quirks backlight cfg80211 rfkill sch_fq_codel xt_tcpudp xt_multiport xt_state iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv4 ip_tables x_tables binfmt_misc i2c_dev iTCO_wdt snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_timer psmouse i2c_i801 snd i2c_smbus uhci_hcd i2c_core pcspkr soundcore lpc_ich mfd_core ehci_pci ehci_hcd skge intel_agp intel_gtt usbcore agpgart usb_common rng_core parport_pc parport evdev
[  225.778899] ---[ end trace 0000000000000000 ]---
[  225.778906] EIP: __apply_to_page_range+0x24d/0x31c
[  225.778916] Code: ff ff 8b 55 e8 8b 45 cc e8 0a 11 ec ff 89 d8 83 c4 28 5b 5e 5f 5d c3 81 7d e0 a0 ef 96 c1 74 ad 8b 45 d0 e8 2d 83 49 00 eb a3 <0f> 0b 25 00 f0 ff ff 81 eb 00 00 00 40 01 c3 8b 45 ec 8b 00 e8 76
[  225.778924] EAX: 00000001 EBX: c53a3b58 ECX: b5c00000 EDX: c258aa00
[  225.778931] ESI: b5c00000 EDI: b5900000 EBP: c4b0fdb4 ESP: c4b0fd80
[  225.778938] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010202
[  225.778946] CR0: 80050033 CR2: b5900000 CR3: 053a3000 CR4: 000006d0

-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Intel-gfx] [mm-unstable PATCH v7 2/8] mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry
@ 2022-11-02 20:51     ` Ville Syrjälä
  0 siblings, 0 replies; 17+ messages in thread
From: Ville Syrjälä @ 2022-11-02 20:51 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Miaohe Lin, regressions, David Hildenbrand, Yang Shi,
	Naoya Horiguchi, linux-kernel, Liu Shixin, linux-mm, Muchun Song,
	Andrew Morton, Oscar Salvador, intel-gfx, Mike Kravetz

On Thu, Jul 14, 2022 at 01:24:14PM +0900, Naoya Horiguchi wrote:
> +/*
> + * pud_huge() returns 1 if @pud is hugetlb related entry, that is normal
> + * hugetlb entry or non-present (migration or hwpoisoned) hugetlb entry.
> + * Otherwise, returns 0.
> + */
>  int pud_huge(pud_t pud)
>  {
> -	return !!(pud_val(pud) & _PAGE_PSE);
> +	return !pud_none(pud) &&
> +		(pud_val(pud) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
>  }

Hi,

This causes i915 to trip a BUG_ON() on x86-32 when I start X.

[  225.777375] kernel BUG at mm/memory.c:2664!
[  225.777391] invalid opcode: 0000 [#1] PREEMPT SMP
[  225.777405] CPU: 0 PID: 2402 Comm: Xorg Not tainted 6.1.0-rc3-bdg+ #86
[  225.777415] Hardware name:  /8I865G775-G, BIOS F1 08/29/2006
[  225.777421] EIP: __apply_to_page_range+0x24d/0x31c
[  225.777437] Code: ff ff 8b 55 e8 8b 45 cc e8 0a 11 ec ff 89 d8 83 c4 28 5b 5e 5f 5d c3 81 7d e0 a0 ef 96 c1 74 ad 8b 45 d0 e8 2d 83 49 00 eb a3 <0f> 0b 25 00 f0 ff ff 81 eb 00 00 00 40 01 c3 8b 45 ec 8b 00 e8 76
[  225.777446] EAX: 00000001 EBX: c53a3b58 ECX: b5c00000 EDX: c258aa00
[  225.777454] ESI: b5c00000 EDI: b5900000 EBP: c4b0fdb4 ESP: c4b0fd80
[  225.777462] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010202
[  225.777470] CR0: 80050033 CR2: b5900000 CR3: 053a3000 CR4: 000006d0
[  225.777479] Call Trace:
[  225.777486]  ? i915_memcpy_init_early+0x63/0x63 [i915]
[  225.777684]  apply_to_page_range+0x21/0x27
[  225.777694]  ? i915_memcpy_init_early+0x63/0x63 [i915]
[  225.777870]  remap_io_mapping+0x49/0x75 [i915]
[  225.778046]  ? i915_memcpy_init_early+0x63/0x63 [i915]
[  225.778220]  ? mutex_unlock+0xb/0xd
[  225.778231]  ? i915_vma_pin_fence+0x6d/0xf7 [i915]
[  225.778420]  vm_fault_gtt+0x2a9/0x8f1 [i915]
[  225.778644]  ? lock_is_held_type+0x56/0xe7
[  225.778655]  ? lock_is_held_type+0x7a/0xe7
[  225.778663]  ? 0xc1000000
[  225.778670]  __do_fault+0x21/0x6a
[  225.778679]  handle_mm_fault+0x708/0xb21
[  225.778686]  ? mt_find+0x21e/0x5ae
[  225.778696]  exc_page_fault+0x185/0x705
[  225.778704]  ? doublefault_shim+0x127/0x127
[  225.778715]  handle_exception+0x130/0x130
[  225.778723] EIP: 0xb700468a
[  225.778730] Code: 44 24 40 8b 7c 24 1c 89 47 54 8b 44 24 5c 65 2b 05 14 00 00 00 0f 85 8a 01 00 00 83 c4 6c 5b 5e 5f 5d c3 8b 44 24 1c 8b 40 28 <c7> 00 00 00 00 00 8b 44 24 20 8d 90 20 1b 00 00 8b 02 83 e8 01 89
[  225.778738] EAX: b5900000 EBX: b7148000 ECX: 00000000 EDX: 00000000
[  225.778745] ESI: 0103eb60 EDI: b7148000 EBP: b6cf7000 ESP: bfd76650
[  225.778752] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00010246
[  225.778761]  ? doublefault_shim+0x127/0x127
[  225.778769] Modules linked in: i915 prime_numbers i2c_algo_bit iosf_mbi drm_buddy video wmi drm_display_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm drm_panel_orientation_quirks backlight cfg80211 rfkill sch_fq_codel xt_tcpudp xt_multiport xt_state iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv4 ip_tables x_tables binfmt_misc i2c_dev iTCO_wdt snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_timer psmouse i2c_i801 snd i2c_smbus uhci_hcd i2c_core pcspkr soundcore lpc_ich mfd_core ehci_pci ehci_hcd skge intel_agp intel_gtt usbcore agpgart usb_common rng_core parport_pc parport evdev
[  225.778899] ---[ end trace 0000000000000000 ]---
[  225.778906] EIP: __apply_to_page_range+0x24d/0x31c
[  225.778916] Code: ff ff 8b 55 e8 8b 45 cc e8 0a 11 ec ff 89 d8 83 c4 28 5b 5e 5f 5d c3 81 7d e0 a0 ef 96 c1 74 ad 8b 45 d0 e8 2d 83 49 00 eb a3 <0f> 0b 25 00 f0 ff ff 81 eb 00 00 00 40 01 c3 8b 45 ec 8b 00 e8 76
[  225.778924] EAX: 00000001 EBX: c53a3b58 ECX: b5c00000 EDX: c258aa00
[  225.778931] ESI: b5c00000 EDI: b5900000 EBP: c4b0fdb4 ESP: c4b0fd80
[  225.778938] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010202
[  225.778946] CR0: 80050033 CR2: b5900000 CR3: 053a3000 CR4: 000006d0

-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [mm-unstable PATCH v7 2/8] mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry
  2022-11-02 20:51     ` [Intel-gfx] " Ville Syrjälä
@ 2022-11-04 15:59       ` Naoya Horiguchi
  -1 siblings, 0 replies; 17+ messages in thread
From: Naoya Horiguchi @ 2022-11-04 15:59 UTC (permalink / raw)
  To: ville.syrjala
  Cc: linux-mm, Andrew Morton, David Hildenbrand, Mike Kravetz,
	Miaohe Lin, Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	linux-kernel, intel-gfx, regressions, naoya.horiguchi

On Wed, Nov 02, 2022 at 10:51:40PM +0200, Ville Syrjälä wrote:
> On Thu, Jul 14, 2022 at 01:24:14PM +0900, Naoya Horiguchi wrote:
> > +/*
> > + * pud_huge() returns 1 if @pud is hugetlb related entry, that is normal
> > + * hugetlb entry or non-present (migration or hwpoisoned) hugetlb entry.
> > + * Otherwise, returns 0.
> > + */
> >  int pud_huge(pud_t pud)
> >  {
> > -	return !!(pud_val(pud) & _PAGE_PSE);
> > +	return !pud_none(pud) &&
> > +		(pud_val(pud) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
> >  }
> 
> Hi,
> 
> This causes i915 to trip a BUG_ON() on x86-32 when I start X.

Hello,

Thank you for finding and reporting the issue.

x86-32 does not enable CONFIG_ARCH_HAS_GIGANTIC_PAGE, so pud_huge() is
supposed to be false on x86-32.  Doing like below looks to me a fix
(reverting to the original behavior for x86-32):


diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index 6b3033845c6d..bf73f25aaa32 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -37,8 +37,12 @@ int pmd_huge(pmd_t pmd)
  */
 int pud_huge(pud_t pud)
 {
+#ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
        return !pud_none(pud) &&
                (pud_val(pud) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
+#else
+       return !!(pud_val(pud) & _PAGE_PSE);    // or "return 0;" ?
+#endif
 }

 #ifdef CONFIG_HUGETLB_PAGE


Let me guess what the PUD entry was there when triggering the issue.
Assuming that the original code (before 3a194f3f8ad0) was correct, the PSE
bit in pud_val(pud) should be always cleared.  So, when pud_huge() returns
true since 3a194f3f8ad0, the PRESENT bit should be clear and some other
bits (rather than PRESENT and PSE) are set so that pud_none() is false.
I'm not sure what such a non-present PUD entry does mean.

Thanks,
Naoya Horiguchi

> 
> [  225.777375] kernel BUG at mm/memory.c:2664!
> [  225.777391] invalid opcode: 0000 [#1] PREEMPT SMP
> [  225.777405] CPU: 0 PID: 2402 Comm: Xorg Not tainted 6.1.0-rc3-bdg+ #86
> [  225.777415] Hardware name:  /8I865G775-G, BIOS F1 08/29/2006
> [  225.777421] EIP: __apply_to_page_range+0x24d/0x31c
> [  225.777437] Code: ff ff 8b 55 e8 8b 45 cc e8 0a 11 ec ff 89 d8 83 c4 28 5b 5e 5f 5d c3 81 7d e0 a0 ef 96 c1 74 ad 8b 45 d0 e8 2d 83 49 00 eb a3 <0f> 0b 25 00 f0 ff ff 81 eb 00 00 00 40 01 c3 8b 45 ec 8b 00 e8 76
> [  225.777446] EAX: 00000001 EBX: c53a3b58 ECX: b5c00000 EDX: c258aa00
> [  225.777454] ESI: b5c00000 EDI: b5900000 EBP: c4b0fdb4 ESP: c4b0fd80
> [  225.777462] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010202
> [  225.777470] CR0: 80050033 CR2: b5900000 CR3: 053a3000 CR4: 000006d0
> [  225.777479] Call Trace:
> [  225.777486]  ? i915_memcpy_init_early+0x63/0x63 [i915]
> [  225.777684]  apply_to_page_range+0x21/0x27
> [  225.777694]  ? i915_memcpy_init_early+0x63/0x63 [i915]
> [  225.777870]  remap_io_mapping+0x49/0x75 [i915]
> [  225.778046]  ? i915_memcpy_init_early+0x63/0x63 [i915]
> [  225.778220]  ? mutex_unlock+0xb/0xd
> [  225.778231]  ? i915_vma_pin_fence+0x6d/0xf7 [i915]
> [  225.778420]  vm_fault_gtt+0x2a9/0x8f1 [i915]
> [  225.778644]  ? lock_is_held_type+0x56/0xe7
> [  225.778655]  ? lock_is_held_type+0x7a/0xe7
> [  225.778663]  ? 0xc1000000
> [  225.778670]  __do_fault+0x21/0x6a
> [  225.778679]  handle_mm_fault+0x708/0xb21
> [  225.778686]  ? mt_find+0x21e/0x5ae
> [  225.778696]  exc_page_fault+0x185/0x705
> [  225.778704]  ? doublefault_shim+0x127/0x127
> [  225.778715]  handle_exception+0x130/0x130
> [  225.778723] EIP: 0xb700468a
> [  225.778730] Code: 44 24 40 8b 7c 24 1c 89 47 54 8b 44 24 5c 65 2b 05 14 00 00 00 0f 85 8a 01 00 00 83 c4 6c 5b 5e 5f 5d c3 8b 44 24 1c 8b 40 28 <c7> 00 00 00 00 00 8b 44 24 20 8d 90 20 1b 00 00 8b 02 83 e8 01 89
> [  225.778738] EAX: b5900000 EBX: b7148000 ECX: 00000000 EDX: 00000000
> [  225.778745] ESI: 0103eb60 EDI: b7148000 EBP: b6cf7000 ESP: bfd76650
> [  225.778752] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00010246
> [  225.778761]  ? doublefault_shim+0x127/0x127
> [  225.778769] Modules linked in: i915 prime_numbers i2c_algo_bit iosf_mbi drm_buddy video wmi drm_display_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm drm_panel_orientation_quirks backlight cfg80211 rfkill sch_fq_codel xt_tcpudp xt_multiport xt_state iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv4 ip_tables x_tables binfmt_misc i2c_dev iTCO_wdt snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_timer psmouse i2c_i801 snd i2c_smbus uhci_hcd i2c_core pcspkr soundcore lpc_ich mfd_core ehci_pci ehci_hcd skge intel_agp intel_gtt usbcore agpgart usb_common rng_core parport_pc parport evdev
> [  225.778899] ---[ end trace 0000000000000000 ]---
> [  225.778906] EIP: __apply_to_page_range+0x24d/0x31c
> [  225.778916] Code: ff ff 8b 55 e8 8b 45 cc e8 0a 11 ec ff 89 d8 83 c4 28 5b 5e 5f 5d c3 81 7d e0 a0 ef 96 c1 74 ad 8b 45 d0 e8 2d 83 49 00 eb a3 <0f> 0b 25 00 f0 ff ff 81 eb 00 00 00 40 01 c3 8b 45 ec 8b 00 e8 76
> [  225.778924] EAX: 00000001 EBX: c53a3b58 ECX: b5c00000 EDX: c258aa00
> [  225.778931] ESI: b5c00000 EDI: b5900000 EBP: c4b0fdb4 ESP: c4b0fd80
> [  225.778938] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010202
> [  225.778946] CR0: 80050033 CR2: b5900000 CR3: 053a3000 CR4: 000006d0
> 
> -- 
> Ville Syrjälä
> Intel

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [Intel-gfx] [mm-unstable PATCH v7 2/8] mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry
@ 2022-11-04 15:59       ` Naoya Horiguchi
  0 siblings, 0 replies; 17+ messages in thread
From: Naoya Horiguchi @ 2022-11-04 15:59 UTC (permalink / raw)
  To: ville.syrjala
  Cc: Miaohe Lin, regressions, David Hildenbrand, Yang Shi,
	naoya.horiguchi, linux-kernel, Liu Shixin, linux-mm, Muchun Song,
	Andrew Morton, Oscar Salvador, intel-gfx, Mike Kravetz

On Wed, Nov 02, 2022 at 10:51:40PM +0200, Ville Syrjälä wrote:
> On Thu, Jul 14, 2022 at 01:24:14PM +0900, Naoya Horiguchi wrote:
> > +/*
> > + * pud_huge() returns 1 if @pud is hugetlb related entry, that is normal
> > + * hugetlb entry or non-present (migration or hwpoisoned) hugetlb entry.
> > + * Otherwise, returns 0.
> > + */
> >  int pud_huge(pud_t pud)
> >  {
> > -	return !!(pud_val(pud) & _PAGE_PSE);
> > +	return !pud_none(pud) &&
> > +		(pud_val(pud) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
> >  }
> 
> Hi,
> 
> This causes i915 to trip a BUG_ON() on x86-32 when I start X.

Hello,

Thank you for finding and reporting the issue.

x86-32 does not enable CONFIG_ARCH_HAS_GIGANTIC_PAGE, so pud_huge() is
supposed to be false on x86-32.  Doing like below looks to me a fix
(reverting to the original behavior for x86-32):


diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index 6b3033845c6d..bf73f25aaa32 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -37,8 +37,12 @@ int pmd_huge(pmd_t pmd)
  */
 int pud_huge(pud_t pud)
 {
+#ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
        return !pud_none(pud) &&
                (pud_val(pud) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
+#else
+       return !!(pud_val(pud) & _PAGE_PSE);    // or "return 0;" ?
+#endif
 }

 #ifdef CONFIG_HUGETLB_PAGE


Let me guess what the PUD entry was there when triggering the issue.
Assuming that the original code (before 3a194f3f8ad0) was correct, the PSE
bit in pud_val(pud) should be always cleared.  So, when pud_huge() returns
true since 3a194f3f8ad0, the PRESENT bit should be clear and some other
bits (rather than PRESENT and PSE) are set so that pud_none() is false.
I'm not sure what such a non-present PUD entry does mean.

Thanks,
Naoya Horiguchi

> 
> [  225.777375] kernel BUG at mm/memory.c:2664!
> [  225.777391] invalid opcode: 0000 [#1] PREEMPT SMP
> [  225.777405] CPU: 0 PID: 2402 Comm: Xorg Not tainted 6.1.0-rc3-bdg+ #86
> [  225.777415] Hardware name:  /8I865G775-G, BIOS F1 08/29/2006
> [  225.777421] EIP: __apply_to_page_range+0x24d/0x31c
> [  225.777437] Code: ff ff 8b 55 e8 8b 45 cc e8 0a 11 ec ff 89 d8 83 c4 28 5b 5e 5f 5d c3 81 7d e0 a0 ef 96 c1 74 ad 8b 45 d0 e8 2d 83 49 00 eb a3 <0f> 0b 25 00 f0 ff ff 81 eb 00 00 00 40 01 c3 8b 45 ec 8b 00 e8 76
> [  225.777446] EAX: 00000001 EBX: c53a3b58 ECX: b5c00000 EDX: c258aa00
> [  225.777454] ESI: b5c00000 EDI: b5900000 EBP: c4b0fdb4 ESP: c4b0fd80
> [  225.777462] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010202
> [  225.777470] CR0: 80050033 CR2: b5900000 CR3: 053a3000 CR4: 000006d0
> [  225.777479] Call Trace:
> [  225.777486]  ? i915_memcpy_init_early+0x63/0x63 [i915]
> [  225.777684]  apply_to_page_range+0x21/0x27
> [  225.777694]  ? i915_memcpy_init_early+0x63/0x63 [i915]
> [  225.777870]  remap_io_mapping+0x49/0x75 [i915]
> [  225.778046]  ? i915_memcpy_init_early+0x63/0x63 [i915]
> [  225.778220]  ? mutex_unlock+0xb/0xd
> [  225.778231]  ? i915_vma_pin_fence+0x6d/0xf7 [i915]
> [  225.778420]  vm_fault_gtt+0x2a9/0x8f1 [i915]
> [  225.778644]  ? lock_is_held_type+0x56/0xe7
> [  225.778655]  ? lock_is_held_type+0x7a/0xe7
> [  225.778663]  ? 0xc1000000
> [  225.778670]  __do_fault+0x21/0x6a
> [  225.778679]  handle_mm_fault+0x708/0xb21
> [  225.778686]  ? mt_find+0x21e/0x5ae
> [  225.778696]  exc_page_fault+0x185/0x705
> [  225.778704]  ? doublefault_shim+0x127/0x127
> [  225.778715]  handle_exception+0x130/0x130
> [  225.778723] EIP: 0xb700468a
> [  225.778730] Code: 44 24 40 8b 7c 24 1c 89 47 54 8b 44 24 5c 65 2b 05 14 00 00 00 0f 85 8a 01 00 00 83 c4 6c 5b 5e 5f 5d c3 8b 44 24 1c 8b 40 28 <c7> 00 00 00 00 00 8b 44 24 20 8d 90 20 1b 00 00 8b 02 83 e8 01 89
> [  225.778738] EAX: b5900000 EBX: b7148000 ECX: 00000000 EDX: 00000000
> [  225.778745] ESI: 0103eb60 EDI: b7148000 EBP: b6cf7000 ESP: bfd76650
> [  225.778752] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00010246
> [  225.778761]  ? doublefault_shim+0x127/0x127
> [  225.778769] Modules linked in: i915 prime_numbers i2c_algo_bit iosf_mbi drm_buddy video wmi drm_display_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm drm_panel_orientation_quirks backlight cfg80211 rfkill sch_fq_codel xt_tcpudp xt_multiport xt_state iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv4 ip_tables x_tables binfmt_misc i2c_dev iTCO_wdt snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_timer psmouse i2c_i801 snd i2c_smbus uhci_hcd i2c_core pcspkr soundcore lpc_ich mfd_core ehci_pci ehci_hcd skge intel_agp intel_gtt usbcore agpgart usb_common rng_core parport_pc parport evdev
> [  225.778899] ---[ end trace 0000000000000000 ]---
> [  225.778906] EIP: __apply_to_page_range+0x24d/0x31c
> [  225.778916] Code: ff ff 8b 55 e8 8b 45 cc e8 0a 11 ec ff 89 d8 83 c4 28 5b 5e 5f 5d c3 81 7d e0 a0 ef 96 c1 74 ad 8b 45 d0 e8 2d 83 49 00 eb a3 <0f> 0b 25 00 f0 ff ff 81 eb 00 00 00 40 01 c3 8b 45 ec 8b 00 e8 76
> [  225.778924] EAX: 00000001 EBX: c53a3b58 ECX: b5c00000 EDX: c258aa00
> [  225.778931] ESI: b5c00000 EDI: b5900000 EBP: c4b0fdb4 ESP: c4b0fd80
> [  225.778938] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010202
> [  225.778946] CR0: 80050033 CR2: b5900000 CR3: 053a3000 CR4: 000006d0
> 
> -- 
> Ville Syrjälä
> Intel

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [mm-unstable PATCH v7 2/8] mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry
  2022-11-04 15:59       ` [Intel-gfx] " Naoya Horiguchi
@ 2022-11-04 22:23         ` Ville Syrjälä
  -1 siblings, 0 replies; 17+ messages in thread
From: Ville Syrjälä @ 2022-11-04 22:23 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Andrew Morton, David Hildenbrand, Mike Kravetz,
	Miaohe Lin, Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	linux-kernel, intel-gfx, regressions, naoya.horiguchi

On Sat, Nov 05, 2022 at 12:59:30AM +0900, Naoya Horiguchi wrote:
> On Wed, Nov 02, 2022 at 10:51:40PM +0200, Ville Syrjälä wrote:
> > On Thu, Jul 14, 2022 at 01:24:14PM +0900, Naoya Horiguchi wrote:
> > > +/*
> > > + * pud_huge() returns 1 if @pud is hugetlb related entry, that is normal
> > > + * hugetlb entry or non-present (migration or hwpoisoned) hugetlb entry.
> > > + * Otherwise, returns 0.
> > > + */
> > >  int pud_huge(pud_t pud)
> > >  {
> > > -	return !!(pud_val(pud) & _PAGE_PSE);
> > > +	return !pud_none(pud) &&
> > > +		(pud_val(pud) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
> > >  }
> > 
> > Hi,
> > 
> > This causes i915 to trip a BUG_ON() on x86-32 when I start X.
> 
> Hello,
> 
> Thank you for finding and reporting the issue.
> 
> x86-32 does not enable CONFIG_ARCH_HAS_GIGANTIC_PAGE, so pud_huge() is
> supposed to be false on x86-32.  Doing like below looks to me a fix
> (reverting to the original behavior for x86-32):
> diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
> index 6b3033845c6d..bf73f25aaa32 100644
> --- a/arch/x86/mm/hugetlbpage.c
> +++ b/arch/x86/mm/hugetlbpage.c
> @@ -37,8 +37,12 @@ int pmd_huge(pmd_t pmd)
>   */
>  int pud_huge(pud_t pud)
>  {
> +#ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
>         return !pud_none(pud) &&
>                 (pud_val(pud) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
> +#else
> +       return !!(pud_val(pud) & _PAGE_PSE);    // or "return 0;" ?
> +#endif
>  }
> 
>  #ifdef CONFIG_HUGETLB_PAGE
> 
> 
> Let me guess what the PUD entry was there when triggering the issue.
> Assuming that the original code (before 3a194f3f8ad0) was correct, the PSE
> bit in pud_val(pud) should be always cleared.  So, when pud_huge() returns
> true since 3a194f3f8ad0, the PRESENT bit should be clear and some other
> bits (rather than PRESENT and PSE) are set so that pud_none() is false.
> I'm not sure what such a non-present PUD entry does mean.

pud_val()==0 when it blows up, and pud_none() is false because
pgtable-nopmd.h says so with 2 level paging.

And given that I just tested with PAE / 3 level paging, 
and sure enough it no longer blows up.

So looks to me like maybe this new code just doesn't understand
how the levels get folded.

I might also be missing something obvious, but why is it even
necessary to treat PRESENT==0+PSE==0 as a huge entry?

-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Intel-gfx] [mm-unstable PATCH v7 2/8] mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry
@ 2022-11-04 22:23         ` Ville Syrjälä
  0 siblings, 0 replies; 17+ messages in thread
From: Ville Syrjälä @ 2022-11-04 22:23 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Miaohe Lin, regressions, David Hildenbrand, Yang Shi,
	naoya.horiguchi, linux-kernel, Liu Shixin, linux-mm, Muchun Song,
	Andrew Morton, Oscar Salvador, intel-gfx, Mike Kravetz

On Sat, Nov 05, 2022 at 12:59:30AM +0900, Naoya Horiguchi wrote:
> On Wed, Nov 02, 2022 at 10:51:40PM +0200, Ville Syrjälä wrote:
> > On Thu, Jul 14, 2022 at 01:24:14PM +0900, Naoya Horiguchi wrote:
> > > +/*
> > > + * pud_huge() returns 1 if @pud is hugetlb related entry, that is normal
> > > + * hugetlb entry or non-present (migration or hwpoisoned) hugetlb entry.
> > > + * Otherwise, returns 0.
> > > + */
> > >  int pud_huge(pud_t pud)
> > >  {
> > > -	return !!(pud_val(pud) & _PAGE_PSE);
> > > +	return !pud_none(pud) &&
> > > +		(pud_val(pud) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
> > >  }
> > 
> > Hi,
> > 
> > This causes i915 to trip a BUG_ON() on x86-32 when I start X.
> 
> Hello,
> 
> Thank you for finding and reporting the issue.
> 
> x86-32 does not enable CONFIG_ARCH_HAS_GIGANTIC_PAGE, so pud_huge() is
> supposed to be false on x86-32.  Doing like below looks to me a fix
> (reverting to the original behavior for x86-32):
> diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
> index 6b3033845c6d..bf73f25aaa32 100644
> --- a/arch/x86/mm/hugetlbpage.c
> +++ b/arch/x86/mm/hugetlbpage.c
> @@ -37,8 +37,12 @@ int pmd_huge(pmd_t pmd)
>   */
>  int pud_huge(pud_t pud)
>  {
> +#ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
>         return !pud_none(pud) &&
>                 (pud_val(pud) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
> +#else
> +       return !!(pud_val(pud) & _PAGE_PSE);    // or "return 0;" ?
> +#endif
>  }
> 
>  #ifdef CONFIG_HUGETLB_PAGE
> 
> 
> Let me guess what the PUD entry was there when triggering the issue.
> Assuming that the original code (before 3a194f3f8ad0) was correct, the PSE
> bit in pud_val(pud) should be always cleared.  So, when pud_huge() returns
> true since 3a194f3f8ad0, the PRESENT bit should be clear and some other
> bits (rather than PRESENT and PSE) are set so that pud_none() is false.
> I'm not sure what such a non-present PUD entry does mean.

pud_val()==0 when it blows up, and pud_none() is false because
pgtable-nopmd.h says so with 2 level paging.

And given that I just tested with PAE / 3 level paging, 
and sure enough it no longer blows up.

So looks to me like maybe this new code just doesn't understand
how the levels get folded.

I might also be missing something obvious, but why is it even
necessary to treat PRESENT==0+PSE==0 as a huge entry?

-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [mm-unstable PATCH v7 2/8] mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry
  2022-11-04 22:23         ` [Intel-gfx] " Ville Syrjälä
@ 2022-11-06 23:52           ` HORIGUCHI NAOYA(堀口　直也)
  -1 siblings, 0 replies; 17+ messages in thread
From: HORIGUCHI NAOYA(堀口　直也) @ 2022-11-06 23:52 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: Naoya Horiguchi, linux-mm, Andrew Morton, David Hildenbrand,
	Mike Kravetz, Miaohe Lin, Liu Shixin, Yang Shi, Oscar Salvador,
	Muchun Song, linux-kernel, intel-gfx, regressions

On Sat, Nov 05, 2022 at 12:23:40AM +0200, Ville Syrjälä wrote:
> On Sat, Nov 05, 2022 at 12:59:30AM +0900, Naoya Horiguchi wrote:
> > On Wed, Nov 02, 2022 at 10:51:40PM +0200, Ville Syrjälä wrote:
> > > On Thu, Jul 14, 2022 at 01:24:14PM +0900, Naoya Horiguchi wrote:
> > > > +/*
> > > > + * pud_huge() returns 1 if @pud is hugetlb related entry, that is normal
> > > > + * hugetlb entry or non-present (migration or hwpoisoned) hugetlb entry.
> > > > + * Otherwise, returns 0.
> > > > + */
> > > >  int pud_huge(pud_t pud)
> > > >  {
> > > > -	return !!(pud_val(pud) & _PAGE_PSE);
> > > > +	return !pud_none(pud) &&
> > > > +		(pud_val(pud) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
> > > >  }
> > > 
> > > Hi,
> > > 
> > > This causes i915 to trip a BUG_ON() on x86-32 when I start X.
> > 
> > Hello,
> > 
> > Thank you for finding and reporting the issue.
> > 
> > x86-32 does not enable CONFIG_ARCH_HAS_GIGANTIC_PAGE, so pud_huge() is
> > supposed to be false on x86-32.  Doing like below looks to me a fix
> > (reverting to the original behavior for x86-32):
> > diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
> > index 6b3033845c6d..bf73f25aaa32 100644
> > --- a/arch/x86/mm/hugetlbpage.c
> > +++ b/arch/x86/mm/hugetlbpage.c
> > @@ -37,8 +37,12 @@ int pmd_huge(pmd_t pmd)
> >   */
> >  int pud_huge(pud_t pud)
> >  {
> > +#ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
> >         return !pud_none(pud) &&
> >                 (pud_val(pud) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
> > +#else
> > +       return !!(pud_val(pud) & _PAGE_PSE);    // or "return 0;" ?
> > +#endif
> >  }
> > 
> >  #ifdef CONFIG_HUGETLB_PAGE
> > 
> > 
> > Let me guess what the PUD entry was there when triggering the issue.
> > Assuming that the original code (before 3a194f3f8ad0) was correct, the PSE
> > bit in pud_val(pud) should be always cleared.  So, when pud_huge() returns
> > true since 3a194f3f8ad0, the PRESENT bit should be clear and some other
> > bits (rather than PRESENT and PSE) are set so that pud_none() is false.
> > I'm not sure what such a non-present PUD entry does mean.
> 
> pud_val()==0 when it blows up, and pud_none() is false because
> pgtable-nopmd.h says so with 2 level paging.
> 
> And given that I just tested with PAE / 3 level paging, 
> and sure enough it no longer blows up.
> 
> So looks to me like maybe this new code just doesn't understand
> how the levels get folded.

OK, so branching based on "#if CONFIG_PGTABLE_LEVELS > 2" seems better.
Thank you for additional testing.

> 
> I might also be missing something obvious, but why is it even
> necessary to treat PRESENT==0+PSE==0 as a huge entry?

The format of pud entry differs based on PRESENT bit, and PSE bit is
checked before PRESENT bit.  So in order to distinguish from a normal
huge entry, we had to define that a non-present huge entry should have
its PSE bit cleared (although this sounds counter-intuitive).

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Intel-gfx] [mm-unstable PATCH v7 2/8] mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry
@ 2022-11-06 23:52           ` HORIGUCHI NAOYA(堀口　直也)
  0 siblings, 0 replies; 17+ messages in thread
From: HORIGUCHI NAOYA(堀口　直也) @ 2022-11-06 23:52 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: Miaohe Lin, regressions, David Hildenbrand, Yang Shi,
	linux-kernel, Liu Shixin, linux-mm, Muchun Song, Andrew Morton,
	Oscar Salvador, Naoya Horiguchi, intel-gfx, Mike Kravetz

On Sat, Nov 05, 2022 at 12:23:40AM +0200, Ville Syrjälä wrote:
> On Sat, Nov 05, 2022 at 12:59:30AM +0900, Naoya Horiguchi wrote:
> > On Wed, Nov 02, 2022 at 10:51:40PM +0200, Ville Syrjälä wrote:
> > > On Thu, Jul 14, 2022 at 01:24:14PM +0900, Naoya Horiguchi wrote:
> > > > +/*
> > > > + * pud_huge() returns 1 if @pud is hugetlb related entry, that is normal
> > > > + * hugetlb entry or non-present (migration or hwpoisoned) hugetlb entry.
> > > > + * Otherwise, returns 0.
> > > > + */
> > > >  int pud_huge(pud_t pud)
> > > >  {
> > > > -	return !!(pud_val(pud) & _PAGE_PSE);
> > > > +	return !pud_none(pud) &&
> > > > +		(pud_val(pud) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
> > > >  }
> > > 
> > > Hi,
> > > 
> > > This causes i915 to trip a BUG_ON() on x86-32 when I start X.
> > 
> > Hello,
> > 
> > Thank you for finding and reporting the issue.
> > 
> > x86-32 does not enable CONFIG_ARCH_HAS_GIGANTIC_PAGE, so pud_huge() is
> > supposed to be false on x86-32.  Doing like below looks to me a fix
> > (reverting to the original behavior for x86-32):
> > diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
> > index 6b3033845c6d..bf73f25aaa32 100644
> > --- a/arch/x86/mm/hugetlbpage.c
> > +++ b/arch/x86/mm/hugetlbpage.c
> > @@ -37,8 +37,12 @@ int pmd_huge(pmd_t pmd)
> >   */
> >  int pud_huge(pud_t pud)
> >  {
> > +#ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
> >         return !pud_none(pud) &&
> >                 (pud_val(pud) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
> > +#else
> > +       return !!(pud_val(pud) & _PAGE_PSE);    // or "return 0;" ?
> > +#endif
> >  }
> > 
> >  #ifdef CONFIG_HUGETLB_PAGE
> > 
> > 
> > Let me guess what the PUD entry was there when triggering the issue.
> > Assuming that the original code (before 3a194f3f8ad0) was correct, the PSE
> > bit in pud_val(pud) should be always cleared.  So, when pud_huge() returns
> > true since 3a194f3f8ad0, the PRESENT bit should be clear and some other
> > bits (rather than PRESENT and PSE) are set so that pud_none() is false.
> > I'm not sure what such a non-present PUD entry does mean.
> 
> pud_val()==0 when it blows up, and pud_none() is false because
> pgtable-nopmd.h says so with 2 level paging.
> 
> And given that I just tested with PAE / 3 level paging, 
> and sure enough it no longer blows up.
> 
> So looks to me like maybe this new code just doesn't understand
> how the levels get folded.

OK, so branching based on "#if CONFIG_PGTABLE_LEVELS > 2" seems better.
Thank you for additional testing.

> 
> I might also be missing something obvious, but why is it even
> necessary to treat PRESENT==0+PSE==0 as a huge entry?

The format of pud entry differs based on PRESENT bit, and PSE bit is
checked before PRESENT bit.  So in order to distinguish from a normal
huge entry, we had to define that a non-present huge entry should have
its PSE bit cleared (although this sounds counter-intuitive).

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2022-11-07 18:20 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-14  4:24 [mm-unstable PATCH v7 0/8] mm, hwpoison: enable 1GB hugepage support (v7) Naoya Horiguchi
2022-07-14  4:24 ` [mm-unstable PATCH v7 1/8] mm/hugetlb: check gigantic_page_runtime_supported() in return_unused_surplus_pages() Naoya Horiguchi
2022-07-14  4:24 ` [mm-unstable PATCH v7 2/8] mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry Naoya Horiguchi
2022-11-02 20:51   ` Ville Syrjälä
2022-11-02 20:51     ` [Intel-gfx] " Ville Syrjälä
2022-11-04 15:59     ` Naoya Horiguchi
2022-11-04 15:59       ` [Intel-gfx] " Naoya Horiguchi
2022-11-04 22:23       ` Ville Syrjälä
2022-11-04 22:23         ` [Intel-gfx] " Ville Syrjälä
2022-11-06 23:52         ` HORIGUCHI NAOYA(堀口　直也)
2022-11-06 23:52           ` [Intel-gfx] " HORIGUCHI NAOYA(堀口　直也)
2022-07-14  4:24 ` [mm-unstable PATCH v7 3/8] mm, hwpoison, hugetlb: support saving mechanism of raw error pages Naoya Horiguchi
2022-07-14  4:24 ` [mm-unstable PATCH v7 4/8] mm, hwpoison: make unpoison aware of raw error info in hwpoisoned hugepage Naoya Horiguchi
2022-07-14  4:24 ` [mm-unstable PATCH v7 5/8] mm, hwpoison: set PG_hwpoison for busy hugetlb pages Naoya Horiguchi
2022-07-14  4:24 ` [mm-unstable PATCH v7 6/8] mm, hwpoison: make __page_handle_poison returns int Naoya Horiguchi
2022-07-14  4:24 ` [mm-unstable PATCH v7 7/8] mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage Naoya Horiguchi
2022-07-14  4:24 ` [mm-unstable PATCH v7 8/8] mm, hwpoison: enable memory error handling on " Naoya Horiguchi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.