linux-kernel.vger.kernel.org archive mirror
* [PATCH v2] mm: hugetlb: add hwcrp_hugepages to record memory failure on hugetlbfs
@ 2021-06-23  8:51 wangbin
  2021-06-25 23:07 ` Mike Kravetz
  0 siblings, 1 reply; 5+ messages in thread
From: wangbin @ 2021-06-23  8:51 UTC (permalink / raw)
  To: linux-mm, linux-kernel; +Cc: mike.kravetz, nao.horiguchi, akpm, wuxu.wu

From: Bin Wang <wangbin224@huawei.com>

In the current hugetlbfs memory failure handler, the reserved huge page
count is used to record the number of huge pages with hwpoison.
There are two problems:

1. We call hugetlb_fix_reserve_counts() to adjust the reserved count
in hugetlbfs_error_remove_page(). But this function is only called if
hugetlb_unreserve_pages() fails, and hugetlb_unreserve_pages() fails
only if the kmalloc in region_del() fails, which is almost impossible.
As a result, the reserved count is not corrected as expected when a
memory failure occurs.

2. The reserved count is designed to show the number of huge pages
reserved at mmap() time. This means that even if we fix the first
issue, the reserved count will be confusing because we cannot tell
whether it reflects hwpoison or reserved huge pages.

This patch adds a hardware corrupted huge page count to record memory
failures on hugetlbfs instead of using the reserved count.
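
With this patch the new counter is visible alongside the existing hugetlb
counters (assuming the standard hstate sysfs layout):

  /sys/kernel/mm/hugepages/hugepages-<size>kB/hwcrp_hugepages
  /sys/devices/system/node/node<N>/hugepages/hugepages-<size>kB/hwcrp_hugepages

and as "HugePages_Hwcrp" in /proc/meminfo.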

Signed-off-by: Bin Wang <wangbin224@huawei.com>
---
 fs/hugetlbfs/inode.c    |  3 +--
 include/linux/hugetlb.h |  3 +++
 mm/hugetlb.c            | 30 ++++++++++++++++++++++++++++++
 3 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 926eeb9bf4eb..ffb6e7b6756b 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -986,8 +986,7 @@ static int hugetlbfs_error_remove_page(struct address_space *mapping,
 	pgoff_t index = page->index;
 
 	remove_huge_page(page);
-	if (unlikely(hugetlb_unreserve_pages(inode, index, index + 1, 1)))
-		hugetlb_fix_reserve_counts(inode);
+	hugetlb_fix_hwcrp_counts(page);
 
 	return 0;
 }
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index f7ca1a3870ea..1d5bada80aa5 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -171,6 +171,7 @@ void putback_active_hugepage(struct page *page);
 void move_hugetlb_state(struct page *oldpage, struct page *newpage, int reason);
 void free_huge_page(struct page *page);
 void hugetlb_fix_reserve_counts(struct inode *inode);
+void hugetlb_fix_hwcrp_counts(struct page *page);
 extern struct mutex *hugetlb_fault_mutex_table;
 u32 hugetlb_fault_mutex_hash(struct address_space *mapping, pgoff_t idx);
 
@@ -602,12 +603,14 @@ struct hstate {
 	unsigned long free_huge_pages;
 	unsigned long resv_huge_pages;
 	unsigned long surplus_huge_pages;
+	unsigned long hwcrp_huge_pages;
 	unsigned long nr_overcommit_huge_pages;
 	struct list_head hugepage_activelist;
 	struct list_head hugepage_freelists[MAX_NUMNODES];
 	unsigned int nr_huge_pages_node[MAX_NUMNODES];
 	unsigned int free_huge_pages_node[MAX_NUMNODES];
 	unsigned int surplus_huge_pages_node[MAX_NUMNODES];
+	unsigned int hwcrp_huge_pages_node[MAX_NUMNODES];
 #ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
 	unsigned int nr_free_vmemmap_pages;
 #endif
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 760b5fb836b8..3e6385381db7 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -763,6 +763,15 @@ void hugetlb_fix_reserve_counts(struct inode *inode)
 		pr_warn("hugetlb: Huge Page Reserved count may go negative.\n");
 }
 
+void hugetlb_fix_hwcrp_counts(struct page *page)
+{
+	struct hstate *h = &default_hstate;
+	int nid = page_to_nid(page);
+
+	h->hwcrp_huge_pages++;
+	h->hwcrp_huge_pages_node[nid]++;
+}
+
 /*
  * Count and return the number of huge pages in the reserve map
  * that intersect with the range [f, t).
@@ -3293,12 +3302,30 @@ static ssize_t surplus_hugepages_show(struct kobject *kobj,
 }
 HSTATE_ATTR_RO(surplus_hugepages);
 
+static ssize_t hwcrp_hugepages_show(struct kobject *kobj,
+					struct kobj_attribute *attr, char *buf)
+{
+	struct hstate *h;
+	unsigned long hwcrp_huge_pages;
+	int nid;
+
+	h = kobj_to_hstate(kobj, &nid);
+	if (nid == NUMA_NO_NODE)
+		hwcrp_huge_pages = h->hwcrp_huge_pages;
+	else
+		hwcrp_huge_pages = h->hwcrp_huge_pages_node[nid];
+
+	return sysfs_emit(buf, "%lu\n", hwcrp_huge_pages);
+}
+HSTATE_ATTR_RO(hwcrp_hugepages);
+
 static struct attribute *hstate_attrs[] = {
 	&nr_hugepages_attr.attr,
 	&nr_overcommit_hugepages_attr.attr,
 	&free_hugepages_attr.attr,
 	&resv_hugepages_attr.attr,
 	&surplus_hugepages_attr.attr,
+	&hwcrp_hugepages_attr.attr,
 #ifdef CONFIG_NUMA
 	&nr_hugepages_mempolicy_attr.attr,
 #endif
@@ -3368,6 +3395,7 @@ static struct attribute *per_node_hstate_attrs[] = {
 	&nr_hugepages_attr.attr,
 	&free_hugepages_attr.attr,
 	&surplus_hugepages_attr.attr,
+	&hwcrp_hugepages_attr.attr,
 	NULL,
 };
 
@@ -3862,11 +3890,13 @@ void hugetlb_report_meminfo(struct seq_file *m)
 				   "HugePages_Free:    %5lu\n"
 				   "HugePages_Rsvd:    %5lu\n"
 				   "HugePages_Surp:    %5lu\n"
+				   "HugePages_Hwcrp:   %5lu\n"
 				   "Hugepagesize:   %8lu kB\n",
 				   count,
 				   h->free_huge_pages,
 				   h->resv_huge_pages,
 				   h->surplus_huge_pages,
+				   h->hwcrp_huge_pages,
 				   huge_page_size(h) / SZ_1K);
 	}
 
-- 
2.23.0



* Re: [PATCH v2] mm: hugetlb: add hwcrp_hugepages to record memory failure on hugetlbfs
  2021-06-23  8:51 [PATCH v2] mm: hugetlb: add hwcrp_hugepages to record memory failure on hugetlbfs wangbin
@ 2021-06-25 23:07 ` Mike Kravetz
  2021-06-28  9:27   ` Bin Wang
  0 siblings, 1 reply; 5+ messages in thread
From: Mike Kravetz @ 2021-06-25 23:07 UTC (permalink / raw)
  To: wangbin, linux-mm, linux-kernel; +Cc: nao.horiguchi, akpm, wuxu.wu

On 6/23/21 1:51 AM, wangbin wrote:
> From: Bin Wang <wangbin224@huawei.com>
> 
> In the current hugetlbfs memory failure handler, the reserved huge page
> count is used to record the number of huge pages with hwpoison.
> There are two problems:

Please review the comments from the first patch.  Reserved huge page
counts are NOT used to record the number of huge pages with hwpoison.

> 
> 1. We call hugetlb_fix_reserve_counts() to adjust the reserved count
> in hugetlbfs_error_remove_page(). But this function is only called if
> hugetlb_unreserve_pages() fails, and hugetlb_unreserve_pages() fails
> only if the kmalloc in region_del() fails, which is almost impossible.
> As a result, the reserved count is not corrected as expected when a
> memory failure occurs.
> 
> 2. The reserved count is designed to show the number of huge pages
> reserved at mmap() time. This means that even if we fix the first
> issue, the reserved count will be confusing because we cannot tell
> whether it reflects hwpoison or reserved huge pages.
> 
> This patch adds a hardware corrupted huge page count to record memory
> failures on hugetlbfs instead of using the reserved count.
> 
> Signed-off-by: Bin Wang <wangbin224@huawei.com>
> ---
>  fs/hugetlbfs/inode.c    |  3 +--
>  include/linux/hugetlb.h |  3 +++
>  mm/hugetlb.c            | 30 ++++++++++++++++++++++++++++++
>  3 files changed, 34 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 926eeb9bf4eb..ffb6e7b6756b 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -986,8 +986,7 @@ static int hugetlbfs_error_remove_page(struct address_space *mapping,
>  	pgoff_t index = page->index;
>  
>  	remove_huge_page(page);
> -	if (unlikely(hugetlb_unreserve_pages(inode, index, index + 1, 1)))
> -		hugetlb_fix_reserve_counts(inode);

As mentioned, huge page reserve counts are not used to record the number of
poisoned pages.  The calls to hugetlb_unreserve_pages and possibly
hugetlb_fix_reserve_counts are necessary for reserve accounting.  They
cannot be removed.

> +	hugetlb_fix_hwcrp_counts(page);

This new routine just counts memory errors on 'in use' huge pages.
I do not see a call anywhere to count memory errors on huge pages
not in use.

>  
>  	return 0;
>  }
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index f7ca1a3870ea..1d5bada80aa5 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -171,6 +171,7 @@ void putback_active_hugepage(struct page *page);
>  void move_hugetlb_state(struct page *oldpage, struct page *newpage, int reason);
>  void free_huge_page(struct page *page);
>  void hugetlb_fix_reserve_counts(struct inode *inode);
> +void hugetlb_fix_hwcrp_counts(struct page *page);
>  extern struct mutex *hugetlb_fault_mutex_table;
>  u32 hugetlb_fault_mutex_hash(struct address_space *mapping, pgoff_t idx);
>  
> @@ -602,12 +603,14 @@ struct hstate {
>  	unsigned long free_huge_pages;
>  	unsigned long resv_huge_pages;
>  	unsigned long surplus_huge_pages;
> +	unsigned long hwcrp_huge_pages;
>  	unsigned long nr_overcommit_huge_pages;
>  	struct list_head hugepage_activelist;
>  	struct list_head hugepage_freelists[MAX_NUMNODES];
>  	unsigned int nr_huge_pages_node[MAX_NUMNODES];
>  	unsigned int free_huge_pages_node[MAX_NUMNODES];
>  	unsigned int surplus_huge_pages_node[MAX_NUMNODES];
> +	unsigned int hwcrp_huge_pages_node[MAX_NUMNODES];

I understand your requirement to count the number of memory errors on
hugetlb pages.  However, we need to think carefully about how we represent
that count.

Naoya, do you have opinions on where would be the best place to store
this information?  The hugetlb memory error code has the comment 'needs
work'.  Ideally, we could isolate memory errors to a single base (4K for
x86) page and free the remaining base pages to buddy.  We could also
potentially allocate a 'replacement' hugetlb page doing something like
alloc_and_dissolve_huge_page.

If we get an error on a hugetlb page and can isolate it to a base page
and replace the huge page, is it still a huge page memory error?

IMO, we should work on isolating memory errors to a base page and
replacing the huge page.  Then, the existing count of base pages with
memory errors would be sufficient?
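
Roughly, that direction could look like the following hypothetical helper
(a sketch only, not a working patch; locking, refcounting and error
handling are omitted, and hugetlb_confine_error() is a made-up name):

static void hugetlb_confine_error(struct page *p)
{
	struct page *head = compound_head(p);

	/* Poison only the faulty base page, not the whole huge page. */
	SetPageHWPoison(p);

	/*
	 * A free huge page can be dissolved right away: the healthy base
	 * pages go back to the buddy allocator and only the poisoned one
	 * stays out of circulation.
	 */
	if (!page_count(head)) {
		dissolve_free_huge_page(head);
		return;
	}

	/*
	 * An in-use huge page would first need its contents migrated to a
	 * freshly allocated huge page (along the lines of
	 * alloc_and_dissolve_huge_page()) before it can be dissolved.
	 */
}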

This is something I would like to work on, but I have higher priorities
right now.

-- 
Mike Kravetz


* Re: Re: [PATCH v2] mm: hugetlb: add hwcrp_hugepages to record memory failure on hugetlbfs
  2021-06-25 23:07 ` Mike Kravetz
@ 2021-06-28  9:27   ` Bin Wang
  2021-06-29  8:08     ` Naoya Horiguchi
  0 siblings, 1 reply; 5+ messages in thread
From: Bin Wang @ 2021-06-28  9:27 UTC (permalink / raw)
  To: mike.kravetz; +Cc: akpm, linux-kernel, linux-mm, naoya.horiguchi, wuxu.wu

> > diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> > index 926eeb9bf4eb..ffb6e7b6756b 100644
> > --- a/fs/hugetlbfs/inode.c
> > +++ b/fs/hugetlbfs/inode.c
> > @@ -986,8 +986,7 @@ static int hugetlbfs_error_remove_page(struct address_space *mapping,
> >  	pgoff_t index = page->index;
> >  
> >  	remove_huge_page(page);
> > -	if (unlikely(hugetlb_unreserve_pages(inode, index, index + 1, 1)))
> > -		hugetlb_fix_reserve_counts(inode);
> 
> As mentioned, huge page reserve counts are not used to record the number of
> poisoned pages.  The calls to hugetlb_unreserve_pages and possibly
> hugetlb_fix_reserve_counts are necessary for reserve accounting.  They
> cannot be removed.

Thank you very much for your explanation. I didn't get the point of the
comments on the first patch. hugetlb_fix_reserve_counts() shouldn't
be removed, and I will fix this.

> > +	hugetlb_fix_hwcrp_counts(page);
> 
> This new routine just counts memory errors on 'in use' huge pages.
> I do not see a call anywhere to count memory errors on huge pages
> not in use.

It's my oversight; I should have considered this situation. I tested
it with the hwcrp_hugepages count, and this is the result:
# cat /proc/meminfo |grep -E 'HugePages_|Hard'
HardwareCorrupted:     0 kB
HugePages_Total:      64
HugePages_Free:       64
HugePages_Rsvd:        0
HugePages_Surp:        0
HugePages_Hwcrp:       0
No count changes, not even HardwareCorrupted. I'm not sure if this is
normal. This is what happens in the kernel (stable master branch):
static int memory_failure_hugetlb(unsigned long pfn, int flags)
{
	...
	/* TestSetPageHWPoison return 0 */
	if (TestSetPageHWPoison(head)) {
		...
	}

	num_poisoned_pages_inc(); /* HardwareCorrupted += PAGE_SIZE */

	/* get_hwpoison_page() return 0 */
	if (!(flags & MF_COUNT_INCREASED) && !get_hwpoison_page(p, flags, 0)) {
		/*
		 * Check "filter hit" and "race with other subpage."
		 */
		lock_page(head);
		/* PageHWPoison() return 1 */
		if (PageHWPoison(head)) {
			/* (p != head && TestSetPageHWPoison(head)) is hit */
			if ((hwpoison_filter(p) && TestClearPageHWPoison(p))
			    || (p != head && TestSetPageHWPoison(head))) {
				/* HardwareCorrupted -= PAGE_SIZE */
				num_poisoned_pages_dec();
				unlock_page(head);
				return 0;
			}
		}
		...
	}
	...
}
It seems that memory errors on huge pages not in use hit the
"race with other subpage" path. I think I shouldn't bump hwcrp_hugepages in
this routine. Maybe we need more conditions to distinguish this case.

> >  
> >  	return 0;
> >  }
> > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > index f7ca1a3870ea..1d5bada80aa5 100644
> > --- a/include/linux/hugetlb.h
> > +++ b/include/linux/hugetlb.h
> > @@ -171,6 +171,7 @@ void putback_active_hugepage(struct page *page);
> >  void move_hugetlb_state(struct page *oldpage, struct page *newpage, int reason);
> >  void free_huge_page(struct page *page);
> >  void hugetlb_fix_reserve_counts(struct inode *inode);
> > +void hugetlb_fix_hwcrp_counts(struct page *page);
> >  extern struct mutex *hugetlb_fault_mutex_table;
> >  u32 hugetlb_fault_mutex_hash(struct address_space *mapping, pgoff_t idx);
> >  
> > @@ -602,12 +603,14 @@ struct hstate {
> >  	unsigned long free_huge_pages;
> >  	unsigned long resv_huge_pages;
> >  	unsigned long surplus_huge_pages;
> > +	unsigned long hwcrp_huge_pages;
> >  	unsigned long nr_overcommit_huge_pages;
> >  	struct list_head hugepage_activelist;
> >  	struct list_head hugepage_freelists[MAX_NUMNODES];
> >  	unsigned int nr_huge_pages_node[MAX_NUMNODES];
> >  	unsigned int free_huge_pages_node[MAX_NUMNODES];
> >  	unsigned int surplus_huge_pages_node[MAX_NUMNODES];
> > +	unsigned int hwcrp_huge_pages_node[MAX_NUMNODES];
> 
> I understand your requirement to count the number of memory errors on
> hugetlb pages.  However, we need to think carefully about how we represent
> that count.
> 
> Naoya, do you have opinions on where would be the best place to store
> this information?  The hugetlb memory error code has the comment 'needs
> work'.  Ideally, we could isolate memory errors to a single base (4K for
> x86) page and free the remaining base pages to buddy.  We could also
> potentially allocate a 'replacement' hugetlb page doing something like
> alloc_and_dissolve_huge_page.
> 
> If we get an error on a hugetlb page and can isolate it to a base page
> and replace the huge page, is it still a huge page memory error?
> 
> IMO, we should work on isolating memory errors to a base page and
> replacing the huge page.  Then, the existing count of base pages with
> memory errors would be sufficient?
> 
> This is something I would like to work on, but I have higher priorities
> right now.

Yes, splitting the huge page and isolating a base page is ideal. And
we do this with dissolve_free_huge_page() when page_mapping() returns
NULL. I think there is a reason (though I do not see it) why we don't split
huge pages in hugetlbfs_error_remove_page() or afterwards. So I chose to
add a new count.

--
Bin Wang


* Re: Re: [PATCH v2] mm: hugetlb: add hwcrp_hugepages to record memory failure on hugetlbfs
  2021-06-28  9:27   ` Bin Wang
@ 2021-06-29  8:08     ` Naoya Horiguchi
  2021-07-11 14:20       ` Bin Wang
  0 siblings, 1 reply; 5+ messages in thread
From: Naoya Horiguchi @ 2021-06-29  8:08 UTC (permalink / raw)
  To: Bin Wang, mike.kravetz
  Cc: akpm, linux-kernel, linux-mm, naoya.horiguchi, wuxu.wu

Hi Bin, Mike,

On Mon, Jun 28, 2021 at 05:27:52PM +0800, Bin Wang wrote:
> > > diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> > > index 926eeb9bf4eb..ffb6e7b6756b 100644
> > > --- a/fs/hugetlbfs/inode.c
> > > +++ b/fs/hugetlbfs/inode.c
> > > @@ -986,8 +986,7 @@ static int hugetlbfs_error_remove_page(struct address_space *mapping,
> > >  	pgoff_t index = page->index;
> > >  
> > >  	remove_huge_page(page);
> > > -	if (unlikely(hugetlb_unreserve_pages(inode, index, index + 1, 1)))
> > > -		hugetlb_fix_reserve_counts(inode);
> > 
> > As mentioned, huge page reserve counts are not used to record the number of
> > poisoned pages.  The calls to hugetlb_unreserve_pages and possibly
> > hugetlb_fix_reserve_counts are necessary for reserve accounting.  They
> > cannot be removed.
> 
> Thank you very much for your explanation. I didn't get the point of the
> comments on the first patch. hugetlb_fix_reserve_counts() shouldn't
> be removed, and I will fix this.
> 
> > > +	hugetlb_fix_hwcrp_counts(page);
> > 
> > This new routine just counts memory errors on 'in use' huge pages.
> > I do not see a call anywhere to count memory errors on huge pages
> > not in use.
> 
> It's my oversight; I should have considered this situation. I tested
> it with the hwcrp_hugepages count, and this is the result:
> # cat /proc/meminfo |grep -E 'HugePages_|Hard'
> HardwareCorrupted:     0 kB
> HugePages_Total:      64
> HugePages_Free:       64
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> HugePages_Hwcrp:       0
> No count changes, not even HardwareCorrupted. I'm not sure if this is
> normal. This is what happens in the kernel (stable master branch):
> static int memory_failure_hugetlb(unsigned long pfn, int flags)
> {
> 	...
> 	/* TestSetPageHWPoison return 0 */
> 	if (TestSetPageHWPoison(head)) {
> 		...
> 	}
> 
> 	num_poisoned_pages_inc(); /* HardwareCorrupted += PAGE_SIZE */
> 
> 	/* get_hwpoison_page() return 0 */
> 	if (!(flags & MF_COUNT_INCREASED) && !get_hwpoison_page(p, flags, 0)) {
> 		/*
> 		 * Check "filter hit" and "race with other subpage."
> 		 */
> 		lock_page(head);
> 		/* PageHWPoison() return 1 */
> 		if (PageHWPoison(head)) {
> 			/* (p != head && TestSetPageHWPoison(head)) is hit */
> 			if ((hwpoison_filter(p) && TestClearPageHWPoison(p))
> 			    || (p != head && TestSetPageHWPoison(head))) {
> 				/* HardwareCorrupted -= PAGE_SIZE */
> 				num_poisoned_pages_dec();
> 				unlock_page(head);
> 				return 0;
> 			}
> 		}
> 		...
> 	}
> 	...
> }
> It seems that memory errors on huge pages not in use hit the
> "race with other subpage" path. I think I shouldn't bump hwcrp_hugepages in
> this routine. Maybe we need more conditions to distinguish this case.

BTW, I think that the race handled by this if-block will no longer be
necessary if mf_mutex is extended to cover unpoison_memory() by the patch
"mm/hwpoison: mf_mutex for soft offline and unpoison" [1].
I'm thinking of updating patch 2/6 [2] to remove this if-block.

  [1]: https://lore.kernel.org/linux-mm/20210614021212.223326-2-nao.horiguchi@gmail.com/
  [2]: https://lore.kernel.org/linux-mm/20210614021212.223326-3-nao.horiguchi@gmail.com/

# .. so you don't have to worry about the details of this part.

> 
> > >  
> > >  	return 0;
> > >  }
> > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > > index f7ca1a3870ea..1d5bada80aa5 100644
> > > --- a/include/linux/hugetlb.h
> > > +++ b/include/linux/hugetlb.h
> > > @@ -171,6 +171,7 @@ void putback_active_hugepage(struct page *page);
> > >  void move_hugetlb_state(struct page *oldpage, struct page *newpage, int reason);
> > >  void free_huge_page(struct page *page);
> > >  void hugetlb_fix_reserve_counts(struct inode *inode);
> > > +void hugetlb_fix_hwcrp_counts(struct page *page);
> > >  extern struct mutex *hugetlb_fault_mutex_table;
> > >  u32 hugetlb_fault_mutex_hash(struct address_space *mapping, pgoff_t idx);
> > >  
> > > @@ -602,12 +603,14 @@ struct hstate {
> > >  	unsigned long free_huge_pages;
> > >  	unsigned long resv_huge_pages;
> > >  	unsigned long surplus_huge_pages;
> > > +	unsigned long hwcrp_huge_pages;
> > >  	unsigned long nr_overcommit_huge_pages;
> > >  	struct list_head hugepage_activelist;
> > >  	struct list_head hugepage_freelists[MAX_NUMNODES];
> > >  	unsigned int nr_huge_pages_node[MAX_NUMNODES];
> > >  	unsigned int free_huge_pages_node[MAX_NUMNODES];
> > >  	unsigned int surplus_huge_pages_node[MAX_NUMNODES];
> > > +	unsigned int hwcrp_huge_pages_node[MAX_NUMNODES];
> > 
> > I understand your requirement to count the number of memory errors on
> > hugetlb pages.  However, we need to think carefully about how we represent
> > that count.
> > 
> > Naoya, do you have opinions on where would be the best place to store
> > this information?  The hugetlb memory error code has the comment 'needs
> > work'.  Ideally, we could isolate memory errors to a single base (4K for
> > x86) page and free the remaining base pages to buddy.  We could also
> > potentially allocate a 'replacement' hugetlb page doing something like
> > alloc_and_dissolve_huge_page.

Yes, these two are the important goals of hugetlb support.

> > 
> > If we get an error on a hugetlb page and can isolate it to a base page
> > and replace the huge page, is it still a huge page memory error?

That's not counted as a huge page error, but just a base page error.

> > 
> > IMO, we should work on isolating memory errors to a base page and
> > replacing the huge page.  Then, the existing count of base pages with
> > memory errors would be sufficient?

Yes, I agree with Mike.  Adding new fields dedicated to the (maybe temporary)
counters in struct hstate seems like overkill to me.

> > 
> > This is something I would like to work on, but I have higher priorities
> > right now.
> 
> Yes, splitting the huge page and isolating a base page is ideal. And
> we do this with dissolve_free_huge_page() when page_mapping() returns
> NULL. I think there is a reason (though I do not see it) why we don't split
> huge pages in hugetlbfs_error_remove_page() or afterwards. So I chose to
> add a new count.

Maybe lack of resources is the main reason for this incompleteness. I have
noticed this for years and kept saying "this is on my todo list", but still
haven't gotten to it (really sorry about that...).  Anyway, I hope you can solve
your problem with the /proc/kpageflags approach, which is the recommended solution.
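
For reference, a minimal user-space sketch of that approach (KPF_* bit
numbers taken from include/uapi/linux/kernel-page-flags.h; reading
/proc/kpageflags needs root).  It counts hugetlb pages that currently
carry the hwpoison flag:

#include <stdio.h>
#include <stdint.h>

#define KPF_HUGE	17
#define KPF_HWPOISON	19

int main(void)
{
	FILE *f = fopen("/proc/kpageflags", "rb");
	uint64_t flags;
	unsigned long poisoned_huge = 0;

	if (!f) {
		perror("fopen /proc/kpageflags");
		return 1;
	}
	/* One 64-bit flags word per pfn, in pfn order. */
	while (fread(&flags, sizeof(flags), 1, f) == 1) {
		if ((flags & (1ULL << KPF_HUGE)) &&
		    (flags & (1ULL << KPF_HWPOISON)))
			poisoned_huge++;
	}
	fclose(f);
	printf("hwpoisoned hugetlb pages: %lu\n", poisoned_huge);
	return 0;
}

The in-tree tools/vm/page-types.c utility reports the same bits, so it can
also be used instead of writing a new tool.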

Thanks,
Naoya Horiguchi


* Re: Re: [PATCH v2] mm: hugetlb: add hwcrp_hugepages to record memory failure on hugetlbfs
  2021-06-29  8:08     ` Naoya Horiguchi
@ 2021-07-11 14:20       ` Bin Wang
  0 siblings, 0 replies; 5+ messages in thread
From: Bin Wang @ 2021-07-11 14:20 UTC (permalink / raw)
  To: naoya.horiguchi
  Cc: akpm, linux-kernel, linux-mm, mike.kravetz, naoya.horiguchi,
	wangbin224, wuxu.wu

Hi Naoya,

> > Yes, splitting the huge page and isolating a base page is ideal. And
> > we do this with dissolve_free_huge_page() when page_mapping() returns
> > NULL. I think there is a reason (though I do not see it) why we don't split
> > huge pages in hugetlbfs_error_remove_page() or afterwards. So I chose to
> > add a new count.
> 
> Maybe lack of resources is the main reason for this incompleteness. I have
> noticed this for years and kept saying "this is on my todo list", but still
> haven't gotten to it (really sorry about that...).  Anyway, I hope you can solve
> your problem with the /proc/kpageflags approach, which is the recommended solution.

I do not understand exactly what you mean by the lack of resources. I have tried
calling dissolve_free_huge_page() after hugetlbfs_error_remove_page() and it worked.
In my opinion, the error huge page has already been truncated from hugetlbfs. It
cannot be accessed or allocated again, so I think it is safe to split it.

I would appreciate it if you could point out what I overlooked. And I will
try to solve it.

Thanks,
Bin Wang


