* [PATCH] mm: hugetlbfs: add hwcrp_hugepages to record memory failure on hugetlbfs
@ 2021-06-07 14:16 wangbin
  2021-06-07 19:07 ` Mike Kravetz
  2021-06-07 19:13 ` Mike Kravetz
  0 siblings, 2 replies; 7+ messages in thread
From: wangbin @ 2021-06-07 14:16 UTC (permalink / raw)
  To: linux-mm, linux-kernel; +Cc: n-horiguchi, mike.kravetz, akpm, wuxu.wu

From: Bin Wang <wangbin224@huawei.com>

In the current hugetlbfs memory failure handler, reserved huge page
counts are used to record the number of huge pages with hwpoison.
There are two problems:

1. We call hugetlb_fix_reserve_counts() to adjust the reserved counts
in hugetlbfs_error_remove_page(). But this function is only called if
hugetlb_unreserve_pages() fails, and hugetlb_unreserve_pages() fails
only if the kmalloc in region_del() fails, which is almost impossible.
As a result, the reserved count is not corrected as expected when a
memory failure occurs.

2. Reserved counts are designed to show the number of huge pages
reserved at mmap() time. This means that even if we fix the first
issue, the reserved counts will still be confusing, because we cannot
tell whether a count refers to an hwpoisoned or a reserved huge page.

This patch adds hardware corrupted huge page counts to record memory
failures on hugetlbfs, instead of using the reserved counts.

Signed-off-by: Bin Wang <wangbin224@huawei.com>
---
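Not part of the patch: a minimal userspace sketch, for illustration only,
of how the proposed counter could be read once this is applied. It parses
the new HugePages_Hwcrp field that this patch adds to /proc/meminfo;
apart from that field name, nothing here is defined by the patch itself.

    /*
     * Illustrative sketch only: read the proposed HugePages_Hwcrp field
     * from /proc/meminfo. The field only exists with this patch applied.
     */
    #include <stdio.h>

    int main(void)
    {
        char line[128];
        unsigned long hwcrp = 0;
        FILE *f = fopen("/proc/meminfo", "r");

        if (!f)
            return 1;
        while (fgets(line, sizeof(line), f)) {
            /* Lines that do not start with the literal prefix are skipped. */
            if (sscanf(line, "HugePages_Hwcrp: %lu", &hwcrp) == 1)
                break;
        }
        fclose(f);
        printf("hwpoisoned huge pages: %lu\n", hwcrp);
        return 0;
    }
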
 fs/hugetlbfs/inode.c    |  3 +--
 include/linux/hugetlb.h |  3 +++
 mm/hugetlb.c            | 30 ++++++++++++++++++++++++++++++
 3 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 55efd3dd04f6..3c094f533981 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -985,8 +985,7 @@ static int hugetlbfs_error_remove_page(struct address_space *mapping,
 	pgoff_t index = page->index;
 
 	remove_huge_page(page);
-	if (unlikely(hugetlb_unreserve_pages(inode, index, index + 1, 1)))
-		hugetlb_fix_reserve_counts(inode);
+	hugetlb_fix_hwcrp_counts(page);
 
 	return 0;
 }
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index b92f25ccef58..130f244f3bef 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -153,6 +153,7 @@ void putback_active_hugepage(struct page *page);
 void move_hugetlb_state(struct page *oldpage, struct page *newpage, int reason);
 void free_huge_page(struct page *page);
 void hugetlb_fix_reserve_counts(struct inode *inode);
+void hugetlb_fix_hwcrp_counts(struct page *page);
 extern struct mutex *hugetlb_fault_mutex_table;
 u32 hugetlb_fault_mutex_hash(struct address_space *mapping, pgoff_t idx);
 
@@ -576,12 +577,14 @@ struct hstate {
 	unsigned long free_huge_pages;
 	unsigned long resv_huge_pages;
 	unsigned long surplus_huge_pages;
+	unsigned long hwcrp_huge_pages;
 	unsigned long nr_overcommit_huge_pages;
 	struct list_head hugepage_activelist;
 	struct list_head hugepage_freelists[MAX_NUMNODES];
 	unsigned int nr_huge_pages_node[MAX_NUMNODES];
 	unsigned int free_huge_pages_node[MAX_NUMNODES];
 	unsigned int surplus_huge_pages_node[MAX_NUMNODES];
+	unsigned int hwcrp_huge_pages_node[MAX_NUMNODES];
 #ifdef CONFIG_CGROUP_HUGETLB
 	/* cgroup control files */
 	struct cftype cgroup_files_dfl[7];
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 95918f410c0f..dae91f118c18 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -761,6 +761,15 @@ void hugetlb_fix_reserve_counts(struct inode *inode)
 		pr_warn("hugetlb: Huge Page Reserved count may go negative.\n");
 }
 
+void hugetlb_fix_hwcrp_counts(struct page *page)
+{
+	struct hstate *h = &default_hstate;
+	int nid = page_to_nid(page);
+
+	h->hwcrp_huge_pages++;
+	h->hwcrp_huge_pages_node[nid]++;
+}
+
 /*
  * Count and return the number of huge pages in the reserve map
  * that intersect with the range [f, t).
@@ -3089,12 +3098,30 @@ static ssize_t surplus_hugepages_show(struct kobject *kobj,
 }
 HSTATE_ATTR_RO(surplus_hugepages);
 
+static ssize_t hwcrp_hugepages_show(struct kobject *kobj,
+					struct kobj_attribute *attr, char *buf)
+{
+	struct hstate *h;
+	unsigned long hwcrp_huge_pages;
+	int nid;
+
+	h = kobj_to_hstate(kobj, &nid);
+	if (nid == NUMA_NO_NODE)
+		hwcrp_huge_pages = h->hwcrp_huge_pages;
+	else
+		hwcrp_huge_pages = h->hwcrp_huge_pages_node[nid];
+
+	return sprintf(buf, "%lu\n", hwcrp_huge_pages);
+}
+HSTATE_ATTR_RO(hwcrp_hugepages);
+
 static struct attribute *hstate_attrs[] = {
 	&nr_hugepages_attr.attr,
 	&nr_overcommit_hugepages_attr.attr,
 	&free_hugepages_attr.attr,
 	&resv_hugepages_attr.attr,
 	&surplus_hugepages_attr.attr,
+	&hwcrp_hugepages_attr.attr,
 #ifdef CONFIG_NUMA
 	&nr_hugepages_mempolicy_attr.attr,
 #endif
@@ -3164,6 +3191,7 @@ static struct attribute *per_node_hstate_attrs[] = {
 	&nr_hugepages_attr.attr,
 	&free_hugepages_attr.attr,
 	&surplus_hugepages_attr.attr,
+	&hwcrp_hugepages_attr.attr,
 	NULL,
 };
 
@@ -3657,11 +3685,13 @@ void hugetlb_report_meminfo(struct seq_file *m)
 				   "HugePages_Free:    %5lu\n"
 				   "HugePages_Rsvd:    %5lu\n"
 				   "HugePages_Surp:    %5lu\n"
+				   "HugePages_Hwcrp:   %5lu\n"
 				   "Hugepagesize:   %8lu kB\n",
 				   count,
 				   h->free_huge_pages,
 				   h->resv_huge_pages,
 				   h->surplus_huge_pages,
+				   h->hwcrp_huge_pages,
 				   huge_page_size(h) / SZ_1K);
 	}
 
-- 
2.23.0


* Re: [PATCH] mm: hugetlbfs: add hwcrp_hugepages to record memory failure on hugetlbfs
  2021-06-07 14:16 [PATCH] mm: hugetlbfs: add hwcrp_hugepages to record memory failure on hugetlbfs wangbin
@ 2021-06-07 19:07 ` Mike Kravetz
  2021-06-07 19:13 ` Mike Kravetz
  1 sibling, 0 replies; 7+ messages in thread
From: Mike Kravetz @ 2021-06-07 19:07 UTC (permalink / raw)
  To: wangbin, linux-mm, linux-kernel; +Cc: n-horiguchi, akpm, wuxu.wu

On 6/7/21 7:16 AM, wangbin wrote:
> From: Bin Wang <wangbin224@huawei.com>
> 
> In the current hugetlbfs memory failure handler, reserved huge page
> counts are used to record the number of huge pages with hwpoison.

I do not believe this is an accurate statement.  Naoya is the memory
error expert and may disagree, but I do not see anywhere where reserve
counts are being used to track huge pages with memory errors.

IIUC, the routine hugetlbfs_error_remove_page is called after
unmapping the page from all user mappings.  The routine will simply
remove the page from the cache.  This effectively removes the page
from the file as hugetlbfs is a memory only filesystem.  The subsequent
call to hugetlb_unreserve_pages cleans up any reserve map entries
associated with the page and adjusts the reserve count if necessary.
The reserve count adjustment is based on removing the page from the
file, rather than the memory error.  The same adjustment would be made
if the page was hole punched from the file.
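
For reference, the pre-patch routine being discussed looks roughly like
this (reconstructed from the hunk in the patch; the struct page parameter
and the inode local are not visible in the hunk and are assumed from
context):

    static int hugetlbfs_error_remove_page(struct address_space *mapping,
                                           struct page *page)
    {
        struct inode *inode = mapping->host;
        pgoff_t index = page->index;

        /* Drop the page from the page cache, i.e. remove it from the file. */
        remove_huge_page(page);
        /* Clean up reserve map entries; fix counts only if that fails. */
        if (unlikely(hugetlb_unreserve_pages(inode, index, index + 1, 1)))
            hugetlb_fix_reserve_counts(inode);

        return 0;
    }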

What specific problem are you trying to solve?  Are you trying to see
how many huge pages were hit by memory errors?
-- 
Mike Kravetz

* Re: [PATCH] mm: hugetlbfs: add hwcrp_hugepages to record memory failure on hugetlbfs
  2021-06-07 14:16 [PATCH] mm: hugetlbfs: add hwcrp_hugepages to record memory failure on hugetlbfs wangbin
  2021-06-07 19:07 ` Mike Kravetz
@ 2021-06-07 19:13 ` Mike Kravetz
  2021-06-08  2:24   ` wangbin
  2021-06-08  8:01   ` HORIGUCHI NAOYA(堀口 直也)
  1 sibling, 2 replies; 7+ messages in thread
From: Mike Kravetz @ 2021-06-07 19:13 UTC (permalink / raw)
  To: wangbin, linux-mm, linux-kernel; +Cc: Naoya Horiguchi, akpm, wuxu.wu

Resend with new e-mail for Naoya

On 6/7/21 7:16 AM, wangbin wrote:
> From: Bin Wang <wangbin224@huawei.com>
> 
> In the current hugetlbfs memory failure handler, reserved huge page
> counts are used to record the number of huge pages with hwpoison.

I do not believe this is an accurate statement.  Naoya is the memory
error expert and may disagree, but I do not see anywhere where reserve
counts are being used to track huge pages with memory errors.

IIUC, the routine hugetlbfs_error_remove_page is called after
unmapping the page from all user mappings.  The routine will simply
remove the page from the cache.  This effectively removes the page
from the file as hugetlbfs is a memory only filesystem.  The subsequent
call to hugetlb_unreserve_pages cleans up any reserve map entries
associated with the page and adjusts the reserve count if necessary.
The reserve count adjustment is based on removing the page from the
file, rather than the memory error.  The same adjustment would be made
if the page was hole punched from the file.

What specific problem are you trying to solve?  Are you trying to see
how many huge pages were hit by memory errors?
-- 
Mike Kravetz

* Re: Re: [PATCH] mm: hugetlbfs: add hwcrp_hugepages to record memory failure on hugetlbfs
  2021-06-07 19:13 ` Mike Kravetz
@ 2021-06-08  2:24   ` wangbin
  2021-06-08  9:13     ` HORIGUCHI NAOYA(堀口 直也)
  2021-06-08  8:01   ` HORIGUCHI NAOYA(堀口 直也)
  1 sibling, 1 reply; 7+ messages in thread
From: wangbin @ 2021-06-08  2:24 UTC (permalink / raw)
  To: mike.kravetz
  Cc: akpm, linux-kernel, linux-mm, nao.horiguchi, wangbin224, wuxu.wu

> What specific problem are you trying to solve?  Are you trying to see
> how many huge pages were hit by memory errors?

Yes, I'd like to know how many huge pages are not available because of
memory errors, just like HardwareCorrupted in /proc/meminfo. But
HardwareCorrupted only increases by one page size when a huge page is
hit by a memory error, and it is mixed in with normal pages. So I think
we should add a new count to track memory errors on hugetlbfs.
--
Bin Wang

* Re: [PATCH] mm: hugetlbfs: add hwcrp_hugepages to record memory failure on hugetlbfs
  2021-06-07 19:13 ` Mike Kravetz
  2021-06-08  2:24   ` wangbin
@ 2021-06-08  8:01   ` HORIGUCHI NAOYA(堀口 直也)
  1 sibling, 0 replies; 7+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2021-06-08  8:01 UTC (permalink / raw)
  To: wangbin, Mike Kravetz
  Cc: linux-mm, linux-kernel, Naoya Horiguchi, akpm, wuxu.wu

Thanks for forwarding the message, Mike.

On Mon, Jun 07, 2021 at 12:13:03PM -0700, Mike Kravetz wrote:
> Resend with new e-mail for Naoya
> 
> On 6/7/21 7:16 AM, wangbin wrote:
> > From: Bin Wang <wangbin224@huawei.com>
> > 
> > In the current hugetlbfs memory failure handler, reserved huge page
> > counts are used to record the number of huge pages with hwpoison.
> 
> I do not believe this is an accurate statement.  Naoya is the memory
> error expert and may disagree, but I do not see anywhere where reserve
> counts are being used to track huge pages with memory errors.

And Mike is right, hugetlb's reservation count is not linked
to accounting of hwpoisoned pages.

> 
> IIUC, the routine hugetlbfs_error_remove_page is called after
> unmapping the page from all user mappings.  The routine will simply
> remove the page from the cache.  This effectively removes the page
> from the file as hugetlbfs is a memory only filesystem.  The subsequent
> call to hugetlb_unreserve_pages cleans up any reserve map entries
> associated with the page and adjusts the reserve count if necessary.
> The reserve count adjustment is based on removing the page from the
> file, rather than the memory error.  The same adjustment would be made
> if the page was hole punched from the file.

This logic totally makes sense to me.

Unmapping done in memory_failure() might increment the reserve count,
but that is just the unmapping cancelling the reservation that had
been consumed.

Thanks,
Naoya Horiguchi

* Re: Re: [PATCH] mm: hugetlbfs: add hwcrp_hugepages to record memory failure on hugetlbfs
  2021-06-08  2:24   ` wangbin
@ 2021-06-08  9:13     ` HORIGUCHI NAOYA(堀口 直也)
  2021-06-09  2:23       ` wangbin
  0 siblings, 1 reply; 7+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2021-06-08  9:13 UTC (permalink / raw)
  To: wangbin
  Cc: mike.kravetz, akpm, linux-kernel, linux-mm, nao.horiguchi, wuxu.wu

On Tue, Jun 08, 2021 at 10:24:50AM +0800, wangbin wrote:
> > What specific problem are you trying to solve?  Are you trying to see
> > how many huge pages were hit by memory errors?
> 
> Yes, I'd like to know how many huge pages are not available because of
> memory errors, just like HardwareCorrupted in /proc/meminfo. But
> HardwareCorrupted only increases by one page size when a huge page is
> hit by a memory error, and it is mixed in with normal pages. So I think
> we should add a new count to track memory errors on hugetlbfs.

If you can use root privilege in your use-case, an easy way to get the
number of corrupted hugepages is to use page-types.c (which reads
/proc/kpageflags) like below:

    $ page-types -b huge,hwpoison=huge,hwpoison
                 flags      page-count       MB  symbolic-flags                     long-symbolic-flags
    0x00000000000a8000               1        0  _______________H_G_X_______________________        compound_head,huge,hwpoison
                 total               1        0


But I guess that many use cases do not permit access to this interface,
in which case some new accounting interface for corrupted hugepages
could be helpful, as you suggest.

Thanks,
Naoya Horiguchi

* Re: Re: [PATCH] mm: hugetlbfs: add hwcrp_hugepages to record memory failure on hugetlbfs
  2021-06-08  9:13     ` HORIGUCHI NAOYA(堀口 直也)
@ 2021-06-09  2:23       ` wangbin
  0 siblings, 0 replies; 7+ messages in thread
From: wangbin @ 2021-06-09  2:23 UTC (permalink / raw)
  To: naoya.horiguchi
  Cc: akpm, linux-kernel, linux-mm, mike.kravetz, nao.horiguchi,
	wangbin224, wuxu.wu

> If you can use root privilege in your use-case, an easy way to get the
> number of corrupted hugepages is to use page-types.c (which reads
> /proc/kpageflags) like below:
> 
>     $ page-types -b huge,hwpoison=huge,hwpoison
>                  flags      page-count       MB  symbolic-flags                     long-symbolic-flags
>     0x00000000000a8000               1        0  _______________H_G_X_______________________        compound_head,huge,hwpoison
>                  total               1        0
> 
> But I guess that many usecases do not permit access to this interface,
> where some new accounting interface for corrupted hugepages could be
> helpful as you suggest.

Thank you very much for your suggestion. This approach is helpful to me.
But as you say, root privilege is not available in most cases, and I
also want to know the number of corrupted hugepages per node.
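
For what it's worth, the per-node attribute added by the patch above would
presumably show up alongside the existing per-node hugepage files. A
minimal sketch of reading it (the node0 and hugepages-2048kB directory
names are assumptions for illustration, not something the patch fixes):

    /*
     * Illustrative sketch only: read the proposed per-node hwcrp_hugepages
     * counter. The path mirrors the existing per-node hugetlb sysfs layout;
     * node0 and the 2048kB size directory are assumed for this example.
     */
    #include <stdio.h>

    int main(void)
    {
        const char *path = "/sys/devices/system/node/node0/hugepages/"
                           "hugepages-2048kB/hwcrp_hugepages";
        unsigned long hwcrp;
        FILE *f = fopen(path, "r");

        if (!f)
            return 1;
        if (fscanf(f, "%lu", &hwcrp) == 1)
            printf("node0 hwpoisoned huge pages: %lu\n", hwcrp);
        fclose(f);
        return 0;
    }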

--
Bin Wang
