* [PATCH 0/1] mm:hugetlbfs: Fix hwpoison reserve accounting
From: Mike Kravetz @ 2017-10-19 23:00 UTC
  To: linux-mm, linux-kernel
  Cc: Naoya Horiguchi, Michal Hocko, Aneesh Kumar, Anshuman Khandual,
	Andrew Morton, Mike Kravetz

The routine hugetlbfs_error_remove_page() incorrectly calls
hugetlb_fix_reserve_counts(), which can result in bad (negative)
reserved huge page counts.  The following patch addresses this
issue.

A follow up question/issue:
When a hugetlbfs page is poisoned, it appears as an 'in use'
huge page via all the externally visible metrics.  Even the
internal counters treat it as simply an 'in use' huge page.
This usually is not an issue.  However, it may be confusing if
someone adjusts the total number of huge pages.  For example,
if after poisoning a huge page I set the total number of huge
pages to zero, the poisoned page will be counted as 'surplus'.
I was thinking about keeping at least a bad page count (if not
a list) to avoid user confusion.  It may be overkill as I have
not given too much thought to this issue.  Anyone else have
thoughts here?

Mike Kravetz (1):
  mm:hugetlbfs: Fix hwpoison reserve accounting

 fs/hugetlbfs/inode.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

-- 
2.13.6


* [PATCH 1/1] mm:hugetlbfs: Fix hwpoison reserve accounting
From: Mike Kravetz @ 2017-10-19 23:00 UTC
  To: linux-mm, linux-kernel
  Cc: Naoya Horiguchi, Michal Hocko, Aneesh Kumar, Anshuman Khandual,
	Andrew Morton, Mike Kravetz, stable

Calling madvise(MADV_HWPOISON) on a hugetlbfs page can result in
bad (negative) reserved huge page counts.  This may not happen
immediately, but may happen later when the underlying file is
removed or the filesystem is unmounted.  For example:
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:       1
HugePages_Free:        0
HugePages_Rsvd:    18446744073709551615
HugePages_Surp:        0
Hugepagesize:       2048 kB

In the routine hugetlbfs_error_remove_page(), hugetlb_fix_reserve_counts
is called after remove_huge_page.  hugetlb_fix_reserve_counts is
designed to be called only if a failure is returned from
hugetlb_unreserve_pages.  Therefore, call hugetlb_unreserve_pages
as required and only call hugetlb_fix_reserve_counts in the unlikely
event that hugetlb_unreserve_pages returns an error.

Fixes: 78bb920344b8 ("mm: hwpoison: dissolve in-use hugepage in unrecoverable memory error")
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 fs/hugetlbfs/inode.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 59073e9f01a4..ed113ea17aff 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -842,9 +842,12 @@ static int hugetlbfs_error_remove_page(struct address_space *mapping,
 				struct page *page)
 {
 	struct inode *inode = mapping->host;
+	pgoff_t index = page->index;
 
 	remove_huge_page(page);
-	hugetlb_fix_reserve_counts(inode);
+	if (unlikely(hugetlb_unreserve_pages(inode, index, index + 1, 1)))
+		hugetlb_fix_reserve_counts(inode);
+
 	return 0;
 }
 
-- 
2.13.6


* Re: [PATCH 1/1] mm:hugetlbfs: Fix hwpoison reserve accounting
From: Naoya Horiguchi @ 2017-10-20  2:30 UTC
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, Michal Hocko, Aneesh Kumar,
	Anshuman Khandual, Andrew Morton, stable

On Thu, Oct 19, 2017 at 04:00:07PM -0700, Mike Kravetz wrote:
> Calling madvise(MADV_HWPOISON) on a hugetlbfs page will result in
> bad (negative) reserved huge page counts.  This may not happen
> immediately, but may happen later when the underlying file is
> removed or filesystem unmounted.  For example:
> AnonHugePages:         0 kB
> ShmemHugePages:        0 kB
> HugePages_Total:       1
> HugePages_Free:        0
> HugePages_Rsvd:    18446744073709551615
> HugePages_Surp:        0
> Hugepagesize:       2048 kB
>
> In routine hugetlbfs_error_remove_page(), hugetlb_fix_reserve_counts
> is called after remove_huge_page.  hugetlb_fix_reserve_counts is
> designed to only be called/used only if a failure is returned from
> hugetlb_unreserve_pages.  Therefore, call hugetlb_unreserve_pages
> as required and only call hugetlb_fix_reserve_counts in the unlikely
> event that hugetlb_unreserve_pages returns an error.

Hi Mike,

Thank you for addressing this. The patch itself looks good to me, but
the reported issue (negative reserve count) doesn't reproduce in my testing
with v4.14-rc5, so could you share the exact procedure that triggers it?

When the error handler runs on a huge page, the reserve count is incremented,
so I'm not sure why the reserve count goes negative. My procedure is as follows:

  $ sysctl vm.nr_hugepages=10
  $ grep HugePages_ /proc/meminfo
  HugePages_Total:      10
  HugePages_Free:       10
  HugePages_Rsvd:        0
  HugePages_Surp:        0
  $ ./test_alloc_generic -B hugetlb_file -N1 -L "mmap access memory_error_injection:error_type=madv_hard"  // allocate a 2MB file on hugetlbfs, then madvise(MADV_HWPOISON) on it.
  $ grep HugePages_ /proc/meminfo
  HugePages_Total:      10
  HugePages_Free:        9
  HugePages_Rsvd:        1  // reserve count is incremented
  HugePages_Surp:        0
  $ rm work/hugetlbfs/testfile
  $ grep HugePages_ /proc/meminfo
  HugePages_Total:      10
  HugePages_Free:        9
  HugePages_Rsvd:        0  // reserve count is gone
  HugePages_Surp:        0
  $ /src/linux-dev/tools/vm/page-types -b hwpoison -x // unpoison the huge page
  $ grep HugePages_ /proc/meminfo
  HugePages_Total:      10
  HugePages_Free:       10  // all huge pages are free (back to the beginning)
  HugePages_Rsvd:        0
  HugePages_Surp:        0

Thanks,
Naoya Horiguchi

>
> Fixes: 78bb920344b8 ("mm: hwpoison: dissolve in-use hugepage in unrecoverable memory error")
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  fs/hugetlbfs/inode.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 59073e9f01a4..ed113ea17aff 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -842,9 +842,12 @@ static int hugetlbfs_error_remove_page(struct address_space *mapping,
>  				struct page *page)
>  {
>  	struct inode *inode = mapping->host;
> +	pgoff_t index = page->index;
>
>  	remove_huge_page(page);
> -	hugetlb_fix_reserve_counts(inode);
> +	if (unlikely(hugetlb_unreserve_pages(inode, index, index + 1, 1)))
> +		hugetlb_fix_reserve_counts(inode);
> +
>  	return 0;
>  }
>
> --
> 2.13.6
>
>


* Re: [PATCH 1/1] mm:hugetlbfs: Fix hwpoison reserve accounting
From: Mike Kravetz @ 2017-10-20 17:49 UTC
  To: Naoya Horiguchi
  Cc: linux-mm, linux-kernel, Michal Hocko, Aneesh Kumar,
	Anshuman Khandual, Andrew Morton, stable

On 10/19/2017 07:30 PM, Naoya Horiguchi wrote:
> On Thu, Oct 19, 2017 at 04:00:07PM -0700, Mike Kravetz wrote:
> 
> Thank you for addressing this. The patch itself looks good to me, but
> the reported issue (negative reserve count) doesn't reproduce in my trial
> with v4.14-rc5, so could you share the exact procedure for this issue?

Sure, but first one question on your test scenario below.

> 
> When error handler runs over a huge page, the reserve count is incremented
> so I'm not sure why the reserve count goes negative.

I'm not sure I follow.  What specific code is incrementing the reserve
count?  

>                                                      My operation is like below:
> 
>   $ sysctl vm.nr_hugepages=10
>   $ grep HugePages_ /proc/meminfo
>   HugePages_Total:      10
>   HugePages_Free:       10
>   HugePages_Rsvd:        0
>   HugePages_Surp:        0
>   $ ./test_alloc_generic -B hugetlb_file -N1 -L "mmap access memory_error_injection:error_type=madv_hard"  // allocate a 2MB file on hugetlbfs, then madvise(MADV_HWPOISON) on it.
>   $ grep HugePages_ /proc/meminfo
>   HugePages_Total:      10
>   HugePages_Free:        9
>   HugePages_Rsvd:        1  // reserve count is incremented
>   HugePages_Surp:        0

This is confusing to me.  I can not create a test where there is a reserve
count remaining after poisoning a page.

I tried to recreate your test.  Running unmodified 4.14.0-rc5.

Before test
-----------
HugePages_Total:       1
HugePages_Free:        1
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

After open(creat) and mmap of 2MB hugetlbfs file
------------------------------------------------
HugePages_Total:       1
HugePages_Free:        1
HugePages_Rsvd:        1
HugePages_Surp:        0
Hugepagesize:       2048 kB

Reserve count is 1 as expected/normal

After madvise(MADV_HWPOISON) of the single huge page in mapping/file
--------------------------------------------------------------------
HugePages_Total:       1
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

In this case, the reserve (and free) counts were decremented.  Note that
before the poison operation the page was not associated with the mapping/
file.  I did not look closely at the code, but assume the madvise may
cause the page to be 'faulted in'.

The counts remain the same when the program exits
-------------------------------------------------
HugePages_Total:       1
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

Remove the file (rm /var/opt/oracle/hugepool/foo)
-------------------------------------------------
HugePages_Total:       1
HugePages_Free:        0
HugePages_Rsvd:    18446744073709551615
HugePages_Surp:        0
Hugepagesize:       2048 kB
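
The sequence above boils down to roughly the following minimal sketch (the
hugetlbfs mount point, file name and 2MB huge page size are assumptions for
illustration; MADV_HWPOISON needs CAP_SYS_ADMIN and CONFIG_MEMORY_FAILURE):

  /* Create a hugetlbfs file, mmap it, poison it, then remove the file. */
  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/mman.h>
  #include <unistd.h>

  #ifndef MADV_HWPOISON
  #define MADV_HWPOISON 100               /* value from the kernel uapi headers */
  #endif

  #define HPAGE_SIZE (2UL * 1024 * 1024)  /* assumes 2MB default huge pages */

  int main(void)
  {
      const char *path = "/dev/hugepages/foo";   /* any file on a hugetlbfs mount */
      int fd = open(path, O_CREAT | O_RDWR, 0644);
      char *p;

      if (fd < 0) {
          perror("open");
          return 1;
      }
      p = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
      if (p == MAP_FAILED) {
          perror("mmap");
          return 1;
      }
      /* the huge page is reserved but not yet faulted in at this point */
      if (madvise(p, HPAGE_SIZE, MADV_HWPOISON))
          perror("madvise");
      munmap(p, HPAGE_SIZE);
      close(fd);
      unlink(path);   /* the 'rm' step; HugePages_Rsvd goes negative here */
      return 0;
  }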

I am still confused about how your test maintains a reserve count after
poisoning.  It may be a good idea for you to test my patch with your
test scenario, as I can not recreate it here.

-- 
Mike Kravetz


* Re: [PATCH 1/1] mm:hugetlbfs: Fix hwpoison reserve accounting
From: Naoya Horiguchi @ 2017-10-23  7:32 UTC
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, Michal Hocko, Aneesh Kumar,
	Anshuman Khandual, Andrew Morton, stable

On Fri, Oct 20, 2017 at 10:49:46AM -0700, Mike Kravetz wrote:
> On 10/19/2017 07:30 PM, Naoya Horiguchi wrote:
> > On Thu, Oct 19, 2017 at 04:00:07PM -0700, Mike Kravetz wrote:
> >
> > Thank you for addressing this. The patch itself looks good to me, but
> > the reported issue (negative reserve count) doesn't reproduce in my trial
> > with v4.14-rc5, so could you share the exact procedure for this issue?
>
> Sure, but first one question on your test scenario below.
>
> >
> > When error handler runs over a huge page, the reserve count is incremented
> > so I'm not sure why the reserve count goes negative.
>
> I'm not sure I follow.  What specific code is incrementing the reserve
> count?

The call path is as follows:

  hugetlbfs_error_remove_page
    hugetlb_fix_reserve_counts
      hugepage_subpool_get_pages(spool, 1)
        hugetlb_acct_memory(h, 1);
          gather_surplus_pages
            h->resv_huge_pages += delta;
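
Expanded a little, hugetlb_fix_reserve_counts() along that path looks roughly
like this (paraphrased sketch, not a verbatim copy of the source):

  void hugetlb_fix_reserve_counts(struct inode *inode)
  {
          struct hugepage_subpool *spool = subpool_inode(inode);
          long rsv_adjust;

          /* charge one page to the subpool ... */
          rsv_adjust = hugepage_subpool_get_pages(spool, 1);
          if (rsv_adjust) {
                  struct hstate *h = hstate_inode(inode);

                  /* ... and to the global pool, which reaches gather_surplus_pages() */
                  hugetlb_acct_memory(h, 1);
          }
  }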

>
> >                                                      My operation is like below:
> >
> >   $ sysctl vm.nr_hugepages=10
> >   $ grep HugePages_ /proc/meminfo
> >   HugePages_Total:      10
> >   HugePages_Free:       10
> >   HugePages_Rsvd:        0
> >   HugePages_Surp:        0
> >   $ ./test_alloc_generic -B hugetlb_file -N1 -L "mmap access memory_error_injection:error_type=madv_hard"  // allocate a 2MB file on hugetlbfs, then madvise(MADV_HWPOISON) on it.
> >   $ grep HugePages_ /proc/meminfo
> >   HugePages_Total:      10
> >   HugePages_Free:        9
> >   HugePages_Rsvd:        1  // reserve count is incremented
> >   HugePages_Surp:        0
>
> This is confusing to me.  I can not create a test where there is a reserve
> count after poisoning page.
>
> I tried to recreate your test.  Running unmodified 4.14.0-rc5.
>
> Before test
> -----------
> HugePages_Total:       1
> HugePages_Free:        1
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:       2048 kB
>
> After open(creat) and mmap of 2MB hugetlbfs file
> ------------------------------------------------
> HugePages_Total:       1
> HugePages_Free:        1
> HugePages_Rsvd:        1
> HugePages_Surp:        0
> Hugepagesize:       2048 kB
>
> Reserve count is 1 as expected/normal
>
> After madvise(MADV_HWPOISON) of the single huge page in mapping/file
> --------------------------------------------------------------------
> HugePages_Total:       1
> HugePages_Free:        0
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:       2048 kB
>
> In this case, the reserve (and free) count were decremented.  Note that
> before the poison operation the page was not associated with the mapping/
> file.  I did not look closely at the code, but assume the madvise may
> cause the page to be 'faulted in'.
>
> The counts remain the same when the program exits
> -------------------------------------------------
> HugePages_Total:       1
> HugePages_Free:        0
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:       2048 kB
>
> Remove the file (rm /var/opt/oracle/hugepool/foo)
> -------------------------------------------------
> HugePages_Total:       1
> HugePages_Free:        0
> HugePages_Rsvd:    18446744073709551615
> HugePages_Surp:        0
> Hugepagesize:       2048 kB
>
> I am still confused about how your test maintains a reserve count after
> poisoning.  It may be a good idea for you to test my patch with your
> test scenario as I can not recreate here.

Interestingly, I found that this reproduces if all hugetlb pages are
reserved at the time of poisoning.
Your test meets that condition, and mine doesn't.

In gather_surplus_pages() we determine whether to extend the hugetlb pool
with surplus pages, as shown below:

    needed = (h->resv_huge_pages + delta) - h->free_huge_pages;
    if (needed <= 0) {
            h->resv_huge_pages += delta;
            return 0;
    }
    ...

needed is 1 if h->resv_huge_pages == h->free_huge_pages, and then
the reserve count gets inconsistent.
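
As a numeric illustration with made-up values and delta = 1:

    resv_huge_pages == free_huge_pages (say both 0):  needed = (0 + 1) - 0 =  1  -> surplus path taken
    free_huge_pages larger (say resv 0, free 9):      needed = (0 + 1) - 9 = -8  -> just resv_huge_pages += delta
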
I confirmed that your patch fixes the issue, so I'm OK with it.

Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

Thanks,
Naoya Horiguchi


* Re: [PATCH 1/1] mm:hugetlbfs: Fix hwpoison reserve accounting
From: Mike Kravetz @ 2017-10-23 18:20 UTC
  To: Naoya Horiguchi
  Cc: linux-mm, linux-kernel, Michal Hocko, Aneesh Kumar,
	Anshuman Khandual, Andrew Morton, stable

On 10/23/2017 12:32 AM, Naoya Horiguchi wrote:
> On Fri, Oct 20, 2017 at 10:49:46AM -0700, Mike Kravetz wrote:
>> On 10/19/2017 07:30 PM, Naoya Horiguchi wrote:
>>> On Thu, Oct 19, 2017 at 04:00:07PM -0700, Mike Kravetz wrote:
>>>
>>> Thank you for addressing this. The patch itself looks good to me, but
>>> the reported issue (negative reserve count) doesn't reproduce in my trial
>>> with v4.14-rc5, so could you share the exact procedure for this issue?
>>
>> Sure, but first one question on your test scenario below.
>>
>>>
>>> When error handler runs over a huge page, the reserve count is incremented
>>> so I'm not sure why the reserve count goes negative.
>>
>> I'm not sure I follow.  What specific code is incrementing the reserve
>> count?
> 
> The call path is like below:
> 
>   hugetlbfs_error_remove_page
>     hugetlb_fix_reserve_counts
>       hugepage_subpool_get_pages(spool, 1)
>         hugetlb_acct_memory(h, 1);
>           gather_surplus_pages
>             h->resv_huge_pages += delta;
> 

Ah OK.  This is a result of the call to hugetlb_fix_reserve_counts, which
I believe is incorrect in most instances, and is unlikely to happen
with my patch.

>>
>> Remove the file (rm /var/opt/oracle/hugepool/foo)
>> -------------------------------------------------
>> HugePages_Total:       1
>> HugePages_Free:        0
>> HugePages_Rsvd:    18446744073709551615
>> HugePages_Surp:        0
>> Hugepagesize:       2048 kB
>>
>> I am still confused about how your test maintains a reserve count after
>> poisoning.  It may be a good idea for you to test my patch with your
>> test scenario as I can not recreate here.
> 
> Interestingly, I found that this reproduces if all hugetlb pages are
> reserved when poisoning.
> Your testing meets the condition, and mine doesn't.
> 
> In gather_surplus_pages() we determine whether we extend hugetlb pool
> with surplus pages like below:
> 
>     needed = (h->resv_huge_pages + delta) - h->free_huge_pages;
>     if (needed <= 0) {
>             h->resv_huge_pages += delta;
>             return 0;
>     }
>     ...
> 
> needed is 1 if h->resv_huge_pages == h->free_huge_pages, and then
> the reserve count gets inconsistent.
> I confirmed that your patch fixes the issue, so I'm OK with it.

Thanks.  That now makes sense to me.

hugetlb_fix_reserve_counts (which results in gather_surplus_pages being
called) is only designed to be called in the extremely rare cases when
we have freed a huge page but are unable to free the reservation entry.

Just curious, when the hugetlb_fix_reserve_counts call was added to
hugetlbfs_error_remove_page, was the intention to preserve the original
reservation?  I remember thinking hard about that for the hole punch
case and came to the conclusion that it was easier and less error prone
to remove the reservation as well.  That will also happen in the error
case with the patch I provided.
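
For reference, the hole punch path in remove_inode_hugepages() already uses
the same pattern the patch adopts; roughly (paraphrased, not verbatim):

  /* drop the reservation; only fix up counts if that fails */
  if (unlikely(hugetlb_unreserve_pages(inode, index, index + 1, 1)))
          hugetlb_fix_reserve_counts(inode);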

-- 
Mike Kravetz

> 
> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> 
> Thanks,
> Naoya Horiguchi
> 


* Re: [PATCH 1/1] mm:hugetlbfs: Fix hwpoison reserve accounting
From: Naoya Horiguchi @ 2017-10-24  0:46 UTC
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, Michal Hocko, Aneesh Kumar,
	Anshuman Khandual, Andrew Morton, stable

On Mon, Oct 23, 2017 at 11:20:02AM -0700, Mike Kravetz wrote:
> On 10/23/2017 12:32 AM, Naoya Horiguchi wrote:
> > On Fri, Oct 20, 2017 at 10:49:46AM -0700, Mike Kravetz wrote:
> >> On 10/19/2017 07:30 PM, Naoya Horiguchi wrote:
> >>> On Thu, Oct 19, 2017 at 04:00:07PM -0700, Mike Kravetz wrote:
> >>>
> >>> Thank you for addressing this. The patch itself looks good to me, but
> >>> the reported issue (negative reserve count) doesn't reproduce in my trial
> >>> with v4.14-rc5, so could you share the exact procedure for this issue?
> >>
> >> Sure, but first one question on your test scenario below.
> >>
> >>>
> >>> When error handler runs over a huge page, the reserve count is incremented
> >>> so I'm not sure why the reserve count goes negative.
> >>
> >> I'm not sure I follow.  What specific code is incrementing the reserve
> >> count?
> > 
> > The call path is like below:
> > 
> >   hugetlbfs_error_remove_page
> >     hugetlb_fix_reserve_counts
> >       hugepage_subpool_get_pages(spool, 1)
> >         hugetlb_acct_memory(h, 1);
> >           gather_surplus_pages
> >             h->resv_huge_pages += delta;
> > 
> 
> Ah OK.  This is a result of call to hugetlb_fix_reserve_counts which
> I believe is incorrect in most instances, and is unlikely to happen 
> with my patch.
> 
> >>
> >> Remove the file (rm /var/opt/oracle/hugepool/foo)
> >> -------------------------------------------------
> >> HugePages_Total:       1
> >> HugePages_Free:        0
> >> HugePages_Rsvd:    18446744073709551615
> >> HugePages_Surp:        0
> >> Hugepagesize:       2048 kB
> >>
> >> I am still confused about how your test maintains a reserve count after
> >> poisoning.  It may be a good idea for you to test my patch with your
> >> test scenario as I can not recreate here.
> > 
> > Interestingly, I found that this reproduces if all hugetlb pages are
> > reserved when poisoning.
> > Your testing meets the condition, and mine doesn't.
> > 
> > In gather_surplus_pages() we determine whether we extend hugetlb pool
> > with surplus pages like below:
> > 
> >     needed = (h->resv_huge_pages + delta) - h->free_huge_pages;
> >     if (needed <= 0) {
> >             h->resv_huge_pages += delta;
> >             return 0;
> >     }
> >     ...
> > 
> > needed is 1 if h->resv_huge_pages == h->free_huge_pages, and then
> > the reserve count gets inconsistent.
> > I confirmed that your patch fixes the issue, so I'm OK with it.
> 
> Thanks.  That now makes sense to me.
> 
> hugetlb_fix_reserve_counts (which results in gather_surplus_pages being
> called), is only designed to be called in the extremely rare cases when
> we have free'ed a huge page but are unable to free the reservation entry.
> 
> Just curious, when the hugetlb_fix_reserve_counts call was added to
> hugetlbfs_error_remove_page, was the intention to preserve the original
> reservation? 

No, the intention was to remove the reservation of the error hugepage,
which was unmapped and isolated from the normal hugepage lifecycle.
The error hugepage is not freed back to the hugepage pool, but from the
perspective of the reserve count it should be handled in the same manner
as freeing.

When I was writing commit 78bb920344b8, I ran into some reserve count
mismatches and wrongly borrowed the code from the truncation path.

> I remember thinking hard about that for the hole punch
> case and came to the conclusion that it was easier and less error prone
> to remove the reservation as well.  That will also happen in the error
> case with the patch I provided.

Yes, hole punching seems similar to poisoning except that the final destination
of the target page differs, so we can reach the same conclusion here.

Thanks,
Naoya Horiguchi

