linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 00/02] fix return value issue of soft offlining hugepages
@ 2019-06-10  8:18 Naoya Horiguchi
  2019-06-10  8:18 ` [PATCH v2 1/2] mm: soft-offline: return -EBUSY if set_hwpoison_free_buddy_page() fails Naoya Horiguchi
  2019-06-10  8:18 ` [PATCH v2 2/2] mm: hugetlb: soft-offline: dissolve_free_huge_page() return zero on !PageHuge Naoya Horiguchi
  0 siblings, 2 replies; 13+ messages in thread
From: Naoya Horiguchi @ 2019-06-10  8:18 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Michal Hocko, Mike Kravetz, xishi.qiuxishi, Chen,
	Jerry T, Zhuo, Qiuxu, linux-kernel

Hi everyone,

This is an update of the fix of return value issue of hugepage soft-offlining
(v1: https://patchwork.kernel.org/patch/10962135/).

The code itself has no change since v1 but I updated the description.
Jerry helped testing and finally confirmed that the patch is OK.

In previous discussion, it's pointed out by Mike that this problem contained
two separate issues (a problem of dissolve_free_huge_page() and a problem of
soft_offline_huge_page() itself) and I agree with it (althouth I stated
differently at v1). So I separated the patch.

Hopefully this will help finishing the issue.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 1/2] mm: soft-offline: return -EBUSY if set_hwpoison_free_buddy_page() fails
  2019-06-10  8:18 [PATCH v2 00/02] fix return value issue of soft offlining hugepages Naoya Horiguchi
@ 2019-06-10  8:18 ` Naoya Horiguchi
  2019-06-10 21:20   ` Andrew Morton
  2019-06-11  0:19   ` Mike Kravetz
  2019-06-10  8:18 ` [PATCH v2 2/2] mm: hugetlb: soft-offline: dissolve_free_huge_page() return zero on !PageHuge Naoya Horiguchi
  1 sibling, 2 replies; 13+ messages in thread
From: Naoya Horiguchi @ 2019-06-10  8:18 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Michal Hocko, Mike Kravetz, xishi.qiuxishi, Chen,
	Jerry T, Zhuo, Qiuxu, linux-kernel

The pass/fail of soft offline should be judged by checking whether the
raw error page was finally contained or not (i.e. the result of
set_hwpoison_free_buddy_page()), but current code do not work like that.
So this patch is suggesting to fix it.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Fixes: 6bc9b56433b76 ("mm: fix race on soft-offlining")
Cc: <stable@vger.kernel.org> # v4.19+
---
 mm/memory-failure.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git v5.2-rc3/mm/memory-failure.c v5.2-rc3_patched/mm/memory-failure.c
index fc8b517..7ea485e 100644
--- v5.2-rc3/mm/memory-failure.c
+++ v5.2-rc3_patched/mm/memory-failure.c
@@ -1733,6 +1733,8 @@ static int soft_offline_huge_page(struct page *page, int flags)
 		if (!ret) {
 			if (set_hwpoison_free_buddy_page(page))
 				num_poisoned_pages_inc();
+			else
+				ret = -EBUSY;
 		}
 	}
 	return ret;
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 2/2] mm: hugetlb: soft-offline: dissolve_free_huge_page() return zero on !PageHuge
  2019-06-10  8:18 [PATCH v2 00/02] fix return value issue of soft offlining hugepages Naoya Horiguchi
  2019-06-10  8:18 ` [PATCH v2 1/2] mm: soft-offline: return -EBUSY if set_hwpoison_free_buddy_page() fails Naoya Horiguchi
@ 2019-06-10  8:18 ` Naoya Horiguchi
  2019-06-11  9:50   ` Anshuman Khandual
  2019-06-11 17:16   ` Mike Kravetz
  1 sibling, 2 replies; 13+ messages in thread
From: Naoya Horiguchi @ 2019-06-10  8:18 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Michal Hocko, Mike Kravetz, xishi.qiuxishi, Chen,
	Jerry T, Zhuo, Qiuxu, linux-kernel

madvise(MADV_SOFT_OFFLINE) often returns -EBUSY when calling soft offline
for hugepages with overcommitting enabled. That was caused by the suboptimal
code in current soft-offline code. See the following part:

    ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
                            MIGRATE_SYNC, MR_MEMORY_FAILURE);
    if (ret) {
            ...
    } else {
            /*
             * We set PG_hwpoison only when the migration source hugepage
             * was successfully dissolved, because otherwise hwpoisoned
             * hugepage remains on free hugepage list, then userspace will
             * find it as SIGBUS by allocation failure. That's not expected
             * in soft-offlining.
             */
            ret = dissolve_free_huge_page(page);
            if (!ret) {
                    if (set_hwpoison_free_buddy_page(page))
                            num_poisoned_pages_inc();
            }
    }
    return ret;

Here dissolve_free_huge_page() returns -EBUSY if the migration source page
was freed into buddy in migrate_pages(), but even in that case we actually
has a chance that set_hwpoison_free_buddy_page() succeeds. So that means
current code gives up offlining too early now.

dissolve_free_huge_page() checks that a given hugepage is suitable for
dissolving, where we should return success for !PageHuge() case because
the given hugepage is considered as already dissolved.

This change also affects other callers of dissolve_free_huge_page(),
which are cleaned up together.

Reported-by: Chen, Jerry T <jerry.t.chen@intel.com>
Tested-by: Chen, Jerry T <jerry.t.chen@intel.com>
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Fixes: 6bc9b56433b76 ("mm: fix race on soft-offlining")
Cc: <stable@vger.kernel.org> # v4.19+
---
 mm/hugetlb.c        | 15 +++++++++------
 mm/memory-failure.c |  5 +----
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git v5.2-rc3/mm/hugetlb.c v5.2-rc3_patched/mm/hugetlb.c
index ac843d3..048d071 100644
--- v5.2-rc3/mm/hugetlb.c
+++ v5.2-rc3_patched/mm/hugetlb.c
@@ -1519,7 +1519,12 @@ int dissolve_free_huge_page(struct page *page)
 	int rc = -EBUSY;
 
 	spin_lock(&hugetlb_lock);
-	if (PageHuge(page) && !page_count(page)) {
+	if (!PageHuge(page)) {
+		rc = 0;
+		goto out;
+	}
+
+	if (!page_count(page)) {
 		struct page *head = compound_head(page);
 		struct hstate *h = page_hstate(head);
 		int nid = page_to_nid(head);
@@ -1564,11 +1569,9 @@ int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
 
 	for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) {
 		page = pfn_to_page(pfn);
-		if (PageHuge(page) && !page_count(page)) {
-			rc = dissolve_free_huge_page(page);
-			if (rc)
-				break;
-		}
+		rc = dissolve_free_huge_page(page);
+		if (rc)
+			break;
 	}
 
 	return rc;
diff --git v5.2-rc3/mm/memory-failure.c v5.2-rc3_patched/mm/memory-failure.c
index 7ea485e..3a83e27 100644
--- v5.2-rc3/mm/memory-failure.c
+++ v5.2-rc3_patched/mm/memory-failure.c
@@ -1859,11 +1859,8 @@ static int soft_offline_in_use_page(struct page *page, int flags)
 
 static int soft_offline_free_page(struct page *page)
 {
-	int rc = 0;
-	struct page *head = compound_head(page);
+	int rc = dissolve_free_huge_page(page);
 
-	if (PageHuge(head))
-		rc = dissolve_free_huge_page(page);
 	if (!rc) {
 		if (set_hwpoison_free_buddy_page(page))
 			num_poisoned_pages_inc();
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/2] mm: soft-offline: return -EBUSY if set_hwpoison_free_buddy_page() fails
  2019-06-10  8:18 ` [PATCH v2 1/2] mm: soft-offline: return -EBUSY if set_hwpoison_free_buddy_page() fails Naoya Horiguchi
@ 2019-06-10 21:20   ` Andrew Morton
  2019-06-10 22:51     ` Naoya Horiguchi
  2019-06-11  0:19   ` Mike Kravetz
  1 sibling, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2019-06-10 21:20 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-mm, Michal Hocko, Mike Kravetz, xishi.qiuxishi, Chen,
	Jerry T, Zhuo, Qiuxu, linux-kernel

On Mon, 10 Jun 2019 17:18:05 +0900 Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:

> The pass/fail of soft offline should be judged by checking whether the
> raw error page was finally contained or not (i.e. the result of
> set_hwpoison_free_buddy_page()), but current code do not work like that.
> So this patch is suggesting to fix it.

Please describe the user-visible runtime effects of this change?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/2] mm: soft-offline: return -EBUSY if set_hwpoison_free_buddy_page() fails
  2019-06-10 21:20   ` Andrew Morton
@ 2019-06-10 22:51     ` Naoya Horiguchi
  0 siblings, 0 replies; 13+ messages in thread
From: Naoya Horiguchi @ 2019-06-10 22:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Michal Hocko, Mike Kravetz, xishi.qiuxishi, Chen,
	Jerry T, Zhuo, Qiuxu, linux-kernel

On Mon, Jun 10, 2019 at 02:20:33PM -0700, Andrew Morton wrote:
> On Mon, 10 Jun 2019 17:18:05 +0900 Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:
> 
> > The pass/fail of soft offline should be judged by checking whether the
> > raw error page was finally contained or not (i.e. the result of
> > set_hwpoison_free_buddy_page()), but current code do not work like that.
> > So this patch is suggesting to fix it.
> 
> Please describe the user-visible runtime effects of this change?

Sorry, could you replace the description as follows (I inserted one sentence)?

    The pass/fail of soft offline should be judged by checking whether the
    raw error page was finally contained or not (i.e. the result of
    set_hwpoison_free_buddy_page()), but current code do not work like that.
    It might lead us to misjudge the test result when
    set_hwpoison_free_buddy_page() fails.  So this patch is suggesting to
    fix it.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/2] mm: soft-offline: return -EBUSY if set_hwpoison_free_buddy_page() fails
  2019-06-10  8:18 ` [PATCH v2 1/2] mm: soft-offline: return -EBUSY if set_hwpoison_free_buddy_page() fails Naoya Horiguchi
  2019-06-10 21:20   ` Andrew Morton
@ 2019-06-11  0:19   ` Mike Kravetz
  2019-06-11  0:57     ` Naoya Horiguchi
  1 sibling, 1 reply; 13+ messages in thread
From: Mike Kravetz @ 2019-06-11  0:19 UTC (permalink / raw)
  To: Naoya Horiguchi, linux-mm
  Cc: Andrew Morton, Michal Hocko, xishi.qiuxishi, Chen, Jerry T, Zhuo,
	Qiuxu, linux-kernel

On 6/10/19 1:18 AM, Naoya Horiguchi wrote:
> The pass/fail of soft offline should be judged by checking whether the
> raw error page was finally contained or not (i.e. the result of
> set_hwpoison_free_buddy_page()), but current code do not work like that.
> So this patch is suggesting to fix it.
> 
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Fixes: 6bc9b56433b76 ("mm: fix race on soft-offlining")
> Cc: <stable@vger.kernel.org> # v4.19+

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

To follow-up on Andrew's comment/question about user visible effects.  Without
this fix, there are cases where madvise(MADV_SOFT_OFFLINE) may not offline the
original page and will not return an error.  Are there any other visible
effects?

-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/2] mm: soft-offline: return -EBUSY if set_hwpoison_free_buddy_page() fails
  2019-06-11  0:19   ` Mike Kravetz
@ 2019-06-11  0:57     ` Naoya Horiguchi
  2019-06-11  8:14       ` Anshuman Khandual
  0 siblings, 1 reply; 13+ messages in thread
From: Naoya Horiguchi @ 2019-06-11  0:57 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, Andrew Morton, Michal Hocko, xishi.qiuxishi, Chen,
	Jerry T, Zhuo, Qiuxu, linux-kernel

On Mon, Jun 10, 2019 at 05:19:45PM -0700, Mike Kravetz wrote:
> On 6/10/19 1:18 AM, Naoya Horiguchi wrote:
> > The pass/fail of soft offline should be judged by checking whether the
> > raw error page was finally contained or not (i.e. the result of
> > set_hwpoison_free_buddy_page()), but current code do not work like that.
> > So this patch is suggesting to fix it.
> > 
> > Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> > Fixes: 6bc9b56433b76 ("mm: fix race on soft-offlining")
> > Cc: <stable@vger.kernel.org> # v4.19+
> 
> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

Thank you, Mike.

> 
> To follow-up on Andrew's comment/question about user visible effects.  Without
> this fix, there are cases where madvise(MADV_SOFT_OFFLINE) may not offline the
> original page and will not return an error.

Yes, that's right.

>  Are there any other visible
> effects?

I can't think of other ones.

- Naoya

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/2] mm: soft-offline: return -EBUSY if set_hwpoison_free_buddy_page() fails
  2019-06-11  0:57     ` Naoya Horiguchi
@ 2019-06-11  8:14       ` Anshuman Khandual
  2019-06-12  6:48         ` Naoya Horiguchi
  0 siblings, 1 reply; 13+ messages in thread
From: Anshuman Khandual @ 2019-06-11  8:14 UTC (permalink / raw)
  To: Naoya Horiguchi, Mike Kravetz
  Cc: linux-mm, Andrew Morton, Michal Hocko, xishi.qiuxishi, Chen,
	Jerry T, Zhuo, Qiuxu, linux-kernel



On 06/11/2019 06:27 AM, Naoya Horiguchi wrote:
> On Mon, Jun 10, 2019 at 05:19:45PM -0700, Mike Kravetz wrote:
>> On 6/10/19 1:18 AM, Naoya Horiguchi wrote:
>>> The pass/fail of soft offline should be judged by checking whether the
>>> raw error page was finally contained or not (i.e. the result of
>>> set_hwpoison_free_buddy_page()), but current code do not work like that.
>>> So this patch is suggesting to fix it.
>>>
>>> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>>> Fixes: 6bc9b56433b76 ("mm: fix race on soft-offlining")
>>> Cc: <stable@vger.kernel.org> # v4.19+
>>
>> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> 
> Thank you, Mike.
> 
>>
>> To follow-up on Andrew's comment/question about user visible effects.  Without
>> this fix, there are cases where madvise(MADV_SOFT_OFFLINE) may not offline the
>> original page and will not return an error.
> 
> Yes, that's right.

Then should this be included in the commit message as well ?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 2/2] mm: hugetlb: soft-offline: dissolve_free_huge_page() return zero on !PageHuge
  2019-06-10  8:18 ` [PATCH v2 2/2] mm: hugetlb: soft-offline: dissolve_free_huge_page() return zero on !PageHuge Naoya Horiguchi
@ 2019-06-11  9:50   ` Anshuman Khandual
  2019-06-12  7:09     ` Naoya Horiguchi
  2019-06-11 17:16   ` Mike Kravetz
  1 sibling, 1 reply; 13+ messages in thread
From: Anshuman Khandual @ 2019-06-11  9:50 UTC (permalink / raw)
  To: Naoya Horiguchi, linux-mm
  Cc: Andrew Morton, Michal Hocko, Mike Kravetz, xishi.qiuxishi, Chen,
	Jerry T, Zhuo, Qiuxu, linux-kernel


On 06/10/2019 01:48 PM, Naoya Horiguchi wrote:
> madvise(MADV_SOFT_OFFLINE) often returns -EBUSY when calling soft offline
> for hugepages with overcommitting enabled. That was caused by the suboptimal
> code in current soft-offline code. See the following part:
> 
>     ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
>                             MIGRATE_SYNC, MR_MEMORY_FAILURE);
>     if (ret) {
>             ...
>     } else {
>             /*
>              * We set PG_hwpoison only when the migration source hugepage
>              * was successfully dissolved, because otherwise hwpoisoned
>              * hugepage remains on free hugepage list, then userspace will
>              * find it as SIGBUS by allocation failure. That's not expected
>              * in soft-offlining.
>              */
>             ret = dissolve_free_huge_page(page);
>             if (!ret) {
>                     if (set_hwpoison_free_buddy_page(page))
>                             num_poisoned_pages_inc();
>             }
>     }
>     return ret;
> 
> Here dissolve_free_huge_page() returns -EBUSY if the migration source page
> was freed into buddy in migrate_pages(), but even in that case we actually

Over committed source pages will be released into buddy and the normal ones
will not be ? dissolve_free_huge_page() returns -EBUSY because PageHuge()
return negative on already released pages ? How dissolve_free_huge_page()
will behave differently with over committed pages. I might be missing some
recent developments here.

> has a chance that set_hwpoison_free_buddy_page() succeeds. So that means
> current code gives up offlining too early now.

Hmm. It gives up early as the return value from dissolve_free_huge_page(EBUSY)
gets back as the return code for soft_offline_huge_page() without attempting
set_hwpoison_free_buddy_page() which still has a chance to succeed for freed
normal buddy pages.

> 
> dissolve_free_huge_page() checks that a given hugepage is suitable for
> dissolving, where we should return success for !PageHuge() case because
> the given hugepage is considered as already dissolved.

Right. It should return 0 (as a success) for freed normal buddy pages. Should
not it then check explicitly for PageBuddy() as well ?

> 
> This change also affects other callers of dissolve_free_huge_page(),
> which are cleaned up together.
> 
> Reported-by: Chen, Jerry T <jerry.t.chen@intel.com>
> Tested-by: Chen, Jerry T <jerry.t.chen@intel.com>
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Fixes: 6bc9b56433b76 ("mm: fix race on soft-offlining")
> Cc: <stable@vger.kernel.org> # v4.19+
> ---
>  mm/hugetlb.c        | 15 +++++++++------
>  mm/memory-failure.c |  5 +----
>  2 files changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git v5.2-rc3/mm/hugetlb.c v5.2-rc3_patched/mm/hugetlb.c
> index ac843d3..048d071 100644
> --- v5.2-rc3/mm/hugetlb.c
> +++ v5.2-rc3_patched/mm/hugetlb.c
> @@ -1519,7 +1519,12 @@ int dissolve_free_huge_page(struct page *page)
>  	int rc = -EBUSY;
>  
>  	spin_lock(&hugetlb_lock);
> -	if (PageHuge(page) && !page_count(page)) {
> +	if (!PageHuge(page)) {
> +		rc = 0;
> +		goto out;
> +	}

With this early bail out it maintains the functionality when called from
soft_offline_free_page() for normal pages. For huge page, it continues
on the previous path.

> +
> +	if (!page_count(page)) {
>  		struct page *head = compound_head(page);
>  		struct hstate *h = page_hstate(head);
>  		int nid = page_to_nid(head);
> @@ -1564,11 +1569,9 @@ int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
>  
>  	for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) {
>  		page = pfn_to_page(pfn);
> -		if (PageHuge(page) && !page_count(page)) {

Right. These checks are now redundant.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 2/2] mm: hugetlb: soft-offline: dissolve_free_huge_page() return zero on !PageHuge
  2019-06-10  8:18 ` [PATCH v2 2/2] mm: hugetlb: soft-offline: dissolve_free_huge_page() return zero on !PageHuge Naoya Horiguchi
  2019-06-11  9:50   ` Anshuman Khandual
@ 2019-06-11 17:16   ` Mike Kravetz
  2019-06-12  7:24     ` Naoya Horiguchi
  1 sibling, 1 reply; 13+ messages in thread
From: Mike Kravetz @ 2019-06-11 17:16 UTC (permalink / raw)
  To: Naoya Horiguchi, linux-mm
  Cc: Andrew Morton, Michal Hocko, xishi.qiuxishi, Chen, Jerry T, Zhuo,
	Qiuxu, linux-kernel

On 6/10/19 1:18 AM, Naoya Horiguchi wrote:
> madvise(MADV_SOFT_OFFLINE) often returns -EBUSY when calling soft offline
> for hugepages with overcommitting enabled. That was caused by the suboptimal
> code in current soft-offline code. See the following part:
> 
>     ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
>                             MIGRATE_SYNC, MR_MEMORY_FAILURE);
>     if (ret) {
>             ...
>     } else {
>             /*
>              * We set PG_hwpoison only when the migration source hugepage
>              * was successfully dissolved, because otherwise hwpoisoned
>              * hugepage remains on free hugepage list, then userspace will
>              * find it as SIGBUS by allocation failure. That's not expected
>              * in soft-offlining.
>              */
>             ret = dissolve_free_huge_page(page);
>             if (!ret) {
>                     if (set_hwpoison_free_buddy_page(page))
>                             num_poisoned_pages_inc();
>             }
>     }
>     return ret;
> 
> Here dissolve_free_huge_page() returns -EBUSY if the migration source page
> was freed into buddy in migrate_pages(), but even in that case we actually
> has a chance that set_hwpoison_free_buddy_page() succeeds. So that means
> current code gives up offlining too early now.
> 
> dissolve_free_huge_page() checks that a given hugepage is suitable for
> dissolving, where we should return success for !PageHuge() case because
> the given hugepage is considered as already dissolved.
> 
> This change also affects other callers of dissolve_free_huge_page(),
> which are cleaned up together.
> 
> Reported-by: Chen, Jerry T <jerry.t.chen@intel.com>
> Tested-by: Chen, Jerry T <jerry.t.chen@intel.com>
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Fixes: 6bc9b56433b76 ("mm: fix race on soft-offlining")
> Cc: <stable@vger.kernel.org> # v4.19+
> ---
>  mm/hugetlb.c        | 15 +++++++++------
>  mm/memory-failure.c |  5 +----
>  2 files changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git v5.2-rc3/mm/hugetlb.c v5.2-rc3_patched/mm/hugetlb.c
> index ac843d3..048d071 100644
> --- v5.2-rc3/mm/hugetlb.c
> +++ v5.2-rc3_patched/mm/hugetlb.c
> @@ -1519,7 +1519,12 @@ int dissolve_free_huge_page(struct page *page)

Please update the function description for dissolve_free_huge_page() as
well.  It currently says, "Returns -EBUSY if the dissolution fails because
a give page is not a free hugepage" which is no longer true as a result of
this change.

>  	int rc = -EBUSY;
>  
>  	spin_lock(&hugetlb_lock);
> -	if (PageHuge(page) && !page_count(page)) {
> +	if (!PageHuge(page)) {
> +		rc = 0;
> +		goto out;
> +	}
> +
> +	if (!page_count(page)) {
>  		struct page *head = compound_head(page);
>  		struct hstate *h = page_hstate(head);
>  		int nid = page_to_nid(head);
> @@ -1564,11 +1569,9 @@ int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
>  
>  	for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) {
>  		page = pfn_to_page(pfn);
> -		if (PageHuge(page) && !page_count(page)) {
> -			rc = dissolve_free_huge_page(page);
> -			if (rc)
> -				break;
> -		}

We may want to consider keeping at least the PageHuge(page) check before
calling dissolve_free_huge_page().  dissolve_free_huge_pages is called as
part of memory offline processing.  We do not know if the memory to be offlined
contains huge pages or not.  With your changes, we are taking hugetlb_lock
on each call to dissolve_free_huge_page just to discover that the page is
not a huge page.

You 'could' add a PageHuge(page) check to dissolve_free_huge_page before
taking the lock.  However, you would need to check again after taking the
lock.
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 1/2] mm: soft-offline: return -EBUSY if set_hwpoison_free_buddy_page() fails
  2019-06-11  8:14       ` Anshuman Khandual
@ 2019-06-12  6:48         ` Naoya Horiguchi
  0 siblings, 0 replies; 13+ messages in thread
From: Naoya Horiguchi @ 2019-06-12  6:48 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: Mike Kravetz, linux-mm, Andrew Morton, Michal Hocko,
	xishi.qiuxishi, Chen, Jerry T, Zhuo, Qiuxu, linux-kernel

On Tue, Jun 11, 2019 at 01:44:46PM +0530, Anshuman Khandual wrote:
> 
> 
> On 06/11/2019 06:27 AM, Naoya Horiguchi wrote:
> > On Mon, Jun 10, 2019 at 05:19:45PM -0700, Mike Kravetz wrote:
> >> On 6/10/19 1:18 AM, Naoya Horiguchi wrote:
> >>> The pass/fail of soft offline should be judged by checking whether the
> >>> raw error page was finally contained or not (i.e. the result of
> >>> set_hwpoison_free_buddy_page()), but current code do not work like that.
> >>> So this patch is suggesting to fix it.
> >>>
> >>> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> >>> Fixes: 6bc9b56433b76 ("mm: fix race on soft-offlining")
> >>> Cc: <stable@vger.kernel.org> # v4.19+
> >>
> >> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> > 
> > Thank you, Mike.
> > 
> >>
> >> To follow-up on Andrew's comment/question about user visible effects.  Without
> >> this fix, there are cases where madvise(MADV_SOFT_OFFLINE) may not offline the
> >> original page and will not return an error.
> > 
> > Yes, that's right.
> 
> Then should this be included in the commit message as well ?

Right, I'll clarify the point in the description.

Thanks,
- Naoya

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 2/2] mm: hugetlb: soft-offline: dissolve_free_huge_page() return zero on !PageHuge
  2019-06-11  9:50   ` Anshuman Khandual
@ 2019-06-12  7:09     ` Naoya Horiguchi
  0 siblings, 0 replies; 13+ messages in thread
From: Naoya Horiguchi @ 2019-06-12  7:09 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linux-mm, Andrew Morton, Michal Hocko, Mike Kravetz,
	xishi.qiuxishi, Chen, Jerry T, Zhuo, Qiuxu, linux-kernel

On Tue, Jun 11, 2019 at 03:20:26PM +0530, Anshuman Khandual wrote:
> 
> On 06/10/2019 01:48 PM, Naoya Horiguchi wrote:
> > madvise(MADV_SOFT_OFFLINE) often returns -EBUSY when calling soft offline
> > for hugepages with overcommitting enabled. That was caused by the suboptimal
> > code in current soft-offline code. See the following part:
> > 
> >     ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
> >                             MIGRATE_SYNC, MR_MEMORY_FAILURE);
> >     if (ret) {
> >             ...
> >     } else {
> >             /*
> >              * We set PG_hwpoison only when the migration source hugepage
> >              * was successfully dissolved, because otherwise hwpoisoned
> >              * hugepage remains on free hugepage list, then userspace will
> >              * find it as SIGBUS by allocation failure. That's not expected
> >              * in soft-offlining.
> >              */
> >             ret = dissolve_free_huge_page(page);
> >             if (!ret) {
> >                     if (set_hwpoison_free_buddy_page(page))
> >                             num_poisoned_pages_inc();
> >             }
> >     }
> >     return ret;
> > 
> > Here dissolve_free_huge_page() returns -EBUSY if the migration source page
> > was freed into buddy in migrate_pages(), but even in that case we actually
> 
> Over committed source pages will be released into buddy and the normal ones
> will not be ? dissolve_free_huge_page() returns -EBUSY because PageHuge()
> return negative on already released pages ? 

The answers for both questions here are yes.

> How dissolve_free_huge_page()
> will behave differently with over committed pages. I might be missing some
> recent developments here.

This dissolve_free_huge_page() should see a (free or reused) 4kB page when
overcommitting, and should see a (free or reused) huge page for non
overcommitting case.

> 
> > has a chance that set_hwpoison_free_buddy_page() succeeds. So that means
> > current code gives up offlining too early now.
> 
> Hmm. It gives up early as the return value from dissolve_free_huge_page(EBUSY)
> gets back as the return code for soft_offline_huge_page() without attempting
> set_hwpoison_free_buddy_page() which still has a chance to succeed for freed
> normal buddy pages.

Exactly.

> 
> > 
> > dissolve_free_huge_page() checks that a given hugepage is suitable for
> > dissolving, where we should return success for !PageHuge() case because
> > the given hugepage is considered as already dissolved.
> 
> Right. It should return 0 (as a success) for freed normal buddy pages. Should
> not it then check explicitly for PageBuddy() as well ?

in new semantics, dissolve_free_huge_page() returns:

  0: successfully dissolved free hugepages or the page is already dissolved
  EBUSY: failed to dissolved free hugepages or the hugepage is in-use.

so for any types of non hugepages, the return value is 0.

Thanks,
- Naoya 

> > 
> > This change also affects other callers of dissolve_free_huge_page(),
> > which are cleaned up together.
> > 
> > Reported-by: Chen, Jerry T <jerry.t.chen@intel.com>
> > Tested-by: Chen, Jerry T <jerry.t.chen@intel.com>
> > Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> > Fixes: 6bc9b56433b76 ("mm: fix race on soft-offlining")
> > Cc: <stable@vger.kernel.org> # v4.19+
> > ---
> >  mm/hugetlb.c        | 15 +++++++++------
> >  mm/memory-failure.c |  5 +----
> >  2 files changed, 10 insertions(+), 10 deletions(-)
> > 
> > diff --git v5.2-rc3/mm/hugetlb.c v5.2-rc3_patched/mm/hugetlb.c
> > index ac843d3..048d071 100644
> > --- v5.2-rc3/mm/hugetlb.c
> > +++ v5.2-rc3_patched/mm/hugetlb.c
> > @@ -1519,7 +1519,12 @@ int dissolve_free_huge_page(struct page *page)
> >  	int rc = -EBUSY;
> >  
> >  	spin_lock(&hugetlb_lock);
> > -	if (PageHuge(page) && !page_count(page)) {
> > +	if (!PageHuge(page)) {
> > +		rc = 0;
> > +		goto out;
> > +	}
> 
> With this early bail out it maintains the functionality when called from
> soft_offline_free_page() for normal pages. For huge page, it continues
> on the previous path.
> 
> > +
> > +	if (!page_count(page)) {
> >  		struct page *head = compound_head(page);
> >  		struct hstate *h = page_hstate(head);
> >  		int nid = page_to_nid(head);
> > @@ -1564,11 +1569,9 @@ int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
> >  
> >  	for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) {
> >  		page = pfn_to_page(pfn);
> > -		if (PageHuge(page) && !page_count(page)) {
> 
> Right. These checks are now redundant.
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 2/2] mm: hugetlb: soft-offline: dissolve_free_huge_page() return zero on !PageHuge
  2019-06-11 17:16   ` Mike Kravetz
@ 2019-06-12  7:24     ` Naoya Horiguchi
  0 siblings, 0 replies; 13+ messages in thread
From: Naoya Horiguchi @ 2019-06-12  7:24 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, Andrew Morton, Michal Hocko, xishi.qiuxishi, Chen,
	Jerry T, Zhuo, Qiuxu, linux-kernel

On Tue, Jun 11, 2019 at 10:16:03AM -0700, Mike Kravetz wrote:
> On 6/10/19 1:18 AM, Naoya Horiguchi wrote:
> > madvise(MADV_SOFT_OFFLINE) often returns -EBUSY when calling soft offline
> > for hugepages with overcommitting enabled. That was caused by the suboptimal
> > code in current soft-offline code. See the following part:
> > 
> >     ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
> >                             MIGRATE_SYNC, MR_MEMORY_FAILURE);
> >     if (ret) {
> >             ...
> >     } else {
> >             /*
> >              * We set PG_hwpoison only when the migration source hugepage
> >              * was successfully dissolved, because otherwise hwpoisoned
> >              * hugepage remains on free hugepage list, then userspace will
> >              * find it as SIGBUS by allocation failure. That's not expected
> >              * in soft-offlining.
> >              */
> >             ret = dissolve_free_huge_page(page);
> >             if (!ret) {
> >                     if (set_hwpoison_free_buddy_page(page))
> >                             num_poisoned_pages_inc();
> >             }
> >     }
> >     return ret;
> > 
> > Here dissolve_free_huge_page() returns -EBUSY if the migration source page
> > was freed into buddy in migrate_pages(), but even in that case we actually
> > has a chance that set_hwpoison_free_buddy_page() succeeds. So that means
> > current code gives up offlining too early now.
> > 
> > dissolve_free_huge_page() checks that a given hugepage is suitable for
> > dissolving, where we should return success for !PageHuge() case because
> > the given hugepage is considered as already dissolved.
> > 
> > This change also affects other callers of dissolve_free_huge_page(),
> > which are cleaned up together.
> > 
> > Reported-by: Chen, Jerry T <jerry.t.chen@intel.com>
> > Tested-by: Chen, Jerry T <jerry.t.chen@intel.com>
> > Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> > Fixes: 6bc9b56433b76 ("mm: fix race on soft-offlining")
> > Cc: <stable@vger.kernel.org> # v4.19+
> > ---
> >  mm/hugetlb.c        | 15 +++++++++------
> >  mm/memory-failure.c |  5 +----
> >  2 files changed, 10 insertions(+), 10 deletions(-)
> > 
> > diff --git v5.2-rc3/mm/hugetlb.c v5.2-rc3_patched/mm/hugetlb.c
> > index ac843d3..048d071 100644
> > --- v5.2-rc3/mm/hugetlb.c
> > +++ v5.2-rc3_patched/mm/hugetlb.c
> > @@ -1519,7 +1519,12 @@ int dissolve_free_huge_page(struct page *page)
> 
> Please update the function description for dissolve_free_huge_page() as
> well.  It currently says, "Returns -EBUSY if the dissolution fails because
> a give page is not a free hugepage" which is no longer true as a result of
> this change.

Thanks for pointing out, I completely missed that.

> 
> >  	int rc = -EBUSY;
> >  
> >  	spin_lock(&hugetlb_lock);
> > -	if (PageHuge(page) && !page_count(page)) {
> > +	if (!PageHuge(page)) {
> > +		rc = 0;
> > +		goto out;
> > +	}
> > +
> > +	if (!page_count(page)) {
> >  		struct page *head = compound_head(page);
> >  		struct hstate *h = page_hstate(head);
> >  		int nid = page_to_nid(head);
> > @@ -1564,11 +1569,9 @@ int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
> >  
> >  	for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) {
> >  		page = pfn_to_page(pfn);
> > -		if (PageHuge(page) && !page_count(page)) {
> > -			rc = dissolve_free_huge_page(page);
> > -			if (rc)
> > -				break;
> > -		}
> 
> We may want to consider keeping at least the PageHuge(page) check before
> calling dissolve_free_huge_page().  dissolve_free_huge_pages is called as
> part of memory offline processing.  We do not know if the memory to be offlined
> contains huge pages or not.  With your changes, we are taking hugetlb_lock
> on each call to dissolve_free_huge_page just to discover that the page is
> not a huge page.
> 
> You 'could' add a PageHuge(page) check to dissolve_free_huge_page before
> taking the lock.  However, you would need to check again after taking the
> lock.

Right, I'll do this.

What was in my mind when writing this was that I actually don't like
PageHuge because it's slow (not inlined) and called anywhere in mm code,
so I like to reduce it if possible.
But I now see that dissolve_free_huge_page() are relatively rare event
rather than hugepage allocation/free, so dissolve_free_huge_page should take
burden to precheck PageHuge instead of speculatively taking hugetlb_lock
and disrupting the hot path.

Thanks,
- Naoya

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-06-12  7:26 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-10  8:18 [PATCH v2 00/02] fix return value issue of soft offlining hugepages Naoya Horiguchi
2019-06-10  8:18 ` [PATCH v2 1/2] mm: soft-offline: return -EBUSY if set_hwpoison_free_buddy_page() fails Naoya Horiguchi
2019-06-10 21:20   ` Andrew Morton
2019-06-10 22:51     ` Naoya Horiguchi
2019-06-11  0:19   ` Mike Kravetz
2019-06-11  0:57     ` Naoya Horiguchi
2019-06-11  8:14       ` Anshuman Khandual
2019-06-12  6:48         ` Naoya Horiguchi
2019-06-10  8:18 ` [PATCH v2 2/2] mm: hugetlb: soft-offline: dissolve_free_huge_page() return zero on !PageHuge Naoya Horiguchi
2019-06-11  9:50   ` Anshuman Khandual
2019-06-12  7:09     ` Naoya Horiguchi
2019-06-11 17:16   ` Mike Kravetz
2019-06-12  7:24     ` Naoya Horiguchi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).