linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/1] memory offline issues with hugepage size > memory block size
@ 2016-09-20 15:53 Gerald Schaefer
  2016-09-20 15:53 ` [PATCH 1/1] mm/hugetlb: fix memory offline " Gerald Schaefer
  2016-09-20 17:37 ` [PATCH 0/1] memory offline issues " Mike Kravetz
  0 siblings, 2 replies; 24+ messages in thread
From: Gerald Schaefer @ 2016-09-20 15:53 UTC (permalink / raw)
  To: Andrew Morton, Naoya Horiguchi
  Cc: linux-mm, linux-kernel, Michal Hocko, Kirill A . Shutemov,
	Vlastimil Babka, Mike Kravetz, Aneesh Kumar K . V,
	Martin Schwidefsky, Heiko Carstens, Gerald Schaefer

dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a
list corruption and addressing exception when trying to set a memory
block offline that is part (but not the first part) of a gigantic
hugetlb page with a size > memory block size.

When no other smaller hugepage sizes are present, the VM_BUG_ON() will
trigger directly. In the other case we will run into an addressing
exception later, because dissolve_free_huge_page() will not use the head
page of the compound hugetlb page which will result in a NULL hstate
from page_hstate(). list_del() would also not work well on a tail page.

To fix this, first remove the VM_BUG_ON() because it is wrong, and then
use the compound head page in dissolve_free_huge_page().

However, this all assumes that it is the desired behaviour to remove
a (gigantic) unused hugetlb page from the pool, just because a small
(in relation to the hugepage size) memory block is going offline. Not
sure if this is the right thing, and it doesn't look very consistent
given that in this scenario it is _not_ possible to migrate
such a (gigantic) hugepage if it is in use. OTOH, has_unmovable_pages()
will return false in both cases, i.e. the memory block will be reported
as removable, no matter if the hugepage that it is part of is unused or
in use.

This patch is assuming that it would be OK to remove the hugepage,
i.e. memory offline beats pre-allocated unused (gigantic) hugepages.

Any thoughts?


Gerald Schaefer (1):
  mm/hugetlb: fix memory offline with hugepage size > memory block size

 mm/hugetlb.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

-- 
2.8.4

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/1] mm/hugetlb: fix memory offline with hugepage size > memory block size
  2016-09-20 15:53 [PATCH 0/1] memory offline issues with hugepage size > memory block size Gerald Schaefer
@ 2016-09-20 15:53 ` Gerald Schaefer
  2016-09-21  6:29   ` Hillf Danton
  2016-09-20 17:37 ` [PATCH 0/1] memory offline issues " Mike Kravetz
  1 sibling, 1 reply; 24+ messages in thread
From: Gerald Schaefer @ 2016-09-20 15:53 UTC (permalink / raw)
  To: Andrew Morton, Naoya Horiguchi
  Cc: linux-mm, linux-kernel, Michal Hocko, Kirill A . Shutemov,
	Vlastimil Babka, Mike Kravetz, Aneesh Kumar K . V,
	Martin Schwidefsky, Heiko Carstens, Gerald Schaefer

dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a
list corruption and addressing exception when trying to set a memory
block offline that is part (but not the first part) of a gigantic
hugetlb page with a size > memory block size.

When no other smaller hugepage sizes are present, the VM_BUG_ON() will
trigger directly. In the other case we will run into an addressing
exception later, because dissolve_free_huge_page() will not use the head
page of the compound hugetlb page which will result in a NULL hstate
from page_hstate(). list_del() would also not work well on a tail page.

To fix this, first remove the VM_BUG_ON() because it is wrong, and then
use the compound head page in dissolve_free_huge_page().

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
---
 mm/hugetlb.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 87e11d8..65e723c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1441,15 +1441,17 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
  */
 static void dissolve_free_huge_page(struct page *page)
 {
+	struct page *head = compound_head(page);
+
 	spin_lock(&hugetlb_lock);
-	if (PageHuge(page) && !page_count(page)) {
-		struct hstate *h = page_hstate(page);
-		int nid = page_to_nid(page);
-		list_del(&page->lru);
+	if (!page_count(head)) {
+		struct hstate *h = page_hstate(head);
+		int nid = page_to_nid(head);
+		list_del(&head->lru);
 		h->free_huge_pages--;
 		h->free_huge_pages_node[nid]--;
 		h->max_huge_pages--;
-		update_and_free_page(h, page);
+		update_and_free_page(h, head);
 	}
 	spin_unlock(&hugetlb_lock);
 }
@@ -1466,9 +1468,9 @@ void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
 	if (!hugepages_supported())
 		return;
 
-	VM_BUG_ON(!IS_ALIGNED(start_pfn, 1 << minimum_order));
 	for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order)
-		dissolve_free_huge_page(pfn_to_page(pfn));
+		if (PageHuge(pfn_to_page(pfn)))
+			dissolve_free_huge_page(pfn_to_page(pfn));
 }
 
 /*
-- 
2.8.4


* Re: [PATCH 0/1] memory offline issues with hugepage size > memory block size
  2016-09-20 15:53 [PATCH 0/1] memory offline issues with hugepage size > memory block size Gerald Schaefer
  2016-09-20 15:53 ` [PATCH 1/1] mm/hugetlb: fix memory offline " Gerald Schaefer
@ 2016-09-20 17:37 ` Mike Kravetz
  2016-09-20 17:45   ` Dave Hansen
                     ` (2 more replies)
  1 sibling, 3 replies; 24+ messages in thread
From: Mike Kravetz @ 2016-09-20 17:37 UTC (permalink / raw)
  To: Gerald Schaefer, Andrew Morton, Naoya Horiguchi
  Cc: linux-mm, linux-kernel, Michal Hocko, Kirill A . Shutemov,
	Vlastimil Babka, Aneesh Kumar K . V, Martin Schwidefsky,
	Heiko Carstens, Rui Teng, Dave Hansen

On 09/20/2016 08:53 AM, Gerald Schaefer wrote:
> dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a
> list corruption and addressing exception when trying to set a memory
> block offline that is part (but not the first part) of a gigantic
> hugetlb page with a size > memory block size.
> 
> When no other smaller hugepage sizes are present, the VM_BUG_ON() will
> trigger directly. In the other case we will run into an addressing
> exception later, because dissolve_free_huge_page() will not use the head
> page of the compound hugetlb page which will result in a NULL hstate
> from page_hstate(). list_del() would also not work well on a tail page.
> 
> To fix this, first remove the VM_BUG_ON() because it is wrong, and then
> use the compound head page in dissolve_free_huge_page().
> 
> However, this all assumes that it is the desired behaviour to remove
> a (gigantic) unused hugetlb page from the pool, just because a small
> (in relation to the hugepage size) memory block is going offline. Not
> sure if this is the right thing, and it doesn't look very consistent
> given that in this scenario it is _not_ possible to migrate
> such a (gigantic) hugepage if it is in use. OTOH, has_unmovable_pages()
> will return false in both cases, i.e. the memory block will be reported
> as removable, no matter if the hugepage that it is part of is unused or
> in use.
> 
> This patch is assuming that it would be OK to remove the hugepage,
> i.e. memory offline beats pre-allocated unused (gigantic) hugepages.
> 
> Any thoughts?

Cc'ed Rui Teng and Dave Hansen as they were discussing the issue in
this thread:
https://lkml.org/lkml/2016/9/13/146

Their approach (I believe) would be to fail the offline operation in
this case.  However, I could argue that failing the operation, or
dissolving the unused huge page containing the area to be offlined is
the right thing to do.

I never thought too much about the VM_BUG_ON(), but you are correct in
that it should be removed in either case.

The other thing that needs to be changed is the locking in
dissolve_free_huge_page().  I believe the lock only needs to be held if
we are removing the huge page from the pool.  It is not a correctness
issue but a performance one.

-- 
Mike Kravetz

> 
> 
> Gerald Schaefer (1):
>   mm/hugetlb: fix memory offline with hugepage size > memory block size
> 
>  mm/hugetlb.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
> 


* Re: [PATCH 0/1] memory offline issues with hugepage size > memory block size
  2016-09-20 17:37 ` [PATCH 0/1] memory offline issues " Mike Kravetz
@ 2016-09-20 17:45   ` Dave Hansen
  2016-09-21  9:49     ` Vlastimil Babka
  2016-09-21 10:34     ` Gerald Schaefer
  2016-09-21 10:30   ` Gerald Schaefer
  2016-09-21 18:20   ` Michal Hocko
  2 siblings, 2 replies; 24+ messages in thread
From: Dave Hansen @ 2016-09-20 17:45 UTC (permalink / raw)
  To: Mike Kravetz, Gerald Schaefer, Andrew Morton, Naoya Horiguchi
  Cc: linux-mm, linux-kernel, Michal Hocko, Kirill A . Shutemov,
	Vlastimil Babka, Aneesh Kumar K . V, Martin Schwidefsky,
	Heiko Carstens, Rui Teng

On 09/20/2016 10:37 AM, Mike Kravetz wrote:
> 
> Their approach (I believe) would be to fail the offline operation in
> this case.  However, I could argue that failing the operation, or
> dissolving the unused huge page containing the area to be offlined is
> the right thing to do.

I think the right thing to do is dissolve the whole huge page if even a
part of it is offlined.  The only question is what to do with the
gigantic remnants.


* Re: [PATCH 1/1] mm/hugetlb: fix memory offline with hugepage size > memory block size
  2016-09-20 15:53 ` [PATCH 1/1] mm/hugetlb: fix memory offline " Gerald Schaefer
@ 2016-09-21  6:29   ` Hillf Danton
  2016-09-21 12:35     ` [PATCH v2 " Gerald Schaefer
  0 siblings, 1 reply; 24+ messages in thread
From: Hillf Danton @ 2016-09-21  6:29 UTC (permalink / raw)
  To: 'Gerald Schaefer', 'Andrew Morton',
	'Naoya Horiguchi'
  Cc: linux-mm, linux-kernel, 'Michal Hocko',
	'Kirill A . Shutemov', 'Vlastimil Babka',
	'Mike Kravetz', 'Aneesh Kumar K . V',
	'Martin Schwidefsky', 'Heiko Carstens'

> @@ -1466,9 +1468,9 @@ void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
>  	if (!hugepages_supported())
>  		return;
> 
> -	VM_BUG_ON(!IS_ALIGNED(start_pfn, 1 << minimum_order));

Then the relevant comment has to be updated.

Hillf
>  	for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order)
> -		dissolve_free_huge_page(pfn_to_page(pfn));
> +		if (PageHuge(pfn_to_page(pfn)))
> +			dissolve_free_huge_page(pfn_to_page(pfn));
>  }
> 
>  /*
> --
> 2.8.4


* Re: [PATCH 0/1] memory offline issues with hugepage size > memory block size
  2016-09-20 17:45   ` Dave Hansen
@ 2016-09-21  9:49     ` Vlastimil Babka
  2016-09-21 10:34     ` Gerald Schaefer
  1 sibling, 0 replies; 24+ messages in thread
From: Vlastimil Babka @ 2016-09-21  9:49 UTC (permalink / raw)
  To: Dave Hansen, Mike Kravetz, Gerald Schaefer, Andrew Morton,
	Naoya Horiguchi
  Cc: linux-mm, linux-kernel, Michal Hocko, Kirill A . Shutemov,
	Aneesh Kumar K . V, Martin Schwidefsky, Heiko Carstens, Rui Teng

On 09/20/2016 07:45 PM, Dave Hansen wrote:
> On 09/20/2016 10:37 AM, Mike Kravetz wrote:
>>
>> Their approach (I believe) would be to fail the offline operation in
>> this case.  However, I could argue that failing the operation, or
>> dissolving the unused huge page containing the area to be offlined is
>> the right thing to do.
>
> I think the right thing to do is dissolve the whole huge page if even a
> part of it is offlined.  The only question is what to do with the
> gigantic remnants.

Just free them into the buddy system? Or what are the alternatives? 
Creating smaller huge pages (if supported)? That doesn't make much 
sense. Offline it completely? That's probably not what the user requested.

Vlastimil


* Re: [PATCH 0/1] memory offline issues with hugepage size > memory block size
  2016-09-20 17:37 ` [PATCH 0/1] memory offline issues " Mike Kravetz
  2016-09-20 17:45   ` Dave Hansen
@ 2016-09-21 10:30   ` Gerald Schaefer
  2016-09-21 18:20   ` Michal Hocko
  2 siblings, 0 replies; 24+ messages in thread
From: Gerald Schaefer @ 2016-09-21 10:30 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Andrew Morton, Naoya Horiguchi, linux-mm, linux-kernel,
	Michal Hocko, Kirill A . Shutemov, Vlastimil Babka,
	Aneesh Kumar K . V, Martin Schwidefsky, Heiko Carstens, Rui Teng,
	Dave Hansen

On Tue, 20 Sep 2016 10:37:04 -0700
Mike Kravetz <mike.kravetz@oracle.com> wrote:

> 
> Cc'ed Rui Teng and Dave Hansen as they were discussing the issue in
> this thread:
> https://lkml.org/lkml/2016/9/13/146

Ah, thanks, I didn't see that.

> 
> Their approach (I believe) would be to fail the offline operation in
> this case.  However, I could argue that failing the operation, or
> dissolving the unused huge page containing the area to be offlined is
> the right thing to do.
> 
> I never thought too much about the VM_BUG_ON(), but you are correct in
> that it should be removed in either case.
> 
> The other thing that needs to be changed is the locking in
> dissolve_free_huge_page().  I believe the lock only needs to be held if
> we are removing the huge page from the pool.  It is not a correctness
> but performance issue.
> 

Yes, that looks odd, that's why in my patch I moved the PageHuge() check
out from dissolve_free_huge_page(), up into the loop in
dissolve_free_huge_pages(). This way dissolve_free_huge_page() with its
locking should only be called once per memory block, in the case where
this memory block is part of a gigantic hugepage.


* Re: [PATCH 0/1] memory offline issues with hugepage size > memory block size
  2016-09-20 17:45   ` Dave Hansen
  2016-09-21  9:49     ` Vlastimil Babka
@ 2016-09-21 10:34     ` Gerald Schaefer
  1 sibling, 0 replies; 24+ messages in thread
From: Gerald Schaefer @ 2016-09-21 10:34 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Mike Kravetz, Andrew Morton, Naoya Horiguchi, linux-mm,
	linux-kernel, Michal Hocko, Kirill A . Shutemov, Vlastimil Babka,
	Aneesh Kumar K . V, Martin Schwidefsky, Heiko Carstens, Rui Teng

On Tue, 20 Sep 2016 10:45:23 -0700
Dave Hansen <dave.hansen@linux.intel.com> wrote:

> On 09/20/2016 10:37 AM, Mike Kravetz wrote:
> > 
> > Their approach (I believe) would be to fail the offline operation in
> > this case.  However, I could argue that failing the operation, or
> > dissolving the unused huge page containing the area to be offlined is
> > the right thing to do.
> 
> I think the right thing to do is dissolve the whole huge page if even a
> part of it is offlined.  The only question is what to do with the
> gigantic remnants.
> 

Hmm, not sure if I got this right, but I thought that by calling
update_and_free_page() on the head page (even if it is not part of the
memory block to be removed) all parts of the gigantic hugepage should be
properly freed and there should not be any remnants left.


* [PATCH v2 1/1] mm/hugetlb: fix memory offline with hugepage size > memory block size
  2016-09-21  6:29   ` Hillf Danton
@ 2016-09-21 12:35     ` Gerald Schaefer
  2016-09-21 13:17       ` Rui Teng
                         ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Gerald Schaefer @ 2016-09-21 12:35 UTC (permalink / raw)
  To: Andrew Morton, Naoya Horiguchi
  Cc: Hillf Danton, linux-mm, linux-kernel, Michal Hocko,
	Kirill A . Shutemov, Vlastimil Babka, Mike Kravetz,
	Aneesh Kumar K . V, Martin Schwidefsky, Heiko Carstens,
	Dave Hansen, Rui Teng, Gerald Schaefer

dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a
list corruption and addressing exception when trying to set a memory
block offline that is part (but not the first part) of a hugetlb page
with a size > memory block size.

When no other smaller hugetlb page sizes are present, the VM_BUG_ON()
will trigger directly. In the other case we will run into an addressing
exception later, because dissolve_free_huge_page() will not work on the
head page of the compound hugetlb page which will result in a NULL
hstate from page_hstate().

To fix this, first remove the VM_BUG_ON() because it is wrong, and then
use the compound head page in dissolve_free_huge_page().

Also change locking in dissolve_free_huge_page(), so that it only takes
the lock when actually removing a hugepage.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
---
Changes in v2:
- Update comment in dissolve_free_huge_pages()
- Change locking in dissolve_free_huge_page()

 mm/hugetlb.c | 31 +++++++++++++++++++------------
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 87e11d8..1522af8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1441,23 +1441,30 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
  */
 static void dissolve_free_huge_page(struct page *page)
 {
+	struct page *head = compound_head(page);
+	struct hstate *h;
+	int nid;
+
+	if (page_count(head))
+		return;
+
+	h = page_hstate(head);
+	nid = page_to_nid(head);
+
 	spin_lock(&hugetlb_lock);
-	if (PageHuge(page) && !page_count(page)) {
-		struct hstate *h = page_hstate(page);
-		int nid = page_to_nid(page);
-		list_del(&page->lru);
-		h->free_huge_pages--;
-		h->free_huge_pages_node[nid]--;
-		h->max_huge_pages--;
-		update_and_free_page(h, page);
-	}
+	list_del(&head->lru);
+	h->free_huge_pages--;
+	h->free_huge_pages_node[nid]--;
+	h->max_huge_pages--;
+	update_and_free_page(h, head);
 	spin_unlock(&hugetlb_lock);
 }
 
 /*
  * Dissolve free hugepages in a given pfn range. Used by memory hotplug to
  * make specified memory blocks removable from the system.
- * Note that start_pfn should aligned with (minimum) hugepage size.
+ * Note that this will dissolve a free gigantic hugepage completely, if any
+ * part of it lies within the given range.
  */
 void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
 {
@@ -1466,9 +1473,9 @@ void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
 	if (!hugepages_supported())
 		return;
 
-	VM_BUG_ON(!IS_ALIGNED(start_pfn, 1 << minimum_order));
 	for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order)
-		dissolve_free_huge_page(pfn_to_page(pfn));
+		if (PageHuge(pfn_to_page(pfn)))
+			dissolve_free_huge_page(pfn_to_page(pfn));
 }
 
 /*


* Re: [PATCH v2 1/1] mm/hugetlb: fix memory offline with hugepage size > memory block size
  2016-09-21 12:35     ` [PATCH v2 " Gerald Schaefer
@ 2016-09-21 13:17       ` Rui Teng
  2016-09-21 15:13         ` Gerald Schaefer
  2016-09-22  7:58       ` Hillf Danton
  2016-09-22  9:51       ` Michal Hocko
  2 siblings, 1 reply; 24+ messages in thread
From: Rui Teng @ 2016-09-21 13:17 UTC (permalink / raw)
  To: Gerald Schaefer, Andrew Morton, Naoya Horiguchi
  Cc: Hillf Danton, linux-mm, linux-kernel, Michal Hocko,
	Kirill A . Shutemov, Vlastimil Babka, Mike Kravetz,
	Aneesh Kumar K . V, Martin Schwidefsky, Heiko Carstens,
	Dave Hansen

On 9/21/16 8:35 PM, Gerald Schaefer wrote:
> dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a
> list corruption and addressing exception when trying to set a memory
> block offline that is part (but not the first part) of a hugetlb page
> with a size > memory block size.
>
> When no other smaller hugetlb page sizes are present, the VM_BUG_ON()
> will trigger directly. In the other case we will run into an addressing
> exception later, because dissolve_free_huge_page() will not work on the
> head page of the compound hugetlb page which will result in a NULL
> hstate from page_hstate().
>
> To fix this, first remove the VM_BUG_ON() because it is wrong, and then
> use the compound head page in dissolve_free_huge_page().
>
> Also change locking in dissolve_free_huge_page(), so that it only takes
> the lock when actually removing a hugepage.
>
> Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
> ---
> Changes in v2:
> - Update comment in dissolve_free_huge_pages()
> - Change locking in dissolve_free_huge_page()
>
>  mm/hugetlb.c | 31 +++++++++++++++++++------------
>  1 file changed, 19 insertions(+), 12 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 87e11d8..1522af8 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1441,23 +1441,30 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
>   */
>  static void dissolve_free_huge_page(struct page *page)
>  {
> +	struct page *head = compound_head(page);
> +	struct hstate *h;
> +	int nid;
> +
> +	if (page_count(head))
> +		return;
> +
> +	h = page_hstate(head);
> +	nid = page_to_nid(head);
> +
>  	spin_lock(&hugetlb_lock);
> -	if (PageHuge(page) && !page_count(page)) {
> -		struct hstate *h = page_hstate(page);
> -		int nid = page_to_nid(page);
> -		list_del(&page->lru);
> -		h->free_huge_pages--;
> -		h->free_huge_pages_node[nid]--;
> -		h->max_huge_pages--;
> -		update_and_free_page(h, page);
> -	}
> +	list_del(&head->lru);
> +	h->free_huge_pages--;
> +	h->free_huge_pages_node[nid]--;
> +	h->max_huge_pages--;
> +	update_and_free_page(h, head);
>  	spin_unlock(&hugetlb_lock);
>  }
>
>  /*
>   * Dissolve free hugepages in a given pfn range. Used by memory hotplug to
>   * make specified memory blocks removable from the system.
> - * Note that start_pfn should aligned with (minimum) hugepage size.
> + * Note that this will dissolve a free gigantic hugepage completely, if any
> + * part of it lies within the given range.
>   */
>  void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
>  {
> @@ -1466,9 +1473,9 @@ void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
>  	if (!hugepages_supported())
>  		return;
>
> -	VM_BUG_ON(!IS_ALIGNED(start_pfn, 1 << minimum_order));
>  	for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order)
> -		dissolve_free_huge_page(pfn_to_page(pfn));
> +		if (PageHuge(pfn_to_page(pfn)))
> +			dissolve_free_huge_page(pfn_to_page(pfn));
How many times will dissolve_free_huge_page() be invoked in this loop?
For each pfn, it will be converted to the head page, and then the list
will be deleted repeatedly.
>  }
>
>  /*
>


* Re: [PATCH v2 1/1] mm/hugetlb: fix memory offline with hugepage size > memory block size
  2016-09-21 13:17       ` Rui Teng
@ 2016-09-21 15:13         ` Gerald Schaefer
  0 siblings, 0 replies; 24+ messages in thread
From: Gerald Schaefer @ 2016-09-21 15:13 UTC (permalink / raw)
  To: Rui Teng
  Cc: Andrew Morton, Naoya Horiguchi, Hillf Danton, linux-mm,
	linux-kernel, Michal Hocko, Kirill A . Shutemov, Vlastimil Babka,
	Mike Kravetz, Aneesh Kumar K . V, Martin Schwidefsky,
	Heiko Carstens, Dave Hansen

On Wed, 21 Sep 2016 21:17:29 +0800
Rui Teng <rui.teng@linux.vnet.ibm.com> wrote:

> >  /*
> >   * Dissolve free hugepages in a given pfn range. Used by memory hotplug to
> >   * make specified memory blocks removable from the system.
> > - * Note that start_pfn should aligned with (minimum) hugepage size.
> > + * Note that this will dissolve a free gigantic hugepage completely, if any
> > + * part of it lies within the given range.
> >   */
> >  void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
> >  {
> > @@ -1466,9 +1473,9 @@ void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
> >  	if (!hugepages_supported())
> >  		return;
> >
> > -	VM_BUG_ON(!IS_ALIGNED(start_pfn, 1 << minimum_order));
> >  	for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order)
> > -		dissolve_free_huge_page(pfn_to_page(pfn));
> > +		if (PageHuge(pfn_to_page(pfn)))
> > +			dissolve_free_huge_page(pfn_to_page(pfn));
> How many times will dissolve_free_huge_page() be invoked in this loop?
> For each pfn, it will be converted to the head page, and then the list
> will be deleted repeatedly.

In the case where the memory block [start_pfn, end_pfn] is part of a
gigantic hugepage, dissolve_free_huge_page() will only be invoked once.

If there is only one gigantic hugepage pool, 1 << minimum_order will be
larger than the memory block size, and the loop will stop after the first
invocation of dissolve_free_huge_page().

If there are additional hugepage pools, with hugepage sizes < memory
block size, then it will loop as many times as 1 << minimum_order fits
inside a memory block, e.g. 256 times with 1 MB minimum hugepage size
and 256 MB memory block size.

However, the PageHuge() check should always return false after the first
invocation of dissolve_free_huge_page(), since update_and_free_page()
will take care of resetting compound_dtor, and so there will also be
just one invocation.

The only case where there will be more than one invocation is the case
where we do not have any part of a gigantic hugepage inside the memory
block, but rather multiple "normal sized" hugepages. Then there will be
one invocation per hugepage, as opposed to one invocation per
"1 << minimum_order" range as it was before the patch. So it also
improves the behaviour in the case where there is no gigantic page
involved.

> >  }
> >
> >  /*
> >
> 


* Re: [PATCH 0/1] memory offline issues with hugepage size > memory block size
  2016-09-20 17:37 ` [PATCH 0/1] memory offline issues " Mike Kravetz
  2016-09-20 17:45   ` Dave Hansen
  2016-09-21 10:30   ` Gerald Schaefer
@ 2016-09-21 18:20   ` Michal Hocko
  2016-09-21 18:27     ` Dave Hansen
  2 siblings, 1 reply; 24+ messages in thread
From: Michal Hocko @ 2016-09-21 18:20 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Gerald Schaefer, Andrew Morton, Naoya Horiguchi, linux-mm,
	linux-kernel, Kirill A . Shutemov, Vlastimil Babka,
	Aneesh Kumar K . V, Martin Schwidefsky, Heiko Carstens, Rui Teng,
	Dave Hansen

On Tue 20-09-16 10:37:04, Mike Kravetz wrote:
> On 09/20/2016 08:53 AM, Gerald Schaefer wrote:
> > dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a
> > list corruption and addressing exception when trying to set a memory
> > block offline that is part (but not the first part) of a gigantic
> > hugetlb page with a size > memory block size.
> > 
> > When no other smaller hugepage sizes are present, the VM_BUG_ON() will
> > trigger directly. In the other case we will run into an addressing
> > exception later, because dissolve_free_huge_page() will not use the head
> > page of the compound hugetlb page which will result in a NULL hstate
> > from page_hstate(). list_del() would also not work well on a tail page.
> > 
> > To fix this, first remove the VM_BUG_ON() because it is wrong, and then
> > use the compound head page in dissolve_free_huge_page().
> > 
> > However, this all assumes that it is the desired behaviour to remove
> > a (gigantic) unused hugetlb page from the pool, just because a small
> > (in relation to the hugepage size) memory block is going offline. Not
> > sure if this is the right thing, and it doesn't look very consistent
> > given that in this scenario it is _not_ possible to migrate
> > such a (gigantic) hugepage if it is in use. OTOH, has_unmovable_pages()
> > will return false in both cases, i.e. the memory block will be reported
> > as removable, no matter if the hugepage that it is part of is unused or
> > in use.
> > 
> > This patch is assuming that it would be OK to remove the hugepage,
> > i.e. memory offline beats pre-allocated unused (gigantic) hugepages.
> > 
> > Any thoughts?
> 
> Cc'ed Rui Teng and Dave Hansen as they were discussing the issue in
> this thread:
> https://lkml.org/lkml/2016/9/13/146
> 
> Their approach (I believe) would be to fail the offline operation in
> this case.  However, I could argue that failing the operation, or
> dissolving the unused huge page containing the area to be offlined is
> the right thing to do.

I am sorry I have noticed this thread only now. I was arguing about this
in the original thread. I would be rather reluctant to free gigantic
page just because somebody wants to offline a small part of it because
setup is really expensive and a lost page would be really hard to get
back.

I would even question the per page block offlining itself. Why would
anybody want to offline few blocks rather than the whole node? What is
the usecase here?
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 0/1] memory offline issues with hugepage size > memory block size
  2016-09-21 18:20   ` Michal Hocko
@ 2016-09-21 18:27     ` Dave Hansen
  2016-09-21 19:22       ` Michal Hocko
  0 siblings, 1 reply; 24+ messages in thread
From: Dave Hansen @ 2016-09-21 18:27 UTC (permalink / raw)
  To: Michal Hocko, Mike Kravetz
  Cc: Gerald Schaefer, Andrew Morton, Naoya Horiguchi, linux-mm,
	linux-kernel, Kirill A . Shutemov, Vlastimil Babka,
	Aneesh Kumar K . V, Martin Schwidefsky, Heiko Carstens, Rui Teng

On 09/21/2016 11:20 AM, Michal Hocko wrote:
> I would even question the per page block offlining itself. Why would
> anybody want to offline few blocks rather than the whole node? What is
> the usecase here?

The original reason was so that you could remove a DIMM or a riser card
full of DIMMs, which are certainly a subset of a node.

With virtual machines, perhaps you only want to make a small adjustment
to the memory that a VM has.  Or, perhaps the VM only _has_ one node.
Granted, ballooning takes care of a lot of these cases, but memmap[]
starts to get annoying at some point if you balloon too much memory away.


* Re: [PATCH 0/1] memory offline issues with hugepage size > memory block size
  2016-09-21 18:27     ` Dave Hansen
@ 2016-09-21 19:22       ` Michal Hocko
  0 siblings, 0 replies; 24+ messages in thread
From: Michal Hocko @ 2016-09-21 19:22 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Mike Kravetz, Gerald Schaefer, Andrew Morton, Naoya Horiguchi,
	linux-mm, linux-kernel, Kirill A . Shutemov, Vlastimil Babka,
	Aneesh Kumar K . V, Martin Schwidefsky, Heiko Carstens, Rui Teng

On Wed 21-09-16 11:27:48, Dave Hansen wrote:
> On 09/21/2016 11:20 AM, Michal Hocko wrote:
> > I would even question the per page block offlining itself. Why would
> > anybody want to offline few blocks rather than the whole node? What is
> > the usecase here?
> 
> The original reason was so that you could remove a DIMM or a riser card
> full of DIMMs, which are certainly a subset of a node.

OK, I see, thanks for the clarification! I was always thinking more in
node rather than physical memory range hot-remove. I do agree that it
makes sense to free the whole gigantic huge page if we encounter a tail
page for the above use case because losing the gigantic page is
justified when the whole DIMM goes away.

Thanks!

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v2 1/1] mm/hugetlb: fix memory offline with hugepage size > memory block size
  2016-09-21 12:35     ` [PATCH v2 " Gerald Schaefer
  2016-09-21 13:17       ` Rui Teng
@ 2016-09-22  7:58       ` Hillf Danton
  2016-09-22  9:51       ` Michal Hocko
  2 siblings, 0 replies; 24+ messages in thread
From: Hillf Danton @ 2016-09-22  7:58 UTC (permalink / raw)
  To: 'Gerald Schaefer', 'Andrew Morton',
	'Naoya Horiguchi'
  Cc: linux-mm, linux-kernel, 'Michal Hocko',
	'Kirill A . Shutemov', 'Vlastimil Babka',
	'Mike Kravetz', 'Aneesh Kumar K . V',
	'Martin Schwidefsky', 'Heiko Carstens',
	'Dave Hansen', 'Rui Teng'

> 
> dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a
> list corruption and addressing exception when trying to set a memory
> block offline that is part (but not the first part) of a hugetlb page
> with a size > memory block size.
> 
> When no other smaller hugetlb page sizes are present, the VM_BUG_ON()
> will trigger directly. In the other case we will run into an addressing
> exception later, because dissolve_free_huge_page() will not work on the
> head page of the compound hugetlb page which will result in a NULL
> hstate from page_hstate().
> 
> To fix this, first remove the VM_BUG_ON() because it is wrong, and then
> use the compound head page in dissolve_free_huge_page().
> 
> Also change locking in dissolve_free_huge_page(), so that it only takes
> the lock when actually removing a hugepage.
> 
> Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
> ---
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/1] mm/hugetlb: fix memory offline with hugepage size > memory block size
  2016-09-21 12:35     ` [PATCH v2 " Gerald Schaefer
  2016-09-21 13:17       ` Rui Teng
  2016-09-22  7:58       ` Hillf Danton
@ 2016-09-22  9:51       ` Michal Hocko
  2016-09-22 13:45         ` Gerald Schaefer
  2016-09-23  6:40         ` [PATCH v2 1/1] " Rui Teng
  2 siblings, 2 replies; 24+ messages in thread
From: Michal Hocko @ 2016-09-22  9:51 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: Andrew Morton, Naoya Horiguchi, Hillf Danton, linux-mm,
	linux-kernel, Kirill A . Shutemov, Vlastimil Babka, Mike Kravetz,
	Aneesh Kumar K . V, Martin Schwidefsky, Heiko Carstens,
	Dave Hansen, Rui Teng

On Wed 21-09-16 14:35:34, Gerald Schaefer wrote:
> dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a
> list corruption and addressing exception when trying to set a memory
> block offline that is part (but not the first part) of a hugetlb page
> with a size > memory block size.
> 
> When no other smaller hugetlb page sizes are present, the VM_BUG_ON()
> will trigger directly. In the other case we will run into an addressing
> exception later, because dissolve_free_huge_page() will not work on the
> head page of the compound hugetlb page which will result in a NULL
> hstate from page_hstate().
> 
> To fix this, first remove the VM_BUG_ON() because it is wrong, and then
> use the compound head page in dissolve_free_huge_page().

OK, so dissolve_free_huge_page will now work also on tail pages, which
makes some sense. I would also appreciate a few words on why we want to
sacrifice something as precious as a gigantic page rather than fail the
page block offline. Dave pointed out the DIMM offline usecase, for example.

> Also change locking in dissolve_free_huge_page(), so that it only takes
> the lock when actually removing a hugepage.

From a quick look it seems this has been broken since it was introduced
by c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle
hugepage"). Do we want to have this backported to stable? In any case, a
Fixes: SHA1 tag would be really nice.

> Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>

Other than that looks good to me, although there is a room for
improvements here. See below

> ---
> Changes in v2:
> - Update comment in dissolve_free_huge_pages()
> - Change locking in dissolve_free_huge_page()
> 
>  mm/hugetlb.c | 31 +++++++++++++++++++------------
>  1 file changed, 19 insertions(+), 12 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 87e11d8..1522af8 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1441,23 +1441,30 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
>   */
>  static void dissolve_free_huge_page(struct page *page)
>  {
> +	struct page *head = compound_head(page);
> +	struct hstate *h;
> +	int nid;
> +
> +	if (page_count(head))
> +		return;
> +
> +	h = page_hstate(head);
> +	nid = page_to_nid(head);
> +
>  	spin_lock(&hugetlb_lock);
> -	if (PageHuge(page) && !page_count(page)) {
> -		struct hstate *h = page_hstate(page);
> -		int nid = page_to_nid(page);
> -		list_del(&page->lru);
> -		h->free_huge_pages--;
> -		h->free_huge_pages_node[nid]--;
> -		h->max_huge_pages--;
> -		update_and_free_page(h, page);
> -	}
> +	list_del(&head->lru);
> +	h->free_huge_pages--;
> +	h->free_huge_pages_node[nid]--;
> +	h->max_huge_pages--;
> +	update_and_free_page(h, head);
>  	spin_unlock(&hugetlb_lock);
>  }
>  
>  /*
>   * Dissolve free hugepages in a given pfn range. Used by memory hotplug to
>   * make specified memory blocks removable from the system.
> - * Note that start_pfn should aligned with (minimum) hugepage size.
> + * Note that this will dissolve a free gigantic hugepage completely, if any
> + * part of it lies within the given range.
>   */
>  void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
>  {
> @@ -1466,9 +1473,9 @@ void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
>  	if (!hugepages_supported())
>  		return;
>  
> -	VM_BUG_ON(!IS_ALIGNED(start_pfn, 1 << minimum_order));
>  	for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order)
> -		dissolve_free_huge_page(pfn_to_page(pfn));
> +		if (PageHuge(pfn_to_page(pfn)))
> +			dissolve_free_huge_page(pfn_to_page(pfn));
>  }

We can return the number of freed pages from dissolve_free_huge_page and
advance by the appropriate number of pfns. Nothing to really lose sleep
over, but no rocket science either. An early break-out if the page is
in use would be nice as well. Something like the following, probably a
separate patch on top of yours.
---
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 029a80b90cea..d230900f571e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1434,17 +1434,17 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
 }
 
 /*
- * Dissolve a given free hugepage into free buddy pages. This function does
- * nothing for in-use (including surplus) hugepages.
+ * Dissolve a given free hugepage into free buddy pages. Returns the number
+ * of freed pages or -EBUSY if the page is in use.
  */
-static void dissolve_free_huge_page(struct page *page)
+static int dissolve_free_huge_page(struct page *page)
 {
 	struct page *head = compound_head(page);
 	struct hstate *h;
 	int nid;
 
 	if (page_count(head))
-		return;
+		return -EBUSY;
 
 	h = page_hstate(head);
 	nid = page_to_nid(head);
@@ -1456,6 +1456,8 @@ static void dissolve_free_huge_page(struct page *page)
 	h->max_huge_pages--;
 	update_and_free_page(h, head);
 	spin_unlock(&hugetlb_lock);
+
+	return 1 << h->order;
 }
 
 /*
@@ -1471,9 +1473,19 @@ void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
 	if (!hugepages_supported())
 		return;
 
-	for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order)
-		if (PageHuge(pfn_to_page(pfn)))
-			dissolve_free_huge_page(pfn_to_page(pfn));
+	for (pfn = start_pfn; pfn < end_pfn; ) {
+		int nr_pages;
+
+		if (!PageHuge(pfn_to_page(pfn))) {
+			pfn += 1 << minimum_order;
+			continue;
+		}
+
+		nr_pages = dissolve_free_huge_page(pfn_to_page(pfn));
+		if (nr_pages < 0)
+			break;
+		pfn += nr_pages;
+	}
 }
 
 /*
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/1] mm/hugetlb: fix memory offline with hugepage size > memory block size
  2016-09-22  9:51       ` Michal Hocko
@ 2016-09-22 13:45         ` Gerald Schaefer
  2016-09-22 16:29           ` [PATCH v3] " Gerald Schaefer
  2016-09-23  6:40         ` [PATCH v2 1/1] " Rui Teng
  1 sibling, 1 reply; 24+ messages in thread
From: Gerald Schaefer @ 2016-09-22 13:45 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Naoya Horiguchi, Hillf Danton, linux-mm,
	linux-kernel, Kirill A . Shutemov, Vlastimil Babka, Mike Kravetz,
	Aneesh Kumar K . V, Martin Schwidefsky, Heiko Carstens,
	Dave Hansen, Rui Teng, Gerald Schaefer

On Thu, 22 Sep 2016 11:51:37 +0200
Michal Hocko <mhocko@kernel.org> wrote:

> On Wed 21-09-16 14:35:34, Gerald Schaefer wrote:
> > dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a
> > list corruption and addressing exception when trying to set a memory
> > block offline that is part (but not the first part) of a hugetlb page
> > with a size > memory block size.
> > 
> > When no other smaller hugetlb page sizes are present, the VM_BUG_ON()
> > will trigger directly. In the other case we will run into an addressing
> > exception later, because dissolve_free_huge_page() will not work on the
> > head page of the compound hugetlb page which will result in a NULL
> > hstate from page_hstate().
> > 
> > To fix this, first remove the VM_BUG_ON() because it is wrong, and then
> > use the compound head page in dissolve_free_huge_page().
> 
> OK, so dissolve_free_huge_page will now work also on tail pages, which
> makes some sense. I would also appreciate a few words on why we want to
> sacrifice something as precious as a gigantic page rather than fail the
> page block offline. Dave pointed out the DIMM offline usecase, for example.
> 
> > Also change locking in dissolve_free_huge_page(), so that it only takes
> > the lock when actually removing a hugepage.
> 
> From a quick look it seems this has been broken since it was introduced
> by c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle
> hugepage"). Do we want to have this backported to stable? In any case, a
> Fixes: SHA1 tag would be really nice.

That's true, I'll send a v3.

> 
> > Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
> 
> Other than that looks good to me, although there is a room for
> improvements here. See below
> 
> > ---
> > Changes in v2:
> > - Update comment in dissolve_free_huge_pages()
> > - Change locking in dissolve_free_huge_page()
> > 
> >  mm/hugetlb.c | 31 +++++++++++++++++++------------
> >  1 file changed, 19 insertions(+), 12 deletions(-)
> > 
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 87e11d8..1522af8 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1441,23 +1441,30 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
> >   */
> >  static void dissolve_free_huge_page(struct page *page)
> >  {
> > +	struct page *head = compound_head(page);
> > +	struct hstate *h;
> > +	int nid;
> > +
> > +	if (page_count(head))
> > +		return;
> > +
> > +	h = page_hstate(head);
> > +	nid = page_to_nid(head);
> > +
> >  	spin_lock(&hugetlb_lock);
> > -	if (PageHuge(page) && !page_count(page)) {
> > -		struct hstate *h = page_hstate(page);
> > -		int nid = page_to_nid(page);
> > -		list_del(&page->lru);
> > -		h->free_huge_pages--;
> > -		h->free_huge_pages_node[nid]--;
> > -		h->max_huge_pages--;
> > -		update_and_free_page(h, page);
> > -	}
> > +	list_del(&head->lru);
> > +	h->free_huge_pages--;
> > +	h->free_huge_pages_node[nid]--;
> > +	h->max_huge_pages--;
> > +	update_and_free_page(h, head);
> >  	spin_unlock(&hugetlb_lock);
> >  }
> >  
> >  /*
> >   * Dissolve free hugepages in a given pfn range. Used by memory hotplug to
> >   * make specified memory blocks removable from the system.
> > - * Note that start_pfn should aligned with (minimum) hugepage size.
> > + * Note that this will dissolve a free gigantic hugepage completely, if any
> > + * part of it lies within the given range.
> >   */
> >  void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
> >  {
> > @@ -1466,9 +1473,9 @@ void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
> >  	if (!hugepages_supported())
> >  		return;
> >  
> > -	VM_BUG_ON(!IS_ALIGNED(start_pfn, 1 << minimum_order));
> >  	for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order)
> > -		dissolve_free_huge_page(pfn_to_page(pfn));
> > +		if (PageHuge(pfn_to_page(pfn)))
> > +			dissolve_free_huge_page(pfn_to_page(pfn));
> >  }
> 
> We can return the number of freed pages from dissolve_free_huge_page and
> advance by the appropriate number of pfns. Nothing to really lose sleep
> over, but no rocket science either. An early break-out if the page is
> in use would be nice as well. Something like the following, probably a
> separate patch on top of yours.

Hmm, not sure if this is really worth the effort and the (small) added
complexity. It would surely be worth it for the current code, where we
also have the spinlock involved even for non-huge pages. After this patch,
however, dissolve_free_huge_page() will only be called for hugepages,
and the early break-out is also there, although the page_count() check
could probably be moved out of dissolve_free_huge_page() and into the
loop; I'll try this for v3.

The loop count will also not be greatly reduced: when there are only
hugepages of minimum_order in the memory block, or no hugepages at all,
it will not improve anything. In any other case, the PageHuge() check in
the loop will already prevent unnecessary calls to
dissolve_free_huge_page().

> ---
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 029a80b90cea..d230900f571e 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1434,17 +1434,17 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
>  }
> 
>  /*
> - * Dissolve a given free hugepage into free buddy pages. This function does
> - * nothing for in-use (including surplus) hugepages.
> + * Dissolve a given free hugepage into free buddy pages. Returns the number
> + * of freed pages or -EBUSY if the page is in use.
>   */
> -static void dissolve_free_huge_page(struct page *page)
> +static int dissolve_free_huge_page(struct page *page)
>  {
>  	struct page *head = compound_head(page);
>  	struct hstate *h;
>  	int nid;
> 
>  	if (page_count(head))
> -		return;
> +		return -EBUSY;
> 
>  	h = page_hstate(head);
>  	nid = page_to_nid(head);
> @@ -1456,6 +1456,8 @@ static void dissolve_free_huge_page(struct page *page)
>  	h->max_huge_pages--;
>  	update_and_free_page(h, head);
>  	spin_unlock(&hugetlb_lock);
> +
> +	return 1 << h->order;
>  }
> 
>  /*
> @@ -1471,9 +1473,19 @@ void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
>  	if (!hugepages_supported())
>  		return;
> 
> -	for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order)
> -		if (PageHuge(pfn_to_page(pfn)))
> -			dissolve_free_huge_page(pfn_to_page(pfn));
> +	for (pfn = start_pfn; pfn < end_pfn; ) {
> +		int nr_pages;
> +
> +		if (!PageHuge(pfn_to_page(pfn))) {
> +			pfn += 1 << minimum_order;
> +			continue;
> +		}
> +
> +		nr_pages = dissolve_free_huge_page(pfn_to_page(pfn));
> +		if (nr_pages < 0)
> +			break;
> +		pfn += nr_pages;
> +	}
>  }
> 
>  /*

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3] mm/hugetlb: fix memory offline with hugepage size > memory block size
  2016-09-22 13:45         ` Gerald Schaefer
@ 2016-09-22 16:29           ` Gerald Schaefer
  2016-09-22 18:12             ` Dave Hansen
  0 siblings, 1 reply; 24+ messages in thread
From: Gerald Schaefer @ 2016-09-22 16:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Gerald Schaefer, Michal Hocko, Naoya Horiguchi, Hillf Danton,
	linux-mm, linux-kernel, Kirill A . Shutemov, Vlastimil Babka,
	Mike Kravetz, Aneesh Kumar K . V, Martin Schwidefsky,
	Heiko Carstens, Dave Hansen, Rui Teng

dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a
list corruption and addressing exception when trying to set a memory
block offline that is part (but not the first part) of a "gigantic"
hugetlb page with a size > memory block size.

When no other smaller hugetlb page sizes are present, the VM_BUG_ON()
will trigger directly. In the other case we will run into an addressing
exception later, because dissolve_free_huge_page() will not work on the
head page of the compound hugetlb page which will result in a NULL
hstate from page_hstate().

To fix this, first remove the VM_BUG_ON() because it is wrong, and then
use the compound head page in dissolve_free_huge_page(). This means that
an unused pre-allocated gigantic page that has any part of itself inside
the memory block that is going offline will be dissolved completely.
Losing the gigantic hugepage is preferable to failing the memory offline,
for example in the situation where a (possibly faulty) memory DIMM needs
to go offline.

Also move the PageHuge() and page_count() checks out of
dissolve_free_huge_page() in order to only take the spin_lock when
actually removing a hugepage.

Fixes: c8721bbb ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
Cc: <stable@vger.kernel.org>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
---
Changes in v3:
- Add Fixes: c8721bbb
- Add Cc: stable
- Elaborate on losing the gigantic page vs. failing memory offline
- Move page_count() check out of dissolve_free_huge_page()

Changes in v2:
- Update comment in dissolve_free_huge_pages()
- Change locking in dissolve_free_huge_page()

 mm/hugetlb.c | 34 +++++++++++++++++++---------------
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 87e11d8..29e10a2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1436,39 +1436,43 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
 }
 
 /*
- * Dissolve a given free hugepage into free buddy pages. This function does
- * nothing for in-use (including surplus) hugepages.
+ * Dissolve a given free hugepage into free buddy pages.
  */
 static void dissolve_free_huge_page(struct page *page)
 {
+	struct page *head = compound_head(page);
+	struct hstate *h = page_hstate(head);
+	int nid = page_to_nid(head);
+
 	spin_lock(&hugetlb_lock);
-	if (PageHuge(page) && !page_count(page)) {
-		struct hstate *h = page_hstate(page);
-		int nid = page_to_nid(page);
-		list_del(&page->lru);
-		h->free_huge_pages--;
-		h->free_huge_pages_node[nid]--;
-		h->max_huge_pages--;
-		update_and_free_page(h, page);
-	}
+	list_del(&head->lru);
+	h->free_huge_pages--;
+	h->free_huge_pages_node[nid]--;
+	h->max_huge_pages--;
+	update_and_free_page(h, head);
 	spin_unlock(&hugetlb_lock);
 }
 
 /*
  * Dissolve free hugepages in a given pfn range. Used by memory hotplug to
  * make specified memory blocks removable from the system.
- * Note that start_pfn should aligned with (minimum) hugepage size.
+ * Note that this will dissolve a free gigantic hugepage completely, if any
+ * part of it lies within the given range.
+ * This function does nothing for in-use (including surplus) hugepages.
  */
 void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
 {
 	unsigned long pfn;
+	struct page *page;
 
 	if (!hugepages_supported())
 		return;
 
-	VM_BUG_ON(!IS_ALIGNED(start_pfn, 1 << minimum_order));
-	for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order)
-		dissolve_free_huge_page(pfn_to_page(pfn));
+	for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) {
+		page = pfn_to_page(pfn);
+		if (PageHuge(page) && !page_count(page))
+			dissolve_free_huge_page(page);
+	}
 }
 
 /*

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v3] mm/hugetlb: fix memory offline with hugepage size > memory block size
  2016-09-22 16:29           ` [PATCH v3] " Gerald Schaefer
@ 2016-09-22 18:12             ` Dave Hansen
  2016-09-22 19:13               ` Mike Kravetz
  2016-09-23 10:36               ` Gerald Schaefer
  0 siblings, 2 replies; 24+ messages in thread
From: Dave Hansen @ 2016-09-22 18:12 UTC (permalink / raw)
  To: Gerald Schaefer, Andrew Morton
  Cc: Michal Hocko, Naoya Horiguchi, Hillf Danton, linux-mm,
	linux-kernel, Kirill A . Shutemov, Vlastimil Babka, Mike Kravetz,
	Aneesh Kumar K . V, Martin Schwidefsky, Heiko Carstens, Rui Teng

On 09/22/2016 09:29 AM, Gerald Schaefer wrote:
>  static void dissolve_free_huge_page(struct page *page)
>  {
> +	struct page *head = compound_head(page);
> +	struct hstate *h = page_hstate(head);
> +	int nid = page_to_nid(head);
> +
>  	spin_lock(&hugetlb_lock);
> -	if (PageHuge(page) && !page_count(page)) {
> -		struct hstate *h = page_hstate(page);
> -		int nid = page_to_nid(page);
> -		list_del(&page->lru);
> -		h->free_huge_pages--;
> -		h->free_huge_pages_node[nid]--;
> -		h->max_huge_pages--;
> -		update_and_free_page(h, page);
> -	}
> +	list_del(&head->lru);
> +	h->free_huge_pages--;
> +	h->free_huge_pages_node[nid]--;
> +	h->max_huge_pages--;
> +	update_and_free_page(h, head);
>  	spin_unlock(&hugetlb_lock);
>  }

Do you need to revalidate anything once you acquire the lock?  Can this,
for instance, race with another thread doing vm.nr_hugepages=0?  Or a
thread faulting in and allocating the large page that's being dissolved?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3] mm/hugetlb: fix memory offline with hugepage size > memory block size
  2016-09-22 18:12             ` Dave Hansen
@ 2016-09-22 19:13               ` Mike Kravetz
  2016-09-23 10:36               ` Gerald Schaefer
  1 sibling, 0 replies; 24+ messages in thread
From: Mike Kravetz @ 2016-09-22 19:13 UTC (permalink / raw)
  To: Dave Hansen, Gerald Schaefer, Andrew Morton
  Cc: Michal Hocko, Naoya Horiguchi, Hillf Danton, linux-mm,
	linux-kernel, Kirill A . Shutemov, Vlastimil Babka,
	Aneesh Kumar K . V, Martin Schwidefsky, Heiko Carstens, Rui Teng

On 09/22/2016 11:12 AM, Dave Hansen wrote:
> On 09/22/2016 09:29 AM, Gerald Schaefer wrote:
>>  static void dissolve_free_huge_page(struct page *page)
>>  {
>> +	struct page *head = compound_head(page);
>> +	struct hstate *h = page_hstate(head);
>> +	int nid = page_to_nid(head);
>> +
>>  	spin_lock(&hugetlb_lock);
>> -	if (PageHuge(page) && !page_count(page)) {
>> -		struct hstate *h = page_hstate(page);
>> -		int nid = page_to_nid(page);
>> -		list_del(&page->lru);
>> -		h->free_huge_pages--;
>> -		h->free_huge_pages_node[nid]--;
>> -		h->max_huge_pages--;
>> -		update_and_free_page(h, page);
>> -	}
>> +	list_del(&head->lru);
>> +	h->free_huge_pages--;
>> +	h->free_huge_pages_node[nid]--;
>> +	h->max_huge_pages--;
>> +	update_and_free_page(h, head);
>>  	spin_unlock(&hugetlb_lock);
>>  }
> 
> Do you need to revalidate anything once you acquire the lock?  Can this,
> for instance, race with another thread doing vm.nr_hugepages=0?  Or a
> thread faulting in and allocating the large page that's being dissolved?

I originally suggested the locking change, but this is not quite right.
The page count for huge pages is adjusted while holding hugetlb_lock.
So, that check or a revalidation needs to be done while holding the lock.

That question made me think about huge page reservations.  I don't think
the offline code takes this into account.  But, you would not want your
huge page count to drop below the reserved huge page count
(resv_huge_pages).
So, shouldn't this be another condition to check before allowing the huge
page to be dissolved?

-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/1] mm/hugetlb: fix memory offline with hugepage size > memory block size
  2016-09-22  9:51       ` Michal Hocko
  2016-09-22 13:45         ` Gerald Schaefer
@ 2016-09-23  6:40         ` Rui Teng
  2016-09-23 11:03           ` Gerald Schaefer
  1 sibling, 1 reply; 24+ messages in thread
From: Rui Teng @ 2016-09-23  6:40 UTC (permalink / raw)
  To: Michal Hocko, Gerald Schaefer
  Cc: Andrew Morton, Naoya Horiguchi, Hillf Danton, linux-mm,
	linux-kernel, Kirill A . Shutemov, Vlastimil Babka, Mike Kravetz,
	Aneesh Kumar K . V, Martin Schwidefsky, Heiko Carstens,
	Dave Hansen

On 9/22/16 5:51 PM, Michal Hocko wrote:
> On Wed 21-09-16 14:35:34, Gerald Schaefer wrote:
>> dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a
>> list corruption and addressing exception when trying to set a memory
>> block offline that is part (but not the first part) of a hugetlb page
>> with a size > memory block size.
>>
>> When no other smaller hugetlb page sizes are present, the VM_BUG_ON()
>> will trigger directly. In the other case we will run into an addressing
>> exception later, because dissolve_free_huge_page() will not work on the
>> head page of the compound hugetlb page which will result in a NULL
>> hstate from page_hstate().
>>
>> To fix this, first remove the VM_BUG_ON() because it is wrong, and then
>> use the compound head page in dissolve_free_huge_page().
>
> OK, so dissolve_free_huge_page will now work also on tail pages, which
> makes some sense. I would also appreciate a few words on why we want to
> sacrifice something as precious as a gigantic page rather than fail the
> page block offline. Dave pointed out the DIMM offline usecase, for example.
>
>> Also change locking in dissolve_free_huge_page(), so that it only takes
>> the lock when actually removing a hugepage.
>
> From a quick look it seems this has been broken since it was introduced
> by c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle
> hugepage"). Do we want to have this backported to stable? In any case, a
> Fixes: SHA1 tag would be really nice.
>

The huge page hot-plug function was introduced by c8721bbbdd36, and its
commit message already indicated that gigantic pages are not supported:

	"As for larger hugepages (1GB for x86_64), it's not easy to do
	hotremove over them because it's larger than memory block.  So
	we now simply leave it to fail as it is."

Is it possible that gigantic page hot-plug has never been supported?

I made another patch for this problem, and also tried to apply the
first version of this patch on my system. But they only postpone the
error: HugePages_Free changes from 2 to 1 if I offline a huge page. I
think there is no correct rollback.

# cat /proc/meminfo | grep -i huge
AnonHugePages:         0 kB
HugePages_Total:       2
HugePages_Free:        1
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:   16777216 kB

I will do more testing on it, but can anyone confirm that this function
has been implemented and tested before?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3] mm/hugetlb: fix memory offline with hugepage size > memory block size
  2016-09-22 18:12             ` Dave Hansen
  2016-09-22 19:13               ` Mike Kravetz
@ 2016-09-23 10:36               ` Gerald Schaefer
  1 sibling, 0 replies; 24+ messages in thread
From: Gerald Schaefer @ 2016-09-23 10:36 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andrew Morton, Michal Hocko, Naoya Horiguchi, Hillf Danton,
	linux-mm, linux-kernel, Kirill A . Shutemov, Vlastimil Babka,
	Mike Kravetz, Aneesh Kumar K . V, Martin Schwidefsky,
	Heiko Carstens, Rui Teng

On Thu, 22 Sep 2016 11:12:06 -0700
Dave Hansen <dave.hansen@linux.intel.com> wrote:

> On 09/22/2016 09:29 AM, Gerald Schaefer wrote:
> >  static void dissolve_free_huge_page(struct page *page)
> >  {
> > +	struct page *head = compound_head(page);
> > +	struct hstate *h = page_hstate(head);
> > +	int nid = page_to_nid(head);
> > +
> >  	spin_lock(&hugetlb_lock);
> > -	if (PageHuge(page) && !page_count(page)) {
> > -		struct hstate *h = page_hstate(page);
> > -		int nid = page_to_nid(page);
> > -		list_del(&page->lru);
> > -		h->free_huge_pages--;
> > -		h->free_huge_pages_node[nid]--;
> > -		h->max_huge_pages--;
> > -		update_and_free_page(h, page);
> > -	}
> > +	list_del(&head->lru);
> > +	h->free_huge_pages--;
> > +	h->free_huge_pages_node[nid]--;
> > +	h->max_huge_pages--;
> > +	update_and_free_page(h, head);
> >  	spin_unlock(&hugetlb_lock);
> >  }
> 
> Do you need to revalidate anything once you acquire the lock?  Can this,
> for instance, race with another thread doing vm.nr_hugepages=0?  Or a
> thread faulting in and allocating the large page that's being dissolved?
> 

Yes, good point. I was relying on the range being isolated, but that only
seems to be checked in dequeue_huge_page_node(), as introduced with the
original commit. So this would only protect against anyone allocating the
hugepage at this point. This is also somehow expected, since we already
are beyond the "point of no return" in offline_pages().

vm.nr_hugepages=0 seems to be an issue though, as set_max_hugepages()
will not care about isolation, so I guess we could have a race here
and double-free the hugepage. Revalidation of at least PageHuge() after
taking the lock should protect against that; I'm not sure about
page_count(), but I think I'll just check both, which will give the same
behaviour as before.

Will send v4, after thinking a bit more on the page reservation point
brought up by Mike.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/1] mm/hugetlb: fix memory offline with hugepage size > memory block size
  2016-09-23  6:40         ` [PATCH v2 1/1] " Rui Teng
@ 2016-09-23 11:03           ` Gerald Schaefer
  2016-09-26  2:49             ` Rui Teng
  0 siblings, 1 reply; 24+ messages in thread
From: Gerald Schaefer @ 2016-09-23 11:03 UTC (permalink / raw)
  To: Rui Teng
  Cc: Michal Hocko, Andrew Morton, Naoya Horiguchi, Hillf Danton,
	linux-mm, linux-kernel, Kirill A . Shutemov, Vlastimil Babka,
	Mike Kravetz, Aneesh Kumar K . V, Martin Schwidefsky,
	Heiko Carstens, Dave Hansen

On Fri, 23 Sep 2016 14:40:33 +0800
Rui Teng <rui.teng@linux.vnet.ibm.com> wrote:

> On 9/22/16 5:51 PM, Michal Hocko wrote:
> > On Wed 21-09-16 14:35:34, Gerald Schaefer wrote:
> >> dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a
> >> list corruption and addressing exception when trying to set a memory
> >> block offline that is part (but not the first part) of a hugetlb page
> >> with a size > memory block size.
> >>
> >> When no other smaller hugetlb page sizes are present, the VM_BUG_ON()
> >> will trigger directly. In the other case we will run into an addressing
> >> exception later, because dissolve_free_huge_page() will not work on the
> >> head page of the compound hugetlb page which will result in a NULL
> >> hstate from page_hstate().
> >>
> >> To fix this, first remove the VM_BUG_ON() because it is wrong, and then
> >> use the compound head page in dissolve_free_huge_page().
> >
> > OK so dissolve_free_huge_page will work also on tail pages now which
> > makes some sense. I would appreciate also few words why do we want to
> > sacrifice something as precious as gigantic page rather than fail the
> > page block offline. Dave pointed out dim offline usecase for example.
> >
> >> Also change locking in dissolve_free_huge_page(), so that it only takes
> >> the lock when actually removing a hugepage.
> >
> > From a quick look it seems this has been broken since introduced by
> > c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle
> > hugepage"). Do we want to have this backported to stable? In any way
> > Fixes: SHA1 would be really nice.
> >
> 
> The huge page hot-plug function was introduced by c8721bbbdd36, and its
> commit message already indicated that gigantic pages are not supported:
> 
> 	"As for larger hugepages (1GB for x86_64), it's not easy to do
> 	hotremove over them because it's larger than memory block.  So
> 	we now simply leave it to fail as it is."
> 
> Is it possible that gigantic page hot-plug has never been supported?

Offlining blocks with gigantic pages only fails when they are in use;
I guess that is what the description meant. Maybe it was also meant to
fail in any case, but that is not what the patch did.

With free gigantic pages, it looks like it only ever worked when
offlining the first block of a gigantic page. And as long as you only
have gigantic pages, the VM_BUG_ON() would actually have triggered on
every block that is not gigantic-page-aligned, even if the block is not
part of any gigantic page at all.

Given the age of the patch it is a little bit surprising that it never
struck anyone, and that we now have found it on two architectures at
once :-)

> 
> I made another patch for this problem, and also tried to apply the
> first version of this patch on my system. But they only postpone the
> error: HugePages_Free changes from 2 to 1 if I offline a huge page. I
> think there is no correct rollback.
> 
> # cat /proc/meminfo | grep -i huge
> AnonHugePages:         0 kB
> HugePages_Total:       2
> HugePages_Free:        1
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:   16777216 kB

HugePages_Free is supposed to be reduced when offlining a block, but
then HugePages_Total should also be reduced, so that is strange. On my
system both were reduced. Does this happen with any version of my patch?

What do you mean by "postpone the error"? Can you reproduce the BUG_ON
or the addressing exception with my patch?

> 
> I will do more testing on it, but can anyone confirm that this
> function has been implemented and tested before?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/1] mm/hugetlb: fix memory offline with hugepage size > memory block size
  2016-09-23 11:03           ` Gerald Schaefer
@ 2016-09-26  2:49             ` Rui Teng
  0 siblings, 0 replies; 24+ messages in thread
From: Rui Teng @ 2016-09-26  2:49 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: Michal Hocko, Andrew Morton, Naoya Horiguchi, Hillf Danton,
	linux-mm, linux-kernel, Kirill A . Shutemov, Vlastimil Babka,
	Mike Kravetz, Aneesh Kumar K . V, Martin Schwidefsky,
	Heiko Carstens, Dave Hansen

On 9/23/16 7:03 PM, Gerald Schaefer wrote:
> On Fri, 23 Sep 2016 14:40:33 +0800
> Rui Teng <rui.teng@linux.vnet.ibm.com> wrote:
>
>> On 9/22/16 5:51 PM, Michal Hocko wrote:
>>> On Wed 21-09-16 14:35:34, Gerald Schaefer wrote:
>>>> dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a
>>>> list corruption and addressing exception when trying to set a memory
>>>> block offline that is part (but not the first part) of a hugetlb page
>>>> with a size > memory block size.
>>>>
>>>> When no other smaller hugetlb page sizes are present, the VM_BUG_ON()
>>>> will trigger directly. In the other case we will run into an addressing
>>>> exception later, because dissolve_free_huge_page() will not work on the
>>>> head page of the compound hugetlb page which will result in a NULL
>>>> hstate from page_hstate().
>>>>
>>>> To fix this, first remove the VM_BUG_ON() because it is wrong, and then
>>>> use the compound head page in dissolve_free_huge_page().
>>>
>>> OK, so dissolve_free_huge_page will now also work on tail pages, which
>>> makes some sense. I would also appreciate a few words on why we want to
>>> sacrifice something as precious as a gigantic page rather than fail the
>>> page block offline. Dave pointed out the DIMM offline use case, for example.
>>>
>>>> Also change locking in dissolve_free_huge_page(), so that it only takes
>>>> the lock when actually removing a hugepage.
>>>
>>> From a quick look it seems this has been broken since it was introduced
>>> by c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle
>>> hugepage"). Do we want to have this backported to stable? In any case, a
>>> Fixes: SHA1 tag would be really nice.
>>>
>>
>> The huge page hot-plug function was introduced by c8721bbbdd36, and its
>> description already indicated that gigantic pages are not supported:
>>
>> 	"As for larger hugepages (1GB for x86_64), it's not easy to do
>> 	hotremove over them because it's larger than memory block.  So
>> 	we now simply leave it to fail as it is."
>>
>> Is it possible that gigantic page hot-plug has never been
>> supported?
>
> Offlining blocks with gigantic pages only fails when they are in use;
> I guess that is what the description meant. Maybe it was also meant to
> fail in any case, but that is not what the patch did.
>
> With free gigantic pages, it looks like it only ever worked when
> offlining the first block of a gigantic page. And as long as you only
> have gigantic pages, the VM_BUG_ON() would actually have triggered on
> every block that is not gigantic-page-aligned, even if the block is not
> part of any gigantic page at all.

I have not hit the VM_BUG_ON() issue on my powerpc machine. It seems
the alignment issue does not always occur on other architectures.

>
> Given the age of the patch it is a little surprising that nobody has
> hit this before, and that we have now found it on two architectures at
> once :-)
>
>>
>> I made another patch for this problem, and also tried to apply the
>> first version of this patch on my system. But they only postpone when
>> the error happens: HugePages_Free changes from 2 to 1 if I offline a
>> huge page. I think it does not roll back correctly.
>>
>> # cat /proc/meminfo | grep -i huge
>> AnonHugePages:         0 kB
>> HugePages_Total:       2
>> HugePages_Free:        1
>> HugePages_Rsvd:        0
>> HugePages_Surp:        0
>> Hugepagesize:   16777216 kB
>
> HugePages_Free is supposed to be reduced when offlining a block, but
> then HugePages_Total should also be reduced, so that is strange. On my
> system both were reduced. Does this happen with any version of my patch?

No, I only tested your first version. I have no concerns about your
patch itself, because the error was not introduced by your patch.

>
> What do you mean by "postpone the error"? Can you reproduce the BUG_ON
> or the addressing exception with my patch?

I mean that offlining gigantic pages does not work at all in my
environment, even when the correct head page has been found. My method
was to filter all the tail pages out, while your method is to find the
head page from a tail page.
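(As a crude illustration of the difference, here is a userspace sketch,
assuming only the usual compound-page convention that tail pages record a
pointer to their head; this is not the real struct page layout:)

```c
#include <assert.h>
#include <stddef.h>

/*
 * Crude userspace model of a page in a compound (gigantic) hugepage:
 * tail pages record a pointer to the head page, the head records NULL.
 */
struct page {
	struct page *head;	/* NULL if this page is itself a head */
};

/*
 * "Find the head page from a tail page": resolve any page of the
 * compound hugepage to its head before dissolving, instead of
 * filtering the tail pages out and skipping them.
 */
static struct page *compound_head(struct page *page)
{
	return page->head ? page->head : page;
}
```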

Since you can offline a gigantic page successfully, I think this
function is supported now. I will debug the problem in my environment.

>
>>
>> I will do more testing on it, but can anyone confirm that this
>> function has been implemented and tested before?

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2016-09-26  2:49 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
2016-09-20 15:53 [PATCH 0/1] memory offline issues with hugepage size > memory block size Gerald Schaefer
2016-09-20 15:53 ` [PATCH 1/1] mm/hugetlb: fix memory offline " Gerald Schaefer
2016-09-21  6:29   ` Hillf Danton
2016-09-21 12:35     ` [PATCH v2 " Gerald Schaefer
2016-09-21 13:17       ` Rui Teng
2016-09-21 15:13         ` Gerald Schaefer
2016-09-22  7:58       ` Hillf Danton
2016-09-22  9:51       ` Michal Hocko
2016-09-22 13:45         ` Gerald Schaefer
2016-09-22 16:29           ` [PATCH v3] " Gerald Schaefer
2016-09-22 18:12             ` Dave Hansen
2016-09-22 19:13               ` Mike Kravetz
2016-09-23 10:36               ` Gerald Schaefer
2016-09-23  6:40         ` [PATCH v2 1/1] " Rui Teng
2016-09-23 11:03           ` Gerald Schaefer
2016-09-26  2:49             ` Rui Teng
2016-09-20 17:37 ` [PATCH 0/1] memory offline issues " Mike Kravetz
2016-09-20 17:45   ` Dave Hansen
2016-09-21  9:49     ` Vlastimil Babka
2016-09-21 10:34     ` Gerald Schaefer
2016-09-21 10:30   ` Gerald Schaefer
2016-09-21 18:20   ` Michal Hocko
2016-09-21 18:27     ` Dave Hansen
2016-09-21 19:22       ` Michal Hocko
