* [PATCH v2 1/1] mm/hugetlb: fix memory offline with hugepage size > memory block size
@ 2016-09-21 12:35 ` Gerald Schaefer
0 siblings, 0 replies; 48+ messages in thread
From: Gerald Schaefer @ 2016-09-21 12:35 UTC (permalink / raw)
To: Andrew Morton, Naoya Horiguchi
Cc: Hillf Danton, linux-mm, linux-kernel, Michal Hocko,
Kirill A . Shutemov, Vlastimil Babka, Mike Kravetz,
Aneesh Kumar K . V, Martin Schwidefsky, Heiko Carstens,
Dave Hansen, Rui Teng, Gerald Schaefer
dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a
list corruption and addressing exception when trying to set a memory
block offline that is part (but not the first part) of a hugetlb page
with a size > memory block size.
When no other smaller hugetlb page sizes are present, the VM_BUG_ON()
will trigger directly. In the other case we will run into an addressing
exception later, because dissolve_free_huge_page() will not work on the
head page of the compound hugetlb page which will result in a NULL
hstate from page_hstate().
To fix this, first remove the VM_BUG_ON() because it is wrong, and then
use the compound head page in dissolve_free_huge_page().
Also change locking in dissolve_free_huge_page(), so that it only takes
the lock when actually removing a hugepage.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
---
Changes in v2:
- Update comment in dissolve_free_huge_pages()
- Change locking in dissolve_free_huge_page()
mm/hugetlb.c | 31 +++++++++++++++++++------------
1 file changed, 19 insertions(+), 12 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 87e11d8..1522af8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1441,23 +1441,30 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
*/
static void dissolve_free_huge_page(struct page *page)
{
+ struct page *head = compound_head(page);
+ struct hstate *h;
+ int nid;
+
+ if (page_count(head))
+ return;
+
+ h = page_hstate(head);
+ nid = page_to_nid(head);
+
spin_lock(&hugetlb_lock);
- if (PageHuge(page) && !page_count(page)) {
- struct hstate *h = page_hstate(page);
- int nid = page_to_nid(page);
- list_del(&page->lru);
- h->free_huge_pages--;
- h->free_huge_pages_node[nid]--;
- h->max_huge_pages--;
- update_and_free_page(h, page);
- }
+ list_del(&head->lru);
+ h->free_huge_pages--;
+ h->free_huge_pages_node[nid]--;
+ h->max_huge_pages--;
+ update_and_free_page(h, head);
spin_unlock(&hugetlb_lock);
}
/*
* Dissolve free hugepages in a given pfn range. Used by memory hotplug to
* make specified memory blocks removable from the system.
- * Note that start_pfn should aligned with (minimum) hugepage size.
+ * Note that this will dissolve a free gigantic hugepage completely, if any
+ * part of it lies within the given range.
*/
void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
{
@@ -1466,9 +1473,9 @@ void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
if (!hugepages_supported())
return;
- VM_BUG_ON(!IS_ALIGNED(start_pfn, 1 << minimum_order));
for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order)
- dissolve_free_huge_page(pfn_to_page(pfn));
+ if (PageHuge(pfn_to_page(pfn)))
+ dissolve_free_huge_page(pfn_to_page(pfn));
}
/*
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [PATCH v2 1/1] mm/hugetlb: fix memory offline with hugepage size > memory block size
From: Rui Teng @ 2016-09-21 13:17 UTC (permalink / raw)
To: Gerald Schaefer, Andrew Morton, Naoya Horiguchi
Cc: Hillf Danton, linux-mm, linux-kernel, Michal Hocko,
Kirill A . Shutemov, Vlastimil Babka, Mike Kravetz,
Aneesh Kumar K . V, Martin Schwidefsky, Heiko Carstens,
Dave Hansen
On 9/21/16 8:35 PM, Gerald Schaefer wrote:
> dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a
> list corruption and addressing exception when trying to set a memory
> block offline that is part (but not the first part) of a hugetlb page
> with a size > memory block size.
>
> When no other smaller hugetlb page sizes are present, the VM_BUG_ON()
> will trigger directly. In the other case we will run into an addressing
> exception later, because dissolve_free_huge_page() will not work on the
> head page of the compound hugetlb page which will result in a NULL
> hstate from page_hstate().
>
> To fix this, first remove the VM_BUG_ON() because it is wrong, and then
> use the compound head page in dissolve_free_huge_page().
>
> Also change locking in dissolve_free_huge_page(), so that it only takes
> the lock when actually removing a hugepage.
>
> Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
> ---
> Changes in v2:
> - Update comment in dissolve_free_huge_pages()
> - Change locking in dissolve_free_huge_page()
>
> mm/hugetlb.c | 31 +++++++++++++++++++------------
> 1 file changed, 19 insertions(+), 12 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 87e11d8..1522af8 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1441,23 +1441,30 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
> */
> static void dissolve_free_huge_page(struct page *page)
> {
> + struct page *head = compound_head(page);
> + struct hstate *h;
> + int nid;
> +
> + if (page_count(head))
> + return;
> +
> + h = page_hstate(head);
> + nid = page_to_nid(head);
> +
> spin_lock(&hugetlb_lock);
> - if (PageHuge(page) && !page_count(page)) {
> - struct hstate *h = page_hstate(page);
> - int nid = page_to_nid(page);
> - list_del(&page->lru);
> - h->free_huge_pages--;
> - h->free_huge_pages_node[nid]--;
> - h->max_huge_pages--;
> - update_and_free_page(h, page);
> - }
> + list_del(&head->lru);
> + h->free_huge_pages--;
> + h->free_huge_pages_node[nid]--;
> + h->max_huge_pages--;
> + update_and_free_page(h, head);
> spin_unlock(&hugetlb_lock);
> }
>
> /*
> * Dissolve free hugepages in a given pfn range. Used by memory hotplug to
> * make specified memory blocks removable from the system.
> - * Note that start_pfn should aligned with (minimum) hugepage size.
> + * Note that this will dissolve a free gigantic hugepage completely, if any
> + * part of it lies within the given range.
> */
> void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
> {
> @@ -1466,9 +1473,9 @@ void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
> if (!hugepages_supported())
> return;
>
> - VM_BUG_ON(!IS_ALIGNED(start_pfn, 1 << minimum_order));
> for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order)
> - dissolve_free_huge_page(pfn_to_page(pfn));
> + if (PageHuge(pfn_to_page(pfn)))
> + dissolve_free_huge_page(pfn_to_page(pfn));
How many times will dissolve_free_huge_page() be invoked in this loop?
For each pfn, it will be converted to the head page, and then the list
will be deleted repeatedly.
> }
>
> /*
>
* Re: [PATCH v2 1/1] mm/hugetlb: fix memory offline with hugepage size > memory block size
From: Gerald Schaefer @ 2016-09-21 15:13 UTC (permalink / raw)
To: Rui Teng
Cc: Andrew Morton, Naoya Horiguchi, Hillf Danton, linux-mm,
linux-kernel, Michal Hocko, Kirill A . Shutemov, Vlastimil Babka,
Mike Kravetz, Aneesh Kumar K . V, Martin Schwidefsky,
Heiko Carstens, Dave Hansen
On Wed, 21 Sep 2016 21:17:29 +0800
Rui Teng <rui.teng@linux.vnet.ibm.com> wrote:
> > /*
> > * Dissolve free hugepages in a given pfn range. Used by memory hotplug to
> > * make specified memory blocks removable from the system.
> > - * Note that start_pfn should aligned with (minimum) hugepage size.
> > + * Note that this will dissolve a free gigantic hugepage completely, if any
> > + * part of it lies within the given range.
> > */
> > void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
> > {
> > @@ -1466,9 +1473,9 @@ void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
> > if (!hugepages_supported())
> > return;
> >
> > - VM_BUG_ON(!IS_ALIGNED(start_pfn, 1 << minimum_order));
> > for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order)
> > - dissolve_free_huge_page(pfn_to_page(pfn));
> > + if (PageHuge(pfn_to_page(pfn)))
> > + dissolve_free_huge_page(pfn_to_page(pfn));
> How many times will dissolve_free_huge_page() be invoked in this loop?
> For each pfn, it will be converted to the head page, and then the list
> will be deleted repeatedly.
In the case where the memory block [start_pfn, end_pfn] is part of a
gigantic hugepage, dissolve_free_huge_page() will only be invoked once.
If there is only one gigantic hugepage pool, 1 << minimum_order will be
larger than the memory block size, and the loop will stop after the first
invocation of dissolve_free_huge_page().

If there are additional hugepage pools, with hugepage sizes < memory
block size, then it will loop as many times as 1 << minimum_order fits
inside a memory block, e.g. 256 times with 1 MB minimum hugepage size
and 256 MB memory block size.

However, the PageHuge() check should always return false after the first
invocation of dissolve_free_huge_page(), since update_and_free_page()
will take care of resetting compound_dtor, and so there will also be
just one invocation.

The only case where there will be more than one invocation is the case
where we do not have any part of a gigantic hugepage inside the memory
block, but rather multiple "normal sized" hugepages. Then there will be
one invocation per hugepage, as opposed to one invocation per
"1 << minimum_order" range as it was before the patch. So it also
improves the behaviour in the case where there is no gigantic page
involved.
> > }
> >
> > /*
> >
>
* Re: [PATCH v2 1/1] mm/hugetlb: fix memory offline with hugepage size > memory block size
From: Hillf Danton @ 2016-09-22 7:58 UTC (permalink / raw)
To: 'Gerald Schaefer', 'Andrew Morton',
'Naoya Horiguchi'
Cc: linux-mm, linux-kernel, 'Michal Hocko',
'Kirill A . Shutemov', 'Vlastimil Babka',
'Mike Kravetz', 'Aneesh Kumar K . V',
'Martin Schwidefsky', 'Heiko Carstens',
'Dave Hansen', 'Rui Teng'
>
> dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a
> list corruption and addressing exception when trying to set a memory
> block offline that is part (but not the first part) of a hugetlb page
> with a size > memory block size.
>
> When no other smaller hugetlb page sizes are present, the VM_BUG_ON()
> will trigger directly. In the other case we will run into an addressing
> exception later, because dissolve_free_huge_page() will not work on the
> head page of the compound hugetlb page which will result in a NULL
> hstate from page_hstate().
>
> To fix this, first remove the VM_BUG_ON() because it is wrong, and then
> use the compound head page in dissolve_free_huge_page().
>
> Also change locking in dissolve_free_huge_page(), so that it only takes
> the lock when actually removing a hugepage.
>
> Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
> ---
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
* Re: [PATCH v2 1/1] mm/hugetlb: fix memory offline with hugepage size > memory block size
From: Michal Hocko @ 2016-09-22 9:51 UTC (permalink / raw)
To: Gerald Schaefer
Cc: Andrew Morton, Naoya Horiguchi, Hillf Danton, linux-mm,
linux-kernel, Kirill A . Shutemov, Vlastimil Babka, Mike Kravetz,
Aneesh Kumar K . V, Martin Schwidefsky, Heiko Carstens,
Dave Hansen, Rui Teng
On Wed 21-09-16 14:35:34, Gerald Schaefer wrote:
> dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a
> list corruption and addressing exception when trying to set a memory
> block offline that is part (but not the first part) of a hugetlb page
> with a size > memory block size.
>
> When no other smaller hugetlb page sizes are present, the VM_BUG_ON()
> will trigger directly. In the other case we will run into an addressing
> exception later, because dissolve_free_huge_page() will not work on the
> head page of the compound hugetlb page which will result in a NULL
> hstate from page_hstate().
>
> To fix this, first remove the VM_BUG_ON() because it is wrong, and then
> use the compound head page in dissolve_free_huge_page().
OK, so dissolve_free_huge_page() will now also work on tail pages, which
makes some sense. I would also appreciate a few words on why we want to
sacrifice something as precious as a gigantic page rather than fail the
page block offline. Dave pointed out the DIMM offline use case, for example.
> Also change locking in dissolve_free_huge_page(), so that it only takes
> the lock when actually removing a hugepage.
From a quick look it seems this has been broken since it was introduced by
c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle
hugepage"). Do we want to have this backported to stable? In any way
Fixes: SHA1 would be really nice.
> Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Other than that looks good to me, although there is a room for
improvements here. See below
> ---
> Changes in v2:
> - Update comment in dissolve_free_huge_pages()
> - Change locking in dissolve_free_huge_page()
>
> mm/hugetlb.c | 31 +++++++++++++++++++------------
> 1 file changed, 19 insertions(+), 12 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 87e11d8..1522af8 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1441,23 +1441,30 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
> */
> static void dissolve_free_huge_page(struct page *page)
> {
> + struct page *head = compound_head(page);
> + struct hstate *h;
> + int nid;
> +
> + if (page_count(head))
> + return;
> +
> + h = page_hstate(head);
> + nid = page_to_nid(head);
> +
> spin_lock(&hugetlb_lock);
> - if (PageHuge(page) && !page_count(page)) {
> - struct hstate *h = page_hstate(page);
> - int nid = page_to_nid(page);
> - list_del(&page->lru);
> - h->free_huge_pages--;
> - h->free_huge_pages_node[nid]--;
> - h->max_huge_pages--;
> - update_and_free_page(h, page);
> - }
> + list_del(&head->lru);
> + h->free_huge_pages--;
> + h->free_huge_pages_node[nid]--;
> + h->max_huge_pages--;
> + update_and_free_page(h, head);
> spin_unlock(&hugetlb_lock);
> }
>
> /*
> * Dissolve free hugepages in a given pfn range. Used by memory hotplug to
> * make specified memory blocks removable from the system.
> - * Note that start_pfn should aligned with (minimum) hugepage size.
> + * Note that this will dissolve a free gigantic hugepage completely, if any
> + * part of it lies within the given range.
> */
> void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
> {
> @@ -1466,9 +1473,9 @@ void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
> if (!hugepages_supported())
> return;
>
> - VM_BUG_ON(!IS_ALIGNED(start_pfn, 1 << minimum_order));
> for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order)
> - dissolve_free_huge_page(pfn_to_page(pfn));
> + if (PageHuge(pfn_to_page(pfn)))
> + dissolve_free_huge_page(pfn_to_page(pfn));
> }
We can return the number of freed pages from dissolve_free_huge_page() and
advance by the appropriate number of pfns. Nothing to really lose sleep
about but no rocket science either. An early break out if the page is
used would be nice as well. Something like the following, probably a
separate patch on top of yours.
---
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 029a80b90cea..d230900f571e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1434,17 +1434,17 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
}
/*
- * Dissolve a given free hugepage into free buddy pages. This function does
- * nothing for in-use (including surplus) hugepages.
+ * Dissolve a given free hugepage into free buddy pages. Returns number
+ * of freed pages or EBUSY if the page is in use.
*/
-static void dissolve_free_huge_page(struct page *page)
+static int dissolve_free_huge_page(struct page *page)
{
struct page *head = compound_head(page);
struct hstate *h;
int nid;
if (page_count(head))
- return;
+ return -EBUSY;
h = page_hstate(head);
nid = page_to_nid(head);
@@ -1456,6 +1456,8 @@ static void dissolve_free_huge_page(struct page *page)
h->max_huge_pages--;
update_and_free_page(h, head);
spin_unlock(&hugetlb_lock);
+
+ return 1 << h->order;
}
/*
@@ -1471,9 +1473,18 @@ void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
if (!hugepages_supported())
return;
- for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order)
- if (PageHuge(pfn_to_page(pfn)))
- dissolve_free_huge_page(pfn_to_page(pfn));
+ for (pfn = start_pfn; pfn < end_pfn; ) {
+ int nr_pages;
+
+ if (!PageHuge(pfn_to_page(pfn))) {
+ pfn += 1 << minimum_order;
+ continue;
+ }
+
+ nr_pages = dissolve_free_huge_page(pfn_to_page(pfn));
+ if (nr_pages < 0)
+ break;
+ pfn += nr_pages;
+ }
}
/*
--
Michal Hocko
SUSE Labs
* Re: [PATCH v2 1/1] mm/hugetlb: fix memory offline with hugepage size > memory block size
@ 2016-09-22 9:51 ` Michal Hocko
0 siblings, 0 replies; 48+ messages in thread
From: Michal Hocko @ 2016-09-22 9:51 UTC (permalink / raw)
To: Gerald Schaefer
Cc: Andrew Morton, Naoya Horiguchi, Hillf Danton, linux-mm,
linux-kernel, Kirill A . Shutemov, Vlastimil Babka, Mike Kravetz,
Aneesh Kumar K . V, Martin Schwidefsky, Heiko Carstens,
Dave Hansen, Rui Teng
On Wed 21-09-16 14:35:34, Gerald Schaefer wrote:
> dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a
> list corruption and addressing exception when trying to set a memory
> block offline that is part (but not the first part) of a hugetlb page
> with a size > memory block size.
>
> When no other smaller hugetlb page sizes are present, the VM_BUG_ON()
> will trigger directly. In the other case we will run into an addressing
> exception later, because dissolve_free_huge_page() will not work on the
> head page of the compound hugetlb page which will result in a NULL
> hstate from page_hstate().
>
> To fix this, first remove the VM_BUG_ON() because it is wrong, and then
> use the compound head page in dissolve_free_huge_page().
OK so dissolve_free_huge_page will work also on tail pages now which
makes some sense. I would appreciate also few words why do we want to
sacrifice something as precious as gigantic page rather than fail the
page block offline. Dave pointed out dim offline usecase for example.
> Also change locking in dissolve_free_huge_page(), so that it only takes
> the lock when actually removing a hugepage.
>From a quick look it seems this has been broken since introduced by
c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle
hugepage"). Do we want to have this backported to stable? In any way
Fixes: SHA1 would be really nice.
> Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Other than that looks good to me, although there is a room for
improvements here. See below
> ---
> Changes in v2:
> - Update comment in dissolve_free_huge_pages()
> - Change locking in dissolve_free_huge_page()
>
> mm/hugetlb.c | 31 +++++++++++++++++++------------
> 1 file changed, 19 insertions(+), 12 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 87e11d8..1522af8 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1441,23 +1441,30 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
> */
> static void dissolve_free_huge_page(struct page *page)
> {
> + struct page *head = compound_head(page);
> + struct hstate *h;
> + int nid;
> +
> + if (page_count(head))
> + return;
> +
> + h = page_hstate(head);
> + nid = page_to_nid(head);
> +
> spin_lock(&hugetlb_lock);
> - if (PageHuge(page) && !page_count(page)) {
> - struct hstate *h = page_hstate(page);
> - int nid = page_to_nid(page);
> - list_del(&page->lru);
> - h->free_huge_pages--;
> - h->free_huge_pages_node[nid]--;
> - h->max_huge_pages--;
> - update_and_free_page(h, page);
> - }
> + list_del(&head->lru);
> + h->free_huge_pages--;
> + h->free_huge_pages_node[nid]--;
> + h->max_huge_pages--;
> + update_and_free_page(h, head);
> spin_unlock(&hugetlb_lock);
> }
>
> /*
> * Dissolve free hugepages in a given pfn range. Used by memory hotplug to
> * make specified memory blocks removable from the system.
> - * Note that start_pfn should aligned with (minimum) hugepage size.
> + * Note that this will dissolve a free gigantic hugepage completely, if any
> + * part of it lies within the given range.
> */
> void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
> {
> @@ -1466,9 +1473,9 @@ void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
> if (!hugepages_supported())
> return;
>
> - VM_BUG_ON(!IS_ALIGNED(start_pfn, 1 << minimum_order));
> for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order)
> - dissolve_free_huge_page(pfn_to_page(pfn));
> + if (PageHuge(pfn_to_page(pfn)))
> + dissolve_free_huge_page(pfn_to_page(pfn));
> }
we can return the number of freed pages from dissolve_free_huge_page and
move by the approapriate number of pfns. Nothing to really lose sleep
about but no rocket science either. An early break out if the page is
used would be nice as well. Something like the following, probably a
separate patch on top of yours.
---
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 029a80b90cea..d230900f571e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1434,17 +1434,17 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
}
/*
- * Dissolve a given free hugepage into free buddy pages. This function does
- * nothing for in-use (including surplus) hugepages.
+ * Dissolve a given free hugepage into free buddy pages. Returns number
+ * of freed pages or EBUSY if the page is in use.
*/
-static void dissolve_free_huge_page(struct page *page)
+static int dissolve_free_huge_page(struct page *page)
{
struct page *head = compound_head(page);
struct hstate *h;
int nid;
if (page_count(head))
- return;
+ return -EBUSY;
h = page_hstate(head);
nid = page_to_nid(head);
@@ -1456,6 +1456,8 @@ static void dissolve_free_huge_page(struct page *page)
h->max_huge_pages--;
update_and_free_page(h, head);
spin_unlock(&hugetlb_lock);
+
+ return 1 << h->order;
}
/*
@@ -1471,9 +1473,18 @@ void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
if (!hugepages_supported())
return;
- for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order)
- if (PageHuge(pfn_to_page(pfn)))
- dissolve_free_huge_page(pfn_to_page(pfn));
+ for (pfn = start_pfn; pfn < end_pfn; )
+ int nr_pages;
+
+ if (!PageHuge(pfn_to_page(pfn))) {
+ pfn += 1 << minimum_order;
+ continue;
+ }
+
+ nr_pages = dissolve_free_huge_page(pfn_to_page(pfn));
+ if (IS_ERR(nr_pages))
+ break;
+ pfn += nr_pages;
}
/*
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [PATCH v2 1/1] mm/hugetlb: fix memory offline with hugepage size > memory block size
2016-09-22 9:51 ` Michal Hocko
@ 2016-09-22 13:45 ` Gerald Schaefer
-1 siblings, 0 replies; 48+ messages in thread
From: Gerald Schaefer @ 2016-09-22 13:45 UTC (permalink / raw)
To: Michal Hocko
Cc: Andrew Morton, Naoya Horiguchi, Hillf Danton, linux-mm,
linux-kernel, Kirill A . Shutemov, Vlastimil Babka, Mike Kravetz,
Aneesh Kumar K . V, Martin Schwidefsky, Heiko Carstens,
Dave Hansen, Rui Teng, Gerald Schaefer
On Thu, 22 Sep 2016 11:51:37 +0200
Michal Hocko <mhocko@kernel.org> wrote:
> On Wed 21-09-16 14:35:34, Gerald Schaefer wrote:
> > dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a
> > list corruption and addressing exception when trying to set a memory
> > block offline that is part (but not the first part) of a hugetlb page
> > with a size > memory block size.
> >
> > When no other smaller hugetlb page sizes are present, the VM_BUG_ON()
> > will trigger directly. In the other case we will run into an addressing
> > exception later, because dissolve_free_huge_page() will not work on the
> > head page of the compound hugetlb page which will result in a NULL
> > hstate from page_hstate().
> >
> > To fix this, first remove the VM_BUG_ON() because it is wrong, and then
> > use the compound head page in dissolve_free_huge_page().
>
> OK so dissolve_free_huge_page will work also on tail pages now, which
> makes some sense. I would also appreciate a few words on why we want to
> sacrifice something as precious as a gigantic page rather than fail the
> page block offline. Dave pointed out the DIMM offline use case, for example.
>
> > Also change locking in dissolve_free_huge_page(), so that it only takes
> > the lock when actually removing a hugepage.
>
> From a quick look it seems this has been broken since it was introduced
> by c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle
> hugepage"). Do we want to have this backported to stable? In any case a
> Fixes: SHA1 tag would be really nice.
That's true, I'll send a v3.
>
> > Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
>
> Other than that this looks good to me, although there is room for
> improvement here. See below.
>
> > ---
> > Changes in v2:
> > - Update comment in dissolve_free_huge_pages()
> > - Change locking in dissolve_free_huge_page()
> >
> > mm/hugetlb.c | 31 +++++++++++++++++++------------
> > 1 file changed, 19 insertions(+), 12 deletions(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 87e11d8..1522af8 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1441,23 +1441,30 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
> > */
> > static void dissolve_free_huge_page(struct page *page)
> > {
> > + struct page *head = compound_head(page);
> > + struct hstate *h;
> > + int nid;
> > +
> > + if (page_count(head))
> > + return;
> > +
> > + h = page_hstate(head);
> > + nid = page_to_nid(head);
> > +
> > spin_lock(&hugetlb_lock);
> > - if (PageHuge(page) && !page_count(page)) {
> > - struct hstate *h = page_hstate(page);
> > - int nid = page_to_nid(page);
> > - list_del(&page->lru);
> > - h->free_huge_pages--;
> > - h->free_huge_pages_node[nid]--;
> > - h->max_huge_pages--;
> > - update_and_free_page(h, page);
> > - }
> > + list_del(&head->lru);
> > + h->free_huge_pages--;
> > + h->free_huge_pages_node[nid]--;
> > + h->max_huge_pages--;
> > + update_and_free_page(h, head);
> > spin_unlock(&hugetlb_lock);
> > }
> >
> > /*
> > * Dissolve free hugepages in a given pfn range. Used by memory hotplug to
> > * make specified memory blocks removable from the system.
> > - * Note that start_pfn should aligned with (minimum) hugepage size.
> > + * Note that this will dissolve a free gigantic hugepage completely, if any
> > + * part of it lies within the given range.
> > */
> > void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
> > {
> > @@ -1466,9 +1473,9 @@ void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
> > if (!hugepages_supported())
> > return;
> >
> > - VM_BUG_ON(!IS_ALIGNED(start_pfn, 1 << minimum_order));
> > for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order)
> > - dissolve_free_huge_page(pfn_to_page(pfn));
> > + if (PageHuge(pfn_to_page(pfn)))
> > + dissolve_free_huge_page(pfn_to_page(pfn));
> > }
>
> we can return the number of freed pages from dissolve_free_huge_page and
> move by the appropriate number of pfns. Nothing to really lose sleep
> about but no rocket science either. An early break out if the page is
> used would be nice as well. Something like the following, probably a
> separate patch on top of yours.
Hmm, not sure if this is really worth the effort and the (small) added
complexity. It would surely be worth it for the current code, where we
also have the spinlock involved even for non-huge pages. After this patch
however, dissolve_free_huge_page() will only be called for hugepages,
and the early break-out is also there, although the page_count() check
could probably be moved out from dissolve_free_huge_page() and into the
loop; I'll try this for v3.
The loop count will also not be greatly reduced: when there are only
hugepages of minimum_order in the memory block, or no hugepages at all,
it will not improve anything. In any other case the PageHuge() check in
the loop already prevents unnecessary calls to
dissolve_free_huge_page().
> ---
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 029a80b90cea..d230900f571e 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1434,17 +1434,17 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
> }
>
> /*
> - * Dissolve a given free hugepage into free buddy pages. This function does
> - * nothing for in-use (including surplus) hugepages.
> + * Dissolve a given free hugepage into free buddy pages. Returns number
> + * of freed pages or -EBUSY if the page is in use.
> */
> -static void dissolve_free_huge_page(struct page *page)
> +static int dissolve_free_huge_page(struct page *page)
> {
> struct page *head = compound_head(page);
> struct hstate *h;
> int nid;
>
> if (page_count(head))
> - return;
> + return -EBUSY;
>
> h = page_hstate(head);
> nid = page_to_nid(head);
> @@ -1456,6 +1456,8 @@ static void dissolve_free_huge_page(struct page *page)
> h->max_huge_pages--;
> update_and_free_page(h, head);
> spin_unlock(&hugetlb_lock);
> +
> + return 1 << h->order;
> }
>
> /*
> @@ -1471,9 +1473,18 @@ void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
> if (!hugepages_supported())
> return;
>
> - for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order)
> - if (PageHuge(pfn_to_page(pfn)))
> - dissolve_free_huge_page(pfn_to_page(pfn));
> + for (pfn = start_pfn; pfn < end_pfn; ) {
> + int nr_pages;
> +
> + if (!PageHuge(pfn_to_page(pfn))) {
> + pfn += 1 << minimum_order;
> + continue;
> + }
> +
> + nr_pages = dissolve_free_huge_page(pfn_to_page(pfn));
> + if (nr_pages < 0)
> + break;
> + pfn += nr_pages;
> }
>
> /*
^ permalink raw reply [flat|nested] 48+ messages in thread
* [PATCH v3] mm/hugetlb: fix memory offline with hugepage size > memory block size
2016-09-22 13:45 ` Gerald Schaefer
@ 2016-09-22 16:29 ` Gerald Schaefer
-1 siblings, 0 replies; 48+ messages in thread
From: Gerald Schaefer @ 2016-09-22 16:29 UTC (permalink / raw)
To: Andrew Morton
Cc: Gerald Schaefer, Michal Hocko, Naoya Horiguchi, Hillf Danton,
linux-mm, linux-kernel, Kirill A . Shutemov, Vlastimil Babka,
Mike Kravetz, Aneesh Kumar K . V, Martin Schwidefsky,
Heiko Carstens, Dave Hansen, Rui Teng
dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a
list corruption and addressing exception when trying to set a memory
block offline that is part (but not the first part) of a "gigantic"
hugetlb page with a size > memory block size.
When no other smaller hugetlb page sizes are present, the VM_BUG_ON()
will trigger directly. In the other case we will run into an addressing
exception later, because dissolve_free_huge_page() will not work on the
head page of the compound hugetlb page which will result in a NULL
hstate from page_hstate().
To fix this, first remove the VM_BUG_ON() because it is wrong, and then
use the compound head page in dissolve_free_huge_page(). This means that
an unused pre-allocated gigantic page that has any part of itself inside
the memory block that is going offline will be dissolved completely.
Losing the gigantic hugepage is preferable to failing the memory offline,
for example in the situation where a (possibly faulty) memory DIMM needs
to go offline.
Also move the PageHuge() and page_count() checks out of
dissolve_free_huge_page() in order to only take the spin_lock when
actually removing a hugepage.
Fixes: c8721bbb ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
Cc: <stable@vger.kernel.org>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
---
Changes in v3:
- Add Fixes: c8721bbb
- Add Cc: stable
- Elaborate on losing the gigantic page vs. failing memory offline
- Move page_count() check out of dissolve_free_huge_page()
Changes in v2:
- Update comment in dissolve_free_huge_pages()
- Change locking in dissolve_free_huge_page()
mm/hugetlb.c | 34 +++++++++++++++++++---------------
1 file changed, 19 insertions(+), 15 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 87e11d8..29e10a2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1436,39 +1436,43 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
}
/*
- * Dissolve a given free hugepage into free buddy pages. This function does
- * nothing for in-use (including surplus) hugepages.
+ * Dissolve a given free hugepage into free buddy pages.
*/
static void dissolve_free_huge_page(struct page *page)
{
+ struct page *head = compound_head(page);
+ struct hstate *h = page_hstate(head);
+ int nid = page_to_nid(head);
+
spin_lock(&hugetlb_lock);
- if (PageHuge(page) && !page_count(page)) {
- struct hstate *h = page_hstate(page);
- int nid = page_to_nid(page);
- list_del(&page->lru);
- h->free_huge_pages--;
- h->free_huge_pages_node[nid]--;
- h->max_huge_pages--;
- update_and_free_page(h, page);
- }
+ list_del(&head->lru);
+ h->free_huge_pages--;
+ h->free_huge_pages_node[nid]--;
+ h->max_huge_pages--;
+ update_and_free_page(h, head);
spin_unlock(&hugetlb_lock);
}
/*
* Dissolve free hugepages in a given pfn range. Used by memory hotplug to
* make specified memory blocks removable from the system.
- * Note that start_pfn should aligned with (minimum) hugepage size.
+ * Note that this will dissolve a free gigantic hugepage completely, if any
+ * part of it lies within the given range.
+ * This function does nothing for in-use (including surplus) hugepages.
*/
void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
{
unsigned long pfn;
+ struct page *page;
if (!hugepages_supported())
return;
- VM_BUG_ON(!IS_ALIGNED(start_pfn, 1 << minimum_order));
- for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order)
- dissolve_free_huge_page(pfn_to_page(pfn));
+ for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) {
+ page = pfn_to_page(pfn);
+ if (PageHuge(page) && !page_count(page))
+ dissolve_free_huge_page(page);
+ }
}
/*
^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [PATCH v3] mm/hugetlb: fix memory offline with hugepage size > memory block size
2016-09-22 16:29 ` Gerald Schaefer
@ 2016-09-22 18:12 ` Dave Hansen
-1 siblings, 0 replies; 48+ messages in thread
From: Dave Hansen @ 2016-09-22 18:12 UTC (permalink / raw)
To: Gerald Schaefer, Andrew Morton
Cc: Michal Hocko, Naoya Horiguchi, Hillf Danton, linux-mm,
linux-kernel, Kirill A . Shutemov, Vlastimil Babka, Mike Kravetz,
Aneesh Kumar K . V, Martin Schwidefsky, Heiko Carstens, Rui Teng
On 09/22/2016 09:29 AM, Gerald Schaefer wrote:
> static void dissolve_free_huge_page(struct page *page)
> {
> + struct page *head = compound_head(page);
> + struct hstate *h = page_hstate(head);
> + int nid = page_to_nid(head);
> +
> spin_lock(&hugetlb_lock);
> - if (PageHuge(page) && !page_count(page)) {
> - struct hstate *h = page_hstate(page);
> - int nid = page_to_nid(page);
> - list_del(&page->lru);
> - h->free_huge_pages--;
> - h->free_huge_pages_node[nid]--;
> - h->max_huge_pages--;
> - update_and_free_page(h, page);
> - }
> + list_del(&head->lru);
> + h->free_huge_pages--;
> + h->free_huge_pages_node[nid]--;
> + h->max_huge_pages--;
> + update_and_free_page(h, head);
> spin_unlock(&hugetlb_lock);
> }
Do you need to revalidate anything once you acquire the lock? Can this,
for instance, race with another thread doing vm.nr_hugepages=0? Or a
thread faulting in and allocating the large page that's being dissolved?
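The pattern Dave is asking for is the classic check-lock-recheck idiom. The sketch below models it outside the kernel with hypothetical stub types (`fake_page`, `fake_lock`) standing in for the real mm structures and hugetlb_lock; it is an illustration of why the PageHuge()/page_count() test must be repeated after the lock is taken, not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel structures -- illustration only. */
struct fake_page {
	bool huge;	/* models PageHuge() */
	int count;	/* models page_count() */
};

/* Models taking/releasing hugetlb_lock; the kernel uses spin_lock(). */
static void fake_lock(void) { }
static void fake_unlock(void) { }

/*
 * Check-lock-recheck: the unlocked test is only a cheap early-out.
 * The authoritative test is repeated under the lock, because another
 * thread (e.g. one handling vm.nr_hugepages=0) may have dissolved the
 * page between the first check and the lock acquisition.
 */
static int fake_dissolve_free_huge_page(struct fake_page *page)
{
	if (!page->huge || page->count)		/* unlocked early-out */
		return -1;

	fake_lock();
	if (!page->huge || page->count) {	/* revalidate under the lock */
		fake_unlock();
		return -1;			/* lost the race */
	}
	page->huge = false;			/* models update_and_free_page() */
	fake_unlock();
	return 0;
}
```

With the recheck in place, the second of two racing dissolvers simply sees `!page->huge` under the lock and backs off instead of double-freeing.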
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v3] mm/hugetlb: fix memory offline with hugepage size > memory block size
2016-09-22 18:12 ` Dave Hansen
@ 2016-09-22 19:13 ` Mike Kravetz
-1 siblings, 0 replies; 48+ messages in thread
From: Mike Kravetz @ 2016-09-22 19:13 UTC (permalink / raw)
To: Dave Hansen, Gerald Schaefer, Andrew Morton
Cc: Michal Hocko, Naoya Horiguchi, Hillf Danton, linux-mm,
linux-kernel, Kirill A . Shutemov, Vlastimil Babka,
Aneesh Kumar K . V, Martin Schwidefsky, Heiko Carstens, Rui Teng
On 09/22/2016 11:12 AM, Dave Hansen wrote:
> On 09/22/2016 09:29 AM, Gerald Schaefer wrote:
>> static void dissolve_free_huge_page(struct page *page)
>> {
>> + struct page *head = compound_head(page);
>> + struct hstate *h = page_hstate(head);
>> + int nid = page_to_nid(head);
>> +
>> spin_lock(&hugetlb_lock);
>> - if (PageHuge(page) && !page_count(page)) {
>> - struct hstate *h = page_hstate(page);
>> - int nid = page_to_nid(page);
>> - list_del(&page->lru);
>> - h->free_huge_pages--;
>> - h->free_huge_pages_node[nid]--;
>> - h->max_huge_pages--;
>> - update_and_free_page(h, page);
>> - }
>> + list_del(&head->lru);
>> + h->free_huge_pages--;
>> + h->free_huge_pages_node[nid]--;
>> + h->max_huge_pages--;
>> + update_and_free_page(h, head);
>> spin_unlock(&hugetlb_lock);
>> }
>
> Do you need to revalidate anything once you acquire the lock? Can this,
> for instance, race with another thread doing vm.nr_hugepages=0? Or a
> thread faulting in and allocating the large page that's being dissolved?
I originally suggested the locking change, but this is not quite right.
The page count for huge pages is adjusted while holding hugetlb_lock.
So, that check or a revalidation needs to be done while holding the lock.
That question made me think about huge page reservations. I don't think
the offline code takes this into account. But, you would not want your
huge page count to drop below the reserved huge page count
(resv_huge_pages).
So, shouldn't this be another condition to check before allowing the huge
page to be dissolved?
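The condition suggested above could be expressed as an extra admissibility check before dissolving. The sketch below uses a hypothetical mirror of the relevant struct hstate counters (free_huge_pages, resv_huge_pages) purely for illustration; it is not the kernel code.

```c
#include <stdbool.h>

/* Hypothetical mirror of the relevant struct hstate counters. */
struct fake_hstate {
	unsigned long free_huge_pages;	/* pages sitting in the free pool */
	unsigned long resv_huge_pages;	/* pages promised to reservations */
};

/*
 * A free hugepage may only be dissolved if the free pool stays large
 * enough to honor all outstanding reservations afterwards, i.e. the
 * huge page count must not drop below the reserved count.
 */
static bool may_dissolve_free_huge_page(const struct fake_hstate *h)
{
	return h->free_huge_pages > h->resv_huge_pages;
}
```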
--
Mike Kravetz
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v3] mm/hugetlb: fix memory offline with hugepage size > memory block size
2016-09-22 18:12 ` Dave Hansen
@ 2016-09-23 10:36 ` Gerald Schaefer
-1 siblings, 0 replies; 48+ messages in thread
From: Gerald Schaefer @ 2016-09-23 10:36 UTC (permalink / raw)
To: Dave Hansen
Cc: Andrew Morton, Michal Hocko, Naoya Horiguchi, Hillf Danton,
linux-mm, linux-kernel, Kirill A . Shutemov, Vlastimil Babka,
Mike Kravetz, Aneesh Kumar K . V, Martin Schwidefsky,
Heiko Carstens, Rui Teng
On Thu, 22 Sep 2016 11:12:06 -0700
Dave Hansen <dave.hansen@linux.intel.com> wrote:
> On 09/22/2016 09:29 AM, Gerald Schaefer wrote:
> > static void dissolve_free_huge_page(struct page *page)
> > {
> > + struct page *head = compound_head(page);
> > + struct hstate *h = page_hstate(head);
> > + int nid = page_to_nid(head);
> > +
> > spin_lock(&hugetlb_lock);
> > - if (PageHuge(page) && !page_count(page)) {
> > - struct hstate *h = page_hstate(page);
> > - int nid = page_to_nid(page);
> > - list_del(&page->lru);
> > - h->free_huge_pages--;
> > - h->free_huge_pages_node[nid]--;
> > - h->max_huge_pages--;
> > - update_and_free_page(h, page);
> > - }
> > + list_del(&head->lru);
> > + h->free_huge_pages--;
> > + h->free_huge_pages_node[nid]--;
> > + h->max_huge_pages--;
> > + update_and_free_page(h, head);
> > spin_unlock(&hugetlb_lock);
> > }
>
> Do you need to revalidate anything once you acquire the lock? Can this,
> for instance, race with another thread doing vm.nr_hugepages=0? Or a
> thread faulting in and allocating the large page that's being dissolved?
>
Yes, good point. I was relying on the range being isolated, but that only
seems to be checked in dequeue_huge_page_node(), as introduced with the
original commit. So this would only protect against anyone allocating the
hugepage at this point. This is also somewhat expected, since we already
are beyond the "point of no return" in offline_pages().
vm.nr_hugepages=0 seems to be an issue though, as set_max_hugepages()
will not care about isolation, and so I guess we could have a race here
and double-free the hugepage. Revalidation of at least PageHuge() after
taking the lock should protect from that, not sure about page_count(),
but I think I'll just check both which will give the same behaviour as
before.
Will send v4, after thinking a bit more on the page reservation point
brought up by Mike.
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v2 1/1] mm/hugetlb: fix memory offline with hugepage size > memory block size
2016-09-22 9:51 ` Michal Hocko
@ 2016-09-23 6:40 ` Rui Teng
0 siblings, 0 replies; 48+ messages in thread
From: Rui Teng @ 2016-09-23 6:40 UTC (permalink / raw)
To: Michal Hocko, Gerald Schaefer
Cc: Andrew Morton, Naoya Horiguchi, Hillf Danton, linux-mm,
linux-kernel, Kirill A . Shutemov, Vlastimil Babka, Mike Kravetz,
Aneesh Kumar K . V, Martin Schwidefsky, Heiko Carstens,
Dave Hansen
On 9/22/16 5:51 PM, Michal Hocko wrote:
> On Wed 21-09-16 14:35:34, Gerald Schaefer wrote:
>> dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a
>> list corruption and addressing exception when trying to set a memory
>> block offline that is part (but not the first part) of a hugetlb page
>> with a size > memory block size.
>>
>> When no other smaller hugetlb page sizes are present, the VM_BUG_ON()
>> will trigger directly. In the other case we will run into an addressing
>> exception later, because dissolve_free_huge_page() will not work on the
>> head page of the compound hugetlb page which will result in a NULL
>> hstate from page_hstate().
>>
>> To fix this, first remove the VM_BUG_ON() because it is wrong, and then
>> use the compound head page in dissolve_free_huge_page().
>
> OK so dissolve_free_huge_page will work also on tail pages now which
> makes some sense. I would appreciate also few words why do we want to
> sacrifice something as precious as gigantic page rather than fail the
> page block offline. Dave pointed out dim offline usecase for example.
>
>> Also change locking in dissolve_free_huge_page(), so that it only takes
>> the lock when actually removing a hugepage.
>
> From a quick look it seems this has been broken since introduced by
> c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle
> hugepage"). Do we want to have this backported to stable? In any way
> Fixes: SHA1 would be really nice.
>
The huge page hot-plug function was introduced by c8721bbbdd36, and its
description already indicated that gigantic pages are not supported:
"As for larger hugepages (1GB for x86_64), it's not easy to do
hotremove over them because it's larger than memory block. So
we now simply leave it to fail as it is."
Is it possible that gigantic page hot-remove has never been
supported?
I made another patch for this problem, and also tried applying the
first version of this patch on my system. But both only postpone the
error. HugePages_Free changes from 2 to 1 when I offline a huge page,
so I think there is no correct rollback.
# cat /proc/meminfo | grep -i huge
AnonHugePages: 0 kB
HugePages_Total: 2
HugePages_Free: 1
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 16777216 kB
I will do more testing on it, but can anyone confirm that this function
has been implemented and tested before?
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v2 1/1] mm/hugetlb: fix memory offline with hugepage size > memory block size
2016-09-23 6:40 ` Rui Teng
@ 2016-09-23 11:03 ` Gerald Schaefer
0 siblings, 0 replies; 48+ messages in thread
From: Gerald Schaefer @ 2016-09-23 11:03 UTC (permalink / raw)
To: Rui Teng
Cc: Michal Hocko, Andrew Morton, Naoya Horiguchi, Hillf Danton,
linux-mm, linux-kernel, Kirill A . Shutemov, Vlastimil Babka,
Mike Kravetz, Aneesh Kumar K . V, Martin Schwidefsky,
Heiko Carstens, Dave Hansen
On Fri, 23 Sep 2016 14:40:33 +0800
Rui Teng <rui.teng@linux.vnet.ibm.com> wrote:
> On 9/22/16 5:51 PM, Michal Hocko wrote:
> > On Wed 21-09-16 14:35:34, Gerald Schaefer wrote:
> >> dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a
> >> list corruption and addressing exception when trying to set a memory
> >> block offline that is part (but not the first part) of a hugetlb page
> >> with a size > memory block size.
> >>
> >> When no other smaller hugetlb page sizes are present, the VM_BUG_ON()
> >> will trigger directly. In the other case we will run into an addressing
> >> exception later, because dissolve_free_huge_page() will not work on the
> >> head page of the compound hugetlb page which will result in a NULL
> >> hstate from page_hstate().
> >>
> >> To fix this, first remove the VM_BUG_ON() because it is wrong, and then
> >> use the compound head page in dissolve_free_huge_page().
> >
> > OK so dissolve_free_huge_page will work also on tail pages now which
> > makes some sense. I would appreciate also few words why do we want to
> > sacrifice something as precious as gigantic page rather than fail the
> > page block offline. Dave pointed out dim offline usecase for example.
> >
> >> Also change locking in dissolve_free_huge_page(), so that it only takes
> >> the lock when actually removing a hugepage.
> >
> > From a quick look it seems this has been broken since introduced by
> > c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle
> > hugepage"). Do we want to have this backported to stable? In any way
> > Fixes: SHA1 would be really nice.
> >
>
> If the huge page hot-plug function was introduced by c8721bbbdd36, and
> it has already indicated that the gigantic page is not supported:
>
> "As for larger hugepages (1GB for x86_64), it's not easy to do
> hotremove over them because it's larger than memory block. So
> we now simply leave it to fail as it is."
>
> Is it possible that the gigantic page hot-plugin has never been
> supported?
Offlining blocks with gigantic pages only fails when they are in use;
I guess that is what the description meant. Maybe it was also meant to
fail in any case, but that is not what the patch did.
With free gigantic pages, it looks like it only ever worked when
offlining the first block of a gigantic page. And as long as you only
have gigantic pages, the VM_BUG_ON() would actually have triggered on
every block that is not gigantic-page-aligned, even if the block is not
part of any gigantic page at all.
Given the age of the patch it is a little bit surprising that it never
struck anyone, and that we now have found it on two architectures at
once :-)
>
> I made another patch for this problem, and also tried to apply the
> first version of this patch on my system too. But they only postpone
> the error happened. The HugePages_Free will be changed from 2 to 1, if I
> offline a huge page. I think it does not have a correct roll back.
>
> # cat /proc/meminfo | grep -i huge
> AnonHugePages: 0 kB
> HugePages_Total: 2
> HugePages_Free: 1
> HugePages_Rsvd: 0
> HugePages_Surp: 0
> Hugepagesize: 16777216 kB
HugePages_Free is supposed to be reduced when offlining a block, but
then HugePages_Total should also be reduced, so that is strange. On my
system both were reduced. Does this happen with any version of my patch?
What do you mean by "postpone the error"? Can you reproduce the BUG_ON
or the addressing exception with my patch?
>
> I will make more test on it, but can any one confirm that this function
> has been implemented and tested before?
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v2 1/1] mm/hugetlb: fix memory offline with hugepage size > memory block size
2016-09-23 11:03 ` Gerald Schaefer
@ 2016-09-26 2:49 ` Rui Teng
0 siblings, 0 replies; 48+ messages in thread
From: Rui Teng @ 2016-09-26 2:49 UTC (permalink / raw)
To: Gerald Schaefer
Cc: Michal Hocko, Andrew Morton, Naoya Horiguchi, Hillf Danton,
linux-mm, linux-kernel, Kirill A . Shutemov, Vlastimil Babka,
Mike Kravetz, Aneesh Kumar K . V, Martin Schwidefsky,
Heiko Carstens, Dave Hansen
On 9/23/16 7:03 PM, Gerald Schaefer wrote:
> On Fri, 23 Sep 2016 14:40:33 +0800
> Rui Teng <rui.teng@linux.vnet.ibm.com> wrote:
>
>> On 9/22/16 5:51 PM, Michal Hocko wrote:
>>> On Wed 21-09-16 14:35:34, Gerald Schaefer wrote:
>>>> dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a
>>>> list corruption and addressing exception when trying to set a memory
>>>> block offline that is part (but not the first part) of a hugetlb page
>>>> with a size > memory block size.
>>>>
>>>> When no other smaller hugetlb page sizes are present, the VM_BUG_ON()
>>>> will trigger directly. In the other case we will run into an addressing
>>>> exception later, because dissolve_free_huge_page() will not work on the
>>>> head page of the compound hugetlb page which will result in a NULL
>>>> hstate from page_hstate().
>>>>
>>>> To fix this, first remove the VM_BUG_ON() because it is wrong, and then
>>>> use the compound head page in dissolve_free_huge_page().
>>>
>>> OK so dissolve_free_huge_page will work also on tail pages now which
>>> makes some sense. I would appreciate also few words why do we want to
>>> sacrifice something as precious as gigantic page rather than fail the
>>> page block offline. Dave pointed out dim offline usecase for example.
>>>
>>>> Also change locking in dissolve_free_huge_page(), so that it only takes
>>>> the lock when actually removing a hugepage.
>>>
>>> From a quick look it seems this has been broken since introduced by
>>> c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle
>>> hugepage"). Do we want to have this backported to stable? In any way
>>> Fixes: SHA1 would be really nice.
>>>
>>
>> If the huge page hot-plug function was introduced by c8721bbbdd36, and
>> it has already indicated that the gigantic page is not supported:
>>
>> "As for larger hugepages (1GB for x86_64), it's not easy to do
>> hotremove over them because it's larger than memory block. So
>> we now simply leave it to fail as it is."
>>
>> Is it possible that the gigantic page hot-plugin has never been
>> supported?
>
> Offlining blocks with gigantic pages only fails when they are in-use,
> I guess that was meant by the description. Maybe it was also meant to
> fail in any case, but that was not was the patch did.
>
> With free gigantic pages, it looks like it only ever worked when
> offlining the first block of a gigantic page. And as long as you only
> have gigantic pages, the VM_BUG_ON() would actually have triggered on
> every block that is not gigantic-page-aligned, even if the block is not
> part of any gigantic page at all.
I have not hit the VM_BUG_ON() issue on my powerpc architecture. It
seems the alignment issue does not always occur on other architectures.
>
> Given the age of the patch it is a little bit surprising that it never
> struck anyone, and that we now have found it on two architectures at
> once :-)
>
>>
>> I made another patch for this problem, and also tried to apply the
>> first version of this patch on my system too. But they only postpone
>> the error happened. The HugePages_Free will be changed from 2 to 1, if I
>> offline a huge page. I think it does not have a correct roll back.
>>
>> # cat /proc/meminfo | grep -i huge
>> AnonHugePages: 0 kB
>> HugePages_Total: 2
>> HugePages_Free: 1
>> HugePages_Rsvd: 0
>> HugePages_Surp: 0
>> Hugepagesize: 16777216 kB
>
> HugePages_Free is supposed to be reduced when offlining a block, but
> then HugePages_Total should also be reduced, so that is strange. On my
> system both were reduced. Does this happen with any version of my patch?
No, I only tested your first version. I have no objection to your
patch, because the error was not introduced by it.
>
> What do you mean with postpone the error? Can you reproduce the BUG_ON
> or the addressing exception with my patch?
I mean the gigantic page offlining function does not work at all in my
environment, even though the correct head page has been found. My method
is to filter out all the tail pages, while your method is to find the
head page from the tail pages.
Since you can offline gigantic pages successfully, I think the function
is supported now. I will debug the problem in my environment.
>
>>
>> I will make more test on it, but can any one confirm that this function
>> has been implemented and tested before?
^ permalink raw reply [flat|nested] 48+ messages in thread