From: Gerald Schaefer <gerald.schaefer@de.ibm.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>,
	Michal Hocko <mhocko@kernel.org>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Hillf Danton <hillf.zj@alibaba-inc.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	"Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Rui Teng <rui.teng@linux.vnet.ibm.com>
Subject: [PATCH v3] mm/hugetlb: fix memory offline with hugepage size > memory block size
Date: Thu, 22 Sep 2016 18:29:37 +0200
Message-ID: <20160922182937.38af9d0e@thinkpad>
In-Reply-To: <20160922154549.483ee313@thinkpad>

dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a
list corruption and addressing exception when trying to set a memory
block offline that is part (but not the first part) of a "gigantic"
hugetlb page with a size > memory block size.

When no other smaller hugetlb page sizes are present, the VM_BUG_ON()
will trigger directly. In the other case we will run into an addressing
exception later, because dissolve_free_huge_page() will then operate on
a tail page instead of the head page of the compound hugetlb page, which
results in a NULL hstate from page_hstate().

To fix this, first remove the VM_BUG_ON() because it is wrong, and then
use the compound head page in dissolve_free_huge_page(). This means that
an unused pre-allocated gigantic page that has any part of itself inside
the memory block that is going offline will be dissolved completely.
Losing the gigantic hugepage is preferable to failing the memory offline,
for example in the situation where a (possibly faulty) memory DIMM needs
to go offline.

Also move the PageHuge() and page_count() checks out of
dissolve_free_huge_page() in order to only take the spin_lock when
actually removing a hugepage.

Fixes: c8721bbb ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
Cc: <stable@vger.kernel.org>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
---
Changes in v3:
- Add Fixes: c8721bbb
- Add Cc: stable
- Elaborate on losing the gigantic page vs. failing memory offline
- Move page_count() check out of dissolve_free_huge_page()

Changes in v2:
- Update comment in dissolve_free_huge_pages()
- Change locking in dissolve_free_huge_page()

 mm/hugetlb.c | 34 +++++++++++++++++++---------------
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 87e11d8..29e10a2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1436,39 +1436,43 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
 }
 
 /*
- * Dissolve a given free hugepage into free buddy pages. This function does
- * nothing for in-use (including surplus) hugepages.
+ * Dissolve a given free hugepage into free buddy pages.
  */
 static void dissolve_free_huge_page(struct page *page)
 {
+	struct page *head = compound_head(page);
+	struct hstate *h = page_hstate(head);
+	int nid = page_to_nid(head);
+
 	spin_lock(&hugetlb_lock);
-	if (PageHuge(page) && !page_count(page)) {
-		struct hstate *h = page_hstate(page);
-		int nid = page_to_nid(page);
-		list_del(&page->lru);
-		h->free_huge_pages--;
-		h->free_huge_pages_node[nid]--;
-		h->max_huge_pages--;
-		update_and_free_page(h, page);
-	}
+	list_del(&head->lru);
+	h->free_huge_pages--;
+	h->free_huge_pages_node[nid]--;
+	h->max_huge_pages--;
+	update_and_free_page(h, head);
 	spin_unlock(&hugetlb_lock);
 }
 
 /*
  * Dissolve free hugepages in a given pfn range. Used by memory hotplug to
  * make specified memory blocks removable from the system.
- * Note that start_pfn should aligned with (minimum) hugepage size.
+ * Note that this will dissolve a free gigantic hugepage completely, if any
+ * part of it lies within the given range.
+ * This function does nothing for in-use (including surplus) hugepages.
  */
 void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
 {
 	unsigned long pfn;
+	struct page *page;
 
 	if (!hugepages_supported())
 		return;
 
-	VM_BUG_ON(!IS_ALIGNED(start_pfn, 1 << minimum_order));
-	for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order)
-		dissolve_free_huge_page(pfn_to_page(pfn));
+	for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) {
+		page = pfn_to_page(pfn);
+		if (PageHuge(page) && !page_count(page))
+			dissolve_free_huge_page(page);
+	}
 }
 
 /*
