From: "裘稀石(稀石)" <xishi.qiuxishi@alibaba-inc.com>
To: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: linux-mm <linux-mm@kvack.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	陈义全 <zy.zhengyi@alibaba-inc.com>
Subject: Re:[RFC] a question about reuse hwpoison page in soft_offline_page()
Date: Mon, 09 Jul 2018 13:43:35 +0800
Message-ID: <518e6b02-47ef-4ba8-ab98-8d807e2de7d5.xishi.qiuxishi@alibaba-inc.com>

Hi Naoya,

Shall we fix this path too? It also sets the hwpoison flag before calling dissolve_free_huge_page():

soft_offline_huge_page
    migrate_pages
        unmap_and_move_huge_page
            if (reason == MR_MEMORY_FAILURE && !test_set_page_hwpoison(hpage))
    dissolve_free_huge_page

Thanks,
Xishi Qiu

On Mon, Jul 09, 2018 at 10:31:25AM +0800, 裘稀石(稀石) wrote:
> Hi Naoya,
> 
> I think the double check cannot fix the problem, as I said in another email.
> If someone calls mmap() before the soft offline, page_count(head) is still
> zero during soft offline, so the hwpoison flag gets set and the page cannot
> be allocated again in dequeue_huge_page_node_exact() during a page fault.
> The page fault then returns no-mem, and someone will be killed (not an mce kill).
> 
> How about setting the hwpoison flag only after soft_offline_free_page ->
> dissolve_free_huge_page succeeds? That would fix both problems (the mce kill
> and the no-mem kill).

Thank you for elaborating, you're right.
So how about a fix like this?

---
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d34225c1cb5b..3c9ce4c05f1b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1479,22 +1479,20 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
 /*
  * Dissolve a given free hugepage into free buddy pages. This function does
  * nothing for in-use (including surplus) hugepages. Returns -EBUSY if the
- * number of free hugepages would be reduced below the number of reserved
- * hugepages.
+ * dissolution fails because a given page is not a free hugepage, or because
+ * free hugepages are fully reserved.
  */
 int dissolve_free_huge_page(struct page *page)
 {
- int rc = 0;
+ int rc = -EBUSY;

  spin_lock(&hugetlb_lock);
  if (PageHuge(page) && !page_count(page)) {
   struct page *head = compound_head(page);
   struct hstate *h = page_hstate(head);
   int nid = page_to_nid(head);
-  if (h->free_huge_pages - h->resv_huge_pages == 0) {
-   rc = -EBUSY;
+  if (h->free_huge_pages - h->resv_huge_pages == 0)
    goto out;
-  }
   /*
    * Move PageHWPoison flag from head page to the raw error page,
    * which makes any subpages rather than the error page reusable.
@@ -1508,6 +1506,7 @@ int dissolve_free_huge_page(struct page *page)
   h->free_huge_pages_node[nid]--;
   h->max_huge_pages--;
   update_and_free_page(h, head);
+  rc = 0;
  }
 out:
  spin_unlock(&hugetlb_lock);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 9d142b9b86dc..e4c7e3ec7b10 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1715,13 +1715,13 @@ static int soft_offline_in_use_page(struct page *page, int flags)

 static void soft_offline_free_page(struct page *page)
 {
+ int rc = 0;
  struct page *head = compound_head(page);

- if (!TestSetPageHWPoison(head)) {
+ if (PageHuge(head))
+  rc = dissolve_free_huge_page(page);
+ if (!rc && !TestSetPageHWPoison(head))
   num_poisoned_pages_inc();
-  if (PageHuge(head))
-   dissolve_free_huge_page(page);
- }
 }

 /**
