LKML Archive on lore.kernel.org
From: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>
To: Qian Cai <cai@lca.pw>
Cc: "nao.horiguchi@gmail.com" <nao.horiguchi@gmail.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"mhocko@kernel.org" <mhocko@kernel.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"mike.kravetz@oracle.com" <mike.kravetz@oracle.com>,
	"osalvador@suse.de" <osalvador@suse.de>,
	"tony.luck@intel.com" <tony.luck@intel.com>,
	"david@redhat.com" <david@redhat.com>,
	"aneesh.kumar@linux.vnet.ibm.com"
	<aneesh.kumar@linux.vnet.ibm.com>,
	"zeil@yandex-team.ru" <zeil@yandex-team.ru>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v5 00/16] HWPOISON: soft offline rework
Date: Mon, 3 Aug 2020 13:36:58 +0000
Message-ID: <20200803133657.GA13307@hori.linux.bs1.fc.nec.co.jp>
In-Reply-To: <20200803123954.GA4631@lca.pw>

Hello,

On Mon, Aug 03, 2020 at 08:39:55AM -0400, Qian Cai wrote:
> On Fri, Jul 31, 2020 at 12:20:56PM +0000, nao.horiguchi@gmail.com wrote:
> > This patchset is the latest version of the soft offline rework patchset,
> > targeted for v5.9.
> > 
> > The main focus of this series is to stabilize soft offline.  Historically,
> > soft offlined pages have suffered from race conditions because PageHWPoison
> > is used a little too aggressively, which (directly or indirectly) invades
> > other mm code that cares little about hwpoison.  This results in unexpected
> > behavior or kernel panics, which is very far from soft offline's "do not
> > disturb userspace or other kernel components" policy.
> > 
> > The main point of this change set is to contain the target page "via the
> > buddy allocator": we first free the target page as we do for normal pages,
> > and remove it from buddy only once we confirm that it has reached the free
> > list. There is surely a race window with page allocation, but that's fine:
> > it means someone really wants that page and the page is still working, so
> > soft offline can happily give up.
> > 
> > v4 from Oscar tries to handle the race around reallocation, but that part
> > still seems to be a work in progress, so I decided to separate it from the
> > changes going into v5.9.  Thank you for your contribution, Oscar.
> > 
> > The issue reported by Qian Cai is fixed by patch 16/16.
> 
> I am still getting EIO everywhere on next-20200803 (which includes this v5).
> 
> # ./random 1
> - start: migrate_huge_offline
> - use NUMA nodes 0,8.
> - mmap and free 8388608 bytes hugepages on node 0
> - mmap and free 8388608 bytes hugepages on node 8
> madvise: Input/output error
> 
> From the serial console,
> 
> [  637.164222][ T8357] soft offline: 0x118ee0: hugepage isolation failed: 0, page count 2, type 7fff800001000e (referenced|uptodate|dirty|head)
> [  637.164890][ T8357] Soft offlining pfn 0x20001380 at process virtual address 0x7fff9f000000
> [  637.165422][ T8357] Soft offlining pfn 0x3ba00 at process virtual address 0x7fff9f200000
> [  637.166409][ T8357] Soft offlining pfn 0x201914a0 at process virtual address 0x7fff9f000000
> [  637.166833][ T8357] Soft offlining pfn 0x12b9a0 at process virtual address 0x7fff9f200000
> [  637.168044][ T8357] Soft offlining pfn 0x1abb60 at process virtual address 0x7fff9f000000
> [  637.168557][ T8357] Soft offlining pfn 0x20014820 at process virtual address 0x7fff9f200000
> [  637.169493][ T8357] Soft offlining pfn 0x119720 at process virtual address 0x7fff9f000000
> [  637.169603][ T8357] soft offline: 0x119720: hugepage isolation failed: 0, page count 2, type 7fff800001000e (referenced|uptodate|dirty|head)
> [  637.169756][ T8357] Soft offlining pfn 0x118ee0 at process virtual address 0x7fff9f200000
> [  637.170653][ T8357] Soft offlining pfn 0x200e81e0 at process virtual address 0x7fff9f000000
> [  637.171067][ T8357] Soft offlining pfn 0x201c5f60 at process virtual address 0x7fff9f200000
> [  637.172101][ T8357] Soft offlining pfn 0x201c8f00 at process virtual address 0x7fff9f000000
> [  637.172241][ T8357] __get_any_page: 0x201c8f00: unknown zero refcount page type 87fff8000000000

I may have misjudged in leaving out the following patch, sorry about that.
Could you try again with it applied?

---
From eafe6fde94cd15e67631540f1b2b000b6e33a650 Mon Sep 17 00:00:00 2001
From: Oscar Salvador <osalvador@suse.de>
Date: Mon, 3 Aug 2020 22:25:10 +0900
Subject: [PATCH] mm,hwpoison: Drain pcplists before bailing out for non-buddy
 zero-refcount page

A page with 0-refcount and !PageBuddy could perfectly well be a pcppage.
Currently, we bail out with an error if we encounter such a page,
meaning that we handle pcppages neither from the hard-offline
nor from the soft-offline path.

Fix this by draining pcplists whenever we find this kind of page
and then retrying the check.
The drain may have spilled the pcplists into the buddy allocator,
in which case we can handle the page.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/memory-failure.c | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index b2753ce2b85b..02be529445c0 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -949,13 +949,13 @@ static int page_action(struct page_state *ps, struct page *p,
 }
 
 /**
- * get_hwpoison_page() - Get refcount for memory error handling:
+ * __get_hwpoison_page() - Get refcount for memory error handling:
  * @page:	raw error page (hit by memory error)
  *
  * Return: return 0 if failed to grab the refcount, otherwise true (some
  * non-zero value.)
  */
-static int get_hwpoison_page(struct page *page)
+static int __get_hwpoison_page(struct page *page)
 {
 	struct page *head = compound_head(page);
 
@@ -985,6 +985,28 @@ static int get_hwpoison_page(struct page *page)
 	return 0;
 }
 
+static int get_hwpoison_page(struct page *p)
+{
+	int ret;
+	bool drained = false;
+
+retry:
+	ret = __get_hwpoison_page(p);
+	if (!ret) {
+		if (!is_free_buddy_page(p) && !page_count(p) && !drained) {
+			/*
+			 * The page might be in a pcplist, so try to drain
+			 * those and see if we are lucky.
+			 */
+			drain_all_pages(page_zone(p));
+			drained = true;
+			goto retry;
+		}
+	}
+
+	return ret;
+}
+
 /*
  * Do all that is necessary to remove user space mappings. Unmap
  * the pages and send SIGBUS to the processes if the data was dirty.
@@ -1683,10 +1705,6 @@ static int __get_any_page(struct page *p, unsigned long pfn)
 {
 	int ret;
 
-	/*
-	 * When the target page is a free hugepage, just remove it
-	 * from free hugepage list.
-	 */
 	if (!get_hwpoison_page(p)) {
 		if (PageHuge(p)) {
 			pr_info("%s: %#lx free huge page\n", __func__, pfn);
-- 
2.25.1
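
For anyone who wants to look at the control flow of the new
get_hwpoison_page() wrapper in isolation, below is a minimal userspace
sketch of the "drain once, then retry" pattern. It is only a model, not
kernel code: struct model_page, enum page_where, model_drain(),
model_try_get(), model_get_page() and drain_calls are all hypothetical
names made up for illustration, standing in for struct page,
drain_all_pages(), __get_hwpoison_page() and get_hwpoison_page().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a page; NOT the kernel's struct page. */
enum page_where { PAGE_IN_USE, PAGE_ON_PCPLIST, PAGE_ON_BUDDY };

struct model_page {
	enum page_where where;
	int refcount;
};

/* Counts how often the model drain ran (illustration only). */
static int drain_calls;

/* Stand-in for drain_all_pages(): spill a pcplist page into buddy. */
static void model_drain(struct model_page *p)
{
	drain_calls++;
	if (p->where == PAGE_ON_PCPLIST)
		p->where = PAGE_ON_BUDDY;
}

/* Stand-in for __get_hwpoison_page(): only pin pages already in use. */
static int model_try_get(struct model_page *p)
{
	if (p->refcount == 0)
		return 0;
	p->refcount++;
	return 1;
}

/*
 * Stand-in for the new get_hwpoison_page(): if the page has a zero
 * refcount and is not on the buddy free list, drain at most once and
 * retry. The "drained" flag bounds the loop to a single retry.
 */
static int model_get_page(struct model_page *p)
{
	bool drained = false;
	int ret;

	assert(p != NULL);
retry:
	ret = model_try_get(p);
	if (!ret && p->where != PAGE_ON_BUDDY && p->refcount == 0 && !drained) {
		model_drain(p);
		drained = true;
		goto retry;
	}
	return ret;
}
```

In this model, a page that starts on a pcplist still makes
model_get_page() return 0, but it ends up on the buddy free list, so a
caller that checks for a free buddy page after the failure can now
handle it instead of bailing out with "unknown zero refcount page".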

Thread overview: 26+ messages
2020-07-31 12:20 nao.horiguchi
2020-07-31 12:20 ` [PATCH v5 01/16] mm,hwpoison: cleanup unused PageHuge() check nao.horiguchi
2020-07-31 12:20 ` [PATCH v5 02/16] mm, hwpoison: remove recalculating hpage nao.horiguchi
2020-07-31 12:20 ` [PATCH v5 03/16] mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED nao.horiguchi
2020-07-31 12:21 ` [PATCH v5 04/16] mm,madvise: Refactor madvise_inject_error nao.horiguchi
2020-07-31 12:21 ` [PATCH v5 05/16] mm,hwpoison-inject: don't pin for hwpoison_filter nao.horiguchi
2020-07-31 12:21 ` [PATCH v5 06/16] mm,hwpoison: Un-export get_hwpoison_page and make it static nao.horiguchi
2020-07-31 12:21 ` [PATCH v5 07/16] mm,hwpoison: Kill put_hwpoison_page nao.horiguchi
2020-07-31 12:21 ` [PATCH v5 08/16] mm,hwpoison: remove MF_COUNT_INCREASED nao.horiguchi
2020-07-31 12:21 ` [PATCH v5 09/16] mm,hwpoison: remove flag argument from soft offline functions nao.horiguchi
2020-07-31 12:21 ` [PATCH v5 10/16] mm,hwpoison: Unify THP handling for hard and soft offline nao.horiguchi
2020-07-31 12:21 ` [PATCH v5 11/16] mm,hwpoison: Rework soft offline for free pages nao.horiguchi
2020-07-31 12:21 ` [PATCH v5 12/16] mm,hwpoison: Rework soft offline for in-use pages nao.horiguchi
2020-07-31 12:21 ` [PATCH v5 13/16] mm,hwpoison: Refactor soft_offline_huge_page and __soft_offline_page nao.horiguchi
2020-07-31 12:21 ` [PATCH v5 14/16] mm,hwpoison: Return 0 if the page is already poisoned in soft-offline nao.horiguchi
2020-07-31 12:21 ` [PATCH v5 15/16] mm,hwpoison: introduce MF_MSG_UNSPLIT_THP nao.horiguchi
2020-07-31 12:21 ` [PATCH v5 16/16] mm,hwpoison: double-check page count in __get_any_page() nao.horiguchi
2020-08-03 12:39 ` [PATCH v5 00/16] HWPOISON: soft offline rework Qian Cai
2020-08-03 13:36   ` HORIGUCHI NAOYA(堀口 直也) [this message]
2020-08-03 15:19     ` Qian Cai
2020-08-05 20:43       ` HORIGUCHI NAOYA(堀口 直也)
2020-08-03 19:07 ` Qian Cai
2020-08-04  1:16   ` HORIGUCHI NAOYA(堀口 直也)
2020-08-04  1:49     ` Qian Cai
2020-08-04  8:13       ` osalvador
2020-08-05 20:44       ` HORIGUCHI NAOYA(堀口 直也)
