From: Oscar Salvador <osalvador@suse.de>
To: nao.horiguchi@gmail.com, linux-mm@kvack.org
Cc: mhocko@kernel.org, akpm@linux-foundation.org, mike.kravetz@oracle.com,
	tony.luck@intel.com, david@redhat.com, aneesh.kumar@linux.vnet.ibm.com,
	zeil@yandex-team.ru, naoya.horiguchi@nec.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 00/15] HWPOISON: soft offline rework
Date: Mon, 29 Jun 2020 12:29:25 +0200
Message-ID: <1593426565.3504.6.camel@suse.de>
In-Reply-To: <20200624150137.7052-1-nao.horiguchi@gmail.com>

On Wed, 2020-06-24 at 15:01 +0000, nao.horiguchi@gmail.com wrote:
> I rebased soft-offline rework patchset [1][2] onto the latest mmotm. The
> rebasing required some non-trivial changes to adjust, but mainly that was
> straightforward. I confirmed that the reported problem doesn't reproduce on
> compaction after soft offline. For more precise description of the problem
> and the motivation of this patchset, please see [2].

Hi Naoya,

Thanks for dusting this off.
To be honest, I got stuck with the hard offline mode, so that delayed the
resubmission, along with other problems.

> I think that the following two patches in v2 are better to be done with
> separate work of hard-offline rework, so they are not included in this
> series.
>
>   - mm,hwpoison: Take pages off the buddy when hard-offlining
>   - mm/hwpoison-inject: Rip off duplicated checks
>
> These two are not directly related to the reported problem, so they seem
> not urgent. And the first one breaks num_poisoned_pages counting in some
> testcases, and the second patch needs more consideration about the
> commented point.

I fully agree.

> Any comment/suggestion/help would be appreciated.

My "new" version included a patch to make sure we give a chance to pages
that may be sitting in a pcplist.
Current behavior is that if someone tries to soft-offline such a page, we
return an error because the page count is 0 but the page is not in the
buddy system.

Since this patchset already landed in the mm tree, I could send it as a
standalone patch on top if you agree with it.

My patch looked something like:

From: Oscar Salvador <osalvador@suse.de>
Date: Mon, 29 Jun 2020 12:25:11 +0200
Subject: [PATCH] mm,hwpoison: Drain pcplists before bailing out for non-buddy
 zero-refcount page

A page with 0-refcount and !PageBuddy could perfectly be a pcppage.
Currently, we bail out with an error if we encounter such a page, meaning
that we do not give a chance to handle pcppages.

Fix this by draining pcplists whenever we find this kind of page and
retrying the check.
It might be that the pcplists have been spilled into the buddy allocator,
and then we can handle it.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/memory-failure.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index e90ddddab397..3aac3f1eeed0 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -958,7 +958,7 @@ static int page_action(struct page_state *ps, struct page *p,
  * Return: return 0 if failed to grab the refcount, otherwise true (some
  * non-zero value.)
  */
-static int get_hwpoison_page(struct page *page)
+static int __get_hwpoison_page(struct page *page)
 {
 	struct page *head = compound_head(page);
 
@@ -988,6 +988,28 @@ static int get_hwpoison_page(struct page *page)
 	return 0;
 }
 
+static int get_hwpoison_page(struct page *p)
+{
+	int ret;
+	bool drained = false;
+
+retry:
+	ret = __get_hwpoison_page(p);
+	if (!ret) {
+		if (!is_free_buddy_page(p) && !page_count(p) && !drained) {
+			/*
+			 * The page might be in a pcplist, so try to drain
+			 * those and see if we are lucky.
+			 */
+			drain_all_pages(page_zone(p));
+			drained = true;
+			goto retry;
+		}
+	}
+
+	return ret;
+}
+
 /*
  * Do all that is necessary to remove user space mappings. Unmap
  * the pages and send SIGBUS to the processes if the data was dirty.
-- 
2.26.2

-- 
Oscar Salvador
SUSE L3
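
For context, soft offline can be triggered from userspace with
madvise(MADV_SOFT_OFFLINE); a minimal sketch is below (an illustration only,
not part of the patch above, assuming a kernel built with
CONFIG_MEMORY_FAILURE and a caller with CAP_SYS_ADMIN). Note that madvise
drives the in-use-page path; the zero-refcount pcplist case the patch targets
is normally reached through the PFN-based interfaces, e.g.
/sys/devices/system/memory/soft_offline_page or the hwpoison-inject debugfs
module, on a page that has just been freed.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

#ifndef MADV_SOFT_OFFLINE
#define MADV_SOFT_OFFLINE 101	/* value from asm-generic/mman-common.h */
#endif

int main(void)
{
	long pagesize = sysconf(_SC_PAGESIZE);
	void *p;

	/* Back one page with anonymous memory and fault it in. */
	p = mmap(NULL, pagesize, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	memset(p, 0xaa, pagesize);

	/*
	 * Ask the kernel to soft-offline the backing page: its contents are
	 * migrated to a new page and the old page is taken out of service.
	 */
	if (madvise(p, pagesize, MADV_SOFT_OFFLINE))
		perror("madvise(MADV_SOFT_OFFLINE)");
	else
		printf("page at %p soft-offlined\n", p);

	munmap(p, pagesize);
	return 0;
}

Run as root, this should either report success (madvise returns 0 once the
contents have been migrated) or print the error soft_offline_page() hit.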