LKML Archive on lore.kernel.org
From: Michal Hocko <mhocko@kernel.org>
To: Andi Kleen <andi@firstfloor.org>, Johannes Weiner <hannes@cmpxchg.org>
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	akpm@linux-foundation.org,
	Vladimir Davydov <vdavydov.dev@gmail.com>
Subject: Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
Date: Tue, 2 May 2017 20:55:07 +0200
Message-ID: <20170502185507.GB19165@dhcp22.suse.cz> (raw)
In-Reply-To: <c8ce6056-e89b-7470-c37a-85ab5bc7a5b2@linux.vnet.ibm.com>

On Tue 02-05-17 16:59:30, Laurent Dufour wrote:
> On 28/04/2017 15:48, Michal Hocko wrote:
[...]
> > This is getting quite hairy. What is the expected page count of the
> > hwpoison page?

OK, so from the quick check of the hwpoison code it seems that the ref
count will be > 1 (from get_hwpoison_page).

> > I guess we would need to update the VM_BUG_ON in the
> > memcg uncharge code to ignore the page count of hwpoison pages if it can
> > be arbitrary.
> 
> Based on the experiment I did, page count == 2 when isolate_lru_page()
> succeeds, even in the case of a poisoned page.

That would make some sense to me. The page should already have been
unmapped by that point, so one reference comes from memory_failure() and
the other one is for isolate_lru_page().
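
To spell out that accounting, here is a toy userspace model. It is only
a sketch and not kernel code: struct toy_page and all toy_* helpers are
made-up names, and the starting state (a mapped LRU page held by a single
mapping) is an assumption picked for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page; none of this is kernel code. */
struct toy_page {
	int refcount;
	bool mapped;
	bool on_lru;
};

/* memory_failure() pins the page before doing anything else. */
static void toy_get_hwpoison_page(struct toy_page *p)
{
	p->refcount++;
}

/* Unmapping drops the reference that the mapping was holding. */
static void toy_unmap(struct toy_page *p)
{
	if (p->mapped) {
		assert(p->refcount > 0);
		p->mapped = false;
		p->refcount--;
	}
}

/* Isolation only succeeds for LRU pages and takes its own reference. */
static bool toy_isolate_lru_page(struct toy_page *p)
{
	if (!p->on_lru)
		return false;
	p->on_lru = false;
	p->refcount++;
	return true;
}

/* Returns the ref count seen right after a successful isolation. */
static int toy_refcount_after_isolation(void)
{
	/* Assumption: a mapped LRU page held only by its single mapping. */
	struct toy_page p = { .refcount = 1, .mapped = true, .on_lru = true };

	toy_get_hwpoison_page(&p);	/* mapping + memory_failure() */
	toy_unmap(&p);			/* memory_failure() only */
	if (!toy_isolate_lru_page(&p))
		return -1;
	return p.refcount;		/* memory_failure() + isolation */
}
```

In this model the count after isolation is 2, and neither of the two
references belongs to a user mapping.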

> In my case I think this
> is because the page is still used by the process which is calling madvise().
> 
> I'm wondering if I'm looking at the right place. May be the poisoned
> page should remain attach to the memory_cgroup until no one is using it.
> In that case this means that something should be done when the page is
> off-lined... I've to dig further here.

No, AFAIU the page will not drop its reference count down to 0 in most
cases. Maybe there are some scenarios where this can happen, but I would
expect that the poisoned page will be mapped and in use most of the time
and won't drop down to 0. And then we should really uncharge it because
otherwise it will pin the memcg and make it unfreeable, which doesn't seem
to be what we want. So does the following work reasonably? Andi, Johannes,
what do you think? I cannot say I would be really comfortable touching
hwpoison code as I really do not understand the workflow. Maybe we want to
move this uncharge down to memory_failure() right before we report success?
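
The leak can be illustrated with a toy userspace model. This is a sketch
under stated assumptions rather than the kernel implementation: struct
toy_memcg, struct toy_page and the toy_* helpers are hypothetical names,
the memcg is reduced to a bare reference counter, and uncharging is
assumed to happen only on the final put unless it is forced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; none of this is kernel code. */
struct toy_memcg {
	int css_refs;			/* one reference per charged page */
};

struct toy_page {
	int refcount;
	bool hwpoison;
	struct toy_memcg *memcg;	/* NULL once uncharged */
};

static void toy_charge(struct toy_page *p, struct toy_memcg *cg)
{
	p->memcg = cg;
	cg->css_refs++;
}

static void toy_uncharge(struct toy_page *p)
{
	/* Relaxed check: only a hwpoison page may still be referenced. */
	assert(p->hwpoison || p->refcount == 0);
	if (!p->memcg)
		return;
	p->memcg->css_refs--;
	p->memcg = NULL;
}

/* Normal path: the charge is dropped only on the final put. */
static void toy_put_page(struct toy_page *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0)
		toy_uncharge(p);
}

/* Returns true if the memcg could be freed after the page is poisoned. */
static bool toy_memcg_freeable_after_poison(bool force_uncharge)
{
	struct toy_memcg cg = { .css_refs = 0 };
	/* Assumption: the page starts with one reference from its user. */
	struct toy_page p = { .refcount = 1, .hwpoison = false, .memcg = NULL };

	toy_charge(&p, &cg);
	p.hwpoison = true;
	p.refcount++;			/* reference taken by memory_failure() */
	p.refcount++;			/* reference taken by LRU isolation */
	if (force_uncharge)
		toy_uncharge(&p);	/* what the patch below adds */
	toy_put_page(&p);		/* drop the isolation reference */
	return cg.css_refs == 0;	/* the ref count never reached 0 */
}
```

Without the forced uncharge the toy memcg keeps one reference forever;
with it the reference is dropped even though the page is never freed.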
---
>From 8bf0791bcf35996a859b6d33fb5494e5b53de49d Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Tue, 2 May 2017 20:32:24 +0200
Subject: [PATCH] hwpoison, memcg: forcibly uncharge LRU pages

Laurent Dufour has noticed that hwpoisoned pages are kept charged. In
his particular case he has hit a bad_page("page still charged to cgroup")
when onlining a hwpoison page. While this looks like something that
shouldn't happen in the first place, because onlining hwpoison pages and
returning them to the page allocator makes little sense, it shows a real
problem.

hwpoison pages usually do not get freed, so we do not uncharge them (at
least not since 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API")).
Each charge also pins the memcg (since e8ea14cc6ead ("mm: memcontrol: take
a css reference for each charged page")), so the mem_cgroup and the
associated state will never go away. Fix this leak by forcibly
uncharging an LRU hwpoisoned page in delete_from_lru_cache(). We also
have to tweak uncharge_list() because it cannot rely on a zero ref count
for these pages.

Fixes: 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API")
Reported-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/memcontrol.c     | 2 +-
 mm/memory-failure.c | 7 +++++++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 16c556ac103d..4cf26059adb1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5527,7 +5527,7 @@ static void uncharge_list(struct list_head *page_list)
 		next = page->lru.next;
 
 		VM_BUG_ON_PAGE(PageLRU(page), page);
-		VM_BUG_ON_PAGE(page_count(page), page);
+		VM_BUG_ON_PAGE(!PageHWPoison(page) && page_count(page), page);
 
 		if (!page->mem_cgroup)
 			continue;
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 8a6bd3a9eb1e..4497d9619bb4 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -541,6 +541,13 @@ static int delete_from_lru_cache(struct page *p)
 		 */
 		ClearPageActive(p);
 		ClearPageUnevictable(p);
+
+		/*
+		 * Poisoned page might never drop its ref count to 0 so we have to
+		 * uncharge it manually from its memcg.
+		 */
+		mem_cgroup_uncharge(p);
+
 		/*
 		 * drop the page count elevated by isolate_lru_page()
 		 */
-- 
2.11.0

-- 
Michal Hocko
SUSE Labs
