From: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>
To: Ding Hui <dinghui@sangfor.com.cn>
Cc: David Hildenbrand <david@redhat.com>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"osalvador@suse.de" <osalvador@suse.de>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH] mm/page_alloc: fix counting of free pages after take off from buddy
Date: Thu, 6 May 2021 07:30:55 +0000
Message-ID: <20210506073055.GA1848917@hori.linux.bs1.fc.nec.co.jp>
In-Reply-To: <33be44ea-f377-c049-03ff-3b45289ab5f7@sangfor.com.cn>
On Thu, May 06, 2021 at 12:01:34PM +0800, Ding Hui wrote:
> On 2021/5/6 10:49, HORIGUCHI NAOYA(堀口 直也) wrote:
> > On Wed, Apr 28, 2021 at 04:54:59PM +0200, David Hildenbrand wrote:
> > > On 21.04.21 04:04, Ding Hui wrote:
> > > > Recently we found that a lot of MemFree is left in /proc/meminfo after
> > > > soft-offlining a lot of pages.
> > > >
> > > > I think this is incorrect, since NR_FREE_PAGES should not include HWPoison pages.
> > > > After take_page_off_buddy(), the page no longer belongs to the buddy
> > > > allocator and will not be used any more, but we may have missed adjusting
> > > > NR_FREE_PAGES in this situation.
> > > >
> > > > Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
> > > > ---
> > > > mm/page_alloc.c | 1 +
> > > > 1 file changed, 1 insertion(+)
> > > >
> > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > > index cfc72873961d..8d65b62784d8 100644
> > > > --- a/mm/page_alloc.c
> > > > +++ b/mm/page_alloc.c
> > > > @@ -8947,6 +8947,7 @@ bool take_page_off_buddy(struct page *page)
> > > > del_page_from_free_list(page_head, zone, page_order);
> > > > break_down_buddy_pages(zone, page_head, page, 0,
> > > > page_order, migratetype);
> > > > + __mod_zone_page_state(zone, NR_FREE_PAGES, -1);
> > > > ret = true;
> > > > break;
> > > > }
> > > >
> > >
> > > Should this use __mod_zone_freepage_state() instead?
> >
> > Yes, __mod_zone_freepage_state() looks better to me.
> >
> > And I think an additional __mod_zone_freepage_state() in
> > unpoison_memory() may be necessary to cancel the decrement. I thought of the
> > following, but it doesn't build because get_pfnblock_migratetype() is
> > available only in mm/page_alloc.c, so you might want to add a small exported
> > routine in mm/page_alloc.c and have it called from unpoison_memory().
> >
> > @@ -1899,8 +1899,12 @@ int unpoison_memory(unsigned long pfn)
> > }
> > if (!get_hwpoison_page(p, flags, 0)) {
> > - if (TestClearPageHWPoison(p))
> > + if (TestClearPageHWPoison(p)) {
> > + int migratetype = get_pfnblock_migratetype(p, pfn);
> > +
> > num_poisoned_pages_dec();
> > + __mod_zone_freepage_state(page_zone(p), 1, migratetype);
> > + }
> > unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n",
> > pfn, &unpoison_rs);
> > return 0;
> >
>
> I think there is another problem:
> In the normal case, we keep the last refcount of the hwpoison page, so
> get_hwpoison_page should return 1. NR_FREE_PAGES will be adjusted when
> put_page is called.
I think take_page_off_buddy() should not be called in this case
(the error page still has a remaining refcount), so there seems to be no
need to update NR_FREE_PAGES?
> In a race condition, we may leak the page because we do not put it back to
> the buddy in unpoison_memory, even though the HWPoison flag, num_poisoned_pages,
> and NR_FREE_PAGES are adjusted correctly.
>
>           CPU0                          CPU1
>
> soft_offline_page
>   soft_offline_free_page
>     page_handle_poison
>       take_page_off_buddy
>       SetPageHWPoison
>                                 unpoison_memory
>                                   if (!get_hwpoison_page(p))
>                                     TestClearPageHWPoison
>                                     num_poisoned_pages_dec
>                                     __mod_zone_freepage_state
>                                     return 0
>                                     /* miss put the page back to buddy */
>       page_ref_inc
>       num_poisoned_pages_inc
Thanks for checking this; unpoison_memory() is indeed racy. We recently proposed
introducing mf_mutex in [1]. That patch has not been merged to mainline yet,
but it could be used to prevent the above race too.
[1] https://lore.kernel.org/linux-mm/20210427062953.2080293-2-nao.horiguchi@gmail.com/
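To illustrate the idea only: the following is a user-space toy model, not the
patch in [1] and not kernel code. All toy_* names are hypothetical, and it
assumes the mutex is held across the whole offline path and the whole unpoison
path, so the unpoison side can never observe a half-finished offline.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins; none of these are kernel APIs. */
static pthread_mutex_t toy_mf_mutex = PTHREAD_MUTEX_INITIALIZER;

struct toy_mf_page {
	bool hwpoison;	/* stands in for PageHWPoison */
	int refcount;	/* stands in for the page refcount */
};

static long toy_num_poisoned;	/* stands in for num_poisoned_pages */

/* Offline path: flag, refcount and counter change as one unit. */
static void toy_soft_offline(struct toy_mf_page *p)
{
	pthread_mutex_lock(&toy_mf_mutex);
	p->hwpoison = true;
	/* Without the lock, an unpoison could slip in at this point. */
	p->refcount++;
	toy_num_poisoned++;
	pthread_mutex_unlock(&toy_mf_mutex);
}

/* Unpoison path: takes the same lock before looking at the flag. */
static bool toy_unpoison(struct toy_mf_page *p)
{
	bool done = false;

	pthread_mutex_lock(&toy_mf_mutex);
	if (p->hwpoison) {
		p->hwpoison = false;
		p->refcount--;
		toy_num_poisoned--;
		done = true;
	}
	pthread_mutex_unlock(&toy_mf_mutex);
	return done;
}
```

With both paths under the same lock, toy_unpoison() sees either an untouched
page or a fully offlined one, which is the property the race above lacks.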
>
> How about doing nothing and returning -EBUSY (so the caller can retry) when
> unpoisoning a zero-refcount page, or returning 0 like 230ac719c500 ("mm/hwpoison:
> don't try to unpoison containment-failed pages") does?
>
> @@ -1736,11 +1736,9 @@ int unpoison_memory(unsigned long pfn)
> }
>
> if (!get_hwpoison_page(p, flags, 0)) {
> - if (TestClearPageHWPoison(p))
> - num_poisoned_pages_dec();
> - unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n",
> + unpoison_pr_info("Unpoison: Software-unpoisoned zero refcount page %#lx\n",
> pfn, &unpoison_rs);
> - return 0;
> + return -EBUSY;
Currently unpoison_memory() does not work as the reverse operation of
take_page_off_buddy() (it's simply broken), so it would be better to implement
that properly in one go. I'll take some time to fix unpoison_memory().
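As a rough picture of what "reverse operation" means here, below is a
user-space toy model (not a proposed patch; all toy_* names are hypothetical).
It assumes the reverse path has to undo everything the forward path did: the
poison flag, the poisoned-page counter, the free-list state, and the free-page
counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model; none of these names are kernel APIs. */
struct toy_zone {
	long nr_free_pages;	/* stands in for NR_FREE_PAGES */
	long nr_poisoned;	/* stands in for num_poisoned_pages */
};

struct toy_page {
	bool in_buddy;		/* page sits on a free list */
	bool hwpoison;		/* stands in for PageHWPoison */
};

/* Forward path: take a free page off the buddy and poison it. */
static bool toy_take_off_buddy(struct toy_zone *z, struct toy_page *p)
{
	if (!p->in_buddy)
		return false;
	p->in_buddy = false;
	z->nr_free_pages--;	/* the decrement discussed in this thread */
	p->hwpoison = true;
	z->nr_poisoned++;
	return true;
}

/* Reverse path: undo the flag, both counters and the free-list state. */
static bool toy_put_back_to_buddy(struct toy_zone *z, struct toy_page *p)
{
	if (p->in_buddy || !p->hwpoison)
		return false;
	p->hwpoison = false;
	z->nr_poisoned--;
	p->in_buddy = true;
	z->nr_free_pages++;
	return true;
}
```

In this model, clearing only the flag and the counters while leaving in_buddy
false would correspond to the leaked page described above.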
Thanks,
Naoya Horiguchi