All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ding Hui <dinghui@sangfor.com.cn>
To: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>
Cc: David Hildenbrand <david@redhat.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"osalvador@suse.de" <osalvador@suse.de>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH] mm/page_alloc: fix counting of free pages after take off from buddy
Date: Fri, 7 May 2021 09:46:46 +0800	[thread overview]
Message-ID: <6af291a0-41fa-8112-5297-6a4cdf2337b6@sangfor.com.cn> (raw)
In-Reply-To: <20210506073055.GA1848917@hori.linux.bs1.fc.nec.co.jp>

On 2021/5/6 15:30, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Thu, May 06, 2021 at 12:01:34PM +0800, Ding Hui wrote:
>> On 2021/5/6 10:49, HORIGUCHI NAOYA(堀口 直也) wrote:
>>> On Wed, Apr 28, 2021 at 04:54:59PM +0200, David Hildenbrand wrote:
>>>> On 21.04.21 04:04, Ding Hui wrote:
>>>>> Recently we found there is a lot MemFree left in /proc/meminfo after
>>>>> do a lot of pages soft offline.
>>>>>
>>>>> I think it's incorrect since NR_FREE_PAGES should not contain HWPoison pages.
>>>>> After take_page_off_buddy, the page is no longer belong to buddy
>>>>> allocator, and will not be used any more, but we maybe missed accounting
>>>>> NR_FREE_PAGES in this situation.
>>>>>
>>>>> Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
>>>>> ---
>>>>>     mm/page_alloc.c | 1 +
>>>>>     1 file changed, 1 insertion(+)
>>>>>
>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>> index cfc72873961d..8d65b62784d8 100644
>>>>> --- a/mm/page_alloc.c
>>>>> +++ b/mm/page_alloc.c
>>>>> @@ -8947,6 +8947,7 @@ bool take_page_off_buddy(struct page *page)
>>>>>     			del_page_from_free_list(page_head, zone, page_order);
>>>>>     			break_down_buddy_pages(zone, page_head, page, 0,
>>>>>     						page_order, migratetype);
>>>>> +			__mod_zone_page_state(zone, NR_FREE_PAGES, -1);
>>>>>     			ret = true;
>>>>>     			break;
>>>>>     		}
>>>>>
>>>>
>>>> Should this use __mod_zone_freepage_state() instead?
>>>
>>> Yes, __mod_zone_freepage_state() looks better to me.
>>>
>>> And I think that maybe an additional __mod_zone_freepage_state() in
>>> unpoison_memory() is necessary to cancel the decrement.  I thought of the
>>> following, but it doesn't build because get_pfnblock_migratetype() is
>>> available only in mm/page_alloc.c, so you might want to add a small exported
>>> routine in mm/page_alloc.c and let it called from unpoison_memory().
>>>
>>>     @@ -1899,8 +1899,12 @@ int unpoison_memory(unsigned long pfn)
>>>             }
>>>             if (!get_hwpoison_page(p, flags, 0)) {
>>>     -               if (TestClearPageHWPoison(p))
>>>     +               if (TestClearPageHWPoison(p)) {
>>>     +                       int migratetype = get_pfnblock_migratetype(p, pfn);
>>>     +
>>>                             num_poisoned_pages_dec();
>>>     +                       __mod_zone_freepage_state(page_zone(p), 1, migratetype);
>>>     +               }
>>>                     unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n",
>>>                                      pfn, &unpoison_rs);
>>>                     return 0;
>>>
>>
>> I think there is another problem:
>> In normal case, we keep the last refcount of the hwpoison page, so
>> get_hwpoison_page should return 1. The NR_FREE_PAGES will be adjusted when
>> call put_page.
> 
> I think that take_page_off_buddy() should not be called for this case
> (the error page have remaining refcount).  So it seems that no need to
> update NR_FREE_PAGES ?
> 

Yes, take_page_off_buddy() only used for free pages, but we will call 
page_ref_inc() after that, on the other hand for in used pages, we 
increased the refcount by get_any_page(), so in both cases, the 
hwpoisoned pages have refcount great than zero.

I think there is no need to update NR_FREE_PAGES explicitly in 
unpoison_memory(), the put_page() will help us to update NR_FREE_PAGES 
and put the page back to buddy.

>> At race condition, we maybe leak the page because we does not put it back to
>> buddy in unpoison_memory, however the HWPoison flag, num_poisoned_pages,
>> NR_FREE_PAGES is adjusted correctly.
>>
>> CPU0                        CPU1
>>
>> soft_offline_page
>>    soft_offline_free_page
>>      page_handle_poison
>>        take_page_off_buddy
>>        SetPageHWPoison
>>                              unpoison_memory
>>                                if (!get_hwpoison_page(p))
>>                                  TestClearPageHWPoison
>>                                    num_poisoned_pages_dec
>>                                  __mod_zone_freepage_state
>>                                  return 0
>>                                  /* miss put the page back to buddy */
>>        page_ref_inc
>>        num_poisoned_pages_inc
> 
> Thanks for checking this, unpoison_memory() is racy.  Recently we are suggesting
> to introduce mf_mutex by [1].  Although this patch is not merged to mainline yet,
> but it could be used to prevent the above race too.
> 
> [1] https://lore.kernel.org/linux-mm/20210427062953.2080293-2-nao.horiguchi@gmail.com/
> 

I'll look forward to it, thanks.

>>
>> How about do nothing and return -EBUSY (so the caller can retry) if unpoison
>> a zero refcount page , or return 0 like 230ac719c500 ("mm/hwpoison: don't
>> try to unpoison containment-failed pages") does ?
>>
>>    @@ -1736,11 +1736,9 @@ int unpoison_memory(unsigned long pfn)
>>      }
>>
>>      if (!get_hwpoison_page(p, flags, 0)) {
>>    -       if (TestClearPageHWPoison(p))
>>    -           num_poisoned_pages_dec();
>>    -       unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n",
>>    +       unpoison_pr_info("Unpoison: Software-unpoisoned zero refcount page
>> %#lx\n",
>>    				 pfn, &unpoison_rs);
>>    -       return 0;
>>    +       return -EBUSY;
> 
> Currently unpoison_memory() does not work as reverse operation of take_page_off_buddy()
> (it's simply broken), so implementing it at one time would be better.
> I'll take time to fix unpoison_memory().
> 

Thanks for your work.
Actually, I'm not sure about the exactly meaning of "broken", it seems 
that the basic function of unpoison_memory() is ok if not considered the 
race conditions.


-- 
Thanks,
- Ding Hui

      reply	other threads:[~2021-05-07  1:46 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-21  2:04 [RFC PATCH] mm/page_alloc: fix counting of free pages after take off from buddy Ding Hui
2021-04-28 14:54 ` David Hildenbrand
2021-04-30  9:43   ` Ding Hui
2021-05-08  3:55     ` [PATCH v2] " Ding Hui
2021-05-25  8:32       ` HORIGUCHI NAOYA(堀口 直也)
2021-05-26  0:43         ` Ding Hui
2021-05-06  2:49   ` [RFC PATCH] " HORIGUCHI NAOYA(堀口 直也)
2021-05-06  4:01     ` Ding Hui
2021-05-06  7:30       ` HORIGUCHI NAOYA(堀口 直也)
2021-05-07  1:46         ` Ding Hui [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6af291a0-41fa-8112-5297-6a4cdf2337b6@sangfor.com.cn \
    --to=dinghui@sangfor.com.cn \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=naoya.horiguchi@nec.com \
    --cc=osalvador@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.