linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ding Hui <dinghui@sangfor.com.cn>
To: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>
Cc: David Hildenbrand <david@redhat.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"osalvador@suse.de" <osalvador@suse.de>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH] mm/page_alloc: fix counting of free pages after take off from buddy
Date: Fri, 7 May 2021 09:46:46 +0800	[thread overview]
Message-ID: <6af291a0-41fa-8112-5297-6a4cdf2337b6@sangfor.com.cn> (raw)
In-Reply-To: <20210506073055.GA1848917@hori.linux.bs1.fc.nec.co.jp>

On 2021/5/6 15:30, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Thu, May 06, 2021 at 12:01:34PM +0800, Ding Hui wrote:
>> On 2021/5/6 10:49, HORIGUCHI NAOYA(堀口 直也) wrote:
>>> On Wed, Apr 28, 2021 at 04:54:59PM +0200, David Hildenbrand wrote:
>>>> On 21.04.21 04:04, Ding Hui wrote:
>>>>> Recently we found there is a lot MemFree left in /proc/meminfo after
>>>>> do a lot of pages soft offline.
>>>>>
>>>>> I think it's incorrect since NR_FREE_PAGES should not contain HWPoison pages.
>>>>> After take_page_off_buddy, the page is no longer belong to buddy
>>>>> allocator, and will not be used any more, but we maybe missed accounting
>>>>> NR_FREE_PAGES in this situation.
>>>>>
>>>>> Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
>>>>> ---
>>>>>     mm/page_alloc.c | 1 +
>>>>>     1 file changed, 1 insertion(+)
>>>>>
>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>> index cfc72873961d..8d65b62784d8 100644
>>>>> --- a/mm/page_alloc.c
>>>>> +++ b/mm/page_alloc.c
>>>>> @@ -8947,6 +8947,7 @@ bool take_page_off_buddy(struct page *page)
>>>>>     			del_page_from_free_list(page_head, zone, page_order);
>>>>>     			break_down_buddy_pages(zone, page_head, page, 0,
>>>>>     						page_order, migratetype);
>>>>> +			__mod_zone_page_state(zone, NR_FREE_PAGES, -1);
>>>>>     			ret = true;
>>>>>     			break;
>>>>>     		}
>>>>>
>>>>
>>>> Should this use __mod_zone_freepage_state() instead?
>>>
>>> Yes, __mod_zone_freepage_state() looks better to me.
>>>
>>> And I think that maybe an additional __mod_zone_freepage_state() in
>>> unpoison_memory() is necessary to cancel the decrement.  I thought of the
>>> following, but it doesn't build because get_pfnblock_migratetype() is
>>> available only in mm/page_alloc.c, so you might want to add a small exported
>>> routine in mm/page_alloc.c and let it called from unpoison_memory().
>>>
>>>     @@ -1899,8 +1899,12 @@ int unpoison_memory(unsigned long pfn)
>>>             }
>>>             if (!get_hwpoison_page(p, flags, 0)) {
>>>     -               if (TestClearPageHWPoison(p))
>>>     +               if (TestClearPageHWPoison(p)) {
>>>     +                       int migratetype = get_pfnblock_migratetype(p, pfn);
>>>     +
>>>                             num_poisoned_pages_dec();
>>>     +                       __mod_zone_freepage_state(page_zone(p), 1, migratetype);
>>>     +               }
>>>                     unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n",
>>>                                      pfn, &unpoison_rs);
>>>                     return 0;
>>>
>>
>> I think there is another problem:
>> In normal case, we keep the last refcount of the hwpoison page, so
>> get_hwpoison_page should return 1. The NR_FREE_PAGES will be adjusted when
>> call put_page.
> 
> I think that take_page_off_buddy() should not be called for this case
> (the error page have remaining refcount).  So it seems that no need to
> update NR_FREE_PAGES ?
> 

Yes, take_page_off_buddy() only used for free pages, but we will call 
page_ref_inc() after that, on the other hand for in used pages, we 
increased the refcount by get_any_page(), so in both cases, the 
hwpoisoned pages have refcount great than zero.

I think there is no need to update NR_FREE_PAGES explicitly in 
unpoison_memory(), the put_page() will help us to update NR_FREE_PAGES 
and put the page back to buddy.

>> At race condition, we maybe leak the page because we does not put it back to
>> buddy in unpoison_memory, however the HWPoison flag, num_poisoned_pages,
>> NR_FREE_PAGES is adjusted correctly.
>>
>> CPU0                        CPU1
>>
>> soft_offline_page
>>    soft_offline_free_page
>>      page_handle_poison
>>        take_page_off_buddy
>>        SetPageHWPoison
>>                              unpoison_memory
>>                                if (!get_hwpoison_page(p))
>>                                  TestClearPageHWPoison
>>                                    num_poisoned_pages_dec
>>                                  __mod_zone_freepage_state
>>                                  return 0
>>                                  /* miss put the page back to buddy */
>>        page_ref_inc
>>        num_poisoned_pages_inc
> 
> Thanks for checking this, unpoison_memory() is racy.  Recently we are suggesting
> to introduce mf_mutex by [1].  Although this patch is not merged to mainline yet,
> but it could be used to prevent the above race too.
> 
> [1] https://lore.kernel.org/linux-mm/20210427062953.2080293-2-nao.horiguchi@gmail.com/
> 

I'll look forward to it, thanks.

>>
>> How about do nothing and return -EBUSY (so the caller can retry) if unpoison
>> a zero refcount page , or return 0 like 230ac719c500 ("mm/hwpoison: don't
>> try to unpoison containment-failed pages") does ?
>>
>>    @@ -1736,11 +1736,9 @@ int unpoison_memory(unsigned long pfn)
>>      }
>>
>>      if (!get_hwpoison_page(p, flags, 0)) {
>>    -       if (TestClearPageHWPoison(p))
>>    -           num_poisoned_pages_dec();
>>    -       unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n",
>>    +       unpoison_pr_info("Unpoison: Software-unpoisoned zero refcount page
>> %#lx\n",
>>    				 pfn, &unpoison_rs);
>>    -       return 0;
>>    +       return -EBUSY;
> 
> Currently unpoison_memory() does not work as reverse operation of take_page_off_buddy()
> (it's simply broken), so implementing it at one time would be better.
> I'll take time to fix unpoison_memory().
> 

Thanks for your work.
Actually, I'm not sure about the exactly meaning of "broken", it seems 
that the basic function of unpoison_memory() is ok if not considered the 
race conditions.


-- 
Thanks,
- Ding Hui

      reply	other threads:[~2021-05-07  1:46 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-21  2:04 Ding Hui
2021-04-28 14:54 ` David Hildenbrand
2021-04-30  9:43   ` Ding Hui
2021-05-08  3:55     ` [PATCH v2] " Ding Hui
2021-05-25  8:32       ` HORIGUCHI NAOYA(堀口 直也)
2021-05-26  0:43         ` Ding Hui
2021-05-06  2:49   ` [RFC PATCH] " HORIGUCHI NAOYA(堀口 直也)
2021-05-06  4:01     ` Ding Hui
2021-05-06  7:30       ` HORIGUCHI NAOYA(堀口 直也)
2021-05-07  1:46         ` Ding Hui [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6af291a0-41fa-8112-5297-6a4cdf2337b6@sangfor.com.cn \
    --to=dinghui@sangfor.com.cn \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=naoya.horiguchi@nec.com \
    --cc=osalvador@suse.de \
    --subject='Re: [RFC PATCH] mm/page_alloc: fix counting of free pages after take off from buddy' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
on how to clone and mirror all data and code used for this inbox