linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Charan Teja Kalla <charante@codeaurora.org>,
	akpm@linux-foundation.org, mhocko@suse.com, vbabka@suse.cz,
	linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, vinmenon@codeaurora.org
Subject: Re: [PATCH] mm, page_alloc: fix core hung in free_pcppages_bulk()
Date: Tue, 11 Aug 2020 15:13:53 +0200	[thread overview]
Message-ID: <f3147c23-8ea8-c262-8dcc-7982d822db41@redhat.com> (raw)
In-Reply-To: <3e16448a-ea3b-99c0-13b0-a5a31c9b5582@codeaurora.org>

On 11.08.20 15:11, Charan Teja Kalla wrote:
> Thanks David for the comments.
> 
> On 8/11/2020 1:59 PM, David Hildenbrand wrote:
>> On 10.08.20 18:10, Charan Teja Reddy wrote:
>>> The following race is observed with the repeated online, offline and a
>>> delay between two successive online of memory blocks of movable zone.
>>>
>>> P1						P2
>>>
>>> Online the first memory block in
>>> the movable zone. The pcp struct
>>> values are initialized to default
>>> values,i.e., pcp->high = 0 &
>>> pcp->batch = 1.
>>>
>>> 					Allocate the pages from the
>>> 					movable zone.
>>>
>>> Try to Online the second memory
>>> block in the movable zone thus it
>>> entered the online_pages() but yet
>>> to call zone_pcp_update().
>>> 					This process is entered into
>>> 					the exit path thus it tries
>>> 					to release the order-0 pages
>>> 					to pcp lists through
>>> 					free_unref_page_commit().
>>> 					As pcp->high = 0, pcp->count = 1
>>> 					proceed to call the function
>>> 					free_pcppages_bulk().
>>> Update the pcp values thus the
>>> new pcp values are like, say,
>>> pcp->high = 378, pcp->batch = 63.
>>> 					Read the pcp's batch value using
>>> 					READ_ONCE() and pass the same to
>>> 					free_pcppages_bulk(), pcp values
>>> 					passed here are, batch = 63,
>>> 					count = 1.
>>>
>>> 					Since num of pages in the pcp
>>> 					lists are less than ->batch,
>>> 					then it will stuck in
>>> 					while(list_empty(list)) loop
>>> 					with interrupts disabled thus
>>> 					a core hung.
>>>
>>> Avoid this by ensuring free_pcppages_bulk() called with proper count of
>>> pcp list pages.
>>>
>>> The mentioned race is some what easily reproducible without [1] because
>>> pcp's are not updated for the first memory block online and thus there
>>> is a enough race window for P2 between alloc+free and pcp struct values
>>> update through onlining of second memory block.
>>>
>>> With [1], the race is still exists but it is very much narrow as we
>>> update the pcp struct values for the first memory block online itself.
>>>
>>> [1]: https://patchwork.kernel.org/patch/11696389/
>>>
>>> Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
>>> ---
>>>  mm/page_alloc.c | 16 ++++++++++++++--
>>>  1 file changed, 14 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index e4896e6..25e7e12 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -3106,6 +3106,7 @@ static void free_unref_page_commit(struct page *page, unsigned long pfn)
>>>  	struct zone *zone = page_zone(page);
>>>  	struct per_cpu_pages *pcp;
>>>  	int migratetype;
>>> +	int high;
>>>  
>>>  	migratetype = get_pcppage_migratetype(page);
>>>  	__count_vm_event(PGFREE);
>>> @@ -3128,8 +3129,19 @@ static void free_unref_page_commit(struct page *page, unsigned long pfn)
>>>  	pcp = &this_cpu_ptr(zone->pageset)->pcp;
>>>  	list_add(&page->lru, &pcp->lists[migratetype]);
>>>  	pcp->count++;
>>> -	if (pcp->count >= pcp->high) {
>>> -		unsigned long batch = READ_ONCE(pcp->batch);
>>> +	high = READ_ONCE(pcp->high);
>>> +	if (pcp->count >= high) {
>>> +		int batch;
>>> +
>>> +		batch = READ_ONCE(pcp->batch);
>>> +		/*
>>> +		 * For non-default pcp struct values, high is always
>>> +		 * greater than the batch. If high < batch then pass
>>> +		 * proper count to free the pcp's list pages.
>>> +		 */
>>> +		if (unlikely(high < batch))
>>> +			batch = min(pcp->count, batch);
>>> +
>>>  		free_pcppages_bulk(zone, batch, pcp);
>>>  	}
>>>  }
>>>
>>
>> I was wondering if we should rather set all pageblocks to
>> MIGRATE_ISOLATE in online_pages() before doing the online_pages_range()
>> call, and do undo_isolate_page_range() after onlining is done.
>>
>> move_pfn_range_to_zone()->memmap_init_zone() marks all pageblocks
>> MIGRATE_MOVABLE, and as that function is used also during boot, we could
>> supply a parameter to configure this.
>>
>> This would prevent another race from happening: Having pages exposed to
>> the buddy ready for allocation in online_pages_range() before the
>> sections are marked online.
> 
> Yeah this is another bug. And idea of isolate first, online and undoing
> the isolation after zonelist and pcp struct update should work even for
> the mentioned issue. This needs to go as a separate fix?

Yeah, also requires more work to be done. Will add it to my list of TODOs.

> 
> However, IMO, issue in free_pcppages_bulk() should be fixed by checking
> if sane count value is passed. NO?
> Posted V2: https://patchwork.kernel.org/patch/11709225/

Yeah, I'm fine with fixing this issue that can actually be reproduced.


-- 
Thanks,

David / dhildenb



  reply	other threads:[~2020-08-11 13:14 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-10 16:10 [PATCH] mm, page_alloc: fix core hung in free_pcppages_bulk() Charan Teja Reddy
2020-08-10 19:36 ` David Rientjes
2020-08-11 13:01   ` Charan Teja Kalla
2020-08-11  8:29 ` David Hildenbrand
2020-08-11 13:11   ` Charan Teja Kalla
2020-08-11 13:13     ` David Hildenbrand [this message]
2020-08-13  7:07   ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f3147c23-8ea8-c262-8dcc-7982d822db41@redhat.com \
    --to=david@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=charante@codeaurora.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=vbabka@suse.cz \
    --cc=vinmenon@codeaurora.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).