From: Charan Teja Kalla <charante@codeaurora.org>
To: David Hildenbrand <david@redhat.com>,
akpm@linux-foundation.org, mhocko@suse.com, vbabka@suse.cz,
rientjes@google.com, linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, vinmenon@codeaurora.org
Subject: Re: [PATCH V2] mm, page_alloc: fix core hung in free_pcppages_bulk()
Date: Wed, 12 Aug 2020 15:41:30 +0530 [thread overview]
Message-ID: <2a2ec65a-ad93-7adc-76a0-246b9b328155@codeaurora.org> (raw)
In-Reply-To: <db7ac714-f508-85a4-1af8-9733e680dc69@redhat.com>
On 8/12/2020 3:30 PM, David Hildenbrand wrote:
> On 12.08.20 11:46, Charan Teja Kalla wrote:
>>
>> Thanks David for the inputs.
>>
>> On 8/12/2020 2:35 AM, David Hildenbrand wrote:
>>> On 11.08.20 14:58, Charan Teja Reddy wrote:
>>>> The following race is observed with the repeated online, offline and a
>>>> delay between two successive online of memory blocks of movable zone.
>>>>
>>>> P1 P2
>>>>
>>>> Online the first memory block in
>>>> the movable zone. The pcp struct
>>>> values are initialized to default
>>>> values,i.e., pcp->high = 0 &
>>>> pcp->batch = 1.
>>>>
>>>> Allocate the pages from the
>>>> movable zone.
>>>>
>>>> Try to Online the second memory
>>>> block in the movable zone thus it
>>>> entered the online_pages() but yet
>>>> to call zone_pcp_update().
>>>> This process is entered into
>>>> the exit path thus it tries
>>>> to release the order-0 pages
>>>> to pcp lists through
>>>> free_unref_page_commit().
>>>> As pcp->high = 0, pcp->count = 1
>>>> proceed to call the function
>>>> free_pcppages_bulk().
>>>> Update the pcp values thus the
>>>> new pcp values are like, say,
>>>> pcp->high = 378, pcp->batch = 63.
>>>> Read the pcp's batch value using
>>>> READ_ONCE() and pass the same to
>>>> free_pcppages_bulk(), pcp values
>>>> passed here are, batch = 63,
>>>> count = 1.
>>>>
>>>> Since num of pages in the pcp
>>>> lists are less than ->batch,
>>>> then it will stuck in
>>>> while(list_empty(list)) loop
>>>> with interrupts disabled thus
>>>> a core hung.
>>>>
>>>> Avoid this by ensuring free_pcppages_bulk() is called with proper count
>>>> of pcp list pages.
>>>>
>>>> The mentioned race is some what easily reproducible without [1] because
>>>> pcp's are not updated for the first memory block online and thus there
>>>> is a enough race window for P2 between alloc+free and pcp struct values
>>>> update through onlining of second memory block.
>>>>
>>>> With [1], the race is still exists but it is very much narrow as we
>>>> update the pcp struct values for the first memory block online itself.
>>>>
>>>> [1]: https://patchwork.kernel.org/patch/11696389/
>>>>
>>>
>>> IIUC, this is not limited to the movable zone, it could also happen in
>>> corner cases with the normal zone (e.g., hotplug to a node that only has
>>> DMA memory, or no other memory yet).
>>
>> Yes, this is my understanding too. I explained the above race in terms
>> of just movable zone for which it is observed. We can add the below line
>> in the end in patch commit message:
>> "This is not limited to the movable zone, it could also happen in cases
>> with the normal zone (e.g., hotplug to a node that only has DMA memory,
>> or no other memory yet)."
>
> Yeah, that makes sense!
>
>>
>> Just curious, there exists such systems where just a dma zone present
>> and we hot add the normal zone? I am not aware such thing in the
>> embedded world.
>
> You can easily create such setups using QEMU.
>
> IIRC, just specify a QEMU guest with 2G initial memory and a single NUMA
> node, or 4G initial memory and two NUMA nodes. Then hotplug memory.
>
> (IIRC kata containers always start a VM with 2G and then hotplug memory)
>
I see. Thanks for letting me know this.
>>>
>>>> Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
>>>> ---
>>>>
>>>> v1: https://patchwork.kernel.org/patch/11707637/
>>>>
>>>> mm/page_alloc.c | 5 +++++
>>>> 1 file changed, 5 insertions(+)
>>>>
>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>> index e4896e6..839039f 100644
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -1304,6 +1304,11 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>>>> struct page *page, *tmp;
>>>> LIST_HEAD(head);
>>>>
>>>> + /*
>>>> + * Ensure proper count is passed which otherwise would stuck in the
>>>> + * below while (list_empty(list)) loop.
>>>> + */
>>>> + count = min(pcp->count, count);
>>>> while (count) {
>>>> struct list_head *list;
>>>>
>>>>
>>>
>>> Fixes: and Cc: stable... tags?
>>
>> Fixes: 5f8dcc21211a ("page-allocator: split per-cpu list into
>> one-list-per-migrate-type")
>> Cc: <stable@vger.kernel.org> [2.6+]
>
> Did we have memory hotplug support then already?
Yes, it exist.
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/mm/memory_hotplug.c?h=v2.6.39
>
>>
>> I am not sure If I should have to raise V3 including these?
>
>
> Maybe Andrew can fixup when applying.
Okay, let Andrew decide on this. Meanwhile If you find that this patch
looks correct, ACK from you helps here.
>
>
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum, a Linux Foundation Collaborative Project
next prev parent reply other threads:[~2020-08-12 10:12 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-08-11 12:58 [PATCH V2] mm, page_alloc: fix core hung in free_pcppages_bulk() Charan Teja Reddy
2020-08-11 21:05 ` David Hildenbrand
2020-08-12 9:46 ` Charan Teja Kalla
2020-08-12 10:00 ` David Hildenbrand
2020-08-12 10:11 ` Charan Teja Kalla [this message]
2020-08-13 9:32 ` David Hildenbrand
2020-08-12 18:53 ` David Rientjes
2020-08-13 11:41 ` Michal Hocko
2020-08-13 16:21 ` Charan Teja Kalla
2020-08-13 16:30 ` Michal Hocko
2020-08-13 17:27 ` Charan Teja Kalla
2020-08-14 6:39 ` Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2a2ec65a-ad93-7adc-76a0-246b9b328155@codeaurora.org \
--to=charante@codeaurora.org \
--cc=akpm@linux-foundation.org \
--cc=david@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.com \
--cc=rientjes@google.com \
--cc=vbabka@suse.cz \
--cc=vinmenon@codeaurora.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).