* [PATCH V2] mm, page_alloc: fix core hung in free_pcppages_bulk() @ 2020-08-11 12:58 Charan Teja Reddy 2020-08-11 21:05 ` David Hildenbrand 2020-08-13 11:41 ` Michal Hocko 0 siblings, 2 replies; 13+ messages in thread From: Charan Teja Reddy @ 2020-08-11 12:58 UTC (permalink / raw) To: akpm, mhocko, vbabka, david, rientjes, linux-mm Cc: linux-kernel, vinmenon, Charan Teja Reddy

The following race is observed with repeated online and offline, with a delay between two successive onlines, of memory blocks of the movable zone.

P1: Onlines the first memory block in the movable zone. The pcp struct values are initialized to their defaults, i.e., pcp->high = 0 and pcp->batch = 1.

P2: Allocates pages from the movable zone.

P1: Tries to online the second memory block in the movable zone; it has entered online_pages() but has not yet called zone_pcp_update().

P2: Enters the exit path and tries to release its order-0 pages to the pcp lists through free_unref_page_commit(). As pcp->high = 0 and pcp->count = 1, it proceeds to call free_pcppages_bulk().

P1: Updates the pcp struct values; the new values are, say, pcp->high = 378 and pcp->batch = 63.

P2: Reads the pcp's batch value using READ_ONCE() and passes it to free_pcppages_bulk(); the values passed here are batch = 63, count = 1.

Since the number of pages in the pcp lists is less than ->batch, P2 gets stuck in the while (list_empty(list)) loop with interrupts disabled, hence a core hang.

Avoid this by ensuring free_pcppages_bulk() is called with the proper count of pcp list pages.

The mentioned race is somewhat easily reproducible without [1] because the pcp's are not updated for the first memory block online, so there is a wide enough race window for P2 between alloc+free and the pcp struct values update through onlining of the second memory block.

With [1] the race still exists, but the window is much narrower, since the pcp struct values are updated for the first memory block online itself.
[1]: https://patchwork.kernel.org/patch/11696389/

Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
---

v1: https://patchwork.kernel.org/patch/11707637/

 mm/page_alloc.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e4896e6..839039f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1304,6 +1304,11 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 	struct page *page, *tmp;
 	LIST_HEAD(head);
 
+	/*
+	 * Ensure a proper count is passed, which otherwise would get stuck
+	 * in the below while (list_empty(list)) loop.
+	 */
+	count = min(pcp->count, count);
 	while (count) {
 		struct list_head *list;

-- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH V2] mm, page_alloc: fix core hung in free_pcppages_bulk() 2020-08-11 12:58 [PATCH V2] mm, page_alloc: fix core hung in free_pcppages_bulk() Charan Teja Reddy @ 2020-08-11 21:05 ` David Hildenbrand 2020-08-12 9:46 ` Charan Teja Kalla 2020-08-13 11:41 ` Michal Hocko 1 sibling, 1 reply; 13+ messages in thread From: David Hildenbrand @ 2020-08-11 21:05 UTC (permalink / raw) To: Charan Teja Reddy, akpm, mhocko, vbabka, rientjes, linux-mm Cc: linux-kernel, vinmenon On 11.08.20 14:58, Charan Teja Reddy wrote: > The following race is observed with the repeated online, offline and a > delay between two successive online of memory blocks of movable zone. > > P1 P2 > > Online the first memory block in > the movable zone. The pcp struct > values are initialized to default > values,i.e., pcp->high = 0 & > pcp->batch = 1. > > Allocate the pages from the > movable zone. > > Try to Online the second memory > block in the movable zone thus it > entered the online_pages() but yet > to call zone_pcp_update(). > This process is entered into > the exit path thus it tries > to release the order-0 pages > to pcp lists through > free_unref_page_commit(). > As pcp->high = 0, pcp->count = 1 > proceed to call the function > free_pcppages_bulk(). > Update the pcp values thus the > new pcp values are like, say, > pcp->high = 378, pcp->batch = 63. > Read the pcp's batch value using > READ_ONCE() and pass the same to > free_pcppages_bulk(), pcp values > passed here are, batch = 63, > count = 1. > > Since num of pages in the pcp > lists are less than ->batch, > then it will stuck in > while(list_empty(list)) loop > with interrupts disabled thus > a core hung. > > Avoid this by ensuring free_pcppages_bulk() is called with proper count > of pcp list pages. 
> > The mentioned race is some what easily reproducible without [1] because > pcp's are not updated for the first memory block online and thus there > is a enough race window for P2 between alloc+free and pcp struct values > update through onlining of second memory block. > > With [1], the race is still exists but it is very much narrow as we > update the pcp struct values for the first memory block online itself. > > [1]: https://patchwork.kernel.org/patch/11696389/ > IIUC, this is not limited to the movable zone, it could also happen in corner cases with the normal zone (e.g., hotplug to a node that only has DMA memory, or no other memory yet). > Signed-off-by: Charan Teja Reddy <charante@codeaurora.org> > --- > > v1: https://patchwork.kernel.org/patch/11707637/ > > mm/page_alloc.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index e4896e6..839039f 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1304,6 +1304,11 @@ static void free_pcppages_bulk(struct zone *zone, int count, > struct page *page, *tmp; > LIST_HEAD(head); > > + /* > + * Ensure proper count is passed which otherwise would stuck in the > + * below while (list_empty(list)) loop. > + */ > + count = min(pcp->count, count); > while (count) { > struct list_head *list; > > Fixes: and Cc: stable... tags? -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH V2] mm, page_alloc: fix core hung in free_pcppages_bulk() 2020-08-11 21:05 ` David Hildenbrand @ 2020-08-12 9:46 ` Charan Teja Kalla 2020-08-12 10:00 ` David Hildenbrand 2020-08-12 18:53 ` David Rientjes 0 siblings, 2 replies; 13+ messages in thread From: Charan Teja Kalla @ 2020-08-12 9:46 UTC (permalink / raw) To: David Hildenbrand, akpm, mhocko, vbabka, rientjes, linux-mm Cc: linux-kernel, vinmenon Thanks David for the inputs. On 8/12/2020 2:35 AM, David Hildenbrand wrote: > On 11.08.20 14:58, Charan Teja Reddy wrote: >> The following race is observed with the repeated online, offline and a >> delay between two successive online of memory blocks of movable zone. >> >> P1 P2 >> >> Online the first memory block in >> the movable zone. The pcp struct >> values are initialized to default >> values,i.e., pcp->high = 0 & >> pcp->batch = 1. >> >> Allocate the pages from the >> movable zone. >> >> Try to Online the second memory >> block in the movable zone thus it >> entered the online_pages() but yet >> to call zone_pcp_update(). >> This process is entered into >> the exit path thus it tries >> to release the order-0 pages >> to pcp lists through >> free_unref_page_commit(). >> As pcp->high = 0, pcp->count = 1 >> proceed to call the function >> free_pcppages_bulk(). >> Update the pcp values thus the >> new pcp values are like, say, >> pcp->high = 378, pcp->batch = 63. >> Read the pcp's batch value using >> READ_ONCE() and pass the same to >> free_pcppages_bulk(), pcp values >> passed here are, batch = 63, >> count = 1. >> >> Since num of pages in the pcp >> lists are less than ->batch, >> then it will stuck in >> while(list_empty(list)) loop >> with interrupts disabled thus >> a core hung. >> >> Avoid this by ensuring free_pcppages_bulk() is called with proper count >> of pcp list pages. 
>>
>> The mentioned race is some what easily reproducible without [1] because
>> pcp's are not updated for the first memory block online and thus there
>> is a enough race window for P2 between alloc+free and pcp struct values
>> update through onlining of second memory block.
>>
>> With [1], the race is still exists but it is very much narrow as we
>> update the pcp struct values for the first memory block online itself.
>>
>> [1]: https://patchwork.kernel.org/patch/11696389/
>>
>
> IIUC, this is not limited to the movable zone, it could also happen in
> corner cases with the normal zone (e.g., hotplug to a node that only has
> DMA memory, or no other memory yet).

Yes, this is my understanding too. I explained the above race in terms of just the movable zone, for which it was observed. We can add the below line at the end of the patch commit message:

"This is not limited to the movable zone, it could also happen in cases with the normal zone (e.g., hotplug to a node that only has DMA memory, or no other memory yet)."

Just curious: do there exist systems where just a DMA zone is present and we hot add the normal zone? I am not aware of such a thing in the embedded world.

>
>> Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
>> ---
>>
>> v1: https://patchwork.kernel.org/patch/11707637/
>>
>> mm/page_alloc.c | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index e4896e6..839039f 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1304,6 +1304,11 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>> struct page *page, *tmp;
>> LIST_HEAD(head);
>>
>> + /*
>> + * Ensure proper count is passed which otherwise would stuck in the
>> + * below while (list_empty(list)) loop.
>> + */
>> + count = min(pcp->count, count);
>> while (count) {
>> struct list_head *list;
>>
>>
>
> Fixes: and Cc: stable... tags?
Fixes: 5f8dcc21211a ("page-allocator: split per-cpu list into one-list-per-migrate-type")
Cc: <stable@vger.kernel.org> [2.6+]

I am not sure if I should raise a V3 including these?

> -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH V2] mm, page_alloc: fix core hung in free_pcppages_bulk() 2020-08-12 9:46 ` Charan Teja Kalla @ 2020-08-12 10:00 ` David Hildenbrand 2020-08-12 10:11 ` Charan Teja Kalla 2020-08-12 18:53 ` David Rientjes 1 sibling, 1 reply; 13+ messages in thread From: David Hildenbrand @ 2020-08-12 10:00 UTC (permalink / raw) To: Charan Teja Kalla, akpm, mhocko, vbabka, rientjes, linux-mm Cc: linux-kernel, vinmenon On 12.08.20 11:46, Charan Teja Kalla wrote: > > Thanks David for the inputs. > > On 8/12/2020 2:35 AM, David Hildenbrand wrote: >> On 11.08.20 14:58, Charan Teja Reddy wrote: >>> The following race is observed with the repeated online, offline and a >>> delay between two successive online of memory blocks of movable zone. >>> >>> P1 P2 >>> >>> Online the first memory block in >>> the movable zone. The pcp struct >>> values are initialized to default >>> values,i.e., pcp->high = 0 & >>> pcp->batch = 1. >>> >>> Allocate the pages from the >>> movable zone. >>> >>> Try to Online the second memory >>> block in the movable zone thus it >>> entered the online_pages() but yet >>> to call zone_pcp_update(). >>> This process is entered into >>> the exit path thus it tries >>> to release the order-0 pages >>> to pcp lists through >>> free_unref_page_commit(). >>> As pcp->high = 0, pcp->count = 1 >>> proceed to call the function >>> free_pcppages_bulk(). >>> Update the pcp values thus the >>> new pcp values are like, say, >>> pcp->high = 378, pcp->batch = 63. >>> Read the pcp's batch value using >>> READ_ONCE() and pass the same to >>> free_pcppages_bulk(), pcp values >>> passed here are, batch = 63, >>> count = 1. >>> >>> Since num of pages in the pcp >>> lists are less than ->batch, >>> then it will stuck in >>> while(list_empty(list)) loop >>> with interrupts disabled thus >>> a core hung. >>> >>> Avoid this by ensuring free_pcppages_bulk() is called with proper count >>> of pcp list pages. 
>>> >>> The mentioned race is some what easily reproducible without [1] because >>> pcp's are not updated for the first memory block online and thus there >>> is a enough race window for P2 between alloc+free and pcp struct values >>> update through onlining of second memory block. >>> >>> With [1], the race is still exists but it is very much narrow as we >>> update the pcp struct values for the first memory block online itself. >>> >>> [1]: https://patchwork.kernel.org/patch/11696389/ >>> >> >> IIUC, this is not limited to the movable zone, it could also happen in >> corner cases with the normal zone (e.g., hotplug to a node that only has >> DMA memory, or no other memory yet). > > Yes, this is my understanding too. I explained the above race in terms > of just movable zone for which it is observed. We can add the below line > in the end in patch commit message: > "This is not limited to the movable zone, it could also happen in cases > with the normal zone (e.g., hotplug to a node that only has DMA memory, > or no other memory yet)." Yeah, that makes sense! > > Just curious, there exists such systems where just a dma zone present > and we hot add the normal zone? I am not aware such thing in the > embedded world. You can easily create such setups using QEMU. IIRC, just specify a QEMU guest with 2G initial memory and a single NUMA node, or 4G initial memory and two NUMA nodes. Then hotplug memory. 
(IIRC kata containers always start a VM with 2G and then hotplug memory) >> >>> Signed-off-by: Charan Teja Reddy <charante@codeaurora.org> >>> --- >>> >>> v1: https://patchwork.kernel.org/patch/11707637/ >>> >>> mm/page_alloc.c | 5 +++++ >>> 1 file changed, 5 insertions(+) >>> >>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >>> index e4896e6..839039f 100644 >>> --- a/mm/page_alloc.c >>> +++ b/mm/page_alloc.c >>> @@ -1304,6 +1304,11 @@ static void free_pcppages_bulk(struct zone *zone, int count, >>> struct page *page, *tmp; >>> LIST_HEAD(head); >>> >>> + /* >>> + * Ensure proper count is passed which otherwise would stuck in the >>> + * below while (list_empty(list)) loop. >>> + */ >>> + count = min(pcp->count, count); >>> while (count) { >>> struct list_head *list; >>> >>> >> >> Fixes: and Cc: stable... tags? > > Fixes: 5f8dcc21211a ("page-allocator: split per-cpu list into > one-list-per-migrate-type") > Cc: <stable@vger.kernel.org> [2.6+] Did we have memory hotplug support then already? > > I am not sure If I should have to raise V3 including these? Maybe Andrew can fixup when applying. -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH V2] mm, page_alloc: fix core hung in free_pcppages_bulk() 2020-08-12 10:00 ` David Hildenbrand @ 2020-08-12 10:11 ` Charan Teja Kalla 2020-08-13 9:32 ` David Hildenbrand 0 siblings, 1 reply; 13+ messages in thread From: Charan Teja Kalla @ 2020-08-12 10:11 UTC (permalink / raw) To: David Hildenbrand, akpm, mhocko, vbabka, rientjes, linux-mm Cc: linux-kernel, vinmenon On 8/12/2020 3:30 PM, David Hildenbrand wrote: > On 12.08.20 11:46, Charan Teja Kalla wrote: >> >> Thanks David for the inputs. >> >> On 8/12/2020 2:35 AM, David Hildenbrand wrote: >>> On 11.08.20 14:58, Charan Teja Reddy wrote: >>>> The following race is observed with the repeated online, offline and a >>>> delay between two successive online of memory blocks of movable zone. >>>> >>>> P1 P2 >>>> >>>> Online the first memory block in >>>> the movable zone. The pcp struct >>>> values are initialized to default >>>> values,i.e., pcp->high = 0 & >>>> pcp->batch = 1. >>>> >>>> Allocate the pages from the >>>> movable zone. >>>> >>>> Try to Online the second memory >>>> block in the movable zone thus it >>>> entered the online_pages() but yet >>>> to call zone_pcp_update(). >>>> This process is entered into >>>> the exit path thus it tries >>>> to release the order-0 pages >>>> to pcp lists through >>>> free_unref_page_commit(). >>>> As pcp->high = 0, pcp->count = 1 >>>> proceed to call the function >>>> free_pcppages_bulk(). >>>> Update the pcp values thus the >>>> new pcp values are like, say, >>>> pcp->high = 378, pcp->batch = 63. >>>> Read the pcp's batch value using >>>> READ_ONCE() and pass the same to >>>> free_pcppages_bulk(), pcp values >>>> passed here are, batch = 63, >>>> count = 1. >>>> >>>> Since num of pages in the pcp >>>> lists are less than ->batch, >>>> then it will stuck in >>>> while(list_empty(list)) loop >>>> with interrupts disabled thus >>>> a core hung. >>>> >>>> Avoid this by ensuring free_pcppages_bulk() is called with proper count >>>> of pcp list pages. 
>>>> >>>> The mentioned race is some what easily reproducible without [1] because >>>> pcp's are not updated for the first memory block online and thus there >>>> is a enough race window for P2 between alloc+free and pcp struct values >>>> update through onlining of second memory block. >>>> >>>> With [1], the race is still exists but it is very much narrow as we >>>> update the pcp struct values for the first memory block online itself. >>>> >>>> [1]: https://patchwork.kernel.org/patch/11696389/ >>>> >>> >>> IIUC, this is not limited to the movable zone, it could also happen in >>> corner cases with the normal zone (e.g., hotplug to a node that only has >>> DMA memory, or no other memory yet). >> >> Yes, this is my understanding too. I explained the above race in terms >> of just movable zone for which it is observed. We can add the below line >> in the end in patch commit message: >> "This is not limited to the movable zone, it could also happen in cases >> with the normal zone (e.g., hotplug to a node that only has DMA memory, >> or no other memory yet)." > > Yeah, that makes sense! > >> >> Just curious, there exists such systems where just a dma zone present >> and we hot add the normal zone? I am not aware such thing in the >> embedded world. > > You can easily create such setups using QEMU. > > IIRC, just specify a QEMU guest with 2G initial memory and a single NUMA > node, or 4G initial memory and two NUMA nodes. Then hotplug memory. > > (IIRC kata containers always start a VM with 2G and then hotplug memory) > I see. Thanks for letting me know this. 
>>>
>>>> Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
>>>> ---
>>>>
>>>> v1: https://patchwork.kernel.org/patch/11707637/
>>>>
>>>> mm/page_alloc.c | 5 +++++
>>>> 1 file changed, 5 insertions(+)
>>>>
>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>> index e4896e6..839039f 100644
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -1304,6 +1304,11 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>>>> struct page *page, *tmp;
>>>> LIST_HEAD(head);
>>>>
>>>> + /*
>>>> + * Ensure proper count is passed which otherwise would stuck in the
>>>> + * below while (list_empty(list)) loop.
>>>> + */
>>>> + count = min(pcp->count, count);
>>>> while (count) {
>>>> struct list_head *list;
>>>>
>>>>
>>>
>>> Fixes: and Cc: stable... tags?
>>
>> Fixes: 5f8dcc21211a ("page-allocator: split per-cpu list into
>> one-list-per-migrate-type")
>> Cc: <stable@vger.kernel.org> [2.6+]
>
> Did we have memory hotplug support then already?

Yes, it exists:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/mm/memory_hotplug.c?h=v2.6.39

>
>>
>> I am not sure If I should have to raise V3 including these?
>
> Maybe Andrew can fixup when applying.

Okay, let Andrew decide on this. Meanwhile, if you find that this patch looks correct, an ACK from you would help here.

>
> -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH V2] mm, page_alloc: fix core hung in free_pcppages_bulk() 2020-08-12 10:11 ` Charan Teja Kalla @ 2020-08-13 9:32 ` David Hildenbrand 0 siblings, 0 replies; 13+ messages in thread From: David Hildenbrand @ 2020-08-13 9:32 UTC (permalink / raw) To: Charan Teja Kalla, akpm, mhocko, vbabka, rientjes, linux-mm Cc: linux-kernel, vinmenon On 12.08.20 12:11, Charan Teja Kalla wrote: > > > On 8/12/2020 3:30 PM, David Hildenbrand wrote: >> On 12.08.20 11:46, Charan Teja Kalla wrote: >>> >>> Thanks David for the inputs. >>> >>> On 8/12/2020 2:35 AM, David Hildenbrand wrote: >>>> On 11.08.20 14:58, Charan Teja Reddy wrote: >>>>> The following race is observed with the repeated online, offline and a >>>>> delay between two successive online of memory blocks of movable zone. >>>>> >>>>> P1 P2 >>>>> >>>>> Online the first memory block in >>>>> the movable zone. The pcp struct >>>>> values are initialized to default >>>>> values,i.e., pcp->high = 0 & >>>>> pcp->batch = 1. >>>>> >>>>> Allocate the pages from the >>>>> movable zone. >>>>> >>>>> Try to Online the second memory >>>>> block in the movable zone thus it >>>>> entered the online_pages() but yet >>>>> to call zone_pcp_update(). >>>>> This process is entered into >>>>> the exit path thus it tries >>>>> to release the order-0 pages >>>>> to pcp lists through >>>>> free_unref_page_commit(). >>>>> As pcp->high = 0, pcp->count = 1 >>>>> proceed to call the function >>>>> free_pcppages_bulk(). >>>>> Update the pcp values thus the >>>>> new pcp values are like, say, >>>>> pcp->high = 378, pcp->batch = 63. >>>>> Read the pcp's batch value using >>>>> READ_ONCE() and pass the same to >>>>> free_pcppages_bulk(), pcp values >>>>> passed here are, batch = 63, >>>>> count = 1. >>>>> >>>>> Since num of pages in the pcp >>>>> lists are less than ->batch, >>>>> then it will stuck in >>>>> while(list_empty(list)) loop >>>>> with interrupts disabled thus >>>>> a core hung. 
>>>>> >>>>> Avoid this by ensuring free_pcppages_bulk() is called with proper count >>>>> of pcp list pages. >>>>> >>>>> The mentioned race is some what easily reproducible without [1] because >>>>> pcp's are not updated for the first memory block online and thus there >>>>> is a enough race window for P2 between alloc+free and pcp struct values >>>>> update through onlining of second memory block. >>>>> >>>>> With [1], the race is still exists but it is very much narrow as we >>>>> update the pcp struct values for the first memory block online itself. >>>>> >>>>> [1]: https://patchwork.kernel.org/patch/11696389/ >>>>> >>>> >>>> IIUC, this is not limited to the movable zone, it could also happen in >>>> corner cases with the normal zone (e.g., hotplug to a node that only has >>>> DMA memory, or no other memory yet). >>> >>> Yes, this is my understanding too. I explained the above race in terms >>> of just movable zone for which it is observed. We can add the below line >>> in the end in patch commit message: >>> "This is not limited to the movable zone, it could also happen in cases >>> with the normal zone (e.g., hotplug to a node that only has DMA memory, >>> or no other memory yet)." >> >> Yeah, that makes sense! >> >>> >>> Just curious, there exists such systems where just a dma zone present >>> and we hot add the normal zone? I am not aware such thing in the >>> embedded world. >> >> You can easily create such setups using QEMU. >> >> IIRC, just specify a QEMU guest with 2G initial memory and a single NUMA >> node, or 4G initial memory and two NUMA nodes. Then hotplug memory. >> >> (IIRC kata containers always start a VM with 2G and then hotplug memory) >> > I see. Thanks for letting me know this. 
> >>>> >>>>> Signed-off-by: Charan Teja Reddy <charante@codeaurora.org> >>>>> --- >>>>> >>>>> v1: https://patchwork.kernel.org/patch/11707637/ >>>>> >>>>> mm/page_alloc.c | 5 +++++ >>>>> 1 file changed, 5 insertions(+) >>>>> >>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >>>>> index e4896e6..839039f 100644 >>>>> --- a/mm/page_alloc.c >>>>> +++ b/mm/page_alloc.c >>>>> @@ -1304,6 +1304,11 @@ static void free_pcppages_bulk(struct zone *zone, int count, >>>>> struct page *page, *tmp; >>>>> LIST_HEAD(head); >>>>> >>>>> + /* >>>>> + * Ensure proper count is passed which otherwise would stuck in the >>>>> + * below while (list_empty(list)) loop. >>>>> + */ >>>>> + count = min(pcp->count, count); >>>>> while (count) { >>>>> struct list_head *list; >>>>> >>>>> >>>> >>>> Fixes: and Cc: stable... tags? >>> >>> Fixes: 5f8dcc21211a ("page-allocator: split per-cpu list into >>> one-list-per-migrate-type") >>> Cc: <stable@vger.kernel.org> [2.6+] >> >> Did we have memory hotplug support then already? > > Yes, it exist. > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/mm/memory_hotplug.c?h=v2.6.39 Okay, so I guess these tags make sense. > >> >>> >>> I am not sure If I should have to raise V3 including these? >> >> >> Maybe Andrew can fixup when applying. > > Okay, let Andrew decide on this. Meanwhile If you find that this patch > looks correct, ACK from you helps here. Sure, I think this is good enough as a simple fix. Acked-by: David Hildenbrand <david@redhat.com> -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH V2] mm, page_alloc: fix core hung in free_pcppages_bulk() 2020-08-12 9:46 ` Charan Teja Kalla @ 2020-08-12 18:53 ` David Rientjes 2020-08-12 18:53 ` David Rientjes 1 sibling, 0 replies; 13+ messages in thread From: David Rientjes @ 2020-08-12 18:53 UTC (permalink / raw) To: Charan Teja Kalla Cc: David Hildenbrand, akpm, mhocko, vbabka, linux-mm, linux-kernel, vinmenon On Wed, 12 Aug 2020, Charan Teja Kalla wrote: > >> Signed-off-by: Charan Teja Reddy <charante@codeaurora.org> > >> --- > >> > >> v1: https://patchwork.kernel.org/patch/11707637/ > >> > >> mm/page_alloc.c | 5 +++++ > >> 1 file changed, 5 insertions(+) > >> > >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c > >> index e4896e6..839039f 100644 > >> --- a/mm/page_alloc.c > >> +++ b/mm/page_alloc.c > >> @@ -1304,6 +1304,11 @@ static void free_pcppages_bulk(struct zone *zone, int count, > >> struct page *page, *tmp; > >> LIST_HEAD(head); > >> > >> + /* > >> + * Ensure proper count is passed which otherwise would stuck in the > >> + * below while (list_empty(list)) loop. > >> + */ > >> + count = min(pcp->count, count); > >> while (count) { > >> struct list_head *list; > >> > >> > > > > Fixes: and Cc: stable... tags? > > Fixes: 5f8dcc21211a ("page-allocator: split per-cpu list into > one-list-per-migrate-type") > Cc: <stable@vger.kernel.org> [2.6+] > Acked-by: David Rientjes <rientjes@google.com> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH V2] mm, page_alloc: fix core hung in free_pcppages_bulk() 2020-08-11 12:58 [PATCH V2] mm, page_alloc: fix core hung in free_pcppages_bulk() Charan Teja Reddy 2020-08-11 21:05 ` David Hildenbrand @ 2020-08-13 11:41 ` Michal Hocko 2020-08-13 16:21 ` Charan Teja Kalla 1 sibling, 1 reply; 13+ messages in thread From: Michal Hocko @ 2020-08-13 11:41 UTC (permalink / raw) To: Charan Teja Reddy Cc: akpm, vbabka, david, rientjes, linux-mm, linux-kernel, vinmenon

On Tue 11-08-20 18:28:23, Charan Teja Reddy wrote:
[...]
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e4896e6..839039f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1304,6 +1304,11 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> 	struct page *page, *tmp;
> 	LIST_HEAD(head);
>
> +	/*
> +	 * Ensure proper count is passed which otherwise would stuck in the
> +	 * below while (list_empty(list)) loop.
> +	 */
> +	count = min(pcp->count, count);
> 	while (count) {
> 		struct list_head *list;

How does this prevent the race actually? Don't we need something like the following instead?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e028b87ce294..45bcc7ba37c4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1317,9 +1317,16 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 		 * lists
 		 */
 		do {
+			bool looped = false;
+
 			batch_free++;
-			if (++migratetype == MIGRATE_PCPTYPES)
+			if (++migratetype == MIGRATE_PCPTYPES) {
+				if (looped)
+					goto free;
+
 				migratetype = 0;
+				looped = true;
+			}
 			list = &pcp->lists[migratetype];
 		} while (list_empty(list));
 
@@ -1352,6 +1359,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 		} while (--count && --batch_free && !list_empty(list));
 	}
 
+free:
 	spin_lock(&zone->lock);
 	isolated_pageblocks = has_isolate_pageblock(zone);

-- Michal Hocko SUSE Labs ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH V2] mm, page_alloc: fix core hung in free_pcppages_bulk() 2020-08-13 11:41 ` Michal Hocko @ 2020-08-13 16:21 ` Charan Teja Kalla 2020-08-13 16:30 ` Michal Hocko 0 siblings, 1 reply; 13+ messages in thread From: Charan Teja Kalla @ 2020-08-13 16:21 UTC (permalink / raw) To: Michal Hocko Cc: akpm, vbabka, david, rientjes, linux-mm, linux-kernel, vinmenon

Thanks Michal for the comments.

On 8/13/2020 5:11 PM, Michal Hocko wrote:
> On Tue 11-08-20 18:28:23, Charan Teja Reddy wrote:
> [...]
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index e4896e6..839039f 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1304,6 +1304,11 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>> struct page *page, *tmp;
>> LIST_HEAD(head);
>>
>> + /*
>> + * Ensure proper count is passed which otherwise would stuck in the
>> + * below while (list_empty(list)) loop.
>> + */
>> + count = min(pcp->count, count);
>> while (count) {
>> struct list_head *list;
>
> How does this prevent the race actually?

This doesn't prevent the race. It only fixes the core hang (as this is called with spin_lock_irq()) caused by the race condition. The core hang happens because an incorrect count value is passed to the free_pcppages_bulk() function. The actual race should be fixed by David's suggestion (isolate, online and undo isolation).

Does something need to be improved in the commit message? Maybe like below:
s/The following race is observed with the repeated .../CPU core hang is observed as a result of a race with the use case of repeated.../

> Don't we need something like the following instead?
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e028b87ce294..45bcc7ba37c4 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1317,9 +1317,16 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> * lists
> */
> do {
> + bool looped = false;

IIUC, this looped will always be initialized to false, thus it never jumps to free.
But I think I got your idea: loop over the pcp lists looking for any pages, and if none are found even after all MIGRATE_PCPTYPES lists have been traversed, just break out of the loop. Is this check really required, though? Shouldn't pcp->count tell the same thing, i.e. whether any pages are left in the pcp lists?

> +
> batch_free++;
> - if (++migratetype == MIGRATE_PCPTYPES)
> + if (++migratetype == MIGRATE_PCPTYPES) {
> + if (looped)
> + goto free;
> +
> migratetype = 0;
> + looped = true;
> + }
> list = &pcp->lists[migratetype];
> } while (list_empty(list));
>
> @@ -1352,6 +1359,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> } while (--count && --batch_free && !list_empty(list));
> }
>
> +free:
> spin_lock(&zone->lock);
> isolated_pageblocks = has_isolate_pageblock(zone);
>

-- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH V2] mm, page_alloc: fix core hung in free_pcppages_bulk()
  2020-08-13 16:30 UTC
  From: Michal Hocko
  To: Charan Teja Kalla
  Cc: akpm, vbabka, david, rientjes, linux-mm, linux-kernel, vinmenon

On Thu 13-08-20 21:51:29, Charan Teja Kalla wrote:
> Thanks Michal for comments.
>
> On 8/13/2020 5:11 PM, Michal Hocko wrote:
> > On Tue 11-08-20 18:28:23, Charan Teja Reddy wrote:
> > [...]
> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >> index e4896e6..839039f 100644
> >> --- a/mm/page_alloc.c
> >> +++ b/mm/page_alloc.c
> >> @@ -1304,6 +1304,11 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> >>  	struct page *page, *tmp;
> >>  	LIST_HEAD(head);
> >>
> >> +	/*
> >> +	 * Ensure proper count is passed which otherwise would stuck in the
> >> +	 * below while (list_empty(list)) loop.
> >> +	 */
> >> +	count = min(pcp->count, count);
> >>  	while (count) {
> >>  		struct list_head *list;
> >
> > How does this prevent the race actually?
>
> This doesn't prevent the race. This only fixes the core hang (as this
> is called with spin_lock_irq()) caused by the race condition. This
> core hang is because an incorrect count value is passed to the
> free_pcppages_bulk() function.

Let me ask differently. What does enforce that the count and lists do
not get out of sync in the loop? Your changelog says that the fix is to
use the proper value, without any specifics.

--
Michal Hocko
SUSE Labs
* Re: [PATCH V2] mm, page_alloc: fix core hung in free_pcppages_bulk()
  2020-08-13 17:27 UTC
  From: Charan Teja Kalla
  To: Michal Hocko
  Cc: akpm, vbabka, david, rientjes, linux-mm, linux-kernel, vinmenon

Thanks Michal.

On 8/13/2020 10:00 PM, Michal Hocko wrote:
> On Thu 13-08-20 21:51:29, Charan Teja Kalla wrote:
> > On 8/13/2020 5:11 PM, Michal Hocko wrote:
> > > On Tue 11-08-20 18:28:23, Charan Teja Reddy wrote:
> > > [...]
> > >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > >> index e4896e6..839039f 100644
> > >> --- a/mm/page_alloc.c
> > >> +++ b/mm/page_alloc.c
> > >> @@ -1304,6 +1304,11 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> > >>  	struct page *page, *tmp;
> > >>  	LIST_HEAD(head);
> > >>
> > >> +	/*
> > >> +	 * Ensure proper count is passed which otherwise would stuck in the
> > >> +	 * below while (list_empty(list)) loop.
> > >> +	 */
> > >> +	count = min(pcp->count, count);
> > >>  	while (count) {
> > >>  		struct list_head *list;
> > >
> > > How does this prevent the race actually?
> >
> > This doesn't prevent the race. This only fixes the core hang (as
> > this is called with spin_lock_irq()) caused by the race condition.
> > This core hang is because an incorrect count value is passed to the
> > free_pcppages_bulk() function.
>
> Let me ask differently. What does enforce that the count and lists do
> not get out of sync in the loop.

The count value is updated whenever an order-0 page is added to the pcp
lists through free_unref_page_commit(), which is called with both
interrupts and preemption disabled:

static void free_unref_page_commit(struct page *page, ...)
{
	....
	list_add(&page->lru, &pcp->lists[migratetype]);
	pcp->count++;
}

As these are pcp lists, they only get touched by another process when
this process is context switched, which happens only after enabling
preemption or interrupts. So, as long as a process is operating on
these pcp lists in free_unref_page_commit(), the count and lists are
always in sync.

However, the problem here is not that the count and lists get out of
sync; as explained above, they are always in sync. The problem is
asking free_pcppages_bulk() to free more pages than are present in the
pcp lists, which ends up spinning in while (list_empty()).

> Your changelog says that the fix is to use the proper value, without
> any specifics.

Will change this to: Ensure the count value passed is not greater than
the pcp lists count. Anything better you suggest?

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum, a Linux Foundation Collaborative Project
* Re: [PATCH V2] mm, page_alloc: fix core hung in free_pcppages_bulk()
  2020-08-14  6:39 UTC
  From: Michal Hocko
  To: Charan Teja Kalla
  Cc: akpm, vbabka, david, rientjes, linux-mm, linux-kernel, vinmenon

On Thu 13-08-20 22:57:32, Charan Teja Kalla wrote:
> Thanks Michal.
>
> On 8/13/2020 10:00 PM, Michal Hocko wrote:
> > On Thu 13-08-20 21:51:29, Charan Teja Kalla wrote:
> > [...]
> > Let me ask differently. What does enforce that the count and lists
> > do not get out of sync in the loop.
>
> The count value is updated whenever an order-0 page is added to the
> pcp lists through free_unref_page_commit(), which is called with both
> interrupts and preemption disabled:
>
> static void free_unref_page_commit(struct page *page, ...)
> {
> 	....
> 	list_add(&page->lru, &pcp->lists[migratetype]);
> 	pcp->count++;
> }
>
> As these are pcp lists, they only get touched by another process when
> this process is context switched, which happens only after enabling
> preemption or interrupts. So, as long as a process is operating on
> these pcp lists in free_unref_page_commit(), the count and lists are
> always in sync.
>
> However, the problem here is not that the count and lists get out of
> sync; as explained above, they are always in sync. The problem is
> asking free_pcppages_bulk() to free more pages than are present in
> the pcp lists, which ends up spinning in while (list_empty()).

You are right. I managed to confuse myself. The thing is that the batch
count is out of sync.

> > Your changelog says that the fix is to use the proper value,
> > without any specifics.
>
> Will change this to: Ensure the count value passed is not greater
> than the pcp lists count. Anything better you suggest?

Yes, this makes it more clear. Feel free to add

Acked-by: Michal Hocko <mhocko@suse.com>

--
Michal Hocko
SUSE Labs
end of thread, other threads: [~2020-08-14 6:39 UTC | newest]

Thread overview: 13+ messages
2020-08-11 12:58 [PATCH V2] mm, page_alloc: fix core hung in free_pcppages_bulk() Charan Teja Reddy
2020-08-11 21:05 ` David Hildenbrand
2020-08-12  9:46 ` Charan Teja Kalla
2020-08-12 10:00 ` David Hildenbrand
2020-08-12 10:11 ` Charan Teja Kalla
2020-08-13  9:32 ` David Hildenbrand
2020-08-12 18:53 ` David Rientjes
2020-08-13 11:41 ` Michal Hocko
2020-08-13 16:21 ` Charan Teja Kalla
2020-08-13 16:30 ` Michal Hocko
2020-08-13 17:27 ` Charan Teja Kalla
2020-08-14  6:39 ` Michal Hocko