linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs
@ 2020-08-25  4:59 js1304
  2020-08-25  5:10 ` Andrew Morton
  2020-08-25  9:43 ` Vlastimil Babka
  0 siblings, 2 replies; 13+ messages in thread
From: js1304 @ 2020-08-25  4:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Michal Hocko, Vlastimil Babka,
	Aneesh Kumar K . V, kernel-team, Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

memalloc_nocma_{save/restore} APIs can be used to skip page allocation
on CMA area, but, there is a missing case and the page on CMA area could
be allocated even if APIs are used. This patch handles this case to fix
the potential issue.

Missing case is an allocation from the pcplist. MIGRATE_MOVABLE pcplist
could have the pages on CMA area so we need to skip it if ALLOC_CMA isn't
specified.

This patch implements this behaviour by checking allocated page from
the pcplist rather than skipping an allocation from the pcplist entirely.
Skipping the pcplist entirely would result in a mismatch between watermark
check and actual page allocation. And, it requires to break current code
layering that order-0 page is always handled by the pcplist. I'd prefer
to avoid it so this patch uses different way to skip CMA page allocation
from the pcplist.

Fixes: 8510e69c8efe (mm/page_alloc: fix memalloc_nocma_{save/restore} APIs)
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/page_alloc.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0e2bab4..c4abf58 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3341,6 +3341,22 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
 	pcp = &this_cpu_ptr(zone->pageset)->pcp;
 	list = &pcp->lists[migratetype];
 	page = __rmqueue_pcplist(zone,  migratetype, alloc_flags, pcp, list);
+#ifdef CONFIG_CMA
+	if (page) {
+		int mt = get_pcppage_migratetype(page);
+
+		/*
+		 * pcp could have the pages on CMA area and we need to skip it
+		 * when !ALLOC_CMA. Free all pcplist and retry allocation.
+		 */
+		if (is_migrate_cma(mt) && !(alloc_flags & ALLOC_CMA)) {
+			list_add(&page->lru, &pcp->lists[migratetype]);
+			pcp->count++;
+			free_pcppages_bulk(zone, pcp->count, pcp);
+			page = __rmqueue_pcplist(zone, migratetype, alloc_flags, pcp, list);
+		}
+	}
+#endif
 	if (page) {
 		__count_zid_vm_events(PGALLOC, page_zonenum(page), 1);
 		zone_statistics(preferred_zone, zone);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs
  2020-08-25  4:59 [PATCH for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs js1304
@ 2020-08-25  5:10 ` Andrew Morton
  2020-08-25  5:34   ` Joonsoo Kim
  2020-08-25  9:43 ` Vlastimil Babka
  1 sibling, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2020-08-25  5:10 UTC (permalink / raw)
  To: js1304
  Cc: linux-mm, linux-kernel, Michal Hocko, Vlastimil Babka,
	Aneesh Kumar K . V, kernel-team, Joonsoo Kim

On Tue, 25 Aug 2020 13:59:42 +0900 js1304@gmail.com wrote:

> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> 
> memalloc_nocma_{save/restore} APIs can be used to skip page allocation
> on CMA area, but, there is a missing case and the page on CMA area could
> be allocated even if APIs are used. This patch handles this case to fix
> the potential issue.
> 
> Missing case is an allocation from the pcplist. MIGRATE_MOVABLE pcplist
> could have the pages on CMA area so we need to skip it if ALLOC_CMA isn't
> specified.
> 
> This patch implements this behaviour by checking allocated page from
> the pcplist rather than skipping an allocation from the pcplist entirely.
> Skipping the pcplist entirely would result in a mismatch between watermark
> check and actual page allocation. And, it requires to break current code
> layering that order-0 page is always handled by the pcplist. I'd prefer
> to avoid it so this patch uses different way to skip CMA page allocation
> from the pcplist.
> 
> ...
>
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3341,6 +3341,22 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
>  	pcp = &this_cpu_ptr(zone->pageset)->pcp;
>  	list = &pcp->lists[migratetype];
>  	page = __rmqueue_pcplist(zone,  migratetype, alloc_flags, pcp, list);
> +#ifdef CONFIG_CMA
> +	if (page) {
> +		int mt = get_pcppage_migratetype(page);
> +
> +		/*
> +		 * pcp could have the pages on CMA area and we need to skip it
> +		 * when !ALLOC_CMA. Free all pcplist and retry allocation.
> +		 */
> +		if (is_migrate_cma(mt) && !(alloc_flags & ALLOC_CMA)) {
> +			list_add(&page->lru, &pcp->lists[migratetype]);
> +			pcp->count++;
> +			free_pcppages_bulk(zone, pcp->count, pcp);
> +			page = __rmqueue_pcplist(zone, migratetype, alloc_flags, pcp, list);
> +		}
> +	}
> +#endif
>  	if (page) {
>  		__count_zid_vm_events(PGALLOC, page_zonenum(page), 1);
>  		zone_statistics(preferred_zone, zone);

That's a bunch more code on a very hot path to serve an obscure feature
which has a single obscure callsite.

Can we instead put the burden on that callsite rather than upon
everyone?  For (dumb) example, teach __gup_longterm_locked() to put the
page back if it's CMA and go get another one?



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs
  2020-08-25  5:10 ` Andrew Morton
@ 2020-08-25  5:34   ` Joonsoo Kim
  2020-08-26  0:42     ` Andrew Morton
  0 siblings, 1 reply; 13+ messages in thread
From: Joonsoo Kim @ 2020-08-25  5:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux Memory Management List, LKML, Michal Hocko,
	Vlastimil Babka, Aneesh Kumar K . V, kernel-team, Joonsoo Kim

2020년 8월 25일 (화) 오후 2:10, Andrew Morton <akpm@linux-foundation.org>님이 작성:
>
> On Tue, 25 Aug 2020 13:59:42 +0900 js1304@gmail.com wrote:
>
> > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >
> > memalloc_nocma_{save/restore} APIs can be used to skip page allocation
> > on CMA area, but, there is a missing case and the page on CMA area could
> > be allocated even if APIs are used. This patch handles this case to fix
> > the potential issue.
> >
> > Missing case is an allocation from the pcplist. MIGRATE_MOVABLE pcplist
> > could have the pages on CMA area so we need to skip it if ALLOC_CMA isn't
> > specified.
> >
> > This patch implements this behaviour by checking allocated page from
> > the pcplist rather than skipping an allocation from the pcplist entirely.
> > Skipping the pcplist entirely would result in a mismatch between watermark
> > check and actual page allocation. And, it requires to break current code
> > layering that order-0 page is always handled by the pcplist. I'd prefer
> > to avoid it so this patch uses different way to skip CMA page allocation
> > from the pcplist.
> >
> > ...
> >
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -3341,6 +3341,22 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
> >       pcp = &this_cpu_ptr(zone->pageset)->pcp;
> >       list = &pcp->lists[migratetype];
> >       page = __rmqueue_pcplist(zone,  migratetype, alloc_flags, pcp, list);
> > +#ifdef CONFIG_CMA
> > +     if (page) {
> > +             int mt = get_pcppage_migratetype(page);
> > +
> > +             /*
> > +              * pcp could have the pages on CMA area and we need to skip it
> > +              * when !ALLOC_CMA. Free all pcplist and retry allocation.
> > +              */
> > +             if (is_migrate_cma(mt) && !(alloc_flags & ALLOC_CMA)) {
> > +                     list_add(&page->lru, &pcp->lists[migratetype]);
> > +                     pcp->count++;
> > +                     free_pcppages_bulk(zone, pcp->count, pcp);
> > +                     page = __rmqueue_pcplist(zone, migratetype, alloc_flags, pcp, list);
> > +             }
> > +     }
> > +#endif
> >       if (page) {
> >               __count_zid_vm_events(PGALLOC, page_zonenum(page), 1);
> >               zone_statistics(preferred_zone, zone);
>
> That's a bunch more code on a very hot path to serve an obscure feature
> which has a single obscure callsite.
>
> Can we instead put the burden on that callsite rather than upon
> everyone?  For (dumb) example, teach __gup_longterm_locked() to put the
> page back if it's CMA and go get another one?

Hmm... Unfortunately, it cannot ensure that we eventually get the non-CMA page.
I think that the only way to ensure it is to implement the
functionality here. We can
use 'unlikely' or 'static branch' to reduce the overhead for a really
rare case but
for now I have no idea how to completely remove the overhead.

Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs
  2020-08-25  4:59 [PATCH for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs js1304
  2020-08-25  5:10 ` Andrew Morton
@ 2020-08-25  9:43 ` Vlastimil Babka
  2020-08-26  5:12   ` Joonsoo Kim
  1 sibling, 1 reply; 13+ messages in thread
From: Vlastimil Babka @ 2020-08-25  9:43 UTC (permalink / raw)
  To: js1304, Andrew Morton
  Cc: linux-mm, linux-kernel, Michal Hocko, Aneesh Kumar K . V,
	kernel-team, Joonsoo Kim


On 8/25/20 6:59 AM, js1304@gmail.com wrote:
> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> 
> memalloc_nocma_{save/restore} APIs can be used to skip page allocation
> on CMA area, but, there is a missing case and the page on CMA area could
> be allocated even if APIs are used. This patch handles this case to fix
> the potential issue.
> 
> Missing case is an allocation from the pcplist. MIGRATE_MOVABLE pcplist
> could have the pages on CMA area so we need to skip it if ALLOC_CMA isn't
> specified.
> 
> This patch implements this behaviour by checking allocated page from
> the pcplist rather than skipping an allocation from the pcplist entirely.
> Skipping the pcplist entirely would result in a mismatch between watermark
> check and actual page allocation.

Are you sure? I think a mismatch exists already. Pages can be on the pcplist but
they are not considered as free in the watermark check. So passing watermark
check means there should be also pages on free lists. So skipping pcplists would
be safe, no?

> And, it requires to break current code
> layering that order-0 page is always handled by the pcplist. I'd prefer
> to avoid it so this patch uses different way to skip CMA page allocation
> from the pcplist.

Well it would be much simpler and won't affect most of allocations. Better than
flushing pcplists IMHO.

Something like this?
----8<----
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0e2bab486fea..15787ffb1708 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3361,7 +3361,10 @@ struct page *rmqueue(struct zone *preferred_zone,
        unsigned long flags;
        struct page *page;
 
-       if (likely(order == 0)) {
+       if (likely(order == 0) &&
+                       (!IS_ENABLED(CONFIG_CMA) ||
+                        migratetype != MIGRATE_MOVABLE ||
+                        alloc_flags & ALLOC_CMA)) {
                page = rmqueue_pcplist(preferred_zone, zone, gfp_flags,
                                        migratetype, alloc_flags);
                goto out;

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs
  2020-08-25  5:34   ` Joonsoo Kim
@ 2020-08-26  0:42     ` Andrew Morton
  2020-08-26  5:21       ` Joonsoo Kim
  0 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2020-08-26  0:42 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Linux Memory Management List, LKML, Michal Hocko,
	Vlastimil Babka, Aneesh Kumar K . V, kernel-team, Joonsoo Kim

On Tue, 25 Aug 2020 14:34:32 +0900 Joonsoo Kim <js1304@gmail.com> wrote:

> >
> > That's a bunch more code on a very hot path to serve an obscure feature
> > which has a single obscure callsite.
> >
> > Can we instead put the burden on that callsite rather than upon
> > everyone?  For (dumb) example, teach __gup_longterm_locked() to put the
> > page back if it's CMA and go get another one?
> 
> Hmm... Unfortunately, it cannot ensure that we eventually get the non-CMA page.
> I think that the only way to ensure it is to implement the
> functionality here. We can
> use 'unlikely' or 'static branch' to reduce the overhead for a really
> rare case but
> for now I have no idea how to completely remove the overhead.

Gee, there must be something?  Provide the gup code with a special
entry point which takes the page straight from __rmqueue() and bypasses
the pcp lists?


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs
  2020-08-25  9:43 ` Vlastimil Babka
@ 2020-08-26  5:12   ` Joonsoo Kim
  2020-08-27 12:15     ` Vlastimil Babka
  2020-08-27 13:35     ` Mel Gorman
  0 siblings, 2 replies; 13+ messages in thread
From: Joonsoo Kim @ 2020-08-26  5:12 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Linux Memory Management List, LKML, Michal Hocko,
	Aneesh Kumar K . V, kernel-team, Joonsoo Kim

2020년 8월 25일 (화) 오후 6:43, Vlastimil Babka <vbabka@suse.cz>님이 작성:
>
>
> On 8/25/20 6:59 AM, js1304@gmail.com wrote:
> > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >
> > memalloc_nocma_{save/restore} APIs can be used to skip page allocation
> > on CMA area, but, there is a missing case and the page on CMA area could
> > be allocated even if APIs are used. This patch handles this case to fix
> > the potential issue.
> >
> > Missing case is an allocation from the pcplist. MIGRATE_MOVABLE pcplist
> > could have the pages on CMA area so we need to skip it if ALLOC_CMA isn't
> > specified.
> >
> > This patch implements this behaviour by checking allocated page from
> > the pcplist rather than skipping an allocation from the pcplist entirely.
> > Skipping the pcplist entirely would result in a mismatch between watermark
> > check and actual page allocation.
>
> Are you sure? I think a mismatch exists already. Pages can be on the pcplist but
> they are not considered as free in the watermark check. So passing watermark
> check means there should be also pages on free lists. So skipping pcplists would
> be safe, no?

You are right.

> > And, it requires to break current code
> > layering that order-0 page is always handled by the pcplist. I'd prefer
> > to avoid it so this patch uses different way to skip CMA page allocation
> > from the pcplist.
>
> Well it would be much simpler and won't affect most of allocations. Better than
> flushing pcplists IMHO.

Hmm...Still, I'd prefer my approach. There are two reasons. First,
layering problem
mentioned above. In rmqueue(), there is a code for MIGRATE_HIGHATOMIC.
As the name shows, it's for high order atomic allocation. But, after
skipping pcplist
allocation as you suggested, we could get there with order 0 request.
We can also
change this code, but, I'd hope to maintain current layering. Second,
a performance
reason. After the flag for nocma is up, a burst of nocma allocation
could come. After
flushing the pcplist one times, we can use the free page on the
pcplist as usual until
the context is changed.

How about my reasoning?

Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs
  2020-08-26  0:42     ` Andrew Morton
@ 2020-08-26  5:21       ` Joonsoo Kim
  0 siblings, 0 replies; 13+ messages in thread
From: Joonsoo Kim @ 2020-08-26  5:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux Memory Management List, LKML, Michal Hocko,
	Vlastimil Babka, Aneesh Kumar K . V, kernel-team, Joonsoo Kim

2020년 8월 26일 (수) 오전 9:42, Andrew Morton <akpm@linux-foundation.org>님이 작성:
>
> On Tue, 25 Aug 2020 14:34:32 +0900 Joonsoo Kim <js1304@gmail.com> wrote:
>
> > >
> > > That's a bunch more code on a very hot path to serve an obscure feature
> > > which has a single obscure callsite.
> > >
> > > Can we instead put the burden on that callsite rather than upon
> > > everyone?  For (dumb) example, teach __gup_longterm_locked() to put the
> > > page back if it's CMA and go get another one?
> >
> > Hmm... Unfortunately, it cannot ensure that we eventually get the non-CMA page.
> > I think that the only way to ensure it is to implement the
> > functionality here. We can
> > use 'unlikely' or 'static branch' to reduce the overhead for a really
> > rare case but
> > for now I have no idea how to completely remove the overhead.
>
> Gee, there must be something?  Provide the gup code with a special
> entry point which takes the page straight from __rmqueue() and bypasses
> the pcp lists?

Hmm... it seems not possible. It's allocation context API and maybe actual
allocation happens in handle_mm_fault() or it's successor. We cannot use
a special entry point for allocation there since it's a general function.

And, IMHO, making a special allocation function that bypasses the pcp list
would not be a good practice.

Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs
  2020-08-26  5:12   ` Joonsoo Kim
@ 2020-08-27 12:15     ` Vlastimil Babka
  2020-08-27 13:35     ` Mel Gorman
  1 sibling, 0 replies; 13+ messages in thread
From: Vlastimil Babka @ 2020-08-27 12:15 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Linux Memory Management List, LKML, Michal Hocko,
	Aneesh Kumar K . V, kernel-team, Joonsoo Kim, Mel Gorman

On 8/26/20 7:12 AM, Joonsoo Kim wrote:
> 2020년 8월 25일 (화) 오후 6:43, Vlastimil Babka <vbabka@suse.cz>님이 작성:
>>
>>
>> On 8/25/20 6:59 AM, js1304@gmail.com wrote:
>> > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>> >
>> > memalloc_nocma_{save/restore} APIs can be used to skip page allocation
>> > on CMA area, but, there is a missing case and the page on CMA area could
>> > be allocated even if APIs are used. This patch handles this case to fix
>> > the potential issue.
>> >
>> > Missing case is an allocation from the pcplist. MIGRATE_MOVABLE pcplist
>> > could have the pages on CMA area so we need to skip it if ALLOC_CMA isn't
>> > specified.
>> >
>> > This patch implements this behaviour by checking allocated page from
>> > the pcplist rather than skipping an allocation from the pcplist entirely.
>> > Skipping the pcplist entirely would result in a mismatch between watermark
>> > check and actual page allocation.
>>
>> Are you sure? I think a mismatch exists already. Pages can be on the pcplist but
>> they are not considered as free in the watermark check. So passing watermark
>> check means there should be also pages on free lists. So skipping pcplists would
>> be safe, no?
> 
> You are right.
> 
>> > And, it requires to break current code
>> > layering that order-0 page is always handled by the pcplist. I'd prefer
>> > to avoid it so this patch uses different way to skip CMA page allocation
>> > from the pcplist.
>>
>> Well it would be much simpler and won't affect most of allocations. Better than
>> flushing pcplists IMHO.
> 
> Hmm...Still, I'd prefer my approach. There are two reasons. First,
> layering problem
> mentioned above. In rmqueue(), there is a code for MIGRATE_HIGHATOMIC.
> As the name shows, it's for high order atomic allocation. But, after
> skipping pcplist
> allocation as you suggested, we could get there with order 0 request.
> We can also
> change this code, but, I'd hope to maintain current layering. Second,
> a performance
> reason. After the flag for nocma is up, a burst of nocma allocation
> could come. After
> flushing the pcplist one times, we can use the free page on the
> pcplist as usual until
> the context is changed.

Both solutions are ugly and we should have CMA in ZONE_MOVABLE or get rid of it
completely. Let's CC Mel what he thinks.

> How about my reasoning?
> 
> Thanks.
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs
  2020-08-26  5:12   ` Joonsoo Kim
  2020-08-27 12:15     ` Vlastimil Babka
@ 2020-08-27 13:35     ` Mel Gorman
  2020-08-27 23:54       ` Joonsoo Kim
  1 sibling, 1 reply; 13+ messages in thread
From: Mel Gorman @ 2020-08-27 13:35 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Vlastimil Babka, Andrew Morton, Linux Memory Management List,
	LKML, Michal Hocko, Aneesh Kumar K . V, kernel-team, Joonsoo Kim

On Wed, Aug 26, 2020 at 02:12:44PM +0900, Joonsoo Kim wrote:
> > > And, it requires to break current code
> > > layering that order-0 page is always handled by the pcplist. I'd prefer
> > > to avoid it so this patch uses different way to skip CMA page allocation
> > > from the pcplist.
> >
> > Well it would be much simpler and won't affect most of allocations. Better than
> > flushing pcplists IMHO.
> 
> Hmm...Still, I'd prefer my approach.

I prefer the pcp bypass approach. It's simpler and it does not incur a
pcp drain/refill penalty. 

> There are two reasons. First,
> layering problem
> mentioned above. In rmqueue(), there is a code for MIGRATE_HIGHATOMIC.
> As the name shows, it's for high order atomic allocation. But, after
> skipping pcplist
> allocation as you suggested, we could get there with order 0 request.

I guess your concern is that under some circumstances that a request that
passes a watermark check could fail due to a highatomic reserve and to
an extent this is true. However, in that case the system is already low
on memory depending on the allocation context, the pcp lists may get
flushed anyway.

> We can also
> change this code, but, I'd hope to maintain current layering. Second,
> a performance
> reason. After the flag for nocma is up, a burst of nocma allocation
> could come. After
> flushing the pcplist one times, we can use the free page on the
> pcplist as usual until
> the context is changed.

It's not guaranteed because CMA pages could be freed between the nocma save
and restore triggering further drains due to a reschedule.  Similarly,
a CMA allocation in parallel could refill with CMA pages on the per-cpu
list. While both cases are unlikely, it's more unpredictable than a
straight-forward pcp bypass.

I don't really see it as a layering violation of the API because all
order-0 pages go through the PCP lists. The fact that order-0 is serviced
from the pcp list is an internal implementation detail, the API doesn't
care.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs
  2020-08-27 13:35     ` Mel Gorman
@ 2020-08-27 23:54       ` Joonsoo Kim
  2020-09-25  4:59         ` Joonsoo Kim
  0 siblings, 1 reply; 13+ messages in thread
From: Joonsoo Kim @ 2020-08-27 23:54 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Vlastimil Babka, Andrew Morton, Linux Memory Management List,
	LKML, Michal Hocko, Aneesh Kumar K . V, kernel-team, Joonsoo Kim

2020년 8월 27일 (목) 오후 10:35, Mel Gorman <mgorman@techsingularity.net>님이 작성:
>
> On Wed, Aug 26, 2020 at 02:12:44PM +0900, Joonsoo Kim wrote:
> > > > And, it requires to break current code
> > > > layering that order-0 page is always handled by the pcplist. I'd prefer
> > > > to avoid it so this patch uses different way to skip CMA page allocation
> > > > from the pcplist.
> > >
> > > Well it would be much simpler and won't affect most of allocations. Better than
> > > flushing pcplists IMHO.
> >
> > Hmm...Still, I'd prefer my approach.
>
> I prefer the pcp bypass approach. It's simpler and it does not incur a
> pcp drain/refill penalty.
>
> > There are two reasons. First,
> > layering problem
> > mentioned above. In rmqueue(), there is a code for MIGRATE_HIGHATOMIC.
> > As the name shows, it's for high order atomic allocation. But, after
> > skipping pcplist
> > allocation as you suggested, we could get there with order 0 request.
>
> I guess your concern is that under some circumstances that a request that
> passes a watermark check could fail due to a highatomic reserve and to
> an extent this is true. However, in that case the system is already low
> on memory depending on the allocation context, the pcp lists may get
> flushed anyway.

My concern is that non-highorder (order-0) allocation could pollute/use the
MIGRATE_HIGHATOMIC pageblock. It's reserved for highorder atomic
allocation so it's not good if an order-0 request could get there. It would
cause more fragmentation on that pageblock.

> > We can also
> > change this code, but, I'd hope to maintain current layering. Second,
> > a performance
> > reason. After the flag for nocma is up, a burst of nocma allocation
> > could come. After
> > flushing the pcplist one times, we can use the free page on the
> > pcplist as usual until
> > the context is changed.
>
> It's not guaranteed because CMA pages could be freed between the nocma save
> and restore triggering further drains due to a reschedule.  Similarly,
> a CMA allocation in parallel could refill with CMA pages on the per-cpu
> list. While both cases are unlikely, it's more unpredictable than a
> straight-forward pcp bypass.

Agreed that it's unpredictable than the pcp bypass. But, as you said,
those cases
would be rare.

> I don't really see it as a layering violation of the API because all
> order-0 pages go through the PCP lists. The fact that order-0 is serviced
> from the pcp list is an internal implementation detail, the API doesn't
> care.

What I mean is an internal implementation layering violation. We could make
a rule even in internal implementation to make code simpler and maintainable.
I guess that order-0 is serviced from the pcp list is one of those.

Anyway, although I prefer my approach, I'm okay with using pcp bypass.

Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs
  2020-08-27 23:54       ` Joonsoo Kim
@ 2020-09-25  4:59         ` Joonsoo Kim
  2020-09-25  8:55           ` Vlastimil Babka
  0 siblings, 1 reply; 13+ messages in thread
From: Joonsoo Kim @ 2020-09-25  4:59 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Vlastimil Babka, Andrew Morton, Linux Memory Management List,
	LKML, Michal Hocko, Aneesh Kumar K . V, kernel-team, Joonsoo Kim

2020년 8월 28일 (금) 오전 8:54, Joonsoo Kim <js1304@gmail.com>님이 작성:
>
> 2020년 8월 27일 (목) 오후 10:35, Mel Gorman <mgorman@techsingularity.net>님이 작성:
> >
> > On Wed, Aug 26, 2020 at 02:12:44PM +0900, Joonsoo Kim wrote:
> > > > > And, it requires to break current code
> > > > > layering that order-0 page is always handled by the pcplist. I'd prefer
> > > > > to avoid it so this patch uses different way to skip CMA page allocation
> > > > > from the pcplist.
> > > >
> > > > Well it would be much simpler and won't affect most of allocations. Better than
> > > > flushing pcplists IMHO.
> > >
> > > Hmm...Still, I'd prefer my approach.
> >
> > I prefer the pcp bypass approach. It's simpler and it does not incur a
> > pcp drain/refill penalty.
> >
> > > There are two reasons. First,
> > > layering problem
> > > mentioned above. In rmqueue(), there is a code for MIGRATE_HIGHATOMIC.
> > > As the name shows, it's for high order atomic allocation. But, after
> > > skipping pcplist
> > > allocation as you suggested, we could get there with order 0 request.
> >
> > I guess your concern is that under some circumstances that a request that
> > passes a watermark check could fail due to a highatomic reserve and to
> > an extent this is true. However, in that case the system is already low
> > on memory depending on the allocation context, the pcp lists may get
> > flushed anyway.
>
> My concern is that non-highorder (order-0) allocation could pollute/use the
> MIGRATE_HIGHATOMIC pageblock. It's reserved for highorder atomic
> allocation so it's not good if an order-0 request could get there. It would
> cause more fragmentation on that pageblock.
>
> > > We can also
> > > change this code, but, I'd hope to maintain current layering. Second,
> > > a performance
> > > reason. After the flag for nocma is up, a burst of nocma allocation
> > > could come. After
> > > flushing the pcplist one times, we can use the free page on the
> > > pcplist as usual until
> > > the context is changed.
> >
> > It's not guaranteed because CMA pages could be freed between the nocma save
> > and restore triggering further drains due to a reschedule.  Similarly,
> > a CMA allocation in parallel could refill with CMA pages on the per-cpu
> > list. While both cases are unlikely, it's more unpredictable than a
> > straight-forward pcp bypass.
>
> Agreed that it's unpredictable than the pcp bypass. But, as you said,
> those cases
> would be rare.
>
> > I don't really see it as a layering violation of the API because all
> > order-0 pages go through the PCP lists. The fact that order-0 is serviced
> > from the pcp list is an internal implementation detail, the API doesn't
> > care.
>
> What I mean is an internal implementation layering violation. We could make
> a rule even in internal implementation to make code simpler and maintainable.
> I guess that order-0 is serviced from the pcp list is one of those.
>
> Anyway, although I prefer my approach, I'm okay with using pcp bypass.

Hello, Andrew and Vlastimil.

It's better to fix this possible bug introduced in v5.9-rc1 before
v5.9 is released.
Which approach do you prefer?
If it is determined, I will immediately send a patch as you suggested.

Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs
  2020-09-25  4:59         ` Joonsoo Kim
@ 2020-09-25  8:55           ` Vlastimil Babka
  2020-09-25  8:58             ` Joonsoo Kim
  0 siblings, 1 reply; 13+ messages in thread
From: Vlastimil Babka @ 2020-09-25  8:55 UTC (permalink / raw)
  To: Joonsoo Kim, Mel Gorman
  Cc: Andrew Morton, Linux Memory Management List, LKML, Michal Hocko,
	Aneesh Kumar K . V, kernel-team, Joonsoo Kim

On 9/25/20 6:59 AM, Joonsoo Kim wrote:
> 2020년 8월 28일 (금) 오전 8:54, Joonsoo Kim <js1304@gmail.com>님이 작성:
> 
> Hello, Andrew and Vlastimil.
> 
> It's better to fix this possible bug introduced in v5.9-rc1 before
> v5.9 is released.
> Which approach do you prefer?
> If it is determined, I will immediately send a patch as you suggested.

Hmm both Mel and I preferred the bypass approach and nobody else weighted in, so
if you don't mind, you can use my suggestion. Hmm maybe alloc_flags & ALLOC_CMA
check should precede migratetype check in the if () to optimize for userspace
allocations?

Thanks,
Vlastimil

> Thanks.
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs
  2020-09-25  8:55           ` Vlastimil Babka
@ 2020-09-25  8:58             ` Joonsoo Kim
  0 siblings, 0 replies; 13+ messages in thread
From: Joonsoo Kim @ 2020-09-25  8:58 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Mel Gorman, Andrew Morton, Linux Memory Management List, LKML,
	Michal Hocko, Aneesh Kumar K . V, kernel-team, Joonsoo Kim

2020년 9월 25일 (금) 오후 5:55, Vlastimil Babka <vbabka@suse.cz>님이 작성:
>
> On 9/25/20 6:59 AM, Joonsoo Kim wrote:
> > 2020년 8월 28일 (금) 오전 8:54, Joonsoo Kim <js1304@gmail.com>님이 작성:
> >
> > Hello, Andrew and Vlastimil.
> >
> > It's better to fix this possible bug introduced in v5.9-rc1 before
> > v5.9 is released.
> > Which approach do you prefer?
> > If it is determined, I will immediately send a patch as you suggested.
>
> Hmm both Mel and I preferred the bypass approach and nobody else weighted in, so
> if you don't mind, you can use my suggestion. Hmm maybe alloc_flags & ALLOC_CMA
> check should precede migratetype check in the if () to optimize for userspace
> allocations?

Okay!
I will implement a bypass approach and send it early next week.

Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-09-25  8:58 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-25  4:59 [PATCH for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs js1304
2020-08-25  5:10 ` Andrew Morton
2020-08-25  5:34   ` Joonsoo Kim
2020-08-26  0:42     ` Andrew Morton
2020-08-26  5:21       ` Joonsoo Kim
2020-08-25  9:43 ` Vlastimil Babka
2020-08-26  5:12   ` Joonsoo Kim
2020-08-27 12:15     ` Vlastimil Babka
2020-08-27 13:35     ` Mel Gorman
2020-08-27 23:54       ` Joonsoo Kim
2020-09-25  4:59         ` Joonsoo Kim
2020-09-25  8:55           ` Vlastimil Babka
2020-09-25  8:58             ` Joonsoo Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).