* [PATCH] mm/page_alloc: Fix tracepoint mm_page_alloc_zone_locked()
From: Wonhyuk Yang @ 2022-05-11  8:12 UTC (permalink / raw)
  To: Mel Gorman, Steven Rostedt, Ingo Molnar, Andrew Morton
  Cc: Wonhyuk Yang, Baik Song An, Hong Yeon Kim, Taeung Song,
	linuxgeek, linux-kernel, linux-mm

Currently, the tracepoint mm_page_alloc_zone_locked() does not report
correct information.

First, when alloc_flags contains ALLOC_HARDER/ALLOC_CMA, a page can
be allocated from the MIGRATE_HIGHATOMIC/MIGRATE_CMA free list.
Nevertheless, the tracepoint reports the requested migration type, not
MIGRATE_HIGHATOMIC or MIGRATE_CMA.

Second, since commit 44042b4498728 ("mm/page_alloc: allow high-order
pages to be stored on the per-cpu lists"), the per-cpu lists can hold
high-order pages. But the tracepoint decides whether an allocation is a
refill of a per-cpu list by checking whether the requested order is 0.
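
For reference, the per-cpu eligibility check added by that commit looks
roughly like the sketch below (paraphrased, not copied from this exact
tree), which is why comparing the requested order against 0 is no longer
a reliable test:

  /*
   * Paraphrased sketch of the helper introduced by commit 44042b4498728;
   * exact details may differ in this tree.
   */
  static inline bool pcp_allowed_order(unsigned int order)
  {
  	if (order <= PAGE_ALLOC_COSTLY_ORDER)
  		return true;
  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
  	if (order == pageblock_order)
  		return true;
  #endif
  	return false;
  }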

To fix these problems, use the migration type cached by
get_pcppage_migratetype() instead of the requested migration type. Then,
make mm_page_alloc_zone_locked() be called from only two contexts
(rmqueue_bulk and rmqueue). With a new argument, percpu_refill, the
tracepoint can correctly report whether the allocation is a refill of a
per-cpu list.

Cc: Baik Song An <bsahn@etri.re.kr>
Cc: Hong Yeon Kim <kimhy@etri.re.kr>
Cc: Taeung Song <taeung@reallinux.co.kr>
Cc: linuxgeek@linuxgeek.io
Signed-off-by: Wonhyuk Yang <vvghjk1234@gmail.com>
---
 include/trace/events/kmem.h | 14 +++++++++-----
 mm/page_alloc.c             | 18 +++++++++---------
 2 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index ddc8c944f417..f89fb3afcd46 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -229,20 +229,23 @@ TRACE_EVENT(mm_page_alloc,
 
 DECLARE_EVENT_CLASS(mm_page,
 
-	TP_PROTO(struct page *page, unsigned int order, int migratetype),
+	TP_PROTO(struct page *page, unsigned int order, int migratetype,
+		 int percpu_refill),
 
-	TP_ARGS(page, order, migratetype),
+	TP_ARGS(page, order, migratetype, percpu_refill),
 
 	TP_STRUCT__entry(
 		__field(	unsigned long,	pfn		)
 		__field(	unsigned int,	order		)
 		__field(	int,		migratetype	)
+		__field(	int,		percpu_refill	)
 	),
 
 	TP_fast_assign(
 		__entry->pfn		= page ? page_to_pfn(page) : -1UL;
 		__entry->order		= order;
 		__entry->migratetype	= migratetype;
+		__entry->percpu_refill	= percpu_refill;
 	),
 
 	TP_printk("page=%p pfn=0x%lx order=%u migratetype=%d percpu_refill=%d",
@@ -250,14 +253,15 @@ DECLARE_EVENT_CLASS(mm_page,
 		__entry->pfn != -1UL ? __entry->pfn : 0,
 		__entry->order,
 		__entry->migratetype,
-		__entry->order == 0)
+		__entry->percpu_refill)
 );
 
 DEFINE_EVENT(mm_page, mm_page_alloc_zone_locked,
 
-	TP_PROTO(struct page *page, unsigned int order, int migratetype),
+	TP_PROTO(struct page *page, unsigned int order, int migratetype,
+		 int percpu_refill),
 
-	TP_ARGS(page, order, migratetype)
+	TP_ARGS(page, order, migratetype, percpu_refill)
 );
 
 TRACE_EVENT(mm_page_pcpu_drain,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0e42038382c1..0351808322ba 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2999,7 +2999,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
 		    zone_page_state(zone, NR_FREE_PAGES) / 2) {
 			page = __rmqueue_cma_fallback(zone, order);
 			if (page)
-				goto out;
+				return page;
 		}
 	}
 retry:
@@ -3012,9 +3012,6 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
 								alloc_flags))
 			goto retry;
 	}
-out:
-	if (page)
-		trace_mm_page_alloc_zone_locked(page, order, migratetype);
 	return page;
 }
 
@@ -3028,6 +3025,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 			int migratetype, unsigned int alloc_flags)
 {
 	int i, allocated = 0;
+	int mt;
 
 	/*
 	 * local_lock_irq held so equivalent to spin_lock_irqsave for
@@ -3055,7 +3053,9 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 		 */
 		list_add_tail(&page->lru, list);
 		allocated++;
-		if (is_migrate_cma(get_pcppage_migratetype(page)))
+		mt = get_pcppage_migratetype(page);
+		trace_mm_page_alloc_zone_locked(page, order, mt, true);
+		if (is_migrate_cma(mt))
 			__mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
 					      -(1 << order));
 	}
@@ -3704,6 +3704,7 @@ struct page *rmqueue(struct zone *preferred_zone,
 {
 	unsigned long flags;
 	struct page *page;
+	int mt;
 
 	if (likely(pcp_allowed_order(order))) {
 		/*
@@ -3735,16 +3736,15 @@ struct page *rmqueue(struct zone *preferred_zone,
 		 */
 		if (order > 0 && alloc_flags & ALLOC_HARDER) {
 			page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
-			if (page)
-				trace_mm_page_alloc_zone_locked(page, order, migratetype);
 		}
 		if (!page) {
 			page = __rmqueue(zone, order, migratetype, alloc_flags);
 			if (!page)
 				goto failed;
 		}
-		__mod_zone_freepage_state(zone, -(1 << order),
-					  get_pcppage_migratetype(page));
+		mt = get_pcppage_migratetype(page);
+		trace_mm_page_alloc_zone_locked(page, order, mt, false);
+		__mod_zone_freepage_state(zone, -(1 << order), mt);
 		spin_unlock_irqrestore(&zone->lock, flags);
 	} while (check_new_pages(page, order));
 
-- 
2.30.2



* Re: [PATCH] mm/page_alloc: Fix tracepoint mm_page_alloc_zone_locked()
From: Mel Gorman @ 2022-05-11 14:23 UTC (permalink / raw)
  To: Wonhyuk Yang
  Cc: Steven Rostedt, Ingo Molnar, Andrew Morton, Baik Song An,
	Hong Yeon Kim, Taeung Song, linuxgeek, linux-kernel, linux-mm

On Wed, May 11, 2022 at 05:12:07PM +0900, Wonhyuk Yang wrote:
> Currently, the tracepoint mm_page_alloc_zone_locked() does not report
> correct information.
> 
> First, when alloc_flags contains ALLOC_HARDER/ALLOC_CMA, a page can
> be allocated from the MIGRATE_HIGHATOMIC/MIGRATE_CMA free list.
> Nevertheless, the tracepoint reports the requested migration type, not
> MIGRATE_HIGHATOMIC or MIGRATE_CMA.
> 
> Second, since commit 44042b4498728 ("mm/page_alloc: allow high-order
> pages to be stored on the per-cpu lists"), the per-cpu lists can hold
> high-order pages. But the tracepoint decides whether an allocation is a
> refill of a per-cpu list by checking whether the requested order is 0.
> 
> To fix these problems, use the migration type cached by
> get_pcppage_migratetype() instead of the requested migration type. Then,
> make mm_page_alloc_zone_locked() be called from only two contexts
> (rmqueue_bulk and rmqueue). With a new argument, percpu_refill, the
> tracepoint can correctly report whether the allocation is a refill of a
> per-cpu list.
> 

You're definitely right that the current tracepoint is broken.

I got momentarily confused because HIGHATOMIC and CMA are not stored on
PCP lists even though they are a pageblock migrate type. Superficially
calling get_pcppage_migratetype on a page that cannot be a PCP page
seems silly but in the context of this patch, it happens to work because
it was isolated with __rmqueue_smallest which sets the PCP type even if
the page is not going to a PCP list.
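
For anyone reading along, the PCP type cache in question is just the value
stashed by set_pcppage_migratetype(); roughly this (paraphrased sketch,
exact details may vary by tree):

  /*
   * Paraphrased sketch: the cached migratetype lives in page->index and is
   * set when __rmqueue_smallest() isolates a page, whether or not that page
   * ends up on a per-cpu list.
   */
  static inline int get_pcppage_migratetype(struct page *page)
  {
  	return page->index;
  }

  static inline void set_pcppage_migratetype(struct page *page, int migratetype)
  {
  	page->index = migratetype;
  }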

The original intent of that tracepoint was to trace when pages were
removed from the buddy list. That would suggest this untested patch on
top of yours as a simplification;

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0351808322ba..66a70b898130 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2476,6 +2476,8 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
 		del_page_from_free_list(page, zone, current_order);
 		expand(zone, page, order, current_order, migratetype);
 		set_pcppage_migratetype(page, migratetype);
+		trace_mm_page_alloc_zone_locked(page, order, migratetype,
+			pcp_allowed_order(order) && migratetype < MIGRATE_PCPTYPES);
 		return page;
 	}
 
@@ -3025,7 +3027,6 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 			int migratetype, unsigned int alloc_flags)
 {
 	int i, allocated = 0;
-	int mt;
 
 	/*
 	 * local_lock_irq held so equivalent to spin_lock_irqsave for
@@ -3053,9 +3054,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 		 */
 		list_add_tail(&page->lru, list);
 		allocated++;
-		mt = get_pcppage_migratetype(page);
-		trace_mm_page_alloc_zone_locked(page, order, mt, true);
-		if (is_migrate_cma(mt))
+		if (is_migrate_cma(get_pcppage_migratetype(page)))
 			__mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
 					      -(1 << order));
 	}
@@ -3704,7 +3703,6 @@ struct page *rmqueue(struct zone *preferred_zone,
 {
 	unsigned long flags;
 	struct page *page;
-	int mt;
 
 	if (likely(pcp_allowed_order(order))) {
 		/*
@@ -3734,17 +3732,15 @@ struct page *rmqueue(struct zone *preferred_zone,
 		 * reserved for high-order atomic allocation, so order-0
 		 * request should skip it.
 		 */
-		if (order > 0 && alloc_flags & ALLOC_HARDER) {
+		if (order > 0 && alloc_flags & ALLOC_HARDER)
 			page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
-		}
 		if (!page) {
 			page = __rmqueue(zone, order, migratetype, alloc_flags);
 			if (!page)
 				goto failed;
 		}
-		mt = get_pcppage_migratetype(page);
-		trace_mm_page_alloc_zone_locked(page, order, mt, false);
-		__mod_zone_freepage_state(zone, -(1 << order), mt);
+		__mod_zone_freepage_state(zone, -(1 << order),
+					  get_pcppage_migratetype(page));
 		spin_unlock_irqrestore(&zone->lock, flags);
 	} while (check_new_pages(page, order));
 


* Re: [PATCH] mm/page_alloc: Fix tracepoint mm_page_alloc_zone_locked()
From: Wonhyuk Yang @ 2022-05-11 15:02 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Steven Rostedt, Ingo Molnar, Andrew Morton, Baik Song An,
	Hong Yeon Kim, Taeung Song, linuxgeek, linux-kernel, linux-mm

On Wed, May 11, 2022 at 11:23 PM Mel Gorman <mgorman@suse.de> wrote:
>
> On Wed, May 11, 2022 at 05:12:07PM +0900, Wonhyuk Yang wrote:
> > Currently, the tracepoint mm_page_alloc_zone_locked() does not report
> > correct information.
> >
> > First, when alloc_flags contains ALLOC_HARDER/ALLOC_CMA, a page can
> > be allocated from the MIGRATE_HIGHATOMIC/MIGRATE_CMA free list.
> > Nevertheless, the tracepoint reports the requested migration type, not
> > MIGRATE_HIGHATOMIC or MIGRATE_CMA.
> >
> > Second, since commit 44042b4498728 ("mm/page_alloc: allow high-order
> > pages to be stored on the per-cpu lists"), the per-cpu lists can hold
> > high-order pages. But the tracepoint decides whether an allocation is a
> > refill of a per-cpu list by checking whether the requested order is 0.
> >
> > To fix these problems, use the migration type cached by
> > get_pcppage_migratetype() instead of the requested migration type. Then,
> > make mm_page_alloc_zone_locked() be called from only two contexts
> > (rmqueue_bulk and rmqueue). With a new argument, percpu_refill, the
> > tracepoint can correctly report whether the allocation is a refill of a
> > per-cpu list.
> >
>
> You're definitely right that the current tracepoint is broken.
>
> I got momentarily confused because HIGHATOMIC and CMA are not stored on
> PCP lists even though they are a pageblock migrate type. Superficially
> calling get_pcppage_migratetype on a page that cannot be a PCP page
> seems silly but in the context of this patch, it happens to work because
> it was isolated with __rmqueue_smallest which sets the PCP type even if
> the page is not going to a PCP list.

Yes, I agree that calling get_pcppage_migratetype looks quite confusing.

> The original intent of that tracepoint was to trace when pages were
> removed from the buddy list. That would suggest this untested patch on
> top of yours as a simplification;
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 0351808322ba..66a70b898130 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2476,6 +2476,8 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
>                 del_page_from_free_list(page, zone, current_order);
>                 expand(zone, page, order, current_order, migratetype);
>                 set_pcppage_migratetype(page, migratetype);
> +               trace_mm_page_alloc_zone_locked(page, order, migratetype,
> +                       pcp_allowed_order(order) && migratetype < MIGRATE_PCPTYPES);
>                 return page;
>         }

Interestingly, my first approach was quite similar to your suggestion. But I
noticed that there can be a request whose migration type is MOVABLE and
whose alloc_flags doesn't have ALLOC_CMA. In that case, the page is marked
as a percpu refill even though it was allocated from the buddy list directly.
Is there no problem if we just ignore this case?


> @@ -3025,7 +3027,6 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
>                         int migratetype, unsigned int alloc_flags)
>  {
>         int i, allocated = 0;
> -       int mt;
>
>         /*
>          * local_lock_irq held so equivalent to spin_lock_irqsave for
> @@ -3053,9 +3054,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
>                  */
>                 list_add_tail(&page->lru, list);
>                 allocated++;
> -               mt = get_pcppage_migratetype(page);
> -               trace_mm_page_alloc_zone_locked(page, order, mt, true);
> -               if (is_migrate_cma(mt))
> +               if (is_migrate_cma(get_pcppage_migratetype(page)))
>                         __mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
>                                               -(1 << order));
>         }
> @@ -3704,7 +3703,6 @@ struct page *rmqueue(struct zone *preferred_zone,
>  {
>         unsigned long flags;
>         struct page *page;
> -       int mt;
>
>         if (likely(pcp_allowed_order(order))) {
>                 /*
> @@ -3734,17 +3732,15 @@ struct page *rmqueue(struct zone *preferred_zone,
>                  * reserved for high-order atomic allocation, so order-0
>                  * request should skip it.
>                  */
> -               if (order > 0 && alloc_flags & ALLOC_HARDER) {
> +               if (order > 0 && alloc_flags & ALLOC_HARDER)
>                         page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
> -               }
>                 if (!page) {
>                         page = __rmqueue(zone, order, migratetype, alloc_flags);
>                         if (!page)
>                                 goto failed;
>                 }
> -               mt = get_pcppage_migratetype(page);
> -               trace_mm_page_alloc_zone_locked(page, order, mt, false);
> -               __mod_zone_freepage_state(zone, -(1 << order), mt);
> +               __mod_zone_freepage_state(zone, -(1 << order),
> +                                         get_pcppage_migratetype(page));
>                 spin_unlock_irqrestore(&zone->lock, flags);
>         } while (check_new_pages(page, order));
>


* Re: [PATCH] mm/page_alloc: Fix tracepoint mm_page_alloc_zone_locked()
From: Mel Gorman @ 2022-05-11 15:47 UTC (permalink / raw)
  To: Wonhyuk Yang
  Cc: Steven Rostedt, Ingo Molnar, Andrew Morton, Baik Song An,
	Hong Yeon Kim, Taeung Song, linuxgeek, linux-kernel, linux-mm

On Thu, May 12, 2022 at 12:02:30AM +0900, Wonhyuk Yang wrote:
> > The original intent of that tracepoint was to trace when pages were
> > removed from the buddy list. That would suggest this untested patch on
> > top of yours as a simplification;
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 0351808322ba..66a70b898130 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2476,6 +2476,8 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
> >                 del_page_from_free_list(page, zone, current_order);
> >                 expand(zone, page, order, current_order, migratetype);
> >                 set_pcppage_migratetype(page, migratetype);
> > +               trace_mm_page_alloc_zone_locked(page, order, migratetype,
> > +                       pcp_allowed_order(order) && migratetype < MIGRATE_PCPTYPES);
> >                 return page;
> >         }
> 
> Interestingly, my first approach was quite similar to your suggestion. But I
> noticed that there can be a request whose migration type is MOVABLE and
> whose alloc_flags doesn't have ALLOC_CMA. In that case, the page is marked
> as a percpu refill even though it was allocated from the buddy list directly.
> Is there no problem if we just ignore this case?
> 

I assume you are referring to the case where CMA allocations are being
balanced between regular and CMA areas. I think it's relatively harmless
if the percpu_refill field is not 100% accurate for that case. There are
also cases, like when the percpu list is too small to hold a THP, where it is
not a percpu_refill either. If 100% accuracy is an issue, I would prefer renaming
it to percpu_eligible or just deleting it instead of adding complexity
for a tracepoint.  The main value of that tracepoint is determining what
percentage of allocations are potentially contending on zone lock at a
particular time.
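
As a rough illustration (an untested userspace sketch; it assumes tracefs
is mounted at /sys/kernel/tracing and that the
events/kmem/mm_page_alloc_zone_locked event has already been enabled),
that percentage could be estimated with something like:

  /*
   * Untested sketch: count mm_page_alloc_zone_locked events coming out of
   * trace_pipe and report how many of them were flagged as per-cpu list
   * refills. Assumes the tracepoint was enabled beforehand.
   */
  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
  	FILE *fp = fopen("/sys/kernel/tracing/trace_pipe", "r");
  	char line[1024];
  	unsigned long total = 0, refills = 0;

  	if (!fp) {
  		perror("trace_pipe");
  		return 1;
  	}

  	while (fgets(line, sizeof(line), fp)) {
  		if (!strstr(line, "mm_page_alloc_zone_locked"))
  			continue;
  		total++;
  		if (strstr(line, "percpu_refill=1"))
  			refills++;
  		if ((total % 1000) == 0)
  			fprintf(stderr, "zone-locked allocs: %lu (%lu pcp refills)\n",
  				total, refills);
  	}

  	fclose(fp);
  	return 0;
  }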

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH] mm/page_alloc: Fix tracepoint mm_page_alloc_zone_locked()
From: Wonhyuk Yang @ 2022-05-12  1:11 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Steven Rostedt, Ingo Molnar, Andrew Morton, Baik Song An,
	Hong Yeon Kim, Taeung Song, linuxgeek, linux-kernel, linux-mm

On Thu, May 12, 2022 at 12:47 AM Mel Gorman <mgorman@suse.de> wrote:
>
> On Thu, May 12, 2022 at 12:02:30AM +0900, Wonhyuk Yang wrote:
> > > The original intent of that tracepoint was to trace when pages were
> > > removed from the buddy list. That would suggest this untested patch on
> > > top of yours as a simplification;
> > >
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index 0351808322ba..66a70b898130 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -2476,6 +2476,8 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
> > >                 del_page_from_free_list(page, zone, current_order);
> > >                 expand(zone, page, order, current_order, migratetype);
> > >                 set_pcppage_migratetype(page, migratetype);
> > > +               trace_mm_page_alloc_zone_locked(page, order, migratetype,
> > > +                       pcp_allowed_order(order) && migratetype < MIGRATE_PCPTYPES);
> > >                 return page;
> > >         }
> >
> > Interestingly, my first approach was quite similar to your suggestion. But I
> > noticed that there can be a request whose migration type is MOVABLE and
> > whose alloc_flags doesn't have ALLOC_CMA. In that case, the page is marked
> > as a percpu refill even though it was allocated from the buddy list directly.
> > Is there no problem if we just ignore this case?
> >
>
> I assume you are referring to the case where CMA allocations are being
> balanced between regular and CMA areas. I think it's relatively harmless
> if the percpu_refill field is not 100% accurate for that case. There are
> also cases, like when the percpu list is too small to hold a THP, where it is
> not a percpu_refill either. If 100% accuracy is an issue, I would prefer renaming
> it to percpu_eligible or just deleting it instead of adding complexity
> for a tracepoint.  The main value of that tracepoint is determining what
> percentage of allocations are potentially contending on zone lock at a
> particular time.
>

Okay, I'll send a new one with your suggestions. Thanks

