From: Mel Gorman <mel@csn.ul.ie>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Larry Woodman <lwoodman@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	riel@redhat.com, Ingo Molnar <mingo@elte.hu>,
	Peter Zijlstra <peterz@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-mm@kvack.org
Subject: Re: [PATCH 3/4] tracing, page-allocator: Add trace event for page traffic related to the buddy lists
Date: Wed, 5 Aug 2009 10:43:46 +0100	[thread overview]
Message-ID: <20090805094346.GC21950@csn.ul.ie> (raw)
In-Reply-To: <20090805182034.5BCD.A69D9226@jp.fujitsu.com>

On Wed, Aug 05, 2009 at 06:24:40PM +0900, KOSAKI Motohiro wrote:
> > The page allocation trace event reports that a page was successfully allocated
> > but it does not specify where it came from. When analysing performance,
> > it can be important to distinguish between pages coming from the per-cpu
> > allocator and pages coming from the buddy lists, as the latter requires the
> > zone lock to be taken and more data structures to be examined.
> > 
> > This patch adds a trace event for __rmqueue reporting when a page is being
> > allocated from the buddy lists. It distinguishes between being called
> > to refill the per-cpu lists and being called for a high-order allocation.
> > Similarly, this patch adds an event to catch when the PCP lists are being
> > drained a little and pages are going back to the buddy lists.
> > 
> > These events are trickier to draw conclusions from, but high activity on
> > them could explain why there were a large number of cache misses on a
> > page-allocator-intensive workload. The coalescing and splitting of buddies
> > involves a lot of writing of page metadata and cache line bounces, not to
> > mention the acquisition of an interrupt-safe lock necessary to enter this
> > path.
> > 
> > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > Acked-by: Rik van Riel <riel@redhat.com>
> > ---
> >  include/trace/events/kmem.h |   54 +++++++++++++++++++++++++++++++++++++++++++
> >  mm/page_alloc.c             |    2 +
> >  2 files changed, 56 insertions(+), 0 deletions(-)
> > 
> > diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
> > index 0b4002e..3be3df3 100644
> > --- a/include/trace/events/kmem.h
> > +++ b/include/trace/events/kmem.h
> > @@ -311,6 +311,60 @@ TRACE_EVENT(mm_page_alloc,
> >  		show_gfp_flags(__entry->gfp_flags))
> >  );
> >  
> > +TRACE_EVENT(mm_page_alloc_zone_locked,
> > +
> > +	TP_PROTO(const void *page, unsigned int order,
> > +				int migratetype, int percpu_refill),
> > +
> > +	TP_ARGS(page, order, migratetype, percpu_refill),
> > +
> > +	TP_STRUCT__entry(
> > +		__field(	const void *,	page		)
> > +		__field(	unsigned int,	order		)
> > +		__field(	int,		migratetype	)
> > +		__field(	int,		percpu_refill	)
> > +	),
> > +
> > +	TP_fast_assign(
> > +		__entry->page		= page;
> > +		__entry->order		= order;
> > +		__entry->migratetype	= migratetype;
> > +		__entry->percpu_refill	= percpu_refill;
> > +	),
> > +
> > +	TP_printk("page=%p pfn=%lu order=%u migratetype=%d percpu_refill=%d",
> > +		__entry->page,
> > +		page_to_pfn((struct page *)__entry->page),
> > +		__entry->order,
> > +		__entry->migratetype,
> > +		__entry->percpu_refill)
> > +);
> > +
> > +TRACE_EVENT(mm_page_pcpu_drain,
> > +
> > +	TP_PROTO(const void *page, int order, int migratetype),
> > +
> > +	TP_ARGS(page, order, migratetype),
> > +
> > +	TP_STRUCT__entry(
> > +		__field(	const void *,	page		)
> > +		__field(	int,		order		)
> > +		__field(	int,		migratetype	)
> > +	),
> > +
> > +	TP_fast_assign(
> > +		__entry->page		= page;
> > +		__entry->order		= order;
> > +		__entry->migratetype	= migratetype;
> > +	),
> > +
> > +	TP_printk("page=%p pfn=%lu order=%d migratetype=%d",
> > +		__entry->page,
> > +		page_to_pfn((struct page *)__entry->page),
> > +		__entry->order,
> > +		__entry->migratetype)
> > +);
> > +
> >  TRACE_EVENT(mm_page_alloc_extfrag,
> >  
> >  	TP_PROTO(const void *page,
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index c2c90cd..35b92a9 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -535,6 +535,7 @@ static void free_pages_bulk(struct zone *zone, int count,
> >  		page = list_entry(list->prev, struct page, lru);
> >  		/* have to delete it as __free_one_page list manipulates */
> >  		list_del(&page->lru);
> > +		trace_mm_page_pcpu_drain(page, order, page_private(page));
> 
> The pcp refill tracepoint (trace_mm_page_alloc_zone_locked) logs the
> migratetype, but this tracepoint doesn't. Why?
> 

It does log the migratetype, because in this context the migratetype is
stored in page_private(page).
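
For reference, a rough sketch of how the migratetype ends up in
page_private() on this path (simplified from the free_hot_cold_page() and
free_pages_bulk() code of this era, not the literal source):

	/*
	 * On free to the per-cpu lists, the migratetype is cached in
	 * page_private() so that the later bulk drain does not have to
	 * look it up again under the zone lock.
	 */
	set_page_private(page, get_pageblock_migratetype(page));

	/*
	 * Later, in free_pages_bulk(), page_private(page) is the
	 * migratetype reported by the tracepoint and used by
	 * __free_one_page().
	 */
	trace_mm_page_pcpu_drain(page, order, page_private(page));
	__free_one_page(page, zone, order, page_private(page));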

> 
> >  		__free_one_page(page, zone, order, page_private(page));
> >  	}
> >  	spin_unlock(&zone->lock);
> > @@ -878,6 +879,7 @@ retry_reserve:
> >  		}
> >  	}
> >  
> > +	trace_mm_page_alloc_zone_locked(page, order, migratetype, order == 0);
> >  	return page;
> >  }
> 
> Umm, can we assume order-0 always means a pcp refill?
> 

Right now, that assumption is accurate. Which call path ends up here with
order == 0 where it is not a PCP refill?
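
To spell out why, a simplified sketch of the only two paths into __rmqueue()
as the code stands (paraphrased from buffered_rmqueue(), not the literal
source):

	if (likely(order == 0)) {
		/*
		 * Order-0 allocations are served from the per-cpu lists,
		 * which are refilled via rmqueue_bulk() -> __rmqueue()
		 * with order == 0.
		 */
		if (list_empty(&pcp->list))
			pcp->count += rmqueue_bulk(zone, 0, pcp->batch,
						&pcp->list, migratetype);
	} else {
		/*
		 * High-order allocations take the zone lock and call
		 * __rmqueue() directly.
		 */
		spin_lock_irqsave(&zone->lock, flags);
		page = __rmqueue(zone, order, migratetype);
	}

So rmqueue_bulk() is currently the only order-0 caller of __rmqueue(), which
is why order == 0 at the tracepoint is treated as a per-cpu refill.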

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
