linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/2] New tracepoints for slowpath and memory compaction
@ 2016-07-27 14:43 Janani Ravichandran
  2016-07-27 14:47 ` [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath Janani Ravichandran
  2016-07-27 14:51 ` [PATCH 2/2] mm: compaction.c: Add/Modify direct compaction tracepoints Janani Ravichandran
  0 siblings, 2 replies; 19+ messages in thread
From: Janani Ravichandran @ 2016-07-27 14:43 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: riel, akpm, hannes, vdavydov, mhocko, vbabka, mgorman,
	kirill.shutemov, bywxiaobai, rostedt

Hi,

I am an Outreachy intern working under Rik van Riel on memory allocation 
latency tracing using tracepoints.
The goal of my project is to add tracepoints to code in vmscan.c and 
compaction.c to gain insight into what happens there and examine latencies
using a postprocessing script.

The script here:

https://github.com/Jananiravichandran/Analyzing-tracepoints/blob/master/shrink_slab_latencies.py

is a very basic one that shows how long direct reclaim and shrinkers take.
I intend to keep updating the script as more tracepoints are added in
the direct reclaim and compaction code and eventually submit the script
itself once I'm done.
Suggestions on this are most welcome! 
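For readers who have not opened the link, the core idea of such a postprocessing script can be sketched as follows. This is a rough sketch rather than the linked script itself: the event names match this patchset, but the trace_pipe line format and field positions it assumes are approximations.

```python
# Hedged sketch: pair begin/end tracepoint events per task and report latency.
# The line format below ("task-pid [cpu] timestamp: event: ...") approximates
# what trace_pipe emits; the field positions are assumptions, not guarantees.
import re

EVENT_RE = re.compile(r"-(?P<pid>\d+)\s+\[\d+\].*?(?P<ts>\d+\.\d+): (?P<event>\w+):")

def latencies(lines, begin="mm_slowpath_begin", end="mm_slowpath_end"):
    """Yield (pid, seconds) for each matched begin/end event pair."""
    starts = {}
    for line in lines:
        m = EVENT_RE.search(line)
        if not m:
            continue
        pid = int(m.group("pid"))
        ts = float(m.group("ts"))
        event = m.group("event")
        if event == begin:
            starts[pid] = ts
        elif event == end and pid in starts:
            yield pid, ts - starts.pop(pid)

sample = [
    "kswapd0-42    [000]  100.000100: mm_slowpath_begin: gfp_mask:GFP_KERNEL order=0",
    "kswapd0-42    [000]  100.004100: mm_slowpath_end: page=... pfn=12345",
]
for pid, dt in latencies(sample):
    print(pid, round(dt * 1e6))  # latency in microseconds
```

Pairing begin/end events per PID keeps overlapping allocations from different tasks from being mixed together.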

As of now, there is no mechanism to find out how long the slowpath and
memory compaction take to execute. This patchset adds new tracepoints
and also modifies a couple of existing ones to address this and to collect
some zone information that may be useful.

Janani Ravichandran (2):
  mm: page_alloc.c: Add tracepoints for slowpath
  mm: compaction.c: Add/Modify direct compaction tracepoints

 include/trace/events/compaction.h | 38 ++++++++++++++++++++++++++++++++-----
 include/trace/events/kmem.h       | 40 +++++++++++++++++++++++++++++++++++++++
 mm/compaction.c                   |  6 ++++--
 mm/page_alloc.c                   |  5 +++++
 4 files changed, 82 insertions(+), 7 deletions(-)

-- 
2.7.0

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath
  2016-07-27 14:43 [PATCH 0/2] New tracepoints for slowpath and memory compaction Janani Ravichandran
@ 2016-07-27 14:47 ` Janani Ravichandran
  2016-07-27 15:23   ` Steven Rostedt
  2016-07-27 16:33   ` Michal Hocko
  2016-07-27 14:51 ` [PATCH 2/2] mm: compaction.c: Add/Modify direct compaction tracepoints Janani Ravichandran
  1 sibling, 2 replies; 19+ messages in thread
From: Janani Ravichandran @ 2016-07-27 14:47 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: riel, akpm, hannes, vdavydov, mhocko, vbabka, mgorman,
	kirill.shutemov, bywxiaobai, rostedt

Add tracepoints to the slowpath code to gather some information.
The tracepoints can also be used to find out how much time was spent in
the slowpath.

Signed-off-by: Janani Ravichandran <janani.rvchndrn@gmail.com>
---
 include/trace/events/kmem.h | 40 ++++++++++++++++++++++++++++++++++++++++
 mm/page_alloc.c             |  5 +++++
 2 files changed, 45 insertions(+)

diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index 6b2e154..c19ab9f 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -169,6 +169,46 @@ TRACE_EVENT(mm_page_free,
 			__entry->order)
 );
 
+TRACE_EVENT(mm_slowpath_begin,
+
+	TP_PROTO(gfp_t gfp_mask, int order),
+
+	TP_ARGS(gfp_mask, order),
+
+	TP_STRUCT__entry(
+		__field(gfp_t, gfp_mask)
+		__field(int, order)
+	),
+
+	TP_fast_assign(
+		__entry->gfp_mask = gfp_mask;
+		__entry->order = order;
+	),
+
+	TP_printk("gfp_mask:%s order=%d",
+		show_gfp_flags(__entry->gfp_mask),
+		__entry->order)
+);
+
+TRACE_EVENT(mm_slowpath_end,
+
+	TP_PROTO(struct page *page),
+
+	TP_ARGS(page),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, pfn)
+	),
+
+	TP_fast_assign(
+		__entry->pfn = page ? page_to_pfn(page) : -1UL;
+	),
+
+	TP_printk("page=%p pfn=%lu",
+		__entry->pfn != -1UL ? pfn_to_page(__entry->pfn) : NULL,
+		__entry->pfn != -1UL ? __entry->pfn : 0)
+);
+
 TRACE_EVENT(mm_page_free_batched,
 
 	TP_PROTO(struct page *page, int cold),
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8b3e134..be9c688 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3595,6 +3595,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 				(__GFP_ATOMIC|__GFP_DIRECT_RECLAIM)))
 		gfp_mask &= ~__GFP_ATOMIC;
 
+	trace_mm_slowpath_begin(gfp_mask, order);
+
 retry:
 	if (gfp_mask & __GFP_KSWAPD_RECLAIM)
 		wake_all_kswapds(order, ac);
@@ -3769,6 +3771,9 @@ noretry:
 nopage:
 	warn_alloc_failed(gfp_mask, order, NULL);
 got_pg:
+
+	trace_mm_slowpath_end(page);
+
 	return page;
 }
 
-- 
2.7.0


* [PATCH 2/2] mm: compaction.c: Add/Modify direct compaction tracepoints
  2016-07-27 14:43 [PATCH 0/2] New tracepoints for slowpath and memory compaction Janani Ravichandran
  2016-07-27 14:47 ` [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath Janani Ravichandran
@ 2016-07-27 14:51 ` Janani Ravichandran
  2016-07-27 15:24   ` Steven Rostedt
  2016-08-01 13:25   ` Vlastimil Babka
  1 sibling, 2 replies; 19+ messages in thread
From: Janani Ravichandran @ 2016-07-27 14:51 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: riel, akpm, hannes, vdavydov, mhocko, vbabka, mgorman,
	kirill.shutemov, bywxiaobai, rostedt

Add zone information to an existing tracepoint in compact_zone(). Also,
add a new tracepoint at the end of the compaction code so that latency 
information can be derived.

Signed-off-by: Janani Ravichandran <janani.rvchndrn@gmail.com>
---
 include/trace/events/compaction.h | 38 +++++++++++++++++++++++++++++++++-----
 mm/compaction.c                   |  6 ++++--
 2 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/include/trace/events/compaction.h b/include/trace/events/compaction.h
index 36e2d6f..4d86769 100644
--- a/include/trace/events/compaction.h
+++ b/include/trace/events/compaction.h
@@ -158,12 +158,15 @@ TRACE_EVENT(mm_compaction_migratepages,
 );
 
 TRACE_EVENT(mm_compaction_begin,
-	TP_PROTO(unsigned long zone_start, unsigned long migrate_pfn,
-		unsigned long free_pfn, unsigned long zone_end, bool sync),
+	TP_PROTO(struct zone *zone, unsigned long zone_start,
+		unsigned long migrate_pfn, unsigned long free_pfn,
+		unsigned long zone_end, bool sync),
 
-	TP_ARGS(zone_start, migrate_pfn, free_pfn, zone_end, sync),
+	TP_ARGS(zone, zone_start, migrate_pfn, free_pfn, zone_end, sync),
 
 	TP_STRUCT__entry(
+		__field(int, nid)
+		__field(int, zid)
 		__field(unsigned long, zone_start)
 		__field(unsigned long, migrate_pfn)
 		__field(unsigned long, free_pfn)
@@ -172,6 +175,8 @@ TRACE_EVENT(mm_compaction_begin,
 	),
 
 	TP_fast_assign(
+		__entry->nid = zone_to_nid(zone);
+		__entry->zid = zone_idx(zone);
 		__entry->zone_start = zone_start;
 		__entry->migrate_pfn = migrate_pfn;
 		__entry->free_pfn = free_pfn;
@@ -179,7 +184,9 @@ TRACE_EVENT(mm_compaction_begin,
 		__entry->sync = sync;
 	),
 
-	TP_printk("zone_start=0x%lx migrate_pfn=0x%lx free_pfn=0x%lx zone_end=0x%lx, mode=%s",
+	TP_printk("nid=%d zid=%d zone_start=0x%lx migrate_pfn=0x%lx free_pfn=0x%lx zone_end=0x%lx, mode=%s",
+		__entry->nid,
+		__entry->zid,
 		__entry->zone_start,
 		__entry->migrate_pfn,
 		__entry->free_pfn,
@@ -221,7 +228,7 @@ TRACE_EVENT(mm_compaction_end,
 		__print_symbolic(__entry->status, COMPACTION_STATUS))
 );
 
-TRACE_EVENT(mm_compaction_try_to_compact_pages,
+TRACE_EVENT(mm_compaction_try_to_compact_pages_begin,
 
 	TP_PROTO(
 		int order,
@@ -248,6 +255,27 @@ TRACE_EVENT(mm_compaction_try_to_compact_pages,
 		(int)__entry->mode)
 );
 
+TRACE_EVENT(mm_compaction_try_to_compact_pages_end,
+
+	TP_PROTO(int rc, int contended),
+
+	TP_ARGS(rc, contended),
+
+	TP_STRUCT__entry(
+		__field(int, rc)
+		__field(int, contended)
+	),
+
+	TP_fast_assign(
+		__entry->rc = rc;
+		__entry->contended = contended;
+	),
+
+	TP_printk("rc=%s contended=%d",
+		__print_symbolic(__entry->rc, COMPACTION_STATUS),
+		__entry->contended)
+);
+
 DECLARE_EVENT_CLASS(mm_compaction_suitable_template,
 
 	TP_PROTO(struct zone *zone,
diff --git a/mm/compaction.c b/mm/compaction.c
index 7bc0477..dddd7c7 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1453,7 +1453,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 
 	cc->last_migrated_pfn = 0;
 
-	trace_mm_compaction_begin(start_pfn, cc->migrate_pfn,
+	trace_mm_compaction_begin(zone, start_pfn, cc->migrate_pfn,
 				cc->free_pfn, end_pfn, sync);
 
 	migrate_prep_local();
@@ -1625,7 +1625,7 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 	if (!order || !may_enter_fs || !may_perform_io)
 		return COMPACT_SKIPPED;
 
-	trace_mm_compaction_try_to_compact_pages(order, gfp_mask, mode);
+	trace_mm_compaction_try_to_compact_pages_begin(order, gfp_mask, mode);
 
 	/* Compact each zone in the list */
 	for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, ac->high_zoneidx,
@@ -1711,6 +1711,8 @@ break_loop:
 	if (rc > COMPACT_INACTIVE && all_zones_contended)
 		*contended = COMPACT_CONTENDED_LOCK;
 
+	trace_mm_compaction_try_to_compact_pages_end(rc, *contended);
+
 	return rc;
 }
 
-- 
2.7.0


* Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath
  2016-07-27 14:47 ` [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath Janani Ravichandran
@ 2016-07-27 15:23   ` Steven Rostedt
  2016-07-27 17:20     ` Dave Hansen
  2016-07-28 20:11     ` Janani Ravichandran
  2016-07-27 16:33   ` Michal Hocko
  1 sibling, 2 replies; 19+ messages in thread
From: Steven Rostedt @ 2016-07-27 15:23 UTC (permalink / raw)
  To: Janani Ravichandran
  Cc: linux-mm, linux-kernel, riel, akpm, hannes, vdavydov, mhocko,
	vbabka, mgorman, kirill.shutemov, bywxiaobai

On Wed, 27 Jul 2016 10:47:59 -0400
Janani Ravichandran <janani.rvchndrn@gmail.com> wrote:

> Add tracepoints to the slowpath code to gather some information.
> The tracepoints can also be used to find out how much time was spent in
> the slowpath.
> 
> Signed-off-by: Janani Ravichandran <janani.rvchndrn@gmail.com>
> ---
>  include/trace/events/kmem.h | 40 ++++++++++++++++++++++++++++++++++++++++
>  mm/page_alloc.c             |  5 +++++
>  2 files changed, 45 insertions(+)
> 
> diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
> index 6b2e154..c19ab9f 100644
> --- a/include/trace/events/kmem.h
> +++ b/include/trace/events/kmem.h
> @@ -169,6 +169,46 @@ TRACE_EVENT(mm_page_free,
>  			__entry->order)
>  );
>  
> +TRACE_EVENT(mm_slowpath_begin,
> +
> +	TP_PROTO(gfp_t gfp_mask, int order),
> +
> +	TP_ARGS(gfp_mask, order),
> +
> +	TP_STRUCT__entry(
> +		__field(gfp_t, gfp_mask)
> +		__field(int, order)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->gfp_mask = gfp_mask;
> +		__entry->order = order;
> +	),
> +
> +	TP_printk("gfp_mask:%s order=%d",
> +		show_gfp_flags(__entry->gfp_mask),
> +		__entry->order)
> +);
> +
> +TRACE_EVENT(mm_slowpath_end,
> +
> +	TP_PROTO(struct page *page),
> +
> +	TP_ARGS(page),
> +
> +	TP_STRUCT__entry(
> +		__field(unsigned long, pfn)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->pfn = page ? page_to_pfn(page) : -1UL;
> +	),
> +
> +	TP_printk("page=%p pfn=%lu",
> +		__entry->pfn != -1UL ? pfn_to_page(__entry->pfn) : NULL,

Note, userspace tools will not be able to do this conversion (like
trace-cmd or perf).

> +		__entry->pfn != -1UL ? __entry->pfn : 0)
> +);
> +
>  TRACE_EVENT(mm_page_free_batched,
>  
>  	TP_PROTO(struct page *page, int cold),
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 8b3e134..be9c688 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3595,6 +3595,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  				(__GFP_ATOMIC|__GFP_DIRECT_RECLAIM)))
>  		gfp_mask &= ~__GFP_ATOMIC;
>  
> +	trace_mm_slowpath_begin(gfp_mask, order);
> +
>  retry:
>  	if (gfp_mask & __GFP_KSWAPD_RECLAIM)
>  		wake_all_kswapds(order, ac);
> @@ -3769,6 +3771,9 @@ noretry:
>  nopage:
>  	warn_alloc_failed(gfp_mask, order, NULL);
>  got_pg:
> +
> +	trace_mm_slowpath_end(page);
> +

I'm thinking you only need one tracepoint, and use function_graph
tracer for the length of the function call.

 # cd /sys/kernel/debug/tracing
 # echo __alloc_pages_nodemask > set_ftrace_filter
 # echo function_graph > current_tracer
 # echo 1 > events/kmem/trace_mm_slowpath/enable

-- Steve


>  	return page;
>  }
>  


* Re: [PATCH 2/2] mm: compaction.c: Add/Modify direct compaction tracepoints
  2016-07-27 14:51 ` [PATCH 2/2] mm: compaction.c: Add/Modify direct compaction tracepoints Janani Ravichandran
@ 2016-07-27 15:24   ` Steven Rostedt
  2016-08-01 13:25   ` Vlastimil Babka
  1 sibling, 0 replies; 19+ messages in thread
From: Steven Rostedt @ 2016-07-27 15:24 UTC (permalink / raw)
  To: Janani Ravichandran
  Cc: linux-mm, linux-kernel, riel, akpm, hannes, vdavydov, mhocko,
	vbabka, mgorman, kirill.shutemov, bywxiaobai

On Wed, 27 Jul 2016 10:51:03 -0400
Janani Ravichandran <janani.rvchndrn@gmail.com> wrote:

> Add zone information to an existing tracepoint in compact_zone(). Also,
> add a new tracepoint at the end of the compaction code so that latency 
> information can be derived.
> 
> Signed-off-by: Janani Ravichandran <janani.rvchndrn@gmail.com>
> ---

>  
> +	trace_mm_compaction_try_to_compact_pages_end(rc, *contended);
> +

Again, I'm not to thrilled about tracepoints just being added to track
the length of function calls. We have function graph tracer for that.

-- Steve


>  	return rc;
>  }
>  


* Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath
  2016-07-27 14:47 ` [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath Janani Ravichandran
  2016-07-27 15:23   ` Steven Rostedt
@ 2016-07-27 16:33   ` Michal Hocko
  2016-07-27 18:16     ` Rik van Riel
  1 sibling, 1 reply; 19+ messages in thread
From: Michal Hocko @ 2016-07-27 16:33 UTC (permalink / raw)
  To: Janani Ravichandran
  Cc: linux-mm, linux-kernel, riel, akpm, hannes, vdavydov, vbabka,
	mgorman, kirill.shutemov, bywxiaobai, rostedt

On Wed 27-07-16 10:47:59, Janani Ravichandran wrote:
> Add tracepoints to the slowpath code to gather some information.
> The tracepoints can also be used to find out how much time was spent in
> the slowpath.

I do not think this is the right thing to measure. __alloc_pages_slowpath
is more of a code organization thing. The fast path might perform
expensive operations like zone reclaim (if node_reclaim_mode > 0), so
these tracepoints would miss them.

__alloc_pages_nodemask already has a trace point after the allocation
request is done. This alone is not sufficient to measure the allocation
latency which is the main point of this patch AFAIU. Wouldn't it be
better to add another trace point when we enter __alloc_pages_nodemask?

> Signed-off-by: Janani Ravichandran <janani.rvchndrn@gmail.com>
> ---
>  include/trace/events/kmem.h | 40 ++++++++++++++++++++++++++++++++++++++++
>  mm/page_alloc.c             |  5 +++++
>  2 files changed, 45 insertions(+)
> 
> diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
> index 6b2e154..c19ab9f 100644
> --- a/include/trace/events/kmem.h
> +++ b/include/trace/events/kmem.h
> @@ -169,6 +169,46 @@ TRACE_EVENT(mm_page_free,
>  			__entry->order)
>  );
>  
> +TRACE_EVENT(mm_slowpath_begin,
> +
> +	TP_PROTO(gfp_t gfp_mask, int order),
> +
> +	TP_ARGS(gfp_mask, order),
> +
> +	TP_STRUCT__entry(
> +		__field(gfp_t, gfp_mask)
> +		__field(int, order)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->gfp_mask = gfp_mask;
> +		__entry->order = order;
> +	),
> +
> +	TP_printk("gfp_mask:%s order=%d",
> +		show_gfp_flags(__entry->gfp_mask),
> +		__entry->order)
> +);
> +
> +TRACE_EVENT(mm_slowpath_end,
> +
> +	TP_PROTO(struct page *page),
> +
> +	TP_ARGS(page),
> +
> +	TP_STRUCT__entry(
> +		__field(unsigned long, pfn)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->pfn = page ? page_to_pfn(page) : -1UL;
> +	),
> +
> +	TP_printk("page=%p pfn=%lu",
> +		__entry->pfn != -1UL ? pfn_to_page(__entry->pfn) : NULL,
> +		__entry->pfn != -1UL ? __entry->pfn : 0)
> +);
> +
>  TRACE_EVENT(mm_page_free_batched,
>  
>  	TP_PROTO(struct page *page, int cold),
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 8b3e134..be9c688 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3595,6 +3595,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  				(__GFP_ATOMIC|__GFP_DIRECT_RECLAIM)))
>  		gfp_mask &= ~__GFP_ATOMIC;
>  
> +	trace_mm_slowpath_begin(gfp_mask, order);
> +
>  retry:
>  	if (gfp_mask & __GFP_KSWAPD_RECLAIM)
>  		wake_all_kswapds(order, ac);
> @@ -3769,6 +3771,9 @@ noretry:
>  nopage:
>  	warn_alloc_failed(gfp_mask, order, NULL);
>  got_pg:
> +
> +	trace_mm_slowpath_end(page);
> +
>  	return page;
>  }
>  
> -- 
> 2.7.0
> 

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath
  2016-07-27 15:23   ` Steven Rostedt
@ 2016-07-27 17:20     ` Dave Hansen
  2016-07-27 17:30       ` Steven Rostedt
  2016-07-28 20:11     ` Janani Ravichandran
  1 sibling, 1 reply; 19+ messages in thread
From: Dave Hansen @ 2016-07-27 17:20 UTC (permalink / raw)
  To: Steven Rostedt, Janani Ravichandran
  Cc: linux-mm, linux-kernel, riel, akpm, hannes, vdavydov, mhocko,
	vbabka, mgorman, kirill.shutemov, bywxiaobai

On 07/27/2016 08:23 AM, Steven Rostedt wrote:
>> > +
>> > +	trace_mm_slowpath_end(page);
>> > +
> I'm thinking you only need one tracepoint, and use function_graph
> tracer for the length of the function call.
> 
>  # cd /sys/kernel/debug/tracing
>  # echo __alloc_pages_nodemask > set_ftrace_filter
>  # echo function_graph > current_tracer
>  # echo 1 > events/kmem/trace_mm_slowpath/enable

I hesitate to endorse using the function_graph tracer for this kind of
stuff.  Tracepoints offer some level of stability in naming, and the
compiler won't ever make them go away.   While __alloc_pages_nodemask is
probably more stable than most things, there's no guarantee that it will
be there.

BTW, what's the overhead of the function graph tracer if the filter is
set up to be really restrictive like above?  Is the overhead really just
limited to that one function?


* Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath
  2016-07-27 17:20     ` Dave Hansen
@ 2016-07-27 17:30       ` Steven Rostedt
  0 siblings, 0 replies; 19+ messages in thread
From: Steven Rostedt @ 2016-07-27 17:30 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Janani Ravichandran, linux-mm, linux-kernel, riel, akpm, hannes,
	vdavydov, mhocko, vbabka, mgorman, kirill.shutemov, bywxiaobai

On Wed, 27 Jul 2016 10:20:28 -0700
Dave Hansen <dave.hansen@intel.com> wrote:

> On 07/27/2016 08:23 AM, Steven Rostedt wrote:
> >> > +
> >> > +	trace_mm_slowpath_end(page);
> >> > +  
> > I'm thinking you only need one tracepoint, and use function_graph
> > tracer for the length of the function call.
> > 
> >  # cd /sys/kernel/debug/tracing
> >  # echo __alloc_pages_nodemask > set_ftrace_filter
> >  # echo function_graph > current_tracer
> >  # echo 1 > events/kmem/trace_mm_slowpath/enable  
> 
> I hesitate to endorse using the function_graph tracer for this kind of
> stuff.  Tracepoints offer some level of stability in naming, and the
> compiler won't ever make them go away.   While __alloc_pages_nodemask is
> probably more stable than most things, there's no guarantee that it will
> be there.

Well, then you are also advocating a userspace ABI that
will have to be maintained forever. Just be warned.

> 
> BTW, what's the overhead of the function graph tracer if the filter is
> set up to be really restrictive like above?  Is the overhead really just
> limited to that one function?

Yes, if DYNAMIC_FTRACE is defined. Which it should be, because static
ftrace has a huge overhead without enabling the tracer.

It will enable only that function to be traced. I've recommended before
that if one wants a good idea of how long a function lasts, they
should filter down to a single function. Anything else will include the
overhead of the tracer itself.

-- Steve


* Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath
  2016-07-27 16:33   ` Michal Hocko
@ 2016-07-27 18:16     ` Rik van Riel
  2016-07-27 18:28       ` Steven Rostedt
  2016-07-27 18:44       ` Michal Hocko
  0 siblings, 2 replies; 19+ messages in thread
From: Rik van Riel @ 2016-07-27 18:16 UTC (permalink / raw)
  To: Michal Hocko, Janani Ravichandran
  Cc: linux-mm, linux-kernel, akpm, hannes, vdavydov, vbabka, mgorman,
	kirill.shutemov, bywxiaobai, rostedt


On Wed, 2016-07-27 at 18:33 +0200, Michal Hocko wrote:
> On Wed 27-07-16 10:47:59, Janani Ravichandran wrote:
> > 
> > Add tracepoints to the slowpath code to gather some information.
> > The tracepoints can also be used to find out how much time was
> > spent in
> > the slowpath.
> I do not think this is a right thing to measure.
> __alloc_pages_slowpath
> is more a code organization thing. The fast path might perform an
> expensive operations like zone reclaim (if node_reclaim_mode > 0) so
> these trace point would miss it.

It doesn't look like it does. The fast path either
returns an allocated page to the caller, or calls
into the slow path.

Starting measurement from the slow path cuts out
a lot of noise, since the fast path will never be
slow (and not interesting as a source of memory
allocation latency).

As for the function tracer, I wish I had known
about that!

That looks like it should provide the info that
Janani needs to write her memory allocation latency
tracing script/tool.

As her Outreachy mentor, I should probably apologize
for potentially having sent her down the wrong path
with tracepoints, and I hope it has been an
educational trip at least :)

-- 
All rights reversed



* Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath
  2016-07-27 18:16     ` Rik van Riel
@ 2016-07-27 18:28       ` Steven Rostedt
  2016-07-27 18:44       ` Michal Hocko
  1 sibling, 0 replies; 19+ messages in thread
From: Steven Rostedt @ 2016-07-27 18:28 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Michal Hocko, Janani Ravichandran, linux-mm, linux-kernel, akpm,
	hannes, vdavydov, vbabka, mgorman, kirill.shutemov, bywxiaobai

On Wed, 27 Jul 2016 14:16:22 -0400
Rik van Riel <riel@surriel.com> wrote:


> As for the function tracer, I wish I had known
> about that!

The kernel (and even just the tracing infrastructure) is too big to know
everything that is there.

> 
> That looks like it should provide the info that
> Janani needs to write her memory allocation latency
> tracing script/tool.
> 
> As her Outreachy mentor, I should probably apologize
> for potentially having sent her down the wrong path
> with tracepoints, and I hope it has been an
> educational trip at least :)
> 

No, it was a perfect example of how we work, and I don't see this as a
wrong path. It's a good learning tool because that patch is exactly
what someone wanting to do a specific task will probably do as their
first attempt. There should be no shame in sending out a patch and have
feedback on another way to accomplish the same thing that doesn't
impact the system as much.

As stated above, the kernel is too big to know everything that needs to
be done. Thus, kernel development is really about trial and error. Send
out what works for you, and then take feedback from those that know
their system better than you to make your patch better.

That's how the workflow should happen on a daily basis. And it is exactly
how I operate. There are lots of patches that I send out to other
maintainers that end up being something completely different because I
don't know their systems as well as they do.

-- Steve


* Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath
  2016-07-27 18:16     ` Rik van Riel
  2016-07-27 18:28       ` Steven Rostedt
@ 2016-07-27 18:44       ` Michal Hocko
  2016-07-27 18:57         ` Rik van Riel
  1 sibling, 1 reply; 19+ messages in thread
From: Michal Hocko @ 2016-07-27 18:44 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Janani Ravichandran, linux-mm, linux-kernel, akpm, hannes,
	vdavydov, vbabka, mgorman, kirill.shutemov, bywxiaobai, rostedt

On Wed 27-07-16 14:16:22, Rik van Riel wrote:
> On Wed, 2016-07-27 at 18:33 +0200, Michal Hocko wrote:
> > On Wed 27-07-16 10:47:59, Janani Ravichandran wrote:
> > > 
> > > Add tracepoints to the slowpath code to gather some information.
> > > The tracepoints can also be used to find out how much time was
> > > spent in
> > > the slowpath.
> > I do not think this is a right thing to measure.
> > __alloc_pages_slowpath
> > is more a code organization thing. The fast path might perform an
> > expensive operations like zone reclaim (if node_reclaim_mode > 0) so
> > these trace point would miss it.
> 
> It doesn't look like it does. The fast path either
> returns an allocated page to the caller, or calls
> into the slow path.

I must be missing something here but what prevents
__alloc_pages_nodemask->get_page_from_freelist from doing zone_reclaim?
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath
  2016-07-27 18:44       ` Michal Hocko
@ 2016-07-27 18:57         ` Rik van Riel
  0 siblings, 0 replies; 19+ messages in thread
From: Rik van Riel @ 2016-07-27 18:57 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Janani Ravichandran, linux-mm, linux-kernel, akpm, hannes,
	vdavydov, vbabka, mgorman, kirill.shutemov, bywxiaobai, rostedt


On Wed, 2016-07-27 at 20:44 +0200, Michal Hocko wrote:
> On Wed 27-07-16 14:16:22, Rik van Riel wrote:
> > 
> > On Wed, 2016-07-27 at 18:33 +0200, Michal Hocko wrote:
> > > 
> > > On Wed 27-07-16 10:47:59, Janani Ravichandran wrote:
> > > > 
> > > > 
> > > > Add tracepoints to the slowpath code to gather some
> > > > information.
> > > > The tracepoints can also be used to find out how much time was
> > > > spent in
> > > > the slowpath.
> > > I do not think this is a right thing to measure.
> > > __alloc_pages_slowpath
> > > is more a code organization thing. The fast path might perform an
> > > expensive operations like zone reclaim (if node_reclaim_mode > 0)
> > > so
> > > these trace point would miss it.
> > It doesn't look like it does. The fast path either
> > returns an allocated page to the caller, or calls
> > into the slow path.
> I must be missing something here but what prevents
> __alloc_pages_nodemask->get_page_from_freelist from doing
> zone_reclaim?

You are right!

Guess the script may need to collect all the
tracing output from __alloc_pages_nodemask on
up, and then filter the output so only the
interesting (read: long duration) traces get
dumped out to a file.
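A minimal sketch of that filtering idea, assuming function_graph output with a
duration column in microseconds (the exact column layout here is an assumption,
not a guarantee of the real trace format):

```python
# Hedged sketch: keep only function_graph lines for __alloc_pages_nodemask
# whose reported duration exceeds a threshold. The "<number> us" duration
# column is an assumption about the output format.
import re

DUR_RE = re.compile(r"([\d.]+) us.*__alloc_pages_nodemask")

def slow_calls(lines, threshold_us=100.0):
    """Return only the lines whose reported duration exceeds threshold_us."""
    out = []
    for line in lines:
        m = DUR_RE.search(line)
        if m and float(m.group(1)) > threshold_us:
            out.append(line)
    return out

sample = [
    " 1)   2.153 us   |  __alloc_pages_nodemask();",
    " 1) + 250.375 us |  __alloc_pages_nodemask();",
]
print(slow_calls(sample))  # only the 250 us entry survives
```

Filtering in userspace like this keeps the kernel side simple at the cost of
streaming everything through trace_pipe first.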

-- 
All rights reversed



* Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath
  2016-07-27 15:23   ` Steven Rostedt
  2016-07-27 17:20     ` Dave Hansen
@ 2016-07-28 20:11     ` Janani Ravichandran
  2016-08-04 15:19       ` Steven Rostedt
  1 sibling, 1 reply; 19+ messages in thread
From: Janani Ravichandran @ 2016-07-28 20:11 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-mm, linux-kernel, riel, akpm, hannes, vdavydov, mhocko,
	vbabka, mgorman, kirill.shutemov, bywxiaobai


> On Jul 27, 2016, at 8:53 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> I'm thinking you only need one tracepoint, and use function_graph
> tracer for the length of the function call.
> 
> # cd /sys/kernel/debug/tracing
> # echo __alloc_pages_nodemask > set_ftrace_filter
> # echo function_graph > current_tracer
> # echo 1 > events/kmem/trace_mm_slowpath/enable

Thank you so much for your feedback! 

Actually, the goal is to only single out those cases with latencies higher than a given
threshold.

So, in accordance with this, I added those begin/end tracepoints and thought I 
could use a script to read the output of trace_pipe and only write to disk the trace 
information associated with latencies above the threshold. This would help prevent 
high disk I/O, especially when the threshold set is high.

I looked at the function_graph tracer, as you’d suggested. I saw that I could set a threshold
there using tracing_thresh. But the problem was that slowpath trace information was printed
in all cases (even when __alloc_pages_nodemask latencies were below the threshold).
Is there a way to print tracepoint information only when __alloc_pages_nodemask
exceeds the threshold?
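For concreteness, the setup being described looks roughly like the following;
the paths follow the earlier example in this thread, and the 100 us threshold
is an illustrative value:

```shell
# Rough sketch of the function_graph + tracing_thresh setup described above.
# The 100 us threshold is an illustrative value, not a recommendation.
cd /sys/kernel/debug/tracing
echo __alloc_pages_nodemask > set_ftrace_filter
echo function_graph > current_tracer
echo 100 > tracing_thresh   # report only functions taking longer than 100 us
cat trace_pipe              # tracepoint events still appear regardless of the threshold
```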

Thanks!

Janani.
> 
> -- Steve
> 
> 
>> 	return page;
>> }
>> 
> 


* Re: [PATCH 2/2] mm: compaction.c: Add/Modify direct compaction tracepoints
  2016-07-27 14:51 ` [PATCH 2/2] mm: compaction.c: Add/Modify direct compaction tracepoints Janani Ravichandran
  2016-07-27 15:24   ` Steven Rostedt
@ 2016-08-01 13:25   ` Vlastimil Babka
  2016-08-07 12:32     ` Janani Ravichandran
  1 sibling, 1 reply; 19+ messages in thread
From: Vlastimil Babka @ 2016-08-01 13:25 UTC (permalink / raw)
  To: Janani Ravichandran, linux-mm, linux-kernel
  Cc: riel, akpm, hannes, vdavydov, mhocko, mgorman, kirill.shutemov,
	bywxiaobai, rostedt

On 07/27/2016 04:51 PM, Janani Ravichandran wrote:
> Add zone information to an existing tracepoint in compact_zone(). Also,
> add a new tracepoint at the end of the compaction code so that latency
> information can be derived.
>
> Signed-off-by: Janani Ravichandran <janani.rvchndrn@gmail.com>
> ---
>  include/trace/events/compaction.h | 38 +++++++++++++++++++++++++++++++++-----
>  mm/compaction.c                   |  6 ++++--
>  2 files changed, 37 insertions(+), 7 deletions(-)
>
> diff --git a/include/trace/events/compaction.h b/include/trace/events/compaction.h
> index 36e2d6f..4d86769 100644
> --- a/include/trace/events/compaction.h
> +++ b/include/trace/events/compaction.h
> @@ -158,12 +158,15 @@ TRACE_EVENT(mm_compaction_migratepages,
>  );
>
>  TRACE_EVENT(mm_compaction_begin,
> -	TP_PROTO(unsigned long zone_start, unsigned long migrate_pfn,
> -		unsigned long free_pfn, unsigned long zone_end, bool sync),
> +	TP_PROTO(struct zone *zone, unsigned long zone_start,
> +		unsigned long migrate_pfn, unsigned long free_pfn,
> +		unsigned long zone_end, bool sync),
>
> -	TP_ARGS(zone_start, migrate_pfn, free_pfn, zone_end, sync),
> +	TP_ARGS(zone, zone_start, migrate_pfn, free_pfn, zone_end, sync),
>
>  	TP_STRUCT__entry(
> +		__field(int, nid)
> +		__field(int, zid)
>  		__field(unsigned long, zone_start)
>  		__field(unsigned long, migrate_pfn)
>  		__field(unsigned long, free_pfn)
> @@ -172,6 +175,8 @@ TRACE_EVENT(mm_compaction_begin,
>  	),
>
>  	TP_fast_assign(
> +		__entry->nid = zone_to_nid(zone);
> +		__entry->zid = zone_idx(zone);
>  		__entry->zone_start = zone_start;
>  		__entry->migrate_pfn = migrate_pfn;
>  		__entry->free_pfn = free_pfn;
> @@ -179,7 +184,9 @@ TRACE_EVENT(mm_compaction_begin,
>  		__entry->sync = sync;
>  	),
>
> -	TP_printk("zone_start=0x%lx migrate_pfn=0x%lx free_pfn=0x%lx zone_end=0x%lx, mode=%s",
> +	TP_printk("nid=%d zid=%d zone_start=0x%lx migrate_pfn=0x%lx free_pfn=0x%lx zone_end=0x%lx, mode=%s",

Yeah, this tracepoint has been odd in not printing node/zone in a
friendly way (it's possible to determine them from zone_start/zone_end,
though), so this is good in general. But instead of printing nid and zid
like this, it would be nice to unify the output with the other
tracepoints, e.g.:

DECLARE_EVENT_CLASS(mm_compaction_suitable_template,
[...]
         TP_printk("node=%d zone=%-8s order=%d ret=%s",
                 __entry->nid,
                 __print_symbolic(__entry->idx, ZONE_TYPE),

Thanks,
Vlastimil

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath
  2016-07-28 20:11     ` Janani Ravichandran
@ 2016-08-04 15:19       ` Steven Rostedt
  2016-08-05 16:03         ` Janani Ravichandran
  0 siblings, 1 reply; 19+ messages in thread
From: Steven Rostedt @ 2016-08-04 15:19 UTC (permalink / raw)
  To: Janani Ravichandran
  Cc: linux-mm, linux-kernel, riel, akpm, hannes, vdavydov, mhocko,
	vbabka, mgorman, kirill.shutemov, bywxiaobai

On Fri, 29 Jul 2016 01:41:20 +0530
Janani Ravichandran <janani.rvchndrn@gmail.com> wrote:

Sorry for the late reply, I've been swamped with other things since
coming back from my vacation.

> I looked at function graph trace, as you’d suggested. I saw that I could set a threshold 
> there using tracing_thresh. But the problem was that slowpath trace information was printed
> for all the cases (even when __alloc_pages_nodemask latencies were below the threshold).
> Is there a way to print tracepoint information only when __alloc_pages_nodemask
> exceeds the threshold?

One thing you could do is to create your own module and hook into the
function graph tracer yourself!

It would require a patch to export two functions in
kernel/trace/ftrace.c:

 register_ftrace_graph()
 unregister_ftrace_graph()

Note, currently only one user of these functions is allowed at a time.
If function_graph tracing is already enabled, the register function
will return -EBUSY.

You pass in a "retfunc" and an "entryfunc" (I never understood why they
were backwards), and these are the functions that are called when a
function returns and when a function is entered, respectively.

The retfunc looks like this:

static void my_retfunc(struct ftrace_graph_ret *trace)
{
	[...]
}

static int my_entryfunc(struct ftrace_graph_ent *trace)
{
	[...]
}


The ftrace_graph_ret structure looks like this:

struct ftrace_graph_ret {
	unsigned long func;
	unsigned long overrun;
	unsigned long calltime;
	unsigned long rettime;
	int depth;
};

Where func is actually the instruction pointer of the function that is
being traced.

You can ignore "overrun".

calltime is the trace_clock_local() (sched_clock() like timestamp) of
when the function was entered.

rettime is the trace_clock_local() timestamp of when the function
returns.

 rettime - calltime is the time difference of the entire function.

And that's the time you want to look at.

depth is how deep into the call chain the current function is. There's
a limit (50, I think) of how deep it will record, and anything deeper
will go into that "overrun" field I told you to ignore.


Hmm, looking at the code, it appears setting tracing_thresh should
work. Could you show me exactly what you did?

Either way, adding your own function graph hook may be a good exercise
in seeing how all this works.

-- Steve


* Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath
  2016-08-04 15:19       ` Steven Rostedt
@ 2016-08-05 16:03         ` Janani Ravichandran
  2016-08-05 16:30           ` Steven Rostedt
  0 siblings, 1 reply; 19+ messages in thread
From: Janani Ravichandran @ 2016-08-05 16:03 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-mm, linux-kernel, riel, akpm, hannes, vdavydov, mhocko,
	vbabka, mgorman, kirill.shutemov, bywxiaobai


> On Aug 4, 2016, at 8:49 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> On Fri, 29 Jul 2016 01:41:20 +0530
> Janani Ravichandran <janani.rvchndrn@gmail.com> wrote:
> 
> Sorry for the late reply, I've been swamped with other things since
> coming back from my vacation.
> 

No problem!
> 
> Hmm, looking at the code, it appears setting tracing_thresh should
> work. Could you show me exactly what you did?
> 

Sure. I wanted to observe how long it took to allocate pages and also how
long functions in the direct reclaim path took.

So I did:

echo function_graph > current_tracer
echo __alloc_pages_nodemask > set_graph_function
echo __alloc_pages_nodemask >> set_ftrace_filter
echo do_try_to_free_pages >> set_ftrace_filter
echo shrink_zone >> set_ftrace_filter
echo mem_cgroup_soft_limit_reclaim >> set_ftrace_filter
echo shrink_zone_memcg >> set_ftrace_filter
echo shrink_slab >> set_ftrace_filter
echo shrink_list >> set_ftrace_filter
echo shrink_active_list >> set_ftrace_filter
echo shrink_inactive_list >> set_ftrace_filter

echo 20 > tracing_thresh

echo 1 > events/vmscan/mm_shrink_slab_start/enable
echo 1 > events/vmscan/mm_shrink_slab_end/enable
echo 1 > events/vmscan/mm_vmscan_direct_reclaim_begin/enable
echo 1 > events/vmscan/mm_vmscan_direct_reclaim_end/enable

Rik had suggested that it’d be good to write only the tracepoint info related to
high latencies to disk, because otherwise there’s a lot of information from
the tracepoints. Filtering it out would greatly reduce disk I/O.

What I first tried with begin/end tracepoints was simply to use their timestamps
to calculate the duration and write the tracepoint info to disk only if it exceeded a
certain threshold.

The function graph output is great when

a, no thresholds or tracepoints are set (with those aforementioned functions
used as filters).

Here is a sample output.
https://github.com/Jananiravichandran/Analyzing-tracepoints/blob/master/no_tp_no_threshold.txt

Lines 372 to 474 clearly show durations of functions and the output is helpful 
to observe how long each function took.

b, no thresholds are set and the tracepoints are enabled to get some
additional information.

https://github.com/Jananiravichandran/Analyzing-tracepoints/blob/master/set_tp_no_threshold.txt

Lines 785 to 916 here clearly show which tracepoints were invoked within which
function calls as the beginning and end of functions are clearly marked.

c, A threshold (20 in this case) is set but no tracepoints are enabled.

https://github.com/Jananiravichandran/Analyzing-tracepoints/blob/master/no_tp_set_threshold.txt

Lines 230 to 345 only show functions which exceeded the threshold.

But there’s a problem when a threshold is set and the tracepoints are enabled. It
is difficult to know the subset of the total tracepoint info printed that was actually
part of the functions that took longer than the threshold to execute (as there is no
info indicating the beginning of functions unlike case b, mentioned above).

For example,
between lines 59 and 75 here:

https://github.com/Jananiravichandran/Analyzing-tracepoints/blob/master/set_tp_set_threshold.txt

we can see that there was a call to shrink_zone() which took 54.141 us
(greater than 20, the threshold). We also see a lot of tracepoint information
printed between lines 59 and 74. But it is not possible for us to filter out
only the tracepoint info that belongs to the shrink_zone() call that took 54.141
us as it is possible that some of the information was printed as part of
other shrink_zone() calls which took less than the threshold and therefore
did not make it to the output file.

So, it’s the filtering of anomalous data from tracepoints that I find difficult while
using function_graphs. 

> Either way, adding your own function graph hook may be a good exercise
> in seeing how all this works.

Thank you for your suggestions regarding the function graph hook! I will try
it and see if there’s anything I can come up with!

Janani.

> 
> -- Steve


* Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath
  2016-08-05 16:03         ` Janani Ravichandran
@ 2016-08-05 16:30           ` Steven Rostedt
  2016-08-07 10:36             ` Janani Ravichandran
  0 siblings, 1 reply; 19+ messages in thread
From: Steven Rostedt @ 2016-08-05 16:30 UTC (permalink / raw)
  To: Janani Ravichandran
  Cc: linux-mm, linux-kernel, riel, akpm, hannes, vdavydov, mhocko,
	vbabka, mgorman, kirill.shutemov, bywxiaobai

On Fri, 5 Aug 2016 21:33:25 +0530
Janani Ravichandran <janani.rvchndrn@gmail.com> wrote:


> > Hmm, looking at the code, it appears setting tracing_thresh should
> > work. Could you show me exactly what you did?
> >   
> 
> Sure. I wanted to observe how long it took to allocate pages and also how
> long functions in the direct reclaim path took.
> 
> So I did:
> 
> echo function_graph > current_tracer
> echo __alloc_pages_nodemask > set_graph_function

Eliminate the update to set_graph_function step. I'm not sure this is
the cause, but it's not needed. Adding to set_ftrace_filter should be
sufficient.

> echo __alloc_pages_nodemask >> set_ftrace_filter

Use '>' and not '>>' as I don't know what you had in there before, as
I'm guessing you want this to only contain what you listed here.

> echo do_try_to_free_pages >> set_ftrace_filter
> echo shrink_zone >> set_ftrace_filter
> echo mem_cgroup_soft_limit_reclaim >> set_ftrace_filter
> echo shrink_zone_memcg >> set_ftrace_filter
> echo shrink_slab >> set_ftrace_filter
> echo shrink_list >> set_ftrace_filter
> echo shrink_active_list >> set_ftrace_filter
> echo shrink_inactive_list >> set_ftrace_filter
> 
> echo 20 > tracing_thresh

You probably want to clear the trace here, or set function_graph here
first. Because the function graph starts writing to the buffer
immediately.

> 
> echo 1 > events/vmscan/mm_shrink_slab_start/enable
> echo 1 > events/vmscan/mm_shrink_slab_end/enable
> echo 1 > events/vmscan/mm_vmscan_direct_reclaim_begin/enable
> echo 1 > events/vmscan/mm_vmscan_direct_reclaim_end/enable
> 
> Rik had suggested that it’d be good to write only the tracepoint info related to
> high latencies to disk, because otherwise there’s a lot of information from
> the tracepoints. Filtering it out would greatly reduce disk I/O.
> 
> What I first tried with begin/end tracepoints was simply to use their timestamps
> to calculate the duration and write the tracepoint info to disk only if it exceeded a
> certain threshold.
> 
> The function graph output is great when
> 
> a, no thresholds or tracepoints are set (with those aforementioned functions
> used as filters).
> 
> Here is a sample output.
> https://github.com/Jananiravichandran/Analyzing-tracepoints/blob/master/no_tp_no_threshold.txt
> 
> Lines 372 to 474 clearly show durations of functions and the output is helpful 
> to observe how long each function took.
> 
> b, no thresholds are set and the tracepoints are enabled to get some
> additional information.
> 
> https://github.com/Jananiravichandran/Analyzing-tracepoints/blob/master/set_tp_no_threshold.txt
> 
> Lines 785 to 916 here clearly show which tracepoints were invoked within which
> function calls as the beginning and end of functions are clearly marked.
> 
> c, A threshold (20 in this case) is set but no tracepoints are enabled.
> 
> https://github.com/Jananiravichandran/Analyzing-tracepoints/blob/master/no_tp_set_threshold.txt
> 
> Lines 230 to 345 only show functions which exceeded the threshold.
> 
> But there’s a problem when a threshold is set and the tracepoints are enabled. It
> is difficult to know the subset of the total tracepoint info printed that was actually
> part of the functions that took longer than the threshold to execute (as there is no
> info indicating the beginning of functions unlike case b, mentioned above).
> 
> For example,
> between lines 59 and 75 here:
> 
> https://github.com/Jananiravichandran/Analyzing-tracepoints/blob/master/set_tp_set_threshold.txt

When threshold is set, the entry is not recorded, because it is only
showing the exit and the time it took in that function:

0) kswapd0-52 | + 54.141 us | } /* shrink_zone */

shrink_zone() took 54.141us.

The reason it doesn't record the entry is because it would fill the
entire buffer, if the threshold is never hit. One can't predict the
time in a function when you first enter that function.

> 
> we can see that there was a call to shrink_zone() which took 54.141 us
> (greater than 20, the threshold). We also see a lot of tracepoint information
> printed between lines 59 and 74. But it is not possible for us to filter out
> only the tracepoint info that belongs to the shrink_zone() call that took 54.141
> us as it is possible that some of the information was printed as part of
> other shrink_zone() calls which took less than the threshold and therefore
> did not make it to the output file.

Exactly!

You need your own interpreter here. Perhaps a module that reads
the tracepoints directly and registers a function graph tracer itself.
The trace events and function tracers are pluggable. You don't need to
use the tracing system to use them. Just hook into them directly.

Things like the wakeup latency tracer does this. Look at
kernel/trace/trace_sched_wakeup.c for an example. It hooks into the
sched_wakeup and sched_switch tracepoints, and also has a way to use
function and function_graph tracing.



> 
> So, it’s the filtering of anomalous data from tracepoints that I find difficult while
> using function_graphs. 

Well, as I said, you can't filter on the entry tracepoint/function
because you don't know how long that function will take yet. You need
to have code that takes all information and only writes it out after
you hit the latency. That's going to require some custom coding.
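
Something like this, sketched in userspace Python (the event tuples here
are made up for illustration; in the kernel this would be a module hooking
the tracepoints and function graph callbacks directly):

```python
# Sketch of the "buffer everything, flush only on latency" idea:
# hold all lines seen between a function's entry and exit (per PID),
# and emit them only when the span exceeded the threshold. This
# assumes entry records are available, i.e. tracing_thresh is NOT
# set and the filtering happens here instead.

THRESHOLD_US = 20.0

def flush_slow_spans(events, threshold_us=THRESHOLD_US):
    """events: iterable of (pid, kind, timestamp_us, line) tuples,
    where kind is 'entry', 'exit', or 'tracepoint'.
    Returns only the lines belonging to slow spans."""
    pending = {}   # pid -> (entry timestamp, buffered lines)
    out = []
    for pid, kind, us, line in events:
        if kind == "entry":
            pending[pid] = (us, [line])
        elif pid in pending:
            t0, buf = pending[pid]
            buf.append(line)
            if kind == "exit":
                del pending[pid]
                if us - t0 > threshold_us:
                    out.extend(buf)   # slow span: keep everything
                # fast span: buffer is simply discarded
    return out
```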

> 
> > Either way, adding your own function graph hook may be a good exercise
> > in seeing how all this works.  
> 
> Thank you for your suggestions regarding the function graph hook! I will try
> it and see if there’s anything I can come up with!

Great! And note, even if you add extra tracepoints, you can hook
directly into them too. Again, see the trace_sched_wakeup.c for
examples.

-- Steve


* Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath
  2016-08-05 16:30           ` Steven Rostedt
@ 2016-08-07 10:36             ` Janani Ravichandran
  0 siblings, 0 replies; 19+ messages in thread
From: Janani Ravichandran @ 2016-08-07 10:36 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Janani Ravichandran, linux-mm, linux-kernel, riel, akpm, hannes,
	vdavydov, mhocko, vbabka, mgorman, kirill.shutemov, bywxiaobai


> On Aug 5, 2016, at 10:00 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> 
>> 
> 
> You probably want to clear the trace here, or set function_graph here
> first. Because the function graph starts writing to the buffer
> immediately.
> 

I did that, just didn’t include it here :)
>> 
> 
> When threshold is set, the entry is not recorded, because it is only
> showing the exit and the time it took in that function:
> 
> 0) kswapd0-52 | + 54.141 us | } /* shrink_zone */
> 
> shrink_zone() took 54.141us.
> 
> The reason it doesn't record the entry is because it would fill the
> entire buffer, if the threshold is never hit. One can't predict the
> time in a function when you first enter that function.

Right!
> 
>> 
> 
> You need your own interpreter here. Perhaps a module that either reads
> the tracepoints directly and registers a function graph tracer itself.
> The trace events and function tracers are plugable. You don't need to
> use the tracing system to use them. Just hook into them directly.
> 
> Things like the wakeup latency tracer does this. Look at
> kernel/trace/trace_sched_wakeup.c for an example. It hooks into the
> sched_wakeup and sched_switch tracepoints, and also has a way to use
> function and function_graph tracing.
> 
> 

I will look at that file. Thanks!
> 
>> 
> 
> Great! And note, even if you add extra tracepoints, you can hook
> directly into them too. Again, see the trace_sched_wakeup.c for
> examples.

Alright. Thanks for all the help! :)

Janani.
> 
> -- Steve


* Re: [PATCH 2/2] mm: compaction.c: Add/Modify direct compaction tracepoints
  2016-08-01 13:25   ` Vlastimil Babka
@ 2016-08-07 12:32     ` Janani Ravichandran
  0 siblings, 0 replies; 19+ messages in thread
From: Janani Ravichandran @ 2016-08-07 12:32 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Janani Ravichandran, linux-mm, linux-kernel, riel, akpm, hannes,
	vdavydov, mhocko, mgorman, kirill.shutemov, bywxiaobai, rostedt


> On Aug 1, 2016, at 6:55 PM, Vlastimil Babka <vbabka@suse.cz> wrote:
> 
> 
> Yea, this tracepoint has been odd in not printing node/zone in a friendly way (it's possible to determine it from zone_start/zone_end, though), so this is good in general. But instead of printing nid and zid like this, it would be nice to unify the output with the other tracepoints, e.g.:
> 
> DECLARE_EVENT_CLASS(mm_compaction_suitable_template,
> [...]
>        TP_printk("node=%d zone=%-8s order=%d ret=%s",
>                __entry->nid,
>                __print_symbolic(__entry->idx, ZONE_TYPE),

Sure, I’ll do that in v2. Thanks!

Also, I guess I should have mentioned that the tracepoint added
at the end of the compaction code wasn’t just for deriving latency information. 
rc and *contended would give us the result of the compaction attempted, 
which I thought would be useful.

I get that begin/end tracepoints aren’t required here, but how about having
trace_mm_compaction_try_to_compact_pages moved to the end to 
include compaction status?

Janani.
> 
> Thanks,
> Vlastimil


end of thread, other threads:[~2016-08-07 12:32 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-27 14:43 [PATCH 0/2] New tracepoints for slowpath and memory compaction Janani Ravichandran
2016-07-27 14:47 ` [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath Janani Ravichandran
2016-07-27 15:23   ` Steven Rostedt
2016-07-27 17:20     ` Dave Hansen
2016-07-27 17:30       ` Steven Rostedt
2016-07-28 20:11     ` Janani Ravichandran
2016-08-04 15:19       ` Steven Rostedt
2016-08-05 16:03         ` Janani Ravichandran
2016-08-05 16:30           ` Steven Rostedt
2016-08-07 10:36             ` Janani Ravichandran
2016-07-27 16:33   ` Michal Hocko
2016-07-27 18:16     ` Rik van Riel
2016-07-27 18:28       ` Steven Rostedt
2016-07-27 18:44       ` Michal Hocko
2016-07-27 18:57         ` Rik van Riel
2016-07-27 14:51 ` [PATCH 2/2] mm: compaction.c: Add/Modify direct compaction tracepoints Janani Ravichandran
2016-07-27 15:24   ` Steven Rostedt
2016-08-01 13:25   ` Vlastimil Babka
2016-08-07 12:32     ` Janani Ravichandran
