linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Matthias Schiffer <mschiffer@universe-factory.net>
To: rostedt@goodmis.org, mingo@redhat.com
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH] ftrace: fix stddev calculation in function profiler (again)
Date: Sun, 25 Mar 2018 19:09:39 +0200	[thread overview]
Message-ID: <41987dff-d05f-168f-2662-ddd69eed11dc@universe-factory.net> (raw)
In-Reply-To: <b43d2270-dd11-0d42-7ba8-b647063dd0c1@universe-factory.net>


[-- Attachment #1.1: Type: text/plain, Size: 5934 bytes --]

On 03/24/2018 11:03 PM, Matthias Schiffer wrote:
> On 03/24/2018 05:26 PM, Matthias Schiffer wrote:
>> It is well-known that it is not possible to accurately calculate variances
>> just by accumulating squared samples; in fact, such an approach can even
>> result in negative numbers. An earlier attempt to fix the calculation
>> referred to Welford's method, but did not implement it correctly, leading
>> to meeaningless output like the following:
>>
>>   nf_conntrack_proto_fini     50    373.523 us    7.470 us    3234315951 us
>>
>> Welford's method uses one do_div() in the tracing path; this cannot be
>> avoided. The average time is added to struct ftrace_profile, so only a
>> single division is required. I also considered the following alternatives:
>>
>> 1) Only keeping the avg field and removing the aggregated time would
>> greatly lead to severe rounding errors in calculating the total time based
>> on counter and avg.
>>
>> 2) Calculating both the old and the new average in profile_graph_return()
>> instead of storing it in struct ftrace_profile would require a second
>> division.
>>
>> 3) I managed to transform Welford's equations in a way that uses the total
>> time instead of the average and requires only a single division.
>> Unfortunately, the divisor is counter^3 in this case, easily overflowing
>> even 64bit integers.
>>
>> Ruling out the above alternatives, I chose the present approach to fix the
>> issue.
>>
>> Fixes: e330b3bcd831 ("tracing: Show sample std dev in function profiling")
>> Fixes: 52d85d763086 ("ftrace: Fix stddev calculation in function profiler")
>> Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
> 
> 
> Hmm, in further testing, I'm seeing profile_graph_return() being called
> without rec->counter getting incremented a lot, completely throwing off the
> stddev calculation (and also slightly displacing the average). Is this
> expected?
> 
> Should we possibly move the counter increment to profile_graph_return() in
> the CONFIG_FUNCTION_GRAPH_TRACER case, or introduce a second counter for
> returns?
> 
> Matthias

profile_graph_return() being called without rec->counter getting
incremented turned out to be another MIPS-specific bug, I'll send patch for
that, too.


I have two more questions/issues regarding the generic code:

1) There is a small window when enabling the profiler where rec->counter
gets incremented, but profile_graph_return() does not record times yet. For
functions that are called a lot, the effect is negligible, but for very
infrequent calls, the shown average time and variance are way off. Moving
the rec->counter increment to profile_graph_return() when
CONFIG_FUNCTION_GRAPH_TRACER is enabled would fix this. Does this sound
reasonable?

2) I noticed that my patch does not handle rec->counter overflows correctly
and will cause a zero division. I can think of two solutions for handling
overflows that keep the avg / stddev fields meaningful:
- Reset time/avg/stddev to 0 on overflows
- Stop updating the record at 0xffffffff, avoiding the overflow

Matthias


> 
> 
> 
>> ---
>>  kernel/trace/ftrace.c | 33 +++++++++++++++++++--------------
>>  1 file changed, 19 insertions(+), 14 deletions(-)
>>
>> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
>> index eac9ce2c57a2..16dce67b855a 100644
>> --- a/kernel/trace/ftrace.c
>> +++ b/kernel/trace/ftrace.c
>> @@ -460,7 +460,8 @@ struct ftrace_profile {
>>  	unsigned long			counter;
>>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>>  	unsigned long long		time;
>> -	unsigned long long		time_squared;
>> +	unsigned long long		avg;
>> +	unsigned long long		stddev;
>>  #endif
>>  };
>>  
>> @@ -580,7 +581,6 @@ static int function_stat_show(struct seq_file *m, void *v)
>>  	int ret = 0;
>>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>>  	static struct trace_seq s;
>> -	unsigned long long avg;
>>  	unsigned long long stddev;
>>  #endif
>>  	mutex_lock(&ftrace_profile_lock);
>> @@ -592,9 +592,7 @@ static int function_stat_show(struct seq_file *m, void *v)
>>  	}
>>  
>>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>> -	avg = rec->time;
>> -	do_div(avg, rec->counter);
>> -	if (tracing_thresh && (avg < tracing_thresh))
>> +	if (tracing_thresh && (rec->avg < tracing_thresh))
>>  		goto out;
>>  #endif
>>  
>> @@ -608,24 +606,19 @@ static int function_stat_show(struct seq_file *m, void *v)
>>  	if (rec->counter <= 1)
>>  		stddev = 0;
>>  	else {
>> -		/*
>> -		 * Apply Welford's method:
>> -		 * s^2 = 1 / (n * (n-1)) * (n * \Sum (x_i)^2 - (\Sum x_i)^2)
>> -		 */
>> -		stddev = rec->counter * rec->time_squared -
>> -			 rec->time * rec->time;
>> +		stddev = rec->stddev;
>>  
>>  		/*
>>  		 * Divide only 1000 for ns^2 -> us^2 conversion.
>>  		 * trace_print_graph_duration will divide 1000 again.
>>  		 */
>> -		do_div(stddev, rec->counter * (rec->counter - 1) * 1000);
>> +		do_div(stddev, 1000 * (rec->counter - 1));
>>  	}
>>  
>>  	trace_seq_init(&s);
>>  	trace_print_graph_duration(rec->time, &s);
>>  	trace_seq_puts(&s, "    ");
>> -	trace_print_graph_duration(avg, &s);
>> +	trace_print_graph_duration(rec->avg, &s);
>>  	trace_seq_puts(&s, "    ");
>>  	trace_print_graph_duration(stddev, &s);
>>  	trace_print_seq(m, &s);
>> @@ -905,8 +898,20 @@ static void profile_graph_return(struct ftrace_graph_ret *trace)
>>  
>>  	rec = ftrace_find_profiled_func(stat, trace->func);
>>  	if (rec) {
>> +		unsigned long long avg, delta1, delta2;
>> +
>>  		rec->time += calltime;
>> -		rec->time_squared += calltime * calltime;
>> +
>> +		/* Apply Welford's method */
>> +		delta1 = calltime - rec->avg;
>> +
>> +		avg = rec->time;
>> +		do_div(avg, rec->counter);
>> +		rec->avg = avg;
>> +
>> +		delta2 = calltime - rec->avg;
>> +
>> +		rec->stddev += delta1 * delta2;
>>  	}
>>  
>>   out:
>>
> 
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2018-03-25 17:09 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-24 16:26 [PATCH] ftrace: fix stddev calculation in function profiler (again) Matthias Schiffer
2018-03-24 22:03 ` Matthias Schiffer
2018-03-25 17:09   ` Matthias Schiffer [this message]
2018-03-26 16:59     ` Steven Rostedt
2018-03-26 16:51 ` Steven Rostedt
2018-03-27 14:51   ` Matthias Schiffer
2018-03-27 15:14     ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=41987dff-d05f-168f-2662-ddd69eed11dc@universe-factory.net \
    --to=mschiffer@universe-factory.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).