On 03/24/2018 11:03 PM, Matthias Schiffer wrote: > On 03/24/2018 05:26 PM, Matthias Schiffer wrote: >> It is well-known that it is not possible to accurately calculate variances >> just by accumulating squared samples; in fact, such an approach can even >> result in negative numbers. An earlier attempt to fix the calculation >> referred to Welford's method, but did not implement it correctly, leading >> to meeaningless output like the following: >> >> nf_conntrack_proto_fini 50 373.523 us 7.470 us 3234315951 us >> >> Welford's method uses one do_div() in the tracing path; this cannot be >> avoided. The average time is added to struct ftrace_profile, so only a >> single division is required. I also considered the following alternatives: >> >> 1) Only keeping the avg field and removing the aggregated time would >> greatly lead to severe rounding errors in calculating the total time based >> on counter and avg. >> >> 2) Calculating both the old and the new average in profile_graph_return() >> instead of storing it in struct ftrace_profile would require a second >> division. >> >> 3) I managed to transform Welford's equations in a way that uses the total >> time instead of the average and requires only a single division. >> Unfortunately, the divisor is counter^3 in this case, easily overflowing >> even 64bit integers. >> >> Ruling out the above alternatives, I chose the present approach to fix the >> issue. >> >> Fixes: e330b3bcd831 ("tracing: Show sample std dev in function profiling") >> Fixes: 52d85d763086 ("ftrace: Fix stddev calculation in function profiler") >> Signed-off-by: Matthias Schiffer > > > Hmm, in further testing, I'm seeing profile_graph_return() being called > without rec->counter getting incremented a lot, completely throwing off the > stddev calculation (and also slightly displacing the average). Is this > expected? > > Should we possibly move the counter increment to profile_graph_return() in > the CONFIG_FUNCTION_GRAPH_TRACER case, or introduce a second counter for > returns? > > Matthias profile_graph_return() being called without rec->counter getting incremented turned out to be another MIPS-specific bug, I'll send patch for that, too. I have two more questions/issues regarding the generic code: 1) There is a small window when enabling the profiler where rec->counter gets incremented, but profile_graph_return() does not record times yet. For functions that are called a lot, the effect is negligible, but for very infrequent calls, the shown average time and variance are way off. Moving the rec->counter increment to profile_graph_return() when CONFIG_FUNCTION_GRAPH_TRACER is enabled would fix this. Does this sound reasonable? 2) I noticed that my patch does not handle rec->counter overflows correctly and will cause a zero division. I can think of two solutions for handling overflows that keep the avg / stddev fields meaningful: - Reset time/avg/stddev to 0 on overflows - Stop updating the record at 0xffffffff, avoiding the overflow Matthias > > > >> --- >> kernel/trace/ftrace.c | 33 +++++++++++++++++++-------------- >> 1 file changed, 19 insertions(+), 14 deletions(-) >> >> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c >> index eac9ce2c57a2..16dce67b855a 100644 >> --- a/kernel/trace/ftrace.c >> +++ b/kernel/trace/ftrace.c >> @@ -460,7 +460,8 @@ struct ftrace_profile { >> unsigned long counter; >> #ifdef CONFIG_FUNCTION_GRAPH_TRACER >> unsigned long long time; >> - unsigned long long time_squared; >> + unsigned long long avg; >> + unsigned long long stddev; >> #endif >> }; >> >> @@ -580,7 +581,6 @@ static int function_stat_show(struct seq_file *m, void *v) >> int ret = 0; >> #ifdef CONFIG_FUNCTION_GRAPH_TRACER >> static struct trace_seq s; >> - unsigned long long avg; >> unsigned long long stddev; >> #endif >> mutex_lock(&ftrace_profile_lock); >> @@ -592,9 +592,7 @@ static int function_stat_show(struct seq_file *m, void *v) >> } >> >> #ifdef CONFIG_FUNCTION_GRAPH_TRACER >> - avg = rec->time; >> - do_div(avg, rec->counter); >> - if (tracing_thresh && (avg < tracing_thresh)) >> + if (tracing_thresh && (rec->avg < tracing_thresh)) >> goto out; >> #endif >> >> @@ -608,24 +606,19 @@ static int function_stat_show(struct seq_file *m, void *v) >> if (rec->counter <= 1) >> stddev = 0; >> else { >> - /* >> - * Apply Welford's method: >> - * s^2 = 1 / (n * (n-1)) * (n * \Sum (x_i)^2 - (\Sum x_i)^2) >> - */ >> - stddev = rec->counter * rec->time_squared - >> - rec->time * rec->time; >> + stddev = rec->stddev; >> >> /* >> * Divide only 1000 for ns^2 -> us^2 conversion. >> * trace_print_graph_duration will divide 1000 again. >> */ >> - do_div(stddev, rec->counter * (rec->counter - 1) * 1000); >> + do_div(stddev, 1000 * (rec->counter - 1)); >> } >> >> trace_seq_init(&s); >> trace_print_graph_duration(rec->time, &s); >> trace_seq_puts(&s, " "); >> - trace_print_graph_duration(avg, &s); >> + trace_print_graph_duration(rec->avg, &s); >> trace_seq_puts(&s, " "); >> trace_print_graph_duration(stddev, &s); >> trace_print_seq(m, &s); >> @@ -905,8 +898,20 @@ static void profile_graph_return(struct ftrace_graph_ret *trace) >> >> rec = ftrace_find_profiled_func(stat, trace->func); >> if (rec) { >> + unsigned long long avg, delta1, delta2; >> + >> rec->time += calltime; >> - rec->time_squared += calltime * calltime; >> + >> + /* Apply Welford's method */ >> + delta1 = calltime - rec->avg; >> + >> + avg = rec->time; >> + do_div(avg, rec->counter); >> + rec->avg = avg; >> + >> + delta2 = calltime - rec->avg; >> + >> + rec->stddev += delta1 * delta2; >> } >> >> out: >> > >