[RFC PATCH 0/2] Tracing bursts of latencies

From: Viktor Rosendahl @ 2021-01-19 16:43 UTC
  To: Steven Rostedt, Joel Fernandes, linux-kernel
  Cc: Ingo Molnar, Viktor Rosendahl

Hello all,

This series contains two things:

1. A fix for a bug in the ftrace latency tracers that was introduced in
   Linux 5.7.

2. The latency-collector, a tool that is designed to work around the
   limitations in the ftrace latency tracers. It needs the bug fix in order to
   work properly.

I have sent a patch series with the latency-collector before.

I never got any comments on it and I stopped pushing it because I thought that
BPF tracing would be the wave of the future and that it would solve the problem
in a cleaner and more elegant way.

Recently, I tried out the criticalstat script from bcc tools but it did not
fulfill all of my hopes and dreams.

On the bright side, it was able to capture all latencies in a burst. The main
problems that I encountered were:

1. The system became unstable and froze now and then. The man page of
   criticalstat has a mention of it being unstable, so I assume that this is a
   known problem.

2. Sometimes the stack traces were incorrect, though not in an obvious way.
   Once this had happened, all subsequent stack traces were bad as well.

3. If two instances were run simultaneously (to capture both preemptoff and
   irq off), there seemed to be quite a large performance hit, but I did not
   measure it exactly.

4. The filesystem footprint seemed large; libbcc in particular seemed quite big
   for a small embedded system.

For these reasons, I am taking the liberty of resending the latency-collector.

I would hope to get some comments on it, or a suggestion of an alternative
approach to the problem of capturing latencies that systematically occur close
to each other.

Admittedly, from a developer's perspective this may be somewhat of a niche
problem, since removing one latency will reveal the next. However, when one is
doing validation with a fleet of devices in a long and expensive test campaign,
it is quite desirable not to lose any latencies.

best regards,

Viktor

Viktor Rosendahl (2):
  Use pause-on-trace with the latency tracers
  Add the latency-collector to tools

 kernel/trace/trace_irqsoff.c      |    4 +
 tools/Makefile                    |   14 +-
 tools/tracing/Makefile            |   20 +
 tools/tracing/latency-collector.c | 1212 +++++++++++++++++++++++++++++
 4 files changed, 1244 insertions(+), 6 deletions(-)
 create mode 100644 tools/tracing/Makefile
 create mode 100644 tools/tracing/latency-collector.c

-- 
2.25.1
