* [Ksummit-discuss] [TECH TOPIC] Kernel tracing and end-to-end performance breakdown
@ 2016-07-20  8:30 Wangnan (F)
  2016-07-21  3:41 ` Christoph Lameter
  2016-07-21 10:00 ` Jan Kara
  0 siblings, 2 replies; 9+ messages in thread
From: Wangnan (F) @ 2016-07-20  8:30 UTC (permalink / raw)
  To: ksummit-discuss
  Cc: Peter Zijlstra, Alexei Starovoitov, Arnaldo Carvalho de Melo,
	Ingo Molnar

Hello,

I'd like to discuss kernel performance and tracing.

Sometimes people ask us to make their workloads run faster. They show us
results from benchmarks (iobench), monitors (top, sar) and profilers (perf,
oprofile), and some of them give a brief introduction to their software.
Based on this information, they hope we can give magical advice like:

  Echo 1 to /proc/sys/kernel/xxx and throughput will rise 10 times higher.
  Bind thread XX to core X and latency will drop from X s to X ns.
  ...

This is unrealistic, but we don't need to go that far. Showing where the
bottleneck of a piece of software lies, to point out the right direction, is
enough to make people happy. However, even with the full kernel source code
at hand, finding the bottleneck in an unfamiliar subsystem is still
challenging, because we don't know where to start.

There are two types of performance metrics: throughput and latency. Both of
them relate to the concept of a 'process': the time between event 'A' and
event 'B'. Throughput measures how many processes complete in a fixed time;
latency measures how long one process takes. Given a performance result, a
natural idea is to find the two ends 'A' and 'B' of the process it concerns
and break down the time from 'A' to 'B' to find the critical phase. We call
this 'end-to-end performance breakdown'.
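
As a concrete (if simplified) illustration of measuring one such 'A'->'B'
process, here is a minimal bcc/BPF sketch that timestamps entry and return
of vfs_write() and builds a latency histogram. The probe point and the
nanosecond units are assumptions for the example, not a proposal for where
the real breakdown boundaries should sit:

from bcc import BPF
import time

prog = """
#include <uapi/linux/ptrace.h>

BPF_HASH(start, u32, u64);
BPF_HISTOGRAM(dist);

int trace_entry(struct pt_regs *ctx) {      /* event 'A' */
    u32 tid = bpf_get_current_pid_tgid();
    u64 ts = bpf_ktime_get_ns();
    start.update(&tid, &ts);
    return 0;
}

int trace_return(struct pt_regs *ctx) {     /* event 'B' */
    u32 tid = bpf_get_current_pid_tgid();
    u64 *tsp = start.lookup(&tid);
    if (tsp == 0)
        return 0;
    dist.increment(bpf_log2l(bpf_ktime_get_ns() - *tsp));
    start.delete(&tid);
    return 0;
}
"""

b = BPF(text=prog)
b.attach_kprobe(event="vfs_write", fn_name="trace_entry")
b.attach_kretprobe(event="vfs_write", fn_name="trace_return")
print("Tracing vfs_write() latency... hit Ctrl-C to end.")
try:
    time.sleep(99999999)
except KeyboardInterrupt:
    pass
b["dist"].print_log2_hist("ns")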

A lot of facilities already exist in the kernel to support end-to-end
performance breakdown. For example, u/kprobes allow us to trace events 'A'
and 'B', many tracepoints have already been deployed across subsystems, BPF
allows us to connect events belonging to a specific request, and we have
perf to drive all of them. We even have subsystem-specific tools like
blktrace for this. However, I find it still hard to do the breakdown from
the user's point of view. For example, consider writing a file: we want to
break down the performance from the 'write' system call down to the device.
Taking a closer look, we see vfs, filesystem, driver and device layers; each
layer has its own queues and buffers, large requests are split and small
requests are merged, and in the end we find it is hard even to define a
proper 'process'.

Compare this with the CPU side: Intel has released its TopDown model, which
allows us to break instruction execution into 4 stages, and to further break
each stage into smaller ones. I have also heard from HiSilicon that ARM64
processors have a similar model. The TopDown model is simple: monitor a few
PMU counters and do some simple computation. Why can't we do this in
software?
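
For reference, the level-1 TopDown breakdown is just arithmetic over a
handful of counters. A sketch of the computation follows; the formulas are
my reading of Yasin's paper and the event names are the Intel events I
assume it uses, so treat the details as approximate:

def topdown_level1(cycles, fetch_bubbles, uops_issued, uops_retired,
                   recovery_cycles, slots_per_cycle=4):
    """Approximate TopDown level-1 breakdown (Yasin, ISPASS 2014).

    Assumed counter mapping (Intel):
      cycles          - CPU_CLK_UNHALTED.THREAD
      fetch_bubbles   - IDQ_UOPS_NOT_DELIVERED.CORE
      uops_issued     - UOPS_ISSUED.ANY
      uops_retired    - UOPS_RETIRED.RETIRE_SLOTS
      recovery_cycles - INT_MISC.RECOVERY_CYCLES
    """
    slots = slots_per_cycle * cycles
    frontend_bound = fetch_bubbles / slots
    bad_speculation = (uops_issued - uops_retired
                       + slots_per_cycle * recovery_cycles) / slots
    retiring = uops_retired / slots
    backend_bound = 1.0 - frontend_bound - bad_speculation - retiring
    return {"frontend bound": frontend_bound,
            "bad speculation": bad_speculation,
            "retiring": retiring,
            "backend bound": backend_bound}

# counter values would come from e.g. 'perf stat -e ...' over the workload
print(topdown_level1(1.0e9, 8.0e8, 1.2e9, 9.0e8, 5.0e6))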

The problem is the lack of a proper performance model. In my view, it is
the Linux kernel's responsibility to guide us through the breakdown.
Subsystem designers should expose the principal processes that connect
tracepoints together. The kernel should link the models from different
subsystems. The models should be expressed in a uniform language, so that a
tool like perf can do the right thing automatically.
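
To make the idea concrete, such a model could be as small as a declarative
mapping from phases of a request to the tracepoints that bound them. The
following is a purely hypothetical illustration I made up for a buffered
write; nothing like it exists in the kernel today, and the phase boundaries
chosen here are debatable:

# Hypothetical model description -- invented for illustration only.
buffered_write_model = {
    "process": "buffered write: write(2) until data reaches the device",
    "phases": [
        {"name": "vfs / page cache",
         "begin": "syscalls:sys_enter_write",
         "end":   "syscalls:sys_exit_write"},
        {"name": "writeback",
         "begin": "writeback:writeback_start",
         "end":   "writeback:writeback_written"},
        {"name": "journal commit (ext4/jbd2)",
         "begin": "jbd2:jbd2_start_commit",
         "end":   "jbd2:jbd2_end_commit"},
        {"name": "block layer / device",
         "begin": "block:block_rq_issue",
         "end":   "block:block_rq_complete"},
    ],
}

A tool like perf could, in principle, read such a description, attach to the
listed tracepoints and report the time spent in each phase, which is roughly
what "doing the right thing automatically" would mean here.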

I suggest discussing the following topics at this year's kernel summit:

  1. Does end-to-end performance breakdown really matter?

  2. Should we design a framework that helps kernel developers express and
     expose performance models, so that people can do end-to-end performance
     breakdown?

  3. What external tools do we need for end-to-end performance breakdown?

The list of potential attendees

Alexei Starovoitov
Arnaldo Carvalho de Melo
Ingo Molnar
Li Zefan
Peter Zijlstra
Steven Rostedt

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] Kernel tracing and end-to-end performance breakdown
  2016-07-20  8:30 [Ksummit-discuss] [TECH TOPIC] Kernel tracing and end-to-end performance breakdown Wangnan (F)
@ 2016-07-21  3:41 ` Christoph Lameter
  2016-07-21 10:00 ` Jan Kara
  1 sibling, 0 replies; 9+ messages in thread
From: Christoph Lameter @ 2016-07-21  3:41 UTC (permalink / raw)
  To: Wangnan (F)
  Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Alexei Starovoitov,
	ksummit-discuss, Ingo Molnar

On Wed, 20 Jul 2016, Wangnan (F) wrote:

> I suggest to discuss following topics in this year's kernel summit:
>
>  1. Is end-to-end performance breakdown really matter?
>
>  2. Should we design a framework to help kernel developers to express and
> expose
>     performance model to help people do the end-to-end performance breakdown?
>
>  3. What external tools we need to do the end-to-end performance breakdown?
>

Hmmm... The basic problem in my industry often ends up being accounting for
the latency of software and hardware, as well as event correlation between
applications, the network and multiple hosts. In those scenarios time
measurement itself becomes difficult because the host time is slightly
offset from the GPS clock on the network. Even within one host there may be
multiple clocks drifting relative to one another (e.g. the clocks on the
NICs used for timestamping). In a very low latency environment the
inaccuracy of the clocks, the latency of the network links and the various
forms of processing impact accuracy so much that it is often difficult even
to come up with measurements that one is willing to trust.

So a key foundation for tracing and relating events is the clocks, and
knowledge of how much variance (how much inaccuracy) a given clock
contributed.

I think in an ideal world, where we could have accurate timestamps, the
performance breakdown for us would simply be an instrumentation of the
hardware and software with timestamps. The data can then be correlated
later. This works well if one either looks at larger time scales (like
milliseconds) or uses only a single time source.

If we go down to microseconds this becomes difficult. For nanosecond-grained
analysis one is basically always at odds with the reliability of the clock
sources, unless one can just use a single one (like the TSC).
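
Even two clock sources on the same host illustrate the problem. A tiny
sketch (Python 3), just to show the effect: it compares CLOCK_REALTIME,
which NTP keeps adjusting, against CLOCK_MONOTONIC_RAW, which it does not;
the 10-second window is arbitrary:

import time

def offset():
    # difference between two local clock sources at (roughly) one instant
    return (time.clock_gettime(time.CLOCK_REALTIME)
            - time.clock_gettime(time.CLOCK_MONOTONIC_RAW))

first = offset()
time.sleep(10)
drift = offset() - first
print("apparent drift between the two clocks over 10s: %.3f us"
      % (drift * 1e6))

Once NICs and other hosts are involved, on top of this come offsets that
have to be estimated rather than simply read.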

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] Kernel tracing and end-to-end performance breakdown
  2016-07-20  8:30 [Ksummit-discuss] [TECH TOPIC] Kernel tracing and end-to-end performance breakdown Wangnan (F)
  2016-07-21  3:41 ` Christoph Lameter
@ 2016-07-21 10:00 ` Jan Kara
  2016-07-21 13:54   ` Chris Mason
  2016-07-22  3:35   ` Wangnan (F)
  1 sibling, 2 replies; 9+ messages in thread
From: Jan Kara @ 2016-07-21 10:00 UTC (permalink / raw)
  To: Wangnan (F)
  Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Alexei Starovoitov,
	ksummit-discuss, Ingo Molnar

Hello,

On Wed 20-07-16 16:30:49, Wangnan (F) wrote:
> This is unrealistic, but we don't need to be extreme. Showing the
> bottleneck of a software to point out the right direction is enough to
> make people happy.  However, even if we have the full kernel source code,
> finding bottleneck from an unfamiliarity subsystem is still challenging
> since we don't know how to start.

Well, you'll always need quite some knowledge to be able to meaningfully
analyze and fix performance issues. Otherwise it is just stabbing in the
dark. But I do agree that in some cases trying to find out where the time is
actually spent requires fairly tedious analysis. There are nice tools like
Brendan Gregg's flame graphs, or even off-cpu flame graphs, which help quite
a bit, but connecting the dots isn't always easy.

> There are two type of performance metrics: throughput and latency. Both
> of them related to the concept of 'process': time between event 'A' and
> event 'B'.  Throughput measures how many processes complete in fixed
> time, latency measures how long a process take. Given a performance
> result, a nature idea is to find the two ends 'A' and 'B' of the process
> it concerns, and break down the time from 'A->B' to find the critical
> phase. We call it 'end-to-end performance breakdown'.
> 
> A lot of facilities have already in kernel to support end-to-end
> performance breakdown. For example, u/kprobes allows us to trace event
> 'A' and 'B', there are many tracepoitns have already been deployed among
> many subsystems, BPF allows us to connect events belong to a specific
> request, and we have perf to drive all of them. We even have subsystem
> specific tools like blktrace for it.  However, I find it still hard to do
> the breakdown from user's view. For example, consider a file writing
> process, we want to break down the performance from 'write' system call
> to the device. Getting a closer look, we can see vfs, filesystem, driver
> and device layers, each layers has queues and buffers, they break larger
> requests and merge small requests, finally we find it is even hard to
> define a proper 'process'.
> 
> Compare with CPU side, Intel has release its TopDown model, allows us to
> break instruction execution into 4 stages, and further break each stage
> to smaller stages. I also heard from hisilicon that in ARM64 processor we
> have similar model. TopDown model is simple: monitoring at some PMU and
> doing simple computation. Why can't we do this in software?
> 
> The problem is the lacking of a proper performance model. In my point of
> view, it is linux kernel's responsibility to guide us to do the
> breakdown.  Subsystem designers should expose the principle processes to
> connect tracepoints together.  Kernel should link models from different
> subsystems. Model should be expressed in a uniformed language, so a tool
> like perf can do the right thing automatically.

So I'm not sure I understand what you mean. Let's take your write(2)
example - if you'd just like to get a breakdown of where we spend time
during the syscall (including various sleeps), then off-cpu flame graphs
[1] already provide quite a reasonable overview. If you are really after
more targeted analysis (e.g. one in a million writes has too large a
latency), then you need something different. Do I understand correctly that
you'd like to have some way to associate trace events with some "object"
(be it an IO, a syscall, or whatever) so that you can more easily perform
targeted analysis for cases like this?

> I suggest to discuss following topics in this year's kernel summit:
> 
>  1. Is end-to-end performance breakdown really matter?
> 
>  2. Should we design a framework to help kernel developers to express and
>  expose performance model to help people do the end-to-end performance
>  breakdown?
> 
>  3. What external tools we need to do the end-to-end performance breakdown?

So I think improvements in performance analysis are always welcome, but the
current proposal seems somewhat handwavy, so I'm not sure what outcome you'd
like to get from the discussion... If you have a more concrete proposal for
how you'd like to achieve what you need, then it may be worth discussing.

As a side note, I know that Google (and maybe Facebook, I'm not sure) has
out-of-tree patches which provide really neat performance analysis
capabilities. I have heard they are not really upstreamable because they are
horrible hacks, but maybe they can be a good inspiration for this work. If
we could get someone from these companies to explain what capabilities they
have and how they achieve them (regardless of how hacky the implementation
may be), that could be an interesting topic.

								Honza

[1] http://www.brendangregg.com/blog/2016-01-20/ebpf-offcpu-flame-graph.html
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] Kernel tracing and end-to-end performance breakdown
  2016-07-21 10:00 ` Jan Kara
@ 2016-07-21 13:54   ` Chris Mason
  2016-07-21 15:45     ` Jan Kara
  2016-07-22  3:35   ` Wangnan (F)
  1 sibling, 1 reply; 9+ messages in thread
From: Chris Mason @ 2016-07-21 13:54 UTC (permalink / raw)
  To: ksummit-discuss

On 07/21/2016 06:00 AM, Jan Kara wrote:
>
> So I think improvements in performance analysis are always welcome but
> current proposal seems to be somewhat handwavy so I'm not sure what outcome
> you'd like to get from the discussion... If you have a more concrete
> proposal how you'd like to achieve what you need, then it may be worth
> discussion.
>
> As a side note I know that Google (and maybe Facebook, not sure here) have
> out-of-tree patches which provide really neat performance analysis
> capabilities. I have heard they are not really upstreamable because they
> are horrible hacks but maybe they can be a good inspiration for this work.
> If we could get someone from these companies to explain what capabilities
> they have and how they achieve this (regardless how hacky the
> implementation may be), that may be an interesting topic.

At least for Facebook, we're moving most things to BPF.  The most
interesting part of our analysis isn't so much the tool used to record it;
it's being able to aggregate over the fleet and make comparisons at scale.

For example, Josef set up the off-cpu flame graphs such that we can record
stack traces for any latency higher than N, and then sum up the most
expensive stack traces over a large number of machines.  It makes it much
easier to find those happens-once-a-day problems.
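
For reference, this kind of thresholded off-cpu collection is roughly what
bcc's offcputime tool does. Below is a stripped-down sketch of the idea, not
Josef's actual setup; the 100ms threshold, map sizes and the
kernel-stack-only reporting are assumptions of the example:

from bcc import BPF
import time

MIN_BLOCK_NS = 100 * 1000 * 1000    # only keep sleeps longer than 100ms

prog = """
#include <uapi/linux/ptrace.h>
#include <linux/sched.h>

#define MIN_BLOCK_NS %d

struct key_t {
    u32 pid;
    int stack_id;
};
BPF_HASH(start, u32, u64);
BPF_HASH(counts, struct key_t, u64);
BPF_STACK_TRACE(stack_traces, 16384);

int oncpu(struct pt_regs *ctx, struct task_struct *prev) {
    /* task being switched out: remember when it went off-CPU */
    u32 pid = prev->pid;
    u64 ts = bpf_ktime_get_ns();
    start.update(&pid, &ts);

    /* task coming back on-CPU: how long was it blocked? */
    pid = bpf_get_current_pid_tgid();
    u64 *tsp = start.lookup(&pid);
    if (tsp == 0)
        return 0;
    u64 delta = bpf_ktime_get_ns() - *tsp;
    start.delete(&pid);
    if (delta < MIN_BLOCK_NS)
        return 0;

    struct key_t key = {};
    key.pid = pid;
    key.stack_id = stack_traces.get_stackid(ctx, 0);   /* kernel stack */
    u64 zero = 0, *total = counts.lookup_or_init(&key, &zero);
    (*total) += delta;
    return 0;
}
""" % MIN_BLOCK_NS

b = BPF(text=prog)
b.attach_kprobe(event="finish_task_switch", fn_name="oncpu")
time.sleep(30)                       # collect for 30 seconds

stacks = b["stack_traces"]
for key, total in sorted(b["counts"].items(), key=lambda kv: kv[1].value):
    print("pid %d blocked for %.1f ms in:" % (key.pid, total.value / 1e6))
    if key.stack_id >= 0:
        for addr in stacks.walk(key.stack_id):
            print("    %s" % b.ksym(addr))   # symbolize kernel addresses

The per-host output (stack, total blocked time) is then what gets shipped
off and summed across machines.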

-chris

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] Kernel tracing and end-to-end performance breakdown
  2016-07-21 13:54   ` Chris Mason
@ 2016-07-21 15:45     ` Jan Kara
  2016-07-21 16:03       ` Chris Mason
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Kara @ 2016-07-21 15:45 UTC (permalink / raw)
  To: Chris Mason; +Cc: ksummit-discuss

On Thu 21-07-16 09:54:53, Chris Mason wrote:
> On 07/21/2016 06:00 AM, Jan Kara wrote:
> >
> >So I think improvements in performance analysis are always welcome but
> >current proposal seems to be somewhat handwavy so I'm not sure what outcome
> >you'd like to get from the discussion... If you have a more concrete
> >proposal how you'd like to achieve what you need, then it may be worth
> >discussion.
> >
> >As a side note I know that Google (and maybe Facebook, not sure here) have
> >out-of-tree patches which provide really neat performance analysis
> >capabilities. I have heard they are not really upstreamable because they
> >are horrible hacks but maybe they can be a good inspiration for this work.
> >If we could get someone from these companies to explain what capabilities
> >they have and how they achieve this (regardless how hacky the
> >implementation may be), that may be an interesting topic.
> 
> At least for facebook, we're moving most things to bpf.  The most
> interesting part of our analysis isn't so much from the tool used to record
> it, it's from being able to aggregate over the fleet and making comparisons
> at scale.
> 
> For example, Josef setup the off-cpu flame graphs such that we can record
> stack traces for a latency higher than N, and then sum up the most expensive
> stack traces over a large number of machines.  It makes it much easier to
> find those happens-once-a-day problems.

By latency higher than N, do you mean that e.g. a syscall took more than N,
or just that a process is sleeping for more than N in some place?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] Kernel tracing and end-to-end performance breakdown
  2016-07-21 15:45     ` Jan Kara
@ 2016-07-21 16:03       ` Chris Mason
  0 siblings, 0 replies; 9+ messages in thread
From: Chris Mason @ 2016-07-21 16:03 UTC (permalink / raw)
  To: Jan Kara; +Cc: ksummit-discuss



On 07/21/2016 11:45 AM, Jan Kara wrote:
> On Thu 21-07-16 09:54:53, Chris Mason wrote:
>> On 07/21/2016 06:00 AM, Jan Kara wrote:
>>>
>>> So I think improvements in performance analysis are always welcome but
>>> current proposal seems to be somewhat handwavy so I'm not sure what outcome
>>> you'd like to get from the discussion... If you have a more concrete
>>> proposal how you'd like to achieve what you need, then it may be worth
>>> discussion.
>>>
>>> As a side note I know that Google (and maybe Facebook, not sure here) have
>>> out-of-tree patches which provide really neat performance analysis
>>> capabilities. I have heard they are not really upstreamable because they
>>> are horrible hacks but maybe they can be a good inspiration for this work.
>>> If we could get someone from these companies to explain what capabilities
>>> they have and how they achieve this (regardless how hacky the
>>> implementation may be), that may be an interesting topic.
>>
>> At least for facebook, we're moving most things to bpf.  The most
>> interesting part of our analysis isn't so much from the tool used to record
>> it, it's from being able to aggregate over the fleet and making comparisons
>> at scale.
>>
>> For example, Josef setup the off-cpu flame graphs such that we can record
>> stack traces for a latency higher than N, and then sum up the most expensive
>> stack traces over a large number of machines.  It makes it much easier to
>> find those happens-once-a-day problems.
>
> By latency higher than N, do you mean that e.g. a syscall took more than N,
> or just that a process is sleeping for more than N in some place?

Single sleep longer than N.  It would be a little more involved to track 
all the sleeps in a single syscall, but we haven't needed to (yet).

-chris

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] Kernel tracing and end-to-end performance breakdown
  2016-07-21 10:00 ` Jan Kara
  2016-07-21 13:54   ` Chris Mason
@ 2016-07-22  3:35   ` Wangnan (F)
  2016-07-23 17:59     ` Alexei Starovoitov
  1 sibling, 1 reply; 9+ messages in thread
From: Wangnan (F) @ 2016-07-22  3:35 UTC (permalink / raw)
  To: Jan Kara
  Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Alexei Starovoitov,
	ksummit-discuss, Ingo Molnar



On 2016/7/21 18:00, Jan Kara wrote:
> Hello,
>
> On Wed 20-07-16 16:30:49, Wangnan (F) wrote:
>
[SNIP]
>>
>> The problem is the lacking of a proper performance model. In my point of
>> view, it is linux kernel's responsibility to guide us to do the
>> breakdown.  Subsystem designers should expose the principle processes to
>> connect tracepoints together.  Kernel should link models from different
>> subsystems. Model should be expressed in a uniformed language, so a tool
>> like perf can do the right thing automatically.
> So I'm not sure I understand what do you mean. Let's take you write(2)
> example - if you'd like to just get a break out where do we spend time
> during the syscall (including various sleeps), then off-cpu flame graphs
> [1] already provide quite a reasonable overview. If you really look for
> more targetted analysis (e.g. one in a million write has too large
> latency), then you need something different. Do I understand right that
> you'd like to have some way to associate trace events with some "object"
> (being it IO, syscall, or whatever) so that you can more easily perform
> targetted analysis for cases like this?

Yes.

Both on-cpu and off-cpu flame graphs provide a kernel-side view, but
people want to know something like "how long does it take for a piece
of memory to be written to disk, and where is the bottleneck?". To answer
this question, I have to explain the model of file writing, including
vfs, page cache, filesystem and device driver, but most of the time they
still can't understand why it is hard to answer such a simple question.

I think the kernel lacks a tool like top-down [1][2]. In the top-down
method, the CPU people provide a model that breaks down instruction
execution time, and provide formulas to do the computation from PMU
counters. Although the real CPU microarchitecture is complex (similar to
the kernel, asynchrony is common) and the top-down result is statistical,
the result still points in the right direction for tuning.

I suggest the kernel find a way to tell the user how to break down a
process and where to trace. For example, tell the user that the performance
of writing can be decoupled into cache, filesystem, blockio and device, that
filesystem performance can be further broken down into metadata writing,
journal flushing and XYZ, and then which tracepoints can be used to do the
performance breakdown.

There are two types of performance breakdown:

  1. breaking down a specific process, for example the one write in a
     million that has too large a latency;
  2. a general performance breakdown, like what topdown does.

Thank you.

[1] http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6844459
[2] https://moodle.technion.ac.il/pluginfile.php/560599/mod_resource/content/1/Vtune%20%20-%20Top%20Down%20Performance%20Analysis%20%2C%20by%20Ahmad%20Yasin.pdf

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] Kernel tracing and end-to-end performance breakdown
  2016-07-22  3:35   ` Wangnan (F)
@ 2016-07-23 17:59     ` Alexei Starovoitov
  2016-07-23 18:15       ` [Ksummit-discuss] Fwd: " Alexei Starovoitov
  0 siblings, 1 reply; 9+ messages in thread
From: Alexei Starovoitov @ 2016-07-23 17:59 UTC (permalink / raw)
  To: Wangnan (F)
  Cc: ksummit-discuss, Peter Zijlstra, Alexei Starovoitov,
	Arnaldo Carvalho de Melo, Ingo Molnar

On Fri, Jul 22, 2016 at 11:35:58AM +0800, Wangnan (F) wrote:
> 
> 
> On 2016/7/21 18:00, Jan Kara wrote:
> >Hello,
> >
> >On Wed 20-07-16 16:30:49, Wangnan (F) wrote:
> >
> [SNIP]
> >>
> >>The problem is the lacking of a proper performance model. In my point of
> >>view, it is linux kernel's responsibility to guide us to do the
> >>breakdown.  Subsystem designers should expose the principle processes to
> >>connect tracepoints together.  Kernel should link models from different
> >>subsystems. Model should be expressed in a uniformed language, so a tool
> >>like perf can do the right thing automatically.
> >So I'm not sure I understand what do you mean. Let's take you write(2)
> >example - if you'd like to just get a break out where do we spend time
> >during the syscall (including various sleeps), then off-cpu flame graphs
> >[1] already provide quite a reasonable overview. If you really look for
> >more targetted analysis (e.g. one in a million write has too large
> >latency), then you need something different. Do I understand right that
> >you'd like to have some way to associate trace events with some "object"
> >(being it IO, syscall, or whatever) so that you can more easily perform
> >targetted analysis for cases like this?
> 
> Yes.
> 
> Both cpu and off-cpu flame graphs provide kernel side view, but
> people want to know something like "how long it takes for a piece
> of memory be written to disk and where is the bottleneck". To answer
> this question, I have to explain the model of file writting, including
> vfs, page cache, file system and device driver, but most of time they
> still can't understand why it is hard to answer such a simple question.
> 
> I think kernel lacks a tool like top-down [1][2]. In top-down method,
> CPU guys provide a model to break down time for instruction execution,
> and provide formula to do the computation from PMU counters. Although
> the real CPU microarchitecture is complex (similar to kernel,
> asynchronization is common) and top-down result is statistical, result
> from top-down shows the right direction for tuning.
> 
> I suggest kernel find a way to tell user how to break down a process
> and where to trace. For example, tell user the performance of writting
> can be decoupled into cache, filesystem, blockio and device, filesystem
> performance cabe further breaks down into metadata writing, jounal
> flushing and XYZ, then which tracepoints can be used to do the
> performance breakdown.
> 
> There are two types of performance breakdown:
> 
>  1. breaks a specific process. For example, one in a million write has too
> large latency
>  2. generical performance break down, like what topdown does.

If I understand the proposal correctly, it really meant to say 'request'
(instead of 'process'): something issued by a user space process into the
kernel at time A and received back in user space at time B. And the goal
is to trace this 'request' from beginning to end.
The proposal calls it 'end-to-end', but that term usually refers to a
networking principle, whereas here it is the single-host latency of
the request.
I'm not sure how such tracing of something like the write syscall
would be done. It's certainly an interesting discussion, but wouldn't
it be more appropriate for the tracing microconf at Plumbers [1]?
There is also a danger in modeling such request tracing on
Google's Dapper tracing or Intel's 'top-down':
Dapper is designed for tracing RPC calls in a large distributed
system, whereas the Intel work is imo a generalization of a
complex CPU pipeline into front-end/back-end and sub-blocks.

[1] http://www.linuxplumbersconf.org/2016/ocw/events/LPC2016/tracks/573

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Ksummit-discuss] Fwd: [TECH TOPIC] Kernel tracing and end-to-end performance breakdown
  2016-07-23 17:59     ` Alexei Starovoitov
@ 2016-07-23 18:15       ` Alexei Starovoitov
  0 siblings, 0 replies; 9+ messages in thread
From: Alexei Starovoitov @ 2016-07-23 18:15 UTC (permalink / raw)
  To: ksummit-discuss

On Fri, Jul 22, 2016 at 11:35:58AM +0800, Wangnan (F) wrote:
>
>
> On 2016/7/21 18:00, Jan Kara wrote:
> >Hello,
> >
> >On Wed 20-07-16 16:30:49, Wangnan (F) wrote:
> >
> [SNIP]
> >>
> >>The problem is the lacking of a proper performance model. In my point of
> >>view, it is linux kernel's responsibility to guide us to do the
> >>breakdown.  Subsystem designers should expose the principle processes to
> >>connect tracepoints together.  Kernel should link models from different
> >>subsystems. Model should be expressed in a uniformed language, so a tool
> >>like perf can do the right thing automatically.
> >So I'm not sure I understand what do you mean. Let's take you write(2)
> >example - if you'd like to just get a break out where do we spend time
> >during the syscall (including various sleeps), then off-cpu flame graphs
> >[1] already provide quite a reasonable overview. If you really look for
> >more targetted analysis (e.g. one in a million write has too large
> >latency), then you need something different. Do I understand right that
> >you'd like to have some way to associate trace events with some "object"
> >(being it IO, syscall, or whatever) so that you can more easily perform
> >targetted analysis for cases like this?
>
> Yes.
>
> Both cpu and off-cpu flame graphs provide kernel side view, but
> people want to know something like "how long it takes for a piece
> of memory be written to disk and where is the bottleneck". To answer
> this question, I have to explain the model of file writting, including
> vfs, page cache, file system and device driver, but most of time they
> still can't understand why it is hard to answer such a simple question.
>
> I think kernel lacks a tool like top-down [1][2]. In top-down method,
> CPU guys provide a model to break down time for instruction execution,
> and provide formula to do the computation from PMU counters. Although
> the real CPU microarchitecture is complex (similar to kernel,
> asynchronization is common) and top-down result is statistical, result
> from top-down shows the right direction for tuning.
>
> I suggest kernel find a way to tell user how to break down a process
> and where to trace. For example, tell user the performance of writting
> can be decoupled into cache, filesystem, blockio and device, filesystem
> performance cabe further breaks down into metadata writing, jounal
> flushing and XYZ, then which tracepoints can be used to do the
> performance breakdown.
>
> There are two types of performance breakdown:
>
>  1. breaks a specific process. For example, one in a million write has too
> large latency
>  2. generical performance break down, like what topdown does.

If I understand the proposal correctly, it really meant to say 'request'
(instead of 'process'): something issued by a user space process into the
kernel at time A and received back in user space at time B. And the goal
is to trace this 'request' from beginning to end.
The proposal calls it 'end-to-end', but that term usually refers to a
networking principle, whereas here it is the single-host latency of
the request.
I'm not sure how such tracing of something like the write syscall
would be done. It's certainly an interesting discussion, but wouldn't
it be more appropriate for the tracing microconf at Plumbers [1]?
There is also a danger in modeling such request tracing on
Google's Dapper tracing or Intel's 'top-down':
Dapper is designed for tracing RPC calls in a large distributed
system, whereas the Intel work is imo a generalization of a
complex CPU pipeline into front-end/back-end and sub-blocks.

[1] http://www.linuxplumbersconf.org/2016/ocw/events/LPC2016/tracks/573

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-07-23 18:16 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-20  8:30 [Ksummit-discuss] [TECH TOPIC] Kernel tracing and end-to-end performance breakdown Wangnan (F)
2016-07-21  3:41 ` Christoph Lameter
2016-07-21 10:00 ` Jan Kara
2016-07-21 13:54   ` Chris Mason
2016-07-21 15:45     ` Jan Kara
2016-07-21 16:03       ` Chris Mason
2016-07-22  3:35   ` Wangnan (F)
2016-07-23 17:59     ` Alexei Starovoitov
2016-07-23 18:15       ` [Ksummit-discuss] Fwd: " Alexei Starovoitov
