* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
@ 2014-12-17  3:06 Alexei Starovoitov
  2014-12-17 21:42 ` Josef Bacik
  0 siblings, 1 reply; 29+ messages in thread
From: Alexei Starovoitov @ 2014-12-17  3:06 UTC (permalink / raw)
  To: Martin Lau
  Cc: Eric Dumazet, Blake Matheny, Laurent Chavey, Yuchung Cheng,
	netdev, David S. Miller, Hannes Frederic Sowa, Steven Rostedt,
	Lawrence Brakmo, Josef Bacik, Kernel Team

On Tue, Dec 16, 2014 at 5:30 PM, Martin Lau <kafai@fb.com> wrote:
>> >> >> I think systemtap like scripting on top of patches 1 and 3
>> >> >> should solve your use case ?
>> > We have quite a few different versions running in the production.  It may not
>> > be operationally easy.
>>
>> different versions of kernel or different versions of tcp_tracer ?
> Former and we are releasing new kernel pretty often.

I see. So for a dynamic tracer to be useful in such an environment,
the scripts should be compatible across different kernel versions
without recompilation. That all makes sense.

> How does the current TRACE_EVENT do it when it wants to printf more data?

tracepoints, like any other user interface, shouldn't
break compatibility. With printf-style output that's practically impossible.
Some subsystems may be breaking this rule, arguing that
tracepoints are a debug facility, but networking tracepoints don't change.

>> It feels that for stats collection only, tracepoints+tcp_trace
>> do not add much additional value vs extending tcp_info
>> and using ss.
> I think we are on the same page. Once 'this should cost nothing if not
> activated' proposition was cleared out.  It was what I meant that doing the
> collection part in the TCP itself (instead of tracepoints) would be nice.

agree.

> I think going forward, as others have suggested, it may be better to come
> together and reach a common ground on what to collect first before I re-work
> patch 1 to 3 and repost.

I think at a minimum it will be discussed at netdev01 in Feb,
but I suspect not everyone on this list can (or wants to) go to Ottawa,
so it would be nice to have a meetup for bay area folks to
discuss this sooner, with a public G+ hangout.
Thoughts?


* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
  2014-12-17  3:06 [RFC PATCH net-next 0/5] tcp: TCP tracer Alexei Starovoitov
@ 2014-12-17 21:42 ` Josef Bacik
  2014-12-18 23:43   ` Lawrence Brakmo
  0 siblings, 1 reply; 29+ messages in thread
From: Josef Bacik @ 2014-12-17 21:42 UTC (permalink / raw)
  To: Alexei Starovoitov, Martin Lau
  Cc: Eric Dumazet, Blake Matheny, Laurent Chavey, Yuchung Cheng,
	netdev, David S. Miller, Hannes Frederic Sowa, Steven Rostedt,
	Lawrence Brakmo, Kernel Team

On 12/16/2014 10:06 PM, Alexei Starovoitov wrote:
> On Tue, Dec 16, 2014 at 5:30 PM, Martin Lau <kafai@fb.com> wrote:
>>>>>>> I think systemtap like scripting on top of patches 1 and 3
>>>>>>> should solve your use case ?
>>>> We have quite a few different versions running in the production.  It may not
>>>> be operationally easy.
>>>
>>> different versions of kernel or different versions of tcp_tracer ?
>> Former and we are releasing new kernel pretty often.
>
> I see. So for dynamic tracer to be useful in such environment,
> the scripts should be compatible across different kernel version
> without recompilation. All makes sense.
>
>> How does the current TRACE_EVENT do it when it wants to printf more data?
>
> tracepoints, like any other user interface, shouldn't
> break compatibility. With printf it's practically impossible.
> Some subsystems may be breaking this rule arguing that
> tracepoints is a debug facility, but networking tracepoints don't change.
>

So that's what the events/<subsystem>/<event>/format file is for: to provide
a nice way for scripts to know what they are looking at.  For things
like the tcp estats and other tracing tools we use in production
internally, we use something (our own stuff in the case of estats, trace-cmd
in the case of normal tracepoints) to read the raw data and pull out the
fields we need, and that way it works no matter what kernel we're on.
Sometimes tracepoints move and we have to adjust our scripts, but
that's the cost of doing business and I think that's acceptable.
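
Concretely, a format file looks roughly like this (an illustrative
mock-up: the event name and the tcp fields are made up here, only the
common_* header fields and the overall layout are what the tracing core
really emits):

name: tcp_example
ID: 1234
format:
	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
	field:int common_pid;	offset:4;	size:4;	signed:1;

	field:__u32 snd_nxt;	offset:8;	size:4;	signed:0;
	field:__u32 snd_una;	offset:12;	size:4;	signed:0;

print fmt: "snd_nxt=%u snd_una=%u", REC->snd_nxt, REC->snd_una

A consumer reads the field names and offsets from this file instead of
hardcoding a record layout, which is what keeps the same script working
across kernel versions.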

>>> It feels that for stats collection only, tracepoints+tcp_trace
>>> do not add much additional value vs extending tcp_info
>>> and using ss.
>> I think we are on the same page. Once 'this should cost nothing if not
>> activated' proposition was cleared out.  It was what I meant that doing the
>> collection part in the TCP itself (instead of tracepoints) would be nice.
>
> agree.
>
>> I think going forward, as others have suggested, it may be better to come
>> together and reach a common ground on what to collect first before I re-work
>> patch 1 to 3 and repost.
>
> I think as a minimum it will be discussed at netdev01 in Feb,
> but I suspect not everyone on this list can(want) go to Ottawa,
> so would be nice to have a meetup for bay area folks to
> discuss this sooner with public g+ hangout.
> Thoughts?
>

Yeah, I think we're all in agreement that this is a good netdev01
discussion.  I'm happy to include people who want to talk about this
beforehand in the bay area meetup we're throwing, but it seems like
this is something the larger community is going to want
to talk about, so it may be more productive to wait until netdev01.  Thanks,

Josef


* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
  2014-12-17 21:42 ` Josef Bacik
@ 2014-12-18 23:43   ` Lawrence Brakmo
  2014-12-19  1:42     ` Yuchung Cheng
  0 siblings, 1 reply; 29+ messages in thread
From: Lawrence Brakmo @ 2014-12-18 23:43 UTC (permalink / raw)
  To: Josef Bacik, Alexei Starovoitov, Martin Lau
  Cc: Eric Dumazet, Blake Matheny, Laurent Chavey, Yuchung Cheng,
	netdev, David S. Miller, Hannes Frederic Sowa, Steven Rostedt,
	Kernel Team


On 12/17/14, 1:42 PM, "Josef Bacik" <jbacik@fb.com> wrote:

>>>> It feels that for stats collection only, tracepoints+tcp_trace
>>>> do not add much additional value vs extending tcp_info
>>>> and using ss.
>>> I think we are on the same page. Once 'this should cost nothing if not
>>> activated' proposition was cleared out.  It was what I meant that
>>>doing the
>>> collection part in the TCP itself (instead of tracepoints) would be
>>>nice.
>>
>> agree.
>>
>>> I think going forward, as others have suggested, it may be better to
>>>come
>>> together and reach a common ground on what to collect first before I
>>>re-work
>>> patch 1 to 3 and repost.
>>
>> I think as a minimum it will be discussed at netdev01 in Feb,
>> but I suspect not everyone on this list can(want) go to Ottawa,
>> so would be nice to have a meetup for bay area folks to
>> discuss this sooner with public g+ hangout.
>> Thoughts?
>>
>
>Yeah I think we're all in agreement that this is a good netdev01
>discussion.  I'm happy to include people who want to talk about this
>before hand in the bay area meetup we're throwing, but it seems like
>this is going to be something that the larger community is going to want
>to talk about so it may be more productive to wait until netdev01.
>Thanks,


Josef: I think a preliminary discussion during the bay area meetup would
be useful to get some of us in sync.

There are two issues going on. One is the collection of statistics that
can be read every so often, and the other is enabling easier
tracing of TCP state for analysis and debugging.

For statistics collection, extending tcp_info is a viable option, although
we may need some modifications to deal with: (1) having many
connections, most of which are idle; we need an option to only output those
whose stats have changed since the last read. (2) A mechanism to deal with
closed connections and their stats. Note that in our current setup neither
of these is an issue for us.

For tracing and event collection, I see a lot of value in tracepoints that
could print basic info with perf but also allow us to do more complex
things by loading a module that hooks into the tracepoints. This is one way
to set up triggers to collect state for a particular flow.
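
As a minimal sketch of what such a module could look like (hedged: the
tcp_transmit_skb tracepoint name, its prototype and its header are
assumptions modeled on this RFC, not merged kernel API):

#include <linux/module.h>
#include <linux/tracepoint.h>
#include <linux/skbuff.h>
#include <linux/atomic.h>
#include <net/sock.h>
#include <net/inet_sock.h>
#include <trace/events/tcp.h>	/* assumed header from the RFC patches */

static u16 watched_port = 80;		/* the trigger: one local port */
static atomic64_t xmit_events = ATOMIC64_INIT(0);

/* probe signature follows the assumed TP_PROTO(struct sock *, struct sk_buff *) */
static void probe_tcp_transmit(void *data, struct sock *sk, struct sk_buff *skb)
{
	/* act only on the flow(s) we care about */
	if (ntohs(inet_sk(sk)->inet_sport) == watched_port)
		atomic64_inc(&xmit_events);
}

static int __init tcp_tp_init(void)
{
	/* register_trace_<name>() is generated for every tracepoint */
	return register_trace_tcp_transmit_skb(probe_tcp_transmit, NULL);
}

static void __exit tcp_tp_exit(void)
{
	unregister_trace_tcp_transmit_skb(probe_tcp_transmit, NULL);
	tracepoint_synchronize_unregister();	/* wait for in-flight probes */
}

module_init(tcp_tp_init);
module_exit(tcp_tp_exit);
MODULE_LICENSE("GPL");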

Yuchung: I agree that a lot of information can be obtained through
analysis of tcpdumps, but some internal state must be inferred and in many
instances we can only get bounds.

- Larry


* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
  2014-12-18 23:43   ` Lawrence Brakmo
@ 2014-12-19  1:42     ` Yuchung Cheng
  0 siblings, 0 replies; 29+ messages in thread
From: Yuchung Cheng @ 2014-12-19  1:42 UTC (permalink / raw)
  To: Lawrence Brakmo
  Cc: Josef Bacik, Alexei Starovoitov, Martin Lau, Eric Dumazet,
	Blake Matheny, Laurent Chavey, netdev, David S. Miller,
	Hannes Frederic Sowa, Steven Rostedt, Kernel Team

On Thu, Dec 18, 2014 at 3:43 PM, Lawrence Brakmo <brakmo@fb.com> wrote:
>
>
> On 12/17/14, 1:42 PM, "Josef Bacik" <jbacik@fb.com> wrote:
>
> >>>> It feels that for stats collection only, tracepoints+tcp_trace
> >>>> do not add much additional value vs extending tcp_info
> >>>> and using ss.
> >>> I think we are on the same page. Once 'this should cost nothing if not
> >>> activated' proposition was cleared out.  It was what I meant that
> >>>doing the
> >>> collection part in the TCP itself (instead of tracepoints) would be
> >>>nice.
> >>
> >> agree.
> >>
> >>> I think going forward, as others have suggested, it may be better to
> >>>come
> >>> together and reach a common ground on what to collect first before I
> >>>re-work
> >>> patch 1 to 3 and repost.
> >>
> >> I think as a minimum it will be discussed at netdev01 in Feb,
> >> but I suspect not everyone on this list can(want) go to Ottawa,
> >> so would be nice to have a meetup for bay area folks to
> >> discuss this sooner with public g+ hangout.
> >> Thoughts?
> >>
> >
> >Yeah I think we're all in agreement that this is a good netdev01
> >discussion.  I'm happy to include people who want to talk about this
> >before hand in the bay area meetup we're throwing, but it seems like
> >this is going to be something that the larger community is going to want
> >to talk about so it may be more productive to wait until netdev01.
> >Thanks,
>
>
> Josef: I think a preliminary discussion during the bay area meet up would
> be useful to get some of us in sync.
>
> There are two issues going on. One is the collection of statistics that
> can be read every-so-often and another is the issue of enabling easier
> tracing of TCP state for analysis and debugging.
>
> For statistics collection, extending tcp_info is a viable option although
> we may need to do some modifications to deal with: (1) Having many
> connections most of which are idle. We need an option to only output those
> whose stats have changed since the last read. (2) A mechanism to deal with
> closed connections and their stats. Note that in our current setup neither
> of these is an issue for us.
>
> For tracing and event collection, I see a lot of value in tracepoints that
> could print basic info with perf but also allow us to do more complex
> things by loading a module that hooks to the tracepoints. This is one way
> to set up triggers to collect state for a particular flow.
>
> Yuchung: I agree that a lot of information can be obtained through
> analysis of tcpdumps, but some internal state must be inferred and in many
> instances we can only get bounds.
Hi Larry :)

I definitely see value in tracepoints. I was responding to the commit
message in patch 5/5: "Uncover uplink/backbone/subnet issue, e.g. by
tracking the rxmit rate.". First, ss and tcp_info can collect that
data. But the rxmit rate is often not enough for diagnosis. One needs to
inspect the loss patterns from packet traces. I am sure you know what
I am talking about.



>
> - Larry
>


* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
  2014-12-17 20:56 ` David Ahern
@ 2014-12-17 21:24   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 29+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-12-17 21:24 UTC (permalink / raw)
  To: David Ahern
  Cc: Alexei Starovoitov, Martin KaFai Lau, netdev, David S. Miller,
	Hannes Frederic Sowa, Steven Rostedt, Lawrence Brakmo,
	Josef Bacik, Kernel Team

Em Wed, Dec 17, 2014 at 01:56:33PM -0700, David Ahern escreveu:
> On 12/17/14 1:42 PM, Alexei Starovoitov wrote:
> >>It is not strictly necessary to carry vmlinux, that is just a probe
> >>>point resolution time problem, solvable when generating a shell script,
> >>>on the development machine, to insert the probes.
> >on N development machines with kernels that
> >would match worker machines...
> >I'm not saying it's impossible, just operationally difficult.
> >This is my understanding of Martin's use case.

> That's the use case I am talking about ... N-different kernel versions and
> the probe definitions would need to be generated at *build* time of the
> kernel that uses a cross-compile environment. ie., can't assume there is a
> development machine running the kernel from which you can generate the probe
> definitions. This gets messy quick for embedded deployments.

It shouldn't; you're saying that the rate of pushing out production
kernels is so high that we get lost and can't find the matching original
binaries with full debug info.

We have build-ids for that: binary content keys that let us match what
is in production, which has to be as lean as possible, while still being
able to get back to all that fat.

Is it that people try so hard to forget about that extra debugging fat,
when in the end we need to keep it around to be able to figure out what
happens when things go wrong?

I understand that the expectation is that for each production build
there will be an unwieldy set of different probe point definitions to
keep, but is that so?

- Arnaldo


* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
  2014-12-17 20:42 Alexei Starovoitov
  2014-12-17 20:56 ` David Ahern
@ 2014-12-17 21:19 ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 29+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-12-17 21:19 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Martin KaFai Lau, netdev, David S. Miller, Hannes Frederic Sowa,
	Steven Rostedt, Lawrence Brakmo, Josef Bacik, Kernel Team

Em Wed, Dec 17, 2014 at 12:42:34PM -0800, Alexei Starovoitov escreveu:
> On Wed, Dec 17, 2014 at 11:51 AM, Arnaldo Carvalho de Melo
> <arnaldo.melo@gmail.com> wrote:
> > Em Wed, Dec 17, 2014 at 09:14:02AM -0800, Alexei Starovoitov escreveu:
> >> On Wed, Dec 17, 2014 at 7:07 AM, Arnaldo Carvalho de Melo
> >> <arnaldo.melo@gmail.com> wrote:
> >> > I guess even just using 'perf probe' to set those wannabe tracepoints
> >> > should be enough, no? Then he can refer to those in his perf record
> >> > call, etc and process it just like with the real tracepoints.
> >
> >> it's far from ideal for two reasons.
> >> - they have different kernels and dragging along vmlinux
> >> with debug info or multiple 'perf list' data is too cumbersome
> >
> > It is not strictly necessary to carry vmlinux, that is just a probe
> > point resolution time problem, solvable when generating a shell script,
> > on the development machine, to insert the probes.
> 
> on N development machines with kernels that
> would match worker machines...
> I'm not saying it's impossible, just operationally difficult.
> This is my understanding of Martin's use case.

The point here is that it's difficult to cater to the needs of all
involved; researchers and maintainers don't like to be bound by
contracts to keep metrics and crossroads that at some point made sense.

It will be difficult, in some cases, for some people, to get
all they want. What I tried to stress is that there are alternatives to
committing to tons of tracepoints (or just a few), in the form of dynamic
ones that, with some infrastructure, could be put to use before
something better comes along.
 
> >> operationally. Permanent tracepoints solve this problem.
> >
> > Sure, and when available, use them, my suggestion wasn't to use
> > exclusively any mechanism, but to initially use what is available to
> > create the tools, then find places that could be improved (if that
> > proves to be the case) by using a higher performance mechanism.
 
> agree. I think if kprobe approach was usable, it would have

Who said it was not?

> been used already and yet here you have these patches
> that add tracepoints in few strategic places of tcp stack.

Well, until these points have been argued to death as being
strategic enough to deserve a tracepoint, kprobes is the way to go, or, in
other words, the _only_ way to go if you don't want to run a patched
kernel.
 
> >> - the action upon hitting tracepoint is non-trivial.
> >> perf probe style of unconditionally walking pointer chains
> >> will be tripping over wrong pointers.
> >
> > Huh? Care to elaborate on this one?
> 
> if perf probe does 'result->name' as in your example
> then it would work, but patch 5 does conditional
> walking of pointers, so you cannot just add
> a perf probe that does print(ptr1->value1, ptr2->value2)
> It won't crash, but will be collecting wrong stats.
> (likely counting zeros)

Right, for that we need to activate eBPF code when we hit such probes,
but then it remains something dynamic, not something that is
forever there in the source code.
 
> >> Plus they already need to do aggregation for high
> >> frequency events.
> >
> >> As part of acting on trace_transmit_skb() event:
> >> if (before(tcb->seq, tcp_sk(sk)->snd_nxt)) {
> >>   tcp_trace_stats_add(...)
> >> }
> >> if (jiffies_to_msecs(jiffies - sktr->last_ts) ..) {
> >>   tcp_trace_stats_add(...)
> >> }
> >
> > But aren't these stats TCP already keeps or could be made to?
> 
> that's the whole discussion about.
> tcp_info has some of them.
> Though it's difficult to claim that, say, tcp_info->tcpi_lost is
> the same as loss_segs_retrans from patch 5.

For such flexibility I think we need to go the eBPF way, i.e. strive the
most to reduce the cost of inserting a stat collection point.

- Arnaldo


* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
  2014-12-17 20:42 Alexei Starovoitov
@ 2014-12-17 20:56 ` David Ahern
  2014-12-17 21:24   ` Arnaldo Carvalho de Melo
  2014-12-17 21:19 ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 29+ messages in thread
From: David Ahern @ 2014-12-17 20:56 UTC (permalink / raw)
  To: Alexei Starovoitov, Arnaldo Carvalho de Melo
  Cc: Martin KaFai Lau, netdev, David S. Miller, Hannes Frederic Sowa,
	Steven Rostedt, Lawrence Brakmo, Josef Bacik, Kernel Team

On 12/17/14 1:42 PM, Alexei Starovoitov wrote:
>> It is not strictly necessary to carry vmlinux, that is just a probe
>> >point resolution time problem, solvable when generating a shell script,
>> >on the development machine, to insert the probes.
> on N development machines with kernels that
> would match worker machines...
> I'm not saying it's impossible, just operationally difficult.
> This is my understanding of Martin's use case.
>

That's the use case I am talking about ... N different kernel versions,
and the probe definitions would need to be generated at *build* time of
the kernel, which uses a cross-compile environment. I.e., you can't assume
there is a development machine running the kernel from which you can
generate the probe definitions. This gets messy quickly for embedded
deployments.

David


* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
  2014-12-15 19:56     ` Yuchung Cheng
@ 2014-12-17 20:45       ` rapier
  0 siblings, 0 replies; 29+ messages in thread
From: rapier @ 2014-12-17 20:45 UTC (permalink / raw)
  To: Yuchung Cheng, Blake Matheny
  Cc: Eric Dumazet, Alexei Starovoitov, Laurent Chavey, Martin Lau,
	netdev, David S. Miller, Hannes Frederic Sowa, Steven Rostedt,
	Lawrence Brakmo, Josef Bacik, Kernel Team


On 12/15/14 2:56 PM, Yuchung Cheng wrote:
> On Mon, Dec 15, 2014 at 8:08 AM, Blake Matheny <bmatheny@fb.com> wrote:
>>
>> We have an additional set of patches for web10g that builds on these
>> tracepoints. It can be made to work either way, but I agree the idea of
>> something like a sockopt would be really nice.
>
> I'd like to compare these patches  with tools that parse pcap files to
> generate per-flow counters to collect RTTs, #dupacks, etc. What
> additional values or insights do they provide to improve/debug TCP
> performance? maybe an example?

So this is our use scenario:

If the stack is instrumented on a per-flow basis we can gather metrics
proactively. This data can likely be processed on a near real-time basis
to at least get some general idea about the health of the flow (dupacks,
cong events, spurious RTOs, etc). It's possible we can use this data to
provisionally flag flows during the lifespan of the transfer. If we
store the collected metrics, NOC engineers can access them to make a
final determination about performance. They may then start the
resolution process immediately using data collected in situ. With the
web10g data we not only collect stack data but also
information about the path and the interaction between the application
and the stack.

This scenario is particularly appealing in the realm of big data 
science. We're currently working with datasets that are hundreds of TBs 
in size and will soon be dealing with multiple PBs as a matter of 
course. In many cases we're aware of the path characteristics in advance 
via SDN so we can apply the macroscopic model and see when we're 
dropping below thresholds for that path. Since we're doing most of our
transfers between loosely federated sets of distantly located transfer
nodes, we don't generally have access to the far end of the connection,
which might be the right place to collect the pcap data.

> IMO these stats provide a general pictures of how TCP works of a
> specific network, but not enough to really nail specific bugs in TCP
> protocol or implementation. Then SNMP stats or sampling with pcap
> traces with offline analysis can achieve the same purpose.

I'd agree with that, but in the scenario we are most interested in,
protocol/implementation issues are secondary concerns. They are
important, but we've mostly been focused on what we can do to make the
scientific workflow easier when dealing with the transfer of large data
sets.


* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
@ 2014-12-17 20:42 Alexei Starovoitov
  2014-12-17 20:56 ` David Ahern
  2014-12-17 21:19 ` Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 29+ messages in thread
From: Alexei Starovoitov @ 2014-12-17 20:42 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Martin KaFai Lau, netdev, David S. Miller, Hannes Frederic Sowa,
	Steven Rostedt, Lawrence Brakmo, Josef Bacik, Kernel Team

On Wed, Dec 17, 2014 at 11:51 AM, Arnaldo Carvalho de Melo
<arnaldo.melo@gmail.com> wrote:
> Em Wed, Dec 17, 2014 at 09:14:02AM -0800, Alexei Starovoitov escreveu:
>> On Wed, Dec 17, 2014 at 7:07 AM, Arnaldo Carvalho de Melo
>> <arnaldo.melo@gmail.com> wrote:
>> > I guess even just using 'perf probe' to set those wannabe tracepoints
>> > should be enough, no? Then he can refer to those in his perf record
>> > call, etc and process it just like with the real tracepoints.
>
>> it's far from ideal for two reasons.
>> - they have different kernels and dragging along vmlinux
>> with debug info or multiple 'perf list' data is too cumbersome
>
> It is not strictly necessary to carry vmlinux, that is just a probe
> point resolution time problem, solvable when generating a shell script,
> on the development machine, to insert the probes.

on N development machines with kernels that
would match worker machines...
I'm not saying it's impossible, just operationally difficult.
This is my understanding of Martin's use case.

>> operationally. Permanent tracepoints solve this problem.
>
> Sure, and when available, use them, my suggestion wasn't to use
> exclusively any mechanism, but to initially use what is available to
> create the tools, then find places that could be improved (if that
> proves to be the case) by using a higher performance mechanism.

agree. I think if the kprobe approach were usable, it would have
been used already, and yet here you have these patches
that add tracepoints in a few strategic places of the tcp stack.

>> - the action upon hitting tracepoint is non-trivial.
>> perf probe style of unconditionally walking pointer chains
>> will be tripping over wrong pointers.
>
> Huh? Care to elaborate on this one?

if perf probe does 'result->name' as in your example
then it would work, but patch 5 does conditional
walking of pointers, so you cannot just add
a perf probe that does print(ptr1->value1, ptr2->value2).
It won't crash, but it will be collecting wrong stats
(likely counting zeros).

>> Plus they already need to do aggregation for high
>> frequency events.
>
>> As part of acting on trace_transmit_skb() event:
>> if (before(tcb->seq, tcp_sk(sk)->snd_nxt)) {
>>   tcp_trace_stats_add(...)
>> }
>> if (jiffies_to_msecs(jiffies - sktr->last_ts) ..) {
>>   tcp_trace_stats_add(...)
>> }
>
> But aren't these stats TCP already keeps or could be made to?

that's what the whole discussion is about.
tcp_info has some of them,
though it's difficult to claim that, say, tcp_info->tcpi_lost is
the same as loss_segs_retrans from patch 5.


* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
  2014-12-17 17:14 Alexei Starovoitov
@ 2014-12-17 19:51 ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 29+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-12-17 19:51 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Martin KaFai Lau, netdev, David S. Miller, Hannes Frederic Sowa,
	Steven Rostedt, Lawrence Brakmo, Josef Bacik, Kernel Team

Em Wed, Dec 17, 2014 at 09:14:02AM -0800, Alexei Starovoitov escreveu:
> On Wed, Dec 17, 2014 at 7:07 AM, Arnaldo Carvalho de Melo
> <arnaldo.melo@gmail.com> wrote:
> > I guess even just using 'perf probe' to set those wannabe tracepoints
> > should be enough, no? Then he can refer to those in his perf record
> > call, etc and process it just like with the real tracepoints.
 
> it's far from ideal for two reasons.
> - they have different kernels and dragging along vmlinux
> with debug info or multiple 'perf list' data is too cumbersome

It is not strictly necessary to carry vmlinux, that is just a probe
point resolution time problem, solvable when generating a shell script,
on the development machine, to insert the probes.

> operationally. Permanent tracepoints solve this problem.

Sure, and when available, use them, my suggestion wasn't to use
exclusively any mechanism, but to initially use what is available to
create the tools, then find places that could be improved (if that
proves to be the case) by using a higher performance mechanism.

> - the action upon hitting tracepoint is non-trivial.
> perf probe style of unconditionally walking pointer chains
> will be tripping over wrong pointers.

Huh? Care to elaborate on this one?

> Plus they already need to do aggregation for high
> frequency events.

> As part of acting on trace_transmit_skb() event:
> if (before(tcb->seq, tcp_sk(sk)->snd_nxt)) {
>   tcp_trace_stats_add(...)
> }
> if (jiffies_to_msecs(jiffies - sktr->last_ts) ..) {
>   tcp_trace_stats_add(...)
> }

But aren't these stats TCP already keeps or could be made to?

- Arnaldo


* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
@ 2014-12-17 17:14 Alexei Starovoitov
  2014-12-17 19:51 ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 29+ messages in thread
From: Alexei Starovoitov @ 2014-12-17 17:14 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Martin KaFai Lau, netdev, David S. Miller, Hannes Frederic Sowa,
	Steven Rostedt, Lawrence Brakmo, Josef Bacik, Kernel Team

On Wed, Dec 17, 2014 at 7:07 AM, Arnaldo Carvalho de Melo
<arnaldo.melo@gmail.com> wrote:
>
> I guess even just using 'perf probe' to set those wannabe tracepoints
> should be enough, no? Then he can refer to those in his perf record
> call, etc and process it just like with the real tracepoints.

it's far from ideal for two reasons.
- they have different kernels, and dragging along vmlinux
with debug info or multiple 'perf list' data is too cumbersome
operationally. Permanent tracepoints solve this problem.
- the action upon hitting a tracepoint is non-trivial.
perf probe style unconditional walking of pointer chains
will trip over wrong pointers.
Plus they already need to do aggregation for high
frequency events.
As part of acting on the trace_transmit_skb() event:

/* seq below snd_nxt means this skb is a retransmission */
if (before(tcb->seq, tcp_sk(sk)->snd_nxt)) {
  tcp_trace_stats_add(...)
}
/* time since the last recorded event, in ms (comparison elided) */
if (jiffies_to_msecs(jiffies - sktr->last_ts) ..) {
  tcp_trace_stats_add(...)
}


* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
  2014-12-15  6:55 Alexei Starovoitov
  2014-12-15 16:03 ` Eric Dumazet
@ 2014-12-17 15:07 ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 29+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-12-17 15:07 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Martin KaFai Lau, netdev, David S. Miller, Hannes Frederic Sowa,
	Steven Rostedt, Lawrence Brakmo, Josef Bacik, Kernel Team

Em Sun, Dec 14, 2014 at 10:55:55PM -0800, Alexei Starovoitov escreveu:
> On Sun, Dec 14, 2014 at 5:56 PM, Martin KaFai Lau <kafai@fb.com> wrote:
> > Hi,
> >
> > We have been using the kernel ftrace infra to collect TCP per-flow statistics.
> > The following patch set is a first slim-down version of our
> > existing implementation. We would like to get some early feedback
> > and make it useful for others.
> >
> > [RFC PATCH net-next 1/5] tcp: Add TCP TRACE_EVENTs:
> > Defines some basic tracepoints (by TRACE_EVENT).
> >
> > [RFC PATCH net-next 2/5] tcp: A perf script for TCP tracepoints:
> > A sample perf script with simple ip/port filtering and summary output.
> >
> > [RFC PATCH net-next 3/5] tcp: Add a few more tracepoints for tcp tracer:
> > Declares a few more tracepoints (by DECLARE_TRACE) which are
> > used by the tcp_tracer.  The tcp_tracer is in the patch 5/5.
> >
> > [RFC PATCH net-next 4/5] tcp: Introduce tcp_sk_trace and related structs:
> > Defines a few tcp_trace structs which are used to collect statistics
> > on each tcp_sock.
> >
> > [RFC PATCH net-next 5/5] tcp: Add TCP tracer:
> > It introduces a tcp_tracer which hooks onto the tracepoints defined in the
> > patch 1/5 and 3/5.  It collects data defined in patch 4/5. We currently
> > use this tracer to collect per-flow statistics.  The commit log has
> > some more details.
> 
> I think patches 1 and 3 are good additions, since they establish
> few permanent points of instrumentation in tcp stack.
> Patches 4-5 look more like use cases of tracepoints established
> before. They may feel like simple additions and, no doubt,
> they are useful, but since they expose things via tracing
> infra they become part of api and cannot be changed later,
> when more stats would be needed.
> I think systemtap like scripting on top of patches 1 and 3
> should solve your use case ?

I guess even just using 'perf probe' to set those wannabe tracepoints
should be enough, no? Then he can refer to those in his perf record
call, etc and process it just like with the real tracepoints.

> Also, have you looked at recent eBPF work?
> Though it's not completely ready yet, soon it should
> be able to do the same stats collection as you have
> in 4/5 without adding permanent pieces to the kernel.


* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
  2014-12-17  0:15 Alexei Starovoitov
@ 2014-12-17  1:30 ` Martin Lau
  0 siblings, 0 replies; 29+ messages in thread
From: Martin Lau @ 2014-12-17  1:30 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eric Dumazet, Blake Matheny, Laurent Chavey, Yuchung Cheng,
	netdev, David S. Miller, Hannes Frederic Sowa, Steven Rostedt,
	Lawrence Brakmo, Josef Bacik, Kernel Team

On Tue, Dec 16, 2014 at 04:15:24PM -0800, Alexei Starovoitov wrote:
> On Tue, Dec 16, 2014 at 10:28 AM, Martin Lau <kafai@fb.com> wrote:
> > We can consider to reuse the events's format (tracing/events/*/format). I think
> > blktrace.c is using similar approach in trace-cmd.
> 
> yes. tcp_trace is a carbon copy of blktrace applied to tcp.
> 
> >> >> I think systemtap like scripting on top of patches 1 and 3
> >> >> should solve your use case ?
> > We have quite a few different versions running in the production.  It may not
> > be operationally easy.
> 
> different versions of kernel or different versions of tcp_tracer ?
The former, and we are releasing new kernels pretty often.

> 
> > Having a getsockopt will be useful for the new application/library to take
> > advantage of.
> >
> > For the continuous monitoring/logging purpose, ftrace can provide event
> > triggered tracing instead of periodically consulting ss.
> 
> so both getsockopt tcp_info approach and ftrace+tcp_trace
> approach can provide the same set of stats per flow, right?
> And the only difference is 'ss' needs polling and ftrace
> collects all events?
> Since they're stats anyway, the polling interval
> shouldn't matter. Just like lost trace events?
> 
> from patch 5 commit log:
> "Define probes and register them to the TCP tracepoints.  The probes
> collect the data defined in struct tcp_sk_trace and record them to
> the tracing's ring_buffer.
> "
> so two trace_seq_printf() from patch 5
> and two new 'struct tcp_trace_stats' and 'tcp_trace_basic'
> from patch 4 will become permanent user api.
> 
> At the same time the commit log is saying:
> "It is still missing a few things that
> we currently have, like:
> - why the sender is blocked? and how long for each reason?
> - some TCP Congestion Control data"
> 
> Does it mean that these printf and structs would have
> to change?
How does the current TRACE_EVENT do it when it wants to printf more data?

> Can 'struct tcp_info' be extended instead of
> adding 'struct tcp_trace_stats' ?
> Then getsockopt and ftrace+tcp_trace will be returning
> the same structs.
> 
> It feels that for stats collection only, tracepoints+tcp_trace
> do not add much additional value vs extending tcp_info
> and using ss.
I think we are on the same page, once the 'this should cost nothing if not
activated' proposition was cleared up.  That is what I meant: doing the
collection part in TCP itself (instead of in tracepoints) would be nice.

> I see the value in tracepoints on its own, since we'll
> be able to use dynamic tracing to do event aggregation,
> filtering, etc. That was my alternative suggestion to
> add only tracepoints from patches 1 and 3.

I think going forward, as others have suggested, it may be better to come
together and reach common ground on what to collect first, before I re-work
patches 1 to 3 and repost.

Thanks,
--Martin


* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
@ 2014-12-17  0:15 Alexei Starovoitov
  2014-12-17  1:30 ` Martin Lau
  0 siblings, 1 reply; 29+ messages in thread
From: Alexei Starovoitov @ 2014-12-17  0:15 UTC (permalink / raw)
  To: Martin Lau
  Cc: Eric Dumazet, Blake Matheny, Laurent Chavey, Yuchung Cheng,
	netdev, David S. Miller, Hannes Frederic Sowa, Steven Rostedt,
	Lawrence Brakmo, Josef Bacik, Kernel Team

On Tue, Dec 16, 2014 at 10:28 AM, Martin Lau <kafai@fb.com> wrote:
> We can consider to reuse the events's format (tracing/events/*/format). I think
> blktrace.c is using similar approach in trace-cmd.

yes. tcp_trace is a carbon copy of blktrace applied to tcp.

>> >> I think systemtap like scripting on top of patches 1 and 3
>> >> should solve your use case ?
> We have quite a few different versions running in the production.  It may not
> be operationally easy.

different versions of the kernel or different versions of tcp_tracer?

> Having a getsockopt will be useful for the new application/library to take
> advantage of.
>
> For the continuous monitoring/logging purpose, ftrace can provide event
> triggered tracing instead of periodically consulting ss.

so both the getsockopt tcp_info approach and the ftrace+tcp_trace
approach can provide the same set of stats per flow, right?
And the only difference is that 'ss' needs polling while ftrace
collects all events?
Since they're stats anyway, the polling interval
shouldn't matter, just like lost trace events?

from patch 5 commit log:
"Define probes and register them to the TCP tracepoints.  The probes
collect the data defined in struct tcp_sk_trace and record them to
the tracing's ring_buffer.
"
so the two trace_seq_printf() calls from patch 5
and the two new structs 'tcp_trace_stats' and 'tcp_trace_basic'
from patch 4 will become permanent user API.

At the same time the commit log is saying:
"It is still missing a few things that
we currently have, like:
- why the sender is blocked? and how long for each reason?
- some TCP Congestion Control data"

Does it mean that these printf and structs would have
to change?

Can 'struct tcp_info' be extended instead of
adding 'struct tcp_trace_stats'?

Then getsockopt and ftrace+tcp_trace will be returning
the same structs.
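
For reference, the getsockopt side of this already works today and is
extensible: the kernel copies out at most the caller-supplied length, so
appending fields to tcp_info doesn't break old binaries. A minimal
userspace sketch (error handling trimmed):

#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>	/* struct tcp_info, TCP_INFO */

static int dump_tcp_info(int fd)
{
	struct tcp_info ti;
	socklen_t len = sizeof(ti);

	/* the kernel fills at most 'len' bytes, so new fields appended
	   to struct tcp_info do not break old binaries */
	if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) < 0)
		return -1;

	printf("rtt=%uus retrans=%u lost=%u cwnd=%u\n",
	       ti.tcpi_rtt, ti.tcpi_retrans, ti.tcpi_lost, ti.tcpi_snd_cwnd);
	return 0;
}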

It feels that for stats collection only, tracepoints+tcp_trace
do not add much additional value vs extending tcp_info
and using ss.

I see value in tracepoints on their own, since we'll
be able to use dynamic tracing to do event aggregation,
filtering, etc. That was my alternative suggestion:
add only the tracepoints from patches 1 and 3.


* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
  2014-12-16 22:45       ` David Miller
@ 2014-12-16 22:50         ` Hannes Frederic Sowa
  0 siblings, 0 replies; 29+ messages in thread
From: Hannes Frederic Sowa @ 2014-12-16 22:50 UTC (permalink / raw)
  To: David Miller, jbaron
  Cc: jbacik, eric.dumazet, alexei.starovoitov, chavey, ycheng, kafai,
	netdev, rostedt, brakmo, Kernel-team, Daniel Borkmann,
	Florian Westphal

On Tue, Dec 16, 2014, at 23:45, David Miller wrote:
> From: Jason Baron <jbaron@akamai.com>
> Date: Tue, 16 Dec 2014 17:40:47 -0500
> 
> > We are interested in tcp tracing as well. Another requirement that
> > we have that I don't think I saw is the ability to start/stop
> > tracing on sockets (potentially multiple times) during the lifetime
> > of a connection. So for example, the ability to use setsockopt(), to
> > selectively start/stop tracing on a connection, so as not to incur
> > overhead for non-traced sockets.
> 
> This is so backwards.
> 
> You make the tracing cheap enough that this can never be an issue.
> 
> Your requirement can only exist if the implementation is broken
> by design.

An idea I had was to add a proxy tcp congestion control which could be
selectively chosen per destination as soon as Daniel's patchset hits
net-next, or one could enable it globally via
/proc/sys/net/ipv4/tcp_congestion_control. The needed tracepoints could be
installed in the congestion_ops handlers, and the additional storage
could live inside the private data of the congestion control handler.
Further callbacks could go to a chained congestion control handler. The
names could be extended like "t:cubic", with t for tracing.

Do the congestion control callbacks provide enough insight into the
connection state?
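
A rough sketch of the shape this could take (with big assumptions:
'inner' stands for the wrapped congestion control and would have to be
resolved by name at init time, and sharing icsk_ca_priv between the
wrapper and the inner module is left unsolved here):

#include <linux/module.h>
#include <net/tcp.h>

static struct tcp_congestion_ops *inner;	/* assumed: resolved to e.g. cubic */

struct trace_ca_priv {
	u32 loss_events;	/* extra per-socket storage in icsk_ca_priv */
};

static void trace_cwnd_event(struct sock *sk, enum tcp_ca_event ev)
{
	struct trace_ca_priv *p = inet_csk_ca(sk);

	if (ev == CA_EVENT_LOSS)
		p->loss_events++;	/* record, then chain to the real CC */
	if (inner->cwnd_event)
		inner->cwnd_event(sk, ev);
}

static u32 trace_ssthresh(struct sock *sk)
{
	return inner->ssthresh(sk);	/* pure pass-through */
}

static void trace_cong_avoid(struct sock *sk, u32 ack, u32 acked)
{
	inner->cong_avoid(sk, ack, acked);
}

static struct tcp_congestion_ops tcp_trace_ops = {
	.name		= "t:cubic",
	.owner		= THIS_MODULE,
	.ssthresh	= trace_ssthresh,
	.cong_avoid	= trace_cong_avoid,
	.cwnd_event	= trace_cwnd_event,
};

/* registered via tcp_register_congestion_control(&tcp_trace_ops) */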

Bye,
Hannes


* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
  2014-12-16 22:40     ` Jason Baron
@ 2014-12-16 22:45       ` David Miller
  2014-12-16 22:50         ` Hannes Frederic Sowa
  0 siblings, 1 reply; 29+ messages in thread
From: David Miller @ 2014-12-16 22:45 UTC (permalink / raw)
  To: jbaron
  Cc: jbacik, eric.dumazet, alexei.starovoitov, chavey, ycheng, kafai,
	netdev, hannes, rostedt, brakmo, Kernel-team

From: Jason Baron <jbaron@akamai.com>
Date: Tue, 16 Dec 2014 17:40:47 -0500

> We are interested in tcp tracing as well. Another requirement that
> we have that I don't think I saw is the ability to start/stop
> tracing on sockets (potentially multiple times) during the lifetime
> of a connection. So for example, the ability to use setsockopt(), to
> selectively start/stop tracing on a connection, so as not to incur
> overhead for non-traced sockets.

This is so backwards.

You make the tracing cheap enough that this can never be an issue.

Your requirement can only exist if the implementation is broken
by design.
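
For context on why that is realistic: tracepoints sit behind static keys
(jump labels), so a disabled tracepoint costs a single nop in the hot
path. A schematic of the mechanism, hand-rolled for illustration (the
tracepoint macros generate the equivalent automatically):

#include <linux/jump_label.h>

static struct static_key trace_on = STATIC_KEY_INIT_FALSE;

static void do_trace(void)
{
	/* the (rarely taken) slow path: collect and record stats */
}

void tcp_hot_path_sample(void)
{
	/* patched at runtime: a nop while the key is off,
	 * a jump to the slow path once it is enabled */
	if (static_key_false(&trace_on))
		do_trace();
}

/* the tracing core flips the key with static_key_slow_inc(&trace_on)
 * and static_key_slow_dec(&trace_on) */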


* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
  2014-12-15 16:42   ` Josef Bacik
  2014-12-15 22:01     ` Tom Herbert
@ 2014-12-16 22:40     ` Jason Baron
  2014-12-16 22:45       ` David Miller
  1 sibling, 1 reply; 29+ messages in thread
From: Jason Baron @ 2014-12-16 22:40 UTC (permalink / raw)
  To: Josef Bacik, Eric Dumazet, Alexei Starovoitov, Laurent Chavey,
	Yuchung Cheng
  Cc: Martin KaFai Lau, netdev, David S. Miller, Hannes Frederic Sowa,
	Steven Rostedt, Lawrence Brakmo, Kernel Team

On 12/15/2014 11:42 AM, Josef Bacik wrote:
> On 12/15/2014 11:03 AM, Eric Dumazet wrote:
>> On Sun, 2014-12-14 at 22:55 -0800, Alexei Starovoitov wrote:
>>
>>> I think patches 1 and 3 are good additions, since they establish
>>> few permanent points of instrumentation in tcp stack.
>>> Patches 4-5 look more like use cases of tracepoints established
>>> before. They may feel like simple additions and, no doubt,
>>> they are useful, but since they expose things via tracing
>>> infra they become part of api and cannot be changed later,
>>> when more stats would be needed.
>>> I think systemtap like scripting on top of patches 1 and 3
>>> should solve your use case ?
>>> Also, have you looked at recent eBPF work?
>>> Though it's not completely ready yet, soon it should
>>> be able to do the same stats collection as you have
>>> in 4/5 without adding permanent pieces to the kernel.
>>
>> So it looks like web10g like interfaces are very often requested by
>> various teams.
>>
>> And we have many different views on how to hack this. I am astonished by
>> number of hacks I saw about this stuff going on.
>>
>> What about a clean way, extending current TCP_INFO, which is both
>> available as a getsockopt() for socket owners and ss/iproute2
>> information for 'external entities'
>>
>> If we consider web10g info needed, then adding a ftrace/eBPF like
>> interface is simply yet another piece of code we need to maintain,
>> and the argument of 'this should cost nothing if not activated' is
>> nonsense since major players need to constantly monitor TCP metrics and
>> behavior.
>>
>> It seems both FaceBook and Google are working on a subset of web10g.
>>
>> I suggest we meet together and establish a common ground, preferably
>> after Christmas holidays.
>>
>
> We've set up something for exactly this case at the end of January but
> have yet to get a response from Google.  If any of the Google people
> cc'ed (or really anybody, its not a strictly FB/Google thing) is
> interested please email me directly and I'll send you the details, we
> will be meeting face to face in the bay area at the end of January. 
> Thanks,
>
> Josef

We are interested in tcp tracing as well. Another requirement that we
have, and that I don't think I saw, is the ability to start/stop tracing
on sockets (potentially multiple times) during the lifetime of a
connection: for example, the ability to use setsockopt() to selectively
start/stop tracing on a connection, so as not to incur overhead for
non-traced sockets.
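
In userspace that could look as simple as this (a sketch only:
TCP_TRACING is a made-up option name and number, nothing like it exists
in today's kernel):

#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

#define TCP_TRACING 99	/* hypothetical sockopt, not a real kernel constant */

/* toggle tracing around an interesting phase of the connection */
static int set_tracing(int fd, int on)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_TRACING, &on, sizeof(on));
}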

Thanks,

-Jason


* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
  2014-12-15 16:08   ` Blake Matheny
  2014-12-15 19:56     ` Yuchung Cheng
@ 2014-12-16 18:28     ` Martin Lau
  1 sibling, 0 replies; 29+ messages in thread
From: Martin Lau @ 2014-12-16 18:28 UTC (permalink / raw)
  To: Alexei Starovoitov, Eric Dumazet
  Cc: Blake Matheny, Laurent Chavey, Yuchung Cheng, netdev,
	David S. Miller, Hannes Frederic Sowa, Steven Rostedt,
	Lawrence Brakmo, Josef Bacik, Kernel Team

> >On Sun, 2014-12-14 at 22:55 -0800, Alexei Starovoitov wrote:
> >
> >> I think patches 1 and 3 are good additions, since they establish
> >> few permanent points of instrumentation in tcp stack.
> >> Patches 4-5 look more like use cases of tracepoints established
> >> before. They may feel like simple additions and, no doubt,
> >> they are useful, but since they expose things via tracing
> >> infra they become part of api and cannot be changed later,
> >> when more stats would be needed.
We can consider reusing the events' format (tracing/events/*/format). I think
blktrace.c uses a similar approach in trace-cmd.

> >> I think systemtap like scripting on top of patches 1 and 3
> >> should solve your use case ?
We have quite a few different versions running in production.  It may not
be operationally easy.

> >> Also, have you looked at recent eBPF work?
> >> Though it's not completely ready yet, soon it should
> >> be able to do the same stats collection as you have
> >> in 4/5 without adding permanent pieces to the kernel.
We are keeping an eye on the eBPF work.


> On 12/15/14, 8:03 AM, "Eric Dumazet" <eric.dumazet@gmail.com> wrote:
> 
> >So it looks like web10g like interfaces are very often requested by
> >various teams.
> >
> >And we have many different views on how to hack this. I am astonished by
> >number of hacks I saw about this stuff going on.
> >
> >What about a clean way, extending current TCP_INFO, which is both
> >available as a getsockopt() for socket owners and ss/iproute2
> >information for 'external entities'
> >
> >If we consider web10g info needed, then adding a ftrace/eBPF like
> >interface is simply yet another piece of code we need to maintain,
> >and the argument of 'this should cost nothing if not activated' is
> >nonsense since major players need to constantly monitor TCP metrics and
> >behavior.
For the data collection part, it would be nice to do it in TCP itself.

Having a getsockopt will be useful for the new application/library to take
advantage of.

For continuous monitoring/logging purposes, ftrace can provide event-triggered
tracing instead of periodically consulting ss.

Thanks,
--Martin


* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
  2014-12-15 23:28       ` Jamal Hadi Salim
@ 2014-12-15 23:40         ` Eric Dumazet
  0 siblings, 0 replies; 29+ messages in thread
From: Eric Dumazet @ 2014-12-15 23:40 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Tom Herbert, Josef Bacik, Alexei Starovoitov, Laurent Chavey,
	Yuchung Cheng, Martin KaFai Lau, netdev, David S. Miller,
	Hannes Frederic Sowa, Steven Rostedt, Lawrence Brakmo,
	Kernel Team

On Mon, 2014-12-15 at 18:28 -0500, Jamal Hadi Salim wrote:
> On 12/15/14 17:01, Tom Herbert wrote:
> 
> 
> >
> > Maybe this would be good for discussion at netdev01?
> >
> 
> Yes it would be a good fit,
> I just pinged Eric when i saw his email saying the same thing ;->
> 

For the record, I made this suggestion to Josef in a private mail, sent
at 10am PST ;)


* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
  2014-12-15 22:01     ` Tom Herbert
  2014-12-15 22:17       ` rapier
  2014-12-15 22:29       ` Steven Rostedt
@ 2014-12-15 23:28       ` Jamal Hadi Salim
  2014-12-15 23:40         ` Eric Dumazet
  2 siblings, 1 reply; 29+ messages in thread
From: Jamal Hadi Salim @ 2014-12-15 23:28 UTC (permalink / raw)
  To: Tom Herbert, Josef Bacik
  Cc: Eric Dumazet, Alexei Starovoitov, Laurent Chavey, Yuchung Cheng,
	Martin KaFai Lau, netdev, David S. Miller, Hannes Frederic Sowa,
	Steven Rostedt, Lawrence Brakmo, Kernel Team

On 12/15/14 17:01, Tom Herbert wrote:


>
> Maybe this would be good for discussion at netdev01?
>

Yes, it would be a good fit;
I just pinged Eric when I saw his email saying the same thing ;->

cheers,
jamal


* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
  2014-12-15 22:01     ` Tom Herbert
  2014-12-15 22:17       ` rapier
@ 2014-12-15 22:29       ` Steven Rostedt
  2014-12-15 23:28       ` Jamal Hadi Salim
  2 siblings, 0 replies; 29+ messages in thread
From: Steven Rostedt @ 2014-12-15 22:29 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Josef Bacik, Eric Dumazet, Alexei Starovoitov, Laurent Chavey,
	Yuchung Cheng, Martin KaFai Lau, netdev, David S. Miller,
	Hannes Frederic Sowa, Lawrence Brakmo, Kernel Team

On Mon, 15 Dec 2014 14:01:43 -0800
Tom Herbert <therbert@google.com> wrote:

> >
> > We've set up something for exactly this case at the end of January but have
> > yet to get a response from Google.  If any of the Google people cc'ed (or
> > really anybody, its not a strictly FB/Google thing) is interested please
> > email me directly and I'll send you the details, we will be meeting face to
> > face in the bay area at the end of January.  Thanks,
> >
> 
> Maybe this would be good for discussion at netdev01?

Is this something I should attend too? For this discussion, that is.
Weather permitting, Ottawa is only a 4 1/2 hour drive for me.

-- Steve


* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
  2014-12-15 22:01     ` Tom Herbert
@ 2014-12-15 22:17       ` rapier
  2014-12-15 22:29       ` Steven Rostedt
  2014-12-15 23:28       ` Jamal Hadi Salim
  2 siblings, 0 replies; 29+ messages in thread
From: rapier @ 2014-12-15 22:17 UTC (permalink / raw)
  To: Tom Herbert, Josef Bacik
  Cc: Eric Dumazet, Alexei Starovoitov, Laurent Chavey, Yuchung Cheng,
	Martin KaFai Lau, netdev, David S. Miller, Hannes Frederic Sowa,
	Steven Rostedt, Lawrence Brakmo, Kernel Team

The Web10g development team at PSC (we've been working with
a number of other organizations on this) will be submitting
the kernel instrument set tomorrow morning. We'd be happy to
join any discussion then.

Chris rapier

On 12/15/14, 5:01 PM, Tom Herbert wrote:
> On Mon, Dec 15, 2014 at 8:42 AM, Josef Bacik <jbacik@fb.com> wrote:
>> On 12/15/2014 11:03 AM, Eric Dumazet wrote:
>>>
>>> On Sun, 2014-12-14 at 22:55 -0800, Alexei Starovoitov wrote:
>>>
>>>> I think patches 1 and 3 are good additions, since they establish
>>>> few permanent points of instrumentation in tcp stack.
>>>> Patches 4-5 look more like use cases of tracepoints established
>>>> before. They may feel like simple additions and, no doubt,
>>>> they are useful, but since they expose things via tracing
>>>> infra they become part of api and cannot be changed later,
>>>> when more stats would be needed.
>>>> I think systemtap like scripting on top of patches 1 and 3
>>>> should solve your use case ?
>>>> Also, have you looked at recent eBPF work?
>>>> Though it's not completely ready yet, soon it should
>>>> be able to do the same stats collection as you have
>>>> in 4/5 without adding permanent pieces to the kernel.
>>>
>>>
>>> So it looks like web10g like interfaces are very often requested by
>>> various teams.
>>>
>>> And we have many different views on how to hack this. I am astonished by
>>> number of hacks I saw about this stuff going on.
>>>
>>> What about a clean way, extending current TCP_INFO, which is both
>>> available as a getsockopt() for socket owners and ss/iproute2
>>> information for 'external entities'
>>>
>>> If we consider web10g info needed, then adding a ftrace/eBPF like
>>> interface is simply yet another piece of code we need to maintain,
>>> and the argument of 'this should cost nothing if not activated' is
>>> nonsense since major players need to constantly monitor TCP metrics and
>>> behavior.
>>>
>>> It seems both FaceBook and Google are working on a subset of web10g.
>>>
>>> I suggest we meet together and establish a common ground, preferably
>>> after Christmas holidays.
>>>
>>
>> We've set up something for exactly this case at the end of January but have
>> yet to get a response from Google.  If any of the Google people cc'ed (or
>> really anybody, its not a strictly FB/Google thing) is interested please
>> email me directly and I'll send you the details, we will be meeting face to
>> face in the bay area at the end of January.  Thanks,
>>
>
> Maybe this would be good for discussion at netdev01?
>
>> Josef
>>
>>


* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
  2014-12-15 16:42   ` Josef Bacik
@ 2014-12-15 22:01     ` Tom Herbert
  2014-12-15 22:17       ` rapier
                         ` (2 more replies)
  2014-12-16 22:40     ` Jason Baron
  1 sibling, 3 replies; 29+ messages in thread
From: Tom Herbert @ 2014-12-15 22:01 UTC (permalink / raw)
  To: Josef Bacik
  Cc: Eric Dumazet, Alexei Starovoitov, Laurent Chavey, Yuchung Cheng,
	Martin KaFai Lau, netdev, David S. Miller, Hannes Frederic Sowa,
	Steven Rostedt, Lawrence Brakmo, Kernel Team

On Mon, Dec 15, 2014 at 8:42 AM, Josef Bacik <jbacik@fb.com> wrote:
> On 12/15/2014 11:03 AM, Eric Dumazet wrote:
>>
>> On Sun, 2014-12-14 at 22:55 -0800, Alexei Starovoitov wrote:
>>
>>> I think patches 1 and 3 are good additions, since they establish
>>> few permanent points of instrumentation in tcp stack.
>>> Patches 4-5 look more like use cases of tracepoints established
>>> before. They may feel like simple additions and, no doubt,
>>> they are useful, but since they expose things via tracing
>>> infra they become part of api and cannot be changed later,
>>> when more stats would be needed.
>>> I think systemtap like scripting on top of patches 1 and 3
>>> should solve your use case ?
>>> Also, have you looked at recent eBPF work?
>>> Though it's not completely ready yet, soon it should
>>> be able to do the same stats collection as you have
>>> in 4/5 without adding permanent pieces to the kernel.
>>
>>
>> So it looks like web10g like interfaces are very often requested by
>> various teams.
>>
>> And we have many different views on how to hack this. I am astonished by
>> number of hacks I saw about this stuff going on.
>>
>> What about a clean way, extending current TCP_INFO, which is both
>> available as a getsockopt() for socket owners and ss/iproute2
>> information for 'external entities'
>>
>> If we consider web10g info needed, then adding a ftrace/eBPF like
>> interface is simply yet another piece of code we need to maintain,
>> and the argument of 'this should cost nothing if not activated' is
>> nonsense since major players need to constantly monitor TCP metrics and
>> behavior.
>>
>> It seems both FaceBook and Google are working on a subset of web10g.
>>
>> I suggest we meet together and establish a common ground, preferably
>> after Christmas holidays.
>>
>
> We've set up something for exactly this case at the end of January but have
> yet to get a response from Google.  If any of the Google people cc'ed (or
> really anybody, its not a strictly FB/Google thing) is interested please
> email me directly and I'll send you the details, we will be meeting face to
> face in the bay area at the end of January.  Thanks,
>

Maybe this would be good for discussion at netdev01?

> Josef
>
>


* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
  2014-12-15 16:08   ` Blake Matheny
@ 2014-12-15 19:56     ` Yuchung Cheng
  2014-12-17 20:45       ` rapier
  2014-12-16 18:28     ` Martin Lau
  1 sibling, 1 reply; 29+ messages in thread
From: Yuchung Cheng @ 2014-12-15 19:56 UTC (permalink / raw)
  To: Blake Matheny
  Cc: Eric Dumazet, Alexei Starovoitov, Laurent Chavey, Martin Lau,
	netdev, David S. Miller, Hannes Frederic Sowa, Steven Rostedt,
	Lawrence Brakmo, Josef Bacik, Kernel Team

On Mon, Dec 15, 2014 at 8:08 AM, Blake Matheny <bmatheny@fb.com> wrote:
>
> We have an additional set of patches for web10g that builds on these
> tracepoints. It can be made to work either way, but I agree the idea of
> something like a sockopt would be really nice.

I'd like to compare these patches with tools that parse pcap files to
generate per-flow counters for RTTs, #dupacks, etc. What additional
value or insight do they provide for improving/debugging TCP
performance? Maybe an example?

IMO these stats give a general picture of how TCP behaves on a
specific network, but not enough to really nail specific bugs in the
TCP protocol or implementation, and SNMP stats or sampled pcap traces
with offline analysis can achieve the same purpose.
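
For comparison, the offline approach would look roughly like this
(a libpcap sketch, not a real tool: it assumes bare Ethernet + IPv4
and, for brevity, keeps one global last-ACK instead of per-flow state):

#include <stdio.h>
#include <arpa/inet.h>
#include <net/ethernet.h>
#include <netinet/ip.h>
#include <netinet/tcp.h>
#include <pcap/pcap.h>

static unsigned long dupacks;
static unsigned int last_ack;
static int have_ack;

static void per_pkt(u_char *user, const struct pcap_pkthdr *h,
		    const u_char *bytes)
{
	const struct ip *iph = (const struct ip *)(bytes + ETHER_HDR_LEN);
	const struct tcphdr *th;

	if (iph->ip_p != IPPROTO_TCP)
		return;
	th = (const struct tcphdr *)((const u_char *)iph + iph->ip_hl * 4);
	if (!th->ack || th->syn)
		return;
	/* a strict dupack check would also require no payload and an
	 * unchanged window; skipped here */
	if (have_ack && ntohl(th->ack_seq) == last_ack)
		dupacks++;
	last_ack = ntohl(th->ack_seq);
	have_ack = 1;
}

int main(int argc, char **argv)
{
	char errbuf[PCAP_ERRBUF_SIZE];
	pcap_t *p;

	if (argc != 2) {
		fprintf(stderr, "usage: %s file.pcap\n", argv[0]);
		return 1;
	}
	p = pcap_open_offline(argv[1], errbuf);
	if (!p) {
		fprintf(stderr, "pcap: %s\n", errbuf);
		return 1;
	}
	pcap_loop(p, -1, per_pkt, NULL);
	pcap_close(p);
	printf("dupacks: %lu\n", dupacks);
	return 0;
}

Real per-flow state (a hash keyed on the 4-tuple) and RTT matching are
straightforward to add on top of this.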

>
>
> -Blake
>
> On 12/15/14, 8:03 AM, "Eric Dumazet" <eric.dumazet@gmail.com> wrote:
>
> >On Sun, 2014-12-14 at 22:55 -0800, Alexei Starovoitov wrote:
> >
> >> I think patches 1 and 3 are good additions, since they establish
> >> a few permanent points of instrumentation in the tcp stack.
> >> Patches 4-5 look more like use cases of the tracepoints established
> >> before. They may feel like simple additions and, no doubt,
> >> they are useful, but since they expose things via the tracing
> >> infra they become part of the api and cannot be changed later
> >> when more stats are needed.
> >> I think systemtap-like scripting on top of patches 1 and 3
> >> should solve your use case?
> >> Also, have you looked at the recent eBPF work?
> >> Though it's not completely ready yet, it should soon
> >> be able to do the same stats collection as you have
> >> in 4/5 without adding permanent pieces to the kernel.
> >
> >So it looks like web10g-like interfaces are very often requested by
> >various teams.
> >
> >And we have many different views on how to hack this. I am astonished
> >by the number of hacks I have seen for this stuff.
> >
> >What about a clean way: extending the current TCP_INFO, which is
> >available both as a getsockopt() for socket owners and as ss/iproute2
> >information for 'external entities'?
> >
> >If we consider the web10g info needed, then adding an ftrace/eBPF-like
> >interface is simply yet another piece of code we need to maintain,
> >and the argument of 'this should cost nothing if not activated' is
> >nonsense, since major players need to constantly monitor TCP metrics
> >and behavior.
> >
> >It seems both Facebook and Google are working on a subset of web10g.
> >
> >I suggest we meet and establish common ground, preferably after the
> >Christmas holidays.
> >
> >Thanks
> >
> >
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
  2014-12-15 16:03 ` Eric Dumazet
  2014-12-15 16:08   ` Blake Matheny
@ 2014-12-15 16:42   ` Josef Bacik
  2014-12-15 22:01     ` Tom Herbert
  2014-12-16 22:40     ` Jason Baron
  1 sibling, 2 replies; 29+ messages in thread
From: Josef Bacik @ 2014-12-15 16:42 UTC (permalink / raw)
  To: Eric Dumazet, Alexei Starovoitov, Laurent Chavey, Yuchung Cheng
  Cc: Martin KaFai Lau, netdev, David S. Miller, Hannes Frederic Sowa,
	Steven Rostedt, Lawrence Brakmo, Kernel Team

On 12/15/2014 11:03 AM, Eric Dumazet wrote:
> On Sun, 2014-12-14 at 22:55 -0800, Alexei Starovoitov wrote:
>
>> I think patches 1 and 3 are good additions, since they establish
>> a few permanent points of instrumentation in the tcp stack.
>> Patches 4-5 look more like use cases of the tracepoints established
>> before. They may feel like simple additions and, no doubt,
>> they are useful, but since they expose things via the tracing
>> infra they become part of the api and cannot be changed later
>> when more stats are needed.
>> I think systemtap-like scripting on top of patches 1 and 3
>> should solve your use case?
>> Also, have you looked at the recent eBPF work?
>> Though it's not completely ready yet, it should soon
>> be able to do the same stats collection as you have
>> in 4/5 without adding permanent pieces to the kernel.
>
> So it looks like web10g-like interfaces are very often requested by
> various teams.
>
> And we have many different views on how to hack this. I am astonished
> by the number of hacks I have seen for this stuff.
>
> What about a clean way: extending the current TCP_INFO, which is
> available both as a getsockopt() for socket owners and as ss/iproute2
> information for 'external entities'?
>
> If we consider the web10g info needed, then adding an ftrace/eBPF-like
> interface is simply yet another piece of code we need to maintain,
> and the argument of 'this should cost nothing if not activated' is
> nonsense, since major players need to constantly monitor TCP metrics
> and behavior.
>
> It seems both Facebook and Google are working on a subset of web10g.
>
> I suggest we meet and establish common ground, preferably after the
> Christmas holidays.
>

We've set up something for exactly this purpose at the end of January
but have yet to get a response from Google.  If any of the Google
people cc'ed (or really anybody, it's not strictly an FB/Google thing)
is interested, please email me directly and I'll send you the details;
we will be meeting face to face in the bay area at the end of
January.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
  2014-12-15 16:03 ` Eric Dumazet
@ 2014-12-15 16:08   ` Blake Matheny
  2014-12-15 19:56     ` Yuchung Cheng
  2014-12-16 18:28     ` Martin Lau
  2014-12-15 16:42   ` Josef Bacik
  1 sibling, 2 replies; 29+ messages in thread
From: Blake Matheny @ 2014-12-15 16:08 UTC (permalink / raw)
  To: Eric Dumazet, Alexei Starovoitov, Laurent Chavey, Yuchung Cheng
  Cc: Martin Lau, netdev, David S. Miller, Hannes Frederic Sowa,
	Steven Rostedt, Lawrence Brakmo, Josef Bacik, Kernel Team

We have an additional set of patches for web10g that builds on these
tracepoints. It can be made to work either way, but I agree the idea of
something like a sockopt would be really nice.

-Blake

On 12/15/14, 8:03 AM, "Eric Dumazet" <eric.dumazet@gmail.com> wrote:

>On Sun, 2014-12-14 at 22:55 -0800, Alexei Starovoitov wrote:
>
>> I think patches 1 and 3 are good additions, since they establish
>> a few permanent points of instrumentation in the tcp stack.
>> Patches 4-5 look more like use cases of the tracepoints established
>> before. They may feel like simple additions and, no doubt,
>> they are useful, but since they expose things via the tracing
>> infra they become part of the api and cannot be changed later
>> when more stats are needed.
>> I think systemtap-like scripting on top of patches 1 and 3
>> should solve your use case?
>> Also, have you looked at the recent eBPF work?
>> Though it's not completely ready yet, it should soon
>> be able to do the same stats collection as you have
>> in 4/5 without adding permanent pieces to the kernel.
>
>So it looks like web10g-like interfaces are very often requested by
>various teams.
>
>And we have many different views on how to hack this. I am astonished
>by the number of hacks I have seen for this stuff.
>
>What about a clean way: extending the current TCP_INFO, which is
>available both as a getsockopt() for socket owners and as ss/iproute2
>information for 'external entities'?
>
>If we consider the web10g info needed, then adding an ftrace/eBPF-like
>interface is simply yet another piece of code we need to maintain,
>and the argument of 'this should cost nothing if not activated' is
>nonsense, since major players need to constantly monitor TCP metrics
>and behavior.
>
>It seems both Facebook and Google are working on a subset of web10g.
>
>I suggest we meet and establish common ground, preferably after the
>Christmas holidays.
>
>Thanks
>
>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
  2014-12-15  6:55 Alexei Starovoitov
@ 2014-12-15 16:03 ` Eric Dumazet
  2014-12-15 16:08   ` Blake Matheny
  2014-12-15 16:42   ` Josef Bacik
  2014-12-17 15:07 ` Arnaldo Carvalho de Melo
  1 sibling, 2 replies; 29+ messages in thread
From: Eric Dumazet @ 2014-12-15 16:03 UTC (permalink / raw)
  To: Alexei Starovoitov, Laurent Chavey, Yuchung Cheng
  Cc: Martin KaFai Lau, netdev, David S. Miller, Hannes Frederic Sowa,
	Steven Rostedt, Lawrence Brakmo, Josef Bacik, Kernel Team

On Sun, 2014-12-14 at 22:55 -0800, Alexei Starovoitov wrote:

> I think patches 1 and 3 are good additions, since they establish
> a few permanent points of instrumentation in the tcp stack.
> Patches 4-5 look more like use cases of the tracepoints established
> before. They may feel like simple additions and, no doubt,
> they are useful, but since they expose things via the tracing
> infra they become part of the api and cannot be changed later
> when more stats are needed.
> I think systemtap-like scripting on top of patches 1 and 3
> should solve your use case?
> Also, have you looked at the recent eBPF work?
> Though it's not completely ready yet, it should soon
> be able to do the same stats collection as you have
> in 4/5 without adding permanent pieces to the kernel.

So it looks like web10g-like interfaces are very often requested by
various teams.

And we have many different views on how to hack this. I am astonished
by the number of hacks I have seen for this stuff.

What about a clean way: extending the current TCP_INFO, which is
available both as a getsockopt() for socket owners and as ss/iproute2
information for 'external entities'?
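
For concreteness, a minimal sketch of the socket-owner side (purely
illustrative; error handling trimmed, and it only prints fields that
are already in today's struct tcp_info):

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Dump a few of the counters TCP_INFO already exposes; a web10g-style
 * extension would grow struct tcp_info (or add a sibling sockopt)
 * instead of adding a tracing interface. */
static void dump_tcp_info(int fd)
{
	struct tcp_info ti;
	socklen_t len = sizeof(ti);

	memset(&ti, 0, sizeof(ti));
	if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) < 0) {
		perror("getsockopt(TCP_INFO)");
		return;
	}
	printf("rtt=%uus rttvar=%uus cwnd=%u retrans=%u total_retrans=%u\n",
	       ti.tcpi_rtt, ti.tcpi_rttvar, ti.tcpi_snd_cwnd,
	       ti.tcpi_retrans, ti.tcpi_total_retrans);
}

The 'external entities' view of the same struct is essentially what
ss -ti already prints today via inet_diag.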

If we consider the web10g info needed, then adding an ftrace/eBPF-like
interface is simply yet another piece of code we need to maintain,
and the argument of 'this should cost nothing if not activated' is
nonsense, since major players need to constantly monitor TCP metrics
and behavior.

It seems both Facebook and Google are working on a subset of web10g.

I suggest we meet and establish common ground, preferably after the
Christmas holidays.

Thanks

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
@ 2014-12-15  6:55 Alexei Starovoitov
  2014-12-15 16:03 ` Eric Dumazet
  2014-12-17 15:07 ` Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 29+ messages in thread
From: Alexei Starovoitov @ 2014-12-15  6:55 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: netdev, David S. Miller, Hannes Frederic Sowa, Steven Rostedt,
	Lawrence Brakmo, Josef Bacik, Kernel Team

On Sun, Dec 14, 2014 at 5:56 PM, Martin KaFai Lau <kafai@fb.com> wrote:
> Hi,
>
> We have been using the kernel ftrace infra to collect TCP per-flow statistics.
> The following patch set is a first slimmed-down version of our
> existing implementation. We would like to get some early feedback
> and make it useful for others.
>
> [RFC PATCH net-next 1/5] tcp: Add TCP TRACE_EVENTs:
> Defines some basic tracepoints (via TRACE_EVENT).
>
> [RFC PATCH net-next 2/5] tcp: A perf script for TCP tracepoints:
> A sample perf script with simple ip/port filtering and summary output.
>
> [RFC PATCH net-next 3/5] tcp: Add a few more tracepoints for tcp tracer:
> Declares a few more tracepoints (via DECLARE_TRACE) which are
> used by the tcp_tracer.  The tcp_tracer itself is in patch 5/5.
>
> [RFC PATCH net-next 4/5] tcp: Introduce tcp_sk_trace and related structs:
> Defines a few tcp_trace structs which are used to collect statistics
> on each tcp_sock.
>
> [RFC PATCH net-next 5/5] tcp: Add TCP tracer:
> It introduces a tcp_tracer which hooks onto the tracepoints defined
> in patches 1/5 and 3/5.  It collects the data defined in patch 4/5.
> We currently use this tracer to collect per-flow statistics.  The
> commit log has some more details.

I think patches 1 and 3 are good additions, since they establish
a few permanent points of instrumentation in the tcp stack.
Patches 4-5 look more like use cases of the tracepoints established
before. They may feel like simple additions and, no doubt,
they are useful, but since they expose things via the tracing
infra they become part of the api and cannot be changed later
when more stats are needed.
I think systemtap-like scripting on top of patches 1 and 3
should solve your use case?
Also, have you looked at the recent eBPF work?
Though it's not completely ready yet, it should soon
be able to do the same stats collection as you have
in 4/5 without adding permanent pieces to the kernel.
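
To make the 'becomes part of the api' point concrete, here is a
hypothetical TRACE_EVENT (illustrative only, not from this series;
the usual TRACE_SYSTEM/CREATE_TRACE_POINTS boilerplate is omitted):

#include <linux/tracepoint.h>
#include <net/sock.h>

TRACE_EVENT(tcp_cwnd_change,	/* hypothetical event name */

	TP_PROTO(struct sock *sk, u32 snd_cwnd),

	TP_ARGS(sk, snd_cwnd),

	TP_STRUCT__entry(
		__field(void *, sk)
		__field(u32, snd_cwnd)
	),

	TP_fast_assign(
		__entry->sk = sk;
		__entry->snd_cwnd = snd_cwnd;
	),

	/* every field above plus this string is exported through the
	 * event's 'format' file under tracing/events/, so once tools
	 * start parsing it, it can no longer change */
	TP_printk("sk=%p snd_cwnd=%u", __entry->sk, __entry->snd_cwnd)
);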

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [RFC PATCH net-next 0/5] tcp: TCP tracer
@ 2014-12-15  1:56 Martin KaFai Lau
  0 siblings, 0 replies; 29+ messages in thread
From: Martin KaFai Lau @ 2014-12-15  1:56 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Hannes Frederic Sowa, Steven Rostedt,
	Lawrence Brakmo, Josef Bacik, Kernel Team

Hi,

We have been using the kernel ftrace infra to collect TCP per-flow statistics.
The following patch set is a first slimmed-down version of our
existing implementation. We would like to get some early feedback
and make it useful for others.

[RFC PATCH net-next 1/5] tcp: Add TCP TRACE_EVENTs:
Defines some basic tracepoints (via TRACE_EVENT).

[RFC PATCH net-next 2/5] tcp: A perf script for TCP tracepoints:
A sample perf script with simple ip/port filtering and summary output.

[RFC PATCH net-next 3/5] tcp: Add a few more tracepoints for tcp tracer:
Declares a few more tracepoints (via DECLARE_TRACE) which are
used by the tcp_tracer.  The tcp_tracer itself is in patch 5/5; a
minimal sketch of the DECLARE_TRACE pattern follows below.

[RFC PATCH net-next 4/5] tcp: Introduce tcp_sk_trace and related structs:
Defines a few tcp_trace structs which are used to collect statistics
on each tcp_sock.

[RFC PATCH net-next 5/5] tcp: Add TCP tracer:
It introduces a tcp_tracer which hooks onto the tracepoints defined in
patches 1/5 and 3/5.  It collects the data defined in patch 4/5.  We
currently use this tracer to collect per-flow statistics.  The commit
log has some more details.
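
For patch 3/5, the DECLARE_TRACE pattern looks roughly like this
(hypothetical names, illustrative only; such hooks carry no printf
format, so nothing is exposed to userspace and only an in-kernel
consumer such as the tcp_tracer can attach):

#include <linux/init.h>
#include <linux/tracepoint.h>
#include <net/sock.h>

/* In a header: */
DECLARE_TRACE(tcp_estats_update,
	TP_PROTO(struct sock *sk),
	TP_ARGS(sk));

/* In exactly one .c file of the stack: */
DEFINE_TRACE(tcp_estats_update);

/* In the tracer module: attach a probe and keep the stats in our
 * own per-tcp_sock structs instead of a tracing-ABI format. */
static void tcp_estats_probe(void *data, struct sock *sk)
{
	/* update the tcp_trace counters hanging off this socket */
}

static int __init tcp_tracer_init(void)
{
	return register_trace_tcp_estats_update(tcp_estats_probe, NULL);
}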

Thanks,
--Martin

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2014-12-19  1:43 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-17  3:06 [RFC PATCH net-next 0/5] tcp: TCP tracer Alexei Starovoitov
2014-12-17 21:42 ` Josef Bacik
2014-12-18 23:43   ` Lawrence Brakmo
2014-12-19  1:42     ` Yuchung Cheng
  -- strict thread matches above, loose matches on Subject: below --
2014-12-17 20:42 Alexei Starovoitov
2014-12-17 20:56 ` David Ahern
2014-12-17 21:24   ` Arnaldo Carvalho de Melo
2014-12-17 21:19 ` Arnaldo Carvalho de Melo
2014-12-17 17:14 Alexei Starovoitov
2014-12-17 19:51 ` Arnaldo Carvalho de Melo
2014-12-17  0:15 Alexei Starovoitov
2014-12-17  1:30 ` Martin Lau
2014-12-15  6:55 Alexei Starovoitov
2014-12-15 16:03 ` Eric Dumazet
2014-12-15 16:08   ` Blake Matheny
2014-12-15 19:56     ` Yuchung Cheng
2014-12-17 20:45       ` rapier
2014-12-16 18:28     ` Martin Lau
2014-12-15 16:42   ` Josef Bacik
2014-12-15 22:01     ` Tom Herbert
2014-12-15 22:17       ` rapier
2014-12-15 22:29       ` Steven Rostedt
2014-12-15 23:28       ` Jamal Hadi Salim
2014-12-15 23:40         ` Eric Dumazet
2014-12-16 22:40     ` Jason Baron
2014-12-16 22:45       ` David Miller
2014-12-16 22:50         ` Hannes Frederic Sowa
2014-12-17 15:07 ` Arnaldo Carvalho de Melo
2014-12-15  1:56 Martin KaFai Lau
