All of lore.kernel.org
 help / color / mirror / Atom feed
* [Patch net-next] tcp: add a tracepoint for tcp_retransmit_skb()
@ 2017-10-10  5:35 Cong Wang
  2017-10-10 11:21 ` Eric Dumazet
  2017-10-10 17:38 ` Alexei Starovoitov
  0 siblings, 2 replies; 12+ messages in thread
From: Cong Wang @ 2017-10-10  5:35 UTC (permalink / raw)
  To: netdev; +Cc: Cong Wang, Eric Dumazet, Yuchung Cheng, Neal Cardwell

We need a real-time notification for tcp retransmission
for monitoring.

Of course we could use ftrace to dynamically instrument this
kernel function too, however we can't retrieve the connection
information at the same time, for example perf-tools [1] reads
/proc/net/tcp for socket details, which is slow when we have
a lots of connections.

Therefore, this patch adds a tracepoint for tcp_retransmit_skb()
and exposes src/dst IP addresses and ports of the connection.
This also makes it easier to integrate into perf.

Note, I expose both IPv4 and IPv6 addresses at the same time:
for a IPv4 socket, v4 mapped address is used as IPv6 addresses,
for a IPv6 socket, LOOPBACK4_IPV6 is already filled by kernel.

Perhaps there are other interfaces to use (for example netlink),
but tracepoint is the quickest way I can think of.

1. https://github.com/brendangregg/perf-tools/blob/master/net/tcpretrans

Cc: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 include/trace/events/tcp.h | 63 ++++++++++++++++++++++++++++++++++++++++++++++
 net/core/net-traces.c      |  1 +
 net/ipv4/tcp_output.c      |  3 +++
 3 files changed, 67 insertions(+)
 create mode 100644 include/trace/events/tcp.h

diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
new file mode 100644
index 000000000000..cb22acc8aacd
--- /dev/null
+++ b/include/trace/events/tcp.h
@@ -0,0 +1,63 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM tcp
+
+#if !defined(_TRACE_TCP_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_TCP_H
+
+#include <linux/ipv6.h>
+#include <linux/tcp.h>
+#include <linux/tracepoint.h>
+#include <net/ipv6.h>
+
+TRACE_EVENT(tcp_retransmit_skb,
+
+	TP_PROTO(const struct sock *sk, struct sk_buff *skb, int segs),
+
+	TP_ARGS(sk, skb, segs),
+
+	TP_STRUCT__entry(
+		__field(__u16, sport)
+		__field(__u16, dport)
+		__array(__u8, saddr, 4)
+		__array(__u8, daddr, 4)
+		__array(__u8, saddr_v6, 16)
+		__array(__u8, daddr_v6, 16)
+	),
+
+	TP_fast_assign(
+		struct ipv6_pinfo *np = inet6_sk(sk);
+		struct inet_sock *inet = inet_sk(sk);
+		struct in6_addr *pin6;
+		__be32 *p32;
+
+		__entry->sport = ntohs(inet->inet_sport);
+		__entry->dport = ntohs(inet->inet_dport);
+
+		p32 = (__be32 *) __entry->saddr;
+		*p32 = inet->inet_saddr;
+
+		p32 = (__be32 *) __entry->daddr;
+		*p32 =  inet->inet_daddr;
+
+		if (np) {
+			pin6 = (struct in6_addr *)__entry->saddr_v6;
+			*pin6 = np->saddr;
+			pin6 = (struct in6_addr *)__entry->daddr_v6;
+			*pin6 = *(np->daddr_cache);
+		} else {
+			pin6 = (struct in6_addr *)__entry->saddr_v6;
+			ipv6_addr_set_v4mapped(inet->inet_saddr, pin6);
+			pin6 = (struct in6_addr *)__entry->daddr_v6;
+			ipv6_addr_set_v4mapped(inet->inet_daddr, pin6);
+		}
+	),
+
+	TP_printk("sport=%hu, dport=%hu, saddr=%pI4, daddr=%pI4, saddrv6=%pI6, daddrv6=%pI6",
+		  __entry->sport, __entry->dport, __entry->saddr, __entry->daddr,
+		  __entry->saddr_v6, __entry->daddr_v6)
+);
+
+#endif /* _TRACE_TCP_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/net/core/net-traces.c b/net/core/net-traces.c
index 1132820c8e62..f4e4fa2db505 100644
--- a/net/core/net-traces.c
+++ b/net/core/net-traces.c
@@ -31,6 +31,7 @@
 #include <trace/events/napi.h>
 #include <trace/events/sock.h>
 #include <trace/events/udp.h>
+#include <trace/events/tcp.h>
 #include <trace/events/fib.h>
 #include <trace/events/qdisc.h>
 #if IS_ENABLED(CONFIG_IPV6)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 696b0a168f16..e6d6e1393578 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -42,6 +42,8 @@
 #include <linux/gfp.h>
 #include <linux/module.h>
 
+#include <trace/events/tcp.h>
+
 /* People can turn this off for buggy TCP's found in printers etc. */
 int sysctl_tcp_retrans_collapse __read_mostly = 1;
 
@@ -2899,6 +2901,7 @@ int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs)
 		if (!tp->retrans_stamp)
 			tp->retrans_stamp = tcp_skb_timestamp(skb);
 
+		trace_tcp_retransmit_skb(sk, skb, segs);
 	}
 
 	if (tp->undo_retrans < 0)
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [Patch net-next] tcp: add a tracepoint for tcp_retransmit_skb()
  2017-10-10  5:35 [Patch net-next] tcp: add a tracepoint for tcp_retransmit_skb() Cong Wang
@ 2017-10-10 11:21 ` Eric Dumazet
  2017-10-10 17:16   ` Cong Wang
  2017-10-10 17:38 ` Alexei Starovoitov
  1 sibling, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2017-10-10 11:21 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, Yuchung Cheng, Neal Cardwell

On Mon, Oct 9, 2017 at 10:35 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> We need a real-time notification for tcp retransmission
> for monitoring.

Hi Cong

This seems to not be well defined.

For example, why a retransmission triggered by TLP is not traced ?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch net-next] tcp: add a tracepoint for tcp_retransmit_skb()
  2017-10-10 11:21 ` Eric Dumazet
@ 2017-10-10 17:16   ` Cong Wang
  0 siblings, 0 replies; 12+ messages in thread
From: Cong Wang @ 2017-10-10 17:16 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Yuchung Cheng, Neal Cardwell

On Tue, Oct 10, 2017 at 4:21 AM, Eric Dumazet <edumazet@google.com> wrote:
> On Mon, Oct 9, 2017 at 10:35 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> We need a real-time notification for tcp retransmission
>> for monitoring.
>
> Hi Cong
>
> This seems to not be well defined.
>
> For example, why a retransmission triggered by TLP is not traced ?

Oh, it should, seems I should just put the trace in __tcp_retransmit_skb()
instead...

I will update the patch after waiting for other feedbacks.

Thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch net-next] tcp: add a tracepoint for tcp_retransmit_skb()
  2017-10-10  5:35 [Patch net-next] tcp: add a tracepoint for tcp_retransmit_skb() Cong Wang
  2017-10-10 11:21 ` Eric Dumazet
@ 2017-10-10 17:38 ` Alexei Starovoitov
  2017-10-10 21:37   ` Cong Wang
  2017-10-10 21:58   ` Hannes Frederic Sowa
  1 sibling, 2 replies; 12+ messages in thread
From: Alexei Starovoitov @ 2017-10-10 17:38 UTC (permalink / raw)
  To: Cong Wang
  Cc: netdev, Eric Dumazet, Yuchung Cheng, Neal Cardwell,
	Martin KaFai Lau, Brendan Gregg, hannes, Song Liu

On Mon, Oct 09, 2017 at 10:35:47PM -0700, Cong Wang wrote:
> We need a real-time notification for tcp retransmission
> for monitoring.
> 
> Of course we could use ftrace to dynamically instrument this
> kernel function too, however we can't retrieve the connection
> information at the same time, for example perf-tools [1] reads
> /proc/net/tcp for socket details, which is slow when we have
> a lots of connections.
> 
> Therefore, this patch adds a tracepoint for tcp_retransmit_skb()
> and exposes src/dst IP addresses and ports of the connection.
> This also makes it easier to integrate into perf.
> 
> Note, I expose both IPv4 and IPv6 addresses at the same time:
> for a IPv4 socket, v4 mapped address is used as IPv6 addresses,
> for a IPv6 socket, LOOPBACK4_IPV6 is already filled by kernel.
> 
> Perhaps there are other interfaces to use (for example netlink),
> but tracepoint is the quickest way I can think of.
> 
> 1. https://github.com/brendangregg/perf-tools/blob/master/net/tcpretrans
> 
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---
>  include/trace/events/tcp.h | 63 ++++++++++++++++++++++++++++++++++++++++++++++
>  net/core/net-traces.c      |  1 +
>  net/ipv4/tcp_output.c      |  3 +++
>  3 files changed, 67 insertions(+)
>  create mode 100644 include/trace/events/tcp.h
> 
> diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
> new file mode 100644
> index 000000000000..cb22acc8aacd
> --- /dev/null
> +++ b/include/trace/events/tcp.h
> @@ -0,0 +1,63 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM tcp
> +
> +#if !defined(_TRACE_TCP_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_TCP_H
> +
> +#include <linux/ipv6.h>
> +#include <linux/tcp.h>
> +#include <linux/tracepoint.h>
> +#include <net/ipv6.h>
> +
> +TRACE_EVENT(tcp_retransmit_skb,
> +
> +	TP_PROTO(const struct sock *sk, struct sk_buff *skb, int segs),
> +
> +	TP_ARGS(sk, skb, segs),
> +
> +	TP_STRUCT__entry(
> +		__field(__u16, sport)
> +		__field(__u16, dport)
> +		__array(__u8, saddr, 4)
> +		__array(__u8, daddr, 4)
> +		__array(__u8, saddr_v6, 16)
> +		__array(__u8, daddr_v6, 16)
> +	),
> +
> +	TP_fast_assign(
> +		struct ipv6_pinfo *np = inet6_sk(sk);
> +		struct inet_sock *inet = inet_sk(sk);
> +		struct in6_addr *pin6;
> +		__be32 *p32;
> +
> +		__entry->sport = ntohs(inet->inet_sport);
> +		__entry->dport = ntohs(inet->inet_dport);
> +
> +		p32 = (__be32 *) __entry->saddr;
> +		*p32 = inet->inet_saddr;
> +
> +		p32 = (__be32 *) __entry->daddr;
> +		*p32 =  inet->inet_daddr;
> +
> +		if (np) {
> +			pin6 = (struct in6_addr *)__entry->saddr_v6;
> +			*pin6 = np->saddr;
> +			pin6 = (struct in6_addr *)__entry->daddr_v6;
> +			*pin6 = *(np->daddr_cache);
> +		} else {
> +			pin6 = (struct in6_addr *)__entry->saddr_v6;
> +			ipv6_addr_set_v4mapped(inet->inet_saddr, pin6);
> +			pin6 = (struct in6_addr *)__entry->daddr_v6;
> +			ipv6_addr_set_v4mapped(inet->inet_daddr, pin6);
> +		}
> +	),
> +
> +	TP_printk("sport=%hu, dport=%hu, saddr=%pI4, daddr=%pI4, saddrv6=%pI6, daddrv6=%pI6",
> +		  __entry->sport, __entry->dport, __entry->saddr, __entry->daddr,
> +		  __entry->saddr_v6, __entry->daddr_v6)
> +);
> +
> +#endif /* _TRACE_TCP_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>
> diff --git a/net/core/net-traces.c b/net/core/net-traces.c
> index 1132820c8e62..f4e4fa2db505 100644
> --- a/net/core/net-traces.c
> +++ b/net/core/net-traces.c
> @@ -31,6 +31,7 @@
>  #include <trace/events/napi.h>
>  #include <trace/events/sock.h>
>  #include <trace/events/udp.h>
> +#include <trace/events/tcp.h>
>  #include <trace/events/fib.h>
>  #include <trace/events/qdisc.h>
>  #if IS_ENABLED(CONFIG_IPV6)
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 696b0a168f16..e6d6e1393578 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -42,6 +42,8 @@
>  #include <linux/gfp.h>
>  #include <linux/module.h>
>  
> +#include <trace/events/tcp.h>
> +
>  /* People can turn this off for buggy TCP's found in printers etc. */
>  int sysctl_tcp_retrans_collapse __read_mostly = 1;
>  
> @@ -2899,6 +2901,7 @@ int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs)
>  		if (!tp->retrans_stamp)
>  			tp->retrans_stamp = tcp_skb_timestamp(skb);
>  
> +		trace_tcp_retransmit_skb(sk, skb, segs);

I'm happy to see new tracepoints being added to tcp stack, but I'm concerned
with practical usability of them.
Like the above tracepoint definition makes it not very useful from bpf point of view,
since 'sk' pointer is not recored by as part of the tracepoint.
In bpf/tracing world we prefer tracepoints to have raw pointers recorded
in TP_STRUCT__entry() and _not_ printed in TP_printk()
(since pointers are useless for userspace).
Like trace_kfree_skb() tracepoint records raw 'skb' pointer and we can
walk whatever sk_buff fields we need inside the program.
Such approach allows tracepoint to be usable in many more scenarios, since
bpf program can examine kernel datastructures.
Over the last few years we've been running tcp statistics framework (similar to web10g)
using 8 kprobes in tcp stack with bpf programs extracting the data and now we're
ready to share this experience with the community. Right now we're working on a set
of tracepoints for tcp stack to make the interface more accurate, faster and more stable.
We're planning to send an RFC patch with these new tracepoints in the comming weeks.

More concrete, if you can make this trace_tcp_retransmit_skb() to record
sk, skb pointers and err code at the end of __tcp_retransmit_skb() it will solve
our need as well.

So far our list of kprobes is:
int kprobe__tcp_validate_incoming
int kprobe__tcp_send_active_reset
int kprobe__tcp_v4_send_reset
int kprobe__tcp_v6_send_reset
int kprobe__tcp_v4_destroy_sock
int kprobe__tcp_set_state
int kprobe__tcp_retransmit_skb
int kprobe__tcp_rtx_synack

with tracepoints we can consolidate two of them into one and drop
another one for sure. Notice that tcp_retransmit_skb is on our list too
and currently we're doing extra work inside the program to make it more
accurate which will be unnecessary if this tracepoint is at the end
of __tcp_retransmit_skb().

Thanks

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch net-next] tcp: add a tracepoint for tcp_retransmit_skb()
  2017-10-10 17:38 ` Alexei Starovoitov
@ 2017-10-10 21:37   ` Cong Wang
  2017-10-11  2:53     ` Alexei Starovoitov
  2017-10-10 21:58   ` Hannes Frederic Sowa
  1 sibling, 1 reply; 12+ messages in thread
From: Cong Wang @ 2017-10-10 21:37 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Linux Kernel Network Developers, Eric Dumazet, Yuchung Cheng,
	Neal Cardwell, Martin KaFai Lau, Brendan Gregg,
	Hannes Frederic Sowa, Song Liu

On Tue, Oct 10, 2017 at 10:38 AM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> I'm happy to see new tracepoints being added to tcp stack, but I'm concerned
> with practical usability of them.
> Like the above tracepoint definition makes it not very useful from bpf point of view,
> since 'sk' pointer is not recored by as part of the tracepoint.
> In bpf/tracing world we prefer tracepoints to have raw pointers recorded
> in TP_STRUCT__entry() and _not_ printed in TP_printk()
> (since pointers are useless for userspace).
> Like trace_kfree_skb() tracepoint records raw 'skb' pointer and we can
> walk whatever sk_buff fields we need inside the program.
> Such approach allows tracepoint to be usable in many more scenarios, since
> bpf program can examine kernel datastructures.


Sure, I am happy to add them for BPF. The current version is merely
for our own use case, other use cases like this are always welcome!


> Over the last few years we've been running tcp statistics framework (similar to web10g)
> using 8 kprobes in tcp stack with bpf programs extracting the data and now we're
> ready to share this experience with the community. Right now we're working on a set
> of tracepoints for tcp stack to make the interface more accurate, faster and more stable.
> We're planning to send an RFC patch with these new tracepoints in the comming weeks.

Great! Looking forward to it!

>
> More concrete, if you can make this trace_tcp_retransmit_skb() to record
> sk, skb pointers and err code at the end of __tcp_retransmit_skb() it will solve
> our need as well.


Note, currently I only call trace_tcp_retransmit_skb() for successful
retransmissions, since you mentioned err code, I guess you want it
for failures too? I am not sure if tracing unsuccessful TCP retransmissions
is meaningful here, I guess it's needed for BPF to track TCP states?

It doesn't harm to add it, at least we can filter out err!=0 since we
only care about successful ones.


>
> So far our list of kprobes is:
> int kprobe__tcp_validate_incoming
> int kprobe__tcp_send_active_reset
> int kprobe__tcp_v4_send_reset
> int kprobe__tcp_v6_send_reset
> int kprobe__tcp_v4_destroy_sock
> int kprobe__tcp_set_state
> int kprobe__tcp_retransmit_skb
> int kprobe__tcp_rtx_synack
>
> with tracepoints we can consolidate two of them into one and drop
> another one for sure. Notice that tcp_retransmit_skb is on our list too
> and currently we're doing extra work inside the program to make it more
> accurate which will be unnecessary if this tracepoint is at the end
> of __tcp_retransmit_skb().

Yeah, with these tracepoints we would be able to trace more TCP
state changes.

Thanks!

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch net-next] tcp: add a tracepoint for tcp_retransmit_skb()
  2017-10-10 17:38 ` Alexei Starovoitov
  2017-10-10 21:37   ` Cong Wang
@ 2017-10-10 21:58   ` Hannes Frederic Sowa
  2017-10-11  2:56     ` Alexei Starovoitov
  1 sibling, 1 reply; 12+ messages in thread
From: Hannes Frederic Sowa @ 2017-10-10 21:58 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Cong Wang, netdev, Eric Dumazet, Yuchung Cheng, Neal Cardwell,
	Martin KaFai Lau, Brendan Gregg, Song Liu

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Mon, Oct 09, 2017 at 10:35:47PM -0700, Cong Wang wrote:

[...]

>> +		trace_tcp_retransmit_skb(sk, skb, segs);
>
> I'm happy to see new tracepoints being added to tcp stack, but I'm concerned
> with practical usability of them.
> Like the above tracepoint definition makes it not very useful from bpf point of view,
> since 'sk' pointer is not recored by as part of the tracepoint.
> In bpf/tracing world we prefer tracepoints to have raw pointers recorded
> in TP_STRUCT__entry() and _not_ printed in TP_printk()
> (since pointers are useless for userspace).

Ack.

Also could the TP_printk also use the socket cookies so they can get
associated with netlink dumps and as such also be associated to user
space processes? It could help against races while trying to associate
the socket with a process. ss already supports dumping those cookies
with -e.

The corresponding commit would be:

commit 33cf7c90fe2f97afb1cadaa0cfb782cb9d1b9ee2
Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Mar 11 18:53:14 2015 -0700

    net: add real socket cookies

Right now they only get set when needed but as Eric already mentioned in
his commit log, this could be refined.

[...]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch net-next] tcp: add a tracepoint for tcp_retransmit_skb()
  2017-10-10 21:37   ` Cong Wang
@ 2017-10-11  2:53     ` Alexei Starovoitov
  0 siblings, 0 replies; 12+ messages in thread
From: Alexei Starovoitov @ 2017-10-11  2:53 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, Eric Dumazet, Yuchung Cheng,
	Neal Cardwell, Martin KaFai Lau, Brendan Gregg,
	Hannes Frederic Sowa, Song Liu

On Tue, Oct 10, 2017 at 02:37:11PM -0700, Cong Wang wrote:
> >
> > More concrete, if you can make this trace_tcp_retransmit_skb() to record
> > sk, skb pointers and err code at the end of __tcp_retransmit_skb() it will solve
> > our need as well.
> 
> 
> Note, currently I only call trace_tcp_retransmit_skb() for successful
> retransmissions, since you mentioned err code, I guess you want it
> for failures too? I am not sure if tracing unsuccessful TCP retransmissions
> is meaningful here, I guess it's needed for BPF to track TCP states?
>
> It doesn't harm to add it, at least we can filter out err!=0 since we
> only care about successful ones.

right now only successful rxmit would be enough for us.
Only that 'err' is hard to do via kprobe, since it's in some random
register and debug info is generally not available.
If you want to drop err for now and call tracepoint only
on success, I think, that's fine too. Need to double check.
Only sk and skb pointers are must have.

Thanks!

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch net-next] tcp: add a tracepoint for tcp_retransmit_skb()
  2017-10-10 21:58   ` Hannes Frederic Sowa
@ 2017-10-11  2:56     ` Alexei Starovoitov
  2017-10-11 13:31       ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Alexei Starovoitov @ 2017-10-11  2:56 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Cong Wang, netdev, Eric Dumazet, Yuchung Cheng, Neal Cardwell,
	Martin KaFai Lau, Brendan Gregg, Song Liu

On Tue, Oct 10, 2017 at 11:58:53PM +0200, Hannes Frederic Sowa wrote:
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> 
> > On Mon, Oct 09, 2017 at 10:35:47PM -0700, Cong Wang wrote:
> 
> [...]
> 
> >> +		trace_tcp_retransmit_skb(sk, skb, segs);
> >
> > I'm happy to see new tracepoints being added to tcp stack, but I'm concerned
> > with practical usability of them.
> > Like the above tracepoint definition makes it not very useful from bpf point of view,
> > since 'sk' pointer is not recored by as part of the tracepoint.
> > In bpf/tracing world we prefer tracepoints to have raw pointers recorded
> > in TP_STRUCT__entry() and _not_ printed in TP_printk()
> > (since pointers are useless for userspace).
> 
> Ack.
> 
> Also could the TP_printk also use the socket cookies so they can get
> associated with netlink dumps and as such also be associated to user
> space processes? It could help against races while trying to associate
> the socket with a process. ss already supports dumping those cookies
> with -e.

makes sense to me.

> The corresponding commit would be:
> 
> commit 33cf7c90fe2f97afb1cadaa0cfb782cb9d1b9ee2
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Wed Mar 11 18:53:14 2015 -0700
> 
>     net: add real socket cookies
> 
> Right now they only get set when needed but as Eric already mentioned in
> his commit log, this could be refined.

actually we hit that too for completely different tracing use case.
Indeed would be good to generate socket cookie unconditionally
for all sockets. I don't think there is any harm.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch net-next] tcp: add a tracepoint for tcp_retransmit_skb()
  2017-10-11  2:56     ` Alexei Starovoitov
@ 2017-10-11 13:31       ` Eric Dumazet
  2017-10-11 18:21         ` Alexei Starovoitov
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2017-10-11 13:31 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Hannes Frederic Sowa, Cong Wang, netdev, Eric Dumazet,
	Yuchung Cheng, Neal Cardwell, Martin KaFai Lau, Brendan Gregg,
	Song Liu

On Tue, 2017-10-10 at 19:56 -0700, Alexei Starovoitov wrote:

> actually we hit that too for completely different tracing use case.
> Indeed would be good to generate socket cookie unconditionally
> for all sockets. I don't think there is any harm.
> 

Simply call sock_gen_cookie() when needed.

If a tracepoint needs the cookie and the cookie was not yet generated,
it will be generated at this point.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch net-next] tcp: add a tracepoint for tcp_retransmit_skb()
  2017-10-11 13:31       ` Eric Dumazet
@ 2017-10-11 18:21         ` Alexei Starovoitov
  2017-10-11 18:38           ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Alexei Starovoitov @ 2017-10-11 18:21 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Hannes Frederic Sowa, Cong Wang, netdev, Eric Dumazet,
	Yuchung Cheng, Neal Cardwell, Martin KaFai Lau, Brendan Gregg,
	Song Liu

On Wed, Oct 11, 2017 at 06:31:45AM -0700, Eric Dumazet wrote:
> On Tue, 2017-10-10 at 19:56 -0700, Alexei Starovoitov wrote:
> 
> > actually we hit that too for completely different tracing use case.
> > Indeed would be good to generate socket cookie unconditionally
> > for all sockets. I don't think there is any harm.
> > 
> 
> Simply call sock_gen_cookie() when needed.
> 
> If a tracepoint needs the cookie and the cookie was not yet generated,
> it will be generated at this point.

we already have bpf_get_socket_cookie() that will call it,
but this helper is for bpf socket filters, clsact and other
networking related prog types, whereas all of tracing is
read-only and side-effect-free, so we cannot use
bpf_get_socket_cookie() there.
Hence for tracing in kprobe-bpf we use raw sk pointer
as map key and full tuple when passing the socket info to user
space. If we could use socket cookie vs full tuple it would
make a nice difference.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch net-next] tcp: add a tracepoint for tcp_retransmit_skb()
  2017-10-11 18:21         ` Alexei Starovoitov
@ 2017-10-11 18:38           ` Eric Dumazet
  2017-10-12  9:39             ` Hannes Frederic Sowa
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2017-10-11 18:38 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Hannes Frederic Sowa, Cong Wang, netdev, Eric Dumazet,
	Yuchung Cheng, Neal Cardwell, Martin KaFai Lau, Brendan Gregg,
	Song Liu

On Wed, 2017-10-11 at 11:21 -0700, Alexei Starovoitov wrote:
> On Wed, Oct 11, 2017 at 06:31:45AM -0700, Eric Dumazet wrote:
> > On Tue, 2017-10-10 at 19:56 -0700, Alexei Starovoitov wrote:
> > 
> > > actually we hit that too for completely different tracing use case.
> > > Indeed would be good to generate socket cookie unconditionally
> > > for all sockets. I don't think there is any harm.
> > > 
> > 
> > Simply call sock_gen_cookie() when needed.
> > 
> > If a tracepoint needs the cookie and the cookie was not yet generated,
> > it will be generated at this point.
> 
> we already have bpf_get_socket_cookie() that will call it,
> but this helper is for bpf socket filters, clsact and other
> networking related prog types, whereas all of tracing is
> read-only and side-effect-free, so we cannot use
> bpf_get_socket_cookie() there.
> Hence for tracing in kprobe-bpf we use raw sk pointer
> as map key and full tuple when passing the socket info to user
> space. If we could use socket cookie vs full tuple it would
> make a nice difference.

Since this sock_gen_cookie() is lock-free and IRQ ready, it should be
not be a problem to pretend it works with a const socket.

I am a bit unsure about revealing in socket cookie a precise count of
sockets created on a netns. Some attackers might use this in a side
channel attack.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch net-next] tcp: add a tracepoint for tcp_retransmit_skb()
  2017-10-11 18:38           ` Eric Dumazet
@ 2017-10-12  9:39             ` Hannes Frederic Sowa
  0 siblings, 0 replies; 12+ messages in thread
From: Hannes Frederic Sowa @ 2017-10-12  9:39 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexei Starovoitov, Cong Wang, netdev, Eric Dumazet,
	Yuchung Cheng, Neal Cardwell, Martin KaFai Lau, Brendan Gregg,
	Song Liu

Eric Dumazet <eric.dumazet@gmail.com> writes:

[...]

> Since this sock_gen_cookie() is lock-free and IRQ ready, it should be
> not be a problem to pretend it works with a const socket.
>
> I am a bit unsure about revealing in socket cookie a precise count of
> sockets created on a netns. Some attackers might use this in a side
> channel attack.

That is true. We expose this information already via the inode number
allocator for sockets. It is a bit imprecise because of using CPU
batches.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-10-12  9:40 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-10  5:35 [Patch net-next] tcp: add a tracepoint for tcp_retransmit_skb() Cong Wang
2017-10-10 11:21 ` Eric Dumazet
2017-10-10 17:16   ` Cong Wang
2017-10-10 17:38 ` Alexei Starovoitov
2017-10-10 21:37   ` Cong Wang
2017-10-11  2:53     ` Alexei Starovoitov
2017-10-10 21:58   ` Hannes Frederic Sowa
2017-10-11  2:56     ` Alexei Starovoitov
2017-10-11 13:31       ` Eric Dumazet
2017-10-11 18:21         ` Alexei Starovoitov
2017-10-11 18:38           ` Eric Dumazet
2017-10-12  9:39             ` Hannes Frederic Sowa

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.