* [Patch net-next] tcp: add a tracepoint for tcp_retransmit_skb() @ 2017-10-10 5:35 Cong Wang 2017-10-10 11:21 ` Eric Dumazet 2017-10-10 17:38 ` Alexei Starovoitov 0 siblings, 2 replies; 12+ messages in thread From: Cong Wang @ 2017-10-10 5:35 UTC (permalink / raw) To: netdev; +Cc: Cong Wang, Eric Dumazet, Yuchung Cheng, Neal Cardwell We need a real-time notification for tcp retransmission for monitoring. Of course we could use ftrace to dynamically instrument this kernel function too, however we can't retrieve the connection information at the same time, for example perf-tools [1] reads /proc/net/tcp for socket details, which is slow when we have a lots of connections. Therefore, this patch adds a tracepoint for tcp_retransmit_skb() and exposes src/dst IP addresses and ports of the connection. This also makes it easier to integrate into perf. Note, I expose both IPv4 and IPv6 addresses at the same time: for a IPv4 socket, v4 mapped address is used as IPv6 addresses, for a IPv6 socket, LOOPBACK4_IPV6 is already filled by kernel. Perhaps there are other interfaces to use (for example netlink), but tracepoint is the quickest way I can think of. 1. https://github.com/brendangregg/perf-tools/blob/master/net/tcpretrans Cc: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> --- include/trace/events/tcp.h | 63 ++++++++++++++++++++++++++++++++++++++++++++++ net/core/net-traces.c | 1 + net/ipv4/tcp_output.c | 3 +++ 3 files changed, 67 insertions(+) create mode 100644 include/trace/events/tcp.h diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h new file mode 100644 index 000000000000..cb22acc8aacd --- /dev/null +++ b/include/trace/events/tcp.h @@ -0,0 +1,63 @@ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM tcp + +#if !defined(_TRACE_TCP_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_TCP_H + +#include <linux/ipv6.h> +#include <linux/tcp.h> +#include <linux/tracepoint.h> +#include <net/ipv6.h> + +TRACE_EVENT(tcp_retransmit_skb, + + TP_PROTO(const struct sock *sk, struct sk_buff *skb, int segs), + + TP_ARGS(sk, skb, segs), + + TP_STRUCT__entry( + __field(__u16, sport) + __field(__u16, dport) + __array(__u8, saddr, 4) + __array(__u8, daddr, 4) + __array(__u8, saddr_v6, 16) + __array(__u8, daddr_v6, 16) + ), + + TP_fast_assign( + struct ipv6_pinfo *np = inet6_sk(sk); + struct inet_sock *inet = inet_sk(sk); + struct in6_addr *pin6; + __be32 *p32; + + __entry->sport = ntohs(inet->inet_sport); + __entry->dport = ntohs(inet->inet_dport); + + p32 = (__be32 *) __entry->saddr; + *p32 = inet->inet_saddr; + + p32 = (__be32 *) __entry->daddr; + *p32 = inet->inet_daddr; + + if (np) { + pin6 = (struct in6_addr *)__entry->saddr_v6; + *pin6 = np->saddr; + pin6 = (struct in6_addr *)__entry->daddr_v6; + *pin6 = *(np->daddr_cache); + } else { + pin6 = (struct in6_addr *)__entry->saddr_v6; + ipv6_addr_set_v4mapped(inet->inet_saddr, pin6); + pin6 = (struct in6_addr *)__entry->daddr_v6; + ipv6_addr_set_v4mapped(inet->inet_daddr, pin6); + } + ), + + TP_printk("sport=%hu, dport=%hu, saddr=%pI4, daddr=%pI4, saddrv6=%pI6, daddrv6=%pI6", + __entry->sport, __entry->dport, __entry->saddr, __entry->daddr, + __entry->saddr_v6, __entry->daddr_v6) +); + +#endif /* _TRACE_TCP_H */ + +/* This part must be outside protection */ +#include <trace/define_trace.h> diff --git a/net/core/net-traces.c b/net/core/net-traces.c index 1132820c8e62..f4e4fa2db505 100644 --- a/net/core/net-traces.c +++ b/net/core/net-traces.c @@ -31,6 +31,7 @@ #include <trace/events/napi.h> #include <trace/events/sock.h> #include <trace/events/udp.h> +#include <trace/events/tcp.h> #include <trace/events/fib.h> #include <trace/events/qdisc.h> #if IS_ENABLED(CONFIG_IPV6) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 696b0a168f16..e6d6e1393578 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -42,6 +42,8 @@ #include <linux/gfp.h> #include <linux/module.h> +#include <trace/events/tcp.h> + /* People can turn this off for buggy TCP's found in printers etc. */ int sysctl_tcp_retrans_collapse __read_mostly = 1; @@ -2899,6 +2901,7 @@ int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs) if (!tp->retrans_stamp) tp->retrans_stamp = tcp_skb_timestamp(skb); + trace_tcp_retransmit_skb(sk, skb, segs); } if (tp->undo_retrans < 0) -- 2.13.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [Patch net-next] tcp: add a tracepoint for tcp_retransmit_skb() 2017-10-10 5:35 [Patch net-next] tcp: add a tracepoint for tcp_retransmit_skb() Cong Wang @ 2017-10-10 11:21 ` Eric Dumazet 2017-10-10 17:16 ` Cong Wang 2017-10-10 17:38 ` Alexei Starovoitov 1 sibling, 1 reply; 12+ messages in thread From: Eric Dumazet @ 2017-10-10 11:21 UTC (permalink / raw) To: Cong Wang; +Cc: netdev, Yuchung Cheng, Neal Cardwell On Mon, Oct 9, 2017 at 10:35 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote: > We need a real-time notification for tcp retransmission > for monitoring. Hi Cong This seems to not be well defined. For example, why a retransmission triggered by TLP is not traced ? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch net-next] tcp: add a tracepoint for tcp_retransmit_skb() 2017-10-10 11:21 ` Eric Dumazet @ 2017-10-10 17:16 ` Cong Wang 0 siblings, 0 replies; 12+ messages in thread From: Cong Wang @ 2017-10-10 17:16 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev, Yuchung Cheng, Neal Cardwell On Tue, Oct 10, 2017 at 4:21 AM, Eric Dumazet <edumazet@google.com> wrote: > On Mon, Oct 9, 2017 at 10:35 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote: >> We need a real-time notification for tcp retransmission >> for monitoring. > > Hi Cong > > This seems to not be well defined. > > For example, why a retransmission triggered by TLP is not traced ? Oh, it should, seems I should just put the trace in __tcp_retransmit_skb() instead... I will update the patch after waiting for other feedbacks. Thanks. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch net-next] tcp: add a tracepoint for tcp_retransmit_skb() 2017-10-10 5:35 [Patch net-next] tcp: add a tracepoint for tcp_retransmit_skb() Cong Wang 2017-10-10 11:21 ` Eric Dumazet @ 2017-10-10 17:38 ` Alexei Starovoitov 2017-10-10 21:37 ` Cong Wang 2017-10-10 21:58 ` Hannes Frederic Sowa 1 sibling, 2 replies; 12+ messages in thread From: Alexei Starovoitov @ 2017-10-10 17:38 UTC (permalink / raw) To: Cong Wang Cc: netdev, Eric Dumazet, Yuchung Cheng, Neal Cardwell, Martin KaFai Lau, Brendan Gregg, hannes, Song Liu On Mon, Oct 09, 2017 at 10:35:47PM -0700, Cong Wang wrote: > We need a real-time notification for tcp retransmission > for monitoring. > > Of course we could use ftrace to dynamically instrument this > kernel function too, however we can't retrieve the connection > information at the same time, for example perf-tools [1] reads > /proc/net/tcp for socket details, which is slow when we have > a lots of connections. > > Therefore, this patch adds a tracepoint for tcp_retransmit_skb() > and exposes src/dst IP addresses and ports of the connection. > This also makes it easier to integrate into perf. > > Note, I expose both IPv4 and IPv6 addresses at the same time: > for a IPv4 socket, v4 mapped address is used as IPv6 addresses, > for a IPv6 socket, LOOPBACK4_IPV6 is already filled by kernel. > > Perhaps there are other interfaces to use (for example netlink), > but tracepoint is the quickest way I can think of. > > 1. https://github.com/brendangregg/perf-tools/blob/master/net/tcpretrans > > Cc: Eric Dumazet <edumazet@google.com> > Cc: Yuchung Cheng <ycheng@google.com> > Cc: Neal Cardwell <ncardwell@google.com> > Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> > --- > include/trace/events/tcp.h | 63 ++++++++++++++++++++++++++++++++++++++++++++++ > net/core/net-traces.c | 1 + > net/ipv4/tcp_output.c | 3 +++ > 3 files changed, 67 insertions(+) > create mode 100644 include/trace/events/tcp.h > > diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h > new file mode 100644 > index 000000000000..cb22acc8aacd > --- /dev/null > +++ b/include/trace/events/tcp.h > @@ -0,0 +1,63 @@ > +#undef TRACE_SYSTEM > +#define TRACE_SYSTEM tcp > + > +#if !defined(_TRACE_TCP_H) || defined(TRACE_HEADER_MULTI_READ) > +#define _TRACE_TCP_H > + > +#include <linux/ipv6.h> > +#include <linux/tcp.h> > +#include <linux/tracepoint.h> > +#include <net/ipv6.h> > + > +TRACE_EVENT(tcp_retransmit_skb, > + > + TP_PROTO(const struct sock *sk, struct sk_buff *skb, int segs), > + > + TP_ARGS(sk, skb, segs), > + > + TP_STRUCT__entry( > + __field(__u16, sport) > + __field(__u16, dport) > + __array(__u8, saddr, 4) > + __array(__u8, daddr, 4) > + __array(__u8, saddr_v6, 16) > + __array(__u8, daddr_v6, 16) > + ), > + > + TP_fast_assign( > + struct ipv6_pinfo *np = inet6_sk(sk); > + struct inet_sock *inet = inet_sk(sk); > + struct in6_addr *pin6; > + __be32 *p32; > + > + __entry->sport = ntohs(inet->inet_sport); > + __entry->dport = ntohs(inet->inet_dport); > + > + p32 = (__be32 *) __entry->saddr; > + *p32 = inet->inet_saddr; > + > + p32 = (__be32 *) __entry->daddr; > + *p32 = inet->inet_daddr; > + > + if (np) { > + pin6 = (struct in6_addr *)__entry->saddr_v6; > + *pin6 = np->saddr; > + pin6 = (struct in6_addr *)__entry->daddr_v6; > + *pin6 = *(np->daddr_cache); > + } else { > + pin6 = (struct in6_addr *)__entry->saddr_v6; > + ipv6_addr_set_v4mapped(inet->inet_saddr, pin6); > + pin6 = (struct in6_addr *)__entry->daddr_v6; > + ipv6_addr_set_v4mapped(inet->inet_daddr, pin6); > + } > + ), > + > + TP_printk("sport=%hu, dport=%hu, saddr=%pI4, daddr=%pI4, saddrv6=%pI6, daddrv6=%pI6", > + __entry->sport, __entry->dport, __entry->saddr, __entry->daddr, > + __entry->saddr_v6, __entry->daddr_v6) > +); > + > +#endif /* _TRACE_TCP_H */ > + > +/* This part must be outside protection */ > +#include <trace/define_trace.h> > diff --git a/net/core/net-traces.c b/net/core/net-traces.c > index 1132820c8e62..f4e4fa2db505 100644 > --- a/net/core/net-traces.c > +++ b/net/core/net-traces.c > @@ -31,6 +31,7 @@ > #include <trace/events/napi.h> > #include <trace/events/sock.h> > #include <trace/events/udp.h> > +#include <trace/events/tcp.h> > #include <trace/events/fib.h> > #include <trace/events/qdisc.h> > #if IS_ENABLED(CONFIG_IPV6) > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > index 696b0a168f16..e6d6e1393578 100644 > --- a/net/ipv4/tcp_output.c > +++ b/net/ipv4/tcp_output.c > @@ -42,6 +42,8 @@ > #include <linux/gfp.h> > #include <linux/module.h> > > +#include <trace/events/tcp.h> > + > /* People can turn this off for buggy TCP's found in printers etc. */ > int sysctl_tcp_retrans_collapse __read_mostly = 1; > > @@ -2899,6 +2901,7 @@ int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs) > if (!tp->retrans_stamp) > tp->retrans_stamp = tcp_skb_timestamp(skb); > > + trace_tcp_retransmit_skb(sk, skb, segs); I'm happy to see new tracepoints being added to tcp stack, but I'm concerned with practical usability of them. Like the above tracepoint definition makes it not very useful from bpf point of view, since 'sk' pointer is not recored by as part of the tracepoint. In bpf/tracing world we prefer tracepoints to have raw pointers recorded in TP_STRUCT__entry() and _not_ printed in TP_printk() (since pointers are useless for userspace). Like trace_kfree_skb() tracepoint records raw 'skb' pointer and we can walk whatever sk_buff fields we need inside the program. Such approach allows tracepoint to be usable in many more scenarios, since bpf program can examine kernel datastructures. Over the last few years we've been running tcp statistics framework (similar to web10g) using 8 kprobes in tcp stack with bpf programs extracting the data and now we're ready to share this experience with the community. Right now we're working on a set of tracepoints for tcp stack to make the interface more accurate, faster and more stable. We're planning to send an RFC patch with these new tracepoints in the comming weeks. More concrete, if you can make this trace_tcp_retransmit_skb() to record sk, skb pointers and err code at the end of __tcp_retransmit_skb() it will solve our need as well. So far our list of kprobes is: int kprobe__tcp_validate_incoming int kprobe__tcp_send_active_reset int kprobe__tcp_v4_send_reset int kprobe__tcp_v6_send_reset int kprobe__tcp_v4_destroy_sock int kprobe__tcp_set_state int kprobe__tcp_retransmit_skb int kprobe__tcp_rtx_synack with tracepoints we can consolidate two of them into one and drop another one for sure. Notice that tcp_retransmit_skb is on our list too and currently we're doing extra work inside the program to make it more accurate which will be unnecessary if this tracepoint is at the end of __tcp_retransmit_skb(). Thanks ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch net-next] tcp: add a tracepoint for tcp_retransmit_skb() 2017-10-10 17:38 ` Alexei Starovoitov @ 2017-10-10 21:37 ` Cong Wang 2017-10-11 2:53 ` Alexei Starovoitov 2017-10-10 21:58 ` Hannes Frederic Sowa 1 sibling, 1 reply; 12+ messages in thread From: Cong Wang @ 2017-10-10 21:37 UTC (permalink / raw) To: Alexei Starovoitov Cc: Linux Kernel Network Developers, Eric Dumazet, Yuchung Cheng, Neal Cardwell, Martin KaFai Lau, Brendan Gregg, Hannes Frederic Sowa, Song Liu On Tue, Oct 10, 2017 at 10:38 AM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > I'm happy to see new tracepoints being added to tcp stack, but I'm concerned > with practical usability of them. > Like the above tracepoint definition makes it not very useful from bpf point of view, > since 'sk' pointer is not recored by as part of the tracepoint. > In bpf/tracing world we prefer tracepoints to have raw pointers recorded > in TP_STRUCT__entry() and _not_ printed in TP_printk() > (since pointers are useless for userspace). > Like trace_kfree_skb() tracepoint records raw 'skb' pointer and we can > walk whatever sk_buff fields we need inside the program. > Such approach allows tracepoint to be usable in many more scenarios, since > bpf program can examine kernel datastructures. Sure, I am happy to add them for BPF. The current version is merely for our own use case, other use cases like this are always welcome! > Over the last few years we've been running tcp statistics framework (similar to web10g) > using 8 kprobes in tcp stack with bpf programs extracting the data and now we're > ready to share this experience with the community. Right now we're working on a set > of tracepoints for tcp stack to make the interface more accurate, faster and more stable. > We're planning to send an RFC patch with these new tracepoints in the comming weeks. Great! Looking forward to it! > > More concrete, if you can make this trace_tcp_retransmit_skb() to record > sk, skb pointers and err code at the end of __tcp_retransmit_skb() it will solve > our need as well. Note, currently I only call trace_tcp_retransmit_skb() for successful retransmissions, since you mentioned err code, I guess you want it for failures too? I am not sure if tracing unsuccessful TCP retransmissions is meaningful here, I guess it's needed for BPF to track TCP states? It doesn't harm to add it, at least we can filter out err!=0 since we only care about successful ones. > > So far our list of kprobes is: > int kprobe__tcp_validate_incoming > int kprobe__tcp_send_active_reset > int kprobe__tcp_v4_send_reset > int kprobe__tcp_v6_send_reset > int kprobe__tcp_v4_destroy_sock > int kprobe__tcp_set_state > int kprobe__tcp_retransmit_skb > int kprobe__tcp_rtx_synack > > with tracepoints we can consolidate two of them into one and drop > another one for sure. Notice that tcp_retransmit_skb is on our list too > and currently we're doing extra work inside the program to make it more > accurate which will be unnecessary if this tracepoint is at the end > of __tcp_retransmit_skb(). Yeah, with these tracepoints we would be able to trace more TCP state changes. Thanks! ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch net-next] tcp: add a tracepoint for tcp_retransmit_skb() 2017-10-10 21:37 ` Cong Wang @ 2017-10-11 2:53 ` Alexei Starovoitov 0 siblings, 0 replies; 12+ messages in thread From: Alexei Starovoitov @ 2017-10-11 2:53 UTC (permalink / raw) To: Cong Wang Cc: Linux Kernel Network Developers, Eric Dumazet, Yuchung Cheng, Neal Cardwell, Martin KaFai Lau, Brendan Gregg, Hannes Frederic Sowa, Song Liu On Tue, Oct 10, 2017 at 02:37:11PM -0700, Cong Wang wrote: > > > > More concrete, if you can make this trace_tcp_retransmit_skb() to record > > sk, skb pointers and err code at the end of __tcp_retransmit_skb() it will solve > > our need as well. > > > Note, currently I only call trace_tcp_retransmit_skb() for successful > retransmissions, since you mentioned err code, I guess you want it > for failures too? I am not sure if tracing unsuccessful TCP retransmissions > is meaningful here, I guess it's needed for BPF to track TCP states? > > It doesn't harm to add it, at least we can filter out err!=0 since we > only care about successful ones. right now only successful rxmit would be enough for us. Only that 'err' is hard to do via kprobe, since it's in some random register and debug info is generally not available. If you want to drop err for now and call tracepoint only on success, I think, that's fine too. Need to double check. Only sk and skb pointers are must have. Thanks! ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch net-next] tcp: add a tracepoint for tcp_retransmit_skb() 2017-10-10 17:38 ` Alexei Starovoitov 2017-10-10 21:37 ` Cong Wang @ 2017-10-10 21:58 ` Hannes Frederic Sowa 2017-10-11 2:56 ` Alexei Starovoitov 1 sibling, 1 reply; 12+ messages in thread From: Hannes Frederic Sowa @ 2017-10-10 21:58 UTC (permalink / raw) To: Alexei Starovoitov Cc: Cong Wang, netdev, Eric Dumazet, Yuchung Cheng, Neal Cardwell, Martin KaFai Lau, Brendan Gregg, Song Liu Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: > On Mon, Oct 09, 2017 at 10:35:47PM -0700, Cong Wang wrote: [...] >> + trace_tcp_retransmit_skb(sk, skb, segs); > > I'm happy to see new tracepoints being added to tcp stack, but I'm concerned > with practical usability of them. > Like the above tracepoint definition makes it not very useful from bpf point of view, > since 'sk' pointer is not recored by as part of the tracepoint. > In bpf/tracing world we prefer tracepoints to have raw pointers recorded > in TP_STRUCT__entry() and _not_ printed in TP_printk() > (since pointers are useless for userspace). Ack. Also could the TP_printk also use the socket cookies so they can get associated with netlink dumps and as such also be associated to user space processes? It could help against races while trying to associate the socket with a process. ss already supports dumping those cookies with -e. The corresponding commit would be: commit 33cf7c90fe2f97afb1cadaa0cfb782cb9d1b9ee2 Author: Eric Dumazet <edumazet@google.com> Date: Wed Mar 11 18:53:14 2015 -0700 net: add real socket cookies Right now they only get set when needed but as Eric already mentioned in his commit log, this could be refined. [...] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch net-next] tcp: add a tracepoint for tcp_retransmit_skb() 2017-10-10 21:58 ` Hannes Frederic Sowa @ 2017-10-11 2:56 ` Alexei Starovoitov 2017-10-11 13:31 ` Eric Dumazet 0 siblings, 1 reply; 12+ messages in thread From: Alexei Starovoitov @ 2017-10-11 2:56 UTC (permalink / raw) To: Hannes Frederic Sowa Cc: Cong Wang, netdev, Eric Dumazet, Yuchung Cheng, Neal Cardwell, Martin KaFai Lau, Brendan Gregg, Song Liu On Tue, Oct 10, 2017 at 11:58:53PM +0200, Hannes Frederic Sowa wrote: > Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: > > > On Mon, Oct 09, 2017 at 10:35:47PM -0700, Cong Wang wrote: > > [...] > > >> + trace_tcp_retransmit_skb(sk, skb, segs); > > > > I'm happy to see new tracepoints being added to tcp stack, but I'm concerned > > with practical usability of them. > > Like the above tracepoint definition makes it not very useful from bpf point of view, > > since 'sk' pointer is not recored by as part of the tracepoint. > > In bpf/tracing world we prefer tracepoints to have raw pointers recorded > > in TP_STRUCT__entry() and _not_ printed in TP_printk() > > (since pointers are useless for userspace). > > Ack. > > Also could the TP_printk also use the socket cookies so they can get > associated with netlink dumps and as such also be associated to user > space processes? It could help against races while trying to associate > the socket with a process. ss already supports dumping those cookies > with -e. makes sense to me. > The corresponding commit would be: > > commit 33cf7c90fe2f97afb1cadaa0cfb782cb9d1b9ee2 > Author: Eric Dumazet <edumazet@google.com> > Date: Wed Mar 11 18:53:14 2015 -0700 > > net: add real socket cookies > > Right now they only get set when needed but as Eric already mentioned in > his commit log, this could be refined. actually we hit that too for completely different tracing use case. Indeed would be good to generate socket cookie unconditionally for all sockets. I don't think there is any harm. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch net-next] tcp: add a tracepoint for tcp_retransmit_skb() 2017-10-11 2:56 ` Alexei Starovoitov @ 2017-10-11 13:31 ` Eric Dumazet 2017-10-11 18:21 ` Alexei Starovoitov 0 siblings, 1 reply; 12+ messages in thread From: Eric Dumazet @ 2017-10-11 13:31 UTC (permalink / raw) To: Alexei Starovoitov Cc: Hannes Frederic Sowa, Cong Wang, netdev, Eric Dumazet, Yuchung Cheng, Neal Cardwell, Martin KaFai Lau, Brendan Gregg, Song Liu On Tue, 2017-10-10 at 19:56 -0700, Alexei Starovoitov wrote: > actually we hit that too for completely different tracing use case. > Indeed would be good to generate socket cookie unconditionally > for all sockets. I don't think there is any harm. > Simply call sock_gen_cookie() when needed. If a tracepoint needs the cookie and the cookie was not yet generated, it will be generated at this point. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch net-next] tcp: add a tracepoint for tcp_retransmit_skb() 2017-10-11 13:31 ` Eric Dumazet @ 2017-10-11 18:21 ` Alexei Starovoitov 2017-10-11 18:38 ` Eric Dumazet 0 siblings, 1 reply; 12+ messages in thread From: Alexei Starovoitov @ 2017-10-11 18:21 UTC (permalink / raw) To: Eric Dumazet Cc: Hannes Frederic Sowa, Cong Wang, netdev, Eric Dumazet, Yuchung Cheng, Neal Cardwell, Martin KaFai Lau, Brendan Gregg, Song Liu On Wed, Oct 11, 2017 at 06:31:45AM -0700, Eric Dumazet wrote: > On Tue, 2017-10-10 at 19:56 -0700, Alexei Starovoitov wrote: > > > actually we hit that too for completely different tracing use case. > > Indeed would be good to generate socket cookie unconditionally > > for all sockets. I don't think there is any harm. > > > > Simply call sock_gen_cookie() when needed. > > If a tracepoint needs the cookie and the cookie was not yet generated, > it will be generated at this point. we already have bpf_get_socket_cookie() that will call it, but this helper is for bpf socket filters, clsact and other networking related prog types, whereas all of tracing is read-only and side-effect-free, so we cannot use bpf_get_socket_cookie() there. Hence for tracing in kprobe-bpf we use raw sk pointer as map key and full tuple when passing the socket info to user space. If we could use socket cookie vs full tuple it would make a nice difference. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch net-next] tcp: add a tracepoint for tcp_retransmit_skb() 2017-10-11 18:21 ` Alexei Starovoitov @ 2017-10-11 18:38 ` Eric Dumazet 2017-10-12 9:39 ` Hannes Frederic Sowa 0 siblings, 1 reply; 12+ messages in thread From: Eric Dumazet @ 2017-10-11 18:38 UTC (permalink / raw) To: Alexei Starovoitov Cc: Hannes Frederic Sowa, Cong Wang, netdev, Eric Dumazet, Yuchung Cheng, Neal Cardwell, Martin KaFai Lau, Brendan Gregg, Song Liu On Wed, 2017-10-11 at 11:21 -0700, Alexei Starovoitov wrote: > On Wed, Oct 11, 2017 at 06:31:45AM -0700, Eric Dumazet wrote: > > On Tue, 2017-10-10 at 19:56 -0700, Alexei Starovoitov wrote: > > > > > actually we hit that too for completely different tracing use case. > > > Indeed would be good to generate socket cookie unconditionally > > > for all sockets. I don't think there is any harm. > > > > > > > Simply call sock_gen_cookie() when needed. > > > > If a tracepoint needs the cookie and the cookie was not yet generated, > > it will be generated at this point. > > we already have bpf_get_socket_cookie() that will call it, > but this helper is for bpf socket filters, clsact and other > networking related prog types, whereas all of tracing is > read-only and side-effect-free, so we cannot use > bpf_get_socket_cookie() there. > Hence for tracing in kprobe-bpf we use raw sk pointer > as map key and full tuple when passing the socket info to user > space. If we could use socket cookie vs full tuple it would > make a nice difference. Since this sock_gen_cookie() is lock-free and IRQ ready, it should be not be a problem to pretend it works with a const socket. I am a bit unsure about revealing in socket cookie a precise count of sockets created on a netns. Some attackers might use this in a side channel attack. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch net-next] tcp: add a tracepoint for tcp_retransmit_skb() 2017-10-11 18:38 ` Eric Dumazet @ 2017-10-12 9:39 ` Hannes Frederic Sowa 0 siblings, 0 replies; 12+ messages in thread From: Hannes Frederic Sowa @ 2017-10-12 9:39 UTC (permalink / raw) To: Eric Dumazet Cc: Alexei Starovoitov, Cong Wang, netdev, Eric Dumazet, Yuchung Cheng, Neal Cardwell, Martin KaFai Lau, Brendan Gregg, Song Liu Eric Dumazet <eric.dumazet@gmail.com> writes: [...] > Since this sock_gen_cookie() is lock-free and IRQ ready, it should be > not be a problem to pretend it works with a const socket. > > I am a bit unsure about revealing in socket cookie a precise count of > sockets created on a netns. Some attackers might use this in a side > channel attack. That is true. We expose this information already via the inode number allocator for sockets. It is a bit imprecise because of using CPU batches. ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2017-10-12 9:40 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-10-10 5:35 [Patch net-next] tcp: add a tracepoint for tcp_retransmit_skb() Cong Wang 2017-10-10 11:21 ` Eric Dumazet 2017-10-10 17:16 ` Cong Wang 2017-10-10 17:38 ` Alexei Starovoitov 2017-10-10 21:37 ` Cong Wang 2017-10-11 2:53 ` Alexei Starovoitov 2017-10-10 21:58 ` Hannes Frederic Sowa 2017-10-11 2:56 ` Alexei Starovoitov 2017-10-11 13:31 ` Eric Dumazet 2017-10-11 18:21 ` Alexei Starovoitov 2017-10-11 18:38 ` Eric Dumazet 2017-10-12 9:39 ` Hannes Frederic Sowa
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.