All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] Tracepoint for tcp retransmission
@ 2011-12-16 22:14 Satoru Moriya
  2011-12-16 22:16 ` [PATCH 1/2] tcp: refactor tcp_retransmit_skb() for a single return point Satoru Moriya
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Satoru Moriya @ 2011-12-16 22:14 UTC (permalink / raw)
  To: netdev; +Cc: nhorman, davem, tgraf, Seiji Aguchi, dle-develop

Sometimes network packets are dropped for some reason. In enterprise
systems which require strict RAS functionality, we must know the
reason why it happened and explain it to our customers even if using
TCP. When we investigate the incidents, at first we try to find out
whether the problem is in the server(kernel, application) or else
(router, hub etc). And next we try to find out which layer
(application/middleware/kernel(IP/TCP/UDP/..)etc.) the problem
occurs.

In application layer, user applications and/or middlewares usually
save logs if they dropped packets. In kernel layer, with this
tracepoint, we are able to know whether the kernel(TCP layer) send
packets successfully or not.

With a combination of them, we can find out whether the problem is
in the server or not.


Satoru Moriya (2):
  tcp: refactor tcp_retransmit_skb() for a single return point
  tcp: add tracepoint for tcp retransmission

 include/trace/events/tcp.h |   35 +++++++++++++++++++++++++++++++++++
 net/core/net-traces.c      |    1 +
 net/ipv4/tcp_output.c      |   34 ++++++++++++++++++++++++----------
 3 files changed, 60 insertions(+), 10 deletions(-)
 create mode 100644 include/trace/events/tcp.h

-- 
1.7.6.4

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/2] tcp: refactor tcp_retransmit_skb() for a single return point
  2011-12-16 22:14 [PATCH 0/2] Tracepoint for tcp retransmission Satoru Moriya
@ 2011-12-16 22:16 ` Satoru Moriya
  2011-12-19 12:10   ` David Laight
  2011-12-16 22:17 ` [PATCH 2/2] tcp: add tracepoint for tcp retransmission Satoru Moriya
  2011-12-17  0:17 ` [PATCH 0/2] Tracepoint " Stephen Hemminger
  2 siblings, 1 reply; 10+ messages in thread
From: Satoru Moriya @ 2011-12-16 22:16 UTC (permalink / raw)
  To: netdev; +Cc: nhorman, davem, tgraf, Seiji Aguchi, dle-develop

This is just a cleanup patch for making easy to hook return value
with a tracepoint.

Signed-off-by: Satoru Moriya <satoru.moriya@hds.com>
---
 net/ipv4/tcp_output.c |   31 +++++++++++++++++++++----------
 1 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 63170e2..13d3a79 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2089,18 +2089,24 @@ int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
 	 * copying overhead: fragmentation, tunneling, mangling etc.
 	 */
 	if (atomic_read(&sk->sk_wmem_alloc) >
-	    min(sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2), sk->sk_sndbuf))
-		return -EAGAIN;
+	    min(sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2), sk->sk_sndbuf)) {
+		err = -EAGAIN;
+		goto out;
+	}
 
 	if (before(TCP_SKB_CB(skb)->seq, tp->snd_una)) {
 		if (before(TCP_SKB_CB(skb)->end_seq, tp->snd_una))
 			BUG();
-		if (tcp_trim_head(sk, skb, tp->snd_una - TCP_SKB_CB(skb)->seq))
-			return -ENOMEM;
+		if (tcp_trim_head(sk, skb, tp->snd_una - TCP_SKB_CB(skb)->seq)) {
+			err = -ENOMEM;
+			goto out;
+		}
 	}
 
-	if (inet_csk(sk)->icsk_af_ops->rebuild_header(sk))
-		return -EHOSTUNREACH; /* Routing failure or similar. */
+	if (inet_csk(sk)->icsk_af_ops->rebuild_header(sk)) {
+		err = -EHOSTUNREACH; /* Routing failure or similar. */
+		goto out;
+	}
 
 	cur_mss = tcp_current_mss(sk);
 
@@ -2110,12 +2116,16 @@ int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
 	 * our retransmit serves as a zero window probe.
 	 */
 	if (!before(TCP_SKB_CB(skb)->seq, tcp_wnd_end(tp)) &&
-	    TCP_SKB_CB(skb)->seq != tp->snd_una)
-		return -EAGAIN;
+	    TCP_SKB_CB(skb)->seq != tp->snd_una) {
+		err = -EAGAIN;
+		goto out;
+	}
 
 	if (skb->len > cur_mss) {
-		if (tcp_fragment(sk, skb, cur_mss, cur_mss))
-			return -ENOMEM; /* We'll try again later. */
+		if (tcp_fragment(sk, skb, cur_mss, cur_mss)) {
+			err = -ENOMEM; /* We'll try again later. */
+			goto out;
+		}
 	} else {
 		int oldpcount = tcp_skb_pcount(skb);
 
@@ -2177,6 +2187,7 @@ int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
 		 */
 		TCP_SKB_CB(skb)->ack_seq = tp->snd_nxt;
 	}
+out:
 	return err;
 }
 
-- 
1.7.6.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/2] tcp: add tracepoint for tcp retransmission
  2011-12-16 22:14 [PATCH 0/2] Tracepoint for tcp retransmission Satoru Moriya
  2011-12-16 22:16 ` [PATCH 1/2] tcp: refactor tcp_retransmit_skb() for a single return point Satoru Moriya
@ 2011-12-16 22:17 ` Satoru Moriya
  2011-12-17  0:17 ` [PATCH 0/2] Tracepoint " Stephen Hemminger
  2 siblings, 0 replies; 10+ messages in thread
From: Satoru Moriya @ 2011-12-16 22:17 UTC (permalink / raw)
  To: netdev; +Cc: nhorman, davem, tgraf, Seiji Aguchi, dle-develop

This patch adds a tracepoint to tcp_retransmit_skb() to get the
sk, skb and return value. This helps one understand whether problems
are in a server or not.

Signed-off-by: Satoru Moriya <satoru.moriya@hds.com>
---
 include/trace/events/tcp.h |   35 +++++++++++++++++++++++++++++++++++
 net/core/net-traces.c      |    1 +
 net/ipv4/tcp_output.c      |    3 +++
 3 files changed, 39 insertions(+), 0 deletions(-)
 create mode 100644 include/trace/events/tcp.h

diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
new file mode 100644
index 0000000..821cdb7
--- /dev/null
+++ b/include/trace/events/tcp.h
@@ -0,0 +1,35 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM tcp
+
+#if !defined(_TRACE_TCP_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_TCP_H
+
+#include <linux/skbuff.h>
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(tcp_retransmit_skb,
+
+	TP_PROTO(struct sock *sk, struct sk_buff *skb, int err),
+
+	TP_ARGS(sk, skb, err),
+
+	TP_STRUCT__entry(
+		__field(void *, skaddr)
+		__field(void *, skbaddr)
+		__field(int, err)
+	),
+
+	TP_fast_assign(
+		__entry->skaddr = sk;
+		__entry->skbaddr = skb;
+		__entry->err = err;
+	),
+
+	TP_printk("sk=%p skb=%p err=%d",
+		__entry->skaddr, __entry->skbaddr, __entry->err)
+);
+
+#endif /* _TRACE_TCP_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/net/core/net-traces.c b/net/core/net-traces.c
index ba3c012..63f966b 100644
--- a/net/core/net-traces.c
+++ b/net/core/net-traces.c
@@ -31,6 +31,7 @@
 #include <trace/events/napi.h>
 #include <trace/events/sock.h>
 #include <trace/events/udp.h>
+#include <trace/events/tcp.h>
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(kfree_skb);
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 13d3a79..a6db789 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -40,6 +40,8 @@
 #include <linux/gfp.h>
 #include <linux/module.h>
 
+#include <trace/events/tcp.h>
+
 /* People can turn this off for buggy TCP's found in printers etc. */
 int sysctl_tcp_retrans_collapse __read_mostly = 1;
 
@@ -2188,6 +2190,7 @@ int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
 		TCP_SKB_CB(skb)->ack_seq = tp->snd_nxt;
 	}
 out:
+	trace_tcp_retransmit_skb(sk, skb, err);
 	return err;
 }
 
-- 
1.7.6.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/2] Tracepoint for tcp retransmission
  2011-12-16 22:14 [PATCH 0/2] Tracepoint for tcp retransmission Satoru Moriya
  2011-12-16 22:16 ` [PATCH 1/2] tcp: refactor tcp_retransmit_skb() for a single return point Satoru Moriya
  2011-12-16 22:17 ` [PATCH 2/2] tcp: add tracepoint for tcp retransmission Satoru Moriya
@ 2011-12-17  0:17 ` Stephen Hemminger
  2011-12-18  0:49   ` Hagen Paul Pfeifer
  2011-12-20 18:13   ` Satoru Moriya
  2 siblings, 2 replies; 10+ messages in thread
From: Stephen Hemminger @ 2011-12-17  0:17 UTC (permalink / raw)
  To: Satoru Moriya; +Cc: nhorman, davem, tgraf, Seiji Aguchi, dle-develop, netdev


> Sometimes network packets are dropped for some reason. In enterprise
> systems which require strict RAS functionality, we must know the
> reason why it happened and explain it to our customers even if using
> TCP. When we investigate the incidents, at first we try to find out
> whether the problem is in the server(kernel, application) or else
> (router, hub etc). And next we try to find out which layer
> (application/middleware/kernel(IP/TCP/UDP/..)etc.) the problem
> occurs.

I feel sorry for you, your users don't understand TCP. TCP intentionally induces loss
to measure capacity. This is one of the fundamental principles of
loss based congestion control.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/2] Tracepoint for tcp retransmission
  2011-12-17  0:17 ` [PATCH 0/2] Tracepoint " Stephen Hemminger
@ 2011-12-18  0:49   ` Hagen Paul Pfeifer
  2011-12-20 23:40     ` Satoru Moriya
  2011-12-20 18:13   ` Satoru Moriya
  1 sibling, 1 reply; 10+ messages in thread
From: Hagen Paul Pfeifer @ 2011-12-18  0:49 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Satoru Moriya, nhorman, davem, tgraf, Seiji Aguchi, dle-develop, netdev

> Sometimes network packets are dropped for some reason. In enterprise
> systems which require strict RAS functionality, we must know the
> reason why it happened and explain it to our customers even if using
> TCP. When we investigate the incidents, at first we try to find out
> whether the problem is in the server(kernel, application) or else
> (router, hub etc). And next we try to find out which layer
> (application/middleware/kernel(IP/TCP/UDP/..)etc.) the problem
> occurs.

For the first question tcpdump may the right tool. For the later systemtap can
be used. I mean we now have the possibility to instrument the kernel at runtime,
without bloating the source. Anyway: is 63e03724b51 not suitable to gather the
required information easily?

Hagen

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH 1/2] tcp: refactor tcp_retransmit_skb() for a single return point
  2011-12-16 22:16 ` [PATCH 1/2] tcp: refactor tcp_retransmit_skb() for a single return point Satoru Moriya
@ 2011-12-19 12:10   ` David Laight
  2011-12-20 16:55     ` Satoru Moriya
  0 siblings, 1 reply; 10+ messages in thread
From: David Laight @ 2011-12-19 12:10 UTC (permalink / raw)
  To: Satoru Moriya, netdev; +Cc: nhorman, davem, tgraf, Seiji Aguchi, dle-develop

 
> This is just a cleanup patch for making easy to hook return value
> with a tracepoint.

Another way to temporarily add such a trace, is to temporarily
add a wrapper function.

	David

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH 1/2] tcp: refactor tcp_retransmit_skb() for a single return point
  2011-12-19 12:10   ` David Laight
@ 2011-12-20 16:55     ` Satoru Moriya
  0 siblings, 0 replies; 10+ messages in thread
From: Satoru Moriya @ 2011-12-20 16:55 UTC (permalink / raw)
  To: David Laight, netdev; +Cc: nhorman, davem, tgraf, Seiji Aguchi, dle-develop

On 12/19/2011 07:10 AM, David Laight wrote:
>  
>> This is just a cleanup patch for making easy to hook return value 
>> with a tracepoint.
> 
> Another way to temporarily add such a trace, is to temporarily
> add a wrapper function.

I'm afraid that I don't know how to add a wrapper function temporarily.
Could you please tell me how to do that? Jprobe?

Regards,
Satoru

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH 0/2] Tracepoint for tcp retransmission
  2011-12-17  0:17 ` [PATCH 0/2] Tracepoint " Stephen Hemminger
  2011-12-18  0:49   ` Hagen Paul Pfeifer
@ 2011-12-20 18:13   ` Satoru Moriya
  1 sibling, 0 replies; 10+ messages in thread
From: Satoru Moriya @ 2011-12-20 18:13 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: nhorman, davem, tgraf, Seiji Aguchi, dle-develop, netdev

On 12/16/2011 07:17 PM, Stephen Hemminger wrote:
> 
>> Sometimes network packets are dropped for some reason. In enterprise 
>> systems which require strict RAS functionality, we must know the 
>> reason why it happened and explain it to our customers even if using 
>> TCP. When we investigate the incidents, at first we try to find out 
>> whether the problem is in the server(kernel, application) or else 
>> (router, hub etc). And next we try to find out which layer
>> (application/middleware/kernel(IP/TCP/UDP/..)etc.) the problem 
>> occurs.
> 
> I feel sorry for you, your users don't understand TCP. TCP 
> intentionally induces loss to measure capacity. This is one of the 
> fundamental principles of loss based congestion control.

Maybe my explanation was not enough, I think...

Yes, as you said above, TCP induces loss in principle. Actually,
customers doesn't think that is a problem. But, at the same time
packet drop occurs everywhere in network due to BUG, wrong
configuration, broken hardware and/or etc. and sometimes it
causes serious problems to customers' system.

We provide Linux support service in our business and when a
serious problem occurs, we collect and analyze logs and find
where the problem is.

With this tracepoint, we're able to know whether the problem
is in OS or somewhere else. (Negative return value means packet
drop happened inside OS.)
That's a great help for us narrow down the root cause of
the problem.

I'll rewrite the cover letter when I post v2.

Regards,
Satoru

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH 0/2] Tracepoint for tcp retransmission
  2011-12-18  0:49   ` Hagen Paul Pfeifer
@ 2011-12-20 23:40     ` Satoru Moriya
  2011-12-21  7:05       ` Eric Dumazet
  0 siblings, 1 reply; 10+ messages in thread
From: Satoru Moriya @ 2011-12-20 23:40 UTC (permalink / raw)
  To: Hagen Paul Pfeifer, Stephen Hemminger
  Cc: nhorman, davem, tgraf, Seiji Aguchi, dle-develop, netdev


On 12/17/2011 07:49 PM, Hagen Paul Pfeifer wrote:
>> Sometimes network packets are dropped for some reason. In enterprise 
>> systems which require strict RAS functionality, we must know the 
>> reason why it happened and explain it to our customers even if using 
>> TCP. When we investigate the incidents, at first we try to find out 
>> whether the problem is in the server(kernel, application) or else 
>> (router, hub etc). And next we try to find out which layer
>> (application/middleware/kernel(IP/TCP/UDP/..)etc.) the problem 
>> occurs.
> 
> For the first question tcpdump may the right tool.

We'd like to keep records on memory and save it to file when
we detect problems so that we can keep tracing overhead low.
We can also keep the amount of trace data lower than with
tcpdump because we only record data when retransmission occurs.

Capturing all the packets and saving them to file cannot satisfy
our requirements. I should have written them in the cover-letter.
I'll fix it.

Also, we can analyze incidents in combination with the data
from this tracepoint and from others easily.

> For the later systemtap can be used. I mean we now have the 
> possibility to instrument the kernel at runtime, without bloating the 
> source.

Yes, we can use systemtap to get the data we need. But systemtap
is not included in kernel and we must maintain systemtap scripts
to follow kernel modification.

By adding tracepoint here, we can get useful data via ftrace/perf
without any instruments which is not included in kernel.
Of course, systemtap can insert a probe with this tracepoint too.

> Anyway: is 63e03724b51 not suitable to gather the required information 
> easily?

We use trace_kfree_skb() which 63e03724b51 uses to detect packet
drop event. In addition to that, we would like to detect errors
in TCP layer for better trouble shooting.

Regards,
Satoru

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH 0/2] Tracepoint for tcp retransmission
  2011-12-20 23:40     ` Satoru Moriya
@ 2011-12-21  7:05       ` Eric Dumazet
  0 siblings, 0 replies; 10+ messages in thread
From: Eric Dumazet @ 2011-12-21  7:05 UTC (permalink / raw)
  To: Satoru Moriya
  Cc: Hagen Paul Pfeifer, Stephen Hemminger, nhorman, davem, tgraf,
	Seiji Aguchi, dle-develop, netdev

Le mardi 20 décembre 2011 à 18:40 -0500, Satoru Moriya a écrit :

> We use trace_kfree_skb() which 63e03724b51 uses to detect packet
> drop event. In addition to that, we would like to detect errors
> in TCP layer for better trouble shooting.
> 

If you feel we miss a counter in a particular spot, just add it, so that
even a newbie can report "netstat -s" values in a message for analysis.

Not re-sending a frame because of a temporary shortage of memory is not
an error per se, TCP will recover eventually, but keeping track of these
events might be useful (once we have not percpu counters for this, but
regular atomic counters)

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2011-12-21  7:05 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-12-16 22:14 [PATCH 0/2] Tracepoint for tcp retransmission Satoru Moriya
2011-12-16 22:16 ` [PATCH 1/2] tcp: refactor tcp_retransmit_skb() for a single return point Satoru Moriya
2011-12-19 12:10   ` David Laight
2011-12-20 16:55     ` Satoru Moriya
2011-12-16 22:17 ` [PATCH 2/2] tcp: add tracepoint for tcp retransmission Satoru Moriya
2011-12-17  0:17 ` [PATCH 0/2] Tracepoint " Stephen Hemminger
2011-12-18  0:49   ` Hagen Paul Pfeifer
2011-12-20 23:40     ` Satoru Moriya
2011-12-21  7:05       ` Eric Dumazet
2011-12-20 18:13   ` Satoru Moriya

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.