All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v4 0/6] net-timestamp: new tx tstamps and tcp
@ 2014-08-05  2:11 Willem de Bruijn
  2014-08-05  2:11 ` [PATCH net-next v4 1/6] net-timestamp: extend SCM_TIMESTAMPING ancillary data struct Willem de Bruijn
                   ` (8 more replies)
  0 siblings, 9 replies; 27+ messages in thread
From: Willem de Bruijn @ 2014-08-05  2:11 UTC (permalink / raw)
  To: netdev; +Cc: davem, eric.dumazet, richardcochran, Willem de Bruijn

Extend socket tx timestamping:
- allow multiple types of software timestamps aside from send (1)
- add software timestamp on enter packet scheduling (4)
- add software timestamp for TCP (5)
- add software timestamp for TCP on ACK (6)

The sk_flags option space is nearly exhausted. Also move the
many timestamp options to a new sk->sk_tstamps (2).

To disambiguate data when tstamps may arrive out of order,
optionally return a sequential ID assigned at send (3).

Extend Linux tx timestamping to monitoring of latency
incurred within the kernel stack and to protocols embedded in TCP.
Complex kernel setups may have multiple layers of queueing, including
multiple instances of packet scheduling, and many classes per layer.
Many applications embed discrete payloads into TCP bytestreams for
reliability, flow control, etcetera. Detecting application tail
latency in such scenarios relies on identifying the exact queue
responsible if on the host, or the network latency if otherwise.

Changelog:
v4->v5
  - define SCM_TSTAMP_SND == 0, for legacy behavior
  - add TCP tstamps without changing the generated byte stream
    - modify GSO and ACK to find offset: slightly more complex
      than previous invariant that it is the last byte
  - consistent naming of packet scheduling
    - rename SCM_TSTAMP_ENQ to SCM_TSTAMP_SCHED
  - add unique key in ee_data
  - add id field in ee_info to disambiguate tstamps
    - optional, only on new flag SOF_TIMESTAMPING_OPT_ID
    - for bytestream, in bytes

v3->v4
  - (v3 review comment) removed skb->mark packet identification (*A)
  - (v3 review comment) fixed indentation
  - tcp: fixed poll() to return POLLERR on non-zero queue
  - rebased to work without syststamp
  - comments: removed all traces of MSG_TSTAMP_.. (*B)

v2->v3
  - extend the SO_TIMESTAMPING API, instead of defining a new one.
  - add protocol independent support to correlate tstamps with data,
    based on returning skb->mark.
  - removed no-payload optimization and documentation (for now):

    I have a follow-on patch that reintroduces MSG_TSTAMP along with a
    new socket option SOF_TIMESTAMPING_OPT_ONFLAG. This is equivalent
    to sequence setsockopt(<enable>); send(..); setsockopt(<disable>),
    but avoids the need to define a MSG_TSTAMP_<TYPE> for each type.

    I will leave these three patches as follow-on, as this patchset is
    large enough as is.

v1->v2
  - expand timestamping (existing and new) to SOCK_RAW and ping sockets
  - rename sock_errqueue_timestamping to scm_timestamping
  - change timestamp data format: do not add fields to scm_timestamping.
      Doing so could break legacy applications. Instead, communicate
      through an existing, but unused, field in the error message.
  - rename SOF_.._OPT_TX_NO_PAYLOAD to shorter SOF_.._OPT_TSONLY
  - move msg_tstamp test app out of patchset and to github
      git://github.com/wdebruij/kerneltools.git


Willem de Bruijn (6):
  net-timestamp: extend SCM_TIMESTAMPING ancillary data struct
  net-timestamp: move timestamp flags out of sk_flags
  net-timestamp: add key to disambiguate concurrent datagrams
  net-timestamp: SCHED timestamp on entering packet scheduler
  net-timestamp: TCP timestamping
  net-timestamp: ACK timestamp for bytestreams

-- 
2.0.0.526.g5318336

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH net-next v4 1/6] net-timestamp: extend SCM_TIMESTAMPING ancillary data struct
  2014-08-05  2:11 [PATCH net-next v4 0/6] net-timestamp: new tx tstamps and tcp Willem de Bruijn
@ 2014-08-05  2:11 ` Willem de Bruijn
  2014-08-05  2:11 ` [PATCH net-next v4 2/6] net-timestamp: move timestamp flags out of sk_flags Willem de Bruijn
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 27+ messages in thread
From: Willem de Bruijn @ 2014-08-05  2:11 UTC (permalink / raw)
  To: netdev; +Cc: davem, eric.dumazet, richardcochran, Willem de Bruijn

Applications that request kernel tx timestamps with SO_TIMESTAMPING
read timestamps as recvmsg() ancillary data. The response is defined
implicitly as timespec[3].

1) define struct scm_timestamping explicitly and

2) add support for new tstamp types. On tx, scm_timestamping always
   accompanies a sock_extended_err. Define previously unused field
   ee_info to signal the type of ts[0]. Introduce SCM_TSTAMP_SND to
   define the existing behavior.

The reception path is not modified. On rx, no struct similar to
sock_extended_err is passed along with SCM_TIMESTAMPING.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 include/linux/skbuff.h        |  3 +++
 include/net/sock.h            |  4 +++-
 include/uapi/linux/errqueue.h | 18 ++++++++++++++++++
 net/core/skbuff.c             |  1 +
 net/socket.c                  | 20 +++++++++++---------
 5 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 281dece..477f0f6 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -249,6 +249,9 @@ enum {
 	SKBTX_SHARED_FRAG = 1 << 5,
 };
 
+#define SKBTX_ANY_SW_TSTAMP	SKBTX_SW_TSTAMP
+#define SKBTX_ANY_TSTAMP	(SKBTX_HW_TSTAMP | SKBTX_ANY_SW_TSTAMP)
+
 /*
  * The callback notifies userspace to release buffers when skb DMA is done in
  * lower device, the skb last reference should be 0 when calling this.
diff --git a/include/net/sock.h b/include/net/sock.h
index b91c886..02f5b35 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2169,7 +2169,9 @@ sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb)
 	 */
 	if (sock_flag(sk, SOCK_RCVTSTAMP) ||
 	    sock_flag(sk, SOCK_TIMESTAMPING_RX_SOFTWARE) ||
-	    (kt.tv64 && sock_flag(sk, SOCK_TIMESTAMPING_SOFTWARE)) ||
+	    (kt.tv64 &&
+	     (sock_flag(sk, SOCK_TIMESTAMPING_SOFTWARE) ||
+	      skb_shinfo(skb)->tx_flags & SKBTX_ANY_SW_TSTAMP)) ||
 	    (hwtstamps->hwtstamp.tv64 &&
 	     sock_flag(sk, SOCK_TIMESTAMPING_RAW_HARDWARE)))
 		__sock_recv_timestamp(msg, sk, skb);
diff --git a/include/uapi/linux/errqueue.h b/include/uapi/linux/errqueue.h
index aacd4fb..accee72 100644
--- a/include/uapi/linux/errqueue.h
+++ b/include/uapi/linux/errqueue.h
@@ -22,5 +22,23 @@ struct sock_extended_err {
 
 #define SO_EE_OFFENDER(ee)	((struct sockaddr*)((ee)+1))
 
+/**
+ *	struct scm_timestamping - timestamps exposed through cmsg
+ *
+ *	The timestamping interfaces SO_TIMESTAMPING, MSG_TSTAMP_*
+ *	communicate network timestamps by passing this struct in a cmsg with
+ *	recvmsg(). See Documentation/networking/timestamping.txt for details.
+ */
+struct scm_timestamping {
+	struct timespec ts[3];
+};
+
+/* The type of scm_timestamping, passed in sock_extended_err ee_info.
+ * This defines the type of ts[0]. For SCM_TSTAMP_SND only, if ts[0]
+ * is zero, then this is a hardware timestamp and recorded in ts[2].
+ */
+enum {
+	SCM_TSTAMP_SND,		/* driver passed skb to NIC, or HW */
+};
 
 #endif /* _UAPI_LINUX_ERRQUEUE_H */
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c1a3303..c9f6880 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3521,6 +3521,7 @@ void skb_tstamp_tx(struct sk_buff *orig_skb,
 	memset(serr, 0, sizeof(*serr));
 	serr->ee.ee_errno = ENOMSG;
 	serr->ee.ee_origin = SO_EE_ORIGIN_TIMESTAMPING;
+	serr->ee.ee_info = SCM_TSTAMP_SND;
 
 	err = sock_queue_err_skb(sk, skb);
 
diff --git a/net/socket.c b/net/socket.c
index d8222c0..dc0cc5d 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -106,6 +106,7 @@
 #include <linux/sockios.h>
 #include <linux/atalk.h>
 #include <net/busy_poll.h>
+#include <linux/errqueue.h>
 
 #ifdef CONFIG_NET_RX_BUSY_POLL
 unsigned int sysctl_net_busy_read __read_mostly;
@@ -697,7 +698,7 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
 	struct sk_buff *skb)
 {
 	int need_software_tstamp = sock_flag(sk, SOCK_RCVTSTAMP);
-	struct timespec ts[3];
+	struct scm_timestamping tss;
 	int empty = 1;
 	struct skb_shared_hwtstamps *shhwtstamps =
 		skb_hwtstamps(skb);
@@ -714,24 +715,25 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
 			put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMP,
 				 sizeof(tv), &tv);
 		} else {
-			skb_get_timestampns(skb, &ts[0]);
+			struct timespec ts;
+			skb_get_timestampns(skb, &ts);
 			put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPNS,
-				 sizeof(ts[0]), &ts[0]);
+				 sizeof(ts), &ts);
 		}
 	}
 
-
-	memset(ts, 0, sizeof(ts));
-	if (sock_flag(sk, SOCK_TIMESTAMPING_SOFTWARE) &&
-	    ktime_to_timespec_cond(skb->tstamp, ts + 0))
+	memset(&tss, 0, sizeof(tss));
+	if ((sock_flag(sk, SOCK_TIMESTAMPING_SOFTWARE) ||
+	     skb_shinfo(skb)->tx_flags & SKBTX_ANY_SW_TSTAMP) &&
+	    ktime_to_timespec_cond(skb->tstamp, tss.ts + 0))
 		empty = 0;
 	if (shhwtstamps &&
 	    sock_flag(sk, SOCK_TIMESTAMPING_RAW_HARDWARE) &&
-	    ktime_to_timespec_cond(shhwtstamps->hwtstamp, ts + 2))
+	    ktime_to_timespec_cond(shhwtstamps->hwtstamp, tss.ts + 2))
 		empty = 0;
 	if (!empty)
 		put_cmsg(msg, SOL_SOCKET,
-			 SCM_TIMESTAMPING, sizeof(ts), &ts);
+			 SCM_TIMESTAMPING, sizeof(tss), &tss);
 }
 EXPORT_SYMBOL_GPL(__sock_recv_timestamp);
 
-- 
2.0.0.526.g5318336

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next v4 2/6] net-timestamp: move timestamp flags out of sk_flags
  2014-08-05  2:11 [PATCH net-next v4 0/6] net-timestamp: new tx tstamps and tcp Willem de Bruijn
  2014-08-05  2:11 ` [PATCH net-next v4 1/6] net-timestamp: extend SCM_TIMESTAMPING ancillary data struct Willem de Bruijn
@ 2014-08-05  2:11 ` Willem de Bruijn
  2014-08-05  2:11 ` [PATCH net-next v4 3/6] net-timestamp: add key to disambiguate concurrent datagrams Willem de Bruijn
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 27+ messages in thread
From: Willem de Bruijn @ 2014-08-05  2:11 UTC (permalink / raw)
  To: netdev; +Cc: davem, eric.dumazet, richardcochran, Willem de Bruijn

sk_flags is reaching its limit. New timestamping options will not fit.
Move all of them into a new field sk->sk_tsflags.

Added benefit is that this removes boilerplate code to convert between
SOF_TIMESTAMPING_.. and SOCK_TIMESTAMPING_.. in getsockopt/setsockopt.

SOCK_TIMESTAMPING_RX_SOFTWARE is also used to toggle the receive
timestamp logic (netstamp_needed). That can be simplified and this
last key removed, but will leave that for a separate patch.

Signed-off-by: Willem de Bruijn <willemb@google.com>

----

The u16 in sock can be moved into a 16-bit hole below sk_gso_max_segs,
though that scatters tstamp fields throughout the struct.
---
 include/net/sock.h | 29 +++++++++++------------------
 net/core/sock.c    | 25 ++-----------------------
 net/socket.c       |  8 ++++----
 3 files changed, 17 insertions(+), 45 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 02f5b35..a211297 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -67,6 +67,7 @@
 #include <linux/atomic.h>
 #include <net/dst.h>
 #include <net/checksum.h>
+#include <linux/net_tstamp.h>
 
 struct cgroup;
 struct cgroup_subsys;
@@ -278,6 +279,7 @@ struct cg_proto;
   *	@sk_protinfo: private area, net family specific, when not using slab
   *	@sk_timer: sock cleanup timer
   *	@sk_stamp: time stamp of last packet received
+  *	@sk_tsflags: SO_TIMESTAMPING socket options
   *	@sk_socket: Identd and reporting IO signals
   *	@sk_user_data: RPC layer private data
   *	@sk_frag: cached page frag
@@ -411,6 +413,7 @@ struct sock {
 	void			*sk_protinfo;
 	struct timer_list	sk_timer;
 	ktime_t			sk_stamp;
+	u16			sk_tsflags;
 	struct socket		*sk_socket;
 	void			*sk_user_data;
 	struct page_frag	sk_frag;
@@ -701,12 +704,7 @@ enum sock_flags {
 	SOCK_LOCALROUTE, /* route locally only, %SO_DONTROUTE setting */
 	SOCK_QUEUE_SHRUNK, /* write queue has been shrunk recently */
 	SOCK_MEMALLOC, /* VM depends on this socket for swapping */
-	SOCK_TIMESTAMPING_TX_HARDWARE,  /* %SOF_TIMESTAMPING_TX_HARDWARE */
-	SOCK_TIMESTAMPING_TX_SOFTWARE,  /* %SOF_TIMESTAMPING_TX_SOFTWARE */
-	SOCK_TIMESTAMPING_RX_HARDWARE,  /* %SOF_TIMESTAMPING_RX_HARDWARE */
 	SOCK_TIMESTAMPING_RX_SOFTWARE,  /* %SOF_TIMESTAMPING_RX_SOFTWARE */
-	SOCK_TIMESTAMPING_SOFTWARE,     /* %SOF_TIMESTAMPING_SOFTWARE */
-	SOCK_TIMESTAMPING_RAW_HARDWARE, /* %SOF_TIMESTAMPING_RAW_HARDWARE */
 	SOCK_FASYNC, /* fasync() active */
 	SOCK_RXQ_OVFL,
 	SOCK_ZEROCOPY, /* buffers from userspace */
@@ -2160,20 +2158,17 @@ sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb)
 
 	/*
 	 * generate control messages if
-	 * - receive time stamping in software requested (SOCK_RCVTSTAMP
-	 *   or SOCK_TIMESTAMPING_RX_SOFTWARE)
+	 * - receive time stamping in software requested
 	 * - software time stamp available and wanted
-	 *   (SOCK_TIMESTAMPING_SOFTWARE)
 	 * - hardware time stamps available and wanted
-	 *   SOCK_TIMESTAMPING_RAW_HARDWARE
 	 */
 	if (sock_flag(sk, SOCK_RCVTSTAMP) ||
-	    sock_flag(sk, SOCK_TIMESTAMPING_RX_SOFTWARE) ||
+	    (sk->sk_tsflags & SOF_TIMESTAMPING_RX_SOFTWARE) ||
 	    (kt.tv64 &&
-	     (sock_flag(sk, SOCK_TIMESTAMPING_SOFTWARE) ||
+	     (sk->sk_tsflags & SOF_TIMESTAMPING_SOFTWARE ||
 	      skb_shinfo(skb)->tx_flags & SKBTX_ANY_SW_TSTAMP)) ||
 	    (hwtstamps->hwtstamp.tv64 &&
-	     sock_flag(sk, SOCK_TIMESTAMPING_RAW_HARDWARE)))
+	     (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE)))
 		__sock_recv_timestamp(msg, sk, skb);
 	else
 		sk->sk_stamp = kt;
@@ -2189,11 +2184,11 @@ static inline void sock_recv_ts_and_drops(struct msghdr *msg, struct sock *sk,
 					  struct sk_buff *skb)
 {
 #define FLAGS_TS_OR_DROPS ((1UL << SOCK_RXQ_OVFL)			| \
-			   (1UL << SOCK_RCVTSTAMP)			| \
-			   (1UL << SOCK_TIMESTAMPING_SOFTWARE)		| \
-			   (1UL << SOCK_TIMESTAMPING_RAW_HARDWARE))
+			   (1UL << SOCK_RCVTSTAMP))
+#define TSFLAGS_ANY	  (SOF_TIMESTAMPING_SOFTWARE			| \
+			   SOF_TIMESTAMPING_RAW_HARDWARE)
 
-	if (sk->sk_flags & FLAGS_TS_OR_DROPS)
+	if (sk->sk_flags & FLAGS_TS_OR_DROPS || sk->sk_tsflags & TSFLAGS_ANY)
 		__sock_recv_ts_and_drops(msg, sk, skb);
 	else
 		sk->sk_stamp = skb->tstamp;
@@ -2203,8 +2198,6 @@ static inline void sock_recv_ts_and_drops(struct msghdr *msg, struct sock *sk,
  * sock_tx_timestamp - checks whether the outgoing packet is to be time stamped
  * @sk:		socket sending this packet
  * @tx_flags:	filled with instructions for time stamping
- *
- * Currently only depends on SOCK_TIMESTAMPING* flags.
  */
 void sock_tx_timestamp(struct sock *sk, __u8 *tx_flags);
 
diff --git a/net/core/sock.c b/net/core/sock.c
index a741163..47c9377 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -848,22 +848,13 @@ set_rcvbuf:
 			ret = -EINVAL;
 			break;
 		}
-		sock_valbool_flag(sk, SOCK_TIMESTAMPING_TX_HARDWARE,
-				  val & SOF_TIMESTAMPING_TX_HARDWARE);
-		sock_valbool_flag(sk, SOCK_TIMESTAMPING_TX_SOFTWARE,
-				  val & SOF_TIMESTAMPING_TX_SOFTWARE);
-		sock_valbool_flag(sk, SOCK_TIMESTAMPING_RX_HARDWARE,
-				  val & SOF_TIMESTAMPING_RX_HARDWARE);
+		sk->sk_tsflags = val;
 		if (val & SOF_TIMESTAMPING_RX_SOFTWARE)
 			sock_enable_timestamp(sk,
 					      SOCK_TIMESTAMPING_RX_SOFTWARE);
 		else
 			sock_disable_timestamp(sk,
 					       (1UL << SOCK_TIMESTAMPING_RX_SOFTWARE));
-		sock_valbool_flag(sk, SOCK_TIMESTAMPING_SOFTWARE,
-				  val & SOF_TIMESTAMPING_SOFTWARE);
-		sock_valbool_flag(sk, SOCK_TIMESTAMPING_RAW_HARDWARE,
-				  val & SOF_TIMESTAMPING_RAW_HARDWARE);
 		break;
 
 	case SO_RCVLOWAT:
@@ -1089,19 +1080,7 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 		break;
 
 	case SO_TIMESTAMPING:
-		v.val = 0;
-		if (sock_flag(sk, SOCK_TIMESTAMPING_TX_HARDWARE))
-			v.val |= SOF_TIMESTAMPING_TX_HARDWARE;
-		if (sock_flag(sk, SOCK_TIMESTAMPING_TX_SOFTWARE))
-			v.val |= SOF_TIMESTAMPING_TX_SOFTWARE;
-		if (sock_flag(sk, SOCK_TIMESTAMPING_RX_HARDWARE))
-			v.val |= SOF_TIMESTAMPING_RX_HARDWARE;
-		if (sock_flag(sk, SOCK_TIMESTAMPING_RX_SOFTWARE))
-			v.val |= SOF_TIMESTAMPING_RX_SOFTWARE;
-		if (sock_flag(sk, SOCK_TIMESTAMPING_SOFTWARE))
-			v.val |= SOF_TIMESTAMPING_SOFTWARE;
-		if (sock_flag(sk, SOCK_TIMESTAMPING_RAW_HARDWARE))
-			v.val |= SOF_TIMESTAMPING_RAW_HARDWARE;
+		v.val = sk->sk_tsflags;
 		break;
 
 	case SO_RCVTIMEO:
diff --git a/net/socket.c b/net/socket.c
index dc0cc5d..255d9b8 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -613,9 +613,9 @@ EXPORT_SYMBOL(sock_release);
 void sock_tx_timestamp(struct sock *sk, __u8 *tx_flags)
 {
 	*tx_flags = 0;
-	if (sock_flag(sk, SOCK_TIMESTAMPING_TX_HARDWARE))
+	if (sk->sk_tsflags & SOF_TIMESTAMPING_TX_HARDWARE)
 		*tx_flags |= SKBTX_HW_TSTAMP;
-	if (sock_flag(sk, SOCK_TIMESTAMPING_TX_SOFTWARE))
+	if (sk->sk_tsflags & SOF_TIMESTAMPING_TX_SOFTWARE)
 		*tx_flags |= SKBTX_SW_TSTAMP;
 	if (sock_flag(sk, SOCK_WIFI_STATUS))
 		*tx_flags |= SKBTX_WIFI_STATUS;
@@ -723,12 +723,12 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
 	}
 
 	memset(&tss, 0, sizeof(tss));
-	if ((sock_flag(sk, SOCK_TIMESTAMPING_SOFTWARE) ||
+	if ((sk->sk_tsflags & SOF_TIMESTAMPING_SOFTWARE ||
 	     skb_shinfo(skb)->tx_flags & SKBTX_ANY_SW_TSTAMP) &&
 	    ktime_to_timespec_cond(skb->tstamp, tss.ts + 0))
 		empty = 0;
 	if (shhwtstamps &&
-	    sock_flag(sk, SOCK_TIMESTAMPING_RAW_HARDWARE) &&
+	    (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) &&
 	    ktime_to_timespec_cond(shhwtstamps->hwtstamp, tss.ts + 2))
 		empty = 0;
 	if (!empty)
-- 
2.0.0.526.g5318336

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next v4 3/6] net-timestamp: add key to disambiguate concurrent datagrams
  2014-08-05  2:11 [PATCH net-next v4 0/6] net-timestamp: new tx tstamps and tcp Willem de Bruijn
  2014-08-05  2:11 ` [PATCH net-next v4 1/6] net-timestamp: extend SCM_TIMESTAMPING ancillary data struct Willem de Bruijn
  2014-08-05  2:11 ` [PATCH net-next v4 2/6] net-timestamp: move timestamp flags out of sk_flags Willem de Bruijn
@ 2014-08-05  2:11 ` Willem de Bruijn
  2014-08-05  2:11 ` [PATCH net-next v4 4/6] net-timestamp: SCHED timestamp on entering packet scheduler Willem de Bruijn
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 27+ messages in thread
From: Willem de Bruijn @ 2014-08-05  2:11 UTC (permalink / raw)
  To: netdev; +Cc: davem, eric.dumazet, richardcochran, Willem de Bruijn

Datagrams timestamped on transmission can coexist in the kernel stack
and be reordered in packet scheduling. When reading looped datagrams
from the socket error queue it is not always possible to unique
correlate looped data with original send() call (for application
level retransmits). Even if possible, it may be expensive and complex,
requiring packet inspection.

Introduce a data-independent ID mechanism to associate timestamps with
send calls. Pass an ID alongside the timestamp in field ee_data of
sock_extended_err.

The ID is a simple 32 bit unsigned int that is associated with the
socket and incremented on each send() call for which software tx
timestamp generation is enabled.

The feature is enabled only if SOF_TIMESTAMPING_OPT_ID is set, to
avoid changing ee_data for existing applications that expect it 0.
The counter is reset each time the flag is reenabled. Reenabling
does not change the ID of already submitted data. It is possible
to receive out of order IDs if the timestamp stream is not quiesced
first.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 include/linux/skbuff.h          | 1 +
 include/net/sock.h              | 2 ++
 include/uapi/linux/net_tstamp.h | 8 +++++---
 net/core/skbuff.c               | 2 ++
 net/core/sock.c                 | 3 +++
 net/ipv4/ip_output.c            | 6 ++++++
 net/ipv6/ip6_output.c           | 9 ++++++++-
 7 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 477f0f6..0e35b3a 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -278,6 +278,7 @@ struct skb_shared_info {
 	unsigned short  gso_type;
 	struct sk_buff	*frag_list;
 	struct skb_shared_hwtstamps hwtstamps;
+	u32		tskey;
 	__be32          ip6_frag_id;
 
 	/*
diff --git a/include/net/sock.h b/include/net/sock.h
index a211297..52fe0bc 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -280,6 +280,7 @@ struct cg_proto;
   *	@sk_timer: sock cleanup timer
   *	@sk_stamp: time stamp of last packet received
   *	@sk_tsflags: SO_TIMESTAMPING socket options
+  *	@sk_tskey: counter to disambiguate concurrent tstamp requests
   *	@sk_socket: Identd and reporting IO signals
   *	@sk_user_data: RPC layer private data
   *	@sk_frag: cached page frag
@@ -414,6 +415,7 @@ struct sock {
 	struct timer_list	sk_timer;
 	ktime_t			sk_stamp;
 	u16			sk_tsflags;
+	u32			sk_tskey;
 	struct socket		*sk_socket;
 	void			*sk_user_data;
 	struct page_frag	sk_frag;
diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
index f53879c..1e861d2 100644
--- a/include/uapi/linux/net_tstamp.h
+++ b/include/uapi/linux/net_tstamp.h
@@ -20,9 +20,11 @@ enum {
 	SOF_TIMESTAMPING_SOFTWARE = (1<<4),
 	SOF_TIMESTAMPING_SYS_HARDWARE = (1<<5),
 	SOF_TIMESTAMPING_RAW_HARDWARE = (1<<6),
-	SOF_TIMESTAMPING_MASK =
-	(SOF_TIMESTAMPING_RAW_HARDWARE - 1) |
-	SOF_TIMESTAMPING_RAW_HARDWARE
+	SOF_TIMESTAMPING_OPT_ID = (1<<7),
+
+	SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_ID,
+	SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_LAST - 1) |
+				 SOF_TIMESTAMPING_LAST
 };
 
 /**
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c9f6880..0df4f1d 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3522,6 +3522,8 @@ void skb_tstamp_tx(struct sk_buff *orig_skb,
 	serr->ee.ee_errno = ENOMSG;
 	serr->ee.ee_origin = SO_EE_ORIGIN_TIMESTAMPING;
 	serr->ee.ee_info = SCM_TSTAMP_SND;
+	if (sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)
+		serr->ee.ee_data = skb_shinfo(skb)->tskey;
 
 	err = sock_queue_err_skb(sk, skb);
 
diff --git a/net/core/sock.c b/net/core/sock.c
index 47c9377..1e0f1c6 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -848,6 +848,9 @@ set_rcvbuf:
 			ret = -EINVAL;
 			break;
 		}
+		if (val & SOF_TIMESTAMPING_OPT_ID &&
+		    !(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID))
+			sk->sk_tskey = 0;
 		sk->sk_tsflags = val;
 		if (val & SOF_TIMESTAMPING_RX_SOFTWARE)
 			sock_enable_timestamp(sk,
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index b165568..215af2b 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -855,11 +855,15 @@ static int __ip_append_data(struct sock *sk,
 	unsigned int maxfraglen, fragheaderlen, maxnonfragsize;
 	int csummode = CHECKSUM_NONE;
 	struct rtable *rt = (struct rtable *)cork->dst;
+	u32 tskey = 0;
 
 	skb = skb_peek_tail(queue);
 
 	exthdrlen = !skb ? rt->dst.header_len : 0;
 	mtu = cork->fragsize;
+	if (cork->tx_flags & SKBTX_ANY_SW_TSTAMP &&
+	    sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)
+		tskey = sk->sk_tskey++;
 
 	hh_len = LL_RESERVED_SPACE(rt->dst.dev);
 
@@ -976,6 +980,8 @@ alloc_new_skb:
 			/* only the initial fragment is time stamped */
 			skb_shinfo(skb)->tx_flags = cork->tx_flags;
 			cork->tx_flags = 0;
+			skb_shinfo(skb)->tskey = tskey;
+			tskey = 0;
 
 			/*
 			 *	Find where to start putting bytes.
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index f5dafe6..315a55d 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1157,6 +1157,7 @@ int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to,
 	int err;
 	int offset = 0;
 	__u8 tx_flags = 0;
+	u32 tskey = 0;
 
 	if (flags&MSG_PROBE)
 		return 0;
@@ -1272,8 +1273,12 @@ emsgsize:
 		}
 	}
 
-	if (sk->sk_type == SOCK_DGRAM || sk->sk_type == SOCK_RAW)
+	if (sk->sk_type == SOCK_DGRAM || sk->sk_type == SOCK_RAW) {
 		sock_tx_timestamp(sk, &tx_flags);
+		if (tx_flags & SKBTX_ANY_SW_TSTAMP &&
+		    sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)
+			tskey = sk->sk_tskey++;
+	}
 
 	/*
 	 * Let's try using as much space as possible.
@@ -1397,6 +1402,8 @@ alloc_new_skb:
 			/* Only the initial fragment is time stamped */
 			skb_shinfo(skb)->tx_flags = tx_flags;
 			tx_flags = 0;
+			skb_shinfo(skb)->tskey = tskey;
+			tskey = 0;
 
 			/*
 			 *	Find where to start putting bytes
-- 
2.0.0.526.g5318336

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next v4 4/6] net-timestamp: SCHED timestamp on entering packet scheduler
  2014-08-05  2:11 [PATCH net-next v4 0/6] net-timestamp: new tx tstamps and tcp Willem de Bruijn
                   ` (2 preceding siblings ...)
  2014-08-05  2:11 ` [PATCH net-next v4 3/6] net-timestamp: add key to disambiguate concurrent datagrams Willem de Bruijn
@ 2014-08-05  2:11 ` Willem de Bruijn
  2014-08-05  2:11 ` [PATCH net-next v4 5/6] net-timestamp: TCP timestamping Willem de Bruijn
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 27+ messages in thread
From: Willem de Bruijn @ 2014-08-05  2:11 UTC (permalink / raw)
  To: netdev; +Cc: davem, eric.dumazet, richardcochran, Willem de Bruijn

Kernel transmit latency is often incurred in the packet scheduler.
Introduce a new timestamp on transmission just before entering the
scheduler. When data travels through multiple devices (bonding,
tunneling, ...) each device will export an individual timestamp.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 include/linux/skbuff.h          | 11 +++++++++--
 include/uapi/linux/errqueue.h   |  1 +
 include/uapi/linux/net_tstamp.h |  3 ++-
 net/core/dev.c                  |  4 ++++
 net/core/skbuff.c               | 16 ++++++++++++----
 net/socket.c                    |  3 +++
 6 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 0e35b3a..50e1e9b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -229,7 +229,7 @@ enum {
 	/* generate hardware time stamp */
 	SKBTX_HW_TSTAMP = 1 << 0,
 
-	/* generate software time stamp */
+	/* generate software time stamp when queueing packet to NIC */
 	SKBTX_SW_TSTAMP = 1 << 1,
 
 	/* device driver is going to provide hardware time stamp */
@@ -247,9 +247,12 @@ enum {
 	 * all frags to avoid possible bad checksum
 	 */
 	SKBTX_SHARED_FRAG = 1 << 5,
+
+	/* generate software time stamp when entering packet scheduling */
+	SKBTX_SCHED_TSTAMP = 1 << 6,
 };
 
-#define SKBTX_ANY_SW_TSTAMP	SKBTX_SW_TSTAMP
+#define SKBTX_ANY_SW_TSTAMP	(SKBTX_SW_TSTAMP | SKBTX_SCHED_TSTAMP)
 #define SKBTX_ANY_TSTAMP	(SKBTX_HW_TSTAMP | SKBTX_ANY_SW_TSTAMP)
 
 /*
@@ -2695,6 +2698,10 @@ static inline bool skb_defer_rx_timestamp(struct sk_buff *skb)
 void skb_complete_tx_timestamp(struct sk_buff *skb,
 			       struct skb_shared_hwtstamps *hwtstamps);
 
+void __skb_tstamp_tx(struct sk_buff *orig_skb,
+		     struct skb_shared_hwtstamps *hwtstamps,
+		     struct sock *sk, int tstype);
+
 /**
  * skb_tstamp_tx - queue clone of skb with send time stamps
  * @orig_skb:	the original outgoing packet
diff --git a/include/uapi/linux/errqueue.h b/include/uapi/linux/errqueue.h
index accee72..17437cf 100644
--- a/include/uapi/linux/errqueue.h
+++ b/include/uapi/linux/errqueue.h
@@ -39,6 +39,7 @@ struct scm_timestamping {
  */
 enum {
 	SCM_TSTAMP_SND,		/* driver passed skb to NIC, or HW */
+	SCM_TSTAMP_SCHED,	/* data entered the packet scheduler */
 };
 
 #endif /* _UAPI_LINUX_ERRQUEUE_H */
diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
index 1e861d2..6073384 100644
--- a/include/uapi/linux/net_tstamp.h
+++ b/include/uapi/linux/net_tstamp.h
@@ -21,8 +21,9 @@ enum {
 	SOF_TIMESTAMPING_SYS_HARDWARE = (1<<5),
 	SOF_TIMESTAMPING_RAW_HARDWARE = (1<<6),
 	SOF_TIMESTAMPING_OPT_ID = (1<<7),
+	SOF_TIMESTAMPING_TX_SCHED = (1<<8),
 
-	SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_ID,
+	SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_TX_SCHED,
 	SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_LAST - 1) |
 				 SOF_TIMESTAMPING_LAST
 };
diff --git a/net/core/dev.c b/net/core/dev.c
index b370230..1c15b18 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -132,6 +132,7 @@
 #include <linux/hashtable.h>
 #include <linux/vmalloc.h>
 #include <linux/if_macvlan.h>
+#include <linux/errqueue.h>
 
 #include "net-sysfs.h"
 
@@ -2876,6 +2877,9 @@ static int __dev_queue_xmit(struct sk_buff *skb, void *accel_priv)
 
 	skb_reset_mac_header(skb);
 
+	if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP))
+		__skb_tstamp_tx(skb, NULL, skb->sk, SCM_TSTAMP_SCHED);
+
 	/* Disable soft irqs for various locks below. Also
 	 * stops preemption for RCU.
 	 */
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 0df4f1d..9705c07 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3490,10 +3490,10 @@ int sock_queue_err_skb(struct sock *sk, struct sk_buff *skb)
 }
 EXPORT_SYMBOL(sock_queue_err_skb);
 
-void skb_tstamp_tx(struct sk_buff *orig_skb,
-		struct skb_shared_hwtstamps *hwtstamps)
+void __skb_tstamp_tx(struct sk_buff *orig_skb,
+		     struct skb_shared_hwtstamps *hwtstamps,
+		     struct sock *sk, int tstype)
 {
-	struct sock *sk = orig_skb->sk;
 	struct sock_exterr_skb *serr;
 	struct sk_buff *skb;
 	int err;
@@ -3521,7 +3521,7 @@ void skb_tstamp_tx(struct sk_buff *orig_skb,
 	memset(serr, 0, sizeof(*serr));
 	serr->ee.ee_errno = ENOMSG;
 	serr->ee.ee_origin = SO_EE_ORIGIN_TIMESTAMPING;
-	serr->ee.ee_info = SCM_TSTAMP_SND;
+	serr->ee.ee_info = tstype;
 	if (sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)
 		serr->ee.ee_data = skb_shinfo(skb)->tskey;
 
@@ -3530,6 +3530,14 @@ void skb_tstamp_tx(struct sk_buff *orig_skb,
 	if (err)
 		kfree_skb(skb);
 }
+EXPORT_SYMBOL_GPL(__skb_tstamp_tx);
+
+void skb_tstamp_tx(struct sk_buff *orig_skb,
+		   struct skb_shared_hwtstamps *hwtstamps)
+{
+	return __skb_tstamp_tx(orig_skb, hwtstamps, orig_skb->sk,
+			       SCM_TSTAMP_SND);
+}
 EXPORT_SYMBOL_GPL(skb_tstamp_tx);
 
 void skb_complete_wifi_ack(struct sk_buff *skb, bool acked)
diff --git a/net/socket.c b/net/socket.c
index 255d9b8..3a2778d 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -617,6 +617,9 @@ void sock_tx_timestamp(struct sock *sk, __u8 *tx_flags)
 		*tx_flags |= SKBTX_HW_TSTAMP;
 	if (sk->sk_tsflags & SOF_TIMESTAMPING_TX_SOFTWARE)
 		*tx_flags |= SKBTX_SW_TSTAMP;
+	if (sk->sk_tsflags & SOF_TIMESTAMPING_TX_SCHED)
+		*tx_flags |= SKBTX_SCHED_TSTAMP;
+
 	if (sock_flag(sk, SOCK_WIFI_STATUS))
 		*tx_flags |= SKBTX_WIFI_STATUS;
 }
-- 
2.0.0.526.g5318336

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next v4 5/6] net-timestamp: TCP timestamping
  2014-08-05  2:11 [PATCH net-next v4 0/6] net-timestamp: new tx tstamps and tcp Willem de Bruijn
                   ` (3 preceding siblings ...)
  2014-08-05  2:11 ` [PATCH net-next v4 4/6] net-timestamp: SCHED timestamp on entering packet scheduler Willem de Bruijn
@ 2014-08-05  2:11 ` Willem de Bruijn
  2014-08-06  6:25   ` Eric Dumazet
                     ` (2 more replies)
  2014-08-05  2:11 ` [PATCH net-next v4 6/6] net-timestamp: ACK timestamp for bytestreams Willem de Bruijn
                   ` (3 subsequent siblings)
  8 siblings, 3 replies; 27+ messages in thread
From: Willem de Bruijn @ 2014-08-05  2:11 UTC (permalink / raw)
  To: netdev; +Cc: davem, eric.dumazet, richardcochran, Willem de Bruijn

TCP timestamping extends SO_TIMESTAMPING to bytestreams.

Bytestreams do not have a 1:1 relationship between send() buffers and
network packets. The feature interprets a send call on a bytestream as
a request for a timestamp for the last byte in that send() buffer.

The choice corresponds to a request for a timestamp when all bytes in
the buffer have been sent. That assumption depends on in-order kernel
transmission. This is the common case. That said, it is possible to
construct a traffic shaping tree that would result in reordering.
The guarantee is strong, then, but not ironclad.

This implementation supports send and sendpages (splice). GSO replaces
one large packet with multiple smaller packets. This patch also copies
the option into the correct smaller packet.

This patch does not yet support timestamping on data in an initial TCP
Fast Open SYN, because that takes a very different data path.

If ID generation in ee_data is enabled, bytestream timestamps return a
byte offset, instead of the packet counter for datagrams.

The implementation supports a single timestamp per packet. It silenty
replaces requests for previous timestamps. To avoid missing tstamps,
flush the tcp queue by disabling Nagle, cork and autocork. Missing
tstamps can be detected by offset when the ee_data ID is enabled.

Implementation details:

- On GSO, the timestamping code can be included in the main loop. I
moved it into its own loop to reduce the impact on the common case
to a single branch.

- To avoid leaking the absolute seqno to userspace, the offset
returned in ee_data must always be relative. It is an offset between
an skb and sk field. The first is always set (also for GSO & ACK).
The second must also never be uninitialized. Only allow the ID
option on sockets in the ESTABLISHED state, for which the seqno
is available. Never reset it to zero (instead, move it to the
current seqno when reenabling the option).

Signed-off-by: Willem de Bruijn <willemb@google.com>

---
---
 net/core/skbuff.c      |  5 ++++-
 net/core/sock.c        | 13 +++++++++++--
 net/ipv4/tcp.c         | 22 +++++++++++++++++++---
 net/ipv4/tcp_offload.c | 18 ++++++++++++++++++
 4 files changed, 52 insertions(+), 6 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 9705c07..3dec029 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3522,8 +3522,11 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
 	serr->ee.ee_errno = ENOMSG;
 	serr->ee.ee_origin = SO_EE_ORIGIN_TIMESTAMPING;
 	serr->ee.ee_info = tstype;
-	if (sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)
+	if (sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID) {
 		serr->ee.ee_data = skb_shinfo(skb)->tskey;
+		if (sk->sk_protocol == IPPROTO_TCP)
+			serr->ee.ee_data -= sk->sk_tskey;
+	}
 
 	err = sock_queue_err_skb(sk, skb);
 
diff --git a/net/core/sock.c b/net/core/sock.c
index 1e0f1c6..2714811 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -849,8 +849,17 @@ set_rcvbuf:
 			break;
 		}
 		if (val & SOF_TIMESTAMPING_OPT_ID &&
-		    !(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID))
-			sk->sk_tskey = 0;
+		    !(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)) {
+			if (sk->sk_protocol == IPPROTO_TCP) {
+				if (sk->sk_state != TCP_ESTABLISHED) {
+					ret = -EINVAL;
+					break;
+				}
+				sk->sk_tskey = tcp_sk(sk)->snd_una;
+			} else {
+				sk->sk_tskey = 0;
+			}
+		}
 		sk->sk_tsflags = val;
 		if (val & SOF_TIMESTAMPING_RX_SOFTWARE)
 			sock_enable_timestamp(sk,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 9d2118e..744af67 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -426,6 +426,15 @@ void tcp_init_sock(struct sock *sk)
 }
 EXPORT_SYMBOL(tcp_init_sock);
 
+void tcp_tx_timestamp(struct sock *sk, struct sk_buff *skb)
+{
+	struct skb_shared_info *shinfo = skb_shinfo(skb);
+
+	sock_tx_timestamp(sk, &shinfo->tx_flags);
+	if (shinfo->tx_flags & SKBTX_ANY_SW_TSTAMP)
+		shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
+}
+
 /*
  *	Wait for a TCP event.
  *
@@ -523,7 +532,7 @@ unsigned int tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
 	}
 	/* This barrier is coupled with smp_wmb() in tcp_reset() */
 	smp_rmb();
-	if (sk->sk_err)
+	if (sk->sk_err || !skb_queue_empty(&sk->sk_error_queue))
 		mask |= POLLERR;
 
 	return mask;
@@ -959,8 +968,10 @@ new_segment:
 
 		copied += copy;
 		offset += copy;
-		if (!(size -= copy))
+		if (!(size -= copy)) {
+			tcp_tx_timestamp(sk, skb);
 			goto out;
+		}
 
 		if (skb->len < size_goal || (flags & MSG_OOB))
 			continue;
@@ -1252,8 +1263,10 @@ new_segment:
 
 			from += copy;
 			copied += copy;
-			if ((seglen -= copy) == 0 && iovlen == 0)
+			if ((seglen -= copy) == 0 && iovlen == 0) {
+				tcp_tx_timestamp(sk, skb);
 				goto out;
+			}
 
 			if (skb->len < max || (flags & MSG_OOB) || unlikely(tp->repair))
 				continue;
@@ -1617,6 +1630,9 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	struct sk_buff *skb;
 	u32 urg_hole = 0;
 
+	if (unlikely(flags & MSG_ERRQUEUE))
+		return ip_recv_error(sk, msg, len, addr_len);
+
 	if (sk_can_busy_loop(sk) && skb_queue_empty(&sk->sk_receive_queue) &&
 	    (sk->sk_state == TCP_ESTABLISHED))
 		sk_busy_loop(sk, nonblock);
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 55046ec..f597119 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -14,6 +14,21 @@
 #include <net/tcp.h>
 #include <net/protocol.h>
 
+void tcp_gso_tstamp(struct sk_buff *skb, unsigned int ts_seq, unsigned int seq,
+		    unsigned int mss)
+{
+	while (skb) {
+		if (ts_seq < (__u64) seq + mss) {
+			skb_shinfo(skb)->tx_flags = SKBTX_SW_TSTAMP;
+			skb_shinfo(skb)->tskey = ts_seq;
+			return;
+		}
+
+		skb = skb->next;
+		seq += mss;
+	}
+}
+
 struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 				netdev_features_t features)
 {
@@ -91,6 +106,9 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 	th = tcp_hdr(skb);
 	seq = ntohl(th->seq);
 
+	if (unlikely(skb_shinfo(gso_skb)->tx_flags & SKBTX_SW_TSTAMP))
+		tcp_gso_tstamp(segs, skb_shinfo(gso_skb)->tskey, seq, mss);
+
 	newcheck = ~csum_fold((__force __wsum)((__force u32)th->check +
 					       (__force u32)delta));
 
-- 
2.0.0.526.g5318336

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next v4 6/6] net-timestamp: ACK timestamp for bytestreams
  2014-08-05  2:11 [PATCH net-next v4 0/6] net-timestamp: new tx tstamps and tcp Willem de Bruijn
                   ` (4 preceding siblings ...)
  2014-08-05  2:11 ` [PATCH net-next v4 5/6] net-timestamp: TCP timestamping Willem de Bruijn
@ 2014-08-05  2:11 ` Willem de Bruijn
  2014-08-05  2:16 ` [PATCH net-next v4 0/6] net-timestamp: new tx tstamps and tcp Willem de Bruijn
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 27+ messages in thread
From: Willem de Bruijn @ 2014-08-05  2:11 UTC (permalink / raw)
  To: netdev; +Cc: davem, eric.dumazet, richardcochran, Willem de Bruijn

Add SOF_TIMESTAMPING_TX_ACK, a request for a tstamp when the last byte
in the send() call is acknowledged. It implements the feature for TCP.

The timestamp is generated when the TCP socket cumulative ACK is moved
beyond the tracked seqno for the first time. The feature ignores SACK
and FACK, because those acknowledge the specific byte, but not
necessarily the entire contents of the buffer up to that byte.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 include/linux/skbuff.h          | 7 ++++++-
 include/uapi/linux/errqueue.h   | 1 +
 include/uapi/linux/net_tstamp.h | 3 ++-
 net/ipv4/tcp_input.c            | 6 ++++++
 net/socket.c                    | 2 ++
 5 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 50e1e9b..11c2705 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -250,9 +250,14 @@ enum {
 
 	/* generate software time stamp when entering packet scheduling */
 	SKBTX_SCHED_TSTAMP = 1 << 6,
+
+	/* generate software timestamp on peer data acknowledgment */
+	SKBTX_ACK_TSTAMP = 1 << 7,
 };
 
-#define SKBTX_ANY_SW_TSTAMP	(SKBTX_SW_TSTAMP | SKBTX_SCHED_TSTAMP)
+#define SKBTX_ANY_SW_TSTAMP	(SKBTX_SW_TSTAMP    | \
+				 SKBTX_SCHED_TSTAMP | \
+				 SKBTX_ACK_TSTAMP)
 #define SKBTX_ANY_TSTAMP	(SKBTX_HW_TSTAMP | SKBTX_ANY_SW_TSTAMP)
 
 /*
diff --git a/include/uapi/linux/errqueue.h b/include/uapi/linux/errqueue.h
index 17437cf..07bdce1 100644
--- a/include/uapi/linux/errqueue.h
+++ b/include/uapi/linux/errqueue.h
@@ -40,6 +40,7 @@ struct scm_timestamping {
 enum {
 	SCM_TSTAMP_SND,		/* driver passed skb to NIC, or HW */
 	SCM_TSTAMP_SCHED,	/* data entered the packet scheduler */
+	SCM_TSTAMP_ACK,		/* data acknowledged by peer */
 };
 
 #endif /* _UAPI_LINUX_ERRQUEUE_H */
diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
index 6073384..ff35402 100644
--- a/include/uapi/linux/net_tstamp.h
+++ b/include/uapi/linux/net_tstamp.h
@@ -22,8 +22,9 @@ enum {
 	SOF_TIMESTAMPING_RAW_HARDWARE = (1<<6),
 	SOF_TIMESTAMPING_OPT_ID = (1<<7),
 	SOF_TIMESTAMPING_TX_SCHED = (1<<8),
+	SOF_TIMESTAMPING_TX_ACK = (1<<9),
 
-	SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_TX_SCHED,
+	SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_TX_ACK,
 	SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_LAST - 1) |
 				 SOF_TIMESTAMPING_LAST
 };
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 7832d94..1320602 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -74,6 +74,7 @@
 #include <linux/ipsec.h>
 #include <asm/unaligned.h>
 #include <net/netdma.h>
+#include <linux/errqueue.h>
 
 int sysctl_tcp_timestamps __read_mostly = 1;
 int sysctl_tcp_window_scaling __read_mostly = 1;
@@ -3099,6 +3100,11 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
 			tp->retrans_stamp = 0;
 		}
 
+		if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_ACK_TSTAMP) &&
+		    between(skb_shinfo(skb)->tskey, prior_snd_una,
+			    tp->snd_una + 1))
+			__skb_tstamp_tx(skb, NULL, sk, SCM_TSTAMP_ACK);
+
 		if (!fully_acked)
 			break;
 
diff --git a/net/socket.c b/net/socket.c
index 3a2778d..ae89569 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -619,6 +619,8 @@ void sock_tx_timestamp(struct sock *sk, __u8 *tx_flags)
 		*tx_flags |= SKBTX_SW_TSTAMP;
 	if (sk->sk_tsflags & SOF_TIMESTAMPING_TX_SCHED)
 		*tx_flags |= SKBTX_SCHED_TSTAMP;
+	if (sk->sk_tsflags & SOF_TIMESTAMPING_TX_ACK)
+		*tx_flags |= SKBTX_ACK_TSTAMP;
 
 	if (sock_flag(sk, SOCK_WIFI_STATUS))
 		*tx_flags |= SKBTX_WIFI_STATUS;
-- 
2.0.0.526.g5318336

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v4 0/6] net-timestamp: new tx tstamps and tcp
  2014-08-05  2:11 [PATCH net-next v4 0/6] net-timestamp: new tx tstamps and tcp Willem de Bruijn
                   ` (5 preceding siblings ...)
  2014-08-05  2:11 ` [PATCH net-next v4 6/6] net-timestamp: ACK timestamp for bytestreams Willem de Bruijn
@ 2014-08-05  2:16 ` Willem de Bruijn
  2014-08-05 23:36 ` David Miller
  2014-08-07 23:09 ` Andi Kleen
  8 siblings, 0 replies; 27+ messages in thread
From: Willem de Bruijn @ 2014-08-05  2:16 UTC (permalink / raw)
  To: Network Development
  Cc: David Miller, Eric Dumazet, Richard Cochran, Willem de Bruijn

Apologies for the off-by-one error in the subject of each patch: this
is v5. The changelog does mention that correctly.

If the code needs changes, or if you prefer to merge it together with
documentation + example code, I can also resubmit after the window
reopens for 3.18.

Thanks,

  Willem

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v4 0/6] net-timestamp: new tx tstamps and tcp
  2014-08-05  2:11 [PATCH net-next v4 0/6] net-timestamp: new tx tstamps and tcp Willem de Bruijn
                   ` (6 preceding siblings ...)
  2014-08-05  2:16 ` [PATCH net-next v4 0/6] net-timestamp: new tx tstamps and tcp Willem de Bruijn
@ 2014-08-05 23:36 ` David Miller
  2014-08-07 23:09 ` Andi Kleen
  8 siblings, 0 replies; 27+ messages in thread
From: David Miller @ 2014-08-05 23:36 UTC (permalink / raw)
  To: willemb; +Cc: netdev, eric.dumazet, richardcochran

From: Willem de Bruijn <willemb@google.com>
Date: Mon,  4 Aug 2014 22:11:44 -0400

> Extend socket tx timestamping:
> - allow multiple types of software timestamps aside from send (1)
> - add software timestamp on enter packet scheduling (4)
> - add software timestamp for TCP (5)
> - add software timestamp for TCP on ACK (6)
> 
> The sk_flags option space is nearly exhausted. Also move the
> many timestamp options to a new sk->sk_tstamps (2).
> 
> To disambiguate data when tstamps may arrive out of order,
> optionally return a sequential ID assigned at send (3).
> 
> Extend Linux tx timestamping to monitoring of latency
> incurred within the kernel stack and to protocols embedded in TCP.
> Complex kernel setups may have multiple layers of queueing, including
> multiple instances of packet scheduling, and many classes per layer.
> Many applications embed discrete payloads into TCP bytestreams for
> reliability, flow control, etcetera. Detecting application tail
> latency in such scenarios relies on identifying the exact queue
> responsible if on the host, or the network latency if otherwise.

This looks really good, thanks for taking the time to refine the
interface and how the internal sequence numbering works.

Applied to net-next.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v4 5/6] net-timestamp: TCP timestamping
  2014-08-05  2:11 ` [PATCH net-next v4 5/6] net-timestamp: TCP timestamping Willem de Bruijn
@ 2014-08-06  6:25   ` Eric Dumazet
  2014-08-06  6:55   ` Eric Dumazet
  2014-08-06  7:05   ` Eric Dumazet
  2 siblings, 0 replies; 27+ messages in thread
From: Eric Dumazet @ 2014-08-06  6:25 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: netdev, davem, richardcochran

On Mon, 2014-08-04 at 22:11 -0400, Willem de Bruijn wrote:

...

> diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
> index 55046ec..f597119 100644
> --- a/net/ipv4/tcp_offload.c
> +++ b/net/ipv4/tcp_offload.c
> @@ -14,6 +14,21 @@
>  #include <net/tcp.h>
>  #include <net/protocol.h>
>  
> +void tcp_gso_tstamp(struct sk_buff *skb, unsigned int ts_seq, unsigned int seq,
> +		    unsigned int mss)

static void ...

Or else :

make C=2 net/ipv4/tcp_offload.o
  CHECK   net/ipv4/tcp_offload.c
net/ipv4/tcp_offload.c:17:6: warning: symbol 'tcp_gso_tstamp' was not
declared. Should it be static?


> +{
> +	while (skb) {
> +		if (ts_seq < (__u64) seq + mss) {

TCP stack uses after() and/or before() macros when dealing with 32bit
sequence numbers.

> +			skb_shinfo(skb)->tx_flags = SKBTX_SW_TSTAMP;
> +			skb_shinfo(skb)->tskey = ts_seq;
> +			return;
> +		}
> +
> +		skb = skb->next;
> +		seq += mss;
> +	}
> +}
> +

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v4 5/6] net-timestamp: TCP timestamping
  2014-08-05  2:11 ` [PATCH net-next v4 5/6] net-timestamp: TCP timestamping Willem de Bruijn
  2014-08-06  6:25   ` Eric Dumazet
@ 2014-08-06  6:55   ` Eric Dumazet
  2014-08-06 13:08     ` Willem de Bruijn
  2014-08-06  7:05   ` Eric Dumazet
  2 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2014-08-06  6:55 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: netdev, davem, richardcochran

On Mon, 2014-08-04 at 22:11 -0400, Willem de Bruijn wrote:
> TCP timestamping extends SO_TIMESTAMPING to bytestreams.

> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 9d2118e..744af67 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -426,6 +426,15 @@ void tcp_init_sock(struct sock *sk)
>  }
>  EXPORT_SYMBOL(tcp_init_sock);
>  
> +void tcp_tx_timestamp(struct sock *sk, struct sk_buff *skb)
> +{
> +	struct skb_shared_info *shinfo = skb_shinfo(skb);
> +
> +	sock_tx_timestamp(sk, &shinfo->tx_flags);

Arg, this breaks SKBTX_SHARED_FRAG support.

And sock_tx_timestamp() is quite expensive for TCP where only
SKBTX_ACK_TSTAMP could possibly be supported ?

> +	if (shinfo->tx_flags & SKBTX_ANY_SW_TSTAMP)
> +		shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> +}


static void tcp_tx_timestamp(const struct sock *sk, struct sk_buff *skb)

I'll send a cumulative patch.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v4 5/6] net-timestamp: TCP timestamping
  2014-08-05  2:11 ` [PATCH net-next v4 5/6] net-timestamp: TCP timestamping Willem de Bruijn
  2014-08-06  6:25   ` Eric Dumazet
  2014-08-06  6:55   ` Eric Dumazet
@ 2014-08-06  7:05   ` Eric Dumazet
  2014-08-06  9:49     ` [PATCH net-next] net-timestamp: sock_tx_timestamp() fix Eric Dumazet
  2014-08-06 14:23     ` [PATCH net-next v4 5/6] net-timestamp: TCP timestamping Willem de Bruijn
  2 siblings, 2 replies; 27+ messages in thread
From: Eric Dumazet @ 2014-08-06  7:05 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: netdev, davem, richardcochran

On Mon, 2014-08-04 at 22:11 -0400, Willem de Bruijn wrote:
> TCP timestamping extends SO_TIMESTAMPING to bytestreams.
> 
> Bytestreams do not have a 1:1 relationship between send() buffers and
> network packets. The feature interprets a send call on a bytestream as
> a request for a timestamp for the last byte in that send() buffer.
> 
> The choice corresponds to a request for a timestamp when all bytes in
> the buffer have been sent. That assumption depends on in-order kernel
> transmission. This is the common case. That said, it is possible to
> construct a traffic shaping tree that would result in reordering.
> The guarantee is strong, then, but not ironclad.
> 
> This implementation supports send and sendpages (splice). GSO replaces
> one large packet with multiple smaller packets. This patch also copies
> the option into the correct smaller packet.
> 
> This patch does not yet support timestamping on data in an initial TCP
> Fast Open SYN, because that takes a very different data path.
> 
> If ID generation in ee_data is enabled, bytestream timestamps return a
> byte offset, instead of the packet counter for datagrams.
> 
> The implementation supports a single timestamp per packet. It silenty
> replaces requests for previous timestamps. To avoid missing tstamps,
> flush the tcp queue by disabling Nagle, cork and autocork. Missing
> tstamps can be detected by offset when the ee_data ID is enabled.
> 
> Implementation details:
> 
> - On GSO, the timestamping code can be included in the main loop. I
> moved it into its own loop to reduce the impact on the common case
> to a single branch.
> 
> - To avoid leaking the absolute seqno to userspace, the offset
> returned in ee_data must always be relative. It is an offset between
> an skb and sk field. The first is always set (also for GSO & ACK).
> The second must also never be uninitialized. Only allow the ID
> option on sockets in the ESTABLISHED state, for which the seqno
> is available. Never reset it to zero (instead, move it to the
> current seqno when reenabling the option).
> 
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> 


I believe you missed fact that a GSO packet can be split into 2 or many
skbs, when tcp_write_xmit() has to send a lower amount.

This is done in tso_fragment()

Also, I am a bit worried about retransmits ?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH net-next] net-timestamp: sock_tx_timestamp() fix
  2014-08-06  7:05   ` Eric Dumazet
@ 2014-08-06  9:49     ` Eric Dumazet
  2014-08-06 12:52       ` Willem de Bruijn
  2014-08-06 19:38       ` David Miller
  2014-08-06 14:23     ` [PATCH net-next v4 5/6] net-timestamp: TCP timestamping Willem de Bruijn
  1 sibling, 2 replies; 27+ messages in thread
From: Eric Dumazet @ 2014-08-06  9:49 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: netdev, davem, richardcochran

From: Eric Dumazet <edumazet@google.com>

sock_tx_timestamp() should not ignore initial *tx_flags value, as TCP
stack can store SKBTX_SHARED_FRAG in it.

Also first argument (struct sock *) can be const.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 4ed2d765dfac ("net-timestamp: TCP timestamping")
Cc: Willem de Bruijn <willemb@google.com>
---
 include/net/sock.h |    6 ++++--
 net/socket.c       |   20 +++++++++++++-------
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 52fe0bc5598a..38805fa02e48 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2199,9 +2199,11 @@ static inline void sock_recv_ts_and_drops(struct msghdr *msg, struct sock *sk,
 /**
  * sock_tx_timestamp - checks whether the outgoing packet is to be time stamped
  * @sk:		socket sending this packet
- * @tx_flags:	filled with instructions for time stamping
+ * @tx_flags:	completed with instructions for time stamping
+ *
+ * Note : callers should take care of initial *tx_flags value (usually 0)
  */
-void sock_tx_timestamp(struct sock *sk, __u8 *tx_flags);
+void sock_tx_timestamp(const struct sock *sk, __u8 *tx_flags);
 
 /**
  * sk_eat_skb - Release a skb if it is no longer needed
diff --git a/net/socket.c b/net/socket.c
index ae89569a2db5..95ee7d8682e7 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -610,20 +610,26 @@ void sock_release(struct socket *sock)
 }
 EXPORT_SYMBOL(sock_release);
 
-void sock_tx_timestamp(struct sock *sk, __u8 *tx_flags)
+void sock_tx_timestamp(const struct sock *sk, __u8 *tx_flags)
 {
-	*tx_flags = 0;
+	u8 flags = *tx_flags;
+
 	if (sk->sk_tsflags & SOF_TIMESTAMPING_TX_HARDWARE)
-		*tx_flags |= SKBTX_HW_TSTAMP;
+		flags |= SKBTX_HW_TSTAMP;
+
 	if (sk->sk_tsflags & SOF_TIMESTAMPING_TX_SOFTWARE)
-		*tx_flags |= SKBTX_SW_TSTAMP;
+		flags |= SKBTX_SW_TSTAMP;
+
 	if (sk->sk_tsflags & SOF_TIMESTAMPING_TX_SCHED)
-		*tx_flags |= SKBTX_SCHED_TSTAMP;
+		flags |= SKBTX_SCHED_TSTAMP;
+
 	if (sk->sk_tsflags & SOF_TIMESTAMPING_TX_ACK)
-		*tx_flags |= SKBTX_ACK_TSTAMP;
+		flags |= SKBTX_ACK_TSTAMP;
 
 	if (sock_flag(sk, SOCK_WIFI_STATUS))
-		*tx_flags |= SKBTX_WIFI_STATUS;
+		flags |= SKBTX_WIFI_STATUS;
+
+	*tx_flags = flags;
 }
 EXPORT_SYMBOL(sock_tx_timestamp);
 

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next] net-timestamp: sock_tx_timestamp() fix
  2014-08-06  9:49     ` [PATCH net-next] net-timestamp: sock_tx_timestamp() fix Eric Dumazet
@ 2014-08-06 12:52       ` Willem de Bruijn
  2014-08-06 19:38       ` David Miller
  1 sibling, 0 replies; 27+ messages in thread
From: Willem de Bruijn @ 2014-08-06 12:52 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Network Development, David Miller, Richard Cochran

On Wed, Aug 6, 2014 at 5:49 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> sock_tx_timestamp() should not ignore initial *tx_flags value, as TCP
> stack can store SKBTX_SHARED_FRAG in it.
>
> Also first argument (struct sock *) can be const.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Fixes: 4ed2d765dfac ("net-timestamp: TCP timestamping")
> Cc: Willem de Bruijn <willemb@google.com>

Acked-by: Willem de Bruijn <willemb@google.com>

Thanks, Eric!

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v4 5/6] net-timestamp: TCP timestamping
  2014-08-06  6:55   ` Eric Dumazet
@ 2014-08-06 13:08     ` Willem de Bruijn
  2014-08-06 14:36       ` Eric Dumazet
  0 siblings, 1 reply; 27+ messages in thread
From: Willem de Bruijn @ 2014-08-06 13:08 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Network Development, David Miller, Richard Cochran

On Wed, Aug 6, 2014 at 2:55 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2014-08-04 at 22:11 -0400, Willem de Bruijn wrote:
>> TCP timestamping extends SO_TIMESTAMPING to bytestreams.
>
>>
>> +void tcp_tx_timestamp(struct sock *sk, struct sk_buff *skb)
>> +{
>> +     struct skb_shared_info *shinfo = skb_shinfo(skb);
>> +
>> +     sock_tx_timestamp(sk, &shinfo->tx_flags);
>
> Arg, this breaks SKBTX_SHARED_FRAG support.
>
> And sock_tx_timestamp() is quite expensive for TCP where only
> SKBTX_ACK_TSTAMP could possibly be supported ?

The other options can also be set. But it is expensive. It can
actually be wrapped into a branch if (sk->sk_tsflags), to avoid
calling the function in the common case.

>
> static void tcp_tx_timestamp(const struct sock *sk, struct sk_buff *skb)
>
> I'll send a cumulative patch.

Thanks for spotting and fixing the critical shared frags issue. I
can send a patch for this, the two static functions identified by
sparse and the sequence number arithmetic in tcp_gso_tstamp.
>
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v4 5/6] net-timestamp: TCP timestamping
  2014-08-06  7:05   ` Eric Dumazet
  2014-08-06  9:49     ` [PATCH net-next] net-timestamp: sock_tx_timestamp() fix Eric Dumazet
@ 2014-08-06 14:23     ` Willem de Bruijn
  2014-08-06 14:37       ` Eric Dumazet
  1 sibling, 1 reply; 27+ messages in thread
From: Willem de Bruijn @ 2014-08-06 14:23 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Network Development, David Miller, Richard Cochran

> I believe you missed fact that a GSO packet can be split into 2 or many
> skbs, when tcp_write_xmit() has to send a lower amount.
>
> This is done in tso_fragment()

Thanks, I missed that case. I'll look into it.

> Also, I am a bit worried about retransmits ?

For the same reason, i.e., possibly calling tcp_fragment?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v4 5/6] net-timestamp: TCP timestamping
  2014-08-06 13:08     ` Willem de Bruijn
@ 2014-08-06 14:36       ` Eric Dumazet
  0 siblings, 0 replies; 27+ messages in thread
From: Eric Dumazet @ 2014-08-06 14:36 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development, David Miller, Richard Cochran

On Wed, 2014-08-06 at 09:08 -0400, Willem de Bruijn wrote:

> Thanks for spotting and fixing the critical shared frags issue. I
> can send a patch for this, the two static functions identified by
> sparse and the sequence number arithmetic in tcp_gso_tstamp.

Please send the patch, as I am quite busy today.

Another issue is your should set the flag, not clearing others in
tcp_gso_tstamp()

	skb_shinfo(skb)->tx_flags |= SKBTX_SW_TSTAMP;


Not sure if this matters at the moment, but it would be cleaner anyway.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v4 5/6] net-timestamp: TCP timestamping
  2014-08-06 14:23     ` [PATCH net-next v4 5/6] net-timestamp: TCP timestamping Willem de Bruijn
@ 2014-08-06 14:37       ` Eric Dumazet
  2014-08-06 19:20         ` Willem de Bruijn
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2014-08-06 14:37 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development, David Miller, Richard Cochran

On Wed, 2014-08-06 at 10:23 -0400, Willem de Bruijn wrote:
> > I believe you missed fact that a GSO packet can be split into 2 or many
> > skbs, when tcp_write_xmit() has to send a lower amount.
> >
> > This is done in tso_fragment()
> 
> Thanks, I missed that case. I'll look into it.
> 
> > Also, I am a bit worried about retransmits ?
> 
> For the same reason, i.e., possibly calling tcp_fragment?

Yes, but also that we're going to send multiple reports back to error
queue.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v4 5/6] net-timestamp: TCP timestamping
  2014-08-06 14:37       ` Eric Dumazet
@ 2014-08-06 19:20         ` Willem de Bruijn
  2014-08-06 21:32           ` David Miller
  0 siblings, 1 reply; 27+ messages in thread
From: Willem de Bruijn @ 2014-08-06 19:20 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Network Development, David Miller, Richard Cochran

On Wed, Aug 6, 2014 at 10:37 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2014-08-06 at 10:23 -0400, Willem de Bruijn wrote:
>> > I believe you missed fact that a GSO packet can be split into 2 or many
>> > skbs, when tcp_write_xmit() has to send a lower amount.
>> >
>> > This is done in tso_fragment()
>>
>> Thanks, I missed that case. I'll look into it.
>>
>> > Also, I am a bit worried about retransmits ?
>>
>> For the same reason, i.e., possibly calling tcp_fragment?
>
> Yes,

Good catch, thanks. Both cases only affect the correctness of this
feature, not existing code, so I will work on a separate fix for net
later.

> but also that we're going to send multiple reports back to error
> queue.

I see. The optimization patch to queue timestamps without payload will
mitigate that somewhat. I dropped that from the initial patchset, but
will fix it up for net-next. It may also be possible to squash
multiple timestamped packets on the errqueue together when they all
have the same payload, resulting in a single (possibly no-payload)
packet with repeating cmsgs IP_RECVERR and SCM_TIMESTAMPING. That
would give O(1) overhead regardless of number of retransmits.

I just sent the fix to the other issues.

>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next] net-timestamp: sock_tx_timestamp() fix
  2014-08-06  9:49     ` [PATCH net-next] net-timestamp: sock_tx_timestamp() fix Eric Dumazet
  2014-08-06 12:52       ` Willem de Bruijn
@ 2014-08-06 19:38       ` David Miller
  1 sibling, 0 replies; 27+ messages in thread
From: David Miller @ 2014-08-06 19:38 UTC (permalink / raw)
  To: eric.dumazet; +Cc: willemb, netdev, richardcochran

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 06 Aug 2014 11:49:29 +0200

> From: Eric Dumazet <edumazet@google.com>
> 
> sock_tx_timestamp() should not ignore initial *tx_flags value, as TCP
> stack can store SKBTX_SHARED_FRAG in it.
> 
> Also first argument (struct sock *) can be const.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Fixes: 4ed2d765dfac ("net-timestamp: TCP timestamping")

Applied, thanks Eric.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v4 5/6] net-timestamp: TCP timestamping
  2014-08-06 19:20         ` Willem de Bruijn
@ 2014-08-06 21:32           ` David Miller
  2014-08-07  0:59             ` Willem de Bruijn
  0 siblings, 1 reply; 27+ messages in thread
From: David Miller @ 2014-08-06 21:32 UTC (permalink / raw)
  To: willemb; +Cc: eric.dumazet, netdev, richardcochran

From: Willem de Bruijn <willemb@google.com>
Date: Wed, 6 Aug 2014 15:20:39 -0400

> On Wed, Aug 6, 2014 at 10:37 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> but also that we're going to send multiple reports back to error
>> queue.
> 
> I see. The optimization patch to queue timestamps without payload will
> mitigate that somewhat. I dropped that from the initial patchset, but
> will fix it up for net-next. It may also be possible to squash
> multiple timestamped packets on the errqueue together when they all
> have the same payload, resulting in a single (possibly no-payload)
> packet with repeating cmsgs IP_RECVERR and SCM_TIMESTAMPING. That
> would give O(1) overhead regardless of number of retransmits.

We could attach a singly linked list of small sequence number cookies
to the SKB when it gets queued up.

Upon an ACK, we need only look at the entries at the front of the list
until we see one past the cumulative ACK.

There's a little bit of work to do when SKBs are coalesced or chopped
up, but in return the timestamp indications the user gets will be very
accurate.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v4 5/6] net-timestamp: TCP timestamping
  2014-08-06 21:32           ` David Miller
@ 2014-08-07  0:59             ` Willem de Bruijn
  2014-08-07  2:03               ` David Miller
  0 siblings, 1 reply; 27+ messages in thread
From: Willem de Bruijn @ 2014-08-07  0:59 UTC (permalink / raw)
  To: David Miller; +Cc: Eric Dumazet, Network Development, Richard Cochran

>>> but also that we're going to send multiple reports back to error
>>> queue.
>>
>> I see. The optimization patch to queue timestamps without payload will
>> mitigate that somewhat. I dropped that from the initial patchset, but
>> will fix it up for net-next. It may also be possible to squash
>> multiple timestamped packets on the errqueue together when they all
>> have the same payload, resulting in a single (possibly no-payload)
>> packet with repeating cmsgs IP_RECVERR and SCM_TIMESTAMPING. That
>> would give O(1) overhead regardless of number of retransmits.
>
> We could attach a singly linked list of small sequence number cookies
> to the SKB when it gets queued up.

To avoid queuing a clone of the skb for each timestamp with
sock_queue_err_skb, only queue the first occurrence. Record
subsequent (tstype, tstamp) tuples to the queued skb with
matching tskey, and at ip_recv_error convert each into a cmsg?

> Upon an ACK, we need only look at the entries at the front of the list
> until we see one past the cumulative ACK.
>
> There's a little bit of work to do when SKBs are coalesced or chopped
> up, but in return the timestamp indications the user gets will be very
> accurate.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v4 5/6] net-timestamp: TCP timestamping
  2014-08-07  0:59             ` Willem de Bruijn
@ 2014-08-07  2:03               ` David Miller
  2014-08-07  3:43                 ` Willem de Bruijn
  0 siblings, 1 reply; 27+ messages in thread
From: David Miller @ 2014-08-07  2:03 UTC (permalink / raw)
  To: willemb; +Cc: eric.dumazet, netdev, richardcochran

From: Willem de Bruijn <willemb@google.com>
Date: Wed, 6 Aug 2014 20:59:30 -0400

>>>> but also that we're going to send multiple reports back to error
>>>> queue.
>>>
>>> I see. The optimization patch to queue timestamps without payload will
>>> mitigate that somewhat. I dropped that from the initial patchset, but
>>> will fix it up for net-next. It may also be possible to squash
>>> multiple timestamped packets on the errqueue together when they all
>>> have the same payload, resulting in a single (possibly no-payload)
>>> packet with repeating cmsgs IP_RECVERR and SCM_TIMESTAMPING. That
>>> would give O(1) overhead regardless of number of retransmits.
>>
>> We could attach a singly linked list of small sequence number cookies
>> to the SKB when it gets queued up.
> 
> To avoid queuing a clone of the skb for each timestamp with
> sock_queue_err_skb, only queue the first occurrence. Record
> subsequent (tstype, tstamp) tuples to the queued skb with
> matching tskey, and at ip_recv_error convert each into a cmsg?

The retransmit queue only contains the original transmit SKB(s).

So the only modification in tcp_clean_rtx_queue() is to walk the
list of timestamp sequence number cookies.

Actually I see what might be an existing problem... for example, the
leading data which is already ACK'd is trimmed from the front of the
SKB in the TSO code path of tcp_clean_rtx_queue().  And this data is
tossed before the __skb_tstamp_tx() invocation.

Hmmm...

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v4 5/6] net-timestamp: TCP timestamping
  2014-08-07  2:03               ` David Miller
@ 2014-08-07  3:43                 ` Willem de Bruijn
  2014-08-07  3:50                   ` David Miller
  0 siblings, 1 reply; 27+ messages in thread
From: Willem de Bruijn @ 2014-08-07  3:43 UTC (permalink / raw)
  To: David Miller; +Cc: Eric Dumazet, Network Development, Richard Cochran

On Wed, Aug 6, 2014 at 10:03 PM, David Miller <davem@davemloft.net> wrote:
> From: Willem de Bruijn <willemb@google.com>
> Date: Wed, 6 Aug 2014 20:59:30 -0400
>
>>>>> but also that we're going to send multiple reports back to error
>>>>> queue.
>>>>
>>>> I see. The optimization patch to queue timestamps without payload will
>>>> mitigate that somewhat. I dropped that from the initial patchset, but
>>>> will fix it up for net-next. It may also be possible to squash
>>>> multiple timestamped packets on the errqueue together when they all
>>>> have the same payload, resulting in a single (possibly no-payload)
>>>> packet with repeating cmsgs IP_RECVERR and SCM_TIMESTAMPING. That
>>>> would give O(1) overhead regardless of number of retransmits.
>>>
>>> We could attach a singly linked list of small sequence number cookies
>>> to the SKB when it gets queued up.
>>
>> To avoid queuing a clone of the skb for each timestamp with
>> sock_queue_err_skb, only queue the first occurrence. Record
>> subsequent (tstype, tstamp) tuples to the queued skb with
>> matching tskey, and at ip_recv_error convert each into a cmsg?
>
> The retransmit queue only contains the original transmit SKB(s).
>
> So the only modification in tcp_clean_rtx_queue() is to walk the
> list of timestamp sequence number cookies.

I think I may have misunderstood the design. When are cookies
added to this list and what exactly do they record? Attach a
cookie to the SKB on each invocation of skb_tstamp_tx, instead
of cloning the SKB every time and queuing each clone onto
the errqueue. Build this list of cookies and flush them at once on
the final ACK timestamp? If so, then cookies record a timestamp
and -type, but all refer to the same skb_shinfo(skb)->tskey.

> Actually I see what might be an existing problem... for example, the
> leading data which is already ACK'd is trimmed from the front of the
> SKB in the TSO code path of tcp_clean_rtx_queue().  And this data is
> tossed before the __skb_tstamp_tx() invocation.
>
> Hmmm...

The break after tcp_tso_acked, when bytes have been trimmed off the
head. Yes, indeed. Maybe I can move the timestamp code before that, at
the top of the loop.. I'll take a closer look to verify.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v4 5/6] net-timestamp: TCP timestamping
  2014-08-07  3:43                 ` Willem de Bruijn
@ 2014-08-07  3:50                   ` David Miller
  0 siblings, 0 replies; 27+ messages in thread
From: David Miller @ 2014-08-07  3:50 UTC (permalink / raw)
  To: willemb; +Cc: eric.dumazet, netdev, richardcochran

From: Willem de Bruijn <willemb@google.com>
Date: Wed, 6 Aug 2014 23:43:11 -0400

> On Wed, Aug 6, 2014 at 10:03 PM, David Miller <davem@davemloft.net> wrote:
>> From: Willem de Bruijn <willemb@google.com>
>> Date: Wed, 6 Aug 2014 20:59:30 -0400
>>
>>>>>> but also that we're going to send multiple reports back to error
>>>>>> queue.
>>>>>
>>>>> I see. The optimization patch to queue timestamps without payload will
>>>>> mitigate that somewhat. I dropped that from the initial patchset, but
>>>>> will fix it up for net-next. It may also be possible to squash
>>>>> multiple timestamped packets on the errqueue together when they all
>>>>> have the same payload, resulting in a single (possibly no-payload)
>>>>> packet with repeating cmsgs IP_RECVERR and SCM_TIMESTAMPING. That
>>>>> would give O(1) overhead regardless of number of retransmits.
>>>>
>>>> We could attach a singly linked list of small sequence number cookies
>>>> to the SKB when it gets queued up.
>>>
>>> To avoid queuing a clone of the skb for each timestamp with
>>> sock_queue_err_skb, only queue the first occurrence. Record
>>> subsequent (tstype, tstamp) tuples to the queued skb with
>>> matching tskey, and at ip_recv_error convert each into a cmsg?
>>
>> The retransmit queue only contains the original transmit SKB(s).
>>
>> So the only modification in tcp_clean_rtx_queue() is to walk the
>> list of timestamp sequence number cookies.
> 
> I think I may have misunderstood the design. When are cookies
> added to this list and what exactly do they record? Attach a
> cookie to the SKB on each invocation of skb_tstamp_tx, instead
> of cloning the SKB every time and queuing each clone onto
> the errqueue. Build this list of cookies and flush them at once on
> the final ACK timestamp? If so, then cookies record a timestamp
> and -type, but all refer to the same skb_shinfo(skb)->tskey.

When transmit packets are built in TCP for which we will provide
a timestamp, that is when we will allocate the cookie blob and
link it into the SKB.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v4 0/6] net-timestamp: new tx tstamps and tcp
  2014-08-05  2:11 [PATCH net-next v4 0/6] net-timestamp: new tx tstamps and tcp Willem de Bruijn
                   ` (7 preceding siblings ...)
  2014-08-05 23:36 ` David Miller
@ 2014-08-07 23:09 ` Andi Kleen
  2014-08-07 23:39   ` Willem de Bruijn
  8 siblings, 1 reply; 27+ messages in thread
From: Andi Kleen @ 2014-08-07 23:09 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: netdev, davem, eric.dumazet, richardcochran

Willem de Bruijn <willemb@google.com> writes:

> Extend socket tx timestamping:
> - allow multiple types of software timestamps aside from send (1)
> - add software timestamp on enter packet scheduling (4)
> - add software timestamp for TCP (5)
> - add software timestamp for TCP on ACK (6)

Is there any freely available reference library or tool to make use of
all these new time stamps?

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v4 0/6] net-timestamp: new tx tstamps and tcp
  2014-08-07 23:09 ` Andi Kleen
@ 2014-08-07 23:39   ` Willem de Bruijn
  0 siblings, 0 replies; 27+ messages in thread
From: Willem de Bruijn @ 2014-08-07 23:39 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Network Development, David Miller, Eric Dumazet, Richard Cochran

On Thu, Aug 7, 2014 at 7:09 PM, Andi Kleen <andi@firstfloor.org> wrote:
> Willem de Bruijn <willemb@google.com> writes:
>
>> Extend socket tx timestamping:
>> - allow multiple types of software timestamps aside from send (1)
>> - add software timestamp on enter packet scheduling (4)
>> - add software timestamp for TCP (5)
>> - add software timestamp for TCP on ACK (6)
>
> Is there any freely available reference library or tool to make use of
> all these new time stamps?

I posted one on github.com/wdebruij/kerneltools earlier, but the
interface has changed in the meantime. I just updated it to work with
the latest API.

There are still references to the old MSG_TSTAMP API in the test file
(txtimestamp.c), so I will need to clean it up some more. But it
works as is.

The v1 patchset also had a patch to the documentation. I will revise
that to match the latest interface, too, and send both files for
inclusion in Documention once net-next opens.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2014-08-07 23:40 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-05  2:11 [PATCH net-next v4 0/6] net-timestamp: new tx tstamps and tcp Willem de Bruijn
2014-08-05  2:11 ` [PATCH net-next v4 1/6] net-timestamp: extend SCM_TIMESTAMPING ancillary data struct Willem de Bruijn
2014-08-05  2:11 ` [PATCH net-next v4 2/6] net-timestamp: move timestamp flags out of sk_flags Willem de Bruijn
2014-08-05  2:11 ` [PATCH net-next v4 3/6] net-timestamp: add key to disambiguate concurrent datagrams Willem de Bruijn
2014-08-05  2:11 ` [PATCH net-next v4 4/6] net-timestamp: SCHED timestamp on entering packet scheduler Willem de Bruijn
2014-08-05  2:11 ` [PATCH net-next v4 5/6] net-timestamp: TCP timestamping Willem de Bruijn
2014-08-06  6:25   ` Eric Dumazet
2014-08-06  6:55   ` Eric Dumazet
2014-08-06 13:08     ` Willem de Bruijn
2014-08-06 14:36       ` Eric Dumazet
2014-08-06  7:05   ` Eric Dumazet
2014-08-06  9:49     ` [PATCH net-next] net-timestamp: sock_tx_timestamp() fix Eric Dumazet
2014-08-06 12:52       ` Willem de Bruijn
2014-08-06 19:38       ` David Miller
2014-08-06 14:23     ` [PATCH net-next v4 5/6] net-timestamp: TCP timestamping Willem de Bruijn
2014-08-06 14:37       ` Eric Dumazet
2014-08-06 19:20         ` Willem de Bruijn
2014-08-06 21:32           ` David Miller
2014-08-07  0:59             ` Willem de Bruijn
2014-08-07  2:03               ` David Miller
2014-08-07  3:43                 ` Willem de Bruijn
2014-08-07  3:50                   ` David Miller
2014-08-05  2:11 ` [PATCH net-next v4 6/6] net-timestamp: ACK timestamp for bytestreams Willem de Bruijn
2014-08-05  2:16 ` [PATCH net-next v4 0/6] net-timestamp: new tx tstamps and tcp Willem de Bruijn
2014-08-05 23:36 ` David Miller
2014-08-07 23:09 ` Andi Kleen
2014-08-07 23:39   ` Willem de Bruijn

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.