All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next RFC 0/5] net-timestamp: address blinding and batching
@ 2015-01-09 17:31 Willem de Bruijn
  2015-01-09 17:31 ` [PATCH net-next RFC 1/5] net-timestamp: no-payload option Willem de Bruijn
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Willem de Bruijn @ 2015-01-09 17:31 UTC (permalink / raw)
  To: netdev; +Cc: davem, richardcochran, eric.dumazet, luto, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Two issues were raised during recent timestamping discussions:
1. looping full packets on the error queue exposes packet headers
2. TCP timestamping with retransmissions generates many timestamps

This RFC patchset is an attempt at addressing both without breaking
legacy behavior.

Patch 1 reintroduces the "no payload" timestamp option, which loops
timestamps onto an empty skb. Patch 2 then gives administrators the
power to block all timestamp requests by unprivileged users that
contain data. I proposed this earlier as a backward compatible
workaround in the discussion of

  net-timestamp: pull headers for SOCK_STREAM
  http://patchwork.ozlabs.org/patch/414810/

Patch 3 only updates the txtimestamp example to test this option.

When looping timestamps without data, there is no need to associate
a timestamp with a specific packet. Patch 4 loops multiple timestamps
onto a single outstanding packet if this has no payload. It is a
variant of the cookies approach that David proposed in

  net-timestamp: TCP timestamping
  http://patchwork.ozlabs.org/patch/376513/

That patch turns out to introduce a quite a bit of code to save
relatively few bytes because
1. no-payload already limits the per-skb size that is queued and
2. batching is limited by send() failing as soon as there is an
   outstanding packet on the error queue. Therefore, I'm fine
with dropping this. By now, it is at least recorded in patchwork.

Patch 5, finally, creates a new short SO_TIMESTAMPING option,
SOF_TIMESTAMPING_TX, that combines all recent options, as a push
to get future applications to use the new ID and no-payload based
API by default.


Willem de Bruijn (5):
  net-timestamp: no-payload option
  net-timestamp: no-payload only sysctl
  net-timestamp: no-payload option in txtimestamp test
  net-timestamp: tx timestamp cookies
  net-timestamp: tx timestamping default mode flag

 .../networking/timestamping/txtimestamp.c          |  28 ++++-
 include/linux/skbuff.h                             |  12 ++
 include/net/sock.h                                 |   4 +-
 include/uapi/linux/errqueue.h                      |   1 +
 include/uapi/linux/net_tstamp.h                    |  11 +-
 net/core/skbuff.c                                  | 134 ++++++++++++++++++---
 net/core/sock.c                                    |   3 +
 net/core/sysctl_net_core.c                         |   9 ++
 net/ipv4/ip_sockglue.c                             |   9 +-
 net/ipv6/datagram.c                                |   4 +-
 net/rxrpc/ar-error.c                               |   5 +
 net/socket.c                                       |  64 +++++++++-
 12 files changed, 250 insertions(+), 34 deletions(-)

-- 
2.2.0.rc0.207.ga3a616c

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH net-next RFC 1/5] net-timestamp: no-payload option
  2015-01-09 17:31 [PATCH net-next RFC 0/5] net-timestamp: address blinding and batching Willem de Bruijn
@ 2015-01-09 17:31 ` Willem de Bruijn
  2015-01-09 19:43   ` Andy Lutomirski
  2015-01-11 20:26   ` Richard Cochran
  2015-01-09 17:31 ` [PATCH net-next RFC 2/5] net-timestamp: no-payload only sysctl Willem de Bruijn
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 18+ messages in thread
From: Willem de Bruijn @ 2015-01-09 17:31 UTC (permalink / raw)
  To: netdev; +Cc: davem, richardcochran, eric.dumazet, luto, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Add timestamping option SOF_TIMESTAMPING_OPT_TSONLY. For transmit
timestamps, this loops timestamps on top of empty packets.

Doing so reduces the pressure on SO_RCVBUF. Payload inspection and
cmsg reception (aside from timestamps) are no longer possible. This
works together with a follow on patch that allows administrators to
only allow tx timestamping if it does not loop payload or metadata.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 include/uapi/linux/net_tstamp.h |  3 ++-
 net/core/skbuff.c               | 19 ++++++++++++++-----
 net/ipv4/ip_sockglue.c          |  9 +++++----
 net/ipv6/datagram.c             |  4 ++--
 net/rxrpc/ar-error.c            |  5 +++++
 5 files changed, 28 insertions(+), 12 deletions(-)

diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
index edbc888..6d1abea 100644
--- a/include/uapi/linux/net_tstamp.h
+++ b/include/uapi/linux/net_tstamp.h
@@ -24,8 +24,9 @@ enum {
 	SOF_TIMESTAMPING_TX_SCHED = (1<<8),
 	SOF_TIMESTAMPING_TX_ACK = (1<<9),
 	SOF_TIMESTAMPING_OPT_CMSG = (1<<10),
+	SOF_TIMESTAMPING_OPT_TSONLY = (1<<11),
 
-	SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_CMSG,
+	SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_TSONLY,
 	SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_LAST - 1) |
 				 SOF_TIMESTAMPING_LAST
 };
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5a2a2e8..ece2bb8 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3710,19 +3710,28 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
 		     struct sock *sk, int tstype)
 {
 	struct sk_buff *skb;
+	bool tsonly = sk->sk_tsflags & SOF_TIMESTAMPING_OPT_TSONLY;
 
 	if (!sk)
 		return;
 
-	if (hwtstamps)
-		*skb_hwtstamps(orig_skb) = *hwtstamps;
+	if (tsonly)
+		skb = alloc_skb(0, GFP_ATOMIC);
 	else
-		orig_skb->tstamp = ktime_get_real();
-
-	skb = skb_clone(orig_skb, GFP_ATOMIC);
+		skb = skb_clone(orig_skb, GFP_ATOMIC);
 	if (!skb)
 		return;
 
+	if (tsonly) {
+		skb_shinfo(skb)->tx_flags = skb_shinfo(orig_skb)->tx_flags;
+		skb_shinfo(skb)->tskey = skb_shinfo(orig_skb)->tskey;
+	}
+
+	if (hwtstamps)
+		*skb_hwtstamps(skb) = *hwtstamps;
+	else
+		skb->tstamp = ktime_get_real();
+
 	__skb_complete_tx_timestamp(skb, sk, tstype);
 }
 EXPORT_SYMBOL_GPL(__skb_tstamp_tx);
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index a317797..d81ef70 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -440,7 +440,7 @@ static bool ipv4_pktinfo_prepare_errqueue(const struct sock *sk,
 
 	if ((ee_origin != SO_EE_ORIGIN_TIMESTAMPING) ||
 	    (!(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_CMSG)) ||
-	    (!skb->dev))
+	    (!skb->dev) || (!skb->len))
 		return false;
 
 	info->ipi_spec_dst.s_addr = ip_hdr(skb)->saddr;
@@ -483,7 +483,7 @@ int ip_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
 
 	serr = SKB_EXT_ERR(skb);
 
-	if (sin) {
+	if (sin && skb->len) {
 		sin->sin_family = AF_INET;
 		sin->sin_addr.s_addr = *(__be32 *)(skb_network_header(skb) +
 						   serr->addr_offset);
@@ -496,8 +496,9 @@ int ip_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
 	sin = &errhdr.offender;
 	sin->sin_family = AF_UNSPEC;
 
-	if (serr->ee.ee_origin == SO_EE_ORIGIN_ICMP ||
-	    ipv4_pktinfo_prepare_errqueue(sk, skb, serr->ee.ee_origin)) {
+	if (skb->len &&
+	    (serr->ee.ee_origin == SO_EE_ORIGIN_ICMP ||
+	     ipv4_pktinfo_prepare_errqueue(sk, skb, serr->ee.ee_origin))) {
 		struct inet_sock *inet = inet_sk(sk);
 
 		sin->sin_family = AF_INET;
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 100c589..91a31ea 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -369,7 +369,7 @@ int ipv6_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
 
 	serr = SKB_EXT_ERR(skb);
 
-	if (sin) {
+	if (sin && skb->len) {
 		const unsigned char *nh = skb_network_header(skb);
 		sin->sin6_family = AF_INET6;
 		sin->sin6_flowinfo = 0;
@@ -394,7 +394,7 @@ int ipv6_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
 	memcpy(&errhdr.ee, &serr->ee, sizeof(struct sock_extended_err));
 	sin = &errhdr.offender;
 	sin->sin6_family = AF_UNSPEC;
-	if (serr->ee.ee_origin != SO_EE_ORIGIN_LOCAL) {
+	if (serr->ee.ee_origin != SO_EE_ORIGIN_LOCAL && skb->len) {
 		sin->sin6_family = AF_INET6;
 		sin->sin6_flowinfo = 0;
 		sin->sin6_port = 0;
diff --git a/net/rxrpc/ar-error.c b/net/rxrpc/ar-error.c
index 74c0fcd..5394b6b 100644
--- a/net/rxrpc/ar-error.c
+++ b/net/rxrpc/ar-error.c
@@ -42,6 +42,11 @@ void rxrpc_UDP_error_report(struct sock *sk)
 		_leave("UDP socket errqueue empty");
 		return;
 	}
+	if (!skb->len) {
+		_leave("UDP empty message");
+		kfree_skb(skb);
+		return;
+	}
 
 	rxrpc_new_skb(skb);
 
-- 
2.2.0.rc0.207.ga3a616c

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next RFC 2/5] net-timestamp: no-payload only sysctl
  2015-01-09 17:31 [PATCH net-next RFC 0/5] net-timestamp: address blinding and batching Willem de Bruijn
  2015-01-09 17:31 ` [PATCH net-next RFC 1/5] net-timestamp: no-payload option Willem de Bruijn
@ 2015-01-09 17:31 ` Willem de Bruijn
  2015-01-09 17:31 ` [PATCH net-next RFC 3/5] net-timestamp: no-payload option in txtimestamp test Willem de Bruijn
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 18+ messages in thread
From: Willem de Bruijn @ 2015-01-09 17:31 UTC (permalink / raw)
  To: netdev; +Cc: davem, richardcochran, eric.dumazet, luto, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Tx timestamps are looped onto the error queue on top of an skb. This
mechanism leaks packet headers to processes unless the no-payload
options SOF_TIMESTAMPING_OPT_TSONLY is set.

Add a sysctl that optionally drops looped timestamps with data for
unprivileged users.

The policy is checked when timestamps are generated in the stack.
It is possible for timestamps with data to be reported after the
sysctl is set, if these were queued internally earlier.

No vulnerability is immediately known that exploits knowledge
gleaned from packet headers, but it may still be preferable to allow
administrators to lock down this path at the cost of possible
breakage of legacy applications.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 include/net/sock.h         |  1 +
 net/core/skbuff.c          | 11 ++++++++++-
 net/core/sock.c            |  3 +++
 net/core/sysctl_net_core.c |  9 +++++++++
 4 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 2210fec..9729171 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2262,6 +2262,7 @@ bool sk_net_capable(const struct sock *sk, int cap);
 extern __u32 sysctl_wmem_max;
 extern __u32 sysctl_rmem_max;
 
+extern int sysctl_tstamp_allow_data;
 extern int sysctl_optmem_max;
 
 extern __u32 sysctl_wmem_default;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index ece2bb8..e5f4c06 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3690,11 +3690,20 @@ static void __skb_complete_tx_timestamp(struct sk_buff *skb,
 		kfree_skb(skb);
 }
 
+static bool skb_may_tx_timestamp(struct sock *sk)
+{
+	return sysctl_tstamp_allow_data || capable(CAP_NET_RAW) ||
+	       sk->sk_tsflags & SOF_TIMESTAMPING_OPT_TSONLY;
+}
+
 void skb_complete_tx_timestamp(struct sk_buff *skb,
 			       struct skb_shared_hwtstamps *hwtstamps)
 {
 	struct sock *sk = skb->sk;
 
+	if (!skb_may_tx_timestamp(sk))
+		return;
+
 	/* take a reference to prevent skb_orphan() from freeing the socket */
 	sock_hold(sk);
 
@@ -3712,7 +3721,7 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
 	struct sk_buff *skb;
 	bool tsonly = sk->sk_tsflags & SOF_TIMESTAMPING_OPT_TSONLY;
 
-	if (!sk)
+	if (!sk || !skb_may_tx_timestamp(sk))
 		return;
 
 	if (tsonly)
diff --git a/net/core/sock.c b/net/core/sock.c
index 1c7a33d..93c8b20 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -325,6 +325,8 @@ __u32 sysctl_rmem_default __read_mostly = SK_RMEM_MAX;
 int sysctl_optmem_max __read_mostly = sizeof(unsigned long)*(2*UIO_MAXIOV+512);
 EXPORT_SYMBOL(sysctl_optmem_max);
 
+int sysctl_tstamp_allow_data __read_mostly = 1;
+
 struct static_key memalloc_socks = STATIC_KEY_INIT_FALSE;
 EXPORT_SYMBOL_GPL(memalloc_socks);
 
@@ -840,6 +842,7 @@ set_rcvbuf:
 			ret = -EINVAL;
 			break;
 		}
+
 		if (val & SOF_TIMESTAMPING_OPT_ID &&
 		    !(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)) {
 			if (sk->sk_protocol == IPPROTO_TCP) {
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 31baba2..fde21d1 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -321,6 +321,15 @@ static struct ctl_table net_core_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
+	{
+		.procname	= "tstamp_allow_data",
+		.data		= &sysctl_tstamp_allow_data,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one
+	},
 #ifdef CONFIG_RPS
 	{
 		.procname	= "rps_sock_flow_entries",
-- 
2.2.0.rc0.207.ga3a616c

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next RFC 3/5] net-timestamp: no-payload option in txtimestamp test
  2015-01-09 17:31 [PATCH net-next RFC 0/5] net-timestamp: address blinding and batching Willem de Bruijn
  2015-01-09 17:31 ` [PATCH net-next RFC 1/5] net-timestamp: no-payload option Willem de Bruijn
  2015-01-09 17:31 ` [PATCH net-next RFC 2/5] net-timestamp: no-payload only sysctl Willem de Bruijn
@ 2015-01-09 17:31 ` Willem de Bruijn
  2015-01-09 17:31 ` [PATCH net-next RFC 4/5] net-timestamp: tx timestamp cookies Willem de Bruijn
  2015-01-09 17:31 ` [PATCH net-next RFC 5/5] net-timestamp: tx timestamping default mode flag Willem de Bruijn
  4 siblings, 0 replies; 18+ messages in thread
From: Willem de Bruijn @ 2015-01-09 17:31 UTC (permalink / raw)
  To: netdev; +Cc: davem, richardcochran, eric.dumazet, luto, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Demonstrate how SOF_TIMESTAMPING_OPT_TSONLY can be used and
test the implementation.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 .../networking/timestamping/txtimestamp.c          | 28 ++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/Documentation/networking/timestamping/txtimestamp.c b/Documentation/networking/timestamping/txtimestamp.c
index 8778e68..1171ebf 100644
--- a/Documentation/networking/timestamping/txtimestamp.c
+++ b/Documentation/networking/timestamping/txtimestamp.c
@@ -68,6 +68,7 @@ static int do_ipv6 = 1;
 static int cfg_payload_len = 10;
 static bool cfg_show_payload;
 static bool cfg_do_pktinfo;
+static bool cfg_loop_nodata;
 static uint16_t dest_port = 9000;
 
 static struct sockaddr_in daddr;
@@ -139,6 +140,9 @@ static void print_payload(char *data, int len)
 {
 	int i;
 
+	if (!len)
+		return;
+
 	if (len > 70)
 		len = 70;
 
@@ -175,6 +179,7 @@ static void __recv_errmsg_cmsg(struct msghdr *msg, int payload_len)
 	struct sock_extended_err *serr = NULL;
 	struct scm_timestamping *tss = NULL;
 	struct cmsghdr *cm;
+	int batch = 0;
 
 	for (cm = CMSG_FIRSTHDR(msg);
 	     cm && cm->cmsg_len;
@@ -207,10 +212,18 @@ static void __recv_errmsg_cmsg(struct msghdr *msg, int payload_len)
 		} else
 			fprintf(stderr, "unknown cmsg %d,%d\n",
 					cm->cmsg_level, cm->cmsg_type);
+
+		if (serr && tss) {
+			print_timestamp(tss, serr->ee_info, serr->ee_data,
+					payload_len);
+			serr = NULL;
+			tss = NULL;
+			batch++;
+		}
 	}
 
-	if (serr && tss)
-		print_timestamp(tss, serr->ee_info, serr->ee_data, payload_len);
+	if (batch > 1)
+		fprintf(stderr, "batched %d timestamps\n", batch);
 }
 
 static int recv_errmsg(int fd)
@@ -242,7 +255,7 @@ static int recv_errmsg(int fd)
 	if (ret == -1 && errno != EAGAIN)
 		error(1, errno, "recvmsg");
 
-	if (ret > 0) {
+	if (ret >= 0) {
 		__recv_errmsg_cmsg(&msg, ret);
 		if (cfg_show_payload)
 			print_payload(data, cfg_payload_len);
@@ -307,6 +320,9 @@ static void do_test(int family, unsigned int opt)
 	opt |= SOF_TIMESTAMPING_SOFTWARE |
 	       SOF_TIMESTAMPING_OPT_CMSG |
 	       SOF_TIMESTAMPING_OPT_ID;
+	if (cfg_loop_nodata)
+		opt |= SOF_TIMESTAMPING_OPT_TSONLY;
+
 	if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
 		       (char *) &opt, sizeof(opt)))
 		error(1, 0, "setsockopt timestamping");
@@ -376,6 +392,7 @@ static void __attribute__((noreturn)) usage(const char *filepath)
 			"  -h:   show this message\n"
 			"  -I:   request PKTINFO\n"
 			"  -l N: send N bytes at a time\n"
+			"  -n:   set no-payload option\n"
 			"  -r:   use raw\n"
 			"  -R:   use raw (IP_HDRINCL)\n"
 			"  -p N: connect to port N\n"
@@ -390,7 +407,7 @@ static void parse_opt(int argc, char **argv)
 	int proto_count = 0;
 	char c;
 
-	while ((c = getopt(argc, argv, "46hIl:p:rRux")) != -1) {
+	while ((c = getopt(argc, argv, "46hIl:np:rRux")) != -1) {
 		switch (c) {
 		case '4':
 			do_ipv6 = 0;
@@ -401,6 +418,9 @@ static void parse_opt(int argc, char **argv)
 		case 'I':
 			cfg_do_pktinfo = true;
 			break;
+		case 'n':
+			cfg_loop_nodata = true;
+			break;
 		case 'r':
 			proto_count++;
 			cfg_proto = SOCK_RAW;
-- 
2.2.0.rc0.207.ga3a616c

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next RFC 4/5] net-timestamp: tx timestamp cookies
  2015-01-09 17:31 [PATCH net-next RFC 0/5] net-timestamp: address blinding and batching Willem de Bruijn
                   ` (2 preceding siblings ...)
  2015-01-09 17:31 ` [PATCH net-next RFC 3/5] net-timestamp: no-payload option in txtimestamp test Willem de Bruijn
@ 2015-01-09 17:31 ` Willem de Bruijn
  2015-01-09 17:31 ` [PATCH net-next RFC 5/5] net-timestamp: tx timestamping default mode flag Willem de Bruijn
  4 siblings, 0 replies; 18+ messages in thread
From: Willem de Bruijn @ 2015-01-09 17:31 UTC (permalink / raw)
  To: netdev; +Cc: davem, richardcochran, eric.dumazet, luto, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Support looping multiple timestamps on top of a single skb on the
error queue.

Tx timestamps are returned on top of an skb. TCP timestamping and
other timestamp points enabled multiple timestamps for each buffer
passed in send. Due to retransmissions, this number may be high,
using lots of SO_RCVBUF space and kernel mode switches.

When returning without payload (SOF_TIMESTAMPING_OPT_TSONLY), the
total truesize is smaller, but still O(n). Without payload, the
constraint that a timestamp belongs to a specific skb also goes
away.

Instead of queuing multiple skbs onto the error queue, queue
successive timestamps onto the skb on top of the error queue.
For this purpose, introduce a timestamp cookie and use a list
of cookies instead of skb->tstamp.

The number of batched cookies is limited by having sends fail
with EAGAIN or ENOMSG as soon as a single packet is waiting on
the receive queue. If merging this functionality, a TODO is to
add a hard cap, so that processes can estimate the maximum
msg_controllen needed to read all timestamps.

The implementation returns the same structures as before, that is,
one struct sock_extended_err and one struct scm_timestamping for
each timestamp. The list is returned in reverse chronological
order: newest first. This choice is partially determined by the
callers (e.g., ip_recv_error) generating the final sock_extended_err.

Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 include/linux/skbuff.h        |  12 +++++
 include/net/sock.h            |   3 +-
 include/uapi/linux/errqueue.h |   1 +
 net/core/skbuff.c             | 104 ++++++++++++++++++++++++++++++++++++------
 net/socket.c                  |  64 ++++++++++++++++++++++++--
 5 files changed, 167 insertions(+), 17 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 85ab7d7..6d77b51 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -298,6 +298,13 @@ struct ubuf_info {
 	unsigned long desc;
 };
 
+struct skb_tstamp_cookie {
+	u32 tskey;
+	u32 tstype;
+	ktime_t tstamp;
+	struct skb_tstamp_cookie *next;
+};
+
 /* This data is invariant across clones and lives at
  * the end of the header data, ie. at skb->end.
  */
@@ -442,6 +449,8 @@ static inline u32 skb_mstamp_us_delta(const struct skb_mstamp *t1,
  *	@next: Next buffer in list
  *	@prev: Previous buffer in list
  *	@tstamp: Time we arrived/left
+ *	@skb_mstamp: tstamp variant used only within the TCP stack
+ *	@tscookies: tstamp variant used only with no-payload errqueue packets
  *	@rbnode: RB tree node, alternative to next/prev for netem/tcp
  *	@sk: Socket we are owned by
  *	@dev: Device we arrived on/are leaving by
@@ -516,6 +525,7 @@ struct sk_buff {
 			union {
 				ktime_t		tstamp;
 				struct skb_mstamp skb_mstamp;
+				struct skb_tstamp_cookie *tscookies;
 			};
 		};
 		struct rb_node	rbnode; /* used in netem & tcp stack */
@@ -2861,6 +2871,8 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
 		     struct skb_shared_hwtstamps *hwtstamps,
 		     struct sock *sk, int tstype);
 
+bool skb_has_tscookies(struct sk_buff *skb);
+
 /**
  * skb_tstamp_tx - queue clone of skb with send time stamps
  * @orig_skb:	the original outgoing packet
diff --git a/include/net/sock.h b/include/net/sock.h
index 9729171..de190d8 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2149,7 +2149,8 @@ sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb)
 	 */
 	if (sock_flag(sk, SOCK_RCVTSTAMP) ||
 	    (sk->sk_tsflags & SOF_TIMESTAMPING_RX_SOFTWARE) ||
-	    (kt.tv64 && sk->sk_tsflags & SOF_TIMESTAMPING_SOFTWARE) ||
+	    ((kt.tv64 || skb_has_tscookies(skb)) &&
+	     sk->sk_tsflags & SOF_TIMESTAMPING_SOFTWARE) ||
 	    (hwtstamps->hwtstamp.tv64 &&
 	     (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE)))
 		__sock_recv_timestamp(msg, sk, skb);
diff --git a/include/uapi/linux/errqueue.h b/include/uapi/linux/errqueue.h
index 07bdce1..ab67bf0 100644
--- a/include/uapi/linux/errqueue.h
+++ b/include/uapi/linux/errqueue.h
@@ -41,6 +41,7 @@ enum {
 	SCM_TSTAMP_SND,		/* driver passed skb to NIC, or HW */
 	SCM_TSTAMP_SCHED,	/* data entered the packet scheduler */
 	SCM_TSTAMP_ACK,		/* data acknowledged by peer */
+	SCM_TSTAMP_HW,		/* internal use: HW generated */
 };
 
 #endif /* _UAPI_LINUX_ERRQUEUE_H */
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index e5f4c06..c41597f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3581,6 +3581,19 @@ int skb_cow_data(struct sk_buff *skb, int tailbits, struct sk_buff **trailer)
 }
 EXPORT_SYMBOL_GPL(skb_cow_data);
 
+static void skb_destructor_tscookies(struct sk_buff *skb)
+{
+	struct skb_tstamp_cookie *prev, *cur = skb->tscookies;
+
+	while (cur) {
+		prev = cur;
+		cur = cur->next;
+		kfree(prev);
+	}
+	skb->tscookies = NULL;
+	skb->destructor = NULL;
+}
+
 static void sock_rmem_free(struct sk_buff *skb)
 {
 	struct sock *sk = skb->sk;
@@ -3588,6 +3601,12 @@ static void sock_rmem_free(struct sk_buff *skb)
 	atomic_sub(skb->truesize, &sk->sk_rmem_alloc);
 }
 
+static void sock_rmem_free_tscookies(struct sk_buff *skb)
+{
+	skb_destructor_tscookies(skb);
+	sock_rmem_free(skb);
+}
+
 /*
  * Note: We dont mem charge error packets (no sk_forward_alloc changes)
  */
@@ -3597,9 +3616,13 @@ int sock_queue_err_skb(struct sock *sk, struct sk_buff *skb)
 	    (unsigned int)sk->sk_rcvbuf)
 		return -ENOMEM;
 
-	skb_orphan(skb);
+	if (skb_has_tscookies(skb)) {
+		skb->destructor = sock_rmem_free_tscookies;
+	} else {
+		skb_orphan(skb);
+		skb->destructor = sock_rmem_free;
+	}
 	skb->sk = sk;
-	skb->destructor = sock_rmem_free;
 	atomic_add(skb->truesize, &sk->sk_rmem_alloc);
 
 	/* before exiting rcu section, make sure dst is refcounted */
@@ -3666,23 +3689,78 @@ struct sk_buff *skb_clone_sk(struct sk_buff *skb)
 }
 EXPORT_SYMBOL(skb_clone_sk);
 
-static void __skb_complete_tx_timestamp(struct sk_buff *skb,
-					struct sock *sk,
-					int tstype)
+bool skb_has_tscookies(struct sk_buff *skb)
+{
+	return skb->destructor == skb_destructor_tscookies ||
+	       skb->destructor == sock_rmem_free_tscookies;
+}
+EXPORT_SYMBOL(skb_has_tscookies);
+
+static bool __skb_queue_tstamp_cookie(struct sk_buff *skb, struct sock *sk,
+				      int tstype, u32 tskey, bool is_hw)
+{
+	struct sk_buff_head *q = &sk->sk_error_queue;
+	struct skb_tstamp_cookie *new;
+	struct sk_buff *qskb;
+	unsigned long flags;
+	bool queued = false;
+
+	if (skb->destructor)
+		return false;
+
+	new = kzalloc(sizeof(*new), GFP_ATOMIC);
+	if (!new)
+		return false;
+
+	new->tskey = tskey;
+	if (unlikely(is_hw)) {
+		new->tstype = SCM_TSTAMP_HW;
+		new->tstamp = skb_hwtstamps(skb)->hwtstamp;
+	} else {
+		new->tstype = tstype;
+		new->tstamp = skb->tstamp;
+	}
+
+	spin_lock_irqsave(&q->lock, flags);
+	qskb = skb_peek(&sk->sk_error_queue);
+	if (qskb && skb_has_tscookies(qskb)) {
+		new->next = qskb->tscookies;
+		qskb->tscookies = new;
+		queued = true;
+	}
+	spin_unlock_irqrestore(&q->lock, flags);
+	if (queued) {
+		consume_skb(skb);
+		return true;
+	}
+
+	skb->tscookies = new;
+	skb->destructor = skb_destructor_tscookies;
+	return false;
+}
+
+static void __skb_complete_tx_timestamp(struct sk_buff *skb, struct sock *sk,
+					int tstype, bool is_hw)
 {
 	struct sock_exterr_skb *serr;
-	int err;
+	int err, tskey = 0;
+
+	if (sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID) {
+		tskey = skb_shinfo(skb)->tskey;
+		if (sk->sk_protocol == IPPROTO_TCP)
+			tskey -= sk->sk_tskey;
+	}
+
+	if (sk->sk_tsflags & SOF_TIMESTAMPING_OPT_TSONLY &&
+	    __skb_queue_tstamp_cookie(skb, sk, tstype, tskey, is_hw))
+		return;
 
 	serr = SKB_EXT_ERR(skb);
 	memset(serr, 0, sizeof(*serr));
 	serr->ee.ee_errno = ENOMSG;
 	serr->ee.ee_origin = SO_EE_ORIGIN_TIMESTAMPING;
 	serr->ee.ee_info = tstype;
-	if (sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID) {
-		serr->ee.ee_data = skb_shinfo(skb)->tskey;
-		if (sk->sk_protocol == IPPROTO_TCP)
-			serr->ee.ee_data -= sk->sk_tskey;
-	}
+	serr->ee.ee_data = tskey;
 
 	err = sock_queue_err_skb(sk, skb);
 
@@ -3708,7 +3786,7 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
 	sock_hold(sk);
 
 	*skb_hwtstamps(skb) = *hwtstamps;
-	__skb_complete_tx_timestamp(skb, sk, SCM_TSTAMP_SND);
+	__skb_complete_tx_timestamp(skb, sk, SCM_TSTAMP_SND, true);
 
 	sock_put(sk);
 }
@@ -3741,7 +3819,7 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
 	else
 		skb->tstamp = ktime_get_real();
 
-	__skb_complete_tx_timestamp(skb, sk, tstype);
+	__skb_complete_tx_timestamp(skb, sk, tstype, hwtstamps);
 }
 EXPORT_SYMBOL_GPL(__skb_tstamp_tx);
 
diff --git a/net/socket.c b/net/socket.c
index a2c33a4..6595108 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -676,9 +676,63 @@ int kernel_sendmsg(struct socket *sock, struct msghdr *msg,
 }
 EXPORT_SYMBOL(kernel_sendmsg);
 
-/*
- * called from sock_recv_timestamp() if sock_flag(sk, SOCK_RCVTSTAMP)
- */
+static bool __ts_allow_report(struct sock *sk, int tstype)
+{
+	if (tstype == SCM_TSTAMP_HW)
+		return sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE;
+	else
+		return sk->sk_tsflags & SOF_TIMESTAMPING_SOFTWARE;
+}
+
+static void __ts_generate_serr(struct msghdr *msg, struct sock *sk,
+			       struct skb_tstamp_cookie *cur)
+{
+	struct sock_extended_err serr;
+
+	memset(&serr, 0, sizeof(serr));
+
+	serr.ee_errno = ENOMSG;
+	serr.ee_origin = SO_EE_ORIGIN_TIMESTAMPING;
+	serr.ee_data = cur->tskey;
+	serr.ee_info = cur->tstype;
+
+	/* work around legacy interface: HW reports SND with data in tss[2] */
+	if (serr.ee_info == SCM_TSTAMP_HW)
+		serr.ee_info = SCM_TSTAMP_SND;
+
+	if (sk->sk_family == AF_INET)
+		put_cmsg(msg, SOL_IP, IP_RECVERR, sizeof(serr), &serr);
+	else if (sk->sk_family == AF_INET6)
+		put_cmsg(msg, SOL_IPV6, IPV6_RECVERR, sizeof(serr), &serr);
+	else
+		net_warn_ratelimited("tscookie: unknown proto %x",
+				     sk->sk_family);
+}
+
+static void __ts_generate_tss(struct msghdr *msg, struct skb_tstamp_cookie *cur)
+{
+	struct scm_timestamping tss;
+	bool idx = cur->tstype == SCM_TSTAMP_HW ? 2 : 0;
+
+	memset(&tss, 0, sizeof(tss));
+	tss.ts[idx] = ktime_to_timespec(cur->tstamp);
+	put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPING, sizeof(tss), &tss);
+}
+
+static void __sock_recv_timestamp_cookies(struct msghdr *msg, struct sock *sk,
+					  struct skb_tstamp_cookie *cookie)
+{
+	while (cookie) {
+		if (__ts_allow_report(sk, cookie->tstype)) {
+			__ts_generate_tss(msg, cookie);
+			/* caller (e.g., ip_recv_error) generates last serr */
+			if (cookie->next)
+				__ts_generate_serr(msg, sk, cookie);
+		}
+		cookie = cookie->next;
+	}
+}
+
 void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
 	struct sk_buff *skb)
 {
@@ -688,6 +742,10 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
 	struct skb_shared_hwtstamps *shhwtstamps =
 		skb_hwtstamps(skb);
 
+	if (skb_has_tscookies(skb)) {
+		__sock_recv_timestamp_cookies(msg, sk, skb->tscookies);
+		return;
+	}
 	/* Race occurred between timestamp enabling and packet
 	   receiving.  Fill in the current time for now. */
 	if (need_software_tstamp && skb->tstamp.tv64 == 0)
-- 
2.2.0.rc0.207.ga3a616c

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next RFC 5/5] net-timestamp: tx timestamping default mode flag
  2015-01-09 17:31 [PATCH net-next RFC 0/5] net-timestamp: address blinding and batching Willem de Bruijn
                   ` (3 preceding siblings ...)
  2015-01-09 17:31 ` [PATCH net-next RFC 4/5] net-timestamp: tx timestamp cookies Willem de Bruijn
@ 2015-01-09 17:31 ` Willem de Bruijn
  2015-01-11 20:32   ` Richard Cochran
  4 siblings, 1 reply; 18+ messages in thread
From: Willem de Bruijn @ 2015-01-09 17:31 UTC (permalink / raw)
  To: netdev; +Cc: davem, richardcochran, eric.dumazet, luto, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

The number of timestamping points along the transmit path has grown,
as have the options. Preferred behavior is to request timestamps with
ID, without data (which enables batching) and for all supported
timestamp points. Define a short option that enables all these
defaults.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 include/uapi/linux/net_tstamp.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
index 6d1abea..c371ce2 100644
--- a/include/uapi/linux/net_tstamp.h
+++ b/include/uapi/linux/net_tstamp.h
@@ -27,6 +27,14 @@ enum {
 	SOF_TIMESTAMPING_OPT_TSONLY = (1<<11),
 
 	SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_TSONLY,
+
+	SOF_TIMESTAMPING_TX = SOF_TIMESTAMPING_TX_HARDWARE |
+			      SOF_TIMESTAMPING_TX_SOFTWARE |
+			      SOF_TIMESTAMPING_TX_SCHED |
+			      SOF_TIMESTAMPING_TX_ACK |
+			      SOF_TIMESTAMPING_OPT_ID |
+			      SOF_TIMESTAMPING_OPT_TSONLY,
+
 	SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_LAST - 1) |
 				 SOF_TIMESTAMPING_LAST
 };
-- 
2.2.0.rc0.207.ga3a616c

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next RFC 1/5] net-timestamp: no-payload option
  2015-01-09 17:31 ` [PATCH net-next RFC 1/5] net-timestamp: no-payload option Willem de Bruijn
@ 2015-01-09 19:43   ` Andy Lutomirski
  2015-01-09 19:47     ` Willem de Bruijn
  2015-01-11 20:26   ` Richard Cochran
  1 sibling, 1 reply; 18+ messages in thread
From: Andy Lutomirski @ 2015-01-09 19:43 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, David S. Miller, Richard Cochran, Eric Dumazet

On Fri, Jan 9, 2015 at 9:31 AM, Willem de Bruijn <willemb@google.com> wrote:
> From: Willem de Bruijn <willemb@google.com>
>
> Add timestamping option SOF_TIMESTAMPING_OPT_TSONLY. For transmit
> timestamps, this loops timestamps on top of empty packets.
>
> Doing so reduces the pressure on SO_RCVBUF. Payload inspection and
> cmsg reception (aside from timestamps) are no longer possible. This
> works together with a follow on patch that allows administrators to
> only allow tx timestamping if it does not loop payload or metadata.

If this loses IP_PKTINFO, that will be a bit unfortunate.

--Andy

>
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> ---
>  include/uapi/linux/net_tstamp.h |  3 ++-
>  net/core/skbuff.c               | 19 ++++++++++++++-----
>  net/ipv4/ip_sockglue.c          |  9 +++++----
>  net/ipv6/datagram.c             |  4 ++--
>  net/rxrpc/ar-error.c            |  5 +++++
>  5 files changed, 28 insertions(+), 12 deletions(-)
>
> diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
> index edbc888..6d1abea 100644
> --- a/include/uapi/linux/net_tstamp.h
> +++ b/include/uapi/linux/net_tstamp.h
> @@ -24,8 +24,9 @@ enum {
>         SOF_TIMESTAMPING_TX_SCHED = (1<<8),
>         SOF_TIMESTAMPING_TX_ACK = (1<<9),
>         SOF_TIMESTAMPING_OPT_CMSG = (1<<10),
> +       SOF_TIMESTAMPING_OPT_TSONLY = (1<<11),
>
> -       SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_CMSG,
> +       SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_TSONLY,
>         SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_LAST - 1) |
>                                  SOF_TIMESTAMPING_LAST
>  };
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 5a2a2e8..ece2bb8 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3710,19 +3710,28 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
>                      struct sock *sk, int tstype)
>  {
>         struct sk_buff *skb;
> +       bool tsonly = sk->sk_tsflags & SOF_TIMESTAMPING_OPT_TSONLY;
>
>         if (!sk)
>                 return;
>
> -       if (hwtstamps)
> -               *skb_hwtstamps(orig_skb) = *hwtstamps;
> +       if (tsonly)
> +               skb = alloc_skb(0, GFP_ATOMIC);
>         else
> -               orig_skb->tstamp = ktime_get_real();
> -
> -       skb = skb_clone(orig_skb, GFP_ATOMIC);
> +               skb = skb_clone(orig_skb, GFP_ATOMIC);
>         if (!skb)
>                 return;
>
> +       if (tsonly) {
> +               skb_shinfo(skb)->tx_flags = skb_shinfo(orig_skb)->tx_flags;
> +               skb_shinfo(skb)->tskey = skb_shinfo(orig_skb)->tskey;
> +       }
> +
> +       if (hwtstamps)
> +               *skb_hwtstamps(skb) = *hwtstamps;
> +       else
> +               skb->tstamp = ktime_get_real();
> +
>         __skb_complete_tx_timestamp(skb, sk, tstype);
>  }
>  EXPORT_SYMBOL_GPL(__skb_tstamp_tx);
> diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
> index a317797..d81ef70 100644
> --- a/net/ipv4/ip_sockglue.c
> +++ b/net/ipv4/ip_sockglue.c
> @@ -440,7 +440,7 @@ static bool ipv4_pktinfo_prepare_errqueue(const struct sock *sk,
>
>         if ((ee_origin != SO_EE_ORIGIN_TIMESTAMPING) ||
>             (!(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_CMSG)) ||
> -           (!skb->dev))
> +           (!skb->dev) || (!skb->len))
>                 return false;
>
>         info->ipi_spec_dst.s_addr = ip_hdr(skb)->saddr;
> @@ -483,7 +483,7 @@ int ip_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
>
>         serr = SKB_EXT_ERR(skb);
>
> -       if (sin) {
> +       if (sin && skb->len) {
>                 sin->sin_family = AF_INET;
>                 sin->sin_addr.s_addr = *(__be32 *)(skb_network_header(skb) +
>                                                    serr->addr_offset);
> @@ -496,8 +496,9 @@ int ip_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
>         sin = &errhdr.offender;
>         sin->sin_family = AF_UNSPEC;
>
> -       if (serr->ee.ee_origin == SO_EE_ORIGIN_ICMP ||
> -           ipv4_pktinfo_prepare_errqueue(sk, skb, serr->ee.ee_origin)) {
> +       if (skb->len &&
> +           (serr->ee.ee_origin == SO_EE_ORIGIN_ICMP ||
> +            ipv4_pktinfo_prepare_errqueue(sk, skb, serr->ee.ee_origin))) {
>                 struct inet_sock *inet = inet_sk(sk);
>
>                 sin->sin_family = AF_INET;
> diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
> index 100c589..91a31ea 100644
> --- a/net/ipv6/datagram.c
> +++ b/net/ipv6/datagram.c
> @@ -369,7 +369,7 @@ int ipv6_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
>
>         serr = SKB_EXT_ERR(skb);
>
> -       if (sin) {
> +       if (sin && skb->len) {
>                 const unsigned char *nh = skb_network_header(skb);
>                 sin->sin6_family = AF_INET6;
>                 sin->sin6_flowinfo = 0;
> @@ -394,7 +394,7 @@ int ipv6_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
>         memcpy(&errhdr.ee, &serr->ee, sizeof(struct sock_extended_err));
>         sin = &errhdr.offender;
>         sin->sin6_family = AF_UNSPEC;
> -       if (serr->ee.ee_origin != SO_EE_ORIGIN_LOCAL) {
> +       if (serr->ee.ee_origin != SO_EE_ORIGIN_LOCAL && skb->len) {
>                 sin->sin6_family = AF_INET6;
>                 sin->sin6_flowinfo = 0;
>                 sin->sin6_port = 0;
> diff --git a/net/rxrpc/ar-error.c b/net/rxrpc/ar-error.c
> index 74c0fcd..5394b6b 100644
> --- a/net/rxrpc/ar-error.c
> +++ b/net/rxrpc/ar-error.c
> @@ -42,6 +42,11 @@ void rxrpc_UDP_error_report(struct sock *sk)
>                 _leave("UDP socket errqueue empty");
>                 return;
>         }
> +       if (!skb->len) {
> +               _leave("UDP empty message");
> +               kfree_skb(skb);
> +               return;
> +       }
>
>         rxrpc_new_skb(skb);
>
> --
> 2.2.0.rc0.207.ga3a616c
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next RFC 1/5] net-timestamp: no-payload option
  2015-01-09 19:43   ` Andy Lutomirski
@ 2015-01-09 19:47     ` Willem de Bruijn
  2015-01-09 20:02       ` Andy Lutomirski
  0 siblings, 1 reply; 18+ messages in thread
From: Willem de Bruijn @ 2015-01-09 19:47 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Network Development, David S. Miller, Richard Cochran, Eric Dumazet

On Fri, Jan 9, 2015 at 2:43 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Fri, Jan 9, 2015 at 9:31 AM, Willem de Bruijn <willemb@google.com> wrote:
>> From: Willem de Bruijn <willemb@google.com>
>>
>> Add timestamping option SOF_TIMESTAMPING_OPT_TSONLY. For transmit
>> timestamps, this loops timestamps on top of empty packets.
>>
>> Doing so reduces the pressure on SO_RCVBUF. Payload inspection and
>> cmsg reception (aside from timestamps) are no longer possible. This
>> works together with a follow on patch that allows administrators to
>> only allow tx timestamping if it does not loop payload or metadata.
>
> If this loses IP_PKTINFO, that will be a bit unfortunate.
>

If it doesn't, then we might as well loop the entire payload. For applications
that need pktinfo or other cmsg, do not select the option.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next RFC 1/5] net-timestamp: no-payload option
  2015-01-09 19:47     ` Willem de Bruijn
@ 2015-01-09 20:02       ` Andy Lutomirski
  2015-01-09 20:33         ` Willem de Bruijn
  0 siblings, 1 reply; 18+ messages in thread
From: Andy Lutomirski @ 2015-01-09 20:02 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, David S. Miller, Richard Cochran, Eric Dumazet

On Fri, Jan 9, 2015 at 11:47 AM, Willem de Bruijn <willemb@google.com> wrote:
> On Fri, Jan 9, 2015 at 2:43 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Fri, Jan 9, 2015 at 9:31 AM, Willem de Bruijn <willemb@google.com> wrote:
>>> From: Willem de Bruijn <willemb@google.com>
>>>
>>> Add timestamping option SOF_TIMESTAMPING_OPT_TSONLY. For transmit
>>> timestamps, this loops timestamps on top of empty packets.
>>>
>>> Doing so reduces the pressure on SO_RCVBUF. Payload inspection and
>>> cmsg reception (aside from timestamps) are no longer possible. This
>>> works together with a follow on patch that allows administrators to
>>> only allow tx timestamping if it does not loop payload or metadata.
>>
>> If this loses IP_PKTINFO, that will be a bit unfortunate.
>>
>
> If it doesn't, then we might as well loop the entire payload. For applications
> that need pktinfo or other cmsg, do not select the option.

Right, but it loses the ability to get the ifindex if the sysctl is
set to the conservative option, which I don't think is desirable.

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next RFC 1/5] net-timestamp: no-payload option
  2015-01-09 20:02       ` Andy Lutomirski
@ 2015-01-09 20:33         ` Willem de Bruijn
  2015-01-09 20:55           ` Andy Lutomirski
  0 siblings, 1 reply; 18+ messages in thread
From: Willem de Bruijn @ 2015-01-09 20:33 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Network Development, David S. Miller, Richard Cochran, Eric Dumazet

On Fri, Jan 9, 2015 at 3:02 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Fri, Jan 9, 2015 at 11:47 AM, Willem de Bruijn <willemb@google.com> wrote:
>> On Fri, Jan 9, 2015 at 2:43 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>> On Fri, Jan 9, 2015 at 9:31 AM, Willem de Bruijn <willemb@google.com> wrote:
>>>> From: Willem de Bruijn <willemb@google.com>
>>>>
>>>> Add timestamping option SOF_TIMESTAMPING_OPT_TSONLY. For transmit
>>>> timestamps, this loops timestamps on top of empty packets.
>>>>
>>>> Doing so reduces the pressure on SO_RCVBUF. Payload inspection and
>>>> cmsg reception (aside from timestamps) are no longer possible. This
>>>> works together with a follow on patch that allows administrators to
>>>> only allow tx timestamping if it does not loop payload or metadata.
>>>
>>> If this loses IP_PKTINFO, that will be a bit unfortunate.
>>>
>>
>> If it doesn't, then we might as well loop the entire payload. For applications
>> that need pktinfo or other cmsg, do not select the option.
>
> Right, but it loses the ability to get the ifindex if the sysctl is
> set to the conservative option, which I don't think is desirable.

Understood. I just find the alternative, where the no-data policy is
weakened by exceptions, even less desirable. That makes it
harder to explain what the sysctl/option do and what the security
implications are.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next RFC 1/5] net-timestamp: no-payload option
  2015-01-09 20:33         ` Willem de Bruijn
@ 2015-01-09 20:55           ` Andy Lutomirski
  2015-01-09 21:18             ` Willem de Bruijn
  0 siblings, 1 reply; 18+ messages in thread
From: Andy Lutomirski @ 2015-01-09 20:55 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, David S. Miller, Richard Cochran, Eric Dumazet

On Fri, Jan 9, 2015 at 12:33 PM, Willem de Bruijn <willemb@google.com> wrote:
> On Fri, Jan 9, 2015 at 3:02 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Fri, Jan 9, 2015 at 11:47 AM, Willem de Bruijn <willemb@google.com> wrote:
>>> On Fri, Jan 9, 2015 at 2:43 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>>> On Fri, Jan 9, 2015 at 9:31 AM, Willem de Bruijn <willemb@google.com> wrote:
>>>>> From: Willem de Bruijn <willemb@google.com>
>>>>>
>>>>> Add timestamping option SOF_TIMESTAMPING_OPT_TSONLY. For transmit
>>>>> timestamps, this loops timestamps on top of empty packets.
>>>>>
>>>>> Doing so reduces the pressure on SO_RCVBUF. Payload inspection and
>>>>> cmsg reception (aside from timestamps) are no longer possible. This
>>>>> works together with a follow on patch that allows administrators to
>>>>> only allow tx timestamping if it does not loop payload or metadata.
>>>>
>>>> If this loses IP_PKTINFO, that will be a bit unfortunate.
>>>>
>>>
>>> If it doesn't, then we might as well loop the entire payload. For applications
>>> that need pktinfo or other cmsg, do not select the option.
>>
>> Right, but it loses the ability to get the ifindex if the sysctl is
>> set to the conservative option, which I don't think is desirable.
>
> Understood. I just find the alternative, where the no-data policy is
> weakened by exceptions, even less desirable. That makes it
> harder to explain what the sysctl/option do and what the security
> implications are.

Agreed.

If there was no-payload but not no-cmsg, then it would be a nice
middle ground, but I guess that's bad for some reason involving
accounting?

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next RFC 1/5] net-timestamp: no-payload option
  2015-01-09 20:55           ` Andy Lutomirski
@ 2015-01-09 21:18             ` Willem de Bruijn
  2015-01-09 22:00               ` Andy Lutomirski
  0 siblings, 1 reply; 18+ messages in thread
From: Willem de Bruijn @ 2015-01-09 21:18 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Network Development, David S. Miller, Richard Cochran, Eric Dumazet

On Fri, Jan 9, 2015 at 3:55 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Fri, Jan 9, 2015 at 12:33 PM, Willem de Bruijn <willemb@google.com> wrote:
>> On Fri, Jan 9, 2015 at 3:02 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>> On Fri, Jan 9, 2015 at 11:47 AM, Willem de Bruijn <willemb@google.com> wrote:
>>>> On Fri, Jan 9, 2015 at 2:43 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>>>> On Fri, Jan 9, 2015 at 9:31 AM, Willem de Bruijn <willemb@google.com> wrote:
>>>>>> From: Willem de Bruijn <willemb@google.com>
>>>>>>
>>>>>> Add timestamping option SOF_TIMESTAMPING_OPT_TSONLY. For transmit
>>>>>> timestamps, this loops timestamps on top of empty packets.
>>>>>>
>>>>>> Doing so reduces the pressure on SO_RCVBUF. Payload inspection and
>>>>>> cmsg reception (aside from timestamps) are no longer possible. This
>>>>>> works together with a follow on patch that allows administrators to
>>>>>> only allow tx timestamping if it does not loop payload or metadata.
>>>>>
>>>>> If this loses IP_PKTINFO, that will be a bit unfortunate.
>>>>>
>>>>
>>>> If it doesn't, then we might as well loop the entire payload. For applications
>>>> that need pktinfo or other cmsg, do not select the option.
>>>
>>> Right, but it loses the ability to get the ifindex if the sysctl is
>>> set to the conservative option, which I don't think is desirable.
>>
>> Understood. I just find the alternative, where the no-data policy is
>> weakened by exceptions, even less desirable. That makes it
>> harder to explain what the sysctl/option do and what the security
>> implications are.
>
> Agreed.
>
> If there was no-payload but not no-cmsg, then it would be a nice
> middle ground, but I guess that's bad for some reason involving
> accounting?

Enabling all cmsg opens up quite a few holes, including the tos
options that we previously discussed. Also, these are implemented
by reading the relevant bits from the payload at recvmsg time, so
at least we would have to queue the full payload (though not
necessarily return it).

> --Andy
>
> --
> Andy Lutomirski
> AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next RFC 1/5] net-timestamp: no-payload option
  2015-01-09 21:18             ` Willem de Bruijn
@ 2015-01-09 22:00               ` Andy Lutomirski
  0 siblings, 0 replies; 18+ messages in thread
From: Andy Lutomirski @ 2015-01-09 22:00 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, David S. Miller, Richard Cochran, Eric Dumazet

OK, makes sense.

On Fri, Jan 9, 2015 at 1:18 PM, Willem de Bruijn <willemb@google.com> wrote:
> On Fri, Jan 9, 2015 at 3:55 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Fri, Jan 9, 2015 at 12:33 PM, Willem de Bruijn <willemb@google.com> wrote:
>>> On Fri, Jan 9, 2015 at 3:02 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>>> On Fri, Jan 9, 2015 at 11:47 AM, Willem de Bruijn <willemb@google.com> wrote:
>>>>> On Fri, Jan 9, 2015 at 2:43 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>>>>> On Fri, Jan 9, 2015 at 9:31 AM, Willem de Bruijn <willemb@google.com> wrote:
>>>>>>> From: Willem de Bruijn <willemb@google.com>
>>>>>>>
>>>>>>> Add timestamping option SOF_TIMESTAMPING_OPT_TSONLY. For transmit
>>>>>>> timestamps, this loops timestamps on top of empty packets.
>>>>>>>
>>>>>>> Doing so reduces the pressure on SO_RCVBUF. Payload inspection and
>>>>>>> cmsg reception (aside from timestamps) are no longer possible. This
>>>>>>> works together with a follow on patch that allows administrators to
>>>>>>> only allow tx timestamping if it does not loop payload or metadata.
>>>>>>
>>>>>> If this loses IP_PKTINFO, that will be a bit unfortunate.
>>>>>>
>>>>>
>>>>> If it doesn't, then we might as well loop the entire payload. For applications
>>>>> that need pktinfo or other cmsg, do not select the option.
>>>>
>>>> Right, but it loses the ability to get the ifindex if the sysctl is
>>>> set to the conservative option, which I don't think is desirable.
>>>
>>> Understood. I just find the alternative, where the no-data policy is
>>> weakened by exceptions, even less desirable. That makes it
>>> harder to explain what the sysctl/option do and what the security
>>> implications are.
>>
>> Agreed.
>>
>> If there was no-payload but not no-cmsg, then it would be a nice
>> middle ground, but I guess that's bad for some reason involving
>> accounting?
>
> Enabling all cmsg opens up quite a few holes, including the tos
> options that we previously discussed. Also, these are implemented
> by reading the relevant bits from the payload at recvmsg time, so
> at least we would have to queue the full payload (though not
> necessarily return it).
>
>> --Andy
>>
>> --
>> Andy Lutomirski
>> AMA Capital Management, LLC



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next RFC 1/5] net-timestamp: no-payload option
  2015-01-09 17:31 ` [PATCH net-next RFC 1/5] net-timestamp: no-payload option Willem de Bruijn
  2015-01-09 19:43   ` Andy Lutomirski
@ 2015-01-11 20:26   ` Richard Cochran
  2015-01-15 18:22     ` Willem de Bruijn
  1 sibling, 1 reply; 18+ messages in thread
From: Richard Cochran @ 2015-01-11 20:26 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: netdev, davem, eric.dumazet, luto

On Fri, Jan 09, 2015 at 12:31:55PM -0500, Willem de Bruijn wrote:
> diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
> index a317797..d81ef70 100644
> --- a/net/ipv4/ip_sockglue.c
> +++ b/net/ipv4/ip_sockglue.c
> @@ -440,7 +440,7 @@ static bool ipv4_pktinfo_prepare_errqueue(const struct sock *sk,
>  
>  	if ((ee_origin != SO_EE_ORIGIN_TIMESTAMPING) ||
>  	    (!(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_CMSG)) ||
> -	    (!skb->dev))
> +	    (!skb->dev) || (!skb->len))
>  		return false;

Nit: You have already tested for the condition (!skb->len) ...

>  
>  	info->ipi_spec_dst.s_addr = ip_hdr(skb)->saddr;
> @@ -483,7 +483,7 @@ int ip_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
>  
>  	serr = SKB_EXT_ERR(skb);
>  
> -	if (sin) {
> +	if (sin && skb->len) {
>  		sin->sin_family = AF_INET;
>  		sin->sin_addr.s_addr = *(__be32 *)(skb_network_header(skb) +
>  						   serr->addr_offset);
> @@ -496,8 +496,9 @@ int ip_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
>  	sin = &errhdr.offender;
>  	sin->sin_family = AF_UNSPEC;
>  
> -	if (serr->ee.ee_origin == SO_EE_ORIGIN_ICMP ||
> -	    ipv4_pktinfo_prepare_errqueue(sk, skb, serr->ee.ee_origin)) {
> +	if (skb->len &&
> +	    (serr->ee.ee_origin == SO_EE_ORIGIN_ICMP ||
> +	     ipv4_pktinfo_prepare_errqueue(sk, skb, serr->ee.ee_origin))) {

... here.

>  		struct inet_sock *inet = inet_sk(sk);
>  
>  		sin->sin_family = AF_INET;

Thanks,
Richard

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next RFC 5/5] net-timestamp: tx timestamping default mode flag
  2015-01-09 17:31 ` [PATCH net-next RFC 5/5] net-timestamp: tx timestamping default mode flag Willem de Bruijn
@ 2015-01-11 20:32   ` Richard Cochran
  2015-01-12  1:49     ` Willem de Bruijn
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Cochran @ 2015-01-11 20:32 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: netdev, davem, eric.dumazet, luto

On Fri, Jan 09, 2015 at 12:31:59PM -0500, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
> 
> The number of timestamping points along the transmit path has grown,
> as have the options. Preferred behavior is to request timestamps with
> ID, without data (which enables batching) and for all supported
> timestamp points. Define a short option that enables all these
> defaults.

This "preferred behavior" is subjective, and it depends on the
application.  I am sure it reflects your own interest, but for people
doing PTP over UDP or raw, it is a bit misleading.

I would drop this default and just let applications define their own.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next RFC 5/5] net-timestamp: tx timestamping default mode flag
  2015-01-11 20:32   ` Richard Cochran
@ 2015-01-12  1:49     ` Willem de Bruijn
  2015-01-12  8:26       ` Richard Cochran
  0 siblings, 1 reply; 18+ messages in thread
From: Willem de Bruijn @ 2015-01-12  1:49 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Network Development, David Miller, Eric Dumazet, Andy Lutomirski

On Sun, Jan 11, 2015 at 3:32 PM, Richard Cochran
<richardcochran@gmail.com> wrote:
> On Fri, Jan 09, 2015 at 12:31:59PM -0500, Willem de Bruijn wrote:
>> From: Willem de Bruijn <willemb@google.com>
>>
>> The number of timestamping points along the transmit path has grown,
>> as have the options. Preferred behavior is to request timestamps with
>> ID, without data (which enables batching) and for all supported
>> timestamp points. Define a short option that enables all these
>> defaults.
>
> This "preferred behavior" is subjective, and it depends on the
> application.  I am sure it reflects your own interest, but for people
> doing PTP over UDP or raw, it is a bit misleading.
>
> I would drop this default and just let applications define their own.

Okay. I indeed hadn't considered that use-case.

Just so I understand: ptp has no use for the sw tstamps
that would be generated with this flag, but is otherwise
okay with enabling counters to order tx timestamps
(OPT_ID) and disabling payload (OPT_TSONLY)?

In the documentation, I would like to strongly suggest all
processes to enable these, even in absence of this default.
because that is more robust wrt the sysctl (if merged).

>
> Thanks,

Thanks for taking a look at the patches.

> Richard

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next RFC 5/5] net-timestamp: tx timestamping default mode flag
  2015-01-12  1:49     ` Willem de Bruijn
@ 2015-01-12  8:26       ` Richard Cochran
  0 siblings, 0 replies; 18+ messages in thread
From: Richard Cochran @ 2015-01-12  8:26 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, David Miller, Eric Dumazet, Andy Lutomirski

On Sun, Jan 11, 2015 at 08:49:00PM -0500, Willem de Bruijn wrote:
> Just so I understand: ptp has no use for the sw tstamps
> that would be generated with this flag, but is otherwise
> okay with enabling counters to order tx timestamps
> (OPT_ID) and disabling payload (OPT_TSONLY)?

Yes.

> In the documentation, I would like to strongly suggest all
> processes to enable these, even in absence of this default.
> because that is more robust wrt the sysctl (if merged).

Sounds good.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next RFC 1/5] net-timestamp: no-payload option
  2015-01-11 20:26   ` Richard Cochran
@ 2015-01-15 18:22     ` Willem de Bruijn
  0 siblings, 0 replies; 18+ messages in thread
From: Willem de Bruijn @ 2015-01-15 18:22 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Network Development, David Miller, Eric Dumazet, Andy Lutomirski

On Sun, Jan 11, 2015 at 3:26 PM, Richard Cochran
<richardcochran@gmail.com> wrote:
> On Fri, Jan 09, 2015 at 12:31:55PM -0500, Willem de Bruijn wrote:
>> diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
>> index a317797..d81ef70 100644
>> --- a/net/ipv4/ip_sockglue.c
>> +++ b/net/ipv4/ip_sockglue.c
>> @@ -440,7 +440,7 @@ static bool ipv4_pktinfo_prepare_errqueue(const struct sock *sk,
>>
>>       if ((ee_origin != SO_EE_ORIGIN_TIMESTAMPING) ||
>>           (!(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_CMSG)) ||
>> -         (!skb->dev))
>> +         (!skb->dev) || (!skb->len))
>>               return false;
>
> Nit: You have already tested for the condition (!skb->len) ...

Thanks! I'll fix that when I send v1.

I had planned to do that today, but will hold off a bit to avoid merge
conflicts with a fix queued for net (http://patchwork.ozlabs.org/patch/429553/)

I will drop patches 4 and 5 when sending out v1, btw.
>
>>
>>       info->ipi_spec_dst.s_addr = ip_hdr(skb)->saddr;
>> @@ -483,7 +483,7 @@ int ip_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
>>
>>       serr = SKB_EXT_ERR(skb);
>>
>> -     if (sin) {
>> +     if (sin && skb->len) {
>>               sin->sin_family = AF_INET;
>>               sin->sin_addr.s_addr = *(__be32 *)(skb_network_header(skb) +
>>                                                  serr->addr_offset);
>> @@ -496,8 +496,9 @@ int ip_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
>>       sin = &errhdr.offender;
>>       sin->sin_family = AF_UNSPEC;
>>
>> -     if (serr->ee.ee_origin == SO_EE_ORIGIN_ICMP ||
>> -         ipv4_pktinfo_prepare_errqueue(sk, skb, serr->ee.ee_origin)) {
>> +     if (skb->len &&
>> +         (serr->ee.ee_origin == SO_EE_ORIGIN_ICMP ||
>> +          ipv4_pktinfo_prepare_errqueue(sk, skb, serr->ee.ee_origin))) {
>
> ... here.
>
>>               struct inet_sock *inet = inet_sk(sk);
>>
>>               sin->sin_family = AF_INET;
>
> Thanks,
> Richard

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2015-01-15 18:22 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-09 17:31 [PATCH net-next RFC 0/5] net-timestamp: address blinding and batching Willem de Bruijn
2015-01-09 17:31 ` [PATCH net-next RFC 1/5] net-timestamp: no-payload option Willem de Bruijn
2015-01-09 19:43   ` Andy Lutomirski
2015-01-09 19:47     ` Willem de Bruijn
2015-01-09 20:02       ` Andy Lutomirski
2015-01-09 20:33         ` Willem de Bruijn
2015-01-09 20:55           ` Andy Lutomirski
2015-01-09 21:18             ` Willem de Bruijn
2015-01-09 22:00               ` Andy Lutomirski
2015-01-11 20:26   ` Richard Cochran
2015-01-15 18:22     ` Willem de Bruijn
2015-01-09 17:31 ` [PATCH net-next RFC 2/5] net-timestamp: no-payload only sysctl Willem de Bruijn
2015-01-09 17:31 ` [PATCH net-next RFC 3/5] net-timestamp: no-payload option in txtimestamp test Willem de Bruijn
2015-01-09 17:31 ` [PATCH net-next RFC 4/5] net-timestamp: tx timestamp cookies Willem de Bruijn
2015-01-09 17:31 ` [PATCH net-next RFC 5/5] net-timestamp: tx timestamping default mode flag Willem de Bruijn
2015-01-11 20:32   ` Richard Cochran
2015-01-12  1:49     ` Willem de Bruijn
2015-01-12  8:26       ` Richard Cochran

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.