netdev.vger.kernel.org archive mirror
* [PATCH RFC net-next 0/6] multi release pacing for UDP GSO
@ 2020-06-09 14:09 Willem de Bruijn
  2020-06-09 14:09 ` [PATCH RFC net-next 1/6] net: multiple release time SO_TXTIME Willem de Bruijn
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: Willem de Bruijn @ 2020-06-09 14:09 UTC (permalink / raw)
  To: netdev; +Cc: Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

UDP segmentation offload with UDP_SEGMENT can significantly reduce the
transmission cycle cost per byte for protocols like QUIC.

Pacing offload with SO_TXTIME can further improve the accuracy and
cycle cost of pacing for such userspace protocols.

But the maximum GSO size that can be built is limited by the pacing
rate, as a msec pacing interval for many Internet clients results in
at most a few segments per datagram (e.g., at 20 Mbit/s a 1 msec
interval covers ~2500 bytes, roughly two MSS).

The pros and cons were captured in a recent Cloudflare article,
which specifically mentions:

  "But it does not yet support specifying different times for each
  packet when GSO is used, as there is no way to define multiple
  timestamps for packets that need to be segmented (each segmented
  packet essentially ends up being sent at the same time anyway)."

  https://blog.cloudflare.com/accelerating-udp-packet-transmission-for-quic/

We have been evaluating such a mechanism for multiple release times
per UDP GSO packet. Since it sounds like it may be of interest to
others too, and since it may be a while before we have all the data
I'd like and the list is quieter now that the merge window is open,
I'm sharing a WIP version.

The basic approach is to specify

1. initial early release time (in nsec)
2. interval between subsequent release times (in msec)
3. number of segments to release at each release time

One implementation concern is where to store the additional two
fields in the skb. Given that msec granularity suffices for Internet
pacing, for now repurpose the two lowest 4-bit nibbles (the lowest
byte) of skb->tstamp to hold the interval and segment count. I'm
aware that this does not win a prize for elegance.
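
For concreteness, a minimal sketch of that encoding as used by the
selftests later in this series (the helper name is illustrative
only):

  /* Pack first release time, interval (msec) and per-interval segment
   * count into the 64-bit SCM_TXTIME value. Interval and count are
   * each limited to one nibble (0..15).
   */
  static uint64_t mr_encode(uint64_t tstart_ns, uint8_t ival_msec,
                            uint8_t segs_per_ival)
  {
          return (tstart_ns & ~0xFFULL) |
                 ((uint64_t)(ival_msec & 0xF) << 4) |
                 (segs_per_ival & 0xF);
  }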

Patch 1 adds the socket option and basic segmentation function to
  adjust the skb->tstamp of the individual segments.

Patch 2 extends this with support for building GSO segs: build one
   GSO segment per interval if the hardware can offload segmentation
   (USO), so that we segment only to maintain the pacing rate.

Patch 3 wires the segmentation up to the FQ qdisc on enqueue, so that
   segments will be scheduled for delivery at their adjusted time.

Patches 4..6 extend existing tests to experiment with the feature:

Patch 4 allows testing so_txtime across two machines (for USO)
Patch 5 extends the so_txtime test with support for gso and mr-pacing
Patch 6 extends the udpgso bench to support pacing and mr-pacing

Some known limitations:

- the aforementioned storage in skb->tstamp.

- exposing this constraint through the SO_TXTIME interface.
  it is cleaner to add new fields to the cmsg, at nsec resolution
  (see the sketch after this list).

- the fq_enqueue path adds a branch to the hot path.
  a static branch would avoid that.

- a few udp specific assumptions in a net/core datapath.
  notably the hw_features. this can be derived from gso_type.
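
A purely illustrative sketch of what an extended cmsg for the second
point could look like (name and layout are hypothetical, not part of
this series):

  struct scm_txtime_multi {
          __u64 release_time;      /* first release time, in nsec */
          __u64 release_interval;  /* time between releases, in nsec */
          __u32 segs_per_release;  /* segments sent at each release */
  };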

Willem de Bruijn (6):
  net: multiple release time SO_TXTIME
  net: build gso segs in multi release time SO_TXTIME
  net_sched: sch_fq: multiple release time support
  selftests/net: so_txtime: support txonly/rxonly modes
  selftests/net: so_txtime: add gso and multi release pacing
  selftests/net: upgso bench: add pacing with SO_TXTIME

 include/linux/netdevice.h                     |   1 +
 include/net/sock.h                            |   3 +-
 include/uapi/linux/net_tstamp.h               |   3 +-
 net/core/dev.c                                |  71 +++++++++
 net/core/sock.c                               |   4 +
 net/sched/sch_fq.c                            |  33 ++++-
 tools/testing/selftests/net/so_txtime.c       | 136 ++++++++++++++----
 tools/testing/selftests/net/so_txtime.sh      |   7 +
 .../testing/selftests/net/so_txtime_multi.sh  |  68 +++++++++
 .../selftests/net/udpgso_bench_multi.sh       |  65 +++++++++
 tools/testing/selftests/net/udpgso_bench_tx.c |  72 +++++++++-
 11 files changed, 431 insertions(+), 32 deletions(-)
 create mode 100755 tools/testing/selftests/net/so_txtime_multi.sh
 create mode 100755 tools/testing/selftests/net/udpgso_bench_multi.sh

-- 
2.27.0.278.ge193c7cf3a9-goog


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH RFC net-next 1/6] net: multiple release time SO_TXTIME
  2020-06-09 14:09 [PATCH RFC net-next 0/6] multi release pacing for UDP GSO Willem de Bruijn
@ 2020-06-09 14:09 ` Willem de Bruijn
  2020-06-09 14:09 ` [PATCH RFC net-next 2/6] net: build gso segs in multi " Willem de Bruijn
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Willem de Bruijn @ 2020-06-09 14:09 UTC (permalink / raw)
  To: netdev; +Cc: Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Pace transmission of segments in a UDP GSO datagram.

Batching datagram protocol stack traversals with UDP_SEGMENT saves
significant cycles for large data transfers.

But GSO packets are sent all at once. Pacing traffic to Internet
clients often requires sending just a few MSS per msec pacing
interval.

SO_TXTIME allows delivery of packets at a later time. Extend it to
allow pacing the individual segments in a UDP GSO packet, so that
larger GSO datagrams can be built.

Add SO_TXTIME flag SOF_TXTIME_MULTI_RELEASE. This reinterprets the
lower 8 bits of the 64-bit release timestamp as

  - bits 4..7: release time interval in msec
  - bits 0..3: number of segments sent per period

So a timestamp of 0x148 means

  - 0x100: initial timestamp in the qdisc's selected clocksource
  - every 4 msec, release N MSS
  - N is 8

A subsequent patch extends the fq qdisc to pace the individual
segments.

Packet transmission can race with the socket option. This is safe.
For predictable behavior, it is up to the caller to not toggle the
feature while packets on a socket are in flight.
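
For reference, a minimal userspace sketch of enabling the flag and
passing such a value, loosely based on the selftest changes later in
this series (fd, msg and the timing variables are placeholders;
error handling omitted):

  struct sock_txtime cfg = {
          .clockid = CLOCK_MONOTONIC,
          .flags   = SOF_TXTIME_REPORT_ERRORS | SOF_TXTIME_MULTI_RELEASE,
  };
  uint64_t tdeliver;
  struct cmsghdr *cm;

  setsockopt(fd, SOL_SOCKET, SO_TXTIME, &cfg, sizeof(cfg));

  /* first release time, with the low byte replaced by the encoded
   * interval (msec) and per-interval segment count
   */
  tdeliver = (first_release_ns & ~0xFFULL) |
             ((uint64_t)ival_msec << 4) | segs_per_ival;

  cm = CMSG_FIRSTHDR(&msg);
  cm->cmsg_level = SOL_SOCKET;
  cm->cmsg_type = SCM_TXTIME;
  cm->cmsg_len = CMSG_LEN(sizeof(tdeliver));
  memcpy(CMSG_DATA(cm), &tdeliver, sizeof(tdeliver));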

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 include/linux/netdevice.h       |  1 +
 include/net/sock.h              |  3 ++-
 include/uapi/linux/net_tstamp.h |  3 ++-
 net/core/dev.c                  | 44 +++++++++++++++++++++++++++++++++
 net/core/sock.c                 |  4 +++
 5 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 1a96e9c4ec36..15ea976dd446 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4528,6 +4528,7 @@ struct sk_buff *__skb_gso_segment(struct sk_buff *skb,
 				  netdev_features_t features, bool tx_path);
 struct sk_buff *skb_mac_gso_segment(struct sk_buff *skb,
 				    netdev_features_t features);
+struct sk_buff *skb_gso_segment_txtime(struct sk_buff *skb);
 
 struct netdev_bonding_info {
 	ifslave	slave;
diff --git a/include/net/sock.h b/include/net/sock.h
index c53cc42b5ab9..491e389b3570 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -493,7 +493,8 @@ struct sock {
 	u8			sk_clockid;
 	u8			sk_txtime_deadline_mode : 1,
 				sk_txtime_report_errors : 1,
-				sk_txtime_unused : 6;
+				sk_txtime_multi_release : 1,
+				sk_txtime_unused : 5;
 
 	struct socket		*sk_socket;
 	void			*sk_user_data;
diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
index 7ed0b3d1c00a..ca1ae3b6f601 100644
--- a/include/uapi/linux/net_tstamp.h
+++ b/include/uapi/linux/net_tstamp.h
@@ -162,8 +162,9 @@ struct scm_ts_pktinfo {
 enum txtime_flags {
 	SOF_TXTIME_DEADLINE_MODE = (1 << 0),
 	SOF_TXTIME_REPORT_ERRORS = (1 << 1),
+	SOF_TXTIME_MULTI_RELEASE = (1 << 2),
 
-	SOF_TXTIME_FLAGS_LAST = SOF_TXTIME_REPORT_ERRORS,
+	SOF_TXTIME_FLAGS_LAST = SOF_TXTIME_MULTI_RELEASE,
 	SOF_TXTIME_FLAGS_MASK = (SOF_TXTIME_FLAGS_LAST - 1) |
 				 SOF_TXTIME_FLAGS_LAST
 };
diff --git a/net/core/dev.c b/net/core/dev.c
index 061496a1f640..5058083375fb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3377,6 +3377,50 @@ struct sk_buff *__skb_gso_segment(struct sk_buff *skb,
 }
 EXPORT_SYMBOL(__skb_gso_segment);
 
+struct sk_buff *skb_gso_segment_txtime(struct sk_buff *skb)
+{
+	int mss_per_ival, mss_in_cur_ival;
+	struct sk_buff *segs, *seg;
+	struct skb_shared_info *sh;
+	u64 step_ns, tstamp;
+
+	if (!skb->sk || !sk_fullsock(skb->sk) ||
+	    !skb->sk->sk_txtime_multi_release)
+		return NULL;
+
+	/* extract multi release variables mss and stepsize */
+	mss_per_ival = skb->tstamp & 0xF;
+	step_ns = ((skb->tstamp >> 4) & 0xF) * NSEC_PER_MSEC;
+	tstamp = skb->tstamp;
+
+	if (mss_per_ival == 0)
+		return NULL;
+
+	/* skip multi-release if total segs can be sent at once */
+	sh = skb_shinfo(skb);
+	if (sh->gso_segs <= mss_per_ival)
+		return NULL;
+
+	segs = skb_gso_segment(skb, NETIF_F_SG | NETIF_F_HW_CSUM);
+	if (IS_ERR_OR_NULL(segs))
+		return segs;
+
+	mss_in_cur_ival = 0;
+
+	for (seg = segs; seg; seg = seg->next) {
+		seg->tstamp = tstamp & ~0xFF;
+
+		mss_in_cur_ival++;
+		if (mss_in_cur_ival == mss_per_ival) {
+			tstamp += step_ns;
+			mss_in_cur_ival = 0;
+		}
+	}
+
+	return segs;
+}
+EXPORT_SYMBOL_GPL(skb_gso_segment_txtime);
+
 /* Take action when hardware reception checksum errors are detected. */
 #ifdef CONFIG_BUG
 void netdev_rx_csum_fault(struct net_device *dev, struct sk_buff *skb)
diff --git a/net/core/sock.c b/net/core/sock.c
index 6c4acf1f0220..7036b8855154 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1258,6 +1258,8 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
 			!!(sk_txtime.flags & SOF_TXTIME_DEADLINE_MODE);
 		sk->sk_txtime_report_errors =
 			!!(sk_txtime.flags & SOF_TXTIME_REPORT_ERRORS);
+		sk->sk_txtime_multi_release =
+			!!(sk_txtime.flags & SOF_TXTIME_MULTI_RELEASE);
 		break;
 
 	case SO_BINDTOIFINDEX:
@@ -1608,6 +1610,8 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 				  SOF_TXTIME_DEADLINE_MODE : 0;
 		v.txtime.flags |= sk->sk_txtime_report_errors ?
 				  SOF_TXTIME_REPORT_ERRORS : 0;
+		v.txtime.flags |= sk->sk_txtime_multi_release ?
+				  SOF_TXTIME_MULTI_RELEASE : 0;
 		break;
 
 	case SO_BINDTOIFINDEX:
-- 
2.27.0.278.ge193c7cf3a9-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH RFC net-next 2/6] net: build gso segs in multi release time SO_TXTIME
  2020-06-09 14:09 [PATCH RFC net-next 0/6] multi release pacing for UDP GSO Willem de Bruijn
  2020-06-09 14:09 ` [PATCH RFC net-next 1/6] net: multiple release time SO_TXTIME Willem de Bruijn
@ 2020-06-09 14:09 ` Willem de Bruijn
  2020-06-09 14:09 ` [PATCH RFC net-next 3/6] net_sched: sch_fq: multiple release time support Willem de Bruijn
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Willem de Bruijn @ 2020-06-09 14:09 UTC (permalink / raw)
  To: netdev; +Cc: Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

When sending multiple segments per interval and the device supports
hardware segmentation (USO), build one larger GSO segment per
interval instead of segmenting all the way down to MSS. For example,
with gso_size 1400, gso_segs 8 and 4 segments per interval, the skb
is resegmented into two 5600B packets, each carrying its own release
time and marked for the device as a 4-segment GSO packet with the
original gso_size of 1400.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 net/core/dev.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 5058083375fb..05f538f0f631 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3379,7 +3379,9 @@ EXPORT_SYMBOL(__skb_gso_segment);
 
 struct sk_buff *skb_gso_segment_txtime(struct sk_buff *skb)
 {
+	const netdev_features_t hw_features = NETIF_F_GSO_UDP_L4;
 	int mss_per_ival, mss_in_cur_ival;
+	u16 gso_size_orig, gso_segs_orig;
 	struct sk_buff *segs, *seg;
 	struct skb_shared_info *sh;
 	u64 step_ns, tstamp;
@@ -3401,13 +3403,27 @@ struct sk_buff *skb_gso_segment_txtime(struct sk_buff *skb)
 	if (sh->gso_segs <= mss_per_ival)
 		return NULL;
 
+	/* update gso size and segs to build 1 GSO packet per ival */
+	gso_size_orig = sh->gso_size;
+	gso_segs_orig = sh->gso_segs;
+	if (mss_per_ival > 1 && skb->dev->features & hw_features) {
+		sh->gso_size *= mss_per_ival;
+		sh->gso_segs = DIV_ROUND_UP(sh->gso_segs, mss_per_ival);
+		mss_per_ival = 1;
+	}
+
 	segs = skb_gso_segment(skb, NETIF_F_SG | NETIF_F_HW_CSUM);
-	if (IS_ERR_OR_NULL(segs))
+	if (IS_ERR_OR_NULL(segs)) {
+		sh->gso_size = gso_size_orig;
+		sh->gso_segs = gso_segs_orig;
 		return segs;
+	}
 
 	mss_in_cur_ival = 0;
 
 	for (seg = segs; seg; seg = seg->next) {
+		unsigned int data_len, data_off;
+
 		seg->tstamp = tstamp & ~0xFF;
 
 		mss_in_cur_ival++;
@@ -3415,6 +3431,17 @@ struct sk_buff *skb_gso_segment_txtime(struct sk_buff *skb)
 			tstamp += step_ns;
 			mss_in_cur_ival = 0;
 		}
+
+		data_off = skb_checksum_start_offset(skb) +
+			   skb->csum_offset + sizeof(__sum16);
+		data_len = seg->len - data_off;
+
+		if (data_len > gso_size_orig) {
+			sh = skb_shinfo(seg);
+			sh->gso_type = skb_shinfo(skb)->gso_type;
+			sh->gso_size = gso_size_orig;
+			sh->gso_segs = DIV_ROUND_UP(data_len, gso_size_orig);
+		}
 	}
 
 	return segs;
-- 
2.27.0.278.ge193c7cf3a9-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH RFC net-next 3/6] net_sched: sch_fq: multiple release time support
  2020-06-09 14:09 [PATCH RFC net-next 0/6] multi release pacing for UDP GSO Willem de Bruijn
  2020-06-09 14:09 ` [PATCH RFC net-next 1/6] net: multiple release time SO_TXTIME Willem de Bruijn
  2020-06-09 14:09 ` [PATCH RFC net-next 2/6] net: build gso segs in multi " Willem de Bruijn
@ 2020-06-09 14:09 ` Willem de Bruijn
  2020-06-09 15:00   ` Eric Dumazet
  2020-06-09 14:09 ` [PATCH RFC net-next 4/6] selftests/net: so_txtime: support txonly/rxonly modes Willem de Bruijn
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 9+ messages in thread
From: Willem de Bruijn @ 2020-06-09 14:09 UTC (permalink / raw)
  To: netdev; +Cc: Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Optionally segment skbs on FQ enqueue, to later send segments at
their individual delivery time.

Segmentation on enqueue is new for FQ, but already happens in TBF,
CAKE and netem.

This slow path should probably be behind a static_branch.
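
One possible shape, as a sketch only (the key and the helper name
fq_enqueue_multi_release() are hypothetical, not part of this patch):

  /* e.g. enabled via static_branch_inc() from sock_setsockopt()
   * when a socket first sets SOF_TXTIME_MULTI_RELEASE
   */
  static DEFINE_STATIC_KEY_FALSE(fq_multi_release_needed);

  static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
                        struct sk_buff **to_free)
  {
          /* common case: key disabled, no extra work on the hot path */
          if (!static_branch_unlikely(&fq_multi_release_needed))
                  return __fq_enqueue(skb, sch, to_free);

          return fq_enqueue_multi_release(skb, sch, to_free);
  }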

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 net/sched/sch_fq.c | 33 +++++++++++++++++++++++++++++++--
 1 file changed, 31 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index 8f06a808c59a..a5e2c35bb557 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -439,8 +439,8 @@ static bool fq_packet_beyond_horizon(const struct sk_buff *skb,
 	return unlikely((s64)skb->tstamp > (s64)(q->ktime_cache + q->horizon));
 }
 
-static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
-		      struct sk_buff **to_free)
+static int __fq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
+			struct sk_buff **to_free)
 {
 	struct fq_sched_data *q = qdisc_priv(sch);
 	struct fq_flow *f;
@@ -496,6 +496,35 @@ static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	return NET_XMIT_SUCCESS;
 }
 
+static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
+		      struct sk_buff **to_free)
+{
+	struct sk_buff *segs, *next;
+	int ret;
+
+	if (likely(!skb_is_gso(skb) || !skb->sk ||
+		   !skb->sk->sk_txtime_multi_release))
+		return __fq_enqueue(skb, sch, to_free);
+
+	segs = skb_gso_segment_txtime(skb);
+	if (IS_ERR(segs))
+		return qdisc_drop(skb, sch, to_free);
+	if (!segs)
+		return __fq_enqueue(skb, sch, to_free);
+
+	consume_skb(skb);
+
+	ret = NET_XMIT_DROP;
+	skb_list_walk_safe(segs, segs, next) {
+		skb_mark_not_on_list(segs);
+		qdisc_skb_cb(segs)->pkt_len = segs->len;
+		if (__fq_enqueue(segs, sch, to_free) == NET_XMIT_SUCCESS)
+			ret = NET_XMIT_SUCCESS;
+	}
+
+	return ret;
+}
+
 static void fq_check_throttled(struct fq_sched_data *q, u64 now)
 {
 	unsigned long sample;
-- 
2.27.0.278.ge193c7cf3a9-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH RFC net-next 4/6] selftests/net: so_txtime: support txonly/rxonly modes
  2020-06-09 14:09 [PATCH RFC net-next 0/6] multi release pacing for UDP GSO Willem de Bruijn
                   ` (2 preceding siblings ...)
  2020-06-09 14:09 ` [PATCH RFC net-next 3/6] net_sched: sch_fq: multiple release time support Willem de Bruijn
@ 2020-06-09 14:09 ` Willem de Bruijn
  2020-06-09 14:09 ` [PATCH RFC net-next 5/6] selftests/net: so_txtime: add gso and multi release pacing Willem de Bruijn
  2020-06-09 14:09 ` [PATCH RFC net-next 6/6] selftests/net: upgso bench: add pacing with SO_TXTIME Willem de Bruijn
  5 siblings, 0 replies; 9+ messages in thread
From: Willem de Bruijn @ 2020-06-09 14:09 UTC (permalink / raw)
  To: netdev; +Cc: Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Allow running the test across two machines, to test nic hw offload.

Add options
-A: receiver address
-r: receive only
-t: transmit only
-T: SO_RCVTIMEO value
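
For example, to exercise the IPv4 path across two hosts (the address
and timeout below are placeholders, mirroring the script added later
in this series):

  # receiver, binds to its own address
  ./so_txtime -4 -r -T 3 -A 192.0.2.2 -c mono a,10 a,10

  # sender
  ./so_txtime -4 -t -A 192.0.2.2 -c mono a,10 a,10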

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 tools/testing/selftests/net/so_txtime.c | 60 ++++++++++++++++++++-----
 1 file changed, 48 insertions(+), 12 deletions(-)

diff --git a/tools/testing/selftests/net/so_txtime.c b/tools/testing/selftests/net/so_txtime.c
index 383bac05ac32..fa748e4209c0 100644
--- a/tools/testing/selftests/net/so_txtime.c
+++ b/tools/testing/selftests/net/so_txtime.c
@@ -28,9 +28,13 @@
 #include <time.h>
 #include <unistd.h>
 
+static const char *cfg_addr;
 static int	cfg_clockid	= CLOCK_TAI;
 static bool	cfg_do_ipv4;
 static bool	cfg_do_ipv6;
+static bool	cfg_rxonly;
+static int	cfg_timeout_sec;
+static bool	cfg_txonly;
 static uint16_t	cfg_port	= 8000;
 static int	cfg_variance_us	= 4000;
 
@@ -238,8 +242,12 @@ static int setup_rx(struct sockaddr *addr, socklen_t alen)
 	if (fd == -1)
 		error(1, errno, "socket r");
 
-	if (bind(fd, addr, alen))
-		error(1, errno, "bind");
+	if (!cfg_txonly)
+		if (bind(fd, addr, alen))
+			error(1, errno, "bind");
+
+	if (cfg_timeout_sec)
+		tv.tv_sec = cfg_timeout_sec;
 
 	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)))
 		error(1, errno, "setsockopt rcv timeout");
@@ -260,13 +268,18 @@ static void do_test(struct sockaddr *addr, socklen_t alen)
 
 	glob_tstart = gettime_ns();
 
-	for (i = 0; i < cfg_num_pkt; i++)
-		do_send_one(fdt, &cfg_in[i]);
-	for (i = 0; i < cfg_num_pkt; i++)
-		if (do_recv_one(fdr, &cfg_out[i]))
-			do_recv_errqueue_timeout(fdt);
+	if (!cfg_rxonly) {
+		for (i = 0; i < cfg_num_pkt; i++)
+			do_send_one(fdt, &cfg_in[i]);
+	}
 
-	do_recv_verify_empty(fdr);
+	if (!cfg_txonly) {
+		for (i = 0; i < cfg_num_pkt; i++)
+			if (do_recv_one(fdr, &cfg_out[i]))
+				do_recv_errqueue_timeout(fdt);
+
+		do_recv_verify_empty(fdr);
+	}
 
 	if (close(fdr))
 		error(1, errno, "close r");
@@ -308,7 +321,7 @@ static void parse_opts(int argc, char **argv)
 {
 	int c, ilen, olen;
 
-	while ((c = getopt(argc, argv, "46c:")) != -1) {
+	while ((c = getopt(argc, argv, "46A:c:rtT:")) != -1) {
 		switch (c) {
 		case '4':
 			cfg_do_ipv4 = true;
@@ -316,6 +329,9 @@ static void parse_opts(int argc, char **argv)
 		case '6':
 			cfg_do_ipv6 = true;
 			break;
+		case 'A':
+			cfg_addr = optarg;
+			break;
 		case 'c':
 			if (!strcmp(optarg, "tai"))
 				cfg_clockid = CLOCK_TAI;
@@ -325,13 +341,27 @@ static void parse_opts(int argc, char **argv)
 			else
 				error(1, 0, "unknown clock id %s", optarg);
 			break;
+		case 'r':
+			cfg_rxonly = true;
+			break;
+		case 't':
+			cfg_txonly = true;
+			break;
+		case 'T':
+			cfg_timeout_sec = strtol(optarg, NULL, 0);
+			break;
 		default:
 			error(1, 0, "parse error at %d", optind);
 		}
 	}
 
 	if (argc - optind != 2)
-		error(1, 0, "Usage: %s [-46] -c <clock> <in> <out>", argv[0]);
+		error(1, 0, "Usage: %s [-46rt] [-A addr] [-c clock] [-T timeout] <in> <out>", argv[0]);
+
+	if (cfg_rxonly && cfg_txonly)
+		error(1, 0, "Select rx-only or tx-only, not both");
+	if (cfg_addr && cfg_do_ipv4 && cfg_do_ipv6)
+		error(1, 0, "Cannot run both IPv4 and IPv6 when passing address");
 
 	ilen = parse_io(argv[optind], cfg_in);
 	olen = parse_io(argv[optind + 1], cfg_out);
@@ -349,7 +379,10 @@ int main(int argc, char **argv)
 
 		addr6.sin6_family = AF_INET6;
 		addr6.sin6_port = htons(cfg_port);
-		addr6.sin6_addr = in6addr_loopback;
+		if (!cfg_addr)
+			addr6.sin6_addr = in6addr_loopback;
+		else if (inet_pton(AF_INET6, cfg_addr, &addr6.sin6_addr) != 1)
+			error(1, 0, "ipv6 parse error: %s", cfg_addr);
 
 		cfg_errq_level = SOL_IPV6;
 		cfg_errq_type = IPV6_RECVERR;
@@ -362,7 +395,10 @@ int main(int argc, char **argv)
 
 		addr4.sin_family = AF_INET;
 		addr4.sin_port = htons(cfg_port);
-		addr4.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+		if (!cfg_addr)
+			addr4.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+		else if (inet_pton(AF_INET, cfg_addr, &addr4.sin_addr) != 1)
+			error(1, 0, "ipv4 parse error: %s", cfg_addr);
 
 		cfg_errq_level = SOL_IP;
 		cfg_errq_type = IP_RECVERR;
-- 
2.27.0.278.ge193c7cf3a9-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH RFC net-next 5/6] selftests/net: so_txtime: add gso and multi release pacing
  2020-06-09 14:09 [PATCH RFC net-next 0/6] multi release pacing for UDP GSO Willem de Bruijn
                   ` (3 preceding siblings ...)
  2020-06-09 14:09 ` [PATCH RFC net-next 4/6] selftests/net: so_txtime: support txonly/rxonly modes Willem de Bruijn
@ 2020-06-09 14:09 ` Willem de Bruijn
  2020-06-09 14:09 ` [PATCH RFC net-next 6/6] selftests/net: upgso bench: add pacing with SO_TXTIME Willem de Bruijn
  5 siblings, 0 replies; 9+ messages in thread
From: Willem de Bruijn @ 2020-06-09 14:09 UTC (permalink / raw)
  To: netdev; +Cc: Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Support sending more than 1B payload and passing segment size.

- add option '-s' to send more than 1B payload
- add option '-m' to configure mss

If size exceeds mss, enable UDP_SEGMENT.

Optionally also allow configuring multi release time.

- add option '-M' to set release interval (msec)
- add option '-N' to set release segment count

Both options have to be specified, or neither.

Add testcases to so_txtime.sh over loopback.
Add a test script so_txtime_multi.sh that operates over veth using
the txonly/rxonly modes.
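
For example, the multi release case added to so_txtime.sh below
sends two 3500B datagrams segmented at 1000B, releasing two segments
every 5 msec:

  ./so_txtime -4 -6 -m 1000 -s 3500 -M 5 -N 2 -c mono a,50,b,100 a,50,b,100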

Also fix a small sock_extended_err parsing bug that mixed up
ee_code and ee_errno.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 tools/testing/selftests/net/so_txtime.c       | 80 +++++++++++++++----
 tools/testing/selftests/net/so_txtime.sh      |  7 ++
 .../testing/selftests/net/so_txtime_multi.sh  | 68 ++++++++++++++++
 3 files changed, 141 insertions(+), 14 deletions(-)
 create mode 100755 tools/testing/selftests/net/so_txtime_multi.sh

diff --git a/tools/testing/selftests/net/so_txtime.c b/tools/testing/selftests/net/so_txtime.c
index fa748e4209c0..da4ef411693c 100644
--- a/tools/testing/selftests/net/so_txtime.c
+++ b/tools/testing/selftests/net/so_txtime.c
@@ -17,6 +17,7 @@
 #include <linux/errqueue.h>
 #include <linux/ipv6.h>
 #include <linux/tcp.h>
+#include <netinet/udp.h>
 #include <stdbool.h>
 #include <stdlib.h>
 #include <stdio.h>
@@ -32,7 +33,11 @@ static const char *cfg_addr;
 static int	cfg_clockid	= CLOCK_TAI;
 static bool	cfg_do_ipv4;
 static bool	cfg_do_ipv6;
+static uint8_t	cfg_mr_num;
+static uint8_t	cfg_mr_ival;
+static int	cfg_mss		= 1400;
 static bool	cfg_rxonly;
+static uint16_t	cfg_size	= 1;
 static int	cfg_timeout_sec;
 static bool	cfg_txonly;
 static uint16_t	cfg_port	= 8000;
@@ -66,6 +71,7 @@ static uint64_t gettime_ns(void)
 
 static void do_send_one(int fdt, struct timed_send *ts)
 {
+	static char buf[1 << 16];
 	char control[CMSG_SPACE(sizeof(uint64_t))];
 	struct msghdr msg = {0};
 	struct iovec iov = {0};
@@ -73,8 +79,10 @@ static void do_send_one(int fdt, struct timed_send *ts)
 	uint64_t tdeliver;
 	int ret;
 
-	iov.iov_base = &ts->data;
-	iov.iov_len = 1;
+	memset(buf, ts->data, cfg_size);
+
+	iov.iov_base = buf;
+	iov.iov_len = cfg_size;
 
 	msg.msg_iov = &iov;
 	msg.msg_iovlen = 1;
@@ -85,6 +93,11 @@ static void do_send_one(int fdt, struct timed_send *ts)
 		msg.msg_controllen = sizeof(control);
 
 		tdeliver = glob_tstart + ts->delay_us * 1000;
+		if (cfg_mr_ival) {
+			tdeliver &= ~0xFF;
+			tdeliver |= cfg_mr_ival << 4;
+			tdeliver |= cfg_mr_num;
+		}
 
 		cm = CMSG_FIRSTHDR(&msg);
 		cm->cmsg_level = SOL_SOCKET;
@@ -104,30 +117,41 @@ static void do_send_one(int fdt, struct timed_send *ts)
 static bool do_recv_one(int fdr, struct timed_send *ts)
 {
 	int64_t tstop, texpect;
+	int total = 0;
 	char rbuf[2];
 	int ret;
 
-	ret = recv(fdr, rbuf, sizeof(rbuf), 0);
+read_again:
+	ret = recv(fdr, rbuf, sizeof(rbuf), MSG_TRUNC);
 	if (ret == -1 && errno == EAGAIN)
-		return true;
+		goto timedout;
 	if (ret == -1)
 		error(1, errno, "read");
-	if (ret != 1)
-		error(1, 0, "read: %dB", ret);
 
 	tstop = (gettime_ns() - glob_tstart) / 1000;
 	texpect = ts->delay_us >= 0 ? ts->delay_us : 0;
 
-	fprintf(stderr, "payload:%c delay:%lld expected:%lld (us)\n",
-			rbuf[0], (long long)tstop, (long long)texpect);
+	fprintf(stderr, "payload:%c delay:%lld expected:%lld (us) -- read=%d,len=%d,total=%d\n",
+			rbuf[0], (long long)tstop, (long long)texpect,
+			total, ret, cfg_size);
 
 	if (rbuf[0] != ts->data)
 		error(1, 0, "payload mismatch. expected %c", ts->data);
 
-	if (labs(tstop - texpect) > cfg_variance_us)
+	total += ret;
+	if (total < cfg_size)
+		goto read_again;
+
+	/* measure latency if all data arrives in a single datagram (not GSO) */
+	if (ret == cfg_size && labs(tstop - texpect) > cfg_variance_us)
 		error(1, 0, "exceeds variance (%d us)", cfg_variance_us);
 
 	return false;
+
+timedout:
+	if (total != 0 && total != cfg_size)
+		error(1, 0, "timeout mid-read");
+	return true;
 }
 
 static void do_recv_verify_empty(int fdr)
@@ -168,7 +192,9 @@ static void do_recv_errqueue_timeout(int fdt)
 			break;
 		if (ret == -1)
 			error(1, errno, "errqueue");
-		if (msg.msg_flags != MSG_ERRQUEUE)
+		if (ret != sizeof(data))
+			error(1, errno, "insufficient data");
+		if (msg.msg_flags & ~(MSG_ERRQUEUE | MSG_TRUNC))
 			error(1, 0, "errqueue: flags 0x%x\n", msg.msg_flags);
 
 		cm = CMSG_FIRSTHDR(&msg);
@@ -180,7 +206,9 @@ static void do_recv_errqueue_timeout(int fdt)
 		err = (struct sock_extended_err *)CMSG_DATA(cm);
 		if (err->ee_origin != SO_EE_ORIGIN_TXTIME)
 			error(1, 0, "errqueue: origin 0x%x\n", err->ee_origin);
-		if (err->ee_code != ECANCELED)
+		if (err->ee_errno != ECANCELED)
+			error(1, 0, "errqueue: errno 0x%x\n", err->ee_errno);
+		if (err->ee_code != SO_EE_CODE_TXTIME_MISSED)
 			error(1, 0, "errqueue: code 0x%x\n", err->ee_code);
 
 		tstamp = ((int64_t) err->ee_data) << 32 | err->ee_info;
@@ -202,7 +230,7 @@ static void setsockopt_txtime(int fd)
 	struct sock_txtime so_txtime_val_read = { 0 };
 	socklen_t vallen = sizeof(so_txtime_val);
 
-	so_txtime_val.flags = SOF_TXTIME_REPORT_ERRORS;
+	so_txtime_val.flags = SOF_TXTIME_REPORT_ERRORS | SOF_TXTIME_MULTI_RELEASE;
 
 	if (setsockopt(fd, SOL_SOCKET, SO_TXTIME,
 		       &so_txtime_val, sizeof(so_txtime_val)))
@@ -230,6 +258,12 @@ static int setup_tx(struct sockaddr *addr, socklen_t alen)
 
 	setsockopt_txtime(fd);
 
+	if (cfg_size > cfg_mss) {
+		if (setsockopt(fd, SOL_UDP, UDP_SEGMENT,
+			       &cfg_mss, sizeof(cfg_mss)))
+			error(1, errno, "setsockopt udp segment");
+	}
+
 	return fd;
 }
 
@@ -321,7 +355,7 @@ static void parse_opts(int argc, char **argv)
 {
 	int c, ilen, olen;
 
-	while ((c = getopt(argc, argv, "46A:c:rtT:")) != -1) {
+	while ((c = getopt(argc, argv, "46A:c:m:M:N:rs:tT:")) != -1) {
 		switch (c) {
 		case '4':
 			cfg_do_ipv4 = true;
@@ -341,9 +375,25 @@ static void parse_opts(int argc, char **argv)
 			else
 				error(1, 0, "unknown clock id %s", optarg);
 			break;
+		case 'm':
+			cfg_mss = strtol(optarg, NULL, 0);
+			break;
+		case 'M':
+			cfg_mr_ival = atoi(optarg);
+			if (cfg_mr_ival > 0xF)
+				error(1, 0, "multi release ival exceeds max");
+			break;
+		case 'N':
+			cfg_mr_num = atoi(optarg);
+			if (cfg_mr_num > 0xF)
+				error(1, 0, "multi release count exceeds max");
+			break;
 		case 'r':
 			cfg_rxonly = true;
 			break;
+		case 's':
+			cfg_size = atoi(optarg);
+			break;
 		case 't':
 			cfg_txonly = true;
 			break;
@@ -356,12 +406,14 @@ static void parse_opts(int argc, char **argv)
 	}
 
 	if (argc - optind != 2)
-		error(1, 0, "Usage: %s [-46rt] [-A addr] [-c clock] [-T timeout] <in> <out>", argv[0]);
+		error(1, 0, "Usage: %s [-46rt] [-A addr] [-c clock] [-m mtu] [-M ival] [-N num] [-s size] [-T timeout] <in> <out>", argv[0]);
 
 	if (cfg_rxonly && cfg_txonly)
 		error(1, 0, "Select rx-only or tx-only, not both");
 	if (cfg_addr && cfg_do_ipv4 && cfg_do_ipv6)
 		error(1, 0, "Cannot run both IPv4 and IPv6 when passing address");
+	if (!!cfg_mr_ival ^ !!cfg_mr_num)
+		error(1, 0, "Multi release pacing requires both -M and -N");
 
 	ilen = parse_io(argv[optind], cfg_in);
 	olen = parse_io(argv[optind + 1], cfg_out);
diff --git a/tools/testing/selftests/net/so_txtime.sh b/tools/testing/selftests/net/so_txtime.sh
index 3f7800eaecb1..7c60c11717e4 100755
--- a/tools/testing/selftests/net/so_txtime.sh
+++ b/tools/testing/selftests/net/so_txtime.sh
@@ -16,13 +16,20 @@ fi
 
 set -e
 
+ip link set dev lo mtu 1500
 tc qdisc add dev lo root fq
+
 ./so_txtime -4 -6 -c mono a,-1 a,-1
 ./so_txtime -4 -6 -c mono a,0 a,0
 ./so_txtime -4 -6 -c mono a,10 a,10
 ./so_txtime -4 -6 -c mono a,10,b,20 a,10,b,20
 ./so_txtime -4 -6 -c mono a,20,b,10 b,20,a,20
 
+# test gso
+./so_txtime -4 -6 -m 1000 -s 3500 -c mono a,50,b,100 a,50,b,100
+./so_txtime -4 -6 -m 1000 -s 3500 -M 5 -N 1 -c mono a,50,b,100 a,50,b,100
+./so_txtime -4 -6 -m 1000 -s 3500 -M 5 -N 2 -c mono a,50,b,100 a,50,b,100
+
 if tc qdisc replace dev lo root etf clockid CLOCK_TAI delta 400000; then
 	! ./so_txtime -4 -6 -c tai a,-1 a,-1
 	! ./so_txtime -4 -6 -c tai a,0 a,0
diff --git a/tools/testing/selftests/net/so_txtime_multi.sh b/tools/testing/selftests/net/so_txtime_multi.sh
new file mode 100755
index 000000000000..4e5ab06fd178
--- /dev/null
+++ b/tools/testing/selftests/net/so_txtime_multi.sh
@@ -0,0 +1,68 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Regression tests for the SO_TXTIME interface
+
+readonly ns_prefix="ns-sotxtime-"
+readonly ns1="${ns_prefix}1"
+readonly ns2="${ns_prefix}2"
+
+readonly ns1_v4=192.168.1.1
+readonly ns2_v4=192.168.1.2
+readonly ns1_v6=fd::1
+readonly ns2_v6=fd::2
+
+set -eu
+
+cleanup() {
+	ip netns del "${ns2}"
+	ip netns del "${ns1}"
+}
+
+setup() {
+	ip netns add "${ns1}"
+	ip netns add "${ns2}"
+
+	ip link add dev veth1 mtu 1500 netns "${ns1}" type veth \
+	      peer name veth2 mtu 1500 netns "${ns2}"
+
+	ip -netns "${ns1}" link set veth1 up
+	ip -netns "${ns2}" link set veth2 up
+
+	ip -netns "${ns1}" -4 addr add "${ns1_v4}/24" dev veth1
+	ip -netns "${ns2}" -4 addr add "${ns2_v4}/24" dev veth2
+	ip -netns "${ns1}" -6 addr add "${ns1_v6}/64" dev veth1 nodad
+	ip -netns "${ns2}" -6 addr add "${ns2_v6}/64" dev veth2 nodad
+
+	ip netns exec "${ns1}" tc qdisc add dev veth1 root fq
+}
+
+run_test() {
+	ip netns exec "${ns2}" ./so_txtime -r -T 1 $@ &
+	sleep 0.1
+	ip netns exec "${ns1}" ./so_txtime -t $@
+	wait
+}
+
+run_test_46() {
+	run_test -4 -A "${ns2_v4}" $@
+	run_test -6 -A "${ns2_v6}" $@
+}
+
+trap cleanup EXIT
+setup
+
+echo "pacing"
+TEST_ARGS="-c mono a,10 a,10"
+run_test_46 ${TEST_ARGS}
+
+echo "gso + pacing"
+TEST_ARGS_GSO="-m 1000 -s 4500 ${TEST_ARGS}"
+run_test_46 ${TEST_ARGS_GSO}
+
+echo "gso + multi release pacing"
+run_test_46 -M 5 -N 1 ${TEST_ARGS_GSO}
+run_test_46 -M 5 -N 2 ${TEST_ARGS_GSO}
+
+# Does not validate pacing delay yet. Check manually.
+echo "Ok. Executed tests."
-- 
2.27.0.278.ge193c7cf3a9-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH RFC net-next 6/6] selftests/net: upgso bench: add pacing with SO_TXTIME
  2020-06-09 14:09 [PATCH RFC net-next 0/6] multi release pacing for UDP GSO Willem de Bruijn
                   ` (4 preceding siblings ...)
  2020-06-09 14:09 ` [PATCH RFC net-next 5/6] selftests/net: so_txtime: add gso and multi release pacing Willem de Bruijn
@ 2020-06-09 14:09 ` Willem de Bruijn
  5 siblings, 0 replies; 9+ messages in thread
From: Willem de Bruijn @ 2020-06-09 14:09 UTC (permalink / raw)
  To: netdev; +Cc: Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Enable passing an SCM_TXTIME ('-x'), optionally configured to use
multi release (SOF_TXTIME_MULTI_RELEASE) ('-X').

With multi release, the segments that make up a single UDP GSO
packet can be released over time, as opposed to being burst out at
once.

Repurpose the lower 8 bits of the 64-bit timestamp for this purpose:
- bits 4..7: delay between transmit periods in msec
- bits 0..3: number of segments sent per period

Also add an optional delay in usec between sendmsg calls ('-d'), to
reduce throughput enough to observe the pacing delay that is
introduced.

Add udpgso_bench_multi.sh, derived from so_txtime_multi.sh, to run
tests over veth.

Also fix up a minor issue where the sizeof of the wrong field was
used (cfg_mss instead of cfg_gso_size). They happen to both be
uint16_t.

Tested:
    ./udpgso_bench_multi.sh

    or manually ran across two hosts:

    # sender
    #
    # -s 6000 -S 1400:  6000B buffer encoding 1400B datagrams
    # -d 200000:        200 msec delay between sendmsg
    # -x 0x989611:      ~10000000 nsec (10 msec) first delay + 1 MSS every 1 msec

    ./udpgso_bench_tx -6 -D fd00::1 \
        -l 1 -s 6000 -S 1400 -v \
        -d 200000 -x 0x989611 -X

    # receiver

    tcpdump -n -i eth1 -c 100 udp and port 8000 &
    sleep 0.2
    ./udpgso_bench_rx

    16:29:45.146855 IP6 host1.40803 > host2.8000: UDP, length 1400
    16:29:45.147798 IP6 host1.40803 > host2.8000: UDP, length 1400
    16:29:45.148797 IP6 host1.40803 > host2.8000: UDP, length 1400
    16:29:45.149797 IP6 host1.40803 > host2.8000: UDP, length 1400
    16:29:45.150796 IP6 host1.40803 > host2.8000: UDP, length 400
    16:29:45.347056 IP6 host1.40803 > host2.8000: UDP, length 1400
    16:29:45.348000 IP6 host1.40803 > host2.8000: UDP, length 1400
    16:29:45.349000 IP6 host1.40803 > host2.8000: UDP, length 1400
    16:29:45.349999 IP6 host1.40803 > host2.8000: UDP, length 1400
    16:29:45.350999 IP6 host1.40803 > host2.8000: UDP, length 400

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 .../selftests/net/udpgso_bench_multi.sh       | 65 +++++++++++++++++
 tools/testing/selftests/net/udpgso_bench_tx.c | 72 +++++++++++++++++--
 2 files changed, 133 insertions(+), 4 deletions(-)
 create mode 100755 tools/testing/selftests/net/udpgso_bench_multi.sh

diff --git a/tools/testing/selftests/net/udpgso_bench_multi.sh b/tools/testing/selftests/net/udpgso_bench_multi.sh
new file mode 100755
index 000000000000..c29f75aec759
--- /dev/null
+++ b/tools/testing/selftests/net/udpgso_bench_multi.sh
@@ -0,0 +1,65 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Regression tests for the SO_TXTIME interface
+
+readonly ns_prefix="ns-sotxtime-"
+readonly ns1="${ns_prefix}1"
+readonly ns2="${ns_prefix}2"
+
+readonly ns1_v4=192.168.1.1
+readonly ns2_v4=192.168.1.2
+readonly ns1_v6=fd::1
+readonly ns2_v6=fd::2
+
+set -eu
+
+cleanup() {
+	ip netns del "${ns2}"
+	ip netns del "${ns1}"
+}
+
+setup() {
+	ip netns add "${ns1}"
+	ip netns add "${ns2}"
+
+	ip link add dev veth1 mtu 1500 netns "${ns1}" type veth \
+	      peer name veth2 mtu 1500 netns "${ns2}"
+
+	ip -netns "${ns1}" link set veth1 up
+	ip -netns "${ns2}" link set veth2 up
+
+	ip -netns "${ns1}" -4 addr add "${ns1_v4}/24" dev veth1
+	ip -netns "${ns2}" -4 addr add "${ns2_v4}/24" dev veth2
+	ip -netns "${ns1}" -6 addr add "${ns1_v6}/64" dev veth1 nodad
+	ip -netns "${ns2}" -6 addr add "${ns2_v6}/64" dev veth2 nodad
+
+	ip netns exec "${ns1}" tc qdisc add dev veth1 root fq
+}
+
+run_test() {
+	ip netns exec "${ns2}" tcpdump -q -n -i veth2 udp &
+	ip netns exec "${ns2}" ./udpgso_bench_rx &
+	sleep 0.1
+	ip netns exec "${ns1}" ./udpgso_bench_tx $@
+	pkill -P $$
+}
+
+run_test_46() {
+	run_test -4 -D "${ns2_v4}" $@
+	run_test -6 -D "${ns2_v6}" $@
+}
+
+trap cleanup EXIT
+setup
+
+echo "gso + pacing"
+TEST_ARGS="-l 1 -s 3500 -S 1000 -v -d 200000 -x 1000000"
+run_test_46 ${TEST_ARGS} -x 1000000
+
+echo "gso + multi release pacing"
+run_test_46 ${TEST_ARGS} -X -x 0x989611
+run_test_46 ${TEST_ARGS} -X -x 0x9896A2
+
+# Does not validate pacing delay yet. Check manually.
+echo "Ok. Executed tests."
diff --git a/tools/testing/selftests/net/udpgso_bench_tx.c b/tools/testing/selftests/net/udpgso_bench_tx.c
index 17512a43885e..264222c2b94e 100644
--- a/tools/testing/selftests/net/udpgso_bench_tx.c
+++ b/tools/testing/selftests/net/udpgso_bench_tx.c
@@ -23,6 +23,7 @@
 #include <sys/time.h>
 #include <sys/poll.h>
 #include <sys/types.h>
+#include <time.h>
 #include <unistd.h>
 
 #include "../kselftest.h"
@@ -56,6 +57,7 @@
 static bool	cfg_cache_trash;
 static int	cfg_cpu		= -1;
 static int	cfg_connected	= true;
+static int	cfg_delay_us;
 static int	cfg_family	= PF_UNSPEC;
 static uint16_t	cfg_mss;
 static int	cfg_payload_len	= (1472 * 42);
@@ -65,6 +67,8 @@ static bool	cfg_poll;
 static bool	cfg_segment;
 static bool	cfg_sendmmsg;
 static bool	cfg_tcp;
+static uint64_t	cfg_txtime;
+static bool	cfg_txtime_multi;
 static uint32_t	cfg_tx_ts = SOF_TIMESTAMPING_TX_SOFTWARE;
 static bool	cfg_tx_tstamp;
 static bool	cfg_audit;
@@ -306,6 +310,34 @@ static void send_ts_cmsg(struct cmsghdr *cm)
 	*valp = cfg_tx_ts;
 }
 
+static uint64_t gettime_ns(void)
+{
+	struct timespec ts;
+
+	if (clock_gettime(CLOCK_MONOTONIC, &ts))
+		error(1, errno, "gettime");
+
+	return ts.tv_sec * (1000ULL * 1000 * 1000) + ts.tv_nsec;
+}
+
+static void send_txtime_cmsg(struct cmsghdr *cm)
+{
+	uint64_t tdeliver, *valp;
+
+	tdeliver = gettime_ns() + cfg_txtime;
+
+	if (cfg_txtime_multi) {
+		tdeliver &= ~0xFF;
+		tdeliver |= cfg_txtime & 0xFF;
+	}
+
+	cm->cmsg_level = SOL_SOCKET;
+	cm->cmsg_type = SCM_TXTIME;
+	cm->cmsg_len = CMSG_LEN(sizeof(cfg_txtime));
+	valp = (void *)CMSG_DATA(cm);
+	*valp = tdeliver;
+}
+
 static int send_udp_sendmmsg(int fd, char *data)
 {
 	char control[CMSG_SPACE(sizeof(cfg_tx_ts))] = {0};
@@ -373,7 +405,8 @@ static void send_udp_segment_cmsg(struct cmsghdr *cm)
 static int send_udp_segment(int fd, char *data)
 {
 	char control[CMSG_SPACE(sizeof(cfg_gso_size)) +
-		     CMSG_SPACE(sizeof(cfg_tx_ts))] = {0};
+		     CMSG_SPACE(sizeof(cfg_tx_ts)) +
+		     CMSG_SPACE(sizeof(uint64_t))] = {0};
 	struct msghdr msg = {0};
 	struct iovec iov = {0};
 	size_t msg_controllen;
@@ -390,12 +423,17 @@ static int send_udp_segment(int fd, char *data)
 	msg.msg_controllen = sizeof(control);
 	cmsg = CMSG_FIRSTHDR(&msg);
 	send_udp_segment_cmsg(cmsg);
-	msg_controllen = CMSG_SPACE(sizeof(cfg_mss));
+	msg_controllen = CMSG_SPACE(sizeof(cfg_gso_size));
 	if (cfg_tx_tstamp) {
 		cmsg = CMSG_NXTHDR(&msg, cmsg);
 		send_ts_cmsg(cmsg);
 		msg_controllen += CMSG_SPACE(sizeof(cfg_tx_ts));
 	}
+	if (cfg_txtime) {
+		cmsg = CMSG_NXTHDR(&msg, cmsg);
+		send_txtime_cmsg(cmsg);
+		msg_controllen += CMSG_SPACE(sizeof(cfg_txtime));
+	}
 
 	msg.msg_controllen = msg_controllen;
 	msg.msg_name = (void *)&cfg_dst_addr;
@@ -413,7 +451,7 @@ static int send_udp_segment(int fd, char *data)
 
 static void usage(const char *filepath)
 {
-	error(1, 0, "Usage: %s [-46acmHPtTuvz] [-C cpu] [-D dst ip] [-l secs] [-M messagenr] [-p port] [-s sendsize] [-S gsosize]",
+	error(1, 0, "Usage: %s [-46acmHPtTuvXz] [-C cpu] [-d delay] [-D dst ip] [-l secs] [-M messagenr] [-p port] [-s sendsize] [-S gsosize] [-x time]",
 		    filepath);
 }
 
@@ -422,7 +460,7 @@ static void parse_opts(int argc, char **argv)
 	int max_len, hdrlen;
 	int c;
 
-	while ((c = getopt(argc, argv, "46acC:D:Hl:mM:p:s:PS:tTuvz")) != -1) {
+	while ((c = getopt(argc, argv, "46acC:d:D:Hl:mM:p:s:PS:tTuvx:Xz")) != -1) {
 		switch (c) {
 		case '4':
 			if (cfg_family != PF_UNSPEC)
@@ -445,6 +483,9 @@ static void parse_opts(int argc, char **argv)
 		case 'C':
 			cfg_cpu = strtol(optarg, NULL, 0);
 			break;
+		case 'd':
+			cfg_delay_us = strtol(optarg, NULL, 0);
+			break;
 		case 'D':
 			setup_sockaddr(cfg_family, optarg, &cfg_dst_addr);
 			break;
@@ -486,6 +527,12 @@ static void parse_opts(int argc, char **argv)
 		case 'v':
 			cfg_verbose = true;
 			break;
+		case 'x':
+			cfg_txtime = strtoull(optarg, NULL, 0);
+			break;
+		case 'X':
+			cfg_txtime_multi = true;
+			break;
 		case 'z':
 			cfg_zerocopy = true;
 			break;
@@ -551,6 +598,17 @@ static void set_tx_timestamping(int fd)
 		error(1, errno, "setsockopt tx timestamping");
 }
 
+static void set_txtime(int fd)
+{
+	struct sock_txtime txt = { .clockid = CLOCK_MONOTONIC };
+
+	if (cfg_txtime_multi)
+		txt.flags = SOF_TXTIME_MULTI_RELEASE;
+
+	if (setsockopt(fd, SOL_SOCKET, SO_TXTIME, &txt, sizeof(txt)))
+		error(1, errno, "setsockopt txtime");
+}
+
 static void print_audit_report(unsigned long num_msgs, unsigned long num_sends)
 {
 	unsigned long tdelta;
@@ -652,6 +710,9 @@ int main(int argc, char **argv)
 	if (cfg_tx_tstamp)
 		set_tx_timestamping(fd);
 
+	if (cfg_txtime)
+		set_txtime(fd);
+
 	num_msgs = num_sends = 0;
 	tnow = gettimeofday_ms();
 	tstart = tnow;
@@ -687,6 +748,9 @@ int main(int argc, char **argv)
 		if (cfg_cache_trash)
 			i = ++i < NUM_PKT ? i : 0;
 
+		if (cfg_delay_us)
+			usleep(cfg_delay_us);
+
 	} while (!interrupted && (cfg_runtime_ms == -1 || tnow < tstop));
 
 	if (cfg_zerocopy || cfg_tx_tstamp)
-- 
2.27.0.278.ge193c7cf3a9-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH RFC net-next 3/6] net_sched: sch_fq: multiple release time support
  2020-06-09 14:09 ` [PATCH RFC net-next 3/6] net_sched: sch_fq: multiple release time support Willem de Bruijn
@ 2020-06-09 15:00   ` Eric Dumazet
  2020-06-09 15:10     ` Eric Dumazet
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2020-06-09 15:00 UTC (permalink / raw)
  To: Willem de Bruijn, netdev; +Cc: Willem de Bruijn



On 6/9/20 7:09 AM, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
> 
> Optionally segment skbs on FQ enqueue, to later send segments at
> their individual delivery time.
> 
> Segmentation on enqueue is new for FQ, but already happens in TBF,
> CAKE and netem.
> 
> This slow path should probably be behind a static_branch.
> 
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> ---
>  net/sched/sch_fq.c | 33 +++++++++++++++++++++++++++++++--
>  1 file changed, 31 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
> index 8f06a808c59a..a5e2c35bb557 100644
> --- a/net/sched/sch_fq.c
> +++ b/net/sched/sch_fq.c
> @@ -439,8 +439,8 @@ static bool fq_packet_beyond_horizon(const struct sk_buff *skb,
>  	return unlikely((s64)skb->tstamp > (s64)(q->ktime_cache + q->horizon));
>  }
>  
> -static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> -		      struct sk_buff **to_free)
> +static int __fq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> +			struct sk_buff **to_free)
>  {
>  	struct fq_sched_data *q = qdisc_priv(sch);
>  	struct fq_flow *f;
> @@ -496,6 +496,35 @@ static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>  	return NET_XMIT_SUCCESS;
>  }
>  
> +static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> +		      struct sk_buff **to_free)
> +{
> +	struct sk_buff *segs, *next;
> +	int ret;
> +
> +	if (likely(!skb_is_gso(skb) || !skb->sk ||

You also need to check sk_fullsock(skb->sk), otherwise KMSAN might be unhappy.

> +		   !skb->sk->sk_txtime_multi_release))
> +		return __fq_enqueue(skb, sch, to_free);
> +
> +	segs = skb_gso_segment_txtime(skb);
> +	if (IS_ERR(segs))
> +		return qdisc_drop(skb, sch, to_free);
> +	if (!segs)
> +		return __fq_enqueue(skb, sch, to_free);
> +
> +	consume_skb(skb);

   This needs to be qdisc_drop(skb, sch, to_free) if queue is full, see below.

> +
> +	ret = NET_XMIT_DROP;
> +	skb_list_walk_safe(segs, segs, next) {
> +		skb_mark_not_on_list(segs);
> +		qdisc_skb_cb(segs)->pkt_len = segs->len;

This seems to under-estimate bytes sent. See qdisc_pkt_len_init() for details.

> +		if (__fq_enqueue(segs, sch, to_free) == NET_XMIT_SUCCESS)
> +			ret = NET_XMIT_SUCCESS;
> +	}

        if (unlikely(ret == NET_XMIT_DROP))
            qdisc_drop(skb, sch, to_free);
        else
            consume_skb(skb);

> +
> +	return ret;
> +}
> +
>  static void fq_check_throttled(struct fq_sched_data *q, u64 now)
>  {
>  	unsigned long sample;
> 



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH RFC net-next 3/6] net_sched: sch_fq: multiple release time support
  2020-06-09 15:00   ` Eric Dumazet
@ 2020-06-09 15:10     ` Eric Dumazet
  0 siblings, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2020-06-09 15:10 UTC (permalink / raw)
  To: Willem de Bruijn, netdev; +Cc: Willem de Bruijn



On 6/9/20 8:00 AM, Eric Dumazet wrote:
> 
> 
> On 6/9/20 7:09 AM, Willem de Bruijn wrote:
>> From: Willem de Bruijn <willemb@google.com>
>>
>> Optionally segment skbs on FQ enqueue, to later send segments at
>> their individual delivery time.
>>
>> Segmentation on enqueue is new for FQ, but already happens in TBF,
>> CAKE and netem.
>>
>> This slow path should probably be behind a static_branch.
>>
>> Signed-off-by: Willem de Bruijn <willemb@google.com>
>> ---
>>  net/sched/sch_fq.c | 33 +++++++++++++++++++++++++++++++--
>>  1 file changed, 31 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
>> index 8f06a808c59a..a5e2c35bb557 100644
>> --- a/net/sched/sch_fq.c
>> +++ b/net/sched/sch_fq.c
>> @@ -439,8 +439,8 @@ static bool fq_packet_beyond_horizon(const struct sk_buff *skb,
>>  	return unlikely((s64)skb->tstamp > (s64)(q->ktime_cache + q->horizon));
>>  }
>>  
>> -static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>> -		      struct sk_buff **to_free)
>> +static int __fq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>> +			struct sk_buff **to_free)
>>  {
>>  	struct fq_sched_data *q = qdisc_priv(sch);
>>  	struct fq_flow *f;
>> @@ -496,6 +496,35 @@ static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>>  	return NET_XMIT_SUCCESS;
>>  }
>>  
>> +static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>> +		      struct sk_buff **to_free)
>> +{
>> +	struct sk_buff *segs, *next;
>> +	int ret;
>> +
>> +	if (likely(!skb_is_gso(skb) || !skb->sk ||
> 
> You also need to check sk_fullsock(skb->sk), otherwise KMSAN might be unhappy.
> 
>> +		   !skb->sk->sk_txtime_multi_release))
>> +		return __fq_enqueue(skb, sch, to_free);
>> +
>> +	segs = skb_gso_segment_txtime(skb);
>> +	if (IS_ERR(segs))
>> +		return qdisc_drop(skb, sch, to_free);
>> +	if (!segs)
>> +		return __fq_enqueue(skb, sch, to_free);
>> +
>> +	consume_skb(skb);
> 
>    This needs to be qdisc_drop(skb, sch, to_free) if queue is full, see below.
> 
>> +
>> +	ret = NET_XMIT_DROP;
>> +	skb_list_walk_safe(segs, segs, next) {
>> +		skb_mark_not_on_list(segs);
>> +		qdisc_skb_cb(segs)->pkt_len = segs->len;
> 
> This seems to under-estimate bytes sent. See qdisc_pkt_len_init() for details.
> 
>> +		if (__fq_enqueue(segs, sch, to_free) == NET_XMIT_SUCCESS)
>> +			ret = NET_XMIT_SUCCESS;
>> +	}
> 
>         if (unlikely(ret == NET_XMIT_DROP))
>             qdisc_drop(skb, sch, to_free);

Maybe not qdisc_drop() (qdisc drop counters are already updated)
but at least kfree_skb() so that drop monitor is not fooled.

>         else
>             consume_skb(skb);
> 
>> +
>> +	return ret;
>> +}
>> +
>>  static void fq_check_throttled(struct fq_sched_data *q, u64 now)
>>  {
>>  	unsigned long sample;
>>
> 
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-06-09 15:10 UTC | newest]

Thread overview: 9+ messages
2020-06-09 14:09 [PATCH RFC net-next 0/6] multi release pacing for UDP GSO Willem de Bruijn
2020-06-09 14:09 ` [PATCH RFC net-next 1/6] net: multiple release time SO_TXTIME Willem de Bruijn
2020-06-09 14:09 ` [PATCH RFC net-next 2/6] net: build gso segs in multi " Willem de Bruijn
2020-06-09 14:09 ` [PATCH RFC net-next 3/6] net_sched: sch_fq: multiple release time support Willem de Bruijn
2020-06-09 15:00   ` Eric Dumazet
2020-06-09 15:10     ` Eric Dumazet
2020-06-09 14:09 ` [PATCH RFC net-next 4/6] selftests/net: so_txtime: support txonly/rxonly modes Willem de Bruijn
2020-06-09 14:09 ` [PATCH RFC net-next 5/6] selftests/net: so_txtime: add gso and multi release pacing Willem de Bruijn
2020-06-09 14:09 ` [PATCH RFC net-next 6/6] selftests/net: upgso bench: add pacing with SO_TXTIME Willem de Bruijn
