netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH rfc 0/3] virtio-net: add tx-hash, rx-tstamp and tx-tstamp
@ 2020-12-28 16:22 Willem de Bruijn
  2020-12-28 16:22 ` [PATCH rfc 1/3] virtio-net: support transmit hash report Willem de Bruijn
                   ` (3 more replies)
  0 siblings, 4 replies; 34+ messages in thread
From: Willem de Bruijn @ 2020-12-28 16:22 UTC (permalink / raw)
  To: virtualization; +Cc: netdev, mst, jasowang, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

RFC for three new features to the virtio network device:

1. pass tx flow hash and state to host, for routing + telemetry
2. pass rx tstamp to guest, for better RTT estimation
3. pass tx tstamp to host, for accurate pacing

All three would introduce an extension to the virtio spec.
I assume this would require opening three ballots against v1.2 at
https://www.oasis-open.org/committees/ballots.php?wg_abbrev=virtio

This RFC is to informally discuss the proposals first.

The patchset is against v5.10. Evaluation additionally requires
changes to qemu and at least one back-end. I implemented preliminary
support in Linux vhost-net. Both patches available through github at

https://github.com/wdebruij/linux/tree/virtio-net-txhash-1
https://github.com/wdebruij/qemu/tree/virtio-net-txhash-1

Willem de Bruijn (3):
  virtio-net: support transmit hash report
  virtio-net: support receive timestamp
  virtio-net: support transmit timestamp

 drivers/net/virtio_net.c        | 52 +++++++++++++++++++++++++++++++--
 include/uapi/linux/virtio_net.h | 23 ++++++++++++++-
 2 files changed, 71 insertions(+), 4 deletions(-)

-- 
2.29.2.729.g45daf8777d-goog


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH rfc 1/3] virtio-net: support transmit hash report
  2020-12-28 16:22 [PATCH rfc 0/3] virtio-net: add tx-hash, rx-tstamp and tx-tstamp Willem de Bruijn
@ 2020-12-28 16:22 ` Willem de Bruijn
  2020-12-28 16:28   ` Michael S. Tsirkin
  2020-12-28 21:36   ` Michael S. Tsirkin
  2020-12-28 16:22 ` [PATCH rfc 2/3] virtio-net: support receive timestamp Willem de Bruijn
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 34+ messages in thread
From: Willem de Bruijn @ 2020-12-28 16:22 UTC (permalink / raw)
  To: virtualization; +Cc: netdev, mst, jasowang, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Virtio-net supports sharing the flow hash from host to guest on rx.
Do the same on transmit, to allow the host to infer connection state
for more robust routing and telemetry.

Linux derives ipv6 flowlabel and ECMP multipath from sk->sk_txhash,
and updates these fields on error with sk_rethink_txhash. This feature
allows the host to make similar decisions.

Besides the raw hash, optionally also convey connection state for
this hash. Specifically, the hash rotates on transmit timeout. To
avoid having to keep a stateful table in the host to detect flow
changes, explicitly notify when a hash changed due to timeout.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 drivers/net/virtio_net.c        | 24 +++++++++++++++++++++---
 include/uapi/linux/virtio_net.h | 10 +++++++++-
 2 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 21b71148c532..b917b7333928 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -201,6 +201,9 @@ struct virtnet_info {
 	/* Host will merge rx buffers for big packets (shake it! shake it!) */
 	bool mergeable_rx_bufs;
 
+	/* Guest will pass tx path info to the host */
+	bool has_tx_hash;
+
 	/* Has control virtqueue */
 	bool has_cvq;
 
@@ -394,9 +397,9 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 
 	hdr_len = vi->hdr_len;
 	if (vi->mergeable_rx_bufs)
-		hdr_padded_len = sizeof(*hdr);
+		hdr_padded_len = max_t(unsigned int, hdr_len, sizeof(*hdr));
 	else
-		hdr_padded_len = sizeof(struct padded_vnet_hdr);
+		hdr_padded_len = ALIGN(hdr_len, 16);
 
 	/* hdr_valid means no XDP, so we can copy the vnet header */
 	if (hdr_valid)
@@ -1534,6 +1537,7 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 	struct virtio_net_hdr_mrg_rxbuf *hdr;
 	const unsigned char *dest = ((struct ethhdr *)skb->data)->h_dest;
 	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct virtio_net_hdr_v1_hash *ht;
 	int num_sg;
 	unsigned hdr_len = vi->hdr_len;
 	bool can_push;
@@ -1558,6 +1562,14 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 	if (vi->mergeable_rx_bufs)
 		hdr->num_buffers = 0;
 
+	ht = (void *)hdr;
+	if (vi->has_tx_hash) {
+		ht->hash_value = cpu_to_virtio32(vi->vdev, skb->hash);
+		ht->hash_report = skb->l4_hash ? VIRTIO_NET_HASH_REPORT_L4 :
+						 VIRTIO_NET_HASH_REPORT_OTHER;
+		ht->hash_state = VIRTIO_NET_HASH_STATE_DEFAULT;
+	}
+
 	sg_init_table(sq->sg, skb_shinfo(skb)->nr_frags + (can_push ? 1 : 2));
 	if (can_push) {
 		__skb_push(skb, hdr_len);
@@ -3054,6 +3066,11 @@ static int virtnet_probe(struct virtio_device *vdev)
 	else
 		vi->hdr_len = sizeof(struct virtio_net_hdr);
 
+	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_TX_HASH)) {
+		vi->has_tx_hash = true;
+		vi->hdr_len = sizeof(struct virtio_net_hdr_v1_hash);
+	}
+
 	if (virtio_has_feature(vdev, VIRTIO_F_ANY_LAYOUT) ||
 	    virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
 		vi->any_header_sg = true;
@@ -3243,7 +3260,8 @@ static struct virtio_device_id id_table[] = {
 	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
 	VIRTIO_NET_F_CTRL_MAC_ADDR, \
 	VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, \
-	VIRTIO_NET_F_SPEED_DUPLEX, VIRTIO_NET_F_STANDBY
+	VIRTIO_NET_F_SPEED_DUPLEX, VIRTIO_NET_F_STANDBY, \
+	VIRTIO_NET_F_TX_HASH
 
 static unsigned int features[] = {
 	VIRTNET_FEATURES,
diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
index 3f55a4215f11..f6881b5b77ee 100644
--- a/include/uapi/linux/virtio_net.h
+++ b/include/uapi/linux/virtio_net.h
@@ -57,6 +57,7 @@
 					 * Steering */
 #define VIRTIO_NET_F_CTRL_MAC_ADDR 23	/* Set MAC address */
 
+#define VIRTIO_NET_F_TX_HASH	  56	/* Guest sends hash report */
 #define VIRTIO_NET_F_HASH_REPORT  57	/* Supports hash report */
 #define VIRTIO_NET_F_RSS	  60	/* Supports RSS RX steering */
 #define VIRTIO_NET_F_RSC_EXT	  61	/* extended coalescing info */
@@ -170,8 +171,15 @@ struct virtio_net_hdr_v1_hash {
 #define VIRTIO_NET_HASH_REPORT_IPv6_EX         7
 #define VIRTIO_NET_HASH_REPORT_TCPv6_EX        8
 #define VIRTIO_NET_HASH_REPORT_UDPv6_EX        9
+#define VIRTIO_NET_HASH_REPORT_L4              10
+#define VIRTIO_NET_HASH_REPORT_OTHER           11
 	__le16 hash_report;
-	__le16 padding;
+	union {
+		__le16 padding;
+#define VIRTIO_NET_HASH_STATE_DEFAULT          0
+#define VIRTIO_NET_HASH_STATE_TIMEOUT_BIT      0x1
+		__le16 hash_state;
+	};
 };
 
 #ifndef VIRTIO_NET_NO_LEGACY
-- 
2.29.2.729.g45daf8777d-goog


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH rfc 2/3] virtio-net: support receive timestamp
  2020-12-28 16:22 [PATCH rfc 0/3] virtio-net: add tx-hash, rx-tstamp and tx-tstamp Willem de Bruijn
  2020-12-28 16:22 ` [PATCH rfc 1/3] virtio-net: support transmit hash report Willem de Bruijn
@ 2020-12-28 16:22 ` Willem de Bruijn
  2020-12-28 17:28   ` Michael S. Tsirkin
                     ` (3 more replies)
  2020-12-28 16:22 ` [PATCH rfc 3/3] virtio-net: support transmit timestamp Willem de Bruijn
  2020-12-28 17:29 ` [PATCH rfc 0/3] virtio-net: add tx-hash, rx-tstamp and tx-tstamp Michael S. Tsirkin
  3 siblings, 4 replies; 34+ messages in thread
From: Willem de Bruijn @ 2020-12-28 16:22 UTC (permalink / raw)
  To: virtualization; +Cc: netdev, mst, jasowang, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Add optional PTP hardware timestamp offload for virtio-net.

Accurate RTT measurement requires timestamps close to the wire.
Introduce virtio feature VIRTIO_NET_F_RX_TSTAMP. If negotiated, the
virtio-net header is expanded with room for a timestamp. A host may
pass receive timestamps for all or some packets. A timestamp is valid
if non-zero.

The timestamp straddles (virtual) hardware domains. Like PTP, use
international atomic time (CLOCK_TAI) as global clock base. It is
guest responsibility to sync with host, e.g., through kvm-clock.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 drivers/net/virtio_net.c        | 20 +++++++++++++++++++-
 include/uapi/linux/virtio_net.h | 12 ++++++++++++
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b917b7333928..57744bb6a141 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -204,6 +204,9 @@ struct virtnet_info {
 	/* Guest will pass tx path info to the host */
 	bool has_tx_hash;
 
+	/* Host will pass CLOCK_TAI receive time to the guest */
+	bool has_rx_tstamp;
+
 	/* Has control virtqueue */
 	bool has_cvq;
 
@@ -292,6 +295,13 @@ static inline struct virtio_net_hdr_mrg_rxbuf *skb_vnet_hdr(struct sk_buff *skb)
 	return (struct virtio_net_hdr_mrg_rxbuf *)skb->cb;
 }
 
+static inline struct virtio_net_hdr_v12 *skb_vnet_hdr_12(struct sk_buff *skb)
+{
+	BUILD_BUG_ON(sizeof(struct virtio_net_hdr_v12) > sizeof(skb->cb));
+
+	return (void *)skb->cb;
+}
+
 /*
  * private is used to chain pages for big packets, put the whole
  * most recent used list in the beginning for reuse
@@ -1082,6 +1092,9 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
 		goto frame_err;
 	}
 
+	if (vi->has_rx_tstamp)
+		skb_hwtstamps(skb)->hwtstamp = ns_to_ktime(skb_vnet_hdr_12(skb)->tstamp);
+
 	skb_record_rx_queue(skb, vq2rxq(rq->vq));
 	skb->protocol = eth_type_trans(skb, dev);
 	pr_debug("Receiving skb proto 0x%04x len %i type %i\n",
@@ -3071,6 +3084,11 @@ static int virtnet_probe(struct virtio_device *vdev)
 		vi->hdr_len = sizeof(struct virtio_net_hdr_v1_hash);
 	}
 
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_RX_TSTAMP)) {
+		vi->has_rx_tstamp = true;
+		vi->hdr_len = sizeof(struct virtio_net_hdr_v12);
+	}
+
 	if (virtio_has_feature(vdev, VIRTIO_F_ANY_LAYOUT) ||
 	    virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
 		vi->any_header_sg = true;
@@ -3261,7 +3279,7 @@ static struct virtio_device_id id_table[] = {
 	VIRTIO_NET_F_CTRL_MAC_ADDR, \
 	VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, \
 	VIRTIO_NET_F_SPEED_DUPLEX, VIRTIO_NET_F_STANDBY, \
-	VIRTIO_NET_F_TX_HASH
+	VIRTIO_NET_F_TX_HASH, VIRTIO_NET_F_RX_TSTAMP
 
 static unsigned int features[] = {
 	VIRTNET_FEATURES,
diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
index f6881b5b77ee..0ffe2eeebd4a 100644
--- a/include/uapi/linux/virtio_net.h
+++ b/include/uapi/linux/virtio_net.h
@@ -57,6 +57,7 @@
 					 * Steering */
 #define VIRTIO_NET_F_CTRL_MAC_ADDR 23	/* Set MAC address */
 
+#define VIRTIO_NET_F_RX_TSTAMP	  55	/* Host sends TAI receive time */
 #define VIRTIO_NET_F_TX_HASH	  56	/* Guest sends hash report */
 #define VIRTIO_NET_F_HASH_REPORT  57	/* Supports hash report */
 #define VIRTIO_NET_F_RSS	  60	/* Supports RSS RX steering */
@@ -182,6 +183,17 @@ struct virtio_net_hdr_v1_hash {
 	};
 };
 
+struct virtio_net_hdr_v12 {
+	struct virtio_net_hdr_v1 hdr;
+	struct {
+		__le32 value;
+		__le16 report;
+		__le16 flow_state;
+	} hash;
+	__virtio32 reserved;
+	__virtio64 tstamp;
+};
+
 #ifndef VIRTIO_NET_NO_LEGACY
 /* This header comes first in the scatter-gather list.
  * For legacy virtio, if VIRTIO_F_ANY_LAYOUT is not negotiated, it must
-- 
2.29.2.729.g45daf8777d-goog


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH rfc 3/3] virtio-net: support transmit timestamp
  2020-12-28 16:22 [PATCH rfc 0/3] virtio-net: add tx-hash, rx-tstamp and tx-tstamp Willem de Bruijn
  2020-12-28 16:22 ` [PATCH rfc 1/3] virtio-net: support transmit hash report Willem de Bruijn
  2020-12-28 16:22 ` [PATCH rfc 2/3] virtio-net: support receive timestamp Willem de Bruijn
@ 2020-12-28 16:22 ` Willem de Bruijn
  2020-12-30 12:38   ` Richard Cochran
  2021-02-02 13:47   ` kernel test robot
  2020-12-28 17:29 ` [PATCH rfc 0/3] virtio-net: add tx-hash, rx-tstamp and tx-tstamp Michael S. Tsirkin
  3 siblings, 2 replies; 34+ messages in thread
From: Willem de Bruijn @ 2020-12-28 16:22 UTC (permalink / raw)
  To: virtualization; +Cc: netdev, mst, jasowang, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Add optional delivery time (SO_TXTIME) offload for virtio-net.

The Linux TCP/IP stack tries to avoid bursty transmission and network
congestion through pacing: computing an skb delivery time based on
congestion information. Userspace protocol implementations can achieve
the same with SO_TXTIME. This may also reduce scheduling jitter and
improve RTT estimation.

Pacing can be implemented in ETF or FQ qdiscs or offloaded to NIC
hardware. Allow guests to offload for the same reasons.

The timestamp straddles (virtual) hardware domains. Like PTP, use
international atomic time (CLOCK_TAI) as global clock base. It is
guest responsibility to sync with host, e.g., through kvm-clock.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 drivers/net/virtio_net.c        | 24 +++++++++++++++++-------
 include/uapi/linux/virtio_net.h |  1 +
 2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 57744bb6a141..d40be688aed0 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -207,6 +207,9 @@ struct virtnet_info {
 	/* Host will pass CLOCK_TAI receive time to the guest */
 	bool has_rx_tstamp;
 
+	/* Guest will pass CLOCK_TAI delivery time to the host */
+	bool has_tx_tstamp;
+
 	/* Has control virtqueue */
 	bool has_cvq;
 
@@ -1550,7 +1553,7 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 	struct virtio_net_hdr_mrg_rxbuf *hdr;
 	const unsigned char *dest = ((struct ethhdr *)skb->data)->h_dest;
 	struct virtnet_info *vi = sq->vq->vdev->priv;
-	struct virtio_net_hdr_v1_hash *ht;
+	struct virtio_net_hdr_v12 *h12;
 	int num_sg;
 	unsigned hdr_len = vi->hdr_len;
 	bool can_push;
@@ -1575,13 +1578,15 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 	if (vi->mergeable_rx_bufs)
 		hdr->num_buffers = 0;
 
-	ht = (void *)hdr;
+	h12 = (void *)hdr;
 	if (vi->has_tx_hash) {
-		ht->hash_value = cpu_to_virtio32(vi->vdev, skb->hash);
-		ht->hash_report = skb->l4_hash ? VIRTIO_NET_HASH_REPORT_L4 :
-						 VIRTIO_NET_HASH_REPORT_OTHER;
-		ht->hash_state = VIRTIO_NET_HASH_STATE_DEFAULT;
+		h12->hash.value = cpu_to_virtio32(vi->vdev, skb->hash);
+		h12->hash.report = skb->l4_hash ? VIRTIO_NET_HASH_REPORT_L4 :
+						  VIRTIO_NET_HASH_REPORT_OTHER;
+		h12->hash.flow_state = VIRTIO_NET_HASH_STATE_DEFAULT;
 	}
+	if (vi->has_tx_tstamp)
+		h12->tstamp = cpu_to_virtio64(vi->vdev, skb->tstamp);
 
 	sg_init_table(sq->sg, skb_shinfo(skb)->nr_frags + (can_push ? 1 : 2));
 	if (can_push) {
@@ -3089,6 +3094,11 @@ static int virtnet_probe(struct virtio_device *vdev)
 		vi->hdr_len = sizeof(struct virtio_net_hdr_v12);
 	}
 
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_TX_TSTAMP)) {
+		vi->has_tx_tstamp = true;
+		vi->hdr_len = sizeof(struct virtio_net_hdr_v12);
+	}
+
 	if (virtio_has_feature(vdev, VIRTIO_F_ANY_LAYOUT) ||
 	    virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
 		vi->any_header_sg = true;
@@ -3279,7 +3289,7 @@ static struct virtio_device_id id_table[] = {
 	VIRTIO_NET_F_CTRL_MAC_ADDR, \
 	VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, \
 	VIRTIO_NET_F_SPEED_DUPLEX, VIRTIO_NET_F_STANDBY, \
-	VIRTIO_NET_F_TX_HASH, VIRTIO_NET_F_RX_TSTAMP
+	VIRTIO_NET_F_TX_HASH, VIRTIO_NET_F_RX_TSTAMP, VIRTIO_NET_F_TX_TSTAMP
 
 static unsigned int features[] = {
 	VIRTNET_FEATURES,
diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
index 0ffe2eeebd4a..da017a47791d 100644
--- a/include/uapi/linux/virtio_net.h
+++ b/include/uapi/linux/virtio_net.h
@@ -57,6 +57,7 @@
 					 * Steering */
 #define VIRTIO_NET_F_CTRL_MAC_ADDR 23	/* Set MAC address */
 
+#define VIRTIO_NET_F_TX_TSTAMP	  54	/* Guest sets TAI delivery time */
 #define VIRTIO_NET_F_RX_TSTAMP	  55	/* Host sends TAI receive time */
 #define VIRTIO_NET_F_TX_HASH	  56	/* Guest sends hash report */
 #define VIRTIO_NET_F_HASH_REPORT  57	/* Supports hash report */
-- 
2.29.2.729.g45daf8777d-goog


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH rfc 1/3] virtio-net: support transmit hash report
  2020-12-28 16:22 ` [PATCH rfc 1/3] virtio-net: support transmit hash report Willem de Bruijn
@ 2020-12-28 16:28   ` Michael S. Tsirkin
  2020-12-28 16:47     ` Willem de Bruijn
  2020-12-28 21:36   ` Michael S. Tsirkin
  1 sibling, 1 reply; 34+ messages in thread
From: Michael S. Tsirkin @ 2020-12-28 16:28 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: virtualization, netdev, jasowang, Willem de Bruijn

On Mon, Dec 28, 2020 at 11:22:31AM -0500, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
> 
> Virtio-net supports sharing the flow hash from host to guest on rx.
> Do the same on transmit, to allow the host to infer connection state
> for more robust routing and telemetry.
> 
> Linux derives ipv6 flowlabel and ECMP multipath from sk->sk_txhash,
> and updates these fields on error with sk_rethink_txhash. This feature
> allows the host to make similar decisions.
> 
> Besides the raw hash, optionally also convey connection state for
> this hash. Specifically, the hash rotates on transmit timeout. To
> avoid having to keep a stateful table in the host to detect flow
> changes, explicitly notify when a hash changed due to timeout.

I don't actually see code using VIRTIO_NET_HASH_STATE_TIMEOUT_BIT
in this series. Want to split out that part to a separate patch?


> Signed-off-by: Willem de Bruijn <willemb@google.com>
> ---
>  drivers/net/virtio_net.c        | 24 +++++++++++++++++++++---
>  include/uapi/linux/virtio_net.h | 10 +++++++++-
>  2 files changed, 30 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 21b71148c532..b917b7333928 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -201,6 +201,9 @@ struct virtnet_info {
>  	/* Host will merge rx buffers for big packets (shake it! shake it!) */
>  	bool mergeable_rx_bufs;
>  
> +	/* Guest will pass tx path info to the host */
> +	bool has_tx_hash;
> +
>  	/* Has control virtqueue */
>  	bool has_cvq;
>  
> @@ -394,9 +397,9 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>  
>  	hdr_len = vi->hdr_len;
>  	if (vi->mergeable_rx_bufs)
> -		hdr_padded_len = sizeof(*hdr);
> +		hdr_padded_len = max_t(unsigned int, hdr_len, sizeof(*hdr));
>  	else
> -		hdr_padded_len = sizeof(struct padded_vnet_hdr);
> +		hdr_padded_len = ALIGN(hdr_len, 16);
>  
>  	/* hdr_valid means no XDP, so we can copy the vnet header */
>  	if (hdr_valid)
> @@ -1534,6 +1537,7 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
>  	struct virtio_net_hdr_mrg_rxbuf *hdr;
>  	const unsigned char *dest = ((struct ethhdr *)skb->data)->h_dest;
>  	struct virtnet_info *vi = sq->vq->vdev->priv;
> +	struct virtio_net_hdr_v1_hash *ht;
>  	int num_sg;
>  	unsigned hdr_len = vi->hdr_len;
>  	bool can_push;
> @@ -1558,6 +1562,14 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
>  	if (vi->mergeable_rx_bufs)
>  		hdr->num_buffers = 0;
>  
> +	ht = (void *)hdr;
> +	if (vi->has_tx_hash) {
> +		ht->hash_value = cpu_to_virtio32(vi->vdev, skb->hash);
> +		ht->hash_report = skb->l4_hash ? VIRTIO_NET_HASH_REPORT_L4 :
> +						 VIRTIO_NET_HASH_REPORT_OTHER;
> +		ht->hash_state = VIRTIO_NET_HASH_STATE_DEFAULT;
> +	}
> +
>  	sg_init_table(sq->sg, skb_shinfo(skb)->nr_frags + (can_push ? 1 : 2));
>  	if (can_push) {
>  		__skb_push(skb, hdr_len);
> @@ -3054,6 +3066,11 @@ static int virtnet_probe(struct virtio_device *vdev)
>  	else
>  		vi->hdr_len = sizeof(struct virtio_net_hdr);
>  
> +	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_TX_HASH)) {
> +		vi->has_tx_hash = true;
> +		vi->hdr_len = sizeof(struct virtio_net_hdr_v1_hash);
> +	}
> +
>  	if (virtio_has_feature(vdev, VIRTIO_F_ANY_LAYOUT) ||
>  	    virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
>  		vi->any_header_sg = true;
> @@ -3243,7 +3260,8 @@ static struct virtio_device_id id_table[] = {
>  	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
>  	VIRTIO_NET_F_CTRL_MAC_ADDR, \
>  	VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, \
> -	VIRTIO_NET_F_SPEED_DUPLEX, VIRTIO_NET_F_STANDBY
> +	VIRTIO_NET_F_SPEED_DUPLEX, VIRTIO_NET_F_STANDBY, \
> +	VIRTIO_NET_F_TX_HASH
>  
>  static unsigned int features[] = {
>  	VIRTNET_FEATURES,
> diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> index 3f55a4215f11..f6881b5b77ee 100644
> --- a/include/uapi/linux/virtio_net.h
> +++ b/include/uapi/linux/virtio_net.h
> @@ -57,6 +57,7 @@
>  					 * Steering */
>  #define VIRTIO_NET_F_CTRL_MAC_ADDR 23	/* Set MAC address */
>  
> +#define VIRTIO_NET_F_TX_HASH	  56	/* Guest sends hash report */


Guest -> Driver

>  #define VIRTIO_NET_F_HASH_REPORT  57	/* Supports hash report */
>  #define VIRTIO_NET_F_RSS	  60	/* Supports RSS RX steering */
>  #define VIRTIO_NET_F_RSC_EXT	  61	/* extended coalescing info */
> @@ -170,8 +171,15 @@ struct virtio_net_hdr_v1_hash {
>  #define VIRTIO_NET_HASH_REPORT_IPv6_EX         7
>  #define VIRTIO_NET_HASH_REPORT_TCPv6_EX        8
>  #define VIRTIO_NET_HASH_REPORT_UDPv6_EX        9
> +#define VIRTIO_NET_HASH_REPORT_L4              10
> +#define VIRTIO_NET_HASH_REPORT_OTHER           11
>  	__le16 hash_report;
> -	__le16 padding;
> +	union {
> +		__le16 padding;
> +#define VIRTIO_NET_HASH_STATE_DEFAULT          0
> +#define VIRTIO_NET_HASH_STATE_TIMEOUT_BIT      0x1
> +		__le16 hash_state;
> +	};
>  };
>  
>  #ifndef VIRTIO_NET_NO_LEGACY
> -- 
> 2.29.2.729.g45daf8777d-goog


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rfc 1/3] virtio-net: support transmit hash report
  2020-12-28 16:28   ` Michael S. Tsirkin
@ 2020-12-28 16:47     ` Willem de Bruijn
  2020-12-28 17:22       ` Michael S. Tsirkin
  0 siblings, 1 reply; 34+ messages in thread
From: Willem de Bruijn @ 2020-12-28 16:47 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Willem de Bruijn, virtualization, Network Development, Jason Wang

On Mon, Dec 28, 2020 at 11:28 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Dec 28, 2020 at 11:22:31AM -0500, Willem de Bruijn wrote:
> > From: Willem de Bruijn <willemb@google.com>
> >
> > Virtio-net supports sharing the flow hash from host to guest on rx.
> > Do the same on transmit, to allow the host to infer connection state
> > for more robust routing and telemetry.
> >
> > Linux derives ipv6 flowlabel and ECMP multipath from sk->sk_txhash,
> > and updates these fields on error with sk_rethink_txhash. This feature
> > allows the host to make similar decisions.
> >
> > Besides the raw hash, optionally also convey connection state for
> > this hash. Specifically, the hash rotates on transmit timeout. To
> > avoid having to keep a stateful table in the host to detect flow
> > changes, explicitly notify when a hash changed due to timeout.
>
> I don't actually see code using VIRTIO_NET_HASH_STATE_TIMEOUT_BIT
> in this series. Want to split out that part to a separate patch?

Will do.

I wanted to make it clear that these bits must be reserved (i.e.,
zero) until a later patch specifies them.

The timeout notification feature requires additional plumbing between
the TCP protocol stack and device driver, probably an skb bit. I'd
like to leave that as follow-up for now.

Thanks for the fast feedback!

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rfc 1/3] virtio-net: support transmit hash report
  2020-12-28 16:47     ` Willem de Bruijn
@ 2020-12-28 17:22       ` Michael S. Tsirkin
  2020-12-29  1:19         ` Willem de Bruijn
  0 siblings, 1 reply; 34+ messages in thread
From: Michael S. Tsirkin @ 2020-12-28 17:22 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: virtualization, Network Development, Jason Wang

On Mon, Dec 28, 2020 at 11:47:45AM -0500, Willem de Bruijn wrote:
> On Mon, Dec 28, 2020 at 11:28 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Dec 28, 2020 at 11:22:31AM -0500, Willem de Bruijn wrote:
> > > From: Willem de Bruijn <willemb@google.com>
> > >
> > > Virtio-net supports sharing the flow hash from host to guest on rx.
> > > Do the same on transmit, to allow the host to infer connection state
> > > for more robust routing and telemetry.
> > >
> > > Linux derives ipv6 flowlabel and ECMP multipath from sk->sk_txhash,
> > > and updates these fields on error with sk_rethink_txhash. This feature
> > > allows the host to make similar decisions.
> > >
> > > Besides the raw hash, optionally also convey connection state for
> > > this hash. Specifically, the hash rotates on transmit timeout. To
> > > avoid having to keep a stateful table in the host to detect flow
> > > changes, explicitly notify when a hash changed due to timeout.
> >
> > I don't actually see code using VIRTIO_NET_HASH_STATE_TIMEOUT_BIT
> > in this series. Want to split out that part to a separate patch?
> 
> Will do.
> 
> I wanted to make it clear that these bits must be reserved (i.e.,
> zero) until a later patch specifies them.

Already the case for the padding field I think ...

> The timeout notification feature requires additional plumbing between
> the TCP protocol stack and device driver, probably an skb bit. I'd
> like to leave that as follow-up for now.
> 
> Thanks for the fast feedback!


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rfc 2/3] virtio-net: support receive timestamp
  2020-12-28 16:22 ` [PATCH rfc 2/3] virtio-net: support receive timestamp Willem de Bruijn
@ 2020-12-28 17:28   ` Michael S. Tsirkin
  2020-12-28 19:30     ` Willem de Bruijn
  2020-12-28 22:59   ` Jakub Kicinski
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 34+ messages in thread
From: Michael S. Tsirkin @ 2020-12-28 17:28 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: virtualization, netdev, jasowang, Willem de Bruijn

On Mon, Dec 28, 2020 at 11:22:32AM -0500, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
> 
> Add optional PTP hardware timestamp offload for virtio-net.
> 
> Accurate RTT measurement requires timestamps close to the wire.
> Introduce virtio feature VIRTIO_NET_F_RX_TSTAMP. If negotiated, the
> virtio-net header is expanded with room for a timestamp. A host may
> pass receive timestamps for all or some packets. A timestamp is valid
> if non-zero.
> 
> The timestamp straddles (virtual) hardware domains. Like PTP, use
> international atomic time (CLOCK_TAI) as global clock base. It is
> guest responsibility to sync with host, e.g., through kvm-clock.
> 
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> ---
>  drivers/net/virtio_net.c        | 20 +++++++++++++++++++-
>  include/uapi/linux/virtio_net.h | 12 ++++++++++++
>  2 files changed, 31 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index b917b7333928..57744bb6a141 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -204,6 +204,9 @@ struct virtnet_info {
>  	/* Guest will pass tx path info to the host */
>  	bool has_tx_hash;
>  
> +	/* Host will pass CLOCK_TAI receive time to the guest */
> +	bool has_rx_tstamp;
> +
>  	/* Has control virtqueue */
>  	bool has_cvq;
>  
> @@ -292,6 +295,13 @@ static inline struct virtio_net_hdr_mrg_rxbuf *skb_vnet_hdr(struct sk_buff *skb)
>  	return (struct virtio_net_hdr_mrg_rxbuf *)skb->cb;
>  }
>  
> +static inline struct virtio_net_hdr_v12 *skb_vnet_hdr_12(struct sk_buff *skb)
> +{
> +	BUILD_BUG_ON(sizeof(struct virtio_net_hdr_v12) > sizeof(skb->cb));
> +
> +	return (void *)skb->cb;
> +}
> +
>  /*
>   * private is used to chain pages for big packets, put the whole
>   * most recent used list in the beginning for reuse
> @@ -1082,6 +1092,9 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
>  		goto frame_err;
>  	}
>  
> +	if (vi->has_rx_tstamp)
> +		skb_hwtstamps(skb)->hwtstamp = ns_to_ktime(skb_vnet_hdr_12(skb)->tstamp);
> +
>  	skb_record_rx_queue(skb, vq2rxq(rq->vq));
>  	skb->protocol = eth_type_trans(skb, dev);
>  	pr_debug("Receiving skb proto 0x%04x len %i type %i\n",
> @@ -3071,6 +3084,11 @@ static int virtnet_probe(struct virtio_device *vdev)
>  		vi->hdr_len = sizeof(struct virtio_net_hdr_v1_hash);
>  	}
>  
> +	if (virtio_has_feature(vdev, VIRTIO_NET_F_RX_TSTAMP)) {
> +		vi->has_rx_tstamp = true;
> +		vi->hdr_len = sizeof(struct virtio_net_hdr_v12);
> +	}
> +
>  	if (virtio_has_feature(vdev, VIRTIO_F_ANY_LAYOUT) ||
>  	    virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
>  		vi->any_header_sg = true;
> @@ -3261,7 +3279,7 @@ static struct virtio_device_id id_table[] = {
>  	VIRTIO_NET_F_CTRL_MAC_ADDR, \
>  	VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, \
>  	VIRTIO_NET_F_SPEED_DUPLEX, VIRTIO_NET_F_STANDBY, \
> -	VIRTIO_NET_F_TX_HASH
> +	VIRTIO_NET_F_TX_HASH, VIRTIO_NET_F_RX_TSTAMP
>  
>  static unsigned int features[] = {
>  	VIRTNET_FEATURES,
> diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> index f6881b5b77ee..0ffe2eeebd4a 100644
> --- a/include/uapi/linux/virtio_net.h
> +++ b/include/uapi/linux/virtio_net.h
> @@ -57,6 +57,7 @@
>  					 * Steering */
>  #define VIRTIO_NET_F_CTRL_MAC_ADDR 23	/* Set MAC address */
>  
> +#define VIRTIO_NET_F_RX_TSTAMP	  55	/* Host sends TAI receive time */
>  #define VIRTIO_NET_F_TX_HASH	  56	/* Guest sends hash report */
>  #define VIRTIO_NET_F_HASH_REPORT  57	/* Supports hash report */
>  #define VIRTIO_NET_F_RSS	  60	/* Supports RSS RX steering */
> @@ -182,6 +183,17 @@ struct virtio_net_hdr_v1_hash {
>  	};
>  };
>  
> +struct virtio_net_hdr_v12 {
> +	struct virtio_net_hdr_v1 hdr;
> +	struct {
> +		__le32 value;
> +		__le16 report;
> +		__le16 flow_state;
> +	} hash;
> +	__virtio32 reserved;
> +	__virtio64 tstamp;
> +};
> +
>  #ifndef VIRTIO_NET_NO_LEGACY
>  /* This header comes first in the scatter-gather list.
>   * For legacy virtio, if VIRTIO_F_ANY_LAYOUT is not negotiated, it must


So it looks like VIRTIO_NET_F_RX_TSTAMP should depend on both
VIRTIO_NET_F_RX_TSTAMP and VIRTIO_NET_F_HASH_REPORT then?

I am not sure what does v12 mean here.

virtio_net_hdr_v1 is just with VIRTIO_F_VERSION_1,
virtio_net_hdr_v1_hash is with VIRTIO_F_VERSION_1 and
VIRTIO_NET_F_HASH_REPORT.

So this one is virtio_net_hdr_hash_tstamp I guess?


> -- 
> 2.29.2.729.g45daf8777d-goog


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rfc 0/3] virtio-net: add tx-hash, rx-tstamp and tx-tstamp
  2020-12-28 16:22 [PATCH rfc 0/3] virtio-net: add tx-hash, rx-tstamp and tx-tstamp Willem de Bruijn
                   ` (2 preceding siblings ...)
  2020-12-28 16:22 ` [PATCH rfc 3/3] virtio-net: support transmit timestamp Willem de Bruijn
@ 2020-12-28 17:29 ` Michael S. Tsirkin
  2020-12-28 19:51   ` Willem de Bruijn
  3 siblings, 1 reply; 34+ messages in thread
From: Michael S. Tsirkin @ 2020-12-28 17:29 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: virtualization, netdev, jasowang, Willem de Bruijn

On Mon, Dec 28, 2020 at 11:22:30AM -0500, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
> 
> RFC for three new features to the virtio network device:
> 
> 1. pass tx flow hash and state to host, for routing + telemetry
> 2. pass rx tstamp to guest, for better RTT estimation
> 3. pass tx tstamp to host, for accurate pacing
> 
> All three would introduce an extension to the virtio spec.
> I assume this would require opening three ballots against v1.2 at
> https://www.oasis-open.org/committees/ballots.php?wg_abbrev=virtio
> 
> This RFC is to informally discuss the proposals first.
> 
> The patchset is against v5.10. Evaluation additionally requires
> changes to qemu and at least one back-end. I implemented preliminary
> support in Linux vhost-net. Both patches available through github at
> 
> https://github.com/wdebruij/linux/tree/virtio-net-txhash-1
> https://github.com/wdebruij/qemu/tree/virtio-net-txhash-1

Any data on what the benefits are?

> Willem de Bruijn (3):
>   virtio-net: support transmit hash report
>   virtio-net: support receive timestamp
>   virtio-net: support transmit timestamp
> 
>  drivers/net/virtio_net.c        | 52 +++++++++++++++++++++++++++++++--
>  include/uapi/linux/virtio_net.h | 23 ++++++++++++++-
>  2 files changed, 71 insertions(+), 4 deletions(-)
> 
> -- 
> 2.29.2.729.g45daf8777d-goog


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rfc 2/3] virtio-net: support receive timestamp
  2020-12-28 17:28   ` Michael S. Tsirkin
@ 2020-12-28 19:30     ` Willem de Bruijn
  2020-12-28 21:32       ` Michael S. Tsirkin
  0 siblings, 1 reply; 34+ messages in thread
From: Willem de Bruijn @ 2020-12-28 19:30 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Willem de Bruijn, virtualization, Network Development, Jason Wang

On Mon, Dec 28, 2020 at 12:29 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Dec 28, 2020 at 11:22:32AM -0500, Willem de Bruijn wrote:
> > From: Willem de Bruijn <willemb@google.com>
> >
> > Add optional PTP hardware timestamp offload for virtio-net.
> >
> > Accurate RTT measurement requires timestamps close to the wire.
> > Introduce virtio feature VIRTIO_NET_F_RX_TSTAMP. If negotiated, the
> > virtio-net header is expanded with room for a timestamp. A host may
> > pass receive timestamps for all or some packets. A timestamp is valid
> > if non-zero.
> >
> > The timestamp straddles (virtual) hardware domains. Like PTP, use
> > international atomic time (CLOCK_TAI) as global clock base. It is
> > guest responsibility to sync with host, e.g., through kvm-clock.
> >
> > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> > index f6881b5b77ee..0ffe2eeebd4a 100644
> > --- a/include/uapi/linux/virtio_net.h
> > +++ b/include/uapi/linux/virtio_net.h
> > @@ -57,6 +57,7 @@
> >                                        * Steering */
> >  #define VIRTIO_NET_F_CTRL_MAC_ADDR 23        /* Set MAC address */
> >
> > +#define VIRTIO_NET_F_RX_TSTAMP         55    /* Host sends TAI receive time */
> >  #define VIRTIO_NET_F_TX_HASH   56    /* Guest sends hash report */
> >  #define VIRTIO_NET_F_HASH_REPORT  57 /* Supports hash report */
> >  #define VIRTIO_NET_F_RSS       60    /* Supports RSS RX steering */
> > @@ -182,6 +183,17 @@ struct virtio_net_hdr_v1_hash {
> >       };
> >  };
> >
> > +struct virtio_net_hdr_v12 {
> > +     struct virtio_net_hdr_v1 hdr;
> > +     struct {
> > +             __le32 value;
> > +             __le16 report;
> > +             __le16 flow_state;
> > +     } hash;
> > +     __virtio32 reserved;
> > +     __virtio64 tstamp;
> > +};
> > +
> >  #ifndef VIRTIO_NET_NO_LEGACY
> >  /* This header comes first in the scatter-gather list.
> >   * For legacy virtio, if VIRTIO_F_ANY_LAYOUT is not negotiated, it must
>
>
> So it looks like VIRTIO_NET_F_RX_TSTAMP should depend on both
> VIRTIO_NET_F_RX_TSTAMP and VIRTIO_NET_F_HASH_REPORT then?

Do you mean VIRTIO_NET_F_TX_TSTAMP depends on VIRTIO_NET_F_RX_TSTAMP?

I think if either is enabled we need to enable the extended layout.
Regardless of whether TX_HASH or HASH_REPORT are enabled. If they are
not, then those fields are ignored.

> I am not sure what does v12 mean here.
>
> virtio_net_hdr_v1 is just with VIRTIO_F_VERSION_1,
> virtio_net_hdr_v1_hash is with VIRTIO_F_VERSION_1 and
> VIRTIO_NET_F_HASH_REPORT.
>
> So this one is virtio_net_hdr_hash_tstamp I guess?

Sounds better, yes, will change that.

This was an attempt at identifying the layout with the likely next
generation of the spec, 1.2. Similar to virtio_net_hdr_v1. But that is
both premature and not very helpful.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rfc 0/3] virtio-net: add tx-hash, rx-tstamp and tx-tstamp
  2020-12-28 17:29 ` [PATCH rfc 0/3] virtio-net: add tx-hash, rx-tstamp and tx-tstamp Michael S. Tsirkin
@ 2020-12-28 19:51   ` Willem de Bruijn
  2020-12-28 21:38     ` Michael S. Tsirkin
  0 siblings, 1 reply; 34+ messages in thread
From: Willem de Bruijn @ 2020-12-28 19:51 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Willem de Bruijn, virtualization, Network Development, Jason Wang

On Mon, Dec 28, 2020 at 12:29 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Dec 28, 2020 at 11:22:30AM -0500, Willem de Bruijn wrote:
> > From: Willem de Bruijn <willemb@google.com>
> >
> > RFC for three new features to the virtio network device:
> >
> > 1. pass tx flow hash and state to host, for routing + telemetry
> > 2. pass rx tstamp to guest, for better RTT estimation
> > 3. pass tx tstamp to host, for accurate pacing
> >
> > All three would introduce an extension to the virtio spec.
> > I assume this would require opening three ballots against v1.2 at
> > https://www.oasis-open.org/committees/ballots.php?wg_abbrev=virtio
> >
> > This RFC is to informally discuss the proposals first.
> >
> > The patchset is against v5.10. Evaluation additionally requires
> > changes to qemu and at least one back-end. I implemented preliminary
> > support in Linux vhost-net. Both patches available through github at
> >
> > https://github.com/wdebruij/linux/tree/virtio-net-txhash-1
> > https://github.com/wdebruij/qemu/tree/virtio-net-txhash-1
>
> Any data on what the benefits are?

For the general method, yes. For this specific implementation, not  yet.

Swift congestion control is delay based. It won the best paper award
at SIGCOMM this year. That paper has a lot of data:
https://dl.acm.org/doi/pdf/10.1145/3387514.3406591 . Section 3.1 talks
about the different components that contribute to delay and how to
isolate them.

BBR and BBRv2 also have an explicit ProbeRTT phase as part of the design.

The specific additional benefits for VM-based TCP depends on many
conditions, e.g., whether a vCPU is exclusively owned and pinned. But
the same reasoning should be even more applicable to this even longer
stack, especially in the worst case conditions.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rfc 2/3] virtio-net: support receive timestamp
  2020-12-28 19:30     ` Willem de Bruijn
@ 2020-12-28 21:32       ` Michael S. Tsirkin
  2020-12-29  1:05         ` Willem de Bruijn
  0 siblings, 1 reply; 34+ messages in thread
From: Michael S. Tsirkin @ 2020-12-28 21:32 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: virtualization, Network Development, Jason Wang

On Mon, Dec 28, 2020 at 02:30:31PM -0500, Willem de Bruijn wrote:
> On Mon, Dec 28, 2020 at 12:29 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Dec 28, 2020 at 11:22:32AM -0500, Willem de Bruijn wrote:
> > > From: Willem de Bruijn <willemb@google.com>
> > >
> > > Add optional PTP hardware timestamp offload for virtio-net.
> > >
> > > Accurate RTT measurement requires timestamps close to the wire.
> > > Introduce virtio feature VIRTIO_NET_F_RX_TSTAMP. If negotiated, the
> > > virtio-net header is expanded with room for a timestamp. A host may
> > > pass receive timestamps for all or some packets. A timestamp is valid
> > > if non-zero.
> > >
> > > The timestamp straddles (virtual) hardware domains. Like PTP, use
> > > international atomic time (CLOCK_TAI) as global clock base. It is
> > > guest responsibility to sync with host, e.g., through kvm-clock.
> > >
> > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > > diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> > > index f6881b5b77ee..0ffe2eeebd4a 100644
> > > --- a/include/uapi/linux/virtio_net.h
> > > +++ b/include/uapi/linux/virtio_net.h
> > > @@ -57,6 +57,7 @@
> > >                                        * Steering */
> > >  #define VIRTIO_NET_F_CTRL_MAC_ADDR 23        /* Set MAC address */
> > >
> > > +#define VIRTIO_NET_F_RX_TSTAMP         55    /* Host sends TAI receive time */
> > >  #define VIRTIO_NET_F_TX_HASH   56    /* Guest sends hash report */
> > >  #define VIRTIO_NET_F_HASH_REPORT  57 /* Supports hash report */
> > >  #define VIRTIO_NET_F_RSS       60    /* Supports RSS RX steering */
> > > @@ -182,6 +183,17 @@ struct virtio_net_hdr_v1_hash {
> > >       };
> > >  };
> > >
> > > +struct virtio_net_hdr_v12 {
> > > +     struct virtio_net_hdr_v1 hdr;
> > > +     struct {
> > > +             __le32 value;
> > > +             __le16 report;
> > > +             __le16 flow_state;
> > > +     } hash;
> > > +     __virtio32 reserved;
> > > +     __virtio64 tstamp;
> > > +};
> > > +
> > >  #ifndef VIRTIO_NET_NO_LEGACY
> > >  /* This header comes first in the scatter-gather list.
> > >   * For legacy virtio, if VIRTIO_F_ANY_LAYOUT is not negotiated, it must
> >
> >
> > So it looks like VIRTIO_NET_F_RX_TSTAMP should depend on both
> > VIRTIO_NET_F_RX_TSTAMP and VIRTIO_NET_F_HASH_REPORT then?
> 
> Do you mean VIRTIO_NET_F_TX_TSTAMP depends on VIRTIO_NET_F_RX_TSTAMP?
> 
> I think if either is enabled we need to enable the extended layout.
> Regardless of whether TX_HASH or HASH_REPORT are enabled. If they are
> not, then those fields are ignored.

I mean do we waste the 8 bytes for hash if TSTAMP is set but HASH is not?
If TSTAMP depends on HASH then point is moot.



> > I am not sure what does v12 mean here.
> >
> > virtio_net_hdr_v1 is just with VIRTIO_F_VERSION_1,
> > virtio_net_hdr_v1_hash is with VIRTIO_F_VERSION_1 and
> > VIRTIO_NET_F_HASH_REPORT.
> >
> > So this one is virtio_net_hdr_hash_tstamp I guess?
> 
> Sounds better, yes, will change that.
> 
> This was an attempt at identifying the layout with the likely next
> generation of the spec, 1.2. Similar to virtio_net_hdr_v1. But that is
> both premature and not very helpful.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rfc 1/3] virtio-net: support transmit hash report
  2020-12-28 16:22 ` [PATCH rfc 1/3] virtio-net: support transmit hash report Willem de Bruijn
  2020-12-28 16:28   ` Michael S. Tsirkin
@ 2020-12-28 21:36   ` Michael S. Tsirkin
  2020-12-29  1:23     ` Willem de Bruijn
  1 sibling, 1 reply; 34+ messages in thread
From: Michael S. Tsirkin @ 2020-12-28 21:36 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: virtualization, netdev, jasowang, Willem de Bruijn

On Mon, Dec 28, 2020 at 11:22:31AM -0500, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
> 
> Virtio-net supports sharing the flow hash from host to guest on rx.
> Do the same on transmit, to allow the host to infer connection state
> for more robust routing and telemetry.
> 
> Linux derives ipv6 flowlabel and ECMP multipath from sk->sk_txhash,
> and updates these fields on error with sk_rethink_txhash. This feature
> allows the host to make similar decisions.
> 
> Besides the raw hash, optionally also convey connection state for
> this hash. Specifically, the hash rotates on transmit timeout. To
> avoid having to keep a stateful table in the host to detect flow
> changes, explicitly notify when a hash changed due to timeout.
> 
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> ---
>  drivers/net/virtio_net.c        | 24 +++++++++++++++++++++---
>  include/uapi/linux/virtio_net.h | 10 +++++++++-
>  2 files changed, 30 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 21b71148c532..b917b7333928 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -201,6 +201,9 @@ struct virtnet_info {
>  	/* Host will merge rx buffers for big packets (shake it! shake it!) */
>  	bool mergeable_rx_bufs;
>  
> +	/* Guest will pass tx path info to the host */
> +	bool has_tx_hash;
> +
>  	/* Has control virtqueue */
>  	bool has_cvq;
>  
> @@ -394,9 +397,9 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>  
>  	hdr_len = vi->hdr_len;
>  	if (vi->mergeable_rx_bufs)
> -		hdr_padded_len = sizeof(*hdr);
> +		hdr_padded_len = max_t(unsigned int, hdr_len, sizeof(*hdr));
>  	else
> -		hdr_padded_len = sizeof(struct padded_vnet_hdr);
> +		hdr_padded_len = ALIGN(hdr_len, 16);
>  
>  	/* hdr_valid means no XDP, so we can copy the vnet header */
>  	if (hdr_valid)
> @@ -1534,6 +1537,7 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
>  	struct virtio_net_hdr_mrg_rxbuf *hdr;
>  	const unsigned char *dest = ((struct ethhdr *)skb->data)->h_dest;
>  	struct virtnet_info *vi = sq->vq->vdev->priv;
> +	struct virtio_net_hdr_v1_hash *ht;
>  	int num_sg;
>  	unsigned hdr_len = vi->hdr_len;
>  	bool can_push;
> @@ -1558,6 +1562,14 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
>  	if (vi->mergeable_rx_bufs)
>  		hdr->num_buffers = 0;
>  
> +	ht = (void *)hdr;
> +	if (vi->has_tx_hash) {
> +		ht->hash_value = cpu_to_virtio32(vi->vdev, skb->hash);
> +		ht->hash_report = skb->l4_hash ? VIRTIO_NET_HASH_REPORT_L4 :
> +						 VIRTIO_NET_HASH_REPORT_OTHER;
> +		ht->hash_state = VIRTIO_NET_HASH_STATE_DEFAULT;
> +	}
> +
>  	sg_init_table(sq->sg, skb_shinfo(skb)->nr_frags + (can_push ? 1 : 2));
>  	if (can_push) {
>  		__skb_push(skb, hdr_len);
> @@ -3054,6 +3066,11 @@ static int virtnet_probe(struct virtio_device *vdev)
>  	else
>  		vi->hdr_len = sizeof(struct virtio_net_hdr);
>  
> +	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_TX_HASH)) {
> +		vi->has_tx_hash = true;
> +		vi->hdr_len = sizeof(struct virtio_net_hdr_v1_hash);
> +	}
> +
>  	if (virtio_has_feature(vdev, VIRTIO_F_ANY_LAYOUT) ||
>  	    virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
>  		vi->any_header_sg = true;
> @@ -3243,7 +3260,8 @@ static struct virtio_device_id id_table[] = {
>  	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
>  	VIRTIO_NET_F_CTRL_MAC_ADDR, \
>  	VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, \
> -	VIRTIO_NET_F_SPEED_DUPLEX, VIRTIO_NET_F_STANDBY
> +	VIRTIO_NET_F_SPEED_DUPLEX, VIRTIO_NET_F_STANDBY, \
> +	VIRTIO_NET_F_TX_HASH
>  
>  static unsigned int features[] = {
>  	VIRTNET_FEATURES,
> diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> index 3f55a4215f11..f6881b5b77ee 100644
> --- a/include/uapi/linux/virtio_net.h
> +++ b/include/uapi/linux/virtio_net.h
> @@ -57,6 +57,7 @@
>  					 * Steering */
>  #define VIRTIO_NET_F_CTRL_MAC_ADDR 23	/* Set MAC address */
>  
> +#define VIRTIO_NET_F_TX_HASH	  56	/* Guest sends hash report */
>  #define VIRTIO_NET_F_HASH_REPORT  57	/* Supports hash report */
>  #define VIRTIO_NET_F_RSS	  60	/* Supports RSS RX steering */
>  #define VIRTIO_NET_F_RSC_EXT	  61	/* extended coalescing info */
> @@ -170,8 +171,15 @@ struct virtio_net_hdr_v1_hash {
>  #define VIRTIO_NET_HASH_REPORT_IPv6_EX         7
>  #define VIRTIO_NET_HASH_REPORT_TCPv6_EX        8
>  #define VIRTIO_NET_HASH_REPORT_UDPv6_EX        9
> +#define VIRTIO_NET_HASH_REPORT_L4              10
> +#define VIRTIO_NET_HASH_REPORT_OTHER           11

Need to specify these I guess ...
Can't there be any consistency with RX hash?
Handy for VM2VM ...


>  	__le16 hash_report;
> -	__le16 padding;
> +	union {
> +		__le16 padding;
> +#define VIRTIO_NET_HASH_STATE_DEFAULT          0
> +#define VIRTIO_NET_HASH_STATE_TIMEOUT_BIT      0x1
> +		__le16 hash_state;
> +	};
>  };
>  
>  #ifndef VIRTIO_NET_NO_LEGACY
> -- 
> 2.29.2.729.g45daf8777d-goog


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rfc 0/3] virtio-net: add tx-hash, rx-tstamp and tx-tstamp
  2020-12-28 19:51   ` Willem de Bruijn
@ 2020-12-28 21:38     ` Michael S. Tsirkin
  2020-12-29  1:14       ` Willem de Bruijn
  0 siblings, 1 reply; 34+ messages in thread
From: Michael S. Tsirkin @ 2020-12-28 21:38 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: virtualization, Network Development, Jason Wang

On Mon, Dec 28, 2020 at 02:51:09PM -0500, Willem de Bruijn wrote:
> On Mon, Dec 28, 2020 at 12:29 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Dec 28, 2020 at 11:22:30AM -0500, Willem de Bruijn wrote:
> > > From: Willem de Bruijn <willemb@google.com>
> > >
> > > RFC for three new features to the virtio network device:
> > >
> > > 1. pass tx flow hash and state to host, for routing + telemetry
> > > 2. pass rx tstamp to guest, for better RTT estimation
> > > 3. pass tx tstamp to host, for accurate pacing
> > >
> > > All three would introduce an extension to the virtio spec.
> > > I assume this would require opening three ballots against v1.2 at
> > > https://www.oasis-open.org/committees/ballots.php?wg_abbrev=virtio
> > >
> > > This RFC is to informally discuss the proposals first.
> > >
> > > The patchset is against v5.10. Evaluation additionally requires
> > > changes to qemu and at least one back-end. I implemented preliminary
> > > support in Linux vhost-net. Both patches available through github at
> > >
> > > https://github.com/wdebruij/linux/tree/virtio-net-txhash-1
> > > https://github.com/wdebruij/qemu/tree/virtio-net-txhash-1
> >
> > Any data on what the benefits are?
> 
> For the general method, yes. For this specific implementation, not  yet.
> 
> Swift congestion control is delay based. It won the best paper award
> at SIGCOMM this year. That paper has a lot of data:
> https://dl.acm.org/doi/pdf/10.1145/3387514.3406591 . Section 3.1 talks
> about the different components that contribute to delay and how to
> isolate them.

And for the hashing part?

> BBR and BBRv2 also have an explicit ProbeRTT phase as part of the design.
> 
> The specific additional benefits for VM-based TCP depends on many
> conditions, e.g., whether a vCPU is exclusively owned and pinned. But
> the same reasoning should be even more applicable to this even longer
> stack, especially in the worst case conditions.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rfc 2/3] virtio-net: support receive timestamp
  2020-12-28 16:22 ` [PATCH rfc 2/3] virtio-net: support receive timestamp Willem de Bruijn
  2020-12-28 17:28   ` Michael S. Tsirkin
@ 2020-12-28 22:59   ` Jakub Kicinski
  2020-12-29  0:57     ` Willem de Bruijn
  2021-02-02 13:05   ` kernel test robot
  2021-02-02 14:08   ` Michael S. Tsirkin
  3 siblings, 1 reply; 34+ messages in thread
From: Jakub Kicinski @ 2020-12-28 22:59 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: virtualization, netdev, mst, jasowang, Willem de Bruijn

On Mon, 28 Dec 2020 11:22:32 -0500 Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
> 
> Add optional PTP hardware timestamp offload for virtio-net.
> 
> Accurate RTT measurement requires timestamps close to the wire.
> Introduce virtio feature VIRTIO_NET_F_RX_TSTAMP. If negotiated, the
> virtio-net header is expanded with room for a timestamp. A host may
> pass receive timestamps for all or some packets. A timestamp is valid
> if non-zero.
> 
> The timestamp straddles (virtual) hardware domains. Like PTP, use
> international atomic time (CLOCK_TAI) as global clock base. It is
> guest responsibility to sync with host, e.g., through kvm-clock.

Would this not be confusing to some user space SW to have a NIC with 
no PHC deliver HW stamps?

I'd CC Richard on this, unless you already discussed with him offline.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rfc 2/3] virtio-net: support receive timestamp
  2020-12-28 22:59   ` Jakub Kicinski
@ 2020-12-29  0:57     ` Willem de Bruijn
  2020-12-30  8:44       ` Jason Wang
  2020-12-30 12:30       ` Richard Cochran
  0 siblings, 2 replies; 34+ messages in thread
From: Willem de Bruijn @ 2020-12-29  0:57 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Willem de Bruijn, virtualization, Network Development,
	Michael S. Tsirkin, Jason Wang, Richard Cochran

On Mon, Dec 28, 2020 at 5:59 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 28 Dec 2020 11:22:32 -0500 Willem de Bruijn wrote:
> > From: Willem de Bruijn <willemb@google.com>
> >
> > Add optional PTP hardware timestamp offload for virtio-net.
> >
> > Accurate RTT measurement requires timestamps close to the wire.
> > Introduce virtio feature VIRTIO_NET_F_RX_TSTAMP. If negotiated, the
> > virtio-net header is expanded with room for a timestamp. A host may
> > pass receive timestamps for all or some packets. A timestamp is valid
> > if non-zero.
> >
> > The timestamp straddles (virtual) hardware domains. Like PTP, use
> > international atomic time (CLOCK_TAI) as global clock base. It is
> > guest responsibility to sync with host, e.g., through kvm-clock.
>
> Would this not be confusing to some user space SW to have a NIC with
> no PHC deliver HW stamps?
>
> I'd CC Richard on this, unless you already discussed with him offline.

Thanks, good point. I should have included Richard.

There is a well understood method for synchronizing guest and host
clock in KVM using ptp_kvm. For virtual environments without NIC
hardware offload, the when host timestamps in software, this suffices.

Syncing host with NIC is assumed if the host advertises the feature
and implements using real hardware timestamps.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rfc 2/3] virtio-net: support receive timestamp
  2020-12-28 21:32       ` Michael S. Tsirkin
@ 2020-12-29  1:05         ` Willem de Bruijn
  2020-12-29  9:17           ` Jason Wang
  0 siblings, 1 reply; 34+ messages in thread
From: Willem de Bruijn @ 2020-12-29  1:05 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Willem de Bruijn, virtualization, Network Development, Jason Wang

On Mon, Dec 28, 2020 at 7:55 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Dec 28, 2020 at 02:30:31PM -0500, Willem de Bruijn wrote:
> > On Mon, Dec 28, 2020 at 12:29 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Dec 28, 2020 at 11:22:32AM -0500, Willem de Bruijn wrote:
> > > > From: Willem de Bruijn <willemb@google.com>
> > > >
> > > > Add optional PTP hardware timestamp offload for virtio-net.
> > > >
> > > > Accurate RTT measurement requires timestamps close to the wire.
> > > > Introduce virtio feature VIRTIO_NET_F_RX_TSTAMP. If negotiated, the
> > > > virtio-net header is expanded with room for a timestamp. A host may
> > > > pass receive timestamps for all or some packets. A timestamp is valid
> > > > if non-zero.
> > > >
> > > > The timestamp straddles (virtual) hardware domains. Like PTP, use
> > > > international atomic time (CLOCK_TAI) as global clock base. It is
> > > > guest responsibility to sync with host, e.g., through kvm-clock.
> > > >
> > > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > > > diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> > > > index f6881b5b77ee..0ffe2eeebd4a 100644
> > > > --- a/include/uapi/linux/virtio_net.h
> > > > +++ b/include/uapi/linux/virtio_net.h
> > > > @@ -57,6 +57,7 @@
> > > >                                        * Steering */
> > > >  #define VIRTIO_NET_F_CTRL_MAC_ADDR 23        /* Set MAC address */
> > > >
> > > > +#define VIRTIO_NET_F_RX_TSTAMP         55    /* Host sends TAI receive time */
> > > >  #define VIRTIO_NET_F_TX_HASH   56    /* Guest sends hash report */
> > > >  #define VIRTIO_NET_F_HASH_REPORT  57 /* Supports hash report */
> > > >  #define VIRTIO_NET_F_RSS       60    /* Supports RSS RX steering */
> > > > @@ -182,6 +183,17 @@ struct virtio_net_hdr_v1_hash {
> > > >       };
> > > >  };
> > > >
> > > > +struct virtio_net_hdr_v12 {
> > > > +     struct virtio_net_hdr_v1 hdr;
> > > > +     struct {
> > > > +             __le32 value;
> > > > +             __le16 report;
> > > > +             __le16 flow_state;
> > > > +     } hash;
> > > > +     __virtio32 reserved;
> > > > +     __virtio64 tstamp;
> > > > +};
> > > > +
> > > >  #ifndef VIRTIO_NET_NO_LEGACY
> > > >  /* This header comes first in the scatter-gather list.
> > > >   * For legacy virtio, if VIRTIO_F_ANY_LAYOUT is not negotiated, it must
> > >
> > >
> > > So it looks like VIRTIO_NET_F_RX_TSTAMP should depend on both
> > > VIRTIO_NET_F_RX_TSTAMP and VIRTIO_NET_F_HASH_REPORT then?
> >
> > Do you mean VIRTIO_NET_F_TX_TSTAMP depends on VIRTIO_NET_F_RX_TSTAMP?
> >
> > I think if either is enabled we need to enable the extended layout.
> > Regardless of whether TX_HASH or HASH_REPORT are enabled. If they are
> > not, then those fields are ignored.
>
> I mean do we waste the 8 bytes for hash if TSTAMP is set but HASH is not?
> If TSTAMP depends on HASH then point is moot.

True, but the two features really are independent.

I did consider using configurable metadata layout depending on
features negotiated. If there are tons of optional extensions, that
makes sense. But it is more complex and parsing error prone. With a
handful of options each of a few bytes, that did not seem worth the
cost to me at this point.

And importantly, such a mode can always be added later as a separate
VIRTIO_NET_F_PACKED_HEADER option.

If anything, perhaps if we increase the virtio_net_hdr_* allocation,
we should allocate some additional reserved space now? As each new
size introduces quite a bit of boilerplate. Also, e.g., in qemu just
to pass the sizes between virtio-net driver and vhost-net device.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rfc 0/3] virtio-net: add tx-hash, rx-tstamp and tx-tstamp
  2020-12-28 21:38     ` Michael S. Tsirkin
@ 2020-12-29  1:14       ` Willem de Bruijn
  2021-01-06 20:32         ` Willem de Bruijn
  0 siblings, 1 reply; 34+ messages in thread
From: Willem de Bruijn @ 2020-12-29  1:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Willem de Bruijn, virtualization, Network Development, Jason Wang

On Mon, Dec 28, 2020 at 7:47 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Dec 28, 2020 at 02:51:09PM -0500, Willem de Bruijn wrote:
> > On Mon, Dec 28, 2020 at 12:29 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Dec 28, 2020 at 11:22:30AM -0500, Willem de Bruijn wrote:
> > > > From: Willem de Bruijn <willemb@google.com>
> > > >
> > > > RFC for three new features to the virtio network device:
> > > >
> > > > 1. pass tx flow hash and state to host, for routing + telemetry
> > > > 2. pass rx tstamp to guest, for better RTT estimation
> > > > 3. pass tx tstamp to host, for accurate pacing
> > > >
> > > > All three would introduce an extension to the virtio spec.
> > > > I assume this would require opening three ballots against v1.2 at
> > > > https://www.oasis-open.org/committees/ballots.php?wg_abbrev=virtio
> > > >
> > > > This RFC is to informally discuss the proposals first.
> > > >
> > > > The patchset is against v5.10. Evaluation additionally requires
> > > > changes to qemu and at least one back-end. I implemented preliminary
> > > > support in Linux vhost-net. Both patches available through github at
> > > >
> > > > https://github.com/wdebruij/linux/tree/virtio-net-txhash-1
> > > > https://github.com/wdebruij/qemu/tree/virtio-net-txhash-1
> > >
> > > Any data on what the benefits are?
> >
> > For the general method, yes. For this specific implementation, not  yet.
> >
> > Swift congestion control is delay based. It won the best paper award
> > at SIGCOMM this year. That paper has a lot of data:
> > https://dl.acm.org/doi/pdf/10.1145/3387514.3406591 . Section 3.1 talks
> > about the different components that contribute to delay and how to
> > isolate them.
>
> And for the hashing part?

A few concrete examples of error conditions that can be resolved are
mentioned in the commits that add sk_rethink_txhash calls. Such as
commit 7788174e8726 ("tcp: change IPv6 flow-label upon receiving
spurious retransmission"):

"
    Currently a Linux IPv6 TCP sender will change the flow label upon
    timeouts to potentially steer away from a data path that has gone
    bad. However this does not help if the problem is on the ACK path
    and the data path is healthy. In this case the receiver is likely
    to receive repeated spurious retransmission because the sender
    couldn't get the ACKs in time and has recurring timeouts.

    This patch adds another feature to mitigate this problem. It
    leverages the DSACK states in the receiver to change the flow
    label of the ACKs to speculatively re-route the ACK packets.
    In order to allow triggering on the second consecutive spurious
    RTO, the receiver changes the flow label upon sending a second
    consecutive DSACK for a sequence number below RCV.NXT.
"

I don't have quantitative data on the efficacy at scale at hand. Let
me see what I can find. This will probably take some time, at least
until people are back after the holidays. I didn't want to delay the
patch, as the merge window was a nice time for RFC. But agreed that it
deserves stronger justification.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rfc 1/3] virtio-net: support transmit hash report
  2020-12-28 17:22       ` Michael S. Tsirkin
@ 2020-12-29  1:19         ` Willem de Bruijn
  0 siblings, 0 replies; 34+ messages in thread
From: Willem de Bruijn @ 2020-12-29  1:19 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Willem de Bruijn, virtualization, Network Development, Jason Wang

On Mon, Dec 28, 2020 at 12:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Dec 28, 2020 at 11:47:45AM -0500, Willem de Bruijn wrote:
> > On Mon, Dec 28, 2020 at 11:28 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Dec 28, 2020 at 11:22:31AM -0500, Willem de Bruijn wrote:
> > > > From: Willem de Bruijn <willemb@google.com>
> > > >
> > > > Virtio-net supports sharing the flow hash from host to guest on rx.
> > > > Do the same on transmit, to allow the host to infer connection state
> > > > for more robust routing and telemetry.
> > > >
> > > > Linux derives ipv6 flowlabel and ECMP multipath from sk->sk_txhash,
> > > > and updates these fields on error with sk_rethink_txhash. This feature
> > > > allows the host to make similar decisions.
> > > >
> > > > Besides the raw hash, optionally also convey connection state for
> > > > this hash. Specifically, the hash rotates on transmit timeout. To
> > > > avoid having to keep a stateful table in the host to detect flow
> > > > changes, explicitly notify when a hash changed due to timeout.
> > >
> > > I don't actually see code using VIRTIO_NET_HASH_STATE_TIMEOUT_BIT
> > > in this series. Want to split out that part to a separate patch?
> >
> > Will do.
> >
> > I wanted to make it clear that these bits must be reserved (i.e.,
> > zero) until a later patch specifies them.
>
> Already the case for the padding field I think ...

Good. Is this something that the device should enforce? Meaning, drop
requests with reserved bits set.

Once a bit is defined and device updated to accept it, it cannot
distinguish between a new driver aware of the bit and an old one that
wrote to padding. More problematic, a well behaved new driver will
gets packets dropped by an old device.

The proper way is to negotiate this is through features, of course.
But I don't think we want to add one for each new bit. That's why I
squished the bit definition in here, even in absence of an
implementation.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rfc 1/3] virtio-net: support transmit hash report
  2020-12-28 21:36   ` Michael S. Tsirkin
@ 2020-12-29  1:23     ` Willem de Bruijn
  0 siblings, 0 replies; 34+ messages in thread
From: Willem de Bruijn @ 2020-12-29  1:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Willem de Bruijn, virtualization, Network Development, Jason Wang

On Mon, Dec 28, 2020 at 4:36 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Dec 28, 2020 at 11:22:31AM -0500, Willem de Bruijn wrote:
> > From: Willem de Bruijn <willemb@google.com>
> >
> > Virtio-net supports sharing the flow hash from host to guest on rx.
> > Do the same on transmit, to allow the host to infer connection state
> > for more robust routing and telemetry.
> >
> > Linux derives ipv6 flowlabel and ECMP multipath from sk->sk_txhash,
> > and updates these fields on error with sk_rethink_txhash. This feature
> > allows the host to make similar decisions.
> >
> > Besides the raw hash, optionally also convey connection state for
> > this hash. Specifically, the hash rotates on transmit timeout. To
> > avoid having to keep a stateful table in the host to detect flow
> > changes, explicitly notify when a hash changed due to timeout.
> >
> > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > ---
> > diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> > index 3f55a4215f11..f6881b5b77ee 100644
> > --- a/include/uapi/linux/virtio_net.h
> > +++ b/include/uapi/linux/virtio_net.h
> > @@ -57,6 +57,7 @@
> >                                        * Steering */
> >  #define VIRTIO_NET_F_CTRL_MAC_ADDR 23        /* Set MAC address */
> >
> > +#define VIRTIO_NET_F_TX_HASH   56    /* Guest sends hash report */
> >  #define VIRTIO_NET_F_HASH_REPORT  57 /* Supports hash report */
> >  #define VIRTIO_NET_F_RSS       60    /* Supports RSS RX steering */
> >  #define VIRTIO_NET_F_RSC_EXT   61    /* extended coalescing info */
> > @@ -170,8 +171,15 @@ struct virtio_net_hdr_v1_hash {
> >  #define VIRTIO_NET_HASH_REPORT_IPv6_EX         7
> >  #define VIRTIO_NET_HASH_REPORT_TCPv6_EX        8
> >  #define VIRTIO_NET_HASH_REPORT_UDPv6_EX        9
> > +#define VIRTIO_NET_HASH_REPORT_L4              10
> > +#define VIRTIO_NET_HASH_REPORT_OTHER           11
>
> Need to specify these I guess ...
> Can't there be any consistency with RX hash?
> Handy for VM2VM ...

Agreed. Unfortunately the skb hash does only distinguishes between L4
and not. And for many purposes that is sufficient.

Implementing the existing flags would require flow dissection, at cpu cost.

I did add the flags to the same field, so that the less specific .._L4
and .._OTHER are valid rx-hash values as well.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rfc 2/3] virtio-net: support receive timestamp
  2020-12-29  1:05         ` Willem de Bruijn
@ 2020-12-29  9:17           ` Jason Wang
  2020-12-29 14:20             ` Willem de Bruijn
  0 siblings, 1 reply; 34+ messages in thread
From: Jason Wang @ 2020-12-29  9:17 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Michael S. Tsirkin, virtualization, Network Development



----- Original Message -----
> On Mon, Dec 28, 2020 at 7:55 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Dec 28, 2020 at 02:30:31PM -0500, Willem de Bruijn wrote:
> > > On Mon, Dec 28, 2020 at 12:29 PM Michael S. Tsirkin <mst@redhat.com>
> > > wrote:
> > > >
> > > > On Mon, Dec 28, 2020 at 11:22:32AM -0500, Willem de Bruijn wrote:
> > > > > From: Willem de Bruijn <willemb@google.com>
> > > > >
> > > > > Add optional PTP hardware timestamp offload for virtio-net.
> > > > >
> > > > > Accurate RTT measurement requires timestamps close to the wire.
> > > > > Introduce virtio feature VIRTIO_NET_F_RX_TSTAMP. If negotiated, the
> > > > > virtio-net header is expanded with room for a timestamp. A host may
> > > > > pass receive timestamps for all or some packets. A timestamp is valid
> > > > > if non-zero.
> > > > >
> > > > > The timestamp straddles (virtual) hardware domains. Like PTP, use
> > > > > international atomic time (CLOCK_TAI) as global clock base. It is
> > > > > guest responsibility to sync with host, e.g., through kvm-clock.
> > > > >
> > > > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > > > > diff --git a/include/uapi/linux/virtio_net.h
> > > > > b/include/uapi/linux/virtio_net.h
> > > > > index f6881b5b77ee..0ffe2eeebd4a 100644
> > > > > --- a/include/uapi/linux/virtio_net.h
> > > > > +++ b/include/uapi/linux/virtio_net.h
> > > > > @@ -57,6 +57,7 @@
> > > > >                                        * Steering */
> > > > >  #define VIRTIO_NET_F_CTRL_MAC_ADDR 23        /* Set MAC address */
> > > > >
> > > > > +#define VIRTIO_NET_F_RX_TSTAMP         55    /* Host sends TAI
> > > > > receive time */
> > > > >  #define VIRTIO_NET_F_TX_HASH   56    /* Guest sends hash report */
> > > > >  #define VIRTIO_NET_F_HASH_REPORT  57 /* Supports hash report */
> > > > >  #define VIRTIO_NET_F_RSS       60    /* Supports RSS RX steering */
> > > > > @@ -182,6 +183,17 @@ struct virtio_net_hdr_v1_hash {
> > > > >       };
> > > > >  };
> > > > >
> > > > > +struct virtio_net_hdr_v12 {
> > > > > +     struct virtio_net_hdr_v1 hdr;
> > > > > +     struct {
> > > > > +             __le32 value;
> > > > > +             __le16 report;
> > > > > +             __le16 flow_state;
> > > > > +     } hash;
> > > > > +     __virtio32 reserved;
> > > > > +     __virtio64 tstamp;
> > > > > +};
> > > > > +
> > > > >  #ifndef VIRTIO_NET_NO_LEGACY
> > > > >  /* This header comes first in the scatter-gather list.
> > > > >   * For legacy virtio, if VIRTIO_F_ANY_LAYOUT is not negotiated, it
> > > > >   must
> > > >
> > > >
> > > > So it looks like VIRTIO_NET_F_RX_TSTAMP should depend on both
> > > > VIRTIO_NET_F_RX_TSTAMP and VIRTIO_NET_F_HASH_REPORT then?
> > >
> > > Do you mean VIRTIO_NET_F_TX_TSTAMP depends on VIRTIO_NET_F_RX_TSTAMP?
> > >
> > > I think if either is enabled we need to enable the extended layout.
> > > Regardless of whether TX_HASH or HASH_REPORT are enabled. If they are
> > > not, then those fields are ignored.
> >
> > I mean do we waste the 8 bytes for hash if TSTAMP is set but HASH is not?
> > If TSTAMP depends on HASH then point is moot.
> 
> True, but the two features really are independent.
> 
> I did consider using configurable metadata layout depending on
> features negotiated. If there are tons of optional extensions, that
> makes sense. But it is more complex and parsing error prone. With a
> handful of options each of a few bytes, that did not seem worth the
> cost to me at this point.

Consider NFV workloads(64B) packet. Most fields of current vnet header
is even a burdern. It might be the time to introduce configurable or
self-descriptive vnet header.


> 
> And importantly, such a mode can always be added later as a separate
> VIRTIO_NET_F_PACKED_HEADER option.

What will do feature provide?

Thanks

> 
> If anything, perhaps if we increase the virtio_net_hdr_* allocation,
> we should allocate some additional reserved space now? As each new
> size introduces quite a bit of boilerplate. Also, e.g., in qemu just
> to pass the sizes between virtio-net driver and vhost-net device.
> 
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rfc 2/3] virtio-net: support receive timestamp
  2020-12-29  9:17           ` Jason Wang
@ 2020-12-29 14:20             ` Willem de Bruijn
  2020-12-30  8:38               ` Jason Wang
  0 siblings, 1 reply; 34+ messages in thread
From: Willem de Bruijn @ 2020-12-29 14:20 UTC (permalink / raw)
  To: Jason Wang
  Cc: Willem de Bruijn, Michael S. Tsirkin, virtualization,
	Network Development

On Tue, Dec 29, 2020 at 4:23 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
>
> ----- Original Message -----
> > On Mon, Dec 28, 2020 at 7:55 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Dec 28, 2020 at 02:30:31PM -0500, Willem de Bruijn wrote:
> > > > On Mon, Dec 28, 2020 at 12:29 PM Michael S. Tsirkin <mst@redhat.com>
> > > > wrote:
> > > > >
> > > > > On Mon, Dec 28, 2020 at 11:22:32AM -0500, Willem de Bruijn wrote:
> > > > > > From: Willem de Bruijn <willemb@google.com>
> > > > > >
> > > > > > Add optional PTP hardware timestamp offload for virtio-net.
> > > > > >
> > > > > > Accurate RTT measurement requires timestamps close to the wire.
> > > > > > Introduce virtio feature VIRTIO_NET_F_RX_TSTAMP. If negotiated, the
> > > > > > virtio-net header is expanded with room for a timestamp. A host may
> > > > > > pass receive timestamps for all or some packets. A timestamp is valid
> > > > > > if non-zero.
> > > > > >
> > > > > > The timestamp straddles (virtual) hardware domains. Like PTP, use
> > > > > > international atomic time (CLOCK_TAI) as global clock base. It is
> > > > > > guest responsibility to sync with host, e.g., through kvm-clock.
> > > > > >
> > > > > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > > > > > diff --git a/include/uapi/linux/virtio_net.h
> > > > > > b/include/uapi/linux/virtio_net.h
> > > > > > index f6881b5b77ee..0ffe2eeebd4a 100644
> > > > > > --- a/include/uapi/linux/virtio_net.h
> > > > > > +++ b/include/uapi/linux/virtio_net.h
> > > > > > @@ -57,6 +57,7 @@
> > > > > >                                        * Steering */
> > > > > >  #define VIRTIO_NET_F_CTRL_MAC_ADDR 23        /* Set MAC address */
> > > > > >
> > > > > > +#define VIRTIO_NET_F_RX_TSTAMP         55    /* Host sends TAI
> > > > > > receive time */
> > > > > >  #define VIRTIO_NET_F_TX_HASH   56    /* Guest sends hash report */
> > > > > >  #define VIRTIO_NET_F_HASH_REPORT  57 /* Supports hash report */
> > > > > >  #define VIRTIO_NET_F_RSS       60    /* Supports RSS RX steering */
> > > > > > @@ -182,6 +183,17 @@ struct virtio_net_hdr_v1_hash {
> > > > > >       };
> > > > > >  };
> > > > > >
> > > > > > +struct virtio_net_hdr_v12 {
> > > > > > +     struct virtio_net_hdr_v1 hdr;
> > > > > > +     struct {
> > > > > > +             __le32 value;
> > > > > > +             __le16 report;
> > > > > > +             __le16 flow_state;
> > > > > > +     } hash;
> > > > > > +     __virtio32 reserved;
> > > > > > +     __virtio64 tstamp;
> > > > > > +};
> > > > > > +
> > > > > >  #ifndef VIRTIO_NET_NO_LEGACY
> > > > > >  /* This header comes first in the scatter-gather list.
> > > > > >   * For legacy virtio, if VIRTIO_F_ANY_LAYOUT is not negotiated, it
> > > > > >   must
> > > > >
> > > > >
> > > > > So it looks like VIRTIO_NET_F_RX_TSTAMP should depend on both
> > > > > VIRTIO_NET_F_RX_TSTAMP and VIRTIO_NET_F_HASH_REPORT then?
> > > >
> > > > Do you mean VIRTIO_NET_F_TX_TSTAMP depends on VIRTIO_NET_F_RX_TSTAMP?
> > > >
> > > > I think if either is enabled we need to enable the extended layout.
> > > > Regardless of whether TX_HASH or HASH_REPORT are enabled. If they are
> > > > not, then those fields are ignored.
> > >
> > > I mean do we waste the 8 bytes for hash if TSTAMP is set but HASH is not?
> > > If TSTAMP depends on HASH then point is moot.
> >
> > True, but the two features really are independent.
> >
> > I did consider using configurable metadata layout depending on
> > features negotiated. If there are tons of optional extensions, that
> > makes sense. But it is more complex and parsing error prone. With a
> > handful of options each of a few bytes, that did not seem worth the
> > cost to me at this point.
>
> Consider NFV workloads(64B) packet. Most fields of current vnet header
> is even a burdern.

Such workloads will not negotiate these features, I imagine.

I think this could only become an issue if a small packet workload
would want to negotiate one optional feature, without the others.

Even then, the actual cost of a few untouched bytes is negligible, as
long as the headers don't span cache-lines. I expect it to be cheaper
than having to parse a dynamic layout.

> It might be the time to introduce configurable or
> self-descriptive vnet header.

I did briefly toy with a type-val encoding. Not type-val-len, as all
options have fixed length (so far), so length can be left implicit.
But as each feature is negotiated before hand, the type can be left
implicit too. Leaving just the data elements tightly packed. As long
as order is defined.

> > And importantly, such a mode can always be added later as a separate
> > VIRTIO_NET_F_PACKED_HEADER option.
>
> What will do feature provide?

The tightly packed data elements. Strip out the elements for non
negotiated features. So if either tstamp option is negotiated, but
hash is not, exclude the 64b of hash fields. Note that I'm not
actually arguing for this (at this point).

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rfc 2/3] virtio-net: support receive timestamp
  2020-12-29 14:20             ` Willem de Bruijn
@ 2020-12-30  8:38               ` Jason Wang
  0 siblings, 0 replies; 34+ messages in thread
From: Jason Wang @ 2020-12-30  8:38 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Michael S. Tsirkin, virtualization, Network Development


On 2020/12/29 下午10:20, Willem de Bruijn wrote:
> On Tue, Dec 29, 2020 at 4:23 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>>
>> ----- Original Message -----
>>> On Mon, Dec 28, 2020 at 7:55 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>>>> On Mon, Dec 28, 2020 at 02:30:31PM -0500, Willem de Bruijn wrote:
>>>>> On Mon, Dec 28, 2020 at 12:29 PM Michael S. Tsirkin <mst@redhat.com>
>>>>> wrote:
>>>>>> On Mon, Dec 28, 2020 at 11:22:32AM -0500, Willem de Bruijn wrote:
>>>>>>> From: Willem de Bruijn <willemb@google.com>
>>>>>>>
>>>>>>> Add optional PTP hardware timestamp offload for virtio-net.
>>>>>>>
>>>>>>> Accurate RTT measurement requires timestamps close to the wire.
>>>>>>> Introduce virtio feature VIRTIO_NET_F_RX_TSTAMP. If negotiated, the
>>>>>>> virtio-net header is expanded with room for a timestamp. A host may
>>>>>>> pass receive timestamps for all or some packets. A timestamp is valid
>>>>>>> if non-zero.
>>>>>>>
>>>>>>> The timestamp straddles (virtual) hardware domains. Like PTP, use
>>>>>>> international atomic time (CLOCK_TAI) as global clock base. It is
>>>>>>> guest responsibility to sync with host, e.g., through kvm-clock.
>>>>>>>
>>>>>>> Signed-off-by: Willem de Bruijn <willemb@google.com>
>>>>>>> diff --git a/include/uapi/linux/virtio_net.h
>>>>>>> b/include/uapi/linux/virtio_net.h
>>>>>>> index f6881b5b77ee..0ffe2eeebd4a 100644
>>>>>>> --- a/include/uapi/linux/virtio_net.h
>>>>>>> +++ b/include/uapi/linux/virtio_net.h
>>>>>>> @@ -57,6 +57,7 @@
>>>>>>>                                         * Steering */
>>>>>>>   #define VIRTIO_NET_F_CTRL_MAC_ADDR 23        /* Set MAC address */
>>>>>>>
>>>>>>> +#define VIRTIO_NET_F_RX_TSTAMP         55    /* Host sends TAI
>>>>>>> receive time */
>>>>>>>   #define VIRTIO_NET_F_TX_HASH   56    /* Guest sends hash report */
>>>>>>>   #define VIRTIO_NET_F_HASH_REPORT  57 /* Supports hash report */
>>>>>>>   #define VIRTIO_NET_F_RSS       60    /* Supports RSS RX steering */
>>>>>>> @@ -182,6 +183,17 @@ struct virtio_net_hdr_v1_hash {
>>>>>>>        };
>>>>>>>   };
>>>>>>>
>>>>>>> +struct virtio_net_hdr_v12 {
>>>>>>> +     struct virtio_net_hdr_v1 hdr;
>>>>>>> +     struct {
>>>>>>> +             __le32 value;
>>>>>>> +             __le16 report;
>>>>>>> +             __le16 flow_state;
>>>>>>> +     } hash;
>>>>>>> +     __virtio32 reserved;
>>>>>>> +     __virtio64 tstamp;
>>>>>>> +};
>>>>>>> +
>>>>>>>   #ifndef VIRTIO_NET_NO_LEGACY
>>>>>>>   /* This header comes first in the scatter-gather list.
>>>>>>>    * For legacy virtio, if VIRTIO_F_ANY_LAYOUT is not negotiated, it
>>>>>>>    must
>>>>>>
>>>>>> So it looks like VIRTIO_NET_F_RX_TSTAMP should depend on both
>>>>>> VIRTIO_NET_F_RX_TSTAMP and VIRTIO_NET_F_HASH_REPORT then?
>>>>> Do you mean VIRTIO_NET_F_TX_TSTAMP depends on VIRTIO_NET_F_RX_TSTAMP?
>>>>>
>>>>> I think if either is enabled we need to enable the extended layout.
>>>>> Regardless of whether TX_HASH or HASH_REPORT are enabled. If they are
>>>>> not, then those fields are ignored.
>>>> I mean do we waste the 8 bytes for hash if TSTAMP is set but HASH is not?
>>>> If TSTAMP depends on HASH then point is moot.
>>> True, but the two features really are independent.
>>>
>>> I did consider using configurable metadata layout depending on
>>> features negotiated. If there are tons of optional extensions, that
>>> makes sense. But it is more complex and parsing error prone. With a
>>> handful of options each of a few bytes, that did not seem worth the
>>> cost to me at this point.
>> Consider NFV workloads(64B) packet. Most fields of current vnet header
>> is even a burdern.
> Such workloads will not negotiate these features, I imagine.
>
> I think this could only become an issue if a small packet workload
> would want to negotiate one optional feature, without the others.


Yes.


> Even then, the actual cost of a few untouched bytes is negligible, as
> long as the headers don't span cache-lines.


Right. It looks to me with hash report support the current header has 
exceeds 32 bytes which is the cacheline size for some archs. And it can 
be imagined that after two or more features it could be larger than 64 
bytes.


>   I expect it to be cheaper
> than having to parse a dynamic layout.


The only overhead is probably analyzing "type" which should be cheap I 
guess.


>
>> It might be the time to introduce configurable or
>> self-descriptive vnet header.
> I did briefly toy with a type-val encoding. Not type-val-len, as all
> options have fixed length (so far), so length can be left implicit.
> But as each feature is negotiated before hand, the type can be left
> implicit too. Leaving just the data elements tightly packed. As long
> as order is defined.


Right, so in this case there should be even no overhead.


>
>>> And importantly, such a mode can always be added later as a separate
>>> VIRTIO_NET_F_PACKED_HEADER option.
>> What will do feature provide?
> The tightly packed data elements. Strip out the elements for non
> negotiated features. So if either tstamp option is negotiated, but
> hash is not, exclude the 64b of hash fields. Note that I'm not
> actually arguing for this (at this point).


I second for this.

Thanks


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rfc 2/3] virtio-net: support receive timestamp
  2020-12-29  0:57     ` Willem de Bruijn
@ 2020-12-30  8:44       ` Jason Wang
  2020-12-30 12:30       ` Richard Cochran
  1 sibling, 0 replies; 34+ messages in thread
From: Jason Wang @ 2020-12-30  8:44 UTC (permalink / raw)
  To: Willem de Bruijn, Jakub Kicinski
  Cc: virtualization, Network Development, Michael S. Tsirkin, Richard Cochran


On 2020/12/29 上午8:57, Willem de Bruijn wrote:
> On Mon, Dec 28, 2020 at 5:59 PM Jakub Kicinski <kuba@kernel.org> wrote:
>> On Mon, 28 Dec 2020 11:22:32 -0500 Willem de Bruijn wrote:
>>> From: Willem de Bruijn <willemb@google.com>
>>>
>>> Add optional PTP hardware timestamp offload for virtio-net.
>>>
>>> Accurate RTT measurement requires timestamps close to the wire.
>>> Introduce virtio feature VIRTIO_NET_F_RX_TSTAMP. If negotiated, the
>>> virtio-net header is expanded with room for a timestamp. A host may
>>> pass receive timestamps for all or some packets. A timestamp is valid
>>> if non-zero.
>>>
>>> The timestamp straddles (virtual) hardware domains. Like PTP, use
>>> international atomic time (CLOCK_TAI) as global clock base. It is
>>> guest responsibility to sync with host, e.g., through kvm-clock.
>> Would this not be confusing to some user space SW to have a NIC with
>> no PHC deliver HW stamps?
>>
>> I'd CC Richard on this, unless you already discussed with him offline.
> Thanks, good point. I should have included Richard.
>
> There is a well understood method for synchronizing guest and host
> clock in KVM using ptp_kvm. For virtual environments without NIC
> hardware offload, the when host timestamps in software, this suffices.
>
> Syncing host with NIC is assumed if the host advertises the feature
> and implements using real hardware timestamps.


Or it could be useful for virtio hardware when there's no KVM that 
provides PTP.

Thanks


>


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rfc 2/3] virtio-net: support receive timestamp
  2020-12-29  0:57     ` Willem de Bruijn
  2020-12-30  8:44       ` Jason Wang
@ 2020-12-30 12:30       ` Richard Cochran
  1 sibling, 0 replies; 34+ messages in thread
From: Richard Cochran @ 2020-12-30 12:30 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Jakub Kicinski, virtualization, Network Development,
	Michael S. Tsirkin, Jason Wang

On Mon, Dec 28, 2020 at 07:57:20PM -0500, Willem de Bruijn wrote:
> There is a well understood method for synchronizing guest and host
> clock in KVM using ptp_kvm. For virtual environments without NIC
> hardware offload, the when host timestamps in software, this suffices.
> 
> Syncing host with NIC is assumed if the host advertises the feature
> and implements using real hardware timestamps.

Sounds reasonable.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rfc 3/3] virtio-net: support transmit timestamp
  2020-12-28 16:22 ` [PATCH rfc 3/3] virtio-net: support transmit timestamp Willem de Bruijn
@ 2020-12-30 12:38   ` Richard Cochran
  2020-12-30 15:25     ` Willem de Bruijn
  2021-02-02 13:47   ` kernel test robot
  1 sibling, 1 reply; 34+ messages in thread
From: Richard Cochran @ 2020-12-30 12:38 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: virtualization, netdev, mst, jasowang, Willem de Bruijn

On Mon, Dec 28, 2020 at 11:22:33AM -0500, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
> 
> Add optional delivery time (SO_TXTIME) offload for virtio-net.
> 
> The Linux TCP/IP stack tries to avoid bursty transmission and network
> congestion through pacing: computing an skb delivery time based on
> congestion information. Userspace protocol implementations can achieve
> the same with SO_TXTIME. This may also reduce scheduling jitter and
> improve RTT estimation.

This description is clear, but the Subject line is confusing.  It made
me wonder whether this series is somehow about host/guest synchronization
(but your comments do explain that that isn't the case).

How about this instead?

   virtio-net: support future packet transmit time

Thanks,
Richard

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rfc 3/3] virtio-net: support transmit timestamp
  2020-12-30 12:38   ` Richard Cochran
@ 2020-12-30 15:25     ` Willem de Bruijn
  0 siblings, 0 replies; 34+ messages in thread
From: Willem de Bruijn @ 2020-12-30 15:25 UTC (permalink / raw)
  To: Richard Cochran
  Cc: virtualization, Network Development, Michael S. Tsirkin,
	Jason Wang, Willem de Bruijn

On Wed, Dec 30, 2020 at 7:38 AM Richard Cochran
<richardcochran@gmail.com> wrote:
>
> On Mon, Dec 28, 2020 at 11:22:33AM -0500, Willem de Bruijn wrote:
> > From: Willem de Bruijn <willemb@google.com>
> >
> > Add optional delivery time (SO_TXTIME) offload for virtio-net.
> >
> > The Linux TCP/IP stack tries to avoid bursty transmission and network
> > congestion through pacing: computing an skb delivery time based on
> > congestion information. Userspace protocol implementations can achieve
> > the same with SO_TXTIME. This may also reduce scheduling jitter and
> > improve RTT estimation.
>
> This description is clear, but the Subject line is confusing.  It made
> me wonder whether this series is somehow about host/guest synchronization
> (but your comments do explain that that isn't the case).
>
> How about this instead?
>
>    virtio-net: support future packet transmit time

Yes, that's clearer. As is, this could easily be mistaken for
SOF_TIMESTAMPING_TX_*, which it is not. Will update, thanks.

Related terms already in use are SO_TXTIME, Earliest Delivery Time
(EDT) and Earliest TxTime First (ETF).

I should probably also s/TX_TSTAMP/TX_TIME/  in the code for the same reason.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rfc 0/3] virtio-net: add tx-hash, rx-tstamp and tx-tstamp
  2020-12-29  1:14       ` Willem de Bruijn
@ 2021-01-06 20:32         ` Willem de Bruijn
  0 siblings, 0 replies; 34+ messages in thread
From: Willem de Bruijn @ 2021-01-06 20:32 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtualization, Network Development, Jason Wang

On Mon, Dec 28, 2020 at 8:15 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> On Mon, Dec 28, 2020 at 7:47 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Dec 28, 2020 at 02:51:09PM -0500, Willem de Bruijn wrote:
> > > On Mon, Dec 28, 2020 at 12:29 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Mon, Dec 28, 2020 at 11:22:30AM -0500, Willem de Bruijn wrote:
> > > > > From: Willem de Bruijn <willemb@google.com>
> > > > >
> > > > > RFC for three new features to the virtio network device:
> > > > >
> > > > > 1. pass tx flow hash and state to host, for routing + telemetry
> > > > > 2. pass rx tstamp to guest, for better RTT estimation
> > > > > 3. pass tx tstamp to host, for accurate pacing
> > > > >
> > > > > All three would introduce an extension to the virtio spec.
> > > > > I assume this would require opening three ballots against v1.2 at
> > > > > https://www.oasis-open.org/committees/ballots.php?wg_abbrev=virtio
> > > > >
> > > > > This RFC is to informally discuss the proposals first.
> > > > >
> > > > > The patchset is against v5.10. Evaluation additionally requires
> > > > > changes to qemu and at least one back-end. I implemented preliminary
> > > > > support in Linux vhost-net. Both patches available through github at
> > > > >
> > > > > https://github.com/wdebruij/linux/tree/virtio-net-txhash-1
> > > > > https://github.com/wdebruij/qemu/tree/virtio-net-txhash-1
> > > >
> > > > Any data on what the benefits are?
> > >
> > > For the general method, yes. For this specific implementation, not  yet.
> > >
> > > Swift congestion control is delay based. It won the best paper award
> > > at SIGCOMM this year. That paper has a lot of data:
> > > https://dl.acm.org/doi/pdf/10.1145/3387514.3406591 . Section 3.1 talks
> > > about the different components that contribute to delay and how to
> > > isolate them.
> >
> > And for the hashing part?
>
> A few concrete examples of error conditions that can be resolved are
> mentioned in the commits that add sk_rethink_txhash calls. Such as
> commit 7788174e8726 ("tcp: change IPv6 flow-label upon receiving
> spurious retransmission"):
>
> "
>     Currently a Linux IPv6 TCP sender will change the flow label upon
>     timeouts to potentially steer away from a data path that has gone
>     bad. However this does not help if the problem is on the ACK path
>     and the data path is healthy. In this case the receiver is likely
>     to receive repeated spurious retransmission because the sender
>     couldn't get the ACKs in time and has recurring timeouts.
>
>     This patch adds another feature to mitigate this problem. It
>     leverages the DSACK states in the receiver to change the flow
>     label of the ACKs to speculatively re-route the ACK packets.
>     In order to allow triggering on the second consecutive spurious
>     RTO, the receiver changes the flow label upon sending a second
>     consecutive DSACK for a sequence number below RCV.NXT.
> "
>
> I don't have quantitative data on the efficacy at scale at hand. Let
> me see what I can find. This will probably take some time, at least
> until people are back after the holidays. I didn't want to delay the
> patch, as the merge window was a nice time for RFC. But agreed that it
> deserves stronger justification.

The practical results mirror what the theory suggests: that in the
presence of multiple paths, of which one goes bad, this method
maintains connectivity where otherwise it would disconnect.

When IPv6 FlowLabel was included in path selection (e.g., LAG/ECMP),
flowlabel rotation on TCP timeout avoided the vast majority of TCP
disconnections that would otherwise have occurred during to link
failures in long-haul backbones, when an alternative path was
available.

So it's not a matter of percentages, just the existence of an
alternative healthy path on which the packets will eventually land
quite deterministically as it rotates the txhash on each timeout.

This method can be deployed based on a variety of "bad connection"
signals. Besides timeouts, the aforementioned spurious retransmits,
for one. This TCP connection-level information can independent of
flowlabel rotation be valuable information to the cloud provider to
detect and pinpoint network issues. As mentioned before, ideally we
can pass along such details of the type of signal along with the hash.
But that also requires passing that state in the guest from the TCP
layer to the virtio-net device. So left for separate later work. For
now we just have the reserved space in the header.

Michael, what is the best way to proceed with this? Send the patches
for review to net-next, or should I start by opening ballots to
https://www.oasis-open.org/committees/ballots.php?wg_abbrev=virtio
first? Thanks.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rfc 2/3] virtio-net: support receive timestamp
  2020-12-28 16:22 ` [PATCH rfc 2/3] virtio-net: support receive timestamp Willem de Bruijn
  2020-12-28 17:28   ` Michael S. Tsirkin
  2020-12-28 22:59   ` Jakub Kicinski
@ 2021-02-02 13:05   ` kernel test robot
  2021-02-02 14:08   ` Michael S. Tsirkin
  3 siblings, 0 replies; 34+ messages in thread
From: kernel test robot @ 2021-02-02 13:05 UTC (permalink / raw)
  To: Willem de Bruijn, virtualization
  Cc: kbuild-all, netdev, mst, jasowang, Willem de Bruijn

[-- Attachment #1: Type: text/plain, Size: 4750 bytes --]

Hi Willem,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on ipvs/master]
[also build test WARNING on linus/master v5.11-rc6 next-20210125]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Willem-de-Bruijn/virtio-net-add-tx-hash-rx-tstamp-and-tx-tstamp/20201229-002604
base:   https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs.git master
config: x86_64-randconfig-s021-20201228 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.3-215-g0fb77bb6-dirty
        # https://github.com/0day-ci/linux/commit/d309db6857fa35b0d7a11cc5229436d6d71ab274
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Willem-de-Bruijn/virtio-net-add-tx-hash-rx-tstamp-and-tx-tstamp/20201229-002604
        git checkout d309db6857fa35b0d7a11cc5229436d6d71ab274
        # save the attached .config to linux build tree
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


"sparse warnings: (new ones prefixed by >>)"
>> drivers/net/virtio_net.c:1096:80: sparse: sparse: incorrect type in argument 1 (different base types) @@     expected unsigned long long [usertype] ns @@     got restricted __virtio64 [usertype] tstamp @@
   drivers/net/virtio_net.c:1096:80: sparse:     expected unsigned long long [usertype] ns
   drivers/net/virtio_net.c:1096:80: sparse:     got restricted __virtio64 [usertype] tstamp
   drivers/net/virtio_net.c:1580:32: sparse: sparse: incorrect type in assignment (different base types) @@     expected restricted __le32 [usertype] hash_value @@     got restricted __virtio32 @@
   drivers/net/virtio_net.c:1580:32: sparse:     expected restricted __le32 [usertype] hash_value
   drivers/net/virtio_net.c:1580:32: sparse:     got restricted __virtio32
   drivers/net/virtio_net.c:1581:33: sparse: sparse: incorrect type in assignment (different base types) @@     expected restricted __le16 [usertype] hash_report @@     got int @@
   drivers/net/virtio_net.c:1581:33: sparse:     expected restricted __le16 [usertype] hash_report
   drivers/net/virtio_net.c:1581:33: sparse:     got int

vim +1096 drivers/net/virtio_net.c

  1048	
  1049	static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
  1050				void *buf, unsigned int len, void **ctx,
  1051				unsigned int *xdp_xmit,
  1052				struct virtnet_rq_stats *stats)
  1053	{
  1054		struct net_device *dev = vi->dev;
  1055		struct sk_buff *skb;
  1056		struct virtio_net_hdr_mrg_rxbuf *hdr;
  1057	
  1058		if (unlikely(len < vi->hdr_len + ETH_HLEN)) {
  1059			pr_debug("%s: short packet %i\n", dev->name, len);
  1060			dev->stats.rx_length_errors++;
  1061			if (vi->mergeable_rx_bufs) {
  1062				put_page(virt_to_head_page(buf));
  1063			} else if (vi->big_packets) {
  1064				give_pages(rq, buf);
  1065			} else {
  1066				put_page(virt_to_head_page(buf));
  1067			}
  1068			return;
  1069		}
  1070	
  1071		if (vi->mergeable_rx_bufs)
  1072			skb = receive_mergeable(dev, vi, rq, buf, ctx, len, xdp_xmit,
  1073						stats);
  1074		else if (vi->big_packets)
  1075			skb = receive_big(dev, vi, rq, buf, len, stats);
  1076		else
  1077			skb = receive_small(dev, vi, rq, buf, ctx, len, xdp_xmit, stats);
  1078	
  1079		if (unlikely(!skb))
  1080			return;
  1081	
  1082		hdr = skb_vnet_hdr(skb);
  1083	
  1084		if (hdr->hdr.flags & VIRTIO_NET_HDR_F_DATA_VALID)
  1085			skb->ip_summed = CHECKSUM_UNNECESSARY;
  1086	
  1087		if (virtio_net_hdr_to_skb(skb, &hdr->hdr,
  1088					  virtio_is_little_endian(vi->vdev))) {
  1089			net_warn_ratelimited("%s: bad gso: type: %u, size: %u\n",
  1090					     dev->name, hdr->hdr.gso_type,
  1091					     hdr->hdr.gso_size);
  1092			goto frame_err;
  1093		}
  1094	
  1095		if (vi->has_rx_tstamp)
> 1096			skb_hwtstamps(skb)->hwtstamp = ns_to_ktime(skb_vnet_hdr_12(skb)->tstamp);
  1097	
  1098		skb_record_rx_queue(skb, vq2rxq(rq->vq));
  1099		skb->protocol = eth_type_trans(skb, dev);
  1100		pr_debug("Receiving skb proto 0x%04x len %i type %i\n",
  1101			 ntohs(skb->protocol), skb->len, skb->pkt_type);
  1102	
  1103		napi_gro_receive(&rq->napi, skb);
  1104		return;
  1105	
  1106	frame_err:
  1107		dev->stats.rx_frame_errors++;
  1108		dev_kfree_skb(skb);
  1109	}
  1110	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 34123 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rfc 3/3] virtio-net: support transmit timestamp
  2020-12-28 16:22 ` [PATCH rfc 3/3] virtio-net: support transmit timestamp Willem de Bruijn
  2020-12-30 12:38   ` Richard Cochran
@ 2021-02-02 13:47   ` kernel test robot
  1 sibling, 0 replies; 34+ messages in thread
From: kernel test robot @ 2021-02-02 13:47 UTC (permalink / raw)
  To: Willem de Bruijn, virtualization
  Cc: kbuild-all, netdev, mst, jasowang, Willem de Bruijn

[-- Attachment #1: Type: text/plain, Size: 4833 bytes --]

Hi Willem,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on ipvs/master]
[also build test WARNING on linus/master v5.11-rc6 next-20210125]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Willem-de-Bruijn/virtio-net-add-tx-hash-rx-tstamp-and-tx-tstamp/20201229-002604
base:   https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs.git master
config: x86_64-randconfig-s021-20201228 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.3-215-g0fb77bb6-dirty
        # https://github.com/0day-ci/linux/commit/be9cab7692382c7886333b498d1d8adbba1881f7
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Willem-de-Bruijn/virtio-net-add-tx-hash-rx-tstamp-and-tx-tstamp/20201229-002604
        git checkout be9cab7692382c7886333b498d1d8adbba1881f7
        # save the attached .config to linux build tree
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


"sparse warnings: (new ones prefixed by >>)"
   drivers/net/virtio_net.c:1099:80: sparse: sparse: incorrect type in argument 1 (different base types) @@     expected unsigned long long [usertype] ns @@     got restricted __virtio64 [usertype] tstamp @@
   drivers/net/virtio_net.c:1099:80: sparse:     expected unsigned long long [usertype] ns
   drivers/net/virtio_net.c:1099:80: sparse:     got restricted __virtio64 [usertype] tstamp
>> drivers/net/virtio_net.c:1583:33: sparse: sparse: incorrect type in assignment (different base types) @@     expected restricted __le32 [usertype] value @@     got restricted __virtio32 @@
   drivers/net/virtio_net.c:1583:33: sparse:     expected restricted __le32 [usertype] value
   drivers/net/virtio_net.c:1583:33: sparse:     got restricted __virtio32
>> drivers/net/virtio_net.c:1584:34: sparse: sparse: incorrect type in assignment (different base types) @@     expected restricted __le16 [usertype] report @@     got int @@
   drivers/net/virtio_net.c:1584:34: sparse:     expected restricted __le16 [usertype] report
   drivers/net/virtio_net.c:1584:34: sparse:     got int

vim +1583 drivers/net/virtio_net.c

  1550	
  1551	static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
  1552	{
  1553		struct virtio_net_hdr_mrg_rxbuf *hdr;
  1554		const unsigned char *dest = ((struct ethhdr *)skb->data)->h_dest;
  1555		struct virtnet_info *vi = sq->vq->vdev->priv;
  1556		struct virtio_net_hdr_v12 *h12;
  1557		int num_sg;
  1558		unsigned hdr_len = vi->hdr_len;
  1559		bool can_push;
  1560	
  1561		pr_debug("%s: xmit %p %pM\n", vi->dev->name, skb, dest);
  1562	
  1563		can_push = vi->any_header_sg &&
  1564			!((unsigned long)skb->data & (__alignof__(*hdr) - 1)) &&
  1565			!skb_header_cloned(skb) && skb_headroom(skb) >= hdr_len;
  1566		/* Even if we can, don't push here yet as this would skew
  1567		 * csum_start offset below. */
  1568		if (can_push)
  1569			hdr = (struct virtio_net_hdr_mrg_rxbuf *)(skb->data - hdr_len);
  1570		else
  1571			hdr = skb_vnet_hdr(skb);
  1572	
  1573		if (virtio_net_hdr_from_skb(skb, &hdr->hdr,
  1574					    virtio_is_little_endian(vi->vdev), false,
  1575					    0))
  1576			BUG();
  1577	
  1578		if (vi->mergeable_rx_bufs)
  1579			hdr->num_buffers = 0;
  1580	
  1581		h12 = (void *)hdr;
  1582		if (vi->has_tx_hash) {
> 1583			h12->hash.value = cpu_to_virtio32(vi->vdev, skb->hash);
> 1584			h12->hash.report = skb->l4_hash ? VIRTIO_NET_HASH_REPORT_L4 :
  1585							  VIRTIO_NET_HASH_REPORT_OTHER;
  1586			h12->hash.flow_state = VIRTIO_NET_HASH_STATE_DEFAULT;
  1587		}
  1588		if (vi->has_tx_tstamp)
  1589			h12->tstamp = cpu_to_virtio64(vi->vdev, skb->tstamp);
  1590	
  1591		sg_init_table(sq->sg, skb_shinfo(skb)->nr_frags + (can_push ? 1 : 2));
  1592		if (can_push) {
  1593			__skb_push(skb, hdr_len);
  1594			num_sg = skb_to_sgvec(skb, sq->sg, 0, skb->len);
  1595			if (unlikely(num_sg < 0))
  1596				return num_sg;
  1597			/* Pull header back to avoid skew in tx bytes calculations. */
  1598			__skb_pull(skb, hdr_len);
  1599		} else {
  1600			sg_set_buf(sq->sg, hdr, hdr_len);
  1601			num_sg = skb_to_sgvec(skb, sq->sg + 1, 0, skb->len);
  1602			if (unlikely(num_sg < 0))
  1603				return num_sg;
  1604			num_sg++;
  1605		}
  1606		return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
  1607	}
  1608	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 34123 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rfc 2/3] virtio-net: support receive timestamp
  2020-12-28 16:22 ` [PATCH rfc 2/3] virtio-net: support receive timestamp Willem de Bruijn
                     ` (2 preceding siblings ...)
  2021-02-02 13:05   ` kernel test robot
@ 2021-02-02 14:08   ` Michael S. Tsirkin
  2021-02-02 22:17     ` Willem de Bruijn
  3 siblings, 1 reply; 34+ messages in thread
From: Michael S. Tsirkin @ 2021-02-02 14:08 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: virtualization, netdev, jasowang, Willem de Bruijn

On Mon, Dec 28, 2020 at 11:22:32AM -0500, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
> 
> Add optional PTP hardware timestamp offload for virtio-net.
> 
> Accurate RTT measurement requires timestamps close to the wire.
> Introduce virtio feature VIRTIO_NET_F_RX_TSTAMP. If negotiated, the
> virtio-net header is expanded with room for a timestamp. A host may
> pass receive timestamps for all or some packets. A timestamp is valid
> if non-zero.
> 
> The timestamp straddles (virtual) hardware domains. Like PTP, use
> international atomic time (CLOCK_TAI) as global clock base. It is
> guest responsibility to sync with host, e.g., through kvm-clock.
> 
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> ---
>  drivers/net/virtio_net.c        | 20 +++++++++++++++++++-
>  include/uapi/linux/virtio_net.h | 12 ++++++++++++
>  2 files changed, 31 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index b917b7333928..57744bb6a141 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -204,6 +204,9 @@ struct virtnet_info {
>  	/* Guest will pass tx path info to the host */
>  	bool has_tx_hash;
>  
> +	/* Host will pass CLOCK_TAI receive time to the guest */
> +	bool has_rx_tstamp;
> +
>  	/* Has control virtqueue */
>  	bool has_cvq;
>  
> @@ -292,6 +295,13 @@ static inline struct virtio_net_hdr_mrg_rxbuf *skb_vnet_hdr(struct sk_buff *skb)
>  	return (struct virtio_net_hdr_mrg_rxbuf *)skb->cb;
>  }
>  
> +static inline struct virtio_net_hdr_v12 *skb_vnet_hdr_12(struct sk_buff *skb)
> +{
> +	BUILD_BUG_ON(sizeof(struct virtio_net_hdr_v12) > sizeof(skb->cb));
> +
> +	return (void *)skb->cb;
> +}
> +
>  /*
>   * private is used to chain pages for big packets, put the whole
>   * most recent used list in the beginning for reuse
> @@ -1082,6 +1092,9 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
>  		goto frame_err;
>  	}
>  
> +	if (vi->has_rx_tstamp)
> +		skb_hwtstamps(skb)->hwtstamp = ns_to_ktime(skb_vnet_hdr_12(skb)->tstamp);
> +
>  	skb_record_rx_queue(skb, vq2rxq(rq->vq));
>  	skb->protocol = eth_type_trans(skb, dev);
>  	pr_debug("Receiving skb proto 0x%04x len %i type %i\n",
> @@ -3071,6 +3084,11 @@ static int virtnet_probe(struct virtio_device *vdev)
>  		vi->hdr_len = sizeof(struct virtio_net_hdr_v1_hash);
>  	}
>  
> +	if (virtio_has_feature(vdev, VIRTIO_NET_F_RX_TSTAMP)) {
> +		vi->has_rx_tstamp = true;
> +		vi->hdr_len = sizeof(struct virtio_net_hdr_v12);
> +	}
> +
>  	if (virtio_has_feature(vdev, VIRTIO_F_ANY_LAYOUT) ||
>  	    virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
>  		vi->any_header_sg = true;
> @@ -3261,7 +3279,7 @@ static struct virtio_device_id id_table[] = {
>  	VIRTIO_NET_F_CTRL_MAC_ADDR, \
>  	VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, \
>  	VIRTIO_NET_F_SPEED_DUPLEX, VIRTIO_NET_F_STANDBY, \
> -	VIRTIO_NET_F_TX_HASH
> +	VIRTIO_NET_F_TX_HASH, VIRTIO_NET_F_RX_TSTAMP
>  
>  static unsigned int features[] = {
>  	VIRTNET_FEATURES,
> diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> index f6881b5b77ee..0ffe2eeebd4a 100644
> --- a/include/uapi/linux/virtio_net.h
> +++ b/include/uapi/linux/virtio_net.h
> @@ -57,6 +57,7 @@
>  					 * Steering */
>  #define VIRTIO_NET_F_CTRL_MAC_ADDR 23	/* Set MAC address */
>  
> +#define VIRTIO_NET_F_RX_TSTAMP	  55	/* Host sends TAI receive time */
>  #define VIRTIO_NET_F_TX_HASH	  56	/* Guest sends hash report */
>  #define VIRTIO_NET_F_HASH_REPORT  57	/* Supports hash report */
>  #define VIRTIO_NET_F_RSS	  60	/* Supports RSS RX steering */
> @@ -182,6 +183,17 @@ struct virtio_net_hdr_v1_hash {
>  	};
>  };
>  
> +struct virtio_net_hdr_v12 {
> +	struct virtio_net_hdr_v1 hdr;
> +	struct {
> +		__le32 value;
> +		__le16 report;
> +		__le16 flow_state;
> +	} hash;
> +	__virtio32 reserved;


Does endian-ness matter? If not - just u32?

> +	__virtio64 tstamp;
> +};
> +

Given it's only available in modern devices, I think we
can make this __le64 tstamp.

>  #ifndef VIRTIO_NET_NO_LEGACY
>  /* This header comes first in the scatter-gather list.
>   * For legacy virtio, if VIRTIO_F_ANY_LAYOUT is not negotiated, it must
> -- 
> 2.29.2.729.g45daf8777d-goog


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rfc 2/3] virtio-net: support receive timestamp
  2021-02-02 14:08   ` Michael S. Tsirkin
@ 2021-02-02 22:17     ` Willem de Bruijn
  2021-02-02 23:02       ` Michael S. Tsirkin
  0 siblings, 1 reply; 34+ messages in thread
From: Willem de Bruijn @ 2021-02-02 22:17 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtualization, Network Development, Jason Wang

On Tue, Feb 2, 2021 at 9:08 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Dec 28, 2020 at 11:22:32AM -0500, Willem de Bruijn wrote:
> > From: Willem de Bruijn <willemb@google.com>
> >
> > Add optional PTP hardware timestamp offload for virtio-net.
> >
> > Accurate RTT measurement requires timestamps close to the wire.
> > Introduce virtio feature VIRTIO_NET_F_RX_TSTAMP. If negotiated, the
> > virtio-net header is expanded with room for a timestamp. A host may
> > pass receive timestamps for all or some packets. A timestamp is valid
> > if non-zero.
> >
> > The timestamp straddles (virtual) hardware domains. Like PTP, use
> > international atomic time (CLOCK_TAI) as global clock base. It is
> > guest responsibility to sync with host, e.g., through kvm-clock.
> >
> > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > ---
> >  drivers/net/virtio_net.c        | 20 +++++++++++++++++++-
> >  include/uapi/linux/virtio_net.h | 12 ++++++++++++
> >  2 files changed, 31 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index b917b7333928..57744bb6a141 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -204,6 +204,9 @@ struct virtnet_info {
> >       /* Guest will pass tx path info to the host */
> >       bool has_tx_hash;
> >
> > +     /* Host will pass CLOCK_TAI receive time to the guest */
> > +     bool has_rx_tstamp;
> > +
> >       /* Has control virtqueue */
> >       bool has_cvq;
> >
> > @@ -292,6 +295,13 @@ static inline struct virtio_net_hdr_mrg_rxbuf *skb_vnet_hdr(struct sk_buff *skb)
> >       return (struct virtio_net_hdr_mrg_rxbuf *)skb->cb;
> >  }
> >
> > +static inline struct virtio_net_hdr_v12 *skb_vnet_hdr_12(struct sk_buff *skb)
> > +{
> > +     BUILD_BUG_ON(sizeof(struct virtio_net_hdr_v12) > sizeof(skb->cb));
> > +
> > +     return (void *)skb->cb;
> > +}
> > +
> >  /*
> >   * private is used to chain pages for big packets, put the whole
> >   * most recent used list in the beginning for reuse
> > @@ -1082,6 +1092,9 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
> >               goto frame_err;
> >       }
> >
> > +     if (vi->has_rx_tstamp)
> > +             skb_hwtstamps(skb)->hwtstamp = ns_to_ktime(skb_vnet_hdr_12(skb)->tstamp);
> > +
> >       skb_record_rx_queue(skb, vq2rxq(rq->vq));
> >       skb->protocol = eth_type_trans(skb, dev);
> >       pr_debug("Receiving skb proto 0x%04x len %i type %i\n",
> > @@ -3071,6 +3084,11 @@ static int virtnet_probe(struct virtio_device *vdev)
> >               vi->hdr_len = sizeof(struct virtio_net_hdr_v1_hash);
> >       }
> >
> > +     if (virtio_has_feature(vdev, VIRTIO_NET_F_RX_TSTAMP)) {
> > +             vi->has_rx_tstamp = true;
> > +             vi->hdr_len = sizeof(struct virtio_net_hdr_v12);
> > +     }
> > +
> >       if (virtio_has_feature(vdev, VIRTIO_F_ANY_LAYOUT) ||
> >           virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
> >               vi->any_header_sg = true;
> > @@ -3261,7 +3279,7 @@ static struct virtio_device_id id_table[] = {
> >       VIRTIO_NET_F_CTRL_MAC_ADDR, \
> >       VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, \
> >       VIRTIO_NET_F_SPEED_DUPLEX, VIRTIO_NET_F_STANDBY, \
> > -     VIRTIO_NET_F_TX_HASH
> > +     VIRTIO_NET_F_TX_HASH, VIRTIO_NET_F_RX_TSTAMP
> >
> >  static unsigned int features[] = {
> >       VIRTNET_FEATURES,
> > diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> > index f6881b5b77ee..0ffe2eeebd4a 100644
> > --- a/include/uapi/linux/virtio_net.h
> > +++ b/include/uapi/linux/virtio_net.h
> > @@ -57,6 +57,7 @@
> >                                        * Steering */
> >  #define VIRTIO_NET_F_CTRL_MAC_ADDR 23        /* Set MAC address */
> >
> > +#define VIRTIO_NET_F_RX_TSTAMP         55    /* Host sends TAI receive time */
> >  #define VIRTIO_NET_F_TX_HASH   56    /* Guest sends hash report */
> >  #define VIRTIO_NET_F_HASH_REPORT  57 /* Supports hash report */
> >  #define VIRTIO_NET_F_RSS       60    /* Supports RSS RX steering */
> > @@ -182,6 +183,17 @@ struct virtio_net_hdr_v1_hash {
> >       };
> >  };
> >
> > +struct virtio_net_hdr_v12 {
> > +     struct virtio_net_hdr_v1 hdr;
> > +     struct {
> > +             __le32 value;
> > +             __le16 report;
> > +             __le16 flow_state;
> > +     } hash;
> > +     __virtio32 reserved;
>
>
> Does endian-ness matter? If not - just u32?

I suppose it does not matter as long as this is reserved. Should it be
__le32, at least?

> > +     __virtio64 tstamp;
> > +};
> > +
>
> Given it's only available in modern devices, I think we
> can make this __le64 tstamp.

Actually, would it be possible to make new features available on
legacy devices? There is nothing in the features bits precluding it.

I have a revised patchset almost ready. I suppose I should send it as
RFC again, and simultaneously file an OASIS ballot for each feature?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rfc 2/3] virtio-net: support receive timestamp
  2021-02-02 22:17     ` Willem de Bruijn
@ 2021-02-02 23:02       ` Michael S. Tsirkin
  2021-02-02 23:43         ` Willem de Bruijn
  0 siblings, 1 reply; 34+ messages in thread
From: Michael S. Tsirkin @ 2021-02-02 23:02 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: virtualization, Network Development, Jason Wang

On Tue, Feb 02, 2021 at 05:17:13PM -0500, Willem de Bruijn wrote:
> On Tue, Feb 2, 2021 at 9:08 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Dec 28, 2020 at 11:22:32AM -0500, Willem de Bruijn wrote:
> > > From: Willem de Bruijn <willemb@google.com>
> > >
> > > Add optional PTP hardware timestamp offload for virtio-net.
> > >
> > > Accurate RTT measurement requires timestamps close to the wire.
> > > Introduce virtio feature VIRTIO_NET_F_RX_TSTAMP. If negotiated, the
> > > virtio-net header is expanded with room for a timestamp. A host may
> > > pass receive timestamps for all or some packets. A timestamp is valid
> > > if non-zero.
> > >
> > > The timestamp straddles (virtual) hardware domains. Like PTP, use
> > > international atomic time (CLOCK_TAI) as global clock base. It is
> > > guest responsibility to sync with host, e.g., through kvm-clock.
> > >
> > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > > ---
> > >  drivers/net/virtio_net.c        | 20 +++++++++++++++++++-
> > >  include/uapi/linux/virtio_net.h | 12 ++++++++++++
> > >  2 files changed, 31 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index b917b7333928..57744bb6a141 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -204,6 +204,9 @@ struct virtnet_info {
> > >       /* Guest will pass tx path info to the host */
> > >       bool has_tx_hash;
> > >
> > > +     /* Host will pass CLOCK_TAI receive time to the guest */
> > > +     bool has_rx_tstamp;
> > > +
> > >       /* Has control virtqueue */
> > >       bool has_cvq;
> > >
> > > @@ -292,6 +295,13 @@ static inline struct virtio_net_hdr_mrg_rxbuf *skb_vnet_hdr(struct sk_buff *skb)
> > >       return (struct virtio_net_hdr_mrg_rxbuf *)skb->cb;
> > >  }
> > >
> > > +static inline struct virtio_net_hdr_v12 *skb_vnet_hdr_12(struct sk_buff *skb)
> > > +{
> > > +     BUILD_BUG_ON(sizeof(struct virtio_net_hdr_v12) > sizeof(skb->cb));
> > > +
> > > +     return (void *)skb->cb;
> > > +}
> > > +
> > >  /*
> > >   * private is used to chain pages for big packets, put the whole
> > >   * most recent used list in the beginning for reuse
> > > @@ -1082,6 +1092,9 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
> > >               goto frame_err;
> > >       }
> > >
> > > +     if (vi->has_rx_tstamp)
> > > +             skb_hwtstamps(skb)->hwtstamp = ns_to_ktime(skb_vnet_hdr_12(skb)->tstamp);
> > > +
> > >       skb_record_rx_queue(skb, vq2rxq(rq->vq));
> > >       skb->protocol = eth_type_trans(skb, dev);
> > >       pr_debug("Receiving skb proto 0x%04x len %i type %i\n",
> > > @@ -3071,6 +3084,11 @@ static int virtnet_probe(struct virtio_device *vdev)
> > >               vi->hdr_len = sizeof(struct virtio_net_hdr_v1_hash);
> > >       }
> > >
> > > +     if (virtio_has_feature(vdev, VIRTIO_NET_F_RX_TSTAMP)) {
> > > +             vi->has_rx_tstamp = true;
> > > +             vi->hdr_len = sizeof(struct virtio_net_hdr_v12);
> > > +     }
> > > +
> > >       if (virtio_has_feature(vdev, VIRTIO_F_ANY_LAYOUT) ||
> > >           virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
> > >               vi->any_header_sg = true;
> > > @@ -3261,7 +3279,7 @@ static struct virtio_device_id id_table[] = {
> > >       VIRTIO_NET_F_CTRL_MAC_ADDR, \
> > >       VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, \
> > >       VIRTIO_NET_F_SPEED_DUPLEX, VIRTIO_NET_F_STANDBY, \
> > > -     VIRTIO_NET_F_TX_HASH
> > > +     VIRTIO_NET_F_TX_HASH, VIRTIO_NET_F_RX_TSTAMP
> > >
> > >  static unsigned int features[] = {
> > >       VIRTNET_FEATURES,
> > > diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> > > index f6881b5b77ee..0ffe2eeebd4a 100644
> > > --- a/include/uapi/linux/virtio_net.h
> > > +++ b/include/uapi/linux/virtio_net.h
> > > @@ -57,6 +57,7 @@
> > >                                        * Steering */
> > >  #define VIRTIO_NET_F_CTRL_MAC_ADDR 23        /* Set MAC address */
> > >
> > > +#define VIRTIO_NET_F_RX_TSTAMP         55    /* Host sends TAI receive time */
> > >  #define VIRTIO_NET_F_TX_HASH   56    /* Guest sends hash report */
> > >  #define VIRTIO_NET_F_HASH_REPORT  57 /* Supports hash report */
> > >  #define VIRTIO_NET_F_RSS       60    /* Supports RSS RX steering */
> > > @@ -182,6 +183,17 @@ struct virtio_net_hdr_v1_hash {
> > >       };
> > >  };
> > >
> > > +struct virtio_net_hdr_v12 {
> > > +     struct virtio_net_hdr_v1 hdr;
> > > +     struct {
> > > +             __le32 value;
> > > +             __le16 report;
> > > +             __le16 flow_state;
> > > +     } hash;
> > > +     __virtio32 reserved;
> >
> >
> > Does endian-ness matter? If not - just u32?
> 
> I suppose it does not matter as long as this is reserved. Should it be
> __le32, at least?

One can safely assign 0 to any value.


> > > +     __virtio64 tstamp;
> > > +};
> > > +
> >
> > Given it's only available in modern devices, I think we
> > can make this __le64 tstamp.
> 
> Actually, would it be possible to make new features available on
> legacy devices? There is nothing in the features bits precluding it.

I think it won't be possible: you are using feature bit 55,
legacy devices have up to 32 feature bits. And of course the
header looks a bit differently for legacy, you would have to add special
code to handle that when mergeable buffers are off.

> 
> I have a revised patchset almost ready. I suppose I should send it as
> RFC again, and simultaneously file an OASIS ballot for each feature?

that would be great.

-- 
MST


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH rfc 2/3] virtio-net: support receive timestamp
  2021-02-02 23:02       ` Michael S. Tsirkin
@ 2021-02-02 23:43         ` Willem de Bruijn
  0 siblings, 0 replies; 34+ messages in thread
From: Willem de Bruijn @ 2021-02-02 23:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Willem de Bruijn, virtualization, Network Development, Jason Wang

On Tue, Feb 2, 2021 at 6:06 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Feb 02, 2021 at 05:17:13PM -0500, Willem de Bruijn wrote:
> > On Tue, Feb 2, 2021 at 9:08 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Dec 28, 2020 at 11:22:32AM -0500, Willem de Bruijn wrote:
> > > > From: Willem de Bruijn <willemb@google.com>
> > > >
> > > > Add optional PTP hardware timestamp offload for virtio-net.
> > > >
> > > > Accurate RTT measurement requires timestamps close to the wire.
> > > > Introduce virtio feature VIRTIO_NET_F_RX_TSTAMP. If negotiated, the
> > > > virtio-net header is expanded with room for a timestamp. A host may
> > > > pass receive timestamps for all or some packets. A timestamp is valid
> > > > if non-zero.
> > > >
> > > > The timestamp straddles (virtual) hardware domains. Like PTP, use
> > > > international atomic time (CLOCK_TAI) as global clock base. It is
> > > > guest responsibility to sync with host, e.g., through kvm-clock.
> > > >
> > > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > > > ---
> > > >  drivers/net/virtio_net.c        | 20 +++++++++++++++++++-
> > > >  include/uapi/linux/virtio_net.h | 12 ++++++++++++
> > > >  2 files changed, 31 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index b917b7333928..57744bb6a141 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -204,6 +204,9 @@ struct virtnet_info {
> > > >       /* Guest will pass tx path info to the host */
> > > >       bool has_tx_hash;
> > > >
> > > > +     /* Host will pass CLOCK_TAI receive time to the guest */
> > > > +     bool has_rx_tstamp;
> > > > +
> > > >       /* Has control virtqueue */
> > > >       bool has_cvq;
> > > >
> > > > @@ -292,6 +295,13 @@ static inline struct virtio_net_hdr_mrg_rxbuf *skb_vnet_hdr(struct sk_buff *skb)
> > > >       return (struct virtio_net_hdr_mrg_rxbuf *)skb->cb;
> > > >  }
> > > >
> > > > +static inline struct virtio_net_hdr_v12 *skb_vnet_hdr_12(struct sk_buff *skb)
> > > > +{
> > > > +     BUILD_BUG_ON(sizeof(struct virtio_net_hdr_v12) > sizeof(skb->cb));
> > > > +
> > > > +     return (void *)skb->cb;
> > > > +}
> > > > +
> > > >  /*
> > > >   * private is used to chain pages for big packets, put the whole
> > > >   * most recent used list in the beginning for reuse
> > > > @@ -1082,6 +1092,9 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
> > > >               goto frame_err;
> > > >       }
> > > >
> > > > +     if (vi->has_rx_tstamp)
> > > > +             skb_hwtstamps(skb)->hwtstamp = ns_to_ktime(skb_vnet_hdr_12(skb)->tstamp);
> > > > +
> > > >       skb_record_rx_queue(skb, vq2rxq(rq->vq));
> > > >       skb->protocol = eth_type_trans(skb, dev);
> > > >       pr_debug("Receiving skb proto 0x%04x len %i type %i\n",
> > > > @@ -3071,6 +3084,11 @@ static int virtnet_probe(struct virtio_device *vdev)
> > > >               vi->hdr_len = sizeof(struct virtio_net_hdr_v1_hash);
> > > >       }
> > > >
> > > > +     if (virtio_has_feature(vdev, VIRTIO_NET_F_RX_TSTAMP)) {
> > > > +             vi->has_rx_tstamp = true;
> > > > +             vi->hdr_len = sizeof(struct virtio_net_hdr_v12);
> > > > +     }
> > > > +
> > > >       if (virtio_has_feature(vdev, VIRTIO_F_ANY_LAYOUT) ||
> > > >           virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
> > > >               vi->any_header_sg = true;
> > > > @@ -3261,7 +3279,7 @@ static struct virtio_device_id id_table[] = {
> > > >       VIRTIO_NET_F_CTRL_MAC_ADDR, \
> > > >       VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, \
> > > >       VIRTIO_NET_F_SPEED_DUPLEX, VIRTIO_NET_F_STANDBY, \
> > > > -     VIRTIO_NET_F_TX_HASH
> > > > +     VIRTIO_NET_F_TX_HASH, VIRTIO_NET_F_RX_TSTAMP
> > > >
> > > >  static unsigned int features[] = {
> > > >       VIRTNET_FEATURES,
> > > > diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> > > > index f6881b5b77ee..0ffe2eeebd4a 100644
> > > > --- a/include/uapi/linux/virtio_net.h
> > > > +++ b/include/uapi/linux/virtio_net.h
> > > > @@ -57,6 +57,7 @@
> > > >                                        * Steering */
> > > >  #define VIRTIO_NET_F_CTRL_MAC_ADDR 23        /* Set MAC address */
> > > >
> > > > +#define VIRTIO_NET_F_RX_TSTAMP         55    /* Host sends TAI receive time */
> > > >  #define VIRTIO_NET_F_TX_HASH   56    /* Guest sends hash report */
> > > >  #define VIRTIO_NET_F_HASH_REPORT  57 /* Supports hash report */
> > > >  #define VIRTIO_NET_F_RSS       60    /* Supports RSS RX steering */
> > > > @@ -182,6 +183,17 @@ struct virtio_net_hdr_v1_hash {
> > > >       };
> > > >  };
> > > >
> > > > +struct virtio_net_hdr_v12 {
> > > > +     struct virtio_net_hdr_v1 hdr;
> > > > +     struct {
> > > > +             __le32 value;
> > > > +             __le16 report;
> > > > +             __le16 flow_state;
> > > > +     } hash;
> > > > +     __virtio32 reserved;
> > >
> > >
> > > Does endian-ness matter? If not - just u32?
> >
> > I suppose it does not matter as long as this is reserved. Should it be
> > __le32, at least?
>
> One can safely assign 0 to any value.

Ack.

>
> > > > +     __virtio64 tstamp;
> > > > +};
> > > > +
> > >
> > > Given it's only available in modern devices, I think we
> > > can make this __le64 tstamp.
> >
> > Actually, would it be possible to make new features available on
> > legacy devices? There is nothing in the features bits precluding it.
>
> I think it won't be possible: you are using feature bit 55,
> legacy devices have up to 32 feature bits. And of course the
> header looks a bit differently for legacy, you would have to add special
> code to handle that when mergeable buffers are off.

I think I can make the latter work. I did start without a dependency
on the v1 header initially.

Feature bit array length I had not considered. Good point. Need to
think about that. It would be very appealing if in particular the
tx-hash feature could work in legacy mode.

> >
> > I have a revised patchset almost ready. I suppose I should send it as
> > RFC again, and simultaneously file an OASIS ballot for each feature?
>
> that would be great.

Will do, thanks.

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2021-02-02 23:45 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-28 16:22 [PATCH rfc 0/3] virtio-net: add tx-hash, rx-tstamp and tx-tstamp Willem de Bruijn
2020-12-28 16:22 ` [PATCH rfc 1/3] virtio-net: support transmit hash report Willem de Bruijn
2020-12-28 16:28   ` Michael S. Tsirkin
2020-12-28 16:47     ` Willem de Bruijn
2020-12-28 17:22       ` Michael S. Tsirkin
2020-12-29  1:19         ` Willem de Bruijn
2020-12-28 21:36   ` Michael S. Tsirkin
2020-12-29  1:23     ` Willem de Bruijn
2020-12-28 16:22 ` [PATCH rfc 2/3] virtio-net: support receive timestamp Willem de Bruijn
2020-12-28 17:28   ` Michael S. Tsirkin
2020-12-28 19:30     ` Willem de Bruijn
2020-12-28 21:32       ` Michael S. Tsirkin
2020-12-29  1:05         ` Willem de Bruijn
2020-12-29  9:17           ` Jason Wang
2020-12-29 14:20             ` Willem de Bruijn
2020-12-30  8:38               ` Jason Wang
2020-12-28 22:59   ` Jakub Kicinski
2020-12-29  0:57     ` Willem de Bruijn
2020-12-30  8:44       ` Jason Wang
2020-12-30 12:30       ` Richard Cochran
2021-02-02 13:05   ` kernel test robot
2021-02-02 14:08   ` Michael S. Tsirkin
2021-02-02 22:17     ` Willem de Bruijn
2021-02-02 23:02       ` Michael S. Tsirkin
2021-02-02 23:43         ` Willem de Bruijn
2020-12-28 16:22 ` [PATCH rfc 3/3] virtio-net: support transmit timestamp Willem de Bruijn
2020-12-30 12:38   ` Richard Cochran
2020-12-30 15:25     ` Willem de Bruijn
2021-02-02 13:47   ` kernel test robot
2020-12-28 17:29 ` [PATCH rfc 0/3] virtio-net: add tx-hash, rx-tstamp and tx-tstamp Michael S. Tsirkin
2020-12-28 19:51   ` Willem de Bruijn
2020-12-28 21:38     ` Michael S. Tsirkin
2020-12-29  1:14       ` Willem de Bruijn
2021-01-06 20:32         ` Willem de Bruijn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).