* [RFC PATCH 0/7] Support for virtio-net hash reporting
@ 2021-01-12 19:41 Yuri Benditovich
  2021-01-12 19:41 ` [RFC PATCH 1/7] skbuff: define field for hash report type Yuri Benditovich
                   ` (7 more replies)
  0 siblings, 8 replies; 26+ messages in thread
From: Yuri Benditovich @ 2021-01-12 19:41 UTC (permalink / raw)
  To: davem, kuba, mst, jasowang, ast, daniel, andrii, kafai,
	songliubraving, yhs, john.fastabend, kpsingh, rdunlap, willemb,
	gustavoars, herbert, steffen.klassert, nogikh, pablo, decui, cai,
	jakub, elver, pabeni, netdev, linux-kernel, kvm, virtualization,
	bpf
  Cc: yan

The existing TUN module is able to use a provided "steering eBPF" program
to calculate a per-packet hash and derive the destination queue to
place the packet in. The eBPF program uses mapped configuration data
containing a key for hash calculation and an indirection table
with an array of queue indices.

This series of patches adds support for the virtio-net hash reporting
feature as defined in the virtio specification. It extends the TUN module
and the "steering eBPF" program as follows:

The extended steering eBPF program calculates the hash value and hash type,
keeps the hash value in skb->hash and returns the index of the destination
virtqueue and the type of the hash. The TUN module keeps the returned hash
type in a (currently unused) field of the skb:
skb->__unused is renamed to 'hash_report_type'.

When the TUN module is later called to allocate and fill the virtio-net
header and push it to the destination virtqueue, it populates the hash
value and the hash type in the virtio-net header.

The VHOST driver is made aware of the respective virtio-net feature that
extends the virtio-net header to report the hash value and hash report
type.

Yuri Benditovich (7):
  skbuff: define field for hash report type
  vhost: support for hash report virtio-net feature
  tun: allow use of BPF_PROG_TYPE_SCHED_CLS program type
  tun: free bpf_program by bpf_prog_put instead of bpf_prog_destroy
  tun: add ioctl code TUNSETHASHPOPULATION
  tun: populate hash in virtio-net header when needed
  tun: report new tun feature IFF_HASH

 drivers/net/tun.c           | 43 +++++++++++++++++++++++++++++++------
 drivers/vhost/net.c         | 37 ++++++++++++++++++++++++-------
 include/linux/skbuff.h      |  7 +++++-
 include/uapi/linux/if_tun.h |  2 ++
 4 files changed, 74 insertions(+), 15 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC PATCH 1/7] skbuff: define field for hash report type
  2021-01-12 19:41 [RFC PATCH 0/7] Support for virtio-net hash reporting Yuri Benditovich
@ 2021-01-12 19:41 ` Yuri Benditovich
  2021-01-12 19:41 ` [RFC PATCH 2/7] vhost: support for hash report virtio-net feature Yuri Benditovich
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 26+ messages in thread
From: Yuri Benditovich @ 2021-01-12 19:41 UTC (permalink / raw)
  To: davem, kuba, mst, jasowang, ast, daniel, andrii, kafai,
	songliubraving, yhs, john.fastabend, kpsingh, rdunlap, willemb,
	gustavoars, herbert, steffen.klassert, nogikh, pablo, decui, cai,
	jakub, elver, pabeni, netdev, linux-kernel, kvm, virtualization,
	bpf
  Cc: yan

Used by virtio-net receive side scaling

Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
---
 include/linux/skbuff.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 416bf95cd5f2..36cf40ec0259 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -510,7 +510,7 @@ int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb,
  * the end of the header data, ie. at skb->end.
  */
 struct skb_shared_info {
-	__u8		__unused;
+	__u8		hash_report_type; /* virtio-net rss */
 	__u8		meta_len;
 	__u8		nr_frags;
 	__u8		tx_flags;
@@ -1430,6 +1430,11 @@ static inline struct skb_shared_hwtstamps *skb_hwtstamps(struct sk_buff *skb)
 	return &skb_shinfo(skb)->hwtstamps;
 }
 
+static inline __u8 *skb_hash_report_type(struct sk_buff *skb)
+{
+	return &skb_shinfo(skb)->hash_report_type;
+}
+
 static inline struct ubuf_info *skb_zcopy(struct sk_buff *skb)
 {
 	bool is_zcopy = skb && skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 2/7] vhost: support for hash report virtio-net feature
  2021-01-12 19:41 [RFC PATCH 0/7] Support for virtio-net hash reporting Yuri Benditovich
  2021-01-12 19:41 ` [RFC PATCH 1/7] skbuff: define field for hash report type Yuri Benditovich
@ 2021-01-12 19:41 ` Yuri Benditovich
  2021-01-12 19:41 ` [RFC PATCH 3/7] tun: allow use of BPF_PROG_TYPE_SCHED_CLS program type Yuri Benditovich
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 26+ messages in thread
From: Yuri Benditovich @ 2021-01-12 19:41 UTC (permalink / raw)
  To: davem, kuba, mst, jasowang, ast, daniel, andrii, kafai,
	songliubraving, yhs, john.fastabend, kpsingh, rdunlap, willemb,
	gustavoars, herbert, steffen.klassert, nogikh, pablo, decui, cai,
	jakub, elver, pabeni, netdev, linux-kernel, kvm, virtualization,
	bpf
  Cc: yan

According to the virtio specification, if the VIRTIO_NET_F_HASH_REPORT
feature is acked, the virtio-net header is extended to hold the hash
value and hash report type.
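
For reference, the extended header used here is defined in
include/uapi/linux/virtio_net.h roughly as follows (reproduced only for
illustration; the uapi header is the authoritative definition):

struct virtio_net_hdr_v1_hash {
	struct virtio_net_hdr_v1 hdr;
	__le32 hash_value;	/* hash of the packet */
	__le16 hash_report;	/* one of VIRTIO_NET_HASH_REPORT_* */
	__le16 padding;
};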

Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
---
 drivers/vhost/net.c | 37 +++++++++++++++++++++++++++++--------
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 531a00d703cd..31a894b9a992 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -73,7 +73,8 @@ enum {
 	VHOST_NET_FEATURES = VHOST_FEATURES |
 			 (1ULL << VHOST_NET_F_VIRTIO_NET_HDR) |
 			 (1ULL << VIRTIO_NET_F_MRG_RXBUF) |
-			 (1ULL << VIRTIO_F_ACCESS_PLATFORM)
+			 (1ULL << VIRTIO_F_ACCESS_PLATFORM) |
+			 (1ULL << VIRTIO_NET_F_HASH_REPORT)
 };
 
 enum {
@@ -1108,14 +1109,16 @@ static void handle_rx(struct vhost_net *net)
 		.msg_controllen = 0,
 		.msg_flags = MSG_DONTWAIT,
 	};
-	struct virtio_net_hdr hdr = {
-		.flags = 0,
-		.gso_type = VIRTIO_NET_HDR_GSO_NONE
+	struct virtio_net_hdr_v1_hash hdrv1 = {
+		{
+			.flags = 0,
+			.gso_type = VIRTIO_NET_HDR_GSO_NONE
+		}
 	};
 	size_t total_len = 0;
 	int err, mergeable;
 	s16 headcount;
-	size_t vhost_hlen, sock_hlen;
+	size_t vhost_hlen, sock_hlen, extra_hlen;
 	size_t vhost_len, sock_len;
 	bool busyloop_intr = false;
 	struct socket *sock;
@@ -1137,9 +1140,12 @@ static void handle_rx(struct vhost_net *net)
 	vhost_hlen = nvq->vhost_hlen;
 	sock_hlen = nvq->sock_hlen;
 
+
 	vq_log = unlikely(vhost_has_feature(vq, VHOST_F_LOG_ALL)) ?
 		vq->log : NULL;
 	mergeable = vhost_has_feature(vq, VIRTIO_NET_F_MRG_RXBUF);
+	extra_hlen = vhost_has_feature(vq, VIRTIO_NET_F_HASH_REPORT) ?
+		sizeof(hdrv1) - sizeof(hdrv1.hdr) : 0;
 
 	do {
 		sock_len = vhost_net_rx_peek_head_len(net, sock->sk,
@@ -1201,8 +1207,8 @@ static void handle_rx(struct vhost_net *net)
 		}
 		/* Supply virtio_net_hdr if VHOST_NET_F_VIRTIO_NET_HDR */
 		if (unlikely(vhost_hlen)) {
-			if (copy_to_iter(&hdr, sizeof(hdr),
-					 &fixup) != sizeof(hdr)) {
+			if (copy_to_iter(&hdrv1, sizeof(struct virtio_net_hdr),
+					 &fixup) != sizeof(struct virtio_net_hdr)) {
 				vq_err(vq, "Unable to write vnet_hdr "
 				       "at addr %p\n", vq->iov->iov_base);
 				goto out;
@@ -1211,7 +1217,7 @@ static void handle_rx(struct vhost_net *net)
 			/* Header came from socket; we'll need to patch
 			 * ->num_buffers over if VIRTIO_NET_F_MRG_RXBUF
 			 */
-			iov_iter_advance(&fixup, sizeof(hdr));
+			iov_iter_advance(&fixup, sizeof(struct virtio_net_hdr));
 		}
 		/* TODO: Should check and handle checksum. */
 
@@ -1223,6 +1229,18 @@ static void handle_rx(struct vhost_net *net)
 			vhost_discard_vq_desc(vq, headcount);
 			goto out;
 		}
+		if (unlikely(extra_hlen)) {
+			if (unlikely(vhost_hlen)) {
+				if (copy_to_iter(&hdrv1.hash_value, extra_hlen,
+						&fixup) != extra_hlen) {
+					vq_err(vq, "Unable to write extra_hdr "
+					"at addr %p\n", vq->iov->iov_base);
+					goto out;
+				}
+			} else {
+				iov_iter_advance(&fixup, extra_hlen);
+			}
+		}
 		nvq->done_idx += headcount;
 		if (nvq->done_idx > VHOST_NET_BATCH)
 			vhost_net_signal_used(nvq);
@@ -1624,6 +1642,9 @@ static int vhost_net_set_features(struct vhost_net *n, u64 features)
 			       (1ULL << VIRTIO_F_VERSION_1))) ?
 			sizeof(struct virtio_net_hdr_mrg_rxbuf) :
 			sizeof(struct virtio_net_hdr);
+	if (features & (1ULL << VIRTIO_NET_F_HASH_REPORT)) {
+		hdr_len = sizeof(struct virtio_net_hdr_v1_hash);
+	}
 	if (features & (1 << VHOST_NET_F_VIRTIO_NET_HDR)) {
 		/* vhost provides vnet_hdr */
 		vhost_hlen = hdr_len;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 3/7] tun: allow use of BPF_PROG_TYPE_SCHED_CLS program type
  2021-01-12 19:41 [RFC PATCH 0/7] Support for virtio-net hash reporting Yuri Benditovich
  2021-01-12 19:41 ` [RFC PATCH 1/7] skbuff: define field for hash report type Yuri Benditovich
  2021-01-12 19:41 ` [RFC PATCH 2/7] vhost: support for hash report virtio-net feature Yuri Benditovich
@ 2021-01-12 19:41 ` Yuri Benditovich
  2021-01-12 19:46   ` Alexei Starovoitov
  2021-01-12 20:40   ` Yuri Benditovich
  2021-01-12 19:41 ` [RFC PATCH 4/7] tun: free bpf_program by bpf_prog_put instead of bpf_prog_destroy Yuri Benditovich
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 26+ messages in thread
From: Yuri Benditovich @ 2021-01-12 19:41 UTC (permalink / raw)
  To: davem, kuba, mst, jasowang, ast, daniel, andrii, kafai,
	songliubraving, yhs, john.fastabend, kpsingh, rdunlap, willemb,
	gustavoars, herbert, steffen.klassert, nogikh, pablo, decui, cai,
	jakub, elver, pabeni, netdev, linux-kernel, kvm, virtualization,
	bpf
  Cc: yan

This program type can set the skb hash value. It will be useful
when tun supports the hash reporting feature of virtio-net.
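
For illustration, such a program could be attached with the existing
TUNSETSTEERINGEBPF ioctl roughly as follows (a minimal user-space sketch
assuming libbpf; error handling is abbreviated):

#include <bpf/libbpf.h>
#include <sys/ioctl.h>
#include <linux/if_tun.h>

/* Load a SCHED_CLS object file and install its first program as the
 * tun steering program. */
static int attach_steering_prog(int tun_fd, const char *obj_path)
{
	struct bpf_object *obj = bpf_object__open_file(obj_path, NULL);
	struct bpf_program *prog;
	int prog_fd;

	if (!obj || bpf_object__load(obj))
		return -1;

	prog = bpf_object__next_program(obj, NULL);
	if (!prog)
		return -1;

	prog_fd = bpf_program__fd(prog);
	return ioctl(tun_fd, TUNSETSTEERINGEBPF, &prog_fd);
}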

Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
---
 drivers/net/tun.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 7959b5c2d11f..455f7afc1f36 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2981,6 +2981,8 @@ static int tun_set_ebpf(struct tun_struct *tun, struct tun_prog __rcu **prog_p,
 		prog = NULL;
 	} else {
 		prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SOCKET_FILTER);
+		if (IS_ERR(prog))
+			prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SCHED_CLS);
 		if (IS_ERR(prog))
 			return PTR_ERR(prog);
 	}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 4/7] tun: free bpf_program by bpf_prog_put instead of bpf_prog_destroy
  2021-01-12 19:41 [RFC PATCH 0/7] Support for virtio-net hash reporting Yuri Benditovich
                   ` (2 preceding siblings ...)
  2021-01-12 19:41 ` [RFC PATCH 3/7] tun: allow use of BPF_PROG_TYPE_SCHED_CLS program type Yuri Benditovich
@ 2021-01-12 19:41 ` Yuri Benditovich
  2021-01-12 19:41 ` [RFC PATCH 5/7] tun: add ioctl code TUNSETHASHPOPULATION Yuri Benditovich
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 26+ messages in thread
From: Yuri Benditovich @ 2021-01-12 19:41 UTC (permalink / raw)
  To: davem, kuba, mst, jasowang, ast, daniel, andrii, kafai,
	songliubraving, yhs, john.fastabend, kpsingh, rdunlap, willemb,
	gustavoars, herbert, steffen.klassert, nogikh, pablo, decui, cai,
	jakub, elver, pabeni, netdev, linux-kernel, kvm, virtualization,
	bpf
  Cc: yan

The module never creates the BPF program with bpf_prog_create,
so it shouldn't free it with bpf_prog_destroy.
The program is obtained by bpf_prog_get and should be freed
by bpf_prog_put. For BPF_PROG_TYPE_SOCKET_FILTER both
methods do the same thing, but for other program types they don't.

Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
---
 drivers/net/tun.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 455f7afc1f36..18c1baf1a6c1 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2218,7 +2218,7 @@ static void tun_prog_free(struct rcu_head *rcu)
 {
 	struct tun_prog *prog = container_of(rcu, struct tun_prog, rcu);
 
-	bpf_prog_destroy(prog->prog);
+	bpf_prog_put(prog->prog);
 	kfree(prog);
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 5/7] tun: add ioctl code TUNSETHASHPOPULATION
  2021-01-12 19:41 [RFC PATCH 0/7] Support for virtio-net hash reporting Yuri Benditovich
                   ` (3 preceding siblings ...)
  2021-01-12 19:41 ` [RFC PATCH 4/7] tun: free bpf_program by bpf_prog_put instead of bpf_prog_destroy Yuri Benditovich
@ 2021-01-12 19:41 ` Yuri Benditovich
  2021-01-12 19:41 ` [RFC PATCH 6/7] tun: populate hash in virtio-net header when needed Yuri Benditovich
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 26+ messages in thread
From: Yuri Benditovich @ 2021-01-12 19:41 UTC (permalink / raw)
  To: davem, kuba, mst, jasowang, ast, daniel, andrii, kafai,
	songliubraving, yhs, john.fastabend, kpsingh, rdunlap, willemb,
	gustavoars, herbert, steffen.klassert, nogikh, pablo, decui, cai,
	jakub, elver, pabeni, netdev, linux-kernel, kvm, virtualization,
	bpf
  Cc: yan

A user-mode program calls this ioctl before loading the BPF
program to inform tun that the BPF program has extended
functionality, i.e. it sets the hash value and returns the
virtqueue number in the lower 16 bits and the type of the
hash report in the upper 16 bits.
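
A minimal user-space sketch of how this ioctl would be used (illustrative
only; the helper name is arbitrary):

#include <sys/ioctl.h>
#include <linux/if_tun.h>

/* Tell tun that the steering BPF program to be loaded also populates the
 * hash: it returns the queue index in the lower 16 bits and the hash
 * report type in the upper 16 bits, and stores the hash via bpf_set_hash(). */
static int enable_hash_population(int tun_fd)
{
	unsigned int populate = 1;

	return ioctl(tun_fd, TUNSETHASHPOPULATION, &populate);
}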

Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
---
 drivers/net/tun.c           | 12 +++++++++++-
 include/uapi/linux/if_tun.h |  1 +
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 18c1baf1a6c1..45f4f04a4a3e 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -197,6 +197,7 @@ struct tun_struct {
 	struct sock_fprog	fprog;
 	/* protected by rtnl lock */
 	bool			filter_attached;
+	bool                    bpf_populates_hash;
 	u32			msg_enable;
 	spinlock_t lock;
 	struct hlist_head flows[TUN_NUM_FLOW_ENTRIES];
@@ -2765,6 +2766,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 
 		tun->align = NET_SKB_PAD;
 		tun->filter_attached = false;
+		tun->bpf_populates_hash = false;
 		tun->sndbuf = tfile->socket.sk->sk_sndbuf;
 		tun->rx_batched = 0;
 		RCU_INIT_POINTER(tun->steering_prog, NULL);
@@ -2997,7 +2999,7 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 	struct net *net = sock_net(&tfile->sk);
 	struct tun_struct *tun;
 	void __user* argp = (void __user*)arg;
-	unsigned int ifindex, carrier;
+	unsigned int ifindex, carrier, populate_hash;
 	struct ifreq ifr;
 	kuid_t owner;
 	kgid_t group;
@@ -3298,6 +3300,14 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 		ret = open_related_ns(&net->ns, get_net_ns);
 		break;
 
+	case TUNSETHASHPOPULATION:
+		ret = -EFAULT;
+		if (copy_from_user(&populate_hash, argp, sizeof(populate_hash)))
+			goto unlock;
+		tun->bpf_populates_hash = !!populate_hash;
+		ret = 0;
+		break;
+
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
index 454ae31b93c7..0fd43533da26 100644
--- a/include/uapi/linux/if_tun.h
+++ b/include/uapi/linux/if_tun.h
@@ -61,6 +61,7 @@
 #define TUNSETFILTEREBPF _IOR('T', 225, int)
 #define TUNSETCARRIER _IOW('T', 226, int)
 #define TUNGETDEVNETNS _IO('T', 227)
+#define TUNSETHASHPOPULATION _IOR('T', 228, int)
 
 /* TUNSETIFF ifr flags */
 #define IFF_TUN		0x0001
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 6/7] tun: populate hash in virtio-net header when needed
  2021-01-12 19:41 [RFC PATCH 0/7] Support for virtio-net hash reporting Yuri Benditovich
                   ` (4 preceding siblings ...)
  2021-01-12 19:41 ` [RFC PATCH 5/7] tun: add ioctl code TUNSETHASHPOPULATION Yuri Benditovich
@ 2021-01-12 19:41 ` Yuri Benditovich
  2021-01-12 19:41 ` [RFC PATCH 7/7] tun: report new tun feature IFF_HASH Yuri Benditovich
  2021-01-12 19:49 ` [RFC PATCH 0/7] Support for virtio-net hash reporting Yuri Benditovich
  7 siblings, 0 replies; 26+ messages in thread
From: Yuri Benditovich @ 2021-01-12 19:41 UTC (permalink / raw)
  To: davem, kuba, mst, jasowang, ast, daniel, andrii, kafai,
	songliubraving, yhs, john.fastabend, kpsingh, rdunlap, willemb,
	gustavoars, herbert, steffen.klassert, nogikh, pablo, decui, cai,
	jakub, elver, pabeni, netdev, linux-kernel, kvm, virtualization,
	bpf
  Cc: yan

If the BPF program populated the hash in the skb, tun
propagates the hash value and hash report type to the
respective fields of the virtio-net header.

Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
---
 drivers/net/tun.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 45f4f04a4a3e..214feb0b16fb 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -556,15 +556,20 @@ static u16 tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb)
 {
 	struct tun_prog *prog;
 	u32 numqueues;
-	u16 ret = 0;
+	u32 ret = 0;
 
 	numqueues = READ_ONCE(tun->numqueues);
 	if (!numqueues)
 		return 0;
 
 	prog = rcu_dereference(tun->steering_prog);
-	if (prog)
+	if (prog) {
 		ret = bpf_prog_run_clear_cb(prog->prog, skb);
+		if (tun->bpf_populates_hash) {
+			*skb_hash_report_type(skb) = (__u8)(ret >> 16);
+			ret &= 0xffff;
+		}
+	}
 
 	return ret % numqueues;
 }
@@ -2062,6 +2067,7 @@ static ssize_t tun_put_user(struct tun_struct *tun,
 
 	if (vnet_hdr_sz) {
 		struct virtio_net_hdr gso;
+		__u16 extra_copy = 0;
 
 		if (iov_iter_count(iter) < vnet_hdr_sz)
 			return -EINVAL;
@@ -2085,7 +2091,20 @@ static ssize_t tun_put_user(struct tun_struct *tun,
 		if (copy_to_iter(&gso, sizeof(gso), iter) != sizeof(gso))
 			return -EFAULT;
 
-		iov_iter_advance(iter, vnet_hdr_sz - sizeof(gso));
+		if (tun->bpf_populates_hash &&
+		    vnet_hdr_sz >= sizeof(struct virtio_net_hdr_v1_hash)) {
+			struct virtio_net_hdr_v1_hash hdr;
+
+			hdr.hdr.num_buffers = 0;
+			hdr.hash_value = cpu_to_le32(skb_get_hash(skb));
+			hdr.hash_report = cpu_to_le16(*skb_hash_report_type(skb));
+			hdr.padding = 0;
+			extra_copy = sizeof(hdr) - sizeof(gso);
+			if (copy_to_iter(&hdr.hdr.num_buffers, extra_copy, iter) != extra_copy)
+				return -EFAULT;
+		}
+
+		iov_iter_advance(iter, vnet_hdr_sz - sizeof(gso) - extra_copy);
 	}
 
 	if (vlan_hlen) {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 7/7] tun: report new tun feature IFF_HASH
  2021-01-12 19:41 [RFC PATCH 0/7] Support for virtio-net hash reporting Yuri Benditovich
                   ` (5 preceding siblings ...)
  2021-01-12 19:41 ` [RFC PATCH 6/7] tun: populate hash in virtio-net header when needed Yuri Benditovich
@ 2021-01-12 19:41 ` Yuri Benditovich
  2021-01-12 19:49 ` [RFC PATCH 0/7] Support for virtio-net hash reporting Yuri Benditovich
  7 siblings, 0 replies; 26+ messages in thread
From: Yuri Benditovich @ 2021-01-12 19:41 UTC (permalink / raw)
  To: davem, kuba, mst, jasowang, ast, daniel, andrii, kafai,
	songliubraving, yhs, john.fastabend, kpsingh, rdunlap, willemb,
	gustavoars, herbert, steffen.klassert, nogikh, pablo, decui, cai,
	jakub, elver, pabeni, netdev, linux-kernel, kvm, virtualization,
	bpf
  Cc: yan

The IFF_HASH feature indicates that tun supports the
TUNSETHASHPOPULATION ioctl and can propagate the hash
data to the virtio-net header.
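
For illustration, user space could detect the feature roughly as follows
(a sketch using the existing TUNGETFEATURES ioctl):

#include <sys/ioctl.h>
#include <linux/if_tun.h>

/* Returns 1 if the tun device reports IFF_HASH among its features. */
static int tun_supports_hash(int tun_fd)
{
	unsigned int features = 0;

	if (ioctl(tun_fd, TUNGETFEATURES, &features) < 0)
		return 0;

	return !!(features & IFF_HASH);
}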

Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
---
 drivers/net/tun.c           | 2 +-
 include/uapi/linux/if_tun.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 214feb0b16fb..b46aa8941a9d 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -88,7 +88,7 @@ static void tun_default_link_ksettings(struct net_device *dev,
 #define TUN_VNET_LE     0x80000000
 #define TUN_VNET_BE     0x40000000
 
-#define TUN_FEATURES (IFF_NO_PI | IFF_ONE_QUEUE | IFF_VNET_HDR | \
+#define TUN_FEATURES (IFF_NO_PI | IFF_ONE_QUEUE | IFF_VNET_HDR | IFF_HASH |\
 		      IFF_MULTI_QUEUE | IFF_NAPI | IFF_NAPI_FRAGS)
 
 #define GOODCOPY_LEN 128
diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
index 0fd43533da26..116b84ede3a0 100644
--- a/include/uapi/linux/if_tun.h
+++ b/include/uapi/linux/if_tun.h
@@ -73,6 +73,7 @@
 #define IFF_ONE_QUEUE	0x2000
 #define IFF_VNET_HDR	0x4000
 #define IFF_TUN_EXCL	0x8000
+#define IFF_HASH	0x0080
 #define IFF_MULTI_QUEUE 0x0100
 #define IFF_ATTACH_QUEUE 0x0200
 #define IFF_DETACH_QUEUE 0x0400
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 3/7] tun: allow use of BPF_PROG_TYPE_SCHED_CLS program type
  2021-01-12 19:41 ` [RFC PATCH 3/7] tun: allow use of BPF_PROG_TYPE_SCHED_CLS program type Yuri Benditovich
@ 2021-01-12 19:46   ` Alexei Starovoitov
  2021-01-12 20:33     ` Yuri Benditovich
  2021-01-12 20:40   ` Yuri Benditovich
  1 sibling, 1 reply; 26+ messages in thread
From: Alexei Starovoitov @ 2021-01-12 19:46 UTC (permalink / raw)
  To: Yuri Benditovich
  Cc: David S. Miller, Jakub Kicinski, Michael S. Tsirkin, Jason Wang,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Randy Dunlap, Willem de Bruijn, gustavoars, Herbert Xu,
	Steffen Klassert, nogikh, Pablo Neira Ayuso, decui, cai,
	Jakub Sitnicki, Marco Elver, Paolo Abeni, Network Development,
	LKML, kvm, virtualization, bpf, Yan Vugenfirer

On Tue, Jan 12, 2021 at 11:42 AM Yuri Benditovich
<yuri.benditovich@daynix.com> wrote:
>
> This program type can set skb hash value. It will be useful
> when the tun will support hash reporting feature if virtio-net.
>
> Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
> ---
>  drivers/net/tun.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 7959b5c2d11f..455f7afc1f36 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -2981,6 +2981,8 @@ static int tun_set_ebpf(struct tun_struct *tun, struct tun_prog __rcu **prog_p,
>                 prog = NULL;
>         } else {
>                 prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SOCKET_FILTER);
> +               if (IS_ERR(prog))
> +                       prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SCHED_CLS);

You've ignored the feedback and just resend? what for?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 0/7] Support for virtio-net hash reporting
  2021-01-12 19:41 [RFC PATCH 0/7] Support for virtio-net hash reporting Yuri Benditovich
                   ` (6 preceding siblings ...)
  2021-01-12 19:41 ` [RFC PATCH 7/7] tun: report new tun feature IFF_HASH Yuri Benditovich
@ 2021-01-12 19:49 ` Yuri Benditovich
  2021-01-12 20:28   ` Yuri Benditovich
  7 siblings, 1 reply; 26+ messages in thread
From: Yuri Benditovich @ 2021-01-12 19:49 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Michael S . Tsirkin, Jason Wang,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, rdunlap, willemb, gustavoars, herbert,
	steffen.klassert, nogikh, pablo, decui, cai, jakub, elver,
	pabeni, netdev, linux-kernel, kvm, virtualization, bpf
  Cc: Yan Vugenfirer

On Tue, Jan 12, 2021 at 9:41 PM Yuri Benditovich
<yuri.benditovich@daynix.com> wrote:
>
> Existing TUN module is able to use provided "steering eBPF" to
> calculate per-packet hash and derive the destination queue to
> place the packet to. The eBPF uses mapped configuration data
> containing a key for hash calculation and indirection table
> with array of queues' indices.
>
> This series of patches adds support for virtio-net hash reporting
> feature as defined in virtio specification. It extends the TUN module
> and the "steering eBPF" as follows:
>
> Extended steering eBPF calculates the hash value and hash type, keeps
> hash value in the skb->hash and returns index of destination virtqueue
> and the type of the hash. TUN module keeps returned hash type in
> (currently unused) field of the skb.
> skb->__unused renamed to 'hash_report_type'.
>
> When TUN module is called later to allocate and fill the virtio-net
> header and push it to destination virtqueue it populates the hash
> and the hash type into virtio-net header.
>
> VHOST driver is made aware of respective virtio-net feature that
> extends the virtio-net header to report the hash value and hash report
> type.

Comment from Willem de Bruijn:

Skbuff fields are in short supply. I don't think we need to add one
just for this narrow path entirely internal to the tun device.

Instead, you could just run the flow_dissector in tun_put_user if the
feature is negotiated. Indeed, the flow dissector seems more apt to me
than BPF here. Note that the flow dissector internally can be
overridden by a BPF program if the admin so chooses.

This also hits on a deeper point with the choice of hash values, that
I also noticed in my RFC patchset to implement the inverse [1][2]. It
is much more detailed than skb->hash + skb->l4_hash currently offers,
and that can be gotten for free from most hardware. In most practical
cases, that information suffices. I added less specific fields
VIRTIO_NET_HASH_REPORT_L4, VIRTIO_NET_HASH_REPORT_OTHER that work
without explicit flow dissection. I understand that the existing
fields are part of the standard. Just curious, what is their purpose
beyond 4-tuple based flow hashing?

[1] https://patchwork.kernel.org/project/netdevbpf/list/?series=406859&state=*
[2] https://github.com/wdebruij/linux/commit/0f77febf22cd6ffc242a575807fa8382a26e511e
>
> Yuri Benditovich (7):
>   skbuff: define field for hash report type
>   vhost: support for hash report virtio-net feature
>   tun: allow use of BPF_PROG_TYPE_SCHED_CLS program type
>   tun: free bpf_program by bpf_prog_put instead of bpf_prog_destroy
>   tun: add ioctl code TUNSETHASHPOPULATION
>   tun: populate hash in virtio-net header when needed
>   tun: report new tun feature IFF_HASH
>
>  drivers/net/tun.c           | 43 +++++++++++++++++++++++++++++++------
>  drivers/vhost/net.c         | 37 ++++++++++++++++++++++++-------
>  include/linux/skbuff.h      |  7 +++++-
>  include/uapi/linux/if_tun.h |  2 ++
>  4 files changed, 74 insertions(+), 15 deletions(-)
>
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 0/7] Support for virtio-net hash reporting
  2021-01-12 19:49 ` [RFC PATCH 0/7] Support for virtio-net hash reporting Yuri Benditovich
@ 2021-01-12 20:28   ` Yuri Benditovich
  2021-01-12 23:47     ` Willem de Bruijn
  0 siblings, 1 reply; 26+ messages in thread
From: Yuri Benditovich @ 2021-01-12 20:28 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Michael S . Tsirkin, Jason Wang,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, rdunlap, willemb, gustavoars, herbert,
	steffen.klassert, pablo, decui, cai, jakub, elver, pabeni,
	netdev, linux-kernel, kvm, virtualization, bpf
  Cc: Yan Vugenfirer

On Tue, Jan 12, 2021 at 9:49 PM Yuri Benditovich
<yuri.benditovich@daynix.com> wrote:
>
> On Tue, Jan 12, 2021 at 9:41 PM Yuri Benditovich
> <yuri.benditovich@daynix.com> wrote:
> >
> > Existing TUN module is able to use provided "steering eBPF" to
> > calculate per-packet hash and derive the destination queue to
> > place the packet to. The eBPF uses mapped configuration data
> > containing a key for hash calculation and indirection table
> > with array of queues' indices.
> >
> > This series of patches adds support for virtio-net hash reporting
> > feature as defined in virtio specification. It extends the TUN module
> > and the "steering eBPF" as follows:
> >
> > Extended steering eBPF calculates the hash value and hash type, keeps
> > hash value in the skb->hash and returns index of destination virtqueue
> > and the type of the hash. TUN module keeps returned hash type in
> > (currently unused) field of the skb.
> > skb->__unused renamed to 'hash_report_type'.
> >
> > When TUN module is called later to allocate and fill the virtio-net
> > header and push it to destination virtqueue it populates the hash
> > and the hash type into virtio-net header.
> >
> > VHOST driver is made aware of respective virtio-net feature that
> > extends the virtio-net header to report the hash value and hash report
> > type.
>
> Comment from Willem de Bruijn:
>
> Skbuff fields are in short supply. I don't think we need to add one
> just for this narrow path entirely internal to the tun device.
>

We understand that and try to minimize the impact by using an already
existing unused field of skb.


> Instead, you could just run the flow_dissector in tun_put_user if the
> feature is negotiated. Indeed, the flow dissector seems more apt to me
> than BPF here. Note that the flow dissector internally can be
> overridden by a BPF program if the admin so chooses.
>
While this set of patches is related to hash delivery in the virtio-net
packet in general,
it was prepared in the context of the RSS feature implementation as
defined in the virtio spec [1].
In the case of RSS it is not enough to run the flow_dissector in tun_put_user:
in tun_ebpf_select_queue the TUN calls the eBPF program to calculate the hash,
hash type and queue index
according to the (mapped) parameters (key, hash types, indirection
table) received from the guest.
Our intention is to keep the hash and hash type in the skb to populate them
into the virtio-net header later in tun_put_user.
Note that in this case the type of the calculated hash is selected not
only from flow dissection
but also from the limitations provided by the guest.
This is already implemented in qemu (for the case of vhost=off), see [2]
(virtio_net_process_rss).
For the case of vhost=on there is work in progress for qemu to load the
eBPF program and attach it to TUN.

Note that the exact way of selecting the rx virtqueue depends on the guest;
it could be automatic steering (typical for a Linux VM), RSS (typical
for a Windows VM) or
any other steering mechanism implemented in a loadable TUN steering BPF
program, with or without hash calculation.

[1] https://github.com/oasis-tcs/virtio-spec/blob/master/content.tex#L3740
[2] https://github.com/qemu/qemu/blob/master/hw/net/virtio-net.c#L1591

> This also hits on a deeper point with the choice of hash values, that
> I also noticed in my RFC patchset to implement the inverse [1][2]. It
> is much more detailed than skb->hash + skb->l4_hash currently offers,
> and that can be gotten for free from most hardware.

Unfortunately, in the case of RSS we can't get this hash from the hardware,
as this requires configuring the NIC's hardware with the key and hash types
for the Toeplitz hash calculation.
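
For context, the Toeplitz hash mentioned here is roughly the following
computation over the n-tuple bytes with a configured key (an illustrative
user-space sketch, not code from any driver; the key is assumed to be at
least four bytes long):

#include <stdint.h>
#include <stddef.h>

static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
			      const uint8_t *input, size_t input_len)
{
	uint32_t hash = 0;
	/* 32-bit window over the key, initially its first four bytes */
	uint32_t window = ((uint32_t)key[0] << 24) | (key[1] << 16) |
			  (key[2] << 8) | key[3];

	for (size_t i = 0; i < input_len; i++) {
		for (int bit = 7; bit >= 0; bit--) {
			if (input[i] & (1 << bit))
				hash ^= window;
			/* slide the window left by one bit of the key */
			window <<= 1;
			size_t next = i * 8 + (7 - bit) + 32;
			if (next < key_len * 8 &&
			    (key[next / 8] & (1 << (7 - next % 8))))
				window |= 1;
		}
	}
	return hash;
}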

> In most practical
> cases, that information suffices. I added less specific fields
> VIRTIO_NET_HASH_REPORT_L4, VIRTIO_NET_HASH_REPORT_OTHER that work
> without explicit flow dissection. I understand that the existing
> fields are part of the standard. Just curious, what is their purpose
> beyond 4-tuple based flow hashing?

The hash is used in combination with the indirection table to select
the destination rx virtqueue.
The hash and hash type are to be reported in the virtio-net header, if
requested.
For a Windows VM, in case the device does not report the hash (even if
it calculated it to
schedule the packet to a proper queue), the driver must do that for each
packet (this is a certification requirement).

>
> [1] https://patchwork.kernel.org/project/netdevbpf/list/?series=406859&state=*
> [2] https://github.com/wdebruij/linux/commit/0f77febf22cd6ffc242a575807fa8382a26e511e
> >
> > Yuri Benditovich (7):
> >   skbuff: define field for hash report type
> >   vhost: support for hash report virtio-net feature
> >   tun: allow use of BPF_PROG_TYPE_SCHED_CLS program type
> >   tun: free bpf_program by bpf_prog_put instead of bpf_prog_destroy
> >   tun: add ioctl code TUNSETHASHPOPULATION
> >   tun: populate hash in virtio-net header when needed
> >   tun: report new tun feature IFF_HASH
> >
> >  drivers/net/tun.c           | 43 +++++++++++++++++++++++++++++++------
> >  drivers/vhost/net.c         | 37 ++++++++++++++++++++++++-------
> >  include/linux/skbuff.h      |  7 +++++-
> >  include/uapi/linux/if_tun.h |  2 ++
> >  4 files changed, 74 insertions(+), 15 deletions(-)
> >
> > --
> > 2.17.1
> >

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 3/7] tun: allow use of BPF_PROG_TYPE_SCHED_CLS program type
  2021-01-12 19:46   ` Alexei Starovoitov
@ 2021-01-12 20:33     ` Yuri Benditovich
  0 siblings, 0 replies; 26+ messages in thread
From: Yuri Benditovich @ 2021-01-12 20:33 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Jakub Kicinski, Michael S. Tsirkin, Jason Wang,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Randy Dunlap, Willem de Bruijn, gustavoars, Herbert Xu,
	Steffen Klassert, Pablo Neira Ayuso, decui, cai, Jakub Sitnicki,
	Marco Elver, Paolo Abeni, Network Development, LKML, kvm,
	virtualization, bpf, Yan Vugenfirer

On Tue, Jan 12, 2021 at 9:46 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Jan 12, 2021 at 11:42 AM Yuri Benditovich
> <yuri.benditovich@daynix.com> wrote:
> >
> > This program type can set skb hash value. It will be useful
> > when the tun will support hash reporting feature if virtio-net.
> >
> > Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
> > ---
> >  drivers/net/tun.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > index 7959b5c2d11f..455f7afc1f36 100644
> > --- a/drivers/net/tun.c
> > +++ b/drivers/net/tun.c
> > @@ -2981,6 +2981,8 @@ static int tun_set_ebpf(struct tun_struct *tun, struct tun_prog __rcu **prog_p,
> >                 prog = NULL;
> >         } else {
> >                 prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SOCKET_FILTER);
> > +               if (IS_ERR(prog))
> > +                       prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SCHED_CLS);
>
> You've ignored the feedback and just resend? what for?

No, I did not. Some patches did not reach the relevant people at all, so I
just resent _all_ the patches to all the people.
I will copy your earlier comment to this patch and address it in
the discussion.
Sorry for the misunderstanding and some redundant noise.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 3/7] tun: allow use of BPF_PROG_TYPE_SCHED_CLS program type
  2021-01-12 19:41 ` [RFC PATCH 3/7] tun: allow use of BPF_PROG_TYPE_SCHED_CLS program type Yuri Benditovich
  2021-01-12 19:46   ` Alexei Starovoitov
@ 2021-01-12 20:40   ` Yuri Benditovich
  2021-01-12 20:55     ` Yuri Benditovich
  1 sibling, 1 reply; 26+ messages in thread
From: Yuri Benditovich @ 2021-01-12 20:40 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Michael S . Tsirkin, Jason Wang,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, rdunlap, willemb, gustavoars, herbert,
	steffen.klassert, pablo, decui, cai, jakub, elver, pabeni,
	netdev, linux-kernel, kvm, virtualization, bpf
  Cc: Yan Vugenfirer

On Tue, Jan 12, 2021 at 9:42 PM Yuri Benditovich
<yuri.benditovich@daynix.com> wrote:
>
> This program type can set skb hash value. It will be useful
> when the tun will support hash reporting feature if virtio-net.
>
> Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
> ---
>  drivers/net/tun.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 7959b5c2d11f..455f7afc1f36 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -2981,6 +2981,8 @@ static int tun_set_ebpf(struct tun_struct *tun, struct tun_prog __rcu **prog_p,
>                 prog = NULL;
>         } else {
>                 prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SOCKET_FILTER);
> +               if (IS_ERR(prog))
> +                       prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SCHED_CLS);
>                 if (IS_ERR(prog))
>                         return PTR_ERR(prog);
>         }

Comment from Alexei Starovoitov:
Patches 1 and 2 are missing for me, so I couldn't review properly,
but this diff looks odd.
It allows sched_cls prog type to attach to tun.
That means everything that sched_cls progs can do will be done from tun hook?
sched_cls assumes l2 and can modify the packet.
I think crashes are inevitable.

> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 3/7] tun: allow use of BPF_PROG_TYPE_SCHED_CLS program type
  2021-01-12 20:40   ` Yuri Benditovich
@ 2021-01-12 20:55     ` Yuri Benditovich
  2021-01-18  9:16       ` Yuri Benditovich
  2021-01-20 18:44       ` Alexei Starovoitov
  0 siblings, 2 replies; 26+ messages in thread
From: Yuri Benditovich @ 2021-01-12 20:55 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Michael S . Tsirkin, Jason Wang,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, rdunlap, willemb, gustavoars, herbert,
	steffen.klassert, pablo, decui, cai, jakub, elver, pabeni,
	netdev, linux-kernel, kvm, virtualization, bpf
  Cc: Yan Vugenfirer

On Tue, Jan 12, 2021 at 10:40 PM Yuri Benditovich
<yuri.benditovich@daynix.com> wrote:
>
> On Tue, Jan 12, 2021 at 9:42 PM Yuri Benditovich
> <yuri.benditovich@daynix.com> wrote:
> >
> > This program type can set skb hash value. It will be useful
> > when the tun will support hash reporting feature if virtio-net.
> >
> > Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
> > ---
> >  drivers/net/tun.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > index 7959b5c2d11f..455f7afc1f36 100644
> > --- a/drivers/net/tun.c
> > +++ b/drivers/net/tun.c
> > @@ -2981,6 +2981,8 @@ static int tun_set_ebpf(struct tun_struct *tun, struct tun_prog __rcu **prog_p,
> >                 prog = NULL;
> >         } else {
> >                 prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SOCKET_FILTER);
> > +               if (IS_ERR(prog))
> > +                       prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SCHED_CLS);
> >                 if (IS_ERR(prog))
> >                         return PTR_ERR(prog);
> >         }
>
> Comment from Alexei Starovoitov:
> Patches 1 and 2 are missing for me, so I couldn't review properly,
> but this diff looks odd.
> It allows sched_cls prog type to attach to tun.
> That means everything that sched_cls progs can do will be done from tun hook?

We do not intend to modify the packet in this steering eBPF program.
There is just one function, unavailable for BPF_PROG_TYPE_SOCKET_FILTER,
that the eBPF program needs in order to deliver the hash to the guest
VM - it is 'bpf_set_hash'.

Does it mean that we need to define a new eBPF type for socket filter
operations + set_hash?

Our problem is that the eBPF program calculates a 32-bit hash, a 16-bit
queue index and an 8-bit hash type, but it is able to return only a single
32-bit integer. So in this set of patches the eBPF program returns the
queue index and the hash type, and saves the hash in skb->hash using
bpf_set_hash().
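
For clarity, the encoding described above would look roughly like this in
the steering program (an illustrative sketch; calculate_hash() and
select_queue() are placeholders for the Toeplitz hash and indirection-table
lookup driven by the mapped configuration):

#include <linux/bpf.h>
#include <linux/virtio_net.h>
#include <bpf/bpf_helpers.h>

static __always_inline __u32 calculate_hash(struct __sk_buff *skb)
{
	return skb->hash;	/* placeholder for the real Toeplitz calculation */
}

static __always_inline __u16 select_queue(__u32 hash)
{
	return hash % 2;	/* placeholder for the indirection table lookup */
}

SEC("tc")
int tun_rss_steering(struct __sk_buff *skb)
{
	__u32 hash = calculate_hash(skb);
	__u16 queue = select_queue(hash);
	__u8 hash_type = VIRTIO_NET_HASH_REPORT_TCPv4;	/* example value */

	bpf_set_hash(skb, hash);	/* 32-bit hash travels in skb->hash */

	/* queue index in the lower 16 bits, hash report type above it */
	return ((__u32)hash_type << 16) | queue;
}

char _license[] SEC("license") = "GPL";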

If this is unacceptable, can you please recommend a better solution?

> sched_cls assumes l2 and can modify the packet.

The steering eBPF program in the TUN module also assumes L2.

> I think crashes are inevitable.
>
> > --
> > 2.17.1
> >

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 0/7] Support for virtio-net hash reporting
  2021-01-12 20:28   ` Yuri Benditovich
@ 2021-01-12 23:47     ` Willem de Bruijn
  2021-01-13  4:05       ` Jason Wang
  0 siblings, 1 reply; 26+ messages in thread
From: Willem de Bruijn @ 2021-01-12 23:47 UTC (permalink / raw)
  To: Yuri Benditovich
  Cc: David S. Miller, Jakub Kicinski, Michael S . Tsirkin, Jason Wang,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, rdunlap, Gustavo A . R . Silva, Herbert Xu,
	Steffen Klassert, Pablo Neira Ayuso, decui, cai, Jakub Sitnicki,
	Marco Elver, Paolo Abeni, Network Development, linux-kernel, kvm,
	virtualization, bpf, Yan Vugenfirer

On Tue, Jan 12, 2021 at 3:29 PM Yuri Benditovich
<yuri.benditovich@daynix.com> wrote:
>
> On Tue, Jan 12, 2021 at 9:49 PM Yuri Benditovich
> <yuri.benditovich@daynix.com> wrote:
> >
> > On Tue, Jan 12, 2021 at 9:41 PM Yuri Benditovich
> > <yuri.benditovich@daynix.com> wrote:
> > >
> > > Existing TUN module is able to use provided "steering eBPF" to
> > > calculate per-packet hash and derive the destination queue to
> > > place the packet to. The eBPF uses mapped configuration data
> > > containing a key for hash calculation and indirection table
> > > with array of queues' indices.
> > >
> > > This series of patches adds support for virtio-net hash reporting
> > > feature as defined in virtio specification. It extends the TUN module
> > > and the "steering eBPF" as follows:
> > >
> > > Extended steering eBPF calculates the hash value and hash type, keeps
> > > hash value in the skb->hash and returns index of destination virtqueue
> > > and the type of the hash. TUN module keeps returned hash type in
> > > (currently unused) field of the skb.
> > > skb->__unused renamed to 'hash_report_type'.
> > >
> > > When TUN module is called later to allocate and fill the virtio-net
> > > header and push it to destination virtqueue it populates the hash
> > > and the hash type into virtio-net header.
> > >
> > > VHOST driver is made aware of respective virtio-net feature that
> > > extends the virtio-net header to report the hash value and hash report
> > > type.
> >
> > Comment from Willem de Bruijn:
> >
> > Skbuff fields are in short supply. I don't think we need to add one
> > just for this narrow path entirely internal to the tun device.
> >
>
> We understand that and try to minimize the impact by using an already
> existing unused field of skb.

Not anymore. It was repurposed as a flags field very recently.

This use case is also very narrow in scope. And a very short path from
data producer to consumer. So I don't think it needs to claim scarce
bits in the skb.

tun_ebpf_select_queue stores the field, tun_put_user reads it and
converts it to the virtio_net_hdr in the descriptor.

tun_ebpf_select_queue is called from .ndo_select_queue.  Storing the
field in skb->cb is fragile, as in theory some code could overwrite
that field between ndo_select_queue and
ndo_start_xmit/tun_net_xmit, from which point it is fully under tun
control again. But in practice, I don't believe anything does.

Alternatively an existing skb field that is used only on disjoint
datapaths, such as ingress-only, could be viable.

> > Instead, you could just run the flow_dissector in tun_put_user if the
> > feature is negotiated. Indeed, the flow dissector seems more apt to me
> > than BPF here. Note that the flow dissector internally can be
> > overridden by a BPF program if the admin so chooses.
> >
> When this set of patches is related to hash delivery in the virtio-net
> packet in general,
> it was prepared in context of RSS feature implementation as defined in
> virtio spec [1]
> In case of RSS it is not enough to run the flow_dissector in tun_put_user:
> in tun_ebpf_select_queue the TUN calls eBPF to calculate the hash,
> hash type and queue index
> according to the (mapped) parameters (key, hash types, indirection
> table) received from the guest.

TUNSETSTEERINGEBPF was added to support more diverse queue selection
than the default in case of multiqueue tun. Not sure what the exact
use cases are.

But RSS is exactly the purpose of the flow dissector. It is used for
that purpose in the software variant RPS. The flow dissector
implements a superset of the RSS spec, and certainly computes a
four-tuple for TCP/IPv6. In the case of RPS, it is skipped if the NIC
has already computed a 4-tuple hash.

What it does not give is a type indication, such as
VIRTIO_NET_HASH_TYPE_TCPv6. I don't understand how this would be used.
In datapaths where the NIC has already computed the four-tuple hash
and stored it in skb->hash --the common case for servers--, that type
field is the only reason to have to compute it again.

> Our intention is to keep the hash and hash type in the skb to populate them
> into a virtio-net header later in tun_put_user.
> Note that in this case the type of calculated hash is selected not
> only from flow dissections
> but also from limitations provided by the guest.
>
> This is already implemented in qemu (for case of vhost=off), see [2]
> (virtio_net_process_rss)
> For case of vhost=on there are WIP for qemu to load eBPF and attach it to TUN.

> Note that exact way of selecting rx virtqueue depends on the guest,
> it could be automatic steering (typical for Linux VM), RSS (typical
> for Windows VM) or
> any other steering mechanism implemented in loadable TUN steering BPF with
> or without hash calculation.
>
> [1] https://github.com/oasis-tcs/virtio-spec/blob/master/content.tex#L3740
> [2] https://github.com/qemu/qemu/blob/master/hw/net/virtio-net.c#L1591
>
> > This also hits on a deeper point with the choice of hash values, that
> > I also noticed in my RFC patchset to implement the inverse [1][2]. It
> > is much more detailed than skb->hash + skb->l4_hash currently offers,
> > and that can be gotten for free from most hardware.
>
> Unfortunately in the case of RSS we can't get this hash from the hardware as
> this requires configuration of the NIC's hardware with key and hash types for
> Toeplitz hash calculation.

I don't understand. Toeplitz hash calculation is enabled by default
for multiqueue devices, and many devices will pass the Toeplitz hash
along for free to avoid software flow dissection.

> > In most practical
> > cases, that information suffices. I added less specific fields
> > VIRTIO_NET_HASH_REPORT_L4, VIRTIO_NET_HASH_REPORT_OTHER that work
> > without explicit flow dissection. I understand that the existing
> > fields are part of the standard. Just curious, what is their purpose
> > beyond 4-tuple based flow hashing?
>
> The hash is used in combination with the indirection table to select
> destination rx virtqueue.
> The hash and hash type are to be reported in virtio-net header, if requested.
> For Windows VM - in case the device does not report the hash (even if
> it calculated it to
> schedule the packet to a proper queue), the driver must do that for each packet
> (this is a certification requirement).

I understand the basics of RSS. My question is what the hash-type is
intended to be used for by the guest. It is part of the virtio spec,
so this point is somewhat moot: it has to be passed along with the
hash value now.

But it is not entirely moot. If most users are satisfied with knowing
whether a hash is L4 or not, we could add two new types
VIRTIO_NET_HASH_TYPE_L4 and VIRTIO_NET_HASH_TYPE_OTHER. And then pass
the existing skb->hash as is, likely computed by the NIC.

[1] https://patchwork.kernel.org/project/netdevbpf/patch/20201228162233.2032571-2-willemdebruijn.kernel@gmail.com/

> >
> > [1] https://patchwork.kernel.org/project/netdevbpf/list/?series=406859&state=*
> > [2] https://github.com/wdebruij/linux/commit/0f77febf22cd6ffc242a575807fa8382a26e511e
> > >
> > > Yuri Benditovich (7):
> > >   skbuff: define field for hash report type
> > >   vhost: support for hash report virtio-net feature
> > >   tun: allow use of BPF_PROG_TYPE_SCHED_CLS program type
> > >   tun: free bpf_program by bpf_prog_put instead of bpf_prog_destroy
> > >   tun: add ioctl code TUNSETHASHPOPULATION
> > >   tun: populate hash in virtio-net header when needed
> > >   tun: report new tun feature IFF_HASH
> > >
> > >  drivers/net/tun.c           | 43 +++++++++++++++++++++++++++++++------
> > >  drivers/vhost/net.c         | 37 ++++++++++++++++++++++++-------
> > >  include/linux/skbuff.h      |  7 +++++-
> > >  include/uapi/linux/if_tun.h |  2 ++
> > >  4 files changed, 74 insertions(+), 15 deletions(-)
> > >
> > > --
> > > 2.17.1
> > >

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 0/7] Support for virtio-net hash reporting
  2021-01-12 23:47     ` Willem de Bruijn
@ 2021-01-13  4:05       ` Jason Wang
  2021-01-13 14:33         ` Willem de Bruijn
  0 siblings, 1 reply; 26+ messages in thread
From: Jason Wang @ 2021-01-13  4:05 UTC (permalink / raw)
  To: Willem de Bruijn, Yuri Benditovich
  Cc: David S. Miller, Jakub Kicinski, Michael S . Tsirkin,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, rdunlap, Gustavo A . R . Silva, Herbert Xu,
	Steffen Klassert, Pablo Neira Ayuso, decui, cai, Jakub Sitnicki,
	Marco Elver, Paolo Abeni, Network Development, linux-kernel, kvm,
	virtualization, bpf, Yan Vugenfirer


On 2021/1/13 7:47 AM, Willem de Bruijn wrote:
> On Tue, Jan 12, 2021 at 3:29 PM Yuri Benditovich
> <yuri.benditovich@daynix.com> wrote:
>> On Tue, Jan 12, 2021 at 9:49 PM Yuri Benditovich
>> <yuri.benditovich@daynix.com> wrote:
>>> On Tue, Jan 12, 2021 at 9:41 PM Yuri Benditovich
>>> <yuri.benditovich@daynix.com> wrote:
>>>> Existing TUN module is able to use provided "steering eBPF" to
>>>> calculate per-packet hash and derive the destination queue to
>>>> place the packet to. The eBPF uses mapped configuration data
>>>> containing a key for hash calculation and indirection table
>>>> with array of queues' indices.
>>>>
>>>> This series of patches adds support for virtio-net hash reporting
>>>> feature as defined in virtio specification. It extends the TUN module
>>>> and the "steering eBPF" as follows:
>>>>
>>>> Extended steering eBPF calculates the hash value and hash type, keeps
>>>> hash value in the skb->hash and returns index of destination virtqueue
>>>> and the type of the hash. TUN module keeps returned hash type in
>>>> (currently unused) field of the skb.
>>>> skb->__unused renamed to 'hash_report_type'.
>>>>
>>>> When TUN module is called later to allocate and fill the virtio-net
>>>> header and push it to destination virtqueue it populates the hash
>>>> and the hash type into virtio-net header.
>>>>
>>>> VHOST driver is made aware of respective virtio-net feature that
>>>> extends the virtio-net header to report the hash value and hash report
>>>> type.
>>> Comment from Willem de Bruijn:
>>>
>>> Skbuff fields are in short supply. I don't think we need to add one
>>> just for this narrow path entirely internal to the tun device.
>>>
>> We understand that and try to minimize the impact by using an already
>> existing unused field of skb.
> Not anymore. It was repurposed as a flags field very recently.
>
> This use case is also very narrow in scope. And a very short path from
> data producer to consumer. So I don't think it needs to claim scarce
> bits in the skb.
>
> tun_ebpf_select_queue stores the field, tun_put_user reads it and
> converts it to the virtio_net_hdr in the descriptor.
>
> tun_ebpf_select_queue is called from .ndo_select_queue.  Storing the
> field in skb->cb is fragile, as in theory some code could overwrite
> that between field between ndo_select_queue and
> ndo_start_xmit/tun_net_xmit, from which point it is fully under tun
> control again. But in practice, I don't believe anything does.
>
> Alternatively an existing skb field that is used only on disjoint
> datapaths, such as ingress-only, could be viable.


A question here. We had metadata support in XDP for cooperation between 
eBPF programs. Do we have something similar in the skb?

E.g. in the RSS case, if we want to pass some metadata information between 
the eBPF program and the logic that generates the vnet header (either hard 
logic in the kernel or another eBPF program), is there any way to avoid 
possible conflicts with qdiscs?


>
>>> Instead, you could just run the flow_dissector in tun_put_user if the
>>> feature is negotiated. Indeed, the flow dissector seems more apt to me
>>> than BPF here. Note that the flow dissector internally can be
>>> overridden by a BPF program if the admin so chooses.
>>>
>> When this set of patches is related to hash delivery in the virtio-net
>> packet in general,
>> it was prepared in context of RSS feature implementation as defined in
>> virtio spec [1]
>> In case of RSS it is not enough to run the flow_dissector in tun_put_user:
>> in tun_ebpf_select_queue the TUN calls eBPF to calculate the hash,
>> hash type and queue index
>> according to the (mapped) parameters (key, hash types, indirection
>> table) received from the guest.
> TUNSETSTEERINGEBPF was added to support more diverse queue selection
> than the default in case of multiqueue tun. Not sure what the exact
> use cases are.
>
> But RSS is exactly the purpose of the flow dissector. It is used for
> that purpose in the software variant RPS. The flow dissector
> implements a superset of the RSS spec, and certainly computes a
> four-tuple for TCP/IPv6. In the case of RPS, it is skipped if the NIC
> has already computed a 4-tuple hash.
>
> What it does not give is a type indication, such as
> VIRTIO_NET_HASH_TYPE_TCPv6. I don't understand how this would be used.
> In datapaths where the NIC has already computed the four-tuple hash
> and stored it in skb->hash --the common case for servers--, That type
> field is the only reason to have to compute again.


The problem is there's no guarantee that the packet comes from the NIC; 
it could be a simple VM2VM or host2VM packet.

And even if the packet is coming from a NIC that calculates the hash, 
there's no guarantee that it's the hash that the guest wants (the guest 
may use different RSS keys).

Thanks


>
>> Our intention is to keep the hash and hash type in the skb to populate them
>> into a virtio-net header later in tun_put_user.
>> Note that in this case the type of calculated hash is selected not
>> only from flow dissections
>> but also from limitations provided by the guest.
>>
>> This is already implemented in qemu (for case of vhost=off), see [2]
>> (virtio_net_process_rss)
>> For case of vhost=on there are WIP for qemu to load eBPF and attach it to TUN.
>> Note that exact way of selecting rx virtqueue depends on the guest,
>> it could be automatic steering (typical for Linux VM), RSS (typical
>> for Windows VM) or
>> any other steering mechanism implemented in loadable TUN steering BPF with
>> or without hash calculation.
>>
>> [1] https://github.com/oasis-tcs/virtio-spec/blob/master/content.tex#L3740
>> [2] https://github.com/qemu/qemu/blob/master/hw/net/virtio-net.c#L1591
>>
>>> This also hits on a deeper point with the choice of hash values, that
>>> I also noticed in my RFC patchset to implement the inverse [1][2]. It
>>> is much more detailed than skb->hash + skb->l4_hash currently offers,
>>> and that can be gotten for free from most hardware.
>> Unfortunately in the case of RSS we can't get this hash from the hardware as
>> this requires configuration of the NIC's hardware with key and hash types for
>> Toeplitz hash calculation.
> I don't understand. Toeplitz hash calculation is enabled by default
> for multiqueue devices, and many devices will pass the toeplitz hash
> along for free to avoid software flow dissection.
>
>>> In most practical
>>> cases, that information suffices. I added less specific fields
>>> VIRTIO_NET_HASH_REPORT_L4, VIRTIO_NET_HASH_REPORT_OTHER that work
>>> without explicit flow dissection. I understand that the existing
>>> fields are part of the standard. Just curious, what is their purpose
>>> beyond 4-tuple based flow hashing?
>> The hash is used in combination with the indirection table to select
>> destination rx virtqueue.
>> The hash and hash type are to be reported in virtio-net header, if requested.
>> For Windows VM - in case the device does not report the hash (even if
>> it calculated it to
>> schedule the packet to a proper queue), the driver must do that for each packet
>> (this is a certification requirement).
> I understand the basics of RSS. My question is what the hash-type is
> intended to be used for by the guest. It is part of the virtio spec,
> so this point is somewhat moot: it has to be passed along with the
> hash value now.
>
> But it is not entirely moot. If most users are satisfied with knowing
> whether a hash is L4 or not, we could add two new types
> VIRTIO_NET_HASH_TYPE_L4 and VIRTIO_NET_HASH_TYPE_OTHER. And then pass
> the existing skb->hash as is, likely computed by the NIC.
>
> [1] https://patchwork.kernel.org/project/netdevbpf/patch/20201228162233.2032571-2-willemdebruijn.kernel@gmail.com/
>
>>> [1] https://patchwork.kernel.org/project/netdevbpf/list/?series=406859&state=*
>>> [2] https://github.com/wdebruij/linux/commit/0f77febf22cd6ffc242a575807fa8382a26e511e
>>>> Yuri Benditovich (7):
>>>>    skbuff: define field for hash report type
>>>>    vhost: support for hash report virtio-net feature
>>>>    tun: allow use of BPF_PROG_TYPE_SCHED_CLS program type
>>>>    tun: free bpf_program by bpf_prog_put instead of bpf_prog_destroy
>>>>    tun: add ioctl code TUNSETHASHPOPULATION
>>>>    tun: populate hash in virtio-net header when needed
>>>>    tun: report new tun feature IFF_HASH
>>>>
>>>>   drivers/net/tun.c           | 43 +++++++++++++++++++++++++++++++------
>>>>   drivers/vhost/net.c         | 37 ++++++++++++++++++++++++-------
>>>>   include/linux/skbuff.h      |  7 +++++-
>>>>   include/uapi/linux/if_tun.h |  2 ++
>>>>   4 files changed, 74 insertions(+), 15 deletions(-)
>>>>
>>>> --
>>>> 2.17.1
>>>>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 0/7] Support for virtio-net hash reporting
  2021-01-13  4:05       ` Jason Wang
@ 2021-01-13 14:33         ` Willem de Bruijn
  2021-01-14  3:38           ` Jason Wang
  0 siblings, 1 reply; 26+ messages in thread
From: Willem de Bruijn @ 2021-01-13 14:33 UTC (permalink / raw)
  To: Jason Wang
  Cc: Willem de Bruijn, Yuri Benditovich, David S. Miller,
	Jakub Kicinski, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, rdunlap,
	Gustavo A . R . Silva, Herbert Xu, Steffen Klassert,
	Pablo Neira Ayuso, decui, cai, Jakub Sitnicki, Marco Elver,
	Paolo Abeni, Network Development, linux-kernel, kvm,
	virtualization, bpf, Yan Vugenfirer

On Tue, Jan 12, 2021 at 11:11 PM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2021/1/13 上午7:47, Willem de Bruijn wrote:
> > On Tue, Jan 12, 2021 at 3:29 PM Yuri Benditovich
> > <yuri.benditovich@daynix.com> wrote:
> >> On Tue, Jan 12, 2021 at 9:49 PM Yuri Benditovich
> >> <yuri.benditovich@daynix.com> wrote:
> >>> On Tue, Jan 12, 2021 at 9:41 PM Yuri Benditovich
> >>> <yuri.benditovich@daynix.com> wrote:
> >>>> Existing TUN module is able to use provided "steering eBPF" to
> >>>> calculate per-packet hash and derive the destination queue to
> >>>> place the packet to. The eBPF uses mapped configuration data
> >>>> containing a key for hash calculation and indirection table
> >>>> with array of queues' indices.
> >>>>
> >>>> This series of patches adds support for virtio-net hash reporting
> >>>> feature as defined in virtio specification. It extends the TUN module
> >>>> and the "steering eBPF" as follows:
> >>>>
> >>>> Extended steering eBPF calculates the hash value and hash type, keeps
> >>>> hash value in the skb->hash and returns index of destination virtqueue
> >>>> and the type of the hash. TUN module keeps returned hash type in
> >>>> (currently unused) field of the skb.
> >>>> skb->__unused renamed to 'hash_report_type'.
> >>>>
> >>>> When TUN module is called later to allocate and fill the virtio-net
> >>>> header and push it to destination virtqueue it populates the hash
> >>>> and the hash type into virtio-net header.
> >>>>
> >>>> VHOST driver is made aware of respective virtio-net feature that
> >>>> extends the virtio-net header to report the hash value and hash report
> >>>> type.
> >>> Comment from Willem de Bruijn:
> >>>
> >>> Skbuff fields are in short supply. I don't think we need to add one
> >>> just for this narrow path entirely internal to the tun device.
> >>>
> >> We understand that and try to minimize the impact by using an already
> >> existing unused field of skb.
> > Not anymore. It was repurposed as a flags field very recently.
> >
> > This use case is also very narrow in scope. And a very short path from
> > data producer to consumer. So I don't think it needs to claim scarce
> > bits in the skb.
> >
> > tun_ebpf_select_queue stores the field, tun_put_user reads it and
> > converts it to the virtio_net_hdr in the descriptor.
> >
> > tun_ebpf_select_queue is called from .ndo_select_queue.  Storing the
> > field in skb->cb is fragile, as in theory some code could overwrite
> > that between field between ndo_select_queue and
> > ndo_start_xmit/tun_net_xmit, from which point it is fully under tun
> > control again. But in practice, I don't believe anything does.
> >
> > Alternatively an existing skb field that is used only on disjoint
> > datapaths, such as ingress-only, could be viable.
>
>
> A question here. We had metadata support in XDP for cooperation between
> eBPF programs. Do we have something similar in the skb?
>
> E.g in the RSS, if we want to pass some metadata information between
> eBPF program and the logic that generates the vnet header (either hard
> logic in the kernel or another eBPF program). Is there any way that can
> avoid the possible conflicts of qdiscs?

Not that I am aware of. The closest thing is cb[].

It'll have to alias a field like that, one that is known to be unused for the given path.

One other approach that has been used within linear call stacks is out
of band state, like the percpu variables softnet_data.xmit.more and
mirred_rec_level. But that is perhaps a bit overwrought for this use
case.
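
To make the cb aliasing option concrete, here is a minimal sketch
(illustrative only; the struct and helper names are hypothetical and not
part of this series):

/* Purely illustrative: alias a byte of skb->cb on the short tun-internal
 * path from .ndo_select_queue to tun_put_user.
 */
struct tun_skb_cb {
	u8 hash_report_type;
};
#define TUN_SKB_CB(skb) ((struct tun_skb_cb *)&((skb)->cb[0]))

static inline void tun_skb_set_hash_type(struct sk_buff *skb, u8 type)
{
	BUILD_BUG_ON(sizeof(struct tun_skb_cb) >
		     sizeof_field(struct sk_buff, cb));
	TUN_SKB_CB(skb)->hash_report_type = type;
}

static inline u8 tun_skb_hash_type(const struct sk_buff *skb)
{
	return ((const struct tun_skb_cb *)skb->cb)->hash_report_type;
}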

> >
> >>> Instead, you could just run the flow_dissector in tun_put_user if the
> >>> feature is negotiated. Indeed, the flow dissector seems more apt to me
> >>> than BPF here. Note that the flow dissector internally can be
> >>> overridden by a BPF program if the admin so chooses.
> >>>
> >> When this set of patches is related to hash delivery in the virtio-net
> >> packet in general,
> >> it was prepared in context of RSS feature implementation as defined in
> >> virtio spec [1]
> >> In case of RSS it is not enough to run the flow_dissector in tun_put_user:
> >> in tun_ebpf_select_queue the TUN calls eBPF to calculate the hash,
> >> hash type and queue index
> >> according to the (mapped) parameters (key, hash types, indirection
> >> table) received from the guest.
> > TUNSETSTEERINGEBPF was added to support more diverse queue selection
> > than the default in case of multiqueue tun. Not sure what the exact
> > use cases are.
> >
> > But RSS is exactly the purpose of the flow dissector. It is used for
> > that purpose in the software variant RPS. The flow dissector
> > implements a superset of the RSS spec, and certainly computes a
> > four-tuple for TCP/IPv6. In the case of RPS, it is skipped if the NIC
> > has already computed a 4-tuple hash.
> >
> > What it does not give is a type indication, such as
> > VIRTIO_NET_HASH_TYPE_TCPv6. I don't understand how this would be used.
> > In datapaths where the NIC has already computed the four-tuple hash
> > and stored it in skb->hash --the common case for servers--, That type
> > field is the only reason to have to compute again.
>
>
> The problem is there's no guarantee that the packet comes from the NIC,
> it could be a simple VM2VM or host2VM packet.
>
> And even if the packet is coming from the NIC that calculates the hash
> there's no guarantee that it's the has that guest want (guest may use
> different RSS keys).

Ah yes, of course.

I would still revisit the need to store a detailed hash_type along with
the hash, as, as far as I can tell, that conveys no actionable information
to the guest.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 0/7] Support for virtio-net hash reporting
  2021-01-13 14:33         ` Willem de Bruijn
@ 2021-01-14  3:38           ` Jason Wang
  2021-01-17  7:57             ` Yuri Benditovich
  0 siblings, 1 reply; 26+ messages in thread
From: Jason Wang @ 2021-01-14  3:38 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Yuri Benditovich, David S. Miller, Jakub Kicinski,
	Michael S . Tsirkin, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, rdunlap, Gustavo A . R . Silva,
	Herbert Xu, Steffen Klassert, Pablo Neira Ayuso, decui, cai,
	Jakub Sitnicki, Marco Elver, Paolo Abeni, Network Development,
	linux-kernel, kvm, virtualization, bpf, Yan Vugenfirer


On 2021/1/13 下午10:33, Willem de Bruijn wrote:
> On Tue, Jan 12, 2021 at 11:11 PM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2021/1/13 上午7:47, Willem de Bruijn wrote:
>>> On Tue, Jan 12, 2021 at 3:29 PM Yuri Benditovich
>>> <yuri.benditovich@daynix.com> wrote:
>>>> On Tue, Jan 12, 2021 at 9:49 PM Yuri Benditovich
>>>> <yuri.benditovich@daynix.com> wrote:
>>>>> On Tue, Jan 12, 2021 at 9:41 PM Yuri Benditovich
>>>>> <yuri.benditovich@daynix.com> wrote:
>>>>>> Existing TUN module is able to use provided "steering eBPF" to
>>>>>> calculate per-packet hash and derive the destination queue to
>>>>>> place the packet to. The eBPF uses mapped configuration data
>>>>>> containing a key for hash calculation and indirection table
>>>>>> with array of queues' indices.
>>>>>>
>>>>>> This series of patches adds support for virtio-net hash reporting
>>>>>> feature as defined in virtio specification. It extends the TUN module
>>>>>> and the "steering eBPF" as follows:
>>>>>>
>>>>>> Extended steering eBPF calculates the hash value and hash type, keeps
>>>>>> hash value in the skb->hash and returns index of destination virtqueue
>>>>>> and the type of the hash. TUN module keeps returned hash type in
>>>>>> (currently unused) field of the skb.
>>>>>> skb->__unused renamed to 'hash_report_type'.
>>>>>>
>>>>>> When TUN module is called later to allocate and fill the virtio-net
>>>>>> header and push it to destination virtqueue it populates the hash
>>>>>> and the hash type into virtio-net header.
>>>>>>
>>>>>> VHOST driver is made aware of respective virtio-net feature that
>>>>>> extends the virtio-net header to report the hash value and hash report
>>>>>> type.
>>>>> Comment from Willem de Bruijn:
>>>>>
>>>>> Skbuff fields are in short supply. I don't think we need to add one
>>>>> just for this narrow path entirely internal to the tun device.
>>>>>
>>>> We understand that and try to minimize the impact by using an already
>>>> existing unused field of skb.
>>> Not anymore. It was repurposed as a flags field very recently.
>>>
>>> This use case is also very narrow in scope. And a very short path from
>>> data producer to consumer. So I don't think it needs to claim scarce
>>> bits in the skb.
>>>
>>> tun_ebpf_select_queue stores the field, tun_put_user reads it and
>>> converts it to the virtio_net_hdr in the descriptor.
>>>
>>> tun_ebpf_select_queue is called from .ndo_select_queue.  Storing the
>>> field in skb->cb is fragile, as in theory some code could overwrite
>>> that between field between ndo_select_queue and
>>> ndo_start_xmit/tun_net_xmit, from which point it is fully under tun
>>> control again. But in practice, I don't believe anything does.
>>>
>>> Alternatively an existing skb field that is used only on disjoint
>>> datapaths, such as ingress-only, could be viable.
>>
>> A question here. We had metadata support in XDP for cooperation between
>> eBPF programs. Do we have something similar in the skb?
>>
>> E.g in the RSS, if we want to pass some metadata information between
>> eBPF program and the logic that generates the vnet header (either hard
>> logic in the kernel or another eBPF program). Is there any way that can
>> avoid the possible conflicts of qdiscs?
> Not that I am aware of. The closest thing is cb[].
>
> It'll have to aliase a field like that, that is known unused for the given path.


Right, we need to make sure cb is not used by anything else. I'm not sure
how hard that is to achieve, considering QEMU installs the eBPF program
but doesn't deal with networking configuration.


>
> One other approach that has been used within linear call stacks is out
> of band. Like percpu variables softnet_data.xmit.more and
> mirred_rec_level. But that is perhaps a bit overwrought for this use
> case.


Yes, and if we go that way then eBPF turns out to be a burden since we
need to invent helpers to access those auxiliary data structures. It
would then be better to hard-code the RSS logic in the kernel.


>
>>>>> Instead, you could just run the flow_dissector in tun_put_user if the
>>>>> feature is negotiated. Indeed, the flow dissector seems more apt to me
>>>>> than BPF here. Note that the flow dissector internally can be
>>>>> overridden by a BPF program if the admin so chooses.
>>>>>
>>>> When this set of patches is related to hash delivery in the virtio-net
>>>> packet in general,
>>>> it was prepared in context of RSS feature implementation as defined in
>>>> virtio spec [1]
>>>> In case of RSS it is not enough to run the flow_dissector in tun_put_user:
>>>> in tun_ebpf_select_queue the TUN calls eBPF to calculate the hash,
>>>> hash type and queue index
>>>> according to the (mapped) parameters (key, hash types, indirection
>>>> table) received from the guest.
>>> TUNSETSTEERINGEBPF was added to support more diverse queue selection
>>> than the default in case of multiqueue tun. Not sure what the exact
>>> use cases are.
>>>
>>> But RSS is exactly the purpose of the flow dissector. It is used for
>>> that purpose in the software variant RPS. The flow dissector
>>> implements a superset of the RSS spec, and certainly computes a
>>> four-tuple for TCP/IPv6. In the case of RPS, it is skipped if the NIC
>>> has already computed a 4-tuple hash.
>>>
>>> What it does not give is a type indication, such as
>>> VIRTIO_NET_HASH_TYPE_TCPv6. I don't understand how this would be used.
>>> In datapaths where the NIC has already computed the four-tuple hash
>>> and stored it in skb->hash --the common case for servers--, That type
>>> field is the only reason to have to compute again.
>>
>> The problem is there's no guarantee that the packet comes from the NIC,
>> it could be a simple VM2VM or host2VM packet.
>>
>> And even if the packet is coming from the NIC that calculates the hash
>> there's no guarantee that it's the has that guest want (guest may use
>> different RSS keys).
> Ah yes, of course.
>
> I would still revisit the need to store a detailed hash_type along with
> the hash, as as far I can tell that conveys no actionable information
> to the guest.


Yes, we need to figure out its usage. According to [1], it only mentions
that storing the hash type is the driver's responsibility. Maybe Yuri can answer this.

Thanks

[1] 
https://docs.microsoft.com/en-us/windows-hardware/drivers/network/indicating-rss-receive-data


>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 0/7] Support for virtio-net hash reporting
  2021-01-14  3:38           ` Jason Wang
@ 2021-01-17  7:57             ` Yuri Benditovich
  2021-01-18  2:46               ` Jason Wang
  0 siblings, 1 reply; 26+ messages in thread
From: Yuri Benditovich @ 2021-01-17  7:57 UTC (permalink / raw)
  To: Jason Wang
  Cc: Willem de Bruijn, David S. Miller, Jakub Kicinski,
	Michael S . Tsirkin, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Randy Dunlap, Gustavo A . R . Silva,
	Herbert Xu, Steffen Klassert, Pablo Neira Ayuso, decui, cai,
	Jakub Sitnicki, Marco Elver, Paolo Abeni, Network Development,
	linux-kernel, kvm, virtualization, bpf, Yan Vugenfirer

On Thu, Jan 14, 2021 at 5:39 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2021/1/13 下午10:33, Willem de Bruijn wrote:
> > On Tue, Jan 12, 2021 at 11:11 PM Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> On 2021/1/13 上午7:47, Willem de Bruijn wrote:
> >>> On Tue, Jan 12, 2021 at 3:29 PM Yuri Benditovich
> >>> <yuri.benditovich@daynix.com> wrote:
> >>>> On Tue, Jan 12, 2021 at 9:49 PM Yuri Benditovich
> >>>> <yuri.benditovich@daynix.com> wrote:
> >>>>> On Tue, Jan 12, 2021 at 9:41 PM Yuri Benditovich
> >>>>> <yuri.benditovich@daynix.com> wrote:
> >>>>>> Existing TUN module is able to use provided "steering eBPF" to
> >>>>>> calculate per-packet hash and derive the destination queue to
> >>>>>> place the packet to. The eBPF uses mapped configuration data
> >>>>>> containing a key for hash calculation and indirection table
> >>>>>> with array of queues' indices.
> >>>>>>
> >>>>>> This series of patches adds support for virtio-net hash reporting
> >>>>>> feature as defined in virtio specification. It extends the TUN module
> >>>>>> and the "steering eBPF" as follows:
> >>>>>>
> >>>>>> Extended steering eBPF calculates the hash value and hash type, keeps
> >>>>>> hash value in the skb->hash and returns index of destination virtqueue
> >>>>>> and the type of the hash. TUN module keeps returned hash type in
> >>>>>> (currently unused) field of the skb.
> >>>>>> skb->__unused renamed to 'hash_report_type'.
> >>>>>>
> >>>>>> When TUN module is called later to allocate and fill the virtio-net
> >>>>>> header and push it to destination virtqueue it populates the hash
> >>>>>> and the hash type into virtio-net header.
> >>>>>>
> >>>>>> VHOST driver is made aware of respective virtio-net feature that
> >>>>>> extends the virtio-net header to report the hash value and hash report
> >>>>>> type.
> >>>>> Comment from Willem de Bruijn:
> >>>>>
> >>>>> Skbuff fields are in short supply. I don't think we need to add one
> >>>>> just for this narrow path entirely internal to the tun device.
> >>>>>
> >>>> We understand that and try to minimize the impact by using an already
> >>>> existing unused field of skb.
> >>> Not anymore. It was repurposed as a flags field very recently.
> >>>
> >>> This use case is also very narrow in scope. And a very short path from
> >>> data producer to consumer. So I don't think it needs to claim scarce
> >>> bits in the skb.
> >>>
> >>> tun_ebpf_select_queue stores the field, tun_put_user reads it and
> >>> converts it to the virtio_net_hdr in the descriptor.
> >>>
> >>> tun_ebpf_select_queue is called from .ndo_select_queue.  Storing the
> >>> field in skb->cb is fragile, as in theory some code could overwrite
> >>> that between field between ndo_select_queue and
> >>> ndo_start_xmit/tun_net_xmit, from which point it is fully under tun
> >>> control again. But in practice, I don't believe anything does.
> >>>
> >>> Alternatively an existing skb field that is used only on disjoint
> >>> datapaths, such as ingress-only, could be viable.
> >>
> >> A question here. We had metadata support in XDP for cooperation between
> >> eBPF programs. Do we have something similar in the skb?
> >>
> >> E.g in the RSS, if we want to pass some metadata information between
> >> eBPF program and the logic that generates the vnet header (either hard
> >> logic in the kernel or another eBPF program). Is there any way that can
> >> avoid the possible conflicts of qdiscs?
> > Not that I am aware of. The closest thing is cb[].
> >
> > It'll have to aliase a field like that, that is known unused for the given path.
>
>
> Right, we need to make sure cb is not used by other ones. I'm not sure
> how hard to achieve that consider Qemu installs the eBPF program but it
> doesn't deal with networking configurations.
>
>
> >
> > One other approach that has been used within linear call stacks is out
> > of band. Like percpu variables softnet_data.xmit.more and
> > mirred_rec_level. But that is perhaps a bit overwrought for this use
> > case.
>
>
> Yes, and if we go that way then eBPF turns out to be a burden since we
> need to invent helpers to access those auxiliary data structure. It
> would be better then to hard-coded the RSS in the kernel.
>
>
> >
> >>>>> Instead, you could just run the flow_dissector in tun_put_user if the
> >>>>> feature is negotiated. Indeed, the flow dissector seems more apt to me
> >>>>> than BPF here. Note that the flow dissector internally can be
> >>>>> overridden by a BPF program if the admin so chooses.
> >>>>>
> >>>> When this set of patches is related to hash delivery in the virtio-net
> >>>> packet in general,
> >>>> it was prepared in context of RSS feature implementation as defined in
> >>>> virtio spec [1]
> >>>> In case of RSS it is not enough to run the flow_dissector in tun_put_user:
> >>>> in tun_ebpf_select_queue the TUN calls eBPF to calculate the hash,
> >>>> hash type and queue index
> >>>> according to the (mapped) parameters (key, hash types, indirection
> >>>> table) received from the guest.
> >>> TUNSETSTEERINGEBPF was added to support more diverse queue selection
> >>> than the default in case of multiqueue tun. Not sure what the exact
> >>> use cases are.
> >>>
> >>> But RSS is exactly the purpose of the flow dissector. It is used for
> >>> that purpose in the software variant RPS. The flow dissector
> >>> implements a superset of the RSS spec, and certainly computes a
> >>> four-tuple for TCP/IPv6. In the case of RPS, it is skipped if the NIC
> >>> has already computed a 4-tuple hash.
> >>>
> >>> What it does not give is a type indication, such as
> >>> VIRTIO_NET_HASH_TYPE_TCPv6. I don't understand how this would be used.
> >>> In datapaths where the NIC has already computed the four-tuple hash
> >>> and stored it in skb->hash --the common case for servers--, That type
> >>> field is the only reason to have to compute again.
> >>
> >> The problem is there's no guarantee that the packet comes from the NIC,
> >> it could be a simple VM2VM or host2VM packet.
> >>
> >> And even if the packet is coming from the NIC that calculates the hash
> >> there's no guarantee that it's the has that guest want (guest may use
> >> different RSS keys).
> > Ah yes, of course.
> >
> > I would still revisit the need to store a detailed hash_type along with
> > the hash, as as far I can tell that conveys no actionable information
> > to the guest.
>
>
> Yes, need to figure out its usage. According to [1], it only mention
> that storing has type is a charge of driver. Maybe Yuri can answer this.
>

For the case of a Windows VM we can't know exactly how the network stack
uses the provided hash data (including the hash type). But different
releases of Windows enable different hash types (for example, UDP hashing
is enabled only on Server 2016 and up).

Indeed Windows requires a little more from the network adapter/driver
than Linux does.

The addition of RSS support to the virtio specification takes into account
the widest set of requirements (i.e. the Windows one); our initial
impression is that this should also be enough for Linux.

The RSS part of the NDIS specification is _mandatory_ and there are
certification tests that check that the driver provides the hash data
as expected. All high-performance network adapters implement such RSS
functionality in hardware.
With pre-RSS QEMU (i.e. where the virtio-net device does not indicate
RSS support) the virtio-net driver for Windows does all the work
related to RSS:
- hash calculation
- hash/hash_type delivery
- reporting each packet on the correct CPU according to the RSS settings

With RSS support in QEMU all packets always arrive on the proper CPU and
the driver never needs to reschedule them. The driver still needs to
calculate the hash and report it to Windows. In this case we do the same
job twice: the device (QEMU or eBPF) calculates the hash and picks the
proper queue/CPU to deliver the packet, but the hash is not delivered by
the device, so the driver has to recalculate it and report it to Windows.

If we add HASH_REPORT support (the current set of patches) and the device
indicates this feature, we can avoid the hash recalculation in the driver,
assuming we receive the correct hash value and hash type. Otherwise the
driver can't know exactly which hash the device has calculated.

Please let me know if I did not answer the question.
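
For reference, the extended header this feature negotiates looks roughly
like the following sketch, following the virtio spec (the Linux uapi
defines an equivalent struct virtio_net_hdr_v1_hash; exact field spellings
there may differ slightly):

/* virtio-net header when VIRTIO_NET_F_HASH_REPORT is negotiated */
struct virtio_net_hdr_v1_hash {
	struct virtio_net_hdr_v1 hdr;
	__le32 hash_value;	/* hash the device calculated, or 0 */
	__le16 hash_report;	/* VIRTIO_NET_HASH_REPORT_* type, or NONE */
	__le16 padding_reserved;
};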

> Thanks
>
> [1]
> https://docs.microsoft.com/en-us/windows-hardware/drivers/network/indicating-rss-receive-data
>
>
> >
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 0/7] Support for virtio-net hash reporting
  2021-01-17  7:57             ` Yuri Benditovich
@ 2021-01-18  2:46               ` Jason Wang
  2021-01-18  9:09                 ` Yuri Benditovich
  0 siblings, 1 reply; 26+ messages in thread
From: Jason Wang @ 2021-01-18  2:46 UTC (permalink / raw)
  To: Yuri Benditovich
  Cc: Willem de Bruijn, David S. Miller, Jakub Kicinski,
	Michael S . Tsirkin, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Randy Dunlap, Gustavo A . R . Silva,
	Herbert Xu, Steffen Klassert, Pablo Neira Ayuso, decui, cai,
	Jakub Sitnicki, Marco Elver, Paolo Abeni, Network Development,
	linux-kernel, kvm, virtualization, bpf, Yan Vugenfirer


On 2021/1/17 下午3:57, Yuri Benditovich wrote:
> On Thu, Jan 14, 2021 at 5:39 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2021/1/13 下午10:33, Willem de Bruijn wrote:
>>> On Tue, Jan 12, 2021 at 11:11 PM Jason Wang <jasowang@redhat.com> wrote:
>>>> On 2021/1/13 上午7:47, Willem de Bruijn wrote:
>>>>> On Tue, Jan 12, 2021 at 3:29 PM Yuri Benditovich
>>>>> <yuri.benditovich@daynix.com> wrote:
>>>>>> On Tue, Jan 12, 2021 at 9:49 PM Yuri Benditovich
>>>>>> <yuri.benditovich@daynix.com> wrote:
>>>>>>> On Tue, Jan 12, 2021 at 9:41 PM Yuri Benditovich
>>>>>>> <yuri.benditovich@daynix.com> wrote:
>>>>>>>> Existing TUN module is able to use provided "steering eBPF" to
>>>>>>>> calculate per-packet hash and derive the destination queue to
>>>>>>>> place the packet to. The eBPF uses mapped configuration data
>>>>>>>> containing a key for hash calculation and indirection table
>>>>>>>> with array of queues' indices.
>>>>>>>>
>>>>>>>> This series of patches adds support for virtio-net hash reporting
>>>>>>>> feature as defined in virtio specification. It extends the TUN module
>>>>>>>> and the "steering eBPF" as follows:
>>>>>>>>
>>>>>>>> Extended steering eBPF calculates the hash value and hash type, keeps
>>>>>>>> hash value in the skb->hash and returns index of destination virtqueue
>>>>>>>> and the type of the hash. TUN module keeps returned hash type in
>>>>>>>> (currently unused) field of the skb.
>>>>>>>> skb->__unused renamed to 'hash_report_type'.
>>>>>>>>
>>>>>>>> When TUN module is called later to allocate and fill the virtio-net
>>>>>>>> header and push it to destination virtqueue it populates the hash
>>>>>>>> and the hash type into virtio-net header.
>>>>>>>>
>>>>>>>> VHOST driver is made aware of respective virtio-net feature that
>>>>>>>> extends the virtio-net header to report the hash value and hash report
>>>>>>>> type.
>>>>>>> Comment from Willem de Bruijn:
>>>>>>>
>>>>>>> Skbuff fields are in short supply. I don't think we need to add one
>>>>>>> just for this narrow path entirely internal to the tun device.
>>>>>>>
>>>>>> We understand that and try to minimize the impact by using an already
>>>>>> existing unused field of skb.
>>>>> Not anymore. It was repurposed as a flags field very recently.
>>>>>
>>>>> This use case is also very narrow in scope. And a very short path from
>>>>> data producer to consumer. So I don't think it needs to claim scarce
>>>>> bits in the skb.
>>>>>
>>>>> tun_ebpf_select_queue stores the field, tun_put_user reads it and
>>>>> converts it to the virtio_net_hdr in the descriptor.
>>>>>
>>>>> tun_ebpf_select_queue is called from .ndo_select_queue.  Storing the
>>>>> field in skb->cb is fragile, as in theory some code could overwrite
>>>>> that between field between ndo_select_queue and
>>>>> ndo_start_xmit/tun_net_xmit, from which point it is fully under tun
>>>>> control again. But in practice, I don't believe anything does.
>>>>>
>>>>> Alternatively an existing skb field that is used only on disjoint
>>>>> datapaths, such as ingress-only, could be viable.
>>>> A question here. We had metadata support in XDP for cooperation between
>>>> eBPF programs. Do we have something similar in the skb?
>>>>
>>>> E.g in the RSS, if we want to pass some metadata information between
>>>> eBPF program and the logic that generates the vnet header (either hard
>>>> logic in the kernel or another eBPF program). Is there any way that can
>>>> avoid the possible conflicts of qdiscs?
>>> Not that I am aware of. The closest thing is cb[].
>>>
>>> It'll have to aliase a field like that, that is known unused for the given path.
>>
>> Right, we need to make sure cb is not used by other ones. I'm not sure
>> how hard to achieve that consider Qemu installs the eBPF program but it
>> doesn't deal with networking configurations.
>>
>>
>>> One other approach that has been used within linear call stacks is out
>>> of band. Like percpu variables softnet_data.xmit.more and
>>> mirred_rec_level. But that is perhaps a bit overwrought for this use
>>> case.
>>
>> Yes, and if we go that way then eBPF turns out to be a burden since we
>> need to invent helpers to access those auxiliary data structure. It
>> would be better then to hard-coded the RSS in the kernel.
>>
>>
>>>>>>> Instead, you could just run the flow_dissector in tun_put_user if the
>>>>>>> feature is negotiated. Indeed, the flow dissector seems more apt to me
>>>>>>> than BPF here. Note that the flow dissector internally can be
>>>>>>> overridden by a BPF program if the admin so chooses.
>>>>>>>
>>>>>> When this set of patches is related to hash delivery in the virtio-net
>>>>>> packet in general,
>>>>>> it was prepared in context of RSS feature implementation as defined in
>>>>>> virtio spec [1]
>>>>>> In case of RSS it is not enough to run the flow_dissector in tun_put_user:
>>>>>> in tun_ebpf_select_queue the TUN calls eBPF to calculate the hash,
>>>>>> hash type and queue index
>>>>>> according to the (mapped) parameters (key, hash types, indirection
>>>>>> table) received from the guest.
>>>>> TUNSETSTEERINGEBPF was added to support more diverse queue selection
>>>>> than the default in case of multiqueue tun. Not sure what the exact
>>>>> use cases are.
>>>>>
>>>>> But RSS is exactly the purpose of the flow dissector. It is used for
>>>>> that purpose in the software variant RPS. The flow dissector
>>>>> implements a superset of the RSS spec, and certainly computes a
>>>>> four-tuple for TCP/IPv6. In the case of RPS, it is skipped if the NIC
>>>>> has already computed a 4-tuple hash.
>>>>>
>>>>> What it does not give is a type indication, such as
>>>>> VIRTIO_NET_HASH_TYPE_TCPv6. I don't understand how this would be used.
>>>>> In datapaths where the NIC has already computed the four-tuple hash
>>>>> and stored it in skb->hash --the common case for servers--, That type
>>>>> field is the only reason to have to compute again.
>>>> The problem is there's no guarantee that the packet comes from the NIC,
>>>> it could be a simple VM2VM or host2VM packet.
>>>>
>>>> And even if the packet is coming from the NIC that calculates the hash
>>>> there's no guarantee that it's the has that guest want (guest may use
>>>> different RSS keys).
>>> Ah yes, of course.
>>>
>>> I would still revisit the need to store a detailed hash_type along with
>>> the hash, as as far I can tell that conveys no actionable information
>>> to the guest.
>>
>> Yes, need to figure out its usage. According to [1], it only mention
>> that storing has type is a charge of driver. Maybe Yuri can answer this.
>>
> For the case of Windows VM we can't know how exactly the network stack
> uses provided hash data (including hash type). But: different releases
> of Windows
> enable different hash types (for example UDP hash is enabled only on
> Server 2016 and up).
>
> Indeed the Windows requires a little more from the network adapter/driver
> than Linux does.
>
> The addition of RSS support to virtio specification takes in account
> the widest set of
> requirements (i.e. Windows one), our initial impression is that this
> should be enough also for Linux.
>
> The NDIS specification in part of RSS is _mandatory_ and there are
> certification tests
> that check that the driver provides the hash data as expected. All the
> high-performance
> network adapters have such RSS functionality in the hardware.
> With pre-RSS QEMU (i.e. where the virtio-net device does not indicate
> the RSS support)
> the virtio-net driver for Windows does all the job related to RSS:
> - hash calculation
> - hash/hash_type delivery
> - reporting each packet on the correct CPU according to RSS settings
>
> With RSS support in QEMU all the packets always come on a proper CPU and
> the driver never needs to reschedule them. The driver still need to
> calculate the
> hash and report it to Windows. In this case we do the same job twice: the device
> (QEMU or eBPF) does calculate the hash and get proper queue/CPU to deliver
> the packet. But the hash is not delivered by the device, so the driver needs to
> recalculate it and report to the Windows.
>
> If we add HASH_REPORT support (current set of patches) and the device
> indicates this
> feature we can avoid hash recalculation in the driver assuming we
> receive the correct hash
> value and hash type. Otherwise the driver can't know which exactly
> hash the device has calculated.
>
> Please let me know if I did not answer the question.


I think I get you. The hash type is also a kind of classification (e.g.
TCP or UDP). Is there any possibility that it can be deduced by the driver?
(Or would that be too expensive?)

Thanks


>
>> Thanks
>>
>> [1]
>> https://docs.microsoft.com/en-us/windows-hardware/drivers/network/indicating-rss-receive-data
>>
>>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 0/7] Support for virtio-net hash reporting
  2021-01-18  2:46               ` Jason Wang
@ 2021-01-18  9:09                 ` Yuri Benditovich
  2021-01-18 15:19                   ` Willem de Bruijn
  0 siblings, 1 reply; 26+ messages in thread
From: Yuri Benditovich @ 2021-01-18  9:09 UTC (permalink / raw)
  To: Jason Wang
  Cc: Willem de Bruijn, David S. Miller, Jakub Kicinski,
	Michael S . Tsirkin, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Randy Dunlap, Gustavo A . R . Silva,
	Herbert Xu, Steffen Klassert, Pablo Neira Ayuso, decui, cai,
	Jakub Sitnicki, Marco Elver, Paolo Abeni, Network Development,
	linux-kernel, kvm, virtualization, bpf, Yan Vugenfirer

On Mon, Jan 18, 2021 at 4:46 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2021/1/17 下午3:57, Yuri Benditovich wrote:
> > On Thu, Jan 14, 2021 at 5:39 AM Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> On 2021/1/13 下午10:33, Willem de Bruijn wrote:
> >>> On Tue, Jan 12, 2021 at 11:11 PM Jason Wang <jasowang@redhat.com> wrote:
> >>>> On 2021/1/13 上午7:47, Willem de Bruijn wrote:
> >>>>> On Tue, Jan 12, 2021 at 3:29 PM Yuri Benditovich
> >>>>> <yuri.benditovich@daynix.com> wrote:
> >>>>>> On Tue, Jan 12, 2021 at 9:49 PM Yuri Benditovich
> >>>>>> <yuri.benditovich@daynix.com> wrote:
> >>>>>>> On Tue, Jan 12, 2021 at 9:41 PM Yuri Benditovich
> >>>>>>> <yuri.benditovich@daynix.com> wrote:
> >>>>>>>> Existing TUN module is able to use provided "steering eBPF" to
> >>>>>>>> calculate per-packet hash and derive the destination queue to
> >>>>>>>> place the packet to. The eBPF uses mapped configuration data
> >>>>>>>> containing a key for hash calculation and indirection table
> >>>>>>>> with array of queues' indices.
> >>>>>>>>
> >>>>>>>> This series of patches adds support for virtio-net hash reporting
> >>>>>>>> feature as defined in virtio specification. It extends the TUN module
> >>>>>>>> and the "steering eBPF" as follows:
> >>>>>>>>
> >>>>>>>> Extended steering eBPF calculates the hash value and hash type, keeps
> >>>>>>>> hash value in the skb->hash and returns index of destination virtqueue
> >>>>>>>> and the type of the hash. TUN module keeps returned hash type in
> >>>>>>>> (currently unused) field of the skb.
> >>>>>>>> skb->__unused renamed to 'hash_report_type'.
> >>>>>>>>
> >>>>>>>> When TUN module is called later to allocate and fill the virtio-net
> >>>>>>>> header and push it to destination virtqueue it populates the hash
> >>>>>>>> and the hash type into virtio-net header.
> >>>>>>>>
> >>>>>>>> VHOST driver is made aware of respective virtio-net feature that
> >>>>>>>> extends the virtio-net header to report the hash value and hash report
> >>>>>>>> type.
> >>>>>>> Comment from Willem de Bruijn:
> >>>>>>>
> >>>>>>> Skbuff fields are in short supply. I don't think we need to add one
> >>>>>>> just for this narrow path entirely internal to the tun device.
> >>>>>>>
> >>>>>> We understand that and try to minimize the impact by using an already
> >>>>>> existing unused field of skb.
> >>>>> Not anymore. It was repurposed as a flags field very recently.
> >>>>>
> >>>>> This use case is also very narrow in scope. And a very short path from
> >>>>> data producer to consumer. So I don't think it needs to claim scarce
> >>>>> bits in the skb.
> >>>>>
> >>>>> tun_ebpf_select_queue stores the field, tun_put_user reads it and
> >>>>> converts it to the virtio_net_hdr in the descriptor.
> >>>>>
> >>>>> tun_ebpf_select_queue is called from .ndo_select_queue.  Storing the
> >>>>> field in skb->cb is fragile, as in theory some code could overwrite
> >>>>> that between field between ndo_select_queue and
> >>>>> ndo_start_xmit/tun_net_xmit, from which point it is fully under tun
> >>>>> control again. But in practice, I don't believe anything does.
> >>>>>
> >>>>> Alternatively an existing skb field that is used only on disjoint
> >>>>> datapaths, such as ingress-only, could be viable.
> >>>> A question here. We had metadata support in XDP for cooperation between
> >>>> eBPF programs. Do we have something similar in the skb?
> >>>>
> >>>> E.g in the RSS, if we want to pass some metadata information between
> >>>> eBPF program and the logic that generates the vnet header (either hard
> >>>> logic in the kernel or another eBPF program). Is there any way that can
> >>>> avoid the possible conflicts of qdiscs?
> >>> Not that I am aware of. The closest thing is cb[].
> >>>
> >>> It'll have to aliase a field like that, that is known unused for the given path.
> >>
> >> Right, we need to make sure cb is not used by other ones. I'm not sure
> >> how hard to achieve that consider Qemu installs the eBPF program but it
> >> doesn't deal with networking configurations.
> >>
> >>
> >>> One other approach that has been used within linear call stacks is out
> >>> of band. Like percpu variables softnet_data.xmit.more and
> >>> mirred_rec_level. But that is perhaps a bit overwrought for this use
> >>> case.
> >>
> >> Yes, and if we go that way then eBPF turns out to be a burden since we
> >> need to invent helpers to access those auxiliary data structure. It
> >> would be better then to hard-coded the RSS in the kernel.
> >>
> >>
> >>>>>>> Instead, you could just run the flow_dissector in tun_put_user if the
> >>>>>>> feature is negotiated. Indeed, the flow dissector seems more apt to me
> >>>>>>> than BPF here. Note that the flow dissector internally can be
> >>>>>>> overridden by a BPF program if the admin so chooses.
> >>>>>>>
> >>>>>> When this set of patches is related to hash delivery in the virtio-net
> >>>>>> packet in general,
> >>>>>> it was prepared in context of RSS feature implementation as defined in
> >>>>>> virtio spec [1]
> >>>>>> In case of RSS it is not enough to run the flow_dissector in tun_put_user:
> >>>>>> in tun_ebpf_select_queue the TUN calls eBPF to calculate the hash,
> >>>>>> hash type and queue index
> >>>>>> according to the (mapped) parameters (key, hash types, indirection
> >>>>>> table) received from the guest.
> >>>>> TUNSETSTEERINGEBPF was added to support more diverse queue selection
> >>>>> than the default in case of multiqueue tun. Not sure what the exact
> >>>>> use cases are.
> >>>>>
> >>>>> But RSS is exactly the purpose of the flow dissector. It is used for
> >>>>> that purpose in the software variant RPS. The flow dissector
> >>>>> implements a superset of the RSS spec, and certainly computes a
> >>>>> four-tuple for TCP/IPv6. In the case of RPS, it is skipped if the NIC
> >>>>> has already computed a 4-tuple hash.
> >>>>>
> >>>>> What it does not give is a type indication, such as
> >>>>> VIRTIO_NET_HASH_TYPE_TCPv6. I don't understand how this would be used.
> >>>>> In datapaths where the NIC has already computed the four-tuple hash
> >>>>> and stored it in skb->hash --the common case for servers--, That type
> >>>>> field is the only reason to have to compute again.
> >>>> The problem is there's no guarantee that the packet comes from the NIC,
> >>>> it could be a simple VM2VM or host2VM packet.
> >>>>
> >>>> And even if the packet is coming from the NIC that calculates the hash
> >>>> there's no guarantee that it's the has that guest want (guest may use
> >>>> different RSS keys).
> >>> Ah yes, of course.
> >>>
> >>> I would still revisit the need to store a detailed hash_type along with
> >>> the hash, as as far I can tell that conveys no actionable information
> >>> to the guest.
> >>
> >> Yes, need to figure out its usage. According to [1], it only mention
> >> that storing has type is a charge of driver. Maybe Yuri can answer this.
> >>
> > For the case of Windows VM we can't know how exactly the network stack
> > uses provided hash data (including hash type). But: different releases
> > of Windows
> > enable different hash types (for example UDP hash is enabled only on
> > Server 2016 and up).
> >
> > Indeed the Windows requires a little more from the network adapter/driver
> > than Linux does.
> >
> > The addition of RSS support to virtio specification takes in account
> > the widest set of
> > requirements (i.e. Windows one), our initial impression is that this
> > should be enough also for Linux.
> >
> > The NDIS specification in part of RSS is _mandatory_ and there are
> > certification tests
> > that check that the driver provides the hash data as expected. All the
> > high-performance
> > network adapters have such RSS functionality in the hardware.
> > With pre-RSS QEMU (i.e. where the virtio-net device does not indicate
> > the RSS support)
> > the virtio-net driver for Windows does all the job related to RSS:
> > - hash calculation
> > - hash/hash_type delivery
> > - reporting each packet on the correct CPU according to RSS settings
> >
> > With RSS support in QEMU all the packets always come on a proper CPU and
> > the driver never needs to reschedule them. The driver still need to
> > calculate the
> > hash and report it to Windows. In this case we do the same job twice: the device
> > (QEMU or eBPF) does calculate the hash and get proper queue/CPU to deliver
> > the packet. But the hash is not delivered by the device, so the driver needs to
> > recalculate it and report to the Windows.
> >
> > If we add HASH_REPORT support (current set of patches) and the device
> > indicates this
> > feature we can avoid hash recalculation in the driver assuming we
> > receive the correct hash
> > value and hash type. Otherwise the driver can't know which exactly
> > hash the device has calculated.
> >
> > Please let me know if I did not answer the question.
>
>
> I think I get you. The hash type is also a kind of classification (e.g
> TCP or UDP). Any possibility that it can be deduced from the driver? (Or
> it could be too expensive to do that).
>
The driver does it today (when the device does not offer any features)
and of course can continue doing it.
IMO if the device can't report the data according to the spec it
should not indicate support for the respective feature (or fall back to
vhost=off).
Again, IMO if Linux does not need the exact hash_type we can use (for
Linux) the approach that Willem de Bruijn suggested in his patchset:
- just add VIRTIO_NET_HASH_REPORT_L4 to the spec
- Linux can use MQ + hash delivery (and use VIRTIO_NET_HASH_REPORT_L4)
- Linux can use (if it makes sense) RSS with VIRTIO_NET_HASH_REPORT_L4 and eBPF
- Windows gets what it needs + eBPF
So, everyone has what they need at the respective cost.

Regarding use of skb->cb for the hash type:
Currently, if I'm not mistaken, there are 2 free bytes at the end of skb->cb:
skb->cb is a 48-byte array, and skb_gso_cb (14 bytes) sits at offset
SKB_GSO_CB_OFFSET (32).
Is it possible to use one of these 2 bytes for the hash_type?
If yes, shall we extend skb_gso_cb and place the 1-byte hash_type in it,
or just emit a compilation error if skb_gso_cb grows beyond 15
bytes?
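
Purely as an illustration of that second option (the macro and helper
names below are made up, not part of the series; the offsets follow the
numbers above):

/* Illustrative only: put a one-byte hash type right after skb_gso_cb in
 * skb->cb and let the build fail if skb_gso_cb ever grows into that byte.
 */
#define TUN_HASH_TYPE_CB_OFFSET \
	(SKB_GSO_CB_OFFSET + sizeof(struct skb_gso_cb))

static inline u8 *tun_hash_type_cb(struct sk_buff *skb)
{
	BUILD_BUG_ON(TUN_HASH_TYPE_CB_OFFSET + sizeof(u8) >
		     sizeof_field(struct sk_buff, cb));
	return (u8 *)&skb->cb[TUN_HASH_TYPE_CB_OFFSET];
}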


> Thanks
>
>
> >
> >> Thanks
> >>
> >> [1]
> >> https://docs.microsoft.com/en-us/windows-hardware/drivers/network/indicating-rss-receive-data
> >>
> >>
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 3/7] tun: allow use of BPF_PROG_TYPE_SCHED_CLS program type
  2021-01-12 20:55     ` Yuri Benditovich
@ 2021-01-18  9:16       ` Yuri Benditovich
  2021-01-20 18:44       ` Alexei Starovoitov
  1 sibling, 0 replies; 26+ messages in thread
From: Yuri Benditovich @ 2021-01-18  9:16 UTC (permalink / raw)
  To: Alexei Starovoitov, Network Development, LKML, kvm, virtualization, bpf
  Cc: Yan Vugenfirer, Jakub Kicinski, David S. Miller, Jason Wang,
	Michael S . Tsirkin, Andrii Nakryiko, Daniel Borkmann, Song Liu,
	Martin KaFai Lau, John Fastabend, Yonghong Song,
	Willem de Bruijn, KP Singh, Gustavo A . R . Silva, Randy Dunlap,
	Marco Elver, decui, cai, Pablo Neira Ayuso, Steffen Klassert,
	Herbert Xu, Jakub Sitnicki, Paolo Abeni

Hello Alexei,

Can you please answer the questions in the last email of this thread?
Your comment will be extremely helpful.

Thanks

On Tue, Jan 12, 2021 at 10:55 PM Yuri Benditovich
<yuri.benditovich@daynix.com> wrote:
>
> On Tue, Jan 12, 2021 at 10:40 PM Yuri Benditovich
> <yuri.benditovich@daynix.com> wrote:
> >
> > On Tue, Jan 12, 2021 at 9:42 PM Yuri Benditovich
> > <yuri.benditovich@daynix.com> wrote:
> > >
> > > This program type can set skb hash value. It will be useful
> > > when the tun will support hash reporting feature if virtio-net.
> > >
> > > Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
> > > ---
> > >  drivers/net/tun.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > > index 7959b5c2d11f..455f7afc1f36 100644
> > > --- a/drivers/net/tun.c
> > > +++ b/drivers/net/tun.c
> > > @@ -2981,6 +2981,8 @@ static int tun_set_ebpf(struct tun_struct *tun, struct tun_prog __rcu **prog_p,
> > >                 prog = NULL;
> > >         } else {
> > >                 prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SOCKET_FILTER);
> > > +               if (IS_ERR(prog))
> > > +                       prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SCHED_CLS);
> > >                 if (IS_ERR(prog))
> > >                         return PTR_ERR(prog);
> > >         }
> >
> > Comment from Alexei Starovoitov:
> > Patches 1 and 2 are missing for me, so I couldn't review properly,
> > but this diff looks odd.
> > It allows sched_cls prog type to attach to tun.
> > That means everything that sched_cls progs can do will be done from tun hook?
>
> We do not have an intention to modify the packet in this steering eBPF.
> There is just one function that unavailable for BPF_PROG_TYPE_SOCKET_FILTER
> that the eBPF needs to make possible to deliver the hash to the guest
> VM - it is 'bpf_set_hash'
>
> Does it mean that we need to define a new eBPF type for socket filter
> operations + set_hash?
>
> Our problem is that the eBPF calculates 32-bit hash, 16-bit queue
> index and 8-bit of hash type.
> But it is able to return only 32-bit integer, so in this set of
> patches the eBPF returns
> queue index and hash type and saves the hash in skb->hash using bpf_set_hash().
>
> If this is unacceptable, can you please recommend a better solution?
>
> > sched_cls assumes l2 and can modify the packet.
>
> The steering eBPF in TUN module also assumes l2.
>
> > I think crashes are inevitable.
> >
> > > --
> > > 2.17.1
> > >

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 0/7] Support for virtio-net hash reporting
  2021-01-18  9:09                 ` Yuri Benditovich
@ 2021-01-18 15:19                   ` Willem de Bruijn
  0 siblings, 0 replies; 26+ messages in thread
From: Willem de Bruijn @ 2021-01-18 15:19 UTC (permalink / raw)
  To: Yuri Benditovich
  Cc: Jason Wang, Willem de Bruijn, David S. Miller, Jakub Kicinski,
	Michael S . Tsirkin, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Randy Dunlap, Gustavo A . R . Silva,
	Herbert Xu, Steffen Klassert, Pablo Neira Ayuso, decui, cai,
	Jakub Sitnicki, Marco Elver, Paolo Abeni, Network Development,
	linux-kernel, kvm, virtualization, bpf, Yan Vugenfirer

> > >>>>> What it does not give is a type indication, such as
> > >>>>> VIRTIO_NET_HASH_TYPE_TCPv6. I don't understand how this would be used.
> > >>>>> In datapaths where the NIC has already computed the four-tuple hash
> > >>>>> and stored it in skb->hash --the common case for servers--, That type
> > >>>>> field is the only reason to have to compute again.
> > >>>> The problem is there's no guarantee that the packet comes from the NIC,
> > >>>> it could be a simple VM2VM or host2VM packet.
> > >>>>
> > >>>> And even if the packet is coming from the NIC that calculates the hash
> > >>>> there's no guarantee that it's the has that guest want (guest may use
> > >>>> different RSS keys).
> > >>> Ah yes, of course.
> > >>>
> > >>> I would still revisit the need to store a detailed hash_type along with
> > >>> the hash, as as far I can tell that conveys no actionable information
> > >>> to the guest.
> > >>
> > >> Yes, need to figure out its usage. According to [1], it only mention
> > >> that storing has type is a charge of driver. Maybe Yuri can answer this.
> > >>
> > > For the case of Windows VM we can't know how exactly the network stack
> > > uses provided hash data (including hash type). But: different releases
> > > of Windows
> > > enable different hash types (for example UDP hash is enabled only on
> > > Server 2016 and up).
> > >
> > > Indeed the Windows requires a little more from the network adapter/driver
> > > than Linux does.
> > >
> > > The addition of RSS support to virtio specification takes in account
> > > the widest set of
> > > requirements (i.e. Windows one), our initial impression is that this
> > > should be enough also for Linux.
> > >
> > > The NDIS specification in part of RSS is _mandatory_ and there are
> > > certification tests
> > > that check that the driver provides the hash data as expected. All the
> > > high-performance
> > > network adapters have such RSS functionality in the hardware.

Thanks for the context.

If Windows requires the driver to pass the hash type along with the
hash data, then indeed this will be needed.

If it only requires the device to support a subset of the possible
types, chosen at init, that would be different, and it would be cheaper
for the driver to pass this config to the device once.

> > > With pre-RSS QEMU (i.e. where the virtio-net device does not indicate
> > > the RSS support)
> > > the virtio-net driver for Windows does all the job related to RSS:
> > > - hash calculation
> > > - hash/hash_type delivery
> > > - reporting each packet on the correct CPU according to RSS settings
> > >
> > > With RSS support in QEMU all the packets always come on a proper CPU and
> > > the driver never needs to reschedule them. The driver still need to
> > > calculate the
> > > hash and report it to Windows. In this case we do the same job twice: the device
> > > (QEMU or eBPF) does calculate the hash and get proper queue/CPU to deliver
> > > the packet. But the hash is not delivered by the device, so the driver needs to
> > > recalculate it and report to the Windows.
> > >
> > > If we add HASH_REPORT support (current set of patches) and the device
> > > indicates this
> > > feature we can avoid hash recalculation in the driver assuming we
> > > receive the correct hash
> > > value and hash type. Otherwise the driver can't know which exactly
> > > hash the device has calculated.
> > >
> > > Please let me know if I did not answer the question.
> >
> >
> > I think I get you. The hash type is also a kind of classification (e.g
> > TCP or UDP). Any possibility that it can be deduced from the driver? (Or
> > it could be too expensive to do that).
> >
> The driver does it today (when the device does not offer any features)
> and of course can continue doing it.
> IMO if the device can't report the data according to the spec it
> should not indicate support for the respective feature (or fallback to
> vhost=off).
> Again, IMO if Linux does not need the exact hash_type we can use (for
> Linux) the way that Willem de Brujin suggested in his patchset:
> - just add VIRTIO_NET_HASH_REPORT_L4 to the spec
> - Linux can use MQ + hash delivery (and use VIRTIO_NET_HASH_REPORT_L4)
> - Linux can use (if makes sense) RSS with VIRTIO_NET_HASH_REPORT_L4 and eBPF
> - Windows gets what it needs + eBPF
> So, everyone has what they need at the respective cost.
>
> Regarding use of skb->cb for hash type:
> Currently, if I'm not mistaken, there are 2 bytes at the end of skb->cb:
> skb->cb is 48 bytes array
> There is skb_gso_cb (14 bytes) at offset SKB_GSO_CB_OFFSET(32)
> Is it possible to use one of these 2 bytes for hash_type?
> If yes, shall we extend the skb_gso_cb and place the 1-bytes hash_type
> in it or just emit compilation error if the skb_gso_cb grows beyond 15
> bytes?

Good catch on segmentation taking place between .ndo_select_queue and
.ndo_start_xmit.

That also means that whatever field in the skb is used has to be
copied to all segments in skb_segment, which does happen for cb. But this
feature is completely unrelated to the skb_gso_cb type, so perhaps
another field with a real type is clearer. For instance, an
extension of the union with napi_id and sender_cpu, as neither is used
on this egress path with .ndo_select_queue?
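
Roughly the following fragment of struct sk_buff, as a sketch only (the
new field name is hypothetical):

	/* sketch: existing union in struct sk_buff, extended */
	union {
		unsigned int	napi_id;
		unsigned int	sender_cpu;
		u32		hash_report_type;	/* hypothetical, tun egress only */
	};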

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 3/7] tun: allow use of BPF_PROG_TYPE_SCHED_CLS program type
  2021-01-12 20:55     ` Yuri Benditovich
  2021-01-18  9:16       ` Yuri Benditovich
@ 2021-01-20 18:44       ` Alexei Starovoitov
  2021-01-24 11:52         ` Yuri Benditovich
  1 sibling, 1 reply; 26+ messages in thread
From: Alexei Starovoitov @ 2021-01-20 18:44 UTC (permalink / raw)
  To: Yuri Benditovich
  Cc: David S. Miller, Jakub Kicinski, Michael S . Tsirkin, Jason Wang,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Randy Dunlap, Willem de Bruijn, gustavoars, Herbert Xu,
	Steffen Klassert, Pablo Neira Ayuso, decui, cai, Jakub Sitnicki,
	Marco Elver, Paolo Abeni, Network Development, LKML, kvm,
	virtualization, bpf, Yan Vugenfirer

On Tue, Jan 12, 2021 at 12:55 PM Yuri Benditovich
<yuri.benditovich@daynix.com> wrote:
>
> On Tue, Jan 12, 2021 at 10:40 PM Yuri Benditovich
> <yuri.benditovich@daynix.com> wrote:
> >
> > On Tue, Jan 12, 2021 at 9:42 PM Yuri Benditovich
> > <yuri.benditovich@daynix.com> wrote:
> > >
> > > This program type can set skb hash value. It will be useful
> > > when the tun will support hash reporting feature if virtio-net.
> > >
> > > Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
> > > ---
> > >  drivers/net/tun.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > > index 7959b5c2d11f..455f7afc1f36 100644
> > > --- a/drivers/net/tun.c
> > > +++ b/drivers/net/tun.c
> > > @@ -2981,6 +2981,8 @@ static int tun_set_ebpf(struct tun_struct *tun, struct tun_prog __rcu **prog_p,
> > >                 prog = NULL;
> > >         } else {
> > >                 prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SOCKET_FILTER);
> > > +               if (IS_ERR(prog))
> > > +                       prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SCHED_CLS);
> > >                 if (IS_ERR(prog))
> > >                         return PTR_ERR(prog);
> > >         }
> >
> > Comment from Alexei Starovoitov:
> > Patches 1 and 2 are missing for me, so I couldn't review properly,
> > but this diff looks odd.
> > It allows sched_cls prog type to attach to tun.
> > That means everything that sched_cls progs can do will be done from tun hook?
>
> We do not have an intention to modify the packet in this steering eBPF.

The intent is irrelevant. Using SCHED_CLS here will let users modify the packet
and some users will do so. Hence the tun code has to support it.

> There is just one function that unavailable for BPF_PROG_TYPE_SOCKET_FILTER
> that the eBPF needs to make possible to deliver the hash to the guest
> VM - it is 'bpf_set_hash'
>
> Does it mean that we need to define a new eBPF type for socket filter
> operations + set_hash?
>
> Our problem is that the eBPF calculates 32-bit hash, 16-bit queue
> index and 8-bit of hash type.
> But it is able to return only 32-bit integer, so in this set of
> patches the eBPF returns
> queue index and hash type and saves the hash in skb->hash using bpf_set_hash().

bpf prog can only return a 32-bit integer. That's true.
But the prog can use helpers to set any number of bits and variables.
bpf_set_hash_v2() with hash, queue and index arguments could fit this purpose,
but if you allow it for SCHED_CLS type,
tc side of the code should be ready to deal with that too and this extended
helper should be meaningful for both tc and tun.

In general, if the purpose of the prog is to compute three values, they are
better grouped together. Returning two of them via an ORed 32-bit integer and
the 32-bit hash via bpf_set_hash() is an awkward API.
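
As a minimal sketch of the split being discussed (assuming one
possible layout for the return value, since the series only says that
the queue index and hash type are packed into it; compute_rss_hash()
is a placeholder), such a steering program loaded as
BPF_PROG_TYPE_SCHED_CLS looks roughly like this today:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Placeholder for the real Toeplitz hash over the parsed headers,
 * keyed and masked by the mapped configuration data.
 */
static __always_inline __u32 compute_rss_hash(struct __sk_buff *skb)
{
	return 0;
}

SEC("classifier")
int tun_steering(struct __sk_buff *skb)
{
	__u32 hash  = compute_rss_hash(skb);
	__u16 queue = hash % 8;	/* indirection table lookup in reality */
	__u8  type  = 0;	/* one of the VIRTIO_NET_HASH_REPORT_* values */

	bpf_set_hash(skb, hash);	/* 32-bit hash travels in skb->hash */
	return ((__u32)type << 16) | queue;	/* the other two share the retval */
}

char _license[] SEC("license") = "GPL";

A combined helper along the lines suggested above would let the
program hand back all three values in one call instead of splitting
them between the return value and skb->hash.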

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 3/7] tun: allow use of BPF_PROG_TYPE_SCHED_CLS program type
  2021-01-20 18:44       ` Alexei Starovoitov
@ 2021-01-24 11:52         ` Yuri Benditovich
  0 siblings, 0 replies; 26+ messages in thread
From: Yuri Benditovich @ 2021-01-24 11:52 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Jakub Kicinski, Michael S . Tsirkin, Jason Wang,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Randy Dunlap, Willem de Bruijn, Gustavo A . R . Silva,
	Herbert Xu, Steffen Klassert, Pablo Neira Ayuso, decui, cai,
	Jakub Sitnicki, Marco Elver, Paolo Abeni, Network Development,
	LKML, kvm, virtualization, bpf, Yan Vugenfirer

On Wed, Jan 20, 2021 at 8:45 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Jan 12, 2021 at 12:55 PM Yuri Benditovich
> <yuri.benditovich@daynix.com> wrote:
> >
> > On Tue, Jan 12, 2021 at 10:40 PM Yuri Benditovich
> > <yuri.benditovich@daynix.com> wrote:
> > >
> > > On Tue, Jan 12, 2021 at 9:42 PM Yuri Benditovich
> > > <yuri.benditovich@daynix.com> wrote:
> > > >
> > > > This program type can set the skb hash value. It will be useful
> > > > when tun supports the hash reporting feature of virtio-net.
> > > >
> > > > Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
> > > > ---
> > > >  drivers/net/tun.c | 2 ++
> > > >  1 file changed, 2 insertions(+)
> > > >
> > > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > > > index 7959b5c2d11f..455f7afc1f36 100644
> > > > --- a/drivers/net/tun.c
> > > > +++ b/drivers/net/tun.c
> > > > @@ -2981,6 +2981,8 @@ static int tun_set_ebpf(struct tun_struct *tun, struct tun_prog __rcu **prog_p,
> > > >                 prog = NULL;
> > > >         } else {
> > > >                 prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SOCKET_FILTER);
> > > > +               if (IS_ERR(prog))
> > > > +                       prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SCHED_CLS);
> > > >                 if (IS_ERR(prog))
> > > >                         return PTR_ERR(prog);
> > > >         }
> > >
> > > Comment from Alexei Starovoitov:
> > > Patches 1 and 2 are missing for me, so I couldn't review properly,
> > > but this diff looks odd.
> > > It allows sched_cls prog type to attach to tun.
> > > That means everything that sched_cls progs can do will be done from tun hook?
> >
> > We do not have an intention to modify the packet in this steering eBPF.
>
> The intent is irrelevant. Using SCHED_CLS here will let users modify the packet
> and some users will do so. Hence the tun code has to support it.
>
> > There is just one function that is unavailable for BPF_PROG_TYPE_SOCKET_FILTER
> > and that the eBPF needs in order to deliver the hash to the guest
> > VM - it is 'bpf_set_hash'
> >
> > Does it mean that we need to define a new eBPF type for socket filter
> > operations + set_hash?
> >
> > Our problem is that the eBPF calculates a 32-bit hash, a 16-bit queue
> > index and an 8-bit hash type.
> > But it is able to return only a single 32-bit integer, so in this set of
> > patches the eBPF returns the
> > queue index and hash type and saves the hash in skb->hash using bpf_set_hash().
>
> bpf prog can only return a 32-bit integer. That's true.
> But the prog can use helpers to set any number of bits and variables.
> bpf_set_hash_v2() with hash, queue and index arguments could fit this purpose,
> but if you allow it for SCHED_CLS type,

Do I understand correctly that this means:
1. Creation of a new helper like
https://lists.linuxfoundation.org/pipermail/bridge/2020-July/013036.html
2. Validation on the tun side that the BPF program uses only a limited
subset of the helpers available for SCHED_CLS
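
As a rough sketch of what point 1 usually involves on the kernel side
(every name below is invented for illustration; neither this helper
nor its proto exists in the series or in the kernel), a combined
helper might look like this in net/core/filter.c:

BPF_CALL_3(bpf_tun_set_hash, struct sk_buff *, skb, u32, hash,
	   u32, report_type)
{
	/* Record the 32-bit hash as a software L4 hash, as
	 * bpf_set_hash() does today.
	 */
	__skb_set_sw_hash(skb, hash, true);
	/* report_type would go into whatever skb field is finally
	 * agreed on for the hash report type; intentionally omitted.
	 */
	return 0;
}

static const struct bpf_func_proto bpf_tun_set_hash_proto = {
	.func		= bpf_tun_set_hash,
	.gpl_only	= false,
	.ret_type	= RET_INTEGER,
	.arg1_type	= ARG_PTR_TO_CTX,
	.arg2_type	= ARG_ANYTHING,
	.arg3_type	= ARG_ANYTHING,
};

The helper would also need a uapi entry and a case in the relevant
prog type's get_func_proto callback, which is where the restriction
question in point 2 would have to be settled.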

> tc side of the code should be ready to deal with that too and this extended
> helper should be meaningful for both tc and tun.
>
> In general, if the purpose of the prog is to compute three values, they are
> better grouped together. Returning two of them via an ORed 32-bit integer and
> the 32-bit hash via bpf_set_hash() is an awkward API.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC PATCH 4/7] tun: free bpf_program by bpf_prog_put instead of bpf_prog_destroy
  2021-01-05 12:24 Yuri Benditovich
@ 2021-01-05 12:24 ` Yuri Benditovich
  0 siblings, 0 replies; 26+ messages in thread
From: Yuri Benditovich @ 2021-01-05 12:24 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh
  Cc: yan, netdev, linux-kernel, bpf

The module never creates the bpf program with bpf_prog_create, so it
should not free it with bpf_prog_destroy. The program is obtained by
bpf_prog_get and should be freed by bpf_prog_put. For
BPF_PROG_TYPE_SOCKET_FILTER both methods do the same thing, but for
other program types they do not.

Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
---
 drivers/net/tun.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 455f7afc1f36..18c1baf1a6c1 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2218,7 +2218,7 @@ static void tun_prog_free(struct rcu_head *rcu)
 {
 	struct tun_prog *prog = container_of(rcu, struct tun_prog, rcu);
 
-	bpf_prog_destroy(prog->prog);
+	bpf_prog_put(prog->prog);
 	kfree(prog);
 }
 
-- 
2.17.1
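
As a side note on the pairing the commit message relies on (a sketch
under the assumption of a caller similar to tun_set_ebpf(), not code
from the patch): a reference taken with bpf_prog_get() or
bpf_prog_get_type() is dropped with bpf_prog_put(), while
bpf_prog_destroy() is the counterpart of bpf_prog_create(), which tun
never calls.

#include <linux/bpf.h>
#include <linux/err.h>

static int example_take_and_release(int fd)
{
	struct bpf_prog *prog;

	prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SOCKET_FILTER);
	if (IS_ERR(prog))
		return PTR_ERR(prog);

	/* ... publish prog behind RCU and run it ... */

	bpf_prog_put(prog);	/* matches bpf_prog_get_type() */
	return 0;
}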


^ permalink raw reply related	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2021-01-24 11:55 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-12 19:41 [RFC PATCH 0/7] Support for virtio-net hash reporting Yuri Benditovich
2021-01-12 19:41 ` [RFC PATCH 1/7] skbuff: define field for hash report type Yuri Benditovich
2021-01-12 19:41 ` [RFC PATCH 2/7] vhost: support for hash report virtio-net feature Yuri Benditovich
2021-01-12 19:41 ` [RFC PATCH 3/7] tun: allow use of BPF_PROG_TYPE_SCHED_CLS program type Yuri Benditovich
2021-01-12 19:46   ` Alexei Starovoitov
2021-01-12 20:33     ` Yuri Benditovich
2021-01-12 20:40   ` Yuri Benditovich
2021-01-12 20:55     ` Yuri Benditovich
2021-01-18  9:16       ` Yuri Benditovich
2021-01-20 18:44       ` Alexei Starovoitov
2021-01-24 11:52         ` Yuri Benditovich
2021-01-12 19:41 ` [RFC PATCH 4/7] tun: free bpf_program by bpf_prog_put instead of bpf_prog_destroy Yuri Benditovich
2021-01-12 19:41 ` [RFC PATCH 5/7] tun: add ioctl code TUNSETHASHPOPULATION Yuri Benditovich
2021-01-12 19:41 ` [RFC PATCH 6/7] tun: populate hash in virtio-net header when needed Yuri Benditovich
2021-01-12 19:41 ` [RFC PATCH 7/7] tun: report new tun feature IFF_HASH Yuri Benditovich
2021-01-12 19:49 ` [RFC PATCH 0/7] Support for virtio-net hash reporting Yuri Benditovich
2021-01-12 20:28   ` Yuri Benditovich
2021-01-12 23:47     ` Willem de Bruijn
2021-01-13  4:05       ` Jason Wang
2021-01-13 14:33         ` Willem de Bruijn
2021-01-14  3:38           ` Jason Wang
2021-01-17  7:57             ` Yuri Benditovich
2021-01-18  2:46               ` Jason Wang
2021-01-18  9:09                 ` Yuri Benditovich
2021-01-18 15:19                   ` Willem de Bruijn
  -- strict thread matches above, loose matches on Subject: below --
2021-01-05 12:24 Yuri Benditovich
2021-01-05 12:24 ` [RFC PATCH 4/7] tun: free bpf_program by bpf_prog_put instead of bpf_prog_destroy Yuri Benditovich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).