* [RFC net-next 01/18] bpf: introduce bpf_prog_offload_verifier_setup()
2019-11-26 10:07 [RFC net-next 00/18] virtio_net XDP offload Prashant Bhole
@ 2019-11-26 10:07 ` Prashant Bhole
2019-11-26 10:07 ` [RFC net-next 02/18] net: core: rename netif_receive_generic_xdp() to do_xdp_generic_core() Prashant Bhole
` (17 subsequent siblings)
18 siblings, 0 replies; 43+ messages in thread
From: Prashant Bhole @ 2019-11-26 10:07 UTC (permalink / raw)
To: David S . Miller, Michael S . Tsirkin
Cc: Song Liu, Jakub Kicinski, Jesper Dangaard Brouer,
Daniel Borkmann, netdev, Jason Wang, John Fastabend,
Alexei Starovoitov, qemu-devel, Prashant Bhole, kvm,
Yonghong Song, Andrii Nakryiko, Martin KaFai Lau
From: Jason Wang <jasowang@redhat.com>
Background:
This change originates from the virtio_net XDP offload work. As per
the implementation plan, a copy of the original program, with the map
fds from the guest replaced by map fds from the host, needs to be
offloaded to the host. To implement this fd replacement, insn_hook()
must provide an insn with its map fd intact. The bpf_map and
driver-specific map data can be derived from the map_fd.
Since the verifier calls all the offload callbacks after replacing the
map fds, it was difficult to implement the virtio_net XDP offload
feature. If virtio_net gets a single callback with the original bpf
program, it gets a chance to perform the fd replacement in its own
copy of the program.
Solution:
Let's introduce a setup() callback in bpf_prog_offload_ops. It is
non-mandatory. The verifier will call it just before replacing the
map fds.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
include/linux/bpf.h | 1 +
include/linux/bpf_verifier.h | 1 +
kernel/bpf/offload.c | 14 ++++++++++++++
kernel/bpf/verifier.c | 6 ++++++
4 files changed, 22 insertions(+)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 35903f148be5..1cdba120357c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -361,6 +361,7 @@ struct bpf_prog_offload_ops {
struct bpf_insn *insn);
int (*remove_insns)(struct bpf_verifier_env *env, u32 off, u32 cnt);
/* program management callbacks */
+ int (*setup)(struct bpf_prog *prog);
int (*prepare)(struct bpf_prog *prog);
int (*translate)(struct bpf_prog *prog);
void (*destroy)(struct bpf_prog *prog);
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 26e40de9ef55..de7028e17c0d 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -418,6 +418,7 @@ static inline struct bpf_reg_state *cur_regs(struct bpf_verifier_env *env)
return cur_func(env)->regs;
}
+int bpf_prog_offload_verifier_setup(struct bpf_prog *prog);
int bpf_prog_offload_verifier_prep(struct bpf_prog *prog);
int bpf_prog_offload_verify_insn(struct bpf_verifier_env *env,
int insn_idx, int prev_insn_idx);
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index 5b9da0954a27..04ca7a31d947 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -124,6 +124,20 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
return err;
}
+int bpf_prog_offload_verifier_setup(struct bpf_prog *prog)
+{
+ struct bpf_prog_offload *offload;
+ int ret = 0;
+
+ down_read(&bpf_devs_lock);
+ offload = prog->aux->offload;
+ if (offload && offload->offdev->ops->setup)
+ ret = offload->offdev->ops->setup(prog);
+ up_read(&bpf_devs_lock);
+
+ return ret;
+}
+
int bpf_prog_offload_verifier_prep(struct bpf_prog *prog)
{
struct bpf_prog_offload *offload;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index a0482e1c4a77..94b43542439e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -9737,6 +9737,12 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
env->allow_ptr_leaks = is_priv;
+ if (bpf_prog_is_dev_bound(env->prog->aux)) {
+ ret = bpf_prog_offload_verifier_setup(env->prog);
+ if (ret)
+ goto skip_full_check;
+ }
+
if (is_priv)
env->test_state_freq = attr->prog_flags & BPF_F_TEST_STATE_FREQ;
--
2.20.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [RFC net-next 02/18] net: core: rename netif_receive_generic_xdp() to do_xdp_generic_core()
2019-11-26 10:07 [RFC net-next 00/18] virtio_net XDP offload Prashant Bhole
2019-11-26 10:07 ` [RFC net-next 01/18] bpf: introduce bpf_prog_offload_verifier_setup() Prashant Bhole
@ 2019-11-26 10:07 ` Prashant Bhole
2019-11-26 10:07 ` [RFC net-next 03/18] net: core: export do_xdp_generic_core() Prashant Bhole
` (16 subsequent siblings)
18 siblings, 0 replies; 43+ messages in thread
From: Prashant Bhole @ 2019-11-26 10:07 UTC (permalink / raw)
To: David S . Miller, Michael S . Tsirkin
Cc: Song Liu, Jakub Kicinski, Jesper Dangaard Brouer,
Daniel Borkmann, netdev, Jason Wang, John Fastabend,
Alexei Starovoitov, qemu-devel, Prashant Bhole, kvm,
Yonghong Song, Andrii Nakryiko, Martin KaFai Lau
From: Jason Wang <jasowang@redhat.com>
In the skb generic path, we need a way to run an XDP program on an
skb while having customized handling of XDP actions.
netif_receive_generic_xdp() is more helpful in such cases than
do_xdp_generic().
This patch prepares netif_receive_generic_xdp() to be used as a
general-purpose function by renaming it.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
net/core/dev.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index c7fc902ccbdc..5ae647b9914f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4461,9 +4461,9 @@ static struct netdev_rx_queue *netif_get_rxqueue(struct sk_buff *skb)
return rxqueue;
}
-static u32 netif_receive_generic_xdp(struct sk_buff *skb,
- struct xdp_buff *xdp,
- struct bpf_prog *xdp_prog)
+static u32 do_xdp_generic_core(struct sk_buff *skb,
+ struct xdp_buff *xdp,
+ struct bpf_prog *xdp_prog)
{
struct netdev_rx_queue *rxqueue;
void *orig_data, *orig_data_end;
@@ -4610,7 +4610,7 @@ int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb)
u32 act;
int err;
- act = netif_receive_generic_xdp(skb, &xdp, xdp_prog);
+ act = do_xdp_generic_core(skb, &xdp, xdp_prog);
if (act != XDP_PASS) {
switch (act) {
case XDP_REDIRECT:
--
2.20.1
* [RFC net-next 03/18] net: core: export do_xdp_generic_core()
2019-11-26 10:07 [RFC net-next 00/18] virtio_net XDP offload Prashant Bhole
2019-11-26 10:07 ` [RFC net-next 01/18] bpf: introduce bpf_prog_offload_verifier_setup() Prashant Bhole
2019-11-26 10:07 ` [RFC net-next 02/18] net: core: rename netif_receive_generic_xdp() to do_xdp_generic_core() Prashant Bhole
@ 2019-11-26 10:07 ` Prashant Bhole
2019-11-26 10:07 ` [RFC net-next 04/18] tuntap: check tun_msg_ctl type at necessary places Prashant Bhole
` (15 subsequent siblings)
18 siblings, 0 replies; 43+ messages in thread
From: Prashant Bhole @ 2019-11-26 10:07 UTC (permalink / raw)
To: David S . Miller, Michael S . Tsirkin
Cc: Song Liu, Jakub Kicinski, Jesper Dangaard Brouer,
Daniel Borkmann, netdev, Jason Wang, John Fastabend,
Alexei Starovoitov, qemu-devel, Prashant Bhole, kvm,
Yonghong Song, Andrii Nakryiko, Martin KaFai Lau
From: Jason Wang <jasowang@redhat.com>
Let's export do_xdp_generic_core() as a general-purpose function. It
will just run an XDP program on an skb but will not handle XDP actions.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
include/linux/netdevice.h | 2 ++
net/core/dev.c | 6 +++---
2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9e6fb8524d91..2b6317ac9795 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3648,6 +3648,8 @@ static inline void dev_consume_skb_any(struct sk_buff *skb)
void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog);
int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb);
+u32 do_xdp_generic_core(struct sk_buff *skb, struct xdp_buff *xdp,
+ struct bpf_prog *xdp_prog);
int netif_rx(struct sk_buff *skb);
int netif_rx_ni(struct sk_buff *skb);
int netif_receive_skb(struct sk_buff *skb);
diff --git a/net/core/dev.c b/net/core/dev.c
index 5ae647b9914f..d97c3f35e047 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4461,9 +4461,8 @@ static struct netdev_rx_queue *netif_get_rxqueue(struct sk_buff *skb)
return rxqueue;
}
-static u32 do_xdp_generic_core(struct sk_buff *skb,
- struct xdp_buff *xdp,
- struct bpf_prog *xdp_prog)
+u32 do_xdp_generic_core(struct sk_buff *skb, struct xdp_buff *xdp,
+ struct bpf_prog *xdp_prog)
{
struct netdev_rx_queue *rxqueue;
void *orig_data, *orig_data_end;
@@ -4574,6 +4573,7 @@ static u32 do_xdp_generic_core(struct sk_buff *skb,
return act;
}
+EXPORT_SYMBOL_GPL(do_xdp_generic_core);
/* When doing generic XDP we have to bypass the qdisc layer and the
* network taps in order to match in-driver-XDP behavior.
--
2.20.1
* [RFC net-next 04/18] tuntap: check tun_msg_ctl type at necessary places
2019-11-26 10:07 [RFC net-next 00/18] virtio_net XDP offload Prashant Bhole
` (2 preceding siblings ...)
2019-11-26 10:07 ` [RFC net-next 03/18] net: core: export do_xdp_generic_core() Prashant Bhole
@ 2019-11-26 10:07 ` Prashant Bhole
2019-11-26 10:07 ` [RFC net-next 05/18] vhost_net: use tap recvmsg api to access ptr ring Prashant Bhole
` (14 subsequent siblings)
18 siblings, 0 replies; 43+ messages in thread
From: Prashant Bhole @ 2019-11-26 10:07 UTC (permalink / raw)
To: David S . Miller, Michael S . Tsirkin
Cc: Song Liu, Jakub Kicinski, Jesper Dangaard Brouer,
Daniel Borkmann, netdev, Jason Wang, John Fastabend,
Alexei Starovoitov, qemu-devel, Prashant Bhole, kvm,
Yonghong Song, Andrii Nakryiko, Martin KaFai Lau
tun_msg_ctl is used by vhost_net to communicate with tuntap. We will
introduce another type soon. As a preparation, this patch adds
conditions to check the tun_msg_ctl type at the necessary places.
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
drivers/net/tap.c | 7 +++++--
drivers/net/tun.c | 6 +++++-
2 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 3ae70c7e6860..4df7bf00af66 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -1213,6 +1213,7 @@ static int tap_sendmsg(struct socket *sock, struct msghdr *m,
struct tap_queue *q = container_of(sock, struct tap_queue, sock);
struct tun_msg_ctl *ctl = m->msg_control;
struct xdp_buff *xdp;
+ void *ptr = NULL;
int i;
if (ctl && (ctl->type == TUN_MSG_PTR)) {
@@ -1223,8 +1224,10 @@ static int tap_sendmsg(struct socket *sock, struct msghdr *m,
return 0;
}
- return tap_get_user(q, ctl ? ctl->ptr : NULL, &m->msg_iter,
- m->msg_flags & MSG_DONTWAIT);
+ if (ctl && ctl->type == TUN_MSG_UBUF)
+ ptr = ctl->ptr;
+
+ return tap_get_user(q, ptr, &m->msg_iter, m->msg_flags & MSG_DONTWAIT);
}
static int tap_recvmsg(struct socket *sock, struct msghdr *m,
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 683d371e6e82..1e436d9ec4e1 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2529,6 +2529,7 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
struct tun_struct *tun = tun_get(tfile);
struct tun_msg_ctl *ctl = m->msg_control;
struct xdp_buff *xdp;
+ void *ptr = NULL;
if (!tun)
return -EBADFD;
@@ -2560,7 +2561,10 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
goto out;
}
- ret = tun_get_user(tun, tfile, ctl ? ctl->ptr : NULL, &m->msg_iter,
+ if (ctl && ctl->type == TUN_MSG_UBUF)
+ ptr = ctl->ptr;
+
+ ret = tun_get_user(tun, tfile, ptr, &m->msg_iter,
m->msg_flags & MSG_DONTWAIT,
m->msg_flags & MSG_MORE);
out:
--
2.20.1
* [RFC net-next 05/18] vhost_net: use tap recvmsg api to access ptr ring
2019-11-26 10:07 [RFC net-next 00/18] virtio_net XDP offload Prashant Bhole
` (3 preceding siblings ...)
2019-11-26 10:07 ` [RFC net-next 04/18] tuntap: check tun_msg_ctl type at necessary places Prashant Bhole
@ 2019-11-26 10:07 ` Prashant Bhole
2019-11-26 10:07 ` [RFC net-next 06/18] tuntap: remove usage of ptr ring in vhost_net Prashant Bhole
` (13 subsequent siblings)
18 siblings, 0 replies; 43+ messages in thread
From: Prashant Bhole @ 2019-11-26 10:07 UTC (permalink / raw)
To: David S . Miller, Michael S . Tsirkin
Cc: Song Liu, Jakub Kicinski, Jesper Dangaard Brouer,
Daniel Borkmann, netdev, Jason Wang, John Fastabend,
Alexei Starovoitov, qemu-devel, Prashant Bhole, kvm,
Yonghong Song, Andrii Nakryiko, Martin KaFai Lau
Currently vhost_net directly accesses the ptr ring of the tap driver
to fetch Rx packet pointers. To avoid this, this patch modifies the
tap driver's recvmsg API to do the additional task of fetching Rx
packet pointers.
A special struct tun_msg_ctl is already being passed via msg_control
for tun Rx XDP batching. This patch extends tun_msg_ctl usage to send
subcommands to the recvmsg API. Now tun_recvmsg() will handle commands
to consume and unconsume packet pointers from the ptr ring.
This will be useful in the implementation of the virtio-net XDP
offload feature, where packets will be XDP-processed before they are
passed to vhost_net.
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
drivers/net/tap.c | 22 ++++++++++++++++++-
drivers/net/tun.c | 24 ++++++++++++++++++++-
drivers/vhost/net.c | 48 +++++++++++++++++++++++++++++++-----------
include/linux/if_tun.h | 18 ++++++++++++++++
4 files changed, 98 insertions(+), 14 deletions(-)
diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 4df7bf00af66..8635cdfd7aa4 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -1234,8 +1234,28 @@ static int tap_recvmsg(struct socket *sock, struct msghdr *m,
size_t total_len, int flags)
{
struct tap_queue *q = container_of(sock, struct tap_queue, sock);
- struct sk_buff *skb = m->msg_control;
+ struct tun_msg_ctl *ctl = m->msg_control;
+ struct sk_buff *skb = NULL;
int ret;
+
+ if (ctl) {
+ switch (ctl->type) {
+ case TUN_MSG_PKT:
+ skb = ctl->ptr;
+ break;
+ case TUN_MSG_CONSUME_PKTS:
+ return ptr_ring_consume_batched(&q->ring,
+ ctl->ptr,
+ ctl->num);
+ case TUN_MSG_UNCONSUME_PKTS:
+ ptr_ring_unconsume(&q->ring, ctl->ptr, ctl->num,
+ tun_ptr_free);
+ return 0;
+ default:
+ return -EINVAL;
+ }
+ }
+
if (flags & ~(MSG_DONTWAIT|MSG_TRUNC)) {
kfree_skb(skb);
return -EINVAL;
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 1e436d9ec4e1..4f28f2387435 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2577,7 +2577,8 @@ static int tun_recvmsg(struct socket *sock, struct msghdr *m, size_t total_len,
{
struct tun_file *tfile = container_of(sock, struct tun_file, socket);
struct tun_struct *tun = tun_get(tfile);
- void *ptr = m->msg_control;
+ struct tun_msg_ctl *ctl = m->msg_control;
+ void *ptr = NULL;
int ret;
if (!tun) {
@@ -2585,6 +2586,27 @@ static int tun_recvmsg(struct socket *sock, struct msghdr *m, size_t total_len,
goto out_free;
}
+ if (ctl) {
+ switch (ctl->type) {
+ case TUN_MSG_PKT:
+ ptr = ctl->ptr;
+ break;
+ case TUN_MSG_CONSUME_PKTS:
+ ret = ptr_ring_consume_batched(&tfile->tx_ring,
+ ctl->ptr,
+ ctl->num);
+ goto out;
+ case TUN_MSG_UNCONSUME_PKTS:
+ ptr_ring_unconsume(&tfile->tx_ring, ctl->ptr,
+ ctl->num, tun_ptr_free);
+ ret = 0;
+ goto out;
+ default:
+ ret = -EINVAL;
+ goto out_put_tun;
+ }
+ }
+
if (flags & ~(MSG_DONTWAIT|MSG_TRUNC|MSG_ERRQUEUE)) {
ret = -EINVAL;
goto out_put_tun;
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 1a2dd53caade..0f91b374a558 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -175,24 +175,44 @@ static void *vhost_net_buf_consume(struct vhost_net_buf *rxq)
static int vhost_net_buf_produce(struct vhost_net_virtqueue *nvq)
{
+ struct vhost_virtqueue *vq = &nvq->vq;
+ struct socket *sock = vq->private_data;
struct vhost_net_buf *rxq = &nvq->rxq;
+ struct tun_msg_ctl ctl = {
+ .type = TUN_MSG_CONSUME_PKTS,
+ .ptr = (void *) rxq->queue,
+ .num = VHOST_NET_BATCH,
+ };
+ struct msghdr msg = {
+ .msg_control = &ctl,
+ };
rxq->head = 0;
- rxq->tail = ptr_ring_consume_batched(nvq->rx_ring, rxq->queue,
- VHOST_NET_BATCH);
+ rxq->tail = sock->ops->recvmsg(sock, &msg, 0, 0);
+ if (WARN_ON_ONCE(rxq->tail < 0))
+ rxq->tail = 0;
+
return rxq->tail;
}
static void vhost_net_buf_unproduce(struct vhost_net_virtqueue *nvq)
{
+ struct vhost_virtqueue *vq = &nvq->vq;
+ struct socket *sock = vq->private_data;
struct vhost_net_buf *rxq = &nvq->rxq;
+ struct tun_msg_ctl ctl = {
+ .type = TUN_MSG_UNCONSUME_PKTS,
+ .ptr = (void *) (rxq->queue + rxq->head),
+ .num = vhost_net_buf_get_size(rxq),
+ };
+ struct msghdr msg = {
+ .msg_control = &ctl,
+ };
- if (nvq->rx_ring && !vhost_net_buf_is_empty(rxq)) {
- ptr_ring_unconsume(nvq->rx_ring, rxq->queue + rxq->head,
- vhost_net_buf_get_size(rxq),
- tun_ptr_free);
- rxq->head = rxq->tail = 0;
- }
+ if (!vhost_net_buf_is_empty(rxq))
+ sock->ops->recvmsg(sock, &msg, 0, 0);
+
+ rxq->head = rxq->tail = 0;
}
static int vhost_net_buf_peek_len(void *ptr)
@@ -1109,6 +1129,7 @@ static void handle_rx(struct vhost_net *net)
.flags = 0,
.gso_type = VIRTIO_NET_HDR_GSO_NONE
};
+ struct tun_msg_ctl ctl;
size_t total_len = 0;
int err, mergeable;
s16 headcount;
@@ -1166,8 +1187,11 @@ static void handle_rx(struct vhost_net *net)
goto out;
}
busyloop_intr = false;
- if (nvq->rx_ring)
- msg.msg_control = vhost_net_buf_consume(&nvq->rxq);
+ if (nvq->rx_ring) {
+ ctl.type = TUN_MSG_PKT;
+ ctl.ptr = vhost_net_buf_consume(&nvq->rxq);
+ msg.msg_control = &ctl;
+ }
/* On overrun, truncate and discard */
if (unlikely(headcount > UIO_MAXIOV)) {
iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 1);
@@ -1346,8 +1370,8 @@ static struct socket *vhost_net_stop_vq(struct vhost_net *n,
mutex_lock(&vq->mutex);
sock = vq->private_data;
vhost_net_disable_vq(n, vq);
- vq->private_data = NULL;
vhost_net_buf_unproduce(nvq);
+ vq->private_data = NULL;
nvq->rx_ring = NULL;
mutex_unlock(&vq->mutex);
return sock;
@@ -1538,8 +1562,8 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
}
vhost_net_disable_vq(n, vq);
- vq->private_data = sock;
vhost_net_buf_unproduce(nvq);
+ vq->private_data = sock;
r = vhost_vq_init_access(vq);
if (r)
goto err_used;
diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h
index 5bda8cf457b6..bb94843e3829 100644
--- a/include/linux/if_tun.h
+++ b/include/linux/if_tun.h
@@ -11,8 +11,26 @@
#define TUN_XDP_FLAG 0x1UL
+/*
+ * tun_msg_ctl types
+ */
+
#define TUN_MSG_UBUF 1
#define TUN_MSG_PTR 2
+/*
+ * Used for passing a packet pointer from vhost to tun
+ */
+#define TUN_MSG_PKT 3
+/*
+ * Used for passing an array of pointers from vhost to tun.
+ * tun consumes packets from the ptr ring and stores them in the array.
+ */
+#define TUN_MSG_CONSUME_PKTS 4
+/*
+ * Used for passing an array of pointers from vhost to tun.
+ * tun gets the pointers from the array and puts them back into the ring.
+ */
+#define TUN_MSG_UNCONSUME_PKTS 5
struct tun_msg_ctl {
unsigned short type;
unsigned short num;
--
2.20.1
* [RFC net-next 06/18] tuntap: remove usage of ptr ring in vhost_net
2019-11-26 10:07 [RFC net-next 00/18] virtio_net XDP offload Prashant Bhole
` (4 preceding siblings ...)
2019-11-26 10:07 ` [RFC net-next 05/18] vhost_net: use tap recvmsg api to access ptr ring Prashant Bhole
@ 2019-11-26 10:07 ` Prashant Bhole
2019-11-26 10:07 ` [RFC net-next 07/18] tun: set offloaded xdp program Prashant Bhole
` (12 subsequent siblings)
18 siblings, 0 replies; 43+ messages in thread
From: Prashant Bhole @ 2019-11-26 10:07 UTC (permalink / raw)
To: David S . Miller, Michael S . Tsirkin
Cc: Song Liu, Jakub Kicinski, Jesper Dangaard Brouer,
Daniel Borkmann, netdev, Jason Wang, John Fastabend,
Alexei Starovoitov, qemu-devel, Prashant Bhole, kvm,
Yonghong Song, Andrii Nakryiko, Martin KaFai Lau
Remove the usage of the tuntap ptr ring from vhost_net and remove the
functions exported from the tuntap drivers to get the ptr ring.
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
drivers/net/tap.c | 13 -------------
drivers/net/tun.c | 13 -------------
drivers/vhost/net.c | 31 ++++---------------------------
include/linux/if_tap.h | 5 -----
include/linux/if_tun.h | 5 -----
5 files changed, 4 insertions(+), 63 deletions(-)
diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 8635cdfd7aa4..6426501b8d0e 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -1298,19 +1298,6 @@ struct socket *tap_get_socket(struct file *file)
}
EXPORT_SYMBOL_GPL(tap_get_socket);
-struct ptr_ring *tap_get_ptr_ring(struct file *file)
-{
- struct tap_queue *q;
-
- if (file->f_op != &tap_fops)
- return ERR_PTR(-EINVAL);
- q = file->private_data;
- if (!q)
- return ERR_PTR(-EBADFD);
- return &q->ring;
-}
-EXPORT_SYMBOL_GPL(tap_get_ptr_ring);
-
int tap_queue_resize(struct tap_dev *tap)
{
struct net_device *dev = tap->dev;
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 4f28f2387435..d078b4659897 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -3750,19 +3750,6 @@ struct socket *tun_get_socket(struct file *file)
}
EXPORT_SYMBOL_GPL(tun_get_socket);
-struct ptr_ring *tun_get_tx_ring(struct file *file)
-{
- struct tun_file *tfile;
-
- if (file->f_op != &tun_fops)
- return ERR_PTR(-EINVAL);
- tfile = file->private_data;
- if (!tfile)
- return ERR_PTR(-EBADFD);
- return &tfile->tx_ring;
-}
-EXPORT_SYMBOL_GPL(tun_get_tx_ring);
-
module_init(tun_init);
module_exit(tun_cleanup);
MODULE_DESCRIPTION(DRV_DESCRIPTION);
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 0f91b374a558..2e069d1ef946 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -122,7 +122,6 @@ struct vhost_net_virtqueue {
/* Reference counting for outstanding ubufs.
* Protected by vq mutex. Writers must also take device mutex. */
struct vhost_net_ubuf_ref *ubufs;
- struct ptr_ring *rx_ring;
struct vhost_net_buf rxq;
/* Batched XDP buffs */
struct xdp_buff *xdp;
@@ -997,8 +996,9 @@ static int peek_head_len(struct vhost_net_virtqueue *rvq, struct sock *sk)
int len = 0;
unsigned long flags;
- if (rvq->rx_ring)
- return vhost_net_buf_peek(rvq);
+ len = vhost_net_buf_peek(rvq);
+ if (len)
+ return len;
spin_lock_irqsave(&sk->sk_receive_queue.lock, flags);
head = skb_peek(&sk->sk_receive_queue);
@@ -1187,7 +1187,7 @@ static void handle_rx(struct vhost_net *net)
goto out;
}
busyloop_intr = false;
- if (nvq->rx_ring) {
+ if (!vhost_net_buf_is_empty(&nvq->rxq)) {
ctl.type = TUN_MSG_PKT;
ctl.ptr = vhost_net_buf_consume(&nvq->rxq);
msg.msg_control = &ctl;
@@ -1343,7 +1343,6 @@ static int vhost_net_open(struct inode *inode, struct file *f)
n->vqs[i].batched_xdp = 0;
n->vqs[i].vhost_hlen = 0;
n->vqs[i].sock_hlen = 0;
- n->vqs[i].rx_ring = NULL;
vhost_net_buf_init(&n->vqs[i].rxq);
}
vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX,
@@ -1372,7 +1371,6 @@ static struct socket *vhost_net_stop_vq(struct vhost_net *n,
vhost_net_disable_vq(n, vq);
vhost_net_buf_unproduce(nvq);
vq->private_data = NULL;
- nvq->rx_ring = NULL;
mutex_unlock(&vq->mutex);
return sock;
}
@@ -1468,25 +1466,6 @@ static struct socket *get_raw_socket(int fd)
return ERR_PTR(r);
}
-static struct ptr_ring *get_tap_ptr_ring(int fd)
-{
- struct ptr_ring *ring;
- struct file *file = fget(fd);
-
- if (!file)
- return NULL;
- ring = tun_get_tx_ring(file);
- if (!IS_ERR(ring))
- goto out;
- ring = tap_get_ptr_ring(file);
- if (!IS_ERR(ring))
- goto out;
- ring = NULL;
-out:
- fput(file);
- return ring;
-}
-
static struct socket *get_tap_socket(int fd)
{
struct file *file = fget(fd);
@@ -1570,8 +1549,6 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
r = vhost_net_enable_vq(n, vq);
if (r)
goto err_used;
- if (index == VHOST_NET_VQ_RX)
- nvq->rx_ring = get_tap_ptr_ring(fd);
oldubufs = nvq->ubufs;
nvq->ubufs = ubufs;
diff --git a/include/linux/if_tap.h b/include/linux/if_tap.h
index 915a187cfabd..68fe366fb185 100644
--- a/include/linux/if_tap.h
+++ b/include/linux/if_tap.h
@@ -4,7 +4,6 @@
#if IS_ENABLED(CONFIG_TAP)
struct socket *tap_get_socket(struct file *);
-struct ptr_ring *tap_get_ptr_ring(struct file *file);
#else
#include <linux/err.h>
#include <linux/errno.h>
@@ -14,10 +13,6 @@ static inline struct socket *tap_get_socket(struct file *f)
{
return ERR_PTR(-EINVAL);
}
-static inline struct ptr_ring *tap_get_ptr_ring(struct file *f)
-{
- return ERR_PTR(-EINVAL);
-}
#endif /* CONFIG_TAP */
#include <net/sock.h>
diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h
index bb94843e3829..f01a255e076d 100644
--- a/include/linux/if_tun.h
+++ b/include/linux/if_tun.h
@@ -44,7 +44,6 @@ struct tun_xdp_hdr {
#if defined(CONFIG_TUN) || defined(CONFIG_TUN_MODULE)
struct socket *tun_get_socket(struct file *);
-struct ptr_ring *tun_get_tx_ring(struct file *file);
bool tun_is_xdp_frame(void *ptr);
void *tun_xdp_to_ptr(void *ptr);
void *tun_ptr_to_xdp(void *ptr);
@@ -58,10 +57,6 @@ static inline struct socket *tun_get_socket(struct file *f)
{
return ERR_PTR(-EINVAL);
}
-static inline struct ptr_ring *tun_get_tx_ring(struct file *f)
-{
- return ERR_PTR(-EINVAL);
-}
static inline bool tun_is_xdp_frame(void *ptr)
{
return false;
--
2.20.1
* [RFC net-next 07/18] tun: set offloaded xdp program
2019-11-26 10:07 [RFC net-next 00/18] virtio_net XDP offload Prashant Bhole
` (5 preceding siblings ...)
2019-11-26 10:07 ` [RFC net-next 06/18] tuntap: remove usage of ptr ring in vhost_net Prashant Bhole
@ 2019-11-26 10:07 ` Prashant Bhole
2019-12-01 16:35 ` David Ahern
2019-12-01 16:45 ` David Ahern
2019-11-26 10:07 ` [RFC net-next 08/18] tun: run offloaded XDP program in Tx path Prashant Bhole
` (11 subsequent siblings)
18 siblings, 2 replies; 43+ messages in thread
From: Prashant Bhole @ 2019-11-26 10:07 UTC (permalink / raw)
To: David S . Miller, Michael S . Tsirkin
Cc: Song Liu, Jakub Kicinski, Jesper Dangaard Brouer,
Daniel Borkmann, netdev, Jason Wang, John Fastabend,
Alexei Starovoitov, qemu-devel, Prashant Bhole, kvm,
Yonghong Song, Andrii Nakryiko, Martin KaFai Lau
From: Jason Wang <jasowang@redhat.com>
This patch introduces an ioctl to set an offloaded XDP program in the
tun driver. This ioctl will be used by qemu to offload an XDP program
from virtio_net in the guest.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
drivers/net/tun.c | 19 ++++++++++++++-----
include/uapi/linux/if_tun.h | 1 +
2 files changed, 15 insertions(+), 5 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index d078b4659897..ecb49101b0b5 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -241,6 +241,7 @@ struct tun_struct {
struct bpf_prog __rcu *xdp_prog;
struct tun_prog __rcu *steering_prog;
struct tun_prog __rcu *filter_prog;
+ struct tun_prog __rcu *offloaded_xdp_prog;
struct ethtool_link_ksettings link_ksettings;
};
@@ -2256,7 +2257,7 @@ static void tun_prog_free(struct rcu_head *rcu)
{
struct tun_prog *prog = container_of(rcu, struct tun_prog, rcu);
- bpf_prog_destroy(prog->prog);
+ bpf_prog_put(prog->prog);
kfree(prog);
}
@@ -2301,6 +2302,7 @@ static void tun_free_netdev(struct net_device *dev)
security_tun_dev_free_security(tun->security);
__tun_set_ebpf(tun, &tun->steering_prog, NULL);
__tun_set_ebpf(tun, &tun->filter_prog, NULL);
+ __tun_set_ebpf(tun, &tun->offloaded_xdp_prog, NULL);
}
static void tun_setup(struct net_device *dev)
@@ -3036,7 +3038,7 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
}
static int tun_set_ebpf(struct tun_struct *tun, struct tun_prog **prog_p,
- void __user *data)
+ void __user *data, int type)
{
struct bpf_prog *prog;
int fd;
@@ -3047,7 +3049,7 @@ static int tun_set_ebpf(struct tun_struct *tun, struct tun_prog **prog_p,
if (fd == -1) {
prog = NULL;
} else {
- prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SOCKET_FILTER);
+ prog = bpf_prog_get_type(fd, type);
if (IS_ERR(prog))
return PTR_ERR(prog);
}
@@ -3345,11 +3347,18 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
break;
case TUNSETSTEERINGEBPF:
- ret = tun_set_ebpf(tun, &tun->steering_prog, argp);
+ ret = tun_set_ebpf(tun, &tun->steering_prog, argp,
+ BPF_PROG_TYPE_SOCKET_FILTER);
break;
case TUNSETFILTEREBPF:
- ret = tun_set_ebpf(tun, &tun->filter_prog, argp);
+ ret = tun_set_ebpf(tun, &tun->filter_prog, argp,
+ BPF_PROG_TYPE_SOCKET_FILTER);
+ break;
+
+ case TUNSETOFFLOADEDXDP:
+ ret = tun_set_ebpf(tun, &tun->offloaded_xdp_prog, argp,
+ BPF_PROG_TYPE_XDP);
break;
case TUNSETCARRIER:
diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
index 454ae31b93c7..21dbd8db2401 100644
--- a/include/uapi/linux/if_tun.h
+++ b/include/uapi/linux/if_tun.h
@@ -61,6 +61,7 @@
#define TUNSETFILTEREBPF _IOR('T', 225, int)
#define TUNSETCARRIER _IOW('T', 226, int)
#define TUNGETDEVNETNS _IO('T', 227)
+#define TUNSETOFFLOADEDXDP _IOW('T', 228, int)
/* TUNSETIFF ifr flags */
#define IFF_TUN 0x0001
--
2.20.1
* Re: [RFC net-next 07/18] tun: set offloaded xdp program
2019-11-26 10:07 ` [RFC net-next 07/18] tun: set offloaded xdp program Prashant Bhole
@ 2019-12-01 16:35 ` David Ahern
2019-12-02 2:44 ` Jason Wang
2019-12-01 16:45 ` David Ahern
1 sibling, 1 reply; 43+ messages in thread
From: David Ahern @ 2019-12-01 16:35 UTC (permalink / raw)
To: Prashant Bhole, David S . Miller, Michael S . Tsirkin
Cc: Song Liu, Jakub Kicinski, Jesper Dangaard Brouer,
Daniel Borkmann, netdev, Jason Wang, John Fastabend,
Alexei Starovoitov, qemu-devel, kvm, Yonghong Song,
Andrii Nakryiko, Martin KaFai Lau
On 11/26/19 4:07 AM, Prashant Bhole wrote:
> From: Jason Wang <jasowang@redhat.com>
>
> This patch introduces an ioctl way to set an offloaded XDP program
> to tun driver. This ioctl will be used by qemu to offload XDP program
> from virtio_net in the guest.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
> ---
> drivers/net/tun.c | 19 ++++++++++++++-----
> include/uapi/linux/if_tun.h | 1 +
> 2 files changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index d078b4659897..ecb49101b0b5 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -241,6 +241,7 @@ struct tun_struct {
> struct bpf_prog __rcu *xdp_prog;
> struct tun_prog __rcu *steering_prog;
> struct tun_prog __rcu *filter_prog;
> + struct tun_prog __rcu *offloaded_xdp_prog;
I have been looking into running XDP programs in the TX path of a tap
device [1] where the program is installed and managed by a process in
the host. The code paths are the same as what you are doing with XDP
offload, so how about calling this xdp_prog_tx?
[1]
https://github.com/dsahern/linux/commit/f2303d05187c8a604cdb70b288338e9b1d1b0db6
* Re: [RFC net-next 07/18] tun: set offloaded xdp program
2019-12-01 16:35 ` David Ahern
@ 2019-12-02 2:44 ` Jason Wang
0 siblings, 0 replies; 43+ messages in thread
From: Jason Wang @ 2019-12-02 2:44 UTC (permalink / raw)
To: David Ahern, Prashant Bhole, David S . Miller, Michael S . Tsirkin
Cc: Song Liu, Jakub Kicinski, Jesper Dangaard Brouer,
Daniel Borkmann, netdev, John Fastabend, Alexei Starovoitov,
qemu-devel, kvm, Yonghong Song, Andrii Nakryiko,
Martin KaFai Lau
On 2019/12/2 上午12:35, David Ahern wrote:
> On 11/26/19 4:07 AM, Prashant Bhole wrote:
>> From: Jason Wang <jasowang@redhat.com>
>>
>> This patch introduces an ioctl way to set an offloaded XDP program
>> to tun driver. This ioctl will be used by qemu to offload XDP program
>> from virtio_net in the guest.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
>> ---
>> drivers/net/tun.c | 19 ++++++++++++++-----
>> include/uapi/linux/if_tun.h | 1 +
>> 2 files changed, 15 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index d078b4659897..ecb49101b0b5 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -241,6 +241,7 @@ struct tun_struct {
>> struct bpf_prog __rcu *xdp_prog;
>> struct tun_prog __rcu *steering_prog;
>> struct tun_prog __rcu *filter_prog;
>> + struct tun_prog __rcu *offloaded_xdp_prog;
> I have been looking into running XDP programs in the TX path of a tap
> device [1] where the program is installed and managed by a process in
> the host. The code paths are the same as what you are doing with XDP
> offload, so how about calling this xdp_prog_tx?
>
> [1]
> https://github.com/dsahern/linux/commit/f2303d05187c8a604cdb70b288338e9b1d1b0db6
>
I think it's fine; btw, except for the netlink part, there should not
be much difference.
Thanks
* Re: [RFC net-next 07/18] tun: set offloaded xdp program
2019-11-26 10:07 ` [RFC net-next 07/18] tun: set offloaded xdp program Prashant Bhole
2019-12-01 16:35 ` David Ahern
@ 2019-12-01 16:45 ` David Ahern
2019-12-02 2:47 ` Jason Wang
1 sibling, 1 reply; 43+ messages in thread
From: David Ahern @ 2019-12-01 16:45 UTC (permalink / raw)
To: Prashant Bhole, David S . Miller, Michael S . Tsirkin
Cc: Song Liu, Jakub Kicinski, Jesper Dangaard Brouer,
Daniel Borkmann, netdev, Jason Wang, John Fastabend,
Alexei Starovoitov, qemu-devel, kvm, Yonghong Song,
Andrii Nakryiko, Martin KaFai Lau
On 11/26/19 4:07 AM, Prashant Bhole wrote:
> From: Jason Wang <jasowang@redhat.com>
>
> This patch introduces an ioctl way to set an offloaded XDP program
> to tun driver. This ioctl will be used by qemu to offload XDP program
> from virtio_net in the guest.
>
Seems like you need to set / reset the SOCK_XDP flag on tfile->sk since
this is an XDP program.
Also, why not add this program using netlink instead of ioctl? e.g., as
part of a generic XDP in the egress path like I am looking into for the
host side.
* Re: [RFC net-next 07/18] tun: set offloaded xdp program
2019-12-01 16:45 ` David Ahern
@ 2019-12-02 2:47 ` Jason Wang
2019-12-09 0:24 ` Prashant Bhole
0 siblings, 1 reply; 43+ messages in thread
From: Jason Wang @ 2019-12-02 2:47 UTC (permalink / raw)
To: David Ahern, Prashant Bhole, David S . Miller, Michael S . Tsirkin
Cc: Song Liu, Jakub Kicinski, Jesper Dangaard Brouer,
Daniel Borkmann, netdev, John Fastabend, Alexei Starovoitov,
qemu-devel, kvm, Yonghong Song, Andrii Nakryiko,
Martin KaFai Lau
On 2019/12/2 12:45 AM, David Ahern wrote:
> On 11/26/19 4:07 AM, Prashant Bhole wrote:
>> From: Jason Wang <jasowang@redhat.com>
>>
>> This patch introduces an ioctl way to set an offloaded XDP program
>> to tun driver. This ioctl will be used by qemu to offload XDP program
>> from virtio_net in the guest.
>>
> Seems like you need to set / reset the SOCK_XDP flag on tfile->sk since
> this is an XDP program.
>
> Also, why not add this program using netlink instead of ioctl? e.g., as
> part of a generic XDP in the egress path like I am looking into for the
> host side.
Maybe both; otherwise, qemu may need netlink as a dependency.
Thanks
>
* Re: [RFC net-next 07/18] tun: set offloaded xdp program
2019-12-02 2:47 ` Jason Wang
@ 2019-12-09 0:24 ` Prashant Bhole
0 siblings, 0 replies; 43+ messages in thread
From: Prashant Bhole @ 2019-12-09 0:24 UTC (permalink / raw)
To: Jason Wang, David Ahern, David S . Miller, Michael S . Tsirkin,
Alexei Starovoitov, Jakub Kicinski
Cc: Song Liu, Jesper Dangaard Brouer, Daniel Borkmann, netdev,
John Fastabend, qemu-devel, kvm, Yonghong Song, Andrii Nakryiko,
Martin KaFai Lau
On 12/2/19 11:47 AM, Jason Wang wrote:
>
> On 2019/12/2 12:45 AM, David Ahern wrote:
>> On 11/26/19 4:07 AM, Prashant Bhole wrote:
>>> From: Jason Wang <jasowang@redhat.com>
>>>
>>> This patch introduces an ioctl way to set an offloaded XDP program
>>> to tun driver. This ioctl will be used by qemu to offload XDP program
>>> from virtio_net in the guest.
>>>
>> Seems like you need to set / reset the SOCK_XDP flag on tfile->sk since
>> this is an XDP program.
>>
>> Also, why not add this program using netlink instead of ioctl? e.g., as
>> part of a generic XDP in the egress path like I am looking into for the
>> host side.
>
>
> Maybe both, otherwise, qemu may need netlink as a dependency.
>
> Thanks
>
Thank you all for reviewing. We will continue to improve this set.
If we split this work, Tx path XDP is one of the necessary parts
that can be developed first. As David Ahern suggested, it will use
netlink, but we will still need the ioctl way for tap. I will try
to come up with a Tx path XDP set next time.
Thanks.
* [RFC net-next 08/18] tun: run offloaded XDP program in Tx path
2019-11-26 10:07 [RFC net-next 00/18] virtio_net XDP offload Prashant Bhole
` (6 preceding siblings ...)
2019-11-26 10:07 ` [RFC net-next 07/18] tun: set offloaded xdp program Prashant Bhole
@ 2019-11-26 10:07 ` Prashant Bhole
2019-12-01 16:39 ` David Ahern
2019-11-26 10:07 ` [RFC net-next 09/18] tun: add a way to inject Tx path packet into Rx path Prashant Bhole
` (10 subsequent siblings)
18 siblings, 1 reply; 43+ messages in thread
From: Prashant Bhole @ 2019-11-26 10:07 UTC (permalink / raw)
To: David S . Miller, Michael S . Tsirkin
Cc: Song Liu, Jakub Kicinski, Jesper Dangaard Brouer,
Daniel Borkmann, netdev, Jason Wang, John Fastabend,
Alexei Starovoitov, qemu-devel, Prashant Bhole, kvm,
Yonghong Song, Andrii Nakryiko, Martin KaFai Lau
Run the offloaded XDP program as soon as a packet is removed from the
ptr ring. Since this is XDP in the Tx path, the traditional handling
of the XDP_TX/REDIRECT actions isn't valid. For this reason we call
do_xdp_generic_core() instead of do_xdp_generic(); do_xdp_generic_core()
just runs the program and leaves the action handling to us.
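The dispatch this commit message describes can be modeled in a small
userspace C sketch (illustrative only; the real implementation is
tun_do_xdp_offload_generic() in the diff below, and the enum mirrors
include/uapi/linux/bpf.h). XDP_PASS lets the packet continue toward
userspace, XDP_TX re-injects it into the Rx path (for skbs here, and
for frames after the XDP_TX patch later in the series), and everything
else, including the not-yet-supported XDP_REDIRECT, is dropped:

```c
#include <assert.h>

/* XDP return codes, in the same order as include/uapi/linux/bpf.h */
enum xdp_action { XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX, XDP_REDIRECT };

/* Possible outcomes for a packet leaving the ptr ring on the Tx path. */
enum tx_verdict { TX_DELIVER, TX_REINJECT_RX, TX_DROP };

/* Sketch of the Tx-path dispatch: only PASS and TX have defined
 * behavior; REDIRECT and anything unknown fall through to drop. */
static enum tx_verdict tx_path_verdict(enum xdp_action act)
{
	switch (act) {
	case XDP_PASS:
		return TX_DELIVER;
	case XDP_TX:
		return TX_REINJECT_RX;
	case XDP_REDIRECT:	/* unsupported in this path for now */
	case XDP_ABORTED:
	case XDP_DROP:
	default:
		return TX_DROP;
	}
}
```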
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
drivers/net/tun.c | 149 +++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 146 insertions(+), 3 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index ecb49101b0b5..466ea69f00ee 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -131,6 +131,7 @@ struct tap_filter {
/* MAX_TAP_QUEUES 256 is chosen to allow rx/tx queues to be equal
* to max number of VCPUs in guest. */
#define MAX_TAP_QUEUES 256
+#define MAX_TAP_BATCH 64
#define MAX_TAP_FLOWS 4096
#define TUN_FLOW_EXPIRE (3 * HZ)
@@ -2156,6 +2157,109 @@ static ssize_t tun_put_user(struct tun_struct *tun,
return total;
}
+static struct sk_buff *tun_prepare_xdp_skb(struct sk_buff *skb)
+{
+ struct sk_buff *nskb;
+
+ if (skb_shared(skb) || skb_cloned(skb)) {
+ nskb = skb_copy(skb, GFP_ATOMIC);
+ consume_skb(skb);
+ return nskb;
+ }
+
+ return skb;
+}
+
+static u32 tun_do_xdp_offload_generic(struct tun_struct *tun,
+ struct sk_buff *skb)
+{
+ struct tun_prog *xdp_prog;
+ struct xdp_buff xdp;
+ u32 act = XDP_PASS;
+
+ xdp_prog = rcu_dereference(tun->offloaded_xdp_prog);
+ if (xdp_prog) {
+ skb = tun_prepare_xdp_skb(skb);
+ if (!skb) {
+ act = XDP_DROP;
+ kfree_skb(skb);
+ goto drop;
+ }
+
+ act = do_xdp_generic_core(skb, &xdp, xdp_prog->prog);
+ switch (act) {
+ case XDP_TX:
+ /*
+ * Rx path generic XDP will be called in this path
+ */
+ netif_receive_skb(skb);
+ break;
+ case XDP_PASS:
+ break;
+ case XDP_REDIRECT:
+ /*
+ * Since we are not handling this case yet, let's free
+ * skb here. In case of XDP_DROP/XDP_ABORTED, the skb
+ * was already freed in do_xdp_generic_core()
+ */
+ kfree_skb(skb);
+ /* fall through */
+ default:
+ bpf_warn_invalid_xdp_action(act);
+ /* fall through */
+ case XDP_ABORTED:
+ trace_xdp_exception(tun->dev, xdp_prog->prog, act);
+ /* fall through */
+ case XDP_DROP:
+ goto drop;
+ }
+ }
+
+ return act;
+drop:
+ this_cpu_inc(tun->pcpu_stats->tx_dropped);
+ return act;
+}
+
+static u32 tun_do_xdp_offload(struct tun_struct *tun, struct tun_file *tfile,
+ struct xdp_frame *frame)
+{
+ struct tun_prog *xdp_prog;
+ struct tun_page tpage;
+ struct xdp_buff xdp;
+ u32 act = XDP_PASS;
+ int flush = 0;
+
+ xdp_prog = rcu_dereference(tun->offloaded_xdp_prog);
+ if (xdp_prog) {
+ xdp.data_hard_start = frame->data - frame->headroom;
+ xdp.data = frame->data;
+ xdp.data_end = xdp.data + frame->len;
+ xdp.data_meta = xdp.data - frame->metasize;
+
+ act = bpf_prog_run_xdp(xdp_prog->prog, &xdp);
+ switch (act) {
+ case XDP_PASS:
+ break;
+ case XDP_TX:
+ /* fall through */
+ case XDP_REDIRECT:
+ /* fall through */
+ default:
+ bpf_warn_invalid_xdp_action(act);
+ /* fall through */
+ case XDP_ABORTED:
+ trace_xdp_exception(tun->dev, xdp_prog->prog, act);
+ /* fall through */
+ case XDP_DROP:
+ xdp_return_frame_rx_napi(frame);
+ break;
+ }
+ }
+
+ return act;
+}
+
static void *tun_ring_recv(struct tun_file *tfile, int noblock, int *err)
{
DECLARE_WAITQUEUE(wait, current);
@@ -2574,6 +2678,47 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
return ret;
}
+static int tun_consume_packets(struct tun_file *tfile, void **ptr_array, int n)
+{
+ struct tun_prog *xdp_prog;
+ struct xdp_frame *frame;
+ struct tun_struct *tun;
+ int i, num_ptrs;
+ int pkt_cnt = 0;
+ void *pkts[MAX_TAP_BATCH];
+ void *ptr;
+ u32 act;
+
+ if (unlikely(!tfile))
+ return 0;
+
+ if (n > MAX_TAP_BATCH)
+ n = MAX_TAP_BATCH;
+
+ rcu_read_lock();
+ tun = rcu_dereference(tfile->tun);
+ if (unlikely(!tun))
+ return 0;
+ xdp_prog = rcu_dereference(tun->offloaded_xdp_prog);
+
+ num_ptrs = ptr_ring_consume_batched(&tfile->tx_ring, pkts, n);
+ for (i = 0; i < num_ptrs; i++) {
+ ptr = pkts[i];
+ if (tun_is_xdp_frame(ptr)) {
+ frame = tun_ptr_to_xdp(ptr);
+ act = tun_do_xdp_offload(tun, tfile, frame);
+ } else {
+ act = tun_do_xdp_offload_generic(tun, ptr);
+ }
+
+ if (act == XDP_PASS)
+ ptr_array[pkt_cnt++] = ptr;
+ }
+
+ rcu_read_unlock();
+ return pkt_cnt;
+}
+
static int tun_recvmsg(struct socket *sock, struct msghdr *m, size_t total_len,
int flags)
{
@@ -2594,9 +2739,7 @@ static int tun_recvmsg(struct socket *sock, struct msghdr *m, size_t total_len,
ptr = ctl->ptr;
break;
case TUN_MSG_CONSUME_PKTS:
- ret = ptr_ring_consume_batched(&tfile->tx_ring,
- ctl->ptr,
- ctl->num);
+ ret = tun_consume_packets(tfile, ctl->ptr, ctl->num);
goto out;
case TUN_MSG_UNCONSUME_PKTS:
ptr_ring_unconsume(&tfile->tx_ring, ctl->ptr,
--
2.20.1
* Re: [RFC net-next 08/18] tun: run offloaded XDP program in Tx path
2019-11-26 10:07 ` [RFC net-next 08/18] tun: run offloaded XDP program in Tx path Prashant Bhole
@ 2019-12-01 16:39 ` David Ahern
2019-12-01 20:56 ` David Miller
2019-12-02 2:45 ` Jason Wang
0 siblings, 2 replies; 43+ messages in thread
From: David Ahern @ 2019-12-01 16:39 UTC (permalink / raw)
To: Prashant Bhole, David S . Miller, Michael S . Tsirkin
Cc: Song Liu, Jakub Kicinski, Jesper Dangaard Brouer,
Daniel Borkmann, netdev, Jason Wang, John Fastabend,
Alexei Starovoitov, qemu-devel, kvm, Yonghong Song,
Andrii Nakryiko, Martin KaFai Lau
On 11/26/19 4:07 AM, Prashant Bhole wrote:
> run offloaded XDP program as soon as packet is removed from the ptr
> ring. Since this is XDP in Tx path, the traditional handling of
> XDP actions XDP_TX/REDIRECT isn't valid. For this reason we call
> do_xdp_generic_core instead of do_xdp_generic. do_xdp_generic_core
> just runs the program and leaves the action handling to us.
What happens if an offloaded program returns XDP_REDIRECT?
Below you just drop the packet which is going to be a bad user
experience. A better user experience is to detect XDP return codes a
program uses, catch those that are not supported for this use case and
fail the install of the program.
* Re: [RFC net-next 08/18] tun: run offloaded XDP program in Tx path
2019-12-01 16:39 ` David Ahern
@ 2019-12-01 20:56 ` David Miller
2019-12-01 21:40 ` Michael S. Tsirkin
2019-12-02 2:45 ` Jason Wang
1 sibling, 1 reply; 43+ messages in thread
From: David Miller @ 2019-12-01 20:56 UTC (permalink / raw)
To: dsahern
Cc: songliubraving, jakub.kicinski, hawk, daniel, mst, netdev,
jasowang, john.fastabend, ast, qemu-devel, prashantbhole.linux,
kvm, yhs, andriin, kafai
From: David Ahern <dsahern@gmail.com>
Date: Sun, 1 Dec 2019 09:39:54 -0700
> Below you just drop the packet which is going to be a bad user
> experience. A better user experience is to detect XDP return codes a
> program uses, catch those that are not supported for this use case and
> fail the install of the program.
This is not universally possible.
Return codes can be calculated dynamically, come from maps potentially
shared with other bpf programs, etc.
So unfortunately this suggestion is not tenable.
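The objection can be made concrete with a minimal userspace C sketch
(a model only; a real BPF program would read a shared BPF map, modeled
here as a plain array): when the verdict comes from data that another
program can update at runtime, no static scan of the program text can
enumerate the return codes it will produce.

```c
#include <assert.h>

enum xdp_action { XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX, XDP_REDIRECT };

/* Stand-in for a BPF array map that another program may update. */
static enum xdp_action verdict_map[4] = {
	XDP_PASS, XDP_PASS, XDP_PASS, XDP_PASS,
};

/* The "program": its return code is whatever the map currently holds,
 * so the possible verdicts are a runtime property, not a static one. */
static enum xdp_action prog_run(unsigned int key)
{
	return verdict_map[key & 3];
}
```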
* Re: [RFC net-next 08/18] tun: run offloaded XDP program in Tx path
2019-12-01 20:56 ` David Miller
@ 2019-12-01 21:40 ` Michael S. Tsirkin
2019-12-01 21:54 ` David Miller
0 siblings, 1 reply; 43+ messages in thread
From: Michael S. Tsirkin @ 2019-12-01 21:40 UTC (permalink / raw)
To: David Miller
Cc: songliubraving, jakub.kicinski, hawk, daniel, netdev, jasowang,
john.fastabend, ast, qemu-devel, prashantbhole.linux, kvm,
dsahern, yhs, andriin, kafai
On Sun, Dec 01, 2019 at 12:56:21PM -0800, David Miller wrote:
> From: David Ahern <dsahern@gmail.com>
> Date: Sun, 1 Dec 2019 09:39:54 -0700
>
> > Below you just drop the packet which is going to be a bad user
> > experience. A better user experience is to detect XDP return codes a
> > program uses, catch those that are not supported for this use case and
> > fail the install of the program.
>
> This is not universally possible.
>
> Return codes can be calculated dynamically, come from maps potentially
> shared with other bpf programs, etc.
>
> So unfortunately this suggestion is not tenable.
Right. But it is helpful to expose the supported functionality
to the guest in some way, if nothing else then so that
guests can be moved between different hosts.
Also, we need a way to report this kind of event to the guest
so it's possible to figure out what went wrong.
--
MST
* Re: [RFC net-next 08/18] tun: run offloaded XDP program in Tx path
2019-12-01 21:40 ` Michael S. Tsirkin
@ 2019-12-01 21:54 ` David Miller
2019-12-02 2:56 ` Jason Wang
0 siblings, 1 reply; 43+ messages in thread
From: David Miller @ 2019-12-01 21:54 UTC (permalink / raw)
To: mst
Cc: songliubraving, jakub.kicinski, hawk, daniel, netdev, jasowang,
john.fastabend, ast, qemu-devel, prashantbhole.linux, kvm,
dsahern, yhs, andriin, kafai
From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Sun, 1 Dec 2019 16:40:22 -0500
> Right. But it is helpful to expose the supported functionality
> to guest in some way, if nothing else then so that
> guests can be moved between different hosts.
>
> Also, we need a way to report this kind of event to guest
> so it's possible to figure out what went wrong.
On the contrary, this is why it is of utmost importance that all
XDP implementations support the full suite of XDP facilities from
the very beginning.
This is why we keep giving people a hard time when they add support
only for some of the XDP return values and semantics. Users will get
killed by this, and it makes XDP a poor technology to use because
behavior is not consistent across device types.
That's not acceptable and I'll push back on anything that continues
this trend.
If you can't HW offload it, kick it to software.
* Re: [RFC net-next 08/18] tun: run offloaded XDP program in Tx path
2019-12-01 21:54 ` David Miller
@ 2019-12-02 2:56 ` Jason Wang
0 siblings, 0 replies; 43+ messages in thread
From: Jason Wang @ 2019-12-02 2:56 UTC (permalink / raw)
To: David Miller, mst
Cc: songliubraving, jakub.kicinski, hawk, daniel, netdev,
john.fastabend, ast, qemu-devel, prashantbhole.linux, kvm,
dsahern, yhs, andriin, kafai
On 2019/12/2 5:54 AM, David Miller wrote:
> From: "Michael S. Tsirkin" <mst@redhat.com>
> Date: Sun, 1 Dec 2019 16:40:22 -0500
>
>> Right. But it is helpful to expose the supported functionality
>> to guest in some way, if nothing else then so that
>> guests can be moved between different hosts.
>>
>> Also, we need a way to report this kind of event to guest
>> so it's possible to figure out what went wrong.
> On the contrary, this is why it is of utmost importance that all
> XDP implementations support the full suite of XDP facilities from
> the very beginning.
>
> This is why we keep giving people a hard time when they add support
> only for some of the XDP return values and semantics. Users will get
> killed by this, and it makes XDP a poor technology to use because
> behavior is not consistent across device types.
>
> That's not acceptable and I'll push back on anything that continues
> this trend.
>
> If you can't HW offload it, kick it to software.
We can try to work out a solution for XDP_REDIRECT.
Thanks
* Re: [RFC net-next 08/18] tun: run offloaded XDP program in Tx path
2019-12-01 16:39 ` David Ahern
2019-12-01 20:56 ` David Miller
@ 2019-12-02 2:45 ` Jason Wang
1 sibling, 0 replies; 43+ messages in thread
From: Jason Wang @ 2019-12-02 2:45 UTC (permalink / raw)
To: David Ahern, Prashant Bhole, David S . Miller, Michael S . Tsirkin
Cc: Song Liu, Jakub Kicinski, Jesper Dangaard Brouer,
Daniel Borkmann, netdev, John Fastabend, Alexei Starovoitov,
qemu-devel, kvm, Yonghong Song, Andrii Nakryiko,
Martin KaFai Lau
On 2019/12/2 12:39 AM, David Ahern wrote:
> On 11/26/19 4:07 AM, Prashant Bhole wrote:
>> run offloaded XDP program as soon as packet is removed from the ptr
>> ring. Since this is XDP in Tx path, the traditional handling of
>> XDP actions XDP_TX/REDIRECT isn't valid. For this reason we call
>> do_xdp_generic_core instead of do_xdp_generic. do_xdp_generic_core
>> just runs the program and leaves the action handling to us.
> What happens if an offloaded program returns XDP_REDIRECT?
>
> Below you just drop the packet which is going to be a bad user
> experience. A better user experience is to detect XDP return codes a
> program uses, catch those that are not supported for this use case and
> fail the install of the program.
Yes, this could be done in the guest.
Thanks
* [RFC net-next 09/18] tun: add a way to inject Tx path packet into Rx path
2019-11-26 10:07 [RFC net-next 00/18] virtio_net XDP offload Prashant Bhole
` (7 preceding siblings ...)
2019-11-26 10:07 ` [RFC net-next 08/18] tun: run offloaded XDP program in Tx path Prashant Bhole
@ 2019-11-26 10:07 ` Prashant Bhole
2019-11-26 10:07 ` [RFC net-next 10/18] tun: handle XDP_TX action of offloaded program Prashant Bhole
` (9 subsequent siblings)
18 siblings, 0 replies; 43+ messages in thread
From: Prashant Bhole @ 2019-11-26 10:07 UTC (permalink / raw)
To: David S . Miller, Michael S . Tsirkin
Cc: Song Liu, Jakub Kicinski, Jesper Dangaard Brouer,
Daniel Borkmann, netdev, Jason Wang, John Fastabend,
Alexei Starovoitov, qemu-devel, Prashant Bhole, kvm,
Yonghong Song, Andrii Nakryiko, Martin KaFai Lau
In order to support XDP_TX from an offloaded XDP program, we need a
way to inject a Tx path packet into the Rx path. Let's modify the Rx
path function tun_xdp_one() for this purpose.
This patch adds a parameter indicating whether the packet carries a
virtio_net header. When the header isn't present, the packet is
treated as one XDP_TX'ed by the offloaded program.
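The buffer-length computation the diff below uses for the headerless
case (buflen = datasize + headroom + SKB_DATA_ALIGN(sizeof(struct
skb_shared_info))) can be checked in isolation. In this sketch,
SKB_DATA_ALIGN is modeled as alignment to a 64-byte cacheline and the
shared-info size is a placeholder (the real value is architecture
dependent), so the numbers are illustrative only:

```c
#include <assert.h>
#include <stddef.h>

#define SMP_CACHE_BYTES 64
/* Kernel macro: round skb data sizes up to the cacheline size. */
#define SKB_DATA_ALIGN(x) (((x) + (SMP_CACHE_BYTES - 1)) & \
			   ~(SMP_CACHE_BYTES - 1))

/* Placeholder for sizeof(struct skb_shared_info). */
#define SHINFO_SIZE 320

/* For a frame with no tun_xdp_hdr, the usable buffer is the payload,
 * plus the headroom in front of it, plus the aligned shared info. */
static size_t tx_buflen(size_t datasize, size_t headroom)
{
	return datasize + headroom + SKB_DATA_ALIGN(SHINFO_SIZE);
}
```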
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
drivers/net/tun.c | 35 ++++++++++++++++++++++++++++-------
1 file changed, 28 insertions(+), 7 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 466ea69f00ee..8d6cdd3e5139 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2221,6 +2221,13 @@ static u32 tun_do_xdp_offload_generic(struct tun_struct *tun,
return act;
}
+static int tun_xdp_one(struct tun_struct *tun,
+ struct tun_file *tfile,
+ struct xdp_buff *xdp, int *flush,
+ struct tun_page *tpage, int has_hdr);
+
+static void tun_put_page(struct tun_page *tpage);
+
static u32 tun_do_xdp_offload(struct tun_struct *tun, struct tun_file *tfile,
struct xdp_frame *frame)
{
@@ -2527,23 +2534,36 @@ static void tun_put_page(struct tun_page *tpage)
static int tun_xdp_one(struct tun_struct *tun,
struct tun_file *tfile,
struct xdp_buff *xdp, int *flush,
- struct tun_page *tpage)
+ struct tun_page *tpage, int has_hdr)
{
unsigned int datasize = xdp->data_end - xdp->data;
- struct tun_xdp_hdr *hdr = xdp->data_hard_start;
- struct virtio_net_hdr *gso = &hdr->gso;
+ struct tun_xdp_hdr *hdr;
+ struct virtio_net_hdr *gso;
struct tun_pcpu_stats *stats;
struct bpf_prog *xdp_prog;
struct sk_buff *skb = NULL;
+ unsigned int headroom;
u32 rxhash = 0, act;
- int buflen = hdr->buflen;
+ int buflen;
int err = 0;
bool skb_xdp = false;
struct page *page;
+ if (has_hdr) {
+ hdr = xdp->data_hard_start;
+ gso = &hdr->gso;
+ buflen = hdr->buflen;
+ } else {
+ /* came here from tun tx path */
+ xdp->data_hard_start -= sizeof(struct xdp_frame);
+ headroom = xdp->data - xdp->data_hard_start;
+ buflen = datasize + headroom +
+ SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+ }
+
xdp_prog = rcu_dereference(tun->xdp_prog);
if (xdp_prog) {
- if (gso->gso_type) {
+ if (has_hdr && gso->gso_type) {
skb_xdp = true;
goto build;
}
@@ -2588,7 +2608,8 @@ static int tun_xdp_one(struct tun_struct *tun,
skb_reserve(skb, xdp->data - xdp->data_hard_start);
skb_put(skb, xdp->data_end - xdp->data);
- if (virtio_net_hdr_to_skb(skb, gso, tun_is_little_endian(tun))) {
+ if (has_hdr &&
+ virtio_net_hdr_to_skb(skb, gso, tun_is_little_endian(tun))) {
this_cpu_inc(tun->pcpu_stats->rx_frame_errors);
kfree_skb(skb);
err = -EINVAL;
@@ -2652,7 +2673,7 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
for (i = 0; i < n; i++) {
xdp = &((struct xdp_buff *)ctl->ptr)[i];
- tun_xdp_one(tun, tfile, xdp, &flush, &tpage);
+ tun_xdp_one(tun, tfile, xdp, &flush, &tpage, true);
}
if (flush)
--
2.20.1
* [RFC net-next 10/18] tun: handle XDP_TX action of offloaded program
2019-11-26 10:07 [RFC net-next 00/18] virtio_net XDP offload Prashant Bhole
` (8 preceding siblings ...)
2019-11-26 10:07 ` [RFC net-next 09/18] tun: add a way to inject Tx path packet into Rx path Prashant Bhole
@ 2019-11-26 10:07 ` Prashant Bhole
2019-11-26 10:07 ` [RFC net-next 11/18] tun: run xdp prog when tun is read from file interface Prashant Bhole
` (8 subsequent siblings)
18 siblings, 0 replies; 43+ messages in thread
From: Prashant Bhole @ 2019-11-26 10:07 UTC (permalink / raw)
To: David S . Miller, Michael S . Tsirkin
Cc: Song Liu, Jakub Kicinski, Jesper Dangaard Brouer,
Daniel Borkmann, netdev, Jason Wang, John Fastabend,
Alexei Starovoitov, qemu-devel, Prashant Bhole, kvm,
Yonghong Song, Andrii Nakryiko, Martin KaFai Lau
When the offloaded program returns XDP_TX, we need to inject the
packet into the Rx path of tun. This patch injects such packets into
the Rx path using tun_xdp_one().
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
drivers/net/tun.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 8d6cdd3e5139..084ca95358fe 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2249,7 +2249,13 @@ static u32 tun_do_xdp_offload(struct tun_struct *tun, struct tun_file *tfile,
case XDP_PASS:
break;
case XDP_TX:
- /* fall through */
+ tpage.page = NULL;
+ tpage.count = 0;
+ tun_xdp_one(tun, tfile, &xdp, &flush, &tpage, false);
+ tun_put_page(&tpage);
+ if (flush)
+ xdp_do_flush_map();
+ break;
case XDP_REDIRECT:
/* fall through */
default:
--
2.20.1
* [RFC net-next 11/18] tun: run xdp prog when tun is read from file interface
2019-11-26 10:07 [RFC net-next 00/18] virtio_net XDP offload Prashant Bhole
` (9 preceding siblings ...)
2019-11-26 10:07 ` [RFC net-next 10/18] tun: handle XDP_TX action of offloaded program Prashant Bhole
@ 2019-11-26 10:07 ` Prashant Bhole
2019-11-26 10:07 ` [RFC net-next 12/18] virtio-net: store xdp_prog in device Prashant Bhole
` (7 subsequent siblings)
18 siblings, 0 replies; 43+ messages in thread
From: Prashant Bhole @ 2019-11-26 10:07 UTC (permalink / raw)
To: David S . Miller, Michael S . Tsirkin
Cc: Song Liu, Jakub Kicinski, Jesper Dangaard Brouer,
Daniel Borkmann, netdev, Jason Wang, John Fastabend,
Alexei Starovoitov, qemu-devel, Prashant Bhole, kvm,
Yonghong Song, Andrii Nakryiko, Martin KaFai Lau
This handles the case where qemu performs a read on the tun device
using file operations.
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
drivers/net/tun.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 084ca95358fe..639921c10e32 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2318,8 +2318,10 @@ static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
struct iov_iter *to,
int noblock, void *ptr)
{
+ struct xdp_frame *frame;
ssize_t ret;
int err;
+ u32 act;
tun_debug(KERN_INFO, tun, "tun_do_read\n");
@@ -2333,6 +2335,15 @@ static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
ptr = tun_ring_recv(tfile, noblock, &err);
if (!ptr)
return err;
+
+ if (tun_is_xdp_frame(ptr)) {
+ frame = tun_ptr_to_xdp(ptr);
+ act = tun_do_xdp_offload(tun, tfile, frame);
+ } else {
+ act = tun_do_xdp_offload_generic(tun, ptr);
+ }
+ if (act != XDP_PASS)
+ return err;
}
if (tun_is_xdp_frame(ptr)) {
--
2.20.1
* [RFC net-next 12/18] virtio-net: store xdp_prog in device
2019-11-26 10:07 [RFC net-next 00/18] virtio_net XDP offload Prashant Bhole
` (10 preceding siblings ...)
2019-11-26 10:07 ` [RFC net-next 11/18] tun: run xdp prog when tun is read from file interface Prashant Bhole
@ 2019-11-26 10:07 ` Prashant Bhole
2019-11-26 10:07 ` [RFC net-next 13/18] virtio_net: use XDP attachment helpers Prashant Bhole
` (6 subsequent siblings)
18 siblings, 0 replies; 43+ messages in thread
From: Prashant Bhole @ 2019-11-26 10:07 UTC (permalink / raw)
To: David S . Miller, Michael S . Tsirkin
Cc: Song Liu, Jakub Kicinski, Jesper Dangaard Brouer,
Daniel Borkmann, netdev, Jason Wang, John Fastabend,
Alexei Starovoitov, qemu-devel, Prashant Bhole, kvm,
Yonghong Song, Andrii Nakryiko, Martin KaFai Lau
From: Jason Wang <jasowang@redhat.com>
This is a preparation for adding XDP offload support to the virtio_net
driver. Storing the XDP program in virtnet_info makes it consistent
with the offloaded program, which will be introduced in subsequent
patches.
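The structural change can be sketched in plain C (plain pointers, no
RCU; struct and field names abbreviated and purely illustrative): with
the program stored once in the device-private info, attaching no
longer needs the per-queue loop the diff below removes.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_QP 4

struct bpf_prog_stub { int id; };

/* Before: one program pointer per receive queue, kept in sync by
 * looping over all queues on attach/detach. */
struct receive_queue_sketch { struct bpf_prog_stub *xdp_prog; };

/* After: a single program pointer in the device-private info. */
struct virtnet_info_sketch {
	struct receive_queue_sketch rq[MAX_QP]; /* xdp_prog unused now */
	struct bpf_prog_stub *xdp_prog;
};

/* Attach becomes a single assignment instead of a loop over queues. */
static void attach_prog(struct virtnet_info_sketch *vi,
			struct bpf_prog_stub *prog)
{
	vi->xdp_prog = prog;
}
```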
Signed-off-by: Jason Wang <jasowang@redhat.com>
Co-developed-by: Prashant Bhole <prashantbhole.linux@gmail.com>
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
drivers/net/virtio_net.c | 62 ++++++++++++++++------------------------
1 file changed, 25 insertions(+), 37 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 4d7d5434cc5d..c8bbb1b90c1c 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -137,8 +137,6 @@ struct receive_queue {
struct napi_struct napi;
- struct bpf_prog __rcu *xdp_prog;
-
struct virtnet_rq_stats stats;
/* Chain pages by the private ptr. */
@@ -229,6 +227,8 @@ struct virtnet_info {
/* failover when STANDBY feature enabled */
struct failover *failover;
+
+ struct bpf_prog __rcu *xdp_prog;
};
struct padded_vnet_hdr {
@@ -486,7 +486,6 @@ static int virtnet_xdp_xmit(struct net_device *dev,
int n, struct xdp_frame **frames, u32 flags)
{
struct virtnet_info *vi = netdev_priv(dev);
- struct receive_queue *rq = vi->rq;
struct bpf_prog *xdp_prog;
struct send_queue *sq;
unsigned int len;
@@ -501,7 +500,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
/* Only allow ndo_xdp_xmit if XDP is loaded on dev, as this
* indicate XDP resources have been successfully allocated.
*/
- xdp_prog = rcu_dereference(rq->xdp_prog);
+ xdp_prog = rcu_dereference(vi->xdp_prog);
if (!xdp_prog)
return -ENXIO;
@@ -649,7 +648,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
stats->bytes += len;
rcu_read_lock();
- xdp_prog = rcu_dereference(rq->xdp_prog);
+ xdp_prog = rcu_dereference(vi->xdp_prog);
if (xdp_prog) {
struct virtio_net_hdr_mrg_rxbuf *hdr = buf + header_offset;
struct xdp_frame *xdpf;
@@ -798,7 +797,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
stats->bytes += len - vi->hdr_len;
rcu_read_lock();
- xdp_prog = rcu_dereference(rq->xdp_prog);
+ xdp_prog = rcu_dereference(vi->xdp_prog);
if (xdp_prog) {
struct xdp_frame *xdpf;
struct page *xdp_page;
@@ -2060,7 +2059,7 @@ static int virtnet_set_channels(struct net_device *dev,
* also when XDP is loaded all RX queues have XDP programs so we only
* need to check a single RX queue.
*/
- if (vi->rq[0].xdp_prog)
+ if (vi->xdp_prog)
return -EINVAL;
get_online_cpus();
@@ -2441,13 +2440,10 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
return -ENOMEM;
}
- old_prog = rtnl_dereference(vi->rq[0].xdp_prog);
+ old_prog = rtnl_dereference(vi->xdp_prog);
if (!prog && !old_prog)
return 0;
- if (prog)
- bpf_prog_add(prog, vi->max_queue_pairs - 1);
-
/* Make sure NAPI is not using any XDP TX queues for RX. */
if (netif_running(dev)) {
for (i = 0; i < vi->max_queue_pairs; i++) {
@@ -2457,11 +2453,8 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
}
if (!prog) {
- for (i = 0; i < vi->max_queue_pairs; i++) {
- rcu_assign_pointer(vi->rq[i].xdp_prog, prog);
- if (i == 0)
- virtnet_restore_guest_offloads(vi);
- }
+ rcu_assign_pointer(vi->xdp_prog, prog);
+ virtnet_restore_guest_offloads(vi);
synchronize_net();
}
@@ -2472,16 +2465,12 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
vi->xdp_queue_pairs = xdp_qp;
if (prog) {
- for (i = 0; i < vi->max_queue_pairs; i++) {
- rcu_assign_pointer(vi->rq[i].xdp_prog, prog);
- if (i == 0 && !old_prog)
- virtnet_clear_guest_offloads(vi);
- }
+ rcu_assign_pointer(vi->xdp_prog, prog);
+ if (!old_prog)
+ virtnet_clear_guest_offloads(vi);
}
for (i = 0; i < vi->max_queue_pairs; i++) {
- if (old_prog)
- bpf_prog_put(old_prog);
if (netif_running(dev)) {
virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
virtnet_napi_tx_enable(vi, vi->sq[i].vq,
@@ -2489,13 +2478,15 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
}
}
+ if (old_prog)
+ bpf_prog_put(old_prog);
+
return 0;
err:
if (!prog) {
virtnet_clear_guest_offloads(vi);
- for (i = 0; i < vi->max_queue_pairs; i++)
- rcu_assign_pointer(vi->rq[i].xdp_prog, old_prog);
+ rcu_assign_pointer(vi->xdp_prog, old_prog);
}
if (netif_running(dev)) {
@@ -2514,13 +2505,11 @@ static u32 virtnet_xdp_query(struct net_device *dev)
{
struct virtnet_info *vi = netdev_priv(dev);
const struct bpf_prog *xdp_prog;
- int i;
- for (i = 0; i < vi->max_queue_pairs; i++) {
- xdp_prog = rtnl_dereference(vi->rq[i].xdp_prog);
- if (xdp_prog)
- return xdp_prog->aux->id;
- }
+ xdp_prog = rtnl_dereference(vi->xdp_prog);
+ if (xdp_prog)
+ return xdp_prog->aux->id;
+
return 0;
}
@@ -2657,18 +2646,17 @@ static void virtnet_free_queues(struct virtnet_info *vi)
static void _free_receive_bufs(struct virtnet_info *vi)
{
- struct bpf_prog *old_prog;
+ struct bpf_prog *old_prog = rtnl_dereference(vi->xdp_prog);
int i;
for (i = 0; i < vi->max_queue_pairs; i++) {
while (vi->rq[i].pages)
__free_pages(get_a_page(&vi->rq[i], GFP_KERNEL), 0);
-
- old_prog = rtnl_dereference(vi->rq[i].xdp_prog);
- RCU_INIT_POINTER(vi->rq[i].xdp_prog, NULL);
- if (old_prog)
- bpf_prog_put(old_prog);
}
+
+ RCU_INIT_POINTER(vi->xdp_prog, NULL);
+ if (old_prog)
+ bpf_prog_put(old_prog);
}
static void free_receive_bufs(struct virtnet_info *vi)
--
2.20.1
* [RFC net-next 13/18] virtio_net: use XDP attachment helpers
2019-11-26 10:07 [RFC net-next 00/18] virtio_net XDP offload Prashant Bhole
` (11 preceding siblings ...)
2019-11-26 10:07 ` [RFC net-next 12/18] virtio-net: store xdp_prog in device Prashant Bhole
@ 2019-11-26 10:07 ` Prashant Bhole
2019-11-26 10:07 ` [RFC net-next 14/18] virtio_net: add XDP prog offload infrastructure Prashant Bhole
` (5 subsequent siblings)
18 siblings, 0 replies; 43+ messages in thread
From: Prashant Bhole @ 2019-11-26 10:07 UTC (permalink / raw)
To: David S . Miller, Michael S . Tsirkin
Cc: Song Liu, Jakub Kicinski, Jesper Dangaard Brouer,
Daniel Borkmann, netdev, Jason Wang, John Fastabend,
Alexei Starovoitov, qemu-devel, Prashant Bhole, kvm,
Yonghong Song, Andrii Nakryiko, Martin KaFai Lau
Subsequent patches will introduce virtio_net XDP offloading, which
requires managing both offloaded and non-offloaded programs. To keep
that handling consistent, this patch switches to the common XDP
attachment helpers.
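The flags check those helpers perform can be sketched as a small userspace model. This is an assumption about the helpers' behavior, not the in-tree code: xdp_attachment_flags_ok() rejects an attach request when a program is already attached and the mode-selecting flags differ from the ones used originally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified userspace model (an assumption, not the in-tree code) of
 * xdp_attachment_flags_ok(): an attach request is rejected when a program
 * is already attached and the mode-selecting flags differ from the ones
 * used for the original attach, so userspace cannot silently switch modes. */
#define XDP_FLAGS_SKB_MODE	(1U << 1)
#define XDP_FLAGS_DRV_MODE	(1U << 2)
#define XDP_FLAGS_HW_MODE	(1U << 3)
#define XDP_FLAGS_MODES		(XDP_FLAGS_SKB_MODE | XDP_FLAGS_DRV_MODE | \
				 XDP_FLAGS_HW_MODE)

struct attach_state {
	bool prog_attached;	/* a program is currently attached */
	uint32_t flags;		/* flags used for that attach */
};

static bool flags_ok(const struct attach_state *st, uint32_t new_flags)
{
	if (st->prog_attached && ((st->flags ^ new_flags) & XDP_FLAGS_MODES))
		return false;	/* the driver would return -EBUSY */
	return true;
}
```

This is why virtnet_xdp_set() above starts with the flags check and bails out with -EBUSY before touching any state.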
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
drivers/net/virtio_net.c | 30 +++++++++++-------------------
1 file changed, 11 insertions(+), 19 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c8bbb1b90c1c..cee5c2b15c62 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -229,6 +229,8 @@ struct virtnet_info {
struct failover *failover;
struct bpf_prog __rcu *xdp_prog;
+
+ struct xdp_attachment_info xdp;
};
struct padded_vnet_hdr {
@@ -2398,15 +2400,19 @@ static int virtnet_restore_guest_offloads(struct virtnet_info *vi)
return virtnet_set_guest_offloads(vi, offloads);
}
-static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
- struct netlink_ext_ack *extack)
+static int virtnet_xdp_set(struct net_device *dev, struct netdev_bpf *bpf)
{
unsigned long int max_sz = PAGE_SIZE - sizeof(struct padded_vnet_hdr);
+ struct netlink_ext_ack *extack = bpf->extack;
struct virtnet_info *vi = netdev_priv(dev);
+ struct bpf_prog *prog = bpf->prog;
struct bpf_prog *old_prog;
u16 xdp_qp = 0, curr_qp;
int i, err;
+ if (!xdp_attachment_flags_ok(&vi->xdp, bpf))
+ return -EBUSY;
+
if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS)
&& (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
@@ -2478,8 +2484,7 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
}
}
- if (old_prog)
- bpf_prog_put(old_prog);
+ xdp_attachment_setup(&vi->xdp, bpf);
return 0;
@@ -2501,26 +2506,13 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
return err;
}
-static u32 virtnet_xdp_query(struct net_device *dev)
-{
- struct virtnet_info *vi = netdev_priv(dev);
- const struct bpf_prog *xdp_prog;
-
- xdp_prog = rtnl_dereference(vi->xdp_prog);
- if (xdp_prog)
- return xdp_prog->aux->id;
-
- return 0;
-}
-
static int virtnet_xdp(struct net_device *dev, struct netdev_bpf *xdp)
{
switch (xdp->command) {
case XDP_SETUP_PROG:
- return virtnet_xdp_set(dev, xdp->prog, xdp->extack);
+ return virtnet_xdp_set(dev, xdp);
case XDP_QUERY_PROG:
- xdp->prog_id = virtnet_xdp_query(dev);
- return 0;
+ return xdp_attachment_query(&vi->xdp, xdp);
default:
return -EINVAL;
}
--
2.20.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [RFC net-next 14/18] virtio_net: add XDP prog offload infrastructure
2019-11-26 10:07 [RFC net-next 00/18] virtio_net XDP offload Prashant Bhole
` (12 preceding siblings ...)
2019-11-26 10:07 ` [RFC net-next 13/18] virtio_net: use XDP attachment helpers Prashant Bhole
@ 2019-11-26 10:07 ` Prashant Bhole
2019-11-26 10:07 ` [RFC net-next 15/18] virtio_net: implement XDP prog offload functionality Prashant Bhole
` (4 subsequent siblings)
18 siblings, 0 replies; 43+ messages in thread
From: Prashant Bhole @ 2019-11-26 10:07 UTC (permalink / raw)
To: David S . Miller, Michael S . Tsirkin
Cc: Song Liu, Jakub Kicinski, Jesper Dangaard Brouer,
Daniel Borkmann, netdev, Jason Wang, John Fastabend,
Alexei Starovoitov, qemu-devel, Prashant Bhole, kvm,
Yonghong Song, Andrii Nakryiko, Martin KaFai Lau
From: Jason Wang <jasowang@redhat.com>
This patch prepares virtio_net for XDP offloading. It adds data
structures and stub callback implementations for bpf_prog_offload_ops,
and implements the ndo_init and ndo_uninit operations to set up and
tear down the offload-related data structures.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Co-developed-by: Prashant Bhole <prashantbhole.linux@gmail.com>
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
drivers/net/virtio_net.c | 103 +++++++++++++++++++++++++++++++++++++++
1 file changed, 103 insertions(+)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index cee5c2b15c62..a1088d0114f2 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -229,8 +229,14 @@ struct virtnet_info {
struct failover *failover;
struct bpf_prog __rcu *xdp_prog;
+ struct bpf_prog __rcu *offload_xdp_prog;
struct xdp_attachment_info xdp;
+ struct xdp_attachment_info xdp_hw;
+
+ struct bpf_offload_dev *bpf_dev;
+
+ struct list_head bpf_bound_progs;
};
struct padded_vnet_hdr {
@@ -258,6 +264,14 @@ static struct xdp_frame *ptr_to_xdp(void *ptr)
return (struct xdp_frame *)((unsigned long)ptr & ~VIRTIO_XDP_FLAG);
}
+struct virtnet_bpf_bound_prog {
+ struct virtnet_info *vi;
+ struct bpf_prog *prog;
+ struct list_head list;
+ u32 len;
+ struct bpf_insn insnsi[0];
+};
+
/* Converting between virtqueue no. and kernel tx/rx queue no.
* 0:rx0 1:tx0 2:rx1 3:tx1 ... 2N:rxN 2N+1:txN 2N+2:cvq
*/
@@ -2506,13 +2520,63 @@ static int virtnet_xdp_set(struct net_device *dev, struct netdev_bpf *bpf)
return err;
}
+static int virtnet_bpf_verify_insn(struct bpf_verifier_env *env, int insn_idx,
+ int prev_insn)
+{
+ return 0;
+}
+
+static void virtnet_bpf_destroy_prog(struct bpf_prog *prog)
+{
+}
+
+static int virtnet_xdp_set_offload(struct virtnet_info *vi,
+ struct netdev_bpf *bpf)
+{
+ return -EBUSY;
+}
+
+static int virtnet_bpf_verifier_setup(struct bpf_prog *prog)
+{
+ return -ENOMEM;
+}
+
+static int virtnet_bpf_verifier_prep(struct bpf_prog *prog)
+{
+ return 0;
+}
+
+static int virtnet_bpf_translate(struct bpf_prog *prog)
+{
+ return 0;
+}
+
+static int virtnet_bpf_finalize(struct bpf_verifier_env *env)
+{
+ return 0;
+}
+
+static const struct bpf_prog_offload_ops virtnet_bpf_dev_ops = {
+ .setup = virtnet_bpf_verifier_setup,
+ .prepare = virtnet_bpf_verifier_prep,
+ .insn_hook = virtnet_bpf_verify_insn,
+ .finalize = virtnet_bpf_finalize,
+ .translate = virtnet_bpf_translate,
+ .destroy = virtnet_bpf_destroy_prog,
+};
+
static int virtnet_xdp(struct net_device *dev, struct netdev_bpf *xdp)
{
+ struct virtnet_info *vi = netdev_priv(dev);
switch (xdp->command) {
case XDP_SETUP_PROG:
return virtnet_xdp_set(dev, xdp);
case XDP_QUERY_PROG:
return xdp_attachment_query(&vi->xdp, xdp);
+ case XDP_SETUP_PROG_HW:
+ return virtnet_xdp_set_offload(vi, xdp);
+ case XDP_QUERY_PROG_HW:
+ return xdp_attachment_query(&vi->xdp_hw, xdp);
default:
return -EINVAL;
}
@@ -2559,7 +2623,46 @@ static int virtnet_set_features(struct net_device *dev,
return 0;
}
+static int virtnet_bpf_init(struct virtnet_info *vi)
+{
+ int err;
+
+ vi->bpf_dev = bpf_offload_dev_create(&virtnet_bpf_dev_ops, NULL);
+ err = PTR_ERR_OR_ZERO(vi->bpf_dev);
+ if (err)
+ return err;
+
+ err = bpf_offload_dev_netdev_register(vi->bpf_dev, vi->dev);
+ if (err)
+ goto err_netdev_register;
+
+ INIT_LIST_HEAD(&vi->bpf_bound_progs);
+
+ return 0;
+
+err_netdev_register:
+ bpf_offload_dev_destroy(vi->bpf_dev);
+ return err;
+}
+
+static int virtnet_init(struct net_device *dev)
+{
+ struct virtnet_info *vi = netdev_priv(dev);
+
+ return virtnet_bpf_init(vi);
+}
+
+static void virtnet_uninit(struct net_device *dev)
+{
+ struct virtnet_info *vi = netdev_priv(dev);
+
+ bpf_offload_dev_netdev_unregister(vi->bpf_dev, vi->dev);
+ bpf_offload_dev_destroy(vi->bpf_dev);
+}
+
static const struct net_device_ops virtnet_netdev = {
+ .ndo_init = virtnet_init,
+ .ndo_uninit = virtnet_uninit,
.ndo_open = virtnet_open,
.ndo_stop = virtnet_close,
.ndo_start_xmit = start_xmit,
--
2.20.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [RFC net-next 15/18] virtio_net: implement XDP prog offload functionality
2019-11-26 10:07 [RFC net-next 00/18] virtio_net XDP offload Prashant Bhole
` (13 preceding siblings ...)
2019-11-26 10:07 ` [RFC net-next 14/18] virtio_net: add XDP prog offload infrastructure Prashant Bhole
@ 2019-11-26 10:07 ` Prashant Bhole
2019-11-27 20:42 ` Michael S. Tsirkin
2019-11-26 10:07 ` [RFC net-next 16/18] bpf: export function __bpf_map_get Prashant Bhole
` (3 subsequent siblings)
18 siblings, 1 reply; 43+ messages in thread
From: Prashant Bhole @ 2019-11-26 10:07 UTC (permalink / raw)
To: David S . Miller, Michael S . Tsirkin
Cc: Song Liu, Jakub Kicinski, Jesper Dangaard Brouer,
Daniel Borkmann, netdev, Jason Wang, John Fastabend,
Alexei Starovoitov, qemu-devel, Prashant Bhole, kvm,
Yonghong Song, Andrii Nakryiko, Martin KaFai Lau
From: Jason Wang <jasowang@redhat.com>
This patch implements bpf_prog_offload_ops callbacks and adds handling
for XDP_SETUP_PROG_HW. Handling of XDP_SETUP_PROG_HW involves setting
up struct virtio_net_ctrl_ebpf_prog and appending program instructions
to it. This control buffer is sent to Qemu with class
VIRTIO_NET_CTRL_EBPF and command VIRTIO_NET_BPF_CMD_SET_OFFLOAD.
The expected behavior from Qemu is that it should try to load the
program in the host OS and report the status.
It also adds a restriction that only one program, in either driver or
offload mode, may be attached at a time. This restriction can be
removed later after proper testing.
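The control message described above (header plus raw instructions) can be sketched as follows. The struct here mirrors the patch's virtio_net_ctrl_ebpf_prog but is a hypothetical userspace model, not the uapi definition; the 8-byte instruction size is the fixed eBPF instruction encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of the SET_OFFLOAD control message built in
 * virtnet_xdp_set_offload(): a fixed header followed by the raw program,
 * where each instruction uses the fixed 8-byte eBPF encoding. */
struct ebpf_prog_hdr {
	uint32_t len;		/* program length in bytes */
	uint16_t cmd;		/* SET_OFFLOAD / UNSET_OFFLOAD */
	uint16_t gpl_compatible;
	uint8_t  insns[];	/* 'len' bytes of instructions follow */
};

#define BPF_INSN_SIZE 8		/* sizeof(struct bpf_insn) in the eBPF ABI */

/* Total bytes handed to sg_init_one() for a program of n_insns insns. */
static size_t set_offload_ctrl_size(uint32_t n_insns)
{
	return sizeof(struct ebpf_prog_hdr) + (size_t)n_insns * BPF_INSN_SIZE;
}
```

For the UNSET_OFFLOAD case only the fixed header is sent, matching the n_insns == 0 case here.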
Signed-off-by: Jason Wang <jasowang@redhat.com>
Co-developed-by: Prashant Bhole <prashantbhole.linux@gmail.com>
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
drivers/net/virtio_net.c | 114 +++++++++++++++++++++++++++++++-
include/uapi/linux/virtio_net.h | 27 ++++++++
2 files changed, 139 insertions(+), 2 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index a1088d0114f2..dddfbb2a2075 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -169,6 +169,7 @@ struct control_buf {
u8 allmulti;
__virtio16 vid;
__virtio64 offloads;
+ struct virtio_net_ctrl_ebpf_prog prog_ctrl;
};
struct virtnet_info {
@@ -272,6 +273,8 @@ struct virtnet_bpf_bound_prog {
struct bpf_insn insnsi[0];
};
+#define VIRTNET_EA(extack, msg) NL_SET_ERR_MSG_MOD((extack), msg)
+
/* Converting between virtqueue no. and kernel tx/rx queue no.
* 0:rx0 1:tx0 2:rx1 3:tx1 ... 2N:rxN 2N+1:txN 2N+2:cvq
*/
@@ -2427,6 +2430,11 @@ static int virtnet_xdp_set(struct net_device *dev, struct netdev_bpf *bpf)
if (!xdp_attachment_flags_ok(&vi->xdp, bpf))
return -EBUSY;
+ if (rtnl_dereference(vi->offload_xdp_prog)) {
+ VIRTNET_EA(bpf->extack, "program already attached in offload mode");
+ return -EINVAL;
+ }
+
if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS)
&& (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
@@ -2528,17 +2536,114 @@ static int virtnet_bpf_verify_insn(struct bpf_verifier_env *env, int insn_idx,
static void virtnet_bpf_destroy_prog(struct bpf_prog *prog)
{
+ struct virtnet_bpf_bound_prog *state;
+
+ state = prog->aux->offload->dev_priv;
+ list_del(&state->list);
+ kfree(state);
+}
+
+static int virtnet_xdp_offload_check(struct virtnet_info *vi,
+ struct netdev_bpf *bpf)
+{
+ if (!bpf->prog)
+ return 0;
+
+ if (!bpf->prog->aux->offload) {
+ VIRTNET_EA(bpf->extack, "xdpoffload of non-bound program");
+ return -EINVAL;
+ }
+ if (bpf->prog->aux->offload->netdev != vi->dev) {
+ VIRTNET_EA(bpf->extack, "program bound to different dev");
+ return -EINVAL;
+ }
+
+ if (rtnl_dereference(vi->xdp_prog)) {
+ VIRTNET_EA(bpf->extack, "program already attached in driver mode");
+ return -EINVAL;
+ }
+
+ return 0;
}
static int virtnet_xdp_set_offload(struct virtnet_info *vi,
struct netdev_bpf *bpf)
{
- return -EBUSY;
+ struct virtio_net_ctrl_ebpf_prog *ctrl;
+ struct virtnet_bpf_bound_prog *bound_prog = NULL;
+ struct virtio_device *vdev = vi->vdev;
+ struct bpf_prog *prog = bpf->prog;
+ void *ctrl_buf = NULL;
+ struct scatterlist sg;
+ int prog_len;
+ int err = 0;
+
+ if (!xdp_attachment_flags_ok(&vi->xdp_hw, bpf))
+ return -EBUSY;
+
+ if (prog) {
+ if (prog->type != BPF_PROG_TYPE_XDP)
+ return -EOPNOTSUPP;
+ bound_prog = prog->aux->offload->dev_priv;
+ prog_len = prog->len * sizeof(bound_prog->insnsi[0]);
+
+ ctrl_buf = kmalloc(sizeof(*ctrl) + prog_len, GFP_KERNEL);
+ if (!ctrl_buf)
+ return -ENOMEM;
+ ctrl = ctrl_buf;
+ ctrl->cmd = cpu_to_virtio32(vi->vdev,
+ VIRTIO_NET_BPF_CMD_SET_OFFLOAD);
+ ctrl->len = cpu_to_virtio32(vi->vdev, prog_len);
+ ctrl->gpl_compatible = cpu_to_virtio16(vi->vdev,
+ prog->gpl_compatible);
+ memcpy(ctrl->insns, bound_prog->insnsi,
+ prog->len * sizeof(bound_prog->insnsi[0]));
+ sg_init_one(&sg, ctrl_buf, sizeof(*ctrl) + prog_len);
+ } else {
+ ctrl = &vi->ctrl->prog_ctrl;
+ ctrl->cmd = cpu_to_virtio32(vi->vdev,
+ VIRTIO_NET_BPF_CMD_UNSET_OFFLOAD);
+ sg_init_one(&sg, ctrl, sizeof(*ctrl));
+ }
+
+ if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_EBPF,
+ VIRTIO_NET_CTRL_EBPF_PROG,
+ &sg)) {
+ dev_warn(&vdev->dev, "Failed to set bpf offload prog\n");
+ err = -EFAULT;
+ goto out;
+ }
+
+ rcu_assign_pointer(vi->offload_xdp_prog, prog);
+
+ xdp_attachment_setup(&vi->xdp_hw, bpf);
+
+out:
+ kfree(ctrl_buf);
+ return err;
}
static int virtnet_bpf_verifier_setup(struct bpf_prog *prog)
{
- return -ENOMEM;
+ struct virtnet_info *vi = netdev_priv(prog->aux->offload->netdev);
+ size_t insn_len = prog->len * sizeof(struct bpf_insn);
+ struct virtnet_bpf_bound_prog *state;
+
+ state = kzalloc(sizeof(*state) + insn_len, GFP_KERNEL);
+ if (!state)
+ return -ENOMEM;
+
+ memcpy(&state->insnsi[0], prog->insnsi, insn_len);
+
+ state->vi = vi;
+ state->prog = prog;
+ state->len = prog->len;
+
+ list_add_tail(&state->list, &vi->bpf_bound_progs);
+
+ prog->aux->offload->dev_priv = state;
+
+ return 0;
}
static int virtnet_bpf_verifier_prep(struct bpf_prog *prog)
@@ -2568,12 +2673,17 @@ static const struct bpf_prog_offload_ops virtnet_bpf_dev_ops = {
static int virtnet_xdp(struct net_device *dev, struct netdev_bpf *xdp)
{
struct virtnet_info *vi = netdev_priv(dev);
+ int err;
+
switch (xdp->command) {
case XDP_SETUP_PROG:
return virtnet_xdp_set(dev, xdp);
case XDP_QUERY_PROG:
return xdp_attachment_query(&vi->xdp, xdp);
case XDP_SETUP_PROG_HW:
+ err = virtnet_xdp_offload_check(vi, xdp);
+ if (err)
+ return err;
return virtnet_xdp_set_offload(vi, xdp);
case XDP_QUERY_PROG_HW:
return xdp_attachment_query(&vi->xdp_hw, xdp);
diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
index a3715a3224c1..0ea2f7910a5a 100644
--- a/include/uapi/linux/virtio_net.h
+++ b/include/uapi/linux/virtio_net.h
@@ -261,4 +261,31 @@ struct virtio_net_ctrl_mq {
#define VIRTIO_NET_CTRL_GUEST_OFFLOADS 5
#define VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET 0
+/*
+ * Control XDP offloads
+ *
+ * When guest wants to offload XDP program to the device, it calls
+ * VIRTIO_NET_CTRL_EBPF_PROG along with VIRTIO_NET_BPF_CMD_SET_OFFLOAD
+ * subcommands. When offloading is successful, the device runs offloaded
+ * XDP program for each packet before sending it to the guest.
+ *
+ * VIRTIO_NET_BPF_CMD_UNSET_OFFLOAD removes the offloaded program from
+ * the device, if one exists.
+ */
+
+struct virtio_net_ctrl_ebpf_prog {
+ /* program length in bytes */
+ __virtio32 len;
+ __virtio16 cmd;
+ __virtio16 gpl_compatible;
+ __u8 insns[0];
+};
+
+#define VIRTIO_NET_CTRL_EBPF 6
+ #define VIRTIO_NET_CTRL_EBPF_PROG 1
+
+/* Commands for VIRTIO_NET_CTRL_EBPF_PROG */
+#define VIRTIO_NET_BPF_CMD_SET_OFFLOAD 1
+#define VIRTIO_NET_BPF_CMD_UNSET_OFFLOAD 2
+
#endif /* _UAPI_LINUX_VIRTIO_NET_H */
--
2.20.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [RFC net-next 15/18] virtio_net: implement XDP prog offload functionality
2019-11-26 10:07 ` [RFC net-next 15/18] virtio_net: implement XDP prog offload functionality Prashant Bhole
@ 2019-11-27 20:42 ` Michael S. Tsirkin
2019-11-28 2:53 ` Prashant Bhole
0 siblings, 1 reply; 43+ messages in thread
From: Michael S. Tsirkin @ 2019-11-27 20:42 UTC (permalink / raw)
To: Prashant Bhole
Cc: Song Liu, Jakub Kicinski, Jesper Dangaard Brouer,
Daniel Borkmann, qemu-devel, netdev, Jason Wang, John Fastabend,
Alexei Starovoitov, Martin KaFai Lau, kvm, Yonghong Song,
Andrii Nakryiko, David S . Miller
On Tue, Nov 26, 2019 at 07:07:41PM +0900, Prashant Bhole wrote:
> From: Jason Wang <jasowang@redhat.com>
>
> This patch implements bpf_prog_offload_ops callbacks and adds handling
> for XDP_SETUP_PROG_HW. Handling of XDP_SETUP_PROG_HW involves setting
> up struct virtio_net_ctrl_ebpf_prog and appending program instructions
> to it. This control buffer is sent to Qemu with class
> VIRTIO_NET_CTRL_EBPF and command VIRTIO_NET_BPF_CMD_SET_OFFLOAD.
> The expected behavior from Qemu is that it should try to load the
> program in host os and report the status.
That's not great e.g. for migration: different hosts might have
a different idea about what's allowed.
Device capabilities should be preferably exported through
feature bits or config space such that it's easy to
intercept and limit these as needed.
Also, how are we going to handle e.g. endian-ness here?
>
> It also adds restriction to have either driver or offloaded program
> at a time.
I'm not sure I understand what the above says.
virtnet_xdp_offload_check?
Please add code comments so we remember what to do and when.
> This restriction can be removed later after proper testing.
What kind of testing is missing here?
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> Co-developed-by: Prashant Bhole <prashantbhole.linux@gmail.com>
> Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
Any UAPI changes need to be copied to virtio-dev@lists.oasis-open.org
(subscriber only) list please.
> ---
> drivers/net/virtio_net.c | 114 +++++++++++++++++++++++++++++++-
> include/uapi/linux/virtio_net.h | 27 ++++++++
> 2 files changed, 139 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index a1088d0114f2..dddfbb2a2075 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -169,6 +169,7 @@ struct control_buf {
> u8 allmulti;
> __virtio16 vid;
> __virtio64 offloads;
> + struct virtio_net_ctrl_ebpf_prog prog_ctrl;
> };
>
> struct virtnet_info {
> @@ -272,6 +273,8 @@ struct virtnet_bpf_bound_prog {
> struct bpf_insn insnsi[0];
> };
>
> +#define VIRTNET_EA(extack, msg) NL_SET_ERR_MSG_MOD((extack), msg)
> +
> /* Converting between virtqueue no. and kernel tx/rx queue no.
> * 0:rx0 1:tx0 2:rx1 3:tx1 ... 2N:rxN 2N+1:txN 2N+2:cvq
> */
> @@ -2427,6 +2430,11 @@ static int virtnet_xdp_set(struct net_device *dev, struct netdev_bpf *bpf)
> if (!xdp_attachment_flags_ok(&vi->xdp, bpf))
> return -EBUSY;
>
> + if (rtnl_dereference(vi->offload_xdp_prog)) {
> + VIRTNET_EA(bpf->extack, "program already attached in offload mode");
> + return -EINVAL;
> + }
> +
> if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS)
> && (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
> @@ -2528,17 +2536,114 @@ static int virtnet_bpf_verify_insn(struct bpf_verifier_env *env, int insn_idx,
>
> static void virtnet_bpf_destroy_prog(struct bpf_prog *prog)
> {
> + struct virtnet_bpf_bound_prog *state;
> +
> + state = prog->aux->offload->dev_priv;
> + list_del(&state->list);
> + kfree(state);
> +}
> +
> +static int virtnet_xdp_offload_check(struct virtnet_info *vi,
> + struct netdev_bpf *bpf)
> +{
> + if (!bpf->prog)
> + return 0;
> +
> + if (!bpf->prog->aux->offload) {
> + VIRTNET_EA(bpf->extack, "xdpoffload of non-bound program");
> + return -EINVAL;
> + }
> + if (bpf->prog->aux->offload->netdev != vi->dev) {
> + VIRTNET_EA(bpf->extack, "program bound to different dev");
> + return -EINVAL;
> + }
> +
> + if (rtnl_dereference(vi->xdp_prog)) {
> + VIRTNET_EA(bpf->extack, "program already attached in driver mode");
> + return -EINVAL;
> + }
> +
> + return 0;
> }
>
> static int virtnet_xdp_set_offload(struct virtnet_info *vi,
> struct netdev_bpf *bpf)
> {
> - return -EBUSY;
> + struct virtio_net_ctrl_ebpf_prog *ctrl;
> + struct virtnet_bpf_bound_prog *bound_prog = NULL;
> + struct virtio_device *vdev = vi->vdev;
> + struct bpf_prog *prog = bpf->prog;
> + void *ctrl_buf = NULL;
> + struct scatterlist sg;
> + int prog_len;
> + int err = 0;
> +
> + if (!xdp_attachment_flags_ok(&vi->xdp_hw, bpf))
> + return -EBUSY;
> +
> + if (prog) {
> + if (prog->type != BPF_PROG_TYPE_XDP)
> + return -EOPNOTSUPP;
> + bound_prog = prog->aux->offload->dev_priv;
> + prog_len = prog->len * sizeof(bound_prog->insnsi[0]);
> +
> + ctrl_buf = kmalloc(sizeof(*ctrl) + prog_len, GFP_KERNEL);
> + if (!ctrl_buf)
> + return -ENOMEM;
> + ctrl = ctrl_buf;
> + ctrl->cmd = cpu_to_virtio32(vi->vdev,
> + VIRTIO_NET_BPF_CMD_SET_OFFLOAD);
> + ctrl->len = cpu_to_virtio32(vi->vdev, prog_len);
> + ctrl->gpl_compatible = cpu_to_virtio16(vi->vdev,
> + prog->gpl_compatible);
> + memcpy(ctrl->insns, bound_prog->insnsi,
> + prog->len * sizeof(bound_prog->insnsi[0]));
> + sg_init_one(&sg, ctrl_buf, sizeof(*ctrl) + prog_len);
> + } else {
> + ctrl = &vi->ctrl->prog_ctrl;
> + ctrl->cmd = cpu_to_virtio32(vi->vdev,
> + VIRTIO_NET_BPF_CMD_UNSET_OFFLOAD);
> + sg_init_one(&sg, ctrl, sizeof(*ctrl));
> + }
> +
> + if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_EBPF,
> + VIRTIO_NET_CTRL_EBPF_PROG,
> + &sg)) {
> + dev_warn(&vdev->dev, "Failed to set bpf offload prog\n");
> + err = -EFAULT;
> + goto out;
> + }
> +
> + rcu_assign_pointer(vi->offload_xdp_prog, prog);
> +
> + xdp_attachment_setup(&vi->xdp_hw, bpf);
> +
> +out:
> + kfree(ctrl_buf);
> + return err;
> }
>
> static int virtnet_bpf_verifier_setup(struct bpf_prog *prog)
> {
> - return -ENOMEM;
> + struct virtnet_info *vi = netdev_priv(prog->aux->offload->netdev);
> + size_t insn_len = prog->len * sizeof(struct bpf_insn);
> + struct virtnet_bpf_bound_prog *state;
> +
> + state = kzalloc(sizeof(*state) + insn_len, GFP_KERNEL);
> + if (!state)
> + return -ENOMEM;
> +
> + memcpy(&state->insnsi[0], prog->insnsi, insn_len);
> +
> + state->vi = vi;
> + state->prog = prog;
> + state->len = prog->len;
> +
> + list_add_tail(&state->list, &vi->bpf_bound_progs);
> +
> + prog->aux->offload->dev_priv = state;
> +
> + return 0;
> }
>
> static int virtnet_bpf_verifier_prep(struct bpf_prog *prog)
> @@ -2568,12 +2673,17 @@ static const struct bpf_prog_offload_ops virtnet_bpf_dev_ops = {
> static int virtnet_xdp(struct net_device *dev, struct netdev_bpf *xdp)
> {
> struct virtnet_info *vi = netdev_priv(dev);
> + int err;
> +
> switch (xdp->command) {
> case XDP_SETUP_PROG:
> return virtnet_xdp_set(dev, xdp);
> case XDP_QUERY_PROG:
> return xdp_attachment_query(&vi->xdp, xdp);
> case XDP_SETUP_PROG_HW:
> + err = virtnet_xdp_offload_check(vi, xdp);
> + if (err)
> + return err;
> return virtnet_xdp_set_offload(vi, xdp);
> case XDP_QUERY_PROG_HW:
> return xdp_attachment_query(&vi->xdp_hw, xdp);
> diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> index a3715a3224c1..0ea2f7910a5a 100644
> --- a/include/uapi/linux/virtio_net.h
> +++ b/include/uapi/linux/virtio_net.h
> @@ -261,4 +261,31 @@ struct virtio_net_ctrl_mq {
> #define VIRTIO_NET_CTRL_GUEST_OFFLOADS 5
> #define VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET 0
>
> +/*
> + * Control XDP offloads
> + *
> + * When guest wants to offload XDP program to the device, it calls
> + * VIRTIO_NET_CTRL_EBPF_PROG along with VIRTIO_NET_BPF_CMD_SET_OFFLOAD
> + * subcommands. When offloading is successful, the device runs offloaded
> + * XDP program for each packet before sending it to the guest.
> + *
> + * VIRTIO_NET_BPF_CMD_UNSET_OFFLOAD removes the offloaded program from
> + * the device, if one exists.
> + */
> +
> +struct virtio_net_ctrl_ebpf_prog {
> + /* program length in bytes */
> + __virtio32 len;
> + __virtio16 cmd;
> + __virtio16 gpl_compatible;
> + __u8 insns[0];
> +};
> +
> +#define VIRTIO_NET_CTRL_EBPF 6
> + #define VIRTIO_NET_CTRL_EBPF_PROG 1
> +
> +/* Commands for VIRTIO_NET_CTRL_EBPF_PROG */
> +#define VIRTIO_NET_BPF_CMD_SET_OFFLOAD 1
> +#define VIRTIO_NET_BPF_CMD_UNSET_OFFLOAD 2
> +
> #endif /* _UAPI_LINUX_VIRTIO_NET_H */
> --
> 2.20.1
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [RFC net-next 15/18] virtio_net: implement XDP prog offload functionality
2019-11-27 20:42 ` Michael S. Tsirkin
@ 2019-11-28 2:53 ` Prashant Bhole
0 siblings, 0 replies; 43+ messages in thread
From: Prashant Bhole @ 2019-11-28 2:53 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Song Liu, Jakub Kicinski, Jesper Dangaard Brouer,
Daniel Borkmann, qemu-devel, netdev, Jason Wang, John Fastabend,
Alexei Starovoitov, Martin KaFai Lau, kvm, Yonghong Song,
Andrii Nakryiko, David S . Miller
On 11/28/19 5:42 AM, Michael S. Tsirkin wrote:
> On Tue, Nov 26, 2019 at 07:07:41PM +0900, Prashant Bhole wrote:
>> From: Jason Wang <jasowang@redhat.com>
>>
>> This patch implements bpf_prog_offload_ops callbacks and adds handling
>> for XDP_SETUP_PROG_HW. Handling of XDP_SETUP_PROG_HW involves setting
>> up struct virtio_net_ctrl_ebpf_prog and appending program instructions
>> to it. This control buffer is sent to Qemu with class
>> VIRTIO_NET_CTRL_EBPF and command VIRTIO_NET_BPF_CMD_SET_OFFLOAD.
>> The expected behavior from Qemu is that it should try to load the
>> program in host os and report the status.
>
> That's not great e.g. for migration: different hosts might have
> a different idea about what's allowed.
> Device capabilities should be preferably exported through
> feature bits or config space such that it's easy to
> intercept and limit these as needed.
These things are mentioned in the TODO section of the cover letter.
Having the offload feature enabled should be a migration blocker, and
a feature bit in virtio for the offloading capability needs to be added.
>
> Also, how are we going to handle e.g. endian-ness here?
For now I feel we should block offloading in the case of cross-endian
virtualization. Further, to support cross-endianness, the requests for
offloading a map or program should include metadata such as BTF info,
and Qemu would need to handle the conversion.
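As a rough illustration of the byte-order handling involved (a hypothetical userspace sketch, not virtio code): for a VIRTIO_F_VERSION_1 device, control fields are stored little-endian regardless of guest CPU byte order, so cpu_to_virtio32() amounts to a cpu-to-LE store.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of the byte-order fixing cpu_to_virtio32() performs
 * for a VIRTIO_F_VERSION_1 device: the value is serialized least
 * significant byte first, independent of the host's native order. */
static void virtio32_store(uint8_t out[4], uint32_t v)
{
	out[0] = (uint8_t)(v);		/* least significant byte first */
	out[1] = (uint8_t)(v >> 8);
	out[2] = (uint8_t)(v >> 16);
	out[3] = (uint8_t)(v >> 24);
}
```

The harder part discussed above is the program and map *contents*, which this per-field conversion does not cover; that is where BTF metadata would come in.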
>
>>
>> It also adds restriction to have either driver or offloaded program
>> at a time.
>
> I'm not sure I understand what does the above say.
> virtnet_xdp_offload_check?
> Please add code comments so we remember what to do and when.
>
>> This restriction can be removed later after proper testing.
>
> What kind of testing is missing here?
The restriction mentioned above is about having multiple programs
attached to the same device. It should be possible to have one program
attached in driver mode and another in offload mode, but the current
code supports only one mode at a time because I wasn't sure whether the
bpf tooling supports that case. I will add a comment or remove the
restriction in the next revision.
>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> Co-developed-by: Prashant Bhole <prashantbhole.linux@gmail.com>
>> Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
>
> Any UAPI changes need to be copied to virtio-dev@lists.oasis-open.org
> (subscriber only) list please.
Sure.
Thanks.
^ permalink raw reply [flat|nested] 43+ messages in thread
* [RFC net-next 16/18] bpf: export function __bpf_map_get
2019-11-26 10:07 [RFC net-next 00/18] virtio_net XDP offload Prashant Bhole
` (14 preceding siblings ...)
2019-11-26 10:07 ` [RFC net-next 15/18] virtio_net: implement XDP prog offload functionality Prashant Bhole
@ 2019-11-26 10:07 ` Prashant Bhole
2019-11-26 10:07 ` [RFC net-next 17/18] virtio_net: implement XDP map offload functionality Prashant Bhole
` (2 subsequent siblings)
18 siblings, 0 replies; 43+ messages in thread
From: Prashant Bhole @ 2019-11-26 10:07 UTC (permalink / raw)
To: David S . Miller, Michael S . Tsirkin
Cc: Song Liu, Jakub Kicinski, Jesper Dangaard Brouer,
Daniel Borkmann, netdev, Jason Wang, John Fastabend,
Alexei Starovoitov, qemu-devel, Prashant Bhole, kvm,
Yonghong Song, Andrii Nakryiko, Martin KaFai Lau
From: Jason Wang <jasowang@redhat.com>
__bpf_map_get is necessary to verify whether an fd corresponds to a
bpf map, without adding a refcount on that map. Exporting it allows it
to be used from a kernel module.
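The check this export enables might look like the following userspace model. This is an assumption simplified from the kernel: the real function validates the fd's backing file against the BPF map file_operations and returns ERR_PTR(-EINVAL)/ERR_PTR(-EBADF) on mismatch, here modeled as NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model (an assumption, simplified from the kernel) of the check
 * __bpf_map_get() performs: the fd's backing file must carry the BPF map
 * file_operations, and on success the map is returned from private_data
 * without taking a reference -- the caller keeps the fd pinned instead. */
struct file {
	const void *f_op;	/* identifies the file type */
	void *private_data;	/* the bpf_map for map files */
};

static const int bpf_map_fops;	/* stand-in identity tag for the real fops */

static void *map_from_file(const struct file *f)
{
	if (!f)
		return NULL;	/* kernel: ERR_PTR(-EBADF) */
	if (f->f_op != &bpf_map_fops)
		return NULL;	/* kernel: ERR_PTR(-EINVAL) */
	return f->private_data;
}
```

Because no reference is taken, a module using this must hold the fd (fdget/fdput) for as long as it uses the returned map.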
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
kernel/bpf/syscall.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index e3461ec59570..e524ab1e7c64 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -737,6 +737,7 @@ struct bpf_map *__bpf_map_get(struct fd f)
return f.file->private_data;
}
+EXPORT_SYMBOL(__bpf_map_get);
void bpf_map_inc(struct bpf_map *map)
{
--
2.20.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [RFC net-next 17/18] virtio_net: implement XDP map offload functionality
2019-11-26 10:07 [RFC net-next 00/18] virtio_net XDP offload Prashant Bhole
` (15 preceding siblings ...)
2019-11-26 10:07 ` [RFC net-next 16/18] bpf: export function __bpf_map_get Prashant Bhole
@ 2019-11-26 10:07 ` Prashant Bhole
2019-11-26 10:07 ` [RFC net-next 18/18] virtio_net: restrict bpf helper calls from offloaded program Prashant Bhole
2019-11-26 20:35 ` [RFC net-next 00/18] virtio_net XDP offload Jakub Kicinski
18 siblings, 0 replies; 43+ messages in thread
From: Prashant Bhole @ 2019-11-26 10:07 UTC (permalink / raw)
To: David S . Miller, Michael S . Tsirkin
Cc: Song Liu, Jakub Kicinski, Jesper Dangaard Brouer,
Daniel Borkmann, netdev, Jason Wang, John Fastabend,
Alexei Starovoitov, qemu-devel, Prashant Bhole, kvm,
Yonghong Song, Andrii Nakryiko, Martin KaFai Lau
From: Jason Wang <jasowang@redhat.com>
This patch implements:
* Handling of BPF_OFFLOAD_MAP_ALLOC, BPF_OFFLOAD_MAP_FREE:
Allocate driver specific map data structure. Set up struct
virtio_net_ctrl_ebpf_map and send the control buffer to Qemu with
class VIRTIO_NET_CTRL_EBPF, cmd VIRTIO_NET_CTRL_EBPF_MAP. The cmd
in the control buffer is set to VIRTIO_NET_BPF_CMD_CREATE_MAP. The
expected behavior from Qemu is that it should perform the action
as per command and return the status (and map data). In case of
create map command, Qemu should set the map_fd in the control buffer
* bpf_map_dev_ops operations:
Common map operations are implemented with use of above mentioned
struct virtio_net_ctrl_ebpf_map. This control buffer has space for
storing: key + key or key + value.
* map_fd replacement in a copy of the program:
Since maps are created before program verification begins, host-side
map fds exist for each offloaded map by the time verification starts.
The map fds in the copy of the program are replaced with the host-side
fds, and this copy of the program is used for offloading.
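The "key + key or key + value" sizing rule above implies scratch space of key_size plus the larger of key_size and value_size. A minimal sketch, an assumption based on the commit message rather than the in-tree code:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch (an assumption based on the commit message) of the scratch-buffer
 * sizing for map control commands: the buffer after the header must hold
 * either key + key (get_next_key) or key + value (lookup/update), so it is
 * key_size plus the larger of key_size and value_size. */
static size_t map_ctrl_buf_len(size_t key_size, size_t value_size)
{
	size_t second = value_size > key_size ? value_size : key_size;

	return key_size + second;
}
```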
Signed-off-by: Jason Wang <jasowang@redhat.com>
Co-developed-by: Prashant Bhole <prashantbhole.linux@gmail.com>
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
drivers/net/virtio_net.c | 241 ++++++++++++++++++++++++++++++++
include/uapi/linux/virtio_net.h | 23 +++
2 files changed, 264 insertions(+)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index dddfbb2a2075..91a94b787c64 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -238,6 +238,7 @@ struct virtnet_info {
struct bpf_offload_dev *bpf_dev;
struct list_head bpf_bound_progs;
+ struct list_head map_list;
};
struct padded_vnet_hdr {
@@ -275,6 +276,13 @@ struct virtnet_bpf_bound_prog {
#define VIRTNET_EA(extack, msg) NL_SET_ERR_MSG_MOD((extack), msg)
+struct virtnet_bpf_map {
+ struct bpf_offloaded_map *offmap;
+ struct virtnet_info *vi;
+ struct virtio_net_ctrl_ebpf_map *ctrl;
+ struct list_head list;
+};
+
/* Converting between virtqueue no. and kernel tx/rx queue no.
* 0:rx0 1:tx0 2:rx1 3:tx1 ... 2N:rxN 2N+1:txN 2N+2:cvq
*/
@@ -2528,6 +2536,19 @@ static int virtnet_xdp_set(struct net_device *dev, struct netdev_bpf *bpf)
return err;
}
+static struct virtnet_bpf_map *virtnet_get_bpf_map(struct virtnet_info *vi,
+ struct bpf_map *map)
+{
+ struct virtnet_bpf_map *virtnet_map;
+
+ list_for_each_entry(virtnet_map, &vi->map_list, list) {
+ if (&virtnet_map->offmap->map == map)
+ return virtnet_map;
+ }
+
+ return NULL;
+}
+
static int virtnet_bpf_verify_insn(struct bpf_verifier_env *env, int insn_idx,
int prev_insn)
{
@@ -2623,11 +2644,194 @@ static int virtnet_xdp_set_offload(struct virtnet_info *vi,
return err;
}
+static int virtnet_bpf_ctrl_map(struct bpf_offloaded_map *offmap,
+ int cmd, u8 *key, u8 *value, u64 flags,
+ u8 *out_key, u8 *out_value)
+{
+ struct virtio_net_ctrl_ebpf_map *ctrl;
+ struct virtnet_bpf_map *virtnet_map;
+ struct bpf_map *map = &offmap->map;
+ unsigned char *keyptr, *valptr;
+ struct virtnet_info *vi;
+ struct scatterlist sg;
+
+ virtnet_map = offmap->dev_priv;
+ vi = virtnet_map->vi;
+ ctrl = virtnet_map->ctrl;
+
+ keyptr = ctrl->buf;
+ valptr = ctrl->buf + ctrl->key_size;
+
+ if (key)
+ memcpy(keyptr, key, map->key_size);
+ if (value)
+ memcpy(valptr, value, map->value_size);
+
+ ctrl->cmd = cpu_to_virtio32(vi->vdev, cmd);
+ ctrl->flags = cpu_to_virtio64(vi->vdev, flags);
+
+ sg_init_one(&sg, ctrl, sizeof(*ctrl) + ctrl->buf_len);
+ if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_EBPF,
+ VIRTIO_NET_CTRL_EBPF_MAP,
+ &sg))
+ return -EFAULT;
+
+ if (out_key)
+ /* the returned key is stored after the input key */
+ memcpy(out_key, valptr, map->key_size);
+ if (out_value)
+ memcpy(out_value, valptr, map->value_size);
+ return 0;
+}
+
+static int virtnet_bpf_map_update_entry(struct bpf_offloaded_map *offmap,
+ void *key, void *value, u64 flags)
+{
+ return virtnet_bpf_ctrl_map(offmap,
+ VIRTIO_NET_BPF_CMD_UPDATE_ELEM,
+ key, value, flags, NULL, NULL);
+}
+
+static int virtnet_bpf_map_delete_elem(struct bpf_offloaded_map *offmap,
+ void *key)
+{
+ return virtnet_bpf_ctrl_map(offmap,
+ VIRTIO_NET_BPF_CMD_DELETE_ELEM,
+ key, NULL, 0, NULL, NULL);
+}
+
+static int virtnet_bpf_map_lookup_entry(struct bpf_offloaded_map *offmap,
+ void *key, void *value)
+{
+ return virtnet_bpf_ctrl_map(offmap,
+ VIRTIO_NET_BPF_CMD_LOOKUP_ELEM,
+ key, NULL, 0, NULL, value);
+}
+
+static int virtnet_bpf_map_get_first_key(struct bpf_offloaded_map *offmap,
+ void *next_key)
+{
+ return virtnet_bpf_ctrl_map(offmap,
+ VIRTIO_NET_BPF_CMD_GET_FIRST,
+ NULL, NULL, 0, next_key, NULL);
+}
+
+static int virtnet_bpf_map_get_next_key(struct bpf_offloaded_map *offmap,
+ void *key, void *next_key)
+{
+ if (!key)
+ return virtnet_bpf_map_get_first_key(offmap, next_key);
+
+ return virtnet_bpf_ctrl_map(offmap,
+ VIRTIO_NET_BPF_CMD_GET_NEXT,
+ key, NULL, 0, next_key, NULL);
+}
+
+static const struct bpf_map_dev_ops virtnet_bpf_map_ops = {
+ .map_get_next_key = virtnet_bpf_map_get_next_key,
+ .map_lookup_elem = virtnet_bpf_map_lookup_entry,
+ .map_update_elem = virtnet_bpf_map_update_entry,
+ .map_delete_elem = virtnet_bpf_map_delete_elem,
+};
+
+static int virtnet_bpf_map_alloc(struct virtnet_info *vi,
+ struct bpf_offloaded_map *offmap)
+{
+ struct virtnet_bpf_map *virtnet_map = NULL;
+ struct virtio_net_ctrl_ebpf_map *ctrl = NULL;
+ struct bpf_map *map = &offmap->map;
+ struct scatterlist sg;
+ int buf_len;
+
+ if (map->map_type != BPF_MAP_TYPE_ARRAY &&
+ map->map_type != BPF_MAP_TYPE_HASH)
+ goto err;
+
+ virtnet_map = kzalloc(sizeof(*virtnet_map), GFP_KERNEL);
+ if (!virtnet_map)
+ goto err;
+
+ /* allocate a buffer sized to fit
+ * - sizeof(struct virtio_net_ctrl_ebpf_map)
+ * - key_size
+ * - max(value_size, key_size)
+ * (lookup/update need key + value; get_next needs key + next key)
+ */
+ buf_len = map->key_size;
+ buf_len += (map->key_size > map->value_size) ?
+ map->key_size : map->value_size;
+ ctrl = kzalloc(sizeof(*ctrl) + buf_len, GFP_KERNEL);
+ if (!ctrl)
+ goto err;
+
+ ctrl->buf_len = cpu_to_virtio32(vi->vdev, buf_len);
+ ctrl->key_size = cpu_to_virtio32(vi->vdev, map->key_size);
+ ctrl->value_size = cpu_to_virtio32(vi->vdev, map->value_size);
+ ctrl->max_entries = cpu_to_virtio32(vi->vdev, map->max_entries);
+ ctrl->map_type = cpu_to_virtio32(vi->vdev, map->map_type);
+ ctrl->map_flags = 0;
+ ctrl->cmd = cpu_to_virtio32(vi->vdev, VIRTIO_NET_BPF_CMD_CREATE_MAP);
+
+ sg_init_one(&sg, ctrl, sizeof(*ctrl) + ctrl->buf_len);
+
+ if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_EBPF,
+ VIRTIO_NET_CTRL_EBPF_MAP,
+ &sg)) {
+ dev_warn(&vi->vdev->dev, "Failed to create ebpf map\n");
+ goto err;
+ }
+
+ offmap->dev_ops = &virtnet_bpf_map_ops;
+ offmap->dev_priv = virtnet_map;
+
+ virtnet_map->offmap = offmap;
+ virtnet_map->vi = vi;
+ virtnet_map->ctrl = ctrl;
+
+ list_add_tail(&virtnet_map->list, &vi->map_list);
+
+ return 0;
+err:
+ kfree(virtnet_map);
+ kfree(ctrl);
+ return -EFAULT;
+}
+
+static int virtnet_bpf_map_free(struct virtnet_info *vi,
+ struct bpf_offloaded_map *offmap)
+{
+ struct bpf_map *map = &offmap->map;
+ struct virtnet_bpf_map *virtnet_map = virtnet_get_bpf_map(vi, map);
+ struct virtio_net_ctrl_ebpf_map *ctrl;
+ struct scatterlist sg;
+
+ if (!virtnet_map)
+ return -EINVAL;
+
+ ctrl = virtnet_map->ctrl;
+
+ ctrl->cmd = cpu_to_virtio32(vi->vdev, VIRTIO_NET_BPF_CMD_FREE_MAP);
+
+ sg_init_one(&sg, ctrl, sizeof(*ctrl) + ctrl->buf_len);
+
+ if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_EBPF,
+ VIRTIO_NET_CTRL_EBPF_MAP,
+ &sg)) {
+ dev_warn(&vi->vdev->dev, "Failed to free ebpf map\n");
+ return -EFAULT;
+ }
+
+ list_del(&virtnet_map->list);
+ kfree(virtnet_map->ctrl);
+ kfree(virtnet_map);
+ return 0;
+}
+
static int virtnet_bpf_verifier_setup(struct bpf_prog *prog)
{
struct virtnet_info *vi = netdev_priv(prog->aux->offload->netdev);
size_t insn_len = prog->len * sizeof(struct bpf_insn);
struct virtnet_bpf_bound_prog *state;
+ struct virtnet_bpf_map *virtnet_map;
+ struct bpf_map *map;
+ struct fd mapfd;
+ int i, err = 0;
state = kzalloc(sizeof(*state) + insn_len, GFP_KERNEL);
if (!state)
@@ -2639,11 +2843,43 @@ static int virtnet_bpf_verifier_setup(struct bpf_prog *prog)
state->prog = prog;
state->len = prog->len;
+ for (i = 0; i < state->len; i++) {
+ struct bpf_insn *insn = &state->insnsi[i];
+
+ if (insn->code != (BPF_LD | BPF_IMM | BPF_DW))
+ continue;
+
+ mapfd = fdget(insn->imm);
+ map = __bpf_map_get(mapfd);
+ if (IS_ERR(map)) {
+ pr_warn("fd %d does not point to a valid bpf_map\n",
+ insn->imm);
+ err = -EINVAL;
+ goto err_replace;
+ }
+
+ virtnet_map = virtnet_get_bpf_map(vi, map);
+ if (!virtnet_map) {
+ pr_warn("fd %d does not point to an offloaded map\n",
+ insn->imm);
+ fdput(mapfd);
+ err = -EINVAL;
+ goto err_replace;
+ }
+
+ /* Note: no need for virtio32_to_cpu */
+ insn->imm = virtnet_map->ctrl->map_fd;
+ fdput(mapfd);
+ i++; /* BPF_LD_IMM64 spans two insns; skip the second half */
+ }
+
list_add_tail(&state->list, &vi->bpf_bound_progs);
prog->aux->offload->dev_priv = state;
return 0;
+
+err_replace:
+ kfree(state);
+ return err;
}
static int virtnet_bpf_verifier_prep(struct bpf_prog *prog)
@@ -2676,6 +2912,10 @@ static int virtnet_xdp(struct net_device *dev, struct netdev_bpf *xdp)
int err;
switch (xdp->command) {
+ case BPF_OFFLOAD_MAP_ALLOC:
+ return virtnet_bpf_map_alloc(vi, xdp->offmap);
+ case BPF_OFFLOAD_MAP_FREE:
+ return virtnet_bpf_map_free(vi, xdp->offmap);
case XDP_SETUP_PROG:
return virtnet_xdp_set(dev, xdp);
case XDP_QUERY_PROG:
@@ -2747,6 +2987,7 @@ static int virtnet_bpf_init(struct virtnet_info *vi)
goto err_netdev_register;
INIT_LIST_HEAD(&vi->bpf_bound_progs);
+ INIT_LIST_HEAD(&vi->map_list);
return 0;
diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
index 0ea2f7910a5a..1d330a0883ac 100644
--- a/include/uapi/linux/virtio_net.h
+++ b/include/uapi/linux/virtio_net.h
@@ -281,11 +281,34 @@ struct virtio_net_ctrl_ebpf_prog {
__u8 insns[0];
};
+struct virtio_net_ctrl_ebpf_map {
+ __virtio32 buf_len;
+ __virtio32 cmd;
+ __virtio32 map_type;
+ __virtio32 key_size;
+ __virtio32 value_size;
+ __virtio32 max_entries;
+ __virtio32 map_flags;
+ __virtio32 map_fd;
+ __virtio64 flags;
+ __u8 buf[0];
+};
+
#define VIRTIO_NET_CTRL_EBPF 6
#define VIRTIO_NET_CTRL_EBPF_PROG 1
+ #define VIRTIO_NET_CTRL_EBPF_MAP 2
/* Commands for VIRTIO_NET_CTRL_EBPF_PROG */
#define VIRTIO_NET_BPF_CMD_SET_OFFLOAD 1
#define VIRTIO_NET_BPF_CMD_UNSET_OFFLOAD 2
+/* Commands for VIRTIO_NET_CTRL_EBPF_MAP */
+#define VIRTIO_NET_BPF_CMD_CREATE_MAP 1
+#define VIRTIO_NET_BPF_CMD_FREE_MAP 2
+#define VIRTIO_NET_BPF_CMD_UPDATE_ELEM 3
+#define VIRTIO_NET_BPF_CMD_LOOKUP_ELEM 4
+#define VIRTIO_NET_BPF_CMD_DELETE_ELEM 5
+#define VIRTIO_NET_BPF_CMD_GET_FIRST 6
+#define VIRTIO_NET_BPF_CMD_GET_NEXT 7
+
#endif /* _UAPI_LINUX_VIRTIO_NET_H */
--
2.20.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [RFC net-next 18/18] virtio_net: restrict bpf helper calls from offloaded program
2019-11-26 10:07 [RFC net-next 00/18] virtio_net XDP offload Prashant Bhole
` (16 preceding siblings ...)
2019-11-26 10:07 ` [RFC net-next 17/18] virtio_net: implement XDP map offload functionality Prashant Bhole
@ 2019-11-26 10:07 ` Prashant Bhole
2019-11-26 20:35 ` [RFC net-next 00/18] virtio_net XDP offload Jakub Kicinski
18 siblings, 0 replies; 43+ messages in thread
From: Prashant Bhole @ 2019-11-26 10:07 UTC (permalink / raw)
To: David S . Miller, Michael S . Tsirkin
Cc: Song Liu, Jakub Kicinski, Jesper Dangaard Brouer,
Daniel Borkmann, netdev, Jason Wang, John Fastabend,
Alexei Starovoitov, qemu-devel, Prashant Bhole, kvm,
Yonghong Song, Andrii Nakryiko, Martin KaFai Lau
Since we are offloading this program to the host, some helper calls do
not make sense there, for example get_numa_node_id. Others cannot be
used because we do not handle them yet.
So let's allow only a small set of helper calls for now.
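The idea can be sketched standalone in plain C. This is a simplified
model, not the driver code: the opcode masks follow the BPF instruction
encoding (class in the low 3 bits, operation in the high 4 bits), and
the helper IDs here are illustrative placeholders for the real values
in enum bpf_func_id from the UAPI headers.

```c
#include <assert.h>
#include <stdint.h>

/* A BPF helper call is encoded as class BPF_JMP with op BPF_CALL and
 * the helper's ID in the imm field. */
#define BPF_JMP		0x05
#define BPF_CALL	0x80

/* Illustrative helper IDs (real values come from enum bpf_func_id). */
#define BPF_FUNC_map_lookup_elem	1
#define BPF_FUNC_map_update_elem	2
#define BPF_FUNC_map_delete_elem	3

/* Class is the low 3 bits of the opcode, the operation the high 4. */
static int is_helper_call(uint8_t code)
{
	return (code & 0x07) == BPF_JMP && (code & 0xf0) == BPF_CALL;
}

/* Whitelist check: return 0 if the helper may run on the host. */
static int check_helper(int32_t imm)
{
	switch (imm) {
	case BPF_FUNC_map_lookup_elem:
	case BPF_FUNC_map_update_elem:
	case BPF_FUNC_map_delete_elem:
		return 0;
	}
	return -1;	/* the driver returns -EOPNOTSUPP here */
}
```

The driver below applies the same check while scanning the bound
program, also accepting BPF_JMP32 and excluding BPF_PSEUDO_CALL
(calls to BPF subprograms rather than helpers).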
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
drivers/net/virtio_net.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 91a94b787c64..ab5be6b95bbd 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2549,6 +2549,25 @@ static struct virtnet_bpf_map *virtnet_get_bpf_map(struct virtnet_info *vi,
return NULL;
}
+static int virtnet_bpf_check_helper_call(struct bpf_insn *insn)
+{
+ switch (insn->imm) {
+ case BPF_FUNC_map_lookup_elem:
+ case BPF_FUNC_map_update_elem:
+ case BPF_FUNC_map_delete_elem:
+ case BPF_FUNC_ktime_get_ns:
+ case BPF_FUNC_get_prandom_u32:
+ case BPF_FUNC_csum_update:
+ case BPF_FUNC_xdp_adjust_head:
+ case BPF_FUNC_xdp_adjust_meta:
+ case BPF_FUNC_xdp_adjust_tail:
+ case BPF_FUNC_strtol:
+ case BPF_FUNC_strtoul:
+ return 0;
+ }
+ return -EOPNOTSUPP;
+}
+
static int virtnet_bpf_verify_insn(struct bpf_verifier_env *env, int insn_idx,
int prev_insn)
{
@@ -2830,6 +2849,7 @@ static int virtnet_bpf_verifier_setup(struct bpf_prog *prog)
struct virtnet_bpf_bound_prog *state;
struct virtnet_bpf_map *virtnet_map;
struct bpf_map *map;
+ u8 opcode, class;
struct fd mapfd;
int i, err = 0;
@@ -2846,6 +2866,16 @@ static int virtnet_bpf_verifier_setup(struct bpf_prog *prog)
for (i = 0; i < state->len; i++) {
struct bpf_insn *insn = &state->insnsi[i];
+ opcode = BPF_OP(insn->code);
+ class = BPF_CLASS(insn->code);
+
+ if ((class == BPF_JMP || class == BPF_JMP32) &&
+ opcode == BPF_CALL && insn->src_reg != BPF_PSEUDO_CALL) {
+ if (virtnet_bpf_check_helper_call(insn)) {
+ err = -EOPNOTSUPP;
+ goto err_replace;
+ }
+ continue;
+ }
+
if (insn->code != (BPF_LD | BPF_IMM | BPF_DW))
continue;
--
2.20.1
* Re: [RFC net-next 00/18] virtio_net XDP offload
2019-11-26 10:07 [RFC net-next 00/18] virtio_net XDP offload Prashant Bhole
` (17 preceding siblings ...)
2019-11-26 10:07 ` [RFC net-next 18/18] virtio_net: restrict bpf helper calls from offloaded program Prashant Bhole
@ 2019-11-26 20:35 ` Jakub Kicinski
2019-11-27 2:59 ` Jason Wang
` (2 more replies)
18 siblings, 3 replies; 43+ messages in thread
From: Jakub Kicinski @ 2019-11-26 20:35 UTC (permalink / raw)
To: Prashant Bhole
Cc: Song Liu, Jesper Dangaard Brouer, Daniel Borkmann,
Michael S . Tsirkin, qemu-devel, netdev, Jason Wang,
John Fastabend, Alexei Starovoitov, Martin KaFai Lau, kvm,
Yonghong Song, Andrii Nakryiko, David S . Miller
On Tue, 26 Nov 2019 19:07:26 +0900, Prashant Bhole wrote:
> Note: This RFC has been sent to netdev as well as qemu-devel lists
>
> This series introduces XDP offloading from virtio_net. It is based on
> the following work by Jason Wang:
> https://netdevconf.info/0x13/session.html?xdp-offload-with-virtio-net
>
> Current XDP performance in virtio-net is far from what we can achieve
> on host. Several major factors cause the difference:
> - Cost of virtualization
> - Cost of virtio (populating virtqueue and context switching)
> - Cost of vhost, it needs more optimization
> - Cost of data copy
> Because of above reasons there is a need of offloading XDP program to
> host. This set is an attempt to implement XDP offload from the guest.
This turns the guest kernel into a uAPI proxy.
BPF uAPI calls related to the "offloaded" BPF objects are forwarded
to the hypervisor, they pop up in QEMU which makes the requested call
to the hypervisor kernel. Today it's the Linux kernel tomorrow it may
be someone's proprietary "SmartNIC" implementation.
Why can't those calls be forwarded at the higher layer? Why do they
have to go through the guest kernel?
If kernel performs no significant work (or "adds value", pardon the
expression), and problem can easily be solved otherwise we shouldn't
do the work of maintaining the mechanism.
The approach of kernel generating actual machine code which is then
loaded into a sandbox on the hypervisor/SmartNIC is another story.
I'd appreciate if others could chime in.
* Re: [RFC net-next 00/18] virtio_net XDP offload
2019-11-26 20:35 ` [RFC net-next 00/18] virtio_net XDP offload Jakub Kicinski
@ 2019-11-27 2:59 ` Jason Wang
2019-11-27 19:49 ` Jakub Kicinski
2019-11-27 20:32 ` Michael S. Tsirkin
2019-11-28 3:32 ` Alexei Starovoitov
2 siblings, 1 reply; 43+ messages in thread
From: Jason Wang @ 2019-11-27 2:59 UTC (permalink / raw)
To: Jakub Kicinski, Prashant Bhole
Cc: Song Liu, Jesper Dangaard Brouer, Daniel Borkmann,
Michael S . Tsirkin, netdev, John Fastabend, qemu-devel,
Alexei Starovoitov, David S . Miller, kvm, Yonghong Song,
Andrii Nakryiko, Martin KaFai Lau
Hi Jakub:
On 2019/11/27 上午4:35, Jakub Kicinski wrote:
> On Tue, 26 Nov 2019 19:07:26 +0900, Prashant Bhole wrote:
>> Note: This RFC has been sent to netdev as well as qemu-devel lists
>>
>> This series introduces XDP offloading from virtio_net. It is based on
>> the following work by Jason Wang:
>> https://netdevconf.info/0x13/session.html?xdp-offload-with-virtio-net
>>
>> Current XDP performance in virtio-net is far from what we can achieve
>> on host. Several major factors cause the difference:
>> - Cost of virtualization
>> - Cost of virtio (populating virtqueue and context switching)
>> - Cost of vhost, it needs more optimization
>> - Cost of data copy
>> Because of above reasons there is a need of offloading XDP program to
>> host. This set is an attempt to implement XDP offload from the guest.
> This turns the guest kernel into a uAPI proxy.
>
> BPF uAPI calls related to the "offloaded" BPF objects are forwarded
> to the hypervisor, they pop up in QEMU which makes the requested call
> to the hypervisor kernel. Today it's the Linux kernel tomorrow it may
> be someone's proprietary "SmartNIC" implementation.
>
> Why can't those calls be forwarded at the higher layer? Why do they
> have to go through the guest kernel?
I think doing the forwarding at a higher layer has the following issues:
- It needs a dedicated library (probably libbpf), but an application may
choose to do eBPF syscalls directly
- It depends on a guest agent to work
- It can't work for virtio-net hardware, since that still requires a
hardware interface for carrying the offloading information
- Implementing it at the kernel level may help future extensions like
BPF object pinning, eBPF helpers, etc.
Basically, this series is an implementation of transporting eBPF
through virtio, so it's not necessarily guest-to-host but
driver-to-device. The device could be either a virtual one
(as done in qemu) or real hardware.
>
> If kernel performs no significant work (or "adds value", pardon the
> expression), and problem can easily be solved otherwise we shouldn't
> do the work of maintaining the mechanism.
My understanding is that there should not be much difference compared
to other offloading technologies.
>
> The approach of kernel generating actual machine code which is then
> loaded into a sandbox on the hypervisor/SmartNIC is another story.
We've considered such an approach, but actual machine code is not as
portable as eBPF bytecode, considering we may want to:
- Support migration
- Further offload the program to a smart NIC (e.g. through macvtap
passthrough mode, etc.)
Thanks
> I'd appreciate if others could chime in.
>
* Re: [RFC net-next 00/18] virtio_net XDP offload
2019-11-27 2:59 ` Jason Wang
@ 2019-11-27 19:49 ` Jakub Kicinski
2019-11-28 3:41 ` Jason Wang
0 siblings, 1 reply; 43+ messages in thread
From: Jakub Kicinski @ 2019-11-27 19:49 UTC (permalink / raw)
To: Jason Wang
Cc: Song Liu, Jesper Dangaard Brouer, Daniel Borkmann,
Michael S . Tsirkin, netdev, John Fastabend, qemu-devel,
Alexei Starovoitov, David S . Miller, Prashant Bhole, kvm,
Yonghong Song, Andrii Nakryiko, Martin KaFai Lau
On Wed, 27 Nov 2019 10:59:37 +0800, Jason Wang wrote:
> On 2019/11/27 上午4:35, Jakub Kicinski wrote:
> > On Tue, 26 Nov 2019 19:07:26 +0900, Prashant Bhole wrote:
> >> Note: This RFC has been sent to netdev as well as qemu-devel lists
> >>
> >> This series introduces XDP offloading from virtio_net. It is based on
> >> the following work by Jason Wang:
> >> https://netdevconf.info/0x13/session.html?xdp-offload-with-virtio-net
> >>
> >> Current XDP performance in virtio-net is far from what we can achieve
> >> on host. Several major factors cause the difference:
> >> - Cost of virtualization
> >> - Cost of virtio (populating virtqueue and context switching)
> >> - Cost of vhost, it needs more optimization
> >> - Cost of data copy
> >> Because of above reasons there is a need of offloading XDP program to
> >> host. This set is an attempt to implement XDP offload from the guest.
> > This turns the guest kernel into a uAPI proxy.
> >
> > BPF uAPI calls related to the "offloaded" BPF objects are forwarded
> > to the hypervisor, they pop up in QEMU which makes the requested call
> > to the hypervisor kernel. Today it's the Linux kernel tomorrow it may
> > be someone's proprietary "SmartNIC" implementation.
> >
> > Why can't those calls be forwarded at the higher layer? Why do they
> > have to go through the guest kernel?
>
>
> I think doing forwarding at higher layer have the following issues:
>
> - Need a dedicated library (probably libbpf) but application may choose
> to do eBPF syscall directly
> - Depends on guest agent to work
This can be said about any user space functionality.
> - Can't work for virtio-net hardware, since it still requires a hardware
> interface for carrying offloading information
The HW virtio-net presumably still has a PF and hopefully reprs for
VFs, so why can't it attach the program there?
> - Implement at the level of kernel may help for future extension like
> BPF object pinning and eBPF helper etc.
No idea what you mean by this.
> Basically, this series is trying to have an implementation of
> transporting eBPF through virtio, so it's not necessarily a guest to
> host but driver and device. For device, it could be either a virtual one
> (as done in qemu) or a real hardware.
SmartNIC with a multi-core 64bit ARM CPUs is as much of a host as
is the x86 hypervisor side. This set turns the kernel into a uAPI
forwarder.
3 years ago my answer to this proposal would have been very different.
Today after all the CPU bugs it seems like the SmartNICs (which are
just another CPU running proprietary code) may just take off..
> > If kernel performs no significant work (or "adds value", pardon the
> > expression), and problem can easily be solved otherwise we shouldn't
> > do the work of maintaining the mechanism.
>
> My understanding is that it should not be much difference compared to
> other offloading technology.
I presume you mean TC offloads? In virtualization there is inherently a
hypervisor which will receive the request, be it an IO hub/SmartNIC or
the traditional hypervisor on the same CPU.
The ACL/routing offloads differ significantly, because it's either the
driver that does all the HW register poking directly or the complexity
of programming a rule into a HW table is quite low.
Same is true for the NFP BPF offload, BTW, the driver does all the
heavy lifting and compiles the final machine code image.
You can't say verifying and JITing BPF code into machine code entirely
in the hypervisor is similarly simple.
So no, there is a huge difference.
> > The approach of kernel generating actual machine code which is then
> > loaded into a sandbox on the hypervisor/SmartNIC is another story.
>
> We've considered such way, but actual machine code is not as portable as
> eBPF bytecode consider we may want:
>
> - Support migration
> - Further offload the program to smart NIC (e.g through macvtap
> passthrough mode etc).
You can re-JIT or JIT for SmartNIC..? Having the BPF bytecode does not
guarantee migration either, if the environment is expected to be
running different version of HW and SW. But yes, JITing in the guest
kernel when you don't know what to JIT for may be hard, I was just
saying that I don't mean to discourage people from implementing
sandboxes which run JITed code on SmartNICs. My criticism is (as
always?) against turning the kernel into a one-to-one uAPI forwarder
into unknown platform code.
For cloud use cases I believe the higher layer should solve this.
* Re: [RFC net-next 00/18] virtio_net XDP offload
2019-11-27 19:49 ` Jakub Kicinski
@ 2019-11-28 3:41 ` Jason Wang
0 siblings, 0 replies; 43+ messages in thread
From: Jason Wang @ 2019-11-28 3:41 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Song Liu, Martin KaFai Lau, Jesper Dangaard Brouer,
Daniel Borkmann, Michael S . Tsirkin, netdev, John Fastabend,
qemu-devel, Alexei Starovoitov, Prashant Bhole, kvm,
Yonghong Song, Andrii Nakryiko, David S . Miller
On 2019/11/28 上午3:49, Jakub Kicinski wrote:
> On Wed, 27 Nov 2019 10:59:37 +0800, Jason Wang wrote:
>> On 2019/11/27 上午4:35, Jakub Kicinski wrote:
>>> On Tue, 26 Nov 2019 19:07:26 +0900, Prashant Bhole wrote:
>>>> Note: This RFC has been sent to netdev as well as qemu-devel lists
>>>>
>>>> This series introduces XDP offloading from virtio_net. It is based on
>>>> the following work by Jason Wang:
>>>> https://netdevconf.info/0x13/session.html?xdp-offload-with-virtio-net
>>>>
>>>> Current XDP performance in virtio-net is far from what we can achieve
>>>> on host. Several major factors cause the difference:
>>>> - Cost of virtualization
>>>> - Cost of virtio (populating virtqueue and context switching)
>>>> - Cost of vhost, it needs more optimization
>>>> - Cost of data copy
>>>> Because of above reasons there is a need of offloading XDP program to
>>>> host. This set is an attempt to implement XDP offload from the guest.
>>> This turns the guest kernel into a uAPI proxy.
>>>
>>> BPF uAPI calls related to the "offloaded" BPF objects are forwarded
>>> to the hypervisor, they pop up in QEMU which makes the requested call
>>> to the hypervisor kernel. Today it's the Linux kernel tomorrow it may
>>> be someone's proprietary "SmartNIC" implementation.
>>>
>>> Why can't those calls be forwarded at the higher layer? Why do they
>>> have to go through the guest kernel?
>>
>> I think doing forwarding at higher layer have the following issues:
>>
>> - Need a dedicated library (probably libbpf) but application may choose
>> to do eBPF syscall directly
>> - Depends on guest agent to work
> This can be said about any user space functionality.
Yes, but the feature may have too many unnecessary dependencies: a
dedicated library, a guest agent, a host agent, etc. This can only work
for some specific setups and will lead to vendor-specific
implementations.
>
>> - Can't work for virtio-net hardware, since it still requires a hardware
>> interface for carrying offloading information
> The HW virtio-net presumably still has a PF and hopefully reprs for
> VFs, so why can't it attach the program there?
Then you still need an interface for carrying such information.
Assuming we had a virtio-net VF with reprs, it would work like:
libbpf(guest) -> guest agent -> host agent -> libbpf(host) -> BPF
syscall -> VF reprs/PF driver -> VF/PF reprs -> virtio-net VF
You still need a vendor-specific way of passing eBPF commands from the
driver to the reprs/PF, and possibly it could still be a virtio
interface there.
In this proposal it works out of the box, as simply as:
libbpf(guest) -> guest kernel -> virtio-net driver -> virtio-net VF
If the request comes from the host (e.g. flow offloading,
configuration, etc.), VF reprs are a perfect fit. But if the request
comes from the guest, the much longer journey looks like a burden
(dependencies, bugs, etc.).
More importantly, we cannot assume how the virtio-net HW is
structured; it might not even be an SR-IOV or PCI card.
>
>> - Implement at the level of kernel may help for future extension like
>> BPF object pinning and eBPF helper etc.
> No idea what you mean by this.
My understanding is that we should narrow the gap between non-offloaded
and offloaded eBPF programs. Making maps or progs visible to the kernel
may help preserve a unified API, e.g. object pinning through sysfs,
tracepoints, debugging, etc.
>
>> Basically, this series is trying to have an implementation of
>> transporting eBPF through virtio, so it's not necessarily a guest to
>> host but driver and device. For device, it could be either a virtual one
>> (as done in qemu) or a real hardware.
> SmartNIC with a multi-core 64bit ARM CPUs is as much of a host as
> is the x86 hypervisor side. This set turns the kernel into a uAPI
> forwarder.
Not necessarily; as was done with NFP, the driver filters out the
features that are not supported, and the bpf object is still visible
in the kernel (see the comment above).
>
> 3 years ago my answer to this proposal would have been very different.
> Today after all the CPU bugs it seems like the SmartNICs (which are
> just another CPU running proprietary code) may just take off..
>
That's interesting, but a vendor may choose to use an FPGA rather than
an SoC in this case. Anyhow, discussion like this is somewhat out of
the scope of this series.
>>> If kernel performs no significant work (or "adds value", pardon the
>>> expression), and problem can easily be solved otherwise we shouldn't
>>> do the work of maintaining the mechanism.
>> My understanding is that it should not be much difference compared to
>> other offloading technology.
> I presume you mean TC offloads? In virtualization there is inherently a
> hypervisor which will receive the request, be it an IO hub/SmartNIC or
> the traditional hypervisor on the same CPU.
>
> The ACL/routing offloads differ significantly, because it's either the
> driver that does all the HW register poking directly or the complexity
> of programming a rule into a HW table is quite low.
>
> Same is true for the NFP BPF offload, BTW, the driver does all the
> heavy lifting and compiles the final machine code image.
Yes, and this series benefits from the infrastructure invented for NFP.
But I'm not sure this is a good point since, technically, the machine
code could be generated by a smart NIC as well.
>
> You can't say verifying and JITing BPF code into machine code entirely
> in the hypervisor is similarly simple.
Yes, and that's why we chose to do it on the device (host) to simplify
things.
>
> So no, there is a huge difference.
>
>>> The approach of kernel generating actual machine code which is then
>>> loaded into a sandbox on the hypervisor/SmartNIC is another story.
>> We've considered such way, but actual machine code is not as portable as
>> eBPF bytecode consider we may want:
>>
>> - Support migration
>> - Further offload the program to smart NIC (e.g through macvtap
>> passthrough mode etc).
> You can re-JIT or JIT for SmartNIC..? Having the BPF bytecode does not
> guarantee migration either,
Yes, but it's more portable than machine code.
> if the environment is expected to be
> running different version of HW and SW.
Right, we plan to have feature negotiation.
> But yes, JITing in the guest
> kernel when you don't know what to JIT for may be hard,
Yes.
> I was just
> saying that I don't mean to discourage people from implementing
> sandboxes which run JITed code on SmartNICs. My criticism is (as
> always?) against turning the kernel into a one-to-one uAPI forwarder
> into unknown platform code.
We have FUSE, and I think this is not only a forwarder; we may do much
more work on top of it in the future. As for unknown platform code, I'm
not sure why we need to care about that. There's no way for us to
prevent such implementations, and if we try to formalize it through a
specification (the virtio spec and probably an eBPF spec), it may
actually help.
>
> For cloud use cases I believe the higher layer should solve this.
>
Technically possible, but it has lots of drawbacks.
Thanks
* Re: [RFC net-next 00/18] virtio_net XDP offload
2019-11-26 20:35 ` [RFC net-next 00/18] virtio_net XDP offload Jakub Kicinski
2019-11-27 2:59 ` Jason Wang
@ 2019-11-27 20:32 ` Michael S. Tsirkin
2019-11-27 23:40 ` Jakub Kicinski
2019-11-28 3:32 ` Alexei Starovoitov
2 siblings, 1 reply; 43+ messages in thread
From: Michael S. Tsirkin @ 2019-11-27 20:32 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Song Liu, Jesper Dangaard Brouer, Daniel Borkmann, qemu-devel,
netdev, Jason Wang, John Fastabend, Alexei Starovoitov,
Martin KaFai Lau, Prashant Bhole, kvm, Yonghong Song,
Andrii Nakryiko, David S . Miller
On Tue, Nov 26, 2019 at 12:35:14PM -0800, Jakub Kicinski wrote:
> On Tue, 26 Nov 2019 19:07:26 +0900, Prashant Bhole wrote:
> > Note: This RFC has been sent to netdev as well as qemu-devel lists
> >
> > This series introduces XDP offloading from virtio_net. It is based on
> > the following work by Jason Wang:
> > https://netdevconf.info/0x13/session.html?xdp-offload-with-virtio-net
> >
> > Current XDP performance in virtio-net is far from what we can achieve
> > on host. Several major factors cause the difference:
> > - Cost of virtualization
> > - Cost of virtio (populating virtqueue and context switching)
> > - Cost of vhost, it needs more optimization
> > - Cost of data copy
> > Because of above reasons there is a need of offloading XDP program to
> > host. This set is an attempt to implement XDP offload from the guest.
>
> This turns the guest kernel into a uAPI proxy.
>
> BPF uAPI calls related to the "offloaded" BPF objects are forwarded
> to the hypervisor, they pop up in QEMU which makes the requested call
> to the hypervisor kernel. Today it's the Linux kernel tomorrow it may
> be someone's proprietary "SmartNIC" implementation.
>
> Why can't those calls be forwarded at the higher layer? Why do they
> have to go through the guest kernel?
Well everyone is writing these programs and attaching them to NICs.
For better or worse that's how userspace is written.
Yes, in the simple case where everything is passed through, it could
instead be passed through some other channel just as well, but then
userspace would need significant changes just to make it work with
virtio.
> If kernel performs no significant work (or "adds value", pardon the
> expression), and problem can easily be solved otherwise we shouldn't
> do the work of maintaining the mechanism.
>
> The approach of kernel generating actual machine code which is then
> loaded into a sandbox on the hypervisor/SmartNIC is another story.
But that's transparent to guest userspace. Making userspace care whether
it's a SmartNIC or a software device breaks part of virtualization's
appeal, which is that it looks like a hardware box to the guest.
> I'd appreciate if others could chime in.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [RFC net-next 00/18] virtio_net XDP offload
2019-11-27 20:32 ` Michael S. Tsirkin
@ 2019-11-27 23:40 ` Jakub Kicinski
2019-12-02 15:29 ` Michael S. Tsirkin
0 siblings, 1 reply; 43+ messages in thread
From: Jakub Kicinski @ 2019-11-27 23:40 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Song Liu, Jesper Dangaard Brouer, Daniel Borkmann, qemu-devel,
netdev, Jason Wang, John Fastabend, Alexei Starovoitov,
Martin KaFai Lau, Prashant Bhole, kvm, Yonghong Song,
Andrii Nakryiko, David S . Miller
On Wed, 27 Nov 2019 15:32:17 -0500, Michael S. Tsirkin wrote:
> On Tue, Nov 26, 2019 at 12:35:14PM -0800, Jakub Kicinski wrote:
> > On Tue, 26 Nov 2019 19:07:26 +0900, Prashant Bhole wrote:
> > > Note: This RFC has been sent to netdev as well as qemu-devel lists
> > >
> > > This series introduces XDP offloading from virtio_net. It is based on
> > > the following work by Jason Wang:
> > > https://netdevconf.info/0x13/session.html?xdp-offload-with-virtio-net
> > >
> > > Current XDP performance in virtio-net is far from what we can achieve
> > > on host. Several major factors cause the difference:
> > > - Cost of virtualization
> > > - Cost of virtio (populating virtqueue and context switching)
> > > - Cost of vhost, it needs more optimization
> > > - Cost of data copy
> > > Because of the above reasons there is a need to offload the XDP program to
> > > the host. This set is an attempt to implement XDP offload from the guest.
> >
> > This turns the guest kernel into a uAPI proxy.
> >
> > BPF uAPI calls related to the "offloaded" BPF objects are forwarded
> > to the hypervisor; they pop up in QEMU, which makes the requested call
> > to the hypervisor kernel. Today it's the Linux kernel; tomorrow it may
> > be someone's proprietary "SmartNIC" implementation.
> >
> > Why can't those calls be forwarded at the higher layer? Why do they
> > have to go through the guest kernel?
>
> Well everyone is writing these programs and attaching them to NICs.
Who's everyone?
> For better or worse that's how userspace is written.
HW offload requires modifying the user space, too. The offload is not
transparent. Do you know that?
> Yes, in the simple case where everything is passed through, it could
> instead be passed through some other channel just as well, but then
> userspace would need significant changes just to make it work with
> virtio.
There is a recently spawned effort to create an "XDP daemon", or
otherwise a control application, which would among other things link
separate XDP apps to share a NIC attachment point.
Making use of cloud APIs would make a perfect addition to that.
Obviously if one asks a kernel guy to solve a problem one'll get kernel
code as an answer. And writing higher layer code requires companies to
actually organize their teams and have "full stack" strategies.
We've seen this story already with the net_failover wart. At least that
time we weren't risking building a proxy to someone's proprietary FW.
> > If kernel performs no significant work (or "adds value", pardon the
> > expression), and problem can easily be solved otherwise we shouldn't
> > do the work of maintaining the mechanism.
> >
> > The approach of kernel generating actual machine code which is then
> > loaded into a sandbox on the hypervisor/SmartNIC is another story.
>
> But that's transparent to guest userspace. Making userspace care whether
> it's a SmartNIC or a software device breaks part of virtualization's
> appeal, which is that it looks like a hardware box to the guest.
It's not hardware unless you JITed machine code for it; it's just
someone else's software.
I'm not arguing with the appeal. I'm arguing the risk/benefit ratio
doesn't justify opening this can of worms.
> > I'd appreciate if others could chime in.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [RFC net-next 00/18] virtio_net XDP offload
2019-11-27 23:40 ` Jakub Kicinski
@ 2019-12-02 15:29 ` Michael S. Tsirkin
0 siblings, 0 replies; 43+ messages in thread
From: Michael S. Tsirkin @ 2019-12-02 15:29 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Song Liu, Jesper Dangaard Brouer, Daniel Borkmann, qemu-devel,
netdev, Jason Wang, John Fastabend, Alexei Starovoitov,
Martin KaFai Lau, Prashant Bhole, kvm, Yonghong Song,
Andrii Nakryiko, David S . Miller
On Wed, Nov 27, 2019 at 03:40:14PM -0800, Jakub Kicinski wrote:
> > For better or worse that's how userspace is written.
>
> HW offload requires modifying the user space, too. The offload is not
> transparent. Do you know that?
It's true, offload of the program itself isn't transparent. Adding a 3rd
interface (software/hardware/host) isn't welcome though, IMHO.
--
MST
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [RFC net-next 00/18] virtio_net XDP offload
2019-11-26 20:35 ` [RFC net-next 00/18] virtio_net XDP offload Jakub Kicinski
2019-11-27 2:59 ` Jason Wang
2019-11-27 20:32 ` Michael S. Tsirkin
@ 2019-11-28 3:32 ` Alexei Starovoitov
2019-11-28 4:18 ` Jason Wang
2 siblings, 1 reply; 43+ messages in thread
From: Alexei Starovoitov @ 2019-11-28 3:32 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Song Liu, Jesper Dangaard Brouer, Daniel Borkmann,
Michael S . Tsirkin, qemu-devel, netdev, Jason Wang,
John Fastabend, Alexei Starovoitov, Martin KaFai Lau,
Prashant Bhole, kvm, Yonghong Song, Andrii Nakryiko,
David S . Miller
On Tue, Nov 26, 2019 at 12:35:14PM -0800, Jakub Kicinski wrote:
>
> I'd appreciate if others could chime in.
The performance improvements are quite appealing.
In general offloading from higher layers into lower layers is necessary long term.
But the approach taken by patches 15 and 17 is a dead end. I don't see how it
can ever catch up with the pace of bpf development. As presented this approach
works for the most basic programs and simple maps. No line info, no BTF, no
debuggability. There are no tail_calls either. I don't think I've seen a single
production XDP program that doesn't use tail calls. Static and dynamic linking
is coming. Wrapping one bpf feature at a time with a virtio API is never
going to be complete. How are FDs going to be passed back?
OBJ_GET_INFO_BY_FD? OBJ_PIN/GET? Where is bpffs going to live? Any
realistic XDP application will be using a lot more than a single
self-contained XDP prog with hash and array maps. It feels that sys_bpf
needs to be forwarded as a whole from guest into host. In the case of true
hw offload the host is managing HW, so it doesn't forward syscalls into the
driver. The offload from guest into host is different: BPF can be seen as a
resource that the host provides, and the guest kernel plus qemu would be
forwarding requests between guest user space and the host kernel. E.g.
sys_bpf(BPF_MAP_CREATE) can pass through into the host directly.
The FD that the host sees would need a corresponding mirror FD in the guest. There
are still questions about bpffs paths, but the main issue of
one-feature-at-a-time will be addressed in such approach. There could be other
solutions, of course.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [RFC net-next 00/18] virtio_net XDP offload
2019-11-28 3:32 ` Alexei Starovoitov
@ 2019-11-28 4:18 ` Jason Wang
2019-12-01 16:54 ` David Ahern
0 siblings, 1 reply; 43+ messages in thread
From: Jason Wang @ 2019-11-28 4:18 UTC (permalink / raw)
To: Alexei Starovoitov, Jakub Kicinski
Cc: Song Liu, Jesper Dangaard Brouer, Daniel Borkmann,
Michael S . Tsirkin, netdev, John Fastabend, qemu-devel,
Alexei Starovoitov, David S . Miller, Prashant Bhole, kvm,
Yonghong Song, Andrii Nakryiko, Martin KaFai Lau
On 2019/11/28 11:32 AM, Alexei Starovoitov wrote:
> On Tue, Nov 26, 2019 at 12:35:14PM -0800, Jakub Kicinski wrote:
>> I'd appreciate if others could chime in.
> The performance improvements are quite appealing.
> In general offloading from higher layers into lower layers is necessary long term.
>
> But the approach taken by patches 15 and 17 is a dead end. I don't see how it
> can ever catch up with the pace of bpf development.
This applies to any hardware offloading feature, doesn't it?
> As presented this approach
> works for the most basic programs and simple maps. No line info, no BTF, no
> debuggability. There are no tail_calls either.
If I understand correctly, none of the above were implemented in NFP. We
can collaborate to find solutions for all of those.
> I don't think I've seen a single
> production XDP program that doesn't use tail calls.
It looks to me like we can manage to add this support.
> Static and dynamic linking
> is coming. Wrapping one bpf feature at a time with a virtio API is never going to
> be complete.
It's a common problem for hardware that wants to implement eBPF
offloading, not a virtio-specific one.
> How are FDs going to be passed back? OBJ_GET_INFO_BY_FD?
> OBJ_PIN/GET? Where is bpffs going to live?
If we want pinning to work in the virt case, it should probably live in
both host and guest.
> Any realistic XDP application will
> be using a lot more than single self contained XDP prog with hash and array
> maps.
It's possible if we want to use XDP offloading to accelerate VNFs, which
often have simple logic.
> It feels that the whole sys_bpf needs to be forwarded as a whole from
> guest into host. In case of true hw offload the host is managing HW. So it
> doesn't forward syscalls into the driver. The offload from guest into host is
> different. BPF can be seen as a resource that host provides and guest kernel
> plus qemu would be forwarding requests between guest user space and host
> kernel. Like sys_bpf(BPF_MAP_CREATE) can passthrough into the host directly.
> The FD that the host sees would need a corresponding mirror FD in the guest. There
> are still questions about bpffs paths, but the main issue of
> one-feature-at-a-time will be addressed in such approach.
We tried to follow what NFP did by starting with a fraction of the whole
eBPF feature set. It would be very hard to have all eBPF features
implemented from the start. It would be helpful to clarify the minimal
set of features you want to have from the start.
> There could be other
> solutions, of course.
>
>
Suggestions are welcomed.
Thanks
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [RFC net-next 00/18] virtio_net XDP offload
2019-11-28 4:18 ` Jason Wang
@ 2019-12-01 16:54 ` David Ahern
2019-12-02 2:48 ` Jason Wang
0 siblings, 1 reply; 43+ messages in thread
From: David Ahern @ 2019-12-01 16:54 UTC (permalink / raw)
To: Jason Wang, Alexei Starovoitov, Jakub Kicinski
Cc: Song Liu, Jesper Dangaard Brouer, Daniel Borkmann,
Michael S . Tsirkin, netdev, John Fastabend, qemu-devel,
Alexei Starovoitov, David S . Miller, Prashant Bhole, kvm,
Yonghong Song, Andrii Nakryiko, Martin KaFai Lau
On 11/27/19 10:18 PM, Jason Wang wrote:
> We tried to follow what NFP did by starting with a fraction of the whole
> eBPF feature set. It would be very hard to have all eBPF features
> implemented from the start. It would be helpful to clarify the minimal
> set of features you want to have from the start.
Offloading guest programs needs to prevent a guest XDP program from
running bpf helpers that access host kernel data, e.g. bpf_fib_lookup.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [RFC net-next 00/18] virtio_net XDP offload
2019-12-01 16:54 ` David Ahern
@ 2019-12-02 2:48 ` Jason Wang
0 siblings, 0 replies; 43+ messages in thread
From: Jason Wang @ 2019-12-02 2:48 UTC (permalink / raw)
To: David Ahern, Alexei Starovoitov, Jakub Kicinski
Cc: Song Liu, Jesper Dangaard Brouer, Daniel Borkmann,
Michael S . Tsirkin, netdev, John Fastabend, qemu-devel,
Alexei Starovoitov, David S . Miller, Prashant Bhole, kvm,
Yonghong Song, Andrii Nakryiko, Martin KaFai Lau
On 2019/12/2 12:54 AM, David Ahern wrote:
> On 11/27/19 10:18 PM, Jason Wang wrote:
>> We tried to follow what NFP did by starting with a fraction of the whole
>> eBPF feature set. It would be very hard to have all eBPF features
>> implemented from the start. It would be helpful to clarify the minimal
>> set of features you want to have from the start.
> Offloading guest programs needs to prevent a guest XDP program from
> running bpf helpers that access host kernel data, e.g. bpf_fib_lookup.
Right, so we probably need a new type of eBPF program on the host and
filter out the unsupported helpers there.
Thanks
>
^ permalink raw reply [flat|nested] 43+ messages in thread