From: Jakub Kicinski <jakub.kicinski@netronome.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Prashant Bhole <prashantbhole.linux@gmail.com>, "David S. Miller" <davem@davemloft.net>, Jason Wang <jasowang@redhat.com>, Alexei Starovoitov <ast@kernel.org>, Daniel Borkmann <daniel@iogearbox.net>, Jesper Dangaard Brouer <hawk@kernel.org>, John Fastabend <john.fastabend@gmail.com>, Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>, Yonghong Song <yhs@fb.com>, Andrii Nakryiko <andriin@fb.com>, netdev@vger.kernel.org, qemu-devel@nongnu.org, kvm@vger.kernel.org
Subject: Re: [RFC net-next 00/18] virtio_net XDP offload
Date: Wed, 27 Nov 2019 15:40:14 -0800
Message-ID: <20191127154014.2b91ecc2@cakuba.netronome.com>
In-Reply-To: <20191127152653-mutt-send-email-mst@kernel.org>

On Wed, 27 Nov 2019 15:32:17 -0500, Michael S. Tsirkin wrote:
> On Tue, Nov 26, 2019 at 12:35:14PM -0800, Jakub Kicinski wrote:
> > On Tue, 26 Nov 2019 19:07:26 +0900, Prashant Bhole wrote:
> > > Note: This RFC has been sent to netdev as well as qemu-devel lists
> > >
> > > This series introduces XDP offloading from virtio_net. It is based on
> > > the following work by Jason Wang:
> > > https://netdevconf.info/0x13/session.html?xdp-offload-with-virtio-net
> > >
> > > Current XDP performance in virtio-net is far from what we can achieve
> > > on the host. Several major factors cause the difference:
> > > - Cost of virtualization
> > > - Cost of virtio (populating the virtqueue and context switching)
> > > - Cost of vhost, which needs more optimization
> > > - Cost of data copy
> > > For the above reasons there is a need to offload the XDP program to
> > > the host. This set is an attempt to implement XDP offload from the
> > > guest.
> >
> > This turns the guest kernel into a uAPI proxy.
> >
> > BPF uAPI calls related to the "offloaded" BPF objects are forwarded
> > to the hypervisor; they pop up in QEMU, which makes the requested call
> > to the hypervisor kernel. Today it's the Linux kernel; tomorrow it may
> > be someone's proprietary "SmartNIC" implementation.
> >
> > Why can't those calls be forwarded at a higher layer? Why do they
> > have to go through the guest kernel?
>
> Well, everyone is writing these programs and attaching them to NICs.

Who's everyone?

> For better or worse that's how userspace is written.

HW offload requires modifying the user space, too. The offload is not
transparent. Do you know that?

> Yes, in the simple case where everything is passed through, it could
> instead be passed through some other channel just as well, but then
> userspace would need significant changes just to make it work with
> virtio.

There is a recently spawned effort to create an "XDP daemon", or
otherwise a control application, which would among other things link
separate XDP apps to share a NIC attachment point. Making use of cloud
APIs would make a perfect addition to that.

Obviously, if one asks a kernel guy to solve a problem, one'll get
kernel code as an answer. And writing higher-layer code requires
companies to actually organize their teams and have "full stack"
strategies.

We've seen this story already with the net_failover wart. At least that
time we weren't risking building a proxy to someone's proprietary FW.

> > If the kernel performs no significant work (or "adds value", pardon
> > the expression), and the problem can easily be solved otherwise, we
> > shouldn't do the work of maintaining the mechanism.
> >
> > The approach of the kernel generating actual machine code which is
> > then loaded into a sandbox on the hypervisor/SmartNIC is another
> > story.
>
> But that's transparent to guest userspace. Making userspace care
> whether it's a SmartNIC or a software device breaks part of
> virtualization's appeal, which is that it looks like a hardware box
> to the guest.

It's not hardware unless you JITed machine code for it; it's just
someone else's software. I'm not arguing with the appeal. I'm arguing
the risk/benefit ratio doesn't justify opening this can of worms.

I'd appreciate it if others could chime in.
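[Editor's note] The "uAPI proxy" shape being debated above can be sketched in a few lines of C. This is purely an illustration of the pattern (guest serializes a bpf(2)-style request, host decodes it and would perform the real call); the command IDs, message layout, and direct-call "transport" below are all hypothetical, not the virtio_net control-queue ABI proposed in the RFC.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for bpf(2) commands forwarded by the guest. */
enum proxy_cmd {
	PROXY_PROG_LOAD  = 1,	/* stand-in for BPF_PROG_LOAD */
	PROXY_MAP_CREATE = 2,	/* stand-in for BPF_MAP_CREATE */
};

struct proxy_msg {
	uint32_t cmd;		/* which bpf(2) operation the guest wants */
	uint32_t len;		/* payload length */
	uint8_t  data[64];	/* serialized bpf_attr-like payload */
};

/* Host side: QEMU would decode the message and issue the real bpf(2)
 * syscall, returning a file descriptor. Here we just hand out fake fds. */
static int host_handle(const struct proxy_msg *m, int *next_fd)
{
	switch (m->cmd) {
	case PROXY_PROG_LOAD:
	case PROXY_MAP_CREATE:
		return (*next_fd)++;
	default:
		return -1;	/* unknown command */
	}
}

/* Guest side: instead of executing the BPF operation locally, serialize
 * it and push it through the transport (a direct call stands in for the
 * virtqueue here). */
static int guest_forward(uint32_t cmd, const void *attr, uint32_t len,
			 int *next_fd)
{
	struct proxy_msg m = { .cmd = cmd, .len = len };

	if (len > sizeof(m.data))
		return -1;
	memcpy(m.data, attr, len);
	return host_handle(&m, next_fd);
}
```

The concern raised in the message maps directly onto this sketch: the guest kernel adds no logic of its own in `guest_forward()`; it only marshals the request, which is why the same forwarding could live in a userspace control application instead.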
Thread overview: 87+ messages (cross-posted duplicates collapsed)

2019-11-26 10:07 [RFC net-next 00/18] virtio_net XDP offload, Prashant Bhole
2019-11-26 10:07 ` [RFC net-next 01/18] bpf: introduce bpf_prog_offload_verifier_setup(), Prashant Bhole
2019-11-26 10:07 ` [RFC net-next 02/18] net: core: rename netif_receive_generic_xdp() to do_generic_xdp_core(), Prashant Bhole
2019-11-26 10:07 ` [RFC net-next 03/18] net: core: export do_xdp_generic_core(), Prashant Bhole
2019-11-26 10:07 ` [RFC net-next 04/18] tuntap: check tun_msg_ctl type at necessary places, Prashant Bhole
2019-11-26 10:07 ` [RFC net-next 05/18] vhost_net: user tap recvmsg api to access ptr ring, Prashant Bhole
2019-11-26 10:07 ` [RFC net-next 06/18] tuntap: remove usage of ptr ring in vhost_net, Prashant Bhole
2019-11-26 10:07 ` [RFC net-next 07/18] tun: set offloaded xdp program, Prashant Bhole
2019-12-01 16:35   ` David Ahern
2019-12-02  2:44   ` Jason Wang
2019-12-01 16:45   ` David Ahern
2019-12-02  2:47   ` Jason Wang
2019-12-09  0:24   ` Prashant Bhole
2019-11-26 10:07 ` [RFC net-next 08/18] tun: run offloaded XDP program in Tx path, Prashant Bhole
2019-12-01 16:39   ` David Ahern
2019-12-01 20:56   ` David Miller
2019-12-01 21:40   ` Michael S. Tsirkin
2019-12-01 21:54   ` David Miller
2019-12-02  2:56   ` Jason Wang
2019-12-02  2:45   ` Jason Wang
2019-11-26 10:07 ` [RFC net-next 09/18] tun: add a way to inject Tx path packet into Rx path, Prashant Bhole
2019-11-26 10:07 ` [RFC net-next 10/18] tun: handle XDP_TX action of offloaded program, Prashant Bhole
2019-11-26 10:07 ` [RFC net-next 11/18] tun: run xdp prog when tun is read from file interface, Prashant Bhole
2019-11-26 10:07 ` [RFC net-next 12/18] virtio-net: store xdp_prog in device, Prashant Bhole
2019-11-26 10:07 ` [RFC net-next 13/18] virtio_net: use XDP attachment helpers, Prashant Bhole
2019-11-26 10:07 ` [RFC net-next 14/18] virtio_net: add XDP prog offload infrastructure, Prashant Bhole
2019-11-26 10:07 ` [RFC net-next 15/18] virtio_net: implement XDP prog offload functionality, Prashant Bhole
2019-11-27 20:42   ` Michael S. Tsirkin
2019-11-28  2:53   ` Prashant Bhole
2019-11-26 10:07 ` [RFC net-next 16/18] bpf: export function __bpf_map_get, Prashant Bhole
2019-11-26 10:07 ` [RFC net-next 17/18] virtio_net: implment XDP map offload functionality, Prashant Bhole
2019-11-26 20:19   ` kbuild test robot
2019-11-26 10:07 ` [RFC net-next 18/18] virtio_net: restrict bpf helper calls from offloaded program, Prashant Bhole
2019-11-26 20:35 ` [RFC net-next 00/18] virtio_net XDP offload, Jakub Kicinski
2019-11-27  2:59   ` Jason Wang
2019-11-27 19:49   ` Jakub Kicinski
2019-11-28  3:41   ` Jason Wang
2019-11-27 20:32   ` Michael S. Tsirkin
2019-11-27 23:40   ` Jakub Kicinski [this message]
2019-12-02 15:29   ` Michael S. Tsirkin
2019-11-28  3:32 ` Alexei Starovoitov
2019-11-28  4:18   ` Jason Wang
2019-12-01 16:54 ` David Ahern
2019-12-02  2:48   ` Jason Wang