On Mon, 2021-06-28 at 12:23 +0800, Jason Wang wrote: > 在 2021/6/25 下午4:37, David Woodhouse 写道: > > On Fri, 2021-06-25 at 15:33 +0800, Jason Wang wrote: > > > 在 2021/6/24 下午8:30, David Woodhouse 写道: > > > > From: David Woodhouse > > > > > > > > When the underlying socket isn't configured with a virtio_net_hdr, the > > > > existing code in vhost_net_build_xdp() would attempt to validate > > > > uninitialised data, by copying zero bytes (sock_hlen) into the local > > > > copy of the header and then trying to validate that. > > > > > > > > Fixing it is somewhat non-trivial because the tun device might put a > > > > struct tun_pi*before* the virtio_net_hdr, which makes it hard to find. > > > > So just stop messing with someone else's data in vhost_net_build_xdp(), > > > > and let tap and tun validate it for themselves, as they do in the > > > > non-XDP case anyway. > > > > > > Thinking in another way. All XDP stuffs for vhost is prepared for TAP. > > > XDP is not expected to work for TUN. > > > > > > So we can simply let's vhost doesn't go with XDP path is the underlayer > > > socket is TUN. > > > > Actually, IFF_TUN mode per se isn't that complex. It's fixed purely on > > the tun side by that first patch I posted, which I later expanded a > > little to factor out tun_skb_set_protocol(). > > > > The next two patches in my original set were fixing up the fact that > > XDP currently assumes that the *socket* will be doing the vhdr, not > > vhost. Those two weren't tun-specific at all. > > > > It's supporting the PI header (which tun puts *before* the virtio > > header as I just said) which introduces a tiny bit more complexity. > > > This reminds me we need to fix tun_put_user_xdp(), Good point; thanks. > but as we've discussed, we need first figure out if PI is worth to > support for vhost-net. FWIW I certainly don't care about PI support. The only time anyone would want PI support is if they need to support protocols *other* than IPv6 and Legacy IP, over tun mode. I'm fixing this stuff because when I tried to use vhost-tun + tun for *sensible* use cases, I ended up having to flounder around trying to find a combination of settings that actually worked. And that offended me :) So I wrote a test case to iterate over various possible combinations of settings, and then kept typing until that all worked. The only thing I do feel quite strongly about is that stuff should either *work*, or *explicitly* fail if it's unsupported. At this point, although I have no actual use for it myself, I'd probably just about come down on the side of supporting PI. On the basis that: • I've basically made it work already. • It allows those code paths like tun_skb_set_protocol() to be consolidated as both calling code paths need the same thing. • Even in the kernel, and even when modules are as incestuously intertwined as vhost-net and tun already are, I'm a strong believer in *not* making assumptions about someone else's data, so letting *tun* handle its own headers without making those assumptions seems like the right thing to do. If we want to support PI, I need to go fix tun_put_user_xdp() as you noted (and work out how to add that to the test case). And resolve the fact that configuration might change after tun_get_socket() is called — and indeed that there might not *be* a configuration at all when tun_get_socket() is called. If we *don't* want to support PI, well, the *interesting* part of the above needs fixing anyway. Because I strongly believe we should *prevent* it if we don't support it, and we *still* have the case you point out of the tun vhdr_size being changed at runtime. I'll take a look at whether can pass the socklen back from tun to vhost-net on *every* packet. Is there a MSG_XXX flag we can abuse and somewhere in the msghdr that could return the header length used for *this* packet? Or could we make vhost_net_rx_peek_head_len() call explicitly into the tun device instead of making stuff up in peek_head_len()? To be clear: from the point of view of my *application* I don't care about any of this; my only motivation here is to clean up the kernel behaviour and make life easier for potential future users. I have found a setup that works in today's kernels (even though I have to disable XDP, and have to use a virtio header that I don't want), and will stick with that for now, if I actually commit it to my master branch at all: https://gitlab.com/openconnect/openconnect/-/commit/0da4fe43b886403e6 I might yet abandon it because I haven't *yet* seen it go any faster than the code which just does read()/write() on the tun device from userspace. And without XDP or zerocopy it's not clear that it could ever give me any benefit that I couldn't achieve purely in userspace by having a separate thread to do tun device I/O. But we'll see...