All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Woodhouse <dwmw2@infradead.org>
To: Jason Wang <jasowang@redhat.com>, netdev@vger.kernel.org
Cc: "Eugenio Pérez" <eperezma@redhat.com>,
	"Willem de Bruijn" <willemb@google.com>
Subject: Re: [PATCH v3 3/5] vhost_net: remove virtio_net_hdr validation, let tun/tap do it themselves
Date: Mon, 28 Jun 2021 12:23:21 +0100	[thread overview]
Message-ID: <72dfecd426d183615c0dd4c2e68690b0e95dd739.camel@infradead.org> (raw)
In-Reply-To: <1c6110d9-2a45-f766-9d9a-e2996c14b748@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 5010 bytes --]

On Mon, 2021-06-28 at 12:23 +0800, Jason Wang wrote:
> 在 2021/6/25 下午4:37, David Woodhouse 写道:
> > On Fri, 2021-06-25 at 15:33 +0800, Jason Wang wrote:
> > > 在 2021/6/24 下午8:30, David Woodhouse 写道:
> > > > From: David Woodhouse<dwmw@amazon.co.uk>
> > > > 
> > > > When the underlying socket isn't configured with a virtio_net_hdr, the
> > > > existing code in vhost_net_build_xdp() would attempt to validate
> > > > uninitialised data, by copying zero bytes (sock_hlen) into the local
> > > > copy of the header and then trying to validate that.
> > > > 
> > > > Fixing it is somewhat non-trivial because the tun device might put a
> > > > struct tun_pi*before*  the virtio_net_hdr, which makes it hard to find.
> > > > So just stop messing with someone else's data in vhost_net_build_xdp(),
> > > > and let tap and tun validate it for themselves, as they do in the
> > > > non-XDP case anyway.
> > > 
> > > Thinking in another way. All XDP stuffs for vhost is prepared for TAP.
> > > XDP is not expected to work for TUN.
> > > 
> > > So we can simply let's vhost doesn't go with XDP path is the underlayer
> > > socket is TUN.
> > 
> > Actually, IFF_TUN mode per se isn't that complex. It's fixed purely on
> > the tun side by that first patch I posted, which I later expanded a
> > little to factor out tun_skb_set_protocol().
> > 
> > The next two patches in my original set were fixing up the fact that
> > XDP currently assumes that the *socket* will be doing the vhdr, not
> > vhost. Those two weren't tun-specific at all.
> > 
> > It's supporting the PI header (which tun puts *before* the virtio
> > header as I just said) which introduces a tiny bit more complexity.
> 
> 
> This reminds me we need to fix tun_put_user_xdp(),

Good point; thanks.

> but as we've discussed, we need first figure out if PI is worth to
> support for vhost-net.

FWIW I certainly don't care about PI support. The only time anyone
would want PI support is if they need to support protocols *other* than
IPv6 and Legacy IP, over tun mode.

I'm fixing this stuff because when I tried to use vhost-tun + tun for
*sensible* use cases, I ended up having to flounder around trying to
find a combination of settings that actually worked. And that offended
me :)

So I wrote a test case to iterate over various possible combinations of
settings, and then kept typing until that all worked.

The only thing I do feel quite strongly about is that stuff should
either *work*, or *explicitly* fail if it's unsupported.

At this point, although I have no actual use for it myself, I'd
probably just about come down on the side of supporting PI. On the
basis that:

 • I've basically made it work already.

 • It allows those code paths like tun_skb_set_protocol() to be
   consolidated as both calling code paths need the same thing.

 • Even in the kernel, and even when modules are as incestuously
   intertwined as vhost-net and tun already are, I'm a strong
   believer in *not* making assumptions about someone else's data,
   so letting *tun* handle its own headers without making those
   assumptions seems like the right thing to do.



If we want to support PI, I need to go fix tun_put_user_xdp() as you
noted (and work out how to add that to the test case). And resolve the
fact that configuration might change after tun_get_socket() is called —
and indeed that there might not *be* a configuration at all when
tun_get_socket() is called.


If we *don't* want to support PI, well, the *interesting* part of the
above needs fixing anyway. Because I strongly believe we should
*prevent* it if we don't support it, and we *still* have the case you
point out of the tun vhdr_size being changed at runtime.

I'll take a look at whether can pass the socklen back from tun to
vhost-net on *every* packet. Is there a MSG_XXX flag we can abuse and
somewhere in the msghdr that could return the header length used for
*this* packet? Or could we make vhost_net_rx_peek_head_len() call
explicitly into the tun device instead of making stuff up in
peek_head_len()? 


To be clear: from the point of view of my *application* I don't care
about any of this; my only motivation here is to clean up the kernel
behaviour and make life easier for potential future users. I have found
a setup that works in today's kernels (even though I have to disable
XDP, and have to use a virtio header that I don't want), and will stick
with that for now, if I actually commit it to my master branch at all:
https://gitlab.com/openconnect/openconnect/-/commit/0da4fe43b886403e6

I might yet abandon it because I haven't *yet* seen it go any faster
than the code which just does read()/write() on the tun device from
userspace. And without XDP or zerocopy it's not clear that it could
ever give me any benefit that I couldn't achieve purely in userspace by
having a separate thread to do tun device I/O. But we'll see...

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

  reply	other threads:[~2021-06-28 11:23 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-19 13:33 [PATCH] net: tun: fix tun_xdp_one() for IFF_TUN mode David Woodhouse
2021-06-21  7:00 ` Jason Wang
2021-06-21 10:52   ` David Woodhouse
2021-06-21 14:50     ` David Woodhouse
2021-06-21 20:43       ` David Woodhouse
2021-06-22  4:52         ` Jason Wang
2021-06-22  7:24           ` David Woodhouse
2021-06-22  7:51             ` Jason Wang
2021-06-22  8:10               ` David Woodhouse
2021-06-22 11:36               ` David Woodhouse
2021-06-22  4:34       ` Jason Wang
2021-06-22  4:34     ` Jason Wang
2021-06-22  7:28       ` David Woodhouse
2021-06-22  8:00         ` Jason Wang
2021-06-22  8:29           ` David Woodhouse
2021-06-23  3:39             ` Jason Wang
2021-06-24 12:39               ` David Woodhouse
2021-06-22 16:15 ` [PATCH v2 1/4] " David Woodhouse
2021-06-22 16:15   ` [PATCH v2 2/4] net: tun: don't assume IFF_VNET_HDR in tun_xdp_one() tx path David Woodhouse
2021-06-23  3:46     ` Jason Wang
2021-06-22 16:15   ` [PATCH v2 3/4] vhost_net: validate virtio_net_hdr only if it exists David Woodhouse
2021-06-23  3:48     ` Jason Wang
2021-06-22 16:15   ` [PATCH v2 4/4] vhost_net: Add self test with tun device David Woodhouse
2021-06-23  4:02     ` Jason Wang
2021-06-23 16:12       ` David Woodhouse
2021-06-24  6:12         ` Jason Wang
2021-06-24 10:42           ` David Woodhouse
2021-06-25  2:55             ` Jason Wang
2021-06-25  7:54               ` David Woodhouse
2021-06-23  3:45   ` [PATCH v2 1/4] net: tun: fix tun_xdp_one() for IFF_TUN mode Jason Wang
2021-06-23  8:30     ` David Woodhouse
2021-06-23 13:52     ` David Woodhouse
2021-06-23 17:31       ` David Woodhouse
2021-06-23 22:52         ` David Woodhouse
2021-06-24  6:37           ` Jason Wang
2021-06-24  7:23             ` David Woodhouse
2021-06-24  6:18       ` Jason Wang
2021-06-24  7:05         ` David Woodhouse
2021-06-24 12:30 ` [PATCH v3 1/5] net: add header len parameter to tun_get_socket(), tap_get_socket() David Woodhouse
2021-06-24 12:30   ` [PATCH v3 2/5] net: tun: don't assume IFF_VNET_HDR in tun_xdp_one() tx path David Woodhouse
2021-06-25  6:58     ` Jason Wang
2021-06-24 12:30   ` [PATCH v3 3/5] vhost_net: remove virtio_net_hdr validation, let tun/tap do it themselves David Woodhouse
2021-06-25  7:33     ` Jason Wang
2021-06-25  8:37       ` David Woodhouse
2021-06-28  4:23         ` Jason Wang
2021-06-28 11:23           ` David Woodhouse [this message]
2021-06-28 23:29             ` David Woodhouse
2021-06-29  3:43               ` Jason Wang
2021-06-29  6:59                 ` David Woodhouse
2021-06-29 10:49                 ` David Woodhouse
2021-06-29 13:15                   ` David Woodhouse
2021-06-30  4:39                   ` Jason Wang
2021-06-30 10:02                     ` David Woodhouse
2021-07-01  4:13                       ` Jason Wang
2021-07-01 17:39                         ` David Woodhouse
2021-07-02  3:13                           ` Jason Wang
2021-07-02  8:08                             ` David Woodhouse
2021-07-02  8:50                               ` Jason Wang
2021-07-09 15:04                               ` Eugenio Perez Martin
2021-06-29  3:21             ` Jason Wang
2021-06-24 12:30   ` [PATCH v3 4/5] net: tun: fix tun_xdp_one() for IFF_TUN mode David Woodhouse
2021-06-25  7:41     ` Jason Wang
2021-06-25  8:51       ` David Woodhouse
2021-06-28  4:27         ` Jason Wang
2021-06-28 10:43           ` David Woodhouse
2021-06-25 18:43     ` Willem de Bruijn
2021-06-25 19:00       ` David Woodhouse
2021-06-24 12:30   ` [PATCH v3 5/5] vhost_net: Add self test with tun device David Woodhouse
2021-06-25  5:00   ` [PATCH v3 1/5] net: add header len parameter to tun_get_socket(), tap_get_socket() Jason Wang
2021-06-25  8:23     ` David Woodhouse
2021-06-28  4:22       ` Jason Wang
2021-06-25 18:13   ` Willem de Bruijn
2021-06-25 18:55     ` David Woodhouse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=72dfecd426d183615c0dd4c2e68690b0e95dd739.camel@infradead.org \
    --to=dwmw2@infradead.org \
    --cc=eperezma@redhat.com \
    --cc=jasowang@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=willemb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.