All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jason Wang <jasowang@redhat.com>
To: David Woodhouse <dwmw2@infradead.org>,
	netdev@vger.kernel.org, "Michael S. Tsirkin" <mst@redhat.com>
Cc: "Eugenio Pérez" <eperezma@redhat.com>,
	"Willem de Bruijn" <willemb@google.com>
Subject: Re: [PATCH v3 3/5] vhost_net: remove virtio_net_hdr validation, let tun/tap do it themselves
Date: Tue, 29 Jun 2021 11:21:24 +0800	[thread overview]
Message-ID: <2902f3b1-752b-e720-6662-24b2f580a716@redhat.com> (raw)
In-Reply-To: <72dfecd426d183615c0dd4c2e68690b0e95dd739.camel@infradead.org>


在 2021/6/28 下午7:23, David Woodhouse 写道:
> On Mon, 2021-06-28 at 12:23 +0800, Jason Wang wrote:
>> 在 2021/6/25 下午4:37, David Woodhouse 写道:
>>> On Fri, 2021-06-25 at 15:33 +0800, Jason Wang wrote:
>>>> 在 2021/6/24 下午8:30, David Woodhouse 写道:
>>>>> From: David Woodhouse<dwmw@amazon.co.uk>
>>>>>
>>>>> When the underlying socket isn't configured with a virtio_net_hdr, the
>>>>> existing code in vhost_net_build_xdp() would attempt to validate
>>>>> uninitialised data, by copying zero bytes (sock_hlen) into the local
>>>>> copy of the header and then trying to validate that.
>>>>>
>>>>> Fixing it is somewhat non-trivial because the tun device might put a
>>>>> struct tun_pi*before*  the virtio_net_hdr, which makes it hard to find.
>>>>> So just stop messing with someone else's data in vhost_net_build_xdp(),
>>>>> and let tap and tun validate it for themselves, as they do in the
>>>>> non-XDP case anyway.
>>>> Thinking in another way. All XDP stuffs for vhost is prepared for TAP.
>>>> XDP is not expected to work for TUN.
>>>>
>>>> So we can simply let's vhost doesn't go with XDP path is the underlayer
>>>> socket is TUN.
>>> Actually, IFF_TUN mode per se isn't that complex. It's fixed purely on
>>> the tun side by that first patch I posted, which I later expanded a
>>> little to factor out tun_skb_set_protocol().
>>>
>>> The next two patches in my original set were fixing up the fact that
>>> XDP currently assumes that the *socket* will be doing the vhdr, not
>>> vhost. Those two weren't tun-specific at all.
>>>
>>> It's supporting the PI header (which tun puts *before* the virtio
>>> header as I just said) which introduces a tiny bit more complexity.
>>
>> This reminds me we need to fix tun_put_user_xdp(),
> Good point; thanks.
>
>> but as we've discussed, we need first figure out if PI is worth to
>> support for vhost-net.
> FWIW I certainly don't care about PI support. The only time anyone
> would want PI support is if they need to support protocols *other* than
> IPv6 and Legacy IP, over tun mode.
>
> I'm fixing this stuff because when I tried to use vhost-tun + tun for
> *sensible* use cases, I ended up having to flounder around trying to
> find a combination of settings that actually worked. And that offended
> me :)
>
> So I wrote a test case to iterate over various possible combinations of
> settings, and then kept typing until that all worked.
>
> The only thing I do feel quite strongly about is that stuff should
> either *work*, or *explicitly* fail if it's unsupported.


I fully agree, but I suspect this may only work when we invent something 
new, otherwise I'm not sure if it's too late to fix where it may break 
the existing application.


>
> At this point, although I have no actual use for it myself, I'd
> probably just about come down on the side of supporting PI. On the
> basis that:
>
>   • I've basically made it work already.
>
>   • It allows those code paths like tun_skb_set_protocol() to be
>     consolidated as both calling code paths need the same thing.
>
>   • Even in the kernel, and even when modules are as incestuously
>     intertwined as vhost-net and tun already are, I'm a strong
>     believer in *not* making assumptions about someone else's data,
>     so letting *tun* handle its own headers without making those
>     assumptions seems like the right thing to do.
>
>
>
> If we want to support PI, I need to go fix tun_put_user_xdp() as you
> noted (and work out how to add that to the test case). And resolve the
> fact that configuration might change after tun_get_socket() is called —
> and indeed that there might not *be* a configuration at all when
> tun_get_socket() is called.


Yes, but I tend to leave the code as is PI part consider no one is 
interested in that. (vhost_net + PI).


>
>
> If we *don't* want to support PI, well, the *interesting* part of the
> above needs fixing anyway. Because I strongly believe we should
> *prevent* it if we don't support it, and we *still* have the case you
> point out of the tun vhdr_size being changed at runtime.


As discussed in another thread, it looks me to it's sufficient to have 
some statics counters/API in vhost_net. Or simply use msg_control to 
reuse tx_errors of TUN/TAP or macvtap.


>
> I'll take a look at whether can pass the socklen back from tun to
> vhost-net on *every* packet. Is there a MSG_XXX flag we can abuse and
> somewhere in the msghdr that could return the header length used for
> *this* packet?


msg_control is probably the best place to do this.


>   Or could we make vhost_net_rx_peek_head_len() call
> explicitly into the tun device instead of making stuff up in
> peek_head_len()?


They're working at skb/xdp level which is unaware of PI stuffs.

But again, I think it should be much more cheaper to just add error 
reporting in this case. And it should be sufficient.


>
>
> To be clear: from the point of view of my *application* I don't care
> about any of this; my only motivation here is to clean up the kernel
> behaviour and make life easier for potential future users.


Yes, thanks a lot for having a look at this.

Though I'm not quite sure vhost_net is designed to work on those setups 
but let's ask for Michael (author of vhost/net) for his idea:

Michael, do you think it's worth to support

1) vhost_net + TUN
2) vhost_net + PI

?


> I have found
> a setup that works in today's kernels (even though I have to disable
> XDP, and have to use a virtio header that I don't want), and will stick
> with that for now, if I actually commit it to my master branch at all:
> https://gitlab.com/openconnect/openconnect/-/commit/0da4fe43b886403e6


Yes, but unfortunately it needs some tricks for avoid hitting bugs in 
the kernel.


>
> I might yet abandon it because I haven't *yet* seen it go any faster
> than the code which just does read()/write() on the tun device from
> userspace. And without XDP or zerocopy it's not clear that it could
> ever give me any benefit that I couldn't achieve purely in userspace by
> having a separate thread to do tun device I/O. But we'll see...


Ok.

Thanks.


  parent reply	other threads:[~2021-06-29  3:21 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-19 13:33 [PATCH] net: tun: fix tun_xdp_one() for IFF_TUN mode David Woodhouse
2021-06-21  7:00 ` Jason Wang
2021-06-21 10:52   ` David Woodhouse
2021-06-21 14:50     ` David Woodhouse
2021-06-21 20:43       ` David Woodhouse
2021-06-22  4:52         ` Jason Wang
2021-06-22  7:24           ` David Woodhouse
2021-06-22  7:51             ` Jason Wang
2021-06-22  8:10               ` David Woodhouse
2021-06-22 11:36               ` David Woodhouse
2021-06-22  4:34       ` Jason Wang
2021-06-22  4:34     ` Jason Wang
2021-06-22  7:28       ` David Woodhouse
2021-06-22  8:00         ` Jason Wang
2021-06-22  8:29           ` David Woodhouse
2021-06-23  3:39             ` Jason Wang
2021-06-24 12:39               ` David Woodhouse
2021-06-22 16:15 ` [PATCH v2 1/4] " David Woodhouse
2021-06-22 16:15   ` [PATCH v2 2/4] net: tun: don't assume IFF_VNET_HDR in tun_xdp_one() tx path David Woodhouse
2021-06-23  3:46     ` Jason Wang
2021-06-22 16:15   ` [PATCH v2 3/4] vhost_net: validate virtio_net_hdr only if it exists David Woodhouse
2021-06-23  3:48     ` Jason Wang
2021-06-22 16:15   ` [PATCH v2 4/4] vhost_net: Add self test with tun device David Woodhouse
2021-06-23  4:02     ` Jason Wang
2021-06-23 16:12       ` David Woodhouse
2021-06-24  6:12         ` Jason Wang
2021-06-24 10:42           ` David Woodhouse
2021-06-25  2:55             ` Jason Wang
2021-06-25  7:54               ` David Woodhouse
2021-06-23  3:45   ` [PATCH v2 1/4] net: tun: fix tun_xdp_one() for IFF_TUN mode Jason Wang
2021-06-23  8:30     ` David Woodhouse
2021-06-23 13:52     ` David Woodhouse
2021-06-23 17:31       ` David Woodhouse
2021-06-23 22:52         ` David Woodhouse
2021-06-24  6:37           ` Jason Wang
2021-06-24  7:23             ` David Woodhouse
2021-06-24  6:18       ` Jason Wang
2021-06-24  7:05         ` David Woodhouse
2021-06-24 12:30 ` [PATCH v3 1/5] net: add header len parameter to tun_get_socket(), tap_get_socket() David Woodhouse
2021-06-24 12:30   ` [PATCH v3 2/5] net: tun: don't assume IFF_VNET_HDR in tun_xdp_one() tx path David Woodhouse
2021-06-25  6:58     ` Jason Wang
2021-06-24 12:30   ` [PATCH v3 3/5] vhost_net: remove virtio_net_hdr validation, let tun/tap do it themselves David Woodhouse
2021-06-25  7:33     ` Jason Wang
2021-06-25  8:37       ` David Woodhouse
2021-06-28  4:23         ` Jason Wang
2021-06-28 11:23           ` David Woodhouse
2021-06-28 23:29             ` David Woodhouse
2021-06-29  3:43               ` Jason Wang
2021-06-29  6:59                 ` David Woodhouse
2021-06-29 10:49                 ` David Woodhouse
2021-06-29 13:15                   ` David Woodhouse
2021-06-30  4:39                   ` Jason Wang
2021-06-30 10:02                     ` David Woodhouse
2021-07-01  4:13                       ` Jason Wang
2021-07-01 17:39                         ` David Woodhouse
2021-07-02  3:13                           ` Jason Wang
2021-07-02  8:08                             ` David Woodhouse
2021-07-02  8:50                               ` Jason Wang
2021-07-09 15:04                               ` Eugenio Perez Martin
2021-06-29  3:21             ` Jason Wang [this message]
2021-06-24 12:30   ` [PATCH v3 4/5] net: tun: fix tun_xdp_one() for IFF_TUN mode David Woodhouse
2021-06-25  7:41     ` Jason Wang
2021-06-25  8:51       ` David Woodhouse
2021-06-28  4:27         ` Jason Wang
2021-06-28 10:43           ` David Woodhouse
2021-06-25 18:43     ` Willem de Bruijn
2021-06-25 19:00       ` David Woodhouse
2021-06-24 12:30   ` [PATCH v3 5/5] vhost_net: Add self test with tun device David Woodhouse
2021-06-25  5:00   ` [PATCH v3 1/5] net: add header len parameter to tun_get_socket(), tap_get_socket() Jason Wang
2021-06-25  8:23     ` David Woodhouse
2021-06-28  4:22       ` Jason Wang
2021-06-25 18:13   ` Willem de Bruijn
2021-06-25 18:55     ` David Woodhouse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2902f3b1-752b-e720-6662-24b2f580a716@redhat.com \
    --to=jasowang@redhat.com \
    --cc=dwmw2@infradead.org \
    --cc=eperezma@redhat.com \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=willemb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.