On Thu, 2021-06-24 at 14:37 +0800, Jason Wang wrote:
> On 2021/6/24 6:52 AM, David Woodhouse wrote:
> > On Wed, 2021-06-23 at 18:31 +0100, David Woodhouse wrote:
> > > Joy... that's wrong because when tun does both the PI and the vnet
> > > headers, the PI header comes *first*. When tun does only PI and vhost
> > > does the vnet headers, they come in the other order.
> > >
> > > Will fix (and adjust the test cases to cope).
> >
> > I got this far, pushed to
> > https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/vhost-net
> >
> > All the test cases are now passing. I don't guarantee I haven't
> > actually broken qemu and IFF_TAP mode though, mind you :)
>
>
> No problem, but it would be easier for me if you can post another
> version of the series.

Ack; I'm reworking it now into a saner series. All three of my initial
simple fixes ended up with more changes once I expanded the test cases
to cover more permutations of PI/XDP/headers :)

> > As discussed, I expanded tun_get_socket()/tap_get_socket() to return
> > the actual header length instead of letting vhost make wild guesses.
>
>
> This probably won't work since we had TUNSETVNETHDRSZ.

Or indeed IFF_NO_PI.

> I agree the vhost codes is tricky since it assumes only two kinds of the
> hdr length.
>
> But it was basically how it works for the past 10 years. It depends on
> the userspace (Qemu) to coordinate it with the TUN/TAP through
> TUNSETVNETHDRSZ during the feature negotiation.

I think that in any given situation, the kernel should either work
correctly, or gracefully refuse to set it up. My patch set will make it
work correctly for all the permutations I've looked at.

I would accept an answer of "screw that, just make tun_get_socket()
return failure if IFF_NO_PI isn't set", for example.

> > Note that in doing so, I have made tun_get_socket() return -ENOTCONN if
> > the tun fd *isn't* actually attached (TUNSETIFF) to a real device yet.
>
> Any reason for doing this?
> Note that the socket is loosely coupled with
> the networking device.

Because to determine the sock_hlen to return, it needs to look at the
tun->flags and tun->vnet_hdr_sz fields. And if there isn't an actual tun
device attached, it can't.

> >
> >
> > I moved the sanity check back to tun/tap instead of doing it in
> > vhost_net_build_xdp(), because the latter has no clue about the tun PI
> > header and doesn't know *where* the virtio header is.
>
>
> Right, the deserves a separate patch.

Yep, in my tree it has one, but it's a bit mixed in with other fixes
until I do that refactoring.

> > diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h
> > index 2a7660843444..8d78b6bbc228 100644
> > --- a/include/linux/if_tun.h
> > +++ b/include/linux/if_tun.h
> > @@ -21,11 +21,10 @@ struct tun_msg_ctl {
> >
> >   struct tun_xdp_hdr {
> >   	int buflen;
> > -	struct virtio_net_hdr gso;
>
>
> Any reason for doing this? I meant it can work but we need limit the
> changes that is unrelated to the fixes.

That's part of the patch that moves the sanity check back to tun/tap.
As I said, it needs a little reworking, so it currently contains a
little bit of cleanup to previous code in tun_xdp_one(), but it looks
like this. The bit in drivers/vhost/net.c is obviously removing code
that I'd made conditional in a previous patch, so that will change
somewhat as I rework the series and drop the original patch.

From 2a0080f37244ec6dac8fb3e8330f9153a4373cfd Mon Sep 17 00:00:00 2001
From: David Woodhouse
Date: Wed, 23 Jun 2021 23:32:00 +0100
Subject: [PATCH 10/10] net: remove virtio_net_hdr from struct tun_xdp_hdr

The tun device puts its struct tun_pi *before* the virtio_net_hdr,
which significantly complicates letting vhost validate it. Just let tap
and tun validate it for themselves, as they do in the non-XDP case
anyway.

Signed-off-by: David Woodhouse
---
 drivers/net/tap.c      | 25 ++++++++++++++++++++++---
 drivers/net/tun.c      | 34 ++++++++++++++++++++++++----------
 drivers/vhost/net.c    | 15 +--------------
 include/linux/if_tun.h |  1 -
 4 files changed, 47 insertions(+), 28 deletions(-)

diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 2170a0d3d34c..d1b1f1de374e 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -1132,16 +1132,35 @@ static const struct file_operations tap_fops = {
 static int tap_get_user_xdp(struct tap_queue *q, struct xdp_buff *xdp)
 {
 	struct tun_xdp_hdr *hdr = xdp->data_hard_start;
-	struct virtio_net_hdr *gso = &hdr->gso;
+	struct virtio_net_hdr *gso = NULL;
 	int buflen = hdr->buflen;
 	int vnet_hdr_len = 0;
 	struct tap_dev *tap;
 	struct sk_buff *skb;
 	int err, depth;
 
-	if (q->flags & IFF_VNET_HDR)
+	if (q->flags & IFF_VNET_HDR) {
 		vnet_hdr_len = READ_ONCE(q->vnet_hdr_sz);
+		if (xdp->data != xdp->data_hard_start + sizeof(*hdr) + vnet_hdr_len) {
+			err = -EINVAL;
+			goto err;
+		}
+
+		gso = (void *)&hdr[1];
+		if ((gso->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) &&
+		    tap16_to_cpu(q, gso->csum_start) +
+		    tap16_to_cpu(q, gso->csum_offset) + 2 >
+		    tap16_to_cpu(q, gso->hdr_len))
+			gso->hdr_len = cpu_to_tap16(q,
+				tap16_to_cpu(q, gso->csum_start) +
+				tap16_to_cpu(q, gso->csum_offset) + 2);
+
+		if (tap16_to_cpu(q, gso->hdr_len) > xdp->data_end - xdp->data) {
+			err = -EINVAL;
+			goto err;
+		}
+	}
 
 	skb = build_skb(xdp->data_hard_start, buflen);
 	if (!skb) {
 		err = -ENOMEM;
@@ -1155,7 +1174,7 @@ static int tap_get_user_xdp(struct tap_queue *q, struct xdp_buff *xdp)
 	skb_reset_mac_header(skb);
 	skb->protocol = eth_hdr(skb)->h_proto;
 
-	if (vnet_hdr_len) {
+	if (gso) {
 		err = virtio_net_hdr_to_skb(skb, gso, tap_is_little_endian(q));
 		if (err)
 			goto err_kfree;
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 69f6ce87b109..72f8a04f493b 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2337,29 +2337,43 @@ static int tun_xdp_one(struct tun_struct *tun,
 {
 	unsigned int datasize = xdp->data_end - xdp->data;
 	struct tun_xdp_hdr *hdr = xdp->data_hard_start;
+	void *tun_hdr = &hdr[1];
 	struct virtio_net_hdr *gso = NULL;
 	struct bpf_prog *xdp_prog;
 	struct sk_buff *skb = NULL;
 	__be16 proto = 0;
 	u32 rxhash = 0, act;
 	int buflen = hdr->buflen;
-	int reservelen = xdp->data - xdp->data_hard_start;
 	int err = 0;
 	bool skb_xdp = false;
 	struct page *page;
 
-	if (tun->flags & IFF_VNET_HDR)
-		gso = &hdr->gso;
-
 	if (!(tun->flags & IFF_NO_PI)) {
-		struct tun_pi *pi = xdp->data;
-		if (datasize < sizeof(*pi)) {
+		struct tun_pi *pi = tun_hdr;
+		tun_hdr += sizeof(*pi);
+
+		if (tun_hdr > xdp->data) {
 			atomic_long_inc(&tun->rx_frame_errors);
-			return -EINVAL;
+			return -EINVAL;
 		}
 		proto = pi->proto;
-		reservelen += sizeof(*pi);
-		datasize -= sizeof(*pi);
+	}
+
+	if (tun->flags & IFF_VNET_HDR) {
+		gso = tun_hdr;
+		tun_hdr += sizeof(*gso);
+
+		if (tun_hdr > xdp->data) {
+			atomic_long_inc(&tun->rx_frame_errors);
+			return -EINVAL;
+		}
+
+		if ((gso->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) &&
+		    tun16_to_cpu(tun, gso->csum_start) + tun16_to_cpu(tun, gso->csum_offset) + 2 > tun16_to_cpu(tun, gso->hdr_len))
+			gso->hdr_len = cpu_to_tun16(tun, tun16_to_cpu(tun, gso->csum_start) + tun16_to_cpu(tun, gso->csum_offset) + 2);
+
+		if (tun16_to_cpu(tun, gso->hdr_len) > datasize)
+			return -EINVAL;
 	}
 
 	xdp_prog = rcu_dereference(tun->xdp_prog);
@@ -2407,7 +2421,7 @@ static int tun_xdp_one(struct tun_struct *tun,
 		goto out;
 	}
 
-	skb_reserve(skb, reservelen);
+	skb_reserve(skb, xdp->data - xdp->data_hard_start);
 	skb_put(skb, datasize);
 
 	if (gso && virtio_net_hdr_to_skb(skb, gso, tun_is_little_endian(tun))) {
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index e88cc18d079f..d9491c620a9c 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -716,26 +716,13 @@ static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
 	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
 	hdr = buf;
 	if (sock_hlen) {
-		struct virtio_net_hdr *gso = &hdr->gso;
-
 		copied = copy_page_from_iter(alloc_frag->page,
 					     alloc_frag->offset +
-					     offsetof(struct tun_xdp_hdr, gso),
+					     sizeof(struct tun_xdp_hdr),
 					     sock_hlen, from);
 		if (copied != sock_hlen)
 			return -EFAULT;
 
-		if ((gso->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) &&
-		    vhost16_to_cpu(vq, gso->csum_start) +
-		    vhost16_to_cpu(vq, gso->csum_offset) + 2 >
-		    vhost16_to_cpu(vq, gso->hdr_len)) {
-			gso->hdr_len = cpu_to_vhost16(vq,
-				       vhost16_to_cpu(vq, gso->csum_start) +
-				       vhost16_to_cpu(vq, gso->csum_offset) + 2);
-
-			if (vhost16_to_cpu(vq, gso->hdr_len) > len)
-				return -EINVAL;
-		}
 		len -= sock_hlen;
 	}
 
diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h
index 8a7debd3f663..8d78b6bbc228 100644
--- a/include/linux/if_tun.h
+++ b/include/linux/if_tun.h
@@ -21,7 +21,6 @@ struct tun_msg_ctl {
 
 struct tun_xdp_hdr {
 	int buflen;
-	struct virtio_net_hdr gso;
 };
 
 #if defined(CONFIG_TUN) || defined(CONFIG_TUN_MODULE)
-- 
2.17.1
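
For anyone following the header-ordering argument without the kernel
tree to hand, here is a rough userspace sketch of the two checks the
tun_xdp_one() hunk performs. It is not kernel code: every sketch_* name
is made up for illustration, offsets are plain integers relative to
data_hard_start rather than pointers, and it assumes the default
10-byte virtio_net_hdr rather than whatever TUNSETVNETHDRSZ may have
configured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sizes of struct tun_pi and the basic struct virtio_net_hdr. */
enum { SKETCH_PI_LEN = 4, SKETCH_VNET_HDR_LEN = 10 };

/* Hypothetical result type; offsets are relative to data_hard_start. */
struct sketch_layout {
	size_t pi_off;   /* meaningful only when has_pi */
	size_t vnet_off; /* meaningful only when has_vnet */
	size_t hdr_end;  /* first byte after the last header */
};

/*
 * Walk the headers in the order tun uses when it owns both: PI first,
 * then the vnet header. Fail if a header would run past data_off,
 * which mirrors the "tun_hdr > xdp->data" checks in the patch.
 */
static bool sketch_locate_headers(bool has_pi, bool has_vnet,
				  size_t hdr_start, size_t data_off,
				  struct sketch_layout *out)
{
	size_t cur = hdr_start;

	assert(out != NULL);
	out->pi_off = out->vnet_off = out->hdr_end = 0;

	if (has_pi) {
		out->pi_off = cur;
		cur += SKETCH_PI_LEN;
		if (cur > data_off)
			return false;
	}
	if (has_vnet) {
		out->vnet_off = cur;
		cur += SKETCH_VNET_HDR_LEN;
		if (cur > data_off)
			return false;
	}
	out->hdr_end = cur;
	return true;
}

/*
 * With NEEDS_CSUM set, hdr_len must cover the 2-byte checksum field at
 * csum_start + csum_offset; raise it if it does not. Values are taken
 * in host byte order here, so the tun16_to_cpu() step is skipped.
 */
static uint32_t sketch_fixup_hdr_len(bool needs_csum, uint16_t csum_start,
				     uint16_t csum_offset, uint16_t hdr_len)
{
	uint32_t needed = (uint32_t)csum_start + csum_offset + 2;

	if (needs_csum && needed > hdr_len)
		return needed;
	return hdr_len;
}
```

Passing hdr_start = 4 corresponds to sizeof(struct tun_xdp_hdr) once the
gso member is gone, assuming a 4-byte int. The caller would still have
to reject the packet when the fixed-up hdr_len exceeds the data length,
as the patch does.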