On Wed, 2021-06-23 at 12:02 +0800, Jason Wang wrote:
> On 2021/6/23 12:15 AM, David Woodhouse wrote:
> > From: David Woodhouse
> >
> > This creates a tun device and brings it up, then finds out the link-local
> > address the kernel automatically assigns to it.
> >
> > It sends a ping to that address, from a fake LL address of its own, and
> > then waits for a response.
> >
> > If the virtio_net_hdr stuff is all working correctly, it gets a response
> > and manages to understand it.
>
>
> I wonder whether it worth to bother the dependency like ipv6 or kernel
> networking stack.
>
> How about simply use packet socket that is bound to tun to receive and
> send packets?
>

I pondered that but figured that using the kernel's network stack
wasn't too much of an additional dependency. We *could* use an
AF_PACKET socket on the tun device and then drive both ends, but given
that the kernel *automatically* assigns a link-local address when we
bring the device up anyway, it seemed simple enough just to use ICMP.
I also happened to have the ICMP generation/checking code lying around
anyway in the same emacs instance, so it was reduced to a previously
solved problem.

We *should* eventually expand this test case to attach an AF_PACKET
device to the vhost-net, instead of using a tun device as the back end.
(Although I don't really see *why* vhost is limited to AF_PACKET. Why
*can't* I attach anything else, like an AF_UNIX socket, to vhost-net?)


> > +	/*
> > +	 * I just want to map the *whole* of userspace address space. But
> > +	 * from userspace I don't know what that is. On x86_64 it would be:
> > +	 *
> > +	 * vmem->regions[0].guest_phys_addr = 4096;
> > +	 * vmem->regions[0].memory_size = 0x7fffffffe000;
> > +	 * vmem->regions[0].userspace_addr = 4096;
> > +	 *
> > +	 * For now, just ensure we put everything inside a single BSS region.
> > +	 */
> > +	vmem->regions[0].guest_phys_addr = (uint64_t)&rings;
> > +	vmem->regions[0].userspace_addr = (uint64_t)&rings;
> > +	vmem->regions[0].memory_size = sizeof(rings);
>
>
> Instead of doing tricks like this, we can do it in another way:
>
> 1) enable device IOTLB
> 2) wait for the IOTLB miss request (iova, len) and update identity
>    mapping accordingly
>
> This should work for all the archs (with some performance hit).

Ick. For my actual application (OpenConnect) I'm either going to suck
it up and put in the arch-specific limits like in the comment above, or
I'll fix things to do the VHOST_F_IDENTITY_MAPPING thing we're talking
about elsewhere. (Probably the former, since if I'm requiring kernel
changes then I have grander plans around extending AF_TLS to do DTLS,
then hooking that directly up to the tun socket via BPF and a sockmap
without the data frames ever going to userspace at all.)

For this test case, a hard-coded single address range in BSS is fine.

I've now added !IFF_NO_PI support to the test case, but as noted it
fails just like the other ones I'd already marked with #if 0, which is
because vhost-net pulls some value for 'sock_hlen' out of its posterior
based on some assumption around the vhost features. And then expects
sock_recvmsg() to return precisely that number of bytes more than the
value it peeks in the skb at the head of the sock's queue.

I think I can fix *all* those test cases by making tun_get_socket()
take an extra 'int *' argument, and use that to return the *actual*
value of sock_hlen.

Here's the updated test case in the meantime:


From cf74e3fc80b8fd9df697a42cfc1ff3887de18f78 Mon Sep 17 00:00:00 2001
From: David Woodhouse
Date: Wed, 23 Jun 2021 16:38:56 +0100
Subject: [PATCH] test_vhost_net: add test cases with tun_pi header

These fail too, for the same reason as the previous tests were
guarded with #if 0: vhost-net pulls 'sock_hlen' out of its posterior and
just assumes it's 10 bytes.
And then barfs when a sock_recvmsg() doesn't return
precisely ten bytes more than it peeked in the head skb:

[1296757.531103] Discarded rx packet: len 78, expected 74

Signed-off-by: David Woodhouse
---
 .../testing/selftests/vhost/test_vhost_net.c  | 97 +++++++++++++------
 1 file changed, 65 insertions(+), 32 deletions(-)

diff --git a/tools/testing/selftests/vhost/test_vhost_net.c b/tools/testing/selftests/vhost/test_vhost_net.c
index fd4a2b0e42f0..734b3015a5bd 100644
--- a/tools/testing/selftests/vhost/test_vhost_net.c
+++ b/tools/testing/selftests/vhost/test_vhost_net.c
@@ -48,7 +48,7 @@ static unsigned char hexchar(char *hex)
 	return (hexnybble(hex[0]) << 4) | hexnybble(hex[1]);
 }

-int open_tun(int vnet_hdr_sz, struct in6_addr *addr)
+int open_tun(int vnet_hdr_sz, int pi, struct in6_addr *addr)
 {
 	int tun_fd = open("/dev/net/tun", O_RDWR);
 	if (tun_fd == -1)
@@ -56,7 +56,9 @@ int open_tun(int vnet_hdr_sz, struct in6_addr *addr)

 	struct ifreq ifr = { 0 };

-	ifr.ifr_flags = IFF_TUN | IFF_NO_PI;
+	ifr.ifr_flags = IFF_TUN;
+	if (!pi)
+		ifr.ifr_flags |= IFF_NO_PI;
 	if (vnet_hdr_sz)
 		ifr.ifr_flags |= IFF_VNET_HDR;

@@ -249,11 +251,18 @@ static inline uint16_t csum_finish(uint32_t sum)
 	return htons((uint16_t)(~sum));
 }

-static int create_icmp_echo(unsigned char *data, struct in6_addr *dst,
+static int create_icmp_echo(unsigned char *data, int pi, struct in6_addr *dst,
 			    struct in6_addr *src, uint16_t id, uint16_t seq)
 {
 	const int icmplen = ICMP_MINLEN + sizeof(ping_payload);
-	const int plen = sizeof(struct ip6_hdr) + icmplen;
+	int plen = sizeof(struct ip6_hdr) + icmplen;
+
+	if (pi) {
+		struct tun_pi *pi = (void *)data;
+		data += sizeof(*pi);
+		plen += sizeof(*pi);
+		pi->proto = htons(ETH_P_IPV6);
+	}

 	struct ip6_hdr *iph = (void *)data;
 	struct icmp6_hdr *icmph = (void *)(data + sizeof(*iph));
@@ -312,8 +321,21 @@ static int create_icmp_echo(unsigned char *data, struct in6_addr *dst,
 }


-static int check_icmp_response(unsigned char *data, uint32_t len, struct in6_addr *dst, struct in6_addr *src)
+static int check_icmp_response(unsigned char *data, uint32_t len, int pi,
+			       struct in6_addr *dst, struct in6_addr *src)
 {
+	if (pi) {
+		struct tun_pi *pi = (void *)data;
+		if (len < sizeof(*pi))
+			return 0;
+
+		if (pi->proto != htons(ETH_P_IPV6))
+			return 0;
+
+		data += sizeof(*pi);
+		len -= sizeof(*pi);
+	}
+
 	struct ip6_hdr *iph = (void *)data;
 	return ( len >= 41 && (ntohl(iph->ip6_flow) >> 28)==6 /* IPv6 header */
 		 && iph->ip6_nxt == IPPROTO_ICMPV6 /* IPv6 next header field = ICMPv6 */
@@ -337,7 +359,7 @@ static int check_icmp_response(unsigned char *data, uint32_t len, struct in6_add
 #endif


-int test_vhost(int vnet_hdr_sz, int xdp, uint64_t features)
+int test_vhost(int vnet_hdr_sz, int pi, int xdp, uint64_t features)
 {
 	int call_fd = eventfd(0, EFD_CLOEXEC|EFD_NONBLOCK);
 	int kick_fd = eventfd(0, EFD_CLOEXEC|EFD_NONBLOCK);
@@ -353,7 +375,7 @@ int test_vhost(int vnet_hdr_sz, int xdp, uint64_t features)
 	/* Pick up the link-local address that the kernel
 	 * assigns to the tun device. */
 	struct in6_addr tun_addr;
-	tun_fd = open_tun(vnet_hdr_sz, &tun_addr);
+	tun_fd = open_tun(vnet_hdr_sz, pi, &tun_addr);
 	if (tun_fd < 0)
 		goto err;

@@ -387,18 +409,18 @@ int test_vhost(int vnet_hdr_sz, int xdp, uint64_t features)
 	local_addr.s6_addr16[0] = htons(0xfe80);
 	local_addr.s6_addr16[7] = htons(1);

+
 	/* Set up RX and TX descriptors; the latter with ping packets ready to
 	 * send to the kernel, but don't actually send them yet.
 	 */
 	for (int i = 0; i < RING_SIZE; i++) {
 		struct pkt_buf *pkt = &rings[1].pkts[i];
-		int plen = create_icmp_echo(&pkt->data[vnet_hdr_sz], &tun_addr,
-					    &local_addr, 0x4747, i);
+		int plen = create_icmp_echo(&pkt->data[vnet_hdr_sz], pi,
+					    &tun_addr, &local_addr, 0x4747, i);
 		rings[1].desc[i].addr = vio64((uint64_t)pkt);
 		rings[1].desc[i].len = vio32(plen + vnet_hdr_sz);
 		rings[1].avail_ring[i] = vio16(i);

-
 		pkt = &rings[0].pkts[i];
 		rings[0].desc[i].addr = vio64((uint64_t)pkt);
 		rings[0].desc[i].len = vio32(sizeof(*pkt));
@@ -438,9 +460,10 @@ int test_vhost(int vnet_hdr_sz, int xdp, uint64_t features)
 			return -1;

 		if (check_icmp_response((void *)(addr + vnet_hdr_sz), len - vnet_hdr_sz,
-					&local_addr, &tun_addr)) {
+					pi, &local_addr, &tun_addr)) {
 			ret = 0;
-			printf("Success (%d %d %llx)\n", vnet_hdr_sz, xdp, (unsigned long long)features);
+			printf("Success (hdr %d, xdp %d, pi %d, features %llx)\n",
+			       vnet_hdr_sz, xdp, pi, (unsigned long long)features);
 			goto err;
 		}

@@ -466,51 +489,61 @@ int test_vhost(int vnet_hdr_sz, int xdp, uint64_t features)
 	return ret;
 }

-
-int main(void)
+/* Perform the given test with all four combinations of XDP/PI */
+int test_four(int vnet_hdr_sz, uint64_t features)
 {
-	int ret;
-
-	ret = test_vhost(0, 0, ((1ULL << VHOST_NET_F_VIRTIO_NET_HDR) |
-				(1ULL << VIRTIO_F_VERSION_1)));
+	int ret = test_vhost(vnet_hdr_sz, 0, 0, features);
 	if (ret && ret != KSFT_SKIP)
 		return ret;

-	ret = test_vhost(0, 1, ((1ULL << VHOST_NET_F_VIRTIO_NET_HDR) |
-				(1ULL << VIRTIO_F_VERSION_1)));
+	ret = test_vhost(vnet_hdr_sz, 0, 1, features);
 	if (ret && ret != KSFT_SKIP)
 		return ret;
-
-	ret = test_vhost(0, 0, ((1ULL << VHOST_NET_F_VIRTIO_NET_HDR)));
+#if 0 /* These don't work *either* for the same reason as the #if 0 later */
+	ret = test_vhost(vnet_hdr_sz, 1, 0, features);
 	if (ret && ret != KSFT_SKIP)
 		return ret;

-	ret = test_vhost(0, 1, ((1ULL << VHOST_NET_F_VIRTIO_NET_HDR)));
+	ret = test_vhost(vnet_hdr_sz, 1, 1, features);
 	if (ret && ret != KSFT_SKIP)
 		return ret;
+#endif
+}

-	ret = test_vhost(10, 0, 0);
-	if (ret && ret != KSFT_SKIP)
-		return ret;
+int main(void)
+{
+	int ret;

-	ret = test_vhost(10, 1, 0);
+	ret = test_four(10, 0);
 	if (ret && ret != KSFT_SKIP)
 		return ret;

-#if 0 /* These ones will fail */
-	ret = test_vhost(0, 0, 0);
+	ret = test_four(0, ((1ULL << VHOST_NET_F_VIRTIO_NET_HDR) |
+			    (1ULL << VIRTIO_F_VERSION_1)));
 	if (ret && ret != KSFT_SKIP)
 		return ret;

-	ret = test_vhost(0, 1, 0);
+	ret = test_four(0, ((1ULL << VHOST_NET_F_VIRTIO_NET_HDR)));
 	if (ret && ret != KSFT_SKIP)
 		return ret;

-	ret = test_vhost(12, 0, 0);
+
+#if 0
+	/*
+	 * These ones will fail, because right now vhost *assumes* that the
+	 * underlying (tun, etc.) socket will be doing a header of precisely
+	 * sizeof(struct virtio_net_hdr), if vhost isn't doing so itself due
+	 * to VHOST_NET_F_VIRTIO_NET_HDR.
+	 *
+	 * That assumption breaks both tun with no IFF_VNET_HDR, and also
+	 * presumably raw sockets. So leave these test cases disabled for
+	 * now until it's fixed.
+	 */
+	ret = test_four(0, 0);
 	if (ret && ret != KSFT_SKIP)
 		return ret;

-	ret = test_vhost(12, 1, 0);
+	ret = test_four(12, 0);
 	if (ret && ret != KSFT_SKIP)
 		return ret;
 #endif
-- 
2.31.1
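
For reference, the tun_pi handling that the patch adds to create_icmp_echo() and check_icmp_response() can be sketched in isolation as below. This is only a minimal standalone illustration, not part of the patch. The sketch_* names are hypothetical, and struct sketch_tun_pi and SKETCH_ETH_P_IPV6 are local stand-ins assumed to mirror struct tun_pi from <linux/if_tun.h> and ETH_P_IPV6 from <linux/if_ether.h>, so that it builds without the kernel headers. The header comes to four bytes, which would be consistent with the "len 78, expected 74" discrepancy in the log line if that came from a run with the PI header enabled.

```c
#include <arpa/inet.h>	/* htons() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: same value as ETH_P_IPV6 in <linux/if_ether.h>. */
#define SKETCH_ETH_P_IPV6 0x86DD

/* Assumption: same layout as struct tun_pi in <linux/if_tun.h>. */
struct sketch_tun_pi {
	uint16_t flags;
	uint16_t proto;		/* network byte order */
};

/*
 * Write a PI header announcing IPv6 at the start of buf.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
static size_t sketch_put_pi(unsigned char *buf, size_t buflen)
{
	struct sketch_tun_pi pi = {
		.flags = 0,
		.proto = htons(SKETCH_ETH_P_IPV6),
	};

	if (buflen < sizeof(pi))
		return 0;

	memcpy(buf, &pi, sizeof(pi));
	return sizeof(pi);
}

/*
 * Check that buf starts with a PI header announcing IPv6.
 * Returns the offset of the IP packet that follows, or 0 if the
 * buffer is too short or carries a different protocol.
 */
static size_t sketch_skip_pi(const unsigned char *buf, size_t len)
{
	struct sketch_tun_pi pi;

	if (len < sizeof(pi))
		return 0;

	/* memcpy avoids unaligned access through a cast pointer */
	memcpy(&pi, buf, sizeof(pi));
	if (pi.proto != htons(SKETCH_ETH_P_IPV6))
		return 0;

	return sizeof(pi);
}
```

A sender would call sketch_put_pi() and build the IPv6 packet at the returned offset; a receiver would call sketch_skip_pi() and parse the IPv6 header from the returned offset, treating 0 as "not for us".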