All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Woodhouse <dwmw2@infradead.org>
To: Jason Wang <jasowang@redhat.com>, netdev@vger.kernel.org
Cc: "Eugenio Pérez" <eperezma@redhat.com>
Subject: Re: [PATCH v2 4/4] vhost_net: Add self test with tun device
Date: Wed, 23 Jun 2021 17:12:16 +0100	[thread overview]
Message-ID: <d6ad85649fd56d3e12e59085836326a09885593b.camel@infradead.org> (raw)
In-Reply-To: <85e55d53-4ef2-0b61-234e-4b5f30909efa@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 11142 bytes --]

On Wed, 2021-06-23 at 12:02 +0800, Jason Wang wrote:
> 在 2021/6/23 上午12:15, David Woodhouse 写道:
> > From: David Woodhouse <dwmw@amazon.co.uk>
> > 
> > This creates a tun device and brings it up, then finds out the link-local
> > address the kernel automatically assigns to it.
> > 
> > It sends a ping to that address, from a fake LL address of its own, and
> > then waits for a response.
> > 
> > If the virtio_net_hdr stuff is all working correctly, it gets a response
> > and manages to understand it.
> 
> 
> I wonder whether it worth to bother the dependency like ipv6 or kernel 
> networking stack.
> 
> How about simply use packet socket that is bound to tun to receive and 
> send packets?
> 

I pondered that but figured that using the kernel's network stack
wasn't too much of an additional dependency. We *could* use an
AF_PACKET socket on the tun device and then drive both ends, but given
that the kernel *automatically* assigns a link-local address when we
bring the device up anyway, it seemed simple enough just to use ICMP.
I also happened to have the ICMP generation/checking code lying around
anyway in the same emacs instance, so it was reduced to a previously
solved problem.

We *should* eventually expand this test case to attach an AF_PACKET
device to the vhost-net, instead of using a tun device as the back end.
(Although I don't really see *why* vhost is limited to AF_PACKET. Why
*can't* I attach anything else, like an AF_UNIX socket, to vhost-net?)


> > +	/*
> > +	 * I just want to map the *whole* of userspace address space. But
> > +	 * from userspace I don't know what that is. On x86_64 it would be:
> > +	 *
> > +	 * vmem->regions[0].guest_phys_addr = 4096;
> > +	 * vmem->regions[0].memory_size = 0x7fffffffe000;
> > +	 * vmem->regions[0].userspace_addr = 4096;
> > +	 *
> > +	 * For now, just ensure we put everything inside a single BSS region.
> > +	 */
> > +	vmem->regions[0].guest_phys_addr = (uint64_t)&rings;
> > +	vmem->regions[0].userspace_addr = (uint64_t)&rings;
> > +	vmem->regions[0].memory_size = sizeof(rings);
> 
> 
> Instead of doing tricks like this, we can do it in another way:
> 
> 1) enable device IOTLB
> 2) wait for the IOTLB miss request (iova, len) and update identity 
> mapping accordingly
> 
> This should work for all the archs (with some performance hit).

Ick. For my actual application (OpenConnect) I'm either going to suck
it up and put in the arch-specific limits like in the comment above, or
I'll fix things to do the VHOST_F_IDENTITY_MAPPING thing we're talking
about elsewhere. (Probably the former, since if I'm requiring kernel
changes then I have grander plans around extending AF_TLS to do DTLS,
then hooking that directly up to the tun socket via BPF and a sockmap
without the data frames ever going to userspace at all.)

For this test case, a hard-coded single address range in BSS is fine.

I've now added !IFF_NO_PI support to the test case, but as noted it
fails just like the other ones I'd already marked with #if 0, which is
because vhost-net pulls some value for 'sock_hlen' out of its posterior
based on some assumption around the vhost features. And then expects
sock_recvmsg() to return precisely that number of bytes more than the
value it peeks in the skb at the head of the sock's queue.

I think I can fix *all* those test cases by making tun_get_socket()
take an extra 'int *' argument, and use that to return the *actual*
value of sock_hlen. Here's the updated test case in the meantime:


From cf74e3fc80b8fd9df697a42cfc1ff3887de18f78 Mon Sep 17 00:00:00 2001
From: David Woodhouse <dwmw@amazon.co.uk>
Date: Wed, 23 Jun 2021 16:38:56 +0100
Subject: [PATCH] test_vhost_net: add test cases with tun_pi header

These fail too, for the same reason as the previous tests were
guarded with #if 0: vhost-net pulls 'sock_hlen' out of its
posterior and just assumes it's 10 bytes. And then barfs when
a sock_recvmsg() doesn't return precisely ten bytes more than
it peeked in the head skb:

[1296757.531103] Discarded rx packet:  len 78, expected 74

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 .../testing/selftests/vhost/test_vhost_net.c  | 97 +++++++++++++------
 1 file changed, 65 insertions(+), 32 deletions(-)

diff --git a/tools/testing/selftests/vhost/test_vhost_net.c b/tools/testing/selftests/vhost/test_vhost_net.c
index fd4a2b0e42f0..734b3015a5bd 100644
--- a/tools/testing/selftests/vhost/test_vhost_net.c
+++ b/tools/testing/selftests/vhost/test_vhost_net.c
@@ -48,7 +48,7 @@ static unsigned char hexchar(char *hex)
 	return (hexnybble(hex[0]) << 4) | hexnybble(hex[1]);
 }
 
-int open_tun(int vnet_hdr_sz, struct in6_addr *addr)
+int open_tun(int vnet_hdr_sz, int pi, struct in6_addr *addr)
 {
 	int tun_fd = open("/dev/net/tun", O_RDWR);
 	if (tun_fd == -1)
@@ -56,7 +56,9 @@ int open_tun(int vnet_hdr_sz, struct in6_addr *addr)
 
 	struct ifreq ifr = { 0 };
 
-	ifr.ifr_flags = IFF_TUN | IFF_NO_PI;
+	ifr.ifr_flags = IFF_TUN;
+	if (!pi)
+		ifr.ifr_flags |= IFF_NO_PI;
 	if (vnet_hdr_sz)
 		ifr.ifr_flags |= IFF_VNET_HDR;
 
@@ -249,11 +251,18 @@ static inline uint16_t csum_finish(uint32_t sum)
 	return htons((uint16_t)(~sum));
 }
 
-static int create_icmp_echo(unsigned char *data, struct in6_addr *dst,
+static int create_icmp_echo(unsigned char *data, int pi, struct in6_addr *dst,
 			    struct in6_addr *src, uint16_t id, uint16_t seq)
 {
 	const int icmplen = ICMP_MINLEN + sizeof(ping_payload);
-	const int plen = sizeof(struct ip6_hdr) + icmplen;
+	int plen = sizeof(struct ip6_hdr) + icmplen;
+
+	if (pi) {
+		struct tun_pi *pi = (void *)data;
+		data += sizeof(*pi);
+		plen += sizeof(*pi);
+		pi->proto = htons(ETH_P_IPV6);
+	}
 
 	struct ip6_hdr *iph = (void *)data;
 	struct icmp6_hdr *icmph = (void *)(data + sizeof(*iph));
@@ -312,8 +321,21 @@ static int create_icmp_echo(unsigned char *data, struct in6_addr *dst,
 }
 
 
-static int check_icmp_response(unsigned char *data, uint32_t len, struct in6_addr *dst, struct in6_addr *src)
+static int check_icmp_response(unsigned char *data, uint32_t len, int pi,
+			       struct in6_addr *dst, struct in6_addr *src)
 {
+	if (pi) {
+		struct tun_pi *pi = (void *)data;
+		if (len < sizeof(*pi))
+			return 0;
+
+		if (pi->proto != htons(ETH_P_IPV6))
+			return 0;
+
+		data += sizeof(*pi);
+		len -= sizeof(*pi);
+	}
+
 	struct ip6_hdr *iph = (void *)data;
 	return ( len >= 41 && (ntohl(iph->ip6_flow) >> 28)==6 /* IPv6 header */
 		 && iph->ip6_nxt == IPPROTO_ICMPV6 /* IPv6 next header field = ICMPv6 */
@@ -337,7 +359,7 @@ static int check_icmp_response(unsigned char *data, uint32_t len, struct in6_add
 #endif
 
 
-int test_vhost(int vnet_hdr_sz, int xdp, uint64_t features)
+int test_vhost(int vnet_hdr_sz, int pi, int xdp, uint64_t features)
 {
 	int call_fd = eventfd(0, EFD_CLOEXEC|EFD_NONBLOCK);
 	int kick_fd = eventfd(0, EFD_CLOEXEC|EFD_NONBLOCK);
@@ -353,7 +375,7 @@ int test_vhost(int vnet_hdr_sz, int xdp, uint64_t features)
 	/* Pick up the link-local address that the kernel
 	 * assigns to the tun device. */
 	struct in6_addr tun_addr;
-	tun_fd = open_tun(vnet_hdr_sz, &tun_addr);
+	tun_fd = open_tun(vnet_hdr_sz, pi, &tun_addr);
 	if (tun_fd < 0)
 		goto err;
 
@@ -387,18 +409,18 @@ int test_vhost(int vnet_hdr_sz, int xdp, uint64_t features)
 	local_addr.s6_addr16[0] = htons(0xfe80);
 	local_addr.s6_addr16[7] = htons(1);
 
+
 	/* Set up RX and TX descriptors; the latter with ping packets ready to
 	 * send to the kernel, but don't actually send them yet. */
 	for (int i = 0; i < RING_SIZE; i++) {
 		struct pkt_buf *pkt = &rings[1].pkts[i];
-		int plen = create_icmp_echo(&pkt->data[vnet_hdr_sz], &tun_addr,
-					    &local_addr, 0x4747, i);
+		int plen = create_icmp_echo(&pkt->data[vnet_hdr_sz], pi,
+					    &tun_addr, &local_addr, 0x4747, i);
 
 		rings[1].desc[i].addr = vio64((uint64_t)pkt);
 		rings[1].desc[i].len = vio32(plen + vnet_hdr_sz);
 		rings[1].avail_ring[i] = vio16(i);
 
-
 		pkt = &rings[0].pkts[i];
 		rings[0].desc[i].addr = vio64((uint64_t)pkt);
 		rings[0].desc[i].len = vio32(sizeof(*pkt));
@@ -438,9 +460,10 @@ int test_vhost(int vnet_hdr_sz, int xdp, uint64_t features)
 				return -1;
 
 			if (check_icmp_response((void *)(addr + vnet_hdr_sz), len - vnet_hdr_sz,
-						&local_addr, &tun_addr)) {
+						pi, &local_addr, &tun_addr)) {
 				ret = 0;
-				printf("Success (%d %d %llx)\n", vnet_hdr_sz, xdp, (unsigned long long)features);
+				printf("Success (hdr %d, xdp %d, pi %d, features %llx)\n",
+				       vnet_hdr_sz, xdp, pi, (unsigned long long)features);
 				goto err;
 			}
 
@@ -466,51 +489,61 @@ int test_vhost(int vnet_hdr_sz, int xdp, uint64_t features)
 	return ret;
 }
 
-
-int main(void)
+/* Perform the given test with all four combinations of XDP/PI */
+int test_four(int vnet_hdr_sz, uint64_t features)
 {
-	int ret;
-
-	ret = test_vhost(0, 0, ((1ULL << VHOST_NET_F_VIRTIO_NET_HDR) |
-				(1ULL << VIRTIO_F_VERSION_1)));
+	int ret = test_vhost(vnet_hdr_sz, 0, 0, features);
 	if (ret && ret != KSFT_SKIP)
 		return ret;
 
-	ret = test_vhost(0, 1, ((1ULL << VHOST_NET_F_VIRTIO_NET_HDR) |
-				(1ULL << VIRTIO_F_VERSION_1)));
+	ret = test_vhost(vnet_hdr_sz, 0, 1, features);
 	if (ret && ret != KSFT_SKIP)
 		return ret;
-
-	ret = test_vhost(0, 0, ((1ULL << VHOST_NET_F_VIRTIO_NET_HDR)));
+#if 0 /* These don't work *either* for the same reason as the #if 0 later */
+	ret = test_vhost(vnet_hdr_sz, 1, 0, features);
 	if (ret && ret != KSFT_SKIP)
 		return ret;
 
-	ret = test_vhost(0, 1, ((1ULL << VHOST_NET_F_VIRTIO_NET_HDR)));
+	ret = test_vhost(vnet_hdr_sz, 1, 1, features);
 	if (ret && ret != KSFT_SKIP)
 		return ret;
+#endif
+}
 
-	ret = test_vhost(10, 0, 0);
-	if (ret && ret != KSFT_SKIP)
-		return ret;
+int main(void)
+{
+	int ret;
 
-	ret = test_vhost(10, 1, 0);
+	ret = test_four(10, 0);
 	if (ret && ret != KSFT_SKIP)
 		return ret;
 
-#if 0 /* These ones will fail */
-	ret = test_vhost(0, 0, 0);
+	ret = test_four(0, ((1ULL << VHOST_NET_F_VIRTIO_NET_HDR) |
+			    (1ULL << VIRTIO_F_VERSION_1)));
 	if (ret && ret != KSFT_SKIP)
 		return ret;
 
-	ret = test_vhost(0, 1, 0);
+	ret = test_four(0, ((1ULL << VHOST_NET_F_VIRTIO_NET_HDR)));
 	if (ret && ret != KSFT_SKIP)
 		return ret;
 
-	ret = test_vhost(12, 0, 0);
+
+#if 0
+	/*
+	 * These ones will fail, because right now vhost *assumes* that the
+	 * underlying (tun, etc.) socket will be doing a header of precisely
+	 * sizeof(struct virtio_net_hdr), if vhost isn't doing so itself due
+	 * to VHOST_NET_F_VIRTIO_NET_HDR.
+	 *
+	 * That assumption breaks both tun with no IFF_VNET_HDR, and also
+	 * presumably raw sockets. So leave these test cases disabled for
+	 * now until it's fixed.
+	 */
+	ret = test_four(0, 0);
 	if (ret && ret != KSFT_SKIP)
 		return ret;
 
-	ret = test_vhost(12, 1, 0);
+	ret = test_four(12, 0);
 	if (ret && ret != KSFT_SKIP)
 		return ret;
 #endif
-- 
2.31.1


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

  reply	other threads:[~2021-06-23 16:12 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-19 13:33 [PATCH] net: tun: fix tun_xdp_one() for IFF_TUN mode David Woodhouse
2021-06-21  7:00 ` Jason Wang
2021-06-21 10:52   ` David Woodhouse
2021-06-21 14:50     ` David Woodhouse
2021-06-21 20:43       ` David Woodhouse
2021-06-22  4:52         ` Jason Wang
2021-06-22  7:24           ` David Woodhouse
2021-06-22  7:51             ` Jason Wang
2021-06-22  8:10               ` David Woodhouse
2021-06-22 11:36               ` David Woodhouse
2021-06-22  4:34       ` Jason Wang
2021-06-22  4:34     ` Jason Wang
2021-06-22  7:28       ` David Woodhouse
2021-06-22  8:00         ` Jason Wang
2021-06-22  8:29           ` David Woodhouse
2021-06-23  3:39             ` Jason Wang
2021-06-24 12:39               ` David Woodhouse
2021-06-22 16:15 ` [PATCH v2 1/4] " David Woodhouse
2021-06-22 16:15   ` [PATCH v2 2/4] net: tun: don't assume IFF_VNET_HDR in tun_xdp_one() tx path David Woodhouse
2021-06-23  3:46     ` Jason Wang
2021-06-22 16:15   ` [PATCH v2 3/4] vhost_net: validate virtio_net_hdr only if it exists David Woodhouse
2021-06-23  3:48     ` Jason Wang
2021-06-22 16:15   ` [PATCH v2 4/4] vhost_net: Add self test with tun device David Woodhouse
2021-06-23  4:02     ` Jason Wang
2021-06-23 16:12       ` David Woodhouse [this message]
2021-06-24  6:12         ` Jason Wang
2021-06-24 10:42           ` David Woodhouse
2021-06-25  2:55             ` Jason Wang
2021-06-25  7:54               ` David Woodhouse
2021-06-23  3:45   ` [PATCH v2 1/4] net: tun: fix tun_xdp_one() for IFF_TUN mode Jason Wang
2021-06-23  8:30     ` David Woodhouse
2021-06-23 13:52     ` David Woodhouse
2021-06-23 17:31       ` David Woodhouse
2021-06-23 22:52         ` David Woodhouse
2021-06-24  6:37           ` Jason Wang
2021-06-24  7:23             ` David Woodhouse
2021-06-24  6:18       ` Jason Wang
2021-06-24  7:05         ` David Woodhouse
2021-06-24 12:30 ` [PATCH v3 1/5] net: add header len parameter to tun_get_socket(), tap_get_socket() David Woodhouse
2021-06-24 12:30   ` [PATCH v3 2/5] net: tun: don't assume IFF_VNET_HDR in tun_xdp_one() tx path David Woodhouse
2021-06-25  6:58     ` Jason Wang
2021-06-24 12:30   ` [PATCH v3 3/5] vhost_net: remove virtio_net_hdr validation, let tun/tap do it themselves David Woodhouse
2021-06-25  7:33     ` Jason Wang
2021-06-25  8:37       ` David Woodhouse
2021-06-28  4:23         ` Jason Wang
2021-06-28 11:23           ` David Woodhouse
2021-06-28 23:29             ` David Woodhouse
2021-06-29  3:43               ` Jason Wang
2021-06-29  6:59                 ` David Woodhouse
2021-06-29 10:49                 ` David Woodhouse
2021-06-29 13:15                   ` David Woodhouse
2021-06-30  4:39                   ` Jason Wang
2021-06-30 10:02                     ` David Woodhouse
2021-07-01  4:13                       ` Jason Wang
2021-07-01 17:39                         ` David Woodhouse
2021-07-02  3:13                           ` Jason Wang
2021-07-02  8:08                             ` David Woodhouse
2021-07-02  8:50                               ` Jason Wang
2021-07-09 15:04                               ` Eugenio Perez Martin
2021-06-29  3:21             ` Jason Wang
2021-06-24 12:30   ` [PATCH v3 4/5] net: tun: fix tun_xdp_one() for IFF_TUN mode David Woodhouse
2021-06-25  7:41     ` Jason Wang
2021-06-25  8:51       ` David Woodhouse
2021-06-28  4:27         ` Jason Wang
2021-06-28 10:43           ` David Woodhouse
2021-06-25 18:43     ` Willem de Bruijn
2021-06-25 19:00       ` David Woodhouse
2021-06-24 12:30   ` [PATCH v3 5/5] vhost_net: Add self test with tun device David Woodhouse
2021-06-25  5:00   ` [PATCH v3 1/5] net: add header len parameter to tun_get_socket(), tap_get_socket() Jason Wang
2021-06-25  8:23     ` David Woodhouse
2021-06-28  4:22       ` Jason Wang
2021-06-25 18:13   ` Willem de Bruijn
2021-06-25 18:55     ` David Woodhouse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d6ad85649fd56d3e12e59085836326a09885593b.camel@infradead.org \
    --to=dwmw2@infradead.org \
    --cc=eperezma@redhat.com \
    --cc=jasowang@redhat.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.