From: "Marc-André Lureau" <marcandre.lureau@gmail.com>
To: "V." <mail@winaoe.org>
Cc: QEMU <qemu-devel@nongnu.org>
Subject: Re: [PATCH/RFC 0/1] Vhost User Cross Cable: Intro
Date: Fri, 10 Jan 2020 14:27:29 +0400
Message-ID: <CAJ+F1CLzR7Q7ei550d+2GhnmcwiGpb2ixem_tr4QUPnsF_KPKg@mail.gmail.com>
In-Reply-To: <98d1e1f0-0e53-d207-78ce-ea9717673985@winaoe.org>

Hi

On Wed, Jan 8, 2020 at 5:57 AM V. <mail@winaoe.org> wrote:
>
> Hi List,
>
> For my VM setup I tend to use a lot of VM-to-VM single network links to do routing, switching and bridging in VMs instead of on the host.
> It also stems from a silly fetish for sometimes using OpenBSD VMs as firewalls, but that is beside the point here.
> I am using the standard, tried-and-true method of a whole bunch of bridges, each having 2 vhost taps.
> This works and it's fast, but it is a nightmare to manage with all the interfaces on the host.
>
> So, I looked a bit into how I can improve this, which basically comes down to "How do I connect 2 VMs together in a really fast and easy way?".
> This, however, is not as straightforward as I thought without going the whole route of OVS, Snabb or any other big, feature-bloated
> software switch.
> Because really, all I want is to connect 2 VMs in a fast and easy way. Shouldn't be that hard, right?
>
> Anyway, I ended up finding tests/vhost-user-bridge.c, which very nicely does half of what I wanted.
> After doubling the vhosts and eliminating UDP, I came up with a Vhost User Cross Cable (patch in the next post).
> It just opens 2 vhost sockets instead of 1 and does the forwarding between them.
> A terrible hack-and-slash of vhost-user-bridge.c, probably now with bugs causing the death of many puppies and the end of humanity,
> but it works!
>
> However... I am now left with some questions, which I hope some of you can answer.
>
> 1.
> I looked, googled, read and tried things, but it is likely that I am a complete and utter moron and my google-fu has just been awful...
> Very likely... But is there really no other way than the one I found to just link up 2 QEMUs in a fast, non-bridge way? (No, not sockets.)
> Not that OVS and the like are not fine software, but do we really need the whole of DPDK to do this?

By "not sockets", you mean the data path should use shared memory?
Then, I don't think there are other way.

>
> 2.
> On the unlikely chance that I'm not an idiot, I guess we now have a nice, simple cross cable.
> However, I am still a complete vhost/virtio idiot who has no clue how it works and just randomly brute-forced code into submission.
> Maybe not entirely true, but I would still appreciate it very much if someone with more knowledge of vhost could have a quick look at
> how things are done in cc.
>
> Specifically this monstrosity in TX (speed_killer is a 1MB buffer and kills any speed):
>   ret = iov_from_buf(sg, num, 0, speed_killer,
>                      iov_to_buf(out_sg, out_num, 0, speed_killer,
>                                 MIN(iov_size(out_sg, out_num), sizeof speed_killer)
>                                )
>                     );
>
>   vs. the commented:
>   //ret = iov_copy(sg, num, out_sg, out_num, 0,
>   //               MIN(iov_size(sg, num), iov_size(out_sg, out_num)));
>
> The first is obviously a quick fix to get things working; however, in my meager understanding, shouldn't the 2nd one work?
> Maybe I'm messing up my vectors here, or messing up my understanding of iov_copy, but shouldn't the 2nd form be the way to zero
> copy?


As you noted, the data must be copied from source to destination memory.
iov_copy() doesn't actually copy the data (it only builds a new iovec
array pointing at the same buffers); I don't think we have an iov
function for that.
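
If you wanted to avoid the bounce buffer, you would need a helper that
walks both scatter-gather lists and copies element by element. A rough
sketch (not an existing util/iov.c API; the name iov_copy_data is made
up here):

  /* Hypothetical helper: copy payload bytes directly from one
   * scatter-gather list into another, without a bounce buffer. */
  #include <string.h>
  #include <sys/uio.h>

  static size_t iov_copy_data(struct iovec *dst, unsigned dst_cnt,
                              const struct iovec *src, unsigned src_cnt)
  {
      unsigned di = 0, si = 0;
      size_t doff = 0, soff = 0, total = 0;

      while (di < dst_cnt && si < src_cnt) {
          size_t len = dst[di].iov_len - doff;
          size_t slen = src[si].iov_len - soff;

          if (slen < len) {
              len = slen;
          }
          memcpy((char *)dst[di].iov_base + doff,
                 (const char *)src[si].iov_base + soff, len);
          total += len;
          doff += len;
          soff += len;
          if (doff == dst[di].iov_len) {
              di++;
              doff = 0;
          }
          if (soff == src[si].iov_len) {
              si++;
              soff = 0;
          }
      }
      return total;
  }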

>
> 3.
> Now, if Cross Cable is actually a new and (after a code rewrite or 10) viable way to connect 2 QEMUs together, could I actually
> suggest a better way?
> I am thinking of a '-netdev vhost-user-slave' option to connect (as client or server) to a master QEMU running '-netdev vhost-user'.
> This way there is no need for any external program at all, the master can have its memory unshared and everything would just work
> and be fast.
> Also, the whole thing could fall back to normal virtio if memory is not shared, and it would even work in pure usermode without any
> context switch.
>
> I could maybe get around to building a patch for this idea; I don't have a clear idea how much work it would be, but I've done
> crazier things.
> But is this something that someone might be able to whip up in an hour or two? Someone who actually does have a clue about vhost
> and virtio maybe? ;-)

I believe https://wiki.qemu.org/Features/VirtioVhostUser is what you
are after. It's still being discussed and non-trivial, but not very
active lately afaik.
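
Just so we are talking about the same thing, I read the proposal as
something like the following (note that the vhost-user-slave netdev
below does not exist in QEMU; it is only an illustration of the idea
above, and the socket path and ids are arbitrary):

  # "master" side: the existing vhost-user netdev
  ... -chardev socket,id=vu0,path=/tmp/link.sock,server=on \
      -netdev vhost-user,chardev=vu0,id=eth1 \
      -device virtio-net-pci,netdev=eth1

  # "slave" side: the proposed option, this QEMU acting as the vhost-user backend
  ... -chardev socket,id=vu0,path=/tmp/link.sock,reconnect=1 \
      -netdev vhost-user-slave,chardev=vu0,id=eth1 \
      -device virtio-net-pci,netdev=eth1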

>
> 4.
> Hacking together cc from bridge, I noticed the use of container_of() to get from vudev to state in the vu callbacks.
> Would it be an idea to add a context pointer to the callbacks (possibly taken from VuDevIface)?
> And I know. First post and I have the forwardness to even suggest an API change! I know!
> But it makes things a bit simpler by avoiding globals, and it makes sense to have some context in a callback to know what's going on,
> right? ;-)

Well, the callbacks are called with the VuDev, so container_of() is
quite fine since you can embed the device in your own structure. I
don't see a compelling reason to change that.
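
For instance, something along these lines (a generic sketch, not taken
from your patch; CrossCableDev and cc_queue_set_started are made-up
names, while VuDev and the queue_set_started signature come from
contrib/libvhost-user):

  #include "qemu/osdep.h"                          /* provides container_of() */
  #include "contrib/libvhost-user/libvhost-user.h"

  typedef struct CrossCableDev {
      VuDev vudev;      /* embedded libvhost-user device */
      int peer_fd;      /* whatever per-device state the backend needs */
  } CrossCableDev;

  static void cc_queue_set_started(VuDev *dev, int qidx, bool started)
  {
      /* Recover the enclosing state from the embedded VuDev: no globals. */
      CrossCableDev *cc = container_of(dev, CrossCableDev, vudev);

      (void)cc; (void)qidx; (void)started;
  }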

> 5.
> Last one, promise.
> I'm very much in the church of "less software == fewer bugs == fewer security problems".
> Running cc or a vhost-user-slave means QEMU has fast networking in usermode without the need for anything other than AF_UNIX + shared
> mem.
> So might it be possible to weed out any modern fancy stuff like the Internet Protocol, TCP, taps, bridges, Ethernet and Token Ring
> from a kernel and run QEMU on that?
> The idea here is a kernel with storage, a serial console, AF_UNIX and vfio-pci, only running QEMU.
> Would this be feasible? Or does QEMU need a kernel that at least has some understanding of what AF_INET and Ethernet are?
> (Does a modern kernel even still support Token Ring? No idea. Probably does.)

Sounds like it is possible.

> Finally, an example and some numbers.
>
> Compiling and starting the cross cable:
> ./configure
> make tests/vhost-user-cc
> tests/vhost-user-cc -l /tmp/left.sock -r /tmp/right.sock
>
> (Note: the cross cable will quit if one of the VMs quits, but the VMs will reconnect when cc starts again.)
>
> 2 VMs, host1 and host2, both Linux guests, are run like this:
>
> host1:
> /qemu/bin/qemu-system-x86_64 \
>   -accel kvm -nodefaults -k en-us -vnc none -machine q35 -cpu host -smp 8,cores=8 -m 2G -vga std \
>   -object memory-backend-file,id=memory,mem-path=/hugetlbfs,share=on,size=2G \
>   -numa node,memdev=memory \
>   -drive if=none,cache=none,format=raw,aio=native,file=/dev/lvm/host1,id=sda \
>   -device virtio-scsi-pci,id=scsi0 -device scsi-hd,drive=sda,bus=scsi0.0 \
>   -nic tap,vhost=on,helper=/usr/libexec/qemu-bridge-helper,id=eth0,model=virtio-net-pci,mac=52:54:00:aa:aa:aa,br=br0 \
>   -chardev socket,id=left,path=/tmp/left.sock,reconnect=1 \
>   -nic vhost-user,chardev=left,id=eth1,model=virtio-net-pci,mac=52:54:00:bb:bb:bb
>
> host2:
> /qemu/bin/qemu-system-x86_64 \
>   -accel kvm -nodefaults -k en-us -vnc none -machine q35 -cpu host -smp 8,cores=8 -m 2G -vga std \
>   -object memory-backend-file,id=memory,mem-path=/hugetlbfs,share=on,size=2G \
>   -numa node,memdev=memory \
>   -drive if=none,cache=none,format=raw,aio=native,file=/dev/lvm/host2,id=sda \
>   -device virtio-scsi-pci,id=scsi0 -device scsi-hd,drive=sda,bus=scsi0.0 \
>   -nic tap,vhost=on,helper=/usr/libexec/qemu-bridge-helper,id=eth0,model=virtio-net-pci,mac=52:54:00:cc:cc:cc,br=br0 \
>   -chardev socket,id=right,path=/tmp/right.sock,reconnect=1 \
>   -nic vhost-user,chardev=right,id=eth1,model=virtio-net-pci,mac=52:54:00:dd:dd:dd
>
>
> First, speed via eth0 (bridged tap with vhost, host2 runs './iperf3 -s'):
>   root@host1:~/iperf-3.1.3/src# ./iperf3 -c 192.168.0.2 -i 1 -t 10
>   ...
>   [  4]   0.00-10.00  sec  10.7 GBytes  9.22 Gbits/sec                  receiver
>
> Second, speed via eth1 (Vhost Cross Cable):
>   root@host1:~/iperf-3.1.3/src# ./iperf3 -c 192.168.1.2 -i 1 -t 10
>   ...
>   [  4]   0.00-10.00  sec  2.05 GBytes  1.76 Gbits/sec                  receiver
>
> So, a factor-of-6 slowdown compared to the bridge. Not too bad, considering the bad iovec mem-copying I do.
> There's lots of room for improvement, but at least for me it's also 5 times faster than socket.
>

And what performance do you get with -netdev socket?
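
For comparison, you could replace the vhost-user nic with a plain socket
netdev between the two guests, roughly like this (a sketch; the port,
ids and MACs are arbitrary, both QEMUs running on the same host):

  # host1: listening side
  ... -netdev socket,id=sock0,listen=:1234 \
      -device virtio-net-pci,netdev=sock0,mac=52:54:00:ee:ee:ee

  # host2: connecting side
  ... -netdev socket,id=sock0,connect=127.0.0.1:1234 \
      -device virtio-net-pci,netdev=sock0,mac=52:54:00:ff:ff:ff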

--
Marc-André Lureau


