On Mon, 6 Jun 2016, Paul Durrant wrote:
> > -----Original Message-----
> > From: Xen-devel [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of Andrew Cooper
> > Sent: 06 June 2016 10:58
> > To: Stefano Stabellini; xen-devel@lists.xenproject.org
> > Cc: joao.m.martins@oracle.com; Wei Liu; Roger Pau Monne
> > Subject: Re: [Xen-devel] RFC: XenSock brainstorming
> >
> > On 06/06/16 10:33, Stefano Stabellini wrote:
> > > Hi all,
> > >
> > > a couple of months ago I started working on a new PV protocol for
> > > virtualizing syscalls. I named it XenSock, as its main purpose is to
> > > allow the implementation of the POSIX socket API in a domain other
> > > than the one of the caller. It allows connect, accept, recvmsg,
> > > sendmsg, etc. to be implemented directly in Dom0. In a way this is
> > > conceptually similar to virtio-9pfs, but for sockets rather than
> > > filesystem APIs. See this diagram as reference:
> > >
> > > https://docs.google.com/presentation/d/1z4AICTY2ejAjZ-Ul15GTL3i_wcmhKQJA7tcXwhI3dys/edit?usp=sharing
> > >
> > > The frontends and backends could live either in userspace or kernel
> > > space, with different trade-offs. My current prototype is based on
> > > Linux kernel drivers, but it would be nice to have userspace drivers
> > > too. Discussing where the drivers could be implemented is beyond the
> > > scope of this email.
> >
> > Just to confirm, you are intending to create a cross-domain transport
> > for all AF_ socket types, or just some?
> >
> > >
> > > # Goals
> > >
> > > The goal of the protocol is to provide networking capabilities to
> > > any guest, with the following added benefits:
> >
> > Throughout, s/Dom0/the backend/
> >
> > I expect running the backend in dom0 will be the overwhelmingly common
> > configuration, but you should avoid designing the protocol for just
> > this use case.
> >
> > > * guest networking should work out of the box with VPNs, wireless
> > > networks and any other complex network configurations in Dom0
> > >
> > > * guest services should listen on ports bound directly to Dom0 IP
> > > addresses, fitting naturally in a Docker-based workflow, where
> > > guests are Docker containers
> > >
> > > * Dom0 should have full visibility into guest behavior and should be
> > > able to perform inexpensive filtering and manipulation of guest calls
> > >
> > > * XenSock should provide excellent performance. Unoptimized early
> > > code reaches 22 Gbit/sec TCP single stream and scales to 60 Gbit/sec
> > > with 3 streams.
> >
> > What happens if domU tries to open an AF_INET socket, and the domain
> > has both sockfront and netfront? What happens if a domain has multiple
> > sockfronts?
>
> This sounds awfully like a class of problem that the OpenOnload
> (http://www.openonload.org/) stack had to solve. It involved tracking
> updates to the various kernel tables involved in inet routing, and
> keeping a 'standard' inet socket in hand even when setting up an
> intercepted (read 'PV' in this context) socket, since until connect you
> don't know what the far end is or how to get to it.
>
> Having your own AF is definitely a much easier starting point. It also
> means you get to define all the odd corner-case semantics rather than
> having to emulate Linux/BSD/Solaris/etc. quirks.

Thanks for the pointer, I'll have a look.

Other related work includes:

VirtuOS
http://people.cs.vt.edu/~gback/papers/sosp13final.pdf

Virtio-vsock
http://events.linuxfoundation.org/sites/events/files/slides/stefanha-kvm-forum-2015.pdf
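
To make the shape of the protocol a bit more concrete, here is a rough
sketch of what a request/response pair on a shared ring could look
like. All names and the layout below (xensock_request, XENSOCK_OP_*,
and so on) are hypothetical illustrations, not the prototype's actual
wire format, which this thread does not specify:

    #include <stdint.h>

    /* Hypothetical operation codes, one per proxied socket call. */
    enum xensock_op {
        XENSOCK_OP_CONNECT = 1,
        XENSOCK_OP_ACCEPT  = 2,
        XENSOCK_OP_SENDMSG = 3,
        XENSOCK_OP_RECVMSG = 4,
    };

    /* One slot on the frontend->backend ring: which call, on which
     * socket, with the payload passed via granted pages. */
    struct xensock_request {
        uint32_t op;        /* enum xensock_op */
        uint32_t sock_id;   /* frontend-chosen socket handle */
        uint32_t gref;      /* grant reference for the payload page(s) */
        uint32_t len;       /* payload length in bytes */
        uint64_t req_id;    /* echoed back in the response */
    };

    /* The matching backend->frontend response. */
    struct xensock_response {
        uint64_t req_id;    /* matches the originating request */
        int32_t  ret;       /* syscall-style return value or -errno */
    };

The general pattern (grant references for bulk data, request ids to
allow out-of-order completion) follows the existing Xen PV ring
protocols such as netfront/netback.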
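
And on Paul's point about a dedicated address family: from the guest
application's point of view, usage would look roughly like the
following. AF_XENSOCK and struct sockaddr_xensock are placeholders
here, since no family number or address layout has actually been
defined:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    #define AF_XENSOCK 42            /* hypothetical family number */

    /* Hypothetical address: which backend domain, which port. */
    struct sockaddr_xensock {
        sa_family_t sxs_family;      /* AF_XENSOCK */
        uint32_t    sxs_domid;       /* backend domain id */
        uint16_t    sxs_port;        /* backend-side port */
    };

    int main(void)
    {
        int fd = socket(AF_XENSOCK, SOCK_STREAM, 0);
        if (fd < 0) {
            perror("socket");        /* fails unless the family is registered */
            return 1;
        }

        struct sockaddr_xensock addr;
        memset(&addr, 0, sizeof(addr));
        addr.sxs_family = AF_XENSOCK;
        addr.sxs_domid  = 0;         /* e.g. a backend in dom0 */
        addr.sxs_port   = 80;

        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
            perror("connect");

        close(fd);
        return 0;
    }

The attraction, as Paul notes, is that nothing has to intercept
AF_INET traffic or track inet routing state: an application opts in
explicitly by choosing the family, and the frontend/netfront ambiguity
above never arises.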