From: Stefano Stabellini
To: Paul Durrant
Cc: Wei Liu, Andrew Cooper, Stefano Stabellini, xen-devel@lists.xenproject.org, joao.m.martins@oracle.com, Roger Pau Monne
Subject: Re: RFC: XenSock brainstorming
Date: Mon, 6 Jun 2016 11:48:01 +0100 (BST)
In-Reply-To: <633a38b2b11a4a4e979a024b9e47bb96@AMSPEX02CL03.citrite.net>
List-Id: xen-devel@lists.xenproject.org

On Mon, 6 Jun 2016, Paul Durrant wrote:
> > -----Original Message-----
> > From: Xen-devel [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of
> > Andrew Cooper
> > Sent: 06 June 2016 10:58
> > To: Stefano Stabellini; xen-devel@lists.xenproject.org
> > Cc: joao.m.martins@oracle.com; Wei Liu; Roger Pau Monne
> > Subject: Re: [Xen-devel] RFC: XenSock brainstorming
> >
> > On 06/06/16 10:33, Stefano Stabellini wrote:
> > > Hi all,
> > >
> > > a couple of months ago I started working on a new PV protocol for
> > > virtualizing syscalls.
> > > I named it XenSock, as its main purpose is to
> > > allow the implementation of the POSIX socket API in a domain other than
> > > that of the caller. It allows connect, accept, recvmsg, sendmsg, etc.
> > > to be implemented directly in Dom0. In a way this is conceptually
> > > similar to virtio-9pfs, but for sockets rather than filesystem APIs.
> > > See this diagram as reference:
> > >
> > > https://docs.google.com/presentation/d/1z4AICTY2ejAjZ-Ul15GTL3i_wcmhKQJA7tcXwhI3dys/edit?usp=sharing
> > >
> > > The frontends and backends could live either in userspace or kernel
> > > space, with different trade-offs. My current prototype is based on Linux
> > > kernel drivers but it would be nice to have userspace drivers too.
> > > Discussing where the drivers could be implemented is beyond the scope
> > > of this email.
> >
> > Just to confirm, you are intending to create a cross-domain transport
> > for all AF_ socket types, or just some?
> >
> > >
> > > # Goals
> > >
> > > The goal of the protocol is to provide networking capabilities to any
> > > guest, with the following added benefits:
> >
> > Throughout, s/Dom0/the backend/
> >
> > I expect running the backend in dom0 will be the overwhelmingly common
> > configuration, but you should avoid designing the protocol for just this
> > use case.
> >
> > >
> > > * guest networking should work out of the box with VPNs, wireless
> > > networks and any other complex network configurations in Dom0
> > >
> > > * guest services should listen on ports bound directly to Dom0 IP
> > > addresses, fitting naturally in a Docker-based workflow, where guests
> > > are Docker containers
> > >
> > > * Dom0 should have full visibility into guest behavior and should be
> > > able to perform inexpensive filtering and manipulation of guest calls
> > >
> > > * XenSock should provide excellent performance. Unoptimized early code
> > > reaches 22 Gbit/sec TCP single stream and scales to 60 Gbit/sec with 3
> > > streams.
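To make the forwarding model concrete: the frontend encodes each socket call as a request on a shared ring, and the backend executes the real syscall in its own domain and posts the result back, matched by request id. The following is a minimal sketch under that assumption only. Every name in it (enum xs_cmd, struct xs_request, struct xs_response, the xs_ring_* helpers, xs_backend_handle) is hypothetical and is not the actual XenSock wire format, and the ring is a local single-producer/single-consumer model without the barriers and event-channel notification a real cross-domain ring needs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command codes: one per forwarded socket call. */
enum xs_cmd { XS_SOCKET = 0, XS_CONNECT, XS_ACCEPT, XS_SENDMSG, XS_RECVMSG, XS_CLOSE };

/* Hypothetical request the frontend places on the shared ring. */
struct xs_request {
    uint32_t id;        /* echoed in the response so replies can be matched */
    uint32_t cmd;       /* one of enum xs_cmd */
    uint64_t sock_id;   /* frontend-chosen handle naming the socket */
    union {
        struct { int32_t domain, type, protocol; } socket;
        struct { uint8_t addr[28]; uint32_t len; } connect; /* 28 bytes fits a sockaddr_in6 */
    } u;
};

/* Hypothetical response written back by the backend. */
struct xs_response {
    uint32_t id;
    uint32_t cmd;
    int32_t ret;        /* 0 or a negative errno value */
};

#define XS_RING_SIZE 8u /* power of two, so masking replaces modulo */

/* Local SPSC ring model. Not shared-memory safe as written: a real
 * cross-domain ring also needs memory barriers and notifications. */
struct xs_req_ring {
    uint32_t prod, cons; /* free-running counters; unsigned wraparound is fine */
    struct xs_request slots[XS_RING_SIZE];
};

/* Frontend side: enqueue a request; returns -1 if the ring is full. */
static int xs_ring_push(struct xs_req_ring *r, const struct xs_request *req)
{
    assert(r != NULL && req != NULL);
    if (r->prod - r->cons == XS_RING_SIZE)
        return -1;
    r->slots[r->prod & (XS_RING_SIZE - 1)] = *req;
    r->prod++;
    return 0;
}

/* Backend side: dequeue the oldest request; returns -1 if the ring is empty. */
static int xs_ring_pop(struct xs_req_ring *r, struct xs_request *out)
{
    assert(r != NULL && out != NULL);
    if (r->prod == r->cons)
        return -1;
    *out = r->slots[r->cons & (XS_RING_SIZE - 1)];
    r->cons++;
    return 0;
}

/* Backend dispatch stub: a real backend would call socket(), connect(),
 * etc. in its own domain; this only reports which commands it accepts. */
static struct xs_response xs_backend_handle(const struct xs_request *req)
{
    struct xs_response rsp = { req->id, req->cmd, 0 };
    switch (req->cmd) {
    case XS_SOCKET:
    case XS_CONNECT:
    case XS_CLOSE:
        rsp.ret = 0;
        break;
    default:
        rsp.ret = -EOPNOTSUPP;
        break;
    }
    return rsp;
}
```

Keeping the backend a plain executor of socket calls is also what gives it a natural point for the filtering and visibility listed under Goals.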
> >
> > What happens if domU tries to open an AF_INET socket, and the domain has
> > both sockfront and netfront? What happens if a domain has multiple
> > sockfronts?
> >
>
> This sounds awfully like a class of problem that the OpenOnload
> (http://www.openonload.org/) stack had to solve. It involved having to
> track updates to various kernel tables involved in inet routing, and
> having to keep a 'standard' inet socket in hand even when setting up an
> intercepted (read 'PV' for this context) socket since, until connect,
> you don't know what the far end is or how to get to it.
>
> Having your own AF is definitely a much easier starting point. It also
> means you get to define all the odd corner-case semantics rather than
> having to emulate Linux/BSD/Solaris/etc. quirks.

Thanks for the pointer, I'll have a look. Other related work includes:

VirtuOS
http://people.cs.vt.edu/~gback/papers/sosp13final.pdf

Virtio-vsock
http://events.linuxfoundation.org/sites/events/files/slides/stefanha-kvm-forum-2015.pdf
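Paul's contrast between intercepting AF_INET and using a dedicated address family can be sketched as follows. This is a minimal illustration under stated assumptions: all names (pv_socket, pick_by_route, pick_by_family, pv_connect, struct route) are hypothetical, and HYP_AF_XENSOCK is a placeholder number, not a registered address family. With interception, the frontend cannot choose between sockfront and netfront until connect() supplies the peer address, and it then needs a routing view to decide; with a dedicated family the choice is fixed at socket() time from the family alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transports a frontend could hand a socket to. */
enum transport { T_UNDECIDED = 0, T_NETFRONT, T_XENSOCK };

/* Placeholder family number for a dedicated address family; not a registered value. */
#define HYP_AF_XENSOCK 1000

/* Hypothetical IPv4 prefix (host byte order) to be reached via the XenSock backend. */
struct route { uint32_t prefix; uint32_t mask; };

/* Intercepted AF_INET socket: the transport stays undecided until connect(). */
struct pv_socket { enum transport t; uint32_t peer; };

/* Dedicated-family design: the transport follows from socket(family, ...) alone. */
static enum transport pick_by_family(int family)
{
    return family == HYP_AF_XENSOCK ? T_XENSOCK : T_NETFRONT;
}

/* Interception design: consult a simplified, first-match routing view. */
static enum transport pick_by_route(const struct route *routes, int n, uint32_t dst)
{
    assert(routes != NULL || n == 0);
    for (int i = 0; i < n; i++)
        if ((dst & routes[i].mask) == routes[i].prefix)
            return T_XENSOCK;
    return T_NETFRONT; /* fall back to the ordinary inet stack */
}

/* Bind an intercepted socket to a transport at connect() time; -1 if already bound. */
static int pv_connect(struct pv_socket *s, const struct route *routes, int n, uint32_t dst)
{
    if (s->t != T_UNDECIDED)
        return -1;
    s->peer = dst;
    s->t = pick_by_route(routes, n, dst);
    return 0;
}
```

The flat first-match list stands in for the kernel routing tables an intercepting stack would have to mirror; a real implementation would need longest-prefix matching and would have to track route updates, which is exactly the cost a dedicated family avoids.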