Re: [dpdk-dev] [RFC 0/4] SocketPair Broker support for vhost and virtio-user.

From: Ilya Maximets <i.maximets@ovn.org>
To: Ilya Maximets <i.maximets@ovn.org>,
	Stefan Hajnoczi <stefanha@redhat.com>
Cc: "Maxime Coquelin" <maxime.coquelin@redhat.com>,
	"Chenbo Xia" <chenbo.xia@intel.com>,
	dev@dpdk.org, "Adrian Moreno" <amorenoz@redhat.com>,
	"Julia Suvorova" <jusual@redhat.com>,
	"Marc-André Lureau" <marcandre.lureau@redhat.com>,
	"Daniel Berrange" <berrange@redhat.com>
Subject: Re: [dpdk-dev] [RFC 0/4] SocketPair Broker support for vhost and virtio-user.
Date: Thu, 18 Mar 2021 21:14:27 +0100
Message-ID: <269ceb3d-3eda-ab5e-659d-e646a4c81957@ovn.org>
In-Reply-To: <eeea4d9f-e600-9b4d-58f3-f8ced9485854@ovn.org>

On 3/18/21 8:47 PM, Ilya Maximets wrote:
> On 3/18/21 6:52 PM, Stefan Hajnoczi wrote:
>> On Wed, Mar 17, 2021 at 09:25:26PM +0100, Ilya Maximets wrote:
>> Hi,
>> Some questions to understand the problems that SocketPair Broker solves:
>>
>>> Even more configuration tricks are required in order to share some
>>> sockets between different containers and not only with the host, e.g.
>>> to create service chains.
>>
>> How does SocketPair Broker solve this? I guess the idea is that
>> SocketPair Broker must be started before other containers. That way
>> applications don't need to sleep and reconnect when a socket isn't
>> available yet.
>>
>> On the other hand, the SocketPair Broker might be unavailable (OOM
>> killer, crash, etc), so applications still need to sleep and reconnect
>> to the broker itself. I'm not sure the problem has actually been solved
>> unless there is a reason why the broker is always guaranteed to be
>> available?
> 
> Hi, Stefan.  Thanks for your feedback!
> 
> The idea is to have the SocketPair Broker running right from the
> boot of the host.  If it uses systemd socket-based service
> activation, the socket should persist while systemd is alive, IIUC.
> OOM, crash and restart of the broker should not affect the existence
> of the socket, and systemd will spawn the service if it's not running
> for any reason, without losing incoming connections.
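
For illustration, a socket-activated service finds its inherited listening
socket through the LISTEN_PID and LISTEN_FDS environment variables, with the
passed descriptors starting at fd 3.  A minimal sketch of that lookup in
Python; the helper name inherited_listen_fds is hypothetical and not taken
from the broker's code:

```python
import os

SD_LISTEN_FDS_START = 3  # first descriptor systemd passes to an activated service


def inherited_listen_fds(environ=None, pid=None):
    """Return the descriptors passed by socket activation, or [] if none."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []  # the variables were meant for another process
        count = int(environ.get("LISTEN_FDS", ""))
    except ValueError:
        return []  # not socket-activated, or malformed variables
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + max(count, 0)))
```

Since systemd keeps the listening socket open itself, a restarted service
would pick up the same descriptor again, and clients that connected in the
meantime stay queued in the backlog.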
> 
>>
>>> And some housekeeping is usually required for applications in case
>>> the socket server terminated abnormally and socket files are left on
>>> the file system:
>>>  "failed to bind to vhu: Address already in use; remove it and try again"
>>
>> QEMU avoids this by unlinking before binding. The drawback is that users
>> might accidentally hijack an existing listen socket, but that can be
>> solved with a pidfile.
> 
> How exactly could this be solved with a pidfile?  And what if it is
> a different application that tries to create a socket on the same path?
> E.g. QEMU creates a socket (started in server mode) and the user
> accidentally creates a dpdkvhostuser port in Open vSwitch instead of
> dpdkvhostuserclient.  In this case the rte_vhost library will try to
> bind to an existing socket file and will fail.  Subsequently, port
> creation in OVS will fail.  We can't allow OVS to unlink files, because
> then OVS users would be able to unlink random sockets that OVS has
> access to, and we also have no idea whether it was QEMU that created
> the file, a virtio-user application, or someone else.
> There are, probably, ways to detect whether any live process has this
> socket open, but that sounds like too much for this purpose; also, I'm
> not sure it's possible if the actual user is in a different
> container.
> So I don't see a good, reliable way to detect these conditions.  It
> falls on the shoulders of higher-level management software or the user
> to clean these socket files up before adding ports.
> 
>>
>>> Additionally, all applications (system and user's!) should follow
>>> naming conventions and place socket files in a particular location on
>>> the file system to make things work.
>>
>> Does SocketPair Broker solve this? Applications now need to use a naming
>> convention for keys, so it seems like this issue has not been
>> eliminated.
> 
> A key is an arbitrary sequence of bytes, so it's hard to call it a
> naming convention.  But applications do need to know the keys, you're
> right.  And, to be careful, I said "eliminates most of the
> inconveniences". :)
> 
>>
>>> This patch-set aims to eliminate most of the inconveniences by
>>> leveraging an infrastructure service provided by a SocketPair Broker.
>>
>> I don't understand yet why this is useful for vhost-user, where the
>> creation of the vhost-user device backend and its use by a VMM are
>> closely managed by one piece of software:
>>
>> 1. Unlink the socket path.
>> 2. Create, bind, and listen on the socket path.
>> 3. Instantiate the vhost-user device backend (e.g. talk to DPDK/SPDK
>>    RPC, spawn a process, etc) and pass in the listen fd.
>> 4. In the meantime the VMM can open the socket path and call connect(2).
>>    As soon as the vhost-user device backend calls accept(2) the
>>    connection will proceed (there is no need for sleeping).
>>
>> This approach works across containers without a broker.
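
To make the sequence above concrete, here is a minimal sketch in Python of
why step 4 does not need to sleep: once listen() has been called, the kernel
queues the connection in the backlog, so connect() and even a write succeed
before accept() runs.  The path vhu.sock and the function name are
hypothetical, and a real backend would receive the listen fd from the
management software instead of sharing a process with the client:

```python
import os
import socket
import tempfile


def connect_before_accept():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "vhu.sock")  # hypothetical socket path

        # Steps 1-2: unlink any stale file, then create, bind and listen.
        try:
            os.unlink(path)
        except FileNotFoundError:
            pass
        listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        listener.bind(path)
        listener.listen(1)

        # Step 4: the client connects and writes before anyone calls
        # accept(); the connection waits in the listen backlog.
        client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        client.connect(path)
        client.sendall(b"hello")

        # Step 3, happening later: the backend that holds the listen fd
        # finally accepts and reads the queued data.
        backend, _ = listener.accept()
        data = backend.recv(5)

        for sock in (client, backend, listener):
            sock.close()
        return data
```

This only demonstrates the ordering; it says nothing about who is allowed
to unlink the path, which is the part discussed above.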
> 
> Not sure if I fully understood the question here, but anyway.
> 
> This approach works fine if you know what application to run.
> In the case of a k8s cluster, it might be a random DPDK application
> with virtio-user ports that runs inside a container and wants to
> have a network connection.  Also, this application needs to run
> virtio-user in server mode, otherwise a restart of OVS will
> require a restart of the application.  So, you basically need to
> rely on a third-party application to create a socket with the right
> name and in the correct location that is shared with the host, so
> OVS can find it and connect.
> 
> In the VM world everything is much simpler, since you have
> libvirt and QEMU that take care of all of this and that are
> also under full control of the management software
> and the system administrator.
> In the case of a container with a "random" DPDK application inside,
> there is no such entity that can help.  Of course, some solution
> might be implemented in the docker/podman daemon to create and manage
> outside-looking sockets for an application inside the container,
> but that is not available today AFAIK, and I'm not sure it
> ever will be.
> 
>>
>> BTW what is the security model of the broker? Unlike pathname UNIX
>> domain sockets there is no ownership permission check.
> 
> I thought about this.  Yes, we should allow connection to this socket
> for a wide group of applications.  That might be a problem.
> However, 2 applications need to know the 1024 (at most) byte key in
> order to connect to each other.  This might be considered a
> sufficient security model as long as these keys are not predictable.
> Suggestions on how to make this more secure are welcome.

Digging more into unix sockets, I think the broker might use
SO_PEERCRED to identify at least the uid and gid of a client.
This way we can implement policies, e.g. one client might
request to be paired only with clients from the same group or
from the same user.
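
A rough sketch of what such a check could look like, in Python for brevity.
SO_PEERCRED is Linux-specific and returns a struct ucred (pid, uid, gid);
the policy names and the may_pair() helper are made up for illustration and
are not part of the current protocol:

```python
import socket
import struct

UCRED = "iII"  # struct ucred on Linux: pid_t pid, uid_t uid, gid_t gid


def peer_credentials(sock):
    """Return (pid, uid, gid) of the peer of a connected AF_UNIX socket."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                          struct.calcsize(UCRED))
    return struct.unpack(UCRED, raw)


def may_pair(cred_a, cred_b, policy):
    """Hypothetical policy check: 'any', 'same-user' or 'same-group'."""
    _, uid_a, gid_a = cred_a
    _, uid_b, gid_b = cred_b
    if policy == "same-user":
        return uid_a == uid_b
    if policy == "same-group":
        return gid_a == gid_b
    return policy == "any"
```

The broker would call peer_credentials() on each accepted client and refuse
to pair two clients unless may_pair() holds for the policy both requested.
Note that the credentials are those recorded when the connection was made.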

This is actually a great extension for the SocketPair Broker Protocol.

It might even use SO_PEERSEC to enforce stricter policies
based on the SELinux context.

> 
> If it's really necessary to completely isolate some connections
> from other ones, one more broker could be started.  But I'm not
> sure what use case would need that.
> 
> The broker itself closes the socketpair on its side, so the connection
> between the 2 applications is direct and should be secure as long as
> the kernel doesn't allow other system processes to intercept data on
> arbitrary unix sockets.
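
The hand-off described here can be sketched with SCM_RIGHTS descriptor
passing.  This is an illustration only: the b"paired" message and both
function names are hypothetical and do not reflect the actual SocketPair
Broker Protocol, and socket.send_fds()/recv_fds() need Python 3.9 or newer:

```python
import socket


def broker_pair(client_a, client_b):
    """Give each client one end of a fresh socketpair, then drop our copies."""
    end_a, end_b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        socket.send_fds(client_a, [b"paired"], [end_a.fileno()])
        socket.send_fds(client_b, [b"paired"], [end_b.fileno()])
    finally:
        # After this the broker holds no descriptor for the pair, so it
        # cannot read or inject data on the clients' connection.
        end_a.close()
        end_b.close()


def receive_end(control):
    """Client side: receive the passed descriptor and wrap it in a socket."""
    _, fds, _, _ = socket.recv_fds(control, 16, 1)
    return socket.socket(fileno=fds[0])
```

Once both clients have received their ends, the traffic between them goes
through the kernel only and no longer touches the broker process.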
> 
> Best regards, Ilya Maximets.
>