netdev.vger.kernel.org archive mirror
* [RFC] vsock: proposal to support multiple transports at runtime
@ 2019-05-14  8:15 Stefano Garzarella
  2019-05-16 21:48 ` Dexuan Cui
  2019-05-23 15:37 ` Stefan Hajnoczi
  0 siblings, 2 replies; 9+ messages in thread
From: Stefano Garzarella @ 2019-05-14  8:15 UTC (permalink / raw)
  To: netdev, Stefan Hajnoczi, Dexuan Cui, Jorgen Hansen
  Cc: David S. Miller, Vishnu Dasa, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin

Hi guys,
I'm currently interested in implementing multi-transport support for VSOCK in
order to handle nested VMs.

As Stefan suggested, I started to look at this discussion:
https://lkml.org/lkml/2017/8/17/551
Below I tried to summarize a proposal for discussion, following the ideas
from Dexuan, Jorgen, and Stefan.


We can define two types of transport that we have to handle at the same time
(e.g. in a nested VM we would have both types of transport running together):

- 'host side transport': it runs in the host and is used to communicate with
  the guests of a specific hypervisor (KVM, VMware, or Hyper-V)

  Should we support multiple 'host side transports' running at the same time?

- 'guest side transport': it runs in the guest and is used to communicate
  with the host transport


The main goal is to find a way to decide which transport to use in these cases:
1. connect() / sendto()

	a. use the 'host side transport', if the destination is a guest
	   (dest_cid > VMADDR_CID_HOST).
	   If we want to support multiple 'host side transports' running at the
	   same time, we should assign CIDs uniquely across all transports.
	   In this way, a packet generated by the host side will get directed
	   to the appropriate transport based on the CID.

	b. use the 'guest side transport', if the destination is the host
	   (dest_cid == VMADDR_CID_HOST)


2. listen() / recvfrom()

	a. use the 'host side transport', if the socket is bound to
	   VMADDR_CID_HOST, or if it is bound to VMADDR_CID_ANY and there is no
	   guest transport.
	   We could also define a new VMADDR_CID_LISTEN_FROM_GUEST in order to
	   address this case.
	   If we want to support multiple 'host side transports' running at the
	   same time, we should find a way to allow an application to bind to a
	   specific host transport (e.g. adding new VMADDR_CID_LISTEN_FROM_KVM,
	   VMADDR_CID_LISTEN_FROM_VMWARE, VMADDR_CID_LISTEN_FROM_HYPERV)

	b. use the 'guest side transport', if the socket is bound to a local CID
	   different from VMADDR_CID_HOST (the guest CID obtained with
	   IOCTL_VM_SOCKETS_GET_LOCAL_CID), or if it is bound to VMADDR_CID_ANY
	   (to be backward compatible).
	   Also in this case, we could define a new VMADDR_CID_LISTEN_FROM_HOST.
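
A rough sketch of the selection rules above, in C-like pseudo-code (the enum
and helper names are purely illustrative, not an existing API):

	#include <stdbool.h>
	#include <linux/vm_sockets.h>	/* VMADDR_CID_HOST, VMADDR_CID_ANY */

	enum vsock_transport_kind {
		VSOCK_TRANSPORT_G2H,	/* 'guest side' / guest->host */
		VSOCK_TRANSPORT_H2G,	/* 'host side'  / host->guest */
	};

	/* 1. connect() / sendto(): route by destination CID. */
	static enum vsock_transport_kind vsock_tx_transport(unsigned int dst_cid)
	{
		if (dst_cid > VMADDR_CID_HOST)
			return VSOCK_TRANSPORT_H2G;	/* destination is a guest */
		return VSOCK_TRANSPORT_G2H;		/* destination is the host */
	}

	/* 2. listen() / recvfrom(): route by the CID the socket is bound to.
	 * 'have_g2h' means a guest->host transport is currently registered. */
	static enum vsock_transport_kind vsock_rx_transport(unsigned int bound_cid,
							    bool have_g2h)
	{
		if (bound_cid == VMADDR_CID_HOST)
			return VSOCK_TRANSPORT_H2G;
		if (bound_cid == VMADDR_CID_ANY)
			return have_g2h ? VSOCK_TRANSPORT_G2H : VSOCK_TRANSPORT_H2G;
		/* otherwise the socket is bound to the local guest CID
		 * (IOCTL_VM_SOCKETS_GET_LOCAL_CID) */
		return VSOCK_TRANSPORT_G2H;
	}

With multiple host->guest transports, the H2G branches would additionally look
up the transport that owns the CID, which is why CIDs must be unique across
all registered transports.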

Thanks in advance for your comments and suggestions.

Cheers,
Stefano


* RE: [RFC] vsock: proposal to support multiple transports at runtime
  2019-05-14  8:15 [RFC] vsock: proposal to support multiple transports at runtime Stefano Garzarella
@ 2019-05-16 21:48 ` Dexuan Cui
  2019-05-20 14:44   ` Stefano Garzarella
  2019-05-23 15:37 ` Stefan Hajnoczi
  1 sibling, 1 reply; 9+ messages in thread
From: Dexuan Cui @ 2019-05-16 21:48 UTC (permalink / raw)
  To: Stefano Garzarella, netdev, Stefan Hajnoczi, Jorgen Hansen
  Cc: David S. Miller, Vishnu Dasa, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin

> From: Stefano Garzarella <sgarzare@redhat.com>
> Sent: Tuesday, May 14, 2019 1:16 AM
> To: netdev@vger.kernel.org; Stefan Hajnoczi <stefanha@redhat.com>; Dexuan
> 
> Hi guys,
> I'm currently interested on implement a multi-transport support for VSOCK in
> order to handle nested VMs.

Hi Stefano,
Thanks for reviving the discussion! :-)

I don't know a lot about the details of kvm/vmware sockets, but please let me
share my understanding about them, and let me also share some details about
hyper-v sockets, which I think should be the simplest:

1) For hyper-v sockets, the "host" can only be Windows. We can do nothing on the
Windows host, and I guess we need to do nothing there.

2) For hyper-v sockets, I think we only care about the Linux guest, and the guest can
only talk to the host; a guest cannot talk to another guest running on the same host.

3) On a hyper-v host, if the guest is running kvm/vmware (i.e. nested virtualization),
I think in the "KVM guest" the Linux hyper-v transport driver needs to load so that
the guest can talk to the host (I'm not sure about the "vmware guest" in this case);
the "KVM guest" also needs to load the kvm transport drivers so that it can talk
to its child VMs (I'm not sure about the "vmware guest" in this case).

4) On kvm/vmware, if the guest is a Windows guest, I think we can do nothing in
the guest; if the guest is a Linux guest, I think the kvm/vmware transport drivers
should load; if the Linux guest is running kvm/vmware (nested virtualization), I
think the proper "to child VMs" versions of the kvm/vmware transport drivers
need to load.

Thanks,
-- Dexuan


* Re: [RFC] vsock: proposal to support multiple transports at runtime
  2019-05-16 21:48 ` Dexuan Cui
@ 2019-05-20 14:44   ` Stefano Garzarella
  0 siblings, 0 replies; 9+ messages in thread
From: Stefano Garzarella @ 2019-05-20 14:44 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: netdev, Stefan Hajnoczi, Jorgen Hansen, David S. Miller,
	Vishnu Dasa, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Sasha Levin

Hi Dexuan,

On Thu, May 16, 2019 at 09:48:11PM +0000, Dexuan Cui wrote:
> > From: Stefano Garzarella <sgarzare@redhat.com>
> > Sent: Tuesday, May 14, 2019 1:16 AM
> > To: netdev@vger.kernel.org; Stefan Hajnoczi <stefanha@redhat.com>; Dexuan
> > 
> > Hi guys,
> > I'm currently interested on implement a multi-transport support for VSOCK in
> > order to handle nested VMs.
> 
> Hi Stefano,
> Thanks for reviving the discussion! :-)
> 

You're welcome :)

> I don't know a lot about the details of kvm/vmware sockets, but please let me
> share my understanding about them, and let me also share some details about
> hyper-v sockets, which I think should be the simplest:
> 
> 1) For hyper-v sockets, the "host" can only be Windows. We can do nothing on the
> Windows host, and I guess we need to do nothing there.

I agree that for the Windows host we shouldn't change anything.

> 
> 2) For hyper-v sockets, I think we only care about Linux guest, and the guest can
> only talk to the host; a guest can not talk to another guest running on the same host.

Also with KVM (virtio), a guest can only talk to the host.

> 
> 3) On a hyper-v host, if the guest is running kvm/vmware (i.e. nested virtualization),
> I think in the "KVM guest" the Linux hyper-v transport driver needs to load so that
> the guest can talk to the host (I'm not sure about "vmware guest" in this case); 
> the "KVM guest" also needs to load the kvm transport drivers so that it can talk
> to its child VMs (I'm not sure abut "vmware guest" in this case).

Okay, so since in the "KVM guest" we will have both hyper-v and kvm
transports, we should implement a way to decide which transport to use in
the cases that I described in the first email.

> 
> 4) On kvm/vmware, if the guest is a Windows guest, I think we can do nothing in
> the guest;

Yes, the driver in the Windows guest shouldn't change.

> if the guest is Linux guest, I think the kvm/vmware transport drivers
> should load; if the Linux guest is running kvm/vmware (nested virtualization), I
> think the proper "to child VMs" versions of the kvm/vmware transport drivers
> need to load.

Exactly, and on the KVM side that is the vhost-vsock driver. So, as in
point 3, we should support at least two transports running in Linux at the
same time.

Thank you very much for sharing this information!

Cheers,
Stefano


* Re: [RFC] vsock: proposal to support multiple transports at runtime
  2019-05-14  8:15 [RFC] vsock: proposal to support multiple transports at runtime Stefano Garzarella
  2019-05-16 21:48 ` Dexuan Cui
@ 2019-05-23 15:37 ` Stefan Hajnoczi
  2019-05-27 10:44   ` Stefano Garzarella
  1 sibling, 1 reply; 9+ messages in thread
From: Stefan Hajnoczi @ 2019-05-23 15:37 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, Dexuan Cui, Jorgen Hansen, David S. Miller, Vishnu Dasa,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin


On Tue, May 14, 2019 at 10:15:43AM +0200, Stefano Garzarella wrote:
> Hi guys,
> I'm currently interested on implement a multi-transport support for VSOCK in
> order to handle nested VMs.
> 
> As Stefan suggested me, I started to look at this discussion:
> https://lkml.org/lkml/2017/8/17/551
> Below I tried to summarize a proposal for a discussion, following the ideas
> from Dexuan, Jorgen, and Stefan.
> 
> 
> We can define two types of transport that we have to handle at the same time
> (e.g. in a nested VM we would have both types of transport running together):
> 
> - 'host side transport', it runs in the host and it is used to communicate with
>   the guests of a specific hypervisor (KVM, VMWare or HyperV)
> 
>   Should we support multiple 'host side transport' running at the same time?
> 
> - 'guest side transport'. it runs in the guest and it is used to communicate
>   with the host transport

I find this terminology confusing.  Perhaps "host->guest" (your 'host
side transport') and "guest->host" (your 'guest side transport') is
clearer?

Or maybe the nested virtualization terminology of L2 transport (your
'host side transport') and L0 transport (your 'guest side transport')?
Here we are the L1 guest and L0 is the host and L2 is our nested guest.

> 
> 
> The main goal is to find a way to decide what transport use in these cases:
> 1. connect() / sendto()
> 
> 	a. use the 'host side transport', if the destination is the guest
> 	   (dest_cid > VMADDR_CID_HOST).
> 	   If we want to support multiple 'host side transport' running at the
> 	   same time, we should assign CIDs uniquely across all transports.
> 	   In this way, a packet generated by the host side will get directed
> 	   to the appropriate transport based on the CID

The multiple host side transport case is unlikely to be necessary on x86
where only one hypervisor uses VMX at any given time.  But eventually it
may happen so it's wise to at least allow it in the design.

> 
> 	b. use the 'guest side transport', if the destination is the host
> 	   (dest_cid == VMADDR_CID_HOST)

Makes sense to me.

> 
> 
> 2. listen() / recvfrom()
> 
> 	a. use the 'host side transport', if the socket is bound to
> 	   VMADDR_CID_HOST, or it is bound to VMADDR_CID_ANY and there is no
> 	   guest transport.
> 	   We could also define a new VMADDR_CID_LISTEN_FROM_GUEST in order to
> 	   address this case.
> 	   If we want to support multiple 'host side transport' running at the
> 	   same time, we should find a way to allow an application to bound a
> 	   specific host transport (e.g. adding new VMADDR_CID_LISTEN_FROM_KVM,
> 	   VMADDR_CID_LISTEN_FROM_VMWARE, VMADDR_CID_LISTEN_FROM_HYPERV)

Hmm...VMADDR_CID_LISTEN_FROM_KVM, VMADDR_CID_LISTEN_FROM_VMWARE,
VMADDR_CID_LISTEN_FROM_HYPERV isn't very flexible.  What if my service
should only be available to a subset of VMware VMs?

Instead it might be more appropriate to use network namespaces to create
independent AF_VSOCK addressing domains.  Then you could have two
separate groups of VMware VMs and selectively listen to just one group.
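
For example, a service that wants to accept connections from any peer binds
to VMADDR_CID_ANY today; with per-namespace addressing domains, running the
same (unchanged) program in different namespaces would make it reachable only
by the VMs assigned to that namespace.  A minimal listener, for illustration
(port 1234 is arbitrary):

	#include <stdio.h>
	#include <unistd.h>
	#include <sys/socket.h>
	#include <linux/vm_sockets.h>

	int main(void)
	{
		struct sockaddr_vm addr = {
			.svm_family = AF_VSOCK,
			.svm_cid    = VMADDR_CID_ANY,	/* accept from any peer */
			.svm_port   = 1234,
		};
		int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

		if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
		    listen(fd, 1) < 0) {
			perror("vsock listen");
			return 1;
		}
		/* accept() loop elided: the point is that the listener itself
		 * names no transport, so the namespace (or the bound CID) would
		 * decide which peers can reach it. */
		close(fd);
		return 0;
	}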

> 
> 	b. use the 'guest side transport', if the socket is bound to local CID
> 	   different from the VMADDR_CID_HOST (guest CID get with
> 	   IOCTL_VM_SOCKETS_GET_LOCAL_CID), or it is bound to VMADDR_CID_ANY
> 	   (to be backward compatible).
> 	   Also in this case, we could define a new VMADDR_CID_LISTEN_FROM_HOST.

Two additional topics:

1. How will loading af_vsock.ko change?  In particular, can an
   application create a socket in af_vsock.ko without any loaded
   transport?  Can it enter listen state without any loaded transport
   (this seems useful with VMADDR_CID_ANY)?

2. Does your proposed behavior match VMware's existing nested vsock
   semantics?



* Re: [RFC] vsock: proposal to support multiple transports at runtime
  2019-05-23 15:37 ` Stefan Hajnoczi
@ 2019-05-27 10:44   ` Stefano Garzarella
  2019-05-28 16:01     ` Jorgen Hansen
  0 siblings, 1 reply; 9+ messages in thread
From: Stefano Garzarella @ 2019-05-27 10:44 UTC (permalink / raw)
  To: Stefan Hajnoczi, Jorgen Hansen
  Cc: netdev, Dexuan Cui, David S. Miller, Vishnu Dasa,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin

On Thu, May 23, 2019 at 04:37:03PM +0100, Stefan Hajnoczi wrote:
> On Tue, May 14, 2019 at 10:15:43AM +0200, Stefano Garzarella wrote:
> > Hi guys,
> > I'm currently interested on implement a multi-transport support for VSOCK in
> > order to handle nested VMs.
> > 
> > As Stefan suggested me, I started to look at this discussion:
> > https://lkml.org/lkml/2017/8/17/551
> > Below I tried to summarize a proposal for a discussion, following the ideas
> > from Dexuan, Jorgen, and Stefan.
> > 
> > 
> > We can define two types of transport that we have to handle at the same time
> > (e.g. in a nested VM we would have both types of transport running together):
> > 
> > - 'host side transport', it runs in the host and it is used to communicate with
> >   the guests of a specific hypervisor (KVM, VMWare or HyperV)
> > 
> >   Should we support multiple 'host side transport' running at the same time?
> > 
> > - 'guest side transport'. it runs in the guest and it is used to communicate
> >   with the host transport
> 
> I find this terminology confusing.  Perhaps "host->guest" (your 'host
> side transport') and "guest->host" (your 'guest side transport') is
> clearer?

I agree, "host->guest" and "guest->host" are better, I'll use them.

> 
> Or maybe the nested virtualization terminology of L2 transport (your
> 'host side transport') and L0 transport (your 'guest side transport')?
> Here we are the L1 guest and L0 is the host and L2 is our nested guest.
>

I'm confused, if L2 is the nested guest, it should be the
'guest side transport'. Did I miss anything?

Maybe that is another point in favour of your first proposal :)

> > 
> > 
> > The main goal is to find a way to decide what transport use in these cases:
> > 1. connect() / sendto()
> > 
> > 	a. use the 'host side transport', if the destination is the guest
> > 	   (dest_cid > VMADDR_CID_HOST).
> > 	   If we want to support multiple 'host side transport' running at the
> > 	   same time, we should assign CIDs uniquely across all transports.
> > 	   In this way, a packet generated by the host side will get directed
> > 	   to the appropriate transport based on the CID
> 
> The multiple host side transport case is unlikely to be necessary on x86
> where only one hypervisor uses VMX at any given time.  But eventually it
> may happen so it's wise to at least allow it in the design.
> 

Okay, I was in doubt, but I'll keep it in the design.

> > 
> > 	b. use the 'guest side transport', if the destination is the host
> > 	   (dest_cid == VMADDR_CID_HOST)
> 
> Makes sense to me.
> 
> > 
> > 
> > 2. listen() / recvfrom()
> > 
> > 	a. use the 'host side transport', if the socket is bound to
> > 	   VMADDR_CID_HOST, or it is bound to VMADDR_CID_ANY and there is no
> > 	   guest transport.
> > 	   We could also define a new VMADDR_CID_LISTEN_FROM_GUEST in order to
> > 	   address this case.
> > 	   If we want to support multiple 'host side transport' running at the
> > 	   same time, we should find a way to allow an application to bound a
> > 	   specific host transport (e.g. adding new VMADDR_CID_LISTEN_FROM_KVM,
> > 	   VMADDR_CID_LISTEN_FROM_VMWARE, VMADDR_CID_LISTEN_FROM_HYPERV)
> 
> Hmm...VMADDR_CID_LISTEN_FROM_KVM, VMADDR_CID_LISTEN_FROM_VMWARE,
> VMADDR_CID_LISTEN_FROM_HYPERV isn't very flexible.  What if my service
> should only be available to a subset of VMware VMs?

You're right, it is not very flexible.

> 
> Instead it might be more appropriate to use network namespaces to create
> independent AF_VSOCK addressing domains.  Then you could have two
> separate groups of VMware VMs and selectively listen to just one group.
> 

Does AF_VSOCK support network namespaces, or would that be another
improvement to take care of? (IIUC they are not currently supported)

A possible issue that I'm seeing with netns is that, if they are used for
other purposes (e.g. to isolate the network of a VM), we would need
multiple instances of the application, one per netns.

> > 
> > 	b. use the 'guest side transport', if the socket is bound to local CID
> > 	   different from the VMADDR_CID_HOST (guest CID get with
> > 	   IOCTL_VM_SOCKETS_GET_LOCAL_CID), or it is bound to VMADDR_CID_ANY
> > 	   (to be backward compatible).
> > 	   Also in this case, we could define a new VMADDR_CID_LISTEN_FROM_HOST.
> 
> Two additional topics:
> 
> 1. How will loading af_vsock.ko change?

I'd allow the loading of af_vsock.ko without any transport.
Maybe we should move the MODULE_ALIAS_NETPROTO(PF_VSOCK) from
vmci_transport.ko to af_vsock.ko, but this could impact the VMware
driver.
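
For context, the alias in question is the single line below; it currently
lives in the VMCI transport, and the idea (still to be checked against the
existing VMware products) would simply be to move it into the core module so
that socket(AF_VSOCK, ...) auto-loads af_vsock.ko regardless of which
transport is installed:

	/* today in net/vmw_vsock/vmci_transport.c */
	MODULE_ALIAS_NETPROTO(PF_VSOCK);	/* roughly MODULE_ALIAS("net-pf-40") */

	/* proposal: move the same line into net/vmw_vsock/af_vsock.c */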

>    In particular, can an
>    application create a socket in af_vsock.ko without any loaded
>    transport?  Can it enter listen state without any loaded transport
>    (this seems useful with VMADDR_CID_ANY)?

I'll check if we can allow listen sockets without any loaded transport,
but I think it could be a nice behaviour to have.

> 
> 2. Does your proposed behavior match VMware's existing nested vsock
>    semantics?

I'm not sure, but I tried to follow Jorgen's answers in the original
thread. I hope that this proposal matches the VMware semantics.

@Jorgen, do you have any advice?

Thanks,
Stefano


* Re: [RFC] vsock: proposal to support multiple transports at runtime
  2019-05-27 10:44   ` Stefano Garzarella
@ 2019-05-28 16:01     ` Jorgen Hansen
  2019-05-30 11:19       ` Stefano Garzarella
  0 siblings, 1 reply; 9+ messages in thread
From: Jorgen Hansen @ 2019-05-28 16:01 UTC (permalink / raw)
  To: Stefano Garzarella, Stefan Hajnoczi
  Cc: netdev, Dexuan Cui, David S. Miller, Vishnu DASA,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin

> On Thu, May 23, 2019 at 04:37:03PM +0100, Stefan Hajnoczi wrote:
> > On Tue, May 14, 2019 at 10:15:43AM +0200, Stefano Garzarella wrote:
> > > Hi guys,
> > > I'm currently interested on implement a multi-transport support for VSOCK in
> > > order to handle nested VMs.

Thanks for picking this up!

> > >
> > > As Stefan suggested me, I started to look at this discussion:
> > > https://lkml.org/lkml/2017/8/17/551
> > > Below I tried to summarize a proposal for a discussion, following the ideas
> > > from Dexuan, Jorgen, and Stefan.
> > >
> > >
> > > We can define two types of transport that we have to handle at the same time
> > > (e.g. in a nested VM we would have both types of transport running together):
> > >
> > > - 'host side transport', it runs in the host and it is used to communicate with
> > >   the guests of a specific hypervisor (KVM, VMWare or HyperV)
> > >
> > >   Should we support multiple 'host side transport' running at the same time?
> > >
> > > - 'guest side transport'. it runs in the guest and it is used to communicate
> > >   with the host transport
> >
> > I find this terminology confusing.  Perhaps "host->guest" (your 'host
> > side transport') and "guest->host" (your 'guest side transport') is
> > clearer?
>
> I agree, "host->guest" and "guest->host" are better, I'll use them.
>
> >
> > Or maybe the nested virtualization terminology of L2 transport (your
> > 'host side transport') and L0 transport (your 'guest side transport')?
> > Here we are the L1 guest and L0 is the host and L2 is our nested guest.
> >
>
> I'm confused, if L2 is the nested guest, it should be the
> 'guest side transport'. Did I miss anything?
>
> Maybe it is another point to your first proposal :)
>
> > >
> > >
> > > The main goal is to find a way to decide what transport use in these cases:
> > > 1. connect() / sendto()
> > >
> > >     a. use the 'host side transport', if the destination is the guest
> > >        (dest_cid > VMADDR_CID_HOST).
> > >        If we want to support multiple 'host side transport' running at the
> > >        same time, we should assign CIDs uniquely across all transports.
> > >        In this way, a packet generated by the host side will get directed
> > >        to the appropriate transport based on the CID
> >
> > The multiple host side transport case is unlikely to be necessary on x86
> > where only one hypervisor uses VMX at any given time.  But eventually it
> > may happen so it's wise to at least allow it in the design.
> >
>
> Okay, I was in doubt, but I'll keep it in the design.
>
> > >
> > >     b. use the 'guest side transport', if the destination is the host
> > >        (dest_cid == VMADDR_CID_HOST)
> >
> > Makes sense to me.
> >

Agreed. With the addition that VMADDR_CID_HYPERVISOR is also routed as "guest->host/guest side transport".
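
In terms of the connect()/sendto() rule, that just makes case 1.b explicitly
cover both well-known CIDs (sketch, with illustrative names):

	/* 1.b amended: both well-known destinations use the guest->host path */
	if (dst_cid == VMADDR_CID_HOST || dst_cid == VMADDR_CID_HYPERVISOR)
		return VSOCK_TRANSPORT_G2H;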

>> >
>> >
>> > 2. listen() / recvfrom()
> > >
>> >     a. use the 'host side transport', if the socket is bound to
> > >        VMADDR_CID_HOST, or it is bound to VMADDR_CID_ANY and there is no
> > >        guest transport.
> > >        We could also define a new VMADDR_CID_LISTEN_FROM_GUEST in order to
> > >        address this case.
> > >        If we want to support multiple 'host side transport' running at the
> > >        same time, we should find a way to allow an application to bound a
> > >        specific host transport (e.g. adding new VMADDR_CID_LISTEN_FROM_KVM,
> > >        VMADDR_CID_LISTEN_FROM_VMWARE, VMADDR_CID_LISTEN_FROM_HYPERV)
> >
> > Hmm...VMADDR_CID_LISTEN_FROM_KVM, VMADDR_CID_LISTEN_FROM_VMWARE,
> > VMADDR_CID_LISTEN_FROM_HYPERV isn't very flexible.  What if my service
> > should only be available to a subset of VMware VMs?
>
> You're right, it is not very flexible.

When I was last looking at this, I was considering a proposal where the
incoming traffic would determine which transport to use for CID_ANY in the
case of multiple transports. For stream sockets, we already have a shared
port space, so if we receive a connection request for <port N, CID_ANY>,
that connection would use the transport of the incoming request. The
transport could either be a host->guest transport or the guest->host
transport. This is a bit harder to do for datagrams since the VSOCK port
is decided by the transport itself today. For VMCI, a VMCI datagram handler
is allocated for each datagram socket, and the ID of that handler is used
as the port. So we would potentially have to register the same datagram
port with all transports.
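
A rough sketch of that idea for stream sockets (vsock_find_bound_socket()
and vsock_new_child_connection() are invented names, not the current code):

	/* An incoming connection request arrives from transport 't', which may
	 * be a host->guest transport or the guest->host transport. */
	static void vsock_handle_request(struct vsock_transport *t,
					 unsigned int src_cid,
					 unsigned int src_port,
					 unsigned int dst_port)
	{
		/* The port space is shared, so <VMADDR_CID_ANY, dst_port> can
		 * match a request coming in over any transport. */
		struct vsock_sock *listener =
			vsock_find_bound_socket(VMADDR_CID_ANY, dst_port);

		if (!listener)
			return;		/* no listener: reset the request */

		/* The child connection inherits the transport the request
		 * arrived on, rather than anything chosen at bind() time. */
		vsock_new_child_connection(listener, t, src_cid, src_port);
	}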

The use of network namespaces would be complementary to this, and could be
used to partition VMs between hypervisors or at a finer granularity. This
could also be used to isolate host applications from guest applications
using the same ports with CID_ANY, if necessary.

>
> >
> > Instead it might be more appropriate to use network namespaces to create
> > independent AF_VSOCK addressing domains.  Then you could have two
> > separate groups of VMware VMs and selectively listen to just one group.
> >
>
> Does AF_VSOCK support network namespace or it could be another
> improvement to take care? (IIUC is not currently supported)
>
> A possible issue that I'm seeing with netns is if they are used for
> other purpose (e.g. to isolate the network of a VM), we should have
> multiple instances of the application, one per netns.
>
> > >
> > >     b. use the 'guest side transport', if the socket is bound to local CID
> > >        different from the VMADDR_CID_HOST (guest CID get with
> > >        IOCTL_VM_SOCKETS_GET_LOCAL_CID), or it is bound to VMADDR_CID_ANY
> > >        (to be backward compatible).
> > >        Also in this case, we could define a new VMADDR_CID_LISTEN_FROM_HOST.
> >
> > Two additional topics:
> >
> > 1. How will loading af_vsock.ko change?
>
> I'd allow the loading of af_vsock.ko without any transport.
> Maybe we should move the MODULE_ALIAS_NETPROTO(PF_VSOCK) from the
> vmci_transport.ko to the af_vsock.ko, but this can impact the VMware
> driver.

As I remember it, this will impact the existing VMware products. I'll have to double check that.

>
> >    In particular, can an
> >    application create a socket in af_vsock.ko without any loaded
> >    transport?  Can it enter listen state without any loaded transport
> >    (this seems useful with VMADDR_CID_ANY)?
>
> I'll check if we can allow listen sockets without any loaded transport,
> but I think could be a nice behaviour to have.
>
> >
> > 2. Does your proposed behavior match VMware's existing nested vsock
> >    semantics?
>
> I'm not sure, but I tried to follow the Jorgen's answers to the original
> thread. I hope that this proposal matches the VMware semantic.

Yes, the semantics should be preserved.

Thanks,
Jorgen


* Re: [RFC] vsock: proposal to support multiple transports at runtime
  2019-05-28 16:01     ` Jorgen Hansen
@ 2019-05-30 11:19       ` Stefano Garzarella
  2019-05-31  9:24         ` Jorgen Hansen
  0 siblings, 1 reply; 9+ messages in thread
From: Stefano Garzarella @ 2019-05-30 11:19 UTC (permalink / raw)
  To: Jorgen Hansen
  Cc: Stefan Hajnoczi, netdev, Dexuan Cui, David S. Miller,
	Vishnu DASA, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Sasha Levin

On Tue, May 28, 2019 at 04:01:00PM +0000, Jorgen Hansen wrote:
> > On Thu, May 23, 2019 at 04:37:03PM +0100, Stefan Hajnoczi wrote:
> > > On Tue, May 14, 2019 at 10:15:43AM +0200, Stefano Garzarella wrote:
> > > > Hi guys,
> > > > I'm currently interested on implement a multi-transport support for VSOCK in
> > > > order to handle nested VMs.
> 
> Thanks for picking this up!
> 

:)

> > > >
> > > > As Stefan suggested me, I started to look at this discussion:
> > > > https://lkml.org/lkml/2017/8/17/551
> > > > Below I tried to summarize a proposal for a discussion, following the ideas
> > > > from Dexuan, Jorgen, and Stefan.
> > > >
> > > >
> > > > We can define two types of transport that we have to handle at the same time
> > > > (e.g. in a nested VM we would have both types of transport running together):
> > > >
> > > > - 'host side transport', it runs in the host and it is used to communicate with
> > > >   the guests of a specific hypervisor (KVM, VMWare or HyperV)
> > > >
> > > >   Should we support multiple 'host side transport' running at the same time?
> > > >
> > > > - 'guest side transport'. it runs in the guest and it is used to communicate
> > > >   with the host transport
> > >
> > > I find this terminology confusing.  Perhaps "host->guest" (your 'host
> > > side transport') and "guest->host" (your 'guest side transport') is
> > > clearer?
> >
> > I agree, "host->guest" and "guest->host" are better, I'll use them.
> >
> > >
> > > Or maybe the nested virtualization terminology of L2 transport (your
> > > 'host side transport') and L0 transport (your 'guest side transport')?
> > > Here we are the L1 guest and L0 is the host and L2 is our nested guest.
> > >
> >
> > I'm confused, if L2 is the nested guest, it should be the
> > 'guest side transport'. Did I miss anything?
> >
> > Maybe it is another point to your first proposal :)
> >
> > > >
> > > >
> > > > The main goal is to find a way to decide what transport use in these cases:
> > > > 1. connect() / sendto()
> > > >
> > > >     a. use the 'host side transport', if the destination is the guest
> > > >        (dest_cid > VMADDR_CID_HOST).
> > > >        If we want to support multiple 'host side transport' running at the
> > > >        same time, we should assign CIDs uniquely across all transports.
> > > >        In this way, a packet generated by the host side will get directed
> > > >        to the appropriate transport based on the CID
> > >
> > > The multiple host side transport case is unlikely to be necessary on x86
> > > where only one hypervisor uses VMX at any given time.  But eventually it
> > > may happen so it's wise to at least allow it in the design.
> > >
> >
> > Okay, I was in doubt, but I'll keep it in the design.
> >
> > > >
> > > >     b. use the 'guest side transport', if the destination is the host
> > > >        (dest_cid == VMADDR_CID_HOST)
> > >
> > > Makes sense to me.
> > >
> 
> Agreed. With the addition that VMADDR_CID_HYPERVISOR is also routed as
> "guest->host/guest side transport".
> 

Yes, I had it in mind, but I forgot to write it in the proposal.

> >> >
> >> >
> >> > 2. listen() / recvfrom()
> > > >
> >> >     a. use the 'host side transport', if the socket is bound to
> > > >        VMADDR_CID_HOST, or it is bound to VMADDR_CID_ANY and there is no
> > > >        guest transport.
> > > >        We could also define a new VMADDR_CID_LISTEN_FROM_GUEST in order to
> > > >        address this case.
> > > >        If we want to support multiple 'host side transport' running at the
> > > >        same time, we should find a way to allow an application to bound a
> > > >        specific host transport (e.g. adding new VMADDR_CID_LISTEN_FROM_KVM,
> > > >        VMADDR_CID_LISTEN_FROM_VMWARE, VMADDR_CID_LISTEN_FROM_HYPERV)
> > >
> > > Hmm...VMADDR_CID_LISTEN_FROM_KVM, VMADDR_CID_LISTEN_FROM_VMWARE,
> > > VMADDR_CID_LISTEN_FROM_HYPERV isn't very flexible.  What if my service
> > > should only be available to a subset of VMware VMs?
> >
> > You're right, it is not very flexible.
> 
> When I was last looking at this, I was considering a proposal where
> the incoming traffic would determine which transport to use for
> CID_ANY in the case of multiple transports. For stream sockets, we
> already have a shared port space, so if we receive a connection
> request for < port N, CID_ANY>, that connection would use the
> transport of the incoming request. The transport could either be a
> host->guest transport or the guest->host transport. This is a bit
> harder to do for datagrams since the VSOCK port is decided by the
> transport itself today. For VMCI, a VMCI datagram handler is allocated
> for each datagram socket, and the ID of that handler is used as the
> port. So we would potentially have to register the same datagram port
> with all transports.

So, do you think we should implement a shared port space also for
datagram sockets?

For now only the VMware implementation supports datagram sockets, but in
the future we could also support them on KVM and Hyper-V, so I think we
should consider this in the proposal.

> 
> The use of network namespaces would be complimentary to this, and
> could be used to partition VMs between hypervisors or at a finer
> granularity. This could also be used to isolate host applications from
> guest applications using the same ports with CID_ANY if necessary.
> 

Another point in favour of netns support; I'll put it in the proposal (or it
could go in parallel with the multi-transport support).

> >
> > >
> > > Instead it might be more appropriate to use network namespaces to create
> > > independent AF_VSOCK addressing domains.  Then you could have two
> > > separate groups of VMware VMs and selectively listen to just one group.
> > >
> >
> > Does AF_VSOCK support network namespace or it could be another
> > improvement to take care? (IIUC is not currently supported)
> >
> > A possible issue that I'm seeing with netns is if they are used for
> > other purpose (e.g. to isolate the network of a VM), we should have
> > multiple instances of the application, one per netns.
> >
> > > >
> > > >     b. use the 'guest side transport', if the socket is bound to local CID
> > > >        different from the VMADDR_CID_HOST (guest CID get with
> > > >        IOCTL_VM_SOCKETS_GET_LOCAL_CID), or it is bound to VMADDR_CID_ANY
> > > >        (to be backward compatible).
> > > >        Also in this case, we could define a new VMADDR_CID_LISTEN_FROM_HOST.
> > >
> > > Two additional topics:
> > >
> > > 1. How will loading af_vsock.ko change?
> >
> > I'd allow the loading of af_vsock.ko without any transport.
> > Maybe we should move the MODULE_ALIAS_NETPROTO(PF_VSOCK) from the
> > vmci_transport.ko to the af_vsock.ko, but this can impact the VMware
> > driver.
> 
> As I remember it, this will impact the existing VMware products. I'll
> have to double check that.
> 

Thanks! Let me know, because I think it could be better if we can move it
to af_vsock.ko, in order to be more agnostic of the transport used.

> >
> > >    In particular, can an
> > >    application create a socket in af_vsock.ko without any loaded
> > >    transport?  Can it enter listen state without any loaded transport
> > >    (this seems useful with VMADDR_CID_ANY)?
> >
> > I'll check if we can allow listen sockets without any loaded transport,
> > but I think could be a nice behaviour to have.
> >
> > >
> > > 2. Does your proposed behavior match VMware's existing nested vsock
> > >    semantics?
> >
> > I'm not sure, but I tried to follow the Jorgen's answers to the original
> > thread. I hope that this proposal matches the VMware semantic.
> 
> Yes, the semantics should be preserved.

Thank you very much,
Stefano


* Re: [RFC] vsock: proposal to support multiple transports at runtime
  2019-05-30 11:19       ` Stefano Garzarella
@ 2019-05-31  9:24         ` Jorgen Hansen
  2019-06-03 10:49           ` Stefano Garzarella
  0 siblings, 1 reply; 9+ messages in thread
From: Jorgen Hansen @ 2019-05-31  9:24 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Stefan Hajnoczi, netdev, Dexuan Cui, David S. Miller,
	Vishnu DASA, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Sasha Levin

On 30 May 2019, at 13:19, Stefano Garzarella <sgarzare@redhat.com> wrote:
> 
> On Tue, May 28, 2019 at 04:01:00PM +0000, Jorgen Hansen wrote:
>>> On Thu, May 23, 2019 at 04:37:03PM +0100, Stefan Hajnoczi wrote:
>>>> On Tue, May 14, 2019 at 10:15:43AM +0200, Stefano Garzarella wrote:
> 
>>>>> 
>>>>> 
>>>>> 2. listen() / recvfrom()
>>>>> 
>>>>>    a. use the 'host side transport', if the socket is bound to
>>>>>       VMADDR_CID_HOST, or it is bound to VMADDR_CID_ANY and there is no
>>>>>       guest transport.
>>>>>       We could also define a new VMADDR_CID_LISTEN_FROM_GUEST in order to
>>>>>       address this case.
>>>>>       If we want to support multiple 'host side transport' running at the
>>>>>       same time, we should find a way to allow an application to bound a
>>>>>       specific host transport (e.g. adding new VMADDR_CID_LISTEN_FROM_KVM,
>>>>>       VMADDR_CID_LISTEN_FROM_VMWARE, VMADDR_CID_LISTEN_FROM_HYPERV)
>>>> 
>>>> Hmm...VMADDR_CID_LISTEN_FROM_KVM, VMADDR_CID_LISTEN_FROM_VMWARE,
>>>> VMADDR_CID_LISTEN_FROM_HYPERV isn't very flexible.  What if my service
>>>> should only be available to a subset of VMware VMs?
>>> 
>>> You're right, it is not very flexible.
>> 
>> When I was last looking at this, I was considering a proposal where
>> the incoming traffic would determine which transport to use for
>> CID_ANY in the case of multiple transports. For stream sockets, we
>> already have a shared port space, so if we receive a connection
>> request for < port N, CID_ANY>, that connection would use the
>> transport of the incoming request. The transport could either be a
>> host->guest transport or the guest->host transport. This is a bit
>> harder to do for datagrams since the VSOCK port is decided by the
>> transport itself today. For VMCI, a VMCI datagram handler is allocated
>> for each datagram socket, and the ID of that handler is used as the
>> port. So we would potentially have to register the same datagram port
>> with all transports.
> 
> So, do you think we should implement a shared port space also for
> datagram sockets?

Yes, having the two socket types work the same way seems cleaner to me. We
should at least cover it in the design.

> For now only the VMWare implementation supports the datagram sockets,
> but in the future we could support it also on KVM and HyperV, so I think
> we should consider it in this proposal.

So for now, it sounds like we could make the VMCI transport the default
transport for any host side datagram socket, then.
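
As a sketch of that interim rule (helper name invented):

	/* Host-side datagram socket, destination is a guest: until the other
	 * transports support SOCK_DGRAM, fall back to VMCI if it is loaded. */
	if (sk->sk_type == SOCK_DGRAM && dst_cid > VMADDR_CID_HOST)
		transport = vsock_vmci_dgram_transport();	/* may be NULL */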

>> 
>> The use of network namespaces would be complimentary to this, and
>> could be used to partition VMs between hypervisors or at a finer
>> granularity. This could also be used to isolate host applications from
>> guest applications using the same ports with CID_ANY if necessary.
>> 
> 
> Another point to the netns support, I'll put it in the proposal (or it
> could go in parallel with the multi-transport support).
> 

It should be fine to put in the proposal that we rely on namespaces to
provide this support, but pursue namespaces as a separate project.

Thanks,
Jorgen


* Re: [RFC] vsock: proposal to support multiple transports at runtime
  2019-05-31  9:24         ` Jorgen Hansen
@ 2019-06-03 10:49           ` Stefano Garzarella
  0 siblings, 0 replies; 9+ messages in thread
From: Stefano Garzarella @ 2019-06-03 10:49 UTC (permalink / raw)
  To: Jorgen Hansen
  Cc: Stefan Hajnoczi, netdev, Dexuan Cui, David S. Miller,
	Vishnu DASA, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Sasha Levin

On Fri, May 31, 2019 at 09:24:49AM +0000, Jorgen Hansen wrote:
> On 30 May 2019, at 13:19, Stefano Garzarella <sgarzare@redhat.com> wrote:
> > 
> > On Tue, May 28, 2019 at 04:01:00PM +0000, Jorgen Hansen wrote:
> >>> On Thu, May 23, 2019 at 04:37:03PM +0100, Stefan Hajnoczi wrote:
> >>>> On Tue, May 14, 2019 at 10:15:43AM +0200, Stefano Garzarella wrote:
> > 
> >>>>> 
> >>>>> 
> >>>>> 2. listen() / recvfrom()
> >>>>> 
> >>>>>    a. use the 'host side transport', if the socket is bound to
> >>>>>       VMADDR_CID_HOST, or it is bound to VMADDR_CID_ANY and there is no
> >>>>>       guest transport.
> >>>>>       We could also define a new VMADDR_CID_LISTEN_FROM_GUEST in order to
> >>>>>       address this case.
> >>>>>       If we want to support multiple 'host side transport' running at the
> >>>>>       same time, we should find a way to allow an application to bound a
> >>>>>       specific host transport (e.g. adding new VMADDR_CID_LISTEN_FROM_KVM,
> >>>>>       VMADDR_CID_LISTEN_FROM_VMWARE, VMADDR_CID_LISTEN_FROM_HYPERV)
> >>>> 
> >>>> Hmm...VMADDR_CID_LISTEN_FROM_KVM, VMADDR_CID_LISTEN_FROM_VMWARE,
> >>>> VMADDR_CID_LISTEN_FROM_HYPERV isn't very flexible.  What if my service
> >>>> should only be available to a subset of VMware VMs?
> >>> 
> >>> You're right, it is not very flexible.
> >> 
> >> When I was last looking at this, I was considering a proposal where
> >> the incoming traffic would determine which transport to use for
> >> CID_ANY in the case of multiple transports. For stream sockets, we
> >> already have a shared port space, so if we receive a connection
> >> request for < port N, CID_ANY>, that connection would use the
> >> transport of the incoming request. The transport could either be a
> >> host->guest transport or the guest->host transport. This is a bit
> >> harder to do for datagrams since the VSOCK port is decided by the
> >> transport itself today. For VMCI, a VMCI datagram handler is allocated
> >> for each datagram socket, and the ID of that handler is used as the
> >> port. So we would potentially have to register the same datagram port
> >> with all transports.
> > 
> > So, do you think we should implement a shared port space also for
> > datagram sockets?
> 
> Yes, having the two socket types work the same way seems cleaner to me. We should at least cover it in the design.
> 

Okay, I'll add this point in a v2 of this proposal!

> > For now only the VMWare implementation supports the datagram sockets,
> > but in the future we could support it also on KVM and HyperV, so I think
> > we should consider it in this proposal.
> 
> So for now, it sounds like we could make the VMCI transport the default transport for any host side datagram socket, then.
> 

Yes, makes sense.

> >> 
> >> The use of network namespaces would be complimentary to this, and
> >> could be used to partition VMs between hypervisors or at a finer
> >> granularity. This could also be used to isolate host applications from
> >> guest applications using the same ports with CID_ANY if necessary.
> >> 
> > 
> > Another point to the netns support, I'll put it in the proposal (or it
> > could go in parallel with the multi-transport support).
> > 
> 
> It should be fine to put in the proposal that we rely on namespaces to provide this support, but pursue namespaces as a separate project.

Sure.

I'll send a v2 adding all the points discussed to be sure that we are
aligned. Then I'll start working on it if we agree on the proposal.

Thanks,
Stefano

