From: Stefan Hajnoczi <stefanha@redhat.com>
To: Dexuan Cui <decui@microsoft.com>
Cc: "Jorgen S. Hansen" <jhansen@vmware.com>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"devel@linuxdriverproject.org" <devel@linuxdriverproject.org>,
	KY Srinivasan <kys@microsoft.com>,
	Haiyang Zhang <haiyangz@microsoft.com>,
	Stephen Hemminger <sthemmin@microsoft.com>,
	George Zhang <georgezhang@vmware.com>,
	Michal Kubecek <mkubecek@suse.cz>, Asias He <asias@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Cathy Avery <cavery@redhat.com>,
	"jasowang@redhat.com" <jasowang@redhat.com>,
	Rolf Neugebauer <rolf.neugebauer@docker.com>,
	Dave Scott <dave.scott@docker.com>,
	Marcelo Cerri <marcelo.cerri@canonical.com>,
	"apw@canonical.com" <apw@canonical.com>,
	"olaf@aepfle.de" <olaf@aepfle.de>,
	"joe@perches.com" <joe@perches.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Dan Carpenter <dan.carpenter@oracle.com>
Subject: Re: [PATCH] vsock: only load vmci transport on VMware hypervisor by default
Date: Fri, 18 Aug 2017 16:37:16 +0100
Message-ID: <20170818153716.GB17572@stefanha-x1.localdomain>
In-Reply-To: <KL1P15301MB0008A67BF969A47309CFA992BF800@KL1P15301MB0008.APCP153.PROD.OUTLOOK.COM>

On Fri, Aug 18, 2017 at 03:07:30AM +0000, Dexuan Cui wrote:
> > From: Jorgen S. Hansen [mailto:jhansen@vmware.com]
> > Sent: Thursday, August 17, 2017 08:17
> > >
> > > Putting aside nested virtualization, I want to load the transport (vmci,
> > > Hyper-V, vsock) for which there is paravirtualized hardware present
> > > inside the guest.
> > 
> > Good points. Completely agree that this is the desired behavior for a guest.
> > 
> > 
> > > It's a little trickier on the host side (doesn't matter for Hyper-V and
> > > probably also doesn't for VMware) because the host-side driver is a
> > > software device with no hardware backing it.  In KVM we assume the
> > > vhost_vsock.ko kernel module will be loaded sufficiently early.
> > 
> > Since the vmci driver is currently tied to PF_VSOCK it hasn’t been a problem,
> > but on the host side the VMCI driver has no hardware backing it either, so
> > when we move to a more appropriate solution, this will be an issue for VMCI as
> > well. I’ll check our shipped products, but they most likely assume that if an
> > upstreamed vmci module is present, it will be loaded automatically.
> 
> Hyper-V Sockets is a standard feature of VMBus v4.0, so we can easily know
> we can and should load iff vmbus_proto_version >= VERSION_WIN10.
> 
> > > Things get trickier with nested virtualization because the VM might want
> > > to talk to its host but also to its nested VMs.  The simple way of
> > > fixing this would be to allow two transports loaded simultaneously and
> > > route traffic destined to CID 2 to the host transport and all other
> > > traffic to the guest transport.
> 
> This sounds a little tricky to me.
> CID is not really used by us, because we only support guest<->host communication,
> and don't support guest<->guest communication. The Hyper-V host references
> every VM by VmID (which is invisible to the VM), and a VM can only talk to the
> host via this feature.

Applications running inside the guest should use VMADDR_CID_HOST (2) to
connect to the host, even on Hyper-V.
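
As a minimal sketch of what that looks like from a guest application
(the helper names vsock_host_addr() and vsock_connect_host() are made up
for illustration, and the port is whatever the host-side service listens
on), the caller only names the host CID and never names a transport:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* The host CID is a fixed, well-known value in <linux/vm_sockets.h>. */
static_assert(VMADDR_CID_HOST == 2, "host CID is expected to be 2");

/* Hypothetical helper: build an AF_VSOCK address that targets the host. */
struct sockaddr_vm vsock_host_addr(unsigned int port)
{
	struct sockaddr_vm addr;

	memset(&addr, 0, sizeof(addr));
	addr.svm_family = AF_VSOCK;
	addr.svm_cid = VMADDR_CID_HOST;
	addr.svm_port = port;
	return addr;
}

/*
 * Hypothetical helper: connect a stream socket to the host.
 * Returns a connected fd, or -1 with errno set.
 */
int vsock_connect_host(unsigned int port)
{
	struct sockaddr_vm addr = vsock_host_addr(port);
	int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		int saved = errno;	/* close() may overwrite errno */

		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}
```

Nothing in this code says whether VMCI, virtio or Hyper-V carries the
connection; whether it succeeds depends on which transport is loaded in
the guest and on a listener being present on the host side.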

By the way, we should collaborate on a test suite and a vsock(7) man
page that documents the semantics of AF_VSOCK sockets.  This way our
transports will have the same behavior and AF_VSOCK applications will
work on all 3 hypervisors.

Not all features need to be supported.  For example, VMCI supports
SOCK_DGRAM while Hyper-V and virtio do not.  But features that are
available should behave identically.

> > This is close to the routing the VMCI driver does in a nested environment, but
> > that is with the assumption that there is only one type of transport. Having two
> > different transports would require that we delay resolving the transport type
> > until the socket endpoint has been bound to an address. Things get trickier if
> > listening sockets use VMADDR_CID_ANY - if only one transport is present, this
> > would allow the socket to accept connections from both guests and outer host,
> > but with multiple transports that won’t work, since we can’t associate a socket
> > with a transport until the socket is bound.
> > 
> > >
> > > Perhaps we should discuss these cases a bit more to figure out how to
> > > avoid conflicts over MODULE_ALIAS_NETPROTO(PF_VSOCK).
> > 
> > Agreed.
> 
> Can we use the 'protocol' parameter in the socket() function:
> int socket(int domain, int type, int protocol) 
> 
> IMO currently the 'protocol' is not really used.
> I think we can modify __vsock_core_init() to allow multiple transport layers to
> be registered, and we can define different 'protocol' numbers for
> VMware/KVM/Hyper-V, and ask the application to explicitly specify what should
> be used. Considering compatibility, we can use the default transport in a given
> VM depending on the underlying hypervisor. 

I think AF_VSOCK should hide the transport from users/applications.
Think of same-on-same nested virtualization: VMware-on-VMware or
KVM-on-KVM.  In that case specifying VMCI or virtio doesn't help.

We'd still need to distinguish between "to guest" and "to host"
(currently VMCI has code to do this but virtio does not).

The natural place to distinguish the destination is when dealing with
the sockaddr in connect(), bind(), etc.
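
To make the idea concrete, here is a rough userspace sketch of a
decision keyed only on the CID in the sockaddr.  The enum and the
function vsock_route_for() are hypothetical names for illustration and
are not existing kernel code; the VMADDR_CID_ANY case is kept separate
because, as Jorgen points out above, it cannot be tied to a single
transport when two are loaded:

```c
#include <assert.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Hypothetical labels, not kernel identifiers. */
enum vsock_route {
	ROUTE_TO_HOST,		/* transport that talks to our own host */
	ROUTE_TO_GUEST,		/* transport that talks to (nested) guests */
	ROUTE_UNRESOLVED	/* VMADDR_CID_ANY: no single transport fits */
};

/* Hypothetical helper: classify a destination or bind address by CID. */
enum vsock_route vsock_route_for(const struct sockaddr_vm *addr)
{
	assert(addr->svm_family == AF_VSOCK);

	if (addr->svm_cid == VMADDR_CID_HOST)
		return ROUTE_TO_HOST;
	if (addr->svm_cid == VMADDR_CID_ANY)
		return ROUTE_UNRESOLVED;
	return ROUTE_TO_GUEST;
}
```

This only shows the shape of the decision.  It leaves open what a
listening socket bound to VMADDR_CID_ANY should do, which is the hard
case in this discussion.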

Stefan

