* virtio-vsock live migration
@ 2016-03-03 15:37 Stefan Hajnoczi
  2016-03-10 23:56 ` Michael S. Tsirkin
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2016-03-03 15:37 UTC (permalink / raw)
  To: virtualization
  Cc: virtio-dev, Michael S. Tsirkin, Claudio Imbrenda,
	Christian Borntraeger, Matt Benjamin, Christoffer Dall


[-- Attachment #1.1: Type: text/plain, Size: 3812 bytes --]

Michael pointed out that the virtio-vsock draft specification does not
address live migration and in fact currently precludes migration.

Migration is fundamental so the device specification at least mustn't
preclude it.  Having brainstormed migration with Matthew Benjamin and
Michael Tsirkin, I am now summarizing the approach that I want to
include in the next draft specification.

Feedback and comments welcome!  In the meantime I will implement this in
code and update the draft specification.

1. Requirements

Virtio-vsock is a new AF_VSOCK transport.  As such, it should provide at
least the same guarantees as the existing AF_VSOCK VMCI transport.  This
is for consistency and to allow code reuse across any AF_VSOCK
transport.

Virtio-vsock aims to replace virtio-serial by providing the same
guest/host communication ability but with sockets API semantics that are
more popular and convenient for application developers.  Therefore
virtio-vsock migration should provide at least the same level of
migration functionality as virtio-serial.

Ideally it should be possible to migrate applications using AF_VSOCK
together with the virtual machine so that guest<->host communication is
not interrupted.  Neither AF_VSOCK VMCI nor virtio-serial supports this
today.

2. Basic disruptive migration flow

When the virtual machine migrates from the source host to the
destination host, the guest's CID may change.  The CID namespace is
host-wide so other hosts may have CID collisions and allocate a new CID
for incoming migration VMs.

The device notifies the guest that the CID has changed.  Guest sockets
are affected as follows:

 * Established connections are reset (ECONNRESET) and the guest
   application will have to reconnect.

 * Listen sockets remain open.  The only thing to note is that
   connections from the host are now made to the new CID.  This means
   the local address of the listen socket is automatically updated to
   the new CID.

 * Sockets in other states are unchanged.

Applications must handle disruptive migration by reconnecting if
necessary after ECONNRESET.
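The reconnect-after-reset pattern described here can be sketched as
follows (an illustrative sketch, not code from the draft; port 1234 is a
made-up example service):

```c
/* Sketch: a guest application that tolerates disruptive migration by
 * reconnecting to the host after its connection is reset.
 * Port 1234 is a hypothetical example service. */
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/vm_sockets.h>

/* Errors that indicate the peer went away (e.g. across migration). */
static int should_reconnect(int err)
{
        return err == ECONNRESET || err == EPIPE || err == ENOTCONN;
}

static int vsock_connect_host(unsigned int port)
{
        struct sockaddr_vm addr;
        int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

        if (fd < 0)
                return -1;
        memset(&addr, 0, sizeof(addr));
        addr.svm_family = AF_VSOCK;
        addr.svm_cid = VMADDR_CID_HOST;  /* well-known host CID 2 */
        addr.svm_port = port;
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
                close(fd);
                return -1;
        }
        return fd;
}
```

On ECONNRESET the application closes the socket and calls
vsock_connect_host() again; the listening side needs no changes because
its local address follows the new CID automatically.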

3. Checkpoint/restore for seamless migration

Applications that wish to communicate across live migration can do so
but this requires extra application-specific checkpoint/restore code.

This is similar to the approach taken by the CRIU project where
getsockopt()/setsockopt() is used to migrate socket state.  The
difference is that the application process is not automatically migrated
from the source host to the destination host.  Therefore, the
application needs to migrate its own state somehow.

The flow is as follows:

The application on the source host must quiesce (stop sending/receiving)
and use getsockopt() to extract socket state information from the host
kernel.

A new instance of the application is started on the destination host and
given the state so it can restore the connection.  The setsockopt()
syscall is used to restore socket state information.

The guest is given a list of <host_old_cid, host_new_cid, host_port,
guest_port> tuples for established connections that must not be reset
when the guest CID update notification is received.  These connections
will carry on as if nothing changed.

Note that the connection's remote address is updated from host_old_cid
to host_new_cid.  This allows remapping of CIDs (if necessary).
Typically this will be unused because the host always has well-known CID
2.  In a guest<->guest scenario it may be used to remap CIDs.
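How the guest might match established connections against that tuple
list can be sketched like this (the struct and function names are
hypothetical; no such interface exists yet):

```c
#include <stddef.h>

/* Hypothetical sketch of the tuple matching described above: an
 * established connection survives the CID-change notification only if
 * it matches an entry in the list; its remote CID is then rewritten
 * from host_old_cid to host_new_cid.  Names are illustrative. */
struct vsock_mig_tuple {
        unsigned int host_old_cid;
        unsigned int host_new_cid;
        unsigned int host_port;
        unsigned int guest_port;
};

static const struct vsock_mig_tuple *
find_preserved(const struct vsock_mig_tuple *list, size_t n,
               unsigned int remote_cid, unsigned int remote_port,
               unsigned int local_port)
{
        size_t i;

        for (i = 0; i < n; i++)
                if (list[i].host_old_cid == remote_cid &&
                    list[i].host_port == remote_port &&
                    list[i].guest_port == local_port)
                        return &list[i];
        return NULL;  /* not listed: reset with ECONNRESET */
}
```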


For the time being I am focussing on the basic disruptive migration flow
only.  Checkpoint/restore can be added with a feature bit in the future.
It is a lot more complex and I'm not sure whether there will be any
users yet.

Stefan

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


* Re: virtio-vsock live migration
  2016-03-03 15:37 virtio-vsock live migration Stefan Hajnoczi
@ 2016-03-10 23:56 ` Michael S. Tsirkin
  2016-03-14 11:13 ` [virtio-dev] " Michael S. Tsirkin
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Michael S. Tsirkin @ 2016-03-10 23:56 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: virtio-dev, Claudio Imbrenda, Christian Borntraeger,
	Matt Benjamin, virtualization, Christoffer Dall

On Thu, Mar 03, 2016 at 03:37:37PM +0000, Stefan Hajnoczi wrote:
> Michael pointed out that the virtio-vsock draft specification does not
> address live migration and in fact currently precludes migration.
> 
> Migration is fundamental so the device specification at least mustn't
> preclude it.  Having brainstormed migration with Matthew Benjamin and
> Michael Tsirkin, I am now summarizing the approach that I want to
> include in the next draft specification.
> 
> Feedback and comments welcome!  In the meantime I will implement this in
> code and update the draft specification.
> 
> 1. Requirements
> 
> Virtio-vsock is a new AF_VSOCK transport.  As such, it should provide at
> least the same guarantees as the existing AF_VSOCK VMCI transport.  This
> is for consistency and to allow code reuse across any AF_VSOCK
> transport.
> 
> Virtio-vsock aims to replace virtio-serial by providing the same
> guest/host communication ability but with sockets API semantics that are
> more popular and convenient for application developers.  Therefore
> virtio-vsock migration should provide at least the same level of
> migration functionality as virtio-serial.
> 
> Ideally it should be possible to migrate applications using AF_VSOCK
> together with the virtual machine so that guest<->host communication is
> not interrupted.  Neither AF_VSOCK VMCI nor virtio-serial supports this
> today.

I'm not sure why you say this about virtio-serial.
It appears that if the host pre-connected to the destination
qemu before migration, the backend reconnects transparently
on the destination.


> 2. Basic disruptive migration flow
> 
> When the virtual machine migrates from the source host to the
> destination host, the guest's CID may change.  The CID namespace is
> host-wide


BTW, I think CIDs would have to become per network namespace.

> so other hosts may have CID collisions and allocate a new CID
> for incoming migration VMs.

I guess all this is so that the guest can retrieve its CID and
send it to the host using some side-channel?


> The device notifies the guest that the CID has changed.  Guest sockets
> are affected as follows:
> 
>  * Established connections are reset (ECONNRESET) and the guest
>    application will have to reconnect.
> 
>  * Listen sockets remain open.  The only thing to note is that
>    connections from the host are now made to the new CID.  This means
>    the local address of the listen socket is automatically updated to
>    the new CID.
> 
>  * Sockets in other states are unchanged.
> 
> Applications must handle disruptive migration by reconnecting if
> necessary after ECONNRESET.
> 
> 3. Checkpoint/restore for seamless migration
> 
> Applications that wish to communicate across live migration can do so
> but this requires extra application-specific checkpoint/restore code.
> 
> This is similar to the approach taken by the CRIU project where
> getsockopt()/setsockopt() is used to migrate socket state.  The
> difference is that the application process is not automatically migrated
> from the source host to the destination host.  Therefore, the
> application needs to migrate its own state somehow.
> 
> The flow is as follows:
> 
> The application on the source host must quiesce (stop sending/receiving)
> and use getsockopt() to extract socket state information from the host
> kernel.
> 
> A new instance of the application is started on the destination host and
> given the state so it can restore the connection.  The setsockopt()
> syscall is used to restore socket state information.
> 
> The guest is given a list of <host_old_cid, host_new_cid, host_port,
> guest_port> tuples for established connections that must not be reset
> when the guest CID update notification is received.  These connections
> will carry on as if nothing changed.
> 
> Note that the connection's remote address is updated from host_old_cid
> to host_new_cid.  This allows remapping of CIDs (if necessary).
> Typically this will be unused because the host always has well-known CID
> 2.  In a guest<->guest scenario it may be used to remap CIDs.
> 
> 
> For the time being I am focussing on the basic disruptive migration flow
> only.  Checkpoint/restore can be added with a feature bit in the future.
> It is a lot more complex and I'm not sure whether there will be any
> users yet.
> 
> Stefan

This makes some things harder. For example, imagine a guest
reboot mixed with migration. We don't know why the connection
died, so we'll retry connections until - when?

Could you please describe some user of vsock and show how
it recovers from destructive migration?

-- 
MST


* Re: [virtio-dev] virtio-vsock live migration
  2016-03-03 15:37 virtio-vsock live migration Stefan Hajnoczi
  2016-03-10 23:56 ` Michael S. Tsirkin
@ 2016-03-14 11:13 ` Michael S. Tsirkin
       [not found] ` <20160311014147-mutt-send-email-mst@redhat.com>
       [not found] ` <20160314130150-mutt-send-email-mst@redhat.com>
  3 siblings, 0 replies; 11+ messages in thread
From: Michael S. Tsirkin @ 2016-03-14 11:13 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: virtio-dev, Claudio Imbrenda, Christian Borntraeger,
	Matt Benjamin, virtualization, Christoffer Dall

On Thu, Mar 03, 2016 at 03:37:37PM +0000, Stefan Hajnoczi wrote:
> Michael pointed out that the virtio-vsock draft specification does not
> address live migration and in fact currently precludes migration.
> 
> Migration is fundamental so the device specification at least mustn't
> preclude it.  Having brainstormed migration with Matthew Benjamin and
> Michael Tsirkin, I am now summarizing the approach that I want to
> include in the next draft specification.
> 
> Feedback and comments welcome!  In the meantime I will implement this in
> code and update the draft specification.

Most of the issue seems to be a consequence of using a 4 byte CID.

I think the right thing to do is just to teach guests
about 64 bit CIDs.

For now, can we drop guest CID from guest to host communication completely,
making CID only host-visible? Maybe leave the space
in the packet so we can add CID there later.
It seems that in theory this will allow changing CID
during migration, transparently to the guest.

Guest visible CID is required for guest to guest communication -
but IIUC that is not currently supported.
Maybe that can be made conditional on 64 bit addressing.
Alternatively, it seems much easier to accept that these channels get broken
across migration.


> 1. Requirements
> 
> Virtio-vsock is a new AF_VSOCK transport.  As such, it should provide at
> least the same guarantees as the existing AF_VSOCK VMCI transport.  This
> is for consistency and to allow code reuse across any AF_VSOCK
> transport.
> 
> Virtio-vsock aims to replace virtio-serial by providing the same
> guest/host communication ability but with sockets API semantics that are
> more popular and convenient for application developers.  Therefore
> virtio-vsock migration should provide at least the same level of
> migration functionality as virtio-serial.
> 
> Ideally it should be possible to migrate applications using AF_VSOCK
> together with the virtual machine so that guest<->host communication is
> not interrupted.  Neither AF_VSOCK VMCI nor virtio-serial supports this
> today.
> 
> 2. Basic disruptive migration flow
> 
> When the virtual machine migrates from the source host to the
> destination host, the guest's CID may change.  The CID namespace is
> host-wide so other hosts may have CID collisions and allocate a new CID
> for incoming migration VMs.
> 
> The device notifies the guest that the CID has changed.  Guest sockets
> are affected as follows:
> 
>  * Established connections are reset (ECONNRESET) and the guest
>    application will have to reconnect.
> 
>  * Listen sockets remain open.  The only thing to note is that
>    connections from the host are now made to the new CID.  This means
>    the local address of the listen socket is automatically updated to
>    the new CID.
> 
>  * Sockets in other states are unchanged.
> 
> Applications must handle disruptive migration by reconnecting if
> necessary after ECONNRESET.
> 
> 3. Checkpoint/restore for seamless migration
> 
> Applications that wish to communicate across live migration can do so
> but this requires extra application-specific checkpoint/restore code.
> 
> This is similar to the approach taken by the CRIU project where
> getsockopt()/setsockopt() is used to migrate socket state.  The
> difference is that the application process is not automatically migrated
> from the source host to the destination host.  Therefore, the
> application needs to migrate its own state somehow.
> 
> The flow is as follows:
> 
> The application on the source host must quiesce (stop sending/receiving)
> and use getsockopt() to extract socket state information from the host
> kernel.
> 
> A new instance of the application is started on the destination host and
> given the state so it can restore the connection.  The setsockopt()
> syscall is used to restore socket state information.
> 
> The guest is given a list of <host_old_cid, host_new_cid, host_port,
> guest_port> tuples for established connections that must not be reset
> when the guest CID update notification is received.  These connections
> will carry on as if nothing changed.
> 
> Note that the connection's remote address is updated from host_old_cid
> to host_new_cid.  This allows remapping of CIDs (if necessary).
> Typically this will be unused because the host always has well-known CID
> 2.  In a guest<->guest scenario it may be used to remap CIDs.
> 
> 
> For the time being I am focussing on the basic disruptive migration flow
> only.  Checkpoint/restore can be added with a feature bit in the future.
> It is a lot more complex and I'm not sure whether there will be any
> users yet.
> 
> Stefan


* Re: virtio-vsock live migration
       [not found] ` <20160311014147-mutt-send-email-mst@redhat.com>
@ 2016-03-15 15:10   ` Stefan Hajnoczi
  0 siblings, 0 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2016-03-15 15:10 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-dev, Claudio Imbrenda, Christian Borntraeger,
	Matt Benjamin, virtualization, Christoffer Dall


[-- Attachment #1.1: Type: text/plain, Size: 6191 bytes --]

On Fri, Mar 11, 2016 at 01:56:05AM +0200, Michael S. Tsirkin wrote:
> On Thu, Mar 03, 2016 at 03:37:37PM +0000, Stefan Hajnoczi wrote:
> > Michael pointed out that the virtio-vsock draft specification does not
> > address live migration and in fact currently precludes migration.
> > 
> > Migration is fundamental so the device specification at least mustn't
> > preclude it.  Having brainstormed migration with Matthew Benjamin and
> > Michael Tsirkin, I am now summarizing the approach that I want to
> > include in the next draft specification.
> > 
> > Feedback and comments welcome!  In the meantime I will implement this in
> > code and update the draft specification.
> > 
> > 1. Requirements
> > 
> > Virtio-vsock is a new AF_VSOCK transport.  As such, it should provide at
> > least the same guarantees as the existing AF_VSOCK VMCI transport.  This
> > is for consistency and to allow code reuse across any AF_VSOCK
> > transport.
> > 
> > Virtio-vsock aims to replace virtio-serial by providing the same
> > guest/host communication ability but with sockets API semantics that are
> > more popular and convenient for application developers.  Therefore
> > virtio-vsock migration should provide at least the same level of
> > migration functionality as virtio-serial.
> > 
> > Ideally it should be possible to migrate applications using AF_VSOCK
> > together with the virtual machine so that guest<->host communication is
> > not interrupted.  Neither AF_VSOCK VMCI nor virtio-serial supports this
> > today.
> 
> I'm not sure why you say this about virtio-serial.
> It appears that if the host pre-connected to the destination
> qemu before migration, the backend reconnects transparently
> on the destination.

You are right, virtio-serial supports keeping active ports open across
migration (as well as closing active ports across migration).  In
virtio-vsock the equivalent would be setsockopt() CRIU-style socket
migration which is not implemented today.

> > 2. Basic disruptive migration flow
> > 
> > When the virtual machine migrates from the source host to the
> > destination host, the guest's CID may change.  The CID namespace is
> > host-wide
> 
> 
> BTW, I think CIDs would have to become per network namespace.

Yes, I agree.

> > so other hosts may have CID collisions and allocate a new CID
> > for incoming migration VMs.
> 
> I guess all this is so that the guest can retrieve its CID and
> send it to the host using some side-channel?

Yes.

> > The device notifies the guest that the CID has changed.  Guest sockets
> > are affected as follows:
> > 
> >  * Established connections are reset (ECONNRESET) and the guest
> >    application will have to reconnect.
> > 
> >  * Listen sockets remain open.  The only thing to note is that
> >    connections from the host are now made to the new CID.  This means
> >    the local address of the listen socket is automatically updated to
> >    the new CID.
> > 
> >  * Sockets in other states are unchanged.
> > 
> > Applications must handle disruptive migration by reconnecting if
> > necessary after ECONNRESET.
> > 
> > 3. Checkpoint/restore for seamless migration
> > 
> > Applications that wish to communicate across live migration can do so
> > but this requires extra application-specific checkpoint/restore code.
> > 
> > This is similar to the approach taken by the CRIU project where
> > getsockopt()/setsockopt() is used to migrate socket state.  The
> > difference is that the application process is not automatically migrated
> > from the source host to the destination host.  Therefore, the
> > application needs to migrate its own state somehow.
> > 
> > The flow is as follows:
> > 
> > The application on the source host must quiesce (stop sending/receiving)
> > and use getsockopt() to extract socket state information from the host
> > kernel.
> > 
> > A new instance of the application is started on the destination host and
> > given the state so it can restore the connection.  The setsockopt()
> > syscall is used to restore socket state information.
> > 
> > The guest is given a list of <host_old_cid, host_new_cid, host_port,
> > guest_port> tuples for established connections that must not be reset
> > when the guest CID update notification is received.  These connections
> > will carry on as if nothing changed.
> > 
> > Note that the connection's remote address is updated from host_old_cid
> > to host_new_cid.  This allows remapping of CIDs (if necessary).
> > Typically this will be unused because the host always has well-known CID
> > 2.  In a guest<->guest scenario it may be used to remap CIDs.
> > 
> > 
> > For the time being I am focussing on the basic disruptive migration flow
> > only.  Checkpoint/restore can be added with a feature bit in the future.
> > It is a lot more complex and I'm not sure whether there will be any
> > users yet.
> > 
> > Stefan
> 
> This makes some things harder. For example, imagine a guest
> reboot mixed with migration. We don't know why the connection
> died, so we'll retry connections until - when?
> 
> Could you please describe some user of vsock and show how
> it recovers from destructive migration?

qemu-guest-agent runs inside the guest with an AF_VSOCK listen socket.

libvirt arbitrates the qemu-guest-agent connection and provides an API
for applications to send commands.

When an application sends a command, libvirt checks if the connection to
qemu-guest-agent is established.  If there is no connection libvirt will
attempt to connect.

The command is sent to qemu-guest-agent and the response is handed back
to the guest application.  libvirt arbitrates access so commands from
multiple applications are serialized.

Live migration resets the established connection between
qemu-guest-agent and the source host's libvirt daemon.  When an
application issues the next qemu-guest-agent command the libvirt daemon
on the destination host notices there is no established connection yet
and starts a new one.

Libvirt refuses to send qemu-guest-agent commands while live migration
is in progress.
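That lazy-reconnect policy boils down to a small state check per
command (an illustrative sketch, not actual libvirt code):

```c
/* Illustrative sketch of the libvirt-style arbitration above: each
 * qemu-guest-agent command first consults the connection state.
 * Not libvirt code. */
enum agent_action {
        AGENT_REFUSE,     /* migration in progress: refuse the command */
        AGENT_RECONNECT,  /* no connection (e.g. reset by migration) */
        AGENT_SEND        /* connection established: send the command */
};

static enum agent_action next_action(int connected, int migration_active)
{
        if (migration_active)
                return AGENT_REFUSE;
        if (!connected)
                return AGENT_RECONNECT;
        return AGENT_SEND;
}
```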

Stefan

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]



* Re: [virtio-dev] virtio-vsock live migration
       [not found] ` <20160314130150-mutt-send-email-mst@redhat.com>
@ 2016-03-15 15:15   ` Stefan Hajnoczi
       [not found]   ` <20160315151529.GB26263@stefanha-x1.localdomain>
  1 sibling, 0 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2016-03-15 15:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-dev, Claudio Imbrenda, Christian Borntraeger,
	Matt Benjamin, virtualization, Christoffer Dall


[-- Attachment #1.1: Type: text/plain, Size: 1946 bytes --]

On Mon, Mar 14, 2016 at 01:13:24PM +0200, Michael S. Tsirkin wrote:
> On Thu, Mar 03, 2016 at 03:37:37PM +0000, Stefan Hajnoczi wrote:
> > Michael pointed out that the virtio-vsock draft specification does not
> > address live migration and in fact currently precludes migration.
> > 
> > Migration is fundamental so the device specification at least mustn't
> > preclude it.  Having brainstormed migration with Matthew Benjamin and
> > Michael Tsirkin, I am now summarizing the approach that I want to
> > include in the next draft specification.
> > 
> > Feedback and comments welcome!  In the meantime I will implement this in
> > code and update the draft specification.
> 
> Most of the issue seems to be a consequence of using a 4 byte CID.
> 
> I think the right thing to do is just to teach guests
> about 64 bit CIDs.
> 
> For now, can we drop guest CID from guest to host communication completely,
> making CID only host-visible? Maybe leave the space
> in the packet so we can add CID there later.
> It seems that in theory this will allow changing CID
> during migration, transparently to the guest.
> 
> Guest visible CID is required for guest to guest communication -
> but IIUC that is not currently supported.
> Maybe that can be made conditional on 64 bit addressing.
> Alternatively, it seems much easier to accept that these channels get broken
> across migration.

I reached the conclusion that channels break across migration because:

1. 32-bit CIDs are in sockaddr_vm and we'd break AF_VSOCK ABI by
   changing it to 64-bit.  Application code would be specific to
   virtio-vsock and wouldn't work with other AF_VSOCK transports that
   use the 32-bit sockaddr_vm struct.

2. Dropping guest CIDs from the protocol breaks network protocols that
   send addresses.  NFS and netperf are the first two protocols I looked
   at and both transmit address information across the connection...
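For point 1, the constraint is visible in the sockaddr_vm layout; the
struct below mirrors the uapi definition (simplified here for
illustration, not a verbatim copy of <linux/vm_sockets.h>):

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified mirror of the uapi struct sockaddr_vm layout: the CID is
 * a fixed 32-bit field, so widening it would change the ABI for every
 * existing AF_VSOCK application and transport. */
struct sockaddr_vm_sketch {
        uint16_t svm_family;     /* AF_VSOCK */
        uint16_t svm_reserved1;
        uint32_t svm_port;
        uint32_t svm_cid;        /* the 32-bit CID in question */
        uint8_t  svm_zero[4];    /* pad to sizeof(struct sockaddr) */
};
```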

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]



* Re: [virtio-dev] virtio-vsock live migration
       [not found]   ` <20160315151529.GB26263@stefanha-x1.localdomain>
@ 2016-03-15 16:12     ` Michael S. Tsirkin
       [not found]     ` <20160315180916-mutt-send-email-mst@redhat.com>
  1 sibling, 0 replies; 11+ messages in thread
From: Michael S. Tsirkin @ 2016-03-15 16:12 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: virtio-dev, Claudio Imbrenda, Christian Borntraeger,
	Matt Benjamin, virtualization, Christoffer Dall

On Tue, Mar 15, 2016 at 03:15:29PM +0000, Stefan Hajnoczi wrote:
> On Mon, Mar 14, 2016 at 01:13:24PM +0200, Michael S. Tsirkin wrote:
> > On Thu, Mar 03, 2016 at 03:37:37PM +0000, Stefan Hajnoczi wrote:
> > > Michael pointed out that the virtio-vsock draft specification does not
> > > address live migration and in fact currently precludes migration.
> > > 
> > > Migration is fundamental so the device specification at least mustn't
> > > preclude it.  Having brainstormed migration with Matthew Benjamin and
> > > Michael Tsirkin, I am now summarizing the approach that I want to
> > > include in the next draft specification.
> > > 
> > > Feedback and comments welcome!  In the meantime I will implement this in
> > > code and update the draft specification.
> > 
> > Most of the issue seems to be a consequence of using a 4 byte CID.
> > 
> > I think the right thing to do is just to teach guests
> > about 64 bit CIDs.
> > 
> > For now, can we drop guest CID from guest to host communication completely,
> > making CID only host-visible? Maybe leave the space
> > in the packet so we can add CID there later.
> > It seems that in theory this will allow changing CID
> > during migration, transparently to the guest.
> > 
> > Guest visible CID is required for guest to guest communication -
> > but IIUC that is not currently supported.
> > Maybe that can be made conditional on 64 bit addressing.
> > Alternatively, it seems much easier to accept that these channels get broken
> > across migration.
> 
> I reached the conclusion that channels break across migration because:
> 
> 1. 32-bit CIDs are in sockaddr_vm and we'd break AF_VSOCK ABI by
>    changing it to 64-bit.  Application code would be specific to
>    virtio-vsock and wouldn't work with other AF_VSOCK transports that
>    use the 32-bit sockaddr_vm struct.

You don't have to repeat the IPv6 mistake.  Make all 32 bit CIDs
64 bit CIDs by padding with 0s, then 64 bit apps can use
any CID.

Old 32 bit CID applications will not be able to use the extended
addresses, but hardcoding that limitation into the protocol does not
seem sane.


> 2. Dropping guest CIDs from the protocol breaks network protocols that
>    send addresses.

Stick it in config space if you really have to.
But why do you need it on each packet?

>  NFS and netperf are the first two protocols I looked
>    at and both transmit address information across the connection...


Does netperf really attempt to get the local IP
and then send it inline within the connection?


-- 
MST


* Re: [virtio-dev] virtio-vsock live migration
       [not found]     ` <20160315180916-mutt-send-email-mst@redhat.com>
@ 2016-03-16 14:32       ` Stefan Hajnoczi
  2016-03-16 14:58         ` Matt Benjamin
                           ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2016-03-16 14:32 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-dev, Claudio Imbrenda, Christian Borntraeger,
	Matt Benjamin, virtualization, Christoffer Dall


[-- Attachment #1.1: Type: text/plain, Size: 3907 bytes --]

On Tue, Mar 15, 2016 at 06:12:55PM +0200, Michael S. Tsirkin wrote:
> On Tue, Mar 15, 2016 at 03:15:29PM +0000, Stefan Hajnoczi wrote:
> > On Mon, Mar 14, 2016 at 01:13:24PM +0200, Michael S. Tsirkin wrote:
> > > On Thu, Mar 03, 2016 at 03:37:37PM +0000, Stefan Hajnoczi wrote:
> > > > Michael pointed out that the virtio-vsock draft specification does not
> > > > address live migration and in fact currently precludes migration.
> > > > 
> > > > Migration is fundamental so the device specification at least mustn't
> > > > preclude it.  Having brainstormed migration with Matthew Benjamin and
> > > > Michael Tsirkin, I am now summarizing the approach that I want to
> > > > include in the next draft specification.
> > > > 
> > > > Feedback and comments welcome!  In the meantime I will implement this in
> > > > code and update the draft specification.
> > > 
> > > Most of the issue seems to be a consequence of using a 4 byte CID.
> > > 
> > > I think the right thing to do is just to teach guests
> > > about 64 bit CIDs.
> > > 
> > > For now, can we drop guest CID from guest to host communication completely,
> > > making CID only host-visible? Maybe leave the space
> > > in the packet so we can add CID there later.
> > > It seems that in theory this will allow changing CID
> > > during migration, transparently to the guest.
> > > 
> > > Guest visible CID is required for guest to guest communication -
> > > but IIUC that is not currently supported.
> > > Maybe that can be made conditional on 64 bit addressing.
> > > Alternatively, it seems much easier to accept that these channels get broken
> > > across migration.
> > 
> > I reached the conclusion that channels break across migration because:
> > 
> > 1. 32-bit CIDs are in sockaddr_vm and we'd break AF_VSOCK ABI by
> >    changing it to 64-bit.  Application code would be specific to
> >    virtio-vsock and wouldn't work with other AF_VSOCK transports that
> >    use the 32-bit sockaddr_vm struct.
> 
> You don't have to repeat the IPv6 mistake.  Make all 32 bit CIDs
> 64 bit CIDs by padding with 0s, then 64 bit apps can use
> any CID.
> 
> Old 32 bit CID applications will not be able to use the extended
> addresses, but hardcoding that limitation into the protocol does not
> seem sane.

A mixed 32-bit and 64-bit CID world is complex.  The host doesn't know
in advance whether all applications (especially inside the guest) will
support 64-bit CIDs or not.  32-bit CID applications won't work if a
64-bit CID has been assigned.

It also opens up the question of how unique CIDs are allocated across
hosts.

Given that AF_VSOCK in Linux already exists in the 32-bit CID version,
I'd prefer to make virtio-vsock compatible with that for the time being.
Extensions can be added in the future but just implementing existing
AF_VSOCK semantics will already allow the applications to run.

> > 2. Dropping guest CIDs from the protocol breaks network protocols that
> >    send addresses.
> 
> Stick it in config space if you really have to.
> But why do you need it on each packet?

If packets are implicitly guest<->host then adding guest<->guest
communication requires a virtio spec change.  If packets contain
source/destination CIDs then allowing/forbidding guest<->host or
guest<->guest communication is purely a host policy decision.  I think
it's worth keeping that in from the start.
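A per-packet header along these lines, with the policy check it
enables, might look like this (the field layout is illustrative, not
the draft spec's):

```c
#include <stdint.h>

/* Sketch of a packet header carrying both endpoint addresses, so the
 * host can allow or forbid guest<->guest traffic purely by policy.
 * Illustrative layout, not the draft specification's. */
struct vsock_hdr_sketch {
        uint64_t src_cid;   /* room for possible future 64-bit CIDs */
        uint64_t dst_cid;
        uint32_t src_port;
        uint32_t dst_port;
        uint32_t len;       /* payload length */
};

/* Host policy: guest<->host is always allowed; guest<->guest only if
 * explicitly enabled. */
static int allow_packet(const struct vsock_hdr_sketch *h,
                        int guest_to_guest_ok)
{
        const uint64_t host_cid = 2;  /* well-known host CID */

        if (h->src_cid == host_cid || h->dst_cid == host_cid)
                return 1;
        return guest_to_guest_ok;
}
```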

> >  NFS and netperf are the first two protocols I looked
> >    at and both transmit address information across the connection...
> 
> 
> Does netperf really attempt to get the local IP
> and then send it inline within the connection?

Yes, netperf has separate control and data sockets.  I think part of the
reason for this split is that the control connection can communicate the
address details for the data connection over a different protocol (TCP +
RDMA?), but I'm not sure.

Stefan


_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [virtio-dev] virtio-vsock live migration
  2016-03-16 14:32       ` Stefan Hajnoczi
@ 2016-03-16 14:58         ` Matt Benjamin
  2016-03-16 15:05         ` Michael S. Tsirkin
       [not found]         ` <20160316163344-mutt-send-email-mst@redhat.com>
  2 siblings, 0 replies; 11+ messages in thread
From: Matt Benjamin @ 2016-03-16 14:58 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: virtio-dev, Michael S. Tsirkin, virtualization,
	Christian Borntraeger, Claudio Imbrenda, Christoffer Dall

Hi,

----- Original Message -----
> From: "Stefan Hajnoczi" <stefanha@redhat.com>
> To: "Michael S. Tsirkin" <mst@redhat.com>

> > > > I think the right thing to do is just to teach guests
> > > > about 64 bit CIDs.
> > > > 
> > > > For now, can we drop guest CID from guest to host communication
> > > > completely,
> > > > making CID only host-visible? Maybe leave the space
> > > > in the packet so we can add CID there later.
> > > > It seems that in theory this will allow changing CID
> > > > during migration, transparently to the guest.
> > > > 
> > > > Guest visible CID is required for guest to guest communication -
> > > > but IIUC that is not currently supported.
> > > > Maybe that can be made conditional on 64 bit addressing.
> > > > Alternatively, it seems much easier to accept that these channels get
> > > > broken
> > > > across migration.
> > > 
> > > I reached the conclusion that channels break across migration because:
> > > 
> > > 1. 32-bit CIDs are in sockaddr_vm and we'd break AF_VSOCK ABI by
> > >    changing it to 64-bit.  Application code would be specific to
> > >    virtio-vsock and wouldn't work with other AF_VSOCK transports that
> > >    use the 32-bit sockaddr_vm struct.
> > 
> > You don't have to repeat the IPv6 mistake.  Make all 32 bit CIDs
> > 64 bit CIDs by padding with 0s, then 64 bit apps can use
> > any CID.
> > 
> > Old 32 bit CID applications will not be able to use the extended
> > addresses, but hardcoding the current implementation's bugs into the
> > spec does not seem sane.
> 
> A mixed 32-bit and 64-bit CID world is complex.  The host doesn't know
> in advance whether all applications (especially inside the guest) will
> support 64-bit CIDs or not.  32-bit CID applications won't work if a
> 64-bit CID has been assigned.
> 
> It also opens up the question of how unique CIDs are allocated across
> hosts.
> 
> Given that AF_VSOCK in Linux already exists in the 32-bit CID version,
> I'd prefer to make virtio-vsock compatible with that for the time being.
> Extensions can be added in the future, but just implementing existing
> AF_VSOCK semantics will already allow applications to run.
> 
> > > 2. Dropping guest CIDs from the protocol breaks network protocols that
> > >    send addresses.
> > 
> > Stick it in config space if you really have to.
> > But why do you need it on each packet?
> 
> If packets are implicitly guest<->host then adding guest<->guest
> communication requires a virtio spec change.  If packets contain
> source/destination CIDs then allowing/forbidding guest<->host or
> guest<->guest communication is purely a host policy decision.  I think
> it's worth keeping that in from the start.

I'm just a downstream consumer of vsock, but this was my intuition as well.

Matt

> 
> > >  NFS and netperf are the first two protocols I looked
> > >    at and both transmit address information across the connection...
> > 
> > 

> Stefan
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-707-0660
fax.  734-769-8938
cel.  734-216-5309


* Re: [virtio-dev] virtio-vsock live migration
  2016-03-16 14:32       ` Stefan Hajnoczi
  2016-03-16 14:58         ` Matt Benjamin
@ 2016-03-16 15:05         ` Michael S. Tsirkin
       [not found]         ` <20160316163344-mutt-send-email-mst@redhat.com>
  2 siblings, 0 replies; 11+ messages in thread
From: Michael S. Tsirkin @ 2016-03-16 15:05 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: virtio-dev, Claudio Imbrenda, Christian Borntraeger,
	Matt Benjamin, virtualization, Christoffer Dall

On Wed, Mar 16, 2016 at 02:32:00PM +0000, Stefan Hajnoczi wrote:
> On Tue, Mar 15, 2016 at 06:12:55PM +0200, Michael S. Tsirkin wrote:
> > On Tue, Mar 15, 2016 at 03:15:29PM +0000, Stefan Hajnoczi wrote:
> > > On Mon, Mar 14, 2016 at 01:13:24PM +0200, Michael S. Tsirkin wrote:
> > > > On Thu, Mar 03, 2016 at 03:37:37PM +0000, Stefan Hajnoczi wrote:
> > > > > Michael pointed out that the virtio-vsock draft specification does not
> > > > > address live migration and in fact currently precludes migration.
> > > > > 
> > > > > Migration is fundamental so the device specification at least mustn't
> > > > > preclude it.  Having brainstormed migration with Matthew Benjamin and
> > > > > Michael Tsirkin, I am now summarizing the approach that I want to
> > > > > include in the next draft specification.
> > > > > 
> > > > > Feedback and comments welcome!  In the meantime I will implement this in
> > > > > code and update the draft specification.
> > > > 
> > > > Most of the issue seems to be a consequence of using a 4 byte CID.
> > > > 
> > > > I think the right thing to do is just to teach guests
> > > > about 64 bit CIDs.
> > > > 
> > > > For now, can we drop guest CID from guest to host communication completely,
> > > > making CID only host-visible? Maybe leave the space
> > > > in the packet so we can add CID there later.
> > > > It seems that in theory this will allow changing CID
> > > > during migration, transparently to the guest.
> > > > 
> > > > Guest visible CID is required for guest to guest communication -
> > > > but IIUC that is not currently supported.
> > > > Maybe that can be made conditional on 64 bit addressing.
> > > > Alternatively, it seems much easier to accept that these channels get broken
> > > > across migration.
> > > 
> > > I reached the conclusion that channels break across migration because:
> > > 
> > > 1. 32-bit CIDs are in sockaddr_vm and we'd break AF_VSOCK ABI by
> > >    changing it to 64-bit.  Application code would be specific to
> > >    virtio-vsock and wouldn't work with other AF_VSOCK transports that
> > >    use the 32-bit sockaddr_vm struct.
> > 
> > You don't have to repeat the IPv6 mistake.  Make all 32 bit CIDs
> > 64 bit CIDs by padding with 0s, then 64 bit apps can use
> > any CID.
> > 
> > Old 32 bit CID applications will not be able to use the extended
> > addresses, but hardcoding the current implementation's bugs into the
> > spec does not seem sane.
> 
> A mixed 32-bit and 64-bit CID world is complex.  The host doesn't know
> in advance whether all applications (especially inside the guest) will
> support 64-bit CIDs or not.  32-bit CID applications won't work if a
> 64-bit CID has been assigned.

Only for guest to guest communication, correct?
Host can do dual addressing as well.
Applications that do not want connections to be broken
will use 64 bit addresses. Old applications will keep
running until you migrate.

> It also opens up the question of how unique CIDs are allocated across
> hosts.

I think it's actually a good idea to define this, rather than
leave things in the air. For example, EUI-64 can be used.

> Given that AF_VSOCK in Linux already exists in the 32-bit CID version,
> I'd prefer to make virtio-vsock compatible with that for the time being.

Yes, so we cut corners in order to ship quickly, but that is an
implementation detail.  Linux can be extended.
Why limit the protocol to match current implementation bugs?

> Extensions can be added in the future, but just implementing existing
> AF_VSOCK semantics will already allow applications to run.

It's an important goal. At the spec level, I do not think
it is a good idea to put this limitation in, but users can
just use a subset of the available address space in order
to make existing apps work.

> > > 2. Dropping guest CIDs from the protocol breaks network protocols that
> > >    send addresses.
> > 
> > Stick it in config space if you really have to.
> > But why do you need it on each packet?
> 
> If packets are implicitly guest<->host then adding guest<->guest
> communication requires a virtio spec change.  If packets contain
> source/destination CIDs then allowing/forbidding guest<->host or
> guest<->guest communication is purely a host policy decision.  I think
> it's worth keeping that in from the start.

OK.



> > >  NFS and netperf are the first two protocols I looked
> > >    at and both transmit address information across the connection...
> > 
> > 
> > Does netperf really attempt to get local IP
> > and then send that inline within the connection?
> 
> Yes, netperf has separate control and data sockets.  I think part of the
> reason for this split is that the control connection can communicate the
> address details for the data connection over a different protocol (TCP +
> RDMA?), but I'm not sure.
> 
> Stefan

Thinking about it, netperf does not survive disconnects.
So the current protocol would be useless for it.
I am not sure about NFS, but from (long past) experience it did not
attempt to re-resolve the name to an address, so changing an
address would break it as well.

So I think these applications would have to use a 64 bit CID.

Why, then, do we care about one aspect of these applications
(creating connections) and not another (not breaking them)?

-- 
MST


* Re: [virtio-dev] virtio-vsock live migration
       [not found]         ` <20160316163344-mutt-send-email-mst@redhat.com>
@ 2016-04-06 12:55           ` Stefan Hajnoczi
       [not found]           ` <20160406125550.GB17538@stefanha-x1.localdomain>
  1 sibling, 0 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2016-04-06 12:55 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-dev, Claudio Imbrenda, Christian Borntraeger,
	Matt Benjamin, virtualization, Christoffer Dall



On Wed, Mar 16, 2016 at 05:05:19PM +0200, Michael S. Tsirkin wrote:
> > > >  NFS and netperf are the first two protocols I looked
> > > >    at and both transmit address information across the connection...
> > > 
> > > 
> > > Does netperf really attempt to get local IP
> > > and then send that inline within the connection?
> > 
> > Yes, netperf has separate control and data sockets.  I think part of the
> > reason for this split is that the control connection can communicate the
> > address details for the data connection over a different protocol (TCP +
> > RDMA?), but I'm not sure.
> > 
> > Stefan
> 
> Thinking about it, netperf does not survive disconnects.
> So the current protocol would be useless for it.
> I am not sure about NFS but from (long past) experience it did not
> attempt to re-resolve the name to address, so changing an
> address would break it as well.
> 
> So I think these applications would have to use a 64 bit CID.
> 
> Why, then, do we care about one aspect of these applications
> (creating connections) and not another (not breaking them)?

I care about mapping the semantics of AF_VSOCK to virtio-vsock.
AF_VSOCK was implemented with the ability to plug in additional
transports (like virtio).  This allows guest agents and other
applications to compile once and run on any transport.

If we change virtio-vsock to rely on unique addresses across migration
then we lose zero-configuration.  AF_VSOCK applications use the
VMADDR_CID_HOST (2) constant to communicate with the host.  After live
migration this well-known CID refers to the new host.  Applications
would need to know a unique host CID in order to work correctly across
live migration.

Although I appreciate your drive to make the device as flexible as
possible, if we want to do this we are totally beyond AF_VSOCK semantics
and would be better served by a separate effort that avoids confusion
between classic AF_VSOCK semantics and virtio socket semantics.

Can we please treat AF_VSOCK semantics as the requirements we're trying
to implement?  It supports qemu-guest-agent and as I described in a
previous mail could also support transparent connection migration a la
CRIU sockets.

Stefan



* Re: [virtio-dev] virtio-vsock live migration
       [not found]           ` <20160406125550.GB17538@stefanha-x1.localdomain>
@ 2016-04-06 13:17             ` Michael S. Tsirkin
  0 siblings, 0 replies; 11+ messages in thread
From: Michael S. Tsirkin @ 2016-04-06 13:17 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: virtio-dev, Claudio Imbrenda, Christian Borntraeger,
	Matt Benjamin, virtualization, Christoffer Dall

On Wed, Apr 06, 2016 at 01:55:50PM +0100, Stefan Hajnoczi wrote:
> On Wed, Mar 16, 2016 at 05:05:19PM +0200, Michael S. Tsirkin wrote:
> > > > >  NFS and netperf are the first two protocols I looked
> > > > >    at and both transmit address information across the connection...
> > > > 
> > > > 
> > > > Does netperf really attempt to get local IP
> > > > and then send that inline within the connection?
> > > 
> > > Yes, netperf has separate control and data sockets.  I think part of the
> > > reason for this split is that the control connection can communicate the
> > > address details for the data connection over a different protocol (TCP +
> > > RDMA?), but I'm not sure.
> > > 
> > > Stefan
> > 
> > Thinking about it, netperf does not survive disconnects.
> > So the current protocol would be useless for it.
> > I am not sure about NFS but from (long past) experience it did not
> > attempt to re-resolve the name to address, so changing an
> > address would break it as well.
> > 
> > So I think these applications would have to use a 64 bit CID.
> > 
> > Why, then, do we care about one aspect of these applications
> > (creating connections) and not another (not breaking them)?
> 
> I care about mapping the semantics of AF_VSOCK to virtio-vsock.
> AF_VSOCK was implemented with the ability to plug in additional
> transports (like virtio).  This allows guest agents and other
> applications to compile once and run on any transport.
> 
> If we change virtio-vsock to rely on unique addresses across migration
> then we lose zero-configuration.

Could be an option.  Management has very little trouble configuring
unique CIDs, and it does care about migration.  Desktop users don't
migrate and they want zero configuration.

> AF_VSOCK applications use the
> VMADDR_CID_HOST (2) constant to communicate with the host.  After live
> migration this well-known CID refers to the new host.  Applications
> would need to know a unique host CID in order to work correctly across
> live migration.
> 
> Although I appreciate your drive to make the device as flexible as
> possible, if we want to do this we are totally beyond AF_VSOCK semantics
> and would be better served by a separate effort that avoids confusion
> between classic AF_VSOCK semantics and virtio socket semantics.

Maybe, though I merely proposed reserving some space so we can extend
CIDs to 64 bit (or 128 bit, like the Hyper-V guys would like?)
in the future without too much pain.
To me this doesn't seem worth starting a completely separate effort for.

> Can we please treat AF_VSOCK semantics as the requirements we're trying
> to implement?  It supports qemu-guest-agent and as I described in a
> previous mail could also support transparent connection migration a la
> CRIU sockets.
> 
> Stefan

I only wish the semantics were better documented somewhere :)

-- 
MST


end of thread, other threads:[~2016-04-06 13:17 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-03 15:37 virtio-vsock live migration Stefan Hajnoczi
2016-03-10 23:56 ` Michael S. Tsirkin
2016-03-14 11:13 ` [virtio-dev] " Michael S. Tsirkin
     [not found] ` <20160311014147-mutt-send-email-mst@redhat.com>
2016-03-15 15:10   ` Stefan Hajnoczi
     [not found] ` <20160314130150-mutt-send-email-mst@redhat.com>
2016-03-15 15:15   ` [virtio-dev] " Stefan Hajnoczi
     [not found]   ` <20160315151529.GB26263@stefanha-x1.localdomain>
2016-03-15 16:12     ` Michael S. Tsirkin
     [not found]     ` <20160315180916-mutt-send-email-mst@redhat.com>
2016-03-16 14:32       ` Stefan Hajnoczi
2016-03-16 14:58         ` Matt Benjamin
2016-03-16 15:05         ` Michael S. Tsirkin
     [not found]         ` <20160316163344-mutt-send-email-mst@redhat.com>
2016-04-06 12:55           ` Stefan Hajnoczi
     [not found]           ` <20160406125550.GB17538@stefanha-x1.localdomain>
2016-04-06 13:17             ` Michael S. Tsirkin
