* [PATCH 1/1] yank: Unregister function when using TLS migration
@ 2021-05-26 20:05 Leonardo Bras
  2021-05-26 20:40 ` Peter Xu
  2021-05-26 21:24 ` Lukas Straub
  0 siblings, 2 replies; 14+ messages in thread
From: Leonardo Bras @ 2021-05-26 20:05 UTC
  To: Juan Quintela, Dr. David Alan Gilbert, peterx, lukasstraub2
  Cc: Leonardo Bras, qemu-devel

After the yank feature was introduced, whenever a migration is started using TLS,
the following error happens on both the source and destination hosts:

(qemu) qemu-kvm: ../util/yank.c:107: yank_unregister_instance:
Assertion `QLIST_EMPTY(&entry->yankfns)' failed.

This happens because of a missing yank_unregister_function() when using
qio-channel-tls.

Fix this by also calling yank_unregister_function() for TYPE_QIO_CHANNEL_TLS
objects in channel_close() and multifd_load_cleanup().
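A minimal, self-contained C model of the failure (a sketch, not QEMU code): `Chan`, `toy_register()`, `toy_unregister()`, `close_old()` and `close_fixed()` are hypothetical stand-ins, the reference-count check is omitted, and `toy_instance_can_unregister()` plays the role of the `QLIST_EMPTY(&entry->yankfns)` assertion. The yank function is registered on the socket channel, while the close path is handed the TLS wrapper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for QIOChannelSocket and QIOChannelTLS. */
typedef enum { CHAN_SOCKET, CHAN_TLS } ChanType;

typedef struct Chan {
    ChanType type;
    struct Chan *master; /* wrapped socket channel when type == CHAN_TLS */
} Chan;

/* Toy registry: one slot per registered yank function. */
#define MAX_FNS 8
static Chan *registered[MAX_FNS];
static int nregistered;

static void toy_register(Chan *c)
{
    assert(nregistered < MAX_FNS);
    registered[nregistered++] = c;
}

static void toy_unregister(Chan *c)
{
    for (int i = 0; i < nregistered; i++) {
        if (registered[i] == c) {
            registered[i] = registered[--nregistered];
            return;
        }
    }
    assert(!"unregistering a function that was never registered");
}

/* Stands in for the QLIST_EMPTY() check in yank_unregister_instance(). */
static int toy_instance_can_unregister(void)
{
    return nregistered == 0;
}

/* Pre-fix close path: only socket channels unregister. */
static void close_old(Chan *c)
{
    if (c->type == CHAN_SOCKET) {
        toy_unregister(c);
    }
}

/* Post-fix close path: a TLS channel unregisters its master socket. */
static void close_fixed(Chan *c)
{
    if (c->type == CHAN_SOCKET) {
        toy_unregister(c);
    } else if (c->type == CHAN_TLS) {
        toy_unregister(c->master);
    }
}
```

With close_old(), closing the TLS wrapper leaves the socket's entry behind, which is the state that makes yank_unregister_instance() assert; close_fixed() unwraps to the master socket, as the channel_close() hunk below does.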

Fixes: 50186051f ("Introduce yank feature")
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1964326
Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
---
 migration/multifd.c           | 5 +++--
 migration/qemu-file-channel.c | 7 +++++++
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 0a4803cfcc..be8656f4c0 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -987,8 +987,9 @@ int multifd_load_cleanup(Error **errp)
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDRecvParams *p = &multifd_recv_state->params[i];
 
-        if (object_dynamic_cast(OBJECT(p->c), TYPE_QIO_CHANNEL_SOCKET)
-            && OBJECT(p->c)->ref == 1) {
+        if ((object_dynamic_cast(OBJECT(p->c), TYPE_QIO_CHANNEL_SOCKET) ||
+            (object_dynamic_cast(OBJECT(p->c), TYPE_QIO_CHANNEL_TLS)))  &&
+            OBJECT(p->c)->ref == 1) {
             yank_unregister_function(MIGRATION_YANK_INSTANCE,
                                      migration_yank_iochannel,
                                      QIO_CHANNEL(p->c));
diff --git a/migration/qemu-file-channel.c b/migration/qemu-file-channel.c
index 876d05a540..4f79090f3f 100644
--- a/migration/qemu-file-channel.c
+++ b/migration/qemu-file-channel.c
@@ -26,6 +26,7 @@
 #include "qemu-file-channel.h"
 #include "qemu-file.h"
 #include "io/channel-socket.h"
+#include "io/channel-tls.h"
 #include "qemu/iov.h"
 #include "qemu/yank.h"
 #include "yank_functions.h"
@@ -111,6 +112,12 @@ static int channel_close(void *opaque, Error **errp)
         yank_unregister_function(MIGRATION_YANK_INSTANCE,
                                  migration_yank_iochannel,
                                  QIO_CHANNEL(ioc));
+    } else if (object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_TLS)
+               && OBJECT(ioc)->ref == 1) {
+        QIOChannelTLS *tioc = opaque;
+        yank_unregister_function(MIGRATION_YANK_INSTANCE,
+                                 migration_yank_iochannel,
+                                 QIO_CHANNEL(tioc->master));
     }
     object_unref(OBJECT(ioc));
     return ret;
-- 
2.31.1

* Re: [PATCH 1/1] yank: Unregister function when using TLS migration
  2021-05-26 20:05 [PATCH 1/1] yank: Unregister function when using TLS migration Leonardo Bras
@ 2021-05-26 20:40 ` Peter Xu
  2021-05-26 21:21   ` Lukas Straub
  2021-05-26 21:24 ` Lukas Straub
  1 sibling, 1 reply; 14+ messages in thread
From: Peter Xu @ 2021-05-26 20:40 UTC
  To: Leonardo Bras, lukasstraub2
  Cc: qemu-devel, lukasstraub2, Dr. David Alan Gilbert, Juan Quintela

On Wed, May 26, 2021 at 05:05:40PM -0300, Leonardo Bras wrote:
> After yank feature was introduced, whenever migration is started using TLS,
> the following error happens in both source and destination hosts:
> 
> (qemu) qemu-kvm: ../util/yank.c:107: yank_unregister_instance:
> Assertion `QLIST_EMPTY(&entry->yankfns)' failed.
> 
> This happens because of a missing yank_unregister_function() when using
> qio-channel-tls.
> 
> Fix this by also allowing TYPE_QIO_CHANNEL_TLS object type to perform
> yank_unregister_function() in channel_close() and multifd_load_cleanup().
> 
> Fixes: 50186051f ("Introduce yank feature")
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1964326
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>

Leo,

Thanks for looking into it!

So before looking into the fix... I do wonder why we only enable yank
on socket-typed channels, as I think TLS should also work with qio_channel_shutdown().

IIUC, the confusing part here is that we register only for qio-socket, whereas TLS
actually calls migration_channel_connect() twice, first with a qio-socket,
then with the real TLS channel.  For TLS I feel like we have registered with the
wrong channel: instead of the wrapped socket ioc, shouldn't we register the
final TLS ioc?
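Peter's alternative can be sketched with a toy C model (hypothetical names, not QEMU code). `toy_connect()` stands in for migration_channel_connect(), which with TLS runs first for the plain socket and then for the TLS channel; in this sketch only the final channel is registered. Register and unregister then see the same object, and yanking the TLS channel still reaches the socket because the toy shutdown forwards to `master`, the way qio_channel_tls_shutdown() forwards to tioc->master.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { CHAN_SOCKET, CHAN_TLS } ChanType;

typedef struct Chan {
    ChanType type;
    struct Chan *master; /* wrapped socket when type == CHAN_TLS */
    bool shut;           /* set once shutdown reaches this channel */
} Chan;

static Chan *yank_target; /* the single channel the toy yank acts on */

/* Toy shutdown: a TLS channel marks itself and forwards to its master. */
static void toy_shutdown(Chan *c)
{
    c->shut = true;
    if (c->type == CHAN_TLS) {
        toy_shutdown(c->master);
    }
}

/*
 * Stand-in for migration_channel_connect(): with TLS it runs first for the
 * plain socket (tls_pending == true) and again for the TLS channel.
 * Only the final channel gets registered.
 */
static void toy_connect(Chan *c, bool tls_pending)
{
    if (!tls_pending) {
        assert(yank_target == NULL);
        yank_target = c;
    }
}

static void toy_yank(void)
{
    if (yank_target) {
        toy_shutdown(yank_target);
    }
}

/* Unregistration uses the very same object the close path is handed. */
static void toy_close(Chan *c)
{
    if (yank_target == c) {
        yank_target = NULL;
    }
}
```

In this model the migration code never has to look at `master` itself; only the shutdown implementation does.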

Lukas, is there a reason?

-- 
Peter Xu

* Re: [PATCH 1/1] yank: Unregister function when using TLS migration
  2021-05-26 20:40 ` Peter Xu
@ 2021-05-26 21:21   ` Lukas Straub
  2021-05-26 21:58     ` Peter Xu
  0 siblings, 1 reply; 14+ messages in thread
From: Lukas Straub @ 2021-05-26 21:21 UTC
  To: Peter Xu; +Cc: qemu-devel, Leonardo Bras, Dr. David Alan Gilbert, Juan Quintela

On Wed, 26 May 2021 16:40:35 -0400
Peter Xu <peterx@redhat.com> wrote:

> Leo,
> 
> Thanks for looking into it!
> 
> So before looking int the fix... I do have a doubt on why we only enable yank
> on socket typed, as I think tls should also work with qio_channel_shutdown().
> 
> IIUC the confused thing here is we register only for qio-socket, however tls
> will actually call migration_channel_connect() twice, first with a qio-socket,
> then with the real tls-socket.  For tls I feel like we have registered with the
> wrong channel - instead of the wrapper socket ioc, we should register to the
> final tls ioc?
> 
> Lukas, is there a reason?
> 

Hi,
There is no specific reason. Both ways work equally well in preventing
QEMU from hanging. Calling shutdown() on the tls-channel just makes it abort a
little sooner (it stops attempting to encrypt and send data).

I don't lean either way. I guess registering it on the tls-channel
makes it a bit more explicit and clearer.

What do you think?

Regards,
Lukas Straub

* Re: [PATCH 1/1] yank: Unregister function when using TLS migration
  2021-05-26 20:05 [PATCH 1/1] yank: Unregister function when using TLS migration Leonardo Bras
  2021-05-26 20:40 ` Peter Xu
@ 2021-05-26 21:24 ` Lukas Straub
  2021-05-26 21:56   ` Leonardo Brás
  1 sibling, 1 reply; 14+ messages in thread
From: Lukas Straub @ 2021-05-26 21:24 UTC
  To: Leonardo Bras; +Cc: qemu-devel, Dr. David Alan Gilbert, peterx, Juan Quintela

On Wed, 26 May 2021 17:05:40 -0300
Leonardo Bras <leobras.c@gmail.com> wrote:

> After yank feature was introduced, whenever migration is started using TLS,
> the following error happens in both source and destination hosts:
> 
> (qemu) qemu-kvm: ../util/yank.c:107: yank_unregister_instance:
> Assertion `QLIST_EMPTY(&entry->yankfns)' failed.
> 
> This happens because of a missing yank_unregister_function() when using
> qio-channel-tls.
> 
> Fix this by also allowing TYPE_QIO_CHANNEL_TLS object type to perform
> yank_unregister_function() in channel_close() and multifd_load_cleanup().
> 
> Fixes: 50186051f ("Introduce yank feature")
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1964326
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> ---
>  migration/multifd.c           | 5 +++--
>  migration/qemu-file-channel.c | 7 +++++++
>  2 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 0a4803cfcc..be8656f4c0 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -987,8 +987,9 @@ int multifd_load_cleanup(Error **errp)
>      for (i = 0; i < migrate_multifd_channels(); i++) {
>          MultiFDRecvParams *p = &multifd_recv_state->params[i];
>  
> -        if (object_dynamic_cast(OBJECT(p->c), TYPE_QIO_CHANNEL_SOCKET)
> -            && OBJECT(p->c)->ref == 1) {
> +        if ((object_dynamic_cast(OBJECT(p->c), TYPE_QIO_CHANNEL_SOCKET) ||
> +            (object_dynamic_cast(OBJECT(p->c), TYPE_QIO_CHANNEL_TLS)))  &&
> +            OBJECT(p->c)->ref == 1) {
>              yank_unregister_function(MIGRATION_YANK_INSTANCE,
>                                       migration_yank_iochannel,
>                                       QIO_CHANNEL(p->c));

The code here should be the same as in channel_close. So for the
tls-channel you have to unregister with QIO_CHANNEL(tioc->master) like
below.
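One way to keep multifd_load_cleanup() and channel_close() from disagreeing is to route both through a single helper. The sketch below is a toy model, not QEMU code: `yank_registered_channel()` and `unregister_yank_for()` are hypothetical names, and `ref` stands in for OBJECT(ioc)->ref.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not QEMU's QOM types. */
typedef enum { CHAN_SOCKET, CHAN_TLS, CHAN_OTHER } ChanType;

typedef struct Chan {
    ChanType type;
    struct Chan *master; /* wrapped socket when type == CHAN_TLS */
    int ref;             /* plays the role of OBJECT(ioc)->ref */
} Chan;

/*
 * Returns the channel the yank function was registered on, or NULL when
 * nothing should be unregistered yet.
 */
static Chan *yank_registered_channel(Chan *c)
{
    if (c->ref != 1) {
        return NULL; /* another user still holds a reference */
    }
    switch (c->type) {
    case CHAN_SOCKET:
        return c;
    case CHAN_TLS:
        return c->master; /* registration happened on the socket */
    default:
        return NULL;
    }
}

static Chan *last_unregistered; /* records what the toy unregister saw */

static void toy_unregister(Chan *c)
{
    last_unregistered = c;
}

/*
 * Single entry point that both channel_close() and
 * multifd_load_cleanup() would call, so the two cannot drift apart.
 */
static void unregister_yank_for(Chan *c)
{
    Chan *target = yank_registered_channel(c);

    if (target) {
        toy_unregister(target);
    }
}
```

Keeping the type dispatch in one place means a future channel type only has to be handled once.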

Regards,
Lukas Straub

> diff --git a/migration/qemu-file-channel.c b/migration/qemu-file-channel.c
> index 876d05a540..4f79090f3f 100644
> --- a/migration/qemu-file-channel.c
> +++ b/migration/qemu-file-channel.c
> @@ -26,6 +26,7 @@
>  #include "qemu-file-channel.h"
>  #include "qemu-file.h"
>  #include "io/channel-socket.h"
> +#include "io/channel-tls.h"
>  #include "qemu/iov.h"
>  #include "qemu/yank.h"
>  #include "yank_functions.h"
> @@ -111,6 +112,12 @@ static int channel_close(void *opaque, Error **errp)
>          yank_unregister_function(MIGRATION_YANK_INSTANCE,
>                                   migration_yank_iochannel,
>                                   QIO_CHANNEL(ioc));
> +    } else if (object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_TLS)
> +               && OBJECT(ioc)->ref == 1) {
> +        QIOChannelTLS *tioc = opaque;
> +        yank_unregister_function(MIGRATION_YANK_INSTANCE,
> +                                 migration_yank_iochannel,
> +                                 QIO_CHANNEL(tioc->master));
>      }
>      object_unref(OBJECT(ioc));
>      return ret;

* Re: [PATCH 1/1] yank: Unregister function when using TLS migration
  2021-05-26 21:24 ` Lukas Straub
@ 2021-05-26 21:56   ` Leonardo Brás
  0 siblings, 0 replies; 14+ messages in thread
From: Leonardo Brás @ 2021-05-26 21:56 UTC
  To: Lukas Straub; +Cc: qemu-devel, Dr. David Alan Gilbert, peterx, Juan Quintela

Hello Lukas, thanks for this feedback!

On Wed, 2021-05-26 at 23:24 +0200, Lukas Straub wrote:
> > diff --git a/migration/multifd.c b/migration/multifd.c
> > index 0a4803cfcc..be8656f4c0 100644
> > --- a/migration/multifd.c
> > +++ b/migration/multifd.c
> > @@ -987,8 +987,9 @@ int multifd_load_cleanup(Error **errp)
> >      for (i = 0; i < migrate_multifd_channels(); i++) {
> >          MultiFDRecvParams *p = &multifd_recv_state->params[i];
> >  
> > -        if (object_dynamic_cast(OBJECT(p->c), TYPE_QIO_CHANNEL_SOCKET)
> > -            && OBJECT(p->c)->ref == 1) {
> > +        if ((object_dynamic_cast(OBJECT(p->c), TYPE_QIO_CHANNEL_SOCKET) ||
> > +            (object_dynamic_cast(OBJECT(p->c), TYPE_QIO_CHANNEL_TLS)))  &&
> > +            OBJECT(p->c)->ref == 1) {
> >              yank_unregister_function(MIGRATION_YANK_INSTANCE,
> >                                       migration_yank_iochannel,
> >                                       QIO_CHANNEL(p->c));
> 
> The code here should be the same as in channel_close. So for the
> tls-channel you have to unregister with QIO_CHANNEL(tioc->master) like
> below.

ok, sure, I will send a v2.

Thanks!

* Re: [PATCH 1/1] yank: Unregister function when using TLS migration
  2021-05-26 21:21   ` Lukas Straub
@ 2021-05-26 21:58     ` Peter Xu
  2021-05-27  8:46       ` Daniel P. Berrangé
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Xu @ 2021-05-26 21:58 UTC
  To: Lukas Straub
  Cc: qemu-devel, Leonardo Bras, Daniel P. Berrange,
	Dr. David Alan Gilbert, Juan Quintela

On Wed, May 26, 2021 at 11:21:03PM +0200, Lukas Straub wrote:
> On Wed, 26 May 2021 16:40:35 -0400
> Peter Xu <peterx@redhat.com> wrote:
> 
> > Leo,
> > 
> > Thanks for looking into it!
> > 
> > So before looking int the fix... I do have a doubt on why we only enable yank
> > on socket typed, as I think tls should also work with qio_channel_shutdown().
> > 
> > IIUC the confused thing here is we register only for qio-socket, however tls
> > will actually call migration_channel_connect() twice, first with a qio-socket,
> > then with the real tls-socket.  For tls I feel like we have registered with the
> > wrong channel - instead of the wrapper socket ioc, we should register to the
> > final tls ioc?
> > 
> > Lukas, is there a reason?
> > 
> 
> Hi,
> There is no specific reason. Both ways work equally well in preventing
> qemu from hanging. shutdown() for tls-channel just makes it abort a
> little sooner (by not attempting to encrypt and send data anymore).
> 
> I don't lean either way. I guess registering it on the tls-channel
> makes is a bit more explicit and clearer.

Agreed, because IMHO the migration code should logically not be aware of
IOChannel internals; e.g., we shouldn't need to know that ioc->master is the
socket ioc underneath the TLS ioc in order to unregister.

-- 
Peter Xu

* Re: [PATCH 1/1] yank: Unregister function when using TLS migration
  2021-05-26 21:58     ` Peter Xu
@ 2021-05-27  8:46       ` Daniel P. Berrangé
  2021-05-27 12:23         ` Peter Xu
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel P. Berrangé @ 2021-05-27  8:46 UTC
  To: Peter Xu
  Cc: qemu-devel, Leonardo Bras, Lukas Straub, Dr. David Alan Gilbert,
	Juan Quintela

On Wed, May 26, 2021 at 05:58:58PM -0400, Peter Xu wrote:
> On Wed, May 26, 2021 at 11:21:03PM +0200, Lukas Straub wrote:
> > On Wed, 26 May 2021 16:40:35 -0400
> > Peter Xu <peterx@redhat.com> wrote:
> > 
> > > Leo,
> > > 
> > > Thanks for looking into it!
> > > 
> > > So before looking int the fix... I do have a doubt on why we only enable yank
> > > on socket typed, as I think tls should also work with qio_channel_shutdown().
> > > 
> > > IIUC the confused thing here is we register only for qio-socket, however tls
> > > will actually call migration_channel_connect() twice, first with a qio-socket,
> > > then with the real tls-socket.  For tls I feel like we have registered with the
> > > wrong channel - instead of the wrapper socket ioc, we should register to the
> > > final tls ioc?
> > > 
> > > Lukas, is there a reason?
> > > 
> > 
> > Hi,
> > There is no specific reason. Both ways work equally well in preventing
> > qemu from hanging. shutdown() for tls-channel just makes it abort a
> > little sooner (by not attempting to encrypt and send data anymore).
> > 
> > I don't lean either way. I guess registering it on the tls-channel
> > makes is a bit more explicit and clearer.
> 
> Agreed, because IMHO logically the migration code should not be aware of
> internals of IOChannels, e.g., we shouldn't need to know ioc->master is the
> socket ioc of tls ioc to unregister.

I think it is actually better to ignore the TLS channel and *always* yank
on the underlying socket IO channel. The yank functionality is intended to
be used in a scenario where we know the channels are broken.  If yank
calls the higher-level IO channel, it may try to do a
cleanup shutdown that we know will fail because of the broken network.

Conceptually we just want to yank out the socket channel and leave
everything above that to just deal with the fallout of the terminated
socket.
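The layering described above, as a toy C model (hypothetical `Sock`/`Tls` types, not QEMU's QIOChannelSocket/QIOChannelTLS): the yank callback only marks the socket as shut down, and the TLS layer finds out on its next write through an ordinary I/O error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Sock {
    bool shut; /* set by the yank callback; all later I/O fails */
} Sock;

typedef struct Tls {
    Sock *master; /* the socket the TLS session runs over */
} Tls;

/* Toy socket write: fails once the socket has been shut down. */
static int sock_write(Sock *s, size_t len)
{
    return s->shut ? -1 : (int)len;
}

/* Toy TLS write: a real one would encrypt first; here it just passes through. */
static int tls_write(Tls *t, size_t len)
{
    return sock_write(t->master, len);
}

/* Yank acts only on the lowest layer and never touches TLS state. */
static void yank_socket(void *opaque)
{
    Sock *s = opaque;

    s->shut = true;
}
```

Nothing in yank_socket() depends on TLS state, which is the property being argued for: whatever the TLS layer does afterwards, the yank itself cannot get stuck in it.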

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

* Re: [PATCH 1/1] yank: Unregister function when using TLS migration
  2021-05-27  8:46       ` Daniel P. Berrangé
@ 2021-05-27 12:23         ` Peter Xu
  2021-05-27 12:37           ` Daniel P. Berrangé
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Xu @ 2021-05-27 12:23 UTC
  To: Daniel P. Berrangé
  Cc: qemu-devel, Leonardo Bras, Lukas Straub, Dr. David Alan Gilbert,
	Juan Quintela

On Thu, May 27, 2021 at 09:46:54AM +0100, Daniel P. Berrangé wrote:
> On Wed, May 26, 2021 at 05:58:58PM -0400, Peter Xu wrote:
> > On Wed, May 26, 2021 at 11:21:03PM +0200, Lukas Straub wrote:
> > > On Wed, 26 May 2021 16:40:35 -0400
> > > Peter Xu <peterx@redhat.com> wrote:
> > > 
> > > > Leo,
> > > > 
> > > > Thanks for looking into it!
> > > > 
> > > > So before looking int the fix... I do have a doubt on why we only enable yank
> > > > on socket typed, as I think tls should also work with qio_channel_shutdown().
> > > > 
> > > > IIUC the confused thing here is we register only for qio-socket, however tls
> > > > will actually call migration_channel_connect() twice, first with a qio-socket,
> > > > then with the real tls-socket.  For tls I feel like we have registered with the
> > > > wrong channel - instead of the wrapper socket ioc, we should register to the
> > > > final tls ioc?
> > > > 
> > > > Lukas, is there a reason?
> > > > 
> > > 
> > > Hi,
> > > There is no specific reason. Both ways work equally well in preventing
> > > qemu from hanging. shutdown() for tls-channel just makes it abort a
> > > little sooner (by not attempting to encrypt and send data anymore).
> > > 
> > > I don't lean either way. I guess registering it on the tls-channel
> > > makes is a bit more explicit and clearer.
> > 
> > Agreed, because IMHO logically the migration code should not be aware of
> > internals of IOChannels, e.g., we shouldn't need to know ioc->master is the
> > socket ioc of tls ioc to unregister.
> 
> I think it is atually better to ignore the TLS channel and *always* yank
> on the undering socket IO channel. The yank functionality is intended to
> be used in a scenario where we know the channels are broken.  If yank
> calls the high level IO channel it is potentially going to try to do a
> cleanup shutdown that we know will fail because of the broken network.

Could you elaborate on what the "cleanup shutdown" would be?

The yank calls migration_yank_iochannel:

void migration_yank_iochannel(void *opaque)
{
    QIOChannel *ioc = QIO_CHANNEL(opaque);

    qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
}

And qio_channel_shutdown() for TLS does nothing but forward the request to
the master channel:

static int qio_channel_tls_shutdown(QIOChannel *ioc,
                                    QIOChannelShutdown how,
                                    Error **errp)
{
    QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);

    qatomic_or(&tioc->shutdown, how);

    return qio_channel_shutdown(tioc->master, how, errp);
}

So I thought it was a nice wrapper exactly for things like this, and I don't
see it doing anything beyond the io_shutdown of the socket channel.  Did I
miss something?

Thanks,

-- 
Peter Xu

* Re: [PATCH 1/1] yank: Unregister function when using TLS migration
  2021-05-27 12:23         ` Peter Xu
@ 2021-05-27 12:37           ` Daniel P. Berrangé
  2021-05-27 13:09             ` Peter Xu
  2021-05-27 15:05             ` Lukas Straub
  0 siblings, 2 replies; 14+ messages in thread
From: Daniel P. Berrangé @ 2021-05-27 12:37 UTC
  To: Peter Xu
  Cc: qemu-devel, Leonardo Bras, Lukas Straub, Dr. David Alan Gilbert,
	Juan Quintela

On Thu, May 27, 2021 at 08:23:52AM -0400, Peter Xu wrote:
> On Thu, May 27, 2021 at 09:46:54AM +0100, Daniel P. Berrangé wrote:
> > On Wed, May 26, 2021 at 05:58:58PM -0400, Peter Xu wrote:
> > > On Wed, May 26, 2021 at 11:21:03PM +0200, Lukas Straub wrote:
> > > > On Wed, 26 May 2021 16:40:35 -0400
> > > > Peter Xu <peterx@redhat.com> wrote:
> > > > 
> > > > > Leo,
> > > > > 
> > > > > Thanks for looking into it!
> > > > > 
> > > > > So before looking int the fix... I do have a doubt on why we only enable yank
> > > > > on socket typed, as I think tls should also work with qio_channel_shutdown().
> > > > > 
> > > > > IIUC the confused thing here is we register only for qio-socket, however tls
> > > > > will actually call migration_channel_connect() twice, first with a qio-socket,
> > > > > then with the real tls-socket.  For tls I feel like we have registered with the
> > > > > wrong channel - instead of the wrapper socket ioc, we should register to the
> > > > > final tls ioc?
> > > > > 
> > > > > Lukas, is there a reason?
> > > > > 
> > > > 
> > > > Hi,
> > > > There is no specific reason. Both ways work equally well in preventing
> > > > qemu from hanging. shutdown() for tls-channel just makes it abort a
> > > > little sooner (by not attempting to encrypt and send data anymore).
> > > > 
> > > > I don't lean either way. I guess registering it on the tls-channel
> > > > makes is a bit more explicit and clearer.
> > > 
> > > Agreed, because IMHO logically the migration code should not be aware of
> > > internals of IOChannels, e.g., we shouldn't need to know ioc->master is the
> > > socket ioc of tls ioc to unregister.
> > 
> > I think it is atually better to ignore the TLS channel and *always* yank
> > on the undering socket IO channel. The yank functionality is intended to
> > be used in a scenario where we know the channels are broken.  If yank
> > calls the high level IO channel it is potentially going to try to do a
> > cleanup shutdown that we know will fail because of the broken network.
> 
> Could you elaborate what's the "cleanup shutdown"?
> 
> The yank calls migration_yank_iochannel:
> 
> void migration_yank_iochannel(void *opaque)
> {
>     QIOChannel *ioc = QIO_CHANNEL(opaque);
> 
>     qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
> }
> 
> Where qio_channel_shutdown for tls is nothing but delivers that to the master
> channel:
> 
> static int qio_channel_tls_shutdown(QIOChannel *ioc,
>                                     QIOChannelShutdown how,
>                                     Error **errp)
> {
>     QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);
> 
>     qatomic_or(&tioc->shutdown, how);
> 
>     return qio_channel_shutdown(tioc->master, how, errp);
> }
> 
> So I thought it was a nice wrapper just for things like this, and I didn't see
> anything it does more than the io_shutdown for the socket channel.  Did I miss
> something?

That's the case today, but don't assume it will stay that way forever.
There is a mechanism in TLS for doing a clean shutdown, which we've
debated including.

In general, apps *can* just call the shutdown method on the QIOChannelTLS
object no matter what.  Yank is a little special because it needs to be
guaranteed to work even when the network is dead. So yank should always
call the low-level QIOChannelSocket directly, so there is a strong
guarantee it can't block on anything.
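Why a clean TLS shutdown conflicts with yank can be shown with a toy model. TLS defines a close_notify alert that a graceful close sends to the peer; this sketch assumes a hypothetical `tls_graceful_shutdown()` that must get that alert onto the wire first, and models a dead network as a send that would block. The retry cap exists only so the model terminates; all names are illustrative, not QEMU API.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { SEND_OK, SEND_WOULD_BLOCK } SendStatus;

typedef struct Sock {
    bool network_alive; /* false once the link is broken */
    bool shut;
} Sock;

/* Toy send: on a dead network the send buffer fills and send would block. */
static SendStatus sock_send(Sock *s)
{
    return s->network_alive ? SEND_OK : SEND_WOULD_BLOCK;
}

/* shutdown(2)-style teardown: purely local, never waits on the peer. */
static void sock_shutdown(Sock *s)
{
    s->shut = true;
}

/*
 * Hypothetical graceful TLS shutdown: close_notify must be sent before the
 * socket is torn down. It gives up after max_tries so the model terminates;
 * a blocking implementation would simply hang here.
 */
static bool tls_graceful_shutdown(Sock *s, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        if (sock_send(s) == SEND_OK) {
            sock_shutdown(s);
            return true;
        }
    }
    return false;
}
```

On a dead network the graceful path never reaches the socket teardown, while calling sock_shutdown() directly always completes, which is the guarantee yank relies on.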


Regards,
Daniel

* Re: [PATCH 1/1] yank: Unregister function when using TLS migration
  2021-05-27 12:37           ` Daniel P. Berrangé
@ 2021-05-27 13:09             ` Peter Xu
  2021-05-27 13:17               ` Daniel P. Berrangé
  2021-05-27 15:05             ` Lukas Straub
  1 sibling, 1 reply; 14+ messages in thread
From: Peter Xu @ 2021-05-27 13:09 UTC
  To: Daniel P. Berrangé
  Cc: qemu-devel, Leonardo Bras, Lukas Straub, Dr. David Alan Gilbert,
	Juan Quintela

On Thu, May 27, 2021 at 01:37:42PM +0100, Daniel P. Berrangé wrote:
> On Thu, May 27, 2021 at 08:23:52AM -0400, Peter Xu wrote:
> > On Thu, May 27, 2021 at 09:46:54AM +0100, Daniel P. Berrangé wrote:
> > > On Wed, May 26, 2021 at 05:58:58PM -0400, Peter Xu wrote:
> > > > On Wed, May 26, 2021 at 11:21:03PM +0200, Lukas Straub wrote:
> > > > > On Wed, 26 May 2021 16:40:35 -0400
> > > > > Peter Xu <peterx@redhat.com> wrote:
> > > > > 
> > > > > > Leo,
> > > > > > 
> > > > > > Thanks for looking into it!
> > > > > > 
> > > > > > So before looking int the fix... I do have a doubt on why we only enable yank
> > > > > > on socket typed, as I think tls should also work with qio_channel_shutdown().
> > > > > > 
> > > > > > IIUC the confused thing here is we register only for qio-socket, however tls
> > > > > > will actually call migration_channel_connect() twice, first with a qio-socket,
> > > > > > then with the real tls-socket.  For tls I feel like we have registered with the
> > > > > > wrong channel - instead of the wrapper socket ioc, we should register to the
> > > > > > final tls ioc?
> > > > > > 
> > > > > > Lukas, is there a reason?
> > > > > > 
> > > > > 
> > > > > Hi,
> > > > > There is no specific reason. Both ways work equally well in preventing
> > > > > qemu from hanging. shutdown() for tls-channel just makes it abort a
> > > > > little sooner (by not attempting to encrypt and send data anymore).
> > > > > 
> > > > > I don't lean either way. I guess registering it on the tls-channel
> > > > > makes is a bit more explicit and clearer.
> > > > 
> > > > Agreed, because IMHO logically the migration code should not be aware of
> > > > internals of IOChannels, e.g., we shouldn't need to know ioc->master is the
> > > > socket ioc of tls ioc to unregister.
> > > 
> > > I think it is atually better to ignore the TLS channel and *always* yank
> > > on the undering socket IO channel. The yank functionality is intended to
> > > be used in a scenario where we know the channels are broken.  If yank
> > > calls the high level IO channel it is potentially going to try to do a
> > > cleanup shutdown that we know will fail because of the broken network.
> > 
> > Could you elaborate what's the "cleanup shutdown"?
> > 
> > The yank calls migration_yank_iochannel:
> > 
> > void migration_yank_iochannel(void *opaque)
> > {
> >     QIOChannel *ioc = QIO_CHANNEL(opaque);
> > 
> >     qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
> > }
> > 
> > Where qio_channel_shutdown for tls is nothing but delivers that to the master
> > channel:
> > 
> > static int qio_channel_tls_shutdown(QIOChannel *ioc,
> >                                     QIOChannelShutdown how,
> >                                     Error **errp)
> > {
> >     QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);
> > 
> >     qatomic_or(&tioc->shutdown, how);
> > 
> >     return qio_channel_shutdown(tioc->master, how, errp);
> > }
> > 
> > So I thought it was a nice wrapper just for things like this, and I didn't see
> > anything it does more than the io_shutdown for the socket channel.  Did I miss
> > something?
> 
> Today thats the case, but don't assume it will be the case forever.
> There is a mechanism in TLS for doing clean shutdown which we've
> debated including.
> 
> In general apps *can* just call the shutdown method on the QIOChannelTLS
> object no matter what.  Yank is just a little bit special because of its
> need to be guaranteed to work even when the network is dead. So yank
> should always directly call the low level QIOChannelSocket, so thre is
> a strong guarantee it can't block on something.

Hmm, I am still not fully convinced that that's a valid reason for the migration
code to be aware of how the socket is managed inside TLS channels...

Does that sound more like a good reason to extend QIOChannelShutdown with a
QIO_CHANNEL_SHUTDOWN_FORCE flag, so that shutdown never blocks when FORCE is
set?  Then we can switch the yank function to use that.

What do you think?

-- 
Peter Xu



* Re: [PATCH 1/1] yank: Unregister function when using TLS migration
  2021-05-27 13:09             ` Peter Xu
@ 2021-05-27 13:17               ` Daniel P. Berrangé
  2021-05-27 13:34                 ` Dr. David Alan Gilbert
  2021-05-27 13:35                 ` Peter Xu
  0 siblings, 2 replies; 14+ messages in thread
From: Daniel P. Berrangé @ 2021-05-27 13:17 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Leonardo Bras, Lukas Straub, Dr. David Alan Gilbert,
	Juan Quintela

On Thu, May 27, 2021 at 09:09:09AM -0400, Peter Xu wrote:
> [...]
> 
> Hmm, I am still not fully convinced that that's a valid reason the migration
> code should be aware of how the socket is managed in tls channels...
> 
> Does that sound more like a good reason to introduce QIOChannelShutdown with
> QIO_CHANNEL_SHUTDOWN_FORCE so it'll always not block if FORCE set?  Then we can
> switch the yank function to use that.
> 
> What do you think?

I think that's unnecessary - the migration code already does similar
things elsewhere when it wants to distinguish TLS usage, so this is not
anything new conceptually.
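Editor's sketch: the approach argued for here, always yanking on the underlying socket channel even when TLS is in use, can be modelled by resolving a TLS channel to its master socket both when registering and when unregistering, so the two calls always agree on the object. This assumes, as described earlier in the thread, that registration happens on the socket channel while close/cleanup later sees the TLS channel. All names below (ToyChannel, yank_target(), toy_yank_register(), toy_yank_unregister()) are hypothetical illustrations, not QEMU's yank or QIOChannel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for QIOChannelSocket / QIOChannelTLS (hypothetical types). */
typedef enum { TOY_CHANNEL_SOCKET, TOY_CHANNEL_TLS } ToyChannelType;

typedef struct ToyChannel {
    ToyChannelType type;
    struct ToyChannel *master;   /* underlying socket channel when type == TLS */
} ToyChannel;

/* Hypothetical helper: always resolve to the low-level socket channel,
 * so a yank never has to go through the TLS layer. */
static ToyChannel *yank_target(ToyChannel *ioc)
{
    while (ioc->type == TOY_CHANNEL_TLS) {
        ioc = ioc->master;
    }
    return ioc;
}

/* Minimal yank registry: one slot per registered channel. */
#define TOY_MAX_YANK 8
static ToyChannel *toy_yank_fns[TOY_MAX_YANK];
static size_t toy_yank_count;

static void toy_yank_register(ToyChannel *ioc)
{
    assert(toy_yank_count < TOY_MAX_YANK);
    toy_yank_fns[toy_yank_count++] = yank_target(ioc);
}

/* Returns true if a matching registration was found and removed. */
static bool toy_yank_unregister(ToyChannel *ioc)
{
    ToyChannel *target = yank_target(ioc);

    for (size_t i = 0; i < toy_yank_count; i++) {
        if (toy_yank_fns[i] == target) {
            /* Swap-remove: move the last entry into this slot. */
            toy_yank_fns[i] = toy_yank_fns[--toy_yank_count];
            return true;
        }
    }
    /* A miss leaves an entry behind, analogous to the failed
     * QLIST_EMPTY(&entry->yankfns) assertion in the bug report. */
    return false;
}
```

In this model, registering with the socket channel during connect and unregistering later through the TLS wrapper both resolve to the same socket object, so the registry ends up empty regardless of which wrapper the cleanup path happens to hold.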

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



* Re: [PATCH 1/1] yank: Unregister function when using TLS migration
  2021-05-27 13:17               ` Daniel P. Berrangé
@ 2021-05-27 13:34                 ` Dr. David Alan Gilbert
  2021-05-27 13:35                 ` Peter Xu
  1 sibling, 0 replies; 14+ messages in thread
From: Dr. David Alan Gilbert @ 2021-05-27 13:34 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Leonardo Bras, Lukas Straub, qemu-devel, Peter Xu, Juan Quintela

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Thu, May 27, 2021 at 09:09:09AM -0400, Peter Xu wrote:
> > [...]
> > 
> > Hmm, I am still not fully convinced that that's a valid reason the migration
> > code should be aware of how the socket is managed in tls channels...
> > 
> > Does that sound more like a good reason to introduce QIOChannelShutdown with
> > QIO_CHANNEL_SHUTDOWN_FORCE so it'll always not block if FORCE set?  Then we can
> > switch the yank function to use that.
> > 
> > What do you think?
> 
> I think that's unneccessary - the migration code already does similar
> things elsewhere when it wants to distinguish TLS usage, so this is not
> anything new conceptually.

I'd probably agree with Peter here that it would be preferable for the
migration code not to know; what we want to do here is give the channel
the opportunity to do a hard shutdown, and let the channel worry about
whether it can do that.

If you look at the nbd code, its yank function calls
qio_channel_shutdown on whatever channel it happens to have; so that's
probably wrong, since I think you're saying that could block.

Dave

-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [PATCH 1/1] yank: Unregister function when using TLS migration
  2021-05-27 13:17               ` Daniel P. Berrangé
  2021-05-27 13:34                 ` Dr. David Alan Gilbert
@ 2021-05-27 13:35                 ` Peter Xu
  1 sibling, 0 replies; 14+ messages in thread
From: Peter Xu @ 2021-05-27 13:35 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Leonardo Bras, Lukas Straub, Dr. David Alan Gilbert,
	Juan Quintela

On Thu, May 27, 2021 at 02:17:55PM +0100, Daniel P. Berrangé wrote:
> On Thu, May 27, 2021 at 09:09:09AM -0400, Peter Xu wrote:
> > [...]
> > 
> > Hmm, I am still not fully convinced that that's a valid reason the migration
> > code should be aware of how the socket is managed in tls channels...
> > 
> > Does that sound more like a good reason to introduce QIOChannelShutdown with
> > QIO_CHANNEL_SHUTDOWN_FORCE so it'll always not block if FORCE set?  Then we can
> > switch the yank function to use that.
> > 
> > What do you think?
> 
> I think that's unneccessary - the migration code already does similar
> things elsewhere when it wants to distinguish TLS usage, so this is not
> anything new conceptually.

But IMHO the other option is to reduce the TLS differences and treat TLS the
same as the other channels as much as possible, probably starting by reworking
the yank calls.

And IIUC it's not only about migration.  E.g., all the existing yank functions
rely on the fact that the current channel shutdown() won't block, since they
are called from the QMP OOB handler:

char_socket_yank_iochannel[409] qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
nbd_yank[1783]                 qio_channel_shutdown(QIO_CHANNEL(s->sioc), QIO_CHANNEL_SHUTDOWN_BOTH, NULL);

I have no idea whether they use TLS today, but I don't see a strict reason why
they couldn't, at least in the future.  If TLS starts to introduce a shutdown()
that can block, IMHO the cleaner solution is to separate blocking and
non-blocking shutdown(), because we do have scenarios that do not want
shutdown() to block: either introduce QIO_CHANNEL_SHUTDOWN_FORCE, or
QIO_CHANNEL_SHUTDOWN_FULL, which guarantees a full cleanup of the TLS channel
even if it is slower.

We also have other call sites for channel shutdown() besides yank.  I didn't
check them, but I feel it's always good to provide a non-blocking option when
the caller wants one; it looks like a valid requirement, and the change should
be trivial (IIUC it'd be a small patch that conditionally does either a quick
or a slow shutdown, for TLS only).

Thanks,

-- 
Peter Xu



* Re: [PATCH 1/1] yank: Unregister function when using TLS migration
  2021-05-27 12:37           ` Daniel P. Berrangé
  2021-05-27 13:09             ` Peter Xu
@ 2021-05-27 15:05             ` Lukas Straub
  1 sibling, 0 replies; 14+ messages in thread
From: Lukas Straub @ 2021-05-27 15:05 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Leonardo Bras, Dr. David Alan Gilbert, Peter Xu,
	Juan Quintela

On Thu, 27 May 2021 13:37:42 +0100
Daniel P. Berrangé <berrange@redhat.com> wrote:

> On Thu, May 27, 2021 at 08:23:52AM -0400, Peter Xu wrote:
> > [...]
> > 
> > So I thought it was a nice wrapper just for things like this, and I didn't see
> > anything it does more than the io_shutdown for the socket channel.  Did I miss
> > something?  
> 
> Today thats the case, but don't assume it will be the case forever.
> There is a mechanism in TLS for doing clean shutdown which we've
> debated including.

Actually, the requirements of io_shutdown were tightened with the
introduction of the yank feature (commit 8659f317d), and it now reads:

/**
 * qio_channel_shutdown:
 * ...
 * This function is thread-safe, terminates quickly and does not block.
 * ...
 */

And it should probably be tightened further with something like:
"With SHUTDOWN_BOTH all in-flight read()/write() operations on the io
object will be canceled immediately"

From a quick look, at least nbd (nbd_teardown_connection()), migration
(migrate_fd_cancel()) and, of course, yank expect that.
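Editor's sketch: the "terminates quickly and does not block" guarantee is easy to see at the socket level. For a socket channel, shutdown presumably reduces to the shutdown(2) system call, which does not need to wait for the peer. The minimal POSIX sketch below uses a plain AF_UNIX socketpair, not QEMU code, and demo_shutdown_both() is a hypothetical helper. It shows that after shutdown(SHUT_RDWR) a read returns EOF immediately and a write fails with EPIPE instead of blocking. It is single-threaded, so it does not demonstrate cancelling a read() already blocked in another thread, which is the stronger property proposed above for SHUTDOWN_BOTH.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Shut down one end of a connected socket pair and report what subsequent
 * I/O on that end does.  Returns 0 on success, -1 if setup failed. */
static int demo_shutdown_both(ssize_t *read_ret, ssize_t *write_ret,
                              int *write_errno)
{
    int sv[2];
    char buf[16];

    /* Ignore SIGPIPE process-wide so writing to a shut-down socket
     * reports EPIPE instead of killing the process. */
    signal(SIGPIPE, SIG_IGN);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }

    /* Returns immediately: no data exchange with the peer is required. */
    if (shutdown(sv[0], SHUT_RDWR) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* Nothing was ever sent, yet this returns 0 (EOF) instead of blocking. */
    *read_ret = read(sv[0], buf, sizeof(buf));

    /* The write side is shut down as well, so this fails right away. */
    errno = 0;
    *write_ret = write(sv[0], "x", 1);
    *write_errno = errno;

    close(sv[0]);
    close(sv[1]);
    return 0;
}
```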

> In general apps *can* just call the shutdown method on the QIOChannelTLS
> object no matter what.  Yank is just a little bit special because of its
> need to be guaranteed to work even when the network is dead. So yank
> should always directly call the low level QIOChannelSocket, so thre is
> a strong guarantee it can't block on something.

end of thread

Thread overview: 14+ messages
2021-05-26 20:05 [PATCH 1/1] yank: Unregister function when using TLS migration Leonardo Bras
2021-05-26 20:40 ` Peter Xu
2021-05-26 21:21   ` Lukas Straub
2021-05-26 21:58     ` Peter Xu
2021-05-27  8:46       ` Daniel P. Berrangé
2021-05-27 12:23         ` Peter Xu
2021-05-27 12:37           ` Daniel P. Berrangé
2021-05-27 13:09             ` Peter Xu
2021-05-27 13:17               ` Daniel P. Berrangé
2021-05-27 13:34                 ` Dr. David Alan Gilbert
2021-05-27 13:35                 ` Peter Xu
2021-05-27 15:05             ` Lukas Straub
2021-05-26 21:24 ` Lukas Straub
2021-05-26 21:56   ` Leonardo Brás
