All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] Make local migration with TAP network device possible
@ 2022-06-14 11:18 Andrey Ryabinin
  2022-06-14 11:18 ` [PATCH 1/2] chardev: don't set O_NONBLOCK on SCM_RIGHTS file descriptors Andrey Ryabinin
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Andrey Ryabinin @ 2022-06-14 11:18 UTC (permalink / raw)
  To: qemu-devel, Marc-André Lureau, Paolo Bonzini,
	Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Jason Wang
  Cc: Stefan Hajnoczi, yc-core, Andrey Ryabinin

Hi

These couple patches aims to  make possible local migration (within one host)
on the same TAP device used by source and destination QEMU

The scenario looks like this
 1. Create TAP devices and pass file descriptors to source QEMU
 2. Launch destination QEMU (-incoming defer) and pass same descriptors to it.
 3. Start migration


Regarding the first patch: It makes possible to receive file descriptor in non-blocking
state. But I probably didn't cover all FD users which might need to set blocking state after
the patch. So I'm hopping for the hints where else, besides fd_start_incoming_migration()
I need to put qemu_socket_set_block() calls.


Andrey Ryabinin (2):
  chardev: don't set O_NONBLOCK on SCM_RIGHTS file descriptors.
  tap: initialize TAPState->enabled according to the actual state of
    queue

 chardev/char-socket.c |  3 ---
 io/channel-socket.c   |  3 ---
 migration/fd.c        |  2 ++
 net/tap-bsd.c         |  5 +++++
 net/tap-linux.c       | 12 ++++++++++++
 net/tap-solaris.c     |  5 +++++
 net/tap.c             |  2 +-
 net/tap_int.h         |  1 +
 8 files changed, 26 insertions(+), 7 deletions(-)

-- 
2.35.1



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/2] chardev: don't set O_NONBLOCK on SCM_RIGHTS file descriptors.
  2022-06-14 11:18 [PATCH 0/2] Make local migration with TAP network device possible Andrey Ryabinin
@ 2022-06-14 11:18 ` Andrey Ryabinin
  2022-06-15 13:12   ` Stefan Hajnoczi
  2022-06-14 11:21 ` [PATCH 2/2] tap: initialize TAPState->enabled according to the actual state of queue Andrey Ryabinin
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Andrey Ryabinin @ 2022-06-14 11:18 UTC (permalink / raw)
  To: qemu-devel, Marc-André Lureau, Paolo Bonzini,
	Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Jason Wang
  Cc: Stefan Hajnoczi, yc-core, Andrey Ryabinin

This reverts commit 9b938c7262e4 ("chardev: clear O_NONBLOCK on SCM_RIGHTS file descriptors").
File descriptor passed to QEMU via 'getfd' QMP command always
changed to blocking mode. Instead of that, change blocking mode by QEMU
file descriptors users when necessary, e.g. like migration.

We need to preserve the state of the file descriptor in case it's still
used by an external process and before the QEMU itself started
using it.

E.g. our local migration scenario with TAP networking looks like this:
 1. Create TAP devices and pass file descriptors to source QEMU
 2. Launch destination QEMU (-incoming defer) and pass same descriptors to it.
 3. Start migration

In such scenario setting blocking state at stage (2) will hang source QEMU
since TAP fd suddenly become blocking.

Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
---
 chardev/char-socket.c | 3 ---
 io/channel-socket.c   | 3 ---
 migration/fd.c        | 2 ++
 3 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index dc4e218eeb6..c9592fb5836 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -310,9 +310,6 @@ static ssize_t tcp_chr_recv(Chardev *chr, char *buf, size_t len)
             continue;
         }
 
-        /* O_NONBLOCK is preserved across SCM_RIGHTS so reset it */
-        qemu_socket_set_block(fd);
-
 #ifndef MSG_CMSG_CLOEXEC
         qemu_set_cloexec(fd);
 #endif
diff --git a/io/channel-socket.c b/io/channel-socket.c
index dc9c165de11..8b9679460dc 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -479,9 +479,6 @@ static void qio_channel_socket_copy_fds(struct msghdr *msg,
                 continue;
             }
 
-            /* O_NONBLOCK is preserved across SCM_RIGHTS so reset it */
-            qemu_socket_set_block(fd);
-
 #ifndef MSG_CMSG_CLOEXEC
             qemu_set_cloexec(fd);
 #endif
diff --git a/migration/fd.c b/migration/fd.c
index 6f2f50475f4..793fffeb169 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -60,6 +60,8 @@ void fd_start_incoming_migration(const char *fdname, Error **errp)
         return;
     }
 
+    qemu_socket_set_block(fd);
+
     trace_migration_fd_incoming(fd);
 
     ioc = qio_channel_new_fd(fd, errp);
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/2] tap: initialize TAPState->enabled according to the actual state of queue
  2022-06-14 11:18 [PATCH 0/2] Make local migration with TAP network device possible Andrey Ryabinin
  2022-06-14 11:18 ` [PATCH 1/2] chardev: don't set O_NONBLOCK on SCM_RIGHTS file descriptors Andrey Ryabinin
@ 2022-06-14 11:21 ` Andrey Ryabinin
  2022-06-28  4:15   ` Jason Wang
  2022-06-14 11:32 ` [PATCH 0/2] Make local migration with TAP network device possible Daniel P. Berrangé
  2022-06-15 13:10 ` Stefan Hajnoczi
  3 siblings, 1 reply; 11+ messages in thread
From: Andrey Ryabinin @ 2022-06-14 11:21 UTC (permalink / raw)
  To: qemu-devel, Marc-André Lureau, Paolo Bonzini,
	Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Jason Wang
  Cc: Stefan Hajnoczi, yc-core, Andrey Ryabinin

Currently TAPState->enabled initialized as true. If fd was passed to qemu
in a disabled state it will cause an assert at the attempt to detach queue
in virtio_net_set_queues():

virtio_net_set_queues() :
            r = peer_detach() -> tap_disable():
                                    if (s->enabled == 0) {
                                       return 0;
				    } else {
				       //Will return an error.
                                       ret = tap_fd_disable(s->fd);
                                       ...
				       return ret;
            assert(!r);

Initialize ->enabled according to the actual state of fd to fix this.

Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
---
 net/tap-bsd.c     |  5 +++++
 net/tap-linux.c   | 12 ++++++++++++
 net/tap-solaris.c |  5 +++++
 net/tap.c         |  2 +-
 net/tap_int.h     |  1 +
 5 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/net/tap-bsd.c b/net/tap-bsd.c
index 005ce05c6e0..8c21f058c8c 100644
--- a/net/tap-bsd.c
+++ b/net/tap-bsd.c
@@ -217,6 +217,11 @@ int tap_probe_vnet_hdr_len(int fd, int len)
     return 0;
 }
 
+bool tap_probe_enabled(int fd)
+{
+    return true;
+}
+
 void tap_fd_set_vnet_hdr_len(int fd, int len)
 {
 }
diff --git a/net/tap-linux.c b/net/tap-linux.c
index 304ff45071d..6078ba03af6 100644
--- a/net/tap-linux.c
+++ b/net/tap-linux.c
@@ -193,6 +193,18 @@ int tap_probe_vnet_hdr_len(int fd, int len)
     return 1;
 }
 
+bool tap_probe_enabled(int fd)
+{
+    struct ifreq ifr;
+
+    if (ioctl(fd, TUNGETIFF, &ifr) != 0) {
+        error_report("TUNGETIFF ioctl() failed: %s",
+                     strerror(errno));
+        return false;
+    }
+    return !(ifr.ifr_flags & IFF_DETACH_QUEUE);
+}
+
 void tap_fd_set_vnet_hdr_len(int fd, int len)
 {
     if (ioctl(fd, TUNSETVNETHDRSZ, &len) == -1) {
diff --git a/net/tap-solaris.c b/net/tap-solaris.c
index a44f8805c23..ccaa3334882 100644
--- a/net/tap-solaris.c
+++ b/net/tap-solaris.c
@@ -221,6 +221,11 @@ int tap_probe_vnet_hdr_len(int fd, int len)
     return 0;
 }
 
+bool tap_probe_enabled(int fd)
+{
+    return true;
+}
+
 void tap_fd_set_vnet_hdr_len(int fd, int len)
 {
 }
diff --git a/net/tap.c b/net/tap.c
index b3ddfd4a74b..799f8ec7c76 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -399,7 +399,7 @@ static TAPState *net_tap_fd_init(NetClientState *peer,
     s->host_vnet_hdr_len = vnet_hdr ? sizeof(struct virtio_net_hdr) : 0;
     s->using_vnet_hdr = false;
     s->has_ufo = tap_probe_has_ufo(s->fd);
-    s->enabled = true;
+    s->enabled = tap_probe_enabled(s->fd);
     tap_set_offload(&s->nc, 0, 0, 0, 0, 0);
     /*
      * Make sure host header length is set correctly in tap:
diff --git a/net/tap_int.h b/net/tap_int.h
index 547f8a5a28f..b8fc3dfbfa7 100644
--- a/net/tap_int.h
+++ b/net/tap_int.h
@@ -37,6 +37,7 @@ void tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp);
 int tap_probe_vnet_hdr(int fd, Error **errp);
 int tap_probe_vnet_hdr_len(int fd, int len);
 int tap_probe_has_ufo(int fd);
+bool tap_probe_enabled(int fd);
 void tap_fd_set_offload(int fd, int csum, int tso4, int tso6, int ecn, int ufo);
 void tap_fd_set_vnet_hdr_len(int fd, int len);
 int tap_fd_set_vnet_le(int fd, int vnet_is_le);
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] Make local migration with TAP network device possible
  2022-06-14 11:18 [PATCH 0/2] Make local migration with TAP network device possible Andrey Ryabinin
  2022-06-14 11:18 ` [PATCH 1/2] chardev: don't set O_NONBLOCK on SCM_RIGHTS file descriptors Andrey Ryabinin
  2022-06-14 11:21 ` [PATCH 2/2] tap: initialize TAPState->enabled according to the actual state of queue Andrey Ryabinin
@ 2022-06-14 11:32 ` Daniel P. Berrangé
  2022-06-15 13:10 ` Stefan Hajnoczi
  3 siblings, 0 replies; 11+ messages in thread
From: Daniel P. Berrangé @ 2022-06-14 11:32 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: qemu-devel, Marc-André Lureau, Paolo Bonzini, Juan Quintela,
	Dr. David Alan Gilbert, Jason Wang, Stefan Hajnoczi, yc-core

On Tue, Jun 14, 2022 at 02:18:41PM +0300, Andrey Ryabinin wrote:
> Hi
> 
> These couple patches aims to  make possible local migration (within one host)
> on the same TAP device used by source and destination QEMU
> 
> The scenario looks like this
>  1. Create TAP devices and pass file descriptors to source QEMU
>  2. Launch destination QEMU (-incoming defer) and pass same descriptors to it.
>  3. Start migration
> 
> 
> Regarding the first patch: It makes possible to receive file descriptor in non-blocking
> state. But I probably didn't cover all FD users which might need to set blocking state after
> the patch. So I'm hopping for the hints where else, besides fd_start_incoming_migration()
> I need to put qemu_socket_set_block() calls.

You'll need to check all callers of

qio_channel_readv_full
qio_channel_readv_full_all
qio_channel_readv_full_all_eof

and identify which pass a non-NULL parameter for 'fds'. If the caller
does NOT have a qemu_setnonblock call on the FD it gets back, then you
have to assume it is expecting it in blocking mode and so need to
add qemu_setblock


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] Make local migration with TAP network device possible
  2022-06-14 11:18 [PATCH 0/2] Make local migration with TAP network device possible Andrey Ryabinin
                   ` (2 preceding siblings ...)
  2022-06-14 11:32 ` [PATCH 0/2] Make local migration with TAP network device possible Daniel P. Berrangé
@ 2022-06-15 13:10 ` Stefan Hajnoczi
  2022-06-24 10:53   ` Andrey Ryabinin
  3 siblings, 1 reply; 11+ messages in thread
From: Stefan Hajnoczi @ 2022-06-15 13:10 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: qemu-devel, Marc-André Lureau, Paolo Bonzini,
	Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Jason Wang, yc-core

[-- Attachment #1: Type: text/plain, Size: 2286 bytes --]

On Tue, Jun 14, 2022 at 02:18:41PM +0300, Andrey Ryabinin wrote:
> Hi
> 
> These couple patches aims to  make possible local migration (within one host)
> on the same TAP device used by source and destination QEMU
> 
> The scenario looks like this
>  1. Create TAP devices and pass file descriptors to source QEMU
>  2. Launch destination QEMU (-incoming defer) and pass same descriptors to it.
>  3. Start migration
> 
> 
> Regarding the first patch: It makes possible to receive file descriptor in non-blocking
> state. But I probably didn't cover all FD users which might need to set blocking state after
> the patch. So I'm hopping for the hints where else, besides fd_start_incoming_migration()
> I need to put qemu_socket_set_block() calls.

Nice feature. I am worried that these patches are unsafe/incomplete
though.

Tap local migration isn't explicitly visible in the code. How will other
developers know the feature is there and how to avoid breaking it when
modifying the code? Maybe a migration test case, comments that explain
the rules about accessing the tap fd, and/or assertions?

How does this interact with hw/net/vhost_net.c, which uses tap_get_fd()
to borrow the fd? I guess the idea is that the source VM is paused and
no tap activity is expected. Then migration handover happens and the
destination VM starts running and is allowed to access the tap fd.
However, the source VM still has vhost_net with the tap fd set up. I
wonder if there is any issue with interference between the two vhost_net
instances?

These kinds of questions should be answered, mostly in the code but also
in the cover letter. It should be clear why this approach is correct.

Thanks,
Stefan

> 
> 
> Andrey Ryabinin (2):
>   chardev: don't set O_NONBLOCK on SCM_RIGHTS file descriptors.
>   tap: initialize TAPState->enabled according to the actual state of
>     queue
> 
>  chardev/char-socket.c |  3 ---
>  io/channel-socket.c   |  3 ---
>  migration/fd.c        |  2 ++
>  net/tap-bsd.c         |  5 +++++
>  net/tap-linux.c       | 12 ++++++++++++
>  net/tap-solaris.c     |  5 +++++
>  net/tap.c             |  2 +-
>  net/tap_int.h         |  1 +
>  8 files changed, 26 insertions(+), 7 deletions(-)
> 
> -- 
> 2.35.1
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] chardev: don't set O_NONBLOCK on SCM_RIGHTS file descriptors.
  2022-06-14 11:18 ` [PATCH 1/2] chardev: don't set O_NONBLOCK on SCM_RIGHTS file descriptors Andrey Ryabinin
@ 2022-06-15 13:12   ` Stefan Hajnoczi
  2022-06-24 11:00     ` Andrey Ryabinin
  0 siblings, 1 reply; 11+ messages in thread
From: Stefan Hajnoczi @ 2022-06-15 13:12 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: qemu-devel, Marc-André Lureau, Paolo Bonzini,
	Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Jason Wang, yc-core

[-- Attachment #1: Type: text/plain, Size: 2849 bytes --]

On Tue, Jun 14, 2022 at 02:18:42PM +0300, Andrey Ryabinin wrote:
> This reverts commit 9b938c7262e4 ("chardev: clear O_NONBLOCK on SCM_RIGHTS file descriptors").
> File descriptor passed to QEMU via 'getfd' QMP command always
> changed to blocking mode. Instead of that, change blocking mode by QEMU
> file descriptors users when necessary, e.g. like migration.
> 
> We need to preserve the state of the file descriptor in case it's still
> used by an external process and before the QEMU itself started
> using it.
> 
> E.g. our local migration scenario with TAP networking looks like this:
>  1. Create TAP devices and pass file descriptors to source QEMU
>  2. Launch destination QEMU (-incoming defer) and pass same descriptors to it.
>  3. Start migration
> 
> In such scenario setting blocking state at stage (2) will hang source QEMU
> since TAP fd suddenly become blocking.

Is it possible to add a special flag or API for preserving the
O_NONBLOCK open flag? That way the rest of QEMU could continue to safely
reset the flag while the tap fd passing code would explicitly ask for
the O_NONBLOCK open flag to be preserved. That seems safer but I haven't
checked whether it's possible to do this.

> 
> Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
> ---
>  chardev/char-socket.c | 3 ---
>  io/channel-socket.c   | 3 ---
>  migration/fd.c        | 2 ++
>  3 files changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> index dc4e218eeb6..c9592fb5836 100644
> --- a/chardev/char-socket.c
> +++ b/chardev/char-socket.c
> @@ -310,9 +310,6 @@ static ssize_t tcp_chr_recv(Chardev *chr, char *buf, size_t len)
>              continue;
>          }
>  
> -        /* O_NONBLOCK is preserved across SCM_RIGHTS so reset it */
> -        qemu_socket_set_block(fd);
> -
>  #ifndef MSG_CMSG_CLOEXEC
>          qemu_set_cloexec(fd);
>  #endif
> diff --git a/io/channel-socket.c b/io/channel-socket.c
> index dc9c165de11..8b9679460dc 100644
> --- a/io/channel-socket.c
> +++ b/io/channel-socket.c
> @@ -479,9 +479,6 @@ static void qio_channel_socket_copy_fds(struct msghdr *msg,
>                  continue;
>              }
>  
> -            /* O_NONBLOCK is preserved across SCM_RIGHTS so reset it */
> -            qemu_socket_set_block(fd);
> -
>  #ifndef MSG_CMSG_CLOEXEC
>              qemu_set_cloexec(fd);
>  #endif
> diff --git a/migration/fd.c b/migration/fd.c
> index 6f2f50475f4..793fffeb169 100644
> --- a/migration/fd.c
> +++ b/migration/fd.c
> @@ -60,6 +60,8 @@ void fd_start_incoming_migration(const char *fdname, Error **errp)
>          return;
>      }
>  
> +    qemu_socket_set_block(fd);
> +
>      trace_migration_fd_incoming(fd);
>  
>      ioc = qio_channel_new_fd(fd, errp);
> -- 
> 2.35.1
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] Make local migration with TAP network device possible
  2022-06-15 13:10 ` Stefan Hajnoczi
@ 2022-06-24 10:53   ` Andrey Ryabinin
  2022-06-24 11:45     ` Stefan Hajnoczi
  0 siblings, 1 reply; 11+ messages in thread
From: Andrey Ryabinin @ 2022-06-24 10:53 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Marc-André Lureau, Paolo Bonzini,
	Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Jason Wang, yc-core



On 6/15/22 16:10, Stefan Hajnoczi wrote:
> On Tue, Jun 14, 2022 at 02:18:41PM +0300, Andrey Ryabinin wrote:
>> Hi
>>
>> These couple patches aims to  make possible local migration (within one host)
>> on the same TAP device used by source and destination QEMU
>>
>> The scenario looks like this
>>  1. Create TAP devices and pass file descriptors to source QEMU
>>  2. Launch destination QEMU (-incoming defer) and pass same descriptors to it.
>>  3. Start migration
>>
>>
>> Regarding the first patch: It makes possible to receive file descriptor in non-blocking
>> state. But I probably didn't cover all FD users which might need to set blocking state after
>> the patch. So I'm hopping for the hints where else, besides fd_start_incoming_migration()
>> I need to put qemu_socket_set_block() calls.
> 
> Nice feature. I am worried that these patches are unsafe/incomplete
> though.
> 
> Tap local migration isn't explicitly visible in the code. How will other
> developers know the feature is there and how to avoid breaking it when
> modifying the code? Maybe a migration test case, comments that explain
> the rules about accessing the tap fd, and/or assertions?
> 

Yeah, I'm writing avocado test case, will add it in future next iterations.

> How does this interact with hw/net/vhost_net.c, which uses tap_get_fd()
> to borrow the fd? I guess the idea is that the source VM is paused and
> no tap activity is expected. Then migration handover happens and the
> destination VM starts running and is allowed to access the tap fd.
> However, the source VM still has vhost_net with the tap fd set up. I
> wonder if there is any issue with interference between the two vhost_net
> instances?
> 

I'll try to take a closer look at code, let's see if I can find any possible issues.
But I did some brief testing with vhost=on net device, and it find any problems.
The test was
1. launch pings from source VM to host
2. Migrate
3. Check at dest VM that there is no lost packets.


> These kinds of questions should be answered, mostly in the code but also
> in the cover letter. It should be clear why this approach is correct.
> 
> Thanks,
> Stefan
> 
>>
>>
>> Andrey Ryabinin (2):
>>   chardev: don't set O_NONBLOCK on SCM_RIGHTS file descriptors.
>>   tap: initialize TAPState->enabled according to the actual state of
>>     queue
>>
>>  chardev/char-socket.c |  3 ---
>>  io/channel-socket.c   |  3 ---
>>  migration/fd.c        |  2 ++
>>  net/tap-bsd.c         |  5 +++++
>>  net/tap-linux.c       | 12 ++++++++++++
>>  net/tap-solaris.c     |  5 +++++
>>  net/tap.c             |  2 +-
>>  net/tap_int.h         |  1 +
>>  8 files changed, 26 insertions(+), 7 deletions(-)
>>
>> -- 
>> 2.35.1
>>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] chardev: don't set O_NONBLOCK on SCM_RIGHTS file descriptors.
  2022-06-15 13:12   ` Stefan Hajnoczi
@ 2022-06-24 11:00     ` Andrey Ryabinin
  2022-06-24 11:34       ` Daniel P. Berrangé
  0 siblings, 1 reply; 11+ messages in thread
From: Andrey Ryabinin @ 2022-06-24 11:00 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Marc-André Lureau, Paolo Bonzini,
	Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Jason Wang, yc-core



On 6/15/22 16:12, Stefan Hajnoczi wrote:
> On Tue, Jun 14, 2022 at 02:18:42PM +0300, Andrey Ryabinin wrote:
>> This reverts commit 9b938c7262e4 ("chardev: clear O_NONBLOCK on SCM_RIGHTS file descriptors").
>> File descriptor passed to QEMU via 'getfd' QMP command always
>> changed to blocking mode. Instead of that, change blocking mode by QEMU
>> file descriptors users when necessary, e.g. like migration.
>>
>> We need to preserve the state of the file descriptor in case it's still
>> used by an external process and before the QEMU itself started
>> using it.
>>
>> E.g. our local migration scenario with TAP networking looks like this:
>>  1. Create TAP devices and pass file descriptors to source QEMU
>>  2. Launch destination QEMU (-incoming defer) and pass same descriptors to it.
>>  3. Start migration
>>
>> In such scenario setting blocking state at stage (2) will hang source QEMU
>> since TAP fd suddenly become blocking.
> 
> Is it possible to add a special flag or API for preserving the
> O_NONBLOCK open flag? That way the rest of QEMU could continue to safely
> reset the flag while the tap fd passing code would explicitly ask for
> the O_NONBLOCK open flag to be preserved. That seems safer but I haven't
> checked whether it's possible to do this.
> 

The only possibility I see here is embedding some kind 'nonblock' in the message
itself along with fds. Not sure if this sensible approach.

Not changing fd state seems like more correct approach to me. E.g. I would expect
that sending fd to qemu and executing qmp commands 'getfd' & 'closefd' shouldn't
induce any changes in fd state. Which is currently no true.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] chardev: don't set O_NONBLOCK on SCM_RIGHTS file descriptors.
  2022-06-24 11:00     ` Andrey Ryabinin
@ 2022-06-24 11:34       ` Daniel P. Berrangé
  0 siblings, 0 replies; 11+ messages in thread
From: Daniel P. Berrangé @ 2022-06-24 11:34 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Stefan Hajnoczi, qemu-devel, Marc-André Lureau,
	Paolo Bonzini, Juan Quintela, Dr. David Alan Gilbert, Jason Wang,
	yc-core

On Fri, Jun 24, 2022 at 02:00:15PM +0300, Andrey Ryabinin wrote:
> 
> 
> On 6/15/22 16:12, Stefan Hajnoczi wrote:
> > On Tue, Jun 14, 2022 at 02:18:42PM +0300, Andrey Ryabinin wrote:
> >> This reverts commit 9b938c7262e4 ("chardev: clear O_NONBLOCK on SCM_RIGHTS file descriptors").
> >> File descriptor passed to QEMU via 'getfd' QMP command always
> >> changed to blocking mode. Instead of that, change blocking mode by QEMU
> >> file descriptors users when necessary, e.g. like migration.
> >>
> >> We need to preserve the state of the file descriptor in case it's still
> >> used by an external process and before the QEMU itself started
> >> using it.
> >>
> >> E.g. our local migration scenario with TAP networking looks like this:
> >>  1. Create TAP devices and pass file descriptors to source QEMU
> >>  2. Launch destination QEMU (-incoming defer) and pass same descriptors to it.
> >>  3. Start migration
> >>
> >> In such scenario setting blocking state at stage (2) will hang source QEMU
> >> since TAP fd suddenly become blocking.
> > 
> > Is it possible to add a special flag or API for preserving the
> > O_NONBLOCK open flag? That way the rest of QEMU could continue to safely
> > reset the flag while the tap fd passing code would explicitly ask for
> > the O_NONBLOCK open flag to be preserved. That seems safer but I haven't
> > checked whether it's possible to do this.
> > 
> 
> The only possibility I see here is embedding some kind 'nonblock' in the message
> itself along with fds. Not sure if this sensible approach.
> 
> Not changing fd state seems like more correct approach to me. E.g. I would expect
> that sending fd to qemu and executing qmp commands 'getfd' & 'closefd' shouldn't
> induce any changes in fd state. Which is currently no true.

I think that's a wrong expectation. The contract with 'getfd' is that
you are giving the target QEMU ownership of the file. The caller should
not expect todo anything with the FD after it has passd it to QEMU, and
target QEMU has freedom todo whatever it wants.  Admittedly this usage
model doesn't fit with what you're trying to make it now do, but those
are the historical expectations of getfd.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] Make local migration with TAP network device possible
  2022-06-24 10:53   ` Andrey Ryabinin
@ 2022-06-24 11:45     ` Stefan Hajnoczi
  0 siblings, 0 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2022-06-24 11:45 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Stefan Hajnoczi, qemu-devel, Marc-André Lureau,
	Paolo Bonzini, Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert, Jason Wang, yc-core

[-- Attachment #1: Type: text/plain, Size: 3040 bytes --]

Thanks! I will be offline for the next week.

Feel free to progress this series without my review.

Stefan

On Fri, Jun 24, 2022, 11:53 Andrey Ryabinin <arbn@yandex-team.com> wrote:

>
>
> On 6/15/22 16:10, Stefan Hajnoczi wrote:
> > On Tue, Jun 14, 2022 at 02:18:41PM +0300, Andrey Ryabinin wrote:
> >> Hi
> >>
> >> These couple patches aims to  make possible local migration (within one
> host)
> >> on the same TAP device used by source and destination QEMU
> >>
> >> The scenario looks like this
> >>  1. Create TAP devices and pass file descriptors to source QEMU
> >>  2. Launch destination QEMU (-incoming defer) and pass same descriptors
> to it.
> >>  3. Start migration
> >>
> >>
> >> Regarding the first patch: It makes possible to receive file descriptor
> in non-blocking
> >> state. But I probably didn't cover all FD users which might need to set
> blocking state after
> >> the patch. So I'm hopping for the hints where else, besides
> fd_start_incoming_migration()
> >> I need to put qemu_socket_set_block() calls.
> >
> > Nice feature. I am worried that these patches are unsafe/incomplete
> > though.
> >
> > Tap local migration isn't explicitly visible in the code. How will other
> > developers know the feature is there and how to avoid breaking it when
> > modifying the code? Maybe a migration test case, comments that explain
> > the rules about accessing the tap fd, and/or assertions?
> >
>
> Yeah, I'm writing avocado test case, will add it in future next iterations.
>
> > How does this interact with hw/net/vhost_net.c, which uses tap_get_fd()
> > to borrow the fd? I guess the idea is that the source VM is paused and
> > no tap activity is expected. Then migration handover happens and the
> > destination VM starts running and is allowed to access the tap fd.
> > However, the source VM still has vhost_net with the tap fd set up. I
> > wonder if there is any issue with interference between the two vhost_net
> > instances?
> >
>
> I'll try to take a closer look at code, let's see if I can find any
> possible issues.
> But I did some brief testing with vhost=on net device, and it find any
> problems.
> The test was
> 1. launch pings from source VM to host
> 2. Migrate
> 3. Check at dest VM that there is no lost packets.
>
>
> > These kinds of questions should be answered, mostly in the code but also
> > in the cover letter. It should be clear why this approach is correct.
> >
> > Thanks,
> > Stefan
> >
> >>
> >>
> >> Andrey Ryabinin (2):
> >>   chardev: don't set O_NONBLOCK on SCM_RIGHTS file descriptors.
> >>   tap: initialize TAPState->enabled according to the actual state of
> >>     queue
> >>
> >>  chardev/char-socket.c |  3 ---
> >>  io/channel-socket.c   |  3 ---
> >>  migration/fd.c        |  2 ++
> >>  net/tap-bsd.c         |  5 +++++
> >>  net/tap-linux.c       | 12 ++++++++++++
> >>  net/tap-solaris.c     |  5 +++++
> >>  net/tap.c             |  2 +-
> >>  net/tap_int.h         |  1 +
> >>  8 files changed, 26 insertions(+), 7 deletions(-)
> >>
> >> --
> >> 2.35.1
> >>
>
>

[-- Attachment #2: Type: text/html, Size: 3992 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] tap: initialize TAPState->enabled according to the actual state of queue
  2022-06-14 11:21 ` [PATCH 2/2] tap: initialize TAPState->enabled according to the actual state of queue Andrey Ryabinin
@ 2022-06-28  4:15   ` Jason Wang
  0 siblings, 0 replies; 11+ messages in thread
From: Jason Wang @ 2022-06-28  4:15 UTC (permalink / raw)
  To: Andrey Ryabinin, qemu-devel, Marc-André Lureau,
	Paolo Bonzini, Daniel P. Berrangé,
	Juan Quintela, Dr. David Alan Gilbert
  Cc: Stefan Hajnoczi, yc-core


在 2022/6/14 19:21, Andrey Ryabinin 写道:
> Currently TAPState->enabled initialized as true. If fd was passed to qemu
> in a disabled state it will cause an assert at the attempt to detach queue
> in virtio_net_set_queues():
>
> virtio_net_set_queues() :
>              r = peer_detach() -> tap_disable():
>                                      if (s->enabled == 0) {
>                                         return 0;
> 				    } else {
> 				       //Will return an error.
>                                         ret = tap_fd_disable(s->fd);
>                                         ...
> 				       return ret;
>              assert(!r);
>
> Initialize ->enabled according to the actual state of fd to fix this.
>
> Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
> ---
>   net/tap-bsd.c     |  5 +++++
>   net/tap-linux.c   | 12 ++++++++++++
>   net/tap-solaris.c |  5 +++++
>   net/tap.c         |  2 +-
>   net/tap_int.h     |  1 +
>   5 files changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/net/tap-bsd.c b/net/tap-bsd.c
> index 005ce05c6e0..8c21f058c8c 100644
> --- a/net/tap-bsd.c
> +++ b/net/tap-bsd.c
> @@ -217,6 +217,11 @@ int tap_probe_vnet_hdr_len(int fd, int len)
>       return 0;
>   }
>   
> +bool tap_probe_enabled(int fd)
> +{
> +    return true;
> +}
> +
>   void tap_fd_set_vnet_hdr_len(int fd, int len)
>   {
>   }
> diff --git a/net/tap-linux.c b/net/tap-linux.c
> index 304ff45071d..6078ba03af6 100644
> --- a/net/tap-linux.c
> +++ b/net/tap-linux.c
> @@ -193,6 +193,18 @@ int tap_probe_vnet_hdr_len(int fd, int len)
>       return 1;
>   }
>   
> +bool tap_probe_enabled(int fd)
> +{
> +    struct ifreq ifr;
> +
> +    if (ioctl(fd, TUNGETIFF, &ifr) != 0) {
> +        error_report("TUNGETIFF ioctl() failed: %s",
> +                     strerror(errno));
> +        return false;
> +    }
> +    return !(ifr.ifr_flags & IFF_DETACH_QUEUE);


Is it better to check IFF_MULT_QUEUE before this to unbreak the old host?

Thanks


> +}
> +
>   void tap_fd_set_vnet_hdr_len(int fd, int len)
>   {
>       if (ioctl(fd, TUNSETVNETHDRSZ, &len) == -1) {
> diff --git a/net/tap-solaris.c b/net/tap-solaris.c
> index a44f8805c23..ccaa3334882 100644
> --- a/net/tap-solaris.c
> +++ b/net/tap-solaris.c
> @@ -221,6 +221,11 @@ int tap_probe_vnet_hdr_len(int fd, int len)
>       return 0;
>   }
>   
> +bool tap_probe_enabled(int fd)
> +{
> +    return true;
> +}
> +
>   void tap_fd_set_vnet_hdr_len(int fd, int len)
>   {
>   }
> diff --git a/net/tap.c b/net/tap.c
> index b3ddfd4a74b..799f8ec7c76 100644
> --- a/net/tap.c
> +++ b/net/tap.c
> @@ -399,7 +399,7 @@ static TAPState *net_tap_fd_init(NetClientState *peer,
>       s->host_vnet_hdr_len = vnet_hdr ? sizeof(struct virtio_net_hdr) : 0;
>       s->using_vnet_hdr = false;
>       s->has_ufo = tap_probe_has_ufo(s->fd);
> -    s->enabled = true;
> +    s->enabled = tap_probe_enabled(s->fd);
>       tap_set_offload(&s->nc, 0, 0, 0, 0, 0);
>       /*
>        * Make sure host header length is set correctly in tap:
> diff --git a/net/tap_int.h b/net/tap_int.h
> index 547f8a5a28f..b8fc3dfbfa7 100644
> --- a/net/tap_int.h
> +++ b/net/tap_int.h
> @@ -37,6 +37,7 @@ void tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp);
>   int tap_probe_vnet_hdr(int fd, Error **errp);
>   int tap_probe_vnet_hdr_len(int fd, int len);
>   int tap_probe_has_ufo(int fd);
> +bool tap_probe_enabled(int fd);
>   void tap_fd_set_offload(int fd, int csum, int tso4, int tso6, int ecn, int ufo);
>   void tap_fd_set_vnet_hdr_len(int fd, int len);
>   int tap_fd_set_vnet_le(int fd, int vnet_is_le);



^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-06-28  4:16 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-14 11:18 [PATCH 0/2] Make local migration with TAP network device possible Andrey Ryabinin
2022-06-14 11:18 ` [PATCH 1/2] chardev: don't set O_NONBLOCK on SCM_RIGHTS file descriptors Andrey Ryabinin
2022-06-15 13:12   ` Stefan Hajnoczi
2022-06-24 11:00     ` Andrey Ryabinin
2022-06-24 11:34       ` Daniel P. Berrangé
2022-06-14 11:21 ` [PATCH 2/2] tap: initialize TAPState->enabled according to the actual state of queue Andrey Ryabinin
2022-06-28  4:15   ` Jason Wang
2022-06-14 11:32 ` [PATCH 0/2] Make local migration with TAP network device possible Daniel P. Berrangé
2022-06-15 13:10 ` Stefan Hajnoczi
2022-06-24 10:53   ` Andrey Ryabinin
2022-06-24 11:45     ` Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.