* [PATCH v13 1/8] meson.build: Fix docker-test-build@alpine when including linux/errqueue.h
2022-05-13 6:28 [PATCH v13 0/8] MSG_ZEROCOPY + multifd Leonardo Bras
@ 2022-05-13 6:28 ` Leonardo Bras
2022-05-16 11:13 ` Dr. David Alan Gilbert
2022-05-13 6:28 ` [PATCH v13 2/8] QIOChannel: Add flags on io_writev and introduce io_flush callback Leonardo Bras
` (6 subsequent siblings)
7 siblings, 1 reply; 29+ messages in thread
From: Leonardo Bras @ 2022-05-13 6:28 UTC (permalink / raw)
To: Marc-André Lureau, Paolo Bonzini, Elena Ufimtseva,
Jagannathan Raman, John G Johnson, Daniel P. Berrangé,
Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
Markus Armbruster, Fam Zheng, Peter Xu
Cc: Leonardo Bras, qemu-devel, qemu-block
A build error happens in alpine CI when linux/errqueue.h is included
in io/channel-socket.c, due to a redefinition of 'struct __kernel_timespec':
===
ninja: job failed: [...]
In file included from /usr/include/linux/errqueue.h:6,
from ../io/channel-socket.c:29:
/usr/include/linux/time_types.h:7:8: error: redefinition of 'struct __kernel_timespec'
7 | struct __kernel_timespec {
| ^~~~~~~~~~~~~~~~~
In file included from /usr/include/liburing.h:19,
from /builds/user/qemu/include/block/aio.h:18,
from /builds/user/qemu/include/io/channel.h:26,
from /builds/user/qemu/include/io/channel-socket.h:24,
from ../io/channel-socket.c:24:
/usr/include/liburing/compat.h:9:8: note: originally defined here
9 | struct __kernel_timespec {
| ^~~~~~~~~~~~~~~~~
ninja: subcommand failed
===
As the above error message suggests, 'struct __kernel_timespec' was already
defined by liburing/compat.h.
Fix the alpine CI by adding a configure-time test that disables liburing if
a redefinition happens between linux/errqueue.h and liburing/compat.h.
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
meson.build | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/meson.build b/meson.build
index 9b20dcd143..a996690c9b 100644
--- a/meson.build
+++ b/meson.build
@@ -515,12 +515,23 @@ if not get_option('linux_aio').auto() or have_block
required: get_option('linux_aio'),
kwargs: static_kwargs)
endif
+
+linux_io_uring_test = '''
+ #include <liburing.h>
+ #include <linux/errqueue.h>
+
+ int main(void) { return 0; }'''
+
linux_io_uring = not_found
if not get_option('linux_io_uring').auto() or have_block
linux_io_uring = dependency('liburing', version: '>=0.3',
required: get_option('linux_io_uring'),
method: 'pkg-config', kwargs: static_kwargs)
+ if not cc.links(linux_io_uring_test)
+ linux_io_uring = not_found
+ endif
endif
+
libnfs = not_found
if not get_option('libnfs').auto() or have_block
libnfs = dependency('libnfs', version: '>=1.9.3',
--
2.36.1
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH v13 1/8] meson.build: Fix docker-test-build@alpine when including linux/errqueue.h
2022-05-13 6:28 ` [PATCH v13 1/8] meson.build: Fix docker-test-build@alpine when including linux/errqueue.h Leonardo Bras
@ 2022-05-16 11:13 ` Dr. David Alan Gilbert
2022-05-16 11:17 ` Daniel P. Berrangé
0 siblings, 1 reply; 29+ messages in thread
From: Dr. David Alan Gilbert @ 2022-05-16 11:13 UTC (permalink / raw)
To: Leonardo Bras
Cc: Marc-André Lureau, Paolo Bonzini, Elena Ufimtseva,
Jagannathan Raman, John G Johnson, Daniel P. Berrangé,
Juan Quintela, Eric Blake, Markus Armbruster, Fam Zheng,
Peter Xu, qemu-devel, qemu-block
* Leonardo Bras (leobras@redhat.com) wrote:
> A build error happens in alpine CI when linux/errqueue.h is included
> in io/channel-socket.c, due to a redefinition of 'struct __kernel_timespec':
OK, looks to be same mechanism as other meson tests.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [PATCH v13 1/8] meson.build: Fix docker-test-build@alpine when including linux/errqueue.h
2022-05-16 11:13 ` Dr. David Alan Gilbert
@ 2022-05-16 11:17 ` Daniel P. Berrangé
2022-05-16 11:30 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 29+ messages in thread
From: Daniel P. Berrangé @ 2022-05-16 11:17 UTC (permalink / raw)
To: Dr. David Alan Gilbert
Cc: Leonardo Bras, Marc-André Lureau, Paolo Bonzini,
Elena Ufimtseva, Jagannathan Raman, John G Johnson,
Juan Quintela, Eric Blake, Markus Armbruster, Fam Zheng,
Peter Xu, qemu-devel, qemu-block
On Mon, May 16, 2022 at 12:13:16PM +0100, Dr. David Alan Gilbert wrote:
> * Leonardo Bras (leobras@redhat.com) wrote:
> > A build error happens in alpine CI when linux/errqueue.h is included
> > in io/channel-socket.c, due to a redefinition of 'struct __kernel_timespec':
>
> OK, looks to be same mechanism as other meson tests.
>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
As of about an hour or so ago, this patch should not be required.
https://gitlab.alpinelinux.org/alpine/aports/-/issues/13813
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* Re: [PATCH v13 1/8] meson.build: Fix docker-test-build@alpine when including linux/errqueue.h
2022-05-16 11:17 ` Daniel P. Berrangé
@ 2022-05-16 11:30 ` Dr. David Alan Gilbert
2022-05-16 11:35 ` Daniel P. Berrangé
0 siblings, 1 reply; 29+ messages in thread
From: Dr. David Alan Gilbert @ 2022-05-16 11:30 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: Leonardo Bras, Marc-André Lureau, Paolo Bonzini,
Elena Ufimtseva, Jagannathan Raman, John G Johnson,
Juan Quintela, Eric Blake, Markus Armbruster, Fam Zheng,
Peter Xu, qemu-devel, qemu-block
* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Mon, May 16, 2022 at 12:13:16PM +0100, Dr. David Alan Gilbert wrote:
> > * Leonardo Bras (leobras@redhat.com) wrote:
> > > A build error happens in alpine CI when linux/errqueue.h is included
> > > in io/channel-socket.c, due to a redefinition of 'struct __kernel_timespec':
> >
> > OK, looks to be same mechanism as other meson tests.
>
> >
> > Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>
> As of about an hour or so ago, this patch should not be required.
>
> https://gitlab.alpinelinux.org/alpine/aports/-/issues/13813
I'll take it anyway as protection against any other broken build envs.
Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [PATCH v13 1/8] meson.build: Fix docker-test-build@alpine when including linux/errqueue.h
2022-05-16 11:30 ` Dr. David Alan Gilbert
@ 2022-05-16 11:35 ` Daniel P. Berrangé
2022-05-16 12:51 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 29+ messages in thread
From: Daniel P. Berrangé @ 2022-05-16 11:35 UTC (permalink / raw)
To: Dr. David Alan Gilbert
Cc: Leonardo Bras, Marc-André Lureau, Paolo Bonzini,
Elena Ufimtseva, Jagannathan Raman, John G Johnson,
Juan Quintela, Eric Blake, Markus Armbruster, Fam Zheng,
Peter Xu, qemu-devel, qemu-block
On Mon, May 16, 2022 at 12:30:15PM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > On Mon, May 16, 2022 at 12:13:16PM +0100, Dr. David Alan Gilbert wrote:
> > > * Leonardo Bras (leobras@redhat.com) wrote:
> > > > A build error happens in alpine CI when linux/errqueue.h is included
> > > > in io/channel-socket.c, due to a redefinition of 'struct __kernel_timespec':
> > >
> > > OK, looks to be same mechanism as other meson tests.
> >
> > >
> > > Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >
> > As of about an hour or so ago, this patch should not be required.
> >
> > https://gitlab.alpinelinux.org/alpine/aports/-/issues/13813
>
> I'll take it anyway as protection against any other broken build envs.
Can you update the commit message at least then.
The root cause trigger for the bug is that the OS uses a busybox
impl of mkdtemp, which isn't compat with the args liburing
configure was previously using. I doubt there are many such OS
around to be honest, as most will use coreutils.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* Re: [PATCH v13 1/8] meson.build: Fix docker-test-build@alpine when including linux/errqueue.h
2022-05-16 11:35 ` Daniel P. Berrangé
@ 2022-05-16 12:51 ` Dr. David Alan Gilbert
2022-05-16 14:04 ` Daniel P. Berrangé
0 siblings, 1 reply; 29+ messages in thread
From: Dr. David Alan Gilbert @ 2022-05-16 12:51 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: Leonardo Bras, Marc-André Lureau, Paolo Bonzini,
Elena Ufimtseva, Jagannathan Raman, John G Johnson,
Juan Quintela, Eric Blake, Markus Armbruster, Fam Zheng,
Peter Xu, qemu-devel, qemu-block
* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Mon, May 16, 2022 at 12:30:15PM +0100, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > On Mon, May 16, 2022 at 12:13:16PM +0100, Dr. David Alan Gilbert wrote:
> > > > * Leonardo Bras (leobras@redhat.com) wrote:
> > > > > A build error happens in alpine CI when linux/errqueue.h is included
> > > > > in io/channel-socket.c, due to a redefinition of 'struct __kernel_timespec':
> > > >
> > > > OK, looks to be same mechanism as other meson tests.
> > >
> > > >
> > > > Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > >
> > > As of about an hour or so ago, this patch should not be required.
> > >
> > > https://gitlab.alpinelinux.org/alpine/aports/-/issues/13813
> >
> > I'll take it anyway as protection against any other broken build envs.
>
> Can you update the commit message at least then.
Sure, I've added:
[dgilbert: This has been fixed in Alpine issue 13813 and liburing]
> The root cause trigger for the bug is that the OS uses a busybox
I guess you mean musl??
> impl of mkdtemp, which isn't compat with the args liburing
> configure was previously using. I doubt there are many such OS
> around to be honest, as most will use coreutils.
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [PATCH v13 1/8] meson.build: Fix docker-test-build@alpine when including linux/errqueue.h
2022-05-16 12:51 ` Dr. David Alan Gilbert
@ 2022-05-16 14:04 ` Daniel P. Berrangé
0 siblings, 0 replies; 29+ messages in thread
From: Daniel P. Berrangé @ 2022-05-16 14:04 UTC (permalink / raw)
To: Dr. David Alan Gilbert
Cc: Leonardo Bras, Marc-André Lureau, Paolo Bonzini,
Elena Ufimtseva, Jagannathan Raman, John G Johnson,
Juan Quintela, Eric Blake, Markus Armbruster, Fam Zheng,
Peter Xu, qemu-devel, qemu-block
On Mon, May 16, 2022 at 01:51:43PM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > On Mon, May 16, 2022 at 12:30:15PM +0100, Dr. David Alan Gilbert wrote:
> > > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > > On Mon, May 16, 2022 at 12:13:16PM +0100, Dr. David Alan Gilbert wrote:
> > > > > * Leonardo Bras (leobras@redhat.com) wrote:
> > > > > > A build error happens in alpine CI when linux/errqueue.h is included
> > > > > > in io/channel-socket.c, due to a redefinition of 'struct __kernel_timespec':
> > > > >
> > > > > OK, looks to be same mechanism as other meson tests.
> > > >
> > > > >
> > > > > Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > >
> > > > As of about an hour or so ago, this patch should not be required.
> > > >
> > > > https://gitlab.alpinelinux.org/alpine/aports/-/issues/13813
> > >
> > > I'll take it anyway as protection against any other broken build envs.
> >
> > Can you update the commit message at least then.
>
> Sure, I've added:
>
> [dgilbert: This has been fixed in Alpine issue 13813 and liburing]
>
> > The root cause trigger for the bug is that the OS uses a busybox
>
> I guess you mean musl??
I don't think it is musl; it's the configure shell script, and it is
throwing an error from the 'mktemp' command
> > impl of mkdtemp, which isn't compat with the args liburing
> > configure was previously using. I doubt there are many such OS
> > around to be honest, as most will use coreutils.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* [PATCH v13 2/8] QIOChannel: Add flags on io_writev and introduce io_flush callback
2022-05-13 6:28 [PATCH v13 0/8] MSG_ZEROCOPY + multifd Leonardo Bras
2022-05-13 6:28 ` [PATCH v13 1/8] meson.build: Fix docker-test-build@alpine when including linux/errqueue.h Leonardo Bras
@ 2022-05-13 6:28 ` Leonardo Bras
2022-05-13 6:28 ` [PATCH v13 3/8] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX Leonardo Bras
` (5 subsequent siblings)
7 siblings, 0 replies; 29+ messages in thread
From: Leonardo Bras @ 2022-05-13 6:28 UTC (permalink / raw)
To: Marc-André Lureau, Paolo Bonzini, Elena Ufimtseva,
Jagannathan Raman, John G Johnson, Daniel P. Berrangé,
Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
Markus Armbruster, Fam Zheng, Peter Xu
Cc: Leonardo Bras, qemu-devel, qemu-block
Add flags to io_writev and introduce io_flush as an optional callback in
QIOChannelClass, allowing subclasses to implement zero-copy writes.
How to use them:
- Write data using qio_channel_writev*(...,QIO_CHANNEL_WRITE_FLAG_ZERO_COPY),
- Wait for write completion with qio_channel_flush().
Notes:
As some zero-copy write implementations work asynchronously, it's
recommended to keep the write buffer untouched until qio_channel_flush()
returns, to avoid the risk of sending an updated buffer instead of the
buffer's state at write time.
As the io_flush callback is optional, if a subclass does not implement it, then:
- io_flush will return 0 without changing anything.
Also, some functions like qio_channel_writev_full_all() were adapted to
receive a flags parameter. That allows code to be shared between zero-copy
and non-zero-copy writev, and makes it easier to implement new flags.
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
---
include/io/channel.h | 38 +++++++++++++++++++++-
chardev/char-io.c | 2 +-
hw/remote/mpqemu-link.c | 2 +-
io/channel-buffer.c | 1 +
io/channel-command.c | 1 +
io/channel-file.c | 1 +
io/channel-socket.c | 2 ++
io/channel-tls.c | 1 +
io/channel-websock.c | 1 +
io/channel.c | 49 +++++++++++++++++++++++------
migration/rdma.c | 1 +
scsi/pr-manager-helper.c | 2 +-
tests/unit/test-io-channel-socket.c | 1 +
13 files changed, 88 insertions(+), 14 deletions(-)
diff --git a/include/io/channel.h b/include/io/channel.h
index 88988979f8..c680ee7480 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -32,12 +32,15 @@ OBJECT_DECLARE_TYPE(QIOChannel, QIOChannelClass,
#define QIO_CHANNEL_ERR_BLOCK -2
+#define QIO_CHANNEL_WRITE_FLAG_ZERO_COPY 0x1
+
typedef enum QIOChannelFeature QIOChannelFeature;
enum QIOChannelFeature {
QIO_CHANNEL_FEATURE_FD_PASS,
QIO_CHANNEL_FEATURE_SHUTDOWN,
QIO_CHANNEL_FEATURE_LISTEN,
+ QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY,
};
@@ -104,6 +107,7 @@ struct QIOChannelClass {
size_t niov,
int *fds,
size_t nfds,
+ int flags,
Error **errp);
ssize_t (*io_readv)(QIOChannel *ioc,
const struct iovec *iov,
@@ -136,6 +140,8 @@ struct QIOChannelClass {
IOHandler *io_read,
IOHandler *io_write,
void *opaque);
+ int (*io_flush)(QIOChannel *ioc,
+ Error **errp);
};
/* General I/O handling functions */
@@ -228,6 +234,7 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc,
* @niov: the length of the @iov array
* @fds: an array of file handles to send
* @nfds: number of file handles in @fds
+ * @flags: write flags (QIO_CHANNEL_WRITE_FLAG_*)
* @errp: pointer to a NULL-initialized error object
*
* Write data to the IO channel, reading it from the
@@ -260,6 +267,7 @@ ssize_t qio_channel_writev_full(QIOChannel *ioc,
size_t niov,
int *fds,
size_t nfds,
+ int flags,
Error **errp);
/**
@@ -837,6 +845,7 @@ int qio_channel_readv_full_all(QIOChannel *ioc,
* @niov: the length of the @iov array
* @fds: an array of file handles to send
* @nfds: number of file handles in @fds
+ * @flags: write flags (QIO_CHANNEL_WRITE_FLAG_*)
* @errp: pointer to a NULL-initialized error object
*
*
@@ -846,6 +855,14 @@ int qio_channel_readv_full_all(QIOChannel *ioc,
* to be written, yielding from the current coroutine
* if required.
*
+ * If QIO_CHANNEL_WRITE_FLAG_ZERO_COPY is passed in flags,
+ * instead of waiting for all requested data to be written,
+ * this function will wait until it's all queued for writing.
+ * In this case, if the buffer gets changed between queueing and
+ * sending, the updated buffer will be sent. If this is not a
+ * desired behavior, it's suggested to call qio_channel_flush()
+ * before reusing the buffer.
+ *
* Returns: 0 if all bytes were written, or -1 on error
*/
@@ -853,6 +870,25 @@ int qio_channel_writev_full_all(QIOChannel *ioc,
const struct iovec *iov,
size_t niov,
int *fds, size_t nfds,
- Error **errp);
+ int flags, Error **errp);
+
+/**
+ * qio_channel_flush:
+ * @ioc: the channel object
+ * @errp: pointer to a NULL-initialized error object
+ *
+ * Will block until every packet queued with
+ * qio_channel_writev_full() + QIO_CHANNEL_WRITE_FLAG_ZERO_COPY
+ * is sent, or return in case of any error.
+ *
+ * If not implemented, acts as a no-op, and returns 0.
+ *
+ * Returns -1 if any error is found,
+ * 1 if every send failed to use zero copy.
+ * 0 otherwise.
+ */
+
+int qio_channel_flush(QIOChannel *ioc,
+ Error **errp);
#endif /* QIO_CHANNEL_H */
diff --git a/chardev/char-io.c b/chardev/char-io.c
index 8ced184160..4451128cba 100644
--- a/chardev/char-io.c
+++ b/chardev/char-io.c
@@ -122,7 +122,7 @@ int io_channel_send_full(QIOChannel *ioc,
ret = qio_channel_writev_full(
ioc, &iov, 1,
- fds, nfds, NULL);
+ fds, nfds, 0, NULL);
if (ret == QIO_CHANNEL_ERR_BLOCK) {
if (offset) {
return offset;
diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
index 2a4aa651ca..9bd98e8219 100644
--- a/hw/remote/mpqemu-link.c
+++ b/hw/remote/mpqemu-link.c
@@ -68,7 +68,7 @@ bool mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp)
}
if (!qio_channel_writev_full_all(ioc, send, G_N_ELEMENTS(send),
- fds, nfds, errp)) {
+ fds, nfds, 0, errp)) {
ret = true;
} else {
trace_mpqemu_send_io_error(msg->cmd, msg->size, nfds);
diff --git a/io/channel-buffer.c b/io/channel-buffer.c
index baa4e2b089..bf52011be2 100644
--- a/io/channel-buffer.c
+++ b/io/channel-buffer.c
@@ -81,6 +81,7 @@ static ssize_t qio_channel_buffer_writev(QIOChannel *ioc,
size_t niov,
int *fds,
size_t nfds,
+ int flags,
Error **errp)
{
QIOChannelBuffer *bioc = QIO_CHANNEL_BUFFER(ioc);
diff --git a/io/channel-command.c b/io/channel-command.c
index 4a1f969aaa..9f2f4a1793 100644
--- a/io/channel-command.c
+++ b/io/channel-command.c
@@ -276,6 +276,7 @@ static ssize_t qio_channel_command_writev(QIOChannel *ioc,
size_t niov,
int *fds,
size_t nfds,
+ int flags,
Error **errp)
{
QIOChannelCommand *cioc = QIO_CHANNEL_COMMAND(ioc);
diff --git a/io/channel-file.c b/io/channel-file.c
index d146ace7db..b67687c2aa 100644
--- a/io/channel-file.c
+++ b/io/channel-file.c
@@ -114,6 +114,7 @@ static ssize_t qio_channel_file_writev(QIOChannel *ioc,
size_t niov,
int *fds,
size_t nfds,
+ int flags,
Error **errp)
{
QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc);
diff --git a/io/channel-socket.c b/io/channel-socket.c
index e531d7bd2a..05c425abb8 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -524,6 +524,7 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
size_t niov,
int *fds,
size_t nfds,
+ int flags,
Error **errp)
{
QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
@@ -619,6 +620,7 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
size_t niov,
int *fds,
size_t nfds,
+ int flags,
Error **errp)
{
QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
diff --git a/io/channel-tls.c b/io/channel-tls.c
index 2ae1b92fc0..4ce890a538 100644
--- a/io/channel-tls.c
+++ b/io/channel-tls.c
@@ -301,6 +301,7 @@ static ssize_t qio_channel_tls_writev(QIOChannel *ioc,
size_t niov,
int *fds,
size_t nfds,
+ int flags,
Error **errp)
{
QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);
diff --git a/io/channel-websock.c b/io/channel-websock.c
index 55145a6a8c..9619906ac3 100644
--- a/io/channel-websock.c
+++ b/io/channel-websock.c
@@ -1127,6 +1127,7 @@ static ssize_t qio_channel_websock_writev(QIOChannel *ioc,
size_t niov,
int *fds,
size_t nfds,
+ int flags,
Error **errp)
{
QIOChannelWebsock *wioc = QIO_CHANNEL_WEBSOCK(ioc);
diff --git a/io/channel.c b/io/channel.c
index e8b019dc36..0640941ac5 100644
--- a/io/channel.c
+++ b/io/channel.c
@@ -72,18 +72,32 @@ ssize_t qio_channel_writev_full(QIOChannel *ioc,
size_t niov,
int *fds,
size_t nfds,
+ int flags,
Error **errp)
{
QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
- if ((fds || nfds) &&
- !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_FD_PASS)) {
+ if (fds || nfds) {
+ if (!qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_FD_PASS)) {
+ error_setg_errno(errp, EINVAL,
+ "Channel does not support file descriptor passing");
+ return -1;
+ }
+ if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
+ error_setg_errno(errp, EINVAL,
+ "Zero Copy does not support file descriptor passing");
+ return -1;
+ }
+ }
+
+ if ((flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) &&
+ !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY)) {
error_setg_errno(errp, EINVAL,
- "Channel does not support file descriptor passing");
+ "Requested Zero Copy feature is not available");
return -1;
}
- return klass->io_writev(ioc, iov, niov, fds, nfds, errp);
+ return klass->io_writev(ioc, iov, niov, fds, nfds, flags, errp);
}
@@ -217,14 +231,14 @@ int qio_channel_writev_all(QIOChannel *ioc,
size_t niov,
Error **errp)
{
- return qio_channel_writev_full_all(ioc, iov, niov, NULL, 0, errp);
+ return qio_channel_writev_full_all(ioc, iov, niov, NULL, 0, 0, errp);
}
int qio_channel_writev_full_all(QIOChannel *ioc,
const struct iovec *iov,
size_t niov,
int *fds, size_t nfds,
- Error **errp)
+ int flags, Error **errp)
{
int ret = -1;
struct iovec *local_iov = g_new(struct iovec, niov);
@@ -237,8 +251,10 @@ int qio_channel_writev_full_all(QIOChannel *ioc,
while (nlocal_iov > 0) {
ssize_t len;
- len = qio_channel_writev_full(ioc, local_iov, nlocal_iov, fds, nfds,
- errp);
+
+ len = qio_channel_writev_full(ioc, local_iov, nlocal_iov, fds,
+ nfds, flags, errp);
+
if (len == QIO_CHANNEL_ERR_BLOCK) {
if (qemu_in_coroutine()) {
qio_channel_yield(ioc, G_IO_OUT);
@@ -277,7 +293,7 @@ ssize_t qio_channel_writev(QIOChannel *ioc,
size_t niov,
Error **errp)
{
- return qio_channel_writev_full(ioc, iov, niov, NULL, 0, errp);
+ return qio_channel_writev_full(ioc, iov, niov, NULL, 0, 0, errp);
}
@@ -297,7 +313,7 @@ ssize_t qio_channel_write(QIOChannel *ioc,
Error **errp)
{
struct iovec iov = { .iov_base = (char *)buf, .iov_len = buflen };
- return qio_channel_writev_full(ioc, &iov, 1, NULL, 0, errp);
+ return qio_channel_writev_full(ioc, &iov, 1, NULL, 0, 0, errp);
}
@@ -473,6 +489,19 @@ off_t qio_channel_io_seek(QIOChannel *ioc,
return klass->io_seek(ioc, offset, whence, errp);
}
+int qio_channel_flush(QIOChannel *ioc,
+ Error **errp)
+{
+ QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
+
+ if (!klass->io_flush ||
+ !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY)) {
+ return 0;
+ }
+
+ return klass->io_flush(ioc, errp);
+}
+
static void qio_channel_restart_read(void *opaque)
{
diff --git a/migration/rdma.c b/migration/rdma.c
index ef1e65ec36..672d1958a9 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2840,6 +2840,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
size_t niov,
int *fds,
size_t nfds,
+ int flags,
Error **errp)
{
QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
diff --git a/scsi/pr-manager-helper.c b/scsi/pr-manager-helper.c
index 451c7631b7..3be52a98d5 100644
--- a/scsi/pr-manager-helper.c
+++ b/scsi/pr-manager-helper.c
@@ -77,7 +77,7 @@ static int pr_manager_helper_write(PRManagerHelper *pr_mgr,
iov.iov_base = (void *)buf;
iov.iov_len = sz;
n_written = qio_channel_writev_full(QIO_CHANNEL(pr_mgr->ioc), &iov, 1,
- nfds ? &fd : NULL, nfds, errp);
+ nfds ? &fd : NULL, nfds, 0, errp);
if (n_written <= 0) {
assert(n_written != QIO_CHANNEL_ERR_BLOCK);
diff --git a/tests/unit/test-io-channel-socket.c b/tests/unit/test-io-channel-socket.c
index c49eec1f03..6713886d02 100644
--- a/tests/unit/test-io-channel-socket.c
+++ b/tests/unit/test-io-channel-socket.c
@@ -444,6 +444,7 @@ static void test_io_channel_unix_fd_pass(void)
G_N_ELEMENTS(iosend),
fdsend,
G_N_ELEMENTS(fdsend),
+ 0,
&error_abort);
qio_channel_readv_full(dst,
--
2.36.1
* [PATCH v13 3/8] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
2022-05-13 6:28 [PATCH v13 0/8] MSG_ZEROCOPY + multifd Leonardo Bras
2022-05-13 6:28 ` [PATCH v13 1/8] meson.build: Fix docker-test-build@alpine when including linux/errqueue.h Leonardo Bras
2022-05-13 6:28 ` [PATCH v13 2/8] QIOChannel: Add flags on io_writev and introduce io_flush callback Leonardo Bras
@ 2022-05-13 6:28 ` Leonardo Bras
2022-06-01 9:37 ` [External] " 徐闯
2022-06-14 13:09 ` chuang xu
2022-05-13 6:28 ` [PATCH v13 4/8] migration: Add zero-copy-send parameter for QMP/HMP for Linux Leonardo Bras
` (4 subsequent siblings)
7 siblings, 2 replies; 29+ messages in thread
From: Leonardo Bras @ 2022-05-13 6:28 UTC (permalink / raw)
To: Marc-André Lureau, Paolo Bonzini, Elena Ufimtseva,
Jagannathan Raman, John G Johnson, Daniel P. Berrangé,
Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
Markus Armbruster, Fam Zheng, Peter Xu
Cc: Leonardo Bras, qemu-devel, qemu-block
For CONFIG_LINUX, implement the new zero copy flag and the optional callback
io_flush on QIOChannelSocket, but enable them only when the MSG_ZEROCOPY
feature is available in the host kernel, which is checked in
qio_channel_socket_connect_sync().
qio_channel_socket_flush() is implemented by counting how many times
sendmsg(..., MSG_ZEROCOPY) was successfully called, and then reading the
socket's error queue to find how many of those sends have finished.
Flush loops until both counters match, or until some error occurs.
Notes on using writev() with QIO_CHANNEL_WRITE_FLAG_ZERO_COPY:
1: Buffer
- As MSG_ZEROCOPY tells the kernel to use the same user buffer to avoid copying,
some caution is necessary to avoid overwriting any buffer before it's sent.
If something like this happens, a newer version of the buffer may be sent instead.
- If this is a problem, it's recommended to call qio_channel_flush() before freeing
or re-using the buffer.
2: Locked memory
- When using MSG_ZEROCOPY, the buffer memory will be locked after it is queued, and
unlocked after it's sent.
- Depending on the size of each buffer, and how often it's sent, it may require
a larger amount of locked memory than is usually available to a non-root user.
- If the required amount of locked memory is not available, writev_zero_copy
will return an error, which can abort an operation like migration.
- Because of this, when user code wants to add zero copy as a feature, it
requires a mechanism to disable it, so the channel remains usable by less
privileged users.
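The buffer caveat above can be pictured with a small standalone mock; MockChannel and the helper names below are illustrative only (not the QEMU API). The point is that with zero copy a buffer is still owned by the kernel until its completion is reaped, so reuse is only safe once flush has drained every pending send:

```c
/*
 * Standalone mock of the write/flush pattern recommended above.
 * MockChannel and the helpers are illustrative names, not QEMU code.
 */
#include <assert.h>
#include <stdbool.h>

typedef struct {
    long queued; /* zero-copy sends issued */
    long sent;   /* completions reaped from the error queue */
} MockChannel;

/* Stand-in for a successful writev(..., ZERO_COPY): queue one send. */
static void mock_write_zero_copy(MockChannel *c)
{
    c->queued++;
}

/* Stand-in for qio_channel_flush(): reap completions until none pend. */
static void mock_flush(MockChannel *c)
{
    while (c->sent < c->queued) {
        c->sent++; /* one completion read from MSG_ERRQUEUE */
    }
}

/* A buffer may be freed or overwritten only when nothing is in flight. */
static bool mock_buffer_reuse_safe(const MockChannel *c)
{
    return c->sent == c->queued;
}
```

In the real API the same pattern is qio_channel_writev_full(..., QIO_CHANNEL_WRITE_FLAG_ZERO_COPY, ...) followed by qio_channel_flush() before the buffer is freed or re-used.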
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
---
include/io/channel-socket.h | 2 +
io/channel-socket.c | 116 ++++++++++++++++++++++++++++++++++--
2 files changed, 114 insertions(+), 4 deletions(-)
diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h
index e747e63514..513c428fe4 100644
--- a/include/io/channel-socket.h
+++ b/include/io/channel-socket.h
@@ -47,6 +47,8 @@ struct QIOChannelSocket {
socklen_t localAddrLen;
struct sockaddr_storage remoteAddr;
socklen_t remoteAddrLen;
+ ssize_t zero_copy_queued;
+ ssize_t zero_copy_sent;
};
diff --git a/io/channel-socket.c b/io/channel-socket.c
index 05c425abb8..dc9c165de1 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -25,6 +25,14 @@
#include "io/channel-watch.h"
#include "trace.h"
#include "qapi/clone-visitor.h"
+#ifdef CONFIG_LINUX
+#include <linux/errqueue.h>
+#include <sys/socket.h>
+
+#if (defined(MSG_ZEROCOPY) && defined(SO_ZEROCOPY))
+#define QEMU_MSG_ZEROCOPY
+#endif
+#endif
#define SOCKET_MAX_FDS 16
@@ -54,6 +62,8 @@ qio_channel_socket_new(void)
sioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET));
sioc->fd = -1;
+ sioc->zero_copy_queued = 0;
+ sioc->zero_copy_sent = 0;
ioc = QIO_CHANNEL(sioc);
qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN);
@@ -153,6 +163,16 @@ int qio_channel_socket_connect_sync(QIOChannelSocket *ioc,
return -1;
}
+#ifdef QEMU_MSG_ZEROCOPY
+ int ret, v = 1;
+ ret = setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &v, sizeof(v));
+ if (ret == 0) {
+ /* Zero copy available on host */
+ qio_channel_set_feature(QIO_CHANNEL(ioc),
+ QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY);
+ }
+#endif
+
return 0;
}
@@ -533,6 +553,7 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
char control[CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS)];
size_t fdsize = sizeof(int) * nfds;
struct cmsghdr *cmsg;
+ int sflags = 0;
memset(control, 0, CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS));
@@ -557,15 +578,31 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
memcpy(CMSG_DATA(cmsg), fds, fdsize);
}
+#ifdef QEMU_MSG_ZEROCOPY
+ if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
+ sflags = MSG_ZEROCOPY;
+ }
+#endif
+
retry:
- ret = sendmsg(sioc->fd, &msg, 0);
+ ret = sendmsg(sioc->fd, &msg, sflags);
if (ret <= 0) {
- if (errno == EAGAIN) {
+ switch (errno) {
+ case EAGAIN:
return QIO_CHANNEL_ERR_BLOCK;
- }
- if (errno == EINTR) {
+ case EINTR:
goto retry;
+#ifdef QEMU_MSG_ZEROCOPY
+ case ENOBUFS:
+ if (sflags & MSG_ZEROCOPY) {
+ error_setg_errno(errp, errno,
+ "Process can't lock enough memory for using MSG_ZEROCOPY");
+ return -1;
+ }
+ break;
+#endif
}
+
error_setg_errno(errp, errno,
"Unable to write to socket");
return -1;
@@ -659,6 +696,74 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
}
#endif /* WIN32 */
+
+#ifdef QEMU_MSG_ZEROCOPY
+static int qio_channel_socket_flush(QIOChannel *ioc,
+ Error **errp)
+{
+ QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
+ struct msghdr msg = {};
+ struct sock_extended_err *serr;
+ struct cmsghdr *cm;
+ char control[CMSG_SPACE(sizeof(*serr))];
+ int received;
+ int ret = 1;
+
+ msg.msg_control = control;
+ msg.msg_controllen = sizeof(control);
+ memset(control, 0, sizeof(control));
+
+ while (sioc->zero_copy_sent < sioc->zero_copy_queued) {
+ received = recvmsg(sioc->fd, &msg, MSG_ERRQUEUE);
+ if (received < 0) {
+ switch (errno) {
+ case EAGAIN:
+ /* Nothing on errqueue, wait until something is available */
+ qio_channel_wait(ioc, G_IO_ERR);
+ continue;
+ case EINTR:
+ continue;
+ default:
+ error_setg_errno(errp, errno,
+ "Unable to read errqueue");
+ return -1;
+ }
+ }
+
+ cm = CMSG_FIRSTHDR(&msg);
+ if (cm->cmsg_level != SOL_IP &&
+ cm->cmsg_type != IP_RECVERR) {
+ error_setg_errno(errp, EPROTOTYPE,
+ "Wrong cmsg in errqueue");
+ return -1;
+ }
+
+ serr = (void *) CMSG_DATA(cm);
+ if (serr->ee_errno != SO_EE_ORIGIN_NONE) {
+ error_setg_errno(errp, serr->ee_errno,
+ "Error on socket");
+ return -1;
+ }
+ if (serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY) {
+ error_setg_errno(errp, serr->ee_origin,
+ "Error not from zero copy");
+ return -1;
+ }
+
+ /* No errors, count successfully finished sendmsg()*/
+ sioc->zero_copy_sent += serr->ee_data - serr->ee_info + 1;
+
+ /* If any sendmsg() succeeded using zero copy, return 0 at the end */
+ if (serr->ee_code != SO_EE_CODE_ZEROCOPY_COPIED) {
+ ret = 0;
+ }
+ }
+
+ return ret;
+}
+
+#endif /* QEMU_MSG_ZEROCOPY */
+
static int
qio_channel_socket_set_blocking(QIOChannel *ioc,
bool enabled,
@@ -789,6 +894,9 @@ static void qio_channel_socket_class_init(ObjectClass *klass,
ioc_klass->io_set_delay = qio_channel_socket_set_delay;
ioc_klass->io_create_watch = qio_channel_socket_create_watch;
ioc_klass->io_set_aio_fd_handler = qio_channel_socket_set_aio_fd_handler;
+#ifdef QEMU_MSG_ZEROCOPY
+ ioc_klass->io_flush = qio_channel_socket_flush;
+#endif
}
static const TypeInfo qio_channel_socket_info = {
--
2.36.1
* Re: [External] [PATCH v13 3/8] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
2022-05-13 6:28 ` [PATCH v13 3/8] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX Leonardo Bras
@ 2022-06-01 9:37 ` 徐闯
2022-06-01 13:58 ` Peter Xu
2022-06-08 5:24 ` Leonardo Bras Soares Passos
2022-06-14 13:09 ` chuang xu
1 sibling, 2 replies; 29+ messages in thread
From: 徐闯 @ 2022-06-01 9:37 UTC (permalink / raw)
To: Leonardo Bras
Cc: qemu-devel, qemu-block, Peter Xu, Fam Zheng, Markus Armbruster,
Dr. David Alan Gilbert, Eric Blake, Juan Quintela,
Daniel P. Berrangé,
John G Johnson, Jagannathan Raman, Elena Ufimtseva,
Paolo Bonzini, Marc-André Lureau, lizefan.x, zhouyibo
On 2022/5/13 2:28 PM, Leonardo Bras wrote:
> For CONFIG_LINUX, implement the new zero copy flag and the optional callback
> io_flush on QIOChannelSocket, but enables it only when MSG_ZEROCOPY
> feature is available in the host kernel, which is checked on
> qio_channel_socket_connect_sync()
>
> qio_channel_socket_flush() was implemented by counting how many times
> sendmsg(...,MSG_ZEROCOPY) was successfully called, and then reading the
> socket's error queue, in order to find how many of them finished sending.
> Flush will loop until those counters are the same, or until some error occurs.
>
> Notes on using writev() with QIO_CHANNEL_WRITE_FLAG_ZERO_COPY:
> 1: Buffer
> - As MSG_ZEROCOPY tells the kernel to use the same user buffer to avoid copying,
> some caution is necessary to avoid overwriting any buffer before it's sent.
> If something like this happen, a newer version of the buffer may be sent instead.
> - If this is a problem, it's recommended to call qio_channel_flush() before freeing
> or re-using the buffer.
>
> 2: Locked memory
> - When using MSG_ZERCOCOPY, the buffer memory will be locked after queued, and
> unlocked after it's sent.
> - Depending on the size of each buffer, and how often it's sent, it may require
> a larger amount of locked memory than usually available to non-root user.
> - If the required amount of locked memory is not available, writev_zero_copy
> will return an error, which can abort an operation like migration,
> - Because of this, when an user code wants to add zero copy as a feature, it
> requires a mechanism to disable it, so it can still be accessible to less
> privileged users.
>
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
> ---
> include/io/channel-socket.h | 2 +
> io/channel-socket.c | 116 ++++++++++++++++++++++++++++++++++--
> 2 files changed, 114 insertions(+), 4 deletions(-)
>
> diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h
> index e747e63514..513c428fe4 100644
> --- a/include/io/channel-socket.h
> +++ b/include/io/channel-socket.h
> @@ -47,6 +47,8 @@ struct QIOChannelSocket {
> socklen_t localAddrLen;
> struct sockaddr_storage remoteAddr;
> socklen_t remoteAddrLen;
> + ssize_t zero_copy_queued;
> + ssize_t zero_copy_sent;
> };
>
Hi, Leonardo. I'm also paying attention to the application of
MSG_ZEROCOPY in live migration recently. I noticed that you defined a
member `zero_copy_queued` in the struct QIOChannelSocket, but I can't
find where the value of this member is ever changed in your patch.
Could you clarify this for me?
* Re: [External] [PATCH v13 3/8] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
2022-06-01 9:37 ` [External] " 徐闯
@ 2022-06-01 13:58 ` Peter Xu
2022-06-08 5:37 ` Leonardo Bras Soares Passos
2022-06-08 5:24 ` Leonardo Bras Soares Passos
1 sibling, 1 reply; 29+ messages in thread
From: Peter Xu @ 2022-06-01 13:58 UTC (permalink / raw)
To: 徐闯, Leonardo Bras Soares Passos
Cc: Leonardo Bras, qemu-devel, qemu-block, Fam Zheng,
Markus Armbruster, Dr. David Alan Gilbert, Eric Blake,
Juan Quintela, Daniel P. Berrangé,
John G Johnson, Jagannathan Raman, Elena Ufimtseva,
Paolo Bonzini, Marc-André Lureau, lizefan.x, zhouyibo
On Wed, Jun 01, 2022 at 05:37:10PM +0800, 徐闯 wrote:
>
> On 2022/5/13 下午2:28, Leonardo Bras wrote:
> > For CONFIG_LINUX, implement the new zero copy flag and the optional callback
> > io_flush on QIOChannelSocket, but enables it only when MSG_ZEROCOPY
> > feature is available in the host kernel, which is checked on
> > qio_channel_socket_connect_sync()
> >
> > qio_channel_socket_flush() was implemented by counting how many times
> > sendmsg(...,MSG_ZEROCOPY) was successfully called, and then reading the
> > socket's error queue, in order to find how many of them finished sending.
> > Flush will loop until those counters are the same, or until some error occurs.
> >
> > Notes on using writev() with QIO_CHANNEL_WRITE_FLAG_ZERO_COPY:
> > 1: Buffer
> > - As MSG_ZEROCOPY tells the kernel to use the same user buffer to avoid copying,
> > some caution is necessary to avoid overwriting any buffer before it's sent.
> > If something like this happen, a newer version of the buffer may be sent instead.
> > - If this is a problem, it's recommended to call qio_channel_flush() before freeing
> > or re-using the buffer.
> >
> > 2: Locked memory
> > - When using MSG_ZERCOCOPY, the buffer memory will be locked after queued, and
> > unlocked after it's sent.
> > - Depending on the size of each buffer, and how often it's sent, it may require
> > a larger amount of locked memory than usually available to non-root user.
> > - If the required amount of locked memory is not available, writev_zero_copy
> > will return an error, which can abort an operation like migration,
> > - Because of this, when an user code wants to add zero copy as a feature, it
> > requires a mechanism to disable it, so it can still be accessible to less
> > privileged users.
> >
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > Reviewed-by: Peter Xu <peterx@redhat.com>
> > Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
> > Reviewed-by: Juan Quintela <quintela@redhat.com>
> > ---
> > include/io/channel-socket.h | 2 +
> > io/channel-socket.c | 116 ++++++++++++++++++++++++++++++++++--
> > 2 files changed, 114 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h
> > index e747e63514..513c428fe4 100644
> > --- a/include/io/channel-socket.h
> > +++ b/include/io/channel-socket.h
> > @@ -47,6 +47,8 @@ struct QIOChannelSocket {
> > socklen_t localAddrLen;
> > struct sockaddr_storage remoteAddr;
> > socklen_t remoteAddrLen;
> > + ssize_t zero_copy_queued;
> > + ssize_t zero_copy_sent;
> > };
> Hi, Leonardo. I'm also paying attention to the application of MSG_ZEROCOPY
> in live migration recently. I noticed that you defined a member
> `zero_copy_queued` in the struct QIOChannelSocket, but I can't find out
> where the value of this member has been changed in your patch. Can you
> answer it for me?
>
Good point.. it should probably be increased when queuing the pages. We'd
better fix it up, or it seems the flush() will literally be a no-op..
Two things in qio_channel_socket_flush() we can do to make sure it'll work
as expected, imo:
1) make ret = -1 the initial value, rather than 1 - we only check for
negative errors in the caller, so we could have missed a positive "1"
2) add a tracepoint into the loop that updates zero_copy_sent
Leo, what's your take?
Thanks,
--
Peter Xu
* Re: [External] [PATCH v13 3/8] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
2022-06-01 13:58 ` Peter Xu
@ 2022-06-08 5:37 ` Leonardo Bras Soares Passos
2022-06-08 11:41 ` Peter Xu
0 siblings, 1 reply; 29+ messages in thread
From: Leonardo Bras Soares Passos @ 2022-06-08 5:37 UTC (permalink / raw)
To: Peter Xu
Cc: 徐闯,
qemu-devel, qemu-block, Fam Zheng, Markus Armbruster,
Dr. David Alan Gilbert, Eric Blake, Juan Quintela,
Daniel P. Berrangé,
John G Johnson, Jagannathan Raman, Elena Ufimtseva,
Paolo Bonzini, Marc-André Lureau, lizefan.x, zhouyibo
Hello Peter,
On Wed, Jun 1, 2022 at 10:58 AM Peter Xu <peterx@redhat.com> wrote:
>
[...]
> > Hi, Leonardo. I'm also paying attention to the application of MSG_ZEROCOPY
> > in live migration recently. I noticed that you defined a member
> > `zero_copy_queued` in the struct QIOChannelSocket, but I can't find out
> > where the value of this member has been changed in your patch. Can you
> > answer it for me?
> >
>
> Good point.. it should probably be increased when queuing the pages. We'd
> better fix it up or it seems the flush() will be literally an no-op..
That's correct.
I am working on a fix right now.
The idea is to increment it in qio_channel_socket_writev() if sendmsg succeeds.
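For reference, a minimal standalone sketch of that idea; the names mirror the patch but this is an illustration, not the actual qio_channel_socket_writev() change:

```c
/*
 * Sketch of the fix described above: bump zero_copy_queued every time a
 * sendmsg(..., MSG_ZEROCOPY) call succeeds, so that flush() has
 * something to wait for. Standalone illustration, not QEMU code.
 */
#include <assert.h>

typedef struct {
    long zero_copy_queued;
    long zero_copy_sent;
} MockSocketChannel;

/* sendmsg_ret is the (mocked) return value of sendmsg(fd, &msg, sflags). */
static long mock_writev_zero_copy(MockSocketChannel *sioc, long sendmsg_ret,
                                  int zero_copy_requested)
{
    if (sendmsg_ret > 0 && zero_copy_requested) {
        /* The missing increment: one more completion to reap later. */
        sioc->zero_copy_queued++;
    }
    return sendmsg_ret;
}
```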
>
> Two things in qio_channel_socket_flush() we can do to make sure it'll work
> as expected, imo:
>
> 1) make ret=-1 as initial value, rather than 1 - we only check negative
> errors in the caller so we could have missed a positive "1"
>
> 2) add a tracepoint into the loop of updating zero_copy_sent
>
> Leo, what's your take?
(1) is not an option, as the interface currently uses ret=1 to make
sure MSG_ZEROCOPY is actually getting used.
I added that so the user of qio_channel can switch off zero-copy if
it's not getting used, and save some CPU.
(2) is not a problem, but I fail to see how useful that would be. Is
the idea to manually keep track of flushes happening?
Best regards,
Leo
* Re: [External] [PATCH v13 3/8] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
2022-06-08 5:37 ` Leonardo Bras Soares Passos
@ 2022-06-08 11:41 ` Peter Xu
2022-06-08 18:14 ` Leonardo Bras Soares Passos
0 siblings, 1 reply; 29+ messages in thread
From: Peter Xu @ 2022-06-08 11:41 UTC (permalink / raw)
To: Leonardo Bras Soares Passos
Cc: 徐闯,
qemu-devel, qemu-block, Fam Zheng, Markus Armbruster,
Dr. David Alan Gilbert, Eric Blake, Juan Quintela,
Daniel P. Berrangé,
John G Johnson, Jagannathan Raman, Elena Ufimtseva,
Paolo Bonzini, Marc-André Lureau, lizefan.x, zhouyibo
On Wed, Jun 08, 2022 at 02:37:28AM -0300, Leonardo Bras Soares Passos wrote:
> (1) is not an option, as the interface currently uses ret=1 to make
> sure MSG_ZEROCOPY is getting used,
> I added that so the user of qio_channel can switch off zero-copy if
> it's not getting used, and save some cpu.
Yes, (1) is not, but could you explain what you mean by making sure
MSG_ZEROCOPY is being used? Why is it relevant to the retval here?
I just figured it's a bit weird to return >0 here in flush().
>
> (2) is not a problem, but I fail to see how useful that would be. Is
> the idea manually keeping track of flush happening?
Yes, if we can check this it'll be good enough for me. The tracepoint
could also help in some cases in the future to monitor the behavior of
kernel MSG_ERRQUEUE, but if you don't like it then it's okay.
--
Peter Xu
* Re: [External] [PATCH v13 3/8] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
2022-06-08 11:41 ` Peter Xu
@ 2022-06-08 18:14 ` Leonardo Bras Soares Passos
2022-06-08 20:23 ` Peter Xu
0 siblings, 1 reply; 29+ messages in thread
From: Leonardo Bras Soares Passos @ 2022-06-08 18:14 UTC (permalink / raw)
To: Peter Xu
Cc: 徐闯,
qemu-devel, qemu-block, Fam Zheng, Markus Armbruster,
Dr. David Alan Gilbert, Eric Blake, Juan Quintela,
Daniel P. Berrangé,
John G Johnson, Jagannathan Raman, Elena Ufimtseva,
Paolo Bonzini, Marc-André Lureau, lizefan.x, zhouyibo
On Wed, Jun 8, 2022 at 8:41 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Wed, Jun 08, 2022 at 02:37:28AM -0300, Leonardo Bras Soares Passos wrote:
> > (1) is not an option, as the interface currently uses ret=1 to make
> > sure MSG_ZEROCOPY is getting used,
> > I added that so the user of qio_channel can switch off zero-copy if
> > it's not getting used, and save some cpu.
>
> Yes (1) is not, but could you explain what do you mean by making sure
> MSG_ZEROCOPY being used? Why is it relevant to the retval here?
If sendmsg() is called with MSG_ZEROCOPY, and everything is configured
correctly, the kernel will attempt to send the buffer using zero-copy.
Even with the right configuration on a recent enough kernel, there are
factors that can prevent zero-copy from happening, and the kernel will
fall back to the copying mechanism.
An example is the net device not supporting the 'Scatter-Gather'
feature (NETIF_F_SG).
When this happens, there is an overhead for 'trying zero-copy first',
instead of just opting for the copying mechanism.
In a previous iteration of the patchset, it was made clear that it's
desirable to detect when the kernel falls back to the copying mechanism,
so the user of 'QIOChannelSocket' can switch to copying and avoid the
overhead. This was done via the return value of flush(), which is 1 if
that occurs.
>
> I just figured it's a bit weird to return >0 here in flush().
>
> >
> > (2) is not a problem, but I fail to see how useful that would be. Is
> > the idea manually keeping track of flush happening?
>
> Yes if we can check this up it'll be good enough to me. The trace point
> could help in some case in the future too to monitor the behavior of kernel
> MSG_ERRQUEUE but if you don't like it then it's okay.
>
TBH I am not sure how those traces work yet, and I am afraid they could
introduce some overhead in flush.
In any case, we can introduce this trace in a separate patch, since
fixing zero-copy flush seems more urgent right now.
Best regards,
Leo
> --
> Peter Xu
>
* Re: [External] [PATCH v13 3/8] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
2022-06-08 18:14 ` Leonardo Bras Soares Passos
@ 2022-06-08 20:23 ` Peter Xu
2022-06-13 20:58 ` Leonardo Bras Soares Passos
0 siblings, 1 reply; 29+ messages in thread
From: Peter Xu @ 2022-06-08 20:23 UTC (permalink / raw)
To: Leonardo Bras Soares Passos
Cc: 徐闯,
qemu-devel, qemu-block, Fam Zheng, Markus Armbruster,
Dr. David Alan Gilbert, Eric Blake, Juan Quintela,
Daniel P. Berrangé,
John G Johnson, Jagannathan Raman, Elena Ufimtseva,
Paolo Bonzini, Marc-André Lureau, lizefan.x, zhouyibo
On Wed, Jun 08, 2022 at 03:14:36PM -0300, Leonardo Bras Soares Passos wrote:
> On Wed, Jun 8, 2022 at 8:41 AM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Wed, Jun 08, 2022 at 02:37:28AM -0300, Leonardo Bras Soares Passos wrote:
> > > (1) is not an option, as the interface currently uses ret=1 to make
> > > sure MSG_ZEROCOPY is getting used,
> > > I added that so the user of qio_channel can switch off zero-copy if
> > > it's not getting used, and save some cpu.
> >
> > Yes (1) is not, but could you explain what do you mean by making sure
> > MSG_ZEROCOPY being used? Why is it relevant to the retval here?
>
> If sendmsg() is called with MSG_ZEROCOPY, and everything is configured
> correctly, the kernel will attempt to send the buffer using zero-copy.
>
> Even with the right configuration on a recent enough kernel, there are
> factors that can prevent zero-copy from happening, and the kernel will
> fall back to the copying mechanism.
> An example being the net device not supporting 'Scatter-Gather'
> feature (NETIF_F_SG).
>
> When this happens, there is an overhead for 'trying zero-copy first',
> instead of just opting for the copying mechanism.
>
> In a previous iteration of the patchset, it was made clear that it's
> desirable to detect when the kernel falls back to copying mechanism,
> so the user of 'QIOChannelSocket' can switch to copying and avoid the
> overhead. This was done by the return value of flush(), which is 1 if
> that occurs.
Two questions..
1) When that happens, will MSG_ERRQUEUE keep working just like when
zerocopy is functional?
If the answer is yes, I don't see how ret=1 will ever be
returned.. because we'll also go into the same loop in
qio_channel_socket_flush() anyway.
If the answer is no, then since we'll have a non-zero zero_copy_queued,
will the loop in qio_channel_socket_flush() become a dead one? How
could it return?
2) Even if we have the correct ret=1 returned when that happens, which
caller is detecting that ret==1 and warn the admin?
Thanks,
--
Peter Xu
* Re: [External] [PATCH v13 3/8] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
2022-06-08 20:23 ` Peter Xu
@ 2022-06-13 20:58 ` Leonardo Bras Soares Passos
2022-06-13 22:53 ` Peter Xu
0 siblings, 1 reply; 29+ messages in thread
From: Leonardo Bras Soares Passos @ 2022-06-13 20:58 UTC (permalink / raw)
To: Peter Xu
Cc: 徐闯,
qemu-devel, qemu-block, Fam Zheng, Markus Armbruster,
Dr. David Alan Gilbert, Eric Blake, Juan Quintela,
Daniel P. Berrangé,
John G Johnson, Jagannathan Raman, Elena Ufimtseva,
Paolo Bonzini, Marc-André Lureau, lizefan.x, zhouyibo
Hello Peter,
On Wed, Jun 8, 2022 at 5:23 PM Peter Xu <peterx@redhat.com> wrote:
[...]
> > In a previous iteration of the patchset, it was made clear that it's
> > desirable to detect when the kernel falls back to copying mechanism,
> > so the user of 'QIOChannelSocket' can switch to copying and avoid the
> > overhead. This was done by the return value of flush(), which is 1 if
> > that occurs.
>
> Two questions..
>
> 1) When that happens, will MSG_ERRQUEUE keeps working just like zerocopy
> is functional?
I am not sure exactly what you meant by 'like zerocopy is
functional', but the
idea is that reading from MSG_ERRQUEUE should return a msg for each sendmsg
syscall with MSG_ZEROCOPY that previously happened. This does not depend on
the outcome (like falling back to the copying mechanism).
btw, most of those messages may be batched to reduce overhead.
At some point, zero-copy may fail and fall back to copying, so in
those messages
an error code SO_EE_CODE_ZEROCOPY_COPIED can be seen. Having only
those messages in a flush will trigger the return of 1 from the
flush function.
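To make the batching and return-value rules concrete, here is a standalone model. The struct and constant are redefined locally so the example builds anywhere; on Linux they correspond to fields of struct sock_extended_err from <linux/errqueue.h>:

```c
/*
 * Standalone model of the completion handling described above. Each
 * error-queue notification covers a batched range of sendmsg() calls
 * [ee_info, ee_data], and ee_code tells whether the kernel silently
 * fell back to copying (SO_EE_CODE_ZEROCOPY_COPIED).
 */
#include <assert.h>
#include <stdint.h>

#define MOCK_SO_EE_CODE_ZEROCOPY_COPIED 1 /* local stand-in constant */

typedef struct {
    uint32_t ee_info; /* first sendmsg() covered by this notification */
    uint32_t ee_data; /* last sendmsg() covered by this notification */
    uint8_t ee_code;
} MockNotification;

/*
 * Returns 1 if every completion fell back to copying, 0 if at least one
 * send really was zero copy, mirroring the flush() return value. Also
 * advances the "sent" counter by the size of each batched range.
 */
static int mock_process_errqueue(const MockNotification *n, int count,
                                 long *sent)
{
    int ret = 1;
    for (int i = 0; i < count; i++) {
        *sent += (long)n[i].ee_data - n[i].ee_info + 1;
        if (n[i].ee_code != MOCK_SO_EE_CODE_ZEROCOPY_COPIED) {
            ret = 0;
        }
    }
    return ret;
}
```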
>
> If the answer is yes, I don't see how ret=1 will ever be
> returned.. because we'll also go into the same loop in
> qio_channel_socket_flush() anyway.
We set ret to 1 at function entry and then, for each message in the MSG_ERRQUEUE,
we test if it has an error code different from SO_EE_CODE_ZEROCOPY_COPIED.
If any has a different error code, we set ret=0.
So, in our previous example, if we have a net device not supporting
the 'Scatter-Gather'
feature (NETIF_F_SG), every error message will be
SO_EE_CODE_ZEROCOPY_COPIED, and flush will return 1.
>
> If the answer is no, then since we'll have non-zero zero_copy_queued,
> will the loop in qio_channel_socket_flush() go into a dead one? How
> could it return?
No, because it will go through all packets sent with MSG_ZEROCOPY, including the
ones that fell back to copying, so the counter should be fine. If any
code disables
zero-copy, it will stop both sending with MSG_ZEROCOPY and flushing, so it
should be fine.
>
> 2) Even if we have the correct ret=1 returned when that happens, which
> caller is detecting that ret==1 and warn the admin?
>
No caller is using that right now.
It's supposed to be a QIOChannel interface feature, and any user/implementation
could use that information to warn if zero-copy is not being used, fall back to
copying directly (to avoid the overhead of testing zero-copy), or even use
it to cancel the
sending if wanted.
It was a suggestion of Daniel's on top of [PATCH v5 1/6], IIRC.
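As an illustration only, a hypothetical caller policy built on that flush() return value could look like the sketch below; this is not existing QEMU code, just one way a QIOChannel user could act on the information:

```c
/*
 * Hypothetical caller-side policy: if every send in the last flush fell
 * back to copying (flush_ret == 1), stop requesting zero copy to avoid
 * the extra bookkeeping on subsequent writes.
 */
#include <assert.h>
#include <stdbool.h>

static bool mock_keep_zero_copy(bool enabled, int flush_ret)
{
    if (!enabled) {
        return false;
    }
    /* flush_ret < 0 is an error path, handled separately by the caller. */
    if (flush_ret == 1) {
        /* Kernel copied everything anyway: disable the feature. */
        return false;
    }
    return true;
}
```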
Best regards,
Leo
* Re: [External] [PATCH v13 3/8] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
2022-06-13 20:58 ` Leonardo Bras Soares Passos
@ 2022-06-13 22:53 ` Peter Xu
2022-06-14 3:14 ` Leonardo Bras Soares Passos
0 siblings, 1 reply; 29+ messages in thread
From: Peter Xu @ 2022-06-13 22:53 UTC (permalink / raw)
To: Leonardo Bras Soares Passos
Cc: 徐闯,
qemu-devel, qemu-block, Fam Zheng, Markus Armbruster,
Dr. David Alan Gilbert, Eric Blake, Juan Quintela,
Daniel P. Berrangé,
John G Johnson, Jagannathan Raman, Elena Ufimtseva,
Paolo Bonzini, Marc-André Lureau, lizefan.x, zhouyibo
On Mon, Jun 13, 2022 at 05:58:44PM -0300, Leonardo Bras Soares Passos wrote:
> Hello Peter,
>
> On Wed, Jun 8, 2022 at 5:23 PM Peter Xu <peterx@redhat.com> wrote:
> [...]
> > > In a previous iteration of the patchset, it was made clear that it's
> > > desirable to detect when the kernel falls back to copying mechanism,
> > > so the user of 'QIOChannelSocket' can switch to copying and avoid the
> > > overhead. This was done by the return value of flush(), which is 1 if
> > > that occurs.
> >
> > Two questions..
> >
> > 1) When that happens, will MSG_ERRQUEUE keeps working just like zerocopy
> > is functional?
>
> I am not sure about what exactly you meant by 'like zerocopy is
> funcional', but the
> idea is that reading from MSG_ERRQUEUE should return a msg for each sendmsg
> syscall with MSG_ZEROCOPY that previously happened. This does not depend on
> the outcome (like falling back to the copying mechanism).
> btw, most of those messages may be batched to reduce overhead.
>
> At some point, zero-copy may fail, and fall back to copying, so in
> those messages
> an error code SO_EE_CODE_ZEROCOPY_COPIED can be seen. Having only
> those messages in a flush will trigger the returning of 1 from the
> flush function.
Ah I think I missed the "reset ret==0 when !SO_EE_CODE_ZEROCOPY_COPIED"
path.. Sorry.
>
> >
> > If the answer is yes, I don't see how ret=1 will ever be
> > returned.. because we'll also go into the same loop in
> > qio_channel_socket_flush() anyway.
>
>
> We set ret to 1 at function entry and then for each message in the MSG_ERRQUEUE,
> we test if it has error code different than SO_EE_CODE_ZEROCOPY_COPIED.
> If it ever have a different error code, we set ret=0.
>
> So, in our previous example, if we have a net device not supporting
> the 'Scatter-Gather'
> feature (NETIF_F_SG), every error message will be
> SO_EE_CODE_ZEROCOPY_COPIED, and it will return 1.
>
>
> >
> > If the answer is no, then since we'll have non-zero zero_copy_queued,
> > will the loop in qio_channel_socket_flush() go into a dead one? How
> > could it return?
>
> No, because it will go through all packets sent with MSG_ZEROCOPY, including the
> ones that fell back to copying, so the counter should be fine. If any
> code disables
> zero-copy, it will both stop sending stuff wil MSG_ZEROCOPY and flushing, so it
> should be fine.
>
> >
> > 2) Even if we have the correct ret=1 returned when that happens, which
> > caller is detecting that ret==1 and warn the admin?
> >
>
> No caller is using that right now.
> It's supposed to be a QIOChannel interface feature, and any user/implementation
> could use that information to warn if zero-copy is not being used, fall back to
> copying directly (to avoid overhead of testing zero-copy) or even use
> it to cancel the
> sending if wanted.
>
> It was a suggestion of Daniel on top of [PATCH v5 1/6] IIRC.
OK the detection makes sense, thanks for the details.
Then now I'm wondering whether we should have already warned the admin if
zero-copy send is not fully enabled in live migration. Should we add an
error_report_once() somewhere for the ret==1 case already? After all, the user
specifies zero_copy_send=true explicitly. Did I miss something again?
Thanks,
--
Peter Xu
* Re: [External] [PATCH v13 3/8] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
2022-06-13 22:53 ` Peter Xu
@ 2022-06-14 3:14 ` Leonardo Bras Soares Passos
0 siblings, 0 replies; 29+ messages in thread
From: Leonardo Bras Soares Passos @ 2022-06-14 3:14 UTC (permalink / raw)
To: Peter Xu
Cc: 徐闯,
qemu-devel, qemu-block, Fam Zheng, Markus Armbruster,
Dr. David Alan Gilbert, Eric Blake, Juan Quintela,
Daniel P. Berrangé,
John G Johnson, Jagannathan Raman, Elena Ufimtseva,
Paolo Bonzini, Marc-André Lureau, lizefan.x, zhouyibo
On Mon, Jun 13, 2022 at 7:53 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Mon, Jun 13, 2022 at 05:58:44PM -0300, Leonardo Bras Soares Passos wrote:
> > Hello Peter,
> >
> > On Wed, Jun 8, 2022 at 5:23 PM Peter Xu <peterx@redhat.com> wrote:
> > [...]
> > > > In a previous iteration of the patchset, it was made clear that it's
> > > > desirable to detect when the kernel falls back to copying mechanism,
> > > > so the user of 'QIOChannelSocket' can switch to copying and avoid the
> > > > overhead. This was done by the return value of flush(), which is 1 if
> > > > that occurs.
> > >
> > > Two questions..
> > >
> > > 1) When that happens, will MSG_ERRQUEUE keep working just like zerocopy
> > > is functional?
> >
> > I am not sure what exactly you meant by 'like zerocopy is
> > functional', but the
> > idea is that reading from MSG_ERRQUEUE should return a msg for each sendmsg
> > syscall with MSG_ZEROCOPY that previously happened. This does not depend on
> > the outcome (like falling back to the copying mechanism).
> > btw, most of those messages may be batched to reduce overhead.
> >
> > At some point, zero-copy may fail, and fall back to copying, so in
> > those messages
> > an error code SO_EE_CODE_ZEROCOPY_COPIED can be seen. Having only
> > those messages in a flush will cause the flush function to return 1.
>
> Ah I think I missed the "reset ret==0 when !SO_EE_CODE_ZEROCOPY_COPIED"
> path.. Sorry.
>
> >
> > >
> > > If the answer is yes, I don't see how ret=1 will ever be
> > > returned.. because we'll also go into the same loop in
> > > qio_channel_socket_flush() anyway.
> >
> >
> > We set ret to 1 at function entry and then, for each message in the
> > MSG_ERRQUEUE, we test if it has an error code different from
> > SO_EE_CODE_ZEROCOPY_COPIED. If it ever has a different error code,
> > we set ret=0.
> >
> > So, in our previous example, if we have a net device not supporting
> > the 'Scatter-Gather'
> > feature (NETIF_F_SG), every error message will be
> > SO_EE_CODE_ZEROCOPY_COPIED, and it will return 1.
> >
> >
> > >
> > > If the answer is no, then since we'll have non-zero zero_copy_queued,
> > > will the loop in qio_channel_socket_flush() go into a dead one? How
> > > could it return?
> >
> > No, because it will go through all packets sent with MSG_ZEROCOPY, including the
> > ones that fell back to copying, so the counter should be fine. If any
> > code disables
> > zero-copy, it will both stop sending with MSG_ZEROCOPY and stop
> > flushing, so it should be fine.
> >
> > >
> > > 2) Even if we have the correct ret=1 returned when that happens, which
> > > caller is detecting that ret==1 and warn the admin?
> > >
> >
> > No caller is using that right now.
> > It's supposed to be a QIOChannel interface feature, and any user/implementation
> > could use that information to warn if zero-copy is not being used, fall back to
> > copying directly (to avoid overhead of testing zero-copy) or even use
> > it to cancel the
> > sending if wanted.
> >
> > It was a suggestion of Daniel on top of [PATCH v5 1/6] IIRC.
>
> OK, the detection makes sense, thanks for the details.
>
> Then I'm wondering whether we should have warned the admin already if
> zero-copy send is not fully enabled in live migration. Should we add an
> error_report_once() somewhere for the ret==1 case already? After all, the
> user specifies zero_copy_send=true explicitly. Did I miss something again?
>
You are correct, I think warning the user is the right thing to do here.
At the end of the first iteration, where the first flush happens, I
think it's too late to fail the migration, since a large amount of the
data has already been sent.
Best regards,
Leo
* Re: [External] [PATCH v13 3/8] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
2022-06-01 9:37 ` [External] " 徐闯
2022-06-01 13:58 ` Peter Xu
@ 2022-06-08 5:24 ` Leonardo Bras Soares Passos
2022-06-08 6:48 ` chuang xu
1 sibling, 1 reply; 29+ messages in thread
From: Leonardo Bras Soares Passos @ 2022-06-08 5:24 UTC (permalink / raw)
To: 徐闯
Cc: qemu-devel, qemu-block, Peter Xu, Fam Zheng, Markus Armbruster,
Dr. David Alan Gilbert, Eric Blake, Juan Quintela,
Daniel P. Berrangé,
John G Johnson, Jagannathan Raman, Elena Ufimtseva,
Paolo Bonzini, Marc-André Lureau, lizefan.x, zhouyibo
Hello 徐闯,
Thanks for reviewing!
On Wed, Jun 1, 2022 at 6:37 AM 徐闯 <xuchuangxclwt@bytedance.com> wrote:
[...]
> Hi, Leonardo. I'm also paying attention to the application of
> MSG_ZEROCOPY in live migration recently. I noticed that you defined a
> member `zero_copy_queued` in the struct QIOChannelSocket, but I can't
> find where the value of this member is ever changed in your patch.
> Could you clarify this for me?
>
You are right.
This was correctly implemented until v6, and then the increment
just vanished.
Since v6 there have been a lot of changes both in the patch and in the
base repository, so I think I lost it in some iteration of changes.
I will send a fix shortly.
Is it ok if I include a "Reported-by: 徐闯
<xuchuangxclwt@bytedance.com>" in the patch?
Best regards,
Leo
* Re: [External] [PATCH v13 3/8] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
2022-06-08 5:24 ` Leonardo Bras Soares Passos
@ 2022-06-08 6:48 ` chuang xu
0 siblings, 0 replies; 29+ messages in thread
From: chuang xu @ 2022-06-08 6:48 UTC (permalink / raw)
To: Leonardo Bras Soares Passos
Cc: qemu-devel, qemu-block, Peter Xu, Fam Zheng, Markus Armbruster,
Dr. David Alan Gilbert, Eric Blake, Juan Quintela,
Daniel P. Berrangé,
John G Johnson, Jagannathan Raman, Elena Ufimtseva,
Paolo Bonzini, Marc-André Lureau, lizefan.x, zhouyibo
On 2022/6/8 at 1:24 PM, Leonardo Bras Soares Passos wrote:
> I will send a fix shortly.
> Is that ok if I include a "Reported-by: 徐闯
> <xuchuangxclwt@bytedance.com>" in the patch?
okay.
Best Regards,
chuang xu
* Re: [External] [PATCH v13 3/8] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
2022-05-13 6:28 ` [PATCH v13 3/8] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX Leonardo Bras
2022-06-01 9:37 ` [External] " 徐闯
@ 2022-06-14 13:09 ` chuang xu
2022-06-14 14:14 ` Dr. David Alan Gilbert
1 sibling, 1 reply; 29+ messages in thread
From: chuang xu @ 2022-06-14 13:09 UTC (permalink / raw)
To: Leonardo Bras
Cc: qemu-devel, qemu-block, Peter Xu, Marc-André Lureau,
Paolo Bonzini, Elena Ufimtseva, Jagannathan Raman,
Daniel P. Berrangé,
John G Johnson, Juan Quintela, Dr. David Alan Gilbert,
Eric Blake, Markus Armbruster, Fam Zheng, lizefan.x, zhouyibo
On 2022/5/13 at 2:28 PM, Leonardo Bras wrote:
> @@ -557,15 +578,31 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
> memcpy(CMSG_DATA(cmsg), fds, fdsize);
> }
>
> +#ifdef QEMU_MSG_ZEROCOPY
> + if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
> + sflags = MSG_ZEROCOPY;
> + }
> +#endif
> +
> retry:
> - ret = sendmsg(sioc->fd, &msg, 0);
> + ret = sendmsg(sioc->fd, &msg, sflags);
> if (ret <= 0) {
> - if (errno == EAGAIN) {
> + switch (errno) {
> + case EAGAIN:
> return QIO_CHANNEL_ERR_BLOCK;
> - }
> - if (errno == EINTR) {
> + case EINTR:
> goto retry;
> +#ifdef QEMU_MSG_ZEROCOPY
> + case ENOBUFS:
> + if (sflags & MSG_ZEROCOPY) {
> + error_setg_errno(errp, errno,
> + "Process can't lock enough memory for using MSG_ZEROCOPY");
> + return -1;
> + }
> + break;
> +#endif
> }
> +
> error_setg_errno(errp, errno,
> "Unable to write to socket");
> return -1;
Hi, Leo.
There are some other questions I would like to discuss with you.
I tested the multifd zero_copy migration and found that sometimes, even
if QEMU's max locked memory was set to 16 GB (much greater than
`MULTIFD_PACKET_SIZE`), the error "Process can't lock enough memory for
using MSG_ZEROCOPY" would still be reported.
I noticed that the
doc(https://www.kernel.org/doc/html/v5.12/networking/msg_zerocopy.html)
says "A zerocopy failure will return -1 with errno ENOBUFS. This happens
if the socket option was not set, _the socket exceeds its optmem limit_
or the user exceeds its ulimit on locked pages."
I also found that the RFC (https://lwn.net/Articles/715279/) says "The
change to allocate notification skbuffs from optmem requires ensuring
that net.core.optmem is at least a few 100KB."
On my host, optmem was initially set to 20KB. I changed it to 100KB
(echo 102400 > /proc/sys/net/core/optmem_max) as the RFC suggests. Then
I tested the multifd zero_copy migration repeatedly, and the error
disappeared.
So when sendmsg returns -1 with errno ENOBUFS, should we distinguish
between the error "socket exceeds optmem limit" and the error "user
exceeds its ulimit on locked pages"? Or is there any better way to avoid
this problem?
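For reference, the sysctl change described above can be applied like this (102400 is the value from this report; the right size is workload-dependent):

```sh
# Inspect the current limit, in bytes; 20480 was the initial value here.
cat /proc/sys/net/core/optmem_max

# Raise it to 100KB, as the MSG_ZEROCOPY RFC suggests (requires root):
sysctl -w net.core.optmem_max=102400
# equivalently: echo 102400 > /proc/sys/net/core/optmem_max
```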
Best Regards,
chuang xu
* Re: [External] [PATCH v13 3/8] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
2022-06-14 13:09 ` chuang xu
@ 2022-06-14 14:14 ` Dr. David Alan Gilbert
2022-06-15 14:44 ` chuang xu
0 siblings, 1 reply; 29+ messages in thread
From: Dr. David Alan Gilbert @ 2022-06-14 14:14 UTC (permalink / raw)
To: chuang xu
Cc: Leonardo Bras, qemu-devel, Peter Xu, Marc-André Lureau,
Paolo Bonzini, Elena Ufimtseva, Jagannathan Raman,
Daniel P. Berrangé,
John G Johnson, Juan Quintela, Eric Blake, Markus Armbruster,
Fam Zheng, lizefan.x, zhouyibo
* chuang xu (xuchuangxclwt@bytedance.com) wrote:
>
> On 2022/5/13 at 2:28 PM, Leonardo Bras wrote:
> > @@ -557,15 +578,31 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
> > memcpy(CMSG_DATA(cmsg), fds, fdsize);
> > }
> > +#ifdef QEMU_MSG_ZEROCOPY
> > + if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
> > + sflags = MSG_ZEROCOPY;
> > + }
> > +#endif
> > +
> > retry:
> > - ret = sendmsg(sioc->fd, &msg, 0);
> > + ret = sendmsg(sioc->fd, &msg, sflags);
> > if (ret <= 0) {
> > - if (errno == EAGAIN) {
> > + switch (errno) {
> > + case EAGAIN:
> > return QIO_CHANNEL_ERR_BLOCK;
> > - }
> > - if (errno == EINTR) {
> > + case EINTR:
> > goto retry;
> > +#ifdef QEMU_MSG_ZEROCOPY
> > + case ENOBUFS:
> > + if (sflags & MSG_ZEROCOPY) {
> > + error_setg_errno(errp, errno,
> > + "Process can't lock enough memory for using MSG_ZEROCOPY");
> > + return -1;
> > + }
> > + break;
> > +#endif
> > }
> > +
> > error_setg_errno(errp, errno,
> > "Unable to write to socket");
> > return -1;
>
> Hi, Leo.
>
> There are some other questions I would like to discuss with you.
>
> I tested the multifd zero_copy migration and found that sometimes, even if
> QEMU's max locked memory was set to 16 GB (much greater than
> `MULTIFD_PACKET_SIZE`), the error "Process can't lock enough memory for
> using MSG_ZEROCOPY" would still be reported.
>
> I noticed that the
> doc(https://www.kernel.org/doc/html/v5.12/networking/msg_zerocopy.html) says
> "A zerocopy failure will return -1 with errno ENOBUFS. This happens if the
> socket option was not set, _the socket exceeds its optmem limit_ or the user
> exceeds its ulimit on locked pages."
>
> I also found that the RFC (https://lwn.net/Articles/715279/) says "The
> change to allocate notification skbuffs from optmem requires ensuring
> that net.core.optmem is at least a few 100KB."
Interesting.
> On my host, optmem was initially set to 20KB. I changed it to 100KB
> (echo 102400 > /proc/sys/net/core/optmem_max) as the RFC suggests. Then I
> tested the multifd zero_copy migration repeatedly, and the error disappeared.
>
> So when sendmsg returns -1 with errno ENOBUFS, should we distinguish between
> the error "socket exceeds optmem limit" and the error "user exceeds its
> ulimit on locked pages"? Or is there any better way to avoid this problem?
I don't think we can tell which one of them triggered the error; so the
only thing I can suggest is that we document the need for optmem_max
setting; I wonder how we get a better answer than 'a few 100KB'?
I guess it's something like the number of packets inflight *
sizeof(cmsghdr) ?
Dave
> Best Regards,
>
> chuang xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [External] [PATCH v13 3/8] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
2022-06-14 14:14 ` Dr. David Alan Gilbert
@ 2022-06-15 14:44 ` chuang xu
0 siblings, 0 replies; 29+ messages in thread
From: chuang xu @ 2022-06-15 14:44 UTC (permalink / raw)
To: Dr. David Alan Gilbert
Cc: Leonardo Bras, qemu-devel, Peter Xu, Marc-André Lureau,
Paolo Bonzini, Elena Ufimtseva, Jagannathan Raman,
Daniel P. Berrangé,
John G Johnson, Juan Quintela, Eric Blake, Markus Armbruster,
Fam Zheng, lizefan.x, zhouyibo
On 2022/6/14 at 10:14 PM, Dr. David Alan Gilbert wrote:
> I don't think we can tell which one of them triggered the error; so the
> only thing I can suggest is that we document the need for optmem_max
> setting; I wonder how we get a better answer than 'a few 100KB'?
> I guess it's something like the number of packets inflight *
> sizeof(cmsghdr) ?
>
> Dave
Three cases with errno ENOBUFS are described in the official
doc(https://www.kernel.org/doc/html/v5.12/networking/msg_zerocopy.html):
1.The socket option was not set
2.The socket exceeds its optmem limit
3.The user exceeds its ulimit on locked pages
For case 1, if the code logic is correct, this possibility can be ignored.
For case 2, I asked a kernel developer about the reason for "a few
100KB". He said that the recommended value is meant to improve the
performance of zero_copy send. If the NIC sends data slower than the
data generation speed, even if optmem is set to 100KB, there is a
probability that sendmsg returns with errno ENOBUFS.
For case 3, if I do not set max locked memory for QEMU, the max
locked memory will be unlimited. I set the max locked memory for QEMU
and found that once the memory usage exceeds the max locked memory, OOM
will occur. Does this mean that sendmsg cannot return with errno
ENOBUFS at all when the user exceeds its ulimit on locked pages?
If the above is true, can we treat the errno as case 2?
I modified the code logic to call sendmsg again when the errno is
ENOBUFS, and set optmem back to the initial 20KB (echo 20480 >
/proc/sys/net/core/optmem_max); now the multifd zero_copy migration goes
well.
Here are the changes I made to the code:
Signed-off-by: chuang xu <xuchuangxclwt@bytedance.com>
---
io/channel-socket.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/io/channel-socket.c b/io/channel-socket.c
index dc9c165de1..9267f55a1d 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -595,9 +595,7 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
 #ifdef QEMU_MSG_ZEROCOPY
         case ENOBUFS:
             if (sflags & MSG_ZEROCOPY) {
-                error_setg_errno(errp, errno,
-                                 "Process can't lock enough memory for using MSG_ZEROCOPY");
-                return -1;
+                goto retry;
             }
             break;
 #endif
--
Dave, what's your take?
Best Regards,
chuang xu
* [PATCH v13 4/8] migration: Add zero-copy-send parameter for QMP/HMP for Linux
2022-05-13 6:28 [PATCH v13 0/8] MSG_ZEROCOPY + multifd Leonardo Bras
` (2 preceding siblings ...)
2022-05-13 6:28 ` [PATCH v13 3/8] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX Leonardo Bras
@ 2022-05-13 6:28 ` Leonardo Bras
2022-05-13 6:28 ` [PATCH v13 5/8] migration: Add migrate_use_tls() helper Leonardo Bras
` (3 subsequent siblings)
7 siblings, 0 replies; 29+ messages in thread
From: Leonardo Bras @ 2022-05-13 6:28 UTC (permalink / raw)
To: Marc-André Lureau, Paolo Bonzini, Elena Ufimtseva,
Jagannathan Raman, John G Johnson, Daniel P. Berrangé,
Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
Markus Armbruster, Fam Zheng, Peter Xu
Cc: Leonardo Bras, qemu-devel, qemu-block
Add a property that allows zero-copy migration of memory pages
on the sending side, and also include a helper function
migrate_use_zero_copy_send() to check if it's enabled.
No code is introduced to actually do the migration, but it allows
future implementations to enable/disable this feature.
On non-Linux builds this parameter is compiled out.
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
---
qapi/migration.json | 24 ++++++++++++++++++++++++
migration/migration.h | 5 +++++
migration/migration.c | 32 ++++++++++++++++++++++++++++++++
migration/socket.c | 11 +++++++++--
monitor/hmp-cmds.c | 6 ++++++
5 files changed, 76 insertions(+), 2 deletions(-)
diff --git a/qapi/migration.json b/qapi/migration.json
index 409eb086a2..2222f44250 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -741,6 +741,13 @@
# will consume more CPU.
# Defaults to 1. (Since 5.0)
#
+# @zero-copy-send: Controls behavior on sending memory pages on migration.
+# When true, enables a zero-copy mechanism for sending
+# memory pages, if host supports it.
+# Requires that QEMU be permitted to use locked memory
+# for guest RAM pages.
+# Defaults to false. (Since 7.1)
+#
# @block-bitmap-mapping: Maps block nodes and bitmaps on them to
# aliases for the purpose of dirty bitmap migration. Such
# aliases may for example be the corresponding names on the
@@ -780,6 +787,7 @@
'xbzrle-cache-size', 'max-postcopy-bandwidth',
'max-cpu-throttle', 'multifd-compression',
'multifd-zlib-level' ,'multifd-zstd-level',
+ { 'name': 'zero-copy-send', 'if' : 'CONFIG_LINUX'},
'block-bitmap-mapping' ] }
##
@@ -906,6 +914,13 @@
# will consume more CPU.
# Defaults to 1. (Since 5.0)
#
+# @zero-copy-send: Controls behavior on sending memory pages on migration.
+# When true, enables a zero-copy mechanism for sending
+# memory pages, if host supports it.
+# Requires that QEMU be permitted to use locked memory
+# for guest RAM pages.
+# Defaults to false. (Since 7.1)
+#
# @block-bitmap-mapping: Maps block nodes and bitmaps on them to
# aliases for the purpose of dirty bitmap migration. Such
# aliases may for example be the corresponding names on the
@@ -960,6 +975,7 @@
'*multifd-compression': 'MultiFDCompression',
'*multifd-zlib-level': 'uint8',
'*multifd-zstd-level': 'uint8',
+ '*zero-copy-send': { 'type': 'bool', 'if': 'CONFIG_LINUX' },
'*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
##
@@ -1106,6 +1122,13 @@
# will consume more CPU.
# Defaults to 1. (Since 5.0)
#
+# @zero-copy-send: Controls behavior on sending memory pages on migration.
+# When true, enables a zero-copy mechanism for sending
+# memory pages, if host supports it.
+# Requires that QEMU be permitted to use locked memory
+# for guest RAM pages.
+# Defaults to false. (Since 7.1)
+#
# @block-bitmap-mapping: Maps block nodes and bitmaps on them to
# aliases for the purpose of dirty bitmap migration. Such
# aliases may for example be the corresponding names on the
@@ -1158,6 +1181,7 @@
'*multifd-compression': 'MultiFDCompression',
'*multifd-zlib-level': 'uint8',
'*multifd-zstd-level': 'uint8',
+ '*zero-copy-send': { 'type': 'bool', 'if': 'CONFIG_LINUX' },
'*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
##
diff --git a/migration/migration.h b/migration/migration.h
index a863032b71..e8f2941a55 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -375,6 +375,11 @@ MultiFDCompression migrate_multifd_compression(void);
int migrate_multifd_zlib_level(void);
int migrate_multifd_zstd_level(void);
+#ifdef CONFIG_LINUX
+bool migrate_use_zero_copy_send(void);
+#else
+#define migrate_use_zero_copy_send() (false)
+#endif
int migrate_use_xbzrle(void);
uint64_t migrate_xbzrle_cache_size(void);
bool migrate_colo_enabled(void);
diff --git a/migration/migration.c b/migration/migration.c
index 5a31b23bd6..3e91f4b5e2 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -910,6 +910,10 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
params->multifd_zlib_level = s->parameters.multifd_zlib_level;
params->has_multifd_zstd_level = true;
params->multifd_zstd_level = s->parameters.multifd_zstd_level;
+#ifdef CONFIG_LINUX
+ params->has_zero_copy_send = true;
+ params->zero_copy_send = s->parameters.zero_copy_send;
+#endif
params->has_xbzrle_cache_size = true;
params->xbzrle_cache_size = s->parameters.xbzrle_cache_size;
params->has_max_postcopy_bandwidth = true;
@@ -1567,6 +1571,11 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
if (params->has_multifd_compression) {
dest->multifd_compression = params->multifd_compression;
}
+#ifdef CONFIG_LINUX
+ if (params->has_zero_copy_send) {
+ dest->zero_copy_send = params->zero_copy_send;
+ }
+#endif
if (params->has_xbzrle_cache_size) {
dest->xbzrle_cache_size = params->xbzrle_cache_size;
}
@@ -1679,6 +1688,11 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
if (params->has_multifd_compression) {
s->parameters.multifd_compression = params->multifd_compression;
}
+#ifdef CONFIG_LINUX
+ if (params->has_zero_copy_send) {
+ s->parameters.zero_copy_send = params->zero_copy_send;
+ }
+#endif
if (params->has_xbzrle_cache_size) {
s->parameters.xbzrle_cache_size = params->xbzrle_cache_size;
xbzrle_cache_resize(params->xbzrle_cache_size, errp);
@@ -2563,6 +2577,17 @@ int migrate_multifd_zstd_level(void)
return s->parameters.multifd_zstd_level;
}
+#ifdef CONFIG_LINUX
+bool migrate_use_zero_copy_send(void)
+{
+ MigrationState *s;
+
+ s = migrate_get_current();
+
+ return s->parameters.zero_copy_send;
+}
+#endif
+
int migrate_use_xbzrle(void)
{
MigrationState *s;
@@ -4206,6 +4231,10 @@ static Property migration_properties[] = {
DEFINE_PROP_UINT8("multifd-zstd-level", MigrationState,
parameters.multifd_zstd_level,
DEFAULT_MIGRATE_MULTIFD_ZSTD_LEVEL),
+#ifdef CONFIG_LINUX
+ DEFINE_PROP_BOOL("zero_copy_send", MigrationState,
+ parameters.zero_copy_send, false),
+#endif
DEFINE_PROP_SIZE("xbzrle-cache-size", MigrationState,
parameters.xbzrle_cache_size,
DEFAULT_MIGRATE_XBZRLE_CACHE_SIZE),
@@ -4303,6 +4332,9 @@ static void migration_instance_init(Object *obj)
params->has_multifd_compression = true;
params->has_multifd_zlib_level = true;
params->has_multifd_zstd_level = true;
+#ifdef CONFIG_LINUX
+ params->has_zero_copy_send = true;
+#endif
params->has_xbzrle_cache_size = true;
params->has_max_postcopy_bandwidth = true;
params->has_max_cpu_throttle = true;
diff --git a/migration/socket.c b/migration/socket.c
index 05705a32d8..3754d8f72c 100644
--- a/migration/socket.c
+++ b/migration/socket.c
@@ -74,9 +74,16 @@ static void socket_outgoing_migration(QIOTask *task,
if (qio_task_propagate_error(task, &err)) {
trace_migration_socket_outgoing_error(error_get_pretty(err));
- } else {
- trace_migration_socket_outgoing_connected(data->hostname);
+ goto out;
}
+
+ trace_migration_socket_outgoing_connected(data->hostname);
+
+ if (migrate_use_zero_copy_send()) {
+ error_setg(&err, "Zero copy send not available in migration");
+ }
+
+out:
migration_channel_connect(data->s, sioc, data->hostname, err);
object_unref(OBJECT(sioc));
}
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 93061a11af..622c783c32 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1309,6 +1309,12 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
p->has_multifd_zstd_level = true;
visit_type_uint8(v, param, &p->multifd_zstd_level, &err);
break;
+#ifdef CONFIG_LINUX
+ case MIGRATION_PARAMETER_ZERO_COPY_SEND:
+ p->has_zero_copy_send = true;
+ visit_type_bool(v, param, &p->zero_copy_send, &err);
+ break;
+#endif
case MIGRATION_PARAMETER_XBZRLE_CACHE_SIZE:
p->has_xbzrle_cache_size = true;
if (!visit_type_size(v, param, &cache_size, &err)) {
--
2.36.1
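With this patch applied, the parameter would be toggled like any other migration parameter. A hypothetical QMP exchange, using the names from the schema above, might look like:

```json
{ "execute": "migrate-set-parameters",
  "arguments": { "zero-copy-send": true } }
```

On the HMP side the equivalent would be something like `migrate_set_parameter zero-copy-send on` (exact value syntax depends on the bool visitor); note the schema's caveat that QEMU must be permitted to use locked memory for guest RAM pages.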
* [PATCH v13 5/8] migration: Add migrate_use_tls() helper
2022-05-13 6:28 [PATCH v13 0/8] MSG_ZEROCOPY + multifd Leonardo Bras
` (3 preceding siblings ...)
2022-05-13 6:28 ` [PATCH v13 4/8] migration: Add zero-copy-send parameter for QMP/HMP for Linux Leonardo Bras
@ 2022-05-13 6:28 ` Leonardo Bras
2022-05-13 6:28 ` [PATCH v13 6/8] multifd: multifd_send_sync_main now returns negative on error Leonardo Bras
` (2 subsequent siblings)
7 siblings, 0 replies; 29+ messages in thread
From: Leonardo Bras @ 2022-05-13 6:28 UTC (permalink / raw)
To: Marc-André Lureau, Paolo Bonzini, Elena Ufimtseva,
Jagannathan Raman, John G Johnson, Daniel P. Berrangé,
Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
Markus Armbruster, Fam Zheng, Peter Xu
Cc: Leonardo Bras, qemu-devel, qemu-block
A lot of places check parameters.tls_creds in order to evaluate if TLS is
in use, and sometimes call migrate_get_current() just for that test.
Add a new helper function migrate_use_tls() in order to simplify testing
for TLS usage.
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
---
migration/migration.h | 1 +
migration/channel.c | 3 +--
migration/migration.c | 9 +++++++++
migration/multifd.c | 5 +----
4 files changed, 12 insertions(+), 6 deletions(-)
diff --git a/migration/migration.h b/migration/migration.h
index e8f2941a55..485d58b95f 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -380,6 +380,7 @@ bool migrate_use_zero_copy_send(void);
#else
#define migrate_use_zero_copy_send() (false)
#endif
+int migrate_use_tls(void);
int migrate_use_xbzrle(void);
uint64_t migrate_xbzrle_cache_size(void);
bool migrate_colo_enabled(void);
diff --git a/migration/channel.c b/migration/channel.c
index c6a8dcf1d7..a162d00fea 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -38,8 +38,7 @@ void migration_channel_process_incoming(QIOChannel *ioc)
trace_migration_set_incoming_channel(
ioc, object_get_typename(OBJECT(ioc)));
- if (s->parameters.tls_creds &&
- *s->parameters.tls_creds &&
+ if (migrate_use_tls() &&
!object_dynamic_cast(OBJECT(ioc),
TYPE_QIO_CHANNEL_TLS)) {
migration_tls_channel_process_incoming(s, ioc, &local_err);
diff --git a/migration/migration.c b/migration/migration.c
index 3e91f4b5e2..4b6df2eb5e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2588,6 +2588,15 @@ bool migrate_use_zero_copy_send(void)
}
#endif
+int migrate_use_tls(void)
+{
+ MigrationState *s;
+
+ s = migrate_get_current();
+
+ return s->parameters.tls_creds && *s->parameters.tls_creds;
+}
+
int migrate_use_xbzrle(void)
{
MigrationState *s;
diff --git a/migration/multifd.c b/migration/multifd.c
index 9ea4f581e2..2a8c8570c3 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -782,15 +782,12 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
QIOChannel *ioc,
Error *error)
{
- MigrationState *s = migrate_get_current();
-
trace_multifd_set_outgoing_channel(
ioc, object_get_typename(OBJECT(ioc)),
migrate_get_current()->hostname, error);
if (!error) {
- if (s->parameters.tls_creds &&
- *s->parameters.tls_creds &&
+ if (migrate_use_tls() &&
!object_dynamic_cast(OBJECT(ioc),
TYPE_QIO_CHANNEL_TLS)) {
multifd_tls_channel_connect(p, ioc, &error);
--
2.36.1
* [PATCH v13 6/8] multifd: multifd_send_sync_main now returns negative on error
2022-05-13 6:28 [PATCH v13 0/8] MSG_ZEROCOPY + multifd Leonardo Bras
` (4 preceding siblings ...)
2022-05-13 6:28 ` [PATCH v13 5/8] migration: Add migrate_use_tls() helper Leonardo Bras
@ 2022-05-13 6:28 ` Leonardo Bras
2022-05-13 6:28 ` [PATCH v13 7/8] multifd: Send header packet without flags if zero-copy-send is enabled Leonardo Bras
2022-05-13 6:28 ` [PATCH v13 8/8] multifd: Implement zero copy write in multifd migration (multifd-zero-copy) Leonardo Bras
7 siblings, 0 replies; 29+ messages in thread
From: Leonardo Bras @ 2022-05-13 6:28 UTC (permalink / raw)
To: Marc-André Lureau, Paolo Bonzini, Elena Ufimtseva,
Jagannathan Raman, John G Johnson, Daniel P. Berrangé,
Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
Markus Armbruster, Fam Zheng, Peter Xu
Cc: Leonardo Bras, qemu-devel, qemu-block
Even though multifd_send_sync_main() currently emits error_reports, its
callers don't really check it before continuing.
Change multifd_send_sync_main() to return -1 on error and 0 on success.
Also change all its callers to make use of this change and possibly fail
earlier.
(This change is important for the next patch on multifd zero copy
implementation, to make sure an error in zero-copy flush does not go
unnoticed.)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
migration/multifd.h | 2 +-
migration/multifd.c | 10 ++++++----
migration/ram.c | 29 ++++++++++++++++++++++-------
3 files changed, 29 insertions(+), 12 deletions(-)
diff --git a/migration/multifd.h b/migration/multifd.h
index 7d0effcb03..bcf5992945 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -20,7 +20,7 @@ int multifd_load_cleanup(Error **errp);
bool multifd_recv_all_channels_created(void);
bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
void multifd_recv_sync_main(void);
-void multifd_send_sync_main(QEMUFile *f);
+int multifd_send_sync_main(QEMUFile *f);
int multifd_queue_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset);
/* Multifd Compression flags */
diff --git a/migration/multifd.c b/migration/multifd.c
index 2a8c8570c3..15fb668e64 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -566,17 +566,17 @@ void multifd_save_cleanup(void)
multifd_send_state = NULL;
}
-void multifd_send_sync_main(QEMUFile *f)
+int multifd_send_sync_main(QEMUFile *f)
{
int i;
if (!migrate_use_multifd()) {
- return;
+ return 0;
}
if (multifd_send_state->pages->num) {
if (multifd_send_pages(f) < 0) {
error_report("%s: multifd_send_pages fail", __func__);
- return;
+ return -1;
}
}
for (i = 0; i < migrate_multifd_channels(); i++) {
@@ -589,7 +589,7 @@ void multifd_send_sync_main(QEMUFile *f)
if (p->quit) {
error_report("%s: channel %d has already quit", __func__, i);
qemu_mutex_unlock(&p->mutex);
- return;
+ return -1;
}
p->packet_num = multifd_send_state->packet_num++;
@@ -608,6 +608,8 @@ void multifd_send_sync_main(QEMUFile *f)
qemu_sem_wait(&p->sem_sync);
}
trace_multifd_send_sync_main(multifd_send_state->packet_num);
+
+ return 0;
}
static void *multifd_send_thread(void *opaque)
diff --git a/migration/ram.c b/migration/ram.c
index a2489a2699..5f5e37f64d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2909,6 +2909,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
{
RAMState **rsp = opaque;
RAMBlock *block;
+ int ret;
if (compress_threads_save_setup()) {
return -1;
@@ -2943,7 +2944,11 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
ram_control_before_iterate(f, RAM_CONTROL_SETUP);
ram_control_after_iterate(f, RAM_CONTROL_SETUP);
- multifd_send_sync_main(f);
+ ret = multifd_send_sync_main(f);
+ if (ret < 0) {
+ return ret;
+ }
+
qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
qemu_fflush(f);
@@ -3052,7 +3057,11 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
out:
if (ret >= 0
&& migration_is_setup_or_active(migrate_get_current()->state)) {
- multifd_send_sync_main(rs->f);
+ ret = multifd_send_sync_main(rs->f);
+ if (ret < 0) {
+ return ret;
+ }
+
qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
qemu_fflush(f);
ram_transferred_add(8);
@@ -3112,13 +3121,19 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
ram_control_after_iterate(f, RAM_CONTROL_FINISH);
}
- if (ret >= 0) {
- multifd_send_sync_main(rs->f);
- qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
- qemu_fflush(f);
+ if (ret < 0) {
+ return ret;
}
- return ret;
+ ret = multifd_send_sync_main(rs->f);
+ if (ret < 0) {
+ return ret;
+ }
+
+ qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
+ qemu_fflush(f);
+
+ return 0;
}
static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
--
2.36.1
* [PATCH v13 7/8] multifd: Send header packet without flags if zero-copy-send is enabled
2022-05-13 6:28 [PATCH v13 0/8] MSG_ZEROCOPY + multifd Leonardo Bras
` (5 preceding siblings ...)
2022-05-13 6:28 ` [PATCH v13 6/8] multifd: multifd_send_sync_main now returns negative on error Leonardo Bras
@ 2022-05-13 6:28 ` Leonardo Bras
2022-05-13 6:28 ` [PATCH v13 8/8] multifd: Implement zero copy write in multifd migration (multifd-zero-copy) Leonardo Bras
7 siblings, 0 replies; 29+ messages in thread
From: Leonardo Bras @ 2022-05-13 6:28 UTC (permalink / raw)
To: Marc-André Lureau, Paolo Bonzini, Elena Ufimtseva,
Jagannathan Raman, John G Johnson, Daniel P. Berrangé,
Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
Markus Armbruster, Fam Zheng, Peter Xu
Cc: Leonardo Bras, qemu-devel, qemu-block
Since d48c3a0445 ("multifd: Use a single writev on the send side"),
sending the header packet and the memory pages happens in the same
writev, which can potentially make the migration faster.
Taking channel-socket as an example, this works well with the default
copying mechanism of sendmsg(), but with zero-copy-send=true it will
often cause the migration to break.
This happens because the header packet buffer gets reused quite often,
and there is a high chance that by the time the MSG_ZEROCOPY mechanism gets
to send the buffer, it has already changed, sending the wrong data and
causing the migration to abort.
This means that, as it stands, the header packet buffer is not suitable
for sending with MSG_ZEROCOPY.
In order to enable zero copy for multifd, send the header packet in an
individual write(), without any flags, and the remaining pages with a
writev(), as before. This changes only how a migration with
zero-copy-send=true works; migrations with zero-copy-send=false keep
their current behavior.
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
---
migration/multifd.c | 22 +++++++++++++++++++---
1 file changed, 19 insertions(+), 3 deletions(-)
diff --git a/migration/multifd.c b/migration/multifd.c
index 15fb668e64..2541cd2322 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -617,6 +617,7 @@ static void *multifd_send_thread(void *opaque)
MultiFDSendParams *p = opaque;
Error *local_err = NULL;
int ret = 0;
+ bool use_zero_copy_send = migrate_use_zero_copy_send();
trace_multifd_send_thread_start(p->id);
rcu_register_thread();
@@ -639,9 +640,14 @@ static void *multifd_send_thread(void *opaque)
if (p->pending_job) {
uint64_t packet_num = p->packet_num;
uint32_t flags = p->flags;
- p->iovs_num = 1;
p->normal_num = 0;
+ if (use_zero_copy_send) {
+ p->iovs_num = 0;
+ } else {
+ p->iovs_num = 1;
+ }
+
for (int i = 0; i < p->pages->num; i++) {
p->normal[p->normal_num] = p->pages->offset[i];
p->normal_num++;
@@ -665,8 +671,18 @@ static void *multifd_send_thread(void *opaque)
trace_multifd_send(p->id, packet_num, p->normal_num, flags,
p->next_packet_size);
- p->iov[0].iov_len = p->packet_len;
- p->iov[0].iov_base = p->packet;
+ if (use_zero_copy_send) {
+ /* Send header first, without zerocopy */
+ ret = qio_channel_write_all(p->c, (void *)p->packet,
+ p->packet_len, &local_err);
+ if (ret != 0) {
+ break;
+ }
+ } else {
+ /* Send header using the same writev call */
+ p->iov[0].iov_len = p->packet_len;
+ p->iov[0].iov_base = p->packet;
+ }
ret = qio_channel_writev_all(p->c, p->iov, p->iovs_num,
&local_err);
--
2.36.1
* [PATCH v13 8/8] multifd: Implement zero copy write in multifd migration (multifd-zero-copy)
2022-05-13 6:28 [PATCH v13 0/8] MSG_ZEROCOPY + multifd Leonardo Bras
` (6 preceding siblings ...)
2022-05-13 6:28 ` [PATCH v13 7/8] multifd: Send header packet without flags if zero-copy-send is enabled Leonardo Bras
@ 2022-05-13 6:28 ` Leonardo Bras
7 siblings, 0 replies; 29+ messages in thread
From: Leonardo Bras @ 2022-05-13 6:28 UTC (permalink / raw)
To: Marc-André Lureau, Paolo Bonzini, Elena Ufimtseva,
Jagannathan Raman, John G Johnson, Daniel P. Berrangé,
Juan Quintela, Dr. David Alan Gilbert, Eric Blake,
Markus Armbruster, Fam Zheng, Peter Xu
Cc: Leonardo Bras, qemu-devel, qemu-block
Implement zero-copy send in nocomp_send_write() by making use of the
QIOChannel writev + flags & flush interface.
Change multifd_send_sync_main() so flush_zero_copy() can be called
after each iteration in order to make sure all dirty pages are sent before
a new iteration is started. It will also flush at the beginning and at the
end of migration.
Also make it return -1 if flush_zero_copy() fails, in order to cancel
the migration process and avoid resuming the guest on the target host
before all current RAM has been received.
This works fine for RAM migration because RAM pages are not usually freed,
and there is no problem with changing a page's content between
writev_zero_copy() and the actual sending of the buffer: the change dirties
the page, causing it to be re-sent in a later iteration anyway.
Multifd migration with zero-copy enabled may require a large amount of
locked memory, so low-privileged users may need to keep the feature
disabled.
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
---
migration/multifd.h | 2 ++
migration/migration.c | 11 ++++++++++-
migration/multifd.c | 37 +++++++++++++++++++++++++++++++++++--
migration/socket.c | 5 +++--
4 files changed, 50 insertions(+), 5 deletions(-)
diff --git a/migration/multifd.h b/migration/multifd.h
index bcf5992945..4d8d89e5e5 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -92,6 +92,8 @@ typedef struct {
uint32_t packet_len;
/* pointer to the packet */
MultiFDPacket_t *packet;
+ /* multifd flags for sending ram */
+ int write_flags;
/* multifd flags for each packet */
uint32_t flags;
/* size of the next packet that contains pages */
diff --git a/migration/migration.c b/migration/migration.c
index 4b6df2eb5e..31739b2af9 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1497,7 +1497,16 @@ static bool migrate_params_check(MigrationParameters *params, Error **errp)
error_prepend(errp, "Invalid mapping given for block-bitmap-mapping: ");
return false;
}
-
+#ifdef CONFIG_LINUX
+ if (params->zero_copy_send &&
+ (!migrate_use_multifd() ||
+ params->multifd_compression != MULTIFD_COMPRESSION_NONE ||
+ (params->tls_creds && *params->tls_creds))) {
+ error_setg(errp,
+ "Zero copy only available for non-compressed non-TLS multifd migration");
+ return false;
+ }
+#endif
return true;
}
diff --git a/migration/multifd.c b/migration/multifd.c
index 2541cd2322..9282ab6aa4 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -569,6 +569,7 @@ void multifd_save_cleanup(void)
int multifd_send_sync_main(QEMUFile *f)
{
int i;
+ bool flush_zero_copy;
if (!migrate_use_multifd()) {
return 0;
@@ -579,6 +580,20 @@ int multifd_send_sync_main(QEMUFile *f)
return -1;
}
}
+
+ /*
+ * When using zero-copy, it's necessary to flush the pages before any of
+ * the pages can be sent again, so we'll make sure the new version of the
+ * pages will always arrive _later_ than the old pages.
+ *
+ * Currently we achieve this by flushing the zero-page requested writes
+ * per ram iteration, but in the future we could potentially optimize it
+ * to be less frequent, e.g. only after we finished one whole scanning of
+ * all the dirty bitmaps.
+ */
+
+ flush_zero_copy = migrate_use_zero_copy_send();
+
for (i = 0; i < migrate_multifd_channels(); i++) {
MultiFDSendParams *p = &multifd_send_state->params[i];
@@ -600,6 +615,17 @@ int multifd_send_sync_main(QEMUFile *f)
ram_counters.transferred += p->packet_len;
qemu_mutex_unlock(&p->mutex);
qemu_sem_post(&p->sem);
+
+ if (flush_zero_copy && p->c) {
+ int ret;
+ Error *err = NULL;
+
+ ret = qio_channel_flush(p->c, &err);
+ if (ret < 0) {
+ error_report_err(err);
+ return -1;
+ }
+ }
}
for (i = 0; i < migrate_multifd_channels(); i++) {
MultiFDSendParams *p = &multifd_send_state->params[i];
@@ -684,8 +710,8 @@ static void *multifd_send_thread(void *opaque)
p->iov[0].iov_base = p->packet;
}
- ret = qio_channel_writev_all(p->c, p->iov, p->iovs_num,
- &local_err);
+ ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL,
+ 0, p->write_flags, &local_err);
if (ret != 0) {
break;
}
@@ -913,6 +939,13 @@ int multifd_save_setup(Error **errp)
/* We need one extra place for the packet header */
p->iov = g_new0(struct iovec, page_count + 1);
p->normal = g_new0(ram_addr_t, page_count);
+
+ if (migrate_use_zero_copy_send()) {
+ p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZERO_COPY;
+ } else {
+ p->write_flags = 0;
+ }
+
socket_send_channel_create(multifd_new_send_channel_async, p);
}
diff --git a/migration/socket.c b/migration/socket.c
index 3754d8f72c..4fd5e85f50 100644
--- a/migration/socket.c
+++ b/migration/socket.c
@@ -79,8 +79,9 @@ static void socket_outgoing_migration(QIOTask *task,
trace_migration_socket_outgoing_connected(data->hostname);
- if (migrate_use_zero_copy_send()) {
- error_setg(&err, "Zero copy send not available in migration");
+ if (migrate_use_zero_copy_send() &&
+ !qio_channel_has_feature(sioc, QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY)) {
+ error_setg(&err, "Zero copy send feature not detected in host kernel");
}
out:
--
2.36.1