* Re: [libvirt RFC] virFile: new VIR_FILE_WRAPPER_BIG_PIPE to improve performance
       [not found]       ` <2d1248d4-ebdf-43f9-e4a7-95f586aade8e@suse.de>
@ 2022-03-17 10:12         ` Claudio Fontana
  2022-03-17 10:25           ` Daniel P. Berrangé
  0 siblings, 1 reply; 30+ messages in thread
From: Claudio Fontana @ 2022-03-17 10:12 UTC (permalink / raw)
  To: Daniel P. Berrangé, Dr. David Alan Gilbert; +Cc: libvir-list, qemu-devel

On 3/16/22 1:17 PM, Claudio Fontana wrote:
> On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
>> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana wrote:
>>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
>>>> On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana wrote:
>>>>> the first user is the qemu driver,
>>>>>
>>>>> virsh save/resume would slow to a crawl with a default pipe size (64k).
>>>>>
>>>>> This improves the situation by 400%.
>>>>>
>>>>> Going through io_helper still seems to incur in some penalty (~15%-ish)
>>>>> compared with direct qemu migration to a nc socket to a file.
>>>>>
>>>>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
>>>>> ---
>>>>>  src/qemu/qemu_driver.c    |  6 +++---
>>>>>  src/qemu/qemu_saveimage.c | 11 ++++++-----
>>>>>  src/util/virfile.c        | 12 ++++++++++++
>>>>>  src/util/virfile.h        |  1 +
>>>>>  4 files changed, 22 insertions(+), 8 deletions(-)
>>>>>
>>>>> Hello, I initially thought this to be a qemu performance issue,
>>>>> so you can find the discussion about this in qemu-devel:
>>>>>
>>>>> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
>>>>>
>>>>> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html
>>>>>
>>>>> RFC since need to validate idea, and it is only lightly tested:
>>>>>
>>>>> save     - about 400% benefit in throughput, getting around 20 Gbps to /dev/null,
>>>>>            and around 13 Gbps to a ramdisk.
>>>>> 	   By comparison, direct qemu migration to a nc socket is around 24Gbps.
>>>>>
>>>>> restore  - not tested, _should_ also benefit in the "bypass_cache" case
>>>>> coredump - not tested, _should_ also benefit like for save
>>>>>
>>>>> Thanks for your comments and review,
>>>>>
>>>>> Claudio
>>>>>
>>>>>
>>>>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>>>>> index c1b3bd8536..be248c1e92 100644
>>>>> --- a/src/qemu/qemu_driver.c
>>>>> +++ b/src/qemu/qemu_driver.c
>>>>> @@ -3044,7 +3044,7 @@ doCoreDump(virQEMUDriver *driver,
>>>>>      virFileWrapperFd *wrapperFd = NULL;
>>>>>      int directFlag = 0;
>>>>>      bool needUnlink = false;
>>>>> -    unsigned int flags = VIR_FILE_WRAPPER_NON_BLOCKING;
>>>>> +    unsigned int wrapperFlags = VIR_FILE_WRAPPER_NON_BLOCKING | VIR_FILE_WRAPPER_BIG_PIPE;
>>>>>      const char *memory_dump_format = NULL;
>>>>>      g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver);
>>>>>      g_autoptr(virCommand) compressor = NULL;
>>>>> @@ -3059,7 +3059,7 @@ doCoreDump(virQEMUDriver *driver,
>>>>>  
>>>>>      /* Create an empty file with appropriate ownership.  */
>>>>>      if (dump_flags & VIR_DUMP_BYPASS_CACHE) {
>>>>> -        flags |= VIR_FILE_WRAPPER_BYPASS_CACHE;
>>>>> +        wrapperFlags |= VIR_FILE_WRAPPER_BYPASS_CACHE;
>>>>>          directFlag = virFileDirectFdFlag();
>>>>>          if (directFlag < 0) {
>>>>>              virReportError(VIR_ERR_OPERATION_FAILED, "%s",
>>>>> @@ -3072,7 +3072,7 @@ doCoreDump(virQEMUDriver *driver,
>>>>>                               &needUnlink)) < 0)
>>>>>          goto cleanup;
>>>>>  
>>>>> -    if (!(wrapperFd = virFileWrapperFdNew(&fd, path, flags)))
>>>>> +    if (!(wrapperFd = virFileWrapperFdNew(&fd, path, wrapperFlags)))
>>>>>          goto cleanup;
>>>>>  
>>>>>      if (dump_flags & VIR_DUMP_MEMORY_ONLY) {
>>>>> diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c
>>>>> index c0139041eb..1b522a1542 100644
>>>>> --- a/src/qemu/qemu_saveimage.c
>>>>> +++ b/src/qemu/qemu_saveimage.c
>>>>> @@ -267,7 +267,7 @@ qemuSaveImageCreate(virQEMUDriver *driver,
>>>>>      int fd = -1;
>>>>>      int directFlag = 0;
>>>>>      virFileWrapperFd *wrapperFd = NULL;
>>>>> -    unsigned int wrapperFlags = VIR_FILE_WRAPPER_NON_BLOCKING;
>>>>> +    unsigned int wrapperFlags = VIR_FILE_WRAPPER_NON_BLOCKING | VIR_FILE_WRAPPER_BIG_PIPE;
>>>>>  
>>>>>      /* Obtain the file handle.  */
>>>>>      if ((flags & VIR_DOMAIN_SAVE_BYPASS_CACHE)) {
>>>>> @@ -463,10 +463,11 @@ qemuSaveImageOpen(virQEMUDriver *driver,
>>>>>      if ((fd = qemuDomainOpenFile(cfg, NULL, path, oflags, NULL)) < 0)
>>>>>          return -1;
>>>>>  
>>>>> -    if (bypass_cache &&
>>>>> -        !(*wrapperFd = virFileWrapperFdNew(&fd, path,
>>>>> -                                           VIR_FILE_WRAPPER_BYPASS_CACHE)))
>>>>> -        return -1;
>>>>> +    if (bypass_cache) {
>>>>> +        unsigned int wrapperFlags = VIR_FILE_WRAPPER_BYPASS_CACHE | VIR_FILE_WRAPPER_BIG_PIPE;
>>>>> +        if (!(*wrapperFd = virFileWrapperFdNew(&fd, path, wrapperFlags)))
>>>>> +            return -1;
>>>>> +    }
>>>>>  
>>>>>      data = g_new0(virQEMUSaveData, 1);
>>>>>  
>>>>> diff --git a/src/util/virfile.c b/src/util/virfile.c
>>>>> index a04f888e06..fdacd17890 100644
>>>>> --- a/src/util/virfile.c
>>>>> +++ b/src/util/virfile.c
>>>>> @@ -282,6 +282,18 @@ virFileWrapperFdNew(int *fd, const char *name, unsigned int flags)
>>>>>  
>>>>>      ret->cmd = virCommandNewArgList(iohelper_path, name, NULL);
>>>>>  
>>>>> +    if (flags & VIR_FILE_WRAPPER_BIG_PIPE) {
>>>>> +        /*
>>>>> +         * virsh save/resume would slow to a crawl with a default pipe size (usually 64k).
>>>>> +         * This improves the situation by 400%, although going through io_helper still incurs
>>>>> +         * in a performance penalty compared with a direct qemu migration to a socket.
>>>>> +         */
>>>>> +        int pipe_sz, rv = virFileReadValueInt(&pipe_sz, "/proc/sys/fs/pipe-max-size");
>>>>
>>>> This is fine as an experiment but I don't think it is that safe
>>>> to use in the real world. There could be a variety of reasons why
>>>> an admin can enlarge this value, and we shouldn't assume the max
>>>> size is sensible for libvirt/QEMU to use.
>>>>
>>>> I very much suspect there are diminishing returns here in terms
>>>> of buffer sizes.
>>>>
>>>> 64k is obvious too small, but 1 MB, may be sufficiently large
>>>> that the bottleneck is then elsewhere in our code. IOW, If the
>>>> pipe max size is 100 MB, we shouldn't blindly use it. Can you
>>>> do a few tests with varying sizes to see where a sensible
>>>> tradeoff falls ?
>>>
>>>
>>> Hi Daniel,
>>>
>>> this is a very good point. Actually I see very diminishing returns after the default pipe-max-size (1MB).
>>>
>>> The idea was that beyond allowing larger size, the admin could have set a _smaller_ pipe-max-size,
>>> so we want to use that in that case, otherwise an attempt to use 1MB would result in EPERM, if the process does not have CAP_SYS_RESOURCE or CAP_SYS_ADMIN.
>>> I am not sure if used with Kubevirt, for example, CAP_SYS_RESOURCE or CAP_SYS_ADMIN would be available...?
>>>
>>> So maybe one idea could be to use the minimum between /proc/sys/fs/pipe-max-size and for example 1MB, but will do more testing to see where the actual break point is.
>>
>> That's reasonable.
>>
> 
> Just as an update: still running tests with various combinations, and larger VMs (to RAM, to slow disk, and now to nvme).
> 
> For now no clear winner yet. There seems to be a significant benefit already going from 1MB (my previous default) to 2MB,
> but anything more than 16MB seems to not improve anything at all.
> 
> But I just need to do more testing, more runs.
> 
> Thanks,
> 
> Claudio
> 

Current results show the experimental average maximum throughput migrating to /dev/null for each FdWrapper pipe size (as per QEMU QMP "query-migrate"; tests repeated 5 times each).
VM size is 60G, with most of the memory effectively touched before migration, through a user application allocating and touching all memory with pseudorandom data.

64K:     5200 Mbps (current situation)
128K:    5800 Mbps
256K:   20900 Mbps
512K:   21600 Mbps
1M:     22800 Mbps
2M:     22800 Mbps
4M:     22400 Mbps
8M:     22500 Mbps
16M:    22800 Mbps
32M:    22900 Mbps
64M:    22900 Mbps
128M:   22800 Mbps

The above is the throughput out of patched libvirt with multiple pipe sizes for the FdWrapper.

As for the theoretical limit for the libvirt architecture,
I ran a qemu migration directly, issuing the appropriate QMP commands, setting the same migration parameters as per libvirt, and then migrating to a socket netcatted to /dev/null via
{"execute": "migrate", "arguments": { "uri": "unix:///tmp/netcat.sock" } } : 

QMP:    37000 Mbps

---

So although the pipe size improves things (in particular the large jump is at the 256K size, though 1M seems a very good value),
there is still a second bottleneck somewhere that accounts for a loss of ~14200 Mbps in throughput.

Thanks,

Claudio







* Re: [libvirt RFC] virFile: new VIR_FILE_WRAPPER_BIG_PIPE to improve performance
  2022-03-17 10:12         ` [libvirt RFC] virFile: new VIR_FILE_WRAPPER_BIG_PIPE to improve performance Claudio Fontana
@ 2022-03-17 10:25           ` Daniel P. Berrangé
  2022-03-17 13:41             ` Claudio Fontana
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel P. Berrangé @ 2022-03-17 10:25 UTC (permalink / raw)
  To: Claudio Fontana; +Cc: libvir-list, Dr. David Alan Gilbert, qemu-devel

On Thu, Mar 17, 2022 at 11:12:11AM +0100, Claudio Fontana wrote:
> On 3/16/22 1:17 PM, Claudio Fontana wrote:
> > On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
> >> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana wrote:
> >>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
> >>>> On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana wrote:
> >>>>> the first user is the qemu driver,
> >>>>>
> >>>>> virsh save/resume would slow to a crawl with a default pipe size (64k).
> >>>>>
> >>>>> This improves the situation by 400%.
> >>>>>
> >>>>> Going through io_helper still seems to incur in some penalty (~15%-ish)
> >>>>> compared with direct qemu migration to a nc socket to a file.
> >>>>>
> >>>>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
> >>>>> ---
> >>>>>  src/qemu/qemu_driver.c    |  6 +++---
> >>>>>  src/qemu/qemu_saveimage.c | 11 ++++++-----
> >>>>>  src/util/virfile.c        | 12 ++++++++++++
> >>>>>  src/util/virfile.h        |  1 +
> >>>>>  4 files changed, 22 insertions(+), 8 deletions(-)
> >>>>>
> >>>>> Hello, I initially thought this to be a qemu performance issue,
> >>>>> so you can find the discussion about this in qemu-devel:
> >>>>>
> >>>>> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
> >>>>>
> >>>>> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html


> Current results show these experimental averages maximum throughput
> migrating to /dev/null per each FdWrapper Pipe Size (as per QEMU QMP
> "query-migrate", tests repeated 5 times for each).
> VM Size is 60G, most of the memory effectively touched before migration,
> through user application allocating and touching all memory with
> pseudorandom data.
> 
> 64K:     5200 Mbps (current situation)
> 128K:    5800 Mbps
> 256K:   20900 Mbps
> 512K:   21600 Mbps
> 1M:     22800 Mbps
> 2M:     22800 Mbps
> 4M:     22400 Mbps
> 8M:     22500 Mbps
> 16M:    22800 Mbps
> 32M:    22900 Mbps
> 64M:    22900 Mbps
> 128M:   22800 Mbps
> 
> This above is the throughput out of patched libvirt with multiple Pipe Sizes for the FDWrapper.

OK, it's bouncing around with noise after 1 MB. So I'd suggest that
libvirt attempt to raise the pipe limit to 1 MB by default, but
not try to go higher.
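
Something like this standalone sketch is roughly what I have in mind (purely
illustrative; the helper name and the 1 MiB constant are not meant to be the
final patch):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>

/* Clamp the wrapper pipe to min(/proc/sys/fs/pipe-max-size, 1 MiB):
 * big enough to get the throughput win, small enough to avoid EPERM
 * for processes without CAP_SYS_RESOURCE and to avoid wasting memory. */
static int set_wrapper_pipe_size(int pipefd)
{
    long want = 1024 * 1024;                    /* 1 MiB target */
    long max_sz = 0;
    FILE *f = fopen("/proc/sys/fs/pipe-max-size", "r");

    if (f) {
        if (fscanf(f, "%ld", &max_sz) != 1)
            max_sz = 0;
        fclose(f);
    }
    if (max_sz > 0 && max_sz < want)
        want = max_sz;                          /* the admin lowered the limit */

    /* on failure the pipe simply stays at the kernel default (usually 64k) */
    return fcntl(pipefd, F_SETPIPE_SZ, (int)want);
}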

> As for the theoretical limit for the libvirt architecture,
> I ran a qemu migration directly issuing the appropriate QMP
> commands, setting the same migration parameters as per libvirt,
> and then migrating to a socket netcatted to /dev/null via
> {"execute": "migrate", "arguments": { "uri", "unix:///tmp/netcat.sock" } } :
> 
> QMP:    37000 Mbps

> So although the Pipe size improves things (in particular the
> large jump is for the 256K size, although 1M seems a very good value),
> there is still a second bottleneck in there somewhere that
> accounts for a loss of ~14200 Mbps in throughput.

In the above tests with libvirt, were you using the
--bypass-cache flag or not ?

Hopefully use of O_DIRECT doesn't make a difference for
/dev/null, since the I/O is being immediately thrown
away and so ought to never go into I/O cache. 

In terms of the comparison, we still have libvirt iohelper
giving QEMU a pipe, while your test above gives QEMU a
UNIX socket.

So I still wonder if the delta is caused by the pipe vs socket
difference, as opposed to netcat vs libvirt iohelper code.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [libvirt RFC] virFile: new VIR_FILE_WRAPPER_BIG_PIPE to improve performance
  2022-03-17 10:25           ` Daniel P. Berrangé
@ 2022-03-17 13:41             ` Claudio Fontana
  2022-03-17 14:14               ` Claudio Fontana
  0 siblings, 1 reply; 30+ messages in thread
From: Claudio Fontana @ 2022-03-17 13:41 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: libvir-list, Dr. David Alan Gilbert, qemu-devel

On 3/17/22 11:25 AM, Daniel P. Berrangé wrote:
> On Thu, Mar 17, 2022 at 11:12:11AM +0100, Claudio Fontana wrote:
>> On 3/16/22 1:17 PM, Claudio Fontana wrote:
>>> On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
>>>> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana wrote:
>>>>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
>>>>>> On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana wrote:
>>>>>>> the first user is the qemu driver,
>>>>>>>
>>>>>>> virsh save/resume would slow to a crawl with a default pipe size (64k).
>>>>>>>
>>>>>>> This improves the situation by 400%.
>>>>>>>
>>>>>>> Going through io_helper still seems to incur in some penalty (~15%-ish)
>>>>>>> compared with direct qemu migration to a nc socket to a file.
>>>>>>>
>>>>>>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
>>>>>>> ---
>>>>>>>  src/qemu/qemu_driver.c    |  6 +++---
>>>>>>>  src/qemu/qemu_saveimage.c | 11 ++++++-----
>>>>>>>  src/util/virfile.c        | 12 ++++++++++++
>>>>>>>  src/util/virfile.h        |  1 +
>>>>>>>  4 files changed, 22 insertions(+), 8 deletions(-)
>>>>>>>
>>>>>>> Hello, I initially thought this to be a qemu performance issue,
>>>>>>> so you can find the discussion about this in qemu-devel:
>>>>>>>
>>>>>>> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
>>>>>>>
>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html
> 
> 
>> Current results show these experimental averages maximum throughput
>> migrating to /dev/null per each FdWrapper Pipe Size (as per QEMU QMP
>> "query-migrate", tests repeated 5 times for each).
>> VM Size is 60G, most of the memory effectively touched before migration,
>> through user application allocating and touching all memory with
>> pseudorandom data.
>>
>> 64K:     5200 Mbps (current situation)
>> 128K:    5800 Mbps
>> 256K:   20900 Mbps
>> 512K:   21600 Mbps
>> 1M:     22800 Mbps
>> 2M:     22800 Mbps
>> 4M:     22400 Mbps
>> 8M:     22500 Mbps
>> 16M:    22800 Mbps
>> 32M:    22900 Mbps
>> 64M:    22900 Mbps
>> 128M:   22800 Mbps
>>
>> This above is the throughput out of patched libvirt with multiple Pipe Sizes for the FDWrapper.
> 
> Ok, its bouncing around with noise after 1 MB. So I'd suggest that
> libvirt attempt to raise the pipe limit to 1 MB by default, but
> not try to go higher.
> 
>> As for the theoretical limit for the libvirt architecture,
>> I ran a qemu migration directly issuing the appropriate QMP
>> commands, setting the same migration parameters as per libvirt,
>> and then migrating to a socket netcatted to /dev/null via
>> {"execute": "migrate", "arguments": { "uri", "unix:///tmp/netcat.sock" } } :
>>
>> QMP:    37000 Mbps
> 
>> So although the Pipe size improves things (in particular the
>> large jump is for the 256K size, although 1M seems a very good value),
>> there is still a second bottleneck in there somewhere that
>> accounts for a loss of ~14200 Mbps in throughput.
> 
> In the above tests with libvirt, were you using the
> --bypass-cache flag or not ?

No, I did not. Tests with a ramdisk did not show a notable difference for me,

but tests with /dev/null were not possible, since the command line is not accepted:

# virsh save centos7 /dev/null
Domain 'centos7' saved to /dev/null
[OK]

# virsh save centos7 /dev/null --bypass-cache
error: Failed to save domain 'centos7' to /dev/null
error: Failed to create file '/dev/null': Invalid argument


> 
> Hopefully use of O_DIRECT doesn't make a difference for
> /dev/null, since the I/O is being immediately thrown
> away and so ought to never go into I/O cache. 
> 
> In terms of the comparison, we still have libvirt iohelper
> giving QEMU a pipe, while your test above gives QEMU a
> UNIX socket.
> 
> So I still wonder if the delta is caused by the pipe vs socket
> difference, as opposed to netcat vs libvirt iohelper code.

I'll look into this aspect, thanks!
> 
> With regards,
> Daniel
> 

Ciao,

Claudio




* Re: [libvirt RFC] virFile: new VIR_FILE_WRAPPER_BIG_PIPE to improve performance
  2022-03-17 13:41             ` Claudio Fontana
@ 2022-03-17 14:14               ` Claudio Fontana
  2022-03-17 15:03                 ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 30+ messages in thread
From: Claudio Fontana @ 2022-03-17 14:14 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: libvir-list, Dr. David Alan Gilbert, qemu-devel

On 3/17/22 2:41 PM, Claudio Fontana wrote:
> On 3/17/22 11:25 AM, Daniel P. Berrangé wrote:
>> On Thu, Mar 17, 2022 at 11:12:11AM +0100, Claudio Fontana wrote:
>>> On 3/16/22 1:17 PM, Claudio Fontana wrote:
>>>> On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
>>>>> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana wrote:
>>>>>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
>>>>>>> On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana wrote:
>>>>>>>> the first user is the qemu driver,
>>>>>>>>
>>>>>>>> virsh save/resume would slow to a crawl with a default pipe size (64k).
>>>>>>>>
>>>>>>>> This improves the situation by 400%.
>>>>>>>>
>>>>>>>> Going through io_helper still seems to incur in some penalty (~15%-ish)
>>>>>>>> compared with direct qemu migration to a nc socket to a file.
>>>>>>>>
>>>>>>>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
>>>>>>>> ---
>>>>>>>>  src/qemu/qemu_driver.c    |  6 +++---
>>>>>>>>  src/qemu/qemu_saveimage.c | 11 ++++++-----
>>>>>>>>  src/util/virfile.c        | 12 ++++++++++++
>>>>>>>>  src/util/virfile.h        |  1 +
>>>>>>>>  4 files changed, 22 insertions(+), 8 deletions(-)
>>>>>>>>
>>>>>>>> Hello, I initially thought this to be a qemu performance issue,
>>>>>>>> so you can find the discussion about this in qemu-devel:
>>>>>>>>
>>>>>>>> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
>>>>>>>>
>>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html
>>
>>
>>> Current results show these experimental averages maximum throughput
>>> migrating to /dev/null per each FdWrapper Pipe Size (as per QEMU QMP
>>> "query-migrate", tests repeated 5 times for each).
>>> VM Size is 60G, most of the memory effectively touched before migration,
>>> through user application allocating and touching all memory with
>>> pseudorandom data.
>>>
>>> 64K:     5200 Mbps (current situation)
>>> 128K:    5800 Mbps
>>> 256K:   20900 Mbps
>>> 512K:   21600 Mbps
>>> 1M:     22800 Mbps
>>> 2M:     22800 Mbps
>>> 4M:     22400 Mbps
>>> 8M:     22500 Mbps
>>> 16M:    22800 Mbps
>>> 32M:    22900 Mbps
>>> 64M:    22900 Mbps
>>> 128M:   22800 Mbps
>>>
>>> This above is the throughput out of patched libvirt with multiple Pipe Sizes for the FDWrapper.
>>
>> Ok, its bouncing around with noise after 1 MB. So I'd suggest that
>> libvirt attempt to raise the pipe limit to 1 MB by default, but
>> not try to go higher.
>>
>>> As for the theoretical limit for the libvirt architecture,
>>> I ran a qemu migration directly issuing the appropriate QMP
>>> commands, setting the same migration parameters as per libvirt,
>>> and then migrating to a socket netcatted to /dev/null via
>>> {"execute": "migrate", "arguments": { "uri", "unix:///tmp/netcat.sock" } } :
>>>
>>> QMP:    37000 Mbps
>>
>>> So although the Pipe size improves things (in particular the
>>> large jump is for the 256K size, although 1M seems a very good value),
>>> there is still a second bottleneck in there somewhere that
>>> accounts for a loss of ~14200 Mbps in throughput.


Interesting addition: I quickly tested on a system with faster CPUs and larger VM sizes, up to 200GB,
and the difference in throughput between libvirt and qemu is basically the same, ~14500 Mbps.

~50000 Mbps qemu to netcat socket to /dev/null
~35500 Mbps virsh save to /dev/null

It seems not to be proportional to CPU speed by the looks of it (not a totally fair comparison, because the VM sizes are different).

Ciao,

C

>>
>> In the above tests with libvirt, were you using the
>> --bypass-cache flag or not ?
> 
> No, I do not. Tests with ramdisk did not show a notable difference for me,
> 
> but tests with /dev/null were not possible, since the command line is not accepted:
> 
> # virsh save centos7 /dev/null
> Domain 'centos7' saved to /dev/null
> [OK]
> 
> # virsh save centos7 /dev/null --bypass-cache
> error: Failed to save domain 'centos7' to /dev/null
> error: Failed to create file '/dev/null': Invalid argument
> 
> 
>>
>> Hopefully use of O_DIRECT doesn't make a difference for
>> /dev/null, since the I/O is being immediately thrown
>> away and so ought to never go into I/O cache. 
>>
>> In terms of the comparison, we still have libvirt iohelper
>> giving QEMU a pipe, while your test above gives QEMU a
>> UNIX socket.
>>
>> So I still wonder if the delta is caused by the pipe vs socket
>> difference, as opposed to netcat vs libvirt iohelper code.
> 
> I'll look into this aspect, thanks!




* Re: [libvirt RFC] virFile: new VIR_FILE_WRAPPER_BIG_PIPE to improve performance
  2022-03-17 14:14               ` Claudio Fontana
@ 2022-03-17 15:03                 ` Dr. David Alan Gilbert
  2022-03-18 13:34                   ` Claudio Fontana
  0 siblings, 1 reply; 30+ messages in thread
From: Dr. David Alan Gilbert @ 2022-03-17 15:03 UTC (permalink / raw)
  To: Claudio Fontana; +Cc: libvir-list, Daniel P. Berrangé, qemu-devel

* Claudio Fontana (cfontana@suse.de) wrote:
> On 3/17/22 2:41 PM, Claudio Fontana wrote:
> > On 3/17/22 11:25 AM, Daniel P. Berrangé wrote:
> >> On Thu, Mar 17, 2022 at 11:12:11AM +0100, Claudio Fontana wrote:
> >>> On 3/16/22 1:17 PM, Claudio Fontana wrote:
> >>>> On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
> >>>>> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana wrote:
> >>>>>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
> >>>>>>> On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana wrote:
> >>>>>>>> the first user is the qemu driver,
> >>>>>>>>
> >>>>>>>> virsh save/resume would slow to a crawl with a default pipe size (64k).
> >>>>>>>>
> >>>>>>>> This improves the situation by 400%.
> >>>>>>>>
> >>>>>>>> Going through io_helper still seems to incur in some penalty (~15%-ish)
> >>>>>>>> compared with direct qemu migration to a nc socket to a file.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
> >>>>>>>> ---
> >>>>>>>>  src/qemu/qemu_driver.c    |  6 +++---
> >>>>>>>>  src/qemu/qemu_saveimage.c | 11 ++++++-----
> >>>>>>>>  src/util/virfile.c        | 12 ++++++++++++
> >>>>>>>>  src/util/virfile.h        |  1 +
> >>>>>>>>  4 files changed, 22 insertions(+), 8 deletions(-)
> >>>>>>>>
> >>>>>>>> Hello, I initially thought this to be a qemu performance issue,
> >>>>>>>> so you can find the discussion about this in qemu-devel:
> >>>>>>>>
> >>>>>>>> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
> >>>>>>>>
> >>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html
> >>
> >>
> >>> Current results show these experimental averages maximum throughput
> >>> migrating to /dev/null per each FdWrapper Pipe Size (as per QEMU QMP
> >>> "query-migrate", tests repeated 5 times for each).
> >>> VM Size is 60G, most of the memory effectively touched before migration,
> >>> through user application allocating and touching all memory with
> >>> pseudorandom data.
> >>>
> >>> 64K:     5200 Mbps (current situation)
> >>> 128K:    5800 Mbps
> >>> 256K:   20900 Mbps
> >>> 512K:   21600 Mbps
> >>> 1M:     22800 Mbps
> >>> 2M:     22800 Mbps
> >>> 4M:     22400 Mbps
> >>> 8M:     22500 Mbps
> >>> 16M:    22800 Mbps
> >>> 32M:    22900 Mbps
> >>> 64M:    22900 Mbps
> >>> 128M:   22800 Mbps
> >>>
> >>> This above is the throughput out of patched libvirt with multiple Pipe Sizes for the FDWrapper.
> >>
> >> Ok, its bouncing around with noise after 1 MB. So I'd suggest that
> >> libvirt attempt to raise the pipe limit to 1 MB by default, but
> >> not try to go higher.
> >>
> >>> As for the theoretical limit for the libvirt architecture,
> >>> I ran a qemu migration directly issuing the appropriate QMP
> >>> commands, setting the same migration parameters as per libvirt,
> >>> and then migrating to a socket netcatted to /dev/null via
> >>> {"execute": "migrate", "arguments": { "uri", "unix:///tmp/netcat.sock" } } :
> >>>
> >>> QMP:    37000 Mbps
> >>
> >>> So although the Pipe size improves things (in particular the
> >>> large jump is for the 256K size, although 1M seems a very good value),
> >>> there is still a second bottleneck in there somewhere that
> >>> accounts for a loss of ~14200 Mbps in throughput.
> 
> 
> Interesting addition: I tested quickly on a system with faster cpus and larger VM sizes, up to 200GB,
> and the difference in throughput libvirt vs qemu is basically the same ~14500 Mbps.
> 
> ~50000 mbps qemu to netcat socket to /dev/null
> ~35500 mbps virsh save to /dev/null
> 
> Seems it is not proportional to cpu speed by the looks of it (not a totally fair comparison because the VM sizes are different).

It might be closer to RAM or cache bandwidth limited though, due to the extra copy.

Dave

> Ciao,
> 
> C
> 
> >>
> >> In the above tests with libvirt, were you using the
> >> --bypass-cache flag or not ?
> > 
> > No, I do not. Tests with ramdisk did not show a notable difference for me,
> > 
> > but tests with /dev/null were not possible, since the command line is not accepted:
> > 
> > # virsh save centos7 /dev/null
> > Domain 'centos7' saved to /dev/null
> > [OK]
> > 
> > # virsh save centos7 /dev/null --bypass-cache
> > error: Failed to save domain 'centos7' to /dev/null
> > error: Failed to create file '/dev/null': Invalid argument
> > 
> > 
> >>
> >> Hopefully use of O_DIRECT doesn't make a difference for
> >> /dev/null, since the I/O is being immediately thrown
> >> away and so ought to never go into I/O cache. 
> >>
> >> In terms of the comparison, we still have libvirt iohelper
> >> giving QEMU a pipe, while your test above gives QEMU a
> >> UNIX socket.
> >>
> >> So I still wonder if the delta is caused by the pipe vs socket
> >> difference, as opposed to netcat vs libvirt iohelper code.
> > 
> > I'll look into this aspect, thanks!
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [libvirt RFC] virFile: new VIR_FILE_WRAPPER_BIG_PIPE to improve performance
  2022-03-17 15:03                 ` Dr. David Alan Gilbert
@ 2022-03-18 13:34                   ` Claudio Fontana
  2022-03-21  7:55                     ` Andrea Righi
                                       ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Claudio Fontana @ 2022-03-18 13:34 UTC (permalink / raw)
  To: Jiri Denemark
  Cc: libvir-list, andrea.righi, Dr. David Alan Gilbert, qemu-devel

On 3/17/22 4:03 PM, Dr. David Alan Gilbert wrote:
> * Claudio Fontana (cfontana@suse.de) wrote:
>> On 3/17/22 2:41 PM, Claudio Fontana wrote:
>>> On 3/17/22 11:25 AM, Daniel P. Berrangé wrote:
>>>> On Thu, Mar 17, 2022 at 11:12:11AM +0100, Claudio Fontana wrote:
>>>>> On 3/16/22 1:17 PM, Claudio Fontana wrote:
>>>>>> On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
>>>>>>> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana wrote:
>>>>>>>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
>>>>>>>>> On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana wrote:
>>>>>>>>>> the first user is the qemu driver,
>>>>>>>>>>
>>>>>>>>>> virsh save/resume would slow to a crawl with a default pipe size (64k).
>>>>>>>>>>
>>>>>>>>>> This improves the situation by 400%.
>>>>>>>>>>
>>>>>>>>>> Going through io_helper still seems to incur in some penalty (~15%-ish)
>>>>>>>>>> compared with direct qemu migration to a nc socket to a file.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
>>>>>>>>>> ---
>>>>>>>>>>  src/qemu/qemu_driver.c    |  6 +++---
>>>>>>>>>>  src/qemu/qemu_saveimage.c | 11 ++++++-----
>>>>>>>>>>  src/util/virfile.c        | 12 ++++++++++++
>>>>>>>>>>  src/util/virfile.h        |  1 +
>>>>>>>>>>  4 files changed, 22 insertions(+), 8 deletions(-)
>>>>>>>>>>
>>>>>>>>>> Hello, I initially thought this to be a qemu performance issue,
>>>>>>>>>> so you can find the discussion about this in qemu-devel:
>>>>>>>>>>
>>>>>>>>>> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
>>>>>>>>>>
>>>>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html
>>>>
>>>>
>>>>> Current results show these experimental averages maximum throughput
>>>>> migrating to /dev/null per each FdWrapper Pipe Size (as per QEMU QMP
>>>>> "query-migrate", tests repeated 5 times for each).
>>>>> VM Size is 60G, most of the memory effectively touched before migration,
>>>>> through user application allocating and touching all memory with
>>>>> pseudorandom data.
>>>>>
>>>>> 64K:     5200 Mbps (current situation)
>>>>> 128K:    5800 Mbps
>>>>> 256K:   20900 Mbps
>>>>> 512K:   21600 Mbps
>>>>> 1M:     22800 Mbps
>>>>> 2M:     22800 Mbps
>>>>> 4M:     22400 Mbps
>>>>> 8M:     22500 Mbps
>>>>> 16M:    22800 Mbps
>>>>> 32M:    22900 Mbps
>>>>> 64M:    22900 Mbps
>>>>> 128M:   22800 Mbps
>>>>>
>>>>> This above is the throughput out of patched libvirt with multiple Pipe Sizes for the FDWrapper.
>>>>
>>>> Ok, its bouncing around with noise after 1 MB. So I'd suggest that
>>>> libvirt attempt to raise the pipe limit to 1 MB by default, but
>>>> not try to go higher.
>>>>
>>>>> As for the theoretical limit for the libvirt architecture,
>>>>> I ran a qemu migration directly issuing the appropriate QMP
>>>>> commands, setting the same migration parameters as per libvirt,
>>>>> and then migrating to a socket netcatted to /dev/null via
>>>>> {"execute": "migrate", "arguments": { "uri", "unix:///tmp/netcat.sock" } } :
>>>>>
>>>>> QMP:    37000 Mbps
>>>>
>>>>> So although the Pipe size improves things (in particular the
>>>>> large jump is for the 256K size, although 1M seems a very good value),
>>>>> there is still a second bottleneck in there somewhere that
>>>>> accounts for a loss of ~14200 Mbps in throughput.
>>
>>
>> Interesting addition: I tested quickly on a system with faster cpus and larger VM sizes, up to 200GB,
>> and the difference in throughput libvirt vs qemu is basically the same ~14500 Mbps.
>>
>> ~50000 mbps qemu to netcat socket to /dev/null
>> ~35500 mbps virsh save to /dev/null
>>
>> Seems it is not proportional to cpu speed by the looks of it (not a totally fair comparison because the VM sizes are different).
> 
> It might be closer to RAM or cache bandwidth limited though; for an extra copy.

I was thinking about sendfile(2) in iohelper, but that probably can't work, as the input fd is a socket; I am getting EINVAL.
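
Roughly, the failing pattern looks like this standalone sketch (not the actual
iohelper code; the socketpair is just a stand-in for the incoming fd, and
stdout stands in for the save file). sendfile(2) documents that in_fd must
support mmap-like operations, so a socket source is expected to be rejected
with EINVAL:

#include <sys/sendfile.h>
#include <sys/socket.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return 1;

    /* put some data in, so a hypothetical success would have work to do */
    (void)write(sv[1], "x", 1);

    /* socket as in_fd, stdout as a stand-in for the output file:
     * expected to fail with EINVAL as per the sendfile(2) man page */
    if (sendfile(STDOUT_FILENO, sv[0], NULL, 1) < 0)
        perror("sendfile");

    close(sv[0]);
    close(sv[1]);
    return 0;
}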

One thing that I noticed is:

commit afe6e58aedcd5e27ea16184fed90b338569bd042
Author: Jiri Denemark <jdenemar@redhat.com>
Date:   Mon Feb 6 14:40:48 2012 +0100

    util: Generalize virFileDirectFd
    
    virFileDirectFd was used for accessing files opened with O_DIRECT using
    libvirt_iohelper. We will want to use the helper for accessing files
    regardless on O_DIRECT and thus virFileDirectFd was generalized and
    renamed to virFileWrapperFd.


And in particular the comment in src/util/virfile.c:

    /* XXX support posix_fadvise rather than O_DIRECT, if the kernel support
     * for that is decent enough. In that case, we will also need to
     * explicitly support VIR_FILE_WRAPPER_NON_BLOCKING since
     * VIR_FILE_WRAPPER_BYPASS_CACHE alone will no longer require spawning
     * iohelper.
     */

by Jiri Denemark.

I have lots of questions here, and I have tried to involve Jiri and Andrea Righi, who a long time ago proposed a POSIX_FADV_NOREUSE implementation.

1) What is the reason iohelper was introduced?

2) Was Jiri's comment about the missing Linux implementation of POSIX_FADV_NOREUSE?

3) If using O_DIRECT is the only reason for iohelper to exist (...?), would replacing it with posix_fadvise remove the need for iohelper?

4) What has stopped Andrea's or another POSIX_FADV_NOREUSE implementation in the kernel?

Lots of questions..

Thanks for all your insight,

Claudio

> 
> Dave
> 
>> Ciao,
>>
>> C
>>
>>>>
>>>> In the above tests with libvirt, were you using the
>>>> --bypass-cache flag or not ?
>>>
>>> No, I do not. Tests with ramdisk did not show a notable difference for me,
>>>
>>> but tests with /dev/null were not possible, since the command line is not accepted:
>>>
>>> # virsh save centos7 /dev/null
>>> Domain 'centos7' saved to /dev/null
>>> [OK]
>>>
>>> # virsh save centos7 /dev/null --bypass-cache
>>> error: Failed to save domain 'centos7' to /dev/null
>>> error: Failed to create file '/dev/null': Invalid argument
>>>
>>>
>>>>
>>>> Hopefully use of O_DIRECT doesn't make a difference for
>>>> /dev/null, since the I/O is being immediately thrown
>>>> away and so ought to never go into I/O cache. 
>>>>
>>>> In terms of the comparison, we still have libvirt iohelper
>>>> giving QEMU a pipe, while your test above gives QEMU a
>>>> UNIX socket.
>>>>
>>>> So I still wonder if the delta is caused by the pipe vs socket
>>>> difference, as opposed to netcat vs libvirt iohelper code.
>>>
>>> I'll look into this aspect, thanks!
>>




* Re: [libvirt RFC] virFile: new VIR_FILE_WRAPPER_BIG_PIPE to improve performance
  2022-03-18 13:34                   ` Claudio Fontana
@ 2022-03-21  7:55                     ` Andrea Righi
  2022-03-25  9:56                       ` Claudio Fontana
  2022-03-25 10:33                     ` Daniel P. Berrangé
  2022-03-25 11:29                     ` Daniel P. Berrangé
  2 siblings, 1 reply; 30+ messages in thread
From: Andrea Righi @ 2022-03-21  7:55 UTC (permalink / raw)
  To: Claudio Fontana
  Cc: libvir-list, Jiri Denemark, Dr. David Alan Gilbert, qemu-devel

On Fri, Mar 18, 2022 at 02:34:29PM +0100, Claudio Fontana wrote:
...
> I have lots of questions here, and I tried to involve Jiri and Andrea Righi here, who a long time ago proposed a POSIX_FADV_NOREUSE implementation.
> 
> 1) What is the reason iohelper was introduced?
> 
> 2) Was Jiri's comment about the missing linux implementation of POSIX_FADV_NOREUSE?
> 
> 3) if using O_DIRECT is the only reason for iohelper to exist (...?), would replacing it with posix_fadvise remove the need for iohelper?
> 
> 4) What has stopped Andreas' or another POSIX_FADV_NOREUSE implementation in the kernel?

From what I remember (it was a long time ago, sorry), I stopped pursuing
the POSIX_FADV_NOREUSE idea because we thought that moving to a
memcg-based solution was a better and more flexible approach, assuming
memcg would have provided some form of specific page cache control. As of
today I think we still don't have any specific page cache control
feature in memcg, so maybe we could reconsider the FADV_NOREUSE idea (or
something similar)?

Maybe even introduce a separate FADV_<something> flag if we don't want
to bind a specific implementation of this feature to a standard POSIX
flag (even if FADV_NOREUSE is still implemented as a no-op in the
kernel).

The thing that I liked about the fadvise approach is its simplicity from
an application perspective, because it's just a syscall and that's it,
without having to deal with any other subsystems (cgroups, sysfs, and
similar).
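
Just to make the "it's just a syscall" point concrete, a minimal sketch (my
illustration only, not a proposal for the libvirt code; as said, NOREUSE is
still a no-op on Linux, so the second call shows the DONTNEED variant that is
commonly used in practice):

#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <unistd.h>

/* advise the kernel that a just-written range of the save file will not
 * be reused, so its pages can be dropped from the page cache */
static void advise_written_range(int fd, off_t start, off_t len)
{
    /* the "ideal" call: currently a no-op on Linux */
    posix_fadvise(fd, start, len, POSIX_FADV_NOREUSE);

    /* pragmatic variant: flush the data, then drop the now-clean pages */
    if (fdatasync(fd) == 0)
        posix_fadvise(fd, start, len, POSIX_FADV_DONTNEED);
}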

-Andrea

> 
> Lots of questions..
> 
> Thanks for all your insight,
> 
> Claudio
> 
> > 
> > Dave
> > 
> >> Ciao,
> >>
> >> C
> >>
> >>>>
> >>>> In the above tests with libvirt, were you using the
> >>>> --bypass-cache flag or not ?
> >>>
> >>> No, I do not. Tests with ramdisk did not show a notable difference for me,
> >>>
> >>> but tests with /dev/null were not possible, since the command line is not accepted:
> >>>
> >>> # virsh save centos7 /dev/null
> >>> Domain 'centos7' saved to /dev/null
> >>> [OK]
> >>>
> >>> # virsh save centos7 /dev/null --bypass-cache
> >>> error: Failed to save domain 'centos7' to /dev/null
> >>> error: Failed to create file '/dev/null': Invalid argument
> >>>
> >>>
> >>>>
> >>>> Hopefully use of O_DIRECT doesn't make a difference for
> >>>> /dev/null, since the I/O is being immediately thrown
> >>>> away and so ought to never go into I/O cache. 
> >>>>
> >>>> In terms of the comparison, we still have libvirt iohelper
> >>>> giving QEMU a pipe, while your test above gives QEMU a
> >>>> UNIX socket.
> >>>>
> >>>> So I still wonder if the delta is caused by the pipe vs socket
> >>>> difference, as opposed to netcat vs libvirt iohelper code.
> >>>
> >>> I'll look into this aspect, thanks!
> >>



* Re: [libvirt RFC] virFile: new VIR_FILE_WRAPPER_BIG_PIPE to improve performance
  2022-03-21  7:55                     ` Andrea Righi
@ 2022-03-25  9:56                       ` Claudio Fontana
  0 siblings, 0 replies; 30+ messages in thread
From: Claudio Fontana @ 2022-03-25  9:56 UTC (permalink / raw)
  To: Andrea Righi, Jiri Denemark
  Cc: libvir-list, Jim Fehlig, Dr. David Alan Gilbert, qemu-devel

On 3/21/22 8:55 AM, Andrea Righi wrote:
> On Fri, Mar 18, 2022 at 02:34:29PM +0100, Claudio Fontana wrote:
> ...
>> I have lots of questions here, and I tried to involve Jiri and Andrea Righi here, who a long time ago proposed a POSIX_FADV_NOREUSE implementation.
>>
>> 1) What is the reason iohelper was introduced?
>>
>> 2) Was Jiri's comment about the missing linux implementation of POSIX_FADV_NOREUSE?
>>
>> 3) if using O_DIRECT is the only reason for iohelper to exist (...?), would replacing it with posix_fadvise remove the need for iohelper?
>>
>> 4) What has stopped Andreas' or another POSIX_FADV_NOREUSE implementation in the kernel?
> 
> For what I remember (it was a long time ago sorry) I stopped to pursue
> the POSIX_FADV_NOREUSE idea, because we thought that moving to a
> memcg-based solution was a better and more flexible approach, assuming
> memcg would have given some form of specific page cache control. As of
> today I think we still don't have any specific page cache control
> feature in memcg, so maybe we could reconsider the FADV_NOREUSE idea (or
> something similar)?
> 
> Maybe even introduce a separate FADV_<something> flag if we don't want
> to bind a specific implementation of this feature to a standard POSIX
> flag (even if FADV_NOREUSE is still implemented as a no-op in the
> kernel).
> 
> The thing that I liked about the fadvise approach is its simplicity from
> an application perspective, because it's just a syscall and that's it,
> without having to deal with any other subsystems (cgroups, sysfs, and
> similar).
> 
> -Andrea


Thanks Andrea,

I guess for this specific use case I am still missing some key understanding of the role of iohelper in libvirt.

Jiri Denemark's comment seems to suggest that having an implementation of FADV_NOREUSE would remove the need for iohelper entirely,

so I assume this would remove the extra copy of the data, which seems to impose a substantial throughput penalty when migrating to a file.

I guess I am hoping for Jiri to weigh in on this, or anyone with a clear understanding of this matter.

Ciao,

Claudio



> 
>>
>> Lots of questions..
>>
>> Thanks for all your insight,
>>
>> Claudio
>>
>>>
>>> Dave
>>>
>>>> Ciao,
>>>>
>>>> C
>>>>
>>>>>>
>>>>>> In the above tests with libvirt, were you using the
>>>>>> --bypass-cache flag or not ?
>>>>>
>>>>> No, I do not. Tests with ramdisk did not show a notable difference for me,
>>>>>
>>>>> but tests with /dev/null were not possible, since the command line is not accepted:
>>>>>
>>>>> # virsh save centos7 /dev/null
>>>>> Domain 'centos7' saved to /dev/null
>>>>> [OK]
>>>>>
>>>>> # virsh save centos7 /dev/null --bypass-cache
>>>>> error: Failed to save domain 'centos7' to /dev/null
>>>>> error: Failed to create file '/dev/null': Invalid argument
>>>>>
>>>>>
>>>>>>
>>>>>> Hopefully use of O_DIRECT doesn't make a difference for
>>>>>> /dev/null, since the I/O is being immediately thrown
>>>>>> away and so ought to never go into I/O cache. 
>>>>>>
>>>>>> In terms of the comparison, we still have libvirt iohelper
>>>>>> giving QEMU a pipe, while your test above gives QEMU a
>>>>>> UNIX socket.
>>>>>>
>>>>>> So I still wonder if the delta is caused by the pipe vs socket
>>>>>> difference, as opposed to netcat vs libvirt iohelper code.
>>>>>
>>>>> I'll look into this aspect, thanks!
>>>>
> 




* Re: [libvirt RFC] virFile: new VIR_FILE_WRAPPER_BIG_PIPE to improve performance
  2022-03-18 13:34                   ` Claudio Fontana
  2022-03-21  7:55                     ` Andrea Righi
@ 2022-03-25 10:33                     ` Daniel P. Berrangé
  2022-03-25 10:56                       ` Claudio Fontana
  2022-04-10 19:58                       ` Claudio Fontana
  2022-03-25 11:29                     ` Daniel P. Berrangé
  2 siblings, 2 replies; 30+ messages in thread
From: Daniel P. Berrangé @ 2022-03-25 10:33 UTC (permalink / raw)
  To: Claudio Fontana
  Cc: libvir-list, andrea.righi, Jiri Denemark, Dr. David Alan Gilbert,
	qemu-devel

On Fri, Mar 18, 2022 at 02:34:29PM +0100, Claudio Fontana wrote:
> On 3/17/22 4:03 PM, Dr. David Alan Gilbert wrote:
> > * Claudio Fontana (cfontana@suse.de) wrote:
> >> On 3/17/22 2:41 PM, Claudio Fontana wrote:
> >>> On 3/17/22 11:25 AM, Daniel P. Berrangé wrote:
> >>>> On Thu, Mar 17, 2022 at 11:12:11AM +0100, Claudio Fontana wrote:
> >>>>> On 3/16/22 1:17 PM, Claudio Fontana wrote:
> >>>>>> On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
> >>>>>>> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana wrote:
> >>>>>>>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
> >>>>>>>>> On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana wrote:
> >>>>>>>>>> the first user is the qemu driver,
> >>>>>>>>>>
> >>>>>>>>>> virsh save/resume would slow to a crawl with a default pipe size (64k).
> >>>>>>>>>>
> >>>>>>>>>> This improves the situation by 400%.
> >>>>>>>>>>
> >>>>>>>>>> Going through io_helper still seems to incur in some penalty (~15%-ish)
> >>>>>>>>>> compared with direct qemu migration to a nc socket to a file.
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
> >>>>>>>>>> ---
> >>>>>>>>>>  src/qemu/qemu_driver.c    |  6 +++---
> >>>>>>>>>>  src/qemu/qemu_saveimage.c | 11 ++++++-----
> >>>>>>>>>>  src/util/virfile.c        | 12 ++++++++++++
> >>>>>>>>>>  src/util/virfile.h        |  1 +
> >>>>>>>>>>  4 files changed, 22 insertions(+), 8 deletions(-)
> >>>>>>>>>>
> >>>>>>>>>> Hello, I initially thought this to be a qemu performance issue,
> >>>>>>>>>> so you can find the discussion about this in qemu-devel:
> >>>>>>>>>>
> >>>>>>>>>> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
> >>>>>>>>>>
> >>>>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html
> >>>>
> >>>>
> >>>>> Current results show these experimental averages maximum throughput
> >>>>> migrating to /dev/null per each FdWrapper Pipe Size (as per QEMU QMP
> >>>>> "query-migrate", tests repeated 5 times for each).
> >>>>> VM Size is 60G, most of the memory effectively touched before migration,
> >>>>> through user application allocating and touching all memory with
> >>>>> pseudorandom data.
> >>>>>
> >>>>> 64K:     5200 Mbps (current situation)
> >>>>> 128K:    5800 Mbps
> >>>>> 256K:   20900 Mbps
> >>>>> 512K:   21600 Mbps
> >>>>> 1M:     22800 Mbps
> >>>>> 2M:     22800 Mbps
> >>>>> 4M:     22400 Mbps
> >>>>> 8M:     22500 Mbps
> >>>>> 16M:    22800 Mbps
> >>>>> 32M:    22900 Mbps
> >>>>> 64M:    22900 Mbps
> >>>>> 128M:   22800 Mbps
> >>>>>
> >>>>> This above is the throughput out of patched libvirt with multiple Pipe Sizes for the FDWrapper.
> >>>>
> >>>> Ok, its bouncing around with noise after 1 MB. So I'd suggest that
> >>>> libvirt attempt to raise the pipe limit to 1 MB by default, but
> >>>> not try to go higher.
> >>>>
> >>>>> As for the theoretical limit for the libvirt architecture,
> >>>>> I ran a qemu migration directly issuing the appropriate QMP
> >>>>> commands, setting the same migration parameters as per libvirt,
> >>>>> and then migrating to a socket netcatted to /dev/null via
> >>>>> {"execute": "migrate", "arguments": { "uri", "unix:///tmp/netcat.sock" } } :
> >>>>>
> >>>>> QMP:    37000 Mbps
> >>>>
> >>>>> So although the Pipe size improves things (in particular the
> >>>>> large jump is for the 256K size, although 1M seems a very good value),
> >>>>> there is still a second bottleneck in there somewhere that
> >>>>> accounts for a loss of ~14200 Mbps in throughput.
> >>
> >>
> >> Interesting addition: I tested quickly on a system with faster cpus and larger VM sizes, up to 200GB,
> >> and the difference in throughput libvirt vs qemu is basically the same ~14500 Mbps.
> >>
> >> ~50000 mbps qemu to netcat socket to /dev/null
> >> ~35500 mbps virsh save to /dev/null
> >>
> >> Seems it is not proportional to cpu speed by the looks of it (not a totally fair comparison because the VM sizes are different).
> > 
> > It might be closer to RAM or cache bandwidth limited though; for an extra copy.
> 
> I was thinking about sendfile(2) in iohelper, but that probably can't work as the input fd is a socket, I am getting EINVAL.
> 
> One thing that I noticed is:
> 
> ommit afe6e58aedcd5e27ea16184fed90b338569bd042
> Author: Jiri Denemark <jdenemar@redhat.com>
> Date:   Mon Feb 6 14:40:48 2012 +0100
> 
>     util: Generalize virFileDirectFd
>     
>     virFileDirectFd was used for accessing files opened with O_DIRECT using
>     libvirt_iohelper. We will want to use the helper for accessing files
>     regardless on O_DIRECT and thus virFileDirectFd was generalized and
>     renamed to virFileWrapperFd.
> 
> 
> And in particular the comment in src/util/virFile.c:
> 
>     /* XXX support posix_fadvise rather than O_DIRECT, if the kernel support                                                                                                 
>      * for that is decent enough. In that case, we will also need to                                                                                                         
>      * explicitly support VIR_FILE_WRAPPER_NON_BLOCKING since                                                                                                                
>      * VIR_FILE_WRAPPER_BYPASS_CACHE alone will no longer require spawning                                                                                                   
>      * iohelper.                                                                                                                                                             
>      */
> 
> by Jiri Denemark.
> 
> I have lots of questions here, and I tried to involve Jiri and Andrea Righi here, who a long time ago proposed a POSIX_FADV_NOREUSE implementation.
> 
> 1) What is the reason iohelper was introduced?

With POSIX you can't get sensible results from poll() on FDs associated with
plain files. It will always report the file as readable/writable, and the
userspace caller will get blocked any time the I/O operation causes the
kernel to read/write from the underlying (potentially very slow) storage.

IOW if you give QEMU an FD associated with a plain file and tell it to
migrate to that, the guest OS will get stalled.

To avoid this we have to give QEMU an FD that is NOT a plain file, but
rather something on which poll() works correctly to avoid blocking. This
essentially means a socket or pipe FD.

Here enters the iohelper - we give QEMU a pipe whose other end is the
iohelper. The iohelper suffers from blocking on read/write but that
doesn't matter, because QEMU is isolated from this via the pipe.

In theory we could just spawn a thread inside libvirtd to do the same
as the iohelper, but using a separate helper process is more robust.

If not using libvirt, you would use QEMU's 'exec:' migration protocol
with 'dd' or 'cat' for the same reasons. Libvirt provides the iohelper
so we don't have to deal with portability questions around 'dd' syntax
and can add features like O_DIRECT that cat lacks.
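
To make the poll() point concrete, here is a trivial standalone demo
(illustrative only; any regular file path will do, /etc/hostname is just an
arbitrary example):

#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/etc/hostname", O_RDONLY);
    struct pollfd p = { .fd = fd, .events = POLLIN | POLLOUT };

    if (fd < 0)
        return 1;

    /* returns immediately and always reports a regular file as
     * readable/writable; it says nothing about whether the next
     * read()/write() will stall on the underlying storage */
    int n = poll(&p, 1, 0);
    printf("poll=%d revents=0x%x\n", n, p.revents);

    close(fd);
    return 0;
}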

> 2) Was Jiri's comment about the missing linux implementation of POSIX_FADV_NOREUSE?
> 
> 3) if using O_DIRECT is the only reason for iohelper to exist (...?), would replacing it with posix_fadvise remove the need for iohelper?

We can't remove the iohelper for the reason above.

> 4) What has stopped Andreas' or another POSIX_FADV_NOREUSE implementation in the kernel?

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [libvirt RFC] virFile: new VIR_FILE_WRAPPER_BIG_PIPE to improve performance
  2022-03-25 10:33                     ` Daniel P. Berrangé
@ 2022-03-25 10:56                       ` Claudio Fontana
  2022-03-25 11:14                         ` Daniel P. Berrangé
  2022-04-10 19:58                       ` Claudio Fontana
  1 sibling, 1 reply; 30+ messages in thread
From: Claudio Fontana @ 2022-03-25 10:56 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: libvir-list, andrea.righi, Jiri Denemark, Dr. David Alan Gilbert,
	qemu-devel

Thanks Daniel,

On 3/25/22 11:33 AM, Daniel P. Berrangé wrote:
> On Fri, Mar 18, 2022 at 02:34:29PM +0100, Claudio Fontana wrote:
>> On 3/17/22 4:03 PM, Dr. David Alan Gilbert wrote:
>>> * Claudio Fontana (cfontana@suse.de) wrote:
>>>> On 3/17/22 2:41 PM, Claudio Fontana wrote:
>>>>> On 3/17/22 11:25 AM, Daniel P. Berrangé wrote:
>>>>>> On Thu, Mar 17, 2022 at 11:12:11AM +0100, Claudio Fontana wrote:
>>>>>>> On 3/16/22 1:17 PM, Claudio Fontana wrote:
>>>>>>>> On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
>>>>>>>>> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana wrote:
>>>>>>>>>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
>>>>>>>>>>> On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana wrote:
>>>>>>>>>>>> the first user is the qemu driver,
>>>>>>>>>>>>
>>>>>>>>>>>> virsh save/resume would slow to a crawl with a default pipe size (64k).
>>>>>>>>>>>>
>>>>>>>>>>>> This improves the situation by 400%.
>>>>>>>>>>>>
>>>>>>>>>>>> Going through io_helper still seems to incur in some penalty (~15%-ish)
>>>>>>>>>>>> compared with direct qemu migration to a nc socket to a file.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
>>>>>>>>>>>> ---
>>>>>>>>>>>>  src/qemu/qemu_driver.c    |  6 +++---
>>>>>>>>>>>>  src/qemu/qemu_saveimage.c | 11 ++++++-----
>>>>>>>>>>>>  src/util/virfile.c        | 12 ++++++++++++
>>>>>>>>>>>>  src/util/virfile.h        |  1 +
>>>>>>>>>>>>  4 files changed, 22 insertions(+), 8 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> Hello, I initially thought this to be a qemu performance issue,
>>>>>>>>>>>> so you can find the discussion about this in qemu-devel:
>>>>>>>>>>>>
>>>>>>>>>>>> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
>>>>>>>>>>>>
>>>>>>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html
>>>>>>
>>>>>>
>>>>>>> Current results show these experimental averages maximum throughput
>>>>>>> migrating to /dev/null per each FdWrapper Pipe Size (as per QEMU QMP
>>>>>>> "query-migrate", tests repeated 5 times for each).
>>>>>>> VM Size is 60G, most of the memory effectively touched before migration,
>>>>>>> through user application allocating and touching all memory with
>>>>>>> pseudorandom data.
>>>>>>>
>>>>>>> 64K:     5200 Mbps (current situation)
>>>>>>> 128K:    5800 Mbps
>>>>>>> 256K:   20900 Mbps
>>>>>>> 512K:   21600 Mbps
>>>>>>> 1M:     22800 Mbps
>>>>>>> 2M:     22800 Mbps
>>>>>>> 4M:     22400 Mbps
>>>>>>> 8M:     22500 Mbps
>>>>>>> 16M:    22800 Mbps
>>>>>>> 32M:    22900 Mbps
>>>>>>> 64M:    22900 Mbps
>>>>>>> 128M:   22800 Mbps
>>>>>>>
>>>>>>> This above is the throughput out of patched libvirt with multiple Pipe Sizes for the FDWrapper.
>>>>>>
>>>>>> Ok, its bouncing around with noise after 1 MB. So I'd suggest that
>>>>>> libvirt attempt to raise the pipe limit to 1 MB by default, but
>>>>>> not try to go higher.
>>>>>>
>>>>>>> As for the theoretical limit for the libvirt architecture,
>>>>>>> I ran a qemu migration directly issuing the appropriate QMP
>>>>>>> commands, setting the same migration parameters as per libvirt,
>>>>>>> and then migrating to a socket netcatted to /dev/null via
>>>>>>> {"execute": "migrate", "arguments": { "uri", "unix:///tmp/netcat.sock" } } :
>>>>>>>
>>>>>>> QMP:    37000 Mbps
>>>>>>
>>>>>>> So although the Pipe size improves things (in particular the
>>>>>>> large jump is for the 256K size, although 1M seems a very good value),
>>>>>>> there is still a second bottleneck in there somewhere that
>>>>>>> accounts for a loss of ~14200 Mbps in throughput.
>>>>
>>>>
>>>> Interesting addition: I tested quickly on a system with faster cpus and larger VM sizes, up to 200GB,
>>>> and the difference in throughput libvirt vs qemu is basically the same ~14500 Mbps.
>>>>
>>>> ~50000 mbps qemu to netcat socket to /dev/null
>>>> ~35500 mbps virsh save to /dev/null
>>>>
>>>> Seems it is not proportional to cpu speed by the looks of it (not a totally fair comparison because the VM sizes are different).
>>>
>>> It might be closer to RAM or cache bandwidth limited though; for an extra copy.
>>
>> I was thinking about sendfile(2) in iohelper, but that probably can't work as the input fd is a socket, I am getting EINVAL.
>>
>> One thing that I noticed is:
>>
>> ommit afe6e58aedcd5e27ea16184fed90b338569bd042
>> Author: Jiri Denemark <jdenemar@redhat.com>
>> Date:   Mon Feb 6 14:40:48 2012 +0100
>>
>>     util: Generalize virFileDirectFd
>>     
>>     virFileDirectFd was used for accessing files opened with O_DIRECT using
>>     libvirt_iohelper. We will want to use the helper for accessing files
>>     regardless on O_DIRECT and thus virFileDirectFd was generalized and
>>     renamed to virFileWrapperFd.
>>
>>
>> And in particular the comment in src/util/virFile.c:
>>
>>     /* XXX support posix_fadvise rather than O_DIRECT, if the kernel support                                                                                                 
>>      * for that is decent enough. In that case, we will also need to                                                                                                         
>>      * explicitly support VIR_FILE_WRAPPER_NON_BLOCKING since                                                                                                                
>>      * VIR_FILE_WRAPPER_BYPASS_CACHE alone will no longer require spawning                                                                                                   
>>      * iohelper.                                                                                                                                                             
>>      */
>>
>> by Jiri Denemark.
>>
>> I have lots of questions here, and I tried to involve Jiri and Andrea Righi here, who a long time ago proposed a POSIX_FADV_NOREUSE implementation.
>>
>> 1) What is the reason iohelper was introduced?
> 
> With POSIX you can't get sensible results from poll() on FDs associated with
> plain files. It will always report the file as readable/writable, and the
> userspace caller will get blocked any time the I/O operation causes the
> kernel to read/write from the underlying (potentially very slow) storage.
> 
> IOW if you give QEMU an FD associated with a plain file and tell it to
> migrate to that, the guest OS will get stalled.

we send a stop command to qemu just before migrating to a file in virsh save though right?
With virsh restore we also first load the VM, and only then start executing it.

So for virsh save and virsh restore, this should not be a problem? Still we need the iohelper?

> 
> To avoid this we have to give QEMU an FD that is NOT a plain file, but
> rather something on which poll() works correctly to avoid blocking. This
> essentially means a socket or pipe FD.
> 
> Here enters the iohelper - we give QEMU a pipe whose other end is the
> iohelper. The iohelper suffers from blocking on read/write but that
> doesn't matter, because QEMU is isolated from this via the pipe.
> 
> In theory we could just spawn a thread inside libvirtd to do the same
> as the iohelper, but using a separate helper process is more robust.
> 
> If not using libvirt, you would use QEMU's 'exec:' migration protocol
> with 'dd' or 'cat' for the same reasons. Libvirt provides the iohelper
> so we don't have to deal with portability questions around 'dd' syntax
> and can add features like O_DIRECT that cat lacks.

Thanks, I'll try this as well!

Claudio
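
As an aside, a minimal sketch of the pipe-size change Daniel suggests above (raising the wrapper pipe to 1 MB by default): the capacity of an existing pipe fd can be grown with fcntl(F_SETPIPE_SZ). The helper name below is made up for illustration and is not the code from this patch:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>

/* Ask the kernel for a 1 MiB buffer on an existing pipe fd.
 * Unprivileged processes are capped by /proc/sys/fs/pipe-max-size
 * (1 MiB by default); exceeding that cap fails with EPERM. */
static int raise_pipe_size(int pipefd)
{
    int got = fcntl(pipefd, F_SETPIPE_SZ, 1024 * 1024);

    if (got < 0) {
        perror("F_SETPIPE_SZ");
        return -1;
    }
    /* The kernel may round the size up; 'got' is the real capacity. */
    fprintf(stderr, "pipe capacity is now %d bytes\n", got);
    return got;
}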

> 
>> 2) Was Jiri's comment about the missing linux implementation of POSIX_FADV_NOREUSE?
>>
>> 3) if using O_DIRECT is the only reason for iohelper to exist (...?), would replacing it with posix_fadvise remove the need for iohelper?
> 
> We can't remove the iohelper for the reason above.
> 
>> 4) What has stopped Andreas' or another POSIX_FADV_NOREUSE implementation in the kernel?
> 
> With regards,
> Daniel
> 
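
To make the poll() explanation above concrete, here is a small self-contained example; it is not libvirt code and the file path is arbitrary. A regular-file descriptor is reported ready immediately, no matter how slow the underlying storage is, so poll() provides no useful back-pressure for a plain file, which is why QEMU has to be handed a pipe or socket instead:

#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Any regular file will do for the demonstration. */
    int fd = open("/etc/hostname", O_RDONLY);
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n;

    if (fd < 0)
        return 1;

    /* Returns immediately with POLLIN set: the kernel always reports
     * regular files as ready, even though the following read() may
     * still block on (potentially very slow) storage. */
    n = poll(&pfd, 1, -1);
    printf("poll() returned %d, revents = 0x%x\n", n, pfd.revents);

    close(fd);
    return 0;
}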



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [libvirt RFC] virFile: new VIR_FILE_WRAPPER_BIG_PIPE to improve performance
  2022-03-25 10:56                       ` Claudio Fontana
@ 2022-03-25 11:14                         ` Daniel P. Berrangé
  2022-03-25 11:16                           ` Claudio Fontana
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel P. Berrangé @ 2022-03-25 11:14 UTC (permalink / raw)
  To: Claudio Fontana
  Cc: libvir-list, andrea.righi, Jiri Denemark, Dr. David Alan Gilbert,
	qemu-devel

On Fri, Mar 25, 2022 at 11:56:44AM +0100, Claudio Fontana wrote:
> Thanks Daniel,
> 
> On 3/25/22 11:33 AM, Daniel P. Berrangé wrote:
> > On Fri, Mar 18, 2022 at 02:34:29PM +0100, Claudio Fontana wrote:
> >> On 3/17/22 4:03 PM, Dr. David Alan Gilbert wrote:
> >>> * Claudio Fontana (cfontana@suse.de) wrote:
> >>>> On 3/17/22 2:41 PM, Claudio Fontana wrote:
> >>>>> On 3/17/22 11:25 AM, Daniel P. Berrangé wrote:
> >>>>>> On Thu, Mar 17, 2022 at 11:12:11AM +0100, Claudio Fontana wrote:
> >>>>>>> On 3/16/22 1:17 PM, Claudio Fontana wrote:
> >>>>>>>> On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
> >>>>>>>>> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana wrote:
> >>>>>>>>>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
> >>>>>>>>>>> On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana wrote:
> >>>>>>>>>>>> the first user is the qemu driver,
> >>>>>>>>>>>>
> >>>>>>>>>>>> virsh save/resume would slow to a crawl with a default pipe size (64k).
> >>>>>>>>>>>>
> >>>>>>>>>>>> This improves the situation by 400%.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Going through io_helper still seems to incur in some penalty (~15%-ish)
> >>>>>>>>>>>> compared with direct qemu migration to a nc socket to a file.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
> >>>>>>>>>>>> ---
> >>>>>>>>>>>>  src/qemu/qemu_driver.c    |  6 +++---
> >>>>>>>>>>>>  src/qemu/qemu_saveimage.c | 11 ++++++-----
> >>>>>>>>>>>>  src/util/virfile.c        | 12 ++++++++++++
> >>>>>>>>>>>>  src/util/virfile.h        |  1 +
> >>>>>>>>>>>>  4 files changed, 22 insertions(+), 8 deletions(-)
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hello, I initially thought this to be a qemu performance issue,
> >>>>>>>>>>>> so you can find the discussion about this in qemu-devel:
> >>>>>>>>>>>>
> >>>>>>>>>>>> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
> >>>>>>>>>>>>
> >>>>>>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html
> >>>>>>
> >>>>>>
> >>>>>>> Current results show these experimental averages maximum throughput
> >>>>>>> migrating to /dev/null per each FdWrapper Pipe Size (as per QEMU QMP
> >>>>>>> "query-migrate", tests repeated 5 times for each).
> >>>>>>> VM Size is 60G, most of the memory effectively touched before migration,
> >>>>>>> through user application allocating and touching all memory with
> >>>>>>> pseudorandom data.
> >>>>>>>
> >>>>>>> 64K:     5200 Mbps (current situation)
> >>>>>>> 128K:    5800 Mbps
> >>>>>>> 256K:   20900 Mbps
> >>>>>>> 512K:   21600 Mbps
> >>>>>>> 1M:     22800 Mbps
> >>>>>>> 2M:     22800 Mbps
> >>>>>>> 4M:     22400 Mbps
> >>>>>>> 8M:     22500 Mbps
> >>>>>>> 16M:    22800 Mbps
> >>>>>>> 32M:    22900 Mbps
> >>>>>>> 64M:    22900 Mbps
> >>>>>>> 128M:   22800 Mbps
> >>>>>>>
> >>>>>>> This above is the throughput out of patched libvirt with multiple Pipe Sizes for the FDWrapper.
> >>>>>>
> >>>>>> Ok, its bouncing around with noise after 1 MB. So I'd suggest that
> >>>>>> libvirt attempt to raise the pipe limit to 1 MB by default, but
> >>>>>> not try to go higher.
> >>>>>>
> >>>>>>> As for the theoretical limit for the libvirt architecture,
> >>>>>>> I ran a qemu migration directly issuing the appropriate QMP
> >>>>>>> commands, setting the same migration parameters as per libvirt,
> >>>>>>> and then migrating to a socket netcatted to /dev/null via
> >>>>>>> {"execute": "migrate", "arguments": { "uri", "unix:///tmp/netcat.sock" } } :
> >>>>>>>
> >>>>>>> QMP:    37000 Mbps
> >>>>>>
> >>>>>>> So although the Pipe size improves things (in particular the
> >>>>>>> large jump is for the 256K size, although 1M seems a very good value),
> >>>>>>> there is still a second bottleneck in there somewhere that
> >>>>>>> accounts for a loss of ~14200 Mbps in throughput.
> >>>>
> >>>>
> >>>> Interesting addition: I tested quickly on a system with faster cpus and larger VM sizes, up to 200GB,
> >>>> and the difference in throughput libvirt vs qemu is basically the same ~14500 Mbps.
> >>>>
> >>>> ~50000 mbps qemu to netcat socket to /dev/null
> >>>> ~35500 mbps virsh save to /dev/null
> >>>>
> >>>> Seems it is not proportional to cpu speed by the looks of it (not a totally fair comparison because the VM sizes are different).
> >>>
> >>> It might be closer to RAM or cache bandwidth limited though; for an extra copy.
> >>
> >> I was thinking about sendfile(2) in iohelper, but that probably can't work as the input fd is a socket, I am getting EINVAL.
> >>
> >> One thing that I noticed is:
> >>
> >> commit afe6e58aedcd5e27ea16184fed90b338569bd042
> >> Author: Jiri Denemark <jdenemar@redhat.com>
> >> Date:   Mon Feb 6 14:40:48 2012 +0100
> >>
> >>     util: Generalize virFileDirectFd
> >>     
> >>     virFileDirectFd was used for accessing files opened with O_DIRECT using
> >>     libvirt_iohelper. We will want to use the helper for accessing files
> >>     regardless on O_DIRECT and thus virFileDirectFd was generalized and
> >>     renamed to virFileWrapperFd.
> >>
> >>
> >> And in particular the comment in src/util/virFile.c:
> >>
> >>     /* XXX support posix_fadvise rather than O_DIRECT, if the kernel support                                                                                                 
> >>      * for that is decent enough. In that case, we will also need to                                                                                                         
> >>      * explicitly support VIR_FILE_WRAPPER_NON_BLOCKING since                                                                                                                
> >>      * VIR_FILE_WRAPPER_BYPASS_CACHE alone will no longer require spawning                                                                                                   
> >>      * iohelper.                                                                                                                                                             
> >>      */
> >>
> >> by Jiri Denemark.
> >>
> >> I have lots of questions here, and I tried to involve Jiri and Andrea Righi here, who a long time ago proposed a POSIX_FADV_NOREUSE implementation.
> >>
> >> 1) What is the reason iohelper was introduced?
> > 
> > With POSIX you can't get sensible results from poll() on FDs associated with
> > plain files. It will always report the file as readable/writable, and the
> > userspace caller will get blocked any time the I/O operation causes the
> > kernel to read/write from the underlying (potentially very slow) storage.
> > 
> > IOW if you give QEMU an FD associated with a plain file and tell it to
> > migrate to that, the guest OS will get stalled.
> 
> we send a stop command to qemu just before migrating to a file in virsh save though right?
> With virsh restore we also first load the VM, and only then start executing it.
> 
> So for virsh save and virsh restore, this should not be a problem? Still we need the iohelper?

The same code is used in libvirt for other commands like 'virsh dump'
and snapshots, where the VM remains live though. In general I don't
think we should remove the iohelper, because QEMU code is written from
the POV that the channels honour O_NONBLOCK.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
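
For readers following the thread, a hypothetical sketch of the pattern being described here, not libvirt's actual iohelper: the VM process is handed the write end of a pipe, on which O_NONBLOCK and poll() behave sensibly, while a forked helper drains the read end into the file and is the only process that ever blocks on storage. O_DIRECT and the alignment handling the real iohelper adds are omitted:

#include <fcntl.h>
#include <unistd.h>

/* Returns the pipe's write end, to be handed to the VM process.
 * A forked helper copies everything from the read end into 'path'. */
static int spawn_file_helper(const char *path)
{
    int fds[2];

    if (pipe(fds) < 0)
        return -1;

    switch (fork()) {
    case -1:
        return -1;

    case 0: {   /* child: the "iohelper" */
        char buf[64 * 1024];
        int out = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        ssize_t n;

        close(fds[1]);
        if (out < 0)
            _exit(1);
        while ((n = read(fds[0], buf, sizeof(buf))) > 0) {
            if (write(out, buf, n) != n)   /* short-write handling omitted */
                _exit(1);
        }
        _exit(n < 0);
    }

    default:    /* parent: only the write end is kept */
        close(fds[0]);
        return fds[1];
    }
}

In the real code the helper is the separate libvirt_iohelper binary rather than a fork of the daemon, which is the robustness point made earlier in the thread.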



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [libvirt RFC] virFile: new VIR_FILE_WRAPPER_BIG_PIPE to improve performance
  2022-03-25 11:14                         ` Daniel P. Berrangé
@ 2022-03-25 11:16                           ` Claudio Fontana
  0 siblings, 0 replies; 30+ messages in thread
From: Claudio Fontana @ 2022-03-25 11:16 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: libvir-list, andrea.righi, Jiri Denemark, Dr. David Alan Gilbert,
	qemu-devel

On 3/25/22 12:14 PM, Daniel P. Berrangé wrote:
> On Fri, Mar 25, 2022 at 11:56:44AM +0100, Claudio Fontana wrote:
>> Thanks Daniel,
>>
>> On 3/25/22 11:33 AM, Daniel P. Berrangé wrote:
>>> On Fri, Mar 18, 2022 at 02:34:29PM +0100, Claudio Fontana wrote:
>>>> On 3/17/22 4:03 PM, Dr. David Alan Gilbert wrote:
>>>>> * Claudio Fontana (cfontana@suse.de) wrote:
>>>>>> On 3/17/22 2:41 PM, Claudio Fontana wrote:
>>>>>>> On 3/17/22 11:25 AM, Daniel P. Berrangé wrote:
>>>>>>>> On Thu, Mar 17, 2022 at 11:12:11AM +0100, Claudio Fontana wrote:
>>>>>>>>> On 3/16/22 1:17 PM, Claudio Fontana wrote:
>>>>>>>>>> On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
>>>>>>>>>>> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana wrote:
>>>>>>>>>>>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
>>>>>>>>>>>>> On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana wrote:
>>>>>>>>>>>>>> the first user is the qemu driver,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> virsh save/resume would slow to a crawl with a default pipe size (64k).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This improves the situation by 400%.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Going through io_helper still seems to incur in some penalty (~15%-ish)
>>>>>>>>>>>>>> compared with direct qemu migration to a nc socket to a file.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>  src/qemu/qemu_driver.c    |  6 +++---
>>>>>>>>>>>>>>  src/qemu/qemu_saveimage.c | 11 ++++++-----
>>>>>>>>>>>>>>  src/util/virfile.c        | 12 ++++++++++++
>>>>>>>>>>>>>>  src/util/virfile.h        |  1 +
>>>>>>>>>>>>>>  4 files changed, 22 insertions(+), 8 deletions(-)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello, I initially thought this to be a qemu performance issue,
>>>>>>>>>>>>>> so you can find the discussion about this in qemu-devel:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html
>>>>>>>>
>>>>>>>>
>>>>>>>>> Current results show these experimental averages maximum throughput
>>>>>>>>> migrating to /dev/null per each FdWrapper Pipe Size (as per QEMU QMP
>>>>>>>>> "query-migrate", tests repeated 5 times for each).
>>>>>>>>> VM Size is 60G, most of the memory effectively touched before migration,
>>>>>>>>> through user application allocating and touching all memory with
>>>>>>>>> pseudorandom data.
>>>>>>>>>
>>>>>>>>> 64K:     5200 Mbps (current situation)
>>>>>>>>> 128K:    5800 Mbps
>>>>>>>>> 256K:   20900 Mbps
>>>>>>>>> 512K:   21600 Mbps
>>>>>>>>> 1M:     22800 Mbps
>>>>>>>>> 2M:     22800 Mbps
>>>>>>>>> 4M:     22400 Mbps
>>>>>>>>> 8M:     22500 Mbps
>>>>>>>>> 16M:    22800 Mbps
>>>>>>>>> 32M:    22900 Mbps
>>>>>>>>> 64M:    22900 Mbps
>>>>>>>>> 128M:   22800 Mbps
>>>>>>>>>
>>>>>>>>> This above is the throughput out of patched libvirt with multiple Pipe Sizes for the FDWrapper.
>>>>>>>>
>>>>>>>> Ok, its bouncing around with noise after 1 MB. So I'd suggest that
>>>>>>>> libvirt attempt to raise the pipe limit to 1 MB by default, but
>>>>>>>> not try to go higher.
>>>>>>>>
>>>>>>>>> As for the theoretical limit for the libvirt architecture,
>>>>>>>>> I ran a qemu migration directly issuing the appropriate QMP
>>>>>>>>> commands, setting the same migration parameters as per libvirt,
>>>>>>>>> and then migrating to a socket netcatted to /dev/null via
>>>>>>>>> {"execute": "migrate", "arguments": { "uri", "unix:///tmp/netcat.sock" } } :
>>>>>>>>>
>>>>>>>>> QMP:    37000 Mbps
>>>>>>>>
>>>>>>>>> So although the Pipe size improves things (in particular the
>>>>>>>>> large jump is for the 256K size, although 1M seems a very good value),
>>>>>>>>> there is still a second bottleneck in there somewhere that
>>>>>>>>> accounts for a loss of ~14200 Mbps in throughput.
>>>>>>
>>>>>>
>>>>>> Interesting addition: I tested quickly on a system with faster cpus and larger VM sizes, up to 200GB,
>>>>>> and the difference in throughput libvirt vs qemu is basically the same ~14500 Mbps.
>>>>>>
>>>>>> ~50000 mbps qemu to netcat socket to /dev/null
>>>>>> ~35500 mbps virsh save to /dev/null
>>>>>>
>>>>>> Seems it is not proportional to cpu speed by the looks of it (not a totally fair comparison because the VM sizes are different).
>>>>>
>>>>> It might be closer to RAM or cache bandwidth limited though; for an extra copy.
>>>>
>>>> I was thinking about sendfile(2) in iohelper, but that probably can't work as the input fd is a socket, I am getting EINVAL.
>>>>
>>>> One thing that I noticed is:
>>>>
>>>> commit afe6e58aedcd5e27ea16184fed90b338569bd042
>>>> Author: Jiri Denemark <jdenemar@redhat.com>
>>>> Date:   Mon Feb 6 14:40:48 2012 +0100
>>>>
>>>>     util: Generalize virFileDirectFd
>>>>     
>>>>     virFileDirectFd was used for accessing files opened with O_DIRECT using
>>>>     libvirt_iohelper. We will want to use the helper for accessing files
>>>>     regardless on O_DIRECT and thus virFileDirectFd was generalized and
>>>>     renamed to virFileWrapperFd.
>>>>
>>>>
>>>> And in particular the comment in src/util/virFile.c:
>>>>
>>>>     /* XXX support posix_fadvise rather than O_DIRECT, if the kernel support                                                                                                 
>>>>      * for that is decent enough. In that case, we will also need to                                                                                                         
>>>>      * explicitly support VIR_FILE_WRAPPER_NON_BLOCKING since                                                                                                                
>>>>      * VIR_FILE_WRAPPER_BYPASS_CACHE alone will no longer require spawning                                                                                                   
>>>>      * iohelper.                                                                                                                                                             
>>>>      */
>>>>
>>>> by Jiri Denemark.
>>>>
>>>> I have lots of questions here, and I tried to involve Jiri and Andrea Righi here, who a long time ago proposed a POSIX_FADV_NOREUSE implementation.
>>>>
>>>> 1) What is the reason iohelper was introduced?
>>>
>>> With POSIX you can't get sensible results from poll() on FDs associated with
>>> plain files. It will always report the file as readable/writable, and the
>>> userspace caller will get blocked any time the I/O operation causes the
>>> kernel to read/write from the underlying (potentially very slow) storage.
>>>
>>> IOW if you give QEMU an FD associated with a plain file and tell it to
>>> migrate to that, the guest OS will get stalled.
>>
>> we send a stop command to qemu just before migrating to a file in virsh save though right?
>> With virsh restore we also first load the VM, and only then start executing it.
>>
>> So for virsh save and virsh restore, this should not be a problem? Still we need the iohelper?
> 
> The same code is used in libvirt for other commands like 'virsh dump'
> and snapshots, where the VM remains live though. In general I don't
> think we should remove the iohelper, because QEMU code is written from
> the POV that the channels honour O_NONBLOCK.
> 

Understood.. it is actually not transparent to QEMU anyway, indeed. Thanks for the clarification.

Claudio



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [libvirt RFC] virFile: new VIR_FILE_WRAPPER_BIG_PIPE to improve performance
  2022-03-18 13:34                   ` Claudio Fontana
  2022-03-21  7:55                     ` Andrea Righi
  2022-03-25 10:33                     ` Daniel P. Berrangé
@ 2022-03-25 11:29                     ` Daniel P. Berrangé
  2022-03-26 15:49                       ` Claudio Fontana
  2 siblings, 1 reply; 30+ messages in thread
From: Daniel P. Berrangé @ 2022-03-25 11:29 UTC (permalink / raw)
  To: Claudio Fontana
  Cc: libvir-list, andrea.righi, Jiri Denemark, Dr. David Alan Gilbert,
	qemu-devel

On Fri, Mar 18, 2022 at 02:34:29PM +0100, Claudio Fontana wrote:
> On 3/17/22 4:03 PM, Dr. David Alan Gilbert wrote:
> > * Claudio Fontana (cfontana@suse.de) wrote:
> >> On 3/17/22 2:41 PM, Claudio Fontana wrote:
> >>> On 3/17/22 11:25 AM, Daniel P. Berrangé wrote:
> >>>> On Thu, Mar 17, 2022 at 11:12:11AM +0100, Claudio Fontana wrote:
> >>>>> On 3/16/22 1:17 PM, Claudio Fontana wrote:
> >>>>>> On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
> >>>>>>> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana wrote:
> >>>>>>>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
> >>>>>>>>> On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana wrote:
> >>>>>>>>>> the first user is the qemu driver,
> >>>>>>>>>>
> >>>>>>>>>> virsh save/resume would slow to a crawl with a default pipe size (64k).
> >>>>>>>>>>
> >>>>>>>>>> This improves the situation by 400%.
> >>>>>>>>>>
> >>>>>>>>>> Going through io_helper still seems to incur in some penalty (~15%-ish)
> >>>>>>>>>> compared with direct qemu migration to a nc socket to a file.
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
> >>>>>>>>>> ---
> >>>>>>>>>>  src/qemu/qemu_driver.c    |  6 +++---
> >>>>>>>>>>  src/qemu/qemu_saveimage.c | 11 ++++++-----
> >>>>>>>>>>  src/util/virfile.c        | 12 ++++++++++++
> >>>>>>>>>>  src/util/virfile.h        |  1 +
> >>>>>>>>>>  4 files changed, 22 insertions(+), 8 deletions(-)
> >>>>>>>>>>
> >>>>>>>>>> Hello, I initially thought this to be a qemu performance issue,
> >>>>>>>>>> so you can find the discussion about this in qemu-devel:
> >>>>>>>>>>
> >>>>>>>>>> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
> >>>>>>>>>>
> >>>>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html
> >>>>
> >>>>
> >>>>> Current results show these experimental averages maximum throughput
> >>>>> migrating to /dev/null per each FdWrapper Pipe Size (as per QEMU QMP
> >>>>> "query-migrate", tests repeated 5 times for each).
> >>>>> VM Size is 60G, most of the memory effectively touched before migration,
> >>>>> through user application allocating and touching all memory with
> >>>>> pseudorandom data.
> >>>>>
> >>>>> 64K:     5200 Mbps (current situation)
> >>>>> 128K:    5800 Mbps
> >>>>> 256K:   20900 Mbps
> >>>>> 512K:   21600 Mbps
> >>>>> 1M:     22800 Mbps
> >>>>> 2M:     22800 Mbps
> >>>>> 4M:     22400 Mbps
> >>>>> 8M:     22500 Mbps
> >>>>> 16M:    22800 Mbps
> >>>>> 32M:    22900 Mbps
> >>>>> 64M:    22900 Mbps
> >>>>> 128M:   22800 Mbps
> >>>>>
> >>>>> This above is the throughput out of patched libvirt with multiple Pipe Sizes for the FDWrapper.
> >>>>
> >>>> Ok, its bouncing around with noise after 1 MB. So I'd suggest that
> >>>> libvirt attempt to raise the pipe limit to 1 MB by default, but
> >>>> not try to go higher.
> >>>>
> >>>>> As for the theoretical limit for the libvirt architecture,
> >>>>> I ran a qemu migration directly issuing the appropriate QMP
> >>>>> commands, setting the same migration parameters as per libvirt,
> >>>>> and then migrating to a socket netcatted to /dev/null via
> >>>>> {"execute": "migrate", "arguments": { "uri", "unix:///tmp/netcat.sock" } } :
> >>>>>
> >>>>> QMP:    37000 Mbps
> >>>>
> >>>>> So although the Pipe size improves things (in particular the
> >>>>> large jump is for the 256K size, although 1M seems a very good value),
> >>>>> there is still a second bottleneck in there somewhere that
> >>>>> accounts for a loss of ~14200 Mbps in throughput.
> >>
> >>
> >> Interesting addition: I tested quickly on a system with faster cpus and larger VM sizes, up to 200GB,
> >> and the difference in throughput libvirt vs qemu is basically the same ~14500 Mbps.
> >>
> >> ~50000 mbps qemu to netcat socket to /dev/null
> >> ~35500 mbps virsh save to /dev/null
> >>
> >> Seems it is not proportional to cpu speed by the looks of it (not a totally fair comparison because the VM sizes are different).
> > 
> > It might be closer to RAM or cache bandwidth limited though; for an extra copy.
> 
> I was thinking about sendfile(2) in iohelper, but that probably
> can't work as the input fd is a socket, I am getting EINVAL.

Yep, sendfile() requires the input to be a mmapable FD,
and the output to be a socket.

Try splice() instead  which merely requires 1 end to be a
pipe, and the other end can be any FD afaik.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [libvirt RFC] virFile: new VIR_FILE_WRAPPER_BIG_PIPE to improve performance
  2022-03-25 11:29                     ` Daniel P. Berrangé
@ 2022-03-26 15:49                       ` Claudio Fontana
  2022-03-26 17:38                         ` Claudio Fontana
                                           ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Claudio Fontana @ 2022-03-26 15:49 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: libvir-list, andrea.righi, Jiri Denemark, Dr. David Alan Gilbert,
	qemu-devel

On 3/25/22 12:29 PM, Daniel P. Berrangé wrote:
> On Fri, Mar 18, 2022 at 02:34:29PM +0100, Claudio Fontana wrote:
>> On 3/17/22 4:03 PM, Dr. David Alan Gilbert wrote:
>>> * Claudio Fontana (cfontana@suse.de) wrote:
>>>> On 3/17/22 2:41 PM, Claudio Fontana wrote:
>>>>> On 3/17/22 11:25 AM, Daniel P. Berrangé wrote:
>>>>>> On Thu, Mar 17, 2022 at 11:12:11AM +0100, Claudio Fontana wrote:
>>>>>>> On 3/16/22 1:17 PM, Claudio Fontana wrote:
>>>>>>>> On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
>>>>>>>>> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana wrote:
>>>>>>>>>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
>>>>>>>>>>> On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana wrote:
>>>>>>>>>>>> the first user is the qemu driver,
>>>>>>>>>>>>
>>>>>>>>>>>> virsh save/resume would slow to a crawl with a default pipe size (64k).
>>>>>>>>>>>>
>>>>>>>>>>>> This improves the situation by 400%.
>>>>>>>>>>>>
>>>>>>>>>>>> Going through io_helper still seems to incur in some penalty (~15%-ish)
>>>>>>>>>>>> compared with direct qemu migration to a nc socket to a file.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
>>>>>>>>>>>> ---
>>>>>>>>>>>>  src/qemu/qemu_driver.c    |  6 +++---
>>>>>>>>>>>>  src/qemu/qemu_saveimage.c | 11 ++++++-----
>>>>>>>>>>>>  src/util/virfile.c        | 12 ++++++++++++
>>>>>>>>>>>>  src/util/virfile.h        |  1 +
>>>>>>>>>>>>  4 files changed, 22 insertions(+), 8 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> Hello, I initially thought this to be a qemu performance issue,
>>>>>>>>>>>> so you can find the discussion about this in qemu-devel:
>>>>>>>>>>>>
>>>>>>>>>>>> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
>>>>>>>>>>>>
>>>>>>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html
>>>>>>
>>>>>>
>>>>>>> Current results show these experimental averages maximum throughput
>>>>>>> migrating to /dev/null per each FdWrapper Pipe Size (as per QEMU QMP
>>>>>>> "query-migrate", tests repeated 5 times for each).
>>>>>>> VM Size is 60G, most of the memory effectively touched before migration,
>>>>>>> through user application allocating and touching all memory with
>>>>>>> pseudorandom data.
>>>>>>>
>>>>>>> 64K:     5200 Mbps (current situation)
>>>>>>> 128K:    5800 Mbps
>>>>>>> 256K:   20900 Mbps
>>>>>>> 512K:   21600 Mbps
>>>>>>> 1M:     22800 Mbps
>>>>>>> 2M:     22800 Mbps
>>>>>>> 4M:     22400 Mbps
>>>>>>> 8M:     22500 Mbps
>>>>>>> 16M:    22800 Mbps
>>>>>>> 32M:    22900 Mbps
>>>>>>> 64M:    22900 Mbps
>>>>>>> 128M:   22800 Mbps
>>>>>>>
>>>>>>> This above is the throughput out of patched libvirt with multiple Pipe Sizes for the FDWrapper.
>>>>>>
>>>>>> Ok, its bouncing around with noise after 1 MB. So I'd suggest that
>>>>>> libvirt attempt to raise the pipe limit to 1 MB by default, but
>>>>>> not try to go higher.
>>>>>>
>>>>>>> As for the theoretical limit for the libvirt architecture,
>>>>>>> I ran a qemu migration directly issuing the appropriate QMP
>>>>>>> commands, setting the same migration parameters as per libvirt,
>>>>>>> and then migrating to a socket netcatted to /dev/null via
>>>>>>> {"execute": "migrate", "arguments": { "uri", "unix:///tmp/netcat.sock" } } :
>>>>>>>
>>>>>>> QMP:    37000 Mbps
>>>>>>
>>>>>>> So although the Pipe size improves things (in particular the
>>>>>>> large jump is for the 256K size, although 1M seems a very good value),
>>>>>>> there is still a second bottleneck in there somewhere that
>>>>>>> accounts for a loss of ~14200 Mbps in throughput.
>>>>
>>>>
>>>> Interesting addition: I tested quickly on a system with faster cpus and larger VM sizes, up to 200GB,
>>>> and the difference in throughput libvirt vs qemu is basically the same ~14500 Mbps.
>>>>
>>>> ~50000 mbps qemu to netcat socket to /dev/null
>>>> ~35500 mbps virsh save to /dev/null
>>>>
>>>> Seems it is not proportional to cpu speed by the looks of it (not a totally fair comparison because the VM sizes are different).
>>>
>>> It might be closer to RAM or cache bandwidth limited though; for an extra copy.
>>
>> I was thinking about sendfile(2) in iohelper, but that probably
>> can't work as the input fd is a socket, I am getting EINVAL.
> 
> Yep, sendfile() requires the input to be a mmapable FD,
> and the output to be a socket.
> 
> Try splice() instead  which merely requires 1 end to be a
> pipe, and the other end can be any FD afaik.
> 
> With regards,
> Daniel
> 

I did try splice(), but performance is worse by around 500%.

It also fails with EINVAL when trying to use it in combination with O_DIRECT.

Tried larger and smaller buffers, flags like SPLICE_F_MORE and SPLICE_F_MOVE in any combination; no change, just awful performance.

Here is the code:

#ifdef __linux__
+static ssize_t safesplice(int fdin, int fdout, size_t todo)
+{
+    unsigned int flags = SPLICE_F_MOVE | SPLICE_F_MORE;
+    ssize_t ncopied = 0;
+
+    while (todo > 0) {
+        ssize_t r = splice(fdin, NULL, fdout, NULL, todo, flags);
+        if (r < 0 && errno == EINTR)
+            continue;
+        if (r < 0)
+            return r;
+        if (r == 0)
+            return ncopied;
+        todo -= r;
+        ncopied += r;
+    }
+    return ncopied;
+}
+
+static ssize_t runIOCopy(const struct runIOParams p)
+{
+    size_t len = 1024 * 1024;
+    ssize_t total = 0;
+
+    while (1) {
+        ssize_t got = safesplice(p.fdin, p.fdout, len);
+        if (got < 0)
+            return -1;
+        if (got == 0)
+            break;
+
+        total += got;
+
+        /* handle last write truncate in direct case */
+        if (got < len && p.isDirect && p.isWrite && !p.isBlockDev) {
+            if (ftruncate(p.fdout, total) < 0) {
+                return -4;
+            }
+            break;
+        }
+    }
+    return total;
+}
+
+#endif


Any ideas welcome,

Claudio



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [libvirt RFC] virFile: new VIR_FILE_WRAPPER_BIG_PIPE to improve performance
  2022-03-26 15:49                       ` Claudio Fontana
@ 2022-03-26 17:38                         ` Claudio Fontana
  2022-03-28  8:31                         ` Daniel P. Berrangé
  2022-03-28 10:47                         ` Claudio Fontana
  2 siblings, 0 replies; 30+ messages in thread
From: Claudio Fontana @ 2022-03-26 17:38 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: libvir-list, andrea.righi, Jiri Denemark, Dr. David Alan Gilbert,
	qemu-devel

On 3/26/22 4:49 PM, Claudio Fontana wrote:
> On 3/25/22 12:29 PM, Daniel P. Berrangé wrote:
>> On Fri, Mar 18, 2022 at 02:34:29PM +0100, Claudio Fontana wrote:
>>> On 3/17/22 4:03 PM, Dr. David Alan Gilbert wrote:
>>>> * Claudio Fontana (cfontana@suse.de) wrote:
>>>>> On 3/17/22 2:41 PM, Claudio Fontana wrote:
>>>>>> On 3/17/22 11:25 AM, Daniel P. Berrangé wrote:
>>>>>>> On Thu, Mar 17, 2022 at 11:12:11AM +0100, Claudio Fontana wrote:
>>>>>>>> On 3/16/22 1:17 PM, Claudio Fontana wrote:
>>>>>>>>> On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
>>>>>>>>>> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana wrote:
>>>>>>>>>>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
>>>>>>>>>>>> On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana wrote:
>>>>>>>>>>>>> the first user is the qemu driver,
>>>>>>>>>>>>>
>>>>>>>>>>>>> virsh save/resume would slow to a crawl with a default pipe size (64k).
>>>>>>>>>>>>>
>>>>>>>>>>>>> This improves the situation by 400%.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Going through io_helper still seems to incur in some penalty (~15%-ish)
>>>>>>>>>>>>> compared with direct qemu migration to a nc socket to a file.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>>  src/qemu/qemu_driver.c    |  6 +++---
>>>>>>>>>>>>>  src/qemu/qemu_saveimage.c | 11 ++++++-----
>>>>>>>>>>>>>  src/util/virfile.c        | 12 ++++++++++++
>>>>>>>>>>>>>  src/util/virfile.h        |  1 +
>>>>>>>>>>>>>  4 files changed, 22 insertions(+), 8 deletions(-)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hello, I initially thought this to be a qemu performance issue,
>>>>>>>>>>>>> so you can find the discussion about this in qemu-devel:
>>>>>>>>>>>>>
>>>>>>>>>>>>> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html
>>>>>>>
>>>>>>>
>>>>>>>> Current results show these experimental averages maximum throughput
>>>>>>>> migrating to /dev/null per each FdWrapper Pipe Size (as per QEMU QMP
>>>>>>>> "query-migrate", tests repeated 5 times for each).
>>>>>>>> VM Size is 60G, most of the memory effectively touched before migration,
>>>>>>>> through user application allocating and touching all memory with
>>>>>>>> pseudorandom data.
>>>>>>>>
>>>>>>>> 64K:     5200 Mbps (current situation)
>>>>>>>> 128K:    5800 Mbps
>>>>>>>> 256K:   20900 Mbps
>>>>>>>> 512K:   21600 Mbps
>>>>>>>> 1M:     22800 Mbps
>>>>>>>> 2M:     22800 Mbps
>>>>>>>> 4M:     22400 Mbps
>>>>>>>> 8M:     22500 Mbps
>>>>>>>> 16M:    22800 Mbps
>>>>>>>> 32M:    22900 Mbps
>>>>>>>> 64M:    22900 Mbps
>>>>>>>> 128M:   22800 Mbps
>>>>>>>>
>>>>>>>> This above is the throughput out of patched libvirt with multiple Pipe Sizes for the FDWrapper.
>>>>>>>
>>>>>>> Ok, its bouncing around with noise after 1 MB. So I'd suggest that
>>>>>>> libvirt attempt to raise the pipe limit to 1 MB by default, but
>>>>>>> not try to go higher.
>>>>>>>
>>>>>>>> As for the theoretical limit for the libvirt architecture,
>>>>>>>> I ran a qemu migration directly issuing the appropriate QMP
>>>>>>>> commands, setting the same migration parameters as per libvirt,
>>>>>>>> and then migrating to a socket netcatted to /dev/null via
>>>>>>>> {"execute": "migrate", "arguments": { "uri", "unix:///tmp/netcat.sock" } } :
>>>>>>>>
>>>>>>>> QMP:    37000 Mbps
>>>>>>>
>>>>>>>> So although the Pipe size improves things (in particular the
>>>>>>>> large jump is for the 256K size, although 1M seems a very good value),
>>>>>>>> there is still a second bottleneck in there somewhere that
>>>>>>>> accounts for a loss of ~14200 Mbps in throughput.
>>>>>
>>>>>
>>>>> Interesting addition: I tested quickly on a system with faster cpus and larger VM sizes, up to 200GB,
>>>>> and the difference in throughput libvirt vs qemu is basically the same ~14500 Mbps.
>>>>>
>>>>> ~50000 mbps qemu to netcat socket to /dev/null
>>>>> ~35500 mbps virsh save to /dev/null
>>>>>
>>>>> Seems it is not proportional to cpu speed by the looks of it (not a totally fair comparison because the VM sizes are different).
>>>>
>>>> It might be closer to RAM or cache bandwidth limited though; for an extra copy.
>>>
>>> I was thinking about sendfile(2) in iohelper, but that probably
>>> can't work as the input fd is a socket, I am getting EINVAL.
>>
>> Yep, sendfile() requires the input to be a mmapable FD,
>> and the output to be a socket.
>>
>> Try splice() instead  which merely requires 1 end to be a
>> pipe, and the other end can be any FD afaik.
>>
>> With regards,
>> Daniel
>>
> 
> I did try splice(), but performance is worse by around 500%.
> 
> It also fails with EINVAL when trying to use it in combination with O_DIRECT.
> 
> Tried larger and smaller buffers, flags like SPLICE_F_MORE and SPLICE_F_MOVE in any combination; no change, just awful performance.


When reading from the save file instead (i.e. during virsh restore), splice performance is actually ok-ish, though still slightly worse than plain read/write.

Claudio

> 
> Here is the code:
> 
> #ifdef __linux__
> +static ssize_t safesplice(int fdin, int fdout, size_t todo)
> +{
> +    unsigned int flags = SPLICE_F_MOVE | SPLICE_F_MORE;
> +    ssize_t ncopied = 0;
> +
> +    while (todo > 0) {
> +        ssize_t r = splice(fdin, NULL, fdout, NULL, todo, flags);
> +        if (r < 0 && errno == EINTR)
> +            continue;
> +        if (r < 0)
> +            return r;
> +        if (r == 0)
> +            return ncopied;
> +        todo -= r;
> +        ncopied += r;
> +    }
> +    return ncopied;
> +}
> +
> +static ssize_t runIOCopy(const struct runIOParams p)
> +{
> +    size_t len = 1024 * 1024;
> +    ssize_t total = 0;
> +
> +    while (1) {
> +        ssize_t got = safesplice(p.fdin, p.fdout, len);
> +        if (got < 0)
> +            return -1;
> +        if (got == 0)
> +            break;
> +
> +        total += got;
> +
> +        /* handle last write truncate in direct case */
> +        if (got < len && p.isDirect && p.isWrite && !p.isBlockDev) {
> +            if (ftruncate(p.fdout, total) < 0) {
> +                return -4;
> +            }
> +            break;
> +        }
> +    }
> +    return total;
> +}
> +
> +#endif
> 
> 
> Any ideas welcome,
> 
> Claudio
> 



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [libvirt RFC] virFile: new VIR_FILE_WRAPPER_BIG_PIPE to improve performance
  2022-03-26 15:49                       ` Claudio Fontana
  2022-03-26 17:38                         ` Claudio Fontana
@ 2022-03-28  8:31                         ` Daniel P. Berrangé
  2022-03-28  9:19                           ` Claudio Fontana
  2022-03-28  9:31                           ` Claudio Fontana
  2022-03-28 10:47                         ` Claudio Fontana
  2 siblings, 2 replies; 30+ messages in thread
From: Daniel P. Berrangé @ 2022-03-28  8:31 UTC (permalink / raw)
  To: Claudio Fontana
  Cc: libvir-list, andrea.righi, Jiri Denemark, Dr. David Alan Gilbert,
	qemu-devel

On Sat, Mar 26, 2022 at 04:49:46PM +0100, Claudio Fontana wrote:
> On 3/25/22 12:29 PM, Daniel P. Berrangé wrote:
> > On Fri, Mar 18, 2022 at 02:34:29PM +0100, Claudio Fontana wrote:
> >> On 3/17/22 4:03 PM, Dr. David Alan Gilbert wrote:
> >>> * Claudio Fontana (cfontana@suse.de) wrote:
> >>>> On 3/17/22 2:41 PM, Claudio Fontana wrote:
> >>>>> On 3/17/22 11:25 AM, Daniel P. Berrangé wrote:
> >>>>>> On Thu, Mar 17, 2022 at 11:12:11AM +0100, Claudio Fontana wrote:
> >>>>>>> On 3/16/22 1:17 PM, Claudio Fontana wrote:
> >>>>>>>> On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
> >>>>>>>>> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana wrote:
> >>>>>>>>>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
> >>>>>>>>>>> On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana wrote:
> >>>>>>>>>>>> the first user is the qemu driver,
> >>>>>>>>>>>>
> >>>>>>>>>>>> virsh save/resume would slow to a crawl with a default pipe size (64k).
> >>>>>>>>>>>>
> >>>>>>>>>>>> This improves the situation by 400%.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Going through io_helper still seems to incur in some penalty (~15%-ish)
> >>>>>>>>>>>> compared with direct qemu migration to a nc socket to a file.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
> >>>>>>>>>>>> ---
> >>>>>>>>>>>>  src/qemu/qemu_driver.c    |  6 +++---
> >>>>>>>>>>>>  src/qemu/qemu_saveimage.c | 11 ++++++-----
> >>>>>>>>>>>>  src/util/virfile.c        | 12 ++++++++++++
> >>>>>>>>>>>>  src/util/virfile.h        |  1 +
> >>>>>>>>>>>>  4 files changed, 22 insertions(+), 8 deletions(-)
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hello, I initially thought this to be a qemu performance issue,
> >>>>>>>>>>>> so you can find the discussion about this in qemu-devel:
> >>>>>>>>>>>>
> >>>>>>>>>>>> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
> >>>>>>>>>>>>
> >>>>>>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html
> >>>>>>
> >>>>>>
> >>>>>>> Current results show these experimental averages maximum throughput
> >>>>>>> migrating to /dev/null per each FdWrapper Pipe Size (as per QEMU QMP
> >>>>>>> "query-migrate", tests repeated 5 times for each).
> >>>>>>> VM Size is 60G, most of the memory effectively touched before migration,
> >>>>>>> through user application allocating and touching all memory with
> >>>>>>> pseudorandom data.
> >>>>>>>
> >>>>>>> 64K:     5200 Mbps (current situation)
> >>>>>>> 128K:    5800 Mbps
> >>>>>>> 256K:   20900 Mbps
> >>>>>>> 512K:   21600 Mbps
> >>>>>>> 1M:     22800 Mbps
> >>>>>>> 2M:     22800 Mbps
> >>>>>>> 4M:     22400 Mbps
> >>>>>>> 8M:     22500 Mbps
> >>>>>>> 16M:    22800 Mbps
> >>>>>>> 32M:    22900 Mbps
> >>>>>>> 64M:    22900 Mbps
> >>>>>>> 128M:   22800 Mbps
> >>>>>>>
> >>>>>>> This above is the throughput out of patched libvirt with multiple Pipe Sizes for the FDWrapper.
> >>>>>>
> >>>>>> Ok, its bouncing around with noise after 1 MB. So I'd suggest that
> >>>>>> libvirt attempt to raise the pipe limit to 1 MB by default, but
> >>>>>> not try to go higher.
> >>>>>>
> >>>>>>> As for the theoretical limit for the libvirt architecture,
> >>>>>>> I ran a qemu migration directly issuing the appropriate QMP
> >>>>>>> commands, setting the same migration parameters as per libvirt,
> >>>>>>> and then migrating to a socket netcatted to /dev/null via
> >>>>>>> {"execute": "migrate", "arguments": { "uri", "unix:///tmp/netcat.sock" } } :
> >>>>>>>
> >>>>>>> QMP:    37000 Mbps
> >>>>>>
> >>>>>>> So although the Pipe size improves things (in particular the
> >>>>>>> large jump is for the 256K size, although 1M seems a very good value),
> >>>>>>> there is still a second bottleneck in there somewhere that
> >>>>>>> accounts for a loss of ~14200 Mbps in throughput.
> >>>>
> >>>>
> >>>> Interesting addition: I tested quickly on a system with faster cpus and larger VM sizes, up to 200GB,
> >>>> and the difference in throughput libvirt vs qemu is basically the same ~14500 Mbps.
> >>>>
> >>>> ~50000 mbps qemu to netcat socket to /dev/null
> >>>> ~35500 mbps virsh save to /dev/null
> >>>>
> >>>> Seems it is not proportional to cpu speed by the looks of it (not a totally fair comparison because the VM sizes are different).
> >>>
> >>> It might be closer to RAM or cache bandwidth limited though; for an extra copy.
> >>
> >> I was thinking about sendfile(2) in iohelper, but that probably
> >> can't work as the input fd is a socket, I am getting EINVAL.
> > 
> > Yep, sendfile() requires the input to be a mmapable FD,
> > and the output to be a socket.
> > 
> > Try splice() instead  which merely requires 1 end to be a
> > pipe, and the other end can be any FD afaik.
> > 
> 
> I did try splice(), but performance is worse by around 500%.

Hmm, that's certainly unexpected !

> Any ideas welcome,

I learnt there is also a newer  copy_file_range call, not sure if that's
any better.

You passed len as 1 MB, I wonder if passing MAXINT is viable ? We just
want to copy everything IIRC.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
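
For reference, a sketch of how copy_file_range(2) could slot into such a copy loop is below; the helper name is illustrative only. As Claudio reports further down the thread, it is not usable here, because on the setups discussed it fails with EINVAL unless both descriptors refer to regular files, and the iohelper's input is a pipe:

#define _GNU_SOURCE
#include <errno.h>
#include <unistd.h>

/* Copies up to 'todo' bytes between two regular files, letting the
 * kernel move the data without a userspace buffer.  Illustrative only:
 * not usable when one end is a pipe or socket. */
static ssize_t copy_range_all(int fdin, int fdout, size_t todo)
{
    ssize_t total = 0;

    while (todo > 0) {
        ssize_t r = copy_file_range(fdin, NULL, fdout, NULL, todo, 0);
        if (r < 0 && errno == EINTR)
            continue;
        if (r < 0)
            return -1;
        if (r == 0)
            break;              /* end of input */
        todo -= r;
        total += r;
    }
    return total;
}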



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [libvirt RFC] virFile: new VIR_FILE_WRAPPER_BIG_PIPE to improve performance
  2022-03-28  8:31                         ` Daniel P. Berrangé
@ 2022-03-28  9:19                           ` Claudio Fontana
  2022-03-28  9:41                             ` Claudio Fontana
  2022-03-28  9:31                           ` Claudio Fontana
  1 sibling, 1 reply; 30+ messages in thread
From: Claudio Fontana @ 2022-03-28  9:19 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: libvir-list, andrea.righi, Jiri Denemark, Dr. David Alan Gilbert,
	qemu-devel

On 3/28/22 10:31 AM, Daniel P. Berrangé wrote:
> On Sat, Mar 26, 2022 at 04:49:46PM +0100, Claudio Fontana wrote:
>> On 3/25/22 12:29 PM, Daniel P. Berrangé wrote:
>>> On Fri, Mar 18, 2022 at 02:34:29PM +0100, Claudio Fontana wrote:
>>>> On 3/17/22 4:03 PM, Dr. David Alan Gilbert wrote:
>>>>> * Claudio Fontana (cfontana@suse.de) wrote:
>>>>>> On 3/17/22 2:41 PM, Claudio Fontana wrote:
>>>>>>> On 3/17/22 11:25 AM, Daniel P. Berrangé wrote:
>>>>>>>> On Thu, Mar 17, 2022 at 11:12:11AM +0100, Claudio Fontana wrote:
>>>>>>>>> On 3/16/22 1:17 PM, Claudio Fontana wrote:
>>>>>>>>>> On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
>>>>>>>>>>> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana wrote:
>>>>>>>>>>>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
>>>>>>>>>>>>> On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana wrote:
>>>>>>>>>>>>>> the first user is the qemu driver,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> virsh save/resume would slow to a crawl with a default pipe size (64k).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This improves the situation by 400%.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Going through io_helper still seems to incur in some penalty (~15%-ish)
>>>>>>>>>>>>>> compared with direct qemu migration to a nc socket to a file.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>  src/qemu/qemu_driver.c    |  6 +++---
>>>>>>>>>>>>>>  src/qemu/qemu_saveimage.c | 11 ++++++-----
>>>>>>>>>>>>>>  src/util/virfile.c        | 12 ++++++++++++
>>>>>>>>>>>>>>  src/util/virfile.h        |  1 +
>>>>>>>>>>>>>>  4 files changed, 22 insertions(+), 8 deletions(-)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello, I initially thought this to be a qemu performance issue,
>>>>>>>>>>>>>> so you can find the discussion about this in qemu-devel:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html
>>>>>>>>
>>>>>>>>
>>>>>>>>> Current results show these experimental averages maximum throughput
>>>>>>>>> migrating to /dev/null per each FdWrapper Pipe Size (as per QEMU QMP
>>>>>>>>> "query-migrate", tests repeated 5 times for each).
>>>>>>>>> VM Size is 60G, most of the memory effectively touched before migration,
>>>>>>>>> through user application allocating and touching all memory with
>>>>>>>>> pseudorandom data.
>>>>>>>>>
>>>>>>>>> 64K:     5200 Mbps (current situation)
>>>>>>>>> 128K:    5800 Mbps
>>>>>>>>> 256K:   20900 Mbps
>>>>>>>>> 512K:   21600 Mbps
>>>>>>>>> 1M:     22800 Mbps
>>>>>>>>> 2M:     22800 Mbps
>>>>>>>>> 4M:     22400 Mbps
>>>>>>>>> 8M:     22500 Mbps
>>>>>>>>> 16M:    22800 Mbps
>>>>>>>>> 32M:    22900 Mbps
>>>>>>>>> 64M:    22900 Mbps
>>>>>>>>> 128M:   22800 Mbps
>>>>>>>>>
>>>>>>>>> This above is the throughput out of patched libvirt with multiple Pipe Sizes for the FDWrapper.
>>>>>>>>
>>>>>>>> Ok, its bouncing around with noise after 1 MB. So I'd suggest that
>>>>>>>> libvirt attempt to raise the pipe limit to 1 MB by default, but
>>>>>>>> not try to go higher.
>>>>>>>>
>>>>>>>>> As for the theoretical limit for the libvirt architecture,
>>>>>>>>> I ran a qemu migration directly issuing the appropriate QMP
>>>>>>>>> commands, setting the same migration parameters as per libvirt,
>>>>>>>>> and then migrating to a socket netcatted to /dev/null via
>>>>>>>>> {"execute": "migrate", "arguments": { "uri", "unix:///tmp/netcat.sock" } } :
>>>>>>>>>
>>>>>>>>> QMP:    37000 Mbps
>>>>>>>>
>>>>>>>>> So although the Pipe size improves things (in particular the
>>>>>>>>> large jump is for the 256K size, although 1M seems a very good value),
>>>>>>>>> there is still a second bottleneck in there somewhere that
>>>>>>>>> accounts for a loss of ~14200 Mbps in throughput.
>>>>>>
>>>>>>
>>>>>> Interesting addition: I tested quickly on a system with faster cpus and larger VM sizes, up to 200GB,
>>>>>> and the difference in throughput libvirt vs qemu is basically the same ~14500 Mbps.
>>>>>>
>>>>>> ~50000 mbps qemu to netcat socket to /dev/null
>>>>>> ~35500 mbps virsh save to /dev/null
>>>>>>
>>>>>> Seems it is not proportional to cpu speed by the looks of it (not a totally fair comparison because the VM sizes are different).
>>>>>
>>>>> It might be closer to RAM or cache bandwidth limited though; for an extra copy.
>>>>
>>>> I was thinking about sendfile(2) in iohelper, but that probably
>>>> can't work as the input fd is a socket, I am getting EINVAL.
>>>
>>> Yep, sendfile() requires the input to be a mmapable FD,
>>> and the output to be a socket.
>>>
>>> Try splice() instead  which merely requires 1 end to be a
>>> pipe, and the other end can be any FD afaik.
>>>
>>
>> I did try splice(), but performance is worse by around 500%.
> 
> Hmm, that's certainly unexpected !
> 
>> Any ideas welcome,
> 
> I learnt there is also a newer  copy_file_range call, not sure if that's
> any better.
> 
> You passed len as 1 MB, I wonder if passing MAXINT is viable ? We just
> want to copy everything IIRC.
> 
> With regards,
> Daniel
> 

Hi Daniel, tried also up to 64MB, no improvement with splice.

I'll take a look at copy_file_range,

Thanks!

Claudio


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [libvirt RFC] virFile: new VIR_FILE_WRAPPER_BIG_PIPE to improve performance
  2022-03-28  8:31                         ` Daniel P. Berrangé
  2022-03-28  9:19                           ` Claudio Fontana
@ 2022-03-28  9:31                           ` Claudio Fontana
  2022-04-05  8:35                             ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 30+ messages in thread
From: Claudio Fontana @ 2022-03-28  9:31 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: libvir-list, andrea.righi, Jiri Denemark, Dr. David Alan Gilbert,
	qemu-devel

On 3/28/22 10:31 AM, Daniel P. Berrangé wrote:
> On Sat, Mar 26, 2022 at 04:49:46PM +0100, Claudio Fontana wrote:
>> On 3/25/22 12:29 PM, Daniel P. Berrangé wrote:
>>> On Fri, Mar 18, 2022 at 02:34:29PM +0100, Claudio Fontana wrote:
>>>> On 3/17/22 4:03 PM, Dr. David Alan Gilbert wrote:
>>>>> * Claudio Fontana (cfontana@suse.de) wrote:
>>>>>> On 3/17/22 2:41 PM, Claudio Fontana wrote:
>>>>>>> On 3/17/22 11:25 AM, Daniel P. Berrangé wrote:
>>>>>>>> On Thu, Mar 17, 2022 at 11:12:11AM +0100, Claudio Fontana wrote:
>>>>>>>>> On 3/16/22 1:17 PM, Claudio Fontana wrote:
>>>>>>>>>> On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
>>>>>>>>>>> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana wrote:
>>>>>>>>>>>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
>>>>>>>>>>>>> On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana wrote:
>>>>>>>>>>>>>> the first user is the qemu driver,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> virsh save/resume would slow to a crawl with a default pipe size (64k).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This improves the situation by 400%.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Going through io_helper still seems to incur in some penalty (~15%-ish)
>>>>>>>>>>>>>> compared with direct qemu migration to a nc socket to a file.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>  src/qemu/qemu_driver.c    |  6 +++---
>>>>>>>>>>>>>>  src/qemu/qemu_saveimage.c | 11 ++++++-----
>>>>>>>>>>>>>>  src/util/virfile.c        | 12 ++++++++++++
>>>>>>>>>>>>>>  src/util/virfile.h        |  1 +
>>>>>>>>>>>>>>  4 files changed, 22 insertions(+), 8 deletions(-)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello, I initially thought this to be a qemu performance issue,
>>>>>>>>>>>>>> so you can find the discussion about this in qemu-devel:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html
>>>>>>>>
>>>>>>>>
>>>>>>>>> Current results show these experimental averages maximum throughput
>>>>>>>>> migrating to /dev/null per each FdWrapper Pipe Size (as per QEMU QMP
>>>>>>>>> "query-migrate", tests repeated 5 times for each).
>>>>>>>>> VM Size is 60G, most of the memory effectively touched before migration,
>>>>>>>>> through user application allocating and touching all memory with
>>>>>>>>> pseudorandom data.
>>>>>>>>>
>>>>>>>>> 64K:     5200 Mbps (current situation)
>>>>>>>>> 128K:    5800 Mbps
>>>>>>>>> 256K:   20900 Mbps
>>>>>>>>> 512K:   21600 Mbps
>>>>>>>>> 1M:     22800 Mbps
>>>>>>>>> 2M:     22800 Mbps
>>>>>>>>> 4M:     22400 Mbps
>>>>>>>>> 8M:     22500 Mbps
>>>>>>>>> 16M:    22800 Mbps
>>>>>>>>> 32M:    22900 Mbps
>>>>>>>>> 64M:    22900 Mbps
>>>>>>>>> 128M:   22800 Mbps
>>>>>>>>>
>>>>>>>>> This above is the throughput out of patched libvirt with multiple Pipe Sizes for the FDWrapper.
>>>>>>>>
>>>>>>>> Ok, its bouncing around with noise after 1 MB. So I'd suggest that
>>>>>>>> libvirt attempt to raise the pipe limit to 1 MB by default, but
>>>>>>>> not try to go higher.
>>>>>>>>
>>>>>>>>> As for the theoretical limit for the libvirt architecture,
>>>>>>>>> I ran a qemu migration directly issuing the appropriate QMP
>>>>>>>>> commands, setting the same migration parameters as per libvirt,
>>>>>>>>> and then migrating to a socket netcatted to /dev/null via
>>>>>>>>> {"execute": "migrate", "arguments": { "uri", "unix:///tmp/netcat.sock" } } :
>>>>>>>>>
>>>>>>>>> QMP:    37000 Mbps
>>>>>>>>
>>>>>>>>> So although the Pipe size improves things (in particular the
>>>>>>>>> large jump is for the 256K size, although 1M seems a very good value),
>>>>>>>>> there is still a second bottleneck in there somewhere that
>>>>>>>>> accounts for a loss of ~14200 Mbps in throughput.
>>>>>>
>>>>>>
>>>>>> Interesting addition: I tested quickly on a system with faster cpus and larger VM sizes, up to 200GB,
>>>>>> and the difference in throughput libvirt vs qemu is basically the same ~14500 Mbps.
>>>>>>
>>>>>> ~50000 mbps qemu to netcat socket to /dev/null
>>>>>> ~35500 mbps virsh save to /dev/null
>>>>>>
>>>>>> Seems it is not proportional to cpu speed by the looks of it (not a totally fair comparison because the VM sizes are different).
>>>>>
>>>>> It might be closer to RAM or cache bandwidth limited though; for an extra copy.
>>>>
>>>> I was thinking about sendfile(2) in iohelper, but that probably
>>>> can't work as the input fd is a socket, I am getting EINVAL.
>>>
>>> Yep, sendfile() requires the input to be a mmapable FD,
>>> and the output to be a socket.
>>>
>>> Try splice() instead  which merely requires 1 end to be a
>>> pipe, and the other end can be any FD afaik.
>>>
>>
>> I did try splice(), but performance is worse by around 500%.
> 
> Hmm, that's certainly unexpected !
> 
>> Any ideas welcome,
> 
> I learnt there is also a newer  copy_file_range call, not sure if that's
> any better.
> 
> You passed len as 1 MB, I wonder if passing MAXINT is viable ? We just
> want to copy everything IIRC.
> 
> With regards,
> Daniel
> 

Crazy idea, would trying to use the parallel migration concept for migrating to/from a file make any sense?

Not sure if the qemu multifd implementation of this would apply directly; maybe it could be given another implementation for "toFile", trying to use more than one cpu to do the transfer?

Thanks,

Claudio





^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [libvirt RFC] virFile: new VIR_FILE_WRAPPER_BIG_PIPE to improve performance
  2022-03-28  9:19                           ` Claudio Fontana
@ 2022-03-28  9:41                             ` Claudio Fontana
  0 siblings, 0 replies; 30+ messages in thread
From: Claudio Fontana @ 2022-03-28  9:41 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: libvir-list, andrea.righi, Jiri Denemark, Dr. David Alan Gilbert,
	qemu-devel

On 3/28/22 11:19 AM, Claudio Fontana wrote:
> On 3/28/22 10:31 AM, Daniel P. Berrangé wrote:
>> On Sat, Mar 26, 2022 at 04:49:46PM +0100, Claudio Fontana wrote:
>>> On 3/25/22 12:29 PM, Daniel P. Berrangé wrote:
>>>> On Fri, Mar 18, 2022 at 02:34:29PM +0100, Claudio Fontana wrote:
>>>>> On 3/17/22 4:03 PM, Dr. David Alan Gilbert wrote:
>>>>>> * Claudio Fontana (cfontana@suse.de) wrote:
>>>>>>> On 3/17/22 2:41 PM, Claudio Fontana wrote:
>>>>>>>> On 3/17/22 11:25 AM, Daniel P. Berrangé wrote:
>>>>>>>>> On Thu, Mar 17, 2022 at 11:12:11AM +0100, Claudio Fontana wrote:
>>>>>>>>>> On 3/16/22 1:17 PM, Claudio Fontana wrote:
>>>>>>>>>>> On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
>>>>>>>>>>>> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana wrote:
>>>>>>>>>>>>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
>>>>>>>>>>>>>> On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana wrote:
>>>>>>>>>>>>>>> the first user is the qemu driver,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> virsh save/resume would slow to a crawl with a default pipe size (64k).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This improves the situation by 400%.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Going through io_helper still seems to incur in some penalty (~15%-ish)
>>>>>>>>>>>>>>> compared with direct qemu migration to a nc socket to a file.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>  src/qemu/qemu_driver.c    |  6 +++---
>>>>>>>>>>>>>>>  src/qemu/qemu_saveimage.c | 11 ++++++-----
>>>>>>>>>>>>>>>  src/util/virfile.c        | 12 ++++++++++++
>>>>>>>>>>>>>>>  src/util/virfile.h        |  1 +
>>>>>>>>>>>>>>>  4 files changed, 22 insertions(+), 8 deletions(-)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello, I initially thought this to be a qemu performance issue,
>>>>>>>>>>>>>>> so you can find the discussion about this in qemu-devel:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Current results show these experimental averages maximum throughput
>>>>>>>>>> migrating to /dev/null per each FdWrapper Pipe Size (as per QEMU QMP
>>>>>>>>>> "query-migrate", tests repeated 5 times for each).
>>>>>>>>>> VM Size is 60G, most of the memory effectively touched before migration,
>>>>>>>>>> through user application allocating and touching all memory with
>>>>>>>>>> pseudorandom data.
>>>>>>>>>>
>>>>>>>>>> 64K:     5200 Mbps (current situation)
>>>>>>>>>> 128K:    5800 Mbps
>>>>>>>>>> 256K:   20900 Mbps
>>>>>>>>>> 512K:   21600 Mbps
>>>>>>>>>> 1M:     22800 Mbps
>>>>>>>>>> 2M:     22800 Mbps
>>>>>>>>>> 4M:     22400 Mbps
>>>>>>>>>> 8M:     22500 Mbps
>>>>>>>>>> 16M:    22800 Mbps
>>>>>>>>>> 32M:    22900 Mbps
>>>>>>>>>> 64M:    22900 Mbps
>>>>>>>>>> 128M:   22800 Mbps
>>>>>>>>>>
>>>>>>>>>> This above is the throughput out of patched libvirt with multiple Pipe Sizes for the FDWrapper.
>>>>>>>>>
>>>>>>>>> Ok, its bouncing around with noise after 1 MB. So I'd suggest that
>>>>>>>>> libvirt attempt to raise the pipe limit to 1 MB by default, but
>>>>>>>>> not try to go higher.
>>>>>>>>>
>>>>>>>>>> As for the theoretical limit for the libvirt architecture,
>>>>>>>>>> I ran a qemu migration directly issuing the appropriate QMP
>>>>>>>>>> commands, setting the same migration parameters as per libvirt,
>>>>>>>>>> and then migrating to a socket netcatted to /dev/null via
>>>>>>>>>> {"execute": "migrate", "arguments": { "uri", "unix:///tmp/netcat.sock" } } :
>>>>>>>>>>
>>>>>>>>>> QMP:    37000 Mbps
>>>>>>>>>
>>>>>>>>>> So although the Pipe size improves things (in particular the
>>>>>>>>>> large jump is for the 256K size, although 1M seems a very good value),
>>>>>>>>>> there is still a second bottleneck in there somewhere that
>>>>>>>>>> accounts for a loss of ~14200 Mbps in throughput.
>>>>>>>
>>>>>>>
>>>>>>> Interesting addition: I tested quickly on a system with faster cpus and larger VM sizes, up to 200GB,
>>>>>>> and the difference in throughput libvirt vs qemu is basically the same ~14500 Mbps.
>>>>>>>
>>>>>>> ~50000 mbps qemu to netcat socket to /dev/null
>>>>>>> ~35500 mbps virsh save to /dev/null
>>>>>>>
>>>>>>> Seems it is not proportional to cpu speed by the looks of it (not a totally fair comparison because the VM sizes are different).
>>>>>>
>>>>>> It might be closer to RAM or cache bandwidth limited though; for an extra copy.
>>>>>
>>>>> I was thinking about sendfile(2) in iohelper, but that probably
>>>>> can't work as the input fd is a socket, I am getting EINVAL.
>>>>
>>>> Yep, sendfile() requires the input to be a mmapable FD,
>>>> and the output to be a socket.
>>>>
>>>> Try splice() instead  which merely requires 1 end to be a
>>>> pipe, and the other end can be any FD afaik.
>>>>
>>>
>>> I did try splice(), but performance is worse by around 500%.
>>
>> Hmm, that's certainly unexpected !
>>
>>> Any ideas welcome,
>>
>> I learnt there is also a newer  copy_file_range call, not sure if that's
>> any better.
>>
>> You passed len as 1 MB, I wonder if passing MAXINT is viable ? We just
>> want to copy everything IIRC.
>>
>> With regards,
>> Daniel
>>
> 
> Hi Daniel, tried also up to 64MB, no improvement with splice.
> 
> I'll take a look at copy_file_range,

It fails with EINVAL: according to the man page, copy_file_range() needs both fds to refer to regular files.

All these alternatives to the plain read/write API seem very situational...

It would be cool if there was an API that simply does the best thing to minimize copies for the FDs it is passed, avoiding the need for a userspace buffer
whatever the FDs refer to, but it seems there isn't one?
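
Just to show the kind of thing I mean (a rough sketch only; the helper name and the fallback errno list are my guesses, not a real implementation): try copy_file_range() first and fall back to a plain bounce-buffer loop whenever the FDs cannot do better:

#define _GNU_SOURCE
#include <unistd.h>
#include <errno.h>

/* Sketch of a "do the best thing for these FDs" copy step: attempt
 * copy_file_range() and fall back to read()/write() when it cannot work
 * (non-regular files, cross-fs on older kernels, missing syscall, ...).
 * EINTR handling on the read path is left out to keep the sketch short. */
static ssize_t copy_step(int fdin, int fdout, char *buf, size_t len)
{
    ssize_t got = copy_file_range(fdin, NULL, fdout, NULL, len, 0);

    if (got >= 0)
        return got;
    if (errno != EINVAL && errno != EXDEV && errno != ENOSYS)
        return -1;

    got = read(fdin, buf, len);
    if (got <= 0)
        return got;

    for (ssize_t done = 0; done < got; ) {
        ssize_t w = write(fdout, buf + done, got - done);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        done += w;
    }
    return got;
}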

Ciao,

Claudio



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [libvirt RFC] virFile: new VIR_FILE_WRAPPER_BIG_PIPE to improve performance
  2022-03-26 15:49                       ` Claudio Fontana
  2022-03-26 17:38                         ` Claudio Fontana
  2022-03-28  8:31                         ` Daniel P. Berrangé
@ 2022-03-28 10:47                         ` Claudio Fontana
  2022-03-28 13:28                           ` Claudio Fontana
  2 siblings, 1 reply; 30+ messages in thread
From: Claudio Fontana @ 2022-03-28 10:47 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: libvir-list, andrea.righi, Jiri Denemark, Dr. David Alan Gilbert,
	qemu-devel

On 3/26/22 4:49 PM, Claudio Fontana wrote:
> On 3/25/22 12:29 PM, Daniel P. Berrangé wrote:
>> On Fri, Mar 18, 2022 at 02:34:29PM +0100, Claudio Fontana wrote:
>>> On 3/17/22 4:03 PM, Dr. David Alan Gilbert wrote:
>>>> * Claudio Fontana (cfontana@suse.de) wrote:
>>>>> On 3/17/22 2:41 PM, Claudio Fontana wrote:
>>>>>> On 3/17/22 11:25 AM, Daniel P. Berrangé wrote:
>>>>>>> On Thu, Mar 17, 2022 at 11:12:11AM +0100, Claudio Fontana wrote:
>>>>>>>> On 3/16/22 1:17 PM, Claudio Fontana wrote:
>>>>>>>>> On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
>>>>>>>>>> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana wrote:
>>>>>>>>>>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
>>>>>>>>>>>> On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana wrote:
>>>>>>>>>>>>> the first user is the qemu driver,
>>>>>>>>>>>>>
>>>>>>>>>>>>> virsh save/resume would slow to a crawl with a default pipe size (64k).
>>>>>>>>>>>>>
>>>>>>>>>>>>> This improves the situation by 400%.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Going through io_helper still seems to incur in some penalty (~15%-ish)
>>>>>>>>>>>>> compared with direct qemu migration to a nc socket to a file.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>>  src/qemu/qemu_driver.c    |  6 +++---
>>>>>>>>>>>>>  src/qemu/qemu_saveimage.c | 11 ++++++-----
>>>>>>>>>>>>>  src/util/virfile.c        | 12 ++++++++++++
>>>>>>>>>>>>>  src/util/virfile.h        |  1 +
>>>>>>>>>>>>>  4 files changed, 22 insertions(+), 8 deletions(-)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hello, I initially thought this to be a qemu performance issue,
>>>>>>>>>>>>> so you can find the discussion about this in qemu-devel:
>>>>>>>>>>>>>
>>>>>>>>>>>>> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html
>>>>>>>
>>>>>>>
>>>>>>>> Current results show these experimental averages maximum throughput
>>>>>>>> migrating to /dev/null per each FdWrapper Pipe Size (as per QEMU QMP
>>>>>>>> "query-migrate", tests repeated 5 times for each).
>>>>>>>> VM Size is 60G, most of the memory effectively touched before migration,
>>>>>>>> through user application allocating and touching all memory with
>>>>>>>> pseudorandom data.
>>>>>>>>
>>>>>>>> 64K:     5200 Mbps (current situation)
>>>>>>>> 128K:    5800 Mbps
>>>>>>>> 256K:   20900 Mbps
>>>>>>>> 512K:   21600 Mbps
>>>>>>>> 1M:     22800 Mbps
>>>>>>>> 2M:     22800 Mbps
>>>>>>>> 4M:     22400 Mbps
>>>>>>>> 8M:     22500 Mbps
>>>>>>>> 16M:    22800 Mbps
>>>>>>>> 32M:    22900 Mbps
>>>>>>>> 64M:    22900 Mbps
>>>>>>>> 128M:   22800 Mbps
>>>>>>>>
>>>>>>>> This above is the throughput out of patched libvirt with multiple Pipe Sizes for the FDWrapper.
>>>>>>>
>>>>>>> Ok, its bouncing around with noise after 1 MB. So I'd suggest that
>>>>>>> libvirt attempt to raise the pipe limit to 1 MB by default, but
>>>>>>> not try to go higher.
>>>>>>>
>>>>>>>> As for the theoretical limit for the libvirt architecture,
>>>>>>>> I ran a qemu migration directly issuing the appropriate QMP
>>>>>>>> commands, setting the same migration parameters as per libvirt,
>>>>>>>> and then migrating to a socket netcatted to /dev/null via
>>>>>>>> {"execute": "migrate", "arguments": { "uri", "unix:///tmp/netcat.sock" } } :
>>>>>>>>
>>>>>>>> QMP:    37000 Mbps
>>>>>>>
>>>>>>>> So although the Pipe size improves things (in particular the
>>>>>>>> large jump is for the 256K size, although 1M seems a very good value),
>>>>>>>> there is still a second bottleneck in there somewhere that
>>>>>>>> accounts for a loss of ~14200 Mbps in throughput.
>>>>>
>>>>>
>>>>> Interesting addition: I tested quickly on a system with faster cpus and larger VM sizes, up to 200GB,
>>>>> and the difference in throughput libvirt vs qemu is basically the same ~14500 Mbps.
>>>>>
>>>>> ~50000 mbps qemu to netcat socket to /dev/null
>>>>> ~35500 mbps virsh save to /dev/null
>>>>>
>>>>> Seems it is not proportional to cpu speed by the looks of it (not a totally fair comparison because the VM sizes are different).
>>>>
>>>> It might be closer to RAM or cache bandwidth limited though; for an extra copy.
>>>
>>> I was thinking about sendfile(2) in iohelper, but that probably
>>> can't work as the input fd is a socket, I am getting EINVAL.
>>
>> Yep, sendfile() requires the input to be a mmapable FD,
>> and the output to be a socket.
>>
>> Try splice() instead  which merely requires 1 end to be a
>> pipe, and the other end can be any FD afaik.
>>
>> With regards,
>> Daniel
>>
> 
> I did try splice(), but performance is worse by around 500%.
> 
> It also fails with EINVAL when trying to use it in combination with O_DIRECT.
> 
> Tried larger and smaller buffers, flags like SPLICE_F_MORE an SPLICE_F_MOVE in any combination; no change, just awful performance.


Ok I found a case where splice actually helps: in the read case, without O_DIRECT, splice seems to actually outperform read/write
by _a lot_.

I will code up the patch and start making more experiments with larger VM sizes etc.

Thanks!

Claudio


> 
> Here is the code:
> 
> #ifdef __linux__
> +static ssize_t safesplice(int fdin, int fdout, size_t todo)
> +{
> +    unsigned int flags = SPLICE_F_MOVE | SPLICE_F_MORE;
> +    ssize_t ncopied = 0;
> +
> +    while (todo > 0) {
> +        ssize_t r = splice(fdin, NULL, fdout, NULL, todo, flags);
> +        if (r < 0 && errno == EINTR)
> +            continue;
> +        if (r < 0)
> +            return r;
> +        if (r == 0)
> +            return ncopied;
> +        todo -= r;
> +        ncopied += r;
> +    }
> +    return ncopied;
> +}
> +
> +static ssize_t runIOCopy(const struct runIOParams p)
> +{
> +    size_t len = 1024 * 1024;
> +    ssize_t total = 0;
> +
> +    while (1) {
> +        ssize_t got = safesplice(p.fdin, p.fdout, len);
> +        if (got < 0)
> +            return -1;
> +        if (got == 0)
> +            break;
> +
> +        total += got;
> +
> +        /* handle last write truncate in direct case */
> +        if (got < len && p.isDirect && p.isWrite && !p.isBlockDev) {
> +            if (ftruncate(p.fdout, total) < 0) {
> +                return -4;
> +            }
> +            break;
> +        }
> +    }
> +    return total;
> +}
> +
> +#endif
> 
> 
> Any ideas welcome,
> 
> Claudio
> 



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [libvirt RFC] virFile: new VIR_FILE_WRAPPER_BIG_PIPE to improve performance
  2022-03-28 10:47                         ` Claudio Fontana
@ 2022-03-28 13:28                           ` Claudio Fontana
  0 siblings, 0 replies; 30+ messages in thread
From: Claudio Fontana @ 2022-03-28 13:28 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: libvir-list, andrea.righi, Jiri Denemark, Dr. David Alan Gilbert,
	qemu-devel

On 3/28/22 12:47 PM, Claudio Fontana wrote:
> On 3/26/22 4:49 PM, Claudio Fontana wrote:
>> On 3/25/22 12:29 PM, Daniel P. Berrangé wrote:
>>> On Fri, Mar 18, 2022 at 02:34:29PM +0100, Claudio Fontana wrote:
>>>> On 3/17/22 4:03 PM, Dr. David Alan Gilbert wrote:
>>>>> * Claudio Fontana (cfontana@suse.de) wrote:
>>>>>> On 3/17/22 2:41 PM, Claudio Fontana wrote:
>>>>>>> On 3/17/22 11:25 AM, Daniel P. Berrangé wrote:
>>>>>>>> On Thu, Mar 17, 2022 at 11:12:11AM +0100, Claudio Fontana wrote:
>>>>>>>>> On 3/16/22 1:17 PM, Claudio Fontana wrote:
>>>>>>>>>> On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
>>>>>>>>>>> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana wrote:
>>>>>>>>>>>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
>>>>>>>>>>>>> On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana wrote:
>>>>>>>>>>>>>> the first user is the qemu driver,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> virsh save/resume would slow to a crawl with a default pipe size (64k).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This improves the situation by 400%.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Going through io_helper still seems to incur in some penalty (~15%-ish)
>>>>>>>>>>>>>> compared with direct qemu migration to a nc socket to a file.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>  src/qemu/qemu_driver.c    |  6 +++---
>>>>>>>>>>>>>>  src/qemu/qemu_saveimage.c | 11 ++++++-----
>>>>>>>>>>>>>>  src/util/virfile.c        | 12 ++++++++++++
>>>>>>>>>>>>>>  src/util/virfile.h        |  1 +
>>>>>>>>>>>>>>  4 files changed, 22 insertions(+), 8 deletions(-)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello, I initially thought this to be a qemu performance issue,
>>>>>>>>>>>>>> so you can find the discussion about this in qemu-devel:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html
>>>>>>>>
>>>>>>>>
>>>>>>>>> Current results show these experimental averages maximum throughput
>>>>>>>>> migrating to /dev/null per each FdWrapper Pipe Size (as per QEMU QMP
>>>>>>>>> "query-migrate", tests repeated 5 times for each).
>>>>>>>>> VM Size is 60G, most of the memory effectively touched before migration,
>>>>>>>>> through user application allocating and touching all memory with
>>>>>>>>> pseudorandom data.
>>>>>>>>>
>>>>>>>>> 64K:     5200 Mbps (current situation)
>>>>>>>>> 128K:    5800 Mbps
>>>>>>>>> 256K:   20900 Mbps
>>>>>>>>> 512K:   21600 Mbps
>>>>>>>>> 1M:     22800 Mbps
>>>>>>>>> 2M:     22800 Mbps
>>>>>>>>> 4M:     22400 Mbps
>>>>>>>>> 8M:     22500 Mbps
>>>>>>>>> 16M:    22800 Mbps
>>>>>>>>> 32M:    22900 Mbps
>>>>>>>>> 64M:    22900 Mbps
>>>>>>>>> 128M:   22800 Mbps
>>>>>>>>>
>>>>>>>>> This above is the throughput out of patched libvirt with multiple Pipe Sizes for the FDWrapper.
>>>>>>>>
>>>>>>>> Ok, its bouncing around with noise after 1 MB. So I'd suggest that
>>>>>>>> libvirt attempt to raise the pipe limit to 1 MB by default, but
>>>>>>>> not try to go higher.
>>>>>>>>
>>>>>>>>> As for the theoretical limit for the libvirt architecture,
>>>>>>>>> I ran a qemu migration directly issuing the appropriate QMP
>>>>>>>>> commands, setting the same migration parameters as per libvirt,
>>>>>>>>> and then migrating to a socket netcatted to /dev/null via
>>>>>>>>> {"execute": "migrate", "arguments": { "uri", "unix:///tmp/netcat.sock" } } :
>>>>>>>>>
>>>>>>>>> QMP:    37000 Mbps
>>>>>>>>
>>>>>>>>> So although the Pipe size improves things (in particular the
>>>>>>>>> large jump is for the 256K size, although 1M seems a very good value),
>>>>>>>>> there is still a second bottleneck in there somewhere that
>>>>>>>>> accounts for a loss of ~14200 Mbps in throughput.
>>>>>>
>>>>>>
>>>>>> Interesting addition: I tested quickly on a system with faster cpus and larger VM sizes, up to 200GB,
>>>>>> and the difference in throughput libvirt vs qemu is basically the same ~14500 Mbps.
>>>>>>
>>>>>> ~50000 mbps qemu to netcat socket to /dev/null
>>>>>> ~35500 mbps virsh save to /dev/null
>>>>>>
>>>>>> Seems it is not proportional to cpu speed by the looks of it (not a totally fair comparison because the VM sizes are different).
>>>>>
>>>>> It might be closer to RAM or cache bandwidth limited though; for an extra copy.
>>>>
>>>> I was thinking about sendfile(2) in iohelper, but that probably
>>>> can't work as the input fd is a socket, I am getting EINVAL.
>>>
>>> Yep, sendfile() requires the input to be a mmapable FD,
>>> and the output to be a socket.
>>>
>>> Try splice() instead  which merely requires 1 end to be a
>>> pipe, and the other end can be any FD afaik.
>>>
>>> With regards,
>>> Daniel
>>>
>>
>> I did try splice(), but performance is worse by around 500%.
>>
>> It also fails with EINVAL when trying to use it in combination with O_DIRECT.
>>
>> Tried larger and smaller buffers, flags like SPLICE_F_MORE an SPLICE_F_MOVE in any combination; no change, just awful performance.
> 
> 
> Ok I found a case where splice actually helps: in the read case, without O_DIRECT, splice seems to actually outperform read/write
> by _a lot_.


I was just hit by a cache effect. No real improvements I could measure.

> 
> I will code up the patch and start making more experiments with larger VM sizes etc.
> 
> Thanks!
> 
> Claudio
> 
> 
>>
>> Here is the code:
>>
>> #ifdef __linux__
>> +static ssize_t safesplice(int fdin, int fdout, size_t todo)
>> +{
>> +    unsigned int flags = SPLICE_F_MOVE | SPLICE_F_MORE;
>> +    ssize_t ncopied = 0;
>> +
>> +    while (todo > 0) {
>> +        ssize_t r = splice(fdin, NULL, fdout, NULL, todo, flags);
>> +        if (r < 0 && errno == EINTR)
>> +            continue;
>> +        if (r < 0)
>> +            return r;
>> +        if (r == 0)
>> +            return ncopied;
>> +        todo -= r;
>> +        ncopied += r;
>> +    }
>> +    return ncopied;
>> +}
>> +
>> +static ssize_t runIOCopy(const struct runIOParams p)
>> +{
>> +    size_t len = 1024 * 1024;
>> +    ssize_t total = 0;
>> +
>> +    while (1) {
>> +        ssize_t got = safesplice(p.fdin, p.fdout, len);
>> +        if (got < 0)
>> +            return -1;
>> +        if (got == 0)
>> +            break;
>> +
>> +        total += got;
>> +
>> +        /* handle last write truncate in direct case */
>> +        if (got < len && p.isDirect && p.isWrite && !p.isBlockDev) {
>> +            if (ftruncate(p.fdout, total) < 0) {
>> +                return -4;
>> +            }
>> +            break;
>> +        }
>> +    }
>> +    return total;
>> +}
>> +
>> +#endif
>>
>>
>> Any ideas welcome,
>>
>> Claudio
>>
> 



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [libvirt RFC] virFile: new VIR_FILE_WRAPPER_BIG_PIPE to improve performance
  2022-03-28  9:31                           ` Claudio Fontana
@ 2022-04-05  8:35                             ` Dr. David Alan Gilbert
  2022-04-05  9:23                               ` Claudio Fontana
  2022-04-07  7:11                               ` Claudio Fontana
  0 siblings, 2 replies; 30+ messages in thread
From: Dr. David Alan Gilbert @ 2022-04-05  8:35 UTC (permalink / raw)
  To: Claudio Fontana
  Cc: libvir-list, andrea.righi, Jiri Denemark, Daniel P. Berrangé,
	qemu-devel

* Claudio Fontana (cfontana@suse.de) wrote:
> On 3/28/22 10:31 AM, Daniel P. Berrangé wrote:
> > On Sat, Mar 26, 2022 at 04:49:46PM +0100, Claudio Fontana wrote:
> >> On 3/25/22 12:29 PM, Daniel P. Berrangé wrote:
> >>> On Fri, Mar 18, 2022 at 02:34:29PM +0100, Claudio Fontana wrote:
> >>>> On 3/17/22 4:03 PM, Dr. David Alan Gilbert wrote:
> >>>>> * Claudio Fontana (cfontana@suse.de) wrote:
> >>>>>> On 3/17/22 2:41 PM, Claudio Fontana wrote:
> >>>>>>> On 3/17/22 11:25 AM, Daniel P. Berrangé wrote:
> >>>>>>>> On Thu, Mar 17, 2022 at 11:12:11AM +0100, Claudio Fontana wrote:
> >>>>>>>>> On 3/16/22 1:17 PM, Claudio Fontana wrote:
> >>>>>>>>>> On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
> >>>>>>>>>>> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana wrote:
> >>>>>>>>>>>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
> >>>>>>>>>>>>> On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana wrote:
> >>>>>>>>>>>>>> the first user is the qemu driver,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> virsh save/resume would slow to a crawl with a default pipe size (64k).
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> This improves the situation by 400%.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Going through io_helper still seems to incur in some penalty (~15%-ish)
> >>>>>>>>>>>>>> compared with direct qemu migration to a nc socket to a file.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
> >>>>>>>>>>>>>> ---
> >>>>>>>>>>>>>>  src/qemu/qemu_driver.c    |  6 +++---
> >>>>>>>>>>>>>>  src/qemu/qemu_saveimage.c | 11 ++++++-----
> >>>>>>>>>>>>>>  src/util/virfile.c        | 12 ++++++++++++
> >>>>>>>>>>>>>>  src/util/virfile.h        |  1 +
> >>>>>>>>>>>>>>  4 files changed, 22 insertions(+), 8 deletions(-)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hello, I initially thought this to be a qemu performance issue,
> >>>>>>>>>>>>>> so you can find the discussion about this in qemu-devel:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> Current results show these experimental averages maximum throughput
> >>>>>>>>> migrating to /dev/null per each FdWrapper Pipe Size (as per QEMU QMP
> >>>>>>>>> "query-migrate", tests repeated 5 times for each).
> >>>>>>>>> VM Size is 60G, most of the memory effectively touched before migration,
> >>>>>>>>> through user application allocating and touching all memory with
> >>>>>>>>> pseudorandom data.
> >>>>>>>>>
> >>>>>>>>> 64K:     5200 Mbps (current situation)
> >>>>>>>>> 128K:    5800 Mbps
> >>>>>>>>> 256K:   20900 Mbps
> >>>>>>>>> 512K:   21600 Mbps
> >>>>>>>>> 1M:     22800 Mbps
> >>>>>>>>> 2M:     22800 Mbps
> >>>>>>>>> 4M:     22400 Mbps
> >>>>>>>>> 8M:     22500 Mbps
> >>>>>>>>> 16M:    22800 Mbps
> >>>>>>>>> 32M:    22900 Mbps
> >>>>>>>>> 64M:    22900 Mbps
> >>>>>>>>> 128M:   22800 Mbps
> >>>>>>>>>
> >>>>>>>>> This above is the throughput out of patched libvirt with multiple Pipe Sizes for the FDWrapper.
> >>>>>>>>
> >>>>>>>> Ok, its bouncing around with noise after 1 MB. So I'd suggest that
> >>>>>>>> libvirt attempt to raise the pipe limit to 1 MB by default, but
> >>>>>>>> not try to go higher.
> >>>>>>>>
> >>>>>>>>> As for the theoretical limit for the libvirt architecture,
> >>>>>>>>> I ran a qemu migration directly issuing the appropriate QMP
> >>>>>>>>> commands, setting the same migration parameters as per libvirt,
> >>>>>>>>> and then migrating to a socket netcatted to /dev/null via
> >>>>>>>>> {"execute": "migrate", "arguments": { "uri", "unix:///tmp/netcat.sock" } } :
> >>>>>>>>>
> >>>>>>>>> QMP:    37000 Mbps
> >>>>>>>>
> >>>>>>>>> So although the Pipe size improves things (in particular the
> >>>>>>>>> large jump is for the 256K size, although 1M seems a very good value),
> >>>>>>>>> there is still a second bottleneck in there somewhere that
> >>>>>>>>> accounts for a loss of ~14200 Mbps in throughput.
> >>>>>>
> >>>>>>
> >>>>>> Interesting addition: I tested quickly on a system with faster cpus and larger VM sizes, up to 200GB,
> >>>>>> and the difference in throughput libvirt vs qemu is basically the same ~14500 Mbps.
> >>>>>>
> >>>>>> ~50000 mbps qemu to netcat socket to /dev/null
> >>>>>> ~35500 mbps virsh save to /dev/null
> >>>>>>
> >>>>>> Seems it is not proportional to cpu speed by the looks of it (not a totally fair comparison because the VM sizes are different).
> >>>>>
> >>>>> It might be closer to RAM or cache bandwidth limited though; for an extra copy.
> >>>>
> >>>> I was thinking about sendfile(2) in iohelper, but that probably
> >>>> can't work as the input fd is a socket, I am getting EINVAL.
> >>>
> >>> Yep, sendfile() requires the input to be a mmapable FD,
> >>> and the output to be a socket.
> >>>
> >>> Try splice() instead  which merely requires 1 end to be a
> >>> pipe, and the other end can be any FD afaik.
> >>>
> >>
> >> I did try splice(), but performance is worse by around 500%.
> > 
> > Hmm, that's certainly unexpected !
> > 
> >> Any ideas welcome,
> > 
> > I learnt there is also a newer  copy_file_range call, not sure if that's
> > any better.
> > 
> > You passed len as 1 MB, I wonder if passing MAXINT is viable ? We just
> > want to copy everything IIRC.
> > 
> > With regards,
> > Daniel
> > 
> 
> Crazy idea, would trying to use the parallel migration concept for migrating to/from a file make any sense?
> 
> Not sure if applying the qemu multifd implementation of this would apply, maybe it could be given another implementation for "toFile", trying to use more than one cpu to do the transfer?

I can't see a way that would help; well, I could if you could
somehow have multiple io helper threads that dealt with it.

Dave

> Thanks,
> 
> Claudio
> 
> 
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [libvirt RFC] virFile: new VIR_FILE_WRAPPER_BIG_PIPE to improve performance
  2022-04-05  8:35                             ` Dr. David Alan Gilbert
@ 2022-04-05  9:23                               ` Claudio Fontana
  2022-04-07  7:11                               ` Claudio Fontana
  1 sibling, 0 replies; 30+ messages in thread
From: Claudio Fontana @ 2022-04-05  9:23 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: libvir-list, andrea.righi, Jiri Denemark, Daniel P. Berrangé,
	qemu-devel

On 4/5/22 10:35 AM, Dr. David Alan Gilbert wrote:
> * Claudio Fontana (cfontana@suse.de) wrote:
>> On 3/28/22 10:31 AM, Daniel P. Berrangé wrote:
>>> On Sat, Mar 26, 2022 at 04:49:46PM +0100, Claudio Fontana wrote:
>>>> On 3/25/22 12:29 PM, Daniel P. Berrangé wrote:
>>>>> On Fri, Mar 18, 2022 at 02:34:29PM +0100, Claudio Fontana wrote:
>>>>>> On 3/17/22 4:03 PM, Dr. David Alan Gilbert wrote:
>>>>>>> * Claudio Fontana (cfontana@suse.de) wrote:
>>>>>>>> On 3/17/22 2:41 PM, Claudio Fontana wrote:
>>>>>>>>> On 3/17/22 11:25 AM, Daniel P. Berrangé wrote:
>>>>>>>>>> On Thu, Mar 17, 2022 at 11:12:11AM +0100, Claudio Fontana wrote:
>>>>>>>>>>> On 3/16/22 1:17 PM, Claudio Fontana wrote:
>>>>>>>>>>>> On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
>>>>>>>>>>>>> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana wrote:
>>>>>>>>>>>>>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
>>>>>>>>>>>>>>> On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana wrote:
>>>>>>>>>>>>>>>> the first user is the qemu driver,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> virsh save/resume would slow to a crawl with a default pipe size (64k).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This improves the situation by 400%.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Going through io_helper still seems to incur in some penalty (~15%-ish)
>>>>>>>>>>>>>>>> compared with direct qemu migration to a nc socket to a file.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>  src/qemu/qemu_driver.c    |  6 +++---
>>>>>>>>>>>>>>>>  src/qemu/qemu_saveimage.c | 11 ++++++-----
>>>>>>>>>>>>>>>>  src/util/virfile.c        | 12 ++++++++++++
>>>>>>>>>>>>>>>>  src/util/virfile.h        |  1 +
>>>>>>>>>>>>>>>>  4 files changed, 22 insertions(+), 8 deletions(-)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hello, I initially thought this to be a qemu performance issue,
>>>>>>>>>>>>>>>> so you can find the discussion about this in qemu-devel:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Current results show these experimental averages maximum throughput
>>>>>>>>>>> migrating to /dev/null per each FdWrapper Pipe Size (as per QEMU QMP
>>>>>>>>>>> "query-migrate", tests repeated 5 times for each).
>>>>>>>>>>> VM Size is 60G, most of the memory effectively touched before migration,
>>>>>>>>>>> through user application allocating and touching all memory with
>>>>>>>>>>> pseudorandom data.
>>>>>>>>>>>
>>>>>>>>>>> 64K:     5200 Mbps (current situation)
>>>>>>>>>>> 128K:    5800 Mbps
>>>>>>>>>>> 256K:   20900 Mbps
>>>>>>>>>>> 512K:   21600 Mbps
>>>>>>>>>>> 1M:     22800 Mbps
>>>>>>>>>>> 2M:     22800 Mbps
>>>>>>>>>>> 4M:     22400 Mbps
>>>>>>>>>>> 8M:     22500 Mbps
>>>>>>>>>>> 16M:    22800 Mbps
>>>>>>>>>>> 32M:    22900 Mbps
>>>>>>>>>>> 64M:    22900 Mbps
>>>>>>>>>>> 128M:   22800 Mbps
>>>>>>>>>>>
>>>>>>>>>>> This above is the throughput out of patched libvirt with multiple Pipe Sizes for the FDWrapper.
>>>>>>>>>>
>>>>>>>>>> Ok, its bouncing around with noise after 1 MB. So I'd suggest that
>>>>>>>>>> libvirt attempt to raise the pipe limit to 1 MB by default, but
>>>>>>>>>> not try to go higher.
>>>>>>>>>>
>>>>>>>>>>> As for the theoretical limit for the libvirt architecture,
>>>>>>>>>>> I ran a qemu migration directly issuing the appropriate QMP
>>>>>>>>>>> commands, setting the same migration parameters as per libvirt,
>>>>>>>>>>> and then migrating to a socket netcatted to /dev/null via
>>>>>>>>>>> {"execute": "migrate", "arguments": { "uri", "unix:///tmp/netcat.sock" } } :
>>>>>>>>>>>
>>>>>>>>>>> QMP:    37000 Mbps
>>>>>>>>>>
>>>>>>>>>>> So although the Pipe size improves things (in particular the
>>>>>>>>>>> large jump is for the 256K size, although 1M seems a very good value),
>>>>>>>>>>> there is still a second bottleneck in there somewhere that
>>>>>>>>>>> accounts for a loss of ~14200 Mbps in throughput.
>>>>>>>>
>>>>>>>>
>>>>>>>> Interesting addition: I tested quickly on a system with faster cpus and larger VM sizes, up to 200GB,
>>>>>>>> and the difference in throughput libvirt vs qemu is basically the same ~14500 Mbps.
>>>>>>>>
>>>>>>>> ~50000 mbps qemu to netcat socket to /dev/null
>>>>>>>> ~35500 mbps virsh save to /dev/null
>>>>>>>>
>>>>>>>> Seems it is not proportional to cpu speed by the looks of it (not a totally fair comparison because the VM sizes are different).
>>>>>>>
>>>>>>> It might be closer to RAM or cache bandwidth limited though; for an extra copy.
>>>>>>
>>>>>> I was thinking about sendfile(2) in iohelper, but that probably
>>>>>> can't work as the input fd is a socket, I am getting EINVAL.
>>>>>
>>>>> Yep, sendfile() requires the input to be a mmapable FD,
>>>>> and the output to be a socket.
>>>>>
>>>>> Try splice() instead  which merely requires 1 end to be a
>>>>> pipe, and the other end can be any FD afaik.
>>>>>
>>>>
>>>> I did try splice(), but performance is worse by around 500%.
>>>
>>> Hmm, that's certainly unexpected !
>>>
>>>> Any ideas welcome,
>>>
>>> I learnt there is also a newer  copy_file_range call, not sure if that's
>>> any better.
>>>
>>> You passed len as 1 MB, I wonder if passing MAXINT is viable ? We just
>>> want to copy everything IIRC.
>>>
>>> With regards,
>>> Daniel
>>>
>>
>> Crazy idea, would trying to use the parallel migration concept for migrating to/from a file make any sense?
>>
>> Not sure if applying the qemu multifd implementation of this would apply, maybe it could be given another implementation for "toFile", trying to use more than one cpu to do the transfer?
> 
> I can't see a way that would help; well, I could if you could
> somehow have multiple io helper threads that dealt with it.
> 
> Dave
> 

I'll spend cycles on this.

Another thing I noticed while doing the "splice" API experiments in iohelper:

we cannot use splice and --bypass-cache (O_DIRECT) together there, because as far as I could debug/understand the source data stream is not block aligned
(as the return value of the kernel's iomap_dio_bio_iter during the splice call shows), which is why I get EINVAL.
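
To illustrate the constraint (sketch only: the helper is invented, 4096 is an assumed block size, and buf is assumed big enough and suitably aligned for O_DIRECT): with read()/write() the unaligned tail can be zero-padded in userspace and the padding truncated away afterwards, which is exactly what splice() cannot do since the data never enters userspace:

#include <string.h>
#include <unistd.h>

#define BLOCK 4096   /* assumed fs/device logical block size */

/* write the final, unaligned chunk of the stream to an O_DIRECT fd */
static int write_odirect_tail(int fd, char *buf, size_t len, off_t written_so_far)
{
    size_t padded = ((len + BLOCK - 1) / BLOCK) * BLOCK;

    memset(buf + len, 0, padded - len);            /* zero-pad up to a block */
    if (write(fd, buf, padded) != (ssize_t)padded)
        return -1;
    /* drop the padding again so the image keeps its real size */
    return ftruncate(fd, written_so_far + len);
}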

If we had one of the migration streams consist only of pages (instead of header+page, ...), that might give us block alignment and potentially unlock better performance via splice that way?

This is all just ideas, don't have any data yet to back this up.

Thanks,

Claudio



 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [libvirt RFC] virFile: new VIR_FILE_WRAPPER_BIG_PIPE to improve performance
  2022-04-05  8:35                             ` Dr. David Alan Gilbert
  2022-04-05  9:23                               ` Claudio Fontana
@ 2022-04-07  7:11                               ` Claudio Fontana
  2022-04-07 13:53                                 ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 30+ messages in thread
From: Claudio Fontana @ 2022-04-07  7:11 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: libvir-list, andrea.righi, Jiri Denemark, qemu-devel

On 4/5/22 10:35 AM, Dr. David Alan Gilbert wrote:
> * Claudio Fontana (cfontana@suse.de) wrote:
>> On 3/28/22 10:31 AM, Daniel P. Berrangé wrote:
>>> On Sat, Mar 26, 2022 at 04:49:46PM +0100, Claudio Fontana wrote:
>>>> On 3/25/22 12:29 PM, Daniel P. Berrangé wrote:
>>>>> On Fri, Mar 18, 2022 at 02:34:29PM +0100, Claudio Fontana wrote:
>>>>>> On 3/17/22 4:03 PM, Dr. David Alan Gilbert wrote:
>>>>>>> * Claudio Fontana (cfontana@suse.de) wrote:
>>>>>>>> On 3/17/22 2:41 PM, Claudio Fontana wrote:
>>>>>>>>> On 3/17/22 11:25 AM, Daniel P. Berrangé wrote:
>>>>>>>>>> On Thu, Mar 17, 2022 at 11:12:11AM +0100, Claudio Fontana wrote:
>>>>>>>>>>> On 3/16/22 1:17 PM, Claudio Fontana wrote:
>>>>>>>>>>>> On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
>>>>>>>>>>>>> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana wrote:
>>>>>>>>>>>>>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
>>>>>>>>>>>>>>> On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana wrote:
>>>>>>>>>>>>>>>> the first user is the qemu driver,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> virsh save/resume would slow to a crawl with a default pipe size (64k).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This improves the situation by 400%.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Going through io_helper still seems to incur in some penalty (~15%-ish)
>>>>>>>>>>>>>>>> compared with direct qemu migration to a nc socket to a file.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>  src/qemu/qemu_driver.c    |  6 +++---
>>>>>>>>>>>>>>>>  src/qemu/qemu_saveimage.c | 11 ++++++-----
>>>>>>>>>>>>>>>>  src/util/virfile.c        | 12 ++++++++++++
>>>>>>>>>>>>>>>>  src/util/virfile.h        |  1 +
>>>>>>>>>>>>>>>>  4 files changed, 22 insertions(+), 8 deletions(-)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hello, I initially thought this to be a qemu performance issue,
>>>>>>>>>>>>>>>> so you can find the discussion about this in qemu-devel:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Current results show these experimental averages maximum throughput
>>>>>>>>>>> migrating to /dev/null per each FdWrapper Pipe Size (as per QEMU QMP
>>>>>>>>>>> "query-migrate", tests repeated 5 times for each).
>>>>>>>>>>> VM Size is 60G, most of the memory effectively touched before migration,
>>>>>>>>>>> through user application allocating and touching all memory with
>>>>>>>>>>> pseudorandom data.
>>>>>>>>>>>
>>>>>>>>>>> 64K:     5200 Mbps (current situation)
>>>>>>>>>>> 128K:    5800 Mbps
>>>>>>>>>>> 256K:   20900 Mbps
>>>>>>>>>>> 512K:   21600 Mbps
>>>>>>>>>>> 1M:     22800 Mbps
>>>>>>>>>>> 2M:     22800 Mbps
>>>>>>>>>>> 4M:     22400 Mbps
>>>>>>>>>>> 8M:     22500 Mbps
>>>>>>>>>>> 16M:    22800 Mbps
>>>>>>>>>>> 32M:    22900 Mbps
>>>>>>>>>>> 64M:    22900 Mbps
>>>>>>>>>>> 128M:   22800 Mbps
>>>>>>>>>>>
>>>>>>>>>>> This above is the throughput out of patched libvirt with multiple Pipe Sizes for the FDWrapper.
>>>>>>>>>>
>>>>>>>>>> Ok, its bouncing around with noise after 1 MB. So I'd suggest that
>>>>>>>>>> libvirt attempt to raise the pipe limit to 1 MB by default, but
>>>>>>>>>> not try to go higher.
>>>>>>>>>>
>>>>>>>>>>> As for the theoretical limit for the libvirt architecture,
>>>>>>>>>>> I ran a qemu migration directly issuing the appropriate QMP
>>>>>>>>>>> commands, setting the same migration parameters as per libvirt,
>>>>>>>>>>> and then migrating to a socket netcatted to /dev/null via
>>>>>>>>>>> {"execute": "migrate", "arguments": { "uri", "unix:///tmp/netcat.sock" } } :
>>>>>>>>>>>
>>>>>>>>>>> QMP:    37000 Mbps
>>>>>>>>>>
>>>>>>>>>>> So although the Pipe size improves things (in particular the
>>>>>>>>>>> large jump is for the 256K size, although 1M seems a very good value),
>>>>>>>>>>> there is still a second bottleneck in there somewhere that
>>>>>>>>>>> accounts for a loss of ~14200 Mbps in throughput.
>>>>>>>>
>>>>>>>>
>>>>>>>> Interesting addition: I tested quickly on a system with faster cpus and larger VM sizes, up to 200GB,
>>>>>>>> and the difference in throughput libvirt vs qemu is basically the same ~14500 Mbps.
>>>>>>>>
>>>>>>>> ~50000 mbps qemu to netcat socket to /dev/null
>>>>>>>> ~35500 mbps virsh save to /dev/null
>>>>>>>>
>>>>>>>> Seems it is not proportional to cpu speed by the looks of it (not a totally fair comparison because the VM sizes are different).
>>>>>>>
>>>>>>> It might be closer to RAM or cache bandwidth limited though; for an extra copy.
>>>>>>
>>>>>> I was thinking about sendfile(2) in iohelper, but that probably
>>>>>> can't work as the input fd is a socket, I am getting EINVAL.
>>>>>
>>>>> Yep, sendfile() requires the input to be a mmapable FD,
>>>>> and the output to be a socket.
>>>>>
>>>>> Try splice() instead  which merely requires 1 end to be a
>>>>> pipe, and the other end can be any FD afaik.
>>>>>
>>>>
>>>> I did try splice(), but performance is worse by around 500%.
>>>
>>> Hmm, that's certainly unexpected !
>>>
>>>> Any ideas welcome,
>>>
>>> I learnt there is also a newer  copy_file_range call, not sure if that's
>>> any better.
>>>
>>> You passed len as 1 MB, I wonder if passing MAXINT is viable ? We just
>>> want to copy everything IIRC.
>>>
>>> With regards,
>>> Daniel
>>>
>>
>> Crazy idea, would trying to use the parallel migration concept for migrating to/from a file make any sense?
>>
>> Not sure if applying the qemu multifd implementation of this would apply, maybe it could be given another implementation for "toFile", trying to use more than one cpu to do the transfer?
> 
> I can't see a way that would help; well, I could if you could
> somehow have multiple io helper threads that dealt with it.

The first issue I encounter here for both the "virsh save" and "virsh restore" scenarios is that libvirt uses fd: migration, not unix: migration.
QEMU supports multifd for unix:, tcp:, vsock: as far as I can see.

Current save procedure in QMP in short:

{"execute":"migrate-set-capabilities", ...}
{"execute":"migrate-set-parameters", ...}
{"execute":"getfd","arguments":{"fdname":"migrate"}, ...} fd=26
QEMU_MONITOR_IO_SEND_FD: fd=26
{"execute":"migrate","arguments":{"uri":"fd:migrate"}, ...}


Current restore procedure in QMP in short:

(start QEMU)
{"execute":"migrate-incoming","arguments":{"uri":"fd:21"}, ...}


Should I investigate changing libvirt to use unix: for save/restore?
Or should I look into changing qemu to somehow accept fd: for multifd, meaning I guess providing multiple fd: uris in the migrate command?
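
For reference, the unix: variant I would try first looks roughly like this (the socket path is made up; "multifd" and "multifd-channels" are the capability/parameter names as I understand them from the QEMU docs):

{"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"multifd","state":true}]}}
{"execute":"migrate-set-parameters","arguments":{"multifd-channels":4}}
{"execute":"migrate","arguments":{"uri":"unix:///tmp/save-multifd.sock"}}

plus something on the libvirt side accepting the multifd connections on that socket and streaming them out to the save file.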


Thank you for your help,

Claudio



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [libvirt RFC] virFile: new VIR_FILE_WRAPPER_BIG_PIPE to improve performance
  2022-04-07  7:11                               ` Claudio Fontana
@ 2022-04-07 13:53                                 ` Dr. David Alan Gilbert
  2022-04-07 13:57                                   ` Claudio Fontana
  0 siblings, 1 reply; 30+ messages in thread
From: Dr. David Alan Gilbert @ 2022-04-07 13:53 UTC (permalink / raw)
  To: Claudio Fontana; +Cc: libvir-list, andrea.righi, Jiri Denemark, qemu-devel

* Claudio Fontana (cfontana@suse.de) wrote:
> On 4/5/22 10:35 AM, Dr. David Alan Gilbert wrote:
> > * Claudio Fontana (cfontana@suse.de) wrote:
> >> On 3/28/22 10:31 AM, Daniel P. Berrangé wrote:
> >>> On Sat, Mar 26, 2022 at 04:49:46PM +0100, Claudio Fontana wrote:
> >>>> On 3/25/22 12:29 PM, Daniel P. Berrangé wrote:
> >>>>> On Fri, Mar 18, 2022 at 02:34:29PM +0100, Claudio Fontana wrote:
> >>>>>> On 3/17/22 4:03 PM, Dr. David Alan Gilbert wrote:
> >>>>>>> * Claudio Fontana (cfontana@suse.de) wrote:
> >>>>>>>> On 3/17/22 2:41 PM, Claudio Fontana wrote:
> >>>>>>>>> On 3/17/22 11:25 AM, Daniel P. Berrangé wrote:
> >>>>>>>>>> On Thu, Mar 17, 2022 at 11:12:11AM +0100, Claudio Fontana wrote:
> >>>>>>>>>>> On 3/16/22 1:17 PM, Claudio Fontana wrote:
> >>>>>>>>>>>> On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
> >>>>>>>>>>>>> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana wrote:
> >>>>>>>>>>>>>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
> >>>>>>>>>>>>>>> On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana wrote:
> >>>>>>>>>>>>>>>> the first user is the qemu driver,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> virsh save/resume would slow to a crawl with a default pipe size (64k).
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> This improves the situation by 400%.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Going through io_helper still seems to incur in some penalty (~15%-ish)
> >>>>>>>>>>>>>>>> compared with direct qemu migration to a nc socket to a file.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
> >>>>>>>>>>>>>>>> ---
> >>>>>>>>>>>>>>>>  src/qemu/qemu_driver.c    |  6 +++---
> >>>>>>>>>>>>>>>>  src/qemu/qemu_saveimage.c | 11 ++++++-----
> >>>>>>>>>>>>>>>>  src/util/virfile.c        | 12 ++++++++++++
> >>>>>>>>>>>>>>>>  src/util/virfile.h        |  1 +
> >>>>>>>>>>>>>>>>  4 files changed, 22 insertions(+), 8 deletions(-)
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hello, I initially thought this to be a qemu performance issue,
> >>>>>>>>>>>>>>>> so you can find the discussion about this in qemu-devel:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> Current results show these experimental averages maximum throughput
> >>>>>>>>>>> migrating to /dev/null per each FdWrapper Pipe Size (as per QEMU QMP
> >>>>>>>>>>> "query-migrate", tests repeated 5 times for each).
> >>>>>>>>>>> VM Size is 60G, most of the memory effectively touched before migration,
> >>>>>>>>>>> through user application allocating and touching all memory with
> >>>>>>>>>>> pseudorandom data.
> >>>>>>>>>>>
> >>>>>>>>>>> 64K:     5200 Mbps (current situation)
> >>>>>>>>>>> 128K:    5800 Mbps
> >>>>>>>>>>> 256K:   20900 Mbps
> >>>>>>>>>>> 512K:   21600 Mbps
> >>>>>>>>>>> 1M:     22800 Mbps
> >>>>>>>>>>> 2M:     22800 Mbps
> >>>>>>>>>>> 4M:     22400 Mbps
> >>>>>>>>>>> 8M:     22500 Mbps
> >>>>>>>>>>> 16M:    22800 Mbps
> >>>>>>>>>>> 32M:    22900 Mbps
> >>>>>>>>>>> 64M:    22900 Mbps
> >>>>>>>>>>> 128M:   22800 Mbps
> >>>>>>>>>>>
> >>>>>>>>>>> This above is the throughput out of patched libvirt with multiple Pipe Sizes for the FDWrapper.
> >>>>>>>>>>
> >>>>>>>>>> Ok, its bouncing around with noise after 1 MB. So I'd suggest that
> >>>>>>>>>> libvirt attempt to raise the pipe limit to 1 MB by default, but
> >>>>>>>>>> not try to go higher.
> >>>>>>>>>>
> >>>>>>>>>>> As for the theoretical limit for the libvirt architecture,
> >>>>>>>>>>> I ran a qemu migration directly issuing the appropriate QMP
> >>>>>>>>>>> commands, setting the same migration parameters as per libvirt,
> >>>>>>>>>>> and then migrating to a socket netcatted to /dev/null via
> >>>>>>>>>>> {"execute": "migrate", "arguments": { "uri", "unix:///tmp/netcat.sock" } } :
> >>>>>>>>>>>
> >>>>>>>>>>> QMP:    37000 Mbps
> >>>>>>>>>>
> >>>>>>>>>>> So although the Pipe size improves things (in particular the
> >>>>>>>>>>> large jump is for the 256K size, although 1M seems a very good value),
> >>>>>>>>>>> there is still a second bottleneck in there somewhere that
> >>>>>>>>>>> accounts for a loss of ~14200 Mbps in throughput.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Interesting addition: I tested quickly on a system with faster cpus and larger VM sizes, up to 200GB,
> >>>>>>>> and the difference in throughput libvirt vs qemu is basically the same ~14500 Mbps.
> >>>>>>>>
> >>>>>>>> ~50000 mbps qemu to netcat socket to /dev/null
> >>>>>>>> ~35500 mbps virsh save to /dev/null
> >>>>>>>>
> >>>>>>>> Seems it is not proportional to cpu speed by the looks of it (not a totally fair comparison because the VM sizes are different).
> >>>>>>>
> >>>>>>> It might be closer to RAM or cache bandwidth limited though; for an extra copy.
> >>>>>>
> >>>>>> I was thinking about sendfile(2) in iohelper, but that probably
> >>>>>> can't work as the input fd is a socket, I am getting EINVAL.
> >>>>>
> >>>>> Yep, sendfile() requires the input to be a mmapable FD,
> >>>>> and the output to be a socket.
> >>>>>
> >>>>> Try splice() instead  which merely requires 1 end to be a
> >>>>> pipe, and the other end can be any FD afaik.
> >>>>>
> >>>>
> >>>> I did try splice(), but performance is worse by around 500%.
> >>>
> >>> Hmm, that's certainly unexpected !
> >>>
> >>>> Any ideas welcome,
> >>>
> >>> I learnt there is also a newer  copy_file_range call, not sure if that's
> >>> any better.
> >>>
> >>> You passed len as 1 MB, I wonder if passing MAXINT is viable ? We just
> >>> want to copy everything IIRC.
> >>>
> >>> With regards,
> >>> Daniel
> >>>
> >>
> >> Crazy idea, would trying to use the parallel migration concept for migrating to/from a file make any sense?
> >>
> >> Not sure if applying the qemu multifd implementation of this would apply, maybe it could be given another implementation for "toFile", trying to use more than one cpu to do the transfer?
> > 
> > I can't see a way that would help; well, I could if you could
> > somehow have multiple io helper threads that dealt with it.
> 
> The first issue I encounter here for both the "virsh save" and "virsh restore" scenarios is that libvirt uses fd: migration, not unix: migration.
> QEMU supports multifd for unix:, tcp:, vsock: as far as I can see.
> 
> Current save procedure in QMP in short:
> 
> {"execute":"migrate-set-capabilities", ...}
> {"execute":"migrate-set-parameters", ...}
> {"execute":"getfd","arguments":{"fdname":"migrate"}, ...} fd=26
> QEMU_MONITOR_IO_SEND_FD: fd=26
> {"execute":"migrate","arguments":{"uri":"fd:migrate"}, ...}
> 
> 
> Current restore procedure in QMP in short:
> 
> (start QEMU)
> {"execute":"migrate-incoming","arguments":{"uri":"fd:21"}, ...}
> 
> 
> Should I investigate changing libvirt to use unix: for save/restore?
> Or should I look into changing qemu to somehow accept fd: for multifd, meaning I guess providing multiple fd: uris in the migrate command?

So I'm not sure this is the right direction; i.e. if multifd is the
right answer to your problem.
However, I think the qemu code probably really really wants to be a
socket.

Dave

> 
> Thank you for your help,
> 
> Claudio
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [libvirt RFC] virFile: new VIR_FILE_WRAPPER_BIG_PIPE to improve performance
  2022-04-07 13:53                                 ` Dr. David Alan Gilbert
@ 2022-04-07 13:57                                   ` Claudio Fontana
  2022-04-11 18:21                                     ` Claudio Fontana
  0 siblings, 1 reply; 30+ messages in thread
From: Claudio Fontana @ 2022-04-07 13:57 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: libvir-list, andrea.righi, Jiri Denemark, qemu-devel

On 4/7/22 3:53 PM, Dr. David Alan Gilbert wrote:
> * Claudio Fontana (cfontana@suse.de) wrote:
>> On 4/5/22 10:35 AM, Dr. David Alan Gilbert wrote:
>>> * Claudio Fontana (cfontana@suse.de) wrote:
>>>> On 3/28/22 10:31 AM, Daniel P. Berrangé wrote:
>>>>> On Sat, Mar 26, 2022 at 04:49:46PM +0100, Claudio Fontana wrote:
>>>>>> On 3/25/22 12:29 PM, Daniel P. Berrangé wrote:
>>>>>>> On Fri, Mar 18, 2022 at 02:34:29PM +0100, Claudio Fontana wrote:
>>>>>>>> On 3/17/22 4:03 PM, Dr. David Alan Gilbert wrote:
>>>>>>>>> * Claudio Fontana (cfontana@suse.de) wrote:
>>>>>>>>>> On 3/17/22 2:41 PM, Claudio Fontana wrote:
>>>>>>>>>>> On 3/17/22 11:25 AM, Daniel P. Berrangé wrote:
>>>>>>>>>>>> On Thu, Mar 17, 2022 at 11:12:11AM +0100, Claudio Fontana wrote:
>>>>>>>>>>>>> On 3/16/22 1:17 PM, Claudio Fontana wrote:
>>>>>>>>>>>>>> On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
>>>>>>>>>>>>>>> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana wrote:
>>>>>>>>>>>>>>>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
>>>>>>>>>>>>>>>>> On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana wrote:
>>>>>>>>>>>>>>>>>> the first user is the qemu driver,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> virsh save/resume would slow to a crawl with a default pipe size (64k).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This improves the situation by 400%.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Going through io_helper still seems to incur in some penalty (~15%-ish)
>>>>>>>>>>>>>>>>>> compared with direct qemu migration to a nc socket to a file.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
>>>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>>>  src/qemu/qemu_driver.c    |  6 +++---
>>>>>>>>>>>>>>>>>>  src/qemu/qemu_saveimage.c | 11 ++++++-----
>>>>>>>>>>>>>>>>>>  src/util/virfile.c        | 12 ++++++++++++
>>>>>>>>>>>>>>>>>>  src/util/virfile.h        |  1 +
>>>>>>>>>>>>>>>>>>  4 files changed, 22 insertions(+), 8 deletions(-)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hello, I initially thought this to be a qemu performance issue,
>>>>>>>>>>>>>>>>>> so you can find the discussion about this in qemu-devel:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Current results show these experimental averages maximum throughput
>>>>>>>>>>>>> migrating to /dev/null per each FdWrapper Pipe Size (as per QEMU QMP
>>>>>>>>>>>>> "query-migrate", tests repeated 5 times for each).
>>>>>>>>>>>>> VM Size is 60G, most of the memory effectively touched before migration,
>>>>>>>>>>>>> through user application allocating and touching all memory with
>>>>>>>>>>>>> pseudorandom data.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 64K:     5200 Mbps (current situation)
>>>>>>>>>>>>> 128K:    5800 Mbps
>>>>>>>>>>>>> 256K:   20900 Mbps
>>>>>>>>>>>>> 512K:   21600 Mbps
>>>>>>>>>>>>> 1M:     22800 Mbps
>>>>>>>>>>>>> 2M:     22800 Mbps
>>>>>>>>>>>>> 4M:     22400 Mbps
>>>>>>>>>>>>> 8M:     22500 Mbps
>>>>>>>>>>>>> 16M:    22800 Mbps
>>>>>>>>>>>>> 32M:    22900 Mbps
>>>>>>>>>>>>> 64M:    22900 Mbps
>>>>>>>>>>>>> 128M:   22800 Mbps
>>>>>>>>>>>>>
>>>>>>>>>>>>> This above is the throughput out of patched libvirt with multiple Pipe Sizes for the FDWrapper.
>>>>>>>>>>>>
>>>>>>>>>>>> Ok, its bouncing around with noise after 1 MB. So I'd suggest that
>>>>>>>>>>>> libvirt attempt to raise the pipe limit to 1 MB by default, but
>>>>>>>>>>>> not try to go higher.
>>>>>>>>>>>>
>>>>>>>>>>>>> As for the theoretical limit for the libvirt architecture,
>>>>>>>>>>>>> I ran a qemu migration directly issuing the appropriate QMP
>>>>>>>>>>>>> commands, setting the same migration parameters as per libvirt,
>>>>>>>>>>>>> and then migrating to a socket netcatted to /dev/null via
>>>>>>>>>>>>> {"execute": "migrate", "arguments": { "uri", "unix:///tmp/netcat.sock" } } :
>>>>>>>>>>>>>
>>>>>>>>>>>>> QMP:    37000 Mbps
>>>>>>>>>>>>
>>>>>>>>>>>>> So although the Pipe size improves things (in particular the
>>>>>>>>>>>>> large jump is for the 256K size, although 1M seems a very good value),
>>>>>>>>>>>>> there is still a second bottleneck in there somewhere that
>>>>>>>>>>>>> accounts for a loss of ~14200 Mbps in throughput.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Interesting addition: I tested quickly on a system with faster cpus and larger VM sizes, up to 200GB,
>>>>>>>>>> and the difference in throughput libvirt vs qemu is basically the same ~14500 Mbps.
>>>>>>>>>>
>>>>>>>>>> ~50000 mbps qemu to netcat socket to /dev/null
>>>>>>>>>> ~35500 mbps virsh save to /dev/null
>>>>>>>>>>
>>>>>>>>>> Seems it is not proportional to cpu speed by the looks of it (not a totally fair comparison because the VM sizes are different).
>>>>>>>>>
>>>>>>>>> It might be closer to RAM or cache bandwidth limited though; for an extra copy.
>>>>>>>>
>>>>>>>> I was thinking about sendfile(2) in iohelper, but that probably
>>>>>>>> can't work as the input fd is a socket, I am getting EINVAL.
>>>>>>>
>>>>>>> Yep, sendfile() requires the input to be a mmapable FD,
>>>>>>> and the output to be a socket.
>>>>>>>
>>>>>>> Try splice() instead  which merely requires 1 end to be a
>>>>>>> pipe, and the other end can be any FD afaik.
>>>>>>>
>>>>>>
>>>>>> I did try splice(), but performance is worse by around 500%.
>>>>>
>>>>> Hmm, that's certainly unexpected !
>>>>>
>>>>>> Any ideas welcome,
>>>>>
>>>>> I learnt there is also a newer  copy_file_range call, not sure if that's
>>>>> any better.
>>>>>
>>>>> You passed len as 1 MB, I wonder if passing MAXINT is viable ? We just
>>>>> want to copy everything IIRC.
>>>>>
>>>>> With regards,
>>>>> Daniel
>>>>>
>>>>
>>>> Crazy idea, would trying to use the parallel migration concept for migrating to/from a file make any sense?
>>>>
>>>> Not sure if applying the qemu multifd implementation of this would apply, maybe it could be given another implementation for "toFile", trying to use more than one cpu to do the transfer?
>>>
>>> I can't see a way that would help; well, I could if you could
>>> somehow have multiple io helper threads that dealt with it.
>>
>> The first issue I encounter here for both the "virsh save" and "virsh restore" scenarios is that libvirt uses fd: migration, not unix: migration.
>> QEMU supports multifd for unix:, tcp:, vsock: as far as I can see.
>>
>> Current save procedure in QMP in short:
>>
>> {"execute":"migrate-set-capabilities", ...}
>> {"execute":"migrate-set-parameters", ...}
>> {"execute":"getfd","arguments":{"fdname":"migrate"}, ...} fd=26
>> QEMU_MONITOR_IO_SEND_FD: fd=26
>> {"execute":"migrate","arguments":{"uri":"fd:migrate"}, ...}
>>
>>
>> Current restore procedure in QMP in short:
>>
>> (start QEMU)
>> {"execute":"migrate-incoming","arguments":{"uri":"fd:21"}, ...}
>>
>>
>> Should I investigate changing libvirt to use unix: for save/restore?
>> Or should I look into changing qemu to somehow accept fd: for multifd, meaning I guess providing multiple fd: uris in the migrate command?
> 
> So I'm not sure this is the right direction; i.e. if multifd is the
> right answer to your problem.

Of course, just exploring the space.

> However, I think the qemu code probably really really wants to be a
> socket.

Understood, I'll try to bend libvirt to use unix:/// and see how far I get,

Thanks,

Claudio

> 
> Dave
> 
>>
>> Thank you for your help,
>>
>> Claudio
>>



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [libvirt RFC] virFile: new VIR_FILE_WRAPPER_BIG_PIPE to improve performance
  2022-03-25 10:33                     ` Daniel P. Berrangé
  2022-03-25 10:56                       ` Claudio Fontana
@ 2022-04-10 19:58                       ` Claudio Fontana
  1 sibling, 0 replies; 30+ messages in thread
From: Claudio Fontana @ 2022-04-10 19:58 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: libvir-list, andrea.righi, Jiri Denemark, Dr. David Alan Gilbert,
	qemu-devel

Hi Daniel,

On 3/25/22 11:33 AM, Daniel P. Berrangé wrote:
> On Fri, Mar 18, 2022 at 02:34:29PM +0100, Claudio Fontana wrote:
>> On 3/17/22 4:03 PM, Dr. David Alan Gilbert wrote:
>>> * Claudio Fontana (cfontana@suse.de) wrote:
>>>> On 3/17/22 2:41 PM, Claudio Fontana wrote:
>>>>> On 3/17/22 11:25 AM, Daniel P. Berrangé wrote:
>>>>>> On Thu, Mar 17, 2022 at 11:12:11AM +0100, Claudio Fontana wrote:
>>>>>>> On 3/16/22 1:17 PM, Claudio Fontana wrote:
>>>>>>>> On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
>>>>>>>>> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana wrote:
>>>>>>>>>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
>>>>>>>>>>> On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana wrote:
>>>>>>>>>>>> the first user is the qemu driver,
>>>>>>>>>>>>
>>>>>>>>>>>> virsh save/resume would slow to a crawl with a default pipe size (64k).
>>>>>>>>>>>>
>>>>>>>>>>>> This improves the situation by 400%.
>>>>>>>>>>>>
>>>>>>>>>>>> Going through io_helper still seems to incur in some penalty (~15%-ish)
>>>>>>>>>>>> compared with direct qemu migration to a nc socket to a file.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
>>>>>>>>>>>> ---
>>>>>>>>>>>>  src/qemu/qemu_driver.c    |  6 +++---
>>>>>>>>>>>>  src/qemu/qemu_saveimage.c | 11 ++++++-----
>>>>>>>>>>>>  src/util/virfile.c        | 12 ++++++++++++
>>>>>>>>>>>>  src/util/virfile.h        |  1 +
>>>>>>>>>>>>  4 files changed, 22 insertions(+), 8 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> Hello, I initially thought this to be a qemu performance issue,
>>>>>>>>>>>> so you can find the discussion about this in qemu-devel:
>>>>>>>>>>>>
>>>>>>>>>>>> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
>>>>>>>>>>>>
>>>>>>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html
>>>>>>
>>>>>>
>>>>>>> Current results show these experimental averages maximum throughput
>>>>>>> migrating to /dev/null per each FdWrapper Pipe Size (as per QEMU QMP
>>>>>>> "query-migrate", tests repeated 5 times for each).
>>>>>>> VM Size is 60G, most of the memory effectively touched before migration,
>>>>>>> through user application allocating and touching all memory with
>>>>>>> pseudorandom data.
>>>>>>>
>>>>>>> 64K:     5200 Mbps (current situation)
>>>>>>> 128K:    5800 Mbps
>>>>>>> 256K:   20900 Mbps
>>>>>>> 512K:   21600 Mbps
>>>>>>> 1M:     22800 Mbps
>>>>>>> 2M:     22800 Mbps
>>>>>>> 4M:     22400 Mbps
>>>>>>> 8M:     22500 Mbps
>>>>>>> 16M:    22800 Mbps
>>>>>>> 32M:    22900 Mbps
>>>>>>> 64M:    22900 Mbps
>>>>>>> 128M:   22800 Mbps
>>>>>>>
>>>>>>> This above is the throughput out of patched libvirt with multiple Pipe Sizes for the FDWrapper.
>>>>>>
>>>>>> Ok, its bouncing around with noise after 1 MB. So I'd suggest that
>>>>>> libvirt attempt to raise the pipe limit to 1 MB by default, but
>>>>>> not try to go higher.
>>>>>>
>>>>>>> As for the theoretical limit for the libvirt architecture,
>>>>>>> I ran a qemu migration directly issuing the appropriate QMP
>>>>>>> commands, setting the same migration parameters as per libvirt,
>>>>>>> and then migrating to a socket netcatted to /dev/null via
>>>>>>> {"execute": "migrate", "arguments": { "uri", "unix:///tmp/netcat.sock" } } :
>>>>>>>
>>>>>>> QMP:    37000 Mbps
>>>>>>
>>>>>>> So although the Pipe size improves things (in particular the
>>>>>>> large jump is for the 256K size, although 1M seems a very good value),
>>>>>>> there is still a second bottleneck in there somewhere that
>>>>>>> accounts for a loss of ~14200 Mbps in throughput.
>>>>
>>>>
>>>> Interesting addition: I tested quickly on a system with faster cpus and larger VM sizes, up to 200GB,
>>>> and the difference in throughput libvirt vs qemu is basically the same ~14500 Mbps.
>>>>
>>>> ~50000 mbps qemu to netcat socket to /dev/null
>>>> ~35500 mbps virsh save to /dev/null
>>>>
>>>> Seems it is not proportional to cpu speed by the looks of it (not a totally fair comparison because the VM sizes are different).
>>>
>>> It might be closer to RAM or cache bandwidth limited though; for an extra copy.
>>
>> I was thinking about sendfile(2) in iohelper, but that probably can't work as the input fd is a socket, I am getting EINVAL.
>>
>> One thing that I noticed is:
>>
>> commit afe6e58aedcd5e27ea16184fed90b338569bd042
>> Author: Jiri Denemark <jdenemar@redhat.com>
>> Date:   Mon Feb 6 14:40:48 2012 +0100
>>
>>     util: Generalize virFileDirectFd
>>     
>>     virFileDirectFd was used for accessing files opened with O_DIRECT using
>>     libvirt_iohelper. We will want to use the helper for accessing files
>>     regardless on O_DIRECT and thus virFileDirectFd was generalized and
>>     renamed to virFileWrapperFd.
>>
>>
>> And in particular the comment in src/util/virFile.c:
>>
>>     /* XXX support posix_fadvise rather than O_DIRECT, if the kernel support
>>      * for that is decent enough. In that case, we will also need to
>>      * explicitly support VIR_FILE_WRAPPER_NON_BLOCKING since
>>      * VIR_FILE_WRAPPER_BYPASS_CACHE alone will no longer require spawning
>>      * iohelper.
>>      */
>>
>> by Jiri Denemark.
>>
>> I have lots of questions here, and I tried to involve Jiri and Andrea Righi here, who a long time ago proposed a POSIX_FADV_NOREUSE implementation.
>>
>> 1) What is the reason iohelper was introduced?
> 
> With POSIX you can't get sensible results from poll() on FDs associated with
> plain files. It will always report the file as readable/writable, and the
> userspace caller will get blocked any time the I/O operation causes the
> kernel to read/write from the underlying (potentially very slow) storage.
> 
> IOW if you give QEMU an FD associated with a plain file and tell it to
> migrate to that, the guest OS will get stalled.
> 
> To avoid this we have to give QEMU an FD that is NOT a plain file, but
> rather something on which poll() works correctly to avoid blocking. This
> essentially means a socket or pipe FD.
> 
> Here enters the iohelper - we give QEMU a pipe whose other end is the
> iohelper. The iohelper suffers from blocking on read/write but that
> doesn't matter, because QEMU is isolated from this via the pipe.
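
As a side note, the poll() behaviour described above is easy to see with a tiny standalone test program (just an illustrative sketch, not libvirt code; the path is arbitrary):

#include <fcntl.h>
#include <poll.h>
#include <stdio.h>

int main(void)
{
    int fd = open("/tmp/plainfile", O_WRONLY | O_CREAT, 0600);
    struct pollfd p = { .fd = fd, .events = POLLOUT };

    /* For a regular file this always returns 1 with POLLOUT set,
     * no matter how slow the underlying storage is; the blocking
     * happens later, inside write().  A pipe or socket fd, by
     * contrast, only reports POLLOUT when buffer space is
     * actually available. */
    int n = poll(&p, 1, 0);
    printf("poll() = %d, revents = 0x%x\n", n, p.revents);
    return 0;
}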


I am still puzzled by this: when we migrate to a file via virsh save in qemu_saveimage.c, we suspend the guest anyway, right?

But maybe there is some other problem that comes into play here?

In the Restore code, i.e. qemuSaveImageOpen(), we say:

    if (bypass_cache &&
        !(*wrapperFd = virFileWrapperFdNew(&fd, path,
                                           VIR_FILE_WRAPPER_BYPASS_CACHE)))
        return -1;

Why don't we make the wrapper conditional on bypass_cache in the Save code too, in qemuSaveImageCreate?

I ask because I tried this change:

commit ae7dff45f10be78d1555e3f302f337e72afa300c
Author: Claudio Fontana <cfontana@suse.de>
Date:   Sun Apr 10 12:33:37 2022 -0600

    only use wrapper if you want to skip the filesystem cache
    
    Signed-off-by: Claudio Fontana <cfontana@suse.de>

diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c
index 4fd4c5cfcd..5ea1b2fbcc 100644
--- a/src/qemu/qemu_saveimage.c
+++ b/src/qemu/qemu_saveimage.c
@@ -289,8 +289,10 @@ qemuSaveImageCreate(virQEMUDriver *driver,
     if (qemuSecuritySetImageFDLabel(driver->securityManager, vm->def, fd) < 0)
         goto cleanup;
 
-    if (!(wrapperFd = virFileWrapperFdNew(&fd, path, wrapperFlags)))
-        goto cleanup;
+    if ((flags & VIR_DOMAIN_SAVE_BYPASS_CACHE)) {
+        if (!(wrapperFd = virFileWrapperFdNew(&fd, path, wrapperFlags)))
+            goto cleanup;
+    }
 
     if (virQEMUSaveDataWrite(data, fd, path) < 0)
         goto cleanup;



and I got a pretty good performance improvement; in my use case it would be better not to use O_DIRECT anymore,
and nothing prevents still using O_DIRECT if desired.

I get these results with a 90 G VM with this patch applied:

# echo 3 > /proc/sys/vm/drop_caches
# time virsh save centos7 /vm_images/claudio/savevm --bypass-cache
Domain 'centos7' saved to /vm_images/claudio/savevm
real	2m9.368s

# echo 3 > /proc/sys/vm/drop_caches
# time virsh save centos7 /vm_images/claudio/savevm
Domain 'centos7' saved to /vm_images/claudio/savevm
real	0m42.155s

and now without this patch applied:

# echo 3 > /proc/sys/vm/drop_caches
# time virsh save centos7 /vm_images/claudio/savevm --bypass-cache
Domain 'centos7' saved to /vm_images/claudio/savevm
real	2m10.468s

# echo 3 > /proc/sys/vm/drop_caches
# time virsh save centos7 /vm_images/claudio/savevm
Domain 'centos7' saved to /vm_images/claudio/savevm
real	2m6.142s


I'll rerun the numbers next week on a machine with a better cpu, if possible.

Thanks,

Claudio

> 
> In theory we could just spawn a thread inside libvirtd to do the same
> as the iohelper, but using a separate helper process is more robust
> 
> If not using libvirt, you would use QEMU's 'exec:' migration protocol
> with 'dd' or 'cat' for the same reasons. Libvirt provides the iohelper
> so we don't have to deal with portability questions around 'dd' syntax
> and can add features like O_DIRECT that cat lacks.
> 
>> 2) Was Jiri's comment about the missing linux implementation of POSIX_FADV_NOREUSE?
>>
>> 3) if using O_DIRECT is the only reason for iohelper to exist (...?), would replacing it with posix_fadvise remove the need for iohelper?
> 
> We can't remove the iohelper for the reason above.
> 
>> 4) What has stopped Andreas' or another POSIX_FADV_NOREUSE implementation in the kernel?
> 
> With regards,
> Daniel
> 



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [libvirt RFC] virFile: new VIR_FILE_WRAPPER_BIG_PIPE to improve performance
  2022-04-07 13:57                                   ` Claudio Fontana
@ 2022-04-11 18:21                                     ` Claudio Fontana
  2022-04-11 18:53                                       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 30+ messages in thread
From: Claudio Fontana @ 2022-04-11 18:21 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: libvir-list, andrea.righi, Jiri Denemark, qemu-devel

On 4/7/22 3:57 PM, Claudio Fontana wrote:
> On 4/7/22 3:53 PM, Dr. David Alan Gilbert wrote:
>> * Claudio Fontana (cfontana@suse.de) wrote:
>>> On 4/5/22 10:35 AM, Dr. David Alan Gilbert wrote:
>>>> * Claudio Fontana (cfontana@suse.de) wrote:
>>>>> On 3/28/22 10:31 AM, Daniel P. Berrangé wrote:
>>>>>> On Sat, Mar 26, 2022 at 04:49:46PM +0100, Claudio Fontana wrote:
>>>>>>> On 3/25/22 12:29 PM, Daniel P. Berrangé wrote:
>>>>>>>> On Fri, Mar 18, 2022 at 02:34:29PM +0100, Claudio Fontana wrote:
>>>>>>>>> On 3/17/22 4:03 PM, Dr. David Alan Gilbert wrote:
>>>>>>>>>> * Claudio Fontana (cfontana@suse.de) wrote:
>>>>>>>>>>> On 3/17/22 2:41 PM, Claudio Fontana wrote:
>>>>>>>>>>>> On 3/17/22 11:25 AM, Daniel P. Berrangé wrote:
>>>>>>>>>>>>> On Thu, Mar 17, 2022 at 11:12:11AM +0100, Claudio Fontana wrote:
>>>>>>>>>>>>>> On 3/16/22 1:17 PM, Claudio Fontana wrote:
>>>>>>>>>>>>>>> On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
>>>>>>>>>>>>>>>> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana wrote:
>>>>>>>>>>>>>>>>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
>>>>>>>>>>>>>>>>>> On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana wrote:
>>>>>>>>>>>>>>>>>>> the first user is the qemu driver,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> virsh save/resume would slow to a crawl with a default pipe size (64k).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> This improves the situation by 400%.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Going through io_helper still seems to incur in some penalty (~15%-ish)
>>>>>>>>>>>>>>>>>>> compared with direct qemu migration to a nc socket to a file.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
>>>>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>>>>  src/qemu/qemu_driver.c    |  6 +++---
>>>>>>>>>>>>>>>>>>>  src/qemu/qemu_saveimage.c | 11 ++++++-----
>>>>>>>>>>>>>>>>>>>  src/util/virfile.c        | 12 ++++++++++++
>>>>>>>>>>>>>>>>>>>  src/util/virfile.h        |  1 +
>>>>>>>>>>>>>>>>>>>  4 files changed, 22 insertions(+), 8 deletions(-)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hello, I initially thought this to be a qemu performance issue,
>>>>>>>>>>>>>>>>>>> so you can find the discussion about this in qemu-devel:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Current results show these experimental averages maximum throughput
>>>>>>>>>>>>>> migrating to /dev/null per each FdWrapper Pipe Size (as per QEMU QMP
>>>>>>>>>>>>>> "query-migrate", tests repeated 5 times for each).
>>>>>>>>>>>>>> VM Size is 60G, most of the memory effectively touched before migration,
>>>>>>>>>>>>>> through user application allocating and touching all memory with
>>>>>>>>>>>>>> pseudorandom data.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 64K:     5200 Mbps (current situation)
>>>>>>>>>>>>>> 128K:    5800 Mbps
>>>>>>>>>>>>>> 256K:   20900 Mbps
>>>>>>>>>>>>>> 512K:   21600 Mbps
>>>>>>>>>>>>>> 1M:     22800 Mbps
>>>>>>>>>>>>>> 2M:     22800 Mbps
>>>>>>>>>>>>>> 4M:     22400 Mbps
>>>>>>>>>>>>>> 8M:     22500 Mbps
>>>>>>>>>>>>>> 16M:    22800 Mbps
>>>>>>>>>>>>>> 32M:    22900 Mbps
>>>>>>>>>>>>>> 64M:    22900 Mbps
>>>>>>>>>>>>>> 128M:   22800 Mbps
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This above is the throughput out of patched libvirt with multiple Pipe Sizes for the FDWrapper.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ok, its bouncing around with noise after 1 MB. So I'd suggest that
>>>>>>>>>>>>> libvirt attempt to raise the pipe limit to 1 MB by default, but
>>>>>>>>>>>>> not try to go higher.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> As for the theoretical limit for the libvirt architecture,
>>>>>>>>>>>>>> I ran a qemu migration directly issuing the appropriate QMP
>>>>>>>>>>>>>> commands, setting the same migration parameters as per libvirt,
>>>>>>>>>>>>>> and then migrating to a socket netcatted to /dev/null via
>>>>>>>>>>>>>> {"execute": "migrate", "arguments": { "uri", "unix:///tmp/netcat.sock" } } :
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> QMP:    37000 Mbps
>>>>>>>>>>>>>
>>>>>>>>>>>>>> So although the Pipe size improves things (in particular the
>>>>>>>>>>>>>> large jump is for the 256K size, although 1M seems a very good value),
>>>>>>>>>>>>>> there is still a second bottleneck in there somewhere that
>>>>>>>>>>>>>> accounts for a loss of ~14200 Mbps in throughput.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Interesting addition: I tested quickly on a system with faster cpus and larger VM sizes, up to 200GB,
>>>>>>>>>>> and the difference in throughput libvirt vs qemu is basically the same ~14500 Mbps.
>>>>>>>>>>>
>>>>>>>>>>> ~50000 mbps qemu to netcat socket to /dev/null
>>>>>>>>>>> ~35500 mbps virsh save to /dev/null
>>>>>>>>>>>
>>>>>>>>>>> Seems it is not proportional to cpu speed by the looks of it (not a totally fair comparison because the VM sizes are different).
>>>>>>>>>>
>>>>>>>>>> It might be closer to RAM or cache bandwidth limited though; for an extra copy.
>>>>>>>>>
>>>>>>>>> I was thinking about sendfile(2) in iohelper, but that probably
>>>>>>>>> can't work as the input fd is a socket, I am getting EINVAL.
>>>>>>>>
>>>>>>>> Yep, sendfile() requires the input to be a mmapable FD,
>>>>>>>> and the output to be a socket.
>>>>>>>>
>>>>>>>> Try splice() instead  which merely requires 1 end to be a
>>>>>>>> pipe, and the other end can be any FD afaik.
>>>>>>>>
>>>>>>>
>>>>>>> I did try splice(), but performance is worse by around 500%.
>>>>>>
>>>>>> Hmm, that's certainly unexpected !
>>>>>>
>>>>>>> Any ideas welcome,
>>>>>>
>>>>>> I learnt there is also a newer  copy_file_range call, not sure if that's
>>>>>> any better.
>>>>>>
>>>>>> You passed len as 1 MB, I wonder if passing MAXINT is viable ? We just
>>>>>> want to copy everything IIRC.
>>>>>>
>>>>>> With regards,
>>>>>> Daniel
>>>>>>
>>>>>
>>>>> Crazy idea, would trying to use the parallel migration concept for migrating to/from a file make any sense?
>>>>>
>>>>> Not sure if applying the qemu multifd implementation of this would apply, maybe it could be given another implementation for "toFile", trying to use more than one cpu to do the transfer?
>>>>
>>>> I can't see a way that would help; well, I could if you could
>>>> somehow have multiple io helper threads that dealt with it.
>>>
>>> The first issue I encounter here for both the "virsh save" and "virsh restore" scenarios is that libvirt uses fd: migration, not unix: migration.
>>> QEMU supports multifd for unix:, tcp:, vsock: as far as I can see.
>>>
>>> Current save procedure in QMP in short:
>>>
>>> {"execute":"migrate-set-capabilities", ...}
>>> {"execute":"migrate-set-parameters", ...}
>>> {"execute":"getfd","arguments":{"fdname":"migrate"}, ...} fd=26
>>> QEMU_MONITOR_IO_SEND_FD: fd=26
>>> {"execute":"migrate","arguments":{"uri":"fd:migrate"}, ...}
>>>
>>>
>>> Current restore procedure in QMP in short:
>>>
>>> (start QEMU)
>>> {"execute":"migrate-incoming","arguments":{"uri":"fd:21"}, ...}
>>>
>>>
>>> Should I investigate changing libvirt to use unix: for save/restore?
>>> Or should I look into changing qemu to somehow accept fd: for multifd, meaning I guess providing multiple fd: uris in the migrate command?
>>
>> So I'm not sure this is the right direction; i.e. if multifd is the
>> right answer to your problem.
> 
> Of course, just exploring the space.


I have some progress on multifd, if we can call it that:

I wrote a simple program that sets up a unix socket,
listens there for N_CHANNELS + 1 connections, sets up the multifd parameters, and runs the migration,
spawning a thread for each incoming connection from QEMU and creating a file to store the migration data coming from QEMU (optionally using O_DIRECT).

This program plays the role of an "iohelper"-like thing, basically just copying the data over, making O_DIRECT possible.

I save the data streams to multiple files; this works, though for the actual results I will have to move to a better hardware setup (enterprise nvme + fast cpu, under various memory configurations).
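
To make that a bit more concrete, the receiver side looks roughly like the sketch below (heavily simplified: no error handling, the QMP setup is not shown, and all names, sizes and paths are made up):

/* minimal sketch of the receiver: accept N_CHANNELS + 1 connections from
 * QEMU on a unix socket and copy each stream into its own file;
 * build with something like: gcc -pthread recv-sketch.c */
#define _GNU_SOURCE             /* for O_DIRECT, if enabled below */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#define N_CHANNELS 4
#define BUF_SIZE   (1024 * 1024)        /* aligned, O_DIRECT-friendly */

static void *copy_stream(void *arg)
{
    static int idx;
    int conn = (int)(long)arg;
    int chan = __sync_fetch_and_add(&idx, 1);
    char path[64];
    void *buf;
    ssize_t n;

    snprintf(path, sizeof(path), "/var/tmp/migrate-chan-%d", chan);
    /* add O_DIRECT here to bypass the page cache; the trailing partial
     * block then needs special handling, elided in this sketch */
    int file = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    posix_memalign(&buf, 4096, BUF_SIZE);

    while ((n = read(conn, buf, BUF_SIZE)) > 0)
        if (write(file, buf, n) != n)
            break;

    free(buf);
    close(file);
    close(conn);
    return NULL;
}

int main(void)
{
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    pthread_t th[N_CHANNELS + 1];
    int sock = socket(AF_UNIX, SOCK_STREAM, 0);
    int i;

    strcpy(addr.sun_path, "/var/tmp/migrate.sock");
    unlink(addr.sun_path);
    bind(sock, (struct sockaddr *)&addr, sizeof(addr));
    listen(sock, N_CHANNELS + 1);

    /* one connection for the main migration channel plus one per multifd channel */
    for (i = 0; i < N_CHANNELS + 1; i++) {
        int conn = accept(sock, NULL, NULL);
        pthread_create(&th[i], NULL, copy_stream, (void *)(long)conn);
    }
    for (i = 0; i < N_CHANNELS + 1; i++)
        pthread_join(th[i], NULL);
    return 0;
}

Each thread just read()s from its connection and write()s to its own file, so the whole O_DIRECT decision stays outside of QEMU.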

The intuition is that if we have enough cpus to spare (no libvirt in the picture for now, as mentioned),
say, the same 4 cpus already allocated for a certain VM to run, we can use those cpus (now "free" since we suspended the guest)
to compress each multifd channel (multifd-zstd? multifd-zlib?), thus reducing the amount of data that needs to go to disk.
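
For reference, the QMP setup I am experimenting with looks roughly like this (channel count, compression level and socket path are just examples):

{"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"multifd","state":true}]}}
{"execute":"migrate-set-parameters","arguments":{"multifd-channels":4,"multifd-compression":"zstd","multifd-zstd-level":1}}
{"execute":"migrate","arguments":{"uri":"unix:///var/tmp/migrate.sock"}}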

Work in progress...

> 
>> However, I think the qemu code probably really really wants to be a
>> socket.
> 
> Understood, I'll try to bend libvirt to use unix:/// and see how far I get,
> 
> Thanks,
> 
> Claudio
> 
>>
>> Dave
>>
>>>
>>> Thank you for your help,
>>>
>>> Claudio
>>>
> 



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [libvirt RFC] virFile: new VIR_FILE_WRAPPER_BIG_PIPE to improve performance
  2022-04-11 18:21                                     ` Claudio Fontana
@ 2022-04-11 18:53                                       ` Dr. David Alan Gilbert
  2022-04-12  9:04                                         ` Claudio Fontana
  0 siblings, 1 reply; 30+ messages in thread
From: Dr. David Alan Gilbert @ 2022-04-11 18:53 UTC (permalink / raw)
  To: Claudio Fontana; +Cc: libvir-list, andrea.righi, Jiri Denemark, qemu-devel

* Claudio Fontana (cfontana@suse.de) wrote:
> On 4/7/22 3:57 PM, Claudio Fontana wrote:
> > On 4/7/22 3:53 PM, Dr. David Alan Gilbert wrote:
> >> * Claudio Fontana (cfontana@suse.de) wrote:
> >>> On 4/5/22 10:35 AM, Dr. David Alan Gilbert wrote:
> >>>> * Claudio Fontana (cfontana@suse.de) wrote:
> >>>>> On 3/28/22 10:31 AM, Daniel P. Berrangé wrote:
> >>>>>> On Sat, Mar 26, 2022 at 04:49:46PM +0100, Claudio Fontana wrote:
> >>>>>>> On 3/25/22 12:29 PM, Daniel P. Berrangé wrote:
> >>>>>>>> On Fri, Mar 18, 2022 at 02:34:29PM +0100, Claudio Fontana wrote:
> >>>>>>>>> On 3/17/22 4:03 PM, Dr. David Alan Gilbert wrote:
> >>>>>>>>>> * Claudio Fontana (cfontana@suse.de) wrote:
> >>>>>>>>>>> On 3/17/22 2:41 PM, Claudio Fontana wrote:
> >>>>>>>>>>>> On 3/17/22 11:25 AM, Daniel P. Berrangé wrote:
> >>>>>>>>>>>>> On Thu, Mar 17, 2022 at 11:12:11AM +0100, Claudio Fontana wrote:
> >>>>>>>>>>>>>> On 3/16/22 1:17 PM, Claudio Fontana wrote:
> >>>>>>>>>>>>>>> On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
> >>>>>>>>>>>>>>>> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana wrote:
> >>>>>>>>>>>>>>>>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
> >>>>>>>>>>>>>>>>>> On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana wrote:
> >>>>>>>>>>>>>>>>>>> the first user is the qemu driver,
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> virsh save/resume would slow to a crawl with a default pipe size (64k).
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> This improves the situation by 400%.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Going through io_helper still seems to incur in some penalty (~15%-ish)
> >>>>>>>>>>>>>>>>>>> compared with direct qemu migration to a nc socket to a file.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
> >>>>>>>>>>>>>>>>>>> ---
> >>>>>>>>>>>>>>>>>>>  src/qemu/qemu_driver.c    |  6 +++---
> >>>>>>>>>>>>>>>>>>>  src/qemu/qemu_saveimage.c | 11 ++++++-----
> >>>>>>>>>>>>>>>>>>>  src/util/virfile.c        | 12 ++++++++++++
> >>>>>>>>>>>>>>>>>>>  src/util/virfile.h        |  1 +
> >>>>>>>>>>>>>>>>>>>  4 files changed, 22 insertions(+), 8 deletions(-)
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Hello, I initially thought this to be a qemu performance issue,
> >>>>>>>>>>>>>>>>>>> so you can find the discussion about this in qemu-devel:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Current results show these experimental averages maximum throughput
> >>>>>>>>>>>>>> migrating to /dev/null per each FdWrapper Pipe Size (as per QEMU QMP
> >>>>>>>>>>>>>> "query-migrate", tests repeated 5 times for each).
> >>>>>>>>>>>>>> VM Size is 60G, most of the memory effectively touched before migration,
> >>>>>>>>>>>>>> through user application allocating and touching all memory with
> >>>>>>>>>>>>>> pseudorandom data.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 64K:     5200 Mbps (current situation)
> >>>>>>>>>>>>>> 128K:    5800 Mbps
> >>>>>>>>>>>>>> 256K:   20900 Mbps
> >>>>>>>>>>>>>> 512K:   21600 Mbps
> >>>>>>>>>>>>>> 1M:     22800 Mbps
> >>>>>>>>>>>>>> 2M:     22800 Mbps
> >>>>>>>>>>>>>> 4M:     22400 Mbps
> >>>>>>>>>>>>>> 8M:     22500 Mbps
> >>>>>>>>>>>>>> 16M:    22800 Mbps
> >>>>>>>>>>>>>> 32M:    22900 Mbps
> >>>>>>>>>>>>>> 64M:    22900 Mbps
> >>>>>>>>>>>>>> 128M:   22800 Mbps
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> This above is the throughput out of patched libvirt with multiple Pipe Sizes for the FDWrapper.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Ok, its bouncing around with noise after 1 MB. So I'd suggest that
> >>>>>>>>>>>>> libvirt attempt to raise the pipe limit to 1 MB by default, but
> >>>>>>>>>>>>> not try to go higher.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> As for the theoretical limit for the libvirt architecture,
> >>>>>>>>>>>>>> I ran a qemu migration directly issuing the appropriate QMP
> >>>>>>>>>>>>>> commands, setting the same migration parameters as per libvirt,
> >>>>>>>>>>>>>> and then migrating to a socket netcatted to /dev/null via
> >>>>>>>>>>>>>> {"execute": "migrate", "arguments": { "uri", "unix:///tmp/netcat.sock" } } :
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> QMP:    37000 Mbps
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> So although the Pipe size improves things (in particular the
> >>>>>>>>>>>>>> large jump is for the 256K size, although 1M seems a very good value),
> >>>>>>>>>>>>>> there is still a second bottleneck in there somewhere that
> >>>>>>>>>>>>>> accounts for a loss of ~14200 Mbps in throughput.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Interesting addition: I tested quickly on a system with faster cpus and larger VM sizes, up to 200GB,
> >>>>>>>>>>> and the difference in throughput libvirt vs qemu is basically the same ~14500 Mbps.
> >>>>>>>>>>>
> >>>>>>>>>>> ~50000 mbps qemu to netcat socket to /dev/null
> >>>>>>>>>>> ~35500 mbps virsh save to /dev/null
> >>>>>>>>>>>
> >>>>>>>>>>> Seems it is not proportional to cpu speed by the looks of it (not a totally fair comparison because the VM sizes are different).
> >>>>>>>>>>
> >>>>>>>>>> It might be closer to RAM or cache bandwidth limited though; for an extra copy.
> >>>>>>>>>
> >>>>>>>>> I was thinking about sendfile(2) in iohelper, but that probably
> >>>>>>>>> can't work as the input fd is a socket, I am getting EINVAL.
> >>>>>>>>
> >>>>>>>> Yep, sendfile() requires the input to be a mmapable FD,
> >>>>>>>> and the output to be a socket.
> >>>>>>>>
> >>>>>>>> Try splice() instead  which merely requires 1 end to be a
> >>>>>>>> pipe, and the other end can be any FD afaik.
> >>>>>>>>
> >>>>>>>
> >>>>>>> I did try splice(), but performance is worse by around 500%.
> >>>>>>
> >>>>>> Hmm, that's certainly unexpected !
> >>>>>>
> >>>>>>> Any ideas welcome,
> >>>>>>
> >>>>>> I learnt there is also a newer  copy_file_range call, not sure if that's
> >>>>>> any better.
> >>>>>>
> >>>>>> You passed len as 1 MB, I wonder if passing MAXINT is viable ? We just
> >>>>>> want to copy everything IIRC.
> >>>>>>
> >>>>>> With regards,
> >>>>>> Daniel
> >>>>>>
> >>>>>
> >>>>> Crazy idea, would trying to use the parallel migration concept for migrating to/from a file make any sense?
> >>>>>
> >>>>> Not sure if applying the qemu multifd implementation of this would apply, maybe it could be given another implementation for "toFile", trying to use more than one cpu to do the transfer?
> >>>>
> >>>> I can't see a way that would help; well, I could if you could
> >>>> somehow have multiple io helper threads that dealt with it.
> >>>
> >>> The first issue I encounter here for both the "virsh save" and "virsh restore" scenarios is that libvirt uses fd: migration, not unix: migration.
> >>> QEMU supports multifd for unix:, tcp:, vsock: as far as I can see.
> >>>
> >>> Current save procedure in QMP in short:
> >>>
> >>> {"execute":"migrate-set-capabilities", ...}
> >>> {"execute":"migrate-set-parameters", ...}
> >>> {"execute":"getfd","arguments":{"fdname":"migrate"}, ...} fd=26
> >>> QEMU_MONITOR_IO_SEND_FD: fd=26
> >>> {"execute":"migrate","arguments":{"uri":"fd:migrate"}, ...}
> >>>
> >>>
> >>> Current restore procedure in QMP in short:
> >>>
> >>> (start QEMU)
> >>> {"execute":"migrate-incoming","arguments":{"uri":"fd:21"}, ...}
> >>>
> >>>
> >>> Should I investigate changing libvirt to use unix: for save/restore?
> >>> Or should I look into changing qemu to somehow accept fd: for multifd, meaning I guess providing multiple fd: uris in the migrate command?
> >>
> >> So I'm not sure this is the right direction; i.e. if multifd is the
> >> right answer to your problem.
> > 
> > Of course, just exploring the space.
> 
> 
> I have some progress on multifd if we can call it so:
> 
> I wrote a simple program that sets up a unix socket,
> listens for N_CHANNELS + 1 connections there, sets up multifd parameters, and runs the migration,
> spawning threads for each incoming connection from QEMU, creating a file to use to store the migration data coming from qemu (optionally using O_DIRECT).
> 
> This program plays the role of a "iohelper"-like thing, basically just copying things over, making O_DIRECT possible.
> 
> I save the data streams to multiple files; this works, for the actual results though I will have to migrate to a better hardware setup (enterprise nvme + fast cpu, under various memory configurations).
> 
> The intuition would be that if we have enough cpus to spare (no libvirt in the picture as mentioned for now),
> say, the same 4 cpus already allocated for a certain VM to run, we can use those cpus (now "free" since we suspended the guest)
> to compress each multifd channel (multifd-zstd? multifd-zlib?), thus reducing the amount of stuff that needs to go to disk, making use of those cpus.

Yes possibly; you have an advantage over normal migration, in that your
vCPUs are stopped.

> Work in progress...
> 
> > 
> >> However, I think the qemu code probably really really wants to be a
> >> socket.
> > 
> > Understood, I'll try to bend libvirt to use unix:/// and see how far I get,
> > 
> > Thanks,
> > 
> > Claudio
> > 
> >>
> >> Dave
> >>
> >>>
> >>> Thank you for your help,
> >>>
> >>> Claudio
> >>>
> > 
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [libvirt RFC] virFile: new VIR_FILE_WRAPPER_BIG_PIPE to improve performance
  2022-04-11 18:53                                       ` Dr. David Alan Gilbert
@ 2022-04-12  9:04                                         ` Claudio Fontana
  0 siblings, 0 replies; 30+ messages in thread
From: Claudio Fontana @ 2022-04-12  9:04 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: libvir-list, andrea.righi, Jiri Denemark, qemu-devel

On 4/11/22 8:53 PM, Dr. David Alan Gilbert wrote:
> * Claudio Fontana (cfontana@suse.de) wrote:
>> On 4/7/22 3:57 PM, Claudio Fontana wrote:
>>> On 4/7/22 3:53 PM, Dr. David Alan Gilbert wrote:
>>>> * Claudio Fontana (cfontana@suse.de) wrote:
>>>>> On 4/5/22 10:35 AM, Dr. David Alan Gilbert wrote:
>>>>>> * Claudio Fontana (cfontana@suse.de) wrote:
>>>>>>> On 3/28/22 10:31 AM, Daniel P. Berrangé wrote:
>>>>>>>> On Sat, Mar 26, 2022 at 04:49:46PM +0100, Claudio Fontana wrote:
>>>>>>>>> On 3/25/22 12:29 PM, Daniel P. Berrangé wrote:
>>>>>>>>>> On Fri, Mar 18, 2022 at 02:34:29PM +0100, Claudio Fontana wrote:
>>>>>>>>>>> On 3/17/22 4:03 PM, Dr. David Alan Gilbert wrote:
>>>>>>>>>>>> * Claudio Fontana (cfontana@suse.de) wrote:
>>>>>>>>>>>>> On 3/17/22 2:41 PM, Claudio Fontana wrote:
>>>>>>>>>>>>>> On 3/17/22 11:25 AM, Daniel P. Berrangé wrote:
>>>>>>>>>>>>>>> On Thu, Mar 17, 2022 at 11:12:11AM +0100, Claudio Fontana wrote:
>>>>>>>>>>>>>>>> On 3/16/22 1:17 PM, Claudio Fontana wrote:
>>>>>>>>>>>>>>>>> On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
>>>>>>>>>>>>>>>>>> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana wrote:
>>>>>>>>>>>>>>>>>>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
>>>>>>>>>>>>>>>>>>>> On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana wrote:
>>>>>>>>>>>>>>>>>>>>> the first user is the qemu driver,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> virsh save/resume would slow to a crawl with a default pipe size (64k).
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> This improves the situation by 400%.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Going through io_helper still seems to incur in some penalty (~15%-ish)
>>>>>>>>>>>>>>>>>>>>> compared with direct qemu migration to a nc socket to a file.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
>>>>>>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>>>>>>  src/qemu/qemu_driver.c    |  6 +++---
>>>>>>>>>>>>>>>>>>>>>  src/qemu/qemu_saveimage.c | 11 ++++++-----
>>>>>>>>>>>>>>>>>>>>>  src/util/virfile.c        | 12 ++++++++++++
>>>>>>>>>>>>>>>>>>>>>  src/util/virfile.h        |  1 +
>>>>>>>>>>>>>>>>>>>>>  4 files changed, 22 insertions(+), 8 deletions(-)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hello, I initially thought this to be a qemu performance issue,
>>>>>>>>>>>>>>>>>>>>> so you can find the discussion about this in qemu-devel:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Current results show these experimental averages maximum throughput
>>>>>>>>>>>>>>>> migrating to /dev/null per each FdWrapper Pipe Size (as per QEMU QMP
>>>>>>>>>>>>>>>> "query-migrate", tests repeated 5 times for each).
>>>>>>>>>>>>>>>> VM Size is 60G, most of the memory effectively touched before migration,
>>>>>>>>>>>>>>>> through user application allocating and touching all memory with
>>>>>>>>>>>>>>>> pseudorandom data.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 64K:     5200 Mbps (current situation)
>>>>>>>>>>>>>>>> 128K:    5800 Mbps
>>>>>>>>>>>>>>>> 256K:   20900 Mbps
>>>>>>>>>>>>>>>> 512K:   21600 Mbps
>>>>>>>>>>>>>>>> 1M:     22800 Mbps
>>>>>>>>>>>>>>>> 2M:     22800 Mbps
>>>>>>>>>>>>>>>> 4M:     22400 Mbps
>>>>>>>>>>>>>>>> 8M:     22500 Mbps
>>>>>>>>>>>>>>>> 16M:    22800 Mbps
>>>>>>>>>>>>>>>> 32M:    22900 Mbps
>>>>>>>>>>>>>>>> 64M:    22900 Mbps
>>>>>>>>>>>>>>>> 128M:   22800 Mbps
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This above is the throughput out of patched libvirt with multiple Pipe Sizes for the FDWrapper.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ok, its bouncing around with noise after 1 MB. So I'd suggest that
>>>>>>>>>>>>>>> libvirt attempt to raise the pipe limit to 1 MB by default, but
>>>>>>>>>>>>>>> not try to go higher.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> As for the theoretical limit for the libvirt architecture,
>>>>>>>>>>>>>>>> I ran a qemu migration directly issuing the appropriate QMP
>>>>>>>>>>>>>>>> commands, setting the same migration parameters as per libvirt,
>>>>>>>>>>>>>>>> and then migrating to a socket netcatted to /dev/null via
>>>>>>>>>>>>>>>> {"execute": "migrate", "arguments": { "uri", "unix:///tmp/netcat.sock" } } :
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> QMP:    37000 Mbps
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So although the Pipe size improves things (in particular the
>>>>>>>>>>>>>>>> large jump is for the 256K size, although 1M seems a very good value),
>>>>>>>>>>>>>>>> there is still a second bottleneck in there somewhere that
>>>>>>>>>>>>>>>> accounts for a loss of ~14200 Mbps in throughput.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Interesting addition: I tested quickly on a system with faster cpus and larger VM sizes, up to 200GB,
>>>>>>>>>>>>> and the difference in throughput libvirt vs qemu is basically the same ~14500 Mbps.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ~50000 mbps qemu to netcat socket to /dev/null
>>>>>>>>>>>>> ~35500 mbps virsh save to /dev/null
>>>>>>>>>>>>>
>>>>>>>>>>>>> Seems it is not proportional to cpu speed by the looks of it (not a totally fair comparison because the VM sizes are different).
>>>>>>>>>>>>
>>>>>>>>>>>> It might be closer to RAM or cache bandwidth limited though; for an extra copy.
>>>>>>>>>>>
>>>>>>>>>>> I was thinking about sendfile(2) in iohelper, but that probably
>>>>>>>>>>> can't work as the input fd is a socket, I am getting EINVAL.
>>>>>>>>>>
>>>>>>>>>> Yep, sendfile() requires the input to be a mmapable FD,
>>>>>>>>>> and the output to be a socket.
>>>>>>>>>>
>>>>>>>>>> Try splice() instead  which merely requires 1 end to be a
>>>>>>>>>> pipe, and the other end can be any FD afaik.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I did try splice(), but performance is worse by around 500%.
>>>>>>>>
>>>>>>>> Hmm, that's certainly unexpected !
>>>>>>>>
>>>>>>>>> Any ideas welcome,
>>>>>>>>
>>>>>>>> I learnt there is also a newer  copy_file_range call, not sure if that's
>>>>>>>> any better.
>>>>>>>>
>>>>>>>> You passed len as 1 MB, I wonder if passing MAXINT is viable ? We just
>>>>>>>> want to copy everything IIRC.
>>>>>>>>
>>>>>>>> With regards,
>>>>>>>> Daniel
>>>>>>>>
>>>>>>>
>>>>>>> Crazy idea, would trying to use the parallel migration concept for migrating to/from a file make any sense?
>>>>>>>
>>>>>>> Not sure if applying the qemu multifd implementation of this would apply, maybe it could be given another implementation for "toFile", trying to use more than one cpu to do the transfer?
>>>>>>
>>>>>> I can't see a way that would help; well, I could if you could
>>>>>> somehow have multiple io helper threads that dealt with it.
>>>>>
>>>>> The first issue I encounter here for both the "virsh save" and "virsh restore" scenarios is that libvirt uses fd: migration, not unix: migration.
>>>>> QEMU supports multifd for unix:, tcp:, vsock: as far as I can see.
>>>>>
>>>>> Current save procedure in QMP in short:
>>>>>
>>>>> {"execute":"migrate-set-capabilities", ...}
>>>>> {"execute":"migrate-set-parameters", ...}
>>>>> {"execute":"getfd","arguments":{"fdname":"migrate"}, ...} fd=26
>>>>> QEMU_MONITOR_IO_SEND_FD: fd=26
>>>>> {"execute":"migrate","arguments":{"uri":"fd:migrate"}, ...}
>>>>>
>>>>>
>>>>> Current restore procedure in QMP in short:
>>>>>
>>>>> (start QEMU)
>>>>> {"execute":"migrate-incoming","arguments":{"uri":"fd:21"}, ...}
>>>>>
>>>>>
>>>>> Should I investigate changing libvirt to use unix: for save/restore?
>>>>> Or should I look into changing qemu to somehow accept fd: for multifd, meaning I guess providing multiple fd: uris in the migrate command?
>>>>
>>>> So I'm not sure this is the right direction; i.e. if multifd is the
>>>> right answer to your problem.
>>>
>>> Of course, just exploring the space.
>>
>>
>> I have some progress on multifd if we can call it so:
>>
>> I wrote a simple program that sets up a unix socket,
>> listens for N_CHANNELS + 1 connections there, sets up multifd parameters, and runs the migration,
>> spawning threads for each incoming connection from QEMU, creating a file to use to store the migration data coming from qemu (optionally using O_DIRECT).
>>
>> This program plays the role of a "iohelper"-like thing, basically just copying things over, making O_DIRECT possible.
>>
>> I save the data streams to multiple files; this works, for the actual results though I will have to migrate to a better hardware setup (enterprise nvme + fast cpu, under various memory configurations).
>>
>> The intuition would be that if we have enough cpus to spare (no libvirt in the picture as mentioned for now),
>> say, the same 4 cpus already allocated for a certain VM to run, we can use those cpus (now "free" since we suspended the guest)
>> to compress each multifd channel (multifd-zstd? multifd-zlib?), thus reducing the amount of stuff that needs to go to disk, making use of those cpus.
> 
> Yes possibly; you have an advantage over normal migration, in that your
> vCPUs are stopped.

Indeed, it seems to help immensely in the save vm case, cutting down on the full transfer cost (including sync).
In my experiment, though, the data is 90G generated via random(), so it likely contains more repeated patterns than a real workload would;
the effectiveness will depend a lot on how much we can actually compress.

> 
>> Work in progress...
>>
>>>
>>>> However, I think the qemu code probably really really wants to be a
>>>> socket.
>>>
>>> Understood, I'll try to bend libvirt to use unix:/// and see how far I get,
>>>
>>> Thanks,
>>>
>>> Claudio
>>>
>>>>
>>>> Dave
>>>>
>>>>>
>>>>> Thank you for your help,
>>>>>
>>>>> Claudio
>>>>>
>>>
>>



^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2022-04-12  9:06 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20220312163001.3811-1-cfontana@suse.de>
     [not found] ` <Yi94mQUfrxMVbiLM@redhat.com>
     [not found]   ` <34eb53b5-78f7-3814-b71e-aa7ac59f9d25@suse.de>
     [not found]     ` <Yi+ACeaZ+oXTVYjc@redhat.com>
     [not found]       ` <2d1248d4-ebdf-43f9-e4a7-95f586aade8e@suse.de>
2022-03-17 10:12         ` [libvirt RFC] virFile: new VIR_FILE_WRAPPER_BIG_PIPE to improve performance Claudio Fontana
2022-03-17 10:25           ` Daniel P. Berrangé
2022-03-17 13:41             ` Claudio Fontana
2022-03-17 14:14               ` Claudio Fontana
2022-03-17 15:03                 ` Dr. David Alan Gilbert
2022-03-18 13:34                   ` Claudio Fontana
2022-03-21  7:55                     ` Andrea Righi
2022-03-25  9:56                       ` Claudio Fontana
2022-03-25 10:33                     ` Daniel P. Berrangé
2022-03-25 10:56                       ` Claudio Fontana
2022-03-25 11:14                         ` Daniel P. Berrangé
2022-03-25 11:16                           ` Claudio Fontana
2022-04-10 19:58                       ` Claudio Fontana
2022-03-25 11:29                     ` Daniel P. Berrangé
2022-03-26 15:49                       ` Claudio Fontana
2022-03-26 17:38                         ` Claudio Fontana
2022-03-28  8:31                         ` Daniel P. Berrangé
2022-03-28  9:19                           ` Claudio Fontana
2022-03-28  9:41                             ` Claudio Fontana
2022-03-28  9:31                           ` Claudio Fontana
2022-04-05  8:35                             ` Dr. David Alan Gilbert
2022-04-05  9:23                               ` Claudio Fontana
2022-04-07  7:11                               ` Claudio Fontana
2022-04-07 13:53                                 ` Dr. David Alan Gilbert
2022-04-07 13:57                                   ` Claudio Fontana
2022-04-11 18:21                                     ` Claudio Fontana
2022-04-11 18:53                                       ` Dr. David Alan Gilbert
2022-04-12  9:04                                         ` Claudio Fontana
2022-03-28 10:47                         ` Claudio Fontana
2022-03-28 13:28                           ` Claudio Fontana
