* tools/virtiofs: Multi threading seems to hurt performance
@ 2020-09-18 21:34 ` Vivek Goyal
0 siblings, 0 replies; 107+ messages in thread
From: Vivek Goyal @ 2020-09-18 21:34 UTC (permalink / raw)
To: virtio-fs-list, qemu-devel; +Cc: Dr. David Alan Gilbert, Stefan Hajnoczi
Hi All,
virtiofsd's default thread pool size is 64. To me it feels that in most
cases a thread pool size of 1 performs better than a thread pool size of 64.
I ran virtiofs-tests.
https://github.com/rhvgoyal/virtiofs-tests
And here are the comparison results. To me it seems that by default
we should switch to 1 thread (till we can figure out how to make
multi-thread performance better even when a single process is doing
I/O in the client).
I am especially interested in getting better performance for a
single process in the client. If that suffers, then it is pretty bad.
Especially look at randread, randwrite, and seqwrite performance; seqread
seems pretty good anyway.
If I don't run the whole test suite and just run the randread-psync job,
my throughput jumps from around 40MB/s to 60MB/s. That's a huge
jump, I would say.
Thoughts?
Thanks
Vivek
NAME                  WORKLOAD                  Bandwidth    IOPS
cache-auto            seqread-psync             690(MiB/s)   172k
cache-auto-1-thread   seqread-psync             729(MiB/s)   182k

cache-auto            seqread-psync-multi       2578(MiB/s)  644k
cache-auto-1-thread   seqread-psync-multi       2597(MiB/s)  649k

cache-auto            seqread-mmap              660(MiB/s)   165k
cache-auto-1-thread   seqread-mmap              672(MiB/s)   168k

cache-auto            seqread-mmap-multi        2499(MiB/s)  624k
cache-auto-1-thread   seqread-mmap-multi        2618(MiB/s)  654k

cache-auto            seqread-libaio            286(MiB/s)   71k
cache-auto-1-thread   seqread-libaio            260(MiB/s)   65k

cache-auto            seqread-libaio-multi      1508(MiB/s)  377k
cache-auto-1-thread   seqread-libaio-multi      986(MiB/s)   246k

cache-auto            randread-psync            35(MiB/s)    9191
cache-auto-1-thread   randread-psync            55(MiB/s)    13k

cache-auto            randread-psync-multi      179(MiB/s)   44k
cache-auto-1-thread   randread-psync-multi      209(MiB/s)   52k

cache-auto            randread-mmap             32(MiB/s)    8273
cache-auto-1-thread   randread-mmap             50(MiB/s)    12k

cache-auto            randread-mmap-multi       161(MiB/s)   40k
cache-auto-1-thread   randread-mmap-multi       185(MiB/s)   46k

cache-auto            randread-libaio           268(MiB/s)   67k
cache-auto-1-thread   randread-libaio           254(MiB/s)   63k

cache-auto            randread-libaio-multi     256(MiB/s)   64k
cache-auto-1-thread   randread-libaio-multi     155(MiB/s)   38k

cache-auto            seqwrite-psync            23(MiB/s)    6026
cache-auto-1-thread   seqwrite-psync            30(MiB/s)    7925

cache-auto            seqwrite-psync-multi      100(MiB/s)   25k
cache-auto-1-thread   seqwrite-psync-multi      154(MiB/s)   38k

cache-auto            seqwrite-mmap             343(MiB/s)   85k
cache-auto-1-thread   seqwrite-mmap             355(MiB/s)   88k

cache-auto            seqwrite-mmap-multi       408(MiB/s)   102k
cache-auto-1-thread   seqwrite-mmap-multi       438(MiB/s)   109k

cache-auto            seqwrite-libaio           41(MiB/s)    10k
cache-auto-1-thread   seqwrite-libaio           65(MiB/s)    16k

cache-auto            seqwrite-libaio-multi     137(MiB/s)   34k
cache-auto-1-thread   seqwrite-libaio-multi     214(MiB/s)   53k

cache-auto            randwrite-psync           22(MiB/s)    5801
cache-auto-1-thread   randwrite-psync           30(MiB/s)    7927

cache-auto            randwrite-psync-multi     100(MiB/s)   25k
cache-auto-1-thread   randwrite-psync-multi     151(MiB/s)   37k

cache-auto            randwrite-mmap            31(MiB/s)    7984
cache-auto-1-thread   randwrite-mmap            55(MiB/s)    13k

cache-auto            randwrite-mmap-multi      124(MiB/s)   31k
cache-auto-1-thread   randwrite-mmap-multi      213(MiB/s)   53k

cache-auto            randwrite-libaio          40(MiB/s)    10k
cache-auto-1-thread   randwrite-libaio          64(MiB/s)    16k

cache-auto            randwrite-libaio-multi    139(MiB/s)   34k
cache-auto-1-thread   randwrite-libaio-multi    212(MiB/s)   53k
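[Editorial note: for readers who want to reproduce the A/B comparison, the runs above boil down to launching virtiofsd with and without a single-threaded pool and driving it with an fio job. The invocations below are a sketch; the paths and most options are assumptions for illustration, and only --thread-pool-size=1 and cache=auto come from this thread. Verify option names against your virtiofsd and fio versions.]

```shell
# Baseline: default thread pool of 64 request-handling threads.
./virtiofsd --socket-path=/tmp/vhostqemu -o source=/srv/share -o cache=auto &

# Comparison run: single-threaded request handling.
./virtiofsd --socket-path=/tmp/vhostqemu -o source=/srv/share -o cache=auto \
            --thread-pool-size=1 &

# Inside the guest, a randread-psync style fio job on the virtiofs mount
# (job parameters are illustrative, not taken from virtiofs-tests):
fio --name=randread-psync --rw=randread --ioengine=psync \
    --bs=4k --size=2g --directory=/mnt/virtiofs
```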
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: tools/virtiofs: Multi threading seems to hurt performance
2020-09-18 21:34 ` [Virtio-fs] " Vivek Goyal
@ 2020-09-21 8:39 ` Stefan Hajnoczi
-1 siblings, 0 replies; 107+ messages in thread
From: Stefan Hajnoczi @ 2020-09-21 8:39 UTC (permalink / raw)
To: Vivek Goyal; +Cc: virtio-fs-list, qemu-devel, Dr. David Alan Gilbert
[-- Attachment #1: Type: text/plain, Size: 5074 bytes --]
On Fri, Sep 18, 2020 at 05:34:36PM -0400, Vivek Goyal wrote:
> And here are the comparison results. To me it seems that by default
> we should switch to 1 thread (Till we can figure out how to make
> multi thread performance better even when single process is doing
> I/O in client).
Let's understand the reason before making changes.
Questions:
* Is "1-thread" --thread-pool-size=1?
* Was DAX enabled?
* How does cache=none perform?
* Does commenting out vu_queue_get_avail_bytes() + fuse_log("%s:
Queue %d gave evalue: %zx available: in: %u out: %u\n") in
fv_queue_thread help?
* How do the kvm_stat vmexit counters compare?
* How does host mpstat -P ALL compare?
* How does host perf record -a compare?
* Does the Rust virtiofsd show the same pattern (it doesn't use glib
thread pools)?
Stefan
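[Editorial note: several of the data points Stefan asks for can be gathered on the host with commands along these lines. This is a sketch from memory, not from the thread; flag spellings may differ across tool versions, so check the man pages.]

```shell
# Per-exit-reason KVM counters while the benchmark runs:
kvm_stat

# Per-CPU utilization on the host, sampled every second:
mpstat -P ALL 1

# System-wide profile covering one benchmark run, then a summary
# grouped by command and shared object:
perf record -a -- sleep 30
perf report --sort comm,dso
```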
> NAME WORKLOAD Bandwidth IOPS
> cache-auto seqread-psync 690(MiB/s) 172k
> cache-auto-1-thread seqread-psync 729(MiB/s) 182k
>
> cache-auto seqread-psync-multi 2578(MiB/s) 644k
> cache-auto-1-thread seqread-psync-multi 2597(MiB/s) 649k
>
> cache-auto seqread-mmap 660(MiB/s) 165k
> cache-auto-1-thread seqread-mmap 672(MiB/s) 168k
>
> cache-auto seqread-mmap-multi 2499(MiB/s) 624k
> cache-auto-1-thread seqread-mmap-multi 2618(MiB/s) 654k
>
> cache-auto seqread-libaio 286(MiB/s) 71k
> cache-auto-1-thread seqread-libaio 260(MiB/s) 65k
>
> cache-auto seqread-libaio-multi 1508(MiB/s) 377k
> cache-auto-1-thread seqread-libaio-multi 986(MiB/s) 246k
>
> cache-auto randread-psync 35(MiB/s) 9191
> cache-auto-1-thread randread-psync 55(MiB/s) 13k
>
> cache-auto randread-psync-multi 179(MiB/s) 44k
> cache-auto-1-thread randread-psync-multi 209(MiB/s) 52k
>
> cache-auto randread-mmap 32(MiB/s) 8273
> cache-auto-1-thread randread-mmap 50(MiB/s) 12k
>
> cache-auto randread-mmap-multi 161(MiB/s) 40k
> cache-auto-1-thread randread-mmap-multi 185(MiB/s) 46k
>
> cache-auto randread-libaio 268(MiB/s) 67k
> cache-auto-1-thread randread-libaio 254(MiB/s) 63k
>
> cache-auto randread-libaio-multi 256(MiB/s) 64k
> cache-auto-1-thread randread-libaio-multi 155(MiB/s) 38k
>
> cache-auto seqwrite-psync 23(MiB/s) 6026
> cache-auto-1-thread seqwrite-psync 30(MiB/s) 7925
>
> cache-auto seqwrite-psync-multi 100(MiB/s) 25k
> cache-auto-1-thread seqwrite-psync-multi 154(MiB/s) 38k
>
> cache-auto seqwrite-mmap 343(MiB/s) 85k
> cache-auto-1-thread seqwrite-mmap 355(MiB/s) 88k
>
> cache-auto seqwrite-mmap-multi 408(MiB/s) 102k
> cache-auto-1-thread seqwrite-mmap-multi 438(MiB/s) 109k
>
> cache-auto seqwrite-libaio 41(MiB/s) 10k
> cache-auto-1-thread seqwrite-libaio 65(MiB/s) 16k
>
> cache-auto seqwrite-libaio-multi 137(MiB/s) 34k
> cache-auto-1-thread seqwrite-libaio-multi 214(MiB/s) 53k
>
> cache-auto randwrite-psync 22(MiB/s) 5801
> cache-auto-1-thread randwrite-psync 30(MiB/s) 7927
>
> cache-auto randwrite-psync-multi 100(MiB/s) 25k
> cache-auto-1-thread randwrite-psync-multi 151(MiB/s) 37k
>
> cache-auto randwrite-mmap 31(MiB/s) 7984
> cache-auto-1-thread randwrite-mmap 55(MiB/s) 13k
>
> cache-auto randwrite-mmap-multi 124(MiB/s) 31k
> cache-auto-1-thread randwrite-mmap-multi 213(MiB/s) 53k
>
> cache-auto randwrite-libaio 40(MiB/s) 10k
> cache-auto-1-thread randwrite-libaio 64(MiB/s) 16k
>
> cache-auto randwrite-libaio-multi 139(MiB/s) 34k
> cache-auto-1-thread randwrite-libaio-multi 212(MiB/s) 53k
>
>
>
>
>
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: tools/virtiofs: Multi threading seems to hurt performance
2020-09-18 21:34 ` [Virtio-fs] " Vivek Goyal
@ 2020-09-21 8:50 ` Dr. David Alan Gilbert
-1 siblings, 0 replies; 107+ messages in thread
From: Dr. David Alan Gilbert @ 2020-09-21 8:50 UTC (permalink / raw)
To: Vivek Goyal; +Cc: virtio-fs-list, qemu-devel, Stefan Hajnoczi
* Vivek Goyal (vgoyal@redhat.com) wrote:
> Hi All,
>
> virtiofsd default thread pool size is 64. To me it feels that in most of
> the cases thread pool size 1 performs better than thread pool size 64.
>
> I ran virtiofs-tests.
>
> https://github.com/rhvgoyal/virtiofs-tests
>
> And here are the comparison results. To me it seems that by default
> we should switch to 1 thread (Till we can figure out how to make
> multi thread performance better even when single process is doing
> I/O in client).
>
> I am especially more interested in getting performance better for
> single process in client. If that suffers, then it is pretty bad.
>
> Especially look at randread, randwrite, seqwrite performance. seqread
> seems pretty good anyway.
>
> If I don't run the whole test suite and just run the randread-psync job,
> my throughput jumps from around 40MB/s to 60MB/s. That's a huge
> jump I would say.
>
> Thoughts?
What's your host setup; how many cores has the host got and how many did
you give the guest?
Dave
> Thanks
> Vivek
>
>
> NAME WORKLOAD Bandwidth IOPS
> cache-auto seqread-psync 690(MiB/s) 172k
> cache-auto-1-thread seqread-psync 729(MiB/s) 182k
>
> cache-auto seqread-psync-multi 2578(MiB/s) 644k
> cache-auto-1-thread seqread-psync-multi 2597(MiB/s) 649k
>
> cache-auto seqread-mmap 660(MiB/s) 165k
> cache-auto-1-thread seqread-mmap 672(MiB/s) 168k
>
> cache-auto seqread-mmap-multi 2499(MiB/s) 624k
> cache-auto-1-thread seqread-mmap-multi 2618(MiB/s) 654k
>
> cache-auto seqread-libaio 286(MiB/s) 71k
> cache-auto-1-thread seqread-libaio 260(MiB/s) 65k
>
> cache-auto seqread-libaio-multi 1508(MiB/s) 377k
> cache-auto-1-thread seqread-libaio-multi 986(MiB/s) 246k
>
> cache-auto randread-psync 35(MiB/s) 9191
> cache-auto-1-thread randread-psync 55(MiB/s) 13k
>
> cache-auto randread-psync-multi 179(MiB/s) 44k
> cache-auto-1-thread randread-psync-multi 209(MiB/s) 52k
>
> cache-auto randread-mmap 32(MiB/s) 8273
> cache-auto-1-thread randread-mmap 50(MiB/s) 12k
>
> cache-auto randread-mmap-multi 161(MiB/s) 40k
> cache-auto-1-thread randread-mmap-multi 185(MiB/s) 46k
>
> cache-auto randread-libaio 268(MiB/s) 67k
> cache-auto-1-thread randread-libaio 254(MiB/s) 63k
>
> cache-auto randread-libaio-multi 256(MiB/s) 64k
> cache-auto-1-thread randread-libaio-multi 155(MiB/s) 38k
>
> cache-auto seqwrite-psync 23(MiB/s) 6026
> cache-auto-1-thread seqwrite-psync 30(MiB/s) 7925
>
> cache-auto seqwrite-psync-multi 100(MiB/s) 25k
> cache-auto-1-thread seqwrite-psync-multi 154(MiB/s) 38k
>
> cache-auto seqwrite-mmap 343(MiB/s) 85k
> cache-auto-1-thread seqwrite-mmap 355(MiB/s) 88k
>
> cache-auto seqwrite-mmap-multi 408(MiB/s) 102k
> cache-auto-1-thread seqwrite-mmap-multi 438(MiB/s) 109k
>
> cache-auto seqwrite-libaio 41(MiB/s) 10k
> cache-auto-1-thread seqwrite-libaio 65(MiB/s) 16k
>
> cache-auto seqwrite-libaio-multi 137(MiB/s) 34k
> cache-auto-1-thread seqwrite-libaio-multi 214(MiB/s) 53k
>
> cache-auto randwrite-psync 22(MiB/s) 5801
> cache-auto-1-thread randwrite-psync 30(MiB/s) 7927
>
> cache-auto randwrite-psync-multi 100(MiB/s) 25k
> cache-auto-1-thread randwrite-psync-multi 151(MiB/s) 37k
>
> cache-auto randwrite-mmap 31(MiB/s) 7984
> cache-auto-1-thread randwrite-mmap 55(MiB/s) 13k
>
> cache-auto randwrite-mmap-multi 124(MiB/s) 31k
> cache-auto-1-thread randwrite-mmap-multi 213(MiB/s) 53k
>
> cache-auto randwrite-libaio 40(MiB/s) 10k
> cache-auto-1-thread randwrite-libaio 64(MiB/s) 16k
>
> cache-auto randwrite-libaio-multi 139(MiB/s) 34k
> cache-auto-1-thread randwrite-libaio-multi 212(MiB/s) 53k
>
>
>
>
>
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: tools/virtiofs: Multi threading seems to hurt performance
2020-09-21 8:50 ` [Virtio-fs] " Dr. David Alan Gilbert
@ 2020-09-21 13:35 ` Vivek Goyal
-1 siblings, 0 replies; 107+ messages in thread
From: Vivek Goyal @ 2020-09-21 13:35 UTC (permalink / raw)
To: Dr. David Alan Gilbert; +Cc: virtio-fs-list, qemu-devel, Stefan Hajnoczi
On Mon, Sep 21, 2020 at 09:50:19AM +0100, Dr. David Alan Gilbert wrote:
> * Vivek Goyal (vgoyal@redhat.com) wrote:
> > Hi All,
> >
> > virtiofsd default thread pool size is 64. To me it feels that in most of
> > the cases thread pool size 1 performs better than thread pool size 64.
> >
> > I ran virtiofs-tests.
> >
> > https://github.com/rhvgoyal/virtiofs-tests
> >
> > And here are the comparison results. To me it seems that by default
> > we should switch to 1 thread (Till we can figure out how to make
> > multi thread performance better even when single process is doing
> > I/O in client).
> >
> > I am especially more interested in getting performance better for
> > single process in client. If that suffers, then it is pretty bad.
> >
> > Especially look at randread, randwrite, seqwrite performance. seqread
> > seems pretty good anyway.
> >
> > If I don't run the whole test suite and just run the randread-psync job,
> > my throughput jumps from around 40MB/s to 60MB/s. That's a huge
> > jump I would say.
> >
> > Thoughts?
>
> What's your host setup; how many cores has the host got and how many did
> you give the guest?
The host has 2 processors with 16 cores each. With hyperthreading
enabled, that is 32 logical cores per processor, for 64 logical cores
on the host.
I have given 32 to the guest.
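[Editorial note: the topology arithmetic above, spelled out as a quick sanity check.]

```shell
# 2 sockets x 16 cores x 2 hardware threads (hyperthreading enabled)
sockets=2
cores_per_socket=16
threads_per_core=2
logical_cpus=$(( sockets * cores_per_socket * threads_per_core ))
echo "$logical_cpus logical CPUs on the host"   # prints: 64 logical CPUs on the host
```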
Vivek
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: tools/virtiofs: Multi threading seems to hurt performance
2020-09-21 8:39 ` [Virtio-fs] " Stefan Hajnoczi
@ 2020-09-21 13:39 ` Vivek Goyal
-1 siblings, 0 replies; 107+ messages in thread
From: Vivek Goyal @ 2020-09-21 13:39 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: virtio-fs-list, qemu-devel, Dr. David Alan Gilbert
On Mon, Sep 21, 2020 at 09:39:23AM +0100, Stefan Hajnoczi wrote:
> On Fri, Sep 18, 2020 at 05:34:36PM -0400, Vivek Goyal wrote:
> > And here are the comparison results. To me it seems that by default
> > we should switch to 1 thread (Till we can figure out how to make
> > multi thread performance better even when single process is doing
> > I/O in client).
>
> Let's understand the reason before making changes.
>
> Questions:
> * Is "1-thread" --thread-pool-size=1?
Yes.
> * Was DAX enabled?
No.
> * How does cache=none perform?
I just ran the random read workload with cache=none.
cache-none randread-psync 45(MiB/s) 11k
cache-none-1-thread randread-psync 63(MiB/s) 15k
With 1 thread it offers more IOPS.
> * Does commenting out vu_queue_get_avail_bytes() + fuse_log("%s:
> Queue %d gave evalue: %zx available: in: %u out: %u\n") in
> fv_queue_thread help?
Will try that.
> * How do the kvm_stat vmexit counters compare?
This should be the same, shouldn't it? Changing the number of threads
serving requests should not change the number of vmexits.
> * How does host mpstat -P ALL compare?
Never used mpstat. Will try running it and see if I can get something
meaningful.
> * How does host perf record -a compare?
Will try it. I feel this might be too big and too verbose to get
something meaningful.
> * Does the Rust virtiofsd show the same pattern (it doesn't use glib
> thread pools)?
No idea. I have never tried the Rust implementation of virtiofsd.
But I suspect it has to do with the thread pool implementation and possibly
extra cacheline bouncing.
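[Editorial note: if cacheline bouncing is the suspicion, perf's cache-to-cache analysis mode is one way to look for it on the host. This is a sketch; perf c2c needs a reasonably recent kernel and suitable hardware events, and the flags should be checked against your perf version.]

```shell
# Record system-wide shared-cacheline activity while the benchmark runs,
# then report contended cachelines and the threads touching them:
perf c2c record -a -- sleep 30
perf c2c report --stats
```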
Thanks
Vivek
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: tools/virtiofs: Multi threading seems to hurt performance
2020-09-21 13:35 ` [Virtio-fs] " Vivek Goyal
@ 2020-09-21 14:08 ` Daniel P. Berrangé
-1 siblings, 0 replies; 107+ messages in thread
From: Daniel P. Berrangé @ 2020-09-21 14:08 UTC (permalink / raw)
To: Vivek Goyal
Cc: virtio-fs-list, Dr. David Alan Gilbert, Stefan Hajnoczi, qemu-devel
On Mon, Sep 21, 2020 at 09:35:16AM -0400, Vivek Goyal wrote:
> On Mon, Sep 21, 2020 at 09:50:19AM +0100, Dr. David Alan Gilbert wrote:
> > * Vivek Goyal (vgoyal@redhat.com) wrote:
> > > Hi All,
> > >
> > > virtiofsd default thread pool size is 64. To me it feels that in most of
> > > the cases thread pool size 1 performs better than thread pool size 64.
> > >
> > > I ran virtiofs-tests.
> > >
> > > https://github.com/rhvgoyal/virtiofs-tests
> > >
> > > And here are the comparison results. To me it seems that by default
> > > we should switch to 1 thread (Till we can figure out how to make
> > > multi thread performance better even when single process is doing
> > > I/O in client).
> > >
> > > I am especially more interested in getting performance better for
> > > single process in client. If that suffers, then it is pretty bad.
> > >
> > > Especially look at randread, randwrite, seqwrite performance. seqread
> > > seems pretty good anyway.
> > >
> > > If I don't run the whole test suite and just run the randread-psync job,
> > > my throughput jumps from around 40MB/s to 60MB/s. That's a huge
> > > jump I would say.
> > >
> > > Thoughts?
> >
> > What's your host setup; how many cores has the host got and how many did
> > you give the guest?
>
> Got 2 processors on host with 16 cores in each processor. With
> hyperthreading enabled, it makes 32 logical cores on each processor and
> that makes 64 logical cores on host.
>
> I have given 32 to guest.
FWIW, I'd be inclined to disable hyperthreading in the BIOS for one
test to validate whether it is affecting the performance results.
Hyperthreads are weak compared to a real CPU and could produce
misleading data even if you limit your guest to half the host's
logical CPUs.
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: tools/virtiofs: Multi threading seems to hurt performance
2020-09-18 21:34 ` [Virtio-fs] " Vivek Goyal
@ 2020-09-21 15:32 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 107+ messages in thread
From: Dr. David Alan Gilbert @ 2020-09-21 15:32 UTC (permalink / raw)
To: Vivek Goyal
Cc: jose.carlos.venegas.munoz, qemu-devel, cdupontd, virtio-fs-list,
Stefan Hajnoczi, archana.m.shinde
Hi,
I've been doing some of my own perf tests and I think I agree
about the thread pool size; my test is a kernel build
and I've tried a bunch of different options.
My config:
Host: 16 core AMD EPYC (32 thread), 128G RAM,
5.9.0-rc4 kernel, rhel 8.2ish userspace.
5.1.0 qemu/virtiofsd built from git.
Guest: Fedora 32 from cloud image with just enough extra installed for
a kernel build.
git cloned and checked out v5.8 of Linux into /dev/shm/linux on the host,
fresh before each test. Then log into the guest, make defconfig,
time make -j 16 bzImage, make clean; time make -j 16 bzImage
The numbers below are the 'real' time in the guest from the initial make
(the subsequent makes don't vary much)
Below are the details of what each of these means, but here are the
numbers first:
virtiofsdefault        4m0.978s
9pdefault              9m41.660s
virtiofscache=none    10m29.700s
9pmmappass             9m30.047s
9pmbigmsize           12m4.208s
9pmsecnone             9m21.363s
virtiofscache=noneT1   7m17.494s
virtiofsdefaultT1      3m43.326s
So the winner there by far is 'virtiofsdefaultT1': the default
virtiofs settings, but with --thread-pool-size=1. So yes, it gives a
small benefit over the default.
But interestingly, the cache=none virtiofs performance is pretty bad,
and thread-pool-size=1 makes a BIG improvement there.
virtiofsdefault:
./virtiofsd --socket-path=/tmp/vhostqemu -o source=/dev/shm/linux
./x86_64-softmmu/qemu-system-x86_64 -M pc,memory-backend=mem,accel=kvm -smp 8 -cpu host -m 32G,maxmem=64G,slots=1 -object memory-backend-memfd,id=mem,size=32G,share=on -drive if=virtio,file=/home/images/f-32-kernel.qcow2 -nographic -chardev socket,id=char0,path=/tmp/vhostqemu -device vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=kernel
mount -t virtiofs kernel /mnt
9pdefault
./x86_64-softmmu/qemu-system-x86_64 -M pc,accel=kvm -smp 8 -cpu host -m 32G -drive if=virtio,file=/home/images/f-32-kernel.qcow2 -nographic -virtfs local,path=/dev/shm/linux,mount_tag=kernel,security_model=passthrough
mount -t 9p -o trans=virtio kernel /mnt -oversion=9p2000.L
virtiofscache=none
./virtiofsd --socket-path=/tmp/vhostqemu -o source=/dev/shm/linux -o cache=none
./x86_64-softmmu/qemu-system-x86_64 -M pc,memory-backend=mem,accel=kvm -smp 8 -cpu host -m 32G,maxmem=64G,slots=1 -object memory-backend-memfd,id=mem,size=32G,share=on -drive if=virtio,file=/home/images/f-32-kernel.qcow2 -nographic -chardev socket,id=char0,path=/tmp/vhostqemu -device vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=kernel
mount -t virtiofs kernel /mnt
9pmmappass
./x86_64-softmmu/qemu-system-x86_64 -M pc,accel=kvm -smp 8 -cpu host -m 32G -drive if=virtio,file=/home/images/f-32-kernel.qcow2 -nographic -virtfs local,path=/dev/shm/linux,mount_tag=kernel,security_model=passthrough
mount -t 9p -o trans=virtio kernel /mnt -oversion=9p2000.L,cache=mmap
9pmbigmsize
./x86_64-softmmu/qemu-system-x86_64 -M pc,accel=kvm -smp 8 -cpu host -m 32G -drive if=virtio,file=/home/images/f-32-kernel.qcow2 -nographic -virtfs local,path=/dev/shm/linux,mount_tag=kernel,security_model=passthrough
mount -t 9p -o trans=virtio kernel /mnt -oversion=9p2000.L,cache=mmap,msize=1048576
9pmsecnone
./x86_64-softmmu/qemu-system-x86_64 -M pc,accel=kvm -smp 8 -cpu host -m 32G -drive if=virtio,file=/home/images/f-32-kernel.qcow2 -nographic -virtfs local,path=/dev/shm/linux,mount_tag=kernel,security_model=none
mount -t 9p -o trans=virtio kernel /mnt -oversion=9p2000.L
virtiofscache=noneT1
./virtiofsd --socket-path=/tmp/vhostqemu -o source=/dev/shm/linux -o cache=none --thread-pool-size=1
mount -t virtiofs kernel /mnt
virtiofsdefaultT1
./virtiofsd --socket-path=/tmp/vhostqemu -o source=/dev/shm/linux --thread-pool-size=1
./x86_64-softmmu/qemu-system-x86_64 -M pc,memory-backend=mem,accel=kvm -smp 8 -cpu host -m 32G,maxmem=64G,slots=1 -object memory-backend-memfd,id=mem,size=32G,share=on -drive if=virtio,file=/home/images/f-32-kernel.qcow2 -nographic -chardev socket,id=char0,path=/tmp/vhostqemu -device vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=kernel
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: tools/virtiofs: Multi threading seems to hurt performance
2020-09-21 13:39 ` [Virtio-fs] " Vivek Goyal
@ 2020-09-21 16:57 ` Stefan Hajnoczi
0 siblings, 0 replies; 107+ messages in thread
From: Stefan Hajnoczi @ 2020-09-21 16:57 UTC (permalink / raw)
To: Vivek Goyal; +Cc: virtio-fs-list, qemu-devel, Dr. David Alan Gilbert
On Mon, Sep 21, 2020 at 09:39:44AM -0400, Vivek Goyal wrote:
> On Mon, Sep 21, 2020 at 09:39:23AM +0100, Stefan Hajnoczi wrote:
> > On Fri, Sep 18, 2020 at 05:34:36PM -0400, Vivek Goyal wrote:
> > > And here are the comparison results. To me it seems that by default
> > > we should switch to 1 thread (Till we can figure out how to make
> > > multi thread performance better even when single process is doing
> > > I/O in client).
> >
> > Let's understand the reason before making changes.
> >
> > Questions:
> > * Is "1-thread" --thread-pool-size=1?
>
> Yes.
Okay, I wanted to make sure 1-thread is still going through the glib
thread pool. So it's the same code path regardless of the
--thread-pool-size= value. This suggests the performance issue is
related to timing side-effects like lock contention, thread scheduling,
etc.
> > * How do the kvm_stat vmexit counters compare?
>
> This should be the same, shouldn't it? Changing the number of threads
> serving requests should not change the number of vmexits?
There is batching at the virtio and eventfd levels. I'm not sure if it's
coming into play here but you would see it by comparing vmexits and
eventfd reads. Having more threads can increase the number of
notifications and completion interrupts, which can make overall
performance worse in some cases.
> > * How does host mpstat -P ALL compare?
>
> Never used mpstat. Will try running it and see if I can get something
> meaningful.
Tools like top, vmstat, etc can give similar information. I'm wondering
what the host CPU utilization (guest/sys/user) looks like.
> But I suspect it has to do with the thread pool implementation and possibly
> extra cacheline bouncing.
I think perf can record cacheline bounces if you want to check.
Stefan
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: tools/virtiofs: Multi threading seems to hurt performance
2020-09-18 21:34 ` [Virtio-fs] " Vivek Goyal
@ 2020-09-21 20:16 ` Vivek Goyal
0 siblings, 0 replies; 107+ messages in thread
From: Vivek Goyal @ 2020-09-21 20:16 UTC (permalink / raw)
To: virtio-fs-list, qemu-devel
Cc: Dr. David Alan Gilbert, Stefan Hajnoczi, Miklos Szeredi
On Fri, Sep 18, 2020 at 05:34:36PM -0400, Vivek Goyal wrote:
> Hi All,
>
> virtiofsd default thread pool size is 64. To me it feels that in most of
> the cases thread pool size 1 performs better than thread pool size 64.
>
> I ran virtiofs-tests.
>
> https://github.com/rhvgoyal/virtiofs-tests
I spent more time debugging this. The first thing I noticed is that we
are using an "exclusive" glib thread pool.
https://developer.gnome.org/glib/stable/glib-Thread-Pools.html#g-thread-pool-new
This seems to run a pre-determined number of threads dedicated to that
thread pool. A little instrumentation of the code revealed that every new
request gets assigned to a new thread (despite the fact that the previous
thread has finished its job). So internally there might be some kind of
round-robin policy to choose the next thread for running the job.
I decided to switch to a "shared" pool instead, which seems to spin
up new threads only if there is enough work. Threads can also be shared
between pools.
And it looks like the test results are way better with "shared" pools. So
maybe we should switch to shared pools by default (till somebody shows
in what cases exclusive pools are better).
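The two dispatch policies can be sketched outside of glib. This is an illustrative Python model, not the glib implementation: the "exclusive"-style pool round-robins requests across dedicated per-thread queues, while the "shared"-style pool has a single queue that any idle thread drains.

```python
import queue
import threading

class ExclusivePool:
    """N dedicated threads, one queue per thread, strict round-robin
    dispatch that ignores whether some other thread is already idle."""
    def __init__(self, func, nthreads):
        self.queues = [queue.Queue() for _ in range(nthreads)]
        self.next = 0
        for q in self.queues:
            threading.Thread(target=self._worker, args=(func, q),
                             daemon=True).start()

    def _worker(self, func, q):
        while True:
            func(q.get())
            q.task_done()

    def push(self, item):
        # Hand the request to the next thread in round-robin order.
        self.queues[self.next].put(item)
        self.next = (self.next + 1) % len(self.queues)

    def join(self):
        for q in self.queues:
            q.join()

class SharedPool:
    """One queue shared by all threads; any idle thread takes the
    next request, so fewer threads need to be woken under light load."""
    def __init__(self, func, nthreads):
        self.q = queue.Queue()
        for _ in range(nthreads):
            threading.Thread(target=self._worker, args=(func,),
                             daemon=True).start()

    def _worker(self, func):
        while True:
            func(self.q.get())
            self.q.task_done()

    def push(self, item):
        self.q.put(item)

    def join(self):
        self.q.join()

handled = []
lock = threading.Lock()
def handle(req):
    with lock:
        handled.append(req)

for cls in (ExclusivePool, SharedPool):
    handled.clear()
    pool = cls(handle, 4)
    for i in range(100):
        pool.push(i)
    pool.join()
    print(cls.__name__, len(handled))  # each pool handles all 100 requests
```

Both variants complete all the work; they differ only in which thread wakes up for each request, which is exactly where thread wakeups, scheduling and cacheline traffic diverge.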
A second thought which came to mind was: what's the impact of NUMA? What
if the qemu and virtiofsd processes/threads are running on separate NUMA
nodes? That should increase memory access latency and overhead.
So I used "numactl --cpubind=0" to bind both qemu and virtiofsd to node
0. My machine seems to have two NUMA nodes (each node has 32
logical processors). Keeping both qemu and virtiofsd on the same node
improves throughput further.
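For reference, what "numactl --cpubind=0" does to a single process can be approximated from Python with CPU affinity. This is a rough, Linux-only sketch; the assumption that node 0 owns the first half of the allowed CPUs is purely illustrative (the real list is in /sys/devices/system/node/node0/cpulist, and numactl also binds memory allocation, which this does not).

```python
import os

# CPUs this process is currently allowed to run on.
allowed = sorted(os.sched_getaffinity(0))

# Illustrative assumption: pretend node 0 owns the first half of them.
node0 = set(allowed[: max(1, len(allowed) // 2)])

os.sched_setaffinity(0, node0)          # pid 0 == the calling process
print(sorted(os.sched_getaffinity(0)))  # now restricted to the node-0 set
```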
So here are the results.
vtfs-none-epool --> cache=none, exclusive thread pool.
vtfs-none-spool --> cache=none, shared thread pool.
vtfs-none-spool-numa --> cache=none, shared thread pool, same numa node
NAME WORKLOAD Bandwidth IOPS
vtfs-none-epool seqread-psync 36(MiB/s) 9392
vtfs-none-spool seqread-psync 68(MiB/s) 17k
vtfs-none-spool-numa seqread-psync 73(MiB/s) 18k
vtfs-none-epool seqread-psync-multi 210(MiB/s) 52k
vtfs-none-spool seqread-psync-multi 260(MiB/s) 65k
vtfs-none-spool-numa seqread-psync-multi 309(MiB/s) 77k
vtfs-none-epool seqread-libaio 286(MiB/s) 71k
vtfs-none-spool seqread-libaio 328(MiB/s) 82k
vtfs-none-spool-numa seqread-libaio 332(MiB/s) 83k
vtfs-none-epool seqread-libaio-multi 201(MiB/s) 50k
vtfs-none-spool seqread-libaio-multi 254(MiB/s) 63k
vtfs-none-spool-numa seqread-libaio-multi 276(MiB/s) 69k
vtfs-none-epool randread-psync 40(MiB/s) 10k
vtfs-none-spool randread-psync 64(MiB/s) 16k
vtfs-none-spool-numa randread-psync 72(MiB/s) 18k
vtfs-none-epool randread-psync-multi 211(MiB/s) 52k
vtfs-none-spool randread-psync-multi 252(MiB/s) 63k
vtfs-none-spool-numa randread-psync-multi 297(MiB/s) 74k
vtfs-none-epool randread-libaio 313(MiB/s) 78k
vtfs-none-spool randread-libaio 320(MiB/s) 80k
vtfs-none-spool-numa randread-libaio 330(MiB/s) 82k
vtfs-none-epool randread-libaio-multi 257(MiB/s) 64k
vtfs-none-spool randread-libaio-multi 274(MiB/s) 68k
vtfs-none-spool-numa randread-libaio-multi 319(MiB/s) 79k
vtfs-none-epool seqwrite-psync 34(MiB/s) 8926
vtfs-none-spool seqwrite-psync 55(MiB/s) 13k
vtfs-none-spool-numa seqwrite-psync 66(MiB/s) 16k
vtfs-none-epool seqwrite-psync-multi 196(MiB/s) 49k
vtfs-none-spool seqwrite-psync-multi 225(MiB/s) 56k
vtfs-none-spool-numa seqwrite-psync-multi 270(MiB/s) 67k
vtfs-none-epool seqwrite-libaio 257(MiB/s) 64k
vtfs-none-spool seqwrite-libaio 304(MiB/s) 76k
vtfs-none-spool-numa seqwrite-libaio 267(MiB/s) 66k
vtfs-none-epool seqwrite-libaio-multi 312(MiB/s) 78k
vtfs-none-spool seqwrite-libaio-multi 366(MiB/s) 91k
vtfs-none-spool-numa seqwrite-libaio-multi 381(MiB/s) 95k
vtfs-none-epool randwrite-psync 38(MiB/s) 9745
vtfs-none-spool randwrite-psync 55(MiB/s) 13k
vtfs-none-spool-numa randwrite-psync 67(MiB/s) 16k
vtfs-none-epool randwrite-psync-multi 186(MiB/s) 46k
vtfs-none-spool randwrite-psync-multi 240(MiB/s) 60k
vtfs-none-spool-numa randwrite-psync-multi 271(MiB/s) 67k
vtfs-none-epool randwrite-libaio 224(MiB/s) 56k
vtfs-none-spool randwrite-libaio 296(MiB/s) 74k
vtfs-none-spool-numa randwrite-libaio 290(MiB/s) 72k
vtfs-none-epool randwrite-libaio-multi 300(MiB/s) 75k
vtfs-none-spool randwrite-libaio-multi 350(MiB/s) 87k
vtfs-none-spool-numa randwrite-libaio-multi 383(MiB/s) 95k
Thanks
Vivek
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: tools/virtiofs: Multi threading seems to hurt performance
2020-09-21 15:32 ` [Virtio-fs] " Dr. David Alan Gilbert
@ 2020-09-22 10:25 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 107+ messages in thread
From: Dr. David Alan Gilbert @ 2020-09-22 10:25 UTC (permalink / raw)
To: Vivek Goyal
Cc: jose.carlos.venegas.munoz, qemu-devel, cdupontd, virtio-fs-list,
Stefan Hajnoczi, archana.m.shinde
* Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> Hi,
> I've been doing some of my own perf tests and I think I agree
> about the thread pool size; my test is a kernel build
> and I've tried a bunch of different options.
>
> My config:
> Host: 16 core AMD EPYC (32 thread), 128G RAM,
> 5.9.0-rc4 kernel, rhel 8.2ish userspace.
> 5.1.0 qemu/virtiofsd built from git.
> Guest: Fedora 32 from cloud image with just enough extra installed for
> a kernel build.
>
> git cloned and checked out v5.8 of Linux into /dev/shm/linux on the host
> fresh before each test. Then log into the guest, make defconfig,
> time make -j 16 bzImage, make clean; time make -j 16 bzImage
> The numbers below are the 'real' time in the guest from the initial make
> (the subsequent makes don't vary much)
>
> Below are the details of what each of these means, but here are the
> numbers first
>
> virtiofsdefault 4m0.978s
> 9pdefault 9m41.660s
> virtiofscache=none 10m29.700s
> 9pmmappass 9m30.047s
> 9pmbigmsize 12m4.208s
> 9pmsecnone 9m21.363s
> virtiofscache=noneT1 7m17.494s
> virtiofsdefaultT1 3m43.326s
>
> So the winner there by far is the 'virtiofsdefaultT1' - that's
> the default virtiofs settings, but with --thread-pool-size=1 - so
> yes it gives a small benefit.
> But interestingly the cache=none virtiofs performance is pretty bad,
> but thread-pool-size=1 on that makes a BIG improvement.
Here are the fio runs that Vivek asked me to run in my same environment
(there are some 0's in some of the mmap cases, which I've not
investigated yet). virtiofs is looking good here in, I think, all of the
cases; there's some division over which config is best: cache=none
seems faster in some cases, which surprises me.
Dave
NAME WORKLOAD Bandwidth IOPS
9pbigmsize seqread-psync 108(MiB/s) 27k
9pdefault seqread-psync 105(MiB/s) 26k
9pmmappass seqread-psync 107(MiB/s) 26k
9pmsecnone seqread-psync 107(MiB/s) 26k
virtiofscachenoneT1 seqread-psync 135(MiB/s) 33k
virtiofscachenone seqread-psync 115(MiB/s) 28k
virtiofsdefaultT1 seqread-psync 2465(MiB/s) 616k
virtiofsdefault seqread-psync 2468(MiB/s) 617k
9pbigmsize seqread-psync-multi 357(MiB/s) 89k
9pdefault seqread-psync-multi 358(MiB/s) 89k
9pmmappass seqread-psync-multi 347(MiB/s) 86k
9pmsecnone seqread-psync-multi 364(MiB/s) 91k
virtiofscachenoneT1 seqread-psync-multi 479(MiB/s) 119k
virtiofscachenone seqread-psync-multi 385(MiB/s) 96k
virtiofsdefaultT1 seqread-psync-multi 5916(MiB/s) 1479k
virtiofsdefault seqread-psync-multi 8771(MiB/s) 2192k
9pbigmsize seqread-mmap 111(MiB/s) 27k
9pdefault seqread-mmap 101(MiB/s) 25k
9pmmappass seqread-mmap 114(MiB/s) 28k
9pmsecnone seqread-mmap 107(MiB/s) 26k
virtiofscachenoneT1 seqread-mmap 0(KiB/s) 0
virtiofscachenone seqread-mmap 0(KiB/s) 0
virtiofsdefaultT1 seqread-mmap 2896(MiB/s) 724k
virtiofsdefault seqread-mmap 2856(MiB/s) 714k
9pbigmsize seqread-mmap-multi 364(MiB/s) 91k
9pdefault seqread-mmap-multi 348(MiB/s) 87k
9pmmappass seqread-mmap-multi 354(MiB/s) 88k
9pmsecnone seqread-mmap-multi 340(MiB/s) 85k
virtiofscachenoneT1 seqread-mmap-multi 0(KiB/s) 0
virtiofscachenone seqread-mmap-multi 0(KiB/s) 0
virtiofsdefaultT1 seqread-mmap-multi 6057(MiB/s) 1514k
virtiofsdefault seqread-mmap-multi 9585(MiB/s) 2396k
9pbigmsize seqread-libaio 109(MiB/s) 27k
9pdefault seqread-libaio 103(MiB/s) 25k
9pmmappass seqread-libaio 107(MiB/s) 26k
9pmsecnone seqread-libaio 107(MiB/s) 26k
virtiofscachenoneT1 seqread-libaio 671(MiB/s) 167k
virtiofscachenone seqread-libaio 538(MiB/s) 134k
virtiofsdefaultT1 seqread-libaio 187(MiB/s) 46k
virtiofsdefault seqread-libaio 541(MiB/s) 135k
9pbigmsize seqread-libaio-multi 354(MiB/s) 88k
9pdefault seqread-libaio-multi 360(MiB/s) 90k
9pmmappass seqread-libaio-multi 356(MiB/s) 89k
9pmsecnone seqread-libaio-multi 344(MiB/s) 86k
virtiofscachenoneT1 seqread-libaio-multi 488(MiB/s) 122k
virtiofscachenone seqread-libaio-multi 380(MiB/s) 95k
virtiofsdefaultT1 seqread-libaio-multi 5577(MiB/s) 1394k
virtiofsdefault seqread-libaio-multi 5359(MiB/s) 1339k
9pbigmsize randread-psync 106(MiB/s) 26k
9pdefault randread-psync 106(MiB/s) 26k
9pmmappass randread-psync 120(MiB/s) 30k
9pmsecnone randread-psync 105(MiB/s) 26k
virtiofscachenoneT1 randread-psync 154(MiB/s) 38k
virtiofscachenone randread-psync 134(MiB/s) 33k
virtiofsdefaultT1 randread-psync 129(MiB/s) 32k
virtiofsdefault randread-psync 129(MiB/s) 32k
9pbigmsize randread-psync-multi 349(MiB/s) 87k
* Re: [Virtio-fs] tools/virtiofs: Multi threading seems to hurt performance
@ 2020-09-22 10:25 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 107+ messages in thread
From: Dr. David Alan Gilbert @ 2020-09-22 10:25 UTC (permalink / raw)
To: Vivek Goyal
Cc: jose.carlos.venegas.munoz, qemu-devel, cdupontd, virtio-fs-list,
archana.m.shinde
* Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> Hi,
> I've been doing some of my own perf tests and I think I agree
> about the thread pool size; my test is a kernel build
> and I've tried a bunch of different options.
>
> My config:
> Host: 16 core AMD EPYC (32 thread), 128G RAM,
> 5.9.0-rc4 kernel, rhel 8.2ish userspace.
> 5.1.0 qemu/virtiofsd built from git.
> Guest: Fedora 32 from cloud image with just enough extra installed for
> a kernel build.
>
> git cloned and checked out v5.8 of Linux into /dev/shm/linux on the host
> fresh before each test. Then log into the guest, make defconfig,
> time make -j 16 bzImage, make clean; time make -j 16 bzImage
> The numbers below are the 'real' time in the guest from the initial make
> (the subsequent makes don't vary much)
>
> Below are the details of what each of these means, but here are the
> numbers first
>
> virtiofsdefault 4m0.978s
> 9pdefault 9m41.660s
> virtiofscache=none 10m29.700s
> 9pmmappass 9m30.047s
> 9pmbigmsize 12m4.208s
> 9pmsecnone 9m21.363s
> virtiofscache=noneT1 7m17.494s
> virtiofsdefaultT1 3m43.326s
>
> So the winner there by far is the 'virtiofsdefaultT1' - that's
> the default virtiofs settings, but with --thread-pool-size=1 - so
> yes it gives a small benefit.
> But interestingly the cache=none virtiofs performance is pretty bad,
> but thread-pool-size=1 on that makes a BIG improvement.
Here are the fio runs that Vivek asked me to run in my same environment
(there are some 0's in some of the mmap cases, and I've not investigated
why yet). virtiofs is looking good here in, I think, all of the cases;
there's some division over which config wins; cache=none
seems faster in some cases, which surprises me.
Dave
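The workload names in the table below (seqread-psync, randread-libaio, the -multi variants, and so on) come from the virtiofs-tests harness linked earlier in the thread. The invocations below are only hypothetical approximations to show what each name is assumed to mean; every flag value here is a guess, not the harness's actual job definition:

```shell
# Hypothetical sketches of the workload names; the real job files live in
# the virtiofs-tests repository (https://github.com/rhvgoyal/virtiofs-tests).

# seqread-psync: one job doing sequential reads via the psync ioengine
fio --name=seqread-psync --directory=/mnt --rw=read --ioengine=psync \
    --bs=4k --size=4g --direct=1 --runtime=30

# randread-libaio: random reads via libaio with a queue depth
fio --name=randread-libaio --directory=/mnt --rw=randread --ioengine=libaio \
    --bs=4k --size=4g --iodepth=16 --direct=1 --runtime=30

# the '-multi' variants run several jobs in parallel
fio --name=seqread-psync-multi --directory=/mnt --rw=read --ioengine=psync \
    --bs=4k --size=4g --direct=1 --runtime=30 --numjobs=4
```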
NAME WORKLOAD Bandwidth IOPS
9pbigmsize seqread-psync 108(MiB/s) 27k
9pdefault seqread-psync 105(MiB/s) 26k
9pmmappass seqread-psync 107(MiB/s) 26k
9pmsecnone seqread-psync 107(MiB/s) 26k
virtiofscachenoneT1 seqread-psync 135(MiB/s) 33k
virtiofscachenone seqread-psync 115(MiB/s) 28k
virtiofsdefaultT1 seqread-psync 2465(MiB/s) 616k
virtiofsdefault seqread-psync 2468(MiB/s) 617k
9pbigmsize seqread-psync-multi 357(MiB/s) 89k
9pdefault seqread-psync-multi 358(MiB/s) 89k
9pmmappass seqread-psync-multi 347(MiB/s) 86k
9pmsecnone seqread-psync-multi 364(MiB/s) 91k
virtiofscachenoneT1 seqread-psync-multi 479(MiB/s) 119k
virtiofscachenone seqread-psync-multi 385(MiB/s) 96k
virtiofsdefaultT1 seqread-psync-multi 5916(MiB/s) 1479k
virtiofsdefault seqread-psync-multi 8771(MiB/s) 2192k
9pbigmsize seqread-mmap 111(MiB/s) 27k
9pdefault seqread-mmap 101(MiB/s) 25k
9pmmappass seqread-mmap 114(MiB/s) 28k
9pmsecnone seqread-mmap 107(MiB/s) 26k
virtiofscachenoneT1 seqread-mmap 0(KiB/s) 0
virtiofscachenone seqread-mmap 0(KiB/s) 0
virtiofsdefaultT1 seqread-mmap 2896(MiB/s) 724k
virtiofsdefault seqread-mmap 2856(MiB/s) 714k
9pbigmsize seqread-mmap-multi 364(MiB/s) 91k
9pdefault seqread-mmap-multi 348(MiB/s) 87k
9pmmappass seqread-mmap-multi 354(MiB/s) 88k
9pmsecnone seqread-mmap-multi 340(MiB/s) 85k
virtiofscachenoneT1 seqread-mmap-multi 0(KiB/s) 0
virtiofscachenone seqread-mmap-multi 0(KiB/s) 0
virtiofsdefaultT1 seqread-mmap-multi 6057(MiB/s) 1514k
virtiofsdefault seqread-mmap-multi 9585(MiB/s) 2396k
9pbigmsize seqread-libaio 109(MiB/s) 27k
9pdefault seqread-libaio 103(MiB/s) 25k
9pmmappass seqread-libaio 107(MiB/s) 26k
9pmsecnone seqread-libaio 107(MiB/s) 26k
virtiofscachenoneT1 seqread-libaio 671(MiB/s) 167k
virtiofscachenone seqread-libaio 538(MiB/s) 134k
virtiofsdefaultT1 seqread-libaio 187(MiB/s) 46k
virtiofsdefault seqread-libaio 541(MiB/s) 135k
9pbigmsize seqread-libaio-multi 354(MiB/s) 88k
9pdefault seqread-libaio-multi 360(MiB/s) 90k
9pmmappass seqread-libaio-multi 356(MiB/s) 89k
9pmsecnone seqread-libaio-multi 344(MiB/s) 86k
virtiofscachenoneT1 seqread-libaio-multi 488(MiB/s) 122k
virtiofscachenone seqread-libaio-multi 380(MiB/s) 95k
virtiofsdefaultT1 seqread-libaio-multi 5577(MiB/s) 1394k
virtiofsdefault seqread-libaio-multi 5359(MiB/s) 1339k
9pbigmsize randread-psync 106(MiB/s) 26k
9pdefault randread-psync 106(MiB/s) 26k
9pmmappass randread-psync 120(MiB/s) 30k
9pmsecnone randread-psync 105(MiB/s) 26k
virtiofscachenoneT1 randread-psync 154(MiB/s) 38k
virtiofscachenone randread-psync 134(MiB/s) 33k
virtiofsdefaultT1 randread-psync 129(MiB/s) 32k
virtiofsdefault randread-psync 129(MiB/s) 32k
9pbigmsize randread-psync-multi 349(MiB/s) 87k
9pdefault randread-psync-multi 354(MiB/s) 88k
9pmmappass randread-psync-multi 360(MiB/s) 90k
9pmsecnone randread-psync-multi 352(MiB/s) 88k
virtiofscachenoneT1 randread-psync-multi 449(MiB/s) 112k
virtiofscachenone randread-psync-multi 383(MiB/s) 95k
virtiofsdefaultT1 randread-psync-multi 435(MiB/s) 108k
virtiofsdefault randread-psync-multi 368(MiB/s) 92k
9pbigmsize randread-mmap 100(MiB/s) 25k
9pdefault randread-mmap 89(MiB/s) 22k
9pmmappass randread-mmap 87(MiB/s) 21k
9pmsecnone randread-mmap 92(MiB/s) 23k
virtiofscachenoneT1 randread-mmap 0(KiB/s) 0
virtiofscachenone randread-mmap 0(KiB/s) 0
virtiofsdefaultT1 randread-mmap 111(MiB/s) 27k
virtiofsdefault randread-mmap 101(MiB/s) 25k
9pbigmsize randread-mmap-multi 335(MiB/s) 83k
9pdefault randread-mmap-multi 318(MiB/s) 79k
9pmmappass randread-mmap-multi 335(MiB/s) 83k
9pmsecnone randread-mmap-multi 323(MiB/s) 80k
virtiofscachenoneT1 randread-mmap-multi 0(KiB/s) 0
virtiofscachenone randread-mmap-multi 0(KiB/s) 0
virtiofsdefaultT1 randread-mmap-multi 422(MiB/s) 105k
virtiofsdefault randread-mmap-multi 345(MiB/s) 86k
9pbigmsize randread-libaio 84(MiB/s) 21k
9pdefault randread-libaio 89(MiB/s) 22k
9pmmappass randread-libaio 87(MiB/s) 21k
9pmsecnone randread-libaio 82(MiB/s) 20k
virtiofscachenoneT1 randread-libaio 641(MiB/s) 160k
virtiofscachenone randread-libaio 527(MiB/s) 131k
virtiofsdefaultT1 randread-libaio 205(MiB/s) 51k
virtiofsdefault randread-libaio 536(MiB/s) 134k
9pbigmsize randread-libaio-multi 265(MiB/s) 66k
9pdefault randread-libaio-multi 267(MiB/s) 66k
9pmmappass randread-libaio-multi 266(MiB/s) 66k
9pmsecnone randread-libaio-multi 269(MiB/s) 67k
virtiofscachenoneT1 randread-libaio-multi 615(MiB/s) 153k
virtiofscachenone randread-libaio-multi 542(MiB/s) 135k
virtiofsdefaultT1 randread-libaio-multi 595(MiB/s) 148k
virtiofsdefault randread-libaio-multi 552(MiB/s) 138k
9pbigmsize seqwrite-psync 106(MiB/s) 26k
9pdefault seqwrite-psync 106(MiB/s) 26k
9pmmappass seqwrite-psync 107(MiB/s) 26k
9pmsecnone seqwrite-psync 107(MiB/s) 26k
virtiofscachenoneT1 seqwrite-psync 136(MiB/s) 34k
virtiofscachenone seqwrite-psync 112(MiB/s) 28k
virtiofsdefaultT1 seqwrite-psync 132(MiB/s) 33k
virtiofsdefault seqwrite-psync 109(MiB/s) 27k
9pbigmsize seqwrite-psync-multi 353(MiB/s) 88k
9pdefault seqwrite-psync-multi 364(MiB/s) 91k
9pmmappass seqwrite-psync-multi 345(MiB/s) 86k
9pmsecnone seqwrite-psync-multi 350(MiB/s) 87k
virtiofscachenoneT1 seqwrite-psync-multi 470(MiB/s) 117k
virtiofscachenone seqwrite-psync-multi 374(MiB/s) 93k
virtiofsdefaultT1 seqwrite-psync-multi 470(MiB/s) 117k
virtiofsdefault seqwrite-psync-multi 373(MiB/s) 93k
9pbigmsize seqwrite-mmap 195(MiB/s) 48k
9pdefault seqwrite-mmap 0(KiB/s) 0
9pmmappass seqwrite-mmap 196(MiB/s) 49k
9pmsecnone seqwrite-mmap 0(KiB/s) 0
virtiofscachenoneT1 seqwrite-mmap 0(KiB/s) 0
virtiofscachenone seqwrite-mmap 0(KiB/s) 0
virtiofsdefaultT1 seqwrite-mmap 603(MiB/s) 150k
virtiofsdefault seqwrite-mmap 629(MiB/s) 157k
9pbigmsize seqwrite-mmap-multi 247(MiB/s) 61k
9pdefault seqwrite-mmap-multi 0(KiB/s) 0
9pmmappass seqwrite-mmap-multi 246(MiB/s) 61k
9pmsecnone seqwrite-mmap-multi 0(KiB/s) 0
virtiofscachenoneT1 seqwrite-mmap-multi 0(KiB/s) 0
virtiofscachenone seqwrite-mmap-multi 0(KiB/s) 0
virtiofsdefaultT1 seqwrite-mmap-multi 1787(MiB/s) 446k
virtiofsdefault seqwrite-mmap-multi 1692(MiB/s) 423k
9pbigmsize seqwrite-libaio 107(MiB/s) 26k
9pdefault seqwrite-libaio 107(MiB/s) 26k
9pmmappass seqwrite-libaio 106(MiB/s) 26k
9pmsecnone seqwrite-libaio 108(MiB/s) 27k
virtiofscachenoneT1 seqwrite-libaio 595(MiB/s) 148k
virtiofscachenone seqwrite-libaio 524(MiB/s) 131k
virtiofsdefaultT1 seqwrite-libaio 575(MiB/s) 143k
virtiofsdefault seqwrite-libaio 538(MiB/s) 134k
9pbigmsize seqwrite-libaio-multi 355(MiB/s) 88k
9pdefault seqwrite-libaio-multi 341(MiB/s) 85k
9pmmappass seqwrite-libaio-multi 354(MiB/s) 88k
9pmsecnone seqwrite-libaio-multi 350(MiB/s) 87k
virtiofscachenoneT1 seqwrite-libaio-multi 609(MiB/s) 152k
virtiofscachenone seqwrite-libaio-multi 536(MiB/s) 134k
virtiofsdefaultT1 seqwrite-libaio-multi 609(MiB/s) 152k
virtiofsdefault seqwrite-libaio-multi 538(MiB/s) 134k
9pbigmsize randwrite-psync 104(MiB/s) 26k
9pdefault randwrite-psync 106(MiB/s) 26k
9pmmappass randwrite-psync 105(MiB/s) 26k
9pmsecnone randwrite-psync 103(MiB/s) 25k
virtiofscachenoneT1 randwrite-psync 125(MiB/s) 31k
virtiofscachenone randwrite-psync 110(MiB/s) 27k
virtiofsdefaultT1 randwrite-psync 129(MiB/s) 32k
virtiofsdefault randwrite-psync 112(MiB/s) 28k
9pbigmsize randwrite-psync-multi 355(MiB/s) 88k
9pdefault randwrite-psync-multi 339(MiB/s) 84k
9pmmappass randwrite-psync-multi 343(MiB/s) 85k
9pmsecnone randwrite-psync-multi 344(MiB/s) 86k
virtiofscachenoneT1 randwrite-psync-multi 461(MiB/s) 115k
virtiofscachenone randwrite-psync-multi 370(MiB/s) 92k
virtiofsdefaultT1 randwrite-psync-multi 449(MiB/s) 112k
virtiofsdefault randwrite-psync-multi 364(MiB/s) 91k
9pbigmsize randwrite-mmap 98(MiB/s) 24k
9pdefault randwrite-mmap 0(KiB/s) 0
9pmmappass randwrite-mmap 97(MiB/s) 24k
9pmsecnone randwrite-mmap 0(KiB/s) 0
virtiofscachenoneT1 randwrite-mmap 0(KiB/s) 0
virtiofscachenone randwrite-mmap 0(KiB/s) 0
virtiofsdefaultT1 randwrite-mmap 102(MiB/s) 25k
virtiofsdefault randwrite-mmap 92(MiB/s) 23k
9pbigmsize randwrite-mmap-multi 246(MiB/s) 61k
9pdefault randwrite-mmap-multi 0(KiB/s) 0
9pmmappass randwrite-mmap-multi 239(MiB/s) 59k
9pmsecnone randwrite-mmap-multi 0(KiB/s) 0
virtiofscachenoneT1 randwrite-mmap-multi 0(KiB/s) 0
virtiofscachenone randwrite-mmap-multi 0(KiB/s) 0
virtiofsdefaultT1 randwrite-mmap-multi 279(MiB/s) 69k
virtiofsdefault randwrite-mmap-multi 225(MiB/s) 56k
9pbigmsize randwrite-libaio 110(MiB/s) 27k
9pdefault randwrite-libaio 111(MiB/s) 27k
9pmmappass randwrite-libaio 103(MiB/s) 25k
9pmsecnone randwrite-libaio 102(MiB/s) 25k
virtiofscachenoneT1 randwrite-libaio 601(MiB/s) 150k
virtiofscachenone randwrite-libaio 525(MiB/s) 131k
virtiofsdefaultT1 randwrite-libaio 618(MiB/s) 154k
virtiofsdefault randwrite-libaio 527(MiB/s) 131k
9pbigmsize randwrite-libaio-multi 332(MiB/s) 83k
9pdefault randwrite-libaio-multi 343(MiB/s) 85k
9pmmappass randwrite-libaio-multi 350(MiB/s) 87k
9pmsecnone randwrite-libaio-multi 334(MiB/s) 83k
virtiofscachenoneT1 randwrite-libaio-multi 611(MiB/s) 152k
virtiofscachenone randwrite-libaio-multi 533(MiB/s) 133k
virtiofsdefaultT1 randwrite-libaio-multi 599(MiB/s) 149k
virtiofsdefault randwrite-libaio-multi 531(MiB/s) 132k
>
> virtiofsdefault:
> ./virtiofsd --socket-path=/tmp/vhostqemu -o source=/dev/shm/linux
> ./x86_64-softmmu/qemu-system-x86_64 -M pc,memory-backend=mem,accel=kvm -smp 8 -cpu host -m 32G,maxmem=64G,slots=1 -object memory-backend-memfd,id=mem,size=32G,share=on -drive if=virtio,file=/home/images/f-32-kernel.qcow2 -nographic -chardev socket,id=char0,path=/tmp/vhostqemu -device vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=kernel
> mount -t virtiofs kernel /mnt
>
> 9pdefault
> ./x86_64-softmmu/qemu-system-x86_64 -M pc,accel=kvm -smp 8 -cpu host -m 32G -drive if=virtio,file=/home/images/f-32-kernel.qcow2 -nographic -virtfs local,path=/dev/shm/linux,mount_tag=kernel,security_model=passthrough
> mount -t 9p -o trans=virtio kernel /mnt -oversion=9p2000.L
>
> virtiofscache=none
> ./virtiofsd --socket-path=/tmp/vhostqemu -o source=/dev/shm/linux -o cache=none
> ./x86_64-softmmu/qemu-system-x86_64 -M pc,memory-backend=mem,accel=kvm -smp 8 -cpu host -m 32G,maxmem=64G,slots=1 -object memory-backend-memfd,id=mem,size=32G,share=on -drive if=virtio,file=/home/images/f-32-kernel.qcow2 -nographic -chardev socket,id=char0,path=/tmp/vhostqemu -device vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=kernel
> mount -t virtiofs kernel /mnt
>
> 9pmmappass
> ./x86_64-softmmu/qemu-system-x86_64 -M pc,accel=kvm -smp 8 -cpu host -m 32G -drive if=virtio,file=/home/images/f-32-kernel.qcow2 -nographic -virtfs local,path=/dev/shm/linux,mount_tag=kernel,security_model=passthrough
> mount -t 9p -o trans=virtio kernel /mnt -oversion=9p2000.L,cache=mmap
>
> 9pmbigmsize
> ./x86_64-softmmu/qemu-system-x86_64 -M pc,accel=kvm -smp 8 -cpu host -m 32G -drive if=virtio,file=/home/images/f-32-kernel.qcow2 -nographic -virtfs local,path=/dev/shm/linux,mount_tag=kernel,security_model=passthrough
> mount -t 9p -o trans=virtio kernel /mnt -oversion=9p2000.L,cache=mmap,msize=1048576
>
> 9pmsecnone
> ./x86_64-softmmu/qemu-system-x86_64 -M pc,accel=kvm -smp 8 -cpu host -m 32G -drive if=virtio,file=/home/images/f-32-kernel.qcow2 -nographic -virtfs local,path=/dev/shm/linux,mount_tag=kernel,security_model=none
> mount -t 9p -o trans=virtio kernel /mnt -oversion=9p2000.L
>
> virtiofscache=noneT1
> ./virtiofsd --socket-path=/tmp/vhostqemu -o source=/dev/shm/linux -o cache=none --thread-pool-size=1
> mount -t virtiofs kernel /mnt
>
> virtiofsdefaultT1
> ./virtiofsd --socket-path=/tmp/vhostqemu -o source=/dev/shm/linux --thread-pool-size=1
> ./x86_64-softmmu/qemu-system-x86_64 -M pc,memory-backend=mem,accel=kvm -smp 8 -cpu host -m 32G,maxmem=64G,slots=1 -object memory-backend-memfd,id=mem,size=32G,share=on -drive if=virtio,file=/home/images/f-32-kernel.qcow2 -nographic -chardev socket,id=char0,path=/tmp/vhostqemu -device vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=kernel
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: tools/virtiofs: Multi threading seems to hurt performance
2020-09-21 20:16 ` [Virtio-fs] " Vivek Goyal
@ 2020-09-22 11:09 ` Dr. David Alan Gilbert
-1 siblings, 0 replies; 107+ messages in thread
From: Dr. David Alan Gilbert @ 2020-09-22 11:09 UTC (permalink / raw)
To: Vivek Goyal; +Cc: virtio-fs-list, qemu-devel, Stefan Hajnoczi, Miklos Szeredi
* Vivek Goyal (vgoyal@redhat.com) wrote:
> On Fri, Sep 18, 2020 at 05:34:36PM -0400, Vivek Goyal wrote:
> > Hi All,
> >
> > virtiofsd default thread pool size is 64. To me it feels that in most of
> > the cases thread pool size 1 performs better than thread pool size 64.
> >
> > I ran virtiofs-tests.
> >
> > https://github.com/rhvgoyal/virtiofs-tests
>
> I spent more time debugging this. The first thing I noticed is that we
> are using an "exclusive" glib thread pool.
>
> https://developer.gnome.org/glib/stable/glib-Thread-Pools.html#g-thread-pool-new
>
> This seems to run a pre-determined number of threads dedicated to that
> thread pool. A little instrumentation of the code revealed that every new
> request gets assigned to a new thread (despite the fact that the previous
> thread finished its job). So internally there might be some kind of
> round-robin policy to choose the next thread for running the job.
>
> I decided to switch to a "shared" pool instead, which seemed to spin
> up new threads only if there is enough work. Also, threads can be shared
> between pools.
>
> And it looks like the testing results are way better with "shared" pools. So
> maybe we should switch to the shared pool by default (till somebody shows
> in what cases exclusive pools are better).
>
> The second thought which came to mind was: what's the impact of NUMA? What
> if the qemu and virtiofsd processes/threads are running on separate NUMA
> nodes? That should increase memory access latency and add overhead.
> So I used "numactl --cpubind=0" to bind both qemu and virtiofsd to node
> 0. My machine seems to have two NUMA nodes (each node has 32
> logical processors). Keeping both qemu and virtiofsd on the same node
> improves throughput further.
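A sketch of how the pinned runs were presumably launched. The only thing stated above is "numactl --cpubind=0"; the virtiofsd and qemu argument lists are abbreviated here and are spelled out in full at the end of this thread:

```shell
# Pin both the daemon and the VM to NUMA node 0 so their threads share
# memory locality (only --cpubind=0 is mentioned above; --membind=0 would
# additionally pin allocations to node 0 but is not part of the test).
numactl --cpubind=0 ./virtiofsd --socket-path=/tmp/vhostqemu \
    -o source=/dev/shm/linux -o cache=none &
numactl --cpubind=0 ./x86_64-softmmu/qemu-system-x86_64 ...  # qemu args as listed later
```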
>
> So here are the results.
>
> vtfs-none-epool --> cache=none, exclusive thread pool.
> vtfs-none-spool --> cache=none, shared thread pool.
> vtfs-none-spool-numa --> cache=none, shared thread pool, same numa node
Do you have the numbers for:
epool
epool thread-pool-size=1
spool
?
Dave
>
> NAME WORKLOAD Bandwidth IOPS
> vtfs-none-epool seqread-psync 36(MiB/s) 9392
> vtfs-none-spool seqread-psync 68(MiB/s) 17k
> vtfs-none-spool-numa seqread-psync 73(MiB/s) 18k
>
> vtfs-none-epool seqread-psync-multi 210(MiB/s) 52k
> vtfs-none-spool seqread-psync-multi 260(MiB/s) 65k
> vtfs-none-spool-numa seqread-psync-multi 309(MiB/s) 77k
>
> vtfs-none-epool seqread-libaio 286(MiB/s) 71k
> vtfs-none-spool seqread-libaio 328(MiB/s) 82k
> vtfs-none-spool-numa seqread-libaio 332(MiB/s) 83k
>
> vtfs-none-epool seqread-libaio-multi 201(MiB/s) 50k
> vtfs-none-spool seqread-libaio-multi 254(MiB/s) 63k
> vtfs-none-spool-numa seqread-libaio-multi 276(MiB/s) 69k
>
> vtfs-none-epool randread-psync 40(MiB/s) 10k
> vtfs-none-spool randread-psync 64(MiB/s) 16k
> vtfs-none-spool-numa randread-psync 72(MiB/s) 18k
>
> vtfs-none-epool randread-psync-multi 211(MiB/s) 52k
> vtfs-none-spool randread-psync-multi 252(MiB/s) 63k
> vtfs-none-spool-numa randread-psync-multi 297(MiB/s) 74k
>
> vtfs-none-epool randread-libaio 313(MiB/s) 78k
> vtfs-none-spool randread-libaio 320(MiB/s) 80k
> vtfs-none-spool-numa randread-libaio 330(MiB/s) 82k
>
> vtfs-none-epool randread-libaio-multi 257(MiB/s) 64k
> vtfs-none-spool randread-libaio-multi 274(MiB/s) 68k
> vtfs-none-spool-numa randread-libaio-multi 319(MiB/s) 79k
>
> vtfs-none-epool seqwrite-psync 34(MiB/s) 8926
> vtfs-none-spool seqwrite-psync 55(MiB/s) 13k
> vtfs-none-spool-numa seqwrite-psync 66(MiB/s) 16k
>
> vtfs-none-epool seqwrite-psync-multi 196(MiB/s) 49k
> vtfs-none-spool seqwrite-psync-multi 225(MiB/s) 56k
> vtfs-none-spool-numa seqwrite-psync-multi 270(MiB/s) 67k
>
> vtfs-none-epool seqwrite-libaio 257(MiB/s) 64k
> vtfs-none-spool seqwrite-libaio 304(MiB/s) 76k
> vtfs-none-spool-numa seqwrite-libaio 267(MiB/s) 66k
>
> vtfs-none-epool seqwrite-libaio-multi 312(MiB/s) 78k
> vtfs-none-spool seqwrite-libaio-multi 366(MiB/s) 91k
> vtfs-none-spool-numa seqwrite-libaio-multi 381(MiB/s) 95k
>
> vtfs-none-epool randwrite-psync 38(MiB/s) 9745
> vtfs-none-spool randwrite-psync 55(MiB/s) 13k
> vtfs-none-spool-numa randwrite-psync 67(MiB/s) 16k
>
> vtfs-none-epool randwrite-psync-multi 186(MiB/s) 46k
> vtfs-none-spool randwrite-psync-multi 240(MiB/s) 60k
> vtfs-none-spool-numa randwrite-psync-multi 271(MiB/s) 67k
>
> vtfs-none-epool randwrite-libaio 224(MiB/s) 56k
> vtfs-none-spool randwrite-libaio 296(MiB/s) 74k
> vtfs-none-spool-numa randwrite-libaio 290(MiB/s) 72k
>
> vtfs-none-epool randwrite-libaio-multi 300(MiB/s) 75k
> vtfs-none-spool randwrite-libaio-multi 350(MiB/s) 87k
> vtfs-none-spool-numa randwrite-libaio-multi 383(MiB/s) 95k
>
> Thanks
> Vivek
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: tools/virtiofs: Multi threading seems to hurt performance
2020-09-22 10:25 ` [Virtio-fs] " Dr. David Alan Gilbert
@ 2020-09-22 17:47 ` Vivek Goyal
-1 siblings, 0 replies; 107+ messages in thread
From: Vivek Goyal @ 2020-09-22 17:47 UTC (permalink / raw)
To: Dr. David Alan Gilbert
Cc: jose.carlos.venegas.munoz, qemu-devel, cdupontd, virtio-fs-list,
Stefan Hajnoczi, archana.m.shinde
On Tue, Sep 22, 2020 at 11:25:31AM +0100, Dr. David Alan Gilbert wrote:
> * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> > Hi,
> > I've been doing some of my own perf tests and I think I agree
> > about the thread pool size; my test is a kernel build
> > and I've tried a bunch of different options.
> >
> > My config:
> > Host: 16 core AMD EPYC (32 thread), 128G RAM,
> > 5.9.0-rc4 kernel, rhel 8.2ish userspace.
> > 5.1.0 qemu/virtiofsd built from git.
> > Guest: Fedora 32 from cloud image with just enough extra installed for
> > a kernel build.
> >
> > git cloned and checked out v5.8 of Linux into /dev/shm/linux on the host
> > fresh before each test. Then log into the guest, make defconfig,
> > time make -j 16 bzImage, make clean; time make -j 16 bzImage
> > The numbers below are the 'real' time in the guest from the initial make
> > (the subsequent makes don't vary much)
> >
> > Below are the details of what each of these means, but here are the
> > numbers first
> >
> > virtiofsdefault 4m0.978s
> > 9pdefault 9m41.660s
> > virtiofscache=none 10m29.700s
> > 9pmmappass 9m30.047s
> > 9pmbigmsize 12m4.208s
> > 9pmsecnone 9m21.363s
> > virtiofscache=noneT1 7m17.494s
> > virtiofsdefaultT1 3m43.326s
> >
> > So the winner there by far is the 'virtiofsdefaultT1' - that's
> > the default virtiofs settings, but with --thread-pool-size=1 - so
> > yes it gives a small benefit.
> > But interestingly the cache=none virtiofs performance is pretty bad,
> > but thread-pool-size=1 on that makes a BIG improvement.
>
> Here are fio runs that Vivek asked me to run in my same environment
> (there are some 0's in some of the mmap cases, and I've not investigated
> why yet).
cache=none does not allow mmap in the case of virtiofs. That's why you
are seeing 0s.
> virtiofs is looking good here in I think all of the cases;
> there's some division over which config; cache=none
> seems faster in some cases which surprises me.
I know cache=none is faster for write workloads. It forces direct
writes, which skip file_remove_privs(), while cache=auto goes through
file_remove_privs(), and that adds a GETXATTR request to every WRITE
request.
Vivek
* Re: [Virtio-fs] tools/virtiofs: Multi threading seems to hurt performance
@ 2020-09-22 17:47 ` Vivek Goyal
0 siblings, 0 replies; 107+ messages in thread
From: Vivek Goyal @ 2020-09-22 17:47 UTC (permalink / raw)
To: Dr. David Alan Gilbert
Cc: jose.carlos.venegas.munoz, qemu-devel, cdupontd, virtio-fs-list,
archana.m.shinde
On Tue, Sep 22, 2020 at 11:25:31AM +0100, Dr. David Alan Gilbert wrote:
> * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> > Hi,
> > I've been doing some of my own perf tests and I think I agree
> > about the thread pool size; my test is a kernel build
> > and I've tried a bunch of different options.
> >
> > My config:
> > Host: 16 core AMD EPYC (32 thread), 128G RAM,
> > 5.9.0-rc4 kernel, rhel 8.2ish userspace.
> > 5.1.0 qemu/virtiofsd built from git.
> > Guest: Fedora 32 from cloud image with just enough extra installed for
> > a kernel build.
> >
> > git cloned and checked out v5.8 of Linux into /dev/shm/linux on the host
> > fresh before each test. Then log into the guest, make defconfig,
> > time make -j 16 bzImage, make clean; time make -j 16 bzImage
> > The numbers below are the 'real' time in the guest from the initial make
> > (the subsequent makes don't vary much)
> >
> > Below are the details of what each of these means, but here are the
> > numbers first
> >
> > virtiofsdefault 4m0.978s
> > 9pdefault 9m41.660s
> > virtiofscache=none 10m29.700s
> > 9pmmappass 9m30.047s
> > 9pmbigmsize 12m4.208s
> > 9pmsecnone 9m21.363s
> > virtiofscache=noneT1 7m17.494s
> > virtiofsdefaultT1 3m43.326s
> >
> > So the winner there by far is the 'virtiofsdefaultT1' - that's
> > the default virtiofs settings, but with --thread-pool-size=1 - so
> > yes it gives a small benefit.
> > But interestingly the cache=none virtiofs performance is pretty bad,
> > but thread-pool-size=1 on that makes a BIG improvement.
>
> Here are fio runs that Vivek asked me to run in my same environment
> (there are some 0's in some of the mmap cases, and I've not investigated
> why yet).
cache=none does not allow mmap in the case of virtiofs. That's why you
are seeing 0s.
> virtiofs is looking good here in I think all of the cases;
> there's some division over which config; cache=none
> seems faster in some cases which surprises me.
I know cache=none is faster for write workloads. It forces direct
writes, which skip file_remove_privs(), while cache=auto goes through
file_remove_privs(), and that adds a GETXATTR request to every WRITE
request.
Vivek
* Re: tools/virtiofs: Multi threading seems to hurt performance
2020-09-22 11:09 ` [Virtio-fs] " Dr. David Alan Gilbert
@ 2020-09-22 22:56 ` Vivek Goyal
-1 siblings, 0 replies; 107+ messages in thread
From: Vivek Goyal @ 2020-09-22 22:56 UTC (permalink / raw)
To: Dr. David Alan Gilbert
Cc: virtio-fs-list, qemu-devel, Stefan Hajnoczi, Miklos Szeredi
On Tue, Sep 22, 2020 at 12:09:46PM +0100, Dr. David Alan Gilbert wrote:
>
> Do you have the numbers for:
> epool
> epool thread-pool-size=1
> spool
Hi David,
Ok, I re-ran my numbers after upgrading to the latest qemu and also
upgrading the host kernel to the latest upstream. Apart from comparing epool,
spool and 1Thread, I also ran their numa variants. That is, I launched
qemu and virtiofsd on node 0 of the machine (numactl --cpunodebind=0).
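For reference, the pinning described above can be done along these lines (a sketch only: the socket path, shared directory, and memory size are illustrative, and the rest of the qemu options are elided):

```shell
# Pin both the virtiofs daemon and the VM to node 0, CPUs and memory alike.
numactl --cpunodebind=0 --membind=0 \
    ./virtiofsd --socket-path=/tmp/vhostqemu \
        -o source=/mnt/test -o cache=none &

numactl --cpunodebind=0 --membind=0 \
    qemu-system-x86_64 -enable-kvm -m 4G \
        -object memory-backend-file,id=mem,size=4G,mem-path=/dev/shm,share=on \
        -numa node,memdev=mem \
        -chardev socket,id=char0,path=/tmp/vhostqemu \
        -device vhost-user-fs-pci,chardev=char0,tag=myfs \
        ...
```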
Results are kind of mixed. Here are my takeaways.
- Running on same numa node improves performance overall for exclusive,
shared and exclusive-1T mode.
- In general both shared pool and exclusive-1T mode seem to perform
better than exclusive mode, except for the case of randwrite-libaio.
In some cases (seqread-libaio, seqwrite-libaio, seqwrite-libaio-multi)
exclusive pool performs better than exclusive-1T.
- Looks like in some cases exclusive-1T performs better than shared
pool. (randwrite-libaio, randwrite-psync-multi, seqwrite-psync-multi,
seqwrite-psync, seqread-libaio-multi, seqread-psync-multi)
Overall, I feel that both exclusive-1T and shared perform better than
exclusive pool. Results between exclusive-1T and shared pool are mixed.
It seems like in many cases exclusive-1T performs better. I would say
that moving to "shared" pool seems like a reasonable option.
Thanks
Vivek
NAME WORKLOAD Bandwidth IOPS
vtfs-none-epool seqread-psync 38(MiB/s) 9967
vtfs-none-epool-1T seqread-psync 66(MiB/s) 16k
vtfs-none-spool seqread-psync 67(MiB/s) 16k
vtfs-none-epool-numa seqread-psync 48(MiB/s) 12k
vtfs-none-epool-1T-numa seqread-psync 74(MiB/s) 18k
vtfs-none-spool-numa seqread-psync 74(MiB/s) 18k
vtfs-none-epool seqread-psync-multi 204(MiB/s) 51k
vtfs-none-epool-1T seqread-psync-multi 325(MiB/s) 81k
vtfs-none-spool seqread-psync-multi 271(MiB/s) 67k
vtfs-none-epool-numa seqread-psync-multi 253(MiB/s) 63k
vtfs-none-epool-1T-numa seqread-psync-multi 349(MiB/s) 87k
vtfs-none-spool-numa seqread-psync-multi 301(MiB/s) 75k
vtfs-none-epool seqread-libaio 301(MiB/s) 75k
vtfs-none-epool-1T seqread-libaio 273(MiB/s) 68k
vtfs-none-spool seqread-libaio 334(MiB/s) 83k
vtfs-none-epool-numa seqread-libaio 315(MiB/s) 78k
vtfs-none-epool-1T-numa seqread-libaio 326(MiB/s) 81k
vtfs-none-spool-numa seqread-libaio 335(MiB/s) 83k
vtfs-none-epool seqread-libaio-multi 202(MiB/s) 50k
vtfs-none-epool-1T seqread-libaio-multi 308(MiB/s) 77k
vtfs-none-spool seqread-libaio-multi 247(MiB/s) 61k
vtfs-none-epool-numa seqread-libaio-multi 238(MiB/s) 59k
vtfs-none-epool-1T-numa seqread-libaio-multi 307(MiB/s) 76k
vtfs-none-spool-numa seqread-libaio-multi 269(MiB/s) 67k
vtfs-none-epool randread-psync 41(MiB/s) 10k
vtfs-none-epool-1T randread-psync 67(MiB/s) 16k
vtfs-none-spool randread-psync 64(MiB/s) 16k
vtfs-none-epool-numa randread-psync 48(MiB/s) 12k
vtfs-none-epool-1T-numa randread-psync 73(MiB/s) 18k
vtfs-none-spool-numa randread-psync 72(MiB/s) 18k
vtfs-none-epool randread-psync-multi 207(MiB/s) 51k
vtfs-none-epool-1T randread-psync-multi 313(MiB/s) 78k
vtfs-none-spool randread-psync-multi 265(MiB/s) 66k
vtfs-none-epool-numa randread-psync-multi 253(MiB/s) 63k
vtfs-none-epool-1T-numa randread-psync-multi 340(MiB/s) 85k
vtfs-none-spool-numa randread-psync-multi 305(MiB/s) 76k
vtfs-none-epool randread-libaio 305(MiB/s) 76k
vtfs-none-epool-1T randread-libaio 308(MiB/s) 77k
vtfs-none-spool randread-libaio 329(MiB/s) 82k
vtfs-none-epool-numa randread-libaio 310(MiB/s) 77k
vtfs-none-epool-1T-numa randread-libaio 328(MiB/s) 82k
vtfs-none-spool-numa randread-libaio 339(MiB/s) 84k
vtfs-none-epool randread-libaio-multi 265(MiB/s) 66k
vtfs-none-epool-1T randread-libaio-multi 267(MiB/s) 66k
vtfs-none-spool randread-libaio-multi 269(MiB/s) 67k
vtfs-none-epool-numa randread-libaio-multi 314(MiB/s) 78k
vtfs-none-epool-1T-numa randread-libaio-multi 319(MiB/s) 79k
vtfs-none-spool-numa randread-libaio-multi 318(MiB/s) 79k
vtfs-none-epool seqwrite-psync 36(MiB/s) 9224
vtfs-none-epool-1T seqwrite-psync 67(MiB/s) 16k
vtfs-none-spool seqwrite-psync 61(MiB/s) 15k
vtfs-none-epool-numa seqwrite-psync 44(MiB/s) 11k
vtfs-none-epool-1T-numa seqwrite-psync 69(MiB/s) 17k
vtfs-none-spool-numa seqwrite-psync 68(MiB/s) 17k
vtfs-none-epool seqwrite-psync-multi 193(MiB/s) 48k
vtfs-none-epool-1T seqwrite-psync-multi 299(MiB/s) 74k
vtfs-none-spool seqwrite-psync-multi 240(MiB/s) 60k
vtfs-none-epool-numa seqwrite-psync-multi 233(MiB/s) 58k
vtfs-none-epool-1T-numa seqwrite-psync-multi 358(MiB/s) 89k
vtfs-none-spool-numa seqwrite-psync-multi 285(MiB/s) 71k
vtfs-none-epool seqwrite-libaio 265(MiB/s) 66k
vtfs-none-epool-1T seqwrite-libaio 245(MiB/s) 61k
vtfs-none-spool seqwrite-libaio 312(MiB/s) 78k
vtfs-none-epool-numa seqwrite-libaio 295(MiB/s) 73k
vtfs-none-epool-1T-numa seqwrite-libaio 282(MiB/s) 70k
vtfs-none-spool-numa seqwrite-libaio 297(MiB/s) 74k
vtfs-none-epool seqwrite-libaio-multi 313(MiB/s) 78k
vtfs-none-epool-1T seqwrite-libaio-multi 299(MiB/s) 74k
vtfs-none-spool seqwrite-libaio-multi 315(MiB/s) 78k
vtfs-none-epool-numa seqwrite-libaio-multi 318(MiB/s) 79k
vtfs-none-epool-1T-numa seqwrite-libaio-multi 410(MiB/s) 102k
vtfs-none-spool-numa seqwrite-libaio-multi 378(MiB/s) 94k
vtfs-none-epool randwrite-psync 33(MiB/s) 8629
vtfs-none-epool-1T randwrite-psync 61(MiB/s) 15k
vtfs-none-spool randwrite-psync 63(MiB/s) 15k
vtfs-none-epool-numa randwrite-psync 49(MiB/s) 12k
vtfs-none-epool-1T-numa randwrite-psync 68(MiB/s) 17k
vtfs-none-spool-numa randwrite-psync 66(MiB/s) 16k
vtfs-none-epool randwrite-psync-multi 186(MiB/s) 46k
vtfs-none-epool-1T randwrite-psync-multi 300(MiB/s) 75k
vtfs-none-spool randwrite-psync-multi 233(MiB/s) 58k
vtfs-none-epool-numa randwrite-psync-multi 235(MiB/s) 58k
vtfs-none-epool-1T-numa randwrite-psync-multi 355(MiB/s) 88k
vtfs-none-spool-numa randwrite-psync-multi 266(MiB/s) 66k
vtfs-none-epool randwrite-libaio 289(MiB/s) 72k
vtfs-none-epool-1T randwrite-libaio 284(MiB/s) 71k
vtfs-none-spool randwrite-libaio 278(MiB/s) 69k
vtfs-none-epool-numa randwrite-libaio 292(MiB/s) 73k
vtfs-none-epool-1T-numa randwrite-libaio 294(MiB/s) 73k
vtfs-none-spool-numa randwrite-libaio 290(MiB/s) 72k
vtfs-none-epool randwrite-libaio-multi 317(MiB/s) 79k
vtfs-none-epool-1T randwrite-libaio-multi 323(MiB/s) 80k
vtfs-none-spool randwrite-libaio-multi 330(MiB/s) 82k
vtfs-none-epool-numa randwrite-libaio-multi 315(MiB/s) 78k
vtfs-none-epool-1T-numa randwrite-libaio-multi 409(MiB/s) 102k
vtfs-none-spool-numa randwrite-libaio-multi 384(MiB/s) 96k
* Re: [Virtio-fs] tools/virtiofs: Multi threading seems to hurt performance
@ 2020-09-22 22:56 ` Vivek Goyal
0 siblings, 0 replies; 107+ messages in thread
From: Vivek Goyal @ 2020-09-22 22:56 UTC (permalink / raw)
To: Dr. David Alan Gilbert; +Cc: virtio-fs-list, qemu-devel, Miklos Szeredi
On Tue, Sep 22, 2020 at 12:09:46PM +0100, Dr. David Alan Gilbert wrote:
>
> Do you have the numbers for:
> epool
> epool thread-pool-size=1
> spool
Hi David,
Ok, I re-ran my numbers after upgrading to the latest qemu and also
upgrading the host kernel to the latest upstream. Apart from comparing epool,
spool and 1Thread, I also ran their numa variants. That is, I launched
qemu and virtiofsd on node 0 of the machine (numactl --cpunodebind=0).
Results are kind of mixed. Here are my takeaways.
- Running on same numa node improves performance overall for exclusive,
shared and exclusive-1T mode.
- In general both shared pool and exclusive-1T mode seem to perform
better than exclusive mode, except for the case of randwrite-libaio.
In some cases (seqread-libaio, seqwrite-libaio, seqwrite-libaio-multi)
exclusive pool performs better than exclusive-1T.
- Looks like in some cases exclusive-1T performs better than shared
pool. (randwrite-libaio, randwrite-psync-multi, seqwrite-psync-multi,
seqwrite-psync, seqread-libaio-multi, seqread-psync-multi)
Overall, I feel that both exclusive-1T and shared perform better than
exclusive pool. Results between exclusive-1T and shared pool are mixed.
It seems like in many cases exclusive-1T performs better. I would say
that moving to "shared" pool seems like a reasonable option.
Thanks
Vivek
NAME WORKLOAD Bandwidth IOPS
vtfs-none-epool seqread-psync 38(MiB/s) 9967
vtfs-none-epool-1T seqread-psync 66(MiB/s) 16k
vtfs-none-spool seqread-psync 67(MiB/s) 16k
vtfs-none-epool-numa seqread-psync 48(MiB/s) 12k
vtfs-none-epool-1T-numa seqread-psync 74(MiB/s) 18k
vtfs-none-spool-numa seqread-psync 74(MiB/s) 18k
vtfs-none-epool seqread-psync-multi 204(MiB/s) 51k
vtfs-none-epool-1T seqread-psync-multi 325(MiB/s) 81k
vtfs-none-spool seqread-psync-multi 271(MiB/s) 67k
vtfs-none-epool-numa seqread-psync-multi 253(MiB/s) 63k
vtfs-none-epool-1T-numa seqread-psync-multi 349(MiB/s) 87k
vtfs-none-spool-numa seqread-psync-multi 301(MiB/s) 75k
vtfs-none-epool seqread-libaio 301(MiB/s) 75k
vtfs-none-epool-1T seqread-libaio 273(MiB/s) 68k
vtfs-none-spool seqread-libaio 334(MiB/s) 83k
vtfs-none-epool-numa seqread-libaio 315(MiB/s) 78k
vtfs-none-epool-1T-numa seqread-libaio 326(MiB/s) 81k
vtfs-none-spool-numa seqread-libaio 335(MiB/s) 83k
vtfs-none-epool seqread-libaio-multi 202(MiB/s) 50k
vtfs-none-epool-1T seqread-libaio-multi 308(MiB/s) 77k
vtfs-none-spool seqread-libaio-multi 247(MiB/s) 61k
vtfs-none-epool-numa seqread-libaio-multi 238(MiB/s) 59k
vtfs-none-epool-1T-numa seqread-libaio-multi 307(MiB/s) 76k
vtfs-none-spool-numa seqread-libaio-multi 269(MiB/s) 67k
vtfs-none-epool randread-psync 41(MiB/s) 10k
vtfs-none-epool-1T randread-psync 67(MiB/s) 16k
vtfs-none-spool randread-psync 64(MiB/s) 16k
vtfs-none-epool-numa randread-psync 48(MiB/s) 12k
vtfs-none-epool-1T-numa randread-psync 73(MiB/s) 18k
vtfs-none-spool-numa randread-psync 72(MiB/s) 18k
vtfs-none-epool randread-psync-multi 207(MiB/s) 51k
vtfs-none-epool-1T randread-psync-multi 313(MiB/s) 78k
vtfs-none-spool randread-psync-multi 265(MiB/s) 66k
vtfs-none-epool-numa randread-psync-multi 253(MiB/s) 63k
vtfs-none-epool-1T-numa randread-psync-multi 340(MiB/s) 85k
vtfs-none-spool-numa randread-psync-multi 305(MiB/s) 76k
vtfs-none-epool randread-libaio 305(MiB/s) 76k
vtfs-none-epool-1T randread-libaio 308(MiB/s) 77k
vtfs-none-spool randread-libaio 329(MiB/s) 82k
vtfs-none-epool-numa randread-libaio 310(MiB/s) 77k
vtfs-none-epool-1T-numa randread-libaio 328(MiB/s) 82k
vtfs-none-spool-numa randread-libaio 339(MiB/s) 84k
vtfs-none-epool randread-libaio-multi 265(MiB/s) 66k
vtfs-none-epool-1T randread-libaio-multi 267(MiB/s) 66k
vtfs-none-spool randread-libaio-multi 269(MiB/s) 67k
vtfs-none-epool-numa randread-libaio-multi 314(MiB/s) 78k
vtfs-none-epool-1T-numa randread-libaio-multi 319(MiB/s) 79k
vtfs-none-spool-numa randread-libaio-multi 318(MiB/s) 79k
vtfs-none-epool seqwrite-psync 36(MiB/s) 9224
vtfs-none-epool-1T seqwrite-psync 67(MiB/s) 16k
vtfs-none-spool seqwrite-psync 61(MiB/s) 15k
vtfs-none-epool-numa seqwrite-psync 44(MiB/s) 11k
vtfs-none-epool-1T-numa seqwrite-psync 69(MiB/s) 17k
vtfs-none-spool-numa seqwrite-psync 68(MiB/s) 17k
vtfs-none-epool seqwrite-psync-multi 193(MiB/s) 48k
vtfs-none-epool-1T seqwrite-psync-multi 299(MiB/s) 74k
vtfs-none-spool seqwrite-psync-multi 240(MiB/s) 60k
vtfs-none-epool-numa seqwrite-psync-multi 233(MiB/s) 58k
vtfs-none-epool-1T-numa seqwrite-psync-multi 358(MiB/s) 89k
vtfs-none-spool-numa seqwrite-psync-multi 285(MiB/s) 71k
vtfs-none-epool seqwrite-libaio 265(MiB/s) 66k
vtfs-none-epool-1T seqwrite-libaio 245(MiB/s) 61k
vtfs-none-spool seqwrite-libaio 312(MiB/s) 78k
vtfs-none-epool-numa seqwrite-libaio 295(MiB/s) 73k
vtfs-none-epool-1T-numa seqwrite-libaio 282(MiB/s) 70k
vtfs-none-spool-numa seqwrite-libaio 297(MiB/s) 74k
vtfs-none-epool seqwrite-libaio-multi 313(MiB/s) 78k
vtfs-none-epool-1T seqwrite-libaio-multi 299(MiB/s) 74k
vtfs-none-spool seqwrite-libaio-multi 315(MiB/s) 78k
vtfs-none-epool-numa seqwrite-libaio-multi 318(MiB/s) 79k
vtfs-none-epool-1T-numa seqwrite-libaio-multi 410(MiB/s) 102k
vtfs-none-spool-numa seqwrite-libaio-multi 378(MiB/s) 94k
vtfs-none-epool randwrite-psync 33(MiB/s) 8629
vtfs-none-epool-1T randwrite-psync 61(MiB/s) 15k
vtfs-none-spool randwrite-psync 63(MiB/s) 15k
vtfs-none-epool-numa randwrite-psync 49(MiB/s) 12k
vtfs-none-epool-1T-numa randwrite-psync 68(MiB/s) 17k
vtfs-none-spool-numa randwrite-psync 66(MiB/s) 16k
vtfs-none-epool randwrite-psync-multi 186(MiB/s) 46k
vtfs-none-epool-1T randwrite-psync-multi 300(MiB/s) 75k
vtfs-none-spool randwrite-psync-multi 233(MiB/s) 58k
vtfs-none-epool-numa randwrite-psync-multi 235(MiB/s) 58k
vtfs-none-epool-1T-numa randwrite-psync-multi 355(MiB/s) 88k
vtfs-none-spool-numa randwrite-psync-multi 266(MiB/s) 66k
vtfs-none-epool randwrite-libaio 289(MiB/s) 72k
vtfs-none-epool-1T randwrite-libaio 284(MiB/s) 71k
vtfs-none-spool randwrite-libaio 278(MiB/s) 69k
vtfs-none-epool-numa randwrite-libaio 292(MiB/s) 73k
vtfs-none-epool-1T-numa randwrite-libaio 294(MiB/s) 73k
vtfs-none-spool-numa randwrite-libaio 290(MiB/s) 72k
vtfs-none-epool randwrite-libaio-multi 317(MiB/s) 79k
vtfs-none-epool-1T randwrite-libaio-multi 323(MiB/s) 80k
vtfs-none-spool randwrite-libaio-multi 330(MiB/s) 82k
vtfs-none-epool-numa randwrite-libaio-multi 315(MiB/s) 78k
vtfs-none-epool-1T-numa randwrite-libaio-multi 409(MiB/s) 102k
vtfs-none-spool-numa randwrite-libaio-multi 384(MiB/s) 96k
* Re: [Virtio-fs] tools/virtiofs: Multi threading seems to hurt performance
2020-09-18 21:34 ` [Virtio-fs] " Vivek Goyal
@ 2020-09-23 12:50 ` Chirantan Ekbote
2020-09-23 12:59 ` Vivek Goyal
2020-09-25 11:35 ` Dr. David Alan Gilbert
-1 siblings, 2 replies; 107+ messages in thread
From: Chirantan Ekbote @ 2020-09-23 12:50 UTC (permalink / raw)
To: Vivek Goyal; +Cc: virtio-fs-list, qemu-devel
On Sat, Sep 19, 2020 at 6:36 AM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> Hi All,
>
> virtiofsd default thread pool size is 64. To me it feels that in most of
> the cases thread pool size 1 performs better than thread pool size 64.
>
> I ran virtiofs-tests.
>
> https://github.com/rhvgoyal/virtiofs-tests
>
> And here are the comparison results. To me it seems that by default
> we should switch to 1 thread (Till we can figure out how to make
> multi thread performance better even when single process is doing
> I/O in client).
>
FWIW, we've observed the same behavior in crosvm. Using a thread pool
for the virtiofs server consistently gave us worse performance than
using a single thread.
Chirantan
* Re: [Virtio-fs] tools/virtiofs: Multi threading seems to hurt performance
2020-09-23 12:50 ` Chirantan Ekbote
@ 2020-09-23 12:59 ` Vivek Goyal
2020-09-25 11:35 ` Dr. David Alan Gilbert
1 sibling, 0 replies; 107+ messages in thread
From: Vivek Goyal @ 2020-09-23 12:59 UTC (permalink / raw)
To: Chirantan Ekbote; +Cc: virtio-fs-list, qemu-devel
On Wed, Sep 23, 2020 at 09:50:59PM +0900, Chirantan Ekbote wrote:
> On Sat, Sep 19, 2020 at 6:36 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > Hi All,
> >
> > virtiofsd default thread pool size is 64. To me it feels that in most of
> > the cases thread pool size 1 performs better than thread pool size 64.
> >
> > I ran virtiofs-tests.
> >
> > https://github.com/rhvgoyal/virtiofs-tests
> >
> > And here are the comparison results. To me it seems that by default
> > we should switch to 1 thread (Till we can figure out how to make
> > multi thread performance better even when single process is doing
> > I/O in client).
> >
>
> FWIW, we've observed the same behavior in crosvm. Using a thread pool
> for the virtiofs server consistently gave us worse performance than
> using a single thread.
Thanks for sharing this information, Chirantan. Shared pool seems to
perform better than exclusive pool. Single thread vs shared pool is
a mixed result, but it looks like one thread beats shared pool
results in many of the tests.
Maybe we will have to switch to single thread as the default at some
point if shared pool does not live up to expectations.
Vivek
* Re: tools/virtiofs: Multi threading seems to hurt performance
2020-09-22 17:47 ` [Virtio-fs] " Vivek Goyal
@ 2020-09-24 21:33 ` Venegas Munoz, Jose Carlos
-1 siblings, 0 replies; 107+ messages in thread
From: Venegas Munoz, Jose Carlos @ 2020-09-24 21:33 UTC (permalink / raw)
To: Vivek Goyal, Dr. David Alan Gilbert
Cc: virtio-fs-list, Shinde, Archana M, qemu-devel, Stefan Hajnoczi, cdupontd
[-- Attachment #1: Type: text/plain, Size: 4115 bytes --]
Hi Folks,
Sorry for the delay in explaining how to reproduce the `fio` data.
I have some code to automate testing for multiple kata configs and collect info like:
- Kata-env, kata configuration.toml, qemu command, virtiofsd command.
See:
https://github.com/jcvenegas/mrunner/
Last time we agreed to narrow the cases and configs to compare virtiofs and 9pfs.
The configs were the following:
- qemu + virtiofs(cache=auto, dax=0) a.k.a. `kata-qemu-virtiofs` WITHOUT xattr
- qemu + 9pfs a.k.a. `kata-qemu`
Please take a look at the html and raw results attached to this mail.
## Can I say that the current status is:
- As David's tests and Vivek's points suggest, for the fio workload you are using it seems that the best candidate should be cache=none.
- In the comparison I took cache=auto as Vivek suggested; this makes sense as it seems that will be the default for kata.
- Even if for this case cache=none works better, can I assume that cache=auto dax=0 will be better than any 9pfs config? (once we find the root cause)
- Vivek is taking a look at mmap mode in 9pfs, to see how different it is from the current virtiofs implementation. In 9pfs for kata, this is what we use by default.
## I'd like to identify what should be next in the debug/testing?
- Should I try to narrow it down by testing only with qemu?
- Should I try first with a new patch you already have?
- Probably try with qemu without a static build?
- Do the same test with thread-pool-size=1?
Please let me know how I can help.
Cheers.
On 22/09/20 12:47, "Vivek Goyal" <vgoyal@redhat.com> wrote:
On Tue, Sep 22, 2020 at 11:25:31AM +0100, Dr. David Alan Gilbert wrote:
> * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> > Hi,
> > I've been doing some of my own perf tests and I think I agree
> > about the thread pool size; my test is a kernel build
> > and I've tried a bunch of different options.
> >
> > My config:
> > Host: 16 core AMD EPYC (32 thread), 128G RAM,
> > 5.9.0-rc4 kernel, rhel 8.2ish userspace.
> > 5.1.0 qemu/virtiofsd built from git.
> > Guest: Fedora 32 from cloud image with just enough extra installed for
> > a kernel build.
> >
> > git cloned and checked out v5.8 of Linux into /dev/shm/linux on the host
> > fresh before each test. Then log into the guest, make defconfig,
> > time make -j 16 bzImage, make clean; time make -j 16 bzImage
> > The numbers below are the 'real' time in the guest from the initial make
> > (the subsequent makes don't vary much)
> >
> > Below are the details of what each of these means, but here are the
> > numbers first
> >
> > virtiofsdefault 4m0.978s
> > 9pdefault 9m41.660s
> > virtiofscache=none 10m29.700s
> > 9pmmappass 9m30.047s
> > 9pmbigmsize 12m4.208s
> > 9pmsecnone 9m21.363s
> > virtiofscache=noneT1 7m17.494s
> > virtiofsdefaultT1 3m43.326s
> >
> > So the winner there by far is the 'virtiofsdefaultT1' - that's
> > the default virtiofs settings, but with --thread-pool-size=1 - so
> > yes it gives a small benefit.
> > But interestingly the cache=none virtiofs performance is pretty bad,
> > but thread-pool-size=1 on that makes a BIG improvement.
>
> Here are fio runs that Vivek asked me to run in my same environment
> (there are some 0's in some of the mmap cases, and I've not investigated
> why yet).
cache=none does not allow mmap in the case of virtiofs. That's why you
are seeing 0s.
> virtiofs is looking good here in I think all of the cases;
> there's some division over which config; cache=none
> seems faster in some cases which surprises me.
I know cache=none is faster for write workloads. It forces direct
writes, which skip file_remove_privs(), while cache=auto goes through
file_remove_privs(), and that adds a GETXATTR request to every WRITE
request.
Vivek
[-- Attachment #2: results.tar.gz --]
[-- Type: application/x-gzip, Size: 18156 bytes --]
[-- Attachment #3: vitiofs 9pfs fio comparsion.html --]
[-- Type: text/html, Size: 29758 bytes --]
* Re: [Virtio-fs] tools/virtiofs: Multi threading seems to hurt performance
@ 2020-09-24 21:33 ` Venegas Munoz, Jose Carlos
0 siblings, 0 replies; 107+ messages in thread
From: Venegas Munoz, Jose Carlos @ 2020-09-24 21:33 UTC (permalink / raw)
To: Vivek Goyal, Dr. David Alan Gilbert
Cc: virtio-fs-list, Shinde, Archana M, qemu-devel, cdupontd
[-- Attachment #1: Type: text/plain, Size: 4115 bytes --]
Hi Folks,
Sorry for the delay in explaining how to reproduce the `fio` data.
I have some code to automate testing for multiple kata configs and collect info like:
- Kata-env, kata configuration.toml, qemu command, virtiofsd command.
See:
https://github.com/jcvenegas/mrunner/
Last time we agreed to narrow the cases and configs to compare virtiofs and 9pfs.
The configs were the following:
- qemu + virtiofs(cache=auto, dax=0) a.k.a. `kata-qemu-virtiofs` WITHOUT xattr
- qemu + 9pfs a.k.a. `kata-qemu`
Please take a look at the html and raw results attached to this mail.
## Can I say that the current status is:
- As David's tests and Vivek's points suggest, for the fio workload you are using it seems that the best candidate should be cache=none.
- In the comparison I took cache=auto as Vivek suggested; this makes sense as it seems that will be the default for kata.
- Even if for this case cache=none works better, can I assume that cache=auto dax=0 will be better than any 9pfs config? (once we find the root cause)
- Vivek is taking a look at mmap mode in 9pfs, to see how different it is from the current virtiofs implementation. In 9pfs for kata, this is what we use by default.
## I'd like to identify what should be next in the debug/testing?
- Should I try to narrow it down by testing only with qemu?
- Should I try first with a new patch you already have?
- Probably try with qemu without a static build?
- Do the same test with thread-pool-size=1?
Please let me know how I can help.
Cheers.
On 22/09/20 12:47, "Vivek Goyal" <vgoyal@redhat.com> wrote:
On Tue, Sep 22, 2020 at 11:25:31AM +0100, Dr. David Alan Gilbert wrote:
> * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> > Hi,
> > I've been doing some of my own perf tests and I think I agree
> > about the thread pool size; my test is a kernel build
> > and I've tried a bunch of different options.
> >
> > My config:
> > Host: 16 core AMD EPYC (32 thread), 128G RAM,
> > 5.9.0-rc4 kernel, rhel 8.2ish userspace.
> > 5.1.0 qemu/virtiofsd built from git.
> > Guest: Fedora 32 from cloud image with just enough extra installed for
> > a kernel build.
> >
> > git cloned and checked out v5.8 of Linux into /dev/shm/linux on the host
> > fresh before each test. Then log into the guest, make defconfig,
> > time make -j 16 bzImage, make clean; time make -j 16 bzImage
> > The numbers below are the 'real' time in the guest from the initial make
> > (the subsequent makes don't vary much)
> >
> > Below are the details of what each of these means, but here are the
> > numbers first
> >
> > virtiofsdefault 4m0.978s
> > 9pdefault 9m41.660s
> > virtiofscache=none 10m29.700s
> > 9pmmappass 9m30.047s
> > 9pmbigmsize 12m4.208s
> > 9pmsecnone 9m21.363s
> > virtiofscache=noneT1 7m17.494s
> > virtiofsdefaultT1 3m43.326s
> >
> > So the winner there by far is the 'virtiofsdefaultT1' - that's
> > the default virtiofs settings, but with --thread-pool-size=1 - so
> > yes it gives a small benefit.
> > But interestingly the cache=none virtiofs performance is pretty bad,
> > but thread-pool-size=1 on that makes a BIG improvement.
>
> Here are fio runs that Vivek asked me to run in my same environment
> (there are some 0's in some of the mmap cases, and I've not investigated
> why yet).
cache=none does not allow mmap in the case of virtiofs. That's why you
are seeing 0s.
> virtiofs is looking good here in I think all of the cases;
> there's some division over which config; cache=none
> seems faster in some cases which surprises me.
I know cache=none is faster for write workloads. It forces direct
writes, which skip file_remove_privs(), while cache=auto goes through
file_remove_privs(), and that adds a GETXATTR request to every WRITE
request.
Vivek
[-- Attachment #2: results.tar.gz --]
[-- Type: application/x-gzip, Size: 18156 bytes --]
[-- Attachment #3: vitiofs 9pfs fio comparsion.html --]
[-- Type: text/html, Size: 29758 bytes --]
* virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance)
2020-09-24 21:33 ` [Virtio-fs] " Venegas Munoz, Jose Carlos
@ 2020-09-24 22:10 ` Vivek Goyal
-1 siblings, 0 replies; 107+ messages in thread
From: Vivek Goyal @ 2020-09-24 22:10 UTC (permalink / raw)
To: Venegas Munoz, Jose Carlos
Cc: qemu-devel, cdupontd, Dr. David Alan Gilbert, virtio-fs-list,
Stefan Hajnoczi, Shinde, Archana M
On Thu, Sep 24, 2020 at 09:33:01PM +0000, Venegas Munoz, Jose Carlos wrote:
> Hi Folks,
>
> Sorry for the delay about how to reproduce `fio` data.
>
> I have some code to automate testing for multiple kata configs and collect info like:
> - Kata-env, kata configuration.toml, qemu command, virtiofsd command.
>
> See:
> https://github.com/jcvenegas/mrunner/
>
>
> Last time we agreed to narrow the cases and configs to compare virtiofs and 9pfs
>
> The configs were the following:
>
> - qemu + virtiofs(cache=auto, dax=0) a.k.a. `kata-qemu-virtiofs` WITHOUT xattr
> - qemu + 9pfs a.k.a. `kata-qemu`
>
> Please take a look to the html and raw results I attach in this mail.
Hi Carlos,
So you are running the following test.
fio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75 --output=/output/fio.txt
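The command above maps onto a fio job file as follows (a sketch: the `[test]` section name is arbitrary and `filename` simply mirrors the command line):

```ini
[test]
filename=random_read_write.fio
direct=1
gtod_reduce=1
bs=4k
iodepth=64
size=4G
readwrite=randrw
rwmixread=75
```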
And the following are your results.
9p
--
READ: bw=211MiB/s (222MB/s), 211MiB/s-211MiB/s (222MB/s-222MB/s), io=3070MiB (3219MB), run=14532-14532msec
WRITE: bw=70.6MiB/s (74.0MB/s), 70.6MiB/s-70.6MiB/s (74.0MB/s-74.0MB/s), io=1026MiB (1076MB), run=14532-14532msec
virtiofs
--------
Run status group 0 (all jobs):
READ: bw=159MiB/s (167MB/s), 159MiB/s-159MiB/s (167MB/s-167MB/s), io=3070MiB (3219MB), run=19321-19321msec
WRITE: bw=53.1MiB/s (55.7MB/s), 53.1MiB/s-53.1MiB/s (55.7MB/s-55.7MB/s), io=1026MiB (1076MB), run=19321-19321msec
So it looks like you are getting better performance with 9p in this case.
Can you apply the "shared pool" patch to qemu's virtiofsd, re-run this
test, and see if you get any better results?
In my testing, with cache=none, virtiofs performed better than 9p in
all the fio jobs I was running. For the case of cache=auto for virtiofs
(with xattr enabled), 9p performed better in certain write workloads. I
have identified the root cause of that problem and am working on
HANDLE_KILLPRIV_V2 patches to improve the WRITE performance of virtiofs
with cache=auto and xattr enabled.
I will post my 9p and virtiofs comparison numbers next week. In the
meantime it would be great if you could apply the following qemu patch,
rebuild qemu, and re-run the above test.
https://www.redhat.com/archives/virtio-fs/2020-September/msg00081.html
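A minimal dry-run sketch of the requested steps; the patch file name and source directory are placeholders (the patch itself comes from the list link above):

```shell
#!/bin/sh
# Dry-run sketch: apply the "shared pool" virtiofsd patch and rebuild qemu.
# QEMU_SRC and shared-pool.patch are assumed names, not taken from the thread.
set -eu
QEMU_SRC="${QEMU_SRC:-$HOME/src/qemu}"
STEPS="git am shared-pool.patch && ./configure --target-list=x86_64-softmmu && make -j4"
printf 'cd %s && %s\n' "$QEMU_SRC" "$STEPS"
```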
Also, what is the state of the file cache on the host in both cases? Are
you booting the host fresh for these tests so that the cache is cold, or
is the cache warm?
Thanks
Vivek
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: virtiofs vs 9p performance
2020-09-24 22:10 ` [Virtio-fs] " Vivek Goyal
@ 2020-09-25 8:06 ` Christian Schoenebeck
0 siblings, 0 replies; 107+ messages in thread
From: Christian Schoenebeck @ 2020-09-25 8:06 UTC (permalink / raw)
To: qemu-devel
Cc: Vivek Goyal, Venegas Munoz, Jose Carlos, cdupontd,
Dr. David Alan Gilbert, virtio-fs-list, Stefan Hajnoczi, Shinde,
Archana M, Greg Kurz
On Friday, 25 September 2020 00:10:23 CEST Vivek Goyal wrote:
> In my testing, with cache=none, virtiofs performed better than 9p in
> all the fio jobs I was running. For the case of cache=auto for virtiofs
> (with xattr enabled), 9p performed better in certain write workloads. I
> have identified root cause of that problem and working on
> HANDLE_KILLPRIV_V2 patches to improve WRITE performance of virtiofs
> with cache=auto and xattr enabled.
Please note, when it comes to performance aspects, you should set a reasonably
high value for 'msize' on the 9p client side:
https://wiki.qemu.org/Documentation/9psetup#msize
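For reference, a mount line with a large msize might look like this; the options mirror the ones Dave uses later in this thread, and the share tag `kernel` and mount point `/mnt` are examples that may differ in your setup:

```shell
#!/bin/sh
# Build a 9p mount command with a 1 MiB msize instead of the small default.
# Run the echoed command as root inside the guest; tag/mountpoint are examples.
MSIZE=1048576
OPTS="trans=virtio,version=9p2000.L,cache=mmap,msize=${MSIZE}"
echo "mount -t 9p -o ${OPTS} kernel /mnt"
```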
I'm also working on performance optimizations for 9p, BTW. There is plenty of
headroom, to put it mildly. For QEMU 5.2 I started by addressing readdir
requests:
https://wiki.qemu.org/ChangeLog/5.2#9pfs
Best regards,
Christian Schoenebeck
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [Virtio-fs] tools/virtiofs: Multi threading seems to hurt performance
2020-09-23 12:50 ` Chirantan Ekbote
2020-09-23 12:59 ` Vivek Goyal
@ 2020-09-25 11:35 ` Dr. David Alan Gilbert
1 sibling, 0 replies; 107+ messages in thread
From: Dr. David Alan Gilbert @ 2020-09-25 11:35 UTC (permalink / raw)
To: Chirantan Ekbote; +Cc: virtio-fs-list, qemu-devel, Vivek Goyal
* Chirantan Ekbote (chirantan@chromium.org) wrote:
> On Sat, Sep 19, 2020 at 6:36 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > Hi All,
> >
> > virtiofsd default thread pool size is 64. To me it feels that in most of
> > the cases thread pool size 1 performs better than thread pool size 64.
> >
> > I ran virtiofs-tests.
> >
> > https://github.com/rhvgoyal/virtiofs-tests
> >
> > And here are the comparision results. To me it seems that by default
> > we should switch to 1 thread (Till we can figure out how to make
> > multi thread performance better even when single process is doing
> > I/O in client).
> >
>
> FWIW, we've observed the same behavior in crosvm. Using a thread pool
> for the virtiofs server consistently gave us worse performance than
> using a single thread.
Interesting; so it's not just us doing something silly!
It does feel like you *should* be able to get some benefit from multiple
threads, so I guess some more investigation is needed at some point.
Dave
> Chirantan
>
> _______________________________________________
> Virtio-fs mailing list
> Virtio-fs@redhat.com
> https://www.redhat.com/mailman/listinfo/virtio-fs
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
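For anyone reproducing this, the single-thread configuration under discussion is selected on the virtiofsd command line; a sketch, where the socket path and shared directory are assumptions:

```shell
#!/bin/sh
# Print virtiofsd invocations for the default pool size (64) and a single
# thread. --thread-pool-size is the knob discussed above; paths are placeholders.
for threads in 64 1; do
    echo "virtiofsd --socket-path=/tmp/vhostqemu -o source=/srv/share -o cache=auto --thread-pool-size=${threads}"
done
```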
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: tools/virtiofs: Multi threading seems to hurt performance
2020-09-22 17:47 ` [Virtio-fs] " Vivek Goyal
@ 2020-09-25 12:11 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 107+ messages in thread
From: Dr. David Alan Gilbert @ 2020-09-25 12:11 UTC (permalink / raw)
To: Vivek Goyal
Cc: jose.carlos.venegas.munoz, qemu-devel, cdupontd, virtio-fs-list,
Stefan Hajnoczi, archana.m.shinde
* Vivek Goyal (vgoyal@redhat.com) wrote:
> On Tue, Sep 22, 2020 at 11:25:31AM +0100, Dr. David Alan Gilbert wrote:
> > * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> > > Hi,
> > > I've been doing some of my own perf tests and I think I agree
> > > about the thread pool size; my test is a kernel build
> > > and I've tried a bunch of different options.
> > >
> > > My config:
> > > Host: 16 core AMD EPYC (32 thread), 128G RAM,
> > > 5.9.0-rc4 kernel, rhel 8.2ish userspace.
> > > 5.1.0 qemu/virtiofsd built from git.
> > > Guest: Fedora 32 from cloud image with just enough extra installed for
> > > a kernel build.
> > >
> > > git cloned and checked out v5.8 of Linux into /dev/shm/linux on the host
> > > fresh before each test. Then log into the guest, make defconfig,
> > > time make -j 16 bzImage, make clean; time make -j 16 bzImage
> > > The numbers below are the 'real' time in the guest from the initial make
> > > (the subsequent makes don't vary much)
> > >
> > > Below are the details of what each of these means, but here are the
> > > numbers first
> > >
> > > virtiofsdefault 4m0.978s
> > > 9pdefault 9m41.660s
> > > virtiofscache=none 10m29.700s
> > > 9pmmappass 9m30.047s
> > > 9pmbigmsize 12m4.208s
> > > 9pmsecnone 9m21.363s
> > > virtiofscache=noneT1 7m17.494s
> > > virtiofsdefaultT1 3m43.326s
> > >
> > > So the winner there by far is the 'virtiofsdefaultT1' - that's
> > > the default virtiofs settings, but with --thread-pool-size=1 - so
> > > yes it gives a small benefit.
> > > But interestingly the cache=none virtiofs performance is pretty bad,
> > > but thread-pool-size=1 on that makes a BIG improvement.
> >
> > Here are fio runs that Vivek asked me to run in my same environment
> > (there are some 0's in some of the mmap cases, and I've not investigated
> > why yet).
>
> cache=none does not allow mmap in the case of virtiofs. That's why you
> are seeing 0s.
>
> > virtiofs is looking good here in, I think, all of the cases;
> > there's some division over which config; cache=none
> > seems faster in some cases, which surprises me.
>
> I know cache=none is faster in the case of write workloads. It forces
> direct writes, where we don't call file_remove_privs(), while cache=auto
> goes through file_remove_privs(), and that adds a GETXATTR request to
> every WRITE request.
Can you point me to how cache=auto causes the file_remove_privs?
Dave
> Vivek
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
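The extra round trip Vivek describes can be modeled crudely: with cache=auto every WRITE is preceded by a GETXATTR from the file_remove_privs() path, so a small-write workload pays two request round trips instead of one. The latency figure below is an illustrative assumption, not a measurement from this thread:

```shell
# Toy model: requests per WRITE is 1 for cache=none (direct write) and 2
# for cache=auto (GETXATTR + WRITE). rtt_us is an assumed round-trip time.
awk 'BEGIN {
    rtt_us = 50;
    printf "cache=none : %d req/write, ~%.0f writes/s\n", 1, 1e6 / rtt_us;
    printf "cache=auto : %d req/write, ~%.0f writes/s\n", 2, 1e6 / (2 * rtt_us);
}'
```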
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance)
2020-09-24 22:10 ` [Virtio-fs] " Vivek Goyal
@ 2020-09-25 12:41 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 107+ messages in thread
From: Dr. David Alan Gilbert @ 2020-09-25 12:41 UTC (permalink / raw)
To: Vivek Goyal
Cc: Venegas Munoz, Jose Carlos, qemu-devel, cdupontd, virtio-fs-list,
Stefan Hajnoczi, Shinde, Archana M
* Vivek Goyal (vgoyal@redhat.com) wrote:
> On Thu, Sep 24, 2020 at 09:33:01PM +0000, Venegas Munoz, Jose Carlos wrote:
> > Hi Folks,
> >
> > Sorry for the delay about how to reproduce `fio` data.
> >
> > I have some code to automate testing for multiple kata configs and collect info like:
> > - Kata-env, kata configuration.toml, qemu command, virtiofsd command.
> >
> > See:
> > https://github.com/jcvenegas/mrunner/
> >
> >
> > Last time we agreed to narrow the cases and configs to compare virtiofs and 9pfs
> >
> > The configs where the following:
> >
> > - qemu + virtiofs(cache=auto, dax=0) a.ka. `kata-qemu-virtiofs` WITOUT xattr
> > - qemu + 9pfs a.k.a `kata-qemu`
> >
> > Please take a look to the html and raw results I attach in this mail.
>
> Hi Carlos,
>
> So you are running following test.
>
> fio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75 --output=/output/fio.txt
>
> And following are your results.
>
> 9p
> --
> READ: bw=211MiB/s (222MB/s), 211MiB/s-211MiB/s (222MB/s-222MB/s), io=3070MiB (3219MB), run=14532-14532msec
>
> WRITE: bw=70.6MiB/s (74.0MB/s), 70.6MiB/s-70.6MiB/s (74.0MB/s-74.0MB/s), io=1026MiB (1076MB), run=14532-14532msec
>
> virtiofs
> --------
> Run status group 0 (all jobs):
> READ: bw=159MiB/s (167MB/s), 159MiB/s-159MiB/s (167MB/s-167MB/s), io=3070MiB (3219MB), run=19321-19321msec
> WRITE: bw=53.1MiB/s (55.7MB/s), 53.1MiB/s-53.1MiB/s (55.7MB/s-55.7MB/s), io=1026MiB (1076MB), run=19321-19321msec
>
> So looks like you are getting better performance with 9p in this case.
That's interesting, because I've just tried something similar again with my
ramdisk setup:
fio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75 --output=aname.txt
virtiofs default options
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=64
fio-3.21
Starting 1 process
test: Laying out IO file (1 file / 4096MiB)
test: (groupid=0, jobs=1): err= 0: pid=773: Fri Sep 25 12:28:32 2020
read: IOPS=18.3k, BW=71.3MiB/s (74.8MB/s)(3070MiB/43042msec)
bw ( KiB/s): min=70752, max=77280, per=100.00%, avg=73075.71, stdev=1603.47, samples=85
iops : min=17688, max=19320, avg=18268.92, stdev=400.86, samples=85
write: IOPS=6102, BW=23.8MiB/s (24.0MB/s)(1026MiB/43042msec); 0 zone resets
bw ( KiB/s): min=23128, max=25696, per=100.00%, avg=24420.40, stdev=583.08, samples=85
iops : min= 5782, max= 6424, avg=6105.09, stdev=145.76, samples=85
cpu : usr=0.10%, sys=30.09%, ctx=1245312, majf=0, minf=6
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=71.3MiB/s (74.8MB/s), 71.3MiB/s-71.3MiB/s (74.8MB/s-74.8MB/s), io=3070MiB (3219MB), run=43042-43042msec
WRITE: bw=23.8MiB/s (24.0MB/s), 23.8MiB/s-23.8MiB/s (24.0MB/s-24.0MB/s), io=1026MiB (1076MB), run=43042-43042msec
virtiofs cache=none
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=64
fio-3.21
Starting 1 process
test: (groupid=0, jobs=1): err= 0: pid=740: Fri Sep 25 12:30:57 2020
read: IOPS=22.9k, BW=89.6MiB/s (93.0MB/s)(3070MiB/34256msec)
bw ( KiB/s): min=89048, max=94240, per=100.00%, avg=91871.06, stdev=967.87, samples=68
iops : min=22262, max=23560, avg=22967.76, stdev=241.97, samples=68
write: IOPS=7667, BW=29.0MiB/s (31.4MB/s)(1026MiB/34256msec); 0 zone resets
bw ( KiB/s): min=29264, max=32248, per=100.00%, avg=30700.82, stdev=541.97, samples=68
iops : min= 7316, max= 8062, avg=7675.21, stdev=135.49, samples=68
cpu : usr=1.03%, sys=27.64%, ctx=1048635, majf=0, minf=5
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=89.6MiB/s (93.0MB/s), 89.6MiB/s-89.6MiB/s (93.0MB/s-93.0MB/s), io=3070MiB (3219MB), run=34256-34256msec
WRITE: bw=29.0MiB/s (31.4MB/s), 29.0MiB/s-29.0MiB/s (31.4MB/s-31.4MB/s), io=1026MiB (1076MB), run=34256-34256msec
virtiofs cache=none thread-pool-size=1
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=64
fio-3.21
Starting 1 process
test: (groupid=0, jobs=1): err= 0: pid=738: Fri Sep 25 12:33:17 2020
read: IOPS=23.7k, BW=92.4MiB/s (96.9MB/s)(3070MiB/33215msec)
bw ( KiB/s): min=89808, max=111952, per=100.00%, avg=94762.30, stdev=4507.43, samples=66
iops : min=22452, max=27988, avg=23690.58, stdev=1126.86, samples=66
write: IOPS=7907, BW=30.9MiB/s (32.4MB/s)(1026MiB/33215msec); 0 zone resets
bw ( KiB/s): min=29424, max=37112, per=100.00%, avg=31668.73, stdev=1558.69, samples=66
iops : min= 7356, max= 9278, avg=7917.18, stdev=389.67, samples=66
cpu : usr=0.43%, sys=29.07%, ctx=1048627, majf=0, minf=7
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=92.4MiB/s (96.9MB/s), 92.4MiB/s-92.4MiB/s (96.9MB/s-96.9MB/s), io=3070MiB (3219MB), run=33215-33215msec
WRITE: bw=30.9MiB/s (32.4MB/s), 30.9MiB/s-30.9MiB/s (32.4MB/s-32.4MB/s), io=1026MiB (1076MB), run=33215-33215msec
9p ( mount -t 9p -o trans=virtio kernel /mnt -oversion=9p2000.L,cache=mmap,msize=1048576 )
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=64
fio-3.21
Starting 1 process
test: (groupid=0, jobs=1): err= 0: pid=736: Fri Sep 25 12:36:00 2020
read: IOPS=16.2k, BW=63.5MiB/s (66.6MB/s)(3070MiB/48366msec)
bw ( KiB/s): min=63426, max=82776, per=100.00%, avg=65054.28, stdev=2014.88, samples=96
iops : min=15856, max=20694, avg=16263.34, stdev=503.74, samples=96
write: IOPS=5430, BW=21.2MiB/s (22.2MB/s)(1026MiB/48366msec); 0 zone resets
bw ( KiB/s): min=20916, max=27632, per=100.00%, avg=21740.64, stdev=735.73, samples=96
iops : min= 5229, max= 6908, avg=5434.99, stdev=183.95, samples=96
cpu : usr=1.60%, sys=14.28%, ctx=1049348, majf=0, minf=7
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=63.5MiB/s (66.6MB/s), 63.5MiB/s-63.5MiB/s (66.6MB/s-66.6MB/s), io=3070MiB (3219MB), run=48366-48366msec
WRITE: bw=21.2MiB/s (22.2MB/s), 21.2MiB/s-21.2MiB/s (22.2MB/s-22.2MB/s), io=1026MiB (1076MB), run=48366-48366msec
So I'm still beating 9p; thread-pool-size=1 seems to be great for
read performance here.
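For comparison, the read bandwidths of the runs above work out to the following ratios against the 9p baseline (63.5 MiB/s), computed from the BW figures quoted above:

```shell
# Read bandwidth of each virtiofs config relative to the 9p run (63.5 MiB/s).
awk 'BEGIN {
    printf "virtiofs default           : %.2fx\n", 71.3 / 63.5;
    printf "virtiofs cache=none        : %.2fx\n", 89.6 / 63.5;
    printf "virtiofs cache=none pool=1 : %.2fx\n", 92.4 / 63.5;
}'
```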
Dave
> Can you apply "shared pool" patch to qemu for virtiofsd and re-run this
> test and see if you see any better results.
>
> In my testing, with cache=none, virtiofs performed better than 9p in
> all the fio jobs I was running. For the case of cache=auto for virtiofs
> (with xattr enabled), 9p performed better in certain write workloads. I
> have identified root cause of that problem and working on
> HANDLE_KILLPRIV_V2 patches to improve WRITE performance of virtiofs
> with cache=auto and xattr enabled.
>
> I will post my 9p and virtiofs comparison numbers next week. In the
> mean time will be great if you could apply following qemu patch, rebuild
> qemu and re-run above test.
>
> https://www.redhat.com/archives/virtio-fs/2020-September/msg00081.html
>
> Also what's the status of file cache on host in both the cases. Are
> you booting host fresh for these tests so that cache is cold on host
> or cache is warm?
>
> Thanks
> Vivek
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance)
2020-09-25 12:41 ` [Virtio-fs] " Dr. David Alan Gilbert
@ 2020-09-25 13:04 ` Christian Schoenebeck
-1 siblings, 0 replies; 107+ messages in thread
From: Christian Schoenebeck @ 2020-09-25 13:04 UTC (permalink / raw)
To: qemu-devel
Cc: Dr. David Alan Gilbert, Vivek Goyal, Venegas Munoz, Jose Carlos,
cdupontd, virtio-fs-list, Stefan Hajnoczi, Shinde, Archana M
On Friday, 25 September 2020 14:41:39 CEST Dr. David Alan Gilbert wrote:
> > Hi Carlos,
> >
> > So you are running following test.
> >
> > fio --direct=1 --gtod_reduce=1 --name=test
> > --filename=random_read_write.fio --bs=4k --iodepth=64 --size=4G
> > --readwrite=randrw --rwmixread=75 --output=/output/fio.txt
> >
> > And following are your results.
> >
> > 9p
> > --
> > READ: bw=211MiB/s (222MB/s), 211MiB/s-211MiB/s (222MB/s-222MB/s),
> > io=3070MiB (3219MB), run=14532-14532msec
> >
> > WRITE: bw=70.6MiB/s (74.0MB/s), 70.6MiB/s-70.6MiB/s (74.0MB/s-74.0MB/s),
> > io=1026MiB (1076MB), run=14532-14532msec
> >
> > virtiofs
> > --------
> >
> > Run status group 0 (all jobs):
> > READ: bw=159MiB/s (167MB/s), 159MiB/s-159MiB/s (167MB/s-167MB/s),
> > io=3070MiB (3219MB), run=19321-19321msec>
> > WRITE: bw=53.1MiB/s (55.7MB/s), 53.1MiB/s-53.1MiB/s (55.7MB/s-55.7MB/s),
> > io=1026MiB (1076MB), run=19321-19321msec>
> > So it looks like you are getting better performance with 9p in this case.
>
> That's interesting, because I've just tried similar again with my
> ramdisk setup:
>
> fio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio
> --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
> --output=aname.txt
>
>
> virtiofs default options
> test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T)
> 4096B-4096B, ioengine=psync, iodepth=64 fio-3.21
> Starting 1 process
> test: Laying out IO file (1 file / 4096MiB)
>
> test: (groupid=0, jobs=1): err= 0: pid=773: Fri Sep 25 12:28:32 2020
> read: IOPS=18.3k, BW=71.3MiB/s (74.8MB/s)(3070MiB/43042msec)
> bw ( KiB/s): min=70752, max=77280, per=100.00%, avg=73075.71,
> stdev=1603.47, samples=85 iops : min=17688, max=19320, avg=18268.92,
> stdev=400.86, samples=85 write: IOPS=6102, BW=23.8MiB/s
> (24.0MB/s)(1026MiB/43042msec); 0 zone resets bw ( KiB/s): min=23128,
> max=25696, per=100.00%, avg=24420.40, stdev=583.08, samples=85 iops
> : min= 5782, max= 6424, avg=6105.09, stdev=145.76, samples=85 cpu
> : usr=0.10%, sys=30.09%, ctx=1245312, majf=0, minf=6 IO depths :
> 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit :
> 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete :
> 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued rwts:
> total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0,
> window=0, percentile=100.00%, depth=64
>
> Run status group 0 (all jobs):
> READ: bw=71.3MiB/s (74.8MB/s), 71.3MiB/s-71.3MiB/s (74.8MB/s-74.8MB/s),
> io=3070MiB (3219MB), run=43042-43042msec WRITE: bw=23.8MiB/s (24.0MB/s),
> 23.8MiB/s-23.8MiB/s (24.0MB/s-24.0MB/s), io=1026MiB (1076MB),
> run=43042-43042msec
>
> virtiofs cache=none
> test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T)
> 4096B-4096B, ioengine=psync, iodepth=64 fio-3.21
> Starting 1 process
>
> test: (groupid=0, jobs=1): err= 0: pid=740: Fri Sep 25 12:30:57 2020
> read: IOPS=22.9k, BW=89.6MiB/s (93.0MB/s)(3070MiB/34256msec)
> bw ( KiB/s): min=89048, max=94240, per=100.00%, avg=91871.06,
> stdev=967.87, samples=68 iops : min=22262, max=23560, avg=22967.76,
> stdev=241.97, samples=68 write: IOPS=7667, BW=29.0MiB/s
> (31.4MB/s)(1026MiB/34256msec); 0 zone resets bw ( KiB/s): min=29264,
> max=32248, per=100.00%, avg=30700.82, stdev=541.97, samples=68 iops
> : min= 7316, max= 8062, avg=7675.21, stdev=135.49, samples=68 cpu
> : usr=1.03%, sys=27.64%, ctx=1048635, majf=0, minf=5 IO depths :
> 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit :
> 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete :
> 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued rwts:
> total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0,
> window=0, percentile=100.00%, depth=64
>
> Run status group 0 (all jobs):
> READ: bw=89.6MiB/s (93.0MB/s), 89.6MiB/s-89.6MiB/s (93.0MB/s-93.0MB/s),
> io=3070MiB (3219MB), run=34256-34256msec WRITE: bw=29.0MiB/s (31.4MB/s),
> 29.0MiB/s-29.0MiB/s (31.4MB/s-31.4MB/s), io=1026MiB (1076MB),
> run=34256-34256msec
>
> virtiofs cache=none thread-pool-size=1
> test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T)
> 4096B-4096B, ioengine=psync, iodepth=64 fio-3.21
> Starting 1 process
>
> test: (groupid=0, jobs=1): err= 0: pid=738: Fri Sep 25 12:33:17 2020
> read: IOPS=23.7k, BW=92.4MiB/s (96.9MB/s)(3070MiB/33215msec)
> bw ( KiB/s): min=89808, max=111952, per=100.00%, avg=94762.30,
> stdev=4507.43, samples=66 iops : min=22452, max=27988, avg=23690.58,
> stdev=1126.86, samples=66 write: IOPS=7907, BW=30.9MiB/s
> (32.4MB/s)(1026MiB/33215msec); 0 zone resets bw ( KiB/s): min=29424,
> max=37112, per=100.00%, avg=31668.73, stdev=1558.69, samples=66 iops
> : min= 7356, max= 9278, avg=7917.18, stdev=389.67, samples=66 cpu
> : usr=0.43%, sys=29.07%, ctx=1048627, majf=0, minf=7 IO depths :
> 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit :
> 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete :
> 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued rwts:
> total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0,
> window=0, percentile=100.00%, depth=64
>
> Run status group 0 (all jobs):
> READ: bw=92.4MiB/s (96.9MB/s), 92.4MiB/s-92.4MiB/s (96.9MB/s-96.9MB/s),
> io=3070MiB (3219MB), run=33215-33215msec WRITE: bw=30.9MiB/s (32.4MB/s),
> 30.9MiB/s-30.9MiB/s (32.4MB/s-32.4MB/s), io=1026MiB (1076MB),
> run=33215-33215msec
>
> 9p ( mount -t 9p -o trans=virtio kernel /mnt
> -oversion=9p2000.L,cache=mmap,msize=1048576 ) test: (g=0): rw=randrw,
Bottleneck ------------------------------^
By increasing 'msize' you should see better 9p I/O results.
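For example, the same mount as above but with a 10x larger msize; the
exact value is purely illustrative (the thread does not recommend a
specific number), and the best choice depends on workload and transport:

```shell
# 9p mount as in the test above, with msize raised from 1 MiB to 10 MiB;
# a larger msize allows bigger request payloads per 9p message.
mount -t 9p -o trans=virtio,version=9p2000.L,cache=mmap,msize=10485760 kernel /mnt
```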
> bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync,
> iodepth=64 fio-3.21
> Starting 1 process
>
> test: (groupid=0, jobs=1): err= 0: pid=736: Fri Sep 25 12:36:00 2020
> read: IOPS=16.2k, BW=63.5MiB/s (66.6MB/s)(3070MiB/48366msec)
> bw ( KiB/s): min=63426, max=82776, per=100.00%, avg=65054.28,
> stdev=2014.88, samples=96 iops : min=15856, max=20694, avg=16263.34,
> stdev=503.74, samples=96 write: IOPS=5430, BW=21.2MiB/s
> (22.2MB/s)(1026MiB/48366msec); 0 zone resets bw ( KiB/s): min=20916,
> max=27632, per=100.00%, avg=21740.64, stdev=735.73, samples=96 iops
> : min= 5229, max= 6908, avg=5434.99, stdev=183.95, samples=96 cpu
> : usr=1.60%, sys=14.28%, ctx=1049348, majf=0, minf=7 IO depths :
> 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit :
> 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete :
> 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued rwts:
> total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0,
> window=0, percentile=100.00%, depth=64
>
> Run status group 0 (all jobs):
> READ: bw=63.5MiB/s (66.6MB/s), 63.5MiB/s-63.5MiB/s (66.6MB/s-66.6MB/s),
> io=3070MiB (3219MB), run=48366-48366msec WRITE: bw=21.2MiB/s (22.2MB/s),
> 21.2MiB/s-21.2MiB/s (22.2MB/s-22.2MB/s), io=1026MiB (1076MB),
> run=48366-48366msec
>
> So I'm still beating 9p; the thread-pool-size=1 seems to be great for
> read performance here.
>
> Dave
Best regards,
Christian Schoenebeck
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance)
2020-09-25 13:04 ` [Virtio-fs] " Christian Schoenebeck
@ 2020-09-25 13:05 ` Dr. David Alan Gilbert
-1 siblings, 0 replies; 107+ messages in thread
From: Dr. David Alan Gilbert @ 2020-09-25 13:05 UTC (permalink / raw)
To: Christian Schoenebeck
Cc: Venegas Munoz, Jose Carlos, qemu-devel, cdupontd, virtio-fs-list,
Stefan Hajnoczi, Shinde, Archana M, Vivek Goyal
* Christian Schoenebeck (qemu_oss@crudebyte.com) wrote:
> On Friday, 25 September 2020 14:41:39 CEST Dr. David Alan Gilbert wrote:
> > > Hi Carlos,
> > >
> > > So you are running the following test.
> > >
> > > fio --direct=1 --gtod_reduce=1 --name=test
> > > --filename=random_read_write.fio --bs=4k --iodepth=64 --size=4G
> > > --readwrite=randrw --rwmixread=75 --output=/output/fio.txt
> > >
> > > And the following are your results.
> > >
> > > 9p
> > > --
> > > READ: bw=211MiB/s (222MB/s), 211MiB/s-211MiB/s (222MB/s-222MB/s),
> > > io=3070MiB (3219MB), run=14532-14532msec
> > >
> > > WRITE: bw=70.6MiB/s (74.0MB/s), 70.6MiB/s-70.6MiB/s (74.0MB/s-74.0MB/s),
> > > io=1026MiB (1076MB), run=14532-14532msec
> > >
> > > virtiofs
> > > --------
> > >
> > > Run status group 0 (all jobs):
> > > READ: bw=159MiB/s (167MB/s), 159MiB/s-159MiB/s (167MB/s-167MB/s),
> > > io=3070MiB (3219MB), run=19321-19321msec>
> > > WRITE: bw=53.1MiB/s (55.7MB/s), 53.1MiB/s-53.1MiB/s (55.7MB/s-55.7MB/s),
> > > io=1026MiB (1076MB), run=19321-19321msec>
> > > So it looks like you are getting better performance with 9p in this case.
> >
> > That's interesting, because I've just tried similar again with my
> > ramdisk setup:
> >
> > fio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio
> > --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
> > --output=aname.txt
> >
> >
> > virtiofs default options
> > test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T)
> > 4096B-4096B, ioengine=psync, iodepth=64 fio-3.21
> > Starting 1 process
> > test: Laying out IO file (1 file / 4096MiB)
> >
> > test: (groupid=0, jobs=1): err= 0: pid=773: Fri Sep 25 12:28:32 2020
> > read: IOPS=18.3k, BW=71.3MiB/s (74.8MB/s)(3070MiB/43042msec)
> > bw ( KiB/s): min=70752, max=77280, per=100.00%, avg=73075.71,
> > stdev=1603.47, samples=85 iops : min=17688, max=19320, avg=18268.92,
> > stdev=400.86, samples=85 write: IOPS=6102, BW=23.8MiB/s
> > (24.0MB/s)(1026MiB/43042msec); 0 zone resets bw ( KiB/s): min=23128,
> > max=25696, per=100.00%, avg=24420.40, stdev=583.08, samples=85 iops
> > : min= 5782, max= 6424, avg=6105.09, stdev=145.76, samples=85 cpu
> > : usr=0.10%, sys=30.09%, ctx=1245312, majf=0, minf=6 IO depths :
> > 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit :
> > 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete :
> > 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued rwts:
> > total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0,
> > window=0, percentile=100.00%, depth=64
> >
> > Run status group 0 (all jobs):
> > READ: bw=71.3MiB/s (74.8MB/s), 71.3MiB/s-71.3MiB/s (74.8MB/s-74.8MB/s),
> > io=3070MiB (3219MB), run=43042-43042msec WRITE: bw=23.8MiB/s (24.0MB/s),
> > 23.8MiB/s-23.8MiB/s (24.0MB/s-24.0MB/s), io=1026MiB (1076MB),
> > run=43042-43042msec
> >
> > virtiofs cache=none
> > test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T)
> > 4096B-4096B, ioengine=psync, iodepth=64 fio-3.21
> > Starting 1 process
> >
> > test: (groupid=0, jobs=1): err= 0: pid=740: Fri Sep 25 12:30:57 2020
> > read: IOPS=22.9k, BW=89.6MiB/s (93.0MB/s)(3070MiB/34256msec)
> > bw ( KiB/s): min=89048, max=94240, per=100.00%, avg=91871.06,
> > stdev=967.87, samples=68 iops : min=22262, max=23560, avg=22967.76,
> > stdev=241.97, samples=68 write: IOPS=7667, BW=29.0MiB/s
> > (31.4MB/s)(1026MiB/34256msec); 0 zone resets bw ( KiB/s): min=29264,
> > max=32248, per=100.00%, avg=30700.82, stdev=541.97, samples=68 iops
> > : min= 7316, max= 8062, avg=7675.21, stdev=135.49, samples=68 cpu
> > : usr=1.03%, sys=27.64%, ctx=1048635, majf=0, minf=5 IO depths :
> > 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit :
> > 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete :
> > 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued rwts:
> > total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0,
> > window=0, percentile=100.00%, depth=64
> >
> > Run status group 0 (all jobs):
> > READ: bw=89.6MiB/s (93.0MB/s), 89.6MiB/s-89.6MiB/s (93.0MB/s-93.0MB/s),
> > io=3070MiB (3219MB), run=34256-34256msec WRITE: bw=29.0MiB/s (31.4MB/s),
> > 29.0MiB/s-29.0MiB/s (31.4MB/s-31.4MB/s), io=1026MiB (1076MB),
> > run=34256-34256msec
> >
> > virtiofs cache=none thread-pool-size=1
> > test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T)
> > 4096B-4096B, ioengine=psync, iodepth=64 fio-3.21
> > Starting 1 process
> >
> > test: (groupid=0, jobs=1): err= 0: pid=738: Fri Sep 25 12:33:17 2020
> > read: IOPS=23.7k, BW=92.4MiB/s (96.9MB/s)(3070MiB/33215msec)
> > bw ( KiB/s): min=89808, max=111952, per=100.00%, avg=94762.30,
> > stdev=4507.43, samples=66 iops : min=22452, max=27988, avg=23690.58,
> > stdev=1126.86, samples=66 write: IOPS=7907, BW=30.9MiB/s
> > (32.4MB/s)(1026MiB/33215msec); 0 zone resets bw ( KiB/s): min=29424,
> > max=37112, per=100.00%, avg=31668.73, stdev=1558.69, samples=66 iops
> > : min= 7356, max= 9278, avg=7917.18, stdev=389.67, samples=66 cpu
> > : usr=0.43%, sys=29.07%, ctx=1048627, majf=0, minf=7 IO depths :
> > 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit :
> > 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete :
> > 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued rwts:
> > total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0,
> > window=0, percentile=100.00%, depth=64
> >
> > Run status group 0 (all jobs):
> > READ: bw=92.4MiB/s (96.9MB/s), 92.4MiB/s-92.4MiB/s (96.9MB/s-96.9MB/s),
> > io=3070MiB (3219MB), run=33215-33215msec WRITE: bw=30.9MiB/s (32.4MB/s),
> > 30.9MiB/s-30.9MiB/s (32.4MB/s-32.4MB/s), io=1026MiB (1076MB),
> > run=33215-33215msec
> >
> > 9p ( mount -t 9p -o trans=virtio kernel /mnt
> > -oversion=9p2000.L,cache=mmap,msize=1048576 ) test: (g=0): rw=randrw,
> Bottleneck ------------------------------^
>
> By increasing 'msize' you should see better 9p I/O results.
OK, I thought that was bigger than the default; what number should I
use?
Dave
> > bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync,
> > iodepth=64 fio-3.21
> > Starting 1 process
> >
> > test: (groupid=0, jobs=1): err= 0: pid=736: Fri Sep 25 12:36:00 2020
> > read: IOPS=16.2k, BW=63.5MiB/s (66.6MB/s)(3070MiB/48366msec)
> > bw ( KiB/s): min=63426, max=82776, per=100.00%, avg=65054.28,
> > stdev=2014.88, samples=96 iops : min=15856, max=20694, avg=16263.34,
> > stdev=503.74, samples=96 write: IOPS=5430, BW=21.2MiB/s
> > (22.2MB/s)(1026MiB/48366msec); 0 zone resets bw ( KiB/s): min=20916,
> > max=27632, per=100.00%, avg=21740.64, stdev=735.73, samples=96 iops
> > : min= 5229, max= 6908, avg=5434.99, stdev=183.95, samples=96 cpu
> > : usr=1.60%, sys=14.28%, ctx=1049348, majf=0, minf=7 IO depths :
> > 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit :
> > 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete :
> > 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued rwts:
> > total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0,
> > window=0, percentile=100.00%, depth=64
> >
> > Run status group 0 (all jobs):
> > READ: bw=63.5MiB/s (66.6MB/s), 63.5MiB/s-63.5MiB/s (66.6MB/s-66.6MB/s),
> > io=3070MiB (3219MB), run=48366-48366msec WRITE: bw=21.2MiB/s (22.2MB/s),
> > 21.2MiB/s-21.2MiB/s (22.2MB/s-22.2MB/s), io=1026MiB (1076MB),
> > run=48366-48366msec
> >
> > So I'm still beating 9p; the thread-pool-size=1 seems to be great for
> > read performance here.
> >
> > Dave
>
> Best regards,
> Christian Schoenebeck
>
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: tools/virtiofs: Multi threading seems to hurt performance
2020-09-25 12:11 ` [Virtio-fs] " Dr. David Alan Gilbert
@ 2020-09-25 13:11 ` Vivek Goyal
-1 siblings, 0 replies; 107+ messages in thread
From: Vivek Goyal @ 2020-09-25 13:11 UTC (permalink / raw)
To: Dr. David Alan Gilbert
Cc: jose.carlos.venegas.munoz, qemu-devel, cdupontd, virtio-fs-list,
Stefan Hajnoczi, archana.m.shinde
On Fri, Sep 25, 2020 at 01:11:27PM +0100, Dr. David Alan Gilbert wrote:
> * Vivek Goyal (vgoyal@redhat.com) wrote:
> > On Tue, Sep 22, 2020 at 11:25:31AM +0100, Dr. David Alan Gilbert wrote:
> > > * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> > > > Hi,
> > > > I've been doing some of my own perf tests and I think I agree
> > > > about the thread pool size; my test is a kernel build
> > > > and I've tried a bunch of different options.
> > > >
> > > > My config:
> > > > Host: 16 core AMD EPYC (32 thread), 128G RAM,
> > > > 5.9.0-rc4 kernel, rhel 8.2ish userspace.
> > > > 5.1.0 qemu/virtiofsd built from git.
> > > > Guest: Fedora 32 from cloud image with just enough extra installed for
> > > > a kernel build.
> > > >
> > > > git cloned and checkout v5.8 of Linux into /dev/shm/linux on the host
> > > > fresh before each test. Then log into the guest, make defconfig,
> > > > time make -j 16 bzImage, make clean; time make -j 16 bzImage
> > > > The numbers below are the 'real' time in the guest from the initial make
> > > > (the subsequent makes don't vary much)
> > > >
> > > > Below are the details of what each of these means, but here are the
> > > > numbers first
> > > >
> > > > virtiofsdefault 4m0.978s
> > > > 9pdefault 9m41.660s
> > > > virtiofscache=none 10m29.700s
> > > > 9pmmappass 9m30.047s
> > > > 9pmbigmsize 12m4.208s
> > > > 9pmsecnone 9m21.363s
> > > > virtiofscache=noneT1 7m17.494s
> > > > virtiofsdefaultT1 3m43.326s
> > > >
> > > > So the winner there by far is the 'virtiofsdefaultT1' - that's
> > > > the default virtiofs settings, but with --thread-pool-size=1 - so
> > > > yes it gives a small benefit.
> > > > But interestingly the cache=none virtiofs performance is pretty bad,
> > > > but thread-pool-size=1 on that makes a BIG improvement.
> > >
> > > Here are fio runs that Vivek asked me to run in my same environment
> > > (there are some 0's in some of the mmap cases, and I've not investigated
> > > why yet).
> >
> > cache=none does not allow mmap in case of virtiofs. That's why you
> > are seeing 0.
> >
> > > virtiofs is looking good here in I think all of the cases;
> > > there's some division over which config; cache=none
> > > seems faster in some cases, which surprises me.
> >
> > I know cache=none is faster in case of write workloads. It forces
> > direct write where we don't call file_remove_privs(). While cache=auto
> > goes through file_remove_privs() and that adds a GETXATTR request to
> > every WRITE request.
>
> Can you point me to how cache=auto causes the file_remove_privs?
fs/fuse/file.c:

  fuse_cache_write_iter() {
      ...
      err = file_remove_privs(file);
      ...
  }

The above path is taken when cache=auto/cache=always is used. If virtiofsd
is running with xattrs disabled (noxattr), it does not impose any cost. But
if xattrs are enabled, then every WRITE first results in a
getxattr(security.capability), and that slows down WRITEs tremendously.

When cache=none is used, we go through fuse_direct_write_iter() instead,
which does not call file_remove_privs(). Instead, we set a flag in the WRITE
request telling the server to kill suid/sgid/security.capability:

  fuse_direct_io() {
      ...
      ia->write.in.write_flags |= FUSE_WRITE_KILL_PRIV;
      ...
  }
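The request traffic described above can be illustrated with a small model:
with cache=auto/always and xattrs enabled, each WRITE is preceded by a
GETXATTR round trip, while cache=none folds the privilege kill into the
WRITE itself. This is a toy sketch of that behaviour, not the actual kernel
or virtiofsd code:

```c
#include <stdint.h>

/* Toy model (illustrative only) of the per-WRITE request traffic: with
 * cache=auto/always and xattrs enabled, file_remove_privs() issues a
 * GETXATTR(security.capability) before every WRITE; with cache=none,
 * FUSE_WRITE_KILL_PRIV is set inside the WRITE request instead. */
struct traffic { uint64_t writes; uint64_t getxattrs; };

/* cached_path != 0 models cache=auto/always with xattrs enabled;
 * cached_path == 0 models cache=none. */
struct traffic fuse_write_traffic(uint64_t nr_writes, int cached_path)
{
    struct traffic t = { 0, 0 };
    for (uint64_t i = 0; i < nr_writes; i++) {
        if (cached_path)
            t.getxattrs++;  /* extra guest/host round trip per write */
        t.writes++;
    }
    return t;
}
```

Under this model 1000 cached writes cost 2000 requests instead of 1000, i.e.
an extra round trip per write, which is why the GETXATTR path hurts so much.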
Vivek
* Re: virtiofs vs 9p performance
2020-09-25 8:06 ` [Virtio-fs] " Christian Schoenebeck
@ 2020-09-25 13:13 ` Vivek Goyal
-1 siblings, 0 replies; 107+ messages in thread
From: Vivek Goyal @ 2020-09-25 13:13 UTC (permalink / raw)
To: Christian Schoenebeck
Cc: Shinde, Archana M, Venegas Munoz, Jose Carlos, qemu-devel,
Dr. David Alan Gilbert, virtio-fs-list, Greg Kurz,
Stefan Hajnoczi, cdupontd
On Fri, Sep 25, 2020 at 10:06:41AM +0200, Christian Schoenebeck wrote:
> On Freitag, 25. September 2020 00:10:23 CEST Vivek Goyal wrote:
> > In my testing, with cache=none, virtiofs performed better than 9p in
> > all the fio jobs I was running. For the case of cache=auto for virtiofs
> > (with xattr enabled), 9p performed better in certain write workloads. I
> > have identified root cause of that problem and working on
> > HANDLE_KILLPRIV_V2 patches to improve WRITE performance of virtiofs
> > with cache=auto and xattr enabled.
>
> Please note, when it comes to performance aspects, you should set a reasonably
> high value for 'msize' on the 9p client side:
> https://wiki.qemu.org/Documentation/9psetup#msize
Interesting. I will try that. What does "msize" do?
>
> I'm also working on performance optimizations for 9p BTW. There is plenty of
> headroom to put it mildly. For QEMU 5.2 I started by addressing readdir
> requests:
> https://wiki.qemu.org/ChangeLog/5.2#9pfs
Nice. I guess this performance comparison between 9p and virtiofs is good.
Both the projects can try to identify weak points and improve performance.
Thanks
Vivek
* Re: virtiofs vs 9p performance
2020-09-25 13:13 ` [Virtio-fs] " Vivek Goyal
@ 2020-09-25 15:47 ` Christian Schoenebeck
-1 siblings, 0 replies; 107+ messages in thread
From: Christian Schoenebeck @ 2020-09-25 15:47 UTC (permalink / raw)
To: qemu-devel
Cc: Vivek Goyal, Shinde, Archana M, Venegas Munoz, Jose Carlos,
Dr. David Alan Gilbert, virtio-fs-list, Greg Kurz,
Stefan Hajnoczi, cdupontd
On Freitag, 25. September 2020 15:13:56 CEST Vivek Goyal wrote:
> On Fri, Sep 25, 2020 at 10:06:41AM +0200, Christian Schoenebeck wrote:
> > On Freitag, 25. September 2020 00:10:23 CEST Vivek Goyal wrote:
> > > In my testing, with cache=none, virtiofs performed better than 9p in
> > > all the fio jobs I was running. For the case of cache=auto for virtiofs
> > > (with xattr enabled), 9p performed better in certain write workloads. I
> > > have identified root cause of that problem and working on
> > > HANDLE_KILLPRIV_V2 patches to improve WRITE performance of virtiofs
> > > with cache=auto and xattr enabled.
> >
> > Please note, when it comes to performance aspects, you should set a
> > reasonable high value for 'msize' on 9p client side:
> > https://wiki.qemu.org/Documentation/9psetup#msize
>
> Interesting. I will try that. What does "msize" do?
Simple: it's the "maximum message size" ever to be used for communication
between host and guest, in both directions.
So if that 'msize' value is too small, a potentially large 9p message is
split into several smaller 9p messages, and each message adds latency, which
is the main problem.
Keep in mind: the default msize with Linux clients is still only 8 kB!
Think of doing 'dd bs=8192 if=/src.dat of=/dst.dat count=...' as an analogy,
which probably makes its impact on performance clear.
However, the negative impact of a small 'msize' value is not limited to raw
file I/O like that; calling readdir() on a guest directory with several
hundred files or more will likewise slow down tremendously, as both sides
have to transmit a large number of 9p messages back and forth instead of
just 2 messages (Treaddir and Rreaddir).
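The latency cost of a small msize can be made concrete with a
back-of-the-envelope count of request/response pairs for a sequential read.
The fixed 24-byte per-message overhead below is an illustrative assumption,
not the exact 9p header size:

```c
#include <stdint.h>

/* Rough sketch: how many 9p messages a sequential read of 'total' bytes
 * needs when each Tread/Rread payload is capped by (msize - overhead).
 * The 24-byte overhead is assumed for illustration only. */
uint64_t p9_read_messages(uint64_t total, uint64_t msize)
{
    uint64_t payload = msize - 24;
    uint64_t chunks = (total + payload - 1) / payload;  /* ceiling division */
    return 2 * chunks;  /* one Tread + one Rread per chunk */
}
```

Under this model a 1 GiB read needs roughly 260k messages at the 8 kB default
but only about 2k at msize=1M; each of those round trips adds latency.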
> > I'm also working on performance optimizations for 9p BTW. There is plenty
> > of headroom to put it mildly. For QEMU 5.2 I started by addressing
> > readdir requests:
> > https://wiki.qemu.org/ChangeLog/5.2#9pfs
>
> Nice. I guess this performance comparison between 9p and virtiofs is good.
> Both the projects can try to identify weak points and improve performance.
Yes, that's indeed handy being able to make comparisons.
Best regards,
Christian Schoenebeck
* Re: virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance)
2020-09-25 13:05 ` [Virtio-fs] " Dr. David Alan Gilbert
@ 2020-09-25 16:05 ` Christian Schoenebeck
-1 siblings, 0 replies; 107+ messages in thread
From: Christian Schoenebeck @ 2020-09-25 16:05 UTC (permalink / raw)
To: qemu-devel
Cc: Dr. David Alan Gilbert, Venegas Munoz, Jose Carlos, cdupontd,
virtio-fs-list, Stefan Hajnoczi, Shinde, Archana M, Vivek Goyal
On Freitag, 25. September 2020 15:05:38 CEST Dr. David Alan Gilbert wrote:
> > > 9p ( mount -t 9p -o trans=virtio kernel /mnt
> > > -oversion=9p2000.L,cache=mmap,msize=1048576 ) test: (g=0): rw=randrw,
> >
> > Bottleneck ------------------------------^
> >
> > By increasing 'msize' you would encounter better 9P I/O results.
>
> OK, I thought that was bigger than the default; what number should I
> use?
It depends on the underlying storage hardware. In other words: you have to try
increasing the 'msize' value to a point where you no longer notice a negative
performance impact (or almost). Fortunately that is quite easy to test on the
guest, e.g.:

  dd if=/dev/zero of=test.dat bs=1G count=12
  time cat test.dat > /dev/null
I would start with an absolute minimum msize of 10MB. I would recommend
something around 100MB maybe for a mechanical hard drive. With a PCIe flash
you probably would rather pick several hundred MB or even more.
That unpleasant 'msize' issue is a limitation of the 9p protocol: the client
(guest) must suggest the msize value when connecting to the server (host).
The server can only lower it, not raise it. And the client in turn obviously
cannot see the host's storage device(s), so it is unable to pick a good value
by itself. So it's a suboptimal handshake right now.
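That handshake can be sketched as a simple clamp: the client proposes an
msize, and the server may only lower it. This is a simplified model, not
QEMU's actual negotiation code, and 'server_max' is an illustrative name:

```c
#include <stdint.h>

/* Toy model of the 9p Tversion/Rversion msize negotiation: the server may
 * only lower the client's proposal, never raise it. */
uint32_t negotiate_msize(uint32_t client_proposed, uint32_t server_max)
{
    return client_proposed < server_max ? client_proposed : server_max;
}
```

So a Linux client defaulting to 8192 pins the whole session at 8 kB even if
the host could handle far more, which is why tuning must happen on the
client's mount options today.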
Many users don't even know this 'msize' parameter exists and hence run with
the Linux kernel's default value of just 8kB. For QEMU 5.2 I addressed this by
logging a performance warning on host side for making users at least aware
about this issue. The long-term plan is to pass a good msize value from host
to guest via virtio (like it's already done for the available export tags) and
the Linux kernel would default to that instead.
Best regards,
Christian Schoenebeck
* Re: virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance)
2020-09-25 16:05 ` [Virtio-fs] " Christian Schoenebeck
@ 2020-09-25 16:33 ` Christian Schoenebeck
-1 siblings, 0 replies; 107+ messages in thread
From: Christian Schoenebeck @ 2020-09-25 16:33 UTC (permalink / raw)
To: qemu-devel
Cc: Dr. David Alan Gilbert, Venegas Munoz, Jose Carlos, cdupontd,
virtio-fs-list, Stefan Hajnoczi, Shinde, Archana M, Vivek Goyal
On Freitag, 25. September 2020 18:05:17 CEST Christian Schoenebeck wrote:
> On Freitag, 25. September 2020 15:05:38 CEST Dr. David Alan Gilbert wrote:
> > > > 9p ( mount -t 9p -o trans=virtio kernel /mnt
> > > > -oversion=9p2000.L,cache=mmap,msize=1048576 ) test: (g=0): rw=randrw,
> > >
> > > Bottleneck ------------------------------^
> > >
> > > By increasing 'msize' you would encounter better 9P I/O results.
> >
> > OK, I thought that was bigger than the default; what number should I
> > use?
>
> It depends on the underlying storage hardware. In other words: you have to
> try increasing the 'msize' value to a point where you no longer notice a
> negative performance impact (or almost). Which is fortunately quite easy to
> test on guest like:
>
> dd if=/dev/zero of=test.dat bs=1G count=12
> time cat test.dat > /dev/null
I forgot: you should execute that 'dd' command on the host side, and the 'cat'
command on the guest side, to avoid any caching making the benchmark result
look better than it actually is. For finding a good 'msize' value you only
care about actual 9p data really being transmitted between host and guest.
Best regards,
Christian Schoenebeck
* Re: virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance)
2020-09-25 16:05 ` [Virtio-fs] " Christian Schoenebeck
@ 2020-09-25 18:51 ` Dr. David Alan Gilbert
-1 siblings, 0 replies; 107+ messages in thread
From: Dr. David Alan Gilbert @ 2020-09-25 18:51 UTC (permalink / raw)
To: Christian Schoenebeck
Cc: Venegas Munoz, Jose Carlos, qemu-devel, cdupontd, virtio-fs-list,
Stefan Hajnoczi, Shinde, Archana M, Vivek Goyal
* Christian Schoenebeck (qemu_oss@crudebyte.com) wrote:
> On Freitag, 25. September 2020 15:05:38 CEST Dr. David Alan Gilbert wrote:
> > > > 9p ( mount -t 9p -o trans=virtio kernel /mnt
> > > > -oversion=9p2000.L,cache=mmap,msize=1048576 ) test: (g=0): rw=randrw,
> > >
> > > Bottleneck ------------------------------^
> > >
> > > By increasing 'msize' you would encounter better 9P I/O results.
> >
> > OK, I thought that was bigger than the default; what number should I
> > use?
>
> It depends on the underlying storage hardware. In other words: you have to try
> increasing the 'msize' value to a point where you no longer notice a negative
> performance impact (or almost). Which is fortunately quite easy to test on
> guest like:
>
> dd if=/dev/zero of=test.dat bs=1G count=12
> time cat test.dat > /dev/null
>
> I would start with an absolute minimum msize of 10MB. I would recommend
> something around 100MB maybe for a mechanical hard drive. With a PCIe flash
> you probably would rather pick several hundred MB or even more.
>
> That unpleasant 'msize' issue is a limitation of the 9p protocol: client
> (guest) must suggest the value of msize on connection to server (host). Server
> can only lower, but not raise it. And the client in turn obviously cannot see
> host's storage device(s), so client is unable to pick a good value by itself.
> So it's a suboptimal handshake issue right now.
It doesn't seem to be making a vast difference here:
9p mount -t 9p -o trans=virtio kernel /mnt -oversion=9p2000.L,cache=mmap,msize=104857600
Run status group 0 (all jobs):
READ: bw=62.5MiB/s (65.6MB/s), 62.5MiB/s-62.5MiB/s (65.6MB/s-65.6MB/s), io=3070MiB (3219MB), run=49099-49099msec
WRITE: bw=20.9MiB/s (21.9MB/s), 20.9MiB/s-20.9MiB/s (21.9MB/s-21.9MB/s), io=1026MiB (1076MB), run=49099-49099msec
9p mount -t 9p -o trans=virtio kernel /mnt -oversion=9p2000.L,cache=mmap,msize=1048576000
Run status group 0 (all jobs):
READ: bw=65.2MiB/s (68.3MB/s), 65.2MiB/s-65.2MiB/s (68.3MB/s-68.3MB/s), io=3070MiB (3219MB), run=47104-47104msec
WRITE: bw=21.8MiB/s (22.8MB/s), 21.8MiB/s-21.8MiB/s (22.8MB/s-22.8MB/s), io=1026MiB (1076MB), run=47104-47104msec
Dave
> Many users don't even know this 'msize' parameter exists and hence run with
> the Linux kernel's default value of just 8kB. For QEMU 5.2 I addressed this by
> logging a performance warning on host side for making users at least aware
> about this issue. The long-term plan is to pass a good msize value from host
> to guest via virtio (like it's already done for the available export tags) and
> the Linux kernel would default to that instead.
>
> Best regards,
> Christian Schoenebeck
>
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance)
2020-09-25 18:51 ` [Virtio-fs] " Dr. David Alan Gilbert
@ 2020-09-27 12:14 ` Christian Schoenebeck
-1 siblings, 0 replies; 107+ messages in thread
From: Christian Schoenebeck @ 2020-09-27 12:14 UTC (permalink / raw)
To: qemu-devel
Cc: Dr. David Alan Gilbert, Venegas Munoz, Jose Carlos, cdupontd,
virtio-fs-list, Stefan Hajnoczi, Shinde, Archana M, Vivek Goyal
On Freitag, 25. September 2020 20:51:47 CEST Dr. David Alan Gilbert wrote:
> * Christian Schoenebeck (qemu_oss@crudebyte.com) wrote:
> > On Freitag, 25. September 2020 15:05:38 CEST Dr. David Alan Gilbert wrote:
> > > > > 9p ( mount -t 9p -o trans=virtio kernel /mnt
> > > > > -oversion=9p2000.L,cache=mmap,msize=1048576 ) test: (g=0):
> > > > > rw=randrw,
> > > >
> > > > Bottleneck ------------------------------^
> > > >
> > > > By increasing 'msize' you would encounter better 9P I/O results.
> > >
> > > OK, I thought that was bigger than the default; what number should I
> > > use?
> >
> > It depends on the underlying storage hardware. In other words: you have to
> > try increasing the 'msize' value to a point where you no longer notice a
> > negative performance impact (or almost). Which is fortunately quite easy
> > to test on
> > guest like:
> > dd if=/dev/zero of=test.dat bs=1G count=12
> > time cat test.dat > /dev/null
> >
> > I would start with an absolute minimum msize of 10MB. I would recommend
> > something around 100MB maybe for a mechanical hard drive. With a PCIe
> > flash
> > you probably would rather pick several hundred MB or even more.
> >
> > That unpleasant 'msize' issue is a limitation of the 9p protocol: client
> > (guest) must suggest the value of msize on connection to server (host).
> > Server can only lower, but not raise it. And the client in turn obviously
> > cannot see host's storage device(s), so client is unable to pick a good
> > value by itself. So it's a suboptimal handshake issue right now.
>
> It doesn't seem to be making a vast difference here:
>
>
>
> 9p mount -t 9p -o trans=virtio kernel /mnt
> -oversion=9p2000.L,cache=mmap,msize=104857600
>
> Run status group 0 (all jobs):
> READ: bw=62.5MiB/s (65.6MB/s), 62.5MiB/s-62.5MiB/s (65.6MB/s-65.6MB/s),
> io=3070MiB (3219MB), run=49099-49099msec WRITE: bw=20.9MiB/s (21.9MB/s),
> 20.9MiB/s-20.9MiB/s (21.9MB/s-21.9MB/s), io=1026MiB (1076MB),
> run=49099-49099msec
>
> 9p mount -t 9p -o trans=virtio kernel /mnt
> -oversion=9p2000.L,cache=mmap,msize=1048576000
>
> Run status group 0 (all jobs):
> READ: bw=65.2MiB/s (68.3MB/s), 65.2MiB/s-65.2MiB/s (68.3MB/s-68.3MB/s),
> io=3070MiB (3219MB), run=47104-47104msec WRITE: bw=21.8MiB/s (22.8MB/s),
> 21.8MiB/s-21.8MiB/s (22.8MB/s-22.8MB/s), io=1026MiB (1076MB),
> run=47104-47104msec
>
>
> Dave
Is that benchmark tool honoring 'iounit' to automatically run with the max.
I/O chunk size? Which benchmark tool is that, actually? And do you also see
no improvement with a simple

  time cat largefile.dat > /dev/null

?
Best regards,
Christian Schoenebeck
* Re: virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance)
2020-09-27 12:14 ` [Virtio-fs] " Christian Schoenebeck
@ 2020-09-29 13:03 ` Vivek Goyal
-1 siblings, 0 replies; 107+ messages in thread
From: Vivek Goyal @ 2020-09-29 13:03 UTC (permalink / raw)
To: Christian Schoenebeck
Cc: Venegas Munoz, Jose Carlos, cdupontd, qemu-devel, virtio-fs-list,
Stefan Hajnoczi, Shinde, Archana M, Dr. David Alan Gilbert
On Sun, Sep 27, 2020 at 02:14:43PM +0200, Christian Schoenebeck wrote:
> On Freitag, 25. September 2020 20:51:47 CEST Dr. David Alan Gilbert wrote:
> > * Christian Schoenebeck (qemu_oss@crudebyte.com) wrote:
> > > On Freitag, 25. September 2020 15:05:38 CEST Dr. David Alan Gilbert wrote:
> > > > > > 9p ( mount -t 9p -o trans=virtio kernel /mnt
> > > > > > -oversion=9p2000.L,cache=mmap,msize=1048576 ) test: (g=0):
> > > > > > rw=randrw,
> > > > >
> > > > > Bottleneck ------------------------------^
> > > > >
> > > > > By increasing 'msize' you would encounter better 9P I/O results.
> > > >
> > > > OK, I thought that was bigger than the default; what number should I
> > > > use?
> > >
> > > It depends on the underlying storage hardware. In other words: you have to
> > > try increasing the 'msize' value to a point where you no longer notice a
> > > negative performance impact (or almost). Which is fortunately quite easy
> > > to test on
> > > guest like:
> > > dd if=/dev/zero of=test.dat bs=1G count=12
> > > time cat test.dat > /dev/null
> > >
> > > I would start with an absolute minimum msize of 10MB. I would recommend
> > > something around 100MB maybe for a mechanical hard drive. With a PCIe
> > > flash
> > > you probably would rather pick several hundred MB or even more.
> > >
> > > That unpleasant 'msize' issue is a limitation of the 9p protocol: client
> > > (guest) must suggest the value of msize on connection to server (host).
> > > Server can only lower, but not raise it. And the client in turn obviously
> > > cannot see host's storage device(s), so client is unable to pick a good
> > > value by itself. So it's a suboptimal handshake issue right now.
> >
> > It doesn't seem to be making a vast difference here:
> >
> >
> >
> > 9p mount -t 9p -o trans=virtio kernel /mnt
> > -oversion=9p2000.L,cache=mmap,msize=104857600
> >
> > Run status group 0 (all jobs):
> > READ: bw=62.5MiB/s (65.6MB/s), 62.5MiB/s-62.5MiB/s (65.6MB/s-65.6MB/s),
> > io=3070MiB (3219MB), run=49099-49099msec WRITE: bw=20.9MiB/s (21.9MB/s),
> > 20.9MiB/s-20.9MiB/s (21.9MB/s-21.9MB/s), io=1026MiB (1076MB),
> > run=49099-49099msec
> >
> > 9p mount -t 9p -o trans=virtio kernel /mnt
> > -oversion=9p2000.L,cache=mmap,msize=1048576000
> >
> > Run status group 0 (all jobs):
> > READ: bw=65.2MiB/s (68.3MB/s), 65.2MiB/s-65.2MiB/s (68.3MB/s-68.3MB/s),
> > io=3070MiB (3219MB), run=47104-47104msec WRITE: bw=21.8MiB/s (22.8MB/s),
> > 21.8MiB/s-21.8MiB/s (22.8MB/s-22.8MB/s), io=1026MiB (1076MB),
> > run=47104-47104msec
> >
> >
> > Dave
>
> Is that benchmark tool honoring 'iounit' to automatically run with max. I/O
> chunk sizes? What's that benchmark tool actually? And do you also see no
> improvement with a simple
>
> time cat largefile.dat > /dev/null
I am assuming that msize only helps with sequential I/O, not random
I/O.
Dave is running a random read and random write mix, and that's probably why
he is not seeing any improvement from increasing msize.
A sequential workload (such as "cat largefile.dat") should see an
improvement from a larger msize.
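A quick way to see that contrast on a mounted share is a sketch like the
following (the path, file size, and offsets are illustrative; `status=none`
assumes GNU dd and `$RANDOM` assumes bash):

```shell
# Create a test file on the mounted share first, e.g.:
#   dd if=/dev/zero of=/mnt/test.dat bs=1M count=1024

# Sequential read: large requests can carry up to msize bytes each,
# so a larger msize should help here.
dd if=/mnt/test.dat of=/dev/null bs=1M status=none

# Random 4k reads: each request stays at 4k regardless of msize,
# so raising msize should make little difference.
for i in $(seq 1 100); do
  dd if=/mnt/test.dat of=/dev/null bs=4k count=1 \
     skip=$((RANDOM % 1000)) status=none
done
```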
Thanks
Vivek
* Re: virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance)
2020-09-25 12:41 ` [Virtio-fs] " Dr. David Alan Gilbert
@ 2020-09-29 13:17 ` Vivek Goyal
-1 siblings, 0 replies; 107+ messages in thread
From: Vivek Goyal @ 2020-09-29 13:17 UTC (permalink / raw)
To: Dr. David Alan Gilbert
Cc: Venegas Munoz, Jose Carlos, qemu-devel, cdupontd, virtio-fs-list,
Stefan Hajnoczi, Shinde, Archana M
On Fri, Sep 25, 2020 at 01:41:39PM +0100, Dr. David Alan Gilbert wrote:
[..]
> So I'm still beating 9p; the thread-pool-size=1 seems to be great for
> read performance here.
>
Hi Dave,
I spent some time making changes to virtiofs-tests so that I can test
a mixed random read and random write workload. That test suite runs
each workload 3 times and reports the average, so I like to use it to
reduce run-to-run variation.
I ran the following to mimic Carlos's workload.
$ ./run-fio-test.sh test -direct=1 -c <test-dir> fio-jobs/randrw-psync.job > testresults.txt
$ ./parse-fio-results.sh testresults.txt
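For reference, the randrw-psync job used above is presumably something of
this shape (a hypothetical reconstruction; the actual job file, with its
real option values, lives in the virtiofs-tests repo):

```shell
# Write out a guess at fio-jobs/randrw-psync.job. The 4k block size
# matches the block size discussed elsewhere in this thread; the
# size/runtime values are made up.
cat > randrw-psync.job <<'EOF'
[randrw-psync]
ioengine=psync
rw=randrw
direct=1
bs=4k
size=2g
runtime=30s
EOF
```

It would then be run with fio against a directory on the mounted
filesystem, e.g. `fio --directory=/mnt randrw-psync.job`.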
I am using an SSD on the host to back these files. Option "-c" always
creates new files for testing.
Following are my results in various configurations. I used cache=mmap mode
for 9p and cache=auto (and cache=none) modes for virtiofs. I also tested
9p with the default msize as well as msize=16m, and virtiofs with both
exclusive and shared thread pools.
NAME                  WORKLOAD        Bandwidth       IOPS
9p-mmap-randrw        randrw-psync    42.8mb/14.3mb   10.7k/3666
9p-mmap-msize16m      randrw-psync    42.8mb/14.3mb   10.7k/3674
vtfs-auto-ex-randrw   randrw-psync    27.8mb/9547kb   7136/2386
vtfs-auto-sh-randrw   randrw-psync    43.3mb/14.4mb   10.8k/3709
vtfs-none-sh-randrw   randrw-psync    54.1mb/18.1mb   13.5k/4649
- Increasing msize to 16m did not help performance for this workload.
- The virtiofs exclusive thread pool ("ex") is slower than 9p.
- The virtiofs shared thread pool ("sh") matches the performance of 9p.
- virtiofs cache=none mode is faster than cache=auto mode for this
workload.
Carlos, I am looking at more ways to optimize virtiofs further.
In the meantime, I think switching to the "shared" thread pool should
bring you very close to 9p in your setup.
Thanks
Vivek
* Re: virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance)
2020-09-29 13:03 ` [Virtio-fs] " Vivek Goyal
@ 2020-09-29 13:28 ` Christian Schoenebeck
-1 siblings, 0 replies; 107+ messages in thread
From: Christian Schoenebeck @ 2020-09-29 13:28 UTC (permalink / raw)
To: qemu-devel
Cc: Vivek Goyal, Venegas Munoz, Jose Carlos, cdupontd,
virtio-fs-list, Stefan Hajnoczi, Shinde, Archana M,
Dr. David Alan Gilbert
On Dienstag, 29. September 2020 15:03:25 CEST Vivek Goyal wrote:
> On Sun, Sep 27, 2020 at 02:14:43PM +0200, Christian Schoenebeck wrote:
> > On Freitag, 25. September 2020 20:51:47 CEST Dr. David Alan Gilbert wrote:
> > > * Christian Schoenebeck (qemu_oss@crudebyte.com) wrote:
> > > > On Freitag, 25. September 2020 15:05:38 CEST Dr. David Alan Gilbert
wrote:
> > > > > > > 9p ( mount -t 9p -o trans=virtio kernel /mnt
> > > > > > > -oversion=9p2000.L,cache=mmap,msize=1048576 ) test: (g=0):
> > > > > > > rw=randrw,
> > > > > >
> > > > > > Bottleneck ------------------------------^
> > > > > >
> > > > > > By increasing 'msize' you would encounter better 9P I/O results.
> > > > >
> > > > > OK, I thought that was bigger than the default; what number should
> > > > > I
> > > > > use?
> > > >
> > > > It depends on the underlying storage hardware. In other words: you
> > > > have to
> > > > try increasing the 'msize' value to a point where you no longer notice
> > > > a
> > > > negative performance impact (or almost). Which is fortunately quite
> > > > easy
> > > > to test on
> > > >
> > > > guest like:
> > > > dd if=/dev/zero of=test.dat bs=1G count=12
> > > > time cat test.dat > /dev/null
> > > >
> > > > I would start with an absolute minimum msize of 10MB. I would
> > > > recommend
> > > > something around 100MB maybe for a mechanical hard drive. With a PCIe
> > > > flash
> > > > you probably would rather pick several hundred MB or even more.
> > > >
> > > > That unpleasant 'msize' issue is a limitation of the 9p protocol:
> > > > client
> > > > (guest) must suggest the value of msize on connection to server
> > > > (host).
> > > > Server can only lower, but not raise it. And the client in turn
> > > > obviously
> > > > cannot see host's storage device(s), so client is unable to pick a
> > > > good
> > > > value by itself. So it's a suboptimal handshake issue right now.
> > >
> > > It doesn't seem to be making a vast difference here:
> > >
> > >
> > >
> > > 9p mount -t 9p -o trans=virtio kernel /mnt
> > > -oversion=9p2000.L,cache=mmap,msize=104857600
> > >
> > > Run status group 0 (all jobs):
> > > READ: bw=62.5MiB/s (65.6MB/s), 62.5MiB/s-62.5MiB/s
> > > (65.6MB/s-65.6MB/s),
> > >
> > > io=3070MiB (3219MB), run=49099-49099msec WRITE: bw=20.9MiB/s (21.9MB/s),
> > > 20.9MiB/s-20.9MiB/s (21.9MB/s-21.9MB/s), io=1026MiB (1076MB),
> > > run=49099-49099msec
> > >
> > > 9p mount -t 9p -o trans=virtio kernel /mnt
> > > -oversion=9p2000.L,cache=mmap,msize=1048576000
> > >
> > > Run status group 0 (all jobs):
> > > READ: bw=65.2MiB/s (68.3MB/s), 65.2MiB/s-65.2MiB/s
> > > (68.3MB/s-68.3MB/s),
> > >
> > > io=3070MiB (3219MB), run=47104-47104msec WRITE: bw=21.8MiB/s (22.8MB/s),
> > > 21.8MiB/s-21.8MiB/s (22.8MB/s-22.8MB/s), io=1026MiB (1076MB),
> > > run=47104-47104msec
> > >
> > >
> > > Dave
> >
> > Is that benchmark tool honoring 'iounit' to automatically run with max.
> > I/O
> > chunk sizes? What's that benchmark tool actually? And do you also see no
> > improvement with a simple
> >
> > time cat largefile.dat > /dev/null
>
> I am assuming that msize only helps with sequential I/O and not random
> I/O.
>
> Dave is running random read and random write mix and probably that's why
> he is not seeing any improvement with msize increase.
>
> If we run sequential workload (as "cat largefile.dat"), that should
> see an improvement with msize increase.
>
> Thanks
> Vivek
Depends on what's randomized. If read chunk size is randomized, then yes, you
would probably see less performance increase compared to a simple
'cat foo.dat'.
If only the read position is randomized, but the read chunk size honors
iounit, a.k.a. stat's st_blksize (i.e. reading with the most efficient block
size advertised by 9P), then I would still expect a performance increase,
because seeking is a no/low-cost factor in this case: a seek in the guest OS
does not transmit a 9p message; the offset is instead passed with each
Tread message:
https://github.com/chaos/diod/blob/master/protocol.md
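The iounit/st_blksize probe described above can be sketched like this (the
path is illustrative; `stat -c %o`, which prints st_blksize, is GNU
coreutils):

```shell
# Read a file in chunks of the filesystem's advertised optimal I/O
# size (st_blksize). On a 9p mount honoring iounit this is the most
# efficient block size; /mnt/largefile.dat is a made-up path.
f=/mnt/largefile.dat
bs=$(stat -c %o "$f")
echo "st_blksize: $bs bytes"
dd if="$f" of=/dev/null bs="$bs" status=none
```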
I mean, yes, random seeks reduce I/O performance in general of course, but in
a direct performance comparison, the difference in overhead of the 9p vs.
virtiofs transport layer is most probably the most relevant aspect if
large I/O chunk sizes are used.
But OTOH: I haven't optimized anything in Tread handling in 9p (yet).
Best regards,
Christian Schoenebeck
* Re: [Virtio-fs] virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance)
2020-09-29 13:17 ` [Virtio-fs] " Vivek Goyal
@ 2020-09-29 13:49 ` Miklos Szeredi
-1 siblings, 0 replies; 107+ messages in thread
From: Miklos Szeredi @ 2020-09-29 13:49 UTC (permalink / raw)
To: Vivek Goyal
Cc: qemu-devel, Venegas Munoz, Jose Carlos, cdupontd,
Dr. David Alan Gilbert, virtio-fs-list, Shinde, Archana M
On Tue, Sep 29, 2020 at 3:18 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> - virtiofs cache=none mode is faster than cache=auto mode for this
> workload.
Not sure why. One cause could be that readahead is not perfect at
detecting the random pattern. Could we compare total I/O on the
server vs. total I/O by fio?
Thanks,
Miklos
* Re: virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance)
2020-09-29 13:28 ` [Virtio-fs] " Christian Schoenebeck
@ 2020-09-29 13:49 ` Vivek Goyal
-1 siblings, 0 replies; 107+ messages in thread
From: Vivek Goyal @ 2020-09-29 13:49 UTC (permalink / raw)
To: Christian Schoenebeck
Cc: Venegas Munoz, Jose Carlos, qemu-devel, cdupontd, virtio-fs-list,
Stefan Hajnoczi, Shinde, Archana M, Dr. David Alan Gilbert
On Tue, Sep 29, 2020 at 03:28:06PM +0200, Christian Schoenebeck wrote:
> On Dienstag, 29. September 2020 15:03:25 CEST Vivek Goyal wrote:
> > On Sun, Sep 27, 2020 at 02:14:43PM +0200, Christian Schoenebeck wrote:
> > > On Freitag, 25. September 2020 20:51:47 CEST Dr. David Alan Gilbert wrote:
> > > > * Christian Schoenebeck (qemu_oss@crudebyte.com) wrote:
> > > > > On Freitag, 25. September 2020 15:05:38 CEST Dr. David Alan Gilbert
> wrote:
> > > > > > > > 9p ( mount -t 9p -o trans=virtio kernel /mnt
> > > > > > > > -oversion=9p2000.L,cache=mmap,msize=1048576 ) test: (g=0):
> > > > > > > > rw=randrw,
> > > > > > >
> > > > > > > Bottleneck ------------------------------^
> > > > > > >
> > > > > > > By increasing 'msize' you would encounter better 9P I/O results.
> > > > > >
> > > > > > OK, I thought that was bigger than the default; what number should
> > > > > > I
> > > > > > use?
> > > > >
> > > > > It depends on the underlying storage hardware. In other words: you
> > > > > have to
> > > > > try increasing the 'msize' value to a point where you no longer notice
> > > > > a
> > > > > negative performance impact (or almost). Which is fortunately quite
> > > > > easy
> > > > > to test on
> > > > >
> > > > > guest like:
> > > > > dd if=/dev/zero of=test.dat bs=1G count=12
> > > > > time cat test.dat > /dev/null
> > > > >
> > > > > I would start with an absolute minimum msize of 10MB. I would
> > > > > recommend
> > > > > something around 100MB maybe for a mechanical hard drive. With a PCIe
> > > > > flash
> > > > > you probably would rather pick several hundred MB or even more.
> > > > >
> > > > > That unpleasant 'msize' issue is a limitation of the 9p protocol:
> > > > > client
> > > > > (guest) must suggest the value of msize on connection to server
> > > > > (host).
> > > > > Server can only lower, but not raise it. And the client in turn
> > > > > obviously
> > > > > cannot see host's storage device(s), so client is unable to pick a
> > > > > good
> > > > > value by itself. So it's a suboptimal handshake issue right now.
> > > >
> > > > It doesn't seem to be making a vast difference here:
> > > >
> > > >
> > > >
> > > > 9p mount -t 9p -o trans=virtio kernel /mnt
> > > > -oversion=9p2000.L,cache=mmap,msize=104857600
> > > >
> > > > Run status group 0 (all jobs):
> > > > READ: bw=62.5MiB/s (65.6MB/s), 62.5MiB/s-62.5MiB/s
> > > > (65.6MB/s-65.6MB/s),
> > > >
> > > > io=3070MiB (3219MB), run=49099-49099msec WRITE: bw=20.9MiB/s (21.9MB/s),
> > > > 20.9MiB/s-20.9MiB/s (21.9MB/s-21.9MB/s), io=1026MiB (1076MB),
> > > > run=49099-49099msec
> > > >
> > > > 9p mount -t 9p -o trans=virtio kernel /mnt
> > > > -oversion=9p2000.L,cache=mmap,msize=1048576000
> > > >
> > > > Run status group 0 (all jobs):
> > > > READ: bw=65.2MiB/s (68.3MB/s), 65.2MiB/s-65.2MiB/s
> > > > (68.3MB/s-68.3MB/s),
> > > >
> > > > io=3070MiB (3219MB), run=47104-47104msec WRITE: bw=21.8MiB/s (22.8MB/s),
> > > > 21.8MiB/s-21.8MiB/s (22.8MB/s-22.8MB/s), io=1026MiB (1076MB),
> > > > run=47104-47104msec
> > > >
> > > >
> > > > Dave
> > >
> > > Is that benchmark tool honoring 'iounit' to automatically run with max.
> > > I/O
> > > chunk sizes? What's that benchmark tool actually? And do you also see no
> > > improvement with a simple
> > >
> > > time cat largefile.dat > /dev/null
> >
> > I am assuming that msize only helps with sequential I/O and not random
> > I/O.
> >
> > Dave is running random read and random write mix and probably that's why
> > he is not seeing any improvement with msize increase.
> >
> > If we run sequential workload (as "cat largefile.dat"), that should
> > see an improvement with msize increase.
> >
> > Thanks
> > Vivek
>
> Depends on what's randomized. If read chunk size is randomized, then yes, you
> would probably see less performance increase compared to a simple
> 'cat foo.dat'.
We are using "fio" for testing, and the read chunk size is not being
randomized. The chunk size (block size) is fixed at 4K for these tests.
>
> If only the read position is randomized, but the read chunk size honors
> iounit, a.k.a. stat's st_blksize (i.e. reading with the most efficient block
> size advertised by 9P), then I would assume still seeing a performance
> increase.
Yes, we are randomizing the read position. But there is no notion of looking
at st_blksize; it is fixed at 4K (notice option --bs=4k on the fio
command line).
> Because seeking is a no/low cost factor in this case. The guest OS
> seeking does not transmit a 9p message. The offset is rather passed with any
> Tread message instead:
> https://github.com/chaos/diod/blob/master/protocol.md
>
> I mean, yes, random seeks reduce I/O performance in general of course, but in
> direct performance comparison, the difference in overhead of the 9p vs.
> virtiofs network controller layer is most probably the most relevant aspect if
> large I/O chunk sizes are used.
>
Agreed that a large I/O chunk size will help with the performance numbers.
But the idea is to intentionally use a smaller I/O chunk size in some of
the tests to measure how efficient the communication path is.
Thanks
Vivek
* Re: virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance)
2020-09-29 13:49 ` [Virtio-fs] " Vivek Goyal
@ 2020-09-29 13:59 ` Christian Schoenebeck
-1 siblings, 0 replies; 107+ messages in thread
From: Christian Schoenebeck @ 2020-09-29 13:59 UTC (permalink / raw)
To: qemu-devel
Cc: Vivek Goyal, Venegas Munoz, Jose Carlos, cdupontd,
virtio-fs-list, Stefan Hajnoczi, Shinde, Archana M,
Dr. David Alan Gilbert
On Dienstag, 29. September 2020 15:49:42 CEST Vivek Goyal wrote:
> > Depends on what's randomized. If read chunk size is randomized, then yes,
> > you would probably see less performance increase compared to a simple
> > 'cat foo.dat'.
>
> We are using "fio" for testing and read chunk size is not being
> randomized. chunk size (block size) is fixed at 4K size for these tests.
Good to know, thanks!
> > If only the read position is randomized, but the read chunk size honors
> > iounit, a.k.a. stat's st_blksize (i.e. reading with the most efficient
> > block size advertised by 9P), then I would assume still seeing a
> > performance increase.
>
> Yes, we are randomizing the read position. But there is no notion of looking
> at st_blksize. It's fixed at 4K (note the --bs=4k option on the fio
> command line).
Ah ok, then the results make sense.
With these block sizes you will indeed see a performance penalty with 9p,
caused by several thread hops in Tread handling; a fix for that is planned.
Best regards,
Christian Schoenebeck
* Re: [Virtio-fs] virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance)
2020-09-29 13:49 ` Miklos Szeredi
@ 2020-09-29 14:01 ` Vivek Goyal
-1 siblings, 0 replies; 107+ messages in thread
From: Vivek Goyal @ 2020-09-29 14:01 UTC (permalink / raw)
To: Miklos Szeredi
Cc: qemu-devel, Venegas Munoz, Jose Carlos, cdupontd,
Dr. David Alan Gilbert, virtio-fs-list, Shinde, Archana M
On Tue, Sep 29, 2020 at 03:49:04PM +0200, Miklos Szeredi wrote:
> On Tue, Sep 29, 2020 at 3:18 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> > - virtiofs cache=none mode is faster than cache=auto mode for this
> > workload.
>
> Not sure why. One cause could be that readahead is not perfect at
> detecting the random pattern. Could we compare total I/O on the
> server vs. total I/O by fio?
Hi Miklos,
I will instrument the virtiofsd code to figure out the total I/O.
One more potential issue I am staring at is refreshing the attrs on
READ if fc->auto_inval_data is set.
fuse_cache_read_iter() {
	/*
	 * In auto invalidate mode, always update attributes on read.
	 * Otherwise, only update if we attempt to read past EOF (to ensure
	 * i_size is up to date).
	 */
	if (fc->auto_inval_data ||
	    (iocb->ki_pos + iov_iter_count(to) > i_size_read(inode))) {
		int err;

		err = fuse_update_attributes(inode, iocb->ki_filp);
		if (err)
			return err;
	}
}
Given this is a mixed READ/WRITE workload, every WRITE will invalidate
attrs. And next READ will first do GETATTR() from server (and potentially
invalidate page cache) before doing READ.
This sounds suboptimal especially from the point of view of WRITEs
done by this client itself. I mean if another client has modified
the file, then doing GETATTR after a second makes sense. But there
should be some optimization to make sure our own WRITEs don't end
up doing GETATTR and invalidate page cache (because cache contents
are still valid).
I disabled ->auto_inval_data and that seemed to result in an 8-10%
performance gain for this workload.
Thanks
Vivek
* Re: [Virtio-fs] virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance)
2020-09-29 14:01 ` Vivek Goyal
@ 2020-09-29 14:54 ` Miklos Szeredi
-1 siblings, 0 replies; 107+ messages in thread
From: Miklos Szeredi @ 2020-09-29 14:54 UTC (permalink / raw)
To: Vivek Goyal
Cc: qemu-devel, Venegas Munoz, Jose Carlos, cdupontd,
Dr. David Alan Gilbert, virtio-fs-list, Shinde, Archana M
On Tue, Sep 29, 2020 at 4:01 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Tue, Sep 29, 2020 at 03:49:04PM +0200, Miklos Szeredi wrote:
> > On Tue, Sep 29, 2020 at 3:18 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > > - virtiofs cache=none mode is faster than cache=auto mode for this
> > > workload.
> >
> > Not sure why. One cause could be that readahead is not perfect at
> > detecting the random pattern. Could we compare total I/O on the
> > server vs. total I/O by fio?
>
> Hi Miklos,
>
> I will instrument virtiosd code to figure out total I/O.
>
> One more potential issue I am staring at is refreshing the attrs on
> READ if fc->auto_inval_data is set.
>
> fuse_cache_read_iter() {
> 	/*
> 	 * In auto invalidate mode, always update attributes on read.
> 	 * Otherwise, only update if we attempt to read past EOF (to ensure
> 	 * i_size is up to date).
> 	 */
> 	if (fc->auto_inval_data ||
> 	    (iocb->ki_pos + iov_iter_count(to) > i_size_read(inode))) {
> 		int err;
>
> 		err = fuse_update_attributes(inode, iocb->ki_filp);
> 		if (err)
> 			return err;
> 	}
> }
>
> Given this is a mixed READ/WRITE workload, every WRITE will invalidate
> attrs. And next READ will first do GETATTR() from server (and potentially
> invalidate page cache) before doing READ.
>
> This sounds suboptimal especially from the point of view of WRITEs
> done by this client itself. I mean if another client has modified
> the file, then doing GETATTR after a second makes sense. But there
> should be some optimization to make sure our own WRITEs don't end
> up doing GETATTR and invalidate page cache (because cache contents
> are still valid).
Yeah, that sucks.
> I disabled ->auto_inval_data and that seemed to result in an 8-10%
> performance gain for this workload.
Need to wrap my head around these caching issues.
Thanks,
Miklos
* Re: [Virtio-fs] virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance)
2020-09-29 13:49 ` Miklos Szeredi
@ 2020-09-29 15:28 ` Vivek Goyal
-1 siblings, 0 replies; 107+ messages in thread
From: Vivek Goyal @ 2020-09-29 15:28 UTC (permalink / raw)
To: Miklos Szeredi
Cc: qemu-devel, Venegas Munoz, Jose Carlos, cdupontd,
Dr. David Alan Gilbert, virtio-fs-list, Shinde, Archana M
On Tue, Sep 29, 2020 at 03:49:04PM +0200, Miklos Szeredi wrote:
> On Tue, Sep 29, 2020 at 3:18 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> > - virtiofs cache=none mode is faster than cache=auto mode for this
> > workload.
>
> Not sure why. One cause could be that readahead is not perfect at
> detecting the random pattern. Could we compare total I/O on the
> server vs. total I/O by fio?
Ran tests with auto_inval_data disabled and compared with other results.
NAME                  WORKLOAD      Bandwidth(r/w)  IOPS(r/w)
vtfs-auto-ex-randrw   randrw-psync  27.8mb/9547kb   7136/2386
vtfs-auto-sh-randrw   randrw-psync  43.3mb/14.4mb   10.8k/3709
vtfs-auto-sh-noinval  randrw-psync  50.5mb/16.9mb   12.6k/4330
vtfs-none-sh-randrw   randrw-psync  54.1mb/18.1mb   13.5k/4649
With auto_inval_data disabled, this time I saw around a 20% performance jump
in READ, which is now much closer to cache=none performance.
Thanks
Vivek
* Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance)
2020-09-25 8:06 ` [Virtio-fs] " Christian Schoenebeck
@ 2021-02-19 16:08 ` Vivek Goyal
-1 siblings, 0 replies; 107+ messages in thread
From: Vivek Goyal @ 2021-02-19 16:08 UTC (permalink / raw)
To: Christian Schoenebeck
Cc: Shinde, Archana M, Venegas Munoz, Jose Carlos, qemu-devel,
Dr. David Alan Gilbert, virtio-fs-list, Greg Kurz,
Stefan Hajnoczi, cdupontd
On Fri, Sep 25, 2020 at 10:06:41AM +0200, Christian Schoenebeck wrote:
> On Freitag, 25. September 2020 00:10:23 CEST Vivek Goyal wrote:
> > In my testing, with cache=none, virtiofs performed better than 9p in
> > all the fio jobs I was running. For the case of cache=auto for virtiofs
> > (with xattr enabled), 9p performed better in certain write workloads. I
> > have identified root cause of that problem and working on
> > HANDLE_KILLPRIV_V2 patches to improve WRITE performance of virtiofs
> > with cache=auto and xattr enabled.
>
> Please note, when it comes to performance aspects, you should set a reasonable
> high value for 'msize' on 9p client side:
> https://wiki.qemu.org/Documentation/9psetup#msize
Hi Christian,
I am not able to set msize to a higher value. If I try to specify an
msize of 16MB and then read back msize from /proc/mounts, it seems to
cap it at 512000. Is that intended?
$ mount -t 9p -o trans=virtio,version=9p2000.L,cache=none,msize=16777216 hostShared /mnt/virtio-9p
$ cat /proc/mounts | grep 9p
hostShared /mnt/virtio-9p 9p rw,sync,dirsync,relatime,access=client,msize=512000,trans=virtio 0 0
I am using 5.11 kernel.
Thanks
Vivek
* Re: Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance)
2021-02-19 16:08 ` [Virtio-fs] " Vivek Goyal
@ 2021-02-19 17:33 ` Christian Schoenebeck
-1 siblings, 0 replies; 107+ messages in thread
From: Christian Schoenebeck @ 2021-02-19 17:33 UTC (permalink / raw)
To: qemu-devel
Cc: Vivek Goyal, Shinde, Archana M, Venegas Munoz, Jose Carlos,
Dr. David Alan Gilbert, virtio-fs-list, Greg Kurz,
Stefan Hajnoczi, cdupontd
On Freitag, 19. Februar 2021 17:08:48 CET Vivek Goyal wrote:
> On Fri, Sep 25, 2020 at 10:06:41AM +0200, Christian Schoenebeck wrote:
> > On Freitag, 25. September 2020 00:10:23 CEST Vivek Goyal wrote:
> > > In my testing, with cache=none, virtiofs performed better than 9p in
> > > all the fio jobs I was running. For the case of cache=auto for virtiofs
> > > (with xattr enabled), 9p performed better in certain write workloads. I
> > > have identified root cause of that problem and working on
> > > HANDLE_KILLPRIV_V2 patches to improve WRITE performance of virtiofs
> > > with cache=auto and xattr enabled.
> >
> > Please note, when it comes to performance aspects, you should set a
> > reasonable high value for 'msize' on 9p client side:
> > https://wiki.qemu.org/Documentation/9psetup#msize
>
> Hi Christian,
>
> I am not able to set msize to a higher value. If I try to specify an
> msize of 16MB and then read back msize from /proc/mounts, it seems to
> cap it at 512000. Is that intended?
The 9p server side in QEMU does not perform any msize capping. The code in
this case is very simple; it's just what you see in function v9fs_version():
https://github.com/qemu/qemu/blob/6de76c5f324904c93e69f9a1e8e4fd0bd6f6b57a/hw/9pfs/9p.c#L1332
> $ mount -t 9p -o trans=virtio,version=9p2000.L,cache=none,msize=16777216
> hostShared /mnt/virtio-9p
>
> $ cat /proc/mounts | grep 9p
> hostShared /mnt/virtio-9p 9p
> rw,sync,dirsync,relatime,access=client,msize=512000,trans=virtio 0 0
>
> I am using 5.11 kernel.
Must be something on the client (guest kernel) side. I don't see this
happening here with guest kernel 4.9.0 and my setup in a quick test:
$ cat /etc/mtab | grep 9p
svnRoot / 9p rw,dirsync,relatime,trans=virtio,version=9p2000.L,msize=104857600,cache=mmap 0 0
$
Looks like the root cause of your issue is this:
struct p9_client *p9_client_create(const char *dev_name, char *options)
{
	...
	if (clnt->msize > clnt->trans_mod->maxsize)
		clnt->msize = clnt->trans_mod->maxsize;
https://github.com/torvalds/linux/blob/f40ddce88593482919761f74910f42f4b84c004b/net/9p/client.c#L1045
Best regards,
Christian Schoenebeck
* Re: Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance)
2021-02-19 17:33 ` [Virtio-fs] " Christian Schoenebeck
@ 2021-02-19 19:01 ` Vivek Goyal
-1 siblings, 0 replies; 107+ messages in thread
From: Vivek Goyal @ 2021-02-19 19:01 UTC (permalink / raw)
To: Christian Schoenebeck
Cc: cdupontd, Venegas Munoz, Jose Carlos, Greg Kurz, qemu-devel,
virtio-fs-list, Stefan Hajnoczi, Shinde, Archana M,
Dr. David Alan Gilbert
On Fri, Feb 19, 2021 at 06:33:46PM +0100, Christian Schoenebeck wrote:
> On Freitag, 19. Februar 2021 17:08:48 CET Vivek Goyal wrote:
> > On Fri, Sep 25, 2020 at 10:06:41AM +0200, Christian Schoenebeck wrote:
> > > On Freitag, 25. September 2020 00:10:23 CEST Vivek Goyal wrote:
> > > > In my testing, with cache=none, virtiofs performed better than 9p in
> > > > all the fio jobs I was running. For the case of cache=auto for virtiofs
> > > > (with xattr enabled), 9p performed better in certain write workloads. I
> > > > have identified root cause of that problem and working on
> > > > HANDLE_KILLPRIV_V2 patches to improve WRITE performance of virtiofs
> > > > with cache=auto and xattr enabled.
> > >
> > > Please note, when it comes to performance aspects, you should set a
> > > reasonable high value for 'msize' on 9p client side:
> > > https://wiki.qemu.org/Documentation/9psetup#msize
> >
> > Hi Christian,
> >
> > I am not able to set msize to a higher value. If I try to specify an
> > msize of 16MB and then read back msize from /proc/mounts, it seems to
> > cap it at 512000. Is that intended?
>
> 9p server side in QEMU does not perform any msize capping. The code in this
> case is very simple, it's just what you see in function v9fs_version():
>
> https://github.com/qemu/qemu/blob/6de76c5f324904c93e69f9a1e8e4fd0bd6f6b57a/hw/9pfs/9p.c#L1332
>
> > $ mount -t 9p -o trans=virtio,version=9p2000.L,cache=none,msize=16777216
> > hostShared /mnt/virtio-9p
> >
> > $ cat /proc/mounts | grep 9p
> > hostShared /mnt/virtio-9p 9p
> > rw,sync,dirsync,relatime,access=client,msize=512000,trans=virtio 0 0
> >
> > I am using 5.11 kernel.
>
> Must be something on the client (guest kernel) side. I don't see this
> happening here with guest kernel 4.9.0 and my setup in a quick test:
>
> $ cat /etc/mtab | grep 9p
> svnRoot / 9p rw,dirsync,relatime,trans=virtio,version=9p2000.L,msize=104857600,cache=mmap 0 0
> $
>
> Looks like the root cause of your issue is this:
>
> struct p9_client *p9_client_create(const char *dev_name, char *options)
> {
> 	...
> 	if (clnt->msize > clnt->trans_mod->maxsize)
> 		clnt->msize = clnt->trans_mod->maxsize;
>
> https://github.com/torvalds/linux/blob/f40ddce88593482919761f74910f42f4b84c004b/net/9p/client.c#L1045
That was introduced by a patch in 2011:
commit c9ffb05ca5b5098d6ea468c909dd384d90da7d54
Author: Venkateswararao Jujjuri (JV) <jvrao@linux.vnet.ibm.com>
Date: Wed Jun 29 18:06:33 2011 -0700
net/9p: Fix the msize calculation.
msize represents the maximum PDU size that includes P9_IOHDRSZ.
Your kernel 4.9 is newer than this, so most likely you have this commit
too. I will spend some time later trying to debug this.
Vivek
* Re: [Virtio-fs] Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance)
@ 2021-02-19 19:01 ` Vivek Goyal
0 siblings, 0 replies; 107+ messages in thread
From: Vivek Goyal @ 2021-02-19 19:01 UTC (permalink / raw)
To: Christian Schoenebeck
Cc: cdupontd, Venegas Munoz, Jose Carlos, qemu-devel, virtio-fs-list,
Shinde, Archana M
On Fri, Feb 19, 2021 at 06:33:46PM +0100, Christian Schoenebeck wrote:
> On Freitag, 19. Februar 2021 17:08:48 CET Vivek Goyal wrote:
> > On Fri, Sep 25, 2020 at 10:06:41AM +0200, Christian Schoenebeck wrote:
> > > On Freitag, 25. September 2020 00:10:23 CEST Vivek Goyal wrote:
> > > > In my testing, with cache=none, virtiofs performed better than 9p in
> > > > all the fio jobs I was running. For the case of cache=auto for virtiofs
> > > > (with xattr enabled), 9p performed better in certain write workloads. I
> > > > have identified root cause of that problem and working on
> > > > HANDLE_KILLPRIV_V2 patches to improve WRITE performance of virtiofs
> > > > with cache=auto and xattr enabled.
> > >
> > > Please note, when it comes to performance aspects, you should set a
> > > reasonable high value for 'msize' on 9p client side:
> > > https://wiki.qemu.org/Documentation/9psetup#msize
> >
> > Hi Christian,
> >
> > I am not able to set msize to a higher value. If I try to specify msize
> > 16MB, and then read back msize from /proc/mounts, it sees to cap it
> > at 512000. Is that intended?
>
> 9p server side in QEMU does not perform any msize capping. The code in this
> case is very simple, it's just what you see in function v9fs_version():
>
> https://github.com/qemu/qemu/blob/6de76c5f324904c93e69f9a1e8e4fd0bd6f6b57a/hw/9pfs/9p.c#L1332
>
> > $ mount -t 9p -o trans=virtio,version=9p2000.L,cache=none,msize=16777216
> > hostShared /mnt/virtio-9p
> >
> > $ cat /proc/mounts | grep 9p
> > hostShared /mnt/virtio-9p 9p
> > rw,sync,dirsync,relatime,access=client,msize=512000,trans=virtio 0 0
> >
> > I am using 5.11 kernel.
>
> Must be something on client (guest kernel) side. I don't see this here with
> guest kernel 4.9.0 happening with my setup in a quick test:
>
> $ cat /etc/mtab | grep 9p
> svnRoot / 9p rw,dirsync,relatime,trans=virtio,version=9p2000.L,msize=104857600,cache=mmap 0 0
> $
>
> Looks like the root cause of your issue is this:
>
> struct p9_client *p9_client_create(const char *dev_name, char *options)
> {
> ...
> if (clnt->msize > clnt->trans_mod->maxsize)
> clnt->msize = clnt->trans_mod->maxsize;
>
> https://github.com/torvalds/linux/blob/f40ddce88593482919761f74910f42f4b84c004b/net/9p/client.c#L1045
That was introduced by a patch in 2011.
commit c9ffb05ca5b5098d6ea468c909dd384d90da7d54
Author: Venkateswararao Jujjuri (JV) <jvrao@linux.vnet.ibm.com>
Date: Wed Jun 29 18:06:33 2011 -0700
net/9p: Fix the msize calculation.
msize represents the maximum PDU size that includes P9_IOHDRSZ.
Your kernel 4.9 is newer than this, so most likely you have this commit
too. I will spend some time later trying to debug this.
Vivek
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance)
2021-02-19 19:01 ` [Virtio-fs] " Vivek Goyal
@ 2021-02-20 15:38 ` Christian Schoenebeck
-1 siblings, 0 replies; 107+ messages in thread
From: Christian Schoenebeck @ 2021-02-20 15:38 UTC (permalink / raw)
To: qemu-devel
Cc: Vivek Goyal, cdupontd, Venegas Munoz, Jose Carlos, Greg Kurz,
virtio-fs-list, Stefan Hajnoczi, Shinde, Archana M,
Dr. David Alan Gilbert
On Freitag, 19. Februar 2021 20:01:12 CET Vivek Goyal wrote:
> On Fri, Feb 19, 2021 at 06:33:46PM +0100, Christian Schoenebeck wrote:
> > On Freitag, 19. Februar 2021 17:08:48 CET Vivek Goyal wrote:
> > > On Fri, Sep 25, 2020 at 10:06:41AM +0200, Christian Schoenebeck wrote:
> > > > On Freitag, 25. September 2020 00:10:23 CEST Vivek Goyal wrote:
> > > > > In my testing, with cache=none, virtiofs performed better than 9p in
> > > > > all the fio jobs I was running. For the case of cache=auto for
> > > > > virtiofs
> > > > > (with xattr enabled), 9p performed better in certain write
> > > > > workloads. I
> > > > > have identified root cause of that problem and working on
> > > > > HANDLE_KILLPRIV_V2 patches to improve WRITE performance of virtiofs
> > > > > with cache=auto and xattr enabled.
> > > >
> > > > Please note, when it comes to performance aspects, you should set a
> > > > reasonably high value for 'msize' on 9p client side:
> > > > https://wiki.qemu.org/Documentation/9psetup#msize
> > >
> > > Hi Christian,
> > >
> > > I am not able to set msize to a higher value. If I try to specify msize
> > > 16MB, and then read back msize from /proc/mounts, it seems to cap it
> > > at 512000. Is that intended?
> >
> > 9p server side in QEMU does not perform any msize capping. The code in
> > this
> > case is very simple, it's just what you see in function v9fs_version():
> >
> > https://github.com/qemu/qemu/blob/6de76c5f324904c93e69f9a1e8e4fd0bd6f6b57a
> > /hw/9pfs/9p.c#L1332>
> > > $ mount -t 9p -o trans=virtio,version=9p2000.L,cache=none,msize=16777216
> > > hostShared /mnt/virtio-9p
> > >
> > > $ cat /proc/mounts | grep 9p
> > > hostShared /mnt/virtio-9p 9p
> > > rw,sync,dirsync,relatime,access=client,msize=512000,trans=virtio 0 0
> > >
> > > I am using 5.11 kernel.
> >
> > Must be something on client (guest kernel) side. I don't see this here
> > with
> > guest kernel 4.9.0 happening with my setup in a quick test:
> >
> > $ cat /etc/mtab | grep 9p
> > svnRoot / 9p
> > rw,dirsync,relatime,trans=virtio,version=9p2000.L,msize=104857600,cache=m
> > map 0 0 $
> >
> > Looks like the root cause of your issue is this:
> >
> > struct p9_client *p9_client_create(const char *dev_name, char *options)
> > {
> >
> > ...
> > if (clnt->msize > clnt->trans_mod->maxsize)
> >
> > clnt->msize = clnt->trans_mod->maxsize;
> >
> > https://github.com/torvalds/linux/blob/f40ddce88593482919761f74910f42f4b84
> > c004b/net/9p/client.c#L1045
> That was introduced by a patch in 2011.
>
> commit c9ffb05ca5b5098d6ea468c909dd384d90da7d54
> Author: Venkateswararao Jujjuri (JV) <jvrao@linux.vnet.ibm.com>
> Date: Wed Jun 29 18:06:33 2011 -0700
>
> net/9p: Fix the msize calculation.
>
> msize represents the maximum PDU size that includes P9_IOHDRSZ.
>
>
> Your kernel 4.9 is newer than this, so most likely you have this commit
> too. I will spend some time later trying to debug this.
>
> Vivek
As the kernel code says trans_mod->maxsize, maybe it's something in virtio on
qemu side that does an automatic step back for some reason. I don't see
something in the 9pfs virtio transport driver (hw/9pfs/virtio-9p-device.c on
QEMU side) that would do this, so I would also need to dig deeper.
Do you have some RAM limitation in your setup somewhere?
For comparison, this is how I started the VM:
~/git/qemu/build/qemu-system-x86_64 \
-machine pc,accel=kvm,usb=off,dump-guest-core=off -m 2048 \
-smp 4,sockets=4,cores=1,threads=1 -rtc base=utc \
-boot strict=on -kernel /home/bee/vm/stretch/boot/vmlinuz-4.9.0-13-amd64 \
-initrd /home/bee/vm/stretch/boot/initrd.img-4.9.0-13-amd64 \
-append 'root=svnRoot rw rootfstype=9p rootflags=trans=virtio,version=9p2000.L,msize=104857600,cache=mmap console=ttyS0' \
-fsdev local,security_model=mapped,multidevs=remap,id=fsdev-fs0,path=/home/bee/vm/stretch/ \
-device virtio-9p-pci,id=fs0,fsdev=fsdev-fs0,mount_tag=svnRoot \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-nographic
So the guest system is running entirely and solely on top of 9pfs (as root fs),
and hence it's mounted by the above command line, i.e. immediately when the
guest is booted, and RAM size is set to 2 GB.
Best regards,
Christian Schoenebeck
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance)
2021-02-20 15:38 ` [Virtio-fs] " Christian Schoenebeck
@ 2021-02-22 12:18 ` Greg Kurz
-1 siblings, 0 replies; 107+ messages in thread
From: Greg Kurz @ 2021-02-22 12:18 UTC (permalink / raw)
To: Christian Schoenebeck
Cc: Venegas Munoz, Jose Carlos, qemu-devel, cdupontd, virtio-fs-list,
Dr. David Alan Gilbert, Stefan Hajnoczi, Shinde, Archana M,
Vivek Goyal
On Sat, 20 Feb 2021 16:38:35 +0100
Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:
> On Freitag, 19. Februar 2021 20:01:12 CET Vivek Goyal wrote:
> > On Fri, Feb 19, 2021 at 06:33:46PM +0100, Christian Schoenebeck wrote:
> > > On Freitag, 19. Februar 2021 17:08:48 CET Vivek Goyal wrote:
> > > > On Fri, Sep 25, 2020 at 10:06:41AM +0200, Christian Schoenebeck wrote:
> > > > > On Freitag, 25. September 2020 00:10:23 CEST Vivek Goyal wrote:
> > > > > > In my testing, with cache=none, virtiofs performed better than 9p in
> > > > > > all the fio jobs I was running. For the case of cache=auto for
> > > > > > virtiofs
> > > > > > (with xattr enabled), 9p performed better in certain write
> > > > > > workloads. I
> > > > > > have identified root cause of that problem and working on
> > > > > > HANDLE_KILLPRIV_V2 patches to improve WRITE performance of virtiofs
> > > > > > with cache=auto and xattr enabled.
> > > > >
> > > > > Please note, when it comes to performance aspects, you should set a
> > > > > reasonably high value for 'msize' on 9p client side:
> > > > > https://wiki.qemu.org/Documentation/9psetup#msize
> > > >
> > > > Hi Christian,
> > > >
> > > > I am not able to set msize to a higher value. If I try to specify msize
> > > > 16MB, and then read back msize from /proc/mounts, it seems to cap it
> > > > at 512000. Is that intended?
> > >
> > > 9p server side in QEMU does not perform any msize capping. The code in
> > > this
> > > case is very simple, it's just what you see in function v9fs_version():
> > >
> > > https://github.com/qemu/qemu/blob/6de76c5f324904c93e69f9a1e8e4fd0bd6f6b57a
> > > /hw/9pfs/9p.c#L1332>
> > > > $ mount -t 9p -o trans=virtio,version=9p2000.L,cache=none,msize=16777216
> > > > hostShared /mnt/virtio-9p
> > > >
> > > > $ cat /proc/mounts | grep 9p
> > > > hostShared /mnt/virtio-9p 9p
> > > > rw,sync,dirsync,relatime,access=client,msize=512000,trans=virtio 0 0
> > > >
> > > > I am using 5.11 kernel.
> > >
> > > Must be something on client (guest kernel) side. I don't see this here
> > > with
> > > guest kernel 4.9.0 happening with my setup in a quick test:
> > >
> > > $ cat /etc/mtab | grep 9p
> > > svnRoot / 9p
> > > rw,dirsync,relatime,trans=virtio,version=9p2000.L,msize=104857600,cache=m
> > > map 0 0 $
> > >
> > > Looks like the root cause of your issue is this:
> > >
> > > struct p9_client *p9_client_create(const char *dev_name, char *options)
> > > {
> > >
> > > ...
> > > if (clnt->msize > clnt->trans_mod->maxsize)
> > >
> > > clnt->msize = clnt->trans_mod->maxsize;
> > >
> > > https://github.com/torvalds/linux/blob/f40ddce88593482919761f74910f42f4b84
> > > c004b/net/9p/client.c#L1045
> > That was introduced by a patch in 2011.
> >
> > commit c9ffb05ca5b5098d6ea468c909dd384d90da7d54
> > Author: Venkateswararao Jujjuri (JV) <jvrao@linux.vnet.ibm.com>
> > Date: Wed Jun 29 18:06:33 2011 -0700
> >
> > net/9p: Fix the msize calculation.
> >
> > msize represents the maximum PDU size that includes P9_IOHDRSZ.
> >
> >
> > Your kernel 4.9 is newer than this, so most likely you have this commit
> > too. I will spend some time later trying to debug this.
> >
> > Vivek
>
Hi Vivek and Christian,
I can reproduce this with an up-to-date Fedora Rawhide guest.
Capping comes from here:
net/9p/trans_virtio.c: .maxsize = PAGE_SIZE * (VIRTQUEUE_NUM - 3),
i.e. 4096 * (128 - 3) == 512000
AFAICT this has been around since 2011, i.e. always for me as a
maintainer and I admit I had never tried such high msize settings
before.
commit b49d8b5d7007a673796f3f99688b46931293873e
Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Date: Wed Aug 17 16:56:04 2011 +0000
net/9p: Fix kernel crash with msize 512K
With msize equal to 512K (PAGE_SIZE * VIRTQUEUE_NUM), we hit multiple
crashes. This patch fix those.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
The changelog doesn't help much, but it looks like it was a band-aid
for some more severe issues.
> As the kernel code says trans_mod->maxsize, maybe it's something in virtio on
> qemu side that does an automatic step back for some reason. I don't see
> something in the 9pfs virtio transport driver (hw/9pfs/virtio-9p-device.c on
> QEMU side) that would do this, so I would also need to dig deeper.
>
> Do you have some RAM limitation in your setup somewhere?
>
> For comparison, this is how I started the VM:
>
> ~/git/qemu/build/qemu-system-x86_64 \
> -machine pc,accel=kvm,usb=off,dump-guest-core=off -m 2048 \
> -smp 4,sockets=4,cores=1,threads=1 -rtc base=utc \
> -boot strict=on -kernel /home/bee/vm/stretch/boot/vmlinuz-4.9.0-13-amd64 \
> -initrd /home/bee/vm/stretch/boot/initrd.img-4.9.0-13-amd64 \
> -append 'root=svnRoot rw rootfstype=9p rootflags=trans=virtio,version=9p2000.L,msize=104857600,cache=mmap console=ttyS0' \
The first obvious difference I see between your setup and mine is that
you're mounting the 9pfs as root from the kernel command line. Maybe this
somehow has an impact on the check in p9_client_create()?
Can you reproduce with a scenario like Vivek's?
> -fsdev local,security_model=mapped,multidevs=remap,id=fsdev-fs0,path=/home/bee/vm/stretch/ \
> -device virtio-9p-pci,id=fs0,fsdev=fsdev-fs0,mount_tag=svnRoot \
> -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
> -nographic
>
> So the guest system is running entirely and solely on top of 9pfs (as root fs),
> and hence it's mounted by the above command line, i.e. immediately when the
> guest is booted, and RAM size is set to 2 GB.
>
> Best regards,
> Christian Schoenebeck
>
>
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance)
2021-02-22 12:18 ` [Virtio-fs] " Greg Kurz
@ 2021-02-22 15:08 ` Christian Schoenebeck
-1 siblings, 0 replies; 107+ messages in thread
From: Christian Schoenebeck @ 2021-02-22 15:08 UTC (permalink / raw)
To: qemu-devel
Cc: Greg Kurz, Venegas Munoz, Jose Carlos, cdupontd, virtio-fs-list,
Dr. David Alan Gilbert, Stefan Hajnoczi, Shinde, Archana M,
Vivek Goyal
On Montag, 22. Februar 2021 13:18:14 CET Greg Kurz wrote:
> On Sat, 20 Feb 2021 16:38:35 +0100
>
> Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:
> > On Freitag, 19. Februar 2021 20:01:12 CET Vivek Goyal wrote:
> > > On Fri, Feb 19, 2021 at 06:33:46PM +0100, Christian Schoenebeck wrote:
> > > > On Freitag, 19. Februar 2021 17:08:48 CET Vivek Goyal wrote:
> > > > > On Fri, Sep 25, 2020 at 10:06:41AM +0200, Christian Schoenebeck
wrote:
> > > > > > On Freitag, 25. September 2020 00:10:23 CEST Vivek Goyal wrote:
> > > > > > > In my testing, with cache=none, virtiofs performed better than
> > > > > > > 9p in
> > > > > > > all the fio jobs I was running. For the case of cache=auto for
> > > > > > > virtiofs
> > > > > > > (with xattr enabled), 9p performed better in certain write
> > > > > > > workloads. I
> > > > > > > have identified root cause of that problem and working on
> > > > > > > HANDLE_KILLPRIV_V2 patches to improve WRITE performance of
> > > > > > > virtiofs
> > > > > > > with cache=auto and xattr enabled.
> > > > > >
> > > > > > Please note, when it comes to performance aspects, you should set
> > > > > > a
> > > > > > reasonably high value for 'msize' on 9p client side:
> > > > > > https://wiki.qemu.org/Documentation/9psetup#msize
> > > > >
> > > > > Hi Christian,
> > > > >
> > > > > I am not able to set msize to a higher value. If I try to specify
> > > > > msize
> > > > > 16MB, and then read back msize from /proc/mounts, it seems to cap it
> > > > > at 512000. Is that intended?
> > > >
> > > > 9p server side in QEMU does not perform any msize capping. The code in
> > > > this
> > > > case is very simple, it's just what you see in function
> > > > v9fs_version():
> > > >
> > > > https://github.com/qemu/qemu/blob/6de76c5f324904c93e69f9a1e8e4fd0bd6f6
> > > > b57a
> > > > /hw/9pfs/9p.c#L1332>
> > > >
> > > > > $ mount -t 9p -o
> > > > > trans=virtio,version=9p2000.L,cache=none,msize=16777216
> > > > > hostShared /mnt/virtio-9p
> > > > >
> > > > > $ cat /proc/mounts | grep 9p
> > > > > hostShared /mnt/virtio-9p 9p
> > > > > rw,sync,dirsync,relatime,access=client,msize=512000,trans=virtio 0 0
> > > > >
> > > > > I am using 5.11 kernel.
> > > >
> > > > Must be something on client (guest kernel) side. I don't see this here
> > > > with
> > > > guest kernel 4.9.0 happening with my setup in a quick test:
> > > >
> > > > $ cat /etc/mtab | grep 9p
> > > > svnRoot / 9p
> > > > rw,dirsync,relatime,trans=virtio,version=9p2000.L,msize=104857600,cach
> > > > e=m
> > > > map 0 0 $
> > > >
> > > > Looks like the root cause of your issue is this:
> > > >
> > > > struct p9_client *p9_client_create(const char *dev_name, char
> > > > *options)
> > > > {
> > > >
> > > > ...
> > > > if (clnt->msize > clnt->trans_mod->maxsize)
> > > >
> > > > clnt->msize = clnt->trans_mod->maxsize;
> > > >
> > > > https://github.com/torvalds/linux/blob/f40ddce88593482919761f74910f42f
> > > > 4b84
> > > > c004b/net/9p/client.c#L1045
> > >
> > > That was introduced by a patch in 2011.
> > >
> > > commit c9ffb05ca5b5098d6ea468c909dd384d90da7d54
> > > Author: Venkateswararao Jujjuri (JV) <jvrao@linux.vnet.ibm.com>
> > > Date: Wed Jun 29 18:06:33 2011 -0700
> > >
> > > net/9p: Fix the msize calculation.
> > >
> > > msize represents the maximum PDU size that includes P9_IOHDRSZ.
> > >
> > > Your kernel 4.9 is newer than this, so most likely you have this commit
> > > too. I will spend some time later trying to debug this.
> > >
> > > Vivek
>
> Hi Vivek and Christian,
>
> I reproduce with an up-to-date fedora rawhide guest.
>
> Capping comes from here:
>
> net/9p/trans_virtio.c: .maxsize = PAGE_SIZE * (VIRTQUEUE_NUM - 3),
>
> i.e. 4096 * (128 - 3) == 512000
>
> AFAICT this has been around since 2011, i.e. always for me as a
> maintainer and I admit I had never tried such high msize settings
> before.
>
> commit b49d8b5d7007a673796f3f99688b46931293873e
> Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Date: Wed Aug 17 16:56:04 2011 +0000
>
> net/9p: Fix kernel crash with msize 512K
>
> With msize equal to 512K (PAGE_SIZE * VIRTQUEUE_NUM), we hit multiple
> crashes. This patch fix those.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
>
> Changelog doesn't help much but it looks like it was a bandaid
> for some more severe issues.
I have never had a kernel crash when booting a Linux guest with a 9pfs root
fs and 100 MiB msize. Should we ask the virtio or 9p Linux client maintainers
whether they can add some info on what this is about?
> > As the kernel code says trans_mod->maxsize, maybe it's something in virtio
> > on qemu side that does an automatic step back for some reason. I don't
> > see something in the 9pfs virtio transport driver
> > (hw/9pfs/virtio-9p-device.c on QEMU side) that would do this, so I would
> > also need to dig deeper.
> >
> > Do you have some RAM limitation in your setup somewhere?
> >
> > For comparison, this is how I started the VM:
> >
> > ~/git/qemu/build/qemu-system-x86_64 \
> > -machine pc,accel=kvm,usb=off,dump-guest-core=off -m 2048 \
> > -smp 4,sockets=4,cores=1,threads=1 -rtc base=utc \
> > -boot strict=on -kernel /home/bee/vm/stretch/boot/vmlinuz-4.9.0-13-amd64 \
> > -initrd /home/bee/vm/stretch/boot/initrd.img-4.9.0-13-amd64 \
> > -append 'root=svnRoot rw rootfstype=9p
> > rootflags=trans=virtio,version=9p2000.L,msize=104857600,cache=mmap
> > console=ttyS0' \
> First obvious difference I see between your setup and mine is that
> you're mounting the 9pfs as root from the kernel command line. For
> some reason, maybe this has an impact on the check in p9_client_create() ?
>
> Can you reproduce with a scenario like Vivek's one ?
Yep, confirmed. If I boot a guest from an image file first and then manually
mount a 9pfs share after the guest has booted, then I indeed get that msize
capping at just 512 kB as well. That's far too small. :/
Best regards,
Christian Schoenebeck
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [Virtio-fs] Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance)
@ 2021-02-22 15:08 ` Christian Schoenebeck
0 siblings, 0 replies; 107+ messages in thread
From: Christian Schoenebeck @ 2021-02-22 15:08 UTC (permalink / raw)
To: qemu-devel
Cc: Shinde, Archana M, Venegas Munoz, Jose Carlos, virtio-fs-list,
cdupontd, Vivek Goyal
On Montag, 22. Februar 2021 13:18:14 CET Greg Kurz wrote:
> On Sat, 20 Feb 2021 16:38:35 +0100
>
> Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:
> > On Freitag, 19. Februar 2021 20:01:12 CET Vivek Goyal wrote:
> > > On Fri, Feb 19, 2021 at 06:33:46PM +0100, Christian Schoenebeck wrote:
> > > > On Freitag, 19. Februar 2021 17:08:48 CET Vivek Goyal wrote:
> > > > > On Fri, Sep 25, 2020 at 10:06:41AM +0200, Christian Schoenebeck
wrote:
> > > > > > On Freitag, 25. September 2020 00:10:23 CEST Vivek Goyal wrote:
> > > > > > > In my testing, with cache=none, virtiofs performed better than
> > > > > > > 9p in
> > > > > > > all the fio jobs I was running. For the case of cache=auto for
> > > > > > > virtiofs
> > > > > > > (with xattr enabled), 9p performed better in certain write
> > > > > > > workloads. I
> > > > > > > have identified root cause of that problem and working on
> > > > > > > HANDLE_KILLPRIV_V2 patches to improve WRITE performance of
> > > > > > > virtiofs
> > > > > > > with cache=auto and xattr enabled.
> > > > > >
> > > > > > Please note, when it comes to performance aspects, you should set
> > > > > > a
> > > > > > reasonable high value for 'msize' on 9p client side:
> > > > > > https://wiki.qemu.org/Documentation/9psetup#msize
> > > > >
> > > > > Hi Christian,
> > > > >
> > > > > I am not able to set msize to a higher value. If I try to specify
> > > > > msize
> > > > > 16MB, and then read back msize from /proc/mounts, it sees to cap it
> > > > > at 512000. Is that intended?
> > > >
> > > > 9p server side in QEMU does not perform any msize capping. The code in
> > > > this
> > > > case is very simple, it's just what you see in function
> > > > v9fs_version():
> > > >
> > > > https://github.com/qemu/qemu/blob/6de76c5f324904c93e69f9a1e8e4fd0bd6f6
> > > > b57a
> > > > /hw/9pfs/9p.c#L1332>
> > > >
> > > > > $ mount -t 9p -o
> > > > > trans=virtio,version=9p2000.L,cache=none,msize=16777216
> > > > > hostShared /mnt/virtio-9p
> > > > >
> > > > > $ cat /proc/mounts | grep 9p
> > > > > hostShared /mnt/virtio-9p 9p
> > > > > rw,sync,dirsync,relatime,access=client,msize=512000,trans=virtio 0 0
> > > > >
> > > > > I am using 5.11 kernel.
> > > >
> > > > Must be something on client (guest kernel) side. I don't see this here
> > > > with
> > > > guest kernel 4.9.0 happening with my setup in a quick test:
> > > >
> > > > $ cat /etc/mtab | grep 9p
> > > > svnRoot / 9p
> > > > rw,dirsync,relatime,trans=virtio,version=9p2000.L,msize=104857600,cach
> > > > e=m
> > > > map 0 0 $
> > > >
> > > > Looks like the root cause of your issue is this:
> > > >
> > > > struct p9_client *p9_client_create(const char *dev_name, char
> > > > *options)
> > > > {
> > > >
> > > > ...
> > > > if (clnt->msize > clnt->trans_mod->maxsize)
> > > >
> > > > clnt->msize = clnt->trans_mod->maxsize;
> > > >
> > > > https://github.com/torvalds/linux/blob/f40ddce88593482919761f74910f42f4b84c004b/net/9p/client.c#L1045
> > >
> > > That was introduced by a patch in 2011.
> > >
> > > commit c9ffb05ca5b5098d6ea468c909dd384d90da7d54
> > > Author: Venkateswararao Jujjuri (JV) <jvrao@linux.vnet.ibm.com>
> > > Date: Wed Jun 29 18:06:33 2011 -0700
> > >
> > > net/9p: Fix the msize calculation.
> > >
> > > msize represents the maximum PDU size that includes P9_IOHDRSZ.
> > >
> > > Your kernel 4.9 is newer than this. So most likely you have this commit
> > > too. I will spend some time later trying to debug this.
> > >
> > > Vivek
>
> Hi Vivek and Christian,
>
> I reproduce with an up-to-date fedora rawhide guest.
>
> Capping comes from here:
>
> net/9p/trans_virtio.c: .maxsize = PAGE_SIZE * (VIRTQUEUE_NUM - 3),
>
> i.e. 4096 * (128 - 3) == 512000
>
> AFAICT this has been around since 2011, i.e. always for me as a
> maintainer and I admit I had never tried such high msize settings
> before.
>
> commit b49d8b5d7007a673796f3f99688b46931293873e
> Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Date: Wed Aug 17 16:56:04 2011 +0000
>
> net/9p: Fix kernel crash with msize 512K
>
> With msize equal to 512K (PAGE_SIZE * VIRTQUEUE_NUM), we hit multiple
> crashes. This patch fix those.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
>
> Changelog doesn't help much but it looks like it was a bandaid
> for some more severe issues.
I did not ever have a kernel crash when I boot a Linux guest with a 9pfs root
fs and 100 MiB msize. Should we ask virtio or 9p Linux client maintainers if
they can add some info what this is about?
> > As the kernel code says trans_mod->maxsize, maybe it's something in virtio
> > on qemu side that does an automatic step back for some reason. I don't
> > see something in the 9pfs virtio transport driver
> > (hw/9pfs/virtio-9p-device.c on QEMU side) that would do this, so I would
> > also need to dig deeper.
> >
> > Do you have some RAM limitation in your setup somewhere?
> >
> > For comparison, this is how I started the VM:
> >
> > ~/git/qemu/build/qemu-system-x86_64 \
> > -machine pc,accel=kvm,usb=off,dump-guest-core=off -m 2048 \
> > -smp 4,sockets=4,cores=1,threads=1 -rtc base=utc \
> > -boot strict=on -kernel /home/bee/vm/stretch/boot/vmlinuz-4.9.0-13-amd64 \
> > -initrd /home/bee/vm/stretch/boot/initrd.img-4.9.0-13-amd64 \
> > -append 'root=svnRoot rw rootfstype=9p
> > rootflags=trans=virtio,version=9p2000.L,msize=104857600,cache=mmap
> > console=ttyS0' \
> First obvious difference I see between your setup and mine is that
> you're mounting the 9pfs as root from the kernel command line. For
> some reason, maybe this has an impact on the check in p9_client_create() ?
>
> Can you reproduce with a scenario like Vivek's one ?
Yep, confirmed. If I boot a guest from an image file first and then try to
manually mount a 9pfs share after guest booted, then I get indeed that msize
capping of just 512 kiB as well. That's far too small. :/
Best regards,
Christian Schoenebeck
* Re: Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance)
2021-02-22 15:08 ` [Virtio-fs] " Christian Schoenebeck
@ 2021-02-22 17:11 ` Greg Kurz
-1 siblings, 0 replies; 107+ messages in thread
From: Greg Kurz @ 2021-02-22 17:11 UTC (permalink / raw)
To: Christian Schoenebeck
Cc: Shinde, Archana M, Venegas Munoz, Jose Carlos, qemu-devel,
Dr. David Alan Gilbert, virtio-fs-list, Stefan Hajnoczi,
cdupontd, Vivek Goyal
On Mon, 22 Feb 2021 16:08:04 +0100
Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:
[...]
> I did not ever have a kernel crash when I boot a Linux guest with a 9pfs root
> fs and 100 MiB msize.
Interesting.
> Should we ask virtio or 9p Linux client maintainers if
> they can add some info what this is about?
>
Probably worth trying that first, even if I'm not sure anyone has an
answer for that, since all the people who worked on virtio-9p at
the time have somehow deserted the project.
> > > As the kernel code says trans_mod->maxsize, maybe it's something in virtio
> > > on qemu side that does an automatic step back for some reason. I don't
> > > see something in the 9pfs virtio transport driver
> > > (hw/9pfs/virtio-9p-device.c on QEMU side) that would do this, so I would
> > > also need to dig deeper.
> > >
> > > Do you have some RAM limitation in your setup somewhere?
> > >
> > > For comparison, this is how I started the VM:
> > >
> > > ~/git/qemu/build/qemu-system-x86_64 \
> > > -machine pc,accel=kvm,usb=off,dump-guest-core=off -m 2048 \
> > > -smp 4,sockets=4,cores=1,threads=1 -rtc base=utc \
> > > -boot strict=on -kernel /home/bee/vm/stretch/boot/vmlinuz-4.9.0-13-amd64 \
> > > -initrd /home/bee/vm/stretch/boot/initrd.img-4.9.0-13-amd64 \
> > > -append 'root=svnRoot rw rootfstype=9p
> > > rootflags=trans=virtio,version=9p2000.L,msize=104857600,cache=mmap
> > > console=ttyS0' \
> > First obvious difference I see between your setup and mine is that
> > you're mounting the 9pfs as root from the kernel command line. For
> > some reason, maybe this has an impact on the check in p9_client_create() ?
> >
> > Can you reproduce with a scenario like Vivek's one ?
>
> Yep, confirmed. If I boot a guest from an image file first and then try to
> manually mount a 9pfs share after guest booted, then I get indeed that msize
> capping of just 512 kiB as well. That's far too small. :/
>
Maybe worth digging :
- why no capping happens in your scenario ?
- is capping really needed ?
Cheers,
--
Greg
> Best regards,
> Christian Schoenebeck
>
>
* Re: Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance)
2021-02-22 17:11 ` [Virtio-fs] " Greg Kurz
@ 2021-02-23 13:39 ` Christian Schoenebeck
-1 siblings, 0 replies; 107+ messages in thread
From: Christian Schoenebeck @ 2021-02-23 13:39 UTC (permalink / raw)
To: qemu-devel
Cc: Greg Kurz, Shinde, Archana M, Venegas Munoz, Jose Carlos,
Dr. David Alan Gilbert, virtio-fs-list, Stefan Hajnoczi,
cdupontd, Vivek Goyal, Michael S. Tsirkin, Dominique Martinet,
v9fs-developer
On Monday, 22 February 2021 18:11:59 CET Greg Kurz wrote:
> On Mon, 22 Feb 2021 16:08:04 +0100
> Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:
>
> [...]
>
> > I did not ever have a kernel crash when I boot a Linux guest with a 9pfs
> > root fs and 100 MiB msize.
>
> Interesting.
>
> > Should we ask virtio or 9p Linux client maintainers if
> > they can add some info what this is about?
>
> Probably worth trying that first, even if I'm not sure anyone has an
> answer for that, since all the people who worked on virtio-9p at
> the time have somehow deserted the project.
Michael, Dominique,
we are wondering here about the message size limitation of just 5 kiB in the
9p Linux client (using virtio transport) which imposes a performance
bottleneck, introduced by this kernel commit:
commit b49d8b5d7007a673796f3f99688b46931293873e
Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Date: Wed Aug 17 16:56:04 2011 +0000
net/9p: Fix kernel crash with msize 512K
With msize equal to 512K (PAGE_SIZE * VIRTQUEUE_NUM), we hit multiple
crashes. This patch fix those.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Is this a fundamental maximum message size that cannot be exceeded with virtio
in general or is there another reason for this limit that still applies?
Full discussion:
https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg06343.html
> > > > As the kernel code says trans_mod->maxsize, maybe it's something in
> > > > virtio
> > > > on qemu side that does an automatic step back for some reason. I don't
> > > > see something in the 9pfs virtio transport driver
> > > > (hw/9pfs/virtio-9p-device.c on QEMU side) that would do this, so I
> > > > would
> > > > also need to dig deeper.
> > > >
> > > > Do you have some RAM limitation in your setup somewhere?
> > > >
> > > > For comparison, this is how I started the VM:
> > > >
> > > > ~/git/qemu/build/qemu-system-x86_64 \
> > > > -machine pc,accel=kvm,usb=off,dump-guest-core=off -m 2048 \
> > > > -smp 4,sockets=4,cores=1,threads=1 -rtc base=utc \
> > > > -boot strict=on -kernel
> > > > /home/bee/vm/stretch/boot/vmlinuz-4.9.0-13-amd64 \
> > > > -initrd /home/bee/vm/stretch/boot/initrd.img-4.9.0-13-amd64 \
> > > > -append 'root=svnRoot rw rootfstype=9p
> > > > rootflags=trans=virtio,version=9p2000.L,msize=104857600,cache=mmap
> > > > console=ttyS0' \
> > >
> > > First obvious difference I see between your setup and mine is that
> > > you're mounting the 9pfs as root from the kernel command line. For
> > > some reason, maybe this has an impact on the check in p9_client_create()
> > > ?
> > >
> > > Can you reproduce with a scenario like Vivek's one ?
> >
> > Yep, confirmed. If I boot a guest from an image file first and then try to
> > manually mount a 9pfs share after guest booted, then I get indeed that
> > msize capping of just 512 kiB as well. That's far too small. :/
>
> Maybe worth digging :
> - why no capping happens in your scenario ?
Because I was wrong.
I just figured even in the 9p rootfs scenario it does indeed cap msize to 5kiB
as well. The output of /etc/mtab on guest side was fooling me. I debugged this
on 9p server side and the Linux 9p client always connects with a max. msize of
5 kiB, no matter what you do.
> - is capping really needed ?
>
> Cheers,
That's a good question and probably depends on whether there is a limitation
on virtio side, which I don't have an answer for. Maybe Michael or Dominique
can answer this.
Best regards,
Christian Schoenebeck
* Re: Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance)
2021-02-23 13:39 ` [Virtio-fs] " Christian Schoenebeck
@ 2021-02-23 14:07 ` Michael S. Tsirkin
-1 siblings, 0 replies; 107+ messages in thread
From: Michael S. Tsirkin @ 2021-02-23 14:07 UTC (permalink / raw)
To: Christian Schoenebeck
Cc: cdupontd, Dominique Martinet, Venegas Munoz, Jose Carlos,
qemu-devel, Dr. David Alan Gilbert, virtio-fs-list, Greg Kurz,
Stefan Hajnoczi, v9fs-developer, Shinde, Archana M, Vivek Goyal
On Tue, Feb 23, 2021 at 02:39:48PM +0100, Christian Schoenebeck wrote:
> On Montag, 22. Februar 2021 18:11:59 CET Greg Kurz wrote:
> > On Mon, 22 Feb 2021 16:08:04 +0100
> > Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:
> >
> > [...]
> >
> > > I did not ever have a kernel crash when I boot a Linux guest with a 9pfs
> > > root fs and 100 MiB msize.
> >
> > Interesting.
> >
> > > Should we ask virtio or 9p Linux client maintainers if
> > > they can add some info what this is about?
> >
> > Probably worth trying that first, even if I'm not sure anyone has an
> > answer for that, since all the people who worked on virtio-9p at
> > the time have somehow deserted the project.
>
> Michael, Dominique,
>
> we are wondering here about the message size limitation of just 5 kiB in the
> 9p Linux client (using virtio transport) which imposes a performance
> bottleneck, introduced by this kernel commit:
>
> commit b49d8b5d7007a673796f3f99688b46931293873e
> Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Date: Wed Aug 17 16:56:04 2011 +0000
>
> net/9p: Fix kernel crash with msize 512K
>
> With msize equal to 512K (PAGE_SIZE * VIRTQUEUE_NUM), we hit multiple
> crashes. This patch fix those.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Well the change I see is:
- .maxsize = PAGE_SIZE*VIRTQUEUE_NUM,
+ .maxsize = PAGE_SIZE * (VIRTQUEUE_NUM - 3),
so how come you say it changes 512K to 5K?
Looks more like 500K to me.
> Is this a fundamental maximum message size that cannot be exceeded with virtio
> in general or is there another reason for this limit that still applies?
>
> Full discussion:
> https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg06343.html
>
> > > > > As the kernel code says trans_mod->maxsize, maybe it's something in
> > > > > virtio
> > > > > on qemu side that does an automatic step back for some reason. I don't
> > > > > see something in the 9pfs virtio transport driver
> > > > > (hw/9pfs/virtio-9p-device.c on QEMU side) that would do this, so I
> > > > > would
> > > > > also need to dig deeper.
> > > > >
> > > > > Do you have some RAM limitation in your setup somewhere?
> > > > >
> > > > > For comparison, this is how I started the VM:
> > > > >
> > > > > ~/git/qemu/build/qemu-system-x86_64 \
> > > > > -machine pc,accel=kvm,usb=off,dump-guest-core=off -m 2048 \
> > > > > -smp 4,sockets=4,cores=1,threads=1 -rtc base=utc \
> > > > > -boot strict=on -kernel
> > > > > /home/bee/vm/stretch/boot/vmlinuz-4.9.0-13-amd64 \
> > > > > -initrd /home/bee/vm/stretch/boot/initrd.img-4.9.0-13-amd64 \
> > > > > -append 'root=svnRoot rw rootfstype=9p
> > > > > rootflags=trans=virtio,version=9p2000.L,msize=104857600,cache=mmap
> > > > > console=ttyS0' \
> > > >
> > > > First obvious difference I see between your setup and mine is that
> > > > you're mounting the 9pfs as root from the kernel command line. For
> > > > some reason, maybe this has an impact on the check in p9_client_create()
> > > > ?
> > > >
> > > > Can you reproduce with a scenario like Vivek's one ?
> > >
> > > Yep, confirmed. If I boot a guest from an image file first and then try to
> > > manually mount a 9pfs share after guest booted, then I get indeed that
> > > msize capping of just 512 kiB as well. That's far too small. :/
> >
> > Maybe worth digging :
> > - why no capping happens in your scenario ?
>
> Because I was wrong.
>
> I just figured even in the 9p rootfs scenario it does indeed cap msize to 5kiB
> as well. The output of /etc/mtab on guest side was fooling me. I debugged this
> on 9p server side and the Linux 9p client always connects with a max. msize of
> 5 kiB, no matter what you do.
>
> > - is capping really needed ?
> >
> > Cheers,
>
> That's a good question and probably depends on whether there is a limitation
> on virtio side, which I don't have an answer for. Maybe Michael or Dominique
> can answer this.
>
> Best regards,
> Christian Schoenebeck
>
* Re: Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance)
2021-02-23 14:07 ` [Virtio-fs] " Michael S. Tsirkin
@ 2021-02-24 15:16 ` Christian Schoenebeck
-1 siblings, 0 replies; 107+ messages in thread
From: Christian Schoenebeck @ 2021-02-24 15:16 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: qemu-devel, Greg Kurz, Shinde, Archana M, Venegas Munoz,
Jose Carlos, Dr. David Alan Gilbert, virtio-fs-list,
Stefan Hajnoczi, cdupontd, Vivek Goyal, Dominique Martinet,
v9fs-developer
On Tuesday, 23 February 2021 15:07:31 CET Michael S. Tsirkin wrote:
> > Michael, Dominique,
> >
> > we are wondering here about the message size limitation of just 5 kiB in
> > the 9p Linux client (using virtio transport) which imposes a performance
> > bottleneck, introduced by this kernel commit:
> >
> > commit b49d8b5d7007a673796f3f99688b46931293873e
> > Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> > Date: Wed Aug 17 16:56:04 2011 +0000
> >
> > net/9p: Fix kernel crash with msize 512K
> >
> > With msize equal to 512K (PAGE_SIZE * VIRTQUEUE_NUM), we hit multiple
> > crashes. This patch fix those.
> >
> > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> > Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
>
> Well the change I see is:
>
> - .maxsize = PAGE_SIZE*VIRTQUEUE_NUM,
> + .maxsize = PAGE_SIZE * (VIRTQUEUE_NUM - 3),
>
>
> so how come you say it changes 512K to 5K?
> Looks more like 500K to me.
Misapprehension + typo(s) in my previous message, sorry Michael. That's 500k
of course (not 5k), yes.
Let me rephrase that question: are you aware of something in virtio that would
per se mandate an absolute hard coded message size limit (e.g. from virtio
specs perspective or maybe some compatibility issue)?
If not, we would try getting rid of that hard coded limit of the 9p client on
kernel side in the first place, because the kernel's 9p client already has a
dynamic runtime option 'msize' and that hard coded enforced limit (500k) is a
performance bottleneck like I said.
Best regards,
Christian Schoenebeck
* Re: Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance)
2021-02-24 15:16 ` [Virtio-fs] " Christian Schoenebeck
@ 2021-02-24 15:43 ` Dominique Martinet
-1 siblings, 0 replies; 107+ messages in thread
From: Dominique Martinet @ 2021-02-24 15:43 UTC (permalink / raw)
To: Christian Schoenebeck
Cc: cdupontd, Michael S. Tsirkin, Venegas Munoz, Jose Carlos,
Greg Kurz, qemu-devel, virtio-fs-list, Vivek Goyal,
Stefan Hajnoczi, v9fs-developer, Shinde, Archana M,
Dr. David Alan Gilbert
Christian Schoenebeck wrote on Wed, Feb 24, 2021 at 04:16:52PM +0100:
> Misapprehension + typo(s) in my previous message, sorry Michael. That's 500k
> of course (not 5k), yes.
>
> Let me rephrase that question: are you aware of something in virtio that would
> per se mandate an absolute hard coded message size limit (e.g. from virtio
> specs perspective or maybe some compatibility issue)?
>
> If not, we would try getting rid of that hard coded limit of the 9p client on
> kernel side in the first place, because the kernel's 9p client already has a
> dynamic runtime option 'msize' and that hard coded enforced limit (500k) is a
> performance bottleneck like I said.
We could probably set it at init time through virtio_max_dma_size(vdev)
like virtio_blk does (I just tried and get 2^64 so we can probably
expect virtually no limit there)
I'm not too familiar with virtio, feel free to try and if it works send
me a patch -- the size drop from 512k to 500k is old enough that things
probably have changed in the background since then.
On the 9p side itself, unrelated to virtio, we don't want to make it
*too* big as the client code doesn't use any scatter-gather and will
want to allocate upfront contiguous buffers of the size that got
negotiated -- that can get ugly quite fast, but we can leave it up to
users to decide.
One of my very-long-term goal would be to tend to that, if someone has
cycles to work on it I'd gladly review any patch in that area.
A possible implementation path would be to have each transport define
whether it supports it or not and handle it accordingly until all
transports have migrated, so one wouldn't need to care about e.g. rdma or xen
if you don't have hardware to test in the short term.
The next best thing would be David's netfs helpers and sending
concurrent requests if you use cache, but that's not merged yet either
so it'll be a few cycles as well.
Cheers,
--
Dominique
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance)
2021-02-24 15:43 ` [Virtio-fs] " Dominique Martinet
@ 2021-02-26 13:49 ` Christian Schoenebeck
-1 siblings, 0 replies; 107+ messages in thread
From: Christian Schoenebeck @ 2021-02-26 13:49 UTC (permalink / raw)
To: qemu-devel
Cc: Dominique Martinet, cdupontd, Michael S. Tsirkin, Venegas Munoz,
Jose Carlos, Greg Kurz, virtio-fs-list, Vivek Goyal,
Stefan Hajnoczi, v9fs-developer, Shinde, Archana M,
Dr. David Alan Gilbert
On Wednesday, 24 February 2021 16:43:57 CET Dominique Martinet wrote:
> Christian Schoenebeck wrote on Wed, Feb 24, 2021 at 04:16:52PM +0100:
> > Misapprehension + typo(s) in my previous message, sorry Michael. That's
> > 500k of course (not 5k), yes.
> >
> > Let me rephrase that question: are you aware of something in virtio that
> > would per se mandate an absolute hard coded message size limit (e.g. from
> > virtio specs perspective or maybe some compatibility issue)?
> >
> > If not, we would try getting rid of that hard coded limit of the 9p client
> > on kernel side in the first place, because the kernel's 9p client already
> > has a dynamic runtime option 'msize' and that hard coded enforced limit
> > (500k) is a performance bottleneck like I said.
>
> We could probably set it at init time through virtio_max_dma_size(vdev)
> like virtio_blk does (I just tried and get 2^64 so we can probably
> expect virtually no limit there)
>
> I'm not too familiar with virtio, feel free to try and if it works send
> me a patch -- the size drop from 512 to 500k is old enough that things
> probably have changed in the background since then.
Yes, agreed. I'm not too familiar with virtio either, nor with the Linux 9p
client code yet. For that reason I consider a minimally invasive change a
first step at least. AFAICS a "split virtqueue" setup is currently used:
https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-240006
Right now the client uses a hard coded amount of 128 elements. So what about
replacing VIRTQUEUE_NUM by a variable which is initialized with a value
according to the user's requested 'msize' option at init time?
According to the virtio specs the maximum number of elements in a virtqueue
is 32768. So 32768 * 4k = 128M as the new upper limit would already be a
significant improvement and would not require too many changes to the client
code, right?
> On the 9p side itself, unrelated to virtio, we don't want to make it
> *too* big as the client code doesn't use any scatter-gather and will
> want to allocate upfront contiguous buffers of the size that got
> negotiated -- that can get ugly quite fast, but we can leave it up to
> users to decide.
By ugly do you just mean that it occupies this memory for good as long as
the driver is loaded, or is there some runtime performance penalty to be
aware of as well?
> One of my very-long-term goal would be to tend to that, if someone has
> cycles to work on it I'd gladly review any patch in that area.
> A possible implementation path would be to have transport define
> themselves if they support it or not and handle it accordingly until all
> transports migrated, so one wouldn't need to care about e.g. rdma or xen
> if you don't have hardware to test in the short term.
Sounds like something that Greg suggested before for a slightly different,
even though related issue: right now the default 'msize' on the Linux client
side is 8k, which really hurts performance-wise as virtually all 9p messages
have to be split into a huge number of request and response messages. OTOH
you don't want to set this default value too high. So Greg noted that virtio
could suggest a default msize, i.e. a value that would suit the host's
storage hardware appropriately.
> The next best thing would be David's netfs helpers and sending
> concurrent requests if you use cache, but that's not merged yet either
> so it'll be a few cycles as well.
So right now the Linux client is always just handling one request at a time;
it sends a 9p request and waits for its response before processing the next
request?
If so, is there a reason to limit the planned concurrent request handling
feature to one of the cached modes? I mean ordering of requests is already
handled on the 9p server side, so the client could just pass all messages in
a lightweight way and assume the server takes care of it.
Best regards,
Christian Schoenebeck
* Re: Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance)
2021-02-26 13:49 ` [Virtio-fs] " Christian Schoenebeck
@ 2021-02-27 0:03 ` Dominique Martinet
-1 siblings, 0 replies; 107+ messages in thread
From: Dominique Martinet @ 2021-02-27 0:03 UTC (permalink / raw)
To: Christian Schoenebeck
Cc: Shinde, Archana M, Michael S. Tsirkin, Venegas Munoz,
Jose Carlos, Greg Kurz, qemu-devel, virtio-fs-list,
Dr. David Alan Gilbert, Stefan Hajnoczi, v9fs-developer,
cdupontd, Vivek Goyal
Christian Schoenebeck wrote on Fri, Feb 26, 2021 at 02:49:12PM +0100:
> Right now the client uses a hard coded amount of 128 elements. So what about
> replacing VIRTQUEUE_NUM by a variable which is initialized with a value
> according to the user's requested 'msize' option at init time?
>
> According to the virtio specs the max. amount of elements in a virtqueue is
> 32768. So 32768 * 4k = 128M as new upper limit would already be a significant
> improvement and would not require too many changes to the client code, right?
The current code inits the chan->sg at probe time (when the driver is
loaded) and not at mount time, and it is currently embedded in the chan
struct, so that would need allocating at mount time (p9_client_create;
either resizing if required or not sharing) but it doesn't sound too
intrusive, yes.
I don't see other adherences to VIRTQUEUE_NUM that would hurt trying.
> > On the 9p side itself, unrelated to virtio, we don't want to make it
> > *too* big as the client code doesn't use any scatter-gather and will
> > want to allocate upfront contiguous buffers of the size that got
> > negotiated -- that can get ugly quite fast, but we can leave it up to
> > users to decide.
>
> With ugly you just mean that it's occupying this memory for good as long as
> the driver is loaded, or is there some runtime performance penalty as well to
> be aware of?
The main problem is memory fragmentation, see /proc/buddyinfo on various
systems.
After a fresh boot memory is quite clean and there is no problem
allocating 2MB contiguous buffers, but after a while depending on the
workload it can be hard to even allocate large buffers.
I've had that problem at work in the past with a RDMA driver that wanted
to allocate 256KB and could get that to fail quite reliably with our
workload, so it really depends on what the client does.
In the 9p case, the memory used to be allocated for good and per client
(= mountpoint), so if you had 15 9p mounts that could each do e.g. 32
requests in parallel with 1MB buffers you could lock 500MB of idling
RAM. I changed that to a dedicated slab a while ago, so that should no
longer be so much of a problem -- the slab will keep the buffers around
as well if used frequently, so the performance hit wasn't bad even for
larger msizes.
> > One of my very-long-term goal would be to tend to that, if someone has
> > cycles to work on it I'd gladly review any patch in that area.
> > A possible implementation path would be to have transport define
> > themselves if they support it or not and handle it accordingly until all
> > transports migrated, so one wouldn't need to care about e.g. rdma or xen
> > if you don't have hardware to test in the short term.
>
> Sounds like something that Greg suggested before for a slightly different,
> even though related issue: right now the default 'msize' on Linux client side
> is 8k, which really hurts performance wise as virtually all 9p messages have
> to be split into a huge number of request and response messages. OTOH you
> don't want to set this default value too high. So Greg noted that virtio could
> suggest a default msize, i.e. a value that would suit host's storage hardware
> appropriately.
We can definitely increase the default, for all transports in my
opinion.
As a first step, 64 or 128k?
> > The next best thing would be David's netfs helpers and sending
> > concurrent requests if you use cache, but that's not merged yet either
> > so it'll be a few cycles as well.
>
> So right now the Linux client is always just handling one request at a time;
> it sends a 9p request and waits for its response before processing the next
> request?
Requests are handled concurrently just fine - if you have multiple
processes all doing their things it will all go out in parallel.
The bottleneck people generally complain about (and where things hurt)
is a single process reading: there is currently no readahead as far as I
know, so reads are really sent one at a time, waiting for the reply before
sending the next.
> If so, is there a reason to limit the planned concurrent request handling
> feature to one of the cached modes? I mean ordering of requests is already
> handled on 9p server side, so client could just pass all messages in a
> lite-weight way and assume server takes care of it.
cache=none is difficult; we could pipeline requests up to the buffer
size the client requested, but that's it.
Still something worth doing if the msize is tiny and the client requests
4+MB in my opinion, but not anything the vfs can help us with.
cache=mmap is basically cache=none with a hack to say "ok, for mmap
there's no choice so do use some" -- afaik mmap has its own readahead
mechanism, so this should actually prefetch things, but I don't know
about the parallelism of that mechanism and would say it's linear.
Other caching models (loose / fscache) actually share most of the code,
so whatever is done for one would be done for both; the discussion is still
underway with David/Willy and others, mostly about ceph/cifs, but it would
benefit everyone and I'm following closely.
--
Dominique
* Re: Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance)
2021-02-27 0:03 ` [Virtio-fs] " Dominique Martinet
@ 2021-03-03 14:04 ` Christian Schoenebeck
-1 siblings, 0 replies; 107+ messages in thread
From: Christian Schoenebeck @ 2021-03-03 14:04 UTC (permalink / raw)
To: qemu-devel
Cc: Dominique Martinet, Shinde, Archana M, Michael S. Tsirkin,
Venegas Munoz, Jose Carlos, Greg Kurz, virtio-fs-list,
Dr. David Alan Gilbert, Stefan Hajnoczi, v9fs-developer,
cdupontd, Vivek Goyal
On Saturday, 27 February 2021 01:03:40 CET Dominique Martinet wrote:
> Christian Schoenebeck wrote on Fri, Feb 26, 2021 at 02:49:12PM +0100:
> > Right now the client uses a hard coded amount of 128 elements. So what
> > about replacing VIRTQUEUE_NUM by a variable which is initialized with a
> > value according to the user's requested 'msize' option at init time?
> >
> > According to the virtio specs the max. amount of elements in a virtqueue
> > is
> > 32768. So 32768 * 4k = 128M as new upper limit would already be a
> > significant improvement and would not require too many changes to the
> > client code, right?
> The current code inits the chan->sg at probe time (when driver is
> loader) and not mount time, and it is currently embedded in the chan
> struct, so that would need allocating at mount time (p9_client_create ;
> either resizing if required or not sharing) but it doesn't sound too
> intrusive yes.
>
> I don't see more adherenences to VIRTQUEUE_NUM that would hurt trying.
Ok, then I will look into changing this when I hopefully have some time in a
few weeks.
> > > On the 9p side itself, unrelated to virtio, we don't want to make it
> > > *too* big as the client code doesn't use any scatter-gather and will
> > > want to allocate upfront contiguous buffers of the size that got
> > > negotiated -- that can get ugly quite fast, but we can leave it up to
> > > users to decide.
> >
> > With ugly you just mean that it's occupying this memory for good as long
> > as
> > the driver is loaded, or is there some runtime performance penalty as well
> > to be aware of?
>
> The main problem is memory fragmentation, see /proc/buddyinfo on various
> systems.
> After a fresh boot memory is quite clean and there is no problem
> allocating 2MB contiguous buffers, but after a while depending on the
> workload it can be hard to even allocate large buffers.
> I've had that problem at work in the past with a RDMA driver that wanted
> to allocate 256KB and could get that to fail quite reliably with our
> workload, so it really depends on what the client does.
>
> In the 9p case, the memory used to be allocated for good and per client
> (= mountpoint), so if you had 15 9p mounts that could do e.g. 32
> requests in parallel with 1MB buffers you could lock 500MB of idling
> ram. I changed that to a dedicated slab a while ago, so that should no
> longer be so much of a problem -- the slab will keep the buffers around
> as well if used frequently so the performance hit wasn't bad even for
> larger msizes
Ah ok, good to know.
BTW qemu now handles multiple filesystems below one 9p share correctly by
(optionally) remapping inode numbers from host side -> guest side
appropriately to prevent potential file ID collisions. This might reduce the
need for a large number of 9p mount points on guest side.
For instance, I am running entire guest systems on a single 9p mount point
as root fs. The guest system is divided into multiple filesystems on host
side (e.g. multiple zfs datasets), not on guest side.
> > > One of my very-long-term goal would be to tend to that, if someone has
> > > cycles to work on it I'd gladly review any patch in that area.
> > > A possible implementation path would be to have transport define
> > > themselves if they support it or not and handle it accordingly until all
> > > transports migrated, so one wouldn't need to care about e.g. rdma or xen
> > > if you don't have hardware to test in the short term.
> >
> > Sounds like something that Greg suggested before for a slightly different,
> > even though related issue: right now the default 'msize' on Linux client
> > side is 8k, which really hurts performance wise as virtually all 9p
> > messages have to be split into a huge number of request and response
> > messages. OTOH you don't want to set this default value too high. So Greg
> > noted that virtio could suggest a default msize, i.e. a value that would
> > suit host's storage hardware appropriately.
>
> We can definitely increase the default, for all transports in my
> opinion.
> As a first step, 64 or 128k?
Just to throw out some numbers first: when linearly reading a 12 GB file on
the guest (i.e. "time cat test.dat > /dev/null") on a test machine, these
are the results that I get (cache=mmap):
msize=16k: 2min7s (95 MB/s)
msize=64k: 17s (706 MB/s)
msize=128k: 12s (1000 MB/s)
msize=256k: 8s (1500 MB/s)
msize=512k: 6.5s (1846 MB/s)
Personally I would raise the default msize value at least to 128k.
> > > The next best thing would be David's netfs helpers and sending
> > > concurrent requests if you use cache, but that's not merged yet either
> > > so it'll be a few cycles as well.
> >
> > So right now the Linux client is always just handling one request at a
> > time; it sends a 9p request and waits for its response before processing
> > the next request?
>
> Requests are handled concurrently just fine - if you have multiple
> processes all doing their things it will all go out in parallel.
>
> The bottleneck people generally complain about (and where things hurt)
> is if you have a single process reading then there is currently no
> readahead as far as I know, so reads are really sent one at a time,
> waiting for reply and sending next.
So that also means if you are running a multi-threaded app (in one process) on
guest side, then none of its I/O requests are handled in parallel right now.
It would be desirable to have parallel requests for multi-threaded apps as
well.
Personally I don't find raw I/O the worst performance issue right now. As you
can see from the numbers above, if 'msize' is raised and I/O being performed
with large chunk sizes (e.g. 'cat' automatically uses a chunk size according
to the iounit advertised by stat) then the I/O results are okay.
What hurts IMO the most in practice is the sluggish behaviour regarding
dentries ATM. The following is with cache=mmap (on guest side):
$ time ls /etc/ > /dev/null
real 0m0.091s
user 0m0.000s
sys 0m0.044s
$ time ls -l /etc/ > /dev/null
real 0m0.259s
user 0m0.008s
sys 0m0.016s
$ ls -l /etc/ | wc -l
113
$
With cache=loose there is some improvement; on the first "ls" run (when it's
not in the dentry cache, I assume) the results are similar. The subsequent
runs then improve to around 50ms for "ls" and around 70ms for "ls -l". But
that's still far from the numbers I would expect.
Keep in mind, even when you just open() & read() a file, then directory
components have to be walked for checking ownership and permissions. I have
seen huge slowdowns in deep directory structures for that reason.
Best regards,
Christian Schoenebeck
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [Virtio-fs] Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance)
@ 2021-03-03 14:04 ` Christian Schoenebeck
0 siblings, 0 replies; 107+ messages in thread
From: Christian Schoenebeck @ 2021-03-03 14:04 UTC (permalink / raw)
To: qemu-devel
Cc: cdupontd, Michael S. Tsirkin, Dominique Martinet, Venegas Munoz,
Jose Carlos, virtio-fs-list, v9fs-developer, Shinde, Archana M,
Vivek Goyal
On Samstag, 27. Februar 2021 01:03:40 CET Dominique Martinet wrote:
> Christian Schoenebeck wrote on Fri, Feb 26, 2021 at 02:49:12PM +0100:
> > Right now the client uses a hard coded amount of 128 elements. So what
> > about replacing VIRTQUEUE_NUM by a variable which is initialized with a
> > value according to the user's requested 'msize' option at init time?
> >
> > According to the virtio specs the max. amount of elements in a virtqueue
> > is
> > 32768. So 32768 * 4k = 128M as new upper limit would already be a
> > significant improvement and would not require too many changes to the
> > client code, right?
> The current code inits the chan->sg at probe time (when driver is
> loader) and not mount time, and it is currently embedded in the chan
> struct, so that would need allocating at mount time (p9_client_create ;
> either resizing if required or not sharing) but it doesn't sound too
> intrusive yes.
>
> I don't see more adherenences to VIRTQUEUE_NUM that would hurt trying.
Ok, then I will look into changing this when I hopefully have some time in few
weeks.
> > > On the 9p side itself, unrelated to virtio, we don't want to make it
> > > *too* big as the client code doesn't use any scatter-gather and will
> > > want to allocate upfront contiguous buffers of the size that got
> > > negotiated -- that can get ugly quite fast, but we can leave it up to
> > > users to decide.
> >
> > With ugly you just mean that it's occupying this memory for good as long
> > as
> > the driver is loaded, or is there some runtime performance penalty as well
> > to be aware of?
>
> The main problem is memory fragmentation, see /proc/buddyinfo on various
> systems.
> After a fresh boot memory is quite clean and there is no problem
> allocating 2MB contiguous buffers, but after a while depending on the
> workload it can be hard to even allocate large buffers.
> I've had that problem at work in the past with a RDMA driver that wanted
> to allocate 256KB and could get that to fail quite reliably with our
> workload, so it really depends on what the client does.
>
> In the 9p case, the memory used to be allocated for good and per client
> (= mountpoint), so if you had 15 9p mounts that could do e.g. 32
> requests in parallel with 1MB buffers you could lock 500MB of idling
> ram. I changed that to a dedicated slab a while ago, so that should no
> longer be so much of a problem -- the slab will keep the buffers around
> as well if used frequently so the performance hit wasn't bad even for
> larger msizes
Ah ok, good to know.
BTW, QEMU now handles multiple filesystems below one 9p share correctly by
(optionally) remapping inode numbers from host side to guest side
appropriately to prevent potential file ID collisions. This might reduce the
need for a large number of 9p mount points on guest side.
For instance, I am running entire guest systems on just one 9p mount point,
as root fs that is. The guest system is divided into multiple filesystems on
host side (e.g. multiple zfs datasets), not on guest side.
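The remapping mentioned above can be sketched like this (purely illustrative;
QEMU's real implementation differs in detail, and the helper names are made
up):

```c
#include <stdint.h>

/* Illustrative inode remapping: map a host (device, inode) pair to a
 * unique guest inode number, so files from different host filesystems
 * below one 9p share cannot collide.  A real implementation must also
 * handle table overflow and inode numbers wider than the payload bits. */
#define MAX_DEVICES 16u

static uint64_t remap_inode(uint64_t host_dev, uint64_t host_ino,
                            uint64_t *dev_table, unsigned *ndevs)
{
    unsigned i;
    for (i = 0; i < *ndevs; i++)        /* look up known host device */
        if (dev_table[i] == host_dev)
            break;
    if (i == *ndevs && *ndevs < MAX_DEVICES)
        dev_table[(*ndevs)++] = host_dev;
    /* put a per-device prefix into the top bits of the guest inode */
    return ((uint64_t)(i + 1) << 56) | (host_ino & ((1ULL << 56) - 1));
}
```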
> > > One of my very-long-term goal would be to tend to that, if someone has
> > > cycles to work on it I'd gladly review any patch in that area.
> > > A possible implementation path would be to have transport define
> > > themselves if they support it or not and handle it accordingly until all
> > > transports migrated, so one wouldn't need to care about e.g. rdma or xen
> > > if you don't have hardware to test in the short term.
> >
> > Sounds like something that Greg suggested before for a slightly different,
> > even though related issue: right now the default 'msize' on Linux client
> > side is 8k, which really hurts performance wise as virtually all 9p
> > messages have to be split into a huge number of request and response
> > messages. OTOH you don't want to set this default value too high. So Greg
> > noted that virtio could suggest a default msize, i.e. a value that would
> > suit host's storage hardware appropriately.
>
> We can definitely increase the default, for all transports in my
> opinion.
> As a first step, 64 or 128k?
Just to throw some numbers first; when linearly reading a 12 GB file on guest
(i.e. "time cat test.dat > /dev/null") on a test machine, these are the
results that I get (cache=mmap):
msize=16k: 2min7s (95 MB/s)
msize=64k: 17s (706 MB/s)
msize=128k: 12s (1000 MB/s)
msize=256k: 8s (1500 MB/s)
msize=512k: 6.5s (1846 MB/s)
Personally I would raise the default msize value at least to 128k.
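As a quick sanity check on the table above, the quoted MB/s figures follow
directly from size divided by time, taking the 12 GB file as roughly
12000 MB (an assumption; the figures in the mail are rounded):

```c
#include <math.h>

/* Cross-check the msize benchmark: throughput = size / time. */
static double throughput_mbps(double total_mb, double secs)
{
    return total_mb / secs;
}

static int close_to(double got, double quoted)
{
    /* the mail's figures are rounded, allow ~1% slack */
    return fabs(got - quoted) / quoted < 0.01;
}
```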
> > > The next best thing would be David's netfs helpers and sending
> > > concurrent requests if you use cache, but that's not merged yet either
> > > so it'll be a few cycles as well.
> >
> > So right now the Linux client is always just handling one request at a
> > time; it sends a 9p request and waits for its response before processing
> > the next request?
>
> Requests are handled concurrently just fine - if you have multiple
> processes all doing their things it will all go out in parallel.
>
> The bottleneck people generally complain about (and where things hurt)
> is if you have a single process reading then there is currently no
> readahead as far as I know, so reads are really sent one at a time,
> waiting for reply and sending next.
So that also means if you are running a multi-threaded app (in one process) on
guest side, then none of its I/O requests are handled in parallel right now.
It would be desirable to have parallel requests for multi-threaded apps as
well.
Personally I don't find raw I/O the worst performance issue right now. As you
can see from the numbers above, if 'msize' is raised and I/O is performed
with large chunk sizes (e.g. 'cat' automatically uses a chunk size according
to the iounit advertised by stat) then the I/O results are okay.
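What 'cat' effectively does can be sketched like this (hypothetical helper
name; the point is that it reads in st_blksize-sized chunks, which is the
iounit the filesystem advertises, so a larger negotiated msize directly
translates into larger read(2) calls over 9p):

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Read a whole file using the chunk size stat advertises (st_blksize),
 * discarding the data -- like "cat file > /dev/null".  Returns the
 * number of bytes read, or -1 on error. */
static ssize_t drain_file(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return -1; }
    size_t chunk = st.st_blksize > 0 ? (size_t)st.st_blksize : 4096;
    char *buf = malloc(chunk);
    if (!buf) { close(fd); return -1; }
    ssize_t total = 0, n;
    while ((n = read(fd, buf, chunk)) > 0)
        total += n;          /* data is discarded, we only count bytes */
    free(buf);
    close(fd);
    return n < 0 ? -1 : total;
}
```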
What hurts IMO the most in practice is the sluggish behaviour regarding
dentries ATM. The following is with cache=mmap (on guest side):
$ time ls /etc/ > /dev/null
real 0m0.091s
user 0m0.000s
sys 0m0.044s
$ time ls -l /etc/ > /dev/null
real 0m0.259s
user 0m0.008s
sys 0m0.016s
$ ls -l /etc/ | wc -l
113
$
With cache=loose there is some improvement; on the first "ls" run (when it's
not in the dentry cache, I assume) the results are similar. The subsequent
runs then improve to around 50ms for "ls" and around 70ms for "ls -l". But
that's still far from the numbers I would expect.
Keep in mind, even when you just open() & read() a file, the directory
components have to be walked to check ownership and permissions. I have
seen huge slowdowns in deep directory structures for that reason.
Best regards,
Christian Schoenebeck
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance)
2021-03-03 14:04 ` [Virtio-fs] " Christian Schoenebeck
@ 2021-03-03 14:50 ` Dominique Martinet
-1 siblings, 0 replies; 107+ messages in thread
From: Dominique Martinet @ 2021-03-03 14:50 UTC (permalink / raw)
To: Christian Schoenebeck
Cc: cdupontd, Michael S. Tsirkin, Venegas Munoz, Jose Carlos,
Greg Kurz, qemu-devel, virtio-fs-list, Vivek Goyal,
Stefan Hajnoczi, v9fs-developer, Shinde, Archana M,
Dr. David Alan Gilbert
Christian Schoenebeck wrote on Wed, Mar 03, 2021 at 03:04:21PM +0100:
> > We can definitely increase the default, for all transports in my
> > opinion.
> > As a first step, 64 or 128k?
>
> Just to throw some numbers first; when linearly reading a 12 GB file on guest
> (i.e. "time cat test.dat > /dev/null") on a test machine, these are the
> results that I get (cache=mmap):
>
> msize=16k: 2min7s (95 MB/s)
> msize=64k: 17s (706 MB/s)
> msize=128k: 12s (1000 MB/s)
> msize=256k: 8s (1500 MB/s)
> msize=512k: 6.5s (1846 MB/s)
>
> Personally I would raise the default msize value at least to 128k.
Thanks for the numbers.
I'm still a bit worried about too large chunks, let's go with 128k for
now -- I'll send a couple of patches increasing the tcp max/default as
well next week-ish.
> > The bottleneck people generally complain about (and where things hurt)
> > is if you have a single process reading then there is currently no
> > readahead as far as I know, so reads are really sent one at a time,
> > waiting for reply and sending next.
>
> So that also means if you are running a multi-threaded app (in one process) on
> guest side, then none of its I/O requests are handled in parallel right now.
> It would be desirable to have parallel requests for multi-threaded apps as
> well.
Threads are independent there as far as the kernel goes; if multiple
threads issue I/O in parallel it will be handled in parallel.
(The exception would be "lightweight threads" which don't spawn actual
OS threads, but in this case the I/Os are generally sent asynchronously so
that should work as well.)
> Personally I don't find raw I/O the worst performance issue right now. As you
> can see from the numbers above, if 'msize' is raised and I/O being performed
> with large chunk sizes (e.g. 'cat' automatically uses a chunk size according
> to the iounit advertised by stat) then the I/O results are okay.
>
> What hurts IMO the most in practice is the sluggish behaviour regarding
> dentries ATM. The following is with cache=mmap (on guest side):
>
> $ time ls /etc/ > /dev/null
> real 0m0.091s
> user 0m0.000s
> sys 0m0.044s
> $ time ls -l /etc/ > /dev/null
> real 0m0.259s
> user 0m0.008s
> sys 0m0.016s
> $ ls -l /etc/ | wc -l
> 113
> $
Yes, that is slow indeed. Unfortunately cache=none/mmap means only open
dentries are pinned, so that means a load of requests every time.
I was going to suggest something like readdirplus or prefetching
directory entry attributes in parallel/in the background, but since we're
not keeping any entries around we can't even do that in that mode.
> With cache=loose there is some improvement; on the first "ls" run (when its
> not in the dentry cache I assume) the results are similar. The subsequent runs
> then improve to around 50ms for "ls" and around 70ms for "ls -l". But that's
> still far from numbers I would expect.
I'm surprised cached mode is that slow though, that is worth
investigating.
With that time range we are definitely sending more requests to the
server than I would expect for cache=loose, some stat revalidation
perhaps? I thought there wasn't any.
I don't like cache=loose/fscache right now as the reclaim mechanism
doesn't work well as far as I'm aware (I've heard reports of 9p memory
usage growing ad nauseam in these modes), so while it's fine for
short-lived VMs it can't really be used for long periods of time as
is... That's been on my todo for a while too, but unfortunately no time
for that.
Ideally, if that gets fixed, it really should be the default with some
sort of cache revalidation like NFS does (if that hasn't changed, inode
stats have a lifetime after which they get revalidated on access, and
directory ctime changes lead to a fresh readdir); but we can't really
do that right now if it "leaks".
Some cap to the number of open fids could be appreciable as well
perhaps, to spare server resources and keep internal lists short.
> Keep in mind, even when you just open() & read() a file, then directory
> components have to be walked for checking ownership and permissions. I have
> seen huge slowdowns in deep directory structures for that reason.
Yes, each component is walked one at a time. In theory the protocol
allows opening a path with all components specified in a single walk,
letting the server handle the intermediate directory checks, but the VFS
doesn't allow that.
Using relative paths or openat/fstatat/etc. helps, but many programs
aren't very smart about that. Note it's not just a problem with 9p
though; even network filesystems with proper caching have a noticeable
performance cost with deep directory trees.
Anyway, there definitely is room for improvement; if you need ideas I
have plenty, but my time is more than limited right now and for the
foreseeable future... 9p work is purely on my free time and there isn't
much at the moment :(
I'll make time as necessary for reviews & tests but that's about as much
as I can promise, sorry and good luck!
--
Dominique
* Re: Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance)
2021-03-03 14:50 ` [Virtio-fs] " Dominique Martinet
@ 2021-03-05 14:57 ` Christian Schoenebeck
-1 siblings, 0 replies; 107+ messages in thread
From: Christian Schoenebeck @ 2021-03-05 14:57 UTC (permalink / raw)
To: qemu-devel
Cc: Dominique Martinet, cdupontd, Michael S. Tsirkin, Venegas Munoz,
Jose Carlos, Greg Kurz, virtio-fs-list, Vivek Goyal,
Stefan Hajnoczi, v9fs-developer, Shinde, Archana M,
Dr. David Alan Gilbert
On Wednesday, March 3, 2021 15:50:37 CET Dominique Martinet wrote:
> Christian Schoenebeck wrote on Wed, Mar 03, 2021 at 03:04:21PM +0100:
> > > We can definitely increase the default, for all transports in my
> > > opinion.
> > > As a first step, 64 or 128k?
> >
> > Just to throw some numbers first; when linearly reading a 12 GB file on
> > guest (i.e. "time cat test.dat > /dev/null") on a test machine, these are
> > the results that I get (cache=mmap):
> >
> > msize=16k: 2min7s (95 MB/s)
> > msize=64k: 17s (706 MB/s)
> > msize=128k: 12s (1000 MB/s)
> > msize=256k: 8s (1500 MB/s)
> > msize=512k: 6.5s (1846 MB/s)
> >
> > Personally I would raise the default msize value at least to 128k.
>
> Thanks for the numbers.
> I'm still a bit worried about too large chunks, let's go with 128k for
> now -- I'll send a couple of patches increasing the tcp max/default as
> well next week-ish.
Ok, sounds good!
> > Personally I don't find raw I/O the worst performance issue right now. As
> > you can see from the numbers above, if 'msize' is raised and I/O being
> > performed with large chunk sizes (e.g. 'cat' automatically uses a chunk
> > size according to the iounit advertised by stat) then the I/O results are
> > okay.
> >
> > What hurts IMO the most in practice is the sluggish behaviour regarding
> > dentries ATM. The following is with cache=mmap (on guest side):
> >
> > $ time ls /etc/ > /dev/null
> > real 0m0.091s
> > user 0m0.000s
> > sys 0m0.044s
> > $ time ls -l /etc/ > /dev/null
> > real 0m0.259s
> > user 0m0.008s
> > sys 0m0.016s
> > $ ls -l /etc/ | wc -l
> > 113
> > $
>
> Yes, that is slow indeed.. Unfortunately cache=none/mmap means only open
> dentries are pinned, so that means a load of requests everytime.
>
> I was going to suggest something like readdirplus or prefetching
> directory entries attributes in parallel/background, but since we're not
> keeping any entries around we can't even do that in that mode.
>
> > With cache=loose there is some improvement; on the first "ls" run (when
> > its
> > not in the dentry cache I assume) the results are similar. The subsequent
> > runs then improve to around 50ms for "ls" and around 70ms for "ls -l".
> > But that's still far from numbers I would expect.
>
> I'm surprised cached mode is that slow though, that is worth
> investigating.
> With that time range we are definitely sending more requests to the
> server than I would expect for cache=loose, some stat revalidation
> perhaps? I thought there wasn't any.
Yes, it looks like more 9p requests are sent than actually required for
readdir. But I haven't checked yet what's going on there in detail. That's
definitely on my todo list, because this readdir/stat/direntry issue ATM
really hurts the most IMO.
> I don't like cache=loose/fscache right now as the reclaim mechanism
> doesn't work well as far as I'm aware (I've heard reports of 9p memory
> usage growing ad nauseam in these modes), so while it's fine for
> short-lived VMs it can't really be used for long periods of time as
> is... That's been on my todo for a while too, but unfortunately no time
> for that.
Ok, that's new to me. But I fear the opposite is currently worse; with
cache=mmap and running a VM for a longer time, 9p requests get slower and
slower, e.g. at a certain point you're waiting like 20s for one request. I
haven't investigated the cause here yet either. It may very well be an issue
on QEMU side: I have some doubts about the fid reclaim algorithm on the 9p
server side, which is using just a linked list. Maybe that list is growing
to ridiculous sizes and searching the list in O(n) starts to hurt after a
while.
With cache=loose I don't see such tremendous slowdowns even on long runs,
which might indicate that this symptom might indeed be due to a problem on
QEMU side.
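To illustrate the linked-list concern: a table keyed by fid number keeps
lookups near O(1) instead of scanning a list in O(n). This is purely
illustrative, not QEMU's actual data structures:

```c
#include <stdint.h>
#include <stdlib.h>

/* Illustrative fid table: hash by fid number for near-constant-time
 * lookup instead of walking one long linked list. */
#define FID_BUCKETS 256u

struct fid {
    uint32_t id;
    struct fid *next;      /* hash-bucket chain */
};

struct fid_table {
    struct fid *buckets[FID_BUCKETS];
};

static struct fid *fid_find(struct fid_table *t, uint32_t id)
{
    for (struct fid *f = t->buckets[id % FID_BUCKETS]; f; f = f->next)
        if (f->id == id)
            return f;
    return NULL;
}

static void fid_insert(struct fid_table *t, struct fid *f)
{
    struct fid **head = &t->buckets[f->id % FID_BUCKETS];
    f->next = *head;       /* prepend to the bucket chain */
    *head = f;
}
```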
> Ideally if that gets fixed, it really should be the default with some
> sort of cache revalidation like NFS does (if that hasn't changed, inode
> stats have a lifetime after which they get revalidated on access, and
> directory ctime changes lead to a fresh readdir) ; but we can't really
> do that right now if it "leaks".
>
> Some cap to the number of open fids could be appreciable as well
> perhaps, to spare server resources and keep internal lists short.
I just reviewed the fid reclaim code on the 9p server side to some extent
because of a recent security issue in that area, but I haven't really
thought through nor captured the authors' original ideas behind it entirely
yet. I still have some question marks here. Maybe Greg feels the same.
Probably when support for macOS is added (also on my todo list), the number
of open fids will need to be limited anyway, because macOS is much more
conservative and does not allow a large number of open files by default.
> Anyway, there definitely is room for improvement; if you need ideas I
> have plenty but my time is more than limited right now and for the
> forseeable future... 9p work is purely on my freetime and there isn't
> much at the moment :(
>
> I'll make time as necessary for reviews & tests but that's about as much
> as I can promise, sorry and good luck!
I fear that applies to all developers right now. To my knowledge there is
not a single developer either paid and/or able to spend reasonably large
time slices on 9p issues.
From my side: my plans are to hunt down the worst 9p performance issues in
order of their impact, but like anybody else, only when I find some free
time slices for that.
#patience #optimistic
Best regards,
Christian Schoenebeck
Thread overview: 107+ messages
2020-09-18 21:34 tools/virtiofs: Multi threading seems to hurt performance Vivek Goyal
2020-09-18 21:34 ` [Virtio-fs] " Vivek Goyal
2020-09-21 8:39 ` Stefan Hajnoczi
2020-09-21 8:39 ` [Virtio-fs] " Stefan Hajnoczi
2020-09-21 13:39 ` Vivek Goyal
2020-09-21 13:39 ` [Virtio-fs] " Vivek Goyal
2020-09-21 16:57 ` Stefan Hajnoczi
2020-09-21 16:57 ` [Virtio-fs] " Stefan Hajnoczi
2020-09-21 8:50 ` Dr. David Alan Gilbert
2020-09-21 8:50 ` [Virtio-fs] " Dr. David Alan Gilbert
2020-09-21 13:35 ` Vivek Goyal
2020-09-21 13:35 ` [Virtio-fs] " Vivek Goyal
2020-09-21 14:08 ` Daniel P. Berrangé
2020-09-21 14:08 ` [Virtio-fs] " Daniel P. Berrangé
2020-09-21 15:32 ` Dr. David Alan Gilbert
2020-09-21 15:32 ` [Virtio-fs] " Dr. David Alan Gilbert
2020-09-22 10:25 ` Dr. David Alan Gilbert
2020-09-22 10:25 ` [Virtio-fs] " Dr. David Alan Gilbert
2020-09-22 17:47 ` Vivek Goyal
2020-09-22 17:47 ` [Virtio-fs] " Vivek Goyal
2020-09-24 21:33 ` Venegas Munoz, Jose Carlos
2020-09-24 21:33 ` [Virtio-fs] " Venegas Munoz, Jose Carlos
2020-09-24 22:10 ` virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance) Vivek Goyal
2020-09-24 22:10 ` [Virtio-fs] " Vivek Goyal
2020-09-25 8:06 ` virtiofs vs 9p performance Christian Schoenebeck
2020-09-25 8:06 ` [Virtio-fs] " Christian Schoenebeck
2020-09-25 13:13 ` Vivek Goyal
2020-09-25 13:13 ` [Virtio-fs] " Vivek Goyal
2020-09-25 15:47 ` Christian Schoenebeck
2020-09-25 15:47 ` [Virtio-fs] " Christian Schoenebeck
2021-02-19 16:08 ` Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance) Vivek Goyal
2021-02-19 16:08 ` [Virtio-fs] " Vivek Goyal
2021-02-19 17:33 ` Christian Schoenebeck
2021-02-19 17:33 ` [Virtio-fs] " Christian Schoenebeck
2021-02-19 19:01 ` Vivek Goyal
2021-02-19 19:01 ` [Virtio-fs] " Vivek Goyal
2021-02-20 15:38 ` Christian Schoenebeck
2021-02-20 15:38 ` [Virtio-fs] " Christian Schoenebeck
2021-02-22 12:18 ` Greg Kurz
2021-02-22 12:18 ` [Virtio-fs] " Greg Kurz
2021-02-22 15:08 ` Christian Schoenebeck
2021-02-22 15:08 ` [Virtio-fs] " Christian Schoenebeck
2021-02-22 17:11 ` Greg Kurz
2021-02-22 17:11 ` [Virtio-fs] " Greg Kurz
2021-02-23 13:39 ` Christian Schoenebeck
2021-02-23 13:39 ` [Virtio-fs] " Christian Schoenebeck
2021-02-23 14:07 ` Michael S. Tsirkin
2021-02-23 14:07 ` [Virtio-fs] " Michael S. Tsirkin
2021-02-24 15:16 ` Christian Schoenebeck
2021-02-24 15:16 ` [Virtio-fs] " Christian Schoenebeck
2021-02-24 15:43 ` Dominique Martinet
2021-02-24 15:43 ` [Virtio-fs] " Dominique Martinet
2021-02-26 13:49 ` Christian Schoenebeck
2021-02-26 13:49 ` [Virtio-fs] " Christian Schoenebeck
2021-02-27 0:03 ` Dominique Martinet
2021-02-27 0:03 ` [Virtio-fs] " Dominique Martinet
2021-03-03 14:04 ` Christian Schoenebeck
2021-03-03 14:04 ` [Virtio-fs] " Christian Schoenebeck
2021-03-03 14:50 ` Dominique Martinet
2021-03-03 14:50 ` [Virtio-fs] " Dominique Martinet
2021-03-05 14:57 ` Christian Schoenebeck
2021-03-05 14:57 ` [Virtio-fs] " Christian Schoenebeck
2020-09-25 12:41 ` virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance) Dr. David Alan Gilbert
2020-09-25 12:41 ` [Virtio-fs] " Dr. David Alan Gilbert
2020-09-25 13:04 ` Christian Schoenebeck
2020-09-25 13:04 ` [Virtio-fs] " Christian Schoenebeck
2020-09-25 13:05 ` Dr. David Alan Gilbert
2020-09-25 13:05 ` [Virtio-fs] " Dr. David Alan Gilbert
2020-09-25 16:05 ` Christian Schoenebeck
2020-09-25 16:05 ` [Virtio-fs] " Christian Schoenebeck
2020-09-25 16:33 ` Christian Schoenebeck
2020-09-25 16:33 ` [Virtio-fs] " Christian Schoenebeck
2020-09-25 18:51 ` Dr. David Alan Gilbert
2020-09-25 18:51 ` [Virtio-fs] " Dr. David Alan Gilbert
2020-09-27 12:14 ` Christian Schoenebeck
2020-09-27 12:14 ` [Virtio-fs] " Christian Schoenebeck
2020-09-29 13:03 ` Vivek Goyal
2020-09-29 13:03 ` [Virtio-fs] " Vivek Goyal
2020-09-29 13:28 ` Christian Schoenebeck
2020-09-29 13:28 ` [Virtio-fs] " Christian Schoenebeck
2020-09-29 13:49 ` Vivek Goyal
2020-09-29 13:49 ` [Virtio-fs] " Vivek Goyal
2020-09-29 13:59 ` Christian Schoenebeck
2020-09-29 13:59 ` [Virtio-fs] " Christian Schoenebeck
2020-09-29 13:17 ` Vivek Goyal
2020-09-29 13:17 ` [Virtio-fs] " Vivek Goyal
2020-09-29 13:49 ` Miklos Szeredi
2020-09-29 13:49 ` Miklos Szeredi
2020-09-29 14:01 ` Vivek Goyal
2020-09-29 14:01 ` Vivek Goyal
2020-09-29 14:54 ` Miklos Szeredi
2020-09-29 14:54 ` Miklos Szeredi
2020-09-29 15:28 ` Vivek Goyal
2020-09-29 15:28 ` Vivek Goyal
2020-09-25 12:11 ` tools/virtiofs: Multi threading seems to hurt performance Dr. David Alan Gilbert
2020-09-25 12:11 ` [Virtio-fs] " Dr. David Alan Gilbert
2020-09-25 13:11 ` Vivek Goyal
2020-09-25 13:11 ` [Virtio-fs] " Vivek Goyal
2020-09-21 20:16 ` Vivek Goyal
2020-09-21 20:16 ` [Virtio-fs] " Vivek Goyal
2020-09-22 11:09 ` Dr. David Alan Gilbert
2020-09-22 11:09 ` [Virtio-fs] " Dr. David Alan Gilbert
2020-09-22 22:56 ` Vivek Goyal
2020-09-22 22:56 ` [Virtio-fs] " Vivek Goyal
2020-09-23 12:50 ` Chirantan Ekbote
2020-09-23 12:59 ` Vivek Goyal
2020-09-25 11:35 ` Dr. David Alan Gilbert