* tools/virtiofs: Multi threading seems to hurt performance
@ 2020-09-18 21:34 Vivek Goyal
  2020-09-21  8:39 ` Stefan Hajnoczi
  ` (4 more replies)
  0 siblings, 5 replies; 55+ messages in thread
From: Vivek Goyal @ 2020-09-18 21:34 UTC (permalink / raw)
  To: virtio-fs-list, qemu-devel; +Cc: Dr. David Alan Gilbert, Stefan Hajnoczi

Hi All,

virtiofsd's default thread pool size is 64. To me it feels that in most
cases a thread pool size of 1 performs better than a thread pool size of
64.

I ran virtiofs-tests:

  https://github.com/rhvgoyal/virtiofs-tests

And here are the comparison results. To me it seems that by default we
should switch to 1 thread (till we can figure out how to make
multi-thread performance better even when a single process is doing I/O
in the client).

I am especially interested in getting performance better for a single
process in the client. If that suffers, then it is pretty bad.

Especially look at the randread, randwrite and seqwrite performance;
seqread seems pretty good anyway.

If I don't run the whole test suite and just run the randread-psync job,
my throughput jumps from around 40MB/s to 60MB/s. That's a huge jump, I
would say.

Thoughts?
Thanks
Vivek

NAME                 WORKLOAD                Bandwidth    IOPS
cache-auto           seqread-psync           690(MiB/s)   172k
cache-auto-1-thread  seqread-psync           729(MiB/s)   182k

cache-auto           seqread-psync-multi     2578(MiB/s)  644k
cache-auto-1-thread  seqread-psync-multi     2597(MiB/s)  649k

cache-auto           seqread-mmap            660(MiB/s)   165k
cache-auto-1-thread  seqread-mmap            672(MiB/s)   168k

cache-auto           seqread-mmap-multi      2499(MiB/s)  624k
cache-auto-1-thread  seqread-mmap-multi      2618(MiB/s)  654k

cache-auto           seqread-libaio          286(MiB/s)   71k
cache-auto-1-thread  seqread-libaio          260(MiB/s)   65k

cache-auto           seqread-libaio-multi    1508(MiB/s)  377k
cache-auto-1-thread  seqread-libaio-multi    986(MiB/s)   246k

cache-auto           randread-psync          35(MiB/s)    9191
cache-auto-1-thread  randread-psync          55(MiB/s)    13k

cache-auto           randread-psync-multi    179(MiB/s)   44k
cache-auto-1-thread  randread-psync-multi    209(MiB/s)   52k

cache-auto           randread-mmap           32(MiB/s)    8273
cache-auto-1-thread  randread-mmap           50(MiB/s)    12k

cache-auto           randread-mmap-multi     161(MiB/s)   40k
cache-auto-1-thread  randread-mmap-multi     185(MiB/s)   46k

cache-auto           randread-libaio         268(MiB/s)   67k
cache-auto-1-thread  randread-libaio         254(MiB/s)   63k

cache-auto           randread-libaio-multi   256(MiB/s)   64k
cache-auto-1-thread  randread-libaio-multi   155(MiB/s)   38k

cache-auto           seqwrite-psync          23(MiB/s)    6026
cache-auto-1-thread  seqwrite-psync          30(MiB/s)    7925

cache-auto           seqwrite-psync-multi    100(MiB/s)   25k
cache-auto-1-thread  seqwrite-psync-multi    154(MiB/s)   38k

cache-auto           seqwrite-mmap           343(MiB/s)   85k
cache-auto-1-thread  seqwrite-mmap           355(MiB/s)   88k

cache-auto           seqwrite-mmap-multi     408(MiB/s)   102k
cache-auto-1-thread  seqwrite-mmap-multi     438(MiB/s)   109k

cache-auto           seqwrite-libaio         41(MiB/s)    10k
cache-auto-1-thread  seqwrite-libaio         65(MiB/s)    16k

cache-auto           seqwrite-libaio-multi   137(MiB/s)   34k
cache-auto-1-thread  seqwrite-libaio-multi   214(MiB/s)   53k

cache-auto           randwrite-psync         22(MiB/s)    5801
cache-auto-1-thread  randwrite-psync         30(MiB/s)    7927

cache-auto           randwrite-psync-multi   100(MiB/s)   25k
cache-auto-1-thread  randwrite-psync-multi   151(MiB/s)   37k

cache-auto           randwrite-mmap          31(MiB/s)    7984
cache-auto-1-thread  randwrite-mmap          55(MiB/s)    13k

cache-auto           randwrite-mmap-multi    124(MiB/s)   31k
cache-auto-1-thread  randwrite-mmap-multi    213(MiB/s)   53k

cache-auto           randwrite-libaio        40(MiB/s)    10k
cache-auto-1-thread  randwrite-libaio        64(MiB/s)    16k

cache-auto           randwrite-libaio-multi  139(MiB/s)   34k
cache-auto-1-thread  randwrite-libaio-multi  212(MiB/s)   53k

^ permalink raw reply	[flat|nested] 55+ messages in thread
* Re: tools/virtiofs: Multi threading seems to hurt performance
  2020-09-18 21:34 tools/virtiofs: Multi threading seems to hurt performance Vivek Goyal
@ 2020-09-21  8:39 ` Stefan Hajnoczi
  2020-09-21 13:39   ` Vivek Goyal
  2020-09-21  8:50 ` Dr. David Alan Gilbert
  ` (3 subsequent siblings)
  4 siblings, 1 reply; 55+ messages in thread
From: Stefan Hajnoczi @ 2020-09-21 8:39 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs-list, qemu-devel, Dr. David Alan Gilbert

On Fri, Sep 18, 2020 at 05:34:36PM -0400, Vivek Goyal wrote:
> And here are the comparison results. To me it seems that by default
> we should switch to 1 thread (till we can figure out how to make
> multi-thread performance better even when a single process is doing
> I/O in the client).

Let's understand the reason before making changes.

Questions:
 * Is "1-thread" --thread-pool-size=1?
 * Was DAX enabled?
 * How does cache=none perform?
 * Does commenting out vu_queue_get_avail_bytes() + fuse_log("%s:
   Queue %d gave evalue: %zx available: in: %u out: %u\n") in
   fv_queue_thread help?
 * How do the kvm_stat vmexit counters compare?
 * How does host mpstat -P ALL compare?
 * How does host perf record -a compare?
 * Does the Rust virtiofsd show the same pattern (it doesn't use glib
   thread pools)?
Stefan

> [full benchmark table quoted; snipped]
* Re: tools/virtiofs: Multi threading seems to hurt performance
  2020-09-21  8:39 ` Stefan Hajnoczi
@ 2020-09-21 13:39   ` Vivek Goyal
  2020-09-21 16:57     ` Stefan Hajnoczi
  0 siblings, 1 reply; 55+ messages in thread
From: Vivek Goyal @ 2020-09-21 13:39 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-fs-list, qemu-devel, Dr. David Alan Gilbert

On Mon, Sep 21, 2020 at 09:39:23AM +0100, Stefan Hajnoczi wrote:
> Let's understand the reason before making changes.
>
> Questions:
> * Is "1-thread" --thread-pool-size=1?

Yes.

> * Was DAX enabled?

No.

> * How does cache=none perform?

I just ran the random read workload with cache=none.

cache-none           randread-psync  45(MiB/s)  11k
cache-none-1-thread  randread-psync  63(MiB/s)  15k

With 1 thread it offers more IOPS.

> * Does commenting out vu_queue_get_avail_bytes() + fuse_log("%s:
>   Queue %d gave evalue: %zx available: in: %u out: %u\n") in
>   fv_queue_thread help?

Will try that.

> * How do the kvm_stat vmexit counters compare?

These should be the same, shouldn't they? Changing the number of
threads serving the queue should not change the number of vmexits.

> * How does host mpstat -P ALL compare?

Never used mpstat. Will try running it and see if I can get something
meaningful.

> * How does host perf record -a compare?

Will try it. I feel this might be too big and too verbose to get
something meaningful.

> * Does the Rust virtiofsd show the same pattern (it doesn't use glib
>   thread pools)?

No idea. I have never tried the Rust implementation of virtiofsd.

But I suspect it has to do with the thread pool implementation and
possibly extra cacheline bouncing.

Thanks
Vivek
* Re: tools/virtiofs: Multi threading seems to hurt performance
  2020-09-21 13:39 ` Vivek Goyal
@ 2020-09-21 16:57   ` Stefan Hajnoczi
  0 siblings, 0 replies; 55+ messages in thread
From: Stefan Hajnoczi @ 2020-09-21 16:57 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs-list, qemu-devel, Dr. David Alan Gilbert

On Mon, Sep 21, 2020 at 09:39:44AM -0400, Vivek Goyal wrote:
> > Questions:
> > * Is "1-thread" --thread-pool-size=1?
>
> Yes.

Okay, I wanted to make sure 1-thread is still going through the glib
thread pool. So it's the same code path regardless of the
--thread-pool-size= value. This suggests the performance issue is
related to timing side-effects like lock contention, thread
scheduling, etc.

> > * How do the kvm_stat vmexit counters compare?
>
> These should be the same, shouldn't they? Changing the number of
> threads serving the queue should not change the number of vmexits.

There is batching at the virtio and eventfd levels. I'm not sure if
it's coming into play here, but you would see it by comparing vmexits
and eventfd reads. Having more threads can increase the number of
notifications and completion interrupts, which can make overall
performance worse in some cases.

> > * How does host mpstat -P ALL compare?
>
> Never used mpstat. Will try running it and see if I can get something
> meaningful.

Tools like top, vmstat, etc. can give similar information. I'm
wondering what the host CPU utilization (guest/sys/user) looks like.

> But I suspect it has to do with the thread pool implementation and
> possibly extra cacheline bouncing.

I think perf can record cacheline bounces if you want to check.

Stefan

^ permalink raw reply	[flat|nested] 55+ messages in thread
* Re: tools/virtiofs: Multi threading seems to hurt performance
  2020-09-18 21:34 tools/virtiofs: Multi threading seems to hurt performance Vivek Goyal
  2020-09-21  8:39 ` Stefan Hajnoczi
@ 2020-09-21  8:50 ` Dr. David Alan Gilbert
  2020-09-21 13:35   ` Vivek Goyal
  2020-09-21 15:32 ` Dr. David Alan Gilbert
  ` (2 subsequent siblings)
  4 siblings, 1 reply; 55+ messages in thread
From: Dr. David Alan Gilbert @ 2020-09-21 8:50 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs-list, qemu-devel, Stefan Hajnoczi

* Vivek Goyal (vgoyal@redhat.com) wrote:
> Hi All,
>
> virtiofsd default thread pool size is 64. To me it feels that in most of
> the cases thread pool size 1 performs better than thread pool size 64.
>
> [...]
>
> If I don't run the whole test suite and just run the randread-psync job,
> my throughput jumps from around 40MB/s to 60MB/s. That's a huge
> jump I would say.
>
> Thoughts?

What's your host setup; how many cores has the host got and how many did
you give the guest?
Dave

> [benchmark table quoted in full; snipped]

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: tools/virtiofs: Multi threading seems to hurt performance
  2020-09-21  8:50 ` Dr. David Alan Gilbert
@ 2020-09-21 13:35   ` Vivek Goyal
  2020-09-21 14:08     ` Daniel P. Berrangé
  0 siblings, 1 reply; 55+ messages in thread
From: Vivek Goyal @ 2020-09-21 13:35 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: virtio-fs-list, qemu-devel, Stefan Hajnoczi

On Mon, Sep 21, 2020 at 09:50:19AM +0100, Dr. David Alan Gilbert wrote:
> * Vivek Goyal (vgoyal@redhat.com) wrote:
> > [...]
>
> What's your host setup; how many cores has the host got and how many did
> you give the guest?

The host has 2 processors with 16 cores each. With hyperthreading
enabled, that makes 32 logical cores per processor and 64 logical cores
on the host in total.

I have given 32 to the guest.

Vivek
* Re: tools/virtiofs: Multi threading seems to hurt performance
  2020-09-21 13:35 ` Vivek Goyal
@ 2020-09-21 14:08   ` Daniel P. Berrangé
  0 siblings, 0 replies; 55+ messages in thread
From: Daniel P. Berrangé @ 2020-09-21 14:08 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: virtio-fs-list, Dr. David Alan Gilbert, Stefan Hajnoczi, qemu-devel

On Mon, Sep 21, 2020 at 09:35:16AM -0400, Vivek Goyal wrote:
> On Mon, Sep 21, 2020 at 09:50:19AM +0100, Dr. David Alan Gilbert wrote:
> > What's your host setup; how many cores has the host got and how many did
> > you give the guest?
>
> The host has 2 processors with 16 cores each. With hyperthreading
> enabled, that makes 32 logical cores per processor and 64 logical cores
> on the host in total.
>
> I have given 32 to the guest.

FWIW, I'd be inclined to disable hyperthreading in the BIOS for one test
to validate whether it is impacting the performance results seen.
Hyperthreads are weak compared to a real CPU, and could result in
misleading data even if you are limiting your guest to 1/2 the host
logical CPUs.

Regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
* Re: tools/virtiofs: Multi threading seems to hurt performance
  2020-09-18 21:34 tools/virtiofs: Multi threading seems to hurt performance Vivek Goyal
  2020-09-21  8:39 ` Stefan Hajnoczi
  2020-09-21  8:50 ` Dr. David Alan Gilbert
@ 2020-09-21 15:32 ` Dr. David Alan Gilbert
  2020-09-22 10:25   ` Dr. David Alan Gilbert
  2020-09-21 20:16 ` Vivek Goyal
  2020-09-23 12:50 ` [Virtio-fs] " Chirantan Ekbote
  4 siblings, 1 reply; 55+ messages in thread
From: Dr. David Alan Gilbert @ 2020-09-21 15:32 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: jose.carlos.venegas.munoz, qemu-devel, cdupontd, virtio-fs-list,
      Stefan Hajnoczi, archana.m.shinde

Hi,
  I've been doing some of my own perf tests and I think I agree about
the thread pool size; my test is a kernel build and I've tried a bunch
of different options.

My config:
  Host: 16 core AMD EPYC (32 thread), 128G RAM,
        5.9.0-rc4 kernel, rhel 8.2ish userspace.
        5.1.0 qemu/virtiofsd built from git.
  Guest: Fedora 32 from cloud image with just enough extra installed
        for a kernel build.

I git cloned and checked out v5.8 of Linux into /dev/shm/linux on the
host, fresh before each test. Then log into the guest, make defconfig,
time make -j 16 bzImage, make clean; time make -j 16 bzImage.
The numbers below are the 'real' time in the guest from the initial
make (the subsequent makes don't vary much).

Below are the details of what each of these means, but here are the
numbers first:

  virtiofsdefault         4m0.978s
  9pdefault               9m41.660s
  virtiofscache=none     10m29.700s
  9pmmappass              9m30.047s
  9pmbigmsize            12m4.208s
  9pmsecnone              9m21.363s
  virtiofscache=noneT1    7m17.494s
  virtiofsdefaultT1       3m43.326s

So the winner there by far is 'virtiofsdefaultT1' - that's the default
virtiofs settings, but with --thread-pool-size=1 - so yes it gives a
small benefit.
But interestingly the cache=none virtiofs performance is pretty bad,
but thread-pool-size=1 on that makes a BIG improvement.
virtiofsdefault:
  ./virtiofsd --socket-path=/tmp/vhostqemu -o source=/dev/shm/linux
  ./x86_64-softmmu/qemu-system-x86_64 -M pc,memory-backend=mem,accel=kvm -smp 8 -cpu host -m 32G,maxmem=64G,slots=1 -object memory-backend-memfd,id=mem,size=32G,share=on -drive if=virtio,file=/home/images/f-32-kernel.qcow2 -nographic -chardev socket,id=char0,path=/tmp/vhostqemu -device vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=kernel
  mount -t virtiofs kernel /mnt

9pdefault:
  ./x86_64-softmmu/qemu-system-x86_64 -M pc,accel=kvm -smp 8 -cpu host -m 32G -drive if=virtio,file=/home/images/f-32-kernel.qcow2 -nographic -virtfs local,path=/dev/shm/linux,mount_tag=kernel,security_model=passthrough
  mount -t 9p -o trans=virtio kernel /mnt -oversion=9p2000.L

virtiofscache=none:
  ./virtiofsd --socket-path=/tmp/vhostqemu -o source=/dev/shm/linux -o cache=none
  ./x86_64-softmmu/qemu-system-x86_64 -M pc,memory-backend=mem,accel=kvm -smp 8 -cpu host -m 32G,maxmem=64G,slots=1 -object memory-backend-memfd,id=mem,size=32G,share=on -drive if=virtio,file=/home/images/f-32-kernel.qcow2 -nographic -chardev socket,id=char0,path=/tmp/vhostqemu -device vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=kernel
  mount -t virtiofs kernel /mnt

9pmmappass:
  ./x86_64-softmmu/qemu-system-x86_64 -M pc,accel=kvm -smp 8 -cpu host -m 32G -drive if=virtio,file=/home/images/f-32-kernel.qcow2 -nographic -virtfs local,path=/dev/shm/linux,mount_tag=kernel,security_model=passthrough
  mount -t 9p -o trans=virtio kernel /mnt -oversion=9p2000.L,cache=mmap

9pmbigmsize:
  ./x86_64-softmmu/qemu-system-x86_64 -M pc,accel=kvm -smp 8 -cpu host -m 32G -drive if=virtio,file=/home/images/f-32-kernel.qcow2 -nographic -virtfs local,path=/dev/shm/linux,mount_tag=kernel,security_model=passthrough
  mount -t 9p -o trans=virtio kernel /mnt -oversion=9p2000.L,cache=mmap,msize=1048576

9pmsecnone:
  ./x86_64-softmmu/qemu-system-x86_64 -M pc,accel=kvm -smp 8 -cpu host -m 32G -drive if=virtio,file=/home/images/f-32-kernel.qcow2 -nographic -virtfs local,path=/dev/shm/linux,mount_tag=kernel,security_model=none
  mount -t 9p -o trans=virtio kernel /mnt -oversion=9p2000.L

virtiofscache=noneT1:
  ./virtiofsd --socket-path=/tmp/vhostqemu -o source=/dev/shm/linux -o cache=none --thread-pool-size=1
  mount -t virtiofs kernel /mnt

virtiofsdefaultT1:
  ./virtiofsd --socket-path=/tmp/vhostqemu -o source=/dev/shm/linux --thread-pool-size=1
  ./x86_64-softmmu/qemu-system-x86_64 -M pc,memory-backend=mem,accel=kvm -smp 8 -cpu host -m 32G,maxmem=64G,slots=1 -object memory-backend-memfd,id=mem,size=32G,share=on -drive if=virtio,file=/home/images/f-32-kernel.qcow2 -nographic -chardev socket,id=char0,path=/tmp/vhostqemu -device vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=kernel

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: tools/virtiofs: Multi threading seems to hurt performance
  2020-09-21 15:32 ` Dr. David Alan Gilbert
@ 2020-09-22 10:25   ` Dr. David Alan Gilbert
  2020-09-22 17:47     ` Vivek Goyal
  0 siblings, 1 reply; 55+ messages in thread
From: Dr. David Alan Gilbert @ 2020-09-22 10:25 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: jose.carlos.venegas.munoz, qemu-devel, cdupontd, virtio-fs-list,
      Stefan Hajnoczi, archana.m.shinde

* Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> Hi,
>   I've been doing some of my own perf tests and I think I agree about
> the thread pool size; my test is a kernel build and I've tried a bunch
> of different options.
>
> [...]
>
> So the winner there by far is 'virtiofsdefaultT1' - that's the default
> virtiofs settings, but with --thread-pool-size=1 - so yes it gives a
> small benefit.
> But interestingly the cache=none virtiofs performance is pretty bad,
> but thread-pool-size=1 on that makes a BIG improvement.

Here are fio runs that Vivek asked me to run in my same environment
(there are some 0's in some of the mmap cases, and I've not
investigated why yet). virtiofs is looking good here in, I think, all
of the cases; there's some division over which config; cache=none seems
faster in some cases, which surprises me.

Dave

NAME                 WORKLOAD               Bandwidth    IOPS
9pbigmsize           seqread-psync          108(MiB/s)   27k
9pdefault            seqread-psync          105(MiB/s)   26k
9pmmappass           seqread-psync          107(MiB/s)   26k
9pmsecnone           seqread-psync          107(MiB/s)   26k
virtiofscachenoneT1  seqread-psync          135(MiB/s)   33k
virtiofscachenone    seqread-psync          115(MiB/s)   28k
virtiofsdefaultT1    seqread-psync          2465(MiB/s)  616k
virtiofsdefault      seqread-psync          2468(MiB/s)  617k

9pbigmsize           seqread-psync-multi    357(MiB/s)   89k
9pdefault            seqread-psync-multi    358(MiB/s)   89k
9pmmappass           seqread-psync-multi    347(MiB/s)   86k
9pmsecnone           seqread-psync-multi    364(MiB/s)   91k
virtiofscachenoneT1  seqread-psync-multi    479(MiB/s)   119k
virtiofscachenone    seqread-psync-multi    385(MiB/s)   96k
virtiofsdefaultT1    seqread-psync-multi    5916(MiB/s)  1479k
virtiofsdefault      seqread-psync-multi    8771(MiB/s)  2192k

9pbigmsize           seqread-mmap           111(MiB/s)   27k
9pdefault            seqread-mmap           101(MiB/s)   25k
9pmmappass           seqread-mmap           114(MiB/s)   28k
9pmsecnone           seqread-mmap           107(MiB/s)   26k
virtiofscachenoneT1  seqread-mmap           0(KiB/s)     0
virtiofscachenone    seqread-mmap           0(KiB/s)     0
virtiofsdefaultT1    seqread-mmap           2896(MiB/s)  724k
virtiofsdefault      seqread-mmap           2856(MiB/s)  714k

9pbigmsize           seqread-mmap-multi     364(MiB/s)   91k
9pdefault            seqread-mmap-multi     348(MiB/s)   87k
9pmmappass           seqread-mmap-multi     354(MiB/s)   88k
9pmsecnone           seqread-mmap-multi     340(MiB/s)   85k
virtiofscachenoneT1  seqread-mmap-multi     0(KiB/s)     0
virtiofscachenone    seqread-mmap-multi     0(KiB/s)     0
virtiofsdefaultT1    seqread-mmap-multi     6057(MiB/s)  1514k
virtiofsdefault      seqread-mmap-multi     9585(MiB/s)  2396k

9pbigmsize           seqread-libaio         109(MiB/s)   27k
9pdefault            seqread-libaio         103(MiB/s)   25k
9pmmappass           seqread-libaio         107(MiB/s)   26k
9pmsecnone           seqread-libaio         107(MiB/s)   26k
virtiofscachenoneT1  seqread-libaio         671(MiB/s)   167k
virtiofscachenone    seqread-libaio         538(MiB/s)   134k
virtiofsdefaultT1    seqread-libaio         187(MiB/s)   46k
virtiofsdefault      seqread-libaio         541(MiB/s)   135k

9pbigmsize           seqread-libaio-multi   354(MiB/s)   88k
9pdefault            seqread-libaio-multi   360(MiB/s)   90k
9pmmappass           seqread-libaio-multi   356(MiB/s)   89k
9pmsecnone           seqread-libaio-multi   344(MiB/s)   86k
virtiofscachenoneT1  seqread-libaio-multi   488(MiB/s)   122k
virtiofscachenone    seqread-libaio-multi   380(MiB/s)   95k
virtiofsdefaultT1    seqread-libaio-multi   5577(MiB/s)  1394k
virtiofsdefault      seqread-libaio-multi   5359(MiB/s)  1339k

9pbigmsize           randread-psync         106(MiB/s)   26k
9pdefault            randread-psync         106(MiB/s)   26k
9pmmappass           randread-psync         120(MiB/s)   30k
9pmsecnone           randread-psync         105(MiB/s)   26k
virtiofscachenoneT1  randread-psync         154(MiB/s)   38k
virtiofscachenone    randread-psync         134(MiB/s)   33k
virtiofsdefaultT1    randread-psync         129(MiB/s)   32k
virtiofsdefault      randread-psync         129(MiB/s)   32k

9pbigmsize           randread-psync-multi   349(MiB/s)   87k
9pdefault            randread-psync-multi   354(MiB/s)   88k
9pmmappass           randread-psync-multi   360(MiB/s)   90k
9pmsecnone           randread-psync-multi   352(MiB/s)   88k
virtiofscachenoneT1  randread-psync-multi   449(MiB/s)   112k
virtiofscachenone    randread-psync-multi   383(MiB/s)   95k
virtiofsdefaultT1    randread-psync-multi   435(MiB/s)   108k
virtiofsdefault      randread-psync-multi   368(MiB/s)   92k

9pbigmsize           randread-mmap          100(MiB/s)   25k
9pdefault            randread-mmap          89(MiB/s)    22k
9pmmappass           randread-mmap          87(MiB/s)    21k
9pmsecnone           randread-mmap          92(MiB/s)    23k
virtiofscachenoneT1  randread-mmap          0(KiB/s)     0
virtiofscachenone    randread-mmap          0(KiB/s)     0
virtiofsdefaultT1    randread-mmap          111(MiB/s)   27k
virtiofsdefault      randread-mmap          101(MiB/s)   25k

9pbigmsize           randread-mmap-multi    335(MiB/s)   83k
9pdefault            randread-mmap-multi    318(MiB/s)   79k
9pmmappass           randread-mmap-multi    335(MiB/s)   83k
9pmsecnone           randread-mmap-multi    323(MiB/s)   80k
virtiofscachenoneT1  randread-mmap-multi    0(KiB/s)     0
virtiofscachenone    randread-mmap-multi    0(KiB/s)     0
virtiofsdefaultT1    randread-mmap-multi    422(MiB/s)   105k
virtiofsdefault      randread-mmap-multi    345(MiB/s)   86k

9pbigmsize           randread-libaio        84(MiB/s)    21k
9pdefault            randread-libaio        89(MiB/s)    22k
9pmmappass           randread-libaio        87(MiB/s)    21k
9pmsecnone           randread-libaio        82(MiB/s)    20k
virtiofscachenoneT1  randread-libaio        641(MiB/s)   160k
virtiofscachenone    randread-libaio        527(MiB/s)   131k
virtiofsdefaultT1    randread-libaio        205(MiB/s)   51k
virtiofsdefault      randread-libaio        536(MiB/s)   134k

9pbigmsize           randread-libaio-multi  265(MiB/s)   66k
9pdefault            randread-libaio-multi  267(MiB/s)   66k
9pmmappass           randread-libaio-multi  266(MiB/s)   66k
9pmsecnone           randread-libaio-multi  269(MiB/s)   67k
virtiofscachenoneT1  randread-libaio-multi  615(MiB/s)   153k
virtiofscachenone    randread-libaio-multi  542(MiB/s)   135k
virtiofsdefaultT1    randread-libaio-multi  595(MiB/s)   148k
virtiofsdefault      randread-libaio-multi  552(MiB/s)   138k

9pbigmsize           seqwrite-psync         106(MiB/s)   26k
9pdefault            seqwrite-psync         106(MiB/s)   26k
9pmmappass           seqwrite-psync         107(MiB/s)   26k
9pmsecnone           seqwrite-psync         107(MiB/s)   26k
virtiofscachenoneT1  seqwrite-psync         136(MiB/s)   34k
virtiofscachenone    seqwrite-psync         112(MiB/s)   28k
virtiofsdefaultT1    seqwrite-psync         132(MiB/s)   33k
virtiofsdefault      seqwrite-psync         109(MiB/s)   27k

9pbigmsize           seqwrite-psync-multi   353(MiB/s)   88k
9pdefault            seqwrite-psync-multi   364(MiB/s)   91k
9pmmappass           seqwrite-psync-multi   345(MiB/s)   86k
9pmsecnone           seqwrite-psync-multi   350(MiB/s)   87k
virtiofscachenoneT1  seqwrite-psync-multi   470(MiB/s)   117k
virtiofscachenone    seqwrite-psync-multi   374(MiB/s)   93k
virtiofsdefaultT1    seqwrite-psync-multi   470(MiB/s)   117k
virtiofsdefault      seqwrite-psync-multi   373(MiB/s)   93k

9pbigmsize           seqwrite-mmap          195(MiB/s)   48k
9pdefault            seqwrite-mmap          0(KiB/s)     0
9pmmappass           seqwrite-mmap          196(MiB/s)   49k
9pmsecnone           seqwrite-mmap          0(KiB/s)     0
virtiofscachenoneT1  seqwrite-mmap          0(KiB/s)     0
virtiofscachenone    seqwrite-mmap          0(KiB/s)     0
virtiofsdefaultT1    seqwrite-mmap          603(MiB/s)   150k
virtiofsdefault      seqwrite-mmap          629(MiB/s)   157k

9pbigmsize           seqwrite-mmap-multi    247(MiB/s)   61k
9pdefault            seqwrite-mmap-multi    0(KiB/s)     0
9pmmappass           seqwrite-mmap-multi    246(MiB/s)   61k
9pmsecnone           seqwrite-mmap-multi    0(KiB/s)     0
virtiofscachenoneT1  seqwrite-mmap-multi    0(KiB/s)     0
virtiofscachenone    seqwrite-mmap-multi    0(KiB/s)     0
virtiofsdefaultT1    seqwrite-mmap-multi    1787(MiB/s)  446k
virtiofsdefault      seqwrite-mmap-multi    1692(MiB/s)  423k

9pbigmsize           seqwrite-libaio        107(MiB/s)   26k
9pdefault            seqwrite-libaio        107(MiB/s)   26k
9pmmappass           seqwrite-libaio        106(MiB/s)   26k
9pmsecnone           seqwrite-libaio        108(MiB/s)   27k
virtiofscachenoneT1  seqwrite-libaio        595(MiB/s)   148k
virtiofscachenone    seqwrite-libaio        524(MiB/s)   131k
virtiofsdefaultT1    seqwrite-libaio        575(MiB/s)   143k
virtiofsdefault      seqwrite-libaio        538(MiB/s)   134k

9pbigmsize           seqwrite-libaio-multi  355(MiB/s)   88k
9pdefault            seqwrite-libaio-multi  341(MiB/s)   85k
9pmmappass           seqwrite-libaio-multi  354(MiB/s)   88k
9pmsecnone           seqwrite-libaio-multi  350(MiB/s)   87k
virtiofscachenoneT1  seqwrite-libaio-multi  609(MiB/s)   152k
virtiofscachenone    seqwrite-libaio-multi  536(MiB/s)   134k
virtiofsdefaultT1    seqwrite-libaio-multi  609(MiB/s)   152k
virtiofsdefault      seqwrite-libaio-multi  538(MiB/s)   134k

9pbigmsize           randwrite-psync        104(MiB/s)   26k
9pdefault            randwrite-psync        106(MiB/s)   26k
9pmmappass           randwrite-psync        105(MiB/s)   26k
9pmsecnone           randwrite-psync        103(MiB/s)   25k
virtiofscachenoneT1  randwrite-psync        125(MiB/s)   31k
virtiofscachenone    randwrite-psync        110(MiB/s)   27k
virtiofsdefaultT1    randwrite-psync        129(MiB/s)   32k
virtiofsdefault      randwrite-psync        112(MiB/s)   28k

9pbigmsize           randwrite-psync-multi  355(MiB/s)   88k
9pdefault            randwrite-psync-multi  339(MiB/s)   84k
9pmmappass           randwrite-psync-multi  343(MiB/s)   85k
9pmsecnone           randwrite-psync-multi  344(MiB/s)   86k
virtiofscachenoneT1  randwrite-psync-multi  461(MiB/s)   115k
virtiofscachenone    randwrite-psync-multi  370(MiB/s)   92k
virtiofsdefaultT1    randwrite-psync-multi  449(MiB/s)   112k
virtiofsdefault      randwrite-psync-multi  364(MiB/s)   91k

9pbigmsize           randwrite-mmap         98(MiB/s)    24k
9pdefault            randwrite-mmap         0(KiB/s)     0
9pmmappass           randwrite-mmap         97(MiB/s)    24k
9pmsecnone           randwrite-mmap         0(KiB/s)     0
virtiofscachenoneT1  randwrite-mmap         0(KiB/s)     0
virtiofscachenone    randwrite-mmap         0(KiB/s)     0
virtiofsdefaultT1    randwrite-mmap         102(MiB/s)   25k
virtiofsdefault      randwrite-mmap         92(MiB/s)    23k

9pbigmsize           randwrite-mmap-multi   246(MiB/s)   61k
9pdefault            randwrite-mmap-multi   0(KiB/s)     0
9pmmappass           randwrite-mmap-multi   239(MiB/s)   59k
9pmsecnone           randwrite-mmap-multi   0(KiB/s)     0
virtiofscachenoneT1  randwrite-mmap-multi   0(KiB/s)     0
virtiofscachenone    randwrite-mmap-multi   0(KiB/s)     0
virtiofsdefaultT1    randwrite-mmap-multi   279(MiB/s)   69k
virtiofsdefault      randwrite-mmap-multi   225(MiB/s)   56k

9pbigmsize           randwrite-libaio       110(MiB/s)   27k
9pdefault            randwrite-libaio       111(MiB/s)   27k
9pmmappass           randwrite-libaio       103(MiB/s)   25k
9pmsecnone           randwrite-libaio       102(MiB/s)   25k
virtiofscachenoneT1  randwrite-libaio       601(MiB/s)   150k
virtiofscachenone    randwrite-libaio       525(MiB/s)   131k
virtiofsdefaultT1    randwrite-libaio       618(MiB/s)   154k
virtiofsdefault      randwrite-libaio       527(MiB/s)   131k

9pbigmsize           randwrite-libaio-multi 332(MiB/s)   83k
9pdefault            randwrite-libaio-multi 343(MiB/s)   85k
9pmmappass           randwrite-libaio-multi 350(MiB/s)   87k
9pmsecnone           randwrite-libaio-multi 334(MiB/s)   83k
virtiofscachenoneT1  randwrite-libaio-multi 611(MiB/s)   152k
virtiofscachenone    randwrite-libaio-multi 533(MiB/s)   133k
virtiofsdefaultT1    randwrite-libaio-multi 599(MiB/s)   149k
virtiofsdefault      randwrite-libaio-multi 531(MiB/s)   132k

> [test configuration command lines quoted in full; snipped]
trans=virtio kernel /mnt -oversion=9p2000.L > > virtiofscache=none > ./virtiofsd --socket-path=/tmp/vhostqemu -o source=/dev/shm/linux -o cache=none > ./x86_64-softmmu/qemu-system-x86_64 -M pc,memory-backend=mem,accel=kvm -smp 8 -cpu host -m 32G,maxmem=64G,slots=1 -object memory-backend-memfd,id=mem,size=32G,share=on -drive if=virtio,file=/home/images/f-32-kernel.qcow2 -nographic -chardev socket,id=char0,path=/tmp/vhostqemu -device vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=kernel > mount -t virtiofs kernel /mnt > > 9pmmappass > ./x86_64-softmmu/qemu-system-x86_64 -M pc,accel=kvm -smp 8 -cpu host -m 32G -drive if=virtio,file=/home/images/f-32-kernel.qcow2 -nographic -virtfs local,path=/dev/shm/linux,mount_tag=kernel,security_model=passthrough > mount -t 9p -o trans=virtio kernel /mnt -oversion=9p2000.L,cache=mmap > > 9pmbigmsize > ./x86_64-softmmu/qemu-system-x86_64 -M pc,accel=kvm -smp 8 -cpu host -m 32G -drive if=virtio,file=/home/images/f-32-kernel.qcow2 -nographic -virtfs local,path=/dev/shm/linux,mount_tag=kernel,security_model=passthrough > mount -t 9p -o trans=virtio kernel /mnt -oversion=9p2000.L,cache=mmap,msize=1048576 > > 9pmsecnone > ./x86_64-softmmu/qemu-system-x86_64 -M pc,accel=kvm -smp 8 -cpu host -m 32G -drive if=virtio,file=/home/images/f-32-kernel.qcow2 -nographic -virtfs local,path=/dev/shm/linux,mount_tag=kernel,security_model=none > mount -t 9p -o trans=virtio kernel /mnt -oversion=9p2000.L > > virtiofscache=noneT1 > ./virtiofsd --socket-path=/tmp/vhostqemu -o source=/dev/shm/linux -o cache=none --thread-pool-size=1 > mount -t virtiofs kernel /mnt > > virtiofsdefaultT1 > ./virtiofsd --socket-path=/tmp/vhostqemu -o source=/dev/shm/linux --thread-pool-size=1 > ./x86_64-softmmu/qemu-system-x86_64 -M pc,memory-backend=mem,accel=kvm -smp 8 -cpu host -m 32G,maxmem=64G,slots=1 -object memory-backend-memfd,id=mem,size=32G,share=on -drive if=virtio,file=/home/images/f-32-kernel.qcow2 -nographic -chardev socket,id=char0,path=/tmp/vhostqemu 
-device vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=kernel > -- > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: tools/virtiofs: Multi threading seems to hurt performance
  2020-09-22 10:25 ` Dr. David Alan Gilbert
@ 2020-09-22 17:47 ` Vivek Goyal
  2020-09-24 21:33   ` Venegas Munoz, Jose Carlos
  2020-09-25 12:11   ` tools/virtiofs: Multi threading seems to hurt performance Dr. David Alan Gilbert
  0 siblings, 2 replies; 55+ messages in thread
From: Vivek Goyal @ 2020-09-22 17:47 UTC (permalink / raw)
To: Dr. David Alan Gilbert
Cc: jose.carlos.venegas.munoz, qemu-devel, cdupontd, virtio-fs-list, Stefan Hajnoczi, archana.m.shinde

On Tue, Sep 22, 2020 at 11:25:31AM +0100, Dr. David Alan Gilbert wrote:
> * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> > Hi,
> >   I've been doing some of my own perf tests and I think I agree
> > about the thread pool size; my test is a kernel build
> > and I've tried a bunch of different options.
> >
> > My config:
> >   Host: 16 core AMD EPYC (32 thread), 128G RAM,
> >     5.9.0-rc4 kernel, rhel 8.2ish userspace.
> >     5.1.0 qemu/virtiofsd built from git.
> >   Guest: Fedora 32 from cloud image with just enough extra installed for
> > a kernel build.
> >
> >   git cloned and checked out v5.8 of Linux into /dev/shm/linux on the host
> > fresh before each test. Then log into the guest, make defconfig,
> > time make -j 16 bzImage, make clean; time make -j 16 bzImage.
> > The numbers below are the 'real' time in the guest from the initial make
> > (the subsequent makes don't vary much).
> >
> > Below are the details of what each of these means, but here are the
> > numbers first
> >
> > virtiofsdefault        4m0.978s
> > 9pdefault              9m41.660s
> > virtiofscache=none    10m29.700s
> > 9pmmappass             9m30.047s
> > 9pmbigmsize           12m4.208s
> > 9pmsecnone             9m21.363s
> > virtiofscache=noneT1   7m17.494s
> > virtiofsdefaultT1      3m43.326s
> >
> > So the winner there by far is the 'virtiofsdefaultT1' - that's
> > the default virtiofs settings, but with --thread-pool-size=1 - so
> > yes it gives a small benefit.
> > But interestingly the cache=none virtiofs performance is pretty bad,
> > but thread-pool-size=1 on that makes a BIG improvement.
>
> Here are fio runs that Vivek asked me to run in my same environment
> (there are some 0's in some of the mmap cases, and I've not investigated
> why yet).

cache=none does not allow mmap in case of virtiofs. That's when you are
seeing 0.

> virtiofs is looking good here in I think all of the cases;
> there's some division over which config; cache=none
> seems faster in some cases which surprises me.

I know cache=none is faster in case of write workloads. It forces direct
write where we don't call file_remove_privs(). While cache=auto goes
through file_remove_privs() and that adds a GETXATTR request to every
WRITE request.

Vivek
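[Editor's note] Vivek's point about the extra GETXATTR per WRITE can be put into rough numbers. This is a back-of-envelope sketch, not virtiofsd code; the IOPS figure is just picked from the randwrite-psync results above:

```shell
# With cache=auto + xattr, each WRITE is preceded by a GETXATTR
# (via file_remove_privs()), so the request count per second of
# writing roughly doubles compared to the cache=none direct-write path.
writes=25000                       # ~1s of randwrite-psync at ~25k IOPS
reqs_cache_none=$writes            # WRITE only
reqs_cache_auto=$((writes * 2))    # GETXATTR + WRITE per I/O
echo "cache=none requests/s: $reqs_cache_none"
echo "cache=auto requests/s: $reqs_cache_auto"
```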
* Re: tools/virtiofs: Multi threading seems to hurt performance
  2020-09-22 17:47 ` Vivek Goyal
@ 2020-09-24 21:33 ` Venegas Munoz, Jose Carlos
  2020-09-24 22:10   ` virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance) Vivek Goyal
  2020-09-25 12:11   ` tools/virtiofs: Multi threading seems to hurt performance Dr. David Alan Gilbert
  1 sibling, 1 reply; 55+ messages in thread
From: Venegas Munoz, Jose Carlos @ 2020-09-24 21:33 UTC (permalink / raw)
To: Vivek Goyal, Dr. David Alan Gilbert
Cc: virtio-fs-list, Shinde, Archana M, qemu-devel, Stefan Hajnoczi, cdupontd

[-- Attachment #1: Type: text/plain, Size: 4115 bytes --]

Hi Folks,

Sorry for the delay in describing how to reproduce the `fio` data.

I have some code to automate testing for multiple kata configs and collect info like:
- kata-env, kata configuration.toml, qemu command, virtiofsd command.

See: https://github.com/jcvenegas/mrunner/

Last time we agreed to narrow the cases and configs to compare virtiofs and 9pfs.

The configs were the following:

- qemu + virtiofs (cache=auto, dax=0), a.k.a. `kata-qemu-virtiofs`, WITHOUT xattr
- qemu + 9pfs, a.k.a. `kata-qemu`

Please take a look at the html and raw results I attach in this mail.

## Can I say that the current status is:

- As David's tests and Vivek point out, for the fio workload you are using, it seems the best candidate should be cache=none.
- In the comparison I took cache=auto as Vivek suggested; this makes sense as it seems that will be the default for kata.
- Even if for this case cache=none works better, can I assume that cache=auto dax=0 will be better than any 9pfs config (once we find the root cause)?
- Vivek is taking a look at mmap mode from 9pfs, to see how different it is from the current virtiofs implementations. In 9pfs for kata, this is what we use by default.

## I'd like to identify what should be next on the debug/testing:

- Should I try to narrow by only trying with qemu?
- Should I try first with a new patch you already have?
- Probably try with qemu without a static build?
- Do the same test with thread-pool-size=1?

Please let me know how I can help.

Cheers.

On 22/09/20 12:47, "Vivek Goyal" <vgoyal@redhat.com> wrote:

    On Tue, Sep 22, 2020 at 11:25:31AM +0100, Dr. David Alan Gilbert wrote:
    > * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
    > > Hi,
    > >   I've been doing some of my own perf tests and I think I agree
    > > about the thread pool size; my test is a kernel build
    > > and I've tried a bunch of different options.
    > >
    > > My config:
    > >   Host: 16 core AMD EPYC (32 thread), 128G RAM,
    > >     5.9.0-rc4 kernel, rhel 8.2ish userspace.
    > >     5.1.0 qemu/virtiofsd built from git.
    > >   Guest: Fedora 32 from cloud image with just enough extra installed for
    > > a kernel build.
    > >
    > >   git cloned and checked out v5.8 of Linux into /dev/shm/linux on the host
    > > fresh before each test. Then log into the guest, make defconfig,
    > > time make -j 16 bzImage, make clean; time make -j 16 bzImage.
    > > The numbers below are the 'real' time in the guest from the initial make
    > > (the subsequent makes don't vary much).
    > >
    > > Below are the details of what each of these means, but here are the
    > > numbers first
    > >
    > > virtiofsdefault        4m0.978s
    > > 9pdefault              9m41.660s
    > > virtiofscache=none    10m29.700s
    > > 9pmmappass             9m30.047s
    > > 9pmbigmsize           12m4.208s
    > > 9pmsecnone             9m21.363s
    > > virtiofscache=noneT1   7m17.494s
    > > virtiofsdefaultT1      3m43.326s
    > >
    > > So the winner there by far is the 'virtiofsdefaultT1' - that's
    > > the default virtiofs settings, but with --thread-pool-size=1 - so
    > > yes it gives a small benefit.
    > > But interestingly the cache=none virtiofs performance is pretty bad,
    > > but thread-pool-size=1 on that makes a BIG improvement.
    >
    > Here are fio runs that Vivek asked me to run in my same environment
    > (there are some 0's in some of the mmap cases, and I've not investigated
    > why yet).

    cache=none does not allow mmap in case of virtiofs. That's when you are
    seeing 0.

    > virtiofs is looking good here in I think all of the cases;
    > there's some division over which config; cache=none
    > seems faster in some cases which surprises me.

    I know cache=none is faster in case of write workloads. It forces direct
    write where we don't call file_remove_privs(). While cache=auto goes
    through file_remove_privs() and that adds a GETXATTR request to every
    WRITE request.

    Vivek

[-- Attachment #2: results.tar.gz --]
[-- Type: application/x-gzip, Size: 18156 bytes --]

[-- Attachment #3: vitiofs 9pfs fio comparsion.html --]
[-- Type: text/html, Size: 29758 bytes --]
* virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance)
  2020-09-24 21:33 ` Venegas Munoz, Jose Carlos
@ 2020-09-24 22:10 ` Vivek Goyal
  2020-09-25  8:06   ` virtiofs vs 9p performance Christian Schoenebeck
  2020-09-25 12:41   ` virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance) Dr. David Alan Gilbert
  0 siblings, 2 replies; 55+ messages in thread
From: Vivek Goyal @ 2020-09-24 22:10 UTC (permalink / raw)
To: Venegas Munoz, Jose Carlos
Cc: qemu-devel, cdupontd, Dr. David Alan Gilbert, virtio-fs-list, Stefan Hajnoczi, Shinde, Archana M

On Thu, Sep 24, 2020 at 09:33:01PM +0000, Venegas Munoz, Jose Carlos wrote:
> Hi Folks,
>
> Sorry for the delay in describing how to reproduce the `fio` data.
>
> I have some code to automate testing for multiple kata configs and collect info like:
> - kata-env, kata configuration.toml, qemu command, virtiofsd command.
>
> See: https://github.com/jcvenegas/mrunner/
>
> Last time we agreed to narrow the cases and configs to compare virtiofs and 9pfs.
>
> The configs were the following:
>
> - qemu + virtiofs (cache=auto, dax=0), a.k.a. `kata-qemu-virtiofs`, WITHOUT xattr
> - qemu + 9pfs, a.k.a. `kata-qemu`
>
> Please take a look at the html and raw results I attach in this mail.

Hi Carlos,

So you are running following test.

fio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75 --output=/output/fio.txt

And following are your results.
9p
--
READ: bw=211MiB/s (222MB/s), 211MiB/s-211MiB/s (222MB/s-222MB/s), io=3070MiB (3219MB), run=14532-14532msec
WRITE: bw=70.6MiB/s (74.0MB/s), 70.6MiB/s-70.6MiB/s (74.0MB/s-74.0MB/s), io=1026MiB (1076MB), run=14532-14532msec

virtiofs
--------
Run status group 0 (all jobs):
READ: bw=159MiB/s (167MB/s), 159MiB/s-159MiB/s (167MB/s-167MB/s), io=3070MiB (3219MB), run=19321-19321msec
WRITE: bw=53.1MiB/s (55.7MB/s), 53.1MiB/s-53.1MiB/s (55.7MB/s-55.7MB/s), io=1026MiB (1076MB), run=19321-19321msec

So looks like you are getting better performance with 9p in this case.

Can you apply "shared pool" patch to qemu for virtiofsd and re-run this
test and see if you see any better results.

In my testing, with cache=none, virtiofs performed better than 9p in
all the fio jobs I was running. For the case of cache=auto for virtiofs
(with xattr enabled), 9p performed better in certain write workloads. I
have identified the root cause of that problem and am working on
HANDLE_KILLPRIV_V2 patches to improve WRITE performance of virtiofs
with cache=auto and xattr enabled.

I will post my 9p and virtiofs comparison numbers next week. In the
mean time it will be great if you could apply the following qemu patch,
rebuild qemu and re-run the above test.

https://www.redhat.com/archives/virtio-fs/2020-September/msg00081.html

Also what's the status of the file cache on the host in both the cases?
Are you booting the host fresh for these tests so that the cache is cold
on the host, or is the cache warm?

Thanks
Vivek
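[Editor's note] One way to answer Vivek's cold-vs-warm cache question without rebooting the host is to drop the page cache between runs. A minimal sketch (needs root to actually drop; `3` drops page cache plus dentries and inodes):

```shell
# Flush dirty pages, then drop clean page cache, dentries and inodes,
# so the next fio run starts against cold host caches.
sync
if [ "$(id -u)" -eq 0 ]; then
    echo 3 > /proc/sys/vm/drop_caches
else
    echo "not root: skipping drop_caches"
fi
# Show how much page cache the host currently holds:
grep '^Cached:' /proc/meminfo
```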
* Re: virtiofs vs 9p performance
  2020-09-24 22:10 ` virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance) Vivek Goyal
@ 2020-09-25  8:06 ` Christian Schoenebeck
  2020-09-25 13:13   ` Vivek Goyal
  2021-02-19 16:08   ` Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance) Vivek Goyal
  2020-09-25 12:41   ` virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance) Dr. David Alan Gilbert
  0 siblings, 2 replies; 55+ messages in thread
From: Christian Schoenebeck @ 2020-09-25 8:06 UTC (permalink / raw)
To: qemu-devel
Cc: Vivek Goyal, Venegas Munoz, Jose Carlos, cdupontd, Dr. David Alan Gilbert, virtio-fs-list, Stefan Hajnoczi, Shinde, Archana M, Greg Kurz

On Freitag, 25. September 2020 00:10:23 CEST Vivek Goyal wrote:
> In my testing, with cache=none, virtiofs performed better than 9p in
> all the fio jobs I was running. For the case of cache=auto for virtiofs
> (with xattr enabled), 9p performed better in certain write workloads. I
> have identified root cause of that problem and working on
> HANDLE_KILLPRIV_V2 patches to improve WRITE performance of virtiofs
> with cache=auto and xattr enabled.

Please note, when it comes to performance aspects, you should set a
reasonably high value for 'msize' on 9p client side:
https://wiki.qemu.org/Documentation/9psetup#msize

I'm also working on performance optimizations for 9p BTW. There is plenty
of headroom, to put it mildly. For QEMU 5.2 I started by addressing readdir
requests:
https://wiki.qemu.org/ChangeLog/5.2#9pfs

Best regards,
Christian Schoenebeck
* Re: virtiofs vs 9p performance
  2020-09-25  8:06 ` virtiofs vs 9p performance Christian Schoenebeck
@ 2020-09-25 13:13 ` Vivek Goyal
  2020-09-25 15:47   ` Christian Schoenebeck
  1 sibling, 1 reply; 55+ messages in thread
From: Vivek Goyal @ 2020-09-25 13:13 UTC (permalink / raw)
To: Christian Schoenebeck
Cc: Shinde, Archana M, Venegas Munoz, Jose Carlos, qemu-devel, Dr. David Alan Gilbert, virtio-fs-list, Greg Kurz, Stefan Hajnoczi, cdupontd

On Fri, Sep 25, 2020 at 10:06:41AM +0200, Christian Schoenebeck wrote:
> On Freitag, 25. September 2020 00:10:23 CEST Vivek Goyal wrote:
> > In my testing, with cache=none, virtiofs performed better than 9p in
> > all the fio jobs I was running. For the case of cache=auto for virtiofs
> > (with xattr enabled), 9p performed better in certain write workloads. I
> > have identified root cause of that problem and working on
> > HANDLE_KILLPRIV_V2 patches to improve WRITE performance of virtiofs
> > with cache=auto and xattr enabled.
>
> Please note, when it comes to performance aspects, you should set a
> reasonably high value for 'msize' on 9p client side:
> https://wiki.qemu.org/Documentation/9psetup#msize

Interesting. I will try that. What does "msize" do?

> I'm also working on performance optimizations for 9p BTW. There is plenty
> of headroom, to put it mildly. For QEMU 5.2 I started by addressing
> readdir requests:
> https://wiki.qemu.org/ChangeLog/5.2#9pfs

Nice. I guess this performance comparison between 9p and virtiofs is good.
Both the projects can try to identify weak points and improve performance.

Thanks
Vivek
* Re: virtiofs vs 9p performance
  2020-09-25 13:13 ` Vivek Goyal
@ 2020-09-25 15:47 ` Christian Schoenebeck
  0 siblings, 0 replies; 55+ messages in thread
From: Christian Schoenebeck @ 2020-09-25 15:47 UTC (permalink / raw)
To: qemu-devel
Cc: Vivek Goyal, Shinde, Archana M, Venegas Munoz, Jose Carlos, Dr. David Alan Gilbert, virtio-fs-list, Greg Kurz, Stefan Hajnoczi, cdupontd

On Freitag, 25. September 2020 15:13:56 CEST Vivek Goyal wrote:
> On Fri, Sep 25, 2020 at 10:06:41AM +0200, Christian Schoenebeck wrote:
> > On Freitag, 25. September 2020 00:10:23 CEST Vivek Goyal wrote:
> > > In my testing, with cache=none, virtiofs performed better than 9p in
> > > all the fio jobs I was running. For the case of cache=auto for virtiofs
> > > (with xattr enabled), 9p performed better in certain write workloads. I
> > > have identified root cause of that problem and working on
> > > HANDLE_KILLPRIV_V2 patches to improve WRITE performance of virtiofs
> > > with cache=auto and xattr enabled.
> >
> > Please note, when it comes to performance aspects, you should set a
> > reasonably high value for 'msize' on 9p client side:
> > https://wiki.qemu.org/Documentation/9psetup#msize
>
> Interesting. I will try that. What does "msize" do?

Simple: it's the "maximum message size" ever to be used for communication
between host and guest, in both directions that is. So if that 'msize'
value is too small, a potentially large 9p message would be split into
several smaller 9p messages, and each message adds latency, which is the
main problem. Keep in mind: the default value with Linux clients for msize
is still only 8kB! Think of doing 'dd bs=8192 if=/src.dat of=/dst.dat
count=...' as analogy, which probably makes its impact on performance
clear.
However the negative impact of a small 'msize' value is not just limited
to raw file I/O like that; calling readdir() for instance on a guest
directory with several hundred files or more will likewise slow down
tremendously in the same way, as both sides have to transmit a large
number of 9p messages back and forth instead of just 2 messages (Treaddir
and Rreaddir).

> > I'm also working on performance optimizations for 9p BTW. There is plenty
> > of headroom, to put it mildly. For QEMU 5.2 I started by addressing
> > readdir requests:
> > https://wiki.qemu.org/ChangeLog/5.2#9pfs
>
> Nice. I guess this performance comparison between 9p and virtiofs is good.
> Both the projects can try to identify weak points and improve performance.

Yes, that's indeed handy being able to make comparisons.

Best regards,
Christian Schoenebeck
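[Editor's note] Christian's dd analogy can be made concrete: the number of 9p round trips for a transfer is roughly the transfer size divided by msize, and each round trip adds latency. A quick sketch (the 100 MiB transfer size is an arbitrary example; 8192 is the old Linux client default, 512000 and 1048576 are values that appear later in this thread):

```shell
transfer=$((100 * 1024 * 1024))     # move 100 MiB of data
for msize in 8192 512000 1048576; do
    # ceiling division: request/response pairs needed for the transfer
    trips=$(( (transfer + msize - 1) / msize ))
    echo "msize=$msize -> ~$trips round trips"
done
```

At msize=8192 that is 12800 round trips for 100 MiB versus 100 at msize=1048576, which is why a too-small msize dominates latency-bound workloads.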
* Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance)
  2020-09-25  8:06 ` virtiofs vs 9p performance Christian Schoenebeck
  2020-09-25 13:13   ` Vivek Goyal
@ 2021-02-19 16:08 ` Vivek Goyal
  2021-02-19 17:33   ` Christian Schoenebeck
  1 sibling, 1 reply; 55+ messages in thread
From: Vivek Goyal @ 2021-02-19 16:08 UTC (permalink / raw)
To: Christian Schoenebeck
Cc: Shinde, Archana M, Venegas Munoz, Jose Carlos, qemu-devel, Dr. David Alan Gilbert, virtio-fs-list, Greg Kurz, Stefan Hajnoczi, cdupontd

On Fri, Sep 25, 2020 at 10:06:41AM +0200, Christian Schoenebeck wrote:
> On Freitag, 25. September 2020 00:10:23 CEST Vivek Goyal wrote:
> > In my testing, with cache=none, virtiofs performed better than 9p in
> > all the fio jobs I was running. For the case of cache=auto for virtiofs
> > (with xattr enabled), 9p performed better in certain write workloads. I
> > have identified root cause of that problem and working on
> > HANDLE_KILLPRIV_V2 patches to improve WRITE performance of virtiofs
> > with cache=auto and xattr enabled.
>
> Please note, when it comes to performance aspects, you should set a
> reasonably high value for 'msize' on 9p client side:
> https://wiki.qemu.org/Documentation/9psetup#msize

Hi Christian,

I am not able to set msize to a higher value. If I try to specify msize
16MB, and then read back msize from /proc/mounts, it seems to cap it
at 512000. Is that intended?

$ mount -t 9p -o trans=virtio,version=9p2000.L,cache=none,msize=16777216 hostShared /mnt/virtio-9p

$ cat /proc/mounts | grep 9p
hostShared /mnt/virtio-9p 9p rw,sync,dirsync,relatime,access=client,msize=512000,trans=virtio 0 0

I am using 5.11 kernel.

Thanks
Vivek
* Re: Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance)
  2021-02-19 16:08 ` Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance) Vivek Goyal
@ 2021-02-19 17:33 ` Christian Schoenebeck
  2021-02-19 19:01   ` Vivek Goyal
  0 siblings, 1 reply; 55+ messages in thread
From: Christian Schoenebeck @ 2021-02-19 17:33 UTC (permalink / raw)
To: qemu-devel
Cc: Vivek Goyal, Shinde, Archana M, Venegas Munoz, Jose Carlos, Dr. David Alan Gilbert, virtio-fs-list, Greg Kurz, Stefan Hajnoczi, cdupontd

On Freitag, 19. Februar 2021 17:08:48 CET Vivek Goyal wrote:
> On Fri, Sep 25, 2020 at 10:06:41AM +0200, Christian Schoenebeck wrote:
> > On Freitag, 25. September 2020 00:10:23 CEST Vivek Goyal wrote:
> > > In my testing, with cache=none, virtiofs performed better than 9p in
> > > all the fio jobs I was running. For the case of cache=auto for virtiofs
> > > (with xattr enabled), 9p performed better in certain write workloads. I
> > > have identified root cause of that problem and working on
> > > HANDLE_KILLPRIV_V2 patches to improve WRITE performance of virtiofs
> > > with cache=auto and xattr enabled.
> >
> > Please note, when it comes to performance aspects, you should set a
> > reasonably high value for 'msize' on 9p client side:
> > https://wiki.qemu.org/Documentation/9psetup#msize
>
> Hi Christian,
>
> I am not able to set msize to a higher value. If I try to specify msize
> 16MB, and then read back msize from /proc/mounts, it seems to cap it
> at 512000. Is that intended?

9p server side in QEMU does not perform any msize capping.
The code in this case is very simple, it's just what you see in function
v9fs_version():

https://github.com/qemu/qemu/blob/6de76c5f324904c93e69f9a1e8e4fd0bd6f6b57a/hw/9pfs/9p.c#L1332

> $ mount -t 9p -o trans=virtio,version=9p2000.L,cache=none,msize=16777216 hostShared /mnt/virtio-9p
>
> $ cat /proc/mounts | grep 9p
> hostShared /mnt/virtio-9p 9p rw,sync,dirsync,relatime,access=client,msize=512000,trans=virtio 0 0
>
> I am using 5.11 kernel.

Must be something on client (guest kernel) side. I don't see this here
with guest kernel 4.9.0 happening with my setup in a quick test:

$ cat /etc/mtab | grep 9p
svnRoot / 9p rw,dirsync,relatime,trans=virtio,version=9p2000.L,msize=104857600,cache=mmap 0 0
$

Looks like the root cause of your issue is this:

struct p9_client *p9_client_create(const char *dev_name, char *options)
{
	...
	if (clnt->msize > clnt->trans_mod->maxsize)
		clnt->msize = clnt->trans_mod->maxsize;

https://github.com/torvalds/linux/blob/f40ddce88593482919761f74910f42f4b84c004b/net/9p/client.c#L1045

Best regards,
Christian Schoenebeck
* Re: Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance)
  2021-02-19 17:33 ` Christian Schoenebeck
@ 2021-02-19 19:01 ` Vivek Goyal
  2021-02-20 15:38   ` Christian Schoenebeck
  0 siblings, 1 reply; 55+ messages in thread
From: Vivek Goyal @ 2021-02-19 19:01 UTC (permalink / raw)
To: Christian Schoenebeck
Cc: cdupontd, Venegas Munoz, Jose Carlos, Greg Kurz, qemu-devel, virtio-fs-list, Stefan Hajnoczi, Shinde, Archana M, Dr. David Alan Gilbert

On Fri, Feb 19, 2021 at 06:33:46PM +0100, Christian Schoenebeck wrote:
> On Freitag, 19. Februar 2021 17:08:48 CET Vivek Goyal wrote:
> > On Fri, Sep 25, 2020 at 10:06:41AM +0200, Christian Schoenebeck wrote:
> > > On Freitag, 25. September 2020 00:10:23 CEST Vivek Goyal wrote:
> > > > In my testing, with cache=none, virtiofs performed better than 9p in
> > > > all the fio jobs I was running. For the case of cache=auto for
> > > > virtiofs (with xattr enabled), 9p performed better in certain write
> > > > workloads. I have identified root cause of that problem and working on
> > > > HANDLE_KILLPRIV_V2 patches to improve WRITE performance of virtiofs
> > > > with cache=auto and xattr enabled.
> > >
> > > Please note, when it comes to performance aspects, you should set a
> > > reasonably high value for 'msize' on 9p client side:
> > > https://wiki.qemu.org/Documentation/9psetup#msize
> >
> > Hi Christian,
> >
> > I am not able to set msize to a higher value. If I try to specify msize
> > 16MB, and then read back msize from /proc/mounts, it seems to cap it
> > at 512000. Is that intended?
>
> 9p server side in QEMU does not perform any msize capping. The code in this
> case is very simple, it's just what you see in function v9fs_version():
>
> https://github.com/qemu/qemu/blob/6de76c5f324904c93e69f9a1e8e4fd0bd6f6b57a/hw/9pfs/9p.c#L1332
>
> > $ mount -t 9p -o trans=virtio,version=9p2000.L,cache=none,msize=16777216 hostShared /mnt/virtio-9p
> >
> > $ cat /proc/mounts | grep 9p
> > hostShared /mnt/virtio-9p 9p rw,sync,dirsync,relatime,access=client,msize=512000,trans=virtio 0 0
> >
> > I am using 5.11 kernel.
>
> Must be something on client (guest kernel) side. I don't see this here with
> guest kernel 4.9.0 happening with my setup in a quick test:
>
> $ cat /etc/mtab | grep 9p
> svnRoot / 9p rw,dirsync,relatime,trans=virtio,version=9p2000.L,msize=104857600,cache=mmap 0 0
> $
>
> Looks like the root cause of your issue is this:
>
> struct p9_client *p9_client_create(const char *dev_name, char *options)
> {
> 	...
> 	if (clnt->msize > clnt->trans_mod->maxsize)
> 		clnt->msize = clnt->trans_mod->maxsize;
>
> https://github.com/torvalds/linux/blob/f40ddce88593482919761f74910f42f4b84c004b/net/9p/client.c#L1045

That was introduced by a patch in 2011.

commit c9ffb05ca5b5098d6ea468c909dd384d90da7d54
Author: Venkateswararao Jujjuri (JV) <jvrao@linux.vnet.ibm.com>
Date:   Wed Jun 29 18:06:33 2011 -0700

    net/9p: Fix the msize calculation.

    msize represents the maximum PDU size that includes P9_IOHDRSZ.

Your kernel 4.9 is newer than this, so most likely you have this commit
too. I will spend some time later trying to debug this.

Vivek
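[Editor's note] The capping Vivek observed matches the client code quoted above: the requested msize is clamped to the transport's maxsize. A shell re-statement of that one line, using 512000 only because it is the value Vivek's mount reported (the real limit is `trans_mod->maxsize` in the guest kernel's virtio transport, not a fixed constant):

```shell
requested=16777216      # msize=16M asked for at mount time
transport_max=512000    # the cap Vivek's 5.11 guest reported (illustrative)
effective=$requested
if [ "$effective" -gt "$transport_max" ]; then
    effective=$transport_max   # clnt->msize = clnt->trans_mod->maxsize
fi
echo "effective msize: $effective"
```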
* Re: Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance) 2021-02-19 19:01 ` Vivek Goyal @ 2021-02-20 15:38 ` Christian Schoenebeck 2021-02-22 12:18 ` Greg Kurz 0 siblings, 1 reply; 55+ messages in thread From: Christian Schoenebeck @ 2021-02-20 15:38 UTC (permalink / raw) To: qemu-devel Cc: Vivek Goyal, cdupontd, Venegas Munoz, Jose Carlos, Greg Kurz, virtio-fs-list, Stefan Hajnoczi, Shinde, Archana M, Dr. David Alan Gilbert On Freitag, 19. Februar 2021 20:01:12 CET Vivek Goyal wrote: > On Fri, Feb 19, 2021 at 06:33:46PM +0100, Christian Schoenebeck wrote: > > On Freitag, 19. Februar 2021 17:08:48 CET Vivek Goyal wrote: > > > On Fri, Sep 25, 2020 at 10:06:41AM +0200, Christian Schoenebeck wrote: > > > > On Freitag, 25. September 2020 00:10:23 CEST Vivek Goyal wrote: > > > > > In my testing, with cache=none, virtiofs performed better than 9p in > > > > > all the fio jobs I was running. For the case of cache=auto for > > > > > virtiofs > > > > > (with xattr enabled), 9p performed better in certain write > > > > > workloads. I > > > > > have identified root cause of that problem and working on > > > > > HANDLE_KILLPRIV_V2 patches to improve WRITE performance of virtiofs > > > > > with cache=auto and xattr enabled. > > > > > > > > Please note, when it comes to performance aspects, you should set a > > > > reasonable high value for 'msize' on 9p client side: > > > > https://wiki.qemu.org/Documentation/9psetup#msize > > > > > > Hi Christian, > > > > > > I am not able to set msize to a higher value. If I try to specify msize > > > 16MB, and then read back msize from /proc/mounts, it sees to cap it > > > at 512000. Is that intended? > > > > 9p server side in QEMU does not perform any msize capping. 
The code in > > this > > case is very simple, it's just what you see in function v9fs_version(): > > > > https://github.com/qemu/qemu/blob/6de76c5f324904c93e69f9a1e8e4fd0bd6f6b57a > > /hw/9pfs/9p.c#L1332> > > > $ mount -t 9p -o trans=virtio,version=9p2000.L,cache=none,msize=16777216 > > > hostShared /mnt/virtio-9p > > > > > > $ cat /proc/mounts | grep 9p > > > hostShared /mnt/virtio-9p 9p > > > rw,sync,dirsync,relatime,access=client,msize=512000,trans=virtio 0 0 > > > > > > I am using 5.11 kernel. > > > > Must be something on client (guest kernel) side. I don't see this here > > with > > guest kernel 4.9.0 happening with my setup in a quick test: > > > > $ cat /etc/mtab | grep 9p > > svnRoot / 9p > > rw,dirsync,relatime,trans=virtio,version=9p2000.L,msize=104857600,cache=m > > map 0 0 $ > > > > Looks like the root cause of your issue is this: > > > > struct p9_client *p9_client_create(const char *dev_name, char *options) > > { > > > > ... > > if (clnt->msize > clnt->trans_mod->maxsize) > > > > clnt->msize = clnt->trans_mod->maxsize; > > > > https://github.com/torvalds/linux/blob/f40ddce88593482919761f74910f42f4b84 > > c004b/net/9p/client.c#L1045 > That was introduced by a patch 2011. > > commit c9ffb05ca5b5098d6ea468c909dd384d90da7d54 > Author: Venkateswararao Jujjuri (JV) <jvrao@linux.vnet.ibm.com> > Date: Wed Jun 29 18:06:33 2011 -0700 > > net/9p: Fix the msize calculation. > > msize represents the maximum PDU size that includes P9_IOHDRSZ. > > > You kernel 4.9 is newer than this. So most likely you have this commit > too. I will spend some time later trying to debug this. > > Vivek As the kernel code says trans_mod->maxsize, maybe it's something in virtio on qemu side that does an automatic step back for some reason. I don't see something in the 9pfs virtio transport driver (hw/9pfs/virtio-9p-device.c on QEMU side) that would do this, so I would also need to dig deeper. Do you have some RAM limitation in your setup somewhere? 
For comparison, this is how I started the VM: ~/git/qemu/build/qemu-system-x86_64 \ -machine pc,accel=kvm,usb=off,dump-guest-core=off -m 2048 \ -smp 4,sockets=4,cores=1,threads=1 -rtc base=utc \ -boot strict=on -kernel /home/bee/vm/stretch/boot/vmlinuz-4.9.0-13-amd64 \ -initrd /home/bee/vm/stretch/boot/initrd.img-4.9.0-13-amd64 \ -append 'root=svnRoot rw rootfstype=9p rootflags=trans=virtio,version=9p2000.L,msize=104857600,cache=mmap console=ttyS0' \ -fsdev local,security_model=mapped,multidevs=remap,id=fsdev-fs0,path=/home/bee/vm/stretch/ \ -device virtio-9p-pci,id=fs0,fsdev=fsdev-fs0,mount_tag=svnRoot \ -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \ -nographic So the guest system is running entirely and solely on top of 9pfs (as root fs) and hence it's mounted by the above command line, i.e. immediately when the guest is booted, and RAM size is set to 2 GB. Best regards, Christian Schoenebeck ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance) 2021-02-20 15:38 ` Christian Schoenebeck @ 2021-02-22 12:18 ` Greg Kurz 2021-02-22 15:08 ` Christian Schoenebeck 0 siblings, 1 reply; 55+ messages in thread From: Greg Kurz @ 2021-02-22 12:18 UTC (permalink / raw) To: Christian Schoenebeck Cc: Venegas Munoz, Jose Carlos, qemu-devel, cdupontd, virtio-fs-list, Dr. David Alan Gilbert, Stefan Hajnoczi, Shinde, Archana M, Vivek Goyal On Sat, 20 Feb 2021 16:38:35 +0100 Christian Schoenebeck <qemu_oss@crudebyte.com> wrote: > On Freitag, 19. Februar 2021 20:01:12 CET Vivek Goyal wrote: > > On Fri, Feb 19, 2021 at 06:33:46PM +0100, Christian Schoenebeck wrote: > > > On Freitag, 19. Februar 2021 17:08:48 CET Vivek Goyal wrote: > > > > On Fri, Sep 25, 2020 at 10:06:41AM +0200, Christian Schoenebeck wrote: > > > > > On Freitag, 25. September 2020 00:10:23 CEST Vivek Goyal wrote: > > > > > > In my testing, with cache=none, virtiofs performed better than 9p in > > > > > > all the fio jobs I was running. For the case of cache=auto for > > > > > > virtiofs > > > > > > (with xattr enabled), 9p performed better in certain write > > > > > > workloads. I > > > > > > have identified root cause of that problem and working on > > > > > > HANDLE_KILLPRIV_V2 patches to improve WRITE performance of virtiofs > > > > > > with cache=auto and xattr enabled. > > > > > > > > > > Please note, when it comes to performance aspects, you should set a > > > > > reasonable high value for 'msize' on 9p client side: > > > > > https://wiki.qemu.org/Documentation/9psetup#msize > > > > > > > > Hi Christian, > > > > > > > > I am not able to set msize to a higher value. If I try to specify msize > > > > 16MB, and then read back msize from /proc/mounts, it sees to cap it > > > > at 512000. Is that intended? > > > > > > 9p server side in QEMU does not perform any msize capping. 
The code in > > > this > > > case is very simple, it's just what you see in function v9fs_version(): > > > > > > https://github.com/qemu/qemu/blob/6de76c5f324904c93e69f9a1e8e4fd0bd6f6b57a > > > /hw/9pfs/9p.c#L1332> > > > > $ mount -t 9p -o trans=virtio,version=9p2000.L,cache=none,msize=16777216 > > > > hostShared /mnt/virtio-9p > > > > > > > > $ cat /proc/mounts | grep 9p > > > > hostShared /mnt/virtio-9p 9p > > > > rw,sync,dirsync,relatime,access=client,msize=512000,trans=virtio 0 0 > > > > > > > > I am using 5.11 kernel. > > > > > > Must be something on client (guest kernel) side. I don't see this here > > > with > > > guest kernel 4.9.0 happening with my setup in a quick test: > > > > > > $ cat /etc/mtab | grep 9p > > > svnRoot / 9p > > > rw,dirsync,relatime,trans=virtio,version=9p2000.L,msize=104857600,cache=m > > > map 0 0 $ > > > > > > Looks like the root cause of your issue is this: > > > > > > struct p9_client *p9_client_create(const char *dev_name, char *options) > > > { > > > > > > ... > > > if (clnt->msize > clnt->trans_mod->maxsize) > > > > > > clnt->msize = clnt->trans_mod->maxsize; > > > > > > https://github.com/torvalds/linux/blob/f40ddce88593482919761f74910f42f4b84 > > > c004b/net/9p/client.c#L1045 > > That was introduced by a patch 2011. > > > > commit c9ffb05ca5b5098d6ea468c909dd384d90da7d54 > > Author: Venkateswararao Jujjuri (JV) <jvrao@linux.vnet.ibm.com> > > Date: Wed Jun 29 18:06:33 2011 -0700 > > > > net/9p: Fix the msize calculation. > > > > msize represents the maximum PDU size that includes P9_IOHDRSZ. > > > > > > You kernel 4.9 is newer than this. So most likely you have this commit > > too. I will spend some time later trying to debug this. > > > > Vivek > Hi Vivek and Christian, I reproduce with an up-to-date fedora rawhide guest. Capping comes from here: net/9p/trans_virtio.c: .maxsize = PAGE_SIZE * (VIRTQUEUE_NUM - 3), i.e. 4096 * (128 - 3) == 512000 AFAICT this has been around since 2011, i.e. 
always for me as a maintainer and I admit I had never tried such high msize settings before. commit b49d8b5d7007a673796f3f99688b46931293873e Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Date: Wed Aug 17 16:56:04 2011 +0000 net/9p: Fix kernel crash with msize 512K With msize equal to 512K (PAGE_SIZE * VIRTQUEUE_NUM), we hit multiple crashes. This patch fix those. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Changelog doesn't help much but it looks like it was a bandaid for some more severe issues. > As the kernel code sais trans_mod->maxsize, maybe its something in virtio on > qemu side that does an automatic step back for some reason. I don't see > something in the 9pfs virtio transport driver (hw/9pfs/virtio-9p-device.c on > QEMU side) that would do this, so I would also need to dig deeper. > > Do you have some RAM limitation in your setup somewhere? > > For comparison, this is how I started the VM: > > ~/git/qemu/build/qemu-system-x86_64 \ > -machine pc,accel=kvm,usb=off,dump-guest-core=off -m 2048 \ > -smp 4,sockets=4,cores=1,threads=1 -rtc base=utc \ > -boot strict=on -kernel /home/bee/vm/stretch/boot/vmlinuz-4.9.0-13-amd64 \ > -initrd /home/bee/vm/stretch/boot/initrd.img-4.9.0-13-amd64 \ > -append 'root=svnRoot rw rootfstype=9p rootflags=trans=virtio,version=9p2000.L,msize=104857600,cache=mmap console=ttyS0' \ First obvious difference I see between your setup and mine is that you're mounting the 9pfs as root from the kernel command line. For some reason, maybe this has an impact on the check in p9_client_create() ? Can you reproduce with a scenario like Vivek's one ? 
> -fsdev local,security_model=mapped,multidevs=remap,id=fsdev-fs0,path=/home/bee/vm/stretch/ \ > -device virtio-9p-pci,id=fs0,fsdev=fsdev-fs0,mount_tag=svnRoot \ > -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \ > -nographic > > So the guest system is running entirely and solely on top of 9pfs (as root fs) > and hence it's mounted by above's CL i.e. immediately when the guest is > booted, and RAM size is set to 2 GB. > > Best regards, > Christian Schoenebeck > > ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance) 2021-02-22 12:18 ` Greg Kurz @ 2021-02-22 15:08 ` Christian Schoenebeck 2021-02-22 17:11 ` Greg Kurz 0 siblings, 1 reply; 55+ messages in thread From: Christian Schoenebeck @ 2021-02-22 15:08 UTC (permalink / raw) To: qemu-devel Cc: Greg Kurz, Venegas Munoz, Jose Carlos, cdupontd, virtio-fs-list, Dr. David Alan Gilbert, Stefan Hajnoczi, Shinde, Archana M, Vivek Goyal On Montag, 22. Februar 2021 13:18:14 CET Greg Kurz wrote: > On Sat, 20 Feb 2021 16:38:35 +0100 > > Christian Schoenebeck <qemu_oss@crudebyte.com> wrote: > > On Freitag, 19. Februar 2021 20:01:12 CET Vivek Goyal wrote: > > > On Fri, Feb 19, 2021 at 06:33:46PM +0100, Christian Schoenebeck wrote: > > > > On Freitag, 19. Februar 2021 17:08:48 CET Vivek Goyal wrote: > > > > > On Fri, Sep 25, 2020 at 10:06:41AM +0200, Christian Schoenebeck wrote: > > > > > > On Freitag, 25. September 2020 00:10:23 CEST Vivek Goyal wrote: > > > > > > > In my testing, with cache=none, virtiofs performed better than > > > > > > > 9p in > > > > > > > all the fio jobs I was running. For the case of cache=auto for > > > > > > > virtiofs > > > > > > > (with xattr enabled), 9p performed better in certain write > > > > > > > workloads. I > > > > > > > have identified root cause of that problem and working on > > > > > > > HANDLE_KILLPRIV_V2 patches to improve WRITE performance of > > > > > > > virtiofs > > > > > > > with cache=auto and xattr enabled. > > > > > > > > > > > > Please note, when it comes to performance aspects, you should set > > > > > > a > > > > > > reasonable high value for 'msize' on 9p client side: > > > > > > https://wiki.qemu.org/Documentation/9psetup#msize > > > > > > > > > > Hi Christian, > > > > > > > > > > I am not able to set msize to a higher value. If I try to specify > > > > > msize > > > > > 16MB, and then read back msize from /proc/mounts, it sees to cap it > > > > > at 512000. Is that intended? 
> > > > > > > > 9p server side in QEMU does not perform any msize capping. The code in > > > > this > > > > case is very simple, it's just what you see in function > > > > v9fs_version(): > > > > > > > > https://github.com/qemu/qemu/blob/6de76c5f324904c93e69f9a1e8e4fd0bd6f6 > > > > b57a > > > > /hw/9pfs/9p.c#L1332> > > > > > > > > > $ mount -t 9p -o > > > > > trans=virtio,version=9p2000.L,cache=none,msize=16777216 > > > > > hostShared /mnt/virtio-9p > > > > > > > > > > $ cat /proc/mounts | grep 9p > > > > > hostShared /mnt/virtio-9p 9p > > > > > rw,sync,dirsync,relatime,access=client,msize=512000,trans=virtio 0 0 > > > > > > > > > > I am using 5.11 kernel. > > > > > > > > Must be something on client (guest kernel) side. I don't see this here > > > > with > > > > guest kernel 4.9.0 happening with my setup in a quick test: > > > > > > > > $ cat /etc/mtab | grep 9p > > > > svnRoot / 9p > > > > rw,dirsync,relatime,trans=virtio,version=9p2000.L,msize=104857600,cach > > > > e=m > > > > map 0 0 $ > > > > > > > > Looks like the root cause of your issue is this: > > > > > > > > struct p9_client *p9_client_create(const char *dev_name, char > > > > *options) > > > > { > > > > > > > > ... > > > > if (clnt->msize > clnt->trans_mod->maxsize) > > > > > > > > clnt->msize = clnt->trans_mod->maxsize; > > > > > > > > https://github.com/torvalds/linux/blob/f40ddce88593482919761f74910f42f > > > > 4b84 > > > > c004b/net/9p/client.c#L1045 > > > > > > That was introduced by a patch 2011. > > > > > > commit c9ffb05ca5b5098d6ea468c909dd384d90da7d54 > > > Author: Venkateswararao Jujjuri (JV) <jvrao@linux.vnet.ibm.com> > > > Date: Wed Jun 29 18:06:33 2011 -0700 > > > > > > net/9p: Fix the msize calculation. > > > > > > msize represents the maximum PDU size that includes P9_IOHDRSZ. > > > > > > You kernel 4.9 is newer than this. So most likely you have this commit > > > too. I will spend some time later trying to debug this. 
> > > > > > Vivek > > Hi Vivek and Christian, > > I reproduce with an up-to-date fedora rawhide guest. > > Capping comes from here: > > net/9p/trans_virtio.c: .maxsize = PAGE_SIZE * (VIRTQUEUE_NUM - 3), > > i.e. 4096 * (128 - 3) == 512000 > > AFAICT this has been around since 2011, i.e. always for me as a > maintainer and I admit I had never tried such high msize settings > before. > > commit b49d8b5d7007a673796f3f99688b46931293873e > Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> > Date: Wed Aug 17 16:56:04 2011 +0000 > > net/9p: Fix kernel crash with msize 512K > > With msize equal to 512K (PAGE_SIZE * VIRTQUEUE_NUM), we hit multiple > crashes. This patch fix those. > > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> > Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> > > Changelog doesn't help much but it looks like it was a bandaid > for some more severe issues. I did not ever have a kernel crash when I boot a Linux guest with a 9pfs root fs and 100 MiB msize. Should we ask virtio or 9p Linux client maintainers if they can add some info what this is about? > > As the kernel code sais trans_mod->maxsize, maybe its something in virtio > > on qemu side that does an automatic step back for some reason. I don't > > see something in the 9pfs virtio transport driver > > (hw/9pfs/virtio-9p-device.c on QEMU side) that would do this, so I would > > also need to dig deeper. > > > > Do you have some RAM limitation in your setup somewhere? 
> > > > For comparison, this is how I started the VM: > > > > ~/git/qemu/build/qemu-system-x86_64 \ > > -machine pc,accel=kvm,usb=off,dump-guest-core=off -m 2048 \ > > -smp 4,sockets=4,cores=1,threads=1 -rtc base=utc \ > > -boot strict=on -kernel /home/bee/vm/stretch/boot/vmlinuz-4.9.0-13-amd64 \ > > -initrd /home/bee/vm/stretch/boot/initrd.img-4.9.0-13-amd64 \ > > -append 'root=svnRoot rw rootfstype=9p > > rootflags=trans=virtio,version=9p2000.L,msize=104857600,cache=mmap > > console=ttyS0' \ > First obvious difference I see between your setup and mine is that > you're mounting the 9pfs as root from the kernel command line. For > some reason, maybe this has an impact on the check in p9_client_create() ? > > Can you reproduce with a scenario like Vivek's one ? Yep, confirmed. If I boot a guest from an image file first and then try to manually mount a 9pfs share after guest booted, then I get indeed that msize capping of just 512 kiB as well. That's far too small. :/ Best regards, Christian Schoenebeck ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance) 2021-02-22 15:08 ` Christian Schoenebeck @ 2021-02-22 17:11 ` Greg Kurz 2021-02-23 13:39 ` Christian Schoenebeck 0 siblings, 1 reply; 55+ messages in thread From: Greg Kurz @ 2021-02-22 17:11 UTC (permalink / raw) To: Christian Schoenebeck Cc: Shinde, Archana M, Venegas Munoz, Jose Carlos, qemu-devel, Dr. David Alan Gilbert, virtio-fs-list, Stefan Hajnoczi, cdupontd, Vivek Goyal On Mon, 22 Feb 2021 16:08:04 +0100 Christian Schoenebeck <qemu_oss@crudebyte.com> wrote: [...] > I did not ever have a kernel crash when I boot a Linux guest with a 9pfs root > fs and 100 MiB msize. Interesting. > Should we ask virtio or 9p Linux client maintainers if > they can add some info what this is about? > Probably worth trying that first, even if I'm not sure anyone has an answer for that since all the people who worked on virtio-9p at the time have somehow deserted the project. > > > As the kernel code sais trans_mod->maxsize, maybe its something in virtio > > > on qemu side that does an automatic step back for some reason. I don't > > > see something in the 9pfs virtio transport driver > > > (hw/9pfs/virtio-9p-device.c on QEMU side) that would do this, so I would > > > also need to dig deeper. > > > > > > Do you have some RAM limitation in your setup somewhere? 
> > > > > > For comparison, this is how I started the VM: > > > > > > ~/git/qemu/build/qemu-system-x86_64 \ > > > -machine pc,accel=kvm,usb=off,dump-guest-core=off -m 2048 \ > > > -smp 4,sockets=4,cores=1,threads=1 -rtc base=utc \ > > > -boot strict=on -kernel /home/bee/vm/stretch/boot/vmlinuz-4.9.0-13-amd64 \ > > > -initrd /home/bee/vm/stretch/boot/initrd.img-4.9.0-13-amd64 \ > > > -append 'root=svnRoot rw rootfstype=9p > > > rootflags=trans=virtio,version=9p2000.L,msize=104857600,cache=mmap > > > console=ttyS0' \ > > First obvious difference I see between your setup and mine is that > > you're mounting the 9pfs as root from the kernel command line. For > > some reason, maybe this has an impact on the check in p9_client_create() ? > > > > Can you reproduce with a scenario like Vivek's one ? > > Yep, confirmed. If I boot a guest from an image file first and then try to > manually mount a 9pfs share after guest booted, then I get indeed that msize > capping of just 512 kiB as well. That's far too small. :/ > Maybe worth digging : - why no capping happens in your scenario ? - is capping really needed ? Cheers, -- Greg > Best regards, > Christian Schoenebeck > > ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance) 2021-02-22 17:11 ` Greg Kurz @ 2021-02-23 13:39 ` Christian Schoenebeck 2021-02-23 14:07 ` Michael S. Tsirkin 0 siblings, 1 reply; 55+ messages in thread From: Christian Schoenebeck @ 2021-02-23 13:39 UTC (permalink / raw) To: qemu-devel Cc: Greg Kurz, Shinde, Archana M, Venegas Munoz, Jose Carlos, Dr. David Alan Gilbert, virtio-fs-list, Stefan Hajnoczi, cdupontd, Vivek Goyal, Michael S. Tsirkin, Dominique Martinet, v9fs-developer On Montag, 22. Februar 2021 18:11:59 CET Greg Kurz wrote: > On Mon, 22 Feb 2021 16:08:04 +0100 > Christian Schoenebeck <qemu_oss@crudebyte.com> wrote: > > [...] > > > I did not ever have a kernel crash when I boot a Linux guest with a 9pfs > > root fs and 100 MiB msize. > > Interesting. > > > Should we ask virtio or 9p Linux client maintainers if > > they can add some info what this is about? > > Probably worth to try that first, even if I'm not sure anyone has a > answer for that since all the people who worked on virtio-9p at > the time have somehow deserted the project. Michael, Dominique, we are wondering here about the message size limitation of just 5 kiB in the 9p Linux client (using virtio transport) which imposes a performance bottleneck, introduced by this kernel commit: commit b49d8b5d7007a673796f3f99688b46931293873e Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Date: Wed Aug 17 16:56:04 2011 +0000 net/9p: Fix kernel crash with msize 512K With msize equal to 512K (PAGE_SIZE * VIRTQUEUE_NUM), we hit multiple crashes. This patch fix those. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Is this a fundamental maximum message size that cannot be exceeded with virtio in general or is there another reason for this limit that still applies? 
Full discussion: https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg06343.html > > > > As the kernel code sais trans_mod->maxsize, maybe its something in > > > > virtio > > > > on qemu side that does an automatic step back for some reason. I don't > > > > see something in the 9pfs virtio transport driver > > > > (hw/9pfs/virtio-9p-device.c on QEMU side) that would do this, so I > > > > would > > > > also need to dig deeper. > > > > > > > > Do you have some RAM limitation in your setup somewhere? > > > > > > > > For comparison, this is how I started the VM: > > > > > > > > ~/git/qemu/build/qemu-system-x86_64 \ > > > > -machine pc,accel=kvm,usb=off,dump-guest-core=off -m 2048 \ > > > > -smp 4,sockets=4,cores=1,threads=1 -rtc base=utc \ > > > > -boot strict=on -kernel > > > > /home/bee/vm/stretch/boot/vmlinuz-4.9.0-13-amd64 \ > > > > -initrd /home/bee/vm/stretch/boot/initrd.img-4.9.0-13-amd64 \ > > > > -append 'root=svnRoot rw rootfstype=9p > > > > rootflags=trans=virtio,version=9p2000.L,msize=104857600,cache=mmap > > > > console=ttyS0' \ > > > > > > First obvious difference I see between your setup and mine is that > > > you're mounting the 9pfs as root from the kernel command line. For > > > some reason, maybe this has an impact on the check in p9_client_create() > > > ? > > > > > > Can you reproduce with a scenario like Vivek's one ? > > > > Yep, confirmed. If I boot a guest from an image file first and then try to > > manually mount a 9pfs share after guest booted, then I get indeed that > > msize capping of just 512 kiB as well. That's far too small. :/ > > Maybe worth digging : > - why no capping happens in your scenario ? Because I was wrong. I just figured even in the 9p rootfs scenario it does indeed cap msize to 5kiB as well. The output of /etc/mtab on guest side was fooling me. I debugged this on 9p server side and the Linux 9p client always connects with a max. msize of 5 kiB, no matter what you do. > - is capping really needed ? 
> > Cheers, That's a good question and probably depends on whether there is a limitation on virtio side, which I don't have an answer for. Maybe Michael or Dominique can answer this. Best regards, Christian Schoenebeck ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance) 2021-02-23 13:39 ` Christian Schoenebeck @ 2021-02-23 14:07 ` Michael S. Tsirkin 2021-02-24 15:16 ` Christian Schoenebeck 0 siblings, 1 reply; 55+ messages in thread From: Michael S. Tsirkin @ 2021-02-23 14:07 UTC (permalink / raw) To: Christian Schoenebeck Cc: cdupontd, Dominique Martinet, Venegas Munoz, Jose Carlos, qemu-devel, Dr. David Alan Gilbert, virtio-fs-list, Greg Kurz, Stefan Hajnoczi, v9fs-developer, Shinde, Archana M, Vivek Goyal On Tue, Feb 23, 2021 at 02:39:48PM +0100, Christian Schoenebeck wrote: > On Montag, 22. Februar 2021 18:11:59 CET Greg Kurz wrote: > > On Mon, 22 Feb 2021 16:08:04 +0100 > > Christian Schoenebeck <qemu_oss@crudebyte.com> wrote: > > > > [...] > > > > > I did not ever have a kernel crash when I boot a Linux guest with a 9pfs > > > root fs and 100 MiB msize. > > > > Interesting. > > > > > Should we ask virtio or 9p Linux client maintainers if > > > they can add some info what this is about? > > > > Probably worth to try that first, even if I'm not sure anyone has a > > answer for that since all the people who worked on virtio-9p at > > the time have somehow deserted the project. > > Michael, Dominique, > > we are wondering here about the message size limitation of just 5 kiB in the > 9p Linux client (using virtio transport) which imposes a performance > bottleneck, introduced by this kernel commit: > > commit b49d8b5d7007a673796f3f99688b46931293873e > Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> > Date: Wed Aug 17 16:56:04 2011 +0000 > > net/9p: Fix kernel crash with msize 512K > > With msize equal to 512K (PAGE_SIZE * VIRTQUEUE_NUM), we hit multiple > crashes. This patch fix those. 
> > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> > Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Well the change I see is: - .maxsize = PAGE_SIZE*VIRTQUEUE_NUM, + .maxsize = PAGE_SIZE * (VIRTQUEUE_NUM - 3), so how come you say it changes 512K to 5K? Looks more like 500K to me. > Is this a fundamental maximum message size that cannot be exceeded with virtio > in general or is there another reason for this limit that still applies? > > Full discussion: > https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg06343.html > > > > > > As the kernel code sais trans_mod->maxsize, maybe its something in > > > > > virtio > > > > > on qemu side that does an automatic step back for some reason. I don't > > > > > see something in the 9pfs virtio transport driver > > > > > (hw/9pfs/virtio-9p-device.c on QEMU side) that would do this, so I > > > > > would > > > > > also need to dig deeper. > > > > > > > > > > Do you have some RAM limitation in your setup somewhere? > > > > > > > > > > For comparison, this is how I started the VM: > > > > > > > > > > ~/git/qemu/build/qemu-system-x86_64 \ > > > > > -machine pc,accel=kvm,usb=off,dump-guest-core=off -m 2048 \ > > > > > -smp 4,sockets=4,cores=1,threads=1 -rtc base=utc \ > > > > > -boot strict=on -kernel > > > > > /home/bee/vm/stretch/boot/vmlinuz-4.9.0-13-amd64 \ > > > > > -initrd /home/bee/vm/stretch/boot/initrd.img-4.9.0-13-amd64 \ > > > > > -append 'root=svnRoot rw rootfstype=9p > > > > > rootflags=trans=virtio,version=9p2000.L,msize=104857600,cache=mmap > > > > > console=ttyS0' \ > > > > > > > > First obvious difference I see between your setup and mine is that > > > > you're mounting the 9pfs as root from the kernel command line. For > > > > some reason, maybe this has an impact on the check in p9_client_create() > > > > ? > > > > > > > > Can you reproduce with a scenario like Vivek's one ? > > > > > > Yep, confirmed. 
If I boot a guest from an image file first and then try to > > > manually mount a 9pfs share after guest booted, then I get indeed that > > > msize capping of just 512 kiB as well. That's far too small. :/ > > > > Maybe worth digging : > > - why no capping happens in your scenario ? > > Because I was wrong. > > I just figured even in the 9p rootfs scenario it does indeed cap msize to 5kiB > as well. The output of /etc/mtab on guest side was fooling me. I debugged this > on 9p server side and the Linux 9p client always connects with a max. msize of > 5 kiB, no matter what you do. > > > - is capping really needed ? > > > > Cheers, > > That's a good question and probably depends on whether there is a limitation > on virtio side, which I don't have an answer for. Maybe Michael or Dominique > can answer this. > > Best regards, > Christian Schoenebeck > ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance) 2021-02-23 14:07 ` Michael S. Tsirkin @ 2021-02-24 15:16 ` Christian Schoenebeck 2021-02-24 15:43 ` Dominique Martinet 0 siblings, 1 reply; 55+ messages in thread From: Christian Schoenebeck @ 2021-02-24 15:16 UTC (permalink / raw) To: Michael S. Tsirkin Cc: qemu-devel, Greg Kurz, Shinde, Archana M, Venegas Munoz, Jose Carlos, Dr. David Alan Gilbert, virtio-fs-list, Stefan Hajnoczi, cdupontd, Vivek Goyal, Dominique Martinet, v9fs-developer On Dienstag, 23. Februar 2021 15:07:31 CET Michael S. Tsirkin wrote: > > Michael, Dominique, > > > > we are wondering here about the message size limitation of just 5 kiB in > > the 9p Linux client (using virtio transport) which imposes a performance > > bottleneck, introduced by this kernel commit: > > > > commit b49d8b5d7007a673796f3f99688b46931293873e > > Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> > > Date: Wed Aug 17 16:56:04 2011 +0000 > > > > net/9p: Fix kernel crash with msize 512K > > > > With msize equal to 512K (PAGE_SIZE * VIRTQUEUE_NUM), we hit multiple > > crashes. This patch fix those. > > > > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> > > Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> > > Well the change I see is: > > - .maxsize = PAGE_SIZE*VIRTQUEUE_NUM, > + .maxsize = PAGE_SIZE * (VIRTQUEUE_NUM - 3), > > > so how come you say it changes 512K to 5K? > Looks more like 500K to me. Misapprehension + typo(s) in my previous message, sorry Michael. That's 500k of course (not 5k), yes. Let me rephrase that question: are you aware of something in virtio that would per se mandate an absolute hard coded message size limit (e.g. from virtio specs perspective or maybe some compatibility issue)? 
If not, we would try getting rid of that hard coded limit of the 9p client on kernel side in the first place, because the kernel's 9p client already has a dynamic runtime option 'msize' and that hard coded enforced limit (500k) is a performance bottleneck like I said. Best regards, Christian Schoenebeck ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance) 2021-02-24 15:16 ` Christian Schoenebeck @ 2021-02-24 15:43 ` Dominique Martinet 2021-02-26 13:49 ` Christian Schoenebeck 0 siblings, 1 reply; 55+ messages in thread From: Dominique Martinet @ 2021-02-24 15:43 UTC (permalink / raw) To: Christian Schoenebeck Cc: cdupontd, Michael S. Tsirkin, Venegas Munoz, Jose Carlos, Greg Kurz, qemu-devel, virtio-fs-list, Vivek Goyal, Stefan Hajnoczi, v9fs-developer, Shinde, Archana M, Dr. David Alan Gilbert Christian Schoenebeck wrote on Wed, Feb 24, 2021 at 04:16:52PM +0100: > Misapprehension + typo(s) in my previous message, sorry Michael. That's 500k > of course (not 5k), yes. > > Let me rephrase that question: are you aware of something in virtio that would > per se mandate an absolute hard coded message size limit (e.g. from virtio > specs perspective or maybe some compatibility issue)? > > If not, we would try getting rid of that hard coded limit of the 9p client on > kernel side in the first place, because the kernel's 9p client already has a > dynamic runtime option 'msize' and that hard coded enforced limit (500k) is a > performance bottleneck like I said. We could probably set it at init time through virtio_max_dma_size(vdev) like virtio_blk does (I just tried and get 2^64 so we can probably expect virtually no limit there) I'm not too familiar with virtio, feel free to try and if it works send me a patch -- the size drop from 512 to 500k is old enough that things probably have changed in the background since then. On the 9p side itself, unrelated to virtio, we don't want to make it *too* big as the client code doesn't use any scatter-gather and will want to allocate upfront contiguous buffers of the size that got negotiated -- that can get ugly quite fast, but we can leave it up to users to decide. One of my very-long-term goal would be to tend to that, if someone has cycles to work on it I'd gladly review any patch in that area. 
A possible implementation path would be to have transport define themselves if they support it or not and handle it accordingly until all transports migrated, so one wouldn't need to care about e.g. rdma or xen if you don't have hardware to test in the short term. The next best thing would be David's netfs helpers and sending concurrent requests if you use cache, but that's not merged yet either so it'll be a few cycles as well. Cheers, -- Dominique ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance) 2021-02-24 15:43 ` Dominique Martinet @ 2021-02-26 13:49 ` Christian Schoenebeck 2021-02-27 0:03 ` Dominique Martinet 0 siblings, 1 reply; 55+ messages in thread From: Christian Schoenebeck @ 2021-02-26 13:49 UTC (permalink / raw) To: qemu-devel Cc: Dominique Martinet, cdupontd, Michael S. Tsirkin, Venegas Munoz, Jose Carlos, Greg Kurz, virtio-fs-list, Vivek Goyal, Stefan Hajnoczi, v9fs-developer, Shinde, Archana M, Dr. David Alan Gilbert On Mittwoch, 24. Februar 2021 16:43:57 CET Dominique Martinet wrote: > Christian Schoenebeck wrote on Wed, Feb 24, 2021 at 04:16:52PM +0100: > > Misapprehension + typo(s) in my previous message, sorry Michael. That's > > 500k of course (not 5k), yes. > > > > Let me rephrase that question: are you aware of something in virtio that > > would per se mandate an absolute hard coded message size limit (e.g. from > > virtio specs perspective or maybe some compatibility issue)? > > > > If not, we would try getting rid of that hard coded limit of the 9p client > > on kernel side in the first place, because the kernel's 9p client already > > has a dynamic runtime option 'msize' and that hard coded enforced limit > > (500k) is a performance bottleneck like I said. > > We could probably set it at init time through virtio_max_dma_size(vdev) > like virtio_blk does (I just tried and get 2^64 so we can probably > expect virtually no limit there) > > I'm not too familiar with virtio, feel free to try and if it works send > me a patch -- the size drop from 512 to 500k is old enough that things > probably have changed in the background since then. Yes, agreed. I'm neither too familiar with virtio, nor with the Linux 9p client code yet. For that reason I consider a minimal invasive change as a first step at least. 
AFAICS a "split virtqueue" setup is currently used: https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-240006 Right now the client uses a hard coded amount of 128 elements. So what about replacing VIRTQUEUE_NUM by a variable which is initialized with a value according to the user's requested 'msize' option at init time? According to the virtio specs the max. amount of elements in a virtqueue is 32768. So 32768 * 4k = 128M as new upper limit would already be a significant improvement and would not require too many changes to the client code, right? > On the 9p side itself, unrelated to virtio, we don't want to make it > *too* big as the client code doesn't use any scatter-gather and will > want to allocate upfront contiguous buffers of the size that got > negotiated -- that can get ugly quite fast, but we can leave it up to > users to decide. With ugly you just mean that it's occupying this memory for good as long as the driver is loaded, or is there some runtime performance penalty as well to be aware of? > One of my very-long-term goal would be to tend to that, if someone has > cycles to work on it I'd gladly review any patch in that area. > A possible implementation path would be to have transport define > themselves if they support it or not and handle it accordingly until all > transports migrated, so one wouldn't need to care about e.g. rdma or xen > if you don't have hardware to test in the short term. Sounds like something that Greg suggested before for a slightly different, even though related issue: right now the default 'msize' on Linux client side is 8k, which really hurts performance wise as virtually all 9p messages have to be split into a huge number of request and response messages. OTOH you don't want to set this default value too high. So Greg noted that virtio could suggest a default msize, i.e. a value that would suit host's storage hardware appropriately. 
> The next best thing would be David's netfs helpers and sending > concurrent requests if you use cache, but that's not merged yet either > so it'll be a few cycles as well. So right now the Linux client is always just handling one request at a time; it sends a 9p request and waits for its response before processing the next request? If so, is there a reason to limit the planned concurrent request handling feature to one of the cached modes? I mean ordering of requests is already handled on 9p server side, so the client could just pass all messages in a lightweight way and assume the server takes care of it. Best regards, Christian Schoenebeck ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance) 2021-02-26 13:49 ` Christian Schoenebeck @ 2021-02-27 0:03 ` Dominique Martinet 2021-03-03 14:04 ` Christian Schoenebeck 0 siblings, 1 reply; 55+ messages in thread From: Dominique Martinet @ 2021-02-27 0:03 UTC (permalink / raw) To: Christian Schoenebeck Cc: Shinde, Archana M, Michael S. Tsirkin, Venegas Munoz, Jose Carlos, Greg Kurz, qemu-devel, virtio-fs-list, Dr. David Alan Gilbert, Stefan Hajnoczi, v9fs-developer, cdupontd, Vivek Goyal Christian Schoenebeck wrote on Fri, Feb 26, 2021 at 02:49:12PM +0100: > Right now the client uses a hard coded amount of 128 elements. So what about > replacing VIRTQUEUE_NUM by a variable which is initialized with a value > according to the user's requested 'msize' option at init time? > > According to the virtio specs the max. amount of elements in a virtqueue is > 32768. So 32768 * 4k = 128M as new upper limit would already be a significant > improvement and would not require too many changes to the client code, right? The current code inits the chan->sg at probe time (when the driver is loaded) and not at mount time, and it is currently embedded in the chan struct, so that would need allocating at mount time (p9_client_create; either resizing if required or not sharing), but it doesn't sound too intrusive, yes. I don't see other adherences to VIRTQUEUE_NUM that would hurt trying. > > On the 9p side itself, unrelated to virtio, we don't want to make it > > *too* big as the client code doesn't use any scatter-gather and will > > want to allocate upfront contiguous buffers of the size that got > > negotiated -- that can get ugly quite fast, but we can leave it up to > > users to decide. > > With ugly you just mean that it's occupying this memory for good as long as > the driver is loaded, or is there some runtime performance penalty as well to > be aware of? The main problem is memory fragmentation, see /proc/buddyinfo on various systems.
After a fresh boot memory is quite clean and there is no problem allocating 2MB contiguous buffers, but after a while depending on the workload it can be hard to even allocate large buffers. I've had that problem at work in the past with a RDMA driver that wanted to allocate 256KB and could get that to fail quite reliably with our workload, so it really depends on what the client does. In the 9p case, the memory used to be allocated for good and per client (= mountpoint), so if you had 15 9p mounts that could do e.g. 32 requests in parallel with 1MB buffers you could lock 500MB of idling ram. I changed that to a dedicated slab a while ago, so that should no longer be so much of a problem -- the slab will keep the buffers around as well if used frequently so the performance hit wasn't bad even for larger msizes > > One of my very-long-term goal would be to tend to that, if someone has > > cycles to work on it I'd gladly review any patch in that area. > > A possible implementation path would be to have transport define > > themselves if they support it or not and handle it accordingly until all > > transports migrated, so one wouldn't need to care about e.g. rdma or xen > > if you don't have hardware to test in the short term. > > Sounds like something that Greg suggested before for a slightly different, > even though related issue: right now the default 'msize' on Linux client side > is 8k, which really hurts performance wise as virtually all 9p messages have > to be split into a huge number of request and response messages. OTOH you > don't want to set this default value too high. So Greg noted that virtio could > suggest a default msize, i.e. a value that would suit host's storage hardware > appropriately. We can definitely increase the default, for all transports in my opinion. As a first step, 64 or 128k? 
> > The next best thing would be David's netfs helpers and sending > > concurrent requests if you use cache, but that's not merged yet either > > so it'll be a few cycles as well. > > So right now the Linux client is always just handling one request at a time; > it sends a 9p request and waits for its response before processing the next > request? Requests are handled concurrently just fine - if you have multiple processes all doing their things it will all go out in parallel. The bottleneck people generally complain about (and where things hurt) is a single process reading: there is currently no readahead as far as I know, so reads are really sent one at a time, waiting for the reply and then sending the next. > If so, is there a reason to limit the planned concurrent request handling > feature to one of the cached modes? I mean ordering of requests is already > handled on 9p server side, so the client could just pass all messages in a > lightweight way and assume the server takes care of it. cache=none is difficult, we could pipeline requests up to the buffer size the client requested, but that's it. Still something worth doing if the msize is tiny and the client requests 4+MB in my opinion, but nothing the vfs can help us with. cache=mmap is basically cache=none with a hack to say "ok, for mmap there's no choice so do use some" -- afaik mmap has its own readahead mechanism, so this should actually prefetch things, but I don't know about the parallelism of that mechanism and would say it's linear. Other caching models (loose / fscache) actually share most of the code so whatever is done for one would be for both; the discussion is still underway with David/Willy and others mostly about ceph/cifs but would benefit everyone and I'm following closely. -- Dominique ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance) 2021-02-27 0:03 ` Dominique Martinet @ 2021-03-03 14:04 ` Christian Schoenebeck 2021-03-03 14:50 ` Dominique Martinet 0 siblings, 1 reply; 55+ messages in thread From: Christian Schoenebeck @ 2021-03-03 14:04 UTC (permalink / raw) To: qemu-devel Cc: Dominique Martinet, Shinde, Archana M, Michael S. Tsirkin, Venegas Munoz, Jose Carlos, Greg Kurz, virtio-fs-list, Dr. David Alan Gilbert, Stefan Hajnoczi, v9fs-developer, cdupontd, Vivek Goyal On Samstag, 27. Februar 2021 01:03:40 CET Dominique Martinet wrote: > Christian Schoenebeck wrote on Fri, Feb 26, 2021 at 02:49:12PM +0100: > > Right now the client uses a hard coded amount of 128 elements. So what > > about replacing VIRTQUEUE_NUM by a variable which is initialized with a > > value according to the user's requested 'msize' option at init time? > > > > According to the virtio specs the max. amount of elements in a virtqueue > > is > > 32768. So 32768 * 4k = 128M as new upper limit would already be a > > significant improvement and would not require too many changes to the > > client code, right? > The current code inits the chan->sg at probe time (when the driver is > loaded) and not at mount time, and it is currently embedded in the chan > struct, so that would need allocating at mount time (p9_client_create; > either resizing if required or not sharing), but it doesn't sound too > intrusive, yes. > > I don't see other adherences to VIRTQUEUE_NUM that would hurt trying. Ok, then I will look into changing this when I hopefully have some time in a few weeks. > > > On the 9p side itself, unrelated to virtio, we don't want to make it > > > *too* big as the client code doesn't use any scatter-gather and will > > > want to allocate upfront contiguous buffers of the size that got > > > negotiated -- that can get ugly quite fast, but we can leave it up to > > > users to decide.
> > > > With ugly you just mean that it's occupying this memory for good as long > > as > > the driver is loaded, or is there some runtime performance penalty as well > > to be aware of? > > The main problem is memory fragmentation, see /proc/buddyinfo on various > systems. > After a fresh boot memory is quite clean and there is no problem > allocating 2MB contiguous buffers, but after a while depending on the > workload it can be hard to even allocate large buffers. > I've had that problem at work in the past with a RDMA driver that wanted > to allocate 256KB and could get that to fail quite reliably with our > workload, so it really depends on what the client does. > > In the 9p case, the memory used to be allocated for good and per client > (= mountpoint), so if you had 15 9p mounts that could do e.g. 32 > requests in parallel with 1MB buffers you could lock 500MB of idling > ram. I changed that to a dedicated slab a while ago, so that should no > longer be so much of a problem -- the slab will keep the buffers around > as well if used frequently so the performance hit wasn't bad even for > larger msizes Ah ok, good to know. BTW qemu now handles multiple filesystems below one 9p share correctly by (optionally) remapping inode numbers from host side -> guest side appropriately to prevent potential file ID collisions. This might reduce the need to have a large amount of 9p mount points on guest side. For instance I am running entire guest systems entirely on one 9p mount point as root fs that is. The guest system is divided into multiple filesystems on host side (e.g. multiple zfs datasets), not on guest side. > > > One of my very-long-term goal would be to tend to that, if someone has > > > cycles to work on it I'd gladly review any patch in that area. 
> > > A possible implementation path would be to have transport define > > > themselves if they support it or not and handle it accordingly until all > > > transports migrated, so one wouldn't need to care about e.g. rdma or xen > > > if you don't have hardware to test in the short term. > > > > Sounds like something that Greg suggested before for a slightly different, > > even though related issue: right now the default 'msize' on Linux client > > side is 8k, which really hurts performance wise as virtually all 9p > > messages have to be split into a huge number of request and response > > messages. OTOH you don't want to set this default value too high. So Greg > > noted that virtio could suggest a default msize, i.e. a value that would > > suit host's storage hardware appropriately. > > We can definitely increase the default, for all transports in my > opinion. > As a first step, 64 or 128k? Just to throw some numbers first; when linearly reading a 12 GB file on guest (i.e. "time cat test.dat > /dev/null") on a test machine, these are the results that I get (cache=mmap): msize=16k: 2min7s (95 MB/s) msize=64k: 17s (706 MB/s) msize=128k: 12s (1000 MB/s) msize=256k: 8s (1500 MB/s) msize=512k: 6.5s (1846 MB/s) Personally I would raise the default msize value at least to 128k. > > > The next best thing would be David's netfs helpers and sending > > > concurrent requests if you use cache, but that's not merged yet either > > > so it'll be a few cycles as well. > > > > So right now the Linux client is always just handling one request at a > > time; it sends a 9p request and waits for its response before processing > > the next request? > > Requests are handled concurrently just fine - if you have multiple > processes all doing their things it will all go out in parallel. 
> > The bottleneck people generally complain about (and where things hurt) > is if you have a single process reading then there is currently no > readahead as far as I know, so reads are really sent one at a time, > waiting for reply and sending next. So that also means if you are running a multi-threaded app (in one process) on guest side, then none of its I/O requests are handled in parallel right now. It would be desirable to have parallel requests for multi-threaded apps as well. Personally I don't find raw I/O the worst performance issue right now. As you can see from the numbers above, if 'msize' is raised and I/O being performed with large chunk sizes (e.g. 'cat' automatically uses a chunk size according to the iounit advertised by stat) then the I/O results are okay. What hurts IMO the most in practice is the sluggish behaviour regarding dentries ATM. The following is with cache=mmap (on guest side): $ time ls /etc/ > /dev/null real 0m0.091s user 0m0.000s sys 0m0.044s $ time ls -l /etc/ > /dev/null real 0m0.259s user 0m0.008s sys 0m0.016s $ ls -l /etc/ | wc -l 113 $ With cache=loose there is some improvement; on the first "ls" run (when its not in the dentry cache I assume) the results are similar. The subsequent runs then improve to around 50ms for "ls" and around 70ms for "ls -l". But that's still far from numbers I would expect. Keep in mind, even when you just open() & read() a file, then directory components have to be walked for checking ownership and permissions. I have seen huge slowdowns in deep directory structures for that reason. Best regards, Christian Schoenebeck ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance) 2021-03-03 14:04 ` Christian Schoenebeck @ 2021-03-03 14:50 ` Dominique Martinet 2021-03-05 14:57 ` Christian Schoenebeck 0 siblings, 1 reply; 55+ messages in thread From: Dominique Martinet @ 2021-03-03 14:50 UTC (permalink / raw) To: Christian Schoenebeck Cc: cdupontd, Michael S. Tsirkin, Venegas Munoz, Jose Carlos, Greg Kurz, qemu-devel, virtio-fs-list, Vivek Goyal, Stefan Hajnoczi, v9fs-developer, Shinde, Archana M, Dr. David Alan Gilbert Christian Schoenebeck wrote on Wed, Mar 03, 2021 at 03:04:21PM +0100: > > We can definitely increase the default, for all transports in my > > opinion. > > As a first step, 64 or 128k? > > Just to throw some numbers first; when linearly reading a 12 GB file on guest > (i.e. "time cat test.dat > /dev/null") on a test machine, these are the > results that I get (cache=mmap): > > msize=16k: 2min7s (95 MB/s) > msize=64k: 17s (706 MB/s) > msize=128k: 12s (1000 MB/s) > msize=256k: 8s (1500 MB/s) > msize=512k: 6.5s (1846 MB/s) > > Personally I would raise the default msize value at least to 128k. Thanks for the numbers. I'm still a bit worried about too large chunks, let's go with 128k for now -- I'll send a couple of patches increasing the tcp max/default as well next week-ish. > > The bottleneck people generally complain about (and where things hurt) > > is if you have a single process reading then there is currently no > > readahead as far as I know, so reads are really sent one at a time, > > waiting for reply and sending next. > > So that also means if you are running a multi-threaded app (in one process) on > guest side, then none of its I/O requests are handled in parallel right now. > It would be desirable to have parallel requests for multi-threaded apps as > well. threads are independant there as far as the kernel goes, if multiple threads issue IO in parallel it will be handled in parallel. 
(the exception would be "lightweight threads" which don't spawn actual OS thread, but in this case the IOs are generally sent asynchronously so that should work as well) > Personally I don't find raw I/O the worst performance issue right now. As you > can see from the numbers above, if 'msize' is raised and I/O being performed > with large chunk sizes (e.g. 'cat' automatically uses a chunk size according > to the iounit advertised by stat) then the I/O results are okay. > > What hurts IMO the most in practice is the sluggish behaviour regarding > dentries ATM. The following is with cache=mmap (on guest side): > > $ time ls /etc/ > /dev/null > real 0m0.091s > user 0m0.000s > sys 0m0.044s > $ time ls -l /etc/ > /dev/null > real 0m0.259s > user 0m0.008s > sys 0m0.016s > $ ls -l /etc/ | wc -l > 113 > $ Yes, that is slow indeed.. Unfortunately cache=none/mmap means only open dentries are pinned, so that means a load of requests everytime. I was going to suggest something like readdirplus or prefetching directory entries attributes in parallel/background, but since we're not keeping any entries around we can't even do that in that mode. > With cache=loose there is some improvement; on the first "ls" run (when its > not in the dentry cache I assume) the results are similar. The subsequent runs > then improve to around 50ms for "ls" and around 70ms for "ls -l". But that's > still far from numbers I would expect. I'm surprised cached mode is that slow though, that is worth investigating. With that time range we are definitely sending more requests to the server than I would expect for cache=loose, some stat revalidation perhaps? I thought there wasn't any. I don't like cache=loose/fscache right now as the reclaim mechanism doesn't work well as far as I'm aware (I've heard reports of 9p memory usage growing ad nauseam in these modes), so while it's fine for short-lived VMs it can't really be used for long periods of time as is... 
That's been on my todo for a while too, but unfortunately no time for that. Ideally if that gets fixed, it really should be the default with some sort of cache revalidation like NFS does (if that hasn't changed, inode stats have a lifetime after which they get revalidated on access, and directory ctime changes lead to a fresh readdir); but we can't really do that right now if it "leaks". Some cap on the number of open fids could be appreciable as well perhaps, to spare server resources and keep internal lists short. > Keep in mind, even when you just open() & read() a file, then directory > components have to be walked for checking ownership and permissions. I have > seen huge slowdowns in deep directory structures for that reason. Yes, each component is walked one at a time. In theory the protocol allows opening a path with all components specified in a single walk and letting the server handle the intermediate directory checks, but the VFS doesn't allow that. Using relative paths or openat/fstatat/etc helps, but many programs aren't very smart with that. Note it's not just a problem with 9p though; even network filesystems with proper caching have a noticeable performance cost with deep directory trees. Anyway, there definitely is room for improvement; if you need ideas I have plenty, but my time is more than limited right now and for the foreseeable future... 9p work is purely on my free time and there isn't much at the moment :( I'll make time as necessary for reviews & tests but that's about as much as I can promise, sorry and good luck! -- Dominique ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Can not set high msize with virtio-9p (Was: Re: virtiofs vs 9p performance) 2021-03-03 14:50 ` Dominique Martinet @ 2021-03-05 14:57 ` Christian Schoenebeck 0 siblings, 0 replies; 55+ messages in thread From: Christian Schoenebeck @ 2021-03-05 14:57 UTC (permalink / raw) To: qemu-devel Cc: Dominique Martinet, cdupontd, Michael S. Tsirkin, Venegas Munoz, Jose Carlos, Greg Kurz, virtio-fs-list, Vivek Goyal, Stefan Hajnoczi, v9fs-developer, Shinde, Archana M, Dr. David Alan Gilbert On Mittwoch, 3. März 2021 15:50:37 CET Dominique Martinet wrote: > Christian Schoenebeck wrote on Wed, Mar 03, 2021 at 03:04:21PM +0100: > > > We can definitely increase the default, for all transports in my > > > opinion. > > > As a first step, 64 or 128k? > > > > Just to throw some numbers first; when linearly reading a 12 GB file on > > guest (i.e. "time cat test.dat > /dev/null") on a test machine, these are > > the results that I get (cache=mmap): > > > > msize=16k: 2min7s (95 MB/s) > > msize=64k: 17s (706 MB/s) > > msize=128k: 12s (1000 MB/s) > > msize=256k: 8s (1500 MB/s) > > msize=512k: 6.5s (1846 MB/s) > > > > Personally I would raise the default msize value at least to 128k. > > Thanks for the numbers. > I'm still a bit worried about too large chunks, let's go with 128k for > now -- I'll send a couple of patches increasing the tcp max/default as > well next week-ish. Ok, sounds good! > > Personally I don't find raw I/O the worst performance issue right now. As > > you can see from the numbers above, if 'msize' is raised and I/O being > > performed with large chunk sizes (e.g. 'cat' automatically uses a chunk > > size according to the iounit advertised by stat) then the I/O results are > > okay. > > > > What hurts IMO the most in practice is the sluggish behaviour regarding > > dentries ATM. 
The following is with cache=mmap (on guest side): > > > > $ time ls /etc/ > /dev/null > > real 0m0.091s > > user 0m0.000s > > sys 0m0.044s > > $ time ls -l /etc/ > /dev/null > > real 0m0.259s > > user 0m0.008s > > sys 0m0.016s > > $ ls -l /etc/ | wc -l > > 113 > > $ > > Yes, that is slow indeed.. Unfortunately cache=none/mmap means only open > dentries are pinned, so that means a load of requests everytime. > > I was going to suggest something like readdirplus or prefetching > directory entries attributes in parallel/background, but since we're not > keeping any entries around we can't even do that in that mode. > > > With cache=loose there is some improvement; on the first "ls" run (when > > its > > not in the dentry cache I assume) the results are similar. The subsequent > > runs then improve to around 50ms for "ls" and around 70ms for "ls -l". > > But that's still far from numbers I would expect. > > I'm surprised cached mode is that slow though, that is worth > investigating. > With that time range we are definitely sending more requests to the > server than I would expect for cache=loose, some stat revalidation > perhaps? I thought there wasn't any. Yes, it looks like more 9p requests are sent than actually required for readdir. But I haven't checked yet what's going on there in detail. That's definitely on my todo list, because this readdir/stat/direntry issue ATM really hurts the most IMO. > I don't like cache=loose/fscache right now as the reclaim mechanism > doesn't work well as far as I'm aware (I've heard reports of 9p memory > usage growing ad nauseam in these modes), so while it's fine for > short-lived VMs it can't really be used for long periods of time as > is... That's been on my todo for a while too, but unfortunately no time > for that. Ok, that's new to me. But I fear the opposite is currently worse; with cache=mmap and running a VM for a longer time: 9p requests get slower and slower, e.g. 
at a certain point you're waiting like 20s for one request. I haven't investigated the cause here either yet. It may very well be an issue on QEMU side: I have some doubts in the fid reclaim algorithm on 9p server side which is using just a linked list. Maybe that list is growing to ridiculous sizes and searching the list with O(n) starts to hurt after a while. With cache=loose I don't see such tremendous slowdowns even on long runs, which might indicate that this symptom might indeed be due to a problem on QEMU side. > Ideally if that gets fixed, it really should be the default with some > sort of cache revalidation like NFS does (if that hasn't changed, inode > stats have a lifetime after which they get revalidated on access, and > directory ctime changes lead to a fresh readdir) ; but we can't really > do that right now if it "leaks". > > Some cap to the number of open fids could be appreciable as well > perhaps, to spare server resources and keep internal lists short. I just reviewed the fid reclaim code on 9p servers side to some extent because of a security issue on 9p server side in this area recently, but I haven't really thought through nor captured the authors' original ideas behind it entirely yet. I still have some question marks here. Maybe Greg feels the same. Probably when support for macOS is added (also on my todo list), then the amount of open fids needs to be limited anyway. Because macOS is much more conservative and does not allow a large number of open files by default. > Anyway, there definitely is room for improvement; if you need ideas I > have plenty but my time is more than limited right now and for the > forseeable future... 9p work is purely on my freetime and there isn't > much at the moment :( > > I'll make time as necessary for reviews & tests but that's about as much > as I can promise, sorry and good luck! I fear that applies to all developers right now. 
To my knowledge there is not a single developer who is either paid or otherwise able to spend reasonably large time slices on 9p issues. From my side: my plan is to hunt down the worst 9p performance issues in order of their impact, but, like anybody else, only when I find some free time slices for that. #patience #optimistic Best regards, Christian Schoenebeck ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: virtiofs vs 9p performance (Re: tools/virtiofs: Multi threading seems to hurt performance) 2020-09-24 22:10 ` virtiofs vs 9p performance (Re: tools/virtiofs: Multi threading seems to hurt performance) Vivek Goyal 2020-09-25 8:06 ` virtiofs vs 9p performance Christian Schoenebeck @ 2020-09-25 12:41 ` Dr. David Alan Gilbert 2020-09-25 13:04 ` Christian Schoenebeck 2020-09-29 13:17 ` Vivek Goyal 1 sibling, 2 replies; 55+ messages in thread From: Dr. David Alan Gilbert @ 2020-09-25 12:41 UTC (permalink / raw) To: Vivek Goyal Cc: Venegas Munoz, Jose Carlos, qemu-devel, cdupontd, virtio-fs-list, Stefan Hajnoczi, Shinde, Archana M * Vivek Goyal (vgoyal@redhat.com) wrote: > On Thu, Sep 24, 2020 at 09:33:01PM +0000, Venegas Munoz, Jose Carlos wrote: > > Hi Folks, > > > > Sorry for the delay about how to reproduce `fio` data. > > > > I have some code to automate testing for multiple kata configs and collect info like: > > - Kata-env, kata configuration.toml, qemu command, virtiofsd command. > > > > See: > > https://github.com/jcvenegas/mrunner/ > > > > > > Last time we agreed to narrow the cases and configs to compare virtiofs and 9pfs > > > > The configs were the following: > > > > - qemu + virtiofs(cache=auto, dax=0) a.k.a. `kata-qemu-virtiofs` WITHOUT xattr > > - qemu + 9pfs a.k.a. `kata-qemu` > > > > Please take a look at the html and raw results I attach in this mail. > > Hi Carlos, > > So you are running the following test. > > fio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75 --output=/output/fio.txt > > And the following are your results.
> > 9p > -- > READ: bw=211MiB/s (222MB/s), 211MiB/s-211MiB/s (222MB/s-222MB/s), io=3070MiB (3219MB), run=14532-14532msec > > WRITE: bw=70.6MiB/s (74.0MB/s), 70.6MiB/s-70.6MiB/s (74.0MB/s-74.0MB/s), io=1026MiB (1076MB), run=14532-14532msec > > virtiofs > -------- > Run status group 0 (all jobs): > READ: bw=159MiB/s (167MB/s), 159MiB/s-159MiB/s (167MB/s-167MB/s), io=3070MiB (3219MB), run=19321-19321msec > WRITE: bw=53.1MiB/s (55.7MB/s), 53.1MiB/s-53.1MiB/s (55.7MB/s-55.7MB/s), io=1026MiB (1076MB), run=19321-19321msec > > So looks like you are getting better performance with 9p in this case. That's interesting, because I've just tried similar again with my ramdisk setup: fio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75 --output=aname.txt virtiofs default options test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=64 fio-3.21 Starting 1 process test: Laying out IO file (1 file / 4096MiB) test: (groupid=0, jobs=1): err= 0: pid=773: Fri Sep 25 12:28:32 2020 read: IOPS=18.3k, BW=71.3MiB/s (74.8MB/s)(3070MiB/43042msec) bw ( KiB/s): min=70752, max=77280, per=100.00%, avg=73075.71, stdev=1603.47, samples=85 iops : min=17688, max=19320, avg=18268.92, stdev=400.86, samples=85 write: IOPS=6102, BW=23.8MiB/s (24.0MB/s)(1026MiB/43042msec); 0 zone resets bw ( KiB/s): min=23128, max=25696, per=100.00%, avg=24420.40, stdev=583.08, samples=85 iops : min= 5782, max= 6424, avg=6105.09, stdev=145.76, samples=85 cpu : usr=0.10%, sys=30.09%, ctx=1245312, majf=0, minf=6 IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0, window=0, percentile=100.00%, depth=64 Run status group 0 (all jobs): READ: 
bw=71.3MiB/s (74.8MB/s), 71.3MiB/s-71.3MiB/s (74.8MB/s-74.8MB/s), io=3070MiB (3219MB), run=43042-43042msec WRITE: bw=23.8MiB/s (24.0MB/s), 23.8MiB/s-23.8MiB/s (24.0MB/s-24.0MB/s), io=1026MiB (1076MB), run=43042-43042msec virtiofs cache=none test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=64 fio-3.21 Starting 1 process test: (groupid=0, jobs=1): err= 0: pid=740: Fri Sep 25 12:30:57 2020 read: IOPS=22.9k, BW=89.6MiB/s (93.0MB/s)(3070MiB/34256msec) bw ( KiB/s): min=89048, max=94240, per=100.00%, avg=91871.06, stdev=967.87, samples=68 iops : min=22262, max=23560, avg=22967.76, stdev=241.97, samples=68 write: IOPS=7667, BW=29.0MiB/s (31.4MB/s)(1026MiB/34256msec); 0 zone resets bw ( KiB/s): min=29264, max=32248, per=100.00%, avg=30700.82, stdev=541.97, samples=68 iops : min= 7316, max= 8062, avg=7675.21, stdev=135.49, samples=68 cpu : usr=1.03%, sys=27.64%, ctx=1048635, majf=0, minf=5 IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0, window=0, percentile=100.00%, depth=64 Run status group 0 (all jobs): READ: bw=89.6MiB/s (93.0MB/s), 89.6MiB/s-89.6MiB/s (93.0MB/s-93.0MB/s), io=3070MiB (3219MB), run=34256-34256msec WRITE: bw=29.0MiB/s (31.4MB/s), 29.0MiB/s-29.0MiB/s (31.4MB/s-31.4MB/s), io=1026MiB (1076MB), run=34256-34256msec virtiofs cache=none thread-pool-size=1 test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=64 fio-3.21 Starting 1 process test: (groupid=0, jobs=1): err= 0: pid=738: Fri Sep 25 12:33:17 2020 read: IOPS=23.7k, BW=92.4MiB/s (96.9MB/s)(3070MiB/33215msec) bw ( KiB/s): min=89808, max=111952, per=100.00%, avg=94762.30, stdev=4507.43, samples=66 iops : min=22452, max=27988, avg=23690.58, stdev=1126.86, 
samples=66 write: IOPS=7907, BW=30.9MiB/s (32.4MB/s)(1026MiB/33215msec); 0 zone resets bw ( KiB/s): min=29424, max=37112, per=100.00%, avg=31668.73, stdev=1558.69, samples=66 iops : min= 7356, max= 9278, avg=7917.18, stdev=389.67, samples=66 cpu : usr=0.43%, sys=29.07%, ctx=1048627, majf=0, minf=7 IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0, window=0, percentile=100.00%, depth=64 Run status group 0 (all jobs): READ: bw=92.4MiB/s (96.9MB/s), 92.4MiB/s-92.4MiB/s (96.9MB/s-96.9MB/s), io=3070MiB (3219MB), run=33215-33215msec WRITE: bw=30.9MiB/s (32.4MB/s), 30.9MiB/s-30.9MiB/s (32.4MB/s-32.4MB/s), io=1026MiB (1076MB), run=33215-33215msec 9p ( mount -t 9p -o trans=virtio kernel /mnt -oversion=9p2000.L,cache=mmap,msize=1048576 ) test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=64 fio-3.21 Starting 1 process test: (groupid=0, jobs=1): err= 0: pid=736: Fri Sep 25 12:36:00 2020 read: IOPS=16.2k, BW=63.5MiB/s (66.6MB/s)(3070MiB/48366msec) bw ( KiB/s): min=63426, max=82776, per=100.00%, avg=65054.28, stdev=2014.88, samples=96 iops : min=15856, max=20694, avg=16263.34, stdev=503.74, samples=96 write: IOPS=5430, BW=21.2MiB/s (22.2MB/s)(1026MiB/48366msec); 0 zone resets bw ( KiB/s): min=20916, max=27632, per=100.00%, avg=21740.64, stdev=735.73, samples=96 iops : min= 5229, max= 6908, avg=5434.99, stdev=183.95, samples=96 cpu : usr=1.60%, sys=14.28%, ctx=1049348, majf=0, minf=7 IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : 
target=0, window=0, percentile=100.00%, depth=64 Run status group 0 (all jobs): READ: bw=63.5MiB/s (66.6MB/s), 63.5MiB/s-63.5MiB/s (66.6MB/s-66.6MB/s), io=3070MiB (3219MB), run=48366-48366msec WRITE: bw=21.2MiB/s (22.2MB/s), 21.2MiB/s-21.2MiB/s (22.2MB/s-22.2MB/s), io=1026MiB (1076MB), run=48366-48366msec So I'm still beating 9p; the thread-pool-size=1 seems to be great for read performance here. Dave > Can you apply "shared pool" patch to qemu for virtiofsd and re-run this > test and see if you see any better results. > > In my testing, with cache=none, virtiofs performed better than 9p in > all the fio jobs I was running. For the case of cache=auto for virtiofs > (with xattr enabled), 9p performed better in certain write workloads. I > have identified root cause of that problem and working on > HANDLE_KILLPRIV_V2 patches to improve WRITE performance of virtiofs > with cache=auto and xattr enabled. > > I will post my 9p and virtiofs comparison numbers next week. In the > mean time will be great if you could apply following qemu patch, rebuild > qemu and re-run above test. > > https://www.redhat.com/archives/virtio-fs/2020-September/msg00081.html > > Also what's the status of file cache on host in both the cases. Are > you booting host fresh for these tests so that cache is cold on host > or cache is warm? > > Thanks > Vivek -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 55+ messages in thread
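The exact virtiofsd and qemu invocations behind the "cache=none thread-pool-size=1" run are not shown in the thread. A hedged sketch of how such a run is typically wired up follows; the socket path, export directory, tag, and memory size are illustrative placeholders, not values taken from the thread.

```shell
# Sketch: command lines for a virtiofsd run with a single-threaded request
# pool. Paths and sizes are placeholders.
SOCK=/tmp/vhostqemu
SHARED_DIR=/srv/export

# Device side: one request-processing thread instead of the default 64.
VIRTIOFSD_CMD="virtiofsd --socket-path=$SOCK -o source=$SHARED_DIR -o cache=none --thread-pool-size=1"

# VMM side: vhost-user-fs requires guest RAM in a shared memory backend.
QEMU_CMD="qemu-system-x86_64 \
 -object memory-backend-memfd,id=mem,size=4G,share=on -numa node,memdev=mem \
 -chardev socket,id=char0,path=$SOCK \
 -device vhost-user-fs-pci,chardev=char0,tag=myfs"

echo "$VIRTIOFSD_CMD"
echo "$QEMU_CMD"
# In the guest: mount -t virtiofs myfs /mnt
```

The guest then mounts the export with `mount -t virtiofs myfs /mnt` and the fio job runs against `/mnt`.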
* Re: virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance) 2020-09-25 12:41 ` virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance) Dr. David Alan Gilbert @ 2020-09-25 13:04 ` Christian Schoenebeck 2020-09-25 13:05 ` Dr. David Alan Gilbert 2020-09-29 13:17 ` Vivek Goyal 1 sibling, 1 reply; 55+ messages in thread From: Christian Schoenebeck @ 2020-09-25 13:04 UTC (permalink / raw) To: qemu-devel Cc: Dr. David Alan Gilbert, Vivek Goyal, Venegas Munoz, Jose Carlos, cdupontd, virtio-fs-list, Stefan Hajnoczi, Shinde, Archana M On Freitag, 25. September 2020 14:41:39 CEST Dr. David Alan Gilbert wrote: > > Hi Carlos, > > > > So you are running following test. > > > > fio --direct=1 --gtod_reduce=1 --name=test > > --filename=random_read_write.fio --bs=4k --iodepth=64 --size=4G > > --readwrite=randrw --rwmixread=75 --output=/output/fio.txt > > > > And following are your results. > > > > 9p > > -- > > READ: bw=211MiB/s (222MB/s), 211MiB/s-211MiB/s (222MB/s-222MB/s), > > io=3070MiB (3219MB), run=14532-14532msec > > > > WRITE: bw=70.6MiB/s (74.0MB/s), 70.6MiB/s-70.6MiB/s (74.0MB/s-74.0MB/s), > > io=1026MiB (1076MB), run=14532-14532msec > > > > virtiofs > > -------- > > > > Run status group 0 (all jobs): > > READ: bw=159MiB/s (167MB/s), 159MiB/s-159MiB/s (167MB/s-167MB/s), > > io=3070MiB (3219MB), run=19321-19321msec> > > WRITE: bw=53.1MiB/s (55.7MB/s), 53.1MiB/s-53.1MiB/s (55.7MB/s-55.7MB/s), > > io=1026MiB (1076MB), run=19321-19321msec> > > So looks like you are getting better performance with 9p in this case. 
> > That's interesting, because I've just tried similar again with my > ramdisk setup: > > fio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio > --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75 > --output=aname.txt > > > virtiofs default options > test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) > 4096B-4096B, ioengine=psync, iodepth=64 fio-3.21 > Starting 1 process > test: Laying out IO file (1 file / 4096MiB) > > test: (groupid=0, jobs=1): err= 0: pid=773: Fri Sep 25 12:28:32 2020 > read: IOPS=18.3k, BW=71.3MiB/s (74.8MB/s)(3070MiB/43042msec) > bw ( KiB/s): min=70752, max=77280, per=100.00%, avg=73075.71, > stdev=1603.47, samples=85 iops : min=17688, max=19320, avg=18268.92, > stdev=400.86, samples=85 write: IOPS=6102, BW=23.8MiB/s > (24.0MB/s)(1026MiB/43042msec); 0 zone resets bw ( KiB/s): min=23128, > max=25696, per=100.00%, avg=24420.40, stdev=583.08, samples=85 iops > : min= 5782, max= 6424, avg=6105.09, stdev=145.76, samples=85 cpu > : usr=0.10%, sys=30.09%, ctx=1245312, majf=0, minf=6 IO depths : > 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : > 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : > 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued rwts: > total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0, > window=0, percentile=100.00%, depth=64 > > Run status group 0 (all jobs): > READ: bw=71.3MiB/s (74.8MB/s), 71.3MiB/s-71.3MiB/s (74.8MB/s-74.8MB/s), > io=3070MiB (3219MB), run=43042-43042msec WRITE: bw=23.8MiB/s (24.0MB/s), > 23.8MiB/s-23.8MiB/s (24.0MB/s-24.0MB/s), io=1026MiB (1076MB), > run=43042-43042msec > > virtiofs cache=none > test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) > 4096B-4096B, ioengine=psync, iodepth=64 fio-3.21 > Starting 1 process > > test: (groupid=0, jobs=1): err= 0: pid=740: Fri Sep 25 12:30:57 2020 > read: IOPS=22.9k, BW=89.6MiB/s (93.0MB/s)(3070MiB/34256msec) > bw ( 
KiB/s): min=89048, max=94240, per=100.00%, avg=91871.06, > stdev=967.87, samples=68 iops : min=22262, max=23560, avg=22967.76, > stdev=241.97, samples=68 write: IOPS=7667, BW=29.0MiB/s > (31.4MB/s)(1026MiB/34256msec); 0 zone resets bw ( KiB/s): min=29264, > max=32248, per=100.00%, avg=30700.82, stdev=541.97, samples=68 iops > : min= 7316, max= 8062, avg=7675.21, stdev=135.49, samples=68 cpu > : usr=1.03%, sys=27.64%, ctx=1048635, majf=0, minf=5 IO depths : > 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : > 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : > 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued rwts: > total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0, > window=0, percentile=100.00%, depth=64 > > Run status group 0 (all jobs): > READ: bw=89.6MiB/s (93.0MB/s), 89.6MiB/s-89.6MiB/s (93.0MB/s-93.0MB/s), > io=3070MiB (3219MB), run=34256-34256msec WRITE: bw=29.0MiB/s (31.4MB/s), > 29.0MiB/s-29.0MiB/s (31.4MB/s-31.4MB/s), io=1026MiB (1076MB), > run=34256-34256msec > > virtiofs cache=none thread-pool-size=1 > test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) > 4096B-4096B, ioengine=psync, iodepth=64 fio-3.21 > Starting 1 process > > test: (groupid=0, jobs=1): err= 0: pid=738: Fri Sep 25 12:33:17 2020 > read: IOPS=23.7k, BW=92.4MiB/s (96.9MB/s)(3070MiB/33215msec) > bw ( KiB/s): min=89808, max=111952, per=100.00%, avg=94762.30, > stdev=4507.43, samples=66 iops : min=22452, max=27988, avg=23690.58, > stdev=1126.86, samples=66 write: IOPS=7907, BW=30.9MiB/s > (32.4MB/s)(1026MiB/33215msec); 0 zone resets bw ( KiB/s): min=29424, > max=37112, per=100.00%, avg=31668.73, stdev=1558.69, samples=66 iops > : min= 7356, max= 9278, avg=7917.18, stdev=389.67, samples=66 cpu > : usr=0.43%, sys=29.07%, ctx=1048627, majf=0, minf=7 IO depths : > 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : > 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>=64=0.0% complete : > 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued rwts: > total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0, > window=0, percentile=100.00%, depth=64 > > Run status group 0 (all jobs): > READ: bw=92.4MiB/s (96.9MB/s), 92.4MiB/s-92.4MiB/s (96.9MB/s-96.9MB/s), > io=3070MiB (3219MB), run=33215-33215msec WRITE: bw=30.9MiB/s (32.4MB/s), > 30.9MiB/s-30.9MiB/s (32.4MB/s-32.4MB/s), io=1026MiB (1076MB), > run=33215-33215msec > > 9p ( mount -t 9p -o trans=virtio kernel /mnt > -oversion=9p2000.L,cache=mmap,msize=1048576 ) test: (g=0): rw=randrw, Bottleneck ------------------------------^ By increasing 'msize' you would encounter better 9P I/O results. > bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, > iodepth=64 fio-3.21 > Starting 1 process > > test: (groupid=0, jobs=1): err= 0: pid=736: Fri Sep 25 12:36:00 2020 > read: IOPS=16.2k, BW=63.5MiB/s (66.6MB/s)(3070MiB/48366msec) > bw ( KiB/s): min=63426, max=82776, per=100.00%, avg=65054.28, > stdev=2014.88, samples=96 iops : min=15856, max=20694, avg=16263.34, > stdev=503.74, samples=96 write: IOPS=5430, BW=21.2MiB/s > (22.2MB/s)(1026MiB/48366msec); 0 zone resets bw ( KiB/s): min=20916, > max=27632, per=100.00%, avg=21740.64, stdev=735.73, samples=96 iops > : min= 5229, max= 6908, avg=5434.99, stdev=183.95, samples=96 cpu > : usr=1.60%, sys=14.28%, ctx=1049348, majf=0, minf=7 IO depths : > 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : > 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : > 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued rwts: > total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0, > window=0, percentile=100.00%, depth=64 > > Run status group 0 (all jobs): > READ: bw=63.5MiB/s (66.6MB/s), 63.5MiB/s-63.5MiB/s (66.6MB/s-66.6MB/s), > io=3070MiB (3219MB), run=48366-48366msec WRITE: bw=21.2MiB/s (22.2MB/s), > 21.2MiB/s-21.2MiB/s 
(22.2MB/s-22.2MB/s), io=1026MiB (1076MB), > run=48366-48366msec > > So I'm still beating 9p; the thread-pool-size=1 seems to be great for > read performance here. > > Dave Best regards, Christian Schoenebeck ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance) 2020-09-25 13:04 ` Christian Schoenebeck @ 2020-09-25 13:05 ` Dr. David Alan Gilbert 2020-09-25 16:05 ` Christian Schoenebeck 0 siblings, 1 reply; 55+ messages in thread From: Dr. David Alan Gilbert @ 2020-09-25 13:05 UTC (permalink / raw) To: Christian Schoenebeck Cc: Venegas Munoz, Jose Carlos, qemu-devel, cdupontd, virtio-fs-list, Stefan Hajnoczi, Shinde, Archana M, Vivek Goyal * Christian Schoenebeck (qemu_oss@crudebyte.com) wrote: > On Freitag, 25. September 2020 14:41:39 CEST Dr. David Alan Gilbert wrote: > > > Hi Carlos, > > > > > > So you are running following test. > > > > > > fio --direct=1 --gtod_reduce=1 --name=test > > > --filename=random_read_write.fio --bs=4k --iodepth=64 --size=4G > > > --readwrite=randrw --rwmixread=75 --output=/output/fio.txt > > > > > > And following are your results. > > > > > > 9p > > > -- > > > READ: bw=211MiB/s (222MB/s), 211MiB/s-211MiB/s (222MB/s-222MB/s), > > > io=3070MiB (3219MB), run=14532-14532msec > > > > > > WRITE: bw=70.6MiB/s (74.0MB/s), 70.6MiB/s-70.6MiB/s (74.0MB/s-74.0MB/s), > > > io=1026MiB (1076MB), run=14532-14532msec > > > > > > virtiofs > > > -------- > > > > > > Run status group 0 (all jobs): > > > READ: bw=159MiB/s (167MB/s), 159MiB/s-159MiB/s (167MB/s-167MB/s), > > > io=3070MiB (3219MB), run=19321-19321msec> > > > WRITE: bw=53.1MiB/s (55.7MB/s), 53.1MiB/s-53.1MiB/s (55.7MB/s-55.7MB/s), > > > io=1026MiB (1076MB), run=19321-19321msec> > > > So looks like you are getting better performance with 9p in this case. 
> > > > That's interesting, because I've just tried similar again with my > > ramdisk setup: > > > > fio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio > > --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75 > > --output=aname.txt > > > > > > virtiofs default options > > test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) > > 4096B-4096B, ioengine=psync, iodepth=64 fio-3.21 > > Starting 1 process > > test: Laying out IO file (1 file / 4096MiB) > > > > test: (groupid=0, jobs=1): err= 0: pid=773: Fri Sep 25 12:28:32 2020 > > read: IOPS=18.3k, BW=71.3MiB/s (74.8MB/s)(3070MiB/43042msec) > > bw ( KiB/s): min=70752, max=77280, per=100.00%, avg=73075.71, > > stdev=1603.47, samples=85 iops : min=17688, max=19320, avg=18268.92, > > stdev=400.86, samples=85 write: IOPS=6102, BW=23.8MiB/s > > (24.0MB/s)(1026MiB/43042msec); 0 zone resets bw ( KiB/s): min=23128, > > max=25696, per=100.00%, avg=24420.40, stdev=583.08, samples=85 iops > > : min= 5782, max= 6424, avg=6105.09, stdev=145.76, samples=85 cpu > > : usr=0.10%, sys=30.09%, ctx=1245312, majf=0, minf=6 IO depths : > > 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : > > 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : > > 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued rwts: > > total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0, > > window=0, percentile=100.00%, depth=64 > > > > Run status group 0 (all jobs): > > READ: bw=71.3MiB/s (74.8MB/s), 71.3MiB/s-71.3MiB/s (74.8MB/s-74.8MB/s), > > io=3070MiB (3219MB), run=43042-43042msec WRITE: bw=23.8MiB/s (24.0MB/s), > > 23.8MiB/s-23.8MiB/s (24.0MB/s-24.0MB/s), io=1026MiB (1076MB), > > run=43042-43042msec > > > > virtiofs cache=none > > test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) > > 4096B-4096B, ioengine=psync, iodepth=64 fio-3.21 > > Starting 1 process > > > > test: (groupid=0, jobs=1): err= 0: pid=740: Fri Sep 25 
12:30:57 2020 > > read: IOPS=22.9k, BW=89.6MiB/s (93.0MB/s)(3070MiB/34256msec) > > bw ( KiB/s): min=89048, max=94240, per=100.00%, avg=91871.06, > > stdev=967.87, samples=68 iops : min=22262, max=23560, avg=22967.76, > > stdev=241.97, samples=68 write: IOPS=7667, BW=29.0MiB/s > > (31.4MB/s)(1026MiB/34256msec); 0 zone resets bw ( KiB/s): min=29264, > > max=32248, per=100.00%, avg=30700.82, stdev=541.97, samples=68 iops > > : min= 7316, max= 8062, avg=7675.21, stdev=135.49, samples=68 cpu > > : usr=1.03%, sys=27.64%, ctx=1048635, majf=0, minf=5 IO depths : > > 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : > > 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : > > 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued rwts: > > total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0, > > window=0, percentile=100.00%, depth=64 > > > > Run status group 0 (all jobs): > > READ: bw=89.6MiB/s (93.0MB/s), 89.6MiB/s-89.6MiB/s (93.0MB/s-93.0MB/s), > > io=3070MiB (3219MB), run=34256-34256msec WRITE: bw=29.0MiB/s (31.4MB/s), > > 29.0MiB/s-29.0MiB/s (31.4MB/s-31.4MB/s), io=1026MiB (1076MB), > > run=34256-34256msec > > > > virtiofs cache=none thread-pool-size=1 > > test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) > > 4096B-4096B, ioengine=psync, iodepth=64 fio-3.21 > > Starting 1 process > > > > test: (groupid=0, jobs=1): err= 0: pid=738: Fri Sep 25 12:33:17 2020 > > read: IOPS=23.7k, BW=92.4MiB/s (96.9MB/s)(3070MiB/33215msec) > > bw ( KiB/s): min=89808, max=111952, per=100.00%, avg=94762.30, > > stdev=4507.43, samples=66 iops : min=22452, max=27988, avg=23690.58, > > stdev=1126.86, samples=66 write: IOPS=7907, BW=30.9MiB/s > > (32.4MB/s)(1026MiB/33215msec); 0 zone resets bw ( KiB/s): min=29424, > > max=37112, per=100.00%, avg=31668.73, stdev=1558.69, samples=66 iops > > : min= 7356, max= 9278, avg=7917.18, stdev=389.67, samples=66 cpu > > : usr=0.43%, sys=29.07%, ctx=1048627, majf=0, 
minf=7 IO depths : > > 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : > > 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : > > 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued rwts: > > total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0, > > window=0, percentile=100.00%, depth=64 > > > > Run status group 0 (all jobs): > > READ: bw=92.4MiB/s (96.9MB/s), 92.4MiB/s-92.4MiB/s (96.9MB/s-96.9MB/s), > > io=3070MiB (3219MB), run=33215-33215msec WRITE: bw=30.9MiB/s (32.4MB/s), > > 30.9MiB/s-30.9MiB/s (32.4MB/s-32.4MB/s), io=1026MiB (1076MB), > > run=33215-33215msec > > > > 9p ( mount -t 9p -o trans=virtio kernel /mnt > > -oversion=9p2000.L,cache=mmap,msize=1048576 ) test: (g=0): rw=randrw, > Bottleneck ------------------------------^ > > By increasing 'msize' you would encounter better 9P I/O results. OK, I thought that was bigger than the default; what number should I use? Dave > > bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, > > iodepth=64 fio-3.21 > > Starting 1 process > > > > test: (groupid=0, jobs=1): err= 0: pid=736: Fri Sep 25 12:36:00 2020 > > read: IOPS=16.2k, BW=63.5MiB/s (66.6MB/s)(3070MiB/48366msec) > > bw ( KiB/s): min=63426, max=82776, per=100.00%, avg=65054.28, > > stdev=2014.88, samples=96 iops : min=15856, max=20694, avg=16263.34, > > stdev=503.74, samples=96 write: IOPS=5430, BW=21.2MiB/s > > (22.2MB/s)(1026MiB/48366msec); 0 zone resets bw ( KiB/s): min=20916, > > max=27632, per=100.00%, avg=21740.64, stdev=735.73, samples=96 iops > > : min= 5229, max= 6908, avg=5434.99, stdev=183.95, samples=96 cpu > > : usr=1.60%, sys=14.28%, ctx=1049348, majf=0, minf=7 IO depths : > > 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : > > 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : > > 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued rwts: > > total=785920,262656,0,0 short=0,0,0,0 
dropped=0,0,0,0 latency : target=0, > > window=0, percentile=100.00%, depth=64 > > > > Run status group 0 (all jobs): > > READ: bw=63.5MiB/s (66.6MB/s), 63.5MiB/s-63.5MiB/s (66.6MB/s-66.6MB/s), > > io=3070MiB (3219MB), run=48366-48366msec WRITE: bw=21.2MiB/s (22.2MB/s), > > 21.2MiB/s-21.2MiB/s (22.2MB/s-22.2MB/s), io=1026MiB (1076MB), > > run=48366-48366msec > > > > So I'm still beating 9p; the thread-pool-size=1 seems to be great for > > read performance here. > > > > Dave > > Best regards, > Christian Schoenebeck > > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance) 2020-09-25 13:05 ` Dr. David Alan Gilbert @ 2020-09-25 16:05 ` Christian Schoenebeck 2020-09-25 16:33 ` Christian Schoenebeck 2020-09-25 18:51 ` Dr. David Alan Gilbert 0 siblings, 2 replies; 55+ messages in thread From: Christian Schoenebeck @ 2020-09-25 16:05 UTC (permalink / raw) To: qemu-devel Cc: Dr. David Alan Gilbert, Venegas Munoz, Jose Carlos, cdupontd, virtio-fs-list, Stefan Hajnoczi, Shinde, Archana M, Vivek Goyal On Freitag, 25. September 2020 15:05:38 CEST Dr. David Alan Gilbert wrote: > > > 9p ( mount -t 9p -o trans=virtio kernel /mnt > > > -oversion=9p2000.L,cache=mmap,msize=1048576 ) test: (g=0): rw=randrw, > > > > Bottleneck ------------------------------^ > > > > By increasing 'msize' you would encounter better 9P I/O results. > > OK, I thought that was bigger than the default; what number should I > use? It depends on the underlying storage hardware. In other words: you have to try increasing the 'msize' value to a point where you no longer notice a negative performance impact (or almost). Which is fortunately quite easy to test on guest like: dd if=/dev/zero of=test.dat bs=1G count=12 time cat test.dat > /dev/null I would start with an absolute minimum msize of 10MB. I would recommend something around 100MB maybe for a mechanical hard drive. With a PCIe flash you probably would rather pick several hundred MB or even more. That unpleasant 'msize' issue is a limitation of the 9p protocol: client (guest) must suggest the value of msize on connection to server (host). Server can only lower, but not raise it. And the client in turn obviously cannot see host's storage device(s), so client is unable to pick a good value by itself. So it's a suboptimal handshake issue right now. Many users don't even know this 'msize' parameter exists and hence run with the Linux kernel's default value of just 8kB. 
For QEMU 5.2 I addressed this by logging a performance warning on host side for making users at least aware about this issue. The long-term plan is to pass a good msize value from host to guest via virtio (like it's already done for the available export tags) and the Linux kernel would default to that instead. Best regards, Christian Schoenebeck ^ permalink raw reply [flat|nested] 55+ messages in thread
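Christian's tuning procedure above amounts to sweeping msize and timing a large sequential read at each value. A sketch of that sweep follows; the candidate values, export path, and file size are illustrative (the mount options mirror the ones used earlier in the thread):

```shell
# Candidate msize values to sweep, per the suggestion above: start around
# 10MB and go up until throughput stops improving. Values are examples only.
CANDIDATES="10485760 104857600 524288000"

# One 9p mount command per candidate (run in the guest, remounting each time).
MOUNT_CMDS=$(for MSIZE in $CANDIDATES; do
    echo "mount -t 9p -o trans=virtio,version=9p2000.L,cache=mmap,msize=$MSIZE kernel /mnt"
done)
echo "$MOUNT_CMDS"

# Host side: create the test file, so the guest page cache stays cold.
echo "dd if=/dev/zero of=/srv/export/test.dat bs=1G count=12"
# Guest side, after each remount: time a sequential read through 9p.
echo "time cat /mnt/test.dat > /dev/null"
```

Per the correction later in the thread, the `dd` runs on the host and the `cat` in the guest, so the timing reflects data actually crossing the 9p transport rather than a warm cache.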
* Re: virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance) 2020-09-25 16:05 ` Christian Schoenebeck @ 2020-09-25 16:33 ` Christian Schoenebeck 2020-09-25 18:51 ` Dr. David Alan Gilbert 1 sibling, 0 replies; 55+ messages in thread From: Christian Schoenebeck @ 2020-09-25 16:33 UTC (permalink / raw) To: qemu-devel Cc: Dr. David Alan Gilbert, Venegas Munoz, Jose Carlos, cdupontd, virtio-fs-list, Stefan Hajnoczi, Shinde, Archana M, Vivek Goyal On Freitag, 25. September 2020 18:05:17 CEST Christian Schoenebeck wrote: > On Freitag, 25. September 2020 15:05:38 CEST Dr. David Alan Gilbert wrote: > > > > 9p ( mount -t 9p -o trans=virtio kernel /mnt > > > > -oversion=9p2000.L,cache=mmap,msize=1048576 ) test: (g=0): rw=randrw, > > > > > > Bottleneck ------------------------------^ > > > > > > By increasing 'msize' you would encounter better 9P I/O results. > > > > OK, I thought that was bigger than the default; what number should I > > use? > > It depends on the underlying storage hardware. In other words: you have to > try increasing the 'msize' value to a point where you no longer notice a > negative performance impact (or almost). Which is fortunately quite easy to > test on guest like: > > dd if=/dev/zero of=test.dat bs=1G count=12 > time cat test.dat > /dev/null I forgot: you should execute that 'dd' command on host side, and the 'cat' command on guest side, to avoid any caching making the benchmark result look better than it actually is. Because for finding a good 'msize' value you only care about actual 9p data really being transmitted between host and guest. Best regards, Christian Schoenebeck ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance) 2020-09-25 16:05 ` Christian Schoenebeck 2020-09-25 16:33 ` Christian Schoenebeck @ 2020-09-25 18:51 ` Dr. David Alan Gilbert 2020-09-27 12:14 ` Christian Schoenebeck 1 sibling, 1 reply; 55+ messages in thread From: Dr. David Alan Gilbert @ 2020-09-25 18:51 UTC (permalink / raw) To: Christian Schoenebeck Cc: Venegas Munoz, Jose Carlos, qemu-devel, cdupontd, virtio-fs-list, Stefan Hajnoczi, Shinde, Archana M, Vivek Goyal * Christian Schoenebeck (qemu_oss@crudebyte.com) wrote: > On Freitag, 25. September 2020 15:05:38 CEST Dr. David Alan Gilbert wrote: > > > > 9p ( mount -t 9p -o trans=virtio kernel /mnt > > > > -oversion=9p2000.L,cache=mmap,msize=1048576 ) test: (g=0): rw=randrw, > > > > > > Bottleneck ------------------------------^ > > > > > > By increasing 'msize' you would encounter better 9P I/O results. > > > > OK, I thought that was bigger than the default; what number should I > > use? > > It depends on the underlying storage hardware. In other words: you have to try > increasing the 'msize' value to a point where you no longer notice a negative > performance impact (or almost). Which is fortunately quite easy to test on > guest like: > > dd if=/dev/zero of=test.dat bs=1G count=12 > time cat test.dat > /dev/null > > I would start with an absolute minimum msize of 10MB. I would recommend > something around 100MB maybe for a mechanical hard drive. With a PCIe flash > you probably would rather pick several hundred MB or even more. > > That unpleasant 'msize' issue is a limitation of the 9p protocol: client > (guest) must suggest the value of msize on connection to server (host). Server > can only lower, but not raise it. And the client in turn obviously cannot see > host's storage device(s), so client is unable to pick a good value by itself. > So it's a suboptimal handshake issue right now. 
It doesn't seem to be making a vast difference here: 9p mount -t 9p -o trans=virtio kernel /mnt -oversion=9p2000.L,cache=mmap,msize=104857600 Run status group 0 (all jobs): READ: bw=62.5MiB/s (65.6MB/s), 62.5MiB/s-62.5MiB/s (65.6MB/s-65.6MB/s), io=3070MiB (3219MB), run=49099-49099msec WRITE: bw=20.9MiB/s (21.9MB/s), 20.9MiB/s-20.9MiB/s (21.9MB/s-21.9MB/s), io=1026MiB (1076MB), run=49099-49099msec 9p mount -t 9p -o trans=virtio kernel /mnt -oversion=9p2000.L,cache=mmap,msize=1048576000 Run status group 0 (all jobs): READ: bw=65.2MiB/s (68.3MB/s), 65.2MiB/s-65.2MiB/s (68.3MB/s-68.3MB/s), io=3070MiB (3219MB), run=47104-47104msec WRITE: bw=21.8MiB/s (22.8MB/s), 21.8MiB/s-21.8MiB/s (22.8MB/s-22.8MB/s), io=1026MiB (1076MB), run=47104-47104msec Dave > Many users don't even know this 'msize' parameter exists and hence run with > the Linux kernel's default value of just 8kB. For QEMU 5.2 I addressed this by > logging a performance warning on host side for making users at least aware > about this issue. The long-term plan is to pass a good msize value from host > to guest via virtio (like it's already done for the available export tags) and > the Linux kernel would default to that instead. > > Best regards, > Christian Schoenebeck > > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance) 2020-09-25 18:51 ` Dr. David Alan Gilbert @ 2020-09-27 12:14 ` Christian Schoenebeck 2020-09-29 13:03 ` Vivek Goyal 0 siblings, 1 reply; 55+ messages in thread From: Christian Schoenebeck @ 2020-09-27 12:14 UTC (permalink / raw) To: qemu-devel Cc: Dr. David Alan Gilbert, Venegas Munoz, Jose Carlos, cdupontd, virtio-fs-list, Stefan Hajnoczi, Shinde, Archana M, Vivek Goyal On Freitag, 25. September 2020 20:51:47 CEST Dr. David Alan Gilbert wrote: > * Christian Schoenebeck (qemu_oss@crudebyte.com) wrote: > > On Freitag, 25. September 2020 15:05:38 CEST Dr. David Alan Gilbert wrote: > > > > > 9p ( mount -t 9p -o trans=virtio kernel /mnt > > > > > -oversion=9p2000.L,cache=mmap,msize=1048576 ) test: (g=0): > > > > > rw=randrw, > > > > > > > > Bottleneck ------------------------------^ > > > > > > > > By increasing 'msize' you would encounter better 9P I/O results. > > > > > > OK, I thought that was bigger than the default; what number should I > > > use? > > > > It depends on the underlying storage hardware. In other words: you have to > > try increasing the 'msize' value to a point where you no longer notice a > > negative performance impact (or almost). Which is fortunately quite easy > > to test on> > > guest like: > > dd if=/dev/zero of=test.dat bs=1G count=12 > > time cat test.dat > /dev/null > > > > I would start with an absolute minimum msize of 10MB. I would recommend > > something around 100MB maybe for a mechanical hard drive. With a PCIe > > flash > > you probably would rather pick several hundred MB or even more. > > > > That unpleasant 'msize' issue is a limitation of the 9p protocol: client > > (guest) must suggest the value of msize on connection to server (host). > > Server can only lower, but not raise it. And the client in turn obviously > > cannot see host's storage device(s), so client is unable to pick a good > > value by itself. 
So it's a suboptimal handshake issue right now. > > It doesn't seem to be making a vast difference here: > > > > 9p mount -t 9p -o trans=virtio kernel /mnt > -oversion=9p2000.L,cache=mmap,msize=104857600 > > Run status group 0 (all jobs): > READ: bw=62.5MiB/s (65.6MB/s), 62.5MiB/s-62.5MiB/s (65.6MB/s-65.6MB/s), > io=3070MiB (3219MB), run=49099-49099msec WRITE: bw=20.9MiB/s (21.9MB/s), > 20.9MiB/s-20.9MiB/s (21.9MB/s-21.9MB/s), io=1026MiB (1076MB), > run=49099-49099msec > > 9p mount -t 9p -o trans=virtio kernel /mnt > -oversion=9p2000.L,cache=mmap,msize=1048576000 > > Run status group 0 (all jobs): > READ: bw=65.2MiB/s (68.3MB/s), 65.2MiB/s-65.2MiB/s (68.3MB/s-68.3MB/s), > io=3070MiB (3219MB), run=47104-47104msec WRITE: bw=21.8MiB/s (22.8MB/s), > 21.8MiB/s-21.8MiB/s (22.8MB/s-22.8MB/s), io=1026MiB (1076MB), > run=47104-47104msec > > > Dave Is that benchmark tool honoring 'iounit' to automatically run with max. I/O chunk sizes? What's that benchmark tool actually? And do you also see no improvement with a simple time cat largefile.dat > /dev/null ? Best regards, Christian Schoenebeck ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance) 2020-09-27 12:14 ` Christian Schoenebeck @ 2020-09-29 13:03 ` Vivek Goyal 2020-09-29 13:28 ` Christian Schoenebeck 0 siblings, 1 reply; 55+ messages in thread From: Vivek Goyal @ 2020-09-29 13:03 UTC (permalink / raw) To: Christian Schoenebeck Cc: Venegas Munoz, Jose Carlos, cdupontd, qemu-devel, virtio-fs-list, Stefan Hajnoczi, Shinde, Archana M, Dr. David Alan Gilbert On Sun, Sep 27, 2020 at 02:14:43PM +0200, Christian Schoenebeck wrote: > On Freitag, 25. September 2020 20:51:47 CEST Dr. David Alan Gilbert wrote: > > * Christian Schoenebeck (qemu_oss@crudebyte.com) wrote: > > > On Freitag, 25. September 2020 15:05:38 CEST Dr. David Alan Gilbert wrote: > > > > > > 9p ( mount -t 9p -o trans=virtio kernel /mnt > > > > > > -oversion=9p2000.L,cache=mmap,msize=1048576 ) test: (g=0): > > > > > > rw=randrw, > > > > > > > > > > Bottleneck ------------------------------^ > > > > > > > > > > By increasing 'msize' you would encounter better 9P I/O results. > > > > > > > > OK, I thought that was bigger than the default; what number should I > > > > use? > > > > > > It depends on the underlying storage hardware. In other words: you have to > > > try increasing the 'msize' value to a point where you no longer notice a > > > negative performance impact (or almost). Which is fortunately quite easy > > > to test on> > > > guest like: > > > dd if=/dev/zero of=test.dat bs=1G count=12 > > > time cat test.dat > /dev/null > > > > > > I would start with an absolute minimum msize of 10MB. I would recommend > > > something around 100MB maybe for a mechanical hard drive. With a PCIe > > > flash > > > you probably would rather pick several hundred MB or even more. > > > > > > That unpleasant 'msize' issue is a limitation of the 9p protocol: client > > > (guest) must suggest the value of msize on connection to server (host). > > > Server can only lower, but not raise it. 
And the client in turn obviously > > > cannot see host's storage device(s), so client is unable to pick a good > > > value by itself. So it's a suboptimal handshake issue right now. > > > > It doesn't seem to be making a vast difference here: > > > > > > > > 9p mount -t 9p -o trans=virtio kernel /mnt > > -oversion=9p2000.L,cache=mmap,msize=104857600 > > > > Run status group 0 (all jobs): > > READ: bw=62.5MiB/s (65.6MB/s), 62.5MiB/s-62.5MiB/s (65.6MB/s-65.6MB/s), > > io=3070MiB (3219MB), run=49099-49099msec WRITE: bw=20.9MiB/s (21.9MB/s), > > 20.9MiB/s-20.9MiB/s (21.9MB/s-21.9MB/s), io=1026MiB (1076MB), > > run=49099-49099msec > > > > 9p mount -t 9p -o trans=virtio kernel /mnt > > -oversion=9p2000.L,cache=mmap,msize=1048576000 > > > > Run status group 0 (all jobs): > > READ: bw=65.2MiB/s (68.3MB/s), 65.2MiB/s-65.2MiB/s (68.3MB/s-68.3MB/s), > > io=3070MiB (3219MB), run=47104-47104msec WRITE: bw=21.8MiB/s (22.8MB/s), > > 21.8MiB/s-21.8MiB/s (22.8MB/s-22.8MB/s), io=1026MiB (1076MB), > > run=47104-47104msec > > > > > > Dave > > Is that benchmark tool honoring 'iounit' to automatically run with max. I/O > chunk sizes? What's that benchmark tool actually? And do you also see no > improvement with a simple > > time cat largefile.dat > /dev/null I am assuming that msize only helps with sequential I/O and not random I/O. Dave is running random read and random write mix and probably that's why he is not seeing any improvement with msize increase. If we run sequential workload (as "cat largefile.dat"), that should see an improvement with msize increase. Thanks Vivek ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance) 2020-09-29 13:03 ` Vivek Goyal @ 2020-09-29 13:28 ` Christian Schoenebeck 2020-09-29 13:49 ` Vivek Goyal 0 siblings, 1 reply; 55+ messages in thread From: Christian Schoenebeck @ 2020-09-29 13:28 UTC (permalink / raw) To: qemu-devel Cc: Vivek Goyal, Venegas Munoz, Jose Carlos, cdupontd, virtio-fs-list, Stefan Hajnoczi, Shinde, Archana M, Dr. David Alan Gilbert On Dienstag, 29. September 2020 15:03:25 CEST Vivek Goyal wrote: > On Sun, Sep 27, 2020 at 02:14:43PM +0200, Christian Schoenebeck wrote: > > On Freitag, 25. September 2020 20:51:47 CEST Dr. David Alan Gilbert wrote: > > > * Christian Schoenebeck (qemu_oss@crudebyte.com) wrote: > > > > On Freitag, 25. September 2020 15:05:38 CEST Dr. David Alan Gilbert wrote: > > > > > > > 9p ( mount -t 9p -o trans=virtio kernel /mnt > > > > > > > -oversion=9p2000.L,cache=mmap,msize=1048576 ) test: (g=0): > > > > > > > rw=randrw, > > > > > > > > > > > > Bottleneck ------------------------------^ > > > > > > > > > > > > By increasing 'msize' you would encounter better 9P I/O results. > > > > > > > > > > OK, I thought that was bigger than the default; what number should > > > > > I > > > > > use? > > > > > > > > It depends on the underlying storage hardware. In other words: you > > > > have to > > > > try increasing the 'msize' value to a point where you no longer notice > > > > a > > > > negative performance impact (or almost). Which is fortunately quite > > > > easy > > > > to test on> > > > > > > > > guest like: > > > > dd if=/dev/zero of=test.dat bs=1G count=12 > > > > time cat test.dat > /dev/null > > > > > > > > I would start with an absolute minimum msize of 10MB. I would > > > > recommend > > > > something around 100MB maybe for a mechanical hard drive. With a PCIe > > > > flash > > > > you probably would rather pick several hundred MB or even more. 
> > > > > > > > That unpleasant 'msize' issue is a limitation of the 9p protocol: > > > > client > > > > (guest) must suggest the value of msize on connection to server > > > > (host). > > > > Server can only lower, but not raise it. And the client in turn > > > > obviously > > > > cannot see host's storage device(s), so client is unable to pick a > > > > good > > > > value by itself. So it's a suboptimal handshake issue right now. > > > > > > It doesn't seem to be making a vast difference here: > > > > > > > > > > > > 9p mount -t 9p -o trans=virtio kernel /mnt > > > -oversion=9p2000.L,cache=mmap,msize=104857600 > > > > > > Run status group 0 (all jobs): > > > READ: bw=62.5MiB/s (65.6MB/s), 62.5MiB/s-62.5MiB/s > > > (65.6MB/s-65.6MB/s), > > > > > > io=3070MiB (3219MB), run=49099-49099msec WRITE: bw=20.9MiB/s (21.9MB/s), > > > 20.9MiB/s-20.9MiB/s (21.9MB/s-21.9MB/s), io=1026MiB (1076MB), > > > run=49099-49099msec > > > > > > 9p mount -t 9p -o trans=virtio kernel /mnt > > > -oversion=9p2000.L,cache=mmap,msize=1048576000 > > > > > > Run status group 0 (all jobs): > > > READ: bw=65.2MiB/s (68.3MB/s), 65.2MiB/s-65.2MiB/s > > > (68.3MB/s-68.3MB/s), > > > > > > io=3070MiB (3219MB), run=47104-47104msec WRITE: bw=21.8MiB/s (22.8MB/s), > > > 21.8MiB/s-21.8MiB/s (22.8MB/s-22.8MB/s), io=1026MiB (1076MB), > > > run=47104-47104msec > > > > > > > > > Dave > > > > Is that benchmark tool honoring 'iounit' to automatically run with max. > > I/O > > chunk sizes? What's that benchmark tool actually? And do you also see no > > improvement with a simple > > > > time cat largefile.dat > /dev/null > > I am assuming that msize only helps with sequential I/O and not random > I/O. > > Dave is running random read and random write mix and probably that's why > he is not seeing any improvement with msize increase. > > If we run sequential workload (as "cat largefile.dat"), that should > see an improvement with msize increase. > > Thanks > Vivek Depends on what's randomized. 
If read chunk size is randomized, then yes, you would probably see less performance increase compared to a simple 'cat foo.dat'. If only the read position is randomized, but the read chunk size honors iounit, a.k.a. stat's st_blksize (i.e. reading with the most efficient block size advertised by 9P), then I would assume still seeing a performance increase. Because seeking is a no/low cost factor in this case. The guest OS seeking does not transmit a 9p message. The offset is rather passed with any Tread message instead: https://github.com/chaos/diod/blob/master/protocol.md I mean, yes, random seeks reduce I/O performance in general of course, but in direct performance comparison, the difference in overhead of the 9p vs. virtiofs network controller layer is most probably the most relevant aspect if large I/O chunk sizes are used. But OTOH: I haven't optimized anything in Tread handling in 9p (yet). Best regards, Christian Schoenebeck ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance) 2020-09-29 13:28 ` Christian Schoenebeck @ 2020-09-29 13:49 ` Vivek Goyal 2020-09-29 13:59 ` Christian Schoenebeck 0 siblings, 1 reply; 55+ messages in thread From: Vivek Goyal @ 2020-09-29 13:49 UTC (permalink / raw) To: Christian Schoenebeck Cc: Venegas Munoz, Jose Carlos, qemu-devel, cdupontd, virtio-fs-list, Stefan Hajnoczi, Shinde, Archana M, Dr. David Alan Gilbert On Tue, Sep 29, 2020 at 03:28:06PM +0200, Christian Schoenebeck wrote: > On Dienstag, 29. September 2020 15:03:25 CEST Vivek Goyal wrote: > > On Sun, Sep 27, 2020 at 02:14:43PM +0200, Christian Schoenebeck wrote: > > > On Freitag, 25. September 2020 20:51:47 CEST Dr. David Alan Gilbert wrote: > > > > * Christian Schoenebeck (qemu_oss@crudebyte.com) wrote: > > > > > On Freitag, 25. September 2020 15:05:38 CEST Dr. David Alan Gilbert > wrote: > > > > > > > > 9p ( mount -t 9p -o trans=virtio kernel /mnt > > > > > > > > -oversion=9p2000.L,cache=mmap,msize=1048576 ) test: (g=0): > > > > > > > > rw=randrw, > > > > > > > > > > > > > > Bottleneck ------------------------------^ > > > > > > > > > > > > > > By increasing 'msize' you would encounter better 9P I/O results. > > > > > > > > > > > > OK, I thought that was bigger than the default; what number should > > > > > > I > > > > > > use? > > > > > > > > > > It depends on the underlying storage hardware. In other words: you > > > > > have to > > > > > try increasing the 'msize' value to a point where you no longer notice > > > > > a > > > > > negative performance impact (or almost). Which is fortunately quite > > > > > easy > > > > > to test on> > > > > > > > > > > guest like: > > > > > dd if=/dev/zero of=test.dat bs=1G count=12 > > > > > time cat test.dat > /dev/null > > > > > > > > > > I would start with an absolute minimum msize of 10MB. I would > > > > > recommend > > > > > something around 100MB maybe for a mechanical hard drive. 
With a PCIe > > > > > flash > > > > > you probably would rather pick several hundred MB or even more. > > > > > > > > > > That unpleasant 'msize' issue is a limitation of the 9p protocol: > > > > > client > > > > > (guest) must suggest the value of msize on connection to server > > > > > (host). > > > > > Server can only lower, but not raise it. And the client in turn > > > > > obviously > > > > > cannot see host's storage device(s), so client is unable to pick a > > > > > good > > > > > value by itself. So it's a suboptimal handshake issue right now. > > > > > > > > It doesn't seem to be making a vast difference here: > > > > > > > > > > > > > > > > 9p mount -t 9p -o trans=virtio kernel /mnt > > > > -oversion=9p2000.L,cache=mmap,msize=104857600 > > > > > > > > Run status group 0 (all jobs): > > > > READ: bw=62.5MiB/s (65.6MB/s), 62.5MiB/s-62.5MiB/s > > > > (65.6MB/s-65.6MB/s), > > > > > > > > io=3070MiB (3219MB), run=49099-49099msec WRITE: bw=20.9MiB/s (21.9MB/s), > > > > 20.9MiB/s-20.9MiB/s (21.9MB/s-21.9MB/s), io=1026MiB (1076MB), > > > > run=49099-49099msec > > > > > > > > 9p mount -t 9p -o trans=virtio kernel /mnt > > > > -oversion=9p2000.L,cache=mmap,msize=1048576000 > > > > > > > > Run status group 0 (all jobs): > > > > READ: bw=65.2MiB/s (68.3MB/s), 65.2MiB/s-65.2MiB/s > > > > (68.3MB/s-68.3MB/s), > > > > > > > > io=3070MiB (3219MB), run=47104-47104msec WRITE: bw=21.8MiB/s (22.8MB/s), > > > > 21.8MiB/s-21.8MiB/s (22.8MB/s-22.8MB/s), io=1026MiB (1076MB), > > > > run=47104-47104msec > > > > > > > > > > > > Dave > > > > > > Is that benchmark tool honoring 'iounit' to automatically run with max. > > > I/O > > > chunk sizes? What's that benchmark tool actually? And do you also see no > > > improvement with a simple > > > > > > time cat largefile.dat > /dev/null > > > > I am assuming that msize only helps with sequential I/O and not random > > I/O. 
> > > > Dave is running random read and random write mix and probably that's why > > he is not seeing any improvement with msize increase. > > > > If we run sequential workload (as "cat largefile.dat"), that should > > see an improvement with msize increase. > > > > Thanks > > Vivek > > Depends on what's randomized. If read chunk size is randomized, then yes, you > would probably see less performance increase compared to a simple > 'cat foo.dat'. We are using "fio" for testing and the read chunk size is not being randomized. Chunk size (block size) is fixed at 4K for these tests. > > If only the read position is randomized, but the read chunk size honors > iounit, a.k.a. stat's st_blksize (i.e. reading with the most efficient block > size advertised by 9P), then I would assume still seeing a performance > increase. Yes, we are randomizing the read position. But there is no notion of looking at st_blksize. It's fixed at 4K. (Notice option --bs=4k in the fio command line.) > Because seeking is a no/low cost factor in this case. The guest OS > seeking does not transmit a 9p message. The offset is rather passed with any > Tread message instead: > https://github.com/chaos/diod/blob/master/protocol.md > > I mean, yes, random seeks reduce I/O performance in general of course, but in > direct performance comparison, the difference in overhead of the 9p vs. > virtiofs network controller layer is most probably the most relevant aspect if > large I/O chunk sizes are used. > Agreed that a large I/O chunk size will help with the performance numbers. But the idea is to intentionally use a smaller I/O chunk size with some of the tests to measure how efficient the communication path is. Thanks Vivek ^ permalink raw reply [flat|nested] 55+ messages in thread
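For reference, a fio job of the shape discussed here (psync engine, fixed 4k blocks, mixed random read/write) would look roughly like the fragment below. This is a sketch of the workload shape only, not the exact job file shipped in virtiofs-tests; the size, runtime, and directory values are placeholders:

```ini
# randrw-psync.job -- sketch of the workload shape under discussion.
# bs=4k is the fixed chunk size mentioned above; the other values
# are placeholders, not the actual virtiofs-tests job file.
[global]
ioengine=psync
direct=1
rw=randrw
bs=4k
size=1g
runtime=30
time_based

[randrw-psync]
directory=/mnt/virtiofs
```

With psync and bs=4k each request is a synchronous 4k pread/pwrite, so the benchmark measures per-request round-trip overhead of the transport rather than bulk throughput.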
* Re: virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance) 2020-09-29 13:49 ` Vivek Goyal @ 2020-09-29 13:59 ` Christian Schoenebeck 0 siblings, 0 replies; 55+ messages in thread From: Christian Schoenebeck @ 2020-09-29 13:59 UTC (permalink / raw) To: qemu-devel Cc: Vivek Goyal, Venegas Munoz, Jose Carlos, cdupontd, virtio-fs-list, Stefan Hajnoczi, Shinde, Archana M, Dr. David Alan Gilbert On Dienstag, 29. September 2020 15:49:42 CEST Vivek Goyal wrote: > > Depends on what's randomized. If read chunk size is randomized, then yes, > > you would probably see less performance increase compared to a simple > > 'cat foo.dat'. > > We are using "fio" for testing and read chunk size is not being > randomized. chunk size (block size) is fixed at 4K size for these tests. Good to know, thanks! > > If only the read position is randomized, but the read chunk size honors > > iounit, a.k.a. stat's st_blksize (i.e. reading with the most efficient > > block size advertised by 9P), then I would assume still seeing a > > performance increase. > > Yes, we are randomizing read position. But there is no notion of looking > at st_blksize. Its fixed at 4K. (notice option --bs=4k in fio > commandline). Ah ok, then the results make sense. With these block sizes you will indeed suffer a performance issue with 9p, due to several thread hops in Tread handling, which is due to be fixed. Best regards, Christian Schoenebeck ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance) 2020-09-25 12:41 ` virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance) Dr. David Alan Gilbert 2020-09-25 13:04 ` Christian Schoenebeck @ 2020-09-29 13:17 ` Vivek Goyal 2020-09-29 13:49 ` [Virtio-fs] " Miklos Szeredi 1 sibling, 1 reply; 55+ messages in thread From: Vivek Goyal @ 2020-09-29 13:17 UTC (permalink / raw) To: Dr. David Alan Gilbert Cc: Venegas Munoz, Jose Carlos, qemu-devel, cdupontd, virtio-fs-list, Stefan Hajnoczi, Shinde, Archana M On Fri, Sep 25, 2020 at 01:41:39PM +0100, Dr. David Alan Gilbert wrote: [..] > So I'm still beating 9p; the thread-pool-size=1 seems to be great for > read performance here. > Hi Dave, I spent some time making changes to virtiofs-tests so that I can test a mix of random read and random write workload. That testsuite runs a workload 3 times and reports the average, so I like to use it to reduce the run-to-run variation effect. So I ran the following to mimic Carlos's workload. $ ./run-fio-test.sh test -direct=1 -c <test-dir> fio-jobs/randrw-psync.job > testresults.txt $ ./parse-fio-results.sh testresults.txt I am using an SSD at the host to back these files. Option "-c" always creates new files for testing. Following are my results in various configurations. Used cache=mmap mode for 9p and cache=auto (and cache=none) modes for virtiofs. Also tested 9p default as well as msize=16m. Tested virtiofs both with exclusive as well as shared thread pool. NAME WORKLOAD Bandwidth IOPS 9p-mmap-randrw randrw-psync 42.8mb/14.3mb 10.7k/3666 9p-mmap-msize16m randrw-psync 42.8mb/14.3mb 10.7k/3674 vtfs-auto-ex-randrw randrw-psync 27.8mb/9547kb 7136/2386 vtfs-auto-sh-randrw randrw-psync 43.3mb/14.4mb 10.8k/3709 vtfs-none-sh-randrw randrw-psync 54.1mb/18.1mb 13.5k/4649 - Increasing msize to 16m did not help with performance for this workload. - virtiofs exclusive thread pool ("ex") is slower than 9p.
- virtiofs shared thread pool ("sh") matches the performance of 9p. - virtiofs cache=none mode is faster than cache=auto mode for this workload. Carlos, I am looking at more ways to optimize it further for virtiofs. In the meantime, I think switching to the "shared" thread pool should bring you very close to 9p in your setup. Thanks Vivek ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [Virtio-fs] virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance) 2020-09-29 13:17 ` Vivek Goyal @ 2020-09-29 13:49 ` Miklos Szeredi 2020-09-29 14:01 ` Vivek Goyal 2020-09-29 15:28 ` Vivek Goyal 0 siblings, 2 replies; 55+ messages in thread From: Miklos Szeredi @ 2020-09-29 13:49 UTC (permalink / raw) To: Vivek Goyal Cc: qemu-devel, Venegas Munoz, Jose Carlos, cdupontd, Dr. David Alan Gilbert, virtio-fs-list, Shinde, Archana M On Tue, Sep 29, 2020 at 3:18 PM Vivek Goyal <vgoyal@redhat.com> wrote: > - virtiofs cache=none mode is faster than cache=auto mode for this > workload. Not sure why. One cause could be that readahead is not perfect at detecting the random pattern. Could we compare total I/O on the server vs. total I/O by fio? Thanks, Miklos ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [Virtio-fs] virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance) 2020-09-29 13:49 ` [Virtio-fs] " Miklos Szeredi @ 2020-09-29 14:01 ` Vivek Goyal 2020-09-29 14:54 ` Miklos Szeredi 2020-09-29 15:28 ` Vivek Goyal 1 sibling, 1 reply; 55+ messages in thread From: Vivek Goyal @ 2020-09-29 14:01 UTC (permalink / raw) To: Miklos Szeredi Cc: qemu-devel, Venegas Munoz, Jose Carlos, cdupontd, Dr. David Alan Gilbert, virtio-fs-list, Shinde, Archana M On Tue, Sep 29, 2020 at 03:49:04PM +0200, Miklos Szeredi wrote: > On Tue, Sep 29, 2020 at 3:18 PM Vivek Goyal <vgoyal@redhat.com> wrote: > > > - virtiofs cache=none mode is faster than cache=auto mode for this > > workload. > > Not sure why. One cause could be that readahead is not perfect at > detecting the random pattern. Could we compare total I/O on the > server vs. total I/O by fio? Hi Miklos, I will instrument virtiofsd code to figure out total I/O. One more potential issue I am staring at is refreshing the attrs on READ if fc->auto_inval_data is set.

fuse_cache_read_iter() {
        /*
         * In auto invalidate mode, always update attributes on read.
         * Otherwise, only update if we attempt to read past EOF (to ensure
         * i_size is up to date).
         */
        if (fc->auto_inval_data ||
            (iocb->ki_pos + iov_iter_count(to) > i_size_read(inode))) {
                int err;
                err = fuse_update_attributes(inode, iocb->ki_filp);
                if (err)
                        return err;
        }
}

Given this is a mixed READ/WRITE workload, every WRITE will invalidate attrs. And the next READ will first do a GETATTR() from the server (and potentially invalidate the page cache) before doing the READ. This sounds suboptimal, especially from the point of view of WRITEs done by this client itself. I mean, if another client has modified the file, then doing GETATTR after a second makes sense. But there should be some optimization to make sure our own WRITEs don't end up doing GETATTR and invalidating the page cache (because cache contents are still valid).
I disabled ->auto_inval_data and that seemed to result in 8-10% gain in performance for this workload. Thanks Vivek ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [Virtio-fs] virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance) 2020-09-29 14:01 ` Vivek Goyal @ 2020-09-29 14:54 ` Miklos Szeredi 0 siblings, 0 replies; 55+ messages in thread From: Miklos Szeredi @ 2020-09-29 14:54 UTC (permalink / raw) To: Vivek Goyal Cc: qemu-devel, Venegas Munoz, Jose Carlos, cdupontd, Dr. David Alan Gilbert, virtio-fs-list, Shinde, Archana M On Tue, Sep 29, 2020 at 4:01 PM Vivek Goyal <vgoyal@redhat.com> wrote: > > On Tue, Sep 29, 2020 at 03:49:04PM +0200, Miklos Szeredi wrote: > > On Tue, Sep 29, 2020 at 3:18 PM Vivek Goyal <vgoyal@redhat.com> wrote: > > > > > - virtiofs cache=none mode is faster than cache=auto mode for this > > > workload. > > > > Not sure why. One cause could be that readahead is not perfect at > > detecting the random pattern. Could we compare total I/O on the > > server vs. total I/O by fio? > > Hi Miklos, > > I will instrument virtiosd code to figure out total I/O. > > One more potential issue I am staring at is refreshing the attrs on > READ if fc->auto_inval_data is set. > > fuse_cache_read_iter() { > /* > * In auto invalidate mode, always update attributes on read. > * Otherwise, only update if we attempt to read past EOF (to ensure > * i_size is up to date). > */ > if (fc->auto_inval_data || > (iocb->ki_pos + iov_iter_count(to) > i_size_read(inode))) { > int err; > err = fuse_update_attributes(inode, iocb->ki_filp); > if (err) > return err; > } > } > > Given this is a mixed READ/WRITE workload, every WRITE will invalidate > attrs. And next READ will first do GETATTR() from server (and potentially > invalidate page cache) before doing READ. > > This sounds suboptimal especially from the point of view of WRITEs > done by this client itself. I mean if another client has modified > the file, then doing GETATTR after a second makes sense. 
But there > should be some optimization to make sure our own WRITEs don't end > up doing GETATTR and invalidate page cache (because cache contents > are still valid). Yeah, that sucks. > I disabled ->auto_inval_data and that seemed to result in 8-10% > gain in performance for this workload. Need to wrap my head around these caching issues. Thanks, Miklos ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: [Virtio-fs] virtiofs vs 9p performance(Re: tools/virtiofs: Multi threading seems to hurt performance) 2020-09-29 13:49 ` [Virtio-fs] " Miklos Szeredi 2020-09-29 14:01 ` Vivek Goyal @ 2020-09-29 15:28 ` Vivek Goyal 1 sibling, 0 replies; 55+ messages in thread From: Vivek Goyal @ 2020-09-29 15:28 UTC (permalink / raw) To: Miklos Szeredi Cc: qemu-devel, Venegas Munoz, Jose Carlos, cdupontd, Dr. David Alan Gilbert, virtio-fs-list, Shinde, Archana M On Tue, Sep 29, 2020 at 03:49:04PM +0200, Miklos Szeredi wrote: > On Tue, Sep 29, 2020 at 3:18 PM Vivek Goyal <vgoyal@redhat.com> wrote: > > > - virtiofs cache=none mode is faster than cache=auto mode for this > > workload. > > Not sure why. One cause could be that readahead is not perfect at > detecting the random pattern. Could we compare total I/O on the > server vs. total I/O by fio? Ran tests with auto_inval_data disabled and compared with other results. vtfs-auto-ex-randrw randrw-psync 27.8mb/9547kb 7136/2386 vtfs-auto-sh-randrw randrw-psync 43.3mb/14.4mb 10.8k/3709 vtfs-auto-sh-noinval randrw-psync 50.5mb/16.9mb 12.6k/4330 vtfs-none-sh-randrw randrw-psync 54.1mb/18.1mb 13.5k/4649 With auto_inval_data disabled, this time I saw around a 20% performance jump in READ, and it is now much closer to cache=none performance. Thanks Vivek ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: tools/virtiofs: Multi threading seems to hurt performance 2020-09-22 17:47 ` Vivek Goyal 2020-09-24 21:33 ` Venegas Munoz, Jose Carlos @ 2020-09-25 12:11 ` Dr. David Alan Gilbert 2020-09-25 13:11 ` Vivek Goyal 1 sibling, 1 reply; 55+ messages in thread From: Dr. David Alan Gilbert @ 2020-09-25 12:11 UTC (permalink / raw) To: Vivek Goyal Cc: jose.carlos.venegas.munoz, qemu-devel, cdupontd, virtio-fs-list, Stefan Hajnoczi, archana.m.shinde * Vivek Goyal (vgoyal@redhat.com) wrote: > On Tue, Sep 22, 2020 at 11:25:31AM +0100, Dr. David Alan Gilbert wrote: > > * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote: > > > Hi, > > > I've been doing some of my own perf tests and I think I agree > > > about the thread pool size; my test is a kernel build > > > and I've tried a bunch of different options. > > > > > > My config: > > > Host: 16 core AMD EPYC (32 thread), 128G RAM, > > > 5.9.0-rc4 kernel, rhel 8.2ish userspace. > > > 5.1.0 qemu/virtiofsd built from git. > > > Guest: Fedora 32 from cloud image with just enough extra installed for > > > a kernel build. > > > > > > git cloned and checkout v5.8 of Linux into /dev/shm/linux on the host > > > fresh before each test. Then log into the guest, make defconfig, > > > time make -j 16 bzImage, make clean; time make -j 16 bzImage > > > The numbers below are the 'real' time in the guest from the initial make > > > (the subsequent makes dont vary much) > > > > > > Below are the detauls of what each of these means, but here are the > > > numbers first > > > > > > virtiofsdefault 4m0.978s > > > 9pdefault 9m41.660s > > > virtiofscache=none 10m29.700s > > > 9pmmappass 9m30.047s > > > 9pmbigmsize 12m4.208s > > > 9pmsecnone 9m21.363s > > > virtiofscache=noneT1 7m17.494s > > > virtiofsdefaultT1 3m43.326s > > > > > > So the winner there by far is the 'virtiofsdefaultT1' - that's > > > the default virtiofs settings, but with --thread-pool-size=1 - so > > > yes it gives a small benefit. 
> > > But interestingly the cache=none virtiofs performance is pretty bad, > > > but thread-pool-size=1 on that makes a BIG improvement. > > > > Here are fio runs that Vivek asked me to run in my same environment > > (there are some 0's in some of the mmap cases, and I've not investigated > > why yet). > > cache=none does not allow mmap in case of virtiofs. That's when you > are seeing 0. > > >virtiofs is looking good here in I think all of the cases; > > there's some division over which cinfig; cache=none > > seems faster in some cases which surprises me. > > I know cache=none is faster in case of write workloads. It forces > direct write where we don't call file_remove_privs(). While cache=auto > goes through file_remove_privs() and that adds a GETXATTR request to > every WRITE request. Can you point me to how cache=auto causes the file_remove_privs? Dave > Vivek -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: tools/virtiofs: Multi threading seems to hurt performance 2020-09-25 12:11 ` tools/virtiofs: Multi threading seems to hurt performance Dr. David Alan Gilbert @ 2020-09-25 13:11 ` Vivek Goyal 0 siblings, 0 replies; 55+ messages in thread From: Vivek Goyal @ 2020-09-25 13:11 UTC (permalink / raw) To: Dr. David Alan Gilbert Cc: jose.carlos.venegas.munoz, qemu-devel, cdupontd, virtio-fs-list, Stefan Hajnoczi, archana.m.shinde On Fri, Sep 25, 2020 at 01:11:27PM +0100, Dr. David Alan Gilbert wrote: > * Vivek Goyal (vgoyal@redhat.com) wrote: > > On Tue, Sep 22, 2020 at 11:25:31AM +0100, Dr. David Alan Gilbert wrote: > > > * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote: > > > > Hi, > > > > I've been doing some of my own perf tests and I think I agree > > > > about the thread pool size; my test is a kernel build > > > > and I've tried a bunch of different options. > > > > > > > > My config: > > > > Host: 16 core AMD EPYC (32 thread), 128G RAM, > > > > 5.9.0-rc4 kernel, rhel 8.2ish userspace. > > > > 5.1.0 qemu/virtiofsd built from git. > > > > Guest: Fedora 32 from cloud image with just enough extra installed for > > > > a kernel build. > > > > > > > > git cloned and checkout v5.8 of Linux into /dev/shm/linux on the host > > > > fresh before each test. 
Then log into the guest, make defconfig, > > > > time make -j 16 bzImage, make clean; time make -j 16 bzImage > > > > The numbers below are the 'real' time in the guest from the initial make > > > > (the subsequent makes dont vary much) > > > > > > > > Below are the detauls of what each of these means, but here are the > > > > numbers first > > > > > > > > virtiofsdefault 4m0.978s > > > > 9pdefault 9m41.660s > > > > virtiofscache=none 10m29.700s > > > > 9pmmappass 9m30.047s > > > > 9pmbigmsize 12m4.208s > > > > 9pmsecnone 9m21.363s > > > > virtiofscache=noneT1 7m17.494s > > > > virtiofsdefaultT1 3m43.326s > > > > > > > > So the winner there by far is the 'virtiofsdefaultT1' - that's > > > > the default virtiofs settings, but with --thread-pool-size=1 - so > > > > yes it gives a small benefit. > > > > But interestingly the cache=none virtiofs performance is pretty bad, > > > > but thread-pool-size=1 on that makes a BIG improvement. > > > > > > Here are fio runs that Vivek asked me to run in my same environment > > > (there are some 0's in some of the mmap cases, and I've not investigated > > > why yet). > > > > cache=none does not allow mmap in case of virtiofs. That's when you > > are seeing 0. > > > > >virtiofs is looking good here in I think all of the cases; > > > there's some division over which cinfig; cache=none > > > seems faster in some cases which surprises me. > > > > I know cache=none is faster in case of write workloads. It forces > > direct write where we don't call file_remove_privs(). While cache=auto > > goes through file_remove_privs() and that adds a GETXATTR request to > > every WRITE request. > > Can you point me to how cache=auto causes the file_remove_privs? fs/fuse/file.c fuse_cache_write_iter() { err = file_remove_privs(file); } Above path is taken when cache=auto/cache=always is used. If virtiofsd is running with noxattr, then it does not impose any cost. 
But if xattrs are enabled, then every WRITE first results in a getxattr(security.capability) and that slows down WRITEs tremendously. When cache=none is used, we go through the following path instead: fuse_direct_write_iter(), and it does not have file_remove_privs(). We set a flag in the WRITE request to tell the server to kill suid/sgid/security.capability instead.

fuse_direct_io() {
        ia->write.in.write_flags |= FUSE_WRITE_KILL_PRIV
}

Vivek ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: tools/virtiofs: Multi threading seems to hurt performance 2020-09-18 21:34 tools/virtiofs: Multi threading seems to hurt performance Vivek Goyal ` (2 preceding siblings ...) 2020-09-21 15:32 ` Dr. David Alan Gilbert @ 2020-09-21 20:16 ` Vivek Goyal 2020-09-22 11:09 ` Dr. David Alan Gilbert 2020-09-23 12:50 ` [Virtio-fs] " Chirantan Ekbote 4 siblings, 1 reply; 55+ messages in thread From: Vivek Goyal @ 2020-09-21 20:16 UTC (permalink / raw) To: virtio-fs-list, qemu-devel Cc: Dr. David Alan Gilbert, Stefan Hajnoczi, Miklos Szeredi On Fri, Sep 18, 2020 at 05:34:36PM -0400, Vivek Goyal wrote: > Hi All, > > virtiofsd default thread pool size is 64. To me it feels that in most of > the cases thread pool size 1 performs better than thread pool size 64. > > I ran virtiofs-tests. > > https://github.com/rhvgoyal/virtiofs-tests I spent more time debugging this. The first thing I noticed is that we are using an "exclusive" glib thread pool. https://developer.gnome.org/glib/stable/glib-Thread-Pools.html#g-thread-pool-new This seems to run a pre-determined number of threads dedicated to that thread pool. A little instrumentation of the code revealed that every new request gets assigned to a new thread (despite the fact that the previous thread had finished its job). So internally there might be some kind of round-robin policy to choose the next thread for running the job. I decided to switch to a "shared" pool instead, where it seemed to spin up new threads only if there is enough work. Also, threads can be shared between pools. And it looks like testing results are way better with "shared" pools. So maybe we should switch to a shared pool by default (till somebody shows in what cases exclusive pools are better). The second thought which came to mind was: what's the impact of NUMA? What if the qemu and virtiofsd processes/threads are running on separate NUMA nodes? That should increase memory access latency and overhead. So I used "numactl --cpubind=0" to bind both qemu and virtiofsd to node 0.
My machine seems to have two NUMA nodes (each node has 32 logical
processors). Keeping both qemu and virtiofsd on the same node improves
throughput further.

So here are the results.

vtfs-none-epool      --> cache=none, exclusive thread pool
vtfs-none-spool      --> cache=none, shared thread pool
vtfs-none-spool-numa --> cache=none, shared thread pool, same NUMA node

NAME                  WORKLOAD                Bandwidth   IOPS
vtfs-none-epool       seqread-psync           36(MiB/s)   9392
vtfs-none-spool       seqread-psync           68(MiB/s)   17k
vtfs-none-spool-numa  seqread-psync           73(MiB/s)   18k

vtfs-none-epool       seqread-psync-multi     210(MiB/s)  52k
vtfs-none-spool       seqread-psync-multi     260(MiB/s)  65k
vtfs-none-spool-numa  seqread-psync-multi     309(MiB/s)  77k

vtfs-none-epool       seqread-libaio          286(MiB/s)  71k
vtfs-none-spool       seqread-libaio          328(MiB/s)  82k
vtfs-none-spool-numa  seqread-libaio          332(MiB/s)  83k

vtfs-none-epool       seqread-libaio-multi    201(MiB/s)  50k
vtfs-none-spool       seqread-libaio-multi    254(MiB/s)  63k
vtfs-none-spool-numa  seqread-libaio-multi    276(MiB/s)  69k

vtfs-none-epool       randread-psync          40(MiB/s)   10k
vtfs-none-spool       randread-psync          64(MiB/s)   16k
vtfs-none-spool-numa  randread-psync          72(MiB/s)   18k

vtfs-none-epool       randread-psync-multi    211(MiB/s)  52k
vtfs-none-spool       randread-psync-multi    252(MiB/s)  63k
vtfs-none-spool-numa  randread-psync-multi    297(MiB/s)  74k

vtfs-none-epool       randread-libaio         313(MiB/s)  78k
vtfs-none-spool       randread-libaio         320(MiB/s)  80k
vtfs-none-spool-numa  randread-libaio         330(MiB/s)  82k

vtfs-none-epool       randread-libaio-multi   257(MiB/s)  64k
vtfs-none-spool       randread-libaio-multi   274(MiB/s)  68k
vtfs-none-spool-numa  randread-libaio-multi   319(MiB/s)  79k

vtfs-none-epool       seqwrite-psync          34(MiB/s)   8926
vtfs-none-spool       seqwrite-psync          55(MiB/s)   13k
vtfs-none-spool-numa  seqwrite-psync          66(MiB/s)   16k

vtfs-none-epool       seqwrite-psync-multi    196(MiB/s)  49k
vtfs-none-spool       seqwrite-psync-multi    225(MiB/s)  56k
vtfs-none-spool-numa  seqwrite-psync-multi    270(MiB/s)  67k

vtfs-none-epool       seqwrite-libaio         257(MiB/s)  64k
vtfs-none-spool       seqwrite-libaio         304(MiB/s)  76k
vtfs-none-spool-numa  seqwrite-libaio         267(MiB/s)  66k

vtfs-none-epool       seqwrite-libaio-multi   312(MiB/s)  78k
vtfs-none-spool       seqwrite-libaio-multi   366(MiB/s)  91k
vtfs-none-spool-numa  seqwrite-libaio-multi   381(MiB/s)  95k

vtfs-none-epool       randwrite-psync         38(MiB/s)   9745
vtfs-none-spool       randwrite-psync         55(MiB/s)   13k
vtfs-none-spool-numa  randwrite-psync         67(MiB/s)   16k

vtfs-none-epool       randwrite-psync-multi   186(MiB/s)  46k
vtfs-none-spool       randwrite-psync-multi   240(MiB/s)  60k
vtfs-none-spool-numa  randwrite-psync-multi   271(MiB/s)  67k

vtfs-none-epool       randwrite-libaio        224(MiB/s)  56k
vtfs-none-spool       randwrite-libaio        296(MiB/s)  74k
vtfs-none-spool-numa  randwrite-libaio        290(MiB/s)  72k

vtfs-none-epool       randwrite-libaio-multi  300(MiB/s)  75k
vtfs-none-spool       randwrite-libaio-multi  350(MiB/s)  87k
vtfs-none-spool-numa  randwrite-libaio-multi  383(MiB/s)  95k

Thanks
Vivek
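For readers trying to reproduce the NUMA-pinned configuration above, the setup can be sketched roughly as follows. This is an illustrative sketch only, not the exact command lines used for these runs: the source path, socket path, and remaining guest configuration are placeholders.

```shell
# Sketch: pin both virtiofsd and qemu to NUMA node 0 (CPUs and memory).
# --thread-pool-size=1 would give the "1-thread" variant discussed in
# this thread; omit it for the default pool of 64.
numactl --cpunodebind=0 --membind=0 ./virtiofsd \
    --socket-path=/tmp/vhostqemu \
    -o source=/mnt/test -o cache=none &

numactl --cpunodebind=0 --membind=0 qemu-system-x86_64 \
    -chardev socket,id=char0,path=/tmp/vhostqemu \
    -device vhost-user-fs-pci,chardev=char0,tag=myfs \
    ...   # memory backend, CPUs, disk, and the rest of the guest config
```

Inside the guest the share is then mounted with `mount -t virtiofs myfs /mnt`, and fio jobs from virtiofs-tests run against that mount point.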
* Re: tools/virtiofs: Multi threading seems to hurt performance
  2020-09-22 11:09 UTC, Dr. David Alan Gilbert
  To: Vivek Goyal; +Cc: virtio-fs-list, qemu-devel, Stefan Hajnoczi, Miklos Szeredi

* Vivek Goyal (vgoyal@redhat.com) wrote:
> On Fri, Sep 18, 2020 at 05:34:36PM -0400, Vivek Goyal wrote:
> > Hi All,
> >
> > virtiofsd default thread pool size is 64. To me it feels that in most of
> > the cases thread pool size 1 performs better than thread pool size 64.
> >
> > I ran virtiofs-tests.
> >
> > https://github.com/rhvgoyal/virtiofs-tests
>
> I spent more time debugging this. The first thing I noticed is that we
> are using an "exclusive" glib thread pool.
>
> https://developer.gnome.org/glib/stable/glib-Thread-Pools.html#g-thread-pool-new
>
> This seems to run a pre-determined number of threads dedicated to that
> thread pool. A little instrumentation of the code revealed that every
> new request gets assigned to a new thread (despite the fact that the
> previous thread had finished its job). So internally there might be
> some kind of round-robin policy to choose the next thread for running
> the job.
>
> I decided to switch to a "shared" pool instead, which seems to spin up
> new threads only if there is enough work. Threads can also be shared
> between pools.
>
> And it looks like the testing results are way better with "shared"
> pools. So maybe we should switch to shared pools by default (till
> somebody shows in what cases exclusive pools are better).
>
> The second thought that came to mind was the impact of NUMA. What if
> the qemu and virtiofsd processes/threads are running on separate NUMA
> nodes? That should increase memory access latency and overhead. So I
> used "numactl --cpunodebind=0" to bind both qemu and virtiofsd to
> node 0. My machine seems to have two NUMA nodes (each node has 32
> logical processors). Keeping both qemu and virtiofsd on the same node
> improves throughput further.
>
> So here are the results.
>
> vtfs-none-epool      --> cache=none, exclusive thread pool
> vtfs-none-spool      --> cache=none, shared thread pool
> vtfs-none-spool-numa --> cache=none, shared thread pool, same NUMA node

Do you have the numbers for:
  epool
  epool thread-pool-size=1
  spool
?

Dave

> [results table snipped]

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
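The dispatch difference Vivek describes (the exclusive pool handing each new request to the next thread in round-robin fashion, versus a shared pool that reuses an idle thread) can be illustrated with a toy model. This is plain Python and purely illustrative, not the glib implementation; it only counts how many distinct worker threads a strictly serial request stream touches under each policy:

```python
# Toy model of the two dispatch policies. "exclusive" assigns each request
# to the next worker in round-robin order; "shared" always reuses an idle
# worker. Both policies here are hypothetical simplifications for
# illustration, not glib's actual scheduler.

def threads_touched(policy, num_threads=64, serial_requests=100):
    """Count distinct workers used for a serial request stream
    (each request completes before the next one arrives)."""
    used = set()
    next_rr = 0
    for _ in range(serial_requests):
        if policy == "exclusive":
            worker = next_rr                       # round-robin pick
            next_rr = (next_rr + 1) % num_threads
        else:
            worker = 0   # serial load: worker 0 is always idle, so reuse it
        used.add(worker)
    return len(used)

print(threads_touched("exclusive"))  # 64 -- every worker gets woken
print(threads_touched("shared"))     # 1  -- one hot thread does all the work
```

The point of the model: with a single guest process issuing serial I/O, round-robin dispatch wakes a cold thread (cold caches, scheduler wakeup latency) for every request, while reuse keeps one thread hot. That is consistent with both thread-pool-size=1 and the shared pool beating the exclusive pool in the single-process tests above.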
* Re: tools/virtiofs: Multi threading seems to hurt performance
  2020-09-22 22:56 UTC, Vivek Goyal
  To: Dr. David Alan Gilbert; Cc: virtio-fs-list, qemu-devel, Stefan Hajnoczi, Miklos Szeredi

On Tue, Sep 22, 2020 at 12:09:46PM +0100, Dr. David Alan Gilbert wrote:
>
> Do you have the numbers for:
>   epool
>   epool thread-pool-size=1
>   spool

Hi David,

Ok, I re-ran my numbers after upgrading to the latest qemu and also
upgrading the host kernel to the latest upstream. Apart from comparing
epool, spool and 1-thread modes, I also ran their NUMA variants; that
is, I launched qemu and virtiofsd on node 0 of the machine
(numactl --cpunodebind=0).

Results are kind of mixed. Here are my takeaways.

- Running on the same NUMA node improves performance overall for the
  exclusive, shared and exclusive-1T modes.

- In general both the shared pool and exclusive-1T mode seem to perform
  better than exclusive mode, except for the case of randwrite-libaio.
  In some cases (seqread-libaio, seqwrite-libaio, seqwrite-libaio-multi)
  the exclusive pool performs better than exclusive-1T.

- In some cases exclusive-1T performs better than the shared pool
  (randwrite-libaio, randwrite-psync-multi, seqwrite-psync-multi,
  seqwrite-psync, seqread-libaio-multi, seqread-psync-multi).

Overall, I feel that both exclusive-1T and shared perform better than
the exclusive pool. Results between exclusive-1T and the shared pool
are mixed, though in many cases exclusive-1T performs better. I would
say that moving to the "shared" pool seems like a reasonable option.

Thanks
Vivek

NAME                     WORKLOAD                Bandwidth   IOPS
vtfs-none-epool          seqread-psync           38(MiB/s)   9967
vtfs-none-epool-1T       seqread-psync           66(MiB/s)   16k
vtfs-none-spool          seqread-psync           67(MiB/s)   16k
vtfs-none-epool-numa     seqread-psync           48(MiB/s)   12k
vtfs-none-epool-1T-numa  seqread-psync           74(MiB/s)   18k
vtfs-none-spool-numa     seqread-psync           74(MiB/s)   18k

vtfs-none-epool          seqread-psync-multi     204(MiB/s)  51k
vtfs-none-epool-1T       seqread-psync-multi     325(MiB/s)  81k
vtfs-none-spool          seqread-psync-multi     271(MiB/s)  67k
vtfs-none-epool-numa     seqread-psync-multi     253(MiB/s)  63k
vtfs-none-epool-1T-numa  seqread-psync-multi     349(MiB/s)  87k
vtfs-none-spool-numa     seqread-psync-multi     301(MiB/s)  75k

vtfs-none-epool          seqread-libaio          301(MiB/s)  75k
vtfs-none-epool-1T       seqread-libaio          273(MiB/s)  68k
vtfs-none-spool          seqread-libaio          334(MiB/s)  83k
vtfs-none-epool-numa     seqread-libaio          315(MiB/s)  78k
vtfs-none-epool-1T-numa  seqread-libaio          326(MiB/s)  81k
vtfs-none-spool-numa     seqread-libaio          335(MiB/s)  83k

vtfs-none-epool          seqread-libaio-multi    202(MiB/s)  50k
vtfs-none-epool-1T       seqread-libaio-multi    308(MiB/s)  77k
vtfs-none-spool          seqread-libaio-multi    247(MiB/s)  61k
vtfs-none-epool-numa     seqread-libaio-multi    238(MiB/s)  59k
vtfs-none-epool-1T-numa  seqread-libaio-multi    307(MiB/s)  76k
vtfs-none-spool-numa     seqread-libaio-multi    269(MiB/s)  67k

vtfs-none-epool          randread-psync          41(MiB/s)   10k
vtfs-none-epool-1T       randread-psync          67(MiB/s)   16k
vtfs-none-spool          randread-psync          64(MiB/s)   16k
vtfs-none-epool-numa     randread-psync          48(MiB/s)   12k
vtfs-none-epool-1T-numa  randread-psync          73(MiB/s)   18k
vtfs-none-spool-numa     randread-psync          72(MiB/s)   18k

vtfs-none-epool          randread-psync-multi    207(MiB/s)  51k
vtfs-none-epool-1T       randread-psync-multi    313(MiB/s)  78k
vtfs-none-spool          randread-psync-multi    265(MiB/s)  66k
vtfs-none-epool-numa     randread-psync-multi    253(MiB/s)  63k
vtfs-none-epool-1T-numa  randread-psync-multi    340(MiB/s)  85k
vtfs-none-spool-numa     randread-psync-multi    305(MiB/s)  76k

vtfs-none-epool          randread-libaio         305(MiB/s)  76k
vtfs-none-epool-1T       randread-libaio         308(MiB/s)  77k
vtfs-none-spool          randread-libaio         329(MiB/s)  82k
vtfs-none-epool-numa     randread-libaio         310(MiB/s)  77k
vtfs-none-epool-1T-numa  randread-libaio         328(MiB/s)  82k
vtfs-none-spool-numa     randread-libaio         339(MiB/s)  84k

vtfs-none-epool          randread-libaio-multi   265(MiB/s)  66k
vtfs-none-epool-1T       randread-libaio-multi   267(MiB/s)  66k
vtfs-none-spool          randread-libaio-multi   269(MiB/s)  67k
vtfs-none-epool-numa     randread-libaio-multi   314(MiB/s)  78k
vtfs-none-epool-1T-numa  randread-libaio-multi   319(MiB/s)  79k
vtfs-none-spool-numa     randread-libaio-multi   318(MiB/s)  79k

vtfs-none-epool          seqwrite-psync          36(MiB/s)   9224
vtfs-none-epool-1T       seqwrite-psync          67(MiB/s)   16k
vtfs-none-spool          seqwrite-psync          61(MiB/s)   15k
vtfs-none-epool-numa     seqwrite-psync          44(MiB/s)   11k
vtfs-none-epool-1T-numa  seqwrite-psync          69(MiB/s)   17k
vtfs-none-spool-numa     seqwrite-psync          68(MiB/s)   17k

vtfs-none-epool          seqwrite-psync-multi    193(MiB/s)  48k
vtfs-none-epool-1T       seqwrite-psync-multi    299(MiB/s)  74k
vtfs-none-spool          seqwrite-psync-multi    240(MiB/s)  60k
vtfs-none-epool-numa     seqwrite-psync-multi    233(MiB/s)  58k
vtfs-none-epool-1T-numa  seqwrite-psync-multi    358(MiB/s)  89k
vtfs-none-spool-numa     seqwrite-psync-multi    285(MiB/s)  71k

vtfs-none-epool          seqwrite-libaio         265(MiB/s)  66k
vtfs-none-epool-1T       seqwrite-libaio         245(MiB/s)  61k
vtfs-none-spool          seqwrite-libaio         312(MiB/s)  78k
vtfs-none-epool-numa     seqwrite-libaio         295(MiB/s)  73k
vtfs-none-epool-1T-numa  seqwrite-libaio         282(MiB/s)  70k
vtfs-none-spool-numa     seqwrite-libaio         297(MiB/s)  74k

vtfs-none-epool          seqwrite-libaio-multi   313(MiB/s)  78k
vtfs-none-epool-1T       seqwrite-libaio-multi   299(MiB/s)  74k
vtfs-none-spool          seqwrite-libaio-multi   315(MiB/s)  78k
vtfs-none-epool-numa     seqwrite-libaio-multi   318(MiB/s)  79k
vtfs-none-epool-1T-numa  seqwrite-libaio-multi   410(MiB/s)  102k
vtfs-none-spool-numa     seqwrite-libaio-multi   378(MiB/s)  94k

vtfs-none-epool          randwrite-psync         33(MiB/s)   8629
vtfs-none-epool-1T       randwrite-psync         61(MiB/s)   15k
vtfs-none-spool          randwrite-psync         63(MiB/s)   15k
vtfs-none-epool-numa     randwrite-psync         49(MiB/s)   12k
vtfs-none-epool-1T-numa  randwrite-psync         68(MiB/s)   17k
vtfs-none-spool-numa     randwrite-psync         66(MiB/s)   16k

vtfs-none-epool          randwrite-psync-multi   186(MiB/s)  46k
vtfs-none-epool-1T       randwrite-psync-multi   300(MiB/s)  75k
vtfs-none-spool          randwrite-psync-multi   233(MiB/s)  58k
vtfs-none-epool-numa     randwrite-psync-multi   235(MiB/s)  58k
vtfs-none-epool-1T-numa  randwrite-psync-multi   355(MiB/s)  88k
vtfs-none-spool-numa     randwrite-psync-multi   266(MiB/s)  66k

vtfs-none-epool          randwrite-libaio        289(MiB/s)  72k
vtfs-none-epool-1T       randwrite-libaio        284(MiB/s)  71k
vtfs-none-spool          randwrite-libaio        278(MiB/s)  69k
vtfs-none-epool-numa     randwrite-libaio        292(MiB/s)  73k
vtfs-none-epool-1T-numa  randwrite-libaio        294(MiB/s)  73k
vtfs-none-spool-numa     randwrite-libaio        290(MiB/s)  72k

vtfs-none-epool          randwrite-libaio-multi  317(MiB/s)  79k
vtfs-none-epool-1T       randwrite-libaio-multi  323(MiB/s)  80k
vtfs-none-spool          randwrite-libaio-multi  330(MiB/s)  82k
vtfs-none-epool-numa     randwrite-libaio-multi  315(MiB/s)  78k
vtfs-none-epool-1T-numa  randwrite-libaio-multi  409(MiB/s)  102k
vtfs-none-spool-numa     randwrite-libaio-multi  384(MiB/s)  96k
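As a quick sanity check on those takeaways, the relative gains can be computed straight from the bandwidth numbers in the table. The snippet below uses the seqwrite-psync-multi row, one of the larger deltas; the values are copied from the table, and the calculation is illustrative only:

```python
# Bandwidths (MiB/s) for seqwrite-psync-multi, taken from the table above.
epool = 193          # vtfs-none-epool
epool_1t_numa = 358  # vtfs-none-epool-1T-numa

gain = (epool_1t_numa - epool) / epool * 100
print(f"epool -> epool-1T-numa: +{gain:.0f}% bandwidth")  # roughly +85%
```

Similar arithmetic on the psync rows shows the exclusive pool trailing by 50-85% in most non-libaio workloads, which is what drives the "shared or 1-thread by default" conclusion.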
* Re: [Virtio-fs] tools/virtiofs: Multi threading seems to hurt performance
  2020-09-23 12:50 UTC, Chirantan Ekbote
  To: Vivek Goyal; +Cc: virtio-fs-list, qemu-devel

On Sat, Sep 19, 2020 at 6:36 AM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> Hi All,
>
> virtiofsd default thread pool size is 64. To me it feels that in most of
> the cases thread pool size 1 performs better than thread pool size 64.
>
> I ran virtiofs-tests.
>
> https://github.com/rhvgoyal/virtiofs-tests
>
> And here are the comparison results. To me it seems that by default
> we should switch to 1 thread (till we can figure out how to make
> multi-thread performance better even when a single process is doing
> I/O in the client).

FWIW, we've observed the same behavior in crosvm. Using a thread pool
for the virtiofs server consistently gave us worse performance than
using a single thread.

Chirantan
* Re: [Virtio-fs] tools/virtiofs: Multi threading seems to hurt performance
  2020-09-23 12:59 UTC, Vivek Goyal
  To: Chirantan Ekbote; +Cc: virtio-fs-list, qemu-devel

On Wed, Sep 23, 2020 at 09:50:59PM +0900, Chirantan Ekbote wrote:
> On Sat, Sep 19, 2020 at 6:36 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > [original report snipped]
>
> FWIW, we've observed the same behavior in crosvm. Using a thread pool
> for the virtiofs server consistently gave us worse performance than
> using a single thread.

Thanks for sharing this information, Chirantan.

The shared pool seems to perform better than the exclusive pool. Single
thread vs. shared pool is a mixed result, but it looks like one thread
beats the shared pool in many of the tests. We may have to switch to a
single thread as the default at some point if the shared pool does not
live up to expectations.

Vivek
* Re: [Virtio-fs] tools/virtiofs: Multi threading seems to hurt performance
  2020-09-25 11:35 UTC, Dr. David Alan Gilbert
  To: Chirantan Ekbote; +Cc: virtio-fs-list, qemu-devel, Vivek Goyal

* Chirantan Ekbote (chirantan@chromium.org) wrote:
> On Sat, Sep 19, 2020 at 6:36 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > [original report snipped]
>
> FWIW, we've observed the same behavior in crosvm. Using a thread pool
> for the virtiofs server consistently gave us worse performance than
> using a single thread.

Interesting; so it's not just us doing something silly! It does feel
like you *should* be able to get some benefit from multiple threads, so
I guess some more investigation is needed at some point.

Dave

> Chirantan

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK