* Re: virtio-fs performance
       [not found] <CAFKS8hWbckrE_cyJCf0pgFresD-JQk66wo-6uJA=Gu2MhReHVw@mail.gmail.com>
@ 2020-07-28 13:49   ` Stefan Hajnoczi
  0 siblings, 0 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2020-07-28 13:49 UTC (permalink / raw)
  To: jwsu1986; +Cc: virtio-fs, qemu-devel, qemu-discuss

> I'm trying and testing the virtio-fs feature in QEMU v5.0.0.
> My host and guest OS are both ubuntu 18.04 with kernel 5.4, and the
> underlying storage is one single SSD.
> 
> The configurations are:
> (1) virtiofsd
> ./virtiofsd -o 
> source=/mnt/ssd/virtiofs,cache=auto,flock,posix_lock,writeback,xattr
> --thread-pool-size=1 --socket-path=/tmp/vhostqemu
> 
> (2) qemu
> qemu-system-x86_64 \
> -enable-kvm \
> -name ubuntu \
> -cpu Westmere \
> -m 4096 \
> -global kvm-apic.vapic=false \
> -netdev tap,id=hn0,vhost=off,br=br0,helper=/usr/local/libexec/qemu-bridge-helper
> \
> -device e1000,id=e0,netdev=hn0 \
> -blockdev '{"node-name": "disk0", "driver": "qcow2",
> "refcount-cache-size": 1638400, "l2-cache-size": 6553600, "file": {
> "driver": "file", "filename": "'${imagefolder}\/ubuntu.qcow2'"}}' \
> -device virtio-blk,drive=disk0,id=disk0 \
> -chardev socket,id=ch0,path=/tmp/vhostqemu \
> -device vhost-user-fs-pci,chardev=ch0,tag=myfs \
> -object memory-backend-memfd,id=mem,size=4G,share=on \
> -numa node,memdev=mem \
> -qmp stdio \
> -vnc :0
> 
> (3) guest
> mount -t virtiofs myfs /mnt/virtiofs
> 
> I tried to change virtiofsd's --thread-pool-size value and test the
> storage performance by fio.
> Before each read/write/randread/randwrite test, the pagecaches of
> guest and host are dropped.
> 
> ```
> RW="read" # or write/randread/randwrite
> fio --name=test --rw=$RW --bs=4k --numjobs=1 --ioengine=libaio
> --runtime=60 --direct=0 --iodepth=64 --size=10g
> --filename=/mnt/virtiofs/testfile
> done
> ```
> 
> --thread-pool-size=64 (default)
>     seq read: 305 MB/s
>     seq write: 118 MB/s
>     rand 4KB read: 2222 IOPS
>     rand 4KB write: 21100 IOPS
> 
> --thread-pool-size=1
>     seq read: 387 MB/s
>     seq write: 160 MB/s
>     rand 4KB read: 2622 IOPS
>     rand 4KB write: 30400 IOPS
> 
> The results show the performance using default-pool-size (64) is
> poorer than using single thread.
> Is it due to the lock contention of the multiple threads?
> When can virtio-fs get better performance using multiple threads?
> 
> 
> I also tested the performance that guest accesses host's files via
> NFSv4/CIFS network filesystem.
> The "seq read" and "randread" performance of virtio-fs are also worse
> than the NFSv4 and CIFS.
> 
> NFSv4:
>   seq write: 244 MB/s
>   rand 4K read: 4086 IOPS
> 
> I cannot figure out why the perf of NFSv4/CIFS with the network stack
> is better than virtio-fs.
> Is it expected? Or, do I have an incorrect configuration?

No, I remember benchmarking the thread pool and did not see such a big
difference.

Please use direct=1 so that each I/O results in a virtio-fs request.
Otherwise the I/O pattern is not directly controlled by the benchmark
but by the page cache (readahead, etc).
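
For example, your fio invocation with only that flag changed (rw=read shown;
substitute write/randread/randwrite as before):

  fio --name=test --rw=read --bs=4k --numjobs=1 --ioengine=libaio \
      --runtime=60 --direct=1 --iodepth=64 --size=10g \
      --filename=/mnt/virtiofs/testfile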

Using numactl(8) or taskset(1) to launch virtiofsd allows you to control
NUMA and CPU scheduling properties. For example, you could force all 64
threads to run on the same host CPU using taskset to see if that helps
this I/O bound workload.
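
A rough sketch (the CPU number here is arbitrary; the affinity set by taskset
is inherited by the worker threads virtiofsd spawns later):

  taskset -c 0 ./virtiofsd \
      -o source=/mnt/ssd/virtiofs,cache=auto,flock,posix_lock,writeback,xattr \
      --thread-pool-size=64 --socket-path=/tmp/vhostqemu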

fio can collect detailed statistics on queue depths and a latency
histogram. It would be interesting to compare the --thread-pool-size=64
and --thread-pool-size=1 numbers.
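
The "IO depths" and "clat percentiles" sections of fio's normal output already
contain that information; if you also want raw per-I/O latencies to compare
offline, appending something like the following to the fio command should work
(the log file prefix is arbitrary):

  fio <same options as above> --write_lat_log=virtiofs-pool64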

Comparing the "perf record -e kvm:kvm_exit" counts between the two might
also be interesting.
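
For instance, assuming the kvm:kvm_exit tracepoint is available on the host,
either of these should give a count over a fixed window:

  perf stat -a -e kvm:kvm_exit sleep 5
  perf record -a -e kvm:kvm_exit -- sleep 5 && perf report --stdio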

Stefan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Virtio-fs] virtio-fs performance
  2020-07-28 13:49   ` [Virtio-fs] " Stefan Hajnoczi
@ 2020-07-28 15:27   ` Vivek Goyal
  2020-08-04  7:51       ` Derek Su
  -1 siblings, 1 reply; 11+ messages in thread
From: Vivek Goyal @ 2020-07-28 15:27 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-fs, jwsu1986, qemu-devel, qemu-discuss

On Tue, Jul 28, 2020 at 02:49:36PM +0100, Stefan Hajnoczi wrote:
> > I'm trying and testing the virtio-fs feature in QEMU v5.0.0.
> > My host and guest OS are both ubuntu 18.04 with kernel 5.4, and the
> > underlying storage is one single SSD.
> > 
> > The configurations are:
> > (1) virtiofsd
> > ./virtiofsd -o 
> > source=/mnt/ssd/virtiofs,cache=auto,flock,posix_lock,writeback,xattr
> > --thread-pool-size=1 --socket-path=/tmp/vhostqemu
> > 
> > (2) qemu
> > qemu-system-x86_64 \
> > -enable-kvm \
> > -name ubuntu \
> > -cpu Westmere \
> > -m 4096 \
> > -global kvm-apic.vapic=false \
> > -netdev tap,id=hn0,vhost=off,br=br0,helper=/usr/local/libexec/qemu-bridge-helper
> > \
> > -device e1000,id=e0,netdev=hn0 \
> > -blockdev '{"node-name": "disk0", "driver": "qcow2",
> > "refcount-cache-size": 1638400, "l2-cache-size": 6553600, "file": {
> > "driver": "file", "filename": "'${imagefolder}\/ubuntu.qcow2'"}}' \
> > -device virtio-blk,drive=disk0,id=disk0 \
> > -chardev socket,id=ch0,path=/tmp/vhostqemu \
> > -device vhost-user-fs-pci,chardev=ch0,tag=myfs \
> > -object memory-backend-memfd,id=mem,size=4G,share=on \
> > -numa node,memdev=mem \
> > -qmp stdio \
> > -vnc :0
> > 
> > (3) guest
> > mount -t virtiofs myfs /mnt/virtiofs
> > 
> > I tried to change virtiofsd's --thread-pool-size value and test the
> > storage performance by fio.
> > Before each read/write/randread/randwrite test, the pagecaches of
> > guest and host are dropped.
> > 
> > ```
> > RW="read" # or write/randread/randwrite
> > fio --name=test --rw=$RW --bs=4k --numjobs=1 --ioengine=libaio
> > --runtime=60 --direct=0 --iodepth=64 --size=10g
> > --filename=/mnt/virtiofs/testfile
> > done

A couple of things:

- Can you try the cache=none option in virtiofsd? That will bypass the page
  cache in the guest. It also gets rid of the latencies related to
  file_remove_privs() for now. (A sketch of such an invocation follows below,
  after the links.)

- Also, with direct=0 are we really driving an iodepth of 64? With direct=0
  it is cached I/O. Is it still asynchronous at that point, or have we
  fallen back to synchronous I/O and are effectively driving a queue depth
  of 1?

- With cache=auto/always, I am seeing performance issues with small writes
  and am trying to address them:

https://lore.kernel.org/linux-fsdevel/20200716144032.GC422759@redhat.com/
https://lore.kernel.org/linux-fsdevel/20200724183812.19573-1-vgoyal@redhat.com/
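
For the first point above, a cache=none invocation could look roughly like
this (writeback is dropped here because it only makes sense together with
guest-side caching; treat this as a sketch, not a recommended configuration):

  ./virtiofsd -o source=/mnt/ssd/virtiofs,cache=none,flock,posix_lock,xattr \
      --thread-pool-size=64 --socket-path=/tmp/vhostqemu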

Thanks
Vivek

> > ```
> > 
> > --thread-pool-size=64 (default)
> >     seq read: 305 MB/s
> >     seq write: 118 MB/s
> >     rand 4KB read: 2222 IOPS
> >     rand 4KB write: 21100 IOPS
> > 
> > --thread-pool-size=1
> >     seq read: 387 MB/s
> >     seq write: 160 MB/s
> >     rand 4KB read: 2622 IOPS
> >     rand 4KB write: 30400 IOPS
> > 
> > The results show the performance using default-pool-size (64) is
> > poorer than using single thread.
> > Is it due to the lock contention of the multiple threads?
> > When can virtio-fs get better performance using multiple threads?
> > 
> > 
> > I also tested the performance that guest accesses host's files via
> > NFSv4/CIFS network filesystem.
> > The "seq read" and "randread" performance of virtio-fs are also worse
> > than the NFSv4 and CIFS.
> > 
> > NFSv4:
> >   seq write: 244 MB/s
> >   rand 4K read: 4086 IOPS
> > 
> > I cannot figure out why the perf of NFSv4/CIFS with the network stack
> > is better than virtio-fs.
> > Is it expected? Or, do I have an incorrect configuration?
> 
> No, I remember benchmarking the thread pool and did not see such a big
> difference.
> 
> Please use direct=1 so that each I/O results in a virtio-fs request.
> Otherwise the I/O pattern is not directly controlled by the benchmark
> but by the page cache (readahead, etc).
> 
> Using numactl(8) or taskset(1) to launch virtiofsd allows you to control
> NUMA and CPU scheduling properties. For example, you could force all 64
> threads to run on the same host CPU using taskset to see if that helps
> this I/O bound workload.
> 
> fio can collect detailed statistics on queue depths and a latency
> histogram. It would be interesting to compare the --thread-pool-size=64
> and --thread-pool-size=1 numbers.
> 
> Comparing the "perf record -e kvm:kvm_exit" counts between the two might
> also be interesting.
> 
> Stefan



> _______________________________________________
> Virtio-fs mailing list
> Virtio-fs@redhat.com
> https://www.redhat.com/mailman/listinfo/virtio-fs


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: virtio-fs performance
  2020-07-28 13:49   ` [Virtio-fs] " Stefan Hajnoczi
@ 2020-08-04  7:37     ` Derek Su
  -1 siblings, 0 replies; 11+ messages in thread
From: Derek Su @ 2020-08-04  7:37 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-fs, qemu-devel, qemu-discuss

Hello,

With cache=none set in virtiofsd and direct=1 in fio,
here are the results and the kvm-exit counts over 5 seconds.

--thread-pool-size=64 (default)
    seq read: 307 MB/s (kvm-exit count=1076463)
    seq write: 430 MB/s (kvm-exit count=1302493)
    rand 4KB read: 65.2k IOPS (kvm-exit count=1322899)
    rand 4KB write: 97.2k IOPS (kvm-exit count=1568618)

--thread-pool-size=1
    seq read: 303 MB/s (kvm-exit count=1034614)
    seq write: 358 MB/s. (kvm-exit count=1537735)
    rand 4KB read: 7995 IOPS (kvm-exit count=438348)
    rand 4KB write: 97.7k IOPS (kvm-exit count=1907585)

--thread-pool-size=64 improves the rand 4KB read performance greatly,
but does not increase the kvm-exit count by much.

In addition, the fio average clat for rand 4K write is 960us with
thread-pool-size=64 and 7700us with thread-pool-size=1.

Regards,
Derek

Stefan Hajnoczi <stefanha@redhat.com> 於 2020年7月28日 週二 下午9:49寫道:
>
> > I'm trying and testing the virtio-fs feature in QEMU v5.0.0.
> > My host and guest OS are both ubuntu 18.04 with kernel 5.4, and the
> > underlying storage is one single SSD.
> >
> > The configurations are:
> > (1) virtiofsd
> > ./virtiofsd -o
> > source=/mnt/ssd/virtiofs,cache=auto,flock,posix_lock,writeback,xattr
> > --thread-pool-size=1 --socket-path=/tmp/vhostqemu
> >
> > (2) qemu
> > qemu-system-x86_64 \
> > -enable-kvm \
> > -name ubuntu \
> > -cpu Westmere \
> > -m 4096 \
> > -global kvm-apic.vapic=false \
> > -netdev tap,id=hn0,vhost=off,br=br0,helper=/usr/local/libexec/qemu-bridge-helper
> > \
> > -device e1000,id=e0,netdev=hn0 \
> > -blockdev '{"node-name": "disk0", "driver": "qcow2",
> > "refcount-cache-size": 1638400, "l2-cache-size": 6553600, "file": {
> > "driver": "file", "filename": "'${imagefolder}\/ubuntu.qcow2'"}}' \
> > -device virtio-blk,drive=disk0,id=disk0 \
> > -chardev socket,id=ch0,path=/tmp/vhostqemu \
> > -device vhost-user-fs-pci,chardev=ch0,tag=myfs \
> > -object memory-backend-memfd,id=mem,size=4G,share=on \
> > -numa node,memdev=mem \
> > -qmp stdio \
> > -vnc :0
> >
> > (3) guest
> > mount -t virtiofs myfs /mnt/virtiofs
> >
> > I tried to change virtiofsd's --thread-pool-size value and test the
> > storage performance by fio.
> > Before each read/write/randread/randwrite test, the pagecaches of
> > guest and host are dropped.
> >
> > ```
> > RW="read" # or write/randread/randwrite
> > fio --name=test --rw=$RW --bs=4k --numjobs=1 --ioengine=libaio
> > --runtime=60 --direct=0 --iodepth=64 --size=10g
> > --filename=/mnt/virtiofs/testfile
> > done
> > ```
> >
> > --thread-pool-size=64 (default)
> >     seq read: 305 MB/s
> >     seq write: 118 MB/s
> >     rand 4KB read: 2222 IOPS
> >     rand 4KB write: 21100 IOPS
> >
> > --thread-pool-size=1
> >     seq read: 387 MB/s
> >     seq write: 160 MB/s
> >     rand 4KB read: 2622 IOPS
> >     rand 4KB write: 30400 IOPS
> >
> > The results show the performance using default-pool-size (64) is
> > poorer than using single thread.
> > Is it due to the lock contention of the multiple threads?
> > When can virtio-fs get better performance using multiple threads?
> >
> >
> > I also tested the performance that guest accesses host's files via
> > NFSv4/CIFS network filesystem.
> > The "seq read" and "randread" performance of virtio-fs are also worse
> > than the NFSv4 and CIFS.
> >
> > NFSv4:
> >   seq write: 244 MB/s
> >   rand 4K read: 4086 IOPS
> >
> > I cannot figure out why the perf of NFSv4/CIFS with the network stack
> > is better than virtio-fs.
> > Is it expected? Or, do I have an incorrect configuration?
>
> No, I remember benchmarking the thread pool and did not see such a big
> difference.
>
> Please use direct=1 so that each I/O results in a virtio-fs request.
> Otherwise the I/O pattern is not directly controlled by the benchmark
> but by the page cache (readahead, etc).
>
> Using numactl(8) or taskset(1) to launch virtiofsd allows you to control
> NUMA and CPU scheduling properties. For example, you could force all 64
> threads to run on the same host CPU using taskset to see if that helps
> this I/O bound workload.
>
> fio can collect detailed statistics on queue depths and a latency
> histogram. It would be interesting to compare the --thread-pool-size=64
> and --thread-pool-size=1 numbers.
>
> Comparing the "perf record -e kvm:kvm_exit" counts between the two might
> also be interesting.
>
> Stefan


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Virtio-fs] virtio-fs performance
  2020-07-28 15:27   ` Vivek Goyal
@ 2020-08-04  7:51       ` Derek Su
  0 siblings, 0 replies; 11+ messages in thread
From: Derek Su @ 2020-08-04  7:51 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, qemu-discuss

Vivek Goyal <vgoyal@redhat.com> 於 2020年7月28日 週二 下午11:27寫道:
>
> On Tue, Jul 28, 2020 at 02:49:36PM +0100, Stefan Hajnoczi wrote:
> > > I'm trying and testing the virtio-fs feature in QEMU v5.0.0.
> > > My host and guest OS are both ubuntu 18.04 with kernel 5.4, and the
> > > underlying storage is one single SSD.
> > >
> > > The configurations are:
> > > (1) virtiofsd
> > > ./virtiofsd -o
> > > source=/mnt/ssd/virtiofs,cache=auto,flock,posix_lock,writeback,xattr
> > > --thread-pool-size=1 --socket-path=/tmp/vhostqemu
> > >
> > > (2) qemu
> > > qemu-system-x86_64 \
> > > -enable-kvm \
> > > -name ubuntu \
> > > -cpu Westmere \
> > > -m 4096 \
> > > -global kvm-apic.vapic=false \
> > > -netdev tap,id=hn0,vhost=off,br=br0,helper=/usr/local/libexec/qemu-bridge-helper
> > > \
> > > -device e1000,id=e0,netdev=hn0 \
> > > -blockdev '{"node-name": "disk0", "driver": "qcow2",
> > > "refcount-cache-size": 1638400, "l2-cache-size": 6553600, "file": {
> > > "driver": "file", "filename": "'${imagefolder}\/ubuntu.qcow2'"}}' \
> > > -device virtio-blk,drive=disk0,id=disk0 \
> > > -chardev socket,id=ch0,path=/tmp/vhostqemu \
> > > -device vhost-user-fs-pci,chardev=ch0,tag=myfs \
> > > -object memory-backend-memfd,id=mem,size=4G,share=on \
> > > -numa node,memdev=mem \
> > > -qmp stdio \
> > > -vnc :0
> > >
> > > (3) guest
> > > mount -t virtiofs myfs /mnt/virtiofs
> > >
> > > I tried to change virtiofsd's --thread-pool-size value and test the
> > > storage performance by fio.
> > > Before each read/write/randread/randwrite test, the pagecaches of
> > > guest and host are dropped.
> > >
> > > ```
> > > RW="read" # or write/randread/randwrite
> > > fio --name=test --rw=$RW --bs=4k --numjobs=1 --ioengine=libaio
> > > --runtime=60 --direct=0 --iodepth=64 --size=10g
> > > --filename=/mnt/virtiofs/testfile
> > > done
>
> Couple of things.
>
> - Can you try cache=none option in virtiofsd. That will bypass page
>   cache in guest. It also gets rid of latencies related to
>   file_remove_privs() as of now.
>
> - Also with direct=0, are we really driving iodepth of 64? With direct=0
>   it is cached I/O. Is it still asynchronous at this point of time of
>   we have fallen back to synchronous I/O and driving queue depth of
>   1.

Hi, Vivek

I did not see any difference in queue depth with direct={0|1} in my fio test.
Are there any other clues for digging into this issue?

>
> - With cache=auto/always, I am seeing performance issues with small writes
>   and trying to address it.
>
> https://lore.kernel.org/linux-fsdevel/20200716144032.GC422759@redhat.com/
> https://lore.kernel.org/linux-fsdevel/20200724183812.19573-1-vgoyal@redhat.com/

No problem, I'll try it, thanks.

Regards,
Derek

>
> Thanks
> Vivek
>
> > > ```
> > >
> > > --thread-pool-size=64 (default)
> > >     seq read: 305 MB/s
> > >     seq write: 118 MB/s
> > >     rand 4KB read: 2222 IOPS
> > >     rand 4KB write: 21100 IOPS
> > >
> > > --thread-pool-size=1
> > >     seq read: 387 MB/s
> > >     seq write: 160 MB/s
> > >     rand 4KB read: 2622 IOPS
> > >     rand 4KB write: 30400 IOPS
> > >
> > > The results show the performance using default-pool-size (64) is
> > > poorer than using single thread.
> > > Is it due to the lock contention of the multiple threads?
> > > When can virtio-fs get better performance using multiple threads?
> > >
> > >
> > > I also tested the performance that guest accesses host's files via
> > > NFSv4/CIFS network filesystem.
> > > The "seq read" and "randread" performance of virtio-fs are also worse
> > > than the NFSv4 and CIFS.
> > >
> > > NFSv4:
> > >   seq write: 244 MB/s
> > >   rand 4K read: 4086 IOPS
> > >
> > > I cannot figure out why the perf of NFSv4/CIFS with the network stack
> > > is better than virtio-fs.
> > > Is it expected? Or, do I have an incorrect configuration?
> >
> > No, I remember benchmarking the thread pool and did not see such a big
> > difference.
> >
> > Please use direct=1 so that each I/O results in a virtio-fs request.
> > Otherwise the I/O pattern is not directly controlled by the benchmark
> > but by the page cache (readahead, etc).
> >
> > Using numactl(8) or taskset(1) to launch virtiofsd allows you to control
> > NUMA and CPU scheduling properties. For example, you could force all 64
> > threads to run on the same host CPU using taskset to see if that helps
> > this I/O bound workload.
> >
> > fio can collect detailed statistics on queue depths and a latency
> > histogram. It would be interesting to compare the --thread-pool-size=64
> > and --thread-pool-size=1 numbers.
> >
> > Comparing the "perf record -e kvm:kvm_exit" counts between the two might
> > also be interesting.
> >
> > Stefan
>
>
>
> > _______________________________________________
> > Virtio-fs mailing list
> > Virtio-fs@redhat.com
> > https://www.redhat.com/mailman/listinfo/virtio-fs
>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: virtio-fs performance
  2020-08-04  7:37     ` [Virtio-fs] " Derek Su
@ 2020-08-05 11:52       ` Stefan Hajnoczi
  -1 siblings, 0 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2020-08-05 11:52 UTC (permalink / raw)
  To: Derek Su; +Cc: virtio-fs, qemu-devel, qemu-discuss

On Tue, Aug 04, 2020 at 03:37:26PM +0800, Derek Su wrote:
> Set the cache=none in virtiofsd and direct=1 in fio,
> here are the results and kvm-exit count in 5 seconds.
> 
> --thread-pool-size=64 (default)
>     seq read: 307 MB/s (kvm-exit count=1076463)
>     seq write: 430 MB/s (kvm-exit count=1302493)
>     rand 4KB read: 65.2k IOPS (kvm-exit count=1322899)
>     rand 4KB write: 97.2k IOPS (kvm-exit count=1568618)
> 
> --thread-pool-size=1
>     seq read: 303 MB/s (kvm-exit count=1034614)
>     seq write: 358 MB/s. (kvm-exit count=1537735)
>     rand 4KB read: 7995 IOPS (kvm-exit count=438348)
>     rand 4KB write: 97.7k IOPS (kvm-exit count=1907585)
> 
> The thread-pool-size=64 improves the rand 4KB read performance largely,
> but doesn't increases the kvm-exit count too much.
> 
> In addition, the fio avg. clat of rand 4K write are 960us for
> thread-pool-size=64 and 7700us for thread-pool-size=1.

These numbers make sense to me. The thread pool is generally faster.

Note that virtiofsd opens files without O_DIRECT, even with the
cache=none option. This explains why rand 4KB write reaches 97.7k IOPS
while rand 4KB read only does 7995 IOPS in the --thread-pool-size=1 case
(random reads result in page cache misses on the host).

I don't have a good explanation of why the thread pool was slower with
direct=0 though :(. One way to investigate that is by checking whether
the I/O pattern submitted by the guest is comparable between
--thread-pool-size=64 and --thread-pool-size=1. You could try to observe
this by tracing virtiofsd preadv()/pwritev() system calls.
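
One way to do that (a sketch; it assumes strace overhead is acceptable for a
short window) is to attach to the running daemon, log just the read/write
syscalls, and then compare request counts and sizes between the two runs:

  strace -f -tt -e trace=preadv,pwritev,preadv2,pwritev2 \
         -o /tmp/virtiofsd-io.log -p $(pidof virtiofsd)

If nothing shows up, widening the filter to pread64/pwrite64 may be needed,
depending on which syscalls this virtiofsd build actually issues.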

If you find that --thread-pool-size=64 issued more I/O requests with
smaller block sizes, then it's probably a timing issue where the guest
page cache responds differently because the virtiofsd thread pool
completes requests at a different rate. Maybe it affects how the guest
page cache is populated, and a slower virtiofsd leads to more efficient
page cache activity in the guest (i.e. fewer and bigger FUSE read/write
requests)?

Stefan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Virtio-fs] virtio-fs performance
  2020-08-04  7:51       ` Derek Su
@ 2020-08-06 20:07         ` Vivek Goyal
  -1 siblings, 0 replies; 11+ messages in thread
From: Vivek Goyal @ 2020-08-06 20:07 UTC (permalink / raw)
  To: Derek Su; +Cc: virtio-fs, qemu-devel, Stefan Hajnoczi, qemu-discuss

On Tue, Aug 04, 2020 at 03:51:50PM +0800, Derek Su wrote:
> Vivek Goyal <vgoyal@redhat.com> 於 2020年7月28日 週二 下午11:27寫道:
> >
> > On Tue, Jul 28, 2020 at 02:49:36PM +0100, Stefan Hajnoczi wrote:
> > > > I'm trying and testing the virtio-fs feature in QEMU v5.0.0.
> > > > My host and guest OS are both ubuntu 18.04 with kernel 5.4, and the
> > > > underlying storage is one single SSD.
> > > >
> > > > The configurations are:
> > > > (1) virtiofsd
> > > > ./virtiofsd -o
> > > > source=/mnt/ssd/virtiofs,cache=auto,flock,posix_lock,writeback,xattr
> > > > --thread-pool-size=1 --socket-path=/tmp/vhostqemu
> > > >
> > > > (2) qemu
> > > > qemu-system-x86_64 \
> > > > -enable-kvm \
> > > > -name ubuntu \
> > > > -cpu Westmere \
> > > > -m 4096 \
> > > > -global kvm-apic.vapic=false \
> > > > -netdev tap,id=hn0,vhost=off,br=br0,helper=/usr/local/libexec/qemu-bridge-helper
> > > > \
> > > > -device e1000,id=e0,netdev=hn0 \
> > > > -blockdev '{"node-name": "disk0", "driver": "qcow2",
> > > > "refcount-cache-size": 1638400, "l2-cache-size": 6553600, "file": {
> > > > "driver": "file", "filename": "'${imagefolder}\/ubuntu.qcow2'"}}' \
> > > > -device virtio-blk,drive=disk0,id=disk0 \
> > > > -chardev socket,id=ch0,path=/tmp/vhostqemu \
> > > > -device vhost-user-fs-pci,chardev=ch0,tag=myfs \
> > > > -object memory-backend-memfd,id=mem,size=4G,share=on \
> > > > -numa node,memdev=mem \
> > > > -qmp stdio \
> > > > -vnc :0
> > > >
> > > > (3) guest
> > > > mount -t virtiofs myfs /mnt/virtiofs
> > > >
> > > > I tried to change virtiofsd's --thread-pool-size value and test the
> > > > storage performance by fio.
> > > > Before each read/write/randread/randwrite test, the pagecaches of
> > > > guest and host are dropped.
> > > >
> > > > ```
> > > > RW="read" # or write/randread/randwrite
> > > > fio --name=test --rw=$RW --bs=4k --numjobs=1 --ioengine=libaio
> > > > --runtime=60 --direct=0 --iodepth=64 --size=10g
> > > > --filename=/mnt/virtiofs/testfile
> > > > done
> >
> > Couple of things.
> >
> > - Can you try cache=none option in virtiofsd. That will bypass page
> >   cache in guest. It also gets rid of latencies related to
> >   file_remove_privs() as of now.
> >
> > - Also with direct=0, are we really driving iodepth of 64? With direct=0
> >   it is cached I/O. Is it still asynchronous at this point of time of
> >   we have fallen back to synchronous I/O and driving queue depth of
> >   1.
> 
> Hi, Vivek
> 
> I did not see any difference in queue depth with direct={0|1} in my fio test.
> Are there more clues to dig into this issue?

I just tried it again. fio seems to report a queue depth of 64 in both
cases, but I am not sure that is correct, because I get much better
performance with direct=1. The fio man page also says:

 libaio Linux native asynchronous I/O. Note that Linux may
        only support queued behavior with non-buffered I/O
        (set  `direct=1'  or  `buffered=0').   This engine
        defines engine specific options.

Do you see a difference in effective bandwidth/IOPS when you run with
direct=0 vs direct=1? I see it.

Anyway, in an attempt to narrow down the issues, I ran virtiofsd
with cache=none and did not enable xattr. (As of now, the xattr case
still needs to be optimized with SB_NOSEC.)

I ran virtiofsd as follows.

./virtiofsd --socket-path=/tmp/vhostqemu2 -o source=/mnt/sdb/virtiofs-source2/ -o no_posix_lock -o modcaps=+sys_admin -o log_level=info -o cache=none --daemonize

And then ran following fio commands with direct=0 and direct=1.

fio --name=test --rw=randwrite --bs=4K --numjobs=1 --ioengine=libaio --runtime=30 --direct=0 --iodepth=64 --filename=fio-file1

direct=0
--------
write: IOPS=8712, BW=34.0MiB/s (35.7MB/s)(1021MiB/30001msec)

direct=1
--------
write: IOPS=84.4k, BW=330MiB/s (346MB/s)(4096MiB/12428msec)

So I see almost a 10-fold jump in throughput with direct=1, which makes me
believe that direct=0 is not really driving the queue depth.

You raised the interesting issue of --thread-pool-size=1 vs 64, and I decided
to give it a try. I ran the same tests as above with a thread pool size of 1;
the results follow.

with direct=0
-------------
write: IOPS=14.7k, BW=57.4MiB/s (60.2MB/s)(1721MiB/30001msec)

with direct=1
-------------
write: IOPS=71.7k, BW=280MiB/s (294MB/s)(4096MiB/14622msec);

So when we are effectively driving a queue depth of 1 (direct=0),
--thread-pool-size=1 seems to help: I see higher IOPS. But when we are
driving a queue depth of 64 (direct=1), --thread-pool-size=1 seems to hurt.

Now the question is why the default thread pool size of 64 hurts so much
in the effective queue-depth-1 case.

You raised another issue: virtio-fs being slower than NFSv4/CIFS. I think
you could run virtiofsd with cache=none and without enabling xattr,
and post the results here, so that we have some idea of how much better
NFSv4/CIFS really is.

Thanks
Vivek



^ permalink raw reply	[flat|nested] 11+ messages in thread
