* [Qemu-devel] Performance problem and improvement about block drive on NFS shares with libnfs
@ 2017-04-01 5:23 Jaden Liang
2017-04-01 5:37 ` Fam Zheng
2017-04-06 10:40 ` Stefan Hajnoczi
0 siblings, 2 replies; 4+ messages in thread
From: Jaden Liang @ 2017-04-01 5:23 UTC (permalink / raw)
To: qemu-devel
Hello,
I recently ran qemu with drive files accessed via libnfs, and found a performance
problem along with an idea for an improvement.
I started qemu with 6 drive parameters like nfs://127.0.0.1/dir/vm-disk-x.qcow2,
pointing at a local NFS server, then used iometer in the guest machine to test
4K random read and random write IO performance. I found that as the IO depth
goes up, the IOPS hit a bottleneck. Looking into the cause, I found that qemu's
main thread was using 100% CPU, and the perf data showed that the CPU hotspots
were the send / recv calls in libnfs. Reading the source code of libnfs and of
the qemu block driver nfs.c: libnfs only supports a single worker thread, and
all the network events of the nfs interface in qemu are registered in the main
thread's epoll. That is why the main thread uses 100% CPU.
Based on that analysis, an improvement idea came up: start a thread for every
drive when libnfs opens the drive file, and create an epoll in each drive
thread to handle all of that drive's network events. I have finished a demo
modification in block/nfs.c and reran iometer in the guest machine; performance
increased a lot. Random read IOPS increased by almost 100%, and random write
IOPS by about 68%.
Test model details:
VM configuration: 6 vdisks in 1 VM
Test tool and parameters: iometer with 4K random read and random write
Backend physical drives: 2 SSDs; the 6 vdisks are spread across the 2 SSDs
Before the modification (IOPS):
IO Depth         1      2      4      8     16     32
4K randread  16659  28387  42932  46868  52108  55760
4K randwrite 12212  19456  30447  30574  35788  39015

After the modification (IOPS):
IO Depth         1      2      4      8     16     32
4K randread  17661  33115  57138  82016  99369 109410
4K randwrite 12669  21492  36017  51532  61475  65577
I can submit a patch that meets the coding standard later. For now I would
like to get some advice on this modification. Is this a reasonable solution
to improve performance on NFS shares, or is there a better way?

Any suggestions would be great! Please also feel free to ask questions.
--
Best regards,
Jaden Liang
* Re: [Qemu-devel] Performance problem and improvement about block drive on NFS shares with libnfs
2017-04-01 5:23 [Qemu-devel] Performance problem and improvement about block drive on NFS shares with libnfs Jaden Liang
@ 2017-04-01 5:37 ` Fam Zheng
2017-04-01 6:28 ` Jaden Liang
2017-04-06 10:40 ` Stefan Hajnoczi
1 sibling, 1 reply; 4+ messages in thread
From: Fam Zheng @ 2017-04-01 5:37 UTC (permalink / raw)
To: Jaden Liang; +Cc: qemu-devel
On Sat, 04/01 13:23, Jaden Liang wrote:
> [...]
Just one comment: in block/file-posix.c (aio=threads), there is a thread pool
that does something similar, using the code in util/thread-pool.c. Maybe it's
usable for your block/nfs.c change too.
Also a question: have you considered modifying libnfs to create more worker
threads? That way all applications using libnfs can benefit.
Fam
* Re: [Qemu-devel] Performance problem and improvement about block drive on NFS shares with libnfs
2017-04-01 5:37 ` Fam Zheng
@ 2017-04-01 6:28 ` Jaden Liang
0 siblings, 0 replies; 4+ messages in thread
From: Jaden Liang @ 2017-04-01 6:28 UTC (permalink / raw)
To: Fam Zheng; +Cc: qemu-devel
2017-04-01 13:37 GMT+08:00 Fam Zheng <famz@redhat.com>:
> On Sat, 04/01 13:23, Jaden Liang wrote:
>> [...]
>
> Just one comment: in block/file-posix.c (aio=threads), there is a thread pool
> that does something similar, using the code util/thread-pool.c. Maybe it's
> usable for your block/nfs.c change too.
>
> Also a question: have you considered modifying libnfs to create more worker
> threads? That way all applications using libnfs can benefit.
>
> Fam
Modifying libnfs is also a solution. However, when I looked into libnfs, I
found that its design is entirely single-threaded; it would take a lot of work
to make it support a multi-threaded mode. That is why I chose to modify qemu's
block/nfs.c instead, since there are already similar approaches there, such as
file-posix.c.
--
Best regards,
Jaden Liang
* Re: [Qemu-devel] Performance problem and improvement about block drive on NFS shares with libnfs
2017-04-01 5:23 [Qemu-devel] Performance problem and improvement about block drive on NFS shares with libnfs Jaden Liang
2017-04-01 5:37 ` Fam Zheng
@ 2017-04-06 10:40 ` Stefan Hajnoczi
1 sibling, 0 replies; 4+ messages in thread
From: Stefan Hajnoczi @ 2017-04-06 10:40 UTC (permalink / raw)
To: Jaden Liang; +Cc: qemu-devel
On Sat, Apr 01, 2017 at 01:23:46PM +0800, Jaden Liang wrote:
> [...]
Did you try using -object iothread,id=iothread1 -device
virtio-blk-pci,iothread=iothread1,... to define an IOThread for each
virtio-blk-pci device?

The block/nfs.c code already supports IOThreads, so you can run multiple
threads and avoid spending 100% CPU in the main loop.
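For example, a command line along these lines (disk path and ids below are
placeholders):

```shell
# One IOThread per virtio-blk device; that device's event handling then
# runs outside the main loop.
qemu-system-x86_64 \
    -object iothread,id=iothread1 \
    -drive file=nfs://127.0.0.1/dir/vm-disk-1.qcow2,format=qcow2,if=none,id=drive1 \
    -device virtio-blk-pci,drive=drive1,iothread=iothread1
```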
Stefan