* [Qemu-devel] Performance problem and improvement about block drive on NFS shares with libnfs
@ 2017-04-01 5:23 Jaden Liang
2017-04-01 5:37 ` Fam Zheng
2017-04-06 10:40 ` Stefan Hajnoczi
0 siblings, 2 replies; 4+ messages in thread
From: Jaden Liang @ 2017-04-01 5:23 UTC (permalink / raw)
To: qemu-devel
Hello,
I recently ran qemu with drive files accessed via libnfs, and found a performance
problem along with an idea for an improvement.
I started qemu with 6 drive parameters like nfs://127.0.0.1/dir/vm-disk-x.qcow2,
pointing at a local NFS server, then used iometer in the guest machine to test
4K random read and random write IO performance. I found that as the IO depth
goes up, the IOPS hit a bottleneck. Looking into the cause, I found that qemu's
main thread was using 100% CPU, and the perf data showed that the CPU hotspots
were the send / recv calls in libnfs. Reading the source code of libnfs and of
the qemu block driver nfs.c: libnfs only supports a single worker thread, and
all the network events of the nfs interface in qemu are registered in the main
thread's epoll. That is why the main thread uses 100% CPU.
Based on that analysis, an improvement idea came up: start a thread for every
drive when libnfs opens the drive file, and create an epoll in each drive
thread to handle all of that drive's network events. I have finished a demo
modification in block/nfs.c and reran iometer in the guest machine; performance
increased a lot. Random read IOPS increased by almost 100%, and random write
IOPS by about 68%.
Test model details:
VM configuration: 6 vdisks in 1 VM
Test tool and parameters: iometer with 4K random read and random write
Backend physical drives: 2 SSDs; the 6 vdisks are spread across the 2 SSDs
Before the modification (IOPS):
IO Depth         1      2      4      8     16     32
4K randread  16659  28387  42932  46868  52108  55760
4K randwrite 12212  19456  30447  30574  35788  39015

After the modification (IOPS):
IO Depth         1      2      4      8     16     32
4K randread  17661  33115  57138  82016  99369 109410
4K randwrite 12669  21492  36017  51532  61475  65577
I can submit a patch that meets the coding standard later. For now I would
like to get some advice on this modification. Is this a reasonable solution
to improve performance on NFS shares, or is there a better way?

Any suggestions would be great! Please also feel free to ask questions.
--
Best regards,
Jaden Liang
* Re: [Qemu-devel] Performance problem and improvement about block drive on NFS shares with libnfs
2017-04-01 5:23 [Qemu-devel] Performance problem and improvement about block drive on NFS shares with libnfs Jaden Liang
@ 2017-04-01 5:37 ` Fam Zheng
2017-04-01 6:28 ` Jaden Liang
2017-04-06 10:40 ` Stefan Hajnoczi
1 sibling, 1 reply; 4+ messages in thread
From: Fam Zheng @ 2017-04-01 5:37 UTC (permalink / raw)
To: Jaden Liang; +Cc: qemu-devel
On Sat, 04/01 13:23, Jaden Liang wrote:
> [...]
Just one comment: in block/file-posix.c (aio=threads), there is a thread pool
that does something similar, using the code in util/thread-pool.c. Maybe it's
usable for your block/nfs.c change too.
Also a question: have you considered modifying libnfs to create more worker
threads? That way all applications using libnfs can benefit.
Fam
* Re: [Qemu-devel] Performance problem and improvement about block drive on NFS shares with libnfs
2017-04-01 5:37 ` Fam Zheng
@ 2017-04-01 6:28 ` Jaden Liang
0 siblings, 0 replies; 4+ messages in thread
From: Jaden Liang @ 2017-04-01 6:28 UTC (permalink / raw)
To: Fam Zheng; +Cc: qemu-devel
2017-04-01 13:37 GMT+08:00 Fam Zheng <famz@redhat.com>:
> On Sat, 04/01 13:23, Jaden Liang wrote:
>> [...]
>
> Just one comment: in block/file-posix.c (aio=threads), there is a thread pool
> that does something similar, using the code util/thread-pool.c. Maybe it's
> usable for your block/nfs.c change too.
>
> Also a question: have you considered modifying libnfs to create more worker
> threads? That way all applications using libnfs can benefit.
>
> Fam
Modifying libnfs is also a solution. However, when I looked into libnfs, I
found that its design is entirely single-threaded; it would take a lot of work
to make it support a multi-threaded mode. That is why I chose to modify qemu's
block/nfs.c instead, since there are already similar approaches there, such as
file-posix.c.
--
Best regards,
Jaden Liang
* Re: [Qemu-devel] Performance problem and improvement about block drive on NFS shares with libnfs
2017-04-01 5:23 [Qemu-devel] Performance problem and improvement about block drive on NFS shares with libnfs Jaden Liang
2017-04-01 5:37 ` Fam Zheng
@ 2017-04-06 10:40 ` Stefan Hajnoczi
1 sibling, 0 replies; 4+ messages in thread
From: Stefan Hajnoczi @ 2017-04-06 10:40 UTC (permalink / raw)
To: Jaden Liang; +Cc: qemu-devel
On Sat, Apr 01, 2017 at 01:23:46PM +0800, Jaden Liang wrote:
> [...]
Did you try using -object iothread,id=iothread1 -device
virtio-blk-pci,iothread=iothread1,... to define an IOThread for each
virtio-blk-pci device?

The block/nfs.c code already supports IOThreads, so you can run multiple
threads and avoid spending 100% CPU in the main loop.
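For example, a command line along these lines (disk path and ids below are
placeholders):

```shell
# One IOThread per virtio-blk device; that device's event handling then
# runs outside the main loop.
qemu-system-x86_64 \
    -object iothread,id=iothread1 \
    -drive file=nfs://127.0.0.1/dir/vm-disk-1.qcow2,format=qcow2,if=none,id=drive1 \
    -device virtio-blk-pci,drive=drive1,iothread=iothread1
```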
Stefan