* Network performance with small packets
@ 2011-01-25 21:09 Steve Dobbelstein
  2011-01-26 15:17 ` Michael S. Tsirkin
  2011-02-02 18:38 ` Michael S. Tsirkin
  0 siblings, 2 replies; 88+ messages in thread
From: Steve Dobbelstein @ 2011-01-25 21:09 UTC (permalink / raw)
  To: kvm


I am working on a KVM network performance issue found in our lab running
the DayTrader benchmark.  The benchmark throughput takes a significant hit
when running the application server in a KVM guest versus on bare metal.
We have dug into the problem and found that DayTrader's use of small
packets exposes KVM's overhead of handling network packets.  I have been
able to reproduce the performance hit with a simpler setup using the
netperf benchmark with the TCP_RR test and the request and response sizes
set to 256 bytes.  I run the benchmark between two physical systems, each
using a 1GB link.  In order to get the maximum throughput for the system I
have to run 100 instances of netperf.  When I run the netserver processes
in a guest, I see a maximum throughput that is 51% of what I get if I run
the netserver processes directly on the host.  The CPU utilization in the
guest is only 85% at maximum throughput, whereas it is 100% on bare metal.
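
For reference, each netperf instance is started with a command along these
lines (the netserver address and run length below are placeholders, not the
exact values from our runs):

  # netserver must already be running on the target (host or guest)
  # start 100 concurrent TCP_RR instances with 256-byte requests/responses
  for i in $(seq 1 100); do
      netperf -H <netserver-ip> -t TCP_RR -l 60 -- -r 256,256 &
  done
  wait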

The KVM host has 16 CPUs.  The KVM guest is configured with 2 VCPUs.  When
I run netperf on the host I boot the host with maxcpus=2 on the kernel
command line.  The host is running the current KVM upstream kernel along
with the current upstream qemu.  Here is the qemu command used to launch
the guest:
/build/qemu-kvm/x86_64-softmmu/qemu-system-x86_64 -name glasgow-RH60 -m 32768 -drive file=/build/guest-data/glasgow-RH60.img,if=virtio,index=0,boot=on
 -drive file=/dev/virt/WAS,if=virtio,index=1 -net nic,model=virtio,vlan=3,macaddr=00:1A:64:E5:00:63,netdev=nic0 -netdev tap,id=nic0,vhost=on -smp 2
-vnc :1 -monitor telnet::4499,server,nowait -serial telnet::8899,server,nowait --mem-path /libhugetlbfs -daemonize

We have tried various proposed fixes, each with varying amounts of success.
One such fix was to add code to the vhost thread such that when it found
the work queue empty it wouldn't just exit the thread but rather would
delay for 50 microseconds and then recheck the queue.  If there was work on
the queue it would loop back and process it, else it would exit the thread.
The change got us a 13% improvement in the DayTrader throughput.

Running the same netperf configuration on the same hardware but using a
different hypervisor gets us significantly better throughput numbers.   The
guest on that hypervisor runs at 100% CPU utilization.  The various fixes
we have tried have not gotten us close to the throughput seen on the other
hypervisor.  I'm looking for ideas/input from the KVM experts on how to
make KVM perform better when handling small packets.

Thanks,
Steve



* Re: Network performance with small packets
  2011-01-25 21:09 Network performance with small packets Steve Dobbelstein
@ 2011-01-26 15:17 ` Michael S. Tsirkin
  2011-01-27 18:44   ` Shirley Ma
  2011-02-02 18:38 ` Michael S. Tsirkin
  1 sibling, 1 reply; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-01-26 15:17 UTC (permalink / raw)
  To: Steve Dobbelstein; +Cc: kvm

On Tue, Jan 25, 2011 at 03:09:34PM -0600, Steve Dobbelstein wrote:
> 
> I am working on a KVM network performance issue found in our lab running
> the DayTrader benchmark.  The benchmark throughput takes a significant hit
> when running the application server in a KVM guest verses on bare metal.
> We have dug into the problem and found that DayTrader's use of small
> packets exposes KVM's overhead of handling network packets.  I have been
> able to reproduce the performance hit with a simpler setup using the
> netperf benchmark with the TCP_RR test and the request and response sizes
> set to 256 bytes.  I run the benchmark between two physical systems, each
> using a 1GB link.  In order to get the maximum throughput for the system I
> have to run 100 instances of netperf.  When I run the netserver processes
> in a guest, I see a maximum throughput that is 51% of what I get if I run
> the netserver processes directly on the host.  The CPU utilization in the
> guest is only 85% at maximum throughput, whereas it is 100% on bare metal.
> 
> The KVM host has 16 CPUs.  The KVM guest is configured with 2 VCPUs.  When
> I run netperf on the host I boot the host with maxcpus=2 on the kernel
> command line.  The host is running the current KVM upstream kernel along
> with the current upstream qemu.  Here is the qemu command used to launch
> the guest:
> /build/qemu-kvm/x86_64-softmmu/qemu-system-x86_64 -name glasgow-RH60 -m 32768 -drive file=/build/guest-data/glasgow-RH60.img,if=virtio,index=0,boot=on
>  -drive file=/dev/virt/WAS,if=virtio,index=1 -net nic,model=virtio,vlan=3,macaddr=00:1A:64:E5:00:63,netdev=nic0 -netdev tap,id=nic0,vhost=on -smp 2
> -vnc :1 -monitor telnet::4499,server,nowait -serial telnet::8899,server,nowait --mem-path /libhugetlbfs -daemonize
> 
> We have tried various proposed fixes, each with varying amounts of success.
> One such fix was to add code to the vhost thread such that when it found
> the work queue empty it wouldn't just exit the thread but rather would
> delay for 50 microseconds and then recheck the queue.  If there was work on
> the queue it would loop back and process it, else it would exit the thread.
> The change got us a 13% improvement in the DayTrader throughput.
> 
> Running the same netperf configuration on the same hardware but using a
> different hypervisor gets us significantly better throughput numbers.   The
> guest on that hypervisor runs at 100% CPU utilization.  The various fixes
> we have tried have not gotten us close to the throughput seen on the other
> hypervisor.  I'm looking for ideas/input from the KVM experts on how to
> make KVM perform better when handling small packets.
> 
> Thanks,
> Steve

I am seeing a similar problem, and am trying to fix that.
My current theory is that this is a variant of a receive livelock:
if the application isn't fast enough to process
incoming data, the guest net stack switches
from prequeue to backlog handling.

One thing I noticed is that locking the vhost thread
and the vcpu to the same physical CPU almost doubles the
bandwidth.  Can you confirm that in your setup?
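
One way to do that pinning, as a rough sketch (the thread IDs and the CPU
number are placeholders), is with taskset:

  # the vhost worker thread is named vhost-<qemu pid>; the vcpu thread IDs
  # can be found e.g. with ps -eLo pid,lwp,psr,comm
  taskset -pc 0 <vhost-thread-id>
  taskset -pc 0 <vcpu-thread-id>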

My current guess is that when we lock both to
a single CPU, netperf in the guest gets scheduled,
slowing down the vhost thread in the host.

I also noticed that this specific workload
performs better with vhost off: presumably
we are loading the guest less.

-- 
MST


* Re: Network performance with small packets
  2011-01-26 15:17 ` Michael S. Tsirkin
@ 2011-01-27 18:44   ` Shirley Ma
  2011-01-27 19:00     ` Michael S. Tsirkin
  0 siblings, 1 reply; 88+ messages in thread
From: Shirley Ma @ 2011-01-27 18:44 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Steve Dobbelstein, kvm

On Wed, 2011-01-26 at 17:17 +0200, Michael S. Tsirkin wrote:
> I am seeing a similar problem, and am trying to fix that.
> My current theory is that this is a variant of a receive livelock:
> if the application isn't fast enough to process
> incoming data, the guest net stack switches
> from prequeue to backlog handling.
> 
> One thing I noticed is that locking the vhost thread
> and the vcpu to the same physical CPU almost doubles the
> bandwidth.  Can you confirm that in your setup?
> 
> My current guess is that when we lock both to
> a single CPU, netperf in guest gets scheduled
> slowing down the vhost thread in the host.
> 
> I also noticed that this specific workload
> performs better with vhost off: presumably
> we are loading the guest less. 

I found a similar issue for the small-message-size TCP_STREAM test when the
guest is the TX side. I found that when I slow down TX, the BW doubles
for 1K to 4K message sizes.

Shirley



* Re: Network performance with small packets
  2011-01-27 18:44   ` Shirley Ma
@ 2011-01-27 19:00     ` Michael S. Tsirkin
  2011-01-27 19:09       ` Shirley Ma
  0 siblings, 1 reply; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-01-27 19:00 UTC (permalink / raw)
  To: Shirley Ma; +Cc: Steve Dobbelstein, kvm

On Thu, Jan 27, 2011 at 10:44:34AM -0800, Shirley Ma wrote:
> On Wed, 2011-01-26 at 17:17 +0200, Michael S. Tsirkin wrote:
> > I am seeing a similar problem, and am trying to fix that.
> > My current theory is that this is a variant of a receive livelock:
> > if the application isn't fast enough to process
> > incoming data, the guest net stack switches
> > from prequeue to backlog handling.
> > 
> > One thing I noticed is that locking the vhost thread
> > and the vcpu to the same physical CPU almost doubles the
> > bandwidth.  Can you confirm that in your setup?
> > 
> > My current guess is that when we lock both to
> > a single CPU, netperf in guest gets scheduled
> > slowing down the vhost thread in the host.
> > 
> > I also noticed that this specific workload
> > performs better with vhost off: presumably
> > we are loading the guest less. 
> 
> I found similar issue for small message size TCP_STREAM test when guest
> as TX. I found when I slow down TX, the BW performance will be doubled
> for 1K to 4K message size.
> 
> Shirley

Interesting. In particular running vhost and the transmitting guest
on the same host would have the effect of slowing down TX.
Does it double the BW for you too?

-- 
MST


* Re: Network performance with small packets
  2011-01-27 19:00     ` Michael S. Tsirkin
@ 2011-01-27 19:09       ` Shirley Ma
  2011-01-27 19:31         ` Michael S. Tsirkin
  0 siblings, 1 reply; 88+ messages in thread
From: Shirley Ma @ 2011-01-27 19:09 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Steve Dobbelstein, kvm

On Thu, 2011-01-27 at 21:00 +0200, Michael S. Tsirkin wrote:
> Interesting. In particular running vhost and the transmitting guest
> on the same host would have the effect of slowing down TX.
> Does it double the BW for you too?
> 

Running vhost and the TX guest on the same host does not seem to be enough to
slow down TX. I still need to play around to get double or even triple BW for
guest TX to the local host; for a 1K message size, the BW can increase
from 2.X Gb/s to 6.X Gb/s.

Thanks
Shirley



* Re: Network performance with small packets
  2011-01-27 19:09       ` Shirley Ma
@ 2011-01-27 19:31         ` Michael S. Tsirkin
  2011-01-27 19:45           ` Shirley Ma
  0 siblings, 1 reply; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-01-27 19:31 UTC (permalink / raw)
  To: Shirley Ma; +Cc: Steve Dobbelstein, kvm

On Thu, Jan 27, 2011 at 11:09:00AM -0800, Shirley Ma wrote:
> On Thu, 2011-01-27 at 21:00 +0200, Michael S. Tsirkin wrote:
> > Interesting. In particular running vhost and the transmitting guest
> > on the same host would have the effect of slowing down TX.
> > Does it double the BW for you too?
> > 
> 
> Running vhost and TX guest on the same host seems not good enough to
> slow down TX. In order to gain the double even triple BW for guest TX to
> local host I still need to play around, so 1K message size, BW is able
> to increase from 2.XGb/s to 6.XGb/s.
> 
> Thanks
> Shirley

Well slowing down the guest does not sound hard - for example we can
request guest notifications, or send extra interrupts :)
A slightly more sophisticated thing to try is to
poll the vq a bit more aggressively.
For example, if we handled some requests and now the tx vq is empty,
reschedule and yield. Worth a try?

-- 
MST


* Re: Network performance with small packets
  2011-01-27 19:31         ` Michael S. Tsirkin
@ 2011-01-27 19:45           ` Shirley Ma
  2011-01-27 20:05             ` Michael S. Tsirkin
  0 siblings, 1 reply; 88+ messages in thread
From: Shirley Ma @ 2011-01-27 19:45 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Steve Dobbelstein, kvm

On Thu, 2011-01-27 at 21:31 +0200, Michael S. Tsirkin wrote:
> Well slowing down the guest does not sound hard - for example we can
> request guest notifications, or send extra interrupts :)
> A slightly more sophisticated thing to try is to
> poll the vq a bit more aggressively.
> For example if we handled some requests and now tx vq is empty,
> reschedule and yeild. Worth a try?

I used dropping packets at a higher level to slow down TX. I am still
thinking about what the right approach is here.

Requesting guest notifications and extra interrupts is what we want to
avoid in order to reduce VM exits and save CPU. I don't think that's a good
approach.

By polling the vq a bit more aggressively, you meant vhost, right?

Shirley



* Re: Network performance with small packets
  2011-01-27 19:45           ` Shirley Ma
@ 2011-01-27 20:05             ` Michael S. Tsirkin
  2011-01-27 20:15               ` Shirley Ma
  2011-01-27 21:02               ` Network performance with small packets David Miller
  0 siblings, 2 replies; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-01-27 20:05 UTC (permalink / raw)
  To: Shirley Ma; +Cc: Steve Dobbelstein, kvm, netdev

On Thu, Jan 27, 2011 at 11:45:47AM -0800, Shirley Ma wrote:
> On Thu, 2011-01-27 at 21:31 +0200, Michael S. Tsirkin wrote:
> > Well slowing down the guest does not sound hard - for example we can
> > request guest notifications, or send extra interrupts :)
> > A slightly more sophisticated thing to try is to
> > poll the vq a bit more aggressively.
> > For example if we handled some requests and now tx vq is empty,
> > reschedule and yeild. Worth a try?
> 
> I used dropping packets in high level to slow down TX.
> I am still
> thinking what's the right the approach here. 

Interesting. Could this be a variant of the now famous bufferbloat then?

I guess we could drop some packets if we see we are not keeping up. For
example, if we see that the ring is > X% full, we could quickly complete
Y% of them without transmitting the packets onward. Or maybe we should drop
some bytes rather than packets.

> 
> Requesting guest notification and extra interrupts is what we want to
> avoid to reduce VM exits for saving CPUs. I don't think it's good.

Yes, but how do you explain the regression?
One simple theory is that the guest net stack became faster
and so the host can't keep up.


> 
> By polling the vq a bit more aggressively, you meant vhost, right?
> 
> Shirley

Yes.


* Re: Network performance with small packets
  2011-01-27 20:05             ` Michael S. Tsirkin
@ 2011-01-27 20:15               ` Shirley Ma
  2011-01-28 18:29                 ` Steve Dobbelstein
  2011-01-27 21:02               ` Network performance with small packets David Miller
  1 sibling, 1 reply; 88+ messages in thread
From: Shirley Ma @ 2011-01-27 20:15 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Steve Dobbelstein, kvm, netdev

On Thu, 2011-01-27 at 22:05 +0200, Michael S. Tsirkin wrote:
> Interesting. Could this is be a variant of the now famuous bufferbloat
> then?
> 
> I guess we could drop some packets if we see we are not keeping up.
> For
> example if we see that the ring is > X% full, we could quickly
> complete
> Y% without transmitting packets on. Or maybe we should drop some bytes
> not packets.
It's worth trying to figure out what the best approach is. I will make a
patch.


> > 
> > Requesting guest notification and extra interrupts is what we want
> to
> > avoid to reduce VM exits for saving CPUs. I don't think it's good.
> 
> Yes but how do you explain regression?
> One simple theory is that guest net stack became faster
> and so the host can't keep up.

Yes, that's what I think here. Some qdisc code has been changed
recently.
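
(For reference, the qdisc attached to the guest NIC and its drop/requeue
counters can be checked with something like the following; eth0 is just a
placeholder for the guest's virtio interface name.)

  tc -s qdisc show dev eth0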

> > 
> > By polling the vq a bit more aggressively, you meant vhost, right?
> > 
> > Shirley
> 
> Yes. 

I had a similar patch before; I can modify it and test it out.

Shirley



* Re: Network performance with small packets
  2011-01-27 20:05             ` Michael S. Tsirkin
  2011-01-27 20:15               ` Shirley Ma
@ 2011-01-27 21:02               ` David Miller
  2011-01-27 21:30                 ` Shirley Ma
  1 sibling, 1 reply; 88+ messages in thread
From: David Miller @ 2011-01-27 21:02 UTC (permalink / raw)
  To: mst; +Cc: mashirle, steved, kvm, netdev

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Thu, 27 Jan 2011 22:05:48 +0200

> Interesting. Could this is be a variant of the now famuous bufferbloat then?

Sigh, bufferbloat is the new global warming... :-/


* Re: Network performance with small packets
  2011-01-27 21:02               ` Network performance with small packets David Miller
@ 2011-01-27 21:30                 ` Shirley Ma
  2011-01-28 12:16                   ` Michael S. Tsirkin
  2011-02-01 17:23                   ` Michael S. Tsirkin
  0 siblings, 2 replies; 88+ messages in thread
From: Shirley Ma @ 2011-01-27 21:30 UTC (permalink / raw)
  To: David Miller; +Cc: mst, steved, kvm, netdev

On Thu, 2011-01-27 at 13:02 -0800, David Miller wrote:
> > Interesting. Could this is be a variant of the now famuous
> bufferbloat then?
> 
> Sigh, bufferbloat is the new global warming... :-/ 

Yep, some places become colder, some other places become warmer; same with
the BW results, sometimes faster, sometimes slower. :)

Shirley



* Re: Network performance with small packets
  2011-01-27 21:30                 ` Shirley Ma
@ 2011-01-28 12:16                   ` Michael S. Tsirkin
  2011-02-01  0:24                     ` Steve Dobbelstein
  2011-02-01 17:23                   ` Michael S. Tsirkin
  1 sibling, 1 reply; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-01-28 12:16 UTC (permalink / raw)
  To: Shirley Ma; +Cc: David Miller, steved, kvm, netdev

On Thu, Jan 27, 2011 at 01:30:38PM -0800, Shirley Ma wrote:
> On Thu, 2011-01-27 at 13:02 -0800, David Miller wrote:
> > > Interesting. Could this is be a variant of the now famuous
> > bufferbloat then?
> > 
> > Sigh, bufferbloat is the new global warming... :-/ 
> 
> Yep, some places become colder, some other places become warmer; Same as
> BW results, sometimes faster, sometimes slower. :)
> 
> Shirley

OK, so thinking about it more, maybe the issue is this:
the tx ring becomes full. We process one request and interrupt the guest;
then it adds one request and the queue is full again.

Maybe the following will help it stabilize?
By itself it does nothing, but if you set
all the parameters to a huge value we will
only interrupt when we see an empty ring.
Which might be too much: pls try other values
in the middle: e.g. make bufs half the ring,
or bytes some small value, or packets some
small value etc.

Warning: completely untested.

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index aac05bc..6769cdc 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -32,6 +32,13 @@
  * Using this limit prevents one virtqueue from starving others. */
 #define VHOST_NET_WEIGHT 0x80000
 
+int tx_bytes_coalesce = 0;
+module_param(tx_bytes_coalesce, int, 0644);
+int tx_bufs_coalesce = 0;
+module_param(tx_bufs_coalesce, int, 0644);
+int tx_packets_coalesce = 0;
+module_param(tx_packets_coalesce, int, 0644);
+
 enum {
 	VHOST_NET_VQ_RX = 0,
 	VHOST_NET_VQ_TX = 1,
@@ -127,6 +134,9 @@ static void handle_tx(struct vhost_net *net)
 	int err, wmem;
 	size_t hdr_size;
 	struct socket *sock;
+	int bytes_coalesced = 0;
+	int bufs_coalesced = 0;
+	int packets_coalesced = 0;
 
 	/* TODO: check that we are running from vhost_worker? */
 	sock = rcu_dereference_check(vq->private_data, 1);
@@ -196,14 +206,26 @@ static void handle_tx(struct vhost_net *net)
 		if (err != len)
 			pr_debug("Truncated TX packet: "
 				 " len %d != %zd\n", err, len);
-		vhost_add_used_and_signal(&net->dev, vq, head, 0);
 		total_len += len;
+		packets_coalesced += 1;
+		bytes_coalesced += len;
+		bufs_coalesced += in;
+		if (unlikely(packets_coalesced > tx_packets_coalesce ||
+			     bytes_coalesced > tx_bytes_coalesce ||
+			     bufs_coalesced > tx_bufs_coalesce))
+			vhost_add_used_and_signal(&net->dev, vq, head, 0);
+		else
+			vhost_add_used(vq, head, 0);
 		if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
 			vhost_poll_queue(&vq->poll);
 			break;
 		}
 	}
 
+	if (likely(packets_coalesced > tx_packets_coalesce ||
+		   bytes_coalesced > tx_bytes_coalesce ||
+		   bufs_coalesced > tx_bufs_coalesce))
+		vhost_signal(&net->dev, vq);
 	mutex_unlock(&vq->mutex);
 }
 


* Re: Network performance with small packets
  2011-01-27 20:15               ` Shirley Ma
@ 2011-01-28 18:29                 ` Steve Dobbelstein
  2011-01-28 22:51                   ` Steve Dobbelstein
  2011-02-01 15:52                   ` [PATCHv2 dontapply] vhost-net tx tuning Michael S. Tsirkin
  0 siblings, 2 replies; 88+ messages in thread
From: Steve Dobbelstein @ 2011-01-28 18:29 UTC (permalink / raw)
  To: mashirle; +Cc: kvm, Michael S. Tsirkin, netdev

mashirle@linux.vnet.ibm.com wrote on 01/27/2011 02:15:05 PM:

> On Thu, 2011-01-27 at 22:05 +0200, Michael S. Tsirkin wrote:
> > One simple theory is that guest net stack became faster
> > and so the host can't keep up.
>
> Yes, that's what I think here. Some qdisc code has been changed
> recently.

I ran a test with txqueuelen set to 128, instead of the default of 1000, in
the guest in an attempt to slow down the guest transmits.  The change had
no effect on either the throughput or the CPU usage.
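
For reference, the change was along the lines of the following in the guest
(eth0 standing in for whatever the virtio NIC is called):

  ip link set dev eth0 txqueuelen 128   # or: ifconfig eth0 txqueuelen 128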

On the other hand, I ran some tests with different CPU pinnings and
with/without hyperthreading enabled.  Here is a summary of the results.

Pinning configuration 1:  pin the VCPUs and pin the vhost thread to one of
the VCPU CPUs
Pinning configuration 2:  pin the VCPUs and pin the vhost thread to a
separate CPU on the same socket
Pinning configuration 3:  pin the VCPUs and pin the vhost thread to a
separate CPU on a different socket

HT   Pinning   Throughput  CPU
Yes  config 1  - 40%       - 40%
Yes  config 2  - 37%       - 35%
Yes  config 3  - 37%       - 36%
No   none         0%       -  5%
No   config 1  - 41%       - 43%
No   config 2  + 32%       -  4%
No   config 3  + 34%       +  9%

Pinning the vhost thread to the same CPU as a guest VCPU hurts performance.
Turning off hyperthreading and pinning the VCPUs and vhost thread to
separate CPUs significantly improves performance, getting it into the
competitive range with other hypervisors.

Steve D.



* Re: Network performance with small packets
  2011-01-28 18:29                 ` Steve Dobbelstein
@ 2011-01-28 22:51                   ` Steve Dobbelstein
  2011-02-01 15:52                   ` [PATCHv2 dontapply] vhost-net tx tuning Michael S. Tsirkin
  1 sibling, 0 replies; 88+ messages in thread
From: Steve Dobbelstein @ 2011-01-28 22:51 UTC (permalink / raw)
  To: Steve Dobbelstein; +Cc: kvm, kvm-owner, mashirle, Michael S. Tsirkin, netdev

steved@us.ibm.com wrote on 01/28/2011 12:29:37 PM:

> > On Thu, 2011-01-27 at 22:05 +0200, Michael S. Tsirkin wrote:
> > > One simple theory is that guest net stack became faster
> > > and so the host can't keep up.
> >
> > Yes, that's what I think here. Some qdisc code has been changed
> > recently.
>
> I ran a test with txqueuelen set to 128, instead of the default of 1000,
in
> the guest in an attempt to slow down the guest transmits.  The change had
> no effect on the throughput nor on the CPU usage.
>
> On the other hand, I ran some tests with different CPU pinnings and
> with/without hyperthreading enabled.  Here is a summary of the results.
>
> Pinning configuration 1:  pin the VCPUs and pin the vhost thread to one
of
> the VCPU CPUs
> Pinning configuration 2:  pin the VCPUs and pin the vhost thread to a
> separate CPU on the same socket
> Pinning configuration 3:  pin the VCPUs and pin the vhost thread to a
> separate CPU a different socket
>
> HT   Pinning   Throughput  CPU
> Yes  config 1  - 40%       - 40%
> Yes  config 2  - 37%       - 35%
> Yes  config 3  - 37%       - 36%
> No   none         0%       -  5%
> No   config 1  - 41%       - 43%
> No   config 2  + 32%       -  4%
> No   config 3  + 34%       +  9%
>
> Pinning the vhost thread to the same CPU as a guest VCPU hurts
performance.
> Turning off hyperthreading and pinning the VPUS and vhost thread to
> separate CPUs significantly improves performance, getting it into the
> competitive range with other hypervisors.
>
> Steve D.

Those results for configs 2 and 3 with hyperthreading on are a little
strange.  Digging into the cause, I found that my automation script for
pinning the vhost thread failed and pinned it to CPU 1, the same as config
1, giving results similar to config 1.  I reran the tests, making sure the
pinning script did the right thing.  The results are more consistent.

HT   Pinning   Throughput  CPU
Yes  config 1  - 40%       - 40%
Yes  config 2  + 33%       -  8%
Yes  config 3  + 34%       +  9%
No   none         0%       -  5%
No   config 1  - 41%       - 43%
No   config 2  + 32%       -  4%
No   config 3  + 34%       +  9%

It appears that we have a scheduling problem.  If the processes are pinned,
we can get good performance.

We also see that hyperthreading makes little difference.
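
A quick way to double-check where the threads actually end up (just a sketch;
the grep pattern is an example) is:

  # lwp = thread ID, psr = CPU the thread last ran on;
  # vhost worker threads are named vhost-<qemu pid>
  ps -eLo pid,lwp,psr,comm | egrep 'vhost|qemu'
  taskset -pc <thread-id>   # show the affinity list actually applied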

Sorry for the initial misleading data.

Steve D.




* Re: Network performance with small packets
  2011-01-28 12:16                   ` Michael S. Tsirkin
@ 2011-02-01  0:24                     ` Steve Dobbelstein
  2011-02-01  1:30                       ` Sridhar Samudrala
  2011-02-01  5:54                       ` Michael S. Tsirkin
  0 siblings, 2 replies; 88+ messages in thread
From: Steve Dobbelstein @ 2011-02-01  0:24 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: David Miller, kvm, mashirle, netdev

"Michael S. Tsirkin" <mst@redhat.com> wrote on 01/28/2011 06:16:16 AM:

> OK, so thinking about it more, maybe the issue is this:
> tx becomes full. We process one request and interrupt the guest,
> then it adds one request and the queue is full again.
>
> Maybe the following will help it stabilize?
> By itself it does nothing, but if you set
> all the parameters to a huge value we will
> only interrupt when we see an empty ring.
> Which might be too much: pls try other values
> in the middle: e.g. make bufs half the ring,
> or bytes some small value, or packets some
> small value etc.
>
> Warning: completely untested.
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index aac05bc..6769cdc 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -32,6 +32,13 @@
>   * Using this limit prevents one virtqueue from starving others. */
>  #define VHOST_NET_WEIGHT 0x80000
>
> +int tx_bytes_coalesce = 0;
> +module_param(tx_bytes_coalesce, int, 0644);
> +int tx_bufs_coalesce = 0;
> +module_param(tx_bufs_coalesce, int, 0644);
> +int tx_packets_coalesce = 0;
> +module_param(tx_packets_coalesce, int, 0644);
> +
>  enum {
>     VHOST_NET_VQ_RX = 0,
>     VHOST_NET_VQ_TX = 1,
> @@ -127,6 +134,9 @@ static void handle_tx(struct vhost_net *net)
>     int err, wmem;
>     size_t hdr_size;
>     struct socket *sock;
> +   int bytes_coalesced = 0;
> +   int bufs_coalesced = 0;
> +   int packets_coalesced = 0;
>
>     /* TODO: check that we are running from vhost_worker? */
>     sock = rcu_dereference_check(vq->private_data, 1);
> @@ -196,14 +206,26 @@ static void handle_tx(struct vhost_net *net)
>        if (err != len)
>           pr_debug("Truncated TX packet: "
>               " len %d != %zd\n", err, len);
> -      vhost_add_used_and_signal(&net->dev, vq, head, 0);
>        total_len += len;
> +      packets_coalesced += 1;
> +      bytes_coalesced += len;
> +      bufs_coalesced += in;

Should this instead be:
      bufs_coalesced += out;

Perusing the code I see that earlier there is a check to see if "in" is not
zero, and, if so, error out of the loop.  After the check, "in" is not
touched until it is added to bufs_coalesced, effectively not changing
bufs_coalesced, meaning bufs_coalesced will never trigger the conditions
below.

Or am I missing something?

> +      if (unlikely(packets_coalesced > tx_packets_coalesce ||
> +              bytes_coalesced > tx_bytes_coalesce ||
> +              bufs_coalesced > tx_bufs_coalesce))
> +         vhost_add_used_and_signal(&net->dev, vq, head, 0);
> +      else
> +         vhost_add_used(vq, head, 0);
>        if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
>           vhost_poll_queue(&vq->poll);
>           break;
>        }
>     }
>
> +   if (likely(packets_coalesced > tx_packets_coalesce ||
> +         bytes_coalesced > tx_bytes_coalesce ||
> +         bufs_coalesced > tx_bufs_coalesce))
> +      vhost_signal(&net->dev, vq);
>     mutex_unlock(&vq->mutex);
>  }
>

Steve D.



* Re: Network performance with small packets
  2011-02-01  0:24                     ` Steve Dobbelstein
@ 2011-02-01  1:30                       ` Sridhar Samudrala
  2011-02-01  5:56                         ` Michael S. Tsirkin
  2011-02-01 21:09                         ` Shirley Ma
  2011-02-01  5:54                       ` Michael S. Tsirkin
  1 sibling, 2 replies; 88+ messages in thread
From: Sridhar Samudrala @ 2011-02-01  1:30 UTC (permalink / raw)
  To: Steve Dobbelstein; +Cc: Michael S. Tsirkin, David Miller, kvm, mashirle, netdev

On Mon, 2011-01-31 at 18:24 -0600, Steve Dobbelstein wrote:
> "Michael S. Tsirkin" <mst@redhat.com> wrote on 01/28/2011 06:16:16 AM:
> 
> > OK, so thinking about it more, maybe the issue is this:
> > tx becomes full. We process one request and interrupt the guest,
> > then it adds one request and the queue is full again.
> >
> > Maybe the following will help it stabilize?
> > By itself it does nothing, but if you set
> > all the parameters to a huge value we will
> > only interrupt when we see an empty ring.
> > Which might be too much: pls try other values
> > in the middle: e.g. make bufs half the ring,
> > or bytes some small value, or packets some
> > small value etc.
> >
> > Warning: completely untested.
> >
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index aac05bc..6769cdc 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -32,6 +32,13 @@
> >   * Using this limit prevents one virtqueue from starving others. */
> >  #define VHOST_NET_WEIGHT 0x80000
> >
> > +int tx_bytes_coalesce = 0;
> > +module_param(tx_bytes_coalesce, int, 0644);
> > +int tx_bufs_coalesce = 0;
> > +module_param(tx_bufs_coalesce, int, 0644);
> > +int tx_packets_coalesce = 0;
> > +module_param(tx_packets_coalesce, int, 0644);
> > +
> >  enum {
> >     VHOST_NET_VQ_RX = 0,
> >     VHOST_NET_VQ_TX = 1,
> > @@ -127,6 +134,9 @@ static void handle_tx(struct vhost_net *net)
> >     int err, wmem;
> >     size_t hdr_size;
> >     struct socket *sock;
> > +   int bytes_coalesced = 0;
> > +   int bufs_coalesced = 0;
> > +   int packets_coalesced = 0;
> >
> >     /* TODO: check that we are running from vhost_worker? */
> >     sock = rcu_dereference_check(vq->private_data, 1);
> > @@ -196,14 +206,26 @@ static void handle_tx(struct vhost_net *net)
> >        if (err != len)
> >           pr_debug("Truncated TX packet: "
> >               " len %d != %zd\n", err, len);
> > -      vhost_add_used_and_signal(&net->dev, vq, head, 0);
> >        total_len += len;
> > +      packets_coalesced += 1;
> > +      bytes_coalesced += len;
> > +      bufs_coalesced += in;
> 
> Should this instead be:
>       bufs_coalesced += out;
> 
> Perusing the code I see that earlier there is a check to see if "in" is not
> zero, and, if so, error out of the loop.  After the check, "in" is not
> touched until it is added to bufs_coalesced, effectively not changing
> bufs_coalesced, meaning bufs_coalesced will never trigger the conditions
> below.

Yes. It definitely should be 'out'. 'in' should be 0 in the tx path.

I tried a simpler version of this patch without any tunables by
delaying the signaling until we come out of the for loop.
It definitely reduced the number of vmexits significantly for the small-message
guest-to-host stream test, and the throughput went up a little.

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 9b3ca10..5f9fae9 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -197,7 +197,7 @@ static void handle_tx(struct vhost_net *net)
 		if (err != len)
 			pr_debug("Truncated TX packet: "
 				 " len %d != %zd\n", err, len);
-		vhost_add_used_and_signal(&net->dev, vq, head, 0);
+		vhost_add_used(vq, head, 0);
 		total_len += len;
 		if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
 			vhost_poll_queue(&vq->poll);
@@ -205,6 +205,8 @@ static void handle_tx(struct vhost_net *net)
 		}
 	}
 
+	if (total_len > 0)
+		vhost_signal(&net->dev, vq);
 	mutex_unlock(&vq->mutex);
 }
 

> 
> Or am I missing something?
> 
> > +      if (unlikely(packets_coalesced > tx_packets_coalesce ||
> > +              bytes_coalesced > tx_bytes_coalesce ||
> > +              bufs_coalesced > tx_bufs_coalesce))
> > +         vhost_add_used_and_signal(&net->dev, vq, head, 0);
> > +      else
> > +         vhost_add_used(vq, head, 0);
> >        if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
> >           vhost_poll_queue(&vq->poll);
> >           break;
> >        }
> >     }
> >
> > +   if (likely(packets_coalesced > tx_packets_coalesce ||
> > +         bytes_coalesced > tx_bytes_coalesce ||
> > +         bufs_coalesced > tx_bufs_coalesce))
> > +      vhost_signal(&net->dev, vq);
> >     mutex_unlock(&vq->mutex);
> >  }

It is possible that we can miss signaling the guest even after
processing a few pkts, if we don't hit any of these conditions.

> >
> 
> Steve D.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



* Re: Network performance with small packets
  2011-02-01  0:24                     ` Steve Dobbelstein
  2011-02-01  1:30                       ` Sridhar Samudrala
@ 2011-02-01  5:54                       ` Michael S. Tsirkin
  1 sibling, 0 replies; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-02-01  5:54 UTC (permalink / raw)
  To: Steve Dobbelstein; +Cc: David Miller, kvm, mashirle, netdev

On Mon, Jan 31, 2011 at 06:24:34PM -0600, Steve Dobbelstein wrote:
> "Michael S. Tsirkin" <mst@redhat.com> wrote on 01/28/2011 06:16:16 AM:
> 
> > OK, so thinking about it more, maybe the issue is this:
> > tx becomes full. We process one request and interrupt the guest,
> > then it adds one request and the queue is full again.
> >
> > Maybe the following will help it stabilize?
> > By itself it does nothing, but if you set
> > all the parameters to a huge value we will
> > only interrupt when we see an empty ring.
> > Which might be too much: pls try other values
> > in the middle: e.g. make bufs half the ring,
> > or bytes some small value, or packets some
> > small value etc.
> >
> > Warning: completely untested.
> >
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index aac05bc..6769cdc 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -32,6 +32,13 @@
> >   * Using this limit prevents one virtqueue from starving others. */
> >  #define VHOST_NET_WEIGHT 0x80000
> >
> > +int tx_bytes_coalesce = 0;
> > +module_param(tx_bytes_coalesce, int, 0644);
> > +int tx_bufs_coalesce = 0;
> > +module_param(tx_bufs_coalesce, int, 0644);
> > +int tx_packets_coalesce = 0;
> > +module_param(tx_packets_coalesce, int, 0644);
> > +
> >  enum {
> >     VHOST_NET_VQ_RX = 0,
> >     VHOST_NET_VQ_TX = 1,
> > @@ -127,6 +134,9 @@ static void handle_tx(struct vhost_net *net)
> >     int err, wmem;
> >     size_t hdr_size;
> >     struct socket *sock;
> > +   int bytes_coalesced = 0;
> > +   int bufs_coalesced = 0;
> > +   int packets_coalesced = 0;
> >
> >     /* TODO: check that we are running from vhost_worker? */
> >     sock = rcu_dereference_check(vq->private_data, 1);
> > @@ -196,14 +206,26 @@ static void handle_tx(struct vhost_net *net)
> >        if (err != len)
> >           pr_debug("Truncated TX packet: "
> >               " len %d != %zd\n", err, len);
> > -      vhost_add_used_and_signal(&net->dev, vq, head, 0);
> >        total_len += len;
> > +      packets_coalesced += 1;
> > +      bytes_coalesced += len;
> > +      bufs_coalesced += in;
> 
> Should this instead be:
>       bufs_coalesced += out;

Correct.

> Perusing the code I see that earlier there is a check to see if "in" is not
> zero, and, if so, error out of the loop.  After the check, "in" is not
> touched until it is added to bufs_coalesced, effectively not changing
> bufs_coalesced, meaning bufs_coalesced will never trigger the conditions
> below.
> 
> Or am I missing something?
> 
> > +      if (unlikely(packets_coalesced > tx_packets_coalesce ||
> > +              bytes_coalesced > tx_bytes_coalesce ||
> > +              bufs_coalesced > tx_bufs_coalesce))
> > +         vhost_add_used_and_signal(&net->dev, vq, head, 0);
> > +      else
> > +         vhost_add_used(vq, head, 0);
> >        if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
> >           vhost_poll_queue(&vq->poll);
> >           break;
> >        }
> >     }
> >
> > +   if (likely(packets_coalesced > tx_packets_coalesce ||
> > +         bytes_coalesced > tx_bytes_coalesce ||
> > +         bufs_coalesced > tx_bufs_coalesce))
> > +      vhost_signal(&net->dev, vq);
> >     mutex_unlock(&vq->mutex);
> >  }
> >
> 
> Steve D.


* Re: Network performance with small packets
  2011-02-01  1:30                       ` Sridhar Samudrala
@ 2011-02-01  5:56                         ` Michael S. Tsirkin
  2011-02-01 21:09                         ` Shirley Ma
  1 sibling, 0 replies; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-02-01  5:56 UTC (permalink / raw)
  To: Sridhar Samudrala; +Cc: Steve Dobbelstein, David Miller, kvm, mashirle, netdev

On Mon, Jan 31, 2011 at 05:30:38PM -0800, Sridhar Samudrala wrote:
> On Mon, 2011-01-31 at 18:24 -0600, Steve Dobbelstein wrote:
> > "Michael S. Tsirkin" <mst@redhat.com> wrote on 01/28/2011 06:16:16 AM:
> > 
> > > OK, so thinking about it more, maybe the issue is this:
> > > tx becomes full. We process one request and interrupt the guest,
> > > then it adds one request and the queue is full again.
> > >
> > > Maybe the following will help it stabilize?
> > > By itself it does nothing, but if you set
> > > all the parameters to a huge value we will
> > > only interrupt when we see an empty ring.
> > > Which might be too much: pls try other values
> > > in the middle: e.g. make bufs half the ring,
> > > or bytes some small value, or packets some
> > > small value etc.
> > >
> > > Warning: completely untested.
> > >
> > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > > index aac05bc..6769cdc 100644
> > > --- a/drivers/vhost/net.c
> > > +++ b/drivers/vhost/net.c
> > > @@ -32,6 +32,13 @@
> > >   * Using this limit prevents one virtqueue from starving others. */
> > >  #define VHOST_NET_WEIGHT 0x80000
> > >
> > > +int tx_bytes_coalesce = 0;
> > > +module_param(tx_bytes_coalesce, int, 0644);
> > > +int tx_bufs_coalesce = 0;
> > > +module_param(tx_bufs_coalesce, int, 0644);
> > > +int tx_packets_coalesce = 0;
> > > +module_param(tx_packets_coalesce, int, 0644);
> > > +
> > >  enum {
> > >     VHOST_NET_VQ_RX = 0,
> > >     VHOST_NET_VQ_TX = 1,
> > > @@ -127,6 +134,9 @@ static void handle_tx(struct vhost_net *net)
> > >     int err, wmem;
> > >     size_t hdr_size;
> > >     struct socket *sock;
> > > +   int bytes_coalesced = 0;
> > > +   int bufs_coalesced = 0;
> > > +   int packets_coalesced = 0;
> > >
> > >     /* TODO: check that we are running from vhost_worker? */
> > >     sock = rcu_dereference_check(vq->private_data, 1);
> > > @@ -196,14 +206,26 @@ static void handle_tx(struct vhost_net *net)
> > >        if (err != len)
> > >           pr_debug("Truncated TX packet: "
> > >               " len %d != %zd\n", err, len);
> > > -      vhost_add_used_and_signal(&net->dev, vq, head, 0);
> > >        total_len += len;
> > > +      packets_coalesced += 1;
> > > +      bytes_coalesced += len;
> > > +      bufs_coalesced += in;
> > 
> > Should this instead be:
> >       bufs_coalesced += out;
> > 
> > Perusing the code I see that earlier there is a check to see if "in" is not
> > zero, and, if so, error out of the loop.  After the check, "in" is not
> > touched until it is added to bufs_coalesced, effectively not changing
> > bufs_coalesced, meaning bufs_coalesced will never trigger the conditions
> > below.
> 
> Yes. It definitely should be 'out'. 'in' should be 0 in the tx path.
> 
> I tried a simpler version of this patch without any tunables by
> delaying the signaling until we come out of the for loop.
> It definitely reduced the number of vmexits significantly for small message
> guest to host stream test and the throughput went up a little.
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 9b3ca10..5f9fae9 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -197,7 +197,7 @@ static void handle_tx(struct vhost_net *net)
>  		if (err != len)
>  			pr_debug("Truncated TX packet: "
>  				 " len %d != %zd\n", err, len);
> -		vhost_add_used_and_signal(&net->dev, vq, head, 0);
> +		vhost_add_used(vq, head, 0);
>  		total_len += len;
>  		if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
>  			vhost_poll_queue(&vq->poll);
> @@ -205,6 +205,8 @@ static void handle_tx(struct vhost_net *net)
>  		}
>  	}
>  
> +	if (total_len > 0)
> +		vhost_signal(&net->dev, vq);
>  	mutex_unlock(&vq->mutex);
>  }
>  
> 
> > 
> > Or am I missing something?
> > 
> > > +      if (unlikely(packets_coalesced > tx_packets_coalesce ||
> > > +              bytes_coalesced > tx_bytes_coalesce ||
> > > +              bufs_coalesced > tx_bufs_coalesce))
> > > +         vhost_add_used_and_signal(&net->dev, vq, head, 0);
> > > +      else
> > > +         vhost_add_used(vq, head, 0);
> > >        if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
> > >           vhost_poll_queue(&vq->poll);
> > >           break;
> > >        }
> > >     }
> > >
> > > +   if (likely(packets_coalesced > tx_packets_coalesce ||
> > > +         bytes_coalesced > tx_bytes_coalesce ||
> > > +         bufs_coalesced > tx_bufs_coalesce))
> > > +      vhost_signal(&net->dev, vq);
> > >     mutex_unlock(&vq->mutex);
> > >  }
> 
> It is possible that we can miss signaling the guest even after
> processing a few pkts, if we don't hit any of these conditions.

Yes. It really should be
   if (likely(packets_coalesced && bytes_coalesced && bufs_coalesced))
      vhost_signal(&net->dev, vq);

> > >
> > 
> > Steve D.
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCHv2 dontapply] vhost-net tx tuning
  2011-01-28 18:29                 ` Steve Dobbelstein
  2011-01-28 22:51                   ` Steve Dobbelstein
@ 2011-02-01 15:52                   ` Michael S. Tsirkin
  2011-02-01 23:07                     ` Sridhar Samudrala
  1 sibling, 1 reply; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-02-01 15:52 UTC (permalink / raw)
  To: Steve Dobbelstein; +Cc: mashirle, kvm, netdev

OK, so thinking about it more, maybe the issue is this:
the tx ring becomes full. We process one request and interrupt the guest;
then it adds one request and the queue is full again.

Maybe the following will help it stabilize?  By default with it we will
only interrupt when we see an empty ring.
Which is likely too much: pls try other values
in the middle: e.g. make bufs half the ring,
or bytes some small value like half ring * 200, or packets some
small value etc.

Set any one parameter to 0 to get current
behaviour (interrupt immediately when enabled).

Warning: completely untested.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

---

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index aac05bc..6769cdc 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -32,6 +32,13 @@
  * Using this limit prevents one virtqueue from starving others. */
 #define VHOST_NET_WEIGHT 0x80000
 
+int tx_bytes_coalesce = 1000000000;
+module_param(tx_bytes_coalesce, int, 0644);
+int tx_bufs_coalesce = 1000000000;
+module_param(tx_bufs_coalesce, int, 0644);
+int tx_packets_coalesce = 1000000000;
+module_param(tx_packets_coalesce, int, 0644);
+
 enum {
 	VHOST_NET_VQ_RX = 0,
 	VHOST_NET_VQ_TX = 1,
@@ -127,6 +134,9 @@ static void handle_tx(struct vhost_net *net)
 	int err, wmem;
 	size_t hdr_size;
 	struct socket *sock;
+	int bytes_coalesced = 0;
+	int bufs_coalesced = 0;
+	int packets_coalesced = 0;
 
 	/* TODO: check that we are running from vhost_worker? */
 	sock = rcu_dereference_check(vq->private_data, 1);
@@ -196,14 +206,26 @@ static void handle_tx(struct vhost_net *net)
 		if (err != len)
 			pr_debug("Truncated TX packet: "
 				 " len %d != %zd\n", err, len);
-		vhost_add_used_and_signal(&net->dev, vq, head, 0);
 		total_len += len;
+		packets_coalesced += 1;
+		bytes_coalesced += len;
+		bufs_coalesced += out;
+		if (unlikely(packets_coalesced > tx_packets_coalesce ||
+			     bytes_coalesced > tx_bytes_coalesce ||
+			     bufs_coalesced > tx_bufs_coalesce))
+			vhost_add_used_and_signal(&net->dev, vq, head, 0);
+		else
+			vhost_add_used(vq, head, 0);
 		if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
 			vhost_poll_queue(&vq->poll);
 			break;
 		}
 	}
 
+	if (likely(packets_coalesced &&
+		   bytes_coalesced &&
+		   bufs_coalesced))
+		vhost_signal(&net->dev, vq);
 	mutex_unlock(&vq->mutex);
 }
 


* Re: Network performance with small packets
  2011-01-27 21:30                 ` Shirley Ma
  2011-01-28 12:16                   ` Michael S. Tsirkin
@ 2011-02-01 17:23                   ` Michael S. Tsirkin
       [not found]                     ` <1296590943.26937.797.camel@localhost.localdomain>
  1 sibling, 1 reply; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-02-01 17:23 UTC (permalink / raw)
  To: Shirley Ma; +Cc: David Miller, steved, kvm, netdev

On Thu, Jan 27, 2011 at 01:30:38PM -0800, Shirley Ma wrote:
> On Thu, 2011-01-27 at 13:02 -0800, David Miller wrote:
> > > Interesting. Could this is be a variant of the now famuous
> > bufferbloat then?
> > 
> > Sigh, bufferbloat is the new global warming... :-/ 
> 
> Yep, some places become colder, some other places become warmer; Same as
> BW results, sometimes faster, sometimes slower. :)
> 
> Shirley

Sent a tuning patch (v2) that might help.
Could you try it and play with the module parameters please?
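
With the patch loaded, the knobs appear under /sys/module/vhost_net/parameters/
and can be changed at runtime. As a starting point (the values below are only
examples, assuming a 256-entry tx ring):

  # bufs at half the ring, bytes at roughly half ring * 200, packets small
  echo 128   > /sys/module/vhost_net/parameters/tx_bufs_coalesce
  echo 25600 > /sys/module/vhost_net/parameters/tx_bytes_coalesce
  echo 32    > /sys/module/vhost_net/parameters/tx_packets_coalesce
  # setting any one of them to 0 restores the current behaviour
  echo 0     > /sys/module/vhost_net/parameters/tx_packets_coalesce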

-- 
MST


* Re: Network performance with small packets
       [not found]                       ` <20110201201715.GA30050@redhat.com>
@ 2011-02-01 20:25                         ` Shirley Ma
  2011-02-01 21:21                           ` Michael S. Tsirkin
  0 siblings, 1 reply; 88+ messages in thread
From: Shirley Ma @ 2011-02-01 20:25 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: David Miller, steved, netdev, kvm

On Tue, 2011-02-01 at 22:17 +0200, Michael S. Tsirkin wrote:
> On Tue, Feb 01, 2011 at 12:09:03PM -0800, Shirley Ma wrote:
> > On Tue, 2011-02-01 at 19:23 +0200, Michael S. Tsirkin wrote:
> > > On Thu, Jan 27, 2011 at 01:30:38PM -0800, Shirley Ma wrote:
> > > > On Thu, 2011-01-27 at 13:02 -0800, David Miller wrote:
> > > > > > Interesting. Could this is be a variant of the now famuous
> > > > > bufferbloat then?
> > > > > 
> > > > > Sigh, bufferbloat is the new global warming... :-/ 
> > > > 
> > > > Yep, some places become colder, some other places become warmer;
> > > Same as
> > > > BW results, sometimes faster, sometimes slower. :)
> > > > 
> > > > Shirley
> > > 
> > > Sent a tuning patch (v2) that might help.
> > > Could you try it and play with the module parameters please? 
> > 
> > Hello Michael,
> > 
> > Sure I will play with this patch to see how it could help. 
> > 
> > I am looking at guest side as well, I found a couple issues on guest
> > side:
> > 
> > 1. free_old_xmit_skbs() should return the number of skbs instead of
> the
> > total of sgs since we are using ring size to stop/start netif queue.
> > static unsigned int free_old_xmit_skbs(struct virtnet_info *vi)
> > {
> >         struct sk_buff *skb;
> >         unsigned int len, tot_sgs = 0;
> > 
> >         while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
> >                 pr_debug("Sent skb %p\n", skb);
> >                 vi->dev->stats.tx_bytes += skb->len;
> >                 vi->dev->stats.tx_packets++;
> >                 tot_sgs += skb_vnet_hdr(skb)->num_sg;
> >                 dev_kfree_skb_any(skb);
> >         }
> >         return tot_sgs; <---- should return numbers of skbs to track
> > ring usage here, I think;
> > }
> > 
> > Did the old guest use number of buffers to track ring usage before?
> > 
> > 2. In start_xmit, I think we should move capacity +=
> free_old_xmit_skbs
> > before netif_stop_queue(); so we avoid unnecessary netif queue
> > stop/start. This condition is heavily hit for small message size.
> > 
> > Also we capacity checking condition should change to something like
> half
> > of the vring.num size, instead of comparing 2+MAX_SKB_FRAGS?
> > 
> >        if (capacity < 2+MAX_SKB_FRAGS) {
> >                 netif_stop_queue(dev);
> >                 if (unlikely(!virtqueue_enable_cb(vi->svq))) {
> >                         /* More just got used, free them then
> recheck.
> > */
> >                         capacity += free_old_xmit_skbs(vi);
> >                         if (capacity >= 2+MAX_SKB_FRAGS) {
> >                                 netif_start_queue(dev);
> >                                 virtqueue_disable_cb(vi->svq);
> >                         }
> >                 }
> >         }
> > 
> > 3. Looks like the xmit callback is only used to wake the queue when
> the
> > queue has stopped, right? Should we put a condition check here?
> > static void skb_xmit_done(struct virtqueue *svq)
> > {
> >         struct virtnet_info *vi = svq->vdev->priv;
> > 
> >         /* Suppress further interrupts. */
> >         virtqueue_disable_cb(svq);
> > 
> >         /* We were probably waiting for more output buffers. */
> > --->   if (netif_queue_stopped(vi->dev))
> >         netif_wake_queue(vi->dev);
> > }
> > 
> > 
> > Shirley
> 
> Well the return value is used to calculate capacity and that counts
> the # of s/g. No?

Nope, the current guest kernel uses descriptors, not the number of sgs. I am
not sure about the old guest.

> From cache utilization POV it might be better to read from the skb and
> not peek at virtio header though...
> Pls Cc the lists on any discussions in the future.
> 
> -- 
> MST

Sorry I missed reply all. :(

Shirley



* Re: Network performance with small packets
  2011-02-01  1:30                       ` Sridhar Samudrala
  2011-02-01  5:56                         ` Michael S. Tsirkin
@ 2011-02-01 21:09                         ` Shirley Ma
  2011-02-01 21:24                           ` Michael S. Tsirkin
  1 sibling, 1 reply; 88+ messages in thread
From: Shirley Ma @ 2011-02-01 21:09 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: Steve Dobbelstein, Michael S. Tsirkin, David Miller, kvm,
	mashirle, netdev

On Mon, 2011-01-31 at 17:30 -0800, Sridhar Samudrala wrote:
> Yes. It definitely should be 'out'. 'in' should be 0 in the tx path.
> 
> I tried a simpler version of this patch without any tunables by
> delaying the signaling until we come out of the for loop.
> It definitely reduced the number of vmexits significantly for small
> message
> guest to host stream test and the throughput went up a little.
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 9b3ca10..5f9fae9 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -197,7 +197,7 @@ static void handle_tx(struct vhost_net *net)
>                 if (err != len)
>                         pr_debug("Truncated TX packet: "
>                                  " len %d != %zd\n", err, len);
> -               vhost_add_used_and_signal(&net->dev, vq, head, 0);
> +               vhost_add_used(vq, head, 0);
>                 total_len += len;
>                 if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
>                         vhost_poll_queue(&vq->poll);
> @@ -205,6 +205,8 @@ static void handle_tx(struct vhost_net *net)
>                 }
>         }
> 
> +       if (total_len > 0)
> +               vhost_signal(&net->dev, vq);
>         mutex_unlock(&vq->mutex);
>  }

Reducing the signaling will reduce the CPU utilization by reducing VM
exits.

The small-message BW is a problem where we have seen a fast guest and a slow
vhost; even when I increased VHOST_NET_WEIGHT several times, it didn't help
BW that much. For large message sizes, vhost is able to process all packets
in time. I have played around with the guest/host code, and so far I only see
a huge BW improvement by dropping packets on the guest side.
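
(One way to see the effect on exits, assuming debugfs is mounted at
/sys/kernel/debug, is to sample the kvm counters around a run, or to watch
the rates with the kvm_stat tool:)

  cat /sys/kernel/debug/kvm/exits
  # ... run the test ...
  cat /sys/kernel/debug/kvm/exits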

Thanks
Shirley



* Re: Network performance with small packets
  2011-02-01 20:25                         ` Shirley Ma
@ 2011-02-01 21:21                           ` Michael S. Tsirkin
  2011-02-01 21:28                             ` Shirley Ma
  0 siblings, 1 reply; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-02-01 21:21 UTC (permalink / raw)
  To: Shirley Ma; +Cc: David Miller, steved, netdev, kvm

On Tue, Feb 01, 2011 at 12:25:08PM -0800, Shirley Ma wrote:
> On Tue, 2011-02-01 at 22:17 +0200, Michael S. Tsirkin wrote:
> > On Tue, Feb 01, 2011 at 12:09:03PM -0800, Shirley Ma wrote:
> > > On Tue, 2011-02-01 at 19:23 +0200, Michael S. Tsirkin wrote:
> > > > On Thu, Jan 27, 2011 at 01:30:38PM -0800, Shirley Ma wrote:
> > > > > On Thu, 2011-01-27 at 13:02 -0800, David Miller wrote:
> > > > > > > Interesting. Could this is be a variant of the now famuous
> > > > > > bufferbloat then?
> > > > > > 
> > > > > > Sigh, bufferbloat is the new global warming... :-/ 
> > > > > 
> > > > > Yep, some places become colder, some other places become warmer;
> > > > Same as
> > > > > BW results, sometimes faster, sometimes slower. :)
> > > > > 
> > > > > Shirley
> > > > 
> > > > Sent a tuning patch (v2) that might help.
> > > > Could you try it and play with the module parameters please? 
> > > 
> > > Hello Michael,
> > > 
> > > Sure I will play with this patch to see how it could help. 
> > > 
> > > I am looking at guest side as well, I found a couple issues on guest
> > > side:
> > > 
> > > 1. free_old_xmit_skbs() should return the number of skbs instead of
> > the
> > > total of sgs since we are using ring size to stop/start netif queue.
> > > static unsigned int free_old_xmit_skbs(struct virtnet_info *vi)
> > > {
> > >         struct sk_buff *skb;
> > >         unsigned int len, tot_sgs = 0;
> > > 
> > >         while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
> > >                 pr_debug("Sent skb %p\n", skb);
> > >                 vi->dev->stats.tx_bytes += skb->len;
> > >                 vi->dev->stats.tx_packets++;
> > >                 tot_sgs += skb_vnet_hdr(skb)->num_sg;
> > >                 dev_kfree_skb_any(skb);
> > >         }
> > >         return tot_sgs; <---- should return numbers of skbs to track
> > > ring usage here, I think;
> > > }
> > > 
> > > Did the old guest use number of buffers to track ring usage before?
> > > 
> > > 2. In start_xmit, I think we should move capacity +=
> > free_old_xmit_skbs
> > > before netif_stop_queue(); so we avoid unnecessary netif queue
> > > stop/start. This condition is heavily hit for small message size.
> > > 
> > > Also we capacity checking condition should change to something like
> > half
> > > of the vring.num size, instead of comparing 2+MAX_SKB_FRAGS?
> > > 
> > >        if (capacity < 2+MAX_SKB_FRAGS) {
> > >                 netif_stop_queue(dev);
> > >                 if (unlikely(!virtqueue_enable_cb(vi->svq))) {
> > >                         /* More just got used, free them then recheck. */
> > >                         capacity += free_old_xmit_skbs(vi);
> > >                         if (capacity >= 2+MAX_SKB_FRAGS) {
> > >                                 netif_start_queue(dev);
> > >                                 virtqueue_disable_cb(vi->svq);
> > >                         }
> > >                 }
> > >         }
> > > 
> > > 3. Looks like the xmit callback is only used to wake the queue when the
> > > queue has stopped, right? Should we put a condition check here?
> > > static void skb_xmit_done(struct virtqueue *svq)
> > > {
> > >         struct virtnet_info *vi = svq->vdev->priv;
> > > 
> > >         /* Suppress further interrupts. */
> > >         virtqueue_disable_cb(svq);
> > > 
> > >         /* We were probably waiting for more output buffers. */
> > > --->   if (netif_queue_stopped(vi->dev))
> > >         netif_wake_queue(vi->dev);
> > > }
> > > 
> > > 
> > > Shirley
> > 
> > Well the return value is used to calculate capacity and that counts
> > the # of s/g. No?
> 
> Nope, the current guest kernel uses descriptors not number of sgs.

Confused. We compare capacity to skb frags, no?
That's sg I think ...

> not sure the old guest.
> 
> > From cache utilization POV it might be better to read from the skb and
> > not peek at virtio header though...
> > Pls Cc the lists on any discussions in the future.
> > 
> > -- 
> > MST
> 
> Sorry I missed reply all. :(
> 
> Shirley

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-01 21:09                         ` Shirley Ma
@ 2011-02-01 21:24                           ` Michael S. Tsirkin
  2011-02-01 21:32                             ` Shirley Ma
  0 siblings, 1 reply; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-02-01 21:24 UTC (permalink / raw)
  To: Shirley Ma
  Cc: Sridhar Samudrala, Steve Dobbelstein, David Miller, kvm,
	mashirle, netdev

On Tue, Feb 01, 2011 at 01:09:45PM -0800, Shirley Ma wrote:
> On Mon, 2011-01-31 at 17:30 -0800, Sridhar Samudrala wrote:
> > Yes. It definitely should be 'out'. 'in' should be 0 in the tx path.
> > 
> > I tried a simpler version of this patch without any tunables by
> > delaying the signaling until we come out of the for loop.
> > It definitely reduced the number of vmexits significantly for small
> > message
> > guest to host stream test and the throughput went up a little.
> > 
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index 9b3ca10..5f9fae9 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -197,7 +197,7 @@ static void handle_tx(struct vhost_net *net)
> >                 if (err != len)
> >                         pr_debug("Truncated TX packet: "
> >                                  " len %d != %zd\n", err, len);
> > -               vhost_add_used_and_signal(&net->dev, vq, head, 0);
> > +               vhost_add_used(vq, head, 0);
> >                 total_len += len;
> >                 if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
> >                         vhost_poll_queue(&vq->poll);
> > @@ -205,6 +205,8 @@ static void handle_tx(struct vhost_net *net)
> >                 }
> >         }
> > 
> > +       if (total_len > 0)
> > +               vhost_signal(&net->dev, vq);
> >         mutex_unlock(&vq->mutex);
> >  }
> 
> Reducing the signaling will reduce the CPU utilization by reducing VM
> exits. 
> 
> The small message BW is a problem we have seen faster guest/slow vhost,
> even I increased VHOST_NET_WEIGHT times, it didn't help that much for
> BW. For large message size, vhost is able to process all packets on
> time. I played around with guest/host codes, I only see huge BW
> improvement by dropping packets on guest side so far.
> 
> Thanks
> Shirley


My theory is that the issue is not signalling itself.
Rather, our queue fills up, then the host handles
one packet and sends an interrupt, and we
immediately wake the queue. So once the vq
gets full, it stays full.

If you try my patch with the bufs threshold set to e.g. half the vq,
what we will do is send an interrupt only after we have processed half
the vq. So the host has half the vq to go, and the guest has half the vq
to fill.

See?
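
In code terms, the core of the idea is something like this (sketch only,
untested; the actual tuning patch, quoted later in the thread, also
tracks bytes and packets and makes the thresholds module parameters):

	/* in handle_tx(): complete buffers without signalling ... */
	vhost_add_used(vq, head, 0);
	bufs_coalesced += out;
	if (unlikely(bufs_coalesced > tx_bufs_coalesce)) {
		/* crossed the threshold: now interrupt the guest */
		vhost_signal(&net->dev, vq);
		bufs_coalesced = 0;
	}

	/* after the loop, flush anything still pending: */
	if (bufs_coalesced)
		vhost_signal(&net->dev, vq);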

-- 
MST

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-01 21:21                           ` Michael S. Tsirkin
@ 2011-02-01 21:28                             ` Shirley Ma
  2011-02-01 21:41                               ` Michael S. Tsirkin
  0 siblings, 1 reply; 88+ messages in thread
From: Shirley Ma @ 2011-02-01 21:28 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: David Miller, steved, netdev, kvm

On Tue, 2011-02-01 at 23:21 +0200, Michael S. Tsirkin wrote:
> Confused. We compare capacity to skb frags, no?
> That's sg I think ...

The current guest kernel uses indirect buffers; num_free returns how many
descriptors are available, not how many skb frags. So the comparison is
wrong here.

Shirley


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-01 21:24                           ` Michael S. Tsirkin
@ 2011-02-01 21:32                             ` Shirley Ma
  2011-02-01 21:42                               ` Michael S. Tsirkin
  0 siblings, 1 reply; 88+ messages in thread
From: Shirley Ma @ 2011-02-01 21:32 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Sridhar Samudrala, Steve Dobbelstein, David Miller, kvm,
	mashirle, netdev

On Tue, 2011-02-01 at 23:24 +0200, Michael S. Tsirkin wrote:
> My theory is that the issue is not signalling.
> Rather, our queue fills up, then host handles
> one packet and sends an interrupt, and we
> immediately wake the queue. So the vq
> once it gets full, stays full.

From the printk debugging output, it might not be exactly the case. The
ring gets full, runs a bit, then gets full again, runs a bit, then fills
up again...

> If you try my patch with bufs threshold set to e.g.
> half the vq, what we will do is send interrupt after we have processed
> half the vq.  So host has half the vq to go, and guest has half the vq
> to fill.
> 
> See?

I am cleaning up my setup to run your patch ...

Shirley



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-01 21:28                             ` Shirley Ma
@ 2011-02-01 21:41                               ` Michael S. Tsirkin
  2011-02-02  4:39                                 ` Krishna Kumar2
  0 siblings, 1 reply; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-02-01 21:41 UTC (permalink / raw)
  To: Shirley Ma; +Cc: David Miller, steved, netdev, kvm, rusty

On Tue, Feb 01, 2011 at 01:28:45PM -0800, Shirley Ma wrote:
> On Tue, 2011-02-01 at 23:21 +0200, Michael S. Tsirkin wrote:
> > Confused. We compare capacity to skb frags, no?
> > That's sg I think ...
> 
> Current guest kernel use indirect buffers, num_free returns how many
> available descriptors not skb frags. So it's wrong here.
> 
> Shirley

I see. Good point. In other words, the buffer we complete was indirect,
but when we add a new one we cannot allocate an indirect buffer, so we
consume multiple descriptors. And then we start the queue and the add
will fail. I guess we need some kind of API to figure out whether the
buffer we completed was indirect?

Another failure mode is when skb_xmit_done wakes the queue: it might be
too early, there might not be space for the next packet in the vq yet.

A solution might be to keep some kind of pool around for indirect
buffers; we wanted to do that for block anyway ...

-- 
MST

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-01 21:32                             ` Shirley Ma
@ 2011-02-01 21:42                               ` Michael S. Tsirkin
  2011-02-01 21:53                                 ` Shirley Ma
  0 siblings, 1 reply; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-02-01 21:42 UTC (permalink / raw)
  To: Shirley Ma
  Cc: Sridhar Samudrala, Steve Dobbelstein, David Miller, kvm,
	mashirle, netdev

On Tue, Feb 01, 2011 at 01:32:35PM -0800, Shirley Ma wrote:
> On Tue, 2011-02-01 at 23:24 +0200, Michael S. Tsirkin wrote:
> > My theory is that the issue is not signalling.
> > Rather, our queue fills up, then host handles
> > one packet and sends an interrupt, and we
> > immediately wake the queue. So the vq
> > once it gets full, stays full.
> 
> >From the printk debugging output, it might not be exactly the case. The
> ring gets full, run a bit, then gets full, then run a bit, then full...

Yes, but does it get even half empty in between?

> > If you try my patch with bufs threshold set to e.g.
> > half the vq, what we will do is send interrupt after we have processed
> > half the vq.  So host has half the vq to go, and guest has half the vq
> > to fill.
> > 
> > See?
> 
> I am cleaning up my set up to run your patch ...
> 
> Shirley
> 

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-01 21:42                               ` Michael S. Tsirkin
@ 2011-02-01 21:53                                 ` Shirley Ma
  2011-02-01 21:56                                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 88+ messages in thread
From: Shirley Ma @ 2011-02-01 21:53 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Sridhar Samudrala, Steve Dobbelstein, David Miller, kvm,
	mashirle, netdev

On Tue, 2011-02-01 at 23:42 +0200, Michael S. Tsirkin wrote:
> On Tue, Feb 01, 2011 at 01:32:35PM -0800, Shirley Ma wrote:
> > On Tue, 2011-02-01 at 23:24 +0200, Michael S. Tsirkin wrote:
> > > My theory is that the issue is not signalling.
> > > Rather, our queue fills up, then host handles
> > > one packet and sends an interrupt, and we
> > > immediately wake the queue. So the vq
> > > once it gets full, stays full.
> > 
> > >From the printk debugging output, it might not be exactly the case.
> The
> > ring gets full, run a bit, then gets full, then run a bit, then
> full...
> 
> Yes, but does it get even half empty in between?

Sometimes, but most of the time it does not get half empty in between.
printk slows the traffic down, though, so it's not accurate. I think your
patch will improve the performance if it signals the guest when half of
the ring is empty.

But you manage signaling by TX bytes; I would like to change it to half
of the ring size instead. Is that OK?

Shirley




^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-01 21:53                                 ` Shirley Ma
@ 2011-02-01 21:56                                   ` Michael S. Tsirkin
  2011-02-01 22:59                                     ` Shirley Ma
  0 siblings, 1 reply; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-02-01 21:56 UTC (permalink / raw)
  To: Shirley Ma
  Cc: Sridhar Samudrala, Steve Dobbelstein, David Miller, kvm,
	mashirle, netdev

On Tue, Feb 01, 2011 at 01:53:05PM -0800, Shirley Ma wrote:
> On Tue, 2011-02-01 at 23:42 +0200, Michael S. Tsirkin wrote:
> > On Tue, Feb 01, 2011 at 01:32:35PM -0800, Shirley Ma wrote:
> > > On Tue, 2011-02-01 at 23:24 +0200, Michael S. Tsirkin wrote:
> > > > My theory is that the issue is not signalling.
> > > > Rather, our queue fills up, then host handles
> > > > one packet and sends an interrupt, and we
> > > > immediately wake the queue. So the vq
> > > > once it gets full, stays full.
> > > 
> > > >From the printk debugging output, it might not be exactly the case.
> > The
> > > ring gets full, run a bit, then gets full, then run a bit, then
> > full...
> > 
> > Yes, but does it get even half empty in between?
> 
> Sometimes, most of them not half of empty in between. But printk slow
> down the traffics, so it's not accurate. I think your patch will improve
> the performance if it signals guest when half of the ring size is
> empty. 
> 
> But you manage signal by using TX bytes,

There are flags for bytes, buffers and packets.
Try playing with any one of them :)
Just be sure to use v2.


>I would like to change it to
> half of the ring size instead for signaling. Is that OK?
> 
> Shirley
> 
> 

Sure that is why I made it a parameter so you can experiment.

-- 
MST

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-01 21:56                                   ` Michael S. Tsirkin
@ 2011-02-01 22:59                                     ` Shirley Ma
  2011-02-02  4:40                                       ` Michael S. Tsirkin
  0 siblings, 1 reply; 88+ messages in thread
From: Shirley Ma @ 2011-02-01 22:59 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Sridhar Samudrala, Steve Dobbelstein, David Miller, kvm,
	mashirle, netdev

On Tue, 2011-02-01 at 23:56 +0200, Michael S. Tsirkin wrote:
> There are flags for bytes, buffers and packets.
> Try playing with any one of them :)
> Just be sure to use v2.
> 
> 
> >I would like to change it to
> > half of the ring size instead for signaling. Is that OK?
> > 
> > Shirley
> > 
> > 
> 
> Sure that is why I made it a parameter so you can experiment. 

The initial test results show that CPU utilization has been reduced
somewhat and BW has increased somewhat with the default parameters: for
1K message size, BW goes from 2.5Gb/s to about 2.8Gb/s and CPU
utilization drops from 4x% to 38% (similar to the results from the patch
I submitted a while ago to reduce signaling on vhost), but still far from
the packet-dropping results.

I am going to change the code to use 1/2 of the ring size to wake the
netif queue.

Shirley


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCHv2 dontapply] vhost-net tx tuning
  2011-02-01 15:52                   ` [PATCHv2 dontapply] vhost-net tx tuning Michael S. Tsirkin
@ 2011-02-01 23:07                     ` Sridhar Samudrala
  2011-02-01 23:27                       ` Shirley Ma
  2011-02-02  4:36                       ` Michael S. Tsirkin
  0 siblings, 2 replies; 88+ messages in thread
From: Sridhar Samudrala @ 2011-02-01 23:07 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Steve Dobbelstein, mashirle, kvm, netdev

On Tue, 2011-02-01 at 17:52 +0200, Michael S. Tsirkin wrote:
> OK, so thinking about it more, maybe the issue is this:
> tx becomes full. We process one request and interrupt the guest,
> then it adds one request and the queue is full again.
> 
> Maybe the following will help it stabilize?  By default with it we will
> only interrupt when we see an empty ring.
> Which is likely too much: please try other values
> in the middle: e.g. make bufs half the ring,
> or bytes some small value like half ring * 200, or packets some
> small value etc.
> 
> Set any one parameter to 0 to get current
> behaviour (interrupt immediately when enabled).
> 
> Warning: completely untested.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> ---
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index aac05bc..6769cdc 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -32,6 +32,13 @@
>   * Using this limit prevents one virtqueue from starving others. */
>  #define VHOST_NET_WEIGHT 0x80000
> 
> +int tx_bytes_coalesce = 1000000000;
> +module_param(tx_bytes_coalesce, int, 0644);
> +int tx_bufs_coalesce = 1000000000;
> +module_param(tx_bufs_coalesce, int, 0644);
> +int tx_packets_coalesce = 1000000000;
> +module_param(tx_packets_coalesce, int, 0644);
> +
>  enum {
>  	VHOST_NET_VQ_RX = 0,
>  	VHOST_NET_VQ_TX = 1,
> @@ -127,6 +134,9 @@ static void handle_tx(struct vhost_net *net)
>  	int err, wmem;
>  	size_t hdr_size;
>  	struct socket *sock;
> +	int bytes_coalesced = 0;
> +	int bufs_coalesced = 0;
> +	int packets_coalesced = 0;
> 
>  	/* TODO: check that we are running from vhost_worker? */
>  	sock = rcu_dereference_check(vq->private_data, 1);
> @@ -196,14 +206,26 @@ static void handle_tx(struct vhost_net *net)
>  		if (err != len)
>  			pr_debug("Truncated TX packet: "
>  				 " len %d != %zd\n", err, len);
> -		vhost_add_used_and_signal(&net->dev, vq, head, 0);
>  		total_len += len;
> +		packets_coalesced += 1;
> +		bytes_coalesced += len;
> +		bufs_coalesced += out;
> +		if (unlikely(packets_coalesced > tx_packets_coalesce ||
> +			     bytes_coalesced > tx_bytes_coalesce ||
> +			     bufs_coalesced > tx_bufs_coalesce))
> +			vhost_add_used_and_signal(&net->dev, vq, head, 0);

I think the counters that exceed the limits need to be reset to 0 here.
Otherwise we keep signaling for every buffer once we hit this condition.
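
Something like this, perhaps (sketch against the patch above, untested):

		if (unlikely(packets_coalesced > tx_packets_coalesce ||
			     bytes_coalesced > tx_bytes_coalesce ||
			     bufs_coalesced > tx_bufs_coalesce)) {
			vhost_add_used_and_signal(&net->dev, vq, head, 0);
			/* start a new coalescing window after each signal */
			packets_coalesced = 0;
			bytes_coalesced = 0;
			bufs_coalesced = 0;
		} else
			vhost_add_used(vq, head, 0);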

Thanks
Sridhar

> +		else
> +			vhost_add_used(vq, head, 0);
>  		if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
>  			vhost_poll_queue(&vq->poll);
>  			break;
>  		}
>  	}
> 
> +	if (likely(packets_coalesced &&
> +		   bytes_coalesced &&
> +		   bufs_coalesced))
> +		vhost_signal(&net->dev, vq);
>  	mutex_unlock(&vq->mutex);
>  }
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCHv2 dontapply] vhost-net tx tuning
  2011-02-01 23:07                     ` Sridhar Samudrala
@ 2011-02-01 23:27                       ` Shirley Ma
  2011-02-02  4:36                       ` Michael S. Tsirkin
  1 sibling, 0 replies; 88+ messages in thread
From: Shirley Ma @ 2011-02-01 23:27 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: Michael S. Tsirkin, Steve Dobbelstein, mashirle, kvm, netdev

On Tue, 2011-02-01 at 15:07 -0800, Sridhar Samudrala wrote:
> I think the counters that exceed the limits need to be reset to 0
> here.
> Otherwise we keep signaling for every buffer once we hit this
> condition. 

I will modify the patch to rerun the test to see the difference.

Shirley


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCHv2 dontapply] vhost-net tx tuning
  2011-02-01 23:07                     ` Sridhar Samudrala
  2011-02-01 23:27                       ` Shirley Ma
@ 2011-02-02  4:36                       ` Michael S. Tsirkin
  1 sibling, 0 replies; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-02-02  4:36 UTC (permalink / raw)
  To: Sridhar Samudrala; +Cc: Steve Dobbelstein, mashirle, kvm, netdev

On Tue, Feb 01, 2011 at 03:07:38PM -0800, Sridhar Samudrala wrote:
> On Tue, 2011-02-01 at 17:52 +0200, Michael S. Tsirkin wrote:
> > OK, so thinking about it more, maybe the issue is this:
> > tx becomes full. We process one request and interrupt the guest,
> > then it adds one request and the queue is full again.
> > 
> > Maybe the following will help it stabilize?  By default with it we will
> > only interrupt when we see an empty ring.
> > Which is likely too much: please try other values
> > in the middle: e.g. make bufs half the ring,
> > or bytes some small value like half ring * 200, or packets some
> > small value etc.
> > 
> > Set any one parameter to 0 to get current
> > behaviour (interrupt immediately when enabled).
> > 
> > Warning: completely untested.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > 
> > ---
> > 
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index aac05bc..6769cdc 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -32,6 +32,13 @@
> >   * Using this limit prevents one virtqueue from starving others. */
> >  #define VHOST_NET_WEIGHT 0x80000
> > 
> > +int tx_bytes_coalesce = 1000000000;
> > +module_param(tx_bytes_coalesce, int, 0644);
> > +int tx_bufs_coalesce = 1000000000;
> > +module_param(tx_bufs_coalesce, int, 0644);
> > +int tx_packets_coalesce = 1000000000;
> > +module_param(tx_packets_coalesce, int, 0644);
> > +
> >  enum {
> >  	VHOST_NET_VQ_RX = 0,
> >  	VHOST_NET_VQ_TX = 1,
> > @@ -127,6 +134,9 @@ static void handle_tx(struct vhost_net *net)
> >  	int err, wmem;
> >  	size_t hdr_size;
> >  	struct socket *sock;
> > +	int bytes_coalesced = 0;
> > +	int bufs_coalesced = 0;
> > +	int packets_coalesced = 0;
> > 
> >  	/* TODO: check that we are running from vhost_worker? */
> >  	sock = rcu_dereference_check(vq->private_data, 1);
> > @@ -196,14 +206,26 @@ static void handle_tx(struct vhost_net *net)
> >  		if (err != len)
> >  			pr_debug("Truncated TX packet: "
> >  				 " len %d != %zd\n", err, len);
> > -		vhost_add_used_and_signal(&net->dev, vq, head, 0);
> >  		total_len += len;
> > +		packets_coalesced += 1;
> > +		bytes_coalesced += len;
> > +		bufs_coalesced += out;
> > +		if (unlikely(packets_coalesced > tx_packets_coalesce ||
> > +			     bytes_coalesced > tx_bytes_coalesce ||
> > +			     bufs_coalesced > tx_bufs_coalesce))
> > +			vhost_add_used_and_signal(&net->dev, vq, head, 0);
> 
> I think the counters that exceed the limits need to be reset to 0 here.
> Otherwise we keep signaling for every buffer once we hit this condition.
> 
> Thanks
> Sridhar

Correct, good catch.

> > +		else
> > +			vhost_add_used(vq, head, 0);
> >  		if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
> >  			vhost_poll_queue(&vq->poll);
> >  			break;
> >  		}
> >  	}
> > 
> > +	if (likely(packets_coalesced &&
> > +		   bytes_coalesced &&
> > +		   bufs_coalesced))
> > +		vhost_signal(&net->dev, vq);
> >  	mutex_unlock(&vq->mutex);
> >  }
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-01 21:41                               ` Michael S. Tsirkin
@ 2011-02-02  4:39                                 ` Krishna Kumar2
  2011-02-02  4:42                                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 88+ messages in thread
From: Krishna Kumar2 @ 2011-02-02  4:39 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: David Miller, kvm, Shirley Ma, netdev, rusty, steved

> "Michael S. Tsirkin" <mst@redhat.com> 02/02/2011 03:11 AM
>
> On Tue, Feb 01, 2011 at 01:28:45PM -0800, Shirley Ma wrote:
> > On Tue, 2011-02-01 at 23:21 +0200, Michael S. Tsirkin wrote:
> > > Confused. We compare capacity to skb frags, no?
> > > That's sg I think ...
> >
> > Current guest kernel use indirect buffers, num_free returns how many
> > available descriptors not skb frags. So it's wrong here.
> >
> > Shirley
>
> I see. Good point. In other words when we complete the buffer
> it was indirect, but when we add a new one we
> can not allocate indirect so we consume.
> And then we start the queue and add will fail.
> I guess we need some kind of API to figure out
> whether the buf we complete was indirect?
>
> Another failure mode is when skb_xmit_done
> wakes the queue: it might be too early, there
> might not be space for the next packet in the vq yet.

I am not sure if this is the problem - shouldn't you
see these messages:
	if (likely(capacity == -ENOMEM)) {
		dev_warn(&dev->dev,
			"TX queue failure: out of memory\n");
	} else {
		dev->stats.tx_fifo_errors++;
		dev_warn(&dev->dev,
			"Unexpected TX queue failure: %d\n",
			capacity);
	}
in next xmit? I am not getting this in my testing.

> A solution might be to keep some kind of pool
> around for indirect, we wanted to do it for block anyway ...

Your vhost patch should fix this automatically. Right?

Thanks,

- KK


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-01 22:59                                     ` Shirley Ma
@ 2011-02-02  4:40                                       ` Michael S. Tsirkin
  2011-02-02  6:05                                         ` Shirley Ma
  0 siblings, 1 reply; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-02-02  4:40 UTC (permalink / raw)
  To: Shirley Ma
  Cc: Sridhar Samudrala, Steve Dobbelstein, David Miller, kvm,
	mashirle, netdev

On Tue, Feb 01, 2011 at 02:59:57PM -0800, Shirley Ma wrote:
> On Tue, 2011-02-01 at 23:56 +0200, Michael S. Tsirkin wrote:
> > There are flags for bytes, buffers and packets.
> > Try playing with any one of them :)
> > Just be sure to use v2.
> > 
> > 
> > >I would like to change it to
> > > half of the ring size instead for signaling. Is that OK?
> > > 
> > > Shirley
> > > 
> > > 
> > 
> > Sure that is why I made it a parameter so you can experiment. 
> 
> The initial test results shows that the CPUs utilization has been
> reduced some, and BW has increased some with the default parameters,
> like 1K message size BW goes from 2.5Gb/s about 2.8Gb/s, CPU utilization
> down from 4x% to 38%, (Similar results from the patch I submitted a
> while ago to reduce signaling on vhost) but far away from dropping
> packet results.
> 
> I am going to change the code to use 1/2 ring size to wake the netif
> queue.
> 
> Shirley

Just tweak the parameters with sysfs, you do not have to edit the code:
echo 64 > /sys/module/vhost_net/parameters/tx_bufs_coalesce

Or in a similar way for tx_packets_coalesce (since we use indirect,
packets will typically use 1 buffer each).

-- 
MST

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-02  4:39                                 ` Krishna Kumar2
@ 2011-02-02  4:42                                   ` Michael S. Tsirkin
  2011-02-09  0:37                                     ` Rusty Russell
  0 siblings, 1 reply; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-02-02  4:42 UTC (permalink / raw)
  To: Krishna Kumar2; +Cc: David Miller, kvm, Shirley Ma, netdev, rusty, steved

On Wed, Feb 02, 2011 at 10:09:18AM +0530, Krishna Kumar2 wrote:
> > "Michael S. Tsirkin" <mst@redhat.com> 02/02/2011 03:11 AM
> >
> > On Tue, Feb 01, 2011 at 01:28:45PM -0800, Shirley Ma wrote:
> > > On Tue, 2011-02-01 at 23:21 +0200, Michael S. Tsirkin wrote:
> > > > Confused. We compare capacity to skb frags, no?
> > > > That's sg I think ...
> > >
> > > Current guest kernel use indirect buffers, num_free returns how many
> > > available descriptors not skb frags. So it's wrong here.
> > >
> > > Shirley
> >
> > I see. Good point. In other words when we complete the buffer
> > it was indirect, but when we add a new one we
> > can not allocate indirect so we consume.
> > And then we start the queue and add will fail.
> > I guess we need some kind of API to figure out
> > whether the buf we complete was indirect?
> >
> > Another failure mode is when skb_xmit_done
> > wakes the queue: it might be too early, there
> > might not be space for the next packet in the vq yet.
> 
> I am not sure if this is the problem - shouldn't you
> see these messages:
> 	if (likely(capacity == -ENOMEM)) {
> 		dev_warn(&dev->dev,
> 			"TX queue failure: out of memory\n");
> 	} else {
> 		dev->stats.tx_fifo_errors++;
> 		dev_warn(&dev->dev,
> 			"Unexpected TX queue failure: %d\n",
> 			capacity);
> 	}
> in next xmit? I am not getting this in my testing.

Yes, I don't think we hit this in our testing,
simply because we don't stress memory.
Disable indirect, then you might see this.

> > A solution might be to keep some kind of pool
> > around for indirect, we wanted to do it for block anyway ...
> 
> Your vhost patch should fix this automatically. Right?

Reduce the chance of it happening, yes.

> 
> Thanks,
> 
> - KK

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-02  4:40                                       ` Michael S. Tsirkin
@ 2011-02-02  6:05                                         ` Shirley Ma
  2011-02-02  6:19                                           ` Shirley Ma
  0 siblings, 1 reply; 88+ messages in thread
From: Shirley Ma @ 2011-02-02  6:05 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Sridhar Samudrala, Steve Dobbelstein, David Miller, kvm,
	mashirle, netdev

On Wed, 2011-02-02 at 06:40 +0200, Michael S. Tsirkin wrote:
> Just tweak the parameters with sysfs, you do not have to edit the code:
> echo 64 > /sys/module/vhost_net/parameters/tx_bufs_coalesce
> 
> Or in a similar way for tx_packets_coalesce (since we use indirect,
> packets will typically use 1 buffer each).

We should use packets instead of buffers: in the indirect case one packet
has multiple buffers, but each packet uses only one descriptor from the
ring (default size is 256).

echo 128 > /sys/module/vhost_net/parameters/tx_packets_coalesce

The change I am making is: only once the netif queue has stopped do we
start counting num_free descriptors to decide when to send the signal
that wakes the netif queue.

Shirley


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-02  6:05                                         ` Shirley Ma
@ 2011-02-02  6:19                                           ` Shirley Ma
  2011-02-02  6:29                                             ` Michael S. Tsirkin
  2011-02-02  6:34                                             ` Krishna Kumar2
  0 siblings, 2 replies; 88+ messages in thread
From: Shirley Ma @ 2011-02-02  6:19 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Sridhar Samudrala, Steve Dobbelstein, David Miller, kvm,
	mashirle, netdev

On Tue, 2011-02-01 at 22:05 -0800, Shirley Ma wrote:
> 
> The way I am changing is only when netif queue has stopped, then we
> start to count num_free descriptors to send the signal to wake netif
> queue. 

I forgot to mention, the code change I am making is in the guest kernel:
the xmit callback only wakes up the queue when it's stopped && num_free >=
1/2 * vq->num. I added a new API in virtio_ring for this.
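
The new API is just a free-descriptor count, something like the sketch
below in drivers/virtio/virtio_ring.c (untested here; num_free is the
counter vring_virtqueue already keeps for its own bookkeeping):

	/* How many descriptors are currently unused in the ring. */
	unsigned int virtqueue_free_size(struct virtqueue *_vq)
	{
		struct vring_virtqueue *vq = to_vvq(_vq);

		return vq->num_free;
	}
	EXPORT_SYMBOL_GPL(virtqueue_free_size);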

However, reducing vhost signaling is needed as well. The patch I
submitted a while ago showed both CPU and BW improvements.

Thanks
Shirley


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-02  6:19                                           ` Shirley Ma
@ 2011-02-02  6:29                                             ` Michael S. Tsirkin
  2011-02-02  7:14                                               ` Shirley Ma
  2011-02-02  6:34                                             ` Krishna Kumar2
  1 sibling, 1 reply; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-02-02  6:29 UTC (permalink / raw)
  To: Shirley Ma
  Cc: Sridhar Samudrala, Steve Dobbelstein, David Miller, kvm,
	mashirle, netdev

On Tue, Feb 01, 2011 at 10:19:09PM -0800, Shirley Ma wrote:
> On Tue, 2011-02-01 at 22:05 -0800, Shirley Ma wrote:
> > 
> > The way I am changing is only when netif queue has stopped, then we
> > start to count num_free descriptors to send the signal to wake netif
> > queue. 
> 
> I forgot to mention, the code change I am making is in guest kernel, in
> xmit call back only wake up the queue when it's stopped && num_free >=
> 1/2 *vq->num, I add a new API in virtio_ring.

Interesting. Yes, I agree an API extension would be helpful. However,
wouldn't just the signaling reduction be enough, without guest changes?

> However vhost signaling reduction is needed as well. The patch I
> submitted a while ago showed both CPUs and BW improvement.
> 
> Thanks
> Shirley

Which patch was that?

-- 
MST

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-02  6:19                                           ` Shirley Ma
  2011-02-02  6:29                                             ` Michael S. Tsirkin
@ 2011-02-02  6:34                                             ` Krishna Kumar2
  2011-02-02  7:03                                               ` Shirley Ma
  2011-02-02 10:48                                               ` Michael S. Tsirkin
  1 sibling, 2 replies; 88+ messages in thread
From: Krishna Kumar2 @ 2011-02-02  6:34 UTC (permalink / raw)
  To: Shirley Ma
  Cc: David Miller, kvm, mashirle, Michael S. Tsirkin, netdev,
	netdev-owner, Sridhar Samudrala, Steve Dobbelstein

> On Tue, 2011-02-01 at 22:05 -0800, Shirley Ma wrote:
> >
> > The way I am changing is only when netif queue has stopped, then we
> > start to count num_free descriptors to send the signal to wake netif
> > queue.
>
> I forgot to mention, the code change I am making is in guest kernel, in
> xmit call back only wake up the queue when it's stopped && num_free >=
> 1/2 *vq->num, I add a new API in virtio_ring.

FYI :)

I have tried this before. There are a couple of issues:

1. the free count will not reduce until you run free_old_xmit_skbs,
   which will not run anymore since the tx queue is stopped.
2. You cannot call free_old_xmit_skbs directly as it races with a
   queue that was just awakened (current cb was due to the delay
   in disabling cb's).

You have to call free_old_xmit_skbs() under netif_queue_stopped()
check to avoid the race.
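
i.e. something along these lines (sketch, untested):

	static void skb_xmit_done(struct virtqueue *svq)
	{
		struct virtnet_info *vi = svq->vdev->priv;

		/* Suppress further interrupts. */
		virtqueue_disable_cb(svq);

		/* Reap only while the queue is stopped; calling
		 * free_old_xmit_skbs() unconditionally races with a
		 * queue that was just awakened. */
		if (netif_queue_stopped(vi->dev))
			free_old_xmit_skbs(vi);

		netif_wake_queue(vi->dev);
	}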

I got a small improvement in my testing up to some number of threads
(32 or 48?), but beyond that I was getting a regression.

Thanks,

- KK

> However vhost signaling reduction is needed as well. The patch I
> submitted a while ago showed both CPUs and BW improvement.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-02  6:34                                             ` Krishna Kumar2
@ 2011-02-02  7:03                                               ` Shirley Ma
  2011-02-02  7:37                                                 ` Krishna Kumar2
  2011-02-02 10:48                                               ` Michael S. Tsirkin
  1 sibling, 1 reply; 88+ messages in thread
From: Shirley Ma @ 2011-02-02  7:03 UTC (permalink / raw)
  To: Krishna Kumar2
  Cc: David Miller, kvm, mashirle, Michael S. Tsirkin, netdev,
	netdev-owner, Sridhar Samudrala, Steve Dobbelstein

On Wed, 2011-02-02 at 12:04 +0530, Krishna Kumar2 wrote:
> > On Tue, 2011-02-01 at 22:05 -0800, Shirley Ma wrote:
> > >
> > > The way I am changing is only when netif queue has stopped, then
> we
> > > start to count num_free descriptors to send the signal to wake
> netif
> > > queue.
> >
> > I forgot to mention, the code change I am making is in guest kernel,
> in
> > xmit call back only wake up the queue when it's stopped && num_free
> >=
> > 1/2 *vq->num, I add a new API in virtio_ring.
> 
> FYI :)

> I have tried this before. There are a couple of issues:
> 
> 1. the free count will not reduce until you run free_old_xmit_skbs,
>    which will not run anymore since the tx queue is stopped.
> 2. You cannot call free_old_xmit_skbs directly as it races with a
>    queue that was just awakened (current cb was due to the delay
>    in disabling cb's).
> 
> You have to call free_old_xmit_skbs() under netif_queue_stopped()
> check to avoid the race.

Yes, that's what I did: when the netif queue stops, don't wake the queue;
just call free_old_xmit_skbs(), and if not enough was freed, enable the
callback until half of the ring is free, then wake the netif queue. But
somehow I didn't reach the performance of dropping packets; I need to
think about it more. :)

Thanks
Shirley


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-02  6:29                                             ` Michael S. Tsirkin
@ 2011-02-02  7:14                                               ` Shirley Ma
  2011-02-02  7:33                                                 ` Shirley Ma
  2011-02-02 10:48                                                 ` Michael S. Tsirkin
  0 siblings, 2 replies; 88+ messages in thread
From: Shirley Ma @ 2011-02-02  7:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Sridhar Samudrala, Steve Dobbelstein, David Miller, kvm,
	mashirle, netdev

On Wed, 2011-02-02 at 08:29 +0200, Michael S. Tsirkin wrote:
> On Tue, Feb 01, 2011 at 10:19:09PM -0800, Shirley Ma wrote:
> > On Tue, 2011-02-01 at 22:05 -0800, Shirley Ma wrote:
> > > 
> > > The way I am changing is only when netif queue has stopped, then
> we
> > > start to count num_free descriptors to send the signal to wake
> netif
> > > queue. 
> > 
> > I forgot to mention, the code change I am making is in guest kernel,
> in
> > xmit call back only wake up the queue when it's stopped && num_free
> >=
> > 1/2 *vq->num, I add a new API in virtio_ring.
> 
> Interesting. Yes, I agree an API extension would be helpful. However,
> wouldn't just the signaling reduction be enough, without guest
> changes?

w/i the guest change, I played around with the parameters; for example,
for 1K message size I could get 3.7Gb/s at 42% CPU, with BW increasing
from 2.5Gb/s. w/i dropping packets, I was able to get up to 6.2Gb/s with
similar CPU usage.

> > However vhost signaling reduction is needed as well. The patch I
> > submitted a while ago showed both CPUs and BW improvement.
> > 
> > Thanks
> > Shirley
> 
> Which patch was that? 

The patch was called "vhost: TX used buffer guest signal accumulation".
You suggested splitting add_used_bufs and signal. I am still thinking
about the best way to coordinate guest (virtio_kick) with vhost
(handle_tx), and vhost (signaling) with guest (xmit callback), to reduce
the overheads, so I haven't submitted the new patch yet.

Thanks
Shirley


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-02  7:14                                               ` Shirley Ma
@ 2011-02-02  7:33                                                 ` Shirley Ma
  2011-02-02 10:49                                                   ` Michael S. Tsirkin
  2011-02-02 10:48                                                 ` Michael S. Tsirkin
  1 sibling, 1 reply; 88+ messages in thread
From: Shirley Ma @ 2011-02-02  7:33 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Sridhar Samudrala, Steve Dobbelstein, David Miller, kvm,
	mashirle, netdev

On Tue, 2011-02-01 at 23:14 -0800, Shirley Ma wrote:
> w/i guest change, I played around the parameters,for example: I could
> get 3.7Gb/s with 42% CPU BW increasing from 2.5Gb/s for 1K message
> size,
> w/i dropping packet, I was able to get up to 6.2Gb/s with similar CPU
> usage. 

I meant w/o guest change, only vhost changes. Sorry about that.

Shirley


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-02  7:03                                               ` Shirley Ma
@ 2011-02-02  7:37                                                 ` Krishna Kumar2
  0 siblings, 0 replies; 88+ messages in thread
From: Krishna Kumar2 @ 2011-02-02  7:37 UTC (permalink / raw)
  To: Shirley Ma
  Cc: David Miller, kvm, mashirle, Michael S. Tsirkin, netdev,
	netdev-owner, Sridhar Samudrala, Steve Dobbelstein

> Shirley Ma <mashirle@us.ibm.com> wrote:
>
> > I have tried this before. There are a couple of issues:
> >
> > 1. the free count will not reduce until you run free_old_xmit_skbs,
> >    which will not run anymore since the tx queue is stopped.
> > 2. You cannot call free_old_xmit_skbs directly as it races with a
> >    queue that was just awakened (current cb was due to the delay
> >    in disabling cb's).
> >
> > You have to call free_old_xmit_skbs() under netif_queue_stopped()
> > check to avoid the race.
>
> Yes, that' what I did, when the netif queue stop, don't enable the
> queue, just free_old_xmit_skbs(), if not enough freed, then enabling
> callback until half of the ring size are freed, then wake the netif
> queue. But somehow I didn't reach the performance compared to drop
> packets, need to think about it more. :)

Did you check if the number of vmexits increased with this
patch? This is possible if the device was keeping up (and
not going into a stop, start, xmit 1 packet, stop, start
loop). Also maybe you should try for 1/4th instead of 1/2?

MST's delayed signalling should avoid this issue, I haven't
tried both together.

Thanks,

- KK


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-02  6:34                                             ` Krishna Kumar2
  2011-02-02  7:03                                               ` Shirley Ma
@ 2011-02-02 10:48                                               ` Michael S. Tsirkin
  2011-02-02 15:39                                                 ` Shirley Ma
  1 sibling, 1 reply; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-02-02 10:48 UTC (permalink / raw)
  To: Krishna Kumar2
  Cc: Shirley Ma, David Miller, kvm, mashirle, netdev, netdev-owner,
	Sridhar Samudrala, Steve Dobbelstein

On Wed, Feb 02, 2011 at 12:04:37PM +0530, Krishna Kumar2 wrote:
> > On Tue, 2011-02-01 at 22:05 -0800, Shirley Ma wrote:
> > >
> > > The way I am changing is only when netif queue has stopped, then we
> > > start to count num_free descriptors to send the signal to wake netif
> > > queue.
> >
> > I forgot to mention, the code change I am making is in guest kernel, in
> > xmit call back only wake up the queue when it's stopped && num_free >=
> > 1/2 *vq->num, I add a new API in virtio_ring.
> 
> FYI :)
> 
> I have tried this before. There are a couple of issues:
> 
> 1. the free count will not reduce until you run free_old_xmit_skbs,
>    which will not run anymore since the tx queue is stopped.
> 2. You cannot call free_old_xmit_skbs directly as it races with a
>    queue that was just awakened (current cb was due to the delay
>    in disabling cb's).
> 
> You have to call free_old_xmit_skbs() under netif_queue_stopped()
> check to avoid the race.
> 
> I got a small improvement in my testing upto some number of threads
> (32 or 48?), but beyond that I was getting a regression.
> 
> Thanks,
> 
> - KK
> 
> > However vhost signaling reduction is needed as well. The patch I
> > submitted a while ago showed both CPUs and BW improvement.

Yes, I think doing this in the host is much simpler,
just send an interrupt after there's a decent amount
of space in the queue.

Having said that the simple heuristic that I coded
might be a bit too simple.

-- 
MST

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-02  7:14                                               ` Shirley Ma
  2011-02-02  7:33                                                 ` Shirley Ma
@ 2011-02-02 10:48                                                 ` Michael S. Tsirkin
  1 sibling, 0 replies; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-02-02 10:48 UTC (permalink / raw)
  To: Shirley Ma
  Cc: Sridhar Samudrala, Steve Dobbelstein, David Miller, kvm,
	mashirle, netdev

On Tue, Feb 01, 2011 at 11:14:51PM -0800, Shirley Ma wrote:
> On Wed, 2011-02-02 at 08:29 +0200, Michael S. Tsirkin wrote:
> > On Tue, Feb 01, 2011 at 10:19:09PM -0800, Shirley Ma wrote:
> > > On Tue, 2011-02-01 at 22:05 -0800, Shirley Ma wrote:
> > > > 
> > > > The way I am changing is only when netif queue has stopped, then
> > we
> > > > start to count num_free descriptors to send the signal to wake
> > netif
> > > > queue. 
> > > 
> > > I forgot to mention, the code change I am making is in guest kernel,
> > in
> > > xmit call back only wake up the queue when it's stopped && num_free
> > >=
> > > 1/2 *vq->num, I add a new API in virtio_ring.
> > 
> > Interesting. Yes, I agree an API extension would be helpful. However,
> > wouldn't just the signaling reduction be enough, without guest
> > changes?
> 
> w/i guest change, I played around the parameters,for example: I could
> get 3.7Gb/s with 42% CPU BW increasing from 2.5Gb/s for 1K message size,
> w/i dropping packet, I was able to get up to 6.2Gb/s with similar CPU
> usage.

We need to consider them separately IMO.  What's the best we can get
without guest change?  And which parameters give it?
There will always be old guests, and as far as I can tell
it should work better from host.

> > > However vhost signaling reduction is needed as well. The patch I
> > > submitted a while ago showed both CPUs and BW improvement.
> > > 
> > > Thanks
> > > Shirley
> > 
> > Which patch was that? 
> 
> The patch was called "vhost: TX used buffer guest signal accumulation".
Yes, a somewhat similar idea.

> You suggested to split add_used_bufs and signal.
Exactly. And this is basically what this patch does.

> I am still thinking
> what's the best approach to cooperate guest (virtio_kick) and
> vhost(handle_tx), vhost(signaling) and guest (xmit callback) to reduce
> the overheads, so I haven't submit the new patch yet.
> 
> Thanks
> Shirley


-- 
MST

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-02  7:33                                                 ` Shirley Ma
@ 2011-02-02 10:49                                                   ` Michael S. Tsirkin
  2011-02-02 15:42                                                     ` Shirley Ma
  0 siblings, 1 reply; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-02-02 10:49 UTC (permalink / raw)
  To: Shirley Ma
  Cc: Sridhar Samudrala, Steve Dobbelstein, David Miller, kvm,
	mashirle, netdev

On Tue, Feb 01, 2011 at 11:33:49PM -0800, Shirley Ma wrote:
> On Tue, 2011-02-01 at 23:14 -0800, Shirley Ma wrote:
> > w/i guest change, I played around the parameters,for example: I could
> > get 3.7Gb/s with 42% CPU BW increasing from 2.5Gb/s for 1K message
> > size,
> > w/i dropping packet, I was able to get up to 6.2Gb/s with similar CPU
> > usage. 
> 
> I meant w/o guest change, only vhost changes. Sorry about that.
> 
> Shirley

Ah, excellent. What were the parameters?

-- 
MST

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-02 10:48                                               ` Michael S. Tsirkin
@ 2011-02-02 15:39                                                 ` Shirley Ma
  2011-02-02 15:47                                                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 88+ messages in thread
From: Shirley Ma @ 2011-02-02 15:39 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Krishna Kumar2, David Miller, kvm, mashirle, netdev,
	netdev-owner, Sridhar Samudrala, Steve Dobbelstein

On Wed, 2011-02-02 at 12:48 +0200, Michael S. Tsirkin wrote:
> Yes, I think doing this in the host is much simpler,
> just send an interrupt after there's a decent amount
> of space in the queue.
> 
> Having said that the simple heuristic that I coded
> might be a bit too simple.

From the debugging output I have seen so far (a single small-message
TCP_STREAM test), I think the right approach is to patch both guest and
vhost. The problem I have found is a regression for the single
small-message TCP_STREAM test. The old kernel works well for TCP_STREAM;
only the new kernel has the problem.

For Steven's problem, it's a multiple-stream TCP_RR issue; the old guest
doesn't perform well, and neither does the new guest kernel. We tested
the patch that reduces vhost signaling before; it didn't help the
performance at all.

Thanks
Shirley


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-02 10:49                                                   ` Michael S. Tsirkin
@ 2011-02-02 15:42                                                     ` Shirley Ma
  2011-02-02 15:48                                                       ` Michael S. Tsirkin
  2011-02-02 18:20                                                       ` Michael S. Tsirkin
  0 siblings, 2 replies; 88+ messages in thread
From: Shirley Ma @ 2011-02-02 15:42 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Sridhar Samudrala, Steve Dobbelstein, David Miller, kvm,
	mashirle, netdev

On Wed, 2011-02-02 at 12:49 +0200, Michael S. Tsirkin wrote:
> On Tue, Feb 01, 2011 at 11:33:49PM -0800, Shirley Ma wrote:
> > On Tue, 2011-02-01 at 23:14 -0800, Shirley Ma wrote:
> > > w/i guest change, I played around the parameters,for example: I
> could
> > > get 3.7Gb/s with 42% CPU BW increasing from 2.5Gb/s for 1K message
> > > size,
> > > w/i dropping packet, I was able to get up to 6.2Gb/s with similar
> CPU
> > > usage. 
> > 
> > I meant w/o guest change, only vhost changes. Sorry about that.
> > 
> > Shirley
> 
> Ah, excellent. What were the parameters? 

I used half of the ring size, 129, for the packet counter, but the
performance is still not as good as dropping packets in the guest:
3.7Gb/s vs. 6.2Gb/s.

Shirley


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-02 15:39                                                 ` Shirley Ma
@ 2011-02-02 15:47                                                   ` Michael S. Tsirkin
  2011-02-02 17:10                                                     ` Shirley Ma
  0 siblings, 1 reply; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-02-02 15:47 UTC (permalink / raw)
  To: Shirley Ma
  Cc: Krishna Kumar2, David Miller, kvm, mashirle, netdev,
	netdev-owner, Sridhar Samudrala, Steve Dobbelstein

On Wed, Feb 02, 2011 at 07:39:45AM -0800, Shirley Ma wrote:
> On Wed, 2011-02-02 at 12:48 +0200, Michael S. Tsirkin wrote:
> > Yes, I think doing this in the host is much simpler,
> > just send an interrupt after there's a decent amount
> > of space in the queue.
> > 
> > Having said that the simple heuristic that I coded
> > might be a bit too simple.
> 
> >From the debugging out what I have seen so far (a single small message
> TCP_STEAM test), I think the right approach is to patch both guest and
> vhost.

One problem is that slowing down the guest helps here.
So there's a chance that just by adding complexity
in the guest driver we get a small improvement :(

We can't rely on a patched guest anyway, so
I think it is best to test guest and host changes separately.

And I do agree something needs to be done in the guest too. For example,
when vqs share an interrupt we might invoke a callback when we see the vq
is not empty, even though the interrupt was not requested for it. We
should probably check whether interrupts are enabled there?
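
e.g. something like this in vring_interrupt() (sketch only, from memory
of the virtio_ring internals, untested):

	irqreturn_t vring_interrupt(int irq, void *_vq)
	{
		struct vring_virtqueue *vq = to_vvq(_vq);

		if (!more_used(vq))
			return IRQ_NONE;

		/* This vq shares the irq line but has callbacks
		 * suppressed: don't bother the driver. */
		if (vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT)
			return IRQ_HANDLED;

		if (vq->vq.callback)
			vq->vq.callback(&vq->vq);

		return IRQ_HANDLED;
	}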

> The problem I have found is a regression for single  small
> message TCP_STEAM test. Old kernel works well for TCP_STREAM, only new
> kernel has problem.

Likely new kernel is faster :)

> For Steven's problem, it's multiple stream TCP_RR issues, the old guest
> doesn't perform well, so does new guest kernel. We tested reducing vhost
> signaling patch before, it didn't help the performance at all.
> 
> Thanks
> Shirley

Yes, it seems unrelated to tx interrupts.

-- 
MST

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-02 15:42                                                     ` Shirley Ma
@ 2011-02-02 15:48                                                       ` Michael S. Tsirkin
  2011-02-02 17:12                                                         ` Shirley Ma
  2011-02-02 18:20                                                       ` Michael S. Tsirkin
  1 sibling, 1 reply; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-02-02 15:48 UTC (permalink / raw)
  To: Shirley Ma
  Cc: Sridhar Samudrala, Steve Dobbelstein, David Miller, kvm,
	mashirle, netdev

On Wed, Feb 02, 2011 at 07:42:51AM -0800, Shirley Ma wrote:
> On Wed, 2011-02-02 at 12:49 +0200, Michael S. Tsirkin wrote:
> > On Tue, Feb 01, 2011 at 11:33:49PM -0800, Shirley Ma wrote:
> > > On Tue, 2011-02-01 at 23:14 -0800, Shirley Ma wrote:
> > > > w/i guest change, I played around the parameters,for example: I
> > could
> > > > get 3.7Gb/s with 42% CPU BW increasing from 2.5Gb/s for 1K message
> > > > size,
> > > > w/i dropping packet, I was able to get up to 6.2Gb/s with similar
> > CPU
> > > > usage. 
> > > 
> > > I meant w/o guest change, only vhost changes. Sorry about that.
> > > 
> > > Shirley
> > 
> > Ah, excellent. What were the parameters? 
> 
> I used half of the ring size 129 for packet counters, but the
> performance is still not as good as dropping packets on guest, 3.7 Gb/s
> vs. 6.2Gb/s.
> 
> Shirley

And this is with sndbuf=0 in the host, yes?
And do you see a lot of tx interrupts?
How many packets per interrupt?

-- 
MST

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-02 15:47                                                   ` Michael S. Tsirkin
@ 2011-02-02 17:10                                                     ` Shirley Ma
  2011-02-02 17:32                                                       ` Michael S. Tsirkin
  0 siblings, 1 reply; 88+ messages in thread
From: Shirley Ma @ 2011-02-02 17:10 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Krishna Kumar2, David Miller, kvm, mashirle, netdev,
	netdev-owner, Sridhar Samudrala, Steve Dobbelstein

On Wed, 2011-02-02 at 17:47 +0200, Michael S. Tsirkin wrote:
> On Wed, Feb 02, 2011 at 07:39:45AM -0800, Shirley Ma wrote:
> > On Wed, 2011-02-02 at 12:48 +0200, Michael S. Tsirkin wrote:
> > > Yes, I think doing this in the host is much simpler,
> > > just send an interrupt after there's a decent amount
> > > of space in the queue.
> > > 
> > > Having said that the simple heuristic that I coded
> > > might be a bit too simple.
> > 
> > >From the debugging out what I have seen so far (a single small
> message
> > TCP_STEAM test), I think the right approach is to patch both guest
> and
> > vhost.
> 
> One problem is slowing down the guest helps here.
> So there's a chance that just by adding complexity
> in guest driver we get a small improvement :(
> 
> We can't rely on a patched guest anyway, so
> I think it is best to test guest and host changes separately.
> 
> And I do agree something needs to be done in guest too,
> for example when vqs share an interrupt, we
> might invoke a callback when we see vq is not empty
> even though it's not requested. Probably should
> check interrupts enabled here?

Yes, I modified the xmit callback to something like this:

static void skb_xmit_done(struct virtqueue *svq)
{
        struct virtnet_info *vi = svq->vdev->priv;

        /* Suppress further interrupts. */
        virtqueue_disable_cb(svq);

        /* We were probably waiting for more output buffers. */
        if (netif_queue_stopped(vi->dev)) {
                free_old_xmit_skbs(vi);
                if (virtqueue_free_size(svq)  <= svq->vring.num / 2) {
                        virtqueue_enable_cb(svq);
			return;
		}
        }
	netif_wake_queue(vi->dev);
}

> > The problem I have found is a regression for single  small
> > message TCP_STEAM test. Old kernel works well for TCP_STREAM, only
> new
> > kernel has problem.
> 
> Likely new kernel is faster :)

> > For Steven's problem, it's multiple stream TCP_RR issues, the old
> guest
> > doesn't perform well, so does new guest kernel. We tested reducing
> vhost
> > signaling patch before, it didn't help the performance at all.
> > 
> > Thanks
> > Shirley
> 
> Yes, it seems unrelated to tx interrupts. 

The issue is more likely related to latency. Do you have anything in
mind on how to reduce vhost latency?

Thanks
Shirley


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-02 15:48                                                       ` Michael S. Tsirkin
@ 2011-02-02 17:12                                                         ` Shirley Ma
  0 siblings, 0 replies; 88+ messages in thread
From: Shirley Ma @ 2011-02-02 17:12 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Sridhar Samudrala, Steve Dobbelstein, David Miller, kvm,
	mashirle, netdev

On Wed, 2011-02-02 at 17:48 +0200, Michael S. Tsirkin wrote:
> And this is with sndbuf=0 in host, yes?
> And do you see a lot of tx interrupts?
> How packets per interrupt?

Nope, sndbuf doesn't matter since I never hit the sock wmem condition in
vhost. I am still playing around; let me know what data you would like me
to collect.

Thanks
Shirley


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-02 17:10                                                     ` Shirley Ma
@ 2011-02-02 17:32                                                       ` Michael S. Tsirkin
  2011-02-02 18:11                                                         ` Shirley Ma
  0 siblings, 1 reply; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-02-02 17:32 UTC (permalink / raw)
  To: Shirley Ma
  Cc: Krishna Kumar2, David Miller, kvm, mashirle, netdev,
	netdev-owner, Sridhar Samudrala, Steve Dobbelstein

On Wed, Feb 02, 2011 at 09:10:35AM -0800, Shirley Ma wrote:
> On Wed, 2011-02-02 at 17:47 +0200, Michael S. Tsirkin wrote:
> > On Wed, Feb 02, 2011 at 07:39:45AM -0800, Shirley Ma wrote:
> > > On Wed, 2011-02-02 at 12:48 +0200, Michael S. Tsirkin wrote:
> > > > Yes, I think doing this in the host is much simpler,
> > > > just send an interrupt after there's a decent amount
> > > > of space in the queue.
> > > > 
> > > > Having said that the simple heuristic that I coded
> > > > might be a bit too simple.
> > > 
> > > From the debugging output I have seen so far (a single small
> > > message TCP_STREAM test), I think the right approach is to patch
> > > both guest and vhost.
> > 
> > One problem is slowing down the guest helps here.
> > So there's a chance that just by adding complexity
> > in guest driver we get a small improvement :(
> > 
> > We can't rely on a patched guest anyway, so
> > I think it is best to test guest and host changes separately.
> > 
> > And I do agree something needs to be done in guest too,
> > for example when vqs share an interrupt, we
> > might invoke a callback when we see vq is not empty
> > even though it's not requested. Probably should
> > check interrupts enabled here?
> 
> Yes, I modified xmit callback something like below:
> 
> static void skb_xmit_done(struct virtqueue *svq)
> {
>         struct virtnet_info *vi = svq->vdev->priv;
> 
>         /* Suppress further interrupts. */
>         virtqueue_disable_cb(svq);
> 
>         /* We were probably waiting for more output buffers. */
>         if (netif_queue_stopped(vi->dev)) {
>                 free_old_xmit_skbs(vi);
>                 if (virtqueue_free_size(svq)  <= svq->vring.num / 2) {
>                         virtqueue_enable_cb(svq);
> 			return;
> 		}
>         }
> 	netif_wake_queue(vi->dev);
> }

OK, but this should have no effect with a vhost patch
which should ensure that we don't get an interrupt
until the queue is at least half empty.
Right?
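
For illustration, a rough sketch of that kind of heuristic in vhost's
handle_tx() -- this is not the actual patch, and the 'pending' counter is
made up for the example:

	/* after each vhost_add_used(), count completions and only
	 * consider signalling once half a ring's worth has built up */
	if (++pending >= vq->num / 2) {
		vhost_signal(&net->dev, vq);
		pending = 0;
	}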

> > > The problem I have found is a regression for a single small
> > > message TCP_STREAM test. The old kernel works well for TCP_STREAM;
> > > only the new kernel has the problem.
> > 
> > Likely new kernel is faster :)
> 
> > > For Steven's problem, it's a multiple-stream TCP_RR issue; the old
> > > guest doesn't perform well, and neither does the new guest kernel. We
> > > tested a patch to reduce vhost signaling before; it didn't help the
> > > performance at all.
> > > 
> > > Thanks
> > > Shirley
> > 
> > Yes, it seems unrelated to tx interrupts. 
> 
> The issue is more likely related to latency.

Could be. Why do you think so?

> Do you have anything in
> mind on how to reduce vhost latency?
> 
> Thanks
> Shirley

Hmm, bypassing the bridge might help a bit.
Are you using tap+bridge or macvtap?

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-02 17:32                                                       ` Michael S. Tsirkin
@ 2011-02-02 18:11                                                         ` Shirley Ma
  2011-02-02 18:27                                                           ` Michael S. Tsirkin
  0 siblings, 1 reply; 88+ messages in thread
From: Shirley Ma @ 2011-02-02 18:11 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Krishna Kumar2, David Miller, kvm, mashirle, netdev,
	netdev-owner, Sridhar Samudrala, Steve Dobbelstein

On Wed, 2011-02-02 at 19:32 +0200, Michael S. Tsirkin wrote:
> OK, but this should have no effect with a vhost patch
> which should ensure that we don't get an interrupt
> until the queue is at least half empty.
> Right?

There should be some coordination between guest and vhost. We shouldn't
count the TX packets while the netif queue is enabled, since the next guest
TX xmit will free any used buffers in vhost. We need to be careful here in
case we miss an interrupt after the netif queue has stopped.

However, we can't change old guests, so we can test the patches separately:
guest only, vhost only, and the combination.

> > > 
> > > Yes, it seems unrelated to tx interrupts. 
> > 
> > The issue is more likely related to latency.
> 
> Could be. Why do you think so?

Since I played with a latency hack, I can see a performance difference for
different latencies.

> > Do you have anything in
> > mind on how to reduce vhost latency?
> > 
> > Thanks
> > Shirley
> 
> Hmm, bypassing the bridge might help a bit.
> Are you using tap+bridge or macvtap? 

I am using tap+bridge for the TCP_RR test; I think Steven tested macvtap
before. He might have some data from his workload performance
measurements.

Shirley


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-02 15:42                                                     ` Shirley Ma
  2011-02-02 15:48                                                       ` Michael S. Tsirkin
@ 2011-02-02 18:20                                                       ` Michael S. Tsirkin
  2011-02-02 18:26                                                         ` Shirley Ma
  1 sibling, 1 reply; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-02-02 18:20 UTC (permalink / raw)
  To: Shirley Ma
  Cc: Sridhar Samudrala, Steve Dobbelstein, David Miller, kvm,
	mashirle, netdev

On Wed, Feb 02, 2011 at 07:42:51AM -0800, Shirley Ma wrote:
> On Wed, 2011-02-02 at 12:49 +0200, Michael S. Tsirkin wrote:
> > On Tue, Feb 01, 2011 at 11:33:49PM -0800, Shirley Ma wrote:
> > > On Tue, 2011-02-01 at 23:14 -0800, Shirley Ma wrote:
> > > > w/i guest change, I played around the parameters,for example: I
> > could
> > > > get 3.7Gb/s with 42% CPU BW increasing from 2.5Gb/s for 1K message
> > > > size,
> > > > w/i dropping packet, I was able to get up to 6.2Gb/s with similar
> > CPU
> > > > usage. 
> > > 
> > > I meant w/o guest change, only vhost changes. Sorry about that.
> > > 
> > > Shirley
> > 
> > Ah, excellent. What were the parameters? 
> 
> I used half of the ring size 129 for packet counters,
> but the
> performance is still not as good as dropping packets on guest, 3.7 Gb/s
> vs. 6.2Gb/s.
> 
> Shirley

How many packets and bytes per interrupt are sent?
Also, what about other values for the counter, and other counters?

What does your patch do? Just drop packets instead of
stopping the interface?

To understand when we should drop packets
in the guest, we need to know *why* it helps.
Otherwise, how do we know it will work for others?
Note that the qdisc will drop packets when it overruns -
so what is different? Also, are we over-running some other queue
somewhere?

-- 
MST

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-02 18:20                                                       ` Michael S. Tsirkin
@ 2011-02-02 18:26                                                         ` Shirley Ma
  0 siblings, 0 replies; 88+ messages in thread
From: Shirley Ma @ 2011-02-02 18:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Sridhar Samudrala, Steve Dobbelstein, David Miller, kvm,
	mashirle, netdev

On Wed, 2011-02-02 at 20:20 +0200, Michael S. Tsirkin wrote:
> How many packets and bytes per interrupt are sent?
> Also, what about other values for the counters and other counters?
> 
> What does your patch do? Just drop packets instead of
> stopping the interface?
> 
> To have an understanding when should we drop packets
> in the guest, we need to know *why* does it help.
> Otherwise, how do we know it will work for others?
> Note that qdisc will drop packets when it overruns -
> so what is different? Also, are we over-running some other queue
> somewhere? 

Agreed. I am trying to put more debugging output to look for all these
answers.

Shirley


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-02 18:11                                                         ` Shirley Ma
@ 2011-02-02 18:27                                                           ` Michael S. Tsirkin
  2011-02-02 19:29                                                             ` Shirley Ma
  0 siblings, 1 reply; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-02-02 18:27 UTC (permalink / raw)
  To: Shirley Ma
  Cc: Krishna Kumar2, David Miller, kvm, mashirle, netdev,
	netdev-owner, Sridhar Samudrala, Steve Dobbelstein

On Wed, Feb 02, 2011 at 10:11:51AM -0800, Shirley Ma wrote:
> On Wed, 2011-02-02 at 19:32 +0200, Michael S. Tsirkin wrote:
> > OK, but this should have no effect with a vhost patch
> > which should ensure that we don't get an interrupt
> > until the queue is at least half empty.
> > Right?
> 
> There should be some coordination between guest and vhost.

What kind of coordination? With a patched vhost and a full ring,
you should get an interrupt per 100 packets.
Is this what you see? And if yes, isn't the guest patch
doing nothing then?

> We shouldn't
> count the TX packets when netif queue is enabled since next guest TX
> xmit will free any used buffers in vhost. We need to be careful here in
> case we miss the interrupts when netif queue has stopped.
> 
> However we can't change old guest so we can test the patches separately
> for guest only, vhost only, and the combination.
> 
> > > > 
> > > > Yes, it seems unrelated to tx interrupts. 
> > > 
> > > The issue is more likely related to latency.
> > 
> > Could be. Why do you think so?
> 
> Since I played with latency hack, I can see performance difference for
> different latency.

Which hack was that?

> > > Do you have anything in
> > > mind on how to reduce vhost latency?
> > > 
> > > Thanks
> > > Shirley
> > 
> > Hmm, bypassing the bridge might help a bit.
> > Are you using tap+bridge or macvtap? 
> 
> I am using tap+bridge for TCP_RR test, I think Steven tested macvtap
> before. He might have some data from his workload performance
> measurement.
> 
> Shirley

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-01-25 21:09 Network performance with small packets Steve Dobbelstein
  2011-01-26 15:17 ` Michael S. Tsirkin
@ 2011-02-02 18:38 ` Michael S. Tsirkin
  2011-02-02 19:15   ` Steve Dobbelstein
  1 sibling, 1 reply; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-02-02 18:38 UTC (permalink / raw)
  To: Steve Dobbelstein; +Cc: kvm

On Tue, Jan 25, 2011 at 03:09:34PM -0600, Steve Dobbelstein wrote:
> 
> I am working on a KVM network performance issue found in our lab running
> the DayTrader benchmark.  The benchmark throughput takes a significant hit
> when running the application server in a KVM guest verses on bare metal.
> We have dug into the problem and found that DayTrader's use of small
> packets exposes KVM's overhead of handling network packets.  I have been
> able to reproduce the performance hit with a simpler setup using the
> netperf benchmark with the TCP_RR test and the request and response sizes
> set to 256 bytes.  I run the benchmark between two physical systems, each
> using a 1GB link.  In order to get the maximum throughput for the system I
> have to run 100 instances of netperf.  When I run the netserver processes
> in a guest, I see a maximum throughput that is 51% of what I get if I run
> the netserver processes directly on the host.  The CPU utilization in the
> guest is only 85% at maximum throughput, whereas it is 100% on bare metal.

You are stressing the scheduler pretty hard with this test :)
Is your real benchmark also using a huge number of threads?
If it's not, you might be seeing a different issue.
IOW, the netperf degradation might not be network-related at all,
but have to do with the speed of context switching in the guest.
Thoughts?

> The KVM host has 16 CPUs.  The KVM guest is configured with 2 VCPUs.  When
> I run netperf on the host I boot the host with maxcpus=2 on the kernel
> command line.  The host is running the current KVM upstream kernel along
> with the current upstream qemu.  Here is the qemu command used to launch
> the guest:
> /build/qemu-kvm/x86_64-softmmu/qemu-system-x86_64 -name glasgow-RH60 -m 32768 -drive file=/build/guest-data/glasgow-RH60.img,if=virtio,index=0,boot=on
>  -drive file=/dev/virt/WAS,if=virtio,index=1 -net nic,model=virtio,vlan=3,macaddr=00:1A:64:E5:00:63,netdev=nic0 -netdev tap,id=nic0,vhost=on -smp 2
> -vnc :1 -monitor telnet::4499,server,nowait -serial telnet::8899,server,nowait --mem-path /libhugetlbfs -daemonize
> 
> We have tried various proposed fixes, each with varying amounts of success.
> One such fix was to add code to the vhost thread such that when it found
> the work queue empty it wouldn't just exit the thread but rather would
> delay for 50 microseconds and then recheck the queue.  If there was work on
> the queue it would loop back and process it, else it would exit the thread.
> The change got us a 13% improvement in the DayTrader throughput.
> 
> Running the same netperf configuration on the same hardware but using a
> different hypervisor gets us significantly better throughput numbers.   The
> guest on that hypervisor runs at 100% CPU utilization.  The various fixes
> we have tried have not gotten us close to the throughput seen on the other
> hypervisor.  I'm looking for ideas/input from the KVM experts on how to
> make KVM perform better when handling small packets.
> 
> Thanks,
> Steve
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-02 18:38 ` Michael S. Tsirkin
@ 2011-02-02 19:15   ` Steve Dobbelstein
  0 siblings, 0 replies; 88+ messages in thread
From: Steve Dobbelstein @ 2011-02-02 19:15 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm

"Michael S. Tsirkin" <mst@redhat.com> wrote on 02/02/2011 12:38:47 PM:

> On Tue, Jan 25, 2011 at 03:09:34PM -0600, Steve Dobbelstein wrote:
> >
> > I am working on a KVM network performance issue found in our lab
running
> > the DayTrader benchmark.  The benchmark throughput takes a significant
hit
> > when running the application server in a KVM guest verses on bare
metal.
> > We have dug into the problem and found that DayTrader's use of small
> > packets exposes KVM's overhead of handling network packets.  I have
been
> > able to reproduce the performance hit with a simpler setup using the
> > netperf benchmark with the TCP_RR test and the request and response
sizes
> > set to 256 bytes.  I run the benchmark between two physical systems,
each
> > using a 1GB link.  In order to get the maximum throughput for the
system I
> > have to run 100 instances of netperf.  When I run the netserver
processes
> > in a guest, I see a maximum throughput that is 51% of what I get if I
run
> > the netserver processes directly on the host.  The CPU utilization in
the
> > guest is only 85% at maximum throughput, whereas it is 100% on bare
metal.
>
> You are stressing the scheduler pretty hard with this test :)
> Is your real benchmark also using a huge number of threads?

Yes.  The real benchmark has 60 threads handling client requests and 48
threads talking to a database server.

> If it's not, you might be seeing a different issue.
> IOW, the netperf degradation might not be network-related at all,
> but have to do with speed of context switch in guest.
> Thoughts?

Yes, context switches can add to the overhead.  We have that data captured,
and I can look at it.  What makes me think that's not the issue is that the
CPU utilization in the guest is only about 85% at maximum throughput.
Throughput/CPU is comparable to a different hypervisor, but that hypervisor
runs at full CPU utilization and gets better throughput.  I can't help but
think KVM would get better throughput if it could just keep the guest VCPUs
busy.

Recently I have been playing with different CPU pinnings for the guest
VCPUs and the vhost thread.  Certain combinations can get us up to a 35%
improvement in throughput with the same throughput/CPU ratio.  CPU
utilization was 94% -- not full CPU utilization, but it does illustrate
that we can get better throughput if we keep the guest VCPUs busy.  At this
point it's looking more like a scheduler issue.  We're starting to dig
through the scheduler code for clues.
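
For reference, one way to do that kind of pinning from the host (the core
numbers and thread IDs below are placeholders, not necessarily what we
used):

taskset -pc 0 <qemu VCPU 0 thread ID>
taskset -pc 1 <qemu VCPU 1 thread ID>
taskset -pc 2 <vhost thread ID>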

Steve D.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-02 18:27                                                           ` Michael S. Tsirkin
@ 2011-02-02 19:29                                                             ` Shirley Ma
  2011-02-02 20:17                                                               ` Michael S. Tsirkin
  0 siblings, 1 reply; 88+ messages in thread
From: Shirley Ma @ 2011-02-02 19:29 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Krishna Kumar2, David Miller, kvm, mashirle, netdev,
	netdev-owner, Sridhar Samudrala, Steve Dobbelstein

On Wed, 2011-02-02 at 20:27 +0200, Michael S. Tsirkin wrote:
> On Wed, Feb 02, 2011 at 10:11:51AM -0800, Shirley Ma wrote:
> > On Wed, 2011-02-02 at 19:32 +0200, Michael S. Tsirkin wrote:
> > > OK, but this should have no effect with a vhost patch
> > > which should ensure that we don't get an interrupt
> > > until the queue is at least half empty.
> > > Right?
> > 
> > There should be some coordination between guest and vhost.
> 
> What kind of coordination? With a patched vhost, and a full ring.
> you should get an interrupt per 100 packets.
> Is this what you see? And if yes, isn't the guest patch
> doing nothing then?

vhost_signal won't be able to send any TX interrupts to the guest when the
guest TX interrupt is disabled. The guest TX interrupt is only enabled when
the guest runs out of descriptors.
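
(For reference, roughly how that suppression works -- a simplified sketch,
not the exact vhost code; avail_flags stands for the flags value vhost
reads from the guest's avail ring:)

	/* guest side: virtqueue_disable_cb() sets the flag in the avail ring */
	vq->vring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;

	/* host side: vhost only injects a TX interrupt if the guest
	 * left the flag clear */
	if (!(avail_flags & VRING_AVAIL_F_NO_INTERRUPT))
		eventfd_signal(vq->call_ctx, 1);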

> > We shouldn't
> > count the TX packets when netif queue is enabled since next guest TX
> > xmit will free any used buffers in vhost. We need to be careful here
> in
> > case we miss the interrupts when netif queue has stopped.
> > 
> > However we can't change old guest so we can test the patches
> separately
> > for guest only, vhost only, and the combination.
> > 
> > > > > 
> > > > > Yes, it seems unrelated to tx interrupts. 
> > > > 
> > > > The issue is more likely related to latency.
> > > 
> > > Could be. Why do you think so?
> > 
> > Since I played with latency hack, I can see performance difference
> for
> > different latency.
> 
> Which hack was that? 

I tried accumulating multiple guest-to-host notifications for TX xmits;
it did help the multiple-stream TCP_RR results. I also forced vhost
handle_tx to handle more packets; both hacks seemed to help.

Thanks
Shirley


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-02 19:29                                                             ` Shirley Ma
@ 2011-02-02 20:17                                                               ` Michael S. Tsirkin
  2011-02-02 21:03                                                                 ` Shirley Ma
  0 siblings, 1 reply; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-02-02 20:17 UTC (permalink / raw)
  To: Shirley Ma
  Cc: Krishna Kumar2, David Miller, kvm, mashirle, netdev,
	netdev-owner, Sridhar Samudrala, Steve Dobbelstein

On Wed, Feb 02, 2011 at 11:29:35AM -0800, Shirley Ma wrote:
> On Wed, 2011-02-02 at 20:27 +0200, Michael S. Tsirkin wrote:
> > On Wed, Feb 02, 2011 at 10:11:51AM -0800, Shirley Ma wrote:
> > > On Wed, 2011-02-02 at 19:32 +0200, Michael S. Tsirkin wrote:
> > > > OK, but this should have no effect with a vhost patch
> > > > which should ensure that we don't get an interrupt
> > > > until the queue is at least half empty.
> > > > Right?
> > > 
> > > There should be some coordination between guest and vhost.
> > 
> > What kind of coordination? With a patched vhost, and a full ring.
> > you should get an interrupt per 100 packets.
> > Is this what you see? And if yes, isn't the guest patch
> > doing nothing then?
> 
> vhost_signal won't be able send any TX interrupts to guest when guest TX
> interrupt is disabled. Guest TX interrupt is only enabled when running
> out of descriptors.

Well, this is also the only case where the queue is stopped, no?

> > > We shouldn't
> > > count the TX packets when netif queue is enabled since next guest TX
> > > xmit will free any used buffers in vhost. We need to be careful here
> > in
> > > case we miss the interrupts when netif queue has stopped.
> > > 
> > > However we can't change old guest so we can test the patches
> > separately
> > > for guest only, vhost only, and the combination.
> > > 
> > > > > > 
> > > > > > Yes, it seems unrelated to tx interrupts. 
> > > > > 
> > > > > The issue is more likely related to latency.
> > > > 
> > > > Could be. Why do you think so?
> > > 
> > > Since I played with latency hack, I can see performance difference
> > for
> > > different latency.
> > 
> > Which hack was that? 
> 
> I tried to accumulate multiple guest to host notifications for TX xmits,
> it did help multiple streams TCP_RR results;

I don't see a point in delaying the used idx update, do you?
So delaying just the signal seems better, right?

> I also forced vhost
> handle_tx to handle more packets; both hack seemed help.
> 
> Thanks
> Shirley

Haven't noticed that part, how does your patch make it
handle more packets?

-- 
MST

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-02 20:17                                                               ` Michael S. Tsirkin
@ 2011-02-02 21:03                                                                 ` Shirley Ma
  2011-02-02 21:20                                                                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 88+ messages in thread
From: Shirley Ma @ 2011-02-02 21:03 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Krishna Kumar2, David Miller, kvm, mashirle, netdev,
	netdev-owner, Sridhar Samudrala, Steve Dobbelstein

On Wed, 2011-02-02 at 22:17 +0200, Michael S. Tsirkin wrote:
> Well, this is also the only case where the queue is stopped, no?
Yes. I got some debugging data; I saw that sometimes there were many
packets waiting to be freed in the guest between vhost_signal and the guest
xmit callback. Looks like too much time is spent between vhost_signal and
the guest xmit callback?

> > I tried to accumulate multiple guest to host notifications for TX
> xmits,
> > it did help multiple streams TCP_RR results;
> I don't see a point to delay used idx update, do you?

It might cause each vhost handle_tx run to process more packets.

> So delaying just signal seems better, right?

I think I need to define a test matrix to collect data for guest-to-host
TX xmit across the different tests.

Data to be collected:
---------------------
1. kvm_stat for VM, I/O exits
2. cpu utilization for both guest and host
3. cat /proc/interrupts on guest
4. packet rate from vhost handle_tx per loop
5. guest netif queue stop rate
6. how many packets are waiting to be freed between vhost signaling and
the guest callback
7. performance results

Test
----
1. TCP_STREAM single stream test for 1K to 4K message size
2. TCP_RR (64 instance test): 128 - 1K request/response size

Different hacks
---------------
1. Baseline data (with the patch to fix the capacity check first;
free_old_xmit_skbs returns the number of skbs)

2. Drop packet data (will put some debugging in generic networking code)

3. Delay guest netif queue wake up until certain descriptors (1/2 ring
size, 1/4 ring size...) are available once the queue has stopped.

4. Accumulate more packets per vhost signal in handle_tx?

5. 3 & 4 combinations

6. Accumulate more packets per guest kick() (TCP_RR) by adding a timer? 

7. Accumulate more packets per vhost handle_tx() by adding some delay?

> Haven't noticed that part, how does your patch make it
> handle more packets?

Added a delay in handle_tx().

What else?

It will take some time to do this.

Shirley


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-02 21:03                                                                 ` Shirley Ma
@ 2011-02-02 21:20                                                                   ` Michael S. Tsirkin
  2011-02-02 21:41                                                                     ` Shirley Ma
  2011-02-03  5:05                                                                     ` Shirley Ma
  0 siblings, 2 replies; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-02-02 21:20 UTC (permalink / raw)
  To: Shirley Ma
  Cc: Krishna Kumar2, David Miller, kvm, mashirle, netdev,
	netdev-owner, Sridhar Samudrala, Steve Dobbelstein

On Wed, Feb 02, 2011 at 01:03:05PM -0800, Shirley Ma wrote:
> On Wed, 2011-02-02 at 22:17 +0200, Michael S. Tsirkin wrote:
> > Well, this is also the only case where the queue is stopped, no?
> Yes. I got some debugging data, I saw that sometimes there were so many
> packets were waiting for free in guest between vhost_signal & guest xmit
> callback.

What does this mean?

> Looks like the time spent too long from vhost_signal to guest
> xmit callback?



> > > I tried to accumulate multiple guest to host notifications for TX
> > xmits,
> > > it did help multiple streams TCP_RR results;
> > I don't see a point to delay used idx update, do you?
> 
> It might cause per vhost handle_tx processed more packets.

I don't understand. It's a couple of writes - what is the issue?

> > So delaying just signal seems better, right?
> 
> I think I need to define the test matrix to collect data for TX xmit
> from guest to host here for different tests.
> 
> Data to be collected:
> ---------------------
> 1. kvm_stat for VM, I/O exits
> 2. cpu utilization for both guest and host
> 3. cat /proc/interrupts on guest
> 4. packets rate from vhost handle_tx per loop
> 5. guest netif queue stop rate
> 6. how many packets are waiting for free between vhost signaling and
> guest callback
> 7. performance results
> 
> Test
> ----
> 1. TCP_STREAM single stream test for 1K to 4K message size
> 2. TCP_RR (64 instance test): 128 - 1K request/response size
> 
> Different hacks
> ---------------
> 1. Base line data ( with the patch to fix capacity check first,
> free_old_xmit_skbs returns number of skbs)
> 
> 2. Drop packet data (will put some debugging in generic networking code)
> 
> 3. Delay guest netif queue wake up until certain descriptors (1/2 ring
> size, 1/4 ring size...) are available once the queue has stopped.
> 
> 4. Accumulate more packets per vhost signal in handle_tx?
> 
> 5. 3 & 4 combinations
> 
> 6. Accumulate more packets per guest kick() (TCP_RR) by adding a timer? 
> 
> 7. Accumulate more packets per vhost handle_tx() by adding some delay?
> 
> > Haven't noticed that part, how does your patch make it
> handle more packets?
> 
> Added a delay in handle_tx().
> 
> What else?
> 
> It would take sometimes to do this.
> 
> Shirley


Need to think about this.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-02 21:20                                                                   ` Michael S. Tsirkin
@ 2011-02-02 21:41                                                                     ` Shirley Ma
  2011-02-03  5:59                                                                       ` Michael S. Tsirkin
  2011-02-03  5:05                                                                     ` Shirley Ma
  1 sibling, 1 reply; 88+ messages in thread
From: Shirley Ma @ 2011-02-02 21:41 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Krishna Kumar2, David Miller, kvm, mashirle, netdev,
	netdev-owner, Sridhar Samudrala, Steve Dobbelstein

On Wed, 2011-02-02 at 23:20 +0200, Michael S. Tsirkin wrote:
> > On Wed, 2011-02-02 at 22:17 +0200, Michael S. Tsirkin wrote:
> > > Well, this is also the only case where the queue is stopped, no?
> > Yes. I got some debugging data, I saw that sometimes there were so
> many
> > packets were waiting for free in guest between vhost_signal & guest
> xmit
> > callback.
> 
> What does this mean?

Let's look at the sequence here:

guest start_xmit()
	xmit_skb()
	if ring is full,
		enable_cb()

guest skb_xmit_done()
	disable_cb,
	printk free_old_xmit_skbs <-- it was between more than 1/2 and the full ring size
	printk vq->num_free

vhost handle_tx()
	if (guest interrupt is enabled)
		signal guest to free xmit buffers

So between the guest filling the ring, stopping the queue, and enabling the
callback, and the guest receiving the callback from the host and running
free_old_xmit_skbs, there were between about half a ring and a full ring of
descriptors available. I thought there would be only a few. (I disabled
your vhost patch for this test.)
 

> > Looks like the time spent too long from vhost_signal to guest
> > xmit callback?
> 
> 
> 
> > > > I tried to accumulate multiple guest to host notifications for
> TX
> > > xmits,
> > > > it did help multiple streams TCP_RR results;
> > > I don't see a point to delay used idx update, do you?
> > 
> > It might cause per vhost handle_tx processed more packets.
> 
> I don't understand. It's a couple of writes - what is the issue?

Oh, handle_tx could process more packets per loop in the multiple-stream
TCP_RR case. I need to print out the data rate per loop to confirm this.

Shirley


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-02 21:20                                                                   ` Michael S. Tsirkin
  2011-02-02 21:41                                                                     ` Shirley Ma
@ 2011-02-03  5:05                                                                     ` Shirley Ma
  2011-02-03  6:13                                                                       ` Michael S. Tsirkin
  1 sibling, 1 reply; 88+ messages in thread
From: Shirley Ma @ 2011-02-03  5:05 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Krishna Kumar2, David Miller, kvm, mashirle, netdev,
	netdev-owner, Sridhar Samudrala, Steve Dobbelstein

On Wed, 2011-02-02 at 23:20 +0200, Michael S. Tsirkin wrote:
> > I think I need to define the test matrix to collect data for TX xmit
> > from guest to host here for different tests.
> > 
> > Data to be collected:
> > ---------------------
> > 1. kvm_stat for VM, I/O exits
> > 2. cpu utilization for both guest and host
> > 3. cat /proc/interrupts on guest
> > 4. packets rate from vhost handle_tx per loop
> > 5. guest netif queue stop rate
> > 6. how many packets are waiting for free between vhost signaling and
> > guest callback
> > 7. performance results
> > 
> > Test
> > ----
> > 1. TCP_STREAM single stream test for 1K to 4K message size
> > 2. TCP_RR (64 instance test): 128 - 1K request/response size
> > 
> > Different hacks
> > ---------------
> > 1. Base line data ( with the patch to fix capacity check first,
> > free_old_xmit_skbs returns number of skbs)
> > 
> > 2. Drop packet data (will put some debugging in generic networking
> code)

Since I found that the netif queue stop/wake-up is so expensive, I created
a packet-dropping patch on the guest side so I don't need to debug the
generic networking code.

guest start_xmit()
	capacity = free_old_xmit_skbs() + virtqueue_get_num_freed()
	if (capacity == 0) {
		drop this packet;
		return;
	}

In the patch, both guest TX interrupts and the callback have been omitted.
The host-side vhost_signal in handle_tx can be removed entirely as well.
(A new virtio_ring API is needed to export the total number of free
descriptors here -- virtqueue_get_num_freed.)
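
A minimal sketch of what that helper could look like in virtio_ring.c
(assuming it simply exposes the ring's current free count):

unsigned int virtqueue_get_num_freed(struct virtqueue *_vq)
{
	struct vring_virtqueue *vq = to_vvq(_vq);

	/* number of descriptors currently free in this ring */
	return vq->num_free;
}
EXPORT_SYMBOL_GPL(virtqueue_get_num_freed);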

Initial TCP_STREAM performance results I got for guest to local host:
4.2Gb/s for 1K message size (vs. 2.5Gb/s),
6.2Gb/s for 2K message size (vs. 3.8Gb/s), and
9.8Gb/s for 4K message size (vs. 5.xGb/s).

Since the large message size (64K) doesn't hit the (capacity == 0) case,
the performance is only a little better (from 13.x Gb/s to 14.x Gb/s).

kvm_stat output shows a significant reduction in exits for both VM and
I/O, and no guest TX interrupts.

With dropping packets, TCP retransmissions have increased here, so the
performance numbers vary.

This might not be a good solution, but it gave us some idea of how
expensive the netif queue stop/wake-up and guest/host notification are.

I couldn't find a better way to reduce the netif queue stop/wake-up rate
for small message sizes. But I think once we can address this, guest TX
performance for small message sizes will improve dramatically.

I also compared this with the approach of returning TX_BUSY when
(capacity == 0); it is not as good as dropping packets.

> > 3. Delay guest netif queue wake up until certain descriptors (1/2
> ring
> > size, 1/4 ring size...) are available once the queue has stopped.
> > 
> > 4. Accumulate more packets per vhost signal in handle_tx?
> > 
> > 5. 3 & 4 combinations
> > 
> > 6. Accumulate more packets per guest kick() (TCP_RR) by adding a
> timer? 
> > 
> > 7. Accumulate more packets per vhost handle_tx() by adding some
> delay?
> > 
> > > Haven't noticed that part, how does your patch make it
> > handle more packets?
> > 
> > Added a delay in handle_tx().
> > 
> > What else?
> > 
> > It would take sometimes to do this.
> > 
> > Shirley
> 
> 
> Need to think about this.
> 
> 


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-02 21:41                                                                     ` Shirley Ma
@ 2011-02-03  5:59                                                                       ` Michael S. Tsirkin
  2011-02-03  6:09                                                                         ` Shirley Ma
  0 siblings, 1 reply; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-02-03  5:59 UTC (permalink / raw)
  To: Shirley Ma
  Cc: Krishna Kumar2, David Miller, kvm, mashirle, netdev,
	netdev-owner, Sridhar Samudrala, Steve Dobbelstein

On Wed, Feb 02, 2011 at 01:41:33PM -0800, Shirley Ma wrote:
> On Wed, 2011-02-02 at 23:20 +0200, Michael S. Tsirkin wrote:
> > > On Wed, 2011-02-02 at 22:17 +0200, Michael S. Tsirkin wrote:
> > > > Well, this is also the only case where the queue is stopped, no?
> > > Yes. I got some debugging data, I saw that sometimes there were so
> > many
> > > packets were waiting for free in guest between vhost_signal & guest
> > xmit
> > > callback.
> > 
> > What does this mean?
> 
> Let's look at the sequence here:
> 
> guest start_xmit()
> 	xmit_skb()
> 	if ring is full,
> 		enable_cb()
> 
> guest skb_xmit_done()
> 	disable_cb,
>         printk free_old_xmit_skbs <-- it was between more than 1/2 to
> full ring size
> 	printk vq->num_free 
> 
> vhost handle_tx()
> 	if (guest interrupt is enabled)
> 		signal guest to free xmit buffers
> 
> So between guest queue full/stopped queue/enable call back to guest
> receives the callback from host to free_old_xmit_skbs, there were about
> 1/2 to full ring size descriptors available. I thought there were only a
> few. (I disabled your vhost patch for this test.)


The expected number is vq->num - max skb frags - 2.
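(E.g., assuming a 256-entry ring and MAX_SKB_FRAGS = 18 on 4K pages, that
would be 256 - 18 - 2 = 236.)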

> 
> > > Looks like the time spent too long from vhost_signal to guest
> > > xmit callback?
> > 
> > 
> > 
> > > > > I tried to accumulate multiple guest to host notifications for
> > TX
> > > > xmits,
> > > > > it did help multiple streams TCP_RR results;
> > > > I don't see a point to delay used idx update, do you?
> > > 
> > > It might cause per vhost handle_tx processed more packets.
> > 
> > I don't understand. It's a couple of writes - what is the issue?
> 
> Oh, handle_tx could process more packets per loop for multiple streams
> TCP_RR case. I need to print out the data rate per loop to confirm this.
> 
> Shirley

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-03  5:59                                                                       ` Michael S. Tsirkin
@ 2011-02-03  6:09                                                                         ` Shirley Ma
  2011-02-03  6:16                                                                           ` Michael S. Tsirkin
  0 siblings, 1 reply; 88+ messages in thread
From: Shirley Ma @ 2011-02-03  6:09 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Krishna Kumar2, David Miller, kvm, mashirle, netdev,
	netdev-owner, Sridhar Samudrala, Steve Dobbelstein

On Thu, 2011-02-03 at 07:59 +0200, Michael S. Tsirkin wrote:
> > Let's look at the sequence here:
> > 
> > guest start_xmit()
> >       xmit_skb()
> >       if ring is full,
> >               enable_cb()
> > 
> > guest skb_xmit_done()
> >       disable_cb,
> >         printk free_old_xmit_skbs <-- it was between more than 1/2
> to
> > full ring size
> >       printk vq->num_free 
> > 
> > vhost handle_tx()
> >       if (guest interrupt is enabled)
> >               signal guest to free xmit buffers
> > 
> > So between guest queue full/stopped queue/enable call back to guest
> > receives the callback from host to free_old_xmit_skbs, there were
> about
> > 1/2 to full ring size descriptors available. I thought there were
> only a
> > few. (I disabled your vhost patch for this test.)
> 
> 
> The expected number is vq->num - max skb frags - 2. 

It varied (up to the ring size, 256). This is using indirect buffers; it
returned how many descriptors were freed, not the number of buffers.

Why do you think it is vq->num - max skb frags - 2 here?

Shirley


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-03  5:05                                                                     ` Shirley Ma
@ 2011-02-03  6:13                                                                       ` Michael S. Tsirkin
  2011-02-03 15:58                                                                         ` Shirley Ma
  0 siblings, 1 reply; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-02-03  6:13 UTC (permalink / raw)
  To: Shirley Ma
  Cc: Krishna Kumar2, David Miller, kvm, mashirle, netdev,
	netdev-owner, Sridhar Samudrala, Steve Dobbelstein

On Wed, Feb 02, 2011 at 09:05:56PM -0800, Shirley Ma wrote:
> On Wed, 2011-02-02 at 23:20 +0200, Michael S. Tsirkin wrote:
> > > I think I need to define the test matrix to collect data for TX xmit
> > > from guest to host here for different tests.
> > > 
> > > Data to be collected:
> > > ---------------------
> > > 1. kvm_stat for VM, I/O exits
> > > 2. cpu utilization for both guest and host
> > > 3. cat /proc/interrupts on guest
> > > 4. packets rate from vhost handle_tx per loop
> > > 5. guest netif queue stop rate
> > > 6. how many packets are waiting for free between vhost signaling and
> > > guest callback
> > > 7. performance results
> > > 
> > > Test
> > > ----
> > > 1. TCP_STREAM single stream test for 1K to 4K message size
> > > 2. TCP_RR (64 instance test): 128 - 1K request/response size
> > > 
> > > Different hacks
> > > ---------------
> > > 1. Base line data ( with the patch to fix capacity check first,
> > > free_old_xmit_skbs returns number of skbs)
> > > 
> > > 2. Drop packet data (will put some debugging in generic networking
> > code)
> 
> Since I found that the netif queue stop/wake up is so expensive, I
> created a dropping packets patch on guest side so I don't need to debug
> generic networking code.
> 
> guest start_xmit()
> 	capacity = free_old_xmit_skb() + virtqueue_get_num_freed()
> 	if (capacity == 0)
> 		drop this packet;
> 		return;
> 
> In the patch, both guest TX interrupts and callback have been omitted.
> Host vhost_signal in handle_tx can totally be removed as well. (A new
> virtio_ring API is needed for exporting total of num_free descriptors
> here -- virtioqueue_get_num_freed)
> 
> Initial TCP_STREAM performance results I got for guest to local host 
> 4.2Gb/s for 1K message size, (vs. 2.5Gb/s)
> 6.2Gb/s for 2K message size, and (vs. 3.8Gb/s)
> 9.8Gb/s for 4K message size. (vs.5.xGb/s)

What is the average packet size, # bytes per ack, and the # of interrupts
per packet? It could be that just slowing down transmission
makes GSO work better.

> Since large message size (64K) doesn't hit (capacity == 0) case, so the
> performance only has a little better. (from 13.xGb/s to 14.x Gb/s)
> 
> kvm_stat output shows significant exits reduction for both VM and I/O,
> no guest TX interrupts.
> 
> With dropping packets, TCP retrans has been increased here, so I can see
> performance numbers are various.
> 
> This might be not a good solution, but it gave us some ideas on
> expensive netif queue stop/wake up between guest and host notification.
> 
> I couldn't find a better solution on how to reduce netif queue stop/wake
> up rate for small message size. But I think once we can address this,
> the guest TX performance will burst for small message size.
> 
> I also compared this with return TX_BUSY approach when (capacity == 0),
> it is not as good as dropping packets.
> 
> > > 3. Delay guest netif queue wake up until certain descriptors (1/2
> > ring
> > > size, 1/4 ring size...) are available once the queue has stopped.
> > > 
> > > 4. Accumulate more packets per vhost signal in handle_tx?
> > > 
> > > 5. 3 & 4 combinations
> > > 
> > > 6. Accumulate more packets per guest kick() (TCP_RR) by adding a
> > timer? 
> > > 
> > > 7. Accumulate more packets per vhost handle_tx() by adding some
> > delay?
> > > 
> > > > Haven't noticed that part, how does your patch make it
> > > handle more packets?
> > > 
> > > Added a delay in handle_tx().
> > > 
> > > What else?
> > > 
> > > It would take sometimes to do this.
> > > 
> > > Shirley
> > 
> > 
> > Need to think about this.
> > 
> > 

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-03  6:09                                                                         ` Shirley Ma
@ 2011-02-03  6:16                                                                           ` Michael S. Tsirkin
  0 siblings, 0 replies; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-02-03  6:16 UTC (permalink / raw)
  To: Shirley Ma
  Cc: Krishna Kumar2, David Miller, kvm, mashirle, netdev,
	netdev-owner, Sridhar Samudrala, Steve Dobbelstein

On Wed, Feb 02, 2011 at 10:09:14PM -0800, Shirley Ma wrote:
> On Thu, 2011-02-03 at 07:59 +0200, Michael S. Tsirkin wrote:
> > > Let's look at the sequence here:
> > > 
> > > guest start_xmit()
> > >       xmit_skb()
> > >       if ring is full,
> > >               enable_cb()
> > > 
> > > guest skb_xmit_done()
> > >       disable_cb,
> > >         printk free_old_xmit_skbs <-- it was between more than 1/2
> > to
> > > full ring size
> > >       printk vq->num_free 
> > > 
> > > vhost handle_tx()
> > >       if (guest interrupt is enabled)
> > >               signal guest to free xmit buffers
> > > 
> > > So between guest queue full/stopped queue/enable call back to guest
> > > receives the callback from host to free_old_xmit_skbs, there were
> > about
> > > 1/2 to full ring size descriptors available. I thought there were
> > only a
> > > few. (I disabled your vhost patch for this test.)
> > 
> > 
> > The expected number is vq->num - max skb frags - 2. 
> 
> It was various (up to the ring size 256). This is using indirection
> buffers, it returned how many freed descriptors, not number of buffers.
> 
> Why do you think it is vq->num - max skb frags - 2 here?
> 
> Shirley

well queue is stopped which happens when

        if (capacity < 2+MAX_SKB_FRAGS) {
                netif_stop_queue(dev);
                if (unlikely(!virtqueue_enable_cb(vi->svq))) {
                        /* More just got used, free them then recheck.
 * */
                        capacity += free_old_xmit_skbs(vi);
                        if (capacity >= 2+MAX_SKB_FRAGS) {
                                netif_start_queue(dev);
                                virtqueue_disable_cb(vi->svq);
                        }
                }
        }

This should be the most common case.
I guess the case with += free_old_xmit_skbs is what can get us more.
But it should be rare. Can you count how common it is?

-- 
MST

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-03  6:13                                                                       ` Michael S. Tsirkin
@ 2011-02-03 15:58                                                                         ` Shirley Ma
  2011-02-03 16:20                                                                           ` Michael S. Tsirkin
  0 siblings, 1 reply; 88+ messages in thread
From: Shirley Ma @ 2011-02-03 15:58 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Krishna Kumar2, David Miller, kvm, mashirle, netdev,
	netdev-owner, Sridhar Samudrala, Steve Dobbelstein

On Thu, 2011-02-03 at 08:13 +0200, Michael S. Tsirkin wrote:
> > Initial TCP_STREAM performance results I got for guest to local
> host 
> > 4.2Gb/s for 1K message size, (vs. 2.5Gb/s)
> > 6.2Gb/s for 2K message size, and (vs. 3.8Gb/s)
> > 9.8Gb/s for 4K message size. (vs.5.xGb/s)
> 
> What is the average packet size, # bytes per ack, and the # of
> interrupts
> per packet? It could be that just slowing down trahsmission
> makes GSO work better. 

There are no TX interrupts with packet dropping.

GSO/TSO is the key for small-message performance; w/o GSO/TSO, the
performance is limited to about 2Gb/s no matter how big the message size
is. I think any work we try here will increase the large-packet rate. BTW,
with packet dropping, TCP increased fast retransmits, not slow starts.

I will collect tcpdump and netstat data before and after, to compare
packet size/rate with and without the patch.

Thanks
Shirley


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-03 15:58                                                                         ` Shirley Ma
@ 2011-02-03 16:20                                                                           ` Michael S. Tsirkin
  2011-02-03 17:18                                                                             ` Shirley Ma
  0 siblings, 1 reply; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-02-03 16:20 UTC (permalink / raw)
  To: Shirley Ma
  Cc: Krishna Kumar2, David Miller, kvm, mashirle, netdev,
	netdev-owner, Sridhar Samudrala, Steve Dobbelstein

On Thu, Feb 03, 2011 at 07:58:00AM -0800, Shirley Ma wrote:
> On Thu, 2011-02-03 at 08:13 +0200, Michael S. Tsirkin wrote:
> > > Initial TCP_STREAM performance results I got for guest to local
> > host 
> > > 4.2Gb/s for 1K message size, (vs. 2.5Gb/s)
> > > 6.2Gb/s for 2K message size, and (vs. 3.8Gb/s)
> > > 9.8Gb/s for 4K message size. (vs.5.xGb/s)
> > 
> > What is the average packet size, # bytes per ack, and the # of
> > interrupts
> > per packet? It could be that just slowing down trahsmission
> > makes GSO work better. 
> 
> There is no TX interrupts with dropping packet.
> 
> GSO/TSO is the key for small message performance, w/o GSO/TSO, the
> performance is limited to about 2Gb/s no matter how big the message size
> it is. I think any work we try here will increase large packet size
> rate. BTW for dropping packet, TCP increased fast retrans, not slow
> start. 
> 
> I will collect tcpdump, netstart before and after data to compare packet
> size/rate w/o w/i the patch.
> 
> Thanks
> Shirley

Just a thought: does it help to make tx queue len of the
virtio device smaller?
E.g. match the vq size?

-- 
MST

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-03 16:20                                                                           ` Michael S. Tsirkin
@ 2011-02-03 17:18                                                                             ` Shirley Ma
  0 siblings, 0 replies; 88+ messages in thread
From: Shirley Ma @ 2011-02-03 17:18 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Krishna Kumar2, David Miller, kvm, mashirle, netdev,
	netdev-owner, Sridhar Samudrala, Steve Dobbelstein

On Thu, 2011-02-03 at 18:20 +0200, Michael S. Tsirkin wrote:
> Just a thought: does it help to make tx queue len of the
> virtio device smaller? 

Yes, that's what I did before; reducing txqueuelen causes the qdisc to
drop packets early. But it's hard to get a performance gain by controlling
txqueuelen: I tried it on different systems, and it required different
values.
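
For reference, this is the kind of tuning I mean, e.g. in the guest:

	ip link set dev eth0 txqueuelen 256

but the value that helped was different on each system.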

Also, I tried another patch: instead of dropping packets, I used a timer
(2 jiffies) to enable/disable the queue on the guest without interrupt
notification. It gets better performance than the original, but worse than
dropping packets, because netif stop/wake-up happens too often.

vhost definitely needs improvement for handling small message sizes. It's
unable to handle the small-message packet rate for queue size 256, even
with ring size 1024. QEMU does not seem to allow increasing the TX ring
size to 2K (qemu-kvm fails to start with no errors), so I am not able to
test that.

Thanks
Shirley


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-02  4:42                                   ` Michael S. Tsirkin
@ 2011-02-09  0:37                                     ` Rusty Russell
  2011-02-09  0:53                                       ` Michael S. Tsirkin
  2011-03-08 21:57                                       ` Shirley Ma
  0 siblings, 2 replies; 88+ messages in thread
From: Rusty Russell @ 2011-02-09  0:37 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Krishna Kumar2, David Miller, kvm, Shirley Ma, netdev, steved

On Wed, 2 Feb 2011 03:12:22 pm Michael S. Tsirkin wrote:
> On Wed, Feb 02, 2011 at 10:09:18AM +0530, Krishna Kumar2 wrote:
> > > "Michael S. Tsirkin" <mst@redhat.com> 02/02/2011 03:11 AM
> > >
> > > On Tue, Feb 01, 2011 at 01:28:45PM -0800, Shirley Ma wrote:
> > > > On Tue, 2011-02-01 at 23:21 +0200, Michael S. Tsirkin wrote:
> > > > > Confused. We compare capacity to skb frags, no?
> > > > > That's sg I think ...
> > > >
> > > > Current guest kernel use indirect buffers, num_free returns how many
> > > > available descriptors not skb frags. So it's wrong here.
> > > >
> > > > Shirley
> > >
> > > I see. Good point. In other words when we complete the buffer
> > > it was indirect, but when we add a new one we
> > > can not allocate indirect so we consume.
> > > And then we start the queue and add will fail.
> > > I guess we need some kind of API to figure out
> > > whether the buf we complete was indirect?

I've finally read this thread... I think we need to get more serious
with our stats gathering to diagnose these kinds of performance issues.

This is a start; it should tell us what is actually happening to the
virtio ring(s) without significant performance impact...

Subject: virtio: CONFIG_VIRTIO_STATS

For performance problems we'd like to know exactly what the ring looks
like.  This patch adds stats indexed by how-full-ring-is; we could extend
it to also record them by how-used-ring-is if we need.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -7,6 +7,14 @@ config VIRTIO_RING
 	tristate
 	depends on VIRTIO
 
+config VIRTIO_STATS
+	bool "Virtio debugging stats (EXPERIMENTAL)"
+	depends on VIRTIO_RING
+	select DEBUG_FS
+	---help---
+	  Virtio stats collected by how full the ring is at any time,
+	  presented under debugfs/virtio/<name>-<vq>/<num-used>/
+
 config VIRTIO_PCI
 	tristate "PCI driver for virtio devices (EXPERIMENTAL)"
 	depends on PCI && EXPERIMENTAL
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -21,6 +21,7 @@
 #include <linux/virtio_config.h>
 #include <linux/device.h>
 #include <linux/slab.h>
+#include <linux/debugfs.h>
 
 /* virtio guest is communicating with a virtual "device" that actually runs on
  * a host processor.  Memory barriers are used to control SMP effects. */
@@ -95,6 +96,11 @@ struct vring_virtqueue
 	/* How to notify other side. FIXME: commonalize hcalls! */
 	void (*notify)(struct virtqueue *vq);
 
+#ifdef CONFIG_VIRTIO_STATS
+	struct vring_stat *stats;
+	struct dentry *statdir;
+#endif
+
 #ifdef DEBUG
 	/* They're supposed to lock for us. */
 	unsigned int in_use;
@@ -106,6 +112,87 @@ struct vring_virtqueue
 
 #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
 
+#ifdef CONFIG_VIRTIO_STATS
+/* We have an array of these, indexed by how full the ring is. */
+struct vring_stat {
+	/* How many interrupts? */
+	size_t interrupt_nowork, interrupt_work;
+	/* How many non-notify kicks, how many notify kicks, how many add notify? */
+	size_t kick_no_notify, kick_notify, add_notify;
+	/* How many adds? */
+	size_t add_direct, add_indirect, add_fail;
+	/* How many gets? */
+	size_t get;
+	/* How many disable callbacks? */
+	size_t disable_cb;
+	/* How many enables? */
+	size_t enable_cb_retry, enable_cb_success;
+};
+
+static struct dentry *virtio_stats;
+
+static void create_stat_files(struct vring_virtqueue *vq)
+{
+	char name[80];
+	unsigned int i;
+
+	/* Racy in theory, but we don't care. */
+	if (!virtio_stats)
+		virtio_stats = debugfs_create_dir("virtio-stats", NULL);
+
+	sprintf(name, "%s-%s", dev_name(&vq->vq.vdev->dev), vq->vq.name);
+	vq->statdir = debugfs_create_dir(name, virtio_stats);
+
+	for (i = 0; i < vq->vring.num; i++) {
+		struct dentry *dir;
+
+		sprintf(name, "%i", i);
+		dir = debugfs_create_dir(name, vq->statdir);
+		debugfs_create_size_t("interrupt_nowork", 0400, dir,
+				      &vq->stats[i].interrupt_nowork);
+		debugfs_create_size_t("interrupt_work", 0400, dir,
+				      &vq->stats[i].interrupt_work);
+		debugfs_create_size_t("kick_no_notify", 0400, dir,
+				      &vq->stats[i].kick_no_notify);
+		debugfs_create_size_t("kick_notify", 0400, dir,
+				      &vq->stats[i].kick_notify);
+		debugfs_create_size_t("add_notify", 0400, dir,
+				      &vq->stats[i].add_notify);
+		debugfs_create_size_t("add_direct", 0400, dir,
+				      &vq->stats[i].add_direct);
+		debugfs_create_size_t("add_indirect", 0400, dir,
+				      &vq->stats[i].add_indirect);
+		debugfs_create_size_t("add_fail", 0400, dir,
+				      &vq->stats[i].add_fail);
+		debugfs_create_size_t("get", 0400, dir,
+				      &vq->stats[i].get);
+		debugfs_create_size_t("disable_cb", 0400, dir,
+				      &vq->stats[i].disable_cb);
+		debugfs_create_size_t("enable_cb_retry", 0400, dir,
+				      &vq->stats[i].enable_cb_retry);
+		debugfs_create_size_t("enable_cb_success", 0400, dir,
+				      &vq->stats[i].enable_cb_success);
+	}
+}
+
+static void delete_stat_files(struct vring_virtqueue *vq)
+{
+	debugfs_remove_recursive(vq->statdir);
+}
+
+#define add_stat(vq, name)						\
+	do {								\
+		struct vring_virtqueue *_vq = (vq);			\
+		_vq->stats[_vq->num_free - _vq->vring.num].name++;	\
+	} while (0)
+
+#else
+#define add_stat(vq, name)
+static void delete_stat_files(struct vring_virtqueue *vq)
+{
+}
+#endif
+
 /* Set up an indirect table of descriptors and add it to the queue. */
 static int vring_add_indirect(struct vring_virtqueue *vq,
 			      struct scatterlist sg[],
@@ -121,6 +208,8 @@ static int vring_add_indirect(struct vri
 	if (!desc)
 		return -ENOMEM;
 
+	add_stat(vq, add_indirect);
+
 	/* Transfer entries from the sg list into the indirect page */
 	for (i = 0; i < out; i++) {
 		desc[i].flags = VRING_DESC_F_NEXT;
@@ -183,17 +272,22 @@ int virtqueue_add_buf_gfp(struct virtque
 	BUG_ON(out + in == 0);
 
 	if (vq->num_free < out + in) {
+		add_stat(vq, add_fail);
 		pr_debug("Can't add buf len %i - avail = %i\n",
 			 out + in, vq->num_free);
 		/* FIXME: for historical reasons, we force a notify here if
 		 * there are outgoing parts to the buffer.  Presumably the
 		 * host should service the ring ASAP. */
-		if (out)
+		if (out) {
+			add_stat(vq, add_notify);
 			vq->notify(&vq->vq);
+		}
 		END_USE(vq);
 		return -ENOSPC;
 	}
 
+	add_stat(vq, add_direct);
+
 	/* We're about to use some buffers from the free list. */
 	vq->num_free -= out + in;
 
@@ -248,9 +342,12 @@ void virtqueue_kick(struct virtqueue *_v
 	/* Need to update avail index before checking if we should notify */
 	virtio_mb();
 
-	if (!(vq->vring.used->flags & VRING_USED_F_NO_NOTIFY))
+	if (!(vq->vring.used->flags & VRING_USED_F_NO_NOTIFY)) {
+		add_stat(vq, kick_notify);
 		/* Prod other side to tell it about changes. */
 		vq->notify(&vq->vq);
+	} else
+		add_stat(vq, kick_no_notify);
 
 	END_USE(vq);
 }
@@ -294,6 +391,8 @@ void *virtqueue_get_buf(struct virtqueue
 
 	START_USE(vq);
 
+	add_stat(vq, get);
+
 	if (unlikely(vq->broken)) {
 		END_USE(vq);
 		return NULL;
@@ -333,6 +432,7 @@ void virtqueue_disable_cb(struct virtque
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
+	add_stat(vq, disable_cb);
 	vq->vring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
 }
 EXPORT_SYMBOL_GPL(virtqueue_disable_cb);
@@ -348,10 +448,12 @@ bool virtqueue_enable_cb(struct virtqueu
 	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
 	virtio_mb();
 	if (unlikely(more_used(vq))) {
+		add_stat(vq, enable_cb_retry);
 		END_USE(vq);
 		return false;
 	}
 
+	add_stat(vq, enable_cb_success);
 	END_USE(vq);
 	return true;
 }
@@ -387,10 +489,12 @@ irqreturn_t vring_interrupt(int irq, voi
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
 	if (!more_used(vq)) {
+		add_stat(vq, interrupt_nowork);
 		pr_debug("virtqueue interrupt with no work for %p\n", vq);
 		return IRQ_NONE;
 	}
 
+	add_stat(vq, interrupt_work);
 	if (unlikely(vq->broken))
 		return IRQ_HANDLED;
 
@@ -451,6 +555,15 @@ struct virtqueue *vring_new_virtqueue(un
 	}
 	vq->data[i] = NULL;
 
+#ifdef CONFIG_VIRTIO_STATS
+	vq->stats = kzalloc(sizeof(*vq->stats) * num, GFP_KERNEL);
+	if (!vq->stats) {
+		kfree(vq);
+		return NULL;
+	}
+	create_stat_files(vq);
+#endif
+
 	return &vq->vq;
 }
 EXPORT_SYMBOL_GPL(vring_new_virtqueue);
@@ -458,6 +571,7 @@ EXPORT_SYMBOL_GPL(vring_new_virtqueue);
 void vring_del_virtqueue(struct virtqueue *vq)
 {
 	list_del(&vq->list);
+	delete_stat_files(to_vvq(vq));
 	kfree(to_vvq(vq));
 }
 EXPORT_SYMBOL_GPL(vring_del_virtqueue);

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-09  0:37                                     ` Rusty Russell
@ 2011-02-09  0:53                                       ` Michael S. Tsirkin
  2011-02-09  1:39                                         ` Rusty Russell
  2011-03-08 21:57                                       ` Shirley Ma
  1 sibling, 1 reply; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-02-09  0:53 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Krishna Kumar2, David Miller, kvm, Shirley Ma, netdev, steved

On Wed, Feb 09, 2011 at 11:07:20AM +1030, Rusty Russell wrote:
> On Wed, 2 Feb 2011 03:12:22 pm Michael S. Tsirkin wrote:
> > On Wed, Feb 02, 2011 at 10:09:18AM +0530, Krishna Kumar2 wrote:
> > > > "Michael S. Tsirkin" <mst@redhat.com> 02/02/2011 03:11 AM
> > > >
> > > > On Tue, Feb 01, 2011 at 01:28:45PM -0800, Shirley Ma wrote:
> > > > > On Tue, 2011-02-01 at 23:21 +0200, Michael S. Tsirkin wrote:
> > > > > > Confused. We compare capacity to skb frags, no?
> > > > > > That's sg I think ...
> > > > >
> > > > > Current guest kernel use indirect buffers, num_free returns how many
> > > > > available descriptors not skb frags. So it's wrong here.
> > > > >
> > > > > Shirley
> > > >
> > > > I see. Good point. In other words when we complete the buffer
> > > > it was indirect, but when we add a new one we
> > > > can not allocate indirect so we consume.
> > > > And then we start the queue and add will fail.
> > > > I guess we need some kind of API to figure out
> > > > whether the buf we complete was indirect?
> 
> I've finally read this thread... I think we need to get more serious
> with our stats gathering to diagnose these kind of performance issues.
> 
> This is a start; it should tell us what is actually happening to the
> virtio ring(s) without significant performance impact...
> 
> Subject: virtio: CONFIG_VIRTIO_STATS
> 
> For performance problems we'd like to know exactly what the ring looks
> like.  This patch adds stats indexed by how-full-ring-is; we could extend
> it to also record them by how-used-ring-is if we need.
> 
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Not sure whether the intent is to merge this. If yes -
would it make sense to use tracing for this instead?
That's what kvm does.
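
For reference, a tracepoint-based variant could look roughly like the
sketch below.  The event name, fields and header location are purely
illustrative (this is not an existing kernel API), but the sketch shows
the shape of what "use tracing for this" would mean: one event per
virtqueue_kick(), recording how full the ring was and whether the host
was actually notified, consumed via ftrace/perf rather than per-slot
debugfs counters.

#undef TRACE_SYSTEM
#define TRACE_SYSTEM virtio

#if !defined(_TRACE_VIRTIO_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_VIRTIO_H

#include <linux/tracepoint.h>

TRACE_EVENT(virtqueue_kick,
	TP_PROTO(const char *name, unsigned int in_use, bool notified),
	TP_ARGS(name, in_use, notified),
	TP_STRUCT__entry(
		__string(name, name)
		__field(unsigned int, in_use)
		__field(bool, notified)
	),
	TP_fast_assign(
		__assign_str(name, name);
		__entry->in_use = in_use;
		__entry->notified = notified;
	),
	TP_printk("vq %s in_use %u notified %d",
		  __get_str(name), __entry->in_use, __entry->notified)
);

#endif /* _TRACE_VIRTIO_H */

/* This must stay outside the multi-read guard. */
#include <trace/define_trace.h>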

> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -7,6 +7,14 @@ config VIRTIO_RING
>  	tristate
>  	depends on VIRTIO
>  
> +config VIRTIO_STATS
> +	bool "Virtio debugging stats (EXPERIMENTAL)"
> +	depends on VIRTIO_RING
> +	select DEBUG_FS
> +	---help---
> +	  Virtio stats collected by how full the ring is at any time,
> +	  presented under debugfs/virtio/<name>-<vq>/<num-used>/
> +
>  config VIRTIO_PCI
>  	tristate "PCI driver for virtio devices (EXPERIMENTAL)"
>  	depends on PCI && EXPERIMENTAL
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -21,6 +21,7 @@
>  #include <linux/virtio_config.h>
>  #include <linux/device.h>
>  #include <linux/slab.h>
> +#include <linux/debugfs.h>
>  
>  /* virtio guest is communicating with a virtual "device" that actually runs on
>   * a host processor.  Memory barriers are used to control SMP effects. */
> @@ -95,6 +96,11 @@ struct vring_virtqueue
>  	/* How to notify other side. FIXME: commonalize hcalls! */
>  	void (*notify)(struct virtqueue *vq);
>  
> +#ifdef CONFIG_VIRTIO_STATS
> +	struct vring_stat *stats;
> +	struct dentry *statdir;
> +#endif
> +
>  #ifdef DEBUG
>  	/* They're supposed to lock for us. */
>  	unsigned int in_use;
> @@ -106,6 +112,87 @@ struct vring_virtqueue
>  
>  #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
>  
> +#ifdef CONFIG_VIRTIO_STATS
> +/* We have an array of these, indexed by how full the ring is. */
> +struct vring_stat {
> +	/* How many interrupts? */
> +	size_t interrupt_nowork, interrupt_work;
> +	/* How many non-notify kicks, how many notify kicks, how many add notify? */
> +	size_t kick_no_notify, kick_notify, add_notify;
> +	/* How many adds? */
> +	size_t add_direct, add_indirect, add_fail;
> +	/* How many gets? */
> +	size_t get;
> +	/* How many disable callbacks? */
> +	size_t disable_cb;
> +	/* How many enables? */
> +	size_t enable_cb_retry, enable_cb_success;
> +};
> +
> +static struct dentry *virtio_stats;
> +
> +static void create_stat_files(struct vring_virtqueue *vq)
> +{
> +	char name[80];
> +	unsigned int i;
> +
> +	/* Racy in theory, but we don't care. */
> +	if (!virtio_stats)
> +		virtio_stats = debugfs_create_dir("virtio-stats", NULL);
> +
> +	sprintf(name, "%s-%s", dev_name(&vq->vq.vdev->dev), vq->vq.name);
> +	vq->statdir = debugfs_create_dir(name, virtio_stats);
> +
> +	for (i = 0; i < vq->vring.num; i++) {
> +		struct dentry *dir;
> +
> +		sprintf(name, "%i", i);
> +		dir = debugfs_create_dir(name, vq->statdir);
> +		debugfs_create_size_t("interrupt_nowork", 0400, dir,
> +				      &vq->stats[i].interrupt_nowork);
> +		debugfs_create_size_t("interrupt_work", 0400, dir,
> +				      &vq->stats[i].interrupt_work);
> +		debugfs_create_size_t("kick_no_notify", 0400, dir,
> +				      &vq->stats[i].kick_no_notify);
> +		debugfs_create_size_t("kick_notify", 0400, dir,
> +				      &vq->stats[i].kick_notify);
> +		debugfs_create_size_t("add_notify", 0400, dir,
> +				      &vq->stats[i].add_notify);
> +		debugfs_create_size_t("add_direct", 0400, dir,
> +				      &vq->stats[i].add_direct);
> +		debugfs_create_size_t("add_indirect", 0400, dir,
> +				      &vq->stats[i].add_indirect);
> +		debugfs_create_size_t("add_fail", 0400, dir,
> +				      &vq->stats[i].add_fail);
> +		debugfs_create_size_t("get", 0400, dir,
> +				      &vq->stats[i].get);
> +		debugfs_create_size_t("disable_cb", 0400, dir,
> +				      &vq->stats[i].disable_cb);
> +		debugfs_create_size_t("enable_cb_retry", 0400, dir,
> +				      &vq->stats[i].enable_cb_retry);
> +		debugfs_create_size_t("enable_cb_success", 0400, dir,
> +				      &vq->stats[i].enable_cb_success);
> +	}
> +}
> +
> +static void delete_stat_files(struct vring_virtqueue *vq)
> +{
> +	debugfs_remove_recursive(vq->statdir);
> +}
> +
> +#define add_stat(vq, name)						\
> +	do {								\
> +		struct vring_virtqueue *_vq = (vq);			\
> +		_vq->stats[_vq->vring.num - _vq->num_free].name++;	\
> +	} while (0)
> +
> +#else
> +#define add_stat(vq, name)
> +static void delete_stat_files(struct vring_virtqueue *vq)
> +{
> +}
> +#endif
> +
>  /* Set up an indirect table of descriptors and add it to the queue. */
>  static int vring_add_indirect(struct vring_virtqueue *vq,
>  			      struct scatterlist sg[],
> @@ -121,6 +208,8 @@ static int vring_add_indirect(struct vri
>  	if (!desc)
>  		return -ENOMEM;
>  
> +	add_stat(vq, add_indirect);
> +
>  	/* Transfer entries from the sg list into the indirect page */
>  	for (i = 0; i < out; i++) {
>  		desc[i].flags = VRING_DESC_F_NEXT;
> @@ -183,17 +272,22 @@ int virtqueue_add_buf_gfp(struct virtque
>  	BUG_ON(out + in == 0);
>  
>  	if (vq->num_free < out + in) {
> +		add_stat(vq, add_fail);
>  		pr_debug("Can't add buf len %i - avail = %i\n",
>  			 out + in, vq->num_free);
>  		/* FIXME: for historical reasons, we force a notify here if
>  		 * there are outgoing parts to the buffer.  Presumably the
>  		 * host should service the ring ASAP. */
> -		if (out)
> +		if (out) {
> +			add_stat(vq, add_notify);
>  			vq->notify(&vq->vq);
> +		}
>  		END_USE(vq);
>  		return -ENOSPC;
>  	}
>  
> +	add_stat(vq, add_direct);
> +
>  	/* We're about to use some buffers from the free list. */
>  	vq->num_free -= out + in;
>  
> @@ -248,9 +342,12 @@ void virtqueue_kick(struct virtqueue *_v
>  	/* Need to update avail index before checking if we should notify */
>  	virtio_mb();
>  
> -	if (!(vq->vring.used->flags & VRING_USED_F_NO_NOTIFY))
> +	if (!(vq->vring.used->flags & VRING_USED_F_NO_NOTIFY)) {
> +		add_stat(vq, kick_notify);
>  		/* Prod other side to tell it about changes. */
>  		vq->notify(&vq->vq);
> +	} else
> +		add_stat(vq, kick_no_notify);
>  
>  	END_USE(vq);
>  }
> @@ -294,6 +391,8 @@ void *virtqueue_get_buf(struct virtqueue
>  
>  	START_USE(vq);
>  
> +	add_stat(vq, get);
> +
>  	if (unlikely(vq->broken)) {
>  		END_USE(vq);
>  		return NULL;
> @@ -333,6 +432,7 @@ void virtqueue_disable_cb(struct virtque
>  {
>  	struct vring_virtqueue *vq = to_vvq(_vq);
>  
> +	add_stat(vq, disable_cb);
>  	vq->vring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
>  }
>  EXPORT_SYMBOL_GPL(virtqueue_disable_cb);
> @@ -348,10 +448,12 @@ bool virtqueue_enable_cb(struct virtqueu
>  	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
>  	virtio_mb();
>  	if (unlikely(more_used(vq))) {
> +		add_stat(vq, enable_cb_retry);
>  		END_USE(vq);
>  		return false;
>  	}
>  
> +	add_stat(vq, enable_cb_success);
>  	END_USE(vq);
>  	return true;
>  }
> @@ -387,10 +489,12 @@ irqreturn_t vring_interrupt(int irq, voi
>  	struct vring_virtqueue *vq = to_vvq(_vq);
>  
>  	if (!more_used(vq)) {
> +		add_stat(vq, interrupt_nowork);
>  		pr_debug("virtqueue interrupt with no work for %p\n", vq);
>  		return IRQ_NONE;
>  	}
>  
> +	add_stat(vq, interrupt_work);
>  	if (unlikely(vq->broken))
>  		return IRQ_HANDLED;
>  
> @@ -451,6 +555,15 @@ struct virtqueue *vring_new_virtqueue(un
>  	}
>  	vq->data[i] = NULL;
>  
> +#ifdef CONFIG_VIRTIO_STATS
> +	vq->stats = kzalloc(sizeof(*vq->stats) * num, GFP_KERNEL);
> +	if (!vq->stats) {
> +		kfree(vq);
> +		return NULL;
> +	}
> +	create_stat_files(vq);
> +#endif
> +
>  	return &vq->vq;
>  }
>  EXPORT_SYMBOL_GPL(vring_new_virtqueue);
> @@ -458,6 +571,7 @@ EXPORT_SYMBOL_GPL(vring_new_virtqueue);
>  void vring_del_virtqueue(struct virtqueue *vq)
>  {
>  	list_del(&vq->list);
> +	delete_stat_files(to_vvq(vq));
>  	kfree(to_vvq(vq));
>  }
>  EXPORT_SYMBOL_GPL(vring_del_virtqueue);

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-09  0:53                                       ` Michael S. Tsirkin
@ 2011-02-09  1:39                                         ` Rusty Russell
  2011-02-09  1:55                                           ` Michael S. Tsirkin
  0 siblings, 1 reply; 88+ messages in thread
From: Rusty Russell @ 2011-02-09  1:39 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Krishna Kumar2, David Miller, kvm, Shirley Ma, netdev, steved

On Wed, 9 Feb 2011 11:23:45 am Michael S. Tsirkin wrote:
> On Wed, Feb 09, 2011 at 11:07:20AM +1030, Rusty Russell wrote:
> > On Wed, 2 Feb 2011 03:12:22 pm Michael S. Tsirkin wrote:
> > > On Wed, Feb 02, 2011 at 10:09:18AM +0530, Krishna Kumar2 wrote:
> > > > > "Michael S. Tsirkin" <mst@redhat.com> 02/02/2011 03:11 AM
> > > > >
> > > > > On Tue, Feb 01, 2011 at 01:28:45PM -0800, Shirley Ma wrote:
> > > > > > On Tue, 2011-02-01 at 23:21 +0200, Michael S. Tsirkin wrote:
> > > > > > > Confused. We compare capacity to skb frags, no?
> > > > > > > That's sg I think ...
> > > > > >
> > > > > > Current guest kernel use indirect buffers, num_free returns how many
> > > > > > available descriptors not skb frags. So it's wrong here.
> > > > > >
> > > > > > Shirley
> > > > >
> > > > > I see. Good point. In other words when we complete the buffer
> > > > > it was indirect, but when we add a new one we
> > > > > can not allocate indirect so we consume.
> > > > > And then we start the queue and add will fail.
> > > > > I guess we need some kind of API to figure out
> > > > > whether the buf we complete was indirect?
> > 
> > I've finally read this thread... I think we need to get more serious
> > with our stats gathering to diagnose these kind of performance issues.
> > 
> > This is a start; it should tell us what is actually happening to the
> > virtio ring(s) without significant performance impact...
> > 
> > Subject: virtio: CONFIG_VIRTIO_STATS
> > 
> > For performance problems we'd like to know exactly what the ring looks
> > like.  This patch adds stats indexed by how-full-ring-is; we could extend
> > it to also record them by how-used-ring-is if we need.
> > 
> > Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> 
> Not sure whether the intent is to merge this. If yes -
> would it make sense to use tracing for this instead?
> That's what kvm does.

Intent wasn't; I've not used tracepoints before, but maybe we should
consider a longer-term monitoring solution?

Patch welcome!

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-09  1:39                                         ` Rusty Russell
@ 2011-02-09  1:55                                           ` Michael S. Tsirkin
  2011-02-09  7:43                                             ` Stefan Hajnoczi
  0 siblings, 1 reply; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-02-09  1:55 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Krishna Kumar2, David Miller, kvm, Shirley Ma, netdev, steved

On Wed, Feb 09, 2011 at 12:09:35PM +1030, Rusty Russell wrote:
> On Wed, 9 Feb 2011 11:23:45 am Michael S. Tsirkin wrote:
> > On Wed, Feb 09, 2011 at 11:07:20AM +1030, Rusty Russell wrote:
> > > On Wed, 2 Feb 2011 03:12:22 pm Michael S. Tsirkin wrote:
> > > > On Wed, Feb 02, 2011 at 10:09:18AM +0530, Krishna Kumar2 wrote:
> > > > > > "Michael S. Tsirkin" <mst@redhat.com> 02/02/2011 03:11 AM
> > > > > >
> > > > > > On Tue, Feb 01, 2011 at 01:28:45PM -0800, Shirley Ma wrote:
> > > > > > > On Tue, 2011-02-01 at 23:21 +0200, Michael S. Tsirkin wrote:
> > > > > > > > Confused. We compare capacity to skb frags, no?
> > > > > > > > That's sg I think ...
> > > > > > >
> > > > > > > Current guest kernel use indirect buffers, num_free returns how many
> > > > > > > available descriptors not skb frags. So it's wrong here.
> > > > > > >
> > > > > > > Shirley
> > > > > >
> > > > > > I see. Good point. In other words when we complete the buffer
> > > > > > it was indirect, but when we add a new one we
> > > > > > can not allocate indirect so we consume.
> > > > > > And then we start the queue and add will fail.
> > > > > > I guess we need some kind of API to figure out
> > > > > > whether the buf we complete was indirect?
> > > 
> > > I've finally read this thread... I think we need to get more serious
> > > with our stats gathering to diagnose these kind of performance issues.
> > > 
> > > This is a start; it should tell us what is actually happening to the
> > > virtio ring(s) without significant performance impact...
> > > 
> > > Subject: virtio: CONFIG_VIRTIO_STATS
> > > 
> > > For performance problems we'd like to know exactly what the ring looks
> > > like.  This patch adds stats indexed by how-full-ring-is; we could extend
> > > it to also record them by how-used-ring-is if we need.
> > > 
> > > Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> > 
> > Not sure whether the intent is to merge this. If yes -
> > would it make sense to use tracing for this instead?
> > That's what kvm does.
> 
> Intent wasn't; I've not used tracepoints before, but maybe we should
> consider a longer-term monitoring solution?
> 
> Patch welcome!
> 
> Cheers,
> Rusty.

Sure, I'll look into this.

-- 
MST

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-09  1:55                                           ` Michael S. Tsirkin
@ 2011-02-09  7:43                                             ` Stefan Hajnoczi
  0 siblings, 0 replies; 88+ messages in thread
From: Stefan Hajnoczi @ 2011-02-09  7:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Rusty Russell, Krishna Kumar2, David Miller, kvm, Shirley Ma,
	netdev, steved

On Wed, Feb 9, 2011 at 1:55 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Wed, Feb 09, 2011 at 12:09:35PM +1030, Rusty Russell wrote:
>> On Wed, 9 Feb 2011 11:23:45 am Michael S. Tsirkin wrote:
>> > On Wed, Feb 09, 2011 at 11:07:20AM +1030, Rusty Russell wrote:
>> > > On Wed, 2 Feb 2011 03:12:22 pm Michael S. Tsirkin wrote:
>> > > > On Wed, Feb 02, 2011 at 10:09:18AM +0530, Krishna Kumar2 wrote:
>> > > > > > "Michael S. Tsirkin" <mst@redhat.com> 02/02/2011 03:11 AM
>> > > > > >
>> > > > > > On Tue, Feb 01, 2011 at 01:28:45PM -0800, Shirley Ma wrote:
>> > > > > > > On Tue, 2011-02-01 at 23:21 +0200, Michael S. Tsirkin wrote:
>> > > > > > > > Confused. We compare capacity to skb frags, no?
>> > > > > > > > That's sg I think ...
>> > > > > > >
>> > > > > > > Current guest kernel use indirect buffers, num_free returns how many
>> > > > > > > available descriptors not skb frags. So it's wrong here.
>> > > > > > >
>> > > > > > > Shirley
>> > > > > >
>> > > > > > I see. Good point. In other words when we complete the buffer
>> > > > > > it was indirect, but when we add a new one we
>> > > > > > can not allocate indirect so we consume.
>> > > > > > And then we start the queue and add will fail.
>> > > > > > I guess we need some kind of API to figure out
>> > > > > > whether the buf we complete was indirect?
>> > >
>> > > I've finally read this thread... I think we need to get more serious
>> > > with our stats gathering to diagnose these kind of performance issues.
>> > >
>> > > This is a start; it should tell us what is actually happening to the
>> > > virtio ring(s) without significant performance impact...
>> > >
>> > > Subject: virtio: CONFIG_VIRTIO_STATS
>> > >
>> > > For performance problems we'd like to know exactly what the ring looks
>> > > like.  This patch adds stats indexed by how-full-ring-is; we could extend
>> > > it to also record them by how-used-ring-is if we need.
>> > >
>> > > Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
>> >
>> > Not sure whether the intent is to merge this. If yes -
>> > would it make sense to use tracing for this instead?
>> > That's what kvm does.
>>
>> Intent wasn't; I've not used tracepoints before, but maybe we should
>> consider a longer-term monitoring solution?
>>
>> Patch welcome!
>>
>> Cheers,
>> Rusty.
>
> Sure, I'll look into this.

There are several virtio trace events already in QEMU today (see the
trace-events file):
virtqueue_fill(void *vq, const void *elem, unsigned int len, unsigned int idx) "vq %p elem %p len %u idx %u"
virtqueue_flush(void *vq, unsigned int count) "vq %p count %u"
virtqueue_pop(void *vq, void *elem, unsigned int in_num, unsigned int out_num) "vq %p elem %p in_num %u out_num %u"
virtio_queue_notify(void *vdev, int n, void *vq) "vdev %p n %d vq %p"
virtio_irq(void *vq) "vq %p"
virtio_notify(void *vdev, void *vq) "vdev %p vq %p"

These can be used by building QEMU with a suitable tracing backend
like SystemTap (see docs/tracing.txt).
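
As a minimal sketch (the surrounding function is illustrative only),
each event is emitted from QEMU's C code through a generated
trace_<name>() helper that matches its declaration in trace-events;
with no tracing backend configured these helpers compile to no-ops:

#include "trace.h"

static void notify_guest_example(void *vdev, void *vq)
{
    /* Matches the virtio_notify declaration above: records the device
     * and virtqueue pointers when the guest is interrupted. */
    trace_virtio_notify(vdev, vq);
}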

Inside the guest I've used dynamic ftrace in the past, although static
tracepoints would be nice.

Stefan

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-02-09  0:37                                     ` Rusty Russell
  2011-02-09  0:53                                       ` Michael S. Tsirkin
@ 2011-03-08 21:57                                       ` Shirley Ma
  2011-03-09  2:21                                         ` Andrew Theurer
  1 sibling, 1 reply; 88+ messages in thread
From: Shirley Ma @ 2011-03-08 21:57 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Michael S. Tsirkin, Krishna Kumar2, David Miller, kvm, netdev, steved

On Wed, 2011-02-09 at 11:07 +1030, Rusty Russell wrote:
> I've finally read this thread... I think we need to get more serious
> with our stats gathering to diagnose these kind of performance issues.
> 
> This is a start; it should tell us what is actually happening to the
> virtio ring(s) without significant performance impact... 

Should we also add similar stats on the vhost vq for monitoring
vhost_signal & vhost_notify?
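
Something along these lines, for instance.  This is only a sketch: the
stat_* fields are hypothetical additions to struct vhost_virtqueue, and
the vhost_notify()/call_ctx plumbing is assumed to be as in
drivers/vhost/vhost.c:

/* Count how often we actually interrupt the guest versus how often the
 * notification was suppressed. */
void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
{
	if (vq->call_ctx && vhost_notify(dev, vq)) {
		vq->stat_signalled++;		/* hypothetical counter */
		eventfd_signal(vq->call_ctx, 1);
	} else {
		vq->stat_suppressed++;		/* hypothetical counter */
	}
}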

Shirley


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-03-08 21:57                                       ` Shirley Ma
@ 2011-03-09  2:21                                         ` Andrew Theurer
  2011-03-09 15:42                                           ` Shirley Ma
  2011-03-10  1:49                                           ` Rusty Russell
  0 siblings, 2 replies; 88+ messages in thread
From: Andrew Theurer @ 2011-03-09  2:21 UTC (permalink / raw)
  To: Shirley Ma
  Cc: Rusty Russell, Michael S. Tsirkin, Krishna Kumar2, David Miller,
	kvm, netdev, steved, Tom Lendacky

On Tue, 2011-03-08 at 13:57 -0800, Shirley Ma wrote:
> On Wed, 2011-02-09 at 11:07 +1030, Rusty Russell wrote:
> > I've finally read this thread... I think we need to get more serious
> > with our stats gathering to diagnose these kind of performance issues.
> > 
> > This is a start; it should tell us what is actually happening to the
> > virtio ring(s) without significant performance impact... 
> 
> Should we also add similar stats on the vhost vq for monitoring
> vhost_signal & vhost_notify?

Tom L has started using Rusty's patches and found some interesting
results, sent yesterday:
http://marc.info/?l=kvm&m=129953710930124&w=2


-Andrew
> 
> Shirley
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-03-09  2:21                                         ` Andrew Theurer
@ 2011-03-09 15:42                                           ` Shirley Ma
  2011-03-10  1:49                                           ` Rusty Russell
  1 sibling, 0 replies; 88+ messages in thread
From: Shirley Ma @ 2011-03-09 15:42 UTC (permalink / raw)
  To: habanero
  Cc: Rusty Russell, Michael S. Tsirkin, Krishna Kumar2, David Miller,
	kvm, netdev, steved, Tom Lendacky

On Tue, 2011-03-08 at 20:21 -0600, Andrew Theurer wrote:
> Tom L has started using Rusty's patches and found some interesting
> results, sent yesterday:
> http://marc.info/?l=kvm&m=129953710930124&w=2

Thanks.  Very good experiment.  I have been struggling with guest/vhost
optimization work for a while.  I have created several experimental
patches, and the performance results really depend on the workload.

Based on the discussions and findings, it seems that to make progress on
virtio_net/vhost optimization we really need to collect more statistics
on both virtio_net and vhost, for both TX and RX.

A way to count only the guest exits, I/O exits, and irq injections
caused by the guest networking stack would also be helpful.

Thanks
Shirley


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-03-09  2:21                                         ` Andrew Theurer
  2011-03-09 15:42                                           ` Shirley Ma
@ 2011-03-10  1:49                                           ` Rusty Russell
  2011-04-12 20:01                                             ` Michael S. Tsirkin
  1 sibling, 1 reply; 88+ messages in thread
From: Rusty Russell @ 2011-03-10  1:49 UTC (permalink / raw)
  To: habanero, Shirley Ma
  Cc: Michael S. Tsirkin, Krishna Kumar2, David Miller, kvm, netdev,
	steved, Tom Lendacky

On Tue, 08 Mar 2011 20:21:18 -0600, Andrew Theurer <habanero@linux.vnet.ibm.com> wrote:
> On Tue, 2011-03-08 at 13:57 -0800, Shirley Ma wrote:
> > On Wed, 2011-02-09 at 11:07 +1030, Rusty Russell wrote:
> > > I've finally read this thread... I think we need to get more serious
> > > with our stats gathering to diagnose these kind of performance issues.
> > > 
> > > This is a start; it should tell us what is actually happening to the
> > > virtio ring(s) without significant performance impact... 
> > 
> > Should we also add similar stats on the vhost vq for monitoring
> > vhost_signal & vhost_notify?
> 
> Tom L has started using Rusty's patches and found some interesting
> results, sent yesterday:
> http://marc.info/?l=kvm&m=129953710930124&w=2

Hmm, I'm not subscribed to kvm@ any more, so I didn't get this;
replying here:

> Also, it looks like vhost is sending a lot of notifications for
> packets it has received before the guest can get scheduled to disable
> notifications and begin processing the packets resulting in some lock
> contention in the guest (and high interrupt rates).

Yes, this is a virtio design flaw, but one that should be fixable.
We have room at the end of the ring in which we can put a "last_used"
count.  Then we can tell whether a wakeup would be redundant before the
guest updates the flag.

Here's an old patch where I played with implementing this:

virtio: put last_used and last_avail index into ring itself.

Generally, the other end of the virtio ring doesn't need to see where
you're up to in consuming the ring.  However, to completely understand
what's going on from the outside, this information must be exposed.
For example, if you want to save and restore a virtio_ring, but you're
not the consumer because the kernel is using it directly.

Fortunately, we have room to expand: the ring is always a whole number
of pages and there's hundreds of bytes of padding after the avail ring
and the used ring, whatever the number of descriptors (which must be a
power of 2).

We add a feature bit so the guest can tell the host that it's writing
out the current value there, if it wants to use that.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/virtio/virtio_ring.c |   23 +++++++++++++++--------
 include/linux/virtio_ring.h  |   12 +++++++++++-
 2 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -71,9 +71,6 @@ struct vring_virtqueue
 	/* Number we've added since last sync. */
 	unsigned int num_added;
 
-	/* Last used index we've seen. */
-	u16 last_used_idx;
-
 	/* How to notify other side. FIXME: commonalize hcalls! */
 	void (*notify)(struct virtqueue *vq);
 
@@ -278,12 +275,13 @@ static void detach_buf(struct vring_virt
 
 static inline bool more_used(const struct vring_virtqueue *vq)
 {
-	return vq->last_used_idx != vq->vring.used->idx;
+	return vring_last_used(&vq->vring) != vq->vring.used->idx;
 }
 
 static void *vring_get_buf(struct virtqueue *_vq, unsigned int *len)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
+	struct vring_used_elem *u;
 	void *ret;
 	unsigned int i;
 
@@ -300,8 +298,11 @@ static void *vring_get_buf(struct virtqu
 		return NULL;
 	}
 
-	i = vq->vring.used->ring[vq->last_used_idx%vq->vring.num].id;
-	*len = vq->vring.used->ring[vq->last_used_idx%vq->vring.num].len;
+	u = &vq->vring.used->ring[vring_last_used(&vq->vring) % vq->vring.num];
+	i = u->id;
+	*len = u->len;
+	/* Make sure we don't reload i after doing checks. */
+	rmb();
 
 	if (unlikely(i >= vq->vring.num)) {
 		BAD_RING(vq, "id %u out of range\n", i);
@@ -315,7 +316,8 @@ static void *vring_get_buf(struct virtqu
 	/* detach_buf clears data, so grab it now. */
 	ret = vq->data[i];
 	detach_buf(vq, i);
-	vq->last_used_idx++;
+	vring_last_used(&vq->vring)++;
+
 	END_USE(vq);
 	return ret;
 }
@@ -402,7 +404,6 @@ struct virtqueue *vring_new_virtqueue(un
 	vq->vq.name = name;
 	vq->notify = notify;
 	vq->broken = false;
-	vq->last_used_idx = 0;
 	vq->num_added = 0;
 	list_add_tail(&vq->vq.list, &vdev->vqs);
 #ifdef DEBUG
@@ -413,6 +414,10 @@ struct virtqueue *vring_new_virtqueue(un
 
 	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC);
 
+	/* We publish indices whether they offer it or not: if not, it's junk
+	 * space anyway.  But calling this acknowledges the feature. */
+	virtio_has_feature(vdev, VIRTIO_RING_F_PUBLISH_INDICES);
+
 	/* No callback?  Tell other side not to bother us. */
 	if (!callback)
 		vq->vring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
@@ -443,6 +448,8 @@ void vring_transport_features(struct vir
 		switch (i) {
 		case VIRTIO_RING_F_INDIRECT_DESC:
 			break;
+		case VIRTIO_RING_F_PUBLISH_INDICES:
+			break;
 		default:
 			/* We don't understand this bit. */
 			clear_bit(i, vdev->features);
diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
--- a/include/linux/virtio_ring.h
+++ b/include/linux/virtio_ring.h
@@ -29,6 +29,9 @@
 /* We support indirect buffer descriptors */
 #define VIRTIO_RING_F_INDIRECT_DESC	28
 
+/* We publish our last-seen used index at the end of the avail ring. */
+#define VIRTIO_RING_F_PUBLISH_INDICES	29
+
 /* Virtio ring descriptors: 16 bytes.  These can chain together via "next". */
 struct vring_desc
 {
@@ -87,6 +90,7 @@ struct vring {
  *	__u16 avail_flags;
  *	__u16 avail_idx;
  *	__u16 available[num];
+ *	__u16 last_used_idx;
  *
  *	// Padding to the next align boundary.
  *	char pad[];
@@ -95,6 +99,7 @@ struct vring {
  *	__u16 used_flags;
  *	__u16 used_idx;
  *	struct vring_used_elem used[num];
+ *	__u16 last_avail_idx;
  * };
  */
 static inline void vring_init(struct vring *vr, unsigned int num, void *p,
@@ -111,9 +116,14 @@ static inline unsigned vring_size(unsign
 {
 	return ((sizeof(struct vring_desc) * num + sizeof(__u16) * (2 + num)
 		 + align - 1) & ~(align - 1))
-		+ sizeof(__u16) * 2 + sizeof(struct vring_used_elem) * num;
+		+ sizeof(__u16) * 2 + sizeof(struct vring_used_elem) * num + 2;
 }
 
+/* We publish the last-seen used index at the end of the available ring, and
+ * vice-versa.  These are at the end for backwards compatibility. */
+#define vring_last_used(vr) ((vr)->avail->ring[(vr)->num])
+#define vring_last_avail(vr) (*(__u16 *)&(vr)->used->ring[(vr)->num])
+
 #ifdef __KERNEL__
 #include <linux/irqreturn.h>
 struct virtio_device;
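
With the index published like this, the other side can suppress
redundant wakeups even before the guest gets around to setting
VRING_AVAIL_F_NO_INTERRUPT.  A rough sketch of the check (the helper and
the exact policy here are only an illustration; vring_last_used() is the
accessor added above):

/* Signal only when the guest had already caught up to the used index
 * as of our last signal; if it is still working through those entries
 * it will pick up the new ones by itself, so another interrupt would
 * be redundant. */
static bool need_signal(const struct vring *vr, __u16 prev_used_idx)
{
	return vring_last_used(vr) == prev_used_idx;
}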

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-03-10  1:49                                           ` Rusty Russell
@ 2011-04-12 20:01                                             ` Michael S. Tsirkin
  2011-04-14 11:28                                               ` Rusty Russell
  0 siblings, 1 reply; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-04-12 20:01 UTC (permalink / raw)
  To: Rusty Russell
  Cc: habanero, Shirley Ma, Krishna Kumar2, David Miller, kvm, netdev,
	steved, Tom Lendacky, borntraeger

On Thu, Mar 10, 2011 at 12:19:42PM +1030, Rusty Russell wrote:
> Here's an old patch where I played with implementing this:

...

> 
> virtio: put last_used and last_avail index into ring itself.
> 
> Generally, the other end of the virtio ring doesn't need to see where
> you're up to in consuming the ring.  However, to completely understand
> what's going on from the outside, this information must be exposed.
> For example, if you want to save and restore a virtio_ring, but you're
> not the consumer because the kernel is using it directly.
> 
> Fortunately, we have room to expand:

This seems to be true for x86 kvm and lguest but is it true
for s390?

        err = vmem_add_mapping(config->address,
                               vring_size(config->num,
                                          KVM_S390_VIRTIO_RING_ALIGN));
        if (err)
                goto out;
                
        vq = vring_new_virtqueue(config->num, KVM_S390_VIRTIO_RING_ALIGN,
                                 vdev, (void *) config->address,
                                 kvm_notify, callback, name);
        if (!vq) {
                err = -ENOMEM;
                goto unmap;
        }
        


> the ring is always a whole number
> of pages and there's hundreds of bytes of padding after the avail ring
> and the used ring, whatever the number of descriptors (which must be a
> power of 2).
> 
> We add a feature bit so the guest can tell the host that it's writing
> out the current value there, if it wants to use that.
> 
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> ---

....

> --- a/include/linux/virtio_ring.h
> +++ b/include/linux/virtio_ring.h
> @@ -29,6 +29,9 @@
>  /* We support indirect buffer descriptors */
>  #define VIRTIO_RING_F_INDIRECT_DESC	28
>  
> +/* We publish our last-seen used index at the end of the avail ring. */
> +#define VIRTIO_RING_F_PUBLISH_INDICES	29
> +
>  /* Virtio ring descriptors: 16 bytes.  These can chain together via "next". */
>  struct vring_desc
>  {
> @@ -87,6 +90,7 @@ struct vring {
>   *	__u16 avail_flags;
>   *	__u16 avail_idx;
>   *	__u16 available[num];
> + *	__u16 last_used_idx;
>   *
>   *	// Padding to the next align boundary.
>   *	char pad[];
> @@ -95,6 +99,7 @@ struct vring {
>   *	__u16 used_flags;
>   *	__u16 used_idx;
>   *	struct vring_used_elem used[num];
> + *	__u16 last_avail_idx;
>   * };
>   */
>  static inline void vring_init(struct vring *vr, unsigned int num, void *p,
> @@ -111,9 +116,14 @@ static inline unsigned vring_size(unsign
>  {
>  	return ((sizeof(struct vring_desc) * num + sizeof(__u16) * (2 + num)
>  		 + align - 1) & ~(align - 1))
> -		+ sizeof(__u16) * 2 + sizeof(struct vring_used_elem) * num;
> +		+ sizeof(__u16) * 2 + sizeof(struct vring_used_elem) * num + 2;
>  }
>  
> +/* We publish the last-seen used index at the end of the available ring, and
> + * vice-versa.  These are at the end for backwards compatibility. */
> +#define vring_last_used(vr) ((vr)->avail->ring[(vr)->num])
> +#define vring_last_avail(vr) (*(__u16 *)&(vr)->used->ring[(vr)->num])
> +

Will this last bit work on s390?
If I understand correctly the memory is allocated by host there?

>  #ifdef __KERNEL__
>  #include <linux/irqreturn.h>
>  struct virtio_device;

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-04-12 20:01                                             ` Michael S. Tsirkin
@ 2011-04-14 11:28                                               ` Rusty Russell
  2011-04-14 12:40                                                 ` Michael S. Tsirkin
  2011-04-14 16:03                                                 ` Michael S. Tsirkin
  0 siblings, 2 replies; 88+ messages in thread
From: Rusty Russell @ 2011-04-14 11:28 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: habanero, Shirley Ma, Krishna Kumar2, David Miller, kvm, netdev,
	steved, Tom Lendacky, borntraeger

On Tue, 12 Apr 2011 23:01:12 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Thu, Mar 10, 2011 at 12:19:42PM +1030, Rusty Russell wrote:
> > Here's an old patch where I played with implementing this:
> 
> ...
> 
> > 
> > virtio: put last_used and last_avail index into ring itself.
> > 
> > Generally, the other end of the virtio ring doesn't need to see where
> > you're up to in consuming the ring.  However, to completely understand
> > what's going on from the outside, this information must be exposed.
> > For example, if you want to save and restore a virtio_ring, but you're
> > not the consumer because the kernel is using it directly.
> > 
> > Fortunately, we have room to expand:
> 
> This seems to be true for x86 kvm and lguest but is it true
> for s390?

Yes, the ring is page aligned, so there's always room.

> Will this last bit work on s390?
> If I understand correctly the memory is allocated by host there?

They have to offer the feature, so if they have some way of allocating
non-page-aligned amounts of memory, they'll have to add those extra 2
bytes.

So I think it's OK...
Rusty.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-04-14 11:28                                               ` Rusty Russell
@ 2011-04-14 12:40                                                 ` Michael S. Tsirkin
  2011-04-14 16:03                                                 ` Michael S. Tsirkin
  1 sibling, 0 replies; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-04-14 12:40 UTC (permalink / raw)
  To: Rusty Russell
  Cc: habanero, Shirley Ma, Krishna Kumar2, David Miller, kvm, netdev,
	steved, Tom Lendacky, borntraeger

On Thu, Apr 14, 2011 at 08:58:41PM +0930, Rusty Russell wrote:
> On Tue, 12 Apr 2011 23:01:12 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Thu, Mar 10, 2011 at 12:19:42PM +1030, Rusty Russell wrote:
> > > Here's an old patch where I played with implementing this:
> > 
> > ...
> > 
> > > 
> > > virtio: put last_used and last_avail index into ring itself.
> > > 
> > > Generally, the other end of the virtio ring doesn't need to see where
> > > you're up to in consuming the ring.  However, to completely understand
> > > what's going on from the outside, this information must be exposed.
> > > For example, if you want to save and restore a virtio_ring, but you're
> > > not the consumer because the kernel is using it directly.
> > > 
> > > Fortunately, we have room to expand:
> > 
> > This seems to be true for x86 kvm and lguest but is it true
> > for s390?
> 
> Yes, the ring is page aligned, so there's always room.
> 
> > Will this last bit work on s390?
> > If I understand correctly the memory is allocated by host there?
> 
> They have to offer the feature, so if they have some way of allocating
> non-page-aligned amounts of memory, they'll have to add those extra 2
> bytes.
> 
> So I think it's OK...
> Rusty.

Correct. I wonder whether we need to pass the relevant flag
to vring_size. If we do, we'll need to add a new function
for that, though, since vring_size is exported to userspace.
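
Something like the following, perhaps.  This is a sketch only: the name
is made up, and the arithmetic simply mirrors the vring_size() change in
the patch earlier in the thread:

static inline unsigned vring_size_indices(unsigned int num,
					  unsigned long align,
					  bool publish_indices)
{
	unsigned size = ((sizeof(struct vring_desc) * num
			  + sizeof(__u16) * (2 + num)
			  + align - 1) & ~(align - 1))
			+ sizeof(__u16) * 2
			+ sizeof(struct vring_used_elem) * num;

	/* Extra __u16 for last_avail_idx after the used ring; the
	 * last_used_idx after the avail ring fits in the existing
	 * alignment padding. */
	if (publish_indices)
		size += 2;

	return size;
}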

-- 
MST

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-04-14 11:28                                               ` Rusty Russell
  2011-04-14 12:40                                                 ` Michael S. Tsirkin
@ 2011-04-14 16:03                                                 ` Michael S. Tsirkin
  2011-04-19  0:33                                                   ` Rusty Russell
  1 sibling, 1 reply; 88+ messages in thread
From: Michael S. Tsirkin @ 2011-04-14 16:03 UTC (permalink / raw)
  To: Rusty Russell
  Cc: habanero, Shirley Ma, Krishna Kumar2, David Miller, kvm, netdev,
	steved, Tom Lendacky, borntraeger

On Thu, Apr 14, 2011 at 08:58:41PM +0930, Rusty Russell wrote:
> On Tue, 12 Apr 2011 23:01:12 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Thu, Mar 10, 2011 at 12:19:42PM +1030, Rusty Russell wrote:
> > > Here's an old patch where I played with implementing this:
> > 
> > ...
> > 
> > > 
> > > virtio: put last_used and last_avail index into ring itself.
> > > 
> > > Generally, the other end of the virtio ring doesn't need to see where
> > > you're up to in consuming the ring.  However, to completely understand
> > > what's going on from the outside, this information must be exposed.
> > > For example, if you want to save and restore a virtio_ring, but you're
> > > not the consumer because the kernel is using it directly.
> > > 
> > > Fortunately, we have room to expand:
> > 
> > This seems to be true for x86 kvm and lguest but is it true
> > for s390?
> 
> Yes, the ring is page aligned, so there's always room.
> 
> > Will this last bit work on s390?
> > If I understand correctly the memory is allocated by host there?
> 
> They have to offer the feature, so if they have some way of allocating
> non-page-aligned amounts of memory, they'll have to add those extra 2
> bytes.
> 
> So I think it's OK...
> Rusty.

To clarify, my concern is that we always seem to try to map
these extra 2 bytes, which conceivably might fail?

-- 
MST

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Network performance with small packets
  2011-04-14 16:03                                                 ` Michael S. Tsirkin
@ 2011-04-19  0:33                                                   ` Rusty Russell
  0 siblings, 0 replies; 88+ messages in thread
From: Rusty Russell @ 2011-04-19  0:33 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: habanero, Shirley Ma, Krishna Kumar2, David Miller, kvm, netdev,
	steved, Tom Lendacky, borntraeger

On Thu, 14 Apr 2011 19:03:59 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Thu, Apr 14, 2011 at 08:58:41PM +0930, Rusty Russell wrote:
> > They have to offer the feature, so if the have some way of allocating
> > non-page-aligned amounts of memory, they'll have to add those extra 2
> > bytes.
> > 
> > So I think it's OK...
> > Rusty.
> 
> To clarify, my concern is that we always seem to try to map
> these extra 2 bytes, which thinkably might fail?

No, if you look at the layout it's clear that there's always most of a
page left for this extra room, both in the middle and at the end.  (For
example, with a 256-entry ring the descriptor table plus avail ring take
4096 + 516 bytes, leaving roughly 3.5K of alignment padding before the
page-aligned used ring, and the used ring itself occupies only 2052
bytes of its final page.)

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 88+ messages in thread

end of thread, other threads:[~2011-04-19  1:39 UTC | newest]

Thread overview: 88+ messages
2011-01-25 21:09 Network performance with small packets Steve Dobbelstein
2011-01-26 15:17 ` Michael S. Tsirkin
2011-01-27 18:44   ` Shirley Ma
2011-01-27 19:00     ` Michael S. Tsirkin
2011-01-27 19:09       ` Shirley Ma
2011-01-27 19:31         ` Michael S. Tsirkin
2011-01-27 19:45           ` Shirley Ma
2011-01-27 20:05             ` Michael S. Tsirkin
2011-01-27 20:15               ` Shirley Ma
2011-01-28 18:29                 ` Steve Dobbelstein
2011-01-28 22:51                   ` Steve Dobbelstein
2011-02-01 15:52                   ` [PATCHv2 dontapply] vhost-net tx tuning Michael S. Tsirkin
2011-02-01 23:07                     ` Sridhar Samudrala
2011-02-01 23:27                       ` Shirley Ma
2011-02-02  4:36                       ` Michael S. Tsirkin
2011-01-27 21:02               ` Network performance with small packets David Miller
2011-01-27 21:30                 ` Shirley Ma
2011-01-28 12:16                   ` Michael S. Tsirkin
2011-02-01  0:24                     ` Steve Dobbelstein
2011-02-01  1:30                       ` Sridhar Samudrala
2011-02-01  5:56                         ` Michael S. Tsirkin
2011-02-01 21:09                         ` Shirley Ma
2011-02-01 21:24                           ` Michael S. Tsirkin
2011-02-01 21:32                             ` Shirley Ma
2011-02-01 21:42                               ` Michael S. Tsirkin
2011-02-01 21:53                                 ` Shirley Ma
2011-02-01 21:56                                   ` Michael S. Tsirkin
2011-02-01 22:59                                     ` Shirley Ma
2011-02-02  4:40                                       ` Michael S. Tsirkin
2011-02-02  6:05                                         ` Shirley Ma
2011-02-02  6:19                                           ` Shirley Ma
2011-02-02  6:29                                             ` Michael S. Tsirkin
2011-02-02  7:14                                               ` Shirley Ma
2011-02-02  7:33                                                 ` Shirley Ma
2011-02-02 10:49                                                   ` Michael S. Tsirkin
2011-02-02 15:42                                                     ` Shirley Ma
2011-02-02 15:48                                                       ` Michael S. Tsirkin
2011-02-02 17:12                                                         ` Shirley Ma
2011-02-02 18:20                                                       ` Michael S. Tsirkin
2011-02-02 18:26                                                         ` Shirley Ma
2011-02-02 10:48                                                 ` Michael S. Tsirkin
2011-02-02  6:34                                             ` Krishna Kumar2
2011-02-02  7:03                                               ` Shirley Ma
2011-02-02  7:37                                                 ` Krishna Kumar2
2011-02-02 10:48                                               ` Michael S. Tsirkin
2011-02-02 15:39                                                 ` Shirley Ma
2011-02-02 15:47                                                   ` Michael S. Tsirkin
2011-02-02 17:10                                                     ` Shirley Ma
2011-02-02 17:32                                                       ` Michael S. Tsirkin
2011-02-02 18:11                                                         ` Shirley Ma
2011-02-02 18:27                                                           ` Michael S. Tsirkin
2011-02-02 19:29                                                             ` Shirley Ma
2011-02-02 20:17                                                               ` Michael S. Tsirkin
2011-02-02 21:03                                                                 ` Shirley Ma
2011-02-02 21:20                                                                   ` Michael S. Tsirkin
2011-02-02 21:41                                                                     ` Shirley Ma
2011-02-03  5:59                                                                       ` Michael S. Tsirkin
2011-02-03  6:09                                                                         ` Shirley Ma
2011-02-03  6:16                                                                           ` Michael S. Tsirkin
2011-02-03  5:05                                                                     ` Shirley Ma
2011-02-03  6:13                                                                       ` Michael S. Tsirkin
2011-02-03 15:58                                                                         ` Shirley Ma
2011-02-03 16:20                                                                           ` Michael S. Tsirkin
2011-02-03 17:18                                                                             ` Shirley Ma
2011-02-01  5:54                       ` Michael S. Tsirkin
2011-02-01 17:23                   ` Michael S. Tsirkin
     [not found]                     ` <1296590943.26937.797.camel@localhost.localdomain>
     [not found]                       ` <20110201201715.GA30050@redhat.com>
2011-02-01 20:25                         ` Shirley Ma
2011-02-01 21:21                           ` Michael S. Tsirkin
2011-02-01 21:28                             ` Shirley Ma
2011-02-01 21:41                               ` Michael S. Tsirkin
2011-02-02  4:39                                 ` Krishna Kumar2
2011-02-02  4:42                                   ` Michael S. Tsirkin
2011-02-09  0:37                                     ` Rusty Russell
2011-02-09  0:53                                       ` Michael S. Tsirkin
2011-02-09  1:39                                         ` Rusty Russell
2011-02-09  1:55                                           ` Michael S. Tsirkin
2011-02-09  7:43                                             ` Stefan Hajnoczi
2011-03-08 21:57                                       ` Shirley Ma
2011-03-09  2:21                                         ` Andrew Theurer
2011-03-09 15:42                                           ` Shirley Ma
2011-03-10  1:49                                           ` Rusty Russell
2011-04-12 20:01                                             ` Michael S. Tsirkin
2011-04-14 11:28                                               ` Rusty Russell
2011-04-14 12:40                                                 ` Michael S. Tsirkin
2011-04-14 16:03                                                 ` Michael S. Tsirkin
2011-04-19  0:33                                                   ` Rusty Russell
2011-02-02 18:38 ` Michael S. Tsirkin
2011-02-02 19:15   ` Steve Dobbelstein
