From mboxrd@z Thu Jan 1 00:00:00 1970 From: Avi Kivity Subject: Re: [RFC PATCH 00/17] virtual-bus Date: Sun, 05 Apr 2009 11:06:12 +0300 Message-ID: <49D86674.3020503@redhat.com> References: <20090331184057.28333.77287.stgit@dev.haskins.net> <200904011638.45135.rusty@rustcorp.com.au> <49D391F5.4080700@codemonkey.ws> <200904051314.23170.rusty@rustcorp.com.au> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: Anthony Liguori , Gregory Haskins , linux-kernel@vger.kernel.org, agraf@suse.de, pmullaney@novell.com, pmorreale@novell.com, netdev@vger.kernel.org, kvm@vger.kernel.org To: Rusty Russell Return-path: Received: from mx2.redhat.com ([66.187.237.31]:34700 "EHLO mx2.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752841AbZDEIGC (ORCPT ); Sun, 5 Apr 2009 04:06:02 -0400 In-Reply-To: <200904051314.23170.rusty@rustcorp.com.au> Sender: kvm-owner@vger.kernel.org List-ID: Rusty Russell wrote: > On Thursday 02 April 2009 02:40:29 Anthony Liguori wrote: > >> Rusty Russell wrote: >> >>> As you point out, 350-450 is possible, which is still bad, and it's at least >>> partially caused by the exit to userspace and two system calls. If virtio_net >>> had a backend in the kernel, we'd be able to compare numbers properly. >>> >> I doubt the userspace exit is the problem. On a modern system, it takes >> about 1us to do a light-weight exit and about 2us to do a heavy-weight >> exit. A transition to userspace is only about ~150ns, the bulk of the >> additional heavy-weight exit cost is from vcpu_put() within KVM. >> > > Just to inject some facts, servicing a ping via tap (ie host->guest then > guest->host response) takes 26 system calls from one qemu thread, 7 from > another (see strace below). Judging by those futex calls, multiple context > switches, too. > Interesting stuff. Even if amortized over half a ring's worth of packets, that's quite a lot. Two threads are involved (we complete on the iothread, since we don't know which vcpu will end up processing the interrupt, if any). > > Pid 10260: > 12:37:40.245785 select(17, [4 6 8 14 16], [], [], {0, 996000}) = 1 (in [6], left {0, 992000}) <0.003995> > Should switch to epoll with its lower wait costs. Unfortunately the relative timeout requires reading the clock. > 12:37:40.250226 read(6, "\0\0\0\0\0\0\0\0\0\0RT\0\0224V*\211\24\210`\304\10\0E\0"..., 69632) = 108 <0.000051> > 12:37:40.250462 write(1, "tap read: 108 bytes\n", 20) = 20 <0.000197> > I hope this is your addition. > 12:37:40.250800 ioctl(7, 0x4008ae61, 0x7fff8cafb3a0) = 0 <0.000223> > 12:37:40.251149 read(6, 0x115c6ac, 69632) = -1 EAGAIN (Resource temporarily unavailable) <0.000019> > This wouldn't be necessary with io_getevents(). > 12:37:40.251292 write(1, "tap read: -1 bytes\n", 19) = 19 <0.000085> > ... > 12:37:40.251488 clock_gettime(CLOCK_MONOTONIC, {1554, 633304282}) = 0 <0.000020> > 12:37:40.251604 clock_gettime(CLOCK_MONOTONIC, {1554, 633413793}) = 0 <0.000019> > Great. > 12:37:40.251717 futex(0xb81360, 0x81 /* FUTEX_??? */, 1) = 1 <0.001222> > 12:37:40.253037 select(17, [4 6 8 14 16], [], [], {1, 0}) = 1 (in [16], left {1, 0}) <0.000026> > 12:37:40.253196 read(16, "\16\0\0\0\0\0\0\0\376\377\377\377\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128 <0.000022> > 12:37:40.253324 rt_sigaction(SIGALRM, NULL, {0x406d50, ~[KILL STOP RTMIN RT_1], SA_RESTORER, 0x7f1a842430f0}, 8) = 0 <0.000018> > 12:37:40.253477 write(5, "\0", 1) = 1 <0.000022> > The write is to wake someone up. Who? > 12:37:40.253585 read(16, 0x7fff8cb09440, 128) = -1 EAGAIN (Resource temporarily unavailable) <0.000020> > Clearing up signalfd... > 12:37:40.253687 clock_gettime(CLOCK_MONOTONIC, {1554, 635496181}) = 0 <0.000019> > 12:37:40.253798 writev(6, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"*\211\24\210`\304RT\0\0224V\10\0E\0\0T\255\262\0\0@\1G"..., 98}], 2) = 108 <0.000062> > 12:37:40.253993 ioctl(7, 0x4008ae61, 0x7fff8caff460) = 0 <0.000161> > Injecting the interrupt. > 12:37:40.254263 clock_gettime(CLOCK_MONOTONIC, {1554, 636077540}) = 0 <0.000019> > 12:37:40.254380 futex(0xb81360, 0x81 /* FUTEX_??? */, 1) = 1 <0.000394> > 12:37:40.254861 select(17, [4 6 8 14 16], [], [], {1, 0}) = 1 (in [4], left {1, 0}) <0.000022> > 12:37:40.255001 read(4, "\0", 512) = 1 <0.000021> > Great, the write() was to wake ourselves up. > 12:37:40.255109 read(4, 0x7fff8cb092d0, 512) = -1 EAGAIN (Resource temporarily unavailable) <0.000018> > 12:37:40.255211 clock_gettime(CLOCK_MONOTONIC, {1554, 637020677}) = 0 <0.000019> > 12:37:40.255314 clock_gettime(CLOCK_MONOTONIC, {1554, 637123483}) = 0 <0.000019> > 12:37:40.255416 timer_gettime(0, {it_interval={0, 0}, it_value={0, 0}}) = 0 <0.000018> > 12:37:40.255524 timer_settime(0, 0, {it_interval={0, 0}, it_value={0, 14000000}}, NULL) = 0 <0.000021> > 12:37:40.255635 clock_gettime(CLOCK_MONOTONIC, {1554, 637443915}) = 0 <0.000019> > 12:37:40.255739 clock_gettime(CLOCK_MONOTONIC, {1554, 637547001}) = 0 <0.000018> > 12:37:40.255847 select(17, [4 6 8 14 16], [], [], {1, 0}) = 1 (in [16], left {0, 988000}) <0.014303> > > This is the vcpu thread: > Pid 10262: > 12:37:40.252531 clock_gettime(CLOCK_MONOTONIC, {1554, 634339051}) = 0 <0.000018> > 12:37:40.252631 timer_gettime(0, {it_interval={0, 0}, it_value={0, 17549811}}) = 0 <0.000021> > 12:37:40.252750 timer_settime(0, 0, {it_interval={0, 0}, it_value={0, 250000}}, NULL) = 0 <0.000024> > 12:37:40.252868 ioctl(11, 0xae80, 0) = 0 <0.001171> > 12:37:40.254128 futex(0xb81360, 0x80 /* FUTEX_??? */, 2) = 0 <0.000270> > 12:37:40.254490 ioctl(7, 0x4008ae61, 0x4134bee0) = 0 <0.000019> > 12:37:40.254598 futex(0xb81360, 0x81 /* FUTEX_??? */, 1) = 0 <0.000017> > 12:37:40.254693 ioctl(11, 0xae80 > Looks like the interrupt from the iothread was injected and delivered before the iothread could give up the mutex, so we needed to wait here. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.