Subject: Re: [Qemu-devel] [PATCH] main-loop: Don't lock starve io-threads when main_loop_tlg has pending events
From: Hans de Goede
Date: Wed, 09 Oct 2013 14:58:01 +0200
To: Paolo Bonzini
Cc: qemu-devel@nongnu.org, Alex Bligh

Hi,

On 10/08/2013 10:50 PM, Paolo Bonzini wrote:
> On 08/10/2013 22:16, Hans de Goede wrote:
>> No, it is calling main_loop_wait with nonblocking set to 0, so
>> normally the lock would get released. But
>> timerlistgroup_deadline_ns(&main_loop_tlg) is returning 0,
>> causing timeout_ns to be 0, and this causes the lock to not get
>> released.
>
> Yes, this was my understanding of the patch as well. Before Alex's
> series, this would be capped to MIN_REARM_TIMER_NS (250 us). This is
> why I mentioned 250 us.
>
> However, I agree with Alex that it looks a bit fishy and I'd like to
> know what timer it is that is continuously set to expire in the past.

I got an off-list email this morning from a kind soul informing me that
he was seeing the same issue as me, but only when using audio over
spice. I've just tried things with "-device hda -device hda-duplex"
removed from my qemu cmdline and I can confirm it is the same for me.

There seem to be more cases that trigger the "timer set to expire in
the past" condition: I added a printf("bingo\n") to my code that
catches this case (while still unlocking the lock there), and it
triggers a couple of times before the hda device becomes active too.
As soon as pulseaudio starts in the guest, though, the screen really
starts filling with bingos, so it definitely looks like a spice + audio
problem.

Normally the bingos stop scrolling by after a couple of seconds (as
pulseaudio stops the audio device when there is no sound), but when I
try to play music they keep scrolling. Also, playing music with git
master (with my patch) makes qemu take 100% host cpu, while doing the
same with qemu-1.5.3 takes 50% host cpu, so there definitely is
something wrong here.

Looking at the spice code in question: spice-server sets a timer to go
off every MM_TIMER_GRANULARITY_MS ms, which is every 33 ms, so not
often at all. It does this by calling ui/spice-core.c's timer_start()
from its timer function, which does:

  qemu_mod_timer(timer->timer, qemu_get_clock_ms(rt_clock) + ms);

Note how it rearms from the current time, not from some saved time, so
even if we've missed a couple of timer ticks it should catch up with
the current time after running once.
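To illustrate why that matters, here is a small standalone sketch (not
QEMU code; the numbers and names are made up for the example): a timer
rearmed from the current clock recovers from a latency spike in a
single run, while one rearmed from its own previous deadline has to
fire over and over, and every one of those runs looks like "a timer set
to expire in the past":

  /* Standalone illustration, not QEMU code: compare rearming a periodic
   * timer from the current time with rearming it from its previous
   * deadline, after the main loop has stalled for a while. */
  #include <stdio.h>

  int main(void)
  {
      const long long period = 33;  /* like MM_TIMER_GRANULARITY_MS */
      long long now = 500;          /* fake clock, in ms: we just had a
                                       500 ms latency spike and both
                                       deadlines are now in the past */
      long long next_a;             /* style A: rearm from "now" */
      long long next_b = 0;         /* style B: rearm from old deadline */
      int catchup_runs = 0;

      /* Style A, roughly what the spice-core rearm above does: a single
       * run puts the next deadline back in the future. */
      next_a = now + period;

      /* Style B, saved timestamp plus increments: the timer must fire
       * and be rearmed many times before its deadline is in the future
       * again. */
      while (next_b <= now) {
          next_b += period;
          catchup_runs++;
      }

      printf("style A: next deadline %lld ms (1 run)\n", next_a);
      printf("style B: next deadline %lld ms (%d catch-up runs)\n",
             next_b, catchup_runs);
      return 0;
  }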
So long story short, I have no idea what is going on here, and Alex is
right that there is a bigger underlying problem.

Regards,

Hans

p.s. I still think we should add my patch in some form, since the lock
starvation caused by timers set to expire in the past could still
happen in other cases, especially for timer users who take a time stamp
once and then add increments to it to schedule the next run. These can
fall quite a bit behind when there are latency spikes, and we don't
want to run without ever releasing the lock while they are catching up.
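To make the starvation itself concrete, here is another tiny standalone
sketch (plain pthreads, again not QEMU code, all names made up): a
"main loop" that only drops its big lock when it is actually going to
block never lets the other thread in as long as the computed timeout
stays 0:

  /* Standalone illustration, not QEMU code: a loop that holds a big
   * lock and only releases it around a blocking wait starves any other
   * thread that needs that lock while the wait timeout stays 0.
   * Build with: gcc -pthread starve.c -o starve */
  #include <pthread.h>
  #include <stdio.h>
  #include <unistd.h>

  static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
  static volatile int other_thread_ran;

  /* Stands in for the timer deadline calculation: a timer that is
   * already expired keeps producing a timeout of 0. */
  static int next_timeout_ms(void)
  {
      return 0;
  }

  static void *other_thread(void *arg)
  {
      (void)arg;
      pthread_mutex_lock(&big_lock);    /* like an io/vcpu thread */
      other_thread_ran = 1;
      printf("other thread finally got the lock\n");
      pthread_mutex_unlock(&big_lock);
      return NULL;
  }

  int main(void)
  {
      pthread_t tid;
      int spins = 0;

      pthread_mutex_lock(&big_lock);
      pthread_create(&tid, NULL, other_thread, NULL);

      while (!other_thread_ran && spins < 1000000) {
          int timeout_ms = next_timeout_ms();

          if (timeout_ms > 0) {
              /* Only a blocking wait drops the lock... */
              pthread_mutex_unlock(&big_lock);
              usleep(timeout_ms * 1000);
              pthread_mutex_lock(&big_lock);
          }
          /* ...so with a timeout of 0 we spin with the lock held. */
          spins++;
      }

      printf("main loop spun %d times, other thread ran: %d\n",
             spins, other_thread_ran);
      pthread_mutex_unlock(&big_lock);
      pthread_join(tid, NULL);
      return 0;
  }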