From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jan Kiszka <jan.kiszka@siemens.com>
Subject: Re: [PATCH 14/22] kvm: Fix race between timer signals and vcpu entry
 under !IOTHREAD
Date: Tue, 01 Feb 2011 15:37:26 +0100
Message-ID: <4D481AA6.9010607@siemens.com>
References: <cover.1296133797.git.jan.kiszka@siemens.com> <c576bef42c459463e4cd18f61f845a32fd347b60.1296133797.git.jan.kiszka@siemens.com> <20110201124707.GA12061@amt.cnet> <4D480B76.90509@siemens.com> <20110201134821.GA12848@amt.cnet> <4D48116A.9020807@siemens.com> <20110201141039.GA14442@amt.cnet> <4D4816F0.5060009@siemens.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: Avi Kivity <avi@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from thoth.sbs.de ([192.35.17.2]:28438 "EHLO thoth.sbs.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753267Ab1BAOho (ORCPT <rfc822;kvm@vger.kernel.org>);
	Tue, 1 Feb 2011 09:37:44 -0500
In-Reply-To: <4D4816F0.5060009@siemens.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On 2011-02-01 15:21, Jan Kiszka wrote:
> On 2011-02-01 15:10, Marcelo Tosatti wrote:
>> On Tue, Feb 01, 2011 at 02:58:02PM +0100, Jan Kiszka wrote:
>>> On 2011-02-01 14:48, Marcelo Tosatti wrote:
>>>> On Tue, Feb 01, 2011 at 02:32:38PM +0100, Jan Kiszka wrote:
>>>>> On 2011-02-01 13:47, Marcelo Tosatti wrote:
>>>>>> On Thu, Jan 27, 2011 at 02:09:58PM +0100, Jan Kiszka wrote:
>>>>>>> Found by Stefan Hajnoczi: There is a race in kvm_cpu_exec between
>>>>>>> checking for exit_request on vcpu entry and timer signals arriving
>>>>>>> before KVM starts to catch them. Plug it by blocking both timer related
>>>>>>> signals also on !CONFIG_IOTHREAD and process those via signalfd.
>>>>>>>
>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>>>>> CC: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>>>>>>> ---
>>>>>>>  cpus.c |   18 ++++++++++++++++++
>>>>>>>  1 files changed, 18 insertions(+), 0 deletions(-)
>>>>>>>
>>>>>>> diff --git a/cpus.c b/cpus.c
>>>>>>> index fc3f222..29b1070 100644
>>>>>>> --- a/cpus.c
>>>>>>> +++ b/cpus.c
>>>>>>> @@ -254,6 +254,10 @@ static void qemu_kvm_init_cpu_signals(CPUState *env)
>>>>>>>      pthread_sigmask(SIG_BLOCK, NULL, &set);
>>>>>>>      sigdelset(&set, SIG_IPI);
>>>>>>>      sigdelset(&set, SIGBUS);
>>>>>>> +#ifndef CONFIG_IOTHREAD
>>>>>>> +    sigdelset(&set, SIGIO);
>>>>>>> +    sigdelset(&set, SIGALRM);
>>>>>>> +#endif
>>>>>>
>>>>>> I'd prefer separate qemu_kvm_init_cpu_signals in the !IOTHREAD
>>>>>> section.
>>>>>
>>>>> You mean to duplicate qemu_kvm_init_cpu_signals for both configurations?
>>>>
>>>> Yes, so to avoid #ifdefs spread.
>>>
>>> Would exchange some #ifdefs against ifndef _WIN32. Haven't measured the
>>> delta though.
>>>
>>>>
>>>>>>> +
>>>>>>> +#ifndef CONFIG_IOTHREAD
>>>>>>> +    if (sigismember(&chkset, SIGIO) || sigismember(&chkset, SIGALRM)) {
>>>>>>> +        qemu_notify_event();
>>>>>>> +    }
>>>>>>> +#endif
>>>>>>
>>>>>> Why is this necessary?
>>>>>>
>>>>>> You should break out of cpu_exec_all if there's a pending alarm (see
>>>>>> qemu_alarm_pending()).
>>>>>
>>>>> qemu_alarm_pending() is not true until the signal is actually taken. The
>>>>> alarm handler sets the required flags.
>>>>
>>>> Right. What i mean is you need to execute the signal handler inside
>>>> cpu_exec_all loop (so that alarm pending is set).
>>>>
>>>> So, if there is a SIGALRM pending, qemu_run_timers has highest
>>>> priority, not vcpu execution.
>>>
>>> We leave the vcpu loop (thanks to notify_event), process the signal in
>>> the event loop and run the timer handler. This pattern is IMO less
>>> invasive to the existing code, specifically as it is about to die
>>> long-term anyway.
>>
>> You'll probably see poor timer behaviour on smp guests without iothread
>> enabled.
>>
> 
> Still checking, but that would mean the notification mechanism is broken
> anyway: If IO events do not force us to process them quickly, we already
> suffer from latencies in SMP mode.

Maybe a regression caused by the iothread introduction: we need to break
out of the cpu loop via global exit_request when there is an IO event
pending. Fixing this will also heal the above issue.

Sigh, we need to get rid of those two implementations and focus all
reviewing and testing on one. I bet there are still more corner cases
sleeping somewhere.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from [140.186.70.92] (port=57738 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1PkHMN-0001JW-1Y
	for qemu-devel@nongnu.org; Tue, 01 Feb 2011 09:37:37 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <jan.kiszka@siemens.com>) id 1PkHMK-00007D-PN
	for qemu-devel@nongnu.org; Tue, 01 Feb 2011 09:37:34 -0500
Received: from thoth.sbs.de ([192.35.17.2]:25932)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <jan.kiszka@siemens.com>) id 1PkHMK-00006g-Gs
	for qemu-devel@nongnu.org; Tue, 01 Feb 2011 09:37:32 -0500
Message-ID: <4D481AA6.9010607@siemens.com>
Date: Tue, 01 Feb 2011 15:37:26 +0100
From: Jan Kiszka <jan.kiszka@siemens.com>
MIME-Version: 1.0
References: <cover.1296133797.git.jan.kiszka@siemens.com>
	<c576bef42c459463e4cd18f61f845a32fd347b60.1296133797.git.jan.kiszka@siemens.com>
	<20110201124707.GA12061@amt.cnet> <4D480B76.90509@siemens.com>
	<20110201134821.GA12848@amt.cnet> <4D48116A.9020807@siemens.com>
	<20110201141039.GA14442@amt.cnet> <4D4816F0.5060009@siemens.com>
In-Reply-To: <4D4816F0.5060009@siemens.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: [Qemu-devel] Re: [PATCH 14/22] kvm: Fix race between timer signals
 and vcpu entry under !IOTHREAD
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>, Avi Kivity <avi@redhat.com>, "kvm@vger.kernel.org" <kvm@vger.kernel.org>, "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>

On 2011-02-01 15:21, Jan Kiszka wrote:
> On 2011-02-01 15:10, Marcelo Tosatti wrote:
>> On Tue, Feb 01, 2011 at 02:58:02PM +0100, Jan Kiszka wrote:
>>> On 2011-02-01 14:48, Marcelo Tosatti wrote:
>>>> On Tue, Feb 01, 2011 at 02:32:38PM +0100, Jan Kiszka wrote:
>>>>> On 2011-02-01 13:47, Marcelo Tosatti wrote:
>>>>>> On Thu, Jan 27, 2011 at 02:09:58PM +0100, Jan Kiszka wrote:
>>>>>>> Found by Stefan Hajnoczi: There is a race in kvm_cpu_exec between
>>>>>>> checking for exit_request on vcpu entry and timer signals arriving
>>>>>>> before KVM starts to catch them. Plug it by blocking both timer related
>>>>>>> signals also on !CONFIG_IOTHREAD and process those via signalfd.
>>>>>>>
>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>>>>> CC: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>>>>>>> ---
>>>>>>>  cpus.c |   18 ++++++++++++++++++
>>>>>>>  1 files changed, 18 insertions(+), 0 deletions(-)
>>>>>>>
>>>>>>> diff --git a/cpus.c b/cpus.c
>>>>>>> index fc3f222..29b1070 100644
>>>>>>> --- a/cpus.c
>>>>>>> +++ b/cpus.c
>>>>>>> @@ -254,6 +254,10 @@ static void qemu_kvm_init_cpu_signals(CPUState *env)
>>>>>>>      pthread_sigmask(SIG_BLOCK, NULL, &set);
>>>>>>>      sigdelset(&set, SIG_IPI);
>>>>>>>      sigdelset(&set, SIGBUS);
>>>>>>> +#ifndef CONFIG_IOTHREAD
>>>>>>> +    sigdelset(&set, SIGIO);
>>>>>>> +    sigdelset(&set, SIGALRM);
>>>>>>> +#endif
>>>>>>
>>>>>> I'd prefer separate qemu_kvm_init_cpu_signals in the !IOTHREAD
>>>>>> section.
>>>>>
>>>>> You mean to duplicate qemu_kvm_init_cpu_signals for both configurations?
>>>>
>>>> Yes, so to avoid #ifdefs spread.
>>>
>>> Would exchange some #ifdefs against ifndef _WIN32. Haven't measured the
>>> delta though.
>>>
>>>>
>>>>>>> +
>>>>>>> +#ifndef CONFIG_IOTHREAD
>>>>>>> +    if (sigismember(&chkset, SIGIO) || sigismember(&chkset, SIGALRM)) {
>>>>>>> +        qemu_notify_event();
>>>>>>> +    }
>>>>>>> +#endif
>>>>>>
>>>>>> Why is this necessary?
>>>>>>
>>>>>> You should break out of cpu_exec_all if there's a pending alarm (see
>>>>>> qemu_alarm_pending()).
>>>>>
>>>>> qemu_alarm_pending() is not true until the signal is actually taken. The
>>>>> alarm handler sets the required flags.
>>>>
>>>> Right. What i mean is you need to execute the signal handler inside
>>>> cpu_exec_all loop (so that alarm pending is set).
>>>>
>>>> So, if there is a SIGALRM pending, qemu_run_timers has highest
>>>> priority, not vcpu execution.
>>>
>>> We leave the vcpu loop (thanks to notify_event), process the signal in
>>> the event loop and run the timer handler. This pattern is IMO less
>>> invasive to the existing code, specifically as it is about to die
>>> long-term anyway.
>>
>> You'll probably see poor timer behaviour on smp guests without iothread
>> enabled.
>>
> 
> Still checking, but that would mean the notification mechanism is broken
> anyway: If IO events do not force us to process them quickly, we already
> suffer from latencies in SMP mode.

Maybe a regression caused by the iothread introduction: we need to break
out of the cpu loop via global exit_request when there is an IO event
pending. Fixing this will also heal the above issue.

Sigh, we need to get rid of those two implementations and focus all
reviewing and testing on one. I bet there are still more corner cases
sleeping somewhere.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux