From mboxrd@z Thu Jan  1 00:00:00 1970
From: Gleb Natapov <gleb@redhat.com>
Subject: Re: [PATCH 2/2] x86, apicv: Add Posted Interrupt supporting
Date: Tue, 5 Feb 2013 10:00:35 +0200
Message-ID: <20130205080035.GU23213@redhat.com>
References: <20130131133837.GB23213@redhat.com>
 <20130131134443.GA4419@amt.cnet>
 <20130131135556.GC23213@redhat.com>
 <20130204005700.GA2705@amt.cnet>
 <20130204095553.GK23213@redhat.com>
 <20130204144345.GA11328@amt.cnet>
 <20130204171301.GB10756@redhat.com>
 <20130204195952.GA15856@amt.cnet>
 <20130204204729.GA16442@amt.cnet>
 <A9667DDFB95DB7438FA9D7D576C3D87E0999AA7B@SHSMSX101.ccr.corp.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"Shan, Haitao" <haitao.shan@intel.com>,
	"Zhang, Xiantao" <xiantao.zhang@intel.com>,
	"Nakajima, Jun" <jun.nakajima@intel.com>,
	"Anvin, H Peter" <h.peter.anvin@intel.com>
To: "Zhang, Yang Z" <yang.z.zhang@intel.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mx1.redhat.com ([209.132.183.28]:40150 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1754037Ab3BEIAu (ORCPT <rfc822;kvm@vger.kernel.org>);
	Tue, 5 Feb 2013 03:00:50 -0500
Content-Disposition: inline
In-Reply-To: <A9667DDFB95DB7438FA9D7D576C3D87E0999AA7B@SHSMSX101.ccr.corp.intel.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On Tue, Feb 05, 2013 at 05:57:14AM +0000, Zhang, Yang Z wrote:
> Marcelo Tosatti wrote on 2013-02-05:
> > On Mon, Feb 04, 2013 at 05:59:52PM -0200, Marcelo Tosatti wrote:
> >> On Mon, Feb 04, 2013 at 07:13:01PM +0200, Gleb Natapov wrote:
> >>> On Mon, Feb 04, 2013 at 12:43:45PM -0200, Marcelo Tosatti wrote:
> >>>>>> Any example how software relies on such two-interrupts-queued-in-IRR/ISR behaviour?
> >>>>> Don't know about guests, but KVM relies on it to detect interrupt
> >>>>> coalescing. So if interrupt is set in IRR but not in PIR interrupt will
> >>>>> not be reported as coalesced, but it will be coalesced during PIR->IRR
> >>>>> merge.
> >>>> 
> >>>> Yes, so:
> >>>> 
> >>>> 1. IRR=1, ISR=0, PIR=0. Event: set_irq, coalesced=no.
> >>>> 2. IRR=0, ISR=1, PIR=0. Event: IRR->ISR transfer.
> >>>> 3. vcpu outside of guest mode.
> >>>> 4. IRR=1, ISR=1, PIR=0. Event: set_irq, coalesced=no.
> >>>> 5. vcpu enters guest mode.
> >>>> 6. IRR=1, ISR=1, PIR=1. Event: set_irq, coalesced=no.
> >>>> 7. HW transfers PIR into IRR.
> >>>> 
> >>>> set_irq return value at 7 is incorrect, interrupt event was _not_
> >>>> queued.
> >>> Not sure I understand the flow of events in your description correctly. As I
> >>> understand it at 4 set_irq() will return incorrect result. Basically
> >>> when PIR is set to 1 while IRR has 1 for the vector the value of
> >>> set_irq() will be incorrect.
> >> 
> >> At 4 it has not been coalesced: it has been queued to IRR.
> >> At 6 it has been coalesced: PIR bit merged into IRR bit.
> >> 
> >>> Frankly I do not see how it can be fixed
> >>> without any race with present HW PIR design.
> >> 
> >> At kvm_accept_apic_interrupt, check IRR before setting PIR bit, if IRR
> >> already set, don't set PIR.
> > 
> > Or:
> > 
> > apic_accept_interrupt() {
> > 
> > 1. Read ORIG_PIR=PIR, ORIG_IRR=IRR.
> > Never set IRR when HWAPIC enabled, even if outside of guest mode.
> > 2. Set PIR and let HW or SW VM-entry transfer it to IRR.
> > 3. set_irq return value: (ORIG_PIR or ORIG_IRR set).
> > }
> > 
> > Two or more concurrent set_irq can race with each other, though. Can
> > either document the race or add a lock.
> According to the SDM, software should not touch the IRR while the target vcpu is running. Instead, it should use locked accesses to the PIR. So your solution may be wrong.
Then your apicv patches are broken, because they do exactly that.

> The only problem is step 6, but at that point there is already an interrupt pending in IRR. This means the interrupt will be handled, not lost. And this case exists even on real hardware. So I think it should not be a problem.
> 
This is not the problem we are trying to fix. Sometimes we need to make
sure that each interrupt a device generates results in an interrupt handler
invocation in a guest. If an interrupt is coalesced (meaning it will not
correspond to a separate invocation of the guest interrupt handler) it needs
to be re-injected. With PIR, detection of this condition is broken.
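
To make the failure concrete, here is a toy single-threaded model of one
vector replaying steps 1-7 quoted above. It is a sketch only, not KVM code:
the names VectorState, set_irq, merge_pir and replay are made up for
illustration, and the check_irr flag is a simplified stand-in for the
"return ORIG_PIR or ORIG_IRR" idea (it does not model the "never set IRR"
part of that proposal, nor the race between concurrent set_irq callers).

```python
class VectorState:
    """State bits for a single interrupt vector (toy model, not KVM code)."""

    def __init__(self):
        self.irr = False  # request pending in the virtual APIC
        self.isr = False  # handler currently in service
        self.pir = False  # posted-interrupt request bit

    def set_irq(self, in_guest_mode, check_irr):
        """Queue one interrupt; return True if it is reported as coalesced."""
        if in_guest_mode:
            # Posted path: only PIR is written. Without check_irr the
            # report ignores a request that is already pending in IRR.
            coalesced = self.pir or (check_irr and self.irr)
            self.pir = True
        else:
            # Software path: IRR is written directly.
            coalesced = self.irr
            self.irr = True
        return coalesced

    def irr_to_isr(self):
        """The guest starts handling the pending interrupt."""
        self.irr, self.isr = False, True

    def merge_pir(self):
        """PIR -> IRR merge; return True if it silently coalesced."""
        lost = self.pir and self.irr
        self.irr = self.irr or self.pir
        self.pir = False
        return lost


def replay(check_irr):
    """Replay steps 1-7 and collect what each set_irq reported."""
    v = VectorState()
    reports = [v.set_irq(False, check_irr)]      # step 1
    v.irr_to_isr()                               # step 2
    reports.append(v.set_irq(False, check_irr))  # steps 3-4
    reports.append(v.set_irq(True, check_irr))   # steps 5-6
    lost = v.merge_pir()                         # step 7
    return reports, lost
```

With check_irr=False all three set_irq calls report "not coalesced", yet
the merge at step 7 collapses the third into the one already pending in
IRR, so the guest sees only two handler invocations and nothing gets
re-injected. With check_irr=True the third call is reported as coalesced,
which is what the caller needs in order to re-inject.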

--
			Gleb.