From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:33722)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <alex.williamson@redhat.com>) id 1bHuCD-00070W-Bf
	for qemu-devel@nongnu.org; Tue, 28 Jun 2016 10:41:04 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <alex.williamson@redhat.com>) id 1bHuC6-0002kd-6b
	for qemu-devel@nongnu.org; Tue, 28 Jun 2016 10:41:00 -0400
Received: from mx1.redhat.com ([209.132.183.28]:37874)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <alex.williamson@redhat.com>) id 1bHuC5-0002jr-Ue
	for qemu-devel@nongnu.org; Tue, 28 Jun 2016 10:40:54 -0400
Date: Tue, 28 Jun 2016 08:40:52 -0600
From: Alex Williamson <alex.williamson@redhat.com>
Message-ID: <20160628084052.1e85a730@t450s.home>
In-Reply-To: <7912dad0-0e37-603d-fdfe-bb4950b55f28@cn.fujitsu.com>
References: <1464315131-25834-1-git-send-email-zhoujie2011@cn.fujitsu.com>
	<20160527100655.60db8206@t450s.home>
	<30d1cd95-7f67-29cf-c55e-0565364d89ff@cn.fujitsu.com>
	<41b0c187-ade0-182e-46b5-afd3e99f1e36@cn.fujitsu.com>
	<20160620103226.0ff61b21@ul30vt.home>
	<c12c77e8-e664-9b09-5380-7dd9e09ec4e2@cn.fujitsu.com>
	<20160620211306.66a6b249@t450s.home>
	<576935FC.1080503@easystack.cn>
	<20160621084443.330f932d@t450s.home>
	<be32e794-4ad7-a7b6-dbe2-e14d2c181c0b@cn.fujitsu.com>
	<20160621215626.71c99582@t450s.home>
	<113474d2-8408-db49-e7ef-8c6b736af866@cn.fujitsu.com>
	<468b752b-a161-902b-d4cc-489dfa18c21e@cn.fujitsu.com>
	<20160622094236.515549fa@t450s.home>
	<7746532f-2fad-1304-0df7-7cd25ba761af@cn.fujitsu.com>
	<20160627095418.659e6e5f@t450s.home>
	<d52923dc-afb9-9dc1-ac0e-bae645141843@cn.fujitsu.com>
	<20160627215808.1531a774@t450s.home>
	<7912dad0-0e37-603d-fdfe-bb4950b55f28@cn.fujitsu.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v8 11/12] vfio: register aer resume
 notification handler for aer resume
List-Id: <qemu-devel.nongnu.org>
To: Zhou Jie <zhoujie2011@cn.fujitsu.com>
Cc: izumi.taku@jp.fujitsu.com, caoj.fnst@cn.fujitsu.com, Chen Fan <fan.chen@easystack.cn>, qemu-devel@nongnu.org, mst@redhat.com

On Tue, 28 Jun 2016 13:27:21 +0800
Zhou Jie <zhoujie2011@cn.fujitsu.com> wrote:

> Hi Alex,
> 
> On 2016/6/28 11:58, Alex Williamson wrote:
> > On Tue, 28 Jun 2016 11:26:33 +0800
> > Zhou Jie <zhoujie2011@cn.fujitsu.com> wrote:
> >  
> >> Hi Alex,
> >>  
> >>> The INTx/MSI part needs further definition for the user.  Are we
> >>> actually completely tearing down interrupts with the expectation that
> >>> the user will re-enable them or are we just masking them such that the
> >>> user needs to unmask?  Also note that not all devices support DisINTx.  
> >>
> >> After reset, the "Bus Master Enable" bit of the "Command Register"
> >> should be cleared, so MSI/MSI-X interrupt messages are still disabled.
> >> After reset, the "Interrupt Disable" bit of the "Command Register"
> >> should be cleared, so INTx interrupts are enabled.
> >> If the device doesn't support INTx, the "Interrupt Disable" bit will
> >> be hardwired to 0, which is OK here.
> >>
> >> After a fatal error occurs, the user should reset and
> >> reinitialize the device.
> >> So I disable interrupts before the host resets the device,
> >> and let the user do the reinitialization.
> >
> > I'm dubious here.  When DisINTx is not supported by the device or it's
> > marked broken in host quirks, then we can't trust the device to stop
> > sending INTx.  It's hardwired to zero, meaning that it doesn't work or
> > it's been found to be broken in other ways.  So COMMAND register
> > masking is not sufficient for all devices.  
> For Endpoints that generate INTx interrupts, this bit is required.
> For Endpoints that do not generate INTx interrupts this bit is
> optional.  If not implemented, this bit must be hardwired to 0b.
> For Root Ports, Switch Ports, and Bridges that generate INTx
> interrupts on their own behalf, this bit is required.
> 
> The above is from section 7.5.1.1 of the "PCI Express Base Specification 3.1a".
> So I think the "Interrupt Disable" bit must be supported by any device
> that can generate INTx interrupts.

And yet we have struct pci_dev.broken_intx_masking and we test for
working DisINTx via pci_intx_mask_supported() rather than simply
looking for a PCIe device.  Some devices are broken and some simply
don't follow the spec, so you're going to need to deal with that or
exclude those devices.
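For reference, a rough userspace model of the kind of check pci_intx_mask_supported() does: honor a known-broken quirk first, then flip the Interrupt Disable bit (0x400, PCI_COMMAND_INTX_DISABLE in linux/pci_regs.h), read it back, and restore the original value. struct fake_dev and its accessors are hypothetical stand-ins for real config space access; this is a sketch, not the kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Command register bits, values as in linux/pci_regs.h. */
#define PCI_COMMAND_MASTER       0x0004 /* Bus Master Enable */
#define PCI_COMMAND_INTX_DISABLE 0x0400 /* Interrupt Disable (DisINTx) */

/* Hypothetical model of a device's Command register (not a kernel type). */
struct fake_dev {
	uint16_t command;
	bool hardwired_intx_disable; /* DisINTx not implemented: reads as 0 */
	bool broken_intx_masking;    /* quirk: bit latches but does not mask */
};

static uint16_t cfg_read_command(const struct fake_dev *dev)
{
	return dev->command;
}

static void cfg_write_command(struct fake_dev *dev, uint16_t val)
{
	/* A bit hardwired to zero silently ignores writes. */
	if (dev->hardwired_intx_disable)
		val &= (uint16_t)~PCI_COMMAND_INTX_DISABLE;
	dev->command = val;
}

/*
 * Probe in the style of pci_intx_mask_supported(): a known-broken quirk
 * wins; otherwise toggle DisINTx and check whether the change sticks.
 */
static bool intx_mask_supported(struct fake_dev *dev)
{
	uint16_t orig, readback;

	if (dev->broken_intx_masking)
		return false;

	orig = cfg_read_command(dev);
	cfg_write_command(dev, orig ^ PCI_COMMAND_INTX_DISABLE);
	readback = cfg_read_command(dev);

	/* Always put the register back the way we found it. */
	cfg_write_command(dev, orig);

	return (readback ^ orig) & PCI_COMMAND_INTX_DISABLE;
}
```

The quirk check has to come first: a device on the broken list may latch the bit on read-back and still assert INTx, so neither a read-back test nor the spec text alone proves masking works.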
 
> > Also, any time we start
> > changing the state of the device from what the user expects, we risk
> > consistency problems.  We need to consider how the user last saw the
> > device and whether we can legitimately expect them to handle the device
> > in a new state.  If we expect the user to re-initialize the device then
> > would it be more correct to tear down all interrupt signaling such that
> > the device is effectively in the same state as initial handoff when the
> > vfio device fd is opened?  
> Before the user re-initializes the device, the host has reset the device.

How does that happen, aren't we notifying the user at the point the
error occurs, while the device is still in the process of being reset?
My question is how does the user know that the host reset is complete
in order to begin their own re-initialization?
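If the answer turns out to be that the blocked paths simply wait, one way to model it is a flag plus a condition variable: the host error path marks the device as resetting, and user-facing paths sleep until the flag is cleared. This is a userspace pthreads sketch; struct aer_state, host_reset_begin(), host_reset_done() and wait_for_host_reset() are hypothetical names, and in-kernel code would use kernel primitives such as a wait queue or completion instead.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical per-device AER state (not an existing vfio-pci structure).
 * The host error path sets `resetting` before it resets the device and
 * clears it once the reset and PCI-core state restore are complete.
 */
struct aer_state {
	pthread_mutex_t lock;
	pthread_cond_t  done;
	bool            resetting;
};

static void aer_state_init(struct aer_state *s)
{
	pthread_mutex_init(&s->lock, NULL);
	pthread_cond_init(&s->done, NULL);
	s->resetting = false;
}

/* Host side: called when a fatal error is reported, before reset. */
static void host_reset_begin(struct aer_state *s)
{
	pthread_mutex_lock(&s->lock);
	s->resetting = true;
	pthread_mutex_unlock(&s->lock);
}

/* Host side: called after the reset has finished; wakes all waiters. */
static void host_reset_done(struct aer_state *s)
{
	pthread_mutex_lock(&s->lock);
	s->resetting = false;
	pthread_cond_broadcast(&s->done);
	pthread_mutex_unlock(&s->lock);
}

/*
 * User-facing path (e.g. the reset ioctl or a config write): block until
 * the host reset is complete rather than failing with -EAGAIN, so the
 * user never observes the device mid-reset.
 */
static void wait_for_host_reset(struct aer_state *s)
{
	pthread_mutex_lock(&s->lock);
	while (s->resetting) /* loop guards against spurious wakeups */
		pthread_cond_wait(&s->done, &s->lock);
	pthread_mutex_unlock(&s->lock);
}
```

Blocking only answers "when"; the user still needs some separate way to learn that this behavior exists at all.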

> The interrupt status will be cleared by hardware,
> so the hardware is in the same state as when the
> vfio device fd was opened.

The PCI-core in Linux will save and restore the device state around
reset, how do we know that vfio-pci itself is not racing that reset and
whether PCI-core will restore the state including our interrupt masking
or a state without it?  Do we need to restore the state to the one we
saved when we originally opened the device?  Shouldn't that mean we
tear down the interrupt setup the user had prior to the error event?
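To make the "back to initial handoff" option concrete: snapshot config space when the fd is opened, and after the host reset tear down the user's interrupt configuration and restore that snapshot, rather than relying on whatever PCI-core restored. struct vdev, enum irq_mode and both functions are hypothetical simplifications; a real device would also need BAR, MSI-X table and capability handling. The sketch only illustrates the ordering.

```c
#include <stdint.h>
#include <string.h>

#define CFG_SPACE_SIZE 256 /* conventional PCI config space, in bytes */

/* Which interrupt mechanism the user had configured (hypothetical). */
enum irq_mode { IRQ_NONE, IRQ_INTX, IRQ_MSI, IRQ_MSIX };

/* Hypothetical model of a vfio-pci device as seen by the host driver. */
struct vdev {
	uint8_t config[CFG_SPACE_SIZE];        /* live config space */
	uint8_t open_snapshot[CFG_SPACE_SIZE]; /* saved when the fd was opened */
	enum irq_mode irq;                     /* user's current interrupt setup */
};

/* On open: remember the state the user was first handed. */
static void vdev_open(struct vdev *v)
{
	memcpy(v->open_snapshot, v->config, CFG_SPACE_SIZE);
	v->irq = IRQ_NONE;
}

/*
 * After the host reset: drop the user's interrupt setup and put config
 * space back to the open-time snapshot, so the user re-initializes from
 * the same baseline it saw at open.
 */
static void vdev_restore_after_reset(struct vdev *v)
{
	v->irq = IRQ_NONE; /* user must re-enable INTx/MSI/MSI-X itself */
	memcpy(v->config, v->open_snapshot, CFG_SPACE_SIZE);
}
```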
 
> > How will the user know when the device is
> > ready to be reset?  Which of the ioctls that you're blocking can they
> > poll w/o any unwanted side-effects or awkward interactions?  Should
> > flag bits in the device info ioctl indicate not only support for this
> > behavior but also the current status?  Thanks,  
> I can block the reset ioctl and config writes.
> I will not add a flag for the device's current status,
> because I don't rely on the user to prevent awkward interactions.

Ok, so that's a reason to block rather than return -EAGAIN.  Still we
need some way to indicate to the user whether the device supports this
new interaction rather than the existing behavior.  Thanks,

Alex
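On the user side, advertising the new interaction could be a capability bit in the flags returned by VFIO_DEVICE_GET_INFO, which QEMU checks before opting into the new flow and otherwise falls back to the existing behavior. VFIO_DEVICE_FLAGS_RESET and VFIO_DEVICE_FLAGS_PCI mirror linux/vfio.h; HYPOTHETICAL_FLAGS_AER_RESUME, its bit position, and enum aer_strategy are placeholders, not an existing interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Existing flag values, as defined in linux/vfio.h. */
#define VFIO_DEVICE_FLAGS_RESET (1U << 0)
#define VFIO_DEVICE_FLAGS_PCI   (1U << 1)

/*
 * HYPOTHETICAL capability bit advertising the new AER resume behavior.
 * No such flag exists; the name and bit position are placeholders.
 */
#define HYPOTHETICAL_FLAGS_AER_RESUME (1U << 31)

enum aer_strategy {
	AER_LEGACY,     /* fall back to the existing behavior */
	AER_GUEST_RESET /* new flow: wait for host reset, guest re-initializes */
};

/* Pick a strategy from the flags field of struct vfio_device_info. */
static enum aer_strategy choose_aer_strategy(uint32_t flags)
{
	if ((flags & VFIO_DEVICE_FLAGS_PCI) &&
	    (flags & HYPOTHETICAL_FLAGS_AER_RESUME))
		return AER_GUEST_RESET;
	return AER_LEGACY;
}
```

A static capability bit like this says only whether the behavior is supported; it deliberately carries no per-event status.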