From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:46418)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1bIR4Y-0003aT-2w for qemu-devel@nongnu.org;
 Wed, 29 Jun 2016 21:47:20 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from ) id 1bIR4S-0005GD-Lo for qemu-devel@nongnu.org;
 Wed, 29 Jun 2016 21:47:15 -0400
Received: from [59.151.112.132] (port=20144 helo=heian.cn.fujitsu.com)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1bIR4P-0005Ef-Hm for qemu-devel@nongnu.org;
 Wed, 29 Jun 2016 21:47:12 -0400
References: <1464315131-25834-1-git-send-email-zhoujie2011@cn.fujitsu.com>
 <30d1cd95-7f67-29cf-c55e-0565364d89ff@cn.fujitsu.com>
 <41b0c187-ade0-182e-46b5-afd3e99f1e36@cn.fujitsu.com>
 <20160620103226.0ff61b21@ul30vt.home>
 <20160620211306.66a6b249@t450s.home>
 <576935FC.1080503@easystack.cn>
 <20160621084443.330f932d@t450s.home>
 <20160621215626.71c99582@t450s.home>
 <113474d2-8408-db49-e7ef-8c6b736af866@cn.fujitsu.com>
 <468b752b-a161-902b-d4cc-489dfa18c21e@cn.fujitsu.com>
 <20160622094236.515549fa@t450s.home>
 <7746532f-2fad-1304-0df7-7cd25ba761af@cn.fujitsu.com>
 <20160627095418.659e6e5f@t450s.home>
 <20160627215808.1531a774@t450s.home>
 <7912dad0-0e37-603d-fdfe-bb4950b55f28@cn.fujitsu.com>
 <20160628084052.1e85a730@t450s.home>
 <689ac38f-96d7-9717-e9c4-d9926272cb86@cn.fujitsu.com>
 <20160629122242.2ac20254@t450s.home>
From: Zhou Jie
Date: Thu, 30 Jun 2016 09:45:53 +0800
MIME-Version: 1.0
In-Reply-To: <20160629122242.2ac20254@t450s.home>
Content-Type: text/plain; charset="windows-1252"; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v8 11/12] vfio: register aer resume
 notification handler for aer resume
To: Alex Williamson
Cc: izumi.taku@jp.fujitsu.com, caoj.fnst@cn.fujitsu.com, Chen Fan,
 qemu-devel@nongnu.org, mst@redhat.com

Hi Alex,

On 2016/6/30 2:22,
Alex Williamson wrote:
> On Wed, 29 Jun 2016 16:54:05 +0800
> Zhou Jie wrote:
>
>> Hi Alex,
>>
>>> And yet we have struct pci_dev.broken_intx_masking and we test for
>>> working DisINTx via pci_intx_mask_supported() rather than simply
>>> looking for a PCIe device. Some devices are broken and some simply
>>> don't follow the spec, so you're going to need to deal with that or
>>> exclude those devices.
>> For those devices I have no way to disable the INTx.
>
> disable_irq()? Clearly vfio-pci already manages these types of devices
> and can disable INTx. This is why I keep suggesting that maybe tearing
> the interrupt setup down completely is a more complete and reliable
> approach than masking in the command register. Unless we're going to
> exclude such devices from supporting this mode (which I don't condone),
> we must deal with them.

Thank you for telling me that.
Yes, I can use disable_irq() to disable the PCI device IRQ.
But should I enable the IRQ again after the reset?
I will dig into it.

Sincerely
Zhou Jie

>>> How does that happen, aren't we notifying the user at the point the
>>> error occurs, while the device is still in the process of being reset?
>>> My question is how does the user know that the host reset is complete
>>> in order to begin their own re-initialization?
>> I will add a state in "struct vfio_pci_device".
>> The state is set when the device cannot work, such as when an AER
>> error has occurred.
>> And the state is cleared when the device can work again, such as when
>> resume is received.
>> The state is returned when the user gets info via vfio_pci_ioctl.
>>
>>>> The interrupt status will be cleared by hardware.
>>>> So the hardware is the same as the state when the
>>>> vfio device fd is opened.
>>>
>>> The PCI-core in Linux will save and restore the device state around
>>> reset, how do we know that vfio-pci itself is not racing that reset and
>>> whether PCI-core will restore the state including our interrupt masking
>>> or a state without it?
>>> Do we need to restore the state to the one we
>>> saved when we originally opened the device? Shouldn't that mean we
>>> tear down the interrupt setup the user had prior to the error event?
>> Regarding what you said above:
>> Maybe disabling the interrupt is not a good idea.
>> Think about what will happen in the interrupt handler.
>> It may read/write configuration space and the region BARs.
>> I will make the configuration space read-only.
>> Do nothing for the region BARs which are used by the user.
>
> I'm thinking that vfio-pci will be attempting to mask the interrupts
> via the PCI command register, which is potentially in a state of flux
> due to the host reset and yet we're somehow expecting that our write to
> the command register sticks. We certainly have the ability to a)
> discard interrupts received between AER error and resume, and b) if we
> want to be consistent with requiring the user to reinitialize the
> device, then the user interrupt setup should likely be torn down.
> Thanks,
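
For illustration only, the two options discussed above (a: discard
interrupts received between the AER error and resume, b: tear down the
user's interrupt setup on error) can be modelled in plain user-space C.
This is a minimal sketch under stated assumptions, not vfio-pci code:
struct model_dev and every model_* function are hypothetical names
invented for this example, and no real kernel API is called.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; these names do not exist in vfio-pci. */
struct model_dev {
    bool aer_in_progress;   /* set on AER error, cleared on resume */
    bool user_irq_setup;    /* true while the user's interrupt setup is live */
    unsigned int delivered; /* interrupts forwarded to the user */
    unsigned int discarded; /* interrupts dropped by the model */
};

/* Device fd opened: no error pending, no interrupt setup yet. */
static void model_open(struct model_dev *d)
{
    assert(d != NULL);
    d->aer_in_progress = false;
    d->user_irq_setup = false;
    d->delivered = 0;
    d->discarded = 0;
}

/* User configures interrupts; refused while recovery is in progress. */
static bool model_set_irqs(struct model_dev *d)
{
    assert(d != NULL);
    if (d->aer_in_progress)
        return false;
    d->user_irq_setup = true;
    return true;
}

/* AER error: record the state and tear down the user setup (option b). */
static void model_aer_error(struct model_dev *d)
{
    assert(d != NULL);
    d->aer_in_progress = true;
    d->user_irq_setup = false;
}

/* AER resume: the host reset is complete; the user must reinitialize. */
static void model_aer_resume(struct model_dev *d)
{
    assert(d != NULL);
    d->aer_in_progress = false;
}

/* State the user could query to learn whether recovery has finished. */
static bool model_get_aer_state(const struct model_dev *d)
{
    assert(d != NULL);
    return d->aer_in_progress;
}

/* Interrupt arrives: forward it only when a setup exists and no error
 * is pending; otherwise discard it (option a). */
static bool model_interrupt(struct model_dev *d)
{
    assert(d != NULL);
    if (d->aer_in_progress || !d->user_irq_setup) {
        d->discarded++;
        return false;
    }
    d->delivered++;
    return true;
}
```

In this model the user polls model_get_aer_state() until it returns
false and then calls model_set_irqs() again, which mirrors the question
of how the user learns that the host reset is complete.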