From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751050AbdAXMyP (ORCPT ); Tue, 24 Jan 2017 07:54:15 -0500 Received: from cn.fujitsu.com ([59.151.112.132]:18593 "EHLO heian.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1750987AbdAXMyM (ORCPT ); Tue, 24 Jan 2017 07:54:12 -0500 X-IronPort-AV: E=Sophos;i="5.22,518,1449504000"; d="scan'208";a="15058569" Subject: Re: [PATCH RFC] vfio error recovery: kernel support To: "Michael S. Tsirkin" , References: <20170119001744-mutt-send-email-mst@kernel.org> <5881E2C2.2060502@cn.fujitsu.com> <20170120185753-mutt-send-email-mst@kernel.org> CC: , , , From: Cao jin Message-ID: <58874FD7.9080605@cn.fujitsu.com> Date: Tue, 24 Jan 2017 21:00:07 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0 MIME-Version: 1.0 In-Reply-To: <20170120185753-mutt-send-email-mst@kernel.org> Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.167.226.69] X-yoursite-MailScanner-ID: 71610477AE75.A69C5 X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: caoj.fnst@cn.fujitsu.com Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 01/21/2017 01:01 AM, Michael S. Tsirkin wrote: > On Fri, Jan 20, 2017 at 06:13:22PM +0800, Cao jin wrote: >> >> >> On 01/20/2017 04:16 AM, Michael S. Tsirkin wrote: >>> This is a design and an initial patch for kernel side for AER >>> support in VFIO. >>> >>> 0. What happens now (PCIE AER only) >>> Fatal errors cause a link reset. >>> Non fatal errors don't. >>> All errors stop the VM eventually, but not immediately >>> because it's detected and reported asynchronously. >>> Interrupts are forwarded as usual. >>> Correctable errors are not reported to guest at all. >>> Note: PPC EEH is different. This focuses on AER. >>> >>> 1. Correctable errors >>> I don't see a need to report these to guest. So let's not. >>> >>> 2. Fatal errors >>> It's not easy to handle them gracefully since link reset >>> is needed. As a first step, let's use the existing mechanism >>> in that case. >>> >>> 2. Non-fatal errors >>> Here we could make progress by reporting them to guest >>> and have guest handle them. >>> Issues: >>> a. this behaviour should only be enabled with new userspace >>> old userspace should work without changes >>> Suggestion: One way to address this would be to add a new eventfd >>> non_fatal_err_trigger. If not set, invoke err_trigger. >>> >>> b. drivers are supposed to stop MMIO when error is reported >>> if vm keeps going, we will keep doing MMIO/config >>> Suggestion 1: ignore this. vm stop happens much later when userspace runs anyway, >>> so we are not making things much worse >>> Suggestion 2: try to stop MMIO/config, resume on resume call >>> >>> Patch below implements Suggestion 1. >>> >>> c. PF driver might detect that function is completely broken, >>> if vm keeps going, we will keep doing MMIO/config >>> Suggestion 1: ignore this. vm stop happens much later when userspace runs anyway, >>> so we are not making things much worse >>> Suggestion 2: detect this and invoke err_trigger to stop VM >>> >>> Patch below implements Suggestion 2. >>> >>> Aside: we currently return PCI_ERS_RESULT_DISCONNECT when device >>> is not attached. This seems bogus, likely based on the confusing name. >>> We probably should return PCI_ERS_RESULT_CAN_RECOVER. >>> >>> The following patch does not change that. >>> >>> Signed-off-by: Michael S. Tsirkin >>> >>> --- >>> >>> The patch is completely untested. Let's discuss the design first. >>> Cao jin, if this is deemed acceptable please take it from here. >>> >> >> Ok, thanks very much. >> >>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c >>> index dce511f..fdca683 100644 >>> --- a/drivers/vfio/pci/vfio_pci.c >>> +++ b/drivers/vfio/pci/vfio_pci.c >>> @@ -1292,7 +1292,9 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev, >>> >>> mutex_lock(&vdev->igate); >>> >>> - if (vdev->err_trigger) >>> + if (state == pci_channel_io_normal && vdev->non_fatal_err_trigger) >>> + eventfd_signal(vdev->err_trigger, 1); >>> + else if (vdev->err_trigger) >>> eventfd_signal(vdev->err_trigger, 1); >>> >>> mutex_unlock(&vdev->igate); >>> @@ -1302,8 +1304,38 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev, >>> return PCI_ERS_RESULT_CAN_RECOVER; >>> } >>> >>> +static pci_ers_result_t vfio_pci_aer_slot_reset(struct pci_dev *pdev, >>> + pci_channel_state_t state) >>> +{ >>> + struct vfio_pci_device *vdev; >>> + struct vfio_device *device; >>> + >>> + device = vfio_device_get_from_dev(&pdev->dev); >>> + if (!device) >>> + goto err_dev; >>> + >>> + vdev = vfio_device_data(device); >>> + if (!vdev) >>> + goto err_dev; >>> + >>> + mutex_lock(&vdev->igate); >>> + >>> + if (vdev->err_trigger) >>> + eventfd_signal(vdev->err_trigger, 1); >>> + >>> + mutex_unlock(&vdev->igate); >>> + >>> + vfio_device_put(device); >>> + >>> +err_data: >>> + vfio_device_put(device); >>> +err_dev: >>> + return PCI_ERS_RESULT_RECOVERED; >>> +} >>> + >>> static const struct pci_error_handlers vfio_err_handlers = { >>> .error_detected = vfio_pci_aer_err_detected, >>> + .slot_reset = vfio_pci_aer_slot_reset, >>> }; >>> >> >> if .slot_reset wants to be called, .error_detected should return >> PCI_ERS_RESULT_NEED_RESET, as pci-error-recovery.txt said, so does code. >> >> Is .slot_reset now just a copy of .error_detected and we are going do >> some tricks here? or else don't get why .slot_reset signal user again. > > > No. We do not want a slot reset. But it might be triggered by > another (PF) driver on the slot. If that happens some driver did something > to make devices in the slot go to reset and our driver must recover. > We can't recover however, so let's stop the VM. > > Basically the design is simple > - if you can keep going - do > - if you can't - ask qemu to stop guest > > Don't worry about adding more conditions to trigger reset etc > at this point. Do that with later patches on-top. > It take me for a long while to find what I missed. I missed much details in pci-error-recovery.txt, and I was not conscious of a multi-function device which could have different functions(driver) on it. Thank you both. -- Sincerely, Cao jin >>> static struct pci_driver vfio_pci_driver = { >>> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c >>> index 1c46045..e883db5 100644 >>> --- a/drivers/vfio/pci/vfio_pci_intrs.c >>> +++ b/drivers/vfio/pci/vfio_pci_intrs.c >>> @@ -611,6 +611,17 @@ static int vfio_pci_set_err_trigger(struct vfio_pci_device *vdev, >>> count, flags, data); >>> } >>> >>> +static int vfio_pci_set_non_fatal_err_trigger(struct vfio_pci_device *vdev, >>> + unsigned index, unsigned start, >>> + unsigned count, uint32_t flags, void *data) >>> +{ >>> + if (index != VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX || start != 0 || count > 1) >>> + return -EINVAL; >>> + >>> + return vfio_pci_set_ctx_trigger_single(&vdev->non_fatal_err_trigger, >>> + count, flags, data); >>> +} >>> + >>> static int vfio_pci_set_req_trigger(struct vfio_pci_device *vdev, >>> unsigned index, unsigned start, >>> unsigned count, uint32_t flags, void *data) >>> @@ -664,6 +675,14 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev, uint32_t flags, >>> break; >>> } >>> break; >>> + case VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX: >>> + switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) { >>> + case VFIO_IRQ_SET_ACTION_TRIGGER: >>> + if (pci_is_pcie(vdev->pdev)) >>> + func = vfio_pci_set_err_trigger; >> >> s/vfio_pci_set_err_trigger/vfio_pci_set_non_fatal_err_trigger > > Right. > >>> + break; >>> + } >>> + break; >>> case VFIO_PCI_REQ_IRQ_INDEX: >>> switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) { >>> case VFIO_IRQ_SET_ACTION_TRIGGER: >>> diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h >>> index f37c73b..c27a507 100644 >>> --- a/drivers/vfio/pci/vfio_pci_private.h >>> +++ b/drivers/vfio/pci/vfio_pci_private.h >>> @@ -93,6 +93,7 @@ struct vfio_pci_device { >>> struct pci_saved_state *pci_saved_state; >>> int refcnt; >>> struct eventfd_ctx *err_trigger; >>> + struct eventfd_ctx *non_fatal_err_trigger; >>> struct eventfd_ctx *req_trigger; >>> struct list_head dummy_resources_list; >>> }; >>> >> >> -- >> Sincerely, >> Cao jin >> > > > . > From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:60049) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cW0c5-0002vE-8I for qemu-devel@nongnu.org; Tue, 24 Jan 2017 07:54:18 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1cW0c1-00070A-At for qemu-devel@nongnu.org; Tue, 24 Jan 2017 07:54:17 -0500 Received: from [59.151.112.132] (port=26300 helo=heian.cn.fujitsu.com) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cW0c0-0006y0-Bl for qemu-devel@nongnu.org; Tue, 24 Jan 2017 07:54:13 -0500 References: <20170119001744-mutt-send-email-mst@kernel.org> <5881E2C2.2060502@cn.fujitsu.com> <20170120185753-mutt-send-email-mst@kernel.org> From: Cao jin Message-ID: <58874FD7.9080605@cn.fujitsu.com> Date: Tue, 24 Jan 2017 21:00:07 +0800 MIME-Version: 1.0 In-Reply-To: <20170120185753-mutt-send-email-mst@kernel.org> Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH RFC] vfio error recovery: kernel support List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: "Michael S. Tsirkin" , alex.williamson@redhat.com Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, qemu-devel@nongnu.org, izumi.taku@jp.fujitsu.com On 01/21/2017 01:01 AM, Michael S. Tsirkin wrote: > On Fri, Jan 20, 2017 at 06:13:22PM +0800, Cao jin wrote: >> >> >> On 01/20/2017 04:16 AM, Michael S. Tsirkin wrote: >>> This is a design and an initial patch for kernel side for AER >>> support in VFIO. >>> >>> 0. What happens now (PCIE AER only) >>> Fatal errors cause a link reset. >>> Non fatal errors don't. >>> All errors stop the VM eventually, but not immediately >>> because it's detected and reported asynchronously. >>> Interrupts are forwarded as usual. >>> Correctable errors are not reported to guest at all. >>> Note: PPC EEH is different. This focuses on AER. >>> >>> 1. Correctable errors >>> I don't see a need to report these to guest. So let's not. >>> >>> 2. Fatal errors >>> It's not easy to handle them gracefully since link reset >>> is needed. As a first step, let's use the existing mechanism >>> in that case. >>> >>> 2. Non-fatal errors >>> Here we could make progress by reporting them to guest >>> and have guest handle them. >>> Issues: >>> a. this behaviour should only be enabled with new userspace >>> old userspace should work without changes >>> Suggestion: One way to address this would be to add a new eventfd >>> non_fatal_err_trigger. If not set, invoke err_trigger. >>> >>> b. drivers are supposed to stop MMIO when error is reported >>> if vm keeps going, we will keep doing MMIO/config >>> Suggestion 1: ignore this. vm stop happens much later when userspace runs anyway, >>> so we are not making things much worse >>> Suggestion 2: try to stop MMIO/config, resume on resume call >>> >>> Patch below implements Suggestion 1. >>> >>> c. PF driver might detect that function is completely broken, >>> if vm keeps going, we will keep doing MMIO/config >>> Suggestion 1: ignore this. vm stop happens much later when userspace runs anyway, >>> so we are not making things much worse >>> Suggestion 2: detect this and invoke err_trigger to stop VM >>> >>> Patch below implements Suggestion 2. >>> >>> Aside: we currently return PCI_ERS_RESULT_DISCONNECT when device >>> is not attached. This seems bogus, likely based on the confusing name. >>> We probably should return PCI_ERS_RESULT_CAN_RECOVER. >>> >>> The following patch does not change that. >>> >>> Signed-off-by: Michael S. Tsirkin >>> >>> --- >>> >>> The patch is completely untested. Let's discuss the design first. >>> Cao jin, if this is deemed acceptable please take it from here. >>> >> >> Ok, thanks very much. >> >>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c >>> index dce511f..fdca683 100644 >>> --- a/drivers/vfio/pci/vfio_pci.c >>> +++ b/drivers/vfio/pci/vfio_pci.c >>> @@ -1292,7 +1292,9 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev, >>> >>> mutex_lock(&vdev->igate); >>> >>> - if (vdev->err_trigger) >>> + if (state == pci_channel_io_normal && vdev->non_fatal_err_trigger) >>> + eventfd_signal(vdev->err_trigger, 1); >>> + else if (vdev->err_trigger) >>> eventfd_signal(vdev->err_trigger, 1); >>> >>> mutex_unlock(&vdev->igate); >>> @@ -1302,8 +1304,38 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev, >>> return PCI_ERS_RESULT_CAN_RECOVER; >>> } >>> >>> +static pci_ers_result_t vfio_pci_aer_slot_reset(struct pci_dev *pdev, >>> + pci_channel_state_t state) >>> +{ >>> + struct vfio_pci_device *vdev; >>> + struct vfio_device *device; >>> + >>> + device = vfio_device_get_from_dev(&pdev->dev); >>> + if (!device) >>> + goto err_dev; >>> + >>> + vdev = vfio_device_data(device); >>> + if (!vdev) >>> + goto err_dev; >>> + >>> + mutex_lock(&vdev->igate); >>> + >>> + if (vdev->err_trigger) >>> + eventfd_signal(vdev->err_trigger, 1); >>> + >>> + mutex_unlock(&vdev->igate); >>> + >>> + vfio_device_put(device); >>> + >>> +err_data: >>> + vfio_device_put(device); >>> +err_dev: >>> + return PCI_ERS_RESULT_RECOVERED; >>> +} >>> + >>> static const struct pci_error_handlers vfio_err_handlers = { >>> .error_detected = vfio_pci_aer_err_detected, >>> + .slot_reset = vfio_pci_aer_slot_reset, >>> }; >>> >> >> if .slot_reset wants to be called, .error_detected should return >> PCI_ERS_RESULT_NEED_RESET, as pci-error-recovery.txt said, so does code. >> >> Is .slot_reset now just a copy of .error_detected and we are going do >> some tricks here? or else don't get why .slot_reset signal user again. > > > No. We do not want a slot reset. But it might be triggered by > another (PF) driver on the slot. If that happens some driver did something > to make devices in the slot go to reset and our driver must recover. > We can't recover however, so let's stop the VM. > > Basically the design is simple > - if you can keep going - do > - if you can't - ask qemu to stop guest > > Don't worry about adding more conditions to trigger reset etc > at this point. Do that with later patches on-top. > It take me for a long while to find what I missed. I missed much details in pci-error-recovery.txt, and I was not conscious of a multi-function device which could have different functions(driver) on it. Thank you both. -- Sincerely, Cao jin >>> static struct pci_driver vfio_pci_driver = { >>> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c >>> index 1c46045..e883db5 100644 >>> --- a/drivers/vfio/pci/vfio_pci_intrs.c >>> +++ b/drivers/vfio/pci/vfio_pci_intrs.c >>> @@ -611,6 +611,17 @@ static int vfio_pci_set_err_trigger(struct vfio_pci_device *vdev, >>> count, flags, data); >>> } >>> >>> +static int vfio_pci_set_non_fatal_err_trigger(struct vfio_pci_device *vdev, >>> + unsigned index, unsigned start, >>> + unsigned count, uint32_t flags, void *data) >>> +{ >>> + if (index != VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX || start != 0 || count > 1) >>> + return -EINVAL; >>> + >>> + return vfio_pci_set_ctx_trigger_single(&vdev->non_fatal_err_trigger, >>> + count, flags, data); >>> +} >>> + >>> static int vfio_pci_set_req_trigger(struct vfio_pci_device *vdev, >>> unsigned index, unsigned start, >>> unsigned count, uint32_t flags, void *data) >>> @@ -664,6 +675,14 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev, uint32_t flags, >>> break; >>> } >>> break; >>> + case VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX: >>> + switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) { >>> + case VFIO_IRQ_SET_ACTION_TRIGGER: >>> + if (pci_is_pcie(vdev->pdev)) >>> + func = vfio_pci_set_err_trigger; >> >> s/vfio_pci_set_err_trigger/vfio_pci_set_non_fatal_err_trigger > > Right. > >>> + break; >>> + } >>> + break; >>> case VFIO_PCI_REQ_IRQ_INDEX: >>> switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) { >>> case VFIO_IRQ_SET_ACTION_TRIGGER: >>> diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h >>> index f37c73b..c27a507 100644 >>> --- a/drivers/vfio/pci/vfio_pci_private.h >>> +++ b/drivers/vfio/pci/vfio_pci_private.h >>> @@ -93,6 +93,7 @@ struct vfio_pci_device { >>> struct pci_saved_state *pci_saved_state; >>> int refcnt; >>> struct eventfd_ctx *err_trigger; >>> + struct eventfd_ctx *non_fatal_err_trigger; >>> struct eventfd_ctx *req_trigger; >>> struct list_head dummy_resources_list; >>> }; >>> >> >> -- >> Sincerely, >> Cao jin >> > > > . >