All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] vfio pci: kernel support of error recovery only for non fatal error
@ 2017-02-27  7:28 ` Cao jin
  0 siblings, 0 replies; 22+ messages in thread
From: Cao jin @ 2017-02-27  7:28 UTC (permalink / raw)
  To: linux-kernel, kvm, qemu-devel; +Cc: mst, alex.williamson, izumi.taku

0. What happens now (PCIE AER only)
   Fatal errors cause a link reset.
   Non fatal errors don't.
   All errors stop the VM eventually, but not immediately
   because it's detected and reported asynchronously.
   Interrupts are forwarded as usual.
   Correctable errors are not reported to guest at all.
   Note: PPC EEH is different. This focuses on AER.

1. Correctable errors
   There is no need to report these to guest. So let's not.

2. Fatal errors
   It's not easy to handle them gracefully since link reset
   is needed. As a first step, let's use the existing mechanism
   in that case.

2. Non-fatal errors
   Here we could make progress by reporting them to guest
   and have guest handle them.

   Issues:
   a. this behaviour should only be enabled with new userspace,
      old userspace should work without changes.

      Suggestion: One way to address this would be to add a new eventfd
      non_fatal_err_trigger. If not set, invoke err_trigger.

   b. drivers are supposed to stop MMIO when error is reported,
      if vm keeps going, we will keep doing MMIO/config.

      Suggestion 1: ignore this. vm stop happens much later when
      userspace runs anyway, so we are not making things much worse.

      Suggestion 2: try to stop MMIO/config, resume on resume call

      Patch below implements Suggestion 1.

      Note that although this is really against the documentation, which
      states error_detected() is the point at which the driver should quiesce
      the device and not touch it further (until diagnostic poking at
      mmio_enabled or full access at resume callback).

      Fixing this won't be easy. However, this is not a regression.

      Also note this does nothing about interrupts, documentation
      suggests returning IRQ_NONE until reset.
      Again, not a regression.

   c. PF driver might detect that function is completely broken,
      if vm keeps going, we will keep doing MMIO/config.

      Suggestion 1: ignore this. vm stop happens much later when
      userspace runs anyway, so we are not making things much worse.

      Suggestion 2: detect this and invoke err_trigger to stop VM.

      Patch below implements Suggestion 2.

Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
---
 drivers/vfio/pci/vfio_pci.c         | 38 +++++++++++++++++++++++++++++++++++--
 drivers/vfio/pci/vfio_pci_intrs.c   | 19 +++++++++++++++++++
 drivers/vfio/pci/vfio_pci_private.h |  1 +
 include/uapi/linux/vfio.h           |  1 +
 4 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 324c52e..3551cc9 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -441,7 +441,8 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
 
 			return (flags & PCI_MSIX_FLAGS_QSIZE) + 1;
 		}
-	} else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) {
+	} else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX ||
+		   irq_type == VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX) {
 		if (pci_is_pcie(vdev->pdev))
 			return 1;
 	} else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) {
@@ -796,6 +797,7 @@ static long vfio_pci_ioctl(void *device_data,
 		case VFIO_PCI_REQ_IRQ_INDEX:
 			break;
 		case VFIO_PCI_ERR_IRQ_INDEX:
+		case VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX:
 			if (pci_is_pcie(vdev->pdev))
 				break;
 		/* pass thru to return error */
@@ -1282,7 +1284,9 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
 
 	mutex_lock(&vdev->igate);
 
-	if (vdev->err_trigger)
+	if (state == pci_channel_io_normal && vdev->non_fatal_err_trigger)
+		eventfd_signal(vdev->non_fatal_err_trigger, 1);
+	else if (vdev->err_trigger)
 		eventfd_signal(vdev->err_trigger, 1);
 
 	mutex_unlock(&vdev->igate);
@@ -1292,8 +1296,38 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
 	return PCI_ERS_RESULT_CAN_RECOVER;
 }
 
+static pci_ers_result_t vfio_pci_aer_slot_reset(struct pci_dev *pdev)
+{
+	struct vfio_pci_device *vdev;
+	struct vfio_device *device;
+	static pci_ers_result_t err = PCI_ERS_RESULT_NONE;
+
+	device = vfio_device_get_from_dev(&pdev->dev);
+	if (!device)
+		goto err_dev;
+
+	vdev = vfio_device_data(device);
+	if (!vdev)
+		goto err_data;
+
+	mutex_lock(&vdev->igate);
+
+	if (vdev->err_trigger)
+		eventfd_signal(vdev->err_trigger, 1);
+
+	mutex_unlock(&vdev->igate);
+
+	err = PCI_ERS_RESULT_RECOVERED;
+
+err_data:
+	vfio_device_put(device);
+err_dev:
+	return err;
+}
+
 static const struct pci_error_handlers vfio_err_handlers = {
 	.error_detected = vfio_pci_aer_err_detected,
+	.slot_reset = vfio_pci_aer_slot_reset,
 };
 
 static struct pci_driver vfio_pci_driver = {
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 1c46045..270d568 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -611,6 +611,17 @@ static int vfio_pci_set_err_trigger(struct vfio_pci_device *vdev,
 					       count, flags, data);
 }
 
+static int vfio_pci_set_non_fatal_err_trigger(struct vfio_pci_device *vdev,
+				    unsigned index, unsigned start,
+				    unsigned count, uint32_t flags, void *data)
+{
+	if (index != VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX || start != 0 || count > 1)
+		return -EINVAL;
+
+	return vfio_pci_set_ctx_trigger_single(&vdev->non_fatal_err_trigger,
+					       count, flags, data);
+}
+
 static int vfio_pci_set_req_trigger(struct vfio_pci_device *vdev,
 				    unsigned index, unsigned start,
 				    unsigned count, uint32_t flags, void *data)
@@ -664,6 +675,14 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev, uint32_t flags,
 			break;
 		}
 		break;
+	case VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX:
+		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
+		case VFIO_IRQ_SET_ACTION_TRIGGER:
+			if (pci_is_pcie(vdev->pdev))
+				func = vfio_pci_set_non_fatal_err_trigger;
+			break;
+		}
+		break;
 	case VFIO_PCI_REQ_IRQ_INDEX:
 		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
 		case VFIO_IRQ_SET_ACTION_TRIGGER:
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index f561ac1..d30f8a3 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -93,6 +93,7 @@ struct vfio_pci_device {
 	struct pci_saved_state	*pci_saved_state;
 	int			refcnt;
 	struct eventfd_ctx	*err_trigger;
+	struct eventfd_ctx	*non_fatal_err_trigger;
 	struct eventfd_ctx	*req_trigger;
 	struct list_head	dummy_resources_list;
 };
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 519eff3..affa973 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -443,6 +443,7 @@ enum {
 	VFIO_PCI_MSIX_IRQ_INDEX,
 	VFIO_PCI_ERR_IRQ_INDEX,
 	VFIO_PCI_REQ_IRQ_INDEX,
+	VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX,
 	VFIO_PCI_NUM_IRQS
 };
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH] vfio pci: kernel support of error recovery only for non fatal error
@ 2017-02-27  7:28 ` Cao jin
  0 siblings, 0 replies; 22+ messages in thread
From: Cao jin @ 2017-02-27  7:28 UTC (permalink / raw)
  To: linux-kernel, kvm, qemu-devel; +Cc: izumi.taku, alex.williamson, mst

0. What happens now (PCIE AER only)
   Fatal errors cause a link reset.
   Non fatal errors don't.
   All errors stop the VM eventually, but not immediately
   because it's detected and reported asynchronously.
   Interrupts are forwarded as usual.
   Correctable errors are not reported to guest at all.
   Note: PPC EEH is different. This focuses on AER.

1. Correctable errors
   There is no need to report these to guest. So let's not.

2. Fatal errors
   It's not easy to handle them gracefully since link reset
   is needed. As a first step, let's use the existing mechanism
   in that case.

2. Non-fatal errors
   Here we could make progress by reporting them to guest
   and have guest handle them.

   Issues:
   a. this behaviour should only be enabled with new userspace,
      old userspace should work without changes.

      Suggestion: One way to address this would be to add a new eventfd
      non_fatal_err_trigger. If not set, invoke err_trigger.

   b. drivers are supposed to stop MMIO when error is reported,
      if vm keeps going, we will keep doing MMIO/config.

      Suggestion 1: ignore this. vm stop happens much later when
      userspace runs anyway, so we are not making things much worse.

      Suggestion 2: try to stop MMIO/config, resume on resume call

      Patch below implements Suggestion 1.

      Note that although this is really against the documentation, which
      states error_detected() is the point at which the driver should quiesce
      the device and not touch it further (until diagnostic poking at
      mmio_enabled or full access at resume callback).

      Fixing this won't be easy. However, this is not a regression.

      Also note this does nothing about interrupts, documentation
      suggests returning IRQ_NONE until reset.
      Again, not a regression.

   c. PF driver might detect that function is completely broken,
      if vm keeps going, we will keep doing MMIO/config.

      Suggestion 1: ignore this. vm stop happens much later when
      userspace runs anyway, so we are not making things much worse.

      Suggestion 2: detect this and invoke err_trigger to stop VM.

      Patch below implements Suggestion 2.

Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
---
 drivers/vfio/pci/vfio_pci.c         | 38 +++++++++++++++++++++++++++++++++++--
 drivers/vfio/pci/vfio_pci_intrs.c   | 19 +++++++++++++++++++
 drivers/vfio/pci/vfio_pci_private.h |  1 +
 include/uapi/linux/vfio.h           |  1 +
 4 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 324c52e..3551cc9 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -441,7 +441,8 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
 
 			return (flags & PCI_MSIX_FLAGS_QSIZE) + 1;
 		}
-	} else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) {
+	} else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX ||
+		   irq_type == VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX) {
 		if (pci_is_pcie(vdev->pdev))
 			return 1;
 	} else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) {
@@ -796,6 +797,7 @@ static long vfio_pci_ioctl(void *device_data,
 		case VFIO_PCI_REQ_IRQ_INDEX:
 			break;
 		case VFIO_PCI_ERR_IRQ_INDEX:
+		case VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX:
 			if (pci_is_pcie(vdev->pdev))
 				break;
 		/* pass thru to return error */
@@ -1282,7 +1284,9 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
 
 	mutex_lock(&vdev->igate);
 
-	if (vdev->err_trigger)
+	if (state == pci_channel_io_normal && vdev->non_fatal_err_trigger)
+		eventfd_signal(vdev->non_fatal_err_trigger, 1);
+	else if (vdev->err_trigger)
 		eventfd_signal(vdev->err_trigger, 1);
 
 	mutex_unlock(&vdev->igate);
@@ -1292,8 +1296,38 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
 	return PCI_ERS_RESULT_CAN_RECOVER;
 }
 
+static pci_ers_result_t vfio_pci_aer_slot_reset(struct pci_dev *pdev)
+{
+	struct vfio_pci_device *vdev;
+	struct vfio_device *device;
+	static pci_ers_result_t err = PCI_ERS_RESULT_NONE;
+
+	device = vfio_device_get_from_dev(&pdev->dev);
+	if (!device)
+		goto err_dev;
+
+	vdev = vfio_device_data(device);
+	if (!vdev)
+		goto err_data;
+
+	mutex_lock(&vdev->igate);
+
+	if (vdev->err_trigger)
+		eventfd_signal(vdev->err_trigger, 1);
+
+	mutex_unlock(&vdev->igate);
+
+	err = PCI_ERS_RESULT_RECOVERED;
+
+err_data:
+	vfio_device_put(device);
+err_dev:
+	return err;
+}
+
 static const struct pci_error_handlers vfio_err_handlers = {
 	.error_detected = vfio_pci_aer_err_detected,
+	.slot_reset = vfio_pci_aer_slot_reset,
 };
 
 static struct pci_driver vfio_pci_driver = {
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 1c46045..270d568 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -611,6 +611,17 @@ static int vfio_pci_set_err_trigger(struct vfio_pci_device *vdev,
 					       count, flags, data);
 }
 
+static int vfio_pci_set_non_fatal_err_trigger(struct vfio_pci_device *vdev,
+				    unsigned index, unsigned start,
+				    unsigned count, uint32_t flags, void *data)
+{
+	if (index != VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX || start != 0 || count > 1)
+		return -EINVAL;
+
+	return vfio_pci_set_ctx_trigger_single(&vdev->non_fatal_err_trigger,
+					       count, flags, data);
+}
+
 static int vfio_pci_set_req_trigger(struct vfio_pci_device *vdev,
 				    unsigned index, unsigned start,
 				    unsigned count, uint32_t flags, void *data)
@@ -664,6 +675,14 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev, uint32_t flags,
 			break;
 		}
 		break;
+	case VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX:
+		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
+		case VFIO_IRQ_SET_ACTION_TRIGGER:
+			if (pci_is_pcie(vdev->pdev))
+				func = vfio_pci_set_non_fatal_err_trigger;
+			break;
+		}
+		break;
 	case VFIO_PCI_REQ_IRQ_INDEX:
 		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
 		case VFIO_IRQ_SET_ACTION_TRIGGER:
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index f561ac1..d30f8a3 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -93,6 +93,7 @@ struct vfio_pci_device {
 	struct pci_saved_state	*pci_saved_state;
 	int			refcnt;
 	struct eventfd_ctx	*err_trigger;
+	struct eventfd_ctx	*non_fatal_err_trigger;
 	struct eventfd_ctx	*req_trigger;
 	struct list_head	dummy_resources_list;
 };
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 519eff3..affa973 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -443,6 +443,7 @@ enum {
 	VFIO_PCI_MSIX_IRQ_INDEX,
 	VFIO_PCI_ERR_IRQ_INDEX,
 	VFIO_PCI_REQ_IRQ_INDEX,
+	VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX,
 	VFIO_PCI_NUM_IRQS
 };
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [PATCH] vfio pci: kernel support of error recovery only for non fatal error
@ 2017-02-27  7:28 ` Cao jin
  0 siblings, 0 replies; 22+ messages in thread
From: Cao jin @ 2017-02-27  7:28 UTC (permalink / raw)
  To: linux-kernel, kvm, qemu-devel; +Cc: mst, alex.williamson, izumi.taku

0. What happens now (PCIE AER only)
   Fatal errors cause a link reset.
   Non fatal errors don't.
   All errors stop the VM eventually, but not immediately
   because it's detected and reported asynchronously.
   Interrupts are forwarded as usual.
   Correctable errors are not reported to guest at all.
   Note: PPC EEH is different. This focuses on AER.

1. Correctable errors
   There is no need to report these to guest. So let's not.

2. Fatal errors
   It's not easy to handle them gracefully since link reset
   is needed. As a first step, let's use the existing mechanism
   in that case.

2. Non-fatal errors
   Here we could make progress by reporting them to guest
   and have guest handle them.

   Issues:
   a. this behaviour should only be enabled with new userspace,
      old userspace should work without changes.

      Suggestion: One way to address this would be to add a new eventfd
      non_fatal_err_trigger. If not set, invoke err_trigger.

   b. drivers are supposed to stop MMIO when error is reported,
      if vm keeps going, we will keep doing MMIO/config.

      Suggestion 1: ignore this. vm stop happens much later when
      userspace runs anyway, so we are not making things much worse.

      Suggestion 2: try to stop MMIO/config, resume on resume call

      Patch below implements Suggestion 1.

      Note that although this is really against the documentation, which
      states error_detected() is the point at which the driver should quiesce
      the device and not touch it further (until diagnostic poking at
      mmio_enabled or full access at resume callback).

      Fixing this won't be easy. However, this is not a regression.

      Also note this does nothing about interrupts, documentation
      suggests returning IRQ_NONE until reset.
      Again, not a regression.

   c. PF driver might detect that function is completely broken,
      if vm keeps going, we will keep doing MMIO/config.

      Suggestion 1: ignore this. vm stop happens much later when
      userspace runs anyway, so we are not making things much worse.

      Suggestion 2: detect this and invoke err_trigger to stop VM.

      Patch below implements Suggestion 2.

Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
---
 drivers/vfio/pci/vfio_pci.c         | 38 +++++++++++++++++++++++++++++++++++--
 drivers/vfio/pci/vfio_pci_intrs.c   | 19 +++++++++++++++++++
 drivers/vfio/pci/vfio_pci_private.h |  1 +
 include/uapi/linux/vfio.h           |  1 +
 4 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 324c52e..3551cc9 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -441,7 +441,8 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
 
 			return (flags & PCI_MSIX_FLAGS_QSIZE) + 1;
 		}
-	} else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) {
+	} else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX ||
+		   irq_type == VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX) {
 		if (pci_is_pcie(vdev->pdev))
 			return 1;
 	} else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) {
@@ -796,6 +797,7 @@ static long vfio_pci_ioctl(void *device_data,
 		case VFIO_PCI_REQ_IRQ_INDEX:
 			break;
 		case VFIO_PCI_ERR_IRQ_INDEX:
+		case VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX:
 			if (pci_is_pcie(vdev->pdev))
 				break;
 		/* pass thru to return error */
@@ -1282,7 +1284,9 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
 
 	mutex_lock(&vdev->igate);
 
-	if (vdev->err_trigger)
+	if (state == pci_channel_io_normal && vdev->non_fatal_err_trigger)
+		eventfd_signal(vdev->non_fatal_err_trigger, 1);
+	else if (vdev->err_trigger)
 		eventfd_signal(vdev->err_trigger, 1);
 
 	mutex_unlock(&vdev->igate);
@@ -1292,8 +1296,38 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
 	return PCI_ERS_RESULT_CAN_RECOVER;
 }
 
+static pci_ers_result_t vfio_pci_aer_slot_reset(struct pci_dev *pdev)
+{
+	struct vfio_pci_device *vdev;
+	struct vfio_device *device;
+	static pci_ers_result_t err = PCI_ERS_RESULT_NONE;
+
+	device = vfio_device_get_from_dev(&pdev->dev);
+	if (!device)
+		goto err_dev;
+
+	vdev = vfio_device_data(device);
+	if (!vdev)
+		goto err_data;
+
+	mutex_lock(&vdev->igate);
+
+	if (vdev->err_trigger)
+		eventfd_signal(vdev->err_trigger, 1);
+
+	mutex_unlock(&vdev->igate);
+
+	err = PCI_ERS_RESULT_RECOVERED;
+
+err_data:
+	vfio_device_put(device);
+err_dev:
+	return err;
+}
+
 static const struct pci_error_handlers vfio_err_handlers = {
 	.error_detected = vfio_pci_aer_err_detected,
+	.slot_reset = vfio_pci_aer_slot_reset,
 };
 
 static struct pci_driver vfio_pci_driver = {
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 1c46045..270d568 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -611,6 +611,17 @@ static int vfio_pci_set_err_trigger(struct vfio_pci_device *vdev,
 					       count, flags, data);
 }
 
+static int vfio_pci_set_non_fatal_err_trigger(struct vfio_pci_device *vdev,
+				    unsigned index, unsigned start,
+				    unsigned count, uint32_t flags, void *data)
+{
+	if (index != VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX || start != 0 || count > 1)
+		return -EINVAL;
+
+	return vfio_pci_set_ctx_trigger_single(&vdev->non_fatal_err_trigger,
+					       count, flags, data);
+}
+
 static int vfio_pci_set_req_trigger(struct vfio_pci_device *vdev,
 				    unsigned index, unsigned start,
 				    unsigned count, uint32_t flags, void *data)
@@ -664,6 +675,14 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev, uint32_t flags,
 			break;
 		}
 		break;
+	case VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX:
+		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
+		case VFIO_IRQ_SET_ACTION_TRIGGER:
+			if (pci_is_pcie(vdev->pdev))
+				func = vfio_pci_set_non_fatal_err_trigger;
+			break;
+		}
+		break;
 	case VFIO_PCI_REQ_IRQ_INDEX:
 		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
 		case VFIO_IRQ_SET_ACTION_TRIGGER:
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index f561ac1..d30f8a3 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -93,6 +93,7 @@ struct vfio_pci_device {
 	struct pci_saved_state	*pci_saved_state;
 	int			refcnt;
 	struct eventfd_ctx	*err_trigger;
+	struct eventfd_ctx	*non_fatal_err_trigger;
 	struct eventfd_ctx	*req_trigger;
 	struct list_head	dummy_resources_list;
 };
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 519eff3..affa973 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -443,6 +443,7 @@ enum {
 	VFIO_PCI_MSIX_IRQ_INDEX,
 	VFIO_PCI_ERR_IRQ_INDEX,
 	VFIO_PCI_REQ_IRQ_INDEX,
+	VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX,
 	VFIO_PCI_NUM_IRQS
 };
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] vfio pci: kernel support of error recovery only for non fatal error
  2017-02-27  7:28 ` Cao jin
@ 2017-02-27 16:16   ` Michael S. Tsirkin
  -1 siblings, 0 replies; 22+ messages in thread
From: Michael S. Tsirkin @ 2017-02-27 16:16 UTC (permalink / raw)
  To: Cao jin; +Cc: linux-kernel, kvm, qemu-devel, alex.williamson, izumi.taku

On Mon, Feb 27, 2017 at 03:28:43PM +0800, Cao jin wrote:
> Subject: Re: [PATCH] vfio pci: kernel support of error recovery only for non
> fatal error

Don't make the subject so long. This is why I had
	[PATCH v3] vfio error recovery: kernel support
you also want to add versioning as you inherited my v3,
you should make this v4 etc.

> 0. What happens now (PCIE AER only)
>    Fatal errors cause a link reset.
>    Non fatal errors don't.
>    All errors stop the VM eventually, but not immediately
>    because it's detected and reported asynchronously.
>    Interrupts are forwarded as usual.
>    Correctable errors are not reported to guest at all.
>    Note: PPC EEH is different. This focuses on AER.
> 
> 1. Correctable errors
>    There is no need to report these to guest. So let's not.
> 
> 2. Fatal errors
>    It's not easy to handle them gracefully since link reset
>    is needed. As a first step, let's use the existing mechanism
>    in that case.
> 
> 2. Non-fatal errors
>    Here we could make progress by reporting them to guest
>    and have guest handle them.
> 
>    Issues:
>    a. this behaviour should only be enabled with new userspace,
>       old userspace should work without changes.
> 
>       Suggestion: One way to address this would be to add a new eventfd
>       non_fatal_err_trigger. If not set, invoke err_trigger.
> 
>    b. drivers are supposed to stop MMIO when error is reported,
>       if vm keeps going, we will keep doing MMIO/config.
> 
>       Suggestion 1: ignore this. vm stop happens much later when
>       userspace runs anyway, so we are not making things much worse.
> 
>       Suggestion 2: try to stop MMIO/config, resume on resume call
> 
>       Patch below implements Suggestion 1.
> 
>       Note that although this is really against the documentation, which
>       states error_detected() is the point at which the driver should quiesce
>       the device and not touch it further (until diagnostic poking at
>       mmio_enabled or full access at resume callback).
> 
>       Fixing this won't be easy. However, this is not a regression.
> 
>       Also note this does nothing about interrupts, documentation
>       suggests returning IRQ_NONE until reset.
>       Again, not a regression.
> 
>    c. PF driver might detect that function is completely broken,
>       if vm keeps going, we will keep doing MMIO/config.
> 
>       Suggestion 1: ignore this. vm stop happens much later when
>       userspace runs anyway, so we are not making things much worse.
> 
>       Suggestion 2: detect this and invoke err_trigger to stop VM.
> 
>       Patch below implements Suggestion 2.
> 
> Suggested-by: Michael S. Tsirkin <mst@redhat.com>

It's more than this, you are really reusing parts of my patch,
so you should say so and include my signature.

If you only added a line or two you can keep
the original author. To do this you add
	From: Michael S. Tsirkin <mst@redhat.com>
before commit text.

> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
> ---

Changelog from my v3?

>  drivers/vfio/pci/vfio_pci.c         | 38 +++++++++++++++++++++++++++++++++++--
>  drivers/vfio/pci/vfio_pci_intrs.c   | 19 +++++++++++++++++++
>  drivers/vfio/pci/vfio_pci_private.h |  1 +
>  include/uapi/linux/vfio.h           |  1 +
>  4 files changed, 57 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 324c52e..3551cc9 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -441,7 +441,8 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
>  
>  			return (flags & PCI_MSIX_FLAGS_QSIZE) + 1;
>  		}
> -	} else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) {
> +	} else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX ||
> +		   irq_type == VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX) {
>  		if (pci_is_pcie(vdev->pdev))
>  			return 1;
>  	} else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) {
> @@ -796,6 +797,7 @@ static long vfio_pci_ioctl(void *device_data,
>  		case VFIO_PCI_REQ_IRQ_INDEX:
>  			break;
>  		case VFIO_PCI_ERR_IRQ_INDEX:
> +		case VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX:
>  			if (pci_is_pcie(vdev->pdev))
>  				break;
>  		/* pass thru to return error */
> @@ -1282,7 +1284,9 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
>  
>  	mutex_lock(&vdev->igate);
>  
> -	if (vdev->err_trigger)
> +	if (state == pci_channel_io_normal && vdev->non_fatal_err_trigger)
> +		eventfd_signal(vdev->non_fatal_err_trigger, 1);
> +	else if (vdev->err_trigger)
>  		eventfd_signal(vdev->err_trigger, 1);
>  
>  	mutex_unlock(&vdev->igate);
> @@ -1292,8 +1296,38 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
>  	return PCI_ERS_RESULT_CAN_RECOVER;
>  }
>  
> +static pci_ers_result_t vfio_pci_aer_slot_reset(struct pci_dev *pdev)
> +{
> +	struct vfio_pci_device *vdev;
> +	struct vfio_device *device;
> +	static pci_ers_result_t err = PCI_ERS_RESULT_NONE;
> +
> +	device = vfio_device_get_from_dev(&pdev->dev);
> +	if (!device)
> +		goto err_dev;
> +
> +	vdev = vfio_device_data(device);
> +	if (!vdev)
> +		goto err_data;
> +
> +	mutex_lock(&vdev->igate);
> +
> +	if (vdev->err_trigger)
> +		eventfd_signal(vdev->err_trigger, 1);
> +
> +	mutex_unlock(&vdev->igate);
> +
> +	err = PCI_ERS_RESULT_RECOVERED;
> +
> +err_data:
> +	vfio_device_put(device);
> +err_dev:
> +	return err;
> +}
> +
>  static const struct pci_error_handlers vfio_err_handlers = {
>  	.error_detected = vfio_pci_aer_err_detected,
> +	.slot_reset = vfio_pci_aer_slot_reset,
>  };
>  
>  static struct pci_driver vfio_pci_driver = {
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 1c46045..270d568 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -611,6 +611,17 @@ static int vfio_pci_set_err_trigger(struct vfio_pci_device *vdev,
>  					       count, flags, data);
>  }
>  
> +static int vfio_pci_set_non_fatal_err_trigger(struct vfio_pci_device *vdev,
> +				    unsigned index, unsigned start,
> +				    unsigned count, uint32_t flags, void *data)
> +{
> +	if (index != VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX || start != 0 || count > 1)
> +		return -EINVAL;
> +
> +	return vfio_pci_set_ctx_trigger_single(&vdev->non_fatal_err_trigger,
> +					       count, flags, data);
> +}
> +
>  static int vfio_pci_set_req_trigger(struct vfio_pci_device *vdev,
>  				    unsigned index, unsigned start,
>  				    unsigned count, uint32_t flags, void *data)
> @@ -664,6 +675,14 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev, uint32_t flags,
>  			break;
>  		}
>  		break;
> +	case VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX:
> +		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
> +		case VFIO_IRQ_SET_ACTION_TRIGGER:
> +			if (pci_is_pcie(vdev->pdev))
> +				func = vfio_pci_set_non_fatal_err_trigger;
> +			break;
> +		}
> +		break;
>  	case VFIO_PCI_REQ_IRQ_INDEX:
>  		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
>  		case VFIO_IRQ_SET_ACTION_TRIGGER:
> diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
> index f561ac1..d30f8a3 100644
> --- a/drivers/vfio/pci/vfio_pci_private.h
> +++ b/drivers/vfio/pci/vfio_pci_private.h
> @@ -93,6 +93,7 @@ struct vfio_pci_device {
>  	struct pci_saved_state	*pci_saved_state;
>  	int			refcnt;
>  	struct eventfd_ctx	*err_trigger;
> +	struct eventfd_ctx	*non_fatal_err_trigger;
>  	struct eventfd_ctx	*req_trigger;
>  	struct list_head	dummy_resources_list;
>  };
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 519eff3..affa973 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -443,6 +443,7 @@ enum {
>  	VFIO_PCI_MSIX_IRQ_INDEX,
>  	VFIO_PCI_ERR_IRQ_INDEX,
>  	VFIO_PCI_REQ_IRQ_INDEX,
> +	VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX,
>  	VFIO_PCI_NUM_IRQS
>  };
>  
> -- 
> 1.8.3.1
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH] vfio pci: kernel support of error recovery only for non fatal error
@ 2017-02-27 16:16   ` Michael S. Tsirkin
  0 siblings, 0 replies; 22+ messages in thread
From: Michael S. Tsirkin @ 2017-02-27 16:16 UTC (permalink / raw)
  To: Cao jin; +Cc: linux-kernel, kvm, qemu-devel, alex.williamson, izumi.taku

On Mon, Feb 27, 2017 at 03:28:43PM +0800, Cao jin wrote:
> Subject: Re: [PATCH] vfio pci: kernel support of error recovery only for non
> fatal error

Don't make the subject so long. This is why I had
	[PATCH v3] vfio error recovery: kernel support
you also want to add versioning as you inherited my v3,
you should make this v4 etc.

> 0. What happens now (PCIE AER only)
>    Fatal errors cause a link reset.
>    Non fatal errors don't.
>    All errors stop the VM eventually, but not immediately
>    because it's detected and reported asynchronously.
>    Interrupts are forwarded as usual.
>    Correctable errors are not reported to guest at all.
>    Note: PPC EEH is different. This focuses on AER.
> 
> 1. Correctable errors
>    There is no need to report these to guest. So let's not.
> 
> 2. Fatal errors
>    It's not easy to handle them gracefully since link reset
>    is needed. As a first step, let's use the existing mechanism
>    in that case.
> 
> 2. Non-fatal errors
>    Here we could make progress by reporting them to guest
>    and have guest handle them.
> 
>    Issues:
>    a. this behaviour should only be enabled with new userspace,
>       old userspace should work without changes.
> 
>       Suggestion: One way to address this would be to add a new eventfd
>       non_fatal_err_trigger. If not set, invoke err_trigger.
> 
>    b. drivers are supposed to stop MMIO when error is reported,
>       if vm keeps going, we will keep doing MMIO/config.
> 
>       Suggestion 1: ignore this. vm stop happens much later when
>       userspace runs anyway, so we are not making things much worse.
> 
>       Suggestion 2: try to stop MMIO/config, resume on resume call
> 
>       Patch below implements Suggestion 1.
> 
>       Note that although this is really against the documentation, which
>       states error_detected() is the point at which the driver should quiesce
>       the device and not touch it further (until diagnostic poking at
>       mmio_enabled or full access at resume callback).
> 
>       Fixing this won't be easy. However, this is not a regression.
> 
>       Also note this does nothing about interrupts, documentation
>       suggests returning IRQ_NONE until reset.
>       Again, not a regression.
> 
>    c. PF driver might detect that function is completely broken,
>       if vm keeps going, we will keep doing MMIO/config.
> 
>       Suggestion 1: ignore this. vm stop happens much later when
>       userspace runs anyway, so we are not making things much worse.
> 
>       Suggestion 2: detect this and invoke err_trigger to stop VM.
> 
>       Patch below implements Suggestion 2.
> 
> Suggested-by: Michael S. Tsirkin <mst@redhat.com>

It's more than this, you are really reusing parts of my patch,
so you should say so and include my signature.

If you only added a line or two you can keep
the original author. To do this you add
	From: Michael S. Tsirkin <mst@redhat.com>
before commit text.

> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
> ---

Changelog from my v3?

>  drivers/vfio/pci/vfio_pci.c         | 38 +++++++++++++++++++++++++++++++++++--
>  drivers/vfio/pci/vfio_pci_intrs.c   | 19 +++++++++++++++++++
>  drivers/vfio/pci/vfio_pci_private.h |  1 +
>  include/uapi/linux/vfio.h           |  1 +
>  4 files changed, 57 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 324c52e..3551cc9 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -441,7 +441,8 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
>  
>  			return (flags & PCI_MSIX_FLAGS_QSIZE) + 1;
>  		}
> -	} else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) {
> +	} else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX ||
> +		   irq_type == VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX) {
>  		if (pci_is_pcie(vdev->pdev))
>  			return 1;
>  	} else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) {
> @@ -796,6 +797,7 @@ static long vfio_pci_ioctl(void *device_data,
>  		case VFIO_PCI_REQ_IRQ_INDEX:
>  			break;
>  		case VFIO_PCI_ERR_IRQ_INDEX:
> +		case VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX:
>  			if (pci_is_pcie(vdev->pdev))
>  				break;
>  		/* pass thru to return error */
> @@ -1282,7 +1284,9 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
>  
>  	mutex_lock(&vdev->igate);
>  
> -	if (vdev->err_trigger)
> +	if (state == pci_channel_io_normal && vdev->non_fatal_err_trigger)
> +		eventfd_signal(vdev->non_fatal_err_trigger, 1);
> +	else if (vdev->err_trigger)
>  		eventfd_signal(vdev->err_trigger, 1);
>  
>  	mutex_unlock(&vdev->igate);
> @@ -1292,8 +1296,38 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
>  	return PCI_ERS_RESULT_CAN_RECOVER;
>  }
>  
> +static pci_ers_result_t vfio_pci_aer_slot_reset(struct pci_dev *pdev)
> +{
> +	struct vfio_pci_device *vdev;
> +	struct vfio_device *device;
> +	static pci_ers_result_t err = PCI_ERS_RESULT_NONE;
> +
> +	device = vfio_device_get_from_dev(&pdev->dev);
> +	if (!device)
> +		goto err_dev;
> +
> +	vdev = vfio_device_data(device);
> +	if (!vdev)
> +		goto err_data;
> +
> +	mutex_lock(&vdev->igate);
> +
> +	if (vdev->err_trigger)
> +		eventfd_signal(vdev->err_trigger, 1);
> +
> +	mutex_unlock(&vdev->igate);
> +
> +	err = PCI_ERS_RESULT_RECOVERED;
> +
> +err_data:
> +	vfio_device_put(device);
> +err_dev:
> +	return err;
> +}
> +
>  static const struct pci_error_handlers vfio_err_handlers = {
>  	.error_detected = vfio_pci_aer_err_detected,
> +	.slot_reset = vfio_pci_aer_slot_reset,
>  };
>  
>  static struct pci_driver vfio_pci_driver = {
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 1c46045..270d568 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -611,6 +611,17 @@ static int vfio_pci_set_err_trigger(struct vfio_pci_device *vdev,
>  					       count, flags, data);
>  }
>  
> +static int vfio_pci_set_non_fatal_err_trigger(struct vfio_pci_device *vdev,
> +				    unsigned index, unsigned start,
> +				    unsigned count, uint32_t flags, void *data)
> +{
> +	if (index != VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX || start != 0 || count > 1)
> +		return -EINVAL;
> +
> +	return vfio_pci_set_ctx_trigger_single(&vdev->non_fatal_err_trigger,
> +					       count, flags, data);
> +}
> +
>  static int vfio_pci_set_req_trigger(struct vfio_pci_device *vdev,
>  				    unsigned index, unsigned start,
>  				    unsigned count, uint32_t flags, void *data)
> @@ -664,6 +675,14 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev, uint32_t flags,
>  			break;
>  		}
>  		break;
> +	case VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX:
> +		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
> +		case VFIO_IRQ_SET_ACTION_TRIGGER:
> +			if (pci_is_pcie(vdev->pdev))
> +				func = vfio_pci_set_non_fatal_err_trigger;
> +			break;
> +		}
> +		break;
>  	case VFIO_PCI_REQ_IRQ_INDEX:
>  		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
>  		case VFIO_IRQ_SET_ACTION_TRIGGER:
> diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
> index f561ac1..d30f8a3 100644
> --- a/drivers/vfio/pci/vfio_pci_private.h
> +++ b/drivers/vfio/pci/vfio_pci_private.h
> @@ -93,6 +93,7 @@ struct vfio_pci_device {
>  	struct pci_saved_state	*pci_saved_state;
>  	int			refcnt;
>  	struct eventfd_ctx	*err_trigger;
> +	struct eventfd_ctx	*non_fatal_err_trigger;
>  	struct eventfd_ctx	*req_trigger;
>  	struct list_head	dummy_resources_list;
>  };
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 519eff3..affa973 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -443,6 +443,7 @@ enum {
>  	VFIO_PCI_MSIX_IRQ_INDEX,
>  	VFIO_PCI_ERR_IRQ_INDEX,
>  	VFIO_PCI_REQ_IRQ_INDEX,
> +	VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX,
>  	VFIO_PCI_NUM_IRQS
>  };
>  
> -- 
> 1.8.3.1
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] vfio pci: kernel support of error recovery only for non fatal error
  2017-02-27 16:16   ` [Qemu-devel] " Michael S. Tsirkin
  (?)
@ 2017-02-28  1:31     ` Cao jin
  -1 siblings, 0 replies; 22+ messages in thread
From: Cao jin @ 2017-02-28  1:31 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, kvm, qemu-devel, alex.williamson, izumi.taku



On 02/28/2017 12:16 AM, Michael S. Tsirkin wrote:
> On Mon, Feb 27, 2017 at 03:28:43PM +0800, Cao jin wrote:
>> Subject: Re: [PATCH] vfio pci: kernel support of error recovery only for non
>> fatal error
> 
> Don't make the subject so long. This is why I had
> 	[PATCH v3] vfio error recovery: kernel support
> you also want to add versioning as you inherited my v3,
> you should make this v4 etc.
> 

Ok. I didn' see [PATCH v3], I guess I am not CCed.

>> 0. What happens now (PCIE AER only)
>>    Fatal errors cause a link reset.
>>    Non fatal errors don't.
>>    All errors stop the VM eventually, but not immediately
>>    because it's detected and reported asynchronously.
>>    Interrupts are forwarded as usual.
>>    Correctable errors are not reported to guest at all.
>>    Note: PPC EEH is different. This focuses on AER.
>>
>> 1. Correctable errors
>>    There is no need to report these to guest. So let's not.
>>
>> 2. Fatal errors
>>    It's not easy to handle them gracefully since link reset
>>    is needed. As a first step, let's use the existing mechanism
>>    in that case.
>>
>> 2. Non-fatal errors
>>    Here we could make progress by reporting them to guest
>>    and have guest handle them.
>>
>>    Issues:
>>    a. this behaviour should only be enabled with new userspace,
>>       old userspace should work without changes.
>>
>>       Suggestion: One way to address this would be to add a new eventfd
>>       non_fatal_err_trigger. If not set, invoke err_trigger.
>>
>>    b. drivers are supposed to stop MMIO when error is reported,
>>       if vm keeps going, we will keep doing MMIO/config.
>>
>>       Suggestion 1: ignore this. vm stop happens much later when
>>       userspace runs anyway, so we are not making things much worse.
>>
>>       Suggestion 2: try to stop MMIO/config, resume on resume call
>>
>>       Patch below implements Suggestion 1.
>>
>>       Note that although this is really against the documentation, which
>>       states error_detected() is the point at which the driver should quiesce
>>       the device and not touch it further (until diagnostic poking at
>>       mmio_enabled or full access at resume callback).
>>
>>       Fixing this won't be easy. However, this is not a regression.
>>
>>       Also note this does nothing about interrupts, documentation
>>       suggests returning IRQ_NONE until reset.
>>       Again, not a regression.
>>
>>    c. PF driver might detect that function is completely broken,
>>       if vm keeps going, we will keep doing MMIO/config.
>>
>>       Suggestion 1: ignore this. vm stop happens much later when
>>       userspace runs anyway, so we are not making things much worse.
>>
>>       Suggestion 2: detect this and invoke err_trigger to stop VM.
>>
>>       Patch below implements Suggestion 2.
>>
>> Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> 
> It's more than this, you are really reusing parts of my patch,
> so you should say so and include my signature.
> 
> If you only added a line or two you can keep
> the original author. To do this you add
> 	From: Michael S. Tsirkin <mst@redhat.com>
> before commit text.

On this topic, I am really bewildered for a while, thanks for clarification.

-- 
Sincerely,
Cao jin

> 
>> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
>> ---
> 
> Changelog from my v3?
> 
>>  drivers/vfio/pci/vfio_pci.c         | 38 +++++++++++++++++++++++++++++++++++--
>>  drivers/vfio/pci/vfio_pci_intrs.c   | 19 +++++++++++++++++++
>>  drivers/vfio/pci/vfio_pci_private.h |  1 +
>>  include/uapi/linux/vfio.h           |  1 +
>>  4 files changed, 57 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>> index 324c52e..3551cc9 100644
>> --- a/drivers/vfio/pci/vfio_pci.c
>> +++ b/drivers/vfio/pci/vfio_pci.c
>> @@ -441,7 +441,8 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
>>  
>>  			return (flags & PCI_MSIX_FLAGS_QSIZE) + 1;
>>  		}
>> -	} else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) {
>> +	} else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX ||
>> +		   irq_type == VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX) {
>>  		if (pci_is_pcie(vdev->pdev))
>>  			return 1;
>>  	} else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) {
>> @@ -796,6 +797,7 @@ static long vfio_pci_ioctl(void *device_data,
>>  		case VFIO_PCI_REQ_IRQ_INDEX:
>>  			break;
>>  		case VFIO_PCI_ERR_IRQ_INDEX:
>> +		case VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX:
>>  			if (pci_is_pcie(vdev->pdev))
>>  				break;
>>  		/* pass thru to return error */
>> @@ -1282,7 +1284,9 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
>>  
>>  	mutex_lock(&vdev->igate);
>>  
>> -	if (vdev->err_trigger)
>> +	if (state == pci_channel_io_normal && vdev->non_fatal_err_trigger)
>> +		eventfd_signal(vdev->non_fatal_err_trigger, 1);
>> +	else if (vdev->err_trigger)
>>  		eventfd_signal(vdev->err_trigger, 1);
>>  
>>  	mutex_unlock(&vdev->igate);
>> @@ -1292,8 +1296,38 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
>>  	return PCI_ERS_RESULT_CAN_RECOVER;
>>  }
>>  
>> +static pci_ers_result_t vfio_pci_aer_slot_reset(struct pci_dev *pdev)
>> +{
>> +	struct vfio_pci_device *vdev;
>> +	struct vfio_device *device;
>> +	static pci_ers_result_t err = PCI_ERS_RESULT_NONE;
>> +
>> +	device = vfio_device_get_from_dev(&pdev->dev);
>> +	if (!device)
>> +		goto err_dev;
>> +
>> +	vdev = vfio_device_data(device);
>> +	if (!vdev)
>> +		goto err_data;
>> +
>> +	mutex_lock(&vdev->igate);
>> +
>> +	if (vdev->err_trigger)
>> +		eventfd_signal(vdev->err_trigger, 1);
>> +
>> +	mutex_unlock(&vdev->igate);
>> +
>> +	err = PCI_ERS_RESULT_RECOVERED;
>> +
>> +err_data:
>> +	vfio_device_put(device);
>> +err_dev:
>> +	return err;
>> +}
>> +
>>  static const struct pci_error_handlers vfio_err_handlers = {
>>  	.error_detected = vfio_pci_aer_err_detected,
>> +	.slot_reset = vfio_pci_aer_slot_reset,
>>  };
>>  
>>  static struct pci_driver vfio_pci_driver = {
>> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
>> index 1c46045..270d568 100644
>> --- a/drivers/vfio/pci/vfio_pci_intrs.c
>> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
>> @@ -611,6 +611,17 @@ static int vfio_pci_set_err_trigger(struct vfio_pci_device *vdev,
>>  					       count, flags, data);
>>  }
>>  
>> +static int vfio_pci_set_non_fatal_err_trigger(struct vfio_pci_device *vdev,
>> +				    unsigned index, unsigned start,
>> +				    unsigned count, uint32_t flags, void *data)
>> +{
>> +	if (index != VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX || start != 0 || count > 1)
>> +		return -EINVAL;
>> +
>> +	return vfio_pci_set_ctx_trigger_single(&vdev->non_fatal_err_trigger,
>> +					       count, flags, data);
>> +}
>> +
>>  static int vfio_pci_set_req_trigger(struct vfio_pci_device *vdev,
>>  				    unsigned index, unsigned start,
>>  				    unsigned count, uint32_t flags, void *data)
>> @@ -664,6 +675,14 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev, uint32_t flags,
>>  			break;
>>  		}
>>  		break;
>> +	case VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX:
>> +		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
>> +		case VFIO_IRQ_SET_ACTION_TRIGGER:
>> +			if (pci_is_pcie(vdev->pdev))
>> +				func = vfio_pci_set_non_fatal_err_trigger;
>> +			break;
>> +		}
>> +		break;
>>  	case VFIO_PCI_REQ_IRQ_INDEX:
>>  		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
>>  		case VFIO_IRQ_SET_ACTION_TRIGGER:
>> diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
>> index f561ac1..d30f8a3 100644
>> --- a/drivers/vfio/pci/vfio_pci_private.h
>> +++ b/drivers/vfio/pci/vfio_pci_private.h
>> @@ -93,6 +93,7 @@ struct vfio_pci_device {
>>  	struct pci_saved_state	*pci_saved_state;
>>  	int			refcnt;
>>  	struct eventfd_ctx	*err_trigger;
>> +	struct eventfd_ctx	*non_fatal_err_trigger;
>>  	struct eventfd_ctx	*req_trigger;
>>  	struct list_head	dummy_resources_list;
>>  };
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index 519eff3..affa973 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -443,6 +443,7 @@ enum {
>>  	VFIO_PCI_MSIX_IRQ_INDEX,
>>  	VFIO_PCI_ERR_IRQ_INDEX,
>>  	VFIO_PCI_REQ_IRQ_INDEX,
>> +	VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX,
>>  	VFIO_PCI_NUM_IRQS
>>  };
>>  
>> -- 
>> 1.8.3.1
>>
>>
> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] vfio pci: kernel support of error recovery only for non fatal error
@ 2017-02-28  1:31     ` Cao jin
  0 siblings, 0 replies; 22+ messages in thread
From: Cao jin @ 2017-02-28  1:31 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: izumi.taku, alex.williamson, linux-kernel, kvm, qemu-devel



On 02/28/2017 12:16 AM, Michael S. Tsirkin wrote:
> On Mon, Feb 27, 2017 at 03:28:43PM +0800, Cao jin wrote:
>> Subject: Re: [PATCH] vfio pci: kernel support of error recovery only for non
>> fatal error
> 
> Don't make the subject so long. This is why I had
> 	[PATCH v3] vfio error recovery: kernel support
> you also want to add versioning as you inherited my v3,
> you should make this v4 etc.
> 

Ok. I didn' see [PATCH v3], I guess I am not CCed.

>> 0. What happens now (PCIE AER only)
>>    Fatal errors cause a link reset.
>>    Non fatal errors don't.
>>    All errors stop the VM eventually, but not immediately
>>    because it's detected and reported asynchronously.
>>    Interrupts are forwarded as usual.
>>    Correctable errors are not reported to guest at all.
>>    Note: PPC EEH is different. This focuses on AER.
>>
>> 1. Correctable errors
>>    There is no need to report these to guest. So let's not.
>>
>> 2. Fatal errors
>>    It's not easy to handle them gracefully since link reset
>>    is needed. As a first step, let's use the existing mechanism
>>    in that case.
>>
>> 2. Non-fatal errors
>>    Here we could make progress by reporting them to guest
>>    and have guest handle them.
>>
>>    Issues:
>>    a. this behaviour should only be enabled with new userspace,
>>       old userspace should work without changes.
>>
>>       Suggestion: One way to address this would be to add a new eventfd
>>       non_fatal_err_trigger. If not set, invoke err_trigger.
>>
>>    b. drivers are supposed to stop MMIO when error is reported,
>>       if vm keeps going, we will keep doing MMIO/config.
>>
>>       Suggestion 1: ignore this. vm stop happens much later when
>>       userspace runs anyway, so we are not making things much worse.
>>
>>       Suggestion 2: try to stop MMIO/config, resume on resume call
>>
>>       Patch below implements Suggestion 1.
>>
>>       Note that although this is really against the documentation, which
>>       states error_detected() is the point at which the driver should quiesce
>>       the device and not touch it further (until diagnostic poking at
>>       mmio_enabled or full access at resume callback).
>>
>>       Fixing this won't be easy. However, this is not a regression.
>>
>>       Also note this does nothing about interrupts, documentation
>>       suggests returning IRQ_NONE until reset.
>>       Again, not a regression.
>>
>>    c. PF driver might detect that function is completely broken,
>>       if vm keeps going, we will keep doing MMIO/config.
>>
>>       Suggestion 1: ignore this. vm stop happens much later when
>>       userspace runs anyway, so we are not making things much worse.
>>
>>       Suggestion 2: detect this and invoke err_trigger to stop VM.
>>
>>       Patch below implements Suggestion 2.
>>
>> Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> 
> It's more than this, you are really reusing parts of my patch,
> so you should say so and include my signature.
> 
> If you only added a line or two you can keep
> the original author. To do this you add
> 	From: Michael S. Tsirkin <mst@redhat.com>
> before commit text.

On this topic, I am really bewildered for a while, thanks for clarification.

-- 
Sincerely,
Cao jin

> 
>> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
>> ---
> 
> Changelog from my v3?
> 
>>  drivers/vfio/pci/vfio_pci.c         | 38 +++++++++++++++++++++++++++++++++++--
>>  drivers/vfio/pci/vfio_pci_intrs.c   | 19 +++++++++++++++++++
>>  drivers/vfio/pci/vfio_pci_private.h |  1 +
>>  include/uapi/linux/vfio.h           |  1 +
>>  4 files changed, 57 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>> index 324c52e..3551cc9 100644
>> --- a/drivers/vfio/pci/vfio_pci.c
>> +++ b/drivers/vfio/pci/vfio_pci.c
>> @@ -441,7 +441,8 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
>>  
>>  			return (flags & PCI_MSIX_FLAGS_QSIZE) + 1;
>>  		}
>> -	} else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) {
>> +	} else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX ||
>> +		   irq_type == VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX) {
>>  		if (pci_is_pcie(vdev->pdev))
>>  			return 1;
>>  	} else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) {
>> @@ -796,6 +797,7 @@ static long vfio_pci_ioctl(void *device_data,
>>  		case VFIO_PCI_REQ_IRQ_INDEX:
>>  			break;
>>  		case VFIO_PCI_ERR_IRQ_INDEX:
>> +		case VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX:
>>  			if (pci_is_pcie(vdev->pdev))
>>  				break;
>>  		/* pass thru to return error */
>> @@ -1282,7 +1284,9 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
>>  
>>  	mutex_lock(&vdev->igate);
>>  
>> -	if (vdev->err_trigger)
>> +	if (state == pci_channel_io_normal && vdev->non_fatal_err_trigger)
>> +		eventfd_signal(vdev->non_fatal_err_trigger, 1);
>> +	else if (vdev->err_trigger)
>>  		eventfd_signal(vdev->err_trigger, 1);
>>  
>>  	mutex_unlock(&vdev->igate);
>> @@ -1292,8 +1296,38 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
>>  	return PCI_ERS_RESULT_CAN_RECOVER;
>>  }
>>  
>> +static pci_ers_result_t vfio_pci_aer_slot_reset(struct pci_dev *pdev)
>> +{
>> +	struct vfio_pci_device *vdev;
>> +	struct vfio_device *device;
>> +	static pci_ers_result_t err = PCI_ERS_RESULT_NONE;
>> +
>> +	device = vfio_device_get_from_dev(&pdev->dev);
>> +	if (!device)
>> +		goto err_dev;
>> +
>> +	vdev = vfio_device_data(device);
>> +	if (!vdev)
>> +		goto err_data;
>> +
>> +	mutex_lock(&vdev->igate);
>> +
>> +	if (vdev->err_trigger)
>> +		eventfd_signal(vdev->err_trigger, 1);
>> +
>> +	mutex_unlock(&vdev->igate);
>> +
>> +	err = PCI_ERS_RESULT_RECOVERED;
>> +
>> +err_data:
>> +	vfio_device_put(device);
>> +err_dev:
>> +	return err;
>> +}
>> +
>>  static const struct pci_error_handlers vfio_err_handlers = {
>>  	.error_detected = vfio_pci_aer_err_detected,
>> +	.slot_reset = vfio_pci_aer_slot_reset,
>>  };
>>  
>>  static struct pci_driver vfio_pci_driver = {
>> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
>> index 1c46045..270d568 100644
>> --- a/drivers/vfio/pci/vfio_pci_intrs.c
>> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
>> @@ -611,6 +611,17 @@ static int vfio_pci_set_err_trigger(struct vfio_pci_device *vdev,
>>  					       count, flags, data);
>>  }
>>  
>> +static int vfio_pci_set_non_fatal_err_trigger(struct vfio_pci_device *vdev,
>> +				    unsigned index, unsigned start,
>> +				    unsigned count, uint32_t flags, void *data)
>> +{
>> +	if (index != VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX || start != 0 || count > 1)
>> +		return -EINVAL;
>> +
>> +	return vfio_pci_set_ctx_trigger_single(&vdev->non_fatal_err_trigger,
>> +					       count, flags, data);
>> +}
>> +
>>  static int vfio_pci_set_req_trigger(struct vfio_pci_device *vdev,
>>  				    unsigned index, unsigned start,
>>  				    unsigned count, uint32_t flags, void *data)
>> @@ -664,6 +675,14 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev, uint32_t flags,
>>  			break;
>>  		}
>>  		break;
>> +	case VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX:
>> +		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
>> +		case VFIO_IRQ_SET_ACTION_TRIGGER:
>> +			if (pci_is_pcie(vdev->pdev))
>> +				func = vfio_pci_set_non_fatal_err_trigger;
>> +			break;
>> +		}
>> +		break;
>>  	case VFIO_PCI_REQ_IRQ_INDEX:
>>  		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
>>  		case VFIO_IRQ_SET_ACTION_TRIGGER:
>> diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
>> index f561ac1..d30f8a3 100644
>> --- a/drivers/vfio/pci/vfio_pci_private.h
>> +++ b/drivers/vfio/pci/vfio_pci_private.h
>> @@ -93,6 +93,7 @@ struct vfio_pci_device {
>>  	struct pci_saved_state	*pci_saved_state;
>>  	int			refcnt;
>>  	struct eventfd_ctx	*err_trigger;
>> +	struct eventfd_ctx	*non_fatal_err_trigger;
>>  	struct eventfd_ctx	*req_trigger;
>>  	struct list_head	dummy_resources_list;
>>  };
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index 519eff3..affa973 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -443,6 +443,7 @@ enum {
>>  	VFIO_PCI_MSIX_IRQ_INDEX,
>>  	VFIO_PCI_ERR_IRQ_INDEX,
>>  	VFIO_PCI_REQ_IRQ_INDEX,
>> +	VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX,
>>  	VFIO_PCI_NUM_IRQS
>>  };
>>  
>> -- 
>> 1.8.3.1
>>
>>
> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH] vfio pci: kernel support of error recovery only for non fatal error
@ 2017-02-28  1:31     ` Cao jin
  0 siblings, 0 replies; 22+ messages in thread
From: Cao jin @ 2017-02-28  1:31 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, kvm, qemu-devel, alex.williamson, izumi.taku



On 02/28/2017 12:16 AM, Michael S. Tsirkin wrote:
> On Mon, Feb 27, 2017 at 03:28:43PM +0800, Cao jin wrote:
>> Subject: Re: [PATCH] vfio pci: kernel support of error recovery only for non
>> fatal error
> 
> Don't make the subject so long. This is why I had
> 	[PATCH v3] vfio error recovery: kernel support
> you also want to add versioning as you inherited my v3,
> you should make this v4 etc.
> 

Ok. I didn' see [PATCH v3], I guess I am not CCed.

>> 0. What happens now (PCIE AER only)
>>    Fatal errors cause a link reset.
>>    Non fatal errors don't.
>>    All errors stop the VM eventually, but not immediately
>>    because it's detected and reported asynchronously.
>>    Interrupts are forwarded as usual.
>>    Correctable errors are not reported to guest at all.
>>    Note: PPC EEH is different. This focuses on AER.
>>
>> 1. Correctable errors
>>    There is no need to report these to guest. So let's not.
>>
>> 2. Fatal errors
>>    It's not easy to handle them gracefully since link reset
>>    is needed. As a first step, let's use the existing mechanism
>>    in that case.
>>
>> 2. Non-fatal errors
>>    Here we could make progress by reporting them to guest
>>    and have guest handle them.
>>
>>    Issues:
>>    a. this behaviour should only be enabled with new userspace,
>>       old userspace should work without changes.
>>
>>       Suggestion: One way to address this would be to add a new eventfd
>>       non_fatal_err_trigger. If not set, invoke err_trigger.
>>
>>    b. drivers are supposed to stop MMIO when error is reported,
>>       if vm keeps going, we will keep doing MMIO/config.
>>
>>       Suggestion 1: ignore this. vm stop happens much later when
>>       userspace runs anyway, so we are not making things much worse.
>>
>>       Suggestion 2: try to stop MMIO/config, resume on resume call
>>
>>       Patch below implements Suggestion 1.
>>
>>       Note that although this is really against the documentation, which
>>       states error_detected() is the point at which the driver should quiesce
>>       the device and not touch it further (until diagnostic poking at
>>       mmio_enabled or full access at resume callback).
>>
>>       Fixing this won't be easy. However, this is not a regression.
>>
>>       Also note this does nothing about interrupts, documentation
>>       suggests returning IRQ_NONE until reset.
>>       Again, not a regression.
>>
>>    c. PF driver might detect that function is completely broken,
>>       if vm keeps going, we will keep doing MMIO/config.
>>
>>       Suggestion 1: ignore this. vm stop happens much later when
>>       userspace runs anyway, so we are not making things much worse.
>>
>>       Suggestion 2: detect this and invoke err_trigger to stop VM.
>>
>>       Patch below implements Suggestion 2.
>>
>> Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> 
> It's more than this, you are really reusing parts of my patch,
> so you should say so and include my signature.
> 
> If you only added a line or two you can keep
> the original author. To do this you add
> 	From: Michael S. Tsirkin <mst@redhat.com>
> before commit text.

On this topic, I am really bewildered for a while, thanks for clarification.

-- 
Sincerely,
Cao jin

> 
>> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
>> ---
> 
> Changelog from my v3?
> 
>>  drivers/vfio/pci/vfio_pci.c         | 38 +++++++++++++++++++++++++++++++++++--
>>  drivers/vfio/pci/vfio_pci_intrs.c   | 19 +++++++++++++++++++
>>  drivers/vfio/pci/vfio_pci_private.h |  1 +
>>  include/uapi/linux/vfio.h           |  1 +
>>  4 files changed, 57 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>> index 324c52e..3551cc9 100644
>> --- a/drivers/vfio/pci/vfio_pci.c
>> +++ b/drivers/vfio/pci/vfio_pci.c
>> @@ -441,7 +441,8 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
>>  
>>  			return (flags & PCI_MSIX_FLAGS_QSIZE) + 1;
>>  		}
>> -	} else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) {
>> +	} else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX ||
>> +		   irq_type == VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX) {
>>  		if (pci_is_pcie(vdev->pdev))
>>  			return 1;
>>  	} else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) {
>> @@ -796,6 +797,7 @@ static long vfio_pci_ioctl(void *device_data,
>>  		case VFIO_PCI_REQ_IRQ_INDEX:
>>  			break;
>>  		case VFIO_PCI_ERR_IRQ_INDEX:
>> +		case VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX:
>>  			if (pci_is_pcie(vdev->pdev))
>>  				break;
>>  		/* pass thru to return error */
>> @@ -1282,7 +1284,9 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
>>  
>>  	mutex_lock(&vdev->igate);
>>  
>> -	if (vdev->err_trigger)
>> +	if (state == pci_channel_io_normal && vdev->non_fatal_err_trigger)
>> +		eventfd_signal(vdev->non_fatal_err_trigger, 1);
>> +	else if (vdev->err_trigger)
>>  		eventfd_signal(vdev->err_trigger, 1);
>>  
>>  	mutex_unlock(&vdev->igate);
>> @@ -1292,8 +1296,38 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
>>  	return PCI_ERS_RESULT_CAN_RECOVER;
>>  }
>>  
>> +static pci_ers_result_t vfio_pci_aer_slot_reset(struct pci_dev *pdev)
>> +{
>> +	struct vfio_pci_device *vdev;
>> +	struct vfio_device *device;
>> +	static pci_ers_result_t err = PCI_ERS_RESULT_NONE;
>> +
>> +	device = vfio_device_get_from_dev(&pdev->dev);
>> +	if (!device)
>> +		goto err_dev;
>> +
>> +	vdev = vfio_device_data(device);
>> +	if (!vdev)
>> +		goto err_data;
>> +
>> +	mutex_lock(&vdev->igate);
>> +
>> +	if (vdev->err_trigger)
>> +		eventfd_signal(vdev->err_trigger, 1);
>> +
>> +	mutex_unlock(&vdev->igate);
>> +
>> +	err = PCI_ERS_RESULT_RECOVERED;
>> +
>> +err_data:
>> +	vfio_device_put(device);
>> +err_dev:
>> +	return err;
>> +}
>> +
>>  static const struct pci_error_handlers vfio_err_handlers = {
>>  	.error_detected = vfio_pci_aer_err_detected,
>> +	.slot_reset = vfio_pci_aer_slot_reset,
>>  };
>>  
>>  static struct pci_driver vfio_pci_driver = {
>> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
>> index 1c46045..270d568 100644
>> --- a/drivers/vfio/pci/vfio_pci_intrs.c
>> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
>> @@ -611,6 +611,17 @@ static int vfio_pci_set_err_trigger(struct vfio_pci_device *vdev,
>>  					       count, flags, data);
>>  }
>>  
>> +static int vfio_pci_set_non_fatal_err_trigger(struct vfio_pci_device *vdev,
>> +				    unsigned index, unsigned start,
>> +				    unsigned count, uint32_t flags, void *data)
>> +{
>> +	if (index != VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX || start != 0 || count > 1)
>> +		return -EINVAL;
>> +
>> +	return vfio_pci_set_ctx_trigger_single(&vdev->non_fatal_err_trigger,
>> +					       count, flags, data);
>> +}
>> +
>>  static int vfio_pci_set_req_trigger(struct vfio_pci_device *vdev,
>>  				    unsigned index, unsigned start,
>>  				    unsigned count, uint32_t flags, void *data)
>> @@ -664,6 +675,14 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev, uint32_t flags,
>>  			break;
>>  		}
>>  		break;
>> +	case VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX:
>> +		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
>> +		case VFIO_IRQ_SET_ACTION_TRIGGER:
>> +			if (pci_is_pcie(vdev->pdev))
>> +				func = vfio_pci_set_non_fatal_err_trigger;
>> +			break;
>> +		}
>> +		break;
>>  	case VFIO_PCI_REQ_IRQ_INDEX:
>>  		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
>>  		case VFIO_IRQ_SET_ACTION_TRIGGER:
>> diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
>> index f561ac1..d30f8a3 100644
>> --- a/drivers/vfio/pci/vfio_pci_private.h
>> +++ b/drivers/vfio/pci/vfio_pci_private.h
>> @@ -93,6 +93,7 @@ struct vfio_pci_device {
>>  	struct pci_saved_state	*pci_saved_state;
>>  	int			refcnt;
>>  	struct eventfd_ctx	*err_trigger;
>> +	struct eventfd_ctx	*non_fatal_err_trigger;
>>  	struct eventfd_ctx	*req_trigger;
>>  	struct list_head	dummy_resources_list;
>>  };
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index 519eff3..affa973 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -443,6 +443,7 @@ enum {
>>  	VFIO_PCI_MSIX_IRQ_INDEX,
>>  	VFIO_PCI_ERR_IRQ_INDEX,
>>  	VFIO_PCI_REQ_IRQ_INDEX,
>> +	VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX,
>>  	VFIO_PCI_NUM_IRQS
>>  };
>>  
>> -- 
>> 1.8.3.1
>>
>>
> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] vfio pci: kernel support of error recovery only for non fatal error
  2017-02-27  7:28 ` Cao jin
@ 2017-03-13 22:06   ` Alex Williamson
  -1 siblings, 0 replies; 22+ messages in thread
From: Alex Williamson @ 2017-03-13 22:06 UTC (permalink / raw)
  To: Cao jin; +Cc: linux-kernel, kvm, qemu-devel, mst, izumi.taku

On Mon, 27 Feb 2017 15:28:43 +0800
Cao jin <caoj.fnst@cn.fujitsu.com> wrote:

> 0. What happens now (PCIE AER only)
>    Fatal errors cause a link reset.
>    Non fatal errors don't.
>    All errors stop the VM eventually, but not immediately
>    because it's detected and reported asynchronously.
>    Interrupts are forwarded as usual.
>    Correctable errors are not reported to guest at all.
>    Note: PPC EEH is different. This focuses on AER.

Perhaps you're only focusing on AER, but don't the error handlers we're
using support both AER and EEH generically?  I don't think we can
completely disregard how this affects EEH behavior, if at all.

> 
> 1. Correctable errors
>    There is no need to report these to guest. So let's not.

What does this patch change to make this happen?  I don't see
anything.  Was this always the case?  No change?

> 
> 2. Fatal errors
>    It's not easy to handle them gracefully since link reset
>    is needed. As a first step, let's use the existing mechanism
>    in that case.

Ok, so no change here either.

> 2. Non-fatal errors
>    Here we could make progress by reporting them to guest
>    and have guest handle them.

In practice, what actual errors do we expect userspace to see as
non-fatal errors?  It would be useful for the commit log to describe
the actual benefit we're going to see by splitting out non-fatal errors
for the user (not always a guest) to see separately.  Justify that this
is actually useful.

> 
>    Issues:
>    a. this behaviour should only be enabled with new userspace,
>       old userspace should work without changes.
> 
>       Suggestion: One way to address this would be to add a new eventfd
>       non_fatal_err_trigger. If not set, invoke err_trigger.

This outline format was really more useful for Michael to try to
generate discussion, for a commit log, I'd much rather see a definitive
statement such as:

 "To maintain backwards compatibility with userspace, non-fatal errors
 will continue to trigger via the existing error interrupt index if a
 non-fatal signaling mechanism has not been registered."

>    b. drivers are supposed to stop MMIO when error is reported,
>       if vm keeps going, we will keep doing MMIO/config.
> 
>       Suggestion 1: ignore this. vm stop happens much later when
>       userspace runs anyway, so we are not making things much worse.
> 
>       Suggestion 2: try to stop MMIO/config, resume on resume call
> 
>       Patch below implements Suggestion 1.
> 
>       Note that although this is really against the documentation, which
>       states error_detected() is the point at which the driver should quiesce
>       the device and not touch it further (until diagnostic poking at
>       mmio_enabled or full access at resume callback).
> 
>       Fixing this won't be easy. However, this is not a regression.
> 
>       Also note this does nothing about interrupts, documentation
>       suggests returning IRQ_NONE until reset.
>       Again, not a regression.

So again, no change here.  I'm not sure what this adds to the commit
log, perhaps we can reference this as a link to Michael's original
proposal.
 
>    c. PF driver might detect that function is completely broken,
>       if vm keeps going, we will keep doing MMIO/config.
> 
>       Suggestion 1: ignore this. vm stop happens much later when
>       userspace runs anyway, so we are not making things much worse.
> 
>       Suggestion 2: detect this and invoke err_trigger to stop VM.
> 
>       Patch below implements Suggestion 2.

This needs more description and seems a bit misleading.  This patch
adds a slot_reset handler, such that if the slot is reset, we notify
the user, essentially promoting the non-fatal error to fatal.  But what
condition gets us to this point?  AIUI, AER is a voting scheme and if
any driver affected says they need a reset, everyone gets a reset.  So
the PF driver we're talking about here is not vfio-pci and it's not the
user, the user has no way to signal that the device is completely
broken, this only handles the case of other collateral devices with
native host drivers that might signal this, right?

It seems like this is where this patch has the greatest exposure to
regressions.  If we take the VM use case, previously we could have a
non-AER aware guest and the hypervisor could stop the VM on all
errors.  Now the hypervisor might support the distinction between fatal
and non-fatal, but the guest may still not have AER support.  That
doesn't imply a problem with this approach, the user (hypervisor) would
be at fault for any difference in handling in that case.

> Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
> ---
>  drivers/vfio/pci/vfio_pci.c         | 38 +++++++++++++++++++++++++++++++++++--
>  drivers/vfio/pci/vfio_pci_intrs.c   | 19 +++++++++++++++++++
>  drivers/vfio/pci/vfio_pci_private.h |  1 +
>  include/uapi/linux/vfio.h           |  1 +
>  4 files changed, 57 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 324c52e..3551cc9 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -441,7 +441,8 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
>  
>  			return (flags & PCI_MSIX_FLAGS_QSIZE) + 1;
>  		}
> -	} else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) {
> +	} else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX ||
> +		   irq_type == VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX) {
>  		if (pci_is_pcie(vdev->pdev))
>  			return 1;
>  	} else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) {
> @@ -796,6 +797,7 @@ static long vfio_pci_ioctl(void *device_data,
>  		case VFIO_PCI_REQ_IRQ_INDEX:
>  			break;
>  		case VFIO_PCI_ERR_IRQ_INDEX:
> +		case VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX:
>  			if (pci_is_pcie(vdev->pdev))
>  				break;
>  		/* pass thru to return error */
> @@ -1282,7 +1284,9 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
>  
>  	mutex_lock(&vdev->igate);
>  
> -	if (vdev->err_trigger)
> +	if (state == pci_channel_io_normal && vdev->non_fatal_err_trigger)
> +		eventfd_signal(vdev->non_fatal_err_trigger, 1);
> +	else if (vdev->err_trigger)
>  		eventfd_signal(vdev->err_trigger, 1);
>  
>  	mutex_unlock(&vdev->igate);
> @@ -1292,8 +1296,38 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
>  	return PCI_ERS_RESULT_CAN_RECOVER;
>  }
>  
> +static pci_ers_result_t vfio_pci_aer_slot_reset(struct pci_dev *pdev)
> +{
> +	struct vfio_pci_device *vdev;
> +	struct vfio_device *device;
> +	static pci_ers_result_t err = PCI_ERS_RESULT_NONE;
> +
> +	device = vfio_device_get_from_dev(&pdev->dev);
> +	if (!device)
> +		goto err_dev;
> +
> +	vdev = vfio_device_data(device);
> +	if (!vdev)
> +		goto err_data;
> +
> +	mutex_lock(&vdev->igate);
> +
> +	if (vdev->err_trigger)
> +		eventfd_signal(vdev->err_trigger, 1);

What about the case where the user has not registered for receiving
non-fatal errors, now we send an error signal on both error_detected
and slot_reset.  Is that useful/desirable?

> +
> +	mutex_unlock(&vdev->igate);
> +
> +	err = PCI_ERS_RESULT_RECOVERED;
> +
> +err_data:
> +	vfio_device_put(device);
> +err_dev:
> +	return err;
> +}
> +
>  static const struct pci_error_handlers vfio_err_handlers = {
>  	.error_detected = vfio_pci_aer_err_detected,
> +	.slot_reset = vfio_pci_aer_slot_reset,
>  };
>  
>  static struct pci_driver vfio_pci_driver = {
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 1c46045..270d568 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -611,6 +611,17 @@ static int vfio_pci_set_err_trigger(struct vfio_pci_device *vdev,
>  					       count, flags, data);
>  }
>  
> +static int vfio_pci_set_non_fatal_err_trigger(struct vfio_pci_device *vdev,
> +				    unsigned index, unsigned start,
> +				    unsigned count, uint32_t flags, void *data)
> +{
> +	if (index != VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX || start != 0 || count > 1)
> +		return -EINVAL;
> +
> +	return vfio_pci_set_ctx_trigger_single(&vdev->non_fatal_err_trigger,
> +					       count, flags, data);
> +}
> +
>  static int vfio_pci_set_req_trigger(struct vfio_pci_device *vdev,
>  				    unsigned index, unsigned start,
>  				    unsigned count, uint32_t flags, void *data)
> @@ -664,6 +675,14 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev, uint32_t flags,
>  			break;
>  		}
>  		break;
> +	case VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX:
> +		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
> +		case VFIO_IRQ_SET_ACTION_TRIGGER:
> +			if (pci_is_pcie(vdev->pdev))
> +				func = vfio_pci_set_non_fatal_err_trigger;
> +			break;
> +		}
> +		break;
>  	case VFIO_PCI_REQ_IRQ_INDEX:
>  		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
>  		case VFIO_IRQ_SET_ACTION_TRIGGER:
> diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
> index f561ac1..d30f8a3 100644
> --- a/drivers/vfio/pci/vfio_pci_private.h
> +++ b/drivers/vfio/pci/vfio_pci_private.h
> @@ -93,6 +93,7 @@ struct vfio_pci_device {
>  	struct pci_saved_state	*pci_saved_state;
>  	int			refcnt;
>  	struct eventfd_ctx	*err_trigger;
> +	struct eventfd_ctx	*non_fatal_err_trigger;
>  	struct eventfd_ctx	*req_trigger;
>  	struct list_head	dummy_resources_list;
>  };
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 519eff3..affa973 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -443,6 +443,7 @@ enum {
>  	VFIO_PCI_MSIX_IRQ_INDEX,
>  	VFIO_PCI_ERR_IRQ_INDEX,
>  	VFIO_PCI_REQ_IRQ_INDEX,
> +	VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX,
>  	VFIO_PCI_NUM_IRQS
>  };
>  

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH] vfio pci: kernel support of error recovery only for non fatal error
@ 2017-03-13 22:06   ` Alex Williamson
  0 siblings, 0 replies; 22+ messages in thread
From: Alex Williamson @ 2017-03-13 22:06 UTC (permalink / raw)
  To: Cao jin; +Cc: linux-kernel, kvm, qemu-devel, mst, izumi.taku

On Mon, 27 Feb 2017 15:28:43 +0800
Cao jin <caoj.fnst@cn.fujitsu.com> wrote:

> 0. What happens now (PCIE AER only)
>    Fatal errors cause a link reset.
>    Non fatal errors don't.
>    All errors stop the VM eventually, but not immediately
>    because it's detected and reported asynchronously.
>    Interrupts are forwarded as usual.
>    Correctable errors are not reported to guest at all.
>    Note: PPC EEH is different. This focuses on AER.

Perhaps you're only focusing on AER, but don't the error handlers we're
using support both AER and EEH generically?  I don't think we can
completely disregard how this affects EEH behavior, if at all.

> 
> 1. Correctable errors
>    There is no need to report these to guest. So let's not.

What does this patch change to make this happen?  I don't see
anything.  Was this always the case?  No change?

> 
> 2. Fatal errors
>    It's not easy to handle them gracefully since link reset
>    is needed. As a first step, let's use the existing mechanism
>    in that case.

Ok, so no change here either.

> 2. Non-fatal errors
>    Here we could make progress by reporting them to guest
>    and have guest handle them.

In practice, what actual errors do we expect userspace to see as
non-fatal errors?  It would be useful for the commit log to describe
the actual benefit we're going to see by splitting out non-fatal errors
for the user (not always a guest) to see separately.  Justify that this
is actually useful.

> 
>    Issues:
>    a. this behaviour should only be enabled with new userspace,
>       old userspace should work without changes.
> 
>       Suggestion: One way to address this would be to add a new eventfd
>       non_fatal_err_trigger. If not set, invoke err_trigger.

This outline format was really more useful for Michael to try to
generate discussion, for a commit log, I'd much rather see a definitive
statement such as:

 "To maintain backwards compatibility with userspace, non-fatal errors
 will continue to trigger via the existing error interrupt index if a
 non-fatal signaling mechanism has not been registered."

>    b. drivers are supposed to stop MMIO when error is reported,
>       if vm keeps going, we will keep doing MMIO/config.
> 
>       Suggestion 1: ignore this. vm stop happens much later when
>       userspace runs anyway, so we are not making things much worse.
> 
>       Suggestion 2: try to stop MMIO/config, resume on resume call
> 
>       Patch below implements Suggestion 1.
> 
>       Note that although this is really against the documentation, which
>       states error_detected() is the point at which the driver should quiesce
>       the device and not touch it further (until diagnostic poking at
>       mmio_enabled or full access at resume callback).
> 
>       Fixing this won't be easy. However, this is not a regression.
> 
>       Also note this does nothing about interrupts, documentation
>       suggests returning IRQ_NONE until reset.
>       Again, not a regression.

So again, no change here.  I'm not sure what this adds to the commit
log, perhaps we can reference this as a link to Michael's original
proposal.
 
>    c. PF driver might detect that function is completely broken,
>       if vm keeps going, we will keep doing MMIO/config.
> 
>       Suggestion 1: ignore this. vm stop happens much later when
>       userspace runs anyway, so we are not making things much worse.
> 
>       Suggestion 2: detect this and invoke err_trigger to stop VM.
> 
>       Patch below implements Suggestion 2.

This needs more description and seems a bit misleading.  This patch
adds a slot_reset handler, such that if the slot is reset, we notify
the user, essentially promoting the non-fatal error to fatal.  But what
condition gets us to this point?  AIUI, AER is a voting scheme and if
any driver affected says they need a reset, everyone gets a reset.  So
the PF driver we're talking about here is not vfio-pci and it's not the
user, the user has no way to signal that the device is completely
broken, this only handles the case of other collateral devices with
native host drivers that might signal this, right?

It seems like this is where this patch has the greatest exposure to
regressions.  If we take the VM use case, previously we could have a
non-AER aware guest and the hypervisor could stop the VM on all
errors.  Now the hypervisor might support the distinction between fatal
and non-fatal, but the guest may still not have AER support.  That
doesn't imply a problem with this approach, the user (hypervisor) would
be at fault for any difference in handling in that case.

> Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
> ---
>  drivers/vfio/pci/vfio_pci.c         | 38 +++++++++++++++++++++++++++++++++++--
>  drivers/vfio/pci/vfio_pci_intrs.c   | 19 +++++++++++++++++++
>  drivers/vfio/pci/vfio_pci_private.h |  1 +
>  include/uapi/linux/vfio.h           |  1 +
>  4 files changed, 57 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 324c52e..3551cc9 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -441,7 +441,8 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
>  
>  			return (flags & PCI_MSIX_FLAGS_QSIZE) + 1;
>  		}
> -	} else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) {
> +	} else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX ||
> +		   irq_type == VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX) {
>  		if (pci_is_pcie(vdev->pdev))
>  			return 1;
>  	} else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) {
> @@ -796,6 +797,7 @@ static long vfio_pci_ioctl(void *device_data,
>  		case VFIO_PCI_REQ_IRQ_INDEX:
>  			break;
>  		case VFIO_PCI_ERR_IRQ_INDEX:
> +		case VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX:
>  			if (pci_is_pcie(vdev->pdev))
>  				break;
>  		/* pass thru to return error */
> @@ -1282,7 +1284,9 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
>  
>  	mutex_lock(&vdev->igate);
>  
> -	if (vdev->err_trigger)
> +	if (state == pci_channel_io_normal && vdev->non_fatal_err_trigger)
> +		eventfd_signal(vdev->non_fatal_err_trigger, 1);
> +	else if (vdev->err_trigger)
>  		eventfd_signal(vdev->err_trigger, 1);
>  
>  	mutex_unlock(&vdev->igate);
> @@ -1292,8 +1296,38 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
>  	return PCI_ERS_RESULT_CAN_RECOVER;
>  }
>  
> +static pci_ers_result_t vfio_pci_aer_slot_reset(struct pci_dev *pdev)
> +{
> +	struct vfio_pci_device *vdev;
> +	struct vfio_device *device;
> +	static pci_ers_result_t err = PCI_ERS_RESULT_NONE;
> +
> +	device = vfio_device_get_from_dev(&pdev->dev);
> +	if (!device)
> +		goto err_dev;
> +
> +	vdev = vfio_device_data(device);
> +	if (!vdev)
> +		goto err_data;
> +
> +	mutex_lock(&vdev->igate);
> +
> +	if (vdev->err_trigger)
> +		eventfd_signal(vdev->err_trigger, 1);

What about the case where the user has not registered for receiving
non-fatal errors, now we send an error signal on both error_detected
and slot_reset.  Is that useful/desirable?

> +
> +	mutex_unlock(&vdev->igate);
> +
> +	err = PCI_ERS_RESULT_RECOVERED;
> +
> +err_data:
> +	vfio_device_put(device);
> +err_dev:
> +	return err;
> +}
> +
>  static const struct pci_error_handlers vfio_err_handlers = {
>  	.error_detected = vfio_pci_aer_err_detected,
> +	.slot_reset = vfio_pci_aer_slot_reset,
>  };
>  
>  static struct pci_driver vfio_pci_driver = {
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 1c46045..270d568 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -611,6 +611,17 @@ static int vfio_pci_set_err_trigger(struct vfio_pci_device *vdev,
>  					       count, flags, data);
>  }
>  
> +static int vfio_pci_set_non_fatal_err_trigger(struct vfio_pci_device *vdev,
> +				    unsigned index, unsigned start,
> +				    unsigned count, uint32_t flags, void *data)
> +{
> +	if (index != VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX || start != 0 || count > 1)
> +		return -EINVAL;
> +
> +	return vfio_pci_set_ctx_trigger_single(&vdev->non_fatal_err_trigger,
> +					       count, flags, data);
> +}
> +
>  static int vfio_pci_set_req_trigger(struct vfio_pci_device *vdev,
>  				    unsigned index, unsigned start,
>  				    unsigned count, uint32_t flags, void *data)
> @@ -664,6 +675,14 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev, uint32_t flags,
>  			break;
>  		}
>  		break;
> +	case VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX:
> +		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
> +		case VFIO_IRQ_SET_ACTION_TRIGGER:
> +			if (pci_is_pcie(vdev->pdev))
> +				func = vfio_pci_set_non_fatal_err_trigger;
> +			break;
> +		}
> +		break;
>  	case VFIO_PCI_REQ_IRQ_INDEX:
>  		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
>  		case VFIO_IRQ_SET_ACTION_TRIGGER:
> diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
> index f561ac1..d30f8a3 100644
> --- a/drivers/vfio/pci/vfio_pci_private.h
> +++ b/drivers/vfio/pci/vfio_pci_private.h
> @@ -93,6 +93,7 @@ struct vfio_pci_device {
>  	struct pci_saved_state	*pci_saved_state;
>  	int			refcnt;
>  	struct eventfd_ctx	*err_trigger;
> +	struct eventfd_ctx	*non_fatal_err_trigger;
>  	struct eventfd_ctx	*req_trigger;
>  	struct list_head	dummy_resources_list;
>  };
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 519eff3..affa973 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -443,6 +443,7 @@ enum {
>  	VFIO_PCI_MSIX_IRQ_INDEX,
>  	VFIO_PCI_ERR_IRQ_INDEX,
>  	VFIO_PCI_REQ_IRQ_INDEX,
> +	VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX,
>  	VFIO_PCI_NUM_IRQS
>  };
>  

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] vfio pci: kernel support of error recovery only for non fatal error
  2017-03-13 22:06   ` [Qemu-devel] " Alex Williamson
@ 2017-03-20 12:50     ` Cao jin
  -1 siblings, 0 replies; 22+ messages in thread
From: Cao jin @ 2017-03-20 12:50 UTC (permalink / raw)
  To: Alex Williamson; +Cc: linux-kernel, kvm, qemu-devel, mst, izumi.taku

Sorry for late.

On 03/14/2017 06:06 AM, Alex Williamson wrote:
> On Mon, 27 Feb 2017 15:28:43 +0800
> Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
> 
>> 0. What happens now (PCIE AER only)
>>    Fatal errors cause a link reset.
>>    Non fatal errors don't.
>>    All errors stop the VM eventually, but not immediately
>>    because it's detected and reported asynchronously.
>>    Interrupts are forwarded as usual.
>>    Correctable errors are not reported to guest at all.
>>    Note: PPC EEH is different. This focuses on AER.
> 
> Perhaps you're only focusing on AER, but don't the error handlers we're
> using support both AER and EEH generically?  I don't think we can
> completely disregard how this affects EEH behavior, if at all.
> 

After taking a rough look at the EEH,  find that EEH always feed
error_detected with pci_channel_io_frozen, from perspective of
error_detected, EEH is not affected.  

I am not sure about a question: when assign devices in spapr host,
should all functions/devices in a PE be bound to vfio? I am kind of
confused about the relationship between a PE & a tce iommu group

>>
>> 1. Correctable errors
>>    There is no need to report these to guest. So let's not.
> 
> What does this patch change to make this happen?  I don't see
> anything.  Was this always the case?  No change?
> 

yes, no change on correctable error.

>>
>> 2. Fatal errors
>>    It's not easy to handle them gracefully since link reset
>>    is needed. As a first step, let's use the existing mechanism
>>    in that case.
> 
> Ok, so no change here either.
> 
>> 2. Non-fatal errors
>>    Here we could make progress by reporting them to guest
>>    and have guest handle them.
> 
> In practice, what actual errors do we expect userspace to see as
> non-fatal errors? It would be useful for the commit log to describe
> the actual benefit we're going to see by splitting out non-fatal errors
> for the user (not always a guest) to see separately.  Justify that this
> is actually useful.
> 
>>
>>    Issues:
>>    a. this behaviour should only be enabled with new userspace,
>>       old userspace should work without changes.
>>
>>       Suggestion: One way to address this would be to add a new eventfd
>>       non_fatal_err_trigger. If not set, invoke err_trigger.
> 
> This outline format was really more useful for Michael to try to
> generate discussion, for a commit log, I'd much rather see a definitive
> statement such as:
> 
>  "To maintain backwards compatibility with userspace, non-fatal errors
>  will continue to trigger via the existing error interrupt index if a
>  non-fatal signaling mechanism has not been registered."
> 
>>    b. drivers are supposed to stop MMIO when error is reported,
>>       if vm keeps going, we will keep doing MMIO/config.
>>
>>       Suggestion 1: ignore this. vm stop happens much later when
>>       userspace runs anyway, so we are not making things much worse.
>>
>>       Suggestion 2: try to stop MMIO/config, resume on resume call
>>
>>       Patch below implements Suggestion 1.
>>
>>       Note that although this is really against the documentation, which
>>       states error_detected() is the point at which the driver should quiesce
>>       the device and not touch it further (until diagnostic poking at
>>       mmio_enabled or full access at resume callback).
>>
>>       Fixing this won't be easy. However, this is not a regression.
>>
>>       Also note this does nothing about interrupts, documentation
>>       suggests returning IRQ_NONE until reset.
>>       Again, not a regression.
> 
> So again, no change here.  I'm not sure what this adds to the commit
> log, perhaps we can reference this as a link to Michael's original
> proposal.
>  
>>    c. PF driver might detect that function is completely broken,
>>       if vm keeps going, we will keep doing MMIO/config.
>>
>>       Suggestion 1: ignore this. vm stop happens much later when
>>       userspace runs anyway, so we are not making things much worse.
>>
>>       Suggestion 2: detect this and invoke err_trigger to stop VM.
>>
>>       Patch below implements Suggestion 2.
> 
> This needs more description and seems a bit misleading.  This patch
> adds a slot_reset handler, such that if the slot is reset, we notify
> the user, essentially promoting the non-fatal error to fatal.  But what
> condition gets us to this point?  AIUI, AER is a voting scheme and if
> any driver affected says they need a reset, everyone gets a reset.  So
> the PF driver we're talking about here is not vfio-pci and it's not the
> user, the user has no way to signal that the device is completely
> broken, this only handles the case of other collateral devices with
> native host drivers that might signal this, right?
> 

Yes, same understanding as you, if I don't miss something from michael.

> It seems like this is where this patch has the greatest exposure to
> regressions.  If we take the VM use case, previously we could have a
> non-AER aware guest and the hypervisor could stop the VM on all
> errors.  Now the hypervisor might support the distinction between fatal
> and non-fatal, but the guest may still not have AER support.  That
> doesn't imply a problem with this approach, the user (hypervisor) would
> be at fault for any difference in handling in that case.
> 

>>  
>> +static pci_ers_result_t vfio_pci_aer_slot_reset(struct pci_dev *pdev)
>> +{
>> +	struct vfio_pci_device *vdev;
>> +	struct vfio_device *device;
>> +	static pci_ers_result_t err = PCI_ERS_RESULT_NONE;
>> +
>> +	device = vfio_device_get_from_dev(&pdev->dev);
>> +	if (!device)
>> +		goto err_dev;
>> +
>> +	vdev = vfio_device_data(device);
>> +	if (!vdev)
>> +		goto err_data;
>> +
>> +	mutex_lock(&vdev->igate);
>> +
>> +	if (vdev->err_trigger)
>> +		eventfd_signal(vdev->err_trigger, 1);
> 
> What about the case where the user has not registered for receiving
> non-fatal errors, now we send an error signal on both error_detected
> and slot_reset.  Is that useful/desirable?
> 

Not desirable, but seems not harmful, guest user will stop anyway. How
to avoid this case gracefully seems not easy.

-- 
Sincerely,
Cao jin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH] vfio pci: kernel support of error recovery only for non fatal error
@ 2017-03-20 12:50     ` Cao jin
  0 siblings, 0 replies; 22+ messages in thread
From: Cao jin @ 2017-03-20 12:50 UTC (permalink / raw)
  To: Alex Williamson; +Cc: linux-kernel, kvm, qemu-devel, mst, izumi.taku

Sorry for late.

On 03/14/2017 06:06 AM, Alex Williamson wrote:
> On Mon, 27 Feb 2017 15:28:43 +0800
> Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
> 
>> 0. What happens now (PCIE AER only)
>>    Fatal errors cause a link reset.
>>    Non fatal errors don't.
>>    All errors stop the VM eventually, but not immediately
>>    because it's detected and reported asynchronously.
>>    Interrupts are forwarded as usual.
>>    Correctable errors are not reported to guest at all.
>>    Note: PPC EEH is different. This focuses on AER.
> 
> Perhaps you're only focusing on AER, but don't the error handlers we're
> using support both AER and EEH generically?  I don't think we can
> completely disregard how this affects EEH behavior, if at all.
> 

After taking a rough look at the EEH,  find that EEH always feed
error_detected with pci_channel_io_frozen, from perspective of
error_detected, EEH is not affected.  

I am not sure about a question: when assign devices in spapr host,
should all functions/devices in a PE be bound to vfio? I am kind of
confused about the relationship between a PE & a tce iommu group

>>
>> 1. Correctable errors
>>    There is no need to report these to guest. So let's not.
> 
> What does this patch change to make this happen?  I don't see
> anything.  Was this always the case?  No change?
> 

yes, no change on correctable error.

>>
>> 2. Fatal errors
>>    It's not easy to handle them gracefully since link reset
>>    is needed. As a first step, let's use the existing mechanism
>>    in that case.
> 
> Ok, so no change here either.
> 
>> 2. Non-fatal errors
>>    Here we could make progress by reporting them to guest
>>    and have guest handle them.
> 
> In practice, what actual errors do we expect userspace to see as
> non-fatal errors? It would be useful for the commit log to describe
> the actual benefit we're going to see by splitting out non-fatal errors
> for the user (not always a guest) to see separately.  Justify that this
> is actually useful.
> 
>>
>>    Issues:
>>    a. this behaviour should only be enabled with new userspace,
>>       old userspace should work without changes.
>>
>>       Suggestion: One way to address this would be to add a new eventfd
>>       non_fatal_err_trigger. If not set, invoke err_trigger.
> 
> This outline format was really more useful for Michael to try to
> generate discussion, for a commit log, I'd much rather see a definitive
> statement such as:
> 
>  "To maintain backwards compatibility with userspace, non-fatal errors
>  will continue to trigger via the existing error interrupt index if a
>  non-fatal signaling mechanism has not been registered."
> 
>>    b. drivers are supposed to stop MMIO when error is reported,
>>       if vm keeps going, we will keep doing MMIO/config.
>>
>>       Suggestion 1: ignore this. vm stop happens much later when
>>       userspace runs anyway, so we are not making things much worse.
>>
>>       Suggestion 2: try to stop MMIO/config, resume on resume call
>>
>>       Patch below implements Suggestion 1.
>>
>>       Note that although this is really against the documentation, which
>>       states error_detected() is the point at which the driver should quiesce
>>       the device and not touch it further (until diagnostic poking at
>>       mmio_enabled or full access at resume callback).
>>
>>       Fixing this won't be easy. However, this is not a regression.
>>
>>       Also note this does nothing about interrupts, documentation
>>       suggests returning IRQ_NONE until reset.
>>       Again, not a regression.
> 
> So again, no change here.  I'm not sure what this adds to the commit
> log, perhaps we can reference this as a link to Michael's original
> proposal.
>  
>>    c. PF driver might detect that function is completely broken,
>>       if vm keeps going, we will keep doing MMIO/config.
>>
>>       Suggestion 1: ignore this. vm stop happens much later when
>>       userspace runs anyway, so we are not making things much worse.
>>
>>       Suggestion 2: detect this and invoke err_trigger to stop VM.
>>
>>       Patch below implements Suggestion 2.
> 
> This needs more description and seems a bit misleading.  This patch
> adds a slot_reset handler, such that if the slot is reset, we notify
> the user, essentially promoting the non-fatal error to fatal.  But what
> condition gets us to this point?  AIUI, AER is a voting scheme and if
> any driver affected says they need a reset, everyone gets a reset.  So
> the PF driver we're talking about here is not vfio-pci and it's not the
> user, the user has no way to signal that the device is completely
> broken, this only handles the case of other collateral devices with
> native host drivers that might signal this, right?
> 

Yes, same understanding as you, if I don't miss something from michael.

> It seems like this is where this patch has the greatest exposure to
> regressions.  If we take the VM use case, previously we could have a
> non-AER aware guest and the hypervisor could stop the VM on all
> errors.  Now the hypervisor might support the distinction between fatal
> and non-fatal, but the guest may still not have AER support.  That
> doesn't imply a problem with this approach, the user (hypervisor) would
> be at fault for any difference in handling in that case.
> 

>>  
>> +static pci_ers_result_t vfio_pci_aer_slot_reset(struct pci_dev *pdev)
>> +{
>> +	struct vfio_pci_device *vdev;
>> +	struct vfio_device *device;
>> +	static pci_ers_result_t err = PCI_ERS_RESULT_NONE;
>> +
>> +	device = vfio_device_get_from_dev(&pdev->dev);
>> +	if (!device)
>> +		goto err_dev;
>> +
>> +	vdev = vfio_device_data(device);
>> +	if (!vdev)
>> +		goto err_data;
>> +
>> +	mutex_lock(&vdev->igate);
>> +
>> +	if (vdev->err_trigger)
>> +		eventfd_signal(vdev->err_trigger, 1);
> 
> What about the case where the user has not registered for receiving
> non-fatal errors, now we send an error signal on both error_detected
> and slot_reset.  Is that useful/desirable?
> 

Not desirable, but seems not harmful, guest user will stop anyway. How
to avoid this case gracefully seems not easy.

-- 
Sincerely,
Cao jin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] vfio pci: kernel support of error recovery only for non fatal error
  2017-03-20 12:50     ` [Qemu-devel] " Cao jin
@ 2017-03-20 14:30       ` Alex Williamson
  -1 siblings, 0 replies; 22+ messages in thread
From: Alex Williamson @ 2017-03-20 14:30 UTC (permalink / raw)
  To: Cao jin; +Cc: linux-kernel, kvm, qemu-devel, mst, izumi.taku

On Mon, 20 Mar 2017 20:50:39 +0800
Cao jin <caoj.fnst@cn.fujitsu.com> wrote:

> Sorry for late.
> 
> On 03/14/2017 06:06 AM, Alex Williamson wrote:
> > On Mon, 27 Feb 2017 15:28:43 +0800
> > Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
> >   
> >> 0. What happens now (PCIE AER only)
> >>    Fatal errors cause a link reset.
> >>    Non fatal errors don't.
> >>    All errors stop the VM eventually, but not immediately
> >>    because it's detected and reported asynchronously.
> >>    Interrupts are forwarded as usual.
> >>    Correctable errors are not reported to guest at all.
> >>    Note: PPC EEH is different. This focuses on AER.  
> > 
> > Perhaps you're only focusing on AER, but don't the error handlers we're
> > using support both AER and EEH generically?  I don't think we can
> > completely disregard how this affects EEH behavior, if at all.
> >   
> 
> After taking a rough look at the EEH,  find that EEH always feed
> error_detected with pci_channel_io_frozen, from perspective of
> error_detected, EEH is not affected.  
> 
> I am not sure about a question: when assign devices in spapr host,
> should all functions/devices in a PE be bound to vfio? I am kind of
> confused about the relationship between a PE & a tce iommu group

AIUI, yes all devices within the PE are part of the same IOMMU group
and therefore all endpoints must be bound to vfio or pci-stub.

> >>
> >> 1. Correctable errors
> >>    There is no need to report these to guest. So let's not.  
> > 
> > What does this patch change to make this happen?  I don't see
> > anything.  Was this always the case?  No change?
> >   
> 
> yes, no change on correctable error.
> 
> >>
> >> 2. Fatal errors
> >>    It's not easy to handle them gracefully since link reset
> >>    is needed. As a first step, let's use the existing mechanism
> >>    in that case.  
> > 
> > Ok, so no change here either.
> >   
> >> 2. Non-fatal errors
> >>    Here we could make progress by reporting them to guest
> >>    and have guest handle them.  
> > 
> > In practice, what actual errors do we expect userspace to see as
> > non-fatal errors? It would be useful for the commit log to describe
> > the actual benefit we're going to see by splitting out non-fatal errors
> > for the user (not always a guest) to see separately.  Justify that this
> > is actually useful.
> >   
> >>
> >>    Issues:
> >>    a. this behaviour should only be enabled with new userspace,
> >>       old userspace should work without changes.
> >>
> >>       Suggestion: One way to address this would be to add a new eventfd
> >>       non_fatal_err_trigger. If not set, invoke err_trigger.  
> > 
> > This outline format was really more useful for Michael to try to
> > generate discussion, for a commit log, I'd much rather see a definitive
> > statement such as:
> > 
> >  "To maintain backwards compatibility with userspace, non-fatal errors
> >  will continue to trigger via the existing error interrupt index if a
> >  non-fatal signaling mechanism has not been registered."
> >   
> >>    b. drivers are supposed to stop MMIO when error is reported,
> >>       if vm keeps going, we will keep doing MMIO/config.
> >>
> >>       Suggestion 1: ignore this. vm stop happens much later when
> >>       userspace runs anyway, so we are not making things much worse.
> >>
> >>       Suggestion 2: try to stop MMIO/config, resume on resume call
> >>
> >>       Patch below implements Suggestion 1.
> >>
> >>       Note that although this is really against the documentation, which
> >>       states error_detected() is the point at which the driver should quiesce
> >>       the device and not touch it further (until diagnostic poking at
> >>       mmio_enabled or full access at resume callback).
> >>
> >>       Fixing this won't be easy. However, this is not a regression.
> >>
> >>       Also note this does nothing about interrupts, documentation
> >>       suggests returning IRQ_NONE until reset.
> >>       Again, not a regression.  
> > 
> > So again, no change here.  I'm not sure what this adds to the commit
> > log, perhaps we can reference this as a link to Michael's original
> > proposal.
> >    
> >>    c. PF driver might detect that function is completely broken,
> >>       if vm keeps going, we will keep doing MMIO/config.
> >>
> >>       Suggestion 1: ignore this. vm stop happens much later when
> >>       userspace runs anyway, so we are not making things much worse.
> >>
> >>       Suggestion 2: detect this and invoke err_trigger to stop VM.
> >>
> >>       Patch below implements Suggestion 2.  
> > 
> > This needs more description and seems a bit misleading.  This patch
> > adds a slot_reset handler, such that if the slot is reset, we notify
> > the user, essentially promoting the non-fatal error to fatal.  But what
> > condition gets us to this point?  AIUI, AER is a voting scheme and if
> > any driver affected says they need a reset, everyone gets a reset.  So
> > the PF driver we're talking about here is not vfio-pci and it's not the
> > user, the user has no way to signal that the device is completely
> > broken, this only handles the case of other collateral devices with
> > native host drivers that might signal this, right?
> >   
> 
> Yes, same understanding as you, if I don't miss something from michael.
> 
> > It seems like this is where this patch has the greatest exposure to
> > regressions.  If we take the VM use case, previously we could have a
> > non-AER aware guest and the hypervisor could stop the VM on all
> > errors.  Now the hypervisor might support the distinction between fatal
> > and non-fatal, but the guest may still not have AER support.  That
> > doesn't imply a problem with this approach, the user (hypervisor) would
> > be at fault for any difference in handling in that case.
> >   
> 
> >>  
> >> +static pci_ers_result_t vfio_pci_aer_slot_reset(struct pci_dev *pdev)
> >> +{
> >> +	struct vfio_pci_device *vdev;
> >> +	struct vfio_device *device;
> >> +	static pci_ers_result_t err = PCI_ERS_RESULT_NONE;
> >> +
> >> +	device = vfio_device_get_from_dev(&pdev->dev);
> >> +	if (!device)
> >> +		goto err_dev;
> >> +
> >> +	vdev = vfio_device_data(device);
> >> +	if (!vdev)
> >> +		goto err_data;
> >> +
> >> +	mutex_lock(&vdev->igate);
> >> +
> >> +	if (vdev->err_trigger)
> >> +		eventfd_signal(vdev->err_trigger, 1);  
> > 
> > What about the case where the user has not registered for receiving
> > non-fatal errors, now we send an error signal on both error_detected
> > and slot_reset.  Is that useful/desirable?
> >   
> 
> Not desirable, but seems not harmful, guest user will stop anyway. How
> to avoid this case gracefully seems not easy.

"Not harmful" is presuming the behavior of the user.  QEMU might not be
the only consumer of these events.  Is it possible to receive a
slot_reset without first receiving an error_detected?  If not then we
can easily track our action for one to decide on the behavior for the
other.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH] vfio pci: kernel support of error recovery only for non fatal error
@ 2017-03-20 14:30       ` Alex Williamson
  0 siblings, 0 replies; 22+ messages in thread
From: Alex Williamson @ 2017-03-20 14:30 UTC (permalink / raw)
  To: Cao jin; +Cc: linux-kernel, kvm, qemu-devel, mst, izumi.taku

On Mon, 20 Mar 2017 20:50:39 +0800
Cao jin <caoj.fnst@cn.fujitsu.com> wrote:

> Sorry for late.
> 
> On 03/14/2017 06:06 AM, Alex Williamson wrote:
> > On Mon, 27 Feb 2017 15:28:43 +0800
> > Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
> >   
> >> 0. What happens now (PCIE AER only)
> >>    Fatal errors cause a link reset.
> >>    Non fatal errors don't.
> >>    All errors stop the VM eventually, but not immediately
> >>    because it's detected and reported asynchronously.
> >>    Interrupts are forwarded as usual.
> >>    Correctable errors are not reported to guest at all.
> >>    Note: PPC EEH is different. This focuses on AER.  
> > 
> > Perhaps you're only focusing on AER, but don't the error handlers we're
> > using support both AER and EEH generically?  I don't think we can
> > completely disregard how this affects EEH behavior, if at all.
> >   
> 
> After taking a rough look at the EEH,  find that EEH always feed
> error_detected with pci_channel_io_frozen, from perspective of
> error_detected, EEH is not affected.  
> 
> I am not sure about a question: when assign devices in spapr host,
> should all functions/devices in a PE be bound to vfio? I am kind of
> confused about the relationship between a PE & a tce iommu group

AIUI, yes all devices within the PE are part of the same IOMMU group
and therefore all endpoints must be bound to vfio or pci-stub.

> >>
> >> 1. Correctable errors
> >>    There is no need to report these to guest. So let's not.  
> > 
> > What does this patch change to make this happen?  I don't see
> > anything.  Was this always the case?  No change?
> >   
> 
> yes, no change on correctable error.
> 
> >>
> >> 2. Fatal errors
> >>    It's not easy to handle them gracefully since link reset
> >>    is needed. As a first step, let's use the existing mechanism
> >>    in that case.  
> > 
> > Ok, so no change here either.
> >   
> >> 2. Non-fatal errors
> >>    Here we could make progress by reporting them to guest
> >>    and have guest handle them.  
> > 
> > In practice, what actual errors do we expect userspace to see as
> > non-fatal errors? It would be useful for the commit log to describe
> > the actual benefit we're going to see by splitting out non-fatal errors
> > for the user (not always a guest) to see separately.  Justify that this
> > is actually useful.
> >   
> >>
> >>    Issues:
> >>    a. this behaviour should only be enabled with new userspace,
> >>       old userspace should work without changes.
> >>
> >>       Suggestion: One way to address this would be to add a new eventfd
> >>       non_fatal_err_trigger. If not set, invoke err_trigger.  
> > 
> > This outline format was really more useful for Michael to try to
> > generate discussion, for a commit log, I'd much rather see a definitive
> > statement such as:
> > 
> >  "To maintain backwards compatibility with userspace, non-fatal errors
> >  will continue to trigger via the existing error interrupt index if a
> >  non-fatal signaling mechanism has not been registered."
> >   
> >>    b. drivers are supposed to stop MMIO when error is reported,
> >>       if vm keeps going, we will keep doing MMIO/config.
> >>
> >>       Suggestion 1: ignore this. vm stop happens much later when
> >>       userspace runs anyway, so we are not making things much worse.
> >>
> >>       Suggestion 2: try to stop MMIO/config, resume on resume call
> >>
> >>       Patch below implements Suggestion 1.
> >>
> >>       Note that although this is really against the documentation, which
> >>       states error_detected() is the point at which the driver should quiesce
> >>       the device and not touch it further (until diagnostic poking at
> >>       mmio_enabled or full access at resume callback).
> >>
> >>       Fixing this won't be easy. However, this is not a regression.
> >>
> >>       Also note this does nothing about interrupts, documentation
> >>       suggests returning IRQ_NONE until reset.
> >>       Again, not a regression.  
> > 
> > So again, no change here.  I'm not sure what this adds to the commit
> > log, perhaps we can reference this as a link to Michael's original
> > proposal.
> >    
> >>    c. PF driver might detect that function is completely broken,
> >>       if vm keeps going, we will keep doing MMIO/config.
> >>
> >>       Suggestion 1: ignore this. vm stop happens much later when
> >>       userspace runs anyway, so we are not making things much worse.
> >>
> >>       Suggestion 2: detect this and invoke err_trigger to stop VM.
> >>
> >>       Patch below implements Suggestion 2.  
> > 
> > This needs more description and seems a bit misleading.  This patch
> > adds a slot_reset handler, such that if the slot is reset, we notify
> > the user, essentially promoting the non-fatal error to fatal.  But what
> > condition gets us to this point?  AIUI, AER is a voting scheme and if
> > any driver affected says they need a reset, everyone gets a reset.  So
> > the PF driver we're talking about here is not vfio-pci and it's not the
> > user, the user has no way to signal that the device is completely
> > broken, this only handles the case of other collateral devices with
> > native host drivers that might signal this, right?
> >   
> 
> Yes, same understanding as you, if I don't miss something from michael.
> 
> > It seems like this is where this patch has the greatest exposure to
> > regressions.  If we take the VM use case, previously we could have a
> > non-AER aware guest and the hypervisor could stop the VM on all
> > errors.  Now the hypervisor might support the distinction between fatal
> > and non-fatal, but the guest may still not have AER support.  That
> > doesn't imply a problem with this approach, the user (hypervisor) would
> > be at fault for any difference in handling in that case.
> >   
> 
> >>  
> >> +static pci_ers_result_t vfio_pci_aer_slot_reset(struct pci_dev *pdev)
> >> +{
> >> +	struct vfio_pci_device *vdev;
> >> +	struct vfio_device *device;
> >> +	static pci_ers_result_t err = PCI_ERS_RESULT_NONE;
> >> +
> >> +	device = vfio_device_get_from_dev(&pdev->dev);
> >> +	if (!device)
> >> +		goto err_dev;
> >> +
> >> +	vdev = vfio_device_data(device);
> >> +	if (!vdev)
> >> +		goto err_data;
> >> +
> >> +	mutex_lock(&vdev->igate);
> >> +
> >> +	if (vdev->err_trigger)
> >> +		eventfd_signal(vdev->err_trigger, 1);  
> > 
> > What about the case where the user has not registered for receiving
> > non-fatal errors, now we send an error signal on both error_detected
> > and slot_reset.  Is that useful/desirable?
> >   
> 
> Not desirable, but seems not harmful, guest user will stop anyway. How
> to avoid this case gracefully seems not easy.

"Not harmful" is presuming the behavior of the user.  QEMU might not be
the only consumer of these events.  Is it possible to receive a
slot_reset without first receiving an error_detected?  If not then we
can easily track our action for one to decide on the behavior for the
other.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] vfio pci: kernel support of error recovery only for non fatal error
  2017-03-20 12:50     ` [Qemu-devel] " Cao jin
@ 2017-03-20 14:32       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 22+ messages in thread
From: Michael S. Tsirkin @ 2017-03-20 14:32 UTC (permalink / raw)
  To: Cao jin; +Cc: Alex Williamson, linux-kernel, kvm, qemu-devel, izumi.taku

On Mon, Mar 20, 2017 at 08:50:39PM +0800, Cao jin wrote:
> Sorry for late.
> 
> On 03/14/2017 06:06 AM, Alex Williamson wrote:
> > On Mon, 27 Feb 2017 15:28:43 +0800
> > Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
> > 
> >> 0. What happens now (PCIE AER only)
> >>    Fatal errors cause a link reset.
> >>    Non fatal errors don't.
> >>    All errors stop the VM eventually, but not immediately
> >>    because it's detected and reported asynchronously.
> >>    Interrupts are forwarded as usual.
> >>    Correctable errors are not reported to guest at all.
> >>    Note: PPC EEH is different. This focuses on AER.
> > 
> > Perhaps you're only focusing on AER, but don't the error handlers we're
> > using support both AER and EEH generically?  I don't think we can
> > completely disregard how this affects EEH behavior, if at all.
> > 
> 
> After taking a rough look at the EEH,  find that EEH always feed
> error_detected with pci_channel_io_frozen, from perspective of
> error_detected, EEH is not affected.  
> 
> I am not sure about a question: when assign devices in spapr host,
> should all functions/devices in a PE be bound to vfio? I am kind of
> confused about the relationship between a PE & a tce iommu group
> 
> >>
> >> 1. Correctable errors
> >>    There is no need to report these to guest. So let's not.
> > 
> > What does this patch change to make this happen?  I don't see
> > anything.  Was this always the case?  No change?
> > 
> 
> yes, no change on correctable error.
> 
> >>
> >> 2. Fatal errors
> >>    It's not easy to handle them gracefully since link reset
> >>    is needed. As a first step, let's use the existing mechanism
> >>    in that case.
> > 
> > Ok, so no change here either.
> > 
> >> 2. Non-fatal errors
> >>    Here we could make progress by reporting them to guest
> >>    and have guest handle them.
> > 
> > In practice, what actual errors do we expect userspace to see as
> > non-fatal errors? It would be useful for the commit log to describe
> > the actual benefit we're going to see by splitting out non-fatal errors
> > for the user (not always a guest) to see separately.  Justify that this
> > is actually useful.
> > 
> >>
> >>    Issues:
> >>    a. this behaviour should only be enabled with new userspace,
> >>       old userspace should work without changes.
> >>
> >>       Suggestion: One way to address this would be to add a new eventfd
> >>       non_fatal_err_trigger. If not set, invoke err_trigger.
> > 
> > This outline format was really more useful for Michael to try to
> > generate discussion, for a commit log, I'd much rather see a definitive
> > statement such as:
> > 
> >  "To maintain backwards compatibility with userspace, non-fatal errors
> >  will continue to trigger via the existing error interrupt index if a
> >  non-fatal signaling mechanism has not been registered."
> > 
> >>    b. drivers are supposed to stop MMIO when error is reported,
> >>       if vm keeps going, we will keep doing MMIO/config.
> >>
> >>       Suggestion 1: ignore this. vm stop happens much later when
> >>       userspace runs anyway, so we are not making things much worse.
> >>
> >>       Suggestion 2: try to stop MMIO/config, resume on resume call
> >>
> >>       Patch below implements Suggestion 1.
> >>
> >>       Note that although this is really against the documentation, which
> >>       states error_detected() is the point at which the driver should quiesce
> >>       the device and not touch it further (until diagnostic poking at
> >>       mmio_enabled or full access at resume callback).
> >>
> >>       Fixing this won't be easy. However, this is not a regression.
> >>
> >>       Also note this does nothing about interrupts, documentation
> >>       suggests returning IRQ_NONE until reset.
> >>       Again, not a regression.
> > 
> > So again, no change here.  I'm not sure what this adds to the commit
> > log, perhaps we can reference this as a link to Michael's original
> > proposal.
> >  
> >>    c. PF driver might detect that function is completely broken,
> >>       if vm keeps going, we will keep doing MMIO/config.
> >>
> >>       Suggestion 1: ignore this. vm stop happens much later when
> >>       userspace runs anyway, so we are not making things much worse.
> >>
> >>       Suggestion 2: detect this and invoke err_trigger to stop VM.
> >>
> >>       Patch below implements Suggestion 2.
> > 
> > This needs more description and seems a bit misleading.  This patch
> > adds a slot_reset handler, such that if the slot is reset, we notify
> > the user, essentially promoting the non-fatal error to fatal.  But what
> > condition gets us to this point?  AIUI, AER is a voting scheme and if
> > any driver affected says they need a reset, everyone gets a reset.  So
> > the PF driver we're talking about here is not vfio-pci and it's not the
> > user, the user has no way to signal that the device is completely
> > broken, this only handles the case of other collateral devices with
> > native host drivers that might signal this, right?
> > 
> 
> Yes, same understanding as you, if I don't miss something from michael.
> > It seems like this is where this patch has the greatest exposure to
> > regressions.  If we take the VM use case, previously we could have a
> > non-AER aware guest and the hypervisor could stop the VM on all
> > errors.  Now the hypervisor might support the distinction between fatal
> > and non-fatal, but the guest may still not have AER support.  That
> > doesn't imply a problem with this approach, the user (hypervisor) would
> > be at fault for any difference in handling in that case.
> > 
> 
> >>  
> >> +static pci_ers_result_t vfio_pci_aer_slot_reset(struct pci_dev *pdev)
> >> +{
> >> +	struct vfio_pci_device *vdev;
> >> +	struct vfio_device *device;
> >> +	static pci_ers_result_t err = PCI_ERS_RESULT_NONE;
> >> +
> >> +	device = vfio_device_get_from_dev(&pdev->dev);
> >> +	if (!device)
> >> +		goto err_dev;
> >> +
> >> +	vdev = vfio_device_data(device);
> >> +	if (!vdev)
> >> +		goto err_data;
> >> +
> >> +	mutex_lock(&vdev->igate);
> >> +
> >> +	if (vdev->err_trigger)
> >> +		eventfd_signal(vdev->err_trigger, 1);
> > 
> > What about the case where the user has not registered for receiving
> > non-fatal errors, now we send an error signal on both error_detected
> > and slot_reset.  Is that useful/desirable?
> > 
> 
> Not desirable, but seems not harmful, guest user will stop anyway. How
> to avoid this case gracefully seems not easy.

I actually see a clean way to do this.

Let's add yet another eventfd to trigger
when hosts resets the device itself.  vdev->host_reset ?

Users can use the same one as err_trigger if they like,
it will be up to them.

Alex?

> -- 
> Sincerely,
> Cao jin
> 
> 
>
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH] vfio pci: kernel support of error recovery only for non fatal error
@ 2017-03-20 14:32       ` Michael S. Tsirkin
  0 siblings, 0 replies; 22+ messages in thread
From: Michael S. Tsirkin @ 2017-03-20 14:32 UTC (permalink / raw)
  To: Cao jin; +Cc: Alex Williamson, linux-kernel, kvm, qemu-devel, izumi.taku

On Mon, Mar 20, 2017 at 08:50:39PM +0800, Cao jin wrote:
> Sorry for late.
> 
> On 03/14/2017 06:06 AM, Alex Williamson wrote:
> > On Mon, 27 Feb 2017 15:28:43 +0800
> > Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
> > 
> >> 0. What happens now (PCIE AER only)
> >>    Fatal errors cause a link reset.
> >>    Non fatal errors don't.
> >>    All errors stop the VM eventually, but not immediately
> >>    because it's detected and reported asynchronously.
> >>    Interrupts are forwarded as usual.
> >>    Correctable errors are not reported to guest at all.
> >>    Note: PPC EEH is different. This focuses on AER.
> > 
> > Perhaps you're only focusing on AER, but don't the error handlers we're
> > using support both AER and EEH generically?  I don't think we can
> > completely disregard how this affects EEH behavior, if at all.
> > 
> 
> After taking a rough look at the EEH,  find that EEH always feed
> error_detected with pci_channel_io_frozen, from perspective of
> error_detected, EEH is not affected.  
> 
> I am not sure about a question: when assign devices in spapr host,
> should all functions/devices in a PE be bound to vfio? I am kind of
> confused about the relationship between a PE & a tce iommu group
> 
> >>
> >> 1. Correctable errors
> >>    There is no need to report these to guest. So let's not.
> > 
> > What does this patch change to make this happen?  I don't see
> > anything.  Was this always the case?  No change?
> > 
> 
> yes, no change on correctable error.
> 
> >>
> >> 2. Fatal errors
> >>    It's not easy to handle them gracefully since link reset
> >>    is needed. As a first step, let's use the existing mechanism
> >>    in that case.
> > 
> > Ok, so no change here either.
> > 
> >> 2. Non-fatal errors
> >>    Here we could make progress by reporting them to guest
> >>    and have guest handle them.
> > 
> > In practice, what actual errors do we expect userspace to see as
> > non-fatal errors? It would be useful for the commit log to describe
> > the actual benefit we're going to see by splitting out non-fatal errors
> > for the user (not always a guest) to see separately.  Justify that this
> > is actually useful.
> > 
> >>
> >>    Issues:
> >>    a. this behaviour should only be enabled with new userspace,
> >>       old userspace should work without changes.
> >>
> >>       Suggestion: One way to address this would be to add a new eventfd
> >>       non_fatal_err_trigger. If not set, invoke err_trigger.
> > 
> > This outline format was really more useful for Michael to try to
> > generate discussion, for a commit log, I'd much rather see a definitive
> > statement such as:
> > 
> >  "To maintain backwards compatibility with userspace, non-fatal errors
> >  will continue to trigger via the existing error interrupt index if a
> >  non-fatal signaling mechanism has not been registered."
> > 
> >>    b. drivers are supposed to stop MMIO when error is reported,
> >>       if vm keeps going, we will keep doing MMIO/config.
> >>
> >>       Suggestion 1: ignore this. vm stop happens much later when
> >>       userspace runs anyway, so we are not making things much worse.
> >>
> >>       Suggestion 2: try to stop MMIO/config, resume on resume call
> >>
> >>       Patch below implements Suggestion 1.
> >>
> >>       Note that although this is really against the documentation, which
> >>       states error_detected() is the point at which the driver should quiesce
> >>       the device and not touch it further (until diagnostic poking at
> >>       mmio_enabled or full access at resume callback).
> >>
> >>       Fixing this won't be easy. However, this is not a regression.
> >>
> >>       Also note this does nothing about interrupts, documentation
> >>       suggests returning IRQ_NONE until reset.
> >>       Again, not a regression.
> > 
> > So again, no change here.  I'm not sure what this adds to the commit
> > log, perhaps we can reference this as a link to Michael's original
> > proposal.
> >  
> >>    c. PF driver might detect that function is completely broken,
> >>       if vm keeps going, we will keep doing MMIO/config.
> >>
> >>       Suggestion 1: ignore this. vm stop happens much later when
> >>       userspace runs anyway, so we are not making things much worse.
> >>
> >>       Suggestion 2: detect this and invoke err_trigger to stop VM.
> >>
> >>       Patch below implements Suggestion 2.
> > 
> > This needs more description and seems a bit misleading.  This patch
> > adds a slot_reset handler, such that if the slot is reset, we notify
> > the user, essentially promoting the non-fatal error to fatal.  But what
> > condition gets us to this point?  AIUI, AER is a voting scheme and if
> > any driver affected says they need a reset, everyone gets a reset.  So
> > the PF driver we're talking about here is not vfio-pci and it's not the
> > user, the user has no way to signal that the device is completely
> > broken, this only handles the case of other collateral devices with
> > native host drivers that might signal this, right?
> > 
> 
> Yes, same understanding as you, if I don't miss something from michael.
> > It seems like this is where this patch has the greatest exposure to
> > regressions.  If we take the VM use case, previously we could have a
> > non-AER aware guest and the hypervisor could stop the VM on all
> > errors.  Now the hypervisor might support the distinction between fatal
> > and non-fatal, but the guest may still not have AER support.  That
> > doesn't imply a problem with this approach, the user (hypervisor) would
> > be at fault for any difference in handling in that case.
> > 
> 
> >>  
> >> +static pci_ers_result_t vfio_pci_aer_slot_reset(struct pci_dev *pdev)
> >> +{
> >> +	struct vfio_pci_device *vdev;
> >> +	struct vfio_device *device;
> >> +	static pci_ers_result_t err = PCI_ERS_RESULT_NONE;
> >> +
> >> +	device = vfio_device_get_from_dev(&pdev->dev);
> >> +	if (!device)
> >> +		goto err_dev;
> >> +
> >> +	vdev = vfio_device_data(device);
> >> +	if (!vdev)
> >> +		goto err_data;
> >> +
> >> +	mutex_lock(&vdev->igate);
> >> +
> >> +	if (vdev->err_trigger)
> >> +		eventfd_signal(vdev->err_trigger, 1);
> > 
> > What about the case where the user has not registered for receiving
> > non-fatal errors, now we send an error signal on both error_detected
> > and slot_reset.  Is that useful/desirable?
> > 
> 
> Not desirable, but seems not harmful, guest user will stop anyway. How
> to avoid this case gracefully seems not easy.

I actually see a clean way to do this.

Let's add yet another eventfd to trigger
when hosts resets the device itself.  vdev->host_reset ?

Users can use the same one as err_trigger if they like,
it will be up to them.

Alex?

> -- 
> Sincerely,
> Cao jin
> 
> 
>
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] vfio pci: kernel support of error recovery only for non fatal error
  2017-03-20 14:30       ` [Qemu-devel] " Alex Williamson
@ 2017-03-20 14:34         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 22+ messages in thread
From: Michael S. Tsirkin @ 2017-03-20 14:34 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Cao jin, linux-kernel, kvm, qemu-devel, izumi.taku

On Mon, Mar 20, 2017 at 08:30:56AM -0600, Alex Williamson wrote:
> > > What about the case where the user has not registered for receiving
> > > non-fatal errors, now we send an error signal on both error_detected
> > > and slot_reset.  Is that useful/desirable?
> > >   
> > 
> > Not desirable, but seems not harmful, guest user will stop anyway. How
> > to avoid this case gracefully seems not easy.
> 
> "Not harmful" is presuming the behavior of the user.  QEMU might not be
> the only consumer of these events.  Is it possible to receive a
> slot_reset without first receiving an error_detected?  If not then we
> can easily track our action for one to decide on the behavior for the
> other.  Thanks,
> 
> Alex

I would just pass maximum info to userspace and let it decide.

-- 
MST

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH] vfio pci: kernel support of error recovery only for non fatal error
@ 2017-03-20 14:34         ` Michael S. Tsirkin
  0 siblings, 0 replies; 22+ messages in thread
From: Michael S. Tsirkin @ 2017-03-20 14:34 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Cao jin, linux-kernel, kvm, qemu-devel, izumi.taku

On Mon, Mar 20, 2017 at 08:30:56AM -0600, Alex Williamson wrote:
> > > What about the case where the user has not registered for receiving
> > > non-fatal errors, now we send an error signal on both error_detected
> > > and slot_reset.  Is that useful/desirable?
> > >   
> > 
> > Not desirable, but seems not harmful, guest user will stop anyway. How
> > to avoid this case gracefully seems not easy.
> 
> "Not harmful" is presuming the behavior of the user.  QEMU might not be
> the only consumer of these events.  Is it possible to receive a
> slot_reset without first receiving an error_detected?  If not then we
> can easily track our action for one to decide on the behavior for the
> other.  Thanks,
> 
> Alex

I would just pass maximum info to userspace and let it decide.

-- 
MST

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] vfio pci: kernel support of error recovery only for non fatal error
  2017-03-20 14:32       ` [Qemu-devel] " Michael S. Tsirkin
@ 2017-03-21  5:18         ` Alex Williamson
  -1 siblings, 0 replies; 22+ messages in thread
From: Alex Williamson @ 2017-03-21  5:18 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Cao jin, linux-kernel, kvm, qemu-devel, izumi.taku

On Mon, 20 Mar 2017 16:32:33 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Mon, Mar 20, 2017 at 08:50:39PM +0800, Cao jin wrote:
> > Sorry for late.
> > 
> > On 03/14/2017 06:06 AM, Alex Williamson wrote:  
> > > On Mon, 27 Feb 2017 15:28:43 +0800
> > > Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
> > >   
> > >> 0. What happens now (PCIE AER only)
> > >>    Fatal errors cause a link reset.
> > >>    Non fatal errors don't.
> > >>    All errors stop the VM eventually, but not immediately
> > >>    because it's detected and reported asynchronously.
> > >>    Interrupts are forwarded as usual.
> > >>    Correctable errors are not reported to guest at all.
> > >>    Note: PPC EEH is different. This focuses on AER.  
> > > 
> > > Perhaps you're only focusing on AER, but don't the error handlers we're
> > > using support both AER and EEH generically?  I don't think we can
> > > completely disregard how this affects EEH behavior, if at all.
> > >   
> > 
> > After taking a rough look at the EEH,  find that EEH always feed
> > error_detected with pci_channel_io_frozen, from perspective of
> > error_detected, EEH is not affected.  
> > 
> > I am not sure about a question: when assign devices in spapr host,
> > should all functions/devices in a PE be bound to vfio? I am kind of
> > confused about the relationship between a PE & a tce iommu group
> >   
> > >>
> > >> 1. Correctable errors
> > >>    There is no need to report these to guest. So let's not.  
> > > 
> > > What does this patch change to make this happen?  I don't see
> > > anything.  Was this always the case?  No change?
> > >   
> > 
> > yes, no change on correctable error.
> >   
> > >>
> > >> 2. Fatal errors
> > >>    It's not easy to handle them gracefully since link reset
> > >>    is needed. As a first step, let's use the existing mechanism
> > >>    in that case.  
> > > 
> > > Ok, so no change here either.
> > >   
> > >> 2. Non-fatal errors
> > >>    Here we could make progress by reporting them to guest
> > >>    and have guest handle them.  
> > > 
> > > In practice, what actual errors do we expect userspace to see as
> > > non-fatal errors? It would be useful for the commit log to describe
> > > the actual benefit we're going to see by splitting out non-fatal errors
> > > for the user (not always a guest) to see separately.  Justify that this
> > > is actually useful.
> > >   
> > >>
> > >>    Issues:
> > >>    a. this behaviour should only be enabled with new userspace,
> > >>       old userspace should work without changes.
> > >>
> > >>       Suggestion: One way to address this would be to add a new eventfd
> > >>       non_fatal_err_trigger. If not set, invoke err_trigger.  
> > > 
> > > This outline format was really more useful for Michael to try to
> > > generate discussion, for a commit log, I'd much rather see a definitive
> > > statement such as:
> > > 
> > >  "To maintain backwards compatibility with userspace, non-fatal errors
> > >  will continue to trigger via the existing error interrupt index if a
> > >  non-fatal signaling mechanism has not been registered."
> > >   
> > >>    b. drivers are supposed to stop MMIO when error is reported,
> > >>       if vm keeps going, we will keep doing MMIO/config.
> > >>
> > >>       Suggestion 1: ignore this. vm stop happens much later when
> > >>       userspace runs anyway, so we are not making things much worse.
> > >>
> > >>       Suggestion 2: try to stop MMIO/config, resume on resume call
> > >>
> > >>       Patch below implements Suggestion 1.
> > >>
> > >>       Note that although this is really against the documentation, which
> > >>       states error_detected() is the point at which the driver should quiesce
> > >>       the device and not touch it further (until diagnostic poking at
> > >>       mmio_enabled or full access at resume callback).
> > >>
> > >>       Fixing this won't be easy. However, this is not a regression.
> > >>
> > >>       Also note this does nothing about interrupts, documentation
> > >>       suggests returning IRQ_NONE until reset.
> > >>       Again, not a regression.  
> > > 
> > > So again, no change here.  I'm not sure what this adds to the commit
> > > log, perhaps we can reference this as a link to Michael's original
> > > proposal.
> > >    
> > >>    c. PF driver might detect that function is completely broken,
> > >>       if vm keeps going, we will keep doing MMIO/config.
> > >>
> > >>       Suggestion 1: ignore this. vm stop happens much later when
> > >>       userspace runs anyway, so we are not making things much worse.
> > >>
> > >>       Suggestion 2: detect this and invoke err_trigger to stop VM.
> > >>
> > >>       Patch below implements Suggestion 2.  
> > > 
> > > This needs more description and seems a bit misleading.  This patch
> > > adds a slot_reset handler, such that if the slot is reset, we notify
> > > the user, essentially promoting the non-fatal error to fatal.  But what
> > > condition gets us to this point?  AIUI, AER is a voting scheme and if
> > > any driver affected says they need a reset, everyone gets a reset.  So
> > > the PF driver we're talking about here is not vfio-pci and it's not the
> > > user, the user has no way to signal that the device is completely
> > > broken, this only handles the case of other collateral devices with
> > > native host drivers that might signal this, right?
> > >   
> > 
> > Yes, same understanding as you, if I don't miss something from michael.  
> > > It seems like this is where this patch has the greatest exposure to
> > > regressions.  If we take the VM use case, previously we could have a
> > > non-AER aware guest and the hypervisor could stop the VM on all
> > > errors.  Now the hypervisor might support the distinction between fatal
> > > and non-fatal, but the guest may still not have AER support.  That
> > > doesn't imply a problem with this approach, the user (hypervisor) would
> > > be at fault for any difference in handling in that case.
> > >   
> >   
> > >>  
> > >> +static pci_ers_result_t vfio_pci_aer_slot_reset(struct pci_dev *pdev)
> > >> +{
> > >> +	struct vfio_pci_device *vdev;
> > >> +	struct vfio_device *device;
> > >> +	static pci_ers_result_t err = PCI_ERS_RESULT_NONE;
> > >> +
> > >> +	device = vfio_device_get_from_dev(&pdev->dev);
> > >> +	if (!device)
> > >> +		goto err_dev;
> > >> +
> > >> +	vdev = vfio_device_data(device);
> > >> +	if (!vdev)
> > >> +		goto err_data;
> > >> +
> > >> +	mutex_lock(&vdev->igate);
> > >> +
> > >> +	if (vdev->err_trigger)
> > >> +		eventfd_signal(vdev->err_trigger, 1);  
> > > 
> > > What about the case where the user has not registered for receiving
> > > non-fatal errors, now we send an error signal on both error_detected
> > > and slot_reset.  Is that useful/desirable?
> > >   
> > 
> > Not desirable, but seems not harmful, guest user will stop anyway. How
> > to avoid this case gracefully seems not easy.  
> 
> I actually see a clean way to do this.
> 
> Let's add yet another eventfd to trigger
> when hosts resets the device itself.  vdev->host_reset ?
> 
> Users can use the same one as err_trigger if they like,
> it will be up to them.
> 
> Alex?

Sure, the UNIX way, throw more file descriptors at the problem.  Kind
of ugly, but I don't have a cleaner solution in mind.  "host_reset"
implies something completely different to me though.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH] vfio pci: kernel support of error recovery only for non fatal error
@ 2017-03-21  5:18         ` Alex Williamson
  0 siblings, 0 replies; 22+ messages in thread
From: Alex Williamson @ 2017-03-21  5:18 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Cao jin, linux-kernel, kvm, qemu-devel, izumi.taku

On Mon, 20 Mar 2017 16:32:33 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Mon, Mar 20, 2017 at 08:50:39PM +0800, Cao jin wrote:
> > Sorry for late.
> > 
> > On 03/14/2017 06:06 AM, Alex Williamson wrote:  
> > > On Mon, 27 Feb 2017 15:28:43 +0800
> > > Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
> > >   
> > >> 0. What happens now (PCIE AER only)
> > >>    Fatal errors cause a link reset.
> > >>    Non fatal errors don't.
> > >>    All errors stop the VM eventually, but not immediately
> > >>    because it's detected and reported asynchronously.
> > >>    Interrupts are forwarded as usual.
> > >>    Correctable errors are not reported to guest at all.
> > >>    Note: PPC EEH is different. This focuses on AER.  
> > > 
> > > Perhaps you're only focusing on AER, but don't the error handlers we're
> > > using support both AER and EEH generically?  I don't think we can
> > > completely disregard how this affects EEH behavior, if at all.
> > >   
> > 
> > After taking a rough look at the EEH,  find that EEH always feed
> > error_detected with pci_channel_io_frozen, from perspective of
> > error_detected, EEH is not affected.  
> > 
> > I am not sure about a question: when assign devices in spapr host,
> > should all functions/devices in a PE be bound to vfio? I am kind of
> > confused about the relationship between a PE & a tce iommu group
> >   
> > >>
> > >> 1. Correctable errors
> > >>    There is no need to report these to guest. So let's not.  
> > > 
> > > What does this patch change to make this happen?  I don't see
> > > anything.  Was this always the case?  No change?
> > >   
> > 
> > yes, no change on correctable error.
> >   
> > >>
> > >> 2. Fatal errors
> > >>    It's not easy to handle them gracefully since link reset
> > >>    is needed. As a first step, let's use the existing mechanism
> > >>    in that case.  
> > > 
> > > Ok, so no change here either.
> > >   
> > >> 2. Non-fatal errors
> > >>    Here we could make progress by reporting them to guest
> > >>    and have guest handle them.  
> > > 
> > > In practice, what actual errors do we expect userspace to see as
> > > non-fatal errors? It would be useful for the commit log to describe
> > > the actual benefit we're going to see by splitting out non-fatal errors
> > > for the user (not always a guest) to see separately.  Justify that this
> > > is actually useful.
> > >   
> > >>
> > >>    Issues:
> > >>    a. this behaviour should only be enabled with new userspace,
> > >>       old userspace should work without changes.
> > >>
> > >>       Suggestion: One way to address this would be to add a new eventfd
> > >>       non_fatal_err_trigger. If not set, invoke err_trigger.  
> > > 
> > > This outline format was really more useful for Michael to try to
> > > generate discussion, for a commit log, I'd much rather see a definitive
> > > statement such as:
> > > 
> > >  "To maintain backwards compatibility with userspace, non-fatal errors
> > >  will continue to trigger via the existing error interrupt index if a
> > >  non-fatal signaling mechanism has not been registered."
> > >   
> > >>    b. drivers are supposed to stop MMIO when error is reported,
> > >>       if vm keeps going, we will keep doing MMIO/config.
> > >>
> > >>       Suggestion 1: ignore this. vm stop happens much later when
> > >>       userspace runs anyway, so we are not making things much worse.
> > >>
> > >>       Suggestion 2: try to stop MMIO/config, resume on resume call
> > >>
> > >>       Patch below implements Suggestion 1.
> > >>
> > >>       Note that although this is really against the documentation, which
> > >>       states error_detected() is the point at which the driver should quiesce
> > >>       the device and not touch it further (until diagnostic poking at
> > >>       mmio_enabled or full access at resume callback).
> > >>
> > >>       Fixing this won't be easy. However, this is not a regression.
> > >>
> > >>       Also note this does nothing about interrupts, documentation
> > >>       suggests returning IRQ_NONE until reset.
> > >>       Again, not a regression.  
> > > 
> > > So again, no change here.  I'm not sure what this adds to the commit
> > > log, perhaps we can reference this as a link to Michael's original
> > > proposal.
> > >    
> > >>    c. PF driver might detect that function is completely broken,
> > >>       if vm keeps going, we will keep doing MMIO/config.
> > >>
> > >>       Suggestion 1: ignore this. vm stop happens much later when
> > >>       userspace runs anyway, so we are not making things much worse.
> > >>
> > >>       Suggestion 2: detect this and invoke err_trigger to stop VM.
> > >>
> > >>       Patch below implements Suggestion 2.  
> > > 
> > > This needs more description and seems a bit misleading.  This patch
> > > adds a slot_reset handler, such that if the slot is reset, we notify
> > > the user, essentially promoting the non-fatal error to fatal.  But what
> > > condition gets us to this point?  AIUI, AER is a voting scheme and if
> > > any driver affected says they need a reset, everyone gets a reset.  So
> > > the PF driver we're talking about here is not vfio-pci and it's not the
> > > user, the user has no way to signal that the device is completely
> > > broken, this only handles the case of other collateral devices with
> > > native host drivers that might signal this, right?
> > >   
> > 
> > Yes, same understanding as you, if I don't miss something from michael.  
> > > It seems like this is where this patch has the greatest exposure to
> > > regressions.  If we take the VM use case, previously we could have a
> > > non-AER aware guest and the hypervisor could stop the VM on all
> > > errors.  Now the hypervisor might support the distinction between fatal
> > > and non-fatal, but the guest may still not have AER support.  That
> > > doesn't imply a problem with this approach, the user (hypervisor) would
> > > be at fault for any difference in handling in that case.
> > >   
> >   
> > >>  
> > >> +static pci_ers_result_t vfio_pci_aer_slot_reset(struct pci_dev *pdev)
> > >> +{
> > >> +	struct vfio_pci_device *vdev;
> > >> +	struct vfio_device *device;
> > >> +	static pci_ers_result_t err = PCI_ERS_RESULT_NONE;
> > >> +
> > >> +	device = vfio_device_get_from_dev(&pdev->dev);
> > >> +	if (!device)
> > >> +		goto err_dev;
> > >> +
> > >> +	vdev = vfio_device_data(device);
> > >> +	if (!vdev)
> > >> +		goto err_data;
> > >> +
> > >> +	mutex_lock(&vdev->igate);
> > >> +
> > >> +	if (vdev->err_trigger)
> > >> +		eventfd_signal(vdev->err_trigger, 1);  
> > > 
> > > What about the case where the user has not registered for receiving
> > > non-fatal errors, now we send an error signal on both error_detected
> > > and slot_reset.  Is that useful/desirable?
> > >   
> > 
> > Not desirable, but seems not harmful, guest user will stop anyway. How
> > to avoid this case gracefully seems not easy.  
> 
> I actually see a clean way to do this.
> 
> Let's add yet another eventfd to trigger
> when hosts resets the device itself.  vdev->host_reset ?
> 
> Users can use the same one as err_trigger if they like,
> it will be up to them.
> 
> Alex?

Sure, the UNIX way, throw more file descriptors at the problem.  Kind
of ugly, but I don't have a cleaner solution in mind.  "host_reset"
implies something completely different to me though.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] vfio pci: kernel support of error recovery only for non fatal error
  2017-03-20 14:30       ` [Qemu-devel] " Alex Williamson
@ 2017-03-21  8:05         ` Cao jin
  -1 siblings, 0 replies; 22+ messages in thread
From: Cao jin @ 2017-03-21  8:05 UTC (permalink / raw)
  To: Alex Williamson; +Cc: linux-kernel, kvm, qemu-devel, mst, izumi.taku



On 03/20/2017 10:30 PM, Alex Williamson wrote:
> On Mon, 20 Mar 2017 20:50:39 +0800
> Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
> 
>> Sorry for late.
>>
>> On 03/14/2017 06:06 AM, Alex Williamson wrote:
>>> On Mon, 27 Feb 2017 15:28:43 +0800
>>> Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
>>>   
>>>> 0. What happens now (PCIE AER only)
>>>>    Fatal errors cause a link reset.
>>>>    Non fatal errors don't.
>>>>    All errors stop the VM eventually, but not immediately
>>>>    because it's detected and reported asynchronously.
>>>>    Interrupts are forwarded as usual.
>>>>    Correctable errors are not reported to guest at all.
>>>>    Note: PPC EEH is different. This focuses on AER.  
>>>
>>> Perhaps you're only focusing on AER, but don't the error handlers we're
>>> using support both AER and EEH generically?  I don't think we can
>>> completely disregard how this affects EEH behavior, if at all.
>>>   
>>
>> After taking a rough look at the EEH,  find that EEH always feed
>> error_detected with pci_channel_io_frozen, from perspective of
>> error_detected, EEH is not affected.  
>>
>> I am not sure about a question: when assign devices in spapr host,
>> should all functions/devices in a PE be bound to vfio? I am kind of
>> confused about the relationship between a PE & a tce iommu group
> 
> AIUI, yes all devices within the PE are part of the same IOMMU group
> and therefore all endpoints must be bound to vfio or pci-stub.
> 

Thanks. Then I think this approach won't affect EEH. I was considering
the same issue you mentioned for slot_reset may affect EEH, but if they
all must be bound to vfio, seems the issue won't happen to EEH.

-- 
Sincerely,
Cao jin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH] vfio pci: kernel support of error recovery only for non fatal error
@ 2017-03-21  8:05         ` Cao jin
  0 siblings, 0 replies; 22+ messages in thread
From: Cao jin @ 2017-03-21  8:05 UTC (permalink / raw)
  To: Alex Williamson; +Cc: linux-kernel, kvm, qemu-devel, mst, izumi.taku



On 03/20/2017 10:30 PM, Alex Williamson wrote:
> On Mon, 20 Mar 2017 20:50:39 +0800
> Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
> 
>> Sorry for late.
>>
>> On 03/14/2017 06:06 AM, Alex Williamson wrote:
>>> On Mon, 27 Feb 2017 15:28:43 +0800
>>> Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
>>>   
>>>> 0. What happens now (PCIE AER only)
>>>>    Fatal errors cause a link reset.
>>>>    Non fatal errors don't.
>>>>    All errors stop the VM eventually, but not immediately
>>>>    because it's detected and reported asynchronously.
>>>>    Interrupts are forwarded as usual.
>>>>    Correctable errors are not reported to guest at all.
>>>>    Note: PPC EEH is different. This focuses on AER.  
>>>
>>> Perhaps you're only focusing on AER, but don't the error handlers we're
>>> using support both AER and EEH generically?  I don't think we can
>>> completely disregard how this affects EEH behavior, if at all.
>>>   
>>
>> After taking a rough look at the EEH,  find that EEH always feed
>> error_detected with pci_channel_io_frozen, from perspective of
>> error_detected, EEH is not affected.  
>>
>> I am not sure about a question: when assign devices in spapr host,
>> should all functions/devices in a PE be bound to vfio? I am kind of
>> confused about the relationship between a PE & a tce iommu group
> 
> AIUI, yes all devices within the PE are part of the same IOMMU group
> and therefore all endpoints must be bound to vfio or pci-stub.
> 

Thanks. Then I think this approach won't affect EEH. I was considering
the same issue you mentioned for slot_reset may affect EEH, but if they
all must be bound to vfio, seems the issue won't happen to EEH.

-- 
Sincerely,
Cao jin

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2017-03-21  8:01 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-27  7:28 [PATCH] vfio pci: kernel support of error recovery only for non fatal error Cao jin
2017-02-27  7:28 ` [Qemu-devel] " Cao jin
2017-02-27  7:28 ` Cao jin
2017-02-27 16:16 ` Michael S. Tsirkin
2017-02-27 16:16   ` [Qemu-devel] " Michael S. Tsirkin
2017-02-28  1:31   ` Cao jin
2017-02-28  1:31     ` [Qemu-devel] " Cao jin
2017-02-28  1:31     ` Cao jin
2017-03-13 22:06 ` Alex Williamson
2017-03-13 22:06   ` [Qemu-devel] " Alex Williamson
2017-03-20 12:50   ` Cao jin
2017-03-20 12:50     ` [Qemu-devel] " Cao jin
2017-03-20 14:30     ` Alex Williamson
2017-03-20 14:30       ` [Qemu-devel] " Alex Williamson
2017-03-20 14:34       ` Michael S. Tsirkin
2017-03-20 14:34         ` [Qemu-devel] " Michael S. Tsirkin
2017-03-21  8:05       ` Cao jin
2017-03-21  8:05         ` [Qemu-devel] " Cao jin
2017-03-20 14:32     ` Michael S. Tsirkin
2017-03-20 14:32       ` [Qemu-devel] " Michael S. Tsirkin
2017-03-21  5:18       ` Alex Williamson
2017-03-21  5:18         ` [Qemu-devel] " Alex Williamson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.