All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3] PCI-AER: Do not report successful error recovery for devices with AER-unaware drivers
@ 2012-11-17 11:47 Pandarathil, Vijaymohan R
  2012-11-28 20:15 ` Bjorn Helgaas
  0 siblings, 1 reply; 3+ messages in thread
From: Pandarathil, Vijaymohan R @ 2012-11-17 11:47 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-kernel, linux-pci, linasvepstas, Myron Stowe, Ortiz,
	Lance E, Huang Ying, Hidetoshi Seto, Patterson,
	Andrew D (LeftHand Networks),
	Zhang Yanmin

When an error is detected on a PCIe device which does not have an
AER-aware driver, prevent AER infrastructure from reporting
successful error recovery.

This is because the report_error_detected() function that gets
called in the first phase of recovery process allows forward
progress even when the driver for the device does not have AER
capabilities. It seems that all callbacks (in pci_error_handlers
structure) registered by drivers that gets called during error
recovery are not mandatory. So the intention of the infrastructure
design seems to be to allow forward progress even when a specific
callback has not been registered by a driver. However, if error
handler structure itself has not been registered, it doesn't make
sense to allow forward progress.

As a result of the current design, in the case of a single device
having an AER-unaware driver or in the case of any function in a
multi-function card having an AER-unaware driver, a successful
recovery is reported.

Typical scenario this happens is when a PCI device is detached
from a KVM host and the pci-stub driver on the host claims the
device. The pci-stub driver does not have error handling capabilities
but the AER infrastructure still reports that the device recovered
successfully.

The changes proposed here leaves the device(s)in an unrecovered state
if the driver for the device or for any device in the subtree
does not have error handler structure registered. This reflects
the true state of the device and prevents any partial recovery (or no
recovery at all) reported as successful.


v3:
  - Fixed a checkpatch/build issue

v2:
  - Made changes so that all devices in the subtree have the error
    state set correctly.


Reviewed-by: Linas Vepstas <linasvepstas <at> gmail.com>
Reviewed-by: Myron Stowe <mstowe <at> redhat.com>
Reviewed-by: Bjorn Helgaas <bhelgaas <at> google.com>
Signed-off-by: Vijay Mohan Pandarathil <vijaymohan.pandarathil@hp.com>
---
 drivers/pci/pcie/aer/aerdrv.h      |  5 ++++-
 drivers/pci/pcie/aer/aerdrv_core.c | 21 ++++++++++++++++++---
 include/linux/pci.h                |  3 +++
 3 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
index 94a7598..22f840f 100644
--- a/drivers/pci/pcie/aer/aerdrv.h
+++ b/drivers/pci/pcie/aer/aerdrv.h
@@ -87,6 +87,9 @@ struct aer_broadcast_data {
 static inline pci_ers_result_t merge_result(enum pci_ers_result orig,
 		enum pci_ers_result new)
 {
+	if (new == PCI_ERS_RESULT_NO_AER_DRIVER)
+		return PCI_ERS_RESULT_NO_AER_DRIVER;
+
 	if (new == PCI_ERS_RESULT_NONE)
 		return orig;
 
@@ -97,7 +100,7 @@ static inline pci_ers_result_t merge_result(enum pci_ers_result orig,
 		break;
 	case PCI_ERS_RESULT_DISCONNECT:
 		if (new == PCI_ERS_RESULT_NEED_RESET)
-			orig = new;
+			orig = PCI_ERS_RESULT_NEED_RESET;
 		break;
 	default:
 		break;
diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
index 9f80cb0..b67f91a 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -232,11 +232,26 @@ static int report_error_detected(struct pci_dev *dev, void *data)
 				   dev->driver ?
 				   "no AER-aware driver" : "no driver");
 		}
-		return 0;
+
+		/*
+		 * If there's any device in the subtree that does not
+		 * have an error_detected callback, returning
+		 * PCI_ERS_RESULT_NO_AER_DRIVER prevents calling of
+		 * the subsequent mmio_enabled/slot_reset/resume
+		 * callbacks of "any" device in the subtree. All the
+		 * devices in the subtree are left in the error state
+		 * without recovery.
+		 */
+
+		if (!(dev->hdr_type & PCI_HEADER_TYPE_BRIDGE))
+			vote = PCI_ERS_RESULT_NO_AER_DRIVER;
+		else
+			vote = PCI_ERS_RESULT_NONE;
+	} else {
+		err_handler = dev->driver->err_handler;
+		vote = err_handler->error_detected(dev, result_data->state);
 	}
 
-	err_handler = dev->driver->err_handler;
-	vote = err_handler->error_detected(dev, result_data->state);
 	result_data->result = merge_result(result_data->result, vote);
 	return 0;
 }
diff --git a/include/linux/pci.h b/include/linux/pci.h
index ab17a08..c458602 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -540,6 +540,9 @@ enum pci_ers_result {
 
 	/* Device driver is fully recovered and operational */
 	PCI_ERS_RESULT_RECOVERED = (__force pci_ers_result_t) 5,
+
+	/* No AER capabilities registered for the driver */
+	PCI_ERS_RESULT_NO_AER_DRIVER = (__force pci_ers_result_t) 6,
 };
 
 /* PCI bus error event callbacks */
-- 
1.7.11.3


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v3] PCI-AER: Do not report successful error recovery for devices with AER-unaware drivers
  2012-11-17 11:47 [PATCH v3] PCI-AER: Do not report successful error recovery for devices with AER-unaware drivers Pandarathil, Vijaymohan R
@ 2012-11-28 20:15 ` Bjorn Helgaas
  2012-11-28 20:30   ` Pandarathil, Vijaymohan R
  0 siblings, 1 reply; 3+ messages in thread
From: Bjorn Helgaas @ 2012-11-28 20:15 UTC (permalink / raw)
  To: Pandarathil, Vijaymohan R
  Cc: linux-kernel, linux-pci, linasvepstas, Myron Stowe, Ortiz,
	Lance E, Huang Ying, Hidetoshi Seto, Patterson,
	Andrew D (LeftHand Networks),
	Zhang Yanmin

On Sat, Nov 17, 2012 at 4:47 AM, Pandarathil, Vijaymohan R
<vijaymohan.pandarathil@hp.com> wrote:
> When an error is detected on a PCIe device which does not have an
> AER-aware driver, prevent AER infrastructure from reporting
> successful error recovery.
>
> This is because the report_error_detected() function that gets
> called in the first phase of recovery process allows forward
> progress even when the driver for the device does not have AER
> capabilities. It seems that all callbacks (in pci_error_handlers
> structure) registered by drivers that gets called during error
> recovery are not mandatory. So the intention of the infrastructure
> design seems to be to allow forward progress even when a specific
> callback has not been registered by a driver. However, if error
> handler structure itself has not been registered, it doesn't make
> sense to allow forward progress.
>
> As a result of the current design, in the case of a single device
> having an AER-unaware driver or in the case of any function in a
> multi-function card having an AER-unaware driver, a successful
> recovery is reported.
>
> Typical scenario this happens is when a PCI device is detached
> from a KVM host and the pci-stub driver on the host claims the
> device. The pci-stub driver does not have error handling capabilities
> but the AER infrastructure still reports that the device recovered
> successfully.
>
> The changes proposed here leaves the device(s)in an unrecovered state
> if the driver for the device or for any device in the subtree
> does not have error handler structure registered. This reflects
> the true state of the device and prevents any partial recovery (or no
> recovery at all) reported as successful.
>
>
> v3:
>   - Fixed a checkpatch/build issue
>
> v2:
>   - Made changes so that all devices in the subtree have the error
>     state set correctly.
>
>
> Reviewed-by: Linas Vepstas <linasvepstas <at> gmail.com>
> Reviewed-by: Myron Stowe <mstowe <at> redhat.com>
> Reviewed-by: Bjorn Helgaas <bhelgaas <at> google.com>
> Signed-off-by: Vijay Mohan Pandarathil <vijaymohan.pandarathil@hp.com>

I added this to my -next branch, which will be merged for v3.8.

Please double-check that I resolved the merge conflict correctly.  It
should show up here in the next few minutes:

http://git.kernel.org/?p=linux/kernel/git/helgaas/pci.git;a=shortlog;h=refs/heads/next

>  drivers/pci/pcie/aer/aerdrv.h      |  5 ++++-
>  drivers/pci/pcie/aer/aerdrv_core.c | 21 ++++++++++++++++++---
>  include/linux/pci.h                |  3 +++
>  3 files changed, 25 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
> index 94a7598..22f840f 100644
> --- a/drivers/pci/pcie/aer/aerdrv.h
> +++ b/drivers/pci/pcie/aer/aerdrv.h
> @@ -87,6 +87,9 @@ struct aer_broadcast_data {
>  static inline pci_ers_result_t merge_result(enum pci_ers_result orig,
>                 enum pci_ers_result new)
>  {
> +       if (new == PCI_ERS_RESULT_NO_AER_DRIVER)
> +               return PCI_ERS_RESULT_NO_AER_DRIVER;
> +
>         if (new == PCI_ERS_RESULT_NONE)
>                 return orig;
>
> @@ -97,7 +100,7 @@ static inline pci_ers_result_t merge_result(enum pci_ers_result orig,
>                 break;
>         case PCI_ERS_RESULT_DISCONNECT:
>                 if (new == PCI_ERS_RESULT_NEED_RESET)
> -                       orig = new;
> +                       orig = PCI_ERS_RESULT_NEED_RESET;
>                 break;
>         default:
>                 break;
> diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
> index 9f80cb0..b67f91a 100644
> --- a/drivers/pci/pcie/aer/aerdrv_core.c
> +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> @@ -232,11 +232,26 @@ static int report_error_detected(struct pci_dev *dev, void *data)
>                                    dev->driver ?
>                                    "no AER-aware driver" : "no driver");
>                 }
> -               return 0;
> +
> +               /*
> +                * If there's any device in the subtree that does not
> +                * have an error_detected callback, returning
> +                * PCI_ERS_RESULT_NO_AER_DRIVER prevents calling of
> +                * the subsequent mmio_enabled/slot_reset/resume
> +                * callbacks of "any" device in the subtree. All the
> +                * devices in the subtree are left in the error state
> +                * without recovery.
> +                */
> +
> +               if (!(dev->hdr_type & PCI_HEADER_TYPE_BRIDGE))
> +                       vote = PCI_ERS_RESULT_NO_AER_DRIVER;
> +               else
> +                       vote = PCI_ERS_RESULT_NONE;
> +       } else {
> +               err_handler = dev->driver->err_handler;
> +               vote = err_handler->error_detected(dev, result_data->state);
>         }
>
> -       err_handler = dev->driver->err_handler;
> -       vote = err_handler->error_detected(dev, result_data->state);
>         result_data->result = merge_result(result_data->result, vote);
>         return 0;
>  }
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index ab17a08..c458602 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -540,6 +540,9 @@ enum pci_ers_result {
>
>         /* Device driver is fully recovered and operational */
>         PCI_ERS_RESULT_RECOVERED = (__force pci_ers_result_t) 5,
> +
> +       /* No AER capabilities registered for the driver */
> +       PCI_ERS_RESULT_NO_AER_DRIVER = (__force pci_ers_result_t) 6,
>  };
>
>  /* PCI bus error event callbacks */
> --
> 1.7.11.3
>

^ permalink raw reply	[flat|nested] 3+ messages in thread

* RE: [PATCH v3] PCI-AER: Do not report successful error recovery for devices with AER-unaware drivers
  2012-11-28 20:15 ` Bjorn Helgaas
@ 2012-11-28 20:30   ` Pandarathil, Vijaymohan R
  0 siblings, 0 replies; 3+ messages in thread
From: Pandarathil, Vijaymohan R @ 2012-11-28 20:30 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-kernel, linux-pci, linasvepstas, Myron Stowe, Ortiz,
	Lance E, Huang Ying, Hidetoshi Seto, Patterson,
	Andrew D (LeftHand Networks),
	Zhang Yanmin

Looks correct.

Vijay

> -----Original Message-----
> From: Bjorn Helgaas [mailto:bhelgaas@google.com]
> Sent: Wednesday, November 28, 2012 12:15 PM
> To: Pandarathil, Vijaymohan R
> Cc: linux-kernel@vger.kernel.org; linux-pci@vger.kernel.org;
> linasvepstas@gmail.com; Myron Stowe; Ortiz, Lance E; Huang Ying; Hidetoshi
> Seto; Patterson, Andrew D (LeftHand Networks); Zhang Yanmin
> Subject: Re: [PATCH v3] PCI-AER: Do not report successful error recovery
> for devices with AER-unaware drivers
> 
> On Sat, Nov 17, 2012 at 4:47 AM, Pandarathil, Vijaymohan R
> <vijaymohan.pandarathil@hp.com> wrote:
> > When an error is detected on a PCIe device which does not have an
> > AER-aware driver, prevent AER infrastructure from reporting
> > successful error recovery.
> >
> > This is because the report_error_detected() function that gets
> > called in the first phase of recovery process allows forward
> > progress even when the driver for the device does not have AER
> > capabilities. It seems that all callbacks (in pci_error_handlers
> > structure) registered by drivers that gets called during error
> > recovery are not mandatory. So the intention of the infrastructure
> > design seems to be to allow forward progress even when a specific
> > callback has not been registered by a driver. However, if error
> > handler structure itself has not been registered, it doesn't make
> > sense to allow forward progress.
> >
> > As a result of the current design, in the case of a single device
> > having an AER-unaware driver or in the case of any function in a
> > multi-function card having an AER-unaware driver, a successful
> > recovery is reported.
> >
> > Typical scenario this happens is when a PCI device is detached
> > from a KVM host and the pci-stub driver on the host claims the
> > device. The pci-stub driver does not have error handling capabilities
> > but the AER infrastructure still reports that the device recovered
> > successfully.
> >
> > The changes proposed here leaves the device(s)in an unrecovered state
> > if the driver for the device or for any device in the subtree
> > does not have error handler structure registered. This reflects
> > the true state of the device and prevents any partial recovery (or no
> > recovery at all) reported as successful.
> >
> >
> > v3:
> >   - Fixed a checkpatch/build issue
> >
> > v2:
> >   - Made changes so that all devices in the subtree have the error
> >     state set correctly.
> >
> >
> > Reviewed-by: Linas Vepstas <linasvepstas <at> gmail.com>
> > Reviewed-by: Myron Stowe <mstowe <at> redhat.com>
> > Reviewed-by: Bjorn Helgaas <bhelgaas <at> google.com>
> > Signed-off-by: Vijay Mohan Pandarathil <vijaymohan.pandarathil@hp.com>
> 
> I added this to my -next branch, which will be merged for v3.8.
> 
> Please double-check that I resolved the merge conflict correctly.  It
> should show up here in the next few minutes:
> 
> http://git.kernel.org/?p=linux/kernel/git/helgaas/pci.git;a=shortlog;h=refs
> /heads/next
> 
> >  drivers/pci/pcie/aer/aerdrv.h      |  5 ++++-
> >  drivers/pci/pcie/aer/aerdrv_core.c | 21 ++++++++++++++++++---
> >  include/linux/pci.h                |  3 +++
> >  3 files changed, 25 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/pci/pcie/aer/aerdrv.h
> b/drivers/pci/pcie/aer/aerdrv.h
> > index 94a7598..22f840f 100644
> > --- a/drivers/pci/pcie/aer/aerdrv.h
> > +++ b/drivers/pci/pcie/aer/aerdrv.h
> > @@ -87,6 +87,9 @@ struct aer_broadcast_data {
> >  static inline pci_ers_result_t merge_result(enum pci_ers_result orig,
> >                 enum pci_ers_result new)
> >  {
> > +       if (new == PCI_ERS_RESULT_NO_AER_DRIVER)
> > +               return PCI_ERS_RESULT_NO_AER_DRIVER;
> > +
> >         if (new == PCI_ERS_RESULT_NONE)
> >                 return orig;
> >
> > @@ -97,7 +100,7 @@ static inline pci_ers_result_t merge_result(enum
> pci_ers_result orig,
> >                 break;
> >         case PCI_ERS_RESULT_DISCONNECT:
> >                 if (new == PCI_ERS_RESULT_NEED_RESET)
> > -                       orig = new;
> > +                       orig = PCI_ERS_RESULT_NEED_RESET;
> >                 break;
> >         default:
> >                 break;
> > diff --git a/drivers/pci/pcie/aer/aerdrv_core.c
> b/drivers/pci/pcie/aer/aerdrv_core.c
> > index 9f80cb0..b67f91a 100644
> > --- a/drivers/pci/pcie/aer/aerdrv_core.c
> > +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> > @@ -232,11 +232,26 @@ static int report_error_detected(struct pci_dev
> *dev, void *data)
> >                                    dev->driver ?
> >                                    "no AER-aware driver" : "no driver");
> >                 }
> > -               return 0;
> > +
> > +               /*
> > +                * If there's any device in the subtree that does not
> > +                * have an error_detected callback, returning
> > +                * PCI_ERS_RESULT_NO_AER_DRIVER prevents calling of
> > +                * the subsequent mmio_enabled/slot_reset/resume
> > +                * callbacks of "any" device in the subtree. All the
> > +                * devices in the subtree are left in the error state
> > +                * without recovery.
> > +                */
> > +
> > +               if (!(dev->hdr_type & PCI_HEADER_TYPE_BRIDGE))
> > +                       vote = PCI_ERS_RESULT_NO_AER_DRIVER;
> > +               else
> > +                       vote = PCI_ERS_RESULT_NONE;
> > +       } else {
> > +               err_handler = dev->driver->err_handler;
> > +               vote = err_handler->error_detected(dev, result_data-
> >state);
> >         }
> >
> > -       err_handler = dev->driver->err_handler;
> > -       vote = err_handler->error_detected(dev, result_data->state);
> >         result_data->result = merge_result(result_data->result, vote);
> >         return 0;
> >  }
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index ab17a08..c458602 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -540,6 +540,9 @@ enum pci_ers_result {
> >
> >         /* Device driver is fully recovered and operational */
> >         PCI_ERS_RESULT_RECOVERED = (__force pci_ers_result_t) 5,
> > +
> > +       /* No AER capabilities registered for the driver */
> > +       PCI_ERS_RESULT_NO_AER_DRIVER = (__force pci_ers_result_t) 6,
> >  };
> >
> >  /* PCI bus error event callbacks */
> > --
> > 1.7.11.3
> >

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2012-11-28 20:31 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-11-17 11:47 [PATCH v3] PCI-AER: Do not report successful error recovery for devices with AER-unaware drivers Pandarathil, Vijaymohan R
2012-11-28 20:15 ` Bjorn Helgaas
2012-11-28 20:30   ` Pandarathil, Vijaymohan R

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.