From: Bjorn Helgaas <helgaas@kernel.org>
To: Oza Pawandeep <poza@codeaurora.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
	Philippe Ombredanne <pombredanne@nexb.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Kate Stewart <kstewart@linuxfoundation.org>,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	Dongdong Liu <liudongdong3@huawei.com>,
	Keith Busch <keith.busch@intel.com>, Wei Zhang <wzhang@fb.com>,
	Sinan Kaya <okaya@codeaurora.org>,
	Timur Tabi <timur@codeaurora.org>
Subject: Re: [PATCH v15 3/9] PCI/AER: Handle ERR_FATAL with removal and re-enumeration of devices
Date: Tue, 8 May 2018 18:53:30 -0500
Message-ID: <20180508235330.GN161390@bhelgaas-glaptop.roam.corp.google.com>
In-Reply-To: <1525323838-1735-4-git-send-email-poza@codeaurora.org>

On Thu, May 03, 2018 at 01:03:52AM -0400, Oza Pawandeep wrote:
> This patch alters the behavior of handling of ERR_FATAL, where removal
> of devices is initiated, followed by reset link, followed by
> re-enumeration.
>
> So the errors are handled in a different way as follows:
> ERR_NONFATAL => call driver recovery entry points
> ERR_FATAL => remove and re-enumerate
>
> please refer to Documentation/PCI/pci-error-recovery.txt for more details.
>
> Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
>
> diff --git a/drivers/pci/pcie/aer/aerdrv.c b/drivers/pci/pcie/aer/aerdrv.c
> index 779b387..206f590 100644
> --- a/drivers/pci/pcie/aer/aerdrv.c
> +++ b/drivers/pci/pcie/aer/aerdrv.c
> @@ -330,6 +330,13 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
>  	reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
>  	pci_write_config_dword(dev, pos + PCI_ERR_ROOT_COMMAND, reg32);
>
> +	/*
> +	 * This function is called only on ERR_FATAL now, and since
> +	 * the pci_report_resume is called only in ERR_NONFATAL case,
> +	 * the clearing part has to be taken care here.
> +	 */
> +	aer_error_resume(dev);

I don't understand this part.  Previously the ERR_FATAL path looked like
this:

  do_recovery
    reset_link
      driver->reset_link
        aer_root_reset
          pci_reset_bridge_secondary_bus                 # <-- reset
    broadcast_error_message(..., report_resume)
      pci_walk_bus(..., report_resume, ...)
        report_resume
      if (cb == report_resume)
        pci_cleanup_aer_uncorrect_error_status
          pci_write_config_dword(PCI_ERR_UNCOR_STATUS)   # <-- clear status

After this patch, it will look like this:

  do_recovery
    do_fatal_recovery
      pci_cleanup_aer_uncorrect_error_status
        pci_write_config_dword(PCI_ERR_UNCOR_STATUS)       # <-- clear status
      reset_link
        driver->reset_link
          aer_root_reset
            pci_reset_bridge_secondary_bus                 # <-- reset
            aer_error_resume
              pcie_capability_write_word(PCI_EXP_DEVSTA)   # <-- clear more
              pci_write_config_dword(PCI_ERR_UNCOR_STATUS) # <-- clear status

So if I'm understanding correctly, the new path clears the status too
early, then clears it again (plus clearing DEVSTA, which we didn't do
before) later.

I would think we would want to leave aer_root_reset() alone, and just move
the pci_cleanup_aer_uncorrect_error_status() in do_fatal_recovery() down so
it happens after we call reset_link().  That way the reset/clear sequence
would be the same as it was before.
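
To make the ordering argument concrete, here is a rough user-space model
(not kernel code).  Every model_* function is a hypothetical stand-in that
only records the name of a step; nothing here touches config space.  The
first two paths replay the call sequences sketched above, and the third
assumes the reordering suggested here: aer_root_reset() left unchanged and
pci_cleanup_aer_uncorrect_error_status() moved after reset_link().

```c
#include <assert.h>
#include <string.h>

/* User-space model only: each step appends its name to a trace string. */
enum { TRACE_MAX = 128 };

struct trace {
	char buf[TRACE_MAX];
};

static void trace_add(struct trace *t, const char *step)
{
	/* Room for an optional separator, the step name, and the NUL. */
	assert(strlen(t->buf) + strlen(step) + 2 <= TRACE_MAX);
	if (t->buf[0] != '\0')
		strcat(t->buf, " ");
	strcat(t->buf, step);
}

/* Hypothetical stand-ins for the kernel operations named in the review. */
static void model_secondary_bus_reset(struct trace *t)
{
	trace_add(t, "reset");		/* pci_reset_bridge_secondary_bus */
}

static void model_clear_uncor_status(struct trace *t)
{
	trace_add(t, "clear_uncor");	/* write PCI_ERR_UNCOR_STATUS */
}

static void model_clear_devsta(struct trace *t)
{
	trace_add(t, "clear_devsta");	/* write PCI_EXP_DEVSTA */
}

/* Before the patch: reset first, status cleared on the report_resume pass. */
static void model_old_fatal_path(struct trace *t)
{
	model_secondary_bus_reset(t);
	model_clear_uncor_status(t);
}

/* v15 as posted: early clear, reset, then aer_error_resume clears again. */
static void model_v15_fatal_path(struct trace *t)
{
	model_clear_uncor_status(t);	/* do_fatal_recovery, before reset_link */
	model_secondary_bus_reset(t);	/* aer_root_reset */
	model_clear_devsta(t);		/* aer_error_resume */
	model_clear_uncor_status(t);	/* aer_error_resume */
}

/* Assumed shape of the suggestion: cleanup moved after reset_link. */
static void model_suggested_fatal_path(struct trace *t)
{
	model_secondary_bus_reset(t);
	model_clear_uncor_status(t);
}
```

Running each path on a zeroed struct trace and comparing the buffers should
show the old and suggested paths recording the same steps in the same
order, while the v15 path records an extra early clear plus the DEVSTA
clear.
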

>  	return PCI_ERS_RESULT_RECOVERED;
>  }
>
> diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
> index 0ea5acc..655d4e8 100644
> --- a/drivers/pci/pcie/aer/aerdrv_core.c
> +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> @@ -20,6 +20,7 @@
> #include <linux/slab.h>
> #include <linux/kfifo.h>
> #include "aerdrv.h"
> +#include "../../pci.h"
>
>  #define PCI_EXP_AER_FLAGS (PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE | \
>  				PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE)
> @@ -474,6 +475,44 @@ static pci_ers_result_t reset_link(struct pci_dev *dev)
>  	return status;
>  }
>
> +static pci_ers_result_t do_fatal_recovery(struct pci_dev *dev, int severity)
> +{
> +	struct pci_dev *udev;
> +	struct pci_bus *parent;
> +	struct pci_dev *pdev, *temp;
> +	pci_ers_result_t result = PCI_ERS_RESULT_RECOVERED;
> +
> +	if (severity == AER_FATAL)
> +		pci_cleanup_aer_uncorrect_error_status(dev);
> +
> +	if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE)
> +		udev = dev;
> +	else
> +		udev = dev->bus->self;
> +
> +	parent = udev->subordinate;
> +	pci_lock_rescan_remove();
> +	list_for_each_entry_safe_reverse(pdev, temp, &parent->devices,
> +					 bus_list) {
> +		pci_dev_get(pdev);
> +		pci_dev_set_disconnected(pdev, NULL);
> +		if (pci_has_subordinate(pdev))
> +			pci_walk_bus(pdev->subordinate,
> +				     pci_dev_set_disconnected, NULL);
> +		pci_stop_and_remove_bus_device(pdev);
> +		pci_dev_put(pdev);
> +	}
> +
> +	result = reset_link(udev);
> +	if (result == PCI_ERS_RESULT_RECOVERED)
> +		if (pcie_wait_for_link(udev, true))
> +			pci_rescan_bus(udev->bus);
> +
> +	pci_unlock_rescan_remove();
> +
> +	return result;
> +}
> +
> /**
> * do_recovery - handle nonfatal/fatal error recovery process
> * @dev: pointer to a pci_dev data structure of agent detecting an error
> @@ -485,11 +524,15 @@ static pci_ers_result_t reset_link(struct pci_dev *dev)
> */
> static void do_recovery(struct pci_dev *dev, int severity)
> {
> -	pci_ers_result_t status, result = PCI_ERS_RESULT_RECOVERED;
> +	pci_ers_result_t status;
>  	enum pci_channel_state state;
>
> -	if (severity == AER_FATAL)
> -		state = pci_channel_io_frozen;
> +	if (severity == AER_FATAL) {
> +		status = do_fatal_recovery(dev, severity);
> +		if (status != PCI_ERS_RESULT_RECOVERED)
> +			goto failed;
> +		return;
> +	}
>  	else
>  		state = pci_channel_io_normal;
>
> @@ -498,12 +541,6 @@ static void do_recovery(struct pci_dev *dev, int severity)
>  			"error_detected",
>  			report_error_detected);
>
> -	if (severity == AER_FATAL) {
> -		result = reset_link(dev);
> -		if (result != PCI_ERS_RESULT_RECOVERED)
> -			goto failed;
> -	}
> -
>  	if (status == PCI_ERS_RESULT_CAN_RECOVER)
>  		status = broadcast_error_message(dev,
>  				state,
> --
> 2.7.4
>