From: Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@linux.intel.com>
To: Kai-Heng Feng <kai.heng.feng@canonical.com>, bhelgaas@google.com
Cc: mika.westerberg@linux.intel.com, koba.ko@canonical.com,
	Russell Currey <ruscur@russell.cc>,
	Oliver O'Halloran <oohall@gmail.com>,
	Lalithambika Krishnakumar <lalithambika.krishnakumar@intel.com>,
	Lu Baolu <baolu.lu@linux.intel.com>,
	Joerg Roedel <jroedel@suse.de>,
	linuxppc-dev@lists.ozlabs.org, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 1/2] PCI/AER: Disable AER service when link is in L2/L3 ready, L2 and L3 state
Date: Sat, 19 Mar 2022 13:38:39 -0700
Message-ID: <427f19c6-32f0-684e-5fdd-2e5ed192b71d@linux.intel.com>
In-Reply-To: <20220127025418.1989642-1-kai.heng.feng@canonical.com>

On 1/26/22 6:54 PM, Kai-Heng Feng wrote:
> Commit 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in
> hint") enables ACS, and some platforms lose its NVMe after resume from

Why does enabling ACS make the platform lose its NVMe? Can you add more
details about the problem?

> S3:
> [   50.947816] pcieport 0000:00:1b.0: DPC: containment event, status:0x1f01 source:0x0000
> [   50.947817] pcieport 0000:00:1b.0: DPC: unmasked uncorrectable error detected
> [   50.947829] pcieport 0000:00:1b.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID)
> [   50.947830] pcieport 0000:00:1b.0:   device [8086:06ac] error status/mask=00200000/00010000
> [   50.947831] pcieport 0000:00:1b.0:    [21] ACSViol                (First)
> [   50.947841] pcieport 0000:00:1b.0: AER: broadcast error_detected message
> [   50.947843] nvme nvme0: frozen state error detected, reset controller
>
> It happens right after ACS gets enabled during resume.
>
> There's another case, when Thunderbolt reaches D3cold:
> [   30.100211] pcieport 0000:00:1d.0: AER: Uncorrected (Non-Fatal) error received: 0000:00:1d.0
> [   30.100251] pcieport 0000:00:1d.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
> [   30.100256] pcieport 0000:00:1d.0:   device [8086:7ab0] error status/mask=00100000/00004000
> [   30.100262] pcieport 0000:00:1d.0:    [20] UnsupReq               (First)
> [   30.100267] pcieport 0000:00:1d.0: AER:   TLP Header: 34000000 08000052 00000000 00000000
> [   30.100372] thunderbolt 0000:0a:00.0: AER: can't recover (no error_detected callback)

The "no error_detected callback" message means that one or more devices
under the given port do not provide an error handler. How is this
related to ACS?

> [   30.100401] xhci_hcd 0000:3e:00.0: AER: can't recover (no error_detected callback)
> [   30.100427] pcieport 0000:00:1d.0: AER: device recovery failed
>
> So disable AER service to avoid the noises from turning power rails
> on/off when the device is in low power states (D3hot and D3cold), as
> PCIe spec "5.2 Link State Power Management" states that TLP and DLLP
> transmission is disabled for a Link in L2/L3 Ready (D3hot), L2 (D3cold
> with aux power) and L3 (D3cold).
>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=209149
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215453
> Fixes: 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in hint")
> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> ---
> v2:
>  - Wording change.
>
>  drivers/pci/pcie/aer.c | 31 +++++++++++++++++++++++++------
>  1 file changed, 25 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 9fa1f97e5b270..e4e9d4a3098d7 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1367,6 +1367,22 @@ static int aer_probe(struct pcie_device *dev)
>  	return 0;
>  }
>
> +static int aer_suspend(struct pcie_device *dev)
> +{
> +	struct aer_rpc *rpc = get_service_data(dev);
> +
> +	aer_disable_rootport(rpc);
> +	return 0;
> +}
> +
> +static int aer_resume(struct pcie_device *dev)
> +{
> +	struct aer_rpc *rpc = get_service_data(dev);
> +
> +	aer_enable_rootport(rpc);
> +	return 0;
> +}
> +
>  /**
>   * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP
>   * @dev: pointer to Root Port, RCEC, or RCiEP
> @@ -1433,12 +1449,15 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
>  }
>
>  static struct pcie_port_service_driver aerdriver = {
> -	.name		= "aer",
> -	.port_type	= PCIE_ANY_PORT,
> -	.service	= PCIE_PORT_SERVICE_AER,
> -
> -	.probe		= aer_probe,
> -	.remove		= aer_remove,
> +	.name			= "aer",
> +	.port_type		= PCIE_ANY_PORT,
> +	.service		= PCIE_PORT_SERVICE_AER,
> +	.probe			= aer_probe,
> +	.suspend		= aer_suspend,
> +	.resume			= aer_resume,
> +	.runtime_suspend	= aer_suspend,
> +	.runtime_resume		= aer_resume,
> +	.remove			= aer_remove,
>  };
>
>  /**

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
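Editorial aside, not part of the original message: the "error status/mask" pairs in the quoted logs can be read by hand, and the following is only a rough sketch of that reading. It assumes the two bit positions that the logs themselves print ([20] UnsupReq and [21] ACSViol); the helper names first_unmasked_bit() and uncor_bit_name() are hypothetical and are not kernel functions.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return the lowest bit that is set in the AER
 * uncorrectable error status value but not set in the mask value,
 * or -1 if every reported bit is masked.
 */
static int first_unmasked_bit(uint32_t status, uint32_t mask)
{
	uint32_t pending = status & ~mask;

	for (int bit = 0; bit < 32; bit++) {
		if (pending & (UINT32_C(1) << bit))
			return bit;
	}
	return -1;
}

/*
 * Hypothetical helper: name the two bits that appear in the quoted
 * logs. Every other bit is reported as "unknown" by this sketch.
 */
static const char *uncor_bit_name(int bit)
{
	switch (bit) {
	case 20:
		return "UnsupReq";	/* Unsupported Request */
	case 21:
		return "ACSViol";	/* ACS Violation */
	default:
		return "unknown";
	}
}
```

Applied to the values in the logs, first_unmasked_bit(0x00200000, 0x00010000) selects bit 21 for the NVMe case and first_unmasked_bit(0x00100000, 0x00004000) selects bit 20 for the Thunderbolt case, which matches the "[21] ACSViol" and "[20] UnsupReq" lines printed by the kernel.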