* [PATCH] PCI/PME: Fix race on PME polling
  From: Lukas Wunner @ 2019-06-09 11:29 UTC
  To: Bjorn Helgaas, linux-pci, linux-kernel
  Cc: Mika Westerberg, Rafael J. Wysocki, Keith Busch, Alex Williamson,
      Alexandru Gagniuc

Since commit df17e62e5bff ("PCI: Add support for polling PME state on
suspended legacy PCI devices"), the work item pci_pme_list_scan() polls
the PME status flag of devices and wakes them up if the bit is set.

The function checks whether a device's upstream bridge is in D0, since
otherwise the device is inaccessible and PME polling is impossible.
However, the check is racy because it is performed before polling the
device. If the upstream bridge runtime suspends to D3hot after
pci_pme_list_scan() checks its power state and before it invokes
pci_pme_wakeup(), the latter will read the PMCSR as "all ones" and
mistake it for a set PME status flag. I am seeing this race play out as
a Thunderbolt controller going to D3cold and occasionally immediately
going back to D0 because PME polling was performed at just the wrong
time.

Avoid this by checking for an "all ones" PMCSR in
pci_check_pme_status().

Fixes: 58ff463396ad ("PCI PM: Add function for checking PME status of devices")
Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org # v2.6.34+
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/pci/pci.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 8abc843b1615..eed5db9f152f 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1989,6 +1989,8 @@ bool pci_check_pme_status(struct pci_dev *dev)
 	pci_read_config_word(dev, pmcsr_pos, &pmcsr);
 	if (!(pmcsr & PCI_PM_CTRL_PME_STATUS))
 		return false;
+	if (pmcsr == 0xffff)
+		return false;
 
 	/* Clear PME status. */
 	pmcsr |= PCI_PM_CTRL_PME_STATUS;
-- 
2.20.1
* Re: [PATCH] PCI/PME: Fix race on PME polling
  From: Rafael J. Wysocki @ 2019-06-17 10:37 UTC
  To: Lukas Wunner
  Cc: Bjorn Helgaas, linux-pci, linux-kernel, Mika Westerberg,
      Rafael J. Wysocki, Keith Busch, Alex Williamson, Alexandru Gagniuc

On Sunday, June 9, 2019 1:29:33 PM CEST Lukas Wunner wrote:
> Since commit df17e62e5bff ("PCI: Add support for polling PME state on
> suspended legacy PCI devices"), the work item pci_pme_list_scan() polls
> the PME status flag of devices and wakes them up if the bit is set.
>
> [...]
>
> Avoid this by checking for an "all ones" PMCSR in
> pci_check_pme_status().
>
> @@ -1989,6 +1989,8 @@ bool pci_check_pme_status(struct pci_dev *dev)
> 	pci_read_config_word(dev, pmcsr_pos, &pmcsr);
> 	if (!(pmcsr & PCI_PM_CTRL_PME_STATUS))
> 		return false;
> +	if (pmcsr == 0xffff)
> +		return false;
>
> 	/* Clear PME status. */
> 	pmcsr |= PCI_PM_CTRL_PME_STATUS;

Added to my 5.3 queue, thanks!
* Re: [PATCH] PCI/PME: Fix race on PME polling
  From: Mika Westerberg @ 2019-06-17 14:35 UTC
  To: Rafael J. Wysocki
  Cc: Lukas Wunner, Bjorn Helgaas, linux-pci, linux-kernel,
      Rafael J. Wysocki, Keith Busch, Alex Williamson, Alexandru Gagniuc

On Mon, Jun 17, 2019 at 12:37:06PM +0200, Rafael J. Wysocki wrote:
> On Sunday, June 9, 2019 1:29:33 PM CEST Lukas Wunner wrote:
> > [...]
> >
> > Avoid this by checking for an "all ones" PMCSR in
> > pci_check_pme_status().
>
> Added to my 5.3 queue, thanks!

Today while doing some PM testing I noticed that this patch actually
reveals an issue in our native PME handling. The problem is in
pcie_pme_handle_request(), where we first convert req_id to a struct
pci_dev and then call pci_check_pme_status() for it. Now, when a device
triggers a wake, the link is first brought up and then the PME is sent
to the root complex with a req_id matching the originating device.
However, if there are PCIe ports in the middle, they may still be in
D3, which means that the config read returns 0xffff for the device
below, so there are lots of

  "Spurious native interrupt"

messages in dmesg but the actual PME is never handled.

It has been working because pci_check_pme_status() returned true in the
0xffff case as well, and we went and runtime resumed the originating
device.

I think the correct way to handle this is actually to drop the call to
pci_check_pme_status() in pcie_pme_handle_request(), because the whole
idea of the req_id in the PME message is to allow the root complex and
software to identify the device without needing to poll for the PME
status bit.
* Re: [PATCH] PCI/PME: Fix race on PME polling
  From: Lukas Wunner @ 2019-06-17 14:53 UTC
  To: Mika Westerberg
  Cc: Rafael J. Wysocki, Bjorn Helgaas, linux-pci, linux-kernel,
      Rafael J. Wysocki, Keith Busch, Alex Williamson, Alexandru Gagniuc

On Mon, Jun 17, 2019 at 05:35:10PM +0300, Mika Westerberg wrote:
> [...]
>
> I think the correct way to handle this is actually to drop the call to
> pci_check_pme_status() in pcie_pme_handle_request(), because the whole
> idea of the req_id in the PME message is to allow the root complex and
> software to identify the device without needing to poll for the PME
> status bit.

Either that or the call to pci_check_pme_status() should be
encapsulated in a pci_config_pm_runtime_get() / _put() pair.

Thanks,

Lukas
* Re: [PATCH] PCI/PME: Fix race on PME polling
  From: Rafael J. Wysocki @ 2019-06-17 22:43 UTC
  To: Lukas Wunner
  Cc: Mika Westerberg, Bjorn Helgaas, linux-pci, linux-kernel,
      Keith Busch, Alex Williamson, Alexandru Gagniuc

On Monday, June 17, 2019 4:53:48 PM CEST Lukas Wunner wrote:
> On Mon, Jun 17, 2019 at 05:35:10PM +0300, Mika Westerberg wrote:
> > [...]
> >
> > I think the correct way to handle this is actually to drop the call
> > to pci_check_pme_status() in pcie_pme_handle_request(), because the
> > whole idea of the req_id in the PME message is to allow the root
> > complex and software to identify the device without needing to poll
> > for the PME status bit.
>
> Either that or the call to pci_check_pme_status() should be
> encapsulated in a pci_config_pm_runtime_get() / _put() pair.

And the whole hierarchy might as well be resumed, which could be rather
wasteful.

The problem is that the $subject patch should affect polling only, but
it has side effects beyond that.
* Re: [PATCH] PCI/PME: Fix race on PME polling
  From: Rafael J. Wysocki @ 2019-06-17 22:41 UTC
  To: Mika Westerberg, Keith Busch
  Cc: Lukas Wunner, Bjorn Helgaas, linux-pci, linux-kernel,
      Alex Williamson, Alexandru Gagniuc

On Monday, June 17, 2019 4:35:10 PM CEST Mika Westerberg wrote:
> [...]
>
> Today while doing some PM testing I noticed that this patch actually
> reveals an issue in our native PME handling. The problem is in
> pcie_pme_handle_request(), where we first convert req_id to a struct
> pci_dev and then call pci_check_pme_status() for it. Now, when a
> device triggers a wake, the link is first brought up and then the PME
> is sent to the root complex with a req_id matching the originating
> device. However, if there are PCIe ports in the middle, they may
> still be in D3, which means that the config read returns 0xffff for
> the device below, so there are lots of
>
>   "Spurious native interrupt"
>
> messages in dmesg but the actual PME is never handled.
>
> It has been working because pci_check_pme_status() returned true in
> the 0xffff case as well, and we went and runtime resumed the
> originating device.

In this case 0xffff is as good as PME Status set, that is, the device
needs to be resumed.

This is a regression introduced by the $subject patch, not a bug in the
PME code.

> I think the correct way to handle this is actually to drop the call to
> pci_check_pme_status() in pcie_pme_handle_request(), because the whole
> idea of the req_id in the PME message is to allow the root complex and
> software to identify the device without needing to poll for the PME
> status bit.

Not really, because if there is a PCIe-to-PCI bridge below the port, it
is expected to use the req_id of the bridge for all of the devices
below it.

I'm going to drop this patch from my queue.
* Re: [PATCH] PCI/PME: Fix race on PME polling
  From: Mika Westerberg @ 2019-06-18 9:45 UTC
  To: Rafael J. Wysocki
  Cc: Keith Busch, Lukas Wunner, Bjorn Helgaas, linux-pci,
      linux-kernel, Alex Williamson, Alexandru Gagniuc

On Tue, Jun 18, 2019 at 12:41:01AM +0200, Rafael J. Wysocki wrote:
> [...]
>
> In this case 0xffff is as good as PME Status set, that is, the device
> needs to be resumed.
>
> This is a regression introduced by the $subject patch, not a bug in
> the PME code.

OK, thanks for the explanation.

> > I think the correct way to handle this is actually to drop the call
> > to pci_check_pme_status() in pcie_pme_handle_request(), because the
> > whole idea of the req_id in the PME message is to allow the root
> > complex and software to identify the device without needing to poll
> > for the PME status bit.
>
> Not really, because if there is a PCIe-to-PCI bridge below the port,
> it is expected to use the req_id of the bridge for all of the devices
> below it.

Right, I forgot about that, so indeed we need to check the PME status
in that case to find out the correct device.