* [PATCH] PCI: pciehp: clear cmd_busy bit when Command Completed in polling mode @ 2021-11-11 5:42 Liguang Zhang 2021-11-18 11:00 ` luanshi ` (2 more replies) 0 siblings, 3 replies; 11+ messages in thread From: Liguang Zhang @ 2021-11-11 5:42 UTC (permalink / raw) To: Bjorn Helgaas, Lukas Wunner, Kuppuswamy Sathyanarayanan, Amey Narkhede Cc: linux-pci, linux-kernel, Liguang Zhang This patch fixes this problem that on driver probe from system startup, pciehp checks the Presence Detect State bit in the Slot Status register to bring up an occupied slot or bring down an unoccupied slot. If empty slot's power status is on, turn power off. The Hot-Plug interrupt isn't requested yet, so avoid triggering a notification by calling pcie_disable_notification(). Both the CCIE and HPIE bits are masked in pcie_disable_notification(), when we issue a hotplug command, pcie_wait_cmd() will polling the Command Completed bit instead of waiting for an interrupt. But cmd_busy bit was not cleared when Command Completed which results in timeouts like this in pciehp_power_off_slot() and pcie_init_notification(): pcieport 0000:00:03.0: pciehp: Timeout on hotplug command 0x01c0 (issued 2264 msec ago) pcieport 0000:00:03.0: pciehp: Timeout on hotplug command 0x05c0 (issued 2288 msec ago) Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com> --- drivers/pci/hotplug/pciehp_hpc.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c index 83a0fa119cae..8698aefc6041 100644 --- a/drivers/pci/hotplug/pciehp_hpc.c +++ b/drivers/pci/hotplug/pciehp_hpc.c @@ -98,6 +98,8 @@ static int pcie_poll_cmd(struct controller *ctrl, int timeout) if (slot_status & PCI_EXP_SLTSTA_CC) { pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, PCI_EXP_SLTSTA_CC); + ctrl->cmd_busy = 0; + smp_mb(); return 1; } msleep(10); -- 2.19.1.6.gb485710b ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] PCI: pciehp: clear cmd_busy bit when Command Completed in polling mode 2021-11-11 5:42 [PATCH] PCI: pciehp: clear cmd_busy bit when Command Completed in polling mode Liguang Zhang @ 2021-11-18 11:00 ` luanshi 2021-11-19 12:00 ` Lukas Wunner 2021-11-26 17:33 ` Lukas Wunner 2 siblings, 0 replies; 11+ messages in thread From: luanshi @ 2021-11-18 11:00 UTC (permalink / raw) To: Bjorn Helgaas, Lukas Wunner, Kuppuswamy Sathyanarayanan, Amey Narkhede Cc: linux-pci, linux-kernel Hi Bjorn & Lukas & Kuppuswamy & Amey, Gentle ping! Any comments on this patch? 在 2021/11/11 13:42, Liguang Zhang 写道: > This patch fixes this problem that on driver probe from system startup, > pciehp checks the Presence Detect State bit in the Slot Status register > to bring up an occupied slot or bring down an unoccupied slot. If empty > slot's power status is on, turn power off. The Hot-Plug interrupt isn't > requested yet, so avoid triggering a notification by calling > pcie_disable_notification(). > > Both the CCIE and HPIE bits are masked in pcie_disable_notification(), > when we issue a hotplug command, pcie_wait_cmd() will polling the > Command Completed bit instead of waiting for an interrupt. But cmd_busy > bit was not cleared when Command Completed which results in timeouts > like this in pciehp_power_off_slot() and pcie_init_notification(): > > pcieport 0000:00:03.0: pciehp: Timeout on hotplug command 0x01c0 > (issued 2264 msec ago) > pcieport 0000:00:03.0: pciehp: Timeout on hotplug command 0x05c0 > (issued 2288 msec ago) > > Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com> > --- > drivers/pci/hotplug/pciehp_hpc.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c > index 83a0fa119cae..8698aefc6041 100644 > --- a/drivers/pci/hotplug/pciehp_hpc.c > +++ b/drivers/pci/hotplug/pciehp_hpc.c > @@ -98,6 +98,8 @@ static int pcie_poll_cmd(struct controller *ctrl, int timeout) > if (slot_status & PCI_EXP_SLTSTA_CC) { > pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, > PCI_EXP_SLTSTA_CC); > + ctrl->cmd_busy = 0; > + smp_mb(); > return 1; > } > msleep(10); ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] PCI: pciehp: clear cmd_busy bit when Command Completed in polling mode 2021-11-11 5:42 [PATCH] PCI: pciehp: clear cmd_busy bit when Command Completed in polling mode Liguang Zhang 2021-11-18 11:00 ` luanshi @ 2021-11-19 12:00 ` Lukas Wunner 2021-11-21 1:50 ` luanshi 2021-11-26 17:33 ` Lukas Wunner 2 siblings, 1 reply; 11+ messages in thread From: Lukas Wunner @ 2021-11-19 12:00 UTC (permalink / raw) To: Liguang Zhang Cc: Bjorn Helgaas, Kuppuswamy Sathyanarayanan, Amey Narkhede, linux-pci, linux-kernel On Thu, Nov 11, 2021 at 01:42:58PM +0800, Liguang Zhang wrote: > Both the CCIE and HPIE bits are masked in pcie_disable_notification(), > when we issue a hotplug command, pcie_wait_cmd() will polling the > Command Completed bit instead of waiting for an interrupt. But cmd_busy > bit was not cleared when Command Completed which results in timeouts > like this in pciehp_power_off_slot() and pcie_init_notification(): > > pcieport 0000:00:03.0: pciehp: Timeout on hotplug command 0x01c0 > (issued 2264 msec ago) > pcieport 0000:00:03.0: pciehp: Timeout on hotplug command 0x05c0 > (issued 2288 msec ago) The first timeout occurs with the following bits set in ctrl->slot_ctrl: PCI_EXP_SLTCTL_PWR_IND_ON | PCI_EXP_SLTCTL_ATTN_IND_OFF Those bits are set by: board_added() pciehp_set_indicators() The second timeout occurs with: PCI_EXP_SLTCTL_PWR_IND_ON | PCI_EXP_SLTCTL_ATTN_IND_OFF | PCI_EXP_SLTCTL_PWR_OFF This might be triggered by: remove_board() pciehp_power_off_slot() So it seems Command Completed is not signaled when changing the Power Indicator, Attention Indicator and Power Controller Control bits in the Slot Control register. But apparently it works for the other bits. We know there are hotplug controllers out there which suffer from broken Command Completed support. They support it for the bits mentioned above but not the others. So the inverse behavior from your hotplug controller. See this code comment in pcie_do_write_cmd(): /* * Controllers with the Intel CF118 and similar errata advertise * Command Completed support, but they only set Command Completed * if we change the "Control" bits for power, power indicator, * attention indicator, or interlock. If we only change the * "Enable" bits, they never set the Command Completed bit. */ > --- a/drivers/pci/hotplug/pciehp_hpc.c > +++ b/drivers/pci/hotplug/pciehp_hpc.c > @@ -98,6 +98,8 @@ static int pcie_poll_cmd(struct controller *ctrl, int timeout) > if (slot_status & PCI_EXP_SLTSTA_CC) { > pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, > PCI_EXP_SLTSTA_CC); > + ctrl->cmd_busy = 0; > + smp_mb(); > return 1; > } > msleep(10); I suspect that this patch merely papers over the problem and that the real solution would be to either apply quirk_cmd_compl or a similar quirk to your hotplug controller. Please open a bug on bugzilla.kernel.org and attach full output of lspci -vv and dmesg. Be sure to add the following to the command line: pciehp.pciehp_debug=1 dyndbg="file pciehp* +p" Once you've done that, please report the bugzilla link here so that we can analyze the issue properly. Thanks, Lukas ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] PCI: pciehp: clear cmd_busy bit when Command Completed in polling mode 2021-11-19 12:00 ` Lukas Wunner @ 2021-11-21 1:50 ` luanshi 2021-11-21 6:35 ` Lukas Wunner 0 siblings, 1 reply; 11+ messages in thread From: luanshi @ 2021-11-21 1:50 UTC (permalink / raw) To: Lukas Wunner Cc: Bjorn Helgaas, Kuppuswamy Sathyanarayanan, Amey Narkhede, linux-pci, linux-kernel Hi,thanks for your comments. 在 2021/11/19 20:00, Lukas Wunner 写道: > On Thu, Nov 11, 2021 at 01:42:58PM +0800, Liguang Zhang wrote: >> Both the CCIE and HPIE bits are masked in pcie_disable_notification(), >> when we issue a hotplug command, pcie_wait_cmd() will polling the >> Command Completed bit instead of waiting for an interrupt. But cmd_busy >> bit was not cleared when Command Completed which results in timeouts >> like this in pciehp_power_off_slot() and pcie_init_notification(): >> >> pcieport 0000:00:03.0: pciehp: Timeout on hotplug command 0x01c0 >> (issued 2264 msec ago) >> pcieport 0000:00:03.0: pciehp: Timeout on hotplug command 0x05c0 >> (issued 2288 msec ago) > The first timeout occurs with the following bits set in ctrl->slot_ctrl: > PCI_EXP_SLTCTL_PWR_IND_ON | PCI_EXP_SLTCTL_ATTN_IND_OFF After some debug, the first timeout occurs: pciehp_power_off_slot pcie_write_cmd(ctrl, PCI_EXP_SLTCTL_PWR_OFF, PCI_EXP_SLTCTL_PCC) pcie_do_write_cmd /* * Always wait for any previous command that might still be in progress */ pcie_wait_cmd(ctrl); // at the beginning if (!ctrl->cmd_busy) // cmd_busy was not zero return; Why cmd_busy was not flase, because in function pcie_disable_notification cmd_busy was not setting correctly: pcie_disable_notification // Both the CCIE and HPIE bits are masked pcie_write_cmd pcie_do_write_cmd ctrl->cmd_busy = 1 // cmd_busy was setting to 1 pcie_wait_cmd pcie_poll_cmd // pcie_wait_cmd() will polling the Command Completed bit instead of waiting for an interrupt After debug we found Command Completed can be signaled without cmd_busy was setting to 0. If we use interrupt mode pciehp_isr: pciehp_isr if (events & PCI_EXP_SLTSTA_CC) { ctrl->cmd_busy = 0; // cmd_busy was setting to zero, this was left in pcie_poll_cmd. Thanks, Liguang > > Those bits are set by: > board_added() > pciehp_set_indicators() > > > The second timeout occurs with: > PCI_EXP_SLTCTL_PWR_IND_ON | PCI_EXP_SLTCTL_ATTN_IND_OFF | > PCI_EXP_SLTCTL_PWR_OFF > > This might be triggered by: > remove_board() > pciehp_power_off_slot() > > > So it seems Command Completed is not signaled when changing the > Power Indicator, Attention Indicator and Power Controller Control > bits in the Slot Control register. But apparently it works for > the other bits. > > We know there are hotplug controllers out there which suffer from > broken Command Completed support. They support it for the bits > mentioned above but not the others. So the inverse behavior from > your hotplug controller. See this code comment in pcie_do_write_cmd(): > > /* > * Controllers with the Intel CF118 and similar errata advertise > * Command Completed support, but they only set Command Completed > * if we change the "Control" bits for power, power indicator, > * attention indicator, or interlock. If we only change the > * "Enable" bits, they never set the Command Completed bit. > */ > > >> --- a/drivers/pci/hotplug/pciehp_hpc.c >> +++ b/drivers/pci/hotplug/pciehp_hpc.c >> @@ -98,6 +98,8 @@ static int pcie_poll_cmd(struct controller *ctrl, int timeout) >> if (slot_status & PCI_EXP_SLTSTA_CC) { >> pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, >> PCI_EXP_SLTSTA_CC); >> + ctrl->cmd_busy = 0; >> + smp_mb(); >> return 1; >> } >> msleep(10); > I suspect that this patch merely papers over the problem and that > the real solution would be to either apply quirk_cmd_compl or a > similar quirk to your hotplug controller. > > Please open a bug on bugzilla.kernel.org and attach full output > of lspci -vv and dmesg. Be sure to add the following to the > command line: > pciehp.pciehp_debug=1 dyndbg="file pciehp* +p" > > Once you've done that, please report the bugzilla link here > so that we can analyze the issue properly. > > Thanks, > > Lukas ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] PCI: pciehp: clear cmd_busy bit when Command Completed in polling mode 2021-11-21 1:50 ` luanshi @ 2021-11-21 6:35 ` Lukas Wunner 2021-11-26 7:47 ` luanshi 0 siblings, 1 reply; 11+ messages in thread From: Lukas Wunner @ 2021-11-21 6:35 UTC (permalink / raw) To: luanshi Cc: Bjorn Helgaas, Kuppuswamy Sathyanarayanan, Amey Narkhede, linux-pci, linux-kernel On Sun, Nov 21, 2021 at 09:50:38AM +0800, luanshi wrote: > 2021/11/19 20:00, Lukas Wunner: > > Please open a bug on bugzilla.kernel.org and attach full output > > of lspci -vv and dmesg. Be sure to add the following to the > > command line: > > pciehp.pciehp_debug=1 dyndbg="file pciehp* +p" > > > > Once you've done that, please report the bugzilla link here > > so that we can analyze the issue properly. I really need you to perform the above steps in order to analyze what's going on here. Again, if you get such timeout messages, it's usually not caused by a bug in the driver but by an erratum in the hardware, i.e. the hardware neglected to signal Command Completed in response to a Slot Control register write. Thanks, Lukas ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] PCI: pciehp: clear cmd_busy bit when Command Completed in polling mode 2021-11-21 6:35 ` Lukas Wunner @ 2021-11-26 7:47 ` luanshi 0 siblings, 0 replies; 11+ messages in thread From: luanshi @ 2021-11-26 7:47 UTC (permalink / raw) To: Lukas Wunner Cc: Bjorn Helgaas, Kuppuswamy Sathyanarayanan, Amey Narkhede, linux-pci, linux-kernel Sorry for the late reply, the machine is occupied by someone else. For detailed information: https://bugzilla.kernel.org/show_bug.cgi?id=215143 Thanks, Liguang 在 2021/11/21 14:35, Lukas Wunner 写道: > On Sun, Nov 21, 2021 at 09:50:38AM +0800, luanshi wrote: >> 2021/11/19 20:00, Lukas Wunner: >>> Please open a bug on bugzilla.kernel.org and attach full output >>> of lspci -vv and dmesg. Be sure to add the following to the >>> command line: >>> pciehp.pciehp_debug=1 dyndbg="file pciehp* +p" >>> >>> Once you've done that, please report the bugzilla link here >>> so that we can analyze the issue properly. > I really need you to perform the above steps in order to analyze > what's going on here. > > Again, if you get such timeout messages, it's usually not caused > by a bug in the driver but by an erratum in the hardware, > i.e. the hardware neglected to signal Command Completed in response > to a Slot Control register write. > > Thanks, > > Lukas ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] PCI: pciehp: clear cmd_busy bit when Command Completed in polling mode 2021-11-11 5:42 [PATCH] PCI: pciehp: clear cmd_busy bit when Command Completed in polling mode Liguang Zhang 2021-11-18 11:00 ` luanshi 2021-11-19 12:00 ` Lukas Wunner @ 2021-11-26 17:33 ` Lukas Wunner 2021-11-26 17:35 ` Lukas Wunner ` (2 more replies) 2 siblings, 3 replies; 11+ messages in thread From: Lukas Wunner @ 2021-11-26 17:33 UTC (permalink / raw) To: Liguang Zhang Cc: Bjorn Helgaas, Kuppuswamy Sathyanarayanan, Amey Narkhede, linux-pci, linux-kernel On Thu, Nov 11, 2021 at 01:42:58PM +0800, Liguang Zhang wrote: > This patch fixes this problem that on driver probe from system startup, > pciehp checks the Presence Detect State bit in the Slot Status register > to bring up an occupied slot or bring down an unoccupied slot. If empty > slot's power status is on, turn power off. The Hot-Plug interrupt isn't > requested yet, so avoid triggering a notification by calling > pcie_disable_notification(). > > Both the CCIE and HPIE bits are masked in pcie_disable_notification(), > when we issue a hotplug command, pcie_wait_cmd() will polling the > Command Completed bit instead of waiting for an interrupt. But cmd_busy > bit was not cleared when Command Completed which results in timeouts > like this in pciehp_power_off_slot() and pcie_init_notification(): > > pcieport 0000:00:03.0: pciehp: Timeout on hotplug command 0x01c0 > (issued 2264 msec ago) > pcieport 0000:00:03.0: pciehp: Timeout on hotplug command 0x05c0 > (issued 2288 msec ago) > > Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com> Fixes: a5dd4b4b0570 ("PCI: pciehp: Wait for hotplug command completion where necessary") Link: https://bugzilla.kernel.org/show_bug.cgi?id=215143 Reviewed-by: Lukas Wunner <lukas@wunner.de> Cc: stable@vger.kernel.org # v4.19+ Thanks a lot, that's a really good catch. It's a somewhat intricate bug, so I'll try to explain in my own words: If notification is disabled (HPIE or CCIE not set in the Slot Status register), we rely on pcie_poll_cmd() to poll for Command Completed. But once it's signaled, we neglect to clear ctrl->cmd_busy. (Normally it is cleared by the hardirq handler pciehp_isr() if notification is enabled.) The result is that starting with the second Slot Control write, pciehp will gratuitously wait for a command to finish which has already finished and it will incorrectly report a timeout. The bug was originally introduced in 2015 by commit a5dd4b4b0570 ("PCI: pciehp: Wait for hotplug command completion where necessary"), but didn't manifest itself because the first Slot Control Write already enabled notification and from that point on the hardirq handler would clear ctrl->cmd_busy. However I think the bug may have manifested itself with pciehp_poll_mode=1. It wasn't until commit 4e6a13356f1c ("PCI: pciehp: Deduplicate presence check on probe & resume") that multiple consecutive Slot Control writes were performed on ->probe with notification disabled, so that's the commit which first exposed the bug with pciehp_poll_mode=0. Thanks, Lukas > --- > drivers/pci/hotplug/pciehp_hpc.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c > index 83a0fa119cae..8698aefc6041 100644 > --- a/drivers/pci/hotplug/pciehp_hpc.c > +++ b/drivers/pci/hotplug/pciehp_hpc.c > @@ -98,6 +98,8 @@ static int pcie_poll_cmd(struct controller *ctrl, int timeout) > if (slot_status & PCI_EXP_SLTSTA_CC) { > pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, > PCI_EXP_SLTSTA_CC); > + ctrl->cmd_busy = 0; > + smp_mb(); > return 1; > } > msleep(10); > -- > 2.19.1.6.gb485710b ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] PCI: pciehp: clear cmd_busy bit when Command Completed in polling mode 2021-11-26 17:33 ` Lukas Wunner @ 2021-11-26 17:35 ` Lukas Wunner 2022-01-06 8:55 ` luanshi 2022-02-03 19:07 ` Lukas Wunner 2 siblings, 0 replies; 11+ messages in thread From: Lukas Wunner @ 2021-11-26 17:35 UTC (permalink / raw) To: Liguang Zhang Cc: Bjorn Helgaas, Kuppuswamy Sathyanarayanan, Amey Narkhede, linux-pci, linux-kernel On Fri, Nov 26, 2021 at 06:33:09PM +0100, Lukas Wunner wrote: > Cc: stable@vger.kernel.org # v4.19+ Sorry, I meant v4.2+. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] PCI: pciehp: clear cmd_busy bit when Command Completed in polling mode 2021-11-26 17:33 ` Lukas Wunner 2021-11-26 17:35 ` Lukas Wunner @ 2022-01-06 8:55 ` luanshi 2022-02-03 19:07 ` Lukas Wunner 2 siblings, 0 replies; 11+ messages in thread From: luanshi @ 2022-01-06 8:55 UTC (permalink / raw) To: Lukas Wunner Cc: Bjorn Helgaas, Kuppuswamy Sathyanarayanan, Amey Narkhede, linux-pci, linux-kernel Hi Bjorn & Lukas & Kuppuswamy & Amey, Gentle ping! Any comments on this patch,should be merged? Thanks, Liguang 在 2021/11/27 1:33, Lukas Wunner 写道: > On Thu, Nov 11, 2021 at 01:42:58PM +0800, Liguang Zhang wrote: >> This patch fixes this problem that on driver probe from system startup, >> pciehp checks the Presence Detect State bit in the Slot Status register >> to bring up an occupied slot or bring down an unoccupied slot. If empty >> slot's power status is on, turn power off. The Hot-Plug interrupt isn't >> requested yet, so avoid triggering a notification by calling >> pcie_disable_notification(). >> >> Both the CCIE and HPIE bits are masked in pcie_disable_notification(), >> when we issue a hotplug command, pcie_wait_cmd() will polling the >> Command Completed bit instead of waiting for an interrupt. But cmd_busy >> bit was not cleared when Command Completed which results in timeouts >> like this in pciehp_power_off_slot() and pcie_init_notification(): >> >> pcieport 0000:00:03.0: pciehp: Timeout on hotplug command 0x01c0 >> (issued 2264 msec ago) >> pcieport 0000:00:03.0: pciehp: Timeout on hotplug command 0x05c0 >> (issued 2288 msec ago) >> >> Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com> > Fixes: a5dd4b4b0570 ("PCI: pciehp: Wait for hotplug command completion where necessary") > Link: https://bugzilla.kernel.org/show_bug.cgi?id=215143 > Reviewed-by: Lukas Wunner <lukas@wunner.de> > Cc: stable@vger.kernel.org # v4.19+ > > Thanks a lot, that's a really good catch. > > It's a somewhat intricate bug, so I'll try to explain in my own words: > > If notification is disabled (HPIE or CCIE not set in the Slot Status > register), we rely on pcie_poll_cmd() to poll for Command Completed. > But once it's signaled, we neglect to clear ctrl->cmd_busy. > (Normally it is cleared by the hardirq handler pciehp_isr() if > notification is enabled.) > > The result is that starting with the second Slot Control write, > pciehp will gratuitously wait for a command to finish which has > already finished and it will incorrectly report a timeout. > > The bug was originally introduced in 2015 by commit a5dd4b4b0570 > ("PCI: pciehp: Wait for hotplug command completion where necessary"), > but didn't manifest itself because the first Slot Control Write already > enabled notification and from that point on the hardirq handler would > clear ctrl->cmd_busy. However I think the bug may have manifested > itself with pciehp_poll_mode=1. > > It wasn't until commit 4e6a13356f1c ("PCI: pciehp: Deduplicate presence > check on probe & resume") that multiple consecutive Slot Control writes > were performed on ->probe with notification disabled, so that's the > commit which first exposed the bug with pciehp_poll_mode=0. > > Thanks, > > Lukas > >> --- >> drivers/pci/hotplug/pciehp_hpc.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c >> index 83a0fa119cae..8698aefc6041 100644 >> --- a/drivers/pci/hotplug/pciehp_hpc.c >> +++ b/drivers/pci/hotplug/pciehp_hpc.c >> @@ -98,6 +98,8 @@ static int pcie_poll_cmd(struct controller *ctrl, int timeout) >> if (slot_status & PCI_EXP_SLTSTA_CC) { >> pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, >> PCI_EXP_SLTSTA_CC); >> + ctrl->cmd_busy = 0; >> + smp_mb(); >> return 1; >> } >> msleep(10); >> -- >> 2.19.1.6.gb485710b ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] PCI: pciehp: clear cmd_busy bit when Command Completed in polling mode 2021-11-26 17:33 ` Lukas Wunner 2021-11-26 17:35 ` Lukas Wunner 2022-01-06 8:55 ` luanshi @ 2022-02-03 19:07 ` Lukas Wunner 2022-02-03 20:38 ` Bjorn Helgaas 2 siblings, 1 reply; 11+ messages in thread From: Lukas Wunner @ 2022-02-03 19:07 UTC (permalink / raw) To: Bjorn Helgaas, Liguang Zhang Cc: Kuppuswamy Sathyanarayanan, Amey Narkhede, linux-pci, linux-kernel Hi Bjorn, the below patch is marked "Changes Requested" in patchwork: https://patchwork.kernel.org/project/linux-pci/patch/20211111054258.7309-1-zhangliguang@linux.alibaba.com/ I think that might be erroneous because the patch is correct, I've provided a Reviewed-by and no change requests are recorded in patchwork or the mailing list archive. If you've got a few minutes to spare, could you double-check the state in patchwork and provide Liguang Zhang with the changes you'd want (if any)? Thanks! Lukas On Fri, Nov 26, 2021 at 06:33:09PM +0100, Lukas Wunner wrote: > On Thu, Nov 11, 2021 at 01:42:58PM +0800, Liguang Zhang wrote: > > This patch fixes this problem that on driver probe from system startup, > > pciehp checks the Presence Detect State bit in the Slot Status register > > to bring up an occupied slot or bring down an unoccupied slot. If empty > > slot's power status is on, turn power off. The Hot-Plug interrupt isn't > > requested yet, so avoid triggering a notification by calling > > pcie_disable_notification(). > > > > Both the CCIE and HPIE bits are masked in pcie_disable_notification(), > > when we issue a hotplug command, pcie_wait_cmd() will polling the > > Command Completed bit instead of waiting for an interrupt. But cmd_busy > > bit was not cleared when Command Completed which results in timeouts > > like this in pciehp_power_off_slot() and pcie_init_notification(): > > > > pcieport 0000:00:03.0: pciehp: Timeout on hotplug command 0x01c0 > > (issued 2264 msec ago) > > pcieport 0000:00:03.0: pciehp: Timeout on hotplug command 0x05c0 > > (issued 2288 msec ago) > > > > Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com> > > Fixes: a5dd4b4b0570 ("PCI: pciehp: Wait for hotplug command completion where necessary") > Link: https://bugzilla.kernel.org/show_bug.cgi?id=215143 > Reviewed-by: Lukas Wunner <lukas@wunner.de> > Cc: stable@vger.kernel.org # v4.2+ > > Thanks a lot, that's a really good catch. > > It's a somewhat intricate bug, so I'll try to explain in my own words: > > If notification is disabled (HPIE or CCIE not set in the Slot Status > register), we rely on pcie_poll_cmd() to poll for Command Completed. > But once it's signaled, we neglect to clear ctrl->cmd_busy. > (Normally it is cleared by the hardirq handler pciehp_isr() if > notification is enabled.) > > The result is that starting with the second Slot Control write, > pciehp will gratuitously wait for a command to finish which has > already finished and it will incorrectly report a timeout. > > The bug was originally introduced in 2015 by commit a5dd4b4b0570 > ("PCI: pciehp: Wait for hotplug command completion where necessary"), > but didn't manifest itself because the first Slot Control Write already > enabled notification and from that point on the hardirq handler would > clear ctrl->cmd_busy. However I think the bug may have manifested > itself with pciehp_poll_mode=1. > > It wasn't until commit 4e6a13356f1c ("PCI: pciehp: Deduplicate presence > check on probe & resume") that multiple consecutive Slot Control writes > were performed on ->probe with notification disabled, so that's the > commit which first exposed the bug with pciehp_poll_mode=0. > > Thanks, > > Lukas > > > --- > > drivers/pci/hotplug/pciehp_hpc.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c > > index 83a0fa119cae..8698aefc6041 100644 > > --- a/drivers/pci/hotplug/pciehp_hpc.c > > +++ b/drivers/pci/hotplug/pciehp_hpc.c > > @@ -98,6 +98,8 @@ static int pcie_poll_cmd(struct controller *ctrl, int timeout) > > if (slot_status & PCI_EXP_SLTSTA_CC) { > > pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, > > PCI_EXP_SLTSTA_CC); > > + ctrl->cmd_busy = 0; > > + smp_mb(); > > return 1; > > } > > msleep(10); > > -- > > 2.19.1.6.gb485710b ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] PCI: pciehp: clear cmd_busy bit when Command Completed in polling mode 2022-02-03 19:07 ` Lukas Wunner @ 2022-02-03 20:38 ` Bjorn Helgaas 0 siblings, 0 replies; 11+ messages in thread From: Bjorn Helgaas @ 2022-02-03 20:38 UTC (permalink / raw) To: Lukas Wunner Cc: Bjorn Helgaas, Liguang Zhang, Kuppuswamy Sathyanarayanan, Amey Narkhede, linux-pci, linux-kernel On Thu, Feb 03, 2022 at 08:07:31PM +0100, Lukas Wunner wrote: > Hi Bjorn, > > the below patch is marked "Changes Requested" in patchwork: > > https://patchwork.kernel.org/project/linux-pci/patch/20211111054258.7309-1-zhangliguang@linux.alibaba.com/ > > I think that might be erroneous because the patch is correct, > I've provided a Reviewed-by and no change requests are recorded > in patchwork or the mailing list archive. > > If you've got a few minutes to spare, could you double-check the > state in patchwork and provide Liguang Zhang with the changes you'd > want (if any)? Thanks for reminding me about this. I don't remember why I marked it "changes requested" but I often do that if there's been significant discussion. In this case you also provided additional information (Fixes tag, bugzilla link, commit log elaboration) that should be included. Given that, I would typically wait for the author to repost it and incorporate the additional information. But I applied it to pci/hotplug with the commit log below. commit 92912b175178 ("PCI: pciehp: Clear cmd_busy bit in polling mode") Author: Liguang Zhang <zhangliguang@linux.alibaba.com> Date: Thu Nov 11 13:42:58 2021 +0800 PCI: pciehp: Clear cmd_busy bit in polling mode Writes to a Downstream Port's Slot Control register are PCIe hotplug "commands." If the Port supports Command Completed events, software must wait for a command to complete before writing to Slot Control again. pcie_do_write_cmd() sets ctrl->cmd_busy when it writes to Slot Control. If software notification is enabled, i.e., PCI_EXP_SLTCTL_HPIE and PCI_EXP_SLTCTL_CCIE are set, ctrl->cmd_busy is cleared by pciehp_isr(). But when software notification is disabled, as it is when pcie_init() powers off an empty slot, pcie_wait_cmd() uses pcie_poll_cmd() to poll for command completion, and it neglects to clear ctrl->cmd_busy, which leads to spurious timeouts: pcieport 0000:00:03.0: pciehp: Timeout on hotplug command 0x01c0 (issued 2264 msec ago) pcieport 0000:00:03.0: pciehp: Timeout on hotplug command 0x05c0 (issued 2288 msec ago) Clear ctrl->cmd_busy in pcie_poll_cmd() when it detects a Command Completed event (PCI_EXP_SLTSTA_CC). [bhelgaas: commit log] Fixes: a5dd4b4b0570 ("PCI: pciehp: Wait for hotplug command completion where necessary") Link: https://lore.kernel.org/r/20211111054258.7309-1-zhangliguang@linux.alibaba.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=215143 Link: https://lore.kernel.org/r/20211126173309.GA12255@wunner.de Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> Cc: stable@vger.kernel.org # v4.19+ ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2022-02-03 20:42 UTC | newest] Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-11-11 5:42 [PATCH] PCI: pciehp: clear cmd_busy bit when Command Completed in polling mode Liguang Zhang 2021-11-18 11:00 ` luanshi 2021-11-19 12:00 ` Lukas Wunner 2021-11-21 1:50 ` luanshi 2021-11-21 6:35 ` Lukas Wunner 2021-11-26 7:47 ` luanshi 2021-11-26 17:33 ` Lukas Wunner 2021-11-26 17:35 ` Lukas Wunner 2022-01-06 8:55 ` luanshi 2022-02-03 19:07 ` Lukas Wunner 2022-02-03 20:38 ` Bjorn Helgaas
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).