* [PATCH 0/5] Fix DPC hotplug race and enhance error hanlding @ 2020-09-25 2:34 Ethan Zhao 2020-09-25 2:34 ` [PATCH 4/5] PCI: only return true when dev io state is really changed Ethan Zhao ` (4 more replies) 0 siblings, 5 replies; 16+ messages in thread From: Ethan Zhao @ 2020-09-25 2:34 UTC (permalink / raw) To: bhelgaas, oohall, ruscur, lukas, andriy.shevchenko, stuart.w.hayes, mr.nuke.me, mika.westerberg Cc: linux-pci, linux-kernel, pei.p.jia, Ethan Zhao This simple patch set fixed some serious security issues found when DPC error injection and NVMe SSD hotplug brute force test were doing -- race condition between DPC handler and pciehp, AER interrupt handlers, caused system hang and system with DPC feature couldn't recover to normal working state as expected (NVMe instance lost, mount operation hang, race PCIe access caused uncorrectable errors reported alternativly etc). With this patch set applied, stable 5.9-rc6 could pass the PCIe Gen4 NVMe SSD brute force hotplug test with any time interval between hot-remove and plug-in operation tens of times without any errors occur and system works normal. With this patch set applied, system with DPC feature could recover from NON-FATAL and FATAL errors injection test and works as expected. System works smoothly when errors happen while hotplug is doing, no uncorrectable errors found. Brute DPC error injection script: for i in {0..100} do setpci -s 64:02.0 0x196.w=000a setpci -s 65:00.0 0x04.w=0544 mount /dev/nvme0n1p1 /root/nvme sleep 1 done Other details see every commits description part. This patch set could be applied to stable 5.9-rc6 directly. Help to review and test. Thanks, Ethan Ethan Zhao (5): PCI: define a function to check and wait till port finish DPC handling PCI: pciehp: check and wait port status out of DPC before handling DLLSC and PDC PCI/ERR: get device before call device driver to avoid null pointer reference PCI: only return true when dev io state is really changed PCI/ERR: don't mix io state not changed and no driver together drivers/pci/hotplug/pciehp_hpc.c | 4 +++- drivers/pci/pci.h | 31 +++---------------------------- drivers/pci/pcie/err.c | 18 ++++++++++++++++-- include/linux/pci.h | 31 +++++++++++++++++++++++++++++++ 4 files changed, 53 insertions(+), 31 deletions(-) -- 2.18.4 ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 4/5] PCI: only return true when dev io state is really changed 2020-09-25 2:34 [PATCH 0/5] Fix DPC hotplug race and enhance error hanlding Ethan Zhao @ 2020-09-25 2:34 ` Ethan Zhao 2020-09-25 12:38 ` Andy Shevchenko 2020-09-25 13:56 ` Alex G. 2020-09-25 2:34 ` [PATCH 5/5] PCI/ERR: don't mix io state not changed and no driver together Ethan Zhao ` (3 subsequent siblings) 4 siblings, 2 replies; 16+ messages in thread From: Ethan Zhao @ 2020-09-25 2:34 UTC (permalink / raw) To: bhelgaas, oohall, ruscur, lukas, andriy.shevchenko, stuart.w.hayes, mr.nuke.me, mika.westerberg Cc: linux-pci, linux-kernel, pei.p.jia, Ethan Zhao When uncorrectable error happens, AER driver and DPC driver interrupt handlers likely call pcie_do_recovery()->pci_walk_bus()->report_frozen_detected() with pci_channel_io_frozen the same time. If pci_dev_set_io_state() return true even if the original state is pci_channel_io_frozen, that will cause AER or DPC handler re-enter the error detecting and recovery procedure one after another. The result is the recovery flow mixed between AER and DPC. So simplify the pci_dev_set_io_state() function to only return true when dev->error_state is changed. Signed-off-by: Ethan Zhao <haifeng.zhao@intel.com> Tested-by: Wen jin <wen.jin@intel.com> Tested-by: Shanshan Zhang <ShanshanX.Zhang@intel.com> --- drivers/pci/pci.h | 31 +++---------------------------- 1 file changed, 3 insertions(+), 28 deletions(-) diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index fa12f7cbc1a0..d420bb977f3b 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -362,35 +362,10 @@ static inline bool pci_dev_set_io_state(struct pci_dev *dev, bool changed = false; device_lock_assert(&dev->dev); - switch (new) { - case pci_channel_io_perm_failure: - switch (dev->error_state) { - case pci_channel_io_frozen: - case pci_channel_io_normal: - case pci_channel_io_perm_failure: - changed = true; - break; - } - break; - case pci_channel_io_frozen: - switch (dev->error_state) { - case pci_channel_io_frozen: - case pci_channel_io_normal: - changed = true; - break; - } - break; - case pci_channel_io_normal: - switch (dev->error_state) { - case pci_channel_io_frozen: - case pci_channel_io_normal: - changed = true; - break; - } - break; - } - if (changed) + if (dev->error_state != new) { dev->error_state = new; + changed = true; + } return changed; } -- 2.18.4 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 4/5] PCI: only return true when dev io state is really changed 2020-09-25 2:34 ` [PATCH 4/5] PCI: only return true when dev io state is really changed Ethan Zhao @ 2020-09-25 12:38 ` Andy Shevchenko 2020-09-27 1:28 ` Zhao, Haifeng 2020-09-25 13:56 ` Alex G. 1 sibling, 1 reply; 16+ messages in thread From: Andy Shevchenko @ 2020-09-25 12:38 UTC (permalink / raw) To: Ethan Zhao Cc: bhelgaas, oohall, ruscur, lukas, stuart.w.hayes, mr.nuke.me, mika.westerberg, linux-pci, linux-kernel, pei.p.jia On Thu, Sep 24, 2020 at 10:34:22PM -0400, Ethan Zhao wrote: > When uncorrectable error happens, AER driver and DPC driver interrupt > handlers likely call > pcie_do_recovery()->pci_walk_bus()->report_frozen_detected() with > pci_channel_io_frozen the same time. Call chains are better to read if they split like foo() -> bar() -> baz() > If pci_dev_set_io_state() return true even if the original state is > pci_channel_io_frozen, that will cause AER or DPC handler re-enter > the error detecting and recovery procedure one after another. > The result is the recovery flow mixed between AER and DPC. > So simplify the pci_dev_set_io_state() function to only return true > when dev->error_state is changed. ... > + if (dev->error_state != new) { > dev->error_state = new; > + changed = true; > + } > return changed; Perhaps if (dev->error_state == new) return changed; dev->error_state = new; return true; ? -- With Best Regards, Andy Shevchenko ^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [PATCH 4/5] PCI: only return true when dev io state is really changed 2020-09-25 12:38 ` Andy Shevchenko @ 2020-09-27 1:28 ` Zhao, Haifeng 0 siblings, 0 replies; 16+ messages in thread From: Zhao, Haifeng @ 2020-09-27 1:28 UTC (permalink / raw) To: Andy Shevchenko Cc: bhelgaas, oohall, ruscur, lukas, stuart.w.hayes, mr.nuke.me, mika.westerberg, linux-pci, linux-kernel, Jia, Pei P Yes, better ! -----Original Message----- From: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Sent: Friday, September 25, 2020 8:38 PM To: Zhao, Haifeng <haifeng.zhao@intel.com> Cc: bhelgaas@google.com; oohall@gmail.com; ruscur@russell.cc; lukas@wunner.de; stuart.w.hayes@gmail.com; mr.nuke.me@gmail.com; mika.westerberg@linux.intel.com; linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Jia, Pei P <pei.p.jia@intel.com> Subject: Re: [PATCH 4/5] PCI: only return true when dev io state is really changed On Thu, Sep 24, 2020 at 10:34:22PM -0400, Ethan Zhao wrote: > When uncorrectable error happens, AER driver and DPC driver interrupt > handlers likely call > pcie_do_recovery()->pci_walk_bus()->report_frozen_detected() with > pci_channel_io_frozen the same time. Call chains are better to read if they split like foo() -> bar() -> baz() > If pci_dev_set_io_state() return true even if the original state is > pci_channel_io_frozen, that will cause AER or DPC handler re-enter the > error detecting and recovery procedure one after another. > The result is the recovery flow mixed between AER and DPC. > So simplify the pci_dev_set_io_state() function to only return true > when dev->error_state is changed. ... > + if (dev->error_state != new) { > dev->error_state = new; > + changed = true; > + } > return changed; Perhaps if (dev->error_state == new) return changed; dev->error_state = new; return true; ? -- With Best Regards, Andy Shevchenko ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 4/5] PCI: only return true when dev io state is really changed 2020-09-25 2:34 ` [PATCH 4/5] PCI: only return true when dev io state is really changed Ethan Zhao 2020-09-25 12:38 ` Andy Shevchenko @ 2020-09-25 13:56 ` Alex G. 1 sibling, 0 replies; 16+ messages in thread From: Alex G. @ 2020-09-25 13:56 UTC (permalink / raw) To: Ethan Zhao, bhelgaas, oohall, ruscur, lukas, andriy.shevchenko, stuart.w.hayes, mika.westerberg Cc: linux-pci, linux-kernel, pei.p.jia Hi Ethan, On 9/24/20 9:34 PM, Ethan Zhao wrote: > When uncorrectable error happens, AER driver and DPC driver interrupt > handlers likely call > pcie_do_recovery()->pci_walk_bus()->report_frozen_detected() with > pci_channel_io_frozen the same time. > If pci_dev_set_io_state() return true even if the original state is > pci_channel_io_frozen, that will cause AER or DPC handler re-enter > the error detecting and recovery procedure one after another. > The result is the recovery flow mixed between AER and DPC. > So simplify the pci_dev_set_io_state() function to only return true > when dev->error_state is changed. > > Signed-off-by: Ethan Zhao <haifeng.zhao@intel.com> > Tested-by: Wen jin <wen.jin@intel.com> > Tested-by: Shanshan Zhang <ShanshanX.Zhang@intel.com> > --- > drivers/pci/pci.h | 31 +++---------------------------- > 1 file changed, 3 insertions(+), 28 deletions(-) > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h > index fa12f7cbc1a0..d420bb977f3b 100644 > --- a/drivers/pci/pci.h > +++ b/drivers/pci/pci.h > @@ -362,35 +362,10 @@ static inline bool pci_dev_set_io_state(struct pci_dev *dev, > bool changed = false; > > device_lock_assert(&dev->dev); > - switch (new) { > - case pci_channel_io_perm_failure: > - switch (dev->error_state) { > - case pci_channel_io_frozen: > - case pci_channel_io_normal: > - case pci_channel_io_perm_failure: > - changed = true; > - break; > - } > - break; > - case pci_channel_io_frozen: > - switch (dev->error_state) { > - case pci_channel_io_frozen: > - case pci_channel_io_normal: > - changed = true; > - break; > - } > - break; > - case pci_channel_io_normal: > - switch (dev->error_state) { > - case pci_channel_io_frozen: > - case pci_channel_io_normal: > - changed = true; > - break; > - } > - break; > - } > - if (changed) > + if (dev->error_state != new) { > dev->error_state = new; > + changed = true; > + } > return changed; > } The flow is a lot easier to follow now. Thank you. Reviewed-by: Alexandru Gagniuc <mr.nuke.me@gmail.com> ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 5/5] PCI/ERR: don't mix io state not changed and no driver together 2020-09-25 2:34 [PATCH 0/5] Fix DPC hotplug race and enhance error hanlding Ethan Zhao 2020-09-25 2:34 ` [PATCH 4/5] PCI: only return true when dev io state is really changed Ethan Zhao @ 2020-09-25 2:34 ` Ethan Zhao [not found] ` <20200925023423.42675-2-haifeng.zhao@intel.com> ` (2 subsequent siblings) 4 siblings, 0 replies; 16+ messages in thread From: Ethan Zhao @ 2020-09-25 2:34 UTC (permalink / raw) To: bhelgaas, oohall, ruscur, lukas, andriy.shevchenko, stuart.w.hayes, mr.nuke.me, mika.westerberg Cc: linux-pci, linux-kernel, pei.p.jia, Ethan Zhao When we see 'can't recover (no error_detected callback)' on console, Maybe the reason is io state is not changed by calling pci_dev_set_io_state(), that is confused. fix it. Signed-off-by: Ethan Zhao <haifeng.zhao@intel.com> Tested-by: Wen jin <wen.jin@intel.com> Tested-by: Shanshan Zhang <ShanshanX.Zhang@intel.com> --- drivers/pci/pcie/err.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index e35c4480c86b..d85f27c90c26 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -55,8 +55,10 @@ static int report_error_detected(struct pci_dev *dev, if (!pci_dev_get(dev)) return 0; device_lock(&dev->dev); - if (!pci_dev_set_io_state(dev, state) || - !dev->driver || + if (!pci_dev_set_io_state(dev, state)) { + pci_dbg(dev, "Device might already being in error handling ...\n"); + vote = PCI_ERS_RESULT_NONE; + } else if (!dev->driver || !dev->driver->err_handler || !dev->driver->err_handler->error_detected) { /* -- 2.18.4 ^ permalink raw reply related [flat|nested] 16+ messages in thread
[parent not found: <20200925023423.42675-2-haifeng.zhao@intel.com>]
* Re: [PATCH 1/5] PCI: define a function to check and wait till port finish DPC handling [not found] ` <20200925023423.42675-2-haifeng.zhao@intel.com> @ 2020-09-25 12:24 ` Andy Shevchenko 2020-09-27 1:53 ` Zhao, Haifeng 0 siblings, 1 reply; 16+ messages in thread From: Andy Shevchenko @ 2020-09-25 12:24 UTC (permalink / raw) To: Ethan Zhao Cc: bhelgaas, oohall, ruscur, lukas, stuart.w.hayes, mr.nuke.me, mika.westerberg, linux-pci, linux-kernel, pei.p.jia On Thu, Sep 24, 2020 at 10:34:19PM -0400, Ethan Zhao wrote: > Once root port DPC capability is enabled and triggered, at the beginning > of DPC is triggered, the DPC status bits are set by hardware and then > sends DPC/DLLSC/PDC interrupts to OS DPC and pciehp drivers, it will > take the port and software DPC interrupt handler 10ms to 50ms (test data > on ICX platform & stable 5.9-rc6) to complete the DPC containment procedure > till the DPC status is cleared at the end of the DPC interrupt handler. > > We use this function to check if the root port is in DPC handling status > and wait till the hardware and software completed the procedure. > #include <linux/interrupt.h> > #include <linux/io.h> > #include <linux/resource_ext.h> > +#include <linux/delay.h> Keep it sorted? > #include <uapi/linux/pci.h> ... > +#ifdef CONFIG_PCIE_DPC > +static inline bool pci_wait_port_outdpc(struct pci_dev *pdev) > +{ > + u16 cap = pdev->dpc_cap, status; > + u16 loop = 0; > + > + if (!cap) { > + pci_WARN_ONCE(pdev, !cap, "No DPC capability initiated\n"); But why? Is this feature mandatory to have? Then the same question about ifdeffery, otherwise it's pretty normal to not have a feature, right? > + return false; > + } > + pci_read_config_word(pdev, cap + PCI_EXP_DPC_STATUS, &status); > + pci_dbg(pdev, "DPC status %x, cap %x\n", status, cap); > + while (status & PCI_EXP_DPC_STATUS_TRIGGER && loop < 100) { > + msleep(10); > + loop++; > + pci_read_config_word(pdev, cap + PCI_EXP_DPC_STATUS, &status); > + } Can we have rather something like readx_poll_timeout() for PCI and use them here? > + if (!(status & PCI_EXP_DPC_STATUS_TRIGGER)) { > + pci_dbg(pdev, "Out of DPC status %x, time cost %d ms\n", status, loop*10); > + return true; > + } > + pci_dbg(pdev, "Timeout to wait port out of DPC status\n"); > + return false; > +} > +#else > +static inline bool pci_wait_port_outdpc(struct pci_dev *pdev) > +{ > + return true; > +} > +#endif > #endif /* LINUX_PCI_H */ > -- > 2.18.4 > -- With Best Regards, Andy Shevchenko ^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [PATCH 1/5] PCI: define a function to check and wait till port finish DPC handling 2020-09-25 12:24 ` [PATCH 1/5] PCI: define a function to check and wait till port finish DPC handling Andy Shevchenko @ 2020-09-27 1:53 ` Zhao, Haifeng 0 siblings, 0 replies; 16+ messages in thread From: Zhao, Haifeng @ 2020-09-27 1:53 UTC (permalink / raw) To: Andy Shevchenko Cc: bhelgaas, oohall, ruscur, lukas, stuart.w.hayes, mr.nuke.me, mika.westerberg, linux-pci, linux-kernel, Jia, Pei P Andy, About the header file, yes, to keep the order. The function was already defined with #ifdef CONFIG_PCIE_DPC. As to ' readx_poll_timeout()' if there is generic one, I would like to use it. Seems there is no yet ? Thanks, Ethan -----Original Message----- From: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Sent: Friday, September 25, 2020 8:25 PM To: Zhao, Haifeng <haifeng.zhao@intel.com> Cc: bhelgaas@google.com; oohall@gmail.com; ruscur@russell.cc; lukas@wunner.de; stuart.w.hayes@gmail.com; mr.nuke.me@gmail.com; mika.westerberg@linux.intel.com; linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Jia, Pei P <pei.p.jia@intel.com> Subject: Re: [PATCH 1/5] PCI: define a function to check and wait till port finish DPC handling On Thu, Sep 24, 2020 at 10:34:19PM -0400, Ethan Zhao wrote: > Once root port DPC capability is enabled and triggered, at the > beginning of DPC is triggered, the DPC status bits are set by hardware > and then sends DPC/DLLSC/PDC interrupts to OS DPC and pciehp drivers, > it will take the port and software DPC interrupt handler 10ms to 50ms > (test data on ICX platform & stable 5.9-rc6) to complete the DPC > containment procedure till the DPC status is cleared at the end of the DPC interrupt handler. > > We use this function to check if the root port is in DPC handling > status and wait till the hardware and software completed the procedure. > #include <linux/interrupt.h> > #include <linux/io.h> > #include <linux/resource_ext.h> > +#include <linux/delay.h> Keep it sorted? > #include <uapi/linux/pci.h> ... > +#ifdef CONFIG_PCIE_DPC > +static inline bool pci_wait_port_outdpc(struct pci_dev *pdev) { > + u16 cap = pdev->dpc_cap, status; > + u16 loop = 0; > + > + if (!cap) { > + pci_WARN_ONCE(pdev, !cap, "No DPC capability initiated\n"); But why? Is this feature mandatory to have? Then the same question about ifdeffery, otherwise it's pretty normal to not have a feature, right? > + return false; > + } > + pci_read_config_word(pdev, cap + PCI_EXP_DPC_STATUS, &status); > + pci_dbg(pdev, "DPC status %x, cap %x\n", status, cap); > + while (status & PCI_EXP_DPC_STATUS_TRIGGER && loop < 100) { > + msleep(10); > + loop++; > + pci_read_config_word(pdev, cap + PCI_EXP_DPC_STATUS, &status); > + } Can we have rather something like readx_poll_timeout() for PCI and use them here? > + if (!(status & PCI_EXP_DPC_STATUS_TRIGGER)) { > + pci_dbg(pdev, "Out of DPC status %x, time cost %d ms\n", status, loop*10); > + return true; > + } > + pci_dbg(pdev, "Timeout to wait port out of DPC status\n"); > + return false; > +} > +#else > +static inline bool pci_wait_port_outdpc(struct pci_dev *pdev) { > + return true; > +} > +#endif > #endif /* LINUX_PCI_H */ > -- > 2.18.4 > -- With Best Regards, Andy Shevchenko ^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <20200925023423.42675-3-haifeng.zhao@intel.com>]
* Re: [PATCH 2/5] PCI: pciehp: check and wait port status out of DPC before handling DLLSC and PDC [not found] ` <20200925023423.42675-3-haifeng.zhao@intel.com> @ 2020-09-25 12:32 ` Andy Shevchenko 2020-09-27 1:50 ` Zhao, Haifeng 0 siblings, 1 reply; 16+ messages in thread From: Andy Shevchenko @ 2020-09-25 12:32 UTC (permalink / raw) To: Ethan Zhao Cc: bhelgaas, oohall, ruscur, lukas, stuart.w.hayes, mr.nuke.me, mika.westerberg, linux-pci, linux-kernel, pei.p.jia On Thu, Sep 24, 2020 at 10:34:20PM -0400, Ethan Zhao wrote: > When root port has DPC capability and it is enabled, then triggered by errors, DPC > DLLSC and PDC interrupts will be sent to DPC driver, pciehp driver at the same time. > That will cause following result: > > 1. Link and device are recovered by hardware DPC and software DPC driver, device > isn't removed, but the pciehp might treat it as devce was hot removed. > > 2. Race condition happens between pciehp_unconfigure_device() called by > pciehp_ist() in pciehp driver and pci_do_recovery() called by dpc_handler in > DPC driver. no luck, there is no lock to protect pci_stop_and_remove_bus_device() > against pci_walk_bus(), they hold different semaphore and mutex, > pci_stop_and_remove_bus_device holds pci_rescan_remove_lock, and pci_walk_bus() > holds pci_bus_sem. > > This race condition is not purely code analysis, it could be triggered by following > command series: > > # setpci -s 64:02.0 0x196.w=000a // 64:02.0 is rootport has DPC capability > # setpci -s 65:00.0 0x04.w=0544 // 65:00.0 is NVMe SSD populated in above port > # mount /dev/nvme0n1p1 nvme > > One shot will cause system panic and null pointer reference happened. > (tested on stable 5.8 & ICX platform) What is ICX (yes, I know, but you are writing this for wider audience)? > Buffer I/O error on dev nvme0n1p1, logical block 468843328, async page read > BUG: kernel NULL pointer dereference, address: 0000000000000050 > #PF: supervisor read access in kernel mode > #PF: error_code(0x0000) - not-present page Please, try to remove non-relevant information from the crashes. > Oops: 0000 [#1] SMP NOPTI > CPU: 12 PID: 513 Comm: irq/124-pcie-dp Not tainted 5.8.0-0.0.7.el8.x86_64+ #1 > Hardware name: Intel Corporation Wilxxxx.200x0x0xxx 0x/0x/20x0 > RIP: 0010:report_error_detected.cold.4+0x7d/0xe6 > Code: b6 d0 e8 e8 fe 11 00 e8 16 c5 fb ff be 06 00 00 00 48 89 df e8 d3 65 ff > ff b8 06 00 00 00 e9 75 fc ff ff 48 8b 43 68 45 31 c9 <48> 8b 50 50 48 83 3a 00 > 41 0f 94 c1 45 31 c0 48 85 d2 41 0f 94 c0 > RSP: 0018:ff8e06cf8762fda8 EFLAGS: 00010246 > RAX: 0000000000000000 RBX: ff4e3eaacf42a000 RCX: ff4e3eb31f223c01 > RDX: ff4e3eaacf42a140 RSI: ff4e3eb31f223c00 RDI: ff4e3eaacf42a138 > RBP: ff8e06cf8762fdd0 R08: 00000000000000bf R09: 0000000000000000 > R10: 000000eb8ebeab53 R11: ffffffff93453258 R12: 0000000000000002 > R13: ff4e3eaacf42a130 R14: ff8e06cf8762fe2c R15: ff4e3eab44733828 > FS: 0000000000000000(0000) GS:ff4e3eab1fd00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000000000050 CR3: 0000000f8f80a004 CR4: 0000000000761ee0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > PKRU: 55555554 > Call Trace: > ? report_normal_detected+0x20/0x20 > report_frozen_detected+0x16/0x20 > pci_walk_bus+0x75/0x90 > ? dpc_irq+0x90/0x90 > pcie_do_recovery+0x157/0x201 > ? irq_finalize_oneshot.part.47+0xe0/0xe0 > dpc_handler+0x29/0x40 > irq_thread_fn+0x24/0x60 > irq_thread+0xea/0x170 > ? irq_forced_thread_fn+0x80/0x80 > ? irq_thread_check_affinity+0xf0/0xf0 > kthread+0x124/0x140 > ? kthread_park+0x90/0x90 > ret_from_fork+0x1f/0x30 > Modules linked in: nft_fib_inet......... > CR2: 0000000000000050 > > With this patch, the handling flow of DPC containment and hotplug is partly ordered > and serialized, let hardware DPC do the controller reset etc recovery action first, > then DPC driver handling the call-back from device drivers, clear the DPC status, > at the end, pciehp driver handles the DLLSC and PDC etc. > This patch closes the race conditon partly. Not sure if Bjorn require to get rid of "this patch" from commit messages... In any case it's written in documentation. -- With Best Regards, Andy Shevchenko ^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [PATCH 2/5] PCI: pciehp: check and wait port status out of DPC before handling DLLSC and PDC 2020-09-25 12:32 ` [PATCH 2/5] PCI: pciehp: check and wait port status out of DPC before handling DLLSC and PDC Andy Shevchenko @ 2020-09-27 1:50 ` Zhao, Haifeng 0 siblings, 0 replies; 16+ messages in thread From: Zhao, Haifeng @ 2020-09-27 1:50 UTC (permalink / raw) To: Andy Shevchenko Cc: bhelgaas, oohall, ruscur, lukas, stuart.w.hayes, mr.nuke.me, mika.westerberg, linux-pci, linux-kernel, Jia, Pei P Andy, About the ICX code name, I will align it to public documentation in next version. ' non-relevant information' about the crash, I'm not sure what's not relevant. Thanks, Ethan -----Original Message----- From: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Sent: Friday, September 25, 2020 8:33 PM To: Zhao, Haifeng <haifeng.zhao@intel.com> Cc: bhelgaas@google.com; oohall@gmail.com; ruscur@russell.cc; lukas@wunner.de; stuart.w.hayes@gmail.com; mr.nuke.me@gmail.com; mika.westerberg@linux.intel.com; linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Jia, Pei P <pei.p.jia@intel.com> Subject: Re: [PATCH 2/5] PCI: pciehp: check and wait port status out of DPC before handling DLLSC and PDC On Thu, Sep 24, 2020 at 10:34:20PM -0400, Ethan Zhao wrote: > When root port has DPC capability and it is enabled, then triggered by > errors, DPC DLLSC and PDC interrupts will be sent to DPC driver, pciehp driver at the same time. > That will cause following result: > > 1. Link and device are recovered by hardware DPC and software DPC driver, device > isn't removed, but the pciehp might treat it as devce was hot removed. > > 2. Race condition happens between pciehp_unconfigure_device() called by > pciehp_ist() in pciehp driver and pci_do_recovery() called by dpc_handler in > DPC driver. no luck, there is no lock to protect pci_stop_and_remove_bus_device() > against pci_walk_bus(), they hold different semaphore and mutex, > pci_stop_and_remove_bus_device holds pci_rescan_remove_lock, and pci_walk_bus() > holds pci_bus_sem. > > This race condition is not purely code analysis, it could be triggered > by following command series: > > # setpci -s 64:02.0 0x196.w=000a // 64:02.0 is rootport has DPC capability > # setpci -s 65:00.0 0x04.w=0544 // 65:00.0 is NVMe SSD populated in above port > # mount /dev/nvme0n1p1 nvme > > One shot will cause system panic and null pointer reference happened. > (tested on stable 5.8 & ICX platform) What is ICX (yes, I know, but you are writing this for wider audience)? > Buffer I/O error on dev nvme0n1p1, logical block 468843328, async page read > BUG: kernel NULL pointer dereference, address: 0000000000000050 > #PF: supervisor read access in kernel mode > #PF: error_code(0x0000) - not-present page Please, try to remove non-relevant information from the crashes. > Oops: 0000 [#1] SMP NOPTI > CPU: 12 PID: 513 Comm: irq/124-pcie-dp Not tainted 5.8.0-0.0.7.el8.x86_64+ #1 > Hardware name: Intel Corporation Wilxxxx.200x0x0xxx 0x/0x/20x0 > RIP: 0010:report_error_detected.cold.4+0x7d/0xe6 > Code: b6 d0 e8 e8 fe 11 00 e8 16 c5 fb ff be 06 00 00 00 48 89 df e8 d3 65 ff > ff b8 06 00 00 00 e9 75 fc ff ff 48 8b 43 68 45 31 c9 <48> 8b 50 50 48 83 3a 00 > 41 0f 94 c1 45 31 c0 48 85 d2 41 0f 94 c0 > RSP: 0018:ff8e06cf8762fda8 EFLAGS: 00010246 > RAX: 0000000000000000 RBX: ff4e3eaacf42a000 RCX: ff4e3eb31f223c01 > RDX: ff4e3eaacf42a140 RSI: ff4e3eb31f223c00 RDI: ff4e3eaacf42a138 > RBP: ff8e06cf8762fdd0 R08: 00000000000000bf R09: 0000000000000000 > R10: 000000eb8ebeab53 R11: ffffffff93453258 R12: 0000000000000002 > R13: ff4e3eaacf42a130 R14: ff8e06cf8762fe2c R15: ff4e3eab44733828 > FS: 0000000000000000(0000) GS:ff4e3eab1fd00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000000000050 CR3: 0000000f8f80a004 CR4: 0000000000761ee0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > PKRU: 55555554 > Call Trace: > ? report_normal_detected+0x20/0x20 > report_frozen_detected+0x16/0x20 > pci_walk_bus+0x75/0x90 > ? dpc_irq+0x90/0x90 > pcie_do_recovery+0x157/0x201 > ? irq_finalize_oneshot.part.47+0xe0/0xe0 > dpc_handler+0x29/0x40 > irq_thread_fn+0x24/0x60 > irq_thread+0xea/0x170 > ? irq_forced_thread_fn+0x80/0x80 > ? irq_thread_check_affinity+0xf0/0xf0 > kthread+0x124/0x140 > ? kthread_park+0x90/0x90 > ret_from_fork+0x1f/0x30 > Modules linked in: nft_fib_inet......... > CR2: 0000000000000050 > > With this patch, the handling flow of DPC containment and hotplug is > partly ordered and serialized, let hardware DPC do the controller > reset etc recovery action first, then DPC driver handling the > call-back from device drivers, clear the DPC status, at the end, pciehp driver handles the DLLSC and PDC etc. > This patch closes the race conditon partly. Not sure if Bjorn require to get rid of "this patch" from commit messages... In any case it's written in documentation. -- With Best Regards, Andy Shevchenko ^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <20200925023423.42675-4-haifeng.zhao@intel.com>]
* Re: [PATCH 3/5] PCI/ERR: get device before call device driver to avoid null pointer reference [not found] ` <20200925023423.42675-4-haifeng.zhao@intel.com> @ 2020-09-25 12:35 ` Andy Shevchenko 2020-09-29 2:35 ` Ethan Zhao 0 siblings, 1 reply; 16+ messages in thread From: Andy Shevchenko @ 2020-09-25 12:35 UTC (permalink / raw) To: Ethan Zhao Cc: bhelgaas, oohall, ruscur, lukas, stuart.w.hayes, mr.nuke.me, mika.westerberg, linux-pci, linux-kernel, pei.p.jia On Thu, Sep 24, 2020 at 10:34:21PM -0400, Ethan Zhao wrote: > During DPC error injection test we found there is race condition between > pciehp and DPC driver, null pointer reference caused panic as following null -> NULL > > # setpci -s 64:02.0 0x196.w=000a > // 64:02.0 is rootport has DPC capability > # setpci -s 65:00.0 0x04.w=0544 > // 65:00.0 is NVMe SSD populated in above port > # mount /dev/nvme0n1p1 nvme > > (tested on stable 5.8 & ICX platform) > > Buffer I/O error on dev nvme0n1p1, logical block 468843328, > async page read > BUG: kernel NULL pointer dereference, address: 0000000000000050 > #PF: supervisor read access in kernel mode > #PF: error_code(0x0000) - not-present page Same comment about Oops. -- With Best Regards, Andy Shevchenko ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 3/5] PCI/ERR: get device before call device driver to avoid null pointer reference 2020-09-25 12:35 ` [PATCH 3/5] PCI/ERR: get device before call device driver to avoid null pointer reference Andy Shevchenko @ 2020-09-29 2:35 ` Ethan Zhao 2020-09-29 8:51 ` Andy Shevchenko 0 siblings, 1 reply; 16+ messages in thread From: Ethan Zhao @ 2020-09-29 2:35 UTC (permalink / raw) To: Andy Shevchenko Cc: Ethan Zhao, Bjorn Helgaas, Oliver, ruscur, Lukas Wunner, Stuart Hayes, Alexandru Gagniuc, Mika Westerberg, linux-pci, Linux Kernel Mailing List, Jia, Pei P Preferred style, there will be cleared comment in v6. Thanks, Ethan On Sat, Sep 26, 2020 at 12:42 AM Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote: > > On Thu, Sep 24, 2020 at 10:34:21PM -0400, Ethan Zhao wrote: > > During DPC error injection test we found there is race condition between > > pciehp and DPC driver, null pointer reference caused panic as following > > null -> NULL > > > > > # setpci -s 64:02.0 0x196.w=000a > > // 64:02.0 is rootport has DPC capability > > # setpci -s 65:00.0 0x04.w=0544 > > // 65:00.0 is NVMe SSD populated in above port > > # mount /dev/nvme0n1p1 nvme > > > > (tested on stable 5.8 & ICX platform) > > > > Buffer I/O error on dev nvme0n1p1, logical block 468843328, > > async page read > > BUG: kernel NULL pointer dereference, address: 0000000000000050 > > #PF: supervisor read access in kernel mode > > #PF: error_code(0x0000) - not-present page > > Same comment about Oops. > > -- > With Best Regards, > Andy Shevchenko > > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 3/5] PCI/ERR: get device before call device driver to avoid null pointer reference 2020-09-29 2:35 ` Ethan Zhao @ 2020-09-29 8:51 ` Andy Shevchenko 2020-09-29 9:38 ` Ethan Zhao 0 siblings, 1 reply; 16+ messages in thread From: Andy Shevchenko @ 2020-09-29 8:51 UTC (permalink / raw) To: Ethan Zhao Cc: Ethan Zhao, Bjorn Helgaas, Oliver, ruscur, Lukas Wunner, Stuart Hayes, Alexandru Gagniuc, Mika Westerberg, linux-pci, Linux Kernel Mailing List, Jia, Pei P On Tue, Sep 29, 2020 at 10:35:14AM +0800, Ethan Zhao wrote: > Preferred style, there will be cleared comment in v6. Avoid top postings. > On Sat, Sep 26, 2020 at 12:42 AM Andy Shevchenko > <andriy.shevchenko@linux.intel.com> wrote: > > > > On Thu, Sep 24, 2020 at 10:34:21PM -0400, Ethan Zhao wrote: > > > During DPC error injection test we found there is race condition between > > > pciehp and DPC driver, null pointer reference caused panic as following > > > > null -> NULL > > > > > > > > # setpci -s 64:02.0 0x196.w=000a > > > // 64:02.0 is rootport has DPC capability > > > # setpci -s 65:00.0 0x04.w=0544 > > > // 65:00.0 is NVMe SSD populated in above port > > > # mount /dev/nvme0n1p1 nvme > > > > > > (tested on stable 5.8 & ICX platform) > > > > > > Buffer I/O error on dev nvme0n1p1, logical block 468843328, > > > async page read > > > BUG: kernel NULL pointer dereference, address: 0000000000000050 > > > #PF: supervisor read access in kernel mode > > > #PF: error_code(0x0000) - not-present page > > > > Same comment about Oops. In another thread it was a good advice to move the full Oops (if you think it's very useful to have) after the cutter '---' line, so it will be in email archives but Git history. -- With Best Regards, Andy Shevchenko ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 3/5] PCI/ERR: get device before call device driver to avoid null pointer reference 2020-09-29 8:51 ` Andy Shevchenko @ 2020-09-29 9:38 ` Ethan Zhao 2020-09-29 10:48 ` Andy Shevchenko 0 siblings, 1 reply; 16+ messages in thread From: Ethan Zhao @ 2020-09-29 9:38 UTC (permalink / raw) To: Andy Shevchenko Cc: Ethan Zhao, Bjorn Helgaas, Oliver, ruscur, Lukas Wunner, Stuart Hayes, Alexandru Gagniuc, Mika Westerberg, linux-pci, Linux Kernel Mailing List, Jia, Pei P Andy, On Tue, Sep 29, 2020 at 4:51 PM Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote: > > On Tue, Sep 29, 2020 at 10:35:14AM +0800, Ethan Zhao wrote: > > Preferred style, there will be cleared comment in v6. > > Avoid top postings. > > > On Sat, Sep 26, 2020 at 12:42 AM Andy Shevchenko > > <andriy.shevchenko@linux.intel.com> wrote: > > > > > > On Thu, Sep 24, 2020 at 10:34:21PM -0400, Ethan Zhao wrote: > > > > During DPC error injection test we found there is race condition between > > > > pciehp and DPC driver, null pointer reference caused panic as following > > > > > > null -> NULL > > > > > > > > > > > # setpci -s 64:02.0 0x196.w=000a > > > > // 64:02.0 is rootport has DPC capability > > > > # setpci -s 65:00.0 0x04.w=0544 > > > > // 65:00.0 is NVMe SSD populated in above port > > > > # mount /dev/nvme0n1p1 nvme > > > > > > > > (tested on stable 5.8 & ICX platform) > > > > > > > > Buffer I/O error on dev nvme0n1p1, logical block 468843328, > > > > async page read > > > > BUG: kernel NULL pointer dereference, address: 0000000000000050 > > > > #PF: supervisor read access in kernel mode > > > > #PF: error_code(0x0000) - not-present page > > > > > > Same comment about Oops. > > In another thread it was a good advice to move the full Oops (if you think it's > very useful to have) after the cutter '---' line, so it will be in email > archives but Git history. So git history wouldn't give any of the Oops context, and he/she has to access LKML, if offline, then ...lost. Thanks, Ethan > > -- > With Best Regards, > Andy Shevchenko > > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 3/5] PCI/ERR: get device before call device driver to avoid null pointer reference 2020-09-29 9:38 ` Ethan Zhao @ 2020-09-29 10:48 ` Andy Shevchenko 2020-09-30 2:07 ` Ethan Zhao 0 siblings, 1 reply; 16+ messages in thread From: Andy Shevchenko @ 2020-09-29 10:48 UTC (permalink / raw) To: Ethan Zhao Cc: Ethan Zhao, Bjorn Helgaas, Oliver, ruscur, Lukas Wunner, Stuart Hayes, Alexandru Gagniuc, Mika Westerberg, linux-pci, Linux Kernel Mailing List, Jia, Pei P On Tue, Sep 29, 2020 at 05:38:00PM +0800, Ethan Zhao wrote: > On Tue, Sep 29, 2020 at 4:51 PM Andy Shevchenko > <andriy.shevchenko@linux.intel.com> wrote: > > On Tue, Sep 29, 2020 at 10:35:14AM +0800, Ethan Zhao wrote: > > > Preferred style, there will be cleared comment in v6. > > > > Avoid top postings. > > > > > On Sat, Sep 26, 2020 at 12:42 AM Andy Shevchenko > > > <andriy.shevchenko@linux.intel.com> wrote: > > > > On Thu, Sep 24, 2020 at 10:34:21PM -0400, Ethan Zhao wrote: ... > > > > > Buffer I/O error on dev nvme0n1p1, logical block 468843328, > > > > > async page read > > > > > BUG: kernel NULL pointer dereference, address: 0000000000000050 > > > > > #PF: supervisor read access in kernel mode > > > > > #PF: error_code(0x0000) - not-present page > > > > > > > > Same comment about Oops. > > > > In another thread it was a good advice to move the full Oops (if you think it's > > very useful to have) after the cutter '---' line, so it will be in email > > archives but Git history. > > So git history wouldn't give any of the Oops context, and he/she has > to access LKML, > if offline, then ...lost. Tell me, do you really think the line: #PF: error_code(0x0000) - not-present page makes any sense in the commit message? I do not think so. And so on, for almost 60% of the Oops. Really, to help one person you will make millions suffering. It's not okay. -- With Best Regards, Andy Shevchenko ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 3/5] PCI/ERR: get device before call device driver to avoid null pointer reference 2020-09-29 10:48 ` Andy Shevchenko @ 2020-09-30 2:07 ` Ethan Zhao 0 siblings, 0 replies; 16+ messages in thread From: Ethan Zhao @ 2020-09-30 2:07 UTC (permalink / raw) To: Andy Shevchenko Cc: Ethan Zhao, Bjorn Helgaas, Oliver, ruscur, Lukas Wunner, Stuart Hayes, Alexandru Gagniuc, Mika Westerberg, linux-pci, Linux Kernel Mailing List, Jia, Pei P On Tue, Sep 29, 2020 at 6:48 PM Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote: > > On Tue, Sep 29, 2020 at 05:38:00PM +0800, Ethan Zhao wrote: > > On Tue, Sep 29, 2020 at 4:51 PM Andy Shevchenko > > <andriy.shevchenko@linux.intel.com> wrote: > > > On Tue, Sep 29, 2020 at 10:35:14AM +0800, Ethan Zhao wrote: > > > > Preferred style, there will be cleared comment in v6. > > > > > > Avoid top postings. > > > > > > > On Sat, Sep 26, 2020 at 12:42 AM Andy Shevchenko > > > > <andriy.shevchenko@linux.intel.com> wrote: > > > > > On Thu, Sep 24, 2020 at 10:34:21PM -0400, Ethan Zhao wrote: > > ... > > > > > > > Buffer I/O error on dev nvme0n1p1, logical block 468843328, > > > > > > async page read > > > > > > BUG: kernel NULL pointer dereference, address: 0000000000000050 > > > > > > #PF: supervisor read access in kernel mode > > > > > > #PF: error_code(0x0000) - not-present page > > > > > > > > > > Same comment about Oops. > > > > > > In another thread it was a good advice to move the full Oops (if you think it's > > > very useful to have) after the cutter '---' line, so it will be in email > > > archives but Git history. > > > > So git history wouldn't give any of the Oops context, and he/she has > > to access LKML, > > if offline, then ...lost. > > Tell me, do you really think the line: > #PF: error_code(0x0000) - not-present page > makes any sense in the commit message? > > I do not think so. And so on, for almost 60% of the Oops. > Really, to help one person you will make millions suffering. It's not okay. > If you and millions feel so suffered, why not try to simplify the Oops code to not output nonsense too much to console and log. :) They might not be so important to old birds as you. but it is easy to cut off the road for newcomers. Anyway, it is not the focus of this patchset. help to take a look and try the code. Thanks, Ethan > -- > With Best Regards, > Andy Shevchenko > > ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2020-09-30 2:07 UTC | newest] Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-09-25 2:34 [PATCH 0/5] Fix DPC hotplug race and enhance error hanlding Ethan Zhao 2020-09-25 2:34 ` [PATCH 4/5] PCI: only return true when dev io state is really changed Ethan Zhao 2020-09-25 12:38 ` Andy Shevchenko 2020-09-27 1:28 ` Zhao, Haifeng 2020-09-25 13:56 ` Alex G. 2020-09-25 2:34 ` [PATCH 5/5] PCI/ERR: don't mix io state not changed and no driver together Ethan Zhao [not found] ` <20200925023423.42675-2-haifeng.zhao@intel.com> 2020-09-25 12:24 ` [PATCH 1/5] PCI: define a function to check and wait till port finish DPC handling Andy Shevchenko 2020-09-27 1:53 ` Zhao, Haifeng [not found] ` <20200925023423.42675-3-haifeng.zhao@intel.com> 2020-09-25 12:32 ` [PATCH 2/5] PCI: pciehp: check and wait port status out of DPC before handling DLLSC and PDC Andy Shevchenko 2020-09-27 1:50 ` Zhao, Haifeng [not found] ` <20200925023423.42675-4-haifeng.zhao@intel.com> 2020-09-25 12:35 ` [PATCH 3/5] PCI/ERR: get device before call device driver to avoid null pointer reference Andy Shevchenko 2020-09-29 2:35 ` Ethan Zhao 2020-09-29 8:51 ` Andy Shevchenko 2020-09-29 9:38 ` Ethan Zhao 2020-09-29 10:48 ` Andy Shevchenko 2020-09-30 2:07 ` Ethan Zhao
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.