Agreed and thanks for the pointers; please see the log files and .config attached as requested.

Cheers,
Stefan

On Fri, 2021-12-10 at 15:01 +0100, Thorsten Leemhuis wrote:
> On 10.12.21 14:45, Stefan Dietrich wrote:
> > thanks for keeping an eye on the issue. I've sent the files in
> > private because I did not want to spam the mailing lists with them.
> > Please let me know if this is the correct procedure.
>
> It's likely okay in this case, but FWIW: most of the time it's the
> wrong thing to do as outlined here:
>
> https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html#general-advice-for-further-interactions
>
> One reason for this: others that might want to look into the issue
> now or in a year or two might be unable to if crucial data was only
> sent in private.
>
> Ciao, Thorsten
>
> > On Fri, 2021-12-10 at 10:40 +0100, Thorsten Leemhuis wrote:
> > > Hi, this is your Linux kernel regression tracker speaking.
> > >
> > > On 02.12.21 23:34, Vinicius Costa Gomes wrote:
> > > > Hi Stefan,
> > > >
> > > > Stefan Dietrich writes:
> > > >
> > > > > Hi Vinicius,
> > > > >
> > > > > thanks for the patch - unfortunately it did not solve the
> > > > > issue and I am still getting reboots/lockups.
> > > > >
> > > >
> > > > Thanks for the test. We learned something, not a lot, but
> > > > something: the problem you are facing is PTM related and it's
> > > > not the same bug as that PM deadlock.
> > > >
> > > > I am still trying to understand what's going on.
> > > >
> > > > Are you able to send me the 'dmesg' output for the two kernel
> > > > configs (CONFIG_PCIE_PTM enabled and disabled)? (no need to bring
> > > > the network interface up or down). Your kernel .config would be
> > > > useful as well.
> > >
> > > Stefan, could you provide the data Vinicius asked for? Or did you
> > > do that in private already?
> > > Or was progress made somewhere else and I simply missed this?
> > >
> > > Ciao, Thorsten, your Linux kernel regression tracker.
> > >
> > > P.S.: As a Linux kernel regression tracker I'm getting a lot of
> > > reports on my table. I can only look briefly into most of them.
> > > Unfortunately therefore I sometimes will get things wrong or miss
> > > something important. I hope that's not the case here; if you think
> > > it is, don't hesitate to tell me about it in a public reply. That's
> > > in everyone's interest, as what I wrote above might be misleading
> > > to everyone reading this; any suggestion I gave thus might send
> > > someone reading this down the wrong rabbit hole, which none of us
> > > wants.
> > >
> > > BTW, I have no personal interest in this issue, which is tracked
> > > using regzbot, my Linux kernel regression tracking bot
> > > (https://linux-regtracking.leemhuis.info/regzbot/). I'm only
> > > posting this mail to get things rolling again and hence don't need
> > > to be CC'ed on all further activities wrt this regression.
> > >
> > > #regzbot poke
> > >
> > > > > On Wed, 2021-12-01 at 10:57 -0800, Vinicius Costa Gomes wrote:
> > > > > > Inspired by:
> > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=215129
> > > > > >
> > > > > > Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> > > > > > ---
> > > > > > Just to see if it's indeed the same problem as the bug report
> > > > > > above.
> > > > > >
> > > > > >  drivers/net/ethernet/intel/igc/igc_main.c | 19 +++++++++++++------
> > > > > >  1 file changed, 13 insertions(+), 6 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
> > > > > > index 0e19b4d02e62..c58bf557a2a1 100644
> > > > > > --- a/drivers/net/ethernet/intel/igc/igc_main.c
> > > > > > +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> > > > > > @@ -6619,7 +6619,7 @@ static void igc_deliver_wake_packet(struct net_device *netdev)
> > > > > >  	netif_rx(skb);
> > > > > >  }
> > > > > >  
> > > > > > -static int __maybe_unused igc_resume(struct device *dev)
> > > > > > +static int __maybe_unused __igc_resume(struct device *dev, bool rpm)
> > > > > >  {
> > > > > >  	struct pci_dev *pdev = to_pci_dev(dev);
> > > > > >  	struct net_device *netdev = pci_get_drvdata(pdev);
> > > > > > @@ -6661,20 +6661,27 @@ static int __maybe_unused igc_resume(struct device *dev)
> > > > > >  
> > > > > >  	wr32(IGC_WUS, ~0);
> > > > > >  
> > > > > > -	rtnl_lock();
> > > > > > +	if (!rpm)
> > > > > > +		rtnl_lock();
> > > > > >  	if (!err && netif_running(netdev))
> > > > > >  		err = __igc_open(netdev, true);
> > > > > >  
> > > > > >  	if (!err)
> > > > > >  		netif_device_attach(netdev);
> > > > > > -	rtnl_unlock();
> > > > > > +	if (!rpm)
> > > > > > +		rtnl_unlock();
> > > > > >  
> > > > > >  	return err;
> > > > > >  }
> > > > > >  
> > > > > >  static int __maybe_unused igc_runtime_resume(struct device *dev)
> > > > > >  {
> > > > > > -	return igc_resume(dev);
> > > > > > +	return __igc_resume(dev, true);
> > > > > > +}
> > > > > > +
> > > > > > +static int __maybe_unused igc_resume(struct device *dev)
> > > > > > +{
> > > > > > +	return __igc_resume(dev, false);
> > > > > >  }
> > > > > >  
> > > > > >  static int __maybe_unused igc_suspend(struct device *dev)
> > > > > > @@ -6738,7 +6745,7 @@ static pci_ers_result_t igc_io_error_detected(struct pci_dev *pdev,
> > > > > >   * @pdev: Pointer to PCI device
> > > > > >   *
> > > > > >   * Restart the card from scratch, as if from a cold-boot. Implementation
> > > > > > - * resembles the first-half of the igc_resume routine.
> > > > > > + * resembles the first-half of the __igc_resume routine.
> > > > > >   **/
> > > > > >  static pci_ers_result_t igc_io_slot_reset(struct pci_dev *pdev)
> > > > > >  {
> > > > > > @@ -6777,7 +6784,7 @@ static pci_ers_result_t igc_io_slot_reset(struct pci_dev *pdev)
> > > > > >   *
> > > > > >   * This callback is called when the error recovery driver tells us that
> > > > > >   * its OK to resume normal operation. Implementation resembles the
> > > > > > - * second-half of the igc_resume routine.
> > > > > > + * second-half of the __igc_resume routine.
> > > > > >   */
> > > > > >  static void igc_io_resume(struct pci_dev *pdev)
> > > > > >  {
> > > >
> > > > Cheers,