From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from wp530.webpack.hosteurope.de (wp530.webpack.hosteurope.de [80.237.130.52]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 188E12C80 for ; Fri, 10 Dec 2021 09:40:31 +0000 (UTC) Received: from ip4d173d4a.dynamic.kabel-deutschland.de ([77.23.61.74] helo=[192.168.66.200]); authenticated by wp530.webpack.hosteurope.de running ExIM with esmtpsa (TLS1.3:ECDHE_RSA_AES_128_GCM_SHA256:128) id 1mvcO8-0007uy-Py; Fri, 10 Dec 2021 10:40:24 +0100 Message-ID: <063995d8-acf3-9f33-5667-f284233c94b4@leemhuis.info> Date: Fri, 10 Dec 2021 10:40:24 +0100 Precedence: bulk X-Mailing-List: regressions@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.3.0 Subject: Re: [PATCH] igc: Avoid possible deadlock during suspend/resume Content-Language: en-BS To: Vinicius Costa Gomes , Stefan Dietrich Cc: kuba@kernel.org, greg@kroah.com, netdev@vger.kernel.org, intel-wired-lan@lists.osuosl.org, regressions@lists.linux.dev References: <87r1awtdx3.fsf@intel.com> <20211201185731.236130-1-vinicius.gomes@intel.com> <5a4b31d43d9bf32e518188f3ef84c433df3a18b1.camel@gmx.de> <87o85yljpu.fsf@intel.com> From: Thorsten Leemhuis In-Reply-To: <87o85yljpu.fsf@intel.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-bounce-key: webpack.hosteurope.de;regressions@leemhuis.info;1639129232;52db4137; X-HE-SMSGID: 1mvcO8-0007uy-Py Hi, this is your Linux kernel regression tracker speaking. On 02.12.21 23:34, Vinicius Costa Gomes wrote: > Hi Stefan, > > Stefan Dietrich writes: > >> Hi Vinicius, >> >> thanks for the patch - unfortunately it did not solve the issue and I >> am still getting reboots/lockups. >> > > Thanks for the test. We learned something, not a lot, but something: the > problem you are facing is PTM related and it's not the same bug as that > PM deadlock. > > I am still trying to understand what's going on. > > Are you able to send me the 'dmesg' output for the two kernel configs > (CONFIG_PCIE_PTM enabled and disabled)? (no need to bring the network > interface up or down). Your kernel .config would be useful as well. Stefan, could you provide the data Vinicius asked for? Or did you do that in private already? Or was progress made somewhere else and I simply missed this? Ciao, Thorsten, your Linux kernel regression tracker. P.S.: As a Linux kernel regression tracker I'm getting a lot of reports on my table. I can only look briefly into most of them. Unfortunately therefore I sometimes will get things wrong or miss something important. I hope that's not the case here; if you think it is, don't hesitate to tell me about it in a public reply. That's in everyone's interest, as what I wrote above might be misleading to everyone reading this; any suggestion I gave they thus might sent someone reading this down the wrong rabbit hole, which none of us wants. BTW, I have no personal interest in this issue, which is tracked using regzbot, my Linux kernel regression tracking bot (https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting this mail to get things rolling again and hence don't need to be CC on all further activities wrt to this regression. #regzbot poke >> On Wed, 2021-12-01 at 10:57 -0800, Vinicius Costa Gomes wrote: >>> Inspired by: >>> https://bugzilla.kernel.org/show_bug.cgi?id=215129 >>> >>> Signed-off-by: Vinicius Costa Gomes >>> --- >>> Just to see if it's indeed the same problem as the bug report above. >>> >>> drivers/net/ethernet/intel/igc/igc_main.c | 19 +++++++++++++------ >>> 1 file changed, 13 insertions(+), 6 deletions(-) >>> >>> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c >>> b/drivers/net/ethernet/intel/igc/igc_main.c >>> index 0e19b4d02e62..c58bf557a2a1 100644 >>> --- a/drivers/net/ethernet/intel/igc/igc_main.c >>> +++ b/drivers/net/ethernet/intel/igc/igc_main.c >>> @@ -6619,7 +6619,7 @@ static void igc_deliver_wake_packet(struct >>> net_device *netdev) >>> netif_rx(skb); >>> } >>> >>> -static int __maybe_unused igc_resume(struct device *dev) >>> +static int __maybe_unused __igc_resume(struct device *dev, bool rpm) >>> { >>> struct pci_dev *pdev = to_pci_dev(dev); >>> struct net_device *netdev = pci_get_drvdata(pdev); >>> @@ -6661,20 +6661,27 @@ static int __maybe_unused igc_resume(struct >>> device *dev) >>> >>> wr32(IGC_WUS, ~0); >>> >>> - rtnl_lock(); >>> + if (!rpm) >>> + rtnl_lock(); >>> if (!err && netif_running(netdev)) >>> err = __igc_open(netdev, true); >>> >>> if (!err) >>> netif_device_attach(netdev); >>> - rtnl_unlock(); >>> + if (!rpm) >>> + rtnl_unlock(); >>> >>> return err; >>> } >>> >>> static int __maybe_unused igc_runtime_resume(struct device *dev) >>> { >>> - return igc_resume(dev); >>> + return __igc_resume(dev, true); >>> +} >>> + >>> +static int __maybe_unused igc_resume(struct device *dev) >>> +{ >>> + return __igc_resume(dev, false); >>> } >>> >>> static int __maybe_unused igc_suspend(struct device *dev) >>> @@ -6738,7 +6745,7 @@ static pci_ers_result_t >>> igc_io_error_detected(struct pci_dev *pdev, >>> * @pdev: Pointer to PCI device >>> * >>> * Restart the card from scratch, as if from a cold-boot. >>> Implementation >>> - * resembles the first-half of the igc_resume routine. >>> + * resembles the first-half of the __igc_resume routine. >>> **/ >>> static pci_ers_result_t igc_io_slot_reset(struct pci_dev *pdev) >>> { >>> @@ -6777,7 +6784,7 @@ static pci_ers_result_t >>> igc_io_slot_reset(struct pci_dev *pdev) >>> * >>> * This callback is called when the error recovery driver tells us >>> that >>> * its OK to resume normal operation. Implementation resembles the >>> - * second-half of the igc_resume routine. >>> + * second-half of the __igc_resume routine. >>> */ >>> static void igc_io_resume(struct pci_dev *pdev) >>> { >> > > > Cheers, > From mboxrd@z Thu Jan 1 00:00:00 1970 From: Thorsten Leemhuis Date: Fri, 10 Dec 2021 10:40:24 +0100 Subject: [Intel-wired-lan] [PATCH] igc: Avoid possible deadlock during suspend/resume In-Reply-To: <87o85yljpu.fsf@intel.com> References: <87r1awtdx3.fsf@intel.com> <20211201185731.236130-1-vinicius.gomes@intel.com> <5a4b31d43d9bf32e518188f3ef84c433df3a18b1.camel@gmx.de> <87o85yljpu.fsf@intel.com> Message-ID: <063995d8-acf3-9f33-5667-f284233c94b4@leemhuis.info> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: intel-wired-lan@osuosl.org List-ID: Hi, this is your Linux kernel regression tracker speaking. On 02.12.21 23:34, Vinicius Costa Gomes wrote: > Hi Stefan, > > Stefan Dietrich writes: > >> Hi Vinicius, >> >> thanks for the patch - unfortunately it did not solve the issue and I >> am still getting reboots/lockups. >> > > Thanks for the test. We learned something, not a lot, but something: the > problem you are facing is PTM related and it's not the same bug as that > PM deadlock. > > I am still trying to understand what's going on. > > Are you able to send me the 'dmesg' output for the two kernel configs > (CONFIG_PCIE_PTM enabled and disabled)? (no need to bring the network > interface up or down). Your kernel .config would be useful as well. Stefan, could you provide the data Vinicius asked for? Or did you do that in private already? Or was progress made somewhere else and I simply missed this? Ciao, Thorsten, your Linux kernel regression tracker. P.S.: As a Linux kernel regression tracker I'm getting a lot of reports on my table. I can only look briefly into most of them. Unfortunately therefore I sometimes will get things wrong or miss something important. I hope that's not the case here; if you think it is, don't hesitate to tell me about it in a public reply. That's in everyone's interest, as what I wrote above might be misleading to everyone reading this; any suggestion I gave they thus might sent someone reading this down the wrong rabbit hole, which none of us wants. BTW, I have no personal interest in this issue, which is tracked using regzbot, my Linux kernel regression tracking bot (https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting this mail to get things rolling again and hence don't need to be CC on all further activities wrt to this regression. #regzbot poke >> On Wed, 2021-12-01 at 10:57 -0800, Vinicius Costa Gomes wrote: >>> Inspired by: >>> https://bugzilla.kernel.org/show_bug.cgi?id=215129 >>> >>> Signed-off-by: Vinicius Costa Gomes >>> --- >>> Just to see if it's indeed the same problem as the bug report above. >>> >>> drivers/net/ethernet/intel/igc/igc_main.c | 19 +++++++++++++------ >>> 1 file changed, 13 insertions(+), 6 deletions(-) >>> >>> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c >>> b/drivers/net/ethernet/intel/igc/igc_main.c >>> index 0e19b4d02e62..c58bf557a2a1 100644 >>> --- a/drivers/net/ethernet/intel/igc/igc_main.c >>> +++ b/drivers/net/ethernet/intel/igc/igc_main.c >>> @@ -6619,7 +6619,7 @@ static void igc_deliver_wake_packet(struct >>> net_device *netdev) >>> netif_rx(skb); >>> } >>> >>> -static int __maybe_unused igc_resume(struct device *dev) >>> +static int __maybe_unused __igc_resume(struct device *dev, bool rpm) >>> { >>> struct pci_dev *pdev = to_pci_dev(dev); >>> struct net_device *netdev = pci_get_drvdata(pdev); >>> @@ -6661,20 +6661,27 @@ static int __maybe_unused igc_resume(struct >>> device *dev) >>> >>> wr32(IGC_WUS, ~0); >>> >>> - rtnl_lock(); >>> + if (!rpm) >>> + rtnl_lock(); >>> if (!err && netif_running(netdev)) >>> err = __igc_open(netdev, true); >>> >>> if (!err) >>> netif_device_attach(netdev); >>> - rtnl_unlock(); >>> + if (!rpm) >>> + rtnl_unlock(); >>> >>> return err; >>> } >>> >>> static int __maybe_unused igc_runtime_resume(struct device *dev) >>> { >>> - return igc_resume(dev); >>> + return __igc_resume(dev, true); >>> +} >>> + >>> +static int __maybe_unused igc_resume(struct device *dev) >>> +{ >>> + return __igc_resume(dev, false); >>> } >>> >>> static int __maybe_unused igc_suspend(struct device *dev) >>> @@ -6738,7 +6745,7 @@ static pci_ers_result_t >>> igc_io_error_detected(struct pci_dev *pdev, >>> * @pdev: Pointer to PCI device >>> * >>> * Restart the card from scratch, as if from a cold-boot. >>> Implementation >>> - * resembles the first-half of the igc_resume routine. >>> + * resembles the first-half of the __igc_resume routine. >>> **/ >>> static pci_ers_result_t igc_io_slot_reset(struct pci_dev *pdev) >>> { >>> @@ -6777,7 +6784,7 @@ static pci_ers_result_t >>> igc_io_slot_reset(struct pci_dev *pdev) >>> * >>> * This callback is called when the error recovery driver tells us >>> that >>> * its OK to resume normal operation. Implementation resembles the >>> - * second-half of the igc_resume routine. >>> + * second-half of the __igc_resume routine. >>> */ >>> static void igc_io_resume(struct pci_dev *pdev) >>> { >> > > > Cheers, >