From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 07D182C82 for ; Thu, 2 Dec 2021 22:34:23 +0000 (UTC) X-IronPort-AV: E=McAfee;i="6200,9189,10186"; a="300245954" X-IronPort-AV: E=Sophos;i="5.87,282,1631602800"; d="scan'208";a="300245954" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2021 14:34:22 -0800 X-IronPort-AV: E=Sophos;i="5.87,282,1631602800"; d="scan'208";a="746314689" Received: from vcostago-desk1.jf.intel.com (HELO vcostago-desk1) ([10.54.70.10]) by fmsmga006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2021 14:34:21 -0800 From: Vinicius Costa Gomes To: Stefan Dietrich Cc: kuba@kernel.org, greg@kroah.com, netdev@vger.kernel.org, intel-wired-lan@lists.osuosl.org, regressions@lists.linux.dev Subject: Re: [PATCH] igc: Avoid possible deadlock during suspend/resume In-Reply-To: <5a4b31d43d9bf32e518188f3ef84c433df3a18b1.camel@gmx.de> References: <87r1awtdx3.fsf@intel.com> <20211201185731.236130-1-vinicius.gomes@intel.com> <5a4b31d43d9bf32e518188f3ef84c433df3a18b1.camel@gmx.de> Date: Thu, 02 Dec 2021 14:34:21 -0800 Message-ID: <87o85yljpu.fsf@intel.com> Precedence: bulk X-Mailing-List: regressions@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain Hi Stefan, Stefan Dietrich writes: > Hi Vinicius, > > thanks for the patch - unfortunately it did not solve the issue and I > am still getting reboots/lockups. > Thanks for the test. We learned something, not a lot, but something: the problem you are facing is PTM related and it's not the same bug as that PM deadlock. I am still trying to understand what's going on. Are you able to send me the 'dmesg' output for the two kernel configs (CONFIG_PCIE_PTM enabled and disabled)? (no need to bring the network interface up or down). Your kernel .config would be useful as well. > > Cheers, > Stefan > > On Wed, 2021-12-01 at 10:57 -0800, Vinicius Costa Gomes wrote: >> Inspired by: >> https://bugzilla.kernel.org/show_bug.cgi?id=215129 >> >> Signed-off-by: Vinicius Costa Gomes >> --- >> Just to see if it's indeed the same problem as the bug report above. >> >> drivers/net/ethernet/intel/igc/igc_main.c | 19 +++++++++++++------ >> 1 file changed, 13 insertions(+), 6 deletions(-) >> >> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c >> b/drivers/net/ethernet/intel/igc/igc_main.c >> index 0e19b4d02e62..c58bf557a2a1 100644 >> --- a/drivers/net/ethernet/intel/igc/igc_main.c >> +++ b/drivers/net/ethernet/intel/igc/igc_main.c >> @@ -6619,7 +6619,7 @@ static void igc_deliver_wake_packet(struct >> net_device *netdev) >> netif_rx(skb); >> } >> >> -static int __maybe_unused igc_resume(struct device *dev) >> +static int __maybe_unused __igc_resume(struct device *dev, bool rpm) >> { >> struct pci_dev *pdev = to_pci_dev(dev); >> struct net_device *netdev = pci_get_drvdata(pdev); >> @@ -6661,20 +6661,27 @@ static int __maybe_unused igc_resume(struct >> device *dev) >> >> wr32(IGC_WUS, ~0); >> >> - rtnl_lock(); >> + if (!rpm) >> + rtnl_lock(); >> if (!err && netif_running(netdev)) >> err = __igc_open(netdev, true); >> >> if (!err) >> netif_device_attach(netdev); >> - rtnl_unlock(); >> + if (!rpm) >> + rtnl_unlock(); >> >> return err; >> } >> >> static int __maybe_unused igc_runtime_resume(struct device *dev) >> { >> - return igc_resume(dev); >> + return __igc_resume(dev, true); >> +} >> + >> +static int __maybe_unused igc_resume(struct device *dev) >> +{ >> + return __igc_resume(dev, false); >> } >> >> static int __maybe_unused igc_suspend(struct device *dev) >> @@ -6738,7 +6745,7 @@ static pci_ers_result_t >> igc_io_error_detected(struct pci_dev *pdev, >> * @pdev: Pointer to PCI device >> * >> * Restart the card from scratch, as if from a cold-boot. >> Implementation >> - * resembles the first-half of the igc_resume routine. >> + * resembles the first-half of the __igc_resume routine. >> **/ >> static pci_ers_result_t igc_io_slot_reset(struct pci_dev *pdev) >> { >> @@ -6777,7 +6784,7 @@ static pci_ers_result_t >> igc_io_slot_reset(struct pci_dev *pdev) >> * >> * This callback is called when the error recovery driver tells us >> that >> * its OK to resume normal operation. Implementation resembles the >> - * second-half of the igc_resume routine. >> + * second-half of the __igc_resume routine. >> */ >> static void igc_io_resume(struct pci_dev *pdev) >> { > Cheers, -- Vinicius From mboxrd@z Thu Jan 1 00:00:00 1970 From: Vinicius Costa Gomes Date: Thu, 02 Dec 2021 14:34:21 -0800 Subject: [Intel-wired-lan] [PATCH] igc: Avoid possible deadlock during suspend/resume In-Reply-To: <5a4b31d43d9bf32e518188f3ef84c433df3a18b1.camel@gmx.de> References: <87r1awtdx3.fsf@intel.com> <20211201185731.236130-1-vinicius.gomes@intel.com> <5a4b31d43d9bf32e518188f3ef84c433df3a18b1.camel@gmx.de> Message-ID: <87o85yljpu.fsf@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: intel-wired-lan@osuosl.org List-ID: Hi Stefan, Stefan Dietrich writes: > Hi Vinicius, > > thanks for the patch - unfortunately it did not solve the issue and I > am still getting reboots/lockups. > Thanks for the test. We learned something, not a lot, but something: the problem you are facing is PTM related and it's not the same bug as that PM deadlock. I am still trying to understand what's going on. Are you able to send me the 'dmesg' output for the two kernel configs (CONFIG_PCIE_PTM enabled and disabled)? (no need to bring the network interface up or down). Your kernel .config would be useful as well. > > Cheers, > Stefan > > On Wed, 2021-12-01 at 10:57 -0800, Vinicius Costa Gomes wrote: >> Inspired by: >> https://bugzilla.kernel.org/show_bug.cgi?id=215129 >> >> Signed-off-by: Vinicius Costa Gomes >> --- >> Just to see if it's indeed the same problem as the bug report above. >> >> drivers/net/ethernet/intel/igc/igc_main.c | 19 +++++++++++++------ >> 1 file changed, 13 insertions(+), 6 deletions(-) >> >> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c >> b/drivers/net/ethernet/intel/igc/igc_main.c >> index 0e19b4d02e62..c58bf557a2a1 100644 >> --- a/drivers/net/ethernet/intel/igc/igc_main.c >> +++ b/drivers/net/ethernet/intel/igc/igc_main.c >> @@ -6619,7 +6619,7 @@ static void igc_deliver_wake_packet(struct >> net_device *netdev) >> netif_rx(skb); >> } >> >> -static int __maybe_unused igc_resume(struct device *dev) >> +static int __maybe_unused __igc_resume(struct device *dev, bool rpm) >> { >> struct pci_dev *pdev = to_pci_dev(dev); >> struct net_device *netdev = pci_get_drvdata(pdev); >> @@ -6661,20 +6661,27 @@ static int __maybe_unused igc_resume(struct >> device *dev) >> >> wr32(IGC_WUS, ~0); >> >> - rtnl_lock(); >> + if (!rpm) >> + rtnl_lock(); >> if (!err && netif_running(netdev)) >> err = __igc_open(netdev, true); >> >> if (!err) >> netif_device_attach(netdev); >> - rtnl_unlock(); >> + if (!rpm) >> + rtnl_unlock(); >> >> return err; >> } >> >> static int __maybe_unused igc_runtime_resume(struct device *dev) >> { >> - return igc_resume(dev); >> + return __igc_resume(dev, true); >> +} >> + >> +static int __maybe_unused igc_resume(struct device *dev) >> +{ >> + return __igc_resume(dev, false); >> } >> >> static int __maybe_unused igc_suspend(struct device *dev) >> @@ -6738,7 +6745,7 @@ static pci_ers_result_t >> igc_io_error_detected(struct pci_dev *pdev, >> * @pdev: Pointer to PCI device >> * >> * Restart the card from scratch, as if from a cold-boot. >> Implementation >> - * resembles the first-half of the igc_resume routine. >> + * resembles the first-half of the __igc_resume routine. >> **/ >> static pci_ers_result_t igc_io_slot_reset(struct pci_dev *pdev) >> { >> @@ -6777,7 +6784,7 @@ static pci_ers_result_t >> igc_io_slot_reset(struct pci_dev *pdev) >> * >> * This callback is called when the error recovery driver tells us >> that >> * its OK to resume normal operation. Implementation resembles the >> - * second-half of the igc_resume routine. >> + * second-half of the __igc_resume routine. >> */ >> static void igc_io_resume(struct pci_dev *pdev) >> { > Cheers, -- Vinicius