linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net] net: hns3: fix to stop multiple HNS reset due to the AER changes
@ 2019-03-10  6:47 Huazhong Tan
  2019-03-10  6:59 ` David Miller
  0 siblings, 1 reply; 2+ messages in thread
From: Huazhong Tan @ 2019-03-10  6:47 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	Shiju Jose, Huazhong Tan

From: Shiju Jose <shiju.jose@huawei.com>

The commit bfcb79fca19d
("PCI/ERR: Run error recovery callbacks for all affected devices")
affected the non-fatal error recovery logic for the HNS and RDMA devices.
This is because each HNS PF under PCIe bus receive callbacks
from the AER driver when an error is reported for one of the PF.
This causes unwanted PF resets because
the HNS decides which PF to reset based on the reset type set.
The HNS error handling code sets the reset type based on the hw error
type detected.

This patch provides fix for the above issue for the recovery of
the hw errors in the HNS and RDMA devices.

This patch needs backporting to the kernel v5.0+

Fixes: 332fbf576579 ("net: hns3: add handling of hw ras errors using new set of commands")
Reported-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hnae3.h            | 1 +
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c        | 4 +++-
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c | 9 +++++++--
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
index 66d7a8b..38b430f 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
@@ -194,6 +194,7 @@ struct hnae3_ae_dev {
 	const struct hnae3_ae_ops *ops;
 	struct list_head node;
 	u32 flag;
+	u8 override_pci_need_reset; /* fix to stop multiple reset happening */
 	enum hnae3_dev_type dev_type;
 	enum hnae3_reset_type reset_type;
 	void *priv;
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 0d1ae15..1c1f17e 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -1850,7 +1850,9 @@ static pci_ers_result_t hns3_slot_reset(struct pci_dev *pdev)
 
 	/* request the reset */
 	if (ae_dev->ops->reset_event) {
-		ae_dev->ops->reset_event(pdev, NULL);
+		if (!ae_dev->override_pci_need_reset)
+			ae_dev->ops->reset_event(pdev, NULL);
+
 		return PCI_ERS_RESULT_RECOVERED;
 	}
 
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
index 1feceff..1f52d11 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c
@@ -1317,8 +1317,10 @@ pci_ers_result_t hclge_handle_hw_ras_error(struct hnae3_ae_dev *ae_dev)
 		hclge_handle_all_ras_errors(hdev);
 	} else {
 		if (test_bit(HCLGE_STATE_RST_HANDLING, &hdev->state) ||
-		    hdev->pdev->revision < 0x21)
+		    hdev->pdev->revision < 0x21) {
+			ae_dev->override_pci_need_reset = 1;
 			return PCI_ERS_RESULT_RECOVERED;
+		}
 	}
 
 	if (status & HCLGE_RAS_REG_ROCEE_ERR_MASK) {
@@ -1327,8 +1329,11 @@ pci_ers_result_t hclge_handle_hw_ras_error(struct hnae3_ae_dev *ae_dev)
 	}
 
 	if (status & HCLGE_RAS_REG_NFE_MASK ||
-	    status & HCLGE_RAS_REG_ROCEE_ERR_MASK)
+	    status & HCLGE_RAS_REG_ROCEE_ERR_MASK) {
+		ae_dev->override_pci_need_reset = 0;
 		return PCI_ERS_RESULT_NEED_RESET;
+	}
+	ae_dev->override_pci_need_reset = 1;
 
 	return PCI_ERS_RESULT_RECOVERED;
 }
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH net] net: hns3: fix to stop multiple HNS reset due to the AER changes
  2019-03-10  6:47 [PATCH net] net: hns3: fix to stop multiple HNS reset due to the AER changes Huazhong Tan
@ 2019-03-10  6:59 ` David Miller
  0 siblings, 0 replies; 2+ messages in thread
From: David Miller @ 2019-03-10  6:59 UTC (permalink / raw)
  To: tanhuazhong
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm, shiju.jose

From: Huazhong Tan <tanhuazhong@huawei.com>
Date: Sun, 10 Mar 2019 14:47:51 +0800

> From: Shiju Jose <shiju.jose@huawei.com>
> 
> The commit bfcb79fca19d
> ("PCI/ERR: Run error recovery callbacks for all affected devices")
> affected the non-fatal error recovery logic for the HNS and RDMA devices.
> This is because each HNS PF under PCIe bus receive callbacks
> from the AER driver when an error is reported for one of the PF.
> This causes unwanted PF resets because
> the HNS decides which PF to reset based on the reset type set.
> The HNS error handling code sets the reset type based on the hw error
> type detected.
> 
> This patch provides fix for the above issue for the recovery of
> the hw errors in the HNS and RDMA devices.
> 
> This patch needs backporting to the kernel v5.0+
> 
> Fixes: 332fbf576579 ("net: hns3: add handling of hw ras errors using new set of commands")
> Reported-by: Xiaofei Tan <tanxiaofei@huawei.com>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2019-03-10  6:59 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-10  6:47 [PATCH net] net: hns3: fix to stop multiple HNS reset due to the AER changes Huazhong Tan
2019-03-10  6:59 ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).