linux-kernel.vger.kernel.org archive mirror
From: Oded Gabbay <oded.gabbay@gmail.com>
To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Cc: SW_Drivers@habana.ai, gregkh@linuxfoundation.org,
	davem@davemloft.net, kuba@kernel.org,
	Omer Shpigelman <oshpigelman@habana.ai>
Subject: [PATCH 11/15] habanalabs/gaudi: add QP error handling
Date: Thu, 10 Sep 2020 19:11:22 +0300
Message-ID: <20200910161126.30948-12-oded.gabbay@gmail.com>
In-Reply-To: <20200910161126.30948-1-oded.gabbay@gmail.com>

From: Omer Shpigelman <oshpigelman@habana.ai>

Add user notification of Queue Pair (QP) errors, e.g. security violations,
too many retransmissions, an invalid QP, etc.
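Each of these error classes is reported as a numeric syndrome, which get_syndrome_text() in gaudi_nic.c turns into a log string with a switch statement. As a sketch only, the same mapping can be kept in a lookup table so each code sits next to its text. The struct and function names below are hypothetical, not driver code; the codes and texts are copied from this patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry pairing a QP error syndrome with its text. */
struct syndrome_desc {
	uint32_t code;
	const char *text;
};

/* Codes and texts as listed in get_syndrome_text() in this patch. */
static const struct syndrome_desc syndromes[] = {
	{ 0x05, "Rx got invalid QP" },
	{ 0x06, "Rx transport service mismatch" },
	{ 0x09, "Rx Rkey check failed" },
	{ 0x40, "timer retry exceeded" },
	{ 0x41, "NACK retry exceeded" },
	{ 0x42, "doorbell on invalid QP" },
	{ 0x43, "doorbell security check failed" },
	{ 0x44, "Tx got invalid QP" },
	{ 0x45, "responder got ACK/NACK on invalid QP" },
	{ 0x46, "responder try to send ACK/NACK on invalid QP" },
};

/* Linear search is fine for ten entries; unknown codes get a fallback. */
static const char *syndrome_text(uint32_t syndrome)
{
	size_t i;

	for (i = 0; i < sizeof(syndromes) / sizeof(syndromes[0]); i++)
		if (syndromes[i].code == syndrome)
			return syndromes[i].text;

	return "unknown syndrome";
}
```

For example, syndrome_text(0x41) returns "NACK retry exceeded", and any code missing from the table maps to "unknown syndrome", matching the default case of the switch.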

Whenever a QP causes an error, the firmware sends an event to the driver,
which drains the QP error FIFO and pushes each error as an error entry to
the Completion Queue (CQ), if one exists. Otherwise the error is only logged.
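The FIFO handling described above is a single-producer/single-consumer ring: the device advances a producer index, and the driver consumes entries up to it and writes its consumer index back. A minimal sketch follows, assuming the ring length is a power of two (as the "& (QP_ERR_BUF_LEN - 1)" wrap in the patch implies) and that the producer index read from hardware already lies in [0, length). ERR_RING_LEN, struct err_entry and drain_err_ring() are hypothetical names, not driver code.

```c
#include <stdint.h>

/* Hypothetical ring length; must be a power of two for the mask wrap. */
#define ERR_RING_LEN 8u

/* Hypothetical decoded entry (the driver uses struct qp_err instead). */
struct err_entry {
	uint32_t qpn;
	uint32_t syndrome;
};

/*
 * Copy every entry from consumer index ci up to, but not including,
 * producer index pi into out[], wrapping at the end of the ring.
 * Testing ci != pi (rather than ci < pi) keeps draining after the
 * producer has wrapped around and pi < ci. out[] must have room for
 * ERR_RING_LEN - 1 entries. Returns the new consumer index (== pi).
 */
static uint32_t drain_err_ring(const struct err_entry *ring, uint32_t ci,
			       uint32_t pi, struct err_entry *out,
			       uint32_t *count)
{
	*count = 0;

	while (ci != pi) {
		out[(*count)++] = ring[ci];
		ci = (ci + 1) & (ERR_RING_LEN - 1);
	}

	return ci;
}
```

With ci = 6 and pi = 2 on an 8-entry ring, the loop consumes entries 6, 7, 0 and 1 and returns 2. A condition of ci < pi would consume nothing in that case and leave the wrapped entries unreported, which is why the sketch tests for inequality.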

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/gaudi/gaudi.c     | 13 ++++
 drivers/misc/habanalabs/gaudi/gaudiP.h    |  1 +
 drivers/misc/habanalabs/gaudi/gaudi_nic.c | 93 +++++++++++++++++++++++
 3 files changed, 107 insertions(+)

diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index 34b99bd94ef0..8fc2288fb424 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -6658,6 +6658,19 @@ static void gaudi_handle_eqe(struct hl_device *hdev,
 		hl_fw_unmask_irq(hdev, event_type);
 		break;
 
+	case GAUDI_EVENT_NIC0_QP0:
+	case GAUDI_EVENT_NIC0_QP1:
+	case GAUDI_EVENT_NIC1_QP0:
+	case GAUDI_EVENT_NIC1_QP1:
+	case GAUDI_EVENT_NIC2_QP0:
+	case GAUDI_EVENT_NIC2_QP1:
+	case GAUDI_EVENT_NIC3_QP0:
+	case GAUDI_EVENT_NIC3_QP1:
+	case GAUDI_EVENT_NIC4_QP0:
+	case GAUDI_EVENT_NIC4_QP1:
+		gaudi_nic_handle_qp_err(hdev, event_type);
+		break;
+
 	case GAUDI_EVENT_PSOC_GPIO_U16_0:
 		cause = le64_to_cpu(eq_entry->data[0]) & 0xFF;
 		dev_err(hdev->dev,
diff --git a/drivers/misc/habanalabs/gaudi/gaudiP.h b/drivers/misc/habanalabs/gaudi/gaudiP.h
index ba3150c073ca..dc1dcff43cd6 100644
--- a/drivers/misc/habanalabs/gaudi/gaudiP.h
+++ b/drivers/misc/habanalabs/gaudi/gaudiP.h
@@ -578,5 +578,6 @@ netdev_tx_t gaudi_nic_handle_tx_pkt(struct gaudi_nic_device *gaudi_nic,
 					struct sk_buff *skb);
 int gaudi_nic_sw_init(struct hl_device *hdev);
 void gaudi_nic_sw_fini(struct hl_device *hdev);
+void gaudi_nic_handle_qp_err(struct hl_device *hdev, u16 event_type);
 
 #endif /* GAUDIP_H_ */
diff --git a/drivers/misc/habanalabs/gaudi/gaudi_nic.c b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
index 8f6585c700cf..41789f7ed32e 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi_nic.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi_nic.c
@@ -3958,3 +3958,96 @@ int gaudi_nic_cq_mmap(struct hl_device *hdev, struct vm_area_struct *vma)
 
 	return rc;
 }
+
+static const char *get_syndrome_text(u32 syndrome)
+{
+	const char *str;
+
+	switch (syndrome) {
+	case 0x05:
+		str = "Rx got invalid QP";
+		break;
+	case 0x06:
+		str = "Rx transport service mismatch";
+		break;
+	case 0x09:
+		str = "Rx Rkey check failed";
+		break;
+	case 0x40:
+		str = "timer retry exceeded";
+		break;
+	case 0x41:
+		str = "NACK retry exceeded";
+		break;
+	case 0x42:
+		str = "doorbell on invalid QP";
+		break;
+	case 0x43:
+		str = "doorbell security check failed";
+		break;
+	case 0x44:
+		str = "Tx got invalid QP";
+		break;
+	case 0x45:
+		str = "responder got ACK/NACK on invalid QP";
+		break;
+	case 0x46:
+		str = "responder try to send ACK/NACK on invalid QP";
+		break;
+	default:
+		str = "unknown syndrome";
+		break;
+	}
+
+	return str;
+}
+
+void gaudi_nic_handle_qp_err(struct hl_device *hdev, u16 event_type)
+{
+	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct gaudi_nic_device *gaudi_nic =
+			&gaudi->nic_devices[event_type - GAUDI_EVENT_NIC0_QP0];
+	struct qp_err *qp_err_arr = gaudi_nic->qp_err_mem_cpu;
+	struct hl_nic_cqe cqe_sw = {};
+	u32 pi, ci;
+
+	mutex_lock(&gaudi->nic_qp_err_lock);
+
+	if (!gaudi->nic_cq_enable)
+		dev_err_ratelimited(hdev->dev,
+			"received NIC %d QP error event %d but no CQ to push it\n",
+			gaudi_nic->port, event_type);
+
+	pi = NIC_RREG32(mmNIC0_QPC0_ERR_FIFO_PRODUCER_INDEX);
+	ci = gaudi_nic->qp_err_ci;
+
+	cqe_sw.is_err = true;
+	cqe_sw.port = gaudi_nic->port;
+
+	while (ci != pi) {
+		cqe_sw.type = QP_ERR_IS_REQ(qp_err_arr[ci]) ?
+				HL_NIC_CQE_TYPE_REQ : HL_NIC_CQE_TYPE_RES;
+		cqe_sw.qp_number = QP_ERR_QP_NUM(qp_err_arr[ci]);
+		cqe_sw.qp_err.syndrome = QP_ERR_ERR_NUM(qp_err_arr[ci]);
+
+		ci = (ci + 1) & (QP_ERR_BUF_LEN - 1);
+
+		dev_err_ratelimited(hdev->dev,
+			"NIC QP error port: %d, type: %d, qpn: %d, syndrome: %s (0x%x)\n",
+			cqe_sw.port, cqe_sw.type, cqe_sw.qp_number,
+			get_syndrome_text(cqe_sw.qp_err.syndrome),
+			cqe_sw.qp_err.syndrome);
+
+		if (gaudi->nic_cq_enable)
+			copy_cqe_to_main_queue(hdev, &cqe_sw);
+	}
+
+	gaudi_nic->qp_err_ci = ci;
+	NIC_WREG32(mmNIC0_QPC0_ERR_FIFO_CONSUMER_INDEX, ci);
+
+	/* signal the completion queue that there are available CQEs */
+	if (gaudi->nic_cq_enable)
+		complete(&gaudi->nic_cq_comp);
+
+	mutex_unlock(&gaudi->nic_qp_err_lock);
+}
-- 
2.17.1


