All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] habanalabs/gaudi: reset device upon BMC request
@ 2021-03-09 18:28 Oded Gabbay
  2021-03-09 18:28 ` [PATCH 2/2] habanalabs/gaudi: always use single-msi mode Oded Gabbay
  0 siblings, 1 reply; 2+ messages in thread
From: Oded Gabbay @ 2021-03-09 18:28 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ofir Bitton

From: Ofir Bitton <obitton@habana.ai>

In case the BMC of the devices' box wants to initiate a reset of
a specific device, it must go through driver.
Once driver will receive the request it will initiate a hard reset
flow.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
 drivers/misc/habanalabs/gaudi/gaudi.c                         | 4 ++++
 drivers/misc/habanalabs/include/gaudi/gaudi_async_events.h    | 1 +
 .../habanalabs/include/gaudi/gaudi_async_ids_map_extended.h   | 2 +-
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index 8be75f7c97f8..099c51350be6 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -7575,6 +7575,10 @@ static void gaudi_handle_eqe(struct hl_device *hdev,
 			event_type, cause);
 		break;
 
+	case GAUDI_EVENT_BMC_RESET_CMD:
+		gaudi_print_irq_info(hdev, event_type, false);
+		goto reset_device;
+
 	case GAUDI_EVENT_PKT_QUEUE_OUT_SYNC:
 		gaudi_print_irq_info(hdev, event_type, false);
 		gaudi_print_out_of_sync_info(hdev, &eq_entry->pkt_sync_err);
diff --git a/drivers/misc/habanalabs/include/gaudi/gaudi_async_events.h b/drivers/misc/habanalabs/include/gaudi/gaudi_async_events.h
index 0a0fa57024f8..25f835ba2cd6 100644
--- a/drivers/misc/habanalabs/include/gaudi/gaudi_async_events.h
+++ b/drivers/misc/habanalabs/include/gaudi/gaudi_async_events.h
@@ -303,6 +303,7 @@ enum gaudi_async_event_id {
 	GAUDI_EVENT_NIC3_QP1 = 619,
 	GAUDI_EVENT_NIC4_QP0 = 620,
 	GAUDI_EVENT_NIC4_QP1 = 621,
+	GAUDI_EVENT_BMC_RESET_CMD = 646,
 	GAUDI_EVENT_PKT_QUEUE_OUT_SYNC = 647,
 	GAUDI_EVENT_FIX_POWER_ENV_S = 658,
 	GAUDI_EVENT_FIX_POWER_ENV_E = 659,
diff --git a/drivers/misc/habanalabs/include/gaudi/gaudi_async_ids_map_extended.h b/drivers/misc/habanalabs/include/gaudi/gaudi_async_ids_map_extended.h
index 9106395eb920..079be1fb8f70 100644
--- a/drivers/misc/habanalabs/include/gaudi/gaudi_async_ids_map_extended.h
+++ b/drivers/misc/habanalabs/include/gaudi/gaudi_async_ids_map_extended.h
@@ -670,7 +670,7 @@ static struct gaudi_async_events_ids_map gaudi_irq_map_table[] = {
 	{ .fc_id = 643, .cpu_id = 492, .valid = 0, .name = "" },
 	{ .fc_id = 644, .cpu_id = 493, .valid = 0, .name = "" },
 	{ .fc_id = 645, .cpu_id = 494, .valid = 0, .name = "" },
-	{ .fc_id = 646, .cpu_id = 495, .valid = 0, .name = "" },
+	{ .fc_id = 646, .cpu_id = 495, .valid = 1, .name = "BMC_RST_CMD" },
 	{ .fc_id = 647, .cpu_id = 496, .valid = 0, .name = "" },
 	{ .fc_id = 648, .cpu_id = 497, .valid = 0, .name = "" },
 	{ .fc_id = 649, .cpu_id = 498, .valid = 0, .name = "" },
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* [PATCH 2/2] habanalabs/gaudi: always use single-msi mode
  2021-03-09 18:28 [PATCH 1/2] habanalabs/gaudi: reset device upon BMC request Oded Gabbay
@ 2021-03-09 18:28 ` Oded Gabbay
  0 siblings, 0 replies; 2+ messages in thread
From: Oded Gabbay @ 2021-03-09 18:28 UTC (permalink / raw)
  To: linux-kernel

The device can get into deadlock in case it use indirect mode for MSI
interrupts (multi-msi) and have hard-reset during interrupt storm.

To prevent that, always use direct mode which means single-msi mode.

The F/W will prevent the host from writing to the indirect MSI
registers to prevent any malicious user from causing this scenario.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
 drivers/misc/habanalabs/gaudi/gaudi.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index 099c51350be6..0bcee675e1db 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -1766,8 +1766,7 @@ static int gaudi_enable_msi(struct hl_device *hdev)
 	if (gaudi->hw_cap_initialized & HW_CAP_MSI)
 		return 0;
 
-	rc = pci_alloc_irq_vectors(hdev->pdev, 1, GAUDI_MSI_ENTRIES,
-					PCI_IRQ_MSI);
+	rc = pci_alloc_irq_vectors(hdev->pdev, 1, 1, PCI_IRQ_MSI);
 	if (rc < 0) {
 		dev_err(hdev->dev, "MSI: Failed to enable support %d\n", rc);
 		return rc;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2021-03-09 18:29 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-09 18:28 [PATCH 1/2] habanalabs/gaudi: reset device upon BMC request Oded Gabbay
2021-03-09 18:28 ` [PATCH 2/2] habanalabs/gaudi: always use single-msi mode Oded Gabbay

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.