[PATCH 01/10] habanalabs: update device status sysfs documentation

* [PATCH 01/10] habanalabs: update device status sysfs documentation
@ 2023-01-19 10:33 Oded Gabbay
  2023-01-19 10:33 ` [PATCH 02/10] habanalabs/gaudi2: print page fault axi transaction id Oded Gabbay
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: Oded Gabbay @ 2023-01-19 10:33 UTC
  To: linux-kernel; +Cc: Ofir Bitton

From: Ofir Bitton <obitton@habana.ai>

As device status was changed recently, we must update the
documentation as well.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
 Documentation/ABI/testing/sysfs-driver-habanalabs | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-habanalabs b/Documentation/ABI/testing/sysfs-driver-habanalabs
index 13b5b2ec3be7..df2ca1a401b5 100644
--- a/Documentation/ABI/testing/sysfs-driver-habanalabs
+++ b/Documentation/ABI/testing/sysfs-driver-habanalabs
@@ -201,7 +201,18 @@ What:           /sys/class/habanalabs/hl<n>/status
 Date:           Jan 2019
 KernelVersion:  5.1
 Contact:        ogabbay@kernel.org
-Description:    Status of the card: "Operational", "Malfunction", "In reset".
+Description:    Status of the card:
+                "operational" - Device is available for work.
+                "in reset" - Device is going through reset, will be available
+                        shortly.
+                "disabled" - Device is not usable.
+                "needs reset" - Device is not usable until a hard reset will
+                        be initiated.
+                "in device creation" - Device is not available yet, as it is
+                        still initializing.
+                "in reset after device release" - Device is going through
+                        a compute-reset which is executed after a device release
+                        (relevant for Gaudi2 only).
 
 What:           /sys/class/habanalabs/hl<n>/thermal_ver
 Date:           Jan 2019
-- 
2.25.1



* [PATCH 02/10] habanalabs/gaudi2: print page fault axi transaction id
  2023-01-19 10:33 [PATCH 01/10] habanalabs: update device status sysfs documentation Oded Gabbay
@ 2023-01-19 10:33 ` Oded Gabbay
  2023-01-19 10:33 ` [PATCH 03/10] habanalabs: block soft-reset on an unusable device Oded Gabbay
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Oded Gabbay @ 2023-01-19 10:33 UTC
  To: linux-kernel; +Cc: Dani Liberman

From: Dani Liberman <dliberman@habana.ai>

The AXI transaction id holds information about the initiator that caused
the page fault. In the future the driver will translate it automatically
to an initiator name.
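
The id is read as two 32-bit register values and printed as one 64-bit
number. A minimal standalone sketch of that composition follows;
axid_combine() is a hypothetical helper name used only for illustration
and does not exist in the driver:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the driver): build the 64-bit AXI
 * transaction id from the MSB and LSB register values. The cast has to
 * come before the shift, otherwise the shift is evaluated in 32 bits.
 */
static uint64_t axid_combine(uint32_t axid_h, uint32_t axid_l)
{
	return ((uint64_t)axid_h << 32) | axid_l;
}
```

Because the low half occupies only bits 0..31, OR and addition give the
same result here.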

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
 drivers/accel/habanalabs/gaudi2/gaudi2.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2.c b/drivers/accel/habanalabs/gaudi2/gaudi2.c
index 8c0cbd3b4a0c..72e08c1eae22 100644
--- a/drivers/accel/habanalabs/gaudi2/gaudi2.c
+++ b/drivers/accel/habanalabs/gaudi2/gaudi2.c
@@ -8287,7 +8287,7 @@ static int gaudi2_handle_hif_fatal(struct hl_device *hdev, u16 event_type, u64 i
 static void gaudi2_handle_page_error(struct hl_device *hdev, u64 mmu_base, bool is_pmmu,
 					u64 *event_mask)
 {
-	u32 valid, val;
+	u32 valid, val, axid_l, axid_h;
 	u64 addr;
 
 	valid = RREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_ACCESS_PAGE_ERROR_VALID));
@@ -8300,8 +8300,11 @@ static void gaudi2_handle_page_error(struct hl_device *hdev, u64 mmu_base, bool
 	addr <<= 32;
 	addr |= RREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_PAGE_ERROR_CAPTURE_VA));
 
-	dev_err_ratelimited(hdev->dev, "%s page fault on va 0x%llx\n",
-				is_pmmu ? "PMMU" : "HMMU", addr);
+	axid_l = RREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_PAGE_FAULT_ID_LSB));
+	axid_h = RREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_PAGE_FAULT_ID_MSB));
+
+	dev_err_ratelimited(hdev->dev, "%s page fault on va 0x%llx, transaction id 0x%llX\n",
+				is_pmmu ? "PMMU" : "HMMU", addr, ((u64)axid_h << 32) + axid_l);
 	hl_handle_page_fault(hdev, addr, 0, is_pmmu, event_mask);
 
 	WREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_PAGE_ERROR_CAPTURE), 0);
-- 
2.25.1



* [PATCH 03/10] habanalabs: block soft-reset on an unusable device
  2023-01-19 10:33 [PATCH 01/10] habanalabs: update device status sysfs documentation Oded Gabbay
  2023-01-19 10:33 ` [PATCH 02/10] habanalabs/gaudi2: print page fault axi transaction id Oded Gabbay
@ 2023-01-19 10:33 ` Oded Gabbay
  2023-01-19 10:33 ` [PATCH 04/10] habanalabs/gaudi2: fix edma range registers razwi handling Oded Gabbay
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Oded Gabbay @ 2023-01-19 10:33 UTC
  To: linux-kernel; +Cc: Koby Elbaz

From: Koby Elbaz <kelbaz@habana.ai>

A device in the malfunction status cannot be used. In that state we do
not support certain reset types, e.g., any kind of soft-reset (compute
reset, inference soft-reset) and reset upon device release.

A hard-reset is the only way an unusable device can change its status.
No other reset type may start a reset flow on such a device, because
that flow might ultimately, and unintentionally, make the device
operational again.

Such a scenario occurred recently: a user requested a hard-reset while
another heavy user workload was ongoing, so the reset request was
queued. Since the workload couldn't finish within the reset timeout,
the reset failed and set the device status to malfunction.
Eventually, when the user released the FD, an unsuccessful soft-reset
occurred, followed by an additional hard-reset that changed the ASIC's
status back to operational.
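
The rule described above can be sketched in isolation as follows. The
enum and the function name are hypothetical and reduced for
illustration; the driver has more status values and more reset flags:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced status set (assumption, not the driver's enum). */
enum dev_status { DEV_OPERATIONAL, DEV_MALFUNCTION };

/*
 * Returns true when a reset request may proceed. Anything other than a
 * hard-reset is dropped on a malfunctioning device, so only a hard-reset
 * can move the device out of that state.
 */
static bool reset_allowed(enum dev_status status, bool hard_reset)
{
	if (!hard_reset && status == DEV_MALFUNCTION)
		return false;

	return true;
}
```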

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
 drivers/accel/habanalabs/common/device.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/accel/habanalabs/common/device.c b/drivers/accel/habanalabs/common/device.c
index 2b6971463f12..9a9c494b08a4 100644
--- a/drivers/accel/habanalabs/common/device.c
+++ b/drivers/accel/habanalabs/common/device.c
@@ -1425,8 +1425,8 @@ static void handle_reset_trigger(struct hl_device *hdev, u32 flags)
 int hl_device_reset(struct hl_device *hdev, u32 flags)
 {
 	bool hard_reset, from_hard_reset_thread, fw_reset, hard_instead_soft = false,
-			reset_upon_device_release = false, schedule_hard_reset = false, delay_reset,
-			from_dev_release, from_watchdog_thread;
+			reset_upon_device_release = false, schedule_hard_reset = false,
+			delay_reset, from_dev_release, from_watchdog_thread;
 	u64 idle_mask[HL_BUSY_ENGINES_MASK_EXT_SIZE] = {0};
 	struct hl_ctx *ctx;
 	int i, rc;
@@ -1443,12 +1443,17 @@ int hl_device_reset(struct hl_device *hdev, u32 flags)
 	delay_reset = !!(flags & HL_DRV_RESET_DELAY);
 	from_watchdog_thread = !!(flags & HL_DRV_RESET_FROM_WD_THR);
 
+	if (!hard_reset && (hl_device_status(hdev) == HL_DEVICE_STATUS_MALFUNCTION)) {
+		dev_dbg(hdev->dev, "soft-reset isn't supported on a malfunctioning device\n");
+		return 0;
+	}
+
 	if (!hard_reset && !hdev->asic_prop.supports_compute_reset) {
 		hard_instead_soft = true;
 		hard_reset = true;
 	}
 
-	if (hdev->reset_upon_device_release && (flags & HL_DRV_RESET_DEV_RELEASE)) {
+	if (hdev->reset_upon_device_release && from_dev_release) {
 		if (hard_reset) {
 			dev_crit(hdev->dev,
 				"Aborting reset because hard-reset is mutually exclusive with reset-on-device-release\n");
-- 
2.25.1



* [PATCH 04/10] habanalabs/gaudi2: fix edma range registers razwi handling
  2023-01-19 10:33 [PATCH 01/10] habanalabs: update device status sysfs documentation Oded Gabbay
  2023-01-19 10:33 ` [PATCH 02/10] habanalabs/gaudi2: print page fault axi transaction id Oded Gabbay
  2023-01-19 10:33 ` [PATCH 03/10] habanalabs: block soft-reset on an unusable device Oded Gabbay
@ 2023-01-19 10:33 ` Oded Gabbay
  2023-01-19 10:33 ` [PATCH 05/10] habanalabs: refactor user interrupt type Oded Gabbay
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Oded Gabbay @ 2023-01-19 10:33 UTC
  To: linux-kernel; +Cc: Dani Liberman

From: Dani Liberman <dliberman@habana.ai>

Handling an edma razwi differs from all other engines because edma
uses sft routers. For hbw transactions the sft router contains a
separate interface for each edma, while for lbw there is one common
interface for both edma engines of the same dcore.

To handle the razwi correctly we need to:
1. Simplify the calculation of the sft router address.
2. Add razwi handling for edma qm errors, since the edma qman doesn't
   report an axi error response.
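
The lbw part of the address calculation can be sketched as follows. The
base address and stride below are made-up illustrative values, and the
names are hypothetical; the real ones come from the ASIC register
headers:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only (assumptions), not real register addresses. */
#define EDMA_PER_DCORE		2u
#define SFT0_LBW_IF_BASE	0x1000000ull
#define SFT_DCORE_STRIDE	0x10000ull

/*
 * Both edma engines of a dcore share one lbw interface, so the lbw base
 * depends only on the dcore the engine belongs to.
 */
static uint64_t edma_lbw_if_base(uint32_t module_idx)
{
	uint32_t dcore_id = module_idx / EDMA_PER_DCORE;

	return SFT0_LBW_IF_BASE + (uint64_t)dcore_id * SFT_DCORE_STRIDE;
}
```

The hbw side has no such shared interface, which is why a per-engine
lookup table suits it better than arithmetic.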

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
 drivers/accel/habanalabs/gaudi2/gaudi2.c | 69 ++++++++++++------------
 1 file changed, 33 insertions(+), 36 deletions(-)

diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2.c b/drivers/accel/habanalabs/gaudi2/gaudi2.c
index 72e08c1eae22..80cd4413b87d 100644
--- a/drivers/accel/habanalabs/gaudi2/gaudi2.c
+++ b/drivers/accel/habanalabs/gaudi2/gaudi2.c
@@ -1604,13 +1604,15 @@ static const u32 gaudi2_nic_initiator_lbw_rtr_id[NIC_NUMBER_OF_MACROS] = {
 	DCORE2_RTR0, DCORE2_RTR0, DCORE2_RTR0, DCORE3_RTR7, DCORE3_RTR7, DCORE3_RTR7
 };
 
-struct sft_info {
-	u8 interface_id;
-	u8 dcore_id;
-};
-
-static const struct sft_info gaudi2_edma_initiator_sft_id[NUM_OF_EDMA_PER_DCORE * NUM_OF_DCORES] = {
-	{0, 0},	{1, 0}, {0, 1}, {1, 1}, {1, 2}, {1, 3},	{0, 2},	{0, 3},
+static const u32 gaudi2_edma_initiator_hbw_sft[NUM_OF_EDMA_PER_DCORE * NUM_OF_DCORES] = {
+	mmSFT0_HBW_RTR_IF1_MSTR_IF_RR_SHRD_HBW_BASE,
+	mmSFT0_HBW_RTR_IF0_MSTR_IF_RR_SHRD_HBW_BASE,
+	mmSFT1_HBW_RTR_IF1_MSTR_IF_RR_SHRD_HBW_BASE,
+	mmSFT1_HBW_RTR_IF0_MSTR_IF_RR_SHRD_HBW_BASE,
+	mmSFT2_HBW_RTR_IF0_MSTR_IF_RR_SHRD_HBW_BASE,
+	mmSFT2_HBW_RTR_IF1_MSTR_IF_RR_SHRD_HBW_BASE,
+	mmSFT3_HBW_RTR_IF0_MSTR_IF_RR_SHRD_HBW_BASE,
+	mmSFT3_HBW_RTR_IF1_MSTR_IF_RR_SHRD_HBW_BASE
 };
 
 static const u32 gaudi2_pdma_initiator_hbw_rtr_id[NUM_OF_PDMA] = {
@@ -7212,7 +7214,7 @@ static void gaudi2_ack_module_razwi_event_handler(struct hl_device *hdev,
 				u8 module_sub_idx, u64 *event_mask)
 {
 	bool via_sft = false;
-	u32 hbw_rtr_id, lbw_rtr_id, dcore_id, dcore_rtr_id, sft_id, eng_id;
+	u32 hbw_rtr_id, lbw_rtr_id, dcore_id, dcore_rtr_id, eng_id;
 	u64 hbw_rtr_mstr_if_base_addr, lbw_rtr_mstr_if_base_addr;
 	u32 hbw_shrd_aw = 0, hbw_shrd_ar = 0;
 	u32 lbw_shrd_aw = 0, lbw_shrd_ar = 0;
@@ -7268,8 +7270,13 @@ static void gaudi2_ack_module_razwi_event_handler(struct hl_device *hdev,
 		lbw_rtr_id = hbw_rtr_id;
 		break;
 	case RAZWI_EDMA:
-		sft_id = gaudi2_edma_initiator_sft_id[module_idx].interface_id;
-		dcore_id = gaudi2_edma_initiator_sft_id[module_idx].dcore_id;
+		hbw_rtr_mstr_if_base_addr = gaudi2_edma_initiator_hbw_sft[module_idx];
+		dcore_id = module_idx / NUM_OF_EDMA_PER_DCORE;
+		/* SFT has separate MSTR_IF for LBW, only there we can
+		 * read the LBW razwi related registers
+		 */
+		lbw_rtr_mstr_if_base_addr = mmSFT0_LBW_RTR_IF_MSTR_IF_RR_SHRD_HBW_BASE +
+								dcore_id * SFT_DCORE_OFFSET;
 		via_sft = true;
 		sprintf(initiator_name, "EDMA_%u", module_idx);
 		break;
@@ -7298,13 +7305,7 @@ static void gaudi2_ack_module_razwi_event_handler(struct hl_device *hdev,
 	}
 
 	/* Find router mstr_if register base */
-	if (via_sft) {
-		hbw_rtr_mstr_if_base_addr = mmSFT0_HBW_RTR_IF0_RTR_CTRL_BASE +
-				dcore_id * SFT_DCORE_OFFSET +
-				sft_id * SFT_IF_OFFSET +
-				RTR_MSTR_IF_OFFSET;
-		lbw_rtr_mstr_if_base_addr = hbw_rtr_mstr_if_base_addr;
-	} else {
+	if (!via_sft) {
 		dcore_id = hbw_rtr_id / NUM_OF_RTR_PER_DCORE;
 		dcore_rtr_id = hbw_rtr_id % NUM_OF_RTR_PER_DCORE;
 		hbw_rtr_mstr_if_base_addr = mmDCORE0_RTR0_CTRL_BASE +
@@ -7318,22 +7319,8 @@ static void gaudi2_ack_module_razwi_event_handler(struct hl_device *hdev,
 	/* Find out event cause by reading "RAZWI_HAPPENED" registers */
 	hbw_shrd_aw = RREG32(hbw_rtr_mstr_if_base_addr + RR_SHRD_HBW_AW_RAZWI_HAPPENED);
 	hbw_shrd_ar = RREG32(hbw_rtr_mstr_if_base_addr + RR_SHRD_HBW_AR_RAZWI_HAPPENED);
-
-	if (via_sft) {
-		/* SFT has separate MSTR_IF for LBW, only there we can
-		 * read the LBW razwi related registers
-		 */
-		u64 base;
-
-		base = mmSFT0_HBW_RTR_IF0_RTR_CTRL_BASE + dcore_id * SFT_DCORE_OFFSET +
-				RTR_LBW_MSTR_IF_OFFSET;
-
-		lbw_shrd_aw = RREG32(base + RR_SHRD_LBW_AW_RAZWI_HAPPENED);
-		lbw_shrd_ar = RREG32(base + RR_SHRD_LBW_AR_RAZWI_HAPPENED);
-	} else {
-		lbw_shrd_aw = RREG32(lbw_rtr_mstr_if_base_addr + RR_SHRD_LBW_AW_RAZWI_HAPPENED);
-		lbw_shrd_ar = RREG32(lbw_rtr_mstr_if_base_addr + RR_SHRD_LBW_AR_RAZWI_HAPPENED);
-	}
+	lbw_shrd_aw = RREG32(lbw_rtr_mstr_if_base_addr + RR_SHRD_LBW_AW_RAZWI_HAPPENED);
+	lbw_shrd_ar = RREG32(lbw_rtr_mstr_if_base_addr + RR_SHRD_LBW_AR_RAZWI_HAPPENED);
 
 	eng_id = gaudi2_razwi_calc_engine_id(hdev, module, module_idx);
 	if (hbw_shrd_aw) {
@@ -7855,7 +7842,7 @@ static int gaudi2_handle_qm_sei_err(struct hl_device *hdev, u16 event_type,
 	return error_count;
 }
 
-static int gaudi2_handle_qman_err(struct hl_device *hdev, u16 event_type)
+static int gaudi2_handle_qman_err(struct hl_device *hdev, u16 event_type, u64 *event_mask)
 {
 	u32 qid_base, error_count = 0;
 	u64 qman_base;
@@ -7903,34 +7890,42 @@ static int gaudi2_handle_qman_err(struct hl_device *hdev, u16 event_type)
 		qman_base = mmDCORE3_MME_QM_BASE;
 		break;
 	case GAUDI2_EVENT_HDMA0_QM:
+		index = 0;
 		qid_base = GAUDI2_QUEUE_ID_DCORE0_EDMA_0_0;
 		qman_base = mmDCORE0_EDMA0_QM_BASE;
 		break;
 	case GAUDI2_EVENT_HDMA1_QM:
+		index = 1;
 		qid_base = GAUDI2_QUEUE_ID_DCORE0_EDMA_1_0;
 		qman_base = mmDCORE0_EDMA1_QM_BASE;
 		break;
 	case GAUDI2_EVENT_HDMA2_QM:
+		index = 2;
 		qid_base = GAUDI2_QUEUE_ID_DCORE1_EDMA_0_0;
 		qman_base = mmDCORE1_EDMA0_QM_BASE;
 		break;
 	case GAUDI2_EVENT_HDMA3_QM:
+		index = 3;
 		qid_base = GAUDI2_QUEUE_ID_DCORE1_EDMA_1_0;
 		qman_base = mmDCORE1_EDMA1_QM_BASE;
 		break;
 	case GAUDI2_EVENT_HDMA4_QM:
+		index = 4;
 		qid_base = GAUDI2_QUEUE_ID_DCORE2_EDMA_0_0;
 		qman_base = mmDCORE2_EDMA0_QM_BASE;
 		break;
 	case GAUDI2_EVENT_HDMA5_QM:
+		index = 5;
 		qid_base = GAUDI2_QUEUE_ID_DCORE2_EDMA_1_0;
 		qman_base = mmDCORE2_EDMA1_QM_BASE;
 		break;
 	case GAUDI2_EVENT_HDMA6_QM:
+		index = 6;
 		qid_base = GAUDI2_QUEUE_ID_DCORE3_EDMA_0_0;
 		qman_base = mmDCORE3_EDMA0_QM_BASE;
 		break;
 	case GAUDI2_EVENT_HDMA7_QM:
+		index = 7;
 		qid_base = GAUDI2_QUEUE_ID_DCORE3_EDMA_1_0;
 		qman_base = mmDCORE3_EDMA1_QM_BASE;
 		break;
@@ -7957,8 +7952,10 @@ static int gaudi2_handle_qman_err(struct hl_device *hdev, u16 event_type)
 	error_count = gaudi2_handle_qman_err_generic(hdev, event_type, qman_base, qid_base);
 
 	/* Handle EDMA QM SEI here because there is no AXI error response event for EDMA */
-	if (event_type >= GAUDI2_EVENT_HDMA2_QM && event_type <= GAUDI2_EVENT_HDMA5_QM)
+	if (event_type >= GAUDI2_EVENT_HDMA2_QM && event_type <= GAUDI2_EVENT_HDMA5_QM) {
 		error_count += _gaudi2_handle_qm_sei_err(hdev, qman_base, event_type);
+		gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_EDMA, index, 0, event_mask);
+	}
 
 	return error_count;
 }
@@ -8868,7 +8865,7 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent
 	case GAUDI2_EVENT_ROTATOR0_ROT0_QM ... GAUDI2_EVENT_ROTATOR1_ROT1_QM:
 		fallthrough;
 	case GAUDI2_EVENT_NIC0_QM0 ... GAUDI2_EVENT_NIC11_QM1:
-		error_count = gaudi2_handle_qman_err(hdev, event_type);
+		error_count = gaudi2_handle_qman_err(hdev, event_type, &event_mask);
 		event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 		break;
 
-- 
2.25.1



* [PATCH 05/10] habanalabs: refactor user interrupt type
  2023-01-19 10:33 [PATCH 01/10] habanalabs: update device status sysfs documentation Oded Gabbay
                   ` (2 preceding siblings ...)
  2023-01-19 10:33 ` [PATCH 04/10] habanalabs/gaudi2: fix edma range registers razwi handling Oded Gabbay
@ 2023-01-19 10:33 ` Oded Gabbay
  2023-01-19 10:33 ` [PATCH 06/10] habanalabs: optimize command submission completion timestamp Oded Gabbay
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Oded Gabbay @ 2023-01-19 10:33 UTC
  To: linux-kernel; +Cc: Ofir Bitton

From: Ofir Bitton <obitton@habana.ai>

In order to support more user interrupt types in the future, we
enumerate the user interrupt type instead of using a boolean.
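
A reduced sketch of why an enum scales better than a boolean here. All
names are hypothetical, and the third enumerator exists only to show
where a future type would plug in:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names; INTR_FUTURE is a placeholder for a later type. */
enum intr_type { INTR_CQ = 0, INTR_DECODER, INTR_FUTURE };

/*
 * Pick the common handler by type. With a bool there are only two
 * branches; with an enum a new type is one more case label.
 */
static const char *intr_common_handler(enum intr_type type)
{
	switch (type) {
	case INTR_CQ:
		return "common_user_cq";
	case INTR_DECODER:
		return "common_decoder";
	default:
		return "unhandled";
	}
}
```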

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
 drivers/accel/habanalabs/common/habanalabs.h | 21 ++++++++++++--------
 drivers/accel/habanalabs/common/irq.c        | 19 +++++++++++++-----
 drivers/accel/habanalabs/gaudi2/gaudi2.c     |  9 +++++----
 3 files changed, 32 insertions(+), 17 deletions(-)

diff --git a/drivers/accel/habanalabs/common/habanalabs.h b/drivers/accel/habanalabs/common/habanalabs.h
index 0b7fe4afd92d..a0dfbf4f6cbb 100644
--- a/drivers/accel/habanalabs/common/habanalabs.h
+++ b/drivers/accel/habanalabs/common/habanalabs.h
@@ -1083,20 +1083,25 @@ struct hl_cq {
 	atomic_t		free_slots_cnt;
 };
 
+enum hl_user_interrupt_type {
+	HL_USR_INTERRUPT_CQ = 0,
+	HL_USR_INTERRUPT_DECODER,
+};
+
 /**
  * struct hl_user_interrupt - holds user interrupt information
  * @hdev: pointer to the device structure
+ * @type: user interrupt type
  * @wait_list_head: head to the list of user threads pending on this interrupt
  * @wait_list_lock: protects wait_list_head
  * @interrupt_id: msix interrupt id
- * @is_decoder: whether this entry represents a decoder interrupt
  */
 struct hl_user_interrupt {
-	struct hl_device	*hdev;
-	struct list_head	wait_list_head;
-	spinlock_t		wait_list_lock;
-	u32			interrupt_id;
-	bool			is_decoder;
+	struct hl_device		*hdev;
+	enum hl_user_interrupt_type	type;
+	struct list_head		wait_list_head;
+	spinlock_t			wait_list_lock;
+	u32				interrupt_id;
 };
 
 /**
@@ -2691,11 +2696,11 @@ void hl_wreg(struct hl_device *hdev, u32 reg, u32 val);
 	p->size = sz; \
 })
 
-#define HL_USR_INTR_STRUCT_INIT(usr_intr, hdev, intr_id, decoder) \
+#define HL_USR_INTR_STRUCT_INIT(usr_intr, hdev, intr_id, intr_type) \
 ({ \
 	usr_intr.hdev = hdev; \
 	usr_intr.interrupt_id = intr_id; \
-	usr_intr.is_decoder = decoder; \
+	usr_intr.type = intr_type; \
 	INIT_LIST_HEAD(&usr_intr.wait_list_head); \
 	spin_lock_init(&usr_intr.wait_list_lock); \
 })
diff --git a/drivers/accel/habanalabs/common/irq.c b/drivers/accel/habanalabs/common/irq.c
index 8bbcc223df91..a986d7dea453 100644
--- a/drivers/accel/habanalabs/common/irq.c
+++ b/drivers/accel/habanalabs/common/irq.c
@@ -333,13 +333,22 @@ irqreturn_t hl_irq_handler_user_interrupt(int irq, void *arg)
 	struct hl_user_interrupt *user_int = arg;
 	struct hl_device *hdev = user_int->hdev;
 
-	if (user_int->is_decoder)
-		handle_user_interrupt(hdev, &hdev->common_decoder_interrupt);
-	else
+	switch (user_int->type) {
+	case HL_USR_INTERRUPT_CQ:
 		handle_user_interrupt(hdev, &hdev->common_user_cq_interrupt);
 
-	/* Handle user cq or decoder interrupts registered on this specific irq */
-	handle_user_interrupt(hdev, user_int);
+		/* Handle user cq interrupt registered on this specific irq */
+		handle_user_interrupt(hdev, user_int);
+		break;
+	case HL_USR_INTERRUPT_DECODER:
+		handle_user_interrupt(hdev, &hdev->common_decoder_interrupt);
+
+		/* Handle decoder interrupt registered on this specific irq */
+		handle_user_interrupt(hdev, user_int);
+		break;
+	default:
+		break;
+	}
 
 	return IRQ_HANDLED;
 }
diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2.c b/drivers/accel/habanalabs/gaudi2/gaudi2.c
index 80cd4413b87d..65c720a0c64c 100644
--- a/drivers/accel/habanalabs/gaudi2/gaudi2.c
+++ b/drivers/accel/habanalabs/gaudi2/gaudi2.c
@@ -2966,11 +2966,11 @@ static void gaudi2_user_interrupt_setup(struct hl_device *hdev)
 
 	/* Initialize common user CQ interrupt */
 	HL_USR_INTR_STRUCT_INIT(hdev->common_user_cq_interrupt, hdev,
-				HL_COMMON_USER_CQ_INTERRUPT_ID, false);
+				HL_COMMON_USER_CQ_INTERRUPT_ID, HL_USR_INTERRUPT_CQ);
 
 	/* Initialize common decoder interrupt */
 	HL_USR_INTR_STRUCT_INIT(hdev->common_decoder_interrupt, hdev,
-				HL_COMMON_DEC_INTERRUPT_ID, true);
+				HL_COMMON_DEC_INTERRUPT_ID, HL_USR_INTERRUPT_DECODER);
 
 	/* User interrupts structure holds both decoder and user interrupts from various engines.
 	 * We first initialize the decoder interrupts and then we add the user interrupts.
@@ -2983,10 +2983,11 @@ static void gaudi2_user_interrupt_setup(struct hl_device *hdev)
 	 */
 	for (i = GAUDI2_IRQ_NUM_DCORE0_DEC0_NRM, j = 0 ; i <= GAUDI2_IRQ_NUM_SHARED_DEC1_NRM;
 										i += 2, j++)
-		HL_USR_INTR_STRUCT_INIT(hdev->user_interrupt[j], hdev, i, true);
+		HL_USR_INTR_STRUCT_INIT(hdev->user_interrupt[j], hdev, i,
+						HL_USR_INTERRUPT_DECODER);
 
 	for (i = GAUDI2_IRQ_NUM_USER_FIRST, k = 0 ; k < prop->user_interrupt_count; i++, j++, k++)
-		HL_USR_INTR_STRUCT_INIT(hdev->user_interrupt[j], hdev, i, false);
+		HL_USR_INTR_STRUCT_INIT(hdev->user_interrupt[j], hdev, i, HL_USR_INTERRUPT_CQ);
 }
 
 static inline int gaudi2_get_non_zero_random_int(void)
-- 
2.25.1



* [PATCH 06/10] habanalabs: optimize command submission completion timestamp
  2023-01-19 10:33 [PATCH 01/10] habanalabs: update device status sysfs documentation Oded Gabbay
                   ` (3 preceding siblings ...)
  2023-01-19 10:33 ` [PATCH 05/10] habanalabs: refactor user interrupt type Oded Gabbay
@ 2023-01-19 10:33 ` Oded Gabbay
  2023-01-19 10:33 ` [PATCH 07/10] habanalabs: enhance info printed on FW load errors Oded Gabbay
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Oded Gabbay @ 2023-01-19 10:33 UTC
  To: linux-kernel; +Cc: Ofir Bitton

From: Ofir Bitton <obitton@habana.ai>

The completion timestamp is currently taken during the actual command
submission release. As the release happens in a work queue, that
timestamp is not accurate. Hence, take the timestamp in the interrupt
handler itself and propagate it to the release function.
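
The effect can be illustrated with a simulated clock. This is a sketch
under stated assumptions, not driver code: fence_time() and struct job
are hypothetical, and wq_delay stands for the time the work item waits
before it runs:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-job state. */
struct job {
	uint64_t timestamp;
};

/*
 * Returns the timestamp that ends up on the fence.
 * use_irq_stamp != 0: value captured in the interrupt handler.
 * use_irq_stamp == 0: clock read at release time, after the delay.
 */
static uint64_t fence_time(uint64_t irq_time, uint64_t wq_delay, int use_irq_stamp)
{
	uint64_t clock = irq_time;		/* simulated monotonic clock */
	struct job job = { .timestamp = clock };/* stored by the handler */

	clock += wq_delay;			/* work item runs later */

	return use_irq_stamp ? job.timestamp : clock;
}
```

With the stored value the reported time no longer depends on how long
the work queue takes to schedule the release.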

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
 .../accel/habanalabs/common/command_submission.c    | 12 ++++++++++--
 drivers/accel/habanalabs/common/habanalabs.h        |  4 ++++
 drivers/accel/habanalabs/common/irq.c               | 13 +++++++++----
 3 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/drivers/accel/habanalabs/common/command_submission.c b/drivers/accel/habanalabs/common/command_submission.c
index 00fedf2d8654..8270db0a72a2 100644
--- a/drivers/accel/habanalabs/common/command_submission.c
+++ b/drivers/accel/habanalabs/common/command_submission.c
@@ -398,8 +398,16 @@ static void hl_complete_job(struct hl_device *hdev, struct hl_cs_job *job)
 	 * flow by calling 'hl_hw_queue_update_ci'.
 	 */
 	if (cs_needs_completion(cs) &&
-		(job->queue_type == QUEUE_TYPE_EXT || job->queue_type == QUEUE_TYPE_HW))
+			(job->queue_type == QUEUE_TYPE_EXT || job->queue_type == QUEUE_TYPE_HW)) {
+
+		/* In CS based completions, the timestamp is already available,
+		 * so no need to extract it from job
+		 */
+		if (hdev->asic_prop.completion_mode == HL_COMPLETION_MODE_JOB)
+			cs->completion_timestamp = job->timestamp;
+
 		cs_put(cs);
+	}
 
 	hl_cs_job_put(job);
 }
@@ -776,7 +784,7 @@ static void cs_do_release(struct kref *ref)
 	}
 
 	if (cs->timestamp) {
-		cs->fence->timestamp = ktime_get();
+		cs->fence->timestamp = cs->completion_timestamp;
 		hl_push_cs_outcome(hdev, &cs->ctx->outcome_store, cs->sequence,
 				   cs->fence->timestamp, cs->fence->error);
 	}
diff --git a/drivers/accel/habanalabs/common/habanalabs.h b/drivers/accel/habanalabs/common/habanalabs.h
index a0dfbf4f6cbb..afc0c0d3f9e3 100644
--- a/drivers/accel/habanalabs/common/habanalabs.h
+++ b/drivers/accel/habanalabs/common/habanalabs.h
@@ -1940,6 +1940,7 @@ struct hl_userptr {
  * @type: CS_TYPE_*.
  * @jobs_cnt: counter of submitted jobs on all queues.
  * @encaps_sig_hdl_id: encaps signals handle id, set for the first staged cs.
+ * @completion_timestamp: timestamp of the last completed cs job.
  * @sob_addr_offset: sob offset from the configuration base address.
  * @initial_sob_count: count of completed signals in SOB before current submission of signal or
  *                     cs with encaps signals.
@@ -1972,6 +1973,7 @@ struct hl_cs {
 	struct list_head	staged_cs_node;
 	struct list_head	debugfs_list;
 	struct hl_cs_encaps_sig_handle *encaps_sig_hdl;
+	ktime_t			completion_timestamp;
 	u64			sequence;
 	u64			staged_sequence;
 	u64			timeout_jiffies;
@@ -2007,6 +2009,7 @@ struct hl_cs {
  * @debugfs_list: node in debugfs list of command submission jobs.
  * @refcount: reference counter for usage of the CS job.
  * @queue_type: the type of the H/W queue this job is submitted to.
+ * @timestamp: timestamp upon job completion
  * @id: the id of this job inside a CS.
  * @hw_queue_id: the id of the H/W queue this job is submitted to.
  * @user_cb_size: the actual size of the CB we got from the user.
@@ -2033,6 +2036,7 @@ struct hl_cs_job {
 	struct list_head	debugfs_list;
 	struct kref		refcount;
 	enum hl_queue_type	queue_type;
+	ktime_t			timestamp;
 	u32			id;
 	u32			hw_queue_id;
 	u32			user_cb_size;
diff --git a/drivers/accel/habanalabs/common/irq.c b/drivers/accel/habanalabs/common/irq.c
index a986d7dea453..04844e843a7b 100644
--- a/drivers/accel/habanalabs/common/irq.c
+++ b/drivers/accel/habanalabs/common/irq.c
@@ -72,15 +72,17 @@ static void irq_handle_eqe(struct work_struct *work)
  * @hdev: pointer to device structure
  * @cs_seq: command submission sequence
  * @cq: completion queue
+ * @timestamp: interrupt timestamp
  *
  */
-static void job_finish(struct hl_device *hdev, u32 cs_seq, struct hl_cq *cq)
+static void job_finish(struct hl_device *hdev, u32 cs_seq, struct hl_cq *cq, ktime_t timestamp)
 {
 	struct hl_hw_queue *queue;
 	struct hl_cs_job *job;
 
 	queue = &hdev->kernel_queues[cq->hw_queue_id];
 	job = queue->shadow_queue[hl_pi_2_offset(cs_seq)];
+	job->timestamp = timestamp;
 	queue_work(hdev->cq_wq[cq->cq_idx], &job->finish_work);
 
 	atomic_inc(&queue->ci);
@@ -91,9 +93,10 @@ static void job_finish(struct hl_device *hdev, u32 cs_seq, struct hl_cq *cq)
  *
  * @hdev: pointer to device structure
  * @cs_seq: command submission sequence
+ * @timestamp: interrupt timestamp
  *
  */
-static void cs_finish(struct hl_device *hdev, u16 cs_seq)
+static void cs_finish(struct hl_device *hdev, u16 cs_seq, ktime_t timestamp)
 {
 	struct asic_fixed_properties *prop = &hdev->asic_prop;
 	struct hl_hw_queue *queue;
@@ -113,6 +116,7 @@ static void cs_finish(struct hl_device *hdev, u16 cs_seq)
 		atomic_inc(&queue->ci);
 	}
 
+	cs->completion_timestamp = timestamp;
 	queue_work(hdev->cs_cmplt_wq, &cs->finish_work);
 }
 
@@ -130,6 +134,7 @@ irqreturn_t hl_irq_handler_cq(int irq, void *arg)
 	bool shadow_index_valid, entry_ready;
 	u16 shadow_index;
 	struct hl_cq_entry *cq_entry, *cq_base;
+	ktime_t timestamp = ktime_get();
 
 	if (hdev->disabled) {
 		dev_dbg(hdev->dev,
@@ -171,9 +176,9 @@ irqreturn_t hl_irq_handler_cq(int irq, void *arg)
 		if (shadow_index_valid && !hdev->disabled) {
 			if (hdev->asic_prop.completion_mode ==
 					HL_COMPLETION_MODE_CS)
-				cs_finish(hdev, shadow_index);
+				cs_finish(hdev, shadow_index, timestamp);
 			else
-				job_finish(hdev, shadow_index, cq);
+				job_finish(hdev, shadow_index, cq, timestamp);
 		}
 
 		/* Clear CQ entry ready bit */
-- 
2.25.1



* [PATCH 07/10] habanalabs: enhance info printed on FW load errors
  2023-01-19 10:33 [PATCH 01/10] habanalabs: update device status sysfs documentation Oded Gabbay
                   ` (4 preceding siblings ...)
  2023-01-19 10:33 ` [PATCH 06/10] habanalabs: optimize command submission completion timestamp Oded Gabbay
@ 2023-01-19 10:33 ` Oded Gabbay
  2023-01-19 10:33 ` [PATCH 08/10] habanalabs: run error handling if scrub_device_mem fails after reset Oded Gabbay
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Oded Gabbay @ 2023-01-19 10:33 UTC
  To: linux-kernel; +Cc: Moti Haimovski

From: Moti Haimovski <mhaimovski@habana.ai>

This commit enhances the following error messages to also report the
type of error that occurred, in order to ease debugging of errors
detected during firmware load.

Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
 drivers/accel/habanalabs/common/firmware_if.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/accel/habanalabs/common/firmware_if.c b/drivers/accel/habanalabs/common/firmware_if.c
index ef228087ef55..da892d8fb3d6 100644
--- a/drivers/accel/habanalabs/common/firmware_if.c
+++ b/drivers/accel/habanalabs/common/firmware_if.c
@@ -335,7 +335,7 @@ int hl_fw_send_cpu_message(struct hl_device *hdev, u32 hw_queue_id, u32 *msg,
 			dev_dbg(hdev->dev, "Device CPU packet timeout (0x%x) due to FW reset\n",
 					tmp);
 		else
-			dev_err(hdev->dev, "Device CPU packet timeout (0x%x)\n", tmp);
+			dev_err(hdev->dev, "Device CPU packet timeout (status = 0x%x)\n", tmp);
 		hdev->device_cpu_disabled = true;
 		goto out;
 	}
@@ -1346,8 +1346,7 @@ static void detect_cpu_boot_status(struct hl_device *hdev, u32 status)
 		break;
 	default:
 		dev_err(hdev->dev,
-			"Device boot progress - Invalid status code %d\n",
-			status);
+			"Device boot progress - Invalid or unexpected status code %d\n", status);
 		break;
 	}
 }
@@ -1377,8 +1376,8 @@ int hl_fw_wait_preboot_ready(struct hl_device *hdev)
 		pre_fw_load->wait_for_preboot_timeout);
 
 	if (rc) {
-		dev_err(hdev->dev, "CPU boot ready status timeout\n");
 		detect_cpu_boot_status(hdev, status);
+		dev_err(hdev->dev, "CPU boot ready timeout (status = %d)\n", status);
 
 		/* If we read all FF, then something is totally wrong, no point
 		 * of reading specific errors
@@ -2427,7 +2426,7 @@ static int hl_fw_dynamic_wait_for_boot_fit_active(struct hl_device *hdev,
 		hdev->fw_poll_interval_usec,
 		dyn_loader->wait_for_bl_timeout);
 	if (rc) {
-		dev_err(hdev->dev, "failed to wait for boot\n");
+		dev_err(hdev->dev, "failed to wait for boot (status = %d)\n", status);
 		return rc;
 	}
 
@@ -2454,7 +2453,7 @@ static int hl_fw_dynamic_wait_for_linux_active(struct hl_device *hdev,
 		hdev->fw_poll_interval_usec,
 		fw_loader->cpu_timeout);
 	if (rc) {
-		dev_err(hdev->dev, "failed to wait for Linux\n");
+		dev_err(hdev->dev, "failed to wait for Linux (status = %d)\n", status);
 		return rc;
 	}
 
@@ -2793,7 +2792,7 @@ static int hl_fw_static_init_cpu(struct hl_device *hdev,
 
 	if (rc) {
 		dev_dbg(hdev->dev,
-			"No boot fit request received, resuming boot\n");
+			"No boot fit request received (status = %d), resuming boot\n", status);
 	} else {
 		rc = hdev->asic_funcs->load_boot_fit_to_device(hdev);
 		if (rc)
@@ -2816,7 +2815,7 @@ static int hl_fw_static_init_cpu(struct hl_device *hdev,
 
 		if (rc) {
 			dev_err(hdev->dev,
-				"Timeout waiting for boot fit load ack\n");
+				"Timeout waiting for boot fit load ack (status = %d)\n", status);
 			goto out;
 		}
 
@@ -2894,7 +2893,7 @@ static int hl_fw_static_init_cpu(struct hl_device *hdev,
 
 		if (rc) {
 			dev_err(hdev->dev,
-				"Failed to get ACK on skipping BMC, %d\n",
+				"Failed to get ACK on skipping BMC (status = %d)\n",
 				status);
 			WREG32(msg_to_cpu_reg, KMD_MSG_NA);
 			rc = -EIO;
@@ -2921,7 +2920,7 @@ static int hl_fw_static_init_cpu(struct hl_device *hdev,
 				"Device reports FIT image is corrupted\n");
 		else
 			dev_err(hdev->dev,
-				"Failed to load firmware to device, %d\n",
+				"Failed to load firmware to device (status = %d)\n",
 				status);
 
 		rc = -EIO;
-- 
2.25.1
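
The hunks above add the last-read status value to each timeout message.
As a minimal user-space sketch of that idea (not the driver's code), the
poll helper below always hands the last value it read back to the caller,
so the error message can include it. All fake_* names, the counting
register and FAKE_ETIMEDOUT are hypothetical stand-ins; the driver's real
polling helper is not shown in this diff.

```c
#include <stdint.h>
#include <stdio.h>

#define FAKE_ETIMEDOUT 110 /* hypothetical; mirrors the usual Linux ETIMEDOUT value */

typedef uint32_t (*fake_read_fn)(void *ctx);

/*
 * Poll until read_reg() returns `expected` or max_polls reads were made.
 * *status always holds the last value read, on success and on timeout.
 */
static int fake_poll_status(fake_read_fn read_reg, void *ctx, uint32_t expected,
			    unsigned int max_polls, uint32_t *status)
{
	*status = 0;
	for (unsigned int i = 0; i < max_polls; i++) {
		*status = read_reg(ctx);
		if (*status == expected)
			return 0;
	}
	return -FAKE_ETIMEDOUT;
}

/* Fake status register that counts up by one on every read. */
static uint32_t fake_counting_reg(void *ctx)
{
	uint32_t *value = ctx;

	return (*value)++;
}

/* Caller that reports the last status on timeout, as the patched messages do. */
static int fake_wait_for_boot(uint32_t *reg, uint32_t expected, unsigned int max_polls)
{
	uint32_t status;
	int rc = fake_poll_status(fake_counting_reg, reg, expected, max_polls, &status);

	if (rc)
		fprintf(stderr, "failed to wait for boot (status = %u)\n",
			(unsigned int)status);
	return rc;
}
```

Returning the status through an out parameter keeps the return code free
for the error value, which matches how rc and status are used separately
in the diff.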



* [PATCH 08/10] habanalabs: run error handling if scrub_device_mem fails after reset
  2023-01-19 10:33 [PATCH 01/10] habanalabs: update device status sysfs documentation Oded Gabbay
                   ` (5 preceding siblings ...)
  2023-01-19 10:33 ` [PATCH 07/10] habanalabs: enhance info printed on FW load errors Oded Gabbay
@ 2023-01-19 10:33 ` Oded Gabbay
  2023-01-19 10:33 ` [PATCH 09/10] habanalabs: clear in_compute_reset when escalating to hard reset Oded Gabbay
  2023-01-19 10:33 ` [PATCH 10/10] habanalabs/gaudi2: unsecure tpc kernel_config registers Oded Gabbay
  8 siblings, 0 replies; 10+ messages in thread
From: Oded Gabbay @ 2023-01-19 10:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tomer Tayar

From: Tomer Tayar <ttayar@habana.ai>

If device memory scrubbing in hl_device_reset() fails, we return an
error code but do not run the error handling code. Jump to the error
handling path instead.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
 drivers/accel/habanalabs/common/device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/accel/habanalabs/common/device.c b/drivers/accel/habanalabs/common/device.c
index 9a9c494b08a4..edeec35fd9c6 100644
--- a/drivers/accel/habanalabs/common/device.c
+++ b/drivers/accel/habanalabs/common/device.c
@@ -1738,7 +1738,7 @@ int hl_device_reset(struct hl_device *hdev, u32 flags)
 	rc = hdev->asic_funcs->scrub_device_mem(hdev);
 	if (rc) {
 		dev_err(hdev->dev, "scrub mem failed from device reset (%d)\n", rc);
-		return rc;
+		goto out_err;
 	}
 
 	spin_lock(&hdev->reset_info.lock);
-- 
2.25.1
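
The change above replaces an early return with a jump to the function's
shared error label. A minimal user-space sketch of that pattern follows.
The struct, the fake_* names and the cleanup done under out_err are
assumptions for illustration; the real out_err path in hl_device_reset()
is not shown in this diff.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver's device state; not struct hl_device. */
struct fake_dev {
	bool in_reset;    /* set while a reset is in progress */
	bool disabled;    /* set when the device is marked unusable */
	int scrub_result; /* value the fake scrub step returns (0 = success) */
};

/* Hypothetical scrub step; returns 0 or a negative errno-style code. */
static int fake_scrub_mem(struct fake_dev *dev)
{
	return dev->scrub_result;
}

/*
 * Reset routine with one shared error path. Every failure after in_reset
 * is set jumps to out_err, so the cleanup always runs.
 */
static int fake_device_reset(struct fake_dev *dev)
{
	int rc;

	dev->in_reset = true;

	rc = fake_scrub_mem(dev);
	if (rc) {
		fprintf(stderr, "scrub mem failed from device reset (%d)\n", rc);
		goto out_err; /* a bare "return rc" here would skip the cleanup */
	}

	dev->in_reset = false;
	return 0;

out_err:
	/* Assumed cleanup: mark the device unusable and leave the reset state. */
	dev->disabled = true;
	dev->in_reset = false;
	return rc;
}
```

With a bare return in the failure branch, in_reset would stay set and the
device would never be marked disabled in this sketch, which is the kind of
stale state the patch avoids.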



* [PATCH 09/10] habanalabs: clear in_compute_reset when escalating to hard reset
  2023-01-19 10:33 [PATCH 01/10] habanalabs: update device status sysfs documentation Oded Gabbay
                   ` (6 preceding siblings ...)
  2023-01-19 10:33 ` [PATCH 08/10] habanalabs: run error handling if scrub_device_mem fails after reset Oded Gabbay
@ 2023-01-19 10:33 ` Oded Gabbay
  2023-01-19 10:33 ` [PATCH 10/10] habanalabs/gaudi2: unsecure tpc kernel_config registers Oded Gabbay
  8 siblings, 0 replies; 10+ messages in thread
From: Oded Gabbay @ 2023-01-19 10:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tomer Tayar

From: Tomer Tayar <ttayar@habana.ai>

If the device is reset upon release while the release watchdog work is
scheduled, the compute reset is replaced with a hard reset.
In this case, we need to clear the in_compute_reset indication in the
device reset information structure.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
 drivers/accel/habanalabs/common/device.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/accel/habanalabs/common/device.c b/drivers/accel/habanalabs/common/device.c
index edeec35fd9c6..9933e5858a36 100644
--- a/drivers/accel/habanalabs/common/device.c
+++ b/drivers/accel/habanalabs/common/device.c
@@ -1514,6 +1514,7 @@ int hl_device_reset(struct hl_device *hdev, u32 flags)
 						&hdev->device_release_watchdog_work.reset_work);
 
 			if (from_dev_release) {
+				hdev->reset_info.in_compute_reset = 0;
 				flags |= HL_DRV_RESET_HARD;
 				flags &= ~HL_DRV_RESET_DEV_RELEASE;
 				hard_reset = true;
-- 
2.25.1
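
The escalation in the hunk above can be sketched in isolation as a small
function that rewrites the reset flags and clears the compute-reset
indication in the same step. The FAKE_* flag values and the struct are
hypothetical; the real HL_DRV_RESET_* definitions and the locking around
reset_info are not shown in this diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; the real HL_DRV_RESET_* values are not shown in the patch. */
#define FAKE_RESET_HARD        (1u << 0)
#define FAKE_RESET_DEV_RELEASE (1u << 1)

/* Hypothetical stand-in for the device reset information structure. */
struct fake_reset_info {
	bool in_compute_reset;
};

/*
 * If the reset was requested from device release, escalate it to a hard
 * reset and clear the compute-reset indication so it does not go stale.
 * Returns the updated flags; other requests pass through unchanged.
 */
static uint32_t fake_escalate_to_hard(struct fake_reset_info *info, uint32_t flags)
{
	if (flags & FAKE_RESET_DEV_RELEASE) {
		info->in_compute_reset = false;
		flags |= FAKE_RESET_HARD;
		flags &= ~FAKE_RESET_DEV_RELEASE;
	}
	return flags;
}
```

Clearing the indication where the flags are rewritten keeps the two pieces
of state consistent: once the request is a hard reset, nothing should
still report a compute reset in progress.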



* [PATCH 10/10] habanalabs/gaudi2: unsecure tpc kernel_config registers
  2023-01-19 10:33 [PATCH 01/10] habanalabs: update device status sysfs documentation Oded Gabbay
                   ` (7 preceding siblings ...)
  2023-01-19 10:33 ` [PATCH 09/10] habanalabs: clear in_compute_reset when escalating to hard reset Oded Gabbay
@ 2023-01-19 10:33 ` Oded Gabbay
  8 siblings, 0 replies; 10+ messages in thread
From: Oded Gabbay @ 2023-01-19 10:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ofir Bitton

From: Ofir Bitton <obitton@habana.ai>

This is required in order to allow the kernel to control relevant
configuration space via load and store instructions.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
 drivers/accel/habanalabs/gaudi2/gaudi2_security.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/accel/habanalabs/gaudi2/gaudi2_security.c b/drivers/accel/habanalabs/gaudi2/gaudi2_security.c
index b2b528788e39..a212f82e6604 100644
--- a/drivers/accel/habanalabs/gaudi2/gaudi2_security.c
+++ b/drivers/accel/habanalabs/gaudi2/gaudi2_security.c
@@ -1561,6 +1561,7 @@ static const u32 gaudi2_pb_dcr0_tpc0_unsecured_regs[] = {
 	mmDCORE0_TPC0_CFG_LUT_FUNC128_BASE_ADDR_HI,
 	mmDCORE0_TPC0_CFG_LUT_FUNC256_BASE_ADDR_LO,
 	mmDCORE0_TPC0_CFG_LUT_FUNC256_BASE_ADDR_HI,
+	mmDCORE0_TPC0_CFG_KERNEL_KERNEL_CONFIG,
 	mmDCORE0_TPC0_CFG_KERNEL_SRF_0,
 	mmDCORE0_TPC0_CFG_KERNEL_SRF_1,
 	mmDCORE0_TPC0_CFG_KERNEL_SRF_2,
-- 
2.25.1



Thread overview: 10+ messages
2023-01-19 10:33 [PATCH 01/10] habanalabs: update device status sysfs documentation Oded Gabbay
2023-01-19 10:33 ` [PATCH 02/10] habanalabs/gaudi2: print page fault axi transaction id Oded Gabbay
2023-01-19 10:33 ` [PATCH 03/10] habanalabs: block soft-reset on an unusable device Oded Gabbay
2023-01-19 10:33 ` [PATCH 04/10] habanalabs/gaudi2: fix emda range registers razwi handling Oded Gabbay
2023-01-19 10:33 ` [PATCH 05/10] habanalabs: refactor user interrupt type Oded Gabbay
2023-01-19 10:33 ` [PATCH 06/10] habanalabs: optimize command submission completion timestamp Oded Gabbay
2023-01-19 10:33 ` [PATCH 07/10] habanalabs: enhance info printed on FW load errors Oded Gabbay
2023-01-19 10:33 ` [PATCH 08/10] habanalabs: run error handling if scrub_device_mem fails after reset Oded Gabbay
2023-01-19 10:33 ` [PATCH 09/10] habanalabs: clear in_compute_reset when escalating to hard reset Oded Gabbay
2023-01-19 10:33 ` [PATCH 10/10] habanalabs/gaudi2: unsecure tpc kernel_config registers Oded Gabbay
