* [PATCH] habanalabs: full FW hard reset support
@ 2020-12-08 15:39 Oded Gabbay
2020-12-08 15:39 ` [PATCH] habanalabs/gaudi: disable CGM at HW initialization Oded Gabbay
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Oded Gabbay @ 2020-12-08 15:39 UTC (permalink / raw)
To: linux-kernel; +Cc: SW_Drivers, Ofir Bitton
From: Ofir Bitton <obitton@habana.ai>
Driver must fetch FW hard reset capability at every FW boot stage:
preboot, CPU boot, CPU application.
If hard reset is triggered, driver will take into consideration
only the last capability received.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
drivers/misc/habanalabs/common/firmware_if.c | 51 +++++++++++++++-----
1 file changed, 38 insertions(+), 13 deletions(-)
diff --git a/drivers/misc/habanalabs/common/firmware_if.c b/drivers/misc/habanalabs/common/firmware_if.c
index c970bfc6db66..7720b48c6239 100644
--- a/drivers/misc/habanalabs/common/firmware_if.c
+++ b/drivers/misc/habanalabs/common/firmware_if.c
@@ -629,32 +629,34 @@ int hl_fw_read_preboot_status(struct hl_device *hdev, u32 cpu_boot_status_reg,
/* We read security status multiple times during boot:
* 1. preboot - a. Check whether the security status bits are valid
* b. Check whether fw security is enabled
- * c. Check whether hard reset is done by fw
- * 2. boot cpu - we get boot cpu security status
- * 3. FW application - we get FW application security status
+ * c. Check whether hard reset is done by preboot
+ * 2. boot cpu - a. Fetch boot cpu security status
+ * b. Check whether hard reset is done by boot cpu
+ * 3. FW application - a. Fetch fw application security status
+ * b. Check whether hard reset is done by fw app
*
* Preboot:
* Check security status bit (CPU_BOOT_DEV_STS0_ENABLED), if it is set
* check security enabled bit (CPU_BOOT_DEV_STS0_SECURITY_EN)
*/
if (security_status & CPU_BOOT_DEV_STS0_ENABLED) {
- hdev->asic_prop.fw_security_status_valid = 1;
+ prop->fw_security_status_valid = 1;
if (!(security_status & CPU_BOOT_DEV_STS0_SECURITY_EN))
prop->fw_security_disabled = true;
if (security_status & CPU_BOOT_DEV_STS0_FW_HARD_RST_EN)
- hdev->asic_prop.hard_reset_done_by_fw = true;
+ prop->hard_reset_done_by_fw = true;
} else {
- hdev->asic_prop.fw_security_status_valid = 0;
+ prop->fw_security_status_valid = 0;
prop->fw_security_disabled = true;
}
- dev_dbg(hdev->dev, "Firmware hard-reset is %s\n",
- hdev->asic_prop.hard_reset_done_by_fw ? "enabled" : "disabled");
+ dev_dbg(hdev->dev, "Firmware preboot hard-reset is %s\n",
+ prop->hard_reset_done_by_fw ? "enabled" : "disabled");
dev_info(hdev->dev, "firmware-level security is %s\n",
- prop->fw_security_disabled ? "disabled" : "enabled");
+ prop->fw_security_disabled ? "disabled" : "enabled");
return 0;
}
@@ -664,6 +666,7 @@ int hl_fw_init_cpu(struct hl_device *hdev, u32 cpu_boot_status_reg,
u32 cpu_security_boot_status_reg, u32 boot_err0_reg,
bool skip_bmc, u32 cpu_timeout, u32 boot_fit_timeout)
{
+ struct asic_fixed_properties *prop = &hdev->asic_prop;
u32 status;
int rc;
@@ -732,11 +735,22 @@ int hl_fw_init_cpu(struct hl_device *hdev, u32 cpu_boot_status_reg,
/* Read U-Boot version now in case we will later fail */
hdev->asic_funcs->read_device_fw_version(hdev, FW_COMP_UBOOT);
+ /* Clear reset status since we need to read it again from boot CPU */
+ prop->hard_reset_done_by_fw = false;
+
/* Read boot_cpu security bits */
- if (hdev->asic_prop.fw_security_status_valid)
- hdev->asic_prop.fw_boot_cpu_security_map =
+ if (prop->fw_security_status_valid) {
+ prop->fw_boot_cpu_security_map =
RREG32(cpu_security_boot_status_reg);
+ if (prop->fw_boot_cpu_security_map &
+ CPU_BOOT_DEV_STS0_FW_HARD_RST_EN)
+ prop->hard_reset_done_by_fw = true;
+ }
+
+ dev_dbg(hdev->dev, "Firmware boot CPU hard-reset is %s\n",
+ prop->hard_reset_done_by_fw ? "enabled" : "disabled");
+
if (rc) {
detect_cpu_boot_status(hdev, status);
rc = -EIO;
@@ -805,11 +819,22 @@ int hl_fw_init_cpu(struct hl_device *hdev, u32 cpu_boot_status_reg,
goto out;
}
+ /* Clear reset status since we need to read again from app */
+ prop->hard_reset_done_by_fw = false;
+
/* Read FW application security bits */
- if (hdev->asic_prop.fw_security_status_valid)
- hdev->asic_prop.fw_app_security_map =
+ if (prop->fw_security_status_valid) {
+ prop->fw_app_security_map =
RREG32(cpu_security_boot_status_reg);
+ if (prop->fw_app_security_map &
+ CPU_BOOT_DEV_STS0_FW_HARD_RST_EN)
+ prop->hard_reset_done_by_fw = true;
+ }
+
+ dev_dbg(hdev->dev, "Firmware application CPU hard-reset is %s\n",
+ prop->hard_reset_done_by_fw ? "enabled" : "disabled");
+
dev_info(hdev->dev, "Successfully loaded firmware to device\n");
out:
--
2.17.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH] habanalabs/gaudi: disable CGM at HW initialization
2020-12-08 15:39 [PATCH] habanalabs: full FW hard reset support Oded Gabbay
@ 2020-12-08 15:39 ` Oded Gabbay
2020-12-08 15:39 ` [PATCH] habanalabs/gaudi: enhance reset message Oded Gabbay
2020-12-08 15:39 ` [PATCH] habanalabs: update comment in hl_boot_if.h Oded Gabbay
2 siblings, 0 replies; 4+ messages in thread
From: Oded Gabbay @ 2020-12-08 15:39 UTC (permalink / raw)
To: linux-kernel; +Cc: SW_Drivers
In case the clock gating was enabled in preboot we need to disable it
at the H/W initialization stage before touching the MME/TPC registers.
Otherwise, the ASIC can get stuck. If the security is enabled in
the firmware level, the CGM is always disabled and the driver can't
enable it.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
drivers/misc/habanalabs/gaudi/gaudi.c | 14 +++++++++++---
.../misc/habanalabs/include/common/hl_boot_if.h | 5 +++++
2 files changed, 16 insertions(+), 3 deletions(-)
diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index 65895ba075fe..f316b898e8e0 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -2403,8 +2403,6 @@ static void gaudi_init_golden_registers(struct hl_device *hdev)
gaudi_init_e2e(hdev);
gaudi_init_hbm_cred(hdev);
- hdev->asic_funcs->disable_clock_gating(hdev);
-
for (tpc_id = 0, tpc_offset = 0;
tpc_id < TPC_NUMBER_OF_ENGINES;
tpc_id++, tpc_offset += TPC_CFG_OFFSET) {
@@ -3416,6 +3414,9 @@ static void gaudi_set_clock_gating(struct hl_device *hdev)
if (hdev->in_debug)
return;
+ if (!hdev->asic_prop.fw_security_disabled)
+ return;
+
for (i = GAUDI_PCI_DMA_1, qman_offset = 0 ; i < GAUDI_HBM_DMA_1 ; i++) {
enable = !!(hdev->clock_gating_mask &
(BIT_ULL(gaudi_dma_assignment[i])));
@@ -3467,7 +3468,7 @@ static void gaudi_disable_clock_gating(struct hl_device *hdev)
u32 qman_offset;
int i;
- if (!(gaudi->hw_cap_initialized & HW_CAP_CLK_GATE))
+ if (!hdev->asic_prop.fw_security_disabled)
return;
for (i = 0, qman_offset = 0 ; i < DMA_NUMBER_OF_CHANNELS ; i++) {
@@ -3801,6 +3802,13 @@ static int gaudi_hw_init(struct hl_device *hdev)
return rc;
}
+ /* In case the clock gating was enabled in preboot we need to disable
+ * it here before touching the MME/TPC registers.
+ * There is no need to take clk gating mutex because when this function
+ * runs, no other relevant code can run
+ */
+ hdev->asic_funcs->disable_clock_gating(hdev);
+
/* SRAM scrambler must be initialized after CPU is running from HBM */
gaudi_init_scrambler_sram(hdev);
diff --git a/drivers/misc/habanalabs/include/common/hl_boot_if.h b/drivers/misc/habanalabs/include/common/hl_boot_if.h
index 755c4800f002..7cb5f2d3e565 100644
--- a/drivers/misc/habanalabs/include/common/hl_boot_if.h
+++ b/drivers/misc/habanalabs/include/common/hl_boot_if.h
@@ -150,6 +150,10 @@
* CPU_BOOT_DEV_STS0_PLL_INFO_EN FW retrieval of PLL info is enabled.
* Initialized in: linux
*
+ * CPU_BOOT_DEV_STS0_CLK_GATE_EN Clock Gating enabled.
+ * FW initialized Clock Gating.
+ * Initialized in: preboot
+ *
* CPU_BOOT_DEV_STS0_ENABLED Device status register enabled.
* This is a main indication that the
* running FW populates the device status
@@ -171,6 +175,7 @@
#define CPU_BOOT_DEV_STS0_DRAM_SCR_EN (1 << 9)
#define CPU_BOOT_DEV_STS0_FW_HARD_RST_EN (1 << 10)
#define CPU_BOOT_DEV_STS0_PLL_INFO_EN (1 << 11)
+#define CPU_BOOT_DEV_STS0_CLK_GATE_EN (1 << 13)
#define CPU_BOOT_DEV_STS0_ENABLED (1 << 31)
enum cpu_boot_status {
--
2.17.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH] habanalabs/gaudi: enhance reset message
2020-12-08 15:39 [PATCH] habanalabs: full FW hard reset support Oded Gabbay
2020-12-08 15:39 ` [PATCH] habanalabs/gaudi: disable CGM at HW initialization Oded Gabbay
@ 2020-12-08 15:39 ` Oded Gabbay
2020-12-08 15:39 ` [PATCH] habanalabs: update comment in hl_boot_if.h Oded Gabbay
2 siblings, 0 replies; 4+ messages in thread
From: Oded Gabbay @ 2020-12-08 15:39 UTC (permalink / raw)
To: linux-kernel; +Cc: SW_Drivers
Print the initiator who performs the hard-reset for easier debugging.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
drivers/misc/habanalabs/gaudi/gaudi.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index f316b898e8e0..b0528883a995 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -3936,11 +3936,15 @@ static void gaudi_hw_fini(struct hl_device *hdev, bool hard_reset)
WREG32(mmPSOC_GLOBAL_CONF_SW_ALL_RST,
1 << PSOC_GLOBAL_CONF_SW_ALL_RST_IND_SHIFT);
- }
- dev_info(hdev->dev,
- "Issued HARD reset command, going to wait %dms\n",
- reset_timeout_ms);
+ dev_info(hdev->dev,
+ "Issued HARD reset command, going to wait %dms\n",
+ reset_timeout_ms);
+ } else {
+ dev_info(hdev->dev,
+ "Firmware performs HARD reset, going to wait %dms\n",
+ reset_timeout_ms);
+ }
/*
* After hard reset, we can't poll the BTM_FSM register because the PSOC
--
2.17.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH] habanalabs: update comment in hl_boot_if.h
2020-12-08 15:39 [PATCH] habanalabs: full FW hard reset support Oded Gabbay
2020-12-08 15:39 ` [PATCH] habanalabs/gaudi: disable CGM at HW initialization Oded Gabbay
2020-12-08 15:39 ` [PATCH] habanalabs/gaudi: enhance reset message Oded Gabbay
@ 2020-12-08 15:39 ` Oded Gabbay
2 siblings, 0 replies; 4+ messages in thread
From: Oded Gabbay @ 2020-12-08 15:39 UTC (permalink / raw)
To: linux-kernel; +Cc: SW_Drivers
Hard-reset flag is updated in many stages of the boot sequence of the
firmware.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
drivers/misc/habanalabs/include/common/hl_boot_if.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/misc/habanalabs/include/common/hl_boot_if.h b/drivers/misc/habanalabs/include/common/hl_boot_if.h
index 7cb5f2d3e565..b637dfd69f6e 100644
--- a/drivers/misc/habanalabs/include/common/hl_boot_if.h
+++ b/drivers/misc/habanalabs/include/common/hl_boot_if.h
@@ -145,7 +145,7 @@
* implemented. This means that FW will
* perform hard reset procedure on
* receiving the halt-machine event.
- * Initialized in: linux
+ * Initialized in: preboot, u-boot, linux
*
* CPU_BOOT_DEV_STS0_PLL_INFO_EN FW retrieval of PLL info is enabled.
* Initialized in: linux
--
2.17.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
end of thread, other threads:[~2020-12-08 15:40 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-08 15:39 [PATCH] habanalabs: full FW hard reset support Oded Gabbay
2020-12-08 15:39 ` [PATCH] habanalabs/gaudi: disable CGM at HW initialization Oded Gabbay
2020-12-08 15:39 ` [PATCH] habanalabs/gaudi: enhance reset message Oded Gabbay
2020-12-08 15:39 ` [PATCH] habanalabs: update comment in hl_boot_if.h Oded Gabbay
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).