linux-kernel.vger.kernel.org archive mirror
* [PATCH 1/4] habanalabs: correct an error message
@ 2020-09-24  7:02 Oded Gabbay
  2020-09-24  7:02 ` [PATCH 2/4] habanalabs: release kernel context after hw_fini Oded Gabbay
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Oded Gabbay @ 2020-09-24  7:02 UTC (permalink / raw)
  To: linux-kernel; +Cc: SW_Drivers

We don't try to allocate huge pages here, so remove the word "huge" from
the error message. Also print the total size of the failed allocation.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/common/memory.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
index 3324332811bc..84227819e4d1 100644
--- a/drivers/misc/habanalabs/common/memory.c
+++ b/drivers/misc/habanalabs/common/memory.c
@@ -77,8 +77,8 @@ static int alloc_device_memory(struct hl_ctx *ctx, struct hl_mem_in *args,
 		paddr = (u64) gen_pool_alloc(vm->dram_pg_pool, total_size);
 		if (!paddr) {
 			dev_err(hdev->dev,
-				"failed to allocate %llu huge contiguous pages\n",
-				num_pgs);
+				"failed to allocate %llu contiguous pages with total size of %llu\n",
+				num_pgs, total_size);
 			return -ENOMEM;
 		}
 	}
-- 
2.17.1



* [PATCH 2/4] habanalabs: release kernel context after hw_fini
  2020-09-24  7:02 [PATCH 1/4] habanalabs: correct an error message Oded Gabbay
@ 2020-09-24  7:02 ` Oded Gabbay
  2020-09-24  7:02 ` [PATCH 3/4] habanalabs: add debug messages for opening/closing context Oded Gabbay
  2020-09-24  7:02 ` [PATCH 4/4] habanalabs: add notice of device not idle Oded Gabbay
  2 siblings, 0 replies; 5+ messages in thread
From: Oded Gabbay @ 2020-09-24  7:02 UTC (permalink / raw)
  To: linux-kernel; +Cc: SW_Drivers

Some engines use resources that belong to the kernel context (e.g. MMU
mappings). If halting the engines doesn't work properly due to a H/W
restriction, we need to make sure the kernel context lives on until after
hw_fini. hw_fini resets the ASIC; after that, no engine is alive and we
can safely close the kernel context.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/common/device.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/misc/habanalabs/common/device.c b/drivers/misc/habanalabs/common/device.c
index 196e35d71118..20572224099a 100644
--- a/drivers/misc/habanalabs/common/device.c
+++ b/drivers/misc/habanalabs/common/device.c
@@ -967,14 +967,13 @@ int hl_device_reset(struct hl_device *hdev, bool hard_reset,
 		flush_workqueue(hdev->eq_wq);
 	}
 
-	/* Release kernel context */
-	if ((hard_reset) && (hl_ctx_put(hdev->kernel_ctx) == 1))
-		hdev->kernel_ctx = NULL;
-
 	/* Reset the H/W. It will be in idle state after this returns */
 	hdev->asic_funcs->hw_fini(hdev, hard_reset);
 
 	if (hard_reset) {
+		/* Release kernel context */
+		if (hl_ctx_put(hdev->kernel_ctx) == 1)
+			hdev->kernel_ctx = NULL;
 		hl_vm_fini(hdev);
 		hl_mmu_fini(hdev);
 		hl_eq_reset(hdev, &hdev->event_queue);
@@ -1465,13 +1464,13 @@ void hl_device_fini(struct hl_device *hdev)
 
 	hl_cb_pool_fini(hdev);
 
+	/* Reset the H/W. It will be in idle state after this returns */
+	hdev->asic_funcs->hw_fini(hdev, true);
+
 	/* Release kernel context */
 	if ((hdev->kernel_ctx) && (hl_ctx_put(hdev->kernel_ctx) != 1))
 		dev_err(hdev->dev, "kernel ctx is still alive\n");
 
-	/* Reset the H/W. It will be in idle state after this returns */
-	hdev->asic_funcs->hw_fini(hdev, true);
-
 	hl_vm_fini(hdev);
 
 	hl_mmu_fini(hdev);
-- 
2.17.1
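
For readers following the ordering argument in patch 2/4, here is a minimal
userspace sketch of why the reference drop moves after the reset. Every name
with a sketch_ prefix is hypothetical and only models the driver; the return
value convention (1 when the last reference is dropped) is assumed from the
"== 1" comparison in the diff and matches the kref_put() convention.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the real hl_ctx / hl_device. */
struct sketch_ctx {
	int refcount;     /* number of holders of this context */
	bool mmu_mapped;  /* true while the context's MMU mappings exist */
};

struct sketch_dev {
	struct sketch_ctx *kernel_ctx;
	bool engines_alive;       /* true until the ASIC has been reset */
	bool touched_after_free;  /* set if a live engine outlives the mappings */
};

/* Drop one reference. Returns 1 if it was the last one (assumed convention). */
static int sketch_ctx_put(struct sketch_dev *dev, struct sketch_ctx *ctx)
{
	assert(ctx->refcount > 0);
	if (--ctx->refcount > 0)
		return 0;

	/* Last reference: tear down the mappings the engines may still use. */
	ctx->mmu_mapped = false;
	if (dev->engines_alive)
		dev->touched_after_free = true;
	return 1;
}

/* Models hw_fini: after the reset no engine is alive. */
static void sketch_hw_fini(struct sketch_dev *dev)
{
	dev->engines_alive = false;
}

/* Ordering before the patch: release the context, then reset. */
static void sketch_reset_old(struct sketch_dev *dev)
{
	if (sketch_ctx_put(dev, dev->kernel_ctx) == 1)
		dev->kernel_ctx = NULL;
	sketch_hw_fini(dev);
}

/* Ordering after the patch: reset first, then release the context. */
static void sketch_reset_fixed(struct sketch_dev *dev)
{
	sketch_hw_fini(dev);
	if (sketch_ctx_put(dev, dev->kernel_ctx) == 1)
		dev->kernel_ctx = NULL;
}
```

Running both variants on a device whose engines failed to halt shows the
difference: only sketch_reset_old() sets touched_after_free, because the
mappings disappear while an engine is still alive.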



* [PATCH 3/4] habanalabs: add debug messages for opening/closing context
  2020-09-24  7:02 [PATCH 1/4] habanalabs: correct an error message Oded Gabbay
  2020-09-24  7:02 ` [PATCH 2/4] habanalabs: release kernel context after hw_fini Oded Gabbay
@ 2020-09-24  7:02 ` Oded Gabbay
  2020-09-24  7:02 ` [PATCH 4/4] habanalabs: add notice of device not idle Oded Gabbay
  2 siblings, 0 replies; 5+ messages in thread
From: Oded Gabbay @ 2020-09-24  7:02 UTC (permalink / raw)
  To: linux-kernel; +Cc: SW_Drivers

While debugging an error, we sometimes need to know whether the error
happened when a user context was open. Add debug prints when opening and
closing user contexts, and when closing the kernel context.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/common/context.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/misc/habanalabs/common/context.c b/drivers/misc/habanalabs/common/context.c
index df8171a2226c..bd03ef074eed 100644
--- a/drivers/misc/habanalabs/common/context.c
+++ b/drivers/misc/habanalabs/common/context.c
@@ -28,6 +28,8 @@ static void hl_ctx_fini(struct hl_ctx *ctx)
 	kfree(ctx->cs_pending);
 
 	if (ctx->asid != HL_KERNEL_ASID_ID) {
+		dev_dbg(hdev->dev, "closing user context %d\n", ctx->asid);
+
 		/* The engines are stopped as there is no executing CS, but the
 		 * Coresight might be still working by accessing addresses
 		 * related to the stopped engines. Hence stop it explicitly.
@@ -41,6 +43,7 @@ static void hl_ctx_fini(struct hl_ctx *ctx)
 		hl_vm_ctx_fini(ctx);
 		hl_asid_free(hdev, ctx->asid);
 	} else {
+		dev_dbg(hdev->dev, "closing kernel context\n");
 		hl_mmu_ctx_fini(ctx);
 	}
 }
@@ -168,6 +171,8 @@ int hl_ctx_init(struct hl_device *hdev, struct hl_ctx *ctx, bool is_kernel_ctx)
 			dev_err(hdev->dev, "ctx_init failed\n");
 			goto err_cb_va_pool_fini;
 		}
+
+		dev_dbg(hdev->dev, "create user context %d\n", ctx->asid);
 	}
 
 	return 0;
-- 
2.17.1



* [PATCH 4/4] habanalabs: add notice of device not idle
  2020-09-24  7:02 [PATCH 1/4] habanalabs: correct an error message Oded Gabbay
  2020-09-24  7:02 ` [PATCH 2/4] habanalabs: release kernel context after hw_fini Oded Gabbay
  2020-09-24  7:02 ` [PATCH 3/4] habanalabs: add debug messages for opening/closing context Oded Gabbay
@ 2020-09-24  7:02 ` Oded Gabbay
  2020-09-24  7:11   ` Tomer Tayar
  2 siblings, 1 reply; 5+ messages in thread
From: Oded Gabbay @ 2020-09-24  7:02 UTC (permalink / raw)
  To: linux-kernel; +Cc: SW_Drivers

The device should be idle after a context is closed. If not, print a
notice.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/common/context.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/misc/habanalabs/common/context.c b/drivers/misc/habanalabs/common/context.c
index bd03ef074eed..7a59dd7c6450 100644
--- a/drivers/misc/habanalabs/common/context.c
+++ b/drivers/misc/habanalabs/common/context.c
@@ -12,6 +12,7 @@
 static void hl_ctx_fini(struct hl_ctx *ctx)
 {
 	struct hl_device *hdev = ctx->hdev;
+	u64 idle_mask = 0;
 	int i;
 
 	/*
@@ -42,6 +43,13 @@ static void hl_ctx_fini(struct hl_ctx *ctx)
 		hl_cb_va_pool_fini(ctx);
 		hl_vm_ctx_fini(ctx);
 		hl_asid_free(hdev, ctx->asid);
+
+		if ((!hdev->pldm) && (hdev->pdev) &&
+				(!hdev->asic_funcs->is_device_idle(hdev,
+							&idle_mask, NULL)))
+			dev_notice(hdev->dev,
+				"device not idle after user context is closed (0x%llx)\n",
+				idle_mask);
 	} else {
 		dev_dbg(hdev->dev, "closing kernel context\n");
 		hl_mmu_ctx_fini(ctx);
-- 
2.17.1
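
As an illustration of the check added in patch 4/4, the following userspace
sketch models an idle query that fills a 64-bit mask and a caller that prints
a notice when the device is not idle. All sketch_ names are hypothetical. The
assumption that each set bit marks one busy engine is mine; the real
is_device_idle() callback is ASIC-specific. The sketch also omits the
hdev->pldm and hdev->pdev guards and the third argument (NULL in the diff).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_NUM_ENGINES 8

/*
 * Hypothetical idle query: returns true if no engine is busy. For every
 * busy engine i, bit i is OR-ed into *mask (assumed bit meaning).
 */
static bool sketch_is_device_idle(const bool busy[SKETCH_NUM_ENGINES],
				  uint64_t *mask)
{
	bool idle = true;

	for (int i = 0; i < SKETCH_NUM_ENGINES; i++) {
		if (!busy[i])
			continue;
		idle = false;
		if (mask)
			*mask |= UINT64_C(1) << i;
	}
	return idle;
}

/*
 * Models the check done when a user context is closed. Stores the mask in
 * *mask_out and returns true if a notice was printed.
 */
static bool sketch_check_idle_on_close(const bool busy[SKETCH_NUM_ENGINES],
				       uint64_t *mask_out)
{
	uint64_t idle_mask = 0; /* must start at 0, the query only ORs bits in */
	bool idle;

	assert(mask_out != NULL);
	idle = sketch_is_device_idle(busy, &idle_mask);
	*mask_out = idle_mask;
	if (idle)
		return false;

	fprintf(stderr,
		"device not idle after user context is closed (0x%llx)\n",
		(unsigned long long)idle_mask);
	return true;
}
```

The zero initialization mirrors "u64 idle_mask = 0;" in the diff: because the
query in this sketch only sets bits, an uninitialized mask would make the
printed value meaningless.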



* RE: [PATCH 4/4] habanalabs: add notice of device not idle
  2020-09-24  7:02 ` [PATCH 4/4] habanalabs: add notice of device not idle Oded Gabbay
@ 2020-09-24  7:11   ` Tomer Tayar
  0 siblings, 0 replies; 5+ messages in thread
From: Tomer Tayar @ 2020-09-24  7:11 UTC (permalink / raw)
  To: Oded Gabbay, linux-kernel; +Cc: SW_Drivers

On Thu, Sep 24, 2020 at 10:03 AM Oded Gabbay <oded.gabbay@gmail.com> wrote:
> The device should be idle after a context is closed. If not, print a
> notice.
> 
> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>

This patch-set is:
Reviewed-by: Tomer Tayar <ttayar@habana.ai>

