linux-kernel.vger.kernel.org archive mirror
* [PATCH 1/9] habanalabs: Increase queues depth
@ 2020-07-05 13:12 Oded Gabbay
  2020-07-05 13:12 ` [PATCH 2/9] habanalabs: rephrase error messages Oded Gabbay
                   ` (7 more replies)
  0 siblings, 8 replies; 11+ messages in thread
From: Oded Gabbay @ 2020-07-05 13:12 UTC
  To: linux-kernel, SW_Drivers; +Cc: Ofir Bitton

From: Ofir Bitton <obitton@habana.ai>

After the recent increase in the number of concurrent command
submissions (CS), the queue depths must also be increased, since much
more work can now be in flight concurrently.
All external queue depths are increased to 4096, and Gaudi's
internal queue depths are increased to 1024.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/gaudi/gaudiP.h |  6 ++---
 drivers/misc/habanalabs/habanalabs.h   | 31 ++++----------------------
 drivers/misc/habanalabs/hw_queue.c     |  2 --
 drivers/misc/habanalabs/irq.c          |  4 ----
 4 files changed, 7 insertions(+), 36 deletions(-)

diff --git a/drivers/misc/habanalabs/gaudi/gaudiP.h b/drivers/misc/habanalabs/gaudi/gaudiP.h
index 3958fe38c8ee..bdc5f96085a7 100644
--- a/drivers/misc/habanalabs/gaudi/gaudiP.h
+++ b/drivers/misc/habanalabs/gaudi/gaudiP.h
@@ -123,14 +123,14 @@
 
 /* Internal QMANs PQ sizes */
 
-#define MME_QMAN_LENGTH			64
+#define MME_QMAN_LENGTH			1024
 #define MME_QMAN_SIZE_IN_BYTES		(MME_QMAN_LENGTH * QMAN_PQ_ENTRY_SIZE)
 
-#define HBM_DMA_QMAN_LENGTH		64
+#define HBM_DMA_QMAN_LENGTH		1024
 #define HBM_DMA_QMAN_SIZE_IN_BYTES	\
 				(HBM_DMA_QMAN_LENGTH * QMAN_PQ_ENTRY_SIZE)
 
-#define TPC_QMAN_LENGTH			64
+#define TPC_QMAN_LENGTH			1024
 #define TPC_QMAN_SIZE_IN_BYTES		(TPC_QMAN_LENGTH * QMAN_PQ_ENTRY_SIZE)
 
 #define SRAM_USER_BASE_OFFSET  GAUDI_DRIVER_SRAM_RESERVED_SIZE_FROM_START
diff --git a/drivers/misc/habanalabs/habanalabs.h b/drivers/misc/habanalabs/habanalabs.h
index 4e68a41cce77..e4d6f7c91194 100644
--- a/drivers/misc/habanalabs/habanalabs.h
+++ b/drivers/misc/habanalabs/habanalabs.h
@@ -378,38 +378,15 @@ struct hl_cb {
 
 struct hl_cs_job;
 
-/*
- * Currently, there are two limitations on the maximum length of a queue:
- *
- * 1. The memory footprint of the queue. The current allocated space for the
- *    queue is PAGE_SIZE. Because each entry in the queue is HL_BD_SIZE,
- *    the maximum length of the queue can be PAGE_SIZE / HL_BD_SIZE,
- *    which currently is 4096/16 = 256 entries.
- *
- *    To increase that, we need either to decrease the size of the
- *    BD (difficult), or allocate more than a single page (easier).
- *
- * 2. Because the size of the JOB handle field in the BD CTL / completion queue
- *    is 10-bit, we can have up to 1024 open jobs per hardware queue.
- *    Therefore, each queue can hold up to 1024 entries.
- *
- * HL_QUEUE_LENGTH is in units of struct hl_bd.
- * HL_QUEUE_LENGTH * sizeof(struct hl_bd) should be <= HL_PAGE_SIZE
- */
-
-#define HL_PAGE_SIZE			4096 /* minimum page size */
-/* Must be power of 2 (HL_PAGE_SIZE / HL_BD_SIZE) */
-#define HL_QUEUE_LENGTH			256
+/* Queue length of external and HW queues */
+#define HL_QUEUE_LENGTH			4096
 #define HL_QUEUE_SIZE_IN_BYTES		(HL_QUEUE_LENGTH * HL_BD_SIZE)
 
-/*
- * HL_CQ_LENGTH is in units of struct hl_cq_entry.
- * HL_CQ_LENGTH should be <= HL_PAGE_SIZE
- */
+/* HL_CQ_LENGTH is in units of struct hl_cq_entry */
 #define HL_CQ_LENGTH			HL_QUEUE_LENGTH
 #define HL_CQ_SIZE_IN_BYTES		(HL_CQ_LENGTH * HL_CQ_ENTRY_SIZE)
 
-/* Must be power of 2 (HL_PAGE_SIZE / HL_EQ_ENTRY_SIZE) */
+/* Must be power of 2 */
 #define HL_EQ_LENGTH			64
 #define HL_EQ_SIZE_IN_BYTES		(HL_EQ_LENGTH * HL_EQ_ENTRY_SIZE)
 
diff --git a/drivers/misc/habanalabs/hw_queue.c b/drivers/misc/habanalabs/hw_queue.c
index 27f0c34b63b9..f5a10a5ac300 100644
--- a/drivers/misc/habanalabs/hw_queue.c
+++ b/drivers/misc/habanalabs/hw_queue.c
@@ -780,8 +780,6 @@ static int queue_init(struct hl_device *hdev, struct hl_hw_queue *q,
 {
 	int rc;
 
-	BUILD_BUG_ON(HL_QUEUE_SIZE_IN_BYTES > HL_PAGE_SIZE);
-
 	q->hw_queue_id = hw_queue_id;
 
 	switch (q->queue_type) {
diff --git a/drivers/misc/habanalabs/irq.c b/drivers/misc/habanalabs/irq.c
index 6981d67153b1..7a4878edb1a3 100644
--- a/drivers/misc/habanalabs/irq.c
+++ b/drivers/misc/habanalabs/irq.c
@@ -220,8 +220,6 @@ int hl_cq_init(struct hl_device *hdev, struct hl_cq *q, u32 hw_queue_id)
 {
 	void *p;
 
-	BUILD_BUG_ON(HL_CQ_SIZE_IN_BYTES > HL_PAGE_SIZE);
-
 	p = hdev->asic_funcs->asic_dma_alloc_coherent(hdev, HL_CQ_SIZE_IN_BYTES,
 				&q->bus_address, GFP_KERNEL | __GFP_ZERO);
 	if (!p)
@@ -282,8 +280,6 @@ int hl_eq_init(struct hl_device *hdev, struct hl_eq *q)
 {
 	void *p;
 
-	BUILD_BUG_ON(HL_EQ_SIZE_IN_BYTES > HL_PAGE_SIZE);
-
 	p = hdev->asic_funcs->cpu_accessible_dma_pool_alloc(hdev,
 							HL_EQ_SIZE_IN_BYTES,
 							&q->bus_address);
-- 
2.17.1
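
A note on the arithmetic behind this change, as a minimal stand-alone
sketch rather than driver code. It assumes HL_BD_SIZE is 16 bytes, as
stated in the comment the patch removes. The EXAMPLE_* macros and the
helper functions are hypothetical names used only for illustration.
The sketch shows why the old BUILD_BUG_ON() checks against
HL_PAGE_SIZE could not be kept once the depth grows to 4096.

```c
#include <stddef.h>

/* Hypothetical mirrors of the driver macros; values taken from the patch. */
#define EXAMPLE_PAGE_SIZE	4096u	/* old HL_PAGE_SIZE */
#define EXAMPLE_BD_SIZE		16u	/* HL_BD_SIZE, per the removed comment */
#define EXAMPLE_OLD_QUEUE_LEN	256u	/* old HL_QUEUE_LENGTH */
#define EXAMPLE_NEW_QUEUE_LEN	4096u	/* new HL_QUEUE_LENGTH */

/* Bytes needed for a queue holding 'len' buffer descriptors. */
static size_t example_queue_bytes(size_t len)
{
	return len * EXAMPLE_BD_SIZE;
}

/* Number of pages needed to hold 'bytes', rounded up. */
static size_t example_pages_needed(size_t bytes)
{
	return (bytes + EXAMPLE_PAGE_SIZE - 1) / EXAMPLE_PAGE_SIZE;
}

/* Non-zero when 'len' is a power of two. */
static int example_is_pow2(size_t len)
{
	return len != 0 && (len & (len - 1)) == 0;
}
```

With these numbers the old depth fills exactly one page (256 * 16 =
4096 bytes), while the new depth needs 16 pages (4096 * 16 = 65536
bytes), so a "size <= one page" build-time check would always fail.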



* [PATCH 2/9] habanalabs: rephrase error messages
  2020-07-05 13:12 [PATCH 1/9] habanalabs: Increase queues depth Oded Gabbay
@ 2020-07-05 13:12 ` Oded Gabbay
  2020-07-05 13:29   ` Tomer Tayar
  2020-07-05 13:12 ` [PATCH 3/9] habanalabs: extract cpu boot status lookup Oded Gabbay
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 11+ messages in thread
From: Oded Gabbay @ 2020-07-05 13:12 UTC
  To: linux-kernel, SW_Drivers

Rephrase some error/warning/notice messages to make them more accessible to
ordinary users.

There is no need to print the context ASID, as the driver currently doesn't
support multiple contexts.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/command_submission.c | 20 +++++++++++++-------
 drivers/misc/habanalabs/context.c            |  3 +--
 drivers/misc/habanalabs/firmware_if.c        |  4 ++--
 drivers/misc/habanalabs/memory.c             |  3 +--
 4 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/drivers/misc/habanalabs/command_submission.c b/drivers/misc/habanalabs/command_submission.c
index 62dab99dda98..f81d6685e011 100644
--- a/drivers/misc/habanalabs/command_submission.c
+++ b/drivers/misc/habanalabs/command_submission.c
@@ -373,9 +373,9 @@ static void cs_timedout(struct work_struct *work)
 	hdev = cs->ctx->hdev;
 	ctx_asid = cs->ctx->asid;
 
-	/* TODO: add information about last signaled seq and last emitted seq */
-	dev_err(hdev->dev, "User %d command submission %llu got stuck!\n",
-		ctx_asid, cs->sequence);
+	dev_err(hdev->dev,
+		"Command submission %llu has not finished in time!\n",
+		cs->sequence);
 
 	cs_put(cs);
 
@@ -1130,7 +1130,7 @@ static long _hl_cs_wait_ioctl(struct hl_device *hdev,
 		rc = PTR_ERR(fence);
 		if (rc == -EINVAL)
 			dev_notice_ratelimited(hdev->dev,
-				"Can't wait on seq %llu because current CS is at seq %llu\n",
+				"Can't wait on CS %llu because current CS is at seq %llu\n",
 				seq, ctx->cs_sequence);
 	} else if (fence) {
 		rc = dma_fence_wait_timeout(fence, true, timeout);
@@ -1163,15 +1163,21 @@ int hl_cs_wait_ioctl(struct hl_fpriv *hpriv, void *data)
 	memset(args, 0, sizeof(*args));
 
 	if (rc < 0) {
-		dev_err_ratelimited(hdev->dev,
-				"Error %ld on waiting for CS handle %llu\n",
-				rc, seq);
 		if (rc == -ERESTARTSYS) {
+			dev_err_ratelimited(hdev->dev,
+				"user process got signal while waiting for CS handle %llu\n",
+				seq);
 			args->out.status = HL_WAIT_CS_STATUS_INTERRUPTED;
 			rc = -EINTR;
 		} else if (rc == -ETIMEDOUT) {
+			dev_err_ratelimited(hdev->dev,
+				"CS %llu has timed-out while user process is waiting for it\n",
+				seq);
 			args->out.status = HL_WAIT_CS_STATUS_TIMEDOUT;
 		} else if (rc == -EIO) {
+			dev_err_ratelimited(hdev->dev,
+				"CS %llu has been aborted while user process is waiting for it\n",
+				seq);
 			args->out.status = HL_WAIT_CS_STATUS_ABORTED;
 		}
 		return rc;
diff --git a/drivers/misc/habanalabs/context.c b/drivers/misc/habanalabs/context.c
index 1b96fefa4a65..1e3e5b19ecd9 100644
--- a/drivers/misc/habanalabs/context.c
+++ b/drivers/misc/habanalabs/context.c
@@ -112,8 +112,7 @@ void hl_ctx_free(struct hl_device *hdev, struct hl_ctx *ctx)
 		return;
 
 	dev_warn(hdev->dev,
-		"Context %d closed or terminated but its CS are executing\n",
-		ctx->asid);
+		"user process released device but its command submissions are still executing\n");
 }
 
 int hl_ctx_init(struct hl_device *hdev, struct hl_ctx *ctx, bool is_kernel_ctx)
diff --git a/drivers/misc/habanalabs/firmware_if.c b/drivers/misc/habanalabs/firmware_if.c
index 6900c01d060f..9e7f203a09d7 100644
--- a/drivers/misc/habanalabs/firmware_if.c
+++ b/drivers/misc/habanalabs/firmware_if.c
@@ -289,7 +289,7 @@ int hl_fw_armcp_info_get(struct hl_device *hdev)
 					HL_ARMCP_INFO_TIMEOUT_USEC, &result);
 	if (rc) {
 		dev_err(hdev->dev,
-			"Failed to send ArmCP info pkt, error %d\n", rc);
+			"Failed to handle ArmCP info pkt, error %d\n", rc);
 		goto out;
 	}
 
@@ -340,7 +340,7 @@ int hl_fw_get_eeprom_data(struct hl_device *hdev, void *data, size_t max_size)
 
 	if (rc) {
 		dev_err(hdev->dev,
-			"Failed to send ArmCP EEPROM packet, error %d\n", rc);
+			"Failed to handle ArmCP EEPROM packet, error %d\n", rc);
 		goto out;
 	}
 
diff --git a/drivers/misc/habanalabs/memory.c b/drivers/misc/habanalabs/memory.c
index 47da84a17719..e4e1693e5c6c 100644
--- a/drivers/misc/habanalabs/memory.c
+++ b/drivers/misc/habanalabs/memory.c
@@ -1730,8 +1730,7 @@ void hl_vm_ctx_fini(struct hl_ctx *ctx)
 	 */
 	if (!hdev->hard_reset_pending && !hash_empty(ctx->mem_hash))
 		dev_notice(hdev->dev,
-				"ctx %d is freed while it has va in use\n",
-				ctx->asid);
+			"user released device without removing its memory mappings\n");
 
 	hash_for_each_safe(ctx->mem_hash, i, tmp_node, hnode, node) {
 		dev_dbg(hdev->dev,
-- 
2.17.1
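
For readers following the hl_cs_wait_ioctl() hunk, here is a minimal
user-space sketch of the per-error mapping it ends up with. It is not
driver code. EXAMPLE_ERESTARTSYS stands in for the kernel-internal
ERESTARTSYS, which is not exported in the user-space errno.h. The
enum values and the struct are hypothetical placeholders, not the
real HL_WAIT_CS_STATUS_* constants. Message texts are abbreviated
from the patch.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for the kernel-internal ERESTARTSYS (assumption for this sketch). */
#define EXAMPLE_ERESTARTSYS	512

/* Hypothetical placeholders for the HL_WAIT_CS_STATUS_* values. */
enum example_wait_status {
	EXAMPLE_STATUS_UNSET,
	EXAMPLE_STATUS_INTERRUPTED,
	EXAMPLE_STATUS_TIMEDOUT,
	EXAMPLE_STATUS_ABORTED
};

struct example_wait_result {
	long rc;			/* value returned to the caller */
	enum example_wait_status status;/* status reported to user space */
	const char *msg;		/* log text, NULL when nothing is logged */
};

/* Classify a negative wait result the way the reworked branch chain does. */
static struct example_wait_result example_classify_wait(long rc)
{
	struct example_wait_result res = { rc, EXAMPLE_STATUS_UNSET, NULL };

	if (rc == -EXAMPLE_ERESTARTSYS) {
		res.msg = "user process got signal while waiting for CS";
		res.status = EXAMPLE_STATUS_INTERRUPTED;
		res.rc = -EINTR;	/* translated before returning */
	} else if (rc == -ETIMEDOUT) {
		res.msg = "CS has timed-out while user process is waiting";
		res.status = EXAMPLE_STATUS_TIMEDOUT;
	} else if (rc == -EIO) {
		res.msg = "CS has been aborted while user process is waiting";
		res.status = EXAMPLE_STATUS_ABORTED;
	}

	return res;
}
```

One consequence visible in the sketch: as far as this hunk shows, a
negative rc other than these three is now returned without any log
message, because the generic "Error %ld" print was removed.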



* [PATCH 3/9] habanalabs: extract cpu boot status lookup
  2020-07-05 13:12 [PATCH 1/9] habanalabs: Increase queues depth Oded Gabbay
  2020-07-05 13:12 ` [PATCH 2/9] habanalabs: rephrase error messages Oded Gabbay
@ 2020-07-05 13:12 ` Oded Gabbay
  2020-07-05 13:12 ` [PATCH 4/9] habanalabs: Add dropped cs statistics info struct Oded Gabbay
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Oded Gabbay @ 2020-07-05 13:12 UTC
  To: linux-kernel, SW_Drivers; +Cc: Christine Gharzuzi

From: Christine Gharzuzi <cgharzuzi@habana.ai>

Extract detection of the CPU boot status into a function
to allow code reuse.

Signed-off-by: Christine Gharzuzi <cgharzuzi@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/firmware_if.c | 92 ++++++++++++++-------------
 1 file changed, 48 insertions(+), 44 deletions(-)

diff --git a/drivers/misc/habanalabs/firmware_if.c b/drivers/misc/habanalabs/firmware_if.c
index 9e7f203a09d7..3be1549cd137 100644
--- a/drivers/misc/habanalabs/firmware_if.c
+++ b/drivers/misc/habanalabs/firmware_if.c
@@ -393,6 +393,53 @@ static void fw_read_errors(struct hl_device *hdev, u32 boot_err0_reg)
 			"Device boot error - NIC F/W initialization failed\n");
 }
 
+static void hl_detect_cpu_boot_status(struct hl_device *hdev, u32 status)
+{
+	switch (status) {
+	case CPU_BOOT_STATUS_NA:
+		dev_err(hdev->dev,
+			"Device boot error - BTL did NOT run\n");
+		break;
+	case CPU_BOOT_STATUS_IN_WFE:
+		dev_err(hdev->dev,
+			"Device boot error - Stuck inside WFE loop\n");
+		break;
+	case CPU_BOOT_STATUS_IN_BTL:
+		dev_err(hdev->dev,
+			"Device boot error - Stuck in BTL\n");
+		break;
+	case CPU_BOOT_STATUS_IN_PREBOOT:
+		dev_err(hdev->dev,
+			"Device boot error - Stuck in Preboot\n");
+		break;
+	case CPU_BOOT_STATUS_IN_SPL:
+		dev_err(hdev->dev,
+			"Device boot error - Stuck in SPL\n");
+		break;
+	case CPU_BOOT_STATUS_IN_UBOOT:
+		dev_err(hdev->dev,
+			"Device boot error - Stuck in u-boot\n");
+		break;
+	case CPU_BOOT_STATUS_DRAM_INIT_FAIL:
+		dev_err(hdev->dev,
+			"Device boot error - DRAM initialization failed\n");
+		break;
+	case CPU_BOOT_STATUS_UBOOT_NOT_READY:
+		dev_err(hdev->dev,
+			"Device boot error - u-boot stopped by user\n");
+		break;
+	case CPU_BOOT_STATUS_TS_INIT_FAIL:
+		dev_err(hdev->dev,
+			"Device boot error - Thermal Sensor initialization failed\n");
+		break;
+	default:
+		dev_err(hdev->dev,
+			"Device boot error - Invalid status code %d\n",
+			status);
+		break;
+	}
+}
+
 int hl_fw_init_cpu(struct hl_device *hdev, u32 cpu_boot_status_reg,
 			u32 msg_to_cpu_reg, u32 cpu_msg_status_reg,
 			u32 boot_err0_reg, bool skip_bmc,
@@ -466,50 +513,7 @@ int hl_fw_init_cpu(struct hl_device *hdev, u32 cpu_boot_status_reg,
 	 * versions but we keep them here for backward compatibility
 	 */
 	if (rc) {
-		switch (status) {
-		case CPU_BOOT_STATUS_NA:
-			dev_err(hdev->dev,
-				"Device boot error - BTL did NOT run\n");
-			break;
-		case CPU_BOOT_STATUS_IN_WFE:
-			dev_err(hdev->dev,
-				"Device boot error - Stuck inside WFE loop\n");
-			break;
-		case CPU_BOOT_STATUS_IN_BTL:
-			dev_err(hdev->dev,
-				"Device boot error - Stuck in BTL\n");
-			break;
-		case CPU_BOOT_STATUS_IN_PREBOOT:
-			dev_err(hdev->dev,
-				"Device boot error - Stuck in Preboot\n");
-			break;
-		case CPU_BOOT_STATUS_IN_SPL:
-			dev_err(hdev->dev,
-				"Device boot error - Stuck in SPL\n");
-			break;
-		case CPU_BOOT_STATUS_IN_UBOOT:
-			dev_err(hdev->dev,
-				"Device boot error - Stuck in u-boot\n");
-			break;
-		case CPU_BOOT_STATUS_DRAM_INIT_FAIL:
-			dev_err(hdev->dev,
-				"Device boot error - DRAM initialization failed\n");
-			break;
-		case CPU_BOOT_STATUS_UBOOT_NOT_READY:
-			dev_err(hdev->dev,
-				"Device boot error - u-boot stopped by user\n");
-			break;
-		case CPU_BOOT_STATUS_TS_INIT_FAIL:
-			dev_err(hdev->dev,
-				"Device boot error - Thermal Sensor initialization failed\n");
-			break;
-		default:
-			dev_err(hdev->dev,
-				"Device boot error - Invalid status code %d\n",
-				status);
-			break;
-		}
-
+		hl_detect_cpu_boot_status(hdev, status);
 		rc = -EIO;
 		goto out;
 	}
-- 
2.17.1
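
As an aside, the same status-to-message lookup could also be written
as a table instead of a switch. The sketch below is an alternative
technique, not what the patch does. The enum is hypothetical, since
the real CPU_BOOT_STATUS_* values are not shown in this patch, and
only a subset of the messages is included.

```c
#include <stddef.h>

/* Hypothetical status codes; NOT the real CPU_BOOT_STATUS_* values. */
enum example_boot_status {
	EXAMPLE_BOOT_NA,
	EXAMPLE_BOOT_IN_WFE,
	EXAMPLE_BOOT_IN_BTL,
	EXAMPLE_BOOT_IN_PREBOOT,
	EXAMPLE_BOOT_IN_SPL,
	EXAMPLE_BOOT_IN_UBOOT,
	EXAMPLE_BOOT_STATUS_COUNT
};

/* Message texts copied from the patch, indexed by status code. */
static const char * const example_boot_msgs[EXAMPLE_BOOT_STATUS_COUNT] = {
	[EXAMPLE_BOOT_NA]		= "BTL did NOT run",
	[EXAMPLE_BOOT_IN_WFE]		= "Stuck inside WFE loop",
	[EXAMPLE_BOOT_IN_BTL]		= "Stuck in BTL",
	[EXAMPLE_BOOT_IN_PREBOOT]	= "Stuck in Preboot",
	[EXAMPLE_BOOT_IN_SPL]		= "Stuck in SPL",
	[EXAMPLE_BOOT_IN_UBOOT]		= "Stuck in u-boot",
};

/* Return the message for 'status', or NULL for an unknown code. */
static const char *example_boot_status_str(unsigned int status)
{
	if (status >= EXAMPLE_BOOT_STATUS_COUNT)
		return NULL;

	return example_boot_msgs[status];
}
```

A caller would print the returned string after the "Device boot error"
prefix and fall back to an "Invalid status code" message on NULL. The
switch in the patch has the advantage of not depending on the status
codes being small and contiguous.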



* [PATCH 4/9] habanalabs: Add dropped cs statistics info struct
  2020-07-05 13:12 [PATCH 1/9] habanalabs: Increase queues depth Oded Gabbay
  2020-07-05 13:12 ` [PATCH 2/9] habanalabs: rephrase error messages Oded Gabbay
  2020-07-05 13:12 ` [PATCH 3/9] habanalabs: extract cpu boot status lookup Oded Gabbay
@ 2020-07-05 13:12 ` Oded Gabbay
  2020-07-05 13:12 ` [PATCH 5/9] habanalabs: Extract ECC information from FW Oded Gabbay
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Oded Gabbay @ 2020-07-05 13:12 UTC
  To: linux-kernel, SW_Drivers; +Cc: Ofir Bitton

From: Ofir Bitton <obitton@habana.ai>

Add a command submission statistics structure which can be obtained
through the info ioctl. Each drop counter corresponds to one reason
for which a command submission was dropped.
This information lets the user know the specific reason why the
submitted work was dropped, so that the user can then utilize the
driver more efficiently.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/command_submission.c | 24 +++++++++++++++++++-
 drivers/misc/habanalabs/habanalabs.h         |  5 ++++
 drivers/misc/habanalabs/habanalabs_ioctl.c   | 24 ++++++++++++++++++++
 drivers/misc/habanalabs/hw_queue.c           |  5 +++-
 include/uapi/misc/habanalabs.h               | 21 +++++++++++++++++
 5 files changed, 77 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/habanalabs/command_submission.c b/drivers/misc/habanalabs/command_submission.c
index f81d6685e011..777f88d25acd 100644
--- a/drivers/misc/habanalabs/command_submission.c
+++ b/drivers/misc/habanalabs/command_submission.c
@@ -246,6 +246,18 @@ static void free_job(struct hl_device *hdev, struct hl_cs_job *job)
 	kfree(job);
 }
 
+static void cs_counters_aggregate(struct hl_device *hdev, struct hl_ctx *ctx)
+{
+	hdev->aggregated_cs_counters.device_in_reset_drop_cnt +=
+			ctx->cs_counters.device_in_reset_drop_cnt;
+	hdev->aggregated_cs_counters.out_of_mem_drop_cnt +=
+			ctx->cs_counters.out_of_mem_drop_cnt;
+	hdev->aggregated_cs_counters.parsing_drop_cnt +=
+			ctx->cs_counters.parsing_drop_cnt;
+	hdev->aggregated_cs_counters.queue_full_drop_cnt +=
+			ctx->cs_counters.queue_full_drop_cnt;
+}
+
 static void cs_do_release(struct kref *ref)
 {
 	struct hl_cs *cs = container_of(ref, struct hl_cs,
@@ -349,6 +361,8 @@ static void cs_do_release(struct kref *ref)
 	dma_fence_signal(cs->fence);
 	dma_fence_put(cs->fence);
 
+	cs_counters_aggregate(hdev, cs->ctx);
+
 	kfree(cs);
 }
 
@@ -632,12 +646,15 @@ static int cs_ioctl_default(struct hl_fpriv *hpriv, void __user *chunks,
 
 		rc = validate_queue_index(hdev, chunk, &queue_type,
 						&is_kernel_allocated_cb);
-		if (rc)
+		if (rc) {
+			hpriv->ctx->cs_counters.parsing_drop_cnt++;
 			goto free_cs_object;
+		}
 
 		if (is_kernel_allocated_cb) {
 			cb = get_cb_from_cs_chunk(hdev, &hpriv->cb_mgr, chunk);
 			if (!cb) {
+				hpriv->ctx->cs_counters.parsing_drop_cnt++;
 				rc = -EINVAL;
 				goto free_cs_object;
 			}
@@ -651,6 +668,7 @@ static int cs_ioctl_default(struct hl_fpriv *hpriv, void __user *chunks,
 		job = hl_cs_allocate_job(hdev, queue_type,
 						is_kernel_allocated_cb);
 		if (!job) {
+			hpriv->ctx->cs_counters.out_of_mem_drop_cnt++;
 			dev_err(hdev->dev, "Failed to allocate a new job\n");
 			rc = -ENOMEM;
 			if (is_kernel_allocated_cb)
@@ -683,6 +701,7 @@ static int cs_ioctl_default(struct hl_fpriv *hpriv, void __user *chunks,
 
 		rc = cs_parser(hpriv, job);
 		if (rc) {
+			hpriv->ctx->cs_counters.parsing_drop_cnt++;
 			dev_err(hdev->dev,
 				"Failed to parse JOB %d.%llu.%d, err %d, rejecting the CS\n",
 				cs->ctx->asid, cs->sequence, job->id, rc);
@@ -691,6 +710,7 @@ static int cs_ioctl_default(struct hl_fpriv *hpriv, void __user *chunks,
 	}
 
 	if (int_queues_only) {
+		hpriv->ctx->cs_counters.parsing_drop_cnt++;
 		dev_err(hdev->dev,
 			"Reject CS %d.%llu because only internal queues jobs are present\n",
 			cs->ctx->asid, cs->sequence);
@@ -875,6 +895,7 @@ static int cs_ioctl_signal_wait(struct hl_fpriv *hpriv, enum hl_cs_type cs_type,
 
 	job = hl_cs_allocate_job(hdev, q_type, true);
 	if (!job) {
+		ctx->cs_counters.out_of_mem_drop_cnt++;
 		dev_err(hdev->dev, "Failed to allocate a new job\n");
 		rc = -ENOMEM;
 		goto put_cs;
@@ -882,6 +903,7 @@ static int cs_ioctl_signal_wait(struct hl_fpriv *hpriv, enum hl_cs_type cs_type,
 
 	cb = hl_cb_kernel_create(hdev, PAGE_SIZE);
 	if (!cb) {
+		ctx->cs_counters.out_of_mem_drop_cnt++;
 		kfree(job);
 		rc = -EFAULT;
 		goto put_cs;
diff --git a/drivers/misc/habanalabs/habanalabs.h b/drivers/misc/habanalabs/habanalabs.h
index e4d6f7c91194..ae781453a509 100644
--- a/drivers/misc/habanalabs/habanalabs.h
+++ b/drivers/misc/habanalabs/habanalabs.h
@@ -10,6 +10,7 @@
 
 #include "include/armcp_if.h"
 #include "include/qman_if.h"
+#include <uapi/misc/habanalabs.h>
 
 #include <linux/cdev.h>
 #include <linux/iopoll.h>
@@ -787,6 +788,7 @@ struct hl_ctx {
 	struct mutex		mem_hash_lock;
 	struct mutex		mmu_lock;
 	struct list_head	debugfs_list;
+	struct hl_cs_counters	cs_counters;
 	u64			cs_sequence;
 	u64			*dram_default_hops;
 	spinlock_t		cs_lock;
@@ -1391,6 +1393,7 @@ struct hl_device_idle_busy_ts {
  * @compute_ctx: current compute context executing.
  * @idle_busy_ts_arr: array to hold time stamps of transitions from idle to busy
  *                    and vice-versa
+ * @aggregated_cs_counters: aggregated cs counters among all contexts
  * @dram_used_mem: current DRAM memory consumption.
  * @timeout_jiffies: device CS timeout value.
  * @max_power: the max power of the device, as configured by the sysadmin. This
@@ -1489,6 +1492,8 @@ struct hl_device {
 
 	struct hl_device_idle_busy_ts	*idle_busy_ts_arr;
 
+	struct hl_cs_counters		aggregated_cs_counters;
+
 	atomic64_t			dram_used_mem;
 	u64				timeout_jiffies;
 	u64				max_power;
diff --git a/drivers/misc/habanalabs/habanalabs_ioctl.c b/drivers/misc/habanalabs/habanalabs_ioctl.c
index 52eedd3a6c3a..5af1c03da473 100644
--- a/drivers/misc/habanalabs/habanalabs_ioctl.c
+++ b/drivers/misc/habanalabs/habanalabs_ioctl.c
@@ -276,6 +276,27 @@ static int time_sync_info(struct hl_device *hdev, struct hl_info_args *args)
 		min((size_t) max_size, sizeof(time_sync))) ? -EFAULT : 0;
 }
 
+static int cs_counters_info(struct hl_fpriv *hpriv, struct hl_info_args *args)
+{
+	struct hl_device *hdev = hpriv->hdev;
+	struct hl_info_cs_counters cs_counters = {0};
+	u32 max_size = args->return_size;
+	void __user *out = (void __user *) (uintptr_t) args->return_pointer;
+
+	if ((!max_size) || (!out))
+		return -EINVAL;
+
+	memcpy(&cs_counters.cs_counters, &hdev->aggregated_cs_counters,
+			sizeof(struct hl_cs_counters));
+
+	if (hpriv->ctx)
+		memcpy(&cs_counters.ctx_cs_counters, &hpriv->ctx->cs_counters,
+				sizeof(struct hl_cs_counters));
+
+	return copy_to_user(out, &cs_counters,
+		min((size_t) max_size, sizeof(cs_counters))) ? -EFAULT : 0;
+}
+
 static int _hl_info_ioctl(struct hl_fpriv *hpriv, void *data,
 				struct device *dev)
 {
@@ -336,6 +357,9 @@ static int _hl_info_ioctl(struct hl_fpriv *hpriv, void *data,
 	case HL_INFO_TIME_SYNC:
 		return time_sync_info(hdev, args);
 
+	case HL_INFO_CS_COUNTERS:
+		return cs_counters_info(hpriv, args);
+
 	default:
 		dev_err(dev, "Invalid request %d\n", args->op);
 		rc = -ENOTTY;
diff --git a/drivers/misc/habanalabs/hw_queue.c b/drivers/misc/habanalabs/hw_queue.c
index f5a10a5ac300..da66ffb528f8 100644
--- a/drivers/misc/habanalabs/hw_queue.c
+++ b/drivers/misc/habanalabs/hw_queue.c
@@ -514,6 +514,7 @@ int hl_hw_queue_schedule_cs(struct hl_cs *cs)
 	hdev->asic_funcs->hw_queues_lock(hdev);
 
 	if (hl_device_disabled_or_in_reset(hdev)) {
+		ctx->cs_counters.device_in_reset_drop_cnt++;
 		dev_err(hdev->dev,
 			"device is disabled or in reset, CS rejected!\n");
 		rc = -EPERM;
@@ -543,8 +544,10 @@ int hl_hw_queue_schedule_cs(struct hl_cs *cs)
 				break;
 			}
 
-			if (rc)
+			if (rc) {
+				ctx->cs_counters.queue_full_drop_cnt++;
 				goto unroll_cq_resv;
+			}
 
 			if (q->queue_type == QUEUE_TYPE_EXT ||
 					q->queue_type == QUEUE_TYPE_HW)
diff --git a/include/uapi/misc/habanalabs.h b/include/uapi/misc/habanalabs.h
index f218d1c62c62..d5c4f983b7a8 100644
--- a/include/uapi/misc/habanalabs.h
+++ b/include/uapi/misc/habanalabs.h
@@ -263,6 +263,7 @@ enum hl_device_status {
  *                         time the driver was loaded.
  * HL_INFO_TIME_SYNC     - Retrieve the device's time alongside the host's time
  *                         for synchronization.
+ * HL_INFO_CS_COUNTERS   - Retrieve command submission counters
  */
 #define HL_INFO_HW_IP_INFO		0
 #define HL_INFO_HW_EVENTS		1
@@ -274,6 +275,7 @@ enum hl_device_status {
 #define HL_INFO_CLK_RATE		8
 #define HL_INFO_RESET_COUNT		9
 #define HL_INFO_TIME_SYNC		10
+#define HL_INFO_CS_COUNTERS		11
 
 #define HL_INFO_VERSION_MAX_LEN	128
 #define HL_INFO_CARD_NAME_MAX_LEN	16
@@ -338,6 +340,25 @@ struct hl_info_time_sync {
 	__u64 host_time;
 };
 
+/**
+ * struct hl_cs_counters - command submission counters
+ * @out_of_mem_drop_cnt: dropped due to memory allocation issue
+ * @parsing_drop_cnt: dropped due to error in packet parsing
+ * @queue_full_drop_cnt: dropped due to queue full
+ * @device_in_reset_drop_cnt: dropped due to device in reset
+ */
+struct hl_cs_counters {
+	__u64 out_of_mem_drop_cnt;
+	__u64 parsing_drop_cnt;
+	__u64 queue_full_drop_cnt;
+	__u64 device_in_reset_drop_cnt;
+};
+
+struct hl_info_cs_counters {
+	struct hl_cs_counters cs_counters;
+	struct hl_cs_counters ctx_cs_counters;
+};
+
 struct hl_info_args {
 	/* Location of relevant struct in userspace */
 	__u64 return_pointer;
-- 
2.17.1
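
To illustrate how the two counter sets relate, here is a minimal
stand-alone sketch, not driver code. The example_* structs mirror the
layout of struct hl_cs_counters and struct hl_info_cs_counters from
the uapi hunk using uint64_t instead of __u64. The function names are
hypothetical, and plain memcpy() stands in for copy_to_user().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirror of struct hl_cs_counters from the patch. */
struct example_cs_counters {
	uint64_t out_of_mem_drop_cnt;
	uint64_t parsing_drop_cnt;
	uint64_t queue_full_drop_cnt;
	uint64_t device_in_reset_drop_cnt;
};

/* Mirror of struct hl_info_cs_counters from the patch. */
struct example_info_cs_counters {
	struct example_cs_counters cs_counters;		/* device-wide */
	struct example_cs_counters ctx_cs_counters;	/* calling context */
};

/* Add one context's counters into the device-wide totals. */
static void example_cs_counters_aggregate(struct example_cs_counters *total,
					  const struct example_cs_counters *ctx)
{
	total->out_of_mem_drop_cnt += ctx->out_of_mem_drop_cnt;
	total->parsing_drop_cnt += ctx->parsing_drop_cnt;
	total->queue_full_drop_cnt += ctx->queue_full_drop_cnt;
	total->device_in_reset_drop_cnt += ctx->device_in_reset_drop_cnt;
}

/*
 * Build the reply and copy at most max_size bytes of it to 'out'.
 * Returns the number of bytes copied, or 0 on invalid arguments.
 */
static size_t example_fill_reply(void *out, size_t max_size,
				 const struct example_cs_counters *total,
				 const struct example_cs_counters *ctx)
{
	struct example_info_cs_counters reply;
	size_t n;

	if (!out || !max_size)
		return 0;

	memset(&reply, 0, sizeof(reply));
	reply.cs_counters = *total;
	if (ctx)
		reply.ctx_cs_counters = *ctx;

	/* Never copy more than the caller's buffer can hold. */
	n = max_size < sizeof(reply) ? max_size : sizeof(reply);
	memcpy(out, &reply, n);

	return n;
}
```

The bounded copy mirrors the min(max_size, sizeof(cs_counters)) in
cs_counters_info(): a caller passing a smaller buffer receives a
truncated structure rather than an error.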



* [PATCH 5/9] habanalabs: Extract ECC information from FW
  2020-07-05 13:12 [PATCH 1/9] habanalabs: Increase queues depth Oded Gabbay
                   ` (2 preceding siblings ...)
  2020-07-05 13:12 ` [PATCH 4/9] habanalabs: Add dropped cs statistics info struct Oded Gabbay
@ 2020-07-05 13:12 ` Oded Gabbay
  2020-07-05 13:12 ` [PATCH 6/9] habanalabs: PCIe iATU refactoring Oded Gabbay
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Oded Gabbay @ 2020-07-05 13:12 UTC
  To: linux-kernel, SW_Drivers; +Cc: Ofir Bitton

From: Ofir Bitton <obitton@habana.ai>

ECC (Error Correcting Code) interrupts are going to be handled
by the FW. Hence, we define an interface through which the driver can
obtain the relevant ECC information.
This information is needed for monitoring and can also lead
to a hard reset if the ECC error is not correctable.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/gaudi/gaudi.c         | 366 ++++++------------
 drivers/misc/habanalabs/include/armcp_if.h    |  12 +-
 .../include/gaudi/asic_reg/gaudi_regs.h       |  19 +-
 3 files changed, 146 insertions(+), 251 deletions(-)

diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index aa4139626a04..888f42adee6a 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -316,6 +316,13 @@ static enum hl_queue_type gaudi_queue_type[GAUDI_QUEUE_ID_SIZE] = {
 	QUEUE_TYPE_NA,  /* GAUDI_QUEUE_ID_NIC_9_3 */
 };
 
+struct ecc_info_extract_params {
+	u64 block_address;
+	u32 num_memories;
+	bool derr;
+	bool disable_clock_gating;
+};
+
 static int gaudi_mmu_update_asid_hop0_addr(struct hl_device *hdev, u32 asid,
 								u64 phys_addr);
 static int gaudi_send_job_on_qman0(struct hl_device *hdev,
@@ -5117,62 +5124,75 @@ static void gaudi_print_mmu_error_info(struct hl_device *hdev)
  *  |                   |0xF4C memory wrappers 127:96                          |
  *  +-------------------+------------------------------------------------------+
  */
-static void gaudi_print_ecc_info_generic(struct hl_device *hdev,
-					const char *block_name,
-					u64 block_address, int num_memories,
-					bool derr, bool disable_clock_gating)
+static int gaudi_extract_ecc_info(struct hl_device *hdev,
+		struct ecc_info_extract_params *params, u64 *ecc_address,
+		u64 *ecc_syndrom, u8 *memory_wrapper_idx)
 {
 	struct gaudi_device *gaudi = hdev->asic_specific;
-	int num_mem_regs = num_memories / 32 + ((num_memories % 32) ? 1 : 0);
+	u32 i, num_mem_regs, reg, err_bit;
+	u64 err_addr, err_word = 0;
+	int rc = 0;
 
-	if (block_address >= CFG_BASE)
-		block_address -= CFG_BASE;
+	num_mem_regs = params->num_memories / 32 +
+			((params->num_memories % 32) ? 1 : 0);
 
-	if (derr)
-		block_address += GAUDI_ECC_DERR0_OFFSET;
+	if (params->block_address >= CFG_BASE)
+		params->block_address -= CFG_BASE;
+
+	if (params->derr)
+		err_addr = params->block_address + GAUDI_ECC_DERR0_OFFSET;
 	else
-		block_address += GAUDI_ECC_SERR0_OFFSET;
+		err_addr = params->block_address + GAUDI_ECC_SERR0_OFFSET;
 
-	if (disable_clock_gating) {
+	if (params->disable_clock_gating) {
 		mutex_lock(&gaudi->clk_gate_mutex);
 		hdev->asic_funcs->disable_clock_gating(hdev);
 	}
 
-	switch (num_mem_regs) {
-	case 1:
-		dev_err(hdev->dev,
-			"%s ECC indication: 0x%08x\n",
-			block_name, RREG32(block_address));
-		break;
-	case 2:
-		dev_err(hdev->dev,
-			"%s ECC indication: 0x%08x 0x%08x\n",
-			block_name,
-			RREG32(block_address), RREG32(block_address + 4));
-		break;
-	case 3:
-		dev_err(hdev->dev,
-			"%s ECC indication: 0x%08x 0x%08x 0x%08x\n",
-			block_name,
-			RREG32(block_address), RREG32(block_address + 4),
-			RREG32(block_address + 8));
-		break;
-	case 4:
-		dev_err(hdev->dev,
-			"%s ECC indication: 0x%08x 0x%08x 0x%08x 0x%08x\n",
-			block_name,
-			RREG32(block_address), RREG32(block_address + 4),
-			RREG32(block_address + 8), RREG32(block_address + 0xc));
-		break;
-	default:
-		break;
+	/* Set invalid wrapper index */
+	*memory_wrapper_idx = 0xFF;
+
+	/* Iterate through memory wrappers, a single bit must be set */
+	for (i = 0 ; i < num_mem_regs ; i++) {
+		/* Indication registers are consecutive 32-bit words */
+		err_word = RREG32(err_addr + i * 4);
+		if (err_word) {
+			err_bit = __ffs(err_word);
+			*memory_wrapper_idx = err_bit + (32 * i);
+			break;
+		}
+	}
 
+	if (*memory_wrapper_idx == 0xFF) {
+		dev_err(hdev->dev, "ECC error information cannot be found\n");
+		rc = -EINVAL;
+		goto enable_clk_gate;
 	}
 
-	if (disable_clock_gating) {
+	WREG32(params->block_address + GAUDI_ECC_MEM_SEL_OFFSET,
+			*memory_wrapper_idx);
+
+	*ecc_address =
+		RREG32(params->block_address + GAUDI_ECC_ADDRESS_OFFSET);
+	*ecc_syndrom =
+		RREG32(params->block_address + GAUDI_ECC_SYNDROME_OFFSET);
+
+	/* Clear error indication */
+	reg = RREG32(params->block_address + GAUDI_ECC_MEM_INFO_CLR_OFFSET);
+	if (params->derr)
+		reg |= FIELD_PREP(GAUDI_ECC_MEM_INFO_CLR_DERR_MASK, 1);
+	else
+		reg |= FIELD_PREP(GAUDI_ECC_MEM_INFO_CLR_SERR_MASK, 1);
+
+	WREG32(params->block_address + GAUDI_ECC_MEM_INFO_CLR_OFFSET, reg);
+
+enable_clk_gate:
+	if (params->disable_clock_gating) {
 		hdev->asic_funcs->enable_clock_gating(hdev);
 		mutex_unlock(&gaudi->clk_gate_mutex);
 	}
+
+	return rc;
 }
 
 static void gaudi_handle_qman_err_generic(struct hl_device *hdev,
@@ -5225,239 +5245,99 @@ static void gaudi_handle_qman_err_generic(struct hl_device *hdev,
 	}
 }
 
-static void gaudi_print_ecc_info(struct hl_device *hdev, u16 event_type)
+static void gaudi_handle_ecc_event(struct hl_device *hdev, u16 event_type,
+		struct hl_eq_ecc_data *ecc_data)
 {
-	u64 block_address;
-	u8 index;
-	int num_memories;
-	char desc[32];
-	bool derr;
-	bool disable_clock_gating;
+	struct ecc_info_extract_params params;
+	u64 ecc_address = 0, ecc_syndrom = 0;
+	u8 index, memory_wrapper_idx = 0;
+	bool extract_info_from_fw;
+	int rc;
 
 	switch (event_type) {
-	case GAUDI_EVENT_PCIE_CORE_SERR:
-		snprintf(desc, ARRAY_SIZE(desc), "%s", "PCIE_CORE");
-		block_address = mmPCIE_CORE_BASE;
-		num_memories = 51;
-		derr = false;
-		disable_clock_gating = false;
-		break;
-	case GAUDI_EVENT_PCIE_CORE_DERR:
-		snprintf(desc, ARRAY_SIZE(desc), "%s", "PCIE_CORE");
-		block_address = mmPCIE_CORE_BASE;
-		num_memories = 51;
-		derr = true;
-		disable_clock_gating = false;
-		break;
-	case GAUDI_EVENT_PCIE_IF_SERR:
-		snprintf(desc, ARRAY_SIZE(desc), "%s", "PCIE_WRAP");
-		block_address = mmPCIE_WRAP_BASE;
-		num_memories = 11;
-		derr = false;
-		disable_clock_gating = false;
-		break;
-	case GAUDI_EVENT_PCIE_IF_DERR:
-		snprintf(desc, ARRAY_SIZE(desc), "%s", "PCIE_WRAP");
-		block_address = mmPCIE_WRAP_BASE;
-		num_memories = 11;
-		derr = true;
-		disable_clock_gating = false;
-		break;
-	case GAUDI_EVENT_PCIE_PHY_SERR:
-		snprintf(desc, ARRAY_SIZE(desc), "%s", "PCIE_PHY");
-		block_address = mmPCIE_PHY_BASE;
-		num_memories = 4;
-		derr = false;
-		disable_clock_gating = false;
-		break;
-	case GAUDI_EVENT_PCIE_PHY_DERR:
-		snprintf(desc, ARRAY_SIZE(desc), "%s", "PCIE_PHY");
-		block_address = mmPCIE_PHY_BASE;
-		num_memories = 4;
-		derr = true;
-		disable_clock_gating = false;
+	case GAUDI_EVENT_PCIE_CORE_SERR ... GAUDI_EVENT_PCIE_PHY_DERR:
+	case GAUDI_EVENT_DMA0_SERR_ECC ... GAUDI_EVENT_MMU_DERR:
+		extract_info_from_fw = true;
 		break;
 	case GAUDI_EVENT_TPC0_SERR ... GAUDI_EVENT_TPC7_SERR:
 		index = event_type - GAUDI_EVENT_TPC0_SERR;
-		block_address = mmTPC0_CFG_BASE + index * TPC_CFG_OFFSET;
-		snprintf(desc, ARRAY_SIZE(desc), "%s%d", "TPC", index);
-		num_memories = 90;
-		derr = false;
-		disable_clock_gating = true;
+		params.block_address = mmTPC0_CFG_BASE + index * TPC_CFG_OFFSET;
+		params.num_memories = 90;
+		params.derr = false;
+		params.disable_clock_gating = true;
+		extract_info_from_fw = false;
 		break;
 	case GAUDI_EVENT_TPC0_DERR ... GAUDI_EVENT_TPC7_DERR:
 		index = event_type - GAUDI_EVENT_TPC0_DERR;
-		block_address =
+		params.block_address =
 			mmTPC0_CFG_BASE + index * TPC_CFG_OFFSET;
-		snprintf(desc, ARRAY_SIZE(desc), "%s%d", "TPC", index);
-		num_memories = 90;
-		derr = true;
-		disable_clock_gating = true;
+		params.num_memories = 90;
+		params.derr = true;
+		params.disable_clock_gating = true;
+		extract_info_from_fw = false;
 		break;
 	case GAUDI_EVENT_MME0_ACC_SERR:
 	case GAUDI_EVENT_MME1_ACC_SERR:
 	case GAUDI_EVENT_MME2_ACC_SERR:
 	case GAUDI_EVENT_MME3_ACC_SERR:
 		index = (event_type - GAUDI_EVENT_MME0_ACC_SERR) / 4;
-		block_address = mmMME0_ACC_BASE + index * MME_ACC_OFFSET;
-		snprintf(desc, ARRAY_SIZE(desc), "MME%d_ACC", index);
-		num_memories = 128;
-		derr = false;
-		disable_clock_gating = true;
+		params.block_address = mmMME0_ACC_BASE + index * MME_ACC_OFFSET;
+		params.num_memories = 128;
+		params.derr = false;
+		params.disable_clock_gating = true;
+		extract_info_from_fw = false;
 		break;
 	case GAUDI_EVENT_MME0_ACC_DERR:
 	case GAUDI_EVENT_MME1_ACC_DERR:
 	case GAUDI_EVENT_MME2_ACC_DERR:
 	case GAUDI_EVENT_MME3_ACC_DERR:
 		index = (event_type - GAUDI_EVENT_MME0_ACC_DERR) / 4;
-		block_address = mmMME0_ACC_BASE + index * MME_ACC_OFFSET;
-		snprintf(desc, ARRAY_SIZE(desc), "MME%d_ACC", index);
-		num_memories = 128;
-		derr = true;
-		disable_clock_gating = true;
+		params.block_address = mmMME0_ACC_BASE + index * MME_ACC_OFFSET;
+		params.num_memories = 128;
+		params.derr = true;
+		params.disable_clock_gating = true;
+		extract_info_from_fw = false;
 		break;
 	case GAUDI_EVENT_MME0_SBAB_SERR:
 	case GAUDI_EVENT_MME1_SBAB_SERR:
 	case GAUDI_EVENT_MME2_SBAB_SERR:
 	case GAUDI_EVENT_MME3_SBAB_SERR:
 		index = (event_type - GAUDI_EVENT_MME0_SBAB_SERR) / 4;
-		block_address = mmMME0_SBAB_BASE + index * MME_ACC_OFFSET;
-		snprintf(desc, ARRAY_SIZE(desc), "MME%d_SBAB", index);
-		num_memories = 33;
-		derr = false;
-		disable_clock_gating = true;
+		params.block_address =
+			mmMME0_SBAB_BASE + index * MME_ACC_OFFSET;
+		params.num_memories = 33;
+		params.derr = false;
+		params.disable_clock_gating = true;
+		extract_info_from_fw = false;
 		break;
 	case GAUDI_EVENT_MME0_SBAB_DERR:
 	case GAUDI_EVENT_MME1_SBAB_DERR:
 	case GAUDI_EVENT_MME2_SBAB_DERR:
 	case GAUDI_EVENT_MME3_SBAB_DERR:
 		index = (event_type - GAUDI_EVENT_MME0_SBAB_DERR) / 4;
-		block_address = mmMME0_SBAB_BASE + index * MME_ACC_OFFSET;
-		snprintf(desc, ARRAY_SIZE(desc), "MME%d_SBAB", index);
-		num_memories = 33;
-		derr = true;
-		disable_clock_gating = true;
-		break;
-	case GAUDI_EVENT_DMA0_SERR_ECC ... GAUDI_EVENT_DMA7_SERR_ECC:
-		index = event_type - GAUDI_EVENT_DMA0_SERR_ECC;
-		block_address = mmDMA0_CORE_BASE + index * DMA_CORE_OFFSET;
-		snprintf(desc, ARRAY_SIZE(desc), "DMA%d_CORE", index);
-		num_memories = 16;
-		derr = false;
-		disable_clock_gating = false;
-		break;
-	case GAUDI_EVENT_DMA0_DERR_ECC ... GAUDI_EVENT_DMA7_DERR_ECC:
-		index = event_type - GAUDI_EVENT_DMA0_DERR_ECC;
-		block_address = mmDMA0_CORE_BASE + index * DMA_CORE_OFFSET;
-		snprintf(desc, ARRAY_SIZE(desc), "DMA%d_CORE", index);
-		num_memories = 16;
-		derr = true;
-		disable_clock_gating = false;
-		break;
-	case GAUDI_EVENT_CPU_IF_ECC_SERR:
-		block_address = mmCPU_IF_BASE;
-		snprintf(desc, ARRAY_SIZE(desc), "%s", "CPU");
-		num_memories = 4;
-		derr = false;
-		disable_clock_gating = false;
-		break;
-	case GAUDI_EVENT_CPU_IF_ECC_DERR:
-		block_address = mmCPU_IF_BASE;
-		snprintf(desc, ARRAY_SIZE(desc), "%s", "CPU");
-		num_memories = 4;
-		derr = true;
-		disable_clock_gating = false;
-		break;
-	case GAUDI_EVENT_PSOC_MEM_SERR:
-		block_address = mmPSOC_GLOBAL_CONF_BASE;
-		snprintf(desc, ARRAY_SIZE(desc), "%s", "CPU");
-		num_memories = 4;
-		derr = false;
-		disable_clock_gating = false;
-		break;
-	case GAUDI_EVENT_PSOC_MEM_DERR:
-		block_address = mmPSOC_GLOBAL_CONF_BASE;
-		snprintf(desc, ARRAY_SIZE(desc), "%s", "CPU");
-		num_memories = 4;
-		derr = true;
-		disable_clock_gating = false;
-		break;
-	case GAUDI_EVENT_PSOC_CORESIGHT_SERR:
-		block_address = mmPSOC_CS_TRACE_BASE;
-		snprintf(desc, ARRAY_SIZE(desc), "%s", "CPU");
-		num_memories = 2;
-		derr = false;
-		disable_clock_gating = false;
-		break;
-	case GAUDI_EVENT_PSOC_CORESIGHT_DERR:
-		block_address = mmPSOC_CS_TRACE_BASE;
-		snprintf(desc, ARRAY_SIZE(desc), "%s", "CPU");
-		num_memories = 2;
-		derr = true;
-		disable_clock_gating = false;
-		break;
-	case GAUDI_EVENT_SRAM0_SERR ... GAUDI_EVENT_SRAM28_SERR:
-		index = event_type - GAUDI_EVENT_SRAM0_SERR;
-		block_address =
-			mmSRAM_Y0_X0_BANK_BASE + index * SRAM_BANK_OFFSET;
-		snprintf(desc, ARRAY_SIZE(desc), "SRAM%d", index);
-		num_memories = 2;
-		derr = false;
-		disable_clock_gating = false;
-		break;
-	case GAUDI_EVENT_SRAM0_DERR ... GAUDI_EVENT_SRAM28_DERR:
-		index = event_type - GAUDI_EVENT_SRAM0_DERR;
-		block_address =
-			mmSRAM_Y0_X0_BANK_BASE + index * SRAM_BANK_OFFSET;
-		snprintf(desc, ARRAY_SIZE(desc), "SRAM%d", index);
-		num_memories = 2;
-		derr = true;
-		disable_clock_gating = false;
-		break;
-	case GAUDI_EVENT_DMA_IF0_SERR ... GAUDI_EVENT_DMA_IF3_SERR:
-		index = event_type - GAUDI_EVENT_DMA_IF0_SERR;
-		block_address = mmDMA_IF_W_S_BASE +
-				index * (mmDMA_IF_E_S_BASE - mmDMA_IF_W_S_BASE);
-		snprintf(desc, ARRAY_SIZE(desc), "DMA_IF%d", index);
-		num_memories = 60;
-		derr = false;
-		disable_clock_gating = false;
-		break;
-	case GAUDI_EVENT_DMA_IF0_DERR ... GAUDI_EVENT_DMA_IF3_DERR:
-		index = event_type - GAUDI_EVENT_DMA_IF0_DERR;
-		block_address = mmDMA_IF_W_S_BASE +
-				index * (mmDMA_IF_E_S_BASE - mmDMA_IF_W_S_BASE);
-		snprintf(desc, ARRAY_SIZE(desc), "DMA_IF%d", index);
-		derr = true;
-		num_memories = 60;
-		disable_clock_gating = false;
-		break;
-	case GAUDI_EVENT_HBM_0_SERR ... GAUDI_EVENT_HBM_3_SERR:
-		index = event_type - GAUDI_EVENT_HBM_0_SERR;
-		/* HBM Registers are at different offsets */
-		block_address = mmHBM0_BASE + 0x8000 +
-				index * (mmHBM1_BASE - mmHBM0_BASE);
-		snprintf(desc, ARRAY_SIZE(desc), "HBM%d", index);
-		derr = false;
-		num_memories = 64;
-		disable_clock_gating = false;
-		break;
-	case GAUDI_EVENT_HBM_0_DERR ... GAUDI_EVENT_HBM_3_DERR:
-		index = event_type - GAUDI_EVENT_HBM_0_SERR;
-		/* HBM Registers are at different offsets */
-		block_address = mmHBM0_BASE + 0x8000 +
-				index * (mmHBM1_BASE - mmHBM0_BASE);
-		snprintf(desc, ARRAY_SIZE(desc), "HBM%d", index);
-		derr = true;
-		num_memories = 64;
-		disable_clock_gating = false;
-		break;
+		params.block_address =
+			mmMME0_SBAB_BASE + index * MME_ACC_OFFSET;
+		params.num_memories = 33;
+		params.derr = true;
+		params.disable_clock_gating = true;
 	default:
 		return;
 	}
 
-	gaudi_print_ecc_info_generic(hdev, desc, block_address, num_memories,
-					derr, disable_clock_gating);
+	if (extract_info_from_fw) {
+		ecc_address = le64_to_cpu(ecc_data->ecc_address);
+		ecc_syndrom = le64_to_cpu(ecc_data->ecc_syndrom);
+		memory_wrapper_idx = ecc_data->memory_wrapper_idx;
+	} else {
+		rc = gaudi_extract_ecc_info(hdev, &params, &ecc_address,
+				&ecc_syndrom, &memory_wrapper_idx);
+		if (rc)
+			return;
+	}
+
+	dev_err(hdev->dev,
+		"ECC error detected. address: %#llx. Syndrom: %#llx. block id %u\n",
+		ecc_address, ecc_syndrom, memory_wrapper_idx);
 }
 
 static void gaudi_handle_qman_err(struct hl_device *hdev, u16 event_type)
@@ -5507,8 +5387,6 @@ static void gaudi_print_irq_info(struct hl_device *hdev, u16 event_type,
 	dev_err_ratelimited(hdev->dev, "Received H/W interrupt %d [\"%s\"]\n",
 		event_type, desc);
 
-	gaudi_print_ecc_info(hdev, event_type);
-
 	if (razwi) {
 		gaudi_print_razwi_info(hdev);
 		gaudi_print_mmu_error_info(hdev);
@@ -5738,10 +5616,15 @@ static void gaudi_handle_eqe(struct hl_device *hdev,
 	case GAUDI_EVENT_PSOC_CORESIGHT_DERR:
 	case GAUDI_EVENT_SRAM0_DERR ... GAUDI_EVENT_SRAM28_DERR:
 	case GAUDI_EVENT_DMA_IF0_DERR ... GAUDI_EVENT_DMA_IF3_DERR:
-		fallthrough;
-	case GAUDI_EVENT_GIC500:
 	case GAUDI_EVENT_HBM_0_DERR ... GAUDI_EVENT_HBM_3_DERR:
 	case GAUDI_EVENT_MMU_DERR:
+		gaudi_print_irq_info(hdev, event_type, true);
+		gaudi_handle_ecc_event(hdev, event_type, &eq_entry->ecc_data);
+		if (hdev->hard_reset_on_fw_events)
+			hl_device_reset(hdev, true, false);
+		break;
+
+	case GAUDI_EVENT_GIC500:
 	case GAUDI_EVENT_AXI_ECC:
 	case GAUDI_EVENT_L2_RAM_ECC:
 	case GAUDI_EVENT_PLL0 ... GAUDI_EVENT_PLL17:
@@ -5837,6 +5720,11 @@ static void gaudi_handle_eqe(struct hl_device *hdev,
 	case GAUDI_EVENT_HBM_0_SERR ... GAUDI_EVENT_HBM_3_SERR:
 		fallthrough;
 	case GAUDI_EVENT_MMU_SERR:
+		gaudi_print_irq_info(hdev, event_type, true);
+		gaudi_handle_ecc_event(hdev, event_type, &eq_entry->ecc_data);
+		hl_fw_unmask_irq(hdev, event_type);
+		break;
+
 	case GAUDI_EVENT_PCIE_DEC:
 	case GAUDI_EVENT_MME0_WBC_RSP:
 	case GAUDI_EVENT_MME0_SBAB0_RSP:
diff --git a/drivers/misc/habanalabs/include/armcp_if.h b/drivers/misc/habanalabs/include/armcp_if.h
index dea7c90faafa..07f9972db28d 100644
--- a/drivers/misc/habanalabs/include/armcp_if.h
+++ b/drivers/misc/habanalabs/include/armcp_if.h
@@ -19,9 +19,19 @@ struct hl_eq_header {
 	__le32 ctl;
 };
 
+struct hl_eq_ecc_data {
+	__le64 ecc_address;
+	__le64 ecc_syndrom;
+	__u8 memory_wrapper_idx;
+	__u8 pad[7];
+};
+
 struct hl_eq_entry {
 	struct hl_eq_header hdr;
-	__le64 data[7];
+	union {
+		struct hl_eq_ecc_data ecc_data;
+		__le64 data[7];
+	};
 };
 
 #define HL_EQ_ENTRY_SIZE		sizeof(struct hl_eq_entry)
diff --git a/drivers/misc/habanalabs/include/gaudi/asic_reg/gaudi_regs.h b/drivers/misc/habanalabs/include/gaudi/asic_reg/gaudi_regs.h
index 62078077aee5..0c75d43532bd 100644
--- a/drivers/misc/habanalabs/include/gaudi/asic_reg/gaudi_regs.h
+++ b/drivers/misc/habanalabs/include/gaudi/asic_reg/gaudi_regs.h
@@ -93,17 +93,14 @@
 #include "psoc_hbm_pll_regs.h"
 #include "psoc_cpu_pll_regs.h"
 
-#define GAUDI_ECC_MEM_SEL_OFFSET	0xF18
-#define GAUDI_ECC_ADDRESS_OFFSET	0xF1C
-#define GAUDI_ECC_SYNDROME_OFFSET	0xF20
-#define GAUDI_ECC_SERR0_OFFSET		0xF30
-#define GAUDI_ECC_SERR1_OFFSET		0xF34
-#define GAUDI_ECC_SERR2_OFFSET		0xF38
-#define GAUDI_ECC_SERR3_OFFSET		0xF3C
-#define GAUDI_ECC_DERR0_OFFSET		0xF40
-#define GAUDI_ECC_DERR1_OFFSET		0xF44
-#define GAUDI_ECC_DERR2_OFFSET		0xF48
-#define GAUDI_ECC_DERR3_OFFSET		0xF4C
+#define GAUDI_ECC_MEM_SEL_OFFSET		0xF18
+#define GAUDI_ECC_ADDRESS_OFFSET		0xF1C
+#define GAUDI_ECC_SYNDROME_OFFSET		0xF20
+#define GAUDI_ECC_MEM_INFO_CLR_OFFSET		0xF28
+#define GAUDI_ECC_MEM_INFO_CLR_SERR_MASK	BIT(8)
+#define GAUDI_ECC_MEM_INFO_CLR_DERR_MASK	BIT(9)
+#define GAUDI_ECC_SERR0_OFFSET			0xF30
+#define GAUDI_ECC_DERR0_OFFSET			0xF40
 
 #define mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_SOB_OBJ_0                     0x492000
 #define mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0               0x494000
-- 
2.17.1
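
The hl_eq_ecc_data layout added above can be tried out in userspace.
What follows is a minimal sketch, not driver code: eq_ecc_data,
le64_decode(), ecc_info and ecc_parse() are hypothetical names, and
le64_decode() is a portable stand-in for the kernel's le64_to_cpu().
It assumes the firmware fills the first 24 bytes of the event payload
in the order the struct declares.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace mirror of struct hl_eq_ecc_data (hypothetical name). */
struct eq_ecc_data {
	uint64_t ecc_address;		/* little-endian on the wire */
	uint64_t ecc_syndrom;		/* little-endian on the wire */
	uint8_t  memory_wrapper_idx;
	uint8_t  pad[7];
};

/* The struct shares a union with __le64 data[7], so it must fit in it. */
static_assert(sizeof(struct eq_ecc_data) == 24, "unexpected padding");
static_assert(sizeof(struct eq_ecc_data) <= 7 * sizeof(uint64_t),
	      "ECC data must fit in the event payload");

/* Portable stand-in for le64_to_cpu(): decode 8 little-endian bytes. */
static uint64_t le64_decode(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Host-endian view of the decoded payload (hypothetical). */
struct ecc_info {
	uint64_t address;
	uint64_t syndrome;
	uint8_t  wrapper_idx;
};

/* Parse the first 24 payload bytes of an ECC event entry. */
static struct ecc_info ecc_parse(const uint8_t raw[24])
{
	struct ecc_info info = {
		.address     = le64_decode(raw),
		.syndrome    = le64_decode(raw + 8),
		.wrapper_idx = raw[16],
	};

	return info;
}
```

Decoding byte by byte keeps the sketch independent of host endianness,
which is the same guarantee le64_to_cpu() gives in the driver.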



* [PATCH 6/9] habanalabs: PCIe iATU refactoring
  2020-07-05 13:12 [PATCH 1/9] habanalabs: Increase queues depth Oded Gabbay
                   ` (3 preceding siblings ...)
  2020-07-05 13:12 ` [PATCH 5/9] habanalabs: Extract ECC information from FW Oded Gabbay
@ 2020-07-05 13:12 ` Oded Gabbay
  2020-07-05 13:12 ` [PATCH 7/9] habanalabs: remove soft-reset support from GAUDI Oded Gabbay
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Oded Gabbay @ 2020-07-05 13:12 UTC (permalink / raw)
  To: linux-kernel, SW_Drivers; +Cc: Ofir Bitton

From: Ofir Bitton <obitton@habana.ai>

Split iATU initialization into separate inbound and outbound functions.
The split is required in order to allow a different match mode per
PCIe region.
In addition, add support for PCI address match mode.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
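
A minimal userspace sketch of how the inbound region offset and control
word are composed in this patch. The bit positions are taken from the
IATU_REGION_CTRL_* masks below; the function and macro names here are
hypothetical, and the sketch assumes the BAR number field is the three
bits 10:8.

```c
#include <assert.h>
#include <stdint.h>

/* Bit layout copied from the IATU_REGION_CTRL_* masks in this patch. */
#define CTRL_REGION_EN		(UINT32_C(1) << 31)
#define CTRL_MATCH_MODE		(UINT32_C(1) << 30)
#define CTRL_NUM_MATCH_EN	(UINT32_C(1) << 19)
#define CTRL_BAR_NUM_SHIFT	8
#define CTRL_BAR_NUM_MASK	(UINT32_C(0x7) << CTRL_BAR_NUM_SHIFT)

/* Same ordering as enum hl_pci_match_mode: address = 0, bar = 1. */
enum match_mode { ADDRESS_MATCH = 0, BAR_MATCH = 1 };

/* Register block offset of an inbound region: 0x100, 0x300, 0x500. */
static uint32_t inbound_region_offset(uint8_t region)
{
	return UINT32_C(0x200) * region + UINT32_C(0x100);
}

/* Control word: enable + match mode + match enable (+ BAR number). */
static uint32_t inbound_ctrl_value(enum match_mode mode, uint8_t bar)
{
	uint32_t val = CTRL_REGION_EN | CTRL_NUM_MATCH_EN;

	if (mode == BAR_MATCH) {
		assert(bar < 8);	/* the field is only 3 bits wide */
		val |= CTRL_MATCH_MODE;
		val |= ((uint32_t)bar << CTRL_BAR_NUM_SHIFT) &
			CTRL_BAR_NUM_MASK;
	}

	return val;
}
```

For BAR match mode this reproduces the constants the old code wrote
directly: 0xC0080000 for BAR 0 and 0xC0080200 for BAR 2.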
 drivers/misc/habanalabs/gaudi/gaudi.c |  59 ++++++-----
 drivers/misc/habanalabs/goya/goya.c   |  37 ++++++-
 drivers/misc/habanalabs/habanalabs.h  |  52 +++++++++-
 drivers/misc/habanalabs/pci.c         | 136 ++++++++++++--------------
 4 files changed, 180 insertions(+), 104 deletions(-)

diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index 888f42adee6a..a6e40dec3cd2 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -465,6 +465,7 @@ static int gaudi_pci_bars_map(struct hl_device *hdev)
 static u64 gaudi_set_hbm_bar_base(struct hl_device *hdev, u64 addr)
 {
 	struct gaudi_device *gaudi = hdev->asic_specific;
+	struct hl_inbound_pci_region pci_region;
 	u64 old_addr = addr;
 	int rc;
 
@@ -472,7 +473,10 @@ static u64 gaudi_set_hbm_bar_base(struct hl_device *hdev, u64 addr)
 		return old_addr;
 
 	/* Inbound Region 2 - Bar 4 - Point to HBM */
-	rc = hl_pci_set_dram_bar_base(hdev, 2, 4, addr);
+	pci_region.mode = PCI_BAR_MATCH_MODE;
+	pci_region.bar = HBM_BAR_ID;
+	pci_region.addr = addr;
+	rc = hl_pci_set_inbound_region(hdev, 2, &pci_region);
 	if (rc)
 		return U64_MAX;
 
@@ -486,22 +490,43 @@ static u64 gaudi_set_hbm_bar_base(struct hl_device *hdev, u64 addr)
 
 static int gaudi_init_iatu(struct hl_device *hdev)
 {
-	int rc = 0;
+	struct hl_inbound_pci_region inbound_region;
+	struct hl_outbound_pci_region outbound_region;
+	int rc;
+
+	/* Inbound Region 0 - Bar 0 - Point to SRAM + CFG */
+	inbound_region.mode = PCI_BAR_MATCH_MODE;
+	inbound_region.bar = SRAM_BAR_ID;
+	inbound_region.addr = SRAM_BASE_ADDR;
+	rc = hl_pci_set_inbound_region(hdev, 0, &inbound_region);
+	if (rc)
+		goto done;
 
 	/* Inbound Region 1 - Bar 2 - Point to SPI FLASH */
-	rc  = hl_pci_iatu_write(hdev, 0x314,
-				lower_32_bits(SPI_FLASH_BASE_ADDR));
-	rc |= hl_pci_iatu_write(hdev, 0x318,
-				upper_32_bits(SPI_FLASH_BASE_ADDR));
-	rc |= hl_pci_iatu_write(hdev, 0x300, 0);
-	/* Enable + Bar match + match enable */
-	rc |= hl_pci_iatu_write(hdev, 0x304, 0xC0080200);
+	inbound_region.mode = PCI_BAR_MATCH_MODE;
+	inbound_region.bar = CFG_BAR_ID;
+	inbound_region.addr = SPI_FLASH_BASE_ADDR;
+	rc = hl_pci_set_inbound_region(hdev, 1, &inbound_region);
+	if (rc)
+		goto done;
 
+	/* Inbound Region 2 - Bar 4 - Point to HBM */
+	inbound_region.mode = PCI_BAR_MATCH_MODE;
+	inbound_region.bar = HBM_BAR_ID;
+	inbound_region.addr = DRAM_PHYS_BASE;
+	rc = hl_pci_set_inbound_region(hdev, 2, &inbound_region);
 	if (rc)
-		return -EIO;
+		goto done;
+
+	hdev->asic_funcs->set_dma_mask_from_fw(hdev);
 
-	return hl_pci_init_iatu(hdev, SRAM_BASE_ADDR, DRAM_PHYS_BASE,
-				HOST_PHYS_BASE, HOST_PHYS_SIZE);
+	/* Outbound Region 0 - Point to Host */
+	outbound_region.addr = HOST_PHYS_BASE;
+	outbound_region.size = HOST_PHYS_SIZE;
+	rc = hl_pci_set_outbound_region(hdev, &outbound_region);
+
+done:
+	return rc;
 }
 
 static int gaudi_early_init(struct hl_device *hdev)
@@ -2884,16 +2909,6 @@ static int gaudi_hw_init(struct hl_device *hdev)
 
 	gaudi_init_hbm_dma_qmans(hdev);
 
-	/*
-	 * Before pushing u-boot/linux to device, need to set the hbm bar to
-	 * base address of dram
-	 */
-	if (gaudi_set_hbm_bar_base(hdev, DRAM_PHYS_BASE) == U64_MAX) {
-		dev_err(hdev->dev,
-			"failed to map HBM bar to DRAM base address\n");
-		return -EIO;
-	}
-
 	rc = gaudi_init_cpu(hdev);
 	if (rc) {
 		dev_err(hdev->dev, "failed to initialize CPU\n");
diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index ff32a8fa7624..5839b5bc9ee3 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -458,6 +458,7 @@ static int goya_pci_bars_map(struct hl_device *hdev)
 static u64 goya_set_ddr_bar_base(struct hl_device *hdev, u64 addr)
 {
 	struct goya_device *goya = hdev->asic_specific;
+	struct hl_inbound_pci_region pci_region;
 	u64 old_addr = addr;
 	int rc;
 
@@ -465,7 +466,10 @@ static u64 goya_set_ddr_bar_base(struct hl_device *hdev, u64 addr)
 		return old_addr;
 
 	/* Inbound Region 1 - Bar 4 - Point to DDR */
-	rc = hl_pci_set_dram_bar_base(hdev, 1, 4, addr);
+	pci_region.mode = PCI_BAR_MATCH_MODE;
+	pci_region.bar = DDR_BAR_ID;
+	pci_region.addr = addr;
+	rc = hl_pci_set_inbound_region(hdev, 1, &pci_region);
 	if (rc)
 		return U64_MAX;
 
@@ -487,8 +491,35 @@ static u64 goya_set_ddr_bar_base(struct hl_device *hdev, u64 addr)
  */
 static int goya_init_iatu(struct hl_device *hdev)
 {
-	return hl_pci_init_iatu(hdev, SRAM_BASE_ADDR, DRAM_PHYS_BASE,
-				HOST_PHYS_BASE, HOST_PHYS_SIZE);
+	struct hl_inbound_pci_region inbound_region;
+	struct hl_outbound_pci_region outbound_region;
+	int rc;
+
+	/* Inbound Region 0 - Bar 0 - Point to SRAM and CFG */
+	inbound_region.mode = PCI_BAR_MATCH_MODE;
+	inbound_region.bar = SRAM_CFG_BAR_ID;
+	inbound_region.addr = SRAM_BASE_ADDR;
+	rc = hl_pci_set_inbound_region(hdev, 0, &inbound_region);
+	if (rc)
+		goto done;
+
+	/* Inbound Region 1 - Bar 4 - Point to DDR */
+	inbound_region.mode = PCI_BAR_MATCH_MODE;
+	inbound_region.bar = DDR_BAR_ID;
+	inbound_region.addr = DRAM_PHYS_BASE;
+	rc = hl_pci_set_inbound_region(hdev, 1, &inbound_region);
+	if (rc)
+		goto done;
+
+	hdev->asic_funcs->set_dma_mask_from_fw(hdev);
+
+	/* Outbound Region 0 - Point to Host  */
+	outbound_region.addr = HOST_PHYS_BASE;
+	outbound_region.size = HOST_PHYS_SIZE;
+	rc = hl_pci_set_outbound_region(hdev, &outbound_region);
+
+done:
+	return rc;
 }
 
 /*
diff --git a/drivers/misc/habanalabs/habanalabs.h b/drivers/misc/habanalabs/habanalabs.h
index ae781453a509..365236589bbf 100644
--- a/drivers/misc/habanalabs/habanalabs.h
+++ b/drivers/misc/habanalabs/habanalabs.h
@@ -66,6 +66,8 @@
 #define IS_POWER_OF_2(n)		(n != 0 && ((n & (n - 1)) == 0))
 #define IS_MAX_PENDING_CS_VALID(n)	(IS_POWER_OF_2(n) && (n > 1))
 
+#define HL_PCI_NUM_BARS			6
+
 /**
  * struct pgt_info - MMU hop page info.
  * @node: hash linked-list node for the pgts shadow hash of pgts.
@@ -90,6 +92,16 @@ struct pgt_info {
 struct hl_device;
 struct hl_fpriv;
 
+/**
+ * enum hl_pci_match_mode - pci match mode per region
+ * @PCI_ADDRESS_MATCH_MODE: address match mode
+ * @PCI_BAR_MATCH_MODE: bar match mode
+ */
+enum hl_pci_match_mode {
+	PCI_ADDRESS_MATCH_MODE,
+	PCI_BAR_MATCH_MODE
+};
+
 /**
  * enum hl_fw_component - F/W components to read version through registers.
  * @FW_COMP_UBOOT: u-boot.
@@ -125,6 +137,32 @@ enum hl_cs_type {
 	CS_TYPE_WAIT
 };
 
+/*
+ * struct hl_inbound_pci_region - inbound region descriptor
+ * @mode: pci match mode for this region
+ * @addr: region target address
+ * @size: region size in bytes
+ * @offset_in_bar: offset within bar (address match mode)
+ * @bar: bar id
+ */
+struct hl_inbound_pci_region {
+	enum hl_pci_match_mode	mode;
+	u64			addr;
+	u64			size;
+	u64			offset_in_bar;
+	u8			bar;
+};
+
+/*
+ * struct hl_outbound_pci_region - outbound region descriptor
+ * @addr: region target address
+ * @size: region size in bytes
+ */
+struct hl_outbound_pci_region {
+	u64	addr;
+	u64	size;
+};
+
 /*
  * struct hl_hw_sob - H/W SOB info.
  * @hdev: habanalabs device structure.
@@ -1347,7 +1385,9 @@ struct hl_device_idle_busy_ts {
 /**
  * struct hl_device - habanalabs device structure.
  * @pdev: pointer to PCI device, can be NULL in case of simulator device.
- * @pcie_bar: array of available PCIe bars.
+ * @pcie_bar_phys: array of available PCIe bars physical addresses.
+ *		   (required only for PCI address match mode)
+ * @pcie_bar: array of available PCIe bars virtual addresses.
  * @rmmio: configuration area address on SRAM.
  * @cdev: related char device.
  * @cdev_ctrl: char device for control operations only (INFO IOCTL)
@@ -1442,7 +1482,8 @@ struct hl_device_idle_busy_ts {
  */
 struct hl_device {
 	struct pci_dev			*pdev;
-	void __iomem			*pcie_bar[6];
+	u64				pcie_bar_phys[HL_PCI_NUM_BARS];
+	void __iomem			*pcie_bar[HL_PCI_NUM_BARS];
 	void __iomem			*rmmio;
 	struct cdev			cdev;
 	struct cdev			cdev_ctrl;
@@ -1767,9 +1808,10 @@ int hl_pci_bars_map(struct hl_device *hdev, const char * const name[3],
 int hl_pci_iatu_write(struct hl_device *hdev, u32 addr, u32 data);
 int hl_pci_set_dram_bar_base(struct hl_device *hdev, u8 inbound_region, u8 bar,
 				u64 addr);
-int hl_pci_init_iatu(struct hl_device *hdev, u64 sram_base_address,
-			u64 dram_base_address, u64 host_phys_base_address,
-			u64 host_phys_size);
+int hl_pci_set_inbound_region(struct hl_device *hdev, u8 region,
+		struct hl_inbound_pci_region *pci_region);
+int hl_pci_set_outbound_region(struct hl_device *hdev,
+		struct hl_outbound_pci_region *pci_region);
 int hl_pci_init(struct hl_device *hdev);
 void hl_pci_fini(struct hl_device *hdev);
 
diff --git a/drivers/misc/habanalabs/pci.c b/drivers/misc/habanalabs/pci.c
index 61a8bb07262c..1791f6623c69 100644
--- a/drivers/misc/habanalabs/pci.c
+++ b/drivers/misc/habanalabs/pci.c
@@ -9,9 +9,15 @@
 #include "include/hw_ip/pci/pci_general.h"
 
 #include <linux/pci.h>
+#include <linux/bitfield.h>
 
 #define HL_PLDM_PCI_ELBI_TIMEOUT_MSEC	(HL_PCI_ELBI_TIMEOUT_MSEC * 10)
 
+#define IATU_REGION_CTRL_REGION_EN_MASK		BIT(31)
+#define IATU_REGION_CTRL_MATCH_MODE_MASK	BIT(30)
+#define IATU_REGION_CTRL_NUM_MATCH_EN_MASK	BIT(19)
+#define IATU_REGION_CTRL_BAR_NUM_MASK		GENMASK(10, 8)
+
 /**
  * hl_pci_bars_map() - Map PCI BARs.
  * @hdev: Pointer to hl_device structure.
@@ -187,110 +193,94 @@ static void hl_pci_reset_link_through_bridge(struct hl_device *hdev)
 }
 
 /**
- * hl_pci_set_dram_bar_base() - Set DDR BAR to map specific device address.
+ * hl_pci_set_inbound_region() - Configure inbound region
  * @hdev: Pointer to hl_device structure.
- * @inbound_region: Inbound region number.
- * @bar: PCI BAR number.
- * @addr: Address in DRAM. Must be aligned to DRAM bar size.
+ * @region: Inbound region number.
+ * @pci_region: Inbound region parameters.
  *
- * Configure the iATU so that the DRAM bar will start at the specified address.
+ * Configure the iATU inbound region.
  *
  * Return: 0 on success, negative value for failure.
  */
-int hl_pci_set_dram_bar_base(struct hl_device *hdev, u8 inbound_region, u8 bar,
-				u64 addr)
+int hl_pci_set_inbound_region(struct hl_device *hdev, u8 region,
+		struct hl_inbound_pci_region *pci_region)
 {
 	struct asic_fixed_properties *prop = &hdev->asic_prop;
-	u32 offset;
-	int rc;
+	u64 bar_phys_base, region_base, region_end_address;
+	u32 offset, ctrl_reg_val;
+	int rc = 0;
 
-	switch (inbound_region) {
-	case 0:
-		offset = 0x100;
-		break;
-	case 1:
-		offset = 0x300;
-		break;
-	case 2:
-		offset = 0x500;
-		break;
-	default:
-		dev_err(hdev->dev, "Invalid inbound region %d\n",
-			inbound_region);
-		return -EINVAL;
-	}
+	/* region offset */
+	offset = (0x200 * region) + 0x100;
+
+	if (pci_region->mode == PCI_ADDRESS_MATCH_MODE) {
+		bar_phys_base = hdev->pcie_bar_phys[pci_region->bar];
+		region_base = bar_phys_base + pci_region->offset_in_bar;
+		region_end_address = region_base + pci_region->size - 1;
 
-	if (bar != 0 && bar != 2 && bar != 4) {
-		dev_err(hdev->dev, "Invalid PCI BAR %d\n", bar);
-		return -EINVAL;
+		rc |= hl_pci_iatu_write(hdev, offset + 0x8,
+				lower_32_bits(region_base));
+		rc |= hl_pci_iatu_write(hdev, offset + 0xC,
+				upper_32_bits(region_base));
+		rc |= hl_pci_iatu_write(hdev, offset + 0x10,
+				lower_32_bits(region_end_address));
 	}
 
 	/* Point to the specified address */
-	rc = hl_pci_iatu_write(hdev, offset + 0x14, lower_32_bits(addr));
-	rc |= hl_pci_iatu_write(hdev, offset + 0x18, upper_32_bits(addr));
+	rc |= hl_pci_iatu_write(hdev, offset + 0x14,
+			lower_32_bits(pci_region->addr));
+	rc |= hl_pci_iatu_write(hdev, offset + 0x18,
+			upper_32_bits(pci_region->addr));
 	rc |= hl_pci_iatu_write(hdev, offset + 0x0, 0);
-	/* Enable + BAR match + match enable + BAR number */
-	rc |= hl_pci_iatu_write(hdev, offset + 0x4, 0xC0080000 | (bar << 8));
+
+	/* Enable + bar/address match + match enable + bar number */
+	ctrl_reg_val = FIELD_PREP(IATU_REGION_CTRL_REGION_EN_MASK, 1);
+	ctrl_reg_val |= FIELD_PREP(IATU_REGION_CTRL_MATCH_MODE_MASK,
+			pci_region->mode);
+	ctrl_reg_val |= FIELD_PREP(IATU_REGION_CTRL_NUM_MATCH_EN_MASK, 1);
+
+	if (pci_region->mode == PCI_BAR_MATCH_MODE)
+		ctrl_reg_val |= FIELD_PREP(IATU_REGION_CTRL_BAR_NUM_MASK,
+				pci_region->bar);
+
+	rc |= hl_pci_iatu_write(hdev, offset + 0x4, ctrl_reg_val);
 
 	/* Return the DBI window to the default location */
 	rc |= hl_pci_elbi_write(hdev, prop->pcie_aux_dbi_reg_addr, 0);
 	rc |= hl_pci_elbi_write(hdev, prop->pcie_aux_dbi_reg_addr + 4, 0);
 
 	if (rc)
-		dev_err(hdev->dev, "failed to map DRAM bar to 0x%08llx\n",
-			addr);
+		dev_err(hdev->dev, "failed to map bar %u to 0x%08llx\n",
+				pci_region->bar, pci_region->addr);
 
 	return rc;
 }
 
 /**
- * hl_pci_init_iatu() - Initialize the iATU unit inside the PCI controller.
+ * hl_pci_set_outbound_region() - Configure outbound region 0
  * @hdev: Pointer to hl_device structure.
- * @sram_base_address: SRAM base address.
- * @dram_base_address: DRAM base address.
- * @host_phys_base_address: Base physical address of host memory for device
- *                          transactions.
- * @host_phys_size: Size of host memory for device transactions.
+ * @pci_region: Outbound region parameters.
  *
- * This is needed in case the firmware doesn't initialize the iATU.
+ * Configure the iATU outbound region 0.
  *
  * Return: 0 on success, negative value for failure.
  */
-int hl_pci_init_iatu(struct hl_device *hdev, u64 sram_base_address,
-			u64 dram_base_address, u64 host_phys_base_address,
-			u64 host_phys_size)
+int hl_pci_set_outbound_region(struct hl_device *hdev,
+		struct hl_outbound_pci_region *pci_region)
 {
 	struct asic_fixed_properties *prop = &hdev->asic_prop;
-	u64 host_phys_end_addr;
+	u64 outbound_region_end_address;
 	int rc = 0;
 
-	/* Inbound Region 0 - Bar 0 - Point to SRAM base address */
-	rc  = hl_pci_iatu_write(hdev, 0x114, lower_32_bits(sram_base_address));
-	rc |= hl_pci_iatu_write(hdev, 0x118, upper_32_bits(sram_base_address));
-	rc |= hl_pci_iatu_write(hdev, 0x100, 0);
-	/* Enable + Bar match + match enable */
-	rc |= hl_pci_iatu_write(hdev, 0x104, 0xC0080000);
-
-	/* Return the DBI window to the default location */
-	rc |= hl_pci_elbi_write(hdev, prop->pcie_aux_dbi_reg_addr, 0);
-	rc |= hl_pci_elbi_write(hdev, prop->pcie_aux_dbi_reg_addr + 4, 0);
-
-	hdev->asic_funcs->set_dma_mask_from_fw(hdev);
-
-	/* Point to DRAM */
-	if (!hdev->asic_funcs->set_dram_bar_base)
-		return -EINVAL;
-	if (hdev->asic_funcs->set_dram_bar_base(hdev, dram_base_address) ==
-								U64_MAX)
-		return -EIO;
-
-	/* Outbound Region 0 - Point to Host */
-	host_phys_end_addr = host_phys_base_address + host_phys_size - 1;
+	/* Outbound Region 0 */
+	outbound_region_end_address =
+			pci_region->addr + pci_region->size - 1;
 	rc |= hl_pci_iatu_write(hdev, 0x008,
-				lower_32_bits(host_phys_base_address));
+				lower_32_bits(pci_region->addr));
 	rc |= hl_pci_iatu_write(hdev, 0x00C,
-				upper_32_bits(host_phys_base_address));
-	rc |= hl_pci_iatu_write(hdev, 0x010, lower_32_bits(host_phys_end_addr));
+				upper_32_bits(pci_region->addr));
+	rc |= hl_pci_iatu_write(hdev, 0x010,
+				lower_32_bits(outbound_region_end_address));
 	rc |= hl_pci_iatu_write(hdev, 0x014, 0);
 
 	if ((hdev->power9_64bit_dma_enable) && (hdev->dma_mask == 64))
@@ -298,7 +288,8 @@ int hl_pci_init_iatu(struct hl_device *hdev, u64 sram_base_address,
 	else
 		rc |= hl_pci_iatu_write(hdev, 0x018, 0);
 
-	rc |= hl_pci_iatu_write(hdev, 0x020, upper_32_bits(host_phys_end_addr));
+	rc |= hl_pci_iatu_write(hdev, 0x020,
+				upper_32_bits(outbound_region_end_address));
 	/* Increase region size */
 	rc |= hl_pci_iatu_write(hdev, 0x000, 0x00002000);
 	/* Enable */
@@ -308,10 +299,7 @@ int hl_pci_init_iatu(struct hl_device *hdev, u64 sram_base_address,
 	rc |= hl_pci_elbi_write(hdev, prop->pcie_aux_dbi_reg_addr, 0);
 	rc |= hl_pci_elbi_write(hdev, prop->pcie_aux_dbi_reg_addr + 4, 0);
 
-	if (rc)
-		return -EIO;
-
-	return 0;
+	return rc;
 }
 
 /**
-- 
2.17.1



* [PATCH 7/9] habanalabs: remove soft-reset support from GAUDI
  2020-07-05 13:12 [PATCH 1/9] habanalabs: Increase queues depth Oded Gabbay
                   ` (4 preceding siblings ...)
  2020-07-05 13:12 ` [PATCH 6/9] habanalabs: PCIe iATU refactoring Oded Gabbay
@ 2020-07-05 13:12 ` Oded Gabbay
  2020-07-05 13:30   ` Tomer Tayar
  2020-07-05 13:12 ` [PATCH 8/9] habanalabs: configure maximum queues per asic Oded Gabbay
  2020-07-05 13:12 ` [PATCH 9/9] habanalabs: use queue pi/ci in order to determine queue occupancy Oded Gabbay
  7 siblings, 1 reply; 11+ messages in thread
From: Oded Gabbay @ 2020-07-05 13:12 UTC (permalink / raw)
  To: linux-kernel, SW_Drivers

Soft-reset isn't supported in GAUDI. Remove the code that performs it and
print an error if the user requests it via sysfs.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
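
The boot strap workaround kept on the hard-reset path ("wdata[31:0] =
rdata[30:21],1'b0,rdata[20:0]") can be checked in isolation. This is a
minimal sketch with hypothetical names (strap_workaround(),
STRAP_HIGH_MASK, STRAP_LOW_MASK); only the two mask values and the
shift come from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define STRAP_HIGH_MASK	UINT32_C(0x7FE00000)	/* rdata[30:21] */
#define STRAP_LOW_MASK	UINT32_C(0x001FFFFF)	/* rdata[20:0]  */

static_assert((STRAP_HIGH_MASK & STRAP_LOW_MASK) == 0,
	      "the two fields must not overlap");

/*
 * Move rdata[30:21] up by one bit and keep rdata[20:0] in place, so
 * that bit 21 of the result is always zero and bit 31 of the input
 * is dropped.
 */
static uint32_t strap_workaround(uint32_t rdata)
{
	return ((rdata & STRAP_HIGH_MASK) << 1) | (rdata & STRAP_LOW_MASK);
}
```

The driver additionally clears bit 1 (boot_strap & ~0x2) when writing
the value back; that step is left out of the sketch.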
 drivers/misc/habanalabs/gaudi/gaudi.c | 76 ++++++++++-----------------
 1 file changed, 27 insertions(+), 49 deletions(-)

diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index a6e40dec3cd2..d3869209dbc6 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -75,7 +75,6 @@
 
 #define GAUDI_PLDM_RESET_WAIT_MSEC	1000		/* 1s */
 #define GAUDI_PLDM_HRESET_TIMEOUT_MSEC	20000		/* 20s */
-#define GAUDI_PLDM_SRESET_TIMEOUT_MSEC	14000		/* 14s */
 #define GAUDI_PLDM_TEST_QUEUE_WAIT_USEC	1000000		/* 1s */
 #define GAUDI_PLDM_MMU_TIMEOUT_USEC	(MMU_CONFIG_TIMEOUT_USEC * 100)
 #define GAUDI_PLDM_QMAN0_TIMEOUT_USEC	(HL_DEVICE_TIMEOUT_USEC * 30)
@@ -2969,46 +2968,37 @@ static void gaudi_hw_fini(struct hl_device *hdev, bool hard_reset)
 	struct gaudi_device *gaudi = hdev->asic_specific;
 	u32 status, reset_timeout_ms, boot_strap = 0;
 
-	if (hdev->pldm) {
-		if (hard_reset)
-			reset_timeout_ms = GAUDI_PLDM_HRESET_TIMEOUT_MSEC;
-		else
-			reset_timeout_ms = GAUDI_PLDM_SRESET_TIMEOUT_MSEC;
-	} else {
-		reset_timeout_ms = GAUDI_RESET_TIMEOUT_MSEC;
+	if (!hard_reset) {
+		dev_err(hdev->dev, "GAUDI doesn't support soft-reset\n");
+		return;
 	}
 
-	if (hard_reset) {
-		/* Tell ASIC not to re-initialize PCIe */
-		WREG32(mmPREBOOT_PCIE_EN, LKD_HARD_RESET_MAGIC);
+	if (hdev->pldm)
+		reset_timeout_ms = GAUDI_PLDM_HRESET_TIMEOUT_MSEC;
+	else
+		reset_timeout_ms = GAUDI_RESET_TIMEOUT_MSEC;
 
-		boot_strap = RREG32(mmPSOC_GLOBAL_CONF_BOOT_STRAP_PINS);
-		/* H/W bug WA:
-		 * rdata[31:0] = strap_read_val;
-		 * wdata[31:0] = rdata[30:21],1'b0,rdata[20:0]
-		 */
-		boot_strap = (((boot_strap & 0x7FE00000) << 1) |
-				(boot_strap & 0x001FFFFF));
-		WREG32(mmPSOC_GLOBAL_CONF_BOOT_STRAP_PINS, boot_strap & ~0x2);
-
-		/* Restart BTL/BLR upon hard-reset */
-		WREG32(mmPSOC_GLOBAL_CONF_BOOT_SEQ_RE_START, 1);
-
-		WREG32(mmPSOC_GLOBAL_CONF_SW_ALL_RST,
-				1 << PSOC_GLOBAL_CONF_SW_ALL_RST_IND_SHIFT);
-		dev_info(hdev->dev,
-			"Issued HARD reset command, going to wait %dms\n",
-			reset_timeout_ms);
-	} else {
-		/* Don't restart BTL/BLR upon soft-reset */
-		WREG32(mmPSOC_GLOBAL_CONF_BOOT_SEQ_RE_START, 0);
+	/* Tell ASIC not to re-initialize PCIe */
+	WREG32(mmPREBOOT_PCIE_EN, LKD_HARD_RESET_MAGIC);
 
-		WREG32(mmPSOC_GLOBAL_CONF_SOFT_RST,
-				1 << PSOC_GLOBAL_CONF_SOFT_RST_IND_SHIFT);
-		dev_info(hdev->dev,
-			"Issued SOFT reset command, going to wait %dms\n",
-			reset_timeout_ms);
-	}
+	boot_strap = RREG32(mmPSOC_GLOBAL_CONF_BOOT_STRAP_PINS);
+
+	/* H/W bug WA:
+	 * rdata[31:0] = strap_read_val;
+	 * wdata[31:0] = rdata[30:21],1'b0,rdata[20:0]
+	 */
+	boot_strap = (((boot_strap & 0x7FE00000) << 1) |
+			(boot_strap & 0x001FFFFF));
+	WREG32(mmPSOC_GLOBAL_CONF_BOOT_STRAP_PINS, boot_strap & ~0x2);
+
+	/* Restart BTL/BLR upon hard-reset */
+	WREG32(mmPSOC_GLOBAL_CONF_BOOT_SEQ_RE_START, 1);
+
+	WREG32(mmPSOC_GLOBAL_CONF_SW_ALL_RST,
+			1 << PSOC_GLOBAL_CONF_SW_ALL_RST_IND_SHIFT);
+	dev_info(hdev->dev,
+		"Issued HARD reset command, going to wait %dms\n",
+		reset_timeout_ms);
 
 	/*
 	 * After hard reset, we can't poll the BTM_FSM register because the PSOC
@@ -3022,18 +3012,6 @@ static void gaudi_hw_fini(struct hl_device *hdev, bool hard_reset)
 			"Timeout while waiting for device to reset 0x%x\n",
 			status);
 
-	if (!hard_reset) {
-		gaudi->hw_cap_initialized &= ~(HW_CAP_PCI_DMA | HW_CAP_MME |
-						HW_CAP_TPC_MASK |
-						HW_CAP_HBM_DMA);
-
-		WREG32(mmGIC_DISTRIBUTOR__5_GICD_SETSPI_NSR,
-				GAUDI_EVENT_SOFT_RESET);
-		return;
-	}
-
-	/* We continue here only for hard-reset */
-
 	WREG32(mmPSOC_GLOBAL_CONF_BOOT_STRAP_PINS, boot_strap);
 
 	gaudi->hw_cap_initialized &= ~(HW_CAP_CPU | HW_CAP_CPU_Q |
-- 
2.17.1


* [PATCH 8/9] habanalabs: configure maximum queues per asic
  2020-07-05 13:12 [PATCH 1/9] habanalabs: Increase queues depth Oded Gabbay
                   ` (5 preceding siblings ...)
  2020-07-05 13:12 ` [PATCH 7/9] habanalabs: remove soft-reset support from GAUDI Oded Gabbay
@ 2020-07-05 13:12 ` Oded Gabbay
  2020-07-05 13:12 ` [PATCH 9/9] habanalabs: use queue pi/ci in order to determine queue occupancy Oded Gabbay
  7 siblings, 0 replies; 11+ messages in thread
From: Oded Gabbay @ 2020-07-05 13:12 UTC (permalink / raw)
  To: linux-kernel, SW_Drivers; +Cc: Ofir Bitton

From: Ofir Bitton <obitton@habana.ai>

Currently the maximum number of queues is statically configured.
Using a static value causes redundant cycles when traversing
all queues and consumes more memory than is actually needed.
In this patch we configure each ASIC with the exact number of
queues it needs.
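
For illustration only, the idea can be sketched in userspace C as below.
This is not the driver code: struct asic_props, struct queue_props and the
asic_props_*() helpers are hypothetical stand-ins, and calloc()/free()
stand in for kcalloc()/kfree(). The sketch assumes the "not available"
queue type is the zero value, so a zeroed allocation needs no extra loop
to mark unused entries.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the driver's structures. */
enum queue_type { QUEUE_TYPE_NA = 0, QUEUE_TYPE_EXT, QUEUE_TYPE_INT };

struct queue_props {
	enum queue_type type;
};

struct asic_props {
	struct queue_props *queues;	/* sized per ASIC, not a fixed array */
	uint32_t max_queues;		/* exact number of entries in queues */
};

/* Allocate exactly max_queues zeroed entries. Returns 0, or -1 on failure. */
static int asic_props_init(struct asic_props *prop, uint32_t max_queues)
{
	assert(prop != NULL);

	prop->queues = calloc(max_queues, sizeof(*prop->queues));
	if (!prop->queues)
		return -1;

	prop->max_queues = max_queues;
	return 0;
}

/* Walk only the queues this ASIC has and count those of a given type. */
static uint32_t asic_props_count(const struct asic_props *prop,
				 enum queue_type type)
{
	uint32_t i, cnt = 0;

	for (i = 0; i < prop->max_queues; i++)
		if (prop->queues[i].type == type)
			cnt++;

	return cnt;
}

/* Release the array and reset the bookkeeping. */
static void asic_props_fini(struct asic_props *prop)
{
	free(prop->queues);
	prop->queues = NULL;
	prop->max_queues = 0;
}
```

Every loop is bounded by the per-ASIC max_queues value, so an ASIC with
few queues neither iterates over nor allocates unused entries.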

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/command_submission.c | 14 ++++++--
 drivers/misc/habanalabs/gaudi/gaudi.c        | 33 +++++++++++--------
 drivers/misc/habanalabs/goya/goya.c          | 34 +++++++++++++++-----
 drivers/misc/habanalabs/goya/goyaP.h         |  6 +---
 drivers/misc/habanalabs/habanalabs.h         | 10 +++---
 drivers/misc/habanalabs/hw_queue.c           | 19 +++++++----
 6 files changed, 74 insertions(+), 42 deletions(-)

diff --git a/drivers/misc/habanalabs/command_submission.c b/drivers/misc/habanalabs/command_submission.c
index 777f88d25acd..1ba937b9a22e 100644
--- a/drivers/misc/habanalabs/command_submission.c
+++ b/drivers/misc/habanalabs/command_submission.c
@@ -363,6 +363,7 @@ static void cs_do_release(struct kref *ref)
 
 	cs_counters_aggregate(hdev, cs->ctx);
 
+	kfree(cs->jobs_in_queue_cnt);
 	kfree(cs);
 }
 
@@ -435,13 +436,19 @@ static int allocate_cs(struct hl_device *hdev, struct hl_ctx *ctx,
 	other = ctx->cs_pending[cs_cmpl->cs_seq &
 				(hdev->asic_prop.max_pending_cs - 1)];
 	if ((other) && (!dma_fence_is_signaled(other))) {
-		spin_unlock(&ctx->cs_lock);
 		dev_dbg(hdev->dev,
 			"Rejecting CS because of too many in-flights CS\n");
 		rc = -EAGAIN;
 		goto free_fence;
 	}
 
+	cs->jobs_in_queue_cnt = kcalloc(hdev->asic_prop.max_queues,
+			sizeof(*cs->jobs_in_queue_cnt), GFP_ATOMIC);
+	if (!cs->jobs_in_queue_cnt) {
+		rc = -ENOMEM;
+		goto free_fence;
+	}
+
 	dma_fence_init(&cs_cmpl->base_fence, &hl_fence_ops, &cs_cmpl->lock,
 			ctx->asid, ctx->cs_sequence);
 
@@ -463,6 +470,7 @@ static int allocate_cs(struct hl_device *hdev, struct hl_ctx *ctx,
 	return 0;
 
 free_fence:
+	spin_unlock(&ctx->cs_lock);
 	kfree(cs_cmpl);
 free_cs:
 	kfree(cs);
@@ -517,7 +525,7 @@ static int validate_queue_index(struct hl_device *hdev,
 
 	hw_queue_prop = &asic->hw_queues_props[chunk->queue_index];
 
-	if ((chunk->queue_index >= HL_MAX_QUEUES) ||
+	if ((chunk->queue_index >= asic->max_queues) ||
 			(hw_queue_prop->type == QUEUE_TYPE_NA)) {
 		dev_err(hdev->dev, "Queue index %d is invalid\n",
 			chunk->queue_index);
@@ -795,7 +803,7 @@ static int cs_ioctl_signal_wait(struct hl_fpriv *hpriv, enum hl_cs_type cs_type,
 	hw_queue_prop = &hdev->asic_prop.hw_queues_props[q_idx];
 	q_type = hw_queue_prop->type;
 
-	if ((q_idx >= HL_MAX_QUEUES) ||
+	if ((q_idx >= hdev->asic_prop.max_queues) ||
 			(!hw_queue_prop->supports_sync_stream)) {
 		dev_err(hdev->dev, "Queue index %d is invalid\n", q_idx);
 		rc = -EINVAL;
diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index d3869209dbc6..3404d6b0453f 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -340,14 +340,15 @@ static int gaudi_get_fixed_properties(struct hl_device *hdev)
 	struct asic_fixed_properties *prop = &hdev->asic_prop;
 	int i;
 
-	if (GAUDI_QUEUE_ID_SIZE >= HL_MAX_QUEUES) {
-		dev_err(hdev->dev,
-			"Number of H/W queues must be smaller than %d\n",
-			HL_MAX_QUEUES);
-		return -EFAULT;
-	}
+	prop->max_queues = GAUDI_QUEUE_ID_SIZE;
+	prop->hw_queues_props = kcalloc(prop->max_queues,
+			sizeof(struct hw_queue_properties),
+			GFP_KERNEL);
 
-	for (i = 0 ; i < GAUDI_QUEUE_ID_SIZE ; i++) {
+	if (!prop->hw_queues_props)
+		return -ENOMEM;
+
+	for (i = 0 ; i < prop->max_queues ; i++) {
 		if (gaudi_queue_type[i] == QUEUE_TYPE_EXT) {
 			prop->hw_queues_props[i].type = QUEUE_TYPE_EXT;
 			prop->hw_queues_props[i].driver_only = 0;
@@ -370,9 +371,6 @@ static int gaudi_get_fixed_properties(struct hl_device *hdev)
 		}
 	}
 
-	for (; i < HL_MAX_QUEUES; i++)
-		prop->hw_queues_props[i].type = QUEUE_TYPE_NA;
-
 	prop->completion_queues_count = NUMBER_OF_CMPLT_QUEUES;
 	prop->sync_stream_first_sob = 0;
 	prop->sync_stream_first_mon = 0;
@@ -548,7 +546,8 @@ static int gaudi_early_init(struct hl_device *hdev)
 			(unsigned long long) pci_resource_len(pdev,
 							SRAM_BAR_ID),
 			SRAM_BAR_SIZE);
-		return -ENODEV;
+		rc = -ENODEV;
+		goto free_queue_props;
 	}
 
 	if (pci_resource_len(pdev, CFG_BAR_ID) != CFG_BAR_SIZE) {
@@ -558,20 +557,26 @@ static int gaudi_early_init(struct hl_device *hdev)
 			(unsigned long long) pci_resource_len(pdev,
 								CFG_BAR_ID),
 			CFG_BAR_SIZE);
-		return -ENODEV;
+		rc = -ENODEV;
+		goto free_queue_props;
 	}
 
 	prop->dram_pci_bar_size = pci_resource_len(pdev, HBM_BAR_ID);
 
 	rc = hl_pci_init(hdev);
 	if (rc)
-		return rc;
+		goto free_queue_props;
 
 	return 0;
+
+free_queue_props:
+	kfree(hdev->asic_prop.hw_queues_props);
+	return rc;
 }
 
 static int gaudi_early_fini(struct hl_device *hdev)
 {
+	kfree(hdev->asic_prop.hw_queues_props);
 	hl_pci_fini(hdev);
 
 	return 0;
@@ -3466,7 +3471,7 @@ static int gaudi_test_queues(struct hl_device *hdev)
 {
 	int i, rc, ret_val = 0;
 
-	for (i = 0 ; i < HL_MAX_QUEUES ; i++) {
+	for (i = 0 ; i < hdev->asic_prop.max_queues ; i++) {
 		if (hdev->asic_prop.hw_queues_props[i].type == QUEUE_TYPE_EXT) {
 			rc = gaudi_test_queue(hdev, i);
 			if (rc)
diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index 5839b5bc9ee3..36db771f391c 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -337,11 +337,19 @@ static int goya_mmu_set_dram_default_page(struct hl_device *hdev);
 static int goya_mmu_add_mappings_for_device_cpu(struct hl_device *hdev);
 static void goya_mmu_prepare(struct hl_device *hdev, u32 asid);
 
-void goya_get_fixed_properties(struct hl_device *hdev)
+int goya_get_fixed_properties(struct hl_device *hdev)
 {
 	struct asic_fixed_properties *prop = &hdev->asic_prop;
 	int i;
 
+	prop->max_queues = GOYA_QUEUE_ID_SIZE;
+	prop->hw_queues_props = kcalloc(prop->max_queues,
+			sizeof(struct hw_queue_properties),
+			GFP_KERNEL);
+
+	if (!prop->hw_queues_props)
+		return -ENOMEM;
+
 	for (i = 0 ; i < NUMBER_OF_EXT_HW_QUEUES ; i++) {
 		prop->hw_queues_props[i].type = QUEUE_TYPE_EXT;
 		prop->hw_queues_props[i].driver_only = 0;
@@ -361,9 +369,6 @@ void goya_get_fixed_properties(struct hl_device *hdev)
 		prop->hw_queues_props[i].requires_kernel_cb = 0;
 	}
 
-	for (; i < HL_MAX_QUEUES; i++)
-		prop->hw_queues_props[i].type = QUEUE_TYPE_NA;
-
 	prop->completion_queues_count = NUMBER_OF_CMPLT_QUEUES;
 
 	prop->dram_base_address = DRAM_PHYS_BASE;
@@ -428,6 +433,8 @@ void goya_get_fixed_properties(struct hl_device *hdev)
 		CARD_NAME_MAX_LEN);
 
 	prop->max_pending_cs = GOYA_MAX_PENDING_CS;
+
+	return 0;
 }
 
 /*
@@ -540,7 +547,11 @@ static int goya_early_init(struct hl_device *hdev)
 	u32 val;
 	int rc;
 
-	goya_get_fixed_properties(hdev);
+	rc = goya_get_fixed_properties(hdev);
+	if (rc) {
+		dev_err(hdev->dev, "Failed to get fixed properties\n");
+		return rc;
+	}
 
 	/* Check BAR sizes */
 	if (pci_resource_len(pdev, SRAM_CFG_BAR_ID) != CFG_BAR_SIZE) {
@@ -550,7 +561,8 @@ static int goya_early_init(struct hl_device *hdev)
 			(unsigned long long) pci_resource_len(pdev,
 							SRAM_CFG_BAR_ID),
 			CFG_BAR_SIZE);
-		return -ENODEV;
+		rc = -ENODEV;
+		goto free_queue_props;
 	}
 
 	if (pci_resource_len(pdev, MSIX_BAR_ID) != MSIX_BAR_SIZE) {
@@ -560,14 +572,15 @@ static int goya_early_init(struct hl_device *hdev)
 			(unsigned long long) pci_resource_len(pdev,
 								MSIX_BAR_ID),
 			MSIX_BAR_SIZE);
-		return -ENODEV;
+		rc = -ENODEV;
+		goto free_queue_props;
 	}
 
 	prop->dram_pci_bar_size = pci_resource_len(pdev, DDR_BAR_ID);
 
 	rc = hl_pci_init(hdev);
 	if (rc)
-		return rc;
+		goto free_queue_props;
 
 	if (!hdev->pldm) {
 		val = RREG32(mmPSOC_GLOBAL_CONF_BOOT_STRAP_PINS);
@@ -577,6 +590,10 @@ static int goya_early_init(struct hl_device *hdev)
 	}
 
 	return 0;
+
+free_queue_props:
+	kfree(hdev->asic_prop.hw_queues_props);
+	return rc;
 }
 
 /*
@@ -589,6 +606,7 @@ static int goya_early_init(struct hl_device *hdev)
  */
 static int goya_early_fini(struct hl_device *hdev)
 {
+	kfree(hdev->asic_prop.hw_queues_props);
 	hl_pci_fini(hdev);
 
 	return 0;
diff --git a/drivers/misc/habanalabs/goya/goyaP.h b/drivers/misc/habanalabs/goya/goyaP.h
index 9d8a1761252d..8265cc21b45a 100644
--- a/drivers/misc/habanalabs/goya/goyaP.h
+++ b/drivers/misc/habanalabs/goya/goyaP.h
@@ -31,10 +31,6 @@
  */
 #define NUMBER_OF_INTERRUPTS		(NUMBER_OF_CMPLT_QUEUES + 1)
 
-#if (NUMBER_OF_HW_QUEUES >= HL_MAX_QUEUES)
-#error "Number of H/W queues must be smaller than HL_MAX_QUEUES"
-#endif
-
 #if (NUMBER_OF_INTERRUPTS > GOYA_MSIX_ENTRIES)
 #error "Number of MSIX interrupts must be smaller or equal to GOYA_MSIX_ENTRIES"
 #endif
@@ -170,7 +166,7 @@ struct goya_device {
 	u8		device_cpu_mmu_mappings_done;
 };
 
-void goya_get_fixed_properties(struct hl_device *hdev);
+int goya_get_fixed_properties(struct hl_device *hdev);
 int goya_mmu_init(struct hl_device *hdev);
 void goya_init_dma_qmans(struct hl_device *hdev);
 void goya_init_mme_qmans(struct hl_device *hdev);
diff --git a/drivers/misc/habanalabs/habanalabs.h b/drivers/misc/habanalabs/habanalabs.h
index 365236589bbf..9213d107b533 100644
--- a/drivers/misc/habanalabs/habanalabs.h
+++ b/drivers/misc/habanalabs/habanalabs.h
@@ -41,8 +41,6 @@
 
 #define HL_SIM_MAX_TIMEOUT_US		10000000 /* 10s */
 
-#define HL_MAX_QUEUES			128
-
 #define HL_IDLE_BUSY_TS_ARR_SIZE	4096
 
 /* Memory */
@@ -290,14 +288,15 @@ struct hl_mmu_properties {
  * @high_pll: high PLL frequency used by the device.
  * @cb_pool_cb_cnt: number of CBs in the CB pool.
  * @cb_pool_cb_size: size of each CB in the CB pool.
- * @tpc_enabled_mask: which TPCs are enabled.
+ * @max_pending_cs: maximum of concurrent pending command submissions
+ * @max_queues: maximum amount of queues in the system
  * @sync_stream_first_sob: first sync object available for sync stream use
  * @sync_stream_first_mon: first monitor available for sync stream use
  * @tpc_enabled_mask: which TPCs are enabled.
  * @completion_queues_count: number of completion queues.
  */
 struct asic_fixed_properties {
-	struct hw_queue_properties	hw_queues_props[HL_MAX_QUEUES];
+	struct hw_queue_properties	*hw_queues_props;
 	struct armcp_info		armcp_info;
 	char				uboot_ver[VERSION_MAX_LEN];
 	char				preboot_ver[VERSION_MAX_LEN];
@@ -336,6 +335,7 @@ struct asic_fixed_properties {
 	u32				cb_pool_cb_cnt;
 	u32				cb_pool_cb_size;
 	u32				max_pending_cs;
+	u32				max_queues;
 	u16				sync_stream_first_sob;
 	u16				sync_stream_first_mon;
 	u8				tpc_enabled_mask;
@@ -901,7 +901,7 @@ struct hl_userptr {
  * @aborted: true if CS was aborted due to some device error.
  */
 struct hl_cs {
-	u16			jobs_in_queue_cnt[HL_MAX_QUEUES];
+	u16			*jobs_in_queue_cnt;
 	struct hl_ctx		*ctx;
 	struct list_head	job_list;
 	spinlock_t		job_lock;
diff --git a/drivers/misc/habanalabs/hw_queue.c b/drivers/misc/habanalabs/hw_queue.c
index da66ffb528f8..7965551587fc 100644
--- a/drivers/misc/habanalabs/hw_queue.c
+++ b/drivers/misc/habanalabs/hw_queue.c
@@ -46,7 +46,7 @@ void hl_int_hw_queue_update_ci(struct hl_cs *cs)
 		goto out;
 
 	q = &hdev->kernel_queues[0];
-	for (i = 0 ; i < HL_MAX_QUEUES ; i++, q++) {
+	for (i = 0 ; i < hdev->asic_prop.max_queues ; i++, q++) {
 		if (q->queue_type == QUEUE_TYPE_INT) {
 			q->ci += cs->jobs_in_queue_cnt[i];
 			q->ci &= ((q->int_queue_len << 1) - 1);
@@ -509,6 +509,7 @@ int hl_hw_queue_schedule_cs(struct hl_cs *cs)
 	struct hl_device *hdev = ctx->hdev;
 	struct hl_cs_job *job, *tmp;
 	struct hl_hw_queue *q;
+	u32 max_queues;
 	int rc = 0, i, cq_cnt;
 
 	hdev->asic_funcs->hw_queues_lock(hdev);
@@ -521,8 +522,10 @@ int hl_hw_queue_schedule_cs(struct hl_cs *cs)
 		goto out;
 	}
 
+	max_queues = hdev->asic_prop.max_queues;
+
 	q = &hdev->kernel_queues[0];
-	for (i = 0, cq_cnt = 0 ; i < HL_MAX_QUEUES ; i++, q++) {
+	for (i = 0, cq_cnt = 0 ; i < max_queues ; i++, q++) {
 		if (cs->jobs_in_queue_cnt[i]) {
 			switch (q->queue_type) {
 			case QUEUE_TYPE_EXT:
@@ -601,7 +604,7 @@ int hl_hw_queue_schedule_cs(struct hl_cs *cs)
 
 unroll_cq_resv:
 	q = &hdev->kernel_queues[0];
-	for (i = 0 ; (i < HL_MAX_QUEUES) && (cq_cnt > 0) ; i++, q++) {
+	for (i = 0 ; (i < max_queues) && (cq_cnt > 0) ; i++, q++) {
 		if ((q->queue_type == QUEUE_TYPE_EXT ||
 				q->queue_type == QUEUE_TYPE_HW) &&
 				cs->jobs_in_queue_cnt[i]) {
@@ -872,7 +875,7 @@ int hl_hw_queues_create(struct hl_device *hdev)
 	struct hl_hw_queue *q;
 	int i, rc, q_ready_cnt;
 
-	hdev->kernel_queues = kcalloc(HL_MAX_QUEUES,
+	hdev->kernel_queues = kcalloc(asic->max_queues,
 				sizeof(*hdev->kernel_queues), GFP_KERNEL);
 
 	if (!hdev->kernel_queues) {
@@ -882,7 +885,7 @@ int hl_hw_queues_create(struct hl_device *hdev)
 
 	/* Initialize the H/W queues */
 	for (i = 0, q_ready_cnt = 0, q = hdev->kernel_queues;
-			i < HL_MAX_QUEUES ; i++, q_ready_cnt++, q++) {
+			i < asic->max_queues ; i++, q_ready_cnt++, q++) {
 
 		q->queue_type = asic->hw_queues_props[i].type;
 		q->supports_sync_stream =
@@ -909,9 +912,10 @@ int hl_hw_queues_create(struct hl_device *hdev)
 void hl_hw_queues_destroy(struct hl_device *hdev)
 {
 	struct hl_hw_queue *q;
+	u32 max_queues = hdev->asic_prop.max_queues;
 	int i;
 
-	for (i = 0, q = hdev->kernel_queues ; i < HL_MAX_QUEUES ; i++, q++)
+	for (i = 0, q = hdev->kernel_queues ; i < max_queues ; i++, q++)
 		queue_fini(hdev, q);
 
 	kfree(hdev->kernel_queues);
@@ -920,9 +924,10 @@ void hl_hw_queues_destroy(struct hl_device *hdev)
 void hl_hw_queue_reset(struct hl_device *hdev, bool hard_reset)
 {
 	struct hl_hw_queue *q;
+	u32 max_queues = hdev->asic_prop.max_queues;
 	int i;
 
-	for (i = 0, q = hdev->kernel_queues ; i < HL_MAX_QUEUES ; i++, q++) {
+	for (i = 0, q = hdev->kernel_queues ; i < max_queues ; i++, q++) {
 		if ((!q->valid) ||
 			((!hard_reset) && (q->queue_type == QUEUE_TYPE_CPU)))
 			continue;
-- 
2.17.1


* [PATCH 9/9] habanalabs: use queue pi/ci in order to determine queue occupancy
  2020-07-05 13:12 [PATCH 1/9] habanalabs: Increase queues depth Oded Gabbay
                   ` (6 preceding siblings ...)
  2020-07-05 13:12 ` [PATCH 8/9] habanalabs: configure maximum queues per asic Oded Gabbay
@ 2020-07-05 13:12 ` Oded Gabbay
  7 siblings, 0 replies; 11+ messages in thread
From: Oded Gabbay @ 2020-07-05 13:12 UTC (permalink / raw)
  To: linux-kernel, SW_Drivers; +Cc: Ofir Bitton

From: Ofir Bitton <obitton@habana.ai>

Instead of using the number of free slots on the compute CQ to determine
whether we can submit work to queues, use each queue's pi/ci.

This is needed for future ASICs that don't have a CQ per queue.
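
As a rough userspace sketch of the occupancy check (not the driver code):
struct ring and the ring_*() helpers are hypothetical, C11 atomics stand in
for the kernel's atomic_t, and the queue length is assumed to be a power of
two. The consumer index is a free-running counter that is masked on read,
and the producer index wraps at twice the queue length so that a full
queue can be told apart from an empty one. The masked subtraction used
here is an equivalent formulation chosen for brevity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical ring descriptor; len must be a power of two. */
struct ring {
	uint32_t pi;	/* producer index, wraps at 2 * len */
	atomic_uint ci;	/* consumer index, free-running, masked on read */
	uint32_t len;	/* number of slots in the ring */
};

static void ring_init(struct ring *r, uint32_t len)
{
	assert(len != 0 && (len & (len - 1)) == 0);

	r->pi = 0;
	r->len = len;
	atomic_init(&r->ci, 0);
}

/* Read the consumer index folded into the [0, 2 * len) range. */
static uint32_t ring_ci_get(struct ring *r)
{
	return atomic_load(&r->ci) & ((r->len << 1) - 1);
}

/* Free slots = len minus the distance from ci to pi, modulo 2 * len. */
static uint32_t ring_free_slots(struct ring *r)
{
	uint32_t mask = (r->len << 1) - 1;
	uint32_t used = (r->pi - ring_ci_get(r)) & mask;

	return r->len - used;
}

/* Producer side: refuse the submission if there is not enough room. */
static int ring_produce(struct ring *r, uint32_t n)
{
	if (ring_free_slots(r) < n)
		return -1;

	r->pi = (r->pi + n) & ((r->len << 1) - 1);
	return 0;
}

/* Consumer side: only an atomic add, no lock and no masking needed. */
static void ring_consume(struct ring *r, uint32_t n)
{
	atomic_fetch_add(&r->ci, n);
}
```

Because the consumer only ever adds to ci atomically, the completion path
needs no queue lock, and the producer path derives the free space from
pi and ci alone.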

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/misc/habanalabs/device.c     | 17 +++---
 drivers/misc/habanalabs/habanalabs.h |  2 +-
 drivers/misc/habanalabs/hw_queue.c   | 82 +++++++++-------------------
 drivers/misc/habanalabs/irq.c        |  7 +--
 4 files changed, 39 insertions(+), 69 deletions(-)

diff --git a/drivers/misc/habanalabs/device.c b/drivers/misc/habanalabs/device.c
index 2b38a119704c..65a5a5c52a48 100644
--- a/drivers/misc/habanalabs/device.c
+++ b/drivers/misc/habanalabs/device.c
@@ -1144,14 +1144,17 @@ int hl_device_init(struct hl_device *hdev, struct class *hclass)
 	 * because there the addresses of the completion queues are being
 	 * passed as arguments to request_irq
 	 */
-	hdev->completion_queue = kcalloc(cq_cnt,
-						sizeof(*hdev->completion_queue),
-						GFP_KERNEL);
+	if (cq_cnt) {
+		hdev->completion_queue = kcalloc(cq_cnt,
+				sizeof(*hdev->completion_queue),
+				GFP_KERNEL);
 
-	if (!hdev->completion_queue) {
-		dev_err(hdev->dev, "failed to allocate completion queues\n");
-		rc = -ENOMEM;
-		goto hw_queues_destroy;
+		if (!hdev->completion_queue) {
+			dev_err(hdev->dev,
+				"failed to allocate completion queues\n");
+			rc = -ENOMEM;
+			goto hw_queues_destroy;
+		}
 	}
 
 	for (i = 0, cq_ready_cnt = 0 ; i < cq_cnt ; i++, cq_ready_cnt++) {
diff --git a/drivers/misc/habanalabs/habanalabs.h b/drivers/misc/habanalabs/habanalabs.h
index 9213d107b533..a61aab09778c 100644
--- a/drivers/misc/habanalabs/habanalabs.h
+++ b/drivers/misc/habanalabs/habanalabs.h
@@ -461,7 +461,7 @@ struct hl_hw_queue {
 	u64			kernel_address;
 	dma_addr_t		bus_address;
 	u32			pi;
-	u32			ci;
+	atomic_t		ci;
 	u32			hw_queue_id;
 	u32			cq_id;
 	u32			msi_vec;
diff --git a/drivers/misc/habanalabs/hw_queue.c b/drivers/misc/habanalabs/hw_queue.c
index 7965551587fc..474a0e8a7797 100644
--- a/drivers/misc/habanalabs/hw_queue.c
+++ b/drivers/misc/habanalabs/hw_queue.c
@@ -23,10 +23,14 @@ inline u32 hl_hw_queue_add_ptr(u32 ptr, u16 val)
 	ptr &= ((HL_QUEUE_LENGTH << 1) - 1);
 	return ptr;
 }
+static inline int queue_ci_get(atomic_t *ci, u32 queue_len)
+{
+	return atomic_read(ci) & ((queue_len << 1) - 1);
+}
 
 static inline int queue_free_slots(struct hl_hw_queue *q, u32 queue_len)
 {
-	int delta = (q->pi - q->ci);
+	int delta = (q->pi - queue_ci_get(&q->ci, queue_len));
 
 	if (delta >= 0)
 		return (queue_len - delta);
@@ -40,21 +44,14 @@ void hl_int_hw_queue_update_ci(struct hl_cs *cs)
 	struct hl_hw_queue *q;
 	int i;
 
-	hdev->asic_funcs->hw_queues_lock(hdev);
-
 	if (hdev->disabled)
-		goto out;
+		return;
 
 	q = &hdev->kernel_queues[0];
 	for (i = 0 ; i < hdev->asic_prop.max_queues ; i++, q++) {
-		if (q->queue_type == QUEUE_TYPE_INT) {
-			q->ci += cs->jobs_in_queue_cnt[i];
-			q->ci &= ((q->int_queue_len << 1) - 1);
-		}
+		if (q->queue_type == QUEUE_TYPE_INT)
+			atomic_add(cs->jobs_in_queue_cnt[i], &q->ci);
 	}
-
-out:
-	hdev->asic_funcs->hw_queues_unlock(hdev);
 }
 
 /*
@@ -174,38 +171,26 @@ static int int_queue_sanity_checks(struct hl_device *hdev,
 }
 
 /*
- * hw_queue_sanity_checks() - Perform some sanity checks on a H/W queue.
+ * hw_queue_sanity_checks() - Make sure we have enough space in the h/w queue
  * @hdev: Pointer to hl_device structure.
  * @q: Pointer to hl_hw_queue structure.
  * @num_of_entries: How many entries to check for space.
  *
- * Perform the following:
- * - Make sure we have enough space in the completion queue.
- *   This check also ensures that there is enough space in the h/w queue, as
- *   both queues are of the same size.
- * - Reserve space in the completion queue (needs to be reversed if there
- *   is a failure down the road before the actual submission of work).
+ * Notice: We do not reserve queue entries so this function mustn't be called
+ *         more than once per CS for the same queue
  *
- * Both operations are done using the "free_slots_cnt" field of the completion
- * queue. The CI counters of the queue and the completion queue are not
- * needed/used for the H/W queue type.
  */
 static int hw_queue_sanity_checks(struct hl_device *hdev, struct hl_hw_queue *q,
 					int num_of_entries)
 {
-	atomic_t *free_slots =
-			&hdev->completion_queue[q->cq_id].free_slots_cnt;
+	int free_slots_cnt;
 
-	/*
-	 * Check we have enough space in the completion queue.
-	 * Add -1 to counter (decrement) unless counter was already 0.
-	 * In that case, CQ is full so we can't submit a new CB.
-	 * atomic_add_unless will return 0 if counter was already 0.
-	 */
-	if (atomic_add_negative(num_of_entries * -1, free_slots)) {
-		dev_dbg(hdev->dev, "No space for %d entries on CQ %d\n",
-			num_of_entries, q->hw_queue_id);
-		atomic_add(num_of_entries, free_slots);
+	/* Check we have enough space in the queue */
+	free_slots_cnt = queue_free_slots(q, HL_QUEUE_LENGTH);
+
+	if (free_slots_cnt < num_of_entries) {
+		dev_dbg(hdev->dev, "Queue %d doesn't have room for %d CBs\n",
+			q->hw_queue_id, num_of_entries);
 		return -EAGAIN;
 	}
 
@@ -366,7 +351,6 @@ static void hw_queue_schedule_job(struct hl_cs_job *job)
 {
 	struct hl_device *hdev = job->cs->ctx->hdev;
 	struct hl_hw_queue *q = &hdev->kernel_queues[job->hw_queue_id];
-	struct hl_cq *cq;
 	u64 ptr;
 	u32 offset, ctl, len;
 
@@ -395,17 +379,6 @@ static void hw_queue_schedule_job(struct hl_cs_job *job)
 	else
 		ptr = (u64) (uintptr_t) job->user_cb;
 
-	/*
-	 * No need to protect pi_offset because scheduling to the
-	 * H/W queues is done under the scheduler mutex
-	 *
-	 * No need to check if CQ is full because it was already
-	 * checked in hw_queue_sanity_checks
-	 */
-	cq = &hdev->completion_queue[q->cq_id];
-
-	cq->pi = hl_cq_inc_ptr(cq->pi);
-
 	ext_and_hw_queue_submit_bd(hdev, q, ctl, len, ptr);
 }
 
@@ -552,8 +525,7 @@ int hl_hw_queue_schedule_cs(struct hl_cs *cs)
 				goto unroll_cq_resv;
 			}
 
-			if (q->queue_type == QUEUE_TYPE_EXT ||
-					q->queue_type == QUEUE_TYPE_HW)
+			if (q->queue_type == QUEUE_TYPE_EXT)
 				cq_cnt++;
 		}
 	}
@@ -605,9 +577,8 @@ int hl_hw_queue_schedule_cs(struct hl_cs *cs)
 unroll_cq_resv:
 	q = &hdev->kernel_queues[0];
 	for (i = 0 ; (i < max_queues) && (cq_cnt > 0) ; i++, q++) {
-		if ((q->queue_type == QUEUE_TYPE_EXT ||
-				q->queue_type == QUEUE_TYPE_HW) &&
-				cs->jobs_in_queue_cnt[i]) {
+		if ((q->queue_type == QUEUE_TYPE_EXT) &&
+						(cs->jobs_in_queue_cnt[i])) {
 			atomic_t *free_slots =
 				&hdev->completion_queue[i].free_slots_cnt;
 			atomic_add(cs->jobs_in_queue_cnt[i], free_slots);
@@ -631,7 +602,7 @@ void hl_hw_queue_inc_ci_kernel(struct hl_device *hdev, u32 hw_queue_id)
 {
 	struct hl_hw_queue *q = &hdev->kernel_queues[hw_queue_id];
 
-	q->ci = hl_queue_inc_ptr(q->ci);
+	atomic_inc(&q->ci);
 }
 
 static int ext_and_cpu_queue_init(struct hl_device *hdev, struct hl_hw_queue *q,
@@ -666,7 +637,7 @@ static int ext_and_cpu_queue_init(struct hl_device *hdev, struct hl_hw_queue *q,
 	}
 
 	/* Make sure read/write pointers are initialized to start of queue */
-	q->ci = 0;
+	atomic_set(&q->ci, 0);
 	q->pi = 0;
 
 	return 0;
@@ -700,7 +671,7 @@ static int int_queue_init(struct hl_device *hdev, struct hl_hw_queue *q)
 
 	q->kernel_address = (u64) (uintptr_t) p;
 	q->pi = 0;
-	q->ci = 0;
+	atomic_set(&q->ci, 0);
 
 	return 0;
 }
@@ -729,7 +700,7 @@ static int hw_queue_init(struct hl_device *hdev, struct hl_hw_queue *q)
 	q->kernel_address = (u64) (uintptr_t) p;
 
 	/* Make sure read/write pointers are initialized to start of queue */
-	q->ci = 0;
+	atomic_set(&q->ci, 0);
 	q->pi = 0;
 
 	return 0;
@@ -931,7 +902,8 @@ void hl_hw_queue_reset(struct hl_device *hdev, bool hard_reset)
 		if ((!q->valid) ||
 			((!hard_reset) && (q->queue_type == QUEUE_TYPE_CPU)))
 			continue;
-		q->pi = q->ci = 0;
+		q->pi = 0;
+		atomic_set(&q->ci, 0);
 
 		if (q->supports_sync_stream)
 			sync_stream_queue_reset(hdev, q->hw_queue_id);
diff --git a/drivers/misc/habanalabs/irq.c b/drivers/misc/habanalabs/irq.c
index 7a4878edb1a3..195a5ecba0e8 100644
--- a/drivers/misc/habanalabs/irq.c
+++ b/drivers/misc/habanalabs/irq.c
@@ -122,12 +122,7 @@ irqreturn_t hl_irq_handler_cq(int irq, void *arg)
 			queue_work(hdev->cq_wq, &job->finish_work);
 		}
 
-		/* Update ci of the context's queue. There is no
-		 * need to protect it with spinlock because this update is
-		 * done only inside IRQ and there is a different IRQ per
-		 * queue
-		 */
-		queue->ci = hl_queue_inc_ptr(queue->ci);
+		atomic_inc(&queue->ci);
 
 		/* Clear CQ entry ready bit */
 		cq_entry->data = cpu_to_le32(le32_to_cpu(cq_entry->data) &
-- 
2.17.1


* RE: [PATCH 2/9] habanalabs: rephrase error messages
  2020-07-05 13:12 ` [PATCH 2/9] habanalabs: rephrase error messages Oded Gabbay
@ 2020-07-05 13:29   ` Tomer Tayar
  0 siblings, 0 replies; 11+ messages in thread
From: Tomer Tayar @ 2020-07-05 13:29 UTC (permalink / raw)
  To: Oded Gabbay, linux-kernel, SW_Drivers

On Sun, Jul 5, 2020 at 16:13 Oded Gabbay <oded.gabbay@gmail.com> wrote:
> rephrase some error/warning/notice messages to make them more accessible
> to ordinary users.
> 
> There is no need to print context ASID as the driver currently doesn't
> support multiple contexts.
> 
> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>

Reviewed-by: Tomer Tayar <ttayar@habana.ai>

* RE: [PATCH 7/9] habanalabs: remove soft-reset support from GAUDI
  2020-07-05 13:12 ` [PATCH 7/9] habanalabs: remove soft-reset support from GAUDI Oded Gabbay
@ 2020-07-05 13:30   ` Tomer Tayar
  0 siblings, 0 replies; 11+ messages in thread
From: Tomer Tayar @ 2020-07-05 13:30 UTC (permalink / raw)
  To: Oded Gabbay, linux-kernel, SW_Drivers

On Sun, Jul 5, 2020 at 16:13 Oded Gabbay <oded.gabbay@gmail.com> wrote:
> Soft-reset isn't supported in GAUDI. Remove the code that performs it and
> print error in case the user wants to do it via sysfs.
> 
> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>

Reviewed-by: Tomer Tayar <ttayar@habana.ai>

end of thread, other threads:[~2020-07-05 13:30 UTC | newest]

Thread overview: 11+ messages
2020-07-05 13:12 [PATCH 1/9] habanalabs: Increase queues depth Oded Gabbay
2020-07-05 13:12 ` [PATCH 2/9] habanalabs: rephrase error messages Oded Gabbay
2020-07-05 13:29   ` Tomer Tayar
2020-07-05 13:12 ` [PATCH 3/9] habanalabs: extract cpu boot status lookup Oded Gabbay
2020-07-05 13:12 ` [PATCH 4/9] habanalabs: Add dropped cs statistics info struct Oded Gabbay
2020-07-05 13:12 ` [PATCH 5/9] habanalabs: Extract ECC information from FW Oded Gabbay
2020-07-05 13:12 ` [PATCH 6/9] habanalabs: PCIe iATU refactoring Oded Gabbay
2020-07-05 13:12 ` [PATCH 7/9] habanalabs: remove soft-reset support from GAUDI Oded Gabbay
2020-07-05 13:30   ` Tomer Tayar
2020-07-05 13:12 ` [PATCH 8/9] habanalabs: configure maximum queues per asic Oded Gabbay
2020-07-05 13:12 ` [PATCH 9/9] habanalabs: use queue pi/ci in order to determine queue occupancy Oded Gabbay
