linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] habanalabs: don't init vm module if no MMU
@ 2020-11-02 19:57 Oded Gabbay
  2020-11-02 19:57 ` [PATCH] habanalabs: minimize prints when everything is fine Oded Gabbay
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Oded Gabbay @ 2020-11-02 19:57 UTC (permalink / raw)
  To: linux-kernel; +Cc: SW_Drivers

In case we are running without MMU enabled (debug mode), no need to
initialize the VM module in the driver.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
 drivers/misc/habanalabs/common/memory.c | 33 +++++++++++--------------
 1 file changed, 14 insertions(+), 19 deletions(-)

diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
index 84227819e4d1..75dd18771868 100644
--- a/drivers/misc/habanalabs/common/memory.c
+++ b/drivers/misc/habanalabs/common/memory.c
@@ -1685,27 +1685,19 @@ int hl_vm_ctx_init(struct hl_ctx *ctx)
 	 *   In case of DRAM mapping, the returned address is the physical
 	 *   address of the memory related to the given handle.
 	 */
-	if (ctx->hdev->mmu_enable) {
-		dram_range_start = prop->dmmu.start_addr;
-		dram_range_end = prop->dmmu.end_addr;
-		host_range_start = prop->pmmu.start_addr;
-		host_range_end = prop->pmmu.end_addr;
-		host_huge_range_start = prop->pmmu_huge.start_addr;
-		host_huge_range_end = prop->pmmu_huge.end_addr;
-	} else {
-		dram_range_start = prop->dram_user_base_address;
-		dram_range_end = prop->dram_end_address;
-		host_range_start = prop->dram_user_base_address;
-		host_range_end = prop->dram_end_address;
-		host_huge_range_start = prop->dram_user_base_address;
-		host_huge_range_end = prop->dram_end_address;
-	}
+	if (!ctx->hdev->mmu_enable)
+		return 0;
+
+	dram_range_start = prop->dmmu.start_addr;
+	dram_range_end = prop->dmmu.end_addr;
+	host_range_start = prop->pmmu.start_addr;
+	host_range_end = prop->pmmu.end_addr;
+	host_huge_range_start = prop->pmmu_huge.start_addr;
+	host_huge_range_end = prop->pmmu_huge.end_addr;
 
 	return vm_ctx_init_with_ranges(ctx, host_range_start, host_range_end,
-					host_huge_range_start,
-					host_huge_range_end,
-					dram_range_start,
-					dram_range_end);
+				host_huge_range_start, host_huge_range_end,
+				dram_range_start, dram_range_end);
 }
 
 /*
@@ -1737,6 +1729,9 @@ void hl_vm_ctx_fini(struct hl_ctx *ctx)
 	struct hlist_node *tmp_node;
 	int i;
 
+	if (!ctx->hdev->mmu_enable)
+		return;
+
 	hl_debugfs_remove_ctx_mem_hash(hdev, ctx);
 
 	/*
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH] habanalabs: minimize prints when everything is fine
  2020-11-02 19:57 [PATCH] habanalabs: don't init vm module if no MMU Oded Gabbay
@ 2020-11-02 19:57 ` Oded Gabbay
  2020-11-02 19:57 ` [PATCH 1/4] habanalabs: sync stream structures refactor Oded Gabbay
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Oded Gabbay @ 2020-11-02 19:57 UTC (permalink / raw)
  To: linux-kernel; +Cc: SW_Drivers

No need to print when the driver starts to initialize the H/W. Drivers
should be silent when everything is OK.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
 drivers/misc/habanalabs/gaudi/gaudi.c | 2 --
 drivers/misc/habanalabs/goya/goya.c   | 2 --
 2 files changed, 4 deletions(-)

diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index 2910f427c716..9d9d22c4452c 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -2947,8 +2947,6 @@ static int gaudi_hw_init(struct hl_device *hdev)
 {
 	int rc;
 
-	dev_info(hdev->dev, "Starting initialization of H/W\n");
-
 	gaudi_pre_hw_init(hdev);
 
 	gaudi_init_pci_dma_qmans(hdev);
diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index 5db52064ed9e..f41fe748f1ca 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -2505,8 +2505,6 @@ static int goya_hw_init(struct hl_device *hdev)
 	struct asic_fixed_properties *prop = &hdev->asic_prop;
 	int rc;
 
-	dev_info(hdev->dev, "Starting initialization of H/W\n");
-
 	/* Perform read from the device to make sure device is up */
 	RREG32(mmPCIE_DBI_DEVICE_ID_VENDOR_ID_REG);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 1/4] habanalabs: sync stream structures refactor
  2020-11-02 19:57 [PATCH] habanalabs: don't init vm module if no MMU Oded Gabbay
  2020-11-02 19:57 ` [PATCH] habanalabs: minimize prints when everything is fine Oded Gabbay
@ 2020-11-02 19:57 ` Oded Gabbay
  2020-11-02 19:58 ` [PATCH 2/4] habanalabs: add support for multiple SOBs per monitor Oded Gabbay
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Oded Gabbay @ 2020-11-02 19:57 UTC (permalink / raw)
  To: linux-kernel; +Cc: SW_Drivers, Ofir Bitton

From: Ofir Bitton <obitton@habana.ai>

Refactor sync stream implementation by adding more structures for
better readability. In addition reducing allocated resources.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
 drivers/misc/habanalabs/common/habanalabs.h |  78 +++++---
 drivers/misc/habanalabs/common/hw_queue.c   | 197 +++++++++++---------
 drivers/misc/habanalabs/gaudi/gaudi.c       |  37 ++--
 drivers/misc/habanalabs/goya/goya.c         |   4 +-
 4 files changed, 180 insertions(+), 136 deletions(-)

diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index d12f11bdaacf..58b4097235d9 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -68,9 +68,6 @@
 #define HL_RSVD_SOBS			4
 #define HL_RSVD_MONS			2
 
-#define HL_RSVD_SOBS_IN_USE		2
-#define HL_RSVD_MONS_IN_USE		1
-
 #define HL_MAX_SOB_VAL			(1 << 15)
 
 #define IS_POWER_OF_2(n)		(n != 0 && ((n & (n - 1)) == 0))
@@ -80,6 +77,22 @@
 
 #define HL_MAX_DCORES			4
 
+/**
+ * struct hl_gen_wait_properties - properties for generating a wait CB
+ * @data: command buffer
+ * @q_idx: queue id is used to extract fence register address
+ * @sob_id: SOB id to use in this wait CB
+ * @sob_val: SOB value to wait for
+ * @mon_id: monitor to use in this wait CB
+ */
+struct hl_gen_wait_properties {
+	void	*data;
+	u32	q_idx;
+	u16	sob_id;
+	u16	sob_val;
+	u16	mon_id;
+};
+
 /**
  * struct pgt_info - MMU hop page info.
  * @node: hash linked-list node for the pgts shadow hash of pgts.
@@ -502,9 +515,27 @@ struct hl_cs_job;
 #define HL_CPU_ACCESSIBLE_MEM_SIZE	SZ_2M
 
 /**
- * struct hl_hw_queue - describes a H/W transport queue.
+ * struct hl_sync_stream_properties -
+ *     describes a H/W queue sync stream properties
  * @hw_sob: array of the used H/W SOBs by this H/W queue.
+ * @next_sob_val: the next value to use for the currently used SOB.
+ * @base_sob_id: the base SOB id of the SOBs used by this queue.
+ * @base_mon_id: the base MON id of the MONs used by this queue.
+ * @curr_sob_offset: the id offset to the currently used SOB from the
+ *                   HL_RSVD_SOBS that are being used by this queue.
+ */
+struct hl_sync_stream_properties {
+	struct hl_hw_sob	hw_sob[HL_RSVD_SOBS];
+	u16			next_sob_val;
+	u16			base_sob_id;
+	u16			base_mon_id;
+	u8			curr_sob_offset;
+};
+
+/**
+ * struct hl_hw_queue - describes a H/W transport queue.
  * @shadow_queue: pointer to a shadow queue that holds pointers to jobs.
+ * @sync_stream_prop: sync stream queue properties
  * @queue_type: type of queue.
  * @kernel_address: holds the queue's kernel virtual address.
  * @bus_address: holds the queue's DMA address.
@@ -514,33 +545,24 @@ struct hl_cs_job;
  * @cq_id: the id for the corresponding CQ for this H/W queue.
  * @msi_vec: the IRQ number of the H/W queue.
  * @int_queue_len: length of internal queue (number of entries).
- * @next_sob_val: the next value to use for the currently used SOB.
- * @base_sob_id: the base SOB id of the SOBs used by this queue.
- * @base_mon_id: the base MON id of the MONs used by this queue.
  * @valid: is the queue valid (we have array of 32 queues, not all of them
  *         exist).
- * @curr_sob_offset: the id offset to the currently used SOB from the
- *                   HL_RSVD_SOBS that are being used by this queue.
  * @supports_sync_stream: True if queue supports sync stream
  */
 struct hl_hw_queue {
-	struct hl_hw_sob	hw_sob[HL_RSVD_SOBS];
-	struct hl_cs_job	**shadow_queue;
-	enum hl_queue_type	queue_type;
-	u64			kernel_address;
-	dma_addr_t		bus_address;
-	u32			pi;
-	atomic_t		ci;
-	u32			hw_queue_id;
-	u32			cq_id;
-	u32			msi_vec;
-	u16			int_queue_len;
-	u16			next_sob_val;
-	u16			base_sob_id;
-	u16			base_mon_id;
-	u8			valid;
-	u8			curr_sob_offset;
-	u8			supports_sync_stream;
+	struct hl_cs_job			**shadow_queue;
+	struct hl_sync_stream_properties	sync_stream_prop;
+	enum hl_queue_type			queue_type;
+	u64					kernel_address;
+	dma_addr_t				bus_address;
+	u32					pi;
+	atomic_t				ci;
+	u32					hw_queue_id;
+	u32					cq_id;
+	u32					msi_vec;
+	u16					int_queue_len;
+	u8					valid;
+	u8					supports_sync_stream;
 };
 
 /**
@@ -823,8 +845,8 @@ struct hl_asic_funcs {
 	u32 (*get_signal_cb_size)(struct hl_device *hdev);
 	u32 (*get_wait_cb_size)(struct hl_device *hdev);
 	void (*gen_signal_cb)(struct hl_device *hdev, void *data, u16 sob_id);
-	void (*gen_wait_cb)(struct hl_device *hdev, void *data, u16 sob_id,
-				u16 sob_val, u16 mon_id, u32 q_idx);
+	void (*gen_wait_cb)(struct hl_device *hdev,
+			struct hl_gen_wait_properties *prop);
 	void (*reset_sob)(struct hl_device *hdev, void *data);
 	void (*set_dma_mask_from_fw)(struct hl_device *hdev);
 	u64 (*get_device_time)(struct hl_device *hdev);
diff --git a/drivers/misc/habanalabs/common/hw_queue.c b/drivers/misc/habanalabs/common/hw_queue.c
index adb61f9eb2b5..613681c2cdcc 100644
--- a/drivers/misc/habanalabs/common/hw_queue.c
+++ b/drivers/misc/habanalabs/common/hw_queue.c
@@ -389,6 +389,89 @@ static void hw_queue_schedule_job(struct hl_cs_job *job)
 	ext_and_hw_queue_submit_bd(hdev, q, ctl, len, ptr);
 }
 
+static void init_signal_cs(struct hl_device *hdev,
+		struct hl_cs_job *job, struct hl_cs_compl *cs_cmpl)
+{
+	struct hl_sync_stream_properties *prop;
+	struct hl_hw_sob *hw_sob;
+	u32 q_idx;
+
+	q_idx = job->hw_queue_id;
+	prop = &hdev->kernel_queues[q_idx].sync_stream_prop;
+	hw_sob = &prop->hw_sob[prop->curr_sob_offset];
+
+	cs_cmpl->hw_sob = hw_sob;
+	cs_cmpl->sob_val = prop->next_sob_val++;
+
+	dev_dbg(hdev->dev,
+		"generate signal CB, sob_id: %d, sob val: 0x%x, q_idx: %d\n",
+		cs_cmpl->hw_sob->sob_id, cs_cmpl->sob_val, q_idx);
+
+	hdev->asic_funcs->gen_signal_cb(hdev, job->patched_cb,
+				cs_cmpl->hw_sob->sob_id);
+
+	kref_get(&hw_sob->kref);
+
+	/* check for wraparound */
+	if (prop->next_sob_val == HL_MAX_SOB_VAL) {
+		/*
+		 * Decrement as we reached the max value.
+		 * The release function won't be called here as we've
+		 * just incremented the refcount.
+		 */
+		kref_put(&hw_sob->kref, hl_sob_reset_error);
+		prop->next_sob_val = 1;
+		/* only two SOBs are currently in use */
+		prop->curr_sob_offset =
+			(prop->curr_sob_offset + 1) % HL_RSVD_SOBS;
+
+		dev_dbg(hdev->dev, "switched to SOB %d, q_idx: %d\n",
+				prop->curr_sob_offset, q_idx);
+	}
+}
+
+static void init_wait_cs(struct hl_device *hdev, struct hl_cs *cs,
+		struct hl_cs_job *job, struct hl_cs_compl *cs_cmpl)
+{
+	struct hl_cs_compl *signal_cs_cmpl;
+	struct hl_sync_stream_properties *prop;
+	struct hl_gen_wait_properties wait_prop;
+	u32 q_idx;
+
+	q_idx = job->hw_queue_id;
+	prop = &hdev->kernel_queues[q_idx].sync_stream_prop;
+
+	signal_cs_cmpl = container_of(cs->signal_fence,
+					struct hl_cs_compl,
+					base_fence);
+
+	/* copy the SOB id and value of the signal CS */
+	cs_cmpl->hw_sob = signal_cs_cmpl->hw_sob;
+	cs_cmpl->sob_val = signal_cs_cmpl->sob_val;
+
+	dev_dbg(hdev->dev,
+		"generate wait CB, sob_id: %d, sob_val: 0x%x, mon_id: %d, q_idx: %d\n",
+		cs_cmpl->hw_sob->sob_id, cs_cmpl->sob_val,
+		prop->base_mon_id, q_idx);
+
+	wait_prop.data = (void *) job->patched_cb;
+	wait_prop.sob_id = cs_cmpl->hw_sob->sob_id;
+	wait_prop.sob_val = cs_cmpl->sob_val;
+	wait_prop.mon_id = prop->base_mon_id;
+	wait_prop.q_idx = q_idx;
+	hdev->asic_funcs->gen_wait_cb(hdev, &wait_prop);
+
+	kref_get(&cs_cmpl->hw_sob->kref);
+	/*
+	 * Must put the signal fence after the SOB refcnt increment so
+	 * the SOB refcnt won't turn 0 and reset the SOB before the
+	 * wait CS was submitted.
+	 */
+	mb();
+	hl_fence_put(cs->signal_fence);
+	cs->signal_fence = NULL;
+}
+
 /*
  * init_signal_wait_cs - initialize a signal/wait CS
  * @cs: pointer to the signal/wait CS
@@ -399,84 +482,18 @@ static void init_signal_wait_cs(struct hl_cs *cs)
 {
 	struct hl_ctx *ctx = cs->ctx;
 	struct hl_device *hdev = ctx->hdev;
-	struct hl_hw_queue *hw_queue;
+	struct hl_cs_job *job;
 	struct hl_cs_compl *cs_cmpl =
 			container_of(cs->fence, struct hl_cs_compl, base_fence);
 
-	struct hl_hw_sob *hw_sob;
-	struct hl_cs_job *job;
-	u32 q_idx;
-
 	/* There is only one job in a signal/wait CS */
 	job = list_first_entry(&cs->job_list, struct hl_cs_job,
 				cs_node);
-	q_idx = job->hw_queue_id;
-	hw_queue = &hdev->kernel_queues[q_idx];
-
-	if (cs->type & CS_TYPE_SIGNAL) {
-		hw_sob = &hw_queue->hw_sob[hw_queue->curr_sob_offset];
-
-		cs_cmpl->hw_sob = hw_sob;
-		cs_cmpl->sob_val = hw_queue->next_sob_val++;
-
-		dev_dbg(hdev->dev,
-			"generate signal CB, sob_id: %d, sob val: 0x%x, q_idx: %d\n",
-			cs_cmpl->hw_sob->sob_id, cs_cmpl->sob_val, q_idx);
-
-		hdev->asic_funcs->gen_signal_cb(hdev, job->patched_cb,
-					cs_cmpl->hw_sob->sob_id);
-
-		kref_get(&hw_sob->kref);
-
-		/* check for wraparound */
-		if (hw_queue->next_sob_val == HL_MAX_SOB_VAL) {
-			/*
-			 * Decrement as we reached the max value.
-			 * The release function won't be called here as we've
-			 * just incremented the refcount.
-			 */
-			kref_put(&hw_sob->kref, hl_sob_reset_error);
-			hw_queue->next_sob_val = 1;
-			/* only two SOBs are currently in use */
-			hw_queue->curr_sob_offset =
-					(hw_queue->curr_sob_offset + 1) %
-						HL_RSVD_SOBS_IN_USE;
-
-			dev_dbg(hdev->dev, "switched to SOB %d, q_idx: %d\n",
-					hw_queue->curr_sob_offset, q_idx);
-		}
-	} else if (cs->type & CS_TYPE_WAIT) {
-		struct hl_cs_compl *signal_cs_cmpl;
-
-		signal_cs_cmpl = container_of(cs->signal_fence,
-						struct hl_cs_compl,
-						base_fence);
-
-		/* copy the the SOB id and value of the signal CS */
-		cs_cmpl->hw_sob = signal_cs_cmpl->hw_sob;
-		cs_cmpl->sob_val = signal_cs_cmpl->sob_val;
 
-		dev_dbg(hdev->dev,
-			"generate wait CB, sob_id: %d, sob_val: 0x%x, mon_id: %d, q_idx: %d\n",
-			cs_cmpl->hw_sob->sob_id, cs_cmpl->sob_val,
-			hw_queue->base_mon_id, q_idx);
-
-		hdev->asic_funcs->gen_wait_cb(hdev, job->patched_cb,
-						cs_cmpl->hw_sob->sob_id,
-						cs_cmpl->sob_val,
-						hw_queue->base_mon_id,
-						q_idx);
-
-		kref_get(&cs_cmpl->hw_sob->kref);
-		/*
-		 * Must put the signal fence after the SOB refcnt increment so
-		 * the SOB refcnt won't turn 0 and reset the SOB before the
-		 * wait CS was submitted.
-		 */
-		mb();
-		hl_fence_put(cs->signal_fence);
-		cs->signal_fence = NULL;
-	}
+	if (cs->type & CS_TYPE_SIGNAL)
+		init_signal_cs(hdev, job, cs_cmpl);
+	else if (cs->type & CS_TYPE_WAIT)
+		init_wait_cs(hdev, cs, job, cs_cmpl);
 }
 
 /*
@@ -720,22 +737,28 @@ static int hw_queue_init(struct hl_device *hdev, struct hl_hw_queue *q)
 
 static void sync_stream_queue_init(struct hl_device *hdev, u32 q_idx)
 {
-	struct hl_hw_queue *hw_queue = &hdev->kernel_queues[q_idx];
+	struct hl_sync_stream_properties *sync_stream_prop;
 	struct asic_fixed_properties *prop = &hdev->asic_prop;
 	struct hl_hw_sob *hw_sob;
-	int sob, queue_idx = hdev->sync_stream_queue_idx++;
+	int sob, queue_idx;
+
+	if (!hdev->kernel_queues[q_idx].supports_sync_stream)
+		return;
+
+	sync_stream_prop = &hdev->kernel_queues[q_idx].sync_stream_prop;
+	queue_idx = hdev->sync_stream_queue_idx++;
 
-	hw_queue->base_sob_id =
-		prop->sync_stream_first_sob + queue_idx * HL_RSVD_SOBS;
-	hw_queue->base_mon_id =
-		prop->sync_stream_first_mon + queue_idx * HL_RSVD_MONS;
-	hw_queue->next_sob_val = 1;
-	hw_queue->curr_sob_offset = 0;
+	sync_stream_prop->base_sob_id = prop->sync_stream_first_sob +
+			(queue_idx * HL_RSVD_SOBS);
+	sync_stream_prop->base_mon_id = prop->sync_stream_first_mon +
+			(queue_idx * HL_RSVD_MONS);
+	sync_stream_prop->next_sob_val = 1;
+	sync_stream_prop->curr_sob_offset = 0;
 
 	for (sob = 0 ; sob < HL_RSVD_SOBS ; sob++) {
-		hw_sob = &hw_queue->hw_sob[sob];
+		hw_sob = &sync_stream_prop->hw_sob[sob];
 		hw_sob->hdev = hdev;
-		hw_sob->sob_id = hw_queue->base_sob_id + sob;
+		hw_sob->sob_id = sync_stream_prop->base_sob_id + sob;
 		hw_sob->q_idx = q_idx;
 		kref_init(&hw_sob->kref);
 	}
@@ -743,15 +766,16 @@ static void sync_stream_queue_init(struct hl_device *hdev, u32 q_idx)
 
 static void sync_stream_queue_reset(struct hl_device *hdev, u32 q_idx)
 {
-	struct hl_hw_queue *hw_queue = &hdev->kernel_queues[q_idx];
+	struct hl_sync_stream_properties *prop =
+			&hdev->kernel_queues[q_idx].sync_stream_prop;
 
 	/*
 	 * In case we got here due to a stuck CS, the refcnt might be bigger
 	 * than 1 and therefore we reset it.
 	 */
-	kref_init(&hw_queue->hw_sob[hw_queue->curr_sob_offset].kref);
-	hw_queue->curr_sob_offset = 0;
-	hw_queue->next_sob_val = 1;
+	kref_init(&prop->hw_sob[prop->curr_sob_offset].kref);
+	prop->curr_sob_offset = 0;
+	prop->next_sob_val = 1;
 }
 
 /*
@@ -794,8 +818,7 @@ static int queue_init(struct hl_device *hdev, struct hl_hw_queue *q,
 		break;
 	}
 
-	if (q->supports_sync_stream)
-		sync_stream_queue_init(hdev, q->hw_queue_id);
+	sync_stream_queue_init(hdev, q->hw_queue_id);
 
 	if (rc)
 		return rc;
diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index 9d9d22c4452c..46dced9d1eec 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -472,9 +472,11 @@ static int gaudi_get_fixed_properties(struct hl_device *hdev)
 	prop->max_pending_cs = GAUDI_MAX_PENDING_CS;
 
 	prop->first_available_user_sob[HL_GAUDI_WS_DCORE] =
-			num_sync_stream_queues * HL_RSVD_SOBS;
+			prop->sync_stream_first_sob +
+			(num_sync_stream_queues * HL_RSVD_SOBS);
 	prop->first_available_user_mon[HL_GAUDI_WS_DCORE] =
-			num_sync_stream_queues * HL_RSVD_MONS;
+			prop->sync_stream_first_mon +
+			(num_sync_stream_queues * HL_RSVD_MONS);
 
 	return 0;
 }
@@ -6472,16 +6474,16 @@ static u32 gaudi_add_fence_pkt(struct packet_fence *pkt)
 	return pkt_size;
 }
 
-static void gaudi_gen_wait_cb(struct hl_device *hdev, void *data, u16 sob_id,
-			u16 sob_val, u16 mon_id, u32 q_idx)
+static void gaudi_gen_wait_cb(struct hl_device *hdev,
+		struct hl_gen_wait_properties *prop)
 {
-	struct hl_cb *cb = (struct hl_cb *) data;
+	struct hl_cb *cb = (struct hl_cb *) prop->data;
 	void *buf = (void *) (uintptr_t) cb->kernel_address;
 	u64 monitor_base, fence_addr = 0;
 	u32 size = 0;
 	u16 msg_addr_offset;
 
-	switch (q_idx) {
+	switch (prop->q_idx) {
 	case GAUDI_QUEUE_ID_DMA_0_0:
 		fence_addr = mmDMA0_QM_CP_FENCE2_RDATA_0;
 		break;
@@ -6521,7 +6523,7 @@ static void gaudi_gen_wait_cb(struct hl_device *hdev, void *data, u16 sob_id,
 	default:
 		/* queue index should be valid here */
 		dev_crit(hdev->dev, "wrong queue id %d for wait packet\n",
-				q_idx);
+				prop->q_idx);
 		return;
 	}
 
@@ -6534,17 +6536,15 @@ static void gaudi_gen_wait_cb(struct hl_device *hdev, void *data, u16 sob_id,
 	monitor_base = mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0;
 
 	/* First monitor config packet: low address of the sync */
-	msg_addr_offset =
-		(mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0 + mon_id * 4) -
-				monitor_base;
+	msg_addr_offset = (mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0 +
+			prop->mon_id * 4) - monitor_base;
 
 	size += gaudi_add_mon_msg_short(buf + size, (u32) fence_addr,
 					msg_addr_offset);
 
 	/* Second monitor config packet: high address of the sync */
-	msg_addr_offset =
-		(mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRH_0 + mon_id * 4) -
-				monitor_base;
+	msg_addr_offset = (mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRH_0 +
+			prop->mon_id * 4) - monitor_base;
 
 	size += gaudi_add_mon_msg_short(buf + size, (u32) (fence_addr >> 32),
 					msg_addr_offset);
@@ -6553,18 +6553,17 @@ static void gaudi_gen_wait_cb(struct hl_device *hdev, void *data, u16 sob_id,
 	 * Third monitor config packet: the payload, i.e. what to write when the
 	 * sync triggers
 	 */
-	msg_addr_offset =
-		(mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_DATA_0 + mon_id * 4) -
-				monitor_base;
+	msg_addr_offset = (mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_DATA_0 +
+			prop->mon_id * 4) - monitor_base;
 
 	size += gaudi_add_mon_msg_short(buf + size, 1, msg_addr_offset);
 
 	/* Fourth monitor config packet: bind the monitor to a sync object */
 	msg_addr_offset =
-		(mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_ARM_0 + mon_id * 4) -
+		(mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_ARM_0 + prop->mon_id * 4) -
 				monitor_base;
-	size += gaudi_add_arm_monitor_pkt(buf + size, sob_id, sob_val,
-						msg_addr_offset);
+	size += gaudi_add_arm_monitor_pkt(buf + size, prop->sob_id,
+			prop->sob_val, msg_addr_offset);
 
 	/* Fence packet */
 	size += gaudi_add_fence_pkt(buf + size);
diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index f41fe748f1ca..cd1366f10fbe 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -5297,8 +5297,8 @@ static void goya_gen_signal_cb(struct hl_device *hdev, void *data, u16 sob_id)
 
 }
 
-static void goya_gen_wait_cb(struct hl_device *hdev, void *data, u16 sob_id,
-			u16 sob_val, u16 mon_id, u32 q_idx)
+static void goya_gen_wait_cb(struct hl_device *hdev,
+		struct hl_gen_wait_properties *prop)
 {
 
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/4] habanalabs: add support for multiple SOBs per monitor
  2020-11-02 19:57 [PATCH] habanalabs: don't init vm module if no MMU Oded Gabbay
  2020-11-02 19:57 ` [PATCH] habanalabs: minimize prints when everything is fine Oded Gabbay
  2020-11-02 19:57 ` [PATCH 1/4] habanalabs: sync stream structures refactor Oded Gabbay
@ 2020-11-02 19:58 ` Oded Gabbay
  2020-11-02 19:58 ` [PATCH 3/4] habanalabs: sync stream refactor functions Oded Gabbay
  2020-11-02 19:58 ` [PATCH 4/4] habanalabs: remove duplicate check Oded Gabbay
  4 siblings, 0 replies; 6+ messages in thread
From: Oded Gabbay @ 2020-11-02 19:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: SW_Drivers, Ofir Bitton

From: Ofir Bitton <obitton@habana.ai>

Support advanced monitor functionality to monitor more than a
single SOB. In addition expand all CB generation functions
with buffer offset in order to put in them multiple packets that are
generated by different functions.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
 .../habanalabs/common/command_submission.c    |  32 ++++
 drivers/misc/habanalabs/common/habanalabs.h   |  16 +-
 drivers/misc/habanalabs/common/hw_queue.c     |   6 +-
 drivers/misc/habanalabs/gaudi/gaudi.c         | 137 ++++++++++++------
 drivers/misc/habanalabs/goya/goya.c           |   9 +-
 5 files changed, 143 insertions(+), 57 deletions(-)

diff --git a/drivers/misc/habanalabs/common/command_submission.c b/drivers/misc/habanalabs/common/command_submission.c
index 9d49dd1558af..0d82c7dd93d0 100644
--- a/drivers/misc/habanalabs/common/command_submission.c
+++ b/drivers/misc/habanalabs/common/command_submission.c
@@ -38,6 +38,38 @@ void hl_sob_reset_error(struct kref *ref)
 			hw_sob->q_idx, hw_sob->sob_id);
 }
 
+/**
+ * hl_gen_sob_mask() - Generates a sob mask to be used in a monitor arm packet
+ * @sob_base: sob base id
+ * @sob_mask: sob user mask, each bit represents a sob offset from sob base
+ * @mask: generated mask
+ *
+ * Return: 0 if given parameters are valid
+ */
+int hl_gen_sob_mask(u16 sob_base, u8 sob_mask, u8 *mask)
+{
+	int i;
+
+	if (sob_mask == 0)
+		return -EINVAL;
+
+	if (sob_mask == 0x1) {
+		*mask = ~(1 << (sob_base & 0x7));
+	} else {
+		/* find msb in order to verify sob range is valid */
+		for (i = BITS_PER_BYTE - 1 ; i >= 0 ; i--)
+			if (BIT(i) & sob_mask)
+				break;
+
+		if (i > (HL_MAX_SOBS_PER_MONITOR - (sob_base & 0x7) - 1))
+			return -EINVAL;
+
+		*mask = ~sob_mask;
+	}
+
+	return 0;
+}
+
 static void hl_fence_release(struct kref *kref)
 {
 	struct hl_fence *fence =
diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index 58b4097235d9..7307e0b88b44 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -77,20 +77,26 @@
 
 #define HL_MAX_DCORES			4
 
+#define HL_MAX_SOBS_PER_MONITOR	8
+
 /**
  * struct hl_gen_wait_properties - properties for generating a wait CB
  * @data: command buffer
  * @q_idx: queue id is used to extract fence register address
- * @sob_id: SOB id to use in this wait CB
+ * @size: offset in command buffer
+ * @sob_base: SOB base to use in this wait CB
  * @sob_val: SOB value to wait for
  * @mon_id: monitor to use in this wait CB
+ * @sob_mask: each bit represents a SOB offset from sob_base to be used
  */
 struct hl_gen_wait_properties {
 	void	*data;
 	u32	q_idx;
-	u16	sob_id;
+	u32	size;
+	u16	sob_base;
 	u16	sob_val;
 	u16	mon_id;
+	u8	sob_mask;
 };
 
 /**
@@ -844,8 +850,9 @@ struct hl_asic_funcs {
 	int (*load_boot_fit_to_device)(struct hl_device *hdev);
 	u32 (*get_signal_cb_size)(struct hl_device *hdev);
 	u32 (*get_wait_cb_size)(struct hl_device *hdev);
-	void (*gen_signal_cb)(struct hl_device *hdev, void *data, u16 sob_id);
-	void (*gen_wait_cb)(struct hl_device *hdev,
+	u32 (*gen_signal_cb)(struct hl_device *hdev, void *data, u16 sob_id,
+			u32 size);
+	u32 (*gen_wait_cb)(struct hl_device *hdev,
 			struct hl_gen_wait_properties *prop);
 	void (*reset_sob)(struct hl_device *hdev, void *data);
 	void (*set_dma_mask_from_fw)(struct hl_device *hdev);
@@ -1927,6 +1934,7 @@ void hl_cs_rollback_all(struct hl_device *hdev);
 struct hl_cs_job *hl_cs_allocate_job(struct hl_device *hdev,
 		enum hl_queue_type queue_type, bool is_kernel_allocated_cb);
 void hl_sob_reset_error(struct kref *ref);
+int hl_gen_sob_mask(u16 sob_base, u8 sob_mask, u8 *mask);
 void hl_fence_put(struct hl_fence *fence);
 void hl_fence_get(struct hl_fence *fence);
 
diff --git a/drivers/misc/habanalabs/common/hw_queue.c b/drivers/misc/habanalabs/common/hw_queue.c
index 613681c2cdcc..ca625789d78d 100644
--- a/drivers/misc/habanalabs/common/hw_queue.c
+++ b/drivers/misc/habanalabs/common/hw_queue.c
@@ -408,7 +408,7 @@ static void init_signal_cs(struct hl_device *hdev,
 		cs_cmpl->hw_sob->sob_id, cs_cmpl->sob_val, q_idx);
 
 	hdev->asic_funcs->gen_signal_cb(hdev, job->patched_cb,
-				cs_cmpl->hw_sob->sob_id);
+				cs_cmpl->hw_sob->sob_id, 0);
 
 	kref_get(&hw_sob->kref);
 
@@ -455,10 +455,12 @@ static void init_wait_cs(struct hl_device *hdev, struct hl_cs *cs,
 		prop->base_mon_id, q_idx);
 
 	wait_prop.data = (void *) job->patched_cb;
-	wait_prop.sob_id = cs_cmpl->hw_sob->sob_id;
+	wait_prop.sob_base = cs_cmpl->hw_sob->sob_id;
+	wait_prop.sob_mask = 0x1;
 	wait_prop.sob_val = cs_cmpl->sob_val;
 	wait_prop.mon_id = prop->base_mon_id;
 	wait_prop.q_idx = q_idx;
+	wait_prop.size = 0;
 	hdev->asic_funcs->gen_wait_cb(hdev, &wait_prop);
 
 	kref_get(&cs_cmpl->hw_sob->kref);
diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index 46dced9d1eec..930b26b1f445 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -6380,14 +6380,16 @@ static u32 gaudi_get_wait_cb_size(struct hl_device *hdev)
 			sizeof(struct packet_msg_prot) * 2;
 }
 
-static void gaudi_gen_signal_cb(struct hl_device *hdev, void *data, u16 sob_id)
+static u32 gaudi_gen_signal_cb(struct hl_device *hdev, void *data, u16 sob_id,
+		u32 size)
 {
 	struct hl_cb *cb = (struct hl_cb *) data;
 	struct packet_msg_short *pkt;
-	u32 value, ctl;
+	u32 value, ctl, pkt_size = sizeof(*pkt);
 
-	pkt = (struct packet_msg_short *) (uintptr_t) cb->kernel_address;
-	memset(pkt, 0, sizeof(*pkt));
+	pkt = (struct packet_msg_short *) (uintptr_t) (cb->kernel_address +
+									size);
+	memset(pkt, 0, pkt_size);
 
 	/* Inc by 1, Mode ADD */
 	value = FIELD_PREP(GAUDI_PKT_SHORT_VAL_SOB_SYNC_VAL_MASK, 1);
@@ -6403,6 +6405,8 @@ static void gaudi_gen_signal_cb(struct hl_device *hdev, void *data, u16 sob_id)
 
 	pkt->value = cpu_to_le32(value);
 	pkt->ctl = cpu_to_le32(ctl);
+
+	return size + pkt_size;
 }
 
 static u32 gaudi_add_mon_msg_short(struct packet_msg_short *pkt, u32 value,
@@ -6425,21 +6429,42 @@ static u32 gaudi_add_mon_msg_short(struct packet_msg_short *pkt, u32 value,
 	return pkt_size;
 }
 
-static u32 gaudi_add_arm_monitor_pkt(struct packet_msg_short *pkt, u16 sob_id,
-					u16 sob_val, u16 addr)
+static u32 gaudi_add_arm_monitor_pkt(struct hl_device *hdev,
+		struct packet_msg_short *pkt, u16 sob_base, u8 sob_mask,
+		u16 sob_val, u16 mon_id)
 {
+	u64 monitor_base;
 	u32 ctl, value, pkt_size = sizeof(*pkt);
-	u8 mask = ~(1 << (sob_id & 0x7));
+	u16 msg_addr_offset;
+	u8 mask;
+
+	if (hl_gen_sob_mask(sob_base, sob_mask, &mask)) {
+		dev_err(hdev->dev,
+			"sob_base %u (mask %#x) is not valid\n",
+			sob_base, sob_mask);
+		return 0;
+	}
+
+	/*
+	 * monitor_base should be the content of the base0 address registers,
+	 * so it will be added to the msg short offsets
+	 */
+	monitor_base = mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0;
+
+	msg_addr_offset =
+		(mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_ARM_0 + mon_id * 4) -
+				monitor_base;
 
 	memset(pkt, 0, pkt_size);
 
-	value = FIELD_PREP(GAUDI_PKT_SHORT_VAL_MON_SYNC_GID_MASK, sob_id / 8);
+	/* Monitor config packet: bind the monitor to a sync object */
+	value = FIELD_PREP(GAUDI_PKT_SHORT_VAL_MON_SYNC_GID_MASK, sob_base / 8);
 	value |= FIELD_PREP(GAUDI_PKT_SHORT_VAL_MON_SYNC_VAL_MASK, sob_val);
 	value |= FIELD_PREP(GAUDI_PKT_SHORT_VAL_MON_MODE_MASK,
 			0); /* GREATER OR EQUAL*/
 	value |= FIELD_PREP(GAUDI_PKT_SHORT_VAL_MON_MASK_MASK, mask);
 
-	ctl = FIELD_PREP(GAUDI_PKT_SHORT_CTL_ADDR_MASK, addr);
+	ctl = FIELD_PREP(GAUDI_PKT_SHORT_CTL_ADDR_MASK, msg_addr_offset);
 	ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_OP_MASK, 0); /* write the value */
 	ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_BASE_MASK, 2); /* W_S MON base */
 	ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_OPCODE_MASK, PACKET_MSG_SHORT);
@@ -6474,60 +6499,61 @@ static u32 gaudi_add_fence_pkt(struct packet_fence *pkt)
 	return pkt_size;
 }
 
-static void gaudi_gen_wait_cb(struct hl_device *hdev,
-		struct hl_gen_wait_properties *prop)
+static int gaudi_get_fence_addr(struct hl_device *hdev, u32 queue_id, u64 *addr)
 {
-	struct hl_cb *cb = (struct hl_cb *) prop->data;
-	void *buf = (void *) (uintptr_t) cb->kernel_address;
-	u64 monitor_base, fence_addr = 0;
-	u32 size = 0;
-	u16 msg_addr_offset;
+	u32 offset;
 
-	switch (prop->q_idx) {
+	switch (queue_id) {
 	case GAUDI_QUEUE_ID_DMA_0_0:
-		fence_addr = mmDMA0_QM_CP_FENCE2_RDATA_0;
+		offset = mmDMA0_QM_CP_FENCE2_RDATA_0;
 		break;
 	case GAUDI_QUEUE_ID_DMA_0_1:
-		fence_addr = mmDMA0_QM_CP_FENCE2_RDATA_1;
+		offset = mmDMA0_QM_CP_FENCE2_RDATA_1;
 		break;
 	case GAUDI_QUEUE_ID_DMA_0_2:
-		fence_addr = mmDMA0_QM_CP_FENCE2_RDATA_2;
+		offset = mmDMA0_QM_CP_FENCE2_RDATA_2;
 		break;
 	case GAUDI_QUEUE_ID_DMA_0_3:
-		fence_addr = mmDMA0_QM_CP_FENCE2_RDATA_3;
+		offset = mmDMA0_QM_CP_FENCE2_RDATA_3;
 		break;
 	case GAUDI_QUEUE_ID_DMA_1_0:
-		fence_addr = mmDMA1_QM_CP_FENCE2_RDATA_0;
+		offset = mmDMA1_QM_CP_FENCE2_RDATA_0;
 		break;
 	case GAUDI_QUEUE_ID_DMA_1_1:
-		fence_addr = mmDMA1_QM_CP_FENCE2_RDATA_1;
+		offset = mmDMA1_QM_CP_FENCE2_RDATA_1;
 		break;
 	case GAUDI_QUEUE_ID_DMA_1_2:
-		fence_addr = mmDMA1_QM_CP_FENCE2_RDATA_2;
+		offset = mmDMA1_QM_CP_FENCE2_RDATA_2;
 		break;
 	case GAUDI_QUEUE_ID_DMA_1_3:
-		fence_addr = mmDMA1_QM_CP_FENCE2_RDATA_3;
+		offset = mmDMA1_QM_CP_FENCE2_RDATA_3;
 		break;
 	case GAUDI_QUEUE_ID_DMA_5_0:
-		fence_addr = mmDMA5_QM_CP_FENCE2_RDATA_0;
+		offset = mmDMA5_QM_CP_FENCE2_RDATA_0;
 		break;
 	case GAUDI_QUEUE_ID_DMA_5_1:
-		fence_addr = mmDMA5_QM_CP_FENCE2_RDATA_1;
+		offset = mmDMA5_QM_CP_FENCE2_RDATA_1;
 		break;
 	case GAUDI_QUEUE_ID_DMA_5_2:
-		fence_addr = mmDMA5_QM_CP_FENCE2_RDATA_2;
+		offset = mmDMA5_QM_CP_FENCE2_RDATA_2;
 		break;
 	case GAUDI_QUEUE_ID_DMA_5_3:
-		fence_addr = mmDMA5_QM_CP_FENCE2_RDATA_3;
+		offset = mmDMA5_QM_CP_FENCE2_RDATA_3;
 		break;
 	default:
-		/* queue index should be valid here */
-		dev_crit(hdev->dev, "wrong queue id %d for wait packet\n",
-				prop->q_idx);
-		return;
+		return -EINVAL;
 	}
 
-	fence_addr += CFG_BASE;
+	*addr = CFG_BASE + offset;
+
+	return 0;
+}
+
+static u32 gaudi_add_mon_pkts(void *buf, u16 mon_id, u64 fence_addr)
+{
+	u64 monitor_base;
+	u32 size = 0;
+	u16 msg_addr_offset;
 
 	/*
 	 * monitor_base should be the content of the base0 address registers,
@@ -6536,15 +6562,17 @@ static void gaudi_gen_wait_cb(struct hl_device *hdev,
 	monitor_base = mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0;
 
 	/* First monitor config packet: low address of the sync */
-	msg_addr_offset = (mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0 +
-			prop->mon_id * 4) - monitor_base;
+	msg_addr_offset =
+		(mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0 + mon_id * 4) -
+				monitor_base;
 
 	size += gaudi_add_mon_msg_short(buf + size, (u32) fence_addr,
 					msg_addr_offset);
 
 	/* Second monitor config packet: high address of the sync */
-	msg_addr_offset = (mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRH_0 +
-			prop->mon_id * 4) - monitor_base;
+	msg_addr_offset =
+		(mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRH_0 + mon_id * 4) -
+				monitor_base;
 
 	size += gaudi_add_mon_msg_short(buf + size, (u32) (fence_addr >> 32),
 					msg_addr_offset);
@@ -6553,20 +6581,35 @@ static void gaudi_gen_wait_cb(struct hl_device *hdev,
 	 * Third monitor config packet: the payload, i.e. what to write when the
 	 * sync triggers
 	 */
-	msg_addr_offset = (mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_DATA_0 +
-			prop->mon_id * 4) - monitor_base;
+	msg_addr_offset =
+		(mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_DATA_0 + mon_id * 4) -
+				monitor_base;
 
 	size += gaudi_add_mon_msg_short(buf + size, 1, msg_addr_offset);
 
-	/* Fourth monitor config packet: bind the monitor to a sync object */
-	msg_addr_offset =
-		(mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_ARM_0 + prop->mon_id * 4) -
-				monitor_base;
-	size += gaudi_add_arm_monitor_pkt(buf + size, prop->sob_id,
-			prop->sob_val, msg_addr_offset);
+	return size;
+}
+
+u32 gaudi_gen_wait_cb(struct hl_device *hdev,
+		struct hl_gen_wait_properties *prop)
+{
+	struct hl_cb *cb = (struct hl_cb *) prop->data;
+	void *buf = (void *) (uintptr_t) cb->kernel_address;
+	u64 fence_addr = 0;
+	u32 size = prop->size;
 
-	/* Fence packet */
+	if (gaudi_get_fence_addr(hdev, prop->q_idx, &fence_addr)) {
+		dev_crit(hdev->dev, "wrong queue id %d for wait packet\n",
+				prop->q_idx);
+		return 0;
+	}
+
+	size += gaudi_add_mon_pkts(buf + size, prop->mon_id, fence_addr);
+	size += gaudi_add_arm_monitor_pkt(hdev, buf + size, prop->sob_base,
+			prop->sob_mask, prop->sob_val, prop->mon_id);
 	size += gaudi_add_fence_pkt(buf + size);
+
+	return size;
 }
 
 static void gaudi_reset_sob(struct hl_device *hdev, void *data)
diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index cd1366f10fbe..e8bf0b79cd67 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -5292,15 +5292,16 @@ static u32 goya_get_wait_cb_size(struct hl_device *hdev)
 	return 0;
 }
 
-static void goya_gen_signal_cb(struct hl_device *hdev, void *data, u16 sob_id)
+static u32 goya_gen_signal_cb(struct hl_device *hdev, void *data, u16 sob_id,
+		u32 size)
 {
-
+	return 0;
 }
 
-static void goya_gen_wait_cb(struct hl_device *hdev,
+static u32 goya_gen_wait_cb(struct hl_device *hdev,
 		struct hl_gen_wait_properties *prop)
 {
-
+	return 0;
 }
 
 static void goya_reset_sob(struct hl_device *hdev, void *data)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 3/4] habanalabs: sync stream refactor functions
  2020-11-02 19:57 [PATCH] habanalabs: don't init vm module if no MMU Oded Gabbay
                   ` (2 preceding siblings ...)
  2020-11-02 19:58 ` [PATCH 2/4] habanalabs: add support for multiple SOBs per monitor Oded Gabbay
@ 2020-11-02 19:58 ` Oded Gabbay
  2020-11-02 19:58 ` [PATCH 4/4] habanalabs: remove duplicate check Oded Gabbay
  4 siblings, 0 replies; 6+ messages in thread
From: Oded Gabbay @ 2020-11-02 19:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: SW_Drivers, Ofir Bitton

From: Ofir Bitton <obitton@habana.ai>

Refactor sync stream implementation by reducing function length
for better readability.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
 .../habanalabs/common/command_submission.c    | 210 ++++++++++--------
 1 file changed, 119 insertions(+), 91 deletions(-)

diff --git a/drivers/misc/habanalabs/common/command_submission.c b/drivers/misc/habanalabs/common/command_submission.c
index 0d82c7dd93d0..b0f33579ca7f 100644
--- a/drivers/misc/habanalabs/common/command_submission.c
+++ b/drivers/misc/habanalabs/common/command_submission.c
@@ -792,22 +792,121 @@ static int cs_ioctl_default(struct hl_fpriv *hpriv, void __user *chunks,
 	return rc;
 }
 
+static int cs_ioctl_extract_signal_seq(struct hl_device *hdev,
+		struct hl_cs_chunk *chunk, u64 *signal_seq)
+{
+	u64 *signal_seq_arr = NULL;
+	u32 size_to_copy, signal_seq_arr_len;
+	int rc = 0;
+
+	signal_seq_arr_len = chunk->num_signal_seq_arr;
+
+	/* currently only one signal seq is supported */
+	if (signal_seq_arr_len != 1) {
+		dev_err(hdev->dev,
+			"Wait for signal CS supports only one signal CS seq\n");
+		return -EINVAL;
+	}
+
+	signal_seq_arr = kmalloc_array(signal_seq_arr_len,
+					sizeof(*signal_seq_arr),
+					GFP_ATOMIC);
+	if (!signal_seq_arr)
+		return -ENOMEM;
+
+	size_to_copy = chunk->num_signal_seq_arr * sizeof(*signal_seq_arr);
+	if (copy_from_user(signal_seq_arr,
+				u64_to_user_ptr(chunk->signal_seq_arr),
+				size_to_copy)) {
+		dev_err(hdev->dev,
+			"Failed to copy signal seq array from user\n");
+		rc = -EFAULT;
+		goto out;
+	}
+
+	/* currently it is guaranteed to have only one signal seq */
+	*signal_seq = signal_seq_arr[0];
+
+out:
+	kfree(signal_seq_arr);
+
+	return rc;
+}
+
+static int cs_ioctl_signal_wait_create_jobs(struct hl_device *hdev,
+		struct hl_ctx *ctx, struct hl_cs *cs, enum hl_queue_type q_type,
+		u32 q_idx)
+{
+	struct hl_cs_counters_atomic *cntr;
+	struct hl_cs_job *job;
+	struct hl_cb *cb;
+	u32 cb_size;
+
+	cntr = &hdev->aggregated_cs_counters;
+
+	job = hl_cs_allocate_job(hdev, q_type, true);
+	if (!job) {
+		ctx->cs_counters.out_of_mem_drop_cnt++;
+		atomic64_inc(&cntr->out_of_mem_drop_cnt);
+		dev_err(hdev->dev, "Failed to allocate a new job\n");
+		return -ENOMEM;
+	}
+
+	if (cs->type == CS_TYPE_WAIT)
+		cb_size = hdev->asic_funcs->get_wait_cb_size(hdev);
+	else
+		cb_size = hdev->asic_funcs->get_signal_cb_size(hdev);
+
+	cb = hl_cb_kernel_create(hdev, cb_size,
+				q_type == QUEUE_TYPE_HW && hdev->mmu_enable);
+	if (!cb) {
+		ctx->cs_counters.out_of_mem_drop_cnt++;
+		atomic64_inc(&cntr->out_of_mem_drop_cnt);
+		kfree(job);
+		return -EFAULT;
+	}
+
+	job->id = 0;
+	job->cs = cs;
+	job->user_cb = cb;
+	job->user_cb->cs_cnt++;
+	job->user_cb_size = cb_size;
+	job->hw_queue_id = q_idx;
+
+	/*
+	 * No need in parsing, user CB is the patched CB.
+	 * We call hl_cb_destroy() out of two reasons - we don't need the CB in
+	 * the CB idr anymore and to decrement its refcount as it was
+	 * incremented inside hl_cb_kernel_create().
+	 */
+	job->patched_cb = job->user_cb;
+	job->job_cb_size = job->user_cb_size;
+	hl_cb_destroy(hdev, &hdev->kernel_cb_mgr, cb->id << PAGE_SHIFT);
+
+	cs->jobs_in_queue_cnt[job->hw_queue_id]++;
+
+	list_add_tail(&job->cs_node, &cs->job_list);
+
+	hl_debugfs_add_job(hdev, job);
+
+	return 0;
+}
+
 static int cs_ioctl_signal_wait(struct hl_fpriv *hpriv, enum hl_cs_type cs_type,
 				void __user *chunks, u32 num_chunks,
 				u64 *cs_seq)
 {
-	u32 size_to_copy, q_idx, signal_seq_arr_len, cb_size;
+	struct hl_device *hdev = hpriv->hdev;
+	struct hl_ctx *ctx = hpriv->ctx;
 	struct hl_cs_chunk *cs_chunk_array, *chunk;
 	struct hw_queue_properties *hw_queue_prop;
-	u64 *signal_seq_arr = NULL, signal_seq;
-	struct hl_device *hdev = hpriv->hdev;
-	struct hl_cs_counters_atomic *cntr;
 	struct hl_fence *sig_fence = NULL;
-	struct hl_ctx *ctx = hpriv->ctx;
-	enum hl_queue_type q_type;
-	struct hl_cs_job *job;
+	struct hl_cs_counters_atomic *cntr;
+	struct hl_cs_compl *sig_waitcs_cmpl;
 	struct hl_cs *cs;
-	struct hl_cb *cb;
+	enum hl_queue_type q_type;
+	u32 size_to_copy, q_idx;
+	u64 signal_seq;
 	int rc;
 
 	*cs_seq = ULLONG_MAX;
@@ -857,52 +956,23 @@ static int cs_ioctl_signal_wait(struct hl_fpriv *hpriv, enum hl_cs_type cs_type,
 	}
 
 	if (cs_type == CS_TYPE_WAIT) {
-		struct hl_cs_compl *sig_waitcs_cmpl;
-
-		signal_seq_arr_len = chunk->num_signal_seq_arr;
-
-		/* currently only one signal seq is supported */
-		if (signal_seq_arr_len != 1) {
-			dev_err(hdev->dev,
-				"Wait for signal CS supports only one signal CS seq\n");
-			rc = -EINVAL;
+		rc = cs_ioctl_extract_signal_seq(hdev, chunk, &signal_seq);
+		if (rc)
 			goto free_cs_chunk_array;
-		}
 
-		signal_seq_arr = kmalloc_array(signal_seq_arr_len,
-						sizeof(*signal_seq_arr),
-						GFP_ATOMIC);
-		if (!signal_seq_arr) {
-			rc = -ENOMEM;
-			goto free_cs_chunk_array;
-		}
-
-		size_to_copy = chunk->num_signal_seq_arr *
-				sizeof(*signal_seq_arr);
-		if (copy_from_user(signal_seq_arr,
-					u64_to_user_ptr(chunk->signal_seq_arr),
-					size_to_copy)) {
-			dev_err(hdev->dev,
-				"Failed to copy signal seq array from user\n");
-			rc = -EFAULT;
-			goto free_signal_seq_array;
-		}
-
-		/* currently it is guaranteed to have only one signal seq */
-		signal_seq = signal_seq_arr[0];
 		sig_fence = hl_ctx_get_fence(ctx, signal_seq);
 		if (IS_ERR(sig_fence)) {
 			dev_err(hdev->dev,
 				"Failed to get signal CS with seq 0x%llx\n",
 				signal_seq);
 			rc = PTR_ERR(sig_fence);
-			goto free_signal_seq_array;
+			goto free_cs_chunk_array;
 		}
 
 		if (!sig_fence) {
 			/* signal CS already finished */
 			rc = 0;
-			goto free_signal_seq_array;
+			goto free_cs_chunk_array;
 		}
 
 		sig_waitcs_cmpl =
@@ -914,14 +984,14 @@ static int cs_ioctl_signal_wait(struct hl_fpriv *hpriv, enum hl_cs_type cs_type,
 				signal_seq);
 			hl_fence_put(sig_fence);
 			rc = -EINVAL;
-			goto free_signal_seq_array;
+			goto free_cs_chunk_array;
 		}
 
 		if (completion_done(&sig_fence->completion)) {
 			/* signal CS already finished */
 			hl_fence_put(sig_fence);
 			rc = 0;
-			goto free_signal_seq_array;
+			goto free_cs_chunk_array;
 		}
 	}
 
@@ -933,70 +1003,31 @@ static int cs_ioctl_signal_wait(struct hl_fpriv *hpriv, enum hl_cs_type cs_type,
 		if (cs_type == CS_TYPE_WAIT)
 			hl_fence_put(sig_fence);
 		hl_ctx_put(ctx);
-		goto free_signal_seq_array;
+		goto free_cs_chunk_array;
 	}
 
 	/*
 	 * Save the signal CS fence for later initialization right before
 	 * hanging the wait CS on the queue.
 	 */
-	if (cs->type == CS_TYPE_WAIT)
+	if (cs_type == CS_TYPE_WAIT)
 		cs->signal_fence = sig_fence;
 
 	hl_debugfs_add_cs(cs);
 
 	*cs_seq = cs->sequence;
 
-	job = hl_cs_allocate_job(hdev, q_type, true);
-	if (!job) {
-		ctx->cs_counters.out_of_mem_drop_cnt++;
-		atomic64_inc(&cntr->out_of_mem_drop_cnt);
-		dev_err(hdev->dev, "Failed to allocate a new job\n");
-		rc = -ENOMEM;
-		goto put_cs;
-	}
-
-	if (cs->type == CS_TYPE_WAIT)
-		cb_size = hdev->asic_funcs->get_wait_cb_size(hdev);
-	else
-		cb_size = hdev->asic_funcs->get_signal_cb_size(hdev);
+	if (cs_type == CS_TYPE_WAIT || cs_type == CS_TYPE_SIGNAL)
+		rc = cs_ioctl_signal_wait_create_jobs(hdev, ctx, cs, q_type,
+				q_idx);
 
-	cb = hl_cb_kernel_create(hdev, cb_size,
-				q_type == QUEUE_TYPE_HW && hdev->mmu_enable);
-	if (!cb) {
-		ctx->cs_counters.out_of_mem_drop_cnt++;
-		atomic64_inc(&cntr->out_of_mem_drop_cnt);
-		kfree(job);
-		rc = -EFAULT;
+	if (rc)
 		goto put_cs;
-	}
 
-	job->id = 0;
-	job->cs = cs;
-	job->user_cb = cb;
-	job->user_cb->cs_cnt++;
-	job->user_cb_size = cb_size;
-	job->hw_queue_id = q_idx;
-
-	/*
-	 * No need in parsing, user CB is the patched CB.
-	 * We call hl_cb_destroy() out of two reasons - we don't need the CB in
-	 * the CB idr anymore and to decrement its refcount as it was
-	 * incremented inside hl_cb_kernel_create().
-	 */
-	job->patched_cb = job->user_cb;
-	job->job_cb_size = job->user_cb_size;
-	hl_cb_destroy(hdev, &hdev->kernel_cb_mgr, cb->id << PAGE_SHIFT);
-
-	cs->jobs_in_queue_cnt[job->hw_queue_id]++;
-
-	list_add_tail(&job->cs_node, &cs->job_list);
 
 	/* increment refcount as for external queues we get completion */
 	cs_get(cs);
 
-	hl_debugfs_add_job(hdev, job);
-
 	rc = hl_hw_queue_schedule_cs(cs);
 	if (rc) {
 		if (rc != -EAGAIN)
@@ -1016,9 +1047,6 @@ static int cs_ioctl_signal_wait(struct hl_fpriv *hpriv, enum hl_cs_type cs_type,
 put_cs:
 	/* We finished with the CS in this function, so put the ref */
 	cs_put(cs);
-free_signal_seq_array:
-	if (cs_type == CS_TYPE_WAIT)
-		kfree(signal_seq_arr);
 free_cs_chunk_array:
 	kfree(cs_chunk_array);
 out:
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 4/4] habanalabs: remove duplicate check
  2020-11-02 19:57 [PATCH] habanalabs: don't init vm module if no MMU Oded Gabbay
                   ` (3 preceding siblings ...)
  2020-11-02 19:58 ` [PATCH 3/4] habanalabs: sync stream refactor functions Oded Gabbay
@ 2020-11-02 19:58 ` Oded Gabbay
  4 siblings, 0 replies; 6+ messages in thread
From: Oded Gabbay @ 2020-11-02 19:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: SW_Drivers

We already check if queue index is smaller than max queues a few lines
above this check so no need to check this again.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
 drivers/misc/habanalabs/common/command_submission.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/misc/habanalabs/common/command_submission.c b/drivers/misc/habanalabs/common/command_submission.c
index b0f33579ca7f..1f8b53d42e3a 100644
--- a/drivers/misc/habanalabs/common/command_submission.c
+++ b/drivers/misc/habanalabs/common/command_submission.c
@@ -948,9 +948,10 @@ static int cs_ioctl_signal_wait(struct hl_fpriv *hpriv, enum hl_cs_type cs_type,
 	hw_queue_prop = &hdev->asic_prop.hw_queues_props[q_idx];
 	q_type = hw_queue_prop->type;
 
-	if ((q_idx >= hdev->asic_prop.max_queues) ||
-			(!hw_queue_prop->supports_sync_stream)) {
-		dev_err(hdev->dev, "Queue index %d is invalid\n", q_idx);
+	if (!hw_queue_prop->supports_sync_stream) {
+		dev_err(hdev->dev,
+			"Queue index %d does not support sync stream operations\n",
+			q_idx);
 		rc = -EINVAL;
 		goto free_cs_chunk_array;
 	}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-11-02 19:58 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-02 19:57 [PATCH] habanalabs: don't init vm module if no MMU Oded Gabbay
2020-11-02 19:57 ` [PATCH] habanalabs: minimize prints when everything is fine Oded Gabbay
2020-11-02 19:57 ` [PATCH 1/4] habanalabs: sync stream structures refactor Oded Gabbay
2020-11-02 19:58 ` [PATCH 2/4] habanalabs: add support for multiple SOBs per monitor Oded Gabbay
2020-11-02 19:58 ` [PATCH 3/4] habanalabs: sync stream refactor functions Oded Gabbay
2020-11-02 19:58 ` [PATCH 4/4] habanalabs: remove duplicate check Oded Gabbay

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).