* [PATCH] habanalabs: don't init vm module if no MMU
@ 2020-11-02 19:57 Oded Gabbay
2020-11-02 19:57 ` [PATCH] habanalabs: minimize prints when everything is fine Oded Gabbay
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: Oded Gabbay @ 2020-11-02 19:57 UTC
To: linux-kernel; +Cc: SW_Drivers
When running without the MMU enabled (debug mode), there is no need to
initialize the VM module in the driver.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
drivers/misc/habanalabs/common/memory.c | 33 +++++++++++--------------
1 file changed, 14 insertions(+), 19 deletions(-)
diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
index 84227819e4d1..75dd18771868 100644
--- a/drivers/misc/habanalabs/common/memory.c
+++ b/drivers/misc/habanalabs/common/memory.c
@@ -1685,27 +1685,19 @@ int hl_vm_ctx_init(struct hl_ctx *ctx)
* In case of DRAM mapping, the returned address is the physical
* address of the memory related to the given handle.
*/
- if (ctx->hdev->mmu_enable) {
- dram_range_start = prop->dmmu.start_addr;
- dram_range_end = prop->dmmu.end_addr;
- host_range_start = prop->pmmu.start_addr;
- host_range_end = prop->pmmu.end_addr;
- host_huge_range_start = prop->pmmu_huge.start_addr;
- host_huge_range_end = prop->pmmu_huge.end_addr;
- } else {
- dram_range_start = prop->dram_user_base_address;
- dram_range_end = prop->dram_end_address;
- host_range_start = prop->dram_user_base_address;
- host_range_end = prop->dram_end_address;
- host_huge_range_start = prop->dram_user_base_address;
- host_huge_range_end = prop->dram_end_address;
- }
+ if (!ctx->hdev->mmu_enable)
+ return 0;
+
+ dram_range_start = prop->dmmu.start_addr;
+ dram_range_end = prop->dmmu.end_addr;
+ host_range_start = prop->pmmu.start_addr;
+ host_range_end = prop->pmmu.end_addr;
+ host_huge_range_start = prop->pmmu_huge.start_addr;
+ host_huge_range_end = prop->pmmu_huge.end_addr;
return vm_ctx_init_with_ranges(ctx, host_range_start, host_range_end,
- host_huge_range_start,
- host_huge_range_end,
- dram_range_start,
- dram_range_end);
+ host_huge_range_start, host_huge_range_end,
+ dram_range_start, dram_range_end);
}
/*
@@ -1737,6 +1729,9 @@ void hl_vm_ctx_fini(struct hl_ctx *ctx)
struct hlist_node *tmp_node;
int i;
+ if (!ctx->hdev->mmu_enable)
+ return;
+
hl_debugfs_remove_ctx_mem_hash(hdev, ctx);
/*
--
2.17.1
* [PATCH] habanalabs: minimize prints when everything is fine
2020-11-02 19:57 [PATCH] habanalabs: don't init vm module if no MMU Oded Gabbay
@ 2020-11-02 19:57 ` Oded Gabbay
2020-11-02 19:57 ` [PATCH 1/4] habanalabs: sync stream structures refactor Oded Gabbay
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Oded Gabbay @ 2020-11-02 19:57 UTC
To: linux-kernel; +Cc: SW_Drivers
No need to print when the driver starts to initialize the H/W. Drivers
should be silent when everything is OK.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
drivers/misc/habanalabs/gaudi/gaudi.c | 2 --
drivers/misc/habanalabs/goya/goya.c | 2 --
2 files changed, 4 deletions(-)
diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index 2910f427c716..9d9d22c4452c 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -2947,8 +2947,6 @@ static int gaudi_hw_init(struct hl_device *hdev)
{
int rc;
- dev_info(hdev->dev, "Starting initialization of H/W\n");
-
gaudi_pre_hw_init(hdev);
gaudi_init_pci_dma_qmans(hdev);
diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index 5db52064ed9e..f41fe748f1ca 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -2505,8 +2505,6 @@ static int goya_hw_init(struct hl_device *hdev)
struct asic_fixed_properties *prop = &hdev->asic_prop;
int rc;
- dev_info(hdev->dev, "Starting initialization of H/W\n");
-
/* Perform read from the device to make sure device is up */
RREG32(mmPCIE_DBI_DEVICE_ID_VENDOR_ID_REG);
--
2.17.1
* [PATCH 1/4] habanalabs: sync stream structures refactor
2020-11-02 19:57 [PATCH] habanalabs: don't init vm module if no MMU Oded Gabbay
2020-11-02 19:57 ` [PATCH] habanalabs: minimize prints when everything is fine Oded Gabbay
@ 2020-11-02 19:57 ` Oded Gabbay
2020-11-02 19:58 ` [PATCH 2/4] habanalabs: add support for multiple SOBs per monitor Oded Gabbay
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Oded Gabbay @ 2020-11-02 19:57 UTC
To: linux-kernel; +Cc: SW_Drivers, Ofir Bitton
From: Ofir Bitton <obitton@habana.ai>
Refactor the sync stream implementation by adding more structures for
better readability. In addition, reduce the allocated resources.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
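Reader note (not part of the commit): the wraparound handling that this
patch moves into init_signal_cs() can be tried out in isolation. What
follows is a minimal user-space sketch, not driver code. The demo_* names
and the reduced struct are assumptions made for illustration, and the
kref get/put calls are left out. It also makes visible that the rotation
now runs modulo HL_RSVD_SOBS (4), where the removed code used
HL_RSVD_SOBS_IN_USE (2).

```c
#include <assert.h>
#include <stdint.h>

#define HL_RSVD_SOBS	4
#define HL_MAX_SOB_VAL	(1 << 15)

/* Reduced stand-in for struct hl_sync_stream_properties (hypothetical). */
struct demo_sync_stream_prop {
	uint16_t next_sob_val;
	uint8_t curr_sob_offset;
};

/* Same starting state that sync_stream_queue_init() sets up. */
static void demo_prop_init(struct demo_sync_stream_prop *prop)
{
	prop->next_sob_val = 1;
	prop->curr_sob_offset = 0;
}

/*
 * Hand out the SOB value for one signal CS and advance the counter.
 * Returns the value given to this signal; *sob_offset receives the index
 * of the reserved SOB that the value belongs to.
 */
static uint16_t demo_take_signal_val(struct demo_sync_stream_prop *prop,
				     uint8_t *sob_offset)
{
	uint16_t val = prop->next_sob_val++;

	*sob_offset = prop->curr_sob_offset;

	/* check for wraparound: restart at 1 on the next reserved SOB */
	if (prop->next_sob_val == HL_MAX_SOB_VAL) {
		prop->next_sob_val = 1;
		prop->curr_sob_offset =
			(prop->curr_sob_offset + 1) % HL_RSVD_SOBS;
	}

	return val;
}
```

Starting from demo_prop_init(), the first HL_MAX_SOB_VAL - 1 calls all
report SOB offset 0. The call after that reports offset 1 and value 1.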
drivers/misc/habanalabs/common/habanalabs.h | 78 +++++---
drivers/misc/habanalabs/common/hw_queue.c | 197 +++++++++++---------
drivers/misc/habanalabs/gaudi/gaudi.c | 37 ++--
drivers/misc/habanalabs/goya/goya.c | 4 +-
4 files changed, 180 insertions(+), 136 deletions(-)
diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index d12f11bdaacf..58b4097235d9 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -68,9 +68,6 @@
#define HL_RSVD_SOBS 4
#define HL_RSVD_MONS 2
-#define HL_RSVD_SOBS_IN_USE 2
-#define HL_RSVD_MONS_IN_USE 1
-
#define HL_MAX_SOB_VAL (1 << 15)
#define IS_POWER_OF_2(n) (n != 0 && ((n & (n - 1)) == 0))
@@ -80,6 +77,22 @@
#define HL_MAX_DCORES 4
+/**
+ * struct hl_gen_wait_properties - properties for generating a wait CB
+ * @data: command buffer
+ * @q_idx: queue id is used to extract fence register address
+ * @sob_id: SOB id to use in this wait CB
+ * @sob_val: SOB value to wait for
+ * @mon_id: monitor to use in this wait CB
+ */
+struct hl_gen_wait_properties {
+ void *data;
+ u32 q_idx;
+ u16 sob_id;
+ u16 sob_val;
+ u16 mon_id;
+};
+
/**
* struct pgt_info - MMU hop page info.
* @node: hash linked-list node for the pgts shadow hash of pgts.
@@ -502,9 +515,27 @@ struct hl_cs_job;
#define HL_CPU_ACCESSIBLE_MEM_SIZE SZ_2M
/**
- * struct hl_hw_queue - describes a H/W transport queue.
+ * struct hl_sync_stream_properties -
+ * describes a H/W queue sync stream properties
* @hw_sob: array of the used H/W SOBs by this H/W queue.
+ * @next_sob_val: the next value to use for the currently used SOB.
+ * @base_sob_id: the base SOB id of the SOBs used by this queue.
+ * @base_mon_id: the base MON id of the MONs used by this queue.
+ * @curr_sob_offset: the id offset to the currently used SOB from the
+ * HL_RSVD_SOBS that are being used by this queue.
+ */
+struct hl_sync_stream_properties {
+ struct hl_hw_sob hw_sob[HL_RSVD_SOBS];
+ u16 next_sob_val;
+ u16 base_sob_id;
+ u16 base_mon_id;
+ u8 curr_sob_offset;
+};
+
+/**
+ * struct hl_hw_queue - describes a H/W transport queue.
* @shadow_queue: pointer to a shadow queue that holds pointers to jobs.
+ * @sync_stream_prop: sync stream queue properties
* @queue_type: type of queue.
* @kernel_address: holds the queue's kernel virtual address.
* @bus_address: holds the queue's DMA address.
@@ -514,33 +545,24 @@ struct hl_cs_job;
* @cq_id: the id for the corresponding CQ for this H/W queue.
* @msi_vec: the IRQ number of the H/W queue.
* @int_queue_len: length of internal queue (number of entries).
- * @next_sob_val: the next value to use for the currently used SOB.
- * @base_sob_id: the base SOB id of the SOBs used by this queue.
- * @base_mon_id: the base MON id of the MONs used by this queue.
* @valid: is the queue valid (we have array of 32 queues, not all of them
* exist).
- * @curr_sob_offset: the id offset to the currently used SOB from the
- * HL_RSVD_SOBS that are being used by this queue.
* @supports_sync_stream: True if queue supports sync stream
*/
struct hl_hw_queue {
- struct hl_hw_sob hw_sob[HL_RSVD_SOBS];
- struct hl_cs_job **shadow_queue;
- enum hl_queue_type queue_type;
- u64 kernel_address;
- dma_addr_t bus_address;
- u32 pi;
- atomic_t ci;
- u32 hw_queue_id;
- u32 cq_id;
- u32 msi_vec;
- u16 int_queue_len;
- u16 next_sob_val;
- u16 base_sob_id;
- u16 base_mon_id;
- u8 valid;
- u8 curr_sob_offset;
- u8 supports_sync_stream;
+ struct hl_cs_job **shadow_queue;
+ struct hl_sync_stream_properties sync_stream_prop;
+ enum hl_queue_type queue_type;
+ u64 kernel_address;
+ dma_addr_t bus_address;
+ u32 pi;
+ atomic_t ci;
+ u32 hw_queue_id;
+ u32 cq_id;
+ u32 msi_vec;
+ u16 int_queue_len;
+ u8 valid;
+ u8 supports_sync_stream;
};
/**
@@ -823,8 +845,8 @@ struct hl_asic_funcs {
u32 (*get_signal_cb_size)(struct hl_device *hdev);
u32 (*get_wait_cb_size)(struct hl_device *hdev);
void (*gen_signal_cb)(struct hl_device *hdev, void *data, u16 sob_id);
- void (*gen_wait_cb)(struct hl_device *hdev, void *data, u16 sob_id,
- u16 sob_val, u16 mon_id, u32 q_idx);
+ void (*gen_wait_cb)(struct hl_device *hdev,
+ struct hl_gen_wait_properties *prop);
void (*reset_sob)(struct hl_device *hdev, void *data);
void (*set_dma_mask_from_fw)(struct hl_device *hdev);
u64 (*get_device_time)(struct hl_device *hdev);
diff --git a/drivers/misc/habanalabs/common/hw_queue.c b/drivers/misc/habanalabs/common/hw_queue.c
index adb61f9eb2b5..613681c2cdcc 100644
--- a/drivers/misc/habanalabs/common/hw_queue.c
+++ b/drivers/misc/habanalabs/common/hw_queue.c
@@ -389,6 +389,89 @@ static void hw_queue_schedule_job(struct hl_cs_job *job)
ext_and_hw_queue_submit_bd(hdev, q, ctl, len, ptr);
}
+static void init_signal_cs(struct hl_device *hdev,
+ struct hl_cs_job *job, struct hl_cs_compl *cs_cmpl)
+{
+ struct hl_sync_stream_properties *prop;
+ struct hl_hw_sob *hw_sob;
+ u32 q_idx;
+
+ q_idx = job->hw_queue_id;
+ prop = &hdev->kernel_queues[q_idx].sync_stream_prop;
+ hw_sob = &prop->hw_sob[prop->curr_sob_offset];
+
+ cs_cmpl->hw_sob = hw_sob;
+ cs_cmpl->sob_val = prop->next_sob_val++;
+
+ dev_dbg(hdev->dev,
+ "generate signal CB, sob_id: %d, sob val: 0x%x, q_idx: %d\n",
+ cs_cmpl->hw_sob->sob_id, cs_cmpl->sob_val, q_idx);
+
+ hdev->asic_funcs->gen_signal_cb(hdev, job->patched_cb,
+ cs_cmpl->hw_sob->sob_id);
+
+ kref_get(&hw_sob->kref);
+
+ /* check for wraparound */
+ if (prop->next_sob_val == HL_MAX_SOB_VAL) {
+ /*
+ * Decrement as we reached the max value.
+ * The release function won't be called here as we've
+ * just incremented the refcount.
+ */
+ kref_put(&hw_sob->kref, hl_sob_reset_error);
+ prop->next_sob_val = 1;
+ /* only two SOBs are currently in use */
+ prop->curr_sob_offset =
+ (prop->curr_sob_offset + 1) % HL_RSVD_SOBS;
+
+ dev_dbg(hdev->dev, "switched to SOB %d, q_idx: %d\n",
+ prop->curr_sob_offset, q_idx);
+ }
+}
+
+static void init_wait_cs(struct hl_device *hdev, struct hl_cs *cs,
+ struct hl_cs_job *job, struct hl_cs_compl *cs_cmpl)
+{
+ struct hl_cs_compl *signal_cs_cmpl;
+ struct hl_sync_stream_properties *prop;
+ struct hl_gen_wait_properties wait_prop;
+ u32 q_idx;
+
+ q_idx = job->hw_queue_id;
+ prop = &hdev->kernel_queues[q_idx].sync_stream_prop;
+
+ signal_cs_cmpl = container_of(cs->signal_fence,
+ struct hl_cs_compl,
+ base_fence);
+
+ /* copy the SOB id and value of the signal CS */
+ cs_cmpl->hw_sob = signal_cs_cmpl->hw_sob;
+ cs_cmpl->sob_val = signal_cs_cmpl->sob_val;
+
+ dev_dbg(hdev->dev,
+ "generate wait CB, sob_id: %d, sob_val: 0x%x, mon_id: %d, q_idx: %d\n",
+ cs_cmpl->hw_sob->sob_id, cs_cmpl->sob_val,
+ prop->base_mon_id, q_idx);
+
+ wait_prop.data = (void *) job->patched_cb;
+ wait_prop.sob_id = cs_cmpl->hw_sob->sob_id;
+ wait_prop.sob_val = cs_cmpl->sob_val;
+ wait_prop.mon_id = prop->base_mon_id;
+ wait_prop.q_idx = q_idx;
+ hdev->asic_funcs->gen_wait_cb(hdev, &wait_prop);
+
+ kref_get(&cs_cmpl->hw_sob->kref);
+ /*
+ * Must put the signal fence after the SOB refcnt increment so
+ * the SOB refcnt won't turn 0 and reset the SOB before the
+ * wait CS was submitted.
+ */
+ mb();
+ hl_fence_put(cs->signal_fence);
+ cs->signal_fence = NULL;
+}
+
/*
* init_signal_wait_cs - initialize a signal/wait CS
* @cs: pointer to the signal/wait CS
@@ -399,84 +482,18 @@ static void init_signal_wait_cs(struct hl_cs *cs)
{
struct hl_ctx *ctx = cs->ctx;
struct hl_device *hdev = ctx->hdev;
- struct hl_hw_queue *hw_queue;
+ struct hl_cs_job *job;
struct hl_cs_compl *cs_cmpl =
container_of(cs->fence, struct hl_cs_compl, base_fence);
- struct hl_hw_sob *hw_sob;
- struct hl_cs_job *job;
- u32 q_idx;
-
/* There is only one job in a signal/wait CS */
job = list_first_entry(&cs->job_list, struct hl_cs_job,
cs_node);
- q_idx = job->hw_queue_id;
- hw_queue = &hdev->kernel_queues[q_idx];
-
- if (cs->type & CS_TYPE_SIGNAL) {
- hw_sob = &hw_queue->hw_sob[hw_queue->curr_sob_offset];
-
- cs_cmpl->hw_sob = hw_sob;
- cs_cmpl->sob_val = hw_queue->next_sob_val++;
-
- dev_dbg(hdev->dev,
- "generate signal CB, sob_id: %d, sob val: 0x%x, q_idx: %d\n",
- cs_cmpl->hw_sob->sob_id, cs_cmpl->sob_val, q_idx);
-
- hdev->asic_funcs->gen_signal_cb(hdev, job->patched_cb,
- cs_cmpl->hw_sob->sob_id);
-
- kref_get(&hw_sob->kref);
-
- /* check for wraparound */
- if (hw_queue->next_sob_val == HL_MAX_SOB_VAL) {
- /*
- * Decrement as we reached the max value.
- * The release function won't be called here as we've
- * just incremented the refcount.
- */
- kref_put(&hw_sob->kref, hl_sob_reset_error);
- hw_queue->next_sob_val = 1;
- /* only two SOBs are currently in use */
- hw_queue->curr_sob_offset =
- (hw_queue->curr_sob_offset + 1) %
- HL_RSVD_SOBS_IN_USE;
-
- dev_dbg(hdev->dev, "switched to SOB %d, q_idx: %d\n",
- hw_queue->curr_sob_offset, q_idx);
- }
- } else if (cs->type & CS_TYPE_WAIT) {
- struct hl_cs_compl *signal_cs_cmpl;
-
- signal_cs_cmpl = container_of(cs->signal_fence,
- struct hl_cs_compl,
- base_fence);
-
- /* copy the the SOB id and value of the signal CS */
- cs_cmpl->hw_sob = signal_cs_cmpl->hw_sob;
- cs_cmpl->sob_val = signal_cs_cmpl->sob_val;
- dev_dbg(hdev->dev,
- "generate wait CB, sob_id: %d, sob_val: 0x%x, mon_id: %d, q_idx: %d\n",
- cs_cmpl->hw_sob->sob_id, cs_cmpl->sob_val,
- hw_queue->base_mon_id, q_idx);
-
- hdev->asic_funcs->gen_wait_cb(hdev, job->patched_cb,
- cs_cmpl->hw_sob->sob_id,
- cs_cmpl->sob_val,
- hw_queue->base_mon_id,
- q_idx);
-
- kref_get(&cs_cmpl->hw_sob->kref);
- /*
- * Must put the signal fence after the SOB refcnt increment so
- * the SOB refcnt won't turn 0 and reset the SOB before the
- * wait CS was submitted.
- */
- mb();
- hl_fence_put(cs->signal_fence);
- cs->signal_fence = NULL;
- }
+ if (cs->type & CS_TYPE_SIGNAL)
+ init_signal_cs(hdev, job, cs_cmpl);
+ else if (cs->type & CS_TYPE_WAIT)
+ init_wait_cs(hdev, cs, job, cs_cmpl);
}
/*
@@ -720,22 +737,28 @@ static int hw_queue_init(struct hl_device *hdev, struct hl_hw_queue *q)
static void sync_stream_queue_init(struct hl_device *hdev, u32 q_idx)
{
- struct hl_hw_queue *hw_queue = &hdev->kernel_queues[q_idx];
+ struct hl_sync_stream_properties *sync_stream_prop;
struct asic_fixed_properties *prop = &hdev->asic_prop;
struct hl_hw_sob *hw_sob;
- int sob, queue_idx = hdev->sync_stream_queue_idx++;
+ int sob, queue_idx;
+
+ if (!hdev->kernel_queues[q_idx].supports_sync_stream)
+ return;
+
+ sync_stream_prop = &hdev->kernel_queues[q_idx].sync_stream_prop;
+ queue_idx = hdev->sync_stream_queue_idx++;
- hw_queue->base_sob_id =
- prop->sync_stream_first_sob + queue_idx * HL_RSVD_SOBS;
- hw_queue->base_mon_id =
- prop->sync_stream_first_mon + queue_idx * HL_RSVD_MONS;
- hw_queue->next_sob_val = 1;
- hw_queue->curr_sob_offset = 0;
+ sync_stream_prop->base_sob_id = prop->sync_stream_first_sob +
+ (queue_idx * HL_RSVD_SOBS);
+ sync_stream_prop->base_mon_id = prop->sync_stream_first_mon +
+ (queue_idx * HL_RSVD_MONS);
+ sync_stream_prop->next_sob_val = 1;
+ sync_stream_prop->curr_sob_offset = 0;
for (sob = 0 ; sob < HL_RSVD_SOBS ; sob++) {
- hw_sob = &hw_queue->hw_sob[sob];
+ hw_sob = &sync_stream_prop->hw_sob[sob];
hw_sob->hdev = hdev;
- hw_sob->sob_id = hw_queue->base_sob_id + sob;
+ hw_sob->sob_id = sync_stream_prop->base_sob_id + sob;
hw_sob->q_idx = q_idx;
kref_init(&hw_sob->kref);
}
@@ -743,15 +766,16 @@ static void sync_stream_queue_init(struct hl_device *hdev, u32 q_idx)
static void sync_stream_queue_reset(struct hl_device *hdev, u32 q_idx)
{
- struct hl_hw_queue *hw_queue = &hdev->kernel_queues[q_idx];
+ struct hl_sync_stream_properties *prop =
+ &hdev->kernel_queues[q_idx].sync_stream_prop;
/*
* In case we got here due to a stuck CS, the refcnt might be bigger
* than 1 and therefore we reset it.
*/
- kref_init(&hw_queue->hw_sob[hw_queue->curr_sob_offset].kref);
- hw_queue->curr_sob_offset = 0;
- hw_queue->next_sob_val = 1;
+ kref_init(&prop->hw_sob[prop->curr_sob_offset].kref);
+ prop->curr_sob_offset = 0;
+ prop->next_sob_val = 1;
}
/*
@@ -794,8 +818,7 @@ static int queue_init(struct hl_device *hdev, struct hl_hw_queue *q,
break;
}
- if (q->supports_sync_stream)
- sync_stream_queue_init(hdev, q->hw_queue_id);
+ sync_stream_queue_init(hdev, q->hw_queue_id);
if (rc)
return rc;
diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index 9d9d22c4452c..46dced9d1eec 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -472,9 +472,11 @@ static int gaudi_get_fixed_properties(struct hl_device *hdev)
prop->max_pending_cs = GAUDI_MAX_PENDING_CS;
prop->first_available_user_sob[HL_GAUDI_WS_DCORE] =
- num_sync_stream_queues * HL_RSVD_SOBS;
+ prop->sync_stream_first_sob +
+ (num_sync_stream_queues * HL_RSVD_SOBS);
prop->first_available_user_mon[HL_GAUDI_WS_DCORE] =
- num_sync_stream_queues * HL_RSVD_MONS;
+ prop->sync_stream_first_mon +
+ (num_sync_stream_queues * HL_RSVD_MONS);
return 0;
}
@@ -6472,16 +6474,16 @@ static u32 gaudi_add_fence_pkt(struct packet_fence *pkt)
return pkt_size;
}
-static void gaudi_gen_wait_cb(struct hl_device *hdev, void *data, u16 sob_id,
- u16 sob_val, u16 mon_id, u32 q_idx)
+static void gaudi_gen_wait_cb(struct hl_device *hdev,
+ struct hl_gen_wait_properties *prop)
{
- struct hl_cb *cb = (struct hl_cb *) data;
+ struct hl_cb *cb = (struct hl_cb *) prop->data;
void *buf = (void *) (uintptr_t) cb->kernel_address;
u64 monitor_base, fence_addr = 0;
u32 size = 0;
u16 msg_addr_offset;
- switch (q_idx) {
+ switch (prop->q_idx) {
case GAUDI_QUEUE_ID_DMA_0_0:
fence_addr = mmDMA0_QM_CP_FENCE2_RDATA_0;
break;
@@ -6521,7 +6523,7 @@ static void gaudi_gen_wait_cb(struct hl_device *hdev, void *data, u16 sob_id,
default:
/* queue index should be valid here */
dev_crit(hdev->dev, "wrong queue id %d for wait packet\n",
- q_idx);
+ prop->q_idx);
return;
}
@@ -6534,17 +6536,15 @@ static void gaudi_gen_wait_cb(struct hl_device *hdev, void *data, u16 sob_id,
monitor_base = mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0;
/* First monitor config packet: low address of the sync */
- msg_addr_offset =
- (mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0 + mon_id * 4) -
- monitor_base;
+ msg_addr_offset = (mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0 +
+ prop->mon_id * 4) - monitor_base;
size += gaudi_add_mon_msg_short(buf + size, (u32) fence_addr,
msg_addr_offset);
/* Second monitor config packet: high address of the sync */
- msg_addr_offset =
- (mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRH_0 + mon_id * 4) -
- monitor_base;
+ msg_addr_offset = (mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRH_0 +
+ prop->mon_id * 4) - monitor_base;
size += gaudi_add_mon_msg_short(buf + size, (u32) (fence_addr >> 32),
msg_addr_offset);
@@ -6553,18 +6553,17 @@ static void gaudi_gen_wait_cb(struct hl_device *hdev, void *data, u16 sob_id,
* Third monitor config packet: the payload, i.e. what to write when the
* sync triggers
*/
- msg_addr_offset =
- (mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_DATA_0 + mon_id * 4) -
- monitor_base;
+ msg_addr_offset = (mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_DATA_0 +
+ prop->mon_id * 4) - monitor_base;
size += gaudi_add_mon_msg_short(buf + size, 1, msg_addr_offset);
/* Fourth monitor config packet: bind the monitor to a sync object */
msg_addr_offset =
- (mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_ARM_0 + mon_id * 4) -
+ (mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_ARM_0 + prop->mon_id * 4) -
monitor_base;
- size += gaudi_add_arm_monitor_pkt(buf + size, sob_id, sob_val,
- msg_addr_offset);
+ size += gaudi_add_arm_monitor_pkt(buf + size, prop->sob_id,
+ prop->sob_val, msg_addr_offset);
/* Fence packet */
size += gaudi_add_fence_pkt(buf + size);
diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index f41fe748f1ca..cd1366f10fbe 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -5297,8 +5297,8 @@ static void goya_gen_signal_cb(struct hl_device *hdev, void *data, u16 sob_id)
}
-static void goya_gen_wait_cb(struct hl_device *hdev, void *data, u16 sob_id,
- u16 sob_val, u16 mon_id, u32 q_idx)
+static void goya_gen_wait_cb(struct hl_device *hdev,
+ struct hl_gen_wait_properties *prop)
{
}
--
2.17.1
* [PATCH 2/4] habanalabs: add support for multiple SOBs per monitor
2020-11-02 19:57 [PATCH] habanalabs: don't init vm module if no MMU Oded Gabbay
2020-11-02 19:57 ` [PATCH] habanalabs: minimize prints when everything is fine Oded Gabbay
2020-11-02 19:57 ` [PATCH 1/4] habanalabs: sync stream structures refactor Oded Gabbay
@ 2020-11-02 19:58 ` Oded Gabbay
2020-11-02 19:58 ` [PATCH 3/4] habanalabs: sync stream refactor functions Oded Gabbay
2020-11-02 19:58 ` [PATCH 4/4] habanalabs: remove duplicate check Oded Gabbay
4 siblings, 0 replies; 6+ messages in thread
From: Oded Gabbay @ 2020-11-02 19:58 UTC
To: linux-kernel; +Cc: SW_Drivers, Ofir Bitton
From: Ofir Bitton <obitton@habana.ai>
Support advanced monitor functionality to monitor more than a
single SOB. In addition, extend all CB generation functions
with a buffer offset so that multiple packets generated by
different functions can be placed in the same CB.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
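Reader note (not part of the commit): the mask computation added in
hl_gen_sob_mask() is easier to follow with concrete values. The block
below is a user-space sketch that mirrors the logic of the patch as
posted. The name demo_gen_sob_mask and the use of stdint types are
assumptions made so that it compiles outside the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define HL_MAX_SOBS_PER_MONITOR	8

/*
 * sob_base: id of the first SOB, sob_mask: one bit per SOB offset from
 * sob_base, mask: receives the inverted mask for the monitor arm packet.
 * Returns 0 on success, -EINVAL if the request is empty or out of range.
 */
static int demo_gen_sob_mask(uint16_t sob_base, uint8_t sob_mask,
			     uint8_t *mask)
{
	int i;

	if (sob_mask == 0)
		return -EINVAL;

	if (sob_mask == 0x1) {
		/* single SOB: clear only the bit of its slot in the group */
		*mask = (uint8_t)~(1u << (sob_base & 0x7));
		return 0;
	}

	/* find the most significant set bit to validate the SOB range */
	for (i = 7; i >= 0; i--)
		if (sob_mask & (1u << i))
			break;

	if (i > (HL_MAX_SOBS_PER_MONITOR - (int)(sob_base & 0x7) - 1))
		return -EINVAL;

	/* several SOBs: the user mask is inverted as given */
	*mask = (uint8_t)~sob_mask;
	return 0;
}
```

With sob_base 10 and sob_mask 0x1 the sketch yields 0xFB. With sob_base
13 and sob_mask 0x0F it returns -EINVAL, because the highest requested
offset (3) would reach past the group of eight SOBs that starts at 8.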
.../habanalabs/common/command_submission.c | 32 ++++
drivers/misc/habanalabs/common/habanalabs.h | 16 +-
drivers/misc/habanalabs/common/hw_queue.c | 6 +-
drivers/misc/habanalabs/gaudi/gaudi.c | 137 ++++++++++++------
drivers/misc/habanalabs/goya/goya.c | 9 +-
5 files changed, 143 insertions(+), 57 deletions(-)
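Reader note (not part of the commit): the buffer-offset convention that
gen_signal_cb() adopts here (take the current offset, write a packet,
return the new offset) can be sketched as follows. The packet layout and
the demo_* names are hypothetical; the driver itself writes a struct
packet_msg_short into the CB's kernel address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 8-byte packet used only for this illustration. */
struct demo_pkt {
	uint32_t value;
	uint32_t ctl;
};

/* Write one packet at byte offset 'size' and return the new end offset. */
static uint32_t demo_gen_signal(void *buf, uint16_t sob_id, uint32_t size)
{
	struct demo_pkt pkt;

	memset(&pkt, 0, sizeof(pkt));
	pkt.value = 1;		/* increment by one */
	pkt.ctl = sob_id;	/* stand-in for the real control word */
	memcpy((uint8_t *)buf + size, &pkt, sizeof(pkt));

	return size + (uint32_t)sizeof(pkt);
}
```

Feeding each return value into the next call lets several generators
append to one buffer without knowing about each other. The callers in
this patch pass 0, so their packets still start at the beginning of the
CB.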
diff --git a/drivers/misc/habanalabs/common/command_submission.c b/drivers/misc/habanalabs/common/command_submission.c
index 9d49dd1558af..0d82c7dd93d0 100644
--- a/drivers/misc/habanalabs/common/command_submission.c
+++ b/drivers/misc/habanalabs/common/command_submission.c
@@ -38,6 +38,38 @@ void hl_sob_reset_error(struct kref *ref)
hw_sob->q_idx, hw_sob->sob_id);
}
+/**
+ * hl_gen_sob_mask() - Generates a sob mask to be used in a monitor arm packet
+ * @sob_base: sob base id
+ * @sob_mask: sob user mask, each bit represents a sob offset from sob base
+ * @mask: generated mask
+ *
+ * Return: 0 if given parameters are valid
+ */
+int hl_gen_sob_mask(u16 sob_base, u8 sob_mask, u8 *mask)
+{
+ int i;
+
+ if (sob_mask == 0)
+ return -EINVAL;
+
+ if (sob_mask == 0x1) {
+ *mask = ~(1 << (sob_base & 0x7));
+ } else {
+ /* find msb in order to verify sob range is valid */
+ for (i = BITS_PER_BYTE - 1 ; i >= 0 ; i--)
+ if (BIT(i) & sob_mask)
+ break;
+
+ if (i > (HL_MAX_SOBS_PER_MONITOR - (sob_base & 0x7) - 1))
+ return -EINVAL;
+
+ *mask = ~sob_mask;
+ }
+
+ return 0;
+}
+
static void hl_fence_release(struct kref *kref)
{
struct hl_fence *fence =
diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index 58b4097235d9..7307e0b88b44 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -77,20 +77,26 @@
#define HL_MAX_DCORES 4
+#define HL_MAX_SOBS_PER_MONITOR 8
+
/**
* struct hl_gen_wait_properties - properties for generating a wait CB
* @data: command buffer
* @q_idx: queue id is used to extract fence register address
- * @sob_id: SOB id to use in this wait CB
+ * @size: offset in command buffer
+ * @sob_base: SOB base to use in this wait CB
* @sob_val: SOB value to wait for
* @mon_id: monitor to use in this wait CB
+ * @sob_mask: each bit represents a SOB offset from sob_base to be used
*/
struct hl_gen_wait_properties {
void *data;
u32 q_idx;
- u16 sob_id;
+ u32 size;
+ u16 sob_base;
u16 sob_val;
u16 mon_id;
+ u8 sob_mask;
};
/**
@@ -844,8 +850,9 @@ struct hl_asic_funcs {
int (*load_boot_fit_to_device)(struct hl_device *hdev);
u32 (*get_signal_cb_size)(struct hl_device *hdev);
u32 (*get_wait_cb_size)(struct hl_device *hdev);
- void (*gen_signal_cb)(struct hl_device *hdev, void *data, u16 sob_id);
- void (*gen_wait_cb)(struct hl_device *hdev,
+ u32 (*gen_signal_cb)(struct hl_device *hdev, void *data, u16 sob_id,
+ u32 size);
+ u32 (*gen_wait_cb)(struct hl_device *hdev,
struct hl_gen_wait_properties *prop);
void (*reset_sob)(struct hl_device *hdev, void *data);
void (*set_dma_mask_from_fw)(struct hl_device *hdev);
@@ -1927,6 +1934,7 @@ void hl_cs_rollback_all(struct hl_device *hdev);
struct hl_cs_job *hl_cs_allocate_job(struct hl_device *hdev,
enum hl_queue_type queue_type, bool is_kernel_allocated_cb);
void hl_sob_reset_error(struct kref *ref);
+int hl_gen_sob_mask(u16 sob_base, u8 sob_mask, u8 *mask);
void hl_fence_put(struct hl_fence *fence);
void hl_fence_get(struct hl_fence *fence);
diff --git a/drivers/misc/habanalabs/common/hw_queue.c b/drivers/misc/habanalabs/common/hw_queue.c
index 613681c2cdcc..ca625789d78d 100644
--- a/drivers/misc/habanalabs/common/hw_queue.c
+++ b/drivers/misc/habanalabs/common/hw_queue.c
@@ -408,7 +408,7 @@ static void init_signal_cs(struct hl_device *hdev,
cs_cmpl->hw_sob->sob_id, cs_cmpl->sob_val, q_idx);
hdev->asic_funcs->gen_signal_cb(hdev, job->patched_cb,
- cs_cmpl->hw_sob->sob_id);
+ cs_cmpl->hw_sob->sob_id, 0);
kref_get(&hw_sob->kref);
@@ -455,10 +455,12 @@ static void init_wait_cs(struct hl_device *hdev, struct hl_cs *cs,
prop->base_mon_id, q_idx);
wait_prop.data = (void *) job->patched_cb;
- wait_prop.sob_id = cs_cmpl->hw_sob->sob_id;
+ wait_prop.sob_base = cs_cmpl->hw_sob->sob_id;
+ wait_prop.sob_mask = 0x1;
wait_prop.sob_val = cs_cmpl->sob_val;
wait_prop.mon_id = prop->base_mon_id;
wait_prop.q_idx = q_idx;
+ wait_prop.size = 0;
hdev->asic_funcs->gen_wait_cb(hdev, &wait_prop);
kref_get(&cs_cmpl->hw_sob->kref);
diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index 46dced9d1eec..930b26b1f445 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -6380,14 +6380,16 @@ static u32 gaudi_get_wait_cb_size(struct hl_device *hdev)
sizeof(struct packet_msg_prot) * 2;
}
-static void gaudi_gen_signal_cb(struct hl_device *hdev, void *data, u16 sob_id)
+static u32 gaudi_gen_signal_cb(struct hl_device *hdev, void *data, u16 sob_id,
+ u32 size)
{
struct hl_cb *cb = (struct hl_cb *) data;
struct packet_msg_short *pkt;
- u32 value, ctl;
+ u32 value, ctl, pkt_size = sizeof(*pkt);
- pkt = (struct packet_msg_short *) (uintptr_t) cb->kernel_address;
- memset(pkt, 0, sizeof(*pkt));
+ pkt = (struct packet_msg_short *) (uintptr_t) (cb->kernel_address +
+ size);
+ memset(pkt, 0, pkt_size);
/* Inc by 1, Mode ADD */
value = FIELD_PREP(GAUDI_PKT_SHORT_VAL_SOB_SYNC_VAL_MASK, 1);
@@ -6403,6 +6405,8 @@ static void gaudi_gen_signal_cb(struct hl_device *hdev, void *data, u16 sob_id)
pkt->value = cpu_to_le32(value);
pkt->ctl = cpu_to_le32(ctl);
+
+ return size + pkt_size;
}
static u32 gaudi_add_mon_msg_short(struct packet_msg_short *pkt, u32 value,
@@ -6425,21 +6429,42 @@ static u32 gaudi_add_mon_msg_short(struct packet_msg_short *pkt, u32 value,
return pkt_size;
}
-static u32 gaudi_add_arm_monitor_pkt(struct packet_msg_short *pkt, u16 sob_id,
- u16 sob_val, u16 addr)
+static u32 gaudi_add_arm_monitor_pkt(struct hl_device *hdev,
+ struct packet_msg_short *pkt, u16 sob_base, u8 sob_mask,
+ u16 sob_val, u16 mon_id)
{
+ u64 monitor_base;
u32 ctl, value, pkt_size = sizeof(*pkt);
- u8 mask = ~(1 << (sob_id & 0x7));
+ u16 msg_addr_offset;
+ u8 mask;
+
+ if (hl_gen_sob_mask(sob_base, sob_mask, &mask)) {
+ dev_err(hdev->dev,
+ "sob_base %u (mask %#x) is not valid\n",
+ sob_base, sob_mask);
+ return 0;
+ }
+
+ /*
+ * monitor_base should be the content of the base0 address registers,
+ * so it will be added to the msg short offsets
+ */
+ monitor_base = mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0;
+
+ msg_addr_offset =
+ (mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_ARM_0 + mon_id * 4) -
+ monitor_base;
memset(pkt, 0, pkt_size);
- value = FIELD_PREP(GAUDI_PKT_SHORT_VAL_MON_SYNC_GID_MASK, sob_id / 8);
+ /* Monitor config packet: bind the monitor to a sync object */
+ value = FIELD_PREP(GAUDI_PKT_SHORT_VAL_MON_SYNC_GID_MASK, sob_base / 8);
value |= FIELD_PREP(GAUDI_PKT_SHORT_VAL_MON_SYNC_VAL_MASK, sob_val);
value |= FIELD_PREP(GAUDI_PKT_SHORT_VAL_MON_MODE_MASK,
0); /* GREATER OR EQUAL*/
value |= FIELD_PREP(GAUDI_PKT_SHORT_VAL_MON_MASK_MASK, mask);
- ctl = FIELD_PREP(GAUDI_PKT_SHORT_CTL_ADDR_MASK, addr);
+ ctl = FIELD_PREP(GAUDI_PKT_SHORT_CTL_ADDR_MASK, msg_addr_offset);
ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_OP_MASK, 0); /* write the value */
ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_BASE_MASK, 2); /* W_S MON base */
ctl |= FIELD_PREP(GAUDI_PKT_SHORT_CTL_OPCODE_MASK, PACKET_MSG_SHORT);
@@ -6474,60 +6499,61 @@ static u32 gaudi_add_fence_pkt(struct packet_fence *pkt)
return pkt_size;
}
-static void gaudi_gen_wait_cb(struct hl_device *hdev,
- struct hl_gen_wait_properties *prop)
+static int gaudi_get_fence_addr(struct hl_device *hdev, u32 queue_id, u64 *addr)
{
- struct hl_cb *cb = (struct hl_cb *) prop->data;
- void *buf = (void *) (uintptr_t) cb->kernel_address;
- u64 monitor_base, fence_addr = 0;
- u32 size = 0;
- u16 msg_addr_offset;
+ u32 offset;
- switch (prop->q_idx) {
+ switch (queue_id) {
case GAUDI_QUEUE_ID_DMA_0_0:
- fence_addr = mmDMA0_QM_CP_FENCE2_RDATA_0;
+ offset = mmDMA0_QM_CP_FENCE2_RDATA_0;
break;
case GAUDI_QUEUE_ID_DMA_0_1:
- fence_addr = mmDMA0_QM_CP_FENCE2_RDATA_1;
+ offset = mmDMA0_QM_CP_FENCE2_RDATA_1;
break;
case GAUDI_QUEUE_ID_DMA_0_2:
- fence_addr = mmDMA0_QM_CP_FENCE2_RDATA_2;
+ offset = mmDMA0_QM_CP_FENCE2_RDATA_2;
break;
case GAUDI_QUEUE_ID_DMA_0_3:
- fence_addr = mmDMA0_QM_CP_FENCE2_RDATA_3;
+ offset = mmDMA0_QM_CP_FENCE2_RDATA_3;
break;
case GAUDI_QUEUE_ID_DMA_1_0:
- fence_addr = mmDMA1_QM_CP_FENCE2_RDATA_0;
+ offset = mmDMA1_QM_CP_FENCE2_RDATA_0;
break;
case GAUDI_QUEUE_ID_DMA_1_1:
- fence_addr = mmDMA1_QM_CP_FENCE2_RDATA_1;
+ offset = mmDMA1_QM_CP_FENCE2_RDATA_1;
break;
case GAUDI_QUEUE_ID_DMA_1_2:
- fence_addr = mmDMA1_QM_CP_FENCE2_RDATA_2;
+ offset = mmDMA1_QM_CP_FENCE2_RDATA_2;
break;
case GAUDI_QUEUE_ID_DMA_1_3:
- fence_addr = mmDMA1_QM_CP_FENCE2_RDATA_3;
+ offset = mmDMA1_QM_CP_FENCE2_RDATA_3;
break;
case GAUDI_QUEUE_ID_DMA_5_0:
- fence_addr = mmDMA5_QM_CP_FENCE2_RDATA_0;
+ offset = mmDMA5_QM_CP_FENCE2_RDATA_0;
break;
case GAUDI_QUEUE_ID_DMA_5_1:
- fence_addr = mmDMA5_QM_CP_FENCE2_RDATA_1;
+ offset = mmDMA5_QM_CP_FENCE2_RDATA_1;
break;
case GAUDI_QUEUE_ID_DMA_5_2:
- fence_addr = mmDMA5_QM_CP_FENCE2_RDATA_2;
+ offset = mmDMA5_QM_CP_FENCE2_RDATA_2;
break;
case GAUDI_QUEUE_ID_DMA_5_3:
- fence_addr = mmDMA5_QM_CP_FENCE2_RDATA_3;
+ offset = mmDMA5_QM_CP_FENCE2_RDATA_3;
break;
default:
- /* queue index should be valid here */
- dev_crit(hdev->dev, "wrong queue id %d for wait packet\n",
- prop->q_idx);
- return;
+ return -EINVAL;
}
- fence_addr += CFG_BASE;
+ *addr = CFG_BASE + offset;
+
+ return 0;
+}
+
+static u32 gaudi_add_mon_pkts(void *buf, u16 mon_id, u64 fence_addr)
+{
+ u64 monitor_base;
+ u32 size = 0;
+ u16 msg_addr_offset;
/*
* monitor_base should be the content of the base0 address registers,
@@ -6536,15 +6562,17 @@ static void gaudi_gen_wait_cb(struct hl_device *hdev,
monitor_base = mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0;
/* First monitor config packet: low address of the sync */
- msg_addr_offset = (mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0 +
- prop->mon_id * 4) - monitor_base;
+ msg_addr_offset =
+ (mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRL_0 + mon_id * 4) -
+ monitor_base;
size += gaudi_add_mon_msg_short(buf + size, (u32) fence_addr,
msg_addr_offset);
/* Second monitor config packet: high address of the sync */
- msg_addr_offset = (mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRH_0 +
- prop->mon_id * 4) - monitor_base;
+ msg_addr_offset =
+ (mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_ADDRH_0 + mon_id * 4) -
+ monitor_base;
size += gaudi_add_mon_msg_short(buf + size, (u32) (fence_addr >> 32),
msg_addr_offset);
@@ -6553,20 +6581,35 @@ static void gaudi_gen_wait_cb(struct hl_device *hdev,
* Third monitor config packet: the payload, i.e. what to write when the
* sync triggers
*/
- msg_addr_offset = (mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_DATA_0 +
- prop->mon_id * 4) - monitor_base;
+ msg_addr_offset =
+ (mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_PAY_DATA_0 + mon_id * 4) -
+ monitor_base;
size += gaudi_add_mon_msg_short(buf + size, 1, msg_addr_offset);
- /* Fourth monitor config packet: bind the monitor to a sync object */
- msg_addr_offset =
- (mmSYNC_MNGR_W_S_SYNC_MNGR_OBJS_MON_ARM_0 + prop->mon_id * 4) -
- monitor_base;
- size += gaudi_add_arm_monitor_pkt(buf + size, prop->sob_id,
- prop->sob_val, msg_addr_offset);
+ return size;
+}
+
+u32 gaudi_gen_wait_cb(struct hl_device *hdev,
+ struct hl_gen_wait_properties *prop)
+{
+ struct hl_cb *cb = (struct hl_cb *) prop->data;
+ void *buf = (void *) (uintptr_t) cb->kernel_address;
+ u64 fence_addr = 0;
+ u32 size = prop->size;
- /* Fence packet */
+ if (gaudi_get_fence_addr(hdev, prop->q_idx, &fence_addr)) {
+ dev_crit(hdev->dev, "wrong queue id %d for wait packet\n",
+ prop->q_idx);
+ return 0;
+ }
+
+ size += gaudi_add_mon_pkts(buf + size, prop->mon_id, fence_addr);
+ size += gaudi_add_arm_monitor_pkt(hdev, buf + size, prop->sob_base,
+ prop->sob_mask, prop->sob_val, prop->mon_id);
size += gaudi_add_fence_pkt(buf + size);
+
+ return size;
}
static void gaudi_reset_sob(struct hl_device *hdev, void *data)
diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index cd1366f10fbe..e8bf0b79cd67 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -5292,15 +5292,16 @@ static u32 goya_get_wait_cb_size(struct hl_device *hdev)
return 0;
}
-static void goya_gen_signal_cb(struct hl_device *hdev, void *data, u16 sob_id)
+static u32 goya_gen_signal_cb(struct hl_device *hdev, void *data, u16 sob_id,
+ u32 size)
{
-
+ return 0;
}
-static void goya_gen_wait_cb(struct hl_device *hdev,
+static u32 goya_gen_wait_cb(struct hl_device *hdev,
struct hl_gen_wait_properties *prop)
{
-
+ return 0;
}
static void goya_reset_sob(struct hl_device *hdev, void *data)
--
2.17.1
* [PATCH 3/4] habanalabs: sync stream refactor functions
2020-11-02 19:57 [PATCH] habanalabs: don't init vm module if no MMU Oded Gabbay
` (2 preceding siblings ...)
2020-11-02 19:58 ` [PATCH 2/4] habanalabs: add support for multiple SOBs per monitor Oded Gabbay
@ 2020-11-02 19:58 ` Oded Gabbay
2020-11-02 19:58 ` [PATCH 4/4] habanalabs: remove duplicate check Oded Gabbay
4 siblings, 0 replies; 6+ messages in thread
From: Oded Gabbay @ 2020-11-02 19:58 UTC (permalink / raw)
To: linux-kernel; +Cc: SW_Drivers, Ofir Bitton
From: Ofir Bitton <obitton@habana.ai>
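Refactor the sync stream implementation for better readability by
splitting the long cs_ioctl_signal_wait() into two helpers:
cs_ioctl_extract_signal_seq(), which copies the signal sequence from
user space, and cs_ioctl_signal_wait_create_jobs(), which allocates
the job and its kernel CB.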
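As a rough illustration of why the free_signal_seq_array label can go
away, the sketch below is a userspace analogue of the extraction helper,
not the driver code itself. It assumes malloc()/memcpy() in place of
kmalloc_array()/copy_from_user(), and the name extract_signal_seq() is
hypothetical. The temporary array is allocated and freed inside the
helper, so the caller only sees a return code and the extracted value.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace analogue of cs_ioctl_extract_signal_seq():
 * malloc()/memcpy() stand in for kmalloc_array()/copy_from_user().
 */
static int extract_signal_seq(const uint64_t *src, uint32_t len, uint64_t *seq)
{
	uint64_t *arr;

	assert(seq != NULL);

	/* Only a single signal sequence is supported, as in the patch. */
	if (len != 1)
		return -EINVAL;

	arr = malloc(len * sizeof(*arr));
	if (!arr)
		return -ENOMEM;

	memcpy(arr, src, len * sizeof(*arr));
	*seq = arr[0];

	/* The temporary buffer never escapes the helper. */
	free(arr);
	return 0;
}
```

Because the buffer's lifetime ends inside the helper, every error path
in the caller can jump straight to free_cs_chunk_array.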
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
.../habanalabs/common/command_submission.c | 210 ++++++++++--------
1 file changed, 119 insertions(+), 91 deletions(-)
diff --git a/drivers/misc/habanalabs/common/command_submission.c b/drivers/misc/habanalabs/common/command_submission.c
index 0d82c7dd93d0..b0f33579ca7f 100644
--- a/drivers/misc/habanalabs/common/command_submission.c
+++ b/drivers/misc/habanalabs/common/command_submission.c
@@ -792,22 +792,121 @@ static int cs_ioctl_default(struct hl_fpriv *hpriv, void __user *chunks,
return rc;
}
+static int cs_ioctl_extract_signal_seq(struct hl_device *hdev,
+ struct hl_cs_chunk *chunk, u64 *signal_seq)
+{
+ u64 *signal_seq_arr = NULL;
+ u32 size_to_copy, signal_seq_arr_len;
+ int rc = 0;
+
+ signal_seq_arr_len = chunk->num_signal_seq_arr;
+
+ /* currently only one signal seq is supported */
+ if (signal_seq_arr_len != 1) {
+ dev_err(hdev->dev,
+ "Wait for signal CS supports only one signal CS seq\n");
+ return -EINVAL;
+ }
+
+ signal_seq_arr = kmalloc_array(signal_seq_arr_len,
+ sizeof(*signal_seq_arr),
+ GFP_ATOMIC);
+ if (!signal_seq_arr)
+ return -ENOMEM;
+
+ size_to_copy = chunk->num_signal_seq_arr * sizeof(*signal_seq_arr);
+ if (copy_from_user(signal_seq_arr,
+ u64_to_user_ptr(chunk->signal_seq_arr),
+ size_to_copy)) {
+ dev_err(hdev->dev,
+ "Failed to copy signal seq array from user\n");
+ rc = -EFAULT;
+ goto out;
+ }
+
+ /* currently it is guaranteed to have only one signal seq */
+ *signal_seq = signal_seq_arr[0];
+
+out:
+ kfree(signal_seq_arr);
+
+ return rc;
+}
+
+static int cs_ioctl_signal_wait_create_jobs(struct hl_device *hdev,
+ struct hl_ctx *ctx, struct hl_cs *cs, enum hl_queue_type q_type,
+ u32 q_idx)
+{
+ struct hl_cs_counters_atomic *cntr;
+ struct hl_cs_job *job;
+ struct hl_cb *cb;
+ u32 cb_size;
+
+ cntr = &hdev->aggregated_cs_counters;
+
+ job = hl_cs_allocate_job(hdev, q_type, true);
+ if (!job) {
+ ctx->cs_counters.out_of_mem_drop_cnt++;
+ atomic64_inc(&cntr->out_of_mem_drop_cnt);
+ dev_err(hdev->dev, "Failed to allocate a new job\n");
+ return -ENOMEM;
+ }
+
+ if (cs->type == CS_TYPE_WAIT)
+ cb_size = hdev->asic_funcs->get_wait_cb_size(hdev);
+ else
+ cb_size = hdev->asic_funcs->get_signal_cb_size(hdev);
+
+ cb = hl_cb_kernel_create(hdev, cb_size,
+ q_type == QUEUE_TYPE_HW && hdev->mmu_enable);
+ if (!cb) {
+ ctx->cs_counters.out_of_mem_drop_cnt++;
+ atomic64_inc(&cntr->out_of_mem_drop_cnt);
+ kfree(job);
+ return -EFAULT;
+ }
+
+ job->id = 0;
+ job->cs = cs;
+ job->user_cb = cb;
+ job->user_cb->cs_cnt++;
+ job->user_cb_size = cb_size;
+ job->hw_queue_id = q_idx;
+
+ /*
+ * No need in parsing, user CB is the patched CB.
+ * We call hl_cb_destroy() out of two reasons - we don't need the CB in
+ * the CB idr anymore and to decrement its refcount as it was
+ * incremented inside hl_cb_kernel_create().
+ */
+ job->patched_cb = job->user_cb;
+ job->job_cb_size = job->user_cb_size;
+ hl_cb_destroy(hdev, &hdev->kernel_cb_mgr, cb->id << PAGE_SHIFT);
+
+ cs->jobs_in_queue_cnt[job->hw_queue_id]++;
+
+ list_add_tail(&job->cs_node, &cs->job_list);
+
+ hl_debugfs_add_job(hdev, job);
+
+ return 0;
+}
+
static int cs_ioctl_signal_wait(struct hl_fpriv *hpriv, enum hl_cs_type cs_type,
void __user *chunks, u32 num_chunks,
u64 *cs_seq)
{
- u32 size_to_copy, q_idx, signal_seq_arr_len, cb_size;
+ struct hl_device *hdev = hpriv->hdev;
+ struct hl_ctx *ctx = hpriv->ctx;
struct hl_cs_chunk *cs_chunk_array, *chunk;
struct hw_queue_properties *hw_queue_prop;
- u64 *signal_seq_arr = NULL, signal_seq;
- struct hl_device *hdev = hpriv->hdev;
- struct hl_cs_counters_atomic *cntr;
struct hl_fence *sig_fence = NULL;
- struct hl_ctx *ctx = hpriv->ctx;
- enum hl_queue_type q_type;
- struct hl_cs_job *job;
+ struct hl_cs_counters_atomic *cntr;
+ struct hl_cs_compl *sig_waitcs_cmpl;
struct hl_cs *cs;
- struct hl_cb *cb;
+ enum hl_queue_type q_type;
+ u32 size_to_copy, q_idx;
+ u64 signal_seq;
int rc;
*cs_seq = ULLONG_MAX;
@@ -857,52 +956,23 @@ static int cs_ioctl_signal_wait(struct hl_fpriv *hpriv, enum hl_cs_type cs_type,
}
if (cs_type == CS_TYPE_WAIT) {
- struct hl_cs_compl *sig_waitcs_cmpl;
-
- signal_seq_arr_len = chunk->num_signal_seq_arr;
-
- /* currently only one signal seq is supported */
- if (signal_seq_arr_len != 1) {
- dev_err(hdev->dev,
- "Wait for signal CS supports only one signal CS seq\n");
- rc = -EINVAL;
+ rc = cs_ioctl_extract_signal_seq(hdev, chunk, &signal_seq);
+ if (rc)
goto free_cs_chunk_array;
- }
- signal_seq_arr = kmalloc_array(signal_seq_arr_len,
- sizeof(*signal_seq_arr),
- GFP_ATOMIC);
- if (!signal_seq_arr) {
- rc = -ENOMEM;
- goto free_cs_chunk_array;
- }
-
- size_to_copy = chunk->num_signal_seq_arr *
- sizeof(*signal_seq_arr);
- if (copy_from_user(signal_seq_arr,
- u64_to_user_ptr(chunk->signal_seq_arr),
- size_to_copy)) {
- dev_err(hdev->dev,
- "Failed to copy signal seq array from user\n");
- rc = -EFAULT;
- goto free_signal_seq_array;
- }
-
- /* currently it is guaranteed to have only one signal seq */
- signal_seq = signal_seq_arr[0];
sig_fence = hl_ctx_get_fence(ctx, signal_seq);
if (IS_ERR(sig_fence)) {
dev_err(hdev->dev,
"Failed to get signal CS with seq 0x%llx\n",
signal_seq);
rc = PTR_ERR(sig_fence);
- goto free_signal_seq_array;
+ goto free_cs_chunk_array;
}
if (!sig_fence) {
/* signal CS already finished */
rc = 0;
- goto free_signal_seq_array;
+ goto free_cs_chunk_array;
}
sig_waitcs_cmpl =
@@ -914,14 +984,14 @@ static int cs_ioctl_signal_wait(struct hl_fpriv *hpriv, enum hl_cs_type cs_type,
signal_seq);
hl_fence_put(sig_fence);
rc = -EINVAL;
- goto free_signal_seq_array;
+ goto free_cs_chunk_array;
}
if (completion_done(&sig_fence->completion)) {
/* signal CS already finished */
hl_fence_put(sig_fence);
rc = 0;
- goto free_signal_seq_array;
+ goto free_cs_chunk_array;
}
}
@@ -933,70 +1003,31 @@ static int cs_ioctl_signal_wait(struct hl_fpriv *hpriv, enum hl_cs_type cs_type,
if (cs_type == CS_TYPE_WAIT)
hl_fence_put(sig_fence);
hl_ctx_put(ctx);
- goto free_signal_seq_array;
+ goto free_cs_chunk_array;
}
/*
* Save the signal CS fence for later initialization right before
* hanging the wait CS on the queue.
*/
- if (cs->type == CS_TYPE_WAIT)
+ if (cs_type == CS_TYPE_WAIT)
cs->signal_fence = sig_fence;
hl_debugfs_add_cs(cs);
*cs_seq = cs->sequence;
- job = hl_cs_allocate_job(hdev, q_type, true);
- if (!job) {
- ctx->cs_counters.out_of_mem_drop_cnt++;
- atomic64_inc(&cntr->out_of_mem_drop_cnt);
- dev_err(hdev->dev, "Failed to allocate a new job\n");
- rc = -ENOMEM;
- goto put_cs;
- }
-
- if (cs->type == CS_TYPE_WAIT)
- cb_size = hdev->asic_funcs->get_wait_cb_size(hdev);
- else
- cb_size = hdev->asic_funcs->get_signal_cb_size(hdev);
+ if (cs_type == CS_TYPE_WAIT || cs_type == CS_TYPE_SIGNAL)
+ rc = cs_ioctl_signal_wait_create_jobs(hdev, ctx, cs, q_type,
+ q_idx);
- cb = hl_cb_kernel_create(hdev, cb_size,
- q_type == QUEUE_TYPE_HW && hdev->mmu_enable);
- if (!cb) {
- ctx->cs_counters.out_of_mem_drop_cnt++;
- atomic64_inc(&cntr->out_of_mem_drop_cnt);
- kfree(job);
- rc = -EFAULT;
+ if (rc)
goto put_cs;
- }
- job->id = 0;
- job->cs = cs;
- job->user_cb = cb;
- job->user_cb->cs_cnt++;
- job->user_cb_size = cb_size;
- job->hw_queue_id = q_idx;
-
- /*
- * No need in parsing, user CB is the patched CB.
- * We call hl_cb_destroy() out of two reasons - we don't need the CB in
- * the CB idr anymore and to decrement its refcount as it was
- * incremented inside hl_cb_kernel_create().
- */
- job->patched_cb = job->user_cb;
- job->job_cb_size = job->user_cb_size;
- hl_cb_destroy(hdev, &hdev->kernel_cb_mgr, cb->id << PAGE_SHIFT);
-
- cs->jobs_in_queue_cnt[job->hw_queue_id]++;
-
- list_add_tail(&job->cs_node, &cs->job_list);
/* increment refcount as for external queues we get completion */
cs_get(cs);
- hl_debugfs_add_job(hdev, job);
-
rc = hl_hw_queue_schedule_cs(cs);
if (rc) {
if (rc != -EAGAIN)
@@ -1016,9 +1047,6 @@ static int cs_ioctl_signal_wait(struct hl_fpriv *hpriv, enum hl_cs_type cs_type,
put_cs:
/* We finished with the CS in this function, so put the ref */
cs_put(cs);
-free_signal_seq_array:
- if (cs_type == CS_TYPE_WAIT)
- kfree(signal_seq_arr);
free_cs_chunk_array:
kfree(cs_chunk_array);
out:
--
2.17.1
* [PATCH 4/4] habanalabs: remove duplicate check
2020-11-02 19:57 [PATCH] habanalabs: don't init vm module if no MMU Oded Gabbay
` (3 preceding siblings ...)
2020-11-02 19:58 ` [PATCH 3/4] habanalabs: sync stream refactor functions Oded Gabbay
@ 2020-11-02 19:58 ` Oded Gabbay
4 siblings, 0 replies; 6+ messages in thread
From: Oded Gabbay @ 2020-11-02 19:58 UTC (permalink / raw)
To: linux-kernel; +Cc: SW_Drivers
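The queue index is already checked against max_queues a few lines above
this check, so there is no need to check it again. While at it, make the
error message say what actually failed: the queue does not support sync
stream operations.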
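The ordering this relies on can be sketched as follows. This is a
minimal standalone example, assuming the earlier range check that the
message refers to; struct queue_props and check_queue() are hypothetical
stand-ins, not the driver's types. The range check has to come before
the table lookup, and once it has passed, repeating it next to the
capability check adds nothing.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct hw_queue_properties. */
struct queue_props {
	bool supports_sync_stream;
};

/*
 * Validate a queue index in two steps: range first, capability second.
 * Each condition is tested exactly once.
 */
static int check_queue(const struct queue_props *table, size_t max_queues,
		       size_t q_idx)
{
	assert(table != NULL);

	if (q_idx >= max_queues)
		return -EINVAL;	/* index out of range */

	if (!table[q_idx].supports_sync_stream)
		return -EINVAL;	/* valid index, unsupported operation */

	return 0;
}
```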
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
drivers/misc/habanalabs/common/command_submission.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/misc/habanalabs/common/command_submission.c b/drivers/misc/habanalabs/common/command_submission.c
index b0f33579ca7f..1f8b53d42e3a 100644
--- a/drivers/misc/habanalabs/common/command_submission.c
+++ b/drivers/misc/habanalabs/common/command_submission.c
@@ -948,9 +948,10 @@ static int cs_ioctl_signal_wait(struct hl_fpriv *hpriv, enum hl_cs_type cs_type,
hw_queue_prop = &hdev->asic_prop.hw_queues_props[q_idx];
q_type = hw_queue_prop->type;
- if ((q_idx >= hdev->asic_prop.max_queues) ||
- (!hw_queue_prop->supports_sync_stream)) {
- dev_err(hdev->dev, "Queue index %d is invalid\n", q_idx);
+ if (!hw_queue_prop->supports_sync_stream) {
+ dev_err(hdev->dev,
+ "Queue index %d does not support sync stream operations\n",
+ q_idx);
rc = -EINVAL;
goto free_cs_chunk_array;
}
--
2.17.1