All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/17] drm/v3d: Introduce CPU jobs
@ 2023-11-24  0:46 Maíra Canal
  2023-11-24  0:46 ` [PATCH v2 01/17] drm/v3d: Remove unused function header Maíra Canal
                   ` (16 more replies)
  0 siblings, 17 replies; 25+ messages in thread
From: Maíra Canal @ 2023-11-24  0:46 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann
  Cc: Maíra Canal, kernel-dev, Boris Brezillon, dri-devel, Faith Ekstrand

This patchset implements the basic infrastructure for a new type of
V3D job, a CPU job. A CPU job is a job that requires CPU intervention.
It would be nice to perform this operations on the kernel space as we
can attach multiple in/out syncobjs to it.

Why we want a CPU job on the kernel?
====================================

There are some Vulkan commands that cannot be performed by the GPU, so
we implement those as CPU jobs on Mesa. But to synchronize a CPU job
in the user space, we need to hold part of the command submission flow
in order to correctly synchronize their execution.

By moving the CPU job to the kernel, we can make use of the DRM
schedule queues and all the advantages it brings with it. This way,
instead of stalling the submission thread, we can use syncobjs to
synchronize the job, providing a more effective management.

About the implementation
========================

After we decided that we would like to have a CPU job implementation
in the kernel, we could think about two possible implementations for
this job: creating an IOCTL for each type of CPU job or using an user
extension to provide a polymorphic behavior to a single CPU job IOCTL.
We decided for the latter one.

We have different types of CPU jobs (indirect CSD jobs, timestamp
query jobs, copy query results jobs...) and each of them have a common
infrastructure, but perform different operations. Therefore, by using
a single IOCTL that is extended by an user extension, we can reuse the
common infrastructure - avoiding code repetition - and yet use the
user extension ID to identify the type of job and depending on the
type of job, perform a certain operation.

About the patchset
==================

This patchset introduces the basic infrastructure of a CPU job with a
new V3D queue (V3D_CPU) e new tracers. Moreover, it introduces six
types of CPU jobs: an indirect CSD job, a timestamp query job, a
reset timestamp queries job, a copy timestamp query results job, a reset
performance queries job, and a copy performance query results job.

An indirect CSD job is a job that, when executed in the queue, will
map the indirect buffer, read the dispatch parameters, and submit a
regular dispatch. So, the CSD job depends on the CPU job execution. We
attach the wait dependencies to the CPU job and once they are satisfied,
we read the dispatch parameters, rewrite the uniforms (if needed) and
enable the CSD job execution, which depends on the completion of the
CPU job.

A timestamp query job is a job that calculates the value of the
timestamp query and updates the availability of the query. In order to
implement this job, we had to change the Mesa implementation of the
timestamp. Now, the timestamp query value is tracked in a BO, instead
of using a memory address. Moreover, the timestamp query availability is
tracked with a syncobj, which is signaled when the query is available.

A reset timestamp queries job is a job that resets the timestamp queries by
zeroing the timestamp BO in the right positions. The right position on
the timestamp BO is found through the offset of the first query.

A reset performance queries job is a job that zeros the values of the
performance monitors associated to that query. Moreover, it resets the
availability syncobj related to that query.

A copy query results job is a job that copy the results of a query to a
BO in a given offset with a given stride.

The patchset is divided as such:
 * #1 - #4: refactoring operations to prepare for the introduction of the
            CPU job
 * #5: addressing a vulnerability in the multisync extension
 * #6: decouple job allocation from job initiation
 * #7 - #9: introduction of the CPU job
 * #10 - #11: refactoring operations to prepare for the introduction of the
              indirect CSD job
 * #12: introduction of the indirect CSD job
 * #13: introduction of the timestamp query job
 * #14: introduction of the reset timestamp queries job
 * #15: introduction of the copy timestamp query results job
 * #16: introduction of the reset performance queries job
 * #17: introduction of the copy performance query results job

This patchset has its Mesa counterpart, which is available on [1].

Both the kernel and Mesa implementation were tested with

 * `dEQP-VK.compute.pipeline.indirect_dispatch.*`,
 * `dEQP-VK.pipeline.monolithic.timestamp.*`,
 * `dEQP-VK.synchronization.*`,
 * `dEQP-VK.query_pool.*`
 * and `dEQP-VK.multiview.*`.

[1] https://gitlab.freedesktop.org/mairacanal/mesa/-/tree/v3dv/v5/cpu-job

Changelog
=========

v1 -> v2: https://lore.kernel.org/dri-devel/20230904175019.1172713-1-mcanal@igalia.com/

* Rebase on top of drm-misc-next.
* Add GPU stats to the CPU queue.

Best Regards,
- Maíra

Maíra Canal (11):
  drm/v3d: Don't allow two multisync extensions in the same job
  drm/v3d: Decouple job allocation from job initiation
  drm/v3d: Use v3d_get_extensions() to parse CPU job data
  drm/v3d: Create tracepoints to track the CPU job
  drm/v3d: Enable BO mapping
  drm/v3d: Create a CPU job extension for a indirect CSD job
  drm/v3d: Create a CPU job extension for the timestamp query job
  drm/v3d: Create a CPU job extension for the reset timestamp job
  drm/v3d: Create a CPU job extension to copy timestamp query to a buffer
  drm/v3d: Create a CPU job extension for the reset performance query job
  drm/v3d: Create a CPU job extension for the copy performance query job

Melissa Wen (6):
  drm/v3d: Remove unused function header
  drm/v3d: Move wait BO ioctl to the v3d_bo file
  drm/v3d: Detach job submissions IOCTLs to a new specific file
  drm/v3d: Simplify job refcount handling
  drm/v3d: Add a CPU job submission
  drm/v3d: Detach the CSD job BO setup

 drivers/gpu/drm/v3d/Makefile     |    3 +-
 drivers/gpu/drm/v3d/v3d_bo.c     |   51 ++
 drivers/gpu/drm/v3d/v3d_drv.c    |    4 +
 drivers/gpu/drm/v3d/v3d_drv.h    |  132 ++-
 drivers/gpu/drm/v3d/v3d_gem.c    |  768 ------------------
 drivers/gpu/drm/v3d/v3d_sched.c  |  315 ++++++++
 drivers/gpu/drm/v3d/v3d_submit.c | 1293 ++++++++++++++++++++++++++++++
 drivers/gpu/drm/v3d/v3d_trace.h  |   57 ++
 include/uapi/drm/v3d_drm.h       |  220 ++++-
 9 files changed, 2063 insertions(+), 780 deletions(-)
 create mode 100644 drivers/gpu/drm/v3d/v3d_submit.c

--
2.41.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 01/17] drm/v3d: Remove unused function header
  2023-11-24  0:46 [PATCH v2 00/17] drm/v3d: Introduce CPU jobs Maíra Canal
@ 2023-11-24  0:46 ` Maíra Canal
  2023-11-24  0:46 ` [PATCH v2 02/17] drm/v3d: Move wait BO ioctl to the v3d_bo file Maíra Canal
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Maíra Canal @ 2023-11-24  0:46 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann
  Cc: Maíra Canal, kernel-dev, Boris Brezillon, dri-devel, Faith Ekstrand

From: Melissa Wen <mwen@igalia.com>

v3d_mmu_get_offset header was added but the function was never defined.
Just remove it.

Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
 drivers/gpu/drm/v3d/v3d_drv.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 4c59fefaa0b4..d8b392494eba 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -420,8 +420,6 @@ void v3d_irq_disable(struct v3d_dev *v3d);
 void v3d_irq_reset(struct v3d_dev *v3d);
 
 /* v3d_mmu.c */
-int v3d_mmu_get_offset(struct drm_file *file_priv, struct v3d_bo *bo,
-		       u32 *offset);
 int v3d_mmu_set_page_table(struct v3d_dev *v3d);
 void v3d_mmu_insert_ptes(struct v3d_bo *bo);
 void v3d_mmu_remove_ptes(struct v3d_bo *bo);
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 02/17] drm/v3d: Move wait BO ioctl to the v3d_bo file
  2023-11-24  0:46 [PATCH v2 00/17] drm/v3d: Introduce CPU jobs Maíra Canal
  2023-11-24  0:46 ` [PATCH v2 01/17] drm/v3d: Remove unused function header Maíra Canal
@ 2023-11-24  0:46 ` Maíra Canal
  2023-11-24  0:46 ` [PATCH v2 03/17] drm/v3d: Detach job submissions IOCTLs to a new specific file Maíra Canal
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Maíra Canal @ 2023-11-24  0:46 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann
  Cc: Maíra Canal, kernel-dev, Boris Brezillon, dri-devel, Faith Ekstrand

From: Melissa Wen <mwen@igalia.com>

IOCTLs related to BO operations reside on the file v3d_bo.c. The wait BO
ioctl is the only IOCTL regarding BOs that is placed in a different file.
So, move it to the v3d_bo.c file.

Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
 drivers/gpu/drm/v3d/v3d_bo.c  | 33 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/v3d/v3d_drv.h |  4 ++--
 drivers/gpu/drm/v3d/v3d_gem.c | 33 ---------------------------------
 3 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_bo.c b/drivers/gpu/drm/v3d/v3d_bo.c
index 8b3229a37c6d..357a0da7e16a 100644
--- a/drivers/gpu/drm/v3d/v3d_bo.c
+++ b/drivers/gpu/drm/v3d/v3d_bo.c
@@ -233,3 +233,36 @@ int v3d_get_bo_offset_ioctl(struct drm_device *dev, void *data,
 	drm_gem_object_put(gem_obj);
 	return 0;
 }
+
+int
+v3d_wait_bo_ioctl(struct drm_device *dev, void *data,
+		  struct drm_file *file_priv)
+{
+	int ret;
+	struct drm_v3d_wait_bo *args = data;
+	ktime_t start = ktime_get();
+	u64 delta_ns;
+	unsigned long timeout_jiffies =
+		nsecs_to_jiffies_timeout(args->timeout_ns);
+
+	if (args->pad != 0)
+		return -EINVAL;
+
+	ret = drm_gem_dma_resv_wait(file_priv, args->handle,
+				    true, timeout_jiffies);
+
+	/* Decrement the user's timeout, in case we got interrupted
+	 * such that the ioctl will be restarted.
+	 */
+	delta_ns = ktime_to_ns(ktime_sub(ktime_get(), start));
+	if (delta_ns < args->timeout_ns)
+		args->timeout_ns -= delta_ns;
+	else
+		args->timeout_ns = 0;
+
+	/* Asked to wait beyond the jiffie/scheduler precision? */
+	if (ret == -ETIME && args->timeout_ns)
+		ret = -EAGAIN;
+
+	return ret;
+}
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index d8b392494eba..fc04372d5cbd 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -385,6 +385,8 @@ int v3d_mmap_bo_ioctl(struct drm_device *dev, void *data,
 		      struct drm_file *file_priv);
 int v3d_get_bo_offset_ioctl(struct drm_device *dev, void *data,
 			    struct drm_file *file_priv);
+int v3d_wait_bo_ioctl(struct drm_device *dev, void *data,
+		      struct drm_file *file_priv);
 struct drm_gem_object *v3d_prime_import_sg_table(struct drm_device *dev,
 						 struct dma_buf_attachment *attach,
 						 struct sg_table *sgt);
@@ -405,8 +407,6 @@ int v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
 			 struct drm_file *file_priv);
 int v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
 			 struct drm_file *file_priv);
-int v3d_wait_bo_ioctl(struct drm_device *dev, void *data,
-		      struct drm_file *file_priv);
 void v3d_job_cleanup(struct v3d_job *job);
 void v3d_job_put(struct v3d_job *job);
 void v3d_reset(struct v3d_dev *v3d);
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index 9d2ac23c29e3..26f62a48d226 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -363,39 +363,6 @@ void v3d_job_put(struct v3d_job *job)
 	kref_put(&job->refcount, job->free);
 }
 
-int
-v3d_wait_bo_ioctl(struct drm_device *dev, void *data,
-		  struct drm_file *file_priv)
-{
-	int ret;
-	struct drm_v3d_wait_bo *args = data;
-	ktime_t start = ktime_get();
-	u64 delta_ns;
-	unsigned long timeout_jiffies =
-		nsecs_to_jiffies_timeout(args->timeout_ns);
-
-	if (args->pad != 0)
-		return -EINVAL;
-
-	ret = drm_gem_dma_resv_wait(file_priv, args->handle,
-				    true, timeout_jiffies);
-
-	/* Decrement the user's timeout, in case we got interrupted
-	 * such that the ioctl will be restarted.
-	 */
-	delta_ns = ktime_to_ns(ktime_sub(ktime_get(), start));
-	if (delta_ns < args->timeout_ns)
-		args->timeout_ns -= delta_ns;
-	else
-		args->timeout_ns = 0;
-
-	/* Asked to wait beyond the jiffie/scheduler precision? */
-	if (ret == -ETIME && args->timeout_ns)
-		ret = -EAGAIN;
-
-	return ret;
-}
-
 static int
 v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
 	     void **container, size_t size, void (*free)(struct kref *ref),
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 03/17] drm/v3d: Detach job submissions IOCTLs to a new specific file
  2023-11-24  0:46 [PATCH v2 00/17] drm/v3d: Introduce CPU jobs Maíra Canal
  2023-11-24  0:46 ` [PATCH v2 01/17] drm/v3d: Remove unused function header Maíra Canal
  2023-11-24  0:46 ` [PATCH v2 02/17] drm/v3d: Move wait BO ioctl to the v3d_bo file Maíra Canal
@ 2023-11-24  0:46 ` Maíra Canal
  2023-11-24  0:47 ` [PATCH v2 04/17] drm/v3d: Simplify job refcount handling Maíra Canal
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Maíra Canal @ 2023-11-24  0:46 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann
  Cc: Maíra Canal, kernel-dev, Boris Brezillon, dri-devel, Faith Ekstrand

From: Melissa Wen <mwen@igalia.com>

We will include a new job submission type, the CPU job submission. For
readability and maintability, separate the job submission IOCTLs and
related operations from v3d_gem.c.

Minor fix in the CSD submission kernel doc:
CSD (texture formatting) -> CSD (compute shader).

Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
 drivers/gpu/drm/v3d/Makefile     |   3 +-
 drivers/gpu/drm/v3d/v3d_drv.h    |  12 +-
 drivers/gpu/drm/v3d/v3d_gem.c    | 735 ------------------------------
 drivers/gpu/drm/v3d/v3d_submit.c | 744 +++++++++++++++++++++++++++++++
 4 files changed, 753 insertions(+), 741 deletions(-)
 create mode 100644 drivers/gpu/drm/v3d/v3d_submit.c

diff --git a/drivers/gpu/drm/v3d/Makefile b/drivers/gpu/drm/v3d/Makefile
index 4b21b20e4998..b7d673f1153b 100644
--- a/drivers/gpu/drm/v3d/Makefile
+++ b/drivers/gpu/drm/v3d/Makefile
@@ -12,7 +12,8 @@ v3d-y := \
 	v3d_perfmon.o \
 	v3d_trace_points.o \
 	v3d_sched.o \
-	v3d_sysfs.o
+	v3d_sysfs.o \
+	v3d_submit.o
 
 v3d-$(CONFIG_DEBUG_FS) += v3d_debugfs.o
 
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index fc04372d5cbd..4db9ace66024 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -401,17 +401,19 @@ struct dma_fence *v3d_fence_create(struct v3d_dev *v3d, enum v3d_queue queue);
 /* v3d_gem.c */
 int v3d_gem_init(struct drm_device *dev);
 void v3d_gem_destroy(struct drm_device *dev);
+void v3d_reset(struct v3d_dev *v3d);
+void v3d_invalidate_caches(struct v3d_dev *v3d);
+void v3d_clean_caches(struct v3d_dev *v3d);
+
+/* v3d_submit.c */
+void v3d_job_cleanup(struct v3d_job *job);
+void v3d_job_put(struct v3d_job *job);
 int v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 			struct drm_file *file_priv);
 int v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
 			 struct drm_file *file_priv);
 int v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
 			 struct drm_file *file_priv);
-void v3d_job_cleanup(struct v3d_job *job);
-void v3d_job_put(struct v3d_job *job);
-void v3d_reset(struct v3d_dev *v3d);
-void v3d_invalidate_caches(struct v3d_dev *v3d);
-void v3d_clean_caches(struct v3d_dev *v3d);
 
 /* v3d_irq.c */
 int v3d_irq_init(struct v3d_dev *v3d);
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index 26f62a48d226..afc565078c78 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -11,8 +11,6 @@
 #include <linux/uaccess.h>
 
 #include <drm/drm_managed.h>
-#include <drm/drm_syncobj.h>
-#include <uapi/drm/v3d_drm.h>
 
 #include "v3d_drv.h"
 #include "v3d_regs.h"
@@ -241,739 +239,6 @@ v3d_invalidate_caches(struct v3d_dev *v3d)
 	v3d_invalidate_slices(v3d, 0);
 }
 
-/* Takes the reservation lock on all the BOs being referenced, so that
- * at queue submit time we can update the reservations.
- *
- * We don't lock the RCL the tile alloc/state BOs, or overflow memory
- * (all of which are on exec->unref_list).  They're entirely private
- * to v3d, so we don't attach dma-buf fences to them.
- */
-static int
-v3d_lock_bo_reservations(struct v3d_job *job,
-			 struct ww_acquire_ctx *acquire_ctx)
-{
-	int i, ret;
-
-	ret = drm_gem_lock_reservations(job->bo, job->bo_count, acquire_ctx);
-	if (ret)
-		return ret;
-
-	for (i = 0; i < job->bo_count; i++) {
-		ret = dma_resv_reserve_fences(job->bo[i]->resv, 1);
-		if (ret)
-			goto fail;
-
-		ret = drm_sched_job_add_implicit_dependencies(&job->base,
-							      job->bo[i], true);
-		if (ret)
-			goto fail;
-	}
-
-	return 0;
-
-fail:
-	drm_gem_unlock_reservations(job->bo, job->bo_count, acquire_ctx);
-	return ret;
-}
-
-/**
- * v3d_lookup_bos() - Sets up job->bo[] with the GEM objects
- * referenced by the job.
- * @dev: DRM device
- * @file_priv: DRM file for this fd
- * @job: V3D job being set up
- * @bo_handles: GEM handles
- * @bo_count: Number of GEM handles passed in
- *
- * The command validator needs to reference BOs by their index within
- * the submitted job's BO list.  This does the validation of the job's
- * BO list and reference counting for the lifetime of the job.
- *
- * Note that this function doesn't need to unreference the BOs on
- * failure, because that will happen at v3d_exec_cleanup() time.
- */
-static int
-v3d_lookup_bos(struct drm_device *dev,
-	       struct drm_file *file_priv,
-	       struct v3d_job *job,
-	       u64 bo_handles,
-	       u32 bo_count)
-{
-	job->bo_count = bo_count;
-
-	if (!job->bo_count) {
-		/* See comment on bo_index for why we have to check
-		 * this.
-		 */
-		DRM_DEBUG("Rendering requires BOs\n");
-		return -EINVAL;
-	}
-
-	return drm_gem_objects_lookup(file_priv,
-				      (void __user *)(uintptr_t)bo_handles,
-				      job->bo_count, &job->bo);
-}
-
-static void
-v3d_job_free(struct kref *ref)
-{
-	struct v3d_job *job = container_of(ref, struct v3d_job, refcount);
-	int i;
-
-	if (job->bo) {
-		for (i = 0; i < job->bo_count; i++)
-			drm_gem_object_put(job->bo[i]);
-		kvfree(job->bo);
-	}
-
-	dma_fence_put(job->irq_fence);
-	dma_fence_put(job->done_fence);
-
-	if (job->perfmon)
-		v3d_perfmon_put(job->perfmon);
-
-	kfree(job);
-}
-
-static void
-v3d_render_job_free(struct kref *ref)
-{
-	struct v3d_render_job *job = container_of(ref, struct v3d_render_job,
-						  base.refcount);
-	struct v3d_bo *bo, *save;
-
-	list_for_each_entry_safe(bo, save, &job->unref_list, unref_head) {
-		drm_gem_object_put(&bo->base.base);
-	}
-
-	v3d_job_free(ref);
-}
-
-void v3d_job_cleanup(struct v3d_job *job)
-{
-	if (!job)
-		return;
-
-	drm_sched_job_cleanup(&job->base);
-	v3d_job_put(job);
-}
-
-void v3d_job_put(struct v3d_job *job)
-{
-	kref_put(&job->refcount, job->free);
-}
-
-static int
-v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
-	     void **container, size_t size, void (*free)(struct kref *ref),
-	     u32 in_sync, struct v3d_submit_ext *se, enum v3d_queue queue)
-{
-	struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
-	struct v3d_job *job;
-	bool has_multisync = se && (se->flags & DRM_V3D_EXT_ID_MULTI_SYNC);
-	int ret, i;
-
-	*container = kcalloc(1, size, GFP_KERNEL);
-	if (!*container) {
-		DRM_ERROR("Cannot allocate memory for v3d job.");
-		return -ENOMEM;
-	}
-
-	job = *container;
-	job->v3d = v3d;
-	job->free = free;
-	job->file = file_priv;
-
-	ret = drm_sched_job_init(&job->base, &v3d_priv->sched_entity[queue],
-				 1, v3d_priv);
-	if (ret)
-		goto fail;
-
-	if (has_multisync) {
-		if (se->in_sync_count && se->wait_stage == queue) {
-			struct drm_v3d_sem __user *handle = u64_to_user_ptr(se->in_syncs);
-
-			for (i = 0; i < se->in_sync_count; i++) {
-				struct drm_v3d_sem in;
-
-				if (copy_from_user(&in, handle++, sizeof(in))) {
-					ret = -EFAULT;
-					DRM_DEBUG("Failed to copy wait dep handle.\n");
-					goto fail_deps;
-				}
-				ret = drm_sched_job_add_syncobj_dependency(&job->base, file_priv, in.handle, 0);
-
-				// TODO: Investigate why this was filtered out for the IOCTL.
-				if (ret && ret != -ENOENT)
-					goto fail_deps;
-			}
-		}
-	} else {
-		ret = drm_sched_job_add_syncobj_dependency(&job->base, file_priv, in_sync, 0);
-
-		// TODO: Investigate why this was filtered out for the IOCTL.
-		if (ret && ret != -ENOENT)
-			goto fail_deps;
-	}
-
-	kref_init(&job->refcount);
-
-	return 0;
-
-fail_deps:
-	drm_sched_job_cleanup(&job->base);
-fail:
-	kfree(*container);
-	*container = NULL;
-
-	return ret;
-}
-
-static void
-v3d_push_job(struct v3d_job *job)
-{
-	drm_sched_job_arm(&job->base);
-
-	job->done_fence = dma_fence_get(&job->base.s_fence->finished);
-
-	/* put by scheduler job completion */
-	kref_get(&job->refcount);
-
-	drm_sched_entity_push_job(&job->base);
-}
-
-static void
-v3d_attach_fences_and_unlock_reservation(struct drm_file *file_priv,
-					 struct v3d_job *job,
-					 struct ww_acquire_ctx *acquire_ctx,
-					 u32 out_sync,
-					 struct v3d_submit_ext *se,
-					 struct dma_fence *done_fence)
-{
-	struct drm_syncobj *sync_out;
-	bool has_multisync = se && (se->flags & DRM_V3D_EXT_ID_MULTI_SYNC);
-	int i;
-
-	for (i = 0; i < job->bo_count; i++) {
-		/* XXX: Use shared fences for read-only objects. */
-		dma_resv_add_fence(job->bo[i]->resv, job->done_fence,
-				   DMA_RESV_USAGE_WRITE);
-	}
-
-	drm_gem_unlock_reservations(job->bo, job->bo_count, acquire_ctx);
-
-	/* Update the return sync object for the job */
-	/* If it only supports a single signal semaphore*/
-	if (!has_multisync) {
-		sync_out = drm_syncobj_find(file_priv, out_sync);
-		if (sync_out) {
-			drm_syncobj_replace_fence(sync_out, done_fence);
-			drm_syncobj_put(sync_out);
-		}
-		return;
-	}
-
-	/* If multiple semaphores extension is supported */
-	if (se->out_sync_count) {
-		for (i = 0; i < se->out_sync_count; i++) {
-			drm_syncobj_replace_fence(se->out_syncs[i].syncobj,
-						  done_fence);
-			drm_syncobj_put(se->out_syncs[i].syncobj);
-		}
-		kvfree(se->out_syncs);
-	}
-}
-
-static void
-v3d_put_multisync_post_deps(struct v3d_submit_ext *se)
-{
-	unsigned int i;
-
-	if (!(se && se->out_sync_count))
-		return;
-
-	for (i = 0; i < se->out_sync_count; i++)
-		drm_syncobj_put(se->out_syncs[i].syncobj);
-	kvfree(se->out_syncs);
-}
-
-static int
-v3d_get_multisync_post_deps(struct drm_file *file_priv,
-			    struct v3d_submit_ext *se,
-			    u32 count, u64 handles)
-{
-	struct drm_v3d_sem __user *post_deps;
-	int i, ret;
-
-	if (!count)
-		return 0;
-
-	se->out_syncs = (struct v3d_submit_outsync *)
-			kvmalloc_array(count,
-				       sizeof(struct v3d_submit_outsync),
-				       GFP_KERNEL);
-	if (!se->out_syncs)
-		return -ENOMEM;
-
-	post_deps = u64_to_user_ptr(handles);
-
-	for (i = 0; i < count; i++) {
-		struct drm_v3d_sem out;
-
-		if (copy_from_user(&out, post_deps++, sizeof(out))) {
-			ret = -EFAULT;
-			DRM_DEBUG("Failed to copy post dep handles\n");
-			goto fail;
-		}
-
-		se->out_syncs[i].syncobj = drm_syncobj_find(file_priv,
-							    out.handle);
-		if (!se->out_syncs[i].syncobj) {
-			ret = -EINVAL;
-			goto fail;
-		}
-	}
-	se->out_sync_count = count;
-
-	return 0;
-
-fail:
-	for (i--; i >= 0; i--)
-		drm_syncobj_put(se->out_syncs[i].syncobj);
-	kvfree(se->out_syncs);
-
-	return ret;
-}
-
-/* Get data for multiple binary semaphores synchronization. Parse syncobj
- * to be signaled when job completes (out_sync).
- */
-static int
-v3d_get_multisync_submit_deps(struct drm_file *file_priv,
-			      struct drm_v3d_extension __user *ext,
-			      void *data)
-{
-	struct drm_v3d_multi_sync multisync;
-	struct v3d_submit_ext *se = data;
-	int ret;
-
-	if (copy_from_user(&multisync, ext, sizeof(multisync)))
-		return -EFAULT;
-
-	if (multisync.pad)
-		return -EINVAL;
-
-	ret = v3d_get_multisync_post_deps(file_priv, data, multisync.out_sync_count,
-					  multisync.out_syncs);
-	if (ret)
-		return ret;
-
-	se->in_sync_count = multisync.in_sync_count;
-	se->in_syncs = multisync.in_syncs;
-	se->flags |= DRM_V3D_EXT_ID_MULTI_SYNC;
-	se->wait_stage = multisync.wait_stage;
-
-	return 0;
-}
-
-/* Whenever userspace sets ioctl extensions, v3d_get_extensions parses data
- * according to the extension id (name).
- */
-static int
-v3d_get_extensions(struct drm_file *file_priv,
-		   u64 ext_handles,
-		   void *data)
-{
-	struct drm_v3d_extension __user *user_ext;
-	int ret;
-
-	user_ext = u64_to_user_ptr(ext_handles);
-	while (user_ext) {
-		struct drm_v3d_extension ext;
-
-		if (copy_from_user(&ext, user_ext, sizeof(ext))) {
-			DRM_DEBUG("Failed to copy submit extension\n");
-			return -EFAULT;
-		}
-
-		switch (ext.id) {
-		case DRM_V3D_EXT_ID_MULTI_SYNC:
-			ret = v3d_get_multisync_submit_deps(file_priv, user_ext, data);
-			if (ret)
-				return ret;
-			break;
-		default:
-			DRM_DEBUG_DRIVER("Unknown extension id: %d\n", ext.id);
-			return -EINVAL;
-		}
-
-		user_ext = u64_to_user_ptr(ext.next);
-	}
-
-	return 0;
-}
-
-/**
- * v3d_submit_cl_ioctl() - Submits a job (frame) to the V3D.
- * @dev: DRM device
- * @data: ioctl argument
- * @file_priv: DRM file for this fd
- *
- * This is the main entrypoint for userspace to submit a 3D frame to
- * the GPU.  Userspace provides the binner command list (if
- * applicable), and the kernel sets up the render command list to draw
- * to the framebuffer described in the ioctl, using the command lists
- * that the 3D engine's binner will produce.
- */
-int
-v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
-		    struct drm_file *file_priv)
-{
-	struct v3d_dev *v3d = to_v3d_dev(dev);
-	struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
-	struct drm_v3d_submit_cl *args = data;
-	struct v3d_submit_ext se = {0};
-	struct v3d_bin_job *bin = NULL;
-	struct v3d_render_job *render = NULL;
-	struct v3d_job *clean_job = NULL;
-	struct v3d_job *last_job;
-	struct ww_acquire_ctx acquire_ctx;
-	int ret = 0;
-
-	trace_v3d_submit_cl_ioctl(&v3d->drm, args->rcl_start, args->rcl_end);
-
-	if (args->pad)
-		return -EINVAL;
-
-	if (args->flags &&
-	    args->flags & ~(DRM_V3D_SUBMIT_CL_FLUSH_CACHE |
-			    DRM_V3D_SUBMIT_EXTENSION)) {
-		DRM_INFO("invalid flags: %d\n", args->flags);
-		return -EINVAL;
-	}
-
-	if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
-		ret = v3d_get_extensions(file_priv, args->extensions, &se);
-		if (ret) {
-			DRM_DEBUG("Failed to get extensions.\n");
-			return ret;
-		}
-	}
-
-	ret = v3d_job_init(v3d, file_priv, (void *)&render, sizeof(*render),
-			   v3d_render_job_free, args->in_sync_rcl, &se, V3D_RENDER);
-	if (ret)
-		goto fail;
-
-	render->start = args->rcl_start;
-	render->end = args->rcl_end;
-	INIT_LIST_HEAD(&render->unref_list);
-
-	if (args->bcl_start != args->bcl_end) {
-		ret = v3d_job_init(v3d, file_priv, (void *)&bin, sizeof(*bin),
-				   v3d_job_free, args->in_sync_bcl, &se, V3D_BIN);
-		if (ret)
-			goto fail;
-
-		bin->start = args->bcl_start;
-		bin->end = args->bcl_end;
-		bin->qma = args->qma;
-		bin->qms = args->qms;
-		bin->qts = args->qts;
-		bin->render = render;
-	}
-
-	if (args->flags & DRM_V3D_SUBMIT_CL_FLUSH_CACHE) {
-		ret = v3d_job_init(v3d, file_priv, (void *)&clean_job, sizeof(*clean_job),
-				   v3d_job_free, 0, NULL, V3D_CACHE_CLEAN);
-		if (ret)
-			goto fail;
-
-		last_job = clean_job;
-	} else {
-		last_job = &render->base;
-	}
-
-	ret = v3d_lookup_bos(dev, file_priv, last_job,
-			     args->bo_handles, args->bo_handle_count);
-	if (ret)
-		goto fail;
-
-	ret = v3d_lock_bo_reservations(last_job, &acquire_ctx);
-	if (ret)
-		goto fail;
-
-	if (args->perfmon_id) {
-		render->base.perfmon = v3d_perfmon_find(v3d_priv,
-							args->perfmon_id);
-
-		if (!render->base.perfmon) {
-			ret = -ENOENT;
-			goto fail_perfmon;
-		}
-	}
-
-	mutex_lock(&v3d->sched_lock);
-	if (bin) {
-		bin->base.perfmon = render->base.perfmon;
-		v3d_perfmon_get(bin->base.perfmon);
-		v3d_push_job(&bin->base);
-
-		ret = drm_sched_job_add_dependency(&render->base.base,
-						   dma_fence_get(bin->base.done_fence));
-		if (ret)
-			goto fail_unreserve;
-	}
-
-	v3d_push_job(&render->base);
-
-	if (clean_job) {
-		struct dma_fence *render_fence =
-			dma_fence_get(render->base.done_fence);
-		ret = drm_sched_job_add_dependency(&clean_job->base,
-						   render_fence);
-		if (ret)
-			goto fail_unreserve;
-		clean_job->perfmon = render->base.perfmon;
-		v3d_perfmon_get(clean_job->perfmon);
-		v3d_push_job(clean_job);
-	}
-
-	mutex_unlock(&v3d->sched_lock);
-
-	v3d_attach_fences_and_unlock_reservation(file_priv,
-						 last_job,
-						 &acquire_ctx,
-						 args->out_sync,
-						 &se,
-						 last_job->done_fence);
-
-	if (bin)
-		v3d_job_put(&bin->base);
-	v3d_job_put(&render->base);
-	if (clean_job)
-		v3d_job_put(clean_job);
-
-	return 0;
-
-fail_unreserve:
-	mutex_unlock(&v3d->sched_lock);
-fail_perfmon:
-	drm_gem_unlock_reservations(last_job->bo,
-				    last_job->bo_count, &acquire_ctx);
-fail:
-	v3d_job_cleanup((void *)bin);
-	v3d_job_cleanup((void *)render);
-	v3d_job_cleanup(clean_job);
-	v3d_put_multisync_post_deps(&se);
-
-	return ret;
-}
-
-/**
- * v3d_submit_tfu_ioctl() - Submits a TFU (texture formatting) job to the V3D.
- * @dev: DRM device
- * @data: ioctl argument
- * @file_priv: DRM file for this fd
- *
- * Userspace provides the register setup for the TFU, which we don't
- * need to validate since the TFU is behind the MMU.
- */
-int
-v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
-		     struct drm_file *file_priv)
-{
-	struct v3d_dev *v3d = to_v3d_dev(dev);
-	struct drm_v3d_submit_tfu *args = data;
-	struct v3d_submit_ext se = {0};
-	struct v3d_tfu_job *job = NULL;
-	struct ww_acquire_ctx acquire_ctx;
-	int ret = 0;
-
-	trace_v3d_submit_tfu_ioctl(&v3d->drm, args->iia);
-
-	if (args->flags && !(args->flags & DRM_V3D_SUBMIT_EXTENSION)) {
-		DRM_DEBUG("invalid flags: %d\n", args->flags);
-		return -EINVAL;
-	}
-
-	if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
-		ret = v3d_get_extensions(file_priv, args->extensions, &se);
-		if (ret) {
-			DRM_DEBUG("Failed to get extensions.\n");
-			return ret;
-		}
-	}
-
-	ret = v3d_job_init(v3d, file_priv, (void *)&job, sizeof(*job),
-			   v3d_job_free, args->in_sync, &se, V3D_TFU);
-	if (ret)
-		goto fail;
-
-	job->base.bo = kcalloc(ARRAY_SIZE(args->bo_handles),
-			       sizeof(*job->base.bo), GFP_KERNEL);
-	if (!job->base.bo) {
-		ret = -ENOMEM;
-		goto fail;
-	}
-
-	job->args = *args;
-
-	for (job->base.bo_count = 0;
-	     job->base.bo_count < ARRAY_SIZE(args->bo_handles);
-	     job->base.bo_count++) {
-		struct drm_gem_object *bo;
-
-		if (!args->bo_handles[job->base.bo_count])
-			break;
-
-		bo = drm_gem_object_lookup(file_priv, args->bo_handles[job->base.bo_count]);
-		if (!bo) {
-			DRM_DEBUG("Failed to look up GEM BO %d: %d\n",
-				  job->base.bo_count,
-				  args->bo_handles[job->base.bo_count]);
-			ret = -ENOENT;
-			goto fail;
-		}
-		job->base.bo[job->base.bo_count] = bo;
-	}
-
-	ret = v3d_lock_bo_reservations(&job->base, &acquire_ctx);
-	if (ret)
-		goto fail;
-
-	mutex_lock(&v3d->sched_lock);
-	v3d_push_job(&job->base);
-	mutex_unlock(&v3d->sched_lock);
-
-	v3d_attach_fences_and_unlock_reservation(file_priv,
-						 &job->base, &acquire_ctx,
-						 args->out_sync,
-						 &se,
-						 job->base.done_fence);
-
-	v3d_job_put(&job->base);
-
-	return 0;
-
-fail:
-	v3d_job_cleanup((void *)job);
-	v3d_put_multisync_post_deps(&se);
-
-	return ret;
-}
-
-/**
- * v3d_submit_csd_ioctl() - Submits a CSD (texture formatting) job to the V3D.
- * @dev: DRM device
- * @data: ioctl argument
- * @file_priv: DRM file for this fd
- *
- * Userspace provides the register setup for the CSD, which we don't
- * need to validate since the CSD is behind the MMU.
- */
-int
-v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
-		     struct drm_file *file_priv)
-{
-	struct v3d_dev *v3d = to_v3d_dev(dev);
-	struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
-	struct drm_v3d_submit_csd *args = data;
-	struct v3d_submit_ext se = {0};
-	struct v3d_csd_job *job = NULL;
-	struct v3d_job *clean_job = NULL;
-	struct ww_acquire_ctx acquire_ctx;
-	int ret;
-
-	trace_v3d_submit_csd_ioctl(&v3d->drm, args->cfg[5], args->cfg[6]);
-
-	if (args->pad)
-		return -EINVAL;
-
-	if (!v3d_has_csd(v3d)) {
-		DRM_DEBUG("Attempting CSD submit on non-CSD hardware\n");
-		return -EINVAL;
-	}
-
-	if (args->flags && !(args->flags & DRM_V3D_SUBMIT_EXTENSION)) {
-		DRM_INFO("invalid flags: %d\n", args->flags);
-		return -EINVAL;
-	}
-
-	if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
-		ret = v3d_get_extensions(file_priv, args->extensions, &se);
-		if (ret) {
-			DRM_DEBUG("Failed to get extensions.\n");
-			return ret;
-		}
-	}
-
-	ret = v3d_job_init(v3d, file_priv, (void *)&job, sizeof(*job),
-			   v3d_job_free, args->in_sync, &se, V3D_CSD);
-	if (ret)
-		goto fail;
-
-	ret = v3d_job_init(v3d, file_priv, (void *)&clean_job, sizeof(*clean_job),
-			   v3d_job_free, 0, NULL, V3D_CACHE_CLEAN);
-	if (ret)
-		goto fail;
-
-	job->args = *args;
-
-	ret = v3d_lookup_bos(dev, file_priv, clean_job,
-			     args->bo_handles, args->bo_handle_count);
-	if (ret)
-		goto fail;
-
-	ret = v3d_lock_bo_reservations(clean_job, &acquire_ctx);
-	if (ret)
-		goto fail;
-
-	if (args->perfmon_id) {
-		job->base.perfmon = v3d_perfmon_find(v3d_priv,
-						     args->perfmon_id);
-		if (!job->base.perfmon) {
-			ret = -ENOENT;
-			goto fail_perfmon;
-		}
-	}
-
-	mutex_lock(&v3d->sched_lock);
-	v3d_push_job(&job->base);
-
-	ret = drm_sched_job_add_dependency(&clean_job->base,
-					   dma_fence_get(job->base.done_fence));
-	if (ret)
-		goto fail_unreserve;
-
-	v3d_push_job(clean_job);
-	mutex_unlock(&v3d->sched_lock);
-
-	v3d_attach_fences_and_unlock_reservation(file_priv,
-						 clean_job,
-						 &acquire_ctx,
-						 args->out_sync,
-						 &se,
-						 clean_job->done_fence);
-
-	v3d_job_put(&job->base);
-	v3d_job_put(clean_job);
-
-	return 0;
-
-fail_unreserve:
-	mutex_unlock(&v3d->sched_lock);
-fail_perfmon:
-	drm_gem_unlock_reservations(clean_job->bo, clean_job->bo_count,
-				    &acquire_ctx);
-fail:
-	v3d_job_cleanup((void *)job);
-	v3d_job_cleanup(clean_job);
-	v3d_put_multisync_post_deps(&se);
-
-	return ret;
-}
-
 int
 v3d_gem_init(struct drm_device *dev)
 {
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
new file mode 100644
index 000000000000..f36214002f37
--- /dev/null
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -0,0 +1,744 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (C) 2014-2018 Broadcom
+ * Copyright (C) 2023 Raspberry Pi
+ */
+
+#include <drm/drm_syncobj.h>
+
+#include "v3d_drv.h"
+#include "v3d_regs.h"
+#include "v3d_trace.h"
+
+/* Takes the reservation lock on all the BOs being referenced, so that
+ * at queue submit time we can update the reservations.
+ *
+ * We don't lock the RCL the tile alloc/state BOs, or overflow memory
+ * (all of which are on exec->unref_list).  They're entirely private
+ * to v3d, so we don't attach dma-buf fences to them.
+ */
+static int
+v3d_lock_bo_reservations(struct v3d_job *job,
+			 struct ww_acquire_ctx *acquire_ctx)
+{
+	int i, ret;
+
+	ret = drm_gem_lock_reservations(job->bo, job->bo_count, acquire_ctx);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < job->bo_count; i++) {
+		ret = dma_resv_reserve_fences(job->bo[i]->resv, 1);
+		if (ret)
+			goto fail;
+
+		ret = drm_sched_job_add_implicit_dependencies(&job->base,
+							      job->bo[i], true);
+		if (ret)
+			goto fail;
+	}
+
+	return 0;
+
+fail:
+	drm_gem_unlock_reservations(job->bo, job->bo_count, acquire_ctx);
+	return ret;
+}
+
+/**
+ * v3d_lookup_bos() - Sets up job->bo[] with the GEM objects
+ * referenced by the job.
+ * @dev: DRM device
+ * @file_priv: DRM file for this fd
+ * @job: V3D job being set up
+ * @bo_handles: GEM handles
+ * @bo_count: Number of GEM handles passed in
+ *
+ * The command validator needs to reference BOs by their index within
+ * the submitted job's BO list.  This does the validation of the job's
+ * BO list and reference counting for the lifetime of the job.
+ *
+ * Note that this function doesn't need to unreference the BOs on
+ * failure, because that will happen at v3d_exec_cleanup() time.
+ */
+static int
+v3d_lookup_bos(struct drm_device *dev,
+	       struct drm_file *file_priv,
+	       struct v3d_job *job,
+	       u64 bo_handles,
+	       u32 bo_count)
+{
+	job->bo_count = bo_count;
+
+	if (!job->bo_count) {
+		/* See comment on bo_index for why we have to check
+		 * this.
+		 */
+		DRM_DEBUG("Rendering requires BOs\n");
+		return -EINVAL;
+	}
+
+	return drm_gem_objects_lookup(file_priv,
+				      (void __user *)(uintptr_t)bo_handles,
+				      job->bo_count, &job->bo);
+}
+
+static void
+v3d_job_free(struct kref *ref)
+{
+	struct v3d_job *job = container_of(ref, struct v3d_job, refcount);
+	int i;
+
+	if (job->bo) {
+		for (i = 0; i < job->bo_count; i++)
+			drm_gem_object_put(job->bo[i]);
+		kvfree(job->bo);
+	}
+
+	dma_fence_put(job->irq_fence);
+	dma_fence_put(job->done_fence);
+
+	if (job->perfmon)
+		v3d_perfmon_put(job->perfmon);
+
+	kfree(job);
+}
+
+static void
+v3d_render_job_free(struct kref *ref)
+{
+	struct v3d_render_job *job = container_of(ref, struct v3d_render_job,
+						  base.refcount);
+	struct v3d_bo *bo, *save;
+
+	list_for_each_entry_safe(bo, save, &job->unref_list, unref_head) {
+		drm_gem_object_put(&bo->base.base);
+	}
+
+	v3d_job_free(ref);
+}
+
+void v3d_job_cleanup(struct v3d_job *job)
+{
+	if (!job)
+		return;
+
+	drm_sched_job_cleanup(&job->base);
+	v3d_job_put(job);
+}
+
+void v3d_job_put(struct v3d_job *job)
+{
+	kref_put(&job->refcount, job->free);
+}
+
+static int
+v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
+	     void **container, size_t size, void (*free)(struct kref *ref),
+	     u32 in_sync, struct v3d_submit_ext *se, enum v3d_queue queue)
+{
+	struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
+	struct v3d_job *job;
+	bool has_multisync = se && (se->flags & DRM_V3D_EXT_ID_MULTI_SYNC);
+	int ret, i;
+
+	*container = kcalloc(1, size, GFP_KERNEL);
+	if (!*container) {
+		DRM_ERROR("Cannot allocate memory for v3d job.");
+		return -ENOMEM;
+	}
+
+	job = *container;
+	job->v3d = v3d;
+	job->free = free;
+	job->file = file_priv;
+
+	ret = drm_sched_job_init(&job->base, &v3d_priv->sched_entity[queue],
+				 1, v3d_priv);
+	if (ret)
+		goto fail;
+
+	if (has_multisync) {
+		if (se->in_sync_count && se->wait_stage == queue) {
+			struct drm_v3d_sem __user *handle = u64_to_user_ptr(se->in_syncs);
+
+			for (i = 0; i < se->in_sync_count; i++) {
+				struct drm_v3d_sem in;
+
+				if (copy_from_user(&in, handle++, sizeof(in))) {
+					ret = -EFAULT;
+					DRM_DEBUG("Failed to copy wait dep handle.\n");
+					goto fail_deps;
+				}
+				ret = drm_sched_job_add_syncobj_dependency(&job->base, file_priv, in.handle, 0);
+
+				// TODO: Investigate why this was filtered out for the IOCTL.
+				if (ret && ret != -ENOENT)
+					goto fail_deps;
+			}
+		}
+	} else {
+		ret = drm_sched_job_add_syncobj_dependency(&job->base, file_priv, in_sync, 0);
+
+		// TODO: Investigate why this was filtered out for the IOCTL.
+		if (ret && ret != -ENOENT)
+			goto fail_deps;
+	}
+
+	kref_init(&job->refcount);
+
+	return 0;
+
+fail_deps:
+	drm_sched_job_cleanup(&job->base);
+fail:
+	kfree(*container);
+	*container = NULL;
+
+	return ret;
+}
+
+static void
+v3d_push_job(struct v3d_job *job)
+{
+	drm_sched_job_arm(&job->base);
+
+	job->done_fence = dma_fence_get(&job->base.s_fence->finished);
+
+	/* put by scheduler job completion */
+	kref_get(&job->refcount);
+
+	drm_sched_entity_push_job(&job->base);
+}
+
+static void
+v3d_attach_fences_and_unlock_reservation(struct drm_file *file_priv,
+					 struct v3d_job *job,
+					 struct ww_acquire_ctx *acquire_ctx,
+					 u32 out_sync,
+					 struct v3d_submit_ext *se,
+					 struct dma_fence *done_fence)
+{
+	struct drm_syncobj *sync_out;
+	bool has_multisync = se && (se->flags & DRM_V3D_EXT_ID_MULTI_SYNC);
+	int i;
+
+	for (i = 0; i < job->bo_count; i++) {
+		/* XXX: Use shared fences for read-only objects. */
+		dma_resv_add_fence(job->bo[i]->resv, job->done_fence,
+				   DMA_RESV_USAGE_WRITE);
+	}
+
+	drm_gem_unlock_reservations(job->bo, job->bo_count, acquire_ctx);
+
+	/* Update the return sync object for the job */
+	/* If it only supports a single signal semaphore*/
+	if (!has_multisync) {
+		sync_out = drm_syncobj_find(file_priv, out_sync);
+		if (sync_out) {
+			drm_syncobj_replace_fence(sync_out, done_fence);
+			drm_syncobj_put(sync_out);
+		}
+		return;
+	}
+
+	/* If multiple semaphores extension is supported */
+	if (se->out_sync_count) {
+		for (i = 0; i < se->out_sync_count; i++) {
+			drm_syncobj_replace_fence(se->out_syncs[i].syncobj,
+						  done_fence);
+			drm_syncobj_put(se->out_syncs[i].syncobj);
+		}
+		kvfree(se->out_syncs);
+	}
+}
+
+static void
+v3d_put_multisync_post_deps(struct v3d_submit_ext *se)
+{
+	unsigned int i;
+
+	if (!(se && se->out_sync_count))
+		return;
+
+	for (i = 0; i < se->out_sync_count; i++)
+		drm_syncobj_put(se->out_syncs[i].syncobj);
+	kvfree(se->out_syncs);
+}
+
+static int
+v3d_get_multisync_post_deps(struct drm_file *file_priv,
+			    struct v3d_submit_ext *se,
+			    u32 count, u64 handles)
+{
+	struct drm_v3d_sem __user *post_deps;
+	int i, ret;
+
+	if (!count)
+		return 0;
+
+	se->out_syncs = (struct v3d_submit_outsync *)
+			kvmalloc_array(count,
+				       sizeof(struct v3d_submit_outsync),
+				       GFP_KERNEL);
+	if (!se->out_syncs)
+		return -ENOMEM;
+
+	post_deps = u64_to_user_ptr(handles);
+
+	for (i = 0; i < count; i++) {
+		struct drm_v3d_sem out;
+
+		if (copy_from_user(&out, post_deps++, sizeof(out))) {
+			ret = -EFAULT;
+			DRM_DEBUG("Failed to copy post dep handles\n");
+			goto fail;
+		}
+
+		se->out_syncs[i].syncobj = drm_syncobj_find(file_priv,
+							    out.handle);
+		if (!se->out_syncs[i].syncobj) {
+			ret = -EINVAL;
+			goto fail;
+		}
+	}
+	se->out_sync_count = count;
+
+	return 0;
+
+fail:
+	for (i--; i >= 0; i--)
+		drm_syncobj_put(se->out_syncs[i].syncobj);
+	kvfree(se->out_syncs);
+
+	return ret;
+}
+
+/* Get data for multiple binary semaphores synchronization. Parse syncobj
+ * to be signaled when job completes (out_sync).
+ */
+static int
+v3d_get_multisync_submit_deps(struct drm_file *file_priv,
+			      struct drm_v3d_extension __user *ext,
+			      void *data)
+{
+	struct drm_v3d_multi_sync multisync;
+	struct v3d_submit_ext *se = data;
+	int ret;
+
+	if (copy_from_user(&multisync, ext, sizeof(multisync)))
+		return -EFAULT;
+
+	if (multisync.pad)
+		return -EINVAL;
+
+	ret = v3d_get_multisync_post_deps(file_priv, data, multisync.out_sync_count,
+					  multisync.out_syncs);
+	if (ret)
+		return ret;
+
+	se->in_sync_count = multisync.in_sync_count;
+	se->in_syncs = multisync.in_syncs;
+	se->flags |= DRM_V3D_EXT_ID_MULTI_SYNC;
+	se->wait_stage = multisync.wait_stage;
+
+	return 0;
+}
+
+/* Whenever userspace sets ioctl extensions, v3d_get_extensions parses data
+ * according to the extension id (name).
+ */
+static int
+v3d_get_extensions(struct drm_file *file_priv,
+		   u64 ext_handles,
+		   void *data)
+{
+	struct drm_v3d_extension __user *user_ext;
+	int ret;
+
+	user_ext = u64_to_user_ptr(ext_handles);
+	while (user_ext) {
+		struct drm_v3d_extension ext;
+
+		if (copy_from_user(&ext, user_ext, sizeof(ext))) {
+			DRM_DEBUG("Failed to copy submit extension\n");
+			return -EFAULT;
+		}
+
+		switch (ext.id) {
+		case DRM_V3D_EXT_ID_MULTI_SYNC:
+			ret = v3d_get_multisync_submit_deps(file_priv, user_ext, data);
+			if (ret)
+				return ret;
+			break;
+		default:
+			DRM_DEBUG_DRIVER("Unknown extension id: %d\n", ext.id);
+			return -EINVAL;
+		}
+
+		user_ext = u64_to_user_ptr(ext.next);
+	}
+
+	return 0;
+}
+
+/**
+ * v3d_submit_cl_ioctl() - Submits a job (frame) to the V3D.
+ * @dev: DRM device
+ * @data: ioctl argument
+ * @file_priv: DRM file for this fd
+ *
+ * This is the main entrypoint for userspace to submit a 3D frame to
+ * the GPU.  Userspace provides the binner command list (if
+ * applicable), and the kernel sets up the render command list to draw
+ * to the framebuffer described in the ioctl, using the command lists
+ * that the 3D engine's binner will produce.
+ */
+int
+v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
+		    struct drm_file *file_priv)
+{
+	struct v3d_dev *v3d = to_v3d_dev(dev);
+	struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
+	struct drm_v3d_submit_cl *args = data;
+	struct v3d_submit_ext se = {0};
+	struct v3d_bin_job *bin = NULL;
+	struct v3d_render_job *render = NULL;
+	struct v3d_job *clean_job = NULL;
+	struct v3d_job *last_job;
+	struct ww_acquire_ctx acquire_ctx;
+	int ret = 0;
+
+	trace_v3d_submit_cl_ioctl(&v3d->drm, args->rcl_start, args->rcl_end);
+
+	if (args->pad)
+		return -EINVAL;
+
+	if (args->flags &&
+	    args->flags & ~(DRM_V3D_SUBMIT_CL_FLUSH_CACHE |
+			    DRM_V3D_SUBMIT_EXTENSION)) {
+		DRM_INFO("invalid flags: %d\n", args->flags);
+		return -EINVAL;
+	}
+
+	if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
+		ret = v3d_get_extensions(file_priv, args->extensions, &se);
+		if (ret) {
+			DRM_DEBUG("Failed to get extensions.\n");
+			return ret;
+		}
+	}
+
+	ret = v3d_job_init(v3d, file_priv, (void *)&render, sizeof(*render),
+			   v3d_render_job_free, args->in_sync_rcl, &se, V3D_RENDER);
+	if (ret)
+		goto fail;
+
+	render->start = args->rcl_start;
+	render->end = args->rcl_end;
+	INIT_LIST_HEAD(&render->unref_list);
+
+	if (args->bcl_start != args->bcl_end) {
+		ret = v3d_job_init(v3d, file_priv, (void *)&bin, sizeof(*bin),
+				   v3d_job_free, args->in_sync_bcl, &se, V3D_BIN);
+		if (ret)
+			goto fail;
+
+		bin->start = args->bcl_start;
+		bin->end = args->bcl_end;
+		bin->qma = args->qma;
+		bin->qms = args->qms;
+		bin->qts = args->qts;
+		bin->render = render;
+	}
+
+	if (args->flags & DRM_V3D_SUBMIT_CL_FLUSH_CACHE) {
+		ret = v3d_job_init(v3d, file_priv, (void *)&clean_job, sizeof(*clean_job),
+				   v3d_job_free, 0, NULL, V3D_CACHE_CLEAN);
+		if (ret)
+			goto fail;
+
+		last_job = clean_job;
+	} else {
+		last_job = &render->base;
+	}
+
+	ret = v3d_lookup_bos(dev, file_priv, last_job,
+			     args->bo_handles, args->bo_handle_count);
+	if (ret)
+		goto fail;
+
+	ret = v3d_lock_bo_reservations(last_job, &acquire_ctx);
+	if (ret)
+		goto fail;
+
+	if (args->perfmon_id) {
+		render->base.perfmon = v3d_perfmon_find(v3d_priv,
+							args->perfmon_id);
+
+		if (!render->base.perfmon) {
+			ret = -ENOENT;
+			goto fail_perfmon;
+		}
+	}
+
+	mutex_lock(&v3d->sched_lock);
+	if (bin) {
+		bin->base.perfmon = render->base.perfmon;
+		v3d_perfmon_get(bin->base.perfmon);
+		v3d_push_job(&bin->base);
+
+		ret = drm_sched_job_add_dependency(&render->base.base,
+						   dma_fence_get(bin->base.done_fence));
+		if (ret)
+			goto fail_unreserve;
+	}
+
+	v3d_push_job(&render->base);
+
+	if (clean_job) {
+		struct dma_fence *render_fence =
+			dma_fence_get(render->base.done_fence);
+		ret = drm_sched_job_add_dependency(&clean_job->base,
+						   render_fence);
+		if (ret)
+			goto fail_unreserve;
+		clean_job->perfmon = render->base.perfmon;
+		v3d_perfmon_get(clean_job->perfmon);
+		v3d_push_job(clean_job);
+	}
+
+	mutex_unlock(&v3d->sched_lock);
+
+	v3d_attach_fences_and_unlock_reservation(file_priv,
+						 last_job,
+						 &acquire_ctx,
+						 args->out_sync,
+						 &se,
+						 last_job->done_fence);
+
+	if (bin)
+		v3d_job_put(&bin->base);
+	v3d_job_put(&render->base);
+	if (clean_job)
+		v3d_job_put(clean_job);
+
+	return 0;
+
+fail_unreserve:
+	mutex_unlock(&v3d->sched_lock);
+fail_perfmon:
+	drm_gem_unlock_reservations(last_job->bo,
+				    last_job->bo_count, &acquire_ctx);
+fail:
+	v3d_job_cleanup((void *)bin);
+	v3d_job_cleanup((void *)render);
+	v3d_job_cleanup(clean_job);
+	v3d_put_multisync_post_deps(&se);
+
+	return ret;
+}
+
+/**
+ * v3d_submit_tfu_ioctl() - Submits a TFU (texture formatting) job to the V3D.
+ * @dev: DRM device
+ * @data: ioctl argument
+ * @file_priv: DRM file for this fd
+ *
+ * Userspace provides the register setup for the TFU, which we don't
+ * need to validate since the TFU is behind the MMU.
+ */
+int
+v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
+		     struct drm_file *file_priv)
+{
+	struct v3d_dev *v3d = to_v3d_dev(dev);
+	struct drm_v3d_submit_tfu *args = data;
+	struct v3d_submit_ext se = {0};
+	struct v3d_tfu_job *job = NULL;
+	struct ww_acquire_ctx acquire_ctx;
+	int ret = 0;
+
+	trace_v3d_submit_tfu_ioctl(&v3d->drm, args->iia);
+
+	if (args->flags && !(args->flags & DRM_V3D_SUBMIT_EXTENSION)) {
+		DRM_DEBUG("invalid flags: %d\n", args->flags);
+		return -EINVAL;
+	}
+
+	if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
+		ret = v3d_get_extensions(file_priv, args->extensions, &se);
+		if (ret) {
+			DRM_DEBUG("Failed to get extensions.\n");
+			return ret;
+		}
+	}
+
+	ret = v3d_job_init(v3d, file_priv, (void *)&job, sizeof(*job),
+			   v3d_job_free, args->in_sync, &se, V3D_TFU);
+	if (ret)
+		goto fail;
+
+	job->base.bo = kcalloc(ARRAY_SIZE(args->bo_handles),
+			       sizeof(*job->base.bo), GFP_KERNEL);
+	if (!job->base.bo) {
+		ret = -ENOMEM;
+		goto fail;
+	}
+
+	job->args = *args;
+
+	for (job->base.bo_count = 0;
+	     job->base.bo_count < ARRAY_SIZE(args->bo_handles);
+	     job->base.bo_count++) {
+		struct drm_gem_object *bo;
+
+		if (!args->bo_handles[job->base.bo_count])
+			break;
+
+		bo = drm_gem_object_lookup(file_priv, args->bo_handles[job->base.bo_count]);
+		if (!bo) {
+			DRM_DEBUG("Failed to look up GEM BO %d: %d\n",
+				  job->base.bo_count,
+				  args->bo_handles[job->base.bo_count]);
+			ret = -ENOENT;
+			goto fail;
+		}
+		job->base.bo[job->base.bo_count] = bo;
+	}
+
+	ret = v3d_lock_bo_reservations(&job->base, &acquire_ctx);
+	if (ret)
+		goto fail;
+
+	mutex_lock(&v3d->sched_lock);
+	v3d_push_job(&job->base);
+	mutex_unlock(&v3d->sched_lock);
+
+	v3d_attach_fences_and_unlock_reservation(file_priv,
+						 &job->base, &acquire_ctx,
+						 args->out_sync,
+						 &se,
+						 job->base.done_fence);
+
+	v3d_job_put(&job->base);
+
+	return 0;
+
+fail:
+	v3d_job_cleanup((void *)job);
+	v3d_put_multisync_post_deps(&se);
+
+	return ret;
+}
+
+/**
+ * v3d_submit_csd_ioctl() - Submits a CSD (compute shader) job to the V3D.
+ * @dev: DRM device
+ * @data: ioctl argument
+ * @file_priv: DRM file for this fd
+ *
+ * Userspace provides the register setup for the CSD, which we don't
+ * need to validate since the CSD is behind the MMU.
+ */
+int
+v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
+		     struct drm_file *file_priv)
+{
+	struct v3d_dev *v3d = to_v3d_dev(dev);
+	struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
+	struct drm_v3d_submit_csd *args = data;
+	struct v3d_submit_ext se = {0};
+	struct v3d_csd_job *job = NULL;
+	struct v3d_job *clean_job = NULL;
+	struct ww_acquire_ctx acquire_ctx;
+	int ret;
+
+	trace_v3d_submit_csd_ioctl(&v3d->drm, args->cfg[5], args->cfg[6]);
+
+	if (args->pad)
+		return -EINVAL;
+
+	if (!v3d_has_csd(v3d)) {
+		DRM_DEBUG("Attempting CSD submit on non-CSD hardware\n");
+		return -EINVAL;
+	}
+
+	if (args->flags && !(args->flags & DRM_V3D_SUBMIT_EXTENSION)) {
+		DRM_INFO("invalid flags: %d\n", args->flags);
+		return -EINVAL;
+	}
+
+	if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
+		ret = v3d_get_extensions(file_priv, args->extensions, &se);
+		if (ret) {
+			DRM_DEBUG("Failed to get extensions.\n");
+			return ret;
+		}
+	}
+
+	ret = v3d_job_init(v3d, file_priv, (void *)&job, sizeof(*job),
+			   v3d_job_free, args->in_sync, &se, V3D_CSD);
+	if (ret)
+		goto fail;
+
+	ret = v3d_job_init(v3d, file_priv, (void *)&clean_job, sizeof(*clean_job),
+			   v3d_job_free, 0, NULL, V3D_CACHE_CLEAN);
+	if (ret)
+		goto fail;
+
+	job->args = *args;
+
+	ret = v3d_lookup_bos(dev, file_priv, clean_job,
+			     args->bo_handles, args->bo_handle_count);
+	if (ret)
+		goto fail;
+
+	ret = v3d_lock_bo_reservations(clean_job, &acquire_ctx);
+	if (ret)
+		goto fail;
+
+	if (args->perfmon_id) {
+		job->base.perfmon = v3d_perfmon_find(v3d_priv,
+						     args->perfmon_id);
+		if (!job->base.perfmon) {
+			ret = -ENOENT;
+			goto fail_perfmon;
+		}
+	}
+
+	mutex_lock(&v3d->sched_lock);
+	v3d_push_job(&job->base);
+
+	ret = drm_sched_job_add_dependency(&clean_job->base,
+					   dma_fence_get(job->base.done_fence));
+	if (ret)
+		goto fail_unreserve;
+
+	v3d_push_job(clean_job);
+	mutex_unlock(&v3d->sched_lock);
+
+	v3d_attach_fences_and_unlock_reservation(file_priv,
+						 clean_job,
+						 &acquire_ctx,
+						 args->out_sync,
+						 &se,
+						 clean_job->done_fence);
+
+	v3d_job_put(&job->base);
+	v3d_job_put(clean_job);
+
+	return 0;
+
+fail_unreserve:
+	mutex_unlock(&v3d->sched_lock);
+fail_perfmon:
+	drm_gem_unlock_reservations(clean_job->bo, clean_job->bo_count,
+				    &acquire_ctx);
+fail:
+	v3d_job_cleanup((void *)job);
+	v3d_job_cleanup(clean_job);
+	v3d_put_multisync_post_deps(&se);
+
+	return ret;
+}
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 04/17] drm/v3d: Simplify job refcount handling
  2023-11-24  0:46 [PATCH v2 00/17] drm/v3d: Introduce CPU jobs Maíra Canal
                   ` (2 preceding siblings ...)
  2023-11-24  0:46 ` [PATCH v2 03/17] drm/v3d: Detach job submissions IOCTLs to a new specific file Maíra Canal
@ 2023-11-24  0:47 ` Maíra Canal
  2023-11-27  7:57   ` Iago Toral
  2023-11-24  0:47 ` [PATCH v2 05/17] drm/v3d: Don't allow two multisync extensions in the same job Maíra Canal
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 25+ messages in thread
From: Maíra Canal @ 2023-11-24  0:47 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann
  Cc: Maíra Canal, kernel-dev, Boris Brezillon, dri-devel, Faith Ekstrand

From: Melissa Wen <mwen@igalia.com>

Instead of checking if the job is NULL every time we call the function,
check it inside the function.

Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
 drivers/gpu/drm/v3d/v3d_submit.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index f36214002f37..e18e7c963884 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -129,6 +129,9 @@ void v3d_job_cleanup(struct v3d_job *job)
 
 void v3d_job_put(struct v3d_job *job)
 {
+	if (!job)
+		return;
+
 	kref_put(&job->refcount, job->free);
 }
 
@@ -517,11 +520,9 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 						 &se,
 						 last_job->done_fence);
 
-	if (bin)
-		v3d_job_put(&bin->base);
-	v3d_job_put(&render->base);
-	if (clean_job)
-		v3d_job_put(clean_job);
+	v3d_job_put((void *)bin);
+	v3d_job_put((void *)render);
+	v3d_job_put((void *)clean_job);
 
 	return 0;
 
@@ -621,7 +622,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
 						 &se,
 						 job->base.done_fence);
 
-	v3d_job_put(&job->base);
+	v3d_job_put((void *)job);
 
 	return 0;
 
@@ -725,7 +726,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
 						 &se,
 						 clean_job->done_fence);
 
-	v3d_job_put(&job->base);
+	v3d_job_put((void *)job);
 	v3d_job_put(clean_job);
 
 	return 0;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 05/17] drm/v3d: Don't allow two multisync extensions in the same job
  2023-11-24  0:46 [PATCH v2 00/17] drm/v3d: Introduce CPU jobs Maíra Canal
                   ` (3 preceding siblings ...)
  2023-11-24  0:47 ` [PATCH v2 04/17] drm/v3d: Simplify job refcount handling Maíra Canal
@ 2023-11-24  0:47 ` Maíra Canal
  2023-11-24  0:47 ` [PATCH v2 06/17] drm/v3d: Decouple job allocation from job initiation Maíra Canal
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Maíra Canal @ 2023-11-24  0:47 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann
  Cc: Maíra Canal, kernel-dev, Boris Brezillon, dri-devel, Faith Ekstrand

Currently, two multisync extensions can be added to the same job and
only the last multisync extension will be used. To avoid this
vulnerability, don't allow two multisync extensions in the same job.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
 drivers/gpu/drm/v3d/v3d_submit.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index e18e7c963884..fe46dd316ca0 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -329,6 +329,11 @@ v3d_get_multisync_submit_deps(struct drm_file *file_priv,
 	struct v3d_submit_ext *se = data;
 	int ret;
 
+	if (se->in_sync_count || se->out_sync_count) {
+		DRM_DEBUG("Two multisync extensions were added to the same job.");
+		return -EINVAL;
+	}
+
 	if (copy_from_user(&multisync, ext, sizeof(multisync)))
 		return -EFAULT;
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 06/17] drm/v3d: Decouple job allocation from job initiation
  2023-11-24  0:46 [PATCH v2 00/17] drm/v3d: Introduce CPU jobs Maíra Canal
                   ` (4 preceding siblings ...)
  2023-11-24  0:47 ` [PATCH v2 05/17] drm/v3d: Don't allow two multisync extensions in the same job Maíra Canal
@ 2023-11-24  0:47 ` Maíra Canal
  2023-11-27  8:12   ` Iago Toral
  2023-11-24  0:47 ` [PATCH v2 07/17] drm/v3d: Add a CPU job submission Maíra Canal
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 25+ messages in thread
From: Maíra Canal @ 2023-11-24  0:47 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann
  Cc: Maíra Canal, kernel-dev, Boris Brezillon, dri-devel, Faith Ekstrand

We want to allow the IOCTLs to allocate the job without initiating it.
This will be useful for the CPU job submission IOCTL, as the CPU job has
the need to use information from the user extensions. Currently, the
user extensions are parsed before the job allocation, making it
impossible to fill the CPU job when parsing the user extensions.
Therefore, decouple the job allocation from the job initiation.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
 drivers/gpu/drm/v3d/v3d_submit.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index fe46dd316ca0..ed1a310bbd2f 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -135,6 +135,21 @@ void v3d_job_put(struct v3d_job *job)
 	kref_put(&job->refcount, job->free);
 }
 
+static int
+v3d_job_allocate(void **container, size_t size)
+{
+	if (*container)
+		return 0;
+
+	*container = kcalloc(1, size, GFP_KERNEL);
+	if (!*container) {
+		DRM_ERROR("Cannot allocate memory for V3D job.\n");
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
 static int
 v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
 	     void **container, size_t size, void (*free)(struct kref *ref),
@@ -145,11 +160,9 @@ v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
 	bool has_multisync = se && (se->flags & DRM_V3D_EXT_ID_MULTI_SYNC);
 	int ret, i;
 
-	*container = kcalloc(1, size, GFP_KERNEL);
-	if (!*container) {
-		DRM_ERROR("Cannot allocate memory for v3d job.");
-		return -ENOMEM;
-	}
+	ret = v3d_job_allocate(container, size);
+	if (ret)
+		return ret;
 
 	job = *container;
 	job->v3d = v3d;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 07/17] drm/v3d: Add a CPU job submission
  2023-11-24  0:46 [PATCH v2 00/17] drm/v3d: Introduce CPU jobs Maíra Canal
                   ` (5 preceding siblings ...)
  2023-11-24  0:47 ` [PATCH v2 06/17] drm/v3d: Decouple job allocation from job initiation Maíra Canal
@ 2023-11-24  0:47 ` Maíra Canal
  2023-11-27  8:45   ` Iago Toral
  2023-11-27  8:59   ` Iago Toral
  2023-11-24  0:47 ` [PATCH v2 08/17] drm/v3d: Use v3d_get_extensions() to parse CPU job data Maíra Canal
                   ` (9 subsequent siblings)
  16 siblings, 2 replies; 25+ messages in thread
From: Maíra Canal @ 2023-11-24  0:47 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann
  Cc: Maíra Canal, kernel-dev, Boris Brezillon, dri-devel, Faith Ekstrand

From: Melissa Wen <mwen@igalia.com>

Create a new type of job, a CPU job. A CPU job is a type of job that
performs operations that requires CPU intervention. The overall idea is
to use user extensions to enable different types of CPU job, allowing the
CPU job to perform different operations according to the type of user
externsion. The user extension ID identify the type of CPU job that must
be dealt.

Having a CPU job is interesting for synchronization purposes as a CPU
job has a queue like any other V3D job and can be synchoronized by the
multisync extension.

Signed-off-by: Melissa Wen <mwen@igalia.com>
Co-developed-by: Maíra Canal <mcanal@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
 drivers/gpu/drm/v3d/v3d_drv.c    |  4 ++
 drivers/gpu/drm/v3d/v3d_drv.h    | 14 +++++-
 drivers/gpu/drm/v3d/v3d_sched.c  | 57 +++++++++++++++++++++++
 drivers/gpu/drm/v3d/v3d_submit.c | 79 ++++++++++++++++++++++++++++++++
 include/uapi/drm/v3d_drm.h       | 17 +++++++
 5 files changed, 170 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index 44a1ca57d6a4..3debf37e7d9b 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -91,6 +91,9 @@ static int v3d_get_param_ioctl(struct drm_device *dev, void *data,
 	case DRM_V3D_PARAM_SUPPORTS_MULTISYNC_EXT:
 		args->value = 1;
 		return 0;
+	case DRM_V3D_PARAM_SUPPORTS_CPU_QUEUE:
+		args->value = 1;
+		return 0;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", args->param);
 		return -EINVAL;
@@ -189,6 +192,7 @@ static const struct drm_ioctl_desc v3d_drm_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(V3D_PERFMON_CREATE, v3d_perfmon_create_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(V3D_PERFMON_DESTROY, v3d_perfmon_destroy_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(V3D_PERFMON_GET_VALUES, v3d_perfmon_get_values_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(V3D_SUBMIT_CPU, v3d_submit_cpu_ioctl, DRM_RENDER_ALLOW | DRM_AUTH),
 };
 
 static const struct drm_driver v3d_drm_driver = {
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 4db9ace66024..010b146140ad 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -19,7 +19,7 @@ struct reset_control;
 
 #define GMP_GRANULARITY (128 * 1024)
 
-#define V3D_MAX_QUEUES (V3D_CACHE_CLEAN + 1)
+#define V3D_MAX_QUEUES (V3D_CPU + 1)
 
 static inline char *v3d_queue_to_string(enum v3d_queue queue)
 {
@@ -29,6 +29,7 @@ static inline char *v3d_queue_to_string(enum v3d_queue queue)
 	case V3D_TFU: return "tfu";
 	case V3D_CSD: return "csd";
 	case V3D_CACHE_CLEAN: return "cache_clean";
+	case V3D_CPU: return "cpu";
 	}
 	return "UNKNOWN";
 }
@@ -122,6 +123,7 @@ struct v3d_dev {
 	struct v3d_render_job *render_job;
 	struct v3d_tfu_job *tfu_job;
 	struct v3d_csd_job *csd_job;
+	struct v3d_cpu_job *cpu_job;
 
 	struct v3d_queue_state queue[V3D_MAX_QUEUES];
 
@@ -312,6 +314,14 @@ struct v3d_csd_job {
 	struct drm_v3d_submit_csd args;
 };
 
+enum v3d_cpu_job_type {};
+
+struct v3d_cpu_job {
+	struct v3d_job base;
+
+	enum v3d_cpu_job_type job_type;
+};
+
 struct v3d_submit_outsync {
 	struct drm_syncobj *syncobj;
 };
@@ -414,6 +424,8 @@ int v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
 			 struct drm_file *file_priv);
 int v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
 			 struct drm_file *file_priv);
+int v3d_submit_cpu_ioctl(struct drm_device *dev, void *data,
+			 struct drm_file *file_priv);
 
 /* v3d_irq.c */
 int v3d_irq_init(struct v3d_dev *v3d);
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index fccbea2a5f2e..a32c91b94898 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -55,6 +55,12 @@ to_csd_job(struct drm_sched_job *sched_job)
 	return container_of(sched_job, struct v3d_csd_job, base.base);
 }
 
+static struct v3d_cpu_job *
+to_cpu_job(struct drm_sched_job *sched_job)
+{
+	return container_of(sched_job, struct v3d_cpu_job, base.base);
+}
+
 static void
 v3d_sched_job_free(struct drm_sched_job *sched_job)
 {
@@ -262,6 +268,42 @@ v3d_csd_job_run(struct drm_sched_job *sched_job)
 	return fence;
 }
 
+static struct dma_fence *
+v3d_cpu_job_run(struct drm_sched_job *sched_job)
+{
+	struct v3d_cpu_job *job = to_cpu_job(sched_job);
+	struct v3d_dev *v3d = job->base.v3d;
+	struct v3d_file_priv *file = job->base.file->driver_priv;
+	u64 runtime;
+
+	void (*v3d_cpu_job_fn[])(struct v3d_cpu_job *job) = { };
+
+	v3d->cpu_job = job;
+
+	if (job->job_type >= ARRAY_SIZE(v3d_cpu_job_fn)) {
+		DRM_DEBUG_DRIVER("Unknown CPU job: %d\n", job->job_type);
+		return NULL;
+	}
+
+	file->start_ns[V3D_CPU] = local_clock();
+	v3d->queue[V3D_CPU].start_ns = file->start_ns[V3D_CPU];
+
+	v3d_cpu_job_fn[job->job_type](job);
+
+	runtime = local_clock() - file->start_ns[V3D_CPU];
+
+	file->enabled_ns[V3D_CPU] += runtime;
+	v3d->queue[V3D_CPU].enabled_ns += runtime;
+
+	file->jobs_sent[V3D_CPU]++;
+	v3d->queue[V3D_CPU].jobs_sent++;
+
+	file->start_ns[V3D_CPU] = 0;
+	v3d->queue[V3D_CPU].start_ns = 0;
+
+	return NULL;
+}
+
 static struct dma_fence *
 v3d_cache_clean_job_run(struct drm_sched_job *sched_job)
 {
@@ -416,6 +458,12 @@ static const struct drm_sched_backend_ops v3d_cache_clean_sched_ops = {
 	.free_job = v3d_sched_job_free
 };
 
+static const struct drm_sched_backend_ops v3d_cpu_sched_ops = {
+	.run_job = v3d_cpu_job_run,
+	.timedout_job = v3d_generic_job_timedout,
+	.free_job = v3d_sched_job_free
+};
+
 int
 v3d_sched_init(struct v3d_dev *v3d)
 {
@@ -471,6 +519,15 @@ v3d_sched_init(struct v3d_dev *v3d)
 			goto fail;
 	}
 
+	ret = drm_sched_init(&v3d->queue[V3D_CPU].sched,
+			     &v3d_cpu_sched_ops, NULL,
+			     DRM_SCHED_PRIORITY_COUNT,
+			     1, job_hang_limit,
+			     msecs_to_jiffies(hang_limit_ms), NULL,
+			     NULL, "v3d_cpu", v3d->drm.dev);
+	if (ret)
+		goto fail;
+
 	return 0;
 
 fail:
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index ed1a310bbd2f..d4b85ab16345 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -761,3 +761,82 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
 
 	return ret;
 }
+
+/**
+ * v3d_submit_cpu_ioctl() - Submits a CPU job to the V3D.
+ * @dev: DRM device
+ * @data: ioctl argument
+ * @file_priv: DRM file for this fd
+ *
+ * Userspace specifies the CPU job type and data required to perform its
+ * operations through the drm_v3d_extension struct.
+ */
+int
+v3d_submit_cpu_ioctl(struct drm_device *dev, void *data,
+		     struct drm_file *file_priv)
+{
+	struct v3d_dev *v3d = to_v3d_dev(dev);
+	struct drm_v3d_submit_cpu *args = data;
+	struct v3d_submit_ext se = {0};
+	struct v3d_cpu_job *cpu_job = NULL;
+	struct ww_acquire_ctx acquire_ctx;
+	int ret;
+
+	if (args->flags && !(args->flags & DRM_V3D_SUBMIT_EXTENSION)) {
+		DRM_INFO("invalid flags: %d\n", args->flags);
+		return -EINVAL;
+	}
+
+	ret = v3d_job_allocate((void *)&cpu_job, sizeof(*cpu_job));
+	if (ret)
+		return ret;
+
+	if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
+		ret = v3d_get_extensions(file_priv, args->extensions, &se);
+		if (ret) {
+			DRM_DEBUG("Failed to get extensions.\n");
+			goto fail;
+		}
+	}
+
+	/* Every CPU job must have a CPU job user extension */
+	if (!cpu_job->job_type) {
+		DRM_DEBUG("CPU job must have a CPU job user extension.\n");
+		goto fail;
+	}
+
+	ret = v3d_job_init(v3d, file_priv, (void *)&cpu_job, sizeof(*cpu_job),
+			   v3d_job_free, 0, &se, V3D_CPU);
+	if (ret)
+		goto fail;
+
+	if (args->bo_handle_count) {
+		ret = v3d_lookup_bos(dev, file_priv, &cpu_job->base,
+				     args->bo_handles, args->bo_handle_count);
+		if (ret)
+			goto fail;
+
+		ret = v3d_lock_bo_reservations(&cpu_job->base, &acquire_ctx);
+		if (ret)
+			goto fail;
+	}
+
+	mutex_lock(&v3d->sched_lock);
+	v3d_push_job(&cpu_job->base);
+	mutex_unlock(&v3d->sched_lock);
+
+	v3d_attach_fences_and_unlock_reservation(file_priv,
+						 &cpu_job->base,
+						 &acquire_ctx, 0,
+						 NULL, cpu_job->base.done_fence);
+
+	v3d_job_put((void *)cpu_job);
+
+	return 0;
+
+fail:
+	v3d_job_cleanup((void *)cpu_job);
+	v3d_put_multisync_post_deps(&se);
+
+	return ret;
+}
diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
index 1a7d7a689de3..00abef9d0db7 100644
--- a/include/uapi/drm/v3d_drm.h
+++ b/include/uapi/drm/v3d_drm.h
@@ -41,6 +41,7 @@ extern "C" {
 #define DRM_V3D_PERFMON_CREATE                    0x08
 #define DRM_V3D_PERFMON_DESTROY                   0x09
 #define DRM_V3D_PERFMON_GET_VALUES                0x0a
+#define DRM_V3D_SUBMIT_CPU                        0x0b
 
 #define DRM_IOCTL_V3D_SUBMIT_CL           DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_SUBMIT_CL, struct drm_v3d_submit_cl)
 #define DRM_IOCTL_V3D_WAIT_BO             DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_WAIT_BO, struct drm_v3d_wait_bo)
@@ -56,6 +57,7 @@ extern "C" {
 						   struct drm_v3d_perfmon_destroy)
 #define DRM_IOCTL_V3D_PERFMON_GET_VALUES  DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_PERFMON_GET_VALUES, \
 						   struct drm_v3d_perfmon_get_values)
+#define DRM_IOCTL_V3D_SUBMIT_CPU          DRM_IOW(DRM_COMMAND_BASE + DRM_V3D_SUBMIT_CPU, struct drm_v3d_submit_cpu)
 
 #define DRM_V3D_SUBMIT_CL_FLUSH_CACHE             0x01
 #define DRM_V3D_SUBMIT_EXTENSION		  0x02
@@ -93,6 +95,7 @@ enum v3d_queue {
 	V3D_TFU,
 	V3D_CSD,
 	V3D_CACHE_CLEAN,
+	V3D_CPU,
 };
 
 /**
@@ -276,6 +279,7 @@ enum drm_v3d_param {
 	DRM_V3D_PARAM_SUPPORTS_CACHE_FLUSH,
 	DRM_V3D_PARAM_SUPPORTS_PERFMON,
 	DRM_V3D_PARAM_SUPPORTS_MULTISYNC_EXT,
+	DRM_V3D_PARAM_SUPPORTS_CPU_QUEUE,
 };
 
 struct drm_v3d_get_param {
@@ -361,6 +365,19 @@ struct drm_v3d_submit_csd {
 	__u32 pad;
 };
 
+struct drm_v3d_submit_cpu {
+	/* Pointer to a u32 array of the BOs that are referenced by the job. */
+	__u64 bo_handles;
+
+	/* Number of BO handles passed in (size is that times 4). */
+	__u32 bo_handle_count;
+
+	__u32 flags;
+
+	/* Pointer to an array of ioctl extensions*/
+	__u64 extensions;
+};
+
 enum {
 	V3D_PERFCNT_FEP_VALID_PRIMTS_NO_PIXELS,
 	V3D_PERFCNT_FEP_VALID_PRIMS,
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 08/17] drm/v3d: Use v3d_get_extensions() to parse CPU job data
  2023-11-24  0:46 [PATCH v2 00/17] drm/v3d: Introduce CPU jobs Maíra Canal
                   ` (6 preceding siblings ...)
  2023-11-24  0:47 ` [PATCH v2 07/17] drm/v3d: Add a CPU job submission Maíra Canal
@ 2023-11-24  0:47 ` Maíra Canal
  2023-11-24  0:47 ` [PATCH v2 09/17] drm/v3d: Create tracepoints to track the CPU job Maíra Canal
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Maíra Canal @ 2023-11-24  0:47 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann
  Cc: Maíra Canal, kernel-dev, Boris Brezillon, dri-devel, Faith Ekstrand

Currently, v3d_get_extensions() only parses multisync data and assigns
it to the `struct v3d_submit_ext`. But, to implement the CPU job with
user extensions, we want v3d_get_extensions() to be able to parse CPU
job data and assign it to the `struct v3d_cpu_job`.

Therefore, allow the function v3d_get_extensions() to use `struct v3d_cpu_job *`
as a parameter. If the `struct v3d_cpu_job *` is assigned to NULL, it means
that the job is a GPU job and CPU job extensions should be rejected.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
 drivers/gpu/drm/v3d/v3d_submit.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index d4b85ab16345..955cf3106fdf 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -336,10 +336,9 @@ v3d_get_multisync_post_deps(struct drm_file *file_priv,
 static int
 v3d_get_multisync_submit_deps(struct drm_file *file_priv,
 			      struct drm_v3d_extension __user *ext,
-			      void *data)
+			      struct v3d_submit_ext *se)
 {
 	struct drm_v3d_multi_sync multisync;
-	struct v3d_submit_ext *se = data;
 	int ret;
 
 	if (se->in_sync_count || se->out_sync_count) {
@@ -353,7 +352,7 @@ v3d_get_multisync_submit_deps(struct drm_file *file_priv,
 	if (multisync.pad)
 		return -EINVAL;
 
-	ret = v3d_get_multisync_post_deps(file_priv, data, multisync.out_sync_count,
+	ret = v3d_get_multisync_post_deps(file_priv, se, multisync.out_sync_count,
 					  multisync.out_syncs);
 	if (ret)
 		return ret;
@@ -372,7 +371,8 @@ v3d_get_multisync_submit_deps(struct drm_file *file_priv,
 static int
 v3d_get_extensions(struct drm_file *file_priv,
 		   u64 ext_handles,
-		   void *data)
+		   struct v3d_submit_ext *se,
+		   struct v3d_cpu_job *job)
 {
 	struct drm_v3d_extension __user *user_ext;
 	int ret;
@@ -388,15 +388,16 @@ v3d_get_extensions(struct drm_file *file_priv,
 
 		switch (ext.id) {
 		case DRM_V3D_EXT_ID_MULTI_SYNC:
-			ret = v3d_get_multisync_submit_deps(file_priv, user_ext, data);
-			if (ret)
-				return ret;
+			ret = v3d_get_multisync_submit_deps(file_priv, user_ext, se);
 			break;
 		default:
 			DRM_DEBUG_DRIVER("Unknown extension id: %d\n", ext.id);
 			return -EINVAL;
 		}
 
+		if (ret)
+			return ret;
+
 		user_ext = u64_to_user_ptr(ext.next);
 	}
 
@@ -443,7 +444,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 	}
 
 	if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
-		ret = v3d_get_extensions(file_priv, args->extensions, &se);
+		ret = v3d_get_extensions(file_priv, args->extensions, &se, NULL);
 		if (ret) {
 			DRM_DEBUG("Failed to get extensions.\n");
 			return ret;
@@ -586,7 +587,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
 	}
 
 	if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
-		ret = v3d_get_extensions(file_priv, args->extensions, &se);
+		ret = v3d_get_extensions(file_priv, args->extensions, &se, NULL);
 		if (ret) {
 			DRM_DEBUG("Failed to get extensions.\n");
 			return ret;
@@ -689,7 +690,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
 	}
 
 	if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
-		ret = v3d_get_extensions(file_priv, args->extensions, &se);
+		ret = v3d_get_extensions(file_priv, args->extensions, &se, NULL);
 		if (ret) {
 			DRM_DEBUG("Failed to get extensions.\n");
 			return ret;
@@ -792,7 +793,7 @@ v3d_submit_cpu_ioctl(struct drm_device *dev, void *data,
 		return ret;
 
 	if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
-		ret = v3d_get_extensions(file_priv, args->extensions, &se);
+		ret = v3d_get_extensions(file_priv, args->extensions, &se, cpu_job);
 		if (ret) {
 			DRM_DEBUG("Failed to get extensions.\n");
 			goto fail;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 09/17] drm/v3d: Create tracepoints to track the CPU job
  2023-11-24  0:46 [PATCH v2 00/17] drm/v3d: Introduce CPU jobs Maíra Canal
                   ` (7 preceding siblings ...)
  2023-11-24  0:47 ` [PATCH v2 08/17] drm/v3d: Use v3d_get_extensions() to parse CPU job data Maíra Canal
@ 2023-11-24  0:47 ` Maíra Canal
  2023-11-24  0:47 ` [PATCH v2 10/17] drm/v3d: Detach the CSD job BO setup Maíra Canal
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Maíra Canal @ 2023-11-24  0:47 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann
  Cc: Maíra Canal, kernel-dev, Boris Brezillon, dri-devel, Faith Ekstrand

Create tracepoints to track the three major events of a CPU job
lifetime:
	1. Submission of a `v3d_submit_cpu` IOCTL
	2. Beginning of the execution of a CPU job
	3. Ending of the execution of a CPU job

Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
 drivers/gpu/drm/v3d/v3d_sched.c  |  4 +++
 drivers/gpu/drm/v3d/v3d_submit.c |  2 ++
 drivers/gpu/drm/v3d/v3d_trace.h  | 57 ++++++++++++++++++++++++++++++++
 3 files changed, 63 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index a32c91b94898..30a88e557217 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -288,8 +288,12 @@ v3d_cpu_job_run(struct drm_sched_job *sched_job)
 	file->start_ns[V3D_CPU] = local_clock();
 	v3d->queue[V3D_CPU].start_ns = file->start_ns[V3D_CPU];
 
+	trace_v3d_cpu_job_begin(&v3d->drm, job->job_type);
+
 	v3d_cpu_job_fn[job->job_type](job);
 
+	trace_v3d_cpu_job_end(&v3d->drm, job->job_type);
+
 	runtime = local_clock() - file->start_ns[V3D_CPU];
 
 	file->enabled_ns[V3D_CPU] += runtime;
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 955cf3106fdf..337a15b4b594 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -806,6 +806,8 @@ v3d_submit_cpu_ioctl(struct drm_device *dev, void *data,
 		goto fail;
 	}
 
+	trace_v3d_submit_cpu_ioctl(&v3d->drm, cpu_job->job_type);
+
 	ret = v3d_job_init(v3d, file_priv, (void *)&cpu_job, sizeof(*cpu_job),
 			   v3d_job_free, 0, &se, V3D_CPU);
 	if (ret)
diff --git a/drivers/gpu/drm/v3d/v3d_trace.h b/drivers/gpu/drm/v3d/v3d_trace.h
index 7aa8dc356e54..06086ece6e9e 100644
--- a/drivers/gpu/drm/v3d/v3d_trace.h
+++ b/drivers/gpu/drm/v3d/v3d_trace.h
@@ -225,6 +225,63 @@ TRACE_EVENT(v3d_submit_csd,
 		      __entry->seqno)
 );
 
+TRACE_EVENT(v3d_submit_cpu_ioctl,
+	   TP_PROTO(struct drm_device *dev, enum v3d_cpu_job_type job_type),
+	   TP_ARGS(dev, job_type),
+
+	   TP_STRUCT__entry(
+			    __field(u32, dev)
+			    __field(enum v3d_cpu_job_type, job_type)
+			    ),
+
+	   TP_fast_assign(
+			  __entry->dev = dev->primary->index;
+			  __entry->job_type = job_type;
+			  ),
+
+	   TP_printk("dev=%u, job_type=%d",
+		     __entry->dev,
+		     __entry->job_type)
+);
+
+TRACE_EVENT(v3d_cpu_job_begin,
+	    TP_PROTO(struct drm_device *dev, enum v3d_cpu_job_type job_type),
+	    TP_ARGS(dev, job_type),
+
+	    TP_STRUCT__entry(
+			     __field(u32, dev)
+			     __field(enum v3d_cpu_job_type, job_type)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->dev = dev->primary->index;
+			   __entry->job_type = job_type;
+			   ),
+
+	    TP_printk("dev=%u, job_type=%d",
+		      __entry->dev,
+		      __entry->job_type)
+);
+
+TRACE_EVENT(v3d_cpu_job_end,
+	    TP_PROTO(struct drm_device *dev, enum v3d_cpu_job_type job_type),
+	    TP_ARGS(dev, job_type),
+
+	    TP_STRUCT__entry(
+			     __field(u32, dev)
+			     __field(enum v3d_cpu_job_type, job_type)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->dev = dev->primary->index;
+			   __entry->job_type = job_type;
+			   ),
+
+	    TP_printk("dev=%u, job_type=%d",
+		      __entry->dev,
+		      __entry->job_type)
+);
+
 TRACE_EVENT(v3d_cache_clean_begin,
 	    TP_PROTO(struct drm_device *dev),
 	    TP_ARGS(dev),
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 10/17] drm/v3d: Detach the CSD job BO setup
  2023-11-24  0:46 [PATCH v2 00/17] drm/v3d: Introduce CPU jobs Maíra Canal
                   ` (8 preceding siblings ...)
  2023-11-24  0:47 ` [PATCH v2 09/17] drm/v3d: Create tracepoints to track the CPU job Maíra Canal
@ 2023-11-24  0:47 ` Maíra Canal
  2023-11-24  0:47 ` [PATCH v2 11/17] drm/v3d: Enable BO mapping Maíra Canal
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Maíra Canal @ 2023-11-24  0:47 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann
  Cc: Maíra Canal, kernel-dev, Boris Brezillon, dri-devel, Faith Ekstrand

From: Melissa Wen <mwen@igalia.com>

Detach CSD job setup from CSD submission ioctl to reuse it in CPU
submission ioctl for indirect CSD job.

Signed-off-by: Melissa Wen <mwen@igalia.com>
Co-developed-by: Maíra Canal <mcanal@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
 drivers/gpu/drm/v3d/v3d_submit.c | 52 +++++++++++++++++++++-----------
 1 file changed, 34 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 337a15b4b594..7900bc573b96 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -269,6 +269,37 @@ v3d_attach_fences_and_unlock_reservation(struct drm_file *file_priv,
 	}
 }
 
+static int
+v3d_setup_csd_jobs_and_bos(struct drm_file *file_priv,
+			   struct v3d_dev *v3d,
+			   struct drm_v3d_submit_csd *args,
+			   struct v3d_csd_job **job,
+			   struct v3d_job **clean_job,
+			   struct v3d_submit_ext *se,
+			   struct ww_acquire_ctx *acquire_ctx)
+{
+	int ret;
+
+	ret = v3d_job_init(v3d, file_priv, (void *)job, sizeof(**job),
+			   v3d_job_free, args->in_sync, se, V3D_CSD);
+	if (ret)
+		return ret;
+
+	ret = v3d_job_init(v3d, file_priv, (void *)clean_job, sizeof(**clean_job),
+			   v3d_job_free, 0, NULL, V3D_CACHE_CLEAN);
+	if (ret)
+		return ret;
+
+	(*job)->args = *args;
+
+	ret = v3d_lookup_bos(&v3d->drm, file_priv, *clean_job,
+			     args->bo_handles, args->bo_handle_count);
+	if (ret)
+		return ret;
+
+	return v3d_lock_bo_reservations(*clean_job, acquire_ctx);
+}
+
 static void
 v3d_put_multisync_post_deps(struct v3d_submit_ext *se)
 {
@@ -697,24 +728,9 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
 		}
 	}
 
-	ret = v3d_job_init(v3d, file_priv, (void *)&job, sizeof(*job),
-			   v3d_job_free, args->in_sync, &se, V3D_CSD);
-	if (ret)
-		goto fail;
-
-	ret = v3d_job_init(v3d, file_priv, (void *)&clean_job, sizeof(*clean_job),
-			   v3d_job_free, 0, NULL, V3D_CACHE_CLEAN);
-	if (ret)
-		goto fail;
-
-	job->args = *args;
-
-	ret = v3d_lookup_bos(dev, file_priv, clean_job,
-			     args->bo_handles, args->bo_handle_count);
-	if (ret)
-		goto fail;
-
-	ret = v3d_lock_bo_reservations(clean_job, &acquire_ctx);
+	ret = v3d_setup_csd_jobs_and_bos(file_priv, v3d, args,
+					 &job, &clean_job, &se,
+					 &acquire_ctx);
 	if (ret)
 		goto fail;
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 11/17] drm/v3d: Enable BO mapping
  2023-11-24  0:46 [PATCH v2 00/17] drm/v3d: Introduce CPU jobs Maíra Canal
                   ` (9 preceding siblings ...)
  2023-11-24  0:47 ` [PATCH v2 10/17] drm/v3d: Detach the CSD job BO setup Maíra Canal
@ 2023-11-24  0:47 ` Maíra Canal
  2023-11-24  0:47 ` [PATCH v2 12/17] drm/v3d: Create a CPU job extension for a indirect CSD job Maíra Canal
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Maíra Canal @ 2023-11-24  0:47 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann
  Cc: Maíra Canal, kernel-dev, Boris Brezillon, dri-devel, Faith Ekstrand

For the indirect CSD CPU job, we will need to access the internal
contents of the BO with the dispatch parameters. Therefore, create
methods to allow the mapping and unmapping of the BO.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
 drivers/gpu/drm/v3d/v3d_bo.c  | 18 ++++++++++++++++++
 drivers/gpu/drm/v3d/v3d_drv.h |  4 ++++
 2 files changed, 22 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_bo.c b/drivers/gpu/drm/v3d/v3d_bo.c
index 357a0da7e16a..1bdfac8beafd 100644
--- a/drivers/gpu/drm/v3d/v3d_bo.c
+++ b/drivers/gpu/drm/v3d/v3d_bo.c
@@ -33,6 +33,9 @@ void v3d_free_object(struct drm_gem_object *obj)
 	struct v3d_dev *v3d = to_v3d_dev(obj->dev);
 	struct v3d_bo *bo = to_v3d_bo(obj);
 
+	if (bo->vaddr)
+		v3d_put_bo_vaddr(bo);
+
 	v3d_mmu_remove_ptes(bo);
 
 	mutex_lock(&v3d->bo_lock);
@@ -134,6 +137,7 @@ struct v3d_bo *v3d_bo_create(struct drm_device *dev, struct drm_file *file_priv,
 	if (IS_ERR(shmem_obj))
 		return ERR_CAST(shmem_obj);
 	bo = to_v3d_bo(&shmem_obj->base);
+	bo->vaddr = NULL;
 
 	ret = v3d_bo_create_finish(&shmem_obj->base);
 	if (ret)
@@ -167,6 +171,20 @@ v3d_prime_import_sg_table(struct drm_device *dev,
 	return obj;
 }
 
+void v3d_get_bo_vaddr(struct v3d_bo *bo)
+{
+	struct drm_gem_shmem_object *obj = &bo->base;
+
+	bo->vaddr = vmap(obj->pages, obj->base.size >> PAGE_SHIFT, VM_MAP,
+			 pgprot_writecombine(PAGE_KERNEL));
+}
+
+void v3d_put_bo_vaddr(struct v3d_bo *bo)
+{
+	vunmap(bo->vaddr);
+	bo->vaddr = NULL;
+}
+
 int v3d_create_bo_ioctl(struct drm_device *dev, void *data,
 			struct drm_file *file_priv)
 {
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 010b146140ad..04e97989c442 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -202,6 +202,8 @@ struct v3d_bo {
 	 * v3d_render_job->unref_list
 	 */
 	struct list_head unref_head;
+
+	void *vaddr;
 };
 
 static inline struct v3d_bo *
@@ -389,6 +391,8 @@ struct drm_gem_object *v3d_create_object(struct drm_device *dev, size_t size);
 void v3d_free_object(struct drm_gem_object *gem_obj);
 struct v3d_bo *v3d_bo_create(struct drm_device *dev, struct drm_file *file_priv,
 			     size_t size);
+void v3d_get_bo_vaddr(struct v3d_bo *bo);
+void v3d_put_bo_vaddr(struct v3d_bo *bo);
 int v3d_create_bo_ioctl(struct drm_device *dev, void *data,
 			struct drm_file *file_priv);
 int v3d_mmap_bo_ioctl(struct drm_device *dev, void *data,
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 12/17] drm/v3d: Create a CPU job extension for a indirect CSD job
  2023-11-24  0:46 [PATCH v2 00/17] drm/v3d: Introduce CPU jobs Maíra Canal
                   ` (10 preceding siblings ...)
  2023-11-24  0:47 ` [PATCH v2 11/17] drm/v3d: Enable BO mapping Maíra Canal
@ 2023-11-24  0:47 ` Maíra Canal
  2023-11-24  0:47 ` [PATCH v2 13/17] drm/v3d: Create a CPU job extension for the timestamp query job Maíra Canal
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Maíra Canal @ 2023-11-24  0:47 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann
  Cc: Maíra Canal, kernel-dev, Boris Brezillon, dri-devel, Faith Ekstrand

A CPU job is a type of job that performs operations that requires CPU
intervention. An indirect CSD job is a job that, when executed in the
queue, will map the indirect buffer, read the dispatch parameters, and
submit a regular dispatch. Therefore, it is a job that needs CPU
intervention.

So, create a user extension for the CPU job that enables the creation
of an indirect CSD. This user extension will allow the creation of a CSD
job linked to a CPU job. The CPU job will wait for the indirect CSD job
dependencies and, once they are signaled, it will update the CSD job
parameters.

Co-developed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
 drivers/gpu/drm/v3d/v3d_drv.h    |  31 +++++++++-
 drivers/gpu/drm/v3d/v3d_sched.c  |  41 ++++++++++++-
 drivers/gpu/drm/v3d/v3d_submit.c | 100 ++++++++++++++++++++++++++++++-
 include/uapi/drm/v3d_drm.h       |  37 +++++++++++-
 4 files changed, 205 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 04e97989c442..f665f3f5b65b 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -316,12 +316,41 @@ struct v3d_csd_job {
 	struct drm_v3d_submit_csd args;
 };
 
-enum v3d_cpu_job_type {};
+enum v3d_cpu_job_type {
+	V3D_CPU_JOB_TYPE_INDIRECT_CSD = 1,
+};
+
+struct v3d_indirect_csd_info {
+	/* Indirect CSD */
+	struct v3d_csd_job *job;
+
+	/* Clean cache job associated to the Indirect CSD job */
+	struct v3d_job *clean_job;
+
+	/* Offset within the BO where the workgroup counts are stored */
+	u32 offset;
+
+	/* Workgroups size */
+	u32 wg_size;
+
+	/* Indices of the uniforms with the workgroup dispatch counts
+	 * in the uniform stream.
+	 */
+	u32 wg_uniform_offsets[3];
+
+	/* Indirect BO */
+	struct drm_gem_object *indirect;
+
+	/* Context of the Indirect CSD job */
+	struct ww_acquire_ctx acquire_ctx;
+};
 
 struct v3d_cpu_job {
 	struct v3d_job base;
 
 	enum v3d_cpu_job_type job_type;
+
+	struct v3d_indirect_csd_info indirect_csd;
 };
 
 struct v3d_submit_outsync {
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 30a88e557217..a8ac46f70cee 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -25,6 +25,8 @@
 #include "v3d_regs.h"
 #include "v3d_trace.h"
 
+#define V3D_CSD_CFG012_WG_COUNT_SHIFT 16
+
 static struct v3d_job *
 to_v3d_job(struct drm_sched_job *sched_job)
 {
@@ -268,6 +270,41 @@ v3d_csd_job_run(struct drm_sched_job *sched_job)
 	return fence;
 }
 
+static void
+v3d_rewrite_csd_job_wg_counts_from_indirect(struct v3d_cpu_job *job)
+{
+	struct v3d_indirect_csd_info *indirect_csd = &job->indirect_csd;
+	struct v3d_bo *bo = to_v3d_bo(job->base.bo[0]);
+	struct v3d_bo *indirect = to_v3d_bo(indirect_csd->indirect);
+	struct drm_v3d_submit_csd *args = &indirect_csd->job->args;
+	u32 *wg_counts;
+
+	v3d_get_bo_vaddr(bo);
+	v3d_get_bo_vaddr(indirect);
+
+	wg_counts = (uint32_t *) (bo->vaddr + indirect_csd->offset);
+
+	if (wg_counts[0] == 0 || wg_counts[1] == 0 || wg_counts[2] == 0)
+		return;
+
+	args->cfg[0] = wg_counts[0] << V3D_CSD_CFG012_WG_COUNT_SHIFT;
+	args->cfg[1] = wg_counts[1] << V3D_CSD_CFG012_WG_COUNT_SHIFT;
+	args->cfg[2] = wg_counts[2] << V3D_CSD_CFG012_WG_COUNT_SHIFT;
+	args->cfg[4] = DIV_ROUND_UP(indirect_csd->wg_size, 16) *
+		       (wg_counts[0] * wg_counts[1] * wg_counts[2]) - 1;
+
+	for (int i = 0; i < 3; i++) {
+		/* 0xffffffff indicates that the uniform rewrite is not needed */
+		if (indirect_csd->wg_uniform_offsets[i] != 0xffffffff) {
+			u32 uniform_idx = indirect_csd->wg_uniform_offsets[i];
+			((uint32_t *) indirect->vaddr)[uniform_idx] = wg_counts[i];
+		}
+	}
+
+	v3d_put_bo_vaddr(indirect);
+	v3d_put_bo_vaddr(bo);
+}
+
 static struct dma_fence *
 v3d_cpu_job_run(struct drm_sched_job *sched_job)
 {
@@ -276,7 +313,9 @@ v3d_cpu_job_run(struct drm_sched_job *sched_job)
 	struct v3d_file_priv *file = job->base.file->driver_priv;
 	u64 runtime;
 
-	void (*v3d_cpu_job_fn[])(struct v3d_cpu_job *job) = { };
+	void (*v3d_cpu_job_fn[])(struct v3d_cpu_job *job) = {
+		[V3D_CPU_JOB_TYPE_INDIRECT_CSD] = v3d_rewrite_csd_job_wg_counts_from_indirect,
+	};
 
 	v3d->cpu_job = job;
 
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 7900bc573b96..b6707ef42706 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -396,6 +396,48 @@ v3d_get_multisync_submit_deps(struct drm_file *file_priv,
 	return 0;
 }
 
+/* Get data for the indirect CSD job submission. */
+static int
+v3d_get_cpu_indirect_csd_params(struct drm_file *file_priv,
+				struct drm_v3d_extension __user *ext,
+				struct v3d_cpu_job *job)
+{
+	struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
+	struct v3d_dev *v3d = v3d_priv->v3d;
+	struct drm_v3d_indirect_csd indirect_csd;
+	struct v3d_indirect_csd_info *info = &job->indirect_csd;
+
+	if (!job) {
+		DRM_DEBUG("CPU job extension was attached to a GPU job.\n");
+		return -EINVAL;
+	}
+
+	if (job->job_type) {
+		DRM_DEBUG("Two CPU job extensions were added to the same CPU job.\n");
+		return -EINVAL;
+	}
+
+	if (copy_from_user(&indirect_csd, ext, sizeof(indirect_csd)))
+		return -EFAULT;
+
+	if (!v3d_has_csd(v3d)) {
+		DRM_DEBUG("Attempting CSD submit on non-CSD hardware.\n");
+		return -EINVAL;
+	}
+
+	job->job_type = V3D_CPU_JOB_TYPE_INDIRECT_CSD;
+	info->offset = indirect_csd.offset;
+	info->wg_size = indirect_csd.wg_size;
+	memcpy(&info->wg_uniform_offsets, &indirect_csd.wg_uniform_offsets,
+	       sizeof(indirect_csd.wg_uniform_offsets));
+
+	info->indirect = drm_gem_object_lookup(file_priv, indirect_csd.indirect);
+
+	return v3d_setup_csd_jobs_and_bos(file_priv, v3d, &indirect_csd.submit,
+					  &info->job, &info->clean_job,
+					  NULL, &info->acquire_ctx);
+}
+
 /* Whenever userspace sets ioctl extensions, v3d_get_extensions parses data
  * according to the extension id (name).
  */
@@ -421,6 +463,9 @@ v3d_get_extensions(struct drm_file *file_priv,
 		case DRM_V3D_EXT_ID_MULTI_SYNC:
 			ret = v3d_get_multisync_submit_deps(file_priv, user_ext, se);
 			break;
+		case DRM_V3D_EXT_ID_CPU_INDIRECT_CSD:
+			ret = v3d_get_cpu_indirect_csd_params(file_priv, user_ext, job);
+			break;
 		default:
 			DRM_DEBUG_DRIVER("Unknown extension id: %d\n", ext.id);
 			return -EINVAL;
@@ -795,7 +840,10 @@ v3d_submit_cpu_ioctl(struct drm_device *dev, void *data,
 	struct v3d_dev *v3d = to_v3d_dev(dev);
 	struct drm_v3d_submit_cpu *args = data;
 	struct v3d_submit_ext se = {0};
+	struct v3d_submit_ext *out_se = NULL;
 	struct v3d_cpu_job *cpu_job = NULL;
+	struct v3d_csd_job *csd_job = NULL;
+	struct v3d_job *clean_job = NULL;
 	struct ww_acquire_ctx acquire_ctx;
 	int ret;
 
@@ -829,6 +877,9 @@ v3d_submit_cpu_ioctl(struct drm_device *dev, void *data,
 	if (ret)
 		goto fail;
 
+	clean_job = cpu_job->indirect_csd.clean_job;
+	csd_job = cpu_job->indirect_csd.job;
+
 	if (args->bo_handle_count) {
 		ret = v3d_lookup_bos(dev, file_priv, &cpu_job->base,
 				     args->bo_handles, args->bo_handle_count);
@@ -842,19 +893,66 @@ v3d_submit_cpu_ioctl(struct drm_device *dev, void *data,
 
 	mutex_lock(&v3d->sched_lock);
 	v3d_push_job(&cpu_job->base);
+
+	switch (cpu_job->job_type) {
+	case V3D_CPU_JOB_TYPE_INDIRECT_CSD:
+		ret = drm_sched_job_add_dependency(&csd_job->base.base,
+						   dma_fence_get(cpu_job->base.done_fence));
+		if (ret)
+			goto fail_unreserve;
+
+		v3d_push_job(&csd_job->base);
+
+		ret = drm_sched_job_add_dependency(&clean_job->base,
+						   dma_fence_get(csd_job->base.done_fence));
+		if (ret)
+			goto fail_unreserve;
+
+		v3d_push_job(clean_job);
+
+		break;
+	default:
+		break;
+	}
 	mutex_unlock(&v3d->sched_lock);
 
+	out_se = (cpu_job->job_type == V3D_CPU_JOB_TYPE_INDIRECT_CSD) ? NULL : &se;
+
 	v3d_attach_fences_and_unlock_reservation(file_priv,
 						 &cpu_job->base,
 						 &acquire_ctx, 0,
-						 NULL, cpu_job->base.done_fence);
+						 out_se, cpu_job->base.done_fence);
+
+	switch (cpu_job->job_type) {
+	case V3D_CPU_JOB_TYPE_INDIRECT_CSD:
+		v3d_attach_fences_and_unlock_reservation(file_priv,
+							 clean_job,
+							 &cpu_job->indirect_csd.acquire_ctx,
+							 0, &se, clean_job->done_fence);
+		break;
+	default:
+		break;
+	}
 
 	v3d_job_put((void *)cpu_job);
+	v3d_job_put((void *)csd_job);
+	v3d_job_put(clean_job);
 
 	return 0;
 
+fail_unreserve:
+	mutex_unlock(&v3d->sched_lock);
+
+	drm_gem_unlock_reservations(cpu_job->base.bo, cpu_job->base.bo_count,
+				    &acquire_ctx);
+
+	drm_gem_unlock_reservations(clean_job->bo, clean_job->bo_count,
+				    &cpu_job->indirect_csd.acquire_ctx);
+
 fail:
 	v3d_job_cleanup((void *)cpu_job);
+	v3d_job_cleanup((void *)csd_job);
+	v3d_job_cleanup(clean_job);
 	v3d_put_multisync_post_deps(&se);
 
 	return ret;
diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
index 00abef9d0db7..059b84984fb0 100644
--- a/include/uapi/drm/v3d_drm.h
+++ b/include/uapi/drm/v3d_drm.h
@@ -71,7 +71,8 @@ extern "C" {
 struct drm_v3d_extension {
 	__u64 next;
 	__u32 id;
-#define DRM_V3D_EXT_ID_MULTI_SYNC		0x01
+#define DRM_V3D_EXT_ID_MULTI_SYNC			0x01
+#define DRM_V3D_EXT_ID_CPU_INDIRECT_CSD		0x02
 	__u32 flags; /* mbz */
 };
 
@@ -365,6 +366,40 @@ struct drm_v3d_submit_csd {
 	__u32 pad;
 };
 
+/**
+ * struct drm_v3d_indirect_csd - ioctl extension for the CPU job to create an
+ * indirect CSD
+ *
+ * When an extension of DRM_V3D_EXT_ID_CPU_INDIRECT_CSD id is defined, it
+ * points to this extension to define a indirect CSD submission. It creates a
+ * CPU job linked to a CSD job. The CPU job waits for the indirect CSD
+ * dependencies and, once they are signaled, it updates the CSD job config
+ * before allowing the CSD job execution.
+ */
+struct drm_v3d_indirect_csd {
+	struct drm_v3d_extension base;
+
+	/* Indirect CSD */
+	struct drm_v3d_submit_csd submit;
+
+	/* Handle of the indirect BO, that should be also attached to the
+	 * indirect CSD.
+	 */
+	__u32 indirect;
+
+	/* Offset within the BO where the workgroup counts are stored */
+	__u32 offset;
+
+	/* Workgroups size */
+	__u32 wg_size;
+
+	/* Indices of the uniforms with the workgroup dispatch counts
+	 * in the uniform stream. If the uniform rewrite is not needed,
+	 * the offset must be 0xffffffff.
+	 */
+	__u32 wg_uniform_offsets[3];
+};
+
 struct drm_v3d_submit_cpu {
 	/* Pointer to a u32 array of the BOs that are referenced by the job. */
 	__u64 bo_handles;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 13/17] drm/v3d: Create a CPU job extension for the timestamp query job
  2023-11-24  0:46 [PATCH v2 00/17] drm/v3d: Introduce CPU jobs Maíra Canal
                   ` (11 preceding siblings ...)
  2023-11-24  0:47 ` [PATCH v2 12/17] drm/v3d: Create a CPU job extension for a indirect CSD job Maíra Canal
@ 2023-11-24  0:47 ` Maíra Canal
  2023-11-27  9:16   ` Iago Toral
  2023-11-24  0:47 ` [PATCH v2 14/17] drm/v3d: Create a CPU job extension for the reset timestamp job Maíra Canal
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 25+ messages in thread
From: Maíra Canal @ 2023-11-24  0:47 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann
  Cc: Maíra Canal, kernel-dev, Boris Brezillon, dri-devel, Faith Ekstrand

A CPU job is a type of job that performs operations that requires CPU
intervention. A timestamp query job is a job that calculates the
query timestamp and updates the query availability by signaling a
syncobj. As V3D doesn't provide any mechanism to obtain a timestamp
from the GPU, it is a job that needs CPU intervention.

So, create a user extension for the CPU job that enables the creation
of a timestamp query job. This user extension will allow the creation of
a CPU job that performs the timestamp query calculation and updates the
timestamp BO with the proper value.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
 drivers/gpu/drm/v3d/v3d_drv.h    | 17 +++++++++
 drivers/gpu/drm/v3d/v3d_sched.c  | 40 ++++++++++++++++++++-
 drivers/gpu/drm/v3d/v3d_submit.c | 62 ++++++++++++++++++++++++++++++++
 include/uapi/drm/v3d_drm.h       | 27 ++++++++++++++
 4 files changed, 145 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index f665f3f5b65b..524e4e952bae 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -318,6 +318,15 @@ struct v3d_csd_job {
 
 enum v3d_cpu_job_type {
 	V3D_CPU_JOB_TYPE_INDIRECT_CSD = 1,
+	V3D_CPU_JOB_TYPE_TIMESTAMP_QUERY,
+};
+
+struct v3d_timestamp_query {
+	/* Offset of this query in the timestamp BO for its value. */
+	u32 offset;
+
+	/* Syncobj that indicates the timestamp availability */
+	struct drm_syncobj *syncobj;
 };
 
 struct v3d_indirect_csd_info {
@@ -345,12 +354,20 @@ struct v3d_indirect_csd_info {
 	struct ww_acquire_ctx acquire_ctx;
 };
 
+struct v3d_timestamp_query_info {
+	struct v3d_timestamp_query *queries;
+
+	u32 count;
+};
+
 struct v3d_cpu_job {
 	struct v3d_job base;
 
 	enum v3d_cpu_job_type job_type;
 
 	struct v3d_indirect_csd_info indirect_csd;
+
+	struct v3d_timestamp_query_info timestamp_query;
 };
 
 struct v3d_submit_outsync {
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index a8ac46f70cee..828c4fc14dcd 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -21,6 +21,8 @@
 #include <linux/sched/clock.h>
 #include <linux/kthread.h>
 
+#include <drm/drm_syncobj.h>
+
 #include "v3d_drv.h"
 #include "v3d_regs.h"
 #include "v3d_trace.h"
@@ -71,6 +73,21 @@ v3d_sched_job_free(struct drm_sched_job *sched_job)
 	v3d_job_cleanup(job);
 }
 
+static void
+v3d_cpu_job_free(struct drm_sched_job *sched_job)
+{
+	struct v3d_cpu_job *job = to_cpu_job(sched_job);
+	struct v3d_timestamp_query_info *timestamp_query = &job->timestamp_query;
+
+	if (timestamp_query->queries) {
+		for (int i = 0; i < timestamp_query->count; i++)
+			drm_syncobj_put(timestamp_query->queries[i].syncobj);
+		kvfree(timestamp_query->queries);
+	}
+
+	v3d_job_cleanup(&job->base);
+}
+
 static void
 v3d_switch_perfmon(struct v3d_dev *v3d, struct v3d_job *job)
 {
@@ -305,6 +322,26 @@ v3d_rewrite_csd_job_wg_counts_from_indirect(struct v3d_cpu_job *job)
 	v3d_put_bo_vaddr(bo);
 }
 
+static void
+v3d_timestamp_query(struct v3d_cpu_job *job)
+{
+	struct v3d_timestamp_query_info *timestamp_query = &job->timestamp_query;
+	struct v3d_bo *bo = to_v3d_bo(job->base.bo[0]);
+	u8 *value_addr;
+
+	v3d_get_bo_vaddr(bo);
+
+	for (int i = 0; i < timestamp_query->count; i++) {
+		value_addr = ((u8 *) bo->vaddr) + timestamp_query->queries[i].offset;
+		*((u64 *) value_addr) = i == 0 ? ktime_get_ns() : 0ull;
+
+		drm_syncobj_replace_fence(timestamp_query->queries[i].syncobj,
+					  job->base.done_fence);
+	}
+
+	v3d_put_bo_vaddr(bo);
+}
+
 static struct dma_fence *
 v3d_cpu_job_run(struct drm_sched_job *sched_job)
 {
@@ -315,6 +352,7 @@ v3d_cpu_job_run(struct drm_sched_job *sched_job)
 
 	void (*v3d_cpu_job_fn[])(struct v3d_cpu_job *job) = {
 		[V3D_CPU_JOB_TYPE_INDIRECT_CSD] = v3d_rewrite_csd_job_wg_counts_from_indirect,
+		[V3D_CPU_JOB_TYPE_TIMESTAMP_QUERY] = v3d_timestamp_query,
 	};
 
 	v3d->cpu_job = job;
@@ -504,7 +542,7 @@ static const struct drm_sched_backend_ops v3d_cache_clean_sched_ops = {
 static const struct drm_sched_backend_ops v3d_cpu_sched_ops = {
 	.run_job = v3d_cpu_job_run,
 	.timedout_job = v3d_generic_job_timedout,
-	.free_job = v3d_sched_job_free
+	.free_job = v3d_cpu_job_free
 };
 
 int
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index b6707ef42706..2f03c8bca593 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -438,6 +438,64 @@ v3d_get_cpu_indirect_csd_params(struct drm_file *file_priv,
 					  NULL, &info->acquire_ctx);
 }
 
+/* Get data for the query timestamp job submission. */
+static int
+v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv,
+				   struct drm_v3d_extension __user *ext,
+				   struct v3d_cpu_job *job)
+{
+	u32 __user *offsets, *syncs;
+	struct drm_v3d_timestamp_query timestamp;
+
+	if (!job) {
+		DRM_DEBUG("CPU job extension was attached to a GPU job.\n");
+		return -EINVAL;
+	}
+
+	if (job->job_type) {
+		DRM_DEBUG("Two CPU job extensions were added to the same CPU job.\n");
+		return -EINVAL;
+	}
+
+	if (copy_from_user(&timestamp, ext, sizeof(timestamp)))
+		return -EFAULT;
+
+	if (timestamp.pad)
+		return -EINVAL;
+
+	job->job_type = V3D_CPU_JOB_TYPE_TIMESTAMP_QUERY;
+
+	job->timestamp_query.queries = kvmalloc_array(timestamp.count,
+						      sizeof(struct v3d_timestamp_query),
+						      GFP_KERNEL);
+	if (!job->timestamp_query.queries)
+		return -ENOMEM;
+
+	offsets = u64_to_user_ptr(timestamp.offsets);
+	syncs = u64_to_user_ptr(timestamp.syncs);
+
+	for (int i = 0; i < timestamp.count; i++) {
+		u32 offset, sync;
+
+		if (copy_from_user(&offset, offsets++, sizeof(offset))) {
+			kvfree(job->timestamp_query.queries);
+			return -EFAULT;
+		}
+
+		job->timestamp_query.queries[i].offset = offset;
+
+		if (copy_from_user(&sync, syncs++, sizeof(sync))) {
+			kvfree(job->timestamp_query.queries);
+			return -EFAULT;
+		}
+
+		job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync);
+	}
+	job->timestamp_query.count = timestamp.count;
+
+	return 0;
+}
+
 /* Whenever userspace sets ioctl extensions, v3d_get_extensions parses data
  * according to the extension id (name).
  */
@@ -466,6 +524,9 @@ v3d_get_extensions(struct drm_file *file_priv,
 		case DRM_V3D_EXT_ID_CPU_INDIRECT_CSD:
 			ret = v3d_get_cpu_indirect_csd_params(file_priv, user_ext, job);
 			break;
+		case DRM_V3D_EXT_ID_CPU_TIMESTAMP_QUERY:
+			ret = v3d_get_cpu_timestamp_query_params(file_priv, user_ext, job);
+			break;
 		default:
 			DRM_DEBUG_DRIVER("Unknown extension id: %d\n", ext.id);
 			return -EINVAL;
@@ -954,6 +1015,7 @@ v3d_submit_cpu_ioctl(struct drm_device *dev, void *data,
 	v3d_job_cleanup((void *)csd_job);
 	v3d_job_cleanup(clean_job);
 	v3d_put_multisync_post_deps(&se);
+	kvfree(cpu_job->timestamp_query.queries);
 
 	return ret;
 }
diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
index 059b84984fb0..65d5de076366 100644
--- a/include/uapi/drm/v3d_drm.h
+++ b/include/uapi/drm/v3d_drm.h
@@ -73,6 +73,7 @@ struct drm_v3d_extension {
 	__u32 id;
 #define DRM_V3D_EXT_ID_MULTI_SYNC			0x01
 #define DRM_V3D_EXT_ID_CPU_INDIRECT_CSD		0x02
+#define DRM_V3D_EXT_ID_CPU_TIMESTAMP_QUERY		0x03
 	__u32 flags; /* mbz */
 };
 
@@ -400,6 +401,32 @@ struct drm_v3d_indirect_csd {
 	__u32 wg_uniform_offsets[3];
 };
 
+/**
+ * struct drm_v3d_timestamp_query - ioctl extension for the CPU job to calculate
+ * a timestamp query
+ *
+ * When an extension DRM_V3D_EXT_ID_TIMESTAMP_QUERY is defined, it points to
+ * this extension to define a timestamp query submission. This CPU job will
+ * calculate the timestamp query and update the query value within the
+ * timestamp BO. Moreover, it will signal the timestamp syncobj to indicate
+ * query availability.
+ */
+struct drm_v3d_timestamp_query {
+	struct drm_v3d_extension base;
+
+	/* Array of queries' offsets within the timestamp BO for their value */
+	__u64 offsets;
+
+	/* Array of timestamp's syncobjs to indicate its availability */
+	__u64 syncs;
+
+	/* Number of queries */
+	__u32 count;
+
+	/* mbz */
+	__u32 pad;
+};
+
 struct drm_v3d_submit_cpu {
 	/* Pointer to a u32 array of the BOs that are referenced by the job. */
 	__u64 bo_handles;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 14/17] drm/v3d: Create a CPU job extension for the reset timestamp job
  2023-11-24  0:46 [PATCH v2 00/17] drm/v3d: Introduce CPU jobs Maíra Canal
                   ` (12 preceding siblings ...)
  2023-11-24  0:47 ` [PATCH v2 13/17] drm/v3d: Create a CPU job extension for the timestamp query job Maíra Canal
@ 2023-11-24  0:47 ` Maíra Canal
  2023-11-24  0:47 ` [PATCH v2 15/17] drm/v3d: Create a CPU job extension to copy timestamp query to a buffer Maíra Canal
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Maíra Canal @ 2023-11-24  0:47 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann
  Cc: Maíra Canal, kernel-dev, Boris Brezillon, dri-devel, Faith Ekstrand

A CPU job is a type of job that performs operations that requires CPU
intervention. A reset timestamp job is a job that resets the timestamp
queries based on the value offset of the first query. As V3D doesn't
provide any mechanism to obtain a timestamp from the GPU, it is a job
that needs CPU intervention.

So, create a user extension for the CPU job that enables the creation
of a reset timestamp job. This user extension will allow the creation of
a CPU job that resets the timestamp value in the timestamp BO and resets
the availability syncobj.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
 drivers/gpu/drm/v3d/v3d_drv.h    |  1 +
 drivers/gpu/drm/v3d/v3d_sched.c  | 21 +++++++++++++
 drivers/gpu/drm/v3d/v3d_submit.c | 51 ++++++++++++++++++++++++++++++++
 include/uapi/drm/v3d_drm.h       | 24 +++++++++++++++
 4 files changed, 97 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 524e4e952bae..a1c8ff06f16b 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -319,6 +319,7 @@ struct v3d_csd_job {
 enum v3d_cpu_job_type {
 	V3D_CPU_JOB_TYPE_INDIRECT_CSD = 1,
 	V3D_CPU_JOB_TYPE_TIMESTAMP_QUERY,
+	V3D_CPU_JOB_TYPE_RESET_TIMESTAMP_QUERY,
 };
 
 struct v3d_timestamp_query {
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 828c4fc14dcd..57c9fc887c46 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -342,6 +342,26 @@ v3d_timestamp_query(struct v3d_cpu_job *job)
 	v3d_put_bo_vaddr(bo);
 }
 
+static void
+v3d_reset_timestamp_queries(struct v3d_cpu_job *job)
+{
+	struct v3d_timestamp_query_info *timestamp_query = &job->timestamp_query;
+	struct v3d_timestamp_query *queries = timestamp_query->queries;
+	struct v3d_bo *bo = to_v3d_bo(job->base.bo[0]);
+	u8 *value_addr;
+
+	v3d_get_bo_vaddr(bo);
+
+	for (int i = 0; i < timestamp_query->count; i++) {
+		value_addr = ((u8 *) bo->vaddr) + queries[i].offset;
+		*((u64 *) value_addr) = 0;
+
+		drm_syncobj_replace_fence(queries[i].syncobj, NULL);
+	}
+
+	v3d_put_bo_vaddr(bo);
+}
+
 static struct dma_fence *
 v3d_cpu_job_run(struct drm_sched_job *sched_job)
 {
@@ -353,6 +373,7 @@ v3d_cpu_job_run(struct drm_sched_job *sched_job)
 	void (*v3d_cpu_job_fn[])(struct v3d_cpu_job *job) = {
 		[V3D_CPU_JOB_TYPE_INDIRECT_CSD] = v3d_rewrite_csd_job_wg_counts_from_indirect,
 		[V3D_CPU_JOB_TYPE_TIMESTAMP_QUERY] = v3d_timestamp_query,
+		[V3D_CPU_JOB_TYPE_RESET_TIMESTAMP_QUERY] = v3d_reset_timestamp_queries,
 	};
 
 	v3d->cpu_job = job;
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 2f03c8bca593..b04eb6448260 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -496,6 +496,54 @@ v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv,
 	return 0;
 }
 
+static int
+v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv,
+				   struct drm_v3d_extension __user *ext,
+				   struct v3d_cpu_job *job)
+{
+	u32 __user *syncs;
+	struct drm_v3d_reset_timestamp_query reset;
+
+	if (!job) {
+		DRM_DEBUG("CPU job extension was attached to a GPU job.\n");
+		return -EINVAL;
+	}
+
+	if (job->job_type) {
+		DRM_DEBUG("Two CPU job extensions were added to the same CPU job.\n");
+		return -EINVAL;
+	}
+
+	if (copy_from_user(&reset, ext, sizeof(reset)))
+		return -EFAULT;
+
+	job->job_type = V3D_CPU_JOB_TYPE_RESET_TIMESTAMP_QUERY;
+
+	job->timestamp_query.queries = kvmalloc_array(reset.count,
+						      sizeof(struct v3d_timestamp_query),
+						      GFP_KERNEL);
+	if (!job->timestamp_query.queries)
+		return -ENOMEM;
+
+	syncs = u64_to_user_ptr(reset.syncs);
+
+	for (int i = 0; i < reset.count; i++) {
+		u32 sync;
+
+		job->timestamp_query.queries[i].offset = reset.offset + 8 * i;
+
+		if (copy_from_user(&sync, syncs++, sizeof(sync))) {
+			kvfree(job->timestamp_query.queries);
+			return -EFAULT;
+		}
+
+		job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync);
+	}
+	job->timestamp_query.count = reset.count;
+
+	return 0;
+}
+
 /* Whenever userspace sets ioctl extensions, v3d_get_extensions parses data
  * according to the extension id (name).
  */
@@ -527,6 +575,9 @@ v3d_get_extensions(struct drm_file *file_priv,
 		case DRM_V3D_EXT_ID_CPU_TIMESTAMP_QUERY:
 			ret = v3d_get_cpu_timestamp_query_params(file_priv, user_ext, job);
 			break;
+		case DRM_V3D_EXT_ID_CPU_RESET_TIMESTAMP_QUERY:
+			ret = v3d_get_cpu_reset_timestamp_params(file_priv, user_ext, job);
+			break;
 		default:
 			DRM_DEBUG_DRIVER("Unknown extension id: %d\n", ext.id);
 			return -EINVAL;
diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
index 65d5de076366..e6f102c62a97 100644
--- a/include/uapi/drm/v3d_drm.h
+++ b/include/uapi/drm/v3d_drm.h
@@ -74,6 +74,7 @@ struct drm_v3d_extension {
 #define DRM_V3D_EXT_ID_MULTI_SYNC			0x01
 #define DRM_V3D_EXT_ID_CPU_INDIRECT_CSD		0x02
 #define DRM_V3D_EXT_ID_CPU_TIMESTAMP_QUERY		0x03
+#define DRM_V3D_EXT_ID_CPU_RESET_TIMESTAMP_QUERY	0x04
 	__u32 flags; /* mbz */
 };
 
@@ -427,6 +428,29 @@ struct drm_v3d_timestamp_query {
 	__u32 pad;
 };
 
+/**
+ * struct drm_v3d_reset_timestamp_query - ioctl extension for the CPU job to
+ * reset timestamp queries
+ *
+ * When an extension DRM_V3D_EXT_ID_CPU_RESET_TIMESTAMP_QUERY is defined, it
+ * points to this extension to define a reset timestamp submission. This CPU
+ * job will reset the timestamp queries based on value offset of the first
+ * query. Moreover, it will reset the timestamp syncobj to reset query
+ * availability.
+ */
+struct drm_v3d_reset_timestamp_query {
+	struct drm_v3d_extension base;
+
+	/* Array of timestamp's syncobjs to indicate its availability */
+	__u64 syncs;
+
+	/* Offset of the first query within the timestamp BO for its value */
+	__u32 offset;
+
+	/* Number of queries */
+	__u32 count;
+};
+
 struct drm_v3d_submit_cpu {
 	/* Pointer to a u32 array of the BOs that are referenced by the job. */
 	__u64 bo_handles;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 15/17] drm/v3d: Create a CPU job extension to copy timestamp query to a buffer
  2023-11-24  0:46 [PATCH v2 00/17] drm/v3d: Introduce CPU jobs Maíra Canal
                   ` (13 preceding siblings ...)
  2023-11-24  0:47 ` [PATCH v2 14/17] drm/v3d: Create a CPU job extension for the reset timestamp job Maíra Canal
@ 2023-11-24  0:47 ` Maíra Canal
  2023-11-24  0:47 ` [PATCH v2 16/17] drm/v3d: Create a CPU job extension for the reset performance query job Maíra Canal
  2023-11-24  0:47 ` [PATCH v2 17/17] drm/v3d: Create a CPU job extension for the copy " Maíra Canal
  16 siblings, 0 replies; 25+ messages in thread
From: Maíra Canal @ 2023-11-24  0:47 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann
  Cc: Maíra Canal, kernel-dev, Boris Brezillon, dri-devel, Faith Ekstrand

A CPU job is a type of job that performs operations that requires CPU
intervention. A copy timestamp query job is a job that copy the complete
or partial result of a query to a buffer. As V3D doesn't provide any
mechanism to obtain a timestamp from the GPU, it is a job that needs
CPU intervention.

So, create a user extension for the CPU job that enables the creation
of a copy timestamp query job. This user extension will allow the creation
of a CPU job that copy the results of a timestamp query to a BO with the
possibility to indicate the timestamp availability with a availability bit.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
 drivers/gpu/drm/v3d/v3d_drv.h    | 20 ++++++++++
 drivers/gpu/drm/v3d/v3d_sched.c  | 54 +++++++++++++++++++++++++
 drivers/gpu/drm/v3d/v3d_submit.c | 68 ++++++++++++++++++++++++++++++++
 include/uapi/drm/v3d_drm.h       | 41 +++++++++++++++++++
 4 files changed, 183 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index a1c8ff06f16b..4b7de76b504a 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -320,6 +320,7 @@ enum v3d_cpu_job_type {
 	V3D_CPU_JOB_TYPE_INDIRECT_CSD = 1,
 	V3D_CPU_JOB_TYPE_TIMESTAMP_QUERY,
 	V3D_CPU_JOB_TYPE_RESET_TIMESTAMP_QUERY,
+	V3D_CPU_JOB_TYPE_COPY_TIMESTAMP_QUERY,
 };
 
 struct v3d_timestamp_query {
@@ -361,6 +362,23 @@ struct v3d_timestamp_query_info {
 	u32 count;
 };
 
+struct v3d_copy_query_results_info {
+	/* Define if should write to buffer using 64 or 32 bits */
+	bool do_64bit;
+
+	/* Define if it can write to buffer even if the query is not available */
+	bool do_partial;
+
+	/* Define if it should write availability bit to buffer */
+	bool availability_bit;
+
+	/* Offset of the copy buffer in the BO */
+	u32 offset;
+
+	/* Stride of the copy buffer in the BO */
+	u32 stride;
+};
+
 struct v3d_cpu_job {
 	struct v3d_job base;
 
@@ -369,6 +387,8 @@ struct v3d_cpu_job {
 	struct v3d_indirect_csd_info indirect_csd;
 
 	struct v3d_timestamp_query_info timestamp_query;
+
+	struct v3d_copy_query_results_info copy;
 };
 
 struct v3d_submit_outsync {
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 57c9fc887c46..5068b499b7ab 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -362,6 +362,59 @@ v3d_reset_timestamp_queries(struct v3d_cpu_job *job)
 	v3d_put_bo_vaddr(bo);
 }
 
+static void
+write_to_buffer(void *dst, u32 idx, bool do_64bit, u64 value)
+{
+	if (do_64bit) {
+		u64 *dst64 = (u64 *) dst;
+		dst64[idx] = value;
+	} else {
+		u32 *dst32 = (u32 *) dst;
+		dst32[idx] = (u32) value;
+	}
+}
+
+static void
+v3d_copy_query_results(struct v3d_cpu_job *job)
+{
+	struct v3d_timestamp_query_info *timestamp_query = &job->timestamp_query;
+	struct v3d_timestamp_query *queries = timestamp_query->queries;
+	struct v3d_bo *bo = to_v3d_bo(job->base.bo[0]);
+	struct v3d_bo *timestamp = to_v3d_bo(job->base.bo[1]);
+	struct v3d_copy_query_results_info *copy = &job->copy;
+	struct dma_fence *fence;
+	u8 *query_addr;
+	bool available, write_result;
+	u8 *data;
+	int i;
+
+	v3d_get_bo_vaddr(bo);
+	v3d_get_bo_vaddr(timestamp);
+
+	data = ((u8 *) bo->vaddr) + copy->offset;
+
+	for (i = 0; i < timestamp_query->count; i++) {
+		fence = drm_syncobj_fence_get(queries[i].syncobj);
+		available = fence ? dma_fence_is_signaled(fence) : false;
+
+		write_result = available || copy->do_partial;
+		if (write_result) {
+			query_addr = ((u8 *) timestamp->vaddr) + queries[i].offset;
+			write_to_buffer(data, 0, copy->do_64bit, *((u64 *) query_addr));
+		}
+
+		if (copy->availability_bit)
+			write_to_buffer(data, 1, copy->do_64bit, available ? 1u : 0u);
+
+		data += copy->stride;
+
+		dma_fence_put(fence);
+	}
+
+	v3d_put_bo_vaddr(timestamp);
+	v3d_put_bo_vaddr(bo);
+}
+
 static struct dma_fence *
 v3d_cpu_job_run(struct drm_sched_job *sched_job)
 {
@@ -374,6 +427,7 @@ v3d_cpu_job_run(struct drm_sched_job *sched_job)
 		[V3D_CPU_JOB_TYPE_INDIRECT_CSD] = v3d_rewrite_csd_job_wg_counts_from_indirect,
 		[V3D_CPU_JOB_TYPE_TIMESTAMP_QUERY] = v3d_timestamp_query,
 		[V3D_CPU_JOB_TYPE_RESET_TIMESTAMP_QUERY] = v3d_reset_timestamp_queries,
+		[V3D_CPU_JOB_TYPE_COPY_TIMESTAMP_QUERY] = v3d_copy_query_results,
 	};
 
 	v3d->cpu_job = job;
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index b04eb6448260..7a84777f3bf4 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -544,6 +544,71 @@ v3d_get_cpu_reset_timestamp_params(struct drm_file *file_priv,
 	return 0;
 }
 
+/* Get data for the copy timestamp query results job submission. */
+static int
+v3d_get_cpu_copy_query_results_params(struct drm_file *file_priv,
+				      struct drm_v3d_extension __user *ext,
+				      struct v3d_cpu_job *job)
+{
+	u32 __user *offsets, *syncs;
+	struct drm_v3d_copy_timestamp_query copy;
+	int i;
+
+	if (!job) {
+		DRM_DEBUG("CPU job extension was attached to a GPU job.\n");
+		return -EINVAL;
+	}
+
+	if (job->job_type) {
+		DRM_DEBUG("Two CPU job extensions were added to the same CPU job.\n");
+		return -EINVAL;
+	}
+
+	if (copy_from_user(&copy, ext, sizeof(copy)))
+		return -EFAULT;
+
+	if (copy.pad)
+		return -EINVAL;
+
+	job->job_type = V3D_CPU_JOB_TYPE_COPY_TIMESTAMP_QUERY;
+
+	job->timestamp_query.queries = kvmalloc_array(copy.count,
+						      sizeof(struct v3d_timestamp_query),
+						      GFP_KERNEL);
+	if (!job->timestamp_query.queries)
+		return -ENOMEM;
+
+	offsets = u64_to_user_ptr(copy.offsets);
+	syncs = u64_to_user_ptr(copy.syncs);
+
+	for (i = 0; i < copy.count; i++) {
+		u32 offset, sync;
+
+		if (copy_from_user(&offset, offsets++, sizeof(offset))) {
+			kvfree(job->timestamp_query.queries);
+			return -EFAULT;
+		}
+
+		job->timestamp_query.queries[i].offset = offset;
+
+		if (copy_from_user(&sync, syncs++, sizeof(sync))) {
+			kvfree(job->timestamp_query.queries);
+			return -EFAULT;
+		}
+
+		job->timestamp_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync);
+	}
+	job->timestamp_query.count = copy.count;
+
+	job->copy.do_64bit = copy.do_64bit;
+	job->copy.do_partial = copy.do_partial;
+	job->copy.availability_bit = copy.availability_bit;
+	job->copy.offset = copy.offset;
+	job->copy.stride = copy.stride;
+
+	return 0;
+}
+
 /* Whenever userspace sets ioctl extensions, v3d_get_extensions parses data
  * according to the extension id (name).
  */
@@ -578,6 +643,9 @@ v3d_get_extensions(struct drm_file *file_priv,
 		case DRM_V3D_EXT_ID_CPU_RESET_TIMESTAMP_QUERY:
 			ret = v3d_get_cpu_reset_timestamp_params(file_priv, user_ext, job);
 			break;
+		case DRM_V3D_EXT_ID_CPU_COPY_TIMESTAMP_QUERY:
+			ret = v3d_get_cpu_copy_query_results_params(file_priv, user_ext, job);
+			break;
 		default:
 			DRM_DEBUG_DRIVER("Unknown extension id: %d\n", ext.id);
 			return -EINVAL;
diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
index e6f102c62a97..63debe433523 100644
--- a/include/uapi/drm/v3d_drm.h
+++ b/include/uapi/drm/v3d_drm.h
@@ -75,6 +75,7 @@ struct drm_v3d_extension {
 #define DRM_V3D_EXT_ID_CPU_INDIRECT_CSD		0x02
 #define DRM_V3D_EXT_ID_CPU_TIMESTAMP_QUERY		0x03
 #define DRM_V3D_EXT_ID_CPU_RESET_TIMESTAMP_QUERY	0x04
+#define DRM_V3D_EXT_ID_CPU_COPY_TIMESTAMP_QUERY	0x05
 	__u32 flags; /* mbz */
 };
 
@@ -451,6 +452,46 @@ struct drm_v3d_reset_timestamp_query {
 	__u32 count;
 };
 
+/**
+ * struct drm_v3d_copy_timestamp_query - ioctl extension for the CPU job to copy
+ * query results to a buffer
+ *
+ * When an extension DRM_V3D_EXT_ID_CPU_COPY_TIMESTAMP_QUERY is defined, it
+ * points to this extension to define a copy timestamp query submission. This
+ * CPU job will copy the timestamp queries results to a BO with the offset
+ * and stride defined in the extension.
+ */
+struct drm_v3d_copy_timestamp_query {
+	struct drm_v3d_extension base;
+
+	/* Define if should write to buffer using 64 or 32 bits */
+	__u8 do_64bit;
+
+	/* Define if it can write to buffer even if the query is not available */
+	__u8 do_partial;
+
+	/* Define if it should write availability bit to buffer */
+	__u8 availability_bit;
+
+	/* mbz */
+	__u8 pad;
+
+	/* Offset of the buffer in the BO */
+	__u32 offset;
+
+	/* Stride of the buffer in the BO */
+	__u32 stride;
+
+	/* Number of queries */
+	__u32 count;
+
+	/* Array of queries' offsets within the timestamp BO for their value */
+	__u64 offsets;
+
+	/* Array of timestamp's syncobjs to indicate its availability */
+	__u64 syncs;
+};
+
 struct drm_v3d_submit_cpu {
 	/* Pointer to a u32 array of the BOs that are referenced by the job. */
 	__u64 bo_handles;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 16/17] drm/v3d: Create a CPU job extension for the reset performance query job
  2023-11-24  0:46 [PATCH v2 00/17] drm/v3d: Introduce CPU jobs Maíra Canal
                   ` (14 preceding siblings ...)
  2023-11-24  0:47 ` [PATCH v2 15/17] drm/v3d: Create a CPU job extension to copy timestamp query to a buffer Maíra Canal
@ 2023-11-24  0:47 ` Maíra Canal
  2023-11-24  0:47 ` [PATCH v2 17/17] drm/v3d: Create a CPU job extension for the copy " Maíra Canal
  16 siblings, 0 replies; 25+ messages in thread
From: Maíra Canal @ 2023-11-24  0:47 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann
  Cc: Maíra Canal, kernel-dev, Boris Brezillon, dri-devel, Faith Ekstrand

A CPU job is a type of job that performs operations that requires CPU
intervention. A reset performance query job is a job that resets the
performance queries by resetting the values of the perfmons. Moreover,
we also reset the syncobjs related to the availability of the query.

So, create a user extension for the CPU job that enables the creation
of a reset performance job. This user extension will allow the creation of
a CPU job that resets the perfmons values and resets the availability syncobj.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
 drivers/gpu/drm/v3d/v3d_drv.h    | 28 +++++++++++++
 drivers/gpu/drm/v3d/v3d_sched.c  | 36 ++++++++++++++++
 drivers/gpu/drm/v3d/v3d_submit.c | 72 ++++++++++++++++++++++++++++++++
 include/uapi/drm/v3d_drm.h       | 27 ++++++++++++
 4 files changed, 163 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 4b7de76b504a..77854581d348 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -321,6 +321,7 @@ enum v3d_cpu_job_type {
 	V3D_CPU_JOB_TYPE_TIMESTAMP_QUERY,
 	V3D_CPU_JOB_TYPE_RESET_TIMESTAMP_QUERY,
 	V3D_CPU_JOB_TYPE_COPY_TIMESTAMP_QUERY,
+	V3D_CPU_JOB_TYPE_RESET_PERFORMANCE_QUERY,
 };
 
 struct v3d_timestamp_query {
@@ -331,6 +332,18 @@ struct v3d_timestamp_query {
 	struct drm_syncobj *syncobj;
 };
 
+/* Number of perfmons required to handle all supported performance counters */
+#define V3D_MAX_PERFMONS DIV_ROUND_UP(V3D_PERFCNT_NUM, \
+				      DRM_V3D_MAX_PERF_COUNTERS)
+
+struct v3d_performance_query {
+	/* Performance monitor IDs for this query */
+	u32 kperfmon_ids[V3D_MAX_PERFMONS];
+
+	/* Syncobj that indicates the query availability */
+	struct drm_syncobj *syncobj;
+};
+
 struct v3d_indirect_csd_info {
 	/* Indirect CSD */
 	struct v3d_csd_job *job;
@@ -362,6 +375,19 @@ struct v3d_timestamp_query_info {
 	u32 count;
 };
 
+struct v3d_performance_query_info {
+	struct v3d_performance_query *queries;
+
+	/* Number of performance queries */
+	u32 count;
+
+	/* Number of performance monitors related to that query pool */
+	u32 nperfmons;
+
+	/* Number of performance counters related to that query pool */
+	u32 ncounters;
+};
+
 struct v3d_copy_query_results_info {
 	/* Define if should write to buffer using 64 or 32 bits */
 	bool do_64bit;
@@ -389,6 +415,8 @@ struct v3d_cpu_job {
 	struct v3d_timestamp_query_info timestamp_query;
 
 	struct v3d_copy_query_results_info copy;
+
+	struct v3d_performance_query_info performance_query;
 };
 
 struct v3d_submit_outsync {
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 5068b499b7ab..8ebb1f3567f6 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -78,6 +78,7 @@ v3d_cpu_job_free(struct drm_sched_job *sched_job)
 {
 	struct v3d_cpu_job *job = to_cpu_job(sched_job);
 	struct v3d_timestamp_query_info *timestamp_query = &job->timestamp_query;
+	struct v3d_performance_query_info *performance_query = &job->performance_query;
 
 	if (timestamp_query->queries) {
 		for (int i = 0; i < timestamp_query->count; i++)
@@ -85,6 +86,12 @@ v3d_cpu_job_free(struct drm_sched_job *sched_job)
 		kvfree(timestamp_query->queries);
 	}
 
+	if (performance_query->queries) {
+		for (int i = 0; i < performance_query->count; i++)
+			drm_syncobj_put(performance_query->queries[i].syncobj);
+		kvfree(performance_query->queries);
+	}
+
 	v3d_job_cleanup(&job->base);
 }
 
@@ -415,6 +422,34 @@ v3d_copy_query_results(struct v3d_cpu_job *job)
 	v3d_put_bo_vaddr(bo);
 }
 
+static void
+v3d_reset_performance_queries(struct v3d_cpu_job *job)
+{
+	struct v3d_performance_query_info *performance_query = &job->performance_query;
+	struct v3d_file_priv *v3d_priv = job->base.file->driver_priv;
+	struct v3d_dev *v3d = job->base.v3d;
+	struct v3d_perfmon *perfmon;
+
+	for (int i = 0; i < performance_query->count; i++) {
+		for (int j = 0; j < performance_query->nperfmons; j++) {
+			perfmon = v3d_perfmon_find(v3d_priv,
+						   performance_query->queries[i].kperfmon_ids[j]);
+			if (!perfmon) {
+				DRM_DEBUG("Failed to find perfmon.");
+				continue;
+			}
+
+			v3d_perfmon_stop(v3d, perfmon, false);
+
+			memset(perfmon->values, 0, perfmon->ncounters * sizeof(u64));
+
+			v3d_perfmon_put(perfmon);
+		}
+
+		drm_syncobj_replace_fence(performance_query->queries[i].syncobj, NULL);
+	}
+}
+
 static struct dma_fence *
 v3d_cpu_job_run(struct drm_sched_job *sched_job)
 {
@@ -428,6 +463,7 @@ v3d_cpu_job_run(struct drm_sched_job *sched_job)
 		[V3D_CPU_JOB_TYPE_TIMESTAMP_QUERY] = v3d_timestamp_query,
 		[V3D_CPU_JOB_TYPE_RESET_TIMESTAMP_QUERY] = v3d_reset_timestamp_queries,
 		[V3D_CPU_JOB_TYPE_COPY_TIMESTAMP_QUERY] = v3d_copy_query_results,
+		[V3D_CPU_JOB_TYPE_RESET_PERFORMANCE_QUERY] = v3d_reset_performance_queries,
 	};
 
 	v3d->cpu_job = job;
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 7a84777f3bf4..b7d0e1ccea6b 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -609,6 +609,74 @@ v3d_get_cpu_copy_query_results_params(struct drm_file *file_priv,
 	return 0;
 }
 
+static int
+v3d_get_cpu_reset_performance_params(struct drm_file *file_priv,
+				     struct drm_v3d_extension __user *ext,
+				     struct v3d_cpu_job *job)
+{
+	u32 __user *syncs;
+	u64 __user *kperfmon_ids;
+	struct drm_v3d_reset_performance_query reset;
+
+	if (!job) {
+		DRM_DEBUG("CPU job extension was attached to a GPU job.\n");
+		return -EINVAL;
+	}
+
+	if (job->job_type) {
+		DRM_DEBUG("Two CPU job extensions were added to the same CPU job.\n");
+		return -EINVAL;
+	}
+
+	if (copy_from_user(&reset, ext, sizeof(reset)))
+		return -EFAULT;
+
+	job->job_type = V3D_CPU_JOB_TYPE_RESET_PERFORMANCE_QUERY;
+
+	job->performance_query.queries = kvmalloc_array(reset.count,
+							sizeof(struct v3d_performance_query),
+							GFP_KERNEL);
+	if (!job->performance_query.queries)
+		return -ENOMEM;
+
+	syncs = u64_to_user_ptr(reset.syncs);
+	kperfmon_ids = u64_to_user_ptr(reset.kperfmon_ids);
+
+	for (int i = 0; i < reset.count; i++) {
+		u32 sync;
+		u64 ids;
+		u32 __user *ids_pointer;
+		u32 id;
+
+		if (copy_from_user(&sync, syncs++, sizeof(sync))) {
+			kvfree(job->performance_query.queries);
+			return -EFAULT;
+		}
+
+		job->performance_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync);
+
+		if (copy_from_user(&ids, kperfmon_ids++, sizeof(ids))) {
+			kvfree(job->performance_query.queries);
+			return -EFAULT;
+		}
+
+		ids_pointer = u64_to_user_ptr(ids);
+
+		for (int j = 0; j < reset.nperfmons; j++) {
+			if (copy_from_user(&id, ids_pointer++, sizeof(id))) {
+				kvfree(job->performance_query.queries);
+				return -EFAULT;
+			}
+
+			job->performance_query.queries[i].kperfmon_ids[j] = id;
+		}
+	}
+	job->performance_query.count = reset.count;
+	job->performance_query.nperfmons = reset.nperfmons;
+
+	return 0;
+}
+
 /* Whenever userspace sets ioctl extensions, v3d_get_extensions parses data
  * according to the extension id (name).
  */
@@ -646,6 +714,9 @@ v3d_get_extensions(struct drm_file *file_priv,
 		case DRM_V3D_EXT_ID_CPU_COPY_TIMESTAMP_QUERY:
 			ret = v3d_get_cpu_copy_query_results_params(file_priv, user_ext, job);
 			break;
+		case DRM_V3D_EXT_ID_CPU_RESET_PERFORMANCE_QUERY:
+			ret = v3d_get_cpu_reset_performance_params(file_priv, user_ext, job);
+			break;
 		default:
 			DRM_DEBUG_DRIVER("Unknown extension id: %d\n", ext.id);
 			return -EINVAL;
@@ -1135,6 +1206,7 @@ v3d_submit_cpu_ioctl(struct drm_device *dev, void *data,
 	v3d_job_cleanup(clean_job);
 	v3d_put_multisync_post_deps(&se);
 	kvfree(cpu_job->timestamp_query.queries);
+	kvfree(cpu_job->performance_query.queries);
 
 	return ret;
 }
diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
index 63debe433523..833afdc5cc7c 100644
--- a/include/uapi/drm/v3d_drm.h
+++ b/include/uapi/drm/v3d_drm.h
@@ -76,6 +76,7 @@ struct drm_v3d_extension {
 #define DRM_V3D_EXT_ID_CPU_TIMESTAMP_QUERY		0x03
 #define DRM_V3D_EXT_ID_CPU_RESET_TIMESTAMP_QUERY	0x04
 #define DRM_V3D_EXT_ID_CPU_COPY_TIMESTAMP_QUERY	0x05
+#define DRM_V3D_EXT_ID_CPU_RESET_PERFORMANCE_QUERY	0x06
 	__u32 flags; /* mbz */
 };
 
@@ -492,6 +493,32 @@ struct drm_v3d_copy_timestamp_query {
 	__u64 syncs;
 };
 
+/**
+ * struct drm_v3d_reset_performance_query - ioctl extension for the CPU job to
+ * reset performance queries
+ *
+ * When an extension DRM_V3D_EXT_ID_CPU_RESET_PERFORMANCE_QUERY is defined, it
+ * points to this extension to define a reset performance submission. This CPU
+ * job will reset the performance queries by resetting the values of the
+ * performance monitors. Moreover, it will reset the syncobj to reset query
+ * availability.
+ */
+struct drm_v3d_reset_performance_query {
+	struct drm_v3d_extension base;
+
+	/* Array of performance queries's syncobjs to indicate its availability */
+	__u64 syncs;
+
+	/* Number of queries */
+	__u32 count;
+
+	/* Number of performance monitors */
+	__u32 nperfmons;
+
+	/* Array of u64 user-pointers that point to an array of kperfmon_ids */
+	__u64 kperfmon_ids;
+};
+
 struct drm_v3d_submit_cpu {
 	/* Pointer to a u32 array of the BOs that are referenced by the job. */
 	__u64 bo_handles;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 17/17] drm/v3d: Create a CPU job extension for the copy performance query job
  2023-11-24  0:46 [PATCH v2 00/17] drm/v3d: Introduce CPU jobs Maíra Canal
                   ` (15 preceding siblings ...)
  2023-11-24  0:47 ` [PATCH v2 16/17] drm/v3d: Create a CPU job extension for the reset performance query job Maíra Canal
@ 2023-11-24  0:47 ` Maíra Canal
  16 siblings, 0 replies; 25+ messages in thread
From: Maíra Canal @ 2023-11-24  0:47 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann
  Cc: Maíra Canal, kernel-dev, Boris Brezillon, dri-devel, Faith Ekstrand

A CPU job is a type of job that performs operations that requires CPU
intervention. A copy performance query job is a job that copy the complete
or partial result of a query to a buffer. In order to copy the result of
a performance query to a buffer, we need to get the values from the
performance monitors.

So, create a user extension for the CPU job that enables the creation
of a copy performance query job. This user extension will allow the creation
of a CPU job that copy the results of a performance query to a BO with the
possibility to indicate the availability with a availability bit.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
 drivers/gpu/drm/v3d/v3d_drv.h    |  1 +
 drivers/gpu/drm/v3d/v3d_sched.c  | 66 ++++++++++++++++++++++++++
 drivers/gpu/drm/v3d/v3d_submit.c | 81 ++++++++++++++++++++++++++++++++
 include/uapi/drm/v3d_drm.h       | 47 ++++++++++++++++++
 4 files changed, 195 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 77854581d348..6254db0482f4 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -322,6 +322,7 @@ enum v3d_cpu_job_type {
 	V3D_CPU_JOB_TYPE_RESET_TIMESTAMP_QUERY,
 	V3D_CPU_JOB_TYPE_COPY_TIMESTAMP_QUERY,
 	V3D_CPU_JOB_TYPE_RESET_PERFORMANCE_QUERY,
+	V3D_CPU_JOB_TYPE_COPY_PERFORMANCE_QUERY,
 };
 
 struct v3d_timestamp_query {
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 8ebb1f3567f6..f1a8a6dc5436 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -450,6 +450,71 @@ v3d_reset_performance_queries(struct v3d_cpu_job *job)
 	}
 }
 
+static void
+v3d_write_performance_query_result(struct v3d_cpu_job *job, void *data, u32 query)
+{
+	struct v3d_performance_query_info *performance_query = &job->performance_query;
+	struct v3d_copy_query_results_info *copy = &job->copy;
+	struct v3d_file_priv *v3d_priv = job->base.file->driver_priv;
+	struct v3d_dev *v3d = job->base.v3d;
+	struct v3d_perfmon *perfmon;
+	u64 counter_values[V3D_PERFCNT_NUM];
+
+	for (int i = 0; i < performance_query->nperfmons; i++) {
+		perfmon = v3d_perfmon_find(v3d_priv,
+					   performance_query->queries[query].kperfmon_ids[i]);
+		if (!perfmon) {
+			DRM_DEBUG("Failed to find perfmon.");
+			continue;
+		}
+
+		v3d_perfmon_stop(v3d, perfmon, true);
+
+		memcpy(&counter_values[i * DRM_V3D_MAX_PERF_COUNTERS], perfmon->values,
+		       perfmon->ncounters * sizeof(u64));
+
+		v3d_perfmon_put(perfmon);
+	}
+
+	for (int i = 0; i < performance_query->ncounters; i++)
+		write_to_buffer(data, i, copy->do_64bit, counter_values[i]);
+}
+
+
+static void
+v3d_copy_performance_query(struct v3d_cpu_job *job)
+{
+	struct v3d_performance_query_info *performance_query = &job->performance_query;
+	struct v3d_copy_query_results_info *copy = &job->copy;
+	struct v3d_bo *bo = to_v3d_bo(job->base.bo[0]);
+	struct dma_fence *fence;
+	bool available, write_result;
+	u8 *data;
+
+	v3d_get_bo_vaddr(bo);
+
+	data = ((u8 *) bo->vaddr) + copy->offset;
+
+	for (int i = 0; i < performance_query->count; i++) {
+		fence = drm_syncobj_fence_get(performance_query->queries[i].syncobj);
+		available = fence ? dma_fence_is_signaled(fence) : false;
+
+		write_result = available || copy->do_partial;
+		if (write_result)
+			v3d_write_performance_query_result(job, data, i);
+
+		if (copy->availability_bit)
+			write_to_buffer(data, performance_query->ncounters,
+					copy->do_64bit, available ? 1u : 0u);
+
+		data += copy->stride;
+
+		dma_fence_put(fence);
+	}
+
+	v3d_put_bo_vaddr(bo);
+}
+
 static struct dma_fence *
 v3d_cpu_job_run(struct drm_sched_job *sched_job)
 {
@@ -464,6 +529,7 @@ v3d_cpu_job_run(struct drm_sched_job *sched_job)
 		[V3D_CPU_JOB_TYPE_RESET_TIMESTAMP_QUERY] = v3d_reset_timestamp_queries,
 		[V3D_CPU_JOB_TYPE_COPY_TIMESTAMP_QUERY] = v3d_copy_query_results,
 		[V3D_CPU_JOB_TYPE_RESET_PERFORMANCE_QUERY] = v3d_reset_performance_queries,
+		[V3D_CPU_JOB_TYPE_COPY_PERFORMANCE_QUERY] = v3d_copy_performance_query,
 	};
 
 	v3d->cpu_job = job;
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index b7d0e1ccea6b..3d60b85eb123 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -677,6 +677,84 @@ v3d_get_cpu_reset_performance_params(struct drm_file *file_priv,
 	return 0;
 }
 
+static int
+v3d_get_cpu_copy_performance_query_params(struct drm_file *file_priv,
+					  struct drm_v3d_extension __user *ext,
+					  struct v3d_cpu_job *job)
+{
+	u32 __user *syncs;
+	u64 __user *kperfmon_ids;
+	struct drm_v3d_copy_performance_query copy;
+
+	if (!job) {
+		DRM_DEBUG("CPU job extension was attached to a GPU job.\n");
+		return -EINVAL;
+	}
+
+	if (job->job_type) {
+		DRM_DEBUG("Two CPU job extensions were added to the same CPU job.\n");
+		return -EINVAL;
+	}
+
+	if (copy_from_user(&copy, ext, sizeof(copy)))
+		return -EFAULT;
+
+	if (copy.pad)
+		return -EINVAL;
+
+	job->job_type = V3D_CPU_JOB_TYPE_COPY_PERFORMANCE_QUERY;
+
+	job->performance_query.queries = kvmalloc_array(copy.count,
+							sizeof(struct v3d_performance_query),
+							GFP_KERNEL);
+	if (!job->performance_query.queries)
+		return -ENOMEM;
+
+	syncs = u64_to_user_ptr(copy.syncs);
+	kperfmon_ids = u64_to_user_ptr(copy.kperfmon_ids);
+
+	for (int i = 0; i < copy.count; i++) {
+		u32 sync;
+		u64 ids;
+		u32 __user *ids_pointer;
+		u32 id;
+
+		if (copy_from_user(&sync, syncs++, sizeof(sync))) {
+			kvfree(job->performance_query.queries);
+			return -EFAULT;
+		}
+
+		job->performance_query.queries[i].syncobj = drm_syncobj_find(file_priv, sync);
+
+		if (copy_from_user(&ids, kperfmon_ids++, sizeof(ids))) {
+			kvfree(job->performance_query.queries);
+			return -EFAULT;
+		}
+
+		ids_pointer = u64_to_user_ptr(ids);
+
+		for (int j = 0; j < copy.nperfmons; j++) {
+			if (copy_from_user(&id, ids_pointer++, sizeof(id))) {
+				kvfree(job->performance_query.queries);
+				return -EFAULT;
+			}
+
+			job->performance_query.queries[i].kperfmon_ids[j] = id;
+		}
+	}
+	job->performance_query.count = copy.count;
+	job->performance_query.nperfmons = copy.nperfmons;
+	job->performance_query.ncounters = copy.ncounters;
+
+	job->copy.do_64bit = copy.do_64bit;
+	job->copy.do_partial = copy.do_partial;
+	job->copy.availability_bit = copy.availability_bit;
+	job->copy.offset = copy.offset;
+	job->copy.stride = copy.stride;
+
+	return 0;
+}
+
 /* Whenever userspace sets ioctl extensions, v3d_get_extensions parses data
  * according to the extension id (name).
  */
@@ -717,6 +795,9 @@ v3d_get_extensions(struct drm_file *file_priv,
 		case DRM_V3D_EXT_ID_CPU_RESET_PERFORMANCE_QUERY:
 			ret = v3d_get_cpu_reset_performance_params(file_priv, user_ext, job);
 			break;
+		case DRM_V3D_EXT_ID_CPU_COPY_PERFORMANCE_QUERY:
+			ret = v3d_get_cpu_copy_performance_query_params(file_priv, user_ext, job);
+			break;
 		default:
 			DRM_DEBUG_DRIVER("Unknown extension id: %d\n", ext.id);
 			return -EINVAL;
diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
index 833afdc5cc7c..d3643a25e6e3 100644
--- a/include/uapi/drm/v3d_drm.h
+++ b/include/uapi/drm/v3d_drm.h
@@ -77,6 +77,7 @@ struct drm_v3d_extension {
 #define DRM_V3D_EXT_ID_CPU_RESET_TIMESTAMP_QUERY	0x04
 #define DRM_V3D_EXT_ID_CPU_COPY_TIMESTAMP_QUERY	0x05
 #define DRM_V3D_EXT_ID_CPU_RESET_PERFORMANCE_QUERY	0x06
+#define DRM_V3D_EXT_ID_CPU_COPY_PERFORMANCE_QUERY	0x07
 	__u32 flags; /* mbz */
 };
 
@@ -519,6 +520,52 @@ struct drm_v3d_reset_performance_query {
 	__u64 kperfmon_ids;
 };
 
+/**
+ * struct drm_v3d_copy_performance_query - ioctl extension for the CPU job to copy
+ * performance query results to a buffer
+ *
+ * When an extension DRM_V3D_EXT_ID_CPU_COPY_PERFORMANCE_QUERY is defined, it
+ * points to this extension to define a copy performance query submission. This
+ * CPU job will copy the performance queries results to a BO with the offset
+ * and stride defined in the extension.
+ */
+struct drm_v3d_copy_performance_query {
+	struct drm_v3d_extension base;
+
+	/* Define if should write to buffer using 64 or 32 bits */
+	__u8 do_64bit;
+
+	/* Define if it can write to buffer even if the query is not available */
+	__u8 do_partial;
+
+	/* Define if it should write availability bit to buffer */
+	__u8 availability_bit;
+
+	/* mbz */
+	__u8 pad;
+
+	/* Offset of the buffer in the BO */
+	__u32 offset;
+
+	/* Stride of the buffer in the BO */
+	__u32 stride;
+
+	/* Number of performance monitors */
+	__u32 nperfmons;
+
+	/* Number of performance counters related to this query pool */
+	__u32 ncounters;
+
+	/* Number of queries */
+	__u32 count;
+
+	/* Array of performance queries's syncobjs to indicate its availability */
+	__u64 syncs;
+
+	/* Array of u64 user-pointers that point to an array of kperfmon_ids */
+	__u64 kperfmon_ids;
+};
+
 struct drm_v3d_submit_cpu {
 	/* Pointer to a u32 array of the BOs that are referenced by the job. */
 	__u64 bo_handles;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 04/17] drm/v3d: Simplify job refcount handling
  2023-11-24  0:47 ` [PATCH v2 04/17] drm/v3d: Simplify job refcount handling Maíra Canal
@ 2023-11-27  7:57   ` Iago Toral
  0 siblings, 0 replies; 25+ messages in thread
From: Iago Toral @ 2023-11-27  7:57 UTC (permalink / raw)
  To: Maíra Canal, Melissa Wen, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann
  Cc: kernel-dev, Boris Brezillon, dri-devel, Faith Ekstrand

El jue, 23-11-2023 a las 21:47 -0300, Maíra Canal escribió:
> From: Melissa Wen <mwen@igalia.com>
> 
> Instead of checking if the job is NULL every time we call the
> function,
> check it inside the function.
> 
> Signed-off-by: Melissa Wen <mwen@igalia.com>
> Signed-off-by: Maíra Canal <mcanal@igalia.com>
> ---
>  drivers/gpu/drm/v3d/v3d_submit.c | 15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_submit.c
> b/drivers/gpu/drm/v3d/v3d_submit.c
> index f36214002f37..e18e7c963884 100644
> --- a/drivers/gpu/drm/v3d/v3d_submit.c
> +++ b/drivers/gpu/drm/v3d/v3d_submit.c
> @@ -129,6 +129,9 @@ void v3d_job_cleanup(struct v3d_job *job)
>  
>  void v3d_job_put(struct v3d_job *job)
>  {
> +       if (!job)
> +               return;
> +
>         kref_put(&job->refcount, job->free);
>  }
>  
> @@ -517,11 +520,9 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void
> *data,
>                                                  &se,
>                                                  last_job-
> >done_fence);
>  
> -       if (bin)
> -               v3d_job_put(&bin->base);
> -       v3d_job_put(&render->base);
> -       if (clean_job)
> -               v3d_job_put(clean_job);
> +       v3d_job_put((void *)bin);
> +       v3d_job_put((void *)render);
> +       v3d_job_put((void *)clean_job);

Personally, I am not a big fan of casting to void* instead of using
&job->base for all the v3d_job_put calls in this patch.

Iago

>  
>         return 0;
>  
> @@ -621,7 +622,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void
> *data,
>                                                  &se,
>                                                  job-
> >base.done_fence);
>  
> -       v3d_job_put(&job->base);
> +       v3d_job_put((void *)job);
>  
>         return 0;
>  
> @@ -725,7 +726,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void
> *data,
>                                                  &se,
>                                                  clean_job-
> >done_fence);
>  
> -       v3d_job_put(&job->base);
> +       v3d_job_put((void *)job);
>         v3d_job_put(clean_job);
>  
>         return 0;


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 06/17] drm/v3d: Decouple job allocation from job initiation
  2023-11-24  0:47 ` [PATCH v2 06/17] drm/v3d: Decouple job allocation from job initiation Maíra Canal
@ 2023-11-27  8:12   ` Iago Toral
  2023-11-27 12:50     ` Maira Canal
  0 siblings, 1 reply; 25+ messages in thread
From: Iago Toral @ 2023-11-27  8:12 UTC (permalink / raw)
  To: Maíra Canal, Melissa Wen, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann
  Cc: kernel-dev, Boris Brezillon, dri-devel, Faith Ekstrand

El jue, 23-11-2023 a las 21:47 -0300, Maíra Canal escribió:
> We want to allow the IOCTLs to allocate the job without initiating
> it.
> This will be useful for the CPU job submission IOCTL, as the CPU job
> has
> the need to use information from the user extensions. Currently, the
> user extensions are parsed before the job allocation, making it
> impossible to fill the CPU job when parsing the user extensions.
> Therefore, decouple the job allocation from the job initiation.
> 
> Signed-off-by: Maíra Canal <mcanal@igalia.com>
> ---
>  drivers/gpu/drm/v3d/v3d_submit.c | 23 ++++++++++++++++++-----
>  1 file changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_submit.c
> b/drivers/gpu/drm/v3d/v3d_submit.c
> index fe46dd316ca0..ed1a310bbd2f 100644
> --- a/drivers/gpu/drm/v3d/v3d_submit.c
> +++ b/drivers/gpu/drm/v3d/v3d_submit.c
> @@ -135,6 +135,21 @@ void v3d_job_put(struct v3d_job *job)
>         kref_put(&job->refcount, job->free);
>  }
>  
> +static int
> +v3d_job_allocate(void **container, size_t size)
> +{
> +       if (*container)
> +               return 0;

Mmm... is this really what we want? At least right now we expect
v3d_job_allocate to always allocate memory, is there any scenario in
which we would expect to call this with an already allocated container?

Iago

> +
> +       *container = kcalloc(1, size, GFP_KERNEL);
> +       if (!*container) {
> +               DRM_ERROR("Cannot allocate memory for V3D job.\n");
> +               return -ENOMEM;
> +       }
> +
> +       return 0;
> +}
> +
>  static int
>  v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
>              void **container, size_t size, void (*free)(struct kref
> *ref),
> @@ -145,11 +160,9 @@ v3d_job_init(struct v3d_dev *v3d, struct
> drm_file *file_priv,
>         bool has_multisync = se && (se->flags &
> DRM_V3D_EXT_ID_MULTI_SYNC);
>         int ret, i;
>  
> -       *container = kcalloc(1, size, GFP_KERNEL);
> -       if (!*container) {
> -               DRM_ERROR("Cannot allocate memory for v3d job.");
> -               return -ENOMEM;
> -       }
> +       ret = v3d_job_allocate(container, size);
> +       if (ret)
> +               return ret;
>  
>         job = *container;
>         job->v3d = v3d;


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 07/17] drm/v3d: Add a CPU job submission
  2023-11-24  0:47 ` [PATCH v2 07/17] drm/v3d: Add a CPU job submission Maíra Canal
@ 2023-11-27  8:45   ` Iago Toral
  2023-11-27  8:59   ` Iago Toral
  1 sibling, 0 replies; 25+ messages in thread
From: Iago Toral @ 2023-11-27  8:45 UTC (permalink / raw)
  To: Maíra Canal, Melissa Wen, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann
  Cc: kernel-dev, Boris Brezillon, dri-devel, Faith Ekstrand

El jue, 23-11-2023 a las 21:47 -0300, Maíra Canal escribió:
> From: Melissa Wen <mwen@igalia.com>
> 
> Create a new type of job, a CPU job. A CPU job is a type of job that
> performs operations that requires CPU intervention. The overall idea
> is
> to use user extensions to enable different types of CPU job, allowing
> the
> CPU job to perform different operations according to the type of user
> externsion. The user extension ID identify the type of CPU job that

s/externsion/extension

Iago

> must
> be dealt.
> 
> Having a CPU job is interesting for synchronization purposes as a CPU
> job has a queue like any other V3D job and can be synchoronized by
> the
> multisync extension.
> 
> Signed-off-by: Melissa Wen <mwen@igalia.com>
> Co-developed-by: Maíra Canal <mcanal@igalia.com>
> Signed-off-by: Maíra Canal <mcanal@igalia.com>
> ---
>  drivers/gpu/drm/v3d/v3d_drv.c    |  4 ++
>  drivers/gpu/drm/v3d/v3d_drv.h    | 14 +++++-
>  drivers/gpu/drm/v3d/v3d_sched.c  | 57 +++++++++++++++++++++++
>  drivers/gpu/drm/v3d/v3d_submit.c | 79
> ++++++++++++++++++++++++++++++++
>  include/uapi/drm/v3d_drm.h       | 17 +++++++
>  5 files changed, 170 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.c
> b/drivers/gpu/drm/v3d/v3d_drv.c
> index 44a1ca57d6a4..3debf37e7d9b 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.c
> +++ b/drivers/gpu/drm/v3d/v3d_drv.c
> @@ -91,6 +91,9 @@ static int v3d_get_param_ioctl(struct drm_device
> *dev, void *data,
>         case DRM_V3D_PARAM_SUPPORTS_MULTISYNC_EXT:
>                 args->value = 1;
>                 return 0;
> +       case DRM_V3D_PARAM_SUPPORTS_CPU_QUEUE:
> +               args->value = 1;
> +               return 0;
>         default:
>                 DRM_DEBUG("Unknown parameter %d\n", args->param);
>                 return -EINVAL;
> @@ -189,6 +192,7 @@ static const struct drm_ioctl_desc
> v3d_drm_ioctls[] = {
>         DRM_IOCTL_DEF_DRV(V3D_PERFMON_CREATE,
> v3d_perfmon_create_ioctl, DRM_RENDER_ALLOW),
>         DRM_IOCTL_DEF_DRV(V3D_PERFMON_DESTROY,
> v3d_perfmon_destroy_ioctl, DRM_RENDER_ALLOW),
>         DRM_IOCTL_DEF_DRV(V3D_PERFMON_GET_VALUES,
> v3d_perfmon_get_values_ioctl, DRM_RENDER_ALLOW),
> +       DRM_IOCTL_DEF_DRV(V3D_SUBMIT_CPU, v3d_submit_cpu_ioctl,
> DRM_RENDER_ALLOW | DRM_AUTH),
>  };
>  
>  static const struct drm_driver v3d_drm_driver = {
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.h
> b/drivers/gpu/drm/v3d/v3d_drv.h
> index 4db9ace66024..010b146140ad 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.h
> +++ b/drivers/gpu/drm/v3d/v3d_drv.h
> @@ -19,7 +19,7 @@ struct reset_control;
>  
>  #define GMP_GRANULARITY (128 * 1024)
>  
> -#define V3D_MAX_QUEUES (V3D_CACHE_CLEAN + 1)
> +#define V3D_MAX_QUEUES (V3D_CPU + 1)
>  
>  static inline char *v3d_queue_to_string(enum v3d_queue queue)
>  {
> @@ -29,6 +29,7 @@ static inline char *v3d_queue_to_string(enum
> v3d_queue queue)
>         case V3D_TFU: return "tfu";
>         case V3D_CSD: return "csd";
>         case V3D_CACHE_CLEAN: return "cache_clean";
> +       case V3D_CPU: return "cpu";
>         }
>         return "UNKNOWN";
>  }
> @@ -122,6 +123,7 @@ struct v3d_dev {
>         struct v3d_render_job *render_job;
>         struct v3d_tfu_job *tfu_job;
>         struct v3d_csd_job *csd_job;
> +       struct v3d_cpu_job *cpu_job;
>  
>         struct v3d_queue_state queue[V3D_MAX_QUEUES];
>  
> @@ -312,6 +314,14 @@ struct v3d_csd_job {
>         struct drm_v3d_submit_csd args;
>  };
>  
> +enum v3d_cpu_job_type {};
> +
> +struct v3d_cpu_job {
> +       struct v3d_job base;
> +
> +       enum v3d_cpu_job_type job_type;
> +};
> +
>  struct v3d_submit_outsync {
>         struct drm_syncobj *syncobj;
>  };
> @@ -414,6 +424,8 @@ int v3d_submit_tfu_ioctl(struct drm_device *dev,
> void *data,
>                          struct drm_file *file_priv);
>  int v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
>                          struct drm_file *file_priv);
> +int v3d_submit_cpu_ioctl(struct drm_device *dev, void *data,
> +                        struct drm_file *file_priv);
>  
>  /* v3d_irq.c */
>  int v3d_irq_init(struct v3d_dev *v3d);
> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c
> b/drivers/gpu/drm/v3d/v3d_sched.c
> index fccbea2a5f2e..a32c91b94898 100644
> --- a/drivers/gpu/drm/v3d/v3d_sched.c
> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
> @@ -55,6 +55,12 @@ to_csd_job(struct drm_sched_job *sched_job)
>         return container_of(sched_job, struct v3d_csd_job,
> base.base);
>  }
>  
> +static struct v3d_cpu_job *
> +to_cpu_job(struct drm_sched_job *sched_job)
> +{
> +       return container_of(sched_job, struct v3d_cpu_job,
> base.base);
> +}
> +
>  static void
>  v3d_sched_job_free(struct drm_sched_job *sched_job)
>  {
> @@ -262,6 +268,42 @@ v3d_csd_job_run(struct drm_sched_job *sched_job)
>         return fence;
>  }
>  
> +static struct dma_fence *
> +v3d_cpu_job_run(struct drm_sched_job *sched_job)
> +{
> +       struct v3d_cpu_job *job = to_cpu_job(sched_job);
> +       struct v3d_dev *v3d = job->base.v3d;
> +       struct v3d_file_priv *file = job->base.file->driver_priv;
> +       u64 runtime;
> +
> +       void (*v3d_cpu_job_fn[])(struct v3d_cpu_job *job) = { };
> +
> +       v3d->cpu_job = job;
> +
> +       if (job->job_type >= ARRAY_SIZE(v3d_cpu_job_fn)) {
> +               DRM_DEBUG_DRIVER("Unknown CPU job: %d\n", job-
> >job_type);
> +               return NULL;
> +       }
> +
> +       file->start_ns[V3D_CPU] = local_clock();
> +       v3d->queue[V3D_CPU].start_ns = file->start_ns[V3D_CPU];
> +
> +       v3d_cpu_job_fn[job->job_type](job);
> +
> +       runtime = local_clock() - file->start_ns[V3D_CPU];
> +
> +       file->enabled_ns[V3D_CPU] += runtime;
> +       v3d->queue[V3D_CPU].enabled_ns += runtime;
> +
> +       file->jobs_sent[V3D_CPU]++;
> +       v3d->queue[V3D_CPU].jobs_sent++;
> +
> +       file->start_ns[V3D_CPU] = 0;
> +       v3d->queue[V3D_CPU].start_ns = 0;
> +
> +       return NULL;
> +}
> +
>  static struct dma_fence *
>  v3d_cache_clean_job_run(struct drm_sched_job *sched_job)
>  {
> @@ -416,6 +458,12 @@ static const struct drm_sched_backend_ops
> v3d_cache_clean_sched_ops = {
>         .free_job = v3d_sched_job_free
>  };
>  
> +static const struct drm_sched_backend_ops v3d_cpu_sched_ops = {
> +       .run_job = v3d_cpu_job_run,
> +       .timedout_job = v3d_generic_job_timedout,
> +       .free_job = v3d_sched_job_free
> +};
> +
>  int
>  v3d_sched_init(struct v3d_dev *v3d)
>  {
> @@ -471,6 +519,15 @@ v3d_sched_init(struct v3d_dev *v3d)
>                         goto fail;
>         }
>  
> +       ret = drm_sched_init(&v3d->queue[V3D_CPU].sched,
> +                            &v3d_cpu_sched_ops, NULL,
> +                            DRM_SCHED_PRIORITY_COUNT,
> +                            1, job_hang_limit,
> +                            msecs_to_jiffies(hang_limit_ms), NULL,
> +                            NULL, "v3d_cpu", v3d->drm.dev);
> +       if (ret)
> +               goto fail;
> +
>         return 0;
>  
>  fail:
> diff --git a/drivers/gpu/drm/v3d/v3d_submit.c
> b/drivers/gpu/drm/v3d/v3d_submit.c
> index ed1a310bbd2f..d4b85ab16345 100644
> --- a/drivers/gpu/drm/v3d/v3d_submit.c
> +++ b/drivers/gpu/drm/v3d/v3d_submit.c
> @@ -761,3 +761,82 @@ v3d_submit_csd_ioctl(struct drm_device *dev,
> void *data,
>  
>         return ret;
>  }
> +
> +/**
> + * v3d_submit_cpu_ioctl() - Submits a CPU job to the V3D.
> + * @dev: DRM device
> + * @data: ioctl argument
> + * @file_priv: DRM file for this fd
> + *
> + * Userspace specifies the CPU job type and data required to perform
> its
> + * operations through the drm_v3d_extension struct.
> + */
> +int
> +v3d_submit_cpu_ioctl(struct drm_device *dev, void *data,
> +                    struct drm_file *file_priv)
> +{
> +       struct v3d_dev *v3d = to_v3d_dev(dev);
> +       struct drm_v3d_submit_cpu *args = data;
> +       struct v3d_submit_ext se = {0};
> +       struct v3d_cpu_job *cpu_job = NULL;
> +       struct ww_acquire_ctx acquire_ctx;
> +       int ret;
> +
> +       if (args->flags && !(args->flags & DRM_V3D_SUBMIT_EXTENSION))
> {
> +               DRM_INFO("invalid flags: %d\n", args->flags);
> +               return -EINVAL;
> +       }
> +
> +       ret = v3d_job_allocate((void *)&cpu_job, sizeof(*cpu_job));
> +       if (ret)
> +               return ret;
> +
> +       if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
> +               ret = v3d_get_extensions(file_priv, args->extensions,
> &se);
> +               if (ret) {
> +                       DRM_DEBUG("Failed to get extensions.\n");
> +                       goto fail;
> +               }
> +       }
> +
> +       /* Every CPU job must have a CPU job user extension */
> +       if (!cpu_job->job_type) {
> +               DRM_DEBUG("CPU job must have a CPU job user
> extension.\n");
> +               goto fail;
> +       }
> +
> +       ret = v3d_job_init(v3d, file_priv, (void *)&cpu_job,
> sizeof(*cpu_job),
> +                          v3d_job_free, 0, &se, V3D_CPU);
> +       if (ret)
> +               goto fail;
> +
> +       if (args->bo_handle_count) {
> +               ret = v3d_lookup_bos(dev, file_priv, &cpu_job->base,
> +                                    args->bo_handles, args-
> >bo_handle_count);
> +               if (ret)
> +                       goto fail;
> +
> +               ret = v3d_lock_bo_reservations(&cpu_job->base,
> &acquire_ctx);
> +               if (ret)
> +                       goto fail;
> +       }
> +
> +       mutex_lock(&v3d->sched_lock);
> +       v3d_push_job(&cpu_job->base);
> +       mutex_unlock(&v3d->sched_lock);
> +
> +       v3d_attach_fences_and_unlock_reservation(file_priv,
> +                                                &cpu_job->base,
> +                                                &acquire_ctx, 0,
> +                                                NULL, cpu_job-
> >base.done_fence);
> +
> +       v3d_job_put((void *)cpu_job);
> +
> +       return 0;
> +
> +fail:
> +       v3d_job_cleanup((void *)cpu_job);
> +       v3d_put_multisync_post_deps(&se);
> +
> +       return ret;
> +}
> diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
> index 1a7d7a689de3..00abef9d0db7 100644
> --- a/include/uapi/drm/v3d_drm.h
> +++ b/include/uapi/drm/v3d_drm.h
> @@ -41,6 +41,7 @@ extern "C" {
>  #define DRM_V3D_PERFMON_CREATE                    0x08
>  #define DRM_V3D_PERFMON_DESTROY                   0x09
>  #define DRM_V3D_PERFMON_GET_VALUES                0x0a
> +#define DRM_V3D_SUBMIT_CPU                        0x0b
>  
>  #define DRM_IOCTL_V3D_SUBMIT_CL           DRM_IOWR(DRM_COMMAND_BASE
> + DRM_V3D_SUBMIT_CL, struct drm_v3d_submit_cl)
>  #define DRM_IOCTL_V3D_WAIT_BO             DRM_IOWR(DRM_COMMAND_BASE
> + DRM_V3D_WAIT_BO, struct drm_v3d_wait_bo)
> @@ -56,6 +57,7 @@ extern "C" {
>                                                    struct
> drm_v3d_perfmon_destroy)
>  #define DRM_IOCTL_V3D_PERFMON_GET_VALUES  DRM_IOWR(DRM_COMMAND_BASE
> + DRM_V3D_PERFMON_GET_VALUES, \
>                                                    struct
> drm_v3d_perfmon_get_values)
> +#define DRM_IOCTL_V3D_SUBMIT_CPU          DRM_IOW(DRM_COMMAND_BASE +
> DRM_V3D_SUBMIT_CPU, struct drm_v3d_submit_cpu)
>  
>  #define DRM_V3D_SUBMIT_CL_FLUSH_CACHE             0x01
>  #define DRM_V3D_SUBMIT_EXTENSION                 0x02
> @@ -93,6 +95,7 @@ enum v3d_queue {
>         V3D_TFU,
>         V3D_CSD,
>         V3D_CACHE_CLEAN,
> +       V3D_CPU,
>  };
>  
>  /**
> @@ -276,6 +279,7 @@ enum drm_v3d_param {
>         DRM_V3D_PARAM_SUPPORTS_CACHE_FLUSH,
>         DRM_V3D_PARAM_SUPPORTS_PERFMON,
>         DRM_V3D_PARAM_SUPPORTS_MULTISYNC_EXT,
> +       DRM_V3D_PARAM_SUPPORTS_CPU_QUEUE,
>  };
>  
>  struct drm_v3d_get_param {
> @@ -361,6 +365,19 @@ struct drm_v3d_submit_csd {
>         __u32 pad;
>  };
>  
> +struct drm_v3d_submit_cpu {
> +       /* Pointer to a u32 array of the BOs that are referenced by
> the job. */
> +       __u64 bo_handles;
> +
> +       /* Number of BO handles passed in (size is that times 4). */
> +       __u32 bo_handle_count;
> +
> +       __u32 flags;
> +
> +       /* Pointer to an array of ioctl extensions*/
> +       __u64 extensions;
> +};
> +
>  enum {
>         V3D_PERFCNT_FEP_VALID_PRIMTS_NO_PIXELS,
>         V3D_PERFCNT_FEP_VALID_PRIMS,


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 07/17] drm/v3d: Add a CPU job submission
  2023-11-24  0:47 ` [PATCH v2 07/17] drm/v3d: Add a CPU job submission Maíra Canal
  2023-11-27  8:45   ` Iago Toral
@ 2023-11-27  8:59   ` Iago Toral
  1 sibling, 0 replies; 25+ messages in thread
From: Iago Toral @ 2023-11-27  8:59 UTC (permalink / raw)
  To: Maíra Canal, Melissa Wen, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann
  Cc: kernel-dev, Boris Brezillon, dri-devel, Faith Ekstrand

El jue, 23-11-2023 a las 21:47 -0300, Maíra Canal escribió:
> From: Melissa Wen <mwen@igalia.com>
> 
> Create a new type of job, a CPU job. A CPU job is a type of job that
> performs operations that requires CPU intervention. The overall idea
> is
> to use user extensions to enable different types of CPU job, allowing
> the
> CPU job to perform different operations according to the type of user
> externsion. The user extension ID identify the type of CPU job that
> must
> be dealt.
> 
> Having a CPU job is interesting for synchronization purposes as a CPU
> job has a queue like any other V3D job and can be synchoronized by
> the
> multisync extension.
> 
> Signed-off-by: Melissa Wen <mwen@igalia.com>
> Co-developed-by: Maíra Canal <mcanal@igalia.com>
> Signed-off-by: Maíra Canal <mcanal@igalia.com>
> ---
> 
(...)

> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c
> b/drivers/gpu/drm/v3d/v3d_sched.c
> index fccbea2a5f2e..a32c91b94898 100644
> --- a/drivers/gpu/drm/v3d/v3d_sched.c
> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
> @@ -55,6 +55,12 @@ to_csd_job(struct drm_sched_job *sched_job)
>         return container_of(sched_job, struct v3d_csd_job,
> base.base);
>  }
>  
> +static struct v3d_cpu_job *
> +to_cpu_job(struct drm_sched_job *sched_job)
> +{
> +       return container_of(sched_job, struct v3d_cpu_job,
> base.base);
> +}
> +
>  static void
>  v3d_sched_job_free(struct drm_sched_job *sched_job)
>  {
> @@ -262,6 +268,42 @@ v3d_csd_job_run(struct drm_sched_job *sched_job)
>         return fence;
>  }
>  
> +static struct dma_fence *
> +v3d_cpu_job_run(struct drm_sched_job *sched_job)
> +{
> +       struct v3d_cpu_job *job = to_cpu_job(sched_job);
> +       struct v3d_dev *v3d = job->base.v3d;
> +       struct v3d_file_priv *file = job->base.file->driver_priv;
> +       u64 runtime;
> +
> +       void (*v3d_cpu_job_fn[])(struct v3d_cpu_job *job) = { };

Shouldn't this be a static const? Also, maybe we want declare it
outside the function itself?

Iago

> +
> +       v3d->cpu_job = job;
> +
> +       if (job->job_type >= ARRAY_SIZE(v3d_cpu_job_fn)) {
> +               DRM_DEBUG_DRIVER("Unknown CPU job: %d\n", job-
> >job_type);
> +               return NULL;
> +       }
> +
> +       file->start_ns[V3D_CPU] = local_clock();
> +       v3d->queue[V3D_CPU].start_ns = file->start_ns[V3D_CPU];
> +
> +       v3d_cpu_job_fn[job->job_type](job);
> +
> +       runtime = local_clock() - file->start_ns[V3D_CPU];
> +
> +       file->enabled_ns[V3D_CPU] += runtime;
> +       v3d->queue[V3D_CPU].enabled_ns += runtime;
> +
> +       file->jobs_sent[V3D_CPU]++;
> +       v3d->queue[V3D_CPU].jobs_sent++;
> +
> +       file->start_ns[V3D_CPU] = 0;
> +       v3d->queue[V3D_CPU].start_ns = 0;
> +
> +       return NULL;
> +}

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 13/17] drm/v3d: Create a CPU job extension for the timestamp query job
  2023-11-24  0:47 ` [PATCH v2 13/17] drm/v3d: Create a CPU job extension for the timestamp query job Maíra Canal
@ 2023-11-27  9:16   ` Iago Toral
  2023-11-27 13:26     ` Maira Canal
  0 siblings, 1 reply; 25+ messages in thread
From: Iago Toral @ 2023-11-27  9:16 UTC (permalink / raw)
  To: Maíra Canal, Melissa Wen, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann
  Cc: kernel-dev, Boris Brezillon, dri-devel, Faith Ekstrand

El jue, 23-11-2023 a las 21:47 -0300, Maíra Canal escribió:

(...)
> +static void
> +v3d_timestamp_query(struct v3d_cpu_job *job)
> +{
> +       struct v3d_timestamp_query_info *timestamp_query = &job-
> >timestamp_query;
> +       struct v3d_bo *bo = to_v3d_bo(job->base.bo[0]);

I presume there is an implicit expectation here that a timestamp query
job only has one BO where we are writing query results, right? Maybe we
should document this more explicitly in the uAPI and also check that we
have at least that one BO before trying to map it and write to it?

Also, is there a reason why we take the job reference from job-
>base.bo[0] instead of job->bo[0]?

Iago

> +       u8 *value_addr;
> +
> +       v3d_get_bo_vaddr(bo);
> +
> +       for (int i = 0; i < timestamp_query->count; i++) {
> +               value_addr = ((u8 *) bo->vaddr) + timestamp_query-
> >queries[i].offset;
> +               *((u64 *) value_addr) = i == 0 ? ktime_get_ns() :
> 0ull;
> +
> +               drm_syncobj_replace_fence(timestamp_query-
> >queries[i].syncobj,
> +                                         job->base.done_fence);
> +       }
> +
> +       v3d_put_bo_vaddr(bo);
> +}
> +
>  static struct dma_fence *
>  v3d_cpu_job_run(struct drm_sched_job *sched_job)
>  {
> @@ -315,6 +352,7 @@ v3d_cpu_job_run(struct drm_sched_job *sched_job)
>  
>         void (*v3d_cpu_job_fn[])(struct v3d_cpu_job *job) = {
>                 [V3D_CPU_JOB_TYPE_INDIRECT_CSD] =
> v3d_rewrite_csd_job_wg_counts_from_indirect,
> +               [V3D_CPU_JOB_TYPE_TIMESTAMP_QUERY] =
> v3d_timestamp_query,
>         };
>  
>         v3d->cpu_job = job;
> @@ -504,7 +542,7 @@ static const struct drm_sched_backend_ops
> v3d_cache_clean_sched_ops = {
>  static const struct drm_sched_backend_ops v3d_cpu_sched_ops = {
>         .run_job = v3d_cpu_job_run,
>         .timedout_job = v3d_generic_job_timedout,
> -       .free_job = v3d_sched_job_free
> +       .free_job = v3d_cpu_job_free
>  };
>  
>  int
> diff --git a/drivers/gpu/drm/v3d/v3d_submit.c
> b/drivers/gpu/drm/v3d/v3d_submit.c
> index b6707ef42706..2f03c8bca593 100644
> --- a/drivers/gpu/drm/v3d/v3d_submit.c
> +++ b/drivers/gpu/drm/v3d/v3d_submit.c
> @@ -438,6 +438,64 @@ v3d_get_cpu_indirect_csd_params(struct drm_file
> *file_priv,
>                                           NULL, &info->acquire_ctx);
>  }
>  
> +/* Get data for the query timestamp job submission. */
> +static int
> +v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv,
> +                                  struct drm_v3d_extension __user
> *ext,
> +                                  struct v3d_cpu_job *job)
> +{
> +       u32 __user *offsets, *syncs;
> +       struct drm_v3d_timestamp_query timestamp;
> +
> +       if (!job) {
> +               DRM_DEBUG("CPU job extension was attached to a GPU
> job.\n");
> +               return -EINVAL;
> +       }
> +
> +       if (job->job_type) {
> +               DRM_DEBUG("Two CPU job extensions were added to the
> same CPU job.\n");
> +               return -EINVAL;
> +       }
> +
> +       if (copy_from_user(&timestamp, ext, sizeof(timestamp)))
> +               return -EFAULT;
> +
> +       if (timestamp.pad)
> +               return -EINVAL;
> +
> +       job->job_type = V3D_CPU_JOB_TYPE_TIMESTAMP_QUERY;
> +
> +       job->timestamp_query.queries =
> kvmalloc_array(timestamp.count,
> +                                                     sizeof(struct
> v3d_timestamp_query),
> +                                                     GFP_KERNEL);
> +       if (!job->timestamp_query.queries)
> +               return -ENOMEM;
> +
> +       offsets = u64_to_user_ptr(timestamp.offsets);
> +       syncs = u64_to_user_ptr(timestamp.syncs);
> +
> +       for (int i = 0; i < timestamp.count; i++) {
> +               u32 offset, sync;
> +
> +               if (copy_from_user(&offset, offsets++,
> sizeof(offset))) {
> +                       kvfree(job->timestamp_query.queries);
> +                       return -EFAULT;
> +               }
> +
> +               job->timestamp_query.queries[i].offset = offset;
> +
> +               if (copy_from_user(&sync, syncs++, sizeof(sync))) {
> +                       kvfree(job->timestamp_query.queries);
> +                       return -EFAULT;
> +               }
> +
> +               job->timestamp_query.queries[i].syncobj =
> drm_syncobj_find(file_priv, sync);
> +       }
> +       job->timestamp_query.count = timestamp.count;
> +
> +       return 0;
> +}
> +
>  /* Whenever userspace sets ioctl extensions, v3d_get_extensions
> parses data
>   * according to the extension id (name).
>   */
> @@ -466,6 +524,9 @@ v3d_get_extensions(struct drm_file *file_priv,
>                 case DRM_V3D_EXT_ID_CPU_INDIRECT_CSD:
>                         ret =
> v3d_get_cpu_indirect_csd_params(file_priv, user_ext, job);
>                         break;
> +               case DRM_V3D_EXT_ID_CPU_TIMESTAMP_QUERY:
> +                       ret =
> v3d_get_cpu_timestamp_query_params(file_priv, user_ext, job);
> +                       break;
>                 default:
>                         DRM_DEBUG_DRIVER("Unknown extension id:
> %d\n", ext.id);
>                         return -EINVAL;
> @@ -954,6 +1015,7 @@ v3d_submit_cpu_ioctl(struct drm_device *dev,
> void *data,
>         v3d_job_cleanup((void *)csd_job);
>         v3d_job_cleanup(clean_job);
>         v3d_put_multisync_post_deps(&se);
> +       kvfree(cpu_job->timestamp_query.queries);
>  
>         return ret;
>  }
> diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
> index 059b84984fb0..65d5de076366 100644
> --- a/include/uapi/drm/v3d_drm.h
> +++ b/include/uapi/drm/v3d_drm.h
> @@ -73,6 +73,7 @@ struct drm_v3d_extension {
>         __u32 id;
>  #define DRM_V3D_EXT_ID_MULTI_SYNC                      0x01
>  #define DRM_V3D_EXT_ID_CPU_INDIRECT_CSD                0x02
> +#define DRM_V3D_EXT_ID_CPU_TIMESTAMP_QUERY             0x03
>         __u32 flags; /* mbz */
>  };
>  
> @@ -400,6 +401,32 @@ struct drm_v3d_indirect_csd {
>         __u32 wg_uniform_offsets[3];
>  };
>  
> +/**
> + * struct drm_v3d_timestamp_query - ioctl extension for the CPU job
> to calculate
> + * a timestamp query
> + *
> + * When an extension DRM_V3D_EXT_ID_TIMESTAMP_QUERY is defined, it
> points to
> + * this extension to define a timestamp query submission. This CPU
> job will
> + * calculate the timestamp query and update the query value within
> the
> + * timestamp BO. Moreover, it will signal the timestamp syncobj to
> indicate
> + * query availability.
> + */
> +struct drm_v3d_timestamp_query {
> +       struct drm_v3d_extension base;
> +
> +       /* Array of queries' offsets within the timestamp BO for
> their value */
> +       __u64 offsets;
> +
> +       /* Array of timestamp's syncobjs to indicate its availability
> */
> +       __u64 syncs;
> +
> +       /* Number of queries */
> +       __u32 count;
> +
> +       /* mbz */
> +       __u32 pad;
> +};
> +
>  struct drm_v3d_submit_cpu {
>         /* Pointer to a u32 array of the BOs that are referenced by
> the job. */
>         __u64 bo_handles;


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 06/17] drm/v3d: Decouple job allocation from job initiation
  2023-11-27  8:12   ` Iago Toral
@ 2023-11-27 12:50     ` Maira Canal
  0 siblings, 0 replies; 25+ messages in thread
From: Maira Canal @ 2023-11-27 12:50 UTC (permalink / raw)
  To: Iago Toral, Melissa Wen, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann
  Cc: kernel-dev, Boris Brezillon, dri-devel, Faith Ekstrand

Hi Iago,

On 11/27/23 05:12, Iago Toral wrote:
> El jue, 23-11-2023 a las 21:47 -0300, Maíra Canal escribió:
>> We want to allow the IOCTLs to allocate the job without initiating
>> it.
>> This will be useful for the CPU job submission IOCTL, as the CPU job
>> has
>> the need to use information from the user extensions. Currently, the
>> user extensions are parsed before the job allocation, making it
>> impossible to fill the CPU job when parsing the user extensions.
>> Therefore, decouple the job allocation from the job initiation.
>>
>> Signed-off-by: Maíra Canal <mcanal@igalia.com>
>> ---
>>   drivers/gpu/drm/v3d/v3d_submit.c | 23 ++++++++++++++++++-----
>>   1 file changed, 18 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/v3d/v3d_submit.c
>> b/drivers/gpu/drm/v3d/v3d_submit.c
>> index fe46dd316ca0..ed1a310bbd2f 100644
>> --- a/drivers/gpu/drm/v3d/v3d_submit.c
>> +++ b/drivers/gpu/drm/v3d/v3d_submit.c
>> @@ -135,6 +135,21 @@ void v3d_job_put(struct v3d_job *job)
>>          kref_put(&job->refcount, job->free);
>>   }
>>   
>> +static int
>> +v3d_job_allocate(void **container, size_t size)
>> +{
>> +       if (*container)
>> +               return 0;
> 
> Mmm... is this really what we want? At least right now we expect

Yeah, it is.

> v3d_job_allocate to always allocate memory, is there any scenario in
> which we would expect to call this with an already allocated container?

Take a look here:

   19 int
   18 v3d_submit_cpu_ioctl(struct drm_device *dev, void *data,
   17                      struct drm_file *file_priv)
   16 {
   15         struct v3d_dev *v3d = to_v3d_dev(dev);
   14         struct drm_v3d_submit_cpu *args = data;
   13         struct v3d_submit_ext se = {0};
   12         struct v3d_submit_ext *out_se = NULL;
   11         struct v3d_cpu_job *cpu_job = NULL;
   10         struct v3d_csd_job *csd_job = NULL;
    9         struct v3d_job *clean_job = NULL;
    8         struct ww_acquire_ctx acquire_ctx;
    7         int ret;
    6
    5         if (args->flags && !(args->flags & 
DRM_V3D_SUBMIT_EXTENSION)) {
    4                 DRM_INFO("invalid flags: %d\n", args->flags);
    3                 return -EINVAL;
    2         }
    1
1187         ret = v3d_job_allocate((void *)&cpu_job, sizeof(*cpu_job));
    1         if (ret)
    2                 return ret;

Here we allocate the CPU job, as we need it allocated to fill its fields
with the information in the extensions.

    3
    4         if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
    5                 ret = v3d_get_extensions(file_priv, 
args->extensions, &se, cpu_job);

So, here we filling the CPU job fields with relevant information.

    6                 if (ret) {
    7                         DRM_DEBUG("Failed to get extensions.\n");
    8                         goto fail;
    9                 }
   10         }
   11
   12         /* Every CPU job must have a CPU job user extension */
   13         if (!cpu_job->job_type) {
   14                 DRM_DEBUG("CPU job must have a CPU job user 
extension.\n");
   15                 goto fail;
   16         }
   17
   18         trace_v3d_submit_cpu_ioctl(&v3d->drm, cpu_job->job_type);
   19
   20         ret = v3d_job_init(v3d, file_priv, (void *)&cpu_job, 
sizeof(*cpu_job),
   21                            v3d_job_free, 0, &se, V3D_CPU);

Here we are initiating the job (drm_sched_job_init and syncobjs stuff).
If I don't have this condition in v3d_job_allocate, I would allocate
the container twice.

   22         if (ret)
   23                 goto fail;

Best Regards,
- Maíra

> 
> Iago
> 
>> +
>> +       *container = kcalloc(1, size, GFP_KERNEL);
>> +       if (!*container) {
>> +               DRM_ERROR("Cannot allocate memory for V3D job.\n");
>> +               return -ENOMEM;
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>>   static int
>>   v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
>>               void **container, size_t size, void (*free)(struct kref
>> *ref),
>> @@ -145,11 +160,9 @@ v3d_job_init(struct v3d_dev *v3d, struct
>> drm_file *file_priv,
>>          bool has_multisync = se && (se->flags &
>> DRM_V3D_EXT_ID_MULTI_SYNC);
>>          int ret, i;
>>   
>> -       *container = kcalloc(1, size, GFP_KERNEL);
>> -       if (!*container) {
>> -               DRM_ERROR("Cannot allocate memory for v3d job.");
>> -               return -ENOMEM;
>> -       }
>> +       ret = v3d_job_allocate(container, size);
>> +       if (ret)
>> +               return ret;
>>   
>>          job = *container;
>>          job->v3d = v3d;
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 13/17] drm/v3d: Create a CPU job extension for the timestamp query job
  2023-11-27  9:16   ` Iago Toral
@ 2023-11-27 13:26     ` Maira Canal
  0 siblings, 0 replies; 25+ messages in thread
From: Maira Canal @ 2023-11-27 13:26 UTC (permalink / raw)
  To: Iago Toral, Melissa Wen, David Airlie, Daniel Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann
  Cc: kernel-dev, Boris Brezillon, dri-devel, Faith Ekstrand

Hi Iago,

On 11/27/23 06:16, Iago Toral wrote:
> El jue, 23-11-2023 a las 21:47 -0300, Maíra Canal escribió:
> 
> (...)
>> +static void
>> +v3d_timestamp_query(struct v3d_cpu_job *job)
>> +{
>> +       struct v3d_timestamp_query_info *timestamp_query = &job-
>>> timestamp_query;
>> +       struct v3d_bo *bo = to_v3d_bo(job->base.bo[0]);
> 
> I presume there is an implicit expectation here that a timestamp query
> job only has one BO where we are writing query results, right? Maybe we
> should document this more explicitly in the uAPI and also check that we
> have at least that one BO before trying to map it and write to it?

I'll do it.

> 
> Also, is there a reason why we take the job reference from job-
>> base.bo[0] instead of job->bo[0]?

job is a struct v3d_cpu_job, which has a struct v3d_job as base.
The BOs are stored on struct v3d_job, as this is a common functionality
of all job types.

Best Regards,
- Maíra

> 
> Iago
> 
>> +       u8 *value_addr;
>> +
>> +       v3d_get_bo_vaddr(bo);
>> +
>> +       for (int i = 0; i < timestamp_query->count; i++) {
>> +               value_addr = ((u8 *) bo->vaddr) + timestamp_query-
>>> queries[i].offset;
>> +               *((u64 *) value_addr) = i == 0 ? ktime_get_ns() :
>> 0ull;
>> +
>> +               drm_syncobj_replace_fence(timestamp_query-
>>> queries[i].syncobj,
>> +                                         job->base.done_fence);
>> +       }
>> +
>> +       v3d_put_bo_vaddr(bo);
>> +}
>> +
>>   static struct dma_fence *
>>   v3d_cpu_job_run(struct drm_sched_job *sched_job)
>>   {
>> @@ -315,6 +352,7 @@ v3d_cpu_job_run(struct drm_sched_job *sched_job)
>>   
>>          void (*v3d_cpu_job_fn[])(struct v3d_cpu_job *job) = {
>>                  [V3D_CPU_JOB_TYPE_INDIRECT_CSD] =
>> v3d_rewrite_csd_job_wg_counts_from_indirect,
>> +               [V3D_CPU_JOB_TYPE_TIMESTAMP_QUERY] =
>> v3d_timestamp_query,
>>          };
>>   
>>          v3d->cpu_job = job;
>> @@ -504,7 +542,7 @@ static const struct drm_sched_backend_ops
>> v3d_cache_clean_sched_ops = {
>>   static const struct drm_sched_backend_ops v3d_cpu_sched_ops = {
>>          .run_job = v3d_cpu_job_run,
>>          .timedout_job = v3d_generic_job_timedout,
>> -       .free_job = v3d_sched_job_free
>> +       .free_job = v3d_cpu_job_free
>>   };
>>   
>>   int
>> diff --git a/drivers/gpu/drm/v3d/v3d_submit.c
>> b/drivers/gpu/drm/v3d/v3d_submit.c
>> index b6707ef42706..2f03c8bca593 100644
>> --- a/drivers/gpu/drm/v3d/v3d_submit.c
>> +++ b/drivers/gpu/drm/v3d/v3d_submit.c
>> @@ -438,6 +438,64 @@ v3d_get_cpu_indirect_csd_params(struct drm_file
>> *file_priv,
>>                                            NULL, &info->acquire_ctx);
>>   }
>>   
>> +/* Get data for the query timestamp job submission. */
>> +static int
>> +v3d_get_cpu_timestamp_query_params(struct drm_file *file_priv,
>> +                                  struct drm_v3d_extension __user
>> *ext,
>> +                                  struct v3d_cpu_job *job)
>> +{
>> +       u32 __user *offsets, *syncs;
>> +       struct drm_v3d_timestamp_query timestamp;
>> +
>> +       if (!job) {
>> +               DRM_DEBUG("CPU job extension was attached to a GPU
>> job.\n");
>> +               return -EINVAL;
>> +       }
>> +
>> +       if (job->job_type) {
>> +               DRM_DEBUG("Two CPU job extensions were added to the
>> same CPU job.\n");
>> +               return -EINVAL;
>> +       }
>> +
>> +       if (copy_from_user(&timestamp, ext, sizeof(timestamp)))
>> +               return -EFAULT;
>> +
>> +       if (timestamp.pad)
>> +               return -EINVAL;
>> +
>> +       job->job_type = V3D_CPU_JOB_TYPE_TIMESTAMP_QUERY;
>> +
>> +       job->timestamp_query.queries =
>> kvmalloc_array(timestamp.count,
>> +                                                     sizeof(struct
>> v3d_timestamp_query),
>> +                                                     GFP_KERNEL);
>> +       if (!job->timestamp_query.queries)
>> +               return -ENOMEM;
>> +
>> +       offsets = u64_to_user_ptr(timestamp.offsets);
>> +       syncs = u64_to_user_ptr(timestamp.syncs);
>> +
>> +       for (int i = 0; i < timestamp.count; i++) {
>> +               u32 offset, sync;
>> +
>> +               if (copy_from_user(&offset, offsets++,
>> sizeof(offset))) {
>> +                       kvfree(job->timestamp_query.queries);
>> +                       return -EFAULT;
>> +               }
>> +
>> +               job->timestamp_query.queries[i].offset = offset;
>> +
>> +               if (copy_from_user(&sync, syncs++, sizeof(sync))) {
>> +                       kvfree(job->timestamp_query.queries);
>> +                       return -EFAULT;
>> +               }
>> +
>> +               job->timestamp_query.queries[i].syncobj =
>> drm_syncobj_find(file_priv, sync);
>> +       }
>> +       job->timestamp_query.count = timestamp.count;
>> +
>> +       return 0;
>> +}
>> +
>>   /* Whenever userspace sets ioctl extensions, v3d_get_extensions
>> parses data
>>    * according to the extension id (name).
>>    */
>> @@ -466,6 +524,9 @@ v3d_get_extensions(struct drm_file *file_priv,
>>                  case DRM_V3D_EXT_ID_CPU_INDIRECT_CSD:
>>                          ret =
>> v3d_get_cpu_indirect_csd_params(file_priv, user_ext, job);
>>                          break;
>> +               case DRM_V3D_EXT_ID_CPU_TIMESTAMP_QUERY:
>> +                       ret =
>> v3d_get_cpu_timestamp_query_params(file_priv, user_ext, job);
>> +                       break;
>>                  default:
>>                          DRM_DEBUG_DRIVER("Unknown extension id:
>> %d\n", ext.id);
>>                          return -EINVAL;
>> @@ -954,6 +1015,7 @@ v3d_submit_cpu_ioctl(struct drm_device *dev,
>> void *data,
>>          v3d_job_cleanup((void *)csd_job);
>>          v3d_job_cleanup(clean_job);
>>          v3d_put_multisync_post_deps(&se);
>> +       kvfree(cpu_job->timestamp_query.queries);
>>   
>>          return ret;
>>   }
>> diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
>> index 059b84984fb0..65d5de076366 100644
>> --- a/include/uapi/drm/v3d_drm.h
>> +++ b/include/uapi/drm/v3d_drm.h
>> @@ -73,6 +73,7 @@ struct drm_v3d_extension {
>>          __u32 id;
>>   #define DRM_V3D_EXT_ID_MULTI_SYNC                      0x01
>>   #define DRM_V3D_EXT_ID_CPU_INDIRECT_CSD                0x02
>> +#define DRM_V3D_EXT_ID_CPU_TIMESTAMP_QUERY             0x03
>>          __u32 flags; /* mbz */
>>   };
>>   
>> @@ -400,6 +401,32 @@ struct drm_v3d_indirect_csd {
>>          __u32 wg_uniform_offsets[3];
>>   };
>>   
>> +/**
>> + * struct drm_v3d_timestamp_query - ioctl extension for the CPU job
>> to calculate
>> + * a timestamp query
>> + *
>> + * When an extension DRM_V3D_EXT_ID_TIMESTAMP_QUERY is defined, it
>> points to
>> + * this extension to define a timestamp query submission. This CPU
>> job will
>> + * calculate the timestamp query and update the query value within
>> the
>> + * timestamp BO. Moreover, it will signal the timestamp syncobj to
>> indicate
>> + * query availability.
>> + */
>> +struct drm_v3d_timestamp_query {
>> +       struct drm_v3d_extension base;
>> +
>> +       /* Array of queries' offsets within the timestamp BO for
>> their value */
>> +       __u64 offsets;
>> +
>> +       /* Array of timestamp's syncobjs to indicate its availability
>> */
>> +       __u64 syncs;
>> +
>> +       /* Number of queries */
>> +       __u32 count;
>> +
>> +       /* mbz */
>> +       __u32 pad;
>> +};
>> +
>>   struct drm_v3d_submit_cpu {
>>          /* Pointer to a u32 array of the BOs that are referenced by
>> the job. */
>>          __u64 bo_handles;
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2023-11-27 13:26 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-24  0:46 [PATCH v2 00/17] drm/v3d: Introduce CPU jobs Maíra Canal
2023-11-24  0:46 ` [PATCH v2 01/17] drm/v3d: Remove unused function header Maíra Canal
2023-11-24  0:46 ` [PATCH v2 02/17] drm/v3d: Move wait BO ioctl to the v3d_bo file Maíra Canal
2023-11-24  0:46 ` [PATCH v2 03/17] drm/v3d: Detach job submissions IOCTLs to a new specific file Maíra Canal
2023-11-24  0:47 ` [PATCH v2 04/17] drm/v3d: Simplify job refcount handling Maíra Canal
2023-11-27  7:57   ` Iago Toral
2023-11-24  0:47 ` [PATCH v2 05/17] drm/v3d: Don't allow two multisync extensions in the same job Maíra Canal
2023-11-24  0:47 ` [PATCH v2 06/17] drm/v3d: Decouple job allocation from job initiation Maíra Canal
2023-11-27  8:12   ` Iago Toral
2023-11-27 12:50     ` Maira Canal
2023-11-24  0:47 ` [PATCH v2 07/17] drm/v3d: Add a CPU job submission Maíra Canal
2023-11-27  8:45   ` Iago Toral
2023-11-27  8:59   ` Iago Toral
2023-11-24  0:47 ` [PATCH v2 08/17] drm/v3d: Use v3d_get_extensions() to parse CPU job data Maíra Canal
2023-11-24  0:47 ` [PATCH v2 09/17] drm/v3d: Create tracepoints to track the CPU job Maíra Canal
2023-11-24  0:47 ` [PATCH v2 10/17] drm/v3d: Detach the CSD job BO setup Maíra Canal
2023-11-24  0:47 ` [PATCH v2 11/17] drm/v3d: Enable BO mapping Maíra Canal
2023-11-24  0:47 ` [PATCH v2 12/17] drm/v3d: Create a CPU job extension for a indirect CSD job Maíra Canal
2023-11-24  0:47 ` [PATCH v2 13/17] drm/v3d: Create a CPU job extension for the timestamp query job Maíra Canal
2023-11-27  9:16   ` Iago Toral
2023-11-27 13:26     ` Maira Canal
2023-11-24  0:47 ` [PATCH v2 14/17] drm/v3d: Create a CPU job extension for the reset timestamp job Maíra Canal
2023-11-24  0:47 ` [PATCH v2 15/17] drm/v3d: Create a CPU job extension to copy timestamp query to a buffer Maíra Canal
2023-11-24  0:47 ` [PATCH v2 16/17] drm/v3d: Create a CPU job extension for the reset performance query job Maíra Canal
2023-11-24  0:47 ` [PATCH v2 17/17] drm/v3d: Create a CPU job extension for the copy " Maíra Canal

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.