All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/4] drm/v3d: add multiple in/out syncobjs support
@ 2021-09-29  9:41 Melissa Wen
  2021-09-29  9:42 ` [PATCH v2 1/4] drm/v3d: decouple adding job dependencies steps from job init Melissa Wen
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Melissa Wen @ 2021-09-29  9:41 UTC (permalink / raw)
  To: dri-devel
  Cc: Emma Anholt, David Airlie, Daniel Vetter, Maxime Ripard,
	Boris Brezillon, Iago Toral

[-- Attachment #1: Type: text/plain, Size: 3315 bytes --]

Currently, v3d only supports single in/out syncobj per submission (in
v3d_submit_cl, there are two in_sync, one for bin and another for render
job); however, Vulkan queue submit operations expect multiples wait and
signal semaphores. This series extends v3d interface and job dependency
operations to handle more than one in/out syncobj.

This version differs from the previous one by adding a prep work patch
that synthesizes job memory allocation and initialization into a single
function. Also, the design for ioctl extensions changed to subclass
drm_v3d_extension (generic type) as base element of any specific
extension (here is drm_v3d_multisync). Finally, in the multisync
extension, flags was replaced by wait_stage and it uses v3d_queue to
define sync stage (i.e. which queue to add wait semaphores)

The first patch just decouples the steps to lookup and add job
dependency from the job init code, since the operation repeats for every
syncobj that a job should wait before starting. So, the fourth patch of
this series will reuse it to handle multiples wait for semaphores.

The second patch moves job memory allocation to v3d_job_init() for any
type of v3d_job-based job. The main goal is to prevent errors when
handling job initialization failures (doing a proper cleanup). 

The third patch extends our interface by using a generic extension.
This approach was inspired by i915_user_extension[1] and
amd_cs_chunks[2] to give a little more flexibility in adding other
submission features in the future. Therefore, the list of extensions
would work as a hub of features that use an id to determine the
corresponding feature data type.

With this base, the fourth patch adds multiple wait/signal semaphores
support. For this, we add to the list of the generic extensions a new
data type (drm_v3d_multi_sync) that points to two arrays of syncobjs
(in/out) and also determines (wait_stage) the job to add wait
dependencies (v3d_queue). An auxiliary struct (v3d_submit_ext) is used
when parsing submission extensions. Finally, we reserve some space in
the semaphore struct (drm_v3d_sem) to accommodate timeline semaphores
that we aim to add support soon (same reason for already defining
v3d_submit_outsync).

[1] https://cgit.freedesktop.org/drm/drm-misc/commit/drivers/gpu/drm/i915/i915_user_extensions.c?id=9d1305ef80b95dde0337106ed8b826604e2155ad
[2] https://cgit.freedesktop.org/drm/drm-misc/tree/include/uapi/drm/amdgpu_drm.h#n556

In the mesa side, the work related to this series is in progress at
https://gitlab.freedesktop.org/mwen/mesa/-/commit/6c340bb35203a0418af87a7921d7295a4047b77f

v2:
- move job mem alloc to v3d_job_init (Iago)
- simplify and subclass the generic extension struct (Daniel)
- make job dependency conditions more understandable (Iago)


Melissa Wen (4):
  drm/v3d: decouple adding job dependencies steps from job init
  drm/v3d: alloc and init job in one shot
  drm/v3d: add generic ioctl extension
  drm/v3d: add multiple syncobjs support

 drivers/gpu/drm/v3d/v3d_drv.c |  10 +-
 drivers/gpu/drm/v3d/v3d_drv.h |  23 ++-
 drivers/gpu/drm/v3d/v3d_gem.c | 360 +++++++++++++++++++++++++---------
 include/uapi/drm/v3d_drm.h    |  78 ++++++++
 4 files changed, 368 insertions(+), 103 deletions(-)

-- 
2.30.2


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 1/4] drm/v3d: decouple adding job dependencies steps from job init
  2021-09-29  9:41 [PATCH v2 0/4] drm/v3d: add multiple in/out syncobjs support Melissa Wen
@ 2021-09-29  9:42 ` Melissa Wen
  2021-09-29  9:43 ` [PATCH v2 2/4] drm/v3d: alloc and init job in one shot Melissa Wen
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Melissa Wen @ 2021-09-29  9:42 UTC (permalink / raw)
  To: dri-devel
  Cc: Emma Anholt, David Airlie, Daniel Vetter, Maxime Ripard,
	Boris Brezillon, Iago Toral

[-- Attachment #1: Type: text/plain, Size: 2695 bytes --]

Prep work to enable a job to wait for more than one syncobj before
start. Also get rid of old checkpatch warnings in the v3d_gem file.
No functional changes.

Signed-off-by: Melissa Wen <mwen@igalia.com>
---
 drivers/gpu/drm/v3d/v3d_gem.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index ead0be8d48a7..e60fbc28ef29 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -416,7 +416,7 @@ v3d_wait_bo_ioctl(struct drm_device *dev, void *data,
 		return -EINVAL;
 
 	ret = drm_gem_dma_resv_wait(file_priv, args->handle,
-					      true, timeout_jiffies);
+				    true, timeout_jiffies);
 
 	/* Decrement the user's timeout, in case we got interrupted
 	 * such that the ioctl will be restarted.
@@ -434,12 +434,25 @@ v3d_wait_bo_ioctl(struct drm_device *dev, void *data,
 	return ret;
 }
 
+static int
+v3d_job_add_deps(struct drm_file *file_priv, struct v3d_job *job,
+		 u32 in_sync, u32 point)
+{
+	struct dma_fence *in_fence = NULL;
+	int ret;
+
+	ret = drm_syncobj_find_fence(file_priv, in_sync, point, 0, &in_fence);
+	if (ret == -EINVAL)
+		return ret;
+
+	return drm_sched_job_add_dependency(&job->base, in_fence);
+}
+
 static int
 v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
 	     struct v3d_job *job, void (*free)(struct kref *ref),
 	     u32 in_sync, enum v3d_queue queue)
 {
-	struct dma_fence *in_fence = NULL;
 	struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
 	int ret;
 
@@ -455,11 +468,7 @@ v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
 	if (ret)
 		goto fail;
 
-	ret = drm_syncobj_find_fence(file_priv, in_sync, 0, 0, &in_fence);
-	if (ret == -EINVAL)
-		goto fail_job;
-
-	ret = drm_sched_job_add_dependency(&job->base, in_fence);
+	ret = v3d_job_add_deps(file_priv, job, in_sync, 0);
 	if (ret)
 		goto fail_job;
 
@@ -499,7 +508,7 @@ v3d_attach_fences_and_unlock_reservation(struct drm_file *file_priv,
 	for (i = 0; i < job->bo_count; i++) {
 		/* XXX: Use shared fences for read-only objects. */
 		dma_resv_add_excl_fence(job->bo[i]->resv,
-						  job->done_fence);
+					job->done_fence);
 	}
 
 	drm_gem_unlock_reservations(job->bo, job->bo_count, acquire_ctx);
@@ -903,8 +912,7 @@ v3d_gem_init(struct drm_device *dev)
 	if (!v3d->pt) {
 		drm_mm_takedown(&v3d->mm);
 		dev_err(v3d->drm.dev,
-			"Failed to allocate page tables. "
-			"Please ensure you have CMA enabled.\n");
+			"Failed to allocate page tables. Please ensure you have CMA enabled.\n");
 		return -ENOMEM;
 	}
 
-- 
2.30.2


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 2/4] drm/v3d: alloc and init job in one shot
  2021-09-29  9:41 [PATCH v2 0/4] drm/v3d: add multiple in/out syncobjs support Melissa Wen
  2021-09-29  9:42 ` [PATCH v2 1/4] drm/v3d: decouple adding job dependencies steps from job init Melissa Wen
@ 2021-09-29  9:43 ` Melissa Wen
  2021-09-30  8:44   ` Iago Toral
  2021-09-29  9:44 ` [PATCH v2 3/4] drm/v3d: add generic ioctl extension Melissa Wen
  2021-09-29  9:45 ` [PATCH v2 4/4] drm/v3d: add multiple syncobjs support Melissa Wen
  3 siblings, 1 reply; 13+ messages in thread
From: Melissa Wen @ 2021-09-29  9:43 UTC (permalink / raw)
  To: dri-devel
  Cc: Emma Anholt, David Airlie, Daniel Vetter, Maxime Ripard,
	Boris Brezillon, Iago Toral

[-- Attachment #1: Type: text/plain, Size: 6304 bytes --]

Move job memory allocation to v3d_job_init function. This aim to facilitate
error handling in job initialization, since cleanup steps are similar for all
(struct v3d_job)-based types of job involved in a command submission. To
generalize v3d_job_init(), this change takes into account that all job structs
have the first element a struct v3d_job (bin, render, tfu, csd) or it is a
v3d_job itself (clean_job) for pointer casting.

Suggested-by: Iago Toral <itoral@igalia.com>
Signed-off-by: Melissa Wen <mwen@igalia.com>
---
 drivers/gpu/drm/v3d/v3d_gem.c | 115 +++++++++++++---------------------
 1 file changed, 42 insertions(+), 73 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index e60fbc28ef29..9cfa6f8d4357 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -392,6 +392,9 @@ v3d_render_job_free(struct kref *ref)
 
 void v3d_job_cleanup(struct v3d_job *job)
 {
+	if (!job)
+		return;
+
 	drm_sched_job_cleanup(&job->base);
 	v3d_job_put(job);
 }
@@ -450,12 +453,20 @@ v3d_job_add_deps(struct drm_file *file_priv, struct v3d_job *job,
 
 static int
 v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
-	     struct v3d_job *job, void (*free)(struct kref *ref),
+	     void **container, size_t size, void (*free)(struct kref *ref),
 	     u32 in_sync, enum v3d_queue queue)
 {
 	struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
+	struct v3d_job *job;
 	int ret;
 
+	*container = kcalloc(1, size, GFP_KERNEL);
+	if (!*container) {
+		DRM_ERROR("Cannot allocate memory for v3d job.");
+		return -ENOMEM;
+	}
+
+	job = *container;
 	job->v3d = v3d;
 	job->free = free;
 
@@ -479,6 +490,9 @@ v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
 	drm_sched_job_cleanup(&job->base);
 fail:
 	pm_runtime_put_autosuspend(v3d->drm.dev);
+	kfree(*container);
+	*container = NULL;
+
 	return ret;
 }
 
@@ -558,35 +572,20 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 		return -EINVAL;
 	}
 
-	render = kcalloc(1, sizeof(*render), GFP_KERNEL);
-	if (!render)
-		return -ENOMEM;
+	ret = v3d_job_init(v3d, file_priv, (void *)&render, sizeof(*render),
+			   v3d_render_job_free, args->in_sync_bcl, V3D_RENDER);
+	if (ret)
+		goto fail;
 
 	render->start = args->rcl_start;
 	render->end = args->rcl_end;
 	INIT_LIST_HEAD(&render->unref_list);
 
-	ret = v3d_job_init(v3d, file_priv, &render->base,
-			   v3d_render_job_free, args->in_sync_rcl, V3D_RENDER);
-	if (ret) {
-		kfree(render);
-		return ret;
-	}
-
 	if (args->bcl_start != args->bcl_end) {
-		bin = kcalloc(1, sizeof(*bin), GFP_KERNEL);
-		if (!bin) {
-			v3d_job_cleanup(&render->base);
-			return -ENOMEM;
-		}
-
-		ret = v3d_job_init(v3d, file_priv, &bin->base,
+		ret = v3d_job_init(v3d, file_priv, (void *)&bin, sizeof(*bin),
 				   v3d_job_free, args->in_sync_bcl, V3D_BIN);
-		if (ret) {
-			v3d_job_cleanup(&render->base);
-			kfree(bin);
-			return ret;
-		}
+		if (ret)
+			goto fail;
 
 		bin->start = args->bcl_start;
 		bin->end = args->bcl_end;
@@ -597,18 +596,10 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 	}
 
 	if (args->flags & DRM_V3D_SUBMIT_CL_FLUSH_CACHE) {
-		clean_job = kcalloc(1, sizeof(*clean_job), GFP_KERNEL);
-		if (!clean_job) {
-			ret = -ENOMEM;
-			goto fail;
-		}
-
-		ret = v3d_job_init(v3d, file_priv, clean_job, v3d_job_free, 0, V3D_CACHE_CLEAN);
-		if (ret) {
-			kfree(clean_job);
-			clean_job = NULL;
+		ret = v3d_job_init(v3d, file_priv, (void *)&clean_job, sizeof(*clean_job),
+				   v3d_job_free, 0, V3D_CACHE_CLEAN);
+		if (ret)
 			goto fail;
-		}
 
 		last_job = clean_job;
 	} else {
@@ -681,11 +672,9 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 	drm_gem_unlock_reservations(last_job->bo,
 				    last_job->bo_count, &acquire_ctx);
 fail:
-	if (bin)
-		v3d_job_cleanup(&bin->base);
-	v3d_job_cleanup(&render->base);
-	if (clean_job)
-		v3d_job_cleanup(clean_job);
+	v3d_job_cleanup((void *)bin);
+	v3d_job_cleanup((void *)render);
+	v3d_job_cleanup(clean_job);
 
 	return ret;
 }
@@ -711,22 +700,16 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
 
 	trace_v3d_submit_tfu_ioctl(&v3d->drm, args->iia);
 
-	job = kcalloc(1, sizeof(*job), GFP_KERNEL);
-	if (!job)
-		return -ENOMEM;
-
-	ret = v3d_job_init(v3d, file_priv, &job->base,
+	ret = v3d_job_init(v3d, file_priv, (void *)&job, sizeof(*job),
 			   v3d_job_free, args->in_sync, V3D_TFU);
-	if (ret) {
-		kfree(job);
-		return ret;
-	}
+	if (ret)
+		goto fail;
 
 	job->base.bo = kcalloc(ARRAY_SIZE(args->bo_handles),
 			       sizeof(*job->base.bo), GFP_KERNEL);
 	if (!job->base.bo) {
-		v3d_job_cleanup(&job->base);
-		return -ENOMEM;
+		ret = -ENOMEM;
+		goto fail;
 	}
 
 	job->args = *args;
@@ -773,7 +756,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
 	return 0;
 
 fail:
-	v3d_job_cleanup(&job->base);
+	v3d_job_cleanup((void *)job);
 
 	return ret;
 }
@@ -806,29 +789,15 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
 		return -EINVAL;
 	}
 
-	job = kcalloc(1, sizeof(*job), GFP_KERNEL);
-	if (!job)
-		return -ENOMEM;
-
-	ret = v3d_job_init(v3d, file_priv, &job->base,
+	ret = v3d_job_init(v3d, file_priv, (void *)&job, sizeof(*job),
 			   v3d_job_free, args->in_sync, V3D_CSD);
-	if (ret) {
-		kfree(job);
-		return ret;
-	}
-
-	clean_job = kcalloc(1, sizeof(*clean_job), GFP_KERNEL);
-	if (!clean_job) {
-		v3d_job_cleanup(&job->base);
-		return -ENOMEM;
-	}
+	if (ret)
+		goto fail;
 
-	ret = v3d_job_init(v3d, file_priv, clean_job, v3d_job_free, 0, V3D_CACHE_CLEAN);
-	if (ret) {
-		v3d_job_cleanup(&job->base);
-		kfree(clean_job);
-		return ret;
-	}
+	ret = v3d_job_init(v3d, file_priv, (void *)&clean_job, sizeof(*clean_job),
+			   v3d_job_free, 0, V3D_CACHE_CLEAN);
+	if (ret)
+		goto fail;
 
 	job->args = *args;
 
@@ -877,7 +846,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
 	drm_gem_unlock_reservations(clean_job->bo, clean_job->bo_count,
 				    &acquire_ctx);
 fail:
-	v3d_job_cleanup(&job->base);
+	v3d_job_cleanup((void *)job);
 	v3d_job_cleanup(clean_job);
 
 	return ret;
-- 
2.30.2


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 3/4] drm/v3d: add generic ioctl extension
  2021-09-29  9:41 [PATCH v2 0/4] drm/v3d: add multiple in/out syncobjs support Melissa Wen
  2021-09-29  9:42 ` [PATCH v2 1/4] drm/v3d: decouple adding job dependencies steps from job init Melissa Wen
  2021-09-29  9:43 ` [PATCH v2 2/4] drm/v3d: alloc and init job in one shot Melissa Wen
@ 2021-09-29  9:44 ` Melissa Wen
  2021-09-30  9:11   ` Iago Toral
  2021-09-29  9:45 ` [PATCH v2 4/4] drm/v3d: add multiple syncobjs support Melissa Wen
  3 siblings, 1 reply; 13+ messages in thread
From: Melissa Wen @ 2021-09-29  9:44 UTC (permalink / raw)
  To: dri-devel
  Cc: Emma Anholt, David Airlie, Daniel Vetter, Maxime Ripard,
	Boris Brezillon, Iago Toral

[-- Attachment #1: Type: text/plain, Size: 7144 bytes --]

Add support to attach generic extensions on job submission. This patch
is third prep work to enable multiple syncobjs on job submission. With
this work, when the job submission interface needs to be extended to
accomodate a new feature, we will use a generic extension struct where
an id determines the data type to be pointed. The first application is
to enable multiples in/out syncobj (next patch), but the base is
already done for future features. Therefore, to attach a new feature,
a specific extension struct should subclass drm_v3d_extension and
update the list of extensions in a job submission.

v2:
- remove redundant elements to subclass struct (Daniel)

Signed-off-by: Melissa Wen <mwen@igalia.com>
---
 drivers/gpu/drm/v3d/v3d_drv.c |  4 +-
 drivers/gpu/drm/v3d/v3d_gem.c | 71 +++++++++++++++++++++++++++++++++--
 include/uapi/drm/v3d_drm.h    | 31 +++++++++++++++
 3 files changed, 100 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index c1deab2cf38d..3d6b9bcce2f7 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -83,7 +83,6 @@ static int v3d_get_param_ioctl(struct drm_device *dev, void *data,
 		return 0;
 	}
 
-
 	switch (args->param) {
 	case DRM_V3D_PARAM_SUPPORTS_TFU:
 		args->value = 1;
@@ -147,7 +146,7 @@ v3d_postclose(struct drm_device *dev, struct drm_file *file)
 DEFINE_DRM_GEM_FOPS(v3d_drm_fops);
 
 /* DRM_AUTH is required on SUBMIT_CL for now, while we don't have GMP
- * protection between clients.  Note that render nodes would be be
+ * protection between clients.  Note that render nodes would be
  * able to submit CLs that could access BOs from clients authenticated
  * with the master node.  The TFU doesn't use the GMP, so it would
  * need to stay DRM_AUTH until we do buffer size/offset validation.
@@ -219,7 +218,6 @@ static int v3d_platform_drm_probe(struct platform_device *pdev)
 	u32 mmu_debug;
 	u32 ident1;
 
-
 	v3d = devm_drm_dev_alloc(dev, &v3d_drm_driver, struct v3d_dev, drm);
 	if (IS_ERR(v3d))
 		return PTR_ERR(v3d);
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index 9cfa6f8d4357..b912419027f7 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -535,6 +535,33 @@ v3d_attach_fences_and_unlock_reservation(struct drm_file *file_priv,
 	}
 }
 
+static int
+v3d_get_extensions(struct drm_file *file_priv, u64 ext_handles)
+{
+	struct drm_v3d_extension __user *user_ext;
+
+	user_ext = u64_to_user_ptr(ext_handles);
+	while(user_ext) {
+		struct drm_v3d_extension ext;
+
+		if (copy_from_user(&ext, user_ext, sizeof(ext))) {
+			DRM_DEBUG("Failed to copy submit extension\n");
+			return -EFAULT;
+		}
+
+		switch (ext.id) {
+		case 0:
+		default:
+			DRM_DEBUG_DRIVER("Unknown extension id: %d\n", ext.id);
+			return -EINVAL;
+		}
+
+		user_ext = u64_to_user_ptr(ext.next);
+	}
+
+	return 0;
+}
+
 /**
  * v3d_submit_cl_ioctl() - Submits a job (frame) to the V3D.
  * @dev: DRM device
@@ -563,15 +590,24 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 
 	trace_v3d_submit_cl_ioctl(&v3d->drm, args->rcl_start, args->rcl_end);
 
-	if (args->pad != 0)
+	if (args->pad)
 		return -EINVAL;
 
-	if (args->flags != 0 &&
-	    args->flags != DRM_V3D_SUBMIT_CL_FLUSH_CACHE) {
+	if (args->flags &&
+	    args->flags & ~(DRM_V3D_SUBMIT_CL_FLUSH_CACHE |
+			    DRM_V3D_SUBMIT_EXTENSION)) {
 		DRM_INFO("invalid flags: %d\n", args->flags);
 		return -EINVAL;
 	}
 
+	if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
+		ret = v3d_get_extensions(file_priv, args->extensions);
+		if (ret) {
+			DRM_DEBUG("Failed to get extensions.\n");
+			return ret;
+		}
+	}
+
 	ret = v3d_job_init(v3d, file_priv, (void *)&render, sizeof(*render),
 			   v3d_render_job_free, args->in_sync_bcl, V3D_RENDER);
 	if (ret)
@@ -700,6 +736,19 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
 
 	trace_v3d_submit_tfu_ioctl(&v3d->drm, args->iia);
 
+	if (args->flags && !(args->flags & DRM_V3D_SUBMIT_EXTENSION)) {
+		DRM_DEBUG("invalid flags: %d\n", args->flags);
+		return -EINVAL;
+	}
+
+	if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
+		ret = v3d_get_extensions(file_priv, args->extensions);
+		if (ret) {
+			DRM_DEBUG("Failed to get extensions.\n");
+			return ret;
+		}
+	}
+
 	ret = v3d_job_init(v3d, file_priv, (void *)&job, sizeof(*job),
 			   v3d_job_free, args->in_sync, V3D_TFU);
 	if (ret)
@@ -784,11 +833,27 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
 
 	trace_v3d_submit_csd_ioctl(&v3d->drm, args->cfg[5], args->cfg[6]);
 
+	if (args->pad)
+		return -EINVAL;
+
 	if (!v3d_has_csd(v3d)) {
 		DRM_DEBUG("Attempting CSD submit on non-CSD hardware\n");
 		return -EINVAL;
 	}
 
+	if (args->flags && !(args->flags & DRM_V3D_SUBMIT_EXTENSION)) {
+		DRM_INFO("invalid flags: %d\n", args->flags);
+		return -EINVAL;
+	}
+
+	if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
+		ret = v3d_get_extensions(file_priv, args->extensions);
+		if (ret) {
+			DRM_DEBUG("Failed to get extensions.\n");
+			return ret;
+		}
+	}
+
 	ret = v3d_job_init(v3d, file_priv, (void *)&job, sizeof(*job),
 			   v3d_job_free, args->in_sync, V3D_CSD);
 	if (ret)
diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
index 4104f22fb3d3..55b443ca6c0b 100644
--- a/include/uapi/drm/v3d_drm.h
+++ b/include/uapi/drm/v3d_drm.h
@@ -58,6 +58,20 @@ extern "C" {
 						   struct drm_v3d_perfmon_get_values)
 
 #define DRM_V3D_SUBMIT_CL_FLUSH_CACHE             0x01
+#define DRM_V3D_SUBMIT_EXTENSION		  0x02
+
+/* struct drm_v3d_extension - ioctl extensions
+ *
+ * Linked-list of generic extensions where the id identify which struct is
+ * pointed by ext_data. Therefore, DRM_V3D_EXT_ID_* is used on id to identify
+ * the extension type.
+ */
+struct drm_v3d_extension {
+	__u64 next;
+	__u32 id;
+#define DRM_V3D_EXT_ID_MULTI_SYNC		0x01
+	__u32 flags; /* mbz */
+};
 
 /**
  * struct drm_v3d_submit_cl - ioctl argument for submitting commands to the 3D
@@ -135,12 +149,16 @@ struct drm_v3d_submit_cl {
 	/* Number of BO handles passed in (size is that times 4). */
 	__u32 bo_handle_count;
 
+	/* DRM_V3D_SUBMIT_* properties */
 	__u32 flags;
 
 	/* ID of the perfmon to attach to this job. 0 means no perfmon. */
 	__u32 perfmon_id;
 
 	__u32 pad;
+
+	/* Pointer to an array of ioctl extensions*/
+	__u64 extensions;
 };
 
 /**
@@ -248,6 +266,12 @@ struct drm_v3d_submit_tfu {
 	__u32 in_sync;
 	/* Sync object to signal when the TFU job is done. */
 	__u32 out_sync;
+
+	__u32 flags;
+
+	/* Pointer to an array of ioctl extensions*/
+	__u64 extensions;
+
 };
 
 /* Submits a compute shader for dispatch.  This job will block on any
@@ -276,6 +300,13 @@ struct drm_v3d_submit_csd {
 
 	/* ID of the perfmon to attach to this job. 0 means no perfmon. */
 	__u32 perfmon_id;
+
+	/* Pointer to an array of ioctl extensions*/
+	__u64 extensions;
+
+	__u32 flags;
+
+	__u32 pad;
 };
 
 enum {
-- 
2.30.2


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 4/4] drm/v3d: add multiple syncobjs support
  2021-09-29  9:41 [PATCH v2 0/4] drm/v3d: add multiple in/out syncobjs support Melissa Wen
                   ` (2 preceding siblings ...)
  2021-09-29  9:44 ` [PATCH v2 3/4] drm/v3d: add generic ioctl extension Melissa Wen
@ 2021-09-29  9:45 ` Melissa Wen
  2021-09-30  9:54   ` Iago Toral
  3 siblings, 1 reply; 13+ messages in thread
From: Melissa Wen @ 2021-09-29  9:45 UTC (permalink / raw)
  To: dri-devel
  Cc: Emma Anholt, David Airlie, Daniel Vetter, Maxime Ripard,
	Boris Brezillon, Iago Toral

[-- Attachment #1: Type: text/plain, Size: 15456 bytes --]

Using the generic extension from the previous patch, a specific multisync
extension enables more than one in/out binary syncobj per job submission.
Arrays of syncobjs are set in struct drm_v3d_multisync, that also cares
of determining the stage for sync (wait deps) according to the job
queue.

v2:
- subclass the generic extension struct (Daniel)
- simplify adding dependency conditions to make understandable (Iago)

Signed-off-by: Melissa Wen <mwen@igalia.com>
---
 drivers/gpu/drm/v3d/v3d_drv.c |   6 +-
 drivers/gpu/drm/v3d/v3d_drv.h |  23 +++--
 drivers/gpu/drm/v3d/v3d_gem.c | 176 ++++++++++++++++++++++++++++++----
 include/uapi/drm/v3d_drm.h    |  49 +++++++++-
 4 files changed, 224 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index 3d6b9bcce2f7..bd46396a1ae0 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -96,6 +96,9 @@ static int v3d_get_param_ioctl(struct drm_device *dev, void *data,
 	case DRM_V3D_PARAM_SUPPORTS_PERFMON:
 		args->value = (v3d->ver >= 40);
 		return 0;
+	case DRM_V3D_PARAM_SUPPORTS_MULTISYNC_EXT:
+		args->value = 1;
+		return 0;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", args->param);
 		return -EINVAL;
@@ -135,9 +138,8 @@ v3d_postclose(struct drm_device *dev, struct drm_file *file)
 	struct v3d_file_priv *v3d_priv = file->driver_priv;
 	enum v3d_queue q;
 
-	for (q = 0; q < V3D_MAX_QUEUES; q++) {
+	for (q = 0; q < V3D_MAX_QUEUES; q++)
 		drm_sched_entity_destroy(&v3d_priv->sched_entity[q]);
-	}
 
 	v3d_perfmon_close_file(v3d_priv);
 	kfree(v3d_priv);
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index b900a050d5e2..b14ff1e96f49 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -19,15 +19,6 @@ struct reset_control;
 
 #define GMP_GRANULARITY (128 * 1024)
 
-/* Enum for each of the V3D queues. */
-enum v3d_queue {
-	V3D_BIN,
-	V3D_RENDER,
-	V3D_TFU,
-	V3D_CSD,
-	V3D_CACHE_CLEAN,
-};
-
 #define V3D_MAX_QUEUES (V3D_CACHE_CLEAN + 1)
 
 struct v3d_queue_state {
@@ -294,6 +285,20 @@ struct v3d_csd_job {
 	struct drm_v3d_submit_csd args;
 };
 
+struct v3d_submit_outsync {
+	struct drm_syncobj *syncobj;
+};
+
+struct v3d_submit_ext {
+	u32 wait_stage;
+
+	u32 in_sync_count;
+	u64 in_syncs;
+
+	u32 out_sync_count;
+	struct v3d_submit_outsync *out_syncs;
+};
+
 /**
  * __wait_for - magic wait macro
  *
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index b912419027f7..e92998d39eaa 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -454,11 +454,12 @@ v3d_job_add_deps(struct drm_file *file_priv, struct v3d_job *job,
 static int
 v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
 	     void **container, size_t size, void (*free)(struct kref *ref),
-	     u32 in_sync, enum v3d_queue queue)
+	     u32 in_sync, struct v3d_submit_ext *se, enum v3d_queue queue)
 {
 	struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
 	struct v3d_job *job;
-	int ret;
+	bool has_multisync = (se && se->in_sync_count);
+	int ret, i;
 
 	*container = kcalloc(1, size, GFP_KERNEL);
 	if (!*container) {
@@ -479,9 +480,28 @@ v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
 	if (ret)
 		goto fail;
 
-	ret = v3d_job_add_deps(file_priv, job, in_sync, 0);
-	if (ret)
-		goto fail_job;
+	if (has_multisync) {
+		if (se->wait_stage == queue) {
+			struct drm_v3d_sem __user *handle = u64_to_user_ptr(se->in_syncs);
+
+			for (i = 0; i < se->in_sync_count; i++) {
+				struct drm_v3d_sem in;
+
+				ret = copy_from_user(&in, handle++, sizeof(in));
+				if (ret) {
+					DRM_DEBUG("Failed to copy wait dep handle.\n");
+					goto fail_job;
+				}
+				ret = v3d_job_add_deps(file_priv, job, in.handle, 0);
+				if (ret)
+					goto fail_job;
+			}
+		}
+	} else {
+		ret = v3d_job_add_deps(file_priv, job, in_sync, 0);
+		if (ret)
+			goto fail_job;
+	}
 
 	kref_init(&job->refcount);
 
@@ -514,6 +534,7 @@ v3d_attach_fences_and_unlock_reservation(struct drm_file *file_priv,
 					 struct v3d_job *job,
 					 struct ww_acquire_ctx *acquire_ctx,
 					 u32 out_sync,
+					 struct v3d_submit_ext *se,
 					 struct dma_fence *done_fence)
 {
 	struct drm_syncobj *sync_out;
@@ -528,6 +549,18 @@ v3d_attach_fences_and_unlock_reservation(struct drm_file *file_priv,
 	drm_gem_unlock_reservations(job->bo, job->bo_count, acquire_ctx);
 
 	/* Update the return sync object for the job */
+	/* If multiples semaphores is supported */
+	if (se && se->out_sync_count) {
+		for (i = 0; i < se->out_sync_count; i++) {
+			drm_syncobj_replace_fence(se->out_syncs[i].syncobj,
+						  done_fence);
+			drm_syncobj_put(se->out_syncs[i].syncobj);
+		}
+		kvfree(se->out_syncs);
+		return;
+	}
+
+	/* Single signal semaphore */
 	sync_out = drm_syncobj_find(file_priv, out_sync);
 	if (sync_out) {
 		drm_syncobj_replace_fence(sync_out, done_fence);
@@ -535,13 +568,107 @@ v3d_attach_fences_and_unlock_reservation(struct drm_file *file_priv,
 	}
 }
 
+static void
+v3d_put_multisync_post_deps(struct v3d_submit_ext *se)
+{
+	unsigned int i;
+
+	for (i = 0; i < se->out_sync_count; i++)
+		drm_syncobj_put(se->out_syncs[i].syncobj);
+	kvfree(se->out_syncs);
+}
+
 static int
-v3d_get_extensions(struct drm_file *file_priv, u64 ext_handles)
+v3d_get_multisync_post_deps(struct drm_file *file_priv,
+			    struct v3d_submit_ext *se,
+			    u32 count, u64 handles)
+{
+	struct drm_v3d_sem __user *post_deps;
+	int i, ret;
+
+	se->out_syncs = (struct v3d_submit_outsync *)
+			kvmalloc_array(count,
+				       sizeof(struct v3d_submit_outsync),
+				       GFP_KERNEL);
+	if (!se->out_syncs)
+		return -ENOMEM;
+
+	post_deps = u64_to_user_ptr(handles);
+
+	for (i = 0; i < count; i++) {
+		struct drm_v3d_sem out;
+
+		ret = copy_from_user(&out, post_deps++, sizeof(out));
+		if (ret) {
+			DRM_DEBUG("Failed to copy post dep handles\n");
+			goto fail;
+		}
+
+		se->out_syncs[i].syncobj = drm_syncobj_find(file_priv,
+							    out.handle);
+		if (!se->out_syncs[i].syncobj) {
+			ret = -EINVAL;
+			goto fail;
+		}
+	}
+	se->out_sync_count = count;
+
+	return 0;
+
+fail:
+	for (i--; i >= 0; i--)
+		drm_syncobj_put(se->out_syncs[i].syncobj);
+	kvfree(se->out_syncs);
+
+	return ret;
+}
+
+/* Get data for multiple binary semaphores synchronization. Parse syncobj
+ * to be signaled when job completes (out_sync).
+ */
+static int
+v3d_get_multisync_submit_deps(struct drm_file *file_priv,
+			      struct drm_v3d_extension __user *ext,
+			      void *data)
+{
+	struct drm_v3d_multi_sync multisync;
+	struct v3d_submit_ext *se = data;
+	int ret;
+
+	ret = copy_from_user(&multisync, ext, sizeof(multisync));
+	if (ret)
+		return ret;
+
+	if (multisync.pad)
+		return -EINVAL;
+
+	ret = v3d_get_multisync_post_deps(file_priv, data, multisync.out_sync_count,
+					  multisync.out_syncs);
+	if (ret)
+		return ret;
+
+	se->in_sync_count = multisync.in_sync_count;
+	se->in_syncs = multisync.in_syncs;
+
+	se->wait_stage = multisync.wait_stage;
+
+	return 0;
+}
+
+/* Whenever userspace sets ioctl extensions, parses data according to the
+ * extension id (name). By now, generic ioctl extensions is only used to
+ * support multiple binary semaphores (struct drm_v3d_multi_sync).
+ */
+static int
+v3d_get_extensions(struct drm_file *file_priv,
+		   u64 ext_handles,
+		   void *data)
 {
 	struct drm_v3d_extension __user *user_ext;
+	int ret;
 
 	user_ext = u64_to_user_ptr(ext_handles);
-	while(user_ext) {
+	while (user_ext) {
 		struct drm_v3d_extension ext;
 
 		if (copy_from_user(&ext, user_ext, sizeof(ext))) {
@@ -550,7 +677,11 @@ v3d_get_extensions(struct drm_file *file_priv, u64 ext_handles)
 		}
 
 		switch (ext.id) {
-		case 0:
+		case DRM_V3D_EXT_ID_MULTI_SYNC:
+			ret = v3d_get_multisync_submit_deps(file_priv, user_ext, data);
+			if (ret)
+				return ret;
+			break;
 		default:
 			DRM_DEBUG_DRIVER("Unknown extension id: %d\n", ext.id);
 			return -EINVAL;
@@ -581,6 +712,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 	struct v3d_dev *v3d = to_v3d_dev(dev);
 	struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
 	struct drm_v3d_submit_cl *args = data;
+	struct v3d_submit_ext se = {0};
 	struct v3d_bin_job *bin = NULL;
 	struct v3d_render_job *render;
 	struct v3d_job *clean_job = NULL;
@@ -601,7 +733,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 	}
 
 	if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
-		ret = v3d_get_extensions(file_priv, args->extensions);
+		ret = v3d_get_extensions(file_priv, args->extensions, &se);
 		if (ret) {
 			DRM_DEBUG("Failed to get extensions.\n");
 			return ret;
@@ -609,7 +741,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 	}
 
 	ret = v3d_job_init(v3d, file_priv, (void *)&render, sizeof(*render),
-			   v3d_render_job_free, args->in_sync_bcl, V3D_RENDER);
+			   v3d_render_job_free, args->in_sync_bcl, &se, V3D_RENDER);
 	if (ret)
 		goto fail;
 
@@ -619,7 +751,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 
 	if (args->bcl_start != args->bcl_end) {
 		ret = v3d_job_init(v3d, file_priv, (void *)&bin, sizeof(*bin),
-				   v3d_job_free, args->in_sync_bcl, V3D_BIN);
+				   v3d_job_free, args->in_sync_bcl, &se, V3D_BIN);
 		if (ret)
 			goto fail;
 
@@ -633,7 +765,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 
 	if (args->flags & DRM_V3D_SUBMIT_CL_FLUSH_CACHE) {
 		ret = v3d_job_init(v3d, file_priv, (void *)&clean_job, sizeof(*clean_job),
-				   v3d_job_free, 0, V3D_CACHE_CLEAN);
+				   v3d_job_free, 0, 0, V3D_CACHE_CLEAN);
 		if (ret)
 			goto fail;
 
@@ -693,6 +825,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 						 last_job,
 						 &acquire_ctx,
 						 args->out_sync,
+						 &se,
 						 last_job->done_fence);
 
 	if (bin)
@@ -711,6 +844,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 	v3d_job_cleanup((void *)bin);
 	v3d_job_cleanup((void *)render);
 	v3d_job_cleanup(clean_job);
+	v3d_put_multisync_post_deps(&se);
 
 	return ret;
 }
@@ -730,6 +864,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
 {
 	struct v3d_dev *v3d = to_v3d_dev(dev);
 	struct drm_v3d_submit_tfu *args = data;
+	struct v3d_submit_ext se = {0};
 	struct v3d_tfu_job *job;
 	struct ww_acquire_ctx acquire_ctx;
 	int ret = 0;
@@ -742,7 +877,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
 	}
 
 	if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
-		ret = v3d_get_extensions(file_priv, args->extensions);
+		ret = v3d_get_extensions(file_priv, args->extensions, &se);
 		if (ret) {
 			DRM_DEBUG("Failed to get extensions.\n");
 			return ret;
@@ -750,7 +885,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
 	}
 
 	ret = v3d_job_init(v3d, file_priv, (void *)&job, sizeof(*job),
-			   v3d_job_free, args->in_sync, V3D_TFU);
+			   v3d_job_free, args->in_sync, &se, V3D_TFU);
 	if (ret)
 		goto fail;
 
@@ -798,6 +933,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
 	v3d_attach_fences_and_unlock_reservation(file_priv,
 						 &job->base, &acquire_ctx,
 						 args->out_sync,
+						 &se,
 						 job->base.done_fence);
 
 	v3d_job_put(&job->base);
@@ -806,6 +942,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
 
 fail:
 	v3d_job_cleanup((void *)job);
+	v3d_put_multisync_post_deps(&se);
 
 	return ret;
 }
@@ -826,8 +963,9 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
 	struct v3d_dev *v3d = to_v3d_dev(dev);
 	struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
 	struct drm_v3d_submit_csd *args = data;
+	struct v3d_submit_ext se = {0};
 	struct v3d_csd_job *job;
-	struct v3d_job *clean_job;
+	struct v3d_job *clean_job = NULL;
 	struct ww_acquire_ctx acquire_ctx;
 	int ret;
 
@@ -847,7 +985,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
 	}
 
 	if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
-		ret = v3d_get_extensions(file_priv, args->extensions);
+		ret = v3d_get_extensions(file_priv, args->extensions, &se);
 		if (ret) {
 			DRM_DEBUG("Failed to get extensions.\n");
 			return ret;
@@ -855,12 +993,12 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
 	}
 
 	ret = v3d_job_init(v3d, file_priv, (void *)&job, sizeof(*job),
-			   v3d_job_free, args->in_sync, V3D_CSD);
+			   v3d_job_free, args->in_sync, &se, V3D_CSD);
 	if (ret)
 		goto fail;
 
 	ret = v3d_job_init(v3d, file_priv, (void *)&clean_job, sizeof(*clean_job),
-			   v3d_job_free, 0, V3D_CACHE_CLEAN);
+			   v3d_job_free, 0, 0, V3D_CACHE_CLEAN);
 	if (ret)
 		goto fail;
 
@@ -899,6 +1037,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
 						 clean_job,
 						 &acquire_ctx,
 						 args->out_sync,
+						 &se,
 						 clean_job->done_fence);
 
 	v3d_job_put(&job->base);
@@ -913,6 +1052,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
 fail:
 	v3d_job_cleanup((void *)job);
 	v3d_job_cleanup(clean_job);
+	v3d_put_multisync_post_deps(&se);
 
 	return ret;
 }
diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
index 55b443ca6c0b..3dfc0af8756a 100644
--- a/include/uapi/drm/v3d_drm.h
+++ b/include/uapi/drm/v3d_drm.h
@@ -73,6 +73,53 @@ struct drm_v3d_extension {
 	__u32 flags; /* mbz */
 };
 
+/* struct drm_v3d_sem - wait/signal semaphore
+ *
+ * If binary semaphore, it only takes syncobj handle and ignores flags and
+ * point fields. Point is defined for timeline syncobj feature.
+ */
+struct drm_v3d_sem {
+	__u32 handle; /* syncobj */
+	/* rsv below, for future uses */
+	__u32 flags;
+	__u64 point;  /* for timeline sem support */
+	__u64 mbz[2]; /* must be zero, rsv */
+};
+
+/* Enum for each of the V3D queues. */
+enum v3d_queue {
+	V3D_BIN,
+	V3D_RENDER,
+	V3D_TFU,
+	V3D_CSD,
+	V3D_CACHE_CLEAN,
+};
+
+/**
+ * struct drm_v3d_multi_sync - ioctl extension to add support multiples
+ * syncobjs for commands submission.
+ *
+ * When an extension of DRM_V3D_EXT_ID_MULTI_SYNC id is defined, it points to
+ * this extension to define wait and signal dependencies, instead of single
+ * in/out sync entries on submitting commands. The field flags is used to
+ * determine the stage to set wait dependencies.
+ */
+struct drm_v3d_multi_sync {
+	struct drm_v3d_extension base;
+	/* Array of wait and signal semaphores */
+	__u64 in_syncs;
+	__u64 out_syncs;
+
+	/* Number of entries */
+	__u32 in_sync_count;
+	__u32 out_sync_count;
+
+	/* set the stage (v3d_queue) to sync */
+	__u32 wait_stage;
+
+	__u32 pad; /* mbz */
+};
+
 /**
  * struct drm_v3d_submit_cl - ioctl argument for submitting commands to the 3D
  * engine.
@@ -228,6 +275,7 @@ enum drm_v3d_param {
 	DRM_V3D_PARAM_SUPPORTS_CSD,
 	DRM_V3D_PARAM_SUPPORTS_CACHE_FLUSH,
 	DRM_V3D_PARAM_SUPPORTS_PERFMON,
+	DRM_V3D_PARAM_SUPPORTS_MULTISYNC_EXT,
 };
 
 struct drm_v3d_get_param {
@@ -271,7 +319,6 @@ struct drm_v3d_submit_tfu {
 
 	/* Pointer to an array of ioctl extensions*/
 	__u64 extensions;
-
 };
 
 /* Submits a compute shader for dispatch.  This job will block on any
-- 
2.30.2


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 2/4] drm/v3d: alloc and init job in one shot
  2021-09-29  9:43 ` [PATCH v2 2/4] drm/v3d: alloc and init job in one shot Melissa Wen
@ 2021-09-30  8:44   ` Iago Toral
  2021-09-30  9:16     ` Melissa Wen
  0 siblings, 1 reply; 13+ messages in thread
From: Iago Toral @ 2021-09-30  8:44 UTC (permalink / raw)
  To: Melissa Wen, dri-devel
  Cc: Emma Anholt, David Airlie, Daniel Vetter, Maxime Ripard, Boris Brezillon

On Wed, 2021-09-29 at 10:43 +0100, Melissa Wen wrote:
> Move job memory allocation to v3d_job_init function. This aim to
> facilitate
> error handling in job initialization, since cleanup steps are similar
> for all
> (struct v3d_job)-based types of job involved in a command submission.
> To
> generalize v3d_job_init(), this change takes into account that all
> job structs
> have the first element a struct v3d_job (bin, render, tfu, csd) or it
> is a
> v3d_job itself (clean_job) for pointer casting.
> 
> Suggested-by: Iago Toral <itoral@igalia.com>
> Signed-off-by: Melissa Wen <mwen@igalia.com>
> ---
>  drivers/gpu/drm/v3d/v3d_gem.c | 115 +++++++++++++-------------------
> --
>  1 file changed, 42 insertions(+), 73 deletions(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
> b/drivers/gpu/drm/v3d/v3d_gem.c
> index e60fbc28ef29..9cfa6f8d4357 100644
> --- a/drivers/gpu/drm/v3d/v3d_gem.c
> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> @@ -392,6 +392,9 @@ v3d_render_job_free(struct kref *ref)
>  
>  void v3d_job_cleanup(struct v3d_job *job)
>  {
> +	if (!job)
> +		return;
> +
>  	drm_sched_job_cleanup(&job->base);
>  	v3d_job_put(job);
>  }
> @@ -450,12 +453,20 @@ v3d_job_add_deps(struct drm_file *file_priv,
> struct v3d_job *job,
>  
>  static int
>  v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
> -	     struct v3d_job *job, void (*free)(struct kref *ref),
> +	     void **container, size_t size, void (*free)(struct kref
> *ref),
>  	     u32 in_sync, enum v3d_queue queue)
>  {
>  	struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
> +	struct v3d_job *job;
>  	int ret;
>  
> +	*container = kcalloc(1, size, GFP_KERNEL);
> +	if (!*container) {
> +		DRM_ERROR("Cannot allocate memory for v3d job.");
> +		return -ENOMEM;
> +	}
> +
> +	job = *container;
>  	job->v3d = v3d;
>  	job->free = free;
>  

Right below this hunk we have an early return that doesn't jump to
fail:

        ret = pm_runtime_get_sync(v3d->drm.dev);
        if (ret < 0)
                return ret;


so we should kfree(*container) and set it to NULL there, right?

> @@ -479,6 +490,9 @@ v3d_job_init(struct v3d_dev *v3d, struct drm_file
> *file_priv,
>  	drm_sched_job_cleanup(&job->base);
>  fail:
>  	pm_runtime_put_autosuspend(v3d->drm.dev);
> +	kfree(*container);
> +	*container = NULL;
> +
>  	return ret;
>  }
> 

(...)

> @@ -806,29 +789,15 @@ v3d_submit_csd_ioctl(struct drm_device *dev,
> void *data,
>  		return -EINVAL;
>  	}
>  
> -	job = kcalloc(1, sizeof(*job), GFP_KERNEL);
> -	if (!job)
> -		return -ENOMEM;
> -
> -	ret = v3d_job_init(v3d, file_priv, &job->base,
> +	ret = v3d_job_init(v3d, file_priv, (void *)&job, sizeof(*job),
>  			   v3d_job_free, args->in_sync, V3D_CSD);
> -	if (ret) {
> -		kfree(job);
> -		return ret;
> -	}
> -
> -	clean_job = kcalloc(1, sizeof(*clean_job), GFP_KERNEL);
> -	if (!clean_job) {
> -		v3d_job_cleanup(&job->base);
> -		return -ENOMEM;
> -	}
> +	if (ret)
> +		goto fail;
> 

For this to work we need to explicitly initialize clean_job to NULL. 
Notice that the same applies to the bin and clean jobs in the CL ioctl,
but in that case we're already initializing them to NULL.

> -	ret = v3d_job_init(v3d, file_priv, clean_job, v3d_job_free, 0,
> V3D_CACHE_CLEAN);
> -	if (ret) {
> -		v3d_job_cleanup(&job->base);
> -		kfree(clean_job);
> -		return ret;
> -	}
> +	ret = v3d_job_init(v3d, file_priv, (void *)&clean_job,
> sizeof(*clean_job),
> +			   v3d_job_free, 0, V3D_CACHE_CLEAN);
> +	if (ret)
> +		goto fail;
>  
>  	job->args = *args;
>  
> @@ -877,7 +846,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void
> *data,
>  	drm_gem_unlock_reservations(clean_job->bo, clean_job->bo_count,
>  				    &acquire_ctx);
>  fail:
> -	v3d_job_cleanup(&job->base);
> +	v3d_job_cleanup((void *)job);
>  	v3d_job_cleanup(clean_job);
>  
>  	return ret;


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 3/4] drm/v3d: add generic ioctl extension
  2021-09-29  9:44 ` [PATCH v2 3/4] drm/v3d: add generic ioctl extension Melissa Wen
@ 2021-09-30  9:11   ` Iago Toral
  2021-09-30  9:22     ` Melissa Wen
  0 siblings, 1 reply; 13+ messages in thread
From: Iago Toral @ 2021-09-30  9:11 UTC (permalink / raw)
  To: Melissa Wen, dri-devel
  Cc: Emma Anholt, David Airlie, Daniel Vetter, Maxime Ripard, Boris Brezillon

On Wed, 2021-09-29 at 10:44 +0100, Melissa Wen wrote:
> Add support to attach generic extensions on job submission. This
> patch
> is third prep work to enable multiple syncobjs on job submission.
> With
> this work, when the job submission interface needs to be extended to
> accomodate a new feature, we will use a generic extension struct
> where
> an id determines the data type to be pointed. The first application
> is
> to enable multiples in/out syncobj (next patch), but the base is
> already done for future features. Therefore, to attach a new feature,
> a specific extension struct should subclass drm_v3d_extension and
> update the list of extensions in a job submission.
> 
> v2:
> - remove redundant elements to subclass struct (Daniel)
> 
> Signed-off-by: Melissa Wen <mwen@igalia.com>
> ---
>  drivers/gpu/drm/v3d/v3d_drv.c |  4 +-
>  drivers/gpu/drm/v3d/v3d_gem.c | 71
> +++++++++++++++++++++++++++++++++--
>  include/uapi/drm/v3d_drm.h    | 31 +++++++++++++++
>  3 files changed, 100 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.c
> b/drivers/gpu/drm/v3d/v3d_drv.c
> index c1deab2cf38d..3d6b9bcce2f7 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.c
> +++ b/drivers/gpu/drm/v3d/v3d_drv.c
> @@ -83,7 +83,6 @@ static int v3d_get_param_ioctl(struct drm_device
> *dev, void *data,
>  		return 0;
>  	}
>  
> -
>  	switch (args->param) {
>  	case DRM_V3D_PARAM_SUPPORTS_TFU:
>  		args->value = 1;
> @@ -147,7 +146,7 @@ v3d_postclose(struct drm_device *dev, struct
> drm_file *file)
>  DEFINE_DRM_GEM_FOPS(v3d_drm_fops);
>  
>  /* DRM_AUTH is required on SUBMIT_CL for now, while we don't have
> GMP
> - * protection between clients.  Note that render nodes would be be
> + * protection between clients.  Note that render nodes would be
>   * able to submit CLs that could access BOs from clients
> authenticated
>   * with the master node.  The TFU doesn't use the GMP, so it would
>   * need to stay DRM_AUTH until we do buffer size/offset validation.
> @@ -219,7 +218,6 @@ static int v3d_platform_drm_probe(struct
> platform_device *pdev)
>  	u32 mmu_debug;
>  	u32 ident1;
>  
> -
>  	v3d = devm_drm_dev_alloc(dev, &v3d_drm_driver, struct v3d_dev,
> drm);
>  	if (IS_ERR(v3d))
>  		return PTR_ERR(v3d);
> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
> b/drivers/gpu/drm/v3d/v3d_gem.c
> index 9cfa6f8d4357..b912419027f7 100644
> --- a/drivers/gpu/drm/v3d/v3d_gem.c
> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> @@ -535,6 +535,33 @@ v3d_attach_fences_and_unlock_reservation(struct
> drm_file *file_priv,
>  	}
>  }
>  
> +static int
> +v3d_get_extensions(struct drm_file *file_priv, u64 ext_handles)
> +{
> +	struct drm_v3d_extension __user *user_ext;
> +
> +	user_ext = u64_to_user_ptr(ext_handles);
> +	while(user_ext) {
> +		struct drm_v3d_extension ext;
> +
> +		if (copy_from_user(&ext, user_ext, sizeof(ext))) {
> +			DRM_DEBUG("Failed to copy submit extension\n");
> +			return -EFAULT;
> +		}
> +
> +		switch (ext.id) {
> +		case 0:
> +		default:
> +			DRM_DEBUG_DRIVER("Unknown extension id: %d\n",
> ext.id);
> +			return -EINVAL;
> +		}
> +
> +		user_ext = u64_to_user_ptr(ext.next);
> +	}
> +
> +	return 0;
> +}
> +
>  /**
>   * v3d_submit_cl_ioctl() - Submits a job (frame) to the V3D.
>   * @dev: DRM device
> @@ -563,15 +590,24 @@ v3d_submit_cl_ioctl(struct drm_device *dev,
> void *data,
>  
>  	trace_v3d_submit_cl_ioctl(&v3d->drm, args->rcl_start, args-
> >rcl_end);
>  
> -	if (args->pad != 0)
> +	if (args->pad)
>  		return -EINVAL;
>  
> -	if (args->flags != 0 &&
> -	    args->flags != DRM_V3D_SUBMIT_CL_FLUSH_CACHE) {
> +	if (args->flags &&
> +	    args->flags & ~(DRM_V3D_SUBMIT_CL_FLUSH_CACHE |
> +			    DRM_V3D_SUBMIT_EXTENSION)) {
>  		DRM_INFO("invalid flags: %d\n", args->flags);
>  		return -EINVAL;
>  	}
>  
> +	if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
> +		ret = v3d_get_extensions(file_priv, args->extensions);
> +		if (ret) {
> +			DRM_DEBUG("Failed to get extensions.\n");
> +			return ret;
> +		}
> +	}
> +
>  	ret = v3d_job_init(v3d, file_priv, (void *)&render,
> sizeof(*render),
>  			   v3d_render_job_free, args->in_sync_bcl,
> V3D_RENDER);
>  	if (ret)
> @@ -700,6 +736,19 @@ v3d_submit_tfu_ioctl(struct drm_device *dev,
> void *data,
>  
>  	trace_v3d_submit_tfu_ioctl(&v3d->drm, args->iia);
>  
> +	if (args->flags && !(args->flags & DRM_V3D_SUBMIT_EXTENSION)) {
> +		DRM_DEBUG("invalid flags: %d\n", args->flags);
> +		return -EINVAL;
> +	}
> +
> +	if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
> +		ret = v3d_get_extensions(file_priv, args->extensions);
> +		if (ret) {
> +			DRM_DEBUG("Failed to get extensions.\n");
> +			return ret;
> +		}
> +	}
> +
>  	ret = v3d_job_init(v3d, file_priv, (void *)&job, sizeof(*job),
>  			   v3d_job_free, args->in_sync, V3D_TFU);
>  	if (ret)
> @@ -784,11 +833,27 @@ v3d_submit_csd_ioctl(struct drm_device *dev,
> void *data,
>  
>  	trace_v3d_submit_csd_ioctl(&v3d->drm, args->cfg[5], args-
> >cfg[6]);
>  
> +	if (args->pad)
> +		return -EINVAL;
> +
>  	if (!v3d_has_csd(v3d)) {
>  		DRM_DEBUG("Attempting CSD submit on non-CSD
> hardware\n");
>  		return -EINVAL;
>  	}
>  
> +	if (args->flags && !(args->flags & DRM_V3D_SUBMIT_EXTENSION)) {
> +		DRM_INFO("invalid flags: %d\n", args->flags);
> +		return -EINVAL;
> +	}
> +
> +	if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
> +		ret = v3d_get_extensions(file_priv, args->extensions);
> +		if (ret) {
> +			DRM_DEBUG("Failed to get extensions.\n");
> +			return ret;
> +		}
> +	}
> +
>  	ret = v3d_job_init(v3d, file_priv, (void *)&job, sizeof(*job),
>  			   v3d_job_free, args->in_sync, V3D_CSD);
>  	if (ret)
> diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
> index 4104f22fb3d3..55b443ca6c0b 100644
> --- a/include/uapi/drm/v3d_drm.h
> +++ b/include/uapi/drm/v3d_drm.h
> @@ -58,6 +58,20 @@ extern "C" {
>  						   struct
> drm_v3d_perfmon_get_values)
>  
>  #define DRM_V3D_SUBMIT_CL_FLUSH_CACHE             0x01
> +#define DRM_V3D_SUBMIT_EXTENSION		  0x02
> +
> +/* struct drm_v3d_extension - ioctl extensions
> + *
> + * Linked-list of generic extensions where the id identify which
> struct is
> + * pointed by ext_data. Therefore, DRM_V3D_EXT_ID_* is used on id to
> identify
> + * the extension type.
> + */
> +struct drm_v3d_extension {
> +	__u64 next;
> +	__u32 id;
> +#define DRM_V3D_EXT_ID_MULTI_SYNC		0x01
> +	__u32 flags; /* mbz */
> +};
>  
>  /**
>   * struct drm_v3d_submit_cl - ioctl argument for submitting commands
> to the 3D
> @@ -135,12 +149,16 @@ struct drm_v3d_submit_cl {
>  	/* Number of BO handles passed in (size is that times 4). */
>  	__u32 bo_handle_count;
>  
> +	/* DRM_V3D_SUBMIT_* properties */
>  	__u32 flags;
>  
>  	/* ID of the perfmon to attach to this job. 0 means no perfmon.
> */
>  	__u32 perfmon_id;
>  
>  	__u32 pad;
> +
> +	/* Pointer to an array of ioctl extensions*/
> +	__u64 extensions;
>  };
>  
>  /**
> @@ -248,6 +266,12 @@ struct drm_v3d_submit_tfu {
>  	__u32 in_sync;
>  	/* Sync object to signal when the TFU job is done. */
>  	__u32 out_sync;
> +
> +	__u32 flags;
> +
> +	/* Pointer to an array of ioctl extensions*/
> +	__u64 extensions;

We want __u64 fields aligned to 64-bit so we should swap the positions
of flags and extensions.

> +
>  };
>  
>  /* Submits a compute shader for dispatch.  This job will block on
> any
> @@ -276,6 +300,13 @@ struct drm_v3d_submit_csd {
>  
>  	/* ID of the perfmon to attach to this job. 0 means no perfmon.
> */
>  	__u32 perfmon_id;
> +
> +	/* Pointer to an array of ioctl extensions*/
> +	__u64 extensions;
> +
> +	__u32 flags;
> +
> +	__u32 pad;
>  };
>  
>  enum {


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 2/4] drm/v3d: alloc and init job in one shot
  2021-09-30  8:44   ` Iago Toral
@ 2021-09-30  9:16     ` Melissa Wen
  0 siblings, 0 replies; 13+ messages in thread
From: Melissa Wen @ 2021-09-30  9:16 UTC (permalink / raw)
  To: Iago Toral
  Cc: dri-devel, Emma Anholt, David Airlie, Daniel Vetter,
	Maxime Ripard, Boris Brezillon

[-- Attachment #1: Type: text/plain, Size: 4363 bytes --]

On 09/30, Iago Toral wrote:
> On Wed, 2021-09-29 at 10:43 +0100, Melissa Wen wrote:
> > Move job memory allocation to v3d_job_init function. This aim to
> > facilitate
> > error handling in job initialization, since cleanup steps are similar
> > for all
> > (struct v3d_job)-based types of job involved in a command submission.
> > To
> > generalize v3d_job_init(), this change takes into account that all
> > job structs
> > have the first element a struct v3d_job (bin, render, tfu, csd) or it
> > is a
> > v3d_job itself (clean_job) for pointer casting.
> > 
> > Suggested-by: Iago Toral <itoral@igalia.com>
> > Signed-off-by: Melissa Wen <mwen@igalia.com>
> > ---
> >  drivers/gpu/drm/v3d/v3d_gem.c | 115 +++++++++++++-------------------
> > --
> >  1 file changed, 42 insertions(+), 73 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
> > b/drivers/gpu/drm/v3d/v3d_gem.c
> > index e60fbc28ef29..9cfa6f8d4357 100644
> > --- a/drivers/gpu/drm/v3d/v3d_gem.c
> > +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> > @@ -392,6 +392,9 @@ v3d_render_job_free(struct kref *ref)
> >  
> >  void v3d_job_cleanup(struct v3d_job *job)
> >  {
> > +	if (!job)
> > +		return;
> > +
> >  	drm_sched_job_cleanup(&job->base);
> >  	v3d_job_put(job);
> >  }
> > @@ -450,12 +453,20 @@ v3d_job_add_deps(struct drm_file *file_priv,
> > struct v3d_job *job,
> >  
> >  static int
> >  v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
> > -	     struct v3d_job *job, void (*free)(struct kref *ref),
> > +	     void **container, size_t size, void (*free)(struct kref
> > *ref),
> >  	     u32 in_sync, enum v3d_queue queue)
> >  {
> >  	struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
> > +	struct v3d_job *job;
> >  	int ret;
> >  
> > +	*container = kcalloc(1, size, GFP_KERNEL);
> > +	if (!*container) {
> > +		DRM_ERROR("Cannot allocate memory for v3d job.");
> > +		return -ENOMEM;
> > +	}
> > +
> > +	job = *container;
> >  	job->v3d = v3d;
> >  	job->free = free;
> >  
> 
> Right below this hunk we have an early return that doesn't jump to
> fail:
> 
>         ret = pm_runtime_get_sync(v3d->drm.dev);
>         if (ret < 0)
>                 return ret;
> 
> 
> so we should kfree(*container) and set it to NULL there, right?
oh, yes. I missed it on porting downstream to upstream.
> 
> > @@ -479,6 +490,9 @@ v3d_job_init(struct v3d_dev *v3d, struct drm_file
> > *file_priv,
> >  	drm_sched_job_cleanup(&job->base);
> >  fail:
> >  	pm_runtime_put_autosuspend(v3d->drm.dev);
> > +	kfree(*container);
> > +	*container = NULL;
> > +
> >  	return ret;
> >  }
> > 
> 
> (...)
> 
> > @@ -806,29 +789,15 @@ v3d_submit_csd_ioctl(struct drm_device *dev,
> > void *data,
> >  		return -EINVAL;
> >  	}
> >  
> > -	job = kcalloc(1, sizeof(*job), GFP_KERNEL);
> > -	if (!job)
> > -		return -ENOMEM;
> > -
> > -	ret = v3d_job_init(v3d, file_priv, &job->base,
> > +	ret = v3d_job_init(v3d, file_priv, (void *)&job, sizeof(*job),
> >  			   v3d_job_free, args->in_sync, V3D_CSD);
> > -	if (ret) {
> > -		kfree(job);
> > -		return ret;
> > -	}
> > -
> > -	clean_job = kcalloc(1, sizeof(*clean_job), GFP_KERNEL);
> > -	if (!clean_job) {
> > -		v3d_job_cleanup(&job->base);
> > -		return -ENOMEM;
> > -	}
> > +	if (ret)
> > +		goto fail;
> > 
> 
> For this to work we need to explicitly initialize clean_job to NULL. 
> Notice that the same applies to the bin and clean jobs in the CL ioctl,
> but in that case we're already initializing them to NULL.
yes, thanks for pointing it out.

Melissa
> 
> > -	ret = v3d_job_init(v3d, file_priv, clean_job, v3d_job_free, 0,
> > V3D_CACHE_CLEAN);
> > -	if (ret) {
> > -		v3d_job_cleanup(&job->base);
> > -		kfree(clean_job);
> > -		return ret;
> > -	}
> > +	ret = v3d_job_init(v3d, file_priv, (void *)&clean_job,
> > sizeof(*clean_job),
> > +			   v3d_job_free, 0, V3D_CACHE_CLEAN);
> > +	if (ret)
> > +		goto fail;
> >  
> >  	job->args = *args;
> >  
> > @@ -877,7 +846,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void
> > *data,
> >  	drm_gem_unlock_reservations(clean_job->bo, clean_job->bo_count,
> >  				    &acquire_ctx);
> >  fail:
> > -	v3d_job_cleanup(&job->base);
> > +	v3d_job_cleanup((void *)job);
> >  	v3d_job_cleanup(clean_job);
> >  
> >  	return ret;
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 3/4] drm/v3d: add generic ioctl extension
  2021-09-30  9:11   ` Iago Toral
@ 2021-09-30  9:22     ` Melissa Wen
  2021-09-30  9:59       ` Iago Toral
  0 siblings, 1 reply; 13+ messages in thread
From: Melissa Wen @ 2021-09-30  9:22 UTC (permalink / raw)
  To: Iago Toral
  Cc: dri-devel, Emma Anholt, David Airlie, Daniel Vetter,
	Maxime Ripard, Boris Brezillon

[-- Attachment #1: Type: text/plain, Size: 8782 bytes --]

O 09/30, Iago Toral wrote:
> On Wed, 2021-09-29 at 10:44 +0100, Melissa Wen wrote:
> > Add support to attach generic extensions on job submission. This
> > patch
> > is third prep work to enable multiple syncobjs on job submission.
> > With
> > this work, when the job submission interface needs to be extended to
> > accomodate a new feature, we will use a generic extension struct
> > where
> > an id determines the data type to be pointed. The first application
> > is
> > to enable multiples in/out syncobj (next patch), but the base is
> > already done for future features. Therefore, to attach a new feature,
> > a specific extension struct should subclass drm_v3d_extension and
> > update the list of extensions in a job submission.
> > 
> > v2:
> > - remove redundant elements to subclass struct (Daniel)
> > 
> > Signed-off-by: Melissa Wen <mwen@igalia.com>
> > ---
> >  drivers/gpu/drm/v3d/v3d_drv.c |  4 +-
> >  drivers/gpu/drm/v3d/v3d_gem.c | 71
> > +++++++++++++++++++++++++++++++++--
> >  include/uapi/drm/v3d_drm.h    | 31 +++++++++++++++
> >  3 files changed, 100 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/v3d/v3d_drv.c
> > b/drivers/gpu/drm/v3d/v3d_drv.c
> > index c1deab2cf38d..3d6b9bcce2f7 100644
> > --- a/drivers/gpu/drm/v3d/v3d_drv.c
> > +++ b/drivers/gpu/drm/v3d/v3d_drv.c
> > @@ -83,7 +83,6 @@ static int v3d_get_param_ioctl(struct drm_device
> > *dev, void *data,
> >  		return 0;
> >  	}
> >  
> > -
> >  	switch (args->param) {
> >  	case DRM_V3D_PARAM_SUPPORTS_TFU:
> >  		args->value = 1;
> > @@ -147,7 +146,7 @@ v3d_postclose(struct drm_device *dev, struct
> > drm_file *file)
> >  DEFINE_DRM_GEM_FOPS(v3d_drm_fops);
> >  
> >  /* DRM_AUTH is required on SUBMIT_CL for now, while we don't have
> > GMP
> > - * protection between clients.  Note that render nodes would be be
> > + * protection between clients.  Note that render nodes would be
> >   * able to submit CLs that could access BOs from clients
> > authenticated
> >   * with the master node.  The TFU doesn't use the GMP, so it would
> >   * need to stay DRM_AUTH until we do buffer size/offset validation.
> > @@ -219,7 +218,6 @@ static int v3d_platform_drm_probe(struct
> > platform_device *pdev)
> >  	u32 mmu_debug;
> >  	u32 ident1;
> >  
> > -
> >  	v3d = devm_drm_dev_alloc(dev, &v3d_drm_driver, struct v3d_dev,
> > drm);
> >  	if (IS_ERR(v3d))
> >  		return PTR_ERR(v3d);
> > diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
> > b/drivers/gpu/drm/v3d/v3d_gem.c
> > index 9cfa6f8d4357..b912419027f7 100644
> > --- a/drivers/gpu/drm/v3d/v3d_gem.c
> > +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> > @@ -535,6 +535,33 @@ v3d_attach_fences_and_unlock_reservation(struct
> > drm_file *file_priv,
> >  	}
> >  }
> >  
> > +static int
> > +v3d_get_extensions(struct drm_file *file_priv, u64 ext_handles)
> > +{
> > +	struct drm_v3d_extension __user *user_ext;
> > +
> > +	user_ext = u64_to_user_ptr(ext_handles);
> > +	while(user_ext) {
> > +		struct drm_v3d_extension ext;
> > +
> > +		if (copy_from_user(&ext, user_ext, sizeof(ext))) {
> > +			DRM_DEBUG("Failed to copy submit extension\n");
> > +			return -EFAULT;
> > +		}
> > +
> > +		switch (ext.id) {
> > +		case 0:
> > +		default:
> > +			DRM_DEBUG_DRIVER("Unknown extension id: %d\n",
> > ext.id);
> > +			return -EINVAL;
> > +		}
> > +
> > +		user_ext = u64_to_user_ptr(ext.next);
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> >  /**
> >   * v3d_submit_cl_ioctl() - Submits a job (frame) to the V3D.
> >   * @dev: DRM device
> > @@ -563,15 +590,24 @@ v3d_submit_cl_ioctl(struct drm_device *dev,
> > void *data,
> >  
> >  	trace_v3d_submit_cl_ioctl(&v3d->drm, args->rcl_start, args-
> > >rcl_end);
> >  
> > -	if (args->pad != 0)
> > +	if (args->pad)
> >  		return -EINVAL;
> >  
> > -	if (args->flags != 0 &&
> > -	    args->flags != DRM_V3D_SUBMIT_CL_FLUSH_CACHE) {
> > +	if (args->flags &&
> > +	    args->flags & ~(DRM_V3D_SUBMIT_CL_FLUSH_CACHE |
> > +			    DRM_V3D_SUBMIT_EXTENSION)) {
> >  		DRM_INFO("invalid flags: %d\n", args->flags);
> >  		return -EINVAL;
> >  	}
> >  
> > +	if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
> > +		ret = v3d_get_extensions(file_priv, args->extensions);
> > +		if (ret) {
> > +			DRM_DEBUG("Failed to get extensions.\n");
> > +			return ret;
> > +		}
> > +	}
> > +
> >  	ret = v3d_job_init(v3d, file_priv, (void *)&render,
> > sizeof(*render),
> >  			   v3d_render_job_free, args->in_sync_bcl,
> > V3D_RENDER);
> >  	if (ret)
> > @@ -700,6 +736,19 @@ v3d_submit_tfu_ioctl(struct drm_device *dev,
> > void *data,
> >  
> >  	trace_v3d_submit_tfu_ioctl(&v3d->drm, args->iia);
> >  
> > +	if (args->flags && !(args->flags & DRM_V3D_SUBMIT_EXTENSION)) {
> > +		DRM_DEBUG("invalid flags: %d\n", args->flags);
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
> > +		ret = v3d_get_extensions(file_priv, args->extensions);
> > +		if (ret) {
> > +			DRM_DEBUG("Failed to get extensions.\n");
> > +			return ret;
> > +		}
> > +	}
> > +
> >  	ret = v3d_job_init(v3d, file_priv, (void *)&job, sizeof(*job),
> >  			   v3d_job_free, args->in_sync, V3D_TFU);
> >  	if (ret)
> > @@ -784,11 +833,27 @@ v3d_submit_csd_ioctl(struct drm_device *dev,
> > void *data,
> >  
> >  	trace_v3d_submit_csd_ioctl(&v3d->drm, args->cfg[5], args-
> > >cfg[6]);
> >  
> > +	if (args->pad)
> > +		return -EINVAL;
> > +
> >  	if (!v3d_has_csd(v3d)) {
> >  		DRM_DEBUG("Attempting CSD submit on non-CSD
> > hardware\n");
> >  		return -EINVAL;
> >  	}
> >  
> > +	if (args->flags && !(args->flags & DRM_V3D_SUBMIT_EXTENSION)) {
> > +		DRM_INFO("invalid flags: %d\n", args->flags);
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
> > +		ret = v3d_get_extensions(file_priv, args->extensions);
> > +		if (ret) {
> > +			DRM_DEBUG("Failed to get extensions.\n");
> > +			return ret;
> > +		}
> > +	}
> > +
> >  	ret = v3d_job_init(v3d, file_priv, (void *)&job, sizeof(*job),
> >  			   v3d_job_free, args->in_sync, V3D_CSD);
> >  	if (ret)
> > diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
> > index 4104f22fb3d3..55b443ca6c0b 100644
> > --- a/include/uapi/drm/v3d_drm.h
> > +++ b/include/uapi/drm/v3d_drm.h
> > @@ -58,6 +58,20 @@ extern "C" {
> >  						   struct
> > drm_v3d_perfmon_get_values)
> >  
> >  #define DRM_V3D_SUBMIT_CL_FLUSH_CACHE             0x01
> > +#define DRM_V3D_SUBMIT_EXTENSION		  0x02
> > +
> > +/* struct drm_v3d_extension - ioctl extensions
> > + *
> > + * Linked-list of generic extensions where the id identify which
> > struct is
> > + * pointed by ext_data. Therefore, DRM_V3D_EXT_ID_* is used on id to
> > identify
> > + * the extension type.
> > + */
> > +struct drm_v3d_extension {
> > +	__u64 next;
> > +	__u32 id;
> > +#define DRM_V3D_EXT_ID_MULTI_SYNC		0x01
> > +	__u32 flags; /* mbz */
> > +};
> >  
> >  /**
> >   * struct drm_v3d_submit_cl - ioctl argument for submitting commands
> > to the 3D
> > @@ -135,12 +149,16 @@ struct drm_v3d_submit_cl {
> >  	/* Number of BO handles passed in (size is that times 4). */
> >  	__u32 bo_handle_count;
> >  
> > +	/* DRM_V3D_SUBMIT_* properties */
> >  	__u32 flags;
> >  
> >  	/* ID of the perfmon to attach to this job. 0 means no perfmon.
> > */
> >  	__u32 perfmon_id;
> >  
> >  	__u32 pad;
> > +
> > +	/* Pointer to an array of ioctl extensions*/
> > +	__u64 extensions;
> >  };
> >  
> >  /**
> > @@ -248,6 +266,12 @@ struct drm_v3d_submit_tfu {
> >  	__u32 in_sync;
> >  	/* Sync object to signal when the TFU job is done. */
> >  	__u32 out_sync;
> > +
> > +	__u32 flags;
> > +
> > +	/* Pointer to an array of ioctl extensions*/
> > +	__u64 extensions;
> 
> We want __u64 fields aligned to 64-bit so we should swap the positions
> of flags and extensions.

hmm.. not sure. before two arrays of 4 x _u32 elements, we have seven
_u32 elements... this is why I counted a odd number of _u32 and put _u32
flags before _u64 extensions... or is it working different for array
types?

For the same reason, I think there is an unalignment issue on
submit_csd that would need to change the current interface to solve
(afaiu)... 

> 
> > +
> >  };
> >  
> >  /* Submits a compute shader for dispatch.  This job will block on
> > any
> > @@ -276,6 +300,13 @@ struct drm_v3d_submit_csd {
> >  
> >  	/* ID of the perfmon to attach to this job. 0 means no perfmon.
> > */
> >  	__u32 perfmon_id;
> > +
> > +	/* Pointer to an array of ioctl extensions*/
> > +	__u64 extensions;
> > +
> > +	__u32 flags;
> > +
> > +	__u32 pad;
> >  };
> >  
> >  enum {
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 4/4] drm/v3d: add multiple syncobjs support
  2021-09-29  9:45 ` [PATCH v2 4/4] drm/v3d: add multiple syncobjs support Melissa Wen
@ 2021-09-30  9:54   ` Iago Toral
  2021-09-30 10:40     ` Melissa Wen
  0 siblings, 1 reply; 13+ messages in thread
From: Iago Toral @ 2021-09-30  9:54 UTC (permalink / raw)
  To: Melissa Wen, dri-devel
  Cc: Emma Anholt, David Airlie, Daniel Vetter, Maxime Ripard, Boris Brezillon

On Wed, 2021-09-29 at 10:45 +0100, Melissa Wen wrote:
> Using the generic extension from the previous patch, a specific
> multisync
> extension enables more than one in/out binary syncobj per job
> submission.
> Arrays of syncobjs are set in struct drm_v3d_multisync, that also
> cares
> of determining the stage for sync (wait deps) according to the job
> queue.
> 
> v2:
> - subclass the generic extension struct (Daniel)
> - simplify adding dependency conditions to make understandable (Iago)
> 
> Signed-off-by: Melissa Wen <mwen@igalia.com>
> ---
>  drivers/gpu/drm/v3d/v3d_drv.c |   6 +-
>  drivers/gpu/drm/v3d/v3d_drv.h |  23 +++--
>  drivers/gpu/drm/v3d/v3d_gem.c | 176 ++++++++++++++++++++++++++++++
> ----
>  include/uapi/drm/v3d_drm.h    |  49 +++++++++-
>  4 files changed, 224 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.c
> b/drivers/gpu/drm/v3d/v3d_drv.c
> index 3d6b9bcce2f7..bd46396a1ae0 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.c
> +++ b/drivers/gpu/drm/v3d/v3d_drv.c
> @@ -96,6 +96,9 @@ static int v3d_get_param_ioctl(struct drm_device
> *dev, void *data,
>  	case DRM_V3D_PARAM_SUPPORTS_PERFMON:
>  		args->value = (v3d->ver >= 40);
>  		return 0;
> +	case DRM_V3D_PARAM_SUPPORTS_MULTISYNC_EXT:
> +		args->value = 1;
> +		return 0;
>  	default:
>  		DRM_DEBUG("Unknown parameter %d\n", args->param);
>  		return -EINVAL;
> @@ -135,9 +138,8 @@ v3d_postclose(struct drm_device *dev, struct
> drm_file *file)
>  	struct v3d_file_priv *v3d_priv = file->driver_priv;
>  	enum v3d_queue q;
>  
> -	for (q = 0; q < V3D_MAX_QUEUES; q++) {
> +	for (q = 0; q < V3D_MAX_QUEUES; q++)
>  		drm_sched_entity_destroy(&v3d_priv->sched_entity[q]);
> -	}
>  
>  	v3d_perfmon_close_file(v3d_priv);
>  	kfree(v3d_priv);
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.h
> b/drivers/gpu/drm/v3d/v3d_drv.h
> index b900a050d5e2..b14ff1e96f49 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.h
> +++ b/drivers/gpu/drm/v3d/v3d_drv.h
> @@ -19,15 +19,6 @@ struct reset_control;
>  
>  #define GMP_GRANULARITY (128 * 1024)
>  
> -/* Enum for each of the V3D queues. */
> -enum v3d_queue {
> -	V3D_BIN,
> -	V3D_RENDER,
> -	V3D_TFU,
> -	V3D_CSD,
> -	V3D_CACHE_CLEAN,
> -};
> -
>  #define V3D_MAX_QUEUES (V3D_CACHE_CLEAN + 1)
>  
>  struct v3d_queue_state {
> @@ -294,6 +285,20 @@ struct v3d_csd_job {
>  	struct drm_v3d_submit_csd args;
>  };
>  
> +struct v3d_submit_outsync {
> +	struct drm_syncobj *syncobj;
> +};
> +
> +struct v3d_submit_ext {
> +	u32 wait_stage;
> +
> +	u32 in_sync_count;
> +	u64 in_syncs;
> +
> +	u32 out_sync_count;
> +	struct v3d_submit_outsync *out_syncs;
> +};
> +
>  /**
>   * __wait_for - magic wait macro
>   *
> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
> b/drivers/gpu/drm/v3d/v3d_gem.c
> index b912419027f7..e92998d39eaa 100644
> --- a/drivers/gpu/drm/v3d/v3d_gem.c
> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> @@ -454,11 +454,12 @@ v3d_job_add_deps(struct drm_file *file_priv,
> struct v3d_job *job,
>  static int
>  v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
>  	     void **container, size_t size, void (*free)(struct kref
> *ref),
> -	     u32 in_sync, enum v3d_queue queue)
> +	     u32 in_sync, struct v3d_submit_ext *se, enum v3d_queue
> queue)
>  {
>  	struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
>  	struct v3d_job *job;
> -	int ret;
> +	bool has_multisync = (se && se->in_sync_count);

I think this is not correct... we could be using the multisync
interface and pass 0 in_syncs and/or 0 out_syncs... what should
determine if we take the multisync path is the presence of the
extension alone.

> +	int ret, i;
>  
>  	*container = kcalloc(1, size, GFP_KERNEL);
>  	if (!*container) {
> @@ -479,9 +480,28 @@ v3d_job_init(struct v3d_dev *v3d, struct
> drm_file *file_priv,
>  	if (ret)
>  		goto fail;
>  
> -	ret = v3d_job_add_deps(file_priv, job, in_sync, 0);
> -	if (ret)
> -		goto fail_job;
> +	if (has_multisync) {
> +		if (se->wait_stage == queue) {
> +			struct drm_v3d_sem __user *handle =
> u64_to_user_ptr(se->in_syncs);
> +
> +			for (i = 0; i < se->in_sync_count; i++) {
> +				struct drm_v3d_sem in;
> +
> +				ret = copy_from_user(&in, handle++,
> sizeof(in));
> +				if (ret) {
> +					DRM_DEBUG("Failed to copy wait
> dep handle.\n");
> +					goto fail_job;
> +				}
> +				ret = v3d_job_add_deps(file_priv, job,
> in.handle, 0);
> +				if (ret)
> +					goto fail_job;
> +			}
> +		}
> +	} else {
> +		ret = v3d_job_add_deps(file_priv, job, in_sync, 0);
> +		if (ret)
> +			goto fail_job;
> +	}
>  
>  	kref_init(&job->refcount);
>  
> @@ -514,6 +534,7 @@ v3d_attach_fences_and_unlock_reservation(struct
> drm_file *file_priv,
>  					 struct v3d_job *job,
>  					 struct ww_acquire_ctx
> *acquire_ctx,
>  					 u32 out_sync,
> +					 struct v3d_submit_ext *se,
>  					 struct dma_fence *done_fence)
>  {
>  	struct drm_syncobj *sync_out;
> @@ -528,6 +549,18 @@ v3d_attach_fences_and_unlock_reservation(struct
> drm_file *file_priv,
>  	drm_gem_unlock_reservations(job->bo, job->bo_count,
> acquire_ctx);
>  
>  	/* Update the return sync object for the job */
> +	/* If multiples semaphores is supported */
> +	if (se && se->out_sync_count) {
> +		for (i = 0; i < se->out_sync_count; i++) {
> +			drm_syncobj_replace_fence(se-
> >out_syncs[i].syncobj,
> +						  done_fence);
> +			drm_syncobj_put(se->out_syncs[i].syncobj);
> +		}
> +		kvfree(se->out_syncs);
> +		return;
> +	}
> +
> +	/* Single signal semaphore */
>  	sync_out = drm_syncobj_find(file_priv, out_sync);
>  	if (sync_out) {
>  		drm_syncobj_replace_fence(sync_out, done_fence);
> @@ -535,13 +568,107 @@
> v3d_attach_fences_and_unlock_reservation(struct drm_file *file_priv,
>  	}
>  }
>  
> +static void
> +v3d_put_multisync_post_deps(struct v3d_submit_ext *se)
> +{
> +	unsigned int i;
> +
> +	for (i = 0; i < se->out_sync_count; i++)
> +		drm_syncobj_put(se->out_syncs[i].syncobj);
> +	kvfree(se->out_syncs);
> +}
> +
>  static int
> -v3d_get_extensions(struct drm_file *file_priv, u64 ext_handles)
> +v3d_get_multisync_post_deps(struct drm_file *file_priv,
> +			    struct v3d_submit_ext *se,
> +			    u32 count, u64 handles)
> +{
> +	struct drm_v3d_sem __user *post_deps;
> +	int i, ret;
> +
> +	se->out_syncs = (struct v3d_submit_outsync *)
> +			kvmalloc_array(count,
> +				       sizeof(struct
> v3d_submit_outsync),
> +				       GFP_KERNEL);
> +	if (!se->out_syncs)
> +		return -ENOMEM;
> +
> +	post_deps = u64_to_user_ptr(handles);
> +
> +	for (i = 0; i < count; i++) {
> +		struct drm_v3d_sem out;
> +
> +		ret = copy_from_user(&out, post_deps++, sizeof(out));
> +		if (ret) {
> +			DRM_DEBUG("Failed to copy post dep handles\n");
> +			goto fail;
> +		}
> +
> +		se->out_syncs[i].syncobj = drm_syncobj_find(file_priv,
> +							    out.handle)
> ;
> +		if (!se->out_syncs[i].syncobj) {
> +			ret = -EINVAL;
> +			goto fail;
> +		}
> +	}
> +	se->out_sync_count = count;
> +
> +	return 0;
> +
> +fail:
> +	for (i--; i >= 0; i--)
> +		drm_syncobj_put(se->out_syncs[i].syncobj);
> +	kvfree(se->out_syncs);
> +
> +	return ret;
> +}
> +
> +/* Get data for multiple binary semaphores synchronization. Parse
> syncobj
> + * to be signaled when job completes (out_sync).
> + */
> +static int
> +v3d_get_multisync_submit_deps(struct drm_file *file_priv,
> +			      struct drm_v3d_extension __user *ext,
> +			      void *data)
> +{
> +	struct drm_v3d_multi_sync multisync;
> +	struct v3d_submit_ext *se = data;
> +	int ret;
> +
> +	ret = copy_from_user(&multisync, ext, sizeof(multisync));
> +	if (ret)
> +		return ret;
> +
> +	if (multisync.pad)
> +		return -EINVAL;
> +
> +	ret = v3d_get_multisync_post_deps(file_priv, data,
> multisync.out_sync_count,
> +					  multisync.out_syncs);
> +	if (ret)
> +		return ret;
> +
> +	se->in_sync_count = multisync.in_sync_count;
> +	se->in_syncs = multisync.in_syncs;
> +
> +	se->wait_stage = multisync.wait_stage;
> +
> +	return 0;
> +}
> +
> +/* Whenever userspace sets ioctl extensions, parses data according
> to the
> + * extension id (name). By now, generic ioctl extensions is only
> used to
> + * support multiple binary semaphores (struct drm_v3d_multi_sync).
> + */

Let's remove the sentence that starts with "By now, generic..." it is
not really relevant and it is bound to get obsolete when we add more
extensions.

> +static int
> +v3d_get_extensions(struct drm_file *file_priv,
> +		   u64 ext_handles,
> +		   void *data)
>  {
>  	struct drm_v3d_extension __user *user_ext;
> +	int ret;
>  
>  	user_ext = u64_to_user_ptr(ext_handles);
> -	while(user_ext) {
> +	while (user_ext) {
>  		struct drm_v3d_extension ext;
>  
>  		if (copy_from_user(&ext, user_ext, sizeof(ext))) {
> @@ -550,7 +677,11 @@ v3d_get_extensions(struct drm_file *file_priv,
> u64 ext_handles)
>  		}
>  
>  		switch (ext.id) {
> -		case 0:
> +		case DRM_V3D_EXT_ID_MULTI_SYNC:
> +			ret = v3d_get_multisync_submit_deps(file_priv,
> user_ext, data);
> +			if (ret)
> +				return ret;
> +			break;
>  		default:
>  			DRM_DEBUG_DRIVER("Unknown extension id: %d\n",
> ext.id);
>  			return -EINVAL;
> @@ -581,6 +712,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void
> *data,
>  	struct v3d_dev *v3d = to_v3d_dev(dev);
>  	struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
>  	struct drm_v3d_submit_cl *args = data;
> +	struct v3d_submit_ext se = {0};
>  	struct v3d_bin_job *bin = NULL;
>  	struct v3d_render_job *render;
>  	struct v3d_job *clean_job = NULL;
> @@ -601,7 +733,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void
> *data,
>  	}
>  
>  	if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
> -		ret = v3d_get_extensions(file_priv, args->extensions);
> +		ret = v3d_get_extensions(file_priv, args->extensions,
> &se);
>  		if (ret) {
>  			DRM_DEBUG("Failed to get extensions.\n");
>  			return ret;
> @@ -609,7 +741,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void
> *data,
>  	}
>  
>  	ret = v3d_job_init(v3d, file_priv, (void *)&render,
> sizeof(*render),
> -			   v3d_render_job_free, args->in_sync_bcl,
> V3D_RENDER);
> +			   v3d_render_job_free, args->in_sync_bcl, &se,
> V3D_RENDER);

So here is an example of what I was saying above... if the driver is
using the multisync interface with 0 in_syncs, v3d_job_init here will
think we are not using multisync and instead try to use args-
>in_sync_bcl, which is not what we want. The same issue happens in all
other submission ioctls.

>  	if (ret)
>  		goto fail;
>  
> @@ -619,7 +751,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void
> *data,
>  
>  	if (args->bcl_start != args->bcl_end) {
>  		ret = v3d_job_init(v3d, file_priv, (void *)&bin,
> sizeof(*bin),
> -				   v3d_job_free, args->in_sync_bcl,
> V3D_BIN);
> +				   v3d_job_free, args->in_sync_bcl,
> &se, V3D_BIN);

Same here.

>  		if (ret)
>  			goto fail;
>  
> @@ -633,7 +765,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void
> *data,
>  
>  	if (args->flags & DRM_V3D_SUBMIT_CL_FLUSH_CACHE) {
>  		ret = v3d_job_init(v3d, file_priv, (void *)&clean_job,
> sizeof(*clean_job),
> -				   v3d_job_free, 0, V3D_CACHE_CLEAN);
> +				   v3d_job_free, 0, 0,
> V3D_CACHE_CLEAN);
>  		if (ret)
>  			goto fail;
>  
> @@ -693,6 +825,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void
> *data,
>  						 last_job,
>  						 &acquire_ctx,
>  						 args->out_sync,
> +						 &se,
>  						 last_job->done_fence);
>  
>  	if (bin)
> @@ -711,6 +844,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void
> *data,
>  	v3d_job_cleanup((void *)bin);
>  	v3d_job_cleanup((void *)render);
>  	v3d_job_cleanup(clean_job);
> +	v3d_put_multisync_post_deps(&se);
>  
>  	return ret;
>  }
> @@ -730,6 +864,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void
> *data,
>  {
>  	struct v3d_dev *v3d = to_v3d_dev(dev);
>  	struct drm_v3d_submit_tfu *args = data;
> +	struct v3d_submit_ext se = {0};
>  	struct v3d_tfu_job *job;
>  	struct ww_acquire_ctx acquire_ctx;
>  	int ret = 0;
> @@ -742,7 +877,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void
> *data,
>  	}
>  
>  	if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
> -		ret = v3d_get_extensions(file_priv, args->extensions);
> +		ret = v3d_get_extensions(file_priv, args->extensions,
> &se);
>  		if (ret) {
>  			DRM_DEBUG("Failed to get extensions.\n");
>  			return ret;
> @@ -750,7 +885,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void
> *data,
>  	}
>  
>  	ret = v3d_job_init(v3d, file_priv, (void *)&job, sizeof(*job),
> -			   v3d_job_free, args->in_sync, V3D_TFU);
> +			   v3d_job_free, args->in_sync, &se, V3D_TFU);

Same here.

>  	if (ret)
>  		goto fail;
>  
> @@ -798,6 +933,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void
> *data,
>  	v3d_attach_fences_and_unlock_reservation(file_priv,
>  						 &job->base,
> &acquire_ctx,
>  						 args->out_sync,
> +						 &se,
>  						 job->base.done_fence);
>  
>  	v3d_job_put(&job->base);
> @@ -806,6 +942,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void
> *data,
>  
>  fail:
>  	v3d_job_cleanup((void *)job);
> +	v3d_put_multisync_post_deps(&se);
>  
>  	return ret;
>  }
> @@ -826,8 +963,9 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void
> *data,
>  	struct v3d_dev *v3d = to_v3d_dev(dev);
>  	struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
>  	struct drm_v3d_submit_csd *args = data;
> +	struct v3d_submit_ext se = {0};
>  	struct v3d_csd_job *job;
> -	struct v3d_job *clean_job;
> +	struct v3d_job *clean_job = NULL;
>  	struct ww_acquire_ctx acquire_ctx;
>  	int ret;
>  
> @@ -847,7 +985,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void
> *data,
>  	}
>  
>  	if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
> -		ret = v3d_get_extensions(file_priv, args->extensions);
> +		ret = v3d_get_extensions(file_priv, args->extensions,
> &se);
>  		if (ret) {
>  			DRM_DEBUG("Failed to get extensions.\n");
>  			return ret;
> @@ -855,12 +993,12 @@ v3d_submit_csd_ioctl(struct drm_device *dev,
> void *data,
>  	}
>  
>  	ret = v3d_job_init(v3d, file_priv, (void *)&job, sizeof(*job),
> -			   v3d_job_free, args->in_sync, V3D_CSD);
> +			   v3d_job_free, args->in_sync, &se, V3D_CSD);

Same here.

>  	if (ret)
>  		goto fail;
>  
>  	ret = v3d_job_init(v3d, file_priv, (void *)&clean_job,
> sizeof(*clean_job),
> -			   v3d_job_free, 0, V3D_CACHE_CLEAN);
> +			   v3d_job_free, 0, 0, V3D_CACHE_CLEAN);
>  	if (ret)
>  		goto fail;
>  
> @@ -899,6 +1037,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev,
> void *data,
>  						 clean_job,
>  						 &acquire_ctx,
>  						 args->out_sync,
> +						 &se,
>  						 clean_job-
> >done_fence);
>  
>  	v3d_job_put(&job->base);
> @@ -913,6 +1052,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev,
> void *data,
>  fail:
>  	v3d_job_cleanup((void *)job);
>  	v3d_job_cleanup(clean_job);
> +	v3d_put_multisync_post_deps(&se);
>  
>  	return ret;
>  }
> diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
> index 55b443ca6c0b..3dfc0af8756a 100644
> --- a/include/uapi/drm/v3d_drm.h
> +++ b/include/uapi/drm/v3d_drm.h
> @@ -73,6 +73,53 @@ struct drm_v3d_extension {
>  	__u32 flags; /* mbz */
>  };
>  
> +/* struct drm_v3d_sem - wait/signal semaphore
> + *
> + * If binary semaphore, it only takes syncobj handle and ignores
> flags and
> + * point fields. Point is defined for timeline syncobj feature.
> + */
> +struct drm_v3d_sem {
> +	__u32 handle; /* syncobj */
> +	/* rsv below, for future uses */
> +	__u32 flags;
> +	__u64 point;  /* for timeline sem support */
> +	__u64 mbz[2]; /* must be zero, rsv */
> +};
> +
> +/* Enum for each of the V3D queues. */
> +enum v3d_queue {
> +	V3D_BIN,
> +	V3D_RENDER,
> +	V3D_TFU,
> +	V3D_CSD,
> +	V3D_CACHE_CLEAN,
> +};
> +
> +/**
> + * struct drm_v3d_multi_sync - ioctl extension to add support
> multiples
> + * syncobjs for commands submission.
> + *
> + * When an extension of DRM_V3D_EXT_ID_MULTI_SYNC id is defined, it
> points to
> + * this extension to define wait and signal dependencies, instead of
> single
> + * in/out sync entries on submitting commands. The field flags is
> used to
> + * determine the stage to set wait dependencies.
> + */
> +struct drm_v3d_multi_sync {
> +	struct drm_v3d_extension base;
> +	/* Array of wait and signal semaphores */
> +	__u64 in_syncs;
> +	__u64 out_syncs;
> +
> +	/* Number of entries */
> +	__u32 in_sync_count;
> +	__u32 out_sync_count;
> +
> +	/* set the stage (v3d_queue) to sync */
> +	__u32 wait_stage;
> +
> +	__u32 pad; /* mbz */
> +};
> +
>  /**
>   * struct drm_v3d_submit_cl - ioctl argument for submitting commands
> to the 3D
>   * engine.
> @@ -228,6 +275,7 @@ enum drm_v3d_param {
>  	DRM_V3D_PARAM_SUPPORTS_CSD,
>  	DRM_V3D_PARAM_SUPPORTS_CACHE_FLUSH,
>  	DRM_V3D_PARAM_SUPPORTS_PERFMON,
> +	DRM_V3D_PARAM_SUPPORTS_MULTISYNC_EXT,
>  };
>  
>  struct drm_v3d_get_param {
> @@ -271,7 +319,6 @@ struct drm_v3d_submit_tfu {
>  
>  	/* Pointer to an array of ioctl extensions*/
>  	__u64 extensions;
> -
>  };
>  
>  /* Submits a compute shader for dispatch.  This job will block on
> any


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 3/4] drm/v3d: add generic ioctl extension
  2021-09-30  9:22     ` Melissa Wen
@ 2021-09-30  9:59       ` Iago Toral
  0 siblings, 0 replies; 13+ messages in thread
From: Iago Toral @ 2021-09-30  9:59 UTC (permalink / raw)
  To: Melissa Wen
  Cc: dri-devel, Emma Anholt, David Airlie, Daniel Vetter,
	Maxime Ripard, Boris Brezillon

On Thu, 2021-09-30 at 10:22 +0100, Melissa Wen wrote:
> > > 
> O 09/30, Iago Toral wrote:
> > On Wed, 2021-09-29 at 10:44 +0100, Melissa Wen wrote:
(...) 
> > >  /**
> > >   * struct drm_v3d_submit_cl - ioctl argument for submitting
> > > commands
> > > to the 3D
> > > @@ -135,12 +149,16 @@ struct drm_v3d_submit_cl {
> > >  	/* Number of BO handles passed in (size is that times 4). */
> > >  	__u32 bo_handle_count;
> > >  
> > > +	/* DRM_V3D_SUBMIT_* properties */
> > >  	__u32 flags;
> > >  
> > >  	/* ID of the perfmon to attach to this job. 0 means no perfmon.
> > > */
> > >  	__u32 perfmon_id;
> > >  
> > >  	__u32 pad;
> > > +
> > > +	/* Pointer to an array of ioctl extensions*/
> > > +	__u64 extensions;
> > >  };
> > >  
> > >  /**
> > > @@ -248,6 +266,12 @@ struct drm_v3d_submit_tfu {
> > >  	__u32 in_sync;
> > >  	/* Sync object to signal when the TFU job is done. */
> > >  	__u32 out_sync;
> > > +
> > > +	__u32 flags;
> > > +
> > > +	/* Pointer to an array of ioctl extensions*/
> > > +	__u64 extensions;
> > 
> > We want __u64 fields aligned to 64-bit so we should swap the
> > positions
> > of flags and extensions.
> 
> hmm.. not sure. before two arrays of 4 x _u32 elements, we have seven
> _u32 elements... this is why I counted a odd number of _u32 and put
> _u32
> flags before _u64 extensions... or is it working different for array
> types?
> 

Ah yes, I was confused by the patch format, but you're right.

> For the same reason, I think there is an unalignment issue on
> submit_csd that would need to change the current interface to solve
> (afaiu)... 
> 

Yes, that one is not aligned, but it is too late to fix now without
braking the interface. We have not seen any issues caused by that on
32-bit Raspbian though.

Iago

> > > +
> > >  };
> > >  
> > >  /* Submits a compute shader for dispatch.  This job will block
> > > on
> > > any
> > > @@ -276,6 +300,13 @@ struct drm_v3d_submit_csd {
> > >  
> > >  	/* ID of the perfmon to attach to this job. 0 means no perfmon.
> > > */
> > >  	__u32 perfmon_id;
> > > +
> > > +	/* Pointer to an array of ioctl extensions*/
> > > +	__u64 extensions;
> > > +
> > > +	__u32 flags;
> > > +
> > > +	__u32 pad;
> > >  };
> > >  
> > >  enum {


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 4/4] drm/v3d: add multiple syncobjs support
  2021-09-30  9:54   ` Iago Toral
@ 2021-09-30 10:40     ` Melissa Wen
  2021-09-30 16:26       ` Melissa Wen
  0 siblings, 1 reply; 13+ messages in thread
From: Melissa Wen @ 2021-09-30 10:40 UTC (permalink / raw)
  To: Iago Toral
  Cc: dri-devel, Emma Anholt, David Airlie, Daniel Vetter,
	Maxime Ripard, Boris Brezillon

[-- Attachment #1: Type: text/plain, Size: 19107 bytes --]

On 09/30, Iago Toral wrote:
> On Wed, 2021-09-29 at 10:45 +0100, Melissa Wen wrote:
> > Using the generic extension from the previous patch, a specific
> > multisync
> > extension enables more than one in/out binary syncobj per job
> > submission.
> > Arrays of syncobjs are set in struct drm_v3d_multisync, that also
> > cares
> > of determining the stage for sync (wait deps) according to the job
> > queue.
> > 
> > v2:
> > - subclass the generic extension struct (Daniel)
> > - simplify adding dependency conditions to make understandable (Iago)
> > 
> > Signed-off-by: Melissa Wen <mwen@igalia.com>
> > ---
> >  drivers/gpu/drm/v3d/v3d_drv.c |   6 +-
> >  drivers/gpu/drm/v3d/v3d_drv.h |  23 +++--
> >  drivers/gpu/drm/v3d/v3d_gem.c | 176 ++++++++++++++++++++++++++++++
> > ----
> >  include/uapi/drm/v3d_drm.h    |  49 +++++++++-
> >  4 files changed, 224 insertions(+), 30 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/v3d/v3d_drv.c
> > b/drivers/gpu/drm/v3d/v3d_drv.c
> > index 3d6b9bcce2f7..bd46396a1ae0 100644
> > --- a/drivers/gpu/drm/v3d/v3d_drv.c
> > +++ b/drivers/gpu/drm/v3d/v3d_drv.c
> > @@ -96,6 +96,9 @@ static int v3d_get_param_ioctl(struct drm_device
> > *dev, void *data,
> >  	case DRM_V3D_PARAM_SUPPORTS_PERFMON:
> >  		args->value = (v3d->ver >= 40);
> >  		return 0;
> > +	case DRM_V3D_PARAM_SUPPORTS_MULTISYNC_EXT:
> > +		args->value = 1;
> > +		return 0;
> >  	default:
> >  		DRM_DEBUG("Unknown parameter %d\n", args->param);
> >  		return -EINVAL;
> > @@ -135,9 +138,8 @@ v3d_postclose(struct drm_device *dev, struct
> > drm_file *file)
> >  	struct v3d_file_priv *v3d_priv = file->driver_priv;
> >  	enum v3d_queue q;
> >  
> > -	for (q = 0; q < V3D_MAX_QUEUES; q++) {
> > +	for (q = 0; q < V3D_MAX_QUEUES; q++)
> >  		drm_sched_entity_destroy(&v3d_priv->sched_entity[q]);
> > -	}
> >  
> >  	v3d_perfmon_close_file(v3d_priv);
> >  	kfree(v3d_priv);
> > diff --git a/drivers/gpu/drm/v3d/v3d_drv.h
> > b/drivers/gpu/drm/v3d/v3d_drv.h
> > index b900a050d5e2..b14ff1e96f49 100644
> > --- a/drivers/gpu/drm/v3d/v3d_drv.h
> > +++ b/drivers/gpu/drm/v3d/v3d_drv.h
> > @@ -19,15 +19,6 @@ struct reset_control;
> >  
> >  #define GMP_GRANULARITY (128 * 1024)
> >  
> > -/* Enum for each of the V3D queues. */
> > -enum v3d_queue {
> > -	V3D_BIN,
> > -	V3D_RENDER,
> > -	V3D_TFU,
> > -	V3D_CSD,
> > -	V3D_CACHE_CLEAN,
> > -};
> > -
> >  #define V3D_MAX_QUEUES (V3D_CACHE_CLEAN + 1)
> >  
> >  struct v3d_queue_state {
> > @@ -294,6 +285,20 @@ struct v3d_csd_job {
> >  	struct drm_v3d_submit_csd args;
> >  };
> >  
> > +struct v3d_submit_outsync {
> > +	struct drm_syncobj *syncobj;
> > +};
> > +
> > +struct v3d_submit_ext {
> > +	u32 wait_stage;
> > +
> > +	u32 in_sync_count;
> > +	u64 in_syncs;
> > +
> > +	u32 out_sync_count;
> > +	struct v3d_submit_outsync *out_syncs;
> > +};
> > +
> >  /**
> >   * __wait_for - magic wait macro
> >   *
> > diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
> > b/drivers/gpu/drm/v3d/v3d_gem.c
> > index b912419027f7..e92998d39eaa 100644
> > --- a/drivers/gpu/drm/v3d/v3d_gem.c
> > +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> > @@ -454,11 +454,12 @@ v3d_job_add_deps(struct drm_file *file_priv,
> > struct v3d_job *job,
> >  static int
> >  v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
> >  	     void **container, size_t size, void (*free)(struct kref
> > *ref),
> > -	     u32 in_sync, enum v3d_queue queue)
> > +	     u32 in_sync, struct v3d_submit_ext *se, enum v3d_queue
> > queue)
> >  {
> >  	struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
> >  	struct v3d_job *job;
> > -	int ret;
> > +	bool has_multisync = (se && se->in_sync_count);
> 
> I think this is not correct... we could be using the multisync
> interface and pass 0 in_syncs and/or 0 out_syncs... what should
> determine if we take the multisync path is the presence of the
> extension alone.
hmm.. yeah. so, I should drop this has_multisync and change conditions
to only check if we have a submit_ext (that means we have multisync
support) and move the in_sync_count to check if we should add any wait
semaphores, as below: 

> 
> > +	int ret, i;
> >  
> >  	*container = kcalloc(1, size, GFP_KERNEL);
> >  	if (!*container) {
> > @@ -479,9 +480,28 @@ v3d_job_init(struct v3d_dev *v3d, struct
> > drm_file *file_priv,
> >  	if (ret)
> >  		goto fail;
> >  
> > -	ret = v3d_job_add_deps(file_priv, job, in_sync, 0);
> > -	if (ret)
> > -		goto fail_job;
	if (se) {
		if (se->in_sync_count && se->wait_stage == queue) {
> > +			struct drm_v3d_sem __user *handle =
> > u64_to_user_ptr(se->in_syncs);
> > +
> > +			for (i = 0; i < se->in_sync_count; i++) {
> > +				struct drm_v3d_sem in;
> > +
> > +				ret = copy_from_user(&in, handle++,
> > sizeof(in));
> > +				if (ret) {
> > +					DRM_DEBUG("Failed to copy wait
> > dep handle.\n");
> > +					goto fail_job;
> > +				}
> > +				ret = v3d_job_add_deps(file_priv, job,
> > in.handle, 0);
> > +				if (ret)
> > +					goto fail_job;
> > +			}
> > +		}
> > +	} else {
> > +		ret = v3d_job_add_deps(file_priv, job, in_sync, 0);
> > +		if (ret)
> > +			goto fail_job;
> > +	}
> >  
> >  	kref_init(&job->refcount);
> >  
> > @@ -514,6 +534,7 @@ v3d_attach_fences_and_unlock_reservation(struct
> > drm_file *file_priv,
> >  					 struct v3d_job *job,
> >  					 struct ww_acquire_ctx
> > *acquire_ctx,
> >  					 u32 out_sync,
> > +					 struct v3d_submit_ext *se,
> >  					 struct dma_fence *done_fence)
> >  {
> >  	struct drm_syncobj *sync_out;
> > @@ -528,6 +549,18 @@ v3d_attach_fences_and_unlock_reservation(struct
> > drm_file *file_priv,
> >  	drm_gem_unlock_reservations(job->bo, job->bo_count,
> > acquire_ctx);
> >  
> >  	/* Update the return sync object for the job */
> > +	/* If multiples semaphores is supported */
> > +	if (se && se->out_sync_count) {

and drop this out_sync_count condition too ^
> > +		for (i = 0; i < se->out_sync_count; i++) {
> > +			drm_syncobj_replace_fence(se-
> > >out_syncs[i].syncobj,
> > +						  done_fence);
> > +			drm_syncobj_put(se->out_syncs[i].syncobj);
> > +		}
> > +		kvfree(se->out_syncs);
> > +		return;
> > +	}
> > +
> > +	/* Single signal semaphore */
> >  	sync_out = drm_syncobj_find(file_priv, out_sync);
> >  	if (sync_out) {
> >  		drm_syncobj_replace_fence(sync_out, done_fence);
> > @@ -535,13 +568,107 @@
> > v3d_attach_fences_and_unlock_reservation(struct drm_file *file_priv,
> >  	}
> >  }
> >  
> > +static void
> > +v3d_put_multisync_post_deps(struct v3d_submit_ext *se)
> > +{
> > +	unsigned int i;
> > +
> > +	for (i = 0; i < se->out_sync_count; i++)
> > +		drm_syncobj_put(se->out_syncs[i].syncobj);
> > +	kvfree(se->out_syncs);
> > +}
> > +
> >  static int
> > -v3d_get_extensions(struct drm_file *file_priv, u64 ext_handles)
> > +v3d_get_multisync_post_deps(struct drm_file *file_priv,
> > +			    struct v3d_submit_ext *se,
> > +			    u32 count, u64 handles)
> > +{
> > +	struct drm_v3d_sem __user *post_deps;
> > +	int i, ret;
> > +
> > +	se->out_syncs = (struct v3d_submit_outsync *)
> > +			kvmalloc_array(count,
> > +				       sizeof(struct
> > v3d_submit_outsync),
> > +				       GFP_KERNEL);
> > +	if (!se->out_syncs)
> > +		return -ENOMEM;
> > +
> > +	post_deps = u64_to_user_ptr(handles);
> > +
> > +	for (i = 0; i < count; i++) {
> > +		struct drm_v3d_sem out;
> > +
> > +		ret = copy_from_user(&out, post_deps++, sizeof(out));
> > +		if (ret) {
> > +			DRM_DEBUG("Failed to copy post dep handles\n");
> > +			goto fail;
> > +		}
> > +
> > +		se->out_syncs[i].syncobj = drm_syncobj_find(file_priv,
> > +							    out.handle)
> > ;
> > +		if (!se->out_syncs[i].syncobj) {
> > +			ret = -EINVAL;
> > +			goto fail;
> > +		}
> > +	}
> > +	se->out_sync_count = count;
> > +
> > +	return 0;
> > +
> > +fail:
> > +	for (i--; i >= 0; i--)
> > +		drm_syncobj_put(se->out_syncs[i].syncobj);
> > +	kvfree(se->out_syncs);
> > +
> > +	return ret;
> > +}
> > +
> > +/* Get data for multiple binary semaphores synchronization. Parse
> > syncobj
> > + * to be signaled when job completes (out_sync).
> > + */
> > +static int
> > +v3d_get_multisync_submit_deps(struct drm_file *file_priv,
> > +			      struct drm_v3d_extension __user *ext,
> > +			      void *data)
> > +{
> > +	struct drm_v3d_multi_sync multisync;
> > +	struct v3d_submit_ext *se = data;
> > +	int ret;
> > +
> > +	ret = copy_from_user(&multisync, ext, sizeof(multisync));
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (multisync.pad)
> > +		return -EINVAL;
> > +
> > +	ret = v3d_get_multisync_post_deps(file_priv, data,
> > multisync.out_sync_count,
> > +					  multisync.out_syncs);
> > +	if (ret)
> > +		return ret;
> > +
> > +	se->in_sync_count = multisync.in_sync_count;
> > +	se->in_syncs = multisync.in_syncs;
> > +
> > +	se->wait_stage = multisync.wait_stage;
> > +
> > +	return 0;
> > +}
> > +
> > +/* Whenever userspace sets ioctl extensions, parses data according
> > to the
> > + * extension id (name). By now, generic ioctl extensions is only
> > used to
> > + * support multiple binary semaphores (struct drm_v3d_multi_sync).
> > + */
> 
> Let's remove the sentence that starts with "By now, generic..." it is
> not really relevant and it is bound to get obsolete when we add more
> extensions.
ok. I think I should also move this comment to the previous patch
where the function below was created.
> 
> > +static int
> > +v3d_get_extensions(struct drm_file *file_priv,
> > +		   u64 ext_handles,
> > +		   void *data)
> >  {
> >  	struct drm_v3d_extension __user *user_ext;
> > +	int ret;
> >  
> >  	user_ext = u64_to_user_ptr(ext_handles);
> > -	while(user_ext) {
> > +	while (user_ext) {
> >  		struct drm_v3d_extension ext;
> >  
> >  		if (copy_from_user(&ext, user_ext, sizeof(ext))) {
> > @@ -550,7 +677,11 @@ v3d_get_extensions(struct drm_file *file_priv,
> > u64 ext_handles)
> >  		}
> >  
> >  		switch (ext.id) {
> > -		case 0:
> > +		case DRM_V3D_EXT_ID_MULTI_SYNC:
> > +			ret = v3d_get_multisync_submit_deps(file_priv,
> > user_ext, data);
> > +			if (ret)
> > +				return ret;
> > +			break;
> >  		default:
> >  			DRM_DEBUG_DRIVER("Unknown extension id: %d\n",
> > ext.id);
> >  			return -EINVAL;
> > @@ -581,6 +712,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void
> > *data,
> >  	struct v3d_dev *v3d = to_v3d_dev(dev);
> >  	struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
> >  	struct drm_v3d_submit_cl *args = data;
> > +	struct v3d_submit_ext se = {0};
> >  	struct v3d_bin_job *bin = NULL;
> >  	struct v3d_render_job *render;
> >  	struct v3d_job *clean_job = NULL;
> > @@ -601,7 +733,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void
> > *data,
> >  	}
> >  
> >  	if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
> > -		ret = v3d_get_extensions(file_priv, args->extensions);
> > +		ret = v3d_get_extensions(file_priv, args->extensions,
> > &se);
> >  		if (ret) {
> >  			DRM_DEBUG("Failed to get extensions.\n");
> >  			return ret;
> > @@ -609,7 +741,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void
> > *data,
> >  	}
> >  
> >  	ret = v3d_job_init(v3d, file_priv, (void *)&render,
> > sizeof(*render),
> > -			   v3d_render_job_free, args->in_sync_bcl,
> > V3D_RENDER);
> > +			   v3d_render_job_free, args->in_sync_bcl, &se,
> > V3D_RENDER);
> 
> So here is an example of what I was saying above... if the driver is
> using the multisync interface with 0 in_syncs, v3d_job_init here will
> think we are not using multisync and instead try to use args-
> >in_sync_bcl, which is not what we want. The same issue happens in all
> other submission ioctls.
> 
> >  	if (ret)
> >  		goto fail;
> >  
> > @@ -619,7 +751,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void
> > *data,
> >  
> >  	if (args->bcl_start != args->bcl_end) {
> >  		ret = v3d_job_init(v3d, file_priv, (void *)&bin,
> > sizeof(*bin),
> > -				   v3d_job_free, args->in_sync_bcl,
> > V3D_BIN);
> > +				   v3d_job_free, args->in_sync_bcl,
> > &se, V3D_BIN);
> 
> Same here.
> 
> >  		if (ret)
> >  			goto fail;
> >  
> > @@ -633,7 +765,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void
> > *data,
> >  
> >  	if (args->flags & DRM_V3D_SUBMIT_CL_FLUSH_CACHE) {
> >  		ret = v3d_job_init(v3d, file_priv, (void *)&clean_job,
> > sizeof(*clean_job),
> > -				   v3d_job_free, 0, V3D_CACHE_CLEAN);
> > +				   v3d_job_free, 0, 0,
> > V3D_CACHE_CLEAN);
> >  		if (ret)
> >  			goto fail;
> >  
> > @@ -693,6 +825,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void
> > *data,
> >  						 last_job,
> >  						 &acquire_ctx,
> >  						 args->out_sync,
> > +						 &se,
> >  						 last_job->done_fence);
> >  
> >  	if (bin)
> > @@ -711,6 +844,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void
> > *data,
> >  	v3d_job_cleanup((void *)bin);
> >  	v3d_job_cleanup((void *)render);
> >  	v3d_job_cleanup(clean_job);
> > +	v3d_put_multisync_post_deps(&se);
> >  
> >  	return ret;
> >  }
> > @@ -730,6 +864,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void
> > *data,
> >  {
> >  	struct v3d_dev *v3d = to_v3d_dev(dev);
> >  	struct drm_v3d_submit_tfu *args = data;
> > +	struct v3d_submit_ext se = {0};
> >  	struct v3d_tfu_job *job;
> >  	struct ww_acquire_ctx acquire_ctx;
> >  	int ret = 0;
> > @@ -742,7 +877,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void
> > *data,
> >  	}
> >  
> >  	if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
> > -		ret = v3d_get_extensions(file_priv, args->extensions);
> > +		ret = v3d_get_extensions(file_priv, args->extensions,
> > &se);
> >  		if (ret) {
> >  			DRM_DEBUG("Failed to get extensions.\n");
> >  			return ret;
> > @@ -750,7 +885,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void
> > *data,
> >  	}
> >  
> >  	ret = v3d_job_init(v3d, file_priv, (void *)&job, sizeof(*job),
> > -			   v3d_job_free, args->in_sync, V3D_TFU);
> > +			   v3d_job_free, args->in_sync, &se, V3D_TFU);
> 
> Same here.
> 
> >  	if (ret)
> >  		goto fail;
> >  
> > @@ -798,6 +933,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void
> > *data,
> >  	v3d_attach_fences_and_unlock_reservation(file_priv,
> >  						 &job->base,
> > &acquire_ctx,
> >  						 args->out_sync,
> > +						 &se,
> >  						 job->base.done_fence);
> >  
> >  	v3d_job_put(&job->base);
> > @@ -806,6 +942,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void
> > *data,
> >  
> >  fail:
> >  	v3d_job_cleanup((void *)job);
> > +	v3d_put_multisync_post_deps(&se);
> >  
> >  	return ret;
> >  }
> > @@ -826,8 +963,9 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void
> > *data,
> >  	struct v3d_dev *v3d = to_v3d_dev(dev);
> >  	struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
> >  	struct drm_v3d_submit_csd *args = data;
> > +	struct v3d_submit_ext se = {0};
> >  	struct v3d_csd_job *job;
> > -	struct v3d_job *clean_job;
> > +	struct v3d_job *clean_job = NULL;
> >  	struct ww_acquire_ctx acquire_ctx;
> >  	int ret;
> >  
> > @@ -847,7 +985,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void
> > *data,
> >  	}
> >  
> >  	if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
> > -		ret = v3d_get_extensions(file_priv, args->extensions);
> > +		ret = v3d_get_extensions(file_priv, args->extensions,
> > &se);
> >  		if (ret) {
> >  			DRM_DEBUG("Failed to get extensions.\n");
> >  			return ret;
> > @@ -855,12 +993,12 @@ v3d_submit_csd_ioctl(struct drm_device *dev,
> > void *data,
> >  	}
> >  
> >  	ret = v3d_job_init(v3d, file_priv, (void *)&job, sizeof(*job),
> > -			   v3d_job_free, args->in_sync, V3D_CSD);
> > +			   v3d_job_free, args->in_sync, &se, V3D_CSD);
> 
> Same here.
> 
> >  	if (ret)
> >  		goto fail;
> >  
> >  	ret = v3d_job_init(v3d, file_priv, (void *)&clean_job,
> > sizeof(*clean_job),
> > -			   v3d_job_free, 0, V3D_CACHE_CLEAN);
> > +			   v3d_job_free, 0, 0, V3D_CACHE_CLEAN);
> >  	if (ret)
> >  		goto fail;
> >  
> > @@ -899,6 +1037,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev,
> > void *data,
> >  						 clean_job,
> >  						 &acquire_ctx,
> >  						 args->out_sync,
> > +						 &se,
> >  						 clean_job-
> > >done_fence);
> >  
> >  	v3d_job_put(&job->base);
> > @@ -913,6 +1052,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev,
> > void *data,
> >  fail:
> >  	v3d_job_cleanup((void *)job);
> >  	v3d_job_cleanup(clean_job);
> > +	v3d_put_multisync_post_deps(&se);
> >  
> >  	return ret;
> >  }
> > diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
> > index 55b443ca6c0b..3dfc0af8756a 100644
> > --- a/include/uapi/drm/v3d_drm.h
> > +++ b/include/uapi/drm/v3d_drm.h
> > @@ -73,6 +73,53 @@ struct drm_v3d_extension {
> >  	__u32 flags; /* mbz */
> >  };
> >  
> > +/* struct drm_v3d_sem - wait/signal semaphore
> > + *
> > + * If binary semaphore, it only takes syncobj handle and ignores
> > flags and
> > + * point fields. Point is defined for timeline syncobj feature.
> > + */
> > +struct drm_v3d_sem {
> > +	__u32 handle; /* syncobj */
> > +	/* rsv below, for future uses */
> > +	__u32 flags;
> > +	__u64 point;  /* for timeline sem support */
> > +	__u64 mbz[2]; /* must be zero, rsv */
> > +};
> > +
> > +/* Enum for each of the V3D queues. */
> > +enum v3d_queue {
> > +	V3D_BIN,
> > +	V3D_RENDER,
> > +	V3D_TFU,
> > +	V3D_CSD,
> > +	V3D_CACHE_CLEAN,
> > +};
> > +
> > +/**
> > + * struct drm_v3d_multi_sync - ioctl extension to add support
> > multiples
> > + * syncobjs for commands submission.
> > + *
> > + * When an extension of DRM_V3D_EXT_ID_MULTI_SYNC id is defined, it
> > points to
> > + * this extension to define wait and signal dependencies, instead of
> > single
> > + * in/out sync entries on submitting commands. The field flags is
> > used to
> > + * determine the stage to set wait dependencies.
> > + */
> > +struct drm_v3d_multi_sync {
> > +	struct drm_v3d_extension base;
> > +	/* Array of wait and signal semaphores */
> > +	__u64 in_syncs;
> > +	__u64 out_syncs;
> > +
> > +	/* Number of entries */
> > +	__u32 in_sync_count;
> > +	__u32 out_sync_count;
> > +
> > +	/* set the stage (v3d_queue) to sync */
> > +	__u32 wait_stage;
> > +
> > +	__u32 pad; /* mbz */
> > +};
> > +
> >  /**
> >   * struct drm_v3d_submit_cl - ioctl argument for submitting commands
> > to the 3D
> >   * engine.
> > @@ -228,6 +275,7 @@ enum drm_v3d_param {
> >  	DRM_V3D_PARAM_SUPPORTS_CSD,
> >  	DRM_V3D_PARAM_SUPPORTS_CACHE_FLUSH,
> >  	DRM_V3D_PARAM_SUPPORTS_PERFMON,
> > +	DRM_V3D_PARAM_SUPPORTS_MULTISYNC_EXT,
> >  };
> >  
> >  struct drm_v3d_get_param {
> > @@ -271,7 +319,6 @@ struct drm_v3d_submit_tfu {
> >  
> >  	/* Pointer to an array of ioctl extensions*/
> >  	__u64 extensions;
> > -
> >  };
> >  
> >  /* Submits a compute shader for dispatch.  This job will block on
> > any
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 4/4] drm/v3d: add multiple syncobjs support
  2021-09-30 10:40     ` Melissa Wen
@ 2021-09-30 16:26       ` Melissa Wen
  0 siblings, 0 replies; 13+ messages in thread
From: Melissa Wen @ 2021-09-30 16:26 UTC (permalink / raw)
  To: Iago Toral
  Cc: dri-devel, Emma Anholt, David Airlie, Daniel Vetter,
	Maxime Ripard, Boris Brezillon

[-- Attachment #1: Type: text/plain, Size: 20679 bytes --]

On 09/30, Melissa Wen wrote:
> On 09/30, Iago Toral wrote:
> > On Wed, 2021-09-29 at 10:45 +0100, Melissa Wen wrote:
> > > Using the generic extension from the previous patch, a specific
> > > multisync
> > > extension enables more than one in/out binary syncobj per job
> > > submission.
> > > Arrays of syncobjs are set in struct drm_v3d_multisync, that also
> > > cares
> > > of determining the stage for sync (wait deps) according to the job
> > > queue.
> > > 
> > > v2:
> > > - subclass the generic extension struct (Daniel)
> > > - simplify adding dependency conditions to make understandable (Iago)
> > > 
> > > Signed-off-by: Melissa Wen <mwen@igalia.com>
> > > ---
> > >  drivers/gpu/drm/v3d/v3d_drv.c |   6 +-
> > >  drivers/gpu/drm/v3d/v3d_drv.h |  23 +++--
> > >  drivers/gpu/drm/v3d/v3d_gem.c | 176 ++++++++++++++++++++++++++++++
> > > ----
> > >  include/uapi/drm/v3d_drm.h    |  49 +++++++++-
> > >  4 files changed, 224 insertions(+), 30 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/v3d/v3d_drv.c
> > > b/drivers/gpu/drm/v3d/v3d_drv.c
> > > index 3d6b9bcce2f7..bd46396a1ae0 100644
> > > --- a/drivers/gpu/drm/v3d/v3d_drv.c
> > > +++ b/drivers/gpu/drm/v3d/v3d_drv.c
> > > @@ -96,6 +96,9 @@ static int v3d_get_param_ioctl(struct drm_device
> > > *dev, void *data,
> > >  	case DRM_V3D_PARAM_SUPPORTS_PERFMON:
> > >  		args->value = (v3d->ver >= 40);
> > >  		return 0;
> > > +	case DRM_V3D_PARAM_SUPPORTS_MULTISYNC_EXT:
> > > +		args->value = 1;
> > > +		return 0;
> > >  	default:
> > >  		DRM_DEBUG("Unknown parameter %d\n", args->param);
> > >  		return -EINVAL;
> > > @@ -135,9 +138,8 @@ v3d_postclose(struct drm_device *dev, struct
> > > drm_file *file)
> > >  	struct v3d_file_priv *v3d_priv = file->driver_priv;
> > >  	enum v3d_queue q;
> > >  
> > > -	for (q = 0; q < V3D_MAX_QUEUES; q++) {
> > > +	for (q = 0; q < V3D_MAX_QUEUES; q++)
> > >  		drm_sched_entity_destroy(&v3d_priv->sched_entity[q]);
> > > -	}
> > >  
> > >  	v3d_perfmon_close_file(v3d_priv);
> > >  	kfree(v3d_priv);
> > > diff --git a/drivers/gpu/drm/v3d/v3d_drv.h
> > > b/drivers/gpu/drm/v3d/v3d_drv.h
> > > index b900a050d5e2..b14ff1e96f49 100644
> > > --- a/drivers/gpu/drm/v3d/v3d_drv.h
> > > +++ b/drivers/gpu/drm/v3d/v3d_drv.h
> > > @@ -19,15 +19,6 @@ struct reset_control;
> > >  
> > >  #define GMP_GRANULARITY (128 * 1024)
> > >  
> > > -/* Enum for each of the V3D queues. */
> > > -enum v3d_queue {
> > > -	V3D_BIN,
> > > -	V3D_RENDER,
> > > -	V3D_TFU,
> > > -	V3D_CSD,
> > > -	V3D_CACHE_CLEAN,
> > > -};
> > > -
> > >  #define V3D_MAX_QUEUES (V3D_CACHE_CLEAN + 1)
> > >  
> > >  struct v3d_queue_state {
> > > @@ -294,6 +285,20 @@ struct v3d_csd_job {
> > >  	struct drm_v3d_submit_csd args;
> > >  };
> > >  
> > > +struct v3d_submit_outsync {
> > > +	struct drm_syncobj *syncobj;
> > > +};
> > > +
> > > +struct v3d_submit_ext {
> > > +	u32 wait_stage;
> > > +
> > > +	u32 in_sync_count;
> > > +	u64 in_syncs;
> > > +
> > > +	u32 out_sync_count;
> > > +	struct v3d_submit_outsync *out_syncs;
> > > +};
> > > +
> > >  /**
> > >   * __wait_for - magic wait macro
> > >   *
> > > diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
> > > b/drivers/gpu/drm/v3d/v3d_gem.c
> > > index b912419027f7..e92998d39eaa 100644
> > > --- a/drivers/gpu/drm/v3d/v3d_gem.c
> > > +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> > > @@ -454,11 +454,12 @@ v3d_job_add_deps(struct drm_file *file_priv,
> > > struct v3d_job *job,
> > >  static int
> > >  v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
> > >  	     void **container, size_t size, void (*free)(struct kref
> > > *ref),
> > > -	     u32 in_sync, enum v3d_queue queue)
> > > +	     u32 in_sync, struct v3d_submit_ext *se, enum v3d_queue
> > > queue)
> > >  {
> > >  	struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
> > >  	struct v3d_job *job;
> > > -	int ret;
> > > +	bool has_multisync = (se && se->in_sync_count);
> > 
> > I think this is not correct... we could be using the multisync
> > interface and pass 0 in_syncs and/or 0 out_syncs... what should
> > determine if we take the multisync path is the presence of the
> > extension alone.
> hmm.. yeah. so, I should drop this has_multisync and change conditions
> to only check if we have a submit_ext (that means we have multisync
> support) and move the in_sync_count to check if we should add any wait
> semaphores, as below: 
> 
> > 
> > > +	int ret, i;
> > >  
> > >  	*container = kcalloc(1, size, GFP_KERNEL);
> > >  	if (!*container) {
> > > @@ -479,9 +480,28 @@ v3d_job_init(struct v3d_dev *v3d, struct
> > > drm_file *file_priv,
> > >  	if (ret)
> > >  		goto fail;
> > >  
> > > -	ret = v3d_job_add_deps(file_priv, job, in_sync, 0);
> > > -	if (ret)
> > > -		goto fail_job;
> 	if (se) {
> 		if (se->in_sync_count && se->wait_stage == queue) {
never mind. I realized store extension flag on v3d_submit_ext works
better for handling multiples wait and signal semaphores.
Just sent a v3.

Thanks,

Melissa
> > > +			struct drm_v3d_sem __user *handle =
> > > u64_to_user_ptr(se->in_syncs);
> > > +
> > > +			for (i = 0; i < se->in_sync_count; i++) {
> > > +				struct drm_v3d_sem in;
> > > +
> > > +				ret = copy_from_user(&in, handle++,
> > > sizeof(in));
> > > +				if (ret) {
> > > +					DRM_DEBUG("Failed to copy wait
> > > dep handle.\n");
> > > +					goto fail_job;
> > > +				}
> > > +				ret = v3d_job_add_deps(file_priv, job,
> > > in.handle, 0);
> > > +				if (ret)
> > > +					goto fail_job;
> > > +			}
> > > +		}
> > > +	} else {
> > > +		ret = v3d_job_add_deps(file_priv, job, in_sync, 0);
> > > +		if (ret)
> > > +			goto fail_job;
> > > +	}
> > >  
> > >  	kref_init(&job->refcount);
> > >  
> > > @@ -514,6 +534,7 @@ v3d_attach_fences_and_unlock_reservation(struct
> > > drm_file *file_priv,
> > >  					 struct v3d_job *job,
> > >  					 struct ww_acquire_ctx
> > > *acquire_ctx,
> > >  					 u32 out_sync,
> > > +					 struct v3d_submit_ext *se,
> > >  					 struct dma_fence *done_fence)
> > >  {
> > >  	struct drm_syncobj *sync_out;
> > > @@ -528,6 +549,18 @@ v3d_attach_fences_and_unlock_reservation(struct
> > > drm_file *file_priv,
> > >  	drm_gem_unlock_reservations(job->bo, job->bo_count,
> > > acquire_ctx);
> > >  
> > >  	/* Update the return sync object for the job */
> > > +	/* If multiples semaphores is supported */
> > > +	if (se && se->out_sync_count) {
> 
> and drop this out_sync_count condition too ^
same here.
In the v3, I reworked a little more to detach multisync support from the
number of out_sync to lookup and replace.
> > > +		for (i = 0; i < se->out_sync_count; i++) {
> > > +			drm_syncobj_replace_fence(se-
> > > >out_syncs[i].syncobj,
> > > +						  done_fence);
> > > +			drm_syncobj_put(se->out_syncs[i].syncobj);
> > > +		}
> > > +		kvfree(se->out_syncs);
> > > +		return;
> > > +	}
> > > +
> > > +	/* Single signal semaphore */
> > >  	sync_out = drm_syncobj_find(file_priv, out_sync);
> > >  	if (sync_out) {
> > >  		drm_syncobj_replace_fence(sync_out, done_fence);
> > > @@ -535,13 +568,107 @@
> > > v3d_attach_fences_and_unlock_reservation(struct drm_file *file_priv,
> > >  	}
> > >  }
> > >  
> > > +static void
> > > +v3d_put_multisync_post_deps(struct v3d_submit_ext *se)
> > > +{
> > > +	unsigned int i;
> > > +
> > > +	for (i = 0; i < se->out_sync_count; i++)
> > > +		drm_syncobj_put(se->out_syncs[i].syncobj);
> > > +	kvfree(se->out_syncs);
> > > +}
> > > +
> > >  static int
> > > -v3d_get_extensions(struct drm_file *file_priv, u64 ext_handles)
> > > +v3d_get_multisync_post_deps(struct drm_file *file_priv,
> > > +			    struct v3d_submit_ext *se,
> > > +			    u32 count, u64 handles)
> > > +{
> > > +	struct drm_v3d_sem __user *post_deps;
> > > +	int i, ret;
> > > +
> > > +	se->out_syncs = (struct v3d_submit_outsync *)
> > > +			kvmalloc_array(count,
> > > +				       sizeof(struct
> > > v3d_submit_outsync),
> > > +				       GFP_KERNEL);
> > > +	if (!se->out_syncs)
> > > +		return -ENOMEM;
> > > +
> > > +	post_deps = u64_to_user_ptr(handles);
> > > +
> > > +	for (i = 0; i < count; i++) {
> > > +		struct drm_v3d_sem out;
> > > +
> > > +		ret = copy_from_user(&out, post_deps++, sizeof(out));
> > > +		if (ret) {
> > > +			DRM_DEBUG("Failed to copy post dep handles\n");
> > > +			goto fail;
> > > +		}
> > > +
> > > +		se->out_syncs[i].syncobj = drm_syncobj_find(file_priv,
> > > +							    out.handle)
> > > ;
> > > +		if (!se->out_syncs[i].syncobj) {
> > > +			ret = -EINVAL;
> > > +			goto fail;
> > > +		}
> > > +	}
> > > +	se->out_sync_count = count;
> > > +
> > > +	return 0;
> > > +
> > > +fail:
> > > +	for (i--; i >= 0; i--)
> > > +		drm_syncobj_put(se->out_syncs[i].syncobj);
> > > +	kvfree(se->out_syncs);
> > > +
> > > +	return ret;
> > > +}
> > > +
> > > +/* Get data for multiple binary semaphores synchronization. Parse
> > > syncobj
> > > + * to be signaled when job completes (out_sync).
> > > + */
> > > +static int
> > > +v3d_get_multisync_submit_deps(struct drm_file *file_priv,
> > > +			      struct drm_v3d_extension __user *ext,
> > > +			      void *data)
> > > +{
> > > +	struct drm_v3d_multi_sync multisync;
> > > +	struct v3d_submit_ext *se = data;
> > > +	int ret;
> > > +
> > > +	ret = copy_from_user(&multisync, ext, sizeof(multisync));
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	if (multisync.pad)
> > > +		return -EINVAL;
> > > +
> > > +	ret = v3d_get_multisync_post_deps(file_priv, data,
> > > multisync.out_sync_count,
> > > +					  multisync.out_syncs);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	se->in_sync_count = multisync.in_sync_count;
> > > +	se->in_syncs = multisync.in_syncs;
> > > +
> > > +	se->wait_stage = multisync.wait_stage;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +/* Whenever userspace sets ioctl extensions, parses data according
> > > to the
> > > + * extension id (name). By now, generic ioctl extensions is only
> > > used to
> > > + * support multiple binary semaphores (struct drm_v3d_multi_sync).
> > > + */
> > 
> > Let's remove the sentence that starts with "By now, generic..." it is
> > not really relevant and it is bound to get obsolete when we add more
> > extensions.
> ok. I think I should also move this comment to the previous patch
> where the function below was created.
> > 
> > > +static int
> > > +v3d_get_extensions(struct drm_file *file_priv,
> > > +		   u64 ext_handles,
> > > +		   void *data)
> > >  {
> > >  	struct drm_v3d_extension __user *user_ext;
> > > +	int ret;
> > >  
> > >  	user_ext = u64_to_user_ptr(ext_handles);
> > > -	while(user_ext) {
> > > +	while (user_ext) {
> > >  		struct drm_v3d_extension ext;
> > >  
> > >  		if (copy_from_user(&ext, user_ext, sizeof(ext))) {
> > > @@ -550,7 +677,11 @@ v3d_get_extensions(struct drm_file *file_priv,
> > > u64 ext_handles)
> > >  		}
> > >  
> > >  		switch (ext.id) {
> > > -		case 0:
> > > +		case DRM_V3D_EXT_ID_MULTI_SYNC:
> > > +			ret = v3d_get_multisync_submit_deps(file_priv,
> > > user_ext, data);
> > > +			if (ret)
> > > +				return ret;
> > > +			break;
> > >  		default:
> > >  			DRM_DEBUG_DRIVER("Unknown extension id: %d\n",
> > > ext.id);
> > >  			return -EINVAL;
> > > @@ -581,6 +712,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void
> > > *data,
> > >  	struct v3d_dev *v3d = to_v3d_dev(dev);
> > >  	struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
> > >  	struct drm_v3d_submit_cl *args = data;
> > > +	struct v3d_submit_ext se = {0};
> > >  	struct v3d_bin_job *bin = NULL;
> > >  	struct v3d_render_job *render;
> > >  	struct v3d_job *clean_job = NULL;
> > > @@ -601,7 +733,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void
> > > *data,
> > >  	}
> > >  
> > >  	if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
> > > -		ret = v3d_get_extensions(file_priv, args->extensions);
> > > +		ret = v3d_get_extensions(file_priv, args->extensions,
> > > &se);
> > >  		if (ret) {
> > >  			DRM_DEBUG("Failed to get extensions.\n");
> > >  			return ret;
> > > @@ -609,7 +741,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void
> > > *data,
> > >  	}
> > >  
> > >  	ret = v3d_job_init(v3d, file_priv, (void *)&render,
> > > sizeof(*render),
> > > -			   v3d_render_job_free, args->in_sync_bcl,
> > > V3D_RENDER);
> > > +			   v3d_render_job_free, args->in_sync_bcl, &se,
> > > V3D_RENDER);
> > 
> > So here is an example of what I was saying above... if the driver is
> > using the multisync interface with 0 in_syncs, v3d_job_init here will
> > think we are not using multisync and instead try to use args-
> > >in_sync_bcl, which is not what we want. The same issue happens in all
> > other submission ioctls.
> > 
> > >  	if (ret)
> > >  		goto fail;
> > >  
> > > @@ -619,7 +751,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void
> > > *data,
> > >  
> > >  	if (args->bcl_start != args->bcl_end) {
> > >  		ret = v3d_job_init(v3d, file_priv, (void *)&bin,
> > > sizeof(*bin),
> > > -				   v3d_job_free, args->in_sync_bcl,
> > > V3D_BIN);
> > > +				   v3d_job_free, args->in_sync_bcl,
> > > &se, V3D_BIN);
> > 
> > Same here.
> > 
> > >  		if (ret)
> > >  			goto fail;
> > >  
> > > @@ -633,7 +765,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void
> > > *data,
> > >  
> > >  	if (args->flags & DRM_V3D_SUBMIT_CL_FLUSH_CACHE) {
> > >  		ret = v3d_job_init(v3d, file_priv, (void *)&clean_job,
> > > sizeof(*clean_job),
> > > -				   v3d_job_free, 0, V3D_CACHE_CLEAN);
> > > +				   v3d_job_free, 0, 0,
> > > V3D_CACHE_CLEAN);
> > >  		if (ret)
> > >  			goto fail;
> > >  
> > > @@ -693,6 +825,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void
> > > *data,
> > >  						 last_job,
> > >  						 &acquire_ctx,
> > >  						 args->out_sync,
> > > +						 &se,
> > >  						 last_job->done_fence);
> > >  
> > >  	if (bin)
> > > @@ -711,6 +844,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void
> > > *data,
> > >  	v3d_job_cleanup((void *)bin);
> > >  	v3d_job_cleanup((void *)render);
> > >  	v3d_job_cleanup(clean_job);
> > > +	v3d_put_multisync_post_deps(&se);
> > >  
> > >  	return ret;
> > >  }
> > > @@ -730,6 +864,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void
> > > *data,
> > >  {
> > >  	struct v3d_dev *v3d = to_v3d_dev(dev);
> > >  	struct drm_v3d_submit_tfu *args = data;
> > > +	struct v3d_submit_ext se = {0};
> > >  	struct v3d_tfu_job *job;
> > >  	struct ww_acquire_ctx acquire_ctx;
> > >  	int ret = 0;
> > > @@ -742,7 +877,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void
> > > *data,
> > >  	}
> > >  
> > >  	if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
> > > -		ret = v3d_get_extensions(file_priv, args->extensions);
> > > +		ret = v3d_get_extensions(file_priv, args->extensions,
> > > &se);
> > >  		if (ret) {
> > >  			DRM_DEBUG("Failed to get extensions.\n");
> > >  			return ret;
> > > @@ -750,7 +885,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void
> > > *data,
> > >  	}
> > >  
> > >  	ret = v3d_job_init(v3d, file_priv, (void *)&job, sizeof(*job),
> > > -			   v3d_job_free, args->in_sync, V3D_TFU);
> > > +			   v3d_job_free, args->in_sync, &se, V3D_TFU);
> > 
> > Same here.
> > 
> > >  	if (ret)
> > >  		goto fail;
> > >  
> > > @@ -798,6 +933,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void
> > > *data,
> > >  	v3d_attach_fences_and_unlock_reservation(file_priv,
> > >  						 &job->base,
> > > &acquire_ctx,
> > >  						 args->out_sync,
> > > +						 &se,
> > >  						 job->base.done_fence);
> > >  
> > >  	v3d_job_put(&job->base);
> > > @@ -806,6 +942,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void
> > > *data,
> > >  
> > >  fail:
> > >  	v3d_job_cleanup((void *)job);
> > > +	v3d_put_multisync_post_deps(&se);
> > >  
> > >  	return ret;
> > >  }
> > > @@ -826,8 +963,9 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void
> > > *data,
> > >  	struct v3d_dev *v3d = to_v3d_dev(dev);
> > >  	struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
> > >  	struct drm_v3d_submit_csd *args = data;
> > > +	struct v3d_submit_ext se = {0};
> > >  	struct v3d_csd_job *job;
> > > -	struct v3d_job *clean_job;
> > > +	struct v3d_job *clean_job = NULL;
> > >  	struct ww_acquire_ctx acquire_ctx;
> > >  	int ret;
> > >  
> > > @@ -847,7 +985,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void
> > > *data,
> > >  	}
> > >  
> > >  	if (args->flags & DRM_V3D_SUBMIT_EXTENSION) {
> > > -		ret = v3d_get_extensions(file_priv, args->extensions);
> > > +		ret = v3d_get_extensions(file_priv, args->extensions,
> > > &se);
> > >  		if (ret) {
> > >  			DRM_DEBUG("Failed to get extensions.\n");
> > >  			return ret;
> > > @@ -855,12 +993,12 @@ v3d_submit_csd_ioctl(struct drm_device *dev,
> > > void *data,
> > >  	}
> > >  
> > >  	ret = v3d_job_init(v3d, file_priv, (void *)&job, sizeof(*job),
> > > -			   v3d_job_free, args->in_sync, V3D_CSD);
> > > +			   v3d_job_free, args->in_sync, &se, V3D_CSD);
> > 
> > Same here.
> > 
> > >  	if (ret)
> > >  		goto fail;
> > >  
> > >  	ret = v3d_job_init(v3d, file_priv, (void *)&clean_job,
> > > sizeof(*clean_job),
> > > -			   v3d_job_free, 0, V3D_CACHE_CLEAN);
> > > +			   v3d_job_free, 0, 0, V3D_CACHE_CLEAN);
> > >  	if (ret)
> > >  		goto fail;
> > >  
> > > @@ -899,6 +1037,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev,
> > > void *data,
> > >  						 clean_job,
> > >  						 &acquire_ctx,
> > >  						 args->out_sync,
> > > +						 &se,
> > >  						 clean_job-
> > > >done_fence);
> > >  
> > >  	v3d_job_put(&job->base);
> > > @@ -913,6 +1052,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev,
> > > void *data,
> > >  fail:
> > >  	v3d_job_cleanup((void *)job);
> > >  	v3d_job_cleanup(clean_job);
> > > +	v3d_put_multisync_post_deps(&se);
> > >  
> > >  	return ret;
> > >  }
> > > diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
> > > index 55b443ca6c0b..3dfc0af8756a 100644
> > > --- a/include/uapi/drm/v3d_drm.h
> > > +++ b/include/uapi/drm/v3d_drm.h
> > > @@ -73,6 +73,53 @@ struct drm_v3d_extension {
> > >  	__u32 flags; /* mbz */
> > >  };
> > >  
> > > +/* struct drm_v3d_sem - wait/signal semaphore
> > > + *
> > > + * If binary semaphore, it only takes syncobj handle and ignores
> > > flags and
> > > + * point fields. Point is defined for timeline syncobj feature.
> > > + */
> > > +struct drm_v3d_sem {
> > > +	__u32 handle; /* syncobj */
> > > +	/* rsv below, for future uses */
> > > +	__u32 flags;
> > > +	__u64 point;  /* for timeline sem support */
> > > +	__u64 mbz[2]; /* must be zero, rsv */
> > > +};
> > > +
> > > +/* Enum for each of the V3D queues. */
> > > +enum v3d_queue {
> > > +	V3D_BIN,
> > > +	V3D_RENDER,
> > > +	V3D_TFU,
> > > +	V3D_CSD,
> > > +	V3D_CACHE_CLEAN,
> > > +};
> > > +
> > > +/**
> > > + * struct drm_v3d_multi_sync - ioctl extension to add support
> > > multiples
> > > + * syncobjs for commands submission.
> > > + *
> > > + * When an extension of DRM_V3D_EXT_ID_MULTI_SYNC id is defined, it
> > > points to
> > > + * this extension to define wait and signal dependencies, instead of
> > > single
> > > + * in/out sync entries on submitting commands. The field flags is
> > > used to
> > > + * determine the stage to set wait dependencies.
> > > + */
> > > +struct drm_v3d_multi_sync {
> > > +	struct drm_v3d_extension base;
> > > +	/* Array of wait and signal semaphores */
> > > +	__u64 in_syncs;
> > > +	__u64 out_syncs;
> > > +
> > > +	/* Number of entries */
> > > +	__u32 in_sync_count;
> > > +	__u32 out_sync_count;
> > > +
> > > +	/* set the stage (v3d_queue) to sync */
> > > +	__u32 wait_stage;
> > > +
> > > +	__u32 pad; /* mbz */
> > > +};
> > > +
> > >  /**
> > >   * struct drm_v3d_submit_cl - ioctl argument for submitting commands
> > > to the 3D
> > >   * engine.
> > > @@ -228,6 +275,7 @@ enum drm_v3d_param {
> > >  	DRM_V3D_PARAM_SUPPORTS_CSD,
> > >  	DRM_V3D_PARAM_SUPPORTS_CACHE_FLUSH,
> > >  	DRM_V3D_PARAM_SUPPORTS_PERFMON,
> > > +	DRM_V3D_PARAM_SUPPORTS_MULTISYNC_EXT,
> > >  };
> > >  
> > >  struct drm_v3d_get_param {
> > > @@ -271,7 +319,6 @@ struct drm_v3d_submit_tfu {
> > >  
> > >  	/* Pointer to an array of ioctl extensions*/
> > >  	__u64 extensions;
> > > -
> > >  };
> > >  
> > >  /* Submits a compute shader for dispatch.  This job will block on
> > > any
> > 



[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-09-30 16:26 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-29  9:41 [PATCH v2 0/4] drm/v3d: add multiple in/out syncobjs support Melissa Wen
2021-09-29  9:42 ` [PATCH v2 1/4] drm/v3d: decouple adding job dependencies steps from job init Melissa Wen
2021-09-29  9:43 ` [PATCH v2 2/4] drm/v3d: alloc and init job in one shot Melissa Wen
2021-09-30  8:44   ` Iago Toral
2021-09-30  9:16     ` Melissa Wen
2021-09-29  9:44 ` [PATCH v2 3/4] drm/v3d: add generic ioctl extension Melissa Wen
2021-09-30  9:11   ` Iago Toral
2021-09-30  9:22     ` Melissa Wen
2021-09-30  9:59       ` Iago Toral
2021-09-29  9:45 ` [PATCH v2 4/4] drm/v3d: add multiple syncobjs support Melissa Wen
2021-09-30  9:54   ` Iago Toral
2021-09-30 10:40     ` Melissa Wen
2021-09-30 16:26       ` Melissa Wen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.