linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/4] V3D TFU engine support
@ 2018-11-08 16:16 Eric Anholt
  2018-11-08 16:16 ` [PATCH 1/4] drm/v3d: Fix whitespace inconsistency in the header Eric Anholt
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Eric Anholt @ 2018-11-08 16:16 UTC (permalink / raw)
  To: dri-devel; +Cc: linux-kernel, boris.brezillon, Daniel Vetter, Eric Anholt

This series adds support for V3D's TFU engine (a little texture
tiling/YUV import/mipmap generation block).  Corresponding Mesa
userspace is at
https://gitlab.freedesktop.org/anholt/mesa/commits/v3d-tfu

Once we have TFU, the next step will be compute shaders, which are a
lot more interesting.

Eric Anholt (4):
  drm/v3d: Fix whitespace inconsistency in the header.
  drm/v3d: Update a comment about what uses v3d_job_dependency().
  drm/v3d: Clean up the reservation object setup.
  drm/v3d: Add support for submitting jobs to the TFU.

 drivers/gpu/drm/v3d/v3d_drv.c   |  12 +-
 drivers/gpu/drm/v3d/v3d_drv.h   |  32 +++++-
 drivers/gpu/drm/v3d/v3d_gem.c   | 193 ++++++++++++++++++++++++++------
 drivers/gpu/drm/v3d/v3d_irq.c   |  12 +-
 drivers/gpu/drm/v3d/v3d_regs.h  |  58 ++++++++++
 drivers/gpu/drm/v3d/v3d_sched.c | 149 ++++++++++++++++++++----
 drivers/gpu/drm/v3d/v3d_trace.h |  20 ++++
 include/uapi/drm/v3d_drm.h      |  29 ++++-
 8 files changed, 437 insertions(+), 68 deletions(-)

-- 
2.19.1



* [PATCH 1/4] drm/v3d: Fix whitespace inconsistency in the header.
  2018-11-08 16:16 [PATCH 0/4] V3D TFU engine support Eric Anholt
@ 2018-11-08 16:16 ` Eric Anholt
  2018-11-13 10:22   ` Boris Brezillon
  2018-11-08 16:16 ` [PATCH 2/4] drm/v3d: Update a comment about what uses v3d_job_dependency() Eric Anholt
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Eric Anholt @ 2018-11-08 16:16 UTC (permalink / raw)
  To: dri-devel; +Cc: linux-kernel, boris.brezillon, Daniel Vetter, Eric Anholt

Signed-off-by: Eric Anholt <eric@anholt.net>
---
 include/uapi/drm/v3d_drm.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
index f446656d00b1..b1e5de076b0f 100644
--- a/include/uapi/drm/v3d_drm.h
+++ b/include/uapi/drm/v3d_drm.h
@@ -66,7 +66,7 @@ struct drm_v3d_submit_cl {
 	 */
 	__u32 bcl_start;
 
-	 /** End address of the BCL (first byte after the BCL) */
+	/** End address of the BCL (first byte after the BCL) */
 	__u32 bcl_end;
 
 	/* Offset of the render command list.
@@ -82,7 +82,7 @@ struct drm_v3d_submit_cl {
 	 */
 	__u32 rcl_start;
 
-	 /** End address of the RCL (first byte after the RCL) */
+	/** End address of the RCL (first byte after the RCL) */
 	__u32 rcl_end;
 
 	/** An optional sync object to wait on before starting the BCL. */
-- 
2.19.1



* [PATCH 2/4] drm/v3d: Update a comment about what uses v3d_job_dependency().
  2018-11-08 16:16 [PATCH 0/4] V3D TFU engine support Eric Anholt
  2018-11-08 16:16 ` [PATCH 1/4] drm/v3d: Fix whitespace inconsistency in the header Eric Anholt
@ 2018-11-08 16:16 ` Eric Anholt
  2018-11-13 10:22   ` Boris Brezillon
  2018-11-08 16:16 ` [PATCH 3/4] drm/v3d: Clean up the reservation object setup Eric Anholt
  2018-11-08 16:16 ` [PATCH 4/4] drm/v3d: Add support for submitting jobs to the TFU Eric Anholt
  3 siblings, 1 reply; 10+ messages in thread
From: Eric Anholt @ 2018-11-08 16:16 UTC (permalink / raw)
  To: dri-devel; +Cc: linux-kernel, boris.brezillon, Daniel Vetter, Eric Anholt

I merged bin and render's paths in a late refactoring.

Signed-off-by: Eric Anholt <eric@anholt.net>
---
 drivers/gpu/drm/v3d/v3d_sched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 9243dea6e6ad..e1f2aab0717b 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -39,7 +39,7 @@ v3d_job_free(struct drm_sched_job *sched_job)
 }
 
 /**
- * Returns the fences that the bin job depends on, one by one.
+ * Returns the fences that the bin or render job depends on, one by one.
  * v3d_job_run() won't be called until all of them have been signaled.
  */
 static struct dma_fence *
-- 
2.19.1



* [PATCH 3/4] drm/v3d: Clean up the reservation object setup.
  2018-11-08 16:16 [PATCH 0/4] V3D TFU engine support Eric Anholt
  2018-11-08 16:16 ` [PATCH 1/4] drm/v3d: Fix whitespace inconsistency in the header Eric Anholt
  2018-11-08 16:16 ` [PATCH 2/4] drm/v3d: Update a comment about what uses v3d_job_dependency() Eric Anholt
@ 2018-11-08 16:16 ` Eric Anholt
  2018-11-13 10:22   ` Boris Brezillon
  2018-11-08 16:16 ` [PATCH 4/4] drm/v3d: Add support for submitting jobs to the TFU Eric Anholt
  3 siblings, 1 reply; 10+ messages in thread
From: Eric Anholt @ 2018-11-08 16:16 UTC (permalink / raw)
  To: dri-devel; +Cc: linux-kernel, boris.brezillon, Daniel Vetter, Eric Anholt

The extra to_v3d_bo() calls came from copying this from the vc4
driver, which stored the cma gem object in the structs.

Signed-off-by: Eric Anholt <eric@anholt.net>
---
 drivers/gpu/drm/v3d/v3d_gem.c | 32 +++++++++++---------------------
 1 file changed, 11 insertions(+), 21 deletions(-)
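
Note on why the removed calls were redundant: the sketch below uses cut-down, hypothetical stand-ins for the kernel structs and a local container_of(), so it is only an illustration of the idea, not the driver code. Since the exec struct already stores struct v3d_bo pointers, taking &bo->base and converting back with to_v3d_bo() just returns the pointer you started with.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for the kernel types. */
struct drm_gem_object {
	int handle_count;
};

struct v3d_bo {
	struct drm_gem_object base;
	void *resv;
};

/* Local re-implementation of the container_of() idea. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct v3d_bo *to_v3d_bo(struct drm_gem_object *obj)
{
	return container_of(obj, struct v3d_bo, base);
}

/* Old pattern: step down to the base object and straight back up. */
static struct v3d_bo *bo_via_round_trip(struct v3d_bo **bos, int i)
{
	assert(bos != NULL);
	return to_v3d_bo(&bos[i]->base);
}

/* New pattern: the array already holds struct v3d_bo pointers. */
static struct v3d_bo *bo_direct(struct v3d_bo **bos, int i)
{
	assert(bos != NULL);
	return bos[i];
}
```

Both helpers return the same pointer for any valid index, which is why the patch can drop the conversion without changing behavior.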

diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index b88c96911453..d0dfdcbbd42c 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -214,10 +214,8 @@ v3d_attach_object_fences(struct v3d_exec_info *exec)
 	int i;
 
 	for (i = 0; i < exec->bo_count; i++) {
-		bo = to_v3d_bo(&exec->bo[i]->base);
-
 		/* XXX: Use shared fences for read-only objects. */
-		reservation_object_add_excl_fence(bo->resv, out_fence);
+		reservation_object_add_excl_fence(exec->bo[i]->resv, out_fence);
 	}
 }
 
@@ -228,11 +226,8 @@ v3d_unlock_bo_reservations(struct drm_device *dev,
 {
 	int i;
 
-	for (i = 0; i < exec->bo_count; i++) {
-		struct v3d_bo *bo = to_v3d_bo(&exec->bo[i]->base);
-
-		ww_mutex_unlock(&bo->resv->lock);
-	}
+	for (i = 0; i < exec->bo_count; i++)
+		ww_mutex_unlock(&exec->bo[i]->resv->lock);
 
 	ww_acquire_fini(acquire_ctx);
 }
@@ -251,13 +246,13 @@ v3d_lock_bo_reservations(struct drm_device *dev,
 {
 	int contended_lock = -1;
 	int i, ret;
-	struct v3d_bo *bo;
 
 	ww_acquire_init(acquire_ctx, &reservation_ww_class);
 
 retry:
 	if (contended_lock != -1) {
-		bo = to_v3d_bo(&exec->bo[contended_lock]->base);
+		struct v3d_bo *bo = exec->bo[contended_lock];
+
 		ret = ww_mutex_lock_slow_interruptible(&bo->resv->lock,
 						       acquire_ctx);
 		if (ret) {
@@ -270,19 +265,16 @@ v3d_lock_bo_reservations(struct drm_device *dev,
 		if (i == contended_lock)
 			continue;
 
-		bo = to_v3d_bo(&exec->bo[i]->base);
-
-		ret = ww_mutex_lock_interruptible(&bo->resv->lock, acquire_ctx);
+		ret = ww_mutex_lock_interruptible(&exec->bo[i]->resv->lock,
+						  acquire_ctx);
 		if (ret) {
 			int j;
 
-			for (j = 0; j < i; j++) {
-				bo = to_v3d_bo(&exec->bo[j]->base);
-				ww_mutex_unlock(&bo->resv->lock);
-			}
+			for (j = 0; j < i; j++)
+				ww_mutex_unlock(&exec->bo[j]->resv->lock);
 
 			if (contended_lock != -1 && contended_lock >= i) {
-				bo = to_v3d_bo(&exec->bo[contended_lock]->base);
+				struct v3d_bo *bo = exec->bo[contended_lock];
 
 				ww_mutex_unlock(&bo->resv->lock);
 			}
@@ -303,9 +295,7 @@ v3d_lock_bo_reservations(struct drm_device *dev,
 	 * before we commit the CL to the hardware.
 	 */
 	for (i = 0; i < exec->bo_count; i++) {
-		bo = to_v3d_bo(&exec->bo[i]->base);
-
-		ret = reservation_object_reserve_shared(bo->resv, 1);
+		ret = reservation_object_reserve_shared(exec->bo[i]->resv, 1);
 		if (ret) {
 			v3d_unlock_bo_reservations(dev, exec, acquire_ctx);
 			return ret;
-- 
2.19.1



* [PATCH 4/4] drm/v3d: Add support for submitting jobs to the TFU.
  2018-11-08 16:16 [PATCH 0/4] V3D TFU engine support Eric Anholt
                   ` (2 preceding siblings ...)
  2018-11-08 16:16 ` [PATCH 3/4] drm/v3d: Clean up the reservation object setup Eric Anholt
@ 2018-11-08 16:16 ` Eric Anholt
  2018-11-13 22:03   ` Eric Anholt
  3 siblings, 1 reply; 10+ messages in thread
From: Eric Anholt @ 2018-11-08 16:16 UTC (permalink / raw)
  To: dri-devel; +Cc: linux-kernel, boris.brezillon, Daniel Vetter, Eric Anholt

The TFU can copy from raster, UIF, and SAND input images to UIF output
images, with optional mipmap generation.  This will certainly be
useful for media EGL image input, but is also useful immediately for
mipmap generation without bogging the V3D core down.

For now we only run the queue 1 job deep, and don't have any hang
recovery (though I don't think we should need it, with TFU).  Queuing
multiple jobs in the HW will require synchronizing the YUV coefficient
regs updates since they don't get FIFOed with the job.

Signed-off-by: Eric Anholt <eric@anholt.net>
---
 drivers/gpu/drm/v3d/v3d_drv.c   |  12 ++-
 drivers/gpu/drm/v3d/v3d_drv.h   |  32 +++++-
 drivers/gpu/drm/v3d/v3d_gem.c   | 177 ++++++++++++++++++++++++++++----
 drivers/gpu/drm/v3d/v3d_irq.c   |  12 ++-
 drivers/gpu/drm/v3d/v3d_regs.h  |  58 +++++++++++
 drivers/gpu/drm/v3d/v3d_sched.c | 147 ++++++++++++++++++++++----
 drivers/gpu/drm/v3d/v3d_trace.h |  20 ++++
 include/uapi/drm/v3d_drm.h      |  25 +++++
 8 files changed, 431 insertions(+), 52 deletions(-)
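
Note on bo_handles: a small sketch of how the handle array is walked, assuming userspace mirrors the lookup loop in v3d_submit_tfu_ioctl(). TFU_MAX_BOS and struct tfu_handles are hypothetical names for illustration; the patch itself uses ARRAY_SIZE(job->bo) and struct drm_v3d_submit_tfu. The loop stops at the first zero handle, so any handle placed after a zero entry is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; the patch sizes the array as bo[4]. */
#define TFU_MAX_BOS 4

/* Local mirror of the handle array in struct drm_v3d_submit_tfu. */
struct tfu_handles {
	uint32_t bo_handles[TFU_MAX_BOS];
};

/*
 * Count handles the way the ioctl's lookup loop does: the first
 * entry is the output BO, later entries are inputs, and the walk
 * ends at the first 0 (unused) slot.
 */
static int tfu_count_handles(const struct tfu_handles *h)
{
	int n;

	assert(h != NULL);

	for (n = 0; n < TFU_MAX_BOS; n++) {
		if (!h->bo_handles[n])
			break;
	}

	return n;
}
```

Under this reading, userspace should pack its handles contiguously from index 0 and zero-fill the rest.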

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index 4857c0a63131..da0863281a73 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -184,10 +184,15 @@ static int v3d_get_param_ioctl(struct drm_device *dev, void *data,
 		return 0;
 	}
 
-	/* Any params that aren't just register reads would go here. */
 
-	DRM_DEBUG("Unknown parameter %d\n", args->param);
-	return -EINVAL;
+	switch (args->param) {
+	case DRM_V3D_PARAM_SUPPORTS_TFU:
+		args->value = 1;
+		return 0;
+	default:
+		DRM_DEBUG("Unknown parameter %d\n", args->param);
+		return -EINVAL;
+	}
 }
 
 static int
@@ -251,6 +256,7 @@ static const struct drm_ioctl_desc v3d_drm_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(V3D_MMAP_BO, v3d_mmap_bo_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(V3D_GET_PARAM, v3d_get_param_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(V3D_GET_BO_OFFSET, v3d_get_bo_offset_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(V3D_SUBMIT_TFU, v3d_submit_tfu_ioctl, DRM_RENDER_ALLOW | DRM_AUTH),
 };
 
 static const struct vm_operations_struct v3d_vm_ops = {
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 83c55ab6e1c0..e0624ea72942 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -8,19 +8,18 @@
 #include <drm/drm_gem.h>
 #include <drm/gpu_scheduler.h>
 #include <drm/drm_simple_kms_helper.h>
+#include "uapi/drm/v3d_drm.h"
 
 #define GMP_GRANULARITY (128 * 1024)
 
-/* Enum for each of the V3D queues.  We maintain various queue
- * tracking as an array because at some point we'll want to support
- * the TFU (texture formatting unit) as another queue.
- */
+/* Enum for each of the V3D queues. */
 enum v3d_queue {
 	V3D_BIN,
 	V3D_RENDER,
+	V3D_TFU,
 };
 
-#define V3D_MAX_QUEUES (V3D_RENDER + 1)
+#define V3D_MAX_QUEUES (V3D_TFU + 1)
 
 struct v3d_queue_state {
 	struct drm_gpu_scheduler sched;
@@ -74,6 +73,7 @@ struct v3d_dev {
 
 	struct v3d_exec_info *bin_job;
 	struct v3d_exec_info *render_job;
+	struct v3d_tfu_job *tfu_job;
 
 	struct v3d_queue_state queue[V3D_MAX_QUEUES];
 
@@ -224,6 +224,25 @@ struct v3d_exec_info {
 	u32 qma, qms, qts;
 };
 
+struct v3d_tfu_job {
+	struct drm_sched_job base;
+
+	struct drm_v3d_submit_tfu args;
+
+	/* An optional fence userspace can pass in for the job to depend on. */
+	struct dma_fence *in_fence;
+
+	/* v3d fence to be signaled by IRQ handler when the job is complete. */
+	struct dma_fence *done_fence;
+
+	struct v3d_dev *v3d;
+
+	struct kref refcount;
+
+	/* This is the array of BOs that were looked up at the start of exec. */
+	struct v3d_bo *bo[4];
+};
+
 /**
  * _wait_for - magic (register) wait macro
  *
@@ -287,9 +306,12 @@ int v3d_gem_init(struct drm_device *dev);
 void v3d_gem_destroy(struct drm_device *dev);
 int v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 			struct drm_file *file_priv);
+int v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
+			 struct drm_file *file_priv);
 int v3d_wait_bo_ioctl(struct drm_device *dev, void *data,
 		      struct drm_file *file_priv);
 void v3d_exec_put(struct v3d_exec_info *exec);
+void v3d_tfu_job_put(struct v3d_tfu_job *job);
 void v3d_reset(struct v3d_dev *v3d);
 void v3d_invalidate_caches(struct v3d_dev *v3d);
 void v3d_flush_caches(struct v3d_dev *v3d);
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index d0dfdcbbd42c..adc8b1ec15e3 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -207,27 +207,27 @@ v3d_flush_caches(struct v3d_dev *v3d)
 }
 
 static void
-v3d_attach_object_fences(struct v3d_exec_info *exec)
+v3d_attach_object_fences(struct v3d_bo **bos, int bo_count,
+			 struct dma_fence *fence)
 {
-	struct dma_fence *out_fence = exec->render_done_fence;
-	struct v3d_bo *bo;
 	int i;
 
-	for (i = 0; i < exec->bo_count; i++) {
+	for (i = 0; i < bo_count; i++) {
 		/* XXX: Use shared fences for read-only objects. */
-		reservation_object_add_excl_fence(exec->bo[i]->resv, out_fence);
+		reservation_object_add_excl_fence(bos[i]->resv, fence);
 	}
 }
 
 static void
 v3d_unlock_bo_reservations(struct drm_device *dev,
-			   struct v3d_exec_info *exec,
+			   struct v3d_bo **bos,
+			   int bo_count,
 			   struct ww_acquire_ctx *acquire_ctx)
 {
 	int i;
 
-	for (i = 0; i < exec->bo_count; i++)
-		ww_mutex_unlock(&exec->bo[i]->resv->lock);
+	for (i = 0; i < bo_count; i++)
+		ww_mutex_unlock(&bos[i]->resv->lock);
 
 	ww_acquire_fini(acquire_ctx);
 }
@@ -241,7 +241,8 @@ v3d_unlock_bo_reservations(struct drm_device *dev,
  */
 static int
 v3d_lock_bo_reservations(struct drm_device *dev,
-			 struct v3d_exec_info *exec,
+			 struct v3d_bo **bos,
+			 int bo_count,
 			 struct ww_acquire_ctx *acquire_ctx)
 {
 	int contended_lock = -1;
@@ -251,7 +252,7 @@ v3d_lock_bo_reservations(struct drm_device *dev,
 
 retry:
 	if (contended_lock != -1) {
-		struct v3d_bo *bo = exec->bo[contended_lock];
+		struct v3d_bo *bo = bos[contended_lock];
 
 		ret = ww_mutex_lock_slow_interruptible(&bo->resv->lock,
 						       acquire_ctx);
@@ -261,20 +262,20 @@ v3d_lock_bo_reservations(struct drm_device *dev,
 		}
 	}
 
-	for (i = 0; i < exec->bo_count; i++) {
+	for (i = 0; i < bo_count; i++) {
 		if (i == contended_lock)
 			continue;
 
-		ret = ww_mutex_lock_interruptible(&exec->bo[i]->resv->lock,
+		ret = ww_mutex_lock_interruptible(&bos[i]->resv->lock,
 						  acquire_ctx);
 		if (ret) {
 			int j;
 
 			for (j = 0; j < i; j++)
-				ww_mutex_unlock(&exec->bo[j]->resv->lock);
+				ww_mutex_unlock(&bos[j]->resv->lock);
 
 			if (contended_lock != -1 && contended_lock >= i) {
-				struct v3d_bo *bo = exec->bo[contended_lock];
+				struct v3d_bo *bo = bos[contended_lock];
 
 				ww_mutex_unlock(&bo->resv->lock);
 			}
@@ -294,10 +295,11 @@ v3d_lock_bo_reservations(struct drm_device *dev,
 	/* Reserve space for our shared (read-only) fence references,
 	 * before we commit the CL to the hardware.
 	 */
-	for (i = 0; i < exec->bo_count; i++) {
-		ret = reservation_object_reserve_shared(exec->bo[i]->resv, 1);
+	for (i = 0; i < bo_count; i++) {
+		ret = reservation_object_reserve_shared(bos[i]->resv, 1);
 		if (ret) {
-			v3d_unlock_bo_reservations(dev, exec, acquire_ctx);
+			v3d_unlock_bo_reservations(dev, bos, bo_count,
+						   acquire_ctx);
 			return ret;
 		}
 	}
@@ -420,6 +422,31 @@ void v3d_exec_put(struct v3d_exec_info *exec)
 	kref_put(&exec->refcount, v3d_exec_cleanup);
 }
 
+static void
+v3d_tfu_job_cleanup(struct kref *ref)
+{
+	struct v3d_tfu_job *job = container_of(ref, struct v3d_tfu_job,
+					       refcount);
+	struct v3d_dev *v3d = job->v3d;
+	unsigned int i;
+
+	dma_fence_put(job->in_fence);
+	dma_fence_put(job->done_fence);
+
+	for (i = 0; i < ARRAY_SIZE(job->bo); i++)
+		drm_gem_object_put_unlocked(&job->bo[i]->base);
+
+	pm_runtime_mark_last_busy(v3d->dev);
+	pm_runtime_put_autosuspend(v3d->dev);
+
+	kfree(job);
+}
+
+void v3d_tfu_job_put(struct v3d_tfu_job *job)
+{
+	kref_put(&job->refcount, v3d_tfu_job_cleanup);
+}
+
 int
 v3d_wait_bo_ioctl(struct drm_device *dev, void *data,
 		  struct drm_file *file_priv)
@@ -537,7 +564,8 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 	if (ret)
 		goto fail;
 
-	ret = v3d_lock_bo_reservations(dev, exec, &acquire_ctx);
+	ret = v3d_lock_bo_reservations(dev, exec->bo, exec->bo_count,
+				       &acquire_ctx);
 	if (ret)
 		goto fail;
 
@@ -571,9 +599,10 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 				  &v3d_priv->sched_entity[V3D_RENDER]);
 	mutex_unlock(&v3d->sched_lock);
 
-	v3d_attach_object_fences(exec);
+	v3d_attach_object_fences(exec->bo, exec->bo_count,
+				 exec->render_done_fence);
 
-	v3d_unlock_bo_reservations(dev, exec, &acquire_ctx);
+	v3d_unlock_bo_reservations(dev, exec->bo, exec->bo_count, &acquire_ctx);
 
 	/* Update the return sync object for the */
 	sync_out = drm_syncobj_find(file_priv, args->out_sync);
@@ -589,13 +618,119 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 
 fail_unreserve:
 	mutex_unlock(&v3d->sched_lock);
-	v3d_unlock_bo_reservations(dev, exec, &acquire_ctx);
+	v3d_unlock_bo_reservations(dev, exec->bo, exec->bo_count, &acquire_ctx);
 fail:
 	v3d_exec_put(exec);
 
 	return ret;
 }
 
+/**
+ * v3d_submit_tfu_ioctl() - Submits a TFU (texture formatting) job to the V3D.
+ * @dev: DRM device
+ * @data: ioctl argument
+ * @file_priv: DRM file for this fd
+ *
+ * Userspace provides the register setup for the TFU, which we don't
+ * need to validate since the TFU is behind the MMU.
+ */
+int
+v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
+		     struct drm_file *file_priv)
+{
+	struct v3d_dev *v3d = to_v3d_dev(dev);
+	struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
+	struct drm_v3d_submit_tfu *args = data;
+	struct v3d_tfu_job *job;
+	struct ww_acquire_ctx acquire_ctx;
+	struct drm_syncobj *sync_out;
+	struct dma_fence *sched_done_fence;
+	int ret = 0;
+	int bo_count;
+
+	job = kzalloc(sizeof(*job), GFP_KERNEL);
+	if (!job)
+		return -ENOMEM;
+
+	ret = pm_runtime_get_sync(v3d->dev);
+	if (ret < 0) {
+		kfree(job);
+		return ret;
+	}
+
+	kref_init(&job->refcount);
+
+	ret = drm_syncobj_find_fence(file_priv, args->in_sync,
+				     0, 0, &job->in_fence);
+	if (ret == -EINVAL)
+		goto fail;
+
+	job->args = *args;
+	job->v3d = v3d;
+
+	spin_lock(&file_priv->table_lock);
+	for (bo_count = 0; bo_count < ARRAY_SIZE(job->bo); bo_count++) {
+		struct drm_gem_object *bo;
+
+		if (!args->bo_handles[bo_count])
+			break;
+
+		bo = idr_find(&file_priv->object_idr,
+			      args->bo_handles[bo_count]);
+		if (!bo) {
+			DRM_DEBUG("Failed to look up GEM BO %d: %u\n",
+				  bo_count, args->bo_handles[bo_count]);
+			ret = -ENOENT;
+			spin_unlock(&file_priv->table_lock);
+			goto fail;
+		}
+		drm_gem_object_get(bo);
+		job->bo[bo_count] = to_v3d_bo(bo);
+	}
+	spin_unlock(&file_priv->table_lock);
+
+	ret = v3d_lock_bo_reservations(dev, job->bo, bo_count, &acquire_ctx);
+	if (ret)
+		goto fail;
+
+	mutex_lock(&v3d->sched_lock);
+	ret = drm_sched_job_init(&job->base,
+				 &v3d_priv->sched_entity[V3D_TFU],
+				 v3d_priv);
+	if (ret)
+		goto fail_unreserve;
+
+	sched_done_fence = dma_fence_get(&job->base.s_fence->finished);
+
+	kref_get(&job->refcount); /* put by scheduler job completion */
+	drm_sched_entity_push_job(&job->base, &v3d_priv->sched_entity[V3D_TFU]);
+	mutex_unlock(&v3d->sched_lock);
+
+	v3d_attach_object_fences(job->bo, bo_count, sched_done_fence);
+
+	v3d_unlock_bo_reservations(dev, job->bo, bo_count, &acquire_ctx);
+
+	/* Update the return sync object */
+	sync_out = drm_syncobj_find(file_priv, args->out_sync);
+	if (sync_out) {
+		drm_syncobj_replace_fence(sync_out, 0, sched_done_fence);
+		drm_syncobj_put(sync_out);
+	}
+	dma_fence_put(sched_done_fence);
+
+	v3d_tfu_job_put(job);
+
+	return 0;
+
+fail_unreserve:
+	mutex_unlock(&v3d->sched_lock);
+	v3d_unlock_bo_reservations(dev, job->bo, bo_count, &acquire_ctx);
+fail:
+	v3d_tfu_job_put(job);
+
+	return ret;
+}
+
 int
 v3d_gem_init(struct drm_device *dev)
 {
diff --git a/drivers/gpu/drm/v3d/v3d_irq.c b/drivers/gpu/drm/v3d/v3d_irq.c
index e07514eb11b5..dd7a7b0bd5a1 100644
--- a/drivers/gpu/drm/v3d/v3d_irq.c
+++ b/drivers/gpu/drm/v3d/v3d_irq.c
@@ -4,8 +4,8 @@
 /**
  * DOC: Interrupt management for the V3D engine
  *
- * When we take a binning or rendering flush done interrupt, we need
- * to signal the fence for that job so that the scheduler can queue up
+ * When we take a bin, render, or TFU done interrupt, we need to
+ * signal the fence for that job so that the scheduler can queue up
  * the next one and unblock any waiters.
  *
  * When we take the binner out of memory interrupt, we need to
@@ -23,7 +23,8 @@
 
 #define V3D_HUB_IRQS ((u32)(V3D_HUB_INT_MMU_WRV |	\
 			    V3D_HUB_INT_MMU_PTI |	\
-			    V3D_HUB_INT_MMU_CAP))
+			    V3D_HUB_INT_MMU_CAP |	\
+			    V3D_HUB_INT_TFUC))
 
 static void
 v3d_overflow_mem_work(struct work_struct *work)
@@ -117,6 +118,11 @@ v3d_hub_irq(int irq, void *arg)
 	/* Acknowledge the interrupts we're handling here. */
 	V3D_WRITE(V3D_HUB_INT_CLR, intsts);
 
+	if (intsts & V3D_HUB_INT_TFUC) {
+		dma_fence_signal(v3d->tfu_job->done_fence);
+		status = IRQ_HANDLED;
+	}
+
 	if (intsts & (V3D_HUB_INT_MMU_WRV |
 		      V3D_HUB_INT_MMU_PTI |
 		      V3D_HUB_INT_MMU_CAP)) {
diff --git a/drivers/gpu/drm/v3d/v3d_regs.h b/drivers/gpu/drm/v3d/v3d_regs.h
index c3a5e4e44f73..92a2bd55e217 100644
--- a/drivers/gpu/drm/v3d/v3d_regs.h
+++ b/drivers/gpu/drm/v3d/v3d_regs.h
@@ -86,6 +86,64 @@
 # define V3D_TOP_GR_BRIDGE_SW_INIT_1                   0x0000c
 # define V3D_TOP_GR_BRIDGE_SW_INIT_1_V3D_CLK_108_SW_INIT BIT(0)
 
+#define V3D_TFU_CS                                     0x00400
+/* Stops current job, empties input fifo. */
+# define V3D_TFU_CS_TFURST                             BIT(31)
+# define V3D_TFU_CS_CVTCT_MASK                         V3D_MASK(23, 16)
+# define V3D_TFU_CS_CVTCT_SHIFT                        16
+# define V3D_TFU_CS_NFREE_MASK                         V3D_MASK(13, 8)
+# define V3D_TFU_CS_NFREE_SHIFT                        8
+# define V3D_TFU_CS_BUSY                               BIT(0)
+
+#define V3D_TFU_SU                                     0x00404
+/* Interrupt when FINTTHR input slots are free (0 = disabled) */
+# define V3D_TFU_SU_FINTTHR_MASK                       V3D_MASK(13, 8)
+# define V3D_TFU_SU_FINTTHR_SHIFT                      8
+/* Skips resetting the CRC at the start of CRC generation. */
+# define V3D_TFU_SU_CRCCHAIN                           BIT(4)
+/* Skips writes, computes CRC of the image.  Miplevels must be 0. */
+# define V3D_TFU_SU_CRC                                BIT(3)
+# define V3D_TFU_SU_THROTTLE_MASK                      V3D_MASK(1, 0)
+# define V3D_TFU_SU_THROTTLE_SHIFT                     0
+
+#define V3D_TFU_ICFG                                   0x00408
+/* Interrupt when the conversion is complete. */
+# define V3D_TFU_ICFG_IOC                              BIT(0)
+
+/* Input Image Address */
+#define V3D_TFU_IIA                                    0x0040c
+/* Input Chroma Address */
+#define V3D_TFU_ICA                                    0x00410
+/* Input Image Stride */
+#define V3D_TFU_IIS                                    0x00414
+/* Input Image U-Plane Address */
+#define V3D_TFU_IUA                                    0x00418
+/* Output Image Address */
+#define V3D_TFU_IOA                                    0x0041c
+/* Image Output Size */
+#define V3D_TFU_IOS                                    0x00420
+/* TFU YUV Coefficient 0 */
+#define V3D_TFU_COEF0                                  0x00424
+/* Use these regs instead of the defaults. */
+# define V3D_TFU_COEF0_USECOEF                         BIT(31)
+/* TFU YUV Coefficient 1 */
+#define V3D_TFU_COEF1                                  0x00428
+/* TFU YUV Coefficient 2 */
+#define V3D_TFU_COEF2                                  0x0042c
+/* TFU YUV Coefficient 3 */
+#define V3D_TFU_COEF3                                  0x00430
+
+#define V3D_TFU_CRC                                    0x00434
+
+#define V3D_TFU_INT_STS                                0x00438
+#define V3D_TFU_INT_SET                                0x0043c
+#define V3D_TFU_INT_CLR                                0x00440
+#define V3D_TFU_INT_MSK_STS                            0x00444
+#define V3D_TFU_INT_MSK_SET                            0x00448
+#define V3D_TFU_INT_MSK_CLR                            0x0044c
+#define V3D_TFU_INT_TFUC                               BIT(1)
+#define V3D_TFU_INT_TFUF                               BIT(0)
+
 /* Per-MMU registers. */
 
 #define V3D_MMUC_CONTROL                               0x01000
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index e1f2aab0717b..7a3d4020cfca 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -30,6 +30,12 @@ to_v3d_job(struct drm_sched_job *sched_job)
 	return container_of(sched_job, struct v3d_job, base);
 }
 
+static struct v3d_tfu_job *
+to_tfu_job(struct drm_sched_job *sched_job)
+{
+	return container_of(sched_job, struct v3d_tfu_job, base);
+}
+
 static void
 v3d_job_free(struct drm_sched_job *sched_job)
 {
@@ -38,6 +44,14 @@ v3d_job_free(struct drm_sched_job *sched_job)
 	v3d_exec_put(job->exec);
 }
 
+static void
+v3d_tfu_job_free(struct drm_sched_job *sched_job)
+{
+	struct v3d_tfu_job *job = to_tfu_job(sched_job);
+
+	v3d_tfu_job_put(job);
+}
+
 /**
  * Returns the fences that the bin or render job depends on, one by one.
  * v3d_job_run() won't be called until all of them have been signaled.
@@ -76,6 +90,27 @@ v3d_job_dependency(struct drm_sched_job *sched_job,
 	return fence;
 }
 
+/**
+ * Returns the fences that the TFU job depends on, one by one.
+ * v3d_tfu_job_run() won't be called until all of them have been
+ * signaled.
+ */
+static struct dma_fence *
+v3d_tfu_job_dependency(struct drm_sched_job *sched_job,
+		       struct drm_sched_entity *s_entity)
+{
+	struct v3d_tfu_job *job = to_tfu_job(sched_job);
+	struct dma_fence *fence;
+
+	fence = job->in_fence;
+	if (fence) {
+		job->in_fence = NULL;
+		return fence;
+	}
+
+	return NULL;
+}
+
 static struct dma_fence *v3d_job_run(struct drm_sched_job *sched_job)
 {
 	struct v3d_job *job = to_v3d_job(sched_job);
@@ -147,6 +182,71 @@ static struct dma_fence *v3d_job_run(struct drm_sched_job *sched_job)
 	return fence;
 }
 
+static struct dma_fence *
+v3d_tfu_job_run(struct drm_sched_job *sched_job)
+{
+	struct v3d_tfu_job *job = to_tfu_job(sched_job);
+	struct v3d_dev *v3d = job->v3d;
+	struct drm_device *dev = &v3d->drm;
+	struct dma_fence *fence;
+
+	fence = v3d_fence_create(v3d, V3D_TFU);
+	if (IS_ERR(fence))
+		return NULL;
+
+	v3d->tfu_job = job;
+	if (job->done_fence)
+		dma_fence_put(job->done_fence);
+	job->done_fence = dma_fence_get(fence);
+
+	trace_v3d_submit_tfu(dev, to_v3d_fence(fence)->seqno);
+
+	V3D_WRITE(V3D_TFU_IIA, job->args.iia);
+	V3D_WRITE(V3D_TFU_IIS, job->args.iis);
+	V3D_WRITE(V3D_TFU_ICA, job->args.ica);
+	V3D_WRITE(V3D_TFU_IUA, job->args.iua);
+	V3D_WRITE(V3D_TFU_IOA, job->args.ioa);
+	V3D_WRITE(V3D_TFU_IOS, job->args.ios);
+	if (job->args.coef[0] & V3D_TFU_COEF0_USECOEF) {
+		V3D_WRITE(V3D_TFU_COEF0, job->args.coef[0]);
+		V3D_WRITE(V3D_TFU_COEF1, job->args.coef[1]);
+		V3D_WRITE(V3D_TFU_COEF2, job->args.coef[2]);
+		V3D_WRITE(V3D_TFU_COEF3, job->args.coef[3]);
+	}
+	/* ICFG kicks off the job. */
+	V3D_WRITE(V3D_TFU_ICFG, job->args.icfg | V3D_TFU_ICFG_IOC);
+
+	return fence;
+}
+
+static void
+v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, struct drm_sched_job *sched_job)
+{
+	enum v3d_queue q;
+
+	mutex_lock(&v3d->reset_lock);
+
+	/* block scheduler */
+	for (q = 0; q < V3D_MAX_QUEUES; q++) {
+		struct drm_gpu_scheduler *sched = &v3d->queue[q].sched;
+
+		kthread_park(sched->thread);
+		drm_sched_hw_job_reset(sched, (sched_job->sched == sched ?
+					       sched_job : NULL));
+	}
+
+	/* get the GPU back into the init state */
+	v3d_reset(v3d);
+
+	/* Unblock schedulers and restart their jobs. */
+	for (q = 0; q < V3D_MAX_QUEUES; q++) {
+		drm_sched_job_recovery(&v3d->queue[q].sched);
+		kthread_unpark(v3d->queue[q].sched.thread);
+	}
+
+	mutex_unlock(&v3d->reset_lock);
+}
+
 static void
 v3d_job_timedout(struct drm_sched_job *sched_job)
 {
@@ -154,7 +254,6 @@ v3d_job_timedout(struct drm_sched_job *sched_job)
 	struct v3d_exec_info *exec = job->exec;
 	struct v3d_dev *v3d = exec->v3d;
 	enum v3d_queue job_q = job == &exec->bin ? V3D_BIN : V3D_RENDER;
-	enum v3d_queue q;
 	u32 ctca = V3D_CORE_READ(0, V3D_CLE_CTNCA(job_q));
 	u32 ctra = V3D_CORE_READ(0, V3D_CLE_CTNRA(job_q));
 
@@ -173,27 +272,15 @@ v3d_job_timedout(struct drm_sched_job *sched_job)
 		return;
 	}
 
-	mutex_lock(&v3d->reset_lock);
-
-	/* block scheduler */
-	for (q = 0; q < V3D_MAX_QUEUES; q++) {
-		struct drm_gpu_scheduler *sched = &v3d->queue[q].sched;
-
-		kthread_park(sched->thread);
-		drm_sched_hw_job_reset(sched, (sched_job->sched == sched ?
-					       sched_job : NULL));
-	}
-
-	/* get the GPU back into the init state */
-	v3d_reset(v3d);
+	v3d_gpu_reset_for_timeout(v3d, sched_job);
+}
 
-	/* Unblock schedulers and restart their jobs. */
-	for (q = 0; q < V3D_MAX_QUEUES; q++) {
-		drm_sched_job_recovery(&v3d->queue[q].sched);
-		kthread_unpark(v3d->queue[q].sched.thread);
-	}
+static void
+v3d_tfu_job_timedout(struct drm_sched_job *sched_job)
+{
+	struct v3d_tfu_job *job = to_tfu_job(sched_job);
 
-	mutex_unlock(&v3d->reset_lock);
+	v3d_gpu_reset_for_timeout(job->v3d, sched_job);
 }
 
 static const struct drm_sched_backend_ops v3d_sched_ops = {
@@ -203,6 +290,13 @@ static const struct drm_sched_backend_ops v3d_sched_ops = {
 	.free_job = v3d_job_free
 };
 
+static const struct drm_sched_backend_ops v3d_tfu_sched_ops = {
+	.dependency = v3d_tfu_job_dependency,
+	.run_job = v3d_tfu_job_run,
+	.timedout_job = v3d_tfu_job_timedout,
+	.free_job = v3d_tfu_job_free
+};
+
 int
 v3d_sched_init(struct v3d_dev *v3d)
 {
@@ -233,6 +327,19 @@ v3d_sched_init(struct v3d_dev *v3d)
 		return ret;
 	}
 
+	ret = drm_sched_init(&v3d->queue[V3D_TFU].sched,
+			     &v3d_tfu_sched_ops,
+			     hw_jobs_limit, job_hang_limit,
+			     msecs_to_jiffies(hang_limit_ms),
+			     "v3d_tfu");
+	if (ret) {
+		dev_err(v3d->dev, "Failed to create TFU scheduler: %d.",
+			ret);
+		drm_sched_fini(&v3d->queue[V3D_RENDER].sched);
+		drm_sched_fini(&v3d->queue[V3D_BIN].sched);
+		return ret;
+	}
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/v3d/v3d_trace.h b/drivers/gpu/drm/v3d/v3d_trace.h
index 85dd351e1e09..f54ed9cd3444 100644
--- a/drivers/gpu/drm/v3d/v3d_trace.h
+++ b/drivers/gpu/drm/v3d/v3d_trace.h
@@ -42,6 +42,26 @@ TRACE_EVENT(v3d_submit_cl,
 		      __entry->ctnqea)
 );
 
+TRACE_EVENT(v3d_submit_tfu,
+	    TP_PROTO(struct drm_device *dev,
+		     uint64_t seqno),
+	    TP_ARGS(dev, seqno),
+
+	    TP_STRUCT__entry(
+			     __field(u32, dev)
+			     __field(u64, seqno)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->dev = dev->primary->index;
+			   __entry->seqno = seqno;
+			   ),
+
+	    TP_printk("dev=%u, seqno=%llu",
+		      __entry->dev,
+		      __entry->seqno)
+);
+
 TRACE_EVENT(v3d_reset_begin,
 	    TP_PROTO(struct drm_device *dev),
 	    TP_ARGS(dev),
diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
index b1e5de076b0f..3bb4c6136f18 100644
--- a/include/uapi/drm/v3d_drm.h
+++ b/include/uapi/drm/v3d_drm.h
@@ -36,6 +36,7 @@ extern "C" {
 #define DRM_V3D_MMAP_BO                           0x03
 #define DRM_V3D_GET_PARAM                         0x04
 #define DRM_V3D_GET_BO_OFFSET                     0x05
+#define DRM_V3D_SUBMIT_TFU                        0x06
 
 #define DRM_IOCTL_V3D_SUBMIT_CL           DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_SUBMIT_CL, struct drm_v3d_submit_cl)
 #define DRM_IOCTL_V3D_WAIT_BO             DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_WAIT_BO, struct drm_v3d_wait_bo)
@@ -43,6 +44,7 @@ extern "C" {
 #define DRM_IOCTL_V3D_MMAP_BO             DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_MMAP_BO, struct drm_v3d_mmap_bo)
 #define DRM_IOCTL_V3D_GET_PARAM           DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_GET_PARAM, struct drm_v3d_get_param)
 #define DRM_IOCTL_V3D_GET_BO_OFFSET       DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_GET_BO_OFFSET, struct drm_v3d_get_bo_offset)
+#define DRM_IOCTL_V3D_SUBMIT_TFU          DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_SUBMIT_TFU, struct drm_v3d_submit_tfu)
 
 /**
  * struct drm_v3d_submit_cl - ioctl argument for submitting commands to the 3D
@@ -179,6 +181,7 @@ enum drm_v3d_param {
 	DRM_V3D_PARAM_V3D_CORE0_IDENT0,
 	DRM_V3D_PARAM_V3D_CORE0_IDENT1,
 	DRM_V3D_PARAM_V3D_CORE0_IDENT2,
+	DRM_V3D_PARAM_SUPPORTS_TFU,
 };
 
 struct drm_v3d_get_param {
@@ -197,6 +200,28 @@ struct drm_v3d_get_bo_offset {
 	__u32 offset;
 };
 
+struct drm_v3d_submit_tfu {
+	__u32 icfg;
+	__u32 iia;
+	__u32 iis;
+	__u32 ica;
+	__u32 iua;
+	__u32 ioa;
+	__u32 ios;
+	__u32 coef[4];
+	/* First handle is the output BO, following are other inputs.
+	 * 0 for unused.
+	 */
+	__u32 bo_handles[4];
+	/* Sync object to block on before submitting the TFU job.  Each TFU
+	 * job will execute in the order submitted to its FD.  Synchronization
+	 * against rendering jobs requires using sync objects.
+	 */
+	__u32 in_sync;
+	/* Sync object to signal when the TFU job is done. */
+	__u32 out_sync;
+};
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.19.1
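
The bo_handles[] convention in the new uapi struct (output BO first, then
inputs, 0 for unused) can be exercised from userspace without touching the
device. The following is only a sketch under stated assumptions: the struct is
a local mirror of the proposed drm_v3d_submit_tfu layout, and the names
tfu_submit_sketch, tfu_set_handles() and tfu_count_handles() are hypothetical
helpers, not part of the patch or of any library. Real code would include the
installed v3d_drm.h and pass the struct to DRM_IOCTL_V3D_SUBMIT_TFU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TFU_MAX_BOS 4

/* Local mirror of the proposed struct drm_v3d_submit_tfu (illustration only). */
struct tfu_submit_sketch {
	uint32_t icfg, iia, iis, ica, iua, ioa, ios;
	uint32_t coef[4];
	uint32_t bo_handles[TFU_MAX_BOS];
	uint32_t in_sync;
	uint32_t out_sync;
};

/* 17 32-bit words, so the layout should carry no implicit padding. */
static_assert(sizeof(struct tfu_submit_sketch) == 17 * sizeof(uint32_t),
	      "unexpected padding in tfu_submit_sketch");

/* Hypothetical helper: output BO goes first, inputs follow, the rest stay 0.
 * Returns 0 on success, -1 if the handles cannot be represented.
 */
static int tfu_set_handles(struct tfu_submit_sketch *args, uint32_t out_bo,
			   const uint32_t *in_bos, size_t n_in)
{
	size_t i;

	assert(args != NULL);

	if (out_bo == 0 || n_in > TFU_MAX_BOS - 1)
		return -1;

	/* A zero handle in the middle would end the list early, so reject it. */
	for (i = 0; i < n_in; i++)
		if (in_bos[i] == 0)
			return -1;

	memset(args->bo_handles, 0, sizeof(args->bo_handles));
	args->bo_handles[0] = out_bo;
	for (i = 0; i < n_in; i++)
		args->bo_handles[i + 1] = in_bos[i];

	return 0;
}

/* Same stop-at-first-zero walk as the lookup loop in v3d_submit_tfu_ioctl(). */
static size_t tfu_count_handles(const struct tfu_submit_sketch *args)
{
	size_t n;

	assert(args != NULL);

	for (n = 0; n < TFU_MAX_BOS; n++)
		if (args->bo_handles[n] == 0)
			break;

	return n;
}
```

Rejecting a zero input handle up front keeps userspace from silently dropping
the BOs that follow it, since the kernel loop stops at the first 0 entry.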



* Re: [PATCH 1/4] drm/v3d: Fix whitespace inconsistency in the header.
  2018-11-08 16:16 ` [PATCH 1/4] drm/v3d: Fix whitespace inconsistency in the header Eric Anholt
@ 2018-11-13 10:22   ` Boris Brezillon
  0 siblings, 0 replies; 10+ messages in thread
From: Boris Brezillon @ 2018-11-13 10:22 UTC (permalink / raw)
  To: Eric Anholt; +Cc: dri-devel, linux-kernel, Daniel Vetter

On Thu,  8 Nov 2018 08:16:51 -0800
Eric Anholt <eric@anholt.net> wrote:

Maybe you could add a short description here to avoid having an empty
commit message.

> Signed-off-by: Eric Anholt <eric@anholt.net>

Reviewed-by: Boris Brezillon <boris.brezillon@bootlin.com>

> ---
>  include/uapi/drm/v3d_drm.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
> index f446656d00b1..b1e5de076b0f 100644
> --- a/include/uapi/drm/v3d_drm.h
> +++ b/include/uapi/drm/v3d_drm.h
> @@ -66,7 +66,7 @@ struct drm_v3d_submit_cl {
>  	 */
>  	__u32 bcl_start;
>  
> -	 /** End address of the BCL (first byte after the BCL) */
> +	/** End address of the BCL (first byte after the BCL) */
>  	__u32 bcl_end;
>  
>  	/* Offset of the render command list.
> @@ -82,7 +82,7 @@ struct drm_v3d_submit_cl {
>  	 */
>  	__u32 rcl_start;
>  
> -	 /** End address of the RCL (first byte after the RCL) */
> +	/** End address of the RCL (first byte after the RCL) */
>  	__u32 rcl_end;
>  
>  	/** An optional sync object to wait on before starting the BCL. */



* Re: [PATCH 2/4] drm/v3d: Update a comment about what uses v3d_job_dependency().
  2018-11-08 16:16 ` [PATCH 2/4] drm/v3d: Update a comment about what uses v3d_job_dependency() Eric Anholt
@ 2018-11-13 10:22   ` Boris Brezillon
  0 siblings, 0 replies; 10+ messages in thread
From: Boris Brezillon @ 2018-11-13 10:22 UTC (permalink / raw)
  To: Eric Anholt; +Cc: dri-devel, linux-kernel, Daniel Vetter

On Thu,  8 Nov 2018 08:16:52 -0800
Eric Anholt <eric@anholt.net> wrote:

> I merged bin and render's paths in a late refactoring.
> 
> Signed-off-by: Eric Anholt <eric@anholt.net>

Reviewed-by: Boris Brezillon <boris.brezillon@bootlin.com>

> ---
>  drivers/gpu/drm/v3d/v3d_sched.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
> index 9243dea6e6ad..e1f2aab0717b 100644
> --- a/drivers/gpu/drm/v3d/v3d_sched.c
> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
> @@ -39,7 +39,7 @@ v3d_job_free(struct drm_sched_job *sched_job)
>  }
>  
>  /**
> - * Returns the fences that the bin job depends on, one by one.
> + * Returns the fences that the bin or render job depends on, one by one.
>   * v3d_job_run() won't be called until all of them have been signaled.
>   */
>  static struct dma_fence *
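
The "one by one" contract described in that comment can be modelled outside
the kernel. What follows is a minimal sketch, not the driver code: fake_fence,
fake_job, fake_job_dependency() and fake_sched_drain() are made-up stand-ins,
and the single optional fence mirrors the shape of v3d_tfu_job_dependency()
from patch 4/4 rather than the bin/render path touched here.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct dma_fence; only its identity matters here. */
struct fake_fence {
	int id;
};

/* Stand-in for a job with a single optional dependency. */
struct fake_job {
	struct fake_fence *in_fence;	/* NULL when there is nothing to wait on */
};

/* Hand back each outstanding dependency exactly once.  Returning NULL
 * means "nothing left to wait for, the job may run".
 */
static struct fake_fence *fake_job_dependency(struct fake_job *job)
{
	struct fake_fence *fence;

	assert(job != NULL);

	fence = job->in_fence;
	job->in_fence = NULL;	/* forget it so it is never returned twice */

	return fence;
}

/* Simplified model of the caller: keep asking until no dependency is
 * left, and report how many fences it would have had to wait on.
 */
static int fake_sched_drain(struct fake_job *job)
{
	int waited = 0;

	while (fake_job_dependency(job) != NULL)
		waited++;	/* a real scheduler would wait for the signal here */

	return waited;
}
```

Clearing the pointer before returning is what lets the caller loop until it
sees NULL without the callback needing any extra state.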



* Re: [PATCH 3/4] drm/v3d: Clean up the reservation object setup.
  2018-11-08 16:16 ` [PATCH 3/4] drm/v3d: Clean up the reservation object setup Eric Anholt
@ 2018-11-13 10:22   ` Boris Brezillon
  0 siblings, 0 replies; 10+ messages in thread
From: Boris Brezillon @ 2018-11-13 10:22 UTC (permalink / raw)
  To: Eric Anholt; +Cc: dri-devel, linux-kernel, Daniel Vetter

On Thu,  8 Nov 2018 08:16:53 -0800
Eric Anholt <eric@anholt.net> wrote:

> The extra to_v3d_bo() calls came from copying this from the vc4
> driver, which stored the cma gem object in the structs.
> 
> Signed-off-by: Eric Anholt <eric@anholt.net>

Reviewed-by: Boris Brezillon <boris.brezillon@bootlin.com>

> ---
>  drivers/gpu/drm/v3d/v3d_gem.c | 32 +++++++++++---------------------
>  1 file changed, 11 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
> index b88c96911453..d0dfdcbbd42c 100644
> --- a/drivers/gpu/drm/v3d/v3d_gem.c
> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> @@ -214,10 +214,8 @@ v3d_attach_object_fences(struct v3d_exec_info *exec)
>  	int i;
>  
>  	for (i = 0; i < exec->bo_count; i++) {
> -		bo = to_v3d_bo(&exec->bo[i]->base);
> -
>  		/* XXX: Use shared fences for read-only objects. */
> -		reservation_object_add_excl_fence(bo->resv, out_fence);
> +		reservation_object_add_excl_fence(exec->bo[i]->resv, out_fence);
>  	}
>  }
>  
> @@ -228,11 +226,8 @@ v3d_unlock_bo_reservations(struct drm_device *dev,
>  {
>  	int i;
>  
> -	for (i = 0; i < exec->bo_count; i++) {
> -		struct v3d_bo *bo = to_v3d_bo(&exec->bo[i]->base);
> -
> -		ww_mutex_unlock(&bo->resv->lock);
> -	}
> +	for (i = 0; i < exec->bo_count; i++)
> +		ww_mutex_unlock(&exec->bo[i]->resv->lock);
>  
>  	ww_acquire_fini(acquire_ctx);
>  }
> @@ -251,13 +246,13 @@ v3d_lock_bo_reservations(struct drm_device *dev,
>  {
>  	int contended_lock = -1;
>  	int i, ret;
> -	struct v3d_bo *bo;
>  
>  	ww_acquire_init(acquire_ctx, &reservation_ww_class);
>  
>  retry:
>  	if (contended_lock != -1) {
> -		bo = to_v3d_bo(&exec->bo[contended_lock]->base);
> +		struct v3d_bo *bo = exec->bo[contended_lock];
> +
>  		ret = ww_mutex_lock_slow_interruptible(&bo->resv->lock,
>  						       acquire_ctx);
>  		if (ret) {
> @@ -270,19 +265,16 @@ v3d_lock_bo_reservations(struct drm_device *dev,
>  		if (i == contended_lock)
>  			continue;
>  
> -		bo = to_v3d_bo(&exec->bo[i]->base);
> -
> -		ret = ww_mutex_lock_interruptible(&bo->resv->lock, acquire_ctx);
> +		ret = ww_mutex_lock_interruptible(&exec->bo[i]->resv->lock,
> +						  acquire_ctx);
>  		if (ret) {
>  			int j;
>  
> -			for (j = 0; j < i; j++) {
> -				bo = to_v3d_bo(&exec->bo[j]->base);
> -				ww_mutex_unlock(&bo->resv->lock);
> -			}
> +			for (j = 0; j < i; j++)
> +				ww_mutex_unlock(&exec->bo[j]->resv->lock);
>  
>  			if (contended_lock != -1 && contended_lock >= i) {
> -				bo = to_v3d_bo(&exec->bo[contended_lock]->base);
> +				struct v3d_bo *bo = exec->bo[contended_lock];
>  
>  				ww_mutex_unlock(&bo->resv->lock);
>  			}
> @@ -303,9 +295,7 @@ v3d_lock_bo_reservations(struct drm_device *dev,
>  	 * before we commit the CL to the hardware.
>  	 */
>  	for (i = 0; i < exec->bo_count; i++) {
> -		bo = to_v3d_bo(&exec->bo[i]->base);
> -
> -		ret = reservation_object_reserve_shared(bo->resv, 1);
> +		ret = reservation_object_reserve_shared(exec->bo[i]->resv, 1);
>  		if (ret) {
>  			v3d_unlock_bo_reservations(dev, exec, acquire_ctx);
>  			return ret;



* Re: [PATCH 4/4] drm/v3d: Add support for submitting jobs to the TFU.
  2018-11-08 16:16 ` [PATCH 4/4] drm/v3d: Add support for submitting jobs to the TFU Eric Anholt
@ 2018-11-13 22:03   ` Eric Anholt
  2018-11-28 19:45     ` Dave Emett
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Anholt @ 2018-11-13 22:03 UTC (permalink / raw)
  To: dri-devel; +Cc: linux-kernel, boris.brezillon, Daniel Vetter, Dave Emett


Eric Anholt <eric@anholt.net> writes:

> The TFU can copy from raster, UIF, and SAND input images to UIF output
> images, with optional mipmap generation.  This will certainly be
> useful for media EGL image input, but is also useful immediately for
> mipmap generation without bogging the V3D core down.
>
> For now we only run the queue 1 job deep, and don't have any hang
> recovery (though I don't think we should need it, with TFU).  Queuing
> multiple jobs in the HW will require synchronizing the YUV coefficient
> regs updates since they don't get FIFOed with the job.
>
> Signed-off-by: Eric Anholt <eric@anholt.net>

Ccing Dave Emett, who may be able to provide useful feedback.

I think one interesting question here is: if the TFU hangs (has it ever
hung, in our experience?), do we want to reset the whole V3D, or is the
reset flag in the TFU block enough?
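
For reference while weighing the TFU-local option, here is a rough sketch of
decoding the V3D_TFU_CS status word using the bit positions from the register
definitions in the patch below. Everything else is an assumption made for
illustration: the TFU_CS_* names, tfu_cs_decode() and tfu_cs_looks_idle() are
hypothetical, and treating a cleared BUSY bit as "the local reset took effect"
is a guess, not documented hardware behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit layout copied from the V3D_TFU_CS definitions in the quoted patch. */
#define TFU_CS_TFURST		(UINT32_C(1) << 31)
#define TFU_CS_CVTCT_SHIFT	16
#define TFU_CS_CVTCT_MASK	(UINT32_C(0xff) << TFU_CS_CVTCT_SHIFT)	/* 23:16 */
#define TFU_CS_NFREE_SHIFT	8
#define TFU_CS_NFREE_MASK	(UINT32_C(0x3f) << TFU_CS_NFREE_SHIFT)	/* 13:8 */
#define TFU_CS_BUSY		(UINT32_C(1) << 0)

static_assert((TFU_CS_CVTCT_MASK & TFU_CS_NFREE_MASK) == 0,
	      "CVTCT and NFREE fields must not overlap");

/* Raw field values pulled out of one read of the CS register. */
struct tfu_cs_fields {
	bool busy;
	uint32_t nfree;
	uint32_t cvtct;
};

static struct tfu_cs_fields tfu_cs_decode(uint32_t cs)
{
	struct tfu_cs_fields f;

	f.busy = (cs & TFU_CS_BUSY) != 0;
	f.nfree = (cs & TFU_CS_NFREE_MASK) >> TFU_CS_NFREE_SHIFT;
	f.cvtct = (cs & TFU_CS_CVTCT_MASK) >> TFU_CS_CVTCT_SHIFT;

	return f;
}

/* Assumption: after writing TFU_CS_TFURST, a clear BUSY bit would be the
 * sign that the block went idle and a full V3D reset could be skipped.
 */
static bool tfu_cs_looks_idle(uint32_t cs)
{
	return !tfu_cs_decode(cs).busy;
}
```

If BUSY turned out to stay set after a local reset, falling back to the
existing full v3d_reset() path would still be available.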

> ---
>  drivers/gpu/drm/v3d/v3d_drv.c   |  12 ++-
>  drivers/gpu/drm/v3d/v3d_drv.h   |  32 +++++-
>  drivers/gpu/drm/v3d/v3d_gem.c   | 177 ++++++++++++++++++++++++++++----
>  drivers/gpu/drm/v3d/v3d_irq.c   |  12 ++-
>  drivers/gpu/drm/v3d/v3d_regs.h  |  58 +++++++++++
>  drivers/gpu/drm/v3d/v3d_sched.c | 147 ++++++++++++++++++++++----
>  drivers/gpu/drm/v3d/v3d_trace.h |  20 ++++
>  include/uapi/drm/v3d_drm.h      |  25 +++++
>  8 files changed, 431 insertions(+), 52 deletions(-)
>
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
> index 4857c0a63131..da0863281a73 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.c
> +++ b/drivers/gpu/drm/v3d/v3d_drv.c
> @@ -184,10 +184,15 @@ static int v3d_get_param_ioctl(struct drm_device *dev, void *data,
>  		return 0;
>  	}
>  
> -	/* Any params that aren't just register reads would go here. */
>  
> -	DRM_DEBUG("Unknown parameter %d\n", args->param);
> -	return -EINVAL;
> +	switch (args->param) {
> +	case DRM_V3D_PARAM_SUPPORTS_TFU:
> +		args->value = 1;
> +		return 0;
> +	default:
> +		DRM_DEBUG("Unknown parameter %d\n", args->param);
> +		return -EINVAL;
> +	}
>  }
>  
>  static int
> @@ -251,6 +256,7 @@ static const struct drm_ioctl_desc v3d_drm_ioctls[] = {
>  	DRM_IOCTL_DEF_DRV(V3D_MMAP_BO, v3d_mmap_bo_ioctl, DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(V3D_GET_PARAM, v3d_get_param_ioctl, DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(V3D_GET_BO_OFFSET, v3d_get_bo_offset_ioctl, DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(V3D_SUBMIT_TFU, v3d_submit_tfu_ioctl, DRM_RENDER_ALLOW | DRM_AUTH),
>  };
>  
>  static const struct vm_operations_struct v3d_vm_ops = {
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
> index 83c55ab6e1c0..e0624ea72942 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.h
> +++ b/drivers/gpu/drm/v3d/v3d_drv.h
> @@ -8,19 +8,18 @@
>  #include <drm/drm_gem.h>
>  #include <drm/gpu_scheduler.h>
>  #include <drm/drm_simple_kms_helper.h>
> +#include "uapi/drm/v3d_drm.h"
>  
>  #define GMP_GRANULARITY (128 * 1024)
>  
> -/* Enum for each of the V3D queues.  We maintain various queue
> - * tracking as an array because at some point we'll want to support
> - * the TFU (texture formatting unit) as another queue.
> - */
> +/* Enum for each of the V3D queues. */
>  enum v3d_queue {
>  	V3D_BIN,
>  	V3D_RENDER,
> +	V3D_TFU,
>  };
>  
> -#define V3D_MAX_QUEUES (V3D_RENDER + 1)
> +#define V3D_MAX_QUEUES (V3D_TFU + 1)
>  
>  struct v3d_queue_state {
>  	struct drm_gpu_scheduler sched;
> @@ -74,6 +73,7 @@ struct v3d_dev {
>  
>  	struct v3d_exec_info *bin_job;
>  	struct v3d_exec_info *render_job;
> +	struct v3d_tfu_job *tfu_job;
>  
>  	struct v3d_queue_state queue[V3D_MAX_QUEUES];
>  
> @@ -224,6 +224,25 @@ struct v3d_exec_info {
>  	u32 qma, qms, qts;
>  };
>  
> +struct v3d_tfu_job {
> +	struct drm_sched_job base;
> +
> +	struct drm_v3d_submit_tfu args;
> +
> +	/* An optional fence userspace can pass in for the job to depend on. */
> +	struct dma_fence *in_fence;
> +
> +	/* v3d fence to be signaled by IRQ handler when the job is complete. */
> +	struct dma_fence *done_fence;
> +
> +	struct v3d_dev *v3d;
> +
> +	struct kref refcount;
> +
> +	/* This is the array of BOs that were looked up at the start of exec. */
> +	struct v3d_bo *bo[4];
> +};
> +
>  /**
>   * _wait_for - magic (register) wait macro
>   *
> @@ -287,9 +306,12 @@ int v3d_gem_init(struct drm_device *dev);
>  void v3d_gem_destroy(struct drm_device *dev);
>  int v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
>  			struct drm_file *file_priv);
> +int v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
> +			 struct drm_file *file_priv);
>  int v3d_wait_bo_ioctl(struct drm_device *dev, void *data,
>  		      struct drm_file *file_priv);
>  void v3d_exec_put(struct v3d_exec_info *exec);
> +void v3d_tfu_job_put(struct v3d_tfu_job *exec);
>  void v3d_reset(struct v3d_dev *v3d);
>  void v3d_invalidate_caches(struct v3d_dev *v3d);
>  void v3d_flush_caches(struct v3d_dev *v3d);
> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
> index d0dfdcbbd42c..adc8b1ec15e3 100644
> --- a/drivers/gpu/drm/v3d/v3d_gem.c
> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> @@ -207,27 +207,27 @@ v3d_flush_caches(struct v3d_dev *v3d)
>  }
>  
>  static void
> -v3d_attach_object_fences(struct v3d_exec_info *exec)
> +v3d_attach_object_fences(struct v3d_bo **bos, int bo_count,
> +			 struct dma_fence *fence)
>  {
> -	struct dma_fence *out_fence = exec->render_done_fence;
> -	struct v3d_bo *bo;
>  	int i;
>  
> -	for (i = 0; i < exec->bo_count; i++) {
> +	for (i = 0; i < bo_count; i++) {
>  		/* XXX: Use shared fences for read-only objects. */
> -		reservation_object_add_excl_fence(exec->bo[i]->resv, out_fence);
> +		reservation_object_add_excl_fence(bos[i]->resv, fence);
>  	}
>  }
>  
>  static void
>  v3d_unlock_bo_reservations(struct drm_device *dev,
> -			   struct v3d_exec_info *exec,
> +			   struct v3d_bo **bos,
> +			   int bo_count,
>  			   struct ww_acquire_ctx *acquire_ctx)
>  {
>  	int i;
>  
> -	for (i = 0; i < exec->bo_count; i++)
> -		ww_mutex_unlock(&exec->bo[i]->resv->lock);
> +	for (i = 0; i < bo_count; i++)
> +		ww_mutex_unlock(&bos[i]->resv->lock);
>  
>  	ww_acquire_fini(acquire_ctx);
>  }
> @@ -241,7 +241,8 @@ v3d_unlock_bo_reservations(struct drm_device *dev,
>   */
>  static int
>  v3d_lock_bo_reservations(struct drm_device *dev,
> -			 struct v3d_exec_info *exec,
> +			 struct v3d_bo **bos,
> +			 int bo_count,
>  			 struct ww_acquire_ctx *acquire_ctx)
>  {
>  	int contended_lock = -1;
> @@ -251,7 +252,7 @@ v3d_lock_bo_reservations(struct drm_device *dev,
>  
>  retry:
>  	if (contended_lock != -1) {
> -		struct v3d_bo *bo = exec->bo[contended_lock];
> +		struct v3d_bo *bo = bos[contended_lock];
>  
>  		ret = ww_mutex_lock_slow_interruptible(&bo->resv->lock,
>  						       acquire_ctx);
> @@ -261,20 +262,20 @@ v3d_lock_bo_reservations(struct drm_device *dev,
>  		}
>  	}
>  
> -	for (i = 0; i < exec->bo_count; i++) {
> +	for (i = 0; i < bo_count; i++) {
>  		if (i == contended_lock)
>  			continue;
>  
> -		ret = ww_mutex_lock_interruptible(&exec->bo[i]->resv->lock,
> +		ret = ww_mutex_lock_interruptible(&bos[i]->resv->lock,
>  						  acquire_ctx);
>  		if (ret) {
>  			int j;
>  
>  			for (j = 0; j < i; j++)
> -				ww_mutex_unlock(&exec->bo[j]->resv->lock);
> +				ww_mutex_unlock(&bos[j]->resv->lock);
>  
>  			if (contended_lock != -1 && contended_lock >= i) {
> -				struct v3d_bo *bo = exec->bo[contended_lock];
> +				struct v3d_bo *bo = bos[contended_lock];
>  
>  				ww_mutex_unlock(&bo->resv->lock);
>  			}
> @@ -294,10 +295,11 @@ v3d_lock_bo_reservations(struct drm_device *dev,
>  	/* Reserve space for our shared (read-only) fence references,
>  	 * before we commit the CL to the hardware.
>  	 */
> -	for (i = 0; i < exec->bo_count; i++) {
> -		ret = reservation_object_reserve_shared(exec->bo[i]->resv, 1);
> +	for (i = 0; i < bo_count; i++) {
> +		ret = reservation_object_reserve_shared(bos[i]->resv, 1);
>  		if (ret) {
> -			v3d_unlock_bo_reservations(dev, exec, acquire_ctx);
> +			v3d_unlock_bo_reservations(dev, bos, bo_count,
> +						   acquire_ctx);
>  			return ret;
>  		}
>  	}
> @@ -420,6 +422,31 @@ void v3d_exec_put(struct v3d_exec_info *exec)
>  	kref_put(&exec->refcount, v3d_exec_cleanup);
>  }
>  
> +static void
> +v3d_tfu_job_cleanup(struct kref *ref)
> +{
> +	struct v3d_tfu_job *job = container_of(ref, struct v3d_tfu_job,
> +					       refcount);
> +	struct v3d_dev *v3d = job->v3d;
> +	unsigned int i;
> +
> +	dma_fence_put(job->in_fence);
> +	dma_fence_put(job->done_fence);
> +
> +	for (i = 0; i < ARRAY_SIZE(job->bo); i++)
> +		drm_gem_object_put_unlocked(&job->bo[i]->base);
> +
> +	pm_runtime_mark_last_busy(v3d->dev);
> +	pm_runtime_put_autosuspend(v3d->dev);
> +
> +	kfree(job);
> +}
> +
> +void v3d_tfu_job_put(struct v3d_tfu_job *job)
> +{
> +	kref_put(&job->refcount, v3d_tfu_job_cleanup);
> +}
> +
>  int
>  v3d_wait_bo_ioctl(struct drm_device *dev, void *data,
>  		  struct drm_file *file_priv)
> @@ -537,7 +564,8 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
>  	if (ret)
>  		goto fail;
>  
> -	ret = v3d_lock_bo_reservations(dev, exec, &acquire_ctx);
> +	ret = v3d_lock_bo_reservations(dev, exec->bo, exec->bo_count,
> +				       &acquire_ctx);
>  	if (ret)
>  		goto fail;
>  
> @@ -571,9 +599,10 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
>  				  &v3d_priv->sched_entity[V3D_RENDER]);
>  	mutex_unlock(&v3d->sched_lock);
>  
> -	v3d_attach_object_fences(exec);
> +	v3d_attach_object_fences(exec->bo, exec->bo_count,
> +				 exec->render_done_fence);
>  
> -	v3d_unlock_bo_reservations(dev, exec, &acquire_ctx);
> +	v3d_unlock_bo_reservations(dev, exec->bo, exec->bo_count, &acquire_ctx);
>  
>  	/* Update the return sync object for the */
>  	sync_out = drm_syncobj_find(file_priv, args->out_sync);
> @@ -589,13 +618,119 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
>  
>  fail_unreserve:
>  	mutex_unlock(&v3d->sched_lock);
> -	v3d_unlock_bo_reservations(dev, exec, &acquire_ctx);
> +	v3d_unlock_bo_reservations(dev, exec->bo, exec->bo_count, &acquire_ctx);
>  fail:
>  	v3d_exec_put(exec);
>  
>  	return ret;
>  }
>  
> +/**
> + * v3d_submit_tfu_ioctl() - Submits a TFU (texture formatting) job to the V3D.
> + * @dev: DRM device
> + * @data: ioctl argument
> + * @file_priv: DRM file for this fd
> + *
> + * Userspace provides the register setup for the TFU, which we don't
> + * need to validate since the TFU is behind the MMU.
> + */
> +int
> +v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
> +		    struct drm_file *file_priv)
> +{
> +	struct v3d_dev *v3d = to_v3d_dev(dev);
> +	struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
> +	struct drm_v3d_submit_tfu *args = data;
> +	struct v3d_tfu_job *job;
> +	struct ww_acquire_ctx acquire_ctx;
> +	struct drm_syncobj *sync_out;
> +	struct dma_fence *sched_done_fence;
> +	int ret = 0;
> +	int bo_count;
> +
> +	job = kcalloc(1, sizeof(*job), GFP_KERNEL);
> +	if (!job)
> +		return -ENOMEM;
> +
> +	ret = pm_runtime_get_sync(v3d->dev);
> +	if (ret < 0) {
> +		kfree(job);
> +		return ret;
> +	}
> +
> +	kref_init(&job->refcount);
> +
> +	ret = drm_syncobj_find_fence(file_priv, args->in_sync,
> +				     0, 0, &job->in_fence);
> +	if (ret == -EINVAL)
> +		goto fail;
> +
> +	job->args = *args;
> +	job->v3d = v3d;
> +
> +	spin_lock(&file_priv->table_lock);
> +	for (bo_count = 0; bo_count < ARRAY_SIZE(job->bo); bo_count++) {
> +		struct drm_gem_object *bo;
> +
> +		if (!args->bo_handles[bo_count])
> +			break;
> +
> +		bo = idr_find(&file_priv->object_idr,
> +			      args->bo_handles[bo_count]);
> +		if (!bo) {
> +			DRM_DEBUG("Failed to look up GEM BO %d: %d\n",
> +				  bo_count, args->bo_handles[bo_count]);
> +			ret = -ENOENT;
> +			spin_unlock(&file_priv->table_lock);
> +			goto fail;
> +		}
> +		drm_gem_object_get(bo);
> +		job->bo[bo_count] = to_v3d_bo(bo);
> +	}
> +	spin_unlock(&file_priv->table_lock);
> +
> +	ret = v3d_lock_bo_reservations(dev, job->bo, bo_count, &acquire_ctx);
> +	if (ret)
> +		goto fail;
> +
> +	mutex_lock(&v3d->sched_lock);
> +	ret = drm_sched_job_init(&job->base,
> +				 &v3d_priv->sched_entity[V3D_TFU],
> +				 v3d_priv);
> +	if (ret)
> +		goto fail_unreserve;
> +
> +	sched_done_fence = dma_fence_get(&job->base.s_fence->finished);
> +
> +	kref_get(&job->refcount); /* put by scheduler job completion */
> +	drm_sched_entity_push_job(&job->base, &v3d_priv->sched_entity[V3D_TFU]);
> +	mutex_unlock(&v3d->sched_lock);
> +
> +	v3d_attach_object_fences(job->bo, bo_count, sched_done_fence);
> +
> +	v3d_unlock_bo_reservations(dev, job->bo, bo_count, &acquire_ctx);
> +
> +	/* Update the return sync object */
> +	sync_out = drm_syncobj_find(file_priv, args->out_sync);
> +	if (sync_out) {
> +		drm_syncobj_replace_fence(sync_out, 0, sched_done_fence);
> +		drm_syncobj_put(sync_out);
> +	}
> +	dma_fence_put(sched_done_fence);
> +
> +	v3d_tfu_job_put(job);
> +
> +	return 0;
> +
> +fail_unreserve:
> +	mutex_unlock(&v3d->sched_lock);
> +	v3d_unlock_bo_reservations(dev, job->bo, bo_count, &acquire_ctx);
> +fail:
> +	v3d_tfu_job_put(job);
> +
> +	return ret;
> +}
> +
>  int
>  v3d_gem_init(struct drm_device *dev)
>  {
> diff --git a/drivers/gpu/drm/v3d/v3d_irq.c b/drivers/gpu/drm/v3d/v3d_irq.c
> index e07514eb11b5..dd7a7b0bd5a1 100644
> --- a/drivers/gpu/drm/v3d/v3d_irq.c
> +++ b/drivers/gpu/drm/v3d/v3d_irq.c
> @@ -4,8 +4,8 @@
>  /**
>   * DOC: Interrupt management for the V3D engine
>   *
> - * When we take a binning or rendering flush done interrupt, we need
> - * to signal the fence for that job so that the scheduler can queue up
> + * When we take a bin, render, or TFU done interrupt, we need to
> + * signal the fence for that job so that the scheduler can queue up
>   * the next one and unblock any waiters.
>   *
>   * When we take the binner out of memory interrupt, we need to
> @@ -23,7 +23,8 @@
>  
>  #define V3D_HUB_IRQS ((u32)(V3D_HUB_INT_MMU_WRV |	\
>  			    V3D_HUB_INT_MMU_PTI |	\
> -			    V3D_HUB_INT_MMU_CAP))
> +			    V3D_HUB_INT_MMU_CAP |	\
> +			    V3D_HUB_INT_TFUC))
>  
>  static void
>  v3d_overflow_mem_work(struct work_struct *work)
> @@ -117,6 +118,11 @@ v3d_hub_irq(int irq, void *arg)
>  	/* Acknowledge the interrupts we're handling here. */
>  	V3D_WRITE(V3D_HUB_INT_CLR, intsts);
>  
> +	if (intsts & V3D_HUB_INT_TFUC) {
> +		dma_fence_signal(v3d->tfu_job->done_fence);
> +		status = IRQ_HANDLED;
> +	}
> +
>  	if (intsts & (V3D_HUB_INT_MMU_WRV |
>  		      V3D_HUB_INT_MMU_PTI |
>  		      V3D_HUB_INT_MMU_CAP)) {
> diff --git a/drivers/gpu/drm/v3d/v3d_regs.h b/drivers/gpu/drm/v3d/v3d_regs.h
> index c3a5e4e44f73..92a2bd55e217 100644
> --- a/drivers/gpu/drm/v3d/v3d_regs.h
> +++ b/drivers/gpu/drm/v3d/v3d_regs.h
> @@ -86,6 +86,64 @@
>  # define V3D_TOP_GR_BRIDGE_SW_INIT_1                   0x0000c
>  # define V3D_TOP_GR_BRIDGE_SW_INIT_1_V3D_CLK_108_SW_INIT BIT(0)
>  
> +#define V3D_TFU_CS                                     0x00400
> +/* Stops current job, empties input fifo. */
> +# define V3D_TFU_CS_TFURST                             BIT(31)
> +# define V3D_TFU_CS_CVTCT_MASK                         V3D_MASK(23, 16)
> +# define V3D_TFU_CS_CVTCT_SHIFT                        16
> +# define V3D_TFU_CS_NFREE_MASK                         V3D_MASK(13, 8)
> +# define V3D_TFU_CS_NFREE_SHIFT                        8
> +# define V3D_TFU_CS_BUSY                               BIT(0)
> +
> +#define V3D_TFU_SU                                     0x00404
> +/* Interrupt when FINTTHR input slots are free (0 = disabled) */
> +# define V3D_TFU_SU_FINTTHR_MASK                       V3D_MASK(13, 8)
> +# define V3D_TFU_SU_FINTTHR_SHIFT                      8
> +/* Skips resetting the CRC at the start of CRC generation. */
> +# define V3D_TFU_SU_CRCCHAIN                           BIT(4)
> +/* skips writes, computes CRC of the image.  miplevels must be 0. */
> +# define V3D_TFU_SU_CRC                                BIT(3)
> +# define V3D_TFU_SU_THROTTLE_MASK                      V3D_MASK(1, 0)
> +# define V3D_TFU_SU_THROTTLE_SHIFT                     0
> +
> +#define V3D_TFU_ICFG                                   0x00408
> +/* Interrupt when the conversion is complete. */
> +# define V3D_TFU_ICFG_IOC                              BIT(0)
> +
> +/* Input Image Address */
> +#define V3D_TFU_IIA                                    0x0040c
> +/* Input Chroma Address */
> +#define V3D_TFU_ICA                                    0x00410
> +/* Input Image Stride */
> +#define V3D_TFU_IIS                                    0x00414
> +/* Input Image U-Plane Address */
> +#define V3D_TFU_IUA                                    0x00418
> +/* Output Image Address */
> +#define V3D_TFU_IOA                                    0x0041c
> +/* Image Output Size */
> +#define V3D_TFU_IOS                                    0x00420
> +/* TFU YUV Coefficient 0 */
> +#define V3D_TFU_COEF0                                  0x00424
> +/* Use these regs instead of the defaults. */
> +# define V3D_TFU_COEF0_USECOEF                         BIT(31)
> +/* TFU YUV Coefficient 1 */
> +#define V3D_TFU_COEF1                                  0x00428
> +/* TFU YUV Coefficient 2 */
> +#define V3D_TFU_COEF2                                  0x0042c
> +/* TFU YUV Coefficient 3 */
> +#define V3D_TFU_COEF3                                  0x00430
> +
> +#define V3D_TFU_CRC                                    0x00434
> +
> +#define V3D_TFU_INT_STS                                0x00438
> +#define V3D_TFU_INT_SET                                0x0043c
> +#define V3D_TFU_INT_CLR                                0x00440
> +#define V3D_TFU_INT_MSK_STS                            0x00444
> +#define V3D_TFU_INT_MSK_SET                            0x00448
> +#define V3D_TFU_INT_MSK_CLR                            0x0044c
> +#define V3D_TFU_INT_TFUC                               BIT(1)
> +#define V3D_TFU_INT_TFUF                               BIT(0)
> +
>  /* Per-MMU registers. */
>  
>  #define V3D_MMUC_CONTROL                               0x01000
> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
> index e1f2aab0717b..7a3d4020cfca 100644
> --- a/drivers/gpu/drm/v3d/v3d_sched.c
> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
> @@ -30,6 +30,12 @@ to_v3d_job(struct drm_sched_job *sched_job)
>  	return container_of(sched_job, struct v3d_job, base);
>  }
>  
> +static struct v3d_tfu_job *
> +to_tfu_job(struct drm_sched_job *sched_job)
> +{
> +	return container_of(sched_job, struct v3d_tfu_job, base);
> +}
> +
>  static void
>  v3d_job_free(struct drm_sched_job *sched_job)
>  {
> @@ -38,6 +44,14 @@ v3d_job_free(struct drm_sched_job *sched_job)
>  	v3d_exec_put(job->exec);
>  }
>  
> +static void
> +v3d_tfu_job_free(struct drm_sched_job *sched_job)
> +{
> +	struct v3d_tfu_job *job = to_tfu_job(sched_job);
> +
> +	v3d_tfu_job_put(job);
> +}
> +
>  /**
>   * Returns the fences that the bin or render job depends on, one by one.
>   * v3d_job_run() won't be called until all of them have been signaled.
> @@ -76,6 +90,27 @@ v3d_job_dependency(struct drm_sched_job *sched_job,
>  	return fence;
>  }
>  
> +/**
> + * Returns the fences that the TFU job depends on, one by one.
> + * v3d_tfu_job_run() won't be called until all of them have been
> + * signaled.
> + */
> +static struct dma_fence *
> +v3d_tfu_job_dependency(struct drm_sched_job *sched_job,
> +		   struct drm_sched_entity *s_entity)
> +{
> +	struct v3d_tfu_job *job = to_tfu_job(sched_job);
> +	struct dma_fence *fence;
> +
> +	fence = job->in_fence;
> +	if (fence) {
> +		job->in_fence = NULL;
> +		return fence;
> +	}
> +
> +	return NULL;
> +}
> +
>  static struct dma_fence *v3d_job_run(struct drm_sched_job *sched_job)
>  {
>  	struct v3d_job *job = to_v3d_job(sched_job);
> @@ -147,6 +182,71 @@ static struct dma_fence *v3d_job_run(struct drm_sched_job *sched_job)
>  	return fence;
>  }
>  
> +static struct dma_fence *
> +v3d_tfu_job_run(struct drm_sched_job *sched_job)
> +{
> +	struct v3d_tfu_job *job = to_tfu_job(sched_job);
> +	struct v3d_dev *v3d = job->v3d;
> +	struct drm_device *dev = &v3d->drm;
> +	struct dma_fence *fence;
> +
> +	fence = v3d_fence_create(v3d, V3D_TFU);
> +	if (IS_ERR(fence))
> +		return NULL;
> +
> +	v3d->tfu_job = job;
> +	if (job->done_fence)
> +		dma_fence_put(job->done_fence);
> +	job->done_fence = dma_fence_get(fence);
> +
> +	trace_v3d_submit_tfu(dev, to_v3d_fence(fence)->seqno);
> +
> +	V3D_WRITE(V3D_TFU_IIA, job->args.iia);
> +	V3D_WRITE(V3D_TFU_IIS, job->args.iis);
> +	V3D_WRITE(V3D_TFU_ICA, job->args.ica);
> +	V3D_WRITE(V3D_TFU_IUA, job->args.iua);
> +	V3D_WRITE(V3D_TFU_IOA, job->args.ioa);
> +	V3D_WRITE(V3D_TFU_IOS, job->args.ios);
> +	if (job->args.coef[0] & V3D_TFU_COEF0_USECOEF) {
> +		V3D_WRITE(V3D_TFU_COEF0, job->args.coef[0]);
> +		V3D_WRITE(V3D_TFU_COEF1, job->args.coef[1]);
> +		V3D_WRITE(V3D_TFU_COEF2, job->args.coef[2]);
> +		V3D_WRITE(V3D_TFU_COEF3, job->args.coef[3]);
> +	}
> +	/* ICFG kicks off the job. */
> +	V3D_WRITE(V3D_TFU_ICFG, job->args.icfg | V3D_TFU_ICFG_IOC);
> +
> +	return fence;
> +}
> +
> +static void
> +v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, struct drm_sched_job *sched_job)
> +{
> +	enum v3d_queue q;
> +
> +	mutex_lock(&v3d->reset_lock);
> +
> +	/* block scheduler */
> +	for (q = 0; q < V3D_MAX_QUEUES; q++) {
> +		struct drm_gpu_scheduler *sched = &v3d->queue[q].sched;
> +
> +		kthread_park(sched->thread);
> +		drm_sched_hw_job_reset(sched, (sched_job->sched == sched ?
> +					       sched_job : NULL));
> +	}
> +
> +	/* get the GPU back into the init state */
> +	v3d_reset(v3d);
> +
> +	/* Unblock schedulers and restart their jobs. */
> +	for (q = 0; q < V3D_MAX_QUEUES; q++) {
> +		drm_sched_job_recovery(&v3d->queue[q].sched);
> +		kthread_unpark(v3d->queue[q].sched.thread);
> +	}
> +
> +	mutex_unlock(&v3d->reset_lock);
> +}
> +
>  static void
>  v3d_job_timedout(struct drm_sched_job *sched_job)
>  {
> @@ -154,7 +254,6 @@ v3d_job_timedout(struct drm_sched_job *sched_job)
>  	struct v3d_exec_info *exec = job->exec;
>  	struct v3d_dev *v3d = exec->v3d;
>  	enum v3d_queue job_q = job == &exec->bin ? V3D_BIN : V3D_RENDER;
> -	enum v3d_queue q;
>  	u32 ctca = V3D_CORE_READ(0, V3D_CLE_CTNCA(job_q));
>  	u32 ctra = V3D_CORE_READ(0, V3D_CLE_CTNRA(job_q));
>  
> @@ -173,27 +272,15 @@ v3d_job_timedout(struct drm_sched_job *sched_job)
>  		return;
>  	}
>  
> -	mutex_lock(&v3d->reset_lock);
> -
> -	/* block scheduler */
> -	for (q = 0; q < V3D_MAX_QUEUES; q++) {
> -		struct drm_gpu_scheduler *sched = &v3d->queue[q].sched;
> -
> -		kthread_park(sched->thread);
> -		drm_sched_hw_job_reset(sched, (sched_job->sched == sched ?
> -					       sched_job : NULL));
> -	}
> -
> -	/* get the GPU back into the init state */
> -	v3d_reset(v3d);
> +	v3d_gpu_reset_for_timeout(v3d, sched_job);
> +}
>  
> -	/* Unblock schedulers and restart their jobs. */
> -	for (q = 0; q < V3D_MAX_QUEUES; q++) {
> -		drm_sched_job_recovery(&v3d->queue[q].sched);
> -		kthread_unpark(v3d->queue[q].sched.thread);
> -	}
> +static void
> +v3d_tfu_job_timedout(struct drm_sched_job *sched_job)
> +{
> +	struct v3d_tfu_job *job = to_tfu_job(sched_job);
>  
> -	mutex_unlock(&v3d->reset_lock);
> +	v3d_gpu_reset_for_timeout(job->v3d, sched_job);
>  }
>  
>  static const struct drm_sched_backend_ops v3d_sched_ops = {
> @@ -203,6 +290,13 @@ static const struct drm_sched_backend_ops v3d_sched_ops = {
>  	.free_job = v3d_job_free
>  };
>  
> +static const struct drm_sched_backend_ops v3d_tfu_sched_ops = {
> +	.dependency = v3d_tfu_job_dependency,
> +	.run_job = v3d_tfu_job_run,
> +	.timedout_job = v3d_tfu_job_timedout,
> +	.free_job = v3d_tfu_job_free
> +};
> +
>  int
>  v3d_sched_init(struct v3d_dev *v3d)
>  {
> @@ -233,6 +327,19 @@ v3d_sched_init(struct v3d_dev *v3d)
>  		return ret;
>  	}
>  
> +	ret = drm_sched_init(&v3d->queue[V3D_TFU].sched,
> +			     &v3d_tfu_sched_ops,
> +			     hw_jobs_limit, job_hang_limit,
> +			     msecs_to_jiffies(hang_limit_ms),
> +			     "v3d_tfu");
> +	if (ret) {
> +		dev_err(v3d->dev, "Failed to create TFU scheduler: %d.",
> +			ret);
> +		drm_sched_fini(&v3d->queue[V3D_RENDER].sched);
> +		drm_sched_fini(&v3d->queue[V3D_BIN].sched);
> +		return ret;
> +	}
> +
>  	return 0;
>  }
>  
> diff --git a/drivers/gpu/drm/v3d/v3d_trace.h b/drivers/gpu/drm/v3d/v3d_trace.h
> index 85dd351e1e09..f54ed9cd3444 100644
> --- a/drivers/gpu/drm/v3d/v3d_trace.h
> +++ b/drivers/gpu/drm/v3d/v3d_trace.h
> @@ -42,6 +42,26 @@ TRACE_EVENT(v3d_submit_cl,
>  		      __entry->ctnqea)
>  );
>  
> +TRACE_EVENT(v3d_submit_tfu,
> +	    TP_PROTO(struct drm_device *dev,
> +		     uint64_t seqno),
> +	    TP_ARGS(dev, seqno),
> +
> +	    TP_STRUCT__entry(
> +			     __field(u32, dev)
> +			     __field(u64, seqno)
> +			     ),
> +
> +	    TP_fast_assign(
> +			   __entry->dev = dev->primary->index;
> +			   __entry->seqno = seqno;
> +			   ),
> +
> +	    TP_printk("dev=%u, seqno=%llu",
> +		      __entry->dev,
> +		      __entry->seqno)
> +);
> +
>  TRACE_EVENT(v3d_reset_begin,
>  	    TP_PROTO(struct drm_device *dev),
>  	    TP_ARGS(dev),
> diff --git a/include/uapi/drm/v3d_drm.h b/include/uapi/drm/v3d_drm.h
> index b1e5de076b0f..3bb4c6136f18 100644
> --- a/include/uapi/drm/v3d_drm.h
> +++ b/include/uapi/drm/v3d_drm.h
> @@ -36,6 +36,7 @@ extern "C" {
>  #define DRM_V3D_MMAP_BO                           0x03
>  #define DRM_V3D_GET_PARAM                         0x04
>  #define DRM_V3D_GET_BO_OFFSET                     0x05
> +#define DRM_V3D_SUBMIT_TFU                        0x06
>  
>  #define DRM_IOCTL_V3D_SUBMIT_CL           DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_SUBMIT_CL, struct drm_v3d_submit_cl)
>  #define DRM_IOCTL_V3D_WAIT_BO             DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_WAIT_BO, struct drm_v3d_wait_bo)
> @@ -43,6 +44,7 @@ extern "C" {
>  #define DRM_IOCTL_V3D_MMAP_BO             DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_MMAP_BO, struct drm_v3d_mmap_bo)
>  #define DRM_IOCTL_V3D_GET_PARAM           DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_GET_PARAM, struct drm_v3d_get_param)
>  #define DRM_IOCTL_V3D_GET_BO_OFFSET       DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_GET_BO_OFFSET, struct drm_v3d_get_bo_offset)
> +#define DRM_IOCTL_V3D_SUBMIT_TFU          DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_SUBMIT_TFU, struct drm_v3d_submit_tfu)
>  
>  /**
>   * struct drm_v3d_submit_cl - ioctl argument for submitting commands to the 3D
> @@ -179,6 +181,7 @@ enum drm_v3d_param {
>  	DRM_V3D_PARAM_V3D_CORE0_IDENT0,
>  	DRM_V3D_PARAM_V3D_CORE0_IDENT1,
>  	DRM_V3D_PARAM_V3D_CORE0_IDENT2,
> +	DRM_V3D_PARAM_SUPPORTS_TFU,
>  };
>  
>  struct drm_v3d_get_param {
> @@ -197,6 +200,28 @@ struct drm_v3d_get_bo_offset {
>  	__u32 offset;
>  };
>  
> +struct drm_v3d_submit_tfu {
> +	__u32 icfg;
> +	__u32 iia;
> +	__u32 iis;
> +	__u32 ica;
> +	__u32 iua;
> +	__u32 ioa;
> +	__u32 ios;
> +	__u32 coef[4];
> +	/* First handle is the output BO, following are other inputs.
> +	 * 0 for unused.
> +	 */
> +	__u32 bo_handles[4];
> +	/* sync object to block on before submitting the TFU job.  Each TFU
> +	 * job will execute in the order submitted to its FD.  Synchronization
> +	 * against rendering jobs requires using sync objects.
> +	 */
> +	__u32 in_sync;
> +	/* Sync object to signal when the TFU job is done. */
> +	__u32 out_sync;
> +};
> +
>  #if defined(__cplusplus)
>  }
>  #endif
> -- 
> 2.19.1


* Re: [PATCH 4/4] drm/v3d: Add support for submitting jobs to the TFU.
  2018-11-13 22:03   ` Eric Anholt
@ 2018-11-28 19:45     ` Dave Emett
  0 siblings, 0 replies; 10+ messages in thread
From: Dave Emett @ 2018-11-28 19:45 UTC (permalink / raw)
  To: Eric Anholt; +Cc: dri-devel, linux-kernel, boris.brezillon, daniel.vetter

A few comments below.
In particular, I think the USECOEF handling is a bit broken.
Otherwise this looks good to me.

> I think one interesting question here is if TFU hangs (has it ever hung,
> in our experience?) do we want to reset the whole V3D, or is the reset
> flag in the TFU block enough?

We've never seen the TFU hang, AFAIK.
It seems prudent to handle it anyway; what you've done looks fine to me.
I wouldn't try to reset the TFU on its own. I don't know if that TFU
reset bit has ever been tested!

> > @@ -251,6 +256,7 @@ static const struct drm_ioctl_desc v3d_drm_ioctls[] = {
> >       DRM_IOCTL_DEF_DRV(V3D_MMAP_BO, v3d_mmap_bo_ioctl, DRM_RENDER_ALLOW),
> >       DRM_IOCTL_DEF_DRV(V3D_GET_PARAM, v3d_get_param_ioctl, DRM_RENDER_ALLOW),
> >       DRM_IOCTL_DEF_DRV(V3D_GET_BO_OFFSET, v3d_get_bo_offset_ioctl, DRM_RENDER_ALLOW),
> > +     DRM_IOCTL_DEF_DRV(V3D_SUBMIT_TFU, v3d_submit_tfu_ioctl, DRM_RENDER_ALLOW | DRM_AUTH),
> >  };

I would extend the comment above this block to note that DRM_AUTH is
required on SUBMIT_TFU because TFU commands are currently not
validated. (The TFU does not access memory via the GMP, so I assume
we will want to explicitly validate commands instead?)

> >  static void
> >  v3d_unlock_bo_reservations(struct drm_device *dev,

dev appears to be unused here. v3d_lock_bo_reservations wouldn't need
it either, if it didn't have to pass it on to unlock.

> > +static void
> > +v3d_tfu_job_cleanup(struct kref *ref)
> > +{
> > +     struct v3d_tfu_job *job = container_of(ref, struct v3d_tfu_job,
> > +                                            refcount);
> > +     struct v3d_dev *v3d = job->v3d;
> > +     unsigned int i;
> > +
> > +     dma_fence_put(job->in_fence);
> > +     dma_fence_put(job->done_fence);
> > +
> > +     for (i = 0; i < ARRAY_SIZE(job->bo); i++)
> > +             drm_gem_object_put_unlocked(&job->bo[i]->base);

This is a bit questionable. job->bo[i] may be NULL. &job->bo[i]->base
would work out as NULL too, but this strictly speaking invokes
undefined behaviour.
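
A minimal sketch of the explicit NULL check this comment points towards.
The mock_* types and helpers below are hypothetical stand-ins written for
illustration; they are not the kernel's v3d_bo, v3d_tfu_job or
drm_gem_object_put_unlocked(), and the real fix may look different.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-ins for the kernel types (assumption, not real code). */
struct mock_gem_object { int refcount; };
struct mock_bo { struct mock_gem_object base; };
struct mock_tfu_job { struct mock_bo *bo[4]; };

/* Stand-in for a "put" helper: drops one reference. */
static void mock_gem_object_put(struct mock_gem_object *obj)
{
	obj->refcount--;
}

/*
 * Drop the references held by the job, skipping unused slots.
 * The pointer is tested before ->base is formed, so no member access
 * is ever computed from a NULL pointer.
 * Returns the number of references dropped.
 */
static unsigned int mock_tfu_job_put_bos(struct mock_tfu_job *job)
{
	unsigned int i, dropped = 0;

	assert(job != NULL);

	for (i = 0; i < MOCK_ARRAY_SIZE(job->bo); i++) {
		if (!job->bo[i])
			continue;
		mock_gem_object_put(&job->bo[i]->base);
		dropped++;
	}

	return dropped;
}
```

The check costs one branch per slot and removes the reliance on base being
at offset zero in the containing struct.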

> > +#define V3D_TFU_INT_STS                                0x00438
> > +#define V3D_TFU_INT_SET                                0x0043c
> > +#define V3D_TFU_INT_CLR                                0x00440
> > +#define V3D_TFU_INT_MSK_STS                            0x00444
> > +#define V3D_TFU_INT_MSK_SET                            0x00448
> > +#define V3D_TFU_INT_MSK_CLR                            0x0044c
> > +#define V3D_TFU_INT_TFUC                               BIT(1)
> > +#define V3D_TFU_INT_TFUF                               BIT(0)

These just alias the HUB_CTL_INT registers.
They shouldn't be used.
I would probably leave them out of this header to avoid confusion.

> > +     if (job->args.coef[0] & V3D_TFU_COEF0_USECOEF) {
> > +             V3D_WRITE(V3D_TFU_COEF0, job->args.coef[0]);
> > +             V3D_WRITE(V3D_TFU_COEF1, job->args.coef[1]);
> > +             V3D_WRITE(V3D_TFU_COEF2, job->args.coef[2]);
> > +             V3D_WRITE(V3D_TFU_COEF3, job->args.coef[3]);
> > +     }

If USECOEF isn't set, don't we still want to write COEF0 to clear the bit?

> > +#define DRM_IOCTL_V3D_SUBMIT_TFU          DRM_IOWR(DRM_COMMAND_BASE + DRM_V3D_SUBMIT_TFU, struct drm_v3d_submit_tfu)

Should this not be DRM_IOW? No data is returned to userspace in the
drm_v3d_submit_tfu struct AFAICT?
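
To illustrate the difference, here is a small userspace sketch using the
generic _IOW/_IOWR macros from <linux/ioctl.h>, which the DRM_IOW/DRM_IOWR
macros wrap with the 'd' ioctl type. The struct is a mock copy of the
layout quoted above, and the MOCK_* names and helper are hypothetical;
only the direction bits differ between the two encodings.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/ioctl.h>

/* Mock copy of the proposed argument struct (illustration only). */
struct mock_submit_tfu {
	uint32_t icfg, iia, iis, ica, iua, ioa, ios;
	uint32_t coef[4];
	uint32_t bo_handles[4];
	uint32_t in_sync;
	uint32_t out_sync;
};

/* DRM_COMMAND_BASE (0x40) + DRM_V3D_SUBMIT_TFU (0x06) */
#define MOCK_SUBMIT_TFU_NR (0x40 + 0x06)

#define MOCK_IOCTL_SUBMIT_TFU_IOWR \
	_IOWR('d', MOCK_SUBMIT_TFU_NR, struct mock_submit_tfu)
#define MOCK_IOCTL_SUBMIT_TFU_IOW \
	_IOW('d', MOCK_SUBMIT_TFU_NR, struct mock_submit_tfu)

/*
 * Returns nonzero if the encoded command says the kernel hands data
 * back to userspace (_IOC_READ is "userspace reads the result").
 */
static int mock_ioctl_copies_back(unsigned int cmd)
{
	assert(_IOC_NR(cmd) == MOCK_SUBMIT_TFU_NR);
	return (_IOC_DIR(cmd) & _IOC_READ) != 0;
}
```

Both encodings carry the same type, number and size; the _IOW form simply
omits the read direction, which matches a struct with no output fields.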

> > +     /* sync object to block on before submitting the TFU job.  Each TFU
> > +      * job will execute in the order submitted to its FD.  Synchronization
> > +      * against rendering jobs requires using sync objects.
> > +      */
> > +     __u32 in_sync;

"Submit" is used to mean two different things here. Maybe "before
submitting the TFU job" --> "before running the TFU job" to avoid
confusion?


Thread overview: 10+ messages
2018-11-08 16:16 [PATCH 0/4] V3D TFU engine support Eric Anholt
2018-11-08 16:16 ` [PATCH 1/4] drm/v3d: Fix whitespace inconsistency in the header Eric Anholt
2018-11-13 10:22   ` Boris Brezillon
2018-11-08 16:16 ` [PATCH 2/4] drm/v3d: Update a comment about what uses v3d_job_dependency() Eric Anholt
2018-11-13 10:22   ` Boris Brezillon
2018-11-08 16:16 ` [PATCH 3/4] drm/v3d: Clean up the reservation object setup Eric Anholt
2018-11-13 10:22   ` Boris Brezillon
2018-11-08 16:16 ` [PATCH 4/4] drm/v3d: Add support for submitting jobs to the TFU Eric Anholt
2018-11-13 22:03   ` Eric Anholt
2018-11-28 19:45     ` Dave Emett
