Linux-Tegra Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v3 00/20] Host1x/TegraDRM UAPI
@ 2020-10-07 17:12 Mikko Perttunen
  2020-10-07 17:12 ` [PATCH v3 01/20] gpu: host1x: Use different lock classes for each client Mikko Perttunen
                   ` (19 more replies)
  0 siblings, 20 replies; 35+ messages in thread
From: Mikko Perttunen @ 2020-10-07 17:12 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Hi all,

here's the third revision of the Host1x/TegraDRM UAPI proposal.
The open issues from RFCv2 should be resolved now, so I'm
dropping the RFC tag. The series is still only tested with Tegra186
so I'm hoping for people with devices with other chips to test this
out.

The test suite[1] has been updated for the changes in this revision,
and also includes tests for the newly added DMA reservation support.
If there are no further issues with the UAPI definition, I'll
look at porting other userspace next - hoping for some help with that
as well since most of it is for chips I don't have easy access to.

The series can be also found in
https://github.com/cyndis/linux/commits/work/host1x-uapi-v3.

Older versions:
v1: https://www.spinics.net/lists/linux-tegra/msg51000.html
v2: https://www.spinics.net/lists/linux-tegra/msg53061.html

Thank you,
Mikko

[1] https://github.com/cyndis/uapi-test

Mikko Perttunen (20):
  gpu: host1x: Use different lock classes for each client
  gpu: host1x: Allow syncpoints without associated client
  gpu: host1x: Show number of pending waiters in debugfs
  gpu: host1x: Remove cancelled waiters immediately
  gpu: host1x: Use HW-equivalent syncpoint expiration check
  gpu: host1x: Cleanup and refcounting for syncpoints
  gpu: host1x: Introduce UAPI header
  gpu: host1x: Implement /dev/host1x device node
  gpu: host1x: DMA fences and userspace fence creation
  gpu: host1x: Add no-recovery mode
  gpu: host1x: Add job release callback
  gpu: host1x: Add support for syncpoint waits in CDMA pushbuffer
  gpu: host1x: Reset max value when freeing a syncpoint
  gpu: host1x: Reserve VBLANK syncpoints at initialization
  drm/tegra: Add new UAPI to header
  drm/tegra: Boot VIC during runtime PM resume
  drm/tegra: Set resv fields when importing/exporting GEMs
  drm/tegra: Allocate per-engine channel in core code
  drm/tegra: Implement new UAPI
  drm/tegra: Add job firewall

 drivers/gpu/drm/tegra/Makefile         |   4 +
 drivers/gpu/drm/tegra/dc.c             |  10 +-
 drivers/gpu/drm/tegra/drm.c            |  75 ++-
 drivers/gpu/drm/tegra/drm.h            |   9 +
 drivers/gpu/drm/tegra/gem.c            |   2 +
 drivers/gpu/drm/tegra/gr2d.c           |   4 +-
 drivers/gpu/drm/tegra/gr3d.c           |   4 +-
 drivers/gpu/drm/tegra/uapi.h           |  63 +++
 drivers/gpu/drm/tegra/uapi/firewall.c  | 197 +++++++
 drivers/gpu/drm/tegra/uapi/gather_bo.c |  86 ++++
 drivers/gpu/drm/tegra/uapi/gather_bo.h |  22 +
 drivers/gpu/drm/tegra/uapi/submit.c    | 679 +++++++++++++++++++++++++
 drivers/gpu/drm/tegra/uapi/submit.h    |  20 +
 drivers/gpu/drm/tegra/uapi/uapi.c      | 326 ++++++++++++
 drivers/gpu/drm/tegra/vic.c            | 118 ++---
 drivers/gpu/host1x/Makefile            |   2 +
 drivers/gpu/host1x/bus.c               |   7 +-
 drivers/gpu/host1x/cdma.c              |  69 ++-
 drivers/gpu/host1x/debug.c             |  14 +-
 drivers/gpu/host1x/dev.c               |  15 +
 drivers/gpu/host1x/dev.h               |  16 +-
 drivers/gpu/host1x/fence.c             | 207 ++++++++
 drivers/gpu/host1x/fence.h             |  13 +
 drivers/gpu/host1x/hw/cdma_hw.c        |   2 +-
 drivers/gpu/host1x/hw/channel_hw.c     |  63 ++-
 drivers/gpu/host1x/hw/debug_hw.c       |  11 +-
 drivers/gpu/host1x/intr.c              |  23 +-
 drivers/gpu/host1x/intr.h              |   2 +
 drivers/gpu/host1x/job.c               |  79 ++-
 drivers/gpu/host1x/job.h               |  14 +
 drivers/gpu/host1x/syncpt.c            | 185 ++++---
 drivers/gpu/host1x/syncpt.h            |  16 +-
 drivers/gpu/host1x/uapi.c              | 382 ++++++++++++++
 drivers/gpu/host1x/uapi.h              |  22 +
 include/linux/host1x.h                 |  47 +-
 include/uapi/drm/tegra_drm.h           | 420 ++++++++++++++-
 include/uapi/linux/host1x.h            | 134 +++++
 37 files changed, 3076 insertions(+), 286 deletions(-)
 create mode 100644 drivers/gpu/drm/tegra/uapi.h
 create mode 100644 drivers/gpu/drm/tegra/uapi/firewall.c
 create mode 100644 drivers/gpu/drm/tegra/uapi/gather_bo.c
 create mode 100644 drivers/gpu/drm/tegra/uapi/gather_bo.h
 create mode 100644 drivers/gpu/drm/tegra/uapi/submit.c
 create mode 100644 drivers/gpu/drm/tegra/uapi/submit.h
 create mode 100644 drivers/gpu/drm/tegra/uapi/uapi.c
 create mode 100644 drivers/gpu/host1x/fence.c
 create mode 100644 drivers/gpu/host1x/fence.h
 create mode 100644 drivers/gpu/host1x/uapi.c
 create mode 100644 drivers/gpu/host1x/uapi.h
 create mode 100644 include/uapi/linux/host1x.h

-- 
2.28.0


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v3 01/20] gpu: host1x: Use different lock classes for each client
  2020-10-07 17:12 [PATCH v3 00/20] Host1x/TegraDRM UAPI Mikko Perttunen
@ 2020-10-07 17:12 ` Mikko Perttunen
  2020-10-07 17:12 ` [PATCH v3 02/20] gpu: host1x: Allow syncpoints without associated client Mikko Perttunen
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Mikko Perttunen @ 2020-10-07 17:12 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

To avoid false lockdep warnings, give each client lock a different
lock class, passed from the initialization site by macro.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/bus.c | 7 ++++---
 include/linux/host1x.h   | 9 ++++++++-
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/host1x/bus.c b/drivers/gpu/host1x/bus.c
index e201f62d62c0..4101f64bd545 100644
--- a/drivers/gpu/host1x/bus.c
+++ b/drivers/gpu/host1x/bus.c
@@ -714,13 +714,14 @@ EXPORT_SYMBOL(host1x_driver_unregister);
  * device and call host1x_device_init(), which will in turn call each client's
  * &host1x_client_ops.init implementation.
  */
-int host1x_client_register(struct host1x_client *client)
+int __host1x_client_register(struct host1x_client *client,
+			   struct lock_class_key *key)
 {
 	struct host1x *host1x;
 	int err;
 
 	INIT_LIST_HEAD(&client->list);
-	mutex_init(&client->lock);
+	__mutex_init(&client->lock, "host1x client lock", key);
 	client->usecount = 0;
 
 	mutex_lock(&devices_lock);
@@ -741,7 +742,7 @@ int host1x_client_register(struct host1x_client *client)
 
 	return 0;
 }
-EXPORT_SYMBOL(host1x_client_register);
+EXPORT_SYMBOL(__host1x_client_register);
 
 /**
  * host1x_client_unregister() - unregister a host1x client
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index 20c885d0bddc..f711fc0154f4 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -320,7 +320,14 @@ static inline struct host1x_device *to_host1x_device(struct device *dev)
 int host1x_device_init(struct host1x_device *device);
 int host1x_device_exit(struct host1x_device *device);
 
-int host1x_client_register(struct host1x_client *client);
+int __host1x_client_register(struct host1x_client *client,
+			     struct lock_class_key *key);
+#define host1x_client_register(class) \
+	({ \
+		static struct lock_class_key __key; \
+		__host1x_client_register(class, &__key); \
+	})
+
 int host1x_client_unregister(struct host1x_client *client);
 
 int host1x_client_suspend(struct host1x_client *client);
-- 
2.28.0


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v3 02/20] gpu: host1x: Allow syncpoints without associated client
  2020-10-07 17:12 [PATCH v3 00/20] Host1x/TegraDRM UAPI Mikko Perttunen
  2020-10-07 17:12 ` [PATCH v3 01/20] gpu: host1x: Use different lock classes for each client Mikko Perttunen
@ 2020-10-07 17:12 ` Mikko Perttunen
  2020-10-07 17:12 ` [PATCH v3 03/20] gpu: host1x: Show number of pending waiters in debugfs Mikko Perttunen
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Mikko Perttunen @ 2020-10-07 17:12 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Syncpoints don't need to be associated with any client,
so remove the property, and expose host1x_syncpt_alloc.
This will allow allocating syncpoints without prior knowledge
of the engine that it will be used with.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
v3:
* Clean up host1x_syncpt_alloc signature to allow specifying
  a name for the syncpoint.
* Export the function.
---
 drivers/gpu/host1x/syncpt.c | 22 ++++++++++------------
 drivers/gpu/host1x/syncpt.h |  1 -
 include/linux/host1x.h      |  3 +++
 3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index fce7892d5137..5982fdf64e1c 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -42,13 +42,13 @@ static void host1x_syncpt_base_free(struct host1x_syncpt_base *base)
 		base->requested = false;
 }
 
-static struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
-						 struct host1x_client *client,
-						 unsigned long flags)
+struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
+					  unsigned long flags,
+					  const char *name)
 {
 	struct host1x_syncpt *sp = host->syncpt;
+	char *full_name;
 	unsigned int i;
-	char *name;
 
 	mutex_lock(&host->syncpt_mutex);
 
@@ -64,13 +64,11 @@ static struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
 			goto unlock;
 	}
 
-	name = kasprintf(GFP_KERNEL, "%02u-%s", sp->id,
-			 client ? dev_name(client->dev) : NULL);
-	if (!name)
+	full_name = kasprintf(GFP_KERNEL, "%u-%s", sp->id, name);
+	if (!full_name)
 		goto free_base;
 
-	sp->client = client;
-	sp->name = name;
+	sp->name = full_name;
 
 	if (flags & HOST1X_SYNCPT_CLIENT_MANAGED)
 		sp->client_managed = true;
@@ -87,6 +85,7 @@ static struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
 	mutex_unlock(&host->syncpt_mutex);
 	return NULL;
 }
+EXPORT_SYMBOL(host1x_syncpt_alloc);
 
 /**
  * host1x_syncpt_id() - retrieve syncpoint ID
@@ -401,7 +400,7 @@ int host1x_syncpt_init(struct host1x *host)
 	host1x_hw_syncpt_enable_protection(host);
 
 	/* Allocate sync point to use for clearing waits for expired fences */
-	host->nop_sp = host1x_syncpt_alloc(host, NULL, 0);
+	host->nop_sp = host1x_syncpt_alloc(host, 0, "reserved-nop");
 	if (!host->nop_sp)
 		return -ENOMEM;
 
@@ -423,7 +422,7 @@ struct host1x_syncpt *host1x_syncpt_request(struct host1x_client *client,
 {
 	struct host1x *host = dev_get_drvdata(client->host->parent);
 
-	return host1x_syncpt_alloc(host, client, flags);
+	return host1x_syncpt_alloc(host, flags, dev_name(client->dev));
 }
 EXPORT_SYMBOL(host1x_syncpt_request);
 
@@ -447,7 +446,6 @@ void host1x_syncpt_free(struct host1x_syncpt *sp)
 	host1x_syncpt_base_free(sp->base);
 	kfree(sp->name);
 	sp->base = NULL;
-	sp->client = NULL;
 	sp->name = NULL;
 	sp->client_managed = false;
 
diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h
index 8e1d04dacaa0..3aa6b25b1b9c 100644
--- a/drivers/gpu/host1x/syncpt.h
+++ b/drivers/gpu/host1x/syncpt.h
@@ -33,7 +33,6 @@ struct host1x_syncpt {
 	const char *name;
 	bool client_managed;
 	struct host1x *host;
-	struct host1x_client *client;
 	struct host1x_syncpt_base *base;
 
 	/* interrupt data */
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index f711fc0154f4..099eff8a06d2 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -154,6 +154,9 @@ int host1x_syncpt_wait(struct host1x_syncpt *sp, u32 thresh, long timeout,
 struct host1x_syncpt *host1x_syncpt_request(struct host1x_client *client,
 					    unsigned long flags);
 void host1x_syncpt_free(struct host1x_syncpt *sp);
+struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
+					  unsigned long flags,
+					  const char *name);
 
 struct host1x_syncpt_base *host1x_syncpt_get_base(struct host1x_syncpt *sp);
 u32 host1x_syncpt_base_id(struct host1x_syncpt_base *base);
-- 
2.28.0


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v3 03/20] gpu: host1x: Show number of pending waiters in debugfs
  2020-10-07 17:12 [PATCH v3 00/20] Host1x/TegraDRM UAPI Mikko Perttunen
  2020-10-07 17:12 ` [PATCH v3 01/20] gpu: host1x: Use different lock classes for each client Mikko Perttunen
  2020-10-07 17:12 ` [PATCH v3 02/20] gpu: host1x: Allow syncpoints without associated client Mikko Perttunen
@ 2020-10-07 17:12 ` Mikko Perttunen
  2020-10-07 17:12 ` [PATCH v3 04/20] gpu: host1x: Remove cancelled waiters immediately Mikko Perttunen
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Mikko Perttunen @ 2020-10-07 17:12 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Show the number of pending waiters in the debugfs status file.
This is useful for testing to verify that waiters do not leak
or accumulate incorrectly.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/debug.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
index 3eee4318b158..2d06a7406b3b 100644
--- a/drivers/gpu/host1x/debug.c
+++ b/drivers/gpu/host1x/debug.c
@@ -69,6 +69,7 @@ static int show_channel(struct host1x_channel *ch, void *data, bool show_fifo)
 
 static void show_syncpts(struct host1x *m, struct output *o)
 {
+	struct list_head *pos;
 	unsigned int i;
 
 	host1x_debug_output(o, "---- syncpts ----\n");
@@ -76,12 +77,19 @@ static void show_syncpts(struct host1x *m, struct output *o)
 	for (i = 0; i < host1x_syncpt_nb_pts(m); i++) {
 		u32 max = host1x_syncpt_read_max(m->syncpt + i);
 		u32 min = host1x_syncpt_load(m->syncpt + i);
+		unsigned int waiters = 0;
 
-		if (!min && !max)
+		spin_lock(&m->syncpt[i].intr.lock);
+		list_for_each(pos, &m->syncpt[i].intr.wait_head)
+			waiters++;
+		spin_unlock(&m->syncpt[i].intr.lock);
+
+		if (!min && !max && !waiters)
 			continue;
 
-		host1x_debug_output(o, "id %u (%s) min %d max %d\n",
-				    i, m->syncpt[i].name, min, max);
+		host1x_debug_output(o,
+				    "id %u (%s) min %d max %d (%d waiters)\n",
+				    i, m->syncpt[i].name, min, max, waiters);
 	}
 
 	for (i = 0; i < host1x_syncpt_nb_bases(m); i++) {
-- 
2.28.0


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v3 04/20] gpu: host1x: Remove cancelled waiters immediately
  2020-10-07 17:12 [PATCH v3 00/20] Host1x/TegraDRM UAPI Mikko Perttunen
                   ` (2 preceding siblings ...)
  2020-10-07 17:12 ` [PATCH v3 03/20] gpu: host1x: Show number of pending waiters in debugfs Mikko Perttunen
@ 2020-10-07 17:12 ` Mikko Perttunen
  2020-10-07 17:12 ` [PATCH v3 05/20] gpu: host1x: Use HW-equivalent syncpoint expiration check Mikko Perttunen
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Mikko Perttunen @ 2020-10-07 17:12 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Before this patch, cancelled waiters would only be cleaned up
once their threshold value was reached. Make host1x_intr_put_ref
process the cancellation immediately to fix this.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/intr.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c
index 9245add23b5d..5d328d20ce6d 100644
--- a/drivers/gpu/host1x/intr.c
+++ b/drivers/gpu/host1x/intr.c
@@ -247,13 +247,17 @@ void host1x_intr_put_ref(struct host1x *host, unsigned int id, void *ref)
 	struct host1x_waitlist *waiter = ref;
 	struct host1x_syncpt *syncpt;
 
-	while (atomic_cmpxchg(&waiter->state, WLS_PENDING, WLS_CANCELLED) ==
-	       WLS_REMOVED)
-		schedule();
+	atomic_cmpxchg(&waiter->state, WLS_PENDING, WLS_CANCELLED);
 
 	syncpt = host->syncpt + id;
-	(void)process_wait_list(host, syncpt,
-				host1x_syncpt_load(host->syncpt + id));
+
+	spin_lock(&syncpt->intr.lock);
+	if (atomic_cmpxchg(&waiter->state, WLS_CANCELLED, WLS_HANDLED) ==
+	    WLS_CANCELLED) {
+		list_del(&waiter->list);
+		kref_put(&waiter->refcount, waiter_release);
+	}
+	spin_unlock(&syncpt->intr.lock);
 
 	kref_put(&waiter->refcount, waiter_release);
 }
-- 
2.28.0


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v3 05/20] gpu: host1x: Use HW-equivalent syncpoint expiration check
  2020-10-07 17:12 [PATCH v3 00/20] Host1x/TegraDRM UAPI Mikko Perttunen
                   ` (3 preceding siblings ...)
  2020-10-07 17:12 ` [PATCH v3 04/20] gpu: host1x: Remove cancelled waiters immediately Mikko Perttunen
@ 2020-10-07 17:12 ` Mikko Perttunen
  2020-10-07 17:12 ` [PATCH v3 06/20] gpu: host1x: Cleanup and refcounting for syncpoints Mikko Perttunen
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Mikko Perttunen @ 2020-10-07 17:12 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Make syncpoint expiration checks always use the same logic used by
the hardware. This ensures that there are no race conditions that
could occur because of the hardware triggering a syncpoint interrupt
and then the driver disagreeing.

One situation where this could occur is if a job incremented a
syncpoint too many times -- then the hardware would trigger an
interrupt, but the driver would assume that a syncpoint value
greater than the syncpoint's max value is in the future, and not
clean up the job.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/syncpt.c | 51 ++-----------------------------------
 1 file changed, 2 insertions(+), 49 deletions(-)

diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index 5982fdf64e1c..9ca0d852e32f 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -306,59 +306,12 @@ EXPORT_SYMBOL(host1x_syncpt_wait);
 bool host1x_syncpt_is_expired(struct host1x_syncpt *sp, u32 thresh)
 {
 	u32 current_val;
-	u32 future_val;
 
 	smp_rmb();
 
 	current_val = (u32)atomic_read(&sp->min_val);
-	future_val = (u32)atomic_read(&sp->max_val);
-
-	/* Note the use of unsigned arithmetic here (mod 1<<32).
-	 *
-	 * c = current_val = min_val	= the current value of the syncpoint.
-	 * t = thresh			= the value we are checking
-	 * f = future_val  = max_val	= the value c will reach when all
-	 *				  outstanding increments have completed.
-	 *
-	 * Note that c always chases f until it reaches f.
-	 *
-	 * Dtf = (f - t)
-	 * Dtc = (c - t)
-	 *
-	 *  Consider all cases:
-	 *
-	 *	A) .....c..t..f.....	Dtf < Dtc	need to wait
-	 *	B) .....c.....f..t..	Dtf > Dtc	expired
-	 *	C) ..t..c.....f.....	Dtf > Dtc	expired	   (Dct very large)
-	 *
-	 *  Any case where f==c: always expired (for any t).	Dtf == Dcf
-	 *  Any case where t==c: always expired (for any f).	Dtf >= Dtc (because Dtc==0)
-	 *  Any case where t==f!=c: always wait.		Dtf <  Dtc (because Dtf==0,
-	 *							Dtc!=0)
-	 *
-	 *  Other cases:
-	 *
-	 *	A) .....t..f..c.....	Dtf < Dtc	need to wait
-	 *	A) .....f..c..t.....	Dtf < Dtc	need to wait
-	 *	A) .....f..t..c.....	Dtf > Dtc	expired
-	 *
-	 *   So:
-	 *	   Dtf >= Dtc implies EXPIRED	(return true)
-	 *	   Dtf <  Dtc implies WAIT	(return false)
-	 *
-	 * Note: If t is expired then we *cannot* wait on it. We would wait
-	 * forever (hang the system).
-	 *
-	 * Note: do NOT get clever and remove the -thresh from both sides. It
-	 * is NOT the same.
-	 *
-	 * If future valueis zero, we have a client managed sync point. In that
-	 * case we do a direct comparison.
-	 */
-	if (!host1x_syncpt_client_managed(sp))
-		return future_val - thresh >= current_val - thresh;
-	else
-		return (s32)(current_val - thresh) >= 0;
+
+	return ((current_val - thresh) & 0x80000000U) == 0U;
 }
 
 int host1x_syncpt_init(struct host1x *host)
-- 
2.28.0


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v3 06/20] gpu: host1x: Cleanup and refcounting for syncpoints
  2020-10-07 17:12 [PATCH v3 00/20] Host1x/TegraDRM UAPI Mikko Perttunen
                   ` (4 preceding siblings ...)
  2020-10-07 17:12 ` [PATCH v3 05/20] gpu: host1x: Use HW-equivalent syncpoint expiration check Mikko Perttunen
@ 2020-10-07 17:12 ` Mikko Perttunen
  2020-10-07 22:23   ` kernel test robot
  2020-10-07 17:12 ` [PATCH v3 07/20] gpu: host1x: Introduce UAPI header Mikko Perttunen
                   ` (13 subsequent siblings)
  19 siblings, 1 reply; 35+ messages in thread
From: Mikko Perttunen @ 2020-10-07 17:12 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Add reference counting for allocated syncpoints to allow keeping
them allocated while jobs are referencing them. Additionally,
clean up various places using syncpoint IDs to use host1x_syncpt
pointers instead.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/drm/tegra/dc.c         |  4 +-
 drivers/gpu/drm/tegra/drm.c        | 17 ++++---
 drivers/gpu/drm/tegra/gr2d.c       |  4 +-
 drivers/gpu/drm/tegra/gr3d.c       |  4 +-
 drivers/gpu/drm/tegra/vic.c        |  4 +-
 drivers/gpu/host1x/cdma.c          | 11 ++---
 drivers/gpu/host1x/dev.h           |  7 ++-
 drivers/gpu/host1x/hw/cdma_hw.c    |  2 +-
 drivers/gpu/host1x/hw/channel_hw.c | 10 ++--
 drivers/gpu/host1x/hw/debug_hw.c   |  2 +-
 drivers/gpu/host1x/job.c           |  5 +-
 drivers/gpu/host1x/syncpt.c        | 75 +++++++++++++++++++++++-------
 drivers/gpu/host1x/syncpt.h        |  3 ++
 include/linux/host1x.h             |  8 ++--
 14 files changed, 99 insertions(+), 57 deletions(-)

diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c
index 9a0b3240bc58..efb41c10dad4 100644
--- a/drivers/gpu/drm/tegra/dc.c
+++ b/drivers/gpu/drm/tegra/dc.c
@@ -2127,7 +2127,7 @@ static int tegra_dc_init(struct host1x_client *client)
 		drm_plane_cleanup(primary);
 
 	host1x_client_iommu_detach(client);
-	host1x_syncpt_free(dc->syncpt);
+	host1x_syncpt_put(dc->syncpt);
 
 	return err;
 }
@@ -2152,7 +2152,7 @@ static int tegra_dc_exit(struct host1x_client *client)
 	}
 
 	host1x_client_iommu_detach(client);
-	host1x_syncpt_free(dc->syncpt);
+	host1x_syncpt_put(dc->syncpt);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index ba9d1c3e7cac..ceea9db341f0 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -171,7 +171,7 @@ int tegra_drm_submit(struct tegra_drm_context *context,
 	struct drm_tegra_syncpt syncpt;
 	struct host1x *host1x = dev_get_drvdata(drm->dev->parent);
 	struct drm_gem_object **refs;
-	struct host1x_syncpt *sp;
+	struct host1x_syncpt *sp = NULL;
 	struct host1x_job *job;
 	unsigned int num_refs;
 	int err;
@@ -298,8 +298,8 @@ int tegra_drm_submit(struct tegra_drm_context *context,
 		goto fail;
 	}
 
-	/* check whether syncpoint ID is valid */
-	sp = host1x_syncpt_get(host1x, syncpt.id);
+	/* Syncpoint ref will be dropped on job release. */
+	sp = host1x_syncpt_get_by_id(host1x, syncpt.id);
 	if (!sp) {
 		err = -ENOENT;
 		goto fail;
@@ -308,7 +308,7 @@ int tegra_drm_submit(struct tegra_drm_context *context,
 	job->is_addr_reg = context->client->ops->is_addr_reg;
 	job->is_valid_class = context->client->ops->is_valid_class;
 	job->syncpt_incrs = syncpt.incrs;
-	job->syncpt_id = syncpt.id;
+	job->syncpt = sp;
 	job->timeout = 10000;
 
 	if (args->timeout && args->timeout < 10000)
@@ -327,6 +327,9 @@ int tegra_drm_submit(struct tegra_drm_context *context,
 	args->fence = job->syncpt_end;
 
 fail:
+	if (sp)
+		host1x_syncpt_put(sp);
+
 	while (num_refs--)
 		drm_gem_object_put(refs[num_refs]);
 
@@ -380,7 +383,7 @@ static int tegra_syncpt_read(struct drm_device *drm, void *data,
 	struct drm_tegra_syncpt_read *args = data;
 	struct host1x_syncpt *sp;
 
-	sp = host1x_syncpt_get(host, args->id);
+	sp = host1x_syncpt_get_by_id_noref(host, args->id);
 	if (!sp)
 		return -EINVAL;
 
@@ -395,7 +398,7 @@ static int tegra_syncpt_incr(struct drm_device *drm, void *data,
 	struct drm_tegra_syncpt_incr *args = data;
 	struct host1x_syncpt *sp;
 
-	sp = host1x_syncpt_get(host1x, args->id);
+	sp = host1x_syncpt_get_by_id_noref(host1x, args->id);
 	if (!sp)
 		return -EINVAL;
 
@@ -409,7 +412,7 @@ static int tegra_syncpt_wait(struct drm_device *drm, void *data,
 	struct drm_tegra_syncpt_wait *args = data;
 	struct host1x_syncpt *sp;
 
-	sp = host1x_syncpt_get(host1x, args->id);
+	sp = host1x_syncpt_get_by_id_noref(host1x, args->id);
 	if (!sp)
 		return -EINVAL;
 
diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c
index 1a0d3ba6e525..d857a99b21a7 100644
--- a/drivers/gpu/drm/tegra/gr2d.c
+++ b/drivers/gpu/drm/tegra/gr2d.c
@@ -67,7 +67,7 @@ static int gr2d_init(struct host1x_client *client)
 detach:
 	host1x_client_iommu_detach(client);
 free:
-	host1x_syncpt_free(client->syncpts[0]);
+	host1x_syncpt_put(client->syncpts[0]);
 put:
 	host1x_channel_put(gr2d->channel);
 	return err;
@@ -86,7 +86,7 @@ static int gr2d_exit(struct host1x_client *client)
 		return err;
 
 	host1x_client_iommu_detach(client);
-	host1x_syncpt_free(client->syncpts[0]);
+	host1x_syncpt_put(client->syncpts[0]);
 	host1x_channel_put(gr2d->channel);
 
 	return 0;
diff --git a/drivers/gpu/drm/tegra/gr3d.c b/drivers/gpu/drm/tegra/gr3d.c
index b0b8154e8104..24442ade0da3 100644
--- a/drivers/gpu/drm/tegra/gr3d.c
+++ b/drivers/gpu/drm/tegra/gr3d.c
@@ -76,7 +76,7 @@ static int gr3d_init(struct host1x_client *client)
 detach:
 	host1x_client_iommu_detach(client);
 free:
-	host1x_syncpt_free(client->syncpts[0]);
+	host1x_syncpt_put(client->syncpts[0]);
 put:
 	host1x_channel_put(gr3d->channel);
 	return err;
@@ -94,7 +94,7 @@ static int gr3d_exit(struct host1x_client *client)
 		return err;
 
 	host1x_client_iommu_detach(client);
-	host1x_syncpt_free(client->syncpts[0]);
+	host1x_syncpt_put(client->syncpts[0]);
 	host1x_channel_put(gr3d->channel);
 
 	return 0;
diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c
index ade56b860cf9..cb476da59adc 100644
--- a/drivers/gpu/drm/tegra/vic.c
+++ b/drivers/gpu/drm/tegra/vic.c
@@ -197,7 +197,7 @@ static int vic_init(struct host1x_client *client)
 	return 0;
 
 free_syncpt:
-	host1x_syncpt_free(client->syncpts[0]);
+	host1x_syncpt_put(client->syncpts[0]);
 free_channel:
 	host1x_channel_put(vic->channel);
 detach:
@@ -221,7 +221,7 @@ static int vic_exit(struct host1x_client *client)
 	if (err < 0)
 		return err;
 
-	host1x_syncpt_free(client->syncpts[0]);
+	host1x_syncpt_put(client->syncpts[0]);
 	host1x_channel_put(vic->channel);
 	host1x_client_iommu_detach(client);
 
diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
index e8d3fda91d8a..6e6ca774f68d 100644
--- a/drivers/gpu/host1x/cdma.c
+++ b/drivers/gpu/host1x/cdma.c
@@ -273,15 +273,13 @@ static int host1x_cdma_wait_pushbuffer_space(struct host1x *host1x,
 static void cdma_start_timer_locked(struct host1x_cdma *cdma,
 				    struct host1x_job *job)
 {
-	struct host1x *host = cdma_to_host1x(cdma);
-
 	if (cdma->timeout.client) {
 		/* timer already started */
 		return;
 	}
 
 	cdma->timeout.client = job->client;
-	cdma->timeout.syncpt = host1x_syncpt_get(host, job->syncpt_id);
+	cdma->timeout.syncpt = job->syncpt;
 	cdma->timeout.syncpt_val = job->syncpt_end;
 	cdma->timeout.start_ktime = ktime_get();
 
@@ -312,7 +310,6 @@ static void stop_cdma_timer_locked(struct host1x_cdma *cdma)
 static void update_cdma_locked(struct host1x_cdma *cdma)
 {
 	bool signal = false;
-	struct host1x *host1x = cdma_to_host1x(cdma);
 	struct host1x_job *job, *n;
 
 	/* If CDMA is stopped, queue is cleared and we can return */
@@ -324,8 +321,7 @@ static void update_cdma_locked(struct host1x_cdma *cdma)
 	 * to consume as many sync queue entries as possible without blocking
 	 */
 	list_for_each_entry_safe(job, n, &cdma->sync_queue, list) {
-		struct host1x_syncpt *sp =
-			host1x_syncpt_get(host1x, job->syncpt_id);
+		struct host1x_syncpt *sp = job->syncpt;
 
 		/* Check whether this syncpt has completed, and bail if not */
 		if (!host1x_syncpt_is_expired(sp, job->syncpt_end)) {
@@ -499,8 +495,7 @@ int host1x_cdma_begin(struct host1x_cdma *cdma, struct host1x_job *job)
 		if (!cdma->timeout.initialized) {
 			int err;
 
-			err = host1x_hw_cdma_timeout_init(host1x, cdma,
-							  job->syncpt_id);
+			err = host1x_hw_cdma_timeout_init(host1x, cdma);
 			if (err) {
 				mutex_unlock(&cdma->lock);
 				return err;
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
index f781a9b0f39d..63010ae37a97 100644
--- a/drivers/gpu/host1x/dev.h
+++ b/drivers/gpu/host1x/dev.h
@@ -37,7 +37,7 @@ struct host1x_cdma_ops {
 	void (*start)(struct host1x_cdma *cdma);
 	void (*stop)(struct host1x_cdma *cdma);
 	void (*flush)(struct  host1x_cdma *cdma);
-	int (*timeout_init)(struct host1x_cdma *cdma, unsigned int syncpt);
+	int (*timeout_init)(struct host1x_cdma *cdma);
 	void (*timeout_destroy)(struct host1x_cdma *cdma);
 	void (*freeze)(struct host1x_cdma *cdma);
 	void (*resume)(struct host1x_cdma *cdma, u32 getptr);
@@ -261,10 +261,9 @@ static inline void host1x_hw_cdma_flush(struct host1x *host,
 }
 
 static inline int host1x_hw_cdma_timeout_init(struct host1x *host,
-					      struct host1x_cdma *cdma,
-					      unsigned int syncpt)
+					      struct host1x_cdma *cdma)
 {
-	return host->cdma_op->timeout_init(cdma, syncpt);
+	return host->cdma_op->timeout_init(cdma);
 }
 
 static inline void host1x_hw_cdma_timeout_destroy(struct host1x *host,
diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
index 2f3bf94cf365..e49cd5b8f735 100644
--- a/drivers/gpu/host1x/hw/cdma_hw.c
+++ b/drivers/gpu/host1x/hw/cdma_hw.c
@@ -295,7 +295,7 @@ static void cdma_timeout_handler(struct work_struct *work)
 /*
  * Init timeout resources
  */
-static int cdma_timeout_init(struct host1x_cdma *cdma, unsigned int syncpt)
+static int cdma_timeout_init(struct host1x_cdma *cdma)
 {
 	INIT_DELAYED_WORK(&cdma->timeout.wq, cdma_timeout_handler);
 	cdma->timeout.initialized = true;
diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
index 5eaa29d171c9..d4c28faf27d1 100644
--- a/drivers/gpu/host1x/hw/channel_hw.c
+++ b/drivers/gpu/host1x/hw/channel_hw.c
@@ -86,8 +86,7 @@ static void submit_gathers(struct host1x_job *job)
 
 static inline void synchronize_syncpt_base(struct host1x_job *job)
 {
-	struct host1x *host = dev_get_drvdata(job->channel->dev->parent);
-	struct host1x_syncpt *sp = host->syncpt + job->syncpt_id;
+	struct host1x_syncpt *sp = job->syncpt;
 	unsigned int id;
 	u32 value;
 
@@ -118,7 +117,7 @@ static void host1x_channel_set_streamid(struct host1x_channel *channel)
 static int channel_submit(struct host1x_job *job)
 {
 	struct host1x_channel *ch = job->channel;
-	struct host1x_syncpt *sp;
+	struct host1x_syncpt *sp = job->syncpt;
 	u32 user_syncpt_incrs = job->syncpt_incrs;
 	u32 prev_max = 0;
 	u32 syncval;
@@ -126,10 +125,9 @@ static int channel_submit(struct host1x_job *job)
 	struct host1x_waitlist *completed_waiter = NULL;
 	struct host1x *host = dev_get_drvdata(ch->dev->parent);
 
-	sp = host->syncpt + job->syncpt_id;
 	trace_host1x_channel_submit(dev_name(ch->dev),
 				    job->num_gathers, job->num_relocs,
-				    job->syncpt_id, job->syncpt_incrs);
+				    job->syncpt->id, job->syncpt_incrs);
 
 	/* before error checks, return current max */
 	prev_max = job->syncpt_end = host1x_syncpt_read_max(sp);
@@ -163,7 +161,7 @@ static int channel_submit(struct host1x_job *job)
 		host1x_cdma_push(&ch->cdma,
 				 host1x_opcode_setclass(HOST1X_CLASS_HOST1X,
 					host1x_uclass_wait_syncpt_r(), 1),
-				 host1x_class_host_wait_syncpt(job->syncpt_id,
+				 host1x_class_host_wait_syncpt(job->syncpt->id,
 					host1x_syncpt_read_max(sp)));
 	}
 
diff --git a/drivers/gpu/host1x/hw/debug_hw.c b/drivers/gpu/host1x/hw/debug_hw.c
index f31bcfa1b837..ceb48229d14b 100644
--- a/drivers/gpu/host1x/hw/debug_hw.c
+++ b/drivers/gpu/host1x/hw/debug_hw.c
@@ -204,7 +204,7 @@ static void show_channel_gathers(struct output *o, struct host1x_cdma *cdma)
 		unsigned int i;
 
 		host1x_debug_output(o, "\n%p: JOB, syncpt_id=%d, syncpt_val=%d, first_get=%08x, timeout=%d num_slots=%d, num_handles=%d\n",
-				    job, job->syncpt_id, job->syncpt_end,
+				    job, job->syncpt->id, job->syncpt_end,
 				    job->first_get, job->timeout,
 				    job->num_slots, job->num_unpins);
 
diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
index 89b6c14b7392..d8345d3bf0b3 100644
--- a/drivers/gpu/host1x/job.c
+++ b/drivers/gpu/host1x/job.c
@@ -79,6 +79,9 @@ static void job_free(struct kref *ref)
 {
 	struct host1x_job *job = container_of(ref, struct host1x_job, ref);
 
+	if (job->syncpt)
+		host1x_syncpt_put(job->syncpt);
+
 	kfree(job);
 }
 
@@ -680,7 +683,7 @@ EXPORT_SYMBOL(host1x_job_unpin);
  */
 void host1x_job_dump(struct device *dev, struct host1x_job *job)
 {
-	dev_dbg(dev, "    SYNCPT_ID   %d\n", job->syncpt_id);
+	dev_dbg(dev, "    SYNCPT_ID   %d\n", job->syncpt->id);
 	dev_dbg(dev, "    SYNCPT_VAL  %d\n", job->syncpt_end);
 	dev_dbg(dev, "    FIRST_GET   0x%x\n", job->first_get);
 	dev_dbg(dev, "    TIMEOUT     %d\n", job->timeout);
diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index 9ca0d852e32f..c7b910e413d8 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -75,6 +75,8 @@ struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
 	else
 		sp->client_managed = false;
 
+	kref_init(&sp->ref);
+
 	mutex_unlock(&host->syncpt_mutex);
 	return sp;
 
@@ -368,7 +370,7 @@ int host1x_syncpt_init(struct host1x *host)
  * host1x client drivers can use this function to allocate a syncpoint for
  * subsequent use. A syncpoint returned by this function will be reserved for
  * use by the client exclusively. When no longer using a syncpoint, a host1x
- * client driver needs to release it using host1x_syncpt_free().
+ * client driver needs to release it using host1x_syncpt_put().
  */
 struct host1x_syncpt *host1x_syncpt_request(struct host1x_client *client,
 					    unsigned long flags)
@@ -379,20 +381,9 @@ struct host1x_syncpt *host1x_syncpt_request(struct host1x_client *client,
 }
 EXPORT_SYMBOL(host1x_syncpt_request);
 
-/**
- * host1x_syncpt_free() - free a requested syncpoint
- * @sp: host1x syncpoint
- *
- * Release a syncpoint previously allocated using host1x_syncpt_request(). A
- * host1x client driver should call this when the syncpoint is no longer in
- * use. Note that client drivers must ensure that the syncpoint doesn't remain
- * under the control of hardware after calling this function, otherwise two
- * clients may end up trying to access the same syncpoint concurrently.
- */
-void host1x_syncpt_free(struct host1x_syncpt *sp)
+static void syncpt_release(struct kref *ref)
 {
-	if (!sp)
-		return;
+	struct host1x_syncpt *sp = container_of(ref, struct host1x_syncpt, ref);
 
 	mutex_lock(&sp->host->syncpt_mutex);
 
@@ -404,7 +395,23 @@ void host1x_syncpt_free(struct host1x_syncpt *sp)
 
 	mutex_unlock(&sp->host->syncpt_mutex);
 }
-EXPORT_SYMBOL(host1x_syncpt_free);
+
+/**
+ * host1x_syncpt_put() - free a requested syncpoint
+ * @sp: host1x syncpoint
+ *
+ * Release a syncpoint previously allocated using host1x_syncpt_request(). A
+ * host1x client driver should call this when the syncpoint is no longer in
+ * use.
+ */
+void host1x_syncpt_put(struct host1x_syncpt *sp)
+{
+	if (!sp)
+		return;
+
+	kref_put(&sp->ref, syncpt_release);
+}
+EXPORT_SYMBOL(host1x_syncpt_put);
 
 void host1x_syncpt_deinit(struct host1x *host)
 {
@@ -471,16 +478,48 @@ unsigned int host1x_syncpt_nb_mlocks(struct host1x *host)
 }
 
 /**
- * host1x_syncpt_get() - obtain a syncpoint by ID
+ * host1x_syncpt_get_by_id() - obtain a syncpoint by ID
+ * @host: host1x controller
+ * @id: syncpoint ID
+ */
+struct host1x_syncpt *host1x_syncpt_get_by_id(struct host1x *host,
+					      unsigned int id)
+{
+	if (id >= host->info->nb_pts)
+		return NULL;
+
+	if (kref_get_unless_zero(&host->syncpt[id].ref))
+		return &host->syncpt[id];
+	else
+		return NULL;
+}
+EXPORT_SYMBOL(host1x_syncpt_get_by_id);
+
+/**
+ * host1x_syncpt_get_by_id_noref() - obtain a syncpoint by ID but don't
+ * 	increase the refcount.
  * @host: host1x controller
  * @id: syncpoint ID
  */
-struct host1x_syncpt *host1x_syncpt_get(struct host1x *host, unsigned int id)
+struct host1x_syncpt *host1x_syncpt_get_by_id_noref(struct host1x *host,
+						    unsigned int id)
 {
 	if (id >= host->info->nb_pts)
 		return NULL;
 
-	return host->syncpt + id;
+	return &host->syncpt[id];
+}
+EXPORT_SYMBOL(host1x_syncpt_get_by_id_noref);
+
+/**
+ * host1x_syncpt_get() - increment syncpoint refcount
+ * @sp: syncpoint
+ */
+struct host1x_syncpt *host1x_syncpt_get(struct host1x_syncpt *sp)
+{
+	kref_get(&sp->ref);
+
+	return sp;
 }
 EXPORT_SYMBOL(host1x_syncpt_get);
 
diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h
index 3aa6b25b1b9c..a6766f8d55ee 100644
--- a/drivers/gpu/host1x/syncpt.h
+++ b/drivers/gpu/host1x/syncpt.h
@@ -11,6 +11,7 @@
 #include <linux/atomic.h>
 #include <linux/host1x.h>
 #include <linux/kernel.h>
+#include <linux/kref.h>
 #include <linux/sched.h>
 
 #include "intr.h"
@@ -26,6 +27,8 @@ struct host1x_syncpt_base {
 };
 
 struct host1x_syncpt {
+	struct kref ref;
+
 	unsigned int id;
 	atomic_t min_val;
 	atomic_t max_val;
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index 099eff8a06d2..c222cf61b70b 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -142,7 +142,9 @@ struct host1x_syncpt_base;
 struct host1x_syncpt;
 struct host1x;
 
-struct host1x_syncpt *host1x_syncpt_get(struct host1x *host, u32 id);
+struct host1x_syncpt *host1x_syncpt_get_by_id(struct host1x *host, u32 id);
+struct host1x_syncpt *host1x_syncpt_get_by_id_noref(struct host1x *host, u32 id);
+struct host1x_syncpt *host1x_syncpt_get(struct host1x_syncpt *sp);
 u32 host1x_syncpt_id(struct host1x_syncpt *sp);
 u32 host1x_syncpt_read_min(struct host1x_syncpt *sp);
 u32 host1x_syncpt_read_max(struct host1x_syncpt *sp);
@@ -153,7 +155,7 @@ int host1x_syncpt_wait(struct host1x_syncpt *sp, u32 thresh, long timeout,
 		       u32 *value);
 struct host1x_syncpt *host1x_syncpt_request(struct host1x_client *client,
 					    unsigned long flags);
-void host1x_syncpt_free(struct host1x_syncpt *sp);
+void host1x_syncpt_put(struct host1x_syncpt *sp);
 struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
 					  unsigned long flags,
 					  const char *name);
@@ -221,7 +223,7 @@ struct host1x_job {
 	dma_addr_t *reloc_addr_phys;
 
 	/* Sync point id, number of increments and end related to the submit */
-	u32 syncpt_id;
+	struct host1x_syncpt *syncpt;
 	u32 syncpt_incrs;
 	u32 syncpt_end;
 
-- 
2.28.0


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v3 07/20] gpu: host1x: Introduce UAPI header
  2020-10-07 17:12 [PATCH v3 00/20] Host1x/TegraDRM UAPI Mikko Perttunen
                   ` (5 preceding siblings ...)
  2020-10-07 17:12 ` [PATCH v3 06/20] gpu: host1x: Cleanup and refcounting for syncpoints Mikko Perttunen
@ 2020-10-07 17:12 ` Mikko Perttunen
  2020-10-07 17:12 ` [PATCH v3 08/20] gpu: host1x: Implement /dev/host1x device node Mikko Perttunen
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Mikko Perttunen @ 2020-10-07 17:12 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Add the userspace interface header, specifying interfaces
for allocating and accessing syncpoints from userspace,
and for creating sync_file based fences based on syncpoint
thresholds.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 include/uapi/linux/host1x.h | 134 ++++++++++++++++++++++++++++++++++++
 1 file changed, 134 insertions(+)
 create mode 100644 include/uapi/linux/host1x.h

diff --git a/include/uapi/linux/host1x.h b/include/uapi/linux/host1x.h
new file mode 100644
index 000000000000..9c8fb9425cb2
--- /dev/null
+++ b/include/uapi/linux/host1x.h
@@ -0,0 +1,134 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/* Copyright (c) 2020 NVIDIA Corporation */
+
+#ifndef _UAPI__LINUX_HOST1X_H
+#define _UAPI__LINUX_HOST1X_H
+
+#include <linux/ioctl.h>
+#include <linux/types.h>
+
+#if defined(__cplusplus)
+extern "C" {
+#endif
+
+struct host1x_allocate_syncpoint {
+	/**
+	 * @fd: [out]
+	 *
+	 * New file descriptor representing the allocated syncpoint.
+	 */
+	__s32 fd;
+
+	__u32 reserved[3];
+};
+
+struct host1x_syncpoint_info {
+	/**
+	 * @id: [out]
+	 *
+	 * System-global ID of the syncpoint.
+	 */
+	__u32 id;
+
+	__u32 reserved[3];
+};
+
+struct host1x_syncpoint_increment {
+	/**
+	 * @count: [in]
+	 *
+	 * Number of times to increment the syncpoint. The syncpoint can
+	 * be observed at in-between values, but each increment is atomic.
+	 */
+	__u32 count;
+};
+
+struct host1x_read_syncpoint {
+	/**
+	 * @id: [in]
+	 *
+	 * ID of the syncpoint to read.
+	 */
+	__u32 id;
+
+	/**
+	 * @value: [out]
+	 *
+	 * Current value of the syncpoint.
+	 */
+	__u32 value;
+};
+
+struct host1x_create_fence {
+	/**
+	 * @id: [in]
+	 *
+	 * ID of the syncpoint to create a fence for.
+	 */
+	__u32 id;
+
+	/**
+	 * @threshold: [in]
+	 *
+	 * When the syncpoint reaches this value, the fence will be signaled.
+	 * The syncpoint is considered to have reached the threshold when the
+	 * following condition is true:
+	 *
+	 * 	((value - threshold) & 0x80000000U) == 0U
+	 *
+	 */
+	__u32 threshold;
+
+	/**
+	 * @fence_fd: [out]
+	 *
+	 * New sync_file file descriptor containing the created fence.
+	 */
+	__s32 fence_fd;
+
+	__u32 reserved[1];
+};
+
+struct host1x_fence_extract_fence {
+	__u32 id;
+	__u32 threshold;
+};
+
+struct host1x_fence_extract {
+	/**
+	 * @fence_fd: [in]
+	 *
+	 * sync_file file descriptor
+	 */
+	__s32 fence_fd;
+
+	/**
+	 * @num_fences: [in,out]
+	 *
+	 * In: size of the `fences_ptr` array counted in elements.
+	 * Out: required size of the `fences_ptr` array counted in elements.
+	 */
+	__u32 num_fences;
+
+	/**
+	 * @fences_ptr: [in]
+	 *
+	 * Pointer to array of `struct host1x_fence_extract_fence`.
+	 */
+	__u64 fences_ptr;
+
+	__u32 reserved[2];
+};
+
+#define HOST1X_IOCTL_ALLOCATE_SYNCPOINT  _IOWR('X', 0x00, struct host1x_allocate_syncpoint)
+#define HOST1X_IOCTL_READ_SYNCPOINT      _IOR ('X', 0x01, struct host1x_read_syncpoint)
+#define HOST1X_IOCTL_CREATE_FENCE        _IOWR('X', 0x02, struct host1x_create_fence)
+#define HOST1X_IOCTL_SYNCPOINT_INFO      _IOWR('X', 0x03, struct host1x_syncpoint_info)
+#define HOST1X_IOCTL_SYNCPOINT_INCREMENT _IOWR('X', 0x04, struct host1x_syncpoint_increment)
+#define HOST1X_IOCTL_FENCE_EXTRACT       _IOWR('X', 0x05, struct host1x_fence_extract)
+
+#if defined(__cplusplus)
+}
+#endif
+
+#endif
-- 
2.28.0


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v3 08/20] gpu: host1x: Implement /dev/host1x device node
  2020-10-07 17:12 [PATCH v3 00/20] Host1x/TegraDRM UAPI Mikko Perttunen
                   ` (6 preceding siblings ...)
  2020-10-07 17:12 ` [PATCH v3 07/20] gpu: host1x: Introduce UAPI header Mikko Perttunen
@ 2020-10-07 17:12 ` Mikko Perttunen
  2020-10-07 17:12 ` [PATCH v3 09/20] gpu: host1x: DMA fences and userspace fence creation Mikko Perttunen
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Mikko Perttunen @ 2020-10-07 17:12 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Add the /dev/host1x device node, implementing the following
functionality:

- Reading syncpoint values
- Allocating syncpoints (providing syncpoint FDs)
- Incrementing syncpoints (based on syncpoint FD)

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
v3:
* Pass process name as syncpoint name when allocating
  syncpoint.
---
 drivers/gpu/host1x/Makefile |   1 +
 drivers/gpu/host1x/dev.c    |   9 ++
 drivers/gpu/host1x/dev.h    |   3 +
 drivers/gpu/host1x/uapi.c   | 276 ++++++++++++++++++++++++++++++++++++
 drivers/gpu/host1x/uapi.h   |  22 +++
 include/linux/host1x.h      |   2 +
 6 files changed, 313 insertions(+)
 create mode 100644 drivers/gpu/host1x/uapi.c
 create mode 100644 drivers/gpu/host1x/uapi.h

diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
index 096017b8789d..882f928d75e1 100644
--- a/drivers/gpu/host1x/Makefile
+++ b/drivers/gpu/host1x/Makefile
@@ -9,6 +9,7 @@ host1x-y = \
 	job.o \
 	debug.o \
 	mipi.o \
+	uapi.o \
 	hw/host1x01.o \
 	hw/host1x02.o \
 	hw/host1x04.o \
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index d0ebb70e2fdd..641317d23828 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -461,6 +461,12 @@ static int host1x_probe(struct platform_device *pdev)
 		goto deinit_syncpt;
 	}
 
+	err = host1x_uapi_init(&host->uapi, host);
+	if (err) {
+		dev_err(&pdev->dev, "failed to initialize uapi\n");
+		goto deinit_intr;
+	}
+
 	host1x_debug_init(host);
 
 	if (host->info->has_hypervisor)
@@ -480,6 +486,8 @@ static int host1x_probe(struct platform_device *pdev)
 	host1x_unregister(host);
 deinit_debugfs:
 	host1x_debug_deinit(host);
+	host1x_uapi_deinit(&host->uapi);
+deinit_intr:
 	host1x_intr_deinit(host);
 deinit_syncpt:
 	host1x_syncpt_deinit(host);
@@ -501,6 +509,7 @@ static int host1x_remove(struct platform_device *pdev)
 
 	host1x_unregister(host);
 	host1x_debug_deinit(host);
+	host1x_uapi_deinit(&host->uapi);
 	host1x_intr_deinit(host);
 	host1x_syncpt_deinit(host);
 	reset_control_assert(host->rst);
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
index 63010ae37a97..7b8b7e20e32b 100644
--- a/drivers/gpu/host1x/dev.h
+++ b/drivers/gpu/host1x/dev.h
@@ -17,6 +17,7 @@
 #include "intr.h"
 #include "job.h"
 #include "syncpt.h"
+#include "uapi.h"
 
 struct host1x_syncpt;
 struct host1x_syncpt_base;
@@ -143,6 +144,8 @@ struct host1x {
 	struct list_head list;
 
 	struct device_dma_parameters dma_parms;
+
+	struct host1x_uapi uapi;
 };
 
 void host1x_hypervisor_writel(struct host1x *host1x, u32 r, u32 v);
diff --git a/drivers/gpu/host1x/uapi.c b/drivers/gpu/host1x/uapi.c
new file mode 100644
index 000000000000..4747d8de132e
--- /dev/null
+++ b/drivers/gpu/host1x/uapi.c
@@ -0,0 +1,276 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * /dev/host1x syncpoint interface
+ *
+ * Copyright (c) 2020, NVIDIA Corporation.
+ */
+
+#include <linux/anon_inodes.h>
+#include <linux/cdev.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/host1x.h>
+#include <linux/nospec.h>
+
+#include "dev.h"
+#include "syncpt.h"
+#include "uapi.h"
+
+#include <uapi/linux/host1x.h>
+
+static int syncpt_file_release(struct inode *inode, struct file *file)
+{
+	struct host1x_syncpt *sp = file->private_data;
+
+	host1x_syncpt_put(sp);
+
+	return 0;
+}
+
+static int syncpt_file_ioctl_info(struct host1x_syncpt *sp, void __user *data)
+{
+	struct host1x_syncpoint_info args;
+	unsigned long copy_err;
+
+	copy_err = copy_from_user(&args, data, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	if (args.reserved[0] || args.reserved[1] || args.reserved[2])
+		return -EINVAL;
+
+	args.id = sp->id;
+
+	copy_err = copy_to_user(data, &args, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	return 0;
+}
+
+static int syncpt_file_ioctl_incr(struct host1x_syncpt *sp, void __user *data)
+{
+	struct host1x_syncpoint_increment args;
+	unsigned long copy_err;
+	u32 i;
+
+	copy_err = copy_from_user(&args, data, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	for (i = 0; i < args.count; i++) {
+		host1x_syncpt_incr(sp);
+		if (signal_pending(current))
+			return -EINTR;
+	}
+
+	return 0;
+}
+
+static long syncpt_file_ioctl(struct file *file, unsigned int cmd,
+			      unsigned long arg)
+{
+	void __user *data = (void __user *)arg;
+	long err;
+
+	switch (cmd) {
+	case HOST1X_IOCTL_SYNCPOINT_INFO:
+		err = syncpt_file_ioctl_info(file->private_data, data);
+		break;
+
+	case HOST1X_IOCTL_SYNCPOINT_INCREMENT:
+		err = syncpt_file_ioctl_incr(file->private_data, data);
+		break;
+
+	default:
+		err = -ENOTTY;
+	}
+
+	return err;
+}
+
+static const struct file_operations syncpt_file_fops = {
+	.owner = THIS_MODULE,
+	.release = syncpt_file_release,
+	.unlocked_ioctl = syncpt_file_ioctl,
+	.compat_ioctl = syncpt_file_ioctl,
+};
+
+struct host1x_syncpt *host1x_syncpt_fd_get(int fd)
+{
+	struct host1x_syncpt *sp;
+	struct file *file = fget(fd);
+
+	if (!file)
+		return ERR_PTR(-EINVAL);
+
+	if (file->f_op != &syncpt_file_fops) {
+		fput(file);
+		return ERR_PTR(-EINVAL);
+	}
+
+	sp = file->private_data;
+
+	host1x_syncpt_get(sp);
+
+	fput(file);
+
+	return sp;
+}
+EXPORT_SYMBOL(host1x_syncpt_fd_get);
+
+static int dev_file_open(struct inode *inode, struct file *file)
+{
+	struct host1x_uapi *uapi =
+		container_of(inode->i_cdev, struct host1x_uapi, cdev);
+
+	file->private_data = container_of(uapi, struct host1x, uapi);
+
+	return 0;
+}
+
+static int dev_file_ioctl_read_syncpoint(struct host1x *host1x,
+					 void __user *data)
+{
+	struct host1x_read_syncpoint args;
+	unsigned long copy_err;
+
+	copy_err = copy_from_user(&args, data, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	if (args.id >= host1x_syncpt_nb_pts(host1x))
+		return -EINVAL;
+
+	args.id = array_index_nospec(args.id, host1x_syncpt_nb_pts(host1x));
+	args.value = host1x_syncpt_read(&host1x->syncpt[args.id]);
+
+	copy_err = copy_to_user(data, &args, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	return 0;
+}
+
+static int dev_file_ioctl_alloc_syncpoint(struct host1x *host1x,
+					  void __user *data)
+{
+	struct host1x_allocate_syncpoint args;
+	struct host1x_syncpt *sp;
+	unsigned long copy_err;
+	int err;
+
+	copy_err = copy_from_user(&args, data, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	if (args.reserved[0] || args.reserved[1] || args.reserved[2])
+		return -EINVAL;
+
+	sp = host1x_syncpt_alloc(host1x, HOST1X_SYNCPT_CLIENT_MANAGED,
+				 current->comm);
+	if (!sp)
+		return -EBUSY;
+
+	err = anon_inode_getfd("host1x_syncpt", &syncpt_file_fops, sp,
+			       O_CLOEXEC);
+	if (err < 0)
+		goto free_syncpt;
+
+	args.fd = err;
+
+	copy_err = copy_to_user(data, &args, sizeof(args));
+	if (copy_err) {
+		err = -EFAULT;
+		goto put_fd;
+	}
+
+	return 0;
+
+put_fd:
+	put_unused_fd(args.fd);
+free_syncpt:
+	host1x_syncpt_put(sp);
+
+	return err;
+}
+
+static long dev_file_ioctl(struct file *file, unsigned int cmd,
+			   unsigned long arg)
+{
+	void __user *data = (void __user *)arg;
+	long err;
+
+	switch (cmd) {
+	case HOST1X_IOCTL_READ_SYNCPOINT:
+		err = dev_file_ioctl_read_syncpoint(file->private_data, data);
+		break;
+
+	case HOST1X_IOCTL_ALLOCATE_SYNCPOINT:
+		err = dev_file_ioctl_alloc_syncpoint(file->private_data, data);
+		break;
+
+	default:
+		err = -ENOTTY;
+	}
+
+	return err;
+}
+
+static const struct file_operations dev_file_fops = {
+	.owner = THIS_MODULE,
+	.open = dev_file_open,
+	.unlocked_ioctl = dev_file_ioctl,
+	.compat_ioctl = dev_file_ioctl,
+};
+
+int host1x_uapi_init(struct host1x_uapi *uapi, struct host1x *host1x)
+{
+	int err;
+	dev_t dev_num;
+
+	err = alloc_chrdev_region(&dev_num, 0, 1, "host1x");
+	if (err)
+		return err;
+
+	uapi->class = class_create(THIS_MODULE, "host1x");
+	if (IS_ERR(uapi->class)) {
+		err = PTR_ERR(uapi->class);
+		goto unregister_chrdev_region;
+	}
+
+	cdev_init(&uapi->cdev, &dev_file_fops);
+	err = cdev_add(&uapi->cdev, dev_num, 1);
+	if (err)
+		goto destroy_class;
+
+	uapi->dev = device_create(uapi->class, host1x->dev,
+				  dev_num, NULL, "host1x");
+	if (IS_ERR(uapi->dev)) {
+		err = PTR_ERR(uapi->dev);
+		goto del_cdev;
+	}
+
+	cdev_add(&uapi->cdev, dev_num, 1);
+
+	uapi->dev_num = dev_num;
+
+	return 0;
+
+del_cdev:
+	cdev_del(&uapi->cdev);
+destroy_class:
+	class_destroy(uapi->class);
+unregister_chrdev_region:
+	unregister_chrdev_region(dev_num, 1);
+
+	return err;
+}
+
+void host1x_uapi_deinit(struct host1x_uapi *uapi)
+{
+	device_destroy(uapi->class, uapi->dev_num);
+	cdev_del(&uapi->cdev);
+	class_destroy(uapi->class);
+	unregister_chrdev_region(uapi->dev_num, 1);
+}
diff --git a/drivers/gpu/host1x/uapi.h b/drivers/gpu/host1x/uapi.h
new file mode 100644
index 000000000000..7beb5e44c1b1
--- /dev/null
+++ b/drivers/gpu/host1x/uapi.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2020, NVIDIA Corporation.
+ */
+
+#ifndef HOST1X_UAPI_H
+#define HOST1X_UAPI_H
+
+#include <linux/cdev.h>
+
+struct host1x_uapi {
+	struct class *class;
+
+	struct cdev cdev;
+	struct device *dev;
+	dev_t dev_num;
+};
+
+int host1x_uapi_init(struct host1x_uapi *uapi, struct host1x *host1x);
+void host1x_uapi_deinit(struct host1x_uapi *uapi);
+
+#endif
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index c222cf61b70b..8f6d61e13e05 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -163,6 +163,8 @@ struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
 struct host1x_syncpt_base *host1x_syncpt_get_base(struct host1x_syncpt *sp);
 u32 host1x_syncpt_base_id(struct host1x_syncpt_base *base);
 
+struct host1x_syncpt *host1x_syncpt_fd_get(int fd);
+
 /*
  * host1x channel
  */
-- 
2.28.0


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v3 09/20] gpu: host1x: DMA fences and userspace fence creation
  2020-10-07 17:12 [PATCH v3 00/20] Host1x/TegraDRM UAPI Mikko Perttunen
                   ` (7 preceding siblings ...)
  2020-10-07 17:12 ` [PATCH v3 08/20] gpu: host1x: Implement /dev/host1x device node Mikko Perttunen
@ 2020-10-07 17:12 ` Mikko Perttunen
  2020-10-07 23:13   ` kernel test robot
  2020-10-08 11:13   ` kernel test robot
  2020-10-07 17:12 ` [PATCH v3 10/20] gpu: host1x: Add no-recovery mode Mikko Perttunen
                   ` (10 subsequent siblings)
  19 siblings, 2 replies; 35+ messages in thread
From: Mikko Perttunen @ 2020-10-07 17:12 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Add an implementation of dma_fences based on syncpoints. Syncpoint
interrupts are used to signal fences. Additionally, after
software signaling has been enabled, a 30 second timeout is started.
If the syncpoint threshold is not reached within this period,
the fence is signalled with an -ETIMEDOUT error code. This is to
allow fences that would never reach their syncpoint threshold to
be cleaned up.

Additionally, add a new /dev/host1x IOCTL for creating sync_file
file descriptors backed by syncpoint fences.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
v3:
* Move declaration of host1x_fence_extract to public header
---
 drivers/gpu/host1x/Makefile |   1 +
 drivers/gpu/host1x/fence.c  | 207 ++++++++++++++++++++++++++++++++++++
 drivers/gpu/host1x/fence.h  |  13 +++
 drivers/gpu/host1x/intr.c   |   9 ++
 drivers/gpu/host1x/intr.h   |   2 +
 drivers/gpu/host1x/uapi.c   | 106 ++++++++++++++++++
 include/linux/host1x.h      |   4 +
 7 files changed, 342 insertions(+)
 create mode 100644 drivers/gpu/host1x/fence.c
 create mode 100644 drivers/gpu/host1x/fence.h

diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
index 882f928d75e1..a48af2cefae1 100644
--- a/drivers/gpu/host1x/Makefile
+++ b/drivers/gpu/host1x/Makefile
@@ -10,6 +10,7 @@ host1x-y = \
 	debug.o \
 	mipi.o \
 	uapi.o \
+	fence.o \
 	hw/host1x01.o \
 	hw/host1x02.o \
 	hw/host1x04.o \
diff --git a/drivers/gpu/host1x/fence.c b/drivers/gpu/host1x/fence.c
new file mode 100644
index 000000000000..400da6c1ab48
--- /dev/null
+++ b/drivers/gpu/host1x/fence.c
@@ -0,0 +1,207 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Syncpoint dma_fence implementation
+ *
+ * Copyright (c) 2020, NVIDIA Corporation.
+ */
+
+#include <linux/dma-fence.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include <linux/sync_file.h>
+
+#include "intr.h"
+#include "syncpt.h"
+
+DEFINE_SPINLOCK(lock);
+
+struct host1x_syncpt_fence {
+	struct dma_fence base;
+
+	atomic_t signaling;
+
+	struct host1x_syncpt *sp;
+	u32 threshold;
+
+	struct host1x_waitlist *waiter;
+	void *waiter_ref;
+
+	struct delayed_work timeout_work;
+};
+
+static const char *syncpt_fence_get_driver_name(struct dma_fence *f)
+{
+	return "host1x";
+}
+
+static const char *syncpt_fence_get_timeline_name(struct dma_fence *f)
+{
+	return "syncpoint";
+}
+
+static bool syncpt_fence_enable_signaling(struct dma_fence *f)
+{
+	struct host1x_syncpt_fence *sf =
+		container_of(f, struct host1x_syncpt_fence, base);
+	int err;
+
+	if (host1x_syncpt_is_expired(sf->sp, sf->threshold))
+		return false;
+
+	dma_fence_get(f);
+
+	/*
+	 * The dma_fence framework requires the fence driver to keep a
+	 * reference to any fences for which 'enable_signaling' has been
+	 * called (and that have not been signalled).
+	 * 
+	 * We provide a userspace API to create arbitrary syncpoint fences,
+	 * so we cannot normally guarantee that all fences get signalled.
+	 * As such, setup a timeout, so that long-lasting fences will get
+	 * reaped eventually.
+	 */
+	schedule_delayed_work(&sf->timeout_work, msecs_to_jiffies(30000));
+
+	err = host1x_intr_add_action(sf->sp->host, sf->sp, sf->threshold,
+				     HOST1X_INTR_ACTION_SIGNAL_FENCE, f,
+				     sf->waiter, &sf->waiter_ref);
+	if (err) {
+		cancel_delayed_work_sync(&sf->timeout_work);
+		dma_fence_put(f);
+		return false;
+	}
+
+	/* intr framework takes ownership of waiter */
+	sf->waiter = NULL;
+
+	/*
+	 * The fence may get signalled at any time after the above call,
+	 * so we need to initialize all state used by signalling
+	 * before it.
+	 */
+
+	return true;
+}
+
+static void syncpt_fence_release(struct dma_fence *f)
+{
+	struct host1x_syncpt_fence *sf =
+		container_of(f, struct host1x_syncpt_fence, base);
+
+	if (sf->waiter)
+		kfree(sf->waiter);
+
+	dma_fence_free(f);
+}
+
+const struct dma_fence_ops syncpt_fence_ops = {
+	.get_driver_name = syncpt_fence_get_driver_name,
+	.get_timeline_name = syncpt_fence_get_timeline_name,
+	.enable_signaling = syncpt_fence_enable_signaling,
+	.release = syncpt_fence_release,
+};
+
+void host1x_fence_signal(struct host1x_syncpt_fence *f)
+{
+	if (atomic_xchg(&f->signaling, 1))
+		return;
+
+	/*
+	 * Cancel pending timeout work - if it races, it will
+	 * not get 'f->signaling' and return.
+	 */
+	cancel_delayed_work_sync(&f->timeout_work);
+
+	host1x_intr_put_ref(f->sp->host, f->sp->id, f->waiter_ref);
+
+	dma_fence_signal(&f->base);
+	dma_fence_put(&f->base);
+}
+
+static void do_fence_timeout(struct work_struct *work)
+{
+	struct delayed_work *dwork = (struct delayed_work *)work;
+	struct host1x_syncpt_fence *f =
+		container_of(dwork, struct host1x_syncpt_fence, timeout_work);
+
+	if (atomic_xchg(&f->signaling, 1))
+		return;
+
+	/*
+	 * Cancel pending timeout work - if it races, it will
+	 * not get 'f->signaling' and return.
+	 */
+	host1x_intr_put_ref(f->sp->host, f->sp->id, f->waiter_ref);
+
+	dma_fence_set_error(&f->base, -ETIMEDOUT);
+	dma_fence_signal(&f->base);
+	dma_fence_put(&f->base);
+}
+
+struct dma_fence *host1x_fence_create(struct host1x_syncpt *sp, u32 threshold)
+{
+	struct host1x_syncpt_fence *fence;
+
+	fence = kzalloc(sizeof(*fence), GFP_KERNEL);
+	if (!fence)
+		return ERR_PTR(-ENOMEM);
+
+	fence->waiter = kzalloc(sizeof(*fence->waiter), GFP_KERNEL);
+	if (!fence->waiter)
+		return ERR_PTR(-ENOMEM);
+
+	fence->sp = sp;
+	fence->threshold = threshold;
+
+	dma_fence_init(&fence->base, &syncpt_fence_ops, &lock,
+		       dma_fence_context_alloc(1), 0);
+
+	INIT_DELAYED_WORK(&fence->timeout_work, do_fence_timeout);
+
+	return &fence->base;
+}
+EXPORT_SYMBOL(host1x_fence_create);
+
+int host1x_fence_create_fd(struct host1x_syncpt *sp, u32 threshold)
+{
+	struct sync_file *file;
+	struct dma_fence *f;
+	int fd;
+
+	f = host1x_fence_create(sp, threshold);
+	if (IS_ERR(f))
+		return PTR_ERR(f);
+
+	fd = get_unused_fd_flags(O_CLOEXEC);
+	if (fd < 0) {
+		dma_fence_put(f);
+		return fd;
+	}
+
+	file = sync_file_create(f);
+	dma_fence_put(f);
+	if (!file)
+		return -ENOMEM;
+
+	fd_install(fd, file->file);
+
+	return fd;
+}
+EXPORT_SYMBOL(host1x_fence_create_fd);
+
+int host1x_fence_extract(struct dma_fence *fence, u32 *id, u32 *threshold)
+{
+	struct host1x_syncpt_fence *f;
+
+	if (fence->ops != &syncpt_fence_ops)
+		return -EINVAL;
+
+	f = container_of(fence, struct host1x_syncpt_fence, base);
+
+	*id = f->sp->id;
+	*threshold = f->threshold;
+
+	return 0;
+}
+EXPORT_SYMBOL(host1x_fence_extract);
diff --git a/drivers/gpu/host1x/fence.h b/drivers/gpu/host1x/fence.h
new file mode 100644
index 000000000000..5999fe644fc0
--- /dev/null
+++ b/drivers/gpu/host1x/fence.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2020, NVIDIA Corporation.
+ */
+
+#ifndef HOST1X_FENCE_H
+#define HOST1X_FENCE_H
+
+struct host1x_syncpt_fence;
+
+bool host1x_fence_signal(struct host1x_syncpt_fence *fence);
+
+#endif
diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c
index 5d328d20ce6d..19b59c5c94d0 100644
--- a/drivers/gpu/host1x/intr.c
+++ b/drivers/gpu/host1x/intr.c
@@ -13,6 +13,7 @@
 #include <trace/events/host1x.h>
 #include "channel.h"
 #include "dev.h"
+#include "fence.h"
 #include "intr.h"
 
 /* Wait list management */
@@ -121,12 +122,20 @@ static void action_wakeup_interruptible(struct host1x_waitlist *waiter)
 	wake_up_interruptible(wq);
 }
 
+static void action_signal_fence(struct host1x_waitlist *waiter)
+{
+	struct host1x_syncpt_fence *f = waiter->data;
+
+	host1x_fence_signal(f);
+}
+
 typedef void (*action_handler)(struct host1x_waitlist *waiter);
 
 static const action_handler action_handlers[HOST1X_INTR_ACTION_COUNT] = {
 	action_submit_complete,
 	action_wakeup,
 	action_wakeup_interruptible,
+	action_signal_fence,
 };
 
 static void run_handlers(struct list_head completed[HOST1X_INTR_ACTION_COUNT])
diff --git a/drivers/gpu/host1x/intr.h b/drivers/gpu/host1x/intr.h
index aac38194398f..dedbd0f700fb 100644
--- a/drivers/gpu/host1x/intr.h
+++ b/drivers/gpu/host1x/intr.h
@@ -33,6 +33,8 @@ enum host1x_intr_action {
 	 */
 	HOST1X_INTR_ACTION_WAKEUP_INTERRUPTIBLE,
 
+	HOST1X_INTR_ACTION_SIGNAL_FENCE,
+
 	HOST1X_INTR_ACTION_COUNT
 };
 
diff --git a/drivers/gpu/host1x/uapi.c b/drivers/gpu/host1x/uapi.c
index 4747d8de132e..7d7d9b891638 100644
--- a/drivers/gpu/host1x/uapi.c
+++ b/drivers/gpu/host1x/uapi.c
@@ -11,8 +11,10 @@
 #include <linux/fs.h>
 #include <linux/host1x.h>
 #include <linux/nospec.h>
+#include <linux/sync_file.h>
 
 #include "dev.h"
+#include "fence.h"
 #include "syncpt.h"
 #include "uapi.h"
 
@@ -195,6 +197,102 @@ static int dev_file_ioctl_alloc_syncpoint(struct host1x *host1x,
 	return err;
 }
 
+static int dev_file_ioctl_create_fence(struct host1x *host1x, void __user *data)
+{
+	struct host1x_create_fence args;
+	unsigned long copy_err;
+	struct sync_file *file;
+	int fd;
+
+	copy_err = copy_from_user(&args, data, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	if (args.reserved[0])
+		return -EINVAL;
+
+	if (args.id >= host1x_syncpt_nb_pts(host1x))
+		return -EINVAL;
+
+	args.id = array_index_nospec(args.id, host1x_syncpt_nb_pts(host1x));
+
+	fd = host1x_fence_create_fd(&host1x->syncpt[args.id], args.threshold);
+	if (fd < 0)
+		return fd;
+
+	args.fence_fd = fd;
+
+	copy_err = copy_to_user(data, &args, sizeof(args));
+	if (copy_err) {
+		fput(file->file);
+		return -EFAULT;
+	}
+
+	return 0;
+}
+
+static int dev_file_ioctl_fence_extract(struct host1x *host1x, void __user *data)
+{
+	struct host1x_fence_extract_fence __user *fences_user_ptr;
+	struct dma_fence *fence, **fences;
+	struct host1x_fence_extract args;
+	struct dma_fence_array *array;
+	unsigned int num_fences, i;
+	unsigned long copy_err;
+	int err;
+
+	copy_err = copy_from_user(&args, data, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	fences_user_ptr = u64_to_user_ptr(args.fences_ptr);
+
+	if (args.reserved[0] || args.reserved[1])
+		return -EINVAL;
+
+	fence = sync_file_get_fence(args.fence_fd);
+	if (!fence)
+		return -EINVAL;
+
+	array = to_dma_fence_array(fence);
+	if (array) {
+		fences = array->fences;
+		num_fences = array->num_fences;
+	} else {
+		fences = &fence;
+		num_fences = 1;
+	}
+
+	for (i = 0; i < min(num_fences, args.num_fences); i++) {
+		struct host1x_fence_extract_fence f;
+
+		err = host1x_fence_extract(fences[i], &f.id, &f.threshold);
+		if (err)
+			goto put_fence;
+
+		copy_err = copy_to_user(fences_user_ptr + i, &f, sizeof(f));
+		if (copy_err) {
+			err = -EFAULT;
+			goto put_fence;
+		}
+	}
+
+	args.num_fences = i+1;
+
+	copy_err = copy_to_user(data, &args, sizeof(args));
+	if (copy_err) {
+		err = -EFAULT;
+		goto put_fence;
+	}
+
+	return 0;
+
+put_fence:
+	dma_fence_put(fence);
+
+	return err;
+}
+
 static long dev_file_ioctl(struct file *file, unsigned int cmd,
 			   unsigned long arg)
 {
@@ -210,6 +308,14 @@ static long dev_file_ioctl(struct file *file, unsigned int cmd,
 		err = dev_file_ioctl_alloc_syncpoint(file->private_data, data);
 		break;
 
+	case HOST1X_IOCTL_CREATE_FENCE:
+		err = dev_file_ioctl_create_fence(file->private_data, data);
+		break;
+
+	case HOST1X_IOCTL_FENCE_EXTRACT:
+		err = dev_file_ioctl_fence_extract(file->private_data, data);
+		break;
+
 	default:
 		err = -ENOTTY;
 	}
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index 8f6d61e13e05..bcfb1cc9a1c1 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -165,6 +165,10 @@ u32 host1x_syncpt_base_id(struct host1x_syncpt_base *base);
 
 struct host1x_syncpt *host1x_syncpt_fd_get(int fd);
 
+struct dma_fence *host1x_fence_create(struct host1x_syncpt *sp, u32 threshold);
+int host1x_fence_create_fd(struct host1x_syncpt *sp, u32 threshold);
+int host1x_fence_extract(struct dma_fence *fence, u32 *id, u32 *threshold);
+
 /*
  * host1x channel
  */
-- 
2.28.0


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v3 10/20] gpu: host1x: Add no-recovery mode
  2020-10-07 17:12 [PATCH v3 00/20] Host1x/TegraDRM UAPI Mikko Perttunen
                   ` (8 preceding siblings ...)
  2020-10-07 17:12 ` [PATCH v3 09/20] gpu: host1x: DMA fences and userspace fence creation Mikko Perttunen
@ 2020-10-07 17:12 ` Mikko Perttunen
  2020-10-07 17:12 ` [PATCH v3 11/20] gpu: host1x: Add job release callback Mikko Perttunen
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Mikko Perttunen @ 2020-10-07 17:12 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Add a new property for jobs to enable or disable recovery i.e.
CPU increments of syncpoints to max value on job timeout. This
allows for a more solid model for hanged jobs, where userspace
doesn't need to guess if a syncpoint increment happened because
the job completed, or because job timeout was triggered.

On job timeout, we stop the channel, NOP all future jobs on the
channel using the same syncpoint, mark the syncpoint as locked
and resume the channel from the next job, if any.

The future jobs are NOPed, since because we don't do the CPU
increments, the value of the syncpoint is no longer synchronized,
and any waiters would become confused if a future job incremented
the syncpoint. The syncpoint is marked locked to ensure that any
future jobs cannot increment the syncpoint either, until the
application has recognized the situation and reallocated the
syncpoint.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
v3:
* Move 'locked' check inside CDMA lock to prevent race
* Add clarifying comment to NOP-patching code
---
 drivers/gpu/drm/tegra/drm.c        |  1 +
 drivers/gpu/host1x/cdma.c          | 58 ++++++++++++++++++++++++++----
 drivers/gpu/host1x/hw/channel_hw.c |  2 +-
 drivers/gpu/host1x/job.c           |  4 +++
 drivers/gpu/host1x/syncpt.c        |  2 ++
 drivers/gpu/host1x/syncpt.h        | 12 +++++++
 include/linux/host1x.h             |  9 +++++
 7 files changed, 81 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index ceea9db341f0..7437c67924aa 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -197,6 +197,7 @@ int tegra_drm_submit(struct tegra_drm_context *context,
 	job->client = client;
 	job->class = client->class;
 	job->serialize = true;
+	job->syncpt_recovery = true;
 
 	/*
 	 * Track referenced BOs so that they can be unreferenced after the
diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
index 6e6ca774f68d..bd151c3a2a5f 100644
--- a/drivers/gpu/host1x/cdma.c
+++ b/drivers/gpu/host1x/cdma.c
@@ -312,10 +312,6 @@ static void update_cdma_locked(struct host1x_cdma *cdma)
 	bool signal = false;
 	struct host1x_job *job, *n;
 
-	/* If CDMA is stopped, queue is cleared and we can return */
-	if (!cdma->running)
-		return;
-
 	/*
 	 * Walk the sync queue, reading the sync point registers as necessary,
 	 * to consume as many sync queue entries as possible without blocking
@@ -324,7 +320,8 @@ static void update_cdma_locked(struct host1x_cdma *cdma)
 		struct host1x_syncpt *sp = job->syncpt;
 
 		/* Check whether this syncpt has completed, and bail if not */
-		if (!host1x_syncpt_is_expired(sp, job->syncpt_end)) {
+		if (!host1x_syncpt_is_expired(sp, job->syncpt_end) &&
+		    !job->cancelled) {
 			/* Start timer on next pending syncpt */
 			if (job->timeout)
 				cdma_start_timer_locked(cdma, job);
@@ -413,8 +410,11 @@ void host1x_cdma_update_sync_queue(struct host1x_cdma *cdma,
 	else
 		restart_addr = cdma->last_pos;
 
+	if (!job)
+		goto resume;
+
 	/* do CPU increments for the remaining syncpts */
-	if (job) {
+	if (job->syncpt_recovery) {
 		dev_dbg(dev, "%s: perform CPU incr on pending buffers\n",
 			__func__);
 
@@ -433,8 +433,44 @@ void host1x_cdma_update_sync_queue(struct host1x_cdma *cdma,
 
 		dev_dbg(dev, "%s: finished sync_queue modification\n",
 			__func__);
+	} else {
+		struct host1x_job *failed_job = job;
+
+		host1x_job_dump(dev, job);
+
+		host1x_syncpt_set_locked(job->syncpt);
+		failed_job->cancelled = true;
+
+		list_for_each_entry_continue(job, &cdma->sync_queue, list) {
+			unsigned int i;
+
+			if (job->syncpt != failed_job->syncpt)
+				continue;
+
+			for (i = 0; i < job->num_slots; i++) {
+				unsigned int slot = (job->first_get/8 + i) %
+						    HOST1X_PUSHBUFFER_SLOTS;
+				u32 *mapped = cdma->push_buffer.mapped;
+
+				/*
+				 * Overwrite opcodes with 0 word writes to
+				 * to offset 0xbad. This does nothing but
+				 * has a easily detected signature in debug
+				 * traces.
+				 */
+				mapped[2*slot+0] = 0x1bad0000;
+				mapped[2*slot+1] = 0x1bad0000;
+			}
+
+			job->cancelled = true;
+		}
+
+		wmb();
+
+		update_cdma_locked(cdma);
 	}
 
+resume:
 	/* roll back DMAGET and start up channel again */
 	host1x_hw_cdma_resume(host1x, cdma, restart_addr);
 }
@@ -490,6 +526,16 @@ int host1x_cdma_begin(struct host1x_cdma *cdma, struct host1x_job *job)
 
 	mutex_lock(&cdma->lock);
 
+	/*
+	 * Check if syncpoint was locked due to previous job timeout.
+	 * This needs to be done within the cdma lock to avoid a race
+	 * with the timeout handler.
+	 */
+	if (job->syncpt->locked) {
+		mutex_unlock(&cdma->lock);
+		return -EPERM;
+	}
+
 	if (job->timeout) {
 		/* init state on first submit with timeout value */
 		if (!cdma->timeout.initialized) {
diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
index d4c28faf27d1..bf21512e5078 100644
--- a/drivers/gpu/host1x/hw/channel_hw.c
+++ b/drivers/gpu/host1x/hw/channel_hw.c
@@ -191,7 +191,7 @@ static int channel_submit(struct host1x_job *job)
 	/* schedule a submit complete interrupt */
 	err = host1x_intr_add_action(host, sp, syncval,
 				     HOST1X_INTR_ACTION_SUBMIT_COMPLETE, ch,
-				     completed_waiter, NULL);
+				     completed_waiter, &job->waiter);
 	completed_waiter = NULL;
 	WARN(err, "Failed to set submit complete interrupt");
 
diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
index d8345d3bf0b3..e4f16fc899b0 100644
--- a/drivers/gpu/host1x/job.c
+++ b/drivers/gpu/host1x/job.c
@@ -79,6 +79,10 @@ static void job_free(struct kref *ref)
 {
 	struct host1x_job *job = container_of(ref, struct host1x_job, ref);
 
+	if (job->waiter)
+		host1x_intr_put_ref(job->syncpt->host, job->syncpt->id,
+				    job->waiter);
+
 	if (job->syncpt)
 		host1x_syncpt_put(job->syncpt);
 
diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index c7b910e413d8..8d658e5f7db2 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -385,6 +385,8 @@ static void syncpt_release(struct kref *ref)
 {
 	struct host1x_syncpt *sp = container_of(ref, struct host1x_syncpt, ref);
 
+	sp->locked = false;
+
 	mutex_lock(&sp->host->syncpt_mutex);
 
 	host1x_syncpt_base_free(sp->base);
diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h
index a6766f8d55ee..93e894677d89 100644
--- a/drivers/gpu/host1x/syncpt.h
+++ b/drivers/gpu/host1x/syncpt.h
@@ -40,6 +40,13 @@ struct host1x_syncpt {
 
 	/* interrupt data */
 	struct host1x_syncpt_intr intr;
+
+	/* 
+	 * If a submission incrementing this syncpoint fails, lock it so that
+	 * further submission cannot be made until application has handled the
+	 * failure.
+	 */
+	bool locked;
 };
 
 /* Initialize sync point array  */
@@ -115,4 +122,9 @@ static inline int host1x_syncpt_is_valid(struct host1x_syncpt *sp)
 	return sp->id < host1x_syncpt_nb_pts(sp->host);
 }
 
+static inline void host1x_syncpt_set_locked(struct host1x_syncpt *sp)
+{
+	sp->locked = true;
+}
+
 #endif
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index bcfb1cc9a1c1..fb62cc8b77dd 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -233,9 +233,15 @@ struct host1x_job {
 	u32 syncpt_incrs;
 	u32 syncpt_end;
 
+	/* Completion waiter ref */
+	void *waiter;
+
 	/* Maximum time to wait for this job */
 	unsigned int timeout;
 
+	/* Job has timed out and should be released */
+	bool cancelled;
+
 	/* Index and number of slots used in the push buffer */
 	unsigned int first_get;
 	unsigned int num_slots;
@@ -256,6 +262,9 @@ struct host1x_job {
 
 	/* Add a channel wait for previous ops to complete */
 	bool serialize;
+
+	/* Fast-forward syncpoint increments on job timeout */
+	bool syncpt_recovery;
 };
 
 struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
-- 
2.28.0


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v3 11/20] gpu: host1x: Add job release callback
  2020-10-07 17:12 [PATCH v3 00/20] Host1x/TegraDRM UAPI Mikko Perttunen
                   ` (9 preceding siblings ...)
  2020-10-07 17:12 ` [PATCH v3 10/20] gpu: host1x: Add no-recovery mode Mikko Perttunen
@ 2020-10-07 17:12 ` Mikko Perttunen
  2020-10-07 17:12 ` [PATCH v3 12/20] gpu: host1x: Add support for syncpoint waits in CDMA pushbuffer Mikko Perttunen
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Mikko Perttunen @ 2020-10-07 17:12 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Add a callback field to the job structure, to be called just before
the job is to be freed. This allows the job's submitter to clean
up any of its own state, like decrement runtime PM refcounts.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/job.c | 3 +++
 include/linux/host1x.h   | 4 ++++
 2 files changed, 7 insertions(+)

diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
index e4f16fc899b0..acf322beb56c 100644
--- a/drivers/gpu/host1x/job.c
+++ b/drivers/gpu/host1x/job.c
@@ -79,6 +79,9 @@ static void job_free(struct kref *ref)
 {
 	struct host1x_job *job = container_of(ref, struct host1x_job, ref);
 
+	if (job->release)
+		job->release(job);
+
 	if (job->waiter)
 		host1x_intr_put_ref(job->syncpt->host, job->syncpt->id,
 				    job->waiter);
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index fb62cc8b77dd..d7070fd65833 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -265,6 +265,10 @@ struct host1x_job {
 
 	/* Fast-forward syncpoint increments on job timeout */
 	bool syncpt_recovery;
+
+	/* Callback called when job is freed */
+	void (*release)(struct host1x_job *job);
+	void *user_data;
 };
 
 struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
-- 
2.28.0


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v3 12/20] gpu: host1x: Add support for syncpoint waits in CDMA pushbuffer
  2020-10-07 17:12 [PATCH v3 00/20] Host1x/TegraDRM UAPI Mikko Perttunen
                   ` (10 preceding siblings ...)
  2020-10-07 17:12 ` [PATCH v3 11/20] gpu: host1x: Add job release callback Mikko Perttunen
@ 2020-10-07 17:12 ` Mikko Perttunen
  2020-10-07 17:12 ` [PATCH v3 13/20] gpu: host1x: Reset max value when freeing a syncpoint Mikko Perttunen
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Mikko Perttunen @ 2020-10-07 17:12 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Add support for inserting syncpoint waits in the CDMA pushbuffer.
These waits need to be done in HOST1X class, while gather submitted
by the application execute in engine class.

Support is added by converting the gather list of job into a command
list that can include both gathers and waits. When the job is
submitted, these commands are pushed as the appropriate opcodes
on the CDMA pushbuffer.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/hw/channel_hw.c | 51 +++++++++++++++--------
 drivers/gpu/host1x/hw/debug_hw.c   |  9 +++-
 drivers/gpu/host1x/job.c           | 67 +++++++++++++++++++++---------
 drivers/gpu/host1x/job.h           | 14 +++++++
 include/linux/host1x.h             |  5 ++-
 5 files changed, 105 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
index bf21512e5078..d88a32f73f5e 100644
--- a/drivers/gpu/host1x/hw/channel_hw.c
+++ b/drivers/gpu/host1x/hw/channel_hw.c
@@ -55,31 +55,46 @@ static void submit_gathers(struct host1x_job *job)
 #endif
 	unsigned int i;
 
-	for (i = 0; i < job->num_gathers; i++) {
-		struct host1x_job_gather *g = &job->gathers[i];
-		dma_addr_t addr = g->base + g->offset;
-		u32 op2, op3;
+	for (i = 0; i < job->num_cmds; i++) {
+		struct host1x_job_cmd *cmd = &job->cmds[i];
 
-		op2 = lower_32_bits(addr);
-		op3 = upper_32_bits(addr);
+		if (cmd->is_wait) {
+			/* TODO use modern wait */
+			host1x_cdma_push(cdma,
+				 host1x_opcode_setclass(HOST1X_CLASS_HOST1X,
+					host1x_uclass_wait_syncpt_r(), 1),
+				 host1x_class_host_wait_syncpt(cmd->wait.id,
+					cmd->wait.threshold));
+			host1x_cdma_push(
+				cdma, host1x_opcode_setclass(job->class, 0, 0),
+				HOST1X_OPCODE_NOP);
+		} else {
+			struct host1x_job_gather *g = &cmd->gather;
 
-		trace_write_gather(cdma, g->bo, g->offset, g->words);
+			dma_addr_t addr = g->base + g->offset;
+			u32 op2, op3;
 
-		if (op3 != 0) {
+			op2 = lower_32_bits(addr);
+			op3 = upper_32_bits(addr);
+
+			trace_write_gather(cdma, g->bo, g->offset, g->words);
+
+			if (op3 != 0) {
 #if HOST1X_HW >= 6
-			u32 op1 = host1x_opcode_gather_wide(g->words);
-			u32 op4 = HOST1X_OPCODE_NOP;
+				u32 op1 = host1x_opcode_gather_wide(g->words);
+				u32 op4 = HOST1X_OPCODE_NOP;
 
-			host1x_cdma_push_wide(cdma, op1, op2, op3, op4);
+				host1x_cdma_push_wide(cdma, op1, op2, op3, op4);
 #else
-			dev_err(dev, "invalid gather for push buffer %pad\n",
-				&addr);
-			continue;
+				dev_err(dev, "invalid gather for push buffer %pad\n",
+					&addr);
+				continue;
 #endif
-		} else {
-			u32 op1 = host1x_opcode_gather(g->words);
+			} else {
+				u32 op1 = host1x_opcode_gather(g->words);
 
-			host1x_cdma_push(cdma, op1, op2);
+				host1x_cdma_push(cdma, op1, op2);
+			}
 		}
 	}
 }
@@ -126,7 +141,7 @@ static int channel_submit(struct host1x_job *job)
 	struct host1x *host = dev_get_drvdata(ch->dev->parent);
 
 	trace_host1x_channel_submit(dev_name(ch->dev),
-				    job->num_gathers, job->num_relocs,
+				    job->num_cmds, job->num_relocs,
 				    job->syncpt->id, job->syncpt_incrs);
 
 	/* before error checks, return current max */
diff --git a/drivers/gpu/host1x/hw/debug_hw.c b/drivers/gpu/host1x/hw/debug_hw.c
index ceb48229d14b..35952fd5597e 100644
--- a/drivers/gpu/host1x/hw/debug_hw.c
+++ b/drivers/gpu/host1x/hw/debug_hw.c
@@ -208,10 +208,15 @@ static void show_channel_gathers(struct output *o, struct host1x_cdma *cdma)
 				    job->first_get, job->timeout,
 				    job->num_slots, job->num_unpins);
 
-		for (i = 0; i < job->num_gathers; i++) {
-			struct host1x_job_gather *g = &job->gathers[i];
+		for (i = 0; i < job->num_cmds; i++) {
+			struct host1x_job_gather *g;
 			u32 *mapped;
 
+			if (job->cmds[i].is_wait)
+				continue;
+
+			g = &job->cmds[i].gather;
+
 			if (job->gather_copy_mapped)
 				mapped = (u32 *)job->gather_copy_mapped;
 			else
diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
index acf322beb56c..ac4091596811 100644
--- a/drivers/gpu/host1x/job.c
+++ b/drivers/gpu/host1x/job.c
@@ -38,7 +38,7 @@ struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
 	total = sizeof(struct host1x_job) +
 		(u64)num_relocs * sizeof(struct host1x_reloc) +
 		(u64)num_unpins * sizeof(struct host1x_job_unpin_data) +
-		(u64)num_cmdbufs * sizeof(struct host1x_job_gather) +
+		(u64)num_cmdbufs * sizeof(struct host1x_job_cmd) +
 		(u64)num_unpins * sizeof(dma_addr_t) +
 		(u64)num_unpins * sizeof(u32 *);
 	if (total > ULONG_MAX)
@@ -57,8 +57,8 @@ struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
 	mem += num_relocs * sizeof(struct host1x_reloc);
 	job->unpins = num_unpins ? mem : NULL;
 	mem += num_unpins * sizeof(struct host1x_job_unpin_data);
-	job->gathers = num_cmdbufs ? mem : NULL;
-	mem += num_cmdbufs * sizeof(struct host1x_job_gather);
+	job->cmds = num_cmdbufs ? mem : NULL;
+	mem += num_cmdbufs * sizeof(struct host1x_job_cmd);
 	job->addr_phys = num_unpins ? mem : NULL;
 
 	job->reloc_addr_phys = job->addr_phys;
@@ -101,22 +101,35 @@ EXPORT_SYMBOL(host1x_job_put);
 void host1x_job_add_gather(struct host1x_job *job, struct host1x_bo *bo,
 			   unsigned int words, unsigned int offset)
 {
-	struct host1x_job_gather *gather = &job->gathers[job->num_gathers];
+	struct host1x_job_gather *gather = &job->cmds[job->num_cmds].gather;
 
 	gather->words = words;
 	gather->bo = bo;
 	gather->offset = offset;
 
-	job->num_gathers++;
+	job->num_cmds++;
 }
 EXPORT_SYMBOL(host1x_job_add_gather);
 
+void host1x_job_add_wait(struct host1x_job *job, u32 id, u32 thresh)
+{
+	struct host1x_job_cmd *cmd = &job->cmds[job->num_cmds];
+
+	cmd->is_wait = true;
+	cmd->wait.id = id;
+	cmd->wait.threshold = thresh;
+
+	job->num_cmds++;
+}
+EXPORT_SYMBOL(host1x_job_add_wait);
+
 static unsigned int pin_job(struct host1x *host, struct host1x_job *job)
 {
 	struct host1x_client *client = job->client;
 	struct device *dev = client->dev;
 	struct host1x_job_gather *g;
 	struct iommu_domain *domain;
+	struct sg_table *sgt;
 	unsigned int i;
 	int err;
 
@@ -126,7 +139,6 @@ static unsigned int pin_job(struct host1x *host, struct host1x_job *job)
 	for (i = 0; i < job->num_relocs; i++) {
 		struct host1x_reloc *reloc = &job->relocs[i];
 		dma_addr_t phys_addr, *phys;
-		struct sg_table *sgt;
 
 		reloc->target.bo = host1x_bo_get(reloc->target.bo);
 		if (!reloc->target.bo) {
@@ -204,17 +216,20 @@ static unsigned int pin_job(struct host1x *host, struct host1x_job *job)
 	if (IS_ENABLED(CONFIG_TEGRA_HOST1X_FIREWALL))
 		return 0;
 
-	for (i = 0; i < job->num_gathers; i++) {
+	for (i = 0; i < job->num_cmds; i++) {
 		size_t gather_size = 0;
 		struct scatterlist *sg;
-		struct sg_table *sgt;
 		dma_addr_t phys_addr;
 		unsigned long shift;
 		struct iova *alloc;
 		dma_addr_t *phys;
 		unsigned int j;
 
-		g = &job->gathers[i];
+		if (job->cmds[i].is_wait)
+			continue;
+
+		g = &job->cmds[i].gather;
+
 		g->bo = host1x_bo_get(g->bo);
 		if (!g->bo) {
 			err = -EINVAL;
@@ -550,8 +565,13 @@ static inline int copy_gathers(struct device *host, struct host1x_job *job,
 	fw.num_relocs = job->num_relocs;
 	fw.class = job->class;
 
-	for (i = 0; i < job->num_gathers; i++) {
-		struct host1x_job_gather *g = &job->gathers[i];
+	for (i = 0; i < job->num_cmds; i++) {
+		struct host1x_job_gather *g;
+
+		if (job->cmds[i].is_wait)
+			continue;
+
+		g = &job->cmds[i].gather;
 
 		size += g->words * sizeof(u32);
 	}
@@ -573,10 +593,14 @@ static inline int copy_gathers(struct device *host, struct host1x_job *job,
 
 	job->gather_copy_size = size;
 
-	for (i = 0; i < job->num_gathers; i++) {
-		struct host1x_job_gather *g = &job->gathers[i];
+	for (i = 0; i < job->num_cmds; i++) {
+		struct host1x_job_gather *g;
 		void *gather;
 
+		if (job->cmds[i].is_wait)
+			continue;
+		g = &job->cmds[i].gather;
+
 		/* Copy the gather */
 		gather = host1x_bo_mmap(g->bo);
 		memcpy(job->gather_copy_mapped + offset, gather + g->offset,
@@ -619,8 +643,12 @@ int host1x_job_pin(struct host1x_job *job, struct device *dev)
 	}
 
 	/* patch gathers */
-	for (i = 0; i < job->num_gathers; i++) {
-		struct host1x_job_gather *g = &job->gathers[i];
+	for (i = 0; i < job->num_cmds; i++) {
+		struct host1x_job_gather *g;
+
+		if (job->cmds[i].is_wait)
+			continue;
+		g = &job->cmds[i].gather;
 
 		/* process each gather mem only once */
 		if (g->handled)
@@ -630,10 +658,11 @@ int host1x_job_pin(struct host1x_job *job, struct device *dev)
 		if (!IS_ENABLED(CONFIG_TEGRA_HOST1X_FIREWALL))
 			g->base = job->gather_addr_phys[i];
 
-		for (j = i + 1; j < job->num_gathers; j++) {
-			if (job->gathers[j].bo == g->bo) {
-				job->gathers[j].handled = true;
-				job->gathers[j].base = g->base;
+		for (j = i + 1; j < job->num_cmds; j++) {
+			if (!job->cmds[j].is_wait &&
+			    job->cmds[j].gather.bo == g->bo) {
+				job->cmds[j].gather.handled = true;
+				job->cmds[j].gather.base = g->base;
 			}
 		}
 
diff --git a/drivers/gpu/host1x/job.h b/drivers/gpu/host1x/job.h
index 94bc2e4ae241..33adfaede842 100644
--- a/drivers/gpu/host1x/job.h
+++ b/drivers/gpu/host1x/job.h
@@ -18,6 +18,20 @@ struct host1x_job_gather {
 	bool handled;
 };
 
+struct host1x_job_wait {
+	u32 id;
+	u32 threshold;
+};
+
+struct host1x_job_cmd {
+	bool is_wait;
+
+	union {
+		struct host1x_job_gather gather;
+		struct host1x_job_wait wait;
+	};
+};
+
 struct host1x_job_unpin_data {
 	struct host1x_bo *bo;
 	struct sg_table *sgt;
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index d7070fd65833..59296d3346fe 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -215,8 +215,8 @@ struct host1x_job {
 	struct host1x_client *client;
 
 	/* Gathers and their memory */
-	struct host1x_job_gather *gathers;
-	unsigned int num_gathers;
+	struct host1x_job_cmd *cmds;
+	unsigned int num_cmds;
 
 	/* Array of handles to be pinned & unpinned */
 	struct host1x_reloc *relocs;
@@ -275,6 +275,7 @@ struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
 				    u32 num_cmdbufs, u32 num_relocs);
 void host1x_job_add_gather(struct host1x_job *job, struct host1x_bo *bo,
 			   unsigned int words, unsigned int offset);
+void host1x_job_add_wait(struct host1x_job *job, u32 id, u32 thresh);
 struct host1x_job *host1x_job_get(struct host1x_job *job);
 void host1x_job_put(struct host1x_job *job);
 int host1x_job_pin(struct host1x_job *job, struct device *dev);
-- 
2.28.0


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v3 13/20] gpu: host1x: Reset max value when freeing a syncpoint
  2020-10-07 17:12 [PATCH v3 00/20] Host1x/TegraDRM UAPI Mikko Perttunen
                   ` (11 preceding siblings ...)
  2020-10-07 17:12 ` [PATCH v3 12/20] gpu: host1x: Add support for syncpoint waits in CDMA pushbuffer Mikko Perttunen
@ 2020-10-07 17:12 ` Mikko Perttunen
  2020-10-07 17:12 ` [PATCH v3 14/20] gpu: host1x: Reserve VBLANK syncpoints at initialization Mikko Perttunen
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Mikko Perttunen @ 2020-10-07 17:12 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

With job recovery becoming optional, syncpoints may have a mismatch
between their value and max value when freed. As such, when freeing,
set the max value to the current value of the syncpoint so that it
is in a sane state for the next user.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
v3:
* Use host1x_syncpt_read instead of read_min to ensure syncpoint
  value is current.
---
 drivers/gpu/host1x/syncpt.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index 8d658e5f7db2..99d31932eb34 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -385,6 +385,7 @@ static void syncpt_release(struct kref *ref)
 {
 	struct host1x_syncpt *sp = container_of(ref, struct host1x_syncpt, ref);
 
+	atomic_set(&sp->max_val, host1x_syncpt_read(sp));
 	sp->locked = false;
 
 	mutex_lock(&sp->host->syncpt_mutex);
-- 
2.28.0


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v3 14/20] gpu: host1x: Reserve VBLANK syncpoints at initialization
  2020-10-07 17:12 [PATCH v3 00/20] Host1x/TegraDRM UAPI Mikko Perttunen
                   ` (12 preceding siblings ...)
  2020-10-07 17:12 ` [PATCH v3 13/20] gpu: host1x: Reset max value when freeing a syncpoint Mikko Perttunen
@ 2020-10-07 17:12 ` Mikko Perttunen
  2020-10-07 17:12 ` [PATCH v3 15/20] drm/tegra: Add new UAPI to header Mikko Perttunen
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Mikko Perttunen @ 2020-10-07 17:12 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

On T20-T148 chips, the bootloader can set up a boot splash
screen with DC configured to increment syncpoint 26/27
at VBLANK. Because of this we shouldn't allow these syncpoints
to be allocated until DC has been reset and will no longer
increment them in the background.

As such, on these chips, reserve those two syncpoints at
initialization, and only mark them free once the DC
driver has indicated it's safe to do so.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
v3:
* New patch
---
 drivers/gpu/drm/tegra/dc.c  |  6 ++++++
 drivers/gpu/host1x/dev.c    |  6 ++++++
 drivers/gpu/host1x/dev.h    |  6 ++++++
 drivers/gpu/host1x/syncpt.c | 34 +++++++++++++++++++++++++++++++++-
 include/linux/host1x.h      |  3 +++
 5 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c
index efb41c10dad4..0b23e0922c25 100644
--- a/drivers/gpu/drm/tegra/dc.c
+++ b/drivers/gpu/drm/tegra/dc.c
@@ -2031,6 +2031,12 @@ static int tegra_dc_init(struct host1x_client *client)
 	struct drm_plane *cursor = NULL;
 	int err;
 
+	/*
+	 * DC has been reset by now, so VBLANK syncpoint can be released
+	 * for general use.
+	 */
+	host1x_syncpt_release_vblank_reservation(client, 26 + dc->pipe);
+
 	/*
 	 * XXX do not register DCs with no window groups because we cannot
 	 * assign a primary plane to them, which in turn will cause KMS to
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index 641317d23828..8b50fbb22846 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -77,6 +77,7 @@ static const struct host1x_info host1x01_info = {
 	.has_hypervisor = false,
 	.num_sid_entries = 0,
 	.sid_table = NULL,
+	.reserve_vblank_syncpts = true,
 };
 
 static const struct host1x_info host1x02_info = {
@@ -91,6 +92,7 @@ static const struct host1x_info host1x02_info = {
 	.has_hypervisor = false,
 	.num_sid_entries = 0,
 	.sid_table = NULL,
+	.reserve_vblank_syncpts = true,
 };
 
 static const struct host1x_info host1x04_info = {
@@ -105,6 +107,7 @@ static const struct host1x_info host1x04_info = {
 	.has_hypervisor = false,
 	.num_sid_entries = 0,
 	.sid_table = NULL,
+	.reserve_vblank_syncpts = false,
 };
 
 static const struct host1x_info host1x05_info = {
@@ -119,6 +122,7 @@ static const struct host1x_info host1x05_info = {
 	.has_hypervisor = false,
 	.num_sid_entries = 0,
 	.sid_table = NULL,
+	.reserve_vblank_syncpts = false,
 };
 
 static const struct host1x_sid_entry tegra186_sid_table[] = {
@@ -142,6 +146,7 @@ static const struct host1x_info host1x06_info = {
 	.has_hypervisor = true,
 	.num_sid_entries = ARRAY_SIZE(tegra186_sid_table),
 	.sid_table = tegra186_sid_table,
+	.reserve_vblank_syncpts = false,
 };
 
 static const struct host1x_sid_entry tegra194_sid_table[] = {
@@ -165,6 +170,7 @@ static const struct host1x_info host1x07_info = {
 	.has_hypervisor = true,
 	.num_sid_entries = ARRAY_SIZE(tegra194_sid_table),
 	.sid_table = tegra194_sid_table,
+	.reserve_vblank_syncpts = false,
 };
 
 static const struct of_device_id host1x_of_match[] = {
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
index 7b8b7e20e32b..e360bc4a25f6 100644
--- a/drivers/gpu/host1x/dev.h
+++ b/drivers/gpu/host1x/dev.h
@@ -102,6 +102,12 @@ struct host1x_info {
 	bool has_hypervisor; /* has hypervisor registers */
 	unsigned int num_sid_entries;
 	const struct host1x_sid_entry *sid_table;
+	/*
+	 * On T20-T148, the boot chain may setup DC to increment syncpoints
+	 * 26/27 on VBLANK. As such we cannot use these syncpoints until
+	 * the display driver disables VBLANK increments.
+	 */
+	bool reserve_vblank_syncpts;
 };
 
 struct host1x {
diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index 99d31932eb34..d0be7bdbc6c9 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -52,7 +52,7 @@ struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
 
 	mutex_lock(&host->syncpt_mutex);
 
-	for (i = 0; i < host->info->nb_pts && sp->name; i++, sp++)
+	for (i = 0; i < host->info->nb_pts && kref_read(&sp->ref); i++, sp++)
 		;
 
 	if (i >= host->info->nb_pts)
@@ -359,6 +359,11 @@ int host1x_syncpt_init(struct host1x *host)
 	if (!host->nop_sp)
 		return -ENOMEM;
 
+	if (host->info->reserve_vblank_syncpts) {
+		kref_init(&host->syncpt[26].ref);
+		kref_init(&host->syncpt[27].ref);
+	}
+
 	return 0;
 }
 
@@ -545,3 +550,30 @@ u32 host1x_syncpt_base_id(struct host1x_syncpt_base *base)
 	return base->id;
 }
 EXPORT_SYMBOL(host1x_syncpt_base_id);
+
+static void do_nothing(struct kref *ref)
+{
+}
+
+/**
+ * host1x_syncpt_release_vblank_reservation() - Make VBLANK syncpoint
+ *   available for allocation
+ *
+ * @client: host1x bus client
+ *
+ * Makes VBLANK<i> syncpoint available for allocatation if it was
+ * reserved at initialization time. This should be called by the display
+ * driver after it has ensured that any VBLANK increment programming configured
+ * by the boot chain has been disabled.
+ */
+void host1x_syncpt_release_vblank_reservation(struct host1x_client *client,
+					      u32 syncpt_id)
+{
+	struct host1x *host = dev_get_drvdata(client->host->parent);
+
+	if (!host->info->reserve_vblank_syncpts)
+		return;
+
+	kref_put(&host->syncpt[syncpt_id].ref, do_nothing);
+}
+EXPORT_SYMBOL(host1x_syncpt_release_vblank_reservation);
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index 59296d3346fe..ff2a23cd6bb0 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -163,6 +163,9 @@ struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
 struct host1x_syncpt_base *host1x_syncpt_get_base(struct host1x_syncpt *sp);
 u32 host1x_syncpt_base_id(struct host1x_syncpt_base *base);
 
+void host1x_syncpt_release_vblank_reservation(struct host1x_client *client,
+					      u32 syncpt_id);
+
 struct host1x_syncpt *host1x_syncpt_fd_get(int fd);
 
 struct dma_fence *host1x_fence_create(struct host1x_syncpt *sp, u32 threshold);
-- 
2.28.0


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v3 15/20] drm/tegra: Add new UAPI to header
  2020-10-07 17:12 [PATCH v3 00/20] Host1x/TegraDRM UAPI Mikko Perttunen
                   ` (13 preceding siblings ...)
  2020-10-07 17:12 ` [PATCH v3 14/20] gpu: host1x: Reserve VBLANK syncpoints at initialization Mikko Perttunen
@ 2020-10-07 17:12 ` Mikko Perttunen
  2020-10-07 17:12 ` [PATCH v3 16/20] drm/tegra: Boot VIC during runtime PM resume Mikko Perttunen
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Mikko Perttunen @ 2020-10-07 17:12 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Update the tegra_drm.h UAPI header, adding the new proposed UAPI.
The old staging UAPI is left in for now, with minor modification
to avoid name collisions.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
v3:
* Remove timeout field
* Inline the syncpt_incrs array to the submit structure
* Remove WRITE_RELOC (it is now implicit)
---
 include/uapi/drm/tegra_drm.h | 420 ++++++++++++++++++++++++++++++++---
 1 file changed, 393 insertions(+), 27 deletions(-)

diff --git a/include/uapi/drm/tegra_drm.h b/include/uapi/drm/tegra_drm.h
index c4df3c3668b3..9588d5e3308f 100644
--- a/include/uapi/drm/tegra_drm.h
+++ b/include/uapi/drm/tegra_drm.h
@@ -1,24 +1,5 @@
-/*
- * Copyright (c) 2012-2013, NVIDIA CORPORATION.  All rights reserved.
- *
- * Permission is hereby granted, free of charge, to any person obtaining a
- * copy of this software and associated documentation files (the "Software"),
- * to deal in the Software without restriction, including without limitation
- * the rights to use, copy, modify, merge, publish, distribute, sublicense,
- * and/or sell copies of the Software, and to permit persons to whom the
- * Software is furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
- * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
- * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
- * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
- * OTHER DEALINGS IN THE SOFTWARE.
- */
+/* SPDX-License-Identifier: MIT */
+/* Copyright (c) 2012-2020 NVIDIA Corporation */
 
 #ifndef _UAPI_TEGRA_DRM_H_
 #define _UAPI_TEGRA_DRM_H_
@@ -29,6 +10,8 @@
 extern "C" {
 #endif
 
+/* TegraDRM legacy UAPI. Only enabled with STAGING */
+
 #define DRM_TEGRA_GEM_CREATE_TILED     (1 << 0)
 #define DRM_TEGRA_GEM_CREATE_BOTTOM_UP (1 << 1)
 
@@ -644,13 +627,13 @@ struct drm_tegra_gem_get_flags {
 	__u32 flags;
 };
 
-#define DRM_TEGRA_GEM_CREATE		0x00
-#define DRM_TEGRA_GEM_MMAP		0x01
+#define DRM_TEGRA_GEM_CREATE_LEGACY	0x00
+#define DRM_TEGRA_GEM_MMAP_LEGACY	0x01
 #define DRM_TEGRA_SYNCPT_READ		0x02
 #define DRM_TEGRA_SYNCPT_INCR		0x03
 #define DRM_TEGRA_SYNCPT_WAIT		0x04
-#define DRM_TEGRA_OPEN_CHANNEL		0x05
-#define DRM_TEGRA_CLOSE_CHANNEL		0x06
+#define DRM_TEGRA_OPEN_CHANNEL	        0x05
+#define DRM_TEGRA_CLOSE_CHANNEL	        0x06
 #define DRM_TEGRA_GET_SYNCPT		0x07
 #define DRM_TEGRA_SUBMIT		0x08
 #define DRM_TEGRA_GET_SYNCPT_BASE	0x09
@@ -659,8 +642,8 @@ struct drm_tegra_gem_get_flags {
 #define DRM_TEGRA_GEM_SET_FLAGS		0x0c
 #define DRM_TEGRA_GEM_GET_FLAGS		0x0d
 
-#define DRM_IOCTL_TEGRA_GEM_CREATE DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GEM_CREATE, struct drm_tegra_gem_create)
-#define DRM_IOCTL_TEGRA_GEM_MMAP DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GEM_MMAP, struct drm_tegra_gem_mmap)
+#define DRM_IOCTL_TEGRA_GEM_CREATE_LEGACY DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GEM_CREATE_LEGACY, struct drm_tegra_gem_create)
+#define DRM_IOCTL_TEGRA_GEM_MMAP_LEGACY DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GEM_MMAP_LEGACY, struct drm_tegra_gem_mmap)
 #define DRM_IOCTL_TEGRA_SYNCPT_READ DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_SYNCPT_READ, struct drm_tegra_syncpt_read)
 #define DRM_IOCTL_TEGRA_SYNCPT_INCR DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_SYNCPT_INCR, struct drm_tegra_syncpt_incr)
 #define DRM_IOCTL_TEGRA_SYNCPT_WAIT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_SYNCPT_WAIT, struct drm_tegra_syncpt_wait)
@@ -674,6 +657,389 @@ struct drm_tegra_gem_get_flags {
 #define DRM_IOCTL_TEGRA_GEM_SET_FLAGS DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GEM_SET_FLAGS, struct drm_tegra_gem_set_flags)
 #define DRM_IOCTL_TEGRA_GEM_GET_FLAGS DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GEM_GET_FLAGS, struct drm_tegra_gem_get_flags)
 
+/* New TegraDRM UAPI */
+
+struct drm_tegra_channel_open {
+	/**
+	 * @host1x_class: [in]
+	 *
+	 * Host1x class of the engine that will be programmed using this
+	 * channel.
+	 */
+	__u32 host1x_class;
+
+	/**
+	 * @flags: [in]
+	 *
+	 * Flags.
+	 */
+	__u32 flags;
+
+	/**
+	 * @channel_ctx: [out]
+	 *
+	 * Opaque identifier corresponding to the opened channel.
+	 */
+	__u32 channel_ctx;
+
+	/**
+	 * @hardware_version: [out]
+	 *
+	 * Version of the engine hardware. This can be used by userspace
+	 * to determine how the engine needs to be programmed.
+	 */
+	__u32 hardware_version;
+
+	__u32 reserved[2];
+};
+
+struct drm_tegra_channel_close {
+	/**
+	 * @channel_ctx: [in]
+	 *
+	 * Identifier of the channel to close.
+	 */
+	__u32 channel_ctx;
+
+	__u32 reserved[1];
+};
+
+#define DRM_TEGRA_CHANNEL_MAP_READWRITE			(1<<0)
+
+struct drm_tegra_channel_map {
+	/**
+	 * @channel_ctx: [in]
+	 *
+	 * Identifier of the channel to which make memory available for.
+	 */
+	__u32 channel_ctx;
+
+	/**
+	 * @handle: [in]
+	 *
+	 * GEM handle of the memory to map.
+	 */
+	__u32 handle;
+
+	/**
+	 * @offset: [in]
+	 *
+	 * Offset in the GEM handle's underlying memory to start the
+	 * mapping from.
+	 */
+	__u64 offset;
+
+	/**
+	 * @length: [in]
+	 *
+	 * Length of memory to map.
+	 */
+	__u64 length;
+
+	/**
+	 * @iova: [out]
+	 *
+	 * IOVA of mapped memory. Only available if hardware memory
+	 * isolation is supported. If provided, userspace can program this
+	 * address directly to the engine to skip using relocations.
+	 *
+	 * Will be set to U64_MAX if unavailable.
+	 */
+	__u64 iova;
+
+	/**
+	 * @mapping_id: [out]
+	 *
+	 * Identifier corresponding to the mapping, to be used for
+	 * relocations or unmapping later.
+	 */
+	__u32 mapping_id;
+
+	/**
+	 * @flags: [in]
+	 *
+	 * Flags.
+	 */
+	__u32 flags;
+
+	__u32 reserved[2];
+};
+
+struct drm_tegra_channel_unmap {
+	/**
+	 * @channel_ctx: [in]
+	 *
+	 * Channel identifier of the channel to unmap memory from.
+	 */
+	__u32 channel_ctx;
+
+	/**
+	 * @mapping_id: [in]
+	 *
+	 * Mapping identifier of the memory mapping to unmap.
+	 */
+	__u32 mapping_id;
+
+	__u32 reserved[2];
+};
+
+/* Submission */
+
+/**
+ * Specify that bit 39 of the patched-in address should be set to
+ * trigger layout swizzling between Tegra and non-Tegra Blocklinear
+ * layout on systems that store surfaces in system memory in non-Tegra
+ * Blocklinear layout.
+ */
+#define DRM_TEGRA_SUBMIT_BUF_RELOC_BLOCKLINEAR		(1<<0)
+/**
+ * Specify that any implicit fences required to read this buffer
+ * should be waited before executing the job.
+ */
+#define DRM_TEGRA_SUBMIT_BUF_RESV_READ			(1<<1)
+/**
+ * Specify that any implicit fences required to write this buffer
+ * should be waited before executing the job.
+ */
+#define DRM_TEGRA_SUBMIT_BUF_RESV_WRITE			(1<<2)
+
+struct drm_tegra_submit_buf {
+	/**
+	 * @mapping_id: [in]
+	 *
+	 * Identifier of the mapping to use in the submission.
+	 */
+	__u32 mapping_id;
+
+	/**
+	 * @flags: [in]
+	 *
+	 * Flags.
+	 */
+	__u32 flags;
+
+	/**
+	 * Information for relocation patching. Relocation patching will
+	 * be done if the MAP IOCTL that created `mapping_id` did not
+	 * return an IOVA. If an IOVA was returned, the application is
+	 * responsible for patching the address into the gather.
+	 */
+	struct {
+		/**
+		 * @target_offset: [in]
+		 *
+		 * Offset from the start of the mapping of the data whose
+		 * address is to be patched into the gather.
+		 */
+		__u64 target_offset;
+
+		/**
+		 * @gather_offset_words: [in]
+		 *
+		 * Offset in words from the start of the gather data to
+		 * where the address should be patched into.
+		 */
+		__u32 gather_offset_words;
+
+		/**
+		 * @shift: [in]
+		 *
+		 * Number of bits the address should be shifted right before
+		 * patching in.
+		 */
+		__u32 shift;
+	} reloc;
+
+	__u32 reserved[2];
+};
+
+#define DRM_TEGRA_SUBMIT_SYNCPT_INCR_CREATE_SYNC_FILE  (1<<0)
+
+struct drm_tegra_submit_syncpt_incr {
+	/**
+	 * @syncpt_fd: [in]
+	 *
+	 * Syncpoint file descriptor of the syncpoint that the job will
+	 * increment.
+	 */
+	__s32 syncpt_fd;
+
+	/**
+	 * @flags: [in]
+	 *
+	 * Flags.
+	 */
+	__u32 flags;
+
+	/**
+	 * @num_incrs: [in]
+	 *
+	 * Number of times the job will increment this syncpoint.
+	 */
+	__u32 num_incrs;
+
+	/**
+	 * @fence_value: [out]
+	 *
+	 * Value the syncpoint will have once the job has completed all
+	 * its specified syncpoint increments.
+	 *
+	 * Note that the kernel may increment the syncpoint before or after
+	 * the job. These increments are not reflected in this field.
+	 *
+	 * If the job hangs or times out, not all of the increments may
+	 * get executed.
+	 */
+	__u32 fence_value;
+
+	/**
+	 * @sync_file_fd: [out]
+	 *
+	 * Created sync_file file descriptor corresponding to the threshold
+	 * specified by `fence_value`. Only set if the CREATE_SYNC_FILE
+	 * flag is specified.
+	 */
+	__s32 sync_file_fd;
+
+	__u32 reserved[3];
+};
+
+/**
+ * Execute `words` words of Host1x opcodes specified in the `gather_data_ptr`
+ * buffer. Each GATHER_UPTR command uses successive words from the buffer.
+ */
+#define DRM_TEGRA_SUBMIT_CMD_GATHER_UPTR		0
+/**
+ * Wait for a syncpoint to reach a value before continuing with further
+ * commands.
+ */
+#define DRM_TEGRA_SUBMIT_CMD_WAIT_SYNCPT		1
+/**
+ * Wait for the fence represented by the sync_file file descriptor to be
+ * signaled before continuing with further commands. This command may be
+ * executed before submission of the job to hardware.
+ */
+#define DRM_TEGRA_SUBMIT_CMD_WAIT_SYNC_FILE		2
+
+/**
+ * If set, the driver is allowed to skip execution of this command if
+ * the previous job executed by the engine was from the same channel
+ * context as this job.
+ */
+#define DRM_TEGRA_SUBMIT_CONTEXT_SETUP			(1<<0)
+
+struct drm_tegra_submit_cmd_gather_uptr {
+	__u32 words;
+	__u32 reserved[3];
+};
+
+struct drm_tegra_submit_cmd_wait_syncpt {
+	__u32 id;
+	__u32 threshold;
+	__u32 reserved[2];
+};
+
+struct drm_tegra_submit_cmd_wait_sync_file {
+	__s32 fd;
+	__u32 reserved[3];
+};
+
+struct drm_tegra_submit_cmd {
+	/**
+	 * @type: [in]
+	 *
+	 * Command type to execute. One of the DRM_TEGRA_SUBMIT_CMD*
+	 * defines.
+	 */
+	__u32 type;
+
+	/**
+	 * @flags: [in]
+	 *
+	 * Flags.
+	 */
+	__u32 flags;
+
+	union {
+		struct drm_tegra_submit_cmd_gather_uptr gather_uptr;
+		struct drm_tegra_submit_cmd_wait_syncpt wait_syncpt;
+		struct drm_tegra_submit_cmd_wait_sync_file wait_sync_file;
+		__u32 reserved[4];
+	};
+};
+
+struct drm_tegra_channel_submit {
+	/**
+	 * @channel_ctx: [in]
+	 *
+	 * Identifier of the channel to submit this job to.
+	 */
+	__u32 channel_ctx;
+
+	__u32 reserved0;
+
+	/**
+	 * @bufs_ptr: [in]
+	 *
+	 * Pointer to an array of drm_tegra_submit_buf structures.
+	 */
+	__u64 bufs_ptr;
+
+	/**
+	 * @cmds_ptr: [in]
+	 *
+	 * Pointer to an array of drm_tegra_submit_cmd structures.
+	 */
+	__u64 cmds_ptr;
+
+	/**
+	 * @gather_data_ptr: [in]
+	 *
+	 * Pointer to an array of Host1x opcodes to be used by GATHER_UPTR
+	 * commands.
+	 */
+	__u64 gather_data_ptr;
+
+	/**
+	 * @num_bufs: [in]
+	 *
+	 * Number of elements in the `bufs_ptr` array.
+	 */
+	__u32 num_bufs;
+
+	/**
+	 * @num_cmds: [in]
+	 *
+	 * Number of elements in the `cmds_ptr` array.
+	 */
+	__u32 num_cmds;
+
+	/**
+	 * @gather_data_words: [in]
+	 *
+	 * Number of 32-bit words in the `gather_data_ptr` array.
+	 */
+	__u32 gather_data_words;
+
+	__u32 reserved1;
+
+	/**
+	 * @syncpt_incrs: [in,out]
+	 *
+	 * Information about each distinct syncpoint the job will increment.
+	 */
+	struct drm_tegra_submit_syncpt_incr syncpt_incrs[2];
+};
+
+#define DRM_IOCTL_TEGRA_CHANNEL_OPEN     DRM_IOWR(DRM_COMMAND_BASE + 0x10, struct drm_tegra_channel_open)
+#define DRM_IOCTL_TEGRA_CHANNEL_CLOSE    DRM_IOWR(DRM_COMMAND_BASE + 0x11, struct drm_tegra_channel_close)
+#define DRM_IOCTL_TEGRA_CHANNEL_MAP      DRM_IOWR(DRM_COMMAND_BASE + 0x12, struct drm_tegra_channel_map)
+#define DRM_IOCTL_TEGRA_CHANNEL_UNMAP    DRM_IOWR(DRM_COMMAND_BASE + 0x13, struct drm_tegra_channel_unmap)
+#define DRM_IOCTL_TEGRA_CHANNEL_SUBMIT   DRM_IOWR(DRM_COMMAND_BASE + 0x14, struct drm_tegra_channel_submit)
+
+#define DRM_IOCTL_TEGRA_GEM_CREATE       DRM_IOWR(DRM_COMMAND_BASE + 0x15, struct drm_tegra_gem_create)
+#define DRM_IOCTL_TEGRA_GEM_MMAP         DRM_IOWR(DRM_COMMAND_BASE + 0x16, struct drm_tegra_gem_mmap)
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.28.0


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v3 16/20] drm/tegra: Boot VIC during runtime PM resume
  2020-10-07 17:12 [PATCH v3 00/20] Host1x/TegraDRM UAPI Mikko Perttunen
                   ` (14 preceding siblings ...)
  2020-10-07 17:12 ` [PATCH v3 15/20] drm/tegra: Add new UAPI to header Mikko Perttunen
@ 2020-10-07 17:12 ` Mikko Perttunen
  2020-10-07 17:12 ` [PATCH v3 17/20] drm/tegra: Set resv fields when importing/exporting GEMs Mikko Perttunen
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Mikko Perttunen @ 2020-10-07 17:12 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

With the new UAPI implementation, engines are powered on and off
when there are active jobs, and the core code handles channel
allocation. To accommodate that, boot the engine as part of
runtime PM instead of using the open_channel callback, which is
not used by the new submit path.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
v3:
* runtime_get/put is now done directly from submit path, so no
  callbacks are added
* Reworded.
---
 drivers/gpu/drm/tegra/vic.c | 114 +++++++++++++++++-------------------
 1 file changed, 53 insertions(+), 61 deletions(-)

diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c
index cb476da59adc..5d2ad125dca3 100644
--- a/drivers/gpu/drm/tegra/vic.c
+++ b/drivers/gpu/drm/tegra/vic.c
@@ -29,7 +29,6 @@ struct vic_config {
 
 struct vic {
 	struct falcon falcon;
-	bool booted;
 
 	void __iomem *regs;
 	struct tegra_drm_client client;
@@ -52,48 +51,6 @@ static void vic_writel(struct vic *vic, u32 value, unsigned int offset)
 	writel(value, vic->regs + offset);
 }
 
-static int vic_runtime_resume(struct device *dev)
-{
-	struct vic *vic = dev_get_drvdata(dev);
-	int err;
-
-	err = clk_prepare_enable(vic->clk);
-	if (err < 0)
-		return err;
-
-	usleep_range(10, 20);
-
-	err = reset_control_deassert(vic->rst);
-	if (err < 0)
-		goto disable;
-
-	usleep_range(10, 20);
-
-	return 0;
-
-disable:
-	clk_disable_unprepare(vic->clk);
-	return err;
-}
-
-static int vic_runtime_suspend(struct device *dev)
-{
-	struct vic *vic = dev_get_drvdata(dev);
-	int err;
-
-	err = reset_control_assert(vic->rst);
-	if (err < 0)
-		return err;
-
-	usleep_range(2000, 4000);
-
-	clk_disable_unprepare(vic->clk);
-
-	vic->booted = false;
-
-	return 0;
-}
-
 static int vic_boot(struct vic *vic)
 {
 #ifdef CONFIG_IOMMU_API
@@ -103,9 +60,6 @@ static int vic_boot(struct vic *vic)
 	void *hdr;
 	int err = 0;
 
-	if (vic->booted)
-		return 0;
-
 #ifdef CONFIG_IOMMU_API
 	if (vic->config->supports_sid && spec) {
 		u32 value;
@@ -153,8 +107,6 @@ static int vic_boot(struct vic *vic)
 		return err;
 	}
 
-	vic->booted = true;
-
 	return 0;
 }
 
@@ -308,35 +260,76 @@ static int vic_load_firmware(struct vic *vic)
 	return err;
 }
 
-static int vic_open_channel(struct tegra_drm_client *client,
-			    struct tegra_drm_context *context)
+
+static int vic_runtime_resume(struct device *dev)
 {
-	struct vic *vic = to_vic(client);
+	struct vic *vic = dev_get_drvdata(dev);
 	int err;
 
-	err = pm_runtime_get_sync(vic->dev);
+	err = clk_prepare_enable(vic->clk);
 	if (err < 0)
 		return err;
 
+	usleep_range(10, 20);
+
+	err = reset_control_deassert(vic->rst);
+	if (err < 0)
+		goto disable;
+
+	usleep_range(10, 20);
+
 	err = vic_load_firmware(vic);
 	if (err < 0)
-		goto rpm_put;
+		goto assert;
 
 	err = vic_boot(vic);
 	if (err < 0)
-		goto rpm_put;
+		goto assert;
+
+	return 0;
+
+assert:
+	reset_control_assert(vic->rst);
+disable:
+	clk_disable_unprepare(vic->clk);
+	return err;
+}
+
+static int vic_runtime_suspend(struct device *dev)
+{
+	struct vic *vic = dev_get_drvdata(dev);
+	int err;
+
+	err = reset_control_assert(vic->rst);
+	if (err < 0)
+		return err;
+
+	usleep_range(2000, 4000);
+
+	clk_disable_unprepare(vic->clk);
+
+	return 0;
+}
+
+static int vic_open_channel(struct tegra_drm_client *client,
+			    struct tegra_drm_context *context)
+{
+	struct vic *vic = to_vic(client);
+	int err;
+
+	err = pm_runtime_get_sync(vic->dev);
+	if (err < 0) {
+		pm_runtime_put(vic->dev);
+		return err;
+	}
 
 	context->channel = host1x_channel_get(vic->channel);
 	if (!context->channel) {
-		err = -ENOMEM;
-		goto rpm_put;
+		pm_runtime_put(vic->dev);
+		return -ENOMEM;
 	}
 
 	return 0;
-
-rpm_put:
-	pm_runtime_put(vic->dev);
-	return err;
 }
 
 static void vic_close_channel(struct tegra_drm_context *context)
@@ -344,7 +337,6 @@ static void vic_close_channel(struct tegra_drm_context *context)
 	struct vic *vic = to_vic(context->client);
 
 	host1x_channel_put(context->channel);
-
 	pm_runtime_put(vic->dev);
 }
 
-- 
2.28.0


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v3 17/20] drm/tegra: Set resv fields when importing/exporting GEMs
  2020-10-07 17:12 [PATCH v3 00/20] Host1x/TegraDRM UAPI Mikko Perttunen
                   ` (15 preceding siblings ...)
  2020-10-07 17:12 ` [PATCH v3 16/20] drm/tegra: Boot VIC during runtime PM resume Mikko Perttunen
@ 2020-10-07 17:12 ` Mikko Perttunen
  2020-10-07 17:12 ` [PATCH v3 18/20] drm/tegra: Allocate per-engine channel in core code Mikko Perttunen
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Mikko Perttunen @ 2020-10-07 17:12 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

To allow sharing of implicit fences when exporting/importing dma_buf
objects, set the 'resv' fields when importing or exporting GEM
objects.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/drm/tegra/gem.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/tegra/gem.c b/drivers/gpu/drm/tegra/gem.c
index 723df142a981..4a8acd4724bd 100644
--- a/drivers/gpu/drm/tegra/gem.c
+++ b/drivers/gpu/drm/tegra/gem.c
@@ -423,6 +423,7 @@ static struct tegra_bo *tegra_bo_import(struct drm_device *drm,
 	}
 
 	bo->gem.import_attach = attach;
+	bo->gem.resv = buf->resv;
 
 	return bo;
 
@@ -675,6 +676,7 @@ struct dma_buf *tegra_gem_prime_export(struct drm_gem_object *gem,
 	exp_info.size = gem->size;
 	exp_info.flags = flags;
 	exp_info.priv = gem;
+	exp_info.resv = gem->resv;
 
 	return drm_gem_dmabuf_export(gem->dev, &exp_info);
 }
-- 
2.28.0


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v3 18/20] drm/tegra: Allocate per-engine channel in core code
  2020-10-07 17:12 [PATCH v3 00/20] Host1x/TegraDRM UAPI Mikko Perttunen
                   ` (16 preceding siblings ...)
  2020-10-07 17:12 ` [PATCH v3 17/20] drm/tegra: Set resv fields when importing/exporting GEMs Mikko Perttunen
@ 2020-10-07 17:12 ` Mikko Perttunen
  2020-10-07 17:12 ` [PATCH v3 19/20] drm/tegra: Implement new UAPI Mikko Perttunen
  2020-10-07 17:12 ` [PATCH v3 20/20] drm/tegra: Add job firewall Mikko Perttunen
  19 siblings, 0 replies; 35+ messages in thread
From: Mikko Perttunen @ 2020-10-07 17:12 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

To avoid duplication, allocate the per-engine shared channel in the
core code instead. Once MLOCKs are implemented on Host1x side, we
can also update this to avoid allocating a shared channel when
MLOCKs are enabled.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/drm/tegra/drm.c | 11 +++++++++++
 drivers/gpu/drm/tegra/drm.h |  4 ++++
 2 files changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 7437c67924aa..7124b0b0154b 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -887,6 +887,14 @@ static struct drm_driver tegra_drm_driver = {
 int tegra_drm_register_client(struct tegra_drm *tegra,
 			      struct tegra_drm_client *client)
 {
+	/*
+	 * When MLOCKs are implemented, change to allocate a shared channel
+	 * only when MLOCKs are disabled.
+	 */
+	client->shared_channel = host1x_channel_request(&client->base);
+	if (!client->shared_channel)
+		return -EBUSY;
+
 	mutex_lock(&tegra->clients_lock);
 	list_add_tail(&client->list, &tegra->clients);
 	client->drm = tegra;
@@ -903,6 +911,9 @@ int tegra_drm_unregister_client(struct tegra_drm *tegra,
 	client->drm = NULL;
 	mutex_unlock(&tegra->clients_lock);
 
+	if (client->shared_channel)
+		host1x_channel_put(client->shared_channel);
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
index b25443255be6..3fc42fd97911 100644
--- a/drivers/gpu/drm/tegra/drm.h
+++ b/drivers/gpu/drm/tegra/drm.h
@@ -86,8 +86,12 @@ struct tegra_drm_client {
 	struct list_head list;
 	struct tegra_drm *drm;
 
+	/* Set by driver */
 	unsigned int version;
 	const struct tegra_drm_client_ops *ops;
+
+	/* Set by TegraDRM core */
+	struct host1x_channel *shared_channel;
 };
 
 static inline struct tegra_drm_client *
-- 
2.28.0


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v3 19/20] drm/tegra: Implement new UAPI
  2020-10-07 17:12 [PATCH v3 00/20] Host1x/TegraDRM UAPI Mikko Perttunen
                   ` (17 preceding siblings ...)
  2020-10-07 17:12 ` [PATCH v3 18/20] drm/tegra: Allocate per-engine channel in core code Mikko Perttunen
@ 2020-10-07 17:12 ` Mikko Perttunen
  2020-10-08  3:42   ` kernel test robot
  2020-10-19  2:21   ` Dmitry Osipenko
  2020-10-07 17:12 ` [PATCH v3 20/20] drm/tegra: Add job firewall Mikko Perttunen
  19 siblings, 2 replies; 35+ messages in thread
From: Mikko Perttunen @ 2020-10-07 17:12 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Implement the new UAPI, and bump the TegraDRM major version.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
v3:
* Remove WRITE_RELOC. Relocations are now patched implicitly
  when patching is needed.
* Directly call PM runtime APIs on devices instead of using
  power_on/power_off callbacks.
* Remove incorrect mutex unlock in tegra_drm_ioctl_channel_open
* Use XA_FLAGS_ALLOC1 instead of XA_FLAGS_ALLOC
* Accommodate for removal of timeout field and inlining of
  syncpt_incrs array.
* Copy entire user arrays at a time instead of going through
  elements one-by-one.
* Implement waiting of DMA reservations.
* Split out gather_bo implementation into a separate file.
* Fix length parameter passed to sg_init_one in gather_bo
* Cosmetic cleanup.
---
 drivers/gpu/drm/tegra/Makefile         |   3 +
 drivers/gpu/drm/tegra/drm.c            |  46 +-
 drivers/gpu/drm/tegra/drm.h            |   5 +
 drivers/gpu/drm/tegra/uapi.h           |  63 +++
 drivers/gpu/drm/tegra/uapi/gather_bo.c |  86 ++++
 drivers/gpu/drm/tegra/uapi/gather_bo.h |  22 +
 drivers/gpu/drm/tegra/uapi/submit.c    | 675 +++++++++++++++++++++++++
 drivers/gpu/drm/tegra/uapi/submit.h    |  17 +
 drivers/gpu/drm/tegra/uapi/uapi.c      | 326 ++++++++++++
 9 files changed, 1225 insertions(+), 18 deletions(-)
 create mode 100644 drivers/gpu/drm/tegra/uapi.h
 create mode 100644 drivers/gpu/drm/tegra/uapi/gather_bo.c
 create mode 100644 drivers/gpu/drm/tegra/uapi/gather_bo.h
 create mode 100644 drivers/gpu/drm/tegra/uapi/submit.c
 create mode 100644 drivers/gpu/drm/tegra/uapi/submit.h
 create mode 100644 drivers/gpu/drm/tegra/uapi/uapi.c

diff --git a/drivers/gpu/drm/tegra/Makefile b/drivers/gpu/drm/tegra/Makefile
index d6cf202414f0..059322e88943 100644
--- a/drivers/gpu/drm/tegra/Makefile
+++ b/drivers/gpu/drm/tegra/Makefile
@@ -3,6 +3,9 @@ ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG
 
 tegra-drm-y := \
 	drm.o \
+	uapi/uapi.o \
+	uapi/submit.o \
+	uapi/gather_bo.o \
 	gem.o \
 	fb.o \
 	dp.o \
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 7124b0b0154b..88226dd0fd88 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -20,24 +20,20 @@
 #include <drm/drm_prime.h>
 #include <drm/drm_vblank.h>
 
+#include "uapi.h"
 #include "drm.h"
 #include "gem.h"
 
 #define DRIVER_NAME "tegra"
 #define DRIVER_DESC "NVIDIA Tegra graphics"
 #define DRIVER_DATE "20120330"
-#define DRIVER_MAJOR 0
+#define DRIVER_MAJOR 1
 #define DRIVER_MINOR 0
 #define DRIVER_PATCHLEVEL 0
 
 #define CARVEOUT_SZ SZ_64M
 #define CDMA_GATHER_FETCHES_MAX_NB 16383
 
-struct tegra_drm_file {
-	struct idr contexts;
-	struct mutex lock;
-};
-
 static int tegra_atomic_check(struct drm_device *drm,
 			      struct drm_atomic_state *state)
 {
@@ -90,7 +86,8 @@ static int tegra_drm_open(struct drm_device *drm, struct drm_file *filp)
 	if (!fpriv)
 		return -ENOMEM;
 
-	idr_init(&fpriv->contexts);
+	idr_init(&fpriv->legacy_contexts);
+	xa_init_flags(&fpriv->contexts, XA_FLAGS_ALLOC1);
 	mutex_init(&fpriv->lock);
 	filp->driver_priv = fpriv;
 
@@ -432,7 +429,7 @@ static int tegra_client_open(struct tegra_drm_file *fpriv,
 	if (err < 0)
 		return err;
 
-	err = idr_alloc(&fpriv->contexts, context, 1, 0, GFP_KERNEL);
+	err = idr_alloc(&fpriv->legacy_contexts, context, 1, 0, GFP_KERNEL);
 	if (err < 0) {
 		client->ops->close_channel(context);
 		return err;
@@ -487,13 +484,13 @@ static int tegra_close_channel(struct drm_device *drm, void *data,
 
 	mutex_lock(&fpriv->lock);
 
-	context = idr_find(&fpriv->contexts, args->context);
+	context = idr_find(&fpriv->legacy_contexts, args->context);
 	if (!context) {
 		err = -EINVAL;
 		goto unlock;
 	}
 
-	idr_remove(&fpriv->contexts, context->id);
+	idr_remove(&fpriv->legacy_contexts, context->id);
 	tegra_drm_context_free(context);
 
 unlock:
@@ -512,7 +509,7 @@ static int tegra_get_syncpt(struct drm_device *drm, void *data,
 
 	mutex_lock(&fpriv->lock);
 
-	context = idr_find(&fpriv->contexts, args->context);
+	context = idr_find(&fpriv->legacy_contexts, args->context);
 	if (!context) {
 		err = -ENODEV;
 		goto unlock;
@@ -541,7 +538,7 @@ static int tegra_submit(struct drm_device *drm, void *data,
 
 	mutex_lock(&fpriv->lock);
 
-	context = idr_find(&fpriv->contexts, args->context);
+	context = idr_find(&fpriv->legacy_contexts, args->context);
 	if (!context) {
 		err = -ENODEV;
 		goto unlock;
@@ -566,7 +563,7 @@ static int tegra_get_syncpt_base(struct drm_device *drm, void *data,
 
 	mutex_lock(&fpriv->lock);
 
-	context = idr_find(&fpriv->contexts, args->context);
+	context = idr_find(&fpriv->legacy_contexts, args->context);
 	if (!context) {
 		err = -ENODEV;
 		goto unlock;
@@ -734,11 +731,23 @@ static int tegra_gem_get_flags(struct drm_device *drm, void *data,
 #endif
 
 static const struct drm_ioctl_desc tegra_drm_ioctls[] = {
-#ifdef CONFIG_DRM_TEGRA_STAGING
-	DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE, tegra_gem_create,
+	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_OPEN, tegra_drm_ioctl_channel_open,
+			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_CLOSE, tegra_drm_ioctl_channel_close,
+			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_MAP, tegra_drm_ioctl_channel_map,
 			  DRM_RENDER_ALLOW),
-	DRM_IOCTL_DEF_DRV(TEGRA_GEM_MMAP, tegra_gem_mmap,
+	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_UNMAP, tegra_drm_ioctl_channel_unmap,
 			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_SUBMIT, tegra_drm_ioctl_channel_submit,
+			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE, tegra_drm_ioctl_gem_create,
+			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(TEGRA_GEM_MMAP, tegra_drm_ioctl_gem_mmap,
+			  DRM_RENDER_ALLOW),
+#ifdef CONFIG_DRM_TEGRA_STAGING
+	DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE_LEGACY, tegra_gem_create, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(TEGRA_GEM_MMAP_LEGACY, tegra_gem_mmap, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(TEGRA_SYNCPT_READ, tegra_syncpt_read,
 			  DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(TEGRA_SYNCPT_INCR, tegra_syncpt_incr,
@@ -792,10 +801,11 @@ static void tegra_drm_postclose(struct drm_device *drm, struct drm_file *file)
 	struct tegra_drm_file *fpriv = file->driver_priv;
 
 	mutex_lock(&fpriv->lock);
-	idr_for_each(&fpriv->contexts, tegra_drm_context_cleanup, NULL);
+	idr_for_each(&fpriv->legacy_contexts, tegra_drm_context_cleanup, NULL);
+	tegra_drm_uapi_close_file(fpriv);
 	mutex_unlock(&fpriv->lock);
 
-	idr_destroy(&fpriv->contexts);
+	idr_destroy(&fpriv->legacy_contexts);
 	mutex_destroy(&fpriv->lock);
 	kfree(fpriv);
 }
diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
index 3fc42fd97911..42f8a301555c 100644
--- a/drivers/gpu/drm/tegra/drm.h
+++ b/drivers/gpu/drm/tegra/drm.h
@@ -58,6 +58,11 @@ struct tegra_drm {
 	struct tegra_display_hub *hub;
 };
 
+static inline struct host1x *tegra_drm_to_host1x(struct tegra_drm *tegra)
+{
+	return dev_get_drvdata(tegra->drm->dev->parent);
+}
+
 struct tegra_drm_client;
 
 struct tegra_drm_context {
diff --git a/drivers/gpu/drm/tegra/uapi.h b/drivers/gpu/drm/tegra/uapi.h
new file mode 100644
index 000000000000..5c422607e8fa
--- /dev/null
+++ b/drivers/gpu/drm/tegra/uapi.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2020 NVIDIA Corporation */
+
+#ifndef _TEGRA_DRM_UAPI_H
+#define _TEGRA_DRM_UAPI_H
+
+#include <linux/dma-mapping.h>
+#include <linux/idr.h>
+#include <linux/kref.h>
+#include <linux/xarray.h>
+
+#include <drm/drm.h>
+
+struct drm_file;
+struct drm_device;
+
+struct tegra_drm_file {
+	/* Legacy UAPI state */
+	struct idr legacy_contexts;
+	struct mutex lock;
+
+	/* New UAPI state */
+	struct xarray contexts;
+};
+
+struct tegra_drm_channel_ctx {
+	struct tegra_drm_client *client;
+	struct host1x_channel *channel;
+	struct xarray mappings;
+};
+
+struct tegra_drm_mapping {
+	struct kref ref;
+
+	struct device *dev;
+	struct host1x_bo *bo;
+	struct sg_table *sgt;
+	enum dma_data_direction direction;
+	dma_addr_t iova;
+	dma_addr_t iova_end;
+};
+
+int tegra_drm_ioctl_channel_open(struct drm_device *drm, void *data,
+				 struct drm_file *file);
+int tegra_drm_ioctl_channel_close(struct drm_device *drm, void *data,
+				  struct drm_file *file);
+int tegra_drm_ioctl_channel_map(struct drm_device *drm, void *data,
+				struct drm_file *file);
+int tegra_drm_ioctl_channel_unmap(struct drm_device *drm, void *data,
+				struct drm_file *file);
+int tegra_drm_ioctl_channel_submit(struct drm_device *drm, void *data,
+				   struct drm_file *file);
+int tegra_drm_ioctl_gem_create(struct drm_device *drm, void *data,
+				struct drm_file *file);
+int tegra_drm_ioctl_gem_mmap(struct drm_device *drm, void *data,
+				struct drm_file *file);
+
+void tegra_drm_uapi_close_file(struct tegra_drm_file *file);
+void tegra_drm_mapping_put(struct tegra_drm_mapping *mapping);
+struct tegra_drm_channel_ctx *
+tegra_drm_channel_ctx_lock(struct tegra_drm_file *file, u32 id);
+
+#endif
diff --git a/drivers/gpu/drm/tegra/uapi/gather_bo.c b/drivers/gpu/drm/tegra/uapi/gather_bo.c
new file mode 100644
index 000000000000..b487a0d44648
--- /dev/null
+++ b/drivers/gpu/drm/tegra/uapi/gather_bo.c
@@ -0,0 +1,86 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2020 NVIDIA Corporation */
+
+#include <linux/scatterlist.h>
+#include <linux/slab.h>
+
+#include "gather_bo.h"
+
+static struct host1x_bo *gather_bo_get(struct host1x_bo *host_bo)
+{
+	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
+
+	kref_get(&bo->ref);
+
+	return host_bo;
+}
+
+static void gather_bo_release(struct kref *ref)
+{
+	struct gather_bo *bo = container_of(ref, struct gather_bo, ref);
+
+	kfree(bo->gather_data);
+	kfree(bo);
+}
+
+void gather_bo_put(struct host1x_bo *host_bo)
+{
+	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
+
+	kref_put(&bo->ref, gather_bo_release);
+}
+
+static struct sg_table *
+gather_bo_pin(struct device *dev, struct host1x_bo *host_bo, dma_addr_t *phys)
+{
+	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
+	struct sg_table *sgt;
+	int err;
+
+	if (phys) {
+		*phys = virt_to_phys(bo->gather_data);
+		return NULL;
+	}
+
+	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
+	if (!sgt)
+		return ERR_PTR(-ENOMEM);
+
+	err = sg_alloc_table(sgt, 1, GFP_KERNEL);
+	if (err) {
+		kfree(sgt);
+		return ERR_PTR(err);
+	}
+
+	sg_init_one(sgt->sgl, bo->gather_data, bo->gather_data_words*4);
+
+	return sgt;
+}
+
+static void gather_bo_unpin(struct device *dev, struct sg_table *sgt)
+{
+	if (sgt) {
+		sg_free_table(sgt);
+		kfree(sgt);
+	}
+}
+
+static void *gather_bo_mmap(struct host1x_bo *host_bo)
+{
+	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
+
+	return bo->gather_data;
+}
+
+static void gather_bo_munmap(struct host1x_bo *host_bo, void *addr)
+{
+}
+
+const struct host1x_bo_ops gather_bo_ops = {
+	.get = gather_bo_get,
+	.put = gather_bo_put,
+	.pin = gather_bo_pin,
+	.unpin = gather_bo_unpin,
+	.mmap = gather_bo_mmap,
+	.munmap = gather_bo_munmap,
+};
diff --git a/drivers/gpu/drm/tegra/uapi/gather_bo.h b/drivers/gpu/drm/tegra/uapi/gather_bo.h
new file mode 100644
index 000000000000..6b4c9d83ac91
--- /dev/null
+++ b/drivers/gpu/drm/tegra/uapi/gather_bo.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2020 NVIDIA Corporation */
+
+#ifndef _TEGRA_DRM_SUBMIT_GATHER_BO_H
+#define _TEGRA_DRM_SUBMIT_GATHER_BO_H
+
+#include <linux/host1x.h>
+#include <linux/kref.h>
+
+struct gather_bo {
+	struct host1x_bo base;
+
+	struct kref ref;
+
+	u32 *gather_data;
+	size_t gather_data_words;
+};
+
+extern const struct host1x_bo_ops gather_bo_ops;
+void gather_bo_put(struct host1x_bo *host_bo);
+
+#endif
diff --git a/drivers/gpu/drm/tegra/uapi/submit.c b/drivers/gpu/drm/tegra/uapi/submit.c
new file mode 100644
index 000000000000..95141f1516e5
--- /dev/null
+++ b/drivers/gpu/drm/tegra/uapi/submit.c
@@ -0,0 +1,675 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2020 NVIDIA Corporation */
+
+#include <linux/dma-fence-array.h>
+#include <linux/file.h>
+#include <linux/host1x.h>
+#include <linux/iommu.h>
+#include <linux/kref.h>
+#include <linux/list.h>
+#include <linux/nospec.h>
+#include <linux/pm_runtime.h>
+#include <linux/sync_file.h>
+
+#include <drm/drm_drv.h>
+#include <drm/drm_file.h>
+
+#include "../uapi.h"
+#include "../drm.h"
+#include "../gem.h"
+
+#include "gather_bo.h"
+#include "submit.h"
+
+static struct tegra_drm_mapping *
+tegra_drm_mapping_get(struct tegra_drm_channel_ctx *ctx, u32 id)
+{
+	struct tegra_drm_mapping *mapping;
+
+	xa_lock(&ctx->mappings);
+	mapping = xa_load(&ctx->mappings, id);
+	if (mapping)
+		kref_get(&mapping->ref);
+	xa_unlock(&ctx->mappings);
+
+	return mapping;
+}
+
+static void *alloc_copy_user_array(void __user *from, size_t count, size_t size)
+{
+	unsigned long copy_err;
+	size_t copy_len;
+	void *data;
+
+	if (check_mul_overflow(count, size, &copy_len))
+		return ERR_PTR(-EINVAL);
+
+	data = kvmalloc(copy_len, GFP_KERNEL);
+	if (!data)
+		return ERR_PTR(-ENOMEM);
+
+	copy_err = copy_from_user(data, from, copy_len);
+	if (copy_err) {
+		kvfree(data);
+		return ERR_PTR(-EFAULT);
+	}
+
+	return data;
+}
+
+static int submit_copy_gather_data(struct drm_device *drm,
+				   struct gather_bo **pbo,
+				   struct drm_tegra_channel_submit *args)
+{
+	unsigned long copy_err;
+	struct gather_bo *bo;
+	size_t copy_len;
+
+	if (args->gather_data_words == 0) {
+		drm_info(drm, "gather_data_words can't be 0");
+		return -EINVAL;
+	}
+
+	if (check_mul_overflow((size_t)args->gather_data_words, (size_t)4, &copy_len))
+		return -EINVAL;
+
+	bo = kzalloc(sizeof(*bo), GFP_KERNEL);
+	if (!bo)
+		return -ENOMEM;
+
+	kref_init(&bo->ref);
+	host1x_bo_init(&bo->base, &gather_bo_ops);
+
+	bo->gather_data = kmalloc(copy_len, GFP_KERNEL | __GFP_NOWARN);
+	if (!bo->gather_data) {
+		kfree(bo);
+		return -ENOMEM;
+	}
+
+	copy_err = copy_from_user(bo->gather_data,
+				  u64_to_user_ptr(args->gather_data_ptr),
+				  copy_len);
+	if (copy_err) {
+		kfree(bo->gather_data);
+		kfree(bo);
+		return -EFAULT;
+	}
+
+	bo->gather_data_words = args->gather_data_words;
+
+	*pbo = bo;
+
+	return 0;
+}
+
+static int submit_write_reloc(struct gather_bo *bo,
+			      struct drm_tegra_submit_buf *buf,
+			      struct tegra_drm_mapping *mapping)
+{
+	/* TODO check that target_offset is within bounds */
+	dma_addr_t iova = mapping->iova + buf->reloc.target_offset;
+	u32 written_ptr = (u32)(iova >> buf->reloc.shift);
+
+	if (buf->flags & DRM_TEGRA_SUBMIT_BUF_RELOC_BLOCKLINEAR)
+		written_ptr |= BIT(39);
+
+	if (buf->reloc.gather_offset_words >= bo->gather_data_words)
+		return -EINVAL;
+
+	buf->reloc.gather_offset_words = array_index_nospec(
+		buf->reloc.gather_offset_words, bo->gather_data_words);
+
+	bo->gather_data[buf->reloc.gather_offset_words] = written_ptr;
+
+	return 0;
+}
+
+static void submit_unlock_resv(struct tegra_drm_submit_data *job_data,
+			       struct ww_acquire_ctx *acquire_ctx)
+{
+	u32 i;
+
+	for (i = 0; i < job_data->num_used_mappings; i++) {
+		struct tegra_bo *bo = host1x_to_tegra_bo(
+			job_data->used_mappings[i].mapping->bo);
+
+		dma_resv_unlock(bo->gem.resv);
+	}
+
+	ww_acquire_fini(acquire_ctx);
+}
+
+static int submit_handle_resv(struct tegra_drm_submit_data *job_data,
+			      struct ww_acquire_ctx *acquire_ctx,
+			      struct xarray *implicit_fences)
+{
+	struct tegra_drm_used_mapping *mappings = job_data->used_mappings;
+	struct tegra_bo *bo;
+	int contended = -1;
+	int err;
+	u32 i;
+
+	/* Based on drm_gem_lock_reservations */
+
+	ww_acquire_init(acquire_ctx, &reservation_ww_class);
+
+retry:
+	if (contended != -1) {
+		bo = host1x_to_tegra_bo(mappings[contended].mapping->bo);
+
+		err = dma_resv_lock_slow_interruptible(bo->gem.resv, acquire_ctx);
+		if (err) {
+			ww_acquire_done(acquire_ctx);
+			return err;
+		}
+	}
+
+	for (i = 0; i < job_data->num_used_mappings; i++) {
+		bo = host1x_to_tegra_bo(mappings[i].mapping->bo);
+
+		if (i == contended)
+			continue;
+
+		err = dma_resv_lock_interruptible(bo->gem.resv, acquire_ctx);
+		if (err) {
+			int j;
+
+			for (j = 0; j < i; j++) {
+				bo = host1x_to_tegra_bo(mappings[j].mapping->bo);
+				dma_resv_unlock(bo->gem.resv);
+			}
+
+			if (contended != -1 && contended >= i) {
+				bo = host1x_to_tegra_bo(mappings[contended].mapping->bo);
+				dma_resv_unlock(bo->gem.resv);
+			}
+
+			if (err == -EDEADLK) {
+				contended = i;
+				goto retry;
+			}
+
+			ww_acquire_done(acquire_ctx);
+			return err;
+		}
+	}
+
+	ww_acquire_done(acquire_ctx);
+
+	for (i = 0; i < job_data->num_used_mappings; i++) {
+		bo = host1x_to_tegra_bo(mappings[i].mapping->bo);
+
+		if (mappings[i].flags & DRM_TEGRA_SUBMIT_BUF_RESV_WRITE) {
+			err = drm_gem_fence_array_add_implicit(implicit_fences,
+							       &bo->gem, true);
+			if (err < 0)
+				goto unlock_resv;
+		} else if (mappings[i].flags & DRM_TEGRA_SUBMIT_BUF_RESV_READ) {
+			err = dma_resv_reserve_shared(bo->gem.resv, 1);
+			if (err < 0)
+				goto unlock_resv;
+
+			err = drm_gem_fence_array_add_implicit(implicit_fences,
+							       &bo->gem, false);
+			if (err < 0)
+				goto unlock_resv;
+		}
+	}
+
+	return 0;
+
+unlock_resv:
+	submit_unlock_resv(job_data, acquire_ctx);
+
+	return err;
+}
+
+static int submit_process_bufs(struct drm_device *drm, struct gather_bo *bo,
+			       struct tegra_drm_submit_data *job_data,
+			       struct tegra_drm_channel_ctx *ctx,
+			       struct drm_tegra_channel_submit *args,
+			       struct ww_acquire_ctx *acquire_ctx,
+			       bool *need_implicit_fences)
+{
+	struct tegra_drm_used_mapping *mappings;
+	struct drm_tegra_submit_buf *bufs;
+	int err;
+	u32 i;
+
+	bufs = alloc_copy_user_array(u64_to_user_ptr(args->bufs_ptr),
+				     args->num_bufs, sizeof(*bufs));
+	if (IS_ERR(bufs))
+		return PTR_ERR(bufs);
+
+	mappings = kcalloc(args->num_bufs, sizeof(*mappings), GFP_KERNEL);
+	if (!mappings) {
+		err = -ENOMEM;
+		goto done;
+	}
+
+	*need_implicit_fences = false;
+
+	for (i = 0; i < args->num_bufs; i++) {
+		struct drm_tegra_submit_buf *buf = &bufs[i];
+		struct tegra_drm_mapping *mapping;
+
+		if (buf->flags & ~(DRM_TEGRA_SUBMIT_BUF_RELOC_BLOCKLINEAR |
+				   DRM_TEGRA_SUBMIT_BUF_RESV_READ |
+				   DRM_TEGRA_SUBMIT_BUF_RESV_WRITE)) {
+			err = -EINVAL;
+			goto drop_refs;
+		}
+
+		if (buf->reserved[0] || buf->reserved[1]) {
+			err = -EINVAL;
+			goto drop_refs;
+		}
+
+		mapping = tegra_drm_mapping_get(ctx, buf->mapping_id);
+		if (!mapping) {
+			drm_info(drm, "invalid mapping_id for buf: %u",
+				 buf->mapping_id);
+			err = -EINVAL;
+			goto drop_refs;
+		}
+
+		err = submit_write_reloc(bo, buf, mapping);
+		if (err) {
+			tegra_drm_mapping_put(mapping);
+			goto drop_refs;
+		}
+
+		mappings[i].mapping = mapping;
+		mappings[i].flags = buf->flags;
+
+		if (buf->flags & (DRM_TEGRA_SUBMIT_BUF_RESV_READ |
+				  DRM_TEGRA_SUBMIT_BUF_RESV_WRITE))
+			*need_implicit_fences = true;
+	}
+
+	job_data->used_mappings = mappings;
+	job_data->num_used_mappings = i;
+
+	err = 0;
+
+	goto done;
+
+drop_refs:
+	for (;;) {
+		if (i-- == 0)
+			break;
+
+		tegra_drm_mapping_put(mappings[i].mapping);
+	}
+
+	kfree(mappings);
+	job_data->used_mappings = NULL;
+
+done:
+	kvfree(bufs);
+
+	return err;
+}
+
+static int submit_get_syncpt(struct drm_device *drm, struct host1x_job *job,
+			     struct drm_tegra_channel_submit *args)
+{
+	struct drm_tegra_submit_syncpt_incr *incr;
+	struct host1x_syncpt *sp;
+
+	if (args->syncpt_incrs[1].num_incrs != 0) {
+		drm_info(drm, "Only 1 syncpoint supported for now");
+		return -EINVAL;
+	}
+
+	incr = &args->syncpt_incrs[0];
+
+	if ((incr->flags & ~DRM_TEGRA_SUBMIT_SYNCPT_INCR_CREATE_SYNC_FILE) ||
+	    incr->reserved[0] || incr->reserved[1] || incr->reserved[2])
+		return -EINVAL;
+
+	/* Syncpt ref will be dropped on job release */
+	sp = host1x_syncpt_fd_get(incr->syncpt_fd);
+	if (IS_ERR(sp))
+		return PTR_ERR(sp);
+
+	job->syncpt = sp;
+	job->syncpt_incrs = incr->num_incrs;
+
+	return 0;
+}
+
+static int submit_job_add_gather(struct host1x_job *job,
+				 struct tegra_drm_channel_ctx *ctx,
+				 struct drm_tegra_submit_cmd_gather_uptr *cmd,
+				 struct gather_bo *bo, u32 *offset,
+				 struct tegra_drm_submit_data *job_data)
+{
+	u32 next_offset;
+
+	if (cmd->reserved[0] || cmd->reserved[1] || cmd->reserved[2])
+		return -EINVAL;
+
+	/* Check for maximum gather size */
+	if (cmd->words > 16383)
+		return -EINVAL;
+
+	if (check_add_overflow(*offset, cmd->words, &next_offset))
+		return -EINVAL;
+
+	if (next_offset > bo->gather_data_words)
+		return -EINVAL;
+
+	host1x_job_add_gather(job, &bo->base, cmd->words, *offset * 4);
+
+	*offset = next_offset;
+
+	return 0;
+}
+
+static int submit_create_job(struct drm_device *drm, struct host1x_job **pjob,
+			     struct gather_bo *bo,
+			     struct tegra_drm_channel_ctx *ctx,
+			     struct drm_tegra_channel_submit *args,
+			     struct tegra_drm_submit_data *job_data,
+			     bool need_implicit_fences)
+{
+	struct drm_tegra_submit_cmd *cmds;
+	u32 i, gather_offset = 0;
+	struct host1x_job *job;
+	u32 num_cmds;
+	int err;
+
+	cmds = alloc_copy_user_array(u64_to_user_ptr(args->cmds_ptr),
+				     args->num_cmds, sizeof(*cmds));
+	if (IS_ERR(cmds))
+		return PTR_ERR(cmds);
+
+	num_cmds = args->num_cmds;
+	if (need_implicit_fences)
+		num_cmds++;
+
+	job = host1x_job_alloc(ctx->channel, num_cmds, 0);
+	if (!job) {
+		err = -ENOMEM;
+		goto done;
+	}
+
+	err = submit_get_syncpt(drm, job, args);
+	if (err < 0)
+		goto free_job;
+
+	job->client = &ctx->client->base;
+	job->class = ctx->client->base.class;
+	job->serialize = true;
+
+	if (need_implicit_fences) {
+		u32 threshold = host1x_syncpt_incr_max(job->syncpt, 1) - 1;
+		host1x_job_add_wait(job, host1x_syncpt_id(job->syncpt), threshold);
+	}
+
+	for (i = 0; i < args->num_cmds; i++) {
+		struct drm_tegra_submit_cmd *cmd = &cmds[i];
+
+		if (cmd->type == DRM_TEGRA_SUBMIT_CMD_GATHER_UPTR) {
+			err = submit_job_add_gather(job, ctx, &cmd->gather_uptr,
+						    bo, &gather_offset,
+						    job_data);
+			if (err)
+				goto free_job;
+		} else if (cmd->type == DRM_TEGRA_SUBMIT_CMD_WAIT_SYNCPT) {
+			if (cmd->wait_syncpt.reserved[0] ||
+			    cmd->wait_syncpt.reserved[1]) {
+				err = -EINVAL;
+				goto free_job;
+			}
+
+			host1x_job_add_wait(job, cmd->wait_syncpt.id,
+					    cmd->wait_syncpt.threshold);
+		} else if (cmd->type == DRM_TEGRA_SUBMIT_CMD_WAIT_SYNC_FILE) {
+			struct dma_fence *f;
+
+			if (cmd->wait_sync_file.reserved[0] ||
+			    cmd->wait_sync_file.reserved[1] ||
+			    cmd->wait_sync_file.reserved[2]) {
+				err = -EINVAL;
+				goto free_job;
+			}
+
+			f = sync_file_get_fence(cmd->wait_sync_file.fd);
+			if (!f) {
+				err = -EINVAL;
+				goto free_job;
+			}
+
+			err = dma_fence_wait(f, true);
+			dma_fence_put(f);
+
+			if (err)
+				goto free_job;
+		} else {
+			err = -EINVAL;
+			goto free_job;
+		}
+	}
+
+	if (gather_offset == 0) {
+		drm_info(drm, "Job must have at least one gather");
+		err = -EINVAL;
+		goto free_job;
+	}
+
+	*pjob = job;
+
+	err = 0;
+	goto done;
+
+free_job:
+	host1x_job_put(job);
+
+done:
+	kvfree(cmds);
+
+	return err;
+}
+
+static int submit_create_postfences(struct host1x_job *job,
+				    struct drm_tegra_channel_submit *args)
+{
+	struct drm_tegra_submit_syncpt_incr *incr = &args->syncpt_incrs[0];
+	struct tegra_drm_submit_data *job_data = job->user_data;
+	struct dma_fence *fence;
+	int err = 0;
+	u32 i;
+
+	fence = host1x_fence_create(job->syncpt, job->syncpt_end);
+	if (IS_ERR(fence))
+		return PTR_ERR(fence);
+
+	incr->fence_value = job->syncpt_end;
+
+	for (i = 0; i < job_data->num_used_mappings; i++) {
+		struct tegra_drm_used_mapping *um = &job_data->used_mappings[i];
+		struct tegra_bo *bo = host1x_to_tegra_bo(um->mapping->bo);
+
+		if (um->flags & DRM_TEGRA_SUBMIT_BUF_RESV_READ)
+			dma_resv_add_shared_fence(bo->gem.resv, fence);
+
+		if (um->flags & DRM_TEGRA_SUBMIT_BUF_RESV_WRITE)
+			dma_resv_add_excl_fence(bo->gem.resv, fence);
+	}
+
+	if (incr->flags & DRM_TEGRA_SUBMIT_SYNCPT_INCR_CREATE_SYNC_FILE) {
+		struct sync_file *sf;
+
+		err = get_unused_fd_flags(O_CLOEXEC);
+		if (err < 0)
+			goto put_fence;
+
+		sf = sync_file_create(fence);
+		if (!sf) {
+			err = -ENOMEM;
+			goto put_fence;
+		}
+
+		fd_install(err, sf->file);
+		incr->sync_file_fd = err;
+		err = 0;
+	}
+
+put_fence:
+	dma_fence_put(fence);
+
+	return err;
+}
+
+static void release_job(struct host1x_job *job)
+{
+	struct tegra_drm_client *client =
+		container_of(job->client, struct tegra_drm_client, base);
+	struct tegra_drm_submit_data *job_data = job->user_data;
+	u32 i;
+
+	for (i = 0; i < job_data->num_used_mappings; i++)
+		tegra_drm_mapping_put(job_data->used_mappings[i].mapping);
+
+	kfree(job_data->used_mappings);
+	kfree(job_data);
+
+	pm_runtime_put_autosuspend(client->base.dev);
+}
+
+int tegra_drm_ioctl_channel_submit(struct drm_device *drm, void *data,
+				   struct drm_file *file)
+{
+	struct tegra_drm_file *fpriv = file->driver_priv;
+	struct drm_tegra_channel_submit *args = data;
+	struct tegra_drm_submit_data *job_data;
+	struct ww_acquire_ctx acquire_ctx;
+	struct tegra_drm_channel_ctx *ctx;
+	DEFINE_XARRAY(implicit_fences);
+	bool need_implicit_fences;
+	struct host1x_job *job;
+	struct gather_bo *bo;
+	u32 i;
+	int err;
+
+	if (args->reserved0 || args->reserved1)
+		return -EINVAL;
+
+	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
+	if (!ctx)
+		return -EINVAL;
+
+	/* Allocate gather BO and copy gather words in. */
+	err = submit_copy_gather_data(drm, &bo, args);
+	if (err)
+		goto unlock;
+
+	job_data = kzalloc(sizeof(*job_data), GFP_KERNEL);
+	if (!job_data) {
+		err = -ENOMEM;
+		goto put_bo;
+	}
+
+	/* Get data buffer mappings and do relocation patching. */
+	err = submit_process_bufs(drm, bo, job_data, ctx, args, &acquire_ctx,
+				  &need_implicit_fences);
+	if (err)
+		goto free_job_data;
+
+	/* Allocate host1x_job and add gathers and waits to it. */
+	err = submit_create_job(drm, &job, bo, ctx, args,
+				job_data, need_implicit_fences);
+	if (err)
+		goto free_job_data;
+
+	/* Map gather data for Host1x. */
+	err = host1x_job_pin(job, ctx->client->base.dev);
+	if (err)
+		goto put_job;
+
+	/* Boot engine. */
+	err = pm_runtime_get_sync(ctx->client->base.dev);
+	if (err < 0)
+		goto put_pm_runtime;
+
+	job->user_data = job_data;
+	job->release = release_job;
+	job->timeout = 10000;
+
+	/*
+	 * job_data is now part of job reference counting, so don't release
+	 * it from here.
+	 */
+	job_data = NULL;
+
+	if (need_implicit_fences) {
+		xa_init_flags(&implicit_fences, XA_FLAGS_ALLOC);
+
+		/*
+		 * Lock DMA reservations, reserve fence slots and
+		 * retrieve prefences.
+		 */
+		err = submit_handle_resv(job->user_data, &acquire_ctx,
+					 &implicit_fences);
+		if (err)
+			goto put_pm_runtime;
+	}
+
+	/* Submit job to hardware. */
+	err = host1x_job_submit(job);
+	if (err)
+		goto unlock_resv;
+
+	/* Return postfences to userspace and add fences to DMA reservations. */
+	err = submit_create_postfences(job, args);
+
+	if (need_implicit_fences) {
+		struct dma_fence *fence;
+		unsigned long index;
+		int err = 0;
+
+		/* Unlock DMA reservations. */
+		submit_unlock_resv(job->user_data, &acquire_ctx);
+
+		/* Wait for fences and unblock job. */
+		xa_for_each(&implicit_fences, index, fence) {
+			err = dma_fence_wait(fence, false);
+			if (err < 0)
+				break;
+		}
+
+		xa_destroy(&implicit_fences);
+
+		if (err == 0)
+			host1x_syncpt_incr(job->syncpt);
+	}
+
+	goto put_job;
+
+unlock_resv:
+	if (need_implicit_fences) {
+		submit_unlock_resv(job->user_data, &acquire_ctx);
+		xa_destroy(&implicit_fences);
+	}
+put_pm_runtime:
+	if (!job->release)
+		pm_runtime_put(ctx->client->base.dev);
+	host1x_job_unpin(job);
+put_job:
+	host1x_job_put(job);
+free_job_data:
+	if (job_data && job_data->used_mappings) {
+		for (i = 0; i < job_data->num_used_mappings; i++)
+			tegra_drm_mapping_put(job_data->used_mappings[i].mapping);
+		kfree(job_data->used_mappings);
+	}
+	if (job_data)
+		kfree(job_data);
+put_bo:
+	gather_bo_put(&bo->base);
+unlock:
+	mutex_unlock(&fpriv->lock);
+	return err;
+}
diff --git a/drivers/gpu/drm/tegra/uapi/submit.h b/drivers/gpu/drm/tegra/uapi/submit.h
new file mode 100644
index 000000000000..0a165e9e4bda
--- /dev/null
+++ b/drivers/gpu/drm/tegra/uapi/submit.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2020 NVIDIA Corporation */
+
+#ifndef _TEGRA_DRM_UAPI_SUBMIT_H
+#define _TEGRA_DRM_UAPI_SUBMIT_H
+
+struct tegra_drm_used_mapping {
+	struct tegra_drm_mapping *mapping;
+	u32 flags;
+};
+
+struct tegra_drm_submit_data {
+	struct tegra_drm_used_mapping *used_mappings;
+	u32 num_used_mappings;
+};
+
+#endif
diff --git a/drivers/gpu/drm/tegra/uapi/uapi.c b/drivers/gpu/drm/tegra/uapi/uapi.c
new file mode 100644
index 000000000000..9bd7766e759a
--- /dev/null
+++ b/drivers/gpu/drm/tegra/uapi/uapi.c
@@ -0,0 +1,326 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2020 NVIDIA Corporation */
+
+#include <linux/host1x.h>
+#include <linux/iommu.h>
+#include <linux/list.h>
+
+#include <drm/drm_drv.h>
+#include <drm/drm_file.h>
+
+#include "../uapi.h"
+#include "../drm.h"
+
+struct tegra_drm_channel_ctx *
+tegra_drm_channel_ctx_lock(struct tegra_drm_file *file, u32 id)
+{
+	struct tegra_drm_channel_ctx *ctx;
+
+	mutex_lock(&file->lock);
+	ctx = xa_load(&file->contexts, id);
+	if (!ctx)
+		mutex_unlock(&file->lock);
+
+	return ctx;
+}
+
+static void tegra_drm_mapping_release(struct kref *ref)
+{
+	struct tegra_drm_mapping *mapping =
+		container_of(ref, struct tegra_drm_mapping, ref);
+
+	if (mapping->sgt)
+		dma_unmap_sgtable(mapping->dev, mapping->sgt,
+				  mapping->direction, DMA_ATTR_SKIP_CPU_SYNC);
+
+	host1x_bo_unpin(mapping->dev, mapping->bo, mapping->sgt);
+	host1x_bo_put(mapping->bo);
+
+	kfree(mapping);
+}
+
+void tegra_drm_mapping_put(struct tegra_drm_mapping *mapping)
+{
+	kref_put(&mapping->ref, tegra_drm_mapping_release);
+}
+
+static void tegra_drm_channel_ctx_close(struct tegra_drm_channel_ctx *ctx)
+{
+	unsigned long mapping_id;
+	struct tegra_drm_mapping *mapping;
+
+	xa_for_each(&ctx->mappings, mapping_id, mapping)
+		tegra_drm_mapping_put(mapping);
+
+	xa_destroy(&ctx->mappings);
+
+	host1x_channel_put(ctx->channel);
+
+	kfree(ctx);
+}
+
+int close_channel_ctx(int id, void *p, void *data)
+{
+	struct tegra_drm_channel_ctx *ctx = p;
+
+	tegra_drm_channel_ctx_close(ctx);
+
+	return 0;
+}
+
+void tegra_drm_uapi_close_file(struct tegra_drm_file *file)
+{
+	unsigned long ctx_id;
+	struct tegra_drm_channel_ctx *ctx;
+
+	xa_for_each(&file->contexts, ctx_id, ctx)
+		tegra_drm_channel_ctx_close(ctx);
+
+	xa_destroy(&file->contexts);
+}
+
+int tegra_drm_ioctl_channel_open(struct drm_device *drm, void *data,
+				 struct drm_file *file)
+{
+	struct tegra_drm_file *fpriv = file->driver_priv;
+	struct tegra_drm *tegra = drm->dev_private;
+	struct drm_tegra_channel_open *args = data;
+	struct tegra_drm_client *client = NULL;
+	struct tegra_drm_channel_ctx *ctx;
+	int err;
+
+	if (args->flags || args->reserved[0] || args->reserved[1])
+		return -EINVAL;
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	err = -ENODEV;
+	list_for_each_entry(client, &tegra->clients, list) {
+		if (client->base.class == args->host1x_class) {
+			err = 0;
+			break;
+		}
+	}
+	if (err)
+		goto free_ctx;
+
+	if (client->shared_channel) {
+		ctx->channel = host1x_channel_get(client->shared_channel);
+	} else {
+		ctx->channel = host1x_channel_request(&client->base);
+		if (!ctx->channel) {
+			err = -EBUSY;
+			goto free_ctx;
+		}
+	}
+
+	err = xa_alloc(&fpriv->contexts, &args->channel_ctx, ctx,
+		       XA_LIMIT(1, U32_MAX), GFP_KERNEL);
+	if (err < 0)
+		goto put_channel;
+
+	ctx->client = client;
+	xa_init_flags(&ctx->mappings, XA_FLAGS_ALLOC1);
+
+	args->hardware_version = client->version;
+
+	return 0;
+
+put_channel:
+	host1x_channel_put(ctx->channel);
+free_ctx:
+	kfree(ctx);
+
+	return err;
+}
+
+int tegra_drm_ioctl_channel_close(struct drm_device *drm, void *data,
+				  struct drm_file *file)
+{
+	struct tegra_drm_file *fpriv = file->driver_priv;
+	struct drm_tegra_channel_close *args = data;
+	struct tegra_drm_channel_ctx *ctx;
+
+	if (args->reserved[0])
+		return -EINVAL;
+
+	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
+	if (!ctx)
+		return -EINVAL;
+
+	xa_erase(&fpriv->contexts, args->channel_ctx);
+
+	mutex_unlock(&fpriv->lock);
+
+	tegra_drm_channel_ctx_close(ctx);
+
+	return 0;
+}
+
+int tegra_drm_ioctl_channel_map(struct drm_device *drm, void *data,
+				struct drm_file *file)
+{
+	struct tegra_drm_file *fpriv = file->driver_priv;
+	struct drm_tegra_channel_map *args = data;
+	struct tegra_drm_channel_ctx *ctx;
+	struct tegra_drm_mapping *mapping;
+	struct drm_gem_object *gem;
+	u32 mapping_id;
+	int err = 0;
+
+	if (args->flags & ~DRM_TEGRA_CHANNEL_MAP_READWRITE)
+		return -EINVAL;
+	if (args->reserved[0] || args->reserved[1])
+		return -EINVAL;
+
+	if (!IS_ALIGNED(args->offset, 0x1000) ||
+	    !IS_ALIGNED(args->length, 0x1000))
+		return -EINVAL;
+
+	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
+	if (!ctx)
+		return -EINVAL;
+
+	mapping = kzalloc(sizeof(*mapping), GFP_KERNEL);
+	if (!mapping) {
+		err = -ENOMEM;
+		goto unlock;
+	}
+
+	kref_init(&mapping->ref);
+
+	gem = drm_gem_object_lookup(file, args->handle);
+	if (!gem) {
+		err = -EINVAL;
+		goto unlock;
+	}
+
+	if (args->offset >= gem->size || args->length > gem->size ||
+	    args->offset > gem->size - args->length) {
+		err = -EINVAL;
+		goto put_gem;
+	}
+
+	mapping->dev = ctx->client->base.dev;
+	mapping->bo = &container_of(gem, struct tegra_bo, gem)->base;
+
+	if (!iommu_get_domain_for_dev(mapping->dev) ||
+	    ctx->client->base.group) {
+		host1x_bo_pin(mapping->dev, mapping->bo,
+			      &mapping->iova);
+	} else {
+		mapping->direction = DMA_TO_DEVICE;
+		if (args->flags & DRM_TEGRA_CHANNEL_MAP_READWRITE)
+			mapping->direction = DMA_BIDIRECTIONAL;
+
+		mapping->sgt =
+			host1x_bo_pin(mapping->dev, mapping->bo, NULL);
+		if (IS_ERR(mapping->sgt)) {
+			err = PTR_ERR(mapping->sgt);
+			goto put_gem;
+		}
+
+		err = dma_map_sgtable(mapping->dev, mapping->sgt,
+				      mapping->direction,
+				      DMA_ATTR_SKIP_CPU_SYNC);
+		if (err)
+			goto unpin;
+
+		/* TODO only map the requested part */
+		mapping->iova = sg_dma_address(mapping->sgt->sgl) + args->offset;
+		mapping->iova_end = mapping->iova + gem->size;
+	}
+
+	mutex_unlock(&fpriv->lock);
+
+	err = xa_alloc(&ctx->mappings, &mapping_id, mapping,
+		       XA_LIMIT(1, U32_MAX), GFP_KERNEL);
+	if (err < 0)
+		goto unmap;
+
+	/* TODO: if appropriate, return actual IOVA */
+	args->iova = U64_MAX;
+	args->mapping_id = mapping_id;
+
+	return 0;
+
+unmap:
+	if (mapping->sgt) {
+		dma_unmap_sgtable(mapping->dev, mapping->sgt,
+				  mapping->direction, DMA_ATTR_SKIP_CPU_SYNC);
+	}
+unpin:
+	host1x_bo_unpin(mapping->dev, mapping->bo, mapping->sgt);
+put_gem:
+	drm_gem_object_put(gem);
+	kfree(mapping);
+unlock:
+	mutex_unlock(&fpriv->lock);
+	return err;
+}
+
+int tegra_drm_ioctl_channel_unmap(struct drm_device *drm, void *data,
+				  struct drm_file *file)
+{
+	struct tegra_drm_file *fpriv = file->driver_priv;
+	struct drm_tegra_channel_unmap *args = data;
+	struct tegra_drm_channel_ctx *ctx;
+	struct tegra_drm_mapping *mapping;
+
+	if (args->reserved[0] || args->reserved[1])
+		return -EINVAL;
+
+	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
+	if (!ctx)
+		return -EINVAL;
+
+	mapping = xa_erase(&ctx->mappings, args->mapping_id);
+
+	mutex_unlock(&fpriv->lock);
+
+	if (mapping) {
+		tegra_drm_mapping_put(mapping);
+		return 0;
+	} else {
+		return -EINVAL;
+	}
+}
+
+int tegra_drm_ioctl_gem_create(struct drm_device *drm, void *data,
+			       struct drm_file *file)
+{
+	struct drm_tegra_gem_create *args = data;
+	struct tegra_bo *bo;
+
+	if (args->flags)
+		return -EINVAL;
+
+	bo = tegra_bo_create_with_handle(file, drm, args->size, args->flags,
+					 &args->handle);
+	if (IS_ERR(bo))
+		return PTR_ERR(bo);
+
+	return 0;
+}
+
+int tegra_drm_ioctl_gem_mmap(struct drm_device *drm, void *data,
+			     struct drm_file *file)
+{
+	struct drm_tegra_gem_mmap *args = data;
+	struct drm_gem_object *gem;
+	struct tegra_bo *bo;
+
+	gem = drm_gem_object_lookup(file, args->handle);
+	if (!gem)
+		return -EINVAL;
+
+	bo = to_tegra_bo(gem);
+
+	args->offset = drm_vma_node_offset_addr(&bo->gem.vma_node);
+
+	drm_gem_object_put(gem);
+
+	return 0;
+}
-- 
2.28.0


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v3 20/20] drm/tegra: Add job firewall
  2020-10-07 17:12 [PATCH v3 00/20] Host1x/TegraDRM UAPI Mikko Perttunen
                   ` (18 preceding siblings ...)
  2020-10-07 17:12 ` [PATCH v3 19/20] drm/tegra: Implement new UAPI Mikko Perttunen
@ 2020-10-07 17:12 ` Mikko Perttunen
  19 siblings, 0 replies; 35+ messages in thread
From: Mikko Perttunen @ 2020-10-07 17:12 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Add a firewall that validates jobs before submission to ensure
they don't do anything they aren't allowed to do, like accessing
memory they should not access.

The firewall is functionality-wise a copy of the firewall already
implemented in gpu/host1x. It is copied here as it makes more
sense for it to live on the DRM side, as it is only needed for
userspace job submissions, and generally the data it needs to
do its job is easier to access here.

In the future, the other implementation will be removed.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
v3:
* New patch
---
 drivers/gpu/drm/tegra/Makefile        |   1 +
 drivers/gpu/drm/tegra/uapi/firewall.c | 197 ++++++++++++++++++++++++++
 drivers/gpu/drm/tegra/uapi/submit.c   |   4 +
 drivers/gpu/drm/tegra/uapi/submit.h   |   3 +
 4 files changed, 205 insertions(+)
 create mode 100644 drivers/gpu/drm/tegra/uapi/firewall.c

diff --git a/drivers/gpu/drm/tegra/Makefile b/drivers/gpu/drm/tegra/Makefile
index 059322e88943..4e3295f436f1 100644
--- a/drivers/gpu/drm/tegra/Makefile
+++ b/drivers/gpu/drm/tegra/Makefile
@@ -5,6 +5,7 @@ tegra-drm-y := \
 	drm.o \
 	uapi/uapi.o \
 	uapi/submit.o \
+	uapi/firewall.o \
 	uapi/gather_bo.o \
 	gem.o \
 	fb.o \
diff --git a/drivers/gpu/drm/tegra/uapi/firewall.c b/drivers/gpu/drm/tegra/uapi/firewall.c
new file mode 100644
index 000000000000..a9c5b71bc235
--- /dev/null
+++ b/drivers/gpu/drm/tegra/uapi/firewall.c
@@ -0,0 +1,197 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2010-2020 NVIDIA Corporation */
+
+#include "../drm.h"
+#include "../uapi.h"
+
+#include "submit.h"
+
+struct tegra_drm_firewall {
+	struct tegra_drm_submit_data *submit;
+	struct tegra_drm_client *client;
+	u32 *data;
+	u32 pos;
+	u32 end;
+};
+
+static int fw_next(struct tegra_drm_firewall *fw, u32 *word)
+{
+	if (fw->pos == fw->end)
+		return -EINVAL;
+
+	*word = fw->data[fw->pos++];
+
+	return 0;
+}
+
+static bool fw_check_addr_valid(struct tegra_drm_firewall *fw, u32 offset)
+{
+	u32 i;
+
+	for (i = 0; i < fw->submit->num_used_mappings; i++) {
+		struct tegra_drm_mapping *m = fw->submit->used_mappings[i].mapping;
+
+		if (offset >= m->iova && offset <= m->iova_end)
+			return true;
+	}
+
+	return false;
+}
+
+static int fw_check_reg(struct tegra_drm_firewall *fw, u32 offset)
+{
+	bool is_addr;
+	u32 word;
+	int err;
+
+	err = fw_next(fw, &word);
+	if (err)
+		return err;
+
+	if (!fw->client->ops->is_addr_reg)
+		return 0;
+
+	is_addr = fw->client->ops->is_addr_reg(
+		fw->client->base.dev, fw->client->base.class, offset);
+
+	if (!is_addr)
+		return 0;
+
+	if (!fw_check_addr_valid(fw, word))
+		return -EINVAL;
+
+	return 0;
+}
+
+static int fw_check_regs_seq(struct tegra_drm_firewall *fw, u32 offset,
+			     u32 count, bool incr)
+{
+	u32 i;
+
+	for (i = 0; i < count; i++) {
+		if (fw_check_reg(fw, offset))
+			return -EINVAL;
+
+		if (incr)
+			offset++;
+	}
+
+	return 0;
+}
+
+static int fw_check_regs_mask(struct tegra_drm_firewall *fw, u32 offset,
+			      u16 mask)
+{
+	unsigned long bmask = mask;
+	unsigned int bit;
+
+	for_each_set_bit(bit, &bmask, 16) {
+		if (fw_check_reg(fw, offset+bit))
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int fw_check_regs_imm(struct tegra_drm_firewall *fw, u32 offset)
+{
+	bool is_addr;
+
+	is_addr = fw->client->ops->is_addr_reg(fw->client->base.dev,
+					       fw->client->base.class, offset);
+	if (is_addr)
+		return -EINVAL;
+
+	return 0;
+}
+
+enum {
+        HOST1X_OPCODE_SETCLASS  = 0x00,
+        HOST1X_OPCODE_INCR      = 0x01,
+        HOST1X_OPCODE_NONINCR   = 0x02,
+        HOST1X_OPCODE_MASK      = 0x03,
+        HOST1X_OPCODE_IMM       = 0x04,
+        HOST1X_OPCODE_RESTART   = 0x05,
+        HOST1X_OPCODE_GATHER    = 0x06,
+        HOST1X_OPCODE_SETSTRMID = 0x07,
+        HOST1X_OPCODE_SETAPPID  = 0x08,
+        HOST1X_OPCODE_SETPYLD   = 0x09,
+        HOST1X_OPCODE_INCR_W    = 0x0a,
+        HOST1X_OPCODE_NONINCR_W = 0x0b,
+        HOST1X_OPCODE_GATHER_W  = 0x0c,
+        HOST1X_OPCODE_RESTART_W = 0x0d,
+        HOST1X_OPCODE_EXTEND    = 0x0e,
+};
+
+int tegra_drm_fw_validate(struct tegra_drm_client *client, u32 *data, u32 start,
+			  u32 words, struct tegra_drm_submit_data *submit)
+{
+	struct tegra_drm_firewall fw = {
+		.submit = submit,
+		.client = client,
+		.data = data,
+		.pos = start,
+		.end = start+words,
+	};
+	bool payload_valid = false;
+	u32 payload;
+	int err;
+
+	while (fw.pos != fw.end) {
+		u32 word, opcode, offset, count, mask;
+
+		err = fw_next(&fw, &word);
+		if (err)
+			return err;
+
+		opcode = (word & 0xf0000000) >> 28;
+
+		switch (opcode) {
+		case HOST1X_OPCODE_INCR:
+			offset = (word >> 16) & 0xfff;
+			count = word & 0xffff;
+			err = fw_check_regs_seq(&fw, offset, count, true);
+			break;
+		case HOST1X_OPCODE_NONINCR:
+			offset = (word >> 16) & 0xfff;
+			count = word & 0xffff;
+			err = fw_check_regs_seq(&fw, offset, count, false);
+			break;
+		case HOST1X_OPCODE_MASK:
+			offset = (word >> 16) & 0xfff;
+			mask = word & 0xffff;
+			err = fw_check_regs_mask(&fw, offset, mask);
+			break;
+		case HOST1X_OPCODE_IMM:
+			/* IMM cannot reasonably be used to write a pointer */
+			offset = (word >> 16) & 0xfff;
+			err = fw_check_regs_imm(&fw, offset);
+			break;
+		case HOST1X_OPCODE_SETPYLD:
+			payload = word & 0xffff;
+			payload_valid = true;
+			break;
+		case HOST1X_OPCODE_INCR_W:
+			if (!payload_valid)
+				return -EINVAL;
+
+			offset = word & 0x3fffff;
+			err = fw_check_regs_seq(&fw, offset, payload, true);
+			break;
+		case HOST1X_OPCODE_NONINCR_W:
+			if (!payload_valid)
+				return -EINVAL;
+
+			offset = word & 0x3fffff;
+			err = fw_check_regs_seq(&fw, offset, payload, false);
+			break;
+		default:
+			return -EINVAL;
+		}
+
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/tegra/uapi/submit.c b/drivers/gpu/drm/tegra/uapi/submit.c
index 95141f1516e5..d2720f616c22 100644
--- a/drivers/gpu/drm/tegra/uapi/submit.c
+++ b/drivers/gpu/drm/tegra/uapi/submit.c
@@ -360,6 +360,10 @@ static int submit_job_add_gather(struct host1x_job *job,
 	if (next_offset > bo->gather_data_words)
 		return -EINVAL;
 
+	if (tegra_drm_fw_validate(ctx->client, bo->gather_data, *offset,
+				  cmd->words, job_data))
+		return -EINVAL;
+
 	host1x_job_add_gather(job, &bo->base, cmd->words, *offset * 4);
 
 	*offset = next_offset;
diff --git a/drivers/gpu/drm/tegra/uapi/submit.h b/drivers/gpu/drm/tegra/uapi/submit.h
index 0a165e9e4bda..0e51627e73f8 100644
--- a/drivers/gpu/drm/tegra/uapi/submit.h
+++ b/drivers/gpu/drm/tegra/uapi/submit.h
@@ -14,4 +14,7 @@ struct tegra_drm_submit_data {
 	u32 num_used_mappings;
 };
 
+int tegra_drm_fw_validate(struct tegra_drm_client *client, u32 *data, u32 start,
+			  u32 words, struct tegra_drm_submit_data *submit);
+
 #endif
-- 
2.28.0


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 06/20] gpu: host1x: Cleanup and refcounting for syncpoints
  2020-10-07 17:12 ` [PATCH v3 06/20] gpu: host1x: Cleanup and refcounting for syncpoints Mikko Perttunen
@ 2020-10-07 22:23   ` kernel test robot
  0 siblings, 0 replies; 35+ messages in thread
From: kernel test robot @ 2020-10-07 22:23 UTC (permalink / raw)
  To: Mikko Perttunen, thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: kbuild-all, linux-tegra, talho, bhuntsman, dri-devel, Mikko Perttunen


[-- Attachment #1: Type: text/plain, Size: 2820 bytes --]

Hi Mikko,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tegra-drm/drm/tegra/for-next]
[also build test ERROR on tegra/for-next linus/master v5.9-rc8 next-20201007]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Mikko-Perttunen/Host1x-TegraDRM-UAPI/20201008-034403
base:   git://anongit.freedesktop.org/tegra/linux.git drm/tegra/for-next
config: arm-allyesconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/3721bf9ddd2b05fe12b3512999f77351ae839d08
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Mikko-Perttunen/Host1x-TegraDRM-UAPI/20201008-034403
        git checkout 3721bf9ddd2b05fe12b3512999f77351ae839d08
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arm 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   drivers/staging/media/tegra-video/vi.c: In function 'tegra_channel_cleanup':
>> drivers/staging/media/tegra-video/vi.c:621:2: error: implicit declaration of function 'host1x_syncpt_free'; did you mean 'host1x_syncpt_read'? [-Werror=implicit-function-declaration]
     621 |  host1x_syncpt_free(chan->mw_ack_sp);
         |  ^~~~~~~~~~~~~~~~~~
         |  host1x_syncpt_read
   cc1: some warnings being treated as errors

vim +621 drivers/staging/media/tegra-video/vi.c

3d8a97eabef0883 Sowjanya Komatineni 2020-05-04  616  
3d8a97eabef0883 Sowjanya Komatineni 2020-05-04  617  static void tegra_channel_cleanup(struct tegra_vi_channel *chan)
3d8a97eabef0883 Sowjanya Komatineni 2020-05-04  618  {
3d8a97eabef0883 Sowjanya Komatineni 2020-05-04  619  	v4l2_ctrl_handler_free(&chan->ctrl_handler);
3d8a97eabef0883 Sowjanya Komatineni 2020-05-04  620  	media_entity_cleanup(&chan->video.entity);
3d8a97eabef0883 Sowjanya Komatineni 2020-05-04 @621  	host1x_syncpt_free(chan->mw_ack_sp);
3d8a97eabef0883 Sowjanya Komatineni 2020-05-04  622  	host1x_syncpt_free(chan->frame_start_sp);
3d8a97eabef0883 Sowjanya Komatineni 2020-05-04  623  	mutex_destroy(&chan->video_lock);
3d8a97eabef0883 Sowjanya Komatineni 2020-05-04  624  }
3d8a97eabef0883 Sowjanya Komatineni 2020-05-04  625  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 76214 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 09/20] gpu: host1x: DMA fences and userspace fence creation
  2020-10-07 17:12 ` [PATCH v3 09/20] gpu: host1x: DMA fences and userspace fence creation Mikko Perttunen
@ 2020-10-07 23:13   ` kernel test robot
  2020-10-08 11:13   ` kernel test robot
  1 sibling, 0 replies; 35+ messages in thread
From: kernel test robot @ 2020-10-07 23:13 UTC (permalink / raw)
  To: Mikko Perttunen, thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: kbuild-all, linux-tegra, talho, bhuntsman, dri-devel, Mikko Perttunen


[-- Attachment #1: Type: text/plain, Size: 2343 bytes --]

Hi Mikko,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on tegra-drm/drm/tegra/for-next]
[also build test WARNING on tegra/for-next linus/master v5.9-rc8 next-20201007]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Mikko-Perttunen/Host1x-TegraDRM-UAPI/20201008-034403
base:   git://anongit.freedesktop.org/tegra/linux.git drm/tegra/for-next
config: arm64-randconfig-r004-20201008 (attached as .config)
compiler: aarch64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/c4f5ec983027f2b19e6854a362e23a79e1630100
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Mikko-Perttunen/Host1x-TegraDRM-UAPI/20201008-034403
        git checkout c4f5ec983027f2b19e6854a362e23a79e1630100
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arm64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> drivers/gpu/host1x/fence.c:105:6: warning: no previous prototype for 'host1x_fence_signal' [-Wmissing-prototypes]
     105 | void host1x_fence_signal(struct host1x_syncpt_fence *f)
         |      ^~~~~~~~~~~~~~~~~~~

vim +/host1x_fence_signal +105 drivers/gpu/host1x/fence.c

   104	
 > 105	void host1x_fence_signal(struct host1x_syncpt_fence *f)
   106	{
   107		if (atomic_xchg(&f->signaling, 1))
   108			return;
   109	
   110		/*
   111		 * Cancel pending timeout work - if it races, it will
   112		 * not get 'f->signaling' and return.
   113		 */
   114		cancel_delayed_work_sync(&f->timeout_work);
   115	
   116		host1x_intr_put_ref(f->sp->host, f->sp->id, f->waiter_ref);
   117	
   118		dma_fence_signal(&f->base);
   119		dma_fence_put(&f->base);
   120	}
   121	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 38175 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 19/20] drm/tegra: Implement new UAPI
  2020-10-07 17:12 ` [PATCH v3 19/20] drm/tegra: Implement new UAPI Mikko Perttunen
@ 2020-10-08  3:42   ` kernel test robot
  2020-10-19  2:21   ` Dmitry Osipenko
  1 sibling, 0 replies; 35+ messages in thread
From: kernel test robot @ 2020-10-08  3:42 UTC (permalink / raw)
  To: Mikko Perttunen, thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: kbuild-all, linux-tegra, talho, bhuntsman, dri-devel, Mikko Perttunen


[-- Attachment #1: Type: text/plain, Size: 3586 bytes --]

Hi Mikko,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tegra-drm/drm/tegra/for-next]
[also build test ERROR on tegra/for-next linus/master v5.9-rc8 next-20201007]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Mikko-Perttunen/Host1x-TegraDRM-UAPI/20201008-034403
base:   git://anongit.freedesktop.org/tegra/linux.git drm/tegra/for-next
config: arm64-randconfig-r004-20201008 (attached as .config)
compiler: aarch64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/6a3b3d79ce4488695cc0745edd19015fc2220d97
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Mikko-Perttunen/Host1x-TegraDRM-UAPI/20201008-034403
        git checkout 6a3b3d79ce4488695cc0745edd19015fc2220d97
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arm64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

   In file included from drivers/gpu/drm/tegra/uapi/uapi.c:12:
>> drivers/gpu/drm/tegra/uapi/../drm.h:84:1: error: attempted to randomize userland API struct tegra_drm_client_ops
      84 | };
         | ^
>> drivers/gpu/drm/tegra/uapi/uapi.c:62:5: warning: no previous prototype for 'close_channel_ctx' [-Wmissing-prototypes]
      62 | int close_channel_ctx(int id, void *p, void *data)
         |     ^~~~~~~~~~~~~~~~~
--
   In file included from drivers/gpu/drm/tegra/uapi/submit.c:18:
>> drivers/gpu/drm/tegra/uapi/../drm.h:84:1: error: attempted to randomize userland API struct tegra_drm_client_ops
      84 | };
         | ^

vim +84 drivers/gpu/drm/tegra/uapi/../drm.h

d43f81cbaf4353 drivers/gpu/host1x/drm/drm.h Terje Bergstrom 2013-03-22  74  
53fa7f7204c97d drivers/gpu/host1x/drm/drm.h Thierry Reding  2013-09-24  75  struct tegra_drm_client_ops {
53fa7f7204c97d drivers/gpu/host1x/drm/drm.h Thierry Reding  2013-09-24  76  	int (*open_channel)(struct tegra_drm_client *client,
c88c363072c6dc drivers/gpu/host1x/drm/drm.h Thierry Reding  2013-09-26  77  			    struct tegra_drm_context *context);
c88c363072c6dc drivers/gpu/host1x/drm/drm.h Thierry Reding  2013-09-26  78  	void (*close_channel)(struct tegra_drm_context *context);
c40f0f1afcb1dc drivers/gpu/drm/tegra/drm.h  Thierry Reding  2013-10-10  79  	int (*is_addr_reg)(struct device *dev, u32 class, u32 offset);
0f563a4bf66e51 drivers/gpu/drm/tegra/drm.h  Dmitry Osipenko 2017-06-15  80  	int (*is_valid_class)(u32 class);
c88c363072c6dc drivers/gpu/host1x/drm/drm.h Thierry Reding  2013-09-26  81  	int (*submit)(struct tegra_drm_context *context,
d43f81cbaf4353 drivers/gpu/host1x/drm/drm.h Terje Bergstrom 2013-03-22  82  		      struct drm_tegra_submit *args, struct drm_device *drm,
d43f81cbaf4353 drivers/gpu/host1x/drm/drm.h Terje Bergstrom 2013-03-22  83  		      struct drm_file *file);
d43f81cbaf4353 drivers/gpu/host1x/drm/drm.h Terje Bergstrom 2013-03-22 @84  };
d43f81cbaf4353 drivers/gpu/host1x/drm/drm.h Terje Bergstrom 2013-03-22  85  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 38175 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 09/20] gpu: host1x: DMA fences and userspace fence creation
  2020-10-07 17:12 ` [PATCH v3 09/20] gpu: host1x: DMA fences and userspace fence creation Mikko Perttunen
  2020-10-07 23:13   ` kernel test robot
@ 2020-10-08 11:13   ` kernel test robot
  1 sibling, 0 replies; 35+ messages in thread
From: kernel test robot @ 2020-10-08 11:13 UTC (permalink / raw)
  To: Mikko Perttunen, thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: kbuild-all, clang-built-linux, linux-tegra, talho, bhuntsman,
	dri-devel, Mikko Perttunen


[-- Attachment #1: Type: text/plain, Size: 3567 bytes --]

Hi Mikko,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on tegra-drm/drm/tegra/for-next]
[also build test WARNING on tegra/for-next linus/master v5.9-rc8 next-20201007]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Mikko-Perttunen/Host1x-TegraDRM-UAPI/20201008-034403
base:   git://anongit.freedesktop.org/tegra/linux.git drm/tegra/for-next
config: arm64-randconfig-r021-20201008 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 4d1d8ae7100ec3c7e1709addb7b3ec6f9ad0b44f)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm64 cross compiling tool for clang build
        # apt-get install binutils-aarch64-linux-gnu
        # https://github.com/0day-ci/linux/commit/c4f5ec983027f2b19e6854a362e23a79e1630100
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Mikko-Perttunen/Host1x-TegraDRM-UAPI/20201008-034403
        git checkout c4f5ec983027f2b19e6854a362e23a79e1630100
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> drivers/gpu/host1x/uapi.c:227:8: warning: variable 'file' is uninitialized when used here [-Wuninitialized]
                   fput(file->file);
                        ^~~~
   drivers/gpu/host1x/uapi.c:204:24: note: initialize the variable 'file' to silence this warning
           struct sync_file *file;
                                 ^
                                  = NULL
   1 warning generated.
--
>> drivers/gpu/host1x/fence.c:105:6: warning: no previous prototype for function 'host1x_fence_signal' [-Wmissing-prototypes]
   void host1x_fence_signal(struct host1x_syncpt_fence *f)
        ^
   drivers/gpu/host1x/fence.c:105:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void host1x_fence_signal(struct host1x_syncpt_fence *f)
   ^
   static 
   1 warning generated.

vim +/file +227 drivers/gpu/host1x/uapi.c

   199	
   200	static int dev_file_ioctl_create_fence(struct host1x *host1x, void __user *data)
   201	{
   202		struct host1x_create_fence args;
   203		unsigned long copy_err;
   204		struct sync_file *file;
   205		int fd;
   206	
   207		copy_err = copy_from_user(&args, data, sizeof(args));
   208		if (copy_err)
   209			return -EFAULT;
   210	
   211		if (args.reserved[0])
   212			return -EINVAL;
   213	
   214		if (args.id >= host1x_syncpt_nb_pts(host1x))
   215			return -EINVAL;
   216	
   217		args.id = array_index_nospec(args.id, host1x_syncpt_nb_pts(host1x));
   218	
   219		fd = host1x_fence_create_fd(&host1x->syncpt[args.id], args.threshold);
   220		if (fd < 0)
   221			return fd;
   222	
   223		args.fence_fd = fd;
   224	
   225		copy_err = copy_to_user(data, &args, sizeof(args));
   226		if (copy_err) {
 > 227			fput(file->file);
   228			return -EFAULT;
   229		}
   230	
   231		return 0;
   232	}
   233	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 35904 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 19/20] drm/tegra: Implement new UAPI
  2020-10-07 17:12 ` [PATCH v3 19/20] drm/tegra: Implement new UAPI Mikko Perttunen
  2020-10-08  3:42   ` kernel test robot
@ 2020-10-19  2:21   ` Dmitry Osipenko
  2020-10-19  8:13     ` Mikko Perttunen
  1 sibling, 1 reply; 35+ messages in thread
From: Dmitry Osipenko @ 2020-10-19  2:21 UTC (permalink / raw)
  To: Mikko Perttunen, thierry.reding, jonathanh, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

07.10.2020 20:12, Mikko Perttunen пишет:
> +int tegra_drm_ioctl_channel_map(struct drm_device *drm, void *data,
> +				struct drm_file *file)
> +{

Hello, Mikko!

Could you please tell what are the host1x clients that are going to be
upstreamed and will need this IOCTL?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 19/20] drm/tegra: Implement new UAPI
  2020-10-19  2:21   ` Dmitry Osipenko
@ 2020-10-19  8:13     ` Mikko Perttunen
  2020-10-19 17:27       ` Dmitry Osipenko
  0 siblings, 1 reply; 35+ messages in thread
From: Mikko Perttunen @ 2020-10-19  8:13 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

On 10/19/20 5:21 AM, Dmitry Osipenko wrote:
> 07.10.2020 20:12, Mikko Perttunen пишет:
>> +int tegra_drm_ioctl_channel_map(struct drm_device *drm, void *data,
>> +				struct drm_file *file)
>> +{
> 
> Hello, Mikko!
> 
> Could you please tell what are the host1x clients that are going to be
> upstreamed and will need this IOCTL?
> 

Hi Dmitry!

It is needed for any engine/job that wants to access memory, as this 
IOCTL must be used to map memory for the engine. So all of them.

Downstream doesn't have an equivalent IOCTL because it (currently) does 
mapping at submit time, but that is suboptimal because

- it requires doing relocations in the kernel which isn't required for T186+
- it's a big performance penalty, due to which the downstream kernel has 
the "deferred dma-buf unmapping" feature, where unmapping a dma_buf may 
not immediately unmap it in case it's used later, so that the "mapping" 
later is faster. A feature which we'd preferably get rid of.
- because of the above feature not being controlled by the user, it can 
cause variance in submit times.

On the other hand, we cannot (at least always) do the mapping at 
allocation/import time, because

- A single FD may have multiple channel_ctx's, and an allocation/import 
may need to be used in any subset of them
- The import IOCTL is fixed and doesn't have the parameters we'd need to 
do this at import time
- Overall it's more orthogonal to have GEM object acquirement in one 
step and mapping in another.

Maybe that's not quite what you asked, but it's some background anyway :)

Cheers,
Mikko

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 19/20] drm/tegra: Implement new UAPI
  2020-10-19  8:13     ` Mikko Perttunen
@ 2020-10-19 17:27       ` Dmitry Osipenko
  2020-10-20  9:18         ` Mikko Perttunen
  2020-10-20 11:40         ` Daniel Vetter
  0 siblings, 2 replies; 35+ messages in thread
From: Dmitry Osipenko @ 2020-10-19 17:27 UTC (permalink / raw)
  To: Mikko Perttunen, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

19.10.2020 11:13, Mikko Perttunen пишет:
> On 10/19/20 5:21 AM, Dmitry Osipenko wrote:
>> 07.10.2020 20:12, Mikko Perttunen пишет:
>>> +int tegra_drm_ioctl_channel_map(struct drm_device *drm, void *data,
>>> +                struct drm_file *file)
>>> +{
>>
>> Hello, Mikko!
>>
>> Could you please tell what are the host1x clients that are going to be
>> upstreamed and will need this IOCTL?
>>
> 
> Hi Dmitry!
> 
> It is needed for any engine/job that wants to access memory, as this
> IOCTL must be used to map memory for the engine. So all of them.
> 
> Downstream doesn't have an equivalent IOCTL because it (currently) does
> mapping at submit time, but that is suboptimal because
> 
> - it requires doing relocations in the kernel which isn't required for
> T186+
> - it's a big performance penalty, due to which the downstream kernel has
> the "deferred dma-buf unmapping" feature, where unmapping a dma_buf may
> not immediately unmap it in case it's used later, so that the "mapping"
> later is faster. A feature which we'd preferably get rid of.
> - because of the above feature not being controlled by the user, it can
> cause variance in submit times.
> 
> On the other hand, we cannot (at least always) do the mapping at
> allocation/import time, because
> 
> - A single FD may have multiple channel_ctx's, and an allocation/import
> may need to be used in any subset of them
> - The import IOCTL is fixed and doesn't have the parameters we'd need to
> do this at import time
> - Overall it's more orthogonal to have GEM object acquirement in one
> step and mapping in another.
> 
> Maybe that's not quite what you asked, but it's some background anyway :)

I'm asking this question because right now there is only one potential
client for this IOCTL, the VIC. If other clients aren't supposed to be a
part of the DRM driver, like for example NVDEC which probably should be
a V4L driver, then DRM driver will have only a single VIC and in this
case we shouldn't need this IOCTL because DRM and V4L should use generic
dmabuf API for importing and exporting buffers.

I'm also not quite sure about whether the current model of the unified
Tegra DRM driver is suitable for having the separated engines. Perhaps
each separated engine should just have its own rendering node?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 19/20] drm/tegra: Implement new UAPI
  2020-10-19 17:27       ` Dmitry Osipenko
@ 2020-10-20  9:18         ` Mikko Perttunen
  2020-10-22  4:20           ` Dmitry Osipenko
  2020-10-20 11:40         ` Daniel Vetter
  1 sibling, 1 reply; 35+ messages in thread
From: Mikko Perttunen @ 2020-10-20  9:18 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

On 10/19/20 8:27 PM, Dmitry Osipenko wrote:
> 19.10.2020 11:13, Mikko Perttunen пишет:
>> On 10/19/20 5:21 AM, Dmitry Osipenko wrote:
>>> 07.10.2020 20:12, Mikko Perttunen пишет:
>>>> +int tegra_drm_ioctl_channel_map(struct drm_device *drm, void *data,
>>>> +                struct drm_file *file)
>>>> +{
>>>
>>> Hello, Mikko!
>>>
>>> Could you please tell what are the host1x clients that are going to be
>>> upstreamed and will need this IOCTL?
>>>
>>
>> Hi Dmitry!
>>
>> It is needed for any engine/job that wants to access memory, as this
>> IOCTL must be used to map memory for the engine. So all of them.
>>
>> Downstream doesn't have an equivalent IOCTL because it (currently) does
>> mapping at submit time, but that is suboptimal because
>>
>> - it requires doing relocations in the kernel which isn't required for
>> T186+
>> - it's a big performance penalty, due to which the downstream kernel has
>> the "deferred dma-buf unmapping" feature, where unmapping a dma_buf may
>> not immediately unmap it in case it's used later, so that the "mapping"
>> later is faster. A feature which we'd preferably get rid of.
>> - because of the above feature not being controlled by the user, it can
>> cause variance in submit times.
>>
>> On the other hand, we cannot (at least always) do the mapping at
>> allocation/import time, because
>>
>> - A single FD may have multiple channel_ctx's, and an allocation/import
>> may need to be used in any subset of them
>> - The import IOCTL is fixed and doesn't have the parameters we'd need to
>> do this at import time
>> - Overall it's more orthogonal to have GEM object acquirement in one
>> step and mapping in another.
>>
>> Maybe that's not quite what you asked, but it's some background anyway :)
> 
> I'm asking this question because right now there is only one potential
> client for this IOCTL, the VIC. If other clients aren't supposed to be a
> part of the DRM driver, like for example NVDEC which probably should be
> a V4L driver, then DRM driver will have only a single VIC and in this
> case we shouldn't need this IOCTL because DRM and V4L should use generic
> dmabuf API for importing and exporting buffers.

This IOCTL is required for GR2D/GR3D too, as they need to access memory 
as well. This is a different step from import/export -- first you import 
or allocate your memory so you have a GEM handle, then you map it to the 
channel, which does the IOMMU mapping (if there is an IOMMU).

> 
> I'm also not quite sure about whether the current model of the unified
> Tegra DRM driver is suitable for having the separated engines. Perhaps
> each separated engine should just have its own rendering node?
> 

Yeah, having separate nodes per engine might be better, e.g. so it's 
easier to do access control. I don't have a strong opinion on that though.

Mikko

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 19/20] drm/tegra: Implement new UAPI
  2020-10-19 17:27       ` Dmitry Osipenko
  2020-10-20  9:18         ` Mikko Perttunen
@ 2020-10-20 11:40         ` Daniel Vetter
  2020-10-20 12:51           ` Mikko Perttunen
  1 sibling, 1 reply; 35+ messages in thread
From: Daniel Vetter @ 2020-10-20 11:40 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Mikko Perttunen, Mikko Perttunen, Thierry Reding, Jon Hunter,
	Dave Airlie, linux-tegra, dri-devel, talho, bhuntsman

On Mon, Oct 19, 2020 at 7:27 PM Dmitry Osipenko <digetx@gmail.com> wrote:
>
> 19.10.2020 11:13, Mikko Perttunen пишет:
> > On 10/19/20 5:21 AM, Dmitry Osipenko wrote:
> >> 07.10.2020 20:12, Mikko Perttunen пишет:
> >>> +int tegra_drm_ioctl_channel_map(struct drm_device *drm, void *data,
> >>> +                struct drm_file *file)
> >>> +{
> >>
> >> Hello, Mikko!
> >>
> >> Could you please tell what are the host1x clients that are going to be
> >> upstreamed and will need this IOCTL?
> >>
> >
> > Hi Dmitry!
> >
> > It is needed for any engine/job that wants to access memory, as this
> > IOCTL must be used to map memory for the engine. So all of them.
> >
> > Downstream doesn't have an equivalent IOCTL because it (currently) does
> > mapping at submit time, but that is suboptimal because
> >
> > - it requires doing relocations in the kernel which isn't required for
> > T186+
> > - it's a big performance penalty, due to which the downstream kernel has
> > the "deferred dma-buf unmapping" feature, where unmapping a dma_buf may
> > not immediately unmap it in case it's used later, so that the "mapping"
> > later is faster. A feature which we'd preferably get rid of.
> > - because of the above feature not being controlled by the user, it can
> > cause variance in submit times.
> >
> > On the other hand, we cannot (at least always) do the mapping at
> > allocation/import time, because
> >
> > - A single FD may have multiple channel_ctx's, and an allocation/import
> > may need to be used in any subset of them
> > - The import IOCTL is fixed and doesn't have the parameters we'd need to
> > do this at import time
> > - Overall it's more orthogonal to have GEM object acquirement in one
> > step and mapping in another.
> >
> > Maybe that's not quite what you asked, but it's some background anyway :)
>
> I'm asking this question because right now there is only one potential
> client for this IOCTL, the VIC. If other clients aren't supposed to be a
> part of the DRM driver, like for example NVDEC which probably should be
> a V4L driver, then DRM driver will have only a single VIC and in this
> case we shouldn't need this IOCTL because DRM and V4L should use generic
> dmabuf API for importing and exporting buffers.

Randomly jumping in here ...

So if you have a drm driver with userspace in mesa3d already, the
usual approach is to have a libva implementation (ideally in mesa3d
too, using the gallium framework so that a lot of the boring
integration glue is taken care of already) directly on top of drm. No
v4l driver needed at all here.

And it sounds like this nvdec thing would fit that bill pretty neatly.

> I'm also not quite sure about whether the current model of the unified
> Tegra DRM driver is suitable for having the separated engines. Perhaps
> each separated engine should just have its own rendering node?

Above model using libva driver in userspace for nvdec would avoid this
issue too.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 19/20] drm/tegra: Implement new UAPI
  2020-10-20 11:40         ` Daniel Vetter
@ 2020-10-20 12:51           ` Mikko Perttunen
  0 siblings, 0 replies; 35+ messages in thread
From: Mikko Perttunen @ 2020-10-20 12:51 UTC (permalink / raw)
  To: Daniel Vetter, Dmitry Osipenko
  Cc: Mikko Perttunen, Thierry Reding, Jon Hunter, Dave Airlie,
	linux-tegra, dri-devel, talho, bhuntsman

On 10/20/20 2:40 PM, Daniel Vetter wrote:
> On Mon, Oct 19, 2020 at 7:27 PM Dmitry Osipenko <digetx@gmail.com> wrote:
>>
>> 19.10.2020 11:13, Mikko Perttunen пишет:
>>> On 10/19/20 5:21 AM, Dmitry Osipenko wrote:
>>>> 07.10.2020 20:12, Mikko Perttunen пишет:
>>>>> +int tegra_drm_ioctl_channel_map(struct drm_device *drm, void *data,
>>>>> +                struct drm_file *file)
>>>>> +{
>>>>
>>>> Hello, Mikko!
>>>>
>>>> Could you please tell what are the host1x clients that are going to be
>>>> upstreamed and will need this IOCTL?
>>>>
>>>
>>> Hi Dmitry!
>>>
>>> It is needed for any engine/job that wants to access memory, as this
>>> IOCTL must be used to map memory for the engine. So all of them.
>>>
>>> Downstream doesn't have an equivalent IOCTL because it (currently) does
>>> mapping at submit time, but that is suboptimal because
>>>
>>> - it requires doing relocations in the kernel which isn't required for
>>> T186+
>>> - it's a big performance penalty, due to which the downstream kernel has
>>> the "deferred dma-buf unmapping" feature, where unmapping a dma_buf may
>>> not immediately unmap it in case it's used later, so that the "mapping"
>>> later is faster. A feature which we'd preferably get rid of.
>>> - because of the above feature not being controlled by the user, it can
>>> cause variance in submit times.
>>>
>>> On the other hand, we cannot (at least always) do the mapping at
>>> allocation/import time, because
>>>
>>> - A single FD may have multiple channel_ctx's, and an allocation/import
>>> may need to be used in any subset of them
>>> - The import IOCTL is fixed and doesn't have the parameters we'd need to
>>> do this at import time
>>> - Overall it's more orthogonal to have GEM object acquirement in one
>>> step and mapping in another.
>>>
>>> Maybe that's not quite what you asked, but it's some background anyway :)
>>
>> I'm asking this question because right now there is only one potential
>> client for this IOCTL, the VIC. If other clients aren't supposed to be a
>> part of the DRM driver, like for example NVDEC which probably should be
>> a V4L driver, then DRM driver will have only a single VIC and in this
>> case we shouldn't need this IOCTL because DRM and V4L should use generic
>> dmabuf API for importing and exporting buffers.
> 
> Randomly jumping in here ...
> 
> So if you have a drm driver with userspace in mesa3d already, the
> usual approach is to have a libva implementation (ideally in mesa3d
> too, using the gallium framework so that a lot of the boring
> integration glue is taken care of already) directly on top of drm. No
> v4l driver needed at all here.
> 
> And it sounds like this nvdec thing would fit that bill pretty neatly.
Something like this would be my preference as well.

Mikko

> 
>> I'm also not quite sure about whether the current model of the unified
>> Tegra DRM driver is suitable for having the separated engines. Perhaps
>> each separated engine should just have its own rendering node?
> 
> Above model using libva driver in userspace for nvdec would avoid this
> issue too.
> -Daniel
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 19/20] drm/tegra: Implement new UAPI
  2020-10-20  9:18         ` Mikko Perttunen
@ 2020-10-22  4:20           ` Dmitry Osipenko
  2020-10-26  9:11             ` Mikko Perttunen
  0 siblings, 1 reply; 35+ messages in thread
From: Dmitry Osipenko @ 2020-10-22  4:20 UTC (permalink / raw)
  To: Mikko Perttunen, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

20.10.2020 12:18, Mikko Perttunen пишет:
>> I'm asking this question because right now there is only one potential
>> client for this IOCTL, the VIC. If other clients aren't supposed to be a
>> part of the DRM driver, like for example NVDEC which probably should be
>> a V4L driver, then DRM driver will have only a single VIC and in this
>> case we shouldn't need this IOCTL because DRM and V4L should use generic
>> dmabuf API for importing and exporting buffers.
> 
> This IOCTL is required for GR2D/GR3D too, as they need to access memory
> as well. This is a different step from import/export -- first you import
> or allocate your memory so you have a GEM handle, then you map it to the
> channel, which does the IOMMU mapping (if there is an IOMMU).
> 

This doesn't answer my question. I don't have a full picture and for now
will remain dubious about this IOCTL, but it should be fine to have it
in a form of WIP patches (maybe staging feature) until userspace code
and hardware specs will become available.

Some more comments:

1. Older Tegra SoCs do not have restrictions which do not allow to group
IOMMU as wished by kernel driver. It's fine to have one static mapping
per-GEM that can be accessed by all DRM devices, that's why CHANNEL_MAP
is questionable.

2. IIUC, the mappings need to be done per-device group/stream and not
per-channel_ctx. It looks like nothing stops channel contexts to guess
mapping addresses of the other context, isn't it?

I'm suggesting that each GEM should have a per-device mapping and the
new IOCTL should create these GEM-mappings, instead of the channel_ctx
mappings.

3. We shouldn't need channel contexts at all, a single "DRM file"
context should be enough to have.

4. The new UAPI need to be separated into several parts in the next
revision, one patch for each new feature.

The first patches should be the ones that are relevant to the existing
userspace code, like support for the waits.

Partial mappings should be a separate feature because it's a
questionable feature that needs to be proved by a real userspace first.
Maybe it would be even better to drop it for the starter, to ease reviewing.

Waiting for fences on host1x should be a separate feature.

The syncfile support needs to be a separate feature as well because I
don't see a use-case for it right now.

I'd like to see the DRM_SCHED and syncobj support. I can help you with
it if it's out of yours scope for now.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 19/20] drm/tegra: Implement new UAPI
  2020-10-22  4:20           ` Dmitry Osipenko
@ 2020-10-26  9:11             ` Mikko Perttunen
  2020-10-27 19:06               ` Dmitry Osipenko
  0 siblings, 1 reply; 35+ messages in thread
From: Mikko Perttunen @ 2020-10-26  9:11 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman



On 10/22/20 7:20 AM, Dmitry Osipenko wrote:
> 20.10.2020 12:18, Mikko Perttunen пишет:
>>> I'm asking this question because right now there is only one potential
>>> client for this IOCTL, the VIC. If other clients aren't supposed to be a
>>> part of the DRM driver, like for example NVDEC which probably should be
>>> a V4L driver, then DRM driver will have only a single VIC and in this
>>> case we shouldn't need this IOCTL because DRM and V4L should use generic
>>> dmabuf API for importing and exporting buffers.
>>
>> This IOCTL is required for GR2D/GR3D too, as they need to access memory
>> as well. This is a different step from import/export -- first you import
>> or allocate your memory so you have a GEM handle, then you map it to the
>> channel, which does the IOMMU mapping (if there is an IOMMU).
>>
> 
> This doesn't answer my question. I don't have a full picture and for now
> will remain dubious about this IOCTL, but it should be fine to have it
> in a form of WIP patches (maybe staging feature) until userspace code
> and hardware specs will become available.
> 
> Some more comments:
> 
> 1. Older Tegra SoCs do not have restrictions which do not allow to group
> IOMMU as wished by kernel driver. It's fine to have one static mapping
> per-GEM that can be accessed by all DRM devices, that's why CHANNEL_MAP
> is questionable.

Sure, on older Tegras this is not strictly necessary because the 
firewall can check pointers. It's not related to IOMMU grouping.

> 
> 2. IIUC, the mappings need to be done per-device group/stream and not
> per-channel_ctx. It looks like nothing stops channel contexts to guess
> mapping addresses of the other context, isn't it?
> 
> I'm suggesting that each GEM should have a per-device mapping and the
> new IOCTL should create these GEM-mappings, instead of the channel_ctx
> mappings.

In the absence of context isolation, this is correct. But with context 
isolation (which is next on my upstream todo-list), on supported chips 
(T186+), we can map to individual contexts, which are associated with 
channel_ctx's.

Without context isolation, this IOCTL just maps to the underlying 
engine. The list of mappings can also be used by the firewall later - as 
mentioned, just the relocs would be enough for that, but I think there's 
a good orthogonality in CHANNEL_MAP describing memory regions accessible 
by the engine, and relocations just doing relocations.

> 
> 3. We shouldn't need channel contexts at all, a single "DRM file"
> context should be enough to have.

Yeah, maybe we could just have one "inline" channel context in the DRM 
file, that's just initialized by the CHANNEL_OPEN IOCTL.

> 
> 4. The new UAPI need to be separated into several parts in the next
> revision, one patch for each new feature.

I'll try to split up where possible.

> 
> The first patches should be the ones that are relevant to the existing
> userspace code, like support for the waits.

Can you elaborate what you mean by this?

> 
> Partial mappings should be a separate feature because it's a
> questionable feature that needs to be proved by a real userspace first.
> Maybe it would be even better to drop it for the starter, to ease reviewing.

Considering that the "no-op" support for it (map the whole buffer but 
just keep track of the starting offset) is only a couple of lines, I'd 
like to keep it in.

> 
> Waiting for fences on host1x should be a separate feature.

OK.

> 
> The syncfile support needs to be a separate feature as well because I
> don't see a use-case for it right now.

I'll separate it - the reason it's there is to avoid the overhead of the 
extra ID/threshold -> sync_file conversion IOCTL if you need it.

> 
> I'd like to see the DRM_SCHED and syncobj support. I can help you with
> it if it's out of yours scope for now.
> 

I already wrote some code for syncobj I can probably pull in. Regarding 
DRM_SCHED, help is accepted. However, we should keep using the hardware 
scheduler, and considering it's a bigger piece of work, let's not block 
this series on it.

Mikko

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 19/20] drm/tegra: Implement new UAPI
  2020-10-26  9:11             ` Mikko Perttunen
@ 2020-10-27 19:06               ` Dmitry Osipenko
  2020-10-28  9:54                 ` Mikko Perttunen
  0 siblings, 1 reply; 35+ messages in thread
From: Dmitry Osipenko @ 2020-10-27 19:06 UTC (permalink / raw)
  To: Mikko Perttunen, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

26.10.2020 12:11, Mikko Perttunen пишет:
>>
>> The first patches should be the ones that are relevant to the existing
>> userspace code, like support for the waits.
> 
> Can you elaborate what you mean by this?

All features that don't have an immediate real use-case should be placed
later in the series because we may defer merging of those patches until
we will see userspace that uses those features since we can't really
decide whether these are decisions that we won't regret later on, only
practical application can confirm the correctness.

>> Partial mappings should be a separate feature because it's a
>> questionable feature that needs to be proved by a real userspace first.
>> Maybe it would be even better to drop it for the starter, to ease
>> reviewing.
> 
> Considering that the "no-op" support for it (map the whole buffer but
> just keep track of the starting offset) is only a couple of lines, I'd
> like to keep it in.

There is no tracking in the current code which prevents the duplicated
mappings, will we need to care about it? This a bit too questionable
feature for now, IMO. I'd like to see it as a separate patch.

...
>> I'd like to see the DRM_SCHED and syncobj support. I can help you with
>> it if it's out of yours scope for now.
>>
> 
> I already wrote some code for syncobj I can probably pull in. Regarding
> DRM_SCHED, help is accepted. However, we should keep using the hardware
> scheduler, and considering it's a bigger piece of work, let's not block
> this series on it.

I'd like to see all the custom IOCTLs to be deprecated and replaced with
the generic DRM API wherever possible. Hence, I think it should be a
mandatory features that we need to focus on. The current WIP userspace
already uses and relies on DRM_SCHED.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 19/20] drm/tegra: Implement new UAPI
  2020-10-27 19:06               ` Dmitry Osipenko
@ 2020-10-28  9:54                 ` Mikko Perttunen
  0 siblings, 0 replies; 35+ messages in thread
From: Mikko Perttunen @ 2020-10-28  9:54 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

On 10/27/20 9:06 PM, Dmitry Osipenko wrote:
> 26.10.2020 12:11, Mikko Perttunen пишет:
>>>
>>> The first patches should be the ones that are relevant to the existing
>>> userspace code, like support for the waits.
>>
>> Can you elaborate what you mean by this?
> 
> All features that don't have an immediate real use-case should be placed
> later in the series because we may defer merging of those patches until
> we will see userspace that uses those features since we can't really
> decide whether these are decisions that we won't regret later on, only
> practical application can confirm the correctness.

I was more referring to the "support for waits" part, should have 
clarified that.

> 
>>> Partial mappings should be a separate feature because it's a
>>> questionable feature that needs to be proved by a real userspace first.
>>> Maybe it would be even better to drop it for the starter, to ease
>>> reviewing.
>>
>> Considering that the "no-op" support for it (map the whole buffer but
>> just keep track of the starting offset) is only a couple of lines, I'd
>> like to keep it in.
> 
> There is no tracking in the current code which prevents the duplicated
> mappings, will we need to care about it? This a bit too questionable
> feature for now, IMO. I'd like to see it as a separate patch.

I don't think there is any need to special case duplicated mappings. I 
think this is a pretty obvious feature to have, but I can rename them to 
reserved0 and reserved1 and require that reserved0 is zero and reserved1 
is the size of the passed GEM object.

> 
> ...
>>> I'd like to see the DRM_SCHED and syncobj support. I can help you with
>>> it if it's out of yours scope for now.
>>>
>>
>> I already wrote some code for syncobj I can probably pull in. Regarding
>> DRM_SCHED, help is accepted. However, we should keep using the hardware
>> scheduler, and considering it's a bigger piece of work, let's not block
>> this series on it.
> 
> I'd like to see all the custom IOCTLs to be deprecated and replaced with
> the generic DRM API wherever possible. Hence, I think it should be a
> mandatory features that we need to focus on. The current WIP userspace
> already uses and relies on DRM_SCHED.
> 

 From my point of view, the ABI needs to be designed such that it can 
replace what we have downstream, i.e. it needs to support the usecases 
the downstream nvhost ABI supports currently. Otherwise there is no 
migration path to upstream and it's not worth it for me to work on this.

Although, I don't see what this ABI is missing that your userspace would 
rely on. Does it submit jobs in reverse order that would deadlock if 
drm_sched didn't reorder them based on prefences, or something?

Software-based scheduling might make sense in situations where the 
channel is shared by all processes, but at least for modern chips that 
are designed to use hardware scheduling, I want to use that capability.

Mikko

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, back to index

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-07 17:12 [PATCH v3 00/20] Host1x/TegraDRM UAPI Mikko Perttunen
2020-10-07 17:12 ` [PATCH v3 01/20] gpu: host1x: Use different lock classes for each client Mikko Perttunen
2020-10-07 17:12 ` [PATCH v3 02/20] gpu: host1x: Allow syncpoints without associated client Mikko Perttunen
2020-10-07 17:12 ` [PATCH v3 03/20] gpu: host1x: Show number of pending waiters in debugfs Mikko Perttunen
2020-10-07 17:12 ` [PATCH v3 04/20] gpu: host1x: Remove cancelled waiters immediately Mikko Perttunen
2020-10-07 17:12 ` [PATCH v3 05/20] gpu: host1x: Use HW-equivalent syncpoint expiration check Mikko Perttunen
2020-10-07 17:12 ` [PATCH v3 06/20] gpu: host1x: Cleanup and refcounting for syncpoints Mikko Perttunen
2020-10-07 22:23   ` kernel test robot
2020-10-07 17:12 ` [PATCH v3 07/20] gpu: host1x: Introduce UAPI header Mikko Perttunen
2020-10-07 17:12 ` [PATCH v3 08/20] gpu: host1x: Implement /dev/host1x device node Mikko Perttunen
2020-10-07 17:12 ` [PATCH v3 09/20] gpu: host1x: DMA fences and userspace fence creation Mikko Perttunen
2020-10-07 23:13   ` kernel test robot
2020-10-08 11:13   ` kernel test robot
2020-10-07 17:12 ` [PATCH v3 10/20] gpu: host1x: Add no-recovery mode Mikko Perttunen
2020-10-07 17:12 ` [PATCH v3 11/20] gpu: host1x: Add job release callback Mikko Perttunen
2020-10-07 17:12 ` [PATCH v3 12/20] gpu: host1x: Add support for syncpoint waits in CDMA pushbuffer Mikko Perttunen
2020-10-07 17:12 ` [PATCH v3 13/20] gpu: host1x: Reset max value when freeing a syncpoint Mikko Perttunen
2020-10-07 17:12 ` [PATCH v3 14/20] gpu: host1x: Reserve VBLANK syncpoints at initialization Mikko Perttunen
2020-10-07 17:12 ` [PATCH v3 15/20] drm/tegra: Add new UAPI to header Mikko Perttunen
2020-10-07 17:12 ` [PATCH v3 16/20] drm/tegra: Boot VIC during runtime PM resume Mikko Perttunen
2020-10-07 17:12 ` [PATCH v3 17/20] drm/tegra: Set resv fields when importing/exporting GEMs Mikko Perttunen
2020-10-07 17:12 ` [PATCH v3 18/20] drm/tegra: Allocate per-engine channel in core code Mikko Perttunen
2020-10-07 17:12 ` [PATCH v3 19/20] drm/tegra: Implement new UAPI Mikko Perttunen
2020-10-08  3:42   ` kernel test robot
2020-10-19  2:21   ` Dmitry Osipenko
2020-10-19  8:13     ` Mikko Perttunen
2020-10-19 17:27       ` Dmitry Osipenko
2020-10-20  9:18         ` Mikko Perttunen
2020-10-22  4:20           ` Dmitry Osipenko
2020-10-26  9:11             ` Mikko Perttunen
2020-10-27 19:06               ` Dmitry Osipenko
2020-10-28  9:54                 ` Mikko Perttunen
2020-10-20 11:40         ` Daniel Vetter
2020-10-20 12:51           ` Mikko Perttunen
2020-10-07 17:12 ` [PATCH v3 20/20] drm/tegra: Add job firewall Mikko Perttunen

Linux-Tegra Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-tegra/0 linux-tegra/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-tegra linux-tegra/ https://lore.kernel.org/linux-tegra \
		linux-tegra@vger.kernel.org
	public-inbox-index linux-tegra

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-tegra


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git