* [RFC PATCH v2 00/17] Host1x/TegraDRM UAPI
From: Mikko Perttunen @ 2020-09-05 10:34 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Hi all,

Here's the second revision of the Host1x/TegraDRM UAPI proposal,
hopefully with most issues from v1 resolved, and now with
an implementation. The implementation still has open issues:

* Relocs are now handled on the TegraDRM side instead of in
  Host1x, so the firewall is not aware of them, and submission
  fails when the firewall is enabled. The proposed solution
  is to move the firewall to the TegraDRM side, but this hasn't
  been done yet.
* For the new UAPI, syncpoint recovery on job timeout is
  disabled. This means that upon job timeout, all further
  jobs using that syncpoint are cancelled, and the syncpoint
  is marked unusable until it is freed. However, there is
  currently a race between the timeout handler and job
  submission: a submission can observe the syncpoint in a
  non-locked state, and yet the handler's job cancellations
  miss the new job.
* Waiting for DMA reservation fences is not implemented yet.
* I have only tested on Tegra186.

The series consists of three parts:

* The first part contains some fixes and improvements to
  the Host1x driver of a more general nature,
* The second part adds the Host1x side UAPI, as well as
  Host1x-side changes needed for the new TegraDRM UAPI,
* The third part adds the new TegraDRM UAPI.

I have written some tests for the new interface;
see https://github.com/cyndis/uapi-test. Proper userspace
(e.g. opentegra, vdpau-tegra) will be ported once the UAPI
definition has settled to some degree.

The series can also be found at
https://github.com/cyndis/linux/commits/work/host1x-uapi.

Older versions:
v1: https://www.spinics.net/lists/linux-tegra/msg51000.html

Thank you,
Mikko

Mikko Perttunen (17):
  gpu: host1x: Use different lock classes for each client
  gpu: host1x: Allow syncpoints without associated client
  gpu: host1x: Show number of pending waiters in debugfs
  gpu: host1x: Remove cancelled waiters immediately
  gpu: host1x: Use HW-equivalent syncpoint expiration check
  gpu: host1x: Cleanup and refcounting for syncpoints
  gpu: host1x: Introduce UAPI header
  gpu: host1x: Implement /dev/host1x device node
  gpu: host1x: DMA fences and userspace fence creation
  WIP: gpu: host1x: Add no-recovery mode
  gpu: host1x: Add job release callback
  gpu: host1x: Add support for syncpoint waits in CDMA pushbuffer
  gpu: host1x: Reset max value when freeing a syncpoint
  drm/tegra: Add new UAPI to header
  drm/tegra: Add power_on/power_off engine callbacks
  drm/tegra: Allocate per-engine channel in core code
  WIP: drm/tegra: Implement new UAPI

 drivers/gpu/drm/tegra/Makefile      |   2 +
 drivers/gpu/drm/tegra/dc.c          |   4 +-
 drivers/gpu/drm/tegra/drm.c         |  75 ++-
 drivers/gpu/drm/tegra/drm.h         |  20 +-
 drivers/gpu/drm/tegra/gr2d.c        |   4 +-
 drivers/gpu/drm/tegra/gr3d.c        |   4 +-
 drivers/gpu/drm/tegra/uapi.h        |  59 +++
 drivers/gpu/drm/tegra/uapi/submit.c | 687 ++++++++++++++++++++++++++++
 drivers/gpu/drm/tegra/uapi/uapi.c   | 328 +++++++++++++
 drivers/gpu/drm/tegra/vic.c         | 131 +++---
 drivers/gpu/host1x/Makefile         |   2 +
 drivers/gpu/host1x/bus.c            |   7 +-
 drivers/gpu/host1x/cdma.c           |  53 ++-
 drivers/gpu/host1x/debug.c          |  14 +-
 drivers/gpu/host1x/dev.c            |   9 +
 drivers/gpu/host1x/dev.h            |  10 +-
 drivers/gpu/host1x/fence.c          | 207 +++++++++
 drivers/gpu/host1x/fence.h          |  15 +
 drivers/gpu/host1x/hw/cdma_hw.c     |   2 +-
 drivers/gpu/host1x/hw/channel_hw.c  |  67 ++-
 drivers/gpu/host1x/hw/debug_hw.c    |  11 +-
 drivers/gpu/host1x/intr.c           |  23 +-
 drivers/gpu/host1x/intr.h           |   2 +
 drivers/gpu/host1x/job.c            |  79 +++-
 drivers/gpu/host1x/job.h            |  14 +
 drivers/gpu/host1x/syncpt.c         | 137 +++---
 drivers/gpu/host1x/syncpt.h         |  21 +-
 drivers/gpu/host1x/uapi.c           | 381 +++++++++++++++
 drivers/gpu/host1x/uapi.h           |  22 +
 include/linux/host1x.h              |  40 +-
 include/uapi/drm/tegra_drm.h        | 431 +++++++++++++++--
 include/uapi/linux/host1x.h         | 134 ++++++
 32 files changed, 2718 insertions(+), 277 deletions(-)
 create mode 100644 drivers/gpu/drm/tegra/uapi.h
 create mode 100644 drivers/gpu/drm/tegra/uapi/submit.c
 create mode 100644 drivers/gpu/drm/tegra/uapi/uapi.c
 create mode 100644 drivers/gpu/host1x/fence.c
 create mode 100644 drivers/gpu/host1x/fence.h
 create mode 100644 drivers/gpu/host1x/uapi.c
 create mode 100644 drivers/gpu/host1x/uapi.h
 create mode 100644 include/uapi/linux/host1x.h

-- 
2.28.0



* [RFC PATCH v2 01/17] gpu: host1x: Use different lock classes for each client
From: Mikko Perttunen @ 2020-09-05 10:34 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

To avoid false lockdep warnings, give each client lock its own
lock class, with the key passed in from the registration site by
a macro.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/bus.c | 7 ++++---
 include/linux/host1x.h   | 9 ++++++++-
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/host1x/bus.c b/drivers/gpu/host1x/bus.c
index e201f62d62c0..4101f64bd545 100644
--- a/drivers/gpu/host1x/bus.c
+++ b/drivers/gpu/host1x/bus.c
@@ -714,13 +714,14 @@ EXPORT_SYMBOL(host1x_driver_unregister);
  * device and call host1x_device_init(), which will in turn call each client's
  * &host1x_client_ops.init implementation.
  */
-int host1x_client_register(struct host1x_client *client)
+int __host1x_client_register(struct host1x_client *client,
+			   struct lock_class_key *key)
 {
 	struct host1x *host1x;
 	int err;
 
 	INIT_LIST_HEAD(&client->list);
-	mutex_init(&client->lock);
+	__mutex_init(&client->lock, "host1x client lock", key);
 	client->usecount = 0;
 
 	mutex_lock(&devices_lock);
@@ -741,7 +742,7 @@ int host1x_client_register(struct host1x_client *client)
 
 	return 0;
 }
-EXPORT_SYMBOL(host1x_client_register);
+EXPORT_SYMBOL(__host1x_client_register);
 
 /**
  * host1x_client_unregister() - unregister a host1x client
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index 20c885d0bddc..f711fc0154f4 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -320,7 +320,14 @@ static inline struct host1x_device *to_host1x_device(struct device *dev)
 int host1x_device_init(struct host1x_device *device);
 int host1x_device_exit(struct host1x_device *device);
 
-int host1x_client_register(struct host1x_client *client);
+int __host1x_client_register(struct host1x_client *client,
+			     struct lock_class_key *key);
+#define host1x_client_register(class) \
+	({ \
+		static struct lock_class_key __key; \
+		__host1x_client_register(class, &__key); \
+	})
+
 int host1x_client_unregister(struct host1x_client *client);
 
 int host1x_client_suspend(struct host1x_client *client);
-- 
2.28.0
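
The per-call-site key pattern used by this patch can be tried outside the
kernel. The following is only a minimal userspace sketch, assuming a compiler
with GNU C statement expressions (GCC or Clang). All demo_* names, the
site_key variable and the stand-in struct lock_class_key are hypothetical and
are not host1x or lockdep API; the sketch just shows that each expansion of
the macro yields a distinct static key, while repeated calls through the same
site reuse that key.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct lock_class_key; only its address matters. */
struct lock_class_key {
	char unused;
};

/* Hypothetical client type: remembers which key its lock was set up with. */
struct demo_client {
	const struct lock_class_key *lock_key;
};

/* Analogue of __host1x_client_register(): the caller supplies the key. */
static int demo_client_register_keyed(struct demo_client *client,
				      const struct lock_class_key *key)
{
	assert(client != NULL && key != NULL);
	client->lock_key = key;
	return 0;
}

/*
 * Analogue of the host1x_client_register() macro. Every place the macro is
 * expanded gets its own function-local static key (GNU statement expression).
 */
#define demo_client_register(client) \
	({ \
		static struct lock_class_key site_key; \
		demo_client_register_keyed(client, &site_key); \
	})

/* Two hypothetical drivers, i.e. two distinct expansion sites. */
static int demo_vic_probe(struct demo_client *client)
{
	return demo_client_register(client);
}

static int demo_gr2d_probe(struct demo_client *client)
{
	return demo_client_register(client);
}
```

Calling demo_vic_probe() and demo_gr2d_probe() on different clients leaves
them with different lock_key pointers, which is the property lockdep needs in
order to treat the two locks as separate classes.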



* [RFC PATCH v2 02/17] gpu: host1x: Allow syncpoints without associated client
From: Mikko Perttunen @ 2020-09-05 10:34 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Syncpoints don't need to be associated with any client,
so remove the property, and expose host1x_syncpt_alloc.
This will allow allocating a syncpoint without prior knowledge
of the engine that it will be used with.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/syncpt.c | 8 +++-----
 drivers/gpu/host1x/syncpt.h | 6 +++++-
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index fce7892d5137..7cb80d4768b1 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -42,9 +42,9 @@ static void host1x_syncpt_base_free(struct host1x_syncpt_base *base)
 		base->requested = false;
 }
 
-static struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
-						 struct host1x_client *client,
-						 unsigned long flags)
+struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
+					  struct host1x_client *client,
+					  unsigned long flags)
 {
 	struct host1x_syncpt *sp = host->syncpt;
 	unsigned int i;
@@ -69,7 +69,6 @@ static struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
 	if (!name)
 		goto free_base;
 
-	sp->client = client;
 	sp->name = name;
 
 	if (flags & HOST1X_SYNCPT_CLIENT_MANAGED)
@@ -447,7 +446,6 @@ void host1x_syncpt_free(struct host1x_syncpt *sp)
 	host1x_syncpt_base_free(sp->base);
 	kfree(sp->name);
 	sp->base = NULL;
-	sp->client = NULL;
 	sp->name = NULL;
 	sp->client_managed = false;
 
diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h
index 8e1d04dacaa0..77e7206cc316 100644
--- a/drivers/gpu/host1x/syncpt.h
+++ b/drivers/gpu/host1x/syncpt.h
@@ -33,7 +33,6 @@ struct host1x_syncpt {
 	const char *name;
 	bool client_managed;
 	struct host1x *host;
-	struct host1x_client *client;
 	struct host1x_syncpt_base *base;
 
 	/* interrupt data */
@@ -113,4 +112,9 @@ static inline int host1x_syncpt_is_valid(struct host1x_syncpt *sp)
 	return sp->id < host1x_syncpt_nb_pts(sp->host);
 }
 
+/* Allocate a syncpoint. */
+struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
+					  struct host1x_client *client,
+					  unsigned long flags);
+
 #endif
-- 
2.28.0



* [RFC PATCH v2 03/17] gpu: host1x: Show number of pending waiters in debugfs
From: Mikko Perttunen @ 2020-09-05 10:34 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Show the number of pending waiters in the debugfs status file.
This is useful for testing to verify that waiters do not leak
or accumulate incorrectly.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/debug.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
index 3eee4318b158..2d06a7406b3b 100644
--- a/drivers/gpu/host1x/debug.c
+++ b/drivers/gpu/host1x/debug.c
@@ -69,6 +69,7 @@ static int show_channel(struct host1x_channel *ch, void *data, bool show_fifo)
 
 static void show_syncpts(struct host1x *m, struct output *o)
 {
+	struct list_head *pos;
 	unsigned int i;
 
 	host1x_debug_output(o, "---- syncpts ----\n");
@@ -76,12 +77,19 @@ static void show_syncpts(struct host1x *m, struct output *o)
 	for (i = 0; i < host1x_syncpt_nb_pts(m); i++) {
 		u32 max = host1x_syncpt_read_max(m->syncpt + i);
 		u32 min = host1x_syncpt_load(m->syncpt + i);
+		unsigned int waiters = 0;
 
-		if (!min && !max)
+		spin_lock(&m->syncpt[i].intr.lock);
+		list_for_each(pos, &m->syncpt[i].intr.wait_head)
+			waiters++;
+		spin_unlock(&m->syncpt[i].intr.lock);
+
+		if (!min && !max && !waiters)
 			continue;
 
-		host1x_debug_output(o, "id %u (%s) min %d max %d\n",
-				    i, m->syncpt[i].name, min, max);
+		host1x_debug_output(o,
+				    "id %u (%s) min %d max %d (%d waiters)\n",
+				    i, m->syncpt[i].name, min, max, waiters);
 	}
 
 	for (i = 0; i < host1x_syncpt_nb_bases(m); i++) {
-- 
2.28.0



* [RFC PATCH v2 04/17] gpu: host1x: Remove cancelled waiters immediately
From: Mikko Perttunen @ 2020-09-05 10:34 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Before this patch, cancelled waiters would only be cleaned up
once their threshold value was reached. Make host1x_intr_put_ref
process the cancellation immediately to fix this.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/intr.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c
index 9245add23b5d..5d328d20ce6d 100644
--- a/drivers/gpu/host1x/intr.c
+++ b/drivers/gpu/host1x/intr.c
@@ -247,13 +247,17 @@ void host1x_intr_put_ref(struct host1x *host, unsigned int id, void *ref)
 	struct host1x_waitlist *waiter = ref;
 	struct host1x_syncpt *syncpt;
 
-	while (atomic_cmpxchg(&waiter->state, WLS_PENDING, WLS_CANCELLED) ==
-	       WLS_REMOVED)
-		schedule();
+	atomic_cmpxchg(&waiter->state, WLS_PENDING, WLS_CANCELLED);
 
 	syncpt = host->syncpt + id;
-	(void)process_wait_list(host, syncpt,
-				host1x_syncpt_load(host->syncpt + id));
+
+	spin_lock(&syncpt->intr.lock);
+	if (atomic_cmpxchg(&waiter->state, WLS_CANCELLED, WLS_HANDLED) ==
+	    WLS_CANCELLED) {
+		list_del(&waiter->list);
+		kref_put(&waiter->refcount, waiter_release);
+	}
+	spin_unlock(&syncpt->intr.lock);
 
 	kref_put(&waiter->refcount, waiter_release);
 }
-- 
2.28.0
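
The state transitions in the new host1x_intr_put_ref() can be walked through
in a single-threaded userspace sketch. This is an illustration under stated
assumptions, not the driver code: C11 atomics stand in for the kernel's
atomic_cmpxchg(), the spinlock is omitted, and the demo_* names, the on_list
flag and the plain integer refcount are hypothetical stand-ins for the list
linkage and the kref. The WLS_* names mirror the states used in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* State names mirror the patch; the numeric values are illustrative. */
enum { WLS_PENDING, WLS_REMOVED, WLS_CANCELLED, WLS_HANDLED };

/* Hypothetical waiter: flags replace the list linkage and the kref. */
struct demo_waiter {
	atomic_int state;
	bool on_list;
	int refcount;
};

/* Like the kernel's atomic_cmpxchg(): always returns the previous value. */
static int demo_cmpxchg(atomic_int *v, int expected, int desired)
{
	/* On failure, C11 stores the observed value into 'expected'. */
	atomic_compare_exchange_strong(v, &expected, desired);
	return expected;
}

/* Sketch of the patched put_ref path (intr.lock omitted: single-threaded). */
static void demo_put_ref(struct demo_waiter *w)
{
	/* A still-pending waiter becomes cancelled; other states are kept. */
	demo_cmpxchg(&w->state, WLS_PENDING, WLS_CANCELLED);

	/* If it is cancelled, unlink it now and drop the list's reference. */
	if (demo_cmpxchg(&w->state, WLS_CANCELLED, WLS_HANDLED) ==
	    WLS_CANCELLED) {
		w->on_list = false;
		w->refcount--;
	}

	/* Drop the caller's own reference. */
	w->refcount--;
}
```

A pending waiter with two references ends up in WLS_HANDLED, off the list and
with no references left. A waiter that was already moved to WLS_REMOVED is
left in that state and only loses the caller's reference.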



* [RFC PATCH v2 05/17] gpu: host1x: Use HW-equivalent syncpoint expiration check
From: Mikko Perttunen @ 2020-09-05 10:34 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Make syncpoint expiration checks always use the same logic used by
the hardware. This avoids races in which the hardware triggers a
syncpoint interrupt but the driver then disagrees that the threshold
has been reached.

One situation where this could occur is if a job incremented a
syncpoint too many times -- then the hardware would trigger an
interrupt, but the driver would assume that a syncpoint value
greater than the syncpoint's max value is in the future, and not
clean up the job.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/syncpt.c | 51 ++-----------------------------------
 1 file changed, 2 insertions(+), 49 deletions(-)

diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index 7cb80d4768b1..5329a0886d29 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -306,59 +306,12 @@ EXPORT_SYMBOL(host1x_syncpt_wait);
 bool host1x_syncpt_is_expired(struct host1x_syncpt *sp, u32 thresh)
 {
 	u32 current_val;
-	u32 future_val;
 
 	smp_rmb();
 
 	current_val = (u32)atomic_read(&sp->min_val);
-	future_val = (u32)atomic_read(&sp->max_val);
-
-	/* Note the use of unsigned arithmetic here (mod 1<<32).
-	 *
-	 * c = current_val = min_val	= the current value of the syncpoint.
-	 * t = thresh			= the value we are checking
-	 * f = future_val  = max_val	= the value c will reach when all
-	 *				  outstanding increments have completed.
-	 *
-	 * Note that c always chases f until it reaches f.
-	 *
-	 * Dtf = (f - t)
-	 * Dtc = (c - t)
-	 *
-	 *  Consider all cases:
-	 *
-	 *	A) .....c..t..f.....	Dtf < Dtc	need to wait
-	 *	B) .....c.....f..t..	Dtf > Dtc	expired
-	 *	C) ..t..c.....f.....	Dtf > Dtc	expired	   (Dct very large)
-	 *
-	 *  Any case where f==c: always expired (for any t).	Dtf == Dcf
-	 *  Any case where t==c: always expired (for any f).	Dtf >= Dtc (because Dtc==0)
-	 *  Any case where t==f!=c: always wait.		Dtf <  Dtc (because Dtf==0,
-	 *							Dtc!=0)
-	 *
-	 *  Other cases:
-	 *
-	 *	A) .....t..f..c.....	Dtf < Dtc	need to wait
-	 *	A) .....f..c..t.....	Dtf < Dtc	need to wait
-	 *	A) .....f..t..c.....	Dtf > Dtc	expired
-	 *
-	 *   So:
-	 *	   Dtf >= Dtc implies EXPIRED	(return true)
-	 *	   Dtf <  Dtc implies WAIT	(return false)
-	 *
-	 * Note: If t is expired then we *cannot* wait on it. We would wait
-	 * forever (hang the system).
-	 *
-	 * Note: do NOT get clever and remove the -thresh from both sides. It
-	 * is NOT the same.
-	 *
-	 * If future valueis zero, we have a client managed sync point. In that
-	 * case we do a direct comparison.
-	 */
-	if (!host1x_syncpt_client_managed(sp))
-		return future_val - thresh >= current_val - thresh;
-	else
-		return (s32)(current_val - thresh) >= 0;
+
+	return ((current_val - thresh) & 0x80000000U) == 0U;
 }
 
 int host1x_syncpt_init(struct host1x *host)
-- 
2.28.0
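
The difference between the old and new checks can be seen with plain integers.
The sketch below is a userspace illustration only. The demo_* function names
are hypothetical, but the two expressions are taken from the diff above: the
new wrap-safe check, and the removed check for syncpoints that are not client
managed, which also consulted the max value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Check after this patch: the threshold counts as reached when
 * (current - thresh), taken modulo 2^32, has its top bit clear.
 */
static bool demo_is_expired(uint32_t current_val, uint32_t thresh)
{
	return ((uint32_t)(current_val - thresh) & 0x80000000U) == 0U;
}

/* Removed check for syncpoints that are not client managed. */
static bool demo_is_expired_old(uint32_t current_val, uint32_t future_val,
				uint32_t thresh)
{
	return (uint32_t)(future_val - thresh) >=
	       (uint32_t)(current_val - thresh);
}
```

With current_val = 12, future_val (max) = 11 and thresh = 11, which is the
over-increment situation described in the commit message, the old check
reports "not expired" while the new one reports "expired". The new check also
handles wraparound: a current value of 2 has passed a threshold of 0xFFFFFFFE.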




* [RFC PATCH v2 06/17] gpu: host1x: Cleanup and refcounting for syncpoints
  2020-09-05 10:34 ` Mikko Perttunen
@ 2020-09-05 10:34   ` Mikko Perttunen
  -1 siblings, 0 replies; 128+ messages in thread
From: Mikko Perttunen @ 2020-09-05 10:34 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Add reference counting for allocated syncpoints to allow keeping
them allocated while jobs are referencing them. Additionally,
clean up various places using syncpoint IDs to use host1x_syncpt
pointers instead.
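
As a rough userspace illustration of the lifetime rule described above, the
sketch below models the reference count with C11 atomics. It is not the
kernel code: struct demo_syncpt and the demo_* functions are hypothetical
stand-ins for struct host1x_syncpt, kref_init(), host1x_syncpt_get() and
host1x_syncpt_put().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct host1x_syncpt; not the kernel type. */
struct demo_syncpt {
	atomic_uint refcount;	/* plays the role of struct kref */
	bool allocated;		/* cleared when the last reference goes */
};

/* Like kref_init(): the allocator hands out the first reference. */
static void demo_syncpt_alloc(struct demo_syncpt *sp)
{
	sp->allocated = true;
	atomic_init(&sp->refcount, 1);
}

/* Like host1x_syncpt_get(): the caller must already hold a reference. */
static struct demo_syncpt *demo_syncpt_get(struct demo_syncpt *sp)
{
	atomic_fetch_add(&sp->refcount, 1);
	return sp;
}

/* Like host1x_syncpt_put(): the last put releases the syncpoint. */
static void demo_syncpt_put(struct demo_syncpt *sp)
{
	unsigned int old;

	if (!sp)
		return;

	old = atomic_fetch_sub(&sp->refcount, 1);
	assert(old > 0);	/* an unbalanced put would underflow */
	if (old == 1)
		sp->allocated = false;
}
```

With this model, a job that took its own reference through demo_syncpt_get()
keeps the syncpoint allocated after the client has called demo_syncpt_put();
the syncpoint is released only when the job drops its reference as well.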

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/drm/tegra/dc.c         |  4 +-
 drivers/gpu/drm/tegra/drm.c        | 17 ++++---
 drivers/gpu/drm/tegra/gr2d.c       |  4 +-
 drivers/gpu/drm/tegra/gr3d.c       |  4 +-
 drivers/gpu/drm/tegra/vic.c        |  4 +-
 drivers/gpu/host1x/cdma.c          | 11 ++---
 drivers/gpu/host1x/dev.h           |  7 ++-
 drivers/gpu/host1x/hw/cdma_hw.c    |  2 +-
 drivers/gpu/host1x/hw/channel_hw.c | 10 ++--
 drivers/gpu/host1x/hw/debug_hw.c   |  2 +-
 drivers/gpu/host1x/job.c           |  5 +-
 drivers/gpu/host1x/syncpt.c        | 75 +++++++++++++++++++++++-------
 drivers/gpu/host1x/syncpt.h        |  3 ++
 include/linux/host1x.h             |  8 ++--
 14 files changed, 99 insertions(+), 57 deletions(-)

diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c
index 9a0b3240bc58..efb41c10dad4 100644
--- a/drivers/gpu/drm/tegra/dc.c
+++ b/drivers/gpu/drm/tegra/dc.c
@@ -2127,7 +2127,7 @@ static int tegra_dc_init(struct host1x_client *client)
 		drm_plane_cleanup(primary);
 
 	host1x_client_iommu_detach(client);
-	host1x_syncpt_free(dc->syncpt);
+	host1x_syncpt_put(dc->syncpt);
 
 	return err;
 }
@@ -2152,7 +2152,7 @@ static int tegra_dc_exit(struct host1x_client *client)
 	}
 
 	host1x_client_iommu_detach(client);
-	host1x_syncpt_free(dc->syncpt);
+	host1x_syncpt_put(dc->syncpt);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index ba9d1c3e7cac..ceea9db341f0 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -171,7 +171,7 @@ int tegra_drm_submit(struct tegra_drm_context *context,
 	struct drm_tegra_syncpt syncpt;
 	struct host1x *host1x = dev_get_drvdata(drm->dev->parent);
 	struct drm_gem_object **refs;
-	struct host1x_syncpt *sp;
+	struct host1x_syncpt *sp = NULL;
 	struct host1x_job *job;
 	unsigned int num_refs;
 	int err;
@@ -298,8 +298,8 @@ int tegra_drm_submit(struct tegra_drm_context *context,
 		goto fail;
 	}
 
-	/* check whether syncpoint ID is valid */
-	sp = host1x_syncpt_get(host1x, syncpt.id);
+	/* Syncpoint ref will be dropped on job release. */
+	sp = host1x_syncpt_get_by_id(host1x, syncpt.id);
 	if (!sp) {
 		err = -ENOENT;
 		goto fail;
@@ -308,7 +308,7 @@ int tegra_drm_submit(struct tegra_drm_context *context,
 	job->is_addr_reg = context->client->ops->is_addr_reg;
 	job->is_valid_class = context->client->ops->is_valid_class;
 	job->syncpt_incrs = syncpt.incrs;
-	job->syncpt_id = syncpt.id;
+	job->syncpt = host1x_syncpt_get(sp);
 	job->timeout = 10000;
 
 	if (args->timeout && args->timeout < 10000)
@@ -327,6 +327,9 @@ int tegra_drm_submit(struct tegra_drm_context *context,
 	args->fence = job->syncpt_end;
 
 fail:
+	if (sp)
+		host1x_syncpt_put(sp);
+
 	while (num_refs--)
 		drm_gem_object_put(refs[num_refs]);
 
@@ -380,7 +383,7 @@ static int tegra_syncpt_read(struct drm_device *drm, void *data,
 	struct drm_tegra_syncpt_read *args = data;
 	struct host1x_syncpt *sp;
 
-	sp = host1x_syncpt_get(host, args->id);
+	sp = host1x_syncpt_get_by_id_noref(host, args->id);
 	if (!sp)
 		return -EINVAL;
 
@@ -395,7 +398,7 @@ static int tegra_syncpt_incr(struct drm_device *drm, void *data,
 	struct drm_tegra_syncpt_incr *args = data;
 	struct host1x_syncpt *sp;
 
-	sp = host1x_syncpt_get(host1x, args->id);
+	sp = host1x_syncpt_get_by_id_noref(host1x, args->id);
 	if (!sp)
 		return -EINVAL;
 
@@ -409,7 +412,7 @@ static int tegra_syncpt_wait(struct drm_device *drm, void *data,
 	struct drm_tegra_syncpt_wait *args = data;
 	struct host1x_syncpt *sp;
 
-	sp = host1x_syncpt_get(host1x, args->id);
+	sp = host1x_syncpt_get_by_id_noref(host1x, args->id);
 	if (!sp)
 		return -EINVAL;
 
diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c
index 1a0d3ba6e525..d857a99b21a7 100644
--- a/drivers/gpu/drm/tegra/gr2d.c
+++ b/drivers/gpu/drm/tegra/gr2d.c
@@ -67,7 +67,7 @@ static int gr2d_init(struct host1x_client *client)
 detach:
 	host1x_client_iommu_detach(client);
 free:
-	host1x_syncpt_free(client->syncpts[0]);
+	host1x_syncpt_put(client->syncpts[0]);
 put:
 	host1x_channel_put(gr2d->channel);
 	return err;
@@ -86,7 +86,7 @@ static int gr2d_exit(struct host1x_client *client)
 		return err;
 
 	host1x_client_iommu_detach(client);
-	host1x_syncpt_free(client->syncpts[0]);
+	host1x_syncpt_put(client->syncpts[0]);
 	host1x_channel_put(gr2d->channel);
 
 	return 0;
diff --git a/drivers/gpu/drm/tegra/gr3d.c b/drivers/gpu/drm/tegra/gr3d.c
index b0b8154e8104..24442ade0da3 100644
--- a/drivers/gpu/drm/tegra/gr3d.c
+++ b/drivers/gpu/drm/tegra/gr3d.c
@@ -76,7 +76,7 @@ static int gr3d_init(struct host1x_client *client)
 detach:
 	host1x_client_iommu_detach(client);
 free:
-	host1x_syncpt_free(client->syncpts[0]);
+	host1x_syncpt_put(client->syncpts[0]);
 put:
 	host1x_channel_put(gr3d->channel);
 	return err;
@@ -94,7 +94,7 @@ static int gr3d_exit(struct host1x_client *client)
 		return err;
 
 	host1x_client_iommu_detach(client);
-	host1x_syncpt_free(client->syncpts[0]);
+	host1x_syncpt_put(client->syncpts[0]);
 	host1x_channel_put(gr3d->channel);
 
 	return 0;
diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c
index ade56b860cf9..cb476da59adc 100644
--- a/drivers/gpu/drm/tegra/vic.c
+++ b/drivers/gpu/drm/tegra/vic.c
@@ -197,7 +197,7 @@ static int vic_init(struct host1x_client *client)
 	return 0;
 
 free_syncpt:
-	host1x_syncpt_free(client->syncpts[0]);
+	host1x_syncpt_put(client->syncpts[0]);
 free_channel:
 	host1x_channel_put(vic->channel);
 detach:
@@ -221,7 +221,7 @@ static int vic_exit(struct host1x_client *client)
 	if (err < 0)
 		return err;
 
-	host1x_syncpt_free(client->syncpts[0]);
+	host1x_syncpt_put(client->syncpts[0]);
 	host1x_channel_put(vic->channel);
 	host1x_client_iommu_detach(client);
 
diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
index e8d3fda91d8a..6e6ca774f68d 100644
--- a/drivers/gpu/host1x/cdma.c
+++ b/drivers/gpu/host1x/cdma.c
@@ -273,15 +273,13 @@ static int host1x_cdma_wait_pushbuffer_space(struct host1x *host1x,
 static void cdma_start_timer_locked(struct host1x_cdma *cdma,
 				    struct host1x_job *job)
 {
-	struct host1x *host = cdma_to_host1x(cdma);
-
 	if (cdma->timeout.client) {
 		/* timer already started */
 		return;
 	}
 
 	cdma->timeout.client = job->client;
-	cdma->timeout.syncpt = host1x_syncpt_get(host, job->syncpt_id);
+	cdma->timeout.syncpt = job->syncpt;
 	cdma->timeout.syncpt_val = job->syncpt_end;
 	cdma->timeout.start_ktime = ktime_get();
 
@@ -312,7 +310,6 @@ static void stop_cdma_timer_locked(struct host1x_cdma *cdma)
 static void update_cdma_locked(struct host1x_cdma *cdma)
 {
 	bool signal = false;
-	struct host1x *host1x = cdma_to_host1x(cdma);
 	struct host1x_job *job, *n;
 
 	/* If CDMA is stopped, queue is cleared and we can return */
@@ -324,8 +321,7 @@ static void update_cdma_locked(struct host1x_cdma *cdma)
 	 * to consume as many sync queue entries as possible without blocking
 	 */
 	list_for_each_entry_safe(job, n, &cdma->sync_queue, list) {
-		struct host1x_syncpt *sp =
-			host1x_syncpt_get(host1x, job->syncpt_id);
+		struct host1x_syncpt *sp = job->syncpt;
 
 		/* Check whether this syncpt has completed, and bail if not */
 		if (!host1x_syncpt_is_expired(sp, job->syncpt_end)) {
@@ -499,8 +495,7 @@ int host1x_cdma_begin(struct host1x_cdma *cdma, struct host1x_job *job)
 		if (!cdma->timeout.initialized) {
 			int err;
 
-			err = host1x_hw_cdma_timeout_init(host1x, cdma,
-							  job->syncpt_id);
+			err = host1x_hw_cdma_timeout_init(host1x, cdma);
 			if (err) {
 				mutex_unlock(&cdma->lock);
 				return err;
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
index f781a9b0f39d..63010ae37a97 100644
--- a/drivers/gpu/host1x/dev.h
+++ b/drivers/gpu/host1x/dev.h
@@ -37,7 +37,7 @@ struct host1x_cdma_ops {
 	void (*start)(struct host1x_cdma *cdma);
 	void (*stop)(struct host1x_cdma *cdma);
 	void (*flush)(struct  host1x_cdma *cdma);
-	int (*timeout_init)(struct host1x_cdma *cdma, unsigned int syncpt);
+	int (*timeout_init)(struct host1x_cdma *cdma);
 	void (*timeout_destroy)(struct host1x_cdma *cdma);
 	void (*freeze)(struct host1x_cdma *cdma);
 	void (*resume)(struct host1x_cdma *cdma, u32 getptr);
@@ -261,10 +261,9 @@ static inline void host1x_hw_cdma_flush(struct host1x *host,
 }
 
 static inline int host1x_hw_cdma_timeout_init(struct host1x *host,
-					      struct host1x_cdma *cdma,
-					      unsigned int syncpt)
+					      struct host1x_cdma *cdma)
 {
-	return host->cdma_op->timeout_init(cdma, syncpt);
+	return host->cdma_op->timeout_init(cdma);
 }
 
 static inline void host1x_hw_cdma_timeout_destroy(struct host1x *host,
diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
index 2f3bf94cf365..e49cd5b8f735 100644
--- a/drivers/gpu/host1x/hw/cdma_hw.c
+++ b/drivers/gpu/host1x/hw/cdma_hw.c
@@ -295,7 +295,7 @@ static void cdma_timeout_handler(struct work_struct *work)
 /*
  * Init timeout resources
  */
-static int cdma_timeout_init(struct host1x_cdma *cdma, unsigned int syncpt)
+static int cdma_timeout_init(struct host1x_cdma *cdma)
 {
 	INIT_DELAYED_WORK(&cdma->timeout.wq, cdma_timeout_handler);
 	cdma->timeout.initialized = true;
diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
index 5eaa29d171c9..d4c28faf27d1 100644
--- a/drivers/gpu/host1x/hw/channel_hw.c
+++ b/drivers/gpu/host1x/hw/channel_hw.c
@@ -86,8 +86,7 @@ static void submit_gathers(struct host1x_job *job)
 
 static inline void synchronize_syncpt_base(struct host1x_job *job)
 {
-	struct host1x *host = dev_get_drvdata(job->channel->dev->parent);
-	struct host1x_syncpt *sp = host->syncpt + job->syncpt_id;
+	struct host1x_syncpt *sp = job->syncpt;
 	unsigned int id;
 	u32 value;
 
@@ -118,7 +117,7 @@ static void host1x_channel_set_streamid(struct host1x_channel *channel)
 static int channel_submit(struct host1x_job *job)
 {
 	struct host1x_channel *ch = job->channel;
-	struct host1x_syncpt *sp;
+	struct host1x_syncpt *sp = job->syncpt;
 	u32 user_syncpt_incrs = job->syncpt_incrs;
 	u32 prev_max = 0;
 	u32 syncval;
@@ -126,10 +125,9 @@ static int channel_submit(struct host1x_job *job)
 	struct host1x_waitlist *completed_waiter = NULL;
 	struct host1x *host = dev_get_drvdata(ch->dev->parent);
 
-	sp = host->syncpt + job->syncpt_id;
 	trace_host1x_channel_submit(dev_name(ch->dev),
 				    job->num_gathers, job->num_relocs,
-				    job->syncpt_id, job->syncpt_incrs);
+				    job->syncpt->id, job->syncpt_incrs);
 
 	/* before error checks, return current max */
 	prev_max = job->syncpt_end = host1x_syncpt_read_max(sp);
@@ -163,7 +161,7 @@ static int channel_submit(struct host1x_job *job)
 		host1x_cdma_push(&ch->cdma,
 				 host1x_opcode_setclass(HOST1X_CLASS_HOST1X,
 					host1x_uclass_wait_syncpt_r(), 1),
-				 host1x_class_host_wait_syncpt(job->syncpt_id,
+				 host1x_class_host_wait_syncpt(job->syncpt->id,
 					host1x_syncpt_read_max(sp)));
 	}
 
diff --git a/drivers/gpu/host1x/hw/debug_hw.c b/drivers/gpu/host1x/hw/debug_hw.c
index f31bcfa1b837..ceb48229d14b 100644
--- a/drivers/gpu/host1x/hw/debug_hw.c
+++ b/drivers/gpu/host1x/hw/debug_hw.c
@@ -204,7 +204,7 @@ static void show_channel_gathers(struct output *o, struct host1x_cdma *cdma)
 		unsigned int i;
 
 		host1x_debug_output(o, "\n%p: JOB, syncpt_id=%d, syncpt_val=%d, first_get=%08x, timeout=%d num_slots=%d, num_handles=%d\n",
-				    job, job->syncpt_id, job->syncpt_end,
+				    job, job->syncpt->id, job->syncpt_end,
 				    job->first_get, job->timeout,
 				    job->num_slots, job->num_unpins);
 
diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
index 89b6c14b7392..d8345d3bf0b3 100644
--- a/drivers/gpu/host1x/job.c
+++ b/drivers/gpu/host1x/job.c
@@ -79,6 +79,9 @@ static void job_free(struct kref *ref)
 {
 	struct host1x_job *job = container_of(ref, struct host1x_job, ref);
 
+	if (job->syncpt)
+		host1x_syncpt_put(job->syncpt);
+
 	kfree(job);
 }
 
@@ -680,7 +683,7 @@ EXPORT_SYMBOL(host1x_job_unpin);
  */
 void host1x_job_dump(struct device *dev, struct host1x_job *job)
 {
-	dev_dbg(dev, "    SYNCPT_ID   %d\n", job->syncpt_id);
+	dev_dbg(dev, "    SYNCPT_ID   %d\n", job->syncpt->id);
 	dev_dbg(dev, "    SYNCPT_VAL  %d\n", job->syncpt_end);
 	dev_dbg(dev, "    FIRST_GET   0x%x\n", job->first_get);
 	dev_dbg(dev, "    TIMEOUT     %d\n", job->timeout);
diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index 5329a0886d29..b31b994624fa 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -76,6 +76,8 @@ struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
 	else
 		sp->client_managed = false;
 
+	kref_init(&sp->ref);
+
 	mutex_unlock(&host->syncpt_mutex);
 	return sp;
 
@@ -368,7 +370,7 @@ int host1x_syncpt_init(struct host1x *host)
  * host1x client drivers can use this function to allocate a syncpoint for
  * subsequent use. A syncpoint returned by this function will be reserved for
  * use by the client exclusively. When no longer using a syncpoint, a host1x
- * client driver needs to release it using host1x_syncpt_free().
+ * client driver needs to release it using host1x_syncpt_put().
  */
 struct host1x_syncpt *host1x_syncpt_request(struct host1x_client *client,
 					    unsigned long flags)
@@ -379,20 +381,9 @@ struct host1x_syncpt *host1x_syncpt_request(struct host1x_client *client,
 }
 EXPORT_SYMBOL(host1x_syncpt_request);
 
-/**
- * host1x_syncpt_free() - free a requested syncpoint
- * @sp: host1x syncpoint
- *
- * Release a syncpoint previously allocated using host1x_syncpt_request(). A
- * host1x client driver should call this when the syncpoint is no longer in
- * use. Note that client drivers must ensure that the syncpoint doesn't remain
- * under the control of hardware after calling this function, otherwise two
- * clients may end up trying to access the same syncpoint concurrently.
- */
-void host1x_syncpt_free(struct host1x_syncpt *sp)
+static void syncpt_release(struct kref *ref)
 {
-	if (!sp)
-		return;
+	struct host1x_syncpt *sp = container_of(ref, struct host1x_syncpt, ref);
 
 	mutex_lock(&sp->host->syncpt_mutex);
 
@@ -404,7 +395,23 @@ void host1x_syncpt_free(struct host1x_syncpt *sp)
 
 	mutex_unlock(&sp->host->syncpt_mutex);
 }
-EXPORT_SYMBOL(host1x_syncpt_free);
+
+/**
+ * host1x_syncpt_put() - free a requested syncpoint
+ * @sp: host1x syncpoint
+ *
+ * Release a syncpoint previously allocated using host1x_syncpt_request(). A
+ * host1x client driver should call this when the syncpoint is no longer in
+ * use.
+ */
+void host1x_syncpt_put(struct host1x_syncpt *sp)
+{
+	if (!sp)
+		return;
+
+	kref_put(&sp->ref, syncpt_release);
+}
+EXPORT_SYMBOL(host1x_syncpt_put);
 
 void host1x_syncpt_deinit(struct host1x *host)
 {
@@ -471,16 +478,48 @@ unsigned int host1x_syncpt_nb_mlocks(struct host1x *host)
 }
 
 /**
- * host1x_syncpt_get() - obtain a syncpoint by ID
+ * host1x_syncpt_get_by_id() - obtain a syncpoint by ID
+ * @host: host1x controller
+ * @id: syncpoint ID
+ */
+struct host1x_syncpt *host1x_syncpt_get_by_id(struct host1x *host,
+					      unsigned int id)
+{
+	if (id >= host->info->nb_pts)
+		return NULL;
+
+	if (kref_get_unless_zero(&host->syncpt[id].ref))
+		return &host->syncpt[id];
+	else
+		return NULL;
+}
+EXPORT_SYMBOL(host1x_syncpt_get_by_id);
+
+/**
+ * host1x_syncpt_get_by_id_noref() - obtain a syncpoint by ID but don't
+ * 	increase the refcount.
  * @host: host1x controller
  * @id: syncpoint ID
  */
-struct host1x_syncpt *host1x_syncpt_get(struct host1x *host, unsigned int id)
+struct host1x_syncpt *host1x_syncpt_get_by_id_noref(struct host1x *host,
+						    unsigned int id)
 {
 	if (id >= host->info->nb_pts)
 		return NULL;
 
-	return host->syncpt + id;
+	return &host->syncpt[id];
+}
+EXPORT_SYMBOL(host1x_syncpt_get_by_id_noref);
+
+/**
+ * host1x_syncpt_get() - increment syncpoint refcount
+ * @sp: syncpoint
+ */
+struct host1x_syncpt *host1x_syncpt_get(struct host1x_syncpt *sp)
+{
+	kref_get(&sp->ref);
+
+	return sp;
 }
 EXPORT_SYMBOL(host1x_syncpt_get);
 
diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h
index 77e7206cc316..eb49d7003743 100644
--- a/drivers/gpu/host1x/syncpt.h
+++ b/drivers/gpu/host1x/syncpt.h
@@ -11,6 +11,7 @@
 #include <linux/atomic.h>
 #include <linux/host1x.h>
 #include <linux/kernel.h>
+#include <linux/kref.h>
 #include <linux/sched.h>
 
 #include "intr.h"
@@ -26,6 +27,8 @@ struct host1x_syncpt_base {
 };
 
 struct host1x_syncpt {
+	struct kref ref;
+
 	unsigned int id;
 	atomic_t min_val;
 	atomic_t max_val;
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index f711fc0154f4..da87ceb33c2d 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -142,7 +142,9 @@ struct host1x_syncpt_base;
 struct host1x_syncpt;
 struct host1x;
 
-struct host1x_syncpt *host1x_syncpt_get(struct host1x *host, u32 id);
+struct host1x_syncpt *host1x_syncpt_get_by_id(struct host1x *host, u32 id);
+struct host1x_syncpt *host1x_syncpt_get_by_id_noref(struct host1x *host, u32 id);
+struct host1x_syncpt *host1x_syncpt_get(struct host1x_syncpt *sp);
 u32 host1x_syncpt_id(struct host1x_syncpt *sp);
 u32 host1x_syncpt_read_min(struct host1x_syncpt *sp);
 u32 host1x_syncpt_read_max(struct host1x_syncpt *sp);
@@ -153,7 +155,7 @@ int host1x_syncpt_wait(struct host1x_syncpt *sp, u32 thresh, long timeout,
 		       u32 *value);
 struct host1x_syncpt *host1x_syncpt_request(struct host1x_client *client,
 					    unsigned long flags);
-void host1x_syncpt_free(struct host1x_syncpt *sp);
+void host1x_syncpt_put(struct host1x_syncpt *sp);
 
 struct host1x_syncpt_base *host1x_syncpt_get_base(struct host1x_syncpt *sp);
 u32 host1x_syncpt_base_id(struct host1x_syncpt_base *base);
@@ -218,7 +220,7 @@ struct host1x_job {
 	dma_addr_t *reloc_addr_phys;
 
 	/* Sync point id, number of increments and end related to the submit */
-	u32 syncpt_id;
+	struct host1x_syncpt *syncpt;
 	u32 syncpt_incrs;
 	u32 syncpt_end;
 
-- 
2.28.0
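
The two lookup flavours added to syncpt.c above differ only in whether they
take a reference. The sketch below models that difference in userspace C.
All demo_* names, the table and its size are hypothetical, and the
compare-and-swap loop is only an approximation of what
kref_get_unless_zero() provides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NB_PTS 4U	/* hypothetical table size, stands in for nb_pts */

struct demo_entry {
	atomic_uint refcount;	/* 0 means "not allocated" */
};

static struct demo_entry demo_table[DEMO_NB_PTS];

/* Take a reference only if the count has not already dropped to zero. */
static bool demo_ref_get_unless_zero(atomic_uint *ref)
{
	unsigned int old = atomic_load(ref);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	}

	return false;
}

/* Counterpart of host1x_syncpt_get_by_id(): returns a new reference. */
static struct demo_entry *demo_get_by_id(unsigned int id)
{
	if (id >= DEMO_NB_PTS)
		return NULL;

	if (!demo_ref_get_unless_zero(&demo_table[id].refcount))
		return NULL;

	return &demo_table[id];
}

/* Counterpart of host1x_syncpt_get_by_id_noref(): bounds check only. */
static struct demo_entry *demo_get_by_id_noref(unsigned int id)
{
	if (id >= DEMO_NB_PTS)
		return NULL;

	return &demo_table[id];
}

/* Drop a reference obtained from demo_get_by_id(). */
static void demo_put(struct demo_entry *sp)
{
	unsigned int old = atomic_fetch_sub(&sp->refcount, 1);

	assert(old > 0);	/* an unbalanced put would underflow */
}
```

In this model an unallocated slot is visible through the noref lookup but
not through the referencing one, and every successful demo_get_by_id() has
to be balanced by exactly one demo_put().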



@@ -79,6 +79,9 @@ static void job_free(struct kref *ref)
 {
 	struct host1x_job *job = container_of(ref, struct host1x_job, ref);
 
+	if (job->syncpt)
+		host1x_syncpt_put(job->syncpt);
+
 	kfree(job);
 }
 
@@ -680,7 +683,7 @@ EXPORT_SYMBOL(host1x_job_unpin);
  */
 void host1x_job_dump(struct device *dev, struct host1x_job *job)
 {
-	dev_dbg(dev, "    SYNCPT_ID   %d\n", job->syncpt_id);
+	dev_dbg(dev, "    SYNCPT_ID   %d\n", job->syncpt->id);
 	dev_dbg(dev, "    SYNCPT_VAL  %d\n", job->syncpt_end);
 	dev_dbg(dev, "    FIRST_GET   0x%x\n", job->first_get);
 	dev_dbg(dev, "    TIMEOUT     %d\n", job->timeout);
diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index 5329a0886d29..b31b994624fa 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -76,6 +76,8 @@ struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
 	else
 		sp->client_managed = false;
 
+	kref_init(&sp->ref);
+
 	mutex_unlock(&host->syncpt_mutex);
 	return sp;
 
@@ -368,7 +370,7 @@ int host1x_syncpt_init(struct host1x *host)
  * host1x client drivers can use this function to allocate a syncpoint for
  * subsequent use. A syncpoint returned by this function will be reserved for
  * use by the client exclusively. When no longer using a syncpoint, a host1x
- * client driver needs to release it using host1x_syncpt_free().
+ * client driver needs to release it using host1x_syncpt_put().
  */
 struct host1x_syncpt *host1x_syncpt_request(struct host1x_client *client,
 					    unsigned long flags)
@@ -379,20 +381,9 @@ struct host1x_syncpt *host1x_syncpt_request(struct host1x_client *client,
 }
 EXPORT_SYMBOL(host1x_syncpt_request);
 
-/**
- * host1x_syncpt_free() - free a requested syncpoint
- * @sp: host1x syncpoint
- *
- * Release a syncpoint previously allocated using host1x_syncpt_request(). A
- * host1x client driver should call this when the syncpoint is no longer in
- * use. Note that client drivers must ensure that the syncpoint doesn't remain
- * under the control of hardware after calling this function, otherwise two
- * clients may end up trying to access the same syncpoint concurrently.
- */
-void host1x_syncpt_free(struct host1x_syncpt *sp)
+static void syncpt_release(struct kref *ref)
 {
-	if (!sp)
-		return;
+	struct host1x_syncpt *sp = container_of(ref, struct host1x_syncpt, ref);
 
 	mutex_lock(&sp->host->syncpt_mutex);
 
@@ -404,7 +395,23 @@ void host1x_syncpt_free(struct host1x_syncpt *sp)
 
 	mutex_unlock(&sp->host->syncpt_mutex);
 }
-EXPORT_SYMBOL(host1x_syncpt_free);
+
+/**
+ * host1x_syncpt_put() - drop a reference to a syncpoint
+ * @sp: host1x syncpoint
+ *
+ * Drop a reference obtained using, for example, host1x_syncpt_request() or
+ * host1x_syncpt_get(). The syncpoint is released for reuse once the last
+ * reference has been dropped.
+ */
+void host1x_syncpt_put(struct host1x_syncpt *sp)
+{
+	if (!sp)
+		return;
+
+	kref_put(&sp->ref, syncpt_release);
+}
+EXPORT_SYMBOL(host1x_syncpt_put);
 
 void host1x_syncpt_deinit(struct host1x *host)
 {
@@ -471,16 +478,48 @@ unsigned int host1x_syncpt_nb_mlocks(struct host1x *host)
 }
 
 /**
- * host1x_syncpt_get() - obtain a syncpoint by ID
+ * host1x_syncpt_get_by_id() - obtain a syncpoint by ID, taking a reference
+ * @host: host1x controller
+ * @id: syncpoint ID
+ */
+struct host1x_syncpt *host1x_syncpt_get_by_id(struct host1x *host,
+					      unsigned int id)
+{
+	if (id >= host->info->nb_pts)
+		return NULL;
+
+	if (kref_get_unless_zero(&host->syncpt[id].ref))
+		return &host->syncpt[id];
+	else
+		return NULL;
+}
+EXPORT_SYMBOL(host1x_syncpt_get_by_id);
+
+/**
+ * host1x_syncpt_get_by_id_noref() - obtain a syncpoint by ID but don't
+ * 	increase the refcount.
  * @host: host1x controller
  * @id: syncpoint ID
  */
-struct host1x_syncpt *host1x_syncpt_get(struct host1x *host, unsigned int id)
+struct host1x_syncpt *host1x_syncpt_get_by_id_noref(struct host1x *host,
+						    unsigned int id)
 {
 	if (id >= host->info->nb_pts)
 		return NULL;
 
-	return host->syncpt + id;
+	return &host->syncpt[id];
+}
+EXPORT_SYMBOL(host1x_syncpt_get_by_id_noref);
+
+/**
+ * host1x_syncpt_get() - increment syncpoint refcount
+ * @sp: syncpoint
+ */
+struct host1x_syncpt *host1x_syncpt_get(struct host1x_syncpt *sp)
+{
+	kref_get(&sp->ref);
+
+	return sp;
 }
 EXPORT_SYMBOL(host1x_syncpt_get);
 
diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h
index 77e7206cc316..eb49d7003743 100644
--- a/drivers/gpu/host1x/syncpt.h
+++ b/drivers/gpu/host1x/syncpt.h
@@ -11,6 +11,7 @@
 #include <linux/atomic.h>
 #include <linux/host1x.h>
 #include <linux/kernel.h>
+#include <linux/kref.h>
 #include <linux/sched.h>
 
 #include "intr.h"
@@ -26,6 +27,8 @@ struct host1x_syncpt_base {
 };
 
 struct host1x_syncpt {
+	struct kref ref;
+
 	unsigned int id;
 	atomic_t min_val;
 	atomic_t max_val;
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index f711fc0154f4..da87ceb33c2d 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -142,7 +142,9 @@ struct host1x_syncpt_base;
 struct host1x_syncpt;
 struct host1x;
 
-struct host1x_syncpt *host1x_syncpt_get(struct host1x *host, u32 id);
+struct host1x_syncpt *host1x_syncpt_get_by_id(struct host1x *host, u32 id);
+struct host1x_syncpt *host1x_syncpt_get_by_id_noref(struct host1x *host, u32 id);
+struct host1x_syncpt *host1x_syncpt_get(struct host1x_syncpt *sp);
 u32 host1x_syncpt_id(struct host1x_syncpt *sp);
 u32 host1x_syncpt_read_min(struct host1x_syncpt *sp);
 u32 host1x_syncpt_read_max(struct host1x_syncpt *sp);
@@ -153,7 +155,7 @@ int host1x_syncpt_wait(struct host1x_syncpt *sp, u32 thresh, long timeout,
 		       u32 *value);
 struct host1x_syncpt *host1x_syncpt_request(struct host1x_client *client,
 					    unsigned long flags);
-void host1x_syncpt_free(struct host1x_syncpt *sp);
+void host1x_syncpt_put(struct host1x_syncpt *sp);
 
 struct host1x_syncpt_base *host1x_syncpt_get_base(struct host1x_syncpt *sp);
 u32 host1x_syncpt_base_id(struct host1x_syncpt_base *base);
@@ -218,7 +220,7 @@ struct host1x_job {
 	dma_addr_t *reloc_addr_phys;
 
 	/* Sync point id, number of increments and end related to the submit */
-	u32 syncpt_id;
+	struct host1x_syncpt *syncpt;
 	u32 syncpt_incrs;
 	u32 syncpt_end;
 
-- 
2.28.0

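The reference counting introduced in the diff above can be modeled in
userspace as follows. This is only a sketch: the model_* names are
hypothetical, C11 atomics stand in for the kernel's struct kref, and
setting a flag stands in for syncpt_release(). It is meant to show why
host1x_syncpt_get_by_id() can fail for a released syncpoint while
host1x_syncpt_get() requires the caller to already hold a reference.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for struct host1x_syncpt; only the refcount is modeled. */
struct model_syncpt {
	atomic_uint ref;
	bool released;
};

/* Mirrors kref_init() in host1x_syncpt_alloc(): the allocator owns one reference. */
static void model_alloc(struct model_syncpt *sp)
{
	sp->released = false;
	atomic_store(&sp->ref, 1);
}

/* Mirrors host1x_syncpt_get(): only valid if the caller already holds a reference. */
static struct model_syncpt *model_get(struct model_syncpt *sp)
{
	atomic_fetch_add(&sp->ref, 1);
	return sp;
}

/* Mirrors kref_get_unless_zero(): fails once the count has dropped to zero. */
static struct model_syncpt *model_get_unless_zero(struct model_syncpt *sp)
{
	unsigned int old = atomic_load(&sp->ref);

	while (old != 0) {
		/* On failure, 'old' is reloaded and the zero check is repeated. */
		if (atomic_compare_exchange_weak(&sp->ref, &old, old + 1))
			return sp;
	}

	return NULL;
}

/* Mirrors host1x_syncpt_put(): the release step runs on the last put only. */
static void model_put(struct model_syncpt *sp)
{
	if (!sp)
		return;

	if (atomic_fetch_sub(&sp->ref, 1) == 1)
		sp->released = true;
}
```

In this model a lookup by ID corresponds to model_get_unless_zero(), so a
syncpoint whose last reference is gone cannot be revived by a racing
lookup.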
^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC PATCH v2 07/17] gpu: host1x: Introduce UAPI header
  2020-09-05 10:34 ` Mikko Perttunen
@ 2020-09-05 10:34   ` Mikko Perttunen
  -1 siblings, 0 replies; 128+ messages in thread
From: Mikko Perttunen @ 2020-09-05 10:34 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Add the userspace interface header, specifying interfaces
for allocating and accessing syncpoints from userspace,
and for creating sync_file-based fences from syncpoint
thresholds.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
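As an illustration of the threshold condition documented for
struct host1x_create_fence, here is a minimal sketch of the comparison
in plain C. The helper name syncpt_reached() is hypothetical and does
not appear in the header; only the expression itself is taken from the
comment on the threshold field.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true once 'value' has reached 'threshold'.
 *
 * The subtraction is done modulo 2^32, so the result stays correct when
 * the 32-bit syncpoint counter wraps around, as long as value and
 * threshold are less than 2^31 increments apart.
 */
static bool syncpt_reached(uint32_t value, uint32_t threshold)
{
	return ((value - threshold) & 0x80000000U) == 0U;
}
```

With this helper, syncpt_reached(10, 10) is true and
syncpt_reached(9, 10) is false. Across a wrap, syncpt_reached(5,
0xfffffff0) is true because the counter has advanced 0x15 steps past
the threshold, while syncpt_reached(0xfffffff0, 5) is false.
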
 include/uapi/linux/host1x.h | 134 ++++++++++++++++++++++++++++++++++++
 1 file changed, 134 insertions(+)
 create mode 100644 include/uapi/linux/host1x.h

diff --git a/include/uapi/linux/host1x.h b/include/uapi/linux/host1x.h
new file mode 100644
index 000000000000..9c8fb9425cb2
--- /dev/null
+++ b/include/uapi/linux/host1x.h
@@ -0,0 +1,134 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/* Copyright (c) 2020 NVIDIA Corporation */
+
+#ifndef _UAPI__LINUX_HOST1X_H
+#define _UAPI__LINUX_HOST1X_H
+
+#include <linux/ioctl.h>
+#include <linux/types.h>
+
+#if defined(__cplusplus)
+extern "C" {
+#endif
+
+struct host1x_allocate_syncpoint {
+	/**
+	 * @fd: [out]
+	 *
+	 * New file descriptor representing the allocated syncpoint.
+	 */
+	__s32 fd;
+
+	__u32 reserved[3];
+};
+
+struct host1x_syncpoint_info {
+	/**
+	 * @id: [out]
+	 *
+	 * System-global ID of the syncpoint.
+	 */
+	__u32 id;
+
+	__u32 reserved[3];
+};
+
+struct host1x_syncpoint_increment {
+	/**
+	 * @count: [in]
+	 *
+	 * Number of times to increment the syncpoint. The syncpoint can
+	 * be observed at in-between values, but each increment is atomic.
+	 */
+	__u32 count;
+};
+
+struct host1x_read_syncpoint {
+	/**
+	 * @id: [in]
+	 *
+	 * ID of the syncpoint to read.
+	 */
+	__u32 id;
+
+	/**
+	 * @value: [out]
+	 *
+	 * Current value of the syncpoint.
+	 */
+	__u32 value;
+};
+
+struct host1x_create_fence {
+	/**
+	 * @id: [in]
+	 *
+	 * ID of the syncpoint to create a fence for.
+	 */
+	__u32 id;
+
+	/**
+	 * @threshold: [in]
+	 *
+	 * When the syncpoint reaches this value, the fence will be signaled.
+	 * The syncpoint is considered to have reached the threshold when the
+	 * following condition is true:
+	 *
+	 * 	((value - threshold) & 0x80000000U) == 0U
+	 *
+	 */
+	__u32 threshold;
+
+	/**
+	 * @fence_fd: [out]
+	 *
+	 * New sync_file file descriptor containing the created fence.
+	 */
+	__s32 fence_fd;
+
+	__u32 reserved[1];
+};
+
+struct host1x_fence_extract_fence {
+	__u32 id;
+	__u32 threshold;
+};
+
+struct host1x_fence_extract {
+	/**
+	 * @fence_fd: [in]
+	 *
+	 * sync_file file descriptor
+	 */
+	__s32 fence_fd;
+
+	/**
+	 * @num_fences: [in,out]
+	 *
+	 * In: size of the `fences_ptr` array counted in elements.
+	 * Out: required size of the `fences_ptr` array counted in elements.
+	 */
+	__u32 num_fences;
+
+	/**
+	 * @fences_ptr: [in]
+	 *
+	 * Pointer to array of `struct host1x_fence_extract_fence`.
+	 */
+	__u64 fences_ptr;
+
+	__u32 reserved[2];
+};
+
+#define HOST1X_IOCTL_ALLOCATE_SYNCPOINT  _IOWR('X', 0x00, struct host1x_allocate_syncpoint)
+#define HOST1X_IOCTL_READ_SYNCPOINT      _IOR ('X', 0x01, struct host1x_read_syncpoint)
+#define HOST1X_IOCTL_CREATE_FENCE        _IOWR('X', 0x02, struct host1x_create_fence)
+#define HOST1X_IOCTL_SYNCPOINT_INFO      _IOWR('X', 0x03, struct host1x_syncpoint_info)
+#define HOST1X_IOCTL_SYNCPOINT_INCREMENT _IOWR('X', 0x04, struct host1x_syncpoint_increment)
+#define HOST1X_IOCTL_FENCE_EXTRACT       _IOWR('X', 0x05, struct host1x_fence_extract)
+
+#if defined(__cplusplus)
+}
+#endif
+
+#endif
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC PATCH v2 08/17] gpu: host1x: Implement /dev/host1x device node
  2020-09-05 10:34 ` Mikko Perttunen
@ 2020-09-05 10:34   ` Mikko Perttunen
  -1 siblings, 0 replies; 128+ messages in thread
From: Mikko Perttunen @ 2020-09-05 10:34 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Add the /dev/host1x device node, implementing the following
functionality:

- Reading syncpoint values
- Allocating syncpoints (providing syncpoint FDs)
- Incrementing syncpoints (based on syncpoint FD)

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
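For reference, a rough userspace sketch of how the three operations
listed above might be combined. It assumes the device node appears as
/dev/host1x and that the UAPI header from patch 07 is not installed, so
the structures and ioctl numbers are copied locally from that patch.
The helper name host1x_bump_syncpoint() is hypothetical, and this has
not been run against real hardware.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* Local copies of the definitions proposed in include/uapi/linux/host1x.h. */
struct host1x_allocate_syncpoint { __s32 fd; __u32 reserved[3]; };
struct host1x_syncpoint_info { __u32 id; __u32 reserved[3]; };
struct host1x_syncpoint_increment { __u32 count; };
struct host1x_read_syncpoint { __u32 id; __u32 value; };

#define HOST1X_IOCTL_ALLOCATE_SYNCPOINT  _IOWR('X', 0x00, struct host1x_allocate_syncpoint)
#define HOST1X_IOCTL_READ_SYNCPOINT      _IOR ('X', 0x01, struct host1x_read_syncpoint)
#define HOST1X_IOCTL_SYNCPOINT_INFO      _IOWR('X', 0x03, struct host1x_syncpoint_info)
#define HOST1X_IOCTL_SYNCPOINT_INCREMENT _IOWR('X', 0x04, struct host1x_syncpoint_increment)

/*
 * Hypothetical helper: allocate a syncpoint, increment it 'count' times
 * through its FD, then read it back by ID through the device node.
 * Returns 0 on success or a negative errno value.
 */
static int host1x_bump_syncpoint(const char *dev_path, uint32_t count,
				 uint32_t *id, uint32_t *value)
{
	struct host1x_allocate_syncpoint alloc = { 0 };
	struct host1x_syncpoint_info info = { 0 };
	struct host1x_syncpoint_increment incr = { .count = count };
	struct host1x_read_syncpoint rd = { 0 };
	int dev_fd, err = 0;

	dev_fd = open(dev_path, O_RDWR);
	if (dev_fd < 0)
		return -errno;

	/* Allocation goes through the device node and returns a syncpoint FD. */
	if (ioctl(dev_fd, HOST1X_IOCTL_ALLOCATE_SYNCPOINT, &alloc) < 0) {
		err = -errno;
		goto close_dev;
	}

	/* Info and increment are issued on the syncpoint FD itself. */
	if (ioctl(alloc.fd, HOST1X_IOCTL_SYNCPOINT_INFO, &info) < 0 ||
	    ioctl(alloc.fd, HOST1X_IOCTL_SYNCPOINT_INCREMENT, &incr) < 0) {
		err = -errno;
		goto close_syncpt;
	}

	/* Reading works by system-global ID and needs no syncpoint FD. */
	rd.id = info.id;
	if (ioctl(dev_fd, HOST1X_IOCTL_READ_SYNCPOINT, &rd) < 0) {
		err = -errno;
		goto close_syncpt;
	}

	*id = rd.id;
	*value = rd.value;

close_syncpt:
	close(alloc.fd);	/* drops the syncpoint reference held by the FD */
close_dev:
	close(dev_fd);
	return err;
}
```

The reserved fields are zero-initialized because the ioctl handlers in
this patch reject non-zero values with -EINVAL.
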
 drivers/gpu/host1x/Makefile |   1 +
 drivers/gpu/host1x/dev.c    |   9 ++
 drivers/gpu/host1x/dev.h    |   3 +
 drivers/gpu/host1x/uapi.c   | 275 ++++++++++++++++++++++++++++++++++++
 drivers/gpu/host1x/uapi.h   |  22 +++
 include/linux/host1x.h      |   2 +
 6 files changed, 312 insertions(+)
 create mode 100644 drivers/gpu/host1x/uapi.c
 create mode 100644 drivers/gpu/host1x/uapi.h

diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
index 096017b8789d..882f928d75e1 100644
--- a/drivers/gpu/host1x/Makefile
+++ b/drivers/gpu/host1x/Makefile
@@ -9,6 +9,7 @@ host1x-y = \
 	job.o \
 	debug.o \
 	mipi.o \
+	uapi.o \
 	hw/host1x01.o \
 	hw/host1x02.o \
 	hw/host1x04.o \
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index d0ebb70e2fdd..641317d23828 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -461,6 +461,12 @@ static int host1x_probe(struct platform_device *pdev)
 		goto deinit_syncpt;
 	}
 
+	err = host1x_uapi_init(&host->uapi, host);
+	if (err) {
+		dev_err(&pdev->dev, "failed to initialize uapi\n");
+		goto deinit_intr;
+	}
+
 	host1x_debug_init(host);
 
 	if (host->info->has_hypervisor)
@@ -480,6 +486,8 @@ static int host1x_probe(struct platform_device *pdev)
 	host1x_unregister(host);
 deinit_debugfs:
 	host1x_debug_deinit(host);
+	host1x_uapi_deinit(&host->uapi);
+deinit_intr:
 	host1x_intr_deinit(host);
 deinit_syncpt:
 	host1x_syncpt_deinit(host);
@@ -501,6 +509,7 @@ static int host1x_remove(struct platform_device *pdev)
 
 	host1x_unregister(host);
 	host1x_debug_deinit(host);
+	host1x_uapi_deinit(&host->uapi);
 	host1x_intr_deinit(host);
 	host1x_syncpt_deinit(host);
 	reset_control_assert(host->rst);
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
index 63010ae37a97..7b8b7e20e32b 100644
--- a/drivers/gpu/host1x/dev.h
+++ b/drivers/gpu/host1x/dev.h
@@ -17,6 +17,7 @@
 #include "intr.h"
 #include "job.h"
 #include "syncpt.h"
+#include "uapi.h"
 
 struct host1x_syncpt;
 struct host1x_syncpt_base;
@@ -143,6 +144,8 @@ struct host1x {
 	struct list_head list;
 
 	struct device_dma_parameters dma_parms;
+
+	struct host1x_uapi uapi;
 };
 
 void host1x_hypervisor_writel(struct host1x *host1x, u32 r, u32 v);
diff --git a/drivers/gpu/host1x/uapi.c b/drivers/gpu/host1x/uapi.c
new file mode 100644
index 000000000000..bc10e5fc0813
--- /dev/null
+++ b/drivers/gpu/host1x/uapi.c
@@ -0,0 +1,275 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * /dev/host1x syncpoint interface
+ *
+ * Copyright (c) 2020, NVIDIA Corporation.
+ */
+
+#include <linux/anon_inodes.h>
+#include <linux/cdev.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/host1x.h>
+#include <linux/nospec.h>
+
+#include "dev.h"
+#include "syncpt.h"
+#include "uapi.h"
+
+#include <uapi/linux/host1x.h>
+
+static int syncpt_file_release(struct inode *inode, struct file *file)
+{
+	struct host1x_syncpt *sp = file->private_data;
+
+	host1x_syncpt_put(sp);
+
+	return 0;
+}
+
+static int syncpt_file_ioctl_info(struct host1x_syncpt *sp, void __user *data)
+{
+	struct host1x_syncpoint_info args;
+	unsigned long copy_err;
+
+	copy_err = copy_from_user(&args, data, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	if (args.reserved[0] || args.reserved[1] || args.reserved[2])
+		return -EINVAL;
+
+	args.id = sp->id;
+
+	copy_err = copy_to_user(data, &args, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	return 0;
+}
+
+static int syncpt_file_ioctl_incr(struct host1x_syncpt *sp, void __user *data)
+{
+	struct host1x_syncpoint_increment args;
+	unsigned long copy_err;
+	u32 i;
+
+	copy_err = copy_from_user(&args, data, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	for (i = 0; i < args.count; i++) {
+		host1x_syncpt_incr(sp);
+		if (signal_pending(current))
+			return -EINTR;
+	}
+
+	return 0;
+}
+
+static long syncpt_file_ioctl(struct file *file, unsigned int cmd,
+			      unsigned long arg)
+{
+	void __user *data = (void __user *)arg;
+	long err;
+
+	switch (cmd) {
+	case HOST1X_IOCTL_SYNCPOINT_INFO:
+		err = syncpt_file_ioctl_info(file->private_data, data);
+		break;
+
+	case HOST1X_IOCTL_SYNCPOINT_INCREMENT:
+		err = syncpt_file_ioctl_incr(file->private_data, data);
+		break;
+
+	default:
+		err = -ENOTTY;
+	}
+
+	return err;
+}
+
+static const struct file_operations syncpt_file_fops = {
+	.owner = THIS_MODULE,
+	.release = syncpt_file_release,
+	.unlocked_ioctl = syncpt_file_ioctl,
+	.compat_ioctl = syncpt_file_ioctl,
+};
+
+struct host1x_syncpt *host1x_syncpt_fd_get(int fd)
+{
+	struct host1x_syncpt *sp;
+	struct file *file = fget(fd);
+
+	if (!file)
+		return ERR_PTR(-EINVAL);
+
+	if (file->f_op != &syncpt_file_fops) {
+		fput(file);
+		return ERR_PTR(-EINVAL);
+	}
+
+	sp = file->private_data;
+
+	host1x_syncpt_get(sp);
+
+	fput(file);
+
+	return sp;
+}
+EXPORT_SYMBOL(host1x_syncpt_fd_get);
+
+static int dev_file_open(struct inode *inode, struct file *file)
+{
+	struct host1x_uapi *uapi =
+		container_of(inode->i_cdev, struct host1x_uapi, cdev);
+
+	file->private_data = container_of(uapi, struct host1x, uapi);
+
+	return 0;
+}
+
+static int dev_file_ioctl_read_syncpoint(struct host1x *host1x,
+					 void __user *data)
+{
+	struct host1x_read_syncpoint args;
+	unsigned long copy_err;
+
+	copy_err = copy_from_user(&args, data, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	if (args.id >= host1x_syncpt_nb_pts(host1x))
+		return -EINVAL;
+
+	args.id = array_index_nospec(args.id, host1x_syncpt_nb_pts(host1x));
+	args.value = host1x_syncpt_read(&host1x->syncpt[args.id]);
+
+	copy_err = copy_to_user(data, &args, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	return 0;
+}
+
+static int dev_file_ioctl_alloc_syncpoint(struct host1x *host1x,
+					  void __user *data)
+{
+	struct host1x_allocate_syncpoint args;
+	struct host1x_syncpt *sp;
+	unsigned long copy_err;
+	int err;
+
+	copy_err = copy_from_user(&args, data, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	if (args.reserved[0] || args.reserved[1] || args.reserved[2])
+		return -EINVAL;
+
+	sp = host1x_syncpt_alloc(host1x, NULL, HOST1X_SYNCPT_CLIENT_MANAGED);
+	if (!sp)
+		return -EBUSY;
+
+	err = anon_inode_getfd("host1x_syncpt", &syncpt_file_fops, sp,
+			       O_CLOEXEC);
+	if (err < 0)
+		goto free_syncpt;
+
+	args.fd = err;
+
+	copy_err = copy_to_user(data, &args, sizeof(args));
+	if (copy_err) {
+		err = -EFAULT;
+		goto put_fd;
+	}
+
+	return 0;
+
+put_fd:
+	put_unused_fd(args.fd);
+free_syncpt:
+	host1x_syncpt_put(sp);
+
+	return err;
+}
+
+static long dev_file_ioctl(struct file *file, unsigned int cmd,
+			   unsigned long arg)
+{
+	void __user *data = (void __user *)arg;
+	long err;
+
+	switch (cmd) {
+	case HOST1X_IOCTL_READ_SYNCPOINT:
+		err = dev_file_ioctl_read_syncpoint(file->private_data, data);
+		break;
+
+	case HOST1X_IOCTL_ALLOCATE_SYNCPOINT:
+		err = dev_file_ioctl_alloc_syncpoint(file->private_data, data);
+		break;
+
+	default:
+		err = -ENOTTY;
+	}
+
+	return err;
+}
+
+static const struct file_operations dev_file_fops = {
+	.owner = THIS_MODULE,
+	.open = dev_file_open,
+	.unlocked_ioctl = dev_file_ioctl,
+	.compat_ioctl = dev_file_ioctl,
+};
+
+int host1x_uapi_init(struct host1x_uapi *uapi, struct host1x *host1x)
+{
+	int err;
+	dev_t dev_num;
+
+	err = alloc_chrdev_region(&dev_num, 0, 1, "host1x");
+	if (err)
+		return err;
+
+	uapi->class = class_create(THIS_MODULE, "host1x");
+	if (IS_ERR(uapi->class)) {
+		err = PTR_ERR(uapi->class);
+		goto unregister_chrdev_region;
+	}
+
+	cdev_init(&uapi->cdev, &dev_file_fops);
+	err = cdev_add(&uapi->cdev, dev_num, 1);
+	if (err)
+		goto destroy_class;
+
+	uapi->dev = device_create(uapi->class, host1x->dev,
+				  dev_num, NULL, "host1x");
+	if (IS_ERR(uapi->dev)) {
+		err = PTR_ERR(uapi->dev);
+		goto del_cdev;
+	}
+
+	cdev_add(&uapi->cdev, dev_num, 1);
+
+	uapi->dev_num = dev_num;
+
+	return 0;
+
+del_cdev:
+	cdev_del(&uapi->cdev);
+destroy_class:
+	class_destroy(uapi->class);
+unregister_chrdev_region:
+	unregister_chrdev_region(dev_num, 1);
+
+	return err;
+}
+
+void host1x_uapi_deinit(struct host1x_uapi *uapi)
+{
+	device_destroy(uapi->class, uapi->dev_num);
+	cdev_del(&uapi->cdev);
+	class_destroy(uapi->class);
+	unregister_chrdev_region(uapi->dev_num, 1);
+}
diff --git a/drivers/gpu/host1x/uapi.h b/drivers/gpu/host1x/uapi.h
new file mode 100644
index 000000000000..7beb5e44c1b1
--- /dev/null
+++ b/drivers/gpu/host1x/uapi.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2020, NVIDIA Corporation.
+ */
+
+#ifndef HOST1X_UAPI_H
+#define HOST1X_UAPI_H
+
+#include <linux/cdev.h>
+
+struct host1x_uapi {
+	struct class *class;
+
+	struct cdev cdev;
+	struct device *dev;
+	dev_t dev_num;
+};
+
+int host1x_uapi_init(struct host1x_uapi *uapi, struct host1x *host1x);
+void host1x_uapi_deinit(struct host1x_uapi *uapi);
+
+#endif
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index da87ceb33c2d..b970e1bbc29d 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -160,6 +160,8 @@ void host1x_syncpt_put(struct host1x_syncpt *sp);
 struct host1x_syncpt_base *host1x_syncpt_get_base(struct host1x_syncpt *sp);
 u32 host1x_syncpt_base_id(struct host1x_syncpt_base *base);
 
+struct host1x_syncpt *host1x_syncpt_fd_get(int fd);
+
 /*
  * host1x channel
  */
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC PATCH v2 08/17] gpu: host1x: Implement /dev/host1x device node
@ 2020-09-05 10:34   ` Mikko Perttunen
  0 siblings, 0 replies; 128+ messages in thread
From: Mikko Perttunen @ 2020-09-05 10:34 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, talho, bhuntsman, dri-devel, Mikko Perttunen

Add the /dev/host1x device node, implementing the following
functionality:

- Reading syncpoint values
- Allocating syncpoints (providing syncpoint FDs)
- Incrementing syncpoints (based on syncpoint FD)

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/Makefile |   1 +
 drivers/gpu/host1x/dev.c    |   9 ++
 drivers/gpu/host1x/dev.h    |   3 +
 drivers/gpu/host1x/uapi.c   | 275 ++++++++++++++++++++++++++++++++++++
 drivers/gpu/host1x/uapi.h   |  22 +++
 include/linux/host1x.h      |   2 +
 6 files changed, 312 insertions(+)
 create mode 100644 drivers/gpu/host1x/uapi.c
 create mode 100644 drivers/gpu/host1x/uapi.h

diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
index 096017b8789d..882f928d75e1 100644
--- a/drivers/gpu/host1x/Makefile
+++ b/drivers/gpu/host1x/Makefile
@@ -9,6 +9,7 @@ host1x-y = \
 	job.o \
 	debug.o \
 	mipi.o \
+	uapi.o \
 	hw/host1x01.o \
 	hw/host1x02.o \
 	hw/host1x04.o \
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index d0ebb70e2fdd..641317d23828 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -461,6 +461,12 @@ static int host1x_probe(struct platform_device *pdev)
 		goto deinit_syncpt;
 	}
 
+	err = host1x_uapi_init(&host->uapi, host);
+	if (err) {
+		dev_err(&pdev->dev, "failed to initialize uapi\n");
+		goto deinit_intr;
+	}
+
 	host1x_debug_init(host);
 
 	if (host->info->has_hypervisor)
@@ -480,6 +486,8 @@ static int host1x_probe(struct platform_device *pdev)
 	host1x_unregister(host);
 deinit_debugfs:
 	host1x_debug_deinit(host);
+	host1x_uapi_deinit(&host->uapi);
+deinit_intr:
 	host1x_intr_deinit(host);
 deinit_syncpt:
 	host1x_syncpt_deinit(host);
@@ -501,6 +509,7 @@ static int host1x_remove(struct platform_device *pdev)
 
 	host1x_unregister(host);
 	host1x_debug_deinit(host);
+	host1x_uapi_deinit(&host->uapi);
 	host1x_intr_deinit(host);
 	host1x_syncpt_deinit(host);
 	reset_control_assert(host->rst);
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
index 63010ae37a97..7b8b7e20e32b 100644
--- a/drivers/gpu/host1x/dev.h
+++ b/drivers/gpu/host1x/dev.h
@@ -17,6 +17,7 @@
 #include "intr.h"
 #include "job.h"
 #include "syncpt.h"
+#include "uapi.h"
 
 struct host1x_syncpt;
 struct host1x_syncpt_base;
@@ -143,6 +144,8 @@ struct host1x {
 	struct list_head list;
 
 	struct device_dma_parameters dma_parms;
+
+	struct host1x_uapi uapi;
 };
 
 void host1x_hypervisor_writel(struct host1x *host1x, u32 r, u32 v);
diff --git a/drivers/gpu/host1x/uapi.c b/drivers/gpu/host1x/uapi.c
new file mode 100644
index 000000000000..bc10e5fc0813
--- /dev/null
+++ b/drivers/gpu/host1x/uapi.c
@@ -0,0 +1,275 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * /dev/host1x syncpoint interface
+ *
+ * Copyright (c) 2020, NVIDIA Corporation.
+ */
+
+#include <linux/anon_inodes.h>
+#include <linux/cdev.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/host1x.h>
+#include <linux/nospec.h>
+
+#include "dev.h"
+#include "syncpt.h"
+#include "uapi.h"
+
+#include <uapi/linux/host1x.h>
+
+static int syncpt_file_release(struct inode *inode, struct file *file)
+{
+	struct host1x_syncpt *sp = file->private_data;
+
+	host1x_syncpt_put(sp);
+
+	return 0;
+}
+
+static int syncpt_file_ioctl_info(struct host1x_syncpt *sp, void __user *data)
+{
+	struct host1x_syncpoint_info args;
+	unsigned long copy_err;
+
+	copy_err = copy_from_user(&args, data, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	if (args.reserved[0] || args.reserved[1] || args.reserved[2])
+		return -EINVAL;
+
+	args.id = sp->id;
+
+	copy_err = copy_to_user(data, &args, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	return 0;
+}
+
+static int syncpt_file_ioctl_incr(struct host1x_syncpt *sp, void __user *data)
+{
+	struct host1x_syncpoint_increment args;
+	unsigned long copy_err;
+	u32 i;
+
+	copy_err = copy_from_user(&args, data, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	for (i = 0; i < args.count; i++) {
+		host1x_syncpt_incr(sp);
+		if (signal_pending(current))
+			return -EINTR;
+	}
+
+	return 0;
+}
+
+static long syncpt_file_ioctl(struct file *file, unsigned int cmd,
+			      unsigned long arg)
+{
+	void __user *data = (void __user *)arg;
+	long err;
+
+	switch (cmd) {
+	case HOST1X_IOCTL_SYNCPOINT_INFO:
+		err = syncpt_file_ioctl_info(file->private_data, data);
+		break;
+
+	case HOST1X_IOCTL_SYNCPOINT_INCREMENT:
+		err = syncpt_file_ioctl_incr(file->private_data, data);
+		break;
+
+	default:
+		err = -ENOTTY;
+	}
+
+	return err;
+}
+
+static const struct file_operations syncpt_file_fops = {
+	.owner = THIS_MODULE,
+	.release = syncpt_file_release,
+	.unlocked_ioctl = syncpt_file_ioctl,
+	.compat_ioctl = syncpt_file_ioctl,
+};
+
+struct host1x_syncpt *host1x_syncpt_fd_get(int fd)
+{
+	struct host1x_syncpt *sp;
+	struct file *file = fget(fd);
+
+	if (!file)
+		return ERR_PTR(-EINVAL);
+
+	if (file->f_op != &syncpt_file_fops) {
+		fput(file);
+		return ERR_PTR(-EINVAL);
+	}
+
+	sp = file->private_data;
+
+	host1x_syncpt_get(sp);
+
+	fput(file);
+
+	return sp;
+}
+EXPORT_SYMBOL(host1x_syncpt_fd_get);
+
+static int dev_file_open(struct inode *inode, struct file *file)
+{
+	struct host1x_uapi *uapi =
+		container_of(inode->i_cdev, struct host1x_uapi, cdev);
+
+	file->private_data = container_of(uapi, struct host1x, uapi);
+
+	return 0;
+}
+
+static int dev_file_ioctl_read_syncpoint(struct host1x *host1x,
+					 void __user *data)
+{
+	struct host1x_read_syncpoint args;
+	unsigned long copy_err;
+
+	copy_err = copy_from_user(&args, data, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	if (args.id >= host1x_syncpt_nb_pts(host1x))
+		return -EINVAL;
+
+	args.id = array_index_nospec(args.id, host1x_syncpt_nb_pts(host1x));
+	args.value = host1x_syncpt_read(&host1x->syncpt[args.id]);
+
+	copy_err = copy_to_user(data, &args, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	return 0;
+}
+
+static int dev_file_ioctl_alloc_syncpoint(struct host1x *host1x,
+					  void __user *data)
+{
+	struct host1x_allocate_syncpoint args;
+	struct host1x_syncpt *sp;
+	unsigned long copy_err;
+	int err;
+
+	copy_err = copy_from_user(&args, data, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	if (args.reserved[0] || args.reserved[1] || args.reserved[2])
+		return -EINVAL;
+
+	sp = host1x_syncpt_alloc(host1x, NULL, HOST1X_SYNCPT_CLIENT_MANAGED);
+	if (!sp)
+		return -EBUSY;
+
+	err = anon_inode_getfd("host1x_syncpt", &syncpt_file_fops, sp,
+			       O_CLOEXEC);
+	if (err < 0)
+		goto free_syncpt;
+
+	args.fd = err;
+
+	copy_err = copy_to_user(data, &args, sizeof(args));
+	if (copy_err) {
+		err = -EFAULT;
+		goto put_fd;
+	}
+
+	return 0;
+
+put_fd:
+	put_unused_fd(args.fd);
+free_syncpt:
+	host1x_syncpt_put(sp);
+
+	return err;
+}
+
+static long dev_file_ioctl(struct file *file, unsigned int cmd,
+			   unsigned long arg)
+{
+	void __user *data = (void __user *)arg;
+	long err;
+
+	switch (cmd) {
+	case HOST1X_IOCTL_READ_SYNCPOINT:
+		err = dev_file_ioctl_read_syncpoint(file->private_data, data);
+		break;
+
+	case HOST1X_IOCTL_ALLOCATE_SYNCPOINT:
+		err = dev_file_ioctl_alloc_syncpoint(file->private_data, data);
+		break;
+
+	default:
+		err = -ENOTTY;
+	}
+
+	return err;
+}
+
+static const struct file_operations dev_file_fops = {
+	.owner = THIS_MODULE,
+	.open = dev_file_open,
+	.unlocked_ioctl = dev_file_ioctl,
+	.compat_ioctl = dev_file_ioctl,
+};
+
+int host1x_uapi_init(struct host1x_uapi *uapi, struct host1x *host1x)
+{
+	int err;
+	dev_t dev_num;
+
+	err = alloc_chrdev_region(&dev_num, 0, 1, "host1x");
+	if (err)
+		return err;
+
+	uapi->class = class_create(THIS_MODULE, "host1x");
+	if (IS_ERR(uapi->class)) {
+		err = PTR_ERR(uapi->class);
+		goto unregister_chrdev_region;
+	}
+
+	cdev_init(&uapi->cdev, &dev_file_fops);
+	err = cdev_add(&uapi->cdev, dev_num, 1);
+	if (err)
+		goto destroy_class;
+
+	uapi->dev = device_create(uapi->class, host1x->dev,
+				  dev_num, NULL, "host1x");
+	if (IS_ERR(uapi->dev)) {
+		err = PTR_ERR(uapi->dev);
+		goto del_cdev;
+	}
+
+	cdev_add(&uapi->cdev, dev_num, 1);
+
+	uapi->dev_num = dev_num;
+
+	return 0;
+
+del_cdev:
+	cdev_del(&uapi->cdev);
+destroy_class:
+	class_destroy(uapi->class);
+unregister_chrdev_region:
+	unregister_chrdev_region(dev_num, 1);
+
+	return err;
+}
+
+void host1x_uapi_deinit(struct host1x_uapi *uapi)
+{
+	device_destroy(uapi->class, uapi->dev_num);
+	cdev_del(&uapi->cdev);
+	class_destroy(uapi->class);
+	unregister_chrdev_region(uapi->dev_num, 1);
+}
diff --git a/drivers/gpu/host1x/uapi.h b/drivers/gpu/host1x/uapi.h
new file mode 100644
index 000000000000..7beb5e44c1b1
--- /dev/null
+++ b/drivers/gpu/host1x/uapi.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2020, NVIDIA Corporation.
+ */
+
+#ifndef HOST1X_UAPI_H
+#define HOST1X_UAPI_H
+
+#include <linux/cdev.h>
+
+struct host1x_uapi {
+	struct class *class;
+
+	struct cdev cdev;
+	struct device *dev;
+	dev_t dev_num;
+};
+
+int host1x_uapi_init(struct host1x_uapi *uapi, struct host1x *host1x);
+void host1x_uapi_deinit(struct host1x_uapi *uapi);
+
+#endif
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index da87ceb33c2d..b970e1bbc29d 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -160,6 +160,8 @@ void host1x_syncpt_put(struct host1x_syncpt *sp);
 struct host1x_syncpt_base *host1x_syncpt_get_base(struct host1x_syncpt *sp);
 u32 host1x_syncpt_base_id(struct host1x_syncpt_base *base);
 
+struct host1x_syncpt *host1x_syncpt_fd_get(int fd);
+
 /*
  * host1x channel
  */
-- 
2.28.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC PATCH v2 09/17] gpu: host1x: DMA fences and userspace fence creation
  2020-09-05 10:34 ` Mikko Perttunen
@ 2020-09-05 10:34   ` Mikko Perttunen
  -1 siblings, 0 replies; 128+ messages in thread
From: Mikko Perttunen @ 2020-09-05 10:34 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Add an implementation of dma_fences based on syncpoints. Syncpoint
interrupts are used to signal fences. Additionally, after
software signaling has been enabled, a 30 second timeout is started.
If the syncpoint threshold is not reached within this period,
the fence is signalled with an -ETIMEDOUT error code. This is to
allow fences that would never reach their syncpoint threshold to
be cleaned up.

Additionally, add new /dev/host1x IOCTLs for creating sync_file
file descriptors backed by syncpoint fences, and for extracting
the syncpoint ID and threshold pairs from such a file descriptor.
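
Both the syncpoint interrupt and the 30 second timeout may try to
complete the same fence, so exactly one of them must win. Below is a
minimal userspace sketch of that claim-once pattern, assuming C11
atomics as a stand-in for the kernel's atomic_t. The names
model_fence, model_claim, model_signal and model_timeout are
hypothetical and exist only for illustration; they are not part of
the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct host1x_syncpt_fence. */
struct model_fence {
	atomic_int signaling;	/* 0 until one path claims completion */
	bool signalled;
	int error;		/* 0 on success, negative errno on timeout */
};

/* Returns true only for the first caller; later callers lose the race. */
static bool model_claim(struct model_fence *f)
{
	return atomic_exchange(&f->signaling, 1) == 0;
}

/* Models the syncpoint interrupt path (host1x_fence_signal). */
static bool model_signal(struct model_fence *f)
{
	if (!model_claim(f))
		return false;

	f->signalled = true;
	return true;
}

/* Models the timeout path (do_fence_timeout). */
static bool model_timeout(struct model_fence *f)
{
	if (!model_claim(f))
		return false;

	/* Only the winner of the claim may touch the completion state. */
	assert(!f->signalled);

	f->error = -ETIMEDOUT;
	f->signalled = true;
	return true;
}
```

Whichever of model_signal() or model_timeout() runs first completes
the fence; the other returns false and leaves the state untouched.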

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/Makefile |   1 +
 drivers/gpu/host1x/fence.c  | 207 ++++++++++++++++++++++++++++++++++++
 drivers/gpu/host1x/fence.h  |  15 +++
 drivers/gpu/host1x/intr.c   |   9 ++
 drivers/gpu/host1x/intr.h   |   2 +
 drivers/gpu/host1x/uapi.c   | 106 ++++++++++++++++++
 include/linux/host1x.h      |   3 +
 7 files changed, 343 insertions(+)
 create mode 100644 drivers/gpu/host1x/fence.c
 create mode 100644 drivers/gpu/host1x/fence.h

diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
index 882f928d75e1..a48af2cefae1 100644
--- a/drivers/gpu/host1x/Makefile
+++ b/drivers/gpu/host1x/Makefile
@@ -10,6 +10,7 @@ host1x-y = \
 	debug.o \
 	mipi.o \
 	uapi.o \
+	fence.o \
 	hw/host1x01.o \
 	hw/host1x02.o \
 	hw/host1x04.o \
diff --git a/drivers/gpu/host1x/fence.c b/drivers/gpu/host1x/fence.c
new file mode 100644
index 000000000000..400da6c1ab48
--- /dev/null
+++ b/drivers/gpu/host1x/fence.c
@@ -0,0 +1,207 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Syncpoint dma_fence implementation
+ *
+ * Copyright (c) 2020, NVIDIA Corporation.
+ */
+
+#include <linux/dma-fence.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include <linux/sync_file.h>
+
+#include "intr.h"
+#include "syncpt.h"
+
+static DEFINE_SPINLOCK(lock);
+
+struct host1x_syncpt_fence {
+	struct dma_fence base;
+
+	atomic_t signaling;
+
+	struct host1x_syncpt *sp;
+	u32 threshold;
+
+	struct host1x_waitlist *waiter;
+	void *waiter_ref;
+
+	struct delayed_work timeout_work;
+};
+
+static const char *syncpt_fence_get_driver_name(struct dma_fence *f)
+{
+	return "host1x";
+}
+
+static const char *syncpt_fence_get_timeline_name(struct dma_fence *f)
+{
+	return "syncpoint";
+}
+
+static bool syncpt_fence_enable_signaling(struct dma_fence *f)
+{
+	struct host1x_syncpt_fence *sf =
+		container_of(f, struct host1x_syncpt_fence, base);
+	int err;
+
+	if (host1x_syncpt_is_expired(sf->sp, sf->threshold))
+		return false;
+
+	dma_fence_get(f);
+
+	/*
+	 * The dma_fence framework requires the fence driver to keep a
+	 * reference to any fences for which 'enable_signaling' has been
+	 * called (and that have not been signalled).
+	 *
+	 * We provide a userspace API to create arbitrary syncpoint fences,
+	 * so we cannot normally guarantee that all fences get signalled.
+	 * As such, setup a timeout, so that long-lasting fences will get
+	 * reaped eventually.
+	 */
+	schedule_delayed_work(&sf->timeout_work, msecs_to_jiffies(30000));
+
+	err = host1x_intr_add_action(sf->sp->host, sf->sp, sf->threshold,
+				     HOST1X_INTR_ACTION_SIGNAL_FENCE, f,
+				     sf->waiter, &sf->waiter_ref);
+	if (err) {
+		cancel_delayed_work_sync(&sf->timeout_work);
+		dma_fence_put(f);
+		return false;
+	}
+
+	/* intr framework takes ownership of waiter */
+	sf->waiter = NULL;
+
+	/*
+	 * The fence may get signalled at any time after the above call,
+	 * so we need to initialize all state used by signalling
+	 * before it.
+	 */
+
+	return true;
+}
+
+static void syncpt_fence_release(struct dma_fence *f)
+{
+	struct host1x_syncpt_fence *sf =
+		container_of(f, struct host1x_syncpt_fence, base);
+
+	if (sf->waiter)
+		kfree(sf->waiter);
+
+	dma_fence_free(f);
+}
+
+const struct dma_fence_ops syncpt_fence_ops = {
+	.get_driver_name = syncpt_fence_get_driver_name,
+	.get_timeline_name = syncpt_fence_get_timeline_name,
+	.enable_signaling = syncpt_fence_enable_signaling,
+	.release = syncpt_fence_release,
+};
+
+void host1x_fence_signal(struct host1x_syncpt_fence *f)
+{
+	if (atomic_xchg(&f->signaling, 1))
+		return;
+
+	/*
+	 * Cancel pending timeout work - if it races, it will
+	 * not get 'f->signaling' and return.
+	 */
+	cancel_delayed_work_sync(&f->timeout_work);
+
+	host1x_intr_put_ref(f->sp->host, f->sp->id, f->waiter_ref);
+
+	dma_fence_signal(&f->base);
+	dma_fence_put(&f->base);
+}
+
+static void do_fence_timeout(struct work_struct *work)
+{
+	struct delayed_work *dwork = to_delayed_work(work);
+	struct host1x_syncpt_fence *f =
+		container_of(dwork, struct host1x_syncpt_fence, timeout_work);
+
+	if (atomic_xchg(&f->signaling, 1))
+		return;
+
+	/*
+	 * We won the race against host1x_fence_signal(), so release
+	 * the waiter reference and signal the fence with an error.
+	 */
+	host1x_intr_put_ref(f->sp->host, f->sp->id, f->waiter_ref);
+
+	dma_fence_set_error(&f->base, -ETIMEDOUT);
+	dma_fence_signal(&f->base);
+	dma_fence_put(&f->base);
+}
+
+struct dma_fence *host1x_fence_create(struct host1x_syncpt *sp, u32 threshold)
+{
+	struct host1x_syncpt_fence *fence;
+
+	fence = kzalloc(sizeof(*fence), GFP_KERNEL);
+	if (!fence)
+		return ERR_PTR(-ENOMEM);
+
+	fence->waiter = kzalloc(sizeof(*fence->waiter), GFP_KERNEL);
+	if (!fence->waiter)
+		return ERR_PTR(-ENOMEM);
+
+	fence->sp = sp;
+	fence->threshold = threshold;
+
+	dma_fence_init(&fence->base, &syncpt_fence_ops, &lock,
+		       dma_fence_context_alloc(1), 0);
+
+	INIT_DELAYED_WORK(&fence->timeout_work, do_fence_timeout);
+
+	return &fence->base;
+}
+EXPORT_SYMBOL(host1x_fence_create);
+
+int host1x_fence_create_fd(struct host1x_syncpt *sp, u32 threshold)
+{
+	struct sync_file *file;
+	struct dma_fence *f;
+	int fd;
+
+	f = host1x_fence_create(sp, threshold);
+	if (IS_ERR(f))
+		return PTR_ERR(f);
+
+	fd = get_unused_fd_flags(O_CLOEXEC);
+	if (fd < 0) {
+		dma_fence_put(f);
+		return fd;
+	}
+
+	file = sync_file_create(f);
+	dma_fence_put(f);
+	if (!file)
+		return -ENOMEM;
+
+	fd_install(fd, file->file);
+
+	return fd;
+}
+EXPORT_SYMBOL(host1x_fence_create_fd);
+
+int host1x_fence_extract(struct dma_fence *fence, u32 *id, u32 *threshold)
+{
+	struct host1x_syncpt_fence *f;
+
+	if (fence->ops != &syncpt_fence_ops)
+		return -EINVAL;
+
+	f = container_of(fence, struct host1x_syncpt_fence, base);
+
+	*id = f->sp->id;
+	*threshold = f->threshold;
+
+	return 0;
+}
+EXPORT_SYMBOL(host1x_fence_extract);
diff --git a/drivers/gpu/host1x/fence.h b/drivers/gpu/host1x/fence.h
new file mode 100644
index 000000000000..e36dfc11cca4
--- /dev/null
+++ b/drivers/gpu/host1x/fence.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2020, NVIDIA Corporation.
+ */
+
+#ifndef HOST1X_FENCE_H
+#define HOST1X_FENCE_H
+
+struct host1x_syncpt_fence;
+
+void host1x_fence_signal(struct host1x_syncpt_fence *fence);
+
+int host1x_fence_extract(struct dma_fence *fence, u32 *id, u32 *threshold);
+
+#endif
diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c
index 5d328d20ce6d..19b59c5c94d0 100644
--- a/drivers/gpu/host1x/intr.c
+++ b/drivers/gpu/host1x/intr.c
@@ -13,6 +13,7 @@
 #include <trace/events/host1x.h>
 #include "channel.h"
 #include "dev.h"
+#include "fence.h"
 #include "intr.h"
 
 /* Wait list management */
@@ -121,12 +122,20 @@ static void action_wakeup_interruptible(struct host1x_waitlist *waiter)
 	wake_up_interruptible(wq);
 }
 
+static void action_signal_fence(struct host1x_waitlist *waiter)
+{
+	struct host1x_syncpt_fence *f = waiter->data;
+
+	host1x_fence_signal(f);
+}
+
 typedef void (*action_handler)(struct host1x_waitlist *waiter);
 
 static const action_handler action_handlers[HOST1X_INTR_ACTION_COUNT] = {
 	action_submit_complete,
 	action_wakeup,
 	action_wakeup_interruptible,
+	action_signal_fence,
 };
 
 static void run_handlers(struct list_head completed[HOST1X_INTR_ACTION_COUNT])
diff --git a/drivers/gpu/host1x/intr.h b/drivers/gpu/host1x/intr.h
index aac38194398f..dedbd0f700fb 100644
--- a/drivers/gpu/host1x/intr.h
+++ b/drivers/gpu/host1x/intr.h
@@ -33,6 +33,8 @@ enum host1x_intr_action {
 	 */
 	HOST1X_INTR_ACTION_WAKEUP_INTERRUPTIBLE,
 
+	HOST1X_INTR_ACTION_SIGNAL_FENCE,
+
 	HOST1X_INTR_ACTION_COUNT
 };
 
diff --git a/drivers/gpu/host1x/uapi.c b/drivers/gpu/host1x/uapi.c
index bc10e5fc0813..aae0f534bc71 100644
--- a/drivers/gpu/host1x/uapi.c
+++ b/drivers/gpu/host1x/uapi.c
@@ -11,8 +11,10 @@
 #include <linux/fs.h>
 #include <linux/host1x.h>
 #include <linux/nospec.h>
+#include <linux/sync_file.h>
 
 #include "dev.h"
+#include "fence.h"
 #include "syncpt.h"
 #include "uapi.h"
 
@@ -194,6 +196,102 @@ static int dev_file_ioctl_alloc_syncpoint(struct host1x *host1x,
 	return err;
 }
 
+static int dev_file_ioctl_create_fence(struct host1x *host1x, void __user *data)
+{
+	struct host1x_create_fence args;
+	unsigned long copy_err;
+	struct sync_file *file;
+	int fd;
+
+	copy_err = copy_from_user(&args, data, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	if (args.reserved[0])
+		return -EINVAL;
+
+	if (args.id >= host1x_syncpt_nb_pts(host1x))
+		return -EINVAL;
+
+	args.id = array_index_nospec(args.id, host1x_syncpt_nb_pts(host1x));
+
+	fd = host1x_fence_create_fd(&host1x->syncpt[args.id], args.threshold);
+	if (fd < 0)
+		return fd;
+
+	args.fence_fd = fd;
+
+	copy_err = copy_to_user(data, &args, sizeof(args));
+	if (copy_err) {
+		fput(file->file);
+		return -EFAULT;
+	}
+
+	return 0;
+}
+
+static int dev_file_ioctl_fence_extract(struct host1x *host1x, void __user *data)
+{
+	struct host1x_fence_extract_fence __user *fences_user_ptr;
+	struct dma_fence *fence, **fences;
+	struct host1x_fence_extract args;
+	struct dma_fence_array *array;
+	unsigned int num_fences, i;
+	unsigned long copy_err;
+	int err;
+
+	copy_err = copy_from_user(&args, data, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	fences_user_ptr = u64_to_user_ptr(args.fences_ptr);
+
+	if (args.reserved[0] || args.reserved[1])
+		return -EINVAL;
+
+	fence = sync_file_get_fence(args.fence_fd);
+	if (!fence)
+		return -EINVAL;
+
+	array = to_dma_fence_array(fence);
+	if (array) {
+		fences = array->fences;
+		num_fences = array->num_fences;
+	} else {
+		fences = &fence;
+		num_fences = 1;
+	}
+
+	for (i = 0; i < min(num_fences, args.num_fences); i++) {
+		struct host1x_fence_extract_fence f;
+
+		err = host1x_fence_extract(fences[i], &f.id, &f.threshold);
+		if (err)
+			goto put_fence;
+
+		copy_err = copy_to_user(fences_user_ptr + i, &f, sizeof(f));
+		if (copy_err) {
+			err = -EFAULT;
+			goto put_fence;
+		}
+	}
+
+	args.num_fences = i+1;
+
+	copy_err = copy_to_user(data, &args, sizeof(args));
+	if (copy_err) {
+		err = -EFAULT;
+		goto put_fence;
+	}
+
+	return 0;
+
+put_fence:
+	dma_fence_put(fence);
+
+	return err;
+}
+
 static long dev_file_ioctl(struct file *file, unsigned int cmd,
 			   unsigned long arg)
 {
@@ -209,6 +307,14 @@ static long dev_file_ioctl(struct file *file, unsigned int cmd,
 		err = dev_file_ioctl_alloc_syncpoint(file->private_data, data);
 		break;
 
+	case HOST1X_IOCTL_CREATE_FENCE:
+		err = dev_file_ioctl_create_fence(file->private_data, data);
+		break;
+
+	case HOST1X_IOCTL_FENCE_EXTRACT:
+		err = dev_file_ioctl_fence_extract(file->private_data, data);
+		break;
+
 	default:
 		err = -ENOTTY;
 	}
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index b970e1bbc29d..73a247e180a9 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -162,6 +162,9 @@ u32 host1x_syncpt_base_id(struct host1x_syncpt_base *base);
 
 struct host1x_syncpt *host1x_syncpt_fd_get(int fd);
 
+struct dma_fence *host1x_fence_create(struct host1x_syncpt *sp, u32 threshold);
+int host1x_fence_create_fd(struct host1x_syncpt *sp, u32 threshold);
+
 /*
  * host1x channel
  */
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC PATCH v2 10/17] WIP: gpu: host1x: Add no-recovery mode
  2020-09-05 10:34 ` Mikko Perttunen
@ 2020-09-05 10:34   ` Mikko Perttunen
  -1 siblings, 0 replies; 128+ messages in thread
From: Mikko Perttunen @ 2020-09-05 10:34 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Add a new property for jobs to enable or disable recovery, i.e.
CPU increments of syncpoints to max value on job timeout. This
allows for a more robust model for hung jobs, where userspace
doesn't need to guess whether a syncpoint increment happened because
the job completed or because the job timeout was triggered.

On job timeout, we stop the channel, NOP all future jobs on the
channel using the same syncpoint, mark the syncpoint as locked
and resume the channel from the next job, if any.

Future jobs are NOPed because, without the CPU increments, the
value of the syncpoint is no longer synchronized, and any waiters
would become confused if a future job incremented the syncpoint.
The syncpoint is marked locked to ensure that no future job can
increment the syncpoint either, until the application has
recognized the situation and reallocated the syncpoint.
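
The timeout handling described above can be sketched in userspace as
follows. This is a simplified model under stated assumptions, not the
driver code: the queue is a plain array rather than the CDMA sync
queue, pushbuffer patching is reduced to a 'cancelled' flag, and
model_syncpt, model_job and model_timeout_no_recovery are hypothetical
names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the syncpoint and job structures. */
struct model_syncpt {
	bool locked;
};

struct model_job {
	struct model_syncpt *syncpt;
	bool cancelled;
};

/*
 * Handle a timeout of queue[failed] without syncpoint recovery:
 * lock the syncpoint, then cancel the failed job and every later
 * job that uses the same syncpoint. Jobs on other syncpoints are
 * left alone. Returns the number of jobs cancelled.
 */
static size_t model_timeout_no_recovery(struct model_job *queue,
					size_t count, size_t failed)
{
	struct model_syncpt *sp;
	size_t i, cancelled = 0;

	assert(failed < count);

	sp = queue[failed].syncpt;
	sp->locked = true;

	for (i = failed; i < count; i++) {
		if (queue[i].syncpt != sp)
			continue;

		queue[i].cancelled = true;
		cancelled++;
	}

	return cancelled;
}
```

With a queue of jobs alternating between two syncpoints, a timeout of
the first job cancels only the jobs sharing its syncpoint and leaves
the other syncpoint unlocked.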

WIP: There is a race condition between the locking and submission:
* Submission passes locking check
* Concurrent existing job times out, locking the syncpoint
* Submission still goes ahead
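
The interleaving listed above can be replayed deterministically in a
small userspace model. This is only a sketch: the three steps run
sequentially on one thread to make the ordering explicit, and
model_syncpt, model_submit_check, model_timeout_lock,
model_submit_commit and model_replay_race are hypothetical names, not
driver functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the syncpoint state relevant to the race. */
struct model_syncpt {
	bool locked;
	unsigned int queued_jobs;
};

/* Submission, step 1: check the 'locked' flag without holding a lock. */
static int model_submit_check(const struct model_syncpt *sp)
{
	return sp->locked ? -EPERM : 0;
}

/* Timeout handler of an earlier job: mark the syncpoint locked. */
static void model_timeout_lock(struct model_syncpt *sp)
{
	sp->locked = true;
}

/* Submission, step 2: queue the job without re-checking 'locked'. */
static void model_submit_commit(struct model_syncpt *sp)
{
	sp->queued_jobs++;
}

/* Replays the interleaving; returns true if a job slipped through. */
static bool model_replay_race(void)
{
	struct model_syncpt sp = { false, 0 };

	assert(model_submit_check(&sp) == 0);	/* passes locking check */
	model_timeout_lock(&sp);		/* existing job times out */
	model_submit_commit(&sp);		/* submission still goes ahead */

	return sp.locked && sp.queued_jobs == 1;
}
```

The model ends with a job queued on a locked syncpoint, which is the
state the check was meant to rule out. Doing the check and the
queueing under a lock shared with the timeout handler would be one
conventional way to close the window, though this patch does not pick
a fix.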

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/drm/tegra/drm.c        |  1 +
 drivers/gpu/host1x/cdma.c          | 42 +++++++++++++++++++++++++-----
 drivers/gpu/host1x/hw/channel_hw.c |  6 ++++-
 drivers/gpu/host1x/job.c           |  4 +++
 drivers/gpu/host1x/syncpt.c        |  2 ++
 drivers/gpu/host1x/syncpt.h        | 12 +++++++++
 include/linux/host1x.h             |  9 +++++++
 7 files changed, 69 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index ceea9db341f0..7437c67924aa 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -197,6 +197,7 @@ int tegra_drm_submit(struct tegra_drm_context *context,
 	job->client = client;
 	job->class = client->class;
 	job->serialize = true;
+	job->syncpt_recovery = true;
 
 	/*
 	 * Track referenced BOs so that they can be unreferenced after the
diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
index 6e6ca774f68d..59ad4ca38292 100644
--- a/drivers/gpu/host1x/cdma.c
+++ b/drivers/gpu/host1x/cdma.c
@@ -312,10 +312,6 @@ static void update_cdma_locked(struct host1x_cdma *cdma)
 	bool signal = false;
 	struct host1x_job *job, *n;
 
-	/* If CDMA is stopped, queue is cleared and we can return */
-	if (!cdma->running)
-		return;
-
 	/*
 	 * Walk the sync queue, reading the sync point registers as necessary,
 	 * to consume as many sync queue entries as possible without blocking
@@ -324,7 +320,8 @@ static void update_cdma_locked(struct host1x_cdma *cdma)
 		struct host1x_syncpt *sp = job->syncpt;
 
 		/* Check whether this syncpt has completed, and bail if not */
-		if (!host1x_syncpt_is_expired(sp, job->syncpt_end)) {
+		if (!host1x_syncpt_is_expired(sp, job->syncpt_end) &&
+		    !job->cancelled) {
 			/* Start timer on next pending syncpt */
 			if (job->timeout)
 				cdma_start_timer_locked(cdma, job);
@@ -413,8 +410,11 @@ void host1x_cdma_update_sync_queue(struct host1x_cdma *cdma,
 	else
 		restart_addr = cdma->last_pos;
 
+	if (!job)
+		goto resume;
+
 	/* do CPU increments for the remaining syncpts */
-	if (job) {
+	if (job->syncpt_recovery) {
 		dev_dbg(dev, "%s: perform CPU incr on pending buffers\n",
 			__func__);
 
@@ -433,8 +433,38 @@ void host1x_cdma_update_sync_queue(struct host1x_cdma *cdma,
 
 		dev_dbg(dev, "%s: finished sync_queue modification\n",
 			__func__);
+	} else {
+		struct host1x_job *failed_job = job;
+
+		host1x_job_dump(dev, job);
+
+		host1x_syncpt_set_locked(job->syncpt);
+		failed_job->cancelled = true;
+
+		list_for_each_entry_continue(job, &cdma->sync_queue, list) {
+			unsigned int i;
+
+			if (job->syncpt != failed_job->syncpt)
+				continue;
+
+			for (i = 0; i < job->num_slots; i++) {
+				unsigned int slot = (job->first_get/8 + i) %
+						    HOST1X_PUSHBUFFER_SLOTS;
+				u32 *mapped = cdma->push_buffer.mapped;
+
+				mapped[2*slot+0] = 0x1bad0000;
+				mapped[2*slot+1] = 0x1bad0000;
+			}
+
+			job->cancelled = true;
+		}
+
+		wmb();
+
+		update_cdma_locked(cdma);
 	}
 
+resume:
 	/* roll back DMAGET and start up channel again */
 	host1x_hw_cdma_resume(host1x, cdma, restart_addr);
 }
diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
index d4c28faf27d1..145746c6f6fb 100644
--- a/drivers/gpu/host1x/hw/channel_hw.c
+++ b/drivers/gpu/host1x/hw/channel_hw.c
@@ -129,6 +129,10 @@ static int channel_submit(struct host1x_job *job)
 				    job->num_gathers, job->num_relocs,
 				    job->syncpt->id, job->syncpt_incrs);
 
+	/* TODO this is racy */
+	if (job->syncpt->locked)
+		return -EPERM;
+
 	/* before error checks, return current max */
 	prev_max = job->syncpt_end = host1x_syncpt_read_max(sp);
 
@@ -191,7 +195,7 @@ static int channel_submit(struct host1x_job *job)
 	/* schedule a submit complete interrupt */
 	err = host1x_intr_add_action(host, sp, syncval,
 				     HOST1X_INTR_ACTION_SUBMIT_COMPLETE, ch,
-				     completed_waiter, NULL);
+				     completed_waiter, &job->waiter);
 	completed_waiter = NULL;
 	WARN(err, "Failed to set submit complete interrupt");
 
diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
index d8345d3bf0b3..e4f16fc899b0 100644
--- a/drivers/gpu/host1x/job.c
+++ b/drivers/gpu/host1x/job.c
@@ -79,6 +79,10 @@ static void job_free(struct kref *ref)
 {
 	struct host1x_job *job = container_of(ref, struct host1x_job, ref);
 
+	if (job->waiter)
+		host1x_intr_put_ref(job->syncpt->host, job->syncpt->id,
+				    job->waiter);
+
 	if (job->syncpt)
 		host1x_syncpt_put(job->syncpt);
 
diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index b31b994624fa..2fad8b2a55cc 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -385,6 +385,8 @@ static void syncpt_release(struct kref *ref)
 {
 	struct host1x_syncpt *sp = container_of(ref, struct host1x_syncpt, ref);
 
+	sp->locked = false;
+
 	mutex_lock(&sp->host->syncpt_mutex);
 
 	host1x_syncpt_base_free(sp->base);
diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h
index eb49d7003743..d19461704ea2 100644
--- a/drivers/gpu/host1x/syncpt.h
+++ b/drivers/gpu/host1x/syncpt.h
@@ -40,6 +40,13 @@ struct host1x_syncpt {
 
 	/* interrupt data */
 	struct host1x_syncpt_intr intr;
+
+	/*
+	 * If a submission incrementing this syncpoint fails, lock it so that
+	 * further submissions cannot be made until the application has handled
+	 * the failure.
+	 */
+	bool locked;
 };
 
 /* Initialize sync point array  */
@@ -120,4 +127,9 @@ struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
 					  struct host1x_client *client,
 					  unsigned long flags);
 
+static inline void host1x_syncpt_set_locked(struct host1x_syncpt *sp)
+{
+	sp->locked = true;
+}
+
 #endif
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index 73a247e180a9..3ffe16152ebc 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -229,9 +229,15 @@ struct host1x_job {
 	u32 syncpt_incrs;
 	u32 syncpt_end;
 
+	/* Completion waiter ref */
+	void *waiter;
+
 	/* Maximum time to wait for this job */
 	unsigned int timeout;
 
+	/* Job has timed out and should be released */
+	bool cancelled;
+
 	/* Index and number of slots used in the push buffer */
 	unsigned int first_get;
 	unsigned int num_slots;
@@ -252,6 +258,9 @@ struct host1x_job {
 
 	/* Add a channel wait for previous ops to complete */
 	bool serialize;
+
+	/* Fast-forward syncpoint increments on job timeout */
+	bool syncpt_recovery;
 };
 
 struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC PATCH v2 11/17] gpu: host1x: Add job release callback
  2020-09-05 10:34 ` Mikko Perttunen
@ 2020-09-05 10:34   ` Mikko Perttunen
  -1 siblings, 0 replies; 128+ messages in thread
From: Mikko Perttunen @ 2020-09-05 10:34 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Add a callback field to the job structure, called just before the job
is freed. This allows the job's submitter to clean up any of its own
state, such as decrementing runtime PM refcounts.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
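A submitter might use the new callback roughly as follows. This is a
minimal userspace sketch, not driver code: model_job, model_engine and
the pm_usage counter are hypothetical, and passing the submitter's state
through user_data is an assumption about how that field is meant to be
used. Only the release and user_data fields mirror the diff below.

```c
#include <assert.h>

struct model_job {
	unsigned int refcount;
	/* Mirrors the two fields the patch adds to struct host1x_job. */
	void (*release)(struct model_job *job);
	void *user_data;
};

/* Hypothetical submitter state: a usage count standing in for runtime PM. */
struct model_engine {
	int pm_usage;
};

/* Release callback: undo what the submitter did at submit time. */
static void model_engine_job_release(struct model_job *job)
{
	struct model_engine *engine = job->user_data;

	engine->pm_usage--;
}

/* Submitter side: take the usage reference and arm the callback. */
static void model_engine_submit(struct model_engine *engine,
				struct model_job *job)
{
	engine->pm_usage++;
	job->release = model_engine_job_release;
	job->user_data = engine;
	job->refcount = 1;
}

/* Drop a reference; on the last one, run the callback before freeing. */
static void model_job_put(struct model_job *job)
{
	assert(job->refcount > 0);

	if (--job->refcount > 0)
		return;

	if (job->release)
		job->release(job);
	/* The real job_free() goes on to release the job's other resources. */
}
```

The callback runs exactly once, when the last reference is dropped, so
the usage count stays raised for as long as anyone still holds the job.
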
 drivers/gpu/host1x/job.c | 3 +++
 include/linux/host1x.h   | 4 ++++
 2 files changed, 7 insertions(+)

diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
index e4f16fc899b0..acf322beb56c 100644
--- a/drivers/gpu/host1x/job.c
+++ b/drivers/gpu/host1x/job.c
@@ -79,6 +79,9 @@ static void job_free(struct kref *ref)
 {
 	struct host1x_job *job = container_of(ref, struct host1x_job, ref);
 
+	if (job->release)
+		job->release(job);
+
 	if (job->waiter)
 		host1x_intr_put_ref(job->syncpt->host, job->syncpt->id,
 				    job->waiter);
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index 3ffe16152ebc..cabc5bef5bae 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -261,6 +261,10 @@ struct host1x_job {
 
 	/* Fast-forward syncpoint increments on job timeout */
 	bool syncpt_recovery;
+
+	/* Callback called when job is freed */
+	void (*release)(struct host1x_job *job);
+	void *user_data;
 };
 
 struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC PATCH v2 12/17] gpu: host1x: Add support for syncpoint waits in CDMA pushbuffer
  2020-09-05 10:34 ` Mikko Perttunen
@ 2020-09-05 10:34   ` Mikko Perttunen
  -1 siblings, 0 replies; 128+ messages in thread
From: Mikko Perttunen @ 2020-09-05 10:34 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Add support for inserting syncpoint waits in the CDMA pushbuffer.
These waits need to be done in the HOST1X class, while gathers
submitted by the application execute in the engine class.

Support is added by converting the job's gather list into a command
list that can include both gathers and waits. When the job is
submitted, these commands are pushed as the appropriate opcodes
to the CDMA pushbuffer.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
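The command list can be pictured as an array of tagged unions that is
walked at submit time. The following is a userspace sketch only: the
model_* names, the fixed 8-entry array and the symbolic model_op values
are hypothetical and do not encode real Host1x opcodes. The shape of the
union and the "wait, then switch back to the engine class" sequence
follow the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_CMDS 8u

struct model_gather { uint32_t words, offset; };
struct model_wait { uint32_t id, threshold; };

/* Tagged union, same shape as struct host1x_job_cmd in the patch. */
struct model_cmd {
	bool is_wait;
	union {
		struct model_gather gather;
		struct model_wait wait;
	};
};

struct model_job {
	struct model_cmd cmds[MODEL_MAX_CMDS];
	unsigned int num_cmds;
};

/* Symbolic stand-ins for what gets pushed; not real opcode encodings. */
enum model_op {
	MODEL_OP_SETCLASS_HOST1X_WAIT,	/* switch to HOST1X class and wait */
	MODEL_OP_SETCLASS_ENGINE,	/* switch back to the engine class */
	MODEL_OP_GATHER,
};

static void model_add_gather(struct model_job *job, uint32_t words,
			     uint32_t offset)
{
	struct model_cmd *cmd;

	assert(job->num_cmds < MODEL_MAX_CMDS);
	cmd = &job->cmds[job->num_cmds++];
	cmd->is_wait = false;	/* explicit here; the patch does not set it on this path */
	cmd->gather.words = words;
	cmd->gather.offset = offset;
}

static void model_add_wait(struct model_job *job, uint32_t id, uint32_t thresh)
{
	struct model_cmd *cmd;

	assert(job->num_cmds < MODEL_MAX_CMDS);
	cmd = &job->cmds[job->num_cmds++];
	cmd->is_wait = true;
	cmd->wait.id = id;
	cmd->wait.threshold = thresh;
}

/* Walk the command list in order; returns the number of ops written. */
static size_t model_emit(const struct model_job *job, enum model_op *out,
			 size_t max)
{
	size_t n = 0;
	unsigned int i;

	for (i = 0; i < job->num_cmds; i++) {
		const struct model_cmd *cmd = &job->cmds[i];
		size_t need = cmd->is_wait ? 2 : 1;

		if (n + need > max)
			break;

		if (cmd->is_wait) {
			out[n++] = MODEL_OP_SETCLASS_HOST1X_WAIT;
			out[n++] = MODEL_OP_SETCLASS_ENGINE;
		} else {
			out[n++] = MODEL_OP_GATHER;
		}
	}

	return n;
}
```

A wait costs two pushes in this model because the channel has to return
to the engine class before the next gather executes.
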
 drivers/gpu/host1x/hw/channel_hw.c | 51 +++++++++++++++--------
 drivers/gpu/host1x/hw/debug_hw.c   |  9 +++-
 drivers/gpu/host1x/job.c           | 67 +++++++++++++++++++++---------
 drivers/gpu/host1x/job.h           | 14 +++++++
 include/linux/host1x.h             |  5 ++-
 5 files changed, 105 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
index 145746c6f6fb..57e99de528de 100644
--- a/drivers/gpu/host1x/hw/channel_hw.c
+++ b/drivers/gpu/host1x/hw/channel_hw.c
@@ -55,31 +55,46 @@ static void submit_gathers(struct host1x_job *job)
 #endif
 	unsigned int i;
 
-	for (i = 0; i < job->num_gathers; i++) {
-		struct host1x_job_gather *g = &job->gathers[i];
-		dma_addr_t addr = g->base + g->offset;
-		u32 op2, op3;
+	for (i = 0; i < job->num_cmds; i++) {
+		struct host1x_job_cmd *cmd = &job->cmds[i];
 
-		op2 = lower_32_bits(addr);
-		op3 = upper_32_bits(addr);
+		if (cmd->is_wait) {
+			/* TODO use modern wait */
+			host1x_cdma_push(cdma,
+				 host1x_opcode_setclass(HOST1X_CLASS_HOST1X,
+					host1x_uclass_wait_syncpt_r(), 1),
+				 host1x_class_host_wait_syncpt(cmd->wait.id,
+					cmd->wait.threshold));
+			host1x_cdma_push(
+				cdma, host1x_opcode_setclass(job->class, 0, 0),
+				HOST1X_OPCODE_NOP);
+		} else {
+			struct host1x_job_gather *g = &cmd->gather;
 
-		trace_write_gather(cdma, g->bo, g->offset, g->words);
+			dma_addr_t addr = g->base + g->offset;
+			u32 op2, op3;
 
-		if (op3 != 0) {
+			op2 = lower_32_bits(addr);
+			op3 = upper_32_bits(addr);
+
+			trace_write_gather(cdma, g->bo, g->offset, g->words);
+
+			if (op3 != 0) {
 #if HOST1X_HW >= 6
-			u32 op1 = host1x_opcode_gather_wide(g->words);
-			u32 op4 = HOST1X_OPCODE_NOP;
+				u32 op1 = host1x_opcode_gather_wide(g->words);
+				u32 op4 = HOST1X_OPCODE_NOP;
 
-			host1x_cdma_push_wide(cdma, op1, op2, op3, op4);
+				host1x_cdma_push_wide(cdma, op1, op2, op3, op4);
 #else
-			dev_err(dev, "invalid gather for push buffer %pad\n",
-				&addr);
-			continue;
+				dev_err(dev, "invalid gather for push buffer %pad\n",
+					&addr);
+				continue;
 #endif
-		} else {
-			u32 op1 = host1x_opcode_gather(g->words);
+			} else {
+				u32 op1 = host1x_opcode_gather(g->words);
 
-			host1x_cdma_push(cdma, op1, op2);
+				host1x_cdma_push(cdma, op1, op2);
+			}
 		}
 	}
 }
@@ -126,7 +141,7 @@ static int channel_submit(struct host1x_job *job)
 	struct host1x *host = dev_get_drvdata(ch->dev->parent);
 
 	trace_host1x_channel_submit(dev_name(ch->dev),
-				    job->num_gathers, job->num_relocs,
+				    job->num_cmds, job->num_relocs,
 				    job->syncpt->id, job->syncpt_incrs);
 
 	/* TODO this is racy */
diff --git a/drivers/gpu/host1x/hw/debug_hw.c b/drivers/gpu/host1x/hw/debug_hw.c
index ceb48229d14b..35952fd5597e 100644
--- a/drivers/gpu/host1x/hw/debug_hw.c
+++ b/drivers/gpu/host1x/hw/debug_hw.c
@@ -208,10 +208,15 @@ static void show_channel_gathers(struct output *o, struct host1x_cdma *cdma)
 				    job->first_get, job->timeout,
 				    job->num_slots, job->num_unpins);
 
-		for (i = 0; i < job->num_gathers; i++) {
-			struct host1x_job_gather *g = &job->gathers[i];
+		for (i = 0; i < job->num_cmds; i++) {
+			struct host1x_job_gather *g;
 			u32 *mapped;
 
+			if (job->cmds[i].is_wait)
+				continue;
+
+			g = &job->cmds[i].gather;
+
 			if (job->gather_copy_mapped)
 				mapped = (u32 *)job->gather_copy_mapped;
 			else
diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
index acf322beb56c..ac4091596811 100644
--- a/drivers/gpu/host1x/job.c
+++ b/drivers/gpu/host1x/job.c
@@ -38,7 +38,7 @@ struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
 	total = sizeof(struct host1x_job) +
 		(u64)num_relocs * sizeof(struct host1x_reloc) +
 		(u64)num_unpins * sizeof(struct host1x_job_unpin_data) +
-		(u64)num_cmdbufs * sizeof(struct host1x_job_gather) +
+		(u64)num_cmdbufs * sizeof(struct host1x_job_cmd) +
 		(u64)num_unpins * sizeof(dma_addr_t) +
 		(u64)num_unpins * sizeof(u32 *);
 	if (total > ULONG_MAX)
@@ -57,8 +57,8 @@ struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
 	mem += num_relocs * sizeof(struct host1x_reloc);
 	job->unpins = num_unpins ? mem : NULL;
 	mem += num_unpins * sizeof(struct host1x_job_unpin_data);
-	job->gathers = num_cmdbufs ? mem : NULL;
-	mem += num_cmdbufs * sizeof(struct host1x_job_gather);
+	job->cmds = num_cmdbufs ? mem : NULL;
+	mem += num_cmdbufs * sizeof(struct host1x_job_cmd);
 	job->addr_phys = num_unpins ? mem : NULL;
 
 	job->reloc_addr_phys = job->addr_phys;
@@ -101,22 +101,35 @@ EXPORT_SYMBOL(host1x_job_put);
 void host1x_job_add_gather(struct host1x_job *job, struct host1x_bo *bo,
 			   unsigned int words, unsigned int offset)
 {
-	struct host1x_job_gather *gather = &job->gathers[job->num_gathers];
+	struct host1x_job_gather *gather = &job->cmds[job->num_cmds].gather;
 
 	gather->words = words;
 	gather->bo = bo;
 	gather->offset = offset;
 
-	job->num_gathers++;
+	job->num_cmds++;
 }
 EXPORT_SYMBOL(host1x_job_add_gather);
 
+void host1x_job_add_wait(struct host1x_job *job, u32 id, u32 thresh)
+{
+	struct host1x_job_cmd *cmd = &job->cmds[job->num_cmds];
+
+	cmd->is_wait = true;
+	cmd->wait.id = id;
+	cmd->wait.threshold = thresh;
+
+	job->num_cmds++;
+}
+EXPORT_SYMBOL(host1x_job_add_wait);
+
 static unsigned int pin_job(struct host1x *host, struct host1x_job *job)
 {
 	struct host1x_client *client = job->client;
 	struct device *dev = client->dev;
 	struct host1x_job_gather *g;
 	struct iommu_domain *domain;
+	struct sg_table *sgt;
 	unsigned int i;
 	int err;
 
@@ -126,7 +139,6 @@ static unsigned int pin_job(struct host1x *host, struct host1x_job *job)
 	for (i = 0; i < job->num_relocs; i++) {
 		struct host1x_reloc *reloc = &job->relocs[i];
 		dma_addr_t phys_addr, *phys;
-		struct sg_table *sgt;
 
 		reloc->target.bo = host1x_bo_get(reloc->target.bo);
 		if (!reloc->target.bo) {
@@ -204,17 +216,20 @@ static unsigned int pin_job(struct host1x *host, struct host1x_job *job)
 	if (IS_ENABLED(CONFIG_TEGRA_HOST1X_FIREWALL))
 		return 0;
 
-	for (i = 0; i < job->num_gathers; i++) {
+	for (i = 0; i < job->num_cmds; i++) {
 		size_t gather_size = 0;
 		struct scatterlist *sg;
-		struct sg_table *sgt;
 		dma_addr_t phys_addr;
 		unsigned long shift;
 		struct iova *alloc;
 		dma_addr_t *phys;
 		unsigned int j;
 
-		g = &job->gathers[i];
+		if (job->cmds[i].is_wait)
+			continue;
+
+		g = &job->cmds[i].gather;
+
 		g->bo = host1x_bo_get(g->bo);
 		if (!g->bo) {
 			err = -EINVAL;
@@ -550,8 +565,13 @@ static inline int copy_gathers(struct device *host, struct host1x_job *job,
 	fw.num_relocs = job->num_relocs;
 	fw.class = job->class;
 
-	for (i = 0; i < job->num_gathers; i++) {
-		struct host1x_job_gather *g = &job->gathers[i];
+	for (i = 0; i < job->num_cmds; i++) {
+		struct host1x_job_gather *g;
+
+		if (job->cmds[i].is_wait)
+			continue;
+
+		g = &job->cmds[i].gather;
 
 		size += g->words * sizeof(u32);
 	}
@@ -573,10 +593,14 @@ static inline int copy_gathers(struct device *host, struct host1x_job *job,
 
 	job->gather_copy_size = size;
 
-	for (i = 0; i < job->num_gathers; i++) {
-		struct host1x_job_gather *g = &job->gathers[i];
+	for (i = 0; i < job->num_cmds; i++) {
+		struct host1x_job_gather *g;
 		void *gather;
 
+		if (job->cmds[i].is_wait)
+			continue;
+		g = &job->cmds[i].gather;
+
 		/* Copy the gather */
 		gather = host1x_bo_mmap(g->bo);
 		memcpy(job->gather_copy_mapped + offset, gather + g->offset,
@@ -619,8 +643,12 @@ int host1x_job_pin(struct host1x_job *job, struct device *dev)
 	}
 
 	/* patch gathers */
-	for (i = 0; i < job->num_gathers; i++) {
-		struct host1x_job_gather *g = &job->gathers[i];
+	for (i = 0; i < job->num_cmds; i++) {
+		struct host1x_job_gather *g;
+
+		if (job->cmds[i].is_wait)
+			continue;
+		g = &job->cmds[i].gather;
 
 		/* process each gather mem only once */
 		if (g->handled)
@@ -630,10 +658,11 @@ int host1x_job_pin(struct host1x_job *job, struct device *dev)
 		if (!IS_ENABLED(CONFIG_TEGRA_HOST1X_FIREWALL))
 			g->base = job->gather_addr_phys[i];
 
-		for (j = i + 1; j < job->num_gathers; j++) {
-			if (job->gathers[j].bo == g->bo) {
-				job->gathers[j].handled = true;
-				job->gathers[j].base = g->base;
+		for (j = i + 1; j < job->num_cmds; j++) {
+			if (!job->cmds[j].is_wait &&
+			    job->cmds[j].gather.bo == g->bo) {
+				job->cmds[j].gather.handled = true;
+				job->cmds[j].gather.base = g->base;
 			}
 		}
 
diff --git a/drivers/gpu/host1x/job.h b/drivers/gpu/host1x/job.h
index 94bc2e4ae241..33adfaede842 100644
--- a/drivers/gpu/host1x/job.h
+++ b/drivers/gpu/host1x/job.h
@@ -18,6 +18,20 @@ struct host1x_job_gather {
 	bool handled;
 };
 
+struct host1x_job_wait {
+	u32 id;
+	u32 threshold;
+};
+
+struct host1x_job_cmd {
+	bool is_wait;
+
+	union {
+		struct host1x_job_gather gather;
+		struct host1x_job_wait wait;
+	};
+};
+
 struct host1x_job_unpin_data {
 	struct host1x_bo *bo;
 	struct sg_table *sgt;
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index cabc5bef5bae..78ea56230b97 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -211,8 +211,8 @@ struct host1x_job {
 	struct host1x_client *client;
 
 	/* Gathers and their memory */
-	struct host1x_job_gather *gathers;
-	unsigned int num_gathers;
+	struct host1x_job_cmd *cmds;
+	unsigned int num_cmds;
 
 	/* Array of handles to be pinned & unpinned */
 	struct host1x_reloc *relocs;
@@ -271,6 +271,7 @@ struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
 				    u32 num_cmdbufs, u32 num_relocs);
 void host1x_job_add_gather(struct host1x_job *job, struct host1x_bo *bo,
 			   unsigned int words, unsigned int offset);
+void host1x_job_add_wait(struct host1x_job *job, u32 id, u32 thresh);
 struct host1x_job *host1x_job_get(struct host1x_job *job);
 void host1x_job_put(struct host1x_job *job);
 int host1x_job_pin(struct host1x_job *job, struct device *dev);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC PATCH v2 12/17] gpu: host1x: Add support for syncpoint waits in CDMA pushbuffer
@ 2020-09-05 10:34   ` Mikko Perttunen
  0 siblings, 0 replies; 128+ messages in thread
From: Mikko Perttunen @ 2020-09-05 10:34 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, talho, bhuntsman, dri-devel, Mikko Perttunen

Add support for inserting syncpoint waits in the CDMA pushbuffer.
These waits need to be done in the HOST1X class, while gathers
submitted by the application execute in the engine class.

Support is added by converting the job's gather list into a command
list that can include both gathers and waits. When the job is
submitted, these commands are pushed as the appropriate opcodes
to the CDMA pushbuffer.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/hw/channel_hw.c | 51 +++++++++++++++--------
 drivers/gpu/host1x/hw/debug_hw.c   |  9 +++-
 drivers/gpu/host1x/job.c           | 67 +++++++++++++++++++++---------
 drivers/gpu/host1x/job.h           | 14 +++++++
 include/linux/host1x.h             |  5 ++-
 5 files changed, 105 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
index 145746c6f6fb..57e99de528de 100644
--- a/drivers/gpu/host1x/hw/channel_hw.c
+++ b/drivers/gpu/host1x/hw/channel_hw.c
@@ -55,31 +55,46 @@ static void submit_gathers(struct host1x_job *job)
 #endif
 	unsigned int i;
 
-	for (i = 0; i < job->num_gathers; i++) {
-		struct host1x_job_gather *g = &job->gathers[i];
-		dma_addr_t addr = g->base + g->offset;
-		u32 op2, op3;
+	for (i = 0; i < job->num_cmds; i++) {
+		struct host1x_job_cmd *cmd = &job->cmds[i];
 
-		op2 = lower_32_bits(addr);
-		op3 = upper_32_bits(addr);
+		if (cmd->is_wait) {
+			/* TODO use modern wait */
+			host1x_cdma_push(cdma,
+				 host1x_opcode_setclass(HOST1X_CLASS_HOST1X,
+					host1x_uclass_wait_syncpt_r(), 1),
+				 host1x_class_host_wait_syncpt(cmd->wait.id,
+					cmd->wait.threshold));
+			host1x_cdma_push(
+				cdma, host1x_opcode_setclass(job->class, 0, 0),
+				HOST1X_OPCODE_NOP);
+		} else {
+			struct host1x_job_gather *g = &cmd->gather;
 
-		trace_write_gather(cdma, g->bo, g->offset, g->words);
+			dma_addr_t addr = g->base + g->offset;
+			u32 op2, op3;
 
-		if (op3 != 0) {
+			op2 = lower_32_bits(addr);
+			op3 = upper_32_bits(addr);
+
+			trace_write_gather(cdma, g->bo, g->offset, g->words);
+
+			if (op3 != 0) {
 #if HOST1X_HW >= 6
-			u32 op1 = host1x_opcode_gather_wide(g->words);
-			u32 op4 = HOST1X_OPCODE_NOP;
+				u32 op1 = host1x_opcode_gather_wide(g->words);
+				u32 op4 = HOST1X_OPCODE_NOP;
 
-			host1x_cdma_push_wide(cdma, op1, op2, op3, op4);
+				host1x_cdma_push_wide(cdma, op1, op2, op3, op4);
 #else
-			dev_err(dev, "invalid gather for push buffer %pad\n",
-				&addr);
-			continue;
+				dev_err(dev, "invalid gather for push buffer %pad\n",
+					&addr);
+				continue;
 #endif
-		} else {
-			u32 op1 = host1x_opcode_gather(g->words);
+			} else {
+				u32 op1 = host1x_opcode_gather(g->words);
 
-			host1x_cdma_push(cdma, op1, op2);
+				host1x_cdma_push(cdma, op1, op2);
+			}
 		}
 	}
 }
@@ -126,7 +141,7 @@ static int channel_submit(struct host1x_job *job)
 	struct host1x *host = dev_get_drvdata(ch->dev->parent);
 
 	trace_host1x_channel_submit(dev_name(ch->dev),
-				    job->num_gathers, job->num_relocs,
+				    job->num_cmds, job->num_relocs,
 				    job->syncpt->id, job->syncpt_incrs);
 
 	/* TODO this is racy */
diff --git a/drivers/gpu/host1x/hw/debug_hw.c b/drivers/gpu/host1x/hw/debug_hw.c
index ceb48229d14b..35952fd5597e 100644
--- a/drivers/gpu/host1x/hw/debug_hw.c
+++ b/drivers/gpu/host1x/hw/debug_hw.c
@@ -208,10 +208,15 @@ static void show_channel_gathers(struct output *o, struct host1x_cdma *cdma)
 				    job->first_get, job->timeout,
 				    job->num_slots, job->num_unpins);
 
-		for (i = 0; i < job->num_gathers; i++) {
-			struct host1x_job_gather *g = &job->gathers[i];
+		for (i = 0; i < job->num_cmds; i++) {
+			struct host1x_job_gather *g;
 			u32 *mapped;
 
+			if (job->cmds[i].is_wait)
+				continue;
+
+			g = &job->cmds[i].gather;
+
 			if (job->gather_copy_mapped)
 				mapped = (u32 *)job->gather_copy_mapped;
 			else
diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
index acf322beb56c..ac4091596811 100644
--- a/drivers/gpu/host1x/job.c
+++ b/drivers/gpu/host1x/job.c
@@ -38,7 +38,7 @@ struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
 	total = sizeof(struct host1x_job) +
 		(u64)num_relocs * sizeof(struct host1x_reloc) +
 		(u64)num_unpins * sizeof(struct host1x_job_unpin_data) +
-		(u64)num_cmdbufs * sizeof(struct host1x_job_gather) +
+		(u64)num_cmdbufs * sizeof(struct host1x_job_cmd) +
 		(u64)num_unpins * sizeof(dma_addr_t) +
 		(u64)num_unpins * sizeof(u32 *);
 	if (total > ULONG_MAX)
@@ -57,8 +57,8 @@ struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
 	mem += num_relocs * sizeof(struct host1x_reloc);
 	job->unpins = num_unpins ? mem : NULL;
 	mem += num_unpins * sizeof(struct host1x_job_unpin_data);
-	job->gathers = num_cmdbufs ? mem : NULL;
-	mem += num_cmdbufs * sizeof(struct host1x_job_gather);
+	job->cmds = num_cmdbufs ? mem : NULL;
+	mem += num_cmdbufs * sizeof(struct host1x_job_cmd);
 	job->addr_phys = num_unpins ? mem : NULL;
 
 	job->reloc_addr_phys = job->addr_phys;
@@ -101,22 +101,35 @@ EXPORT_SYMBOL(host1x_job_put);
 void host1x_job_add_gather(struct host1x_job *job, struct host1x_bo *bo,
 			   unsigned int words, unsigned int offset)
 {
-	struct host1x_job_gather *gather = &job->gathers[job->num_gathers];
+	struct host1x_job_gather *gather = &job->cmds[job->num_cmds].gather;
 
 	gather->words = words;
 	gather->bo = bo;
 	gather->offset = offset;
 
-	job->num_gathers++;
+	job->num_cmds++;
 }
 EXPORT_SYMBOL(host1x_job_add_gather);
 
+void host1x_job_add_wait(struct host1x_job *job, u32 id, u32 thresh)
+{
+	struct host1x_job_cmd *cmd = &job->cmds[job->num_cmds];
+
+	cmd->is_wait = true;
+	cmd->wait.id = id;
+	cmd->wait.threshold = thresh;
+
+	job->num_cmds++;
+}
+EXPORT_SYMBOL(host1x_job_add_wait);
+
 static unsigned int pin_job(struct host1x *host, struct host1x_job *job)
 {
 	struct host1x_client *client = job->client;
 	struct device *dev = client->dev;
 	struct host1x_job_gather *g;
 	struct iommu_domain *domain;
+	struct sg_table *sgt;
 	unsigned int i;
 	int err;
 
@@ -126,7 +139,6 @@ static unsigned int pin_job(struct host1x *host, struct host1x_job *job)
 	for (i = 0; i < job->num_relocs; i++) {
 		struct host1x_reloc *reloc = &job->relocs[i];
 		dma_addr_t phys_addr, *phys;
-		struct sg_table *sgt;
 
 		reloc->target.bo = host1x_bo_get(reloc->target.bo);
 		if (!reloc->target.bo) {
@@ -204,17 +216,20 @@ static unsigned int pin_job(struct host1x *host, struct host1x_job *job)
 	if (IS_ENABLED(CONFIG_TEGRA_HOST1X_FIREWALL))
 		return 0;
 
-	for (i = 0; i < job->num_gathers; i++) {
+	for (i = 0; i < job->num_cmds; i++) {
 		size_t gather_size = 0;
 		struct scatterlist *sg;
-		struct sg_table *sgt;
 		dma_addr_t phys_addr;
 		unsigned long shift;
 		struct iova *alloc;
 		dma_addr_t *phys;
 		unsigned int j;
 
-		g = &job->gathers[i];
+		if (job->cmds[i].is_wait)
+			continue;
+
+		g = &job->cmds[i].gather;
+
 		g->bo = host1x_bo_get(g->bo);
 		if (!g->bo) {
 			err = -EINVAL;
@@ -550,8 +565,13 @@ static inline int copy_gathers(struct device *host, struct host1x_job *job,
 	fw.num_relocs = job->num_relocs;
 	fw.class = job->class;
 
-	for (i = 0; i < job->num_gathers; i++) {
-		struct host1x_job_gather *g = &job->gathers[i];
+	for (i = 0; i < job->num_cmds; i++) {
+		struct host1x_job_gather *g;
+
+		if (job->cmds[i].is_wait)
+			continue;
+
+		g = &job->cmds[i].gather;
 
 		size += g->words * sizeof(u32);
 	}
@@ -573,10 +593,14 @@ static inline int copy_gathers(struct device *host, struct host1x_job *job,
 
 	job->gather_copy_size = size;
 
-	for (i = 0; i < job->num_gathers; i++) {
-		struct host1x_job_gather *g = &job->gathers[i];
+	for (i = 0; i < job->num_cmds; i++) {
+		struct host1x_job_gather *g;
 		void *gather;
 
+		if (job->cmds[i].is_wait)
+			continue;
+		g = &job->cmds[i].gather;
+
 		/* Copy the gather */
 		gather = host1x_bo_mmap(g->bo);
 		memcpy(job->gather_copy_mapped + offset, gather + g->offset,
@@ -619,8 +643,12 @@ int host1x_job_pin(struct host1x_job *job, struct device *dev)
 	}
 
 	/* patch gathers */
-	for (i = 0; i < job->num_gathers; i++) {
-		struct host1x_job_gather *g = &job->gathers[i];
+	for (i = 0; i < job->num_cmds; i++) {
+		struct host1x_job_gather *g;
+
+		if (job->cmds[i].is_wait)
+			continue;
+		g = &job->cmds[i].gather;
 
 		/* process each gather mem only once */
 		if (g->handled)
@@ -630,10 +658,11 @@ int host1x_job_pin(struct host1x_job *job, struct device *dev)
 		if (!IS_ENABLED(CONFIG_TEGRA_HOST1X_FIREWALL))
 			g->base = job->gather_addr_phys[i];
 
-		for (j = i + 1; j < job->num_gathers; j++) {
-			if (job->gathers[j].bo == g->bo) {
-				job->gathers[j].handled = true;
-				job->gathers[j].base = g->base;
+		for (j = i + 1; j < job->num_cmds; j++) {
+			if (!job->cmds[j].is_wait &&
+			    job->cmds[j].gather.bo == g->bo) {
+				job->cmds[j].gather.handled = true;
+				job->cmds[j].gather.base = g->base;
 			}
 		}
 
diff --git a/drivers/gpu/host1x/job.h b/drivers/gpu/host1x/job.h
index 94bc2e4ae241..33adfaede842 100644
--- a/drivers/gpu/host1x/job.h
+++ b/drivers/gpu/host1x/job.h
@@ -18,6 +18,20 @@ struct host1x_job_gather {
 	bool handled;
 };
 
+struct host1x_job_wait {
+	u32 id;
+	u32 threshold;
+};
+
+struct host1x_job_cmd {
+	bool is_wait;
+
+	union {
+		struct host1x_job_gather gather;
+		struct host1x_job_wait wait;
+	};
+};
+
 struct host1x_job_unpin_data {
 	struct host1x_bo *bo;
 	struct sg_table *sgt;
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index cabc5bef5bae..78ea56230b97 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -211,8 +211,8 @@ struct host1x_job {
 	struct host1x_client *client;
 
 	/* Gathers and their memory */
-	struct host1x_job_gather *gathers;
-	unsigned int num_gathers;
+	struct host1x_job_cmd *cmds;
+	unsigned int num_cmds;
 
 	/* Array of handles to be pinned & unpinned */
 	struct host1x_reloc *relocs;
@@ -271,6 +271,7 @@ struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
 				    u32 num_cmdbufs, u32 num_relocs);
 void host1x_job_add_gather(struct host1x_job *job, struct host1x_bo *bo,
 			   unsigned int words, unsigned int offset);
+void host1x_job_add_wait(struct host1x_job *job, u32 id, u32 thresh);
 struct host1x_job *host1x_job_get(struct host1x_job *job);
 void host1x_job_put(struct host1x_job *job);
 int host1x_job_pin(struct host1x_job *job, struct device *dev);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC PATCH v2 13/17] gpu: host1x: Reset max value when freeing a syncpoint
  2020-09-05 10:34 ` Mikko Perttunen
@ 2020-09-05 10:34   ` Mikko Perttunen
  -1 siblings, 0 replies; 128+ messages in thread
From: Mikko Perttunen @ 2020-09-05 10:34 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

With job recovery becoming optional, syncpoints may have a mismatch
between their value and max value when freed. As such, when freeing,
set the max value to the current value of the syncpoint so that it
is in a sane state for the next user.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/syncpt.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index 2fad8b2a55cc..82ecb4ac387e 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -385,6 +385,7 @@ static void syncpt_release(struct kref *ref)
 {
 	struct host1x_syncpt *sp = container_of(ref, struct host1x_syncpt, ref);
 
+	atomic_set(&sp->max_val, host1x_syncpt_read_min(sp));
 	sp->locked = false;
 
 	mutex_lock(&sp->host->syncpt_mutex);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC PATCH v2 14/17] drm/tegra: Add new UAPI to header
  2020-09-05 10:34 ` Mikko Perttunen
@ 2020-09-05 10:34   ` Mikko Perttunen
  -1 siblings, 0 replies; 128+ messages in thread
From: Mikko Perttunen @ 2020-09-05 10:34 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Update the tegra_drm.h UAPI header, adding the new proposed UAPI.
The old staging UAPI is left in for now; its GEM_CREATE and GEM_MMAP
definitions are renamed with a _LEGACY suffix to avoid name collisions.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 include/uapi/drm/tegra_drm.h | 431 ++++++++++++++++++++++++++++++++---
 1 file changed, 404 insertions(+), 27 deletions(-)

diff --git a/include/uapi/drm/tegra_drm.h b/include/uapi/drm/tegra_drm.h
index c4df3c3668b3..6db5fa242715 100644
--- a/include/uapi/drm/tegra_drm.h
+++ b/include/uapi/drm/tegra_drm.h
@@ -1,24 +1,5 @@
-/*
- * Copyright (c) 2012-2013, NVIDIA CORPORATION.  All rights reserved.
- *
- * Permission is hereby granted, free of charge, to any person obtaining a
- * copy of this software and associated documentation files (the "Software"),
- * to deal in the Software without restriction, including without limitation
- * the rights to use, copy, modify, merge, publish, distribute, sublicense,
- * and/or sell copies of the Software, and to permit persons to whom the
- * Software is furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
- * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
- * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
- * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
- * OTHER DEALINGS IN THE SOFTWARE.
- */
+/* SPDX-License-Identifier: MIT */
+/* Copyright (c) 2012-2020 NVIDIA Corporation */
 
 #ifndef _UAPI_TEGRA_DRM_H_
 #define _UAPI_TEGRA_DRM_H_
@@ -29,6 +10,8 @@
 extern "C" {
 #endif
 
+/* TegraDRM legacy UAPI. Only enabled with STAGING */
+
 #define DRM_TEGRA_GEM_CREATE_TILED     (1 << 0)
 #define DRM_TEGRA_GEM_CREATE_BOTTOM_UP (1 << 1)
 
@@ -644,13 +627,13 @@ struct drm_tegra_gem_get_flags {
 	__u32 flags;
 };
 
-#define DRM_TEGRA_GEM_CREATE		0x00
-#define DRM_TEGRA_GEM_MMAP		0x01
+#define DRM_TEGRA_GEM_CREATE_LEGACY	0x00
+#define DRM_TEGRA_GEM_MMAP_LEGACY	0x01
 #define DRM_TEGRA_SYNCPT_READ		0x02
 #define DRM_TEGRA_SYNCPT_INCR		0x03
 #define DRM_TEGRA_SYNCPT_WAIT		0x04
-#define DRM_TEGRA_OPEN_CHANNEL		0x05
-#define DRM_TEGRA_CLOSE_CHANNEL		0x06
+#define DRM_TEGRA_OPEN_CHANNEL	        0x05
+#define DRM_TEGRA_CLOSE_CHANNEL	        0x06
 #define DRM_TEGRA_GET_SYNCPT		0x07
 #define DRM_TEGRA_SUBMIT		0x08
 #define DRM_TEGRA_GET_SYNCPT_BASE	0x09
@@ -659,8 +642,8 @@ struct drm_tegra_gem_get_flags {
 #define DRM_TEGRA_GEM_SET_FLAGS		0x0c
 #define DRM_TEGRA_GEM_GET_FLAGS		0x0d
 
-#define DRM_IOCTL_TEGRA_GEM_CREATE DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GEM_CREATE, struct drm_tegra_gem_create)
-#define DRM_IOCTL_TEGRA_GEM_MMAP DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GEM_MMAP, struct drm_tegra_gem_mmap)
+#define DRM_IOCTL_TEGRA_GEM_CREATE_LEGACY DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GEM_CREATE_LEGACY, struct drm_tegra_gem_create)
+#define DRM_IOCTL_TEGRA_GEM_MMAP_LEGACY DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GEM_MMAP_LEGACY, struct drm_tegra_gem_mmap)
 #define DRM_IOCTL_TEGRA_SYNCPT_READ DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_SYNCPT_READ, struct drm_tegra_syncpt_read)
 #define DRM_IOCTL_TEGRA_SYNCPT_INCR DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_SYNCPT_INCR, struct drm_tegra_syncpt_incr)
 #define DRM_IOCTL_TEGRA_SYNCPT_WAIT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_SYNCPT_WAIT, struct drm_tegra_syncpt_wait)
@@ -674,6 +657,400 @@ struct drm_tegra_gem_get_flags {
 #define DRM_IOCTL_TEGRA_GEM_SET_FLAGS DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GEM_SET_FLAGS, struct drm_tegra_gem_set_flags)
 #define DRM_IOCTL_TEGRA_GEM_GET_FLAGS DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GEM_GET_FLAGS, struct drm_tegra_gem_get_flags)
 
+/* New TegraDRM UAPI */
+
+struct drm_tegra_channel_open {
+	/**
+	 * @host1x_class: [in]
+	 *
+	 * Host1x class of the engine that will be programmed using this
+	 * channel.
+	 */
+	__u32 host1x_class;
+
+	/**
+	 * @flags: [in]
+	 *
+	 * Flags.
+	 */
+	__u32 flags;
+
+	/**
+	 * @channel_ctx: [out]
+	 *
+	 * Opaque identifier corresponding to the opened channel.
+	 */
+	__u32 channel_ctx;
+
+	/**
+	 * @hardware_version: [out]
+	 *
+	 * Version of the engine hardware. This can be used by userspace
+	 * to determine how the engine needs to be programmed.
+	 */
+	__u32 hardware_version;
+
+	__u32 reserved[2];
+};
+
+struct drm_tegra_channel_close {
+	/**
+	 * @channel_ctx: [in]
+	 *
+	 * Identifier of the channel to close.
+	 */
+	__u32 channel_ctx;
+
+	__u32 reserved[1];
+};
+
+#define DRM_TEGRA_CHANNEL_MAP_READWRITE			(1<<0)
+
+struct drm_tegra_channel_map {
+	/**
+	 * @channel_ctx: [in]
+	 *
+	 * Identifier of the channel to make memory available to.
+	 */
+	__u32 channel_ctx;
+
+	/**
+	 * @handle: [in]
+	 *
+	 * GEM handle of the memory to map.
+	 */
+	__u32 handle;
+
+	/**
+	 * @offset: [in]
+	 *
+	 * Offset in the GEM handle's underlying memory to start the
+	 * mapping from.
+	 */
+	__u64 offset;
+
+	/**
+	 * @length: [in]
+	 *
+	 * Length of memory to map.
+	 */
+	__u64 length;
+
+	/**
+	 * @iova: [out]
+	 *
+	 * IOVA of mapped memory. Only available if hardware memory
+	 * isolation is supported. If provided, userspace can program this
+	 * address directly to the engine to skip using relocations.
+	 *
+	 * Will be set to U64_MAX if unavailable.
+	 */
+	__u64 iova;
+
+	/**
+	 * @mapping_id: [out]
+	 *
+	 * Identifier corresponding to the mapping, to be used for
+	 * relocations or unmapping later.
+	 */
+	__u32 mapping_id;
+
+	/**
+	 * @flags: [in]
+	 *
+	 * Flags.
+	 */
+	__u32 flags;
+
+	__u32 reserved[2];
+};
+
+struct drm_tegra_channel_unmap {
+	/**
+	 * @channel_ctx: [in]
+	 *
+	 * Channel identifier of the channel to unmap memory from.
+	 */
+	__u32 channel_ctx;
+
+	/**
+	 * @mapping_id: [in]
+	 *
+	 * Mapping identifier of the memory mapping to unmap.
+	 */
+	__u32 mapping_id;
+
+	__u32 reserved[2];
+};
+
+/* Submission */
+
+/** Patch the address of the specified mapping into the submitted gather. */
+#define DRM_TEGRA_SUBMIT_BUF_WRITE_RELOC		(1<<0)
+/**
+ * Specify that bit 39 of the patched-in address should be set to
+ * trigger layout swizzling between Tegra and non-Tegra Blocklinear
+ * layout on systems that store surfaces in system memory in non-Tegra
+ * Blocklinear layout.
+ */
+#define DRM_TEGRA_SUBMIT_BUF_RELOC_BLOCKLINEAR		(1<<1)
+/**
+ * Specify that any implicit fences required to read this buffer
+ * should be waited on before executing the job.
+ */
+#define DRM_TEGRA_SUBMIT_BUF_RESV_READ			(1<<2)
+/**
+ * Specify that any implicit fences required to write this buffer
+ * should be waited on before executing the job.
+ */
+#define DRM_TEGRA_SUBMIT_BUF_RESV_WRITE			(1<<3)
+
+struct drm_tegra_submit_buf {
+	/**
+	 * @mapping_id: [in]
+	 *
+	 * Identifier of the mapping to use in the submission.
+	 */
+	__u32 mapping_id;
+
+	/**
+	 * @flags: [in]
+	 *
+	 * Flags.
+	 */
+	__u32 flags;
+
+	struct {
+		/**
+		 * @target_offset: [in]
+		 *
+		 * Offset from the start of the mapping of the data whose
+		 * address is to be patched into the gather.
+		 */
+		__u64 target_offset;
+
+		/**
+		 * @gather_offset_words: [in]
+		 *
+		 * Offset in words from the start of the gather data to
+		 * where the address should be patched in.
+		 */
+		__u32 gather_offset_words;
+
+		/**
+		 * @shift: [in]
+		 *
+		 * Number of bits the address should be shifted right before
+		 * patching in.
+		 */
+		__u32 shift;
+	} reloc;
+
+	__u32 reserved[2];
+};
+
+#define DRM_TEGRA_SUBMIT_SYNCPT_INCR_CREATE_SYNC_FILE  (1<<0)
+
+struct drm_tegra_submit_syncpt_incr {
+	/**
+	 * @syncpt_fd: [in]
+	 *
+	 * Syncpoint file descriptor of the syncpoint that the job will
+	 * increment.
+	 */
+	__s32 syncpt_fd;
+
+	/**
+	 * @flags: [in]
+	 *
+	 * Flags.
+	 */
+	__u32 flags;
+
+	/**
+	 * @num_incrs: [in]
+	 *
+	 * Number of times the job will increment this syncpoint.
+	 */
+	__u32 num_incrs;
+
+	/**
+	 * @fence_value: [out]
+	 *
+	 * Value the syncpoint will have once the job has completed all
+	 * its specified syncpoint increments.
+	 *
+	 * Note that the kernel may increment the syncpoint before or after
+	 * the job. These increments are not reflected in this field.
+	 *
+	 * If the job hangs or times out, not all of the increments may
+	 * get executed.
+	 */
+	__u32 fence_value;
+
+	/**
+	 * @sync_file_fd: [out]
+	 *
+	 * Created sync_file file descriptor corresponding to the threshold
+	 * specified by `fence_value`. Only set if the CREATE_SYNC_FILE
+	 * flag is specified.
+	 */
+	__s32 sync_file_fd;
+
+	__u32 reserved[3];
+};
+
+/**
+ * Execute `words` words of Host1x opcodes specified in the `gather_data_ptr`
+ * buffer. Each GATHER_UPTR command uses successive words from the buffer.
+ */
+#define DRM_TEGRA_SUBMIT_CMD_GATHER_UPTR		0
+/**
+ * Wait for a syncpoint to reach a value before continuing with further
+ * commands.
+ */
+#define DRM_TEGRA_SUBMIT_CMD_WAIT_SYNCPT		1
+/**
+ * Wait for the fence represented by the sync_file file descriptor to be
+ * signaled before continuing with further commands. This command may be
+ * executed before submission of the job to hardware.
+ */
+#define DRM_TEGRA_SUBMIT_CMD_WAIT_SYNC_FILE		2
+
+/**
+ * If set, the driver is allowed to skip execution of this command if
+ * the previous job executed by the engine was from the same channel
+ * context as this job.
+ */
+#define DRM_TEGRA_SUBMIT_CONTEXT_SETUP			(1<<0)
+
+struct drm_tegra_submit_cmd_gather_uptr {
+	__u32 words;
+	__u32 reserved[3];
+};
+
+struct drm_tegra_submit_cmd_wait_syncpt {
+	__u32 id;
+	__u32 threshold;
+	__u32 reserved[2];
+};
+
+struct drm_tegra_submit_cmd_wait_sync_file {
+	__s32 fd;
+	__u32 reserved[3];
+};
+
+struct drm_tegra_submit_cmd {
+	/**
+	 * @type: [in]
+	 *
+	 * Command type to execute. One of the DRM_TEGRA_SUBMIT_CMD*
+	 * defines.
+	 */
+	__u32 type;
+
+	/**
+	 * @flags: [in]
+	 *
+	 * Flags.
+	 */
+	__u32 flags;
+
+	union {
+		struct drm_tegra_submit_cmd_gather_uptr gather_uptr;
+		struct drm_tegra_submit_cmd_wait_syncpt wait_syncpt;
+		struct drm_tegra_submit_cmd_wait_sync_file wait_sync_file;
+		__u32 reserved[4];
+	};
+};
+
+struct drm_tegra_channel_submit {
+	/**
+	 * @channel_ctx: [in]
+	 *
+	 * Identifier of the channel to submit this job to.
+	 */
+	__u32 channel_ctx;
+
+	/**
+	 * @timeout_us: [in]
+	 *
+	 * Timeout in microseconds after which the kernel may consider
+	 * the job hung and may clean up the job and any dependent jobs.
+	 *
+	 * This value may be capped by the kernel.
+	 */
+	__u32 timeout_us;
+
+	/**
+	 * @syncpt_incrs_ptr: [in]
+	 *
+	 * Pointer to an array of drm_tegra_submit_syncpt_incr structures.
+	 */
+	__u64 syncpt_incrs_ptr;
+
+	/**
+	 * @bufs_ptr: [in]
+	 *
+	 * Pointer to an array of drm_tegra_submit_buf structures.
+	 */
+	__u64 bufs_ptr;
+
+	/**
+	 * @cmds_ptr: [in]
+	 *
+	 * Pointer to an array of drm_tegra_submit_cmd structures.
+	 */
+	__u64 cmds_ptr;
+
+	/**
+	 * @gather_data_ptr: [in]
+	 *
+	 * Pointer to an array of Host1x opcodes to be used by GATHER_UPTR
+	 * commands.
+	 */
+	__u64 gather_data_ptr;
+
+	/**
+	 * @num_syncpt_incrs: [in]
+	 *
+	 * Number of elements in the `syncpt_incrs_ptr` array.
+	 */
+	__u32 num_syncpt_incrs;
+
+	/**
+	 * @num_bufs: [in]
+	 *
+	 * Number of elements in the `bufs_ptr` array.
+	 */
+	__u32 num_bufs;
+
+	/**
+	 * @num_cmds: [in]
+	 *
+	 * Number of elements in the `cmds_ptr` array.
+	 */
+	__u32 num_cmds;
+
+	/**
+	 * @gather_data_words: [in]
+	 *
+	 * Number of 32-bit words in the `gather_data_ptr` array.
+	 */
+	__u32 gather_data_words;
+
+	__u32 reserved[4];
+};
+
+#define DRM_IOCTL_TEGRA_CHANNEL_OPEN     DRM_IOWR(DRM_COMMAND_BASE + 0x10, struct drm_tegra_channel_open)
+#define DRM_IOCTL_TEGRA_CHANNEL_CLOSE    DRM_IOWR(DRM_COMMAND_BASE + 0x11, struct drm_tegra_channel_close)
+#define DRM_IOCTL_TEGRA_CHANNEL_MAP      DRM_IOWR(DRM_COMMAND_BASE + 0x12, struct drm_tegra_channel_map)
+#define DRM_IOCTL_TEGRA_CHANNEL_UNMAP    DRM_IOWR(DRM_COMMAND_BASE + 0x13, struct drm_tegra_channel_unmap)
+#define DRM_IOCTL_TEGRA_CHANNEL_SUBMIT   DRM_IOWR(DRM_COMMAND_BASE + 0x14, struct drm_tegra_channel_submit)
+
+#define DRM_IOCTL_TEGRA_GEM_CREATE       DRM_IOWR(DRM_COMMAND_BASE + 0x15, struct drm_tegra_gem_create)
+#define DRM_IOCTL_TEGRA_GEM_MMAP         DRM_IOWR(DRM_COMMAND_BASE + 0x16, struct drm_tegra_gem_mmap)
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC PATCH v2 14/17] drm/tegra: Add new UAPI to header
@ 2020-09-05 10:34   ` Mikko Perttunen
  0 siblings, 0 replies; 128+ messages in thread
From: Mikko Perttunen @ 2020-09-05 10:34 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, talho, bhuntsman, dri-devel, Mikko Perttunen

Update the tegra_drm.h UAPI header, adding the new proposed UAPI.
The old staging UAPI is left in for now; its GEM_CREATE and GEM_MMAP
definitions are renamed with a _LEGACY suffix to avoid name collisions.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 include/uapi/drm/tegra_drm.h | 431 ++++++++++++++++++++++++++++++++---
 1 file changed, 404 insertions(+), 27 deletions(-)

diff --git a/include/uapi/drm/tegra_drm.h b/include/uapi/drm/tegra_drm.h
index c4df3c3668b3..6db5fa242715 100644
--- a/include/uapi/drm/tegra_drm.h
+++ b/include/uapi/drm/tegra_drm.h
@@ -1,24 +1,5 @@
-/*
- * Copyright (c) 2012-2013, NVIDIA CORPORATION.  All rights reserved.
- *
- * Permission is hereby granted, free of charge, to any person obtaining a
- * copy of this software and associated documentation files (the "Software"),
- * to deal in the Software without restriction, including without limitation
- * the rights to use, copy, modify, merge, publish, distribute, sublicense,
- * and/or sell copies of the Software, and to permit persons to whom the
- * Software is furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
- * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
- * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
- * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
- * OTHER DEALINGS IN THE SOFTWARE.
- */
+/* SPDX-License-Identifier: MIT */
+/* Copyright (c) 2012-2020 NVIDIA Corporation */
 
 #ifndef _UAPI_TEGRA_DRM_H_
 #define _UAPI_TEGRA_DRM_H_
@@ -29,6 +10,8 @@
 extern "C" {
 #endif
 
+/* TegraDRM legacy UAPI. Only enabled with STAGING */
+
 #define DRM_TEGRA_GEM_CREATE_TILED     (1 << 0)
 #define DRM_TEGRA_GEM_CREATE_BOTTOM_UP (1 << 1)
 
@@ -644,13 +627,13 @@ struct drm_tegra_gem_get_flags {
 	__u32 flags;
 };
 
-#define DRM_TEGRA_GEM_CREATE		0x00
-#define DRM_TEGRA_GEM_MMAP		0x01
+#define DRM_TEGRA_GEM_CREATE_LEGACY	0x00
+#define DRM_TEGRA_GEM_MMAP_LEGACY	0x01
 #define DRM_TEGRA_SYNCPT_READ		0x02
 #define DRM_TEGRA_SYNCPT_INCR		0x03
 #define DRM_TEGRA_SYNCPT_WAIT		0x04
-#define DRM_TEGRA_OPEN_CHANNEL		0x05
-#define DRM_TEGRA_CLOSE_CHANNEL		0x06
+#define DRM_TEGRA_OPEN_CHANNEL	        0x05
+#define DRM_TEGRA_CLOSE_CHANNEL	        0x06
 #define DRM_TEGRA_GET_SYNCPT		0x07
 #define DRM_TEGRA_SUBMIT		0x08
 #define DRM_TEGRA_GET_SYNCPT_BASE	0x09
@@ -659,8 +642,8 @@ struct drm_tegra_gem_get_flags {
 #define DRM_TEGRA_GEM_SET_FLAGS		0x0c
 #define DRM_TEGRA_GEM_GET_FLAGS		0x0d
 
-#define DRM_IOCTL_TEGRA_GEM_CREATE DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GEM_CREATE, struct drm_tegra_gem_create)
-#define DRM_IOCTL_TEGRA_GEM_MMAP DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GEM_MMAP, struct drm_tegra_gem_mmap)
+#define DRM_IOCTL_TEGRA_GEM_CREATE_LEGACY DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GEM_CREATE_LEGACY, struct drm_tegra_gem_create)
+#define DRM_IOCTL_TEGRA_GEM_MMAP_LEGACY DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GEM_MMAP_LEGACY, struct drm_tegra_gem_mmap)
 #define DRM_IOCTL_TEGRA_SYNCPT_READ DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_SYNCPT_READ, struct drm_tegra_syncpt_read)
 #define DRM_IOCTL_TEGRA_SYNCPT_INCR DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_SYNCPT_INCR, struct drm_tegra_syncpt_incr)
 #define DRM_IOCTL_TEGRA_SYNCPT_WAIT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_SYNCPT_WAIT, struct drm_tegra_syncpt_wait)
@@ -674,6 +657,400 @@ struct drm_tegra_gem_get_flags {
 #define DRM_IOCTL_TEGRA_GEM_SET_FLAGS DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GEM_SET_FLAGS, struct drm_tegra_gem_set_flags)
 #define DRM_IOCTL_TEGRA_GEM_GET_FLAGS DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GEM_GET_FLAGS, struct drm_tegra_gem_get_flags)
 
+/* New TegraDRM UAPI */
+
+struct drm_tegra_channel_open {
+	/**
+	 * @host1x_class: [in]
+	 *
+	 * Host1x class of the engine that will be programmed using this
+	 * channel.
+	 */
+	__u32 host1x_class;
+
+	/**
+	 * @flags: [in]
+	 *
+	 * Flags.
+	 */
+	__u32 flags;
+
+	/**
+	 * @channel_ctx: [out]
+	 *
+	 * Opaque identifier corresponding to the opened channel.
+	 */
+	__u32 channel_ctx;
+
+	/**
+	 * @hardware_version: [out]
+	 *
+	 * Version of the engine hardware. This can be used by userspace
+	 * to determine how the engine needs to be programmed.
+	 */
+	__u32 hardware_version;
+
+	__u32 reserved[2];
+};
+
+struct drm_tegra_channel_close {
+	/**
+	 * @channel_ctx: [in]
+	 *
+	 * Identifier of the channel to close.
+	 */
+	__u32 channel_ctx;
+
+	__u32 reserved[1];
+};
+
+#define DRM_TEGRA_CHANNEL_MAP_READWRITE			(1<<0)
+
+struct drm_tegra_channel_map {
+	/**
+	 * @channel_ctx: [in]
+	 *
+	 * Identifier of the channel to make memory available to.
+	 */
+	__u32 channel_ctx;
+
+	/**
+	 * @handle: [in]
+	 *
+	 * GEM handle of the memory to map.
+	 */
+	__u32 handle;
+
+	/**
+	 * @offset: [in]
+	 *
+	 * Offset in the GEM handle's underlying memory to start the
+	 * mapping from.
+	 */
+	__u64 offset;
+
+	/**
+	 * @length: [in]
+	 *
+	 * Length of memory to map.
+	 */
+	__u64 length;
+
+	/**
+	 * @iova: [out]
+	 *
+	 * IOVA of mapped memory. Only available if hardware memory
+	 * isolation is supported. If provided, userspace can program this
+	 * address directly to the engine to skip using relocations.
+	 *
+	 * Will be set to U64_MAX if unavailable.
+	 */
+	__u64 iova;
+
+	/**
+	 * @mapping_id: [out]
+	 *
+	 * Identifier corresponding to the mapping, to be used for
+	 * relocations or unmapping later.
+	 */
+	__u32 mapping_id;
+
+	/**
+	 * @flags: [in]
+	 *
+	 * Flags.
+	 */
+	__u32 flags;
+
+	__u32 reserved[2];
+};
+
+struct drm_tegra_channel_unmap {
+	/**
+	 * @channel_ctx: [in]
+	 *
+	 * Channel identifier of the channel to unmap memory from.
+	 */
+	__u32 channel_ctx;
+
+	/**
+	 * @mapping_id: [in]
+	 *
+	 * Mapping identifier of the memory mapping to unmap.
+	 */
+	__u32 mapping_id;
+
+	__u32 reserved[2];
+};
+
+/* Submission */
+
+/** Patch the address of the specified mapping into the submitted gather. */
+#define DRM_TEGRA_SUBMIT_BUF_WRITE_RELOC		(1<<0)
+/**
+ * Specify that bit 39 of the patched-in address should be set to
+ * trigger layout swizzling between Tegra and non-Tegra Blocklinear
+ * layout on systems that store surfaces in system memory in non-Tegra
+ * Blocklinear layout.
+ */
+#define DRM_TEGRA_SUBMIT_BUF_RELOC_BLOCKLINEAR		(1<<1)
+/**
+ * Specify that any implicit fences required to read this buffer
+ * should be waited on before executing the job.
+ */
+#define DRM_TEGRA_SUBMIT_BUF_RESV_READ			(1<<2)
+/**
+ * Specify that any implicit fences required to write this buffer
+ * should be waited on before executing the job.
+ */
+#define DRM_TEGRA_SUBMIT_BUF_RESV_WRITE			(1<<3)
+
+struct drm_tegra_submit_buf {
+	/**
+	 * @mapping_id: [in]
+	 *
+	 * Identifier of the mapping to use in the submission.
+	 */
+	__u32 mapping_id;
+
+	/**
+	 * @flags: [in]
+	 *
+	 * Flags.
+	 */
+	__u32 flags;
+
+	struct {
+		/**
+		 * @target_offset: [in]
+		 *
+		 * Offset from the start of the mapping of the data whose
+		 * address is to be patched into the gather.
+		 */
+		__u64 target_offset;
+
+		/**
+		 * @gather_offset_words: [in]
+		 *
+		 * Offset in words from the start of the gather data to
+		 * where the address should be patched in.
+		 */
+		__u32 gather_offset_words;
+
+		/**
+		 * @shift: [in]
+		 *
+		 * Number of bits the address should be shifted right before
+		 * patching in.
+		 */
+		__u32 shift;
+	} reloc;
+
+	__u32 reserved[2];
+};
+
+#define DRM_TEGRA_SUBMIT_SYNCPT_INCR_CREATE_SYNC_FILE  (1<<0)
+
+struct drm_tegra_submit_syncpt_incr {
+	/**
+	 * @syncpt_fd: [in]
+	 *
+	 * Syncpoint file descriptor of the syncpoint that the job will
+	 * increment.
+	 */
+	__s32 syncpt_fd;
+
+	/**
+	 * @flags: [in]
+	 *
+	 * Flags.
+	 */
+	__u32 flags;
+
+	/**
+	 * @num_incrs: [in]
+	 *
+	 * Number of times the job will increment this syncpoint.
+	 */
+	__u32 num_incrs;
+
+	/**
+	 * @fence_value: [out]
+	 *
+	 * Value the syncpoint will have once the job has completed all
+	 * its specified syncpoint increments.
+	 *
+	 * Note that the kernel may increment the syncpoint before or after
+	 * the job. These increments are not reflected in this field.
+	 *
+	 * If the job hangs or times out, not all of the increments may
+	 * get executed.
+	 */
+	__u32 fence_value;
+
+	/**
+	 * @sync_file_fd: [out]
+	 *
+	 * Created sync_file file descriptor corresponding to the threshold
+	 * specified by `fence_value`. Only set if the CREATE_SYNC_FILE
+	 * flag is specified.
+	 */
+	__s32 sync_file_fd;
+
+	__u32 reserved[3];
+};
+
+/**
+ * Execute `words` words of Host1x opcodes specified in the `gather_data_ptr`
+ * buffer. Each GATHER_UPTR command uses successive words from the buffer.
+ */
+#define DRM_TEGRA_SUBMIT_CMD_GATHER_UPTR		0
+/**
+ * Wait for a syncpoint to reach a value before continuing with further
+ * commands.
+ */
+#define DRM_TEGRA_SUBMIT_CMD_WAIT_SYNCPT		1
+/**
+ * Wait for the fence represented by the sync_file file descriptor to be
+ * signaled before continuing with further commands. This command may be
+ * executed before submission of the job to hardware.
+ */
+#define DRM_TEGRA_SUBMIT_CMD_WAIT_SYNC_FILE		2
+
+/**
+ * If set, the driver is allowed to skip execution of this command if
+ * the previous job executed by the engine was from the same channel
+ * context as this job.
+ */
+#define DRM_TEGRA_SUBMIT_CONTEXT_SETUP			(1<<0)
+
+struct drm_tegra_submit_cmd_gather_uptr {
+	__u32 words;
+	__u32 reserved[3];
+};
+
+struct drm_tegra_submit_cmd_wait_syncpt {
+	__u32 id;
+	__u32 threshold;
+	__u32 reserved[2];
+};
+
+struct drm_tegra_submit_cmd_wait_sync_file {
+	__s32 fd;
+	__u32 reserved[3];
+};
+
+struct drm_tegra_submit_cmd {
+	/**
+	 * @type: [in]
+	 *
+	 * Command type to execute. One of the DRM_TEGRA_SUBMIT_CMD*
+	 * defines.
+	 */
+	__u32 type;
+
+	/**
+	 * @flags: [in]
+	 *
+	 * Flags.
+	 */
+	__u32 flags;
+
+	union {
+		struct drm_tegra_submit_cmd_gather_uptr gather_uptr;
+		struct drm_tegra_submit_cmd_wait_syncpt wait_syncpt;
+		struct drm_tegra_submit_cmd_wait_sync_file wait_sync_file;
+		__u32 reserved[4];
+	};
+};
+
+struct drm_tegra_channel_submit {
+	/**
+	 * @channel_ctx: [in]
+	 *
+	 * Identifier of the channel to submit this job to.
+	 */
+	__u32 channel_ctx;
+
+	/**
+	 * @timeout_us: [in]
+	 *
+	 * Timeout in microseconds after which the kernel may consider
+	 * the job hung and may clean up the job and any dependent jobs.
+	 *
+	 * This value may be capped by the kernel.
+	 */
+	__u32 timeout_us;
+
+	/**
+	 * @syncpt_incrs_ptr: [in]
+	 *
+	 * Pointer to an array of drm_tegra_submit_syncpt_incr structures.
+	 */
+	__u64 syncpt_incrs_ptr;
+
+	/**
+	 * @bufs_ptr: [in]
+	 *
+	 * Pointer to an array of drm_tegra_submit_buf structures.
+	 */
+	__u64 bufs_ptr;
+
+	/**
+	 * @cmds_ptr: [in]
+	 *
+	 * Pointer to an array of drm_tegra_submit_cmd structures.
+	 */
+	__u64 cmds_ptr;
+
+	/**
+	 * @gather_data_ptr: [in]
+	 *
+	 * Pointer to an array of Host1x opcodes to be used by GATHER_UPTR
+	 * commands.
+	 */
+	__u64 gather_data_ptr;
+
+	/**
+	 * @num_syncpt_incrs: [in]
+	 *
+	 * Number of elements in the `syncpt_incrs_ptr` array.
+	 */
+	__u32 num_syncpt_incrs;
+
+	/**
+	 * @num_bufs: [in]
+	 *
+	 * Number of elements in the `bufs_ptr` array.
+	 */
+	__u32 num_bufs;
+
+	/**
+	 * @num_cmds: [in]
+	 *
+	 * Number of elements in the `cmds_ptr` array.
+	 */
+	__u32 num_cmds;
+
+	/**
+	 * @gather_data_words: [in]
+	 *
+	 * Number of 32-bit words in the `gather_data_ptr` array.
+	 */
+	__u32 gather_data_words;
+
+	__u32 reserved[4];
+};
+
+#define DRM_IOCTL_TEGRA_CHANNEL_OPEN     DRM_IOWR(DRM_COMMAND_BASE + 0x10, struct drm_tegra_channel_open)
+#define DRM_IOCTL_TEGRA_CHANNEL_CLOSE    DRM_IOWR(DRM_COMMAND_BASE + 0x11, struct drm_tegra_channel_close)
+#define DRM_IOCTL_TEGRA_CHANNEL_MAP      DRM_IOWR(DRM_COMMAND_BASE + 0x12, struct drm_tegra_channel_map)
+#define DRM_IOCTL_TEGRA_CHANNEL_UNMAP    DRM_IOWR(DRM_COMMAND_BASE + 0x13, struct drm_tegra_channel_unmap)
+#define DRM_IOCTL_TEGRA_CHANNEL_SUBMIT   DRM_IOWR(DRM_COMMAND_BASE + 0x14, struct drm_tegra_channel_submit)
+
+#define DRM_IOCTL_TEGRA_GEM_CREATE       DRM_IOWR(DRM_COMMAND_BASE + 0x15, struct drm_tegra_gem_create)
+#define DRM_IOCTL_TEGRA_GEM_MMAP         DRM_IOWR(DRM_COMMAND_BASE + 0x16, struct drm_tegra_gem_mmap)
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.28.0
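
Editor's sketch, not part of the patch: the header above says that each
GATHER_UPTR command uses successive words from the gather_data_ptr buffer,
so userspace presumably has to keep the sum of all `words` fields within
gather_data_words. The self-contained C fragment below mirrors the command
structures with local ex_-prefixed names (hypothetical, so nothing clashes
with the real header) and checks that accounting. Whether the kernel
rejects an overrun in exactly this way is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the proposed UAPI definitions (names are hypothetical). */
#define EX_SUBMIT_CMD_GATHER_UPTR	0
#define EX_SUBMIT_CMD_WAIT_SYNCPT	1

struct ex_cmd_gather_uptr {
	uint32_t words;
	uint32_t reserved[3];
};

struct ex_cmd_wait_syncpt {
	uint32_t id;
	uint32_t threshold;
	uint32_t reserved[2];
};

struct ex_submit_cmd {
	uint32_t type;
	uint32_t flags;
	union {
		struct ex_cmd_gather_uptr gather_uptr;
		struct ex_cmd_wait_syncpt wait_syncpt;
		uint32_t reserved[4];
	};
};

/* Build a GATHER_UPTR command that consumes the next `words` opcodes. */
static struct ex_submit_cmd ex_make_gather(uint32_t words)
{
	struct ex_submit_cmd cmd;

	memset(&cmd, 0, sizeof(cmd));	/* reserved fields stay zero */
	cmd.type = EX_SUBMIT_CMD_GATHER_UPTR;
	cmd.gather_uptr.words = words;
	return cmd;
}

/* Build a WAIT_SYNCPT command for syncpoint `id` reaching `threshold`. */
static struct ex_submit_cmd ex_make_wait_syncpt(uint32_t id, uint32_t threshold)
{
	struct ex_submit_cmd cmd;

	memset(&cmd, 0, sizeof(cmd));
	cmd.type = EX_SUBMIT_CMD_WAIT_SYNCPT;
	cmd.wait_syncpt.id = id;
	cmd.wait_syncpt.threshold = threshold;
	return cmd;
}

/*
 * Sum the words consumed by all GATHER_UPTR commands. Returns the total,
 * or -1 if the commands would read past `gather_data_words`.
 */
static int64_t ex_gather_words_used(const struct ex_submit_cmd *cmds,
				    uint32_t num_cmds,
				    uint32_t gather_data_words)
{
	uint64_t used = 0;
	uint32_t i;

	for (i = 0; i < num_cmds; i++) {
		if (cmds[i].type != EX_SUBMIT_CMD_GATHER_UPTR)
			continue;

		used += cmds[i].gather_uptr.words;
		if (used > gather_data_words)
			return -1;
	}

	return (int64_t)used;
}
```

A real client would include the installed UAPI header and pass the
resulting array through cmds_ptr/num_cmds; the check above is only a
pre-submit sanity test.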

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC PATCH v2 15/17] drm/tegra: Add power_on/power_off engine callbacks
  2020-09-05 10:34 ` Mikko Perttunen
@ 2020-09-05 10:34   ` Mikko Perttunen
  -1 siblings, 0 replies; 128+ messages in thread
From: Mikko Perttunen @ 2020-09-05 10:34 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

With the new UAPI implementation, an engine is powered on while it
has active jobs and powered off otherwise, and the core code handles
channel allocation. To accommodate that, add the power_on and
power_off callbacks. The open_channel and close_channel callbacks are
now used only by the legacy (staging) UAPI path.
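
Editor's sketch, not part of the patch: one way the core code might
bracket a job with the new callbacks. Everything prefixed ex_ is a
hypothetical stand-in (the usage counter imitates the runtime PM usage
count). The only fact taken from the kernel is that
pm_runtime_get_sync(), which vic_power_on() returns directly, can return
a positive value on success, so the caller should treat only negative
values as errors.

```c
#include <assert.h>
#include <stddef.h>

struct ex_client;

/* Mirrors only the two new callbacks of tegra_drm_client_ops. */
struct ex_client_ops {
	int (*power_on)(struct ex_client *client);
	void (*power_off)(struct ex_client *client);
};

struct ex_client {
	const struct ex_client_ops *ops;
	unsigned int usage;	/* stand-in for the runtime PM usage count */
	int powered;
};

/* Mock engine: power is applied on the first user, removed on the last. */
static int ex_engine_power_on(struct ex_client *client)
{
	if (client->usage++ == 0) {
		client->powered = 1;
		return 0;
	}

	return 1;	/* already active, like pm_runtime_get_sync() */
}

static void ex_engine_power_off(struct ex_client *client)
{
	if (--client->usage == 0)
		client->powered = 0;
}

static const struct ex_client_ops ex_engine_ops = {
	.power_on = ex_engine_power_on,
	.power_off = ex_engine_power_off,
};

/* Hypothetical core-side helper: take a power reference before a job. */
static int ex_job_begin(struct ex_client *client)
{
	int err = client->ops->power_on(client);

	/* Positive return values mean "already powered", not failure. */
	return err < 0 ? err : 0;
}

/* Hypothetical core-side helper: drop the reference when the job ends. */
static void ex_job_end(struct ex_client *client)
{
	client->ops->power_off(client);
}
```

With two overlapping jobs the engine stays powered until the second
ex_job_end() call, which is the behaviour the commit message describes.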

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/drm/tegra/drm.h |  11 +++-
 drivers/gpu/drm/tegra/vic.c | 127 ++++++++++++++++++++----------------
 2 files changed, 78 insertions(+), 60 deletions(-)

diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
index b25443255be6..b915a3946ad4 100644
--- a/drivers/gpu/drm/tegra/drm.h
+++ b/drivers/gpu/drm/tegra/drm.h
@@ -67,14 +67,19 @@ struct tegra_drm_context {
 };
 
 struct tegra_drm_client_ops {
-	int (*open_channel)(struct tegra_drm_client *client,
-			    struct tegra_drm_context *context);
-	void (*close_channel)(struct tegra_drm_context *context);
+	int (*power_on)(struct tegra_drm_client *client);
+	void (*power_off)(struct tegra_drm_client *client);
+
 	int (*is_addr_reg)(struct device *dev, u32 class, u32 offset);
 	int (*is_valid_class)(u32 class);
 	int (*submit)(struct tegra_drm_context *context,
 		      struct drm_tegra_submit *args, struct drm_device *drm,
 		      struct drm_file *file);
+
+	/* Legacy UAPI callbacks */
+	int (*open_channel)(struct tegra_drm_client *client,
+			    struct tegra_drm_context *context);
+	void (*close_channel)(struct tegra_drm_context *context);
 };
 
 int tegra_drm_submit(struct tegra_drm_context *context,
diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c
index cb476da59adc..4783c7254de9 100644
--- a/drivers/gpu/drm/tegra/vic.c
+++ b/drivers/gpu/drm/tegra/vic.c
@@ -52,48 +52,6 @@ static void vic_writel(struct vic *vic, u32 value, unsigned int offset)
 	writel(value, vic->regs + offset);
 }
 
-static int vic_runtime_resume(struct device *dev)
-{
-	struct vic *vic = dev_get_drvdata(dev);
-	int err;
-
-	err = clk_prepare_enable(vic->clk);
-	if (err < 0)
-		return err;
-
-	usleep_range(10, 20);
-
-	err = reset_control_deassert(vic->rst);
-	if (err < 0)
-		goto disable;
-
-	usleep_range(10, 20);
-
-	return 0;
-
-disable:
-	clk_disable_unprepare(vic->clk);
-	return err;
-}
-
-static int vic_runtime_suspend(struct device *dev)
-{
-	struct vic *vic = dev_get_drvdata(dev);
-	int err;
-
-	err = reset_control_assert(vic->rst);
-	if (err < 0)
-		return err;
-
-	usleep_range(2000, 4000);
-
-	clk_disable_unprepare(vic->clk);
-
-	vic->booted = false;
-
-	return 0;
-}
-
 static int vic_boot(struct vic *vic)
 {
 #ifdef CONFIG_IOMMU_API
@@ -308,47 +266,102 @@ static int vic_load_firmware(struct vic *vic)
 	return err;
 }
 
-static int vic_open_channel(struct tegra_drm_client *client,
-			    struct tegra_drm_context *context)
+
+static int vic_runtime_resume(struct device *dev)
 {
-	struct vic *vic = to_vic(client);
+	struct vic *vic = dev_get_drvdata(dev);
 	int err;
 
-	err = pm_runtime_get_sync(vic->dev);
+	err = clk_prepare_enable(vic->clk);
 	if (err < 0)
 		return err;
 
+	usleep_range(10, 20);
+
+	err = reset_control_deassert(vic->rst);
+	if (err < 0)
+		goto disable;
+
+	usleep_range(10, 20);
+
 	err = vic_load_firmware(vic);
 	if (err < 0)
-		goto rpm_put;
+		goto assert;
 
 	err = vic_boot(vic);
 	if (err < 0)
-		goto rpm_put;
+		goto assert;
+
+	return 0;
+
+assert:
+	reset_control_assert(vic->rst);
+disable:
+	clk_disable_unprepare(vic->clk);
+	return err;
+}
+
+static int vic_runtime_suspend(struct device *dev)
+{
+	struct vic *vic = dev_get_drvdata(dev);
+	int err;
+
+	err = reset_control_assert(vic->rst);
+	if (err < 0)
+		return err;
+
+	usleep_range(2000, 4000);
+
+	clk_disable_unprepare(vic->clk);
+
+	vic->booted = false;
+
+	return 0;
+}
+
+static int vic_power_on(struct tegra_drm_client *client)
+{
+	struct vic *vic = to_vic(client);
+
+	return pm_runtime_get_sync(vic->dev);
+}
+
+static void vic_power_off(struct tegra_drm_client *client)
+{
+	struct vic *vic = to_vic(client);
+
+	pm_runtime_put(vic->dev);
+}
+
+static int vic_open_channel(struct tegra_drm_client *client,
+			    struct tegra_drm_context *context)
+{
+	struct vic *vic = to_vic(client);
+	int err;
+
+	err = vic_power_on(client);
+	if (err < 0)
+		return err;
 
 	context->channel = host1x_channel_get(vic->channel);
 	if (!context->channel) {
-		err = -ENOMEM;
-		goto rpm_put;
+		vic_power_off(client);
+		return -ENOMEM;
 	}
 
 	return 0;
-
-rpm_put:
-	pm_runtime_put(vic->dev);
-	return err;
 }
 
 static void vic_close_channel(struct tegra_drm_context *context)
 {
-	struct vic *vic = to_vic(context->client);
-
 	host1x_channel_put(context->channel);
 
-	pm_runtime_put(vic->dev);
+	vic_power_off(context->client);
 }
 
 static const struct tegra_drm_client_ops vic_ops = {
+	.power_on = vic_power_on,
+	.power_off = vic_power_off,
 	.open_channel = vic_open_channel,
 	.close_channel = vic_close_channel,
 	.submit = tegra_drm_submit,
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC PATCH v2 16/17] drm/tegra: Allocate per-engine channel in core code
  2020-09-05 10:34 ` Mikko Perttunen
@ 2020-09-05 10:34   ` Mikko Perttunen
  -1 siblings, 0 replies; 128+ messages in thread
From: Mikko Perttunen @ 2020-09-05 10:34 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

To avoid duplicating the allocation in every engine driver, allocate
the per-engine shared channel in the core code instead. Once MLOCKs
are implemented on the Host1x side, this can be updated to skip
allocating a shared channel when MLOCKs are enabled.
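
Editor's sketch, not part of the patch: the request/put pairing that the
register and unregister paths implement, modelled with a two-entry mock
channel pool. All ex_ names are hypothetical; the real code calls
host1x_channel_request() and host1x_channel_put(). Clearing the pointer
after the put is an addition of this sketch, not something the patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct ex_channel {
	int refcount;
};

struct ex_engine {
	struct ex_channel *shared_channel;	/* set by the core */
	int registered;
};

/* Mock pool standing in for the hardware channels owned by Host1x. */
static struct ex_channel ex_pool[2];

/* Return a free channel with one reference held, or NULL if none is left. */
static struct ex_channel *ex_channel_request(void)
{
	size_t i;

	for (i = 0; i < sizeof(ex_pool) / sizeof(ex_pool[0]); i++) {
		if (ex_pool[i].refcount == 0) {
			ex_pool[i].refcount = 1;
			return &ex_pool[i];
		}
	}

	return NULL;
}

static void ex_channel_put(struct ex_channel *channel)
{
	channel->refcount--;
}

/* Allocate the shared channel first, so failure leaves nothing to undo. */
static int ex_register_client(struct ex_engine *engine)
{
	engine->shared_channel = ex_channel_request();
	if (!engine->shared_channel)
		return -EBUSY;

	engine->registered = 1;
	return 0;
}

static void ex_unregister_client(struct ex_engine *engine)
{
	engine->registered = 0;

	if (engine->shared_channel) {
		ex_channel_put(engine->shared_channel);
		engine->shared_channel = NULL;	/* makes a second call harmless */
	}
}
```

Because the channel is requested before the client is added to the list,
the -EBUSY path needs no unwinding, which matches the ordering in the
patch.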

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/drm/tegra/drm.c | 11 +++++++++++
 drivers/gpu/drm/tegra/drm.h |  4 ++++
 2 files changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 7437c67924aa..7124b0b0154b 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -887,6 +887,14 @@ static struct drm_driver tegra_drm_driver = {
 int tegra_drm_register_client(struct tegra_drm *tegra,
 			      struct tegra_drm_client *client)
 {
+	/*
+	 * When MLOCKs are implemented, change to allocate a shared channel
+	 * only when MLOCKs are disabled.
+	 */
+	client->shared_channel = host1x_channel_request(&client->base);
+	if (!client->shared_channel)
+		return -EBUSY;
+
 	mutex_lock(&tegra->clients_lock);
 	list_add_tail(&client->list, &tegra->clients);
 	client->drm = tegra;
@@ -903,6 +911,9 @@ int tegra_drm_unregister_client(struct tegra_drm *tegra,
 	client->drm = NULL;
 	mutex_unlock(&tegra->clients_lock);
 
+	if (client->shared_channel)
+		host1x_channel_put(client->shared_channel);
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
index b915a3946ad4..984925d0ad3e 100644
--- a/drivers/gpu/drm/tegra/drm.h
+++ b/drivers/gpu/drm/tegra/drm.h
@@ -91,8 +91,12 @@ struct tegra_drm_client {
 	struct list_head list;
 	struct tegra_drm *drm;
 
+	/* Set by driver */
 	unsigned int version;
 	const struct tegra_drm_client_ops *ops;
+
+	/* Set by TegraDRM core */
+	struct host1x_channel *shared_channel;
 };
 
 static inline struct tegra_drm_client *
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [RFC PATCH v2 17/17] WIP: drm/tegra: Implement new UAPI
  2020-09-05 10:34 ` Mikko Perttunen
@ 2020-09-05 10:34   ` Mikko Perttunen
  -1 siblings, 0 replies; 128+ messages in thread
From: Mikko Perttunen @ 2020-09-05 10:34 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Implement the new UAPI, and bump the TegraDRM major version.

WIP:
- Wait for DMA reservation fences
- Implement the firewall on the TegraDRM side

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/drm/tegra/Makefile      |   2 +
 drivers/gpu/drm/tegra/drm.c         |  46 +-
 drivers/gpu/drm/tegra/drm.h         |   5 +
 drivers/gpu/drm/tegra/uapi.h        |  59 +++
 drivers/gpu/drm/tegra/uapi/submit.c | 687 ++++++++++++++++++++++++++++
 drivers/gpu/drm/tegra/uapi/uapi.c   | 328 +++++++++++++
 6 files changed, 1109 insertions(+), 18 deletions(-)
 create mode 100644 drivers/gpu/drm/tegra/uapi.h
 create mode 100644 drivers/gpu/drm/tegra/uapi/submit.c
 create mode 100644 drivers/gpu/drm/tegra/uapi/uapi.c

diff --git a/drivers/gpu/drm/tegra/Makefile b/drivers/gpu/drm/tegra/Makefile
index d6cf202414f0..d480491564b7 100644
--- a/drivers/gpu/drm/tegra/Makefile
+++ b/drivers/gpu/drm/tegra/Makefile
@@ -3,6 +3,8 @@ ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG
 
 tegra-drm-y := \
 	drm.o \
+	uapi/uapi.o \
+	uapi/submit.o \
 	gem.o \
 	fb.o \
 	dp.o \
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 7124b0b0154b..acd734104c9a 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -20,24 +20,20 @@
 #include <drm/drm_prime.h>
 #include <drm/drm_vblank.h>
 
+#include "uapi.h"
 #include "drm.h"
 #include "gem.h"
 
 #define DRIVER_NAME "tegra"
 #define DRIVER_DESC "NVIDIA Tegra graphics"
 #define DRIVER_DATE "20120330"
-#define DRIVER_MAJOR 0
+#define DRIVER_MAJOR 1
 #define DRIVER_MINOR 0
 #define DRIVER_PATCHLEVEL 0
 
 #define CARVEOUT_SZ SZ_64M
 #define CDMA_GATHER_FETCHES_MAX_NB 16383
 
-struct tegra_drm_file {
-	struct idr contexts;
-	struct mutex lock;
-};
-
 static int tegra_atomic_check(struct drm_device *drm,
 			      struct drm_atomic_state *state)
 {
@@ -90,7 +86,8 @@ static int tegra_drm_open(struct drm_device *drm, struct drm_file *filp)
 	if (!fpriv)
 		return -ENOMEM;
 
-	idr_init(&fpriv->contexts);
+	idr_init(&fpriv->legacy_contexts);
+	xa_init_flags(&fpriv->contexts, XA_FLAGS_ALLOC);
 	mutex_init(&fpriv->lock);
 	filp->driver_priv = fpriv;
 
@@ -432,7 +429,7 @@ static int tegra_client_open(struct tegra_drm_file *fpriv,
 	if (err < 0)
 		return err;
 
-	err = idr_alloc(&fpriv->contexts, context, 1, 0, GFP_KERNEL);
+	err = idr_alloc(&fpriv->legacy_contexts, context, 1, 0, GFP_KERNEL);
 	if (err < 0) {
 		client->ops->close_channel(context);
 		return err;
@@ -487,13 +484,13 @@ static int tegra_close_channel(struct drm_device *drm, void *data,
 
 	mutex_lock(&fpriv->lock);
 
-	context = idr_find(&fpriv->contexts, args->context);
+	context = idr_find(&fpriv->legacy_contexts, args->context);
 	if (!context) {
 		err = -EINVAL;
 		goto unlock;
 	}
 
-	idr_remove(&fpriv->contexts, context->id);
+	idr_remove(&fpriv->legacy_contexts, context->id);
 	tegra_drm_context_free(context);
 
 unlock:
@@ -512,7 +509,7 @@ static int tegra_get_syncpt(struct drm_device *drm, void *data,
 
 	mutex_lock(&fpriv->lock);
 
-	context = idr_find(&fpriv->contexts, args->context);
+	context = idr_find(&fpriv->legacy_contexts, args->context);
 	if (!context) {
 		err = -ENODEV;
 		goto unlock;
@@ -541,7 +538,7 @@ static int tegra_submit(struct drm_device *drm, void *data,
 
 	mutex_lock(&fpriv->lock);
 
-	context = idr_find(&fpriv->contexts, args->context);
+	context = idr_find(&fpriv->legacy_contexts, args->context);
 	if (!context) {
 		err = -ENODEV;
 		goto unlock;
@@ -566,7 +563,7 @@ static int tegra_get_syncpt_base(struct drm_device *drm, void *data,
 
 	mutex_lock(&fpriv->lock);
 
-	context = idr_find(&fpriv->contexts, args->context);
+	context = idr_find(&fpriv->legacy_contexts, args->context);
 	if (!context) {
 		err = -ENODEV;
 		goto unlock;
@@ -734,11 +731,23 @@ static int tegra_gem_get_flags(struct drm_device *drm, void *data,
 #endif
 
 static const struct drm_ioctl_desc tegra_drm_ioctls[] = {
-#ifdef CONFIG_DRM_TEGRA_STAGING
-	DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE, tegra_gem_create,
+	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_OPEN, tegra_drm_ioctl_channel_open,
+			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_CLOSE, tegra_drm_ioctl_channel_close,
+			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_MAP, tegra_drm_ioctl_channel_map,
 			  DRM_RENDER_ALLOW),
-	DRM_IOCTL_DEF_DRV(TEGRA_GEM_MMAP, tegra_gem_mmap,
+	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_UNMAP, tegra_drm_ioctl_channel_unmap,
 			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_SUBMIT, tegra_drm_ioctl_channel_submit,
+			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE, tegra_drm_ioctl_gem_create,
+			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(TEGRA_GEM_MMAP, tegra_drm_ioctl_gem_mmap,
+			  DRM_RENDER_ALLOW),
+#ifdef CONFIG_DRM_TEGRA_STAGING
+	DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE_LEGACY, tegra_gem_create, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(TEGRA_GEM_MMAP_LEGACY, tegra_gem_mmap, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(TEGRA_SYNCPT_READ, tegra_syncpt_read,
 			  DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(TEGRA_SYNCPT_INCR, tegra_syncpt_incr,
@@ -792,10 +801,11 @@ static void tegra_drm_postclose(struct drm_device *drm, struct drm_file *file)
 	struct tegra_drm_file *fpriv = file->driver_priv;
 
 	mutex_lock(&fpriv->lock);
-	idr_for_each(&fpriv->contexts, tegra_drm_context_cleanup, NULL);
+	idr_for_each(&fpriv->legacy_contexts, tegra_drm_context_cleanup, NULL);
+	tegra_drm_uapi_close_file(fpriv);
 	mutex_unlock(&fpriv->lock);
 
-	idr_destroy(&fpriv->contexts);
+	idr_destroy(&fpriv->legacy_contexts);
 	mutex_destroy(&fpriv->lock);
 	kfree(fpriv);
 }
diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
index 984925d0ad3e..fbacb0b35189 100644
--- a/drivers/gpu/drm/tegra/drm.h
+++ b/drivers/gpu/drm/tegra/drm.h
@@ -58,6 +58,11 @@ struct tegra_drm {
 	struct tegra_display_hub *hub;
 };
 
+static inline struct host1x *tegra_drm_to_host1x(struct tegra_drm *tegra)
+{
+	return dev_get_drvdata(tegra->drm->dev->parent);
+}
+
 struct tegra_drm_client;
 
 struct tegra_drm_context {
diff --git a/drivers/gpu/drm/tegra/uapi.h b/drivers/gpu/drm/tegra/uapi.h
new file mode 100644
index 000000000000..4867646670c6
--- /dev/null
+++ b/drivers/gpu/drm/tegra/uapi.h
@@ -0,0 +1,59 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2020 NVIDIA Corporation */
+
+#ifndef _TEGRA_DRM_CHANNEL_UAPI_H
+#define _TEGRA_DRM_CHANNEL_UAPI_H
+
+#include <linux/dma-mapping.h>
+#include <linux/idr.h>
+#include <linux/kref.h>
+#include <linux/xarray.h>
+
+#include <drm/drm.h>
+
+struct tegra_drm_file {
+	/* Legacy UAPI state */
+	struct idr legacy_contexts;
+	struct mutex lock;
+
+	/* New UAPI state */
+	struct xarray contexts;
+};
+
+struct tegra_drm_channel_ctx {
+	struct tegra_drm_client *client;
+	struct host1x_channel *channel;
+	struct xarray mappings;
+};
+
+struct tegra_drm_mapping {
+	struct kref ref;
+
+	struct device *dev;
+	struct host1x_bo *bo;
+	struct sg_table *sgt;
+	enum dma_data_direction direction;
+	dma_addr_t iova;
+};
+
+int tegra_drm_ioctl_channel_open(struct drm_device *drm, void *data,
+				 struct drm_file *file);
+int tegra_drm_ioctl_channel_close(struct drm_device *drm, void *data,
+				  struct drm_file *file);
+int tegra_drm_ioctl_channel_map(struct drm_device *drm, void *data,
+				struct drm_file *file);
+int tegra_drm_ioctl_channel_unmap(struct drm_device *drm, void *data,
+				  struct drm_file *file);
+int tegra_drm_ioctl_channel_submit(struct drm_device *drm, void *data,
+				   struct drm_file *file);
+int tegra_drm_ioctl_gem_create(struct drm_device *drm, void *data,
+			       struct drm_file *file);
+int tegra_drm_ioctl_gem_mmap(struct drm_device *drm, void *data,
+			     struct drm_file *file);
+
+void tegra_drm_uapi_close_file(struct tegra_drm_file *file);
+void tegra_drm_mapping_put(struct tegra_drm_mapping *mapping);
+struct tegra_drm_channel_ctx *
+tegra_drm_channel_ctx_lock(struct tegra_drm_file *file, u32 id);
+
+#endif
diff --git a/drivers/gpu/drm/tegra/uapi/submit.c b/drivers/gpu/drm/tegra/uapi/submit.c
new file mode 100644
index 000000000000..84e1c602db3e
--- /dev/null
+++ b/drivers/gpu/drm/tegra/uapi/submit.c
@@ -0,0 +1,687 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2020 NVIDIA Corporation */
+
+#include <linux/dma-fence-array.h>
+#include <linux/file.h>
+#include <linux/host1x.h>
+#include <linux/iommu.h>
+#include <linux/kref.h>
+#include <linux/list.h>
+#include <linux/nospec.h>
+#include <linux/sync_file.h>
+
+#include <drm/drm_drv.h>
+#include <drm/drm_file.h>
+
+#include "../uapi.h"
+#include "../drm.h"
+#include "../gem.h"
+
+static struct tegra_drm_mapping *
+tegra_drm_mapping_get(struct tegra_drm_channel_ctx *ctx, u32 id)
+{
+	struct tegra_drm_mapping *mapping;
+
+	xa_lock(&ctx->mappings);
+	mapping = xa_load(&ctx->mappings, id);
+	if (mapping)
+		kref_get(&mapping->ref);
+	xa_unlock(&ctx->mappings);
+
+	return mapping;
+}
+
+struct gather_bo {
+	struct host1x_bo base;
+
+	struct kref ref;
+
+	u32 *gather_data;
+	size_t gather_data_len;
+};
+
+static struct host1x_bo *gather_bo_get(struct host1x_bo *host_bo)
+{
+	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
+
+	kref_get(&bo->ref);
+
+	return host_bo;
+}
+
+static void gather_bo_release(struct kref *ref)
+{
+	struct gather_bo *bo = container_of(ref, struct gather_bo, ref);
+
+	kfree(bo->gather_data);
+	kfree(bo);
+}
+
+static void gather_bo_put(struct host1x_bo *host_bo)
+{
+	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
+
+	kref_put(&bo->ref, gather_bo_release);
+}
+
+static struct sg_table *
+gather_bo_pin(struct device *dev, struct host1x_bo *host_bo, dma_addr_t *phys)
+{
+	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
+	struct sg_table *sgt;
+	int err;
+
+	if (phys) {
+		*phys = virt_to_phys(bo->gather_data);
+		return NULL;
+	}
+
+	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
+	if (!sgt)
+		return ERR_PTR(-ENOMEM);
+
+	err = sg_alloc_table(sgt, 1, GFP_KERNEL);
+	if (err) {
+		kfree(sgt);
+		return ERR_PTR(err);
+	}
+
+	sg_init_one(sgt->sgl, bo->gather_data, bo->gather_data_len * 4);
+
+	return sgt;
+}
+
+static void gather_bo_unpin(struct device *dev, struct sg_table *sgt)
+{
+	if (sgt) {
+		sg_free_table(sgt);
+		kfree(sgt);
+	}
+}
+
+static void *gather_bo_mmap(struct host1x_bo *host_bo)
+{
+	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
+
+	return bo->gather_data;
+}
+
+static void gather_bo_munmap(struct host1x_bo *host_bo, void *addr)
+{
+}
+
+static const struct host1x_bo_ops gather_bo_ops = {
+	.get = gather_bo_get,
+	.put = gather_bo_put,
+	.pin = gather_bo_pin,
+	.unpin = gather_bo_unpin,
+	.mmap = gather_bo_mmap,
+	.munmap = gather_bo_munmap,
+};
+
+struct tegra_drm_used_mapping {
+	struct tegra_drm_mapping *mapping;
+	u32 flags;
+};
+
+struct tegra_drm_job_data {
+	struct tegra_drm_used_mapping *used_mappings;
+	u32 num_used_mappings;
+};
+
+static int submit_copy_gather_data(struct drm_device *drm,
+				   struct gather_bo **pbo,
+				   struct drm_tegra_channel_submit *args)
+{
+	unsigned long copy_err;
+	struct gather_bo *bo;
+
+	if (args->gather_data_words == 0) {
+		drm_info(drm, "gather_data_words can't be 0");
+		return -EINVAL;
+	}
+	if (args->gather_data_words > 1024) {
+		drm_info(drm, "gather_data_words can't be over 1024");
+		return -E2BIG;
+	}
+
+	bo = kzalloc(sizeof(*bo), GFP_KERNEL);
+	if (!bo)
+		return -ENOMEM;
+
+	kref_init(&bo->ref);
+	host1x_bo_init(&bo->base, &gather_bo_ops);
+
+	bo->gather_data =
+		kmalloc(args->gather_data_words*4, GFP_KERNEL | __GFP_NOWARN);
+	if (!bo->gather_data) {
+		kfree(bo);
+		return -ENOMEM;
+	}
+
+	copy_err = copy_from_user(bo->gather_data,
+				  u64_to_user_ptr(args->gather_data_ptr),
+				  args->gather_data_words*4);
+	if (copy_err) {
+		kfree(bo->gather_data);
+		kfree(bo);
+		return -EFAULT;
+	}
+
+	bo->gather_data_len = args->gather_data_words;
+
+	*pbo = bo;
+
+	return 0;
+}
+
+static int submit_write_reloc(struct gather_bo *bo,
+			      struct drm_tegra_submit_buf *buf,
+			      struct tegra_drm_mapping *mapping)
+{
+	/* TODO check that target_offset is within bounds */
+	dma_addr_t iova = mapping->iova + buf->reloc.target_offset;
+	u32 written_ptr = (u32)(iova >> buf->reloc.shift);
+
+	if (buf->flags & DRM_TEGRA_SUBMIT_BUF_RELOC_BLOCKLINEAR)
+		written_ptr |= BIT(39);
+
+	if (buf->reloc.gather_offset_words >= bo->gather_data_len)
+		return -EINVAL;
+
+	buf->reloc.gather_offset_words = array_index_nospec(
+		buf->reloc.gather_offset_words, bo->gather_data_len);
+
+	bo->gather_data[buf->reloc.gather_offset_words] = written_ptr;
+
+	return 0;
+}
+
+static void submit_unlock_resv(struct tegra_drm_job_data *job_data,
+			       struct ww_acquire_ctx *acquire_ctx)
+{
+	u32 i;
+
+	for (i = 0; i < job_data->num_used_mappings; i++) {
+		struct tegra_bo *bo = host1x_to_tegra_bo(
+			job_data->used_mappings[i].mapping->bo);
+
+		dma_resv_unlock(bo->gem.resv);
+	}
+
+	ww_acquire_fini(acquire_ctx);
+}
+
+static int submit_handle_resv(struct tegra_drm_job_data *job_data,
+			      struct ww_acquire_ctx *acquire_ctx)
+{
+	int contended = -1;
+	int err;
+	u32 i;
+
+	/* Based on drm_gem_lock_reservations */
+
+	ww_acquire_init(acquire_ctx, &reservation_ww_class);
+
+retry:
+	if (contended != -1) {
+		struct tegra_bo *bo = host1x_to_tegra_bo(
+			job_data->used_mappings[contended].mapping->bo);
+
+		err = dma_resv_lock_slow_interruptible(bo->gem.resv,
+						       acquire_ctx);
+		if (err) {
+			ww_acquire_done(acquire_ctx);
+			return err;
+		}
+	}
+
+	for (i = 0; i < job_data->num_used_mappings; i++) {
+		struct tegra_bo *bo = host1x_to_tegra_bo(
+			job_data->used_mappings[i].mapping->bo);
+
+		if (i == contended)
+			continue;
+
+		err = dma_resv_lock_interruptible(bo->gem.resv, acquire_ctx);
+		if (err) {
+			int j;
+
+			for (j = 0; j < i; j++) {
+				bo = host1x_to_tegra_bo(
+					job_data->used_mappings[j].mapping->bo);
+				dma_resv_unlock(bo->gem.resv);
+			}
+
+			if (contended != -1 && contended >= i) {
+				bo = host1x_to_tegra_bo(
+					job_data->used_mappings[contended].mapping->bo);
+				dma_resv_unlock(bo->gem.resv);
+			}
+
+			if (err == -EDEADLK) {
+				contended = i;
+				goto retry;
+			}
+
+			ww_acquire_done(acquire_ctx);
+			return err;
+		}
+	}
+
+	ww_acquire_done(acquire_ctx);
+
+	for (i = 0; i < job_data->num_used_mappings; i++) {
+		struct tegra_drm_used_mapping *um = &job_data->used_mappings[i];
+		struct tegra_bo *bo = host1x_to_tegra_bo(
+			job_data->used_mappings[i].mapping->bo);
+
+		if (um->flags & DRM_TEGRA_SUBMIT_BUF_RESV_READ) {
+			err = dma_resv_reserve_shared(bo->gem.resv, 1);
+			if (err < 0)
+				goto unlock_resv;
+		}
+	}
+
+	return 0;
+
+unlock_resv:
+	submit_unlock_resv(job_data, acquire_ctx);
+
+	return err;
+}
+
+static int submit_process_bufs(struct drm_device *drm, struct gather_bo *bo,
+			       struct tegra_drm_job_data *job_data,
+			       struct tegra_drm_channel_ctx *ctx,
+			       struct drm_tegra_channel_submit *args,
+			       struct ww_acquire_ctx *acquire_ctx)
+{
+	struct drm_tegra_submit_buf __user *user_bufs_ptr =
+		u64_to_user_ptr(args->bufs_ptr);
+	struct tegra_drm_mapping *mapping;
+	struct drm_tegra_submit_buf buf;
+	unsigned long copy_err;
+	int err;
+	u32 i;
+
+	job_data->used_mappings =
+		kcalloc(args->num_bufs, sizeof(*job_data->used_mappings), GFP_KERNEL);
+	if (!job_data->used_mappings)
+		return -ENOMEM;
+
+	for (i = 0; i < args->num_bufs; i++) {
+		copy_err = copy_from_user(&buf, user_bufs_ptr+i, sizeof(buf));
+		if (copy_err) {
+			err = -EFAULT;
+			goto drop_refs;
+		}
+
+		if (buf.flags & ~(DRM_TEGRA_SUBMIT_BUF_WRITE_RELOC |
+				  DRM_TEGRA_SUBMIT_BUF_RELOC_BLOCKLINEAR |
+				  DRM_TEGRA_SUBMIT_BUF_RESV_READ |
+				  DRM_TEGRA_SUBMIT_BUF_RESV_WRITE)) {
+			err = -EINVAL;
+			goto drop_refs;
+		}
+
+		if (buf.reserved[0] || buf.reserved[1]) {
+			err = -EINVAL;
+			goto drop_refs;
+		}
+
+		mapping = tegra_drm_mapping_get(ctx, buf.mapping_id);
+		if (!mapping) {
+			drm_info(drm, "invalid mapping_id for buf: %u",
+				 buf.mapping_id);
+			err = -EINVAL;
+			goto drop_refs;
+		}
+
+		if (buf.flags & DRM_TEGRA_SUBMIT_BUF_WRITE_RELOC) {
+			err = submit_write_reloc(bo, &buf, mapping);
+			if (err) {
+				tegra_drm_mapping_put(mapping);
+				goto drop_refs;
+			}
+		}
+
+		job_data->used_mappings[i].mapping = mapping;
+		job_data->used_mappings[i].flags = buf.flags;
+	}
+	job_data->num_used_mappings = i;
+	return 0;
+
+drop_refs:
+	for (;;) {
+		if (i-- == 0)
+			break;
+
+		tegra_drm_mapping_put(job_data->used_mappings[i].mapping);
+	}
+
+	kfree(job_data->used_mappings);
+	job_data->used_mappings = NULL;
+
+	return err;
+}
+
+static int submit_create_job(struct drm_device *drm, struct host1x_job **pjob,
+			     struct gather_bo *bo,
+			     struct tegra_drm_channel_ctx *ctx,
+			     struct drm_tegra_channel_submit *args,
+			     struct drm_file *file)
+{
+	struct drm_tegra_submit_cmd __user *user_cmds_ptr =
+		u64_to_user_ptr(args->cmds_ptr);
+	struct drm_tegra_submit_cmd cmd;
+	struct host1x_job *job;
+	unsigned long copy_err;
+	u32 i, gather_offset = 0;
+	int err = 0;
+
+	job = host1x_job_alloc(ctx->channel, args->num_cmds, 0);
+	if (!job)
+		return -ENOMEM;
+
+	job->client = &ctx->client->base;
+	job->class = ctx->client->base.class;
+	job->serialize = true;
+
+	for (i = 0; i < args->num_cmds; i++) {
+		copy_err = copy_from_user(&cmd, user_cmds_ptr+i, sizeof(cmd));
+		if (copy_err) {
+			err = -EFAULT;
+			goto free_job;
+		}
+
+		if (cmd.type == DRM_TEGRA_SUBMIT_CMD_GATHER_UPTR) {
+			if (cmd.gather_uptr.reserved[0] ||
+			    cmd.gather_uptr.reserved[1] ||
+			    cmd.gather_uptr.reserved[2]) {
+				err = -EINVAL;
+				goto free_job;
+			}
+
+			/* Check for maximum gather size */
+			if (cmd.gather_uptr.words > 16383) {
+				err = -EINVAL;
+				goto free_job;
+			}
+
+			host1x_job_add_gather(job, &bo->base,
+					      cmd.gather_uptr.words,
+					      gather_offset*4);
+
+			gather_offset += cmd.gather_uptr.words;
+
+			if (gather_offset > bo->gather_data_len) {
+				err = -EINVAL;
+				goto free_job;
+			}
+		} else if (cmd.type == DRM_TEGRA_SUBMIT_CMD_WAIT_SYNCPT) {
+			if (cmd.wait_syncpt.reserved[0] ||
+			    cmd.wait_syncpt.reserved[1]) {
+				err = -EINVAL;
+				goto free_job;
+			}
+
+			host1x_job_add_wait(job, cmd.wait_syncpt.id,
+					    cmd.wait_syncpt.threshold);
+		} else if (cmd.type == DRM_TEGRA_SUBMIT_CMD_WAIT_SYNC_FILE) {
+			struct dma_fence *f;
+
+			if (cmd.wait_sync_file.reserved[0] ||
+			    cmd.wait_sync_file.reserved[1] ||
+			    cmd.wait_sync_file.reserved[2]) {
+				err = -EINVAL;
+				goto free_job;
+			}
+
+			f = sync_file_get_fence(cmd.wait_sync_file.fd);
+			if (!f) {
+				err = -EINVAL;
+				goto free_job;
+			}
+
+			err = dma_fence_wait(f, true);
+			dma_fence_put(f);
+
+			if (err)
+				goto free_job;
+		} else {
+			err = -EINVAL;
+			goto free_job;
+		}
+	}
+
+	if (gather_offset == 0) {
+		drm_info(drm, "job must have at least one gather");
+		err = -EINVAL;
+		goto free_job;
+	}
+
+	*pjob = job;
+
+	return 0;
+
+free_job:
+	host1x_job_put(job);
+
+	return err;
+}
+
+static int submit_handle_syncpts(struct drm_device *drm, struct host1x_job *job,
+				 struct drm_tegra_submit_syncpt_incr *incr,
+				 struct drm_tegra_channel_submit *args)
+{
+	struct drm_tegra_submit_syncpt_incr __user *user_incrs_ptr =
+		u64_to_user_ptr(args->syncpt_incrs_ptr);
+	struct host1x_syncpt *sp;
+	unsigned long copy_err;
+
+	if (args->num_syncpt_incrs != 1) {
+		drm_info(drm, "Only 1 syncpoint supported for now");
+		return -EINVAL;
+	}
+
+	copy_err = copy_from_user(incr, user_incrs_ptr, sizeof(*incr));
+	if (copy_err)
+		return -EFAULT;
+
+	if ((incr->flags & ~DRM_TEGRA_SUBMIT_SYNCPT_INCR_CREATE_SYNC_FILE) ||
+	    incr->reserved[0] || incr->reserved[1] || incr->reserved[2])
+		return -EINVAL;
+
+	/* Syncpt ref will be dropped on job release */
+	sp = host1x_syncpt_fd_get(incr->syncpt_fd);
+	if (IS_ERR(sp))
+		return PTR_ERR(sp);
+
+	job->syncpt = sp;
+	job->syncpt_incrs = incr->num_incrs;
+
+	return 0;
+}
+
+static int submit_create_postfences(struct host1x_job *job,
+				    struct drm_tegra_submit_syncpt_incr *incr,
+				    struct drm_tegra_channel_submit *args)
+{
+	struct tegra_drm_job_data *job_data = job->user_data;
+	struct dma_fence *fence;
+	int err = 0;
+	u32 i;
+
+	fence = host1x_fence_create(job->syncpt, job->syncpt_end);
+	if (IS_ERR(fence))
+		return PTR_ERR(fence);
+
+	incr->fence_value = job->syncpt_end;
+
+	for (i = 0; i < job_data->num_used_mappings; i++) {
+		struct tegra_drm_used_mapping *um = &job_data->used_mappings[i];
+		struct tegra_bo *bo = host1x_to_tegra_bo(um->mapping->bo);
+
+		if (um->flags & DRM_TEGRA_SUBMIT_BUF_RESV_READ)
+			dma_resv_add_shared_fence(bo->gem.resv, fence);
+
+		if (um->flags & DRM_TEGRA_SUBMIT_BUF_RESV_WRITE)
+			dma_resv_add_excl_fence(bo->gem.resv, fence);
+	}
+
+	if (incr->flags & DRM_TEGRA_SUBMIT_SYNCPT_INCR_CREATE_SYNC_FILE) {
+		struct sync_file *sf;
+
+		err = get_unused_fd_flags(O_CLOEXEC);
+		if (err < 0)
+			goto put_fence;
+
+		sf = sync_file_create(fence);
+		if (!sf) {
+			err = -ENOMEM;
+			goto put_fence;
+		}
+
+		fd_install(err, sf->file);
+		incr->sync_file_fd = err;
+		err = 0;
+	}
+
+put_fence:
+	dma_fence_put(fence);
+
+	return err;
+}
+
+static int submit_copy_postfences(struct drm_tegra_submit_syncpt_incr *incr,
+				  struct drm_tegra_channel_submit *args)
+{
+	unsigned long copy_err;
+
+	struct drm_tegra_submit_syncpt_incr __user *user_incrs_ptr =
+		u64_to_user_ptr(args->syncpt_incrs_ptr);
+
+	copy_err = copy_to_user(user_incrs_ptr, incr, sizeof(*incr));
+	if (copy_err)
+		return -EFAULT;
+
+	return 0;
+}
+
+static void release_job(struct host1x_job *job)
+{
+	struct tegra_drm_client *client =
+		container_of(job->client, struct tegra_drm_client, base);
+	struct tegra_drm_job_data *job_data = job->user_data;
+	u32 i;
+
+	for (i = 0; i < job_data->num_used_mappings; i++)
+		tegra_drm_mapping_put(job_data->used_mappings[i].mapping);
+
+	kfree(job_data->used_mappings);
+	kfree(job_data);
+
+	if (client->ops->power_off)
+		client->ops->power_off(client);
+}
+
+int tegra_drm_ioctl_channel_submit(struct drm_device *drm, void *data,
+				   struct drm_file *file)
+{
+	struct tegra_drm_file *fpriv = file->driver_priv;
+	struct drm_tegra_channel_submit *args = data;
+	struct drm_tegra_submit_syncpt_incr incr;
+	struct tegra_drm_job_data *job_data;
+	struct ww_acquire_ctx acquire_ctx;
+	struct tegra_drm_channel_ctx *ctx;
+	struct host1x_job *job;
+	struct gather_bo *bo;
+	u32 i;
+	int err;
+
+	if (args->reserved[0] || args->reserved[1] || args->reserved[2] ||
+	    args->reserved[3])
+		return -EINVAL;
+
+	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
+	if (!ctx)
+		return -EINVAL;
+
+	err = submit_copy_gather_data(drm, &bo, args);
+	if (err)
+		goto unlock;
+
+	job_data = kzalloc(sizeof(*job_data), GFP_KERNEL);
+	if (!job_data) {
+		err = -ENOMEM;
+		goto put_bo;
+	}
+
+	err = submit_process_bufs(drm, bo, job_data, ctx, args, &acquire_ctx);
+	if (err)
+		goto free_job_data;
+
+	err = submit_create_job(drm, &job, bo, ctx, args, file);
+	if (err)
+		goto free_job_data;
+
+	err = submit_handle_syncpts(drm, job, &incr, args);
+	if (err)
+		goto put_job;
+
+	err = host1x_job_pin(job, ctx->client->base.dev);
+	if (err)
+		goto put_job;
+
+	if (ctx->client->ops->power_on)
+		ctx->client->ops->power_on(ctx->client);
+	job->user_data = job_data;
+	job->release = release_job;
+	job->timeout = min(args->timeout_us / 1000, 10000U);
+	if (job->timeout == 0)
+		job->timeout = 1;
+
+	/*
+	 * job_data is now part of job reference counting, so don't release
+	 * it from here.
+	 */
+	job_data = NULL;
+
+	err = submit_handle_resv(job->user_data, &acquire_ctx);
+	if (err)
+		goto unpin_job;
+
+	err = host1x_job_submit(job);
+	if (err)
+		goto unlock_resv;
+
+	err = submit_create_postfences(job, &incr, args);
+
+	submit_unlock_resv(job->user_data, &acquire_ctx);
+
+	if (err == 0)
+		err = submit_copy_postfences(&incr, args);
+
+	goto put_job;
+
+unlock_resv:
+	submit_unlock_resv(job->user_data, &acquire_ctx);
+unpin_job:
+	host1x_job_unpin(job);
+put_job:
+	host1x_job_put(job);
+free_job_data:
+	if (job_data && job_data->used_mappings) {
+		for (i = 0; i < job_data->num_used_mappings; i++)
+			tegra_drm_mapping_put(job_data->used_mappings[i].mapping);
+		kfree(job_data->used_mappings);
+	}
+	if (job_data)
+		kfree(job_data);
+put_bo:
+	kref_put(&bo->ref, gather_bo_release);
+unlock:
+	mutex_unlock(&fpriv->lock);
+	return err;
+}
diff --git a/drivers/gpu/drm/tegra/uapi/uapi.c b/drivers/gpu/drm/tegra/uapi/uapi.c
new file mode 100644
index 000000000000..287b83105b09
--- /dev/null
+++ b/drivers/gpu/drm/tegra/uapi/uapi.c
@@ -0,0 +1,328 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2020 NVIDIA Corporation */
+
+#include <linux/host1x.h>
+#include <linux/iommu.h>
+#include <linux/list.h>
+
+#include <drm/drm_drv.h>
+#include <drm/drm_file.h>
+
+#include "../uapi.h"
+#include "../drm.h"
+
+struct tegra_drm_channel_ctx *
+tegra_drm_channel_ctx_lock(struct tegra_drm_file *file, u32 id)
+{
+	struct tegra_drm_channel_ctx *ctx;
+
+	mutex_lock(&file->lock);
+	ctx = xa_load(&file->contexts, id);
+	if (!ctx)
+		mutex_unlock(&file->lock);
+
+	return ctx;
+}
+
+static void tegra_drm_mapping_release(struct kref *ref)
+{
+	struct tegra_drm_mapping *mapping =
+		container_of(ref, struct tegra_drm_mapping, ref);
+
+	if (mapping->sgt)
+		dma_unmap_sgtable(mapping->dev, mapping->sgt,
+				  mapping->direction, DMA_ATTR_SKIP_CPU_SYNC);
+
+	host1x_bo_unpin(mapping->dev, mapping->bo, mapping->sgt);
+	host1x_bo_put(mapping->bo);
+
+	kfree(mapping);
+}
+
+void tegra_drm_mapping_put(struct tegra_drm_mapping *mapping)
+{
+	kref_put(&mapping->ref, tegra_drm_mapping_release);
+}
+
+static void tegra_drm_channel_ctx_close(struct tegra_drm_channel_ctx *ctx)
+{
+	unsigned long mapping_id;
+	struct tegra_drm_mapping *mapping;
+
+	xa_for_each(&ctx->mappings, mapping_id, mapping)
+		tegra_drm_mapping_put(mapping);
+
+	xa_destroy(&ctx->mappings);
+
+	host1x_channel_put(ctx->channel);
+
+	kfree(ctx);
+}
+
+int close_channel_ctx(int id, void *p, void *data)
+{
+	struct tegra_drm_channel_ctx *ctx = p;
+
+	tegra_drm_channel_ctx_close(ctx);
+
+	return 0;
+}
+
+void tegra_drm_uapi_close_file(struct tegra_drm_file *file)
+{
+	unsigned long ctx_id;
+	struct tegra_drm_channel_ctx *ctx;
+
+	xa_for_each(&file->contexts, ctx_id, ctx)
+		tegra_drm_channel_ctx_close(ctx);
+
+	xa_destroy(&file->contexts);
+}
+
+int tegra_drm_ioctl_channel_open(struct drm_device *drm, void *data,
+				 struct drm_file *file)
+{
+	struct tegra_drm_file *fpriv = file->driver_priv;
+	struct tegra_drm *tegra = drm->dev_private;
+	struct drm_tegra_channel_open *args = data;
+	struct tegra_drm_client *client = NULL;
+	struct tegra_drm_channel_ctx *ctx;
+	int err;
+
+	if (args->flags || args->reserved[0] || args->reserved[1])
+		return -EINVAL;
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	err = -ENODEV;
+	list_for_each_entry(client, &tegra->clients, list) {
+		if (client->base.class == args->host1x_class) {
+			err = 0;
+			break;
+		}
+	}
+	if (err)
+		goto free_ctx;
+
+	if (client->shared_channel) {
+		ctx->channel = host1x_channel_get(client->shared_channel);
+	} else {
+		ctx->channel = host1x_channel_request(&client->base);
+		if (!ctx->channel) {
+			err = -EBUSY;
+			goto free_ctx;
+		}
+	}
+
+	err = xa_alloc(&fpriv->contexts, &args->channel_ctx, ctx,
+		       XA_LIMIT(1, U32_MAX), GFP_KERNEL);
+	if (err < 0) {
+		/* fpriv->lock is not held here; xa_alloc() locks internally */
+		goto put_channel;
+	}
+
+	ctx->client = client;
+	xa_init_flags(&ctx->mappings, XA_FLAGS_ALLOC);
+
+	args->hardware_version = client->version;
+
+	return 0;
+
+put_channel:
+	host1x_channel_put(ctx->channel);
+free_ctx:
+	kfree(ctx);
+
+	return err;
+}
+
+int tegra_drm_ioctl_channel_close(struct drm_device *drm, void *data,
+				  struct drm_file *file)
+{
+	struct tegra_drm_file *fpriv = file->driver_priv;
+	struct drm_tegra_channel_close *args = data;
+	struct tegra_drm_channel_ctx *ctx;
+
+	if (args->reserved[0])
+		return -EINVAL;
+
+	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
+	if (!ctx)
+		return -EINVAL;
+
+	xa_erase(&fpriv->contexts, args->channel_ctx);
+
+	mutex_unlock(&fpriv->lock);
+
+	tegra_drm_channel_ctx_close(ctx);
+
+	return 0;
+}
+
+int tegra_drm_ioctl_channel_map(struct drm_device *drm, void *data,
+				struct drm_file *file)
+{
+	struct tegra_drm_file *fpriv = file->driver_priv;
+	struct drm_tegra_channel_map *args = data;
+	struct tegra_drm_channel_ctx *ctx;
+	struct tegra_drm_mapping *mapping;
+	struct drm_gem_object *gem;
+	u32 mapping_id;
+	int err = 0;
+
+	if (args->flags & ~DRM_TEGRA_CHANNEL_MAP_READWRITE)
+		return -EINVAL;
+	if (args->reserved[0] || args->reserved[1])
+		return -EINVAL;
+
+	if (!IS_ALIGNED(args->offset, 0x1000) ||
+	    !IS_ALIGNED(args->length, 0x1000))
+		return -EINVAL;
+
+	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
+	if (!ctx)
+		return -EINVAL;
+
+	gem = drm_gem_object_lookup(file, args->handle);
+	if (!gem) {
+		err = -EINVAL;
+		goto unlock;
+	}
+
+	mapping = kzalloc(sizeof(*mapping), GFP_KERNEL);
+	if (!mapping) {
+		err = -ENOMEM;
+		goto put_gem;
+	}
+
+	kref_init(&mapping->ref);
+
+	if (args->offset >= gem->size || args->length > gem->size ||
+	    args->offset > gem->size - args->length) {
+		err = -EINVAL;
+		goto put_gem;
+	}
+
+	mapping->dev = ctx->client->base.dev;
+	mapping->bo = &container_of(gem, struct tegra_bo, gem)->base;
+
+	if (!iommu_get_domain_for_dev(mapping->dev) ||
+	    ctx->client->base.group) {
+		host1x_bo_pin(mapping->dev, mapping->bo,
+			      &mapping->iova);
+	} else {
+		mapping->direction = DMA_TO_DEVICE;
+		if (args->flags & DRM_TEGRA_CHANNEL_MAP_READWRITE)
+			mapping->direction = DMA_BIDIRECTIONAL;
+
+		mapping->sgt =
+			host1x_bo_pin(mapping->dev, mapping->bo, NULL);
+		if (IS_ERR(mapping->sgt)) {
+			err = PTR_ERR(mapping->sgt);
+			goto put_gem;
+		}
+
+		err = dma_map_sgtable(mapping->dev, mapping->sgt,
+				      mapping->direction,
+				      DMA_ATTR_SKIP_CPU_SYNC);
+		if (err)
+			goto unpin;
+
+		/* TODO only map the requested part */
+		mapping->iova =
+			sg_dma_address(mapping->sgt->sgl) + args->offset;
+	}
+
+	err = xa_alloc(&ctx->mappings, &mapping_id, mapping,
+		       XA_LIMIT(1, U32_MAX), GFP_KERNEL);
+	if (err < 0)
+		goto unmap;
+
+	mutex_unlock(&fpriv->lock);
+
+	/* TODO: if appropriate, return actual IOVA */
+	args->iova = U64_MAX;
+	args->mapping_id = mapping_id;
+
+	return 0;
+
+unmap:
+	if (mapping->sgt) {
+		dma_unmap_sgtable(mapping->dev, mapping->sgt,
+				  mapping->direction, DMA_ATTR_SKIP_CPU_SYNC);
+	}
+unpin:
+	host1x_bo_unpin(mapping->dev, mapping->bo, mapping->sgt);
+put_gem:
+	drm_gem_object_put(gem);
+	kfree(mapping);
+unlock:
+	mutex_unlock(&fpriv->lock);
+	return err;
+}
+
+int tegra_drm_ioctl_channel_unmap(struct drm_device *drm, void *data,
+				  struct drm_file *file)
+{
+	struct tegra_drm_file *fpriv = file->driver_priv;
+	struct drm_tegra_channel_unmap *args = data;
+	struct tegra_drm_channel_ctx *ctx;
+	struct tegra_drm_mapping *mapping;
+
+	if (args->reserved[0] || args->reserved[1])
+		return -EINVAL;
+
+	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
+	if (!ctx)
+		return -EINVAL;
+
+	mapping = xa_erase(&ctx->mappings, args->mapping_id);
+
+	mutex_unlock(&fpriv->lock);
+
+	if (mapping) {
+		tegra_drm_mapping_put(mapping);
+		return 0;
+	} else {
+		return -EINVAL;
+	}
+}
+
+int tegra_drm_ioctl_gem_create(struct drm_device *drm, void *data,
+			       struct drm_file *file)
+{
+	struct drm_tegra_gem_create *args = data;
+	struct tegra_bo *bo;
+
+	if (args->flags)
+		return -EINVAL;
+
+	bo = tegra_bo_create_with_handle(file, drm, args->size, args->flags,
+					 &args->handle);
+	if (IS_ERR(bo))
+		return PTR_ERR(bo);
+
+	return 0;
+}
+
+int tegra_drm_ioctl_gem_mmap(struct drm_device *drm, void *data,
+			     struct drm_file *file)
+{
+	struct drm_tegra_gem_mmap *args = data;
+	struct drm_gem_object *gem;
+	struct tegra_bo *bo;
+
+	gem = drm_gem_object_lookup(file, args->handle);
+	if (!gem)
+		return -EINVAL;
+
+	bo = to_tegra_bo(gem);
+
+	args->offset = drm_vma_node_offset_addr(&bo->gem.vma_node);
+
+	drm_gem_object_put(gem);
+
+	return 0;
+}
-- 
2.28.0


* [RFC PATCH v2 17/17] WIP: drm/tegra: Implement new UAPI
@ 2020-09-05 10:34   ` Mikko Perttunen
From: Mikko Perttunen @ 2020-09-05 10:34 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, talho, bhuntsman, dri-devel, Mikko Perttunen

Implement the new UAPI, and bump the TegraDRM major version.

WIP, still to be done:
- Wait for DMA reservation fences
- Implement the firewall on the TegraDRM side

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
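The offset/length validation in tegra_drm_ioctl_channel_map() is ordered
so that the arithmetic cannot wrap around. A minimal userspace sketch of
that check is shown here for illustration only. The helper name
map_range_valid() and the use of plain uint64_t for all three values are
assumptions made for the sketch; neither is part of this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: returns true when
 * [offset, offset + length) lies inside a buffer of `size` bytes.
 */
static bool map_range_valid(uint64_t offset, uint64_t length, uint64_t size)
{
	/* Reject an out-of-range start or an over-long length first. */
	if (offset >= size || length > size)
		return false;

	/* length <= size here, so size - length cannot wrap around. */
	return offset <= size - length;
}
```

A naive "offset + length <= size" test would accept offset = 0x1000 with
length = UINT64_MAX, because the sum wraps; the form above rejects it.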
 drivers/gpu/drm/tegra/Makefile      |   2 +
 drivers/gpu/drm/tegra/drm.c         |  46 +-
 drivers/gpu/drm/tegra/drm.h         |   5 +
 drivers/gpu/drm/tegra/uapi.h        |  59 +++
 drivers/gpu/drm/tegra/uapi/submit.c | 687 ++++++++++++++++++++++++++++
 drivers/gpu/drm/tegra/uapi/uapi.c   | 328 +++++++++++++
 6 files changed, 1109 insertions(+), 18 deletions(-)
 create mode 100644 drivers/gpu/drm/tegra/uapi.h
 create mode 100644 drivers/gpu/drm/tegra/uapi/submit.c
 create mode 100644 drivers/gpu/drm/tegra/uapi/uapi.c

diff --git a/drivers/gpu/drm/tegra/Makefile b/drivers/gpu/drm/tegra/Makefile
index d6cf202414f0..d480491564b7 100644
--- a/drivers/gpu/drm/tegra/Makefile
+++ b/drivers/gpu/drm/tegra/Makefile
@@ -3,6 +3,8 @@ ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG
 
 tegra-drm-y := \
 	drm.o \
+	uapi/uapi.o \
+	uapi/submit.o \
 	gem.o \
 	fb.o \
 	dp.o \
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 7124b0b0154b..acd734104c9a 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -20,24 +20,20 @@
 #include <drm/drm_prime.h>
 #include <drm/drm_vblank.h>
 
+#include "uapi.h"
 #include "drm.h"
 #include "gem.h"
 
 #define DRIVER_NAME "tegra"
 #define DRIVER_DESC "NVIDIA Tegra graphics"
 #define DRIVER_DATE "20120330"
-#define DRIVER_MAJOR 0
+#define DRIVER_MAJOR 1
 #define DRIVER_MINOR 0
 #define DRIVER_PATCHLEVEL 0
 
 #define CARVEOUT_SZ SZ_64M
 #define CDMA_GATHER_FETCHES_MAX_NB 16383
 
-struct tegra_drm_file {
-	struct idr contexts;
-	struct mutex lock;
-};
-
 static int tegra_atomic_check(struct drm_device *drm,
 			      struct drm_atomic_state *state)
 {
@@ -90,7 +86,8 @@ static int tegra_drm_open(struct drm_device *drm, struct drm_file *filp)
 	if (!fpriv)
 		return -ENOMEM;
 
-	idr_init(&fpriv->contexts);
+	idr_init(&fpriv->legacy_contexts);
+	xa_init_flags(&fpriv->contexts, XA_FLAGS_ALLOC);
 	mutex_init(&fpriv->lock);
 	filp->driver_priv = fpriv;
 
@@ -432,7 +429,7 @@ static int tegra_client_open(struct tegra_drm_file *fpriv,
 	if (err < 0)
 		return err;
 
-	err = idr_alloc(&fpriv->contexts, context, 1, 0, GFP_KERNEL);
+	err = idr_alloc(&fpriv->legacy_contexts, context, 1, 0, GFP_KERNEL);
 	if (err < 0) {
 		client->ops->close_channel(context);
 		return err;
@@ -487,13 +484,13 @@ static int tegra_close_channel(struct drm_device *drm, void *data,
 
 	mutex_lock(&fpriv->lock);
 
-	context = idr_find(&fpriv->contexts, args->context);
+	context = idr_find(&fpriv->legacy_contexts, args->context);
 	if (!context) {
 		err = -EINVAL;
 		goto unlock;
 	}
 
-	idr_remove(&fpriv->contexts, context->id);
+	idr_remove(&fpriv->legacy_contexts, context->id);
 	tegra_drm_context_free(context);
 
 unlock:
@@ -512,7 +509,7 @@ static int tegra_get_syncpt(struct drm_device *drm, void *data,
 
 	mutex_lock(&fpriv->lock);
 
-	context = idr_find(&fpriv->contexts, args->context);
+	context = idr_find(&fpriv->legacy_contexts, args->context);
 	if (!context) {
 		err = -ENODEV;
 		goto unlock;
@@ -541,7 +538,7 @@ static int tegra_submit(struct drm_device *drm, void *data,
 
 	mutex_lock(&fpriv->lock);
 
-	context = idr_find(&fpriv->contexts, args->context);
+	context = idr_find(&fpriv->legacy_contexts, args->context);
 	if (!context) {
 		err = -ENODEV;
 		goto unlock;
@@ -566,7 +563,7 @@ static int tegra_get_syncpt_base(struct drm_device *drm, void *data,
 
 	mutex_lock(&fpriv->lock);
 
-	context = idr_find(&fpriv->contexts, args->context);
+	context = idr_find(&fpriv->legacy_contexts, args->context);
 	if (!context) {
 		err = -ENODEV;
 		goto unlock;
@@ -734,11 +731,23 @@ static int tegra_gem_get_flags(struct drm_device *drm, void *data,
 #endif
 
 static const struct drm_ioctl_desc tegra_drm_ioctls[] = {
-#ifdef CONFIG_DRM_TEGRA_STAGING
-	DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE, tegra_gem_create,
+	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_OPEN, tegra_drm_ioctl_channel_open,
+			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_CLOSE, tegra_drm_ioctl_channel_close,
+			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_MAP, tegra_drm_ioctl_channel_map,
 			  DRM_RENDER_ALLOW),
-	DRM_IOCTL_DEF_DRV(TEGRA_GEM_MMAP, tegra_gem_mmap,
+	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_UNMAP, tegra_drm_ioctl_channel_unmap,
 			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_SUBMIT, tegra_drm_ioctl_channel_submit,
+			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE, tegra_drm_ioctl_gem_create,
+			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(TEGRA_GEM_MMAP, tegra_drm_ioctl_gem_mmap,
+			  DRM_RENDER_ALLOW),
+#ifdef CONFIG_DRM_TEGRA_STAGING
+	DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE_LEGACY, tegra_gem_create, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(TEGRA_GEM_MMAP_LEGACY, tegra_gem_mmap, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(TEGRA_SYNCPT_READ, tegra_syncpt_read,
 			  DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(TEGRA_SYNCPT_INCR, tegra_syncpt_incr,
@@ -792,10 +801,11 @@ static void tegra_drm_postclose(struct drm_device *drm, struct drm_file *file)
 	struct tegra_drm_file *fpriv = file->driver_priv;
 
 	mutex_lock(&fpriv->lock);
-	idr_for_each(&fpriv->contexts, tegra_drm_context_cleanup, NULL);
+	idr_for_each(&fpriv->legacy_contexts, tegra_drm_context_cleanup, NULL);
+	tegra_drm_uapi_close_file(fpriv);
 	mutex_unlock(&fpriv->lock);
 
-	idr_destroy(&fpriv->contexts);
+	idr_destroy(&fpriv->legacy_contexts);
 	mutex_destroy(&fpriv->lock);
 	kfree(fpriv);
 }
diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
index 984925d0ad3e..fbacb0b35189 100644
--- a/drivers/gpu/drm/tegra/drm.h
+++ b/drivers/gpu/drm/tegra/drm.h
@@ -58,6 +58,11 @@ struct tegra_drm {
 	struct tegra_display_hub *hub;
 };
 
+static inline struct host1x *tegra_drm_to_host1x(struct tegra_drm *tegra)
+{
+	return dev_get_drvdata(tegra->drm->dev->parent);
+}
+
 struct tegra_drm_client;
 
 struct tegra_drm_context {
diff --git a/drivers/gpu/drm/tegra/uapi.h b/drivers/gpu/drm/tegra/uapi.h
new file mode 100644
index 000000000000..4867646670c6
--- /dev/null
+++ b/drivers/gpu/drm/tegra/uapi.h
@@ -0,0 +1,59 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2020 NVIDIA Corporation */
+
+#ifndef _TEGRA_DRM_CHANNEL_UAPI_H
+#define _TEGRA_DRM_CHANNEL_UAPI_H
+
+#include <linux/dma-mapping.h>
+#include <linux/idr.h>
+#include <linux/kref.h>
+#include <linux/xarray.h>
+
+#include <drm/drm.h>
+
+struct tegra_drm_file {
+	/* Legacy UAPI state */
+	struct idr legacy_contexts;
+	struct mutex lock;
+
+	/* New UAPI state */
+	struct xarray contexts;
+};
+
+struct tegra_drm_channel_ctx {
+	struct tegra_drm_client *client;
+	struct host1x_channel *channel;
+	struct xarray mappings;
+};
+
+struct tegra_drm_mapping {
+	struct kref ref;
+
+	struct device *dev;
+	struct host1x_bo *bo;
+	struct sg_table *sgt;
+	enum dma_data_direction direction;
+	dma_addr_t iova;
+};
+
+int tegra_drm_ioctl_channel_open(struct drm_device *drm, void *data,
+				 struct drm_file *file);
+int tegra_drm_ioctl_channel_close(struct drm_device *drm, void *data,
+				  struct drm_file *file);
+int tegra_drm_ioctl_channel_map(struct drm_device *drm, void *data,
+				struct drm_file *file);
+int tegra_drm_ioctl_channel_unmap(struct drm_device *drm, void *data,
+				struct drm_file *file);
+int tegra_drm_ioctl_channel_submit(struct drm_device *drm, void *data,
+				   struct drm_file *file);
+int tegra_drm_ioctl_gem_create(struct drm_device *drm, void *data,
+				struct drm_file *file);
+int tegra_drm_ioctl_gem_mmap(struct drm_device *drm, void *data,
+				struct drm_file *file);
+
+void tegra_drm_uapi_close_file(struct tegra_drm_file *file);
+void tegra_drm_mapping_put(struct tegra_drm_mapping *mapping);
+struct tegra_drm_channel_ctx *
+tegra_drm_channel_ctx_lock(struct tegra_drm_file *file, u32 id);
+
+#endif
diff --git a/drivers/gpu/drm/tegra/uapi/submit.c b/drivers/gpu/drm/tegra/uapi/submit.c
new file mode 100644
index 000000000000..84e1c602db3e
--- /dev/null
+++ b/drivers/gpu/drm/tegra/uapi/submit.c
@@ -0,0 +1,687 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2020 NVIDIA Corporation */
+
+#include <linux/dma-fence-array.h>
+#include <linux/file.h>
+#include <linux/host1x.h>
+#include <linux/iommu.h>
+#include <linux/kref.h>
+#include <linux/list.h>
+#include <linux/nospec.h>
+#include <linux/sync_file.h>
+
+#include <drm/drm_drv.h>
+#include <drm/drm_file.h>
+
+#include "../uapi.h"
+#include "../drm.h"
+#include "../gem.h"
+
+static struct tegra_drm_mapping *
+tegra_drm_mapping_get(struct tegra_drm_channel_ctx *ctx, u32 id)
+{
+	struct tegra_drm_mapping *mapping;
+
+	xa_lock(&ctx->mappings);
+	mapping = xa_load(&ctx->mappings, id);
+	if (mapping)
+		kref_get(&mapping->ref);
+	xa_unlock(&ctx->mappings);
+
+	return mapping;
+}
+
+struct gather_bo {
+	struct host1x_bo base;
+
+	struct kref ref;
+
+	u32 *gather_data;
+	size_t gather_data_len;
+};
+
+static struct host1x_bo *gather_bo_get(struct host1x_bo *host_bo)
+{
+	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
+
+	kref_get(&bo->ref);
+
+	return host_bo;
+}
+
+static void gather_bo_release(struct kref *ref)
+{
+	struct gather_bo *bo = container_of(ref, struct gather_bo, ref);
+
+	kfree(bo->gather_data);
+	kfree(bo);
+}
+
+static void gather_bo_put(struct host1x_bo *host_bo)
+{
+	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
+
+	kref_put(&bo->ref, gather_bo_release);
+}
+
+static struct sg_table *
+gather_bo_pin(struct device *dev, struct host1x_bo *host_bo, dma_addr_t *phys)
+{
+	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
+	struct sg_table *sgt;
+	int err;
+
+	if (phys) {
+		*phys = virt_to_phys(bo->gather_data);
+		return NULL;
+	}
+
+	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
+	if (!sgt)
+		return ERR_PTR(-ENOMEM);
+
+	err = sg_alloc_table(sgt, 1, GFP_KERNEL);
+	if (err) {
+		kfree(sgt);
+		return ERR_PTR(err);
+	}
+
+	sg_init_one(sgt->sgl, bo->gather_data, bo->gather_data_len * 4);
+
+	return sgt;
+}
+
+static void gather_bo_unpin(struct device *dev, struct sg_table *sgt)
+{
+	if (sgt) {
+		sg_free_table(sgt);
+		kfree(sgt);
+	}
+}
+
+static void *gather_bo_mmap(struct host1x_bo *host_bo)
+{
+	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
+
+	return bo->gather_data;
+}
+
+static void gather_bo_munmap(struct host1x_bo *host_bo, void *addr)
+{
+}
+
+static const struct host1x_bo_ops gather_bo_ops = {
+	.get = gather_bo_get,
+	.put = gather_bo_put,
+	.pin = gather_bo_pin,
+	.unpin = gather_bo_unpin,
+	.mmap = gather_bo_mmap,
+	.munmap = gather_bo_munmap,
+};
+
+struct tegra_drm_used_mapping {
+	struct tegra_drm_mapping *mapping;
+	u32 flags;
+};
+
+struct tegra_drm_job_data {
+	struct tegra_drm_used_mapping *used_mappings;
+	u32 num_used_mappings;
+};
+
+static int submit_copy_gather_data(struct drm_device *drm,
+				   struct gather_bo **pbo,
+				   struct drm_tegra_channel_submit *args)
+{
+	unsigned long copy_err;
+	struct gather_bo *bo;
+
+	if (args->gather_data_words == 0) {
+		drm_info(drm, "gather_data_words can't be 0");
+		return -EINVAL;
+	}
+	if (args->gather_data_words > 1024) {
+		drm_info(drm, "gather_data_words can't be over 1024");
+		return -E2BIG;
+	}
+
+	bo = kzalloc(sizeof(*bo), GFP_KERNEL);
+	if (!bo)
+		return -ENOMEM;
+
+	kref_init(&bo->ref);
+	host1x_bo_init(&bo->base, &gather_bo_ops);
+
+	bo->gather_data =
+		kmalloc(args->gather_data_words*4, GFP_KERNEL | __GFP_NOWARN);
+	if (!bo->gather_data) {
+		kfree(bo);
+		return -ENOMEM;
+	}
+
+	copy_err = copy_from_user(bo->gather_data,
+				  u64_to_user_ptr(args->gather_data_ptr),
+				  args->gather_data_words*4);
+	if (copy_err) {
+		kfree(bo->gather_data);
+		kfree(bo);
+		return -EFAULT;
+	}
+
+	bo->gather_data_len = args->gather_data_words;
+
+	*pbo = bo;
+
+	return 0;
+}
+
+static int submit_write_reloc(struct gather_bo *bo,
+			      struct drm_tegra_submit_buf *buf,
+			      struct tegra_drm_mapping *mapping)
+{
+	/* TODO check that target_offset is within bounds */
+	dma_addr_t iova = mapping->iova + buf->reloc.target_offset;
+	u32 written_ptr;
+
+	if (buf->flags & DRM_TEGRA_SUBMIT_BUF_RELOC_BLOCKLINEAR)
+		iova |= BIT_ULL(39);
+	written_ptr = (u32)(iova >> buf->reloc.shift);
+	if (buf->reloc.gather_offset_words >= bo->gather_data_len)
+		return -EINVAL;
+
+	buf->reloc.gather_offset_words = array_index_nospec(
+		buf->reloc.gather_offset_words, bo->gather_data_len);
+
+	bo->gather_data[buf->reloc.gather_offset_words] = written_ptr;
+
+	return 0;
+}
+
+static void submit_unlock_resv(struct tegra_drm_job_data *job_data,
+			       struct ww_acquire_ctx *acquire_ctx)
+{
+	u32 i;
+
+	for (i = 0; i < job_data->num_used_mappings; i++) {
+		struct tegra_bo *bo = host1x_to_tegra_bo(
+			job_data->used_mappings[i].mapping->bo);
+
+		dma_resv_unlock(bo->gem.resv);
+	}
+
+	ww_acquire_fini(acquire_ctx);
+}
+
+static int submit_handle_resv(struct tegra_drm_job_data *job_data,
+			      struct ww_acquire_ctx *acquire_ctx)
+{
+	int contended = -1;
+	int err;
+	u32 i;
+
+	/* Based on drm_gem_lock_reservations */
+
+	ww_acquire_init(acquire_ctx, &reservation_ww_class);
+
+retry:
+	if (contended != -1) {
+		struct tegra_bo *bo = host1x_to_tegra_bo(
+			job_data->used_mappings[contended].mapping->bo);
+
+		err = dma_resv_lock_slow_interruptible(bo->gem.resv,
+						       acquire_ctx);
+		if (err) {
+			ww_acquire_done(acquire_ctx);
+			return err;
+		}
+	}
+
+	for (i = 0; i < job_data->num_used_mappings; i++) {
+		struct tegra_bo *bo = host1x_to_tegra_bo(
+			job_data->used_mappings[i].mapping->bo);
+
+		if (i == contended)
+			continue;
+
+		err = dma_resv_lock_interruptible(bo->gem.resv, acquire_ctx);
+		if (err) {
+			int j;
+
+			for (j = 0; j < i; j++) {
+				bo = host1x_to_tegra_bo(
+					job_data->used_mappings[j].mapping->bo);
+				dma_resv_unlock(bo->gem.resv);
+			}
+
+			if (contended != -1 && contended >= i) {
+				bo = host1x_to_tegra_bo(
+					job_data->used_mappings[contended].mapping->bo);
+				dma_resv_unlock(bo->gem.resv);
+			}
+
+			if (err == -EDEADLK) {
+				contended = i;
+				goto retry;
+			}
+
+			ww_acquire_done(acquire_ctx);
+			return err;
+		}
+	}
+
+	ww_acquire_done(acquire_ctx);
+
+	for (i = 0; i < job_data->num_used_mappings; i++) {
+		struct tegra_drm_used_mapping *um = &job_data->used_mappings[i];
+		struct tegra_bo *bo = host1x_to_tegra_bo(
+			job_data->used_mappings[i].mapping->bo);
+
+		if (um->flags & DRM_TEGRA_SUBMIT_BUF_RESV_READ) {
+			err = dma_resv_reserve_shared(bo->gem.resv, 1);
+			if (err < 0)
+				goto unlock_resv;
+		}
+	}
+
+	return 0;
+
+unlock_resv:
+	submit_unlock_resv(job_data, acquire_ctx);
+
+	return err;
+}
+
+static int submit_process_bufs(struct drm_device *drm, struct gather_bo *bo,
+			       struct tegra_drm_job_data *job_data,
+			       struct tegra_drm_channel_ctx *ctx,
+			       struct drm_tegra_channel_submit *args,
+			       struct ww_acquire_ctx *acquire_ctx)
+{
+	struct drm_tegra_submit_buf __user *user_bufs_ptr =
+		u64_to_user_ptr(args->bufs_ptr);
+	struct tegra_drm_mapping *mapping;
+	struct drm_tegra_submit_buf buf;
+	unsigned long copy_err;
+	int err;
+	u32 i;
+
+	job_data->used_mappings =
+		kcalloc(args->num_bufs, sizeof(*job_data->used_mappings), GFP_KERNEL);
+	if (!job_data->used_mappings)
+		return -ENOMEM;
+
+	for (i = 0; i < args->num_bufs; i++) {
+		copy_err = copy_from_user(&buf, user_bufs_ptr+i, sizeof(buf));
+		if (copy_err) {
+			err = -EFAULT;
+			goto drop_refs;
+		}
+
+		if (buf.flags & ~(DRM_TEGRA_SUBMIT_BUF_WRITE_RELOC |
+				  DRM_TEGRA_SUBMIT_BUF_RELOC_BLOCKLINEAR |
+				  DRM_TEGRA_SUBMIT_BUF_RESV_READ |
+				  DRM_TEGRA_SUBMIT_BUF_RESV_WRITE)) {
+			err = -EINVAL;
+			goto drop_refs;
+		}
+
+		if (buf.reserved[0] || buf.reserved[1]) {
+			err = -EINVAL;
+			goto drop_refs;
+		}
+
+		mapping = tegra_drm_mapping_get(ctx, buf.mapping_id);
+		if (!mapping) {
+			drm_info(drm, "invalid mapping_id for buf: %u",
+				 buf.mapping_id);
+			err = -EINVAL;
+			goto drop_refs;
+		}
+
+		if (buf.flags & DRM_TEGRA_SUBMIT_BUF_WRITE_RELOC) {
+			err = submit_write_reloc(bo, &buf, mapping);
+			if (err) {
+				tegra_drm_mapping_put(mapping);
+				goto drop_refs;
+			}
+		}
+
+		job_data->used_mappings[i].mapping = mapping;
+		job_data->used_mappings[i].flags = buf.flags;
+	}
+
+	return 0;
+
+drop_refs:
+	for (;;) {
+		if (i-- == 0)
+			break;
+
+		tegra_drm_mapping_put(job_data->used_mappings[i].mapping);
+	}
+
+	kfree(job_data->used_mappings);
+	job_data->used_mappings = NULL;
+
+	return err;
+}
+
+static int submit_create_job(struct drm_device *drm, struct host1x_job **pjob,
+			     struct gather_bo *bo,
+			     struct tegra_drm_channel_ctx *ctx,
+			     struct drm_tegra_channel_submit *args,
+			     struct drm_file *file)
+{
+	struct drm_tegra_submit_cmd __user *user_cmds_ptr =
+		u64_to_user_ptr(args->cmds_ptr);
+	struct drm_tegra_submit_cmd cmd;
+	struct host1x_job *job;
+	unsigned long copy_err;
+	u32 i, gather_offset = 0;
+	int err = 0;
+
+	job = host1x_job_alloc(ctx->channel, args->num_cmds, 0);
+	if (!job)
+		return -ENOMEM;
+
+	job->client = &ctx->client->base;
+	job->class = ctx->client->base.class;
+	job->serialize = true;
+
+	for (i = 0; i < args->num_cmds; i++) {
+		copy_err = copy_from_user(&cmd, user_cmds_ptr+i, sizeof(cmd));
+		if (copy_err) {
+			err = -EFAULT;
+			goto free_job;
+		}
+
+		if (cmd.type == DRM_TEGRA_SUBMIT_CMD_GATHER_UPTR) {
+			if (cmd.gather_uptr.reserved[0] ||
+			    cmd.gather_uptr.reserved[1] ||
+			    cmd.gather_uptr.reserved[2]) {
+				err = -EINVAL;
+				goto free_job;
+			}
+
+			/* Check for maximum gather size */
+			if (cmd.gather_uptr.words > 16383) {
+				err = -EINVAL;
+				goto free_job;
+			}
+
+			host1x_job_add_gather(job, &bo->base,
+					      cmd.gather_uptr.words,
+					      gather_offset*4);
+
+			gather_offset += cmd.gather_uptr.words;
+
+			if (gather_offset > bo->gather_data_len) {
+				err = -EINVAL;
+				goto free_job;
+			}
+		} else if (cmd.type == DRM_TEGRA_SUBMIT_CMD_WAIT_SYNCPT) {
+			if (cmd.wait_syncpt.reserved[0] ||
+			    cmd.wait_syncpt.reserved[1]) {
+				err = -EINVAL;
+				goto free_job;
+			}
+
+			host1x_job_add_wait(job, cmd.wait_syncpt.id,
+					    cmd.wait_syncpt.threshold);
+		} else if (cmd.type == DRM_TEGRA_SUBMIT_CMD_WAIT_SYNC_FILE) {
+			struct dma_fence *f;
+
+			if (cmd.wait_sync_file.reserved[0] ||
+			    cmd.wait_sync_file.reserved[1] ||
+			    cmd.wait_sync_file.reserved[2]) {
+				err = -EINVAL;
+				goto free_job;
+			}
+
+			f = sync_file_get_fence(cmd.wait_sync_file.fd);
+			if (!f) {
+				err = -EINVAL;
+				goto free_job;
+			}
+
+			err = dma_fence_wait(f, true);
+			dma_fence_put(f);
+
+			if (err)
+				goto free_job;
+		} else {
+			err = -EINVAL;
+			goto free_job;
+		}
+	}
+
+	if (gather_offset == 0) {
+		drm_info(drm, "job must have at least one gather");
+		err = -EINVAL;
+		goto free_job;
+	}
+
+	*pjob = job;
+
+	return 0;
+
+free_job:
+	host1x_job_put(job);
+
+	return err;
+}
+
+static int submit_handle_syncpts(struct drm_device *drm, struct host1x_job *job,
+				 struct drm_tegra_submit_syncpt_incr *incr,
+				 struct drm_tegra_channel_submit *args)
+{
+	struct drm_tegra_submit_syncpt_incr __user *user_incrs_ptr =
+		u64_to_user_ptr(args->syncpt_incrs_ptr);
+	struct host1x_syncpt *sp;
+	unsigned long copy_err;
+
+	if (args->num_syncpt_incrs != 1) {
+		drm_info(drm, "Only 1 syncpoint supported for now");
+		return -EINVAL;
+	}
+
+	copy_err = copy_from_user(incr, user_incrs_ptr, sizeof(*incr));
+	if (copy_err)
+		return -EFAULT;
+
+	if ((incr->flags & ~DRM_TEGRA_SUBMIT_SYNCPT_INCR_CREATE_SYNC_FILE) ||
+	    incr->reserved[0] || incr->reserved[1] || incr->reserved[2])
+		return -EINVAL;
+
+	/* Syncpt ref will be dropped on job release */
+	sp = host1x_syncpt_fd_get(incr->syncpt_fd);
+	if (IS_ERR(sp))
+		return PTR_ERR(sp);
+
+	job->syncpt = sp;
+	job->syncpt_incrs = incr->num_incrs;
+
+	return 0;
+}
+
+static int submit_create_postfences(struct host1x_job *job,
+				    struct drm_tegra_submit_syncpt_incr *incr,
+				    struct drm_tegra_channel_submit *args)
+{
+	struct tegra_drm_job_data *job_data = job->user_data;
+	struct dma_fence *fence;
+	int err = 0;
+	u32 i;
+
+	fence = host1x_fence_create(job->syncpt, job->syncpt_end);
+	if (IS_ERR(fence))
+		return PTR_ERR(fence);
+
+	incr->fence_value = job->syncpt_end;
+
+	for (i = 0; i < job_data->num_used_mappings; i++) {
+		struct tegra_drm_used_mapping *um = &job_data->used_mappings[i];
+		struct tegra_bo *bo = host1x_to_tegra_bo(um->mapping->bo);
+
+		if (um->flags & DRM_TEGRA_SUBMIT_BUF_RESV_READ)
+			dma_resv_add_shared_fence(bo->gem.resv, fence);
+
+		if (um->flags & DRM_TEGRA_SUBMIT_BUF_RESV_WRITE)
+			dma_resv_add_excl_fence(bo->gem.resv, fence);
+	}
+
+	if (incr->flags & DRM_TEGRA_SUBMIT_SYNCPT_INCR_CREATE_SYNC_FILE) {
+		struct sync_file *sf;
+
+		err = get_unused_fd_flags(O_CLOEXEC);
+		if (err < 0)
+			goto put_fence;
+
+		sf = sync_file_create(fence);
+		if (!sf) {
+			put_unused_fd(err);
+			err = -ENOMEM;
+			goto put_fence;
+		}
+		fd_install(err, sf->file);
+		incr->sync_file_fd = err;
+		err = 0;
+	}
+
+put_fence:
+	dma_fence_put(fence);
+
+	return err;
+}
+
+static int submit_copy_postfences(struct drm_tegra_submit_syncpt_incr *incr,
+				  struct drm_tegra_channel_submit *args)
+{
+	unsigned long copy_err;
+
+	struct drm_tegra_submit_syncpt_incr __user *user_incrs_ptr =
+		u64_to_user_ptr(args->syncpt_incrs_ptr);
+
+	copy_err = copy_to_user(user_incrs_ptr, incr, sizeof(*incr));
+	if (copy_err)
+		return -EFAULT;
+
+	return 0;
+}
+
+static void release_job(struct host1x_job *job)
+{
+	struct tegra_drm_client *client =
+		container_of(job->client, struct tegra_drm_client, base);
+	struct tegra_drm_job_data *job_data = job->user_data;
+	u32 i;
+
+	for (i = 0; i < job_data->num_used_mappings; i++)
+		tegra_drm_mapping_put(job_data->used_mappings[i].mapping);
+
+	kfree(job_data->used_mappings);
+	kfree(job_data);
+
+	if (client->ops->power_off)
+		client->ops->power_off(client);
+}
+
+int tegra_drm_ioctl_channel_submit(struct drm_device *drm, void *data,
+				   struct drm_file *file)
+{
+	struct tegra_drm_file *fpriv = file->driver_priv;
+	struct drm_tegra_channel_submit *args = data;
+	struct drm_tegra_submit_syncpt_incr incr;
+	struct tegra_drm_job_data *job_data;
+	struct ww_acquire_ctx acquire_ctx;
+	struct tegra_drm_channel_ctx *ctx;
+	struct host1x_job *job;
+	struct gather_bo *bo;
+	u32 i;
+	int err;
+
+	if (args->reserved[0] || args->reserved[1] || args->reserved[2] ||
+	    args->reserved[3])
+		return -EINVAL;
+
+	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
+	if (!ctx)
+		return -EINVAL;
+
+	err = submit_copy_gather_data(drm, &bo, args);
+	if (err)
+		goto unlock;
+
+	job_data = kzalloc(sizeof(*job_data), GFP_KERNEL);
+	if (!job_data) {
+		err = -ENOMEM;
+		goto put_bo;
+	}
+
+	err = submit_process_bufs(drm, bo, job_data, ctx, args, &acquire_ctx);
+	if (err)
+		goto free_job_data;
+
+	err = submit_create_job(drm, &job, bo, ctx, args, file);
+	if (err)
+		goto free_job_data;
+
+	err = submit_handle_syncpts(drm, job, &incr, args);
+	if (err)
+		goto put_job;
+
+	err = host1x_job_pin(job, ctx->client->base.dev);
+	if (err)
+		goto put_job;
+
+	if (ctx->client->ops->power_on)
+		ctx->client->ops->power_on(ctx->client);
+	job->user_data = job_data;
+	job->release = release_job;
+	job->timeout = min(args->timeout_us / 1000, 10000U);
+	if (job->timeout == 0)
+		job->timeout = 1;
+
+	/*
+	 * job_data is now part of job reference counting, so don't release
+	 * it from here.
+	 */
+	job_data = NULL;
+
+	err = submit_handle_resv(job->user_data, &acquire_ctx);
+	if (err)
+		goto unpin_job;
+
+	err = host1x_job_submit(job);
+	if (err)
+		goto unlock_resv;
+
+	err = submit_create_postfences(job, &incr, args);
+
+	submit_unlock_resv(job->user_data, &acquire_ctx);
+
+	if (err == 0)
+		err = submit_copy_postfences(&incr, args);
+
+	goto put_job;
+
+unlock_resv:
+	submit_unlock_resv(job->user_data, &acquire_ctx);
+unpin_job:
+	host1x_job_unpin(job);
+put_job:
+	host1x_job_put(job);
+free_job_data:
+	if (job_data && job_data->used_mappings) {
+		for (i = 0; i < job_data->num_used_mappings; i++)
+			tegra_drm_mapping_put(job_data->used_mappings[i].mapping);
+		kfree(job_data->used_mappings);
+	}
+	if (job_data)
+		kfree(job_data);
+put_bo:
+	kref_put(&bo->ref, gather_bo_release);
+unlock:
+	mutex_unlock(&fpriv->lock);
+	return err;
+}
diff --git a/drivers/gpu/drm/tegra/uapi/uapi.c b/drivers/gpu/drm/tegra/uapi/uapi.c
new file mode 100644
index 000000000000..287b83105b09
--- /dev/null
+++ b/drivers/gpu/drm/tegra/uapi/uapi.c
@@ -0,0 +1,328 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2020 NVIDIA Corporation */
+
+#include <linux/host1x.h>
+#include <linux/iommu.h>
+#include <linux/list.h>
+
+#include <drm/drm_drv.h>
+#include <drm/drm_file.h>
+
+#include "../uapi.h"
+#include "../drm.h"
+
+struct tegra_drm_channel_ctx *
+tegra_drm_channel_ctx_lock(struct tegra_drm_file *file, u32 id)
+{
+	struct tegra_drm_channel_ctx *ctx;
+
+	mutex_lock(&file->lock);
+	ctx = xa_load(&file->contexts, id);
+	if (!ctx)
+		mutex_unlock(&file->lock);
+
+	return ctx;
+}
+
+static void tegra_drm_mapping_release(struct kref *ref)
+{
+	struct tegra_drm_mapping *mapping =
+		container_of(ref, struct tegra_drm_mapping, ref);
+
+	if (mapping->sgt)
+		dma_unmap_sgtable(mapping->dev, mapping->sgt,
+				  mapping->direction, DMA_ATTR_SKIP_CPU_SYNC);
+
+	host1x_bo_unpin(mapping->dev, mapping->bo, mapping->sgt);
+	host1x_bo_put(mapping->bo);
+
+	kfree(mapping);
+}
+
+void tegra_drm_mapping_put(struct tegra_drm_mapping *mapping)
+{
+	kref_put(&mapping->ref, tegra_drm_mapping_release);
+}
+
+static void tegra_drm_channel_ctx_close(struct tegra_drm_channel_ctx *ctx)
+{
+	unsigned long mapping_id;
+	struct tegra_drm_mapping *mapping;
+
+	xa_for_each(&ctx->mappings, mapping_id, mapping)
+		tegra_drm_mapping_put(mapping);
+
+	xa_destroy(&ctx->mappings);
+
+	host1x_channel_put(ctx->channel);
+
+	kfree(ctx);
+}
+
+int close_channel_ctx(int id, void *p, void *data)
+{
+	struct tegra_drm_channel_ctx *ctx = p;
+
+	tegra_drm_channel_ctx_close(ctx);
+
+	return 0;
+}
+
+void tegra_drm_uapi_close_file(struct tegra_drm_file *file)
+{
+	unsigned long ctx_id;
+	struct tegra_drm_channel_ctx *ctx;
+
+	xa_for_each(&file->contexts, ctx_id, ctx)
+		tegra_drm_channel_ctx_close(ctx);
+
+	xa_destroy(&file->contexts);
+}
+
+int tegra_drm_ioctl_channel_open(struct drm_device *drm, void *data,
+				 struct drm_file *file)
+{
+	struct tegra_drm_file *fpriv = file->driver_priv;
+	struct tegra_drm *tegra = drm->dev_private;
+	struct drm_tegra_channel_open *args = data;
+	struct tegra_drm_client *client = NULL;
+	struct tegra_drm_channel_ctx *ctx;
+	int err;
+
+	if (args->flags || args->reserved[0] || args->reserved[1])
+		return -EINVAL;
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	err = -ENODEV;
+	list_for_each_entry(client, &tegra->clients, list) {
+		if (client->base.class == args->host1x_class) {
+			err = 0;
+			break;
+		}
+	}
+	if (err)
+		goto free_ctx;
+
+	if (client->shared_channel) {
+		ctx->channel = host1x_channel_get(client->shared_channel);
+	} else {
+		ctx->channel = host1x_channel_request(&client->base);
+		if (!ctx->channel) {
+			err = -EBUSY;
+			goto free_ctx;
+		}
+	}
+
+	err = xa_alloc(&fpriv->contexts, &args->channel_ctx, ctx,
+		       XA_LIMIT(1, U32_MAX), GFP_KERNEL);
+	if (err < 0) {
+		/* fpriv->lock is not held here; nothing to unlock */
+		goto put_channel;
+	}
+
+	ctx->client = client;
+	xa_init_flags(&ctx->mappings, XA_FLAGS_ALLOC);
+
+	args->hardware_version = client->version;
+
+	return 0;
+
+put_channel:
+	host1x_channel_put(ctx->channel);
+free_ctx:
+	kfree(ctx);
+
+	return err;
+}
+
+int tegra_drm_ioctl_channel_close(struct drm_device *drm, void *data,
+				  struct drm_file *file)
+{
+	struct tegra_drm_file *fpriv = file->driver_priv;
+	struct drm_tegra_channel_close *args = data;
+	struct tegra_drm_channel_ctx *ctx;
+
+	if (args->reserved[0])
+		return -EINVAL;
+
+	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
+	if (!ctx)
+		return -EINVAL;
+
+	xa_erase(&fpriv->contexts, args->channel_ctx);
+
+	mutex_unlock(&fpriv->lock);
+
+	tegra_drm_channel_ctx_close(ctx);
+
+	return 0;
+}
+
+int tegra_drm_ioctl_channel_map(struct drm_device *drm, void *data,
+				struct drm_file *file)
+{
+	struct tegra_drm_file *fpriv = file->driver_priv;
+	struct drm_tegra_channel_map *args = data;
+	struct tegra_drm_channel_ctx *ctx;
+	struct tegra_drm_mapping *mapping;
+	struct drm_gem_object *gem;
+	u32 mapping_id;
+	int err = 0;
+
+	if (args->flags & ~DRM_TEGRA_CHANNEL_MAP_READWRITE)
+		return -EINVAL;
+	if (args->reserved[0] || args->reserved[1])
+		return -EINVAL;
+
+	if (!IS_ALIGNED(args->offset, 0x1000) ||
+	    !IS_ALIGNED(args->length, 0x1000))
+		return -EINVAL;
+
+	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
+	if (!ctx)
+		return -EINVAL;
+
+	mapping = kzalloc(sizeof(*mapping), GFP_KERNEL);
+	if (!mapping) {
+		err = -ENOMEM;
+		goto unlock;
+	}
+
+	kref_init(&mapping->ref);
+
+	gem = drm_gem_object_lookup(file, args->handle);
+	if (!gem) {
+		err = -EINVAL;
+		goto unlock;
+	}
+
+	if (args->offset >= gem->size || args->length > gem->size ||
+	    args->offset > gem->size - args->length) {
+		err = -EINVAL;
+		goto put_gem;
+	}
+
+	mapping->dev = ctx->client->base.dev;
+	mapping->bo = &container_of(gem, struct tegra_bo, gem)->base;
+
+	if (!iommu_get_domain_for_dev(mapping->dev) ||
+	    ctx->client->base.group) {
+		host1x_bo_pin(mapping->dev, mapping->bo,
+			      &mapping->iova);
+	} else {
+		mapping->direction = DMA_TO_DEVICE;
+		if (args->flags & DRM_TEGRA_CHANNEL_MAP_READWRITE)
+			mapping->direction = DMA_BIDIRECTIONAL;
+
+		mapping->sgt =
+			host1x_bo_pin(mapping->dev, mapping->bo, NULL);
+		if (IS_ERR(mapping->sgt)) {
+			err = PTR_ERR(mapping->sgt);
+			goto put_gem;
+		}
+
+		err = dma_map_sgtable(mapping->dev, mapping->sgt,
+				      mapping->direction,
+				      DMA_ATTR_SKIP_CPU_SYNC);
+		if (err)
+			goto unpin;
+
+		/* TODO only map the requested part */
+		mapping->iova =
+			sg_dma_address(mapping->sgt->sgl) + args->offset;
+	}
+
+	err = xa_alloc(&ctx->mappings, &mapping_id, mapping,
+		       XA_LIMIT(1, U32_MAX), GFP_KERNEL);
+	if (err < 0)
+		goto unmap;
+
+	mutex_unlock(&fpriv->lock);
+
+	/* TODO: if appropriate, return actual IOVA */
+	args->iova = U64_MAX;
+	args->mapping_id = mapping_id;
+
+	return 0;
+
+unmap:
+	if (mapping->sgt) {
+		dma_unmap_sgtable(mapping->dev, mapping->sgt,
+				  mapping->direction, DMA_ATTR_SKIP_CPU_SYNC);
+	}
+unpin:
+	host1x_bo_unpin(mapping->dev, mapping->bo, mapping->sgt);
+put_gem:
+	drm_gem_object_put(gem);
+	kfree(mapping);
+unlock:
+	mutex_unlock(&fpriv->lock);
+	return err;
+}
+
+int tegra_drm_ioctl_channel_unmap(struct drm_device *drm, void *data,
+				  struct drm_file *file)
+{
+	struct tegra_drm_file *fpriv = file->driver_priv;
+	struct drm_tegra_channel_unmap *args = data;
+	struct tegra_drm_channel_ctx *ctx;
+	struct tegra_drm_mapping *mapping;
+
+	if (args->reserved[0] || args->reserved[1])
+		return -EINVAL;
+
+	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
+	if (!ctx)
+		return -EINVAL;
+
+	mapping = xa_erase(&ctx->mappings, args->mapping_id);
+
+	mutex_unlock(&fpriv->lock);
+
+	if (mapping) {
+		tegra_drm_mapping_put(mapping);
+		return 0;
+	} else {
+		return -EINVAL;
+	}
+}
+
+int tegra_drm_ioctl_gem_create(struct drm_device *drm, void *data,
+			       struct drm_file *file)
+{
+	struct drm_tegra_gem_create *args = data;
+	struct tegra_bo *bo;
+
+	if (args->flags)
+		return -EINVAL;
+
+	bo = tegra_bo_create_with_handle(file, drm, args->size, args->flags,
+					 &args->handle);
+	if (IS_ERR(bo))
+		return PTR_ERR(bo);
+
+	return 0;
+}
+
+int tegra_drm_ioctl_gem_mmap(struct drm_device *drm, void *data,
+			     struct drm_file *file)
+{
+	struct drm_tegra_gem_mmap *args = data;
+	struct drm_gem_object *gem;
+	struct tegra_bo *bo;
+
+	gem = drm_gem_object_lookup(file, args->handle);
+	if (!gem)
+		return -EINVAL;
+
+	bo = to_tegra_bo(gem);
+
+	args->offset = drm_vma_node_offset_addr(&bo->gem.vma_node);
+
+	drm_gem_object_put(gem);
+
+	return 0;
+}
-- 
2.28.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 06/17] gpu: host1x: Cleanup and refcounting for syncpoints
  2020-09-05 10:34   ` Mikko Perttunen
@ 2020-09-05 14:30     ` Dmitry Osipenko
  -1 siblings, 0 replies; 128+ messages in thread
From: Dmitry Osipenko @ 2020-09-05 14:30 UTC (permalink / raw)
  To: Mikko Perttunen, thierry.reding, jonathanh, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

05.09.2020 13:34, Mikko Perttunen wrote:
...
> +
> +/**
> + * host1x_syncpt_put() - free a requested syncpoint
> + * @sp: host1x syncpoint
> + *
> + * Release a syncpoint previously allocated using host1x_syncpt_request(). A
> + * host1x client driver should call this when the syncpoint is no longer in
> + * use.
> + */
> +void host1x_syncpt_put(struct host1x_syncpt *sp)
> +{
> +	if (!sp)
> +		return;
> +
> +	kref_put(&sp->ref, syncpt_release);
> +}
> +EXPORT_SYMBOL(host1x_syncpt_put);
>  
>  void host1x_syncpt_deinit(struct host1x *host)
>  {
> @@ -471,16 +478,48 @@ unsigned int host1x_syncpt_nb_mlocks(struct host1x *host)
>  }
>  
>  /**
> - * host1x_syncpt_get() - obtain a syncpoint by ID
> + * host1x_syncpt_get_by_id() - obtain a syncpoint by ID
> + * @host: host1x controller
> + * @id: syncpoint ID
> + */
> +struct host1x_syncpt *host1x_syncpt_get_by_id(struct host1x *host,
> +					      unsigned int id)
> +{
> +	if (id >= host->info->nb_pts)
> +		return NULL;
> +
> +	if (kref_get_unless_zero(&host->syncpt[id].ref))
> +		return &host->syncpt[id];
> +	else
> +		return NULL;
> +}
> +EXPORT_SYMBOL(host1x_syncpt_get_by_id);
> +
> +/**
> + * host1x_syncpt_get_by_id_noref() - obtain a syncpoint by ID but don't
> + * 	increase the refcount.
>   * @host: host1x controller
>   * @id: syncpoint ID
>   */
> -struct host1x_syncpt *host1x_syncpt_get(struct host1x *host, unsigned int id)
> +struct host1x_syncpt *host1x_syncpt_get_by_id_noref(struct host1x *host,
> +						    unsigned int id)
>  {
>  	if (id >= host->info->nb_pts)
>  		return NULL;
>  
> -	return host->syncpt + id;
> +	return &host->syncpt[id];
> +}
> +EXPORT_SYMBOL(host1x_syncpt_get_by_id_noref);
> +
> +/**
> + * host1x_syncpt_get() - increment syncpoint refcount
> + * @sp: syncpoint
> + */
> +struct host1x_syncpt *host1x_syncpt_get(struct host1x_syncpt *sp)
> +{
> +	kref_get(&sp->ref);
> +
> +	return sp;
>  }
>  EXPORT_SYMBOL(host1x_syncpt_get);

Hello, Mikko!

What do you think about open-coding all the host1x structs by moving
them into the public linux/host1x.h? Then we could inline all these
trivial single-line functions by defining them in the public header.
This would avoid the unnecessary function-call overhead and let the
compiler optimize the code nicely.

Of course this could be a separate change, and it could be done sometime
later. I just wanted to share this quick thought at the start of the
review.
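
As a rough illustration of the idea, here is a minimal userspace sketch,
not the host1x code. It assumes a hypothetical struct demo_syncpt that
is visible in a header, with a C11 atomic counter standing in for
struct kref. Once the struct layout is public, the one-line get/put
helpers can be static inline instead of exported symbols. All demo_*
names are made up for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a struct exposed through a public header. */
struct demo_syncpt {
	unsigned int id;
	atomic_uint refcount;	/* userspace substitute for struct kref */
};

/* Unconditionally take a reference; the caller must already hold one. */
static inline struct demo_syncpt *demo_syncpt_get(struct demo_syncpt *sp)
{
	atomic_fetch_add(&sp->refcount, 1);
	return sp;
}

/* Take a reference only while the count is non-zero. */
static inline struct demo_syncpt *
demo_syncpt_get_unless_zero(struct demo_syncpt *sp)
{
	unsigned int old = atomic_load(&sp->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&sp->refcount, &old, old + 1))
			return sp;
	}

	return NULL;
}

/* Drop a reference; returns 1 when this was the last one. */
static inline int demo_syncpt_put(struct demo_syncpt *sp)
{
	unsigned int old = atomic_fetch_sub(&sp->refcount, 1);

	assert(old > 0);	/* catch refcount underflow in this demo */
	return old == 1;
}
```

The trade-off is that the struct layout becomes part of the interface
seen by every client driver, so this only pays off if the call overhead
is actually measurable.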

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 06/17] gpu: host1x: Cleanup and refcounting for syncpoints
  2020-09-05 14:30     ` Dmitry Osipenko
@ 2020-09-05 14:53       ` Mikko Perttunen
  -1 siblings, 0 replies; 128+ messages in thread
From: Mikko Perttunen @ 2020-09-05 14:53 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

On 9/5/20 5:30 PM, Dmitry Osipenko wrote:
> 05.09.2020 13:34, Mikko Perttunen wrote:
> ...
>> +
>> +/**
>> + * host1x_syncpt_put() - free a requested syncpoint
>> + * @sp: host1x syncpoint
>> + *
>> + * Release a syncpoint previously allocated using host1x_syncpt_request(). A
>> + * host1x client driver should call this when the syncpoint is no longer in
>> + * use.
>> + */
>> +void host1x_syncpt_put(struct host1x_syncpt *sp)
>> +{
>> +	if (!sp)
>> +		return;
>> +
>> +	kref_put(&sp->ref, syncpt_release);
>> +}
>> +EXPORT_SYMBOL(host1x_syncpt_put);
>>   
>>   void host1x_syncpt_deinit(struct host1x *host)
>>   {
>> @@ -471,16 +478,48 @@ unsigned int host1x_syncpt_nb_mlocks(struct host1x *host)
>>   }
>>   
>>   /**
>> - * host1x_syncpt_get() - obtain a syncpoint by ID
>> + * host1x_syncpt_get_by_id() - obtain a syncpoint by ID
>> + * @host: host1x controller
>> + * @id: syncpoint ID
>> + */
>> +struct host1x_syncpt *host1x_syncpt_get_by_id(struct host1x *host,
>> +					      unsigned int id)
>> +{
>> +	if (id >= host->info->nb_pts)
>> +		return NULL;
>> +
>> +	if (kref_get_unless_zero(&host->syncpt[id].ref))
>> +		return &host->syncpt[id];
>> +	else
>> +		return NULL;
>> +}
>> +EXPORT_SYMBOL(host1x_syncpt_get_by_id);
>> +
>> +/**
>> + * host1x_syncpt_get_by_id_noref() - obtain a syncpoint by ID but don't
>> + * 	increase the refcount.
>>    * @host: host1x controller
>>    * @id: syncpoint ID
>>    */
>> -struct host1x_syncpt *host1x_syncpt_get(struct host1x *host, unsigned int id)
>> +struct host1x_syncpt *host1x_syncpt_get_by_id_noref(struct host1x *host,
>> +						    unsigned int id)
>>   {
>>   	if (id >= host->info->nb_pts)
>>   		return NULL;
>>   
>> -	return host->syncpt + id;
>> +	return &host->syncpt[id];
>> +}
>> +EXPORT_SYMBOL(host1x_syncpt_get_by_id_noref);
>> +
>> +/**
>> + * host1x_syncpt_get() - increment syncpoint refcount
>> + * @sp: syncpoint
>> + */
>> +struct host1x_syncpt *host1x_syncpt_get(struct host1x_syncpt *sp)
>> +{
>> +	kref_get(&sp->ref);
>> +
>> +	return sp;
>>   }
>>   EXPORT_SYMBOL(host1x_syncpt_get);
> 
> Hello, Mikko!
> 
> What do you think about open-coding all the host1x structs by moving
> them into the public linux/host1x.h? Then we could inline all these
> trivial single-line functions by defining them in the public header.
> This would avoid the unnecessary function-call overhead and let the
> compiler optimize the code nicely.
> 
> Of course this could be a separate change, and it could be done sometime
> later. I just wanted to share this quick thought at the start of the
> review.
> 

Hi :)

I think for such micro-optimizations we should have a benchmark to
evaluate against. I'm not sure there are enough calls into these
functions overall for inlining to make a noticeable difference. In any
case, as you said, I'd prefer to keep further refactoring in a separate
series to avoid growing this one too much.
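
For reference, a benchmark of this kind could start from something like
the sketch below. It is a userspace approximation only: the demo_*
names are hypothetical, __attribute__((noinline)) is a GCC/Clang
extension, and a volatile counter stands in for the refcount so the
compiler cannot collapse the loops. A meaningful evaluation would have
to measure the real submit path on hardware.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical counter; volatile forces a store on every increment. */
struct demo_ref {
	volatile uint64_t count;
};

/* Out-of-line variant, as with an exported function. */
static __attribute__((noinline)) void demo_get_outofline(struct demo_ref *r)
{
	r->count++;
}

/* Inline variant, as with a helper defined in a public header. */
static inline void demo_get_inline(struct demo_ref *r)
{
	r->count++;
}

/* Convert a pair of clock() samples to seconds of CPU time. */
static double demo_elapsed(clock_t start, clock_t end)
{
	return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Time iters calls of the out-of-line helper. */
static double demo_bench_outofline(struct demo_ref *r, uint64_t iters)
{
	clock_t start = clock();
	uint64_t i;

	assert(r);
	for (i = 0; i < iters; i++)
		demo_get_outofline(r);

	return demo_elapsed(start, clock());
}

/* Time iters calls of the inline helper. */
static double demo_bench_inline(struct demo_ref *r, uint64_t iters)
{
	clock_t start = clock();
	uint64_t i;

	assert(r);
	for (i = 0; i < iters; i++)
		demo_get_inline(r);

	return demo_elapsed(start, clock());
}
```

Comparing the two return values for a large iteration count gives a
first-order idea of the call overhead. It says nothing about how often
these helpers are hit per job submission, which is the number that
matters here.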

Mikko

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 00/17] Host1x/TegraDRM UAPI
  2020-09-05 10:34 ` Mikko Perttunen
@ 2020-09-08 23:36   ` Dmitry Osipenko
  -1 siblings, 0 replies; 128+ messages in thread
From: Dmitry Osipenko @ 2020-09-08 23:36 UTC (permalink / raw)
  To: Mikko Perttunen, thierry.reding, jonathanh, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

05.09.2020 13:34, Mikko Perttunen wrote:
> Hi all,
> 
> here's a second revision of the Host1x/TegraDRM UAPI proposal,
> hopefully with most issues from v1 resolved, and also with
> an implementation. There are still open issues with the
> implementation:

Could you please clarify the current status of the DMA heaps? Are we
still going to use DMA heaps?

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: DRM_TEGRA_SUBMIT_BUF_WRITE_RELOC
  2020-09-05 10:34   ` Mikko Perttunen
@ 2020-09-08 23:45     ` Dmitry Osipenko
  -1 siblings, 0 replies; 128+ messages in thread
From: Dmitry Osipenko @ 2020-09-08 23:45 UTC (permalink / raw)
  To: Mikko Perttunen, thierry.reding, jonathanh, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

05.09.2020 13:34, Mikko Perttunen wrote:
...
> +/* Submission */
> +
> +/** Patch address of the specified mapping in the submitted gather. */
> +#define DRM_TEGRA_SUBMIT_BUF_WRITE_RELOC		(1<<0)

Shouldn't the kernel driver be aware of which relocations need to be
patched? Could you please explain the purpose of this flag?

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 06/17] gpu: host1x: Cleanup and refcounting for syncpoints
  2020-09-05 14:53       ` Mikko Perttunen
@ 2020-09-09  0:07         ` Dmitry Osipenko
  -1 siblings, 0 replies; 128+ messages in thread
From: Dmitry Osipenko @ 2020-09-09  0:07 UTC (permalink / raw)
  To: Mikko Perttunen, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

05.09.2020 17:53, Mikko Perttunen wrote:
...
>> Hello, Mikko!
>>
>> What do you think about open-coding all the host1x structs by moving
>> them all out into the public linux/host1x.h? Then we could inline all
>> these trivial single-line functions by having them defined in the public
>> header. This would avoid the unnecessary call overhead by allowing
>> the compiler to optimize the code nicely.
>>
>> Of course this could be a separate change and it could be done sometime
>> later, I just wanted to share this quick thought for the start of the
>> review.
>>
> 
> Hi :)
> 
> I think for such micro-optimizations we should have a benchmark to
> evaluate against. I'm not sure we have all that many function calls into
> here overall that it would make a noticeable difference. In any case, as
> you said, I'd prefer to keep further refactoring to a separate series to
> avoid growing this series too much.

The performance difference doesn't bother me; it should be insignificant
in this particular case. The number of exported functions is what
makes me feel uncomfortable, especially since most of those functions
are trivial.

My concern is that cleaning up upstream drivers is usually not
easy. Hence it could be a good thing to put effort into restructuring
the current code before new code is added. But first we need to have
a full-featured draft implementation that will show which parts of the
driver require refactoring.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 15/17] drm/tegra: Add power_on/power_off engine callbacks
  2020-09-05 10:34   ` Mikko Perttunen
@ 2020-09-09  0:16     ` Dmitry Osipenko
  -1 siblings, 0 replies; 128+ messages in thread
From: Dmitry Osipenko @ 2020-09-09  0:16 UTC (permalink / raw)
  To: Mikko Perttunen, thierry.reding, jonathanh, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

05.09.2020 13:34, Mikko Perttunen wrote:
...
> +static int vic_power_on(struct tegra_drm_client *client)
> +{
> +	struct vic *vic = to_vic(client);
> +
> +	return pm_runtime_get_sync(vic->dev);

Please keep in mind that the RPM reference needs to be put in case of
error: pm_runtime_get_sync() bumps the usage count even when it fails.

Maybe it would be better if driver-core could take care of
resuming/suspending the client's RPM instead of putting that burden on
each client individually?
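
To illustrate the first point, here is a minimal sketch of the usual
error handling. It is a userspace mock so that it can be compiled and
run on its own: struct device and the two pm_runtime_*() functions below
are simplified stand-ins for the kernel helpers, and engine_power_on()
is a hypothetical name, not the callback from the patch.

```c
#include <errno.h>

/* Userspace stand-in for struct device; not the kernel definition. */
struct device {
	int usage_count;	/* models the runtime-PM usage counter */
	int resume_error;	/* 0, or a negative errno returned by the mock resume */
};

/* Mock: like the kernel helper, it bumps the usage count even on failure. */
static int pm_runtime_get_sync(struct device *dev)
{
	dev->usage_count++;
	return dev->resume_error;
}

/* Mock: drops the usage count without triggering an idle check. */
static void pm_runtime_put_noidle(struct device *dev)
{
	dev->usage_count--;
}

/* The pattern under discussion: drop the reference if resume failed. */
static int engine_power_on(struct device *dev)
{
	int err;

	err = pm_runtime_get_sync(dev);
	if (err < 0) {
		pm_runtime_put_noidle(dev);
		return err;
	}

	/* The real helper may return 1 if already active; normalize to 0. */
	return 0;
}
```

Without the put on the error path the usage count would stay elevated
after a failed resume, so the device could never be runtime-suspended
again.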

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 17/17] WIP: drm/tegra: Implement new UAPI
  2020-09-05 10:34   ` Mikko Perttunen
@ 2020-09-09  0:47     ` Dmitry Osipenko
  -1 siblings, 0 replies; 128+ messages in thread
From: Dmitry Osipenko @ 2020-09-09  0:47 UTC (permalink / raw)
  To: Mikko Perttunen, thierry.reding, jonathanh, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

05.09.2020 13:34, Mikko Perttunen wrote:
> +static int submit_process_bufs(struct drm_device *drm, struct gather_bo *bo,
> +			       struct tegra_drm_job_data *job_data,
> +			       struct tegra_drm_channel_ctx *ctx,
> +			       struct drm_tegra_channel_submit *args,
> +			       struct ww_acquire_ctx *acquire_ctx)
> +{
> +	struct drm_tegra_submit_buf __user *user_bufs_ptr =
> +		u64_to_user_ptr(args->bufs_ptr);

If the assignment makes the line too long, factor it out:

  struct drm_tegra_submit_buf __user *user_bufs_ptr;

  user_bufs_ptr = u64_to_user_ptr(args->bufs_ptr);

> +	struct tegra_drm_mapping *mapping;
> +	struct drm_tegra_submit_buf buf;
> +	unsigned long copy_err;
> +	int err;
> +	u32 i;
> +
> +	job_data->used_mappings =
> +		kcalloc(args->num_bufs, sizeof(*job_data->used_mappings), GFP_KERNEL);

checkpatch should disallow this coding style. I'd write it as:

size_t size;

size = sizeof(*job_data->used_mappings);
job_data->used_mappings = kcalloc(args->num_bufs, size..);

> +	if (!job_data->used_mappings)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < args->num_bufs; i++) {
> +		copy_err = copy_from_user(&buf, user_bufs_ptr+i, sizeof(buf));

The whole array should always be copied at once. Please keep in mind that
each copy_from_user() call has a CPU-time cost; there should be at most 2
copies per job.
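
A minimal sketch of the "copy once" approach, written as a userspace
mock so it can be compiled and run standalone. mock_copy_from_user()
only imitates the return convention of copy_from_user() (number of bytes
NOT copied), and struct submit_buf, copy_calls and copy_bufs_once() are
hypothetical names, not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the per-buffer UAPI struct. */
struct submit_buf {
	uint32_t mapping_id;
	uint32_t flags;
};

/* Counts how often the (mock) user copy is invoked. */
static unsigned long copy_calls;

/* Mock of copy_from_user(): returns the number of bytes NOT copied. */
static unsigned long mock_copy_from_user(void *to, const void *from, size_t n)
{
	copy_calls++;
	memcpy(to, from, n);
	return 0;
}

/* Copy the whole user array with a single call instead of one per entry. */
static struct submit_buf *copy_bufs_once(const struct submit_buf *user_bufs,
					 uint32_t num_bufs)
{
	struct submit_buf *bufs;

	if (num_bufs == 0)
		return NULL;

	/* calloc() rejects num_bufs * sizeof(*bufs) overflow for us. */
	bufs = calloc(num_bufs, sizeof(*bufs));
	if (!bufs)
		return NULL;

	if (mock_copy_from_user(bufs, user_bufs,
				(size_t)num_bufs * sizeof(*bufs))) {
		free(bufs);
		return NULL;
	}

	return bufs;
}
```

The processing loop would then walk the kernel-side copy, so the number
of user copies no longer depends on num_bufs.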

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 17/17] WIP: drm/tegra: Implement new UAPI (submit_handle_syncpts)
  2020-09-05 10:34   ` Mikko Perttunen
@ 2020-09-09  1:13     ` Dmitry Osipenko
  -1 siblings, 0 replies; 128+ messages in thread
From: Dmitry Osipenko @ 2020-09-09  1:13 UTC (permalink / raw)
  To: Mikko Perttunen, thierry.reding, jonathanh, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

05.09.2020 13:34, Mikko Perttunen wrote:
> +int tegra_drm_ioctl_channel_submit(struct drm_device *drm, void *data,
> +				   struct drm_file *file)
> +{
> +	struct tegra_drm_file *fpriv = file->driver_priv;
> +	struct drm_tegra_channel_submit *args = data;
> +	struct drm_tegra_submit_syncpt_incr incr;
> +	struct tegra_drm_job_data *job_data;
> +	struct ww_acquire_ctx acquire_ctx;
> +	struct tegra_drm_channel_ctx *ctx;
> +	struct host1x_job *job;
> +	struct gather_bo *bo;
> +	u32 i;
> +	int err;
> +
> +	if (args->reserved[0] || args->reserved[1] || args->reserved[2] ||
> +	    args->reserved[3])
> +		return -EINVAL;
> +
> +	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
> +	if (!ctx)
> +		return -EINVAL;
> +
> +	err = submit_copy_gather_data(drm, &bo, args);
> +	if (err)
> +		goto unlock;
> +
> +	job_data = kzalloc(sizeof(*job_data), GFP_KERNEL);
> +	if (!job_data) {
> +		err = -ENOMEM;
> +		goto put_bo;
> +	}
> +
> +	err = submit_process_bufs(drm, bo, job_data, ctx, args, &acquire_ctx);
> +	if (err)
> +		goto free_job_data;
> +
> +	err = submit_create_job(drm, &job, bo, ctx, args, file);
> +	if (err)
> +		goto free_job_data;
> +
> +	err = submit_handle_syncpts(drm, job, &incr, args);
> +	if (err)
> +		goto put_job;

How many sync points would an average job use? Maybe it would be better
to have a predefined array of sync points within struct
drm_tegra_channel_submit?

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 17/17] WIP: drm/tegra: Implement new UAPI (submit_handle_syncpts)
  2020-09-09  1:13     ` Dmitry Osipenko
@ 2020-09-09  1:24       ` Dmitry Osipenko
  -1 siblings, 0 replies; 128+ messages in thread
From: Dmitry Osipenko @ 2020-09-09  1:24 UTC (permalink / raw)
  To: Mikko Perttunen, thierry.reding, jonathanh, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

09.09.2020 04:13, Dmitry Osipenko wrote:
...
> How many sync points would an average job use? Maybe it would be better
> to have a predefined array of sync points within struct
> drm_tegra_channel_submit?
> 

The same question regarding the commands.

Wouldn't it be a good idea to make both userptr arrays of sync points and
commands optional by having small fixed-size buffers within
drm_tegra_channel_submit? Then the majority of jobs would only need to
copy the gather data from userspace.
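
As a rough illustration of the idea only (this is not the proposed
UAPI): every struct, field name and the inline capacity below are
hypothetical, and only the sync point half is shown.

```c
#include <stdint.h>

/* Hypothetical inline capacity; the thread does not propose a number. */
#define SUBMIT_INLINE_SYNCPT_INCRS 2

/* Hypothetical, simplified stand-in for a syncpoint increment entry. */
struct submit_syncpt_incr {
	uint32_t syncpt_id;
	uint32_t num_incrs;
};

/* Hypothetical submit arguments carrying a small inline array. */
struct channel_submit {
	uint32_t num_syncpt_incrs;
	uint32_t reserved;
	uint64_t syncpt_incrs_ptr;	/* used only if the inline array is too small */
	struct submit_syncpt_incr inline_incrs[SUBMIT_INLINE_SYNCPT_INCRS];
};

/*
 * Returns 1 when the increments fit in the inline array, i.e. they
 * arrived together with the ioctl arguments and no additional copy
 * from the user pointer would be needed.
 */
static int submit_incrs_are_inline(const struct channel_submit *args)
{
	return args->num_syncpt_incrs <= SUBMIT_INLINE_SYNCPT_INCRS;
}
```

The trade-off is a larger ioctl argument struct for every job in
exchange for skipping the extra user copies in the common case.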

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 17/17] WIP: drm/tegra: Implement new UAPI
  2020-09-05 10:34   ` Mikko Perttunen
@ 2020-09-09  2:06     ` Dmitry Osipenko
  -1 siblings, 0 replies; 128+ messages in thread
From: Dmitry Osipenko @ 2020-09-09  2:06 UTC (permalink / raw)
  To: Mikko Perttunen, thierry.reding, jonathanh, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

05.09.2020 13:34, Mikko Perttunen wrote:
> +int tegra_drm_ioctl_channel_open(struct drm_device *drm, void *data,
> +				 struct drm_file *file)
> +{
> +	struct tegra_drm_file *fpriv = file->driver_priv;
> +	struct tegra_drm *tegra = drm->dev_private;
> +	struct drm_tegra_channel_open *args = data;
> +	struct tegra_drm_client *client = NULL;
> +	struct tegra_drm_channel_ctx *ctx;
> +	int err;
> +
> +	if (args->flags || args->reserved[0] || args->reserved[1])
> +		return -EINVAL;
> +
> +	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
> +	if (!ctx)
> +		return -ENOMEM;
> +
> +	err = -ENODEV;
> +	list_for_each_entry(client, &tegra->clients, list) {
> +		if (client->base.class == args->host1x_class) {
> +			err = 0;
> +			break;
> +		}
> +	}
> +	if (err)
> +		goto free_ctx;
> +
> +	if (client->shared_channel) {
> +		ctx->channel = host1x_channel_get(client->shared_channel);
> +	} else {
> +		ctx->channel = host1x_channel_request(&client->base);
> +		if (!ctx->channel) {
> +			err = -EBUSY;
> +			goto free_ctx;
> +		}
> +	}
> +
> +	err = xa_alloc(&fpriv->contexts, &args->channel_ctx, ctx,
> +		       XA_LIMIT(1, U32_MAX), GFP_KERNEL);
> +	if (err < 0) {
> +		mutex_unlock(&fpriv->lock);

Looks like the lock was never taken.

> +		goto put_channel;
> +	}
> +
> +	ctx->client = client;
> +	xa_init_flags(&ctx->mappings, XA_FLAGS_ALLOC);

Why not XA_FLAGS_ALLOC1?

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 17/17] WIP: drm/tegra: Implement new UAPI
  2020-09-05 10:34   ` Mikko Perttunen
@ 2020-09-09  2:10     ` Dmitry Osipenko
  -1 siblings, 0 replies; 128+ messages in thread
From: Dmitry Osipenko @ 2020-09-09  2:10 UTC (permalink / raw)
  To: Mikko Perttunen, thierry.reding, jonathanh, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

05.09.2020 13:34, Mikko Perttunen wrote:
> +	job->timeout = min(args->timeout_us / 1000, 10000U);
> +	if (job->timeout == 0)
> +		job->timeout = 1;

clamp()
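
For reference, a minimal sketch of what the suggestion amounts to. The
kernel's clamp(val, lo, hi) macro is replaced here by a plain userspace
helper so the snippet compiles on its own; clamp_u32() and
job_timeout_ms() are hypothetical names, and timeout_us is assumed to be
a 32-bit unsigned value.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's clamp(val, lo, hi) macro. */
static uint32_t clamp_u32(uint32_t val, uint32_t lo, uint32_t hi)
{
	if (val < lo)
		return lo;
	if (val > hi)
		return hi;
	return val;
}

/*
 * Same result as the quoted min() plus zero check: the timeout is
 * converted to milliseconds and limited to the range 1..10000.
 */
static uint32_t job_timeout_ms(uint32_t timeout_us)
{
	return clamp_u32(timeout_us / 1000, 1, 10000);
}
```

In the driver this would presumably collapse to a single statement
along the lines of clamp(args->timeout_us / 1000, 1U, 10000U).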

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 00/17] Host1x/TegraDRM UAPI
  2020-09-05 10:34 ` Mikko Perttunen
@ 2020-09-09  2:20   ` Dmitry Osipenko
  -1 siblings, 0 replies; 128+ messages in thread
From: Dmitry Osipenko @ 2020-09-09  2:20 UTC (permalink / raw)
  To: Mikko Perttunen, thierry.reding, jonathanh, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

05.09.2020 13:34, Mikko Perttunen wrote:
> Hi all,
> 
> here's a second revision of the Host1x/TegraDRM UAPI proposal,
> hopefully with most issues from v1 resolved, and also with
> an implementation. There are still open issues with the
> implementation:
> 
> * Relocs are now handled on TegraDRM side instead of Host1x,
>   so the firewall is not aware of them, causing submission
>   failure where the firewall is enabled. Proposed solution
>   is to move the firewall to TegraDRM side, but this hasn't
>   been done yet.
> * For the new UAPI, syncpoint recovery on job timeout is
>   disabled. What this means is that upon job timeout,
>   all further jobs using that syncpoint are cancelled,
>   and the syncpoint is marked unusable until it is freed.
>   However, there is currently a race between the timeout
>   handler and job submission, where submission can observe
>   the syncpoint in non-locked state and yet the job
>   cancellations won't cancel the new job.
> * Waiting for DMA reservation fences is not implemented yet.
> * I have only tested on Tegra186.
> 
> The series consists of three parts:
> 
> * The first part contains some fixes and improvements to
>   the Host1x driver of more general nature,
> * The second part adds the Host1x side UAPI, as well as
>   Host1x-side changes needed for the new TegraDRM UAPI,
> * The third part adds the new TegraDRM UAPI.
> 
> I have written some tests to test the new interface,
> see https://github.com/cyndis/uapi-test. Porting of proper
> userspace (e.g. opentegra, vdpau-tegra) will come once
> there is some degree of conclusion on the UAPI definition.

Could you please enumerate all the currently open questions?

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 17/17] WIP: drm/tegra: Implement new UAPI
  2020-09-09  2:10     ` Dmitry Osipenko
@ 2020-09-09  2:34       ` Dmitry Osipenko
  -1 siblings, 0 replies; 128+ messages in thread
From: Dmitry Osipenko @ 2020-09-09  2:34 UTC (permalink / raw)
  To: Mikko Perttunen, thierry.reding, jonathanh, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

09.09.2020 05:10, Dmitry Osipenko wrote:
> 05.09.2020 13:34, Mikko Perttunen wrote:
>> +	job->timeout = min(args->timeout_us / 1000, 10000U);
>> +	if (job->timeout == 0)
>> +		job->timeout = 1;
> 
> clamp()
> 

Does it make sense to have the timeout in microseconds?

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 06/17] gpu: host1x: Cleanup and refcounting for syncpoints
  2020-09-09  0:07         ` Dmitry Osipenko
@ 2020-09-09  8:03           ` Mikko Perttunen
  -1 siblings, 0 replies; 128+ messages in thread
From: Mikko Perttunen @ 2020-09-09  8:03 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

On 9/9/20 3:07 AM, Dmitry Osipenko wrote:
> 05.09.2020 17:53, Mikko Perttunen wrote:
> ...
>>> Hello, Mikko!
>>>
>>> What do you think about open-coding all the host1x structs by moving
>>> them all out into the public linux/host1x.h? Then we could inline all
>>> these trivial single-line functions by having them defined in the public
>>> header. This would avoid the unnecessary call overhead by allowing
>>> the compiler to optimize the code nicely.
>>>
>>> Of course this could be a separate change and it could be done sometime
>>> later, I just wanted to share this quick thought for the start of the
>>> review.
>>>
>>
>> Hi :)
>>
>> I think for such micro-optimizations we should have a benchmark to
>> evaluate against. I'm not sure we have all that many function calls into
>> here overall that it would make a noticeable difference. In any case, as
>> you said, I'd prefer to keep further refactoring to a separate series to
>> avoid growing this series too much.
> 
> The performance difference doesn't bother me; it should be insignificant
> in this particular case. The number of exported functions is what
> makes me feel uncomfortable, especially since most of those functions
> are trivial.

I don't see a particular problem with this -- I think it's better to
keep the data structures in the driver-internal headers to improve
modularization. I think we can get rid of the syncpt_get_by_id*
functions once we remove the staging code, so that would clean things
up as well.

> 
> My concern is that cleaning up upstream drivers is usually not
> easy. Hence it could be a good thing to put effort into restructuring
> the current code before new code is added. But first we need to have
> a full-featured draft implementation that will show which parts of the
> driver require refactoring.
> 

My feeling is that once we have the new UAPI implemented, refactoring 
will be easier because we have a better idea of what we need of the 
code, and we will be able to remove the staging code, allowing removal 
or easier refactoring of many old paths.

While doing that, some of the new code will have to be changed again as 
well, sure, but at least the entire time we will have a functional 
implementation.

Mikko

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: DRM_TEGRA_SUBMIT_BUF_WRITE_RELOC
  2020-09-08 23:45     ` DRM_TEGRA_SUBMIT_BUF_WRITE_RELOC Dmitry Osipenko
@ 2020-09-09  8:10       ` Mikko Perttunen
  -1 siblings, 0 replies; 128+ messages in thread
From: Mikko Perttunen @ 2020-09-09  8:10 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

On 9/9/20 2:45 AM, Dmitry Osipenko wrote:
> 05.09.2020 13:34, Mikko Perttunen wrote:
> ...
>> +/* Submission */
>> +
>> +/** Patch address of the specified mapping in the submitted gather. */
>> +#define DRM_TEGRA_SUBMIT_BUF_WRITE_RELOC		(1<<0)
> 
> Shouldn't the kernel driver be aware about what relocations need to be
> patched? Could you please explain the purpose of this flag?
> 

Sure, the kernel knows if it returned the IOVA to the user or not, so we 
could remove this flag and determine it implicitly. I don't think there 
is much harm in it though; if we have the flag an application can decide 
to ignore the iova field and just pass WRITE_RELOC always, and it's not 
really any extra code on kernel side.

Mikko
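
A minimal userspace sketch of the flag semantics discussed above, assuming a simplified buffer entry. Only DRM_TEGRA_SUBMIT_BUF_WRITE_RELOC and its value come from the quoted patch; the struct, its field names and sketch_apply_reloc() are hypothetical and only illustrate "patch the gather if, and only if, userspace asks for it":

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as the flag quoted from the patch. */
#define DRM_TEGRA_SUBMIT_BUF_WRITE_RELOC (1u << 0)

/* Hypothetical, simplified stand-in for one buffer entry of a submit. */
struct sketch_submit_buf {
	uint32_t flags;               /* DRM_TEGRA_SUBMIT_BUF_* bits */
	uint32_t gather_offset_words; /* word in the gather to patch (assumed field) */
	uint64_t iova;                /* address resolved for the mapping */
};

/*
 * Patch the gather only when userspace asked for it.
 * Returns 1 if a word was patched, 0 if nothing had to be done,
 * -1 if the requested offset lies outside the gather.
 */
int sketch_apply_reloc(uint32_t *gather, size_t gather_words,
		       const struct sketch_submit_buf *buf)
{
	assert(gather && buf);

	if (!(buf->flags & DRM_TEGRA_SUBMIT_BUF_WRITE_RELOC))
		return 0; /* userspace wrote the IOVA into the gather itself */

	if (buf->gather_offset_words >= gather_words)
		return -1;

	/* Sketch only: store the low 32 bits; real hardware may need a shift. */
	gather[buf->gather_offset_words] = (uint32_t)buf->iova;
	return 1;
}
```

With the flag always set, an application never needs to look at the returned IOVA, which is the convenience argued for above.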

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 15/17] drm/tegra: Add power_on/power_off engine callbacks
  2020-09-09  0:16     ` Dmitry Osipenko
@ 2020-09-09  8:11       ` Mikko Perttunen
  -1 siblings, 0 replies; 128+ messages in thread
From: Mikko Perttunen @ 2020-09-09  8:11 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

On 9/9/20 3:16 AM, Dmitry Osipenko wrote:
> 05.09.2020 13:34, Mikko Perttunen wrote:
> ...
>> +static int vic_power_on(struct tegra_drm_client *client)
>> +{
>> +	struct vic *vic = to_vic(client);
>> +
>> +	return pm_runtime_get_sync(vic->dev);
> 
> Please keep in mind that RPM needs to be put in a case of error.
> 
> Maybe it would be better if driver-core could take care of
> resuming/suspending client's RPM instead of putting that burden on each
> client individually?
> 

Good point, we should be able to just make RPM calls from the core code. 
I'll change it to do so (and fix the refcounting).

Mikko
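
A minimal userspace sketch of the refcounting issue raised above. In the kernel, pm_runtime_get_sync() increments the usage counter even when the resume fails, so the error path has to drop that reference (for example with pm_runtime_put_noidle()). The sketch_* names below are hypothetical stand-ins that model this behaviour; they are not the driver's code:

```c
#include <assert.h>

/* Userspace stand-in for a device with a runtime-PM usage counter. */
struct sketch_dev {
	int usage_count;  /* models the RPM usage counter */
	int resume_error; /* 0 on success, negative errno to simulate a failure */
};

/* Models pm_runtime_get_sync(): the counter is bumped even if resume fails. */
int sketch_rpm_get_sync(struct sketch_dev *dev)
{
	dev->usage_count++;
	return dev->resume_error;
}

/* Models pm_runtime_put_noidle(): drop the reference without suspending. */
void sketch_rpm_put_noidle(struct sketch_dev *dev)
{
	assert(dev->usage_count > 0);
	dev->usage_count--;
}

/* Core-side helper: power on a client and keep the counter balanced. */
int sketch_client_power_on(struct sketch_dev *dev)
{
	int err = sketch_rpm_get_sync(dev);

	if (err < 0) {
		sketch_rpm_put_noidle(dev); /* undo the increment on failure */
		return err;
	}

	return 0;
}
```

Doing this once in the core code, as proposed, would spare each client (VIC and others) from repeating the error path.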

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 17/17] WIP: drm/tegra: Implement new UAPI
  2020-09-09  0:47     ` Dmitry Osipenko
@ 2020-09-09  8:19       ` Mikko Perttunen
  -1 siblings, 0 replies; 128+ messages in thread
From: Mikko Perttunen @ 2020-09-09  8:19 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

On 9/9/20 3:47 AM, Dmitry Osipenko wrote:
> 05.09.2020 13:34, Mikko Perttunen wrote:
>> +static int submit_process_bufs(struct drm_device *drm, struct gather_bo *bo,
>> +			       struct tegra_drm_job_data *job_data,
>> +			       struct tegra_drm_channel_ctx *ctx,
>> +			       struct drm_tegra_channel_submit *args,
>> +			       struct ww_acquire_ctx *acquire_ctx)
>> +{
>> +	struct drm_tegra_submit_buf __user *user_bufs_ptr =
>> +		u64_to_user_ptr(args->bufs_ptr);
> 
> If assignment makes line too long, then factor it out.
> 
>    struct drm_tegra_submit_buf __user *user_bufs_ptr;
> 
>    user_bufs_ptr = u64_to_user_ptr(args->bufs_ptr);
> 
>> +	struct tegra_drm_mapping *mapping;
>> +	struct drm_tegra_submit_buf buf;
>> +	unsigned long copy_err;
>> +	int err;
>> +	u32 i;
>> +
>> +	job_data->used_mappings =
>> +		kcalloc(args->num_bufs, sizeof(*job_data->used_mappings), GFP_KERNEL);
> 
> The checkpatch should disallow this coding style. I'd write it as:
> 
> size_t size;
> 
> size = sizeof(*job_data->used_mappings);
> job_data->used_mappings = kcalloc(args->num_bufs, size..);

I'll make these cleaner for next version.

> 
>> +	if (!job_data->used_mappings)
>> +		return -ENOMEM;
>> +
>> +	for (i = 0; i < args->num_bufs; i++) {
>> +		copy_err = copy_from_user(&buf, user_bufs_ptr+i, sizeof(buf));
> 
> Whole array always should be copied at once. Please keep in mind that
> each copy_from_user() has a cpu-time cost, there should maximum up to 2
> copyings per job.
> 

OK. BTW, do you have some reference/numbers for this or is it based on 
grate-driver experience?

Mikko
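
A minimal userspace sketch of the "copy the whole array at once" suggestion, assuming a simplified descriptor layout. sketch_copy_from_user() is a hypothetical stand-in for copy_from_user() that counts calls, so the difference to a per-element loop is visible; in-kernel code might instead use a single copy_from_user() into a kcalloc()'d array or a helper such as memdup_user():

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified buffer descriptor; not the real UAPI layout. */
struct sketch_submit_buf {
	uint32_t mapping_id;
	uint32_t flags;
};

/* Counts simulated user copies so the strategies can be compared. */
unsigned int sketch_copy_calls;

/* Stand-in for copy_from_user(): returns the number of bytes NOT copied. */
unsigned long sketch_copy_from_user(void *to, const void *from, size_t n)
{
	sketch_copy_calls++;
	memcpy(to, from, n);
	return 0;
}

/* Copy the whole descriptor array with one call; the caller frees the result. */
struct sketch_submit_buf *
sketch_copy_bufs(const struct sketch_submit_buf *user_bufs, uint32_t num_bufs)
{
	struct sketch_submit_buf *bufs;

	assert(user_bufs);

	if (num_bufs == 0)
		return NULL;

	/* calloc() rejects a num_bufs * size overflow, much like kcalloc(). */
	bufs = calloc(num_bufs, sizeof(*bufs));
	if (!bufs)
		return NULL;

	if (sketch_copy_from_user(bufs, user_bufs,
				  (size_t)num_bufs * sizeof(*bufs))) {
		free(bufs);
		return NULL;
	}

	return bufs;
}
```

The loop over the buffers then works on the kernel-side copy, so the per-call overhead of the user copy is paid once per job rather than once per buffer.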

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 17/17] WIP: drm/tegra: Implement new UAPI (submit_handle_syncpts)
  2020-09-09  1:24       ` Dmitry Osipenko
@ 2020-09-09  8:26         ` Mikko Perttunen
  -1 siblings, 0 replies; 128+ messages in thread
From: Mikko Perttunen @ 2020-09-09  8:26 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

On 9/9/20 4:24 AM, Dmitry Osipenko wrote:
> 09.09.2020 04:13, Dmitry Osipenko wrote:
> ...
>> How many sync points would use an average job? Maybe it should be better
>> to have the predefined array of sync points within the struct
>> drm_tegra_channel_submit?
>>
> 
> The same question regarding the commands.
> 
> Wouldn't it be a good idea to make both usrptr arrays of sync points and
> commands optional by having a small fixed-size buffers within
> drm_tegra_channel_submit? Then a majority of jobs would only need to
> copy the gather data from userspace.
> 

Sure, I'll look into it. For syncpoints, it would be usually 1 but 
sometimes 2, so maybe make it 2. For commands, at least for downstream 
it would typically be 2 (one wait and one gather). Any opinion from 
grate-driver's point of view? Not sure if there is any recommendation 
regarding the max size of IOCTL data.

Mikko
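
A rough sketch of what the inline-array idea could look like, assuming two inline syncpoint slots as suggested above. The struct layout, field names and sketch_syncpts_are_inline() are hypothetical and are not part of the proposed UAPI:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_INLINE_SYNCPTS 2 /* "usually 1 but sometimes 2" */

/* Hypothetical syncpoint increment descriptor. */
struct sketch_syncpt_incr {
	uint32_t syncpt_id;
	uint32_t num_incrs;
};

/*
 * Hypothetical submit arguments: a small inline array covers the common
 * case, and a user pointer remains as a fallback for larger jobs.
 */
struct sketch_channel_submit {
	uint32_t num_syncpt_incrs;
	uint32_t reserved;
	struct sketch_syncpt_incr inline_syncpts[SKETCH_INLINE_SYNCPTS];
	uint64_t syncpt_incrs_ptr; /* used only when the inline array is too small */
};

/*
 * Returns 1 if the job fits inline (no extra copy from userspace),
 * 0 if the user-pointer array must be copied, -1 if the arguments
 * are inconsistent (too many entries but no pointer).
 */
int sketch_syncpts_are_inline(const struct sketch_channel_submit *args)
{
	assert(args);

	if (args->num_syncpt_incrs <= SKETCH_INLINE_SYNCPTS)
		return 1;

	return args->syncpt_incrs_ptr ? 0 : -1;
}
```

The same pattern could be applied to the command array (two inline entries for the typical wait plus gather), at the cost of a larger fixed-size IOCTL argument.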

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 17/17] WIP: drm/tegra: Implement new UAPI
  2020-09-09  2:06     ` Dmitry Osipenko
@ 2020-09-09  8:26       ` Mikko Perttunen
  -1 siblings, 0 replies; 128+ messages in thread
From: Mikko Perttunen @ 2020-09-09  8:26 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

On 9/9/20 5:06 AM, Dmitry Osipenko wrote:
> 05.09.2020 13:34, Mikko Perttunen wrote:
>> +int tegra_drm_ioctl_channel_open(struct drm_device *drm, void *data,
>> +				 struct drm_file *file)
>> +{
>> +	struct tegra_drm_file *fpriv = file->driver_priv;
>> +	struct tegra_drm *tegra = drm->dev_private;
>> +	struct drm_tegra_channel_open *args = data;
>> +	struct tegra_drm_client *client = NULL;
>> +	struct tegra_drm_channel_ctx *ctx;
>> +	int err;
>> +
>> +	if (args->flags || args->reserved[0] || args->reserved[1])
>> +		return -EINVAL;
>> +
>> +	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
>> +	if (!ctx)
>> +		return -ENOMEM;
>> +
>> +	err = -ENODEV;
>> +	list_for_each_entry(client, &tegra->clients, list) {
>> +		if (client->base.class == args->host1x_class) {
>> +			err = 0;
>> +			break;
>> +		}
>> +	}
>> +	if (err)
>> +		goto free_ctx;
>> +
>> +	if (client->shared_channel) {
>> +		ctx->channel = host1x_channel_get(client->shared_channel);
>> +	} else {
>> +		ctx->channel = host1x_channel_request(&client->base);
>> +		if (!ctx->channel) {
>> +			err = -EBUSY;
>> +			goto free_ctx;
>> +		}
>> +	}
>> +
>> +	err = xa_alloc(&fpriv->contexts, &args->channel_ctx, ctx,
>> +		       XA_LIMIT(1, U32_MAX), GFP_KERNEL);
>> +	if (err < 0) {
>> +		mutex_unlock(&fpriv->lock);
> 
> Looks like the lock was never taken.

Thanks, will fix.

> 
>> +		goto put_channel;
>> +	}
>> +
>> +	ctx->client = client;
>> +	xa_init_flags(&ctx->mappings, XA_FLAGS_ALLOC);
> 
> Why not XA_FLAGS_ALLOC1?
> 

Will fix as well.

Mikko

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 17/17] WIP: drm/tegra: Implement new UAPI
  2020-09-09  2:34       ` Dmitry Osipenko
@ 2020-09-09  8:36         ` Mikko Perttunen
  -1 siblings, 0 replies; 128+ messages in thread
From: Mikko Perttunen @ 2020-09-09  8:36 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman



On 9/9/20 5:34 AM, Dmitry Osipenko wrote:
> 09.09.2020 05:10, Dmitry Osipenko wrote:
>> 05.09.2020 13:34, Mikko Perttunen wrote:
>>> +	job->timeout = min(args->timeout_us / 1000, 10000U);
>>> +	if (job->timeout == 0)
>>> +		job->timeout = 1;
>>
>> clamp()
>>

Will fix.

> 
> Does it make sense to have timeout in microseconds?
> 

Not sure, but better have it a bit more fine-grained rather than 
coarse-grained. This still gives a maximum timeout of 71 minutes so I 
don't think it has any negatives compared to milliseconds.

Mikko
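
A small userspace sketch of the two points above: the clamp() form of the quoted timeout code (the limits of 1 ms and 10000 ms are taken from the quoted lines), and the arithmetic behind the 71-minute figure for a 32-bit microsecond field. The function and macro names are hypothetical:

```c
#include <stdint.h>

#define SKETCH_MIN_TIMEOUT_MS 1u
#define SKETCH_MAX_TIMEOUT_MS 10000u

/* Convert a microsecond timeout to milliseconds, clamped to [1, 10000]. */
uint32_t sketch_job_timeout_ms(uint32_t timeout_us)
{
	uint32_t ms = timeout_us / 1000;

	if (ms < SKETCH_MIN_TIMEOUT_MS)
		return SKETCH_MIN_TIMEOUT_MS;
	if (ms > SKETCH_MAX_TIMEOUT_MS)
		return SKETCH_MAX_TIMEOUT_MS;

	return ms;
}

/* Largest timeout a 32-bit microsecond field can express, in whole minutes. */
uint32_t sketch_max_u32_timeout_minutes(void)
{
	/* 4294967295 us is about 4294 s, i.e. 71 whole minutes. */
	return UINT32_MAX / 1000000u / 60u;
}
```

In the kernel the first function would collapse to a single clamp() call; the explicit comparisons here only keep the sketch free of kernel headers.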

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 00/17] Host1x/TegraDRM UAPI
  2020-09-08 23:36   ` Dmitry Osipenko
@ 2020-09-09  8:40     ` Mikko Perttunen
  -1 siblings, 0 replies; 128+ messages in thread
From: Mikko Perttunen @ 2020-09-09  8:40 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

On 9/9/20 2:36 AM, Dmitry Osipenko wrote:
> 05.09.2020 13:34, Mikko Perttunen wrote:
>> Hi all,
>>
>> here's a second revision of the Host1x/TegraDRM UAPI proposal,
>> hopefully with most issues from v1 resolved, and also with
>> an implementation. There are still open issues with the
>> implementation:
> Could you please clarify the current status of the DMA heaps. Are we
> still going to use DMA heaps?
> 

Sorry, should have mentioned the status in the cover letter. I sent an 
email to dri-devel about how DMA heaps should be used -- I believe the 
conclusion was that it's not entirely clear, but dma-bufs should only be 
used for buffers shared between engines. So for the time being, we 
should still implement GEM for intra-TegraDRM buffers. There seems to be 
some planning ongoing to see if the different subsystem allocators can 
be unified (see the dma-buf heaps talk from the Linux Plumbers Conference), but 
for now we should go for GEM.

Mikko

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 00/17] Host1x/TegraDRM UAPI
  2020-09-09  2:20   ` Dmitry Osipenko
@ 2020-09-09  8:44     ` Mikko Perttunen
  -1 siblings, 0 replies; 128+ messages in thread
From: Mikko Perttunen @ 2020-09-09  8:44 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

On 9/9/20 5:20 AM, Dmitry Osipenko wrote:
> 05.09.2020 13:34, Mikko Perttunen wrote:
>> Hi all,
>>
>> here's a second revision of the Host1x/TegraDRM UAPI proposal,
>> hopefully with most issues from v1 resolved, and also with
>> an implementation. There are still open issues with the
>> implementation:
>>
>> * Relocs are now handled on TegraDRM side instead of Host1x,
>>    so the firewall is not aware of them, causing submission
>>    failure where the firewall is enabled. Proposed solution
>>    is to move the firewall to TegraDRM side, but this hasn't
>>    been done yet.
>> * For the new UAPI, syncpoint recovery on job timeout is
>>    disabled. What this means is that upon job timeout,
>>    all further jobs using that syncpoint are cancelled,
>>    and the syncpoint is marked unusable until it is freed.
>>    However, there is currently a race between the timeout
>>    handler and job submission, where submission can observe
>>    the syncpoint in non-locked state and yet the job
>>    cancellations won't cancel the new job.
>> * Waiting for DMA reservation fences is not implemented yet.
>> * I have only tested on Tegra186.
>>
>> The series consists of three parts:
>>
>> * The first part contains some fixes and improvements to
>>    the Host1x driver of more general nature,
>> * The second part adds the Host1x side UAPI, as well as
>>    Host1x-side changes needed for the new TegraDRM UAPI,
>> * The third part adds the new TegraDRM UAPI.
>>
>> I have written some tests to test the new interface,
>> see https://github.com/cyndis/uapi-test. Porting of proper
>> userspace (e.g. opentegra, vdpau-tegra) will come once
>> there is some degree of conclusion on the UAPI definition.
> 
> Could you please enumerate all the currently opened questions?
> 

Which open questions do you refer to? The open items of v1 should be 
closed now; for fences we set up an SW timeout to prevent them from 
sticking around forever, and regarding GEM the GEM IOCTLs are again 
being used.


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 00/17] Host1x/TegraDRM UAPI
  2020-09-09  8:44     ` Mikko Perttunen
@ 2020-09-10 21:53       ` Dmitry Osipenko
  -1 siblings, 0 replies; 128+ messages in thread
From: Dmitry Osipenko @ 2020-09-10 21:53 UTC (permalink / raw)
  To: Mikko Perttunen, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

09.09.2020 11:44, Mikko Perttunen wrote:
...
>> Could you please enumerate all the currently opened questions?
>>
> 
> Which open questions do you refer to?

Anything related to the UAPI definition that needs more thought. If
there is nothing outstanding, then good!

> The open items of v1 should be
> closed now; for fences we set up an SW timeout to prevent them from
> sticking around forever, and regarding GEM the GEM IOCTLs are again
> being used.
> 

We'll see how it will be in practice! For now it's a bit difficult to
decide what is good and what needs more improvement.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 17/17] WIP: drm/tegra: Implement new UAPI
  2020-09-09  8:36         ` Mikko Perttunen
@ 2020-09-10 21:57           ` Dmitry Osipenko
  -1 siblings, 0 replies; 128+ messages in thread
From: Dmitry Osipenko @ 2020-09-10 21:57 UTC (permalink / raw)
  To: Mikko Perttunen, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

09.09.2020 11:36, Mikko Perttunen wrote:
...
>>
>> Does it make sense to have timeout in microseconds?
>>
> 
> Not sure, but better have it a bit more fine-grained rather than
> coarse-grained. This still gives a maximum timeout of 71 minutes so I
> don't think it has any negatives compared to milliseconds.

If there is no good reason to use microseconds right now, then it should
be better to default to milliseconds, IMO. It shouldn't be a problem to
extend the IOCTL with a microseconds entry, if it is ever needed.

{
	__u32 timeout_ms;
...
	__u32 timeout_us;
}

timeout_in_us = 1000 * timeout_ms + timeout_us;
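
As a minimal sketch of the arithmetic being discussed (the helper names are
hypothetical and not part of the proposed UAPI): a millisecond field and a
microsecond field combine into one microsecond value by scaling the
millisecond part, and a single __u32 microsecond field tops out at about
71 minutes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: combine a millisecond field and a microsecond
 * field into one timeout expressed in microseconds. The 64-bit result
 * avoids overflow when timeout_ms is large.
 */
static uint64_t timeout_total_us(uint32_t timeout_ms, uint32_t timeout_us)
{
	return (uint64_t)timeout_ms * 1000u + timeout_us;
}

/* Largest timeout, in whole minutes, that a 32-bit microsecond field holds. */
static uint32_t max_u32_timeout_minutes(void)
{
	return (uint32_t)(UINT32_MAX / (60u * 1000u * 1000u));
}
```

For example, timeout_total_us(100, 500) gives 100500 microseconds.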

There shouldn't be a need for long timeouts, since a job that takes
over 100ms is probably impractical. It should also be possible to
detect a progressing job and then defer the timeout in the driver. At
least this is what other drivers do, the etnaviv driver for example:

https://elixir.bootlin.com/linux/v5.9-rc4/source/drivers/gpu/drm/etnaviv/etnaviv_sched.c#L107
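
The idea of deferring a timeout while a job is still progressing can be
sketched in plain C as follows. This is a userspace illustration only, with
hypothetical names (struct hang_check, hang_check_expired); it is not the
etnaviv or Host1x code, and it assumes the driver can sample some value that
changes as the job advances, such as a DMA get pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-job hang-check state. */
struct hang_check {
	uint32_t last_progress; /* progress value seen at the previous expiry */
	uint32_t deferrals;     /* how many times the timeout was extended */
};

/*
 * Called when the timeout timer fires. Returns true if the job should be
 * treated as hung, or false if it made progress since the last check and
 * the timer should simply be re-armed.
 */
static bool hang_check_expired(struct hang_check *hc, uint32_t progress_now)
{
	if (progress_now != hc->last_progress) {
		hc->last_progress = progress_now;
		hc->deferrals++;
		return false;
	}
	return true;
}
```

With such a check, a short base timeout is enough, because long but healthy
jobs keep re-arming the timer.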

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 17/17] WIP: drm/tegra: Implement new UAPI (submit_handle_syncpts)
  2020-09-09  8:26         ` Mikko Perttunen
@ 2020-09-10 21:58           ` Dmitry Osipenko
  -1 siblings, 0 replies; 128+ messages in thread
From: Dmitry Osipenko @ 2020-09-10 21:58 UTC (permalink / raw)
  To: Mikko Perttunen, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

09.09.2020 11:26, Mikko Perttunen wrote:
> On 9/9/20 4:24 AM, Dmitry Osipenko wrote:
>> 09.09.2020 04:13, Dmitry Osipenko wrote:
>> ...
>>> How many sync points would an average job use? Maybe it would be better
>>> to have a predefined array of sync points within the struct
>>> drm_tegra_channel_submit?
>>>
>>
>> The same question regarding the commands.
>>
>> Wouldn't it be a good idea to make both userptr arrays of sync points and
>> commands optional by having small fixed-size buffers within
>> drm_tegra_channel_submit? Then a majority of jobs would only need to
>> copy the gather data from userspace.
>>
> 
> Sure, I'll look into it. For syncpoints, it would be usually 1 but
> sometimes 2, so maybe make it 2. For commands, at least for downstream
> it would typically be 2 (one wait and one gather). Any opinion from
> grate-driver's point of view? Not sure if there is any recommendation
> regarding the max size of IOCTL data.

Opentegra will need more than 2 commands. We'll need to take a look at
the min/max/average numbers of commands used by Opentegra, since it
combines multiple jobs into one and each job may have several wait
commands.
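
One way to picture the "small fixed-size buffers" idea is sketched below.
The struct layout, the field names and the inline capacities (2 syncpoints,
4 commands) are assumptions made for illustration; they are not taken from
the proposed drm_tegra_channel_submit definition.

```c
#include <assert.h>
#include <stdint.h>

#define EX_INLINE_SYNCPTS 2 /* assumed, from "usually 1 but sometimes 2" */
#define EX_INLINE_CMDS    4 /* assumed placeholder, not a measured value */

struct ex_syncpt { uint32_t id; uint32_t increments; };
struct ex_cmd    { uint32_t type; uint32_t arg; };

/* Hypothetical submit arguments with small inline arrays. */
struct ex_submit {
	uint32_t num_syncpts;
	uint32_t num_cmds;
	struct ex_syncpt syncpts[EX_INLINE_SYNCPTS];
	struct ex_cmd cmds[EX_INLINE_CMDS];
	uint64_t syncpts_ptr; /* user pointer, needed only on overflow */
	uint64_t cmds_ptr;    /* user pointer, needed only on overflow */
};

/* Extra user copies needed beyond the ioctl argument struct itself. */
static unsigned int ex_extra_copies(const struct ex_submit *s)
{
	unsigned int copies = 0;

	if (s->num_syncpts > EX_INLINE_SYNCPTS)
		copies++;
	if (s->num_cmds > EX_INLINE_CMDS)
		copies++;
	return copies;
}
```

The inline capacity for commands is exactly the number that would have to
come from measuring real Opentegra jobs.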

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 17/17] WIP: drm/tegra: Implement new UAPI
  2020-09-09  8:19       ` Mikko Perttunen
@ 2020-09-10 21:59         ` Dmitry Osipenko
  -1 siblings, 0 replies; 128+ messages in thread
From: Dmitry Osipenko @ 2020-09-10 21:59 UTC (permalink / raw)
  To: Mikko Perttunen, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

09.09.2020 11:19, Mikko Perttunen wrote:
...
>>> +    if (!job_data->used_mappings)
>>> +        return -ENOMEM;
>>> +
>>> +    for (i = 0; i < args->num_bufs; i++) {
>>> +        copy_err = copy_from_user(&buf, user_bufs_ptr+i, sizeof(buf));
>>
>> The whole array should always be copied at once. Please keep in mind
>> that each copy_from_user() has a CPU-time cost; there should be at most
>> 2 copies per job.
>>
> 
> OK. BTW, do you have some reference/numbers for this or is it based on
> grate-driver experience?

I had numbers about 2 years ago, when I was profiling job submission
latency using host1x-tests, and for simple jobs there was a visible
difference caused by each copy_from_user(), kmalloc() and having
firewall functions uninlined.

Of course it wasn't critical, but it's also not difficult to optimize
such things.
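
The difference between the two copying patterns can be illustrated with a
userspace mock. mock_copy_from_user() below is a stand-in that only counts
calls and does a memcpy(); it is not the kernel function, and struct ex_buf
is a hypothetical element type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct ex_buf { uint32_t mapping_id; uint32_t flags; };

static unsigned int mock_copy_calls;

/* Userspace stand-in: returns the number of bytes NOT copied (always 0). */
static unsigned long mock_copy_from_user(void *to, const void *from, size_t n)
{
	mock_copy_calls++;
	memcpy(to, from, n);
	return 0;
}

/* One call per element, as in the loop under review. */
static int copy_bufs_one_by_one(struct ex_buf *dst, const struct ex_buf *user,
				size_t count)
{
	for (size_t i = 0; i < count; i++)
		if (mock_copy_from_user(&dst[i], &user[i], sizeof(*dst)))
			return -1;
	return 0;
}

/* A single call for the whole array. */
static int copy_bufs_at_once(struct ex_buf *dst, const struct ex_buf *user,
			     size_t count)
{
	return mock_copy_from_user(dst, user, count * sizeof(*dst)) ? -1 : 0;
}
```

In real kernel code the count * size computation would additionally need an
overflow check before the copy.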

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 09/17] gpu: host1x: DMA fences and userspace fence creation
  2020-09-05 10:34   ` Mikko Perttunen
@ 2020-09-10 22:00     ` Dmitry Osipenko
  -1 siblings, 0 replies; 128+ messages in thread
From: Dmitry Osipenko @ 2020-09-10 22:00 UTC (permalink / raw)
  To: Mikko Perttunen, thierry.reding, jonathanh, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

05.09.2020 13:34, Mikko Perttunen wrote:
...
>  
> +static void action_signal_fence(struct host1x_waitlist *waiter)
> +{
> +	struct host1x_syncpt_fence *f = waiter->data;
> +
> +	host1x_fence_signal(f);
> +}
> +
>  typedef void (*action_handler)(struct host1x_waitlist *waiter);
>  
>  static const action_handler action_handlers[HOST1X_INTR_ACTION_COUNT] = {
>  	action_submit_complete,
>  	action_wakeup,
>  	action_wakeup_interruptible,
> +	action_signal_fence,
>  };

My expectation is that we should remove the host1x-waiter entirely. It
comes from the 2011/2012 era of the host1x driver and now duplicates
functionality provided by dma-fence and drm-scheduler. Perhaps it could
be okay to re-use the existing code for starters, but keep in mind that
it may be better not to put much effort into the older code.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 00/17] Host1x/TegraDRM UAPI
  2020-09-09  8:40     ` Mikko Perttunen
@ 2020-09-10 22:09       ` Dmitry Osipenko
  -1 siblings, 0 replies; 128+ messages in thread
From: Dmitry Osipenko @ 2020-09-10 22:09 UTC (permalink / raw)
  To: Mikko Perttunen, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

09.09.2020 11:40, Mikko Perttunen wrote:
> On 9/9/20 2:36 AM, Dmitry Osipenko wrote:
>> 05.09.2020 13:34, Mikko Perttunen wrote:
>>> Hi all,
>>>
>>> here's a second revision of the Host1x/TegraDRM UAPI proposal,
>>> hopefully with most issues from v1 resolved, and also with
>>> an implementation. There are still open issues with the
>>> implementation:
>> Could you please clarify the current status of the DMA heaps. Are we
>> still going to use DMA heaps?
>>
> 
> Sorry, should have mentioned the status in the cover letter. I sent an
> email to dri-devel about how DMA heaps should be used -- I believe the
> conclusion was that it's not entirely clear, but dma-bufs should only be
> used for buffers shared between engines. So for the time being, we
> should still implement GEM for intra-TegraDRM buffers. There seems to be
> some planning ongoing to see if the different subsystem allocators can
> be unified (see the dma-buf heaps talk from Linux Plumbers Conference), but
> for now we should go for GEM.

Thanks!

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: DRM_TEGRA_SUBMIT_BUF_WRITE_RELOC
  2020-09-09  8:10       ` DRM_TEGRA_SUBMIT_BUF_WRITE_RELOC Mikko Perttunen
@ 2020-09-10 22:15         ` Dmitry Osipenko
  -1 siblings, 0 replies; 128+ messages in thread
From: Dmitry Osipenko @ 2020-09-10 22:15 UTC (permalink / raw)
  To: Mikko Perttunen, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

09.09.2020 11:10, Mikko Perttunen wrote:
> On 9/9/20 2:45 AM, Dmitry Osipenko wrote:
>> 05.09.2020 13:34, Mikko Perttunen wrote:
>> ...
>>> +/* Submission */
>>> +
>>> +/** Patch address of the specified mapping in the submitted gather. */
>>> +#define DRM_TEGRA_SUBMIT_BUF_WRITE_RELOC        (1<<0)
>>
>> Shouldn't the kernel driver be aware of which relocations need to be
>> patched? Could you please explain the purpose of this flag?
>>
> 
> Sure, the kernel knows if it returned the IOVA to the user or not, so we
> could remove this flag and determine it implicitly. I don't think there
> is much harm in it though; if we have the flag an application can decide
> to ignore the iova field and just pass WRITE_RELOC always, and it's not
> really any extra code on kernel side.

Sounds like there is no real practical use for this flag other than for
testing purposes, correct?
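
The two options under discussion can be contrasted in a small sketch. Only
DRM_TEGRA_SUBMIT_BUF_WRITE_RELOC comes from the patch; the helper names and
the iova_returned parameter are hypothetical, and the implicit rule shown is
one reading of "the kernel knows if it returned the IOVA to the user or not".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DRM_TEGRA_SUBMIT_BUF_WRITE_RELOC (1 << 0) /* from the quoted patch */

/* Explicit: userspace requests patching through the flag. */
static bool ex_patch_explicit(uint32_t flags)
{
	return (flags & DRM_TEGRA_SUBMIT_BUF_WRITE_RELOC) != 0;
}

/* Implicit: patch only when the mapping's IOVA was not given to userspace. */
static bool ex_patch_implicit(bool iova_returned)
{
	return !iova_returned;
}
```

With the implicit variant the submit path has no flag to validate, at the
cost of behaviour that depends on what an earlier MAP call returned.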

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 09/17] gpu: host1x: DMA fences and userspace fence creation
  2020-09-10 22:00     ` Dmitry Osipenko
@ 2020-09-11  9:07       ` Mikko Perttunen
  -1 siblings, 0 replies; 128+ messages in thread
From: Mikko Perttunen @ 2020-09-11  9:07 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

On 9/11/20 1:00 AM, Dmitry Osipenko wrote:
> 05.09.2020 13:34, Mikko Perttunen wrote:
> ...
>>   
>> +static void action_signal_fence(struct host1x_waitlist *waiter)
>> +{
>> +	struct host1x_syncpt_fence *f = waiter->data;
>> +
>> +	host1x_fence_signal(f);
>> +}
>> +
>>   typedef void (*action_handler)(struct host1x_waitlist *waiter);
>>   
>>   static const action_handler action_handlers[HOST1X_INTR_ACTION_COUNT] = {
>>   	action_submit_complete,
>>   	action_wakeup,
>>   	action_wakeup_interruptible,
>> +	action_signal_fence,
>>   };
> 
> My expectation is that we should remove the host1x-waiter entirely. It
> comes from the 2011/2012 era of the host1x driver and now duplicates
> functionality provided by dma-fence and drm-scheduler. Perhaps it could
> be okay to re-use the existing code for starters, but keep in mind that
> it may be better not to put much effort into the older code.
> 

Agreed, it should be cleaned up and probably replaced. I made only 
minimal changes here to get all my tests working, as I didn't want to do 
a full refactoring in this patch series.

Mikko

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: DRM_TEGRA_SUBMIT_BUF_WRITE_RELOC
  2020-09-10 22:15         ` DRM_TEGRA_SUBMIT_BUF_WRITE_RELOC Dmitry Osipenko
@ 2020-09-11  9:52           ` Mikko Perttunen
  -1 siblings, 0 replies; 128+ messages in thread
From: Mikko Perttunen @ 2020-09-11  9:52 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

On 9/11/20 1:15 AM, Dmitry Osipenko wrote:
> 09.09.2020 11:10, Mikko Perttunen wrote:
>> On 9/9/20 2:45 AM, Dmitry Osipenko wrote:
>>> 05.09.2020 13:34, Mikko Perttunen wrote:
>>> ...
>>>> +/* Submission */
>>>> +
>>>> +/** Patch address of the specified mapping in the submitted gather. */
>>>> +#define DRM_TEGRA_SUBMIT_BUF_WRITE_RELOC        (1<<0)
>>>
>>> Shouldn't the kernel driver be aware of which relocations need to be
>>> patched? Could you please explain the purpose of this flag?
>>>
>>
>> Sure, the kernel knows if it returned the IOVA to the user or not, so we
>> could remove this flag and determine it implicitly. I don't think there
>> is much harm in it though; if we have the flag an application can decide
>> to ignore the iova field and just pass WRITE_RELOC always, and it's not
>> really any extra code on kernel side.
> 
> Sounds like there is no real practical use for this flag other than for
> testing purposes, correct?
> 

Patching that depends only on whether the MAP IOCTL returned an IOVA
seems a bit "spooky action at a distance"-ish to me, but maybe it's not
so bad. I'll consider removing it.

Mikko

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 17/17] WIP: drm/tegra: Implement new UAPI
  2020-09-10 21:57           ` Dmitry Osipenko
@ 2020-09-11  9:59             ` Mikko Perttunen
  -1 siblings, 0 replies; 128+ messages in thread
From: Mikko Perttunen @ 2020-09-11  9:59 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

On 9/11/20 12:57 AM, Dmitry Osipenko wrote:
> 09.09.2020 11:36, Mikko Perttunen wrote:
> ...
>>>
>>> Does it make sense to have timeout in microseconds?
>>>
>>
>> Not sure, but better have it a bit more fine-grained rather than
>> coarse-grained. This still gives a maximum timeout of 71 minutes so I
>> don't think it has any negatives compared to milliseconds.
> 
> If there is no good reason to use microseconds right now, then it should
> be better to default to milliseconds, IMO. It shouldn't be a problem to
> extend the IOCTL with a microseconds entry, if it is ever needed.
> 
> {
> 	__u32 timeout_ms;
> ...
> 	__u32 timeout_us;
> }
> 
> timeout_in_us = 1000 * timeout_ms + timeout_us;
> 
> There shouldn't be a need for long timeouts, since a job that takes
> over 100ms is probably impractical. It should also be possible to
> detect a progressing job and then defer the timeout in the driver. At
> least this is what other drivers do, the etnaviv driver for example:
> 
> https://elixir.bootlin.com/linux/v5.9-rc4/source/drivers/gpu/drm/etnaviv/etnaviv_sched.c#L107
> 

I still don't quite understand why it's better to default to
milliseconds. As you say, there is no need for a long timeout, and
if we go with microseconds now, there won't be a need to extend it in
the future.

Mikko

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 17/17] WIP: drm/tegra: Implement new UAPI
  2020-09-11  9:59             ` Mikko Perttunen
@ 2020-09-11 16:30               ` Dmitry Osipenko
  -1 siblings, 0 replies; 128+ messages in thread
From: Dmitry Osipenko @ 2020-09-11 16:30 UTC (permalink / raw)
  To: Mikko Perttunen, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

11.09.2020 12:59, Mikko Perttunen wrote:
> On 9/11/20 12:57 AM, Dmitry Osipenko wrote:
>> 09.09.2020 11:36, Mikko Perttunen wrote:
>> ...
>>>>
>>>> Does it make sense to have timeout in microseconds?
>>>>
>>>
>>> Not sure, but better have it a bit more fine-grained rather than
>>> coarse-grained. This still gives a maximum timeout of 71 minutes so I
>>> don't think it has any negatives compared to milliseconds.
>>
>> If there is no good reason to use microseconds right now, then it should
>> be better to default to milliseconds, IMO. It shouldn't be a problem to
>> extend the IOCTL with a microseconds entry, if it is ever needed.
>>
>> {
>>     __u32 timeout_ms;
>> ...
>>     __u32 timeout_us;
>> }
>>
>> timeout_in_us = 1000 * timeout_ms + timeout_us;
>>
>> There shouldn't be a need for long timeouts, since a job that takes
>> over 100ms is probably impractical. It should also be possible to
>> detect a progressing job and then defer the timeout in the driver. At
>> least this is what other drivers do, the etnaviv driver for example:
>>
>> https://elixir.bootlin.com/linux/v5.9-rc4/source/drivers/gpu/drm/etnaviv/etnaviv_sched.c#L107
>>
>>
> 
> I still don't quite understand why it's better to default to
> milliseconds. As you say, there is no need for a long timeout, and
> if we go with microseconds now, there won't be a need to extend it in
> the future.

It would be nicer to avoid unnecessary unit conversions in the code, in
order to keep it cleaner.

I'm now also a bit doubtful that the timeout field of the submit IOCTL
will be in the final UAPI version: it should become obsolete once
drm-scheduler is hooked up, since the hang-check timeout will be
specified per hardware engine within the kernel driver and there won't
be much use for a user-defined timeout.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 10/17] WIP: gpu: host1x: Add no-recovery mode
  2020-09-05 10:34   ` Mikko Perttunen
@ 2020-09-11 16:40     ` Dmitry Osipenko
  -1 siblings, 0 replies; 128+ messages in thread
From: Dmitry Osipenko @ 2020-09-11 16:40 UTC (permalink / raw)
  To: Mikko Perttunen, thierry.reding, jonathanh, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

05.09.2020 13:34, Mikko Perttunen wrote:
> +	} else {
> +		struct host1x_job *failed_job = job;
> +
> +		host1x_job_dump(dev, job);
> +
> +		host1x_syncpt_set_locked(job->syncpt);
> +		failed_job->cancelled = true;
> +
> +		list_for_each_entry_continue(job, &cdma->sync_queue, list) {
> +			unsigned int i;
> +
> +			if (job->syncpt != failed_job->syncpt)
> +				continue;
> +
> +			for (i = 0; i < job->num_slots; i++) {
> +				unsigned int slot = (job->first_get/8 + i) %
> +						    HOST1X_PUSHBUFFER_SLOTS;
> +				u32 *mapped = cdma->push_buffer.mapped;
> +
> +				mapped[2*slot+0] = 0x1bad0000;
> +				mapped[2*slot+1] = 0x1bad0000;

0x1bad0000 is a valid memory address on Tegra20.

0x60000000 is an invalid physical address on all hardware generations.
It's used by grate-kernel [1] and the VDE driver [2]. Note that 0x6 <<
28 is also an invalid Host1x opcode, while 0x1 should break the CDMA parser
during PB debug-dumping.

[1]
https://github.com/grate-driver/linux/blob/master/drivers/gpu/drm/tegra/gem.h#L16

[2]
https://elixir.bootlin.com/linux/v5.9-rc4/source/drivers/staging/media/tegra-vde/iommu.c#L99

The VDE driver reserves the trapping IOVA addresses; I assume the Host1x
driver should do the same.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 10/17] WIP: gpu: host1x: Add no-recovery mode
  2020-09-11 16:40     ` Dmitry Osipenko
@ 2020-09-11 22:11       ` Mikko Perttunen
  -1 siblings, 0 replies; 128+ messages in thread
From: Mikko Perttunen @ 2020-09-11 22:11 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

On 9/11/20 7:40 PM, Dmitry Osipenko wrote:
> 05.09.2020 13:34, Mikko Perttunen wrote:
>> +	} else {
>> +		struct host1x_job *failed_job = job;
>> +
>> +		host1x_job_dump(dev, job);
>> +
>> +		host1x_syncpt_set_locked(job->syncpt);
>> +		failed_job->cancelled = true;
>> +
>> +		list_for_each_entry_continue(job, &cdma->sync_queue, list) {
>> +			unsigned int i;
>> +
>> +			if (job->syncpt != failed_job->syncpt)
>> +				continue;
>> +
>> +			for (i = 0; i < job->num_slots; i++) {
>> +				unsigned int slot = (job->first_get/8 + i) %
>> +						    HOST1X_PUSHBUFFER_SLOTS;
>> +				u32 *mapped = cdma->push_buffer.mapped;
>> +
>> +				mapped[2*slot+0] = 0x1bad0000;
>> +				mapped[2*slot+1] = 0x1bad0000;
> 
> 0x1bad0000 is a valid memory address on Tegra20.
> 
> 0x60000000 is an invalid physical address on all hardware generations.
> It's used by grate-kernel [1] and the VDE driver [2]. Note that 0x6 <<
> 28 is also an invalid Host1x opcode, while 0x1 should break the CDMA parser
> during PB debug-dumping.
> 
> [1]
> https://github.com/grate-driver/linux/blob/master/drivers/gpu/drm/tegra/gem.h#L16
> 
> [2]
> https://elixir.bootlin.com/linux/v5.9-rc4/source/drivers/staging/media/tegra-vde/iommu.c#L99
> 
> The VDE driver reserves the trapping IOVA addresses; I assume the Host1x
> driver should do the same.
> 

The 0x1bad0000 values are not intended to be memory addresses; they are NOOP
opcodes (INCR of 0 words to offset 0xbad). I'll fix this to use the proper
functions to construct the opcodes and add some comments. These need to
be NOOP opcodes so that the command parser skips over these "cancelled" jobs
when the channel is resumed.
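
As a reading aid, the opcode interpretation above can be checked with a
small standalone sketch. It assumes the Host1x command word layout for INCR
is opcode in bits 31:28 (INCR being opcode 1), register offset in bits 27:16
and word count in bits 15:0. The sketch_ names are made up for illustration
and are not the driver's own helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed INCR word layout: opcode[31:28] offset[27:16] count[15:0]. */
#define SKETCH_OPCODE_INCR 0x1u

/* Build an INCR word: write `count` words starting at register `offset`. */
static inline uint32_t sketch_opcode_incr(uint32_t offset, uint32_t count)
{
	return (SKETCH_OPCODE_INCR << 28) |
	       ((offset & 0xfffu) << 16) |
	       (count & 0xffffu);
}

/* Field extractors for decoding a command word. */
static inline uint32_t sketch_opcode_of(uint32_t word) { return word >> 28; }
static inline uint32_t sketch_offset_of(uint32_t word) { return (word >> 16) & 0xfffu; }
static inline uint32_t sketch_count_of(uint32_t word) { return word & 0xffffu; }

/* 0x1bad0000 decodes as INCR, offset 0xbad, 0 words: no data follows. */
static inline void sketch_opcode_selfcheck(void)
{
	assert(sketch_opcode_incr(0xbad, 0) == 0x1bad0000u);
	assert(sketch_opcode_of(0x1bad0000u) == SKETCH_OPCODE_INCR);
	assert(sketch_offset_of(0x1bad0000u) == 0xbadu);
	assert(sketch_count_of(0x1bad0000u) == 0u);
}
```

Because the count field is zero, the parser consumes the word and writes
nothing, which is what makes it usable as padding over a cancelled job's
pushbuffer slots.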

BTW, 0x60000000 is valid on Tegra194 and later.

Mikko

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 10/17] WIP: gpu: host1x: Add no-recovery mode
  2020-09-11 22:11       ` Mikko Perttunen
@ 2020-09-12 12:53         ` Dmitry Osipenko
  -1 siblings, 0 replies; 128+ messages in thread
From: Dmitry Osipenko @ 2020-09-12 12:53 UTC (permalink / raw)
  To: Mikko Perttunen, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

12.09.2020 01:11, Mikko Perttunen wrote:
> On 9/11/20 7:40 PM, Dmitry Osipenko wrote:
>> 05.09.2020 13:34, Mikko Perttunen wrote:
>>> +    } else {
>>> +        struct host1x_job *failed_job = job;
>>> +
>>> +        host1x_job_dump(dev, job);
>>> +
>>> +        host1x_syncpt_set_locked(job->syncpt);
>>> +        failed_job->cancelled = true;
>>> +
>>> +        list_for_each_entry_continue(job, &cdma->sync_queue, list) {
>>> +            unsigned int i;
>>> +
>>> +            if (job->syncpt != failed_job->syncpt)
>>> +                continue;
>>> +
>>> +            for (i = 0; i < job->num_slots; i++) {
>>> +                unsigned int slot = (job->first_get/8 + i) %
>>> +                            HOST1X_PUSHBUFFER_SLOTS;
>>> +                u32 *mapped = cdma->push_buffer.mapped;
>>> +
>>> +                mapped[2*slot+0] = 0x1bad0000;
>>> +                mapped[2*slot+1] = 0x1bad0000;
>>
>> 0x1bad0000 is a valid memory address on Tegra20.
>>
>> 0x60000000 is an invalid physical address on all hardware generations.
>> It's used by grate-kernel [1] and the VDE driver [2]. Note that 0x6 <<
>> 28 is also an invalid Host1x opcode, while 0x1 should break the CDMA parser
>> during PB debug-dumping.
>>
>> [1]
>> https://github.com/grate-driver/linux/blob/master/drivers/gpu/drm/tegra/gem.h#L16
>>
>>
>> [2]
>> https://elixir.bootlin.com/linux/v5.9-rc4/source/drivers/staging/media/tegra-vde/iommu.c#L99
>>
>>
>> The VDE driver reserves the trapping IOVA addresses; I assume the Host1x
>> driver should do the same.
>>
> 
> The 0x1bad0000 values are not intended to be memory addresses; they are NOOP
> opcodes (INCR of 0 words to offset 0xbad). I'll fix this to use the proper
> functions to construct the opcodes and add some comments. These need to
> be NOOP opcodes so that the command parser skips over these "cancelled" jobs
> when the channel is resumed.
> 
> BTW, 0x60000000 is valid on Tegra194 and later.

At a quick glance it looked like a memory address :)

I'm now taking a closer look at this patch and it raises some more
questions. For example, looking at "On job timeout, we stop
the channel, NOP all future jobs on the channel using the same syncpoint
..." through the prism of grate-kernel experience, I'm not sure how it
could co-exist with the drm-scheduler or why it's needed at all. But I
think we need a feature-complete version (at least a rough one) so
that we can start testing; it should then be easier to review
and discuss such things.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 10/17] WIP: gpu: host1x: Add no-recovery mode
  2020-09-12 12:53         ` Dmitry Osipenko
@ 2020-09-12 13:31           ` Mikko Perttunen
  -1 siblings, 0 replies; 128+ messages in thread
From: Mikko Perttunen @ 2020-09-12 13:31 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

On 9/12/20 3:53 PM, Dmitry Osipenko wrote:
> 12.09.2020 01:11, Mikko Perttunen wrote:
>> On 9/11/20 7:40 PM, Dmitry Osipenko wrote:
>>> 05.09.2020 13:34, Mikko Perttunen wrote:
>>>> +    } else {
>>>> +        struct host1x_job *failed_job = job;
>>>> +
>>>> +        host1x_job_dump(dev, job);
>>>> +
>>>> +        host1x_syncpt_set_locked(job->syncpt);
>>>> +        failed_job->cancelled = true;
>>>> +
>>>> +        list_for_each_entry_continue(job, &cdma->sync_queue, list) {
>>>> +            unsigned int i;
>>>> +
>>>> +            if (job->syncpt != failed_job->syncpt)
>>>> +                continue;
>>>> +
>>>> +            for (i = 0; i < job->num_slots; i++) {
>>>> +                unsigned int slot = (job->first_get/8 + i) %
>>>> +                            HOST1X_PUSHBUFFER_SLOTS;
>>>> +                u32 *mapped = cdma->push_buffer.mapped;
>>>> +
>>>> +                mapped[2*slot+0] = 0x1bad0000;
>>>> +                mapped[2*slot+1] = 0x1bad0000;
>>>
>>> 0x1bad0000 is a valid memory address on Tegra20.
>>>
>>> 0x60000000 is an invalid physical address on all hardware generations.
>>> It's used by grate-kernel [1] and the VDE driver [2]. Note that 0x6 <<
>>> 28 is also an invalid Host1x opcode, while 0x1 should break the CDMA parser
>>> during PB debug-dumping.
>>>
>>> [1]
>>> https://github.com/grate-driver/linux/blob/master/drivers/gpu/drm/tegra/gem.h#L16
>>>
>>>
>>> [2]
>>> https://elixir.bootlin.com/linux/v5.9-rc4/source/drivers/staging/media/tegra-vde/iommu.c#L99
>>>
>>>
>>> The VDE driver reserves the trapping IOVA addresses; I assume the Host1x
>>> driver should do the same.
>>>
>>
>> The 0x1bad0000 values are not intended to be memory addresses; they are NOOP
>> opcodes (INCR of 0 words to offset 0xbad). I'll fix this to use the proper
>> functions to construct the opcodes and add some comments. These need to
>> be NOOP opcodes so that the command parser skips over these "cancelled" jobs
>> when the channel is resumed.
>>
>> BTW, 0x60000000 is valid on Tegra194 and later.
> 
> At a quick glance it looked like a memory address :)

It does look a bit like one :) I'll add a comment to make it clear.

> 
> I'm now taking a closer look at this patch and it raises some more
> questions. For example, looking at "On job timeout, we stop
> the channel, NOP all future jobs on the channel using the same syncpoint
> ..." through the prism of grate-kernel experience, I'm not sure how it
> could co-exist with the drm-scheduler or why it's needed at all. But I
> think we need a feature-complete version (at least a rough one) so
> that we can start testing; it should then be easier to review
> and discuss such things.
> 

The reason this is needed is that if a job times out and we don't do its 
syncpoint increments on the CPU, then a successive job incrementing that 
same syncpoint would cause fences to signal incorrectly. The job that 
was supposed to signal those fences didn't actually run; and any data 
those fences were protecting would still be garbage.
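
The failure mode described above can be illustrated with a toy model. This
is only a sketch under stated assumptions: a syncpoint is modelled as a
32-bit counter, a fence as a threshold on that counter, and every job that
actually executes increments the counter once. All sketch_ names are
hypothetical and none of this is Host1x driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy syncpoint: just a free-running 32-bit counter. */
struct sketch_syncpt {
	uint32_t value;
};

/* A fence signals once the counter has reached its threshold (wrap-safe). */
static inline bool sketch_fence_signaled(const struct sketch_syncpt *sp,
					 uint32_t threshold)
{
	return (int32_t)(sp->value - threshold) >= 0;
}

/* Run a queue of jobs; each job that executes increments the counter once. */
static inline void sketch_run_queue(struct sketch_syncpt *sp,
				    const bool *executes, unsigned int count)
{
	unsigned int i;

	for (i = 0; i < count; i++)
		if (executes[i])
			sp->value++;
}

static inline void sketch_fence_selfcheck(void)
{
	/* Job A (fence at 1) hangs; job B (fence at 2) is allowed to run. */
	const bool b_still_runs[2] = { false, true };
	/* Same hang, but job B is cancelled (NOPed out) as well. */
	const bool b_cancelled[2] = { false, false };
	struct sketch_syncpt sp = { 0 };

	sketch_run_queue(&sp, b_still_runs, 2);
	assert(sketch_fence_signaled(&sp, 1)); /* A's fence fires, A never ran */

	sp.value = 0;
	sketch_run_queue(&sp, b_cancelled, 2);
	assert(!sketch_fence_signaled(&sp, 1)); /* no false completion */
}
```

In the first run, job B's single increment is indistinguishable from the one
job A was supposed to make, so a waiter on A's fence wakes up and reads data
that was never produced. Cancelling later jobs on the same syncpoint avoids
that in this model.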

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 10/17] WIP: gpu: host1x: Add no-recovery mode
  2020-09-12 13:31           ` Mikko Perttunen
@ 2020-09-12 21:51             ` Dmitry Osipenko
  -1 siblings, 0 replies; 128+ messages in thread
From: Dmitry Osipenko @ 2020-09-12 21:51 UTC (permalink / raw)
  To: Mikko Perttunen, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

12.09.2020 16:31, Mikko Perttunen wrote:
...
>> I'm now taking a closer look at this patch and it raises some more
>> questions. For example, looking at "On job timeout, we stop
>> the channel, NOP all future jobs on the channel using the same syncpoint
>> ..." through the prism of grate-kernel experience, I'm not sure how it
>> could co-exist with the drm-scheduler or why it's needed at all. But I
>> think we need a feature-complete version (at least a rough one) so
>> that we can start testing; it should then be easier to review
>> and discuss such things.
>>
> 
> The reason this is needed is that if a job times out and we don't do its
> syncpoint increments on the CPU, then a successive job incrementing that
> same syncpoint would cause fences to signal incorrectly. The job that
> was supposed to signal those fences didn't actually run; and any data
> those fences were protecting would still be garbage.

I'll need to re-read the previous discussion because IIRC, I was
suggesting that once a job hangs, all jobs should be removed from
the queue/PB and re-submitted; the re-submitted jobs will then use the
new/updated sync point base.

We will probably also need another drm_tegra_submit_cmd type that waits
for a relative sync point increment.
drm_tegra_submit_cmd_wait_syncpt uses an absolute sync point value, so it
shouldn't be used for sync point increments that are internal to a job,
because that complicates recovery.

All waits that are internal to a job should only wait for relative sync
point increments.

In the grate-kernel, every job uses a unique, clean sync point (which is
also internal to the kernel driver), and a relative wait [1] is used for
the job's internal sync point increments [2][3][4]. The kernel
driver can thus simply jump over a hung job by updating DMAGET to point at the
start of the next job.

[1]
https://github.com/grate-driver/linux/blob/master/drivers/gpu/drm/tegra/uapi/patching.c#L367

[2]
https://github.com/grate-driver/xf86-video-opentegra/blob/master/src/gpu/gr3d.c#L486
[3]
https://github.com/grate-driver/xf86-video-opentegra/blob/master/src/exa/copy_2d.c#L389
[4]
https://github.com/grate-driver/xf86-video-opentegra/blob/master/src/gpu/tegra_stream_v2.c#L536
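
The difference between absolute and relative waits can be sketched as
follows. This is an illustration only and is not taken from grate-kernel
(the patching.c linked in [1] is the real implementation). It assumes that
the kernel resolves a job-relative increment count against whatever sync
point base the job receives at submission time; the sketch_ names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical resolution step: turn "wait for N increments made by this
 * job" into an absolute threshold, given the base the job was assigned.
 * The addition wraps modulo 2^32, like the sync point counter itself.
 */
static inline uint32_t sketch_resolve_relative_wait(uint32_t base,
						    uint32_t rel_incrs)
{
	return base + rel_incrs;
}

static inline void sketch_relative_wait_selfcheck(void)
{
	const uint32_t rel = 2; /* stored in the job, never rewritten */

	/* First submission: the job is assigned base 10. */
	assert(sketch_resolve_relative_wait(10, rel) == 12u);
	/* After a hang, the same job is re-submitted with base 40. */
	assert(sketch_resolve_relative_wait(40, rel) == 42u);
	/* Wrap-around near the top of the 32-bit range. */
	assert(sketch_resolve_relative_wait(0xffffffffu, rel) == 1u);
}
```

A job that stored the absolute value 12 would have to be rewritten before it
could be re-submitted at a different base, which is the recovery
complication mentioned above.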

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 10/17] WIP: gpu: host1x: Add no-recovery mode
  2020-09-12 21:51             ` Dmitry Osipenko
@ 2020-09-13  9:51               ` Mikko Perttunen
  -1 siblings, 0 replies; 128+ messages in thread
From: Mikko Perttunen @ 2020-09-13  9:51 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

On 9/13/20 12:51 AM, Dmitry Osipenko wrote:
> 12.09.2020 16:31, Mikko Perttunen wrote:
> ...
>>> I'm now taking a closer look at this patch and it raises some more
>>> questions. For example, looking at "On job timeout, we stop
>>> the channel, NOP all future jobs on the channel using the same syncpoint
>>> ..." through the prism of grate-kernel experience, I'm not sure how it
>>> could co-exist with the drm-scheduler or why it's needed at all. But I
>>> think we need a feature-complete version (at least a rough one) so
>>> that we can start testing; it should then be easier to review
>>> and discuss such things.
>>>
>>
>> The reason this is needed is that if a job times out and we don't do its
>> syncpoint increments on the CPU, then a successive job incrementing that
>> same syncpoint would cause fences to signal incorrectly. The job that
>> was supposed to signal those fences didn't actually run; and any data
>> those fences were protecting would still be garbage.
> 
> I'll need to re-read the previous discussion because IIRC, I was
> suggesting that once a job hangs, all jobs should be removed from
> the queue/PB and re-submitted; the re-submitted jobs will then use the
> new/updated sync point base.
> 
> We will probably also need another drm_tegra_submit_cmd type that waits
> for a relative sync point increment.
> drm_tegra_submit_cmd_wait_syncpt uses an absolute sync point value, so it
> shouldn't be used for sync point increments that are internal to a job,
> because that complicates recovery.
> 
> All waits that are internal to a job should only wait for relative sync
> point increments.
> 
> In the grate-kernel, every job uses a unique, clean sync point (which is
> also internal to the kernel driver), and a relative wait [1] is used for
> the job's internal sync point increments [2][3][4]. The kernel
> driver can thus simply jump over a hung job by updating DMAGET to point at the
> start of the next job.

Issues I have with this approach:

* Both this and my approach require that, if a job hangs, userspace
ensures all external waiters have timed out or been stopped before the
syncpoint is freed; if the syncpoint gets reused before then, false
waiter completions can happen.

So freeing the syncpoint must be exposed to userspace. The kernel cannot 
do this since there may be waiters that the kernel is not aware of. My 
proposal only has one syncpoint, which I feel makes this part simpler, too.

* I believe this proposal requires allocating a syncpoint for each 
externally visible syncpoint increment that the job does. This can use 
up quite a few syncpoints, and it makes syncpoints a dynamically 
allocated resource with unbounded allocation latency. This is a problem 
for safety-related systems.

* If a job fails on a "virtual channel" (userctx), I think it's a 
reasonable expectation that further jobs on that "virtual channel" will 
not execute, and I think implementing that model is simpler than doing 
recovery.

Mikko

> 
> [1]
> https://github.com/grate-driver/linux/blob/master/drivers/gpu/drm/tegra/uapi/patching.c#L367
> 
> [2]
> https://github.com/grate-driver/xf86-video-opentegra/blob/master/src/gpu/gr3d.c#L486
> [3]
> https://github.com/grate-driver/xf86-video-opentegra/blob/master/src/exa/copy_2d.c#L389
> [4]
> https://github.com/grate-driver/xf86-video-opentegra/blob/master/src/gpu/tegra_stream_v2.c#L536
> 

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 10/17] WIP: gpu: host1x: Add no-recovery mode
  2020-09-13  9:51               ` Mikko Perttunen
@ 2020-09-13 18:37                 ` Dmitry Osipenko
  -1 siblings, 0 replies; 128+ messages in thread
From: Dmitry Osipenko @ 2020-09-13 18:37 UTC (permalink / raw)
  To: Mikko Perttunen, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

13.09.2020 12:51, Mikko Perttunen wrote:
...
>> All waits that are internal to a job should only wait for relative sync
>> point increments.
>>
>> In the grate-kernel every job uses unique-and-clean sync point (which is
>> also internal to the kernel driver) and a relative wait [1] is used for
>> the job's internal sync point increments [2][3][4], and thus, kernel
>> driver simply jumps over a hung job by updating DMAGET to point at the
>> start of a next job.
> 
> Issues I have with this approach:
> 
> * Both this and my approach have the requirement for userspace, that if
> a job hangs, the userspace must ensure all external waiters have timed
> out / been stopped before the syncpoint can be freed, as if the
> syncpoint gets reused before then, false waiter completions can happen.
> 
> So freeing the syncpoint must be exposed to userspace. The kernel cannot
> do this since there may be waiters that the kernel is not aware of. My
> proposal only has one syncpoint, which I feel makes this part simpler, too.
> 
> * I believe this proposal requires allocating a syncpoint for each
> externally visible syncpoint increment that the job does. This can use
> up quite a few syncpoints, and it makes syncpoints a dynamically
> allocated resource with unbounded allocation latency. This is a problem
> for safety-related systems.

Maybe we could have a special type of a "shared" sync point that is
allocated per-hardware engine? Then shared SP won't be a scarce resource
and job won't depend on it. The kernel or userspace driver may take care
of recovering the counter value of a shared SP when job hangs or do
whatever else is needed without affecting the job's sync point.

Primarily I'm not feeling very happy about retaining the job's sync
point recovery code because it was broken the last time I touched it and
grate-kernel works fine without it.

> * If a job fails on a "virtual channel" (userctx), I think it's a
> reasonable expectation that further jobs on that "virtual channel" will
> not execute, and I think implementing that model is simpler than doing
> recovery.

Couldn't jobs just use explicit fencing? Then a second job won't be
executed if first job hangs and explicit dependency is expressed. I'm
not sure that concept of a "virtual channel" is applicable to drm-scheduler.

I'll need to see a full-featured driver implementation and the test
cases that cover all the problems that you're worried about because I'm
not aware of all the T124+ needs and seeing code should help. Maybe
in the end your approach will be the best, but for now it's not clear :)

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 10/17] WIP: gpu: host1x: Add no-recovery mode
  2020-09-13 18:37                 ` Dmitry Osipenko
@ 2020-09-15 10:57                   ` Mikko Perttunen
  -1 siblings, 0 replies; 128+ messages in thread
From: Mikko Perttunen @ 2020-09-15 10:57 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

On 9/13/20 9:37 PM, Dmitry Osipenko wrote:
> 13.09.2020 12:51, Mikko Perttunen wrote:
> ...
>>> All waits that are internal to a job should only wait for relative sync
>>> point increments.
>>>
>>> In the grate-kernel every job uses unique-and-clean sync point (which is
>>> also internal to the kernel driver) and a relative wait [1] is used for
>>> the job's internal sync point increments [2][3][4], and thus, kernel
>>> driver simply jumps over a hung job by updating DMAGET to point at the
>>> start of a next job.
>>
>> Issues I have with this approach:
>>
>> * Both this and my approach have the requirement for userspace, that if
>> a job hangs, the userspace must ensure all external waiters have timed
>> out / been stopped before the syncpoint can be freed, as if the
>> syncpoint gets reused before then, false waiter completions can happen.
>>
>> So freeing the syncpoint must be exposed to userspace. The kernel cannot
>> do this since there may be waiters that the kernel is not aware of. My
>> proposal only has one syncpoint, which I feel makes this part simpler, too.
>>
>> * I believe this proposal requires allocating a syncpoint for each
>> externally visible syncpoint increment that the job does. This can use
>> up quite a few syncpoints, and it makes syncpoints a dynamically
>> allocated resource with unbounded allocation latency. This is a problem
>> for safety-related systems.
> 
> Maybe we could have a special type of a "shared" sync point that is
> allocated per-hardware engine? Then shared SP won't be a scarce resource
> and job won't depend on it. The kernel or userspace driver may take care
> of recovering the counter value of a shared SP when job hangs or do
> whatever else is needed without affecting the job's sync point.

Having a shared syncpoint opens up possibilities for interference 
between jobs (if we're not using the firewall, the HW cannot distinguish 
between jobs on the same channel), and doesn't work if there are 
multiple channels using the same engine, which we want to do for newer 
chips (for performance and virtualization reasons).

Even then, as long as we still need to allocate one syncpoint per job, the
dynamic allocation issue remains.

> 
> Primarily I'm not feeling very happy about retaining the job's sync
> point recovery code because it was broken the last time I touched it and
> grate-kernel works fine without it.

I'm not planning to retain it any longer than necessary, which is until 
the staging interface is removed. Technically I can already remove it 
now -- that would cause any users of the staging interface to 
potentially behave weirdly if a job times out, but maybe we don't care 
about that all that much?

> 
>> * If a job fails on a "virtual channel" (userctx), I think it's a
>> reasonable expectation that further jobs on that "virtual channel" will
>> not execute, and I think implementing that model is simpler than doing
>> recovery.
> 
> Couldn't jobs just use explicit fencing? Then a second job won't be
> executed if first job hangs and explicit dependency is expressed. I'm
> not sure that concept of a "virtual channel" is applicable to drm-scheduler.

I assume what you mean is that each job incrementing a syncpoint would 
first wait for the preceding job incrementing that syncpoint to 
complete, by waiting for the preceding job's fence value.

I would consider what I do in this patch to be an optimization of that. 
Let's say we detect a timed out job and just skip that job in the CDMA 
pushbuffer (but do not CPU-increment syncpoints), then at every 
subsequent job using that syncpoint, we will be detecting a timeout and 
skipping it eventually. With the "NOPping" in this patch we just 
pre-emptively cancel those jobs so that we don't have to spend time 
waiting for timeouts in the future. Functionally these should be the 
same, though.

The wait-for-preceding-job-to-complete thing should already be there in 
form of the "serialize" operation if the jobs use the same syncpoint.

So, if DRM scheduler's current operation is just skipping the timed-out
job and continuing from the next job, that's functionally fine. But we
could improve DRM scheduler to allow for also cancelling future jobs 
that we know will time out. That would be in essence "virtual channel" 
support.

Userspace still has options -- if it puts in other prefences, timeouts 
will happen as usual. If it wants to have multiple "threads" of 
execution where a timeout on one doesn't affect the others, it can use 
different syncpoints for them.
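
As an illustration only, the behaviour discussed above can be modelled in
plain userspace C. This is a sketch and not the code from the patch:
struct job, syncpt_expired(), cancel_jobs_on_syncpt() and run_jobs() are
hypothetical names, and the wraparound-safe comparison is an assumption
about how a threshold fence would be checked. It shows why skipping only
the hung job lets a later job signal the hung job's fence, and that
cancelling by syncpoint leaves jobs on other syncpoints alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a queued job; not the host1x structures. */
struct job {
	uint32_t syncpt_id;  /* syncpoint the job increments */
	uint32_t num_incrs;  /* increments the job performs when it runs */
	uint32_t fence;      /* syncpoint value at which the job is complete */
	bool cancelled;      /* true once the job has been turned into a NOP */
};

/* Wraparound-safe check: has the syncpoint value reached the threshold? */
static bool syncpt_expired(uint32_t value, uint32_t threshold)
{
	return (int32_t)(value - threshold) >= 0;
}

/*
 * On timeout of jobs[hung], cancel it and every later job that uses the
 * same syncpoint. Jobs on other syncpoints are left alone. Returns the
 * number of jobs cancelled, including the hung one.
 */
static size_t cancel_jobs_on_syncpt(struct job *jobs, size_t count, size_t hung)
{
	size_t cancelled = 0;
	uint32_t id;

	assert(hung < count);
	id = jobs[hung].syncpt_id;

	for (size_t i = hung; i < count; i++) {
		if (jobs[i].syncpt_id == id && !jobs[i].cancelled) {
			jobs[i].cancelled = true;
			cancelled++;
		}
	}
	return cancelled;
}

/* Apply the increments of all non-cancelled jobs on one syncpoint. */
static uint32_t run_jobs(const struct job *jobs, size_t count,
			 uint32_t syncpt_id, uint32_t value)
{
	for (size_t i = 0; i < count; i++) {
		if (jobs[i].syncpt_id == syncpt_id && !jobs[i].cancelled)
			value += jobs[i].num_incrs;
	}
	return value;
}
```

With three queued jobs on syncpoints 1, 2 and 1, a timeout of the first
job cancels the first and third; the job on syncpoint 2 still runs, and
the first job's fence is never reached.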

> 
> I'll need to see a full-featured driver implementation and the test
> cases that cover all the problems that you're worried about because I'm
> not aware of all the T124+ needs and seeing code should help. Maybe
> in the end your approach will be the best, but for now it's not clear :)
> 

My primary goal is simplicity of programming model and implementation. 
Regarding the resource management concerns, I can of course create a 
test case that allocates a lot of resources, but what I'm afraid of
is that once we put this into a big system, with several VMs with their 
own resource reservations (including syncpoints), and the GPU and camera 
subsystems using hundreds of syncpoints, dynamic usage of those 
resources will create uncertainty in the system, and bug reports.

And of course, if we want to make a safety-related system, we also need
to document beforehand how we are ensuring that e.g. job submission
(including syncpoint allocation if that is dynamic) happens under x 
microseconds.

I don't think the model used in the grate host1x driver is bad, and I 
think the code there and its integration with the existing kernel 
frameworks are beautiful, and that is definitely a goal for the mainline 
driver as well. But I think we can make things even simpler overall and 
more reliable.

Mikko

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 17/17] WIP: drm/tegra: Implement new UAPI
  2020-09-11 16:30               ` Dmitry Osipenko
@ 2020-09-15 11:08                 ` Mikko Perttunen
  -1 siblings, 0 replies; 128+ messages in thread
From: Mikko Perttunen @ 2020-09-15 11:08 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman



On 9/11/20 7:30 PM, Dmitry Osipenko wrote:
> 11.09.2020 12:59, Mikko Perttunen wrote:
>> On 9/11/20 12:57 AM, Dmitry Osipenko wrote:
>>> 09.09.2020 11:36, Mikko Perttunen wrote:
>>> ...
>>>>>
>>>>> Does it make sense to have timeout in microseconds?
>>>>>
>>>>
>>>> Not sure, but better have it a bit more fine-grained rather than
>>>> coarse-grained. This still gives a maximum timeout of 71 minutes so I
>>>> don't think it has any negatives compared to milliseconds.
>>>
>>> If there is no good reason to use microseconds right now, then it should be
>>> better to default to milliseconds, IMO. It shouldn't be a problem to
>>> extend the IOCTL with a microseconds entry, if ever needed.
>>>
>>> {
>>>      __u32 timeout_ms;
>>> ...
>>>      __u32 timeout_us;
>>> }
>>>
>>> timeout = 1000 * timeout_ms + timeout_us;
>>>
>>> There shouldn't be a need for long timeouts, since a job that takes
>>> over 100ms is probably too impractical. It also should be possible to
>>> detect a progressing job and then defer timeout in the driver. At least
>>> this is what other drivers do, like etnaviv driver for example:
>>>
>>> https://elixir.bootlin.com/linux/v5.9-rc4/source/drivers/gpu/drm/etnaviv/etnaviv_sched.c#L107
>>>
>>>
>>
>> I still don't quite understand why it's better to default to
>> milliseconds? As you say, there is no need to have a long timeout, and
>> if we go microseconds now, then there wouldn't be a need to extend in
>> the future.
> 
> It will be nicer to avoid unnecessary unit-conversions in the code in order
> to keep it cleaner.

We can change all the internals to use microseconds as well. We 
eventually have to convert it to jiffies anyway, so the unit before that 
shouldn't matter much.
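
For reference, the arithmetic behind the figures in this exchange can be
checked with a few lines of C. This is only a sketch: the helper names
and EXAMPLE_HZ are made up for illustration, and us_to_ticks() is a
userspace stand-in, not the kernel's usecs_to_jiffies(). It shows where
the 71 minute maximum of a 32-bit microsecond field comes from, and how
a millisecond part and a microsecond part would combine into one value.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_HZ 250u /* hypothetical tick rate, for illustration only */

/* Largest timeout, in whole minutes, that a 32-bit microsecond field can hold. */
static uint32_t max_timeout_minutes_u32_us(void)
{
	return UINT32_MAX / 1000000u / 60u;
}

/* Combine a millisecond part and a microsecond part into microseconds. */
static uint64_t timeout_total_us(uint32_t timeout_ms, uint32_t timeout_us)
{
	return (uint64_t)timeout_ms * 1000u + timeout_us;
}

/* Convert microseconds to ticks, rounding up so a short timeout is not lost. */
static uint64_t us_to_ticks(uint64_t us)
{
	/* Guard against overflow in the multiplication below. */
	assert(us <= (UINT64_MAX - 999999u) / EXAMPLE_HZ);
	return (us * EXAMPLE_HZ + 999999u) / 1000000u;
}
```

Once the value has been converted to ticks, the unit it was supplied in
no longer matters, which is the point made above.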

> 
> I'm now also a bit dubious that the timeout field of the submit
> IOCTL will be in the final UAPI version because it should become
> obsolete once drm-scheduler is hooked up, since the hung-check
> timeout will be specified per-hardware engine within the kernel driver
> and there won't be much use for the user-defined timeout.
> 

Perhaps we can omit this field for now. Looking at it, it's primarily 
used for tests, and for that we could add a debugfs knob to adjust the 
timeout if needed.

Mikko

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 13/17] gpu: host1x: Reset max value when freeing a syncpoint
  2020-09-05 10:34   ` Mikko Perttunen
@ 2020-09-16 19:44     ` Dmitry Osipenko
  -1 siblings, 0 replies; 128+ messages in thread
From: Dmitry Osipenko @ 2020-09-16 19:44 UTC (permalink / raw)
  To: Mikko Perttunen, thierry.reding, jonathanh, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

05.09.2020 13:34, Mikko Perttunen wrote:
> With job recovery becoming optional, syncpoints may have a mismatch
> between their value and max value when freed. As such, when freeing,
> set the max value to the current value of the syncpoint so that it
> is in a sane state for the next user.
> 
> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
> ---
>  drivers/gpu/host1x/syncpt.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
> index 2fad8b2a55cc..82ecb4ac387e 100644
> --- a/drivers/gpu/host1x/syncpt.c
> +++ b/drivers/gpu/host1x/syncpt.c
> @@ -385,6 +385,7 @@ static void syncpt_release(struct kref *ref)
>  {
>  	struct host1x_syncpt *sp = container_of(ref, struct host1x_syncpt, ref);
>  
> +	atomic_set(&sp->max_val, host1x_syncpt_read_min(sp));
>  	sp->locked = false;
>  
>  	mutex_lock(&sp->host->syncpt_mutex);
> 

Please note that the sync point state actually needs to be completely
reset at the sync point request-time because both downstream fastboot
and upstream u-boot [1] are needlessly enabling the display VBLANK interrupt
that continuously increments sync point #26 during kernel boot until
the display controller is reset.

[1] https://github.com/u-boot/u-boot/blob/master/drivers/video/tegra.c#L155

Hence once sync point #26 is requested, it will have a dirty state. So
far this doesn't have any visible effect because sync points aren't used
much.
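
To make the difference between the two reset points concrete, here is a
small userspace model. It is a sketch under assumptions: struct
syncpt_model and the helper functions are hypothetical and only loosely
follow the names in the patch. The release path mirrors what the patch
does (max value set to the cached current value), while the request path
models the full resynchronisation with the hardware counter suggested
above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one syncpoint; not the host1x structure. */
struct syncpt_model {
	uint32_t hw_value; /* hardware counter, may be bumped behind our back */
	uint32_t min_val;  /* last value the driver read back from hardware */
	uint32_t max_val;  /* value reached once all submitted work has run */
};

/* Re-read the hardware counter into the cached minimum. */
static uint32_t syncpt_load(struct syncpt_model *sp)
{
	sp->min_val = sp->hw_value;
	return sp->min_val;
}

/* Account for the increments of a new job; returns its fence value. */
static uint32_t syncpt_incr_max(struct syncpt_model *sp, uint32_t incrs)
{
	sp->max_val += incrs;
	return sp->max_val;
}

/* Number of increments still outstanding (wraparound-safe). */
static uint32_t syncpt_pending(const struct syncpt_model *sp)
{
	return sp->max_val - sp->min_val;
}

/* As in the patch: on free, drop increments that will never happen. */
static void syncpt_release(struct syncpt_model *sp)
{
	sp->max_val = sp->min_val;
}

/* As suggested in the reply: resynchronise fully when handing it out. */
static void syncpt_request(struct syncpt_model *sp)
{
	sp->max_val = syncpt_load(sp);
	/* Postcondition: nothing is outstanding for the new owner. */
	assert(syncpt_pending(sp) == 0);
}
```

In this model, a release alone leaves the cached values stale if the
hardware counter keeps moving afterwards; only the request-time reset
picks up the new hardware value.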

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 13/17] gpu: host1x: Reset max value when freeing a syncpoint
  2020-09-16 19:44     ` Dmitry Osipenko
@ 2020-09-16 20:43       ` Mikko Perttunen
  -1 siblings, 0 replies; 128+ messages in thread
From: Mikko Perttunen @ 2020-09-16 20:43 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

On 9/16/20 10:44 PM, Dmitry Osipenko wrote:
> 05.09.2020 13:34, Mikko Perttunen wrote:
>> With job recovery becoming optional, syncpoints may have a mismatch
>> between their value and max value when freed. As such, when freeing,
>> set the max value to the current value of the syncpoint so that it
>> is in a sane state for the next user.
>>
>> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
>> ---
>>   drivers/gpu/host1x/syncpt.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
>> index 2fad8b2a55cc..82ecb4ac387e 100644
>> --- a/drivers/gpu/host1x/syncpt.c
>> +++ b/drivers/gpu/host1x/syncpt.c
>> @@ -385,6 +385,7 @@ static void syncpt_release(struct kref *ref)
>>   {
>>   	struct host1x_syncpt *sp = container_of(ref, struct host1x_syncpt, ref);
>>   
>> +	atomic_set(&sp->max_val, host1x_syncpt_read_min(sp));
>>   	sp->locked = false;
>>   
>>   	mutex_lock(&sp->host->syncpt_mutex);
>>
> 
> Please note that the sync point state actually needs to be completely
> reset at sync point request time, because both downstream fastboot
> and upstream u-boot [1] needlessly enable the display VBLANK interrupt,
> which continuously increments sync point #26 during kernel boot until
> the display controller is reset.
> 
> [1] https://github.com/u-boot/u-boot/blob/master/drivers/video/tegra.c#L155
> 
> Hence once sync point #26 is requested, it will have a dirty state. So
> far this doesn't have any visible effect because sync points aren't used
> much.
> 

Maybe we can instead reserve syncpoints that might be used by the boot 
chain, and only allow allocating them once the display driver has acked 
that the syncpoint will no longer be incremented? That way if the 
display driver is disabled for some reason we'll still be fine.

Looking at the downstream driver, it (still, on new chips..) reserves 
the following syncpoints:

- 10 (AVP)
- 22 (3D)
- 26 (VBLANK0)
- 27 (VBLANK1)

and says that this applies to T20, T30, T114 and T148.

I suppose if you haven't observed this happening to syncpoints other
than 26, then reserving 26 would probably be enough.
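
The reservation idea above could look roughly like the following
userspace sketch in C. It is only an illustration under stated
assumptions: the model_pool type, the model_pool_* helpers and the pool
size of 32 are all hypothetical and are not part of the host1x driver.
Reserved syncpoints are skipped by the allocator until something (in the
proposal, the display driver) acknowledges that they will no longer be
incremented.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NUM_SYNCPTS 32u /* assumed pool size, for illustration only */

/* Hypothetical allocator state; one bit per syncpoint id. */
struct model_pool {
	uint32_t reserved;  /* bit n set: id n may still be incremented by boot-time state */
	uint32_t allocated; /* bit n set: id n is currently in use */
};

/* Start with the boot-chain syncpoints reserved and nothing allocated. */
static void model_pool_init(struct model_pool *pool, uint32_t boot_reserved_mask)
{
	pool->reserved = boot_reserved_mask;
	pool->allocated = 0;
}

/* Acknowledge that id will no longer be incremented, lifting its reservation. */
static void model_pool_ack(struct model_pool *pool, unsigned int id)
{
	assert(id < MODEL_NUM_SYNCPTS);
	pool->reserved &= ~(UINT32_C(1) << id);
}

/* Return the lowest id that is neither reserved nor allocated, or -1 if none. */
static int model_pool_alloc(struct model_pool *pool)
{
	uint32_t busy = pool->reserved | pool->allocated;

	for (unsigned int id = 0; id < MODEL_NUM_SYNCPTS; id++) {
		uint32_t bit = UINT32_C(1) << id;

		if (!(busy & bit)) {
			pool->allocated |= bit;
			return (int)id;
		}
	}
	return -1;
}
```

If the acknowledgement never arrives, for example because the display
driver is disabled, the reserved ids simply stay out of the pool, so no
other user can be handed a syncpoint that is still being incremented.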

Mikko

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 13/17] gpu: host1x: Reset max value when freeing a syncpoint
  2020-09-16 20:43       ` Mikko Perttunen
@ 2020-09-16 21:37         ` Dmitry Osipenko
  -1 siblings, 0 replies; 128+ messages in thread
From: Dmitry Osipenko @ 2020-09-16 21:37 UTC (permalink / raw)
  To: Mikko Perttunen, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

16.09.2020 23:43, Mikko Perttunen wrote:
...
>> Please note that the sync point state actually needs to be completely
>> reset at sync point request time, because both downstream fastboot
>> and upstream u-boot [1] needlessly enable the display VBLANK interrupt,
>> which continuously increments sync point #26 during kernel boot until
>> the display controller is reset.
>>
>> [1]
>> https://github.com/u-boot/u-boot/blob/master/drivers/video/tegra.c#L155
>>
>> Hence once sync point #26 is requested, it will have a dirty state. So
>> far this doesn't have any visible effect because sync points aren't used
>> much.
>>
> 
> Maybe we can instead reserve syncpoints that might be used by the boot
> chain, and only allow allocating them once the display driver has acked
> that the syncpoint will no longer be incremented? That way if the
> display driver is disabled for some reason we'll still be fine.

sounds good

> Looking at the downstream driver, it (still, on new chips..) reserves
> the following syncpoints:
> 
> - 10 (AVP)
> - 22 (3D)
> - 26 (VBLANK0)
> - 27 (VBLANK1)
> 
> and says that this applies to T20, T30, T114 and T148.
> 
> I suppose if you haven't observed this happening to other syncpoints
> than 26, then reserving 26 would probably be enough.

I only saw SP 26 being used by the DC, but that may vary from device
to device, and SP 27 could actually be used in the wild as well.

I think the AVP SP should only relate to the AVP-firmware that upstream
doesn't support, so we can ignore its reservation.

I've no idea what may use the 3D SP.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [RFC PATCH v2 13/17] gpu: host1x: Reset max value when freeing a syncpoint
  2020-09-16 21:37         ` Dmitry Osipenko
@ 2020-09-17  7:25           ` Mikko Perttunen
  -1 siblings, 0 replies; 128+ messages in thread
From: Mikko Perttunen @ 2020-09-17  7:25 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

On 9/17/20 12:37 AM, Dmitry Osipenko wrote:
> 16.09.2020 23:43, Mikko Perttunen wrote:
> ...
>>> Please note that the sync point state actually needs to be completely
>>> reset at sync point request time, because both downstream fastboot
>>> and upstream u-boot [1] needlessly enable the display VBLANK interrupt,
>>> which continuously increments sync point #26 during kernel boot until
>>> the display controller is reset.
>>>
>>> [1]
>>> https://github.com/u-boot/u-boot/blob/master/drivers/video/tegra.c#L155
>>>
>>> Hence once sync point #26 is requested, it will have a dirty state. So
>>> far this doesn't have any visible effect because sync points aren't used
>>> much.
>>>
>>
>> Maybe we can instead reserve syncpoints that might be used by the boot
>> chain, and only allow allocating them once the display driver has acked
>> that the syncpoint will no longer be incremented? That way if the
>> display driver is disabled for some reason we'll still be fine.
> 
> sounds good
> 
>> Looking at the downstream driver, it (still, on new chips..) reserves
>> the following syncpoints:
>>
>> - 10 (AVP)
>> - 22 (3D)
>> - 26 (VBLANK0)
>> - 27 (VBLANK1)
>>
>> and says that this applies to T20, T30, T114 and T148.
>>
>> I suppose if you haven't observed this happening to other syncpoints
>> than 26, then reserving 26 would probably be enough.
> 
> I only saw SP 26 being used by the DC, but that may vary from device
> to device, and SP 27 could actually be used in the wild as well.
> 
> I think the AVP SP should only relate to the AVP-firmware that upstream
> doesn't support, so we can ignore its reservation.
> 
> I've no idea what may use the 3D SP.
> 

My guess is that some very old code used fixed syncpoint numbers, so
these were added to the reservation list. Let's reserve 26 and 27; that
should be simple enough, since both would be "released" by the display
driver.

Mikko

^ permalink raw reply	[flat|nested] 128+ messages in thread

end of thread, other threads:[~2020-09-17  8:08 UTC | newest]
